© University of Leicester 2014
Preface
Welcome to module MA2404 Principles of Financial Modelling.
This module is part of your studies towards the BSc Diploma in Actuarial Science with the University of Leicester. The module has no formal prerequisites beyond familiarity with the standard probability theory seen during the first year of your study.
An important thread through all actuarial and financial disciplines is the use
of appropriate models. For example, it is obvious that human mortality is of
fundamental importance in the pricing of pension policies and life insurance
contracts; robust mathematical models of mortality are therefore needed by
institutions involved in the provision of such products. Actuaries within
these institutions need a thorough understanding of a variety of models so
that they can choose the best model in any given situation. The aim of this module is to provide an introduction to some such models, with emphasis on Markov models.
We begin the module with a chapter that explains the underlying concepts of financial and actuarial modelling in general. We then discuss the Monte Carlo method, a simple probabilistic algorithm for simulating systems in which an underlying randomness exists. Then, after a short review of probability theory and stochastic processes, we study the theory of Markov processes and their application to financial and actuarial modelling. In the last chapters, we pay particular attention to mortality.
Dr Bogdan Grechuk
Contents
1 Principles of actuarial and financial modelling
  1.1 Why are models used?
  1.2 Key steps in the modelling process
  1.3 Benefits and limitations of modelling
  1.4 Stochastic and deterministic models
  1.5 Suitability of a model
  1.6 Short-run and long-run properties of a model
  1.7 Analysing model output
  1.8 Sensitivity testing
  1.9 Communicating the results
  1.10 Summary

4 Markov Chains
  4.1 The Markov property
  4.2 Definition of Markov chains
  4.3 The Chapman-Kolmogorov equations
  4.4 Time dependency of Markov chains
    4.4.1 Time-inhomogeneous Markov chains
    4.4.2 Time-homogeneous Markov chains
  4.5 Further applications
    4.5.1 The simple (unrestricted) random walk
    4.5.2 The restricted random walk
    4.5.3 The modified NCD model
    4.5.4 A model of accident proneness
    4.5.5 General principles of modelling using Markov chains
  4.6 Stationary distributions
  4.7 The long-term behaviour of Markov chains
  4.8 Summary

A Chapter 1 solutions
The following book has been used as the basis for the lecture notes
Chapter 1
Principles of actuarial and financial modelling
Introduction
In this chapter, we introduce the idea of using models to represent and examine various real-life systems or processes. We discuss:

why models are used

provided for life assurance. However, unlike with death, the insured can move into and out of the state of ill-health.
Within each section of this chapter, we discuss examples to put the modelling
process into financial and actuarial contexts.
This section of the module presents the principles of financial and actuarial modelling, primarily as a theoretical concept. The module, as a whole, then expands on this theoretical basis, providing details of the models actually used by financial analysts and actuaries. Therefore, it is likely that you will benefit from revisiting this chapter whilst studying the rest of this module.
1.1 Why are models used?
A model is an imitation of a system or process. It is usually built to represent and explore an event which could occur in the real world: for example, the effect of medical treatment on a cancer patient, the flow of traffic on a street, or the results of a horse race.
We can use a model to investigate the possible outcomes and consequences of a particular scenario without having to wait for the actual scenario to run its course. This allows us to observe systems in condensed time and so to plan for possible consequences, or even to decide not to proceed with a certain project at all.
A model also enables us to investigate the effect different input parameters have on its results.
Example 1.1. Let us consider the expected lifetime of a terminally ill cancer patient. Suppose we set up a model exploring the effect that taking various quantities of a new drug has on the patient's lifetime. Then the quantity of the new drug is an input parameter. Our model may then enable us to optimise the amount of the drug to give to the patient in order to maximise their lifetime.
Example 1.2. A 20-year-old married man has decided to take out a personal pension with a life company. He wishes to pay a fixed percentage of his salary into this personal pension until he retires. The life company directs him to a Pensions Calculator on their website, which projects his pension at retirement. The Calculator asks for details like current earnings, the percentage of salary he wishes to pay into the pension, the age at which he wishes to retire and whether he wishes to provide a pension for his wife (spouse's pension) if he dies before she does. These details are all examples of input parameters and affect the size of the projected pension at retirement. The Pensions Calculator itself is an example of a model.
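A model of this kind can be sketched in a few lines of code. The sketch below is purely illustrative: the function name, the default growth and return rates, and the simple annual-compounding scheme are assumptions for the sake of the example, not the method any real life company uses.

```python
def project_pension(current_salary, contribution_pct, years_to_retirement,
                    salary_growth=0.03, investment_return=0.05):
    """Toy pension projection: accumulate a fixed percentage of an annually
    growing salary at a fixed rate of investment return. Every argument is
    an input parameter in the sense of the text; the default rates are
    illustrative assumptions only."""
    fund = 0.0
    salary = current_salary
    for _ in range(years_to_retirement):
        fund = fund * (1 + investment_return) + salary * contribution_pct
        salary *= 1 + salary_growth
    return fund

# A 20-year-old earning 30,000 p.a., paying in 8% until retiring at 65:
print(f"projected fund: {project_pension(30_000, 0.08, 45):,.0f}")
```

Changing any input parameter (the contribution percentage, say, or the retirement age) changes the projected fund, which is exactly the kind of exploration the Calculator supports.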
Models are simplified versions of real-world systems and involve making a number of assumptions about how the system works. This includes whether and how we choose to model the relationships between various parameters in the model. The assumptions made can reduce the level of complexity of the model compared with that of the real-world system.
Example 1.3. In Example 1.2 above, there are many factors which will affect the projected level of pension. Some of the financial factors include future salary increases, price inflation and future investment returns. There is likely to be some correlation between these factors, e.g. salary increases tend to be higher when price inflation is higher. However, expected salary increases may also depend on the individual concerned, e.g. is the man a high-flyer?
Any Pensions Calculator will either include such factors as input parameters
(i.e. the user explicitly chooses these assumptions) or the assumptions will be
implicitly made within the model. There will need to be a balance between
making the Pensions Calculator easy to use and simple for customers and
ensuring accurate pensions projections are produced.
We will need to use input data in order to produce and parametrise the model. Such data may relate to past observations, current observations or expected future values (e.g. a target inflation rate). Statistical methods can be used to fit the model to the data, if the data are considered to be appropriate.
When deciding on the appropriate level of detail to include in the model, it
is important to consider the objectives of the model. That is, we need to
balance the cost and time of producing the most accurate model against the
purpose for which it is required.
Example 1.4. An actuary advises a large final salary pension scheme (i.e. a pension scheme which provides a pension based on the member's salary at retirement). She has been asked to estimate the current funds required to meet the cost of the pension benefits built up to date in the pension scheme. This will involve, amongst other things, assumptions about the lifetime of the members of the pension scheme.
The actuary could send a questionnaire to each member asking them about their current health, whether they smoke, how much they drink etc. She could then use this data in her model to establish exactly how long each member can be expected to survive. However, this would be a time-consuming and costly exercise. Given the size of the pension scheme, it is unlikely to add significantly to the accuracy of her estimate. It would be more appropriate for her to use standard life tables, produced by life companies or from census data, and to adjust these if necessary, to assess the expected lifetime of members. For example, if the company is a large accountancy firm, employees are likely to have lower mortality than the general population (for many reasons; for example, because the general population includes people having much riskier jobs) and so she may use the standard life table rated down by two or three years.
1.2 Key steps in the modelling process
The modelling process is varied and does not rigidly follow a prescribed series
of steps. Movement between each stage of modelling is fluid, and actuaries
will often revisit earlier work in order to improve and fine-tune their model.
However, there are 12 key stages which should always be considered when
constructing a model. These are outlined below.
2. Plan and validate: Plan the model around the chosen objectives,
ensuring that the models output can be validated i.e. checked to ensure
it accurately reflects the anticipated output from the relevant system
or process.
3. Data: Collect and analyse the necessary data for the model. This
will include assigning appropriate values to the input parameters and
justifying any assumptions made as part of the modelling process.
9. Test model output: Test the reasonableness of the output from the model. The experts on the relevant system or process should be involved at this stage.
10. Review and amend: Review and carefully consider the appropriateness of the model in the light of small changes in input parameters.
12. Document and communicate: Communicate and document the results and the model.
2. Plan, validate and consult experts: Bill's plan is to use all of the rental income to pay off the mortgage. He intends to do any repair work or maintenance himself and to rent the property out directly rather than through an agent. He has obtained a mortgage offer from the bank and will use this to compare with his estimate of how quickly he will pay off the mortgage.
3. Data: Bill will need to collect data on the cost of mortgages, the level
of rent he can expect to obtain from the property, and the anticipated
cost of any repair work or ongoing maintenance of the property. He will
also need to have an idea of the likely length of voids on the property.
4. Capture real-world system: You discuss with Bill some of the complexities involved in getting a mortgage and estimating the future rental income: issues such as fluctuating interest rates, estimating inflation, tax etc. These would all need to be taken into account in order to model the true time period it could take to pay off his mortgage. However, Bill says that at this stage he just wants an approximate idea of the period. He accepts that this is subject to some potential risk, upside or downside.
and has built up a large portfolio. He will discuss this project with
him.
6. Choose computer program: Bill decides to ignore the potential volatility of the input parameters and build a simple model in Excel.
This treats interest rates, void periods (i.e. periods where the property
is empty and hence no rent is received), and increases in rental income
as fixed values.
7. Write model: Bill uses Excel to project four figures for each year after
he buys the house: the mortgage level at the start of the year, the
interest required for that year, the rental income received over the year
and the mortgage level at the end of the year. He multiplies the rental
income by a certain percentage (below 100) to allow for void periods.
8. Debug: You explain to Bill that he should check to ensure that he
has not made any silly errors in the model. He should also check the
figures in each row appear reasonable and that the mortgage level is
decreasing each year: if not, he will have either made a mistake or he
will be unable to pay off his mortgage based on rental income alone.
9. Test model output: Bill should look at the overall results to check that they appear reasonable, based on the discussions he has had with his acquaintance and the bank.
10. Review and amend: You are concerned that Bill has such a relaxed
attitude to the potential volatility of the input parameters. You suggest
he reruns his model looking at different scenarios, e.g. consider a best
case scenario with low interest rates, high rental growth and low voids
and a worst case scenario with high interest rates, low rental growth
and high voids. He should try to understand any correlation that exists between the financial and property markets, to put these scenarios into context. His magazines may help here.
11. Analyse output: He should ensure that he is comfortable with the
results, taking into account the different scenarios covered. He may wish to consider other options, for example, selling the property early under the worst-case scenario.
12. Document and communicate: This is the point at which he will need to make a decision about whether to proceed. He should summarise the information at this stage to ensure he has fully understood this investment.
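Step 7 above can be sketched in code rather than a spreadsheet. Every number below (mortgage size, interest rate, rent, occupancy) is a made-up illustration of Bill's four yearly figures, not data taken from the example.

```python
def mortgage_projection(mortgage, interest_rate, annual_rent, occupancy_pct,
                        rent_growth=0.0, max_years=50):
    """Project Bill's four yearly figures: opening mortgage, interest for the
    year, rental income received (reduced for void periods), and the closing
    mortgage. Stops once the mortgage is repaid. All inputs are illustrative."""
    rows, year = [], 0
    while mortgage > 0 and year < max_years:
        interest = mortgage * interest_rate
        rent = annual_rent * occupancy_pct   # percentage below 100 allows for voids
        rows.append((year + 1, mortgage, interest, rent))
        mortgage = mortgage + interest - rent
        annual_rent *= 1 + rent_growth
        year += 1
    return rows

rows = mortgage_projection(150_000, 0.05, 12_000, 0.9)
print(f"mortgage repaid after {len(rows)} years")
```

A useful sanity check, as in step 8, is that the opening mortgage in each row is smaller than in the previous one; if it is not, either there is a bug or the rent alone cannot repay the loan.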
1.3 Benefits and limitations of modelling
Benefits of modelling
Limitations of modelling
Whilst models are extremely useful when dealing with actuarial problems, they have their limitations. Modelling a process is not always the most effective or efficient way to approach a problem. Some of the general drawbacks and limitations are explained below.
Time and cost: Modelling complex systems can require the investment of a significant amount of time and expertise. This, in turn, leads to a significant cost to the client.
When we run a stochastic model we obtain one possible outcome from the
model, i.e. the output is random. Each specific output is only one estimate
of the real world system. Therefore, several independent runs are required
in order to obtain an idea of the distribution of the output. The more runs,
the more accurate a picture we can obtain. For complex models, thousands of runs may be required. (We will develop this idea later in this course, when we meet Monte Carlo simulations.)
In contrast, a deterministic model uses fixed parameters and has a single
set of outputs. You could view a deterministic model as a single run of a
stochastic model with fixed input parameters. That is, you only need to run a deterministic model once to determine the results.
Example 1.6. Consider the expected income from investing in a unit trust over a three-year period. Imagine the return in any year is independent of the other years and is distributed such that there is an equal chance of a 20% fall, a 10% rise or a 50% rise.
If we use a deterministic approach, we may decide to determine the result simply by calculating the mean return per annum and assuming this is the return each year, i.e.

(0.8 + 1.1 + 1.5)/3 = 1.1333, i.e. 13.3% p.a., and

1.1333^3 = 1.4557, i.e. 45.6% return over 3 years.
This result provides no indication of the possible range of returns from this
investment.
Alternatively, we could use a stochastic approach. We could use a string of three random numbers, between 1 and 3, to determine the outcome for each year of investment, attributing 1 to a 20% loss, 2 to a 10% gain and 3 to a 50% gain. For example, imagine our string of three random numbers is 1, 1, 3; our return would be

0.8 × 0.8 × 1.5 = 0.96, i.e. a 4% loss over the 3 years.
We could then rerun this model, with more strings of three random numbers,
to obtain predicted values for the possible investment return. The random
number strings and corresponding output may look something like:
(1,1,3),(1,2,3),(1,2,2),(2,2,2),(3,2,3),(3,1,3),...
which translates to a 3 year return of:
−4%, 32%, −3.2%, 33.1%, 147.5%, 80%, . . .
This approach does provide an indication of the possible range of returns from this investment. This may be vital if the investor requires a lump sum of a certain amount in three years' time.
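The stochastic approach of Example 1.6 can be sketched as a simulation. This is only an illustration of the repeated-runs idea: the number of runs and the seed are arbitrary choices.

```python
import random

def three_year_return(rng):
    """One run of the model: each year is an independent, equally likely
    fall of 20%, rise of 10% or rise of 50%."""
    factor = 1.0
    for _ in range(3):
        factor *= rng.choice([0.8, 1.1, 1.5])
    return factor

rng = random.Random(42)   # fixed seed so the run is reproducible
runs = [three_year_return(rng) for _ in range(10_000)]

mean = sum(runs) / len(runs)
print(f"mean 3-year growth factor: {mean:.4f}")   # close to 1.1333**3 = 1.4557
print(f"worst case observed: {min(runs):.3f}")    # 0.8**3 = 0.512 is the worst possible
print(f"best case observed:  {max(runs):.3f}")    # 1.5**3 = 3.375 is the best possible
```

Unlike the deterministic calculation, the list of runs gives the full spread of outcomes, from a 48.8% loss to a 237.5% gain.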
In actual fact, for this simple investment we could calculate the distribution exactly, and do not need to examine the output from a number of runs to obtain an indication of the distribution. However, this is often not the case for more complex models.
Too simple a model may not be able to provide results sufficient to achieve the objectives. On the other hand, the model should not be too complicated, providing more results than required.
The validity of the model: At all stages we should bear in mind the
validity of the model for the purpose to which it is to be applied.
The validity of the data to be used: The input data for the model should be credible and appropriate. For example, if an input parameter is highly unpredictable, it should probably be included in the model as a random parameter rather than a deterministic one.
The ease of the communication: The model, and its results, must
be easily communicated to its intended audience. Any correlations
between results should be understood and communicated to the client.
1. To check the model's results are consistent with the type of results we would expect from the system we are trying to imitate. For real-world systems the comparison can be carried out using a Turing test.
This involves experts in the real world system comparing several sets of
data from the model and data from the real world system. The experts
should be unable to differentiate between the two sets of data. If they
are able to differentiate, their method of differentiation can then be
used to adapt and improve the model.
from the model's output can be investigated to yield some knowledge about the system concerned.
Consider an employee aged 45 now. This employee will receive his lump sum in 20 years' time. The present value of this benefit will include a discounting factor to allow for investment return over this period, i.e. 1/(1.07)^20.
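The discounting factor is easy to compute directly. The lump-sum amount below is a made-up figure purely for illustration; only the 7% rate and 20-year term come from the text.

```python
# Present value of a lump sum payable in 20 years, discounted at 7% p.a.
payment_in_20_years = 100_000          # illustrative lump sum
discount_factor = 1 / 1.07 ** 20       # the factor 1/(1.07)^20 from the text
present_value = payment_in_20_years * discount_factor

print(f"discount factor: {discount_factor:.4f}")   # about 0.2584
print(f"present value:   {present_value:,.2f}")
```

So a benefit of 100,000 payable in 20 years is worth only about a quarter of that amount today at this discount rate.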
best-estimate and pessimistic. It is important the client fully understands the difference between each of these results.
The client should not simply choose the optimistic basis on the grounds that
it allows them to pay a lower contribution to the pension scheme now. Such
an opinion would result in a misunderstanding of the true cost of the pension
scheme and may unnecessarily threaten the funding position of the scheme,
and hence reduce the security of pension benefits for the scheme members.
On the other hand, the client should also be careful not to simply pay contributions based on the pessimistic basis. This may lead to the scheme becoming overfunded on a statutory basis (i.e. a basis prescribed by the government), which can have tax implications, or to the client deciding to reduce future pension benefits unnecessarily, because they appear to cost too much.
Therefore, it is vital that the actuary communicates such results as carefully as possible. She should ensure the implications of changing each key assumption are fully understood and, if necessary, that results are provided on further bases.
Indeed, in recent years, there has been a move towards carrying out such
valuations on a stochastic basis. This enables the client to fully understand
the distribution of the projected assets and liabilities.
1.10 Summary
In this chapter, we have:
discussed some of the decisions that need to be made during this process, e.g. should the model be deterministic or stochastic?
Questions
1. What input parameters might you have for a model assessing the cost
of providing life assurance?
3. At what stage in the modelling process should you debug your program?
Chapter 2
The Monte Carlo method
A complicated stochastic model can rarely be completely solved, in the sense that the exact distribution of the results is derived analytically. If this is not possible, the most natural approach one could imagine is simply to run the model several times with different input parameters to get an idea of the possible range of the model results. For example, assume that the model is intended to estimate a single quantity of interest F, such as the possible profit of an investment, or the number of claims to an insurance company. Running the model m times, we obtain values f_1, f_2, . . . , f_m, which provide us with an indication of what F can be. Based on this, different quantities of interest can be estimated; for example, the expected value of F can be approximated as (1/m) Σ_{i=1}^{m} f_i. This simple idea of several runs is called Monte Carlo simulation of a model.
Thus, the Monte Carlo method provides a simple probabilistic algorithm for simulating systems in which an underlying randomness exists. The main advantage of the method is that the basic concepts are easily understood and can be programmed relatively quickly, even for the most exotic models. With the advent of cheap high-powered computers, the Monte Carlo method has become extremely important in all financial institutions.
Even though an actuary's interest in the method is a practical one, it is important to have an understanding of the theoretical background of the method. For example, one can ask the following questions.
Accuracy of the method: How many runs of the Monte Carlo method are required to get reasonably accurate results for a particular model?
Input data generation: Assuming that the input data for a model follow complicated probability distributions, how can such input data be generated appropriately?
These and other questions will be considered in this chapter together with
some practical examples. We begin with a discussion of the basic concepts,
motivated by a simple example: the evaluation of a deterministic integral.
We then proceed to discuss the generation of random numbers, which is a fundamental requirement of the Monte Carlo method, before considering the formal development of the method as applied to solving stochastic financial and actuarial models. We will see that the Monte Carlo method is the preferred numerical method for the high-dimensional problems that typically arise in financial and actuarial modelling.
2.1 A motivating example
By way of introduction to the Monte Carlo method we look at a straightforward example. Consider the evaluation of a deterministic integral over the unit interval [0, 1]:

I = ∫_0^1 g(x) dx.   (1)
The Monte Carlo method requires a probabilistic representation of the problem, even though a deterministic example is being considered. This is a fundamental requirement of the method. We can do this by noting that the probability density function of the Uniform distribution over this interval, U(0, 1), is f(x) = 1. I can therefore be represented as

I = ∫_0^1 g(x) f(x) dx = E g(ξ),   (2)

where ξ is a random variable with uniform distribution over the interval [0, 1], i.e. from U(0, 1), and E represents an expectation under this distribution.
Using this representation we can now propose a probabilistic algorithm for evaluating the integral; this will be the Monte Carlo method.
Let us first assume that we have an algorithm for generating random numbers that draws points ξ_1, ξ_2, . . . , ξ_M independently from U(0, 1). We can then produce an approximation to equation (2) in terms of an arithmetic average of evaluations of g(x) at a number of random points:

I = E g(ξ) ≈ I_M = (1/M) Σ_{m=1}^{M} g(ξ_m).   (3)
and 0.711. Use these data to approximate the integral of x² over the unit domain.
Answer: The integral can be approximated by computing

I_5 = (1/5) Σ_{i=1}^{5} u_i²,

where {u_i} are the random numbers given. Using these values, we obtain

I_5 = (1/5)(0.659² + 0.931² + 0.710² + 0.688² + 0.711²) = 0.556.
Note that the actual value is 1/3, and so we see that the method has overestimated the integral using this very small number of random numbers. In reality much higher values of M are required; see Example 2.2 for the orders of magnitude.
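The estimator I_M of equation (3) is a few lines of code. This is a minimal sketch using Python's standard random module; the seed is an arbitrary choice made so that the run is reproducible.

```python
import random

def mc_integral(g, M, seed=0):
    """Monte Carlo estimate of the integral of g over [0, 1]:
    the average of g at M independent U(0, 1) draws, as in equation (3)."""
    rng = random.Random(seed)
    return sum(g(rng.random()) for _ in range(M)) / M

# Estimate the integral of x^2 over [0, 1]; the exact value is 1/3.
for M in (10, 1_000, 100_000):
    print(f"M = {M:>6}: I_M = {mc_integral(lambda x: x * x, M):.4f}")
```

As M grows the estimates cluster ever more tightly around 1/3, illustrating the convergence discussed below.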
no systematic error in this application. Further, the statistical error, defined as Var[R_M] = σ_g²/M, results from the finite number of random numbers generated, and tends to zero as M increases. These two different measures of the error are important in approximation techniques.
Since I_M is a random estimate of the integral, we can form confidence intervals for its value from the mean and variance as follows:

I ∈ ( I_M − c σ_g/√M , I_M + c σ_g/√M ),

with probability 0.997 for c = 3 and 0.95 for c = 1.96, for example. The confidence interval demonstrates how the statistical error of the Monte Carlo method is of practical importance, and emphasises that it is of O(M^{−1/2}). This means that to add one decimal place of precision the method requires 100 times as many random-number computations.
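The O(M^{−1/2}) behaviour can be checked empirically by repeating the whole estimate many times for each M and measuring the spread. The repetition count and seed below are arbitrary choices for the experiment.

```python
import math
import random

def mc_estimate(g, M, rng):
    """One Monte Carlo estimate of the integral of g over [0, 1]."""
    return sum(g(rng.random()) for _ in range(M)) / M

# Multiplying M by 100 should divide the statistical error by about 10.
rng = random.Random(0)
sds = []
for M in (100, 10_000):
    estimates = [mc_estimate(lambda x: x * x, M, rng) for _ in range(200)]
    mean = sum(estimates) / len(estimates)
    sds.append(math.sqrt(sum((e - mean) ** 2 for e in estimates) / len(estimates)))
    print(f"M = {M:>6}: standard deviation of I_M = {sds[-1]:.5f}")
```

The second standard deviation comes out roughly ten times smaller than the first, i.e. one extra decimal place of precision for 100 times the work.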
The trapezoidal rule, which you may be familiar with from your undergraduate studies, is an alternative, deterministic method for approximating I. In particular,

I ≈ (g(0) + g(1))/(2M) + (1/M) Σ_{m=1}^{M−1} g(m/M).   (6)

In contrast, the error in this approximation is of O(M^{−2}), which demonstrates that the Monte Carlo method is not competitive for the 1-dimensional integral.
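Equation (6) translates directly into code; a quick check on g(x) = x², whose exact integral is 1/3:

```python
def trapezoid(g, M):
    """Trapezoidal rule for the integral of g over [0, 1] with M sub-intervals,
    as in equation (6)."""
    interior = sum(g(m / M) for m in range(1, M))
    return (g(0) + g(1)) / (2 * M) + interior / M

# With only M = 100 evaluations the error is about 1.7e-5 -- far more accurate
# than a Monte Carlo estimate from the same number of evaluations.
print(f"{trapezoid(lambda x: x * x, 100):.6f}")   # prints 0.333350
```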
However, the great advantage of the Monte Carlo method is that it can, in principle, deal with the curse of dimension: where some numerical integration techniques incur an exponential increase in computational cost with the dimension of the problem, the Monte Carlo method does not. This is because the rate of convergence of O(M^{−1/2}) is not restricted to integrals over the unit interval. What was done in this simple example can easily be extended to estimating an integral over [0, 1]^d, or any other domain in R^d, for all dimensions d. Of course, when we change the dimension, we change the function g and so we change σ_g², but the statistical error still has the form σ_g/√M for an estimate computed from M draws from [0, 1]^d. In particular, the O(M^{−1/2}) convergence rate holds for all d.
In contrast, the error produced by the trapezoidal rule in d dimensions is of O(M^{−2/d}). This degradation in convergence rate with increasing dimension is characteristic of all deterministic integration methods. Thus, Monte Carlo methods are attractive for evaluating integrals in the high dimensions that typically arise in financial and actuarial applications.
2.3 Application to stochastic modelling
The Monte Carlo method was introduced by the very simple application of
approximating deterministic integrals over a unit domain, see 2.1. As was
discussed, the method is not as efficient as standard methods of numerical
integration (for example, the trapezoidal method) in the 1dimensional case,
but has the great advantage that it does not suffer from the curse of dimension
for general ddimensional systems. Monte Carlo methods are therefore to be
preferred in multidimensional problems.
Assume that we have built a complicated stochastic model which is intended to estimate a single quantity of interest F. Let X_1, X_2, . . . , X_n be (random) input parameters, which for simplicity will be assumed to be independent. The model can be viewed as a black box returning a result in response to input. From this point of view, the model can be described as a single equation

F = f(X_1, X_2, . . . , X_n).

The underlying function f, however, is usually so complicated that it cannot be analysed analytically. We can, however, evaluate this function for any particular input parameters x_1, x_2, . . . , x_n.
The expected value of F (as well as the variance and other characteristics we may be interested in) is an n-dimensional integral, and can be evaluated using the Monte Carlo method as described in the previous sections. Namely, assume that we are able to generate M tuples of input parameters x_1^i, x_2^i, . . . , x_n^i, i = 1, 2, . . . , M, which are independent random variates from the distributions of X_1, X_2, . . . , X_n, correspondingly. Then we can evaluate

E F ≈ (1/M) Σ_{i=1}^{M} f(x_1^i, x_2^i, . . . , x_n^i).

From Section 2.2 we know that the error of this estimate decreases as O(M^{−1/2}), no matter how complicated the underlying function f is, and no matter how many input parameters it has.
As we see, in order to apply the Monte Carlo method to stochastic modelling,
we need to be able to generate random numbers from distributions much more
complicated than U (0, 1). We therefore consider the generation of random
numbers from a general distribution in the next section.
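As a sketch of this recipe, the "black box" f below is an invented toy model (premium income minus total claims) with made-up input distributions; the point is only the shape of the Monte Carlo loop, not the model itself.

```python
import random

def f(x1, x2):
    """Hypothetical black-box model: one year's profit, where x1 is the
    (random) number of claims and x2 the (random) average claim size."""
    premium_income = 120.0
    return premium_income - x1 * x2

# Generate M independent tuples of input parameters from their (assumed)
# distributions and average the model output, as in the estimate above.
rng = random.Random(1)
M = 50_000
samples = [f(rng.gauss(10, 2), rng.expovariate(1 / 10)) for _ in range(M)]

expected_profit = sum(samples) / M
print(f"Monte Carlo estimate of E[F]: {expected_profit:.2f}")
```

With independent inputs of mean 10 each, E[F] = 120 − 10 × 10 = 20, and the estimate lands close to that value without ever analysing f analytically.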
2.4 Random number generation
As we have seen in 2.1, the Monte Carlo method is a statistical sampling
technique where one evaluates a nonrandom quantity as an expectation of
a random variable (see equation (3), for example). In order to apply the
technique it is therefore necessary to use a large number of random numbers
from a specified distribution. Generating these is a fundamental task in
implementing any Monte Carlo approximation.
Truly random numbers are generated from physical processes, such as thermal noise and quantum phenomena, that can be exploited in physical devices. Although the use of such devices is clearly impractical in nearly all conceivable applications, in 1955 the RAND Corporation published a table of one million random digits obtained with such a device. Subsets of these can be incorporated into software, and such lists of random numbers are convenient to use in many applications. However, when implementing the Monte Carlo method millions of random numbers are often required, and problems of periodicity in a finite-length table can therefore arise. For this reason one needs a better source of random numbers, and it is typical to use pseudo-random numbers that are generated by random number generating algorithms (RNGs). These are not truly random, in the sense that they are generated from a small set of initial values (the seed) and a recursive formula which forms the algorithm, but they approximate the statistical properties of truly random numbers. In what follows we shall refer to realisations of a random variable generated by a computer as random variates, to distinguish them from truly random numbers.
RNGs have the great advantage of being easily incorporated into the coding of Monte Carlo simulations, producing an unlimited supply of random variates quickly and without resorting to physical means. They have the further advantage of being reproducible: if the same seed is given at the beginning of two runs of the RNG, identical sequences of random variates will be produced. Exact reproduction of individual simulations can be very useful when debugging the code, and the advantage of having to store only a single seed rather than a very large sequence of random numbers is clear.
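The reproducibility property is easy to demonstrate with any seeded RNG; here, Python's built-in generator:

```python
import random

# Two RNGs started from the same seed produce identical sequences of
# variates -- handy for reproducing a simulation exactly when debugging.
seed = 12345
rng_a = random.Random(seed)
rng_b = random.Random(seed)

run_a = [rng_a.random() for _ in range(5)]
run_b = [rng_b.random() for _ in range(5)]

print(run_a == run_b)   # prints True: same seed, same sequence
```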
formula. If further information is required, you are invited to research linear congruential generators or Fibonacci generators as particular examples of such methods.
In what follows it is assumed that a source of random variates from the U(0, 1) distribution has been established, and the question of how to use these to generate random variates from other distributions is considered.

    M      I_M
    10^2   0.327
    10^3   0.330

Note that each computation using this method will produce a different random value of I_M.
P(X ≤ x) = P(F^{−1}(U) ≤ x) = P(U ≤ F(x)) = F(x).
Therefore, if we require a random variate x from a given distribution, we can use the following short algorithm:

1. Generate a random variate u from U(0, 1).

2. Return x = F^{−1}(u).
Example. Let X have the density function

f(x) = α γ λ^α x^{γ−1} / (λ + x^γ)^{α+1}, for x > 0.

Generate a random variate from this distribution using the inverse transform method.
Answer: The distribution function of X is given by
$$F(x) = P(X \le x) = \int_0^x f(s)\,ds = 1 - \left(\frac{\lambda}{\lambda + x^{\gamma}}\right)^{\alpha}.$$
To find F⁻¹(u) we solve the equation u = F(x) for the variable x, leading to
$$F^{-1}(u) = x = \Bigl(\lambda\bigl[(1-u)^{-1/\alpha} - 1\bigr]\Bigr)^{1/\gamma}.$$
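The two-step algorithm translates directly into code. The sketch below assumes the Burr-type form F(x) = 1 − (λ/(λ + x^γ))^α used in this example; the function and parameter names are illustrative:

```python
import random

def burr_variate(alpha, gamma, lam, u=None):
    """Inverse transform sampling from the distribution with
    F(x) = 1 - (lam / (lam + x**gamma))**alpha, x > 0."""
    if u is None:
        u = random.random()                      # step 1: u ~ U(0,1)
    # step 2: invert u = F(x) for x
    return (lam * ((1 - u) ** (-1 / alpha) - 1)) ** (1 / gamma)
```

Feeding a returned variate back through F recovers the uniform value u, which is a convenient correctness check.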
The main disadvantage of the inverse transform method is the need for either an explicit expression for the inverse of the distribution function, F⁻¹(y), or a numerical method to solve y = F(x) for an unknown x. This means that it cannot be used for distributions where no explicit expression for the inverse exists and a numerical solution (using Newton's method, for example) is too expensive. For example, to generate a random variate from the standard normal distribution using the inverse transform method requires the inverse of the distribution function
$$F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt.$$
Since no explicit solution to the equation u = F (x) can be found in this case,
numerical methods must be used.
Using the inverse transform method, it is also possible to generate random variates from discrete distributions. Let X be a discrete random variable which can take values x₁, x₂, …, x_N where x₁ < x₂ < ⋯ < x_N. The probability mass function of X is given by
$$P(X = x_i) = p_i, \qquad i = 1, 2, \ldots, N,$$
where $p_i > 0$ and $\sum_{i=1}^{N} p_i = 1$. The distribution function of X is therefore
$$F(x) = P[X \le x] = \sum_{i \,:\, x_i \le x} p_i.$$
The algorithm is as before: generate u from U(0, 1) and return the value x = x_n for which $F(x_{n-1}) < u \le F(x_n)$ (taking $F(x_0) = 0$). This algorithm can only return variates x from the range {x₁, x₂, …, x_N}, and the probability that a particular value x = x_n is returned is
$$P[\text{value returned is } x_n] = P\bigl[F(x_{n-1}) < U \le F(x_n)\bigr] = F(x_n) - F(x_{n-1}) = p_n.$$
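A direct sketch of this discrete inverse transform, using a cumulative-probability search (names are illustrative):

```python
import bisect
import random

def discrete_variate(xs, ps, u=None):
    """Return the value x_n for which F(x_{n-1}) < u <= F(x_n),
    where F is built from the probabilities ps of the values xs."""
    if u is None:
        u = random.random()
    cum, total = [], 0.0
    for p in ps:
        total += p
        cum.append(total)                  # running values of F
    return xs[bisect.bisect_left(cum, u)]  # first index with F >= u
```

For example, with values (1, 2, 3) and probabilities (0.2, 0.5, 0.3), a uniform draw of u = 0.25 falls in (0.2, 0.7] and so returns 2.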
Acceptance-Rejection method

[Figure 1: a density function f(x), a randomly chosen point (X, Y) beneath its graph, and a threshold x₀ marked on the horizontal axis.]
This method is best motivated from a visual point of view: consider the density function f(x) plotted in Figure 1. If a point is selected at random from the area enclosed by the graph and the horizontal axis, then the point can be considered a random vector (X, Y). In particular, the x-coordinate of the point, X, is a random variable with density function f. This is intuitively obvious, but we can justify it slightly more formally as follows.
The quantity P(X ≤ x₀) is the probability that the x-coordinate of the randomly chosen point is less than x₀, i.e. it is the probability that the point lies within the area to the left of x₀. By saying that the point has been selected randomly, we mean that it was selected uniformly by area under the plot of the density function. The probability is therefore the area under the density function to the left of x₀, that is, $\int_{-\infty}^{x_0} f(x)\,dx$. Differentiating this with respect to x₀, we see that the density function of X is f.
This reasoning forms the basis of the acceptance-rejection method for random variate generation. If it is too difficult to generate points at random from under the graph of f, an alternative is to generate points from a larger region which includes the region under the graph of f, and then discard
any points which are not acceptable. To this end, a simpler density function h(x) is constructed that is straightforward to draw random points from under, and is such that f(x)/h(x) is bounded for all x. Once this function is found, we define
$$C = \sup_x \frac{f(x)}{h(x)} \quad\text{and}\quad g(x) = \frac{f(x)}{C\,h(x)}, \qquad (7)$$
so that 0 ≤ g(x) ≤ 1. This construction means that once a point (x, y) is drawn at random from under the graph of Ch(x), the value g(x) gives the probability that the point also falls under the graph of f.
The following algorithm therefore arises for the acceptance-rejection method:

1. Generate a random variate x from the density h;
2. Generate an independent random variate u from U(0, 1);
3. If u ≤ g(x), return x; otherwise return to step 1.

Question: Generate random variates from the logistic distribution, with density
$$f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}, \qquad -\infty < x < \infty.$$
[Figure 2: the logistic density f(x) together with the bounding function Ch(x) = e^{−|x|}.]
Answer: Inspection of the behaviour of f at 0 and at ±∞ suggests that the double exponential density may be appropriate (see Figure 2), with
$$h(x) = \frac{1}{2} e^{-|x|}.$$
In order to find C we first consider the ratio f(x)/h(x),
$$\frac{f(x)}{h(x)} = \frac{2}{(1 + e^{-|x|})^2} \le 2.$$
According to equation (7), C is the supremum of this ratio, which is 2, and this leaves
$$g(x) = \frac{f(x)}{C\,h(x)} = \frac{1}{(1 + e^{-|x|})^2}.$$
The algorithm for generation of random variates x from the logistic distribution is therefore:

1. Generate a random variate x from the double exponential density h (for example, by the inverse transform method);
2. Generate an independent random variate u from U(0, 1);
3. If u ≤ g(x) = (1 + e^{−|x|})^{−2}, return x; otherwise return to step 1.
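In code, the whole scheme might look as follows; this is a sketch in which the double exponential variate is itself produced by the inverse transform method, and the function name is illustrative:

```python
import math
import random

def logistic_variate():
    """Acceptance-rejection sampling from the logistic density
    f(x) = exp(-x) / (1 + exp(-x))**2, using the double exponential
    h(x) = 0.5 * exp(-abs(x)) as the bounding density (C = 2)."""
    while True:
        u1, u2 = random.random(), random.random()
        # Inverse transform for the double exponential (Laplace) density.
        x = math.log(2 * u1) if u1 < 0.5 else -math.log(2 * (1 - u1))
        g = (1 + math.exp(-abs(x))) ** -2    # g(x) = f(x) / (C h(x))
        if u2 <= g:                          # accept with probability g(x)
            return x
```

Since C = 2, on average two candidate points are needed per accepted variate.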
particular properties, but an understanding of the mathematics behind each
method is beyond the scope of the syllabus. The algorithms for the methods
are therefore stated without justification.
Note that the scaling property of the normal distribution means that any standard normally distributed variate, Z ∼ N(0, 1), generated from these methods is easily transformed to a normally distributed variate, X ∼ N(μ, σ²), via X = μ + σZ.
Box–Muller algorithm
The following algorithm can be used for generating a pair of independent
random variates from the standard normal distribution, z1 , z2 :
1. Generate two random variates from U (0, 1), u1 and u2 ;
2. Return $z_1 = \sqrt{-2\ln u_1}\,\cos(2\pi u_2)$ and $z_2 = \sqrt{-2\ln u_1}\,\sin(2\pi u_2)$.
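As a sketch in code (the function name is illustrative):

```python
import math

def box_muller(u1, u2):
    """Box-Muller transform: two independent U(0,1) variates in,
    a pair of independent N(0,1) variates out."""
    r = math.sqrt(-2 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)
```

For instance, u₁ = 0.587 and u₂ = 0.155 give z₁ ≈ 0.580 and z₂ ≈ 0.854, agreeing with the worked example that follows.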
The Box–Muller method is easy to incorporate into computer code; however, it suffers from the disadvantage that computing the sin and cos functions is time-consuming. For this reason an alternative formulation of this method is generally preferred when very large numbers of random variates are required, as, for example, in the Monte Carlo method. This alternative
is called the Polar algorithm.
Polar algorithm
The Polar algorithm is very similar to the Box–Muller method in its justification (not discussed here) but is modified through use of the acceptance-rejection method to avoid computation of the trigonometric functions.
The Polar algorithm is as follows:

1. Generate two random variates from U(0, 1), u₁ and u₂;
2. Set v₁ = 2u₁ − 1, v₂ = 2u₂ − 1 and s = v₁² + v₂²;
3. If s > 1, go to step 1. Otherwise, return $z_1 = v_1\sqrt{-2\ln s / s}$ and $z_2 = v_2\sqrt{-2\ln s / s}$.
As with the Box–Muller method, the Polar method generates a pair of independent random variates from the standard normal distribution.
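A sketch of the Polar algorithm in code (the rng argument is illustrative, allowing a deterministic source of uniforms to be plugged in):

```python
import math
import random

def polar_pair(rng=random.random):
    """Marsaglia's polar method: a pair of independent N(0,1)
    variates, with no trigonometric function calls."""
    while True:
        v1, v2 = 2 * rng() - 1, 2 * rng() - 1   # uniform on the square [-1,1]^2
        s = v1 * v1 + v2 * v2
        if 0 < s <= 1:                          # accept points inside the unit disc
            factor = math.sqrt(-2 * math.log(s) / s)
            return v1 * factor, v2 * factor
```

With u₁ = 0.587 and u₂ = 0.155 this returns z₁ ≈ 0.285 and z₂ ≈ −1.131, consistent with the first accepted pair in the worked example.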
Question: Ten random variates from the U(0, 1) distribution are given: 0.587, 0.155, 0.030, 0.447, 0.048, 0.224, 0.593, 0.478, 0.165 and 0.113. Use the Box–Muller and Polar algorithms to generate variates from the N(1, 4) distribution, based on these data.
Answer: Standard normal variates are computed in pairs using the above
algorithms and each is transformed via X = 1 + 2Z to obtain the N (1, 4)
variates. Results for the BoxMuller and Polar algorithms are shown below:
u1      u2      Z1(BM)    Z2(BM)    X1(BM)    X2(BM)
0.587   0.155    0.580     0.854     2.160     2.708
0.030   0.447   -2.503     0.866    -4.006     2.732
0.048   0.224    0.401     2.432     1.802     5.864
0.593   0.478   -1.013     0.141    -1.026     1.282
0.165   0.113    1.440     1.237     3.880     3.474

u1      u2      Z1(P)     Z2(P)     X1(P)     X2(P)
0.587   0.155    0.285    -1.131     1.570    -1.262
0.030   0.447   -0.468    -0.053     0.064     0.894
0.048   0.224     N/A       N/A       N/A       N/A
0.593   0.478    2.504    -0.592     6.008    -0.184
0.165   0.113     N/A       N/A       N/A       N/A

Here Z_i and X_i denote the N(0, 1)- and N(1, 4)-variates computed using these methods, respectively.
Note that the Box–Muller method produced 10 variates but the Polar method produced only 6 from the same set of U(0, 1)-variates. This is a consequence of the acceptance-rejection method incorporated into the Polar method. If 10 variates were required from the Polar method, more pairs of U(0, 1)-variates would have to be generated.
Approximate method
When the exact distribution of the variates is not important, a method often used is to generate a sequence of U(0, 1)-variates u₁, u₂, …, u₁₂ and set
$$z = \sum_{i=1}^{12} u_i - 6.$$
The resulting variate has mean 0 and variance 1 and so, by the Central Limit Theorem, is approximately normally distributed. This is called the approximate method. If M variates are required from this method, 12M variates from the U(0, 1) distribution are required.
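A sketch of the approximate method in code (the function name is illustrative):

```python
import random

def approx_normal(rng=random.random):
    """Sum of twelve U(0,1) variates minus 6: mean 0, variance 1,
    approximately N(0,1) by the Central Limit Theorem."""
    return sum(rng() for _ in range(12)) - 6
```

Note that such a variate is confined to [−6, 6], so the extreme tails of the normal distribution are truncated.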
References
The following texts were used in the preparation of this chapter and you are
referred there for further reading if required.
Faculty and Institute of Actuaries, CT8 Core Reading;
2.5 Summary
The Monte Carlo method is a statistical sampling technique where one evaluates a non-random quantity as an expectation of a random variable.

The method can be applied to the approximation of deterministic integrals:
$$I = \int_0^1 g(x) f(x)\,dx = E[g(\xi)] \approx I_M = \frac{1}{M} \sum_{m=1}^{M} g(\xi_m),$$
where ξ₁, …, ξ_M are independent variates with density f.
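A sketch of the estimator in code; the test function x² with f ≡ 1 on [0, 1], whose true integral is 1/3, is illustrative:

```python
import random

def mc_estimate(g, M, rng=random.random):
    """Monte Carlo estimate of I = E[g(U)], U ~ U(0,1): the sample
    mean of g over M pseudo-random points."""
    return sum(g(rng()) for _ in range(M)) / M

random.seed(0)
estimate = mc_estimate(lambda x: x * x, 10 ** 4)   # close to 1/3
```

Increasing M shrinks the random error at the usual rate of order M^{−1/2}.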
Questions
2. Obtain the 95% confidence interval for the Monte Carlo estimate of the integral in question 1, based on M random simulations.
Chapter 3
Probability theory and stochastic processes
The aim of this chapter is to give a review of probability theory and stochastic processes. This background material is necessary for understanding the models that will be developed in later chapters. This chapter is written quite formally in terms of definitions and theorems and is therefore of a different style to nearly all chapters you have seen in previous modules. This has been done so that you are able to look up key definitions and concepts if required when the ideas are applied in later chapters. Although a certain amount of revision is contained here, the focus is on the rigorous development of the stochastic process in the general sense, which includes both discrete and continuous models.
The rigorous development of probability theory and stochastic processes can be very technical. Although we have attempted to avoid unnecessary technicalities in this text, a certain level of technicality is unavoidable. As this chapter is intended as background material, it is most important that you understand the material on the intuitive level. However, it is also important that you have a level of technical knowledge sufficient to solve quantitative problems when necessary.
We begin with the formal development of probability theory before moving
on to stochastic processes.
It is necessary that P has the properties
$$P(\Omega) = 1, \quad P(\emptyset) = 0,$$
$$P(A^c) = 1 - P(A) \quad \forall A \in \mathcal{F},$$
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \quad \forall A, B \in \mathcal{F}.$$
Note that ∅ is an impossible event, A and B are possible events, and the superscript c denotes the complement of an event.
This definition allows one to construct random variables with countably many
possible outcomes. However, in many cases it is necessary to assume that
a random variable can take any real number, for example, if it measures
distance or temperature, say. This requires one to consider a continuum of
events in a continuous probability space.
To include this possibility of uncountably many outcomes arising from a continuous probability space, we need to extend the above definition to the case of uncountable Ω.
Further, on a discrete probability space one cannot construct an infinite
sequence of independent events, such as an infinite number of coin tosses.
Indeed, if Ω represents all possible outcomes of an infinite number of coin tosses, then
$$\Omega = \bigl\{\omega : \omega = (a_1, a_2, \ldots),\ a_i \in \{0, 1\}\bigr\}$$
(where a_i is the result of the i-th toss: 1 if the coin lands head up, 0 otherwise). This set is already uncountable, i.e. its elements cannot be enumerated by integers as $\Omega = \{\omega_k\}_{k=1}^{\infty}$.
Example 3.1. Consider that we toss a fair coin (i.e. the probability of a head is equal to the probability of a tail, so both probabilities equal 1/2). Let
$$\Omega_N := \{\omega : a_1 = \cdots = a_N = 1\}$$
be the event that the first N coins all land head up. It is natural to assume that $P(\Omega_N) = 2^{-N}$, since the event Ω_N specifies one of the 2^N equally likely outcomes of the first N tosses. Therefore, for
$$\omega^* := (a_1, a_2, \ldots) \ \text{ with } \ a_i = 1 \ \forall i = 1, 2, \ldots
$$
we must have $P(\omega^*) \le P(\Omega_N) = 2^{-N}$ for all N, since $\omega^* \in \Omega_N$. It therefore must be that $P(\omega^*) = 0$.
By the same logic we get P(ω) = 0 for every ω ∈ Ω. However, P(Ω) = 1 and $\Omega = \bigcup_{\omega \in \Omega} \{\omega\}$, so the natural desire to write
$$P(\Omega) = \sum_{\omega \in \Omega} P(\omega) = \sum_{\omega \in \Omega} 0 = 0$$
leads to a contradiction: additivity of P cannot be extended to uncountable unions.
Let Ω be a nonempty set and F be a collection of subsets of Ω. We say that F is a σ-algebra if it satisfies the following properties:

1. $\emptyset, \Omega \in \mathcal{F}$;
2. If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$ and $\bigcap_{n=1}^{\infty} A_n \in \mathcal{F}$;
3. If $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$.
The key feature of the above definition is that a σ-algebra is closed under countable unions and intersections, but it need not be closed under uncountable unions.
We can now give a formal definition for the notion of a probability space.
Definition 3.2. Let Ω be a nonempty set and F be some σ-algebra of subsets of Ω. Suppose that a function P : F → [0, 1] satisfies

1. $P(\emptyset) = 0$, $P(\Omega) = 1$;
2. $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$, provided $A_1, A_2, \ldots \in \mathcal{F}$ is a collection of disjoint sets, i.e. $A_i \cap A_j = \emptyset$ for all $i \ne j$.
$$P\bigl((a, b]\bigr) = b - a;$$
to the whole set B. There is a theorem which ensures that any countable
structure of random events or random variables can be constructed on this
standard probability space.
Example 3.2. The results of tossing the coin in Example 3.1 an infinite
(but countable) number of times can be constructed as
$$\xi_i(\omega) := \begin{cases} 1, & \text{if the } i\text{-th digit in the binary expansion of } \omega \text{ equals } 1,\\ 0, & \text{otherwise,} \end{cases}$$
for all ω ∈ [0, 1]. We can say that the i-th coin lands head up if ξ_i = 1, and tail up if ξ_i = 0. It is easy to see that $P(\xi_i = 0) = P(\xi_i = 1) = 1/2$
for all i = 1, 2, …. Further, $\{\xi_i\}_{i=1}^{\infty}$ is a system of mutually independent random variables (see the definition of independence in 3.1.4 below), which makes the model equivalent to the infinite sequence of coin tosses (so-called Bernoulli trials).
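The construction of Example 3.2 is easy to experiment with in code; the sketch below reads off the i-th binary digit of ω (the function name is illustrative):

```python
def xi(i, omega):
    """The i-th digit in the binary expansion of omega in [0, 1):
    equals 1 exactly when the integer part of 2**i * omega is odd."""
    return int(2 ** i * omega) % 2

# omega = 0.625 = 0.101 in binary, so the first three digits are 1, 0, 1.
digits = [xi(i, 0.625) for i in (1, 2, 3)]
```

Averaging xi(i, ·) over a fine uniform grid of ω-values confirms that heads and tails each occur with frequency 1/2.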
$$\xi : \Omega \to \mathbb{R}$$
(Borel sets are the elements of the Borel σ-algebra, the minimal σ-algebra containing all open sets (see above)).
We may interpret a random variable as a number produced by an experiment,
such as temperature or the oil price tomorrow. The mathematical formalism
allows us to be precise. In particular, the property of measurability is important to make sense of the probability of the event that a random variable does not exceed a threshold: $P(\xi \le r) := P(\{\omega \in \Omega : \xi(\omega) \le r\})$.
We say that the random variables ξ and η are equal almost surely (a.s.) if
$$P(\xi = \eta) := P\bigl(\{\omega : \xi(\omega) = \eta(\omega)\}\bigr) = 1.$$
Often the term almost surely is omitted since in almost all probability
applications everything is defined up to a set of probability zero.
Example 3.4. The simplest random variables are indicators. For a set
(event) A F, its indicator is defined by
$$I_A(\omega) := \begin{cases} 1, & \omega \in A,\\ 0, & \omega \in A^c. \end{cases}$$
For example, if A is the event that tomorrow there is a thunderstorm, we set I_A to 1 if a thunderstorm really takes place and to 0 otherwise. We define the expectation of an indicator I_A by
E[IA ] := P (A).
It is easy to check that for a random variable of the form (8) this integral reduces to (9).
If E[ξ] exists and is finite, then ξ is called integrable. The class of all integrable random variables is denoted by $L^1(\Omega, \mathcal{F}, P)$ or just $L^1$. In particular, all bounded random variables are integrable. (A random variable ξ is called bounded if there exists a constant M such that $P(|\xi| < M) = 1$.)
The key property of the expectation is linearity: $E[a\xi + b\eta] = aE[\xi] + bE[\eta]$ for all constants a, b and integrable ξ, η.
The variance of a random variable ξ is defined by $\mathrm{Var}(\xi) := E\bigl[(\xi - E[\xi])^2\bigr]$.
Not every ξ ∈ L¹ has a finite variance. The class of random variables with finite variance is denoted by $L^2(\Omega, \mathcal{F}, P)$ or just $L^2$. A random variable with finite variance is called square-integrable. Variance is always nonnegative and is equal to zero only for constants.
The square root of the variance, $\sigma(\xi) = \sqrt{\mathrm{Var}(\xi)}$, is called the standard deviation of ξ.
The covariance of ξ and η is defined by $\mathrm{Cov}(\xi, \eta) := E\bigl[(\xi - E[\xi])(\eta - E[\eta])\bigr]$. Covariance is equal to zero if the random variables are independent (see 3.1.4 for the definition of independence). If the covariance is close to zero, the random variables are sometimes considered as almost independent.
Sometimes it is convenient to normalise the covariance. For nonconstant random variables ξ and η, define the correlation of ξ and η by
$$\mathrm{Corr}(\xi, \eta) := \frac{\mathrm{Cov}(\xi, \eta)}{\sqrt{\mathrm{Var}(\xi)}\,\sqrt{\mathrm{Var}(\eta)}}.$$
The Cauchy–Schwarz inequality says
$$\bigl(E[\xi\eta]\bigr)^2 \le E[\xi^2]\,E[\eta^2]$$
for all $\xi, \eta \in L^2$.
3.1.3 Probability distributions
Given a random variable ξ we define its cumulative distribution function by
$$F_\xi(x) := P(\xi \le x).$$
Note that quite different random variables ξ ≠ η can have identical probability distributions F_ξ = F_η. In this case we say that ξ and η are identically distributed (i.d.). It can be shown that the random variables ξ and η are identically distributed if and only if E[f(ξ)] = E[f(η)] for any bounded measurable f : ℝ → ℝ.
Example 3.5. Question: We say that a random variable X has the exponential distribution with parameter λ > 0, and write X ∼ Exp(λ), if its probability density function is
$$f(x) = \lambda e^{-\lambda x}, \qquad x > 0.$$
Calculate E[X] and Var(X).
Answer: The expectation is calculated as
$$E[X] = \int_0^\infty x\,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}.$$
The variance is calculated as
$$\mathrm{Var}(X) = \int_0^\infty \left(x - \frac{1}{\lambda}\right)^2 \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda^2}.$$
$$F_{\xi_1,\ldots,\xi_d}(x_1, \ldots, x_d) = P[\xi_1 \le x_1, \xi_2 \le x_2, \ldots, \xi_d \le x_d].$$
If it can be represented as
$$F_{\xi_1,\ldots,\xi_d}(x_1, \ldots, x_d) = \int_{-\infty}^{x_1} \!\!\cdots \int_{-\infty}^{x_d} \rho_{\xi_1,\ldots,\xi_d}(z_1, \ldots, z_d)\,dz_1 \ldots dz_d,$$
for some nonnegative function $\rho_{\xi_1,\ldots,\xi_d}(z_1, \ldots, z_d)$, the latter is called the joint probability density function of ξ₁, …, ξ_d.
3.1.4 Independence
Events A and B are called independent if
$$P(A \cap B) = P(A)\,P(B).$$
For random variables we have the following definition. The random variables ξ₁, …, ξ_n are called mutually independent if the events $\{a_k < \xi_k \le b_k\}$, k = 1, …, n, are mutually independent for all real numbers $\{a_k, b_k\}_{k=1}^{n}$. An infinite set of random variables is called mutually independent if any finite subset of it is mutually independent.
This definition can be reformulated in terms of the joint probability distribution. The random variables ξ₁, …, ξ_d are mutually independent if and only if their joint distribution (i.e. the distribution of the random vector (ξ₁, …, ξ_d)) is the product of the distribution functions of the ξ_k's:
$$F_{\xi_1,\ldots,\xi_d}(x_1, \ldots, x_d) = \prod_{k=1}^{d} F_{\xi_k}(x_k).$$
Moreover, it is easy to check that if the joint distribution can be expressed as $F_{\xi_1,\ldots,\xi_d}(x_1, \ldots, x_d) = \prod_{k=1}^{d} G_k(x_k)$ with some functions $G_k$, then $F_{\xi_k}(x) = G_k(x)\bigl(\lim_{y\to\infty} G_k(y)\bigr)^{-1}$ for each k, and so ξ₁, …, ξ_d are mutually independent.
If the variables ξ₁, …, ξ_d have probability densities, then they are mutually independent if and only if the random vector X = (ξ₁, …, ξ_d) also has a density, which can be factorised as $\rho_X(x_1, \ldots, x_d) = \prod_{k=1}^{d} \rho_{\xi_k}(x_k)$.
For independent random variables ξ and η we have
$$E\bigl[f(\xi)\,g(\eta)\bigr] = E[f(\xi)]\,E[g(\eta)]$$
for any functions f, g for which the expectations exist.
For any random variables $\xi, \eta \in L^2$ such that $\mathrm{Cov}(\xi, \eta) = 0$ (so, in particular, for any independent ξ and η) we have
$$\mathrm{Var}(\xi + \eta) = \mathrm{Var}(\xi) + \mathrm{Var}(\eta).$$
Knowing P[A], P[B] and P[A|B], one can evaluate P[B|A] as follows:
$$P[B \mid A] = \frac{P[A \cap B]}{P[A]} = \frac{P[A \mid B]\,P[B]}{P[A]},$$
3.2.1 Rough classification of random processes
There are many classifications of random processes. One of the most basic is
to classify them with respect to the time (index) set T and with respect to
the state space. By definition the state space S is the set of possible values
of a random process Xt .
of a coupon-paying bond changes at the deterministic times of the coupon payments, but it also changes randomly all the time before its maturity, due to the current situation in the market.
as a satisfactory description. For example, if ξ is the temperature tomorrow at 9.00, the knowledge of F_ξ(x) allows us to answer all questions of the form "what is the probability that the temperature will be between x and y?", which is actually all we would like to know. If η is another forecast for tomorrow's 9.00 temperature with F_ξ(x) ≡ F_η(x), then ξ and η can be considered indistinguishable from the practical point of view.
For a stochastic process {X_t : t ∈ T}, however, the collection of all distribution functions {F_{X_t}(x) : t ∈ T} is far from a satisfactory description of the whole process, because it says nothing about the dependencies among the underlying random variables. To describe the whole process, we need to specify the joint distributions of X_{t₁}, X_{t₂}, …, X_{t_n} for all t₁, t₂, …, t_n in T and all integers n. The collection of the joint distributions above is called the family of finite-dimensional probability distributions (f.f.d. for short), and two stochastic processes {X_t : t ∈ T₁} and {Y_t : t ∈ T₂} are said to be identically distributed if they have the same f.f.d. and T₁ = T₂.
To describe a stochastic process in practice, we will rarely give the exact
formulas for its f.f.d., but will rather use some indirect intuitive descriptions.
For example, take the familiar Bernoulli trials of consecutive tosses of a fair coin. A sequence of i.i.d. Bernoulli variables $(\xi_t)_{t=1}^{\infty}$ is a stochastic process, and its f.f.d. is fully determined by this description. Indeed, for any sequence of times t₁, t₂, …, t_n in T = {0, 1, 2, …} and results x₁, x₂, …, x_n in S, we are able to compute the probability $P(\xi_{t_1} = x_1, \xi_{t_2} = x_2, \ldots, \xi_{t_n} = x_n)$, and it is equal to $2^{-n}$.
often already very useful for applications. A stochastic process {X_t : t ∈ T} is said to be weakly stationary if the mean of the process, m(t) = E[X_t], is constant, and the covariance of the process, $C(s, t) = E[(X_s - m(s))(X_t - m(t))]$, depends only on the time difference t − s. Obviously, any strictly stationary stochastic process with finite mean and variance is also weakly stationary.
References
The following texts were used in the preparation of this chapter and you are
referred there for further reading if required.
G.R. Grimmett & D.R. Stirzaker, Probability and Random Processes.
J. Jacod, & P. Protter, Probability Essentials.
3.3 Summary
Let be a nonempty set and F be a collection of subsets of . We say that
F is a algebra if it satisfies the following properties:
1. $\emptyset, \Omega \in \mathcal{F}$;
2. If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$ and $\bigcap_{n=1}^{\infty} A_n \in \mathcal{F}$;
3. If $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$.
1. $P(\emptyset) = 0$, $P(\Omega) = 1$;
2. $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$, provided $A_1, A_2, \ldots \in \mathcal{F}$ is a collection of disjoint sets, i.e. $A_i \cap A_j = \emptyset$ for all $i \ne j$.
for some nonnegative function ρ_ξ, the latter is called the probability density function (pdf) of ξ.
The expectation of a random variable ξ is defined as $E[\xi] := \int_{\Omega} \xi\,dP$. It can be calculated as
$$E[\xi] = \int_{-\infty}^{\infty} x\,dF_\xi(x) = \int_{-\infty}^{\infty} x\,\rho_\xi(x)\,dx.$$
all real numbers a, b, c, d. If ξ and η are independent, then Cov(ξ, η) = 0, but the converse is not always true.
The conditional probability of A given B, denoted P[A|B], can be evaluated as
$$P[A \mid B] = \frac{P[A \cap B]}{P[B]}.$$
A stochastic process is a family of random variables indexed in time, {Xt : t T }.
The time set T can be discrete or continuous, as can the state space S in
which the variables take their values.
Stochastic processes can be roughly classified into the following groups:
Mixed processes.
Questions
(a) p(a1 ) = 0.3; p(a2 ) = 0.2; p(a3 ) = 0.1; p(a4 ) = 0.1; p(a5 ) = 0.1;
(b) p(a1 ) = 0.4; p(a2 ) = 0.3; p(a3 ) = 0.1; p(a4 ) = 0.1; p(a5 ) = 0.1:
(c) p(a1 ) = 0.4; p(a2 ) = 0.3; p(a3 ) = 0.2; p(a4 ) = 0.1; p(a5 ) = 0.1;
(d) p(a1 ) = 0.4; p(a2 ) = 0.3; p(a3 ) = 0.2; p(a4 ) = 0.1; p(a5 ) = 0.1.
Show that the correlation between the returns on asset A and asset B
is equal to 0.3830.
$$A := \{\omega_1, \omega_4\}, \quad B := \{\omega_2, \omega_4\}, \quad C := \{\omega_3, \omega_4\}.$$
Prove that the pairs (A, B), (A, C) and (B, C) are independent, but
the triple (A, B, C) is not mutually independent according to Definition
3.4.
Chapter 4
Markov Chains
The Markov property is used extensively in actuarial mathematics to develop two-state and multi-state Markov models of mortality and other decrements. The rest of this course is devoted to a thorough description of the Markov property in a general context and its applications to actuarial modelling.
We will distinguish between two types of stochastic process that possess the
Markov property: Markov chains and Markov jump processes. Both have a
discrete state space, but Markov chains have a discrete time set and Markov
jump processes have a continuous time set.
We begin with Markov chains and discuss the mathematical formulation of such processes, leading to one important actuarial application: the no-claims discount process used in motor insurance. We then move on to Markov jump processes.
The practical considerations of applying these models in actuarial mathematics will be discussed in detail in later sections. In this chapter we focus
on the mathematical development of Markov models without reference to
their calibration to real data.
is a particular value is zero, and so it is necessary to work with probabilities
of Xt lying in some subset of the state space in any general definition.
Although we are entirely concerned with discrete state spaces in this chapter,
it is important to realise that the Markov property can be possessed by
general stochastic processes.
for all $s_1 < s_2 < \cdots < s_n < s < t$ and all states $a, x_1, x_2, \ldots, x_n, x$ in S.
An important result is that any process with independent increments has the
Markov property.
Example 4.1. Question: Prove that any process with independent incre
ments has the Markov property.
Answer: We begin with equation (10) and use the fact that $X_t = (X_t - X_s) + x$ to introduce an increment;
the second equality arises from the definition of independent increments and
the fact that x is known.
A Markov process with a discrete state space and a discrete time set is called a Markov chain; these are considered in this chapter. A Markov process with a discrete state space and a continuous time set is called a Markov jump process; these are considered in the next chapter.
We define the transition probabilities as
$$p_{ij}(n, n+1) = P\bigl[X_{n+1} = j \mid X_n = i\bigr]. \qquad (12)$$
Note that P(n, n+1) is a finite matrix in the case of a finite number of states,
and an infinite matrix in the case of an infinite number of states.
3. If one or more claims are made the policyholder moves down one level,
or remains at the 0% level.
The insurance company believes that the chance of claiming each year is independent of the current discount level and has a probability of 1/4. Why can this process be modelled as a Markov chain? Give the state space and transition matrix.
Answer: The model can be considered as a Markov chain since the future
discount depends only on the current level, not the entire history. The state
space is S = {0%, 30%, 60%}, which is convenient to denote as S = {0, 1, 2}
(where state 0 is the 0% state, 1 is the 30% state and 2 is the 60% state).
The transition probability matrix between two states in a unit time is given by
$$\mathbf{P} = \begin{pmatrix} 1/4 & 3/4 & 0 \\ 1/4 & 0 & 3/4 \\ 0 & 1/4 & 3/4 \end{pmatrix}. \qquad (14)$$
A square matrix is called a stochastic matrix if:

1. All its entries are nonnegative, and
2. The sum of entries in any row is one.
It is clear that the transition matrix in Example 4.2 is a stochastic matrix by this definition. More generally, every transition matrix P(n, n + 1) of a Markov chain is a stochastic matrix. Indeed, all the transition probabilities p_ij(n, n + 1) are by definition nonnegative, and $\sum_{j \in S} p_{ij} = 1$ for all i, since the system must move to some state from any state i.
A clear way of representing Markov chains is by a transition graph. The
states are represented by circles linked by arrows indicating each possible
transition. Next to each arrow is the corresponding transition probability.
Example 4.3. Question: Draw the transition graph for the NCD system
defined in Example 4.2.
Answer: See Figure 3
Figure 3: Transition graph for the NCD system of Example 4.2. Reproduced
with permission of the Faculty and Institute of Actuaries.
Equation (12) defines the probabilities of transition over a single time step.
Similarly, the nstep transition probabilities pij (m, m + n) denote the prob
ability that a process in state i at time m will be in state j at time m + n.
That is:
$$p_{ij}(m, m+n) = P\bigl[X_{m+n} = j \mid X_m = i\bigr].$$
The transition probabilities of a Markov process satisfy the system of equations called the Chapman–Kolmogorov equations
$$p_{ij}(m, n) = \sum_{k \in S} p_{ik}(m, l)\,p_{kj}(l, n), \qquad (15)$$
for all states i, j ∈ S and all integer times m < l < n. This can be expressed in terms of transition matrices as
$$\mathbf{P}(m, n) = \mathbf{P}(m, l)\,\mathbf{P}(l, n).$$
This should be intuitively clear but a formal proof is left as a question at the
end of the chapter.
of stochastic matrices denoted by P(t):
$$\mathbf{P}(t) = \bigl(p_{ij}(t, t+1)\bigr)_{i,j \in S} = \begin{pmatrix} p_{00}(t, t+1) & p_{01}(t, t+1) & \cdots \\ p_{10}(t, t+1) & p_{11}(t, t+1) & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$$
The value of t can represent many factors such as time of year, age of policyholder or the length of time the policy has been in force. For example, young drivers and very old drivers may have more accidents than middle-aged drivers, and therefore t might represent the age or age group of the driver purchasing a motor insurance policy.
Although timeinhomogeneous models are important in practical modelling,
a further analysis is beyond the scope of this course.
For a time-homogeneous Markov chain the one-step transition matrix P does not depend on t, and the n-step transition matrix is simply the n-th power of P:
$$\mathbf{P}(n) = \mathbf{P}^n.$$
Example 4.5. Question: Calculate the 2-step transition matrix for the NCD system from Example 4.2 and confirm that it is a stochastic matrix.
Answer: The 1-step transition matrix is given by equation (14), and so we can compute that
$$\mathbf{P}(2) = \begin{pmatrix} 1/4 & 3/4 & 0 \\ 1/4 & 0 & 3/4 \\ 0 & 1/4 & 3/4 \end{pmatrix} \begin{pmatrix} 1/4 & 3/4 & 0 \\ 1/4 & 0 & 3/4 \\ 0 & 1/4 & 3/4 \end{pmatrix} = \frac{1}{16}\begin{pmatrix} 4 & 3 & 9 \\ 1 & 6 & 9 \\ 1 & 3 & 12 \end{pmatrix}.$$
We note that the two conditions for P(2) to be a stochastic matrix are satisfied.
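The matrix product above can be verified in code; a minimal sketch using exact rational arithmetic (the helper names are illustrative):

```python
from fractions import Fraction as F

# One-step transition matrix of the NCD chain, states (0, 1, 2).
P = [[F(1, 4), F(3, 4), F(0)],
     [F(1, 4), F(0),    F(3, 4)],
     [F(0),    F(1, 4), F(3, 4)]]

def mat_mult(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = mat_mult(P, P)   # two-step matrix; equals (1/16) * [[4,3,9],[1,6,9],[1,3,12]]
```

Using Fraction instead of floats keeps the entries exact, so the row sums are exactly one.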
Example 4.6. Question: Using the 2-step transition matrix from Example 4.5, state the probabilities that
Answer:
4.5.1 The simple (unrestricted) random walk
A simple random walk is a stochastic process {X_t} with state space S = ℤ, i.e. the integers. The process is defined by
$$X_n = Y_1 + Y_2 + \cdots + Y_n,$$
where Y₁, Y₂, … are i.i.d. random variables with P(Y_i = 1) = p and P(Y_i = −1) = 1 − p.
The simple random walk has the Markov property; that is:
$$\begin{aligned}
P(X_{m+n} = j \mid X_1 = i_1, X_2 = i_2, \ldots, X_m = i)
&= P(X_m + Y_{m+1} + Y_{m+2} + \cdots + Y_{m+n} = j \mid X_1 = i_1, X_2 = i_2, \ldots, X_m = i)\\
&= P(Y_{m+1} + Y_{m+2} + \cdots + Y_{m+n} = j - i)\\
&= P(X_{m+n} = j \mid X_m = i).
\end{aligned}$$
Figure 4: Transition graph for the unrestricted random walk. Reproduced
with permission of the Faculty and Institute of Actuaries.
Let the number of positive steps be r (that is, r is the total number of steps where X_{i+1} − X_i = 1), and the number of negative steps be l (that is, l is the total number of steps where X_{i+1} − X_i = −1).
Since there are n steps in total, it follows that r + l = n and that r − l = j − i, the excess of positive steps over negative steps. Solving these simultaneous equations for r and l gives
$$r = \frac{1}{2}(n + j - i) \quad\text{and}\quad l = \frac{1}{2}(n - j + i).$$
From this we can see that the n-step transition probabilities are
$$p_{ij}^{(n)} = \binom{n}{\frac{1}{2}(n + j - i)}\, p^{\frac{1}{2}(n + j - i)}\,(1 - p)^{\frac{1}{2}(n - j + i)},$$
where $\binom{n}{r}$ is the number of possible paths with r positive steps, each of which occurs with probability $p^r (1 - p)^{n - r}$. The expression arises since the distribution of the number of positive steps in n steps is binomial with parameters n and p. Since r and l must be nonnegative integers, it follows that both n + j − i and n − j + i must be nonnegative even numbers.
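This formula translates directly into code; a sketch (the function name is illustrative):

```python
from math import comb

def rw_prob(n, i, j, p):
    """n-step transition probability P(X_n = j | X_0 = i) of a simple
    random walk with up-probability p."""
    r, rem = divmod(n + j - i, 2)      # r = number of positive steps
    if rem or not 0 <= r <= n:         # parity or range violated
        return 0.0
    return comb(n, r) * p ** r * (1 - p) ** (n - r)
```

For instance, a symmetric walk returns to its start in two steps with probability $\binom{2}{1}(1/2)^2 = 1/2$, and the probabilities over all reachable states at a fixed time sum to one.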
In addition to being timehomogeneous, a simple random walk is spatially
homogeneous, that is
$$p_{ij}^{(n)} = P(X_n = j \mid X_0 = i) = P(X_n = j + r \mid X_0 = i + r).$$
A simple random walk with p = q = 1/2 (where q = 1 − p) is called a symmetric simple random walk.
A man needs to raise £N to fund a specific project and asks his very rich friend to accompany him to a casino where he hopes to win this money. The man plays the following game: a fair coin is tossed. If it lands heads-up the man wins £1; if it lands tails-up the man loses £1. If he loses all his money he will borrow £1 from his friend and continue to play until he has the required £N. Once he has accumulated £N he will stop playing the game.
The restricted random walk is therefore a simple random walk with boundary
conditions. In this example the boundary conditions are specified at 0 and
N . At N the barrier is an absorbing barrier, while at 0 it is called a reflecting
barrier.
More formally, an absorbing barrier is a value b such that:
$$P(X_{n+s} = b \mid X_n = b) = 1 \quad \text{for all } s > 0.$$
In other words, once state b is reached, the random walk stops and remains
in this state thereafter.
A reflecting barrier is a value c such that:
$$P(X_{n+1} = c + 1 \mid X_n = c) = 1.$$
In other words, once state c is reached, the random walk is pushed away.
A mixed barrier is a value d such that:
$$P(X_{n+1} = d \mid X_n = d) = \alpha, \qquad P(X_{n+1} = d + 1 \mid X_n = d) = 1 - \alpha,$$
for α ∈ [0, 1]. In other words, once state d is reached, the random walk remains in this state with probability α or moves to the neighbouring state d + 1 with probability 1 − α, i.e. it is an absorbing barrier with probability α and a reflecting barrier with probability 1 − α.
If, in the example above, the man does not take his rich friend, he will continue to gamble until either his money reaches the target £N or he runs out of money. In each case reaching the boundary means that the wealth will remain there forever; the barriers therefore become absorbing barriers.
The transition graph for the general case of a restricted random walk with two mixed barriers is given in Figure 5. The special cases of reflecting and absorbing boundary conditions are obtained by taking the corresponding barrier parameter equal to 0 or 1.
Figure 5: Transition graph for the restricted random walk with mixed bound
ary conditions. Reproduced with permission of the Faculty and Institute of
Actuaries.
For example,
$$P[X_{n+1} = 1 \mid X_n = 2, X_{n-1} = 1] > 0, \qquad (18)$$
whereas
$$P[X_{n+1} = 1 \mid X_n = 2, X_{n-1} = 3] = 0. \qquad (19)$$
2⁺: 40% discount and no claim in the previous year, that is, the state corresponding to $\{X_n = 2, X_{n-1} = 1\}$.

2⁻: 40% discount and a claim in the previous year, that is, the state corresponding to $\{X_n = 2, X_{n-1} = 3\}$.
Assuming that the probability of making no claims in any year is still 3/4, the Markov chain on the modified state space S′ = {0, 1, 2⁺, 2⁻, 3} has transition graph given by Figure 6, and 1-step transition matrix given by
$$\mathbf{P} = \begin{pmatrix} 1/4 & 3/4 & 0 & 0 & 0 \\ 1/4 & 0 & 3/4 & 0 & 0 \\ 0 & 1/4 & 0 & 0 & 3/4 \\ 1/4 & 0 & 0 & 0 & 3/4 \\ 0 & 0 & 0 & 1/4 & 3/4 \end{pmatrix}.$$
Figure 6: Transition graph for the modified NCD process. Reproduced with
permission of the Faculty and Institute of Actuaries.
Note that a policyholder can only be in state 2+ by moving up from state
1, and in state 2- by moving down from state 3. Hence equations (18) and
(19) become

P[X_{n+1} = 1 | X_n = 2+] = 1/4,

and

P[X_{n+1} = 1 | X_n = 2-] = 0.
P[Y_{t+1} = 1 | Y_1 = y_1, Y_2 = y_2, . . . , Y_t = y_t] = f(y_1 + y_2 + · · · + y_t) / g(t),

so

P[X_{t+1} = 1 + x_t | X_1 = x_1, X_2 = x_2, . . . , X_t = x_t]
    = P[Y_{t+1} = 1 | Y_1 = x_1, Y_2 = x_2 - x_1, . . . , Y_t = x_t - x_{t-1}] = f(x_t) / g(t),

which does not depend on the past history x_1, x_2, . . . , x_{t-1}. Thus {X_t, t =
0, 1, 2, . . . } is a Markov chain.
4.5.5 General principles of modelling using Markov chains
In this section we summarise the examples above and identify the key steps in
modelling real-life situations using Markov chains. For simplicity, we discuss
only time-homogeneous models here.
This follows from the fact that the conditional distribution of Nij given
Ni is binomial with parameters Ni and pij .
Step 3. Checking the Markov property: Once the state space and
transition probabilities are found, the model is fully determined. But,
to ensure that the fit of the model to the data is adequate, we need to
check that the Markov property seems to hold. In practice, it is often
considered sufficient to look at triplets of successive observations. For
a set of observations x_1, x_2, . . . , x_N, let n_ijk be the number of times t
(1 ≤ t ≤ N - 2) such that x_t = i, x_{t+1} = j and x_{t+2} = k. If the Markov
property holds, n_ijk is an observation from a binomial distribution with
parameters n_ij and p_jk. An effective test to check this is a χ² test: the
statistic

X² = Σ_i Σ_j Σ_k (n_ijk - n_ij p_jk)² / (n_ij p_jk)

has approximately a χ² distribution if the Markov property holds.
Step 4. Using the model: Once the model parameters are determined
and the Markov property checked, we can use the established
model to estimate different quantities of interest. In particular, we
have used the Markov model of Example 4.2 to address questions
like "What is the probability that a policyholder initially in the 0%
state is in the 0% state after 2 years?" (see Example 4.6). If the Markov
model is too complicated to answer questions of this type analytically,
we can use Monte-Carlo simulation (see Chapter 2). Simulating a time-homogeneous
Markov chain is relatively straightforward. In addition
to commercial simulation packages, even standard spreadsheet software
can easily cope with the practical aspects of estimating transition probabilities
and performing a simulation.
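As a sketch of such a simulation, the modified NCD chain of this chapter can be simulated with standard library tools only; the state labels and function name below are illustrative assumptions, and the transition matrix is the one given in the text:

```python
import random

# States of the modified NCD process, in the order used in the text.
STATES = ["0", "1", "2+", "2-", "3"]

# One-step transition matrix from the text (P(no claim) = 3/4).
P = [
    [1/4, 3/4, 0.0, 0.0, 0.0],
    [1/4, 0.0, 3/4, 0.0, 0.0],
    [0.0, 1/4, 0.0, 0.0, 3/4],
    [1/4, 0.0, 0.0, 0.0, 3/4],
    [0.0, 0.0, 0.0, 1/4, 3/4],
]

def simulate(start, n_steps, rng=None):
    """Return the path of the chain over n_steps transitions."""
    rng = rng or random.Random(0)
    i = STATES.index(start)
    path = [start]
    for _ in range(n_steps):
        u = rng.random()
        cum = 0.0
        for j, p in enumerate(P[i]):
            cum += p
            if u < cum:     # invert the cumulative row probabilities
                i = j
                break
        path.append(STATES[i])
    return path
```

For example, simulate("0", 20) returns the discount states of one policyholder over 20 years; repeating this many times estimates occupancy probabilities by Monte Carlo.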
P(X_n = j | X_0 = i) → π_j,    (20)

2. π_j ≥ 0 for all j, and Σ_{j∈S} π_j = 1.
Example 4.7. Question: Are the simple NCD, modified NCD, unre
stricted and restricted random walk processes irreducible?
Answer: It is clear from Figures 3, 4 & 6 that both NCD processes and
the unrestricted random walks are irreducible as all states have a nonzero
probability of being reached from any other state in a finite number of steps.
For the restricted random walk, Figure 5 shows that it is irreducible unless
either boundary is absorbing, i.e. it is irreducible provided α ≠ 1 and β ≠ 1.
An irreducible Markov chain with a finite state space has a unique stationary
probability distribution. This is stated without proof.
We therefore discard one of the equations (discarding the last one will simplify
the system) and work in terms of a working variable, say π_1. The remaining equations are

3π_0 - π_{2-} = π_1,    3π_0 + π_{2+} = 4π_1,
π_{2+} = (3/4) π_1,    4π_{2-} - π_3 = 0.
Answer: The model is irreducible and aperiodic; therefore, assuming that
the policies have been held for a sufficient length of time, the distribution of
policyholders amongst states is given by the stationary distribution computed
in Example 4.9. We would therefore expect the following distribution:

State 0: no discount      10,000 × 13/169 ≈ 769
State 1: 25% discount     10,000 × 12/169 ≈ 710
State 2: 40% discount     10,000 × (9/169 + 27/169) ≈ 2,130
State 3: 60% discount     10,000 × 108/169 ≈ 6,391
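These numbers can be checked numerically. The sketch below finds the stationary distribution by repeatedly multiplying an arbitrary starting distribution by the transition matrix, which converges here because the chain is irreducible and aperiodic:

```python
# Transition matrix of the modified NCD chain, states ordered 0, 1, 2+, 2-, 3.
P = [
    [1/4, 3/4, 0.0, 0.0, 0.0],
    [1/4, 0.0, 3/4, 0.0, 0.0],
    [0.0, 1/4, 0.0, 0.0, 3/4],
    [1/4, 0.0, 0.0, 0.0, 3/4],
    [0.0, 0.0, 0.0, 1/4, 3/4],
]

def step(pi):
    """One application of pi -> pi P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

pi = [1 / 5] * 5            # arbitrary starting distribution
for _ in range(1000):       # irreducible + aperiodic => convergence
    pi = step(pi)

exact = [13/169, 12/169, 9/169, 27/169, 108/169]
# Expected policyholder counts, combining 2+ and 2- into "state 2":
counts = [round(10000 * exact[0]), round(10000 * exact[1]),
          round(10000 * (exact[2] + exact[3])), round(10000 * exact[4])]
```

The resulting counts reproduce the table above: 769, 710, 2,130 and 6,391 policyholders.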
References
The following texts were used in the preparation of this chapter and you are
referred there for further reading if required.
4.8 Summary
For discrete state spaces the Markov property is written as

P(X_t = a | X_{s_1} = x_1, X_{s_2} = x_2, . . . , X_{s_n} = x_n, X_s = x) = P(X_t = a | X_s = x)

for all s_1 < s_2 < · · · < s_n < s < t and all states a, x_1, x_2, . . . , x_n, x in S.
Any process with independent increments has the Markov property.
Markov chains are discrete-time and discrete-state-space stochastic processes
satisfying the Markov property. You should be familiar with the simple
NCD, modified NCD, unrestricted random walk and restricted random walk
processes.
In general, the n-step transition probabilities p_ij(m, m + n) denote the probability
that a process in state i at time m will be in state j at time m + n.
The transition probabilities of a Markov process satisfy the Chapman-Kolmogorov
equations:

p_ij(m, n) = Σ_{k∈S} p_ik(m, l) p_kj(l, n),

for all states i, j ∈ S and all integer times m < l < n. This can be expressed
in terms of stochastic matrices as

P(m, n) = P(m, l) P(l, n).
Questions
i. one year.
ii. two years.
iii. three years.
(iii) Explain whether the chain is irreducible and/or aperiodic.
(iv) Does this Markov chain converge to a stationary distribution?
(v) Calculate the longrun probability that a policyholder is in dis
count level 2.
Chapter 5
Markov Jump Processes
A Markov jump process is a stochastic process with a discrete state space and
a continuous time set, which has the Markov property.
The mathematical development of Markov jump processes is similar to that of the Markov
chains considered in the previous chapter. For example, the Chapman-Kolmogorov
equations have the same form. However, Markov jump processes
are in continuous time, so the notion of a one-step transition probability
does not exist and we are forced to consider time intervals of arbitrarily
small length. Taking the limit as these intervals shrink to zero leads to the
reformulation of the Chapman-Kolmogorov equations in terms of differential
equations.
We begin by discussing the Poisson process, which is the simplest example of
a Markov jump process. In doing so we will encounter some general features
of Markov jump processes.
The counting process {N_t}_{t∈[0,∞)} is said to be a Poisson process with rate
λ > 0, if

1. N_0 = 0;

2. {N_t} has stationary, independent increments;

3. P(N_{t+h} - N_t = 1) = λh + o(h);

4. P(N_{t+h} - N_t > 1) = o(h).
Using the independence of increments, for small h,

p_0(t + h) = P(N_{t+h} = 0)
           = P(N_t = 0, N_{t+h} - N_t = 0)
           = P(N_t = 0) P(N_{t+h} - N_t = 0)
           = p_0(t)(1 - λh + o(h)).

Rearranging and taking the limit as h → 0 gives the differential equation

dp_0(t)/dt = -λ p_0(t),

with the initial condition p_0(0) = 1. It is clear that this has solution

p_0(t) = e^{-λt}.    (22)
Similarly, for n ≥ 1,

p_n(t + h) = P(N_{t+h} = n)
           = P(N_t = n, N_{t+h} - N_t = 0) + P(N_t = n - 1, N_{t+h} - N_t = 1) + o(h)
           = P(N_t = n) P(N_{t+h} - N_t = 0) + P(N_t = n - 1) P(N_{t+h} - N_t = 1) + o(h)
           = p_n(t) p_0(h) + p_{n-1}(t) p_1(h) + o(h)
           = (1 - λh) p_n(t) + λh p_{n-1}(t) + o(h).

Rearranging this for p_n(t + h), and again taking the limit as h → 0, we obtain
the differential equation

dp_n(t)/dt = -λ p_n(t) + λ p_{n-1}(t),    (23)

for n = 1, 2, 3, . . . .
It can be shown by mathematical induction, or using generating functions,
that the solution of the differential equations (23), with initial conditions
p_n(0) = 0, is given by equation (21), as required.
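The forward equations (23) can also be checked numerically. The sketch below Euler-integrates them from p_0(0) = 1 and compares the result with the Poisson probabilities e^{-λt}(λt)^n/n!; the rate, horizon and step size are assumed values chosen for illustration:

```python
import math

lam, T, dt = 2.0, 1.5, 1e-4   # rate, horizon and Euler step (assumed values)
N = 30                        # truncate the state space at n = N

# Euler integration of dp_n/dt = -lam p_n + lam p_{n-1}, with p_0(0) = 1.
p = [1.0] + [0.0] * N
for _ in range(int(T / dt)):
    new = [p[0] - lam * p[0] * dt]
    for n in range(1, N + 1):
        new.append(p[n] + (-lam * p[n] + lam * p[n - 1]) * dt)
    p = new

# Closed-form Poisson probabilities for comparison.
exact = [math.exp(-lam * T) * (lam * T) ** n / math.factorial(n)
         for n in range(N + 1)]
```

With this step size the numerical and closed-form probabilities agree to about three decimal places.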
A Poisson process takes non-negative integer values and can jump at any time
t ∈ [0, ∞). However, since time is continuous, the probability of a jump at
any specific time point t is zero. The process can be pictured as the upwards
staircase shown in Figure 7.
Figure 7: Sample Poisson process. Horizontal distance is time.
The sequence {τ_n}_{n≥1} is called the sequence of inter-arrival times (or holding
times). These are the horizontal distances between successive steps in Figure 7.
The random variables τ_1, τ_2, . . . are i.i.d., each having the exponential distribution
with parameter λ. They therefore each have the density function

f(t) = λ e^{-λt},  t ≥ 0.
To demonstrate this, first consider τ_1 and note that the event
τ_1 > t occurs if and only if there are zero events of the Poisson process in the
fixed interval (0, t], that is,

P(τ_1 > t) = P(N_t = 0) = e^{-λt},

so that

P(τ_1 ≤ t) = 1 - e^{-λt},

which is the exponential distribution function. The same argument can be repeated
for τ_2, τ_3, τ_4, . . . , leading to the conclusion
that the inter-arrival times are i.i.d. random variables that are exponentially
distributed with parameter λ.
Further, it can be shown using similar arguments that if N_t^{(1)} and N_t^{(2)} are
two independent Poisson processes with parameters λ_1 and λ_2 respectively,
then their sum N_t = N_t^{(1)} + N_t^{(2)} is a Poisson process with parameter λ_1 + λ_2.
This result follows immediately from our intuitive interpretation of a Poisson
process: assume that male customers arrive uniformly at rate λ_1,
and female customers arrive independently and uniformly at rate λ_2.
Then N_t^{(1)} describes the cumulative number of male customers and N_t^{(2)} the cumulative number of female
customers, so N_t = N_t^{(1)} + N_t^{(2)} is the total number of customers, which clearly
also arrives uniformly at rate λ_1 + λ_2.

This can be extended to the sum of any number of Poisson processes and is
a very useful result.
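The superposition property is easy to check empirically: merging the arrival times of two independent simulated streams behaves like a single stream at the summed rate. The rates and window below are assumed values:

```python
import random

rng = random.Random(1)

def arrival_times(lam, t_end):
    """Arrival times of a simulated Poisson process of rate lam on [0, t_end]."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam)
        if t > t_end:
            return times
        times.append(t)

t_end = 10000.0
male = arrival_times(1.5, t_end)     # e.g. male customers at rate 1.5
female = arrival_times(2.5, t_end)   # female customers at rate 2.5
merged = sorted(male + female)       # the superposed stream
rate_est = len(merged) / t_end       # should be close to 1.5 + 2.5 = 4
```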
The compound Poisson process is defined as

X_t = Σ_{i=1}^{N_t} Y_i,    (25)

where {N_t}_{t∈[0,∞)} is a Poisson process with rate λ, and {Y_i, i ≥ 1} are
independent and identically distributed random variables, with distribution
function F, which are also independent of N_t.

The expected value and variance of the compound Poisson process are
given by

E[X_t] = λt E[Y],   Var[X_t] = λt E[Y²],    (26)

where Y is a random variable with distribution function F.
Example 5.4. In Example 5.3 assume that the size of each claim is a random
variable uniformly distributed on [a, b]. All claim sizes are independent.
What are the mean and variance of the cumulative size of the claims from all
policies during 3 years?

Answer: The cumulative size of the claims is the compound Poisson process
X_t = Σ_{i=1}^{N_t} Y_i, where N_t is the number of claims from all policies, which is
a Poisson process with parameter λ = 10,000q, and Y_i is the size of claim i. Then

E[Y_i] = (1/(b - a)) ∫_a^b x dx = (a + b)/2,
E[Y_i²] = (1/(b - a)) ∫_a^b x² dx = (a² + ab + b²)/3,

which gives

E[X_3] = 3λ E[Y_i] = 30,000q × (a + b)/2 = 15,000q (a + b),

and

Var[X_3] = 3λ E[Y_i²] = 30,000q × (a² + ab + b²)/3 = 10,000q (a² + ab + b²).
Assume that a company has initial capital u, premium rate c, and cumulative
claim size X_t given by (25). Then the basic problem in risk theory
is to estimate the probability of ruin, that is, the probability that
u + ct - X_t < 0 for some time t > 0.

p_ij(s, t) = P[X_t = j | X_s = i], where p_ij(s, t) ≥ 0 and s < t.    (28)
In matrix form, these are expressed as

P(s, t) = P(s, u) P(u, t),  for s < u < t.    (29)

The proof of these is analogous to that for equation (15) in discrete time,
and is left as a question at the end of the chapter.

We require that the transition probabilities satisfy the continuity condition

lim_{t→s+} p_ij(s, t) = δ_ij = { 1, i = j;  0, i ≠ j }.    (30)
This condition means that, as the time difference between two observations
approaches zero, the probability that the process changes its state approaches
zero.
It is easy to see that this condition is consistent with the Chapman-Kolmogorov
equations. Indeed, taking the limit u → s+ or u → t- in equation (29), we
obtain an identity. However, this condition does not follow from the Chapman-Kolmogorov
equations. For example, p_ij(s, t) = 1/2 for i, j = 1, 2 satisfies equation (29),
since

( 1/2 1/2 )   ( 1/2 1/2 ) ( 1/2 1/2 )
( 1/2 1/2 ) = ( 1/2 1/2 ) ( 1/2 1/2 ),

but it does not satisfy the continuity condition (30).
It follows from equation (31) that the quotients (p_ij(t, t + h) - δ_ij)/h approach
certain limits as h → 0. In particular, we define

lim_{h→0} (p_jj(t, t + h) - 1)/h := q_jj(t),
lim_{h→0} p_kj(t, t + h)/h := q_kj(t),  for k ≠ j.    (32)

The quantities q_jj(t), q_kj(t) are called transition rates. For k ≠ j, q_kj(t) corresponds
to the rate of transition from state k to state j over a small time interval h, given
that state k is occupied at time t.

Transition probabilities p_kj(t, t + h) can be expressed through the transition
rates as

p_kj(t, t + h) = { h q_kj(t) + o(h),      k ≠ j;
                   1 + h q_jj(t) + o(h),  k = j. }    (33)
Substituting (33) into the Chapman-Kolmogorov equations and letting h → 0 leads to
Kolmogorov's forward equations:

∂p_ij(s, t)/∂t = Σ_{k∈S} p_ik(s, t) q_kj(t).    (34)

In matrix form,

∂P(s, t)/∂t = P(s, t) Q(t),

where Q(t) is called the generator matrix, with entries q_ij(t).
Repeating the procedure but differentiating with respect to s, we have

∂p_ij(s, t)/∂s = -Σ_{k∈S} q_ik(s) p_kj(s, t),    (35)

and we see that the derivative with respect to s can also be expressed in
terms of the transition rates. The differential equations (35) are called Kolmogorov's
backward equations. In matrix form these are written as

∂P(s, t)/∂s = -Q(s) P(s, t).
Therefore, if the transition probabilities p_ij(s, t) for t > s have derivatives with
respect to t and s, the transition rates are well-defined and given by equation
(32). Alternatively, if we assume the existence of the transition rates, then it
follows that the transition probabilities p_ij(s, t) for t > s have derivatives with
respect to t and s, given by equations (34) and (35). These equations are
compatible, and we may ask whether we can find the transition probabilities,
given the transition rates, by solving equations (34) and (35).
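For a finite state space this can be done numerically. The sketch below Euler-integrates the forward equation ∂P/∂t = P Q for a two-state chain with constant rates and compares p_00 with the known closed form; the rate values are assumptions:

```python
import math

sigma, rho = 0.8, 0.3                # assumed constant rates: 0 -> 1 and 1 -> 0
Q = [[-sigma, sigma], [rho, -rho]]   # generator matrix; rows sum to zero

def forward(T, dt=1e-4):
    """Euler-integrate dP/dt = P Q from P(0) = I."""
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(int(T / dt)):
        P = [[P[i][j] + dt * sum(P[i][k] * Q[k][j] for k in range(2))
              for j in range(2)]
             for i in range(2)]
    return P

T = 2.0
P = forward(T)
# Known two-state solution: p_00(t) = rho/(sigma+rho) + sigma/(sigma+rho) e^{-(sigma+rho)t}
p00_exact = rho / (sigma + rho) + sigma / (sigma + rho) * math.exp(-(sigma + rho) * T)
```

Note that the Euler step preserves the row sums of P exactly, because each row of Q sums to zero.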
It can be shown that each row of the generator matrix Q(s) has zero sum.
That is,

q_ii(s) = -Σ_{j≠i} q_ij(s).
The residual holding time for a general Markov jump process is denoted R_s.
This is the random amount of time between time s and the next jump:

R_s = inf{w > 0 : X_{s+w} ≠ X_s}.

Similarly, the current holding time is denoted C_t. This is the time between
the last jump and time t:

C_t = sup{w > 0 : X_u = X_t for all u ∈ [t - w, t]}.
We will not study these questions further for general Markov processes, but
will investigate such and related questions for timehomogeneous Markov
processes below.
5.4 Time-homogeneous Markov jump processes
Just as we defined time-homogeneous Markov chains (equation (17)), we can
define time-homogeneous Markov jump processes.
Consider the transition probabilities for a Markov process given by equation
(28). A Markov process in continuous time is called time-homogeneous if the
transition probabilities satisfy p_ij(s, t) = p_ij(0, t - s) for all i, j ∈ S and s, t > 0.
In other words, a Markov process in continuous time is called time-homogeneous
if the probability P(X_t = j | X_s = i) depends only on the time difference t - s.
In this case we can write p_ij(t - s) := p_ij(s, t).
Here, for example, p_ij(s) form a stochastic matrix for every s, that is,

p_ij(s) ≥ 0 and Σ_{j∈S} p_ij(s) = 1.

Also, p_ij(s) satisfy the Chapman-Kolmogorov equations, which, for a time-homogeneous
Markov process, take the form

p_ij(t + s) = Σ_{k∈S} p_ik(t) p_kj(s).    (36)
The argument for other values of n is similar. So, if for some t we had
p_ii(t) = 0, this would imply p_ii(t/n) = 0 for all n, contradicting
(30).
The following properties of transition functions and transition rates for a
time-homogeneous process are stated without proof:
1. The transition rates

q_ij = dp_ij(t)/dt |_{t=0} = lim_{h→0} (p_ij(h) - δ_ij)/h

exist for all i, j. Equivalently, as h → 0, h > 0,

p_ij(h) = { h q_ij + o(h),      i ≠ j;
            1 + h q_ii + o(h),  i = j. }    (38)
Comparing this to equation (33) we see that the only difference between
the timehomogeneous and timeinhomogeneous cases is that the tran
sition rates qij are not allowed to change over time.
2. The transition rates are non-negative and finite for i ≠ j, and non-positive
when i = j, with

q_ii = -Σ_{j≠i} q_ij.

3. The matrix of transition probabilities P(t) satisfies Kolmogorov's forward and
backward equations

dP(t)/dt = P(t) Q = Q P(t).

Note that since q_ii = -Σ_{j≠i} q_ij, each row of the matrix Q has zero sum.
Example 5.5. Consider the Poisson process again. The rate at which
events occur is a constant λ, leading to

q_ij = { λ,   j = i + 1;
         0,   j ≠ i, i + 1;    (39)
         -λ,  j = i. }

The forward equations are

dp_i0(t)/dt = -λ p_i0(t),
dp_ij(t)/dt = -λ p_ij(t) + λ p_{i,j-1}(t),

with p_ij(0) = δ_ij. These equations are essentially the same as equations (22)
and (23).

The backward equations are

dp_ij(t)/dt = -λ p_ij(t) + λ p_{i+1,j}(t).
5.5 Applications
In this section we briefly discuss a number of applications of Markov jump
processes to actuarial modelling. In each case the models can be made time-homogeneous
by insisting that the transition rates are independent of time.
A more detailed discussion of the survival model is postponed to the next
chapters.
Figure 8: Transition graph for the survival model. Reproduced with permission
of the Faculty and Institute of Actuaries.
Figure 9: Transition graph for the sicknessdeath model. Reproduced with
permission of the Faculty and Institute of Actuaries.
the probability that an individual who is sick at time s will still be sick
at time t.
These are given in terms of the residual holding times as

P(R_s > t - s | X_s = H) = e^{-∫_s^t (σ(u) + μ(u)) du},

and

P(R_s > t - s | X_s = S) = e^{-∫_s^t (ρ(u) + ν(u)) du},

respectively, where σ, μ, ρ and ν denote the transition intensities H → S,
H → D, S → H and S → D.

We note that transition probabilities can be related to each other. For example,
the probability of a transition from state H at time s to state S at time t
is

p_HS(s, t) = ∫_0^{t-s} e^{-∫_s^{s+w} (σ(u) + μ(u)) du} σ(s + w) p_SS(s + w, t) dw.
This is interpreted as: the individual remains in the healthy state from time
s to time s + w, jumps to the sick state at time s + w, and is then sick again
at time t; the integral runs over all possible jump times w. The derivation of
this equation is beyond the scope of the course; however, similar expressions
can be written down intuitively.
This sickness-death model can be extended to include the length of time an
individual has been in state S. This leads to the so-called long-term care
model, where the rate of transition out of state S will depend on the current
holding time in state S.
Figure 10: Transition graph for the marriage model. Reproduced with per
mission of the Faculty and Institute of Actuaries.
This mathematical statement can be read as: the individual is in state B at
time s, where he either remains until time t - v, or jumps to states W or D
by time t - v. At time t - v he then jumps to state M and remains there
until time t.
References
The following texts were used in the preparation of this chapter and you are
referred there for further reading if required.
5.6 Summary
Markov jump processes are continuous-time and discrete-state-space stochastic
processes satisfying the Markov property. You should be familiar with
the Poisson, survival, sickness-death and marriage models.

The Poisson process is a simple Markov jump process. It is time-homogeneous
with stationary increments that are Poisson distributed; the increment over an
interval of length t has mean λt, where λ > 0 is the rate.
Waiting times between jumps are exponentially distributed with mean 1/λ.
As with Markov chains, transition probabilities exist for a general Markov
jump process:

p_ij(s, t) = P[X_t = j | X_s = i], where p_ij(s, t) ≥ 0 and s < t,

which must also satisfy the Chapman-Kolmogorov equations.
The quantities q_jj(t), q_kj(t) are the transition rates, such that

lim_{h→0} (p_jj(t, t + h) - 1)/h := q_jj(t),
lim_{h→0} p_kj(t, t + h)/h := q_kj(t),  for k ≠ j.
Kolmogorov's forward and backward equations are, respectively,

∂p_ij(s, t)/∂t = Σ_{k∈S} p_ik(s, t) q_kj(t)   and   ∂p_ij(s, t)/∂s = -Σ_{k∈S} q_ik(s) p_kj(s, t).
Questions
(a) Calculate the probability that there will be fewer than 1 claim on
a given day.
(b) Estimate the probability that another claim will be reported dur
ing the next hour. State all assumptions made.
(c) If there have not been any claims for over a week, calculate the
expected time before a new claim occurs.
Chapter 6
The two-state Markov survival model
Introduction
t p_x := 1 - t q_x is the probability that a life, aged x, survives for the
next t years.

q_x := 1 q_x;  p_x := 1 p_x.

μ_x := lim_{h→0+} (1/h) P[T_x ≤ h] is the force of mortality at age x, or
hazard rate.
Our approach in this section is based on the fact that a life is in one of two
states: alive or dead, and can only move from the state of being alive to the
state of being dead.
          μ_x
  Alive -------> Dead

Here x represents the age of the life. In other words, our model is a Markov
jump process with just two states, and the hazard rate μ_x is the transition
intensity from state Alive to state Dead.
The twostate Markov model makes the following assumptions:
The Markov assumption means that the only information that affects the
future lifetime of the life is its current age and state. Therefore, we ignore
any previous or current medical conditions or lifestyle factors, such as
smoking or exercise, and treat every life aged x as an identical life. The
Markov assumption also means that the probability of survival beyond age x + t
is independent of the probability of survival up to age x.
When studying mortality for a heterogeneous group, our results will reveal
only the average result for the group. Therefore the Markov assumption is
clearly a simplification.
In reality, the population could be split into homogeneous groups, taking into
account characteristics thought to have a significant effect on mortality. A
separate model could then be developed for each group. For example, we
could model mortality for a group of 40-year-old males who smoke and rarely
exercise. However, this would reduce the size of the population studied for
each model.
The third assumption means that the transition intensity is a step function,
with discrete steps at each birthday. This assumption is made to simplify
the model. In reality, the transition intensity is continuous, i.e. you are not
suddenly more likely to die aged 41 than you were aged 40 and 364 days. On
making this assumption, the most accurate approximation for μ_x is actually
to use μ_{x+0.5}.
Let us show how to derive this relationship directly from our three assumptions
above. By definition,

∂(t p_x)/∂t = lim_{dt→0+} (t+dt p_x - t p_x)/dt.

Since the Markov assumption means that the probability of survival after
age x + t is independent of the probability of survival up to age x, we have

t+dt p_x = t p_x · dt p_{x+t},

so

∂(t p_x)/∂t = lim_{dt→0+} (t p_x · dt p_{x+t} - t p_x)/dt.

Then, by assumption 2,

∂(t p_x)/∂t = lim_{dt→0+} (t p_x · (1 - μ_{x+t} dt + o(dt)) - t p_x)/dt.

Hence, since lim_{dt→0+} o(dt)/dt = 0 by definition of the small correction term,

∂(t p_x)/∂t = -t p_x · μ_{x+t}.

Hence, by integration, we obtain our required result, i.e.

t p_x = e^{-∫_0^t μ_{x+s} ds}.
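This formula is straightforward to evaluate numerically for any given force of mortality. A minimal sketch using the trapezium rule; the choice μ ≡ 0.01 is an assumed example, for which the survival probability reduces to e^{-0.01 t}:

```python
import math

def t_p_x(mu, t, n=10000):
    """t_p_x = exp(-integral_0^t mu(s) ds), via the trapezium rule."""
    h = t / n
    integral = (mu(0.0) + mu(t)) / 2 + sum(mu(k * h) for k in range(1, n))
    return math.exp(-integral * h)

# Constant force of mortality mu = 0.01 over ten years: t_p_x = e^{-0.1}
p = t_p_x(lambda s: 0.01, 10.0)
```

Any other intensity function can be passed in the same way, e.g. an increasing force of mortality.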
Clearly 0 ≤ a_i < b_i ≤ 1, since our observation is only for the period whilst
life i is aged x last birthday.
Definition 6.2. Define Di as the random variable which indicates whether
or not life i is observed to die during the observation.
Let Di = 1 if life i is observed to die during the observation, and 0 otherwise.
Definition 6.3. Define the waiting time, denoted by V_i, to be the actual
time of the observation for life i. Hence 0 < V_i ≤ b_i - a_i.

Hence V_i is equal to the age when the observation of life i actually ceases less
the age when the observation of life i starts.

V_i is a random variable which is a mixture of discrete and continuous distributions:
it is continuous for 0 < V_i < b_i - a_i, and discrete at V_i = b_i - a_i,
because the observation will definitely cease at this point if the life has not
already died at an earlier point. We refer to this end point, b_i - a_i, as having
a probability mass, i.e. there is a positive probability that the random variable V_i
will be exactly equal to b_i - a_i.
Consider the case where d_i = 0, i.e. life i is not observed to die. Then
v_i = b_i - a_i, and the probability of this is _{b_i-a_i}p_{x+a_i}.

Consider the case where d_i = 1, i.e. life i dies; then this life has waiting
time v_i < b_i - a_i, observed at the moment of death.

Let (d, v) denote a sample drawn from the distribution (D, V), and let f(d, v)
denote the joint probability function for (d, v).

Note: The observed waiting time, denoted by v, can also be referred to as
the central exposed to risk.

Our representation of the joint probability function f_i(d_i, v_i) can be extended
to derive f(d, v):

f(d, v) = Π_{i=1}^N μ^{d_i} e^{-μ v_i} = μ^d e^{-μv},  where d = Σ_{i=1}^N d_i and v = Σ_{i=1}^N v_i.
The probability distribution function for D_i is:

P[D_i = 0] = _{b_i-a_i}p_{x+a_i} = e^{-μ(b_i-a_i)},
P[D_i = 1] = _{b_i-a_i}q_{x+a_i} = 1 - _{b_i-a_i}p_{x+a_i} = 1 - e^{-μ(b_i-a_i)}.
Example 6.2. Investigation has shown that μ_60 = 0.01 for a certain population.
Consider a member of this population currently aged 60½. Find the
probability distribution function for observing this individual's death before
age 61, and the probability density function for the waiting period of this
observation.

Using notation consistent with that above, denote this life by i. Then:

μ = 0.01,  a_i = 0.5  and  b_i = 1.

Hence, the probability distribution function for D_i is

P[D_i = 0] = e^{-μ(b_i-a_i)} = e^{-0.01×0.5} ≈ 0.995,
P[D_i = 1] = 1 - e^{-μ(b_i-a_i)} ≈ 0.005,

and the probability density function for V_i is

f(v_i) = { 0.01 e^{-0.01 v_i},  0 < v_i < 0.5;
           e^{-0.005},          v_i = 0.5. }
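The numbers in Example 6.2 can be reproduced directly; the function name below is illustrative:

```python
import math

mu, a_i, b_i = 0.01, 0.5, 1.0             # values from Example 6.2

p_survive = math.exp(-mu * (b_i - a_i))   # P[D_i = 0] = e^{-0.005}
p_die = 1.0 - p_survive                   # P[D_i = 1]

def f_v(v):
    """Density (and mass at the endpoint) of the waiting time V_i."""
    if 0.0 < v < b_i - a_i:
        return mu * math.exp(-mu * v)       # continuous part
    if v == b_i - a_i:
        return math.exp(-mu * (b_i - a_i))  # probability mass at b_i - a_i
    return 0.0
```

Note the mixed distribution: integrating the continuous part over (0, 0.5) and adding the mass at 0.5 gives exactly 1.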
The maximum likelihood estimate of μ is

μ̂ = d/v.

Two useful properties of this model are:

1. E[D_i - μV_i] = 0;
2. Var[D_i - μV_i] = E[D_i].

Let us prove the first of these relationships. Since a p.d.f. always integrates
to 1, the relation (40) gives us the following:

∫_0^{b_i-a_i} μ e^{-μt} dt + e^{-μ(b_i-a_i)} = 1.

Differentiating with respect to μ gives

∫_0^{b_i-a_i} e^{-μt} dt - [∫_0^{b_i-a_i} μ t e^{-μt} dt + (b_i - a_i) e^{-μ(b_i-a_i)}] = 0.    (Equation A)

Now,

E[D_i] = 0 × P[D_i = 0] + 1 × P[D_i = 1] = ∫_0^{b_i-a_i} μ e^{-μt} dt,

and

E[V_i] = ∫_0^{b_i-a_i} t μ e^{-μt} dt + (b_i - a_i) e^{-μ(b_i-a_i)}.

Substituting these into Equation A gives E[D_i]/μ - E[V_i] = 0, i.e.
E[D_i - μV_i] = 0, as required.
0
By the Central Limit Theorem, since E[D_i - μV_i] = 0 and Var[D_i - μV_i] = E[D_i],
we have

E[(1/N)(D - μV)] = 0,
Var[(1/N)(D - μV)] = (1/N²) Var[D - μV] = (1/N²) E[D],

and we deduce that, approximately,

(1/N)(D - μV) ~ N(0, E[D]/N²).

Now consider

μ̂ - μ = D/V - μ = (D - μV)/V = (N/V) · (1/N)(D - μV).

Since lim_{N→∞} V/N = E[V_i] and E[V] = N E[V_i] (the V_i being i.i.d.), then
approximately

μ̂ - μ ~ (N/E[V]) N(0, E[D]/N²) = N(0, E[D]/E[V]²).

So, since E[D] = μ E[V],

μ̂ ~ N(μ, μ/E[V]).
3. Discuss whether or not your initial hypothesis appears valid.

Then, using integration by parts, E[V] = 1,000 × (1/0.4) × (1 - e^{-0.4×(1/12)}) = 81.96, and

P[μ̂ > 0.4103] = P[Z > (0.4103 - 0.4)/0.0699] = 1 - Φ(0.1474) = 1 - 0.5586 = 0.44,

where 0.0699 = √(μ/E[V]) = √(0.4/81.96).
6.6 Summary
The two-state Markov model treats a life as being in one of two states: alive
or dead. It relies on the validity of three assumptions:
The Markov approach is useful in that it can easily be extended to cope with
more than one state and/or transition intensity.
For a life i, we recorded our observation of the two-state model by using the
terms a_i, b_i, d_i, v_i. We then summarised our observation of a population
of size N using the notation d = Σ_{i=1}^N d_i and v = Σ_{i=1}^N v_i.
Questions
(a) Di =0
(b) Di =1
(a) ai
(b) bi
(c) Vi
and assuming that all lives are independent, state the joint probability
function for (D, V), where D = Σ_{i=1}^N D_i and V = Σ_{i=1}^N V_i.
³ Attrition is the percentage of employees leaving a firm, over a defined period, usually
a year.
7. A university is assessing its drop-out rates for the first year of a three-year
degree course, and believes μ = 0.15. At the end of the first term,
there are 50 students on the course. Assuming that all three terms are
of equal length, that students' decisions to drop out of a course are independent,
and that terms run back to back, determine the following:
10. An investigation took place into the mortality of males aged between
60 and 61 years suffering from angina. The table below summarises the
results of this investigation. For each person it gives the ages at which
observation of this life began (Start), the age at which observation
ceased (End) and the reason for it ceasing (Reason) i.e. D= observation
ceased due to death, W=withdrew from observation for reason other
than death.
Chapter 7
The multiple-state Markov model
Introduction
In the previous chapter, we introduced the two-state Markov model. In this
chapter we extend this model to consider scenarios where more than two states
exist and/or where movement between states is two-directional. Examples
of where such models can be used include:

Income protection (IP): This type of insurance plan is designed to provide
a replacement income when your salary stops because you are unable to
work due to illness. The diagram below illustrates the relevant states for
IP and the possible transitions between these states.
          σ_x
  able <=======> ill
          ρ_x
     \           /
  μ_x \         / ν_x
       v       v
         dead
Cancer patients: The states to consider for cancer patients can vary
depending on the research being carried out. At their simplest, four states
can be considered: able, ill, remission and dead. On a more complex
level, states can be constructed so as to take into account more complex
detail such as family history and genetic testing.
Let μ_x^{gh} denote the transition intensity from state g to state h at age x.

t p_x^{gh} = P[in state h at age x + t | in state g at age x], i.e. the probability that
a life in state g at age x will be in state h at age x + t.

t p_x^{gg} = P[in state g at age x + t | in state g at age x], i.e. the probability that
a life in state g at age x will be in state g at age x + t.

t p̄_x^{gg} = P[in state g from age x to age x + t | in state g at age x], i.e. the
probability that a life in state g at age x will remain in state g until age
x + t.
The difference between t p_x^{gg} and t p̄_x^{gg} may not be immediately clear. For t p_x^{gg}
we are only concerned with the probability that the life is in state g at age x + t, given
it is in state g at age x. We are not concerned about the states the
life has been in between these ages, i.e. it could have left state g and returned
to it during the period. For t p̄_x^{gg} we are considering the probability that a life
aged x, and in state g, remains in state g until age x + t. It is possible that
for some models t p_x^{gg} and t p̄_x^{gg} are equal; however, this is only the case when
either you cannot re-enter the state once you have left it, or you are unable
to leave the state. Let us illustrate this by considering two examples.
Example 7.1. Let us consider the a = able, i = ill, d = dead states. Then t p_x^{aa}
is not equal to t p̄_x^{aa}, since it is possible that the life becomes ill and recovers
between ages x and x + t.

Example 7.2. Let us consider a model for cancer sufferers, where a = able,
i = ill, r = remission, c = recovered and d = dead. Transitions are one-directional:
able → ill, ill → remission and remission → recovered, while each of the states
able, ill and remission also has a transition to dead.

Then t p̄_x^{aa} is equal to t p_x^{aa}, since you cannot re-enter this state after leaving
it.
Let us now state the three assumptions that are the foundation of the
multiplestate Markov model.
2. For any two distinct states, g and h, over a short time interval dt:

dt p_{x+t}^{gh} = μ_{x+t}^{gh} dt + o(dt)  for t > 0,

where o(dt) is a small correction term, which allows for the probability
that a life makes any two or more transitions in the time period dt.
The Kolmogorov forward equations for the model are

∂(t p̄_x^{gg})/∂t = -t p̄_x^{gg} Σ_{j≠g} μ_{x+t}^{gj},

∂(t p_x^{gh})/∂t = Σ_{j≠h} (t p_x^{gj} μ_{x+t}^{jh} - t p_x^{gh} μ_{x+t}^{hj}).

Example 7.3. Let us prove the equation

∂(t p_x^{gh})/∂t = Σ_{j≠h} (t p_x^{gj} μ_{x+t}^{jh} - t p_x^{gh} μ_{x+t}^{hj})

directly.
Start with the definition of the partial derivative:

∂(t p_x^{gh})/∂t = lim_{dt→0+} (t+dt p_x^{gh} - t p_x^{gh})/dt.

Now consider t+dt p_x^{gh}, the probability that a life in state g at age x will be
in state h at age x + t + dt. Let us split this probability by considering the
state the life is in at age x + t: it can either be in state h, or in some
state j other than h, i.e.

t+dt p_x^{gh} = t p_x^{gh} · dt p_{x+t}^{hh} + Σ_{j≠h} t p_x^{gj} · dt p_{x+t}^{jh}.

By applying the second assumption listed above for the multiple-state model,

dt p_{x+t}^{jh} = μ_{x+t}^{jh} dt + o(dt),

and by noting that

dt p_{x+t}^{hh} = 1 - Σ_{j≠h} dt p_{x+t}^{hj} = 1 - Σ_{j≠h} μ_{x+t}^{hj} dt + o(dt),

we can derive the relationship

t+dt p_x^{gh} = t p_x^{gh} (1 - Σ_{j≠h} μ_{x+t}^{hj} dt + o(dt)) + Σ_{j≠h} t p_x^{gj} (μ_{x+t}^{jh} dt + o(dt)).

Hence

∂(t p_x^{gh})/∂t = lim_{dt→0+} [t p_x^{gh} (1 - Σ_{j≠h} μ_{x+t}^{hj} dt + o(dt)) + Σ_{j≠h} t p_x^{gj} (μ_{x+t}^{jh} dt + o(dt)) - t p_x^{gh}] / dt
               = Σ_{j≠h} (t p_x^{gj} μ_{x+t}^{jh} - t p_x^{gh} μ_{x+t}^{hj}),

as required.
Example 7.4. The diagram below outlines the three-state model, a = able,
i = ill and d = dead. Write down equations for t p̄_x^{ii} and ∂(t p_x^{ai})/∂t.

t p̄_x^{ii} = exp(-∫_0^t (μ_{x+s}^{ia} + μ_{x+s}^{id}) ds),

∂(t p_x^{ai})/∂t = t p_x^{aa} μ_{x+t}^{ai} - t p_x^{ai} μ_{x+t}^{ia} + t p_x^{ad} μ_{x+t}^{di} - t p_x^{ai} μ_{x+t}^{id}.

Since μ_{x+t}^{di} = 0, this simplifies to

∂(t p_x^{ai})/∂t = t p_x^{aa} μ_{x+t}^{ai} - t p_x^{ai} (μ_{x+t}^{ia} + μ_{x+t}^{id}).
Consider the two-decrement model with states 0, 1 and 2, and constant
transition intensities μ^{01} (from state 0 to state 1) and μ^{02} (from state 0 to
state 2):

  1  <--μ^{01}--  0  --μ^{02}-->  2

Note that this case is simplified by the fact that the transitions are one-directional
only and you cannot revisit states.

Then the Kolmogorov equations for the model can be solved by noting the
following:

t p_x^{00} = 1 - (t p_x^{01} + t p_x^{02}), since if you are not in state 0 you must be in
either state 1 or state 2.

t p_x^{00} = t p̄_x^{00}.

t p̄_x^{00} = exp(-∫_0^t (μ^{01} + μ^{02}) ds), from the previous section.

Hence,

t p_x^{00} = e^{-(μ^{01}+μ^{02}) t},

and

t p_x^{02} = (μ^{02}/(μ^{01} + μ^{02})) [1 - e^{-(μ^{01}+μ^{02}) t}].
          σ_x
  able <=======> ill
          ρ_x
     \           /
  μ_x \         / ν_x
       v       v
         dead
Let us make some observations about the terms we have defined above and
then use an example to reinforce them.
Life i may enter and leave both the able and ill states several times during
the period. V_i and W_i represent the total waiting times spent in the able
and ill states respectively. Since we are assuming the transition intensities are
constant, this is an acceptable approach.

At the end of the observation the life has either died or not died; hence
D_i + U_i is either 0 or 1. Both D_i and U_i can be either 0 or 1, but both cannot
be 1, as you die either from the able state or from the ill state, but not
both.
Definition 7.3. Let us use the following notation for the transition intensities
of the illness-death model:

Let μ denote the transition intensity μ^{ad}.
Let σ denote the transition intensity μ^{ai}.
Let ρ denote the transition intensity μ^{ia}.
Let ν denote the transition intensity μ^{id}.

Note: ν is the Greek letter nu, and is different from the letter v, used to
denote the total sample observed waiting time in the able state.
Example 7.5. Consider two lives, life A and life B, who take part in an investigation to establish an illness-death model for the age interval 40 to 41. Summaries of the observation of these lives are provided below.
Life A: Joins the investigation age 40 years 2 months in an ill state. Recovers
age 40 years 5 months. Becomes ill again age 40 years 7 months and dies age
40 years 9 months.
Life B: Joins the investigation age 40 years in a well state. Falls ill age 40
years 5 months. Recovers age 40 years 9 months and remains well until age
41.
The parameters for these sample lives are summarised in the table below.

Life   v_i    w_i    s_i  r_i  d_i  u_i
A      2/12   5/12   1    1    0    1
B      8/12   4/12   1    1    0    0
The likelihood function for the four parameters is proportional to:

$$L(\mu, \nu, \sigma, \rho) = e^{-(\mu+\sigma)v}\, e^{-(\nu+\rho)w}\, \mu^d\, \nu^u\, \sigma^s\, \rho^r$$

This relationship is derived using a similar approach to that seen in chapter 5. Let us consider life i. Then using the relationship:

$${}_t\bar p^{gg}_x = \exp\left(-\int_0^t \sum_{j\neq g}\mu^{gj}_{x+s}\,ds\right)$$

$$L(\mu, \nu, \sigma, \rho) = e^{-(\mu+\sigma)v}\, e^{-(\nu+\rho)w}\, \mu^d\, \nu^u\, \sigma^s\, \rho^r,$$

which enables us to derive the maximum likelihood estimators:

$$\hat\mu = \frac{D}{V}, \quad \hat\nu = \frac{U}{W}, \quad \hat\sigma = \frac{S}{V}, \quad \hat\rho = \frac{R}{W}$$
Example 7.6. Imagine we have access to the following data from a six-month study into the sickness of men aged between 95 and 96. The investigation started with 10,000 healthy lives aged 95. During the six months we observe the following movements:

1,700 deaths are observed: 30% of which are from a state of ill-health.

2,600 illnesses are diagnosed during the period, and 780 recoveries from these illnesses observed.

4,300 lives were censored during the investigation, for various reasons, including turning 96.

There were 4,000 lives under observation when the investigation terminated.

Then we can estimate the following statistics in the Markov sickness model.

$$v + w = 10{,}000 \times \tfrac{6}{12} - 4{,}300 \times \tfrac{3}{12} - 1{,}700 \times \tfrac{3}{12} = 3{,}500$$

$$w = 2{,}600 \times \tfrac{4}{12} \times \left(\tfrac{2}{6} + \tfrac{1}{6}\left(\tfrac{7}{8} + \tfrac{5}{8} + \tfrac{3}{8} + \tfrac{1}{8}\right)\right) = 577.78$$

$$v = 3{,}500 - 577.78 = 2{,}922.22$$

$$s = 2{,}600, \quad r = 780, \quad d = 1{,}190, \quad u = 510$$

Therefore, the maximum likelihood estimates for these parameters, based on this observation, are:

$$\hat\mu = \frac{1{,}190}{2{,}922.22} = 0.4072, \quad \hat\nu = \frac{510}{577.78} = 0.8827,$$
$$\hat\sigma = \frac{2{,}600}{2{,}922.22} = 0.8897, \quad \hat\rho = \frac{780}{577.78} = 1.3500$$
As the example above illustrates, it is sometimes necessary to estimate v and
w. There are two reasons for this:
Practical applications
The Markov model allows us to reduce the problem of analysing the morbidity (the rate of incidence of illness) and mortality experience to the problem of modelling the transitions between states. Such models offer a practical, intuitive and essentially tractable solution to many problems.
Where the transition intensities do not depend on the length of time in the current state, we apply a standard Markov model. If the intensities do depend on the length of time in the current state, we require a semi-Markov model. The transition intensities can be estimated from transition data, and with them we can calculate any probabilities by solving some differential equations.
Once we have calculated the probabilities of transition we can value all sorts of insurance benefits. One important and quite controversial example is the use of a Markov multiple state model to explore the effect of genetic information on insurance and the risk of adverse selection.
Definition 7.4. In an actuarial context, selection is the process by which lives in a population are divided into separate homogeneous groups. Where an individual has the ability to influence the group they fall into, adverse selection can occur.

For example, if an individual's parents have both died at an early age from heart disease, they may choose to take out additional life insurance. If insurance company A does not take into account family history when setting premiums, but other insurance companies do, the individual may approach company A to try to avoid higher premiums. Company A is then subject to adverse selection.
Insurers were extremely worried about the effect of adverse selection on their
solvency. Adverse selection usually arises from an information asymmetry,
where the policyholder exploits the additional information he holds about
his own claim risk. Models were used to explore the idea that the risk of
adverse selection could, to a large extent, be managed by restrictions on the
level of death benefits under insurance policies (which are very sensitive to
mortality experience). If large sums insured were only available to applicants
prepared to undergo genetic testing, then the adverse selection issues were
not as daunting as the insurance industry had assumed.
Since 1999 the Genetics and Insurance Research Centre, based at Heriot-Watt University in Edinburgh, has continued to apply Markov and semi-Markov models to specific disorders having a genetic component, including breast and ovarian cancer, Alzheimer's disease and Huntington's disease. Such
research is used when pricing critical illness insurance, which pays a benefit
in the event that the policyholder suffers from one of a list of diseases within
a specified term. The implications of adverse selection for critical illness
insurance have been explored in many of their papers, and the work assesses
what information is required for a potential policyholder to be eligible for
insurance, and at what premium.
Multiple state models are also highly appropriate for modelling policyholder
or investor behaviour, for example life insurance lapse experience.
They are also used for econometric models, for example, analysing stock data or the state of the economy. The mathematics is somewhat more complex, but it also opens up opportunities for more advanced financial engineering applications.
7.5 Summary

The multiple state model relies on the following assumptions:

$\mu^{gh}_{x+t}$ is constant for $x$ an integer and $0 \le t < 1$.

$$\frac{\partial}{\partial t}\,{}_t\bar p^{gg}_x = -\,{}_t\bar p^{gg}_x \sum_{j\neq g}\mu^{gj}_{x+t}$$

We can find the probability of remaining in the same state over a period t:

$${}_t\bar p^{gg}_x = \exp\left(-\int_0^t \sum_{j\neq g}\mu^{gj}_{x+s}\,ds\right)$$
Questions

1. The model below represents a multiple state model for an Income Protection Plan, where there are 3 states: a = able, i = ill and d = dead. Outline the following terms in words:

(a) ${}_tp^{ad}_x$

(b) ${}_tp^{ii}_x$

(c) ${}_t\bar p^{ii}_x$

(d) ${}_tp^{dd}_x$

[Diagram: the illness-death model. Transitions: able → ill with intensity $\sigma_x$, ill → able with intensity $\rho_x$, able → dead with intensity $\mu_x$, ill → dead with intensity $\nu_x$.]
2. Consider a multiple state model with states n = non-smoker, s = smoker and d = dead. State whether the following relationships are true or false:

(a) ${}_tp^{nn}_x = {}_t\bar p^{nn}_x$

(b) ${}_tp^{ss}_x = {}_t\bar p^{ss}_x$

3. Prove the relationship $\frac{\partial}{\partial t}\,{}_t\bar p^{gg}_x = -\,{}_t\bar p^{gg}_x\sum_{j\neq g}\mu^{gj}_{x+t}$.
4. Consider three lives, lives A, B and C, who take part in an investigation to establish an illness-death model for the age interval 70 to 71. The results of the observation of these lives are outlined below. Summarise the results from these observed samples using the appropriate notation (e.g. $v_i$, $w_i$ etc.).

(a) Life A: Joins observation age 70 when ill. Recovers age 70 years 1 month. Falls ill again age 70 years 2 months. Recovers age 70 years 4 months. Leaves study age 70 years 11 months.
(b) Life B: Joins observation age 70 years 3 months when well. Dies
age 70 years 10 months.
(c) Life C: Joins observation age 70 and well. Falls ill age 70 years
3 months. Recovers age 70 years 9 months. Falls ill again age 70
years 10 months. Dies age 70 years 11 months.
5. Define W , D and U .
6. An investigation observed the following:

72 subjects die during the observation: 62 of whom were ill at the time of death.

Subjects were observed for 1,304 years (treating each life as a separate contribution to the total): for 503 of these years the subjects were in an ill state.

Assuming the transition intensities are constant for this age band, draw a diagram to represent the Markov model for sickness and use the observation above to estimate the relevant transition intensities.
A Chapter 1 solutions
1. What input parameters might you have for a model assessing the cost
of providing life assurance?
date of birth
sex
smoker/non-smoker indicator
height
weight
married
profession
units of alcohol consumed per week
family history of illness
criminal record indicator
The factors above will help a life company to accurately assess the
likelihood of an individual dying during the period the life cover is in
force. In addition to this the sum assured (i.e. the amount which will
be paid out on death) and the period of cover may also be an input
parameter, chosen by the customer rather than specified by the insurer.
(d) Capture real world system: The initial model should be de
scribed so as to capture the main feature of the real world system.
The level of detail in the model can always be reduced at a later
stage.
(e) Expertise: Involve the experts on the relevant system or process. They will be able to give feedback on the validity of the model before it is developed further.
(f) Choose computer program: Decide on whether the model
should be built using a simulation package or a general purpose
language.
(g) Write model: Write the computer program for the model.
(h) Debug the program.
(i) Test model output: Test the reasonableness of the output from
the model. The experts on the relevant system or process should
be involved at this stage.
(j) Review and amend: Review and carefully consider the appro
priateness of the model in the light of small changes in input
parameters.
(k) Analyse output: Analyse the output from the model.
(l) Document and communicate: Document and communicate
the results of the model, including an appropriate amount of de
tails around the methodology used.
3. At what stage in the modelling process should you debug your program?
4. Briefly discuss the issues to consider when assessing the suitability of
a model.
6. What does sensitivity analysis mean?
A Turing test involves experts in the real world system analysing the
output from a model against real life data from the system the model
is trying to recreate. The experts are asked to compare several sets
of data: some from the model and some from the real world system.
They are not told which data is which. The experts are then asked
whether they can differentiate between the two sets of data. If they are
able to differentiate, their method of differentiation can then be used
to adapt and improve the model.
(d) his future investment returns are 6% p.a.

$10{,}000\, a_{\overline{20}|} = 114{,}699$, where $i = 6\%$ p.a.

(e) his future investment returns are 8% p.a.

$10{,}000\, a_{\overline{20}|} = 98{,}181$, where $i = 8\%$ p.a.
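Both values follow from the annuity-immediate formula $a_{\overline{n}|} = (1-(1+i)^{-n})/i$; a quick check (a sketch, not part of the original solution):

```python
def annuity_immediate(i, n):
    """Present value of 1 p.a. paid in arrears for n years at rate i."""
    return (1 - (1 + i) ** -n) / i

print(round(10_000 * annuity_immediate(0.06, 20)))  # 114699
print(round(10_000 * annuity_immediate(0.08, 20)))  # 98181
```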
B Chapter 2 solutions

1. Use Excel's RAND() function to generate 30 U(0,1) random variates. Use them to estimate the integral of $f(x) = xe^{-2x} + 1$ over the domain $x \in [0, 1]$.

$(u_1, \ldots, u_{30}) = (0.587, 0.030, 0.048, 0.593, 0.165, 0.788, 0.714, 0.265, 0.712, 0.630, 0.569, 0.766, 0.638, 0.984, 0.721, 0.028, 0.726, 0.218, 0.792, 0.656, 0.155, 0.447, 0.224, 0.478, 0.113, 0.493, 0.980, 0.320, 0.524, 0.933)$

Using these numbers, the estimator of the integral is equal to $\frac{1}{M}\sum_{m=1}^M f(u_m) = 1.147872$. The integral I can be evaluated as

$$I = \int_0^1 (xe^{-2x} + 1)\,dx = \left[-\frac{x}{2}e^{-2x} - \frac{1}{4}e^{-2x} + x\right]_{x=0}^{x=1} = \frac{5}{4} - \frac{3}{4e^2} = 1.148499.$$
The sample variance $\hat\sigma_f^2 = \frac{1}{M-1}\sum_{m=1}^M \left(f(u_m) - \hat I_M\right)^2$ is an unbiased estimator of the variance $\mathrm{Var}(f(U))$. Let z be such that $\Phi(z) = 0.975$. From the table of the standard normal distribution function $\Phi$, we then get the value z = 1.96. This implies that

$$0.95 = P\left(\hat I_M - 1.96\,\frac{\hat\sigma_f}{\sqrt M} \le I \le \hat I_M + 1.96\,\frac{\hat\sigma_f}{\sqrt M}\right),$$

i.e. $\left[\hat I_M - 1.96\,\frac{\hat\sigma_f}{\sqrt M},\ \hat I_M + 1.96\,\frac{\hat\sigma_f}{\sqrt M}\right]$ is a 95% confidence interval for I. In the situation of Question 1, we obtain the interval [1.1316, 1.1642], which contains both the true value I = 1.148499 and the point estimate 1.147872. Here the standard deviation is $\hat\sigma_f = 0.045532$.
3. Write an algorithm to generate $M \in \mathbb{N}$ random variates from the double exponential distribution with density function $f(x) = \frac{\lambda}{2}e^{-\lambda|x|}$ for $x \in \mathbb{R}$, where $\lambda > 0$, using the inverse-transform method.

Answer: The corresponding distribution function is given by

$$F(x) = \int_{-\infty}^x f(y)\,dy = \begin{cases} \frac12 e^{\lambda x}, & \text{if } x < 0, \\ 1 - \frac12 e^{-\lambda x}, & \text{if } x \ge 0. \end{cases}$$

We note that F(x) < 1/2 if and only if x < 0. The inverse $F^{-1}$ is easily derived as

$$F^{-1}(y) = \begin{cases} \frac{1}{\lambda}\log(2y), & \text{if } 0 < y < \frac12, \\ -\frac{1}{\lambda}\log(2(1-y)), & \text{if } \frac12 \le y < 1. \end{cases}$$

We can use the following algorithm:

(a) Generate a sequence of independent random variates $u_1, \ldots, u_M$ from the uniform distribution U(0,1).

(b) If $u_m < 1/2$ then let $x_m = \frac{1}{\lambda}\log(2u_m)$, otherwise let $x_m = -\frac{1}{\lambda}\log(2(1 - u_m))$, $(m = 1, \ldots, M)$.

(c) Return the sequence $x_1, \ldots, x_M$.
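The algorithm translates directly into code (a sketch; the values of λ and M are example choices of ours):

```python
import math
import random

def double_exponential_sample(lam, M, rng):
    """Inverse-transform sampling from f(x) = (lam/2) * exp(-lam*|x|)."""
    out = []
    for _ in range(M):
        u = rng.random()
        if u < 0.5:
            out.append(math.log(2 * u) / lam)
        else:
            out.append(-math.log(2 * (1 - u)) / lam)
    return out

rng = random.Random(1)
xs = double_exponential_sample(lam=2.0, M=50_000, rng=rng)
mean = sum(xs) / len(xs)
print(round(mean, 2))  # the density is symmetric about 0, so the mean is near 0
```

As a sanity check, the sample second moment should be close to $2/\lambda^2 = 0.5$ for $\lambda = 2$.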
4. Demonstrate that the acceptance-rejection method algorithm does generate random variates from the distribution function with density function f(x).

Answer: The answer requires us to formulate the algorithm with the help of random variables rather than their realisations. Let U and Z be independent random variables, where we assume that $U \sim U(0,1)$ and that Z has a suitable density h(x). We consider a probability density f(x) and set

$$C = \sup_x \frac{f(x)}{h(x)}, \qquad g(x) = \frac{f(x)}{C\,h(x)}.$$

Here we use that $0/0 := 0$ and $1/0 := \infty$. We assume that C is finite. The algorithm for the simulation of a random variable X from density f(x) states that we return X = Z if $U \le g(Z)$ and reject otherwise. So we have to prove that the distribution of Z given $U \le g(Z)$ has the density f(x). For this, it suffices to show that the distribution function $P(Z \le z \mid U \le g(Z))$ coincides with the distribution function $\int_{-\infty}^z f(x)\,dx$ for arbitrary (fixed) $z \in \mathbb{R}$. Using the properties of conditional expectations, we get

$$P(U \le g(Z)) = E\left[I_{\{U \le g(Z)\}}\right] = E\left(E\left[I_{\{U \le g(Z)\}} \mid Z\right]\right) = E(g(Z)) = \int_{-\infty}^{\infty} g(x)h(x)\,dx = \frac{1}{C}.$$

Note that I denotes an indicator function, where, for any event A, we have $I_A(\omega) = 1$ if $\omega \in A$ and $I_A(\omega) = 0$ otherwise. Similarly to the above,

$$P(Z \le z,\ U \le g(Z)) = E\left[I_{\{Z\le z,\ U\le g(Z)\}}\right] = E\left[I_{\{Z\le z\}} I_{\{U\le g(Z)\}}\right] = E\left(E\left[I_{\{Z\le z\}} I_{\{U\le g(Z)\}} \mid Z\right]\right)$$
$$= E\left(I_{\{Z\le z\}}\, E\left[I_{\{U\le g(Z)\}} \mid Z\right]\right) = E\left(I_{\{Z\le z\}}\, g(Z)\right) = \int_{-\infty}^z g(x)h(x)\,dx = \int_{-\infty}^z \frac{f(x)}{C\,h(x)}\, h(x)\,dx = \frac{1}{C}\int_{-\infty}^z f(x)\,dx.$$

Dividing the two probabilities, we obtain $P(Z \le z \mid U \le g(Z)) = \int_{-\infty}^z f(x)\,dx$, as required.
5. Estimate the proportion of computed numbers that would be rejected from the acceptance-rejection method if $f(x) = \frac{3}{32}(x-1)(5-x)$ for $1 \le x \le 5$ and h(x) is the density function for the U(1,5) distribution.

Answer: Let us use the notation from the answer of Question 4. The proportion of computed numbers that would be rejected is estimated by the probability $P(U > g(Z)) = 1 - P(U \le g(Z)) = 1 - \frac{1}{C}$. So we have to evaluate the constant C:

$$C = \sup_x \frac{f(x)}{h(x)} = \sup_{x\in(1,5)} \frac{\frac{3}{32}(x-1)(5-x)}{\frac14} = \frac{3}{8}\sup_{x\in(1,5)} (x-1)(5-x) = \frac{3}{8}\cdot 4 = \frac{3}{2},$$

since $(x-1)(5-x)$ attains its maximum value 4 at x = 3. Hence the proportion rejected is $1 - \frac{1}{C} = 1 - \frac{2}{3} = \frac{1}{3}$.
random variates $(u_1^{(1)}, u_2^{(1)})$, $(u_1^{(2)}, u_2^{(2)})$, and $(u_1^{(3)}, u_2^{(3)})$ given by (0.686, 0.033), (0.248, 0.046), and (0.229, 0.617). We simply return

$$z_1^{(j)} = 10 + 4\sqrt{-2\ln u_1^{(j)}}\,\cos\left(2\pi u_2^{(j)}\right), \qquad z_2^{(j)} = 10 + 4\sqrt{-2\ln u_1^{(j)}}\,\sin\left(2\pi u_2^{(j)}\right),$$

for all j = 1, 2, 3, i.e. 13.398, 10.715, 16.403, 11.904, 4.906, and 5.394. Since we have generated 6 random variates, one of them (say the last one) can be dropped.
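The computation can be checked with a short script implementing the Box-Muller transform for N(10, 4^2) variates (a sketch):

```python
import math

def box_muller(u1, u2, mu=10.0, sigma=4.0):
    """Turn two independent U(0,1) variates into two independent
    N(mu, sigma^2) variates via the Box-Muller transform."""
    r = sigma * math.sqrt(-2.0 * math.log(u1))
    return (mu + r * math.cos(2 * math.pi * u2),
            mu + r * math.sin(2 * math.pi * u2))

pairs = [(0.686, 0.033), (0.248, 0.046), (0.229, 0.617)]
zs = [z for pair in pairs for z in box_muller(*pair)]
print([round(z, 3) for z in zs])  # matches the six values quoted above
```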
C Chapter 3 solutions
1. Let X be a random variable from the continuous uniform distribution,
X U (0.5, 1.0). Starting with the probability density function, derive ex
pressions for the cumulative distribution function, expectation and variance
of X.
(b) $p(a_1) = 0.4$; $p(a_2) = 0.3$; $p(a_3) = 0.1$; $p(a_4) = 0.1$; $p(a_5) = 0.1$;

(c) $p(a_1) = 0.4$; $p(a_2) = 0.3$; $p(a_3) = 0.2$; $p(a_4) = -0.1$; $p(a_5) = 0.1$;

(d) $p(a_1) = 0.4$; $p(a_2) = 0.3$; $p(a_3) = 0.2$; $p(a_4) = 0.1$; $p(a_5) = 0.1$.

In parts (a) and (d), the value of this sum is equal to 0.8 and 1.1, respectively, which means that P is not a probability distribution and therefore $(\Omega, \mathcal{A}, P)$ is not a probability space.

We further know that probabilities can never be negative. In part (c), we have $p(a_4) = -0.1$, which means that $(\Omega, \mathcal{A}, P)$ is not a probability space.

In part (b), $(\Omega, \mathcal{A}, P)$ is indeed a probability space, since here all requirements are met.
$$\mathrm{Corr}(R_A, R_B) = \frac{\mathrm{Cov}(R_A, R_B)}{\sqrt{\mathrm{Var}(R_A)\,\mathrm{Var}(R_B)}},$$

where $\mathrm{Cov}(R_A, R_B) = E(R_A R_B) - E(R_A)E(R_B)$ is the covariance between $R_A$ and $R_B$. We have

Note that % and %% stand for 1/100 and $1/100^2$, respectively. Using the values above, we obtain
$$A := \{\omega_1, \omega_4\}, \qquad B := \{\omega_2, \omega_4\}, \qquad C := \{\omega_3, \omega_4\}.$$

Prove that the pairs (A, B), (A, C) and (B, C) are independent, but the triple (A, B, C) is not mutually independent according to Definition 3.4.

Answer: We have

$$P(A) = P(B) = P(C) = \frac14 + \frac14 = \frac12,$$
$$P(A \cap B) = P(\{\omega_4\}) = \frac14 = P(A)P(B),$$
$$P(A \cap C) = P(\{\omega_4\}) = \frac14 = P(A)P(C),$$
$$P(B \cap C) = P(\{\omega_4\}) = \frac14 = P(B)P(C),$$

which shows that the pairs (A, B), (A, C) and (B, C) are independent. However,

$$P(A \cap B \cap C) = P(\{\omega_4\}) = \frac14 \ne \frac18 = P(A)P(B)P(C).$$

So the triple (A, B, C) is not mutually independent.
D Chapter 4 solutions

1. Consider a Markov chain with state space S = {0, 1, 2} and transition matrix

$$P = \begin{pmatrix} p & q & 0 \\ 1/4 & 0 & 3/4 \\ p - 1/2 & 7/10 & 1/5 \end{pmatrix}.$$

Answer: (a) The sum of all entries in the last row must be equal to 1, as a consequence of which $p = 1 - \frac15 - \frac{7}{10} + \frac12 = \frac35$. In view of the first row, we see that $q = \frac25$.

(b) [Transition diagram: 0 → 0 with probability 3/5, 0 → 1 with 2/5; 1 → 0 with 1/4, 1 → 2 with 3/4; 2 → 0 with 1/10, 2 → 1 with 7/10, 2 → 2 with 1/5.]
evaluated using the property $P^{k+\ell} = P^k P^\ell$, $(k, \ell \in \mathbb{N})$. E.g. the calculation of $P^4 = (P^2)^2$ does not require the calculation of $P^3$.

(d) It can be shown that the only stationary distribution is given by

$$\pi = (\pi_1, \pi_2, \pi_3) = \left(\frac{55}{179}, \frac{64}{179}, \frac{60}{179}\right) \approx (0.30726, 0.35754, 0.33520).$$

Indeed this follows, if we solve the linear equations $\pi P = \pi$ for $\pi_1, \pi_2, \pi_3 \in [0, 1]$ with $\pi_1 + \pi_2 + \pi_3 = 1$. More precisely, we have

$$\tfrac35\pi_1 + \tfrac14\pi_2 + \tfrac{1}{10}\pi_3 = \pi_1, \qquad (41)$$
$$\tfrac25\pi_1 + 0\cdot\pi_2 + \tfrac{7}{10}\pi_3 = \pi_2, \qquad (42)$$
$$0\cdot\pi_1 + \tfrac34\pi_2 + \tfrac15\pi_3 = \pi_3, \qquad (43)$$
$$\pi_1 + \pi_2 + \pi_3 = 1. \qquad (44)$$

From (43) it follows that $\pi_3 = \frac{15}{16}\pi_2$. Using this in (42), we see that $\pi_2 = \frac{64}{55}\pi_1$ and, in turn, $\pi_3 = \frac{12}{11}\pi_1$. In view of (44), we then get $\pi_1\left(1 + \frac{64}{55} + \frac{12}{11}\right) = 1$, i.e. $\pi_1 = \frac{55}{179}$. From the above, we then obtain the remaining values $\pi_2$ and $\pi_3$ as indicated. We did not use (41). This equation must be valid, since P is a stochastic matrix. Therefore, (41) can be used to check our solution.
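The stationary distribution can also be verified exactly with rational arithmetic (a sketch, using p = 3/5 and q = 2/5 as found in part (a)):

```python
from fractions import Fraction as F

# Transition matrix of Question 1 with p = 3/5, q = 2/5.
P = [[F(3, 5), F(2, 5), F(0)],
     [F(1, 4), F(0), F(3, 4)],
     [F(1, 10), F(7, 10), F(1, 5)]]

pi = [F(55, 179), F(64, 179), F(60, 179)]

# pi is stationary iff pi P = pi and the entries sum to 1.
piP = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
print(piP == pi, sum(pi) == 1)  # True True
```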
Suppose the equation is true for an $N \in \mathbb{N}$; then, using the Markov property of $\{X_k\}$, we get

$$P(X_0 = j_0, X_1 = j_1, \ldots, X_{N+1} = j_{N+1})$$
$$= P(X_{N+1} = j_{N+1} \mid X_0 = j_0, X_1 = j_1, \ldots, X_N = j_N)\, P(X_0 = j_0, X_1 = j_1, \ldots, X_N = j_N)$$
$$= P(X_{N+1} = j_{N+1} \mid X_N = j_N)\, P(X_0 = j_0) \prod_{n=0}^{N-1} p_{j_n, j_{n+1}}(n, n+1)$$
$$= p_{j_N, j_{N+1}}(N, N+1)\, P(X_0 = j_0) \prod_{n=0}^{N-1} p_{j_n, j_{n+1}}(n, n+1)$$
$$= P(X_0 = j_0) \prod_{n=0}^{N} p_{j_n, j_{n+1}}(n, n+1),$$

completing the induction.
(ii) Calculate the probability that a policyholder who is currently at level 2 will be at level 2 after: (a) one year; (b) two years.

Answer:

(i) It is clear that X(t) is a Markov chain; knowing the present state, any additional information about the past is irrelevant for predicting the next transition.
Then the transition matrix is given by

$$P = \begin{pmatrix} 0.15 & 0.85 & 0 & 0 \\ 0.15 & 0 & 0.85 & 0 \\ 0.03 & 0.12 & 0 & 0.85 \\ 0 & 0.03 & 0.12 & 0.85 \end{pmatrix}.$$

(ii) (a) For the one year transition $p_{22} = 0$, since with probability 1, the chain will leave the state 2.

(b) The second order transition matrix is given by

$$P^{(2)} = P \cdot P = \begin{pmatrix} 0.15 & 0.1275 & 0.7225 & 0 \\ 0.048 & 0.2295 & 0 & 0.7225 \\ 0.0225 & 0.051 & 0.204 & 0.7225 \\ 0.0081 & 0.0399 & 0.1275 & 0.8245 \end{pmatrix},$$
(iii) The chain is irreducible as any state is reachable by any other state.
It is also aperiodic. For states 1 and 4 the chain can simply remain
there. This is not the case for states 2 and 3. However these are
also aperiodic, since starting from 2 the chain can return to 2 in 2 and 3 transitions, from the previous part of the question. Similarly the chain started at 3 can return to 3 in two steps (look at $P^2$), and in three steps.

(iv) The chain is irreducible and has a finite state space and thus has a unique stationary distribution.
(v) To find the long run probability that the chain is at level 2 we need to calculate the unique stationary distribution $\pi$. This amounts to solving the matrix equation $\pi P = \pi$. This is a system of 4 equations in 4 unknowns, together with the condition

$$\pi_1 + \pi_2 + \pi_3 + \pi_4 = 1.$$

Let $p^{(n)}_{ij}$ be the n-step transition probability of an irreducible aperiodic Markov chain on a finite state space. Then $\lim_{n\to\infty} p^{(n)}_{ij} = \pi_j$ for each i and j. Thus the long run probability that the chain is in state 2 is given by $\pi_2 = 0.05269$.
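The limit can be checked by raising P to a high power: every row of $P^n$ converges to the stationary distribution (a sketch; indices 0-3 correspond to levels 1-4):

```python
# Long-run behaviour of the 4-level chain via repeated squaring of P.
P = [[0.15, 0.85, 0.00, 0.00],
     [0.15, 0.00, 0.85, 0.00],
     [0.03, 0.12, 0.00, 0.85],
     [0.00, 0.03, 0.12, 0.85]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Pn = P
for _ in range(8):          # repeated squaring: Pn becomes P^(2^8)
    Pn = matmul(Pn, Pn)

print(round(Pn[0][1], 5))   # long-run probability of level 2, from any start
```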
E Chapter 5 solutions
1. Claims are known to follow a Poisson process with a uniform rate of 3 per
day.
(a) Calculate the probability that there will be fewer than 1 claim on a
given day.
(b) Estimate the probability that another claim will be reported during the
next hour. State all assumptions made.
(c) If there have not been any claims for over a week, calculate the expected
time before a new claim occurs.
Answer: Let $\{N_t\}_{t\in[0,\infty)}$ denote our Poisson process with rate $\lambda = 3$, where the time is measured in days.

(a) We have to evaluate $P(N_{t+1} - N_t < 1)$ for a fixed $t \ge 0$. But this is equal to

$$P(N_1 = 0) = e^{-\lambda} = e^{-3} = 0.04979.$$

(b) We look for the probability that, during the time interval $(t, t + \frac{1}{24}]$ for a fixed t, at least one claim will be reported, i.e.

$$P\left(N_{t+\frac{1}{24}} - N_t \ge 1\right) = 1 - e^{-\lambda/24} = 1 - e^{-1/8} = 0.1175.$$
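All three parts can be evaluated numerically; a sketch, where part (c) uses the memoryless property (the week without claims is irrelevant to the future waiting time):

```python
import math

lam = 3.0                                      # claim rate per day

p_no_claim_day = math.exp(-lam)                # (a) P(N_1 = 0)
p_claim_next_hour = 1 - math.exp(-lam / 24)    # (b) >= 1 claim in 1/24 day
expected_wait_days = 1 / lam                   # (c) mean wait is 1/lam days

print(round(p_no_claim_day, 5),
      round(p_claim_next_hour, 4),
      round(expected_wait_days, 4))
```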
where $i, j \in S$ and $0 \le t_1 < t_2 < t_3 < \cdots$. We have

The individual remains in the healthy state from time s to time s + w and then jumps to the state dead (where he remains) or to the state sick (where he jumps to the state dead by time t). Note that here $p_{DD}(s+w, t) = 1$. Further note that the formula for $p_{HS}(s, t)$ did not contain a term with $p_{DS}(s+w, t)$ and the corresponding transition intensity at $s + w$, since the probability to jump from dead to sick is equal to zero.
(b) Solve the Kolmogorov forward equations for this Markov jump process to find all transition probabilities.
(d) What is the probability that the process will be in state 0 in the long
term? Does it depend on the initial state?
Denoting by $\lambda$ the transition intensity from state 0 to state 1 and by $\mu$ the intensity from state 1 to state 0, the forward equations are

$$\frac{dp_{00}(t)}{dt} = -\lambda\, p_{00}(t) + \mu\, p_{01}(t),$$
$$\frac{dp_{01}(t)}{dt} = \lambda\, p_{00}(t) - \mu\, p_{01}(t),$$
$$\frac{dp_{10}(t)}{dt} = -\lambda\, p_{10}(t) + \mu\, p_{11}(t),$$
$$\frac{dp_{11}(t)}{dt} = \lambda\, p_{10}(t) - \mu\, p_{11}(t).$$

Substituting $p_{01}(t) = 1 - p_{00}(t)$ in the first equation, we get the equation

$$\frac{dp_{00}(t)}{dt} = -\lambda\, p_{00}(t) + \mu\left(1 - p_{00}(t)\right) = -(\lambda+\mu)\,p_{00}(t) + \mu,$$

which has a general solution

$$p_{00}(t) = \frac{\mu}{\lambda+\mu} + Ce^{-(\lambda+\mu)t}.$$

The initial condition $p_{00}(0) = 1$ leads to $C = \frac{\lambda}{\lambda+\mu}$, so finally we get

$$p_{00}(t) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\,e^{-(\lambda+\mu)t}.$$

Transition probabilities $p_{01}(t)$, $p_{10}(t)$, and $p_{11}(t)$ can be found similarly. They are:

$$p_{01}(t) = \frac{\lambda}{\lambda+\mu} - \frac{\lambda}{\lambda+\mu}\,e^{-(\lambda+\mu)t};$$
$$p_{10}(t) = \frac{\mu}{\lambda+\mu} - \frac{\mu}{\lambda+\mu}\,e^{-(\lambda+\mu)t};$$
$$p_{11}(t) = \frac{\lambda}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}\,e^{-(\lambda+\mu)t}.$$
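The Greek letters in the original did not survive extraction; assuming $\lambda$ is the 0 → 1 intensity and $\mu$ the 1 → 0 intensity, the closed form for $p_{00}$ can be checked against a direct numerical integration of the forward equation (a sketch with example rates of ours):

```python
import math

def p00(t, lam, mu):
    """Closed-form solution of the forward equation for the two-state chain."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

# Euler scheme for dp00/dt = -(lam + mu) * p00 + mu with p00(0) = 1.
lam, mu, T, n = 0.7, 0.3, 2.0, 200_000
p, dt = 1.0, T / n
for _ in range(n):
    p += dt * (-(lam + mu) * p + mu)

print(round(p, 4), round(p00(T, lam, mu), 4))  # the two values agree
```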
(c) For a time-homogeneous Markov process the Chapman-Kolmogorov equations take the form

$$p_{ij}(t+s) = \sum_{k\in S} p_{ik}(t)\, p_{kj}(s).$$

In our case, S = {0, 1}, thus there are 4 equations. For example, for i = j = 0 we get

$$p_{00}(t+s) = p_{00}(t)\, p_{00}(s) + p_{01}(t)\, p_{10}(s).$$

So we should check that

$$\frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\,e^{-(\lambda+\mu)(t+s)} = \left(\frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\,e^{-(\lambda+\mu)t}\right)\left(\frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}\,e^{-(\lambda+\mu)s}\right)$$
$$+ \left(\frac{\lambda}{\lambda+\mu} - \frac{\lambda}{\lambda+\mu}\,e^{-(\lambda+\mu)t}\right)\left(\frac{\mu}{\lambda+\mu} - \frac{\mu}{\lambda+\mu}\,e^{-(\lambda+\mu)s}\right),$$

which follows by expanding the products on the right-hand side.
F Chapter 6 solutions

1. The two-state Markov model involves a one-directional, single, age-dependent transition intensity. Explain what this means.

It refers to the fact that we only have two possible states: alive or dead. Transition between these states can only take place in one direction, from alive to dead. Hence the terms single and one-directional.

The transition intensity between these two states varies with age, e.g. it is much higher for a 90 year old than a 30 year old. Hence the term age-dependent.

The Markov assumption means that the only information that affects the future lifetime of the life is its current age and state. Therefore we ignore any previous or current medical conditions or lifestyle factors, such as smoking or exercise, and treat every life, aged x, as an identical life. The Markov assumption means that the probability of survival after age x + t is independent of the probability of survival up to age x.
In reality, the population could be split into homogeneous groups taking into account characteristics thought to have a significant effect on mortality. A separate model could then be developed for each group. Alternatively the Cox model could be used instead of the two-state model. This addresses heterogeneity directly by allowing for different covariates.
(a) $D_i = 0$

(b) $D_i = 1$

(a) $D_i = 0$: Then a death has not been observed and observation has taken place between age $x + a_i$ and $x + b_i$. Hence, $V_i = b_i - a_i$.

(b) $D_i = 1$: Then a death has been observed during the observation period and hence $0 \le V_i < b_i - a_i$.
(a) $a_i$

(b) $b_i$

(c) $v_i$

(a) $a_i = \frac{3}{12}$

(b) $b_i = 1$

(c) $v_i = \frac{8}{12}$

Footnote: attrition is the percentage of employees leaving a firm, over a defined period, usually a year.
Possible heterogeneity links to how long the employee has been with
the firm before qualifying (or joining the organisation if later), time to
qualification and age at qualification.
and assuming that all lives are independent, state the joint probability function for (D, V), where $D = \sum_{i=1}^N D_i$ and $V = \sum_{i=1}^N V_i$.

$$f(d, v) = \prod_{i=1}^N e^{-\mu v_i}\mu^{d_i} = e^{-\mu v}\mu^d, \quad \text{where } d = \sum_{i=1}^N d_i \text{ and } v = \sum_{i=1}^N v_i.$$
7. A university is assessing its drop-out rates for the first year of a three-year degree course, and believes $\mu = 0.15$. At the end of the first term, there are 50 students on the course. Assuming that all three terms are equal, students' decisions to drop out of a course are independent, and courses run back to back, determine the following:
(a) the probability distribution function for an individual on this
course leaving before the end of the first year
(b) the probability distribution function for the remaining time an
individual spends on the course for the remainder of the first year.
(c) state the joint probability function for the 50 students
Setting $\frac{d\log L}{d\mu}$ equal to zero to find a turning point leads to $\mu = \frac{d}{v}$.

To prove this is a maximum, differentiate $\frac{d\log L}{d\mu}$ with respect to $\mu$:

$$\frac{d^2\log L}{d\mu^2} = -\frac{d}{\mu^2} < 0.$$

Hence $\frac{d}{v}$ is a maximum and our maximum likelihood estimate is $\hat\mu = \frac{d}{v}$.

Under the Markov model, we assume $\mu$ is constant over the year and derive ${}_tp_x = e^{-\mu t}$. Hence:
10. An investigation took place into the mortality of males aged between 60 and 61 years suffering from angina. The table below summarises the results of this investigation. For each person it gives the age at which observation of this life began (Start), the age at which observation ceased (End) and the reason for it ceasing (Reason), i.e. D = observation ceased due to death, W = withdrew from observation for a reason other than death.
Life   Start       End         Reason
1      60 1/12     60 5/12     D
2      60 4/12     60 11/12    W
3      60          61          W
4      60 6/12     61          W
5      60          60 2/12     D
6      60 11/12    61          W
7      60 4/12     60 10/12    W
8      60          60 3/12     D
9      60          61          W
10     60 1/12     60 6/12     W

Hence $d = 3$ and $v = 4\tfrac{10}{12}$, so $\hat\mu_{60} = \dfrac{3}{4\tfrac{10}{12}} = 0.6207$.
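The exposure calculation can be automated (a sketch; ages are held as exact fractions of a year):

```python
from fractions import Fraction as F

# (start, end, reason) for the ten lives of Question 10, ages in years.
lives = [
    (F(60) + F(1, 12), F(60) + F(5, 12), "D"),
    (F(60) + F(4, 12), F(60) + F(11, 12), "W"),
    (F(60), F(61), "W"),
    (F(60) + F(6, 12), F(61), "W"),
    (F(60), F(60) + F(2, 12), "D"),
    (F(60) + F(11, 12), F(61), "W"),
    (F(60) + F(4, 12), F(60) + F(10, 12), "W"),
    (F(60), F(60) + F(3, 12), "D"),
    (F(60), F(61), "W"),
    (F(60) + F(1, 12), F(60) + F(6, 12), "W"),
]

d = sum(1 for _, _, reason in lives if reason == "D")   # deaths observed
v = sum(end - start for start, end, _ in lives)         # central exposure

print(d, v, round(float(F(d) / v), 4))  # 3 29/6 0.6207
```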
G Chapter 7 solutions

1. The model below represents a multiple state model for an Income Protection Plan, where a = able, i = ill and d = dead. Outline the following terms in words:

(a) ${}_tp^{ad}_x$

(b) ${}_tp^{ii}_x$

(c) ${}_t\bar p^{ii}_x$

(d) ${}_tp^{dd}_x$

[Diagram: the illness-death model. Transitions: able → ill with intensity $\sigma_x$, ill → able with intensity $\rho_x$, able → dead with intensity $\mu_x$, ill → dead with intensity $\nu_x$.]

(a) ${}_tp^{ad}_x$ is the probability of an individual who is able at age x dying in the next t years, i.e. by age x + t.

(b) ${}_tp^{ii}_x$ is the probability of an individual who is ill at age x being ill at age x + t.

(c) ${}_t\bar p^{ii}_x$ is the probability of an individual who is ill at age x remaining ill, without lapse, until age x + t.

(d) ${}_tp^{dd}_x$ is the probability of an individual who is dead at age x being dead at age x + t. Clearly this probability is equal to 1.
2. (a) ${}_tp^{nn}_x = {}_t\bar p^{nn}_x$

(b) ${}_tp^{ss}_x = {}_t\bar p^{ss}_x$
(a) True, since you cannot return to nonsmoker status once you have
left it.
(b) False, you can leave smoker status and return to it during the
period between age x and x + t.
(c) False, you can leave smoker status and return to it during the
period between age x and x + t.
(d) True, since you remain dead once you have died. In fact, these
probabilities are equal to 1.
3. Prove the relationship $\frac{\partial}{\partial t}\,{}_t\bar p^{gg}_x = -\,{}_t\bar p^{gg}_x\sum_{j\neq g}\mu^{gj}_{x+t}$.

By applying the second assumption listed above for the multiple state model, and recognising that dt is sufficiently small to assume only one transition can take place:

$${}_{dt}\bar p^{gg}_{x+t} = 1 - \sum_{j\neq g}{}_{dt}p^{gj}_{x+t} = 1 - \sum_{j\neq g}\mu^{gj}_{x+t}\,dt + o(dt).$$

Hence

$$\frac{\partial}{\partial t}\,{}_t\bar p^{gg}_x = \lim_{dt\to 0^+}\frac{{}_t\bar p^{gg}_x\left(1 - \sum_{j\neq g}\mu^{gj}_{x+t}\,dt + o(dt)\right) - {}_t\bar p^{gg}_x}{dt} = -\,{}_t\bar p^{gg}_x\sum_{j\neq g}\mu^{gj}_{x+t}.$$
4. Consider three lives, lives A, B and C, who take part in an investigation to establish an illness-death model for the age interval 70 to 71. The results of the observation of these lives are outlined below. Summarise the results from these observed samples using the appropriate notation (e.g. $v_i$, $w_i$ etc.).

(a) Life A: Joins observation age 70 when ill. Recovers age 70 years 1 month. Falls ill again age 70 years 2 months. Recovers age 70 years 4 months. Leaves study age 70 years 11 months.

(b) Life B: Joins observation age 70 years 3 months when well. Dies age 70 years 10 months.

(c) Life C: Joins observation age 70 and well. Falls ill age 70 years 3 months. Recovers age 70 years 9 months. Falls ill again age 70 years 10 months. Dies age 70 years 11 months.

Life   v_i    w_i    s_i  r_i  d_i  u_i
A      8/12   3/12   1    2    0    0
B      7/12   0      0    0    1    0
C      4/12   7/12   2    1    0    1
5. Define W, D and U.

$W_i$ = waiting time of life i in the ill state. $W = \sum_{i=1}^N W_i$, i.e. the total time spent in the ill state over all lives under observation.

$D_i$ = number of transitions able to dead by life i. $D = \sum_{i=1}^N D_i$, i.e. the total number of deaths from an able state during the observation period.

$U_i$ = number of transitions ill to dead by life i. $U = \sum_{i=1}^N U_i$, i.e. the total number of deaths from an ill state during the observation period.
6. 72 subjects die during the observation: 62 of whom were ill at the time of death.

Subjects were observed for 1,304 years (treating each life as a separate contribution to the total): for 503 of these years the subjects were in an ill state.
Assuming the transition intensities are constant for this age band, draw
a diagram to represent the Markov model for sickness and use the
observation above to estimate the relevant transition intensities.
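Under constant intensities, each estimate is deaths divided by the exposure in the state being left; a sketch (the final numbers are our computation from the data above, not quoted from the notes):

```python
# Death intensities for the sickness model, estimated from the data above.
total_deaths = 72
deaths_from_ill = 62
deaths_from_able = total_deaths - deaths_from_ill   # 10

total_years = 1304
years_ill = 503
years_able = total_years - years_ill                # 801

mu_hat = deaths_from_able / years_able   # able -> dead
nu_hat = deaths_from_ill / years_ill     # ill  -> dead

print(round(mu_hat, 5), round(nu_hat, 5))
```

The sickness and recovery intensities would need counts of transitions able → ill and ill → able, which are not given in this data.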