
Economic statistics, output administration and actuarial science


Costel Balcau
2012

Contents

1 Preparation from Probability Theory
2 Statistical Indicators
  2.1 Introduction to Economic Statistics
  2.2 Statistical frequency series
  2.3 Classification algorithm
  2.4 Classification of statistical indicators
  2.5 Average measures
    2.5.1 The arithmetic mean
    2.5.2 The harmonic mean
    2.5.3 The geometric mean
    2.5.4 The quadratic mean
    2.5.5 Absolute moments
    2.5.6 Properties of the means
  2.6 Position measures
    2.6.1 The mode
    2.6.2 The median
    2.6.3 Quantiles
    2.6.4 Properties of the position measures
  2.7 Variation measures
    2.7.1 Simple measures of dispersion
    2.7.2 Average deviation measures
    2.7.3 Shape measures
3 Two-dimensional statistical distributions
  3.1 Least Squares Method
  3.2 Average measures for two-dimensional statistical distributions
  3.3 Variation measures for two-dimensional statistical distributions
  3.4 Correlation between variables
  3.5 Nonparametric measures of correlation
4 Time series and forecasting
  4.1 The trend component
  4.2 The cyclical component
  4.3 The seasonal component
5 The interest
  5.1 A general model of interest
  5.2 Equivalence of investments
  5.3 Simple interest
    5.3.1 Basic formulas
    5.3.2 Simple interest with variable rate
    5.3.3 Equivalence by simple interest
  5.4 Compound interest
    5.4.1 Basic formulas
    5.4.2 Nominal rate and effective rate
    5.4.3 Compound interest with variable rate
  5.5 Loans
  5.6 Problems
6 Introduction to Actuarial Math
  6.1 A general model of insurance
  6.2 Biometric functions
    6.2.1 Probabilities of life and death
    6.2.2 The survival function
    6.2.3 The life expectancy
    6.2.4 Life tables
  6.3 Problems
7 Life annuities
  7.1 A general model. Classifications
  7.2 Single claim
  7.3 Life annuities-immediate
    7.3.1 Whole life annuities
    7.3.2 Deferred whole life annuities
  7.4 Temporary life annuities
  7.5 Life annuities-immediate with k-thly payments
    7.5.1 Whole life annuities with k-thly payments
    7.5.2 Deferred whole life annuities with k-thly payments
    7.5.3 Temporary life annuities with k-thly payments
  7.6 Pension
    7.6.1 Annual pension
    7.6.2 Monthly pension
  7.7 Problems
8 Life insurances
  8.1 A general model. Classification
  8.2 Whole life insurance
  8.3 Deferred life insurance
  8.4 Temporary life insurance
  8.5 Problems
9 Collective annuities and insurances
  9.1 Multiple life probabilities
  9.2 Single claim for joint survival
  9.3 Single claims for partial survival
  9.4 Whole life annuities for joint survival
  9.5 Whole life annuities for partial survival
  9.6 Deferred whole life annuities for joint survival
  9.7 Deferred whole life annuities for partial survival
  9.8 Temporary life annuities for joint survival
  9.9 Temporary life annuities for partial survival
  9.10 Group insurance payable at the first death
  9.11 Group insurance payable at the k-th death
  9.12 Problems
10 Bonus-Malus system
  10.1 A general model
  10.2 Bayes model based on a mixed Poisson distribution
  10.3 Gamma distribution for the average number of accidents
  10.4 Problems
11 Some optimization models
  11.1 Portfolio planning
  11.2 Regional planning
  11.3 Industrial production planning


Theme 1
Preparation from Probability
Theory
We collect here the principal notions and results from probability theory that are used in this course.

Definition 1.1. Let $\Omega$ be any set. We denote by $\mathcal{P}(\Omega)$ the set of all subsets of $\Omega$, i.e. $\mathcal{P}(\Omega) = \{A \mid A \subseteq \Omega\}$.
Definition 1.2. A topology on the set $\Omega$ is a family $\mathcal{T}$ of subsets of $\Omega$ s.t.
- $\emptyset, \Omega \in \mathcal{T}$,
- $A, B \in \mathcal{T} \Rightarrow A \cap B \in \mathcal{T}$,
- $(A_i)_{i \in I} \subseteq \mathcal{T} \Rightarrow \bigcup_{i \in I} A_i \in \mathcal{T}$, for each non-empty index set $I$.

A topological space is a pair $(\Omega, \mathcal{T})$, where $\Omega$ is a set and $\mathcal{T}$ is a topology on $\Omega$. Each set $A \in \mathcal{T}$ is called an open set of the topological space $(\Omega, \mathcal{T})$.
Definition 1.3. A Borel field ($\sigma$-field, $\sigma$-algebra) on the set $\Omega$ is a non-empty family $\mathcal{B}$ of subsets of $\Omega$ s.t.
- $A \in \mathcal{B} \Rightarrow \Omega \setminus A \in \mathcal{B}$,
- $(A_i)_{i \in \mathbb{N}^*} \subseteq \mathcal{B} \Rightarrow \bigcup_{i=1}^{\infty} A_i \in \mathcal{B}$.

A measurable space is a pair $(\Omega, \mathcal{B})$, where $\Omega$ is a set and $\mathcal{B}$ is a Borel field on $\Omega$. Each set $A \in \mathcal{B}$ is called a Borel set (measurable set) of the measurable space $(\Omega, \mathcal{B})$.


Proposition 1.1. If $\mathcal{M}$ is a family of subsets of $\Omega$, then
$$\mathcal{B}(\mathcal{M}) = \bigcap \{\mathcal{B} \mid \mathcal{B} \text{ is a Borel field on } \Omega,\ \mathcal{B} \supseteq \mathcal{M}\}$$
is a Borel field on $\Omega$.

Definition 1.4. In the context of the above proposition, $\mathcal{B}(\mathcal{M})$ is called the Borel field generated by $\mathcal{M}$.
For any $d \in \mathbb{N}^*$, we denote by $\mathcal{B}^d$ the Borel field generated by the intervals of $\mathbb{R}^d$.

Definition 1.5. Let $(\Omega_1, \mathcal{B}_1)$ and $(\Omega_2, \mathcal{B}_2)$ be two measurable spaces. A function $f : \Omega_1 \to \Omega_2$ is called measurable (with respect to the Borel fields $\mathcal{B}_1$ and $\mathcal{B}_2$) if $f^{-1}(\mathcal{B}_2) \subseteq \mathcal{B}_1$.
Definition 1.6. Let $\Omega$ be a set and $I$ be a non-empty index set. For every $i \in I$, let $(\Omega_i, \mathcal{B}_i)$ be a measurable space and $f_i : \Omega \to \Omega_i$ be a function. We denote by $\mathcal{B}(f_i \mid i \in I)$ the smallest Borel field on $\Omega$ with respect to which all the functions $f_i$, $i \in I$ are measurable, i.e. $\mathcal{B}(f_i \mid i \in I) = \mathcal{B}\left(\bigcup_{i \in I} f_i^{-1}(\mathcal{B}_i)\right)$.
The product Borel field of the Borel fields $\mathcal{B}_i$, $i \in I$ is
$$\bigotimes_{i \in I} \mathcal{B}_i = \mathcal{B}(\mathrm{pr}_i \mid i \in I),$$
where, for every $j \in I$,
$$\mathrm{pr}_j : \prod_{i \in I} \Omega_i \to \Omega_j, \quad \mathrm{pr}_j((\omega_i)_{i \in I}) = \omega_j, \ \forall (\omega_i)_{i \in I} \in \prod_{i \in I} \Omega_i$$
is the projection function on the $j$-th component.
The measurable space $\left(\prod_{i \in I} \Omega_i, \bigotimes_{i \in I} \mathcal{B}_i\right)$ is called the product of the measurable spaces $(\Omega_i, \mathcal{B}_i)$, $i \in I$.

Proposition 1.2. For every $d \in \mathbb{N}^*$ we have $\mathcal{B}^d = \bigotimes_{i=1}^{d} \mathcal{B}^1$.

Definition 1.7. A measure on the measurable space $(\Omega, \mathcal{B})$ is a function $\mu : \mathcal{B} \to [0, \infty]$ s.t.
- $\mu(\emptyset) = 0$,
- $(A_i)_{i \in \mathbb{N}^*} \subseteq \mathcal{B}$ mutually disjoint ($A_i \cap A_j = \emptyset$, $\forall i \neq j$) $\Rightarrow \mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i)$.

A measure $\mu$ is called a finite measure if $\mu(\Omega) < \infty$.


A measure $\mu$ is called a $\sigma$-finite measure if there exists a sequence $(A_i)_{i \in \mathbb{N}^*} \subseteq \mathcal{B}$ s.t. $\mu(A_i) < \infty\ \forall i \in \mathbb{N}^*$, $A_i \subseteq A_{i+1}\ \forall i \in \mathbb{N}^*$ and $\bigcup_{i=1}^{\infty} A_i = \Omega$.
A measure space is a triple $(\Omega, \mathcal{B}, \mu)$, where $(\Omega, \mathcal{B})$ is a measurable space and $\mu$ is a measure on this space.

Definition 1.8. Let $(\Omega, \mathcal{B}, \mu)$ be a measure space and let $R$ be a property on $\Omega$, i.e. $R : \Omega \to \{0, 1\}$,
$$R(\omega) = \begin{cases} 1, & \text{if } \omega \text{ satisfies } R \\ 0, & \text{otherwise} \end{cases}, \quad \forall \omega \in \Omega.$$
We say that the property $R$ holds $\mu$-almost everywhere ($\mu$-a.e.) if $\mu(R^{-1}(\{0\})) = 0$, i.e. $\mu(\{\omega \in \Omega \mid \omega \text{ does not satisfy } R\}) = 0$.
Definition 1.9. Let $\mu$ and $\nu$ be two measures on a measurable space $(\Omega, \mathcal{B})$. We say that $\nu$ is absolutely continuous with respect to $\mu$, and we write $\nu \ll \mu$, if $\nu(A) = 0$ for every $A \in \mathcal{B}$ such that $\mu(A) = 0$.

Definition 1.10. A probability (probability measure) on the measurable space $(\Omega, \mathcal{B})$ is a measure $P$ on this space with the property that $P(\Omega) = 1$.
A probability space is a triple $(\Omega, \mathcal{B}, P)$, where $(\Omega, \mathcal{B})$ is a measurable space and $P$ is a probability on this space. The elements of $\mathcal{B}$ are called the events of the probability space. For every $A \in \mathcal{B}$, $P(A)$ is called the probability of the event $A$. For every $\omega \in \Omega$ such that $\{\omega\} \in \mathcal{B}$, the event $\{\omega\}$ is called an elementary event.
Proposition 1.3. Let $(\Omega, \mathcal{B}, P)$ be a probability space. We have:
a) $P(\emptyset) = 0$;
b) $P(A) \in [0, 1]$, $\forall A \in \mathcal{B}$, i.e. $P : \mathcal{B} \to [0, 1]$;
c) If $A, B \in \mathcal{B}$ and $A \subseteq B$, then $P(A) \le P(B)$ and $P(B \setminus A) = P(B) - P(A)$;
d) If $A_1, \ldots, A_n \in \mathcal{B}$ are mutually disjoint (i.e. $A_i \cap A_j = \emptyset$, $\forall i \neq j$), then $P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$;
e) (Inclusion-exclusion formula) If $A_1, \ldots, A_n \in \mathcal{B}$, then
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{k=1}^{n} (-1)^{k-1} \sum_{1 \le i_1 < \ldots < i_k \le n} P(A_{i_1} \cap \ldots \cap A_{i_k});$$
f) If $(A_n)_{n \in \mathbb{N}^*} \subseteq \mathcal{B}$ s.t. $A_n \subseteq A_{n+1}\ \forall n \in \mathbb{N}^*$, then $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} P(A_n)$;
g) If $(A_n)_{n \in \mathbb{N}^*} \subseteq \mathcal{B}$ s.t. $A_n \supseteq A_{n+1}\ \forall n \in \mathbb{N}^*$, then $P\left(\bigcap_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} P(A_n)$;
h) If $(A_n)_{n \in \mathbb{N}^*} \subseteq \mathcal{B}$, then $P\left(\bigcup_{n=1}^{\infty} A_n\right) \le \sum_{n=1}^{\infty} P(A_n)$.
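The inclusion-exclusion formula in e) can be checked by brute force on a small finite probability space. A minimal Python sketch (the uniform space and the three events are illustrative choices, not taken from the text):

```python
from itertools import combinations
from fractions import Fraction

# Uniform probability on a finite sample space Omega = {0, ..., 9}.
omega = set(range(10))

def P(A):
    return Fraction(len(A), len(omega))

# Three overlapping events.
A1 = {0, 1, 2, 3}
A2 = {2, 3, 4, 5}
A3 = {5, 6, 7}
events = [A1, A2, A3]

# Left side: P(A1 U A2 U A3).
lhs = P(A1 | A2 | A3)

# Right side: sum over k of (-1)^(k-1) times the probabilities
# of all k-fold intersections.
rhs = Fraction(0)
for k in range(1, len(events) + 1):
    for combo in combinations(events, k):
        rhs += (-1) ** (k - 1) * P(set.intersection(*combo))

print(lhs, rhs)  # both equal 4/5
```

Exact rational arithmetic via `Fraction` makes the two sides compare exactly, with no floating-point tolerance.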

Proposition 1.4. Let $\Omega$ be a countable set and let $P$ be a probability on the measurable space $(\Omega, \mathcal{P}(\Omega))$. Then $P(A) = \sum_{\omega \in A} P(\{\omega\})$, $\forall A \in \mathcal{P}(\Omega)$, and $P \mapsto (P(\{\omega\}))_{\omega \in \Omega}$ is a bijective correspondence between the set of all the probabilities on $(\Omega, \mathcal{P}(\Omega))$ and the set
$$\Big\{ (p_\omega)_{\omega \in \Omega} \ \Big|\ p_\omega \ge 0\ \forall \omega \in \Omega, \ \sum_{\omega \in \Omega} p_\omega = 1 \Big\}.$$

Definition 1.11. In the context of the above proposition, we say that $(p_\omega)_{\omega \in \Omega}$ defined by $p_\omega = P(\{\omega\})$ is the discrete (or countable) probability distribution of the discrete (or countable) probability $P$.

Remark 1.1. In the setting of the above definition, $(p_\omega)_{\omega \in \Omega}$ is a vector if $\Omega$ is finite and a sequence if $\Omega$ is infinite.
Remark 1.2. If $(\Omega, \mathcal{B}, P)$ is a probability space, $X : \Omega \to \Omega_1$ is a function and $x \in \Omega_1$ s.t. $\{\omega \in \Omega \mid X(\omega) = x\} \in \mathcal{B}$, then we denote
$$P(X = x) = P(\{\omega \in \Omega \mid X(\omega) = x\}).$$
Similarly one uses the notation $P(X < x)$, $P(X > x)$, $P(X \le x)$, $P(X \ge x)$, $P(X \neq x)$, $P(X \in A)$, where $A \subseteq \Omega_1$.
Also, if $Y : \Omega \to \Omega_1$ is another function s.t. $\{\omega \in \Omega \mid X(\omega) = Y(\omega)\} \in \mathcal{B}$, then we denote
$$P(X = Y) = P(\{\omega \in \Omega \mid X(\omega) = Y(\omega)\}).$$
Similarly one uses the notation $P(X < Y)$, $P(X > Y)$, $P(X \le Y)$, $P(X \ge Y)$, $P(X \neq Y)$.
Also, if $Z : \Omega \to \Omega_2$ is another function and $z \in \Omega_2$ s.t. $\{\omega \in \Omega \mid Z(\omega) = z\} \in \mathcal{B}$, then we denote
$$P(X = x, Z = z) = P(\{\omega \in \Omega \mid X(\omega) = x \text{ and } Z(\omega) = z\}).$$
Similarly one uses the notation $P(X < x, Z < z)$, $P(X \in A, Y \in B)$, $P(X = x, Y = y, Z = z)$, etc.


Definition 1.12. Let $(\Omega, \mathcal{B}, P)$ be a probability space and let $(A_i)_{i \in I} \subseteq \mathcal{B}$ be a family of events, where $I$ is a non-empty index set. The events $A_i$, $i \in I$ are called independent if
$$P\left(\bigcap_{i \in J} A_i\right) = \prod_{i \in J} P(A_i)$$
for every finite non-empty subset $J \subseteq I$ of indices.


Proposition 1.5. Let $(\Omega, \mathcal{B}, P)$ be a probability space and let $A \in \mathcal{B}$ be an event such that $P(A) > 0$. Then the function
$$P_A : \mathcal{B} \to [0, 1], \quad P_A(B) = \frac{P(B \cap A)}{P(A)}, \ \forall B \in \mathcal{B}$$
is a probability on the measurable space $(\Omega, \mathcal{B})$.

Definition 1.13. In the context of the above proposition, $P_A$ is called the conditional probability induced by the event $A$. For every event $B \in \mathcal{B}$, $P_A(B)$ is called the conditional probability of the event $B$ given the event $A$. We denote $P(B/A) = P_A(B)$.
Proposition 1.1 (Total probability formula). Let $(\Omega, \mathcal{B}, P)$ be a probability space and let $A_1, \ldots, A_n \in \mathcal{B}$ be events such that
$$\Omega = A_1 \cup \ldots \cup A_n, \quad A_i \cap A_j = \emptyset\ \forall i \neq j$$
and $P(A_i) > 0\ \forall i \in \{1, \ldots, n\}$. Then, for every event $B \in \mathcal{B}$ we have
$$P(B) = \sum_{i=1}^{n} P(A_i) P_{A_i}(B).$$

Proposition 1.2 (Bayes's formula). Let $(\Omega, \mathcal{B}, P)$ be a probability space and let $A_1, \ldots, A_n \in \mathcal{B}$ be events such that
$$\Omega = A_1 \cup \ldots \cup A_n, \quad A_i \cap A_j = \emptyset\ \forall i \neq j$$
and $P(A_i) > 0\ \forall i \in \{1, \ldots, n\}$. Then, for every event $B \in \mathcal{B}$ such that $P(B) > 0$, we have
$$P_B(A_i) = \frac{P(A_i) P_{A_i}(B)}{P(B)} = \frac{P(A_i) P_{A_i}(B)}{\sum_{k=1}^{n} P(A_k) P_{A_k}(B)}, \quad \forall i \in \{1, \ldots, n\}.$$
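Both formulas can be verified numerically. A minimal sketch, with a made-up partition of $\Omega$ into three events and illustrative conditional probabilities:

```python
# Partition A1, A2, A3 of Omega with prior probabilities P(Ai),
# and conditional probabilities P(B/Ai) of an event B given each Ai.
prior = [0.5, 0.3, 0.2]    # P(A1), P(A2), P(A3); sums to 1
cond_B = [0.1, 0.4, 0.8]   # P(B/A1), P(B/A2), P(B/A3)

# Total probability formula: P(B) = sum_i P(Ai) * P(B/Ai).
P_B = sum(p * c for p, c in zip(prior, cond_B))

# Bayes's formula: P(Ai/B) = P(Ai) * P(B/Ai) / P(B).
posterior = [p * c / P_B for p, c in zip(prior, cond_B)]

print(P_B)        # 0.33
print(posterior)  # the posteriors sum to 1
```

Note that the posteriors automatically sum to 1, since the denominator of Bayes's formula is exactly the total probability of $B$.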


Definition 1.14. Let $d \in \mathbb{N}^*$. A distribution (probability distribution) on $\mathbb{R}^d$ is a probability on the measurable space $(\mathbb{R}^d, \mathcal{B}^d)$.
Let $\mu$ be a distribution on $\mathbb{R}^d$. The distribution $\mu$ is called discrete (countable) if there exists a countable set $A \subseteq \mathbb{R}^d$ such that $\mu(\mathbb{R}^d \setminus A) = 0$. The distribution $\mu$ is called continuous if $\mu(\{x\}) = 0$ for every $x \in \mathbb{R}^d$.

Remark 1.3. A distribution $\mu$ on $\mathbb{R}^d$ is discrete if and only if it has the form $\mu = \sum_{x \in A} p_x \varepsilon_x$, where $A \subseteq \mathbb{R}^d$ is a countable set, $p_x = \mu(\{x\})\ \forall x \in A$ and $\varepsilon_x$ is the Dirac measure, defined by
$$\varepsilon_x(B) = \begin{cases} 1, & \text{if } x \in B \\ 0, & \text{if } x \notin B \end{cases}, \quad \forall B \in \mathcal{B}^d.$$

Definition 1.15. Let $\mu$ be a distribution on $\mathbb{R}^d$. The distribution function (probability distribution function, cumulative probability distribution function) of $\mu$ is the function $F_\mu : \mathbb{R}^d \to [0, 1]$ defined by
$$F_\mu(x) = \mu((-\infty, x]), \ \forall x \in \mathbb{R}^d,$$
where $(-\infty, x] = (-\infty, x_1] \times \ldots \times (-\infty, x_d]$ for every $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$.
Definition 1.16. Let $d \in \mathbb{N}^*$. For every function $F : \mathbb{R}^d \to \mathbb{R}$ and every vectors $a = (a_1, \ldots, a_d) \in \mathbb{R}^d$ and $b = (b_1, \ldots, b_d) \in \mathbb{R}^d$ we define
$$\Delta^{(d)}(F; a; b) = \sum_{i_1, \ldots, i_d \in \{0,1\}} (-1)^{i_1 + \ldots + i_d + d}\, F(a_1 + i_1(b_1 - a_1), \ldots, a_d + i_d(b_d - a_d)).$$

Proposition 1.6. Let $\mu$ be a distribution on $\mathbb{R}^d$. Then its distribution function $F = F_\mu$ verifies the following properties:
1) $\Delta^{(d)}(F; a; b) \ge 0$, $\forall a, b \in \mathbb{R}^d$ s.t. $a \le b$ (i.e. $a_i \le b_i\ \forall i \in \{1, \ldots, d\}$);
2) $F$ is right continuous, i.e. $\lim_{x \searrow a} F(x) = F(a)$, $\forall a \in \mathbb{R}^d$;
3) $\lim_{x \to \infty} F(x) = 1$; $\lim_{x_i \to -\infty} F(x) = 0$ for any $i \in \{1, \ldots, d\}$.
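For $d = 2$, $\Delta^{(2)}(F; a; b) = F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2)$, which is the probability that $\mu$ assigns to the rectangle $(a_1, b_1] \times (a_2, b_2]$. A minimal sketch, using the product of two uniform-on-$[0, 1]$ distribution functions as an illustrative $F$:

```python
from itertools import product

def delta(F, a, b):
    """Delta^(d)(F; a; b): sum over i in {0,1}^d of
    (-1)^(i1+...+id+d) * F(a + i*(b-a))."""
    d = len(a)
    total = 0.0
    for i in product((0, 1), repeat=d):
        point = [a[j] + i[j] * (b[j] - a[j]) for j in range(d)]
        total += (-1) ** (sum(i) + d) * F(point)
    return total

# Illustrative distribution function: product of two uniform-[0,1] cdfs.
def F_unif2(x):
    return min(max(x[0], 0.0), 1.0) * min(max(x[1], 0.0), 1.0)

# Probability of the rectangle (0.2, 0.5] x (0.1, 0.7]:
p = delta(F_unif2, [0.2, 0.1], [0.5, 0.7])
print(p)  # 0.3 * 0.6 = 0.18 (up to floating-point rounding)
```

The sign pattern matches Definition 1.16: the term with all $i_j = 1$ evaluates $F$ at $b$ with sign $(-1)^{2d} = +1$.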

Definition 1.17. A function $F : \mathbb{R}^d \to [0, 1]$ which verifies all the properties from the above proposition is called a distribution function on $\mathbb{R}^d$. A function $F : \mathbb{R}^d \to \mathbb{R}$ which verifies only the properties 1) and 2) from the above proposition is called a generalized distribution function (Lebesgue-Stieltjes measure function) on $\mathbb{R}^d$.

Proposition 1.7. The correspondence $\mu \mapsto F_\mu$ is a bijection between the set of all the distributions on $\mathbb{R}^d$ and the set of all the distribution functions on $\mathbb{R}^d$.


Proposition 1.8. Let $F : \mathbb{R}^d \to \mathbb{R}$ be a generalized distribution function. Then there exists a unique measure $\mu_F$ on the measurable space $(\mathbb{R}^d, \mathcal{B}^d)$ with the property that
$$\mu_F((a, b]) = \Delta^{(d)}(F; a; b), \ \forall a, b \in \mathbb{R}^d \text{ s.t. } a \le b,$$
where $(a, b] = (a_1, b_1] \times \ldots \times (a_d, b_d]$, $\forall a = (a_1, \ldots, a_d), b = (b_1, \ldots, b_d) \in \mathbb{R}^d$.

Definition 1.18. In the context of the above proposition, the measure $\mu_F$ is called the Lebesgue-Stieltjes measure generated by $F$.
We denote by $\lambda_d$ the Lebesgue-Stieltjes measure on the space $(\mathbb{R}^d, \mathcal{B}^d)$ generated by the generalized distribution function
$$F(x) = x_1 \cdot \ldots \cdot x_d, \ \forall x = (x_1, \ldots, x_d) \in \mathbb{R}^d.$$
$\lambda_d$ is called the Lebesgue measure on $\mathbb{R}^d$. For $d = 1$ we denote by $m_L$ the Lebesgue measure on $\mathbb{R}$, i.e. $m_L = \lambda_1$.

Proposition 1.9. For every $a = (a_1, \ldots, a_d), b = (b_1, \ldots, b_d) \in \mathbb{R}^d$ s.t. $a \le b$ we have $\lambda_d((a, b]) = (b_1 - a_1) \cdot \ldots \cdot (b_d - a_d)$.
In particular, $m_L((a, b]) = b - a$, $\forall a, b \in \mathbb{R}$ s.t. $a \le b$.
Definition 1.19. Let $(\Omega, \mathcal{B})$ be a measurable space and let $f : \Omega \to \mathbb{R}$ be a measurable function (with respect to the Borel fields $\mathcal{B}$ and $\mathcal{B}^1$). We say that the function $f$ is simple if the set $f(\Omega)$ is finite. We denote
$$S(\Omega, \mathcal{B}) = \{f : \Omega \to \mathbb{R} \mid f \text{ is measurable (with respect to } \mathcal{B} \text{ and } \mathcal{B}^1\text{) and simple}\},$$
$$S_+(\Omega, \mathcal{B}) = \{f : \Omega \to [0, \infty] \mid f \text{ is measurable (with respect to } \mathcal{B} \text{ and } \overline{\mathcal{B}}^1\text{)}\},$$
where $\overline{\mathcal{B}}^1$ is the Borel field generated by the open subsets of $\overline{\mathbb{R}} = [-\infty, \infty]$.

Definition 1.20. Let $A \subseteq \Omega$. The characteristic function of the subset $A$ (with respect to the set $\Omega$) is the function
$$1_A : \Omega \to \{0, 1\}, \quad 1_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{if } x \notin A \end{cases}, \ \forall x \in \Omega.$$
Proposition 1.10. Let $(\Omega, \mathcal{B})$ be a measurable space.
a) If $f \in S(\Omega, \mathcal{B})$, then $f = \sum_{a \in f(\Omega)} a \cdot 1_{f^{-1}(\{a\})}$.
b) $f \in S(\Omega, \mathcal{B})$ if and only if there exist $n \in \mathbb{N}^*$, $a_1, \ldots, a_n \in \mathbb{R}$ and $A_1, \ldots, A_n \in \mathcal{B}$ mutually disjoint s.t. $f = \sum_{i=1}^{n} a_i 1_{A_i}$.
c) If $f \in S_+(\Omega, \mathcal{B})$, then there exists a sequence $(f_n)_{n \in \mathbb{N}^*} \subseteq S(\Omega, \mathcal{B})$ s.t. $0 \le f_n \le f_{n+1}\ \forall n \in \mathbb{N}^*$ and $\lim_{n \to \infty} f_n = f$.
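Part c) can be realized with the standard staircase construction $f_n = \min(n, \lfloor 2^n f \rfloor / 2^n)$, which is simple, nondecreasing in $n$ and converges to $f$. A minimal numerical sketch (the function $f$ and the evaluation point are illustrative):

```python
import math

def staircase(f, n):
    """The n-th simple approximation of a nonnegative function f:
    f_n(x) = min(n, floor(2^n * f(x)) / 2^n)."""
    def f_n(x):
        return min(n, math.floor(2 ** n * f(x)) / 2 ** n)
    return f_n

f = lambda x: x * x   # a nonnegative measurable function on R
x0 = 1.3              # f(x0) = 1.69

approximations = [staircase(f, n)(x0) for n in (1, 2, 4, 8, 16)]
print(approximations)  # nondecreasing, approaching f(x0) = 1.69
```

Each $f_n$ takes only finitely many values (multiples of $2^{-n}$ up to $n$), so it is simple, and the dyadic rounding error at any point where $f$ is finite is at most $2^{-n}$ once $n$ exceeds $f(x)$.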


Definition 1.21. Let $(\Omega, \mathcal{B}, \mu)$ be a measure space and let $f : \Omega \to \overline{\mathbb{R}}$ be a measurable function (with respect to the Borel fields $\mathcal{B}$ and $\overline{\mathcal{B}}^1$).
a) If $f \in S(\Omega, \mathcal{B})$, then the Lebesgue integral of the function $f$ with respect to the measure $\mu$ is defined by
$$\int_\Omega f \, d\mu = \sum_{a \in f(\Omega)} a \, \mu(f^{-1}(\{a\}))$$
(with the convention $0 \cdot (\pm\infty) = 0$).
b) If $f \in S_+(\Omega, \mathcal{B})$ s.t. $f = \lim_{n \to \infty} f_n$, where $(f_n)_{n \in \mathbb{N}^*} \subseteq S(\Omega, \mathcal{B})$, $0 \le f_n \le f_{n+1}\ \forall n \in \mathbb{N}^*$, then the Lebesgue integral of the function $f$ with respect to the measure $\mu$ is defined by
$$\int_\Omega f \, d\mu = \lim_{n \to \infty} \int_\Omega f_n \, d\mu \quad (\in [0, \infty]).$$
c) $f$ is called Lebesgue integrable with respect to the measure $\mu$ ($\mu$-Lebesgue integrable) if $\int_\Omega |f| \, d\mu < \infty$. In this case the Lebesgue integral of the function $f$ with respect to the measure $\mu$ is defined by
$$\int_\Omega f \, d\mu = \int_\Omega f^+ \, d\mu - \int_\Omega f^- \, d\mu \quad (\in \mathbb{R}),$$
where $f^+ = \max\{f, 0\}$ and $f^- = \max\{-f, 0\}$.
d) If $A \in \mathcal{B}$ and the function $1_A f$ is $\mu$-Lebesgue integrable, then the Lebesgue integral of the function $f$ with respect to the measure $\mu$ over the set $A$ is defined by
$$\int_A f \, d\mu = \int_\Omega 1_A f \, d\mu \quad (\in \mathbb{R}).$$

Remark 1.4. Sometimes, in order to avoid any possible confusion, we might choose to emphasize the argument of the function that we are integrating, and we write
$$\int_\Omega f \, d\mu = \int_\Omega f(x) \, d\mu(x), \quad \int_A f \, d\mu = \int_A f(x) \, d\mu(x).$$

Proposition 1.11. (Correctness of Definition 1.21.b) Let $(\Omega, \mathcal{B}, \mu)$ be a measure space. If $f \in S_+(\Omega, \mathcal{B})$ s.t. $f = \lim_{n \to \infty} f_n = \lim_{n \to \infty} g_n$, where $(f_n)_{n \in \mathbb{N}^*}, (g_n)_{n \in \mathbb{N}^*} \subseteq S(\Omega, \mathcal{B})$, $0 \le f_n \le f_{n+1}$, $0 \le g_n \le g_{n+1}\ \forall n \in \mathbb{N}^*$, then
$$\lim_{n \to \infty} \int_\Omega f_n \, d\mu = \lim_{n \to \infty} \int_\Omega g_n \, d\mu.$$


Proposition 1.12. (Properties of Lebesgue integral) Let $(\Omega, \mathcal{B}, \mu)$ be a measure space, $f, f_n : \Omega \to \overline{\mathbb{R}}$ be $\mu$-Lebesgue integrable functions, for every $n \in \mathbb{N}^*$, and let $g : \Omega \to \overline{\mathbb{R}}$ be a measurable function (with respect to the Borel fields $\mathcal{B}$ and $\overline{\mathcal{B}}^1$).
a) (Linearity) For every $\alpha_1, \alpha_2 \in \mathbb{R}$ the function $\alpha_1 f_1 + \alpha_2 f_2$ is $\mu$-Lebesgue integrable and $\int_\Omega (\alpha_1 f_1 + \alpha_2 f_2) \, d\mu = \alpha_1 \int_\Omega f_1 \, d\mu + \alpha_2 \int_\Omega f_2 \, d\mu$.
b) (Monotonicity) If $f_1 \le f_2$, then $\int_\Omega f_1 \, d\mu \le \int_\Omega f_2 \, d\mu$.
If $f_1 \le f_2$ and $\mu(\{x \in \Omega \mid f_1(x) < f_2(x)\}) > 0$, then $\int_\Omega f_1 \, d\mu < \int_\Omega f_2 \, d\mu$.
c) $\left| \int_\Omega f \, d\mu \right| \le \int_\Omega |f| \, d\mu$.
d) $f$ is finite $\mu$-a.e., i.e. $\mu(\{x \in \Omega \mid |f(x)| = \infty\}) = 0$.
e) If $g = f$ $\mu$-a.e., then $g$ is $\mu$-Lebesgue integrable and $\int_\Omega g \, d\mu = \int_\Omega f \, d\mu$.
f) If $|g| \le f$ $\mu$-a.e., then $g$ is $\mu$-Lebesgue integrable and $\left| \int_\Omega g \, d\mu \right| \le \int_\Omega f \, d\mu$.
Theorem 1.1. (Lebesgue's dominated convergence Theorem) Let $(\Omega, \mathcal{B}, \mu)$ be a measure space and let the functions $f, f_n, g : \Omega \to \overline{\mathbb{R}}$, for every $n \in \mathbb{N}^*$. If the functions $f_n$, $n \ge 1$ are measurable (with respect to the Borel fields $\mathcal{B}$ and $\overline{\mathcal{B}}^1$), $\lim_{n \to \infty} f_n = f$ $\mu$-a.e., $|f_n| \le g$ $\mu$-a.e. for every $n \in \mathbb{N}^*$ and the function $g$ is $\mu$-Lebesgue integrable, then $f$ is also $\mu$-Lebesgue integrable and
$$\lim_{n \to \infty} \int_\Omega f_n \, d\mu = \int_\Omega f \, d\mu.$$

Definition 1.22. Let $U, V \subseteq \mathbb{R}^d$ be two non-empty open sets. A function $\varphi : U \to V$ is called a $C^1$ diffeomorphism if it is a bijection and all the components of $\varphi$ and $\varphi^{-1}$ have continuous first partial derivatives.

Theorem 1.2. i) (Substitution formula) Let $(\Omega, \mathcal{B}, \mu)$ be a measure space, $(\Omega_1, \mathcal{B}_1)$ be a measurable space, $f : \Omega \to \Omega_1$ be a measurable function (with respect to the Borel fields $\mathcal{B}$ and $\mathcal{B}_1$) and $g : \Omega_1 \to \overline{\mathbb{R}}$ be a measurable function (with respect to the Borel fields $\mathcal{B}_1$ and $\overline{\mathcal{B}}^1$). Then
$$\int_\Omega g \circ f \, d\mu = \int_{\Omega_1} g \, d(\mu \circ f^{-1}),$$
that is, if either integral exists so does the other and they are equal.
ii) (Change of variable formula) Let $U, V \subseteq \mathbb{R}^d$ be two non-empty open sets, $\varphi : U \to V$ be a $C^1$ diffeomorphism and $f : V \to \mathbb{R}$ be a measurable function (with respect to the Borel fields $\mathcal{B}^d$ and $\mathcal{B}^1$). Then
$$\int_V f(x) \, d\lambda_d(x) = \int_U (f \circ \varphi)(y) |J_\varphi(y)| \, d\lambda_d(y),$$
where $J_\varphi(y) = \det\left( \dfrac{\partial \varphi_i}{\partial y_j}(y) \right)_{i,j \in \{1,\ldots,d\}}$ ($J_\varphi$ is called the Jacobian of $\varphi$).
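The change of variable formula in dimension $d = 1$ can be checked numerically with Riemann sums. A minimal sketch, with the illustrative diffeomorphism $\varphi(y) = y^2$ from $U = (0, 1)$ onto $V = (0, 1)$ (so $|J_\varphi(y)| = 2y$) and integrand $f(x) = \sqrt{x}$:

```python
def midpoint_integral(g, a, b, n=100000):
    """Midpoint Riemann sum approximation of the integral of g on (a, b)."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

f = lambda x: x ** 0.5   # integrand on V = (0, 1)

# Left side: int_V f(x) dx.
lhs = midpoint_integral(f, 0.0, 1.0)
# Right side: int_U f(phi(y)) * |J_phi(y)| dy with phi(y) = y^2.
rhs = midpoint_integral(lambda y: f(y ** 2) * 2 * y, 0.0, 1.0)

print(lhs, rhs)  # both close to 2/3
```

Here $f(\varphi(y))|J_\varphi(y)| = y \cdot 2y = 2y^2$, so both sides equal $2/3$ exactly; the numerical values agree up to the Riemann-sum discretization error.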

Theorem 1.3. (Jensen's Inequality) Let $(\Omega, \mathcal{B}, P)$ be a probability space, $I \subseteq \mathbb{R}$ be an open interval, $f : \Omega \to I$ be a $P$-Lebesgue integrable function and $F : I \to \mathbb{R}$ be a convex function with the property that the function $F \circ f : \Omega \to \mathbb{R}$ is $P$-Lebesgue integrable. Then
$$F\left( \int_\Omega f \, dP \right) \le \int_\Omega (F \circ f) \, dP.$$
Moreover, if $F$ is strictly convex then the equality holds if and only if $f = \int_\Omega f \, dP$ $P$-a.e.
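On a finite probability space, Jensen's inequality with the convex function $F(x) = x^2$ reduces to $\left(\sum_i p_i x_i\right)^2 \le \sum_i p_i x_i^2$. A minimal sketch with illustrative weights and values:

```python
# Jensen's inequality with the convex function F(x) = x^2:
# F(E(f)) <= E(F o f). The weights and values below are illustrative.
p = [0.2, 0.5, 0.3]    # probabilities of the elementary events
x = [1.0, 4.0, -2.0]   # values of the integrable function f

mean_f = sum(pi * xi for pi, xi in zip(p, x))          # int f dP
mean_F_f = sum(pi * xi ** 2 for pi, xi in zip(p, x))   # int (F o f) dP

print(mean_f ** 2, mean_F_f)  # F(E(f)) <= E(F(f))
```

Since $F(x) = x^2$ is strictly convex, equality would hold only if $f$ were constant $P$-a.e., which it is not here.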
Theorem 1.4. (Radon-Nikodym) Let $\nu$ be a measure and $\mu$ be a $\sigma$-finite measure on the measurable space $(\Omega, \mathcal{B})$. Then $\nu \ll \mu$ if and only if there exists a $\mu$-Lebesgue integrable function $f : \Omega \to [0, \infty]$ such that
$$\nu(A) = \int_A f \, d\mu, \ \forall A \in \mathcal{B}.$$
Moreover, the function $f$ is unique $\mu$-a.e.

Definition 1.23. In the context of the above theorem, the function $f$ is called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$ and is written $f = \dfrac{d\nu}{d\mu}$.
Theorem 1.5. (Fubini) Let $(\Omega_1, \mathcal{B}_1, \mu_1)$ and $(\Omega_2, \mathcal{B}_2, \mu_2)$ be two measure spaces, where the measures $\mu_1$ and $\mu_2$ are $\sigma$-finite. Then there exists a unique measure $\mu$ on the measurable space $(\Omega_1 \times \Omega_2, \mathcal{B}_1 \otimes \mathcal{B}_2)$ s.t.
$$\mu(A_1 \times A_2) = \mu_1(A_1) \mu_2(A_2), \ \forall A_1 \in \mathcal{B}_1, A_2 \in \mathcal{B}_2.$$
Moreover, for every $A \in \mathcal{B}_1 \otimes \mathcal{B}_2$ we have
$$\mu(A) = \int_{\Omega_1} \mu_2(A_{x_1}) \, d\mu_1(x_1) = \int_{\Omega_2} \mu_1(A_{x_2}) \, d\mu_2(x_2),$$
where $A_{x_1} = \{x_2 \in \Omega_2 \mid (x_1, x_2) \in A\}$ and $A_{x_2} = \{x_1 \in \Omega_1 \mid (x_1, x_2) \in A\}$.
Also, for every $\mu$-Lebesgue integrable function $f : \Omega_1 \times \Omega_2 \to \overline{\mathbb{R}}$ we have
$$\int_{\Omega_1 \times \Omega_2} f \, d\mu = \int_{\Omega_2} \left( \int_{\Omega_1} f(x_1, x_2) \, d\mu_1(x_1) \right) d\mu_2(x_2) = \int_{\Omega_1} \left( \int_{\Omega_2} f(x_1, x_2) \, d\mu_2(x_2) \right) d\mu_1(x_1)$$
(and all the integrals exist and they are finite).

Definition 1.24. In the context of the above theorem, the measure $\mu$ is called the product of the measures $\mu_1$ and $\mu_2$, and is written $\mu = \mu_1 \otimes \mu_2$.
The measure space $(\Omega_1 \times \Omega_2, \mathcal{B}_1 \otimes \mathcal{B}_2, \mu_1 \otimes \mu_2)$ is called the product of the measure spaces $(\Omega_1, \mathcal{B}_1, \mu_1)$ and $(\Omega_2, \mathcal{B}_2, \mu_2)$.
Theorem 1.6. (Comparison of the Lebesgue and the Riemann integrals) a) Let $a = (a_1, \ldots, a_d), b = (b_1, \ldots, b_d) \in \mathbb{R}^d$ s.t. $a \le b$. If the function $f : [a, b] \to \mathbb{R}$ is measurable (with respect to the Borel fields $\mathcal{B}^d$ and $\mathcal{B}^1$) and Riemann integrable, then $f$ is also $\lambda_d$-Lebesgue integrable and
$$\int_{[a,b]} f(x) \, d\lambda_d(x) = \int_{a_1}^{b_1} \ldots \int_{a_d}^{b_d} f(x_1, \ldots, x_d) \, dx_1 \ldots dx_d.$$
b) Let $f : \mathbb{R}^d \to [0, \infty)$ be a measurable function (with respect to the Borel fields $\mathcal{B}^d$ and $\mathcal{B}^1$) such that $f$ is Riemann integrable on every compact interval $[a, b] \subseteq \mathbb{R}^d$. Then $f$ is $\lambda_d$-Lebesgue integrable if and only if $f$ is (improperly) Riemann integrable, and
$$\int_{\mathbb{R}^d} f(x) \, d\lambda_d(x) = \int_{-\infty}^{\infty} \ldots \int_{-\infty}^{\infty} f(x_1, \ldots, x_d) \, dx_1 \ldots dx_d$$
(where the Riemann integrals from the right side are improper).
Proposition 1.13. Let $(\Omega_1, \mathcal{B}_1, P_1), \ldots, (\Omega_n, \mathcal{B}_n, P_n)$ be probability spaces, $n \in \mathbb{N}^*$. Then there exists a unique probability $P$ on the measurable space $\left( \prod_{i=1}^{n} \Omega_i, \bigotimes_{i=1}^{n} \mathcal{B}_i \right)$ s.t.
$$P\left( \prod_{i=1}^{n} A_i \right) = \prod_{i=1}^{n} P_i(A_i), \ \forall A_i \in \mathcal{B}_i, \ \forall i \in \{1, \ldots, n\}.$$
Moreover, $P = (\ldots((P_1 \otimes P_2) \otimes P_3) \ldots) \otimes P_n$ and the operation $\otimes$ is associative.


Definition 1.25. In the context of the above proposition, the probability $P$ is called the product of the probabilities $P_1, \ldots, P_n$ and is written $P = \bigotimes_{i=1}^{n} P_i$.
The probability space $\left( \prod_{i=1}^{n} \Omega_i, \bigotimes_{i=1}^{n} \mathcal{B}_i, \bigotimes_{i=1}^{n} P_i \right)$ is called the product of the probability spaces $(\Omega_1, \mathcal{B}_1, P_1), \ldots, (\Omega_n, \mathcal{B}_n, P_n)$.
Proposition 1.14. Let $I$ be an infinite index set and, for every $i \in I$, let $(\Omega_i, \mathcal{B}_i, P_i)$ be a probability space. Then there exists a unique probability $P$ on the measurable space $\left( \prod_{i \in I} \Omega_i, \bigotimes_{i \in I} \mathcal{B}_i \right)$ s.t.
$$P \circ \mathrm{pr}_J^{-1} = \bigotimes_{j \in J} P_j$$
for every finite non-empty subset $J \subseteq I$ of indices, where
$$\mathrm{pr}_J : \prod_{i \in I} \Omega_i \to \prod_{j \in J} \Omega_j, \quad \mathrm{pr}_J((\omega_i)_{i \in I}) = (\omega_j)_{j \in J}, \ \forall (\omega_i)_{i \in I} \in \prod_{i \in I} \Omega_i.$$

Definition 1.26. In the context of the above proposition, the probability $P$ is called the product of the probabilities $(P_i)_{i \in I}$, and is written $P = \bigotimes_{i \in I} P_i$.
The probability space $\left( \prod_{i \in I} \Omega_i, \bigotimes_{i \in I} \mathcal{B}_i, \bigotimes_{i \in I} P_i \right)$ is called the product of the probability spaces $(\Omega_i, \mathcal{B}_i, P_i)$, $i \in I$.
Definition 1.27. Let $(\Omega, \mathcal{B}, P)$ be a probability space and $(\Omega_1, \mathcal{B}_1)$ be a measurable space. A function $X : \Omega \to \Omega_1$ which is measurable (with respect to the Borel fields $\mathcal{B}$ and $\mathcal{B}_1$) is called a random variable (r.v., random element).
If $(\Omega_1, \mathcal{B}_1) = (\mathbb{R}^d, \mathcal{B}^d)$, then $X$ is called a $d$-dimensional r.v. (random vector). In particular, if $d = 1$ then $X$ is called a real-valued r.v.

Proposition 1.15. Let $(\Omega, \mathcal{B}, P)$ be a probability space, $(\Omega_1, \mathcal{B}_1)$ be a measurable space and $X : \Omega \to \Omega_1$ be a random variable. Then $\mu = P \circ X^{-1}$ is a probability on the space $(\Omega_1, \mathcal{B}_1)$.
Definition 1.28. In the context of the above proposition, the probability $\mu = P \circ X^{-1}$ is called the distribution (probability distribution) of the r.v. $X$ (with respect to the probability $P$).
The distribution function (probability distribution function, cumulative probability distribution function) of the r.v. $X$ (with respect to the probability $P$) is the distribution function of its distribution $P \circ X^{-1}$, i.e. the function
$$F_X : \mathbb{R}^d \to [0, 1], \quad F_X(x) = P(X \le x), \ \forall x \in \mathbb{R}^d.$$
A $d$-dimensional r.v. $X$ is called discrete if the image $X(\Omega)$ is a countable set. A $d$-dimensional r.v. $X$ is called continuous (with respect to the probability $P$) if its distribution function $F_X$ is continuous.

Remark 1.5. Let $X$ be a $d$-dimensional discrete r.v. Then its distribution function $F_X$ (with respect to any probability $P$) is also discrete, i.e. the image $F_X(\mathbb{R}^d)$ is a countable set.
Proposition 1.16. Let $(\Omega, \mathcal{B}, P)$ be a probability space and $X : \Omega \to \mathbb{R}^d$ be a $d$-dimensional r.v.
a) If $X$ is discrete, then its distribution $\mu = P \circ X^{-1}$ is also discrete.
b) If $X$ is continuous (with respect to $P$), then its distribution $\mu = P \circ X^{-1}$ is also continuous.
Definition 1.29. A function $p : \mathbb{R}^d \to [0, \infty)$ that is measurable (with respect to the Borel fields $\mathcal{B}^d$ and $\mathcal{B}^1$) is called a probability density function (probability function, density function) if it is $\lambda_d$-Lebesgue integrable and
$$\int_{\mathbb{R}^d} p(x) \, d\lambda_d(x) = 1.$$

Proposition 1.17. Let $\mu$ be a distribution on $\mathbb{R}^d$ and $X$ be a $d$-dimensional r.v. with the distribution $\mu$ (with respect to a probability $P$). If $\mu \ll \lambda_d$, then the Radon-Nikodym derivative $\dfrac{d\mu}{d\lambda_d}$ is a probability density function and
$$\frac{d\mu}{d\lambda_d} = F'_\mu \quad \lambda_d\text{-a.e.},$$
where $F_\mu$ is the distribution function of $\mu$ (and of $X$, with respect to $P$), and $F'_\mu$ is its derivative.

Definition 1.30. In the context of the above proposition, the function $p = \dfrac{d\mu}{d\lambda_d}$ is called the probability density function (probability function, density function) of the distribution $\mu$ (and of the r.v. $X$, with respect to the probability $P$).


Definition 1.31. Let $X_1, \ldots, X_n$ be random variables defined on the probability space $(\Omega, \mathcal{B}, P)$, where $X_i$ is a $d_i$-dimensional r.v., for every $i \in \{1, \ldots, n\}$. Let $X = (X_1, \ldots, X_n) : \Omega \to \mathbb{R}^{d_1} \times \ldots \times \mathbb{R}^{d_n}$ be the $(d_1 + \ldots + d_n)$-dimensional r.v. with these components.
The distribution of $X$ (with respect to $P$) is called the joint distribution of the r.v. $X_1, \ldots, X_n$ (with respect to $P$).
For any $i \in \{1, \ldots, n\}$, the distribution of $X_i$ (with respect to $P$) is called a marginal distribution of the r.v. $X$ (with respect to $P$).
Definition 1.32. Let $(\Omega, \mathcal{B}, P)$ be a probability space, $I$ be a non-empty index set and $(X_i)_{i \in I}$ be a family of random variables defined on this space, with values in a measurable space $(\Omega_i, \mathcal{B}_i)$, for every $i \in I$. The r.v. $X_i$, $i \in I$ are called independent (with respect to the probability $P$) if
$$P(X_{i_1} \in A_{i_1}, \ldots, X_{i_k} \in A_{i_k}) = P(X_{i_1} \in A_{i_1}) \cdot \ldots \cdot P(X_{i_k} \in A_{i_k})$$
for every finite non-empty subset $\{i_1, \ldots, i_k\} \subseteq I$ of indices, $i_1 < \ldots < i_k$, $k \in \mathbb{N}^*$, and for every events $A_{i_1} \in \mathcal{B}_{i_1}, \ldots, A_{i_k} \in \mathcal{B}_{i_k}$.
Proposition 1.18. Let $X_1, \ldots, X_n$ be random variables defined on the probability space $(\Omega, \mathcal{B}, P)$, where $X_i$ is a $d_i$-dimensional r.v., for every $i \in \{1, \ldots, n\}$, $n \in \mathbb{N}^*$. Let $\mu_1, \ldots, \mu_n$ be the distributions of $X_1, \ldots, X_n$, respectively (with respect to $P$), and let $F_{X_1}, \ldots, F_{X_n}$ be the distribution functions of $X_1, \ldots, X_n$, respectively (with respect to $P$).
a) The following assertions are equivalent:
a1) The r.v. $X_1, \ldots, X_n$ are independent (with respect to $P$);
a2) For every events $A_1 \in \mathcal{B}^{d_1}, \ldots, A_n \in \mathcal{B}^{d_n}$ we have
$$P(X_1 \in A_1, \ldots, X_n \in A_n) = P(X_1 \in A_1) \cdot \ldots \cdot P(X_n \in A_n);$$
a3) The distribution $\mu$ of $X = (X_1, \ldots, X_n)$ (with respect to $P$) verifies the equality $\mu = \mu_1 \otimes \ldots \otimes \mu_n$;
a4) The distribution function $F_X$ of $X = (X_1, \ldots, X_n)$ (with respect to $P$) verifies the equality
$$F_X(x) = F_{X_1}(x_1) \cdot \ldots \cdot F_{X_n}(x_n), \ \forall x = (x_1, \ldots, x_n) \in \mathbb{R}^{d_1} \times \ldots \times \mathbb{R}^{d_n}.$$
b) We assume moreover that the r.v. $X_1, \ldots, X_n$ are discrete. Then $X_1, \ldots, X_n$ are independent (with respect to $P$) if and only if
$$P(X_1 = x_1, \ldots, X_n = x_n) = P(X_1 = x_1) \cdot \ldots \cdot P(X_n = x_n),$$
for every $x_1 \in \mathbb{R}^{d_1}, \ldots, x_n \in \mathbb{R}^{d_n}$.
c) We assume moreover that the r.v. $X_1, \ldots, X_n$ are continuous, with the probability density functions $p_1, \ldots, p_n$, respectively (with respect to $P$). Then $X_1, \ldots, X_n$ are independent (with respect to $P$) if and only if the probability density function $p$ of the r.v. $X = (X_1, \ldots, X_n)$ (with respect to $P$) verifies the equality
$$p(x) = p_1(x_1) \cdot \ldots \cdot p_n(x_n), \ \forall x = (x_1, \ldots, x_n) \in \mathbb{R}^{d_1} \times \ldots \times \mathbb{R}^{d_n} \ (\lambda_{d_1 + \ldots + d_n}\text{-a.e.}).$$
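Criterion b) can be tested directly on a pair of discrete r.v. with finitely many values. A minimal sketch (the marginal pmfs and the perturbation are illustrative):

```python
from itertools import product

# Criterion b) for discrete r.v.: X and Y are independent iff
# P(X = x, Y = y) = P(X = x) * P(Y = y) for all x, y.
px = {0: 0.3, 1: 0.7}             # pmf of X
py = {0: 0.25, 1: 0.25, 2: 0.5}   # pmf of Y

# An independent joint pmf: the product of the marginals.
joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}

def is_independent(joint, px, py, tol=1e-12):
    return all(abs(joint[(x, y)] - px[x] * py[y]) <= tol
               for x, y in product(px, py))

print(is_independent(joint, px, py))   # True

# Perturbing the joint pmf (while keeping the same marginals)
# destroys independence.
dep = dict(joint)
dep[(0, 0)] += 0.05; dep[(0, 1)] -= 0.05
dep[(1, 0)] -= 0.05; dep[(1, 1)] += 0.05
print(is_independent(dep, px, py))     # False
```

The perturbation moves mass between cells of the joint table without changing any row or column sum, which is exactly why independence is a genuinely stronger condition than having the given marginals.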
Definition 1.33. Let $\mu$ and $\nu$ be two distributions on $\mathbb{R}$, and let $X$ and $Y$ be two random variables with the distributions $\mu$ and $\nu$, respectively (with respect to a probability $P$). Let $r \in \mathbb{R}$, $r > 0$.
a) The $r$th absolute moment of the distribution $\mu$ (and of the r.v. $X$, with respect to the probability $P$) is defined by
$$\overline{E}_r(\mu) \equiv \overline{E}_r(X) = \int_{\mathbb{R}} |x|^r \, d\mu(x).$$
b) If $\overline{E}_r(\mu) < \infty$, then the $r$th moment of the distribution $\mu$ (and of the r.v. $X$, with respect to the probability $P$) is defined by
$$E_r(\mu) \equiv E_r(X) = \int_{\mathbb{R}} x^r \, d\mu(x).$$
In particular, the mean (expected value, expectation or average) of the distribution $\mu$ (and of the r.v. $X$, with respect to the probability $P$) is defined by
$$E(\mu) \equiv E(X) = E_1(\mu) = \int_{\mathbb{R}} x \, d\mu(x).$$
c) If $\overline{E}_r(\mu) < \infty$, then the $r$th central moment of the distribution $\mu$ (and of the r.v. $X$, with respect to the probability $P$) is defined by
$$E_r^c(\mu) \equiv E_r^c(X) = \int_{\mathbb{R}} [x - E(\mu)]^r \, d\mu(x).$$
In particular, the variance of the distribution $\mu$ (and of the r.v. $X$, with respect to the probability $P$) is defined by
$$\mathrm{var}(\mu) \equiv \mathrm{var}(X) = E_2^c(\mu) = \int_{\mathbb{R}} [x - E(\mu)]^2 \, d\mu(x).$$
d) If $\mathrm{var}(X) < \infty$ and $\mathrm{var}(Y) < \infty$, then the covariance of the r.v. $X$ and $Y$ (with respect to the probability $P$) is defined by
$$\mathrm{cov}(X, Y) = E\big( [X - E(X)][Y - E(Y)] \big).$$

Proposition 1.19. (Properties of mean, variance and covariance for real-valued random variables) In the context of the above definition, we have:
var(X) = E_2(X) − [E(X)]^2 = cov(X, X);
cov(Y, X) = cov(X, Y); cov(X, Y) = E(XY) − E(X)E(Y);
E(aX) = aE(X), var(aX) = a^2 var(X), ∀ a ∈ R;
E(X + Y) = E(X) + E(Y); var(X + Y) = var(X) + var(Y) + 2 cov(X, Y).
If the r.v. X and Y are independent and their means are finite, then
E(XY) = E(X)E(Y), cov(X, Y) = 0, var(X + Y) = var(X) + var(Y).
If the r.v. X is constant, i.e. X(ω) = c, ∀ ω ∈ Ω, where c ∈ R, then
E(X) = c and var(X) = 0.
Proposition 1.20. (Moments of discrete r.v.) In the context of the above definition, if the r.v. X is discrete then we have:
E_r(X) ≡ E_r(μ) = Σ_{x∈A} |x|^r μ({x});  E_r(X) ≡ E_r(μ) = Σ_{x∈A} x^r μ({x});
E(X) ≡ E(μ) = Σ_{x∈A} x μ({x});  E_r^c(X) ≡ E_r^c(μ) = Σ_{x∈A} [x − E(μ)]^r μ({x});
var(X) ≡ var(μ) = Σ_{x∈A} [x − E(μ)]^2 μ({x}) = Σ_{x∈A} x^2 μ({x}) − [Σ_{x∈A} x μ({x})]^2,
where μ({x}) = P(X = x) and A = {x ∈ R / μ({x}) > 0}.


Remark 1.6. In the setting of the above proposition, we have A ⊆ X(Ω), and hence the set A is countable. Obviously, all the formulas of the above proposition remain valid if we replace the set A with the set X(Ω).
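As an illustration (ours, not an example from the course), the discrete-case formulas of Proposition 1.20 can be evaluated directly; the three-point distribution below is hypothetical.

```python
# Moments of a discrete r.v. via Proposition 1.20.
# The distribution mu is hypothetical: values -1, 0, 2 with
# probabilities 0.2, 0.5, 0.3 (so mu({x}) = P(X = x)).
mu = {-1: 0.2, 0: 0.5, 2: 0.3}

def moment(mu, r):
    """r-th moment: sum over the support A of x^r * mu({x})."""
    return sum(x**r * p for x, p in mu.items())

def central_moment(mu, r):
    """r-th central moment: sum of (x - E(mu))^r * mu({x})."""
    m = moment(mu, 1)
    return sum((x - m)**r * p for x, p in mu.items())

mean = moment(mu, 1)            # E(mu) = -0.2 + 0 + 0.6 = 0.4
var = central_moment(mu, 2)     # variance as the 2nd central moment
# The identity var = E_2 - [E_1]^2 from Proposition 1.19:
assert abs(var - (moment(mu, 2) - mean**2)) < 1e-12
```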
Proposition 1.21. (Moments of continuous r.v.) In the context of the above definition, if the r.v. X is continuous, with the probability density function p (with respect to the probability P), then we have:
E_r(X) ≡ E_r(μ) = ∫_R |x|^r p(x) dm_L(x);
E_r(X) ≡ E_r(μ) = ∫_R x^r p(x) dm_L(x);
E(X) ≡ E(μ) = ∫_R x p(x) dm_L(x);
E_r^c(X) ≡ E_r^c(μ) = ∫_R [x − E(μ)]^r p(x) dm_L(x);
var(X) ≡ var(μ) = ∫_R [x − E(μ)]^2 p(x) dm_L(x) = ∫_R x^2 p(x) dm_L(x) − [∫_R x p(x) dm_L(x)]^2.
Remark 1.7. In the context of the above proposition, if the probability density function p is continuous or, more generally, Riemann integrable on every compact interval, then from Theorem 1.6 we have:
E_r(X) = ∫_R |x|^r p(x) dx;  E_r(X) = ∫_R x^r p(x) dx;  E(X) = ∫_R x p(x) dx;
E_r^c(X) = ∫_R [x − E(μ)]^r p(x) dx;
var(X) = ∫_R [x − E(μ)]^2 p(x) dx = ∫_R x^2 p(x) dx − [∫_R x p(x) dx]^2.
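When p is Riemann integrable, these integrals can be approximated by ordinary Riemann sums. The sketch below is our illustration (the uniform density on [0, 1] is a made-up example); it recovers the known mean 1/2 and variance 1/12.

```python
# Approximating the moment formulas of Remark 1.7 by midpoint Riemann sums.
def p(x):
    """Hypothetical density: uniform on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule Riemann sum of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) for k in range(steps)) * h

mean = integrate(lambda x: x * p(x), 0.0, 1.0)               # ~ 1/2
var = integrate(lambda x: (x - mean) ** 2 * p(x), 0.0, 1.0)  # ~ 1/12
```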

Definition 1.34. Let μ be a distribution on R^d and X = (X1, ..., Xd) be a d-dimensional random variable with the distribution μ (with respect to a probability P).
If ∫_{R^d} |x_i| dμ(x) < ∞, ∀ i ∈ {1, ..., d}, then the mean (expected value, expectation or average) of the distribution μ (and of the r.v. X, with respect to the probability P) is defined by
E(μ) ≡ E(X) = (E_1(X), ..., E_d(X)), where
E_i(X) ≡ E_i(μ) = ∫_{R^d} x_i dμ(x), ∀ i ∈ {1, ..., d}.
If ∫_{R^d} (x_1^2 + ... + x_d^2) dμ(x) < ∞, then the covariance matrix of the distribution μ (and of the r.v. X, with respect to the probability P) is defined by
cov(μ) ≡ cov(X) = (E_{ij}(X) − E_i(X)E_j(X))_{i,j∈{1,...,d}}, where
E_{ij}(X) ≡ E_{ij}(μ) = ∫_{R^d} x_i x_j dμ(x), ∀ i, j ∈ {1, ..., d}.

Proposition 1.22. (Properties of mean and covariance for random vectors) In the context of the above definition, we have:
E(X) = (E(X1), ..., E(Xd)); cov(X) = (cov(X_i, X_j))_{i,j∈{1,...,d}};
E(AXᵀ) = A E(X)ᵀ, cov(AXᵀ) = A cov(X) Aᵀ, ∀ A ∈ R^{d×d}.
If Y is another d-dimensional r.v. with the distribution ν (with respect to the same probability P), then
E(X + Y) = E(X) + E(Y),
and if X and Y are independent, then
cov(X + Y) = cov(X) + cov(Y).
Remark 1.8. Similarly to Proposition 1.20, Proposition 1.21 and Remark 1.7, the formulas of mean and covariance matrix for random vectors can be rewritten in the particular cases of d-dimensional discrete r.v., d-dimensional continuous r.v. with probability density function, and d-dimensional continuous r.v. with a Riemann integrable (on every compact interval) probability density function. The formulas obtained in this way are expressed in terms of sums, Lebesgue integrals (with respect to the Lebesgue measure m_d) and Riemann integrals, respectively.
Proposition 1.23. Let (Ω, B, P) be a probability space, (Ω1, B1) be a measurable space, X : Ω → Ω1 be a random variable and A ∈ B be an event such that P(A) > 0. Then the restriction of X to the subset A ⊆ Ω, i.e. the function
X/A : A → Ω1, (X/A)(ω) = X(ω), ∀ ω ∈ A,
is a random variable on the probability space (A, B_A, P_A), where B_A = {B ∩ A / B ∈ B} and P_A is the conditional probability induced by the event A.
Definition 1.35. In the context of the above proposition, X/A is called a conditional random variable induced by the event A. Its distribution is called the conditional distribution of the r.v. X given the event A, and its mean E(X/A) is called the conditional mean (conditional expectation) of the r.v. X given the event A.
Definition 1.36. Let μ be a probability on the measurable space (N, P(N)) and let X be a real-valued r.v. with the distribution μ (with respect to a probability P). The probability generating function of μ (and of X, with respect to P) is the function G_μ ≡ G_X defined by
G_μ(t) ≡ G_X(t) = Σ_{i=0}^∞ μ({i}) t^i, ∀ t ∈ [−1, 1],
where μ({i}) = P(X = i).


Proposition 1.24. Let X1, ..., Xn be r.v. taking values in N. If X1, ..., Xn are independent, then
G_{X1+...+Xn} = G_{X1} · ... · G_{Xn}
(all the probability generating functions being defined with respect to the same probability P).
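A quick numerical check of this product property (our example, not from the course: two independent Bernoulli(0.3) variables, whose sum is Binomial(2, 0.3)):

```python
# G_{X1+X2} = G_{X1} * G_{X2} for independent X1, X2 ~ Bernoulli(0.3);
# then X1 + X2 ~ Binomial(2, 0.3) (hypothetical example laws).
mu_bern = {0: 0.7, 1: 0.3}
mu_binom = {0: 0.49, 1: 0.42, 2: 0.09}

def G(mu, t):
    """Probability generating function G_mu(t) = sum_i mu({i}) t^i."""
    return sum(p * t**i for i, p in mu.items())

# The PGF of the sum equals the product of the PGFs on [-1, 1].
for t in (-1.0, -0.5, 0.0, 0.25, 1.0):
    assert abs(G(mu_binom, t) - G(mu_bern, t) ** 2) < 1e-12
```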
Definition 1.37. Let μ be a distribution on R^d and X be a d-dimensional r.v. with the distribution μ (with respect to a probability P). The characteristic function of μ (and of X, with respect to P) is the function φ_μ ≡ φ_X defined by
φ_μ(t) ≡ φ_X(t) = ∫_{R^d} e^{i⟨t, x⟩} dμ(x), ∀ t ∈ R^d (i^2 = −1).
The moment generating function of μ (and of X, with respect to P) is the function ψ_μ ≡ ψ_X defined by
ψ_μ(t) ≡ ψ_X(t) = ∫_{R^d} e^{⟨t, x⟩} dμ(x), ∀ t ∈ R^d.
Remark 1.9. In the above definition, ⟨t, x⟩ denotes the inner product (scalar product) of the vectors t = (t1, ..., td) and x = (x1, ..., xd), i.e. ⟨t, x⟩ = Σ_{i=1}^d t_i x_i. For d = 1 we have:
φ_X(t) ≡ φ_μ(t) = ∫_R e^{itx} dμ(x), ψ_X(t) ≡ ψ_μ(t) = ∫_R e^{tx} dμ(x), ∀ t ∈ R.

Proposition 1.25. a) In the context of the above definition, if the r.v. X is discrete, then
φ_X(t) ≡ φ_μ(t) = Σ_{x∈A} e^{i⟨t, x⟩} μ({x}), ψ_X(t) ≡ ψ_μ(t) = Σ_{x∈A} e^{⟨t, x⟩} μ({x}),
where μ({x}) = P(X = x) and A = {x ∈ R^d / μ({x}) > 0} (or A = X(Ω)).
If the r.v. X is continuous, with the probability density function p (with respect to the probability P), then
φ_X(t) ≡ φ_μ(t) = ∫_{R^d} e^{i⟨t, x⟩} p(x) dm_d(x), ψ_X(t) ≡ ψ_μ(t) = ∫_{R^d} e^{⟨t, x⟩} p(x) dm_d(x).
b) Let X1, ..., Xn be d-dimensional r.v. If X1, ..., Xn are independent (with respect to a probability P), then
φ_{X1+...+Xn} = φ_{X1} · ... · φ_{Xn}, ψ_{X1+...+Xn} = ψ_{X1} · ... · ψ_{Xn}
(all the characteristic functions and the moment generating functions being defined with respect to the same probability P).


Proposition 1.26. Let X be a real-valued r.v. with the distribution μ (with respect to a probability P) such that E_n(X) < ∞, where n ∈ N*. Then
E_r(X) = (1/i^r) · (∂^r φ_X / ∂t^r)(0), ∀ r ≤ n, r ∈ N*.
Proposition 1.27. Let μ1, ..., μn be distributions on R^d and consider the sum function
s_n : R^d × ... × R^d (n times) → R^d, s_n(x1, ..., xn) = x1 + ... + xn, ∀ x1, ..., xn ∈ R^d.
Then the function μ1 ∗ ... ∗ μn : B^d → [0, 1] defined by
μ1 ∗ ... ∗ μn = (μ1 ⊗ ... ⊗ μn) ∘ s_n^{−1}
is a distribution on R^d.
Definition 1.38. In the context of the above proposition, the distribution μ1 ∗ ... ∗ μn is called the convolution of the distributions μ1, ..., μn.
Proposition 1.28. Let X1, ..., Xn be d-dimensional r.v. with the distributions μ1, ..., μn, respectively (with respect to a probability P). If X1, ..., Xn are independent, then X = X1 + ... + Xn is a random variable with the distribution μ = μ1 ∗ ... ∗ μn (with respect to the same probability P).
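For discrete distributions the convolution reduces to a double sum over the supports. A minimal sketch (our illustration; the two fair-coin laws are made up):

```python
# A sketch of the convolution mu1 * mu2 for discrete distributions on Z.
def convolve(mu1, mu2):
    """Distribution of X1 + X2 for independent X1 ~ mu1, X2 ~ mu2."""
    out = {}
    for x, p in mu1.items():
        for y, q in mu2.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

mu1 = {0: 0.5, 1: 0.5}           # a fair coin (hypothetical law)
mu2 = {0: 0.5, 1: 0.5}
mu = convolve(mu1, mu2)          # {0: 0.25, 1: 0.5, 2: 0.25}

mean = lambda m: sum(x * p for x, p in m.items())
assert abs(sum(mu.values()) - 1.0) < 1e-12              # still a distribution
assert abs(mean(mu) - (mean(mu1) + mean(mu2))) < 1e-12  # means add
```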
Definition 1.39. Let (Ω, B, P) be a probability space, and let (Ω1, B1) be a measurable space, where Ω1 is a metric space and B1 is the Borel field generated by the open subsets of Ω1.
Let μ and (μn)_{n∈N} be finite measures on the measurable space (Ω1, B1). We say that the sequence (μn)_{n∈N} converges weakly to μ, and we write μn →ʷ μ (or μn ⇒ μ), if
lim_{n→∞} ∫ f dμn = ∫ f dμ
for every bounded, continuous function f : Ω1 → R.
Let X and (Xn)_{n∈N} be random variables defined on the probability space (Ω, B, P) with values in the measurable space (Ω1, B1). We say that the sequence (Xn)_{n∈N} converges in distribution to X (with respect to the probability P), and we write Xn →ᵈ X, if
P ∘ Xn^{−1} →ʷ P ∘ X^{−1}.


Proposition 1.29. In the context of the above definition, we have the following equivalences:
a) μn →ʷ μ if and only if lim_{n→∞} ∫ f dμn = ∫ f dμ for every bounded, uniformly continuous function f : Ω1 → R.
b) μn →ʷ μ if and only if lim_{n→∞} μn(A) = μ(A) for every A ∈ B1 s.t. μ(∂A) = 0, where ∂A is the boundary of the set A.
c) Xn →ᵈ X if and only if lim_{n→∞} ∫ f(Xn) dP = ∫ f(X) dP for every bounded, uniformly continuous function f : Ω1 → R.

Theme 2
Statistical Indicators

2.1 Introduction to Economic Statistics
Economic Statistics is the science that deals with the collection, classification, analysis and interpretation of numerical facts or data from economics. This means that, by the use of probability theory, it imposes order and regularity on aggregates of disparate elements of the same population.
Statistical population (statistical collectivity): the total number of elements with the same properties, representing the object of the investigation.
Statistical unit: the basic element of the statistical population, which will be observed within the statistical research and will represent any individual element of the population.
Statistical characteristic: a common property of all the population units.
Statistical variable: a statistical characteristic which can take different values from one unit to another (or from one group of units to another).
Statistical indicator: a numerical expression of an economic category, obtained using a statistical calculus characterizing a variable.
Statistical sample: a part of the statistical population, which will be investigated.
Descriptive Statistics: methods for representing and describing the statistical population (data summarizing, tabulation and presentation; analysis of data uniformity, consistency and symmetry; construction of indicators, index numbers, time series; correlation and regression, ...).
Inferential Statistics: methods for making predictions about the whole statistical population by studying the properties of a statistical sample (estimation of population parameters; construction of confidence intervals; testing statistical hypotheses).
The main steps of a statistical research:
1. data collection;
2. data analysis;
3. drawing conclusions and interpreting the results.
The detailed steps of a statistical research:
1. Establishing the objective of the research.
2. Defining and identifying the population to be studied according to the objective.
3. Establishing the set of characteristics according to the information we need to obtain.
4. Analyzing the already existing databases about the studied population, that is, analyzing the secondary data sources.
5. For insufficient secondary data, organizing a total research or a partial research (by sampling).
6. Organizing the data collection, which means deciding where, when and how to collect the data for each unit, individually or collectively (using a common recording format, such as a list).
7. Data recording, using a data analysis program such as EXCEL, STATISTICA, MINITAB or SPSS.
8. Data summarizing and presentation (by tables, series, graphs, ...).
9. Data analysis using descriptive statistics and inferential statistics methods.
10. Drawing conclusions and interpreting the results (by research reports).

2.2 Statistical frequency series

Statistical frequency distribution (statistical frequency series) of a statistical variable: the correspondence from the values of the variable, also called variants, to the frequencies of these values.


Simple frequency distribution (single variation frequency distribution, or univariate data): the statistical variable is one-dimensional.
Multidimensional frequency distribution: the statistical variable is multidimensional.
Frequency:
Absolute frequency, denoted by n_i, represents the number of units corresponding to a certain variant or falling into a certain class (interval).
Relative frequency, denoted by f_i, represents the share of the absolute frequency corresponding to a variant or a class in the total number of frequencies: f_i = n_i / n, where n = Σ_i n_i is the volume of the distribution.
Cumulated frequencies, which can be obtained from absolute frequencies or from relative frequencies, represent the number of units with the variable value lower than or equal to the upper limit of the current class.
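These frequencies are straightforward to compute; the sketch below (ours, not from the course) uses the failure data of Exercise 2.1 further down.

```python
# Absolute, relative and cumulated frequencies for a classification by variants.
from collections import Counter

data = [12, 15, 29, 23, 17, 7, 10, 14, 14, 27, 22, 8, 5,
        19, 6, 15, 20, 17, 16, 17, 23, 19, 9, 28, 5]   # Exercise 2.1 data

n = len(data)                                     # volume of the distribution
abs_freq = dict(sorted(Counter(data).items()))    # n_i for each variant x_i
rel_freq = {x: ni / n for x, ni in abs_freq.items()}  # f_i = n_i / n

# Cumulated absolute frequencies (units with value <= current variant).
cum, total = {}, 0
for x, ni in abs_freq.items():
    total += ni
    cum[x] = total

assert abs(sum(rel_freq.values()) - 1.0) < 1e-12  # relative frequencies sum to 1
```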

2.3 Classification algorithm

A class (interval) of variation of the values (data) of a statistical distribution is defined between two boundaries: its lower limit and its upper limit. The class size (the interval size) represents the difference between the upper limit and the lower limit.
Data grouping assumes solving the following main issues:
the purpose of the classification is to obtain synthetic data;
the grouped results should be homogeneous groups;
their frequency distribution should be as close as possible to the normal distribution (Gauss bell).
The classification algorithm consists of the following steps:
1. Compute the amplitude of the distribution:
A = maximum value − minimum value.
2. Choose the number r of classes (intervals). For example, according to the rule of H.D. Sturges:
r = 1 + 3.322 lg n.
3. Compute the class sizes (the interval sizes). For example, for classes with the same size, the size is
d = A / r
(rounded to an integer!).
4. Construct the classes, by starting with the minimum value and adding the class size d step by step.
Exercise 2.1. The numbers of failures produced by a piece of equipment, recorded for the last 25 hours, are as follows: 12, 15, 29, 23, 17, 7, 10, 14, 14, 27, 22, 8, 5, 19, 6, 15, 20, 17, 16, 17, 23, 19, 9, 28, 5.
a. Construct a frequency distribution and a relative frequency distribution for these data.
b. Construct a line chart for these data.
c. Group these data using the above classification algorithm.
d. Construct a frequency distribution and a relative frequency distribution for the obtained classes (grouped data).

2.4 Classification of statistical indicators

Statistical indicators (statistical measures) are numerical expressions of a statistical distribution, according to a certain characteristic.
Classification of statistical indicators:
Central tendency indicators: describe in a synthetic manner the typical feature of a statistical distribution and summarize the essential information comprised in it.
The main central tendency indicators:
Average measures: the arithmetic mean, the geometric mean, the quadratic mean, the harmonic mean, the absolute moments;
Position measures: the mode, the median, the quintiles.
For a central tendency measure to be representative, the set of values of a given distribution needs to be homogeneous. This property is evaluated by variation measures.
Variation indicators: evaluate the variability of a statistical distribution from its central tendency measures.
The main variation indicators:
Simple measures of dispersion: the amplitude (the absolute range), the relative range, the inter-quintile range, the individual deviations;
Average deviation measures: the mean absolute deviation, the variance, the standard deviation, the central moments, the covariance;
Shape measures: Pearson's skewness coefficients, Yule's skewness coefficient, the excess coefficient.
Relationships between the statistical methods applied according to the type of measurement scale:

                           Nominal scale   Ordinal scale       Interval scale        Ratio scale
  Measures of position     Mode            Median, Quintiles   Arithmetic mean       Geometric mean
  Measures of dispersion   -               Quintiles           Standard deviation    Percentage of variation
  Measures of association  Contingency     Rank correlation    Correlation,          all the previous
                           coefficient     coefficients        regression            methods
  Significance tests       Chi-square      sign test           t test, Fisher test   all the previous tests

2.5 Average measures

2.5.1 The arithmetic mean

For a simple distribution with an ungrouped set of values (data), the arithmetic mean is
x̄ = (1/n) Σ_{i=1}^n x_i,
where x_1, ..., x_n are the values of the distribution, n being the volume (the size) of the distribution (the number of recorded values).
For a frequency distribution obtained from a classification by variants, the arithmetic mean (or the weighted arithmetic mean) is
x̄ = (Σ_{i=1}^r n_i x_i) / (Σ_{i=1}^r n_i) = Σ_{i=1}^r f_i x_i,
where x_1, ..., x_r are the variants (the distinct values of the distribution), n_1, ..., n_r are the corresponding absolute frequencies and f_1, ..., f_r are the corresponding relative frequencies, r being the number of variants.
For a frequency distribution obtained from a classification by classes (intervals), the arithmetic mean (or the weighted arithmetic mean) is
x̄ = (Σ_{i=1}^r n_i x_i) / (Σ_{i=1}^r n_i) = Σ_{i=1}^r f_i x_i,
where x_1, ..., x_r are the class middles (the interval middles) given by
x_i = (l_{i−1} + l_i)/2, if the i-th interval is [l_{i−1}, l_i), i ∈ {1, ..., r},
n_1, ..., n_r are the corresponding absolute frequencies of the given classes and f_1, ..., f_r are the corresponding relative frequencies of the given classes, r being the number of classes (intervals).
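The three forms of the arithmetic mean can be sketched as follows (our illustration; the small data sets are made up):

```python
# The arithmetic mean: ungrouped data, variants with frequencies, class middles.
ungrouped = [12, 15, 29, 23, 17]                 # hypothetical small sample
mean_simple = sum(ungrouped) / len(ungrouped)

variants = {5: 2, 10: 3, 20: 5}                  # x_i -> n_i (hypothetical)
n = sum(variants.values())
mean_weighted = sum(x * ni for x, ni in variants.items()) / n
# Equivalent form using relative frequencies f_i = n_i / n:
assert abs(mean_weighted - sum(x * (ni / n) for x, ni in variants.items())) < 1e-12

classes = [(5, 9, 5), (9, 13, 3), (13, 17, 5)]   # (l_{i-1}, l_i, n_i)
n_cls = sum(ni for _, _, ni in classes)
mean_grouped = sum((lo + hi) / 2 * ni for lo, hi, ni in classes) / n_cls
```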

2.5.2 The harmonic mean

We will use the same notations as above.


For a simple distribution with an ungrouped set of values (data), the harmonic mean is
x̄_h = n / (Σ_{i=1}^n 1/x_i).
For a frequency distribution obtained from a classification by variants or by classes (intervals), the harmonic mean (or the weighted harmonic mean) is
x̄_h = (Σ_{i=1}^r n_i) / (Σ_{i=1}^r n_i/x_i) = 1 / (Σ_{i=1}^r f_i/x_i).

2.5.3 The geometric mean

We will use the same notations as above.


For a simple distribution with an ungrouped set of values (data), the geometric mean is
x̄_g = (Π_{i=1}^n x_i)^{1/n}.
For a frequency distribution obtained from a classification by variants or by classes (intervals), the geometric mean (or the weighted geometric mean) is
x̄_g = (Π_{i=1}^r x_i^{n_i})^{1/n} = Π_{i=1}^r x_i^{f_i},
where n = Σ_{i=1}^r n_i.

2.5.4 The quadratic mean

We will use the same notations as above.


For a simple distribution with an ungrouped set of values (data), the quadratic mean (the square mean) is
x̄_q = √((1/n) Σ_{i=1}^n x_i²).
For a frequency distribution obtained from a classification by variants or by classes (intervals), the quadratic mean (the square mean, the weighted quadratic mean or the weighted square mean) is
x̄_q = √((Σ_{i=1}^r n_i x_i²) / (Σ_{i=1}^r n_i)) = √(Σ_{i=1}^r f_i x_i²).

2.5.5 Absolute moments

We will use the same notations as above.


For a simple distribution with an ungrouped set of values (data), the j-th absolute moment is
m_j = (1/n) Σ_{i=1}^n |x_i|^j,
and the j-th moment is
m_j = (1/n) Σ_{i=1}^n x_i^j.
For a frequency distribution obtained from a classification by variants or by classes (intervals), the j-th absolute moment is
m_j = (Σ_{i=1}^r n_i |x_i|^j) / (Σ_{i=1}^r n_i) = Σ_{i=1}^r f_i |x_i|^j,
and the j-th moment is
m_j = (Σ_{i=1}^r n_i x_i^j) / (Σ_{i=1}^r n_i) = Σ_{i=1}^r f_i x_i^j.
We remark that m_1 = x̄.

2.5.6 Properties of the means

1. The means inequality:
x_min ≤ x̄_h ≤ x̄_g ≤ x̄ ≤ x̄_q ≤ x_max,
where x_min and x_max are the minimum value and the maximum value of the given distribution, respectively.
2. All the above means are affected by extreme values and cannot be used for heterogeneous data.
3. The arithmetic mean is a "normal" value, meaning that the sum of the deviations of the individual values from the mean equals zero, i.e.
Σ_{i=1}^n (x_i − x̄) = 0.
4. Compared to the arithmetic mean, which is influenced by the large values of the given distribution, the harmonic mean is more influenced by the small values of the distribution. For example, the harmonic mean is used to compute the price index in order to measure the inflation.
5. The geometric mean is used, for example, to compute the average price index for a year:
Ī = (I^{feb/jan} · I^{mar/feb} · ... · I^{dec/nov})^{1/11}.
6. The quadratic mean is more influenced by the large values of the variable. This mean is used to compute the standard deviations.
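The means inequality can be checked numerically; the sketch below is our illustration on a hypothetical positive data set (the inequality requires x_i > 0):

```python
# Numerical check of x_min <= x_h <= x_g <= x_bar <= x_q <= x_max.
import math

xs = [5.0, 8.0, 12.0, 20.0, 29.0]   # made-up positive values
n = len(xs)

x_h = n / sum(1 / x for x in xs)               # harmonic mean
x_g = math.prod(xs) ** (1 / n)                 # geometric mean
x_bar = sum(xs) / n                            # arithmetic mean
x_q = math.sqrt(sum(x * x for x in xs) / n)    # quadratic mean

assert min(xs) <= x_h <= x_g <= x_bar <= x_q <= max(xs)
```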

2.6 Position measures

2.6.1 The mode

The mode (the modal value, the dominant value) of a statistical distribution is the most frequent value of this distribution.

For a frequency distribution obtained from a classification by variants, the mode is the variant with the highest frequency.
For a frequency distribution obtained from a classification by classes (intervals), the mode is
M_o = (l_{i−1} + l_i)/2 − (d/2) · (f_{i+1} − f_{i−1}) / (f_{i−1} − 2f_i + f_{i+1}),
where [l_{i−1}, l_i) is the interval with the maximum frequency, called the modal interval, d = l_i − l_{i−1} is the size of the modal interval, f_i is the relative frequency of the modal interval, and f_{i−1}, f_{i+1} are the relative frequencies of the previous interval and of the next interval, respectively.

2.6.2 The median

The median (the median value) of a statistical distribution is the value of the distribution that splits the set of values into two equal subsets. Hence half of the population has the characteristic smaller than the median value, and the other half has the characteristic larger than the median value.
For a simple distribution with an ungrouped set of values (data), let
x_1 ≤ x_2 ≤ ... ≤ x_n
be the ordered sequence of its values. The median of this distribution is
M_e = x_{(n+1)/2}, if n is odd;
M_e = (x_{n/2} + x_{n/2+1})/2, if n is even.
For a frequency distribution obtained from a classification by variants, let x_1, ..., x_r be the variants, and let n_1, ..., n_r be the corresponding absolute frequencies.
The median estimation procedure consists of the following steps:
1. Compute the median location:
Me_loc = n/2, if n ≥ 100;  Me_loc = (n+1)/2, if n < 100,
where n = Σ_{i=1}^r n_i is the volume of the distribution.

2. For each variant x_i, i ∈ {1, ..., r}, compute the cumulated absolute frequency Σ_{j=1}^i n_j.
3. Compute the median M_e as the variant corresponding to the minimum (or first) cumulated frequency greater than or equal to the median location:
M_e = x_{i*}, where i* = min{i / Σ_{j=1}^i n_j ≥ Me_loc}.

For a frequency distribution obtained from a classification by intervals (classes), let [l_0, l_1), [l_1, l_2), ..., [l_{r−1}, l_r] be the intervals, and let n_1, ..., n_r be the corresponding absolute frequencies of these intervals.
The median estimation procedure consists of the following steps:
1. Compute the median location:
Me_loc = n/2, if n ≥ 100;  Me_loc = (n+1)/2, if n < 100,
where n = Σ_{i=1}^r n_i is the volume of the distribution.
2. For each interval [l_{i−1}, l_i), i ∈ {1, ..., r}, compute the cumulated absolute frequency Σ_{j=1}^i n_j.
3. Compute the median interval [l_{i*−1}, l_{i*}) as the interval corresponding to the minimum (or first) cumulated frequency greater than or equal to the median location:
i* = min{i / Σ_{j=1}^i n_j ≥ Me_loc}.
4. Compute the median M_e as
M_e = l_{i*−1} + d · (Me_loc − Σ_{j=1}^{i*−1} n_j) / n_{i*},
where d = l_{i*} − l_{i*−1} is the size of the median interval.
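The four steps above can be sketched as follows (our illustration; the class limits and frequencies are hypothetical):

```python
# Median for a classification by intervals, following the procedure above.
limits = [5, 9, 13, 17, 21, 25, 29]   # l_0, ..., l_r (hypothetical classes)
freqs = [5, 3, 5, 6, 3, 3]            # n_1, ..., n_r, so n = 25

n = sum(freqs)
me_loc = n / 2 if n >= 100 else (n + 1) / 2    # step 1: median location

# Steps 2-3: find the median interval (first cumulated frequency >= me_loc).
cum = 0
for i, ni in enumerate(freqs):
    if cum + ni >= me_loc:
        break
    cum += ni

# Step 4: interpolate inside the median interval.
d = limits[i + 1] - limits[i]                  # size of the median interval
median = limits[i] + d * (me_loc - cum) / freqs[i]
```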

2.6.3 Quintiles

The quintiles (the fractiles) of a statistical distribution are the values of the distribution that split the set of values into k equal subsets. They are defined and computed in a similar manner as the median.
The main categories of fractiles:
Quartiles: 3 measures Q_1, Q_2, Q_3 that split the set of values into 4 equal subsets. We remark that the second quartile equals the median: Q_2 = M_e;
Deciles: 9 measures that split the set of values into 10 equal subsets;
Percentiles: 99 measures that split the set of values into 100 equal subsets.

2.6.4 Properties of the position measures

1. The mode is a measure of the central tendency widely used in sales analysis. Its main advantage is the possibility to be computed also for qualitative variables, and its main disadvantage is the possibility of having a multi-modal distribution (a distribution with more than one modal value).
2. The main advantage of the median is the fact that the extreme values do not affect it as strongly as they affect the mean. Also, the median is easy to compute and can be used also for ordinal qualitative data. The main disadvantage of the median is that it does not take into account all the observations.
3. For a symmetrical distribution, the mean, the median and the mode are identical. For a skewed distribution, the mean, the median and the mode are located in different places.
Exercise 2.2. The numbers of failures produced by a piece of equipment, recorded for the last 25 hours, are as follows: 12, 15, 29, 23, 17, 7, 10, 14, 14, 27, 22, 8, 5, 19, 6, 15, 20, 17, 16, 17, 23, 19, 9, 28, 5.
Calculate the above statistical indicators in each of the following cases:
a. simple distribution with an ungrouped set of values;
b. frequency distribution obtained from a classification by variants;
c. frequency distribution obtained from a classification by intervals.

2.7 Variation measures

The importance of the variation measures:
they provide additional information to analyze the reliability of the central tendency measure;
they characterize in depth the variation and the spread of the value set;
they allow comparing two or more samples selected from the same population.

2.7.1 Simple measures of dispersion

We will use the same notations as above.
1. The amplitude (the absolute range):
A_x = x_max − x_min,
where x_min and x_max are the minimum value and the maximum value of the given distribution, respectively.
2. The relative range:
as a coefficient: A_x / x̄;
in percentages: (A_x / x̄) · 100.
3. The inter-quintile range:
Q = ((M_e − Q_1) + (Q_3 − M_e))/2 = (Q_3 − Q_1)/2.
It measures how far from the median we should go on either side before including 50% of the observations.
4. The individual deviations:
the absolute deviation: d_i = x_i − x̄;
the relative deviation: d'_i = ((x_i − x̄)/x̄) · 100.
They provide information only for each recorded value and do not express the overall variation.

2.7.2 Average deviation measures

We will use the same notations as above.


1. The mean absolute deviation:
For a simple distribution with an ungrouped set of values (data), the mean absolute deviation is
MAD = (1/n) Σ_{i=1}^n |x_i − x̄|.
For a frequency distribution obtained from a classification by variants or by classes (intervals), the mean absolute deviation is
MAD = (Σ_{i=1}^r n_i |x_i − x̄|) / (Σ_{i=1}^r n_i) = Σ_{i=1}^r f_i |x_i − x̄|.
2. The variance:
For a simple distribution with an ungrouped set of values (data), the variance is
σ² = (1/n) Σ_{i=1}^n (x_i − x̄)²,
and the rectified variance is
s² = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)².
For a frequency distribution obtained from a classification by variants or by classes (intervals), the variance is
σ² = (Σ_{i=1}^r n_i (x_i − x̄)²) / (Σ_{i=1}^r n_i) = Σ_{i=1}^r f_i (x_i − x̄)²,
and the rectified variance is
s² = (Σ_{i=1}^r n_i (x_i − x̄)²) / ((Σ_{i=1}^r n_i) − 1).

For the rectified variance, the average variance computed over many samples extracted from the same population tends to the population variance.
The variance has no measurement unit, being an abstract measure. It is used to compute the standard deviation and other variation and correlation measures.
3. The standard deviation: σ = √(σ²).
The rectified standard deviation: s = √(s²).
The standard deviation allows us to determine how the values of a frequency distribution are located in relation to the mean. For example, according to Chebyshev's inequality
P(|X − x̄| ≥ tσ) ≤ 1/t², ∀ t > 0,
we have:
at least 75% of the values will fall within 2 standard deviations of the mean of the distribution (t = 2);
at least 88.89% of the values will fall within 3 standard deviations of the mean (t = 3).
4. The coefficient of variation: v = σ/x̄.
The rectified coefficient of variation: v' = s/x̄.
Some average measures of dispersion are expressed in the same concrete measurement units as the variable. When comparing two or more distributions we cannot use these measures, due to possibly different measurement units. This inconvenience is overcome by using relative dispersion measures. The coefficient of variation is the main relative dispersion measure.
It takes values between 0 and 1.
If 0 ≤ v ≤ 0.17, then the mean is strictly representative and we have a high level of homogeneity;
If 0.17 < v ≤ 0.35, then the mean is moderately representative and we have a medium level of homogeneity;
If 0.35 < v ≤ 0.5, then the mean has a low representativeness;
If v > 0.5, then the mean is not representative for the data set and the population is heterogeneous.
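A short sketch (ours) of the variance, the rectified variance, the standard deviation and the coefficient of variation for a hypothetical small sample:

```python
# Average deviation measures for an ungrouped hypothetical sample.
import math

xs = [12.0, 15.0, 29.0, 23.0, 17.0, 7.0, 10.0, 14.0]
n = len(xs)
x_bar = sum(xs) / n                                 # arithmetic mean

var = sum((x - x_bar) ** 2 for x in xs) / n         # sigma^2
s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)    # rectified variance
sigma = math.sqrt(var)                              # standard deviation
v = sigma / x_bar                                   # coefficient of variation

# Here 0.35 < v <= 0.5, so by the thresholds above the mean
# has a low representativeness for this data set.
assert 0.35 < v <= 0.5
```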
5. The central moments:
For a simple distribution with an ungrouped set of values (data), the j-th central moment is
m_j^c = (1/n) Σ_{i=1}^n (x_i − x̄)^j.
For a frequency distribution obtained from a classification by variants or by classes (intervals), the j-th central moment is
m_j^c = (Σ_{i=1}^r n_i (x_i − x̄)^j) / (Σ_{i=1}^r n_i) = Σ_{i=1}^r f_i (x_i − x̄)^j.
We remark that m_2^c = σ².

Remark 2.1. For the frequency distributions obtained from a classification by classes (intervals), one uses the following Sheppard's corrections for the first four moments and central moments:
M_1 = m_1;  M_1^c = m_1^c = 0;
M_2 = m_2 + d²/12;  M_2^c = m_2^c − d²/12;
M_3 = m_3 + (d²/4)·m_1;  M_3^c = m_3^c;
M_4 = m_4 + (d²/2)·m_2 + d⁴/80;  M_4^c = m_4^c − (d²/2)·m_2^c + 7d⁴/240.

2.7.3 Shape measures

For a perfectly symmetric distribution the mean, the median and the mode are equal. This distribution corresponds to the Gauss bell shape (the normal distribution). In this case the influence of the random factors is characterized by a certain regularity, so the influences are distributed in both directions, compared to the arithmetic mean.
For analyzing the shape of an arbitrary distribution one needs to compare the mean, the median and the mode. An arbitrary distribution can be symmetric, slightly skewed or highly skewed. For a skewed distribution the mean, the median and the mode are located in different places. More precisely:
If the frequencies are concentrated around the small values, we have M_o < M_e < x̄; the symmetric distribution was modified by prolonging towards +∞ and becomes skewed to the right (positive skewness).
If the frequencies are concentrated around the large values, we have x̄ < M_e < M_o; the symmetric distribution was modified by prolonging towards −∞ and becomes skewed to the left (negative skewness).
The main methods for interpreting the shapes of frequency distributions:
The graphical method, by analyzing the frequency polygon.
The analytic method, by computing the skewness coefficients: the Pearson coefficients, the Yule coefficient, the excess coefficient, the inter-quintile coefficient.
We will use the same notations as above.
1. Pearson's skewness coefficient based on the mean deviation from the mode:
A_s = (x̄ − M_o)/σ.
It takes values between −1 and +1.
If A_s is close to zero, then the distribution is symmetric.
If A_s is close to −1, then the distribution is skewed to the left (negative skewness).
If A_s is close to +1, then the distribution is skewed to the right (positive skewness).
2. Pearson's skewness coefficient based on the mean deviation from the median:
A'_s = 3(x̄ − M_e)/σ.
It takes values between −3 and +3.
If A'_s is close to zero, then the distribution is symmetric.
If A'_s is close to −3, then the distribution is skewed to the left (negative skewness).
If A'_s is close to +3, then the distribution is skewed to the right (positive skewness).

This coefficient is mainly used for slightly skewed distributions, for which we have the relation

    x̄ − Mo = 3(x̄ − Me), so A′s = As.
3. Yule's skewness coefficient, based on the quartiles (the inter-quartile asymmetry coefficient):

    A″s = [(Q3 − Me) − (Me − Q1)] / [(Q3 − Me) + (Me − Q1)] = (Q1 + Q3 − 2Me) / (Q3 − Q1).

It takes values between −1 and 1 and is also close to zero for a symmetric distribution.
4. The excess coefficient:

    Es = m^c_4 / σ^4 − 3,

where m^c_4 is the fourth central moment. If As (or A′s, A″s) and Es are close to zero, then the distribution is symmetric.

5. The inter-quartile coefficient:

    q = Q / Me = (Q3 − Q1) / (2·Me),

where Q = (Q3 − Q1)/2 is the quartile deviation. It also takes values between −1 and 1 and is close to zero for a symmetric distribution.

Exercise 2.3. The numbers of failures produced by a piece of equipment, recorded
for the last 25 hours, are as follows: 12, 15, 29, 23, 17, 7, 10, 14, 14, 27, 22,
8, 5, 19, 6, 15, 20, 17, 16, 17, 23, 19, 9, 28, 5.
Compute and interpret the above statistical indicators in each of the
following cases:
a. simple distribution with an ungrouped set of values;
b. frequency distribution obtained from a classification by variants;
c. frequency distribution obtained from a classification by intervals.
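As a hedged illustration of case a, the indicators above can be computed for the ungrouped data of Exercise 2.3 with Python's standard statistics module (a minimal sketch; the grouped cases b and c are analogous):

```python
import statistics as st

# Failure counts from Exercise 2.3 (ungrouped values, 25 hours)
data = [12, 15, 29, 23, 17, 7, 10, 14, 14, 27, 22,
        8, 5, 19, 6, 15, 20, 17, 16, 17, 23, 19, 9, 28, 5]

mean = st.mean(data)      # arithmetic mean
median = st.median(data)  # Me
mode = st.mode(data)      # Mo (the most frequent value)
sigma = st.pstdev(data)   # population standard deviation

As = (mean - mode) / sigma               # Pearson's first skewness coefficient
As_prime = 3 * (mean - median) / sigma   # Pearson's second skewness coefficient

print(f"mean={mean}, Me={median}, Mo={mode}")
print(f"As={As:.3f}, A's={As_prime:.3f}")
```

Both coefficients come out slightly negative (about −0.16 and −0.05), indicating a mild negative skewness of the failure distribution.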

Theme 3
Two-dimensional statistical distributions

3.1 Least Squares Method

This method is used to approximate a function when only a partial set of its
values is known. In this way we obtain the trend of the given function.
Let f : A ⊆ R → R be a function and let

    f(xi), i ∈ {1, . . . , n}

be the given values, where x1, x2, . . . , xn ∈ A.
We will approximate the function f by a trend function g : A → R.
Usually, g is a polynomial function

    g(x) = a0 + a1·x + a2·x^2 + · · · + ak·x^k, where k < n

(in particular, g can be linear, g(x) = a0 + a1·x, or quadratic, g(x) = a0 + a1·x + a2·x^2), a hyperbolic function

    g(x) = a0 + a1/x,

an exponential function

    g(x) = a0 + a1·e^x,

or a logarithmic function

    g(x) = a0 + a1·ln x.
The Least Squares Method consists in the following steps:

1. Select the type of the trend function g, according to the graph of the set
of given points (xi, f(xi)), i ∈ {1, . . . , n}.

2. Calculate the parameters a0, a1, . . . of the trend function g by minimizing
the error sum of squares:

    min_{a0, a1, ...} Σ_{i=1}^n [f(xi) − g(xi)]^2

(by the method of critical points, for example).


In the case of the polynomial trend function

    g(x) = a0 + a1·x + a2·x^2 + · · · + ak·x^k, k < n,

using the method of critical points it follows that the parameters a0, a1, a2, . . . , ak
are the solution of the system of equations

    ∂F/∂aj (a0, a1, . . . , ak) = 0, j ∈ {0, . . . , k},

where

    F(a0, a1, . . . , ak) = Σ_{i=1}^n [f(xi) − g(xi)]^2 = Σ_{i=1}^n [f(xi) − a0 − a1·xi − a2·xi^2 − · · · − ak·xi^k]^2

(since (a0, a1, a2, . . . , ak) is the unique critical point of the function F).


This system can be written as

    −2 Σ_{i=1}^n xi^j [f(xi) − a0 − a1·xi − a2·xi^2 − · · · − ak·xi^k] = 0, j ∈ {0, . . . , k},

that is

    n·a0 + a1·Σxi + a2·Σxi^2 + · · · + ak·Σxi^k = Σf(xi)
    a0·Σxi + a1·Σxi^2 + a2·Σxi^3 + · · · + ak·Σxi^{k+1} = Σxi·f(xi)
    . . .
    a0·Σxi^k + a1·Σxi^{k+1} + a2·Σxi^{k+2} + · · · + ak·Σxi^{2k} = Σxi^k·f(xi)        (3.1)

(all sums being taken over i = 1, . . . , n): a linear system of k + 1 equations
with k + 1 unknowns, with a nonzero determinant.
In the particular case of the linear trend function

    g(x) = a0 + a1·x

(k = 1), this system has the following form:

    n·a0 + a1·Σxi = Σf(xi)
    a0·Σxi + a1·Σxi^2 = Σxi·f(xi)        (3.2)

(sums over i = 1, . . . , n).
Remark 3.1. If we select two or more trend functions of different types, the
best approximation is given by the one with the minimum error sum of squares
Σ_{i=1}^n [f(xi) − g(xi)]^2.
Example 3.1. The sales of a company for the last five months are as follows:

    Month | Jan | Feb | March | April | May
    Sales |  20 |  25 |    35 |    45 |  60

Determine a trend function of the sales and a forecast for June.


Solution. Indexing the months symmetrically around March (so that x = 3
corresponds to June), we know the values

    f(−2) = 20, f(−1) = 25, f(0) = 35, f(1) = 45, f(2) = 60.

[Figure: scatter plot of the given points (xi, f(xi)), showing an increasing, nearly linear tendency.]

Therefore we can use a linear or a quadratic trend function.
Case 1. For a linear trend function g(x) = a0 + a1·x, the parameters a0
and a1 are the solution of the linear system (3.2), i.e.

    5·a0 + a1·Σxi = Σf(xi)
    a0·Σxi + a1·Σxi^2 = Σxi·f(xi)

(sums over i = 1, . . . , 5). The coefficients of this system are calculated in the
following table (see the columns corresponding to xi, f(xi), xi^2, xi·f(xi)):

    i | xi | f(xi) | xi^2 | xi·f(xi) | g(xi) | f(xi) − g(xi) | [f(xi) − g(xi)]^2
    1 | -2 |    20 |    4 |      -40 |    17 |             3 |                 9
    2 | -1 |    25 |    1 |      -25 |    27 |            -2 |                 4
    3 |  0 |    35 |    0 |        0 |    37 |            -2 |                 4
    4 |  1 |    45 |    1 |       45 |    47 |            -2 |                 4
    5 |  2 |    60 |    4 |      120 |    57 |             3 |                 9
    Σ |  0 |   185 |   10 |      100 |       |               |                30

Therefore

    5·a0 = 185,  10·a1 = 100,

and hence

    a0 = 37,  a1 = 10.

Then the linear trend function is

    g(x) = 37 + 10x.

Hence we estimate that the sales for June will be

    g(3) = 37 + 10·3 = 67.

The error sum of squares, calculated in the final column of the above
table, has the value

    Σ_{i=1}^5 [f(xi) − g(xi)]^2 = 30.

Case 2. For a quadratic trend function g(x) = a0 + a1·x + a2·x^2, the
parameters a0, a1 and a2 are the solution of the linear system (3.1) for k = 2,
i.e.

    5·a0 + a1·Σxi + a2·Σxi^2 = Σf(xi)
    a0·Σxi + a1·Σxi^2 + a2·Σxi^3 = Σxi·f(xi)
    a0·Σxi^2 + a1·Σxi^3 + a2·Σxi^4 = Σxi^2·f(xi)

(sums over i = 1, . . . , 5).
The coefficients of this system are calculated in the following table:

    i | xi | f(xi) | xi^2 | xi^3 | xi^4 | xi·f(xi) | xi^2·f(xi) | g(xi) | f(xi) − g(xi) | [f(xi) − g(xi)]^2
    1 | -2 |    20 |    4 |   -8 |   16 |      -40 |         80 | 19.86 |          0.14 |              0.02
    2 | -1 |    25 |    1 |   -1 |    1 |      -25 |         25 | 25.57 |         -0.57 |              0.33
    3 |  0 |    35 |    0 |    0 |    0 |        0 |          0 | 34.14 |          0.86 |              0.73
    4 |  1 |    45 |    1 |    1 |    1 |       45 |         45 | 45.57 |         -0.57 |              0.33
    5 |  2 |    60 |    4 |    8 |   16 |      120 |        240 | 59.86 |          0.14 |              0.02
    Σ |  0 |   185 |   10 |    0 |   34 |      100 |        390 |       |               |              1.43

Therefore

    5·a0 + 10·a2 = 185,  10·a1 = 100,  10·a0 + 34·a2 = 390,

and hence

    a0 = 34.14,  a1 = 10,  a2 = 1.43.

Then the quadratic trend function is

    g(x) = 34.14 + 10x + 1.43x^2.

Hence in this case we estimate that the sales for June will be

    g(3) = 34.14 + 10·3 + 1.43·3^2 = 77.01.

The error sum of squares, calculated in the final column of the above
table, has now the value

    Σ_{i=1}^5 [f(xi) − g(xi)]^2 = 1.43.

Hence the quadratic approximation is better than the linear approximation. □
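The normal equations (3.1) can also be solved numerically. The sketch below (plain Python; the helper name polyfit_ls is ours) rebuilds the small linear systems of Example 3.1 and reproduces both trend functions:

```python
def polyfit_ls(xs, ys, k):
    """Fit g(x) = a0 + a1*x + ... + ak*x^k by solving the normal equations (3.1)."""
    m = k + 1
    # Normal-equation matrix A[r][c] = sum x_i^(r+c), right-hand side b[r] = sum x_i^r * y_i
    A = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum((x ** r) * y for x, y in zip(xs, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    a = [0.0] * m
    for r in range(m - 1, -1, -1):  # back-substitution
        a[r] = (b[r] - sum(A[r][c] * a[c] for c in range(r + 1, m))) / A[r][r]
    return a  # coefficients [a0, a1, ..., ak]

xs, ys = [-2, -1, 0, 1, 2], [20, 25, 35, 45, 60]
lin = polyfit_ls(xs, ys, 1)    # -> a0 = 37.0, a1 = 10.0
quad = polyfit_ls(xs, ys, 2)   # -> a0 ~ 34.14, a1 = 10.0, a2 ~ 1.43
forecast = quad[0] + quad[1] * 3 + quad[2] * 9   # quadratic forecast for June
```

The quadratic forecast comes out as 77.0 (the value 77.01 in the text reflects the rounding of a0 and a2 to two decimals).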

3.2 Average measures for two-dimensional statistical distributions

Two-dimensional statistical distribution: the statistical variable is Z = (X, Y), where X and Y are two simple statistical variables, called the components of Z.
The distribution of Z = (X, Y) is called the joint distribution of X and Y.
The distributions of X and Y are called the marginal distributions of
Z = (X, Y).
Let x1, . . . , xn be the values of the distribution of X, f1, . . . , fn their
corresponding absolute frequencies and f1*, . . . , fn* their corresponding
relative frequencies.
Let y1, . . . , ym be the values of the distribution of Y, f1, . . . , fm their
corresponding absolute frequencies and f1*, . . . , fm* their corresponding
relative frequencies.
Then the values of the distribution of Z = (X, Y) are the pairs (xi, yj), i ∈ {1, . . . , n}, j ∈ {1, . . . , m}.
Let fij be the absolute frequency and fij* the relative frequency of
(xi, yj), for any i ∈ {1, . . . , n}, j ∈ {1, . . . , m}.
Obviously,

    fi = Σ_{j=1}^m fij,  fi* = Σ_{j=1}^m fij*,  i ∈ {1, . . . , n},
    fj = Σ_{i=1}^n fij,  fj* = Σ_{i=1}^n fij*,  j ∈ {1, . . . , m}.

If X and Y are independent, then fij* = fi*·fj* for all i, j (and hence fij = fi·fj*).


A two-dimensional statistical distribution and its marginal distributions
are represented in a cross-table of one of the following forms:

    X \ Y | y1  . . . yj  . . . ym  | Total
    x1    | f11 . . . f1j . . . f1m | f1
    . . .
    xi    | fi1 . . . fij . . . fim | fi
    . . .
    xn    | fn1 . . . fnj . . . fnm | fn
    Total | f1  . . . fj  . . . fm  |

(absolute frequencies);

    X \ Y | y1   . . . yj   . . . ym   | Total
    x1    | f11* . . . f1j* . . . f1m* | f1*
    . . .
    xi    | fi1* . . . fij* . . . fim* | fi*
    . . .
    xn    | fn1* . . . fnj* . . . fnm* | fn*
    Total | f1*  . . . fj*  . . . fm*  |

(relative frequencies).
Let the two-dimensional distribution of Z = (X, Y) be as above.

• The conditional distribution of Y given the event X = xi has the
values y1, . . . , ym, the corresponding absolute frequencies fi1, . . . , fim,
and the corresponding relative frequencies

    fi1/fi = fi1*/fi*, . . . , fim/fi = fim*/fi*,

provided that fi > 0.

• The conditional distribution of X given the event Y = yj has the
values x1, . . . , xn, the corresponding absolute frequencies f1j, . . . , fnj,
and the corresponding relative frequencies

    f1j/fj = f1j*/fj*, . . . , fnj/fj = fnj*/fj*,

provided that fj > 0.

• The (u, v)-th moment of Z = (X, Y) (or of the two-dimensional
distribution of Z = (X, Y)) is

    m_uv = ( Σ_{i=1}^n Σ_{j=1}^m fij·xi^u·yj^v ) / ( Σ_{i=1}^n Σ_{j=1}^m fij ) = Σ_{i=1}^n Σ_{j=1}^m fij*·xi^u·yj^v.

We remark that

    m10 = x̄ and m01 = ȳ,

where

    x̄ = ( Σ_{i=1}^n fi·xi ) / ( Σ_{i=1}^n fi ) = Σ_{i=1}^n fi*·xi

and

    ȳ = ( Σ_{j=1}^m fj·yj ) / ( Σ_{j=1}^m fj ) = Σ_{j=1}^m fj*·yj

are the means of the marginal distributions (of X and Y, respectively).


• The mean of Z = (X, Y) (or of the two-dimensional distribution of
Z = (X, Y)) is the pair

    z̄ = (x̄, ȳ).

• The conditional mean of Y given the event X = xi (the mean
inside the xi-group) is the mean of the conditional distribution of Y given
X = xi, i.e.

    ȳi = m_{Y/X=xi} = ( Σ_{j=1}^m fij·yj ) / fi = ( Σ_{j=1}^m fij*·yj ) / fi*.

The function

    B(xi) = ȳi, i ∈ {1, . . . , n}

is called the regression function of the mean of Y with respect to X.

• The conditional mean of X given the event Y = yj (the mean
inside the yj-group) is the mean of the conditional distribution of X
given Y = yj, i.e.

    x̄j = m_{X/Y=yj} = ( Σ_{i=1}^n fij·xi ) / fj = ( Σ_{i=1}^n fij*·xi ) / fj*.

The function

    A(yj) = x̄j, j ∈ {1, . . . , m}

is called the regression function of the mean of X with respect to Y.

We remark that the regression functions B(x) and A(y) can be approximated by the Least Squares Method.

3.3 Variation measures for two-dimensional statistical distributions

Let the two-dimensional distribution of Z = (X, Y) be as above.


• The (u, v)-th central moment of Z = (X, Y) (or of the two-dimensional
distribution of Z = (X, Y)) is

    m^c_uv = ( Σ_{i=1}^n Σ_{j=1}^m fij·(xi − x̄)^u·(yj − ȳ)^v ) / ( Σ_{i=1}^n Σ_{j=1}^m fij ) = Σ_{i=1}^n Σ_{j=1}^m fij*·(xi − x̄)^u·(yj − ȳ)^v.

We remark that

    m^c_20 = σX^2 and m^c_02 = σY^2,

where

    σX^2 = ( Σ_{i=1}^n Σ_{j=1}^m fij·(xi − x̄)^2 ) / ( Σ_{i=1}^n Σ_{j=1}^m fij ) = Σ_{i=1}^n fi*·(xi − x̄)^2

and

    σY^2 = ( Σ_{i=1}^n Σ_{j=1}^m fij·(yj − ȳ)^2 ) / ( Σ_{i=1}^n Σ_{j=1}^m fij ) = Σ_{j=1}^m fj*·(yj − ȳ)^2

are the variances of the marginal distributions (of X and Y, respectively).
• The covariance of X and Y (or of the two-dimensional distribution
of Z = (X, Y)) is

    cov(X, Y) = m^c_11 = ( Σ_{i=1}^n Σ_{j=1}^m fij·(xi − x̄)(yj − ȳ) ) / ( Σ_{i=1}^n Σ_{j=1}^m fij ) = Σ_{i=1}^n Σ_{j=1}^m fij*·(xi − x̄)(yj − ȳ).

• The conditional variance of Y given the event X = xi (the variance inside the xi-group) is the variance of the conditional distribution
of Y given X = xi, i.e.

    σ^2_{Y/X=xi} = ( Σ_{j=1}^m fij·(yj − ȳi)^2 ) / fi = ( Σ_{j=1}^m fij*·(yj − ȳi)^2 ) / fi*.

• The conditional variance of X given the event Y = yj (the variance inside the yj-group) is the variance of the conditional distribution
of X given Y = yj, i.e.

    σ^2_{X/Y=yj} = ( Σ_{i=1}^n fij·(xi − x̄j)^2 ) / fj = ( Σ_{i=1}^n fij*·(xi − x̄j)^2 ) / fj*.

• The average of the variances within the x-groups is

    δ̄^2_Y = ( Σ_{i=1}^n Σ_{j=1}^m fij·(yj − ȳi)^2 ) / ( Σ_{i=1}^n Σ_{j=1}^m fij ) = Σ_{i=1}^n fi*·σ^2_{Y/X=xi}.

• The average of the variances within the y-groups is

    δ̄^2_X = ( Σ_{i=1}^n Σ_{j=1}^m fij·(xi − x̄j)^2 ) / ( Σ_{i=1}^n Σ_{j=1}^m fij ) = Σ_{j=1}^m fj*·σ^2_{X/Y=yj}.

• The conditional variance of Y given X (the variance between the
x-groups) is

    σ^2_{Y/X} = ( Σ_{i=1}^n Σ_{j=1}^m fij·(ȳi − ȳ)^2 ) / ( Σ_{i=1}^n Σ_{j=1}^m fij ) = Σ_{i=1}^n fi*·(ȳi − ȳ)^2.

• The conditional variance of X given Y (the variance between the
y-groups) is

    σ^2_{X/Y} = ( Σ_{i=1}^n Σ_{j=1}^m fij·(x̄j − x̄)^2 ) / ( Σ_{i=1}^n Σ_{j=1}^m fij ) = Σ_{j=1}^m fj*·(x̄j − x̄)^2.

• The rule of variances:

    σY^2 = δ̄^2_Y + σ^2_{Y/X};  σX^2 = δ̄^2_X + σ^2_{X/Y}.

(The overall variation is the combined result of the random factors acting
within each group and of the essential factors determining the variation from one
group to another.)
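The rule of variances can be checked numerically. The sketch below uses a small hypothetical 2×2 absolute-frequency table (the numbers are invented only for illustration):

```python
# Hypothetical 2x2 absolute-frequency table f[i][j] for Z = (X, Y)
xs = [1.0, 2.0]
ys = [10.0, 20.0]
f = [[3.0, 1.0],   # frequencies of (x1, y1), (x1, y2)
     [1.0, 3.0]]   # frequencies of (x2, y1), (x2, y2)
N = sum(sum(row) for row in f)
fi = [sum(row) for row in f]                              # marginal frequencies of X
fj = [sum(f[i][j] for i in range(2)) for j in range(2)]   # marginal frequencies of Y

ybar = sum(fj[j] * ys[j] for j in range(2)) / N                 # mean of Y
var_y = sum(fj[j] * (ys[j] - ybar) ** 2 for j in range(2)) / N  # sigma_Y^2

# Conditional means and variances of Y inside each x-group
mu = [sum(f[i][j] * ys[j] for j in range(2)) / fi[i] for i in range(2)]
var_in = [sum(f[i][j] * (ys[j] - mu[i]) ** 2 for j in range(2)) / fi[i]
          for i in range(2)]

avg_within = sum(fi[i] * var_in[i] for i in range(2)) / N         # delta^2_Y
between = sum(fi[i] * (mu[i] - ybar) ** 2 for i in range(2)) / N  # sigma^2_{Y/X}

assert abs(var_y - (avg_within + between)) < 1e-12   # rule of variances
```

Here 25 = 18.75 + 6.25: three quarters of the variation of Y comes from inside the groups and one quarter from the differences between the group means.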

3.4 Correlation between variables

The correlations (dependences) that can exist between two variables X and Y are classified as follows:

• According to the direction of change:
  – positive correlation (direct dependence): if X increases then Y also increases, and if X decreases then Y also decreases;
  – negative correlation (opposite dependence): if X increases then Y decreases, and if X decreases then Y increases.

• According to the intensity of the correlation:
  – high intensity (strong or tight);
  – medium intensity;
  – low intensity.

• According to the shape of the correlation:
  – linear correlation;
  – nonlinear correlation, such as exponential growth or logarithmic decrease.

Let the two-dimensional distribution of Z = (X, Y) be as above. The degree
of correlation between the variables X and Y can be measured by using the
following indicators.
1. The covariance of X and Y (or of the two-dimensional distribution
of Z = (X, Y)), i.e.

    cov(X, Y) = m^c_11 = ( Σ_{i=1}^n Σ_{j=1}^m fij·(xi − x̄)(yj − ȳ) ) / ( Σ_{i=1}^n Σ_{j=1}^m fij ) = Σ_{i=1}^n Σ_{j=1}^m fij*·(xi − x̄)(yj − ȳ).

It takes values between −σX·σY and σX·σY.

• If X and Y are independent, then cov(X, Y) = 0.
• If cov(X, Y) is close to zero, then there is no linear dependence
between the variables X and Y.

• If cov(X, Y) is positive, then we have a positive correlation
between the variables X and Y; cov(X, Y) = σX·σY in the case of perfect
positive correlation (linear increasing dependence).
• If cov(X, Y) is negative, then we have a negative correlation
between the variables X and Y; cov(X, Y) = −σX·σY in the case of perfect
negative correlation (linear decreasing dependence).

2. The coefficient of correlation of X and Y (or of the two-dimensional
distribution of Z = (X, Y)) is

    ρ = ρ(X, Y) = cov(X, Y) / (σX·σY).

It takes values between −1 and 1.

The regression line (of Y with respect to X) is

    y − ȳ = ρ·(σY/σX)·(x − x̄).

• If X and Y are independent, then ρ(X, Y) = 0.
• If ρ(X, Y) = 0, then there is no linear dependence between the
variables X and Y (the variables are independent, or there is a
nonlinear dependence!).
• If ρ(X, Y) = 1, then we have a direct linear dependence between
the variables X and Y, given by the regression line

    y − ȳ = (σY/σX)·(x − x̄).

• If ρ(X, Y) = −1, then we have an opposite linear dependence
between the variables X and Y, given by the regression line

    y − ȳ = −(σY/σX)·(x − x̄).

• If 0 < ρ(X, Y) < 0.2, then we have a low positive correlation
between the variables X and Y.
• If −0.2 < ρ(X, Y) < 0, then we have a low negative correlation
between the variables X and Y.
• If 0.2 ≤ ρ(X, Y) ≤ 0.5, then we have a weak positive correlation
between the variables X and Y, a case needing a significance test
(such as Student's test).
• If −0.5 ≤ ρ(X, Y) ≤ −0.2, then we have a weak negative correlation between the variables X and Y, a case also needing a significance
test.
• If 0.5 < ρ(X, Y) ≤ 0.75, then we have a medium positive correlation between the variables X and Y.
• If −0.75 ≤ ρ(X, Y) < −0.5, then we have a medium negative
correlation between the variables X and Y.
• If 0.75 < ρ(X, Y) ≤ 0.95, then we have a high positive correlation
between the variables X and Y.
• If −0.95 ≤ ρ(X, Y) < −0.75, then we have a high negative correlation between the variables X and Y.
• If 0.95 < ρ(X, Y) < 1, then we have an extremely strong positive
correlation between the variables X and Y, almost a direct linear
dependence.
• If −1 < ρ(X, Y) < −0.95, then we have an extremely strong
negative correlation between the variables X and Y, almost an
opposite linear dependence.
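A minimal sketch of the computation of the coefficient of correlation and of the regression line, for hypothetical ungrouped paired data (every pair (xi, yi) has frequency 1):

```python
from math import sqrt

# Hypothetical paired observations (each pair has frequency 1)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
sx = sqrt(sum((xi - xbar) ** 2 for xi in x) / n)
sy = sqrt(sum((yi - ybar) ** 2 for yi in y) / n)

rho = cov / (sx * sy)            # coefficient of correlation
slope = rho * sy / sx            # slope of the regression line of Y on X
intercept = ybar - slope * xbar  # from y - ybar = rho*(sy/sx)*(x - xbar)
```

Here ρ ≈ 0.775, which falls in the high positive correlation band of the scale above; the regression line is y = 2.2 + 0.6x.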
3. The coefficient of determination of Y with respect to X is

    R^2 = σ^2_{Y/X} / σY^2,

and the coefficient of non-determination of Y with respect to X is

    K^2 = δ̄^2_Y / σY^2.

By the rule of variances σY^2 = δ̄^2_Y + σ^2_{Y/X} it follows that

    R^2 + K^2 = 1.

The coefficient R^2 shows the share of the variance between groups in the
overall variance, expressing the influence of the classification factors.

• If R^2 = 1, then there is a strong functional relation between Y
and X.
• If 0.7 < R^2 < 1, then the classification of the population according
to X is meaningful, the variation of X influencing the variation of Y.
• If 0.5 < R^2 ≤ 0.7, then the differences between the group means
are significant.
• If R^2 = 0.5, then we cannot decide whether the variation of X has a
significant influence on the variation of Y.
• If 0 < R^2 < 0.5, then the variation of X has no significant influence
on the variation of Y.
• If R^2 = 0, then the variation of X has no influence on the variation of Y.
Exercise 3.1. Two hydrological stations each make one hundred measurements
of the level of a river during a year. The recorded data are given in the
following table.
XY
3.2
3.3
3.4
3.5
3.6
3.7
3.8
3.9
4.0
4.1

3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0
1
1
1
1
2
1
2
2
1
3
5
2
1
2
1
3
5
4
1
1
1
1
1
10
3
1
1
1
1
11
2
1
1
1
12
1
1
1
2
1
1
2
2
1

a. Represent the data in a scatter diagram.
b. Compute the means and the variances of X and Y, and the covariance
of X and Y.
c. Compute the linear regression function of the mean of Y with respect
to X.
d. Compute and interpret the coefficient of correlation of X and Y, the
regression line of Y with respect to X, and the coefficient of determination
of Y with respect to X.

3.5 Nonparametric measures of correlation

If we do not have sufficient elements to identify the law of the distributions, then
we can use nonparametric methods, such as the coefficients of rank correlation
proposed by Kendall and Spearman.

Let X and Y be two simple statistical variables for a statistical population
or for a statistical sample. Let x1, . . . , xn and y1, . . . , yn be the ungrouped
values (variants) of the distributions of X and Y, respectively. The distributions of X and Y are represented in a table of the following form:
    Units    | u1 | u2 | . . . | un
    X values | x1 | x2 | . . . | xn
    Y values | y1 | y2 | . . . | yn

where n is the volume of the population (the number of statistical units ui).

Let ai be the rank of the variant xi inside the distribution of X, namely
the rank of xi in the increasing order of x1, . . . , xn. Let also bi be the rank
of the variant yi inside the distribution of Y, namely the rank of yi in the
increasing order of y1, . . . , yn.
• Spearman's coefficient of correlation of the ranks is

    ρS = 1 − (6·Σ_{i=1}^n di^2) / (n^3 − n),

where

    di = ai − bi, i ∈ {1, . . . , n}

(the rank differences between the variables).

We remark that Spearman's coefficient of correlation of the ranks
ρS is precisely the coefficient of correlation ρ(A, B) of A and B, where A
and B are the statistical variables that represent the ranks of X and
Y, respectively. The distributions of A and B are represented in the
following table:

    Units    | u1 | u2 | . . . | un
    A values | a1 | a2 | . . . | an
    B values | b1 | b2 | . . . | bn

If some ranks of X or Y are equal, then one can use the corrected
Spearman coefficient of correlation of the ranks, given by

    ρ̃S = 1 − 6·( Σ_{i=1}^n di^2 + (t^3 − t)/12 ) / (n^3 − n),

where t is the number of equal ranks.

• Kendall's coefficient of correlation of the ranks is

    τK = 2(P − Q) / (n^2 − n),

where

    P = Σ_{i=1}^n Pi,  Q = Σ_{i=1}^n Qi,

    Pi = |{j ∈ {1, . . . , n} : aj > ai and bj > bi}|,
    Qi = |{j ∈ {1, . . . , n} : aj > ai and bj < bi}|,

for all i ∈ {1, . . . , n}.

We remark that the numbers Pi are indicators of the concordance,
and the numbers Qi are indicators of the discordance, between the
ranks.
The coefficients of correlation of the ranks take values between −1 and
+1. They can be interpreted similarly to the coefficient of correlation.
These coefficients have the advantage that they can be used in the case of
skewed distributions or of a small number of units. Also, these coefficients are
applicable for studying the relation between qualitative variables that cannot
be expressed numerically, but can be classified by their ranks.
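The two rank coefficients can be sketched directly from their definitions; for illustration we use the Test A and Test B marks of the first five students of Exercise 3.2 (the helper functions are ours, and ranks assigns average ranks to ties):

```python
def ranks(values):
    """Rank of each value in increasing order (1-based; ties get the average rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for p in range(i, j + 1):
            r[order[p]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Uncorrected Spearman coefficient: 1 - 6*sum(d_i^2)/(n^3 - n)."""
    n = len(x)
    a, b = ranks(x), ranks(y)
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return 1 - 6 * d2 / (n ** 3 - n)

def kendall(x, y):
    """Kendall coefficient: 2*(P - Q)/(n^2 - n), with P/Q counting concordances/discordances."""
    n = len(x)
    a, b = ranks(x), ranks(y)
    P = sum(1 for i in range(n) for j in range(n) if a[j] > a[i] and b[j] > b[i])
    Q = sum(1 for i in range(n) for j in range(n) if a[j] > a[i] and b[j] < b[i])
    return 2 * (P - Q) / (n ** 2 - n)

x = [10, 25, 13, 14, 28]   # Test A marks, students 1..5 of Exercise 3.2
y = [17, 23, 15, 12, 26]   # Test B marks, students 1..5
rs = spearman(x, y)        # -> 0.6
rk = kendall(x, y)         # -> 0.4
```

Both coefficients are positive, suggesting a direct correlation between the two test results on this small subsample.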
Exercise 3.2. A group of students obtained the following marks on two tests:

    Students     |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10
    Test A marks | 10 | 25 | 13 | 14 | 28 | 16 |  6 |  8 | 24 | 17
    Test B marks | 17 | 23 | 15 | 12 | 26 | 18 |  8 | 13 | 20 | 22

    Students     | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
    Test A marks | 30 | 15 | 23 |  4 | 26 | 12 | 21 | 19 | 29 | 18
    Test B marks | 28 | 13 | 25 | 10 | 27 |  5 | 19 | 14 | 29 | 24

Compute and interpret the coefficient of correlation and the coefficients of
correlation of the ranks between the results of these tests.

Theme 4
Time series and forecasting

Usually, a time series Y = (yi)i (i being the time) is influenced by the
following factors (components):

• the trend (the tendency);
• the cyclical factor;
• the seasonal factor;
• the random factor (the irregular factor).

The main decomposition models for a time series Y = (yi)i are:

• The additive model:

    yi = Ti + Ci + Si + Ri,

where Ti, Ci, Si, Ri represent the trend, the cyclical, the seasonal and
the random components, respectively. This model assumes that the components
are independent and that they have the same measurement unit.

• The multiplicative model:

    yi = Ti · Ci · Si · Ri.

This model assumes that the components depend on each other or that they
have different measurement units.

4.1 The trend component

This component can be determined by the Least Squares Method.

4.2 The cyclical component

The cyclical variation of a time series is the component that tends to oscillate
above and below the trend line for periods longer than one year (if the time
series consists of annual data). This component explains most of the
variation of the evolution that remains unexplained by the trend component.
The cyclical component can be expressed as:

• The cyclical variation:

    Ci = yi − ỹi,

where yi is the value of the time series Y at time i, and ỹi = Ti is the
estimated trend value of the time series Y at the same time i.

• The cycle:

    (yi / ỹi) · 100.

• The relative cyclical residual:

    ((yi − ỹi) / ỹi) · 100.
Example 4.1. The sales of a company for the last nine years are as follows:

    Year  | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010
    Sales |  5.7 |  5.9 |    6 |  6.2 |  6.3 |  6.3 |  6.4 |  6.4 |  6.6

Determine the trend function of the sales and evaluate the cyclical variation.

Solution. Using the Least Squares Method (with the years indexed by
i = 1, . . . , 9), we obtain the trend line

    ỹi = 5.7 + 0.1·i.

The estimated sales, the cyclical variations, the cycles and the relative
cyclical residuals are calculated in the following table:

    Year | Sales (yi) | Estimated sales (ỹi) | Cyclical variation |  Cycle | Rel. cycl. residual
    2002 |        5.7 |                  5.8 |               -0.1 |  98.28 |               -1.72
    2003 |        5.9 |                  5.9 |                  0 | 100.00 |                0.00
    2004 |          6 |                    6 |                  0 | 100.00 |                0.00
    2005 |        6.2 |                  6.1 |                0.1 | 101.64 |                1.64
    2006 |        6.3 |                  6.2 |                0.1 | 101.61 |                1.61
    2007 |        6.3 |                  6.3 |                  0 | 100.00 |                0.00
    2008 |        6.4 |                  6.4 |                  0 | 100.00 |                0.00
    2009 |        6.4 |                  6.5 |               -0.1 |  98.46 |               -1.54
    2010 |        6.6 |                  6.6 |                  0 | 100.00 |                0.00
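The indicators of Example 4.1 can be reproduced in a few lines, using the book's rounded trend line ỹi = 5.7 + 0.1·i:

```python
sales = [5.7, 5.9, 6.0, 6.2, 6.3, 6.3, 6.4, 6.4, 6.6]   # years 2002..2010

trend = [5.7 + 0.1 * i for i in range(1, 10)]                    # rounded trend line
cyc_var = [round(y - t, 2) for y, t in zip(sales, trend)]        # y_i - trend_i
cycle = [round(100 * y / t, 2) for y, t in zip(sales, trend)]    # (y_i/trend_i)*100
residual = [round(100 * (y - t) / t, 2) for y, t in zip(sales, trend)]
```

The lists reproduce the three rightmost columns of the table above.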


4.3 The seasonal component

The seasonal variation of a time series is the repetitive and predictable movement around the trend line within one year or less. For detecting the seasonal
variation, the time intervals need to be measured in small periods, such as
quarters, months or weeks.
Let Y = (yi), i = 1, . . . , n, be a time series, and let k be the number of equal
periods per year.
The seasonal component can be expressed as:

• The moving average value for each time interval:

If k is odd, the moving average value corresponding to yi is

    ȳi = (1/k)·( y_{i−(k−1)/2} + · · · + yi + · · · + y_{i+(k−1)/2} ),

for all i ∈ {1 + (k−1)/2, . . . , n − (k−1)/2}.

If k is even, the (centered) moving average value corresponding to yi is

    ȳi = (1/k)·( (1/2)·y_{i−k/2} + y_{i−k/2+1} + · · · + yi + · · · + y_{i+k/2−1} + (1/2)·y_{i+k/2} ),

for all i ∈ {1 + k/2, . . . , n − k/2}.

• The percentage of the actual value to the moving average value:

    (yi / ȳi) · 100.

• The seasonal index for each period is obtained by eliminating the
extreme values of the above percentages (of the actual value to the moving
average value) corresponding to that period (that is, one minimum value
and one maximum value per period) and computing the mean of the
remaining values.

The seasonal indexes are used for deseasonalizing the time series, in order
to remove the effects of the seasonality from the recorded data. For that,
each actual recorded value is divided by the corresponding seasonal index,
before computing the trend and the cyclical components of the time
series.
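The moving averages above can be sketched as follows (the function name is ours; the sample data are the first two years of Exercise 4.1, with k = 4 quarters per year):

```python
def moving_averages(y, k):
    """Centered moving averages for k periods per year (dict: index -> value)."""
    n, out = len(y), {}
    if k % 2 == 1:
        h = (k - 1) // 2
        for i in range(h, n - h):
            out[i] = sum(y[i - h:i + h + 1]) / k
    else:   # even k: half weights on the two extreme terms
        h = k // 2
        for i in range(h, n - h):
            out[i] = (0.5 * y[i - h] + sum(y[i - h + 1:i + h]) + 0.5 * y[i + h]) / k
    return out

# First two years of the quarterly data of Exercise 4.1
y = [120, 130, 110, 150, 124, 133, 112, 156]
ma = moving_averages(y, 4)
pct = {i: 100 * y[i] / ma[i] for i in ma}   # actual value as a percentage of the MA
```

For instance, the first available moving average is at index 2 (the third quarter of 2006) and equals (0.5·120 + 130 + 110 + 150 + 0.5·124)/4 = 128.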
Exercise 4.1. The quarterly sales of a company for the last five years are
as follows:

    Year | Quarter I | Quarter II | Quarter III | Quarter IV
    2006 |       120 |        130 |         110 |        150
    2007 |       124 |        133 |         112 |        156
    2008 |       126 |        137 |         115 |        160
    2009 |       125 |        136 |         119 |        162
    2010 |       128 |        141 |         118 |        167

a. Compute the four-quarter moving averages.
b. Represent the time series and the moving averages in a scatter
diagram.
c. Compute the seasonal indexes of the four quarters.
d. Deseasonalize the time series.
e. Determine the trend function of the sales and evaluate the cyclical variation.
f. Calculate the corresponding forecast for the next year.

Theme 5
The interest

5.1 A general model of interest

Definition 5.1. The interest corresponding to the initial value (the
present value, the principal) S0 (expressed in units of currency (u.c.))
over the time (the period of investment) t (usually expressed in years)
is a function D : [0, ∞) × [0, ∞) → [0, ∞) that verifies the following two
conditions:

1. D(S0, 0) = 0, ∀S0 ≥ 0; D(0, t) = 0, ∀t ≥ 0;

2. The function D(S0, t) increases in each of the two variables S0 and
t. Assuming that the function D(S0, t) has partial derivatives, this
condition can be expressed as:

    ∂D/∂S0 (S0, t) > 0,  ∂D/∂t (S0, t) > 0,  ∀S0 > 0, t > 0.

Definition 5.2. The sum

    S(S0, t) = S0 + D(S0, t)

is called the final value (the future value, the amount), and is also
denoted by St.

Remark 5.1. The final value is a function S : [0, ∞) × [0, ∞) → [0, ∞) that
verifies the following two conditions:

• S(S0, 0) = S0, ∀S0 ≥ 0; S(0, t) = 0, ∀t ≥ 0;

• ∂S/∂S0 (S0, t) > 1,  ∂S/∂t (S0, t) > 0,  ∀S0 > 0, t > 0.

Definition 5.3. The annual interest rate, denoted by i, is the interest
for 1 u.c. over 1 year, that is

    i = D(1, 1).

The annual interest percentage, denoted by p, is the interest for 100 u.c.
over 1 year, that is

    p = D(100, 1).

Remark 5.2. Usually, p = 100·i.

Definition 5.4. The function F : [0, ∞) × [0, ∞) → [0, ∞) given by

    F(S0, t) = ∂D/∂t (S0, t),  S0 ≥ 0, t ≥ 0,

is called the proportionality factor of the interest.

Remark 5.3.

    F(S0, t) = ∂S/∂t (S0, t),  S0 ≥ 0, t ≥ 0.

Proposition 5.1. We have

    D(S0, t) = ∫_0^t F(S0, x) dx,  S0 ≥ 0, t ≥ 0;

    St = S(S0, t) = S0 + ∫_0^t F(S0, x) dx,  S0 ≥ 0, t ≥ 0.

Corollary 5.1. We have

    i = ∫_0^1 F(1, x) dx;  p = ∫_0^1 F(100, x) dx.

Definition 5.5. The function δ : [0, ∞) → [0, ∞) given by

    δ(t) = (∂S/∂t)(S0, t) / S(S0, t),  t ≥ 0,

is called the instantaneous interest rate.

Remark 5.4.

    δ(t) = ∂(ln S)/∂t (S0, t) = F(S0, t) / S(S0, t),  t ≥ 0.

Proposition 5.2. We have

    St = S(S0, t) = S0 · e^{∫_0^t δ(x) dx},  S0 ≥ 0, t ≥ 0;

    D(S0, t) = S0 · ( e^{∫_0^t δ(x) dx} − 1 ),  S0 ≥ 0, t ≥ 0.

Corollary 5.2. We have

    i = e^{∫_0^1 δ(x) dx} − 1;  p = 100 · ( e^{∫_0^1 δ(x) dx} − 1 ).

5.2 Equivalence of investments

Definition 5.6. A multiple (financial) investment consists of n initial values S01, S02, . . . , S0n invested over the times t1, t2, . . . , tn, with the
annual interest rates i1, i2, . . . , in (or with the annual interest percentages
p1, p2, . . . , pn). Let D(S01, t1), D(S02, t2), . . . , D(S0n, tn) be the corresponding interests, and let S1 = S(S01, t1), S2 = S(S02, t2), . . . , Sn = S(S0n, tn)
be the corresponding final values. The sums

    Σ_{k=1}^n S0k,  Σ_{k=1}^n D(S0k, tk)  and  Σ_{k=1}^n Sk

are called the total initial value, the total interest and the total final value of the given multiple investment, respectively. This multiple investment
can be expressed by a matrix of one of the following two forms:

    (S01, t1, i1)                 (t1, i1, S1)
    (S02, t2, i2)                 (t2, i2, S2)
    . . .              or         . . .
    (S0n, tn, in)                 (tn, in, Sn)

(the first if the initial values are known, the second if the final values are known).
Definition 5.7. We say that two multiple investments (S0k, tk, ik), k = 1, . . . , n,
and (S0k′, tk′, ik′), k = 1, . . . , m, are equivalent by interest, and we write

    (S0k, tk, ik)_{k=1,n} ~(I) (S0k′, tk′, ik′)_{k=1,m},

if the corresponding total interests are equal, i.e.

    Σ_{k=1}^n D(S0k, tk) = Σ_{k=1}^m D(S0k′, tk′).

We say that two multiple investments (tk, ik, Sk), k = 1, . . . , n, and
(tk′, ik′, Sk′), k = 1, . . . , m, are equivalent by present value, and we write

    (tk, ik, Sk)_{k=1,n} ~(P) (tk′, ik′, Sk′)_{k=1,m},

if the corresponding total initial values are equal, i.e.

    Σ_{k=1}^n S0k = Σ_{k=1}^m S0k′.

Definition 5.8. If

    (S0k, tk, ik)_{k=1,n} ~(I) (S0^(CI), t, i) ~(I) (S0, t^(CI), i) ~(I) (S0, t, i^(CI)),

then the initial value S0^(CI), the time of investment t^(CI) and the annual
interest rate i^(CI) are called common replacements by interest.
Definition 5.9. If

    (S0k, tk, ik)_{k=1,n} ~(I) (S0^(MI), tk, ik)_{k=1,n} ~(I) (S0k, t^(MI), ik)_{k=1,n} ~(I) (S0k, tk, i^(MI))_{k=1,n},

then the initial value S0^(MI), the time of investment t^(MI) and the annual
interest rate i^(MI) are called mean replacements by interest.
Definition 5.10. If

    (tk, ik, Sk)_{k=1,n} ~(P) (t, i, S^(CP)) ~(P) (t^(CP), i, S) ~(P) (t, i^(CP), S),

then the final value S^(CP), the time of investment t^(CP) and the annual interest rate i^(CP) are called common replacements by present value.
Definition 5.11. If

    (tk, ik, Sk)_{k=1,n} ~(P) (tk, ik, S^(MP))_{k=1,n} ~(P) (t^(MP), ik, Sk)_{k=1,n} ~(P) (tk, i^(MP), Sk)_{k=1,n},

then the final value S^(MP), the time of investment t^(MP) and the annual
interest rate i^(MP) are called mean replacements by present value.

5.3 Simple interest

5.3.1 Basic formulas

Definition 5.12. If the principal is not actualized over the time of investment, then we say that we obtain a simple interest.

Proposition 5.3. For simple interest we have:

    D = D(S0, t) = S0·i·t = S0·p·t / 100

(the simple interest formula);

    St = S(S0, t) = S0 + D = S0·(1 + i·t)

(the compounding formula, the rule of interest);

    S0 = St / (1 + i·t)

(the discounting formula);

    i = D / (S0·t) = (St − S0) / (S0·t),  t = D / (S0·i) = (St − S0) / (S0·i).

Remark 5.5. According to the above formulas, 1 + i·t is called the compounding factor, and 1/(1 + i·t) is called the discounting factor for simple
interest.

Corollary 5.3. If t = h/k (i.e. k is the number of periods per year and h is
the number of such periods), then the simple interest is:

    D = D(S0, h/k) = S0·i·h / k = S0·p·h / (100·k).

Remark 5.6. If the time of investment t is given as the period from the initial
date (d1, m1, y1) to the final date (d2, m2, y2) (where di, mi, yi represent the
day, the number of the month and the year of the date), then we have three
conventions (procedures) to calculate the simple interest:

1. The exact interest (actual/actual):

    D = S0·i·h / 365  or  D = S0·i·h / 366 (for leap years),

where h is the number of calendar days from (d1, m1, y1) to (d2, m2, y2)
(excluding either the first or the last day);

2. The banker's rule (actual/360):

    D = S0·i·h / 360,

where h is the number of calendar days from (d1, m1, y1) to (d2, m2, y2)
(excluding either the first or the last day);

3. The ordinary interest (30/360):

    D = S0·i·h / 360,

where

    h = 360·(y2 − y1) + 30·(m2 − m1) + d2 − d1

(this assumes that all months have 30 days, and is called the 30-day month
convention).
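The three day-count conventions can be sketched with Python's standard datetime module (the principal, rate and dates below are hypothetical, and the leap-year test simply uses the year of the initial date, a simplification):

```python
from datetime import date

def simple_interest(S0, i, d1, d2, convention):
    """Simple interest on S0 at annual rate i between dates d1 and d2."""
    if convention == "exact":            # actual/actual
        days = (d2 - d1).days
        leap = d1.year % 4 == 0 and (d1.year % 100 != 0 or d1.year % 400 == 0)
        return S0 * i * days / (366 if leap else 365)
    if convention == "bankers":          # actual/360
        return S0 * i * (d2 - d1).days / 360
    if convention == "ordinary":         # 30/360, 30-day month convention
        h = 360 * (d2.year - d1.year) + 30 * (d2.month - d1.month) + d2.day - d1.day
        return S0 * i * h / 360

d1, d2 = date(2012, 3, 15), date(2012, 7, 20)   # 127 calendar days, leap year
exact = simple_interest(10000, 0.06, d1, d2, "exact")       # 600 * 127/366
bankers = simple_interest(10000, 0.06, d1, d2, "bankers")   # 600 * 127/360
ordinary = simple_interest(10000, 0.06, d1, d2, "ordinary") # h = 125 days
```

The banker's rule gives the largest interest (actual days over a 360-day year), which is why it is the convention banks traditionally prefer for loans.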

5.3.2

Simple interest with variable rate

Proposition 5.4. If the time of investment is t = t1 + t2 + ... + tm and the annual interest rate is i1 for the first period t1, i2 for the second period t2, ..., im for the last period tm, then we have:

D = S0 Σ_{k=1}^{m} ik tk   (the simple interest formula);

St = S0 (1 + Σ_{k=1}^{m} ik tk)   (the compounding formula);

S0 = St / (1 + Σ_{k=1}^{m} ik tk)   (the discounting formula).

5.3.3 Equivalence by simple interest

Proposition 5.5. For simple interest we have:

S0^(CI) = (Σ_{k=1}^{n} S0k ik tk) / (i t),   S0^(MI) = (Σ_{k=1}^{n} S0k ik tk) / (Σ_{k=1}^{n} ik tk),

t^(CI) = (Σ_{k=1}^{n} S0k ik tk) / (S0 i),   t^(MI) = (Σ_{k=1}^{n} S0k ik tk) / (Σ_{k=1}^{n} S0k ik),

i^(CI) = (Σ_{k=1}^{n} S0k ik tk) / (S0 t),   i^(MI) = (Σ_{k=1}^{n} S0k ik tk) / (Σ_{k=1}^{n} S0k tk).

5.4 Compound interest

5.4.1 Basic formulas

Definition 5.13. If the principal is actualized over each year of the investment time (by adding the interest of the previous year), then we say that we obtain a compound interest.

Proposition 5.6. For compound interest we have:

D = D(S0, t) = S0 [(1 + i)^t − 1]   (the compound interest formula),

St = S(S0, t) = S0 + D = S0 (1 + i)^t   (the compounding formula, the rule of interest),

S0 = St / (1 + i)^t   (the discounting formula).

Remark 5.7. According to the above formulas, (1 + i)^t is called the compounding factor, and 1/(1 + i)^t is called the discounting factor for compound interest. Denoting the annual compounding factor by

u = 1 + i

and the annual discounting factor by

v = 1/u = 1/(1 + i),

the above formulas can be written as

St = S0 u^t,   S0 = St v^t.
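As a minimal sketch of the compound-interest formulas (function names are illustrative, not from the text):

```python
def compound_final(S0, i, t):
    # Rule of interest: St = S0 * u**t with u = 1 + i.
    return S0 * (1 + i) ** t

def compound_present(St, i, t):
    # Discounting: S0 = St * v**t with v = 1/(1 + i).
    return St / (1 + i) ** t
```

The two functions are inverse to one another, since u v = 1.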

Remark 5.8. If

t = n + h/k

(n being the integer part and h/k the fractional part, i.e. the time of investment covers only h periods from a total of k equal periods in the last year), then we have two conventions (procedures) to calculate the compound interest:

1. The rational procedure: we apply compound interest to the integer part and simple interest to the fractional part, and hence

St = S_{n+h/k} = S0 (1 + i)^n (1 + i h/k)   (the compounding formula);

D = S0 [(1 + i)^n (1 + i h/k) − 1]   (the interest formula).

2. The commercial procedure: we extend the compound interest to the fractional part, and hence

St = S_{n+h/k} = S0 (1 + i)^{n + h/k} = S0 (1 + i)^n ((1 + i)^h)^{1/k}   (the compounding formula);

D = S0 [(1 + i)^{n + h/k} − 1]   (the interest formula).
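The two procedures are easy to compare in code. Since 1 + i f ≥ (1 + i)^f for 0 ≤ f ≤ 1, the rational procedure always yields a final value at least as large as the commercial one. A sketch (the numeric check uses the data of Exercise 5.2: 1000 u.c., 3 years and 7 months, 12%):

```python
def final_rational(S0, i, n, h, k):
    # compound interest on the n whole years, simple interest on the fraction h/k
    return S0 * (1 + i) ** n * (1 + i * h / k)

def final_commercial(S0, i, n, h, k):
    # compound interest extended to the fractional part
    return S0 * (1 + i) ** (n + h / k)
```

For h = 0 both procedures coincide with the ordinary compound-interest formula.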

5.4.2 Nominal rate and effective rate

Definition 5.14. For an initial value S0 invested over n years at an annual rate jk compounded k times per year, the final value is

St = S0 (1 + jk/k)^{kn} = S0 (1 + i)^n,

where

1 + i = (1 + jk/k)^k.

k is called the number of interest periods per year;

jk is called the nominal rate (annual interest rate);

ik = jk/k is called the interest rate per interest period (period interest rate);

i is called the effective rate or the real rate (annual interest rate).
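The relation 1 + i = (1 + jk/k)^k and its inverse can be sketched as follows (function names are our own):

```python
def effective_rate(j, k):
    # effective rate i from nominal rate j compounded k times per year:
    # 1 + i = (1 + j/k)**k
    return (1 + j / k) ** k - 1

def nominal_rate(i, k):
    # inverse relation: j = k * ((1 + i)**(1/k) - 1)
    return k * ((1 + i) ** (1 / k) - 1)
```

For example, a nominal 12% compounded monthly corresponds to an effective annual rate of about 12.68%.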

5.4.3 Compound interest with variable rate

Proposition 5.7. If the time of investment is t = t1 + t2 + ... + tm and the annual interest rate is i1 for the first period t1 = n1 + h1/k1, i2 for the second period t2 = n2 + h2/k2, ..., im for the last period tm = nm + hm/km, then we have:

1. For the rational procedure:

St = S0 Π_{l=1}^{m} (1 + il)^{nl} (1 + il hl/kl)   (the compounding formula);

D = S0 [Π_{l=1}^{m} (1 + il)^{nl} (1 + il hl/kl) − 1]   (the compound interest formula).

2. For the commercial procedure:

St = S0 Π_{l=1}^{m} (1 + il)^{tl}   (the compounding formula);

D = S0 [Π_{l=1}^{m} (1 + il)^{tl} − 1]   (the compound interest formula).

5.5 Loans

The amortization table for a loan of size (original balance) V0 u.c. over n years at an annual interest rate i has the following form:

Year | Remaining principal (year's start) | Interest part | Principal part | Payment (rate) | Remaining principal (year's end)
1 | V0 | d1 = V0 i | Q1 | T1 = d1 + Q1 | V1 = V0 − Q1
2 | V1 | d2 = V1 i | Q2 | T2 = d2 + Q2 | V2 = V1 − Q2
... | ... | ... | ... | ... | ...
k | V_{k−1} | dk = V_{k−1} i | Qk | Tk = dk + Qk | Vk = V_{k−1} − Qk
... | ... | ... | ... | ... | ...
n | V_{n−1} | dn = V_{n−1} i | Qn | Tn = dn + Qn | Vn = V_{n−1} − Qn = 0

Obviously, we have:

V0 = Q1 + Q2 + ... + Qn,   V_{n−1} = Qn,

Tn = Qn u,   T_{k+1} − Tk = Q_{k+1} − Qk u,

where u = 1 + i is the annual compounding factor.

We have two main procedures to calculate the payments of a loan:

1. The fixed-principal amortization: Q1 = Q2 = ... = Qn = Q.
In this case, we have:

Q = V0 / n;

T_{k+1} − Tk = −Q i   (arithmetic progression);

Tk = Q [1 + (n − k + 1) i].

2. The fixed-rate amortization: T1 = T2 = ... = Tn = T.
In this case, we have:

T = V0 i / (1 − v^n)   (the fixed-rate formula);

Q_{k+1} = Qk u   (geometric progression);

Qk = V0 i / (u^n − 1) · u^{k−1},

where u = 1 + i is the annual compounding factor, and v = 1/u = 1/(1 + i) is the annual discounting factor.
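Both amortization procedures can be sketched as short loops that build the table row by row (each row holds interest part, principal part, payment and remaining principal); the check uses the data of Exercise 5.5 (2400 u.c., 4 years, 16%):

```python
def fixed_principal(V0, i, n):
    # fixed-principal amortization: Q = V0/n each year
    Q = V0 / n
    rows, V = [], V0
    for _ in range(n):
        d = V * i              # interest on the remaining principal
        V -= Q
        rows.append((d, Q, d + Q, V))
    return rows

def fixed_payment(V0, i, n):
    # fixed-rate amortization: T = V0*i/(1 - v**n)
    u = 1 + i
    T = V0 * i / (1 - (1 / u) ** n)
    rows, V = [], V0
    for _ in range(n):
        d = V * i
        Q = T - d              # principal part grows geometrically (Q_{k+1} = Q_k * u)
        V -= Q
        rows.append((d, Q, T, V))
    return rows
```

In both cases the last remaining principal is 0, as required.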
Remark 5.9. Inflation changes the purchasing power of money. After n years, the purchasing power of Sn u.c. is reduced to

S0 = Sn / [(1 + a1)(1 + a2) ... (1 + an)],

where a1, a2, ..., an are the annual inflation rates. Sn is measured in future units of currency, and S0 is measured in today's units of currency.

5.6 Problems

Exercise 5.1. A person deposits 1000 u.c. on 20 February 2011 at an annual interest percent of 12%. Calculate the amount of this investment on 10 November 2011 in each of the following cases:
a) exact interest;
b) banker's rule;
c) ordinary interest.
Exercise 5.2. A person deposits 1000 u.c. for 3 years and seven months at
an annual interest percent of 12%. Calculate the final value of this investment
in each of the following cases:
a) simple interest;
b) compound interest, the rational procedure;
c) compound interest, the commercial procedure;
d) compounded monthly interest.
Exercise 5.3. Consider the following investments: 1000 u.c. for one year at
12% per year, 800 u.c. for 9 months at 14% per year, and 1200 u.c. for 10
months at 9% per year. Calculate the initial value, the time of investment
and the annual interest rate that are meanly replacements by simple interest.
Exercise 5.4. A person deposits 100 u.c. at the end of every month for 5
years, at successive annual interest percents of 12%, 12%, 9%, 10%, 10%.
Calculate the amount of this investment at the end of 5 years.
Exercise 5.5. Construct the amortization table for a loan of 2400 u.c. per
4 years at an annual interest percent of 16%, in each of the following cases:
a) fixed-principal annually amortization;
b) fixed-rate annually amortization;
c) fixed-principal monthly amortization;
d) fixed-rate monthly amortization.
Compare the obtained results when the annual successive inflation rates are
4%, 6%, 5%, 6%.

Theme 6
Introduction to Actuarial Math

6.1 A general model of insurance

In an insurance model the insurer agrees to pay the insured one or more amounts called claims (claim payments), at fixed times or when the insured event occurs. In return for these claims, the insured pays one or more amounts called premiums.
Usually the insured events are random events.
For a mutually advantageous insurance, the present value (at the initial moment of the insurance) of the premiums must be equal to the present value of the claims. These values are also called actuarial present values.

Definition 6.1. For a given insurance, the single premium payable at the initial moment of the insurance is

P = E(X),

where E(X) denotes the mean of the random variable X that represents the present value of the claim.
Theorem 6.1. Let A be an insurance consisting of the partial insurances A1, A2, ..., An (n ∈ N*), and let P1, P2, ..., Pn be the single premiums corresponding to these partial insurances. Then the single premium of the total insurance A is

P = P1 + P2 + ... + Pn.

Proof. Let X be the random variable that represents the present value of the total insurance A and let X1, X2, ..., Xn be the random variables representing the present values of the partial insurances A1, A2, ..., An, respectively. We have

X = X1 + X2 + ... + Xn,

and hence

P = E(X) = E(X1 + X2 + ... + Xn) = E(X1) + E(X2) + ... + E(Xn) = P1 + P2 + ... + Pn.

6.2 Biometric functions

Mortality is the most important factor in the insurance of persons. The frequency of mortality for a population is measured by statistical functions of age called biometric functions. We assume that the age is measured in years.

Definition 6.2. We denote by l0 the total number of persons of the analyzed population (the number of newborns).

Remark 6.1. Usually, l0 = 100000.

Remark 6.2. l0 represents the number of survivors to age 0 (from the analyzed population).

6.2.1 Probabilities of life and death

Definition 6.3. Let x, n, m ∈ N. We denote:

px = the probability that a person of age x will live at least one year;

qx = the probability that a person of age x will die within one year;

n px = the probability that a person of age x will attain age x + n;

n qx = the probability that a person of age x will die before age x + n;

m|n qx = the probability that a person of age x will attain age x + m but die before age x + m + n.

px is called the probability of life for age x, and qx is called the probability of death for age x.
Proposition 6.1. Let x, n, m ∈ N. We have:

qx = 1 − px;   (6.1)

n qx = 1 − n px;   (6.2)

0 px = 1;   0 qx = 0;   (6.3)

1 px = px;   1 qx = qx;   (6.4)

0|n qx = n qx;   (6.5)

n+m px = n px · m px+n;   (6.6)

n px = px px+1 ... px+n−1;   (6.7)

n qx = qx + px qx+1 + px px+1 qx+2 + ... + px px+1 ... px+n−2 qx+n−1;   (6.8)

m|n qx = m px · n qx+m;   (6.9)

m|n qx = m+n qx − m qx = m px − m+n px.   (6.10)
(6.10)

Proof. Equalities (6.1), (6.2), (6.3), (6.4) and (6.5) are obvious.
Denote by A(x, y) the event that a person of age x attains age y. Then

A(x, x + n + m) = A(x, x + n) ∩ A(x + n, x + n + m),

and the events A(x, x + n) and A(x + n, x + n + m) are independent. Hence

n+m px = P(A(x, x + n + m)) = P(A(x, x + n)) P(A(x + n, x + n + m)) = n px · m px+n

(where P(A) represents the probability of the event A). Using (6.6) and (6.4) we have

n px = 1 px · 1 px+1 · ... · 1 px+n−1 = px px+1 ... px+n−1.

Also, we have

n qx = P(Ā(x, x + n))
= P(Ā(x, x + 1) ∪ [A(x, x + 1) ∩ Ā(x + 1, x + 2)] ∪ [A(x, x + 2) ∩ Ā(x + 2, x + 3)] ∪ ... ∪ [A(x, x + n − 1) ∩ Ā(x + n − 1, x + n)])
= qx + px qx+1 + px px+1 qx+2 + ... + px px+1 ... px+n−2 qx+n−1

(where Ā represents the complementary event of A). We have

m|n qx = P(A(x, x + m) ∩ Ā(x + m, x + m + n)) = P(A(x, x + m)) P(Ā(x + m, x + m + n)) = m px · n qx+m;

m|n qx = P(Ā(x, x + m + n) ∩ A(x, x + m)) = P(Ā(x, x + m + n) \ Ā(x, x + m))
= P(Ā(x, x + m + n)) − P(Ā(x, x + m)) = m+n qx − m qx
= (1 − m+n px) − (1 − m px) = m px − m+n px.

6.2.2 The survival function

Definition 6.4. Let x ∈ N. We denote:

lx = the expected number of survivors at age x (from the analyzed population).

Proposition 6.2. For any x ∈ N we have

lx = l0 · x p0.   (6.11)

Proof. Obviously,

lx = E(X),

where X is the random variable that represents the number of survivors at age x. Here X takes the values 0, ..., n, ..., l0 with the probabilities πx(0), ..., πx(n), ..., πx(l0), where, for any n ∈ {0, ..., l0}, πx(n) denotes the probability that the number of survivors at age x is equal to n. We have

πx(n) = C(l0, n) (x p0)^n (x q0)^{l0 − n}, ∀ n ∈ {0, ..., l0}.

Then X has a binomial distribution of parameters l0 and x p0. Therefore

lx = E(X) = l0 · x p0.

Definition 6.5. Let x ∈ N. We denote:

s(x) = x p0 = the probability that a newborn will live to at least age x.

s(x) is called the survival function for age x.

Remark 6.3. x q0 = 1 − x p0 = 1 − s(x) represents the probability that a newborn will die before age x.
Proposition 6.3. Let x, n, m ∈ N. We have:

lx = l0 p0 p1 ... px−1;   (6.12)

n px = lx+n / lx;   n qx = (lx − lx+n) / lx;   (6.13)

m|n qx = (lx+m − lx+m+n) / lx;   (6.14)

px = lx+1 / lx;   qx = dx / lx,   (6.15)

where

dx = lx − lx+1.   (6.16)

Proof. Equation (6.12) is an immediate consequence of (6.11) and (6.7). Using (6.6) and (6.11) we have

n px = x+n p0 / x p0 = (lx+n / l0) · (l0 / lx) = lx+n / lx,   and   n qx = 1 − n px = (lx − lx+n) / lx.

Using (6.9) and (6.13) we have

m|n qx = m px · n qx+m = (lx+m / lx) · (lx+m − lx+m+n) / lx+m = (lx+m − lx+m+n) / lx.

Taking n = 1 in (6.13) we obtain equalities (6.15).
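The formulas (6.13) and (6.14) translate directly into code over a survivor column. A minimal sketch; the table l below is a small fictitious example, not real data:

```python
def npx(l, x, n):
    # n px = l_{x+n} / l_x
    return l[x + n] / l[x]

def nqx(l, x, n):
    # n qx = (l_x - l_{x+n}) / l_x = 1 - n px
    return 1 - npx(l, x, n)

def mnqx(l, x, m, n):
    # m|n qx = (l_{x+m} - l_{x+m+n}) / l_x
    return (l[x + m] - l[x + m + n]) / l[x]

l = {60: 1000, 61: 900, 62: 720, 63: 0}   # fictitious survivor numbers
```

The last assertion below checks identity (6.9), m|n qx = m px · n qx+m, on this table.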


Remark 6.4. dx represents the expected number of deaths at age x (i.e. at exactly age x or between ages x and x + 1).

Remark 6.5. There exists an age ω ∈ N such that

lω > 0 and lx = 0, ∀ x > ω.

Definition 6.6. The value ω from the above remark is called the limiting age.

Remark 6.6. Usually, ω = 100.

6.2.3 The life expectancy

Definition 6.7. Let x ∈ N, x ≤ ω. We denote:

ex = the expected future lifetime for a person of age x (prior to death).

ex is called the average remaining lifetime for age x, and x + ex is called the life expectancy for age x.

Remark 6.7. We assume that the deaths are uniformly distributed throughout the year.

Proposition 6.4. For any x ∈ N, x ≤ ω, we have

ex = 1/2 + (1/lx) Σ_{n=1}^{ω−x} lx+n.   (6.17)

Proof. Obviously,

ex = E(Y),

where Y is the random variable that represents the future lifetime of a person of age x. Under the assumption of Remark 6.7, Y takes the values 1/2, ..., n + 1/2, ..., ω − x + 1/2 with the probabilities πx(0), ..., πx(n), ..., πx(ω − x), where, for any n ∈ {0, ..., ω − x}, πx(n) represents the probability that a person of age x will live exactly n more complete years (i.e. will die at age x + n). We have

πx(n) = n|1 qx = n px · qx+n, ∀ n ∈ {0, ..., ω − x}.

Using (6.13), (6.15) and (6.16) it follows that

πx(n) = (lx+n / lx) · (dx+n / lx+n) = dx+n / lx = (lx+n − lx+n+1) / lx, ∀ n ∈ {0, ..., ω − x}.   (6.18)

Hence

ex = E(Y) = Σ_{n=0}^{ω−x} (n + 1/2) (lx+n − lx+n+1) / lx
= (1/lx) Σ_{n=0}^{ω−x} [(n + 1/2) lx+n − (n + 1 + 1/2) lx+n+1 + lx+n+1]
= (1/lx) [(1/2) lx − (ω − x + 1 + 1/2) lω+1 + Σ_{n=0}^{ω−x} lx+n+1]
= 1/2 + (1/lx) Σ_{n=1}^{ω−x} lx+n,

since lω+1 = 0.

Remark 6.8. According to (6.17) and (6.13) we obtain:

ex = 1/2 + Σ_{n=1}^{ω−x} n px, ∀ x ∈ N, x ≤ ω.   (6.19)
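Formula (6.17) can be sketched in a few lines; the table below is a deliberately tiny fictitious example (ω = 2), chosen so the result can be checked by hand: the deaths 40, 40, 20 occur at mid-years 0.5, 1.5, 2.5, giving a mean lifetime of 1.3.

```python
def avg_remaining_lifetime(l, x, omega):
    # e_x = 1/2 + (1/l_x) * sum_{n=1}^{omega-x} l_{x+n}   (formula (6.17))
    return 0.5 + sum(l[x + n] for n in range(1, omega - x + 1)) / l[x]

l = {0: 100, 1: 60, 2: 20}   # fictitious table; l_3 = 0, so omega = 2
```

For x = ω the sum is empty, recovering the half-year term alone.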

6.2.4 Life tables

The values of the biometric functions are tabulated in a life table (mortality table or actuarial table) of the following form:

Age x | Nr. of survivors lx | Nr. of deaths dx | Probab. of death qx | Average rem. lifetime ex
0 | l0 = 100000 | ... | ... | ...
1 | ... | ... | ... | ...
... | ... | ... | ... | ...
ω = 100 | ... | ... | ... | ...

Usually, the values lx are derived from a census. The values dx, qx and ex are calculated according to (6.16), (6.15) and (6.17), respectively.
The following actuarial table shows the life expectancy for the Romanian population in 2008 (www.pensiileprivate.ro).
x | lx MALE | lx FEMALE | qx MALE | qx FEMALE | x + ex MALE | x + ex FEMALE
18 | 100000 | 100000 | 0.0007 | 0.0004 | 66.7 | 73
19 | 99930 | 99960 | 0.0007 | 0.0004 | 66.7 | 73
20 | 99860 | 99920 | 0.001 | 0.0004 | 66.8 | 73
21 | 99760 | 99880 | 0.0011 | 0.0004 | 66.8 | 73
22 | 99654 | 99838 | 0.0011 | 0.0004 | 66.9 | 73
23 | 99543 | 99794 | 0.0012 | 0.0005 | 66.9 | 73.1
24 | 99425 | 99748 | 0.0012 | 0.0005 | 67 | 73.1
25 | 99302 | 99700 | 0.0013 | 0.0005 | 67 | 73.1
26 | 99173 | 99651 | 0.0014 | 0.0005 | 67.1 | 73.1
27 | 99032 | 99597 | 0.0015 | 0.0006 | 67.1 | 73.2
28 | 98880 | 99539 | 0.0017 | 0.0006 | 67.2 | 73.2
29 | 98716 | 99477 | 0.0018 | 0.0007 | 67.3 | 73.2
30 | 98540 | 99412 | 0.0019 | 0.0007 | 67.3 | 73.2
31 | 98353 | 99342 | 0.0021 | 0.0008 | 67.4 | 73.3
32 | 98144 | 99261 | 0.0023 | 0.0009 | 67.5 | 73.3
33 | 97914 | 99167 | 0.0026 | 0.0011 | 67.6 | 73.3
34 | 97664 | 99062 | 0.0028 | 0.0012 | 67.7 | 73.4
35 | 97392 | 98945 | 0.003 | 0.0013 | 67.7 | 73.4
36 | 97100 | 98817 | 0.0036 | 0.0015 | 67.8 | 73.5
37 | 96754 | 98670 | 0.0041 | 0.0017 | 68 | 73.5
38 | 96356 | 98507 | 0.0047 | 0.0018 | 68.1 | 73.6
39 | 95905 | 98325 | 0.0052 | 0.002 | 68.2 | 73.7
40 | 95402 | 98127 | 0.0058 | 0.0022 | 68.4 | 73.7
41 | 94849 | 97911 | 0.0065 | 0.0025 | 68.5 | 73.8
42 | 94231 | 97670 | 0.0072 | 0.0027 | 68.7 | 73.9
43 | 93548 | 97404 | 0.008 | 0.003 | 68.9 | 74
44 | 92804 | 97114 | 0.0087 | 0.0032 | 69.1 | 74.1
45 | 91998 | 96799 | 0.0094 | 0.0035 | 69.3 | 74.2
46 | 91133 | 96461 | 0.0101 | 0.0039 | 69.5 | 74.3
47 | 90209 | 96088 | 0.0109 | 0.0042 | 69.8 | 74.4
48 | 89228 | 95683 | 0.0116 | 0.0046 | 70 | 74.5
49 | 88191 | 95245 | 0.0124 | 0.0049 | 70.3 | 74.6
50 | 87101 | 94774 | 0.0131 | 0.0053 | 70.5 | 74.7
51 | 85960 | 94272 | 0.0143 | 0.0059 | 70.8 | 74.8
52 | 84731 | 93717 | 0.0155 | 0.0065 | 71.1 | 75
53 | 83417 | 93112 | 0.0167 | 0.007 | 71.4 | 75.1
54 | 82024 | 92456 | 0.0179 | 0.0076 | 71.7 | 75.3
55 | 80556 | 91752 | 0.0191 | 0.0082 | 72 | 75.4
56 | 79017 | 91000 | 0.0208 | 0.009 | 72.3 | 75.6
57 | 77374 | 90182 | 0.0225 | 0.0098 | 72.7 | 75.8
58 | 75633 | 89302 | 0.0242 | 0.0105 | 73 | 76
59 | 73803 | 88361 | 0.0259 | 0.0113 | 73.4 | 76.2
60 | 71891 | 87361 | 0.0276 | 0.0121 | 73.7 | 76.3
61 | 69907 | 86304 | 0.0299 | 0.0137 | 74.1 | 76.5
62 | 67814 | 85125 | 0.0323 | 0.0152 | 74.5 | 76.7
63 | 65625 | 83829 | 0.0346 | 0.0168 | 74.9 | 77
64 | 63353 | 82423 | 0.037 | 0.0183 | 75.3 | 77.2
65 | 61011 | 80911 | 0.0393 | 0.0199 | 75.7 | 77.4
66 | 58614 | 79301 | 0.0427 | 0.0228 | 76.1 | 77.7
67 | 56111 | 77493 | 0.0461 | 0.0257 | 76.6 | 77.9
68 | 53524 | 75501 | 0.0495 | 0.0286 | 77 | 78.2
69 | 50875 | 73342 | 0.0529 | 0.0315 | 77.4 | 78.5
70 | 48183 | 71032 | 0.0563 | 0.0344 | 77.9 | 78.8
71 | 45471 | 68588 | 0.0682 | 0.0474 | 78.3 | 79.1
72 | 42371 | 65336 | 0.08 | 0.0604 | 78.8 | 79.5
73 | 38981 | 61387 | 0.0919 | 0.0735 | 79.4 | 79.9
74 | 35399 | 56877 | 0.1037 | 0.0865 | 80 | 80.4
75 | 31727 | 51959 | 0.1156 | 0.0995 | 80.6 | 81
76 | 28059 | 46789 | 0.1275 | 0.1125 | 81.3 | 81.6
77 | 24483 | 41524 | 0.1393 | 0.1255 | 82 | 82.2
78 | 21072 | 36311 | 0.1512 | 0.1386 | 82.7 | 82.9
79 | 17886 | 31280 | 0.163 | 0.1516 | 83.4 | 83.6
80 | 14970 | 26538 | 0.1749 | 0.1646 | 84.2 | 84.3
81 | 12352 | 22170 | 0.1868 | 0.1776 | 85 | 85.1
82 | 10045 | 18232 | 0.1986 | 0.1906 | 85.8 | 85.9
83 | 8050 | 14757 | 0.2105 | 0.2037 | 86.6 | 86.7
84 | 6356 | 11751 | 0.2223 | 0.2167 | 87.4 | 87.5
85 | 4942 | 9205 | 0.2342 | 0.2297 | 88.3 | 88.3
86 | 3785 | 7091 | 0.2461 | 0.2427 | 89.1 | 89.2
87 | 2854 | 5370 | 0.2579 | 0.2557 | 90 | 90
88 | 2118 | 3996 | 0.2698 | 0.2688 | 90.9 | 90.9
89 | 1546 | 2922 | 0.2816 | 0.2818 | 91.7 | 91.7
90 | 1111 | 2099 | 0.2935 | 0.2948 | 92.6 | 92.6
91 | 785 | 1480 | 0.3054 | 0.3078 | 93.5 | 93.5
92 | 545 | 1024 | 0.3172 | 0.3208 | 94.4 | 94.4
93 | 372 | 696 | 0.3291 | 0.3339 | 95.3 | 95.2
94 | 250 | 463 | 0.3409 | 0.3469 | 96.2 | 96.1
95 | 165 | 303 | 0.3528 | 0.3599 | 97 | 97
96 | 107 | 194 | 0.3647 | 0.3729 | 97.8 | 97.8
97 | 68 | 122 | 0.3765 | 0.3859 | 98.6 | 98.6
98 | 42 | 75 | 0.3884 | 0.399 | 99.3 | 99.3
99 | 26 | 45 | 0.4002 | 0.412 | 99.8 | 99.8
100 | 15 | 26 | 1 | 1 | 101.8 | 101.9
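As a small illustration of how the table is used with formula (6.13), the survivor numbers below are excerpted from the MALE lx column above to compute 35 p 30, the probability that a 30-year-old male reaches age 65:

```python
# Survivor numbers taken from the MALE column of the table above.
l_male = {30: 98540, 65: 61011}

# 35 p 30 = l_65 / l_30   (formula (6.13))
p = l_male[65] / l_male[30]
```

The result is about 0.619, i.e. roughly 62% of 30-year-old males in this table survive to the retirement age of 65.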

6.3 Problems

Exercise 6.1. Calculate the probability that a 30 years old person will live at least 35 more years but at most 55 more years.
Exercise 6.2. Consider a family of a 45 years old husband and a 43 years
old wife.
a) Calculate the probability that both spouses will die in the same year.
b) Calculate the probability that both spouses will die at the same age.
Exercise 6.3. Calculate the average remaining lifetime and the life expectancy for a 50 years old person.
Exercise 6.4. Calculate the probability that a 60 years old person will die
before the integer number of years of his average remaining lifetime.
Exercise 6.5. For a 35 years old person, calculate the life expectancy and
the age of death having the maximum probability.

Theme 7
Life annuities

7.1 A general model. Classifications

In a person insurance, the claims are payments made while the insured survives. We have the following classifications.

1. By period, the claims can be:
annual;
semiannual;
quarterly;
monthly.

2. By amount, the claims can be:
constant;
variable.

3. By time of payment, the claims can be:
annuity-due, when the claims are paid at the beginning of each period;
annuity-immediate, when the claims are paid at the end of each period.

4. By time of first payment, the claims can be:
immediate;
deferred.

5. By number of payments, the claims can be:
single, when the claim is paid at a fixed time, only if the insured is alive at this time;
temporary (limited), when the claims are paid at fixed times, while the insured survives;
unlimited, when the claims are paid for the whole life.

7.2 Single claim

Definition 7.1. Let x, n ∈ N s.t. x + n ≤ ω. We denote:

n Ex = the single premium payable by a person of age x for a single claim of 1 u.c. over n years if the person survives.

Remark 7.1. n Ex is called the unitary premium.

Proposition 7.1. For any x, n ∈ N s.t. x + n ≤ ω, we have

n Ex = Dx+n / Dx,   (7.1)

where

Dx = v^x lx,   (7.2)

v = 1/(1 + i) being the annual discounting factor, i being the annual interest rate.
Proof. For a mutually advantageous insurance, the single premium n Ex must be equal to the present value of the single claim, that is

n Ex = E(X),

where X is the random variable that represents the present value of the claim. We have

X = v^n, if the insured survives at least n years from the time of insurance issue, and X = 0 otherwise.

Hence X takes the values v^n and 0 with the probabilities n px and n qx, respectively. By (6.13) and (7.2) we have

n Ex = E(X) = v^n · n px + 0 · n qx = v^n lx+n / lx = v^{x+n} lx+n / (v^x lx) = Dx+n / Dx.
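The life discounting factor combines survival and financial discounting; a minimal sketch (the two-entry table l is fictitious):

```python
def nEx(l, x, n, i):
    # nEx = v**n * l[x+n] / l[x]   (formula (7.1) in survivor form)
    v = 1 / (1 + i)
    return v ** n * l[x + n] / l[x]

l = {40: 1000, 50: 800}   # fictitious survivor numbers
```

Note the value is always below the pure survival probability l[x+n]/l[x], since v^n < 1 for i > 0.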

Corollary 7.1. Let x, n ∈ N, x + n ≤ ω, T ≥ 0. The single premium payable by a person of age x for a single claim of T u.c. over n years if the person survives is

T · n Ex = T · Dx+n / Dx.

Definition 7.2. Dx defined by (7.2) is called the commutation number. The single premium n Ex defined by (7.1) is called the life discounting factor.

Remark 7.2. By (7.1) it follows that

y+z Ex = y Ex · z Ex+y, ∀ x, y, z ∈ N s.t. x + y + z ≤ ω.   (7.3)

7.3 Life annuities-immediate

7.3.1 Whole life annuities

Definition 7.3. Let x ∈ N, x ≤ ω. We denote:

ax = the single premium payable by a person of age x for a whole life annuity-immediate of 1 u.c. per year.

Proposition 7.2. For any x ∈ N, x ≤ ω, we have

ax = Nx+1 / Dx,   (7.4)

where

Nx = Dx + Dx+1 + ... + Dω.   (7.5)

Proof. By Theorem 6.1 we have

ax = 1 Ex + 2 Ex + ... + ω−x Ex.

Using (7.1) and (7.5) we obtain

ax = Dx+1/Dx + Dx+2/Dx + ... + Dω/Dx = Nx+1 / Dx.


Corollary 7.2. Let x ∈ N, x ≤ ω, T ≥ 0. The single premium payable by a person of age x for a whole life annuity-immediate of T u.c. per year is

T ax = T · Nx+1 / Dx.

Definition 7.4. Nx defined by (7.5) is called the cumulative commutation number.

7.3.2 Deferred whole life annuities

Definition 7.5. Let x, r ∈ N s.t. x + r ≤ ω. We denote:

r| ax = the single premium payable by a person of age x for an r-year deferred whole life annuity-immediate of 1 u.c. per year (payable at the end of each year while the person survives, from age x + r onward).

Proposition 7.3. For any x, r ∈ N s.t. x + r ≤ ω, we have

r| ax = Nx+r+1 / Dx.   (7.6)

Proof. By Theorem 6.1 we have

r| ax = r+1 Ex + r+2 Ex + ... + ω−x Ex.

Using (7.1) and (7.5) we obtain

r| ax = Dx+r+1/Dx + Dx+r+2/Dx + ... + Dω/Dx = Nx+r+1 / Dx.

Corollary 7.3. Let x, r ∈ N, x + r ≤ ω, T ≥ 0. The single premium payable by a person of age x for an r-year deferred whole life annuity-immediate of T u.c. per year is

T · r| ax = T · Nx+r+1 / Dx.

Remark 7.3. By (7.6), (7.1) and (7.4) it follows that

r| ax = r Ex · ax+r, ∀ x, r ∈ N s.t. x + r ≤ ω,   (7.7)
0| ax = ax, ∀ x ∈ N, x ≤ ω,
ω−x| ax = 0, ∀ x ∈ N, x ≤ ω.

7.4 Temporary life annuities

Definition 7.6. Let x, r ∈ N s.t. x + r ≤ ω. We denote:

ax:r⌉ = the single premium payable by a person of age x for an r-year temporary life annuity-immediate of 1 u.c. per year (payable at the end of each year while the person survives, during the next r years).

Proposition 7.4. For any x, r ∈ N s.t. x + r ≤ ω, we have

ax:r⌉ = (Nx+1 − Nx+r+1) / Dx.   (7.8)

Proof. By Theorem 6.1 we have

ax:r⌉ = 1 Ex + 2 Ex + ... + r Ex.

By (7.1) and (7.5) we obtain

ax:r⌉ = Dx+1/Dx + Dx+2/Dx + ... + Dx+r/Dx = (Nx+1 − Nx+r+1) / Dx.

Corollary 7.4. Let x, r ∈ N, x + r ≤ ω, T ≥ 0. The single premium payable by a person of age x for an r-year temporary life annuity-immediate of T u.c. per year is

T · ax:r⌉ = T · (Nx+1 − Nx+r+1) / Dx.

Remark 7.4. By (7.4), (7.8), (7.6) and (7.7) it follows that

ax = ax:r⌉ + r| ax, ∀ x, r ∈ N s.t. x + r ≤ ω,   (7.9)
ax:r⌉ = ax − r Ex · ax+r, ∀ x, r ∈ N s.t. x + r ≤ ω,
ax:0⌉ = 0, ∀ x ∈ N, x ≤ ω,
ax:ω−x⌉ = ax, ∀ x ∈ N, x ≤ ω.
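The commutation numbers Dx, Nx and the annuity formulas (7.4) and (7.8) can be sketched together; the 4-age table below is fictitious, with limiting age ω = 63, and i = 0 is chosen so the results can be checked by hand against the survival probabilities:

```python
def commutation(l, i, omega):
    # Dx = v**x * lx  (7.2);  Nx = Dx + D_{x+1} + ... + D_omega  (7.5)
    v = 1 / (1 + i)
    D = {x: v ** x * lx for x, lx in l.items()}
    N = {x: sum(D[y] for y in range(x, omega + 1)) for x in l}
    return D, N

def whole_life_annuity(D, N, x):
    # ax = N_{x+1} / D_x   (7.4)
    return N[x + 1] / D[x]

def temporary_annuity(D, N, x, r):
    # ax:r = (N_{x+1} - N_{x+r+1}) / D_x   (7.8)
    return (N[x + 1] - N[x + r + 1]) / D[x]

l = {60: 1000, 61: 900, 62: 720, 63: 500}   # fictitious survivor numbers
D, N = commutation(l, 0.0, 63)
```

With i = 0 the whole life annuity reduces to (900 + 720 + 500)/1000 = 2.12, as formula (6.19) would predict.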

7.5 Life annuities-immediate with k-thly payments

In this case the claims are payable at the end of each k-th part of the year.

7.5.1 Whole life annuities with k-thly payments

Definition 7.7. Let x ∈ N, x ≤ ω and k ∈ N*. We denote:

ax^(k) = the single premium payable by a person of age x for a whole life annuity-immediate of 1/k u.c. per each k-th part of the year (i.e. 1 u.c. per year).

Definition 7.8. For any x ∈ N, x ≤ ω and k ∈ N*, k ≥ 2, we define the intermediate commutation numbers Dx+1/k, Dx+2/k, ..., Dx+(k−1)/k such that Dx, Dx+1/k, Dx+2/k, ..., Dx+(k−1)/k, Dx+1 is an arithmetic progression.

Lemma 7.1. For any x ∈ N, x ≤ ω, k ∈ N* and h ∈ {0, 1, ..., k} we have

Dx+h/k = ((k − h)/k) Dx + (h/k) Dx+1.   (7.10)

Proof. The arithmetic progression Dx, Dx+1/k, ..., Dx+(k−1)/k, Dx+1 has k + 1 terms, so its common difference is (Dx+1 − Dx)/k, and hence its (h + 1)-th term is

Dx+h/k = Dx + h (Dx+1 − Dx)/k = ((k − h)/k) Dx + (h/k) Dx+1.

Proposition 7.5. For any x ∈ N, x ≤ ω and k ∈ N* we have

ax^(k) = Nx+1/Dx + (k − 1)/(2k).   (7.11)

Proof. By Theorem 6.1 we have

ax^(k) = (1/k) Σ_{n=0}^{ω−x} Σ_{h=1}^{k} n+h/k Ex.   (7.12)

By (7.1), (7.10) and (7.5) we obtain

ax^(k) = (1/k) Σ_{n=0}^{ω−x} Σ_{h=1}^{k} Dx+n+h/k / Dx
= (1/(k Dx)) Σ_{h=1}^{k} Σ_{n=0}^{ω−x} [((k − h)/k) Dx+n + (h/k) Dx+n+1]
= (1/(k Dx)) Σ_{h=1}^{k} [((k − h)/k)(Dx + Nx+1) + (h/k) Nx+1]
= (1/(k Dx)) [((k − 1)/2)(Dx + Nx+1) + ((k + 1)/2) Nx+1]
= (1/(k Dx)) [((k − 1)/2) Dx + k Nx+1]
= Nx+1/Dx + (k − 1)/(2k),

since Σ_{n=0}^{ω−x} Dx+n = Nx = Dx + Nx+1 and Σ_{n=0}^{ω−x} Dx+n+1 = Nx+1.

Corollary 7.5. Let x ∈ N, x ≤ ω, k ∈ N* and T ≥ 0. The single premium payable by a person of age x for a whole life annuity-immediate of T u.c. per each k-th part of the year is

T k ax^(k) = T k (Nx+1/Dx + (k − 1)/(2k)).

Remark 7.5. By (7.11) and (7.4) it follows that

ax^(k) = ax + (k − 1)/(2k), ∀ x ∈ N, x ≤ ω, k ∈ N*,
ax^(1) = ax, ∀ x ∈ N, x ≤ ω.

7.5.2 Deferred whole life annuities with k-thly payments

Definition 7.9. Let x, r ∈ N s.t. x + r ≤ ω and let k ∈ N*. We denote:

r| ax^(k) = the single premium payable by a person of age x for an r-year deferred whole life annuity-immediate of 1/k u.c. per each k-th part of the year (payable at the end of each k-th part of the year while the person survives, from age x + r onward).

Proposition 7.6. For any x, r ∈ N s.t. x + r ≤ ω and any k ∈ N* we have

r| ax^(k) = Nx+r+1/Dx + ((k − 1)/(2k)) · Dx+r/Dx.   (7.13)

Proof. By Theorem 6.1 we have

r| ax^(k) = (1/k) Σ_{n=0}^{ω−r−x} Σ_{h=1}^{k} r+n+h/k Ex.   (7.14)

By (7.3) and (7.12) we obtain

r| ax^(k) = (1/k) Σ_{n=0}^{ω−r−x} Σ_{h=1}^{k} r Ex · n+h/k Ex+r = r Ex · ax+r^(k),

and using (7.1) and (7.11) we obtain

r| ax^(k) = (Dx+r/Dx) (Nx+r+1/Dx+r + (k − 1)/(2k)) = Nx+r+1/Dx + ((k − 1)/(2k)) · Dx+r/Dx.

Corollary 7.6. Let x, r ∈ N, x + r ≤ ω, k ∈ N* and T ≥ 0. The single premium payable by a person of age x for an r-year deferred whole life annuity-immediate of T u.c. per each k-th part of the year is

T k r| ax^(k) = T k (Nx+r+1/Dx + ((k − 1)/(2k)) · Dx+r/Dx).

Remark 7.6. By (7.13), (7.1), (7.11) and (7.6) it follows that

r| ax^(k) = r Ex · ax+r^(k), ∀ x, r ∈ N s.t. x + r ≤ ω, k ∈ N*,   (7.15)
0| ax^(k) = ax^(k), ∀ x ∈ N, x ≤ ω, k ∈ N*,
ω−x| ax^(k) = 0, ∀ x ∈ N, x ≤ ω, k ∈ N*,
r| ax^(1) = r| ax, ∀ x, r ∈ N, x + r ≤ ω.

7.5.3 Temporary life annuities with k-thly payments

Definition 7.10. Let x, r ∈ N s.t. x + r ≤ ω and let k ∈ N*. We denote:

ax:r⌉^(k) = the single premium payable by a person of age x for an r-year temporary life annuity-immediate of 1/k u.c. per each k-th part of the year (payable at the end of each k-th part of the year while the person survives, during the next r years).

Proposition 7.7. For any x, r ∈ N s.t. x + r ≤ ω and any k ∈ N* we have

ax:r⌉^(k) = (Nx+1 − Nx+r+1)/Dx + ((k − 1)/(2k)) (1 − Dx+r/Dx).   (7.16)

Proof. By Theorem 6.1, (7.12) and (7.14) we have

ax:r⌉^(k) = (1/k) Σ_{n=0}^{r−1} Σ_{h=1}^{k} n+h/k Ex
= (1/k) Σ_{n=0}^{ω−x} Σ_{h=1}^{k} n+h/k Ex − (1/k) Σ_{n=0}^{ω−r−x} Σ_{h=1}^{k} r+n+h/k Ex
= ax^(k) − r| ax^(k),

and using (7.11) and (7.13) we obtain the stated equality.

Corollary 7.7. Let x, r ∈ N, x + r ≤ ω, k ∈ N* and T ≥ 0. The single premium payable by a person of age x for an r-year temporary life annuity-immediate of T u.c. per each k-th part of the year is

T k ax:r⌉^(k) = T k [(Nx+1 − Nx+r+1)/Dx + ((k − 1)/(2k)) (1 − Dx+r/Dx)].

Remark 7.7. By (7.11), (7.16), (7.13), (7.15) and (7.8) it follows that

ax^(k) = ax:r⌉^(k) + r| ax^(k), ∀ x, r ∈ N s.t. x + r ≤ ω, k ∈ N*,
ax:r⌉^(k) = ax^(k) − r Ex · ax+r^(k), ∀ x, r ∈ N s.t. x + r ≤ ω, k ∈ N*,
ax:0⌉^(k) = 0, ∀ x ∈ N, x ≤ ω, k ∈ N*,
ax:ω−x⌉^(k) = ax^(k), ∀ x ∈ N, x ≤ ω, k ∈ N*,
ax:r⌉^(1) = ax:r⌉, ∀ x, r ∈ N, x + r ≤ ω.

7.6 Pension

7.6.1 Annual pension

We denote by r the number of years until the time of retirement.

Definition 7.11. Let x, r ∈ N s.t. x + r ≤ ω. We denote:

Px:r⌉(r| ax) = the r-year temporary life premium payable by a person of age x (at the end of each year while the person survives, during the next r years) for an r-year deferred whole life annual pension of 1 u.c. per year (payable at the end of each year while the person survives, from age x + r onward).

Proposition 7.8. For any x, r ∈ N s.t. x + r ≤ ω, we have

Px:r⌉(r| ax) = r| ax / ax:r⌉ = Nx+r+1 / (Nx+1 − Nx+r+1).   (7.17)

Proof. For a mutually advantageous insurance, the present value (at the initial moment of the insurance) of the premiums must be equal to the present value of the pensions. By Definition 7.6, Corollary 7.4, Definition 7.5 and Proposition 7.3 we have

Px:r⌉(r| ax) · ax:r⌉ = r| ax,   so   Px:r⌉(r| ax) · (Nx+1 − Nx+r+1)/Dx = Nx+r+1/Dx,

and hence we obtain the stated equality.
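Formula (7.17) is a one-liner once the cumulative commutation numbers are known; the fragment N below is a fictitious excerpt of a commutation table (hypothetical values for x = 30, r = 35), not real data:

```python
def annual_pension_premium(N, x, r):
    # P = N[x+r+1] / (N[x+1] - N[x+r+1])   (formula (7.17))
    return N[x + r + 1] / (N[x + 1] - N[x + r + 1])

# Hypothetical cumulative commutation values, chosen only to exercise the formula.
N = {31: 1000.0, 66: 100.0}
```

With these numbers the annual premium is 100/900 = 1/9 of the pension, i.e. each unit of yearly pension costs about 0.11 u.c. per year of contribution.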


Corollary 7.8. Let x, r ∈ N, x + r ≤ ω and T ≥ 0. The r-year temporary life premium payable by a person of age x for an r-year deferred whole life annual pension of T u.c. per year is

T Px:r⌉(r| ax) = T · Nx+r+1 / (Nx+1 − Nx+r+1).

7.6.2 Monthly pension

We denote by r the number of years until the time of retirement.

Definition 7.12. Let x, r ∈ N s.t. x + r ≤ ω. We denote:

Px:r⌉^(12)(r| ax^(12)) = the r-year temporary life premium payable by a person of age x at the end of each month (while the person survives during the next r years) for an r-year deferred whole life monthly pension of 1 u.c. per month (payable at the end of each month while the person survives, from age x + r onward).

Proposition 7.9. For any x, r ∈ N s.t. x + r ≤ ω, we have

Px:r⌉^(12)(r| ax^(12)) = r| ax^(12) / ax:r⌉^(12) = (24 Nx+r+1 + 11 Dx+r) / (24 (Nx+1 − Nx+r+1) + 11 (Dx − Dx+r)).   (7.18)

Proof. For a mutually advantageous insurance, the present value (at the initial moment of the insurance) of the premiums must be equal to the present value of the pensions. By Definition 7.10, Corollary 7.7, Definition 7.9 and Proposition 7.6 we have

Px:r⌉^(12)(r| ax^(12)) · 12 ax:r⌉^(12) = 12 r| ax^(12),

so

Px:r⌉^(12)(r| ax^(12)) · 12 [(Nx+1 − Nx+r+1)/Dx + (11/24)(1 − Dx+r/Dx)] = 12 [Nx+r+1/Dx + (11/24) · Dx+r/Dx],

and hence we obtain the stated equality.


Corollary 7.9. Let x, r ∈ N, x + r ≤ ω and T ≥ 0. The r-year temporary life premium payable by a person of age x at the end of each month for an r-year deferred whole life monthly pension of T u.c. per month is

T Px:r⌉^(12)(r| ax^(12)) = T · (24 Nx+r+1 + 11 Dx+r) / (24 (Nx+1 − Nx+r+1) + 11 (Dx − Dx+r)).

7.7 Problems

Exercise 7.1. Calculate the single premium payable by a 30-year-old person
for a single claim of 10000$ over 35 years if the person survives. The annual
interest percent is 8%.
Exercise 7.2. Calculate the single premium payable by a 30-year-old person
for a whole life annuity-immediate of 12000RON per year. The annual
interest percent is 14%.
Exercise 7.3. Calculate the single premium payable by a 30-year-old person
for a 35-year deferred whole life annuity-immediate of 12000RON per year.
The annual interest percent is 14%.
Exercise 7.4. Calculate the single premium payable by a 30-year-old person
for a 35-year temporary life annuity-immediate of 12000RON per year. The
annual interest percent is 14%.
Exercise 7.5. Calculate the single premium payable by a 30-year-old person
for a whole life annuity-immediate of 1000RON per month. The annual
interest percent is 14%.
Exercise 7.6. Calculate the single premium payable by a 30-year-old person
for a 35-year deferred whole life annuity-immediate of 1000RON per month.
The annual interest percent is 14%.
Exercise 7.7. Calculate the single premium payable by a 30-year-old person
for a 35-year temporary life annuity-immediate of 1000RON per month. The
annual interest percent is 14%.
Exercise 7.8. Calculate the annual premium, payable at the end of each year
by a 30-year-old person, for an annuity-immediate pension of 12000RON per year.
The annual interest percent is 14% and the age of retirement is 65 years.
Exercise 7.9. Calculate the monthly premium, payable at the end of each month
by a 30-year-old person, for a monthly pension of 1000RON per month. The annual
interest percent is 14% and the age of retirement is 65 years.

Theme 8
Life insurances

8.1  A general model. Classification

In a life insurance, the single claim is payable at the moment of death, if the
death occurs in the period covered by the insurance. The life insurance can
be:
• immediate and unlimited, when the claim is paid at the moment of death, whenever this occurs;
• deferred, when the claim is paid only if the insured dies after a fixed term from the time of insurance issue;
• temporary (limited), when the claim is paid only if the insured dies within a fixed term from the time of insurance issue.

8.2  Whole life insurance

Definition 8.1. Let x ∈ N, x ≤ ω. We denote
A_x = the single premium payable by a person of age x for a whole life
insurance of 1 u.c. (payable at the moment of death, whenever this
occurs).

Proposition 8.1. For any x ∈ N, x ≤ ω, we have

    A_x = M_x / D_x,    (8.1)

where

    M_x = C_x + C_{x+1} + ... + C_ω, with C_x = d_x v^{x+1/2} = (l_x − l_{x+1}) v^{x+1/2},    (8.2)

v = 1/(1+i) being the annual discounting factor, i being the annual interest
rate.
Proof. For a mutually advantageous insurance, the single premium A_x must
be equal to the present value of the single claim, that is,

    A_x = E(X),

where X is the random variable that represents the present value of the claim.
Assuming that the deaths are uniformly distributed throughout the year, we
have

    X = v^{n+1/2}, if n is the number of complete years lived by the insured since issue,

for any n ∈ {0, ..., ω − x}. Hence the distribution of X is

    X: ( v^{1/2}  v^{1+1/2}  ...  v^{n+1/2}  ...  v^{ω−x+1/2} ;
         π_x(0)  π_x(1)  ...  π_x(n)  ...  π_x(ω−x) ),

where, for any n ∈ {0, ..., ω − x}, π_x(n) represents the probability that a
person of age x will live only n more complete years (i.e. will die at age x + n).
By (6.18) we have

    π_x(n) = d_{x+n} / l_x,   n ∈ {0, ..., ω − x},

where d_{x+n} = l_{x+n} − l_{x+n+1} represents the number of deaths at age x + n.


Using (8.2) it follows that

    A_x = E(X) = Σ_{n=0}^{ω−x} π_x(n) v^{n+1/2} = Σ_{n=0}^{ω−x} (d_{x+n}/l_x) v^{n+1/2}
        = ( Σ_{n=0}^{ω−x} d_{x+n} v^{x+n+1/2} ) / (l_x v^x) = ( Σ_{n=0}^{ω−x} C_{x+n} ) / D_x = M_x / D_x.

Corollary 8.1. Let x ∈ N, x ≤ ω, and let T ≥ 0. The single premium
payable by a person of age x for a whole life insurance of T u.c. is

    T A_x = T · M_x / D_x.

Corollary 8.2. For any x ∈ N, x ≤ ω, we have

    A_x = √v (1 − i a_x).    (8.3)
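The commutation-function formulas above are easy to check numerically. Below is a minimal Python sketch; the life table `l` and the interest rate are hypothetical illustrative values, not data from the text. It computes A_x by (8.1) and verifies identity (8.3) at every age.

```python
import math

i = 0.08                      # annual interest rate (assumed for illustration)
v = 1 / (1 + i)               # annual discounting factor

# Hypothetical life table: l[x] = number of survivors at age x (illustrative only)
l = [100000, 99000, 97500, 95000, 90000, 80000, 60000, 30000, 8000, 1000, 0]
omega = len(l) - 2            # maximum age: l[omega] > 0, l[omega + 1] = 0

def D(x): return l[x] * v ** x
def C(x): return (l[x] - l[x + 1]) * v ** (x + 0.5)        # formula (8.2)
def M(x): return sum(C(y) for y in range(x, omega + 1))
def N(x): return sum(D(y) for y in range(x, omega + 1))

def A(x):                     # whole life insurance of 1 u.c., formula (8.1)
    return M(x) / D(x)

def a(x):                     # whole life annuity-immediate, a_x = N_{x+1}/D_x
    return N(x + 1) / D(x)

# Check identity (8.3): A_x = sqrt(v) * (1 - i * a_x) for every age
for x in range(omega + 1):
    assert abs(A(x) - math.sqrt(v) * (1 - i * a(x))) < 1e-12
```

With this hypothetical table the identity holds exactly (up to rounding), since both sides reduce to the same sum over the life table.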

8.3  Deferred life insurance

Definition 8.2. Let x, r ∈ N s.t. x + r ≤ ω. We denote
r|A_x = the single premium payable by a person of age x for an r-year
deferred life insurance of 1 u.c. (payable at the moment of death only
if the insured dies at least r years after insurance issue).

Proposition 8.2. For any x, r ∈ N s.t. x + r ≤ ω, we have

    r|A_x = M_{x+r} / D_x.    (8.4)

Proof. As in the proof of Proposition 8.1, we have

    r|A_x = E(r|X),

where r|X is the random variable having the distribution

    r|X: ( 0  0  ...  0  v^{r+1/2}  v^{r+1+1/2}  ...  v^{ω−x+1/2} ;
           π_x(0)  π_x(1)  ...  π_x(r−1)  π_x(r)  π_x(r+1)  ...  π_x(ω−x) ).

Using (6.18) and (8.2) we obtain that

    r|A_x = E(r|X) = Σ_{n=r}^{ω−x} π_x(n) v^{n+1/2} = Σ_{n=r}^{ω−x} (d_{x+n}/l_x) v^{n+1/2}
        = ( Σ_{n=r}^{ω−x} d_{x+n} v^{x+n+1/2} ) / (l_x v^x) = ( Σ_{n=r}^{ω−x} C_{x+n} ) / D_x = M_{x+r} / D_x.

Corollary 8.3. Let x, r ∈ N, x + r ≤ ω, T ≥ 0. The single premium payable
by a person of age x for an r-year deferred life insurance of T u.c. is

    T r|A_x = T · M_{x+r} / D_x.

Remark 8.1. By (8.4), (7.1) and (8.1) it follows that

    r|A_x = rE_x · A_{x+r},  ∀ x, r ∈ N s.t. x + r ≤ ω,
    0|A_x = A_x,  ∀ x ∈ N, x ≤ ω.    (8.5)

Remark 8.2. By (8.5), (8.3) and (7.7) it follows that

    r|A_x = √v (rE_x − i · r|a_x),  ∀ x, r ∈ N s.t. x + r ≤ ω.    (8.6)

8.4  Temporary life insurance

Definition 8.3. Let x, r ∈ N s.t. x + r ≤ ω. We denote
A¹_{x:r̄|} = the single premium payable by a person of age x for an r-year
term life insurance of 1 u.c. (payable at the moment of death only if
the insured dies within r years following insurance issue).

Proposition 8.3. For any x, r ∈ N s.t. x + r ≤ ω, we have

    A¹_{x:r̄|} = (M_x − M_{x+r}) / D_x.    (8.7)

Proof. As in the proof of Proposition 8.1, we have

    A¹_{x:r̄|} = E(X_{r̄|}),

where X_{r̄|} is the random variable having the distribution

    X_{r̄|}: ( v^{1/2}  v^{1+1/2}  ...  v^{r−1+1/2}  0  0  ...  0 ;
              π_x(0)  π_x(1)  ...  π_x(r−1)  π_x(r)  π_x(r+1)  ...  π_x(ω−x) ).

Using (6.18) and (8.2) we obtain

    A¹_{x:r̄|} = E(X_{r̄|}) = Σ_{n=0}^{r−1} π_x(n) v^{n+1/2} = Σ_{n=0}^{r−1} (d_{x+n}/l_x) v^{n+1/2}
        = ( Σ_{n=0}^{r−1} d_{x+n} v^{x+n+1/2} ) / (l_x v^x) = ( Σ_{n=0}^{r−1} C_{x+n} ) / D_x = (M_x − M_{x+r}) / D_x.

Corollary 8.4. Let x, r ∈ N, x + r ≤ ω, T ≥ 0. The single premium payable
by a person of age x for an r-year term life insurance of T u.c. is

    T A¹_{x:r̄|} = T · (M_x − M_{x+r}) / D_x.

Remark 8.3. By (8.1), (8.7), (8.4) and (8.5) it follows that

    A_x = A¹_{x:r̄|} + r|A_x,  ∀ x, r ∈ N s.t. x + r ≤ ω,    (8.8)
    A¹_{x:r̄|} = A_x − rE_x · A_{x+r},  ∀ x, r ∈ N s.t. x + r ≤ ω,
    A¹_{x:0̄|} = 0,  ∀ x ∈ N, x ≤ ω.

Remark 8.4. By (8.8), (8.3), (8.6) and (7.9) it follows that

    A¹_{x:r̄|} = √v (1 − rE_x − i a_{x:r̄|}),  ∀ x, r ∈ N s.t. x + r ≤ ω.    (8.9)
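The decomposition of a whole life insurance into a term part and a deferred part can be checked in the same spirit. A minimal Python sketch, again with a hypothetical life table (illustrative values only, not data from the text):

```python
i = 0.08                      # annual interest rate (assumed for illustration)
v = 1 / (1 + i)
# Hypothetical life table: l[x] = survivors at age x (illustrative only)
l = [100000, 99000, 97500, 95000, 90000, 80000, 60000, 30000, 8000, 1000, 0]
omega = len(l) - 2            # maximum age; l[omega + 1] = 0

def D(x): return l[x] * v ** x
def M(x): return sum((l[y] - l[y + 1]) * v ** (y + 0.5)
                     for y in range(x, omega + 1))

def A(x):          return M(x) / D(x)                 # (8.1) whole life
def A_def(x, r):   return M(x + r) / D(x)             # (8.4) r-year deferred
def A_term(x, r):  return (M(x) - M(x + r)) / D(x)    # (8.7) r-year term
def E(x, r):       return D(x + r) / D(x)             # pure endowment rE_x

x, r = 2, 4
# (8.8): whole life = term part + deferred part
assert abs(A(x) - (A_term(x, r) + A_def(x, r))) < 1e-12
# (8.5): deferred insurance = pure endowment times whole life at age x + r
assert abs(A_def(x, r) - E(x, r) * A(x + r)) < 1e-12
```

Both identities hold exactly, since they are algebraic consequences of the commutation-number definitions.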

8.5  Problems

Exercise 8.1. Calculate the single premium payable by a 30-year-old person
for a whole life insurance of 1000RON. The annual interest percent is 14%.
Exercise 8.2. Calculate the single premium payable by a 30-year-old person
for a 35-year deferred life insurance of 1000RON. The annual interest percent
is 14%.
Exercise 8.3. Calculate the single premium payable by a 30-year-old person
for a 35-year term life insurance of 1000RON. The annual interest percent is
14%.

Theme 9
Collective annuities and insurances

Next, we consider an insured group of m persons having the ages x_1, x_2, ..., x_m
(m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}).

9.1  Multiple life probabilities

Definition 9.1. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let n, k ∈ N s.t. k ≤ m.
We denote
n p_{x_1 x_2 ...x_m} = the probability that all members of the group will survive
n years;
n p^[k]_{x_1 x_2 ...x_m} = the probability that exactly k of the group members will
survive n years;
n p^k_{x_1 x_2 ...x_m} = the probability that at least k of the group members will
survive n years.
n p_{x_1 x_2 ...x_m} is called the probability of joint survival (probability of the
joint-life) for the group, and n p^[k]_{x_1 x_2 ...x_m} and n p^k_{x_1 x_2 ...x_m} are called
probabilities of partial survival for the group.

Remark 9.1. We assume that the deaths of the group members are independent.

Definition 9.2. We denote by x̃ the maximum age of the group, i.e.

    x̃ = max{x_1, x_2, ..., x_m}.

Also, we denote by x̂ the minimum age of the group, i.e.

    x̂ = min{x_1, x_2, ..., x_m}.

Remark 9.2. Obviously, if n > ω − x̃ then n p_{x_1 x_2 ...x_m} = 0.
Proposition 9.1. Let n, k ∈ N s.t. k ≤ m. We have:

    n p_{x_1 x_2 ...x_m} = n p_{x_1} · n p_{x_2} · ... · n p_{x_m} = (l_{x_1+n}/l_{x_1}) · (l_{x_2+n}/l_{x_2}) · ... · (l_{x_m+n}/l_{x_m});    (9.1)

    n p^[k]_{x_1 x_2 ...x_m} = Σ_{s=0}^{m−k} (−1)^s C^s_{k+s} Σ_{1≤i_1<...<i_{k+s}≤m} n p_{x_{i_1} x_{i_2} ...x_{i_{k+s}}};    (9.2)

    n p^k_{x_1 x_2 ...x_m} = Σ_{s=0}^{m−k} (−1)^s C^s_{k+s−1} Σ_{1≤i_1<...<i_{k+s}≤m} n p_{x_{i_1} x_{i_2} ...x_{i_{k+s}}}.    (9.3)
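Formulas (9.2) and (9.3) are inclusion-exclusion identities, so they can be validated by brute force over all survival outcomes of the group. A Python sketch, with hypothetical individual n-year survival probabilities and using the independence assumption of Remark 9.1:

```python
from itertools import combinations, product
from math import comb, prod

# Hypothetical n-year survival probabilities for a group of m = 4 members
p = [0.9, 0.8, 0.7, 0.6]
m = len(p)

def p_joint(idx):
    """P(all members in idx survive n years), by independence, formula (9.1)."""
    return prod(p[i] for i in idx)

def p_exactly(k):
    """P(exactly k members survive), formula (9.2)."""
    return sum((-1) ** s * comb(k + s, s)
               * sum(p_joint(idx) for idx in combinations(range(m), k + s))
               for s in range(m - k + 1))

def p_at_least(k):
    """P(at least k members survive), formula (9.3), for k >= 1."""
    return sum((-1) ** s * comb(k + s - 1, s)
               * sum(p_joint(idx) for idx in combinations(range(m), k + s))
               for s in range(m - k + 1))

def brute_exactly(k):
    """Direct enumeration of all 2^m survival outcomes (check only)."""
    return sum(prod(p[i] if alive else 1 - p[i]
                    for i, alive in enumerate(outcome))
               for outcome in product([0, 1], repeat=m)
               if sum(outcome) == k)

for k in range(m + 1):
    assert abs(p_exactly(k) - brute_exactly(k)) < 1e-12
for k in range(1, m + 1):
    assert abs(p_at_least(k)
               - sum(brute_exactly(j) for j in range(k, m + 1))) < 1e-12
```

Here C^s_{k+s} from the text is the binomial coefficient `comb(k + s, s)`.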

9.2  Single claim for joint survival

Definition 9.3. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let n ∈ N. We denote
n E_{x_1,x_2,...,x_m} = the single premium payable by the group for a single
claim of 1 u.c. over n years if all of the members survive.

Remark 9.3. n E_{x_1,x_2,...,x_m} is called the unitary premium.

Proposition 9.2. We have

    n E_{x_1,x_2,...,x_m} = D_{x_1+n,x_2+n,...,x_m+n} / D_{x_1,x_2,...,x_m},    (9.4)

where

    D_{x_1,x_2,...,x_m} = l_{x_1} l_{x_2} ... l_{x_m} v^{(x_1+x_2+...+x_m)/m},    (9.5)

v = 1/(1+i) being the annual discounting factor, i being the annual interest
rate.

Corollary 9.1. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let n ∈ N and T ≥ 0.
The single premium payable by the group for a single claim of T u.c. over n
years if all of the members survive is

    T n E_{x_1,x_2,...,x_m} = T · D_{x_1+n,x_2+n,...,x_m+n} / D_{x_1,x_2,...,x_m}.

9.3  Single claims for partial survival

Definition 9.4. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let n, k ∈ N s.t. k ≤ m.
We denote
n E^[k]_{x_1,x_2,...,x_m} = the single premium payable by the group for a single
claim of 1 u.c. over n years if exactly k of the members survive;
n E^k_{x_1,x_2,...,x_m} = the single premium payable by the group for a single
claim of 1 u.c. over n years if at least k of the members survive.

Proposition 9.3. We have

    n E^[k]_{x_1,x_2,...,x_m} = Σ_{s=0}^{m−k} (−1)^s C^s_{k+s} Σ_{1≤i_1<...<i_{k+s}≤m} n E_{x_{i_1},x_{i_2},...,x_{i_{k+s}}};    (9.6)

    n E^k_{x_1,x_2,...,x_m} = Σ_{s=0}^{m−k} (−1)^s C^s_{k+s−1} Σ_{1≤i_1<...<i_{k+s}≤m} n E_{x_{i_1},x_{i_2},...,x_{i_{k+s}}}.    (9.7)

Corollary 9.2. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let n, k ∈ N s.t. k ≤ m
and let T ≥ 0.
1. The single premium payable by the group for a single claim of T u.c.
over n years if exactly k of the members survive is

    T n E^[k]_{x_1,x_2,...,x_m} = T Σ_{s=0}^{m−k} (−1)^s C^s_{k+s} Σ_{1≤i_1<...<i_{k+s}≤m} n E_{x_{i_1},x_{i_2},...,x_{i_{k+s}}}.

2. The single premium payable by the group for a single claim of T u.c.
over n years if at least k of the members survive is

    T n E^k_{x_1,x_2,...,x_m} = T Σ_{s=0}^{m−k} (−1)^s C^s_{k+s−1} Σ_{1≤i_1<...<i_{k+s}≤m} n E_{x_{i_1},x_{i_2},...,x_{i_{k+s}}}.

9.4  Whole life annuities for joint survival

Definition 9.5. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. We denote
a_{x_1,x_2,...,x_m} = the single premium payable by the group for a whole
joint-life annuity-immediate of 1 u.c. per year (payable at the end of each
year while all of the members survive).

Proposition 9.4. We have

    a_{x_1,x_2,...,x_m} = N_{x_1+1,x_2+1,...,x_m+1} / D_{x_1,x_2,...,x_m},    (9.8)

where

    N_{x_1,x_2,...,x_m} = Σ_{n=0}^{ω−x̃} D_{x_1+n,x_2+n,...,x_m+n},    (9.9)

x̃ = max{x_1, x_2, ..., x_m} being the maximum age of the group.

Corollary 9.3. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let T ≥ 0. The single
premium payable by the group for a whole joint-life annuity-immediate of T
u.c. per year is

    T a_{x_1,x_2,...,x_m} = T · N_{x_1+1,x_2+1,...,x_m+1} / D_{x_1,x_2,...,x_m}.
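Formulas (9.4), (9.5), (9.8) and (9.9) can be illustrated with a short Python sketch (the life table and interest rate are hypothetical illustrative values). It checks that the joint-life annuity equals the sum of the joint pure endowments over n ≥ 1:

```python
from math import prod

i = 0.12                      # annual interest rate (assumed for illustration)
v = 1 / (1 + i)
# Hypothetical life table: l[x] = survivors at age x (illustrative only)
l = [100000, 99000, 97500, 95000, 90000, 80000, 60000, 30000, 8000, 1000, 0]
omega = len(l) - 2            # maximum age; l[omega + 1] = 0

def Dg(ages):                 # joint commutation number D, formula (9.5)
    return prod(l[x] for x in ages) * v ** (sum(ages) / len(ages))

def Ng(ages):                 # joint commutation number N, formula (9.9)
    return sum(Dg([x + n for x in ages])
               for n in range(omega - max(ages) + 1))

def Eg(ages, n):              # joint pure endowment, formula (9.4)
    return Dg([x + n for x in ages]) / Dg(ages)

def ag(ages):                 # joint whole life annuity-immediate, formula (9.8)
    return Ng([x + 1 for x in ages]) / Dg(ages)

ages = [5, 3, 2]
# The annuity must equal the sum of the joint pure endowments over n >= 1
direct = sum(Eg(ages, n) for n in range(1, omega - max(ages) + 1))
assert abs(ag(ages) - direct) < 1e-9
```

The check mirrors the derivation: each annual payment of 1 u.c., contingent on joint survival, is a joint pure endowment.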

9.5  Whole life annuities for partial survival

Definition 9.6. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let k ∈ N s.t. k ≤ m. We
denote
a^[k]_{x_1,x_2,...,x_m} = the single premium payable by the group for a whole life
annuity-immediate of 1 u.c. per year payable (at the end of each year)
while exactly k of the members survive;
a^k_{x_1,x_2,...,x_m} = the single premium payable by the group for a whole life
annuity-immediate of 1 u.c. per year payable (at the end of each year)
while at least k of the members survive.

Proposition 9.5. We have

    a^[k]_{x_1,x_2,...,x_m} = Σ_{s=0}^{m−k} (−1)^s C^s_{k+s} Σ_{1≤i_1<...<i_{k+s}≤m} a_{x_{i_1},x_{i_2},...,x_{i_{k+s}}};    (9.10)

    a^k_{x_1,x_2,...,x_m} = Σ_{s=0}^{m−k} (−1)^s C^s_{k+s−1} Σ_{1≤i_1<...<i_{k+s}≤m} a_{x_{i_1},x_{i_2},...,x_{i_{k+s}}}.    (9.11)

Corollary 9.4. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let k ∈ N s.t. k ≤ m and
let T ≥ 0.
1. The single premium payable by the group for a whole life annuity-immediate of T u.c. per year payable while exactly k of the members
survive is

    T a^[k]_{x_1,x_2,...,x_m} = T Σ_{s=0}^{m−k} (−1)^s C^s_{k+s} Σ_{1≤i_1<...<i_{k+s}≤m} a_{x_{i_1},x_{i_2},...,x_{i_{k+s}}}.

2. The single premium payable by the group for a whole life annuity-immediate of T u.c. per year payable while at least k of the members
survive is

    T a^k_{x_1,x_2,...,x_m} = T Σ_{s=0}^{m−k} (−1)^s C^s_{k+s−1} Σ_{1≤i_1<...<i_{k+s}≤m} a_{x_{i_1},x_{i_2},...,x_{i_{k+s}}}.

9.6  Deferred whole life annuities for joint survival

Definition 9.7. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let r ∈ N s.t. x̃ + r ≤ ω.
We denote
r|a_{x_1,x_2,...,x_m} = the single premium payable by the group for an r-year
deferred whole joint-life annuity-immediate of 1 u.c. per year (payable
after r years, at the end of each year while all of the members survive).

Proposition 9.6. We have

    r|a_{x_1,x_2,...,x_m} = N_{x_1+r+1,x_2+r+1,...,x_m+r+1} / D_{x_1,x_2,...,x_m}.    (9.12)

Corollary 9.5. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let r ∈ N s.t. x̃ + r ≤ ω
and let T ≥ 0. The single premium payable by the group for an r-year
deferred whole joint-life annuity-immediate of T u.c. per year is

    T r|a_{x_1,x_2,...,x_m} = T · N_{x_1+r+1,x_2+r+1,...,x_m+r+1} / D_{x_1,x_2,...,x_m}.

9.7  Deferred whole life annuities for partial survival

Definition 9.8. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let r ∈ N s.t. x̂ + r ≤ ω
and let k ∈ N s.t. k ≤ m. We denote
r|a^[k]_{x_1,x_2,...,x_m} = the single premium payable by the group for an r-year
deferred whole life annuity-immediate of 1 u.c. per year payable (after
r years, at the end of each year) while exactly k of the members survive;
r|a^k_{x_1,x_2,...,x_m} = the single premium payable by the group for an r-year
deferred whole life annuity-immediate of 1 u.c. per year payable (after
r years, at the end of each year) while at least k of the members survive.
Proposition 9.7. We have

    r|a^[k]_{x_1,x_2,...,x_m} = Σ_{s=0}^{m−k} (−1)^s C^s_{k+s} Σ_{1≤i_1<...<i_{k+s}≤m} r|a_{x_{i_1},x_{i_2},...,x_{i_{k+s}}};    (9.13)

    r|a^k_{x_1,x_2,...,x_m} = Σ_{s=0}^{m−k} (−1)^s C^s_{k+s−1} Σ_{1≤i_1<...<i_{k+s}≤m} r|a_{x_{i_1},x_{i_2},...,x_{i_{k+s}}}.    (9.14)

Corollary 9.6. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let r ∈ N s.t. x̂ + r ≤ ω
and let k ∈ N s.t. k ≤ m. Let T ≥ 0.
1. The single premium payable by the group for an r-year deferred whole
life annuity-immediate of T u.c. per year payable while exactly k of the
members survive is

    T r|a^[k]_{x_1,x_2,...,x_m} = T Σ_{s=0}^{m−k} (−1)^s C^s_{k+s} Σ_{1≤i_1<...<i_{k+s}≤m} r|a_{x_{i_1},x_{i_2},...,x_{i_{k+s}}}.

2. The single premium payable by the group for an r-year deferred whole
life annuity-immediate of T u.c. per year payable while at least k of the
members survive is

    T r|a^k_{x_1,x_2,...,x_m} = T Σ_{s=0}^{m−k} (−1)^s C^s_{k+s−1} Σ_{1≤i_1<...<i_{k+s}≤m} r|a_{x_{i_1},x_{i_2},...,x_{i_{k+s}}}.

9.8  Temporary life annuities for joint survival

Definition 9.9. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let r ∈ N s.t. x̃ + r ≤ ω.
We denote
a_{x_1,x_2,...,x_m:r̄|} = the single premium payable by the group for an r-year
temporary joint-life annuity-immediate of 1 u.c. per year (payable at
the end of each year while all of the members survive, during the next r
years).

Proposition 9.8. We have

    a_{x_1,x_2,...,x_m:r̄|} = (N_{x_1+1,x_2+1,...,x_m+1} − N_{x_1+r+1,x_2+r+1,...,x_m+r+1}) / D_{x_1,x_2,...,x_m}.    (9.15)

Corollary 9.7. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let r ∈ N s.t. x̃ + r ≤ ω
and let T ≥ 0. The single premium payable by the group for an r-year
temporary joint-life annuity-immediate of T u.c. per year is

    T a_{x_1,x_2,...,x_m:r̄|} = T · (N_{x_1+1,x_2+1,...,x_m+1} − N_{x_1+r+1,x_2+r+1,...,x_m+r+1}) / D_{x_1,x_2,...,x_m}.

9.9  Temporary life annuities for partial survival

Definition 9.10. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let r ∈ N s.t. x̂ + r ≤ ω
and let k ∈ N s.t. k ≤ m. We denote
a^[k]_{x_1,x_2,...,x_m:r̄|} = the single premium payable by the group for an r-year
temporary life annuity-immediate of 1 u.c. per year payable (at the end
of each year) while exactly k of the members survive (during the next r
years);
a^k_{x_1,x_2,...,x_m:r̄|} = the single premium payable by the group for an r-year
temporary life annuity-immediate of 1 u.c. per year payable (at the end
of each year) while at least k of the members survive (during the next
r years).

Proposition 9.9. We have

    a^[k]_{x_1,x_2,...,x_m:r̄|} = Σ_{s=0}^{m−k} (−1)^s C^s_{k+s} Σ_{1≤i_1<...<i_{k+s}≤m} a_{x_{i_1},x_{i_2},...,x_{i_{k+s}}:r̄|};    (9.16)

    a^k_{x_1,x_2,...,x_m:r̄|} = Σ_{s=0}^{m−k} (−1)^s C^s_{k+s−1} Σ_{1≤i_1<...<i_{k+s}≤m} a_{x_{i_1},x_{i_2},...,x_{i_{k+s}}:r̄|}.    (9.17)

Corollary 9.8. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let r ∈ N s.t. x̂ + r ≤ ω
and let k ∈ N s.t. k ≤ m. Let T ≥ 0.
1. The single premium payable by the group for an r-year temporary life
annuity-immediate of T u.c. per year payable while exactly k of the
members survive is

    T a^[k]_{x_1,x_2,...,x_m:r̄|} = T Σ_{s=0}^{m−k} (−1)^s C^s_{k+s} Σ_{1≤i_1<...<i_{k+s}≤m} a_{x_{i_1},x_{i_2},...,x_{i_{k+s}}:r̄|}.

2. The single premium payable by the group for an r-year temporary life
annuity-immediate of T u.c. per year payable while at least k of the
members survive is

    T a^k_{x_1,x_2,...,x_m:r̄|} = T Σ_{s=0}^{m−k} (−1)^s C^s_{k+s−1} Σ_{1≤i_1<...<i_{k+s}≤m} a_{x_{i_1},x_{i_2},...,x_{i_{k+s}}:r̄|}.

9.10  Group insurance payable at the first death

Definition 9.11. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. We denote
A_{x_1,x_2,...,x_m} = the single premium payable by the group for an insurance
of 1 u.c. payable at the moment of the first death, whenever this occurs.

Proposition 9.10. We have

    A_{x_1,x_2,...,x_m} = M_{x_1,x_2,...,x_m} / D_{x_1,x_2,...,x_m},    (9.18)

where

    M_{x_1,x_2,...,x_m} = Σ_{n=0}^{ω−x̃} C_{x_1+n,x_2+n,...,x_m+n},    (9.19)

x̃ = max{x_1, x_2, ..., x_m} being the maximum age of the group, with

    C_{x_1,x_2,...,x_m} = (l_{x_1} l_{x_2} ... l_{x_m} − l_{x_1+1} l_{x_2+1} ... l_{x_m+1}) v^{(x_1+x_2+...+x_m)/m + 1/2},    (9.20)

v = 1/(1+i) being the annual discounting factor, i being the annual interest
rate.

Corollary 9.9. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let T ≥ 0. The single
premium payable by the group for an insurance of T u.c. payable at the
moment of the first death, whenever this occurs, is

    T A_{x_1,x_2,...,x_m} = T · M_{x_1,x_2,...,x_m} / D_{x_1,x_2,...,x_m}.
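Formulas (9.18)-(9.20) can likewise be checked numerically: the ratio C/D reproduces the discounted probability that the first death occurs in a given year. A Python sketch with a hypothetical life table (illustrative values only):

```python
from math import prod

i = 0.12                      # annual interest rate (assumed for illustration)
v = 1 / (1 + i)
# Hypothetical life table: l[x] = survivors at age x (illustrative only)
l = [100000, 99000, 97500, 95000, 90000, 80000, 60000, 30000, 8000, 1000, 0]
omega = len(l) - 2            # maximum age; l[omega + 1] = 0

def Dg(ages):                 # joint commutation number D, formula (9.5)
    return prod(l[x] for x in ages) * v ** (sum(ages) / len(ages))

def Cg(ages):                 # joint commutation number C, formula (9.20)
    return (prod(l[x] for x in ages) - prod(l[x + 1] for x in ages)) \
        * v ** (sum(ages) / len(ages) + 0.5)

def Ag(ages):                 # first-death insurance, formulas (9.18)-(9.19)
    Mg = sum(Cg([x + n for x in ages])
             for n in range(omega - max(ages) + 1))
    return Mg / Dg(ages)

ages = [5, 3, 2]

def joint_surv(n):            # probability that all members survive n years
    return prod(l[x + n] / l[x] for x in ages)

# Cross-check: A = sum over n of v^(n + 1/2) * P(first death occurs in year n)
direct = sum(v ** (n + 0.5) * (joint_surv(n) - joint_surv(n + 1))
             for n in range(omega - max(ages) + 1))
assert abs(Ag(ages) - direct) < 1e-12
```

Each term C_{x_1+n,...,x_m+n}/D_{x_1,...,x_m} is exactly the mid-year discounted probability that the joint-life status fails during year n, which is why the two computations agree.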

9.11  Group insurance payable at the k-th death

Definition 9.12. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let k ∈ N s.t. k ≤ m. We
denote
A^[k]_{x_1,x_2,...,x_m} = the single premium payable by the group for an insurance
of 1 u.c. payable at the moment of the k-th death, whenever this occurs.

Proposition 9.11. We have

    A^[k]_{x_1,x_2,...,x_m} = Σ_{s=0}^{k−1} (−1)^s C^s_{m−k+s} Σ_{1≤i_1<...<i_{m−k+s+1}≤m} A_{x_{i_1},x_{i_2},...,x_{i_{m−k+s+1}}}.    (9.21)

Corollary 9.10. Let a group of m persons having the ages x_1, x_2, ..., x_m,
where m ∈ N*, x_j ∈ N, x_j ≤ ω ∀ j ∈ {1, ..., m}. Let k ∈ N s.t. k ≤ m and
let T ≥ 0. The single premium payable by the group for an insurance of T
u.c. payable at the moment of the k-th death, whenever this occurs, is

    T A^[k]_{x_1,x_2,...,x_m} = T Σ_{s=0}^{k−1} (−1)^s C^s_{m−k+s} Σ_{1≤i_1<...<i_{m−k+s+1}≤m} A_{x_{i_1},x_{i_2},...,x_{i_{m−k+s+1}}}.

9.12  Problems

Exercise 9.1. Consider a group of four members aged 55, 53, 30 and 28.
a) Calculate the probability that all of the members survive 15 years.
b) Calculate the probability that exactly 2 of the members survive 25 years.
c) Calculate the probability that at least 3 of the members survive 20 years.
d) Calculate the probability that at most 3 of the members survive 10 years.
Exercise 9.2. Calculate the single premium payable by a family of two
persons aged 32 and 30 for a single claim of 20000$ over 35 years if both
members are still alive. The annual interest percent is 12%.
Exercise 9.3. Calculate the single premium payable by a family of two
persons aged 32 and 30 for a single claim of 20000$ over 35 years if exactly
one member is still alive. The annual interest percent is 12%.
Exercise 9.4. Calculate the single premium payable by a family of two
persons aged 32 and 30 for a single claim of 20000$ over 35 years if at
least one member is still alive. The annual interest percent is 12%.
Exercise 9.5. Calculate the single premium payable by a family of three
persons aged 46, 44 and 22 for a life annuity-immediate of 10000$ per
year while all of the members survive. The annual interest percent is 12%.
Exercise 9.6. Calculate the single premium payable by a family of three
persons aged 46, 44 and 22 for a life annuity-immediate of 10000$ per
year while exactly two of the members survive. The annual interest percent
is 12%.
Exercise 9.7. Calculate the single premium payable by a family of three
persons aged 46, 44 and 22 for a life annuity-immediate of 10000$ per
year while at least two of the members survive. The annual interest percent
is 12%.
Exercise 9.8. Calculate the single premium payable by a family of two
persons aged 42 and 37 for a 10-year deferred life annuity-immediate
of 10000$ per year while all of the members survive. The annual interest
percent is 12%.
Exercise 9.9. Calculate the single premium payable by a family of two
persons aged 42 and 37 for a 10-year deferred life annuity-immediate
of 10000$ per year while exactly one member survives. The annual interest
percent is 12%.
Exercise 9.10. Calculate the single premium payable by a family of two
persons aged 42 and 37 for a 10-year deferred life annuity-immediate
of 10000$ per year while at least one member survives. The annual interest
percent is 12%.
Exercise 9.11. Calculate the single premium payable by a family of three
persons aged 60, 54 and 35 for a 30-year temporary life annuity-immediate
of 10000$ per year while all of the members survive. The annual
interest percent is 12%.
Exercise 9.12. Calculate the single premium payable by a family of three
persons aged 60, 54 and 35 for a 30-year temporary life annuity-immediate
of 10000$ per year while exactly one member survives. The annual
interest percent is 12%.
Exercise 9.13. Calculate the single premium payable by a family of three
persons aged 60, 54 and 35 for a 30-year temporary life annuity-immediate
of 10000$ per year while at least one member survives. The
annual interest percent is 12%.
Exercise 9.14. Calculate the single premium payable by a family of two
persons aged 28 and 25 for an insurance of 50000$ payable at the
moment of the first death. The annual interest percent is 12%.
Exercise 9.15. Calculate the single premium payable by a family of two
persons aged 28 and 25 for an insurance of 50000$ payable at the
moment of the last death. The annual interest percent is 12%.
Exercise 9.16. Calculate the single premium payable by a family of three
persons aged 50, 49 and 25 for an insurance of 50000$ payable at the
moment of the first death. The annual interest percent is 12%.
Exercise 9.17. Calculate the single premium payable by a family of three
persons aged 50, 49 and 25 for an insurance of 50000$ payable at the
moment of the second death. The annual interest percent is 12%.
Exercise 9.18. Calculate the single premium payable by a family of three
persons aged 50, 49 and 25 for an insurance of 50000$ payable at the
moment of the last death. The annual interest percent is 12%.

Theme 10
Bonus-Malus system in automobile insurance

10.1  A general model

The Bonus-Malus system is the best-known system of goods insurance,
especially car insurance. In this type of insurance, policies are categorized
based on characteristics of the insured vehicle (the insured good) and on the
Bonus-Malus level, given by the previous number of claims. The insurance
period for goods is usually one year. In this case, a policy remains in a
certain payment class for one year and then it can be transferred to another
payment class, based on the number of accidents from the previous year. If
the insured vehicle did not have any accident, then the new payment class
will be better, so the premium will be reduced (bonus). As the number
of accidents grows, the new class will be worse, so the premium will be
increased (malus).

Definition 10.1. A Bonus-Malus insurance system can be represented
as S = (C, D, T, π), where:
• C = {1, ..., c} represents the set of payment classes (c ∈ N*). If
i > j, i, j ∈ C, we say that i is a better class than j.
• D = {0, ..., r} represents the set of possible annual numbers of accidents
for an insurance policy (r ∈ N*).
• T : C × D → C is a function called the rule of passing of the system;
for any i ∈ C and j ∈ D, T(i, j) represents the payment class to which
every insurance policy from class i that had j accidents during the current
year will be transferred the next year. The function T(i, j)
increases in i (for any fixed j) and decreases in j (for any fixed i).

• π : C → (0, ∞) is a decreasing function; for any i ∈ C, π(i) represents
the insurance premium for an insurance policy from class i.

Remark 10.1. The premiums π(i) calculated using the previous
system are also called mathematical premiums. In practice, to these premiums one also adds:
• values for reducing the probability of ruin for the insurer;
• charges spent on employees;
• taxes.

10.2  Bayes model based on a mixed Poisson distribution

Definition 10.2. We denote by X the random variable that represents the
number of accidents during one year (for a random insurance policy).

In order to place a policy in a payment class and calculate the corresponding
insurance premium, it is necessary to estimate the value of the number
of accidents X_{n+1} for the next year based on the recorded values of the
numbers of accidents X_1, ..., X_n of the insured vehicle in the n previous years
(years 1, ..., n), where n ∈ N*. In this context, the known distributions of the
r.v. X_1, ..., X_n are also called the prior distributions, and the estimated
distribution of the r.v. X_{n+1} is also called the posterior distribution.
For estimating the posterior distribution of the number of accidents and for
calculating the premium for year n + 1, we consider a Bayes model based
on a mixed Poisson distribution for the annual number of accidents,
in which we assume:
• The annual number of accidents X (for a random policy) has a mixed
Poisson-H distribution, where H is the distribution of the random variable
Λ > 0 which represents the average number of annual accidents (for a
random policy), so X|(Λ = λ) ~ Po(λ) for any λ > 0.
• For any λ > 0 the conditioned random variables X_1|(Λ = λ), ..., X_n|(Λ = λ),
X_{n+1}|(Λ = λ) are independent and identically distributed with
X|(Λ = λ) (i.e. X_i|(Λ = λ) ~ Po(λ) for any i ∈ {1, ..., n + 1}).

Definition 10.3. For any n ∈ N*, the number

    I_{n+1}(x_1, ..., x_n) = E(X_{n+1} | X_1 = x_1, ..., X_n = x_n) / E(X_{n+1}),  x_1, ..., x_n ∈ D,    (10.1)

is called the frequency index for year n + 1 when X_1 = x_1, ..., X_n = x_n
are known.

Remark 10.2. The frequency index represents the ratio between the posterior
mean and the prior mean of X_{n+1}.

Proposition 10.1. For any n ∈ N* and x_1, ..., x_n ∈ D we have

    I_{n+1}(x_1, ..., x_n) = E(Λ | X_1 = x_1, ..., X_n = x_n) / E(Λ)    (10.2)
        = (premium for year n + 1 given (x_1, ..., x_n)) / (initial premium, from year 1).    (10.3)

Remark 10.3. The premium for year n + 1 given (x_1, ..., x_n) is called a
posterior premium, and the initial premium, from year 1, is called a prior
premium.
Algorithm 10.1 (Bayes model for calculating the premiums in the Bonus-Malus system).
Step 0. Estimate the distribution of the r.v. Λ representing the average number
of annual accidents (for a random insurance policy). This distribution is
also called the prior distribution of the r.v. Λ. One can use the
maximum likelihood estimation, based on the frequency of the accidents
from the initial year.
To calculate the premium for year n + 1 knowing the numbers of accidents
in the previous years, x_1, ..., x_n, the following four steps are performed:
Step 1. Calculate the distribution of the conditioned random variable Λ|(X_1 =
x_1, ..., X_n = x_n), also called the posterior distribution of the r.v. Λ.
For this one can use Bayes' formula.
Step 2. Calculate the posterior distribution of the r.v. X_{n+1}, i.e. the
distribution of the conditioned r.v. X_{n+1}|(X_1 = x_1, ..., X_n = x_n). For this
one can use the total probability formula.
Step 3. Calculate the frequency index I_{n+1}(x_1, ..., x_n), using formula
(10.1) or (10.2).
Step 4. Calculate the premium for year n + 1 by formula (10.3).

Remark 10.4. In the next section it will be proved that the posterior
distributions of the r.v. Λ and X_{n+1} and the frequency indexes I_{n+1}(x_1, ..., x_n)
depend only on the total number of accidents Σ_{i=1}^{n} x_i observed in the
previous n years, and not on the distribution of the accidents during these n years.
Therefore the values of the frequency indexes are tabulated according to the
year n and the total number of accidents Σ_{i=1}^{n} x_i.
i=1

10.3  Gamma distribution for the average number of accidents

We will apply the described model in the particular case when the r.v. Λ
(which represents the average number of annual accidents for a random
insurance policy) has a Gamma prior distribution of parameters a and b, where
a, b > 0, with the probability density function

    f(λ) = (1 / (Γ(a) b^a)) λ^{a−1} e^{−λ/b},  λ > 0.

Then the r.v. X (which represents the number of accidents during one year
for a random policy) has a mixed Poisson-Gamma distribution of parameters
a and b, which is equivalent to a Negative Binomial distribution of
parameters a and 1/(b+1). So X ~ BN(a, 1/(b+1)).
Step 0 of the previous algorithm requires the estimation of the parameters
a and b for the prior distribution of the r.v. Λ. In the next example we will
apply the maximum likelihood estimation method to estimate these
parameters, based on the numbers of accidents during one year.
Step 1 consists of calculating the posterior distribution of the r.v. Λ, that
is the distribution of the conditioned r.v. Λ|(X_1 = x_1, ..., X_n = x_n). According
to Bayes' formula, this distribution has the probability density function

    f(λ | x_1, ..., x_n) = f(λ) P(X_1 = x_1, ..., X_n = x_n | Λ = λ) / ∫_0^∞ f(t) P(X_1 = x_1, ..., X_n = x_n | Λ = t) dt.

Using the hypothesis that for any λ > 0 the conditioned random variables
X_1|(Λ = λ), ..., X_n|(Λ = λ) are independent and identically distributed with
X|(Λ = λ) (i.e. X_i|(Λ = λ) ~ Po(λ) for any i ∈ {1, ..., n}), it follows that
    P(X_1 = x_1, ..., X_n = x_n | Λ = λ) = P(X_1 = x_1 | Λ = λ) · ... · P(X_n = x_n | Λ = λ)
        = P(X = x_1 | Λ = λ) · ... · P(X = x_n | Λ = λ)
        = (e^{−λ} λ^{x_1}/x_1!) · ... · (e^{−λ} λ^{x_n}/x_n!) = e^{−nλ} λ^{Σ_{i=1}^{n} x_i} / (x_1! ... x_n!),

so

    f(λ | x_1, ..., x_n) = [ (1/(Γ(a)b^a)) λ^{a−1} e^{−λ/b} · e^{−nλ} λ^{Σ_{i=1}^{n} x_i} / (x_1! ... x_n!) ]
        / [ (1/(Γ(a)b^a x_1! ... x_n!)) ∫_0^∞ t^{a−1} e^{−t/b} e^{−nt} t^{Σ_{i=1}^{n} x_i} dt ]
        = λ^{a + Σ_{i=1}^{n} x_i − 1} e^{−λ(1+bn)/b} / ∫_0^∞ t^{a + Σ_{i=1}^{n} x_i − 1} e^{−t(1+bn)/b} dt.

Therefore the posterior distribution of the r.v. Λ is a Gamma distribution of
parameters a + Σ_{i=1}^{n} x_i and b/(1+bn).

Step 2 consists of calculating the posterior distribution of the r.v. X_{n+1},
that is the distribution of the conditioned r.v. X_{n+1}|(X_1 = x_1, ..., X_n = x_n).
According to the total probability formula, this distribution is given by

    P(X_{n+1} = x | X_1 = x_1, ..., X_n = x_n)
        = ∫_0^∞ P(X_{n+1} = x | X_1 = x_1, ..., X_n = x_n, Λ = λ) f(λ | x_1, ..., x_n) dλ,  x ∈ N.

Using the hypothesis that for any λ > 0 the conditioned random variables
X_1|(Λ = λ), ..., X_n|(Λ = λ), X_{n+1}|(Λ = λ) are independent and identically
distributed with X|(Λ = λ), it follows that

    P(X_{n+1} = x | X_1 = x_1, ..., X_n = x_n, Λ = λ) = P(X_{n+1} = x | Λ = λ) = P(X = x | Λ = λ),

so

    P(X_{n+1} = x | X_1 = x_1, ..., X_n = x_n) = ∫_0^∞ P(X = x | Λ = λ) f(λ | x_1, ..., x_n) dλ,

for any x ∈ N. Since X|(Λ = λ) ~ Po(λ) and f(λ | x_1, ..., x_n) is the probability
density function of the Gamma distribution of parameters a + Σ_{i=1}^{n} x_i and b/(1+bn),
we get that the posterior distribution of the r.v. X_{n+1} is a mixed Poisson-Gamma
distribution of parameters a + Σ_{i=1}^{n} x_i and b/(1+bn), which is equivalent
to the Negative Binomial distribution of parameters a + Σ_{i=1}^{n} x_i and (1+bn)/(1+b+bn).
Hence X_{n+1}|(X_1 = x_1, ..., X_n = x_n) ~ BN(a + Σ_{i=1}^{n} x_i, (1+bn)/(1+b+bn)).

Step 3 consists of calculating the frequency indexes I_{n+1}(x_1, ..., x_n).
According to formula (10.1) we have

I_{n+1}(x_1, ..., x_n) = E(X_{n+1} | X_1 = x_1, ..., X_n = x_n) / E(X_{n+1}),
    ∀ x_1, ..., x_n ∈ D.

According to the formula of the mean for the Negative Binomial distribution,
from X ~ BN(a, 1/(b+1)) it follows that

E(X_{n+1}) = E(X) = a · (1 − 1/(b+1)) / (1/(b+1)) = ab,

and from X_{n+1}|(X_1 = x_1, ..., X_n = x_n) ~ BN(a + Σ_{i=1}^n x_i, (1+bn)/(1+b+bn))
it follows that

E(X_{n+1} | X_1 = x_1, ..., X_n = x_n)
    = (a + Σ_{i=1}^n x_i) · (1 − (1+bn)/(1+b+bn)) / ((1+bn)/(1+b+bn))
    = b (a + Σ_{i=1}^n x_i) / (1 + bn),

so

I_{n+1}(x_1, ..., x_n) = (a + Σ_{i=1}^n x_i) / (a (1 + bn))
    = (1 + (1/a) Σ_{i=1}^n x_i) / (1 + bn), ∀ x_1, ..., x_n ∈ D.    (10.4)

After calculating the frequency indexes using formula (10.4), the premiums
for year n + 1 can be obtained (at Step 4) by using the formula (10.3).
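As a minimal sketch (our function names; it assumes the parameters a and b are already estimated), formula (10.4) and the premium update it drives can be coded as:

```python
def frequency_index(a, b, n, total_claims):
    """Frequency index I_{n+1} from formula (10.4):
    (1 + (1/a) * total number of claims in n years) / (1 + b*n)."""
    return (1.0 + total_claims / a) / (1.0 + b * n)

def premium(initial_premium, a, b, n, total_claims):
    # The premium for year n+1 is the initial premium scaled by the
    # frequency index, as in the worked example below (formula (10.3)).
    return initial_premium * frequency_index(a, b, n, total_claims)
```

With the estimates of Example 10.1 (a ≈ 1.672974126, b ≈ 0.106506758), `frequency_index(a, b, 3, 1)` reproduces the 1.2108 entry used in the premium example.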
Example 10.1. We apply the model discussed above to the following data set,
recorded by a French insurance company during the year 1979. The data set
consists of m = 1044454 policyholders.
No. of accidents (j)   Absolute frequency (m_j)
        0                      881705
        1                      142217
        2                       18088
        3                        2118
        4                         273
        5                          53
      Total                   1044454


According to our model, X ~ BN(a, 1/(b+1)). By using the maximum likelihood
estimation method, the estimated values of the parameters a > 0 and b > 0
verify the following equations:

1/(b+1) = a / (a + X̄),    (10.5)

Σ_{j=1}^5 m_j (1/a + 1/(a+1) + ... + 1/(a+j−1)) − m ln(1 + X̄/a) = 0,    (10.6)

where X̄ is the mean of the sample formed by the recorded data. We have

X̄ = (1/m) Σ_{j=0}^5 j · m_j ≈ 0.178183051.

Dividing by m, the equation (10.6) can be rewritten as

Σ_{j=1}^5 (m_j/m) (1/a + 1/(a+1) + ... + 1/(a+j−1)) − ln(1 + X̄/a) = 0.    (10.7)

For the given values of m_j, m and X̄, we derive that the equation (10.7) has
a unique positive solution, namely

a ≈ 1.672974126.

By (10.5) we obtain that

b = X̄/a ≈ 0.106506758.
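The estimates above can be reproduced numerically. The sketch below (our code, standard library only) computes the sample mean, solves equation (10.7) by bisection, and then recovers b from (10.5):

```python
import math

m_j = {0: 881705, 1: 142217, 2: 18088, 3: 2118, 4: 273, 5: 53}
m = sum(m_j.values())                             # 1044454 policyholders
xbar = sum(j * mj for j, mj in m_j.items()) / m   # sample mean

def eq_10_7(a):
    # Left-hand side of equation (10.7).
    s = sum(m_j[j] / m * sum(1.0 / (a + t) for t in range(j))
            for j in range(1, 6))
    return s - math.log(1.0 + xbar / a)

# Bisection on a bracket over which the left-hand side changes sign.
lo, hi = 0.2, 50.0
for _ in range(200):
    mid = (lo + hi) / 2
    if eq_10_7(lo) * eq_10_7(mid) <= 0:
        hi = mid
    else:
        lo = mid
a_hat = (lo + hi) / 2
b_hat = xbar / a_hat   # from (10.5)
```

The equation is very flat near its root, so high-precision arithmetic matters; the bisection drives the residual of (10.7) to essentially machine precision.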

It follows from (10.4) that

I_{n+1}(x_1, ..., x_n) = (1 + (1/1.672974126) Σ_{i=1}^n x_i) / (1 + 0.106506758 · n),
    ∀ x_1, ..., x_n ∈ D.    (10.8)

Using the constructed model (Classical Bonus-Malus System), we consider the
case when the maximum number of consecutive years is n = 10 and the maximum
number of accidents is 5. According to formula (10.8) we obtain the following
table containing the values of the frequency indexes I_{n+1}(x_1, ..., x_n), by
year n and total number of accidents Σ_{i=1}^n x_i.

 n \ Σx_i |    0   |    1   |    2   |    3   |    4   |    5
----------+--------+--------+--------+--------+--------+--------
     0    | 1.0000 |   --   |   --   |   --   |   --   |   --
     1    | 0.9037 | 1.4439 | 1.9842 | 2.5244 | 3.0646 | 3.6048
     2    | 0.8244 | 1.3172 | 1.8099 | 2.3027 | 2.7955 | 3.2882
     3    | 0.7579 | 1.2108 | 1.6638 | 2.1168 | 2.5698 | 3.0228
     4    | 0.7012 | 1.1204 | 1.5396 | 1.9587 | 2.3779 | 2.7971
     5    | 0.6525 | 1.0425 | 1.4326 | 1.8226 | 2.2126 | 2.6027
     6    | 0.6101 | 0.9748 | 1.3395 | 1.7042 | 2.0689 | 2.4336
     7    | 0.5729 | 0.9153 | 1.2578 | 1.6002 | 1.9426 | 2.2851
     8    | 0.5399 | 0.8627 | 1.1854 | 1.5082 | 1.8309 | 2.1537
     9    | 0.5106 | 0.8158 | 1.1210 | 1.4262 | 1.7313 | 2.0365
    10    | 0.4842 | 0.7737 | 1.0631 | 1.3526 | 1.6421 | 1.9315

For example, consider a policyholder that had only one accident in the first
three years and whose initial premium was 200 u.c. According to formula (10.3)
and the values from the table of frequency indexes, the premium for the fourth
year is obtained as 200 · I_{3+1}(x_1, x_2, x_3) for Σ_{i=1}^3 x_i = 1, that is
200 · 1.2108 = 242.16 u.c.
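The table above, and the worked example, can be regenerated directly from formula (10.8); a short sketch, with our variable names:

```python
def index_10_8(n, total_claims, a=1.672974126, b=0.106506758):
    # Formula (10.8): frequency index after n years with the given
    # total number of claims.
    return (1.0 + total_claims / a) / (1.0 + b * n)

# Frequency indexes rounded to 4 decimals, as printed in the table
# (n = 1..10 years, 0..5 total accidents).
table = {(n, s): round(index_10_8(n, s), 4)
         for n in range(1, 11) for s in range(6)}
```

For instance, `table[(1, 1)]` reproduces the 1.4439 entry, and `200 * table[(3, 1)]` gives the 242.16 u.c. premium of the example.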

10.4  Problems

Exercise 10.1. Calculate and extend the above table of frequency indexes.

Exercise 10.2. The initial premium for a policyholder was 300 u.c. Calculate
the premium for each of the next 12 years, if the policyholder had only one
accident in the second year, two accidents in the sixth year and one accident
in the seventh year.

Exercise 10.3. a) A policyholder had only one accident in the first year.
Calculate the number of years after which the premium will be less than the
initial premium.
b) The same question for a policyholder who had three accidents in the first
year.

Theme 11
Some optimization models

11.1  Portfolio planning

Consider a portfolio model where an investor wishes to invest in n assets.
For any i ∈ {1, ..., n}, let r_i be the return (the rate of profit) of asset i.
Obviously, r = (r_1, ..., r_n)^T is a random vector.

Let m = (m_1, ..., m_n)^T and V = (v_ij)_{i,j ∈ {1,...,n}} be the mean and the
covariance matrix of r, respectively. The matrix V represents the risk matrix
of the investment.

Assume that the investor disposes of estimated values of m and V. The
portfolio planning consists in determining the proportions p_1, p_2, ..., p_n of
the investment allocated to assets 1, 2, ..., n, respectively. Obviously,

Σ_{i=1}^n p_i = 1 and p_i ≥ 0, ∀ i ∈ {1, ..., n}.

Let p = (p_1, ..., p_n)^T. Then the value

m^T p = Σ_{i=1}^n m_i p_i

represents the expected return of the portfolio p, and the value

p^T V p = Σ_{i=1}^n Σ_{j=1}^n v_ij p_i p_j

represents the expected risk of the portfolio p.


To optimize the portfolio, on the one hand one needs to minimize the expected
risk, and on the other hand one needs to maximize the expected return. So, by
choosing the minimization of the expected risk as the optimization


criterion, we obtain that an optimal portfolio p is an optimal solution for the
following problem

(P11.1.1)   min p^T V p  s.t.
                m^T p ≥ c,
                Σ_{i=1}^n p_i = 1,
                p_i ≥ 0, ∀ i ∈ {1, ..., n},
where c represents a given lower bound for the expected return.
On the other hand, by choosing the maximization of expected return as
optimization criterion, we obtain that an optimal portfolio p is an optimal
solution for the following problem

(P11.1.2)   max m^T p  s.t.
                p^T V p ≤ d,
                Σ_{i=1}^n p_i = 1,
                p_i ≥ 0, ∀ i ∈ {1, ..., n},
where d represents a given upper bound for the expected risk.
Also, by choosing the MEP (maximum entropy principle) as optimization
criterion, we obtain that an optimal portfolio p is an optimal solution for the
following problem

(P11.1.3)   max H(p) = −Σ_{i=1}^n p_i ln p_i  s.t.
                m^T p ≥ c,
                p^T V p ≤ d,
                Σ_{i=1}^n p_i = 1,
                p_i ≥ 0, ∀ i ∈ {1, ..., n}.

By combining the above criteria, we can define an optimal portfolio p as an
optimal solution for the following problem

(P11.1.4)   min w_1 p^T V p − w_2 m^T p + w_3 Σ_{i=1}^n p_i ln p_i  s.t.
                Σ_{i=1}^n p_i = 1,
                p_i ≥ 0, ∀ i ∈ {1, ..., n},


where w_1, w_2 and w_3 are given weights such that w_1, w_2, w_3 > 0 and
w_1 + w_2 + w_3 = 1.

We remark that the problem (P11.1.1) is a quadratic programming problem,
and the problem (P11.1.2) is a linear programming problem with quadratic
constraints. Also, the problem (P11.1.3) is an entropy optimization problem
with quadratic constraints, and the problem (P11.1.4) is a convex quadratic
programming problem with entropic perturbation.
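As an illustration (not from the text), the unconstrained-on-the-simplex problem (P11.1.4) can be solved approximately by multiplicative-weights (mirror-descent) iterations, which keep the iterates positive and summing to 1 automatically. The data below are invented:

```python
import math

# Toy data (our assumption): 3 assets with mean returns m and
# covariance (risk) matrix V; weights for risk, return and entropy.
m = [0.05, 0.08, 0.12]
V = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
w1, w2, w3 = 0.5, 0.3, 0.2

def objective(p):
    # w1 * p^T V p - w2 * m^T p + w3 * sum p_i ln p_i, as in (P11.1.4).
    risk = sum(V[i][j] * p[i] * p[j] for i in range(3) for j in range(3))
    ret = sum(m[i] * p[i] for i in range(3))
    ent = sum(pi * math.log(pi) for pi in p if pi > 0)
    return w1 * risk - w2 * ret + w3 * ent

p = [1/3, 1/3, 1/3]
best = objective(p)
eta = 0.05
for _ in range(2000):
    grad = [2 * w1 * sum(V[i][j] * p[j] for j in range(3)) - w2 * m[i]
            + w3 * (math.log(p[i]) + 1) for i in range(3)]
    # Exponentiated-gradient step, then renormalize onto the simplex.
    p = [p[i] * math.exp(-eta * grad[i]) for i in range(3)]
    z = sum(p)
    p = [pi / z for pi in p]
    best = min(best, objective(p))
```

The exponential update plus normalization enforces the simplex constraints of (P11.1.4) by construction; the constrained variants (P11.1.1)-(P11.1.3) would instead call for a quadratic-programming solver.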

11.2  Regional planning

An important problem in regional or urban planning is the allocation of new
houses. Let n be the number of zones dividing the city or the region, K be
the number of different household types to be located, and let L be the number
of different house types to be allocated. For any i ∈ {1, ..., n}, k ∈ {1, ..., K}
and l ∈ {1, ..., L} we assume that the following elements are known:
b_ikl = the budget that a type k household is willing to allocate for purchasing
and living in a type l house from zone i, including the house price, the housing
costs and the daily transportation costs (the residential budget);

c_ikl = the cost that must be allocated by a type k household to living in a
type l house from zone i, including the daily transportation (the necessary
budget);

s_ikl = the area allocated for a type k household with a type l house from
zone i;

S_i = the total area allocated for housing in zone i;

F_k = the total number of type k households to be located.

The difference b_ikl − c_ikl represents the bidding power of a type k household
for purchasing a type l house from zone i.
The urban or regional planning for locating households consists in determining
the numbers x_ikl of type k households that will be located in a type l house
in zone i, for any i ∈ {1, ..., n}, k ∈ {1, ..., K} and l ∈ {1, ..., L}.

To optimize the locating of households, one needs to introduce an optimization
criterion. One such criterion is the maximization of the total bidding power.


Hence (x_ikl)_{i,k,l} is an optimal solution for the following linear programming
problem

(P11.2.1)   max Σ_{i=1}^n Σ_{k=1}^K Σ_{l=1}^L (b_ikl − c_ikl) x_ikl  s.t.
                Σ_{k=1}^K Σ_{l=1}^L s_ikl x_ikl ≤ S_i, ∀ i ∈ {1, ..., n},
                Σ_{i=1}^n Σ_{l=1}^L x_ikl = F_k, ∀ k ∈ {1, ..., K},
                x_ikl ≥ 0, ∀ i ∈ {1, ..., n}, k ∈ {1, ..., K}, l ∈ {1, ..., L}.
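To make (P11.2.1) concrete, here is a toy special case with invented numbers: one zone, one household type and two house types. The LP then has a single area constraint plus the equality x_1 + x_2 = F, so the optimum sits at a vertex that can be enumerated by hand (this is not a general LP solver):

```python
def solve_toy(power, s, S, F):
    """Maximize power[0]*x1 + power[1]*x2 subject to
    s[0]*x1 + s[1]*x2 <= S, x1 + x2 = F, x1, x2 >= 0.
    Candidate vertices: x1 = 0, x1 = F, and the point where
    the area constraint becomes tight."""
    candidates = [0.0, float(F)]
    if s[0] != s[1]:
        candidates.append((S - s[1] * F) / (s[0] - s[1]))
    best = None
    for x1 in candidates:
        x2 = F - x1
        if x1 < 0 or x2 < 0 or s[0] * x1 + s[1] * x2 > S + 1e-9:
            continue  # infeasible vertex
        value = power[0] * x1 + power[1] * x2
        if best is None or value > best[0]:
            best = (value, x1, x2)
    return best

# House type 1 has higher bidding power but needs more area, so the
# area budget caps how many households can get it.
best_value, x1, x2 = solve_toy(power=[10.0, 6.0], s=[120.0, 80.0],
                               S=10000.0, F=100.0)
```

Here the area constraint binds: 50 households get each house type, for a total bidding power of 800.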

When the planner disposes of a lower bound M for the total bidding power,
then (x_ikl)_{i,k,l} is an optimal solution for the following problem, according
to the MEP:

(P11.2.2)   max −Σ_{i=1}^n Σ_{k=1}^K Σ_{l=1}^L x_ikl ln x_ikl  s.t.
                Σ_{k=1}^K Σ_{l=1}^L s_ikl x_ikl ≤ S_i, ∀ i ∈ {1, ..., n},
                Σ_{i=1}^n Σ_{l=1}^L x_ikl = F_k, ∀ k ∈ {1, ..., K},
                Σ_{i=1}^n Σ_{k=1}^K Σ_{l=1}^L (b_ikl − c_ikl) x_ikl ≥ M,
                x_ikl ≥ 0, ∀ i ∈ {1, ..., n}, k ∈ {1, ..., K}, l ∈ {1, ..., L}.

Adding auxiliary variables to the inequality constraints, this problem becomes
a linear programming problem with partial entropic perturbation.
Another model for locating households is obtained by choosing the minimization
of the total journey-to-work transportation costs as criterion. By refining the
elements of the above model, we assume that the following elements are now
known, for any i, j ∈ {1, ..., n}, k ∈ {1, ..., K} and l ∈ {1, ..., L}:

b_ijkl = the budget that a type k household is willing to allocate for purchasing
and living in a type l house in zone i and working in zone j (under the
assumption that every household has only one key worker);

c_ijkl = the cost that must be allocated by a type k household to living in a
type l house in zone i and working in zone j;

L_il = the number of type l houses available in zone i;


S_jk = the (estimated) number of jobs in zone j for key workers of type k
households.

The urban or regional planning for locating households now consists in
determining the numbers x_ijkl of type k households living in a type l house
in zone i and working in zone j, for any i, j ∈ {1, ..., n}, k ∈ {1, ..., K} and
l ∈ {1, ..., L}. Hence (x_ijkl)_{i,j,k,l} is now an optimal solution for the
following linear programming problem


(P11.2.3)   max Σ_{i=1}^n Σ_{j=1}^n Σ_{k=1}^K Σ_{l=1}^L (b_ijkl − c_ijkl) x_ijkl  s.t.
                Σ_{j=1}^n Σ_{k=1}^K x_ijkl ≤ L_il, ∀ i ∈ {1, ..., n}, l ∈ {1, ..., L},
                Σ_{i=1}^n Σ_{l=1}^L x_ijkl = S_jk, ∀ j ∈ {1, ..., n}, k ∈ {1, ..., K},
                x_ijkl ≥ 0, ∀ i, j ∈ {1, ..., n}, k ∈ {1, ..., K}, l ∈ {1, ..., L}.
Again, when the planner disposes of a lower bound M for the total bidding
power, then (x_ijkl)_{i,j,k,l} is an optimal solution for the following problem,
according to the MEP:

(P11.2.4)   max −Σ_{i=1}^n Σ_{j=1}^n Σ_{k=1}^K Σ_{l=1}^L x_ijkl ln x_ijkl  s.t.
                Σ_{j=1}^n Σ_{k=1}^K x_ijkl ≤ L_il, ∀ i ∈ {1, ..., n}, l ∈ {1, ..., L},
                Σ_{i=1}^n Σ_{l=1}^L x_ijkl = S_jk, ∀ j ∈ {1, ..., n}, k ∈ {1, ..., K},
                Σ_{i=1}^n Σ_{j=1}^n Σ_{k=1}^K Σ_{l=1}^L (b_ijkl − c_ijkl) x_ijkl ≥ M,
                x_ijkl ≥ 0, ∀ i, j ∈ {1, ..., n}, k ∈ {1, ..., K}, l ∈ {1, ..., L}.
Similarly to (P11.2.2), problem (P11.2.4) can be written as a linear programming problem with partial entropic perturbation.

11.3  Industrial production planning

When planning the industrial production of a country or region, one important
problem consists of estimating the technical coefficient matrix


A = (a_ij)_{i,j ∈ {1,...,n}}, where n is the number of industry sectors and, for
any i, j ∈ {1, ..., n}, a_ij represents the amount of input from sector i to
sector j per unit of the output of sector j. Therefore

a_ij = z_ij / X_j, ∀ i, j ∈ {1, ..., n},

where z_ij represents the sales input from sector i to sector j, and X_j
represents the total output of sector j. Also, we have

X_i = Σ_{j=1}^n z_ij + Y_i, ∀ i ∈ {1, ..., n},

where Y_i represents the amount of output from sector i to beneficiaries outside
the analyzed sectors, such as the government or the foreign markets.

Usually, the estimation of the current values a_ij of the technical coefficients
is based on some known estimated values a_ij^(0) of the previous values of these
coefficients and on some known estimated values l_i and c_i of the total
interindustry outputs (sales) and inputs (purchases), respectively, for each
sector i ∈ {1, ..., n}, i.e.

Σ_{j=1}^n z_ij = l_i, ∀ i ∈ {1, ..., n},

Σ_{i=1}^n z_ij = c_j, ∀ j ∈ {1, ..., n},

where

Σ_{i=1}^n l_i = Σ_{j=1}^n c_j.

Also, by using some known estimated values Y_i, i ∈ {1, ..., n}, of the current
outputs from industry sectors to beneficiaries outside the analyzed sectors, we
obtain the estimated values

X_i = l_i + Y_i, ∀ i ∈ {1, ..., n},

of the total output for each sector i.
Among the measures of deviation of the current values a_ij of the technical
coefficients from their previous values a_ij^(0), we can use the generalized
relative entropy

H(A; A^(0)) = Σ_{i=1}^n Σ_{j=1}^n a_ij ln(a_ij / a_ij^(0)),

where A^(0) = (a_ij^(0))_{i,j ∈ {1,...,n}}, with the assumption that a_ij = 0 for
any i, j ∈ {1, ..., n} for which a_ij^(0) = 0.
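A direct transcription (our helper name), with the usual convention 0 · ln 0 = 0 for the entries where a_ij^(0) = 0 forces a_ij = 0:

```python
import math

def relative_entropy(A, A0):
    """Generalized relative entropy H(A; A0) = sum a_ij * ln(a_ij / a0_ij),
    where a_ij = 0 is assumed whenever a0_ij = 0."""
    h = 0.0
    for row, row0 in zip(A, A0):
        for a, a0 in zip(row, row0):
            if a0 == 0 or a == 0:
                continue  # 0 * ln 0 = 0 convention
            h += a * math.log(a / a0)
    return h
```

For matrices with equal total mass, H(A; A^(0)) ≥ 0, with equality exactly when A = A^(0).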


Hence, the technical coefficient matrix A is an optimal solution for the
following optimization problem

(P11.3.1)   min H(A; A^(0)) = −Σ_{(i,j)∈K} f_ij a_ij + Σ_{(i,j)∈K} a_ij ln a_ij  s.t.
                Σ_{j=1}^n X_j a_ij = l_i, ∀ i ∈ {1, ..., n},
                Σ_{i=1}^n a_ij = c_j / X_j, ∀ j ∈ {1, ..., n},
                a_ij ≥ 0, ∀ i, j ∈ {1, ..., n},
                a_ij = 0, ∀ (i, j) ∈ {1, ..., n} × {1, ..., n} \ K,

where

K = {(i, j) | a_ij^(0) > 0, i, j ∈ {1, ..., n}}

and

f_ij = ln a_ij^(0), ∀ (i, j) ∈ K.


We remark that this problem is a linear programming problem with entropic
perturbation. We can use the geometric programming method to solve problem
(P11.3.1), and we obtain that this problem has a unique optimal solution of
the form

a_ij = r_i a_ij^(0) s_j, ∀ i, j ∈ {1, ..., n},

i.e.

A = R A^(0) S,

where R and S are the n × n diagonal matrices with the diagonal entries r_i
(i ∈ {1, ..., n}) and s_j (j ∈ {1, ..., n}), respectively (and all other entries
equal to zero). We can iterate the last equality to estimate the technical
coefficient matrix at m consecutive times, namely

A^(k) = R^(k) A^(k−1) S^(k), ∀ k ∈ {1, ..., m}.

Therefore, we obtain a method for estimating the matrix A = A^(m), called
the RAS algorithm.
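The biproportional form A = R A^(0) S is exactly what iterative proportional fitting computes: alternately rescale the rows and columns of the flow matrix until the row sums match l_i and the column sums match c_j. A minimal sketch on an invented 2×2 example (our data; the row and column targets must have equal totals):

```python
def ras(z0, row_targets, col_targets, iterations=200):
    """Iterative proportional fitting: alternately scale the rows and
    columns of z0 until the row sums match row_targets and the column
    sums match col_targets. Each sweep multiplies rows and columns by
    scalars, so the result keeps the form diag(r) * z0 * diag(s)."""
    z = [row[:] for row in z0]
    n = len(z)
    for _ in range(iterations):
        for i in range(n):                      # row scaling (the "R" step)
            rs = sum(z[i])
            z[i] = [v * row_targets[i] / rs for v in z[i]]
        for j in range(n):                      # column scaling (the "S" step)
            cs = sum(z[i][j] for i in range(n))
            for i in range(n):
                z[i][j] *= col_targets[j] / cs
    return z

# Invented data: previous flows z0, target row sums l, column sums c.
z = ras([[2.0, 1.0], [1.0, 3.0]],
        row_targets=[4.0, 5.0], col_targets=[3.0, 6.0])
```

For a strictly positive starting matrix the iteration converges, and the fitted matrix differs from z0 only by the row and column scaling factors r_i and s_j.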
