A primer on probability theory in financial modeling

Sergio M. Focardi
Tel: +33 1/45 75 51 74
interteksf@aol.com
The Intertek Group
94, rue de Javel
F-75015 Paris
Tutorial 2001-01
The objective of finance theory is to predict the future evolution of financial
quantities, such as the price of a single asset or broad market movements. Uncertainty as to the future evolution of prices is a fundamental tenet of modern
finance theory. The paradigm of choice for modeling uncertainty in finance is
probability theory. This tutorial presents the formal probabilistic concepts
behind today's financial modeling.

Introduction

The theory of finance is a mathematical theory that describes the time evolution of financial quantities. To describe the time evolution of quantities, finance
theory uses a mathematical formalism similar to that of the physical sciences.
The results - either actual or possible - of empirical observations can be predicted through a sequence of purely logical operations. Given today's low-cost
high-performance computers, predictions are generally obtained by running mathematical models on computers.
As in the physical sciences, the objective of finance theory is to make predictions.
Predictions might concern phenomena such as the future value of a single stock or
currency or future market movements in a given geographical block. But there are
important differences between finance theory and theory in the physical sciences:
Finance theory does not describe laws of nature but a complex human artifact, the financial markets. Physical sciences that describe a complex system,
such as weather, are typically supported by basic physical laws; finance theory describes a complex system for which there is (presently) no description
in terms of elementary laws or elementary components.

Finance theory lacks a mathematical theory that allows one to compute the evolution of a system (in this case, a financial system) starting from initial
conditions. In the absence of this, finance theory consists of two separate
components: 1) a set of relationships that constrain the entire market given
the time evolution of fundamental determinants and 2) a set of assumptions
on the time evolution of fundamental determinants. Determining
the appropriate assumptions about the evolution of fundamental determinants is more the domain of financial econometrics than of finance theory.
The above considerations are central to an understanding of financial modeling.
In fact, derivative pricing models are (generally) algorithms that compute
the price evolution of derivatives starting from assumptions on the evolution of the
price of the underlying. The choice of models must take into account the ability
of the model to describe not only the underlying (such as the term structure of
interest rates) but also the functioning of markets. Let's now briefly present the
development of finance theory.
1.1 Competitive markets under uncertainty

During the 1950s, the economists Kenneth Arrow and Georges Debreu proposed
extending to financial markets the notions of microeconomics and, consequently,
the analysis of competitive markets. A classical reference on microeconomics is
Varian (1992). Let's review the fundamental points.
Following classical notions, a competitive exchange market (without production) is a tool that allows N agents to exchange a given good. The fundamental
notion of competitive exchange is that of supply and demand. It is assumed that
each agent is characterized by a demand function which prescribes the quantity
of a good that the agent is willing to buy at a given price. The aggregation of
demand produces the market demand function, i.e., the total amount of the
good that agents will purchase at any given price. There is a parallel function of
supply which associates to each price the quantity of the good offered for sale.
The supply function derives from the aggregation of the individual supply functions of agents.
The classical analysis assumes perfectly competitive markets.
Perfectly competitive markets have a number of characteristics. First, there are no
costs or constraints associated with the exchange of goods; price and quantity are
the only determinants of exchange. Second, each agent is individually too small to influence the market; prices are determined by the collective action of agents. Given
the market price, each agent will buy or sell exactly the quantity prescribed by
its demand or supply function, under the constraint of its financial endowment,
which prescribes the maximum total amount that the agent can purchase.
If there is not one but several goods to choose from, agents must select from a
panel of goods. It is assumed that agents are able to order their preferences,
i.e., that they are able to decide if they are either indifferent to or have preferences
for different panels of goods. It is possible to demonstrate that, under assumptions of continuity in the ordering of preferences, preferences themselves can be
expressed through a utility function. A utility function is a numerical function
defined over each panel of goods, i.e., it is a function that assigns a numerical
utility index to each panel of goods. Preferences are expressed by higher values
of the utility function.
Therefore, in a competitive exchange market there are N agents, each characterized by a financial endowment and by a utility function. A panel of goods chosen
by each agent corresponds to each set of prices of the goods. The aggregation of
choices leads to the aggregate market demand for each good.
The key point of microeconomic theory is to determine if and how the market reaches its equilibrium point, which is defined as the point at which
aggregate demand is exactly equal to aggregate supply. This might seem a
banal problem, but it involves one of the key results of twentieth-century mathematics:
Brouwer's fixed-point theorem.
The fixed-point theorem can be stated in many ways. In its simplest formulation,
it states that a continuous function that maps a closed, bounded interval into itself has a fixed
point, i.e., a point where the argument of the function has the same value as the
function itself. Each level of supply corresponds to a price, which in turn induces a level of demand.
The equilibrium problem is to find the fixed point of this supply and demand
function.
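The one-dimensional statement of the fixed-point theorem can be illustrated numerically. The sketch below is only an illustration, not an equilibrium model: the function name `fixed_point` and the choice of the cosine as a test map are assumptions made here. It locates the fixed point of a continuous map of an interval into itself by bisecting on the sign of f(x) - x.

```python
import math

def fixed_point(f, lo=0.0, hi=1.0, tol=1e-10):
    """Locate x in [lo, hi] with f(x) = x by bisection on g(x) = f(x) - x.

    Assumes f is continuous and maps [lo, hi] into itself, so that
    g(lo) >= 0 and g(hi) <= 0 and a zero of g is guaranteed to exist.
    """
    a, b = lo, hi
    while b - a > tol:
        mid = (a + b) / 2
        if f(mid) - mid >= 0:
            a = mid          # a fixed point lies in [mid, b]
        else:
            b = mid          # a fixed point lies in [a, mid]
    return (a + b) / 2

# cos maps [0, 1] into [cos 1, 1], which is contained in [0, 1].
x_star = fixed_point(math.cos)   # approximately 0.739
```

Bisection works only in one dimension; the general theorem guarantees existence of a fixed point but does not by itself supply an algorithm for finding it.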
The above notion of competitive markets does not consider uncertainty. To
extend classical microeconomic analysis to include uncertainty, Kenneth Arrow
reasoned as follows. At every instant, agents exchange not only goods but also
contracts that will be executed in the future. The future result of such contracts
is uncertain. A stock, for example, is a contract that gives its holder the right to
receive future dividends and any final liquidation price, but the amount
of these payments at time of execution is uncertain.
Suppose, for simplicity, that there is only one period and thus two dates: the initial instant $T_0$ and the final instant $T_1$. At instant $T_0$, agents exchange contracts
that give them the right to receive an uncertain amount of goods at instant $T_1$.
Abstracting from physical goods, agents exchange at instant $T_0$ contracts that
give them the right to receive an uncertain payment at instant $T_1$.
Following Arrow, suppose that the economy might be, at instant $T_1$, in one of $k$
possible different states. We can now observe that each good-state pair can be
considered a different good with an associated market supply and demand and a
market price. Each contract will produce a different outcome as a function of the
state realized at instant $T_1$. Every contract is therefore a contingent claim, i.e.,
each contract gives the right to a payment or a delivery of goods contingent on
the realized state. If we now assume that agents have utility functions defined
on quantities for each state, we have placed the analysis of markets under uncertainty in the framework of the analysis of deterministic markets.
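To make the notion of a contingent claim concrete, the sketch below represents a contract as a payoff for each state. The state names, payoffs and prices are invented for illustration, and valuing a contract as the cost of the bundle of state-contingent goods it delivers is an assumption made here, not a formula given in the text.

```python
# Hypothetical economy with k = 3 possible states at T1 and a single good.
states = ["boom", "normal", "slump"]

# Hypothetical price at T0 of one unit of the good delivered at T1
# if, and only if, the given state is realized (one good-state pair each).
state_price = {"boom": 0.30, "normal": 0.45, "slump": 0.20}

# A contract is a contingent claim: it specifies a payoff for every state.
stock = {"boom": 120.0, "normal": 100.0, "slump": 60.0}
bond = {"boom": 100.0, "normal": 100.0, "slump": 100.0}

def value_at_t0(contract):
    """Cost at T0 of the bundle of state-contingent goods the contract delivers."""
    return sum(state_price[s] * contract[s] for s in states)

stock_value = value_at_t0(stock)
bond_value = value_at_t0(bond)
```

Under these made-up numbers the bond, which pays the same amount in every state, is a claim whose outcome does not depend on the realized state, while the stock's outcome does.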
If we now drop the single-period assumption, we can apply the same reasoning to
a model that assumes that agents perform market operations (i.e., they exchange
contracts or trade) at each of the M future dates. Utility functions must be defined for each good, for each state, and for each instant. The number of trading
instants can be either finite or infinite.
The above description of financial markets is highly idealized and serves only as
a framework for models and application software. It is, however, the conceptual
basis for understanding modern finance theory. In a number of applications, it
is used directly. For example, optimization software used in investment management is based on the above theoretical model.
Let's now explore how the previous framework translates into a mathematical
probabilistic description of financial markets. It would seem natural to define
the states of the economy as instantaneous states at each trading date. From
the point of view of the mathematical description of the economy, however, it
is more convenient to stipulate that states are an entire possible history of the
economy over a given time period. The following paragraphs will describe how a
probability structure can be imposed on this set of states.

The mathematical representation of uncertainty

Today's finance theory is based on the hypothesis that uncertainty about future
prices cannot be eliminated; a fundamental tenet is that no entity can attain a deterministic description of the evolution of the economy, with the exception of the
evolution of risk-free assets. A typical practical example of a risk-free asset is U.S.
government debt.
The inability to build purely deterministic models of the economy calls for the
mathematical representation of uncertainty. Probability theory is the mathematical description of uncertainty that presently enjoys the broadest diffusion; it
is the paradigm of choice for mainstream finance theory. But it is by no means
the only one. Competing mathematical paradigms for uncertainty include, for
example, fuzzy measures.
Though probability as a mathematical axiomatic theory is well known, its interpretation is still the object of debate. There are three basic interpretations of
probability:
Probability as intensity of belief, J.M. Keynes, 1921
Probability as relative frequency, R. von Mises, 1928
Probability as an axiomatic system, A. Kolmogorov, 1933.
Developed primarily by the Russian mathematician Kolmogorov, the axiomatic
theory of probability eliminated the logical ambiguities that plagued probabilistic reasoning prior to his work. Application of the axiomatic theory is,
however, a matter of interpretation. In finance theory, probability might have
two different meanings: 1) as a descriptive concept and 2) as a determinant of
the agent decision-making process.
As a descriptive concept, probability is used in the sense of relative frequency,
similar to its use in the physical sciences. In this sense, the probability of an event
is assumed to be approximately equal to the relative frequency of its manifestation
in a large number of experiments. This interpretation is unsatisfactory in finance
theory for a number of reasons:
The approximate equality between relative frequency and theoretical probability cannot be made precise.
More fundamentally, in a truly probabilistic environment there can be no
definite link between probability and observation. Unless we rule out the
possibility of low-probability events, any observation is compatible with any
statement of probability.
An additional complication comes from the fact that financial time series
have only one realization. Every estimate is made on a single time-evolving
series. If stationarity (or a well-defined time process) is not assumed, it is
not possible to make statistical estimates.
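The first point in the list above can be seen in a small simulation. The sketch below, whose function name and seeds are arbitrary choices made here, estimates the probability that two dice sum to 7 from repeated throws. Each run gives a relative frequency close to 1/6, but the estimates differ from run to run and none is exactly equal to the theoretical value.

```python
import random

def relative_frequency(n_throws, seed):
    """Fraction of throws of two dice whose faces sum to 7."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n_throws)
        if rng.randint(1, 6) + rng.randint(1, 6) == 7
    )
    return hits / n_throws

theoretical = 1 / 6
# Three independent experiments of 20,000 throws each.
estimates = [relative_frequency(20_000, seed) for seed in range(3)]
```

A financial time series offers no analogue of rerunning the experiment with a new seed, which is the third point made above.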
If by probability we refer to the agent decision-making process, there are problems
here too. It is assumed that agents are able to associate probability numbers to
future events and that decisions are made on the basis of these evaluations. Different assumptions can be made. The strictest assumption is that all agents share
both the same probabilistic evaluations and the model assumptions. Billingsley
(1986) and Chow and Teicher (1988) offer excellent presentations of probability
theory.
2.1

The axiomatic theory of probability is based on three fundamental concepts:
1) outcomes, 2) events and 3) measure. The outcomes are the set of all possible
results of an experiment or an observation. The set of all possible outcomes is
often written as the set $\Omega$. For instance, in the dice game, a possible outcome is
a pair of numbers, one for each die, such as 6 + 6 or 3 + 2; the space $\Omega$ is the set
of all 36 possible outcomes.
Events are sets of outcomes. Continuing with the example of the dice game,
a possible event is the set of all outcomes such that the sum of the numbers is
10. Probabilities are defined on events, not on outcomes. To render definitions
consistent, events must be a class $\mathcal{F}$ of subsets of $\Omega$ with the following properties:
1. $\mathcal{F}$ is not empty;
2. If $A \in \mathcal{F}$ then $A' \in \mathcal{F}$ ($A'$ is the complement of $A$, made of all those elements
of $\Omega$ that do not belong to $A$);
3. If $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
Every such class is called a $\sigma$-algebra. Any class for which property 3 is valid
only for a finite number of sets is called an algebra.
Any set that belongs to a class $\mathcal{G}$ is said to be measurable with respect to
$\mathcal{G}$. Consider a class $\mathcal{G}$ of subsets of $\Omega$ and consider the smallest $\sigma$-algebra that
contains $\mathcal{G}$, defined as the intersection of all the $\sigma$-algebras that contain $\mathcal{G}$.
That $\sigma$-algebra is indicated as $\sigma(\mathcal{G})$ and is said to be the $\sigma$-algebra generated by $\mathcal{G}$.
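On a finite set of outcomes the generated class can be computed directly. The sketch below is a brute-force illustration, with a hypothetical function name and example sets chosen here. It closes a class of subsets under complement and union until nothing new appears. On a finite set, closure under finite unions is the same as closure under countable unions.

```python
from itertools import combinations

def generated_sigma_algebra(omega, generators):
    """Smallest class containing `generators` that is closed under
    complement and union, for a finite outcome set `omega`."""
    omega = frozenset(omega)
    sets = {frozenset(g) for g in generators} | {omega}
    changed = True
    while changed:
        changed = False
        current = list(sets)            # snapshot; new sets are handled next pass
        for a in current:
            if omega - a not in sets:   # property 2: complements
                sets.add(omega - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in sets:       # property 3: unions
                sets.add(a | b)
                changed = True
    return sets

# One die; the generating class contains the events {1, 2} and {2, 3}.
sigma = generated_sigma_algebra(range(1, 7), [{1, 2}, {2, 3}])
```

In this example the result contains 16 events, built from the four indivisible blocks {1}, {2}, {3} and {4, 5, 6}; the faces 4, 5 and 6 can never be separated by an event of this class.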
A particularly important space in probability is the Euclidean space. Consider
first the real axis $R$, i.e., the Euclidean space $R^1$ in one dimension. Consider the
set formed by all intervals open to the left, i.e., of the form $(a, b]$, and all unions and intersections of
intervals open to the left. The $\sigma$-algebra generated by this set is represented with
the letter $\mathcal{B}$; sets that belong to $\mathcal{B}$ are called Borel sets.
Now consider, more generally, the n-dimensional Euclidean space $R^n$ (n-tuples of real numbers), for $n \geq 1$.
Consider the class of all generalized rectangles open
to the left and their unions and intersections. The $\sigma$-algebra generated by this
class is indicated as $\mathcal{R}^n$; sets that belong to $\mathcal{R}^n$ are called n-dimensional Borel sets.
The above construction is not the only possible one. The $\mathcal{R}^n$, for any value of $n$,
are also generated by open or closed sets. As we will see, the $\mathcal{R}^n$ are fundamental
to defining random variables. They define a class of subsets of Euclidean spaces
on which it is reasonable to impose a probability structure: the class of every
subset would be too big while the class of, say, generalized rectangles would be
too small. The $\mathcal{R}^n$ are an adequately rich class.
2.2 Probability

Intuitively, probability is a set function that associates to every event a number
between zero and one. Probability is formally defined by a triple $(\Omega, \mathcal{F}, P)$, which
is called a probability space, where $\Omega$ is the set of all possible outcomes, $\mathcal{F}$ is
the event $\sigma$-algebra and $P$ is a probability measure defined as follows.

A probability measure $P$ is a set function from $\mathcal{F}$ to $R$ (the set of real numbers)
that satisfies three conditions:
1. $0 \leq P(A) \leq 1$ for every $A \in \mathcal{F}$;
2. $P(\emptyset) = 0$ and $P(\Omega) = 1$;
3. $P(\bigcup_i A_i) = \sum_i P(A_i)$ for every finite or denumerable sequence of disjoint events
$A_i$ such that $A_i \in \mathcal{F}$.
$\mathcal{F}$ does not have to be a $\sigma$-algebra. The definition of a probability space can
be limited to algebras of events. It is however possible to demonstrate that a
probability defined over an algebra of events $\mathcal{H}$ can be extended in a unique way
to the $\sigma$-algebra generated by $\mathcal{H}$.
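Two events $A$ and $B$ are said to be independent if:

$$P(A \cap B) = P(A)\,P(B). \tag{1}$$

The probability of event $A$ given event $B$, written as $P(A/B)$, is defined, provided $P(B) > 0$, as follows:

$$P(A/B) = \frac{P(A \cap B)}{P(B)}. \tag{2}$$

It is immediate to deduce from simple properties of set theory and from the
disjoint additivity of probability that, for any events $A$ and $B$ and with $A'$ denoting the complement of $A$:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B), \tag{3}$$

$$P(A) = 1 - P(A'). \tag{4}$$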
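These definitions can be checked on the dice game introduced earlier. The sketch below assumes two fair dice, so that all 36 outcomes are equally likely; the particular events `A`, `B` and `C` are examples chosen here. Exact fractions are used so that equalities hold without rounding error.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

def prob(event):
    """P(event) for an event given as a set of outcomes."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == 6}            # first die shows 6
B = {w for w in omega if w[1] % 2 == 0}        # second die is even
C = {w for w in omega if sum(w) == 10}         # sum of the faces is 10

# Independence: does P(A and B) equal P(A) P(B)?
independent_AB = prob(A & B) == prob(A) * prob(B)
independent_AC = prob(A & C) == prob(A) * prob(C)

# Conditional probability of C given A.
cond_C_given_A = prob(A & C) / prob(A)

# Consequences of disjoint additivity: union rule and complement rule.
union_rule_holds = prob(A | C) == prob(A) + prob(C) - prob(A & C)
complement_rule_holds = prob(set(omega) - A) == 1 - prob(A)
```

Here `A` and `B` are independent because they concern different dice, whereas `A` and `C` are not: knowing that the first die shows 6 raises the probability of a sum of 10 from 1/12 to 1/6.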

Discrete probabilities are a special instance of probabilities. Defined over a
finite or denumerable set of outcomes, discrete probabilities are non-zero over
each outcome. The probability of an event is the sum of the probabilities of its
outcomes. In the finite case, discrete probabilities are the usual combinatorial
probabilities.
2.3 Measure

A measure is a set function defined over an algebra of sets, denumerably additive, and such that it takes value 0 on the empty set but can otherwise assume
any positive value including, conventionally, an infinite value. A probability is
thus a measure of total mass 1, i.e., it takes value 1 on the set $\Omega$.
Measure can be formally defined as a function $M(A)$ from an algebra $\mathcal{F}$ to $R$
(the set of real numbers) that satisfies the following three properties:
1. $0 \leq M(A)$ for every $A \in \mathcal{F}$;
2. $M(\emptyset) = 0$;
3. $M(\bigcup_i A_i) = \sum_i M(A_i)$ for every finite or denumerable sequence $A_i$ of disjoint
events such that $A_i \in \mathcal{F}$.
If $M$ is a measure defined over a $\sigma$-algebra $\mathcal{F}$, the triple $(\Omega, \mathcal{F}, M)$ is called a
measure space (this term is not used if $\mathcal{F}$ is an algebra). The couple $(\Omega, \mathcal{F})$ is
a measurable space if $\mathcal{F}$ is a $\sigma$-algebra.
Measures in general, and not only probabilities, can be uniquely extended from
an algebra to the generated $\sigma$-algebra.
2.4 Integrals

The notion of measure allows one to define a concept of integral that generalizes the
usual concept of the Riemann integral. For each measure $M$, the integral is a
number that is associated to every integrable function $f$. It is defined in two steps.
First suppose that $f$ is non-negative and consider a finite decomposition of the
space $\Omega$, that is to say a finite class of disjoint subsets $A_i$ of $\Omega$ whose union
is $\Omega$: $A_i \cap A_j = \emptyset$ for $i \neq j$ and $\bigcup_i A_i = \Omega$. Then consider the sum:
$\sum_i \inf\{f(\omega) : \omega \in A_i\}\, M(A_i)$. The integral $\int f\,dM$ is defined as the supremum, if it
exists, of all these sums over all possible decompositions of $\Omega$.
Second, given a generic function $f$ not necessarily non-negative, consider its decomposition $f = f^+ - f^-$ into its positive part $f^+$ and its negative part with the sign changed, $f^-$, both of which are non-negative. The integral of $f$ is defined as the
difference, if the difference exists, between the integrals of $f^+$ and $f^-$.
This definition of integral generalizes the usual definition of the Riemann integral. The integral can be defined not only on $\Omega$ but
on any measurable set $G$.
Given an algebra $\mathcal{F}$, suppose that $G$ and $M$ are two measures and suppose that
a function $f$ exists such that $G(A) = \int_A f\,dM$ for $A \in \mathcal{F}$. In this case $G$ is said
to have density $f$ with respect to $M$.
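The first step of the definition can be carried out by brute force on a small finite space. In the sketch below, the four-point space, the measure `M` and the function `f` are all made-up illustrations. It enumerates every finite decomposition, computes the corresponding sum, and takes the largest.

```python
def set_partitions(items):
    """Yield every decomposition of `items` into disjoint non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition                     # `first` in its own block
        for i, block in enumerate(partition):           # or joined to a block
            yield partition[:i] + [[first] + block] + partition[i + 1:]

# Hypothetical four-point space with a measure M and a non-negative function f.
omega = ["a", "b", "c", "d"]
M = {"a": 2, "b": 1, "c": 1, "d": 3}
f = {"a": 1, "b": 3, "c": 2, "d": 5}

def lower_sum(partition):
    """Sum over the blocks of inf(f on the block) times M(block)."""
    return sum(min(f[w] for w in block) * sum(M[w] for w in block)
               for block in partition)

sums = [lower_sum(p) for p in set_partitions(omega)]
integral = max(sums)    # supremum over all finite decompositions
```

The supremum is attained by the finest decomposition, the one into single points, where the sum reduces to the familiar weighted sum of f(w) M({w}), here 22. The coarsest decomposition, with the whole space as its only block, gives the smallest sum, 7.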
2.5 Measures and integrals over Euclidean spaces

A number of integrals of interest for probability theory are defined over the real
axis $R$ and over the n-dimensional spaces $R^n$ (real numbers and n-tuples of real
numbers respectively).
The definition of these integrals requires the definition of various measures with
respect to which integrals are defined. Without formally defining each measure,
let's recall that the following integrals are defined over Euclidean spaces:
the classical Riemann integral, defined with respect to the length of intervals
or areas of rectangles;
the Lebesgue integral, defined with respect to the measure space $(R^n, \mathcal{R}^n, \lambda_n)$,
where $\lambda_n$ is the Lebesgue measure, a measure that generalizes the concept
of area;
the Stieltjes integral, defined with respect to measures that are in turn defined over finite rectangles.

Random variables

Probability is a set function defined over a space of events; random variables
transfer probability from the original space $\Omega$ into the space of real numbers.
Given a probability space $(\Omega, \mathcal{F}, P)$, a random variable $X$ is a function $X(\omega)$
defined over the set $\Omega$ that takes values in the set $R$ of real numbers and subject
to the condition: the set $\{\omega : X(\omega) \leq x\}$ belongs to the $\sigma$-algebra $\mathcal{F}$ for every real
number $x$. In other words, the inverse image of any interval $(-\infty, x]$ is an event.
It can be easily demonstrated that the inverse image of any union and intersection
of intervals is also an event.
A real-valued function defined over $\Omega$ is called measurable with respect to a
$\sigma$-algebra $\mathcal{F}$ if the inverse image of any Borel set belongs to $\mathcal{F}$. A random variable which is measurable with respect to a $\sigma$-algebra cannot discriminate between
events that are not in that $\sigma$-algebra. A random variable $X$ is said to generate $\mathcal{G}$
if $\mathcal{G}$ is the smallest $\sigma$-algebra in which it is measurable.
Given a probability space $(\Omega, \mathcal{F}, P)$ and a random variable $X$, the expected value
of $X$ is its integral with respect to the measure $P$: $E[X] = \int_\Omega X\,dP$, where
integration is extended to the entire space $\Omega$.
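On a finite probability space both notions can be computed explicitly. The sketch below assumes two fair dice and takes the random variable to be the sum of the faces; both are choices made here for illustration. It computes the expected value as a finite sum and groups the outcomes into the inverse images of each value, which are the smallest events the variable can tell apart.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))     # outcomes of two dice
P = {w: Fraction(1, 36) for w in omega}          # probability of each outcome

def X(w):
    """Random variable: sum of the two faces."""
    return w[0] + w[1]

# E[X] is the integral of X with respect to P; on a finite space it is a sum.
expected_value = sum(X(w) * P[w] for w in omega)

# Inverse images of single values: the indivisible events of the
# sigma-algebra generated by X.
atoms = {}
for w in omega:
    atoms.setdefault(X(w), set()).add(w)
```

The outcomes (1, 6) and (6, 1) fall in the same block, so no event in the class generated by the sum can separate them, even though they are distinct outcomes of the underlying space.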
3.1

Given a probability space $(\Omega, \mathcal{F}, P)$ and a random variable $X$, consider a set $A$
of real numbers that belongs to $\mathcal{R}^1$, i.e., $A$ is a Borel set on the real line. Recall
that a random variable is a real-valued measurable function defined over the set
of outcomes. Therefore, the inverse image of $A$, $X^{-1}(A)$, belongs to $\mathcal{F}$ and has a
well-defined probability $P(X^{-1}(A))$.
The measure $P$ thus induces another measure on the real axis called distribution
or distribution law of the random variable $X$ given by: $p(A) = P(X^{-1}(A))$.
It is easy to see that this measure is a probability measure. A random variable
therefore transfers on the set of real numbers the probability originally defined
over the space $\Omega$.
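As a hedged illustration of how a random variable transfers probability to the real line, the sketch below again assumes two fair dice with the sum of the faces as the variable. It evaluates the induced measure by computing the probability of the inverse image of a set of real numbers.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))     # outcomes of two dice
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """Random variable: sum of the two faces."""
    return w[0] + w[1]

def p(A):
    """Induced measure on the real line: the probability of the inverse image of A."""
    inverse_image = [w for w in omega if X(w) in A]
    return sum((P[w] for w in inverse_image), Fraction(0))

p_seven = p({7})                                      # P(X = 7)
p_at_most_four = p({x for x in range(2, 13) if x <= 4})   # P(X <= 4)
total_mass = p(set(range(2, 13)))                     # all attainable values
```

The total mass is 1, as it must be for a probability measure, and the function `p` no longer refers to dice at all from the caller's point of view: it is defined on sets of numbers.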


The function $F$ defined by $F(x) = p((-\infty, x]) = P(X \leq x)$ is the distribution
function of the random variable $X$.
Suppose that there is a function $f$ such that $p(A) = \int_A f\,dx$ for every set $A$ that
belongs to $\mathcal{R}^1$, where integration is with respect to the Lebesgue measure. The function $f$ is called
a probability density function and the distribution $p$ is said to have density $f$.
For every interval $(a, b)$, the property $F(b) - F(a) = \int_a^b f\,dx$ holds, where integration is
extended to the interval $(a, b)$.

3.2 Random vectors

The next step is to consider not only one but a set of random variables referred to as random vectors. Random vectors are formed by n-tuples of random variables. Consider a probability space $(\Omega, \mathcal{F}, P)$. A random variable is
a measurable function from $\Omega$ to $R^1$; a random vector is a measurable function from $\Omega$ to $R^n$. We can therefore write a random vector as a function:
$f(\omega) = (f_1(\omega), f_2(\omega), \ldots, f_n(\omega))$. Measurability is defined with respect to the
Borel $\sigma$-algebras $\mathcal{R}^n$, with $n = 1$ for random variables. It can be demonstrated
that the function $f$ is measurable $\mathcal{F}$ if and only if each component function $f_i$ is
measurable $\mathcal{F}$.
Conceptually, the key issue is to define joint probabilities, i.e., the probabilities
that the n variables are in a given set. For example, consider the joint probability
that the inflation rate is in a given interval and the growth rate in another given
interval.
Consider the Borel $\sigma$-algebra $\mathcal{R}^n$ on the real n-dimensional space $R^n$. It can
be easily demonstrated that a random vector formed by $n$ random variables
$X_i$, $i = 1, 2, \ldots, n$ induces a probability distribution over $(R^n, \mathcal{R}^n)$. In fact, the
set $\{\omega : (X_1(\omega), X_2(\omega), \ldots, X_n(\omega)) \in H\}$, for $H \in \mathcal{R}^n$, belongs to $\mathcal{F}$, i.e., the
inverse image of every set of the $\sigma$-algebra $\mathcal{R}^n$ belongs to the $\sigma$-algebra $\mathcal{F}$. It is
therefore immediate to induce over every set $H$ that belongs to $\mathcal{R}^n$ a probability
measure, the joint probability of the $n$ random variables $X_i$. In general, however,
knowledge of the distributions and of distribution functions of each random variable is not sufficient to determine the joint probability distribution function.
Two random variables $X$, $Y$ are said to be independent if
$P(X \in A, Y \in B) = P(X \in A)\,P(Y \in B)$ for all $A$ and $B$ that belong to $\mathcal{R}^1$. This definition generalizes in obvious
ways to any number of variables and therefore to the components of a random
vector. It is easy to show that, if the variable components of a random vector are
independent, the joint probability distribution is the product of distributions.
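A two-variable example shows why the individual distributions do not determine the joint distribution. The sketch below uses two hypothetical joint laws for a pair of 0/1 variables, invented here for illustration. They have identical marginal distributions, yet only one of them is the product of its marginals.

```python
from fractions import Fraction

half, quarter, zero = Fraction(1, 2), Fraction(1, 4), Fraction(0)

# Two hypothetical joint distributions of (X, Y), each variable taking values 0 or 1.
joint_independent = {(0, 0): quarter, (0, 1): quarter,
                     (1, 0): quarter, (1, 1): quarter}
joint_identical = {(0, 0): half, (0, 1): zero,
                   (1, 0): zero, (1, 1): half}      # Y always equals X

def marginals(joint):
    """Distributions of X and of Y, obtained by summing out the other variable."""
    px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
    py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}
    return px, py

def is_independent(joint):
    """True if the joint probability is the product of the marginals everywhere."""
    px, py = marginals(joint)
    return all(joint[(x, y)] == px[x] * py[y] for x in (0, 1) for y in (0, 1))

same_marginals = marginals(joint_independent) == marginals(joint_identical)
```

In financial terms, knowing the distribution of the inflation rate and that of the growth rate separately says nothing about how the two move together.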


Stochastic processes

Given a probability space $(\Omega, \mathcal{F}, P)$, a stochastic process is a set of random
variables that are measurable with respect to $\mathcal{F}$, indexed with an index $t \in [0, T]$
interpreted as time. A stochastic process is therefore an indexed random variable
$X_t(\omega)$. When it is necessary to emphasize the dependence of the random variable
value on both time and the element $\omega$, a stochastic process is explicitly written
as a function of two variables: $X = X(t, \omega)$. Given $\omega$, the function $X(t, \omega)$ is a
function of time that is called the path of the stochastic process.
The variable $X$ might be a single random variable or a multidimensional random
vector. A stochastic process is therefore a function $X(t, \omega)$ from the product
space $[0, T] \times \Omega$ into the n-dimensional real space $R^n$. Because to each $\omega$ corresponds a time path of the process - in general formed by a set of functions $X_i(t, \omega)$
- it is possible to identify the space $\Omega$ with a subset of the real functions defined
over an interval $[0, T]$.
Let's now discuss how to represent a stochastic process $X(t, \omega)$ and the conditions
of identity of two stochastic processes. As a stochastic process is a function of
two variables, one can define equality as pointwise identity for each pair $(t, \omega)$.
However, as processes are defined over probability spaces, pointwise identity is
seldom used; it is more fruitful to define equality modulo sets of measure zero or
equality with respect to probability distributions. In general, two random variables $X$, $Y$ will be considered equal if the equality $X(\omega) = Y(\omega)$ holds for every
$\omega$ with the exception of a set of probability zero. In this case, it is said that the
equality holds almost always (a.a.).
A rather general (but not complete) representation is given by the finite-dimensional probability distributions. Given any set of indices $(t_1, \ldots, t_m)$, consider
the distributions $\mu_{t_1, \ldots, t_m}(H) = P((X_{t_1}, \ldots, X_{t_m}) \in H)$ where $H \in \mathcal{R}^m$. These
probability measures are, for any choice of the $t_i$, the finite-dimensional joint
probabilities of the process. They determine many, but not all, properties of a
stochastic process. For example, the finite-dimensional distributions of a Brownian motion do not determine if the process paths are continuous or not.
In general, one can define the three concepts of equality between stochastic
processes described below:
1. Two stochastic processes are equal if they have the same finite-dimensional
distributions. This is the weakest concept of equality.
2. The process $X(t, \omega)$ is said to be a modification of the process $Y(t, \omega)$ if
the following equation holds:

$$X(t, \omega) = Y(t, \omega) \ \text{a.a.}, \quad \forall t. \tag{5}$$

In other words, according to this definition, two stochastic processes are
equal if, given any value of $t$, the random variables $X(t, \cdot)$, $Y(t, \cdot)$ are equal
except over a set of probability zero. For each $t$, the set of measure zero over
which the two processes are different might be different.
3. Two processes are said to be indistinguishable if the following relationship
holds:

$$X(t, \omega) = Y(t, \omega) \quad \forall t, \ \text{a.a.} \tag{6}$$

That is to say, two processes are indistinguishable if their paths coincide
except (possibly) over a set of measure zero.
It is quite obvious that property 3 implies property 2 which implies, in turn,
property 1. Implications do not hold in the opposite sense. Two processes having
the same finite-dimensional distributions might have completely different paths. However,
if one assumes that paths are continuous functions of time, properties 2 and 3
become equivalent.
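A standard textbook example, not taken from this tutorial, shows that a modification need not be indistinguishable. It assumes the outcome space is the unit interval with the Lebesgue measure as probability and takes $T = 1$.

```latex
\begin{aligned}
&\Omega = [0,1], \qquad P = \text{Lebesgue measure}, \qquad t \in [0,1], \\[4pt]
&X(t,\omega) = 0, \qquad
Y(t,\omega) =
\begin{cases}
1 & \text{if } t = \omega, \\
0 & \text{otherwise},
\end{cases} \\[4pt]
&\text{for each fixed } t: \quad
P\{\omega : X(t,\omega) \neq Y(t,\omega)\} = P(\{t\}) = 0, \\[4pt]
&\text{over all } t: \quad
P\{\omega : X(t,\omega) \neq Y(t,\omega) \text{ for some } t\} = P(\Omega) = 1.
\end{aligned}
```

For every fixed time the two variables differ only on a single point, so each process is a modification of the other. Yet every path of $Y$ has a jump at $t = \omega$, so the paths differ for every $\omega$ and the processes are not indistinguishable. The paths of $Y$ are not continuous, which is consistent with the remark that continuity makes the two notions equivalent.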
4.1 Assets, prices, dividends and economic states

We are now in a position to summarize the probabilistic representation of financial markets. From a financial point of view, an asset is a contract which
gives the right to receive a stream of future payments, generically indicated as
dividends. In the case of a stock, the stream of payments will include the stock
dividends and the proceeds of any final liquidation of the firm. A
bond is a contract that gives the right to receive coupons and the repayment of
the principal. We will suppose that all payments are made at the trading dates
and that no transactions take place between trading dates.
Let's assume that all securities are traded (i.e., exchanged on the market) at
either discrete fixed dates, variable dates or continuously. At each trading date
there is a market price for each security. Each security is therefore modeled with
two time series, a series of market prices and a series of dividends. As both series are subject to uncertainty, dividends and prices are time-dependent random
variables, i.e., they are stochastic processes. The time dependence of random
variables in this probabilistic setting is a delicate question and will be examined
shortly.
Following Kenneth Arrow and using a framework now standard, the economy
and the financial markets in a situation of uncertainty are described with the
following basic concepts:
It is assumed that the economy might be in one of the states of a probability
space $(\Omega, \mathcal{F}, P)$. Therefore, the economy is represented by a probability
space $(\Omega, \mathcal{F}, P)$.


Every security is described by two stochastic processes formed by two time-dependent random variables $S_t$ and $d_t$, which represent the prices and dividends of
the same security. Therefore, every security is represented by two stochastic
processes $S_t$ and $d_t$.
This representation is completely general and is not linked to the assumption
that the space of states is finite.
4.2 Information structures

Let's now turn our attention to the question of time. The previous paragraphs
considered a space formed by states in an abstract sense. We now have to introduce an appropriate representation of time as well as rules that describe the
evolution of information, i.e., information propagation, over time. The concepts of information and information propagation are fundamental in economics
and finance theory.
Information, in this context, is a concept different from both the intuitive notion of information and from that of information theory in which information is
a quantitative measure related to the a priori probability of messages. In economics, information means the (progressive) revelation of the set of events to
which the current state of the economy belongs.
The concept of information in finance is a bit technical, but sheds light on the
probabilistic structure of finance theory. (Readers not interested in the formal
development of finance theory might skip this section.) The point is the following. Securities are represented by stochastic processes, i.e., time-dependent
random variables. But the probabilistic states on which these random variables
are defined represent entire histories of the economy. To embed time into the
probabilistic structure of states in a coherent way calls for information structures
and filtrations.
Recall that it is assumed that the economy is in one of many possible states
and that there is uncertainty on the state that has been realized. Consider a
time period of the economy. At the beginning of the period, there is complete
uncertainty on the state of the economy, i.e., there is complete uncertainty on
what path the economy will take. Different events have different probabilities,
but there is no certainty. As time passes, uncertainty is reduced as the number of
states to which the economy can belong is progressively reduced. Revelation of
information means the progressive reduction of the number of possible states;
at the end of the period, the realized state is fully revealed.
This progressive reduction of the set of possible states is formally expressed in the
concepts of information structure and filtration. Lets start with information
structures. Information structures apply only to discrete probabilities defined
over a discrete set of states. At the initial instant $T_0$, there is complete uncertainty on the state of the economy; the actual state is known only to belong to
the largest possible event, i.e., the entire space $\Omega$. At the following instant $T_1$,
the states are separated into a partition, a partition being a denumerable class
of disjoint sets whose union is the space itself. The actual state belongs to one of
the sets of the partition. The revelation of information consists in ruling out all
other sets but one. In the discrete case, and only in the discrete case, partitions
are determined by the value of all the random variables at time $T_1$. For all the
states of each set of the partition, and only for these, random variables assume the same
values.
Suppose, to exemplify, that only two securities exist in the economy and that
each can assume only two possible prices and pay only two possible dividends.
At every moment there are sixteen possible price-dividend combinations ($2^4 = 16$, as each of the four quantities takes one of two values). We can
thus see that at the moment $T_1$ all the states are partitioned into sixteen sets.
Each set of the partition includes all the states that have a
given combination of prices and dividends at the moment $T_1$. The same reasoning can be
applied to each instant. The evolution of information can thus be represented by
a tree structure in which every path represents a state and every node a set of a partition.
Obviously the tree structure does not have to develop in a symmetrical way as in
the above example. The tree might have a very generic structure of branches.
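The two-security example can be sketched in Python. This is only an illustration under added assumptions: the price and dividend levels are hypothetical, the economy is given a second date T2 so that the tree has two layers, and the partition at each date groups the states that share the same observations up to that date. All names are illustrative.

```python
from itertools import product

# Hypothetical levels: each of the two securities has two possible prices
# and two possible dividends, so 2**4 = 16 combinations per date.
PRICES = (90, 110)
DIVIDENDS = (0, 5)
observations = list(product(PRICES, DIVIDENDS, PRICES, DIVIDENDS))

# Assumption: the economy lasts two dates, T1 and T2. A state is a full
# history, i.e. one observation at T1 followed by one at T2.
states = list(product(observations, repeat=2))

def partition_at(t):
    """Group states that share the same observations up to date t (1-based)."""
    blocks = {}
    for state in states:
        blocks.setdefault(state[:t], []).append(state)
    return list(blocks.values())

partition_t1 = partition_at(1)
partition_t2 = partition_at(2)

print(len(states))                      # 256 histories (16 * 16)
print(len(partition_t1))                # 16 sets at T1
print({len(b) for b in partition_t1})   # {16}: sixteen histories per set
print(len(partition_t2))                # 256 singletons at T2
```

With the added date T2, each of the sixteen sets at T1 groups the sixteen histories that pass through the same node; at T2 every set is a single state, so the realized state is fully revealed.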
4.3 Filtration

The concept of information structure based on partitions supplies a rather intuitive representation of the propagation of information through a tree of progressively finer partitions. However, this structure is not sufficient to describe the
propagation of information in a general probabilistic context. In fact, the set of
possible events is much richer than the set of partitions. It is therefore necessary
to identify not only partitions, but also a structure of events. The structure of
events used to define the propagation of information is called a filtration. In the
discrete case, however, the two concepts, information structure and filtration, are
equivalent.
The concept of filtration is based on identifying all events that are known at any
given instant. It is assumed that it is possible to associate to each trading moment $t$ a $\sigma$-algebra of events $\mathcal{F}_t$ contained in $\mathcal{F}$ and formed by all events that are
known prior to or at time $t$. It is assumed that events are never forgotten, i.e.,
that $\mathcal{F}_t \subseteq \mathcal{F}_s$ if $t < s$. In this way, an ordering of time is created. This ordering
is formed by an increasing sequence of $\sigma$-algebras, each associated to the time at
which all its events are known. This sequence is called a filtration. Indicated as
$\{\mathcal{F}_t\}$, a filtration is therefore the increasing sequence of all $\sigma$-algebras $\mathcal{F}_t$, each
associated to the respective instant $t$.
In the finite case, it is possible to create a mutual correspondence between filtrations and information structures. In fact, given an information structure, it
is possible to associate to each partition the algebra generated by the same partition. Observe that a tree information structure is formed by partitions that
create increasing refinement, that is to say that, by going from one instant to the
next, every set of the partition is decomposed. One can then conclude that the
algebras generated by an information structure form a filtration.
On the other hand, given a filtration $\{\mathcal{F}_t\}$, it is possible to associate a partition
to each $\mathcal{F}_t$. In fact, given any element $\omega$, consider any other element $\omega'$ such that,
for each set of $\mathcal{F}_t$, both either belong to it or are outside it. It is easy to see that
classes of equivalence are thus formed, that these create a partition, and that the
algebra generated by each such partition is exactly the $\mathcal{F}_t$ that has generated the
partition.
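The correspondence between partitions and algebras in the finite case can be checked on a small example. The following is a minimal sketch, assuming a hypothetical four-state economy whose states are two-step histories; the function names are illustrative and not taken from any library.

```python
from itertools import combinations

def algebra_from_partition(partition):
    """All unions of partition blocks, including the empty set."""
    blocks = [frozenset(b) for b in partition]
    events = set()
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            events.add(frozenset().union(*chosen))
    return events

def partition_from_algebra(omega, events):
    """Equivalence classes of states that no event can tell apart."""
    classes = {}
    for w in omega:
        # Signature of w: the collection of events that contain it.
        signature = frozenset(e for e in events if w in e)
        classes.setdefault(signature, set()).add(w)
    return {frozenset(c) for c in classes.values()}

omega = {"uu", "ud", "du", "dd"}          # hypothetical two-step histories
p1 = [{"uu", "ud"}, {"du", "dd"}]         # partition after the first move
p2 = [{"uu"}, {"ud"}, {"du"}, {"dd"}]     # partition after the second move

f1 = algebra_from_partition(p1)
f2 = algebra_from_partition(p2)

print(len(f1), len(f2))                   # 4 16
print(f1 <= f2)                           # True: the algebras increase
print(partition_from_algebra(omega, f1) == {frozenset(b) for b in p1})  # True
```

The finer partition generates the larger algebra, and the equivalence classes recovered from each algebra are exactly the sets of the partition that generated it.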
A stochastic process is said to be adapted to the filtration $\{\mathcal{F}_t\}$ if, for each $t$, the variable
$X_t$ is measurable with respect to the $\sigma$-algebra $\mathcal{F}_t$. It is assumed that the price
and dividend processes $S_t$ and $d_t$ of every security are adapted to $\{\mathcal{F}_t\}$. This means
that, for each $t$, no measurement of any price or dividend variable can identify
events not included in the respective algebra. Every random variable is a partial
image of the set of states seen from a given point of view and at a given moment.
The concepts of filtration and of processes adapted to a filtration are fundamental. They ensure that information is revealed without anticipation. Consider the
economy and associate at every instant a partition and an algebra generated by
the partition. Every random variable defined at that moment assumes a value
constant on each set of the partition. The knowledge of the realized values of the
random variables does not allow identifying sets of events finer than partitions.
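In the discrete case, the adaptedness condition reduces to a simple check: a variable observed at a given instant must be constant on every set of the partition of that instant. A minimal sketch, assuming a hypothetical four-state economy and illustrative prices:

```python
def is_measurable(variable, partition):
    """True if the variable is constant on every set of the partition."""
    return all(len({variable[w] for w in block}) == 1 for block in partition)

partition_t1 = [{"uu", "ud"}, {"du", "dd"}]
partition_t2 = [{"uu"}, {"ud"}, {"du"}, {"dd"}]

# Hypothetical prices, written as functions of the state (full history).
price_t1 = {"uu": 110, "ud": 110, "du": 90, "dd": 90}
price_t2 = {"uu": 121, "ud": 99, "du": 99, "dd": 81}

print(is_measurable(price_t1, partition_t1))  # True
print(is_measurable(price_t2, partition_t2))  # True
print(is_measurable(price_t2, partition_t1))  # False
```

The last check fails because knowing the price at the second date would reveal more than the information available at the first date, which is the anticipation that adaptedness rules out.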
One might well ask: Why introduce the complex structure of $\sigma$-algebras as opposed to simply defining random variables? The point is that, from a logical
point of view, the primitive concept is that of states and events. The evolution
of time has to be defined on the primitive structure - it cannot simply be imposed on random variables. In practice, filtrations become an important concept
when dealing with conditional probabilities in a continuous environment. As the
probability that a continuous random variable assumes a specific value is zero,
the definition of conditional probabilities requires the machinery of filtration.
4.4 Conditional probability and conditional expectation

Conditional probabilities and conditional averages are fundamental in the stochastic description of financial markets. For instance, one is often interested in the
probability distribution of the price of a security at some date given its price at
an earlier date. The widely used regression models are an example of conditional
expectation models.
The conditional probability of event $A$ given event $B$ was defined in the above
paragraphs on probability as $P(A/B) = \frac{P(A \cap B)}{P(B)}$. This simple definition cannot
be used in the context of continuous random variables because the conditioning
event (i.e., one variable assuming a given value) has probability zero. To avoid
this problem, one conditions on $\sigma$-algebras and not on single zero-probability
events. In general, as each instant is characterized by a $\sigma$-algebra $\mathcal{F}_t$, the conditioning elements are the $\mathcal{F}_t$.
The general definition of conditional expectation is the following. Consider a
probability space $(\Omega, \mathcal{F}, P)$ and a $\sigma$-algebra $\mathcal{G}$ contained in $\mathcal{F}$ and suppose that $X$
is an integrable random variable on $(\Omega, \mathcal{F}, P)$. We define the conditional expectation of $X$ with respect to $\mathcal{G}$, written as $E[X/\mathcal{G}]$, as a random variable measurable
with respect to $\mathcal{G}$ such that $\int_G E[X/\mathcal{G}]\,dP = \int_G X\,dP$ for every set $G \in \mathcal{G}$. In
other words, the conditional expectation is a random variable whose average on
every event that belongs to $\mathcal{G}$ is equal to the average of $X$ over those same events,
but it is $\mathcal{G}$-measurable whilst $X$ need not be. It is possible to demonstrate that such
variables exist and are unique up to a set of measure zero.
Econometric models usually condition a random variable given another variable.
In the previous framework, conditioning one random variable $X$ with respect to
another random variable $Y$ means conditioning $X$ given $\sigma(Y)$, i.e., given the $\sigma$-algebra generated by $Y$. Thus $E[X/Y]$ means $E[X/\sigma(Y)]$.
One can define conditional probabilities starting from the concept of conditional
expectations. Consider a probability space $(\Omega, \mathcal{F}, P)$, a sub-$\sigma$-algebra $\mathcal{G}$ of $\mathcal{F}$ and
two sets $A, B \in \mathcal{F}$. If $I_A$, $I_B$ are the indicator functions of the sets $A$, $B$ (the
indicator function of a set assumes value 1 on the set, 0 elsewhere), we can define
conditional probabilities of the event $A$, respectively, given $\mathcal{G}$ or given the event
$B$ as:
$$P(A/\mathcal{G}) = E[I_A/\mathcal{G}], \qquad P(A/B) = E[I_A/I_B] \tag{7}$$

Using these definitions, it is possible to demonstrate that given two random variables $X$ and $Y$ with joint density $f(x, y)$, the conditional density of $X$ given $Y$
is
$$f(x/y) = \frac{f(x, y)}{f_Y(y)}. \tag{8}$$
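Equation (8) can be illustrated numerically. The sketch below assumes, purely for illustration, that X and Y are jointly normal with zero means, unit variances and a hypothetical correlation of 0.6; in that case the ratio of joint to marginal density should coincide with a normal density of mean 0.6y and variance 1 - 0.36.

```python
import math

RHO = 0.6  # assumed correlation between X and Y

def joint_density(x, y, rho=RHO):
    """Bivariate normal density with zero means and unit variances."""
    norm = 2 * math.pi * math.sqrt(1 - rho**2)
    quad = (x * x - 2 * rho * x * y + y * y) / (1 - rho**2)
    return math.exp(-quad / 2) / norm

def marginal_density_y(y):
    """Standard normal density, the marginal of Y in this example."""
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

def conditional_density(x, y):
    """f(x/y) = f(x, y) / f_Y(y), as in equation (8)."""
    return joint_density(x, y) / marginal_density_y(y)

def normal_density(x, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

x, y = 0.3, 1.0
lhs = conditional_density(x, y)
rhs = normal_density(x, mean=RHO * y, var=1 - RHO**2)
print(math.isclose(lhs, rhs, rel_tol=1e-12))  # True
```

Conditioning on Y = y thus shifts the mean of X towards the observed value and reduces its variance, even though the event Y = y itself has probability zero.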
In the discrete case, the conditional expectation is a random variable (i.e., a real-valued function defined over $\Omega$) that takes a constant value over each set of the
finite partition associated to $\mathcal{F}_t$. Its value for each element of $\Omega$ is defined by
the classical concept of conditional probability. It is simply the average over the set of the
partition that contains the element, computed with the classical conditional probabilities.
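The discrete case can be made concrete with a short computation. This is a sketch under stated assumptions: a hypothetical four-state economy with equal probabilities, illustrative values for X, and a partition whose sets all have positive probability.

```python
from fractions import Fraction

# Hypothetical four-state economy with equal probabilities.
prob = {w: Fraction(1, 4) for w in ("uu", "ud", "du", "dd")}
X = {"uu": 121, "ud": 99, "du": 99, "dd": 81}
partition = [{"uu", "ud"}, {"du", "dd"}]   # information available at T1

def conditional_expectation(X, prob, partition):
    """E[X/G] for the algebra G generated by a finite partition.

    Assumes every set of the partition has positive probability.
    """
    result = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)
        average = sum(X[w] * prob[w] for w in block) / p_block
        for w in block:
            result[w] = average   # constant on each set of the partition
    return result

cond = conditional_expectation(X, prob, partition)
print(cond["uu"], cond["dd"])  # 110 90

# Defining property: same average as X on every set of the partition.
for block in partition:
    assert sum(cond[w] * prob[w] for w in block) == sum(X[w] * prob[w] for w in block)
```

Exact fractions are used so that the defining equality of averages can be tested without rounding error.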
An important econometric concept related to conditional expectations is that
of a martingale. Given a probability space $(\Omega, \mathcal{F}, P)$ and a filtration $\{\mathcal{F}_i\}$,
a sequence of integrable random variables $X_i$, each measurable with respect to $\mathcal{F}_i$, is called a martingale if the
following condition holds:
$$E[X_{i+1}/\mathcal{F}_i] = X_i. \tag{9}$$
A martingale translates the idea of a fair game, as the expected value of the
variable at the next period is the present value of the same variable.
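Condition (9) can be verified exhaustively on a small discrete example. The sketch below assumes a hypothetical game of three fair bets of one unit each, with all paths equally likely; the wealth process and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

STEPS = 3
# States are full paths of +1/-1 moves; all paths are equally likely.
states = list(product((1, -1), repeat=STEPS))
prob = {w: Fraction(1, len(states)) for w in states}

def X(i, w):
    """Cumulative gain after i fair bets of one unit each, along path w."""
    return sum(w[:i])

def conditional_expectation(i_next, i, w):
    """E[X_{i_next}/F_i] at state w: average over paths sharing w's first i moves."""
    block = [v for v in states if v[:i] == w[:i]]
    p_block = sum(prob[v] for v in block)
    return sum(X(i_next, v) * prob[v] for v in block) / p_block

is_martingale = all(
    conditional_expectation(i + 1, i, w) == X(i, w)
    for i in range(STEPS)
    for w in states
)
print(is_martingale)  # True
```

Here the sets of paths sharing the same first i moves play the role of the partition associated to $\mathcal{F}_i$, so the check is a direct instance of the discrete conditional expectation.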

In summary

This tutorial has reviewed the following key concepts used in the probabilistic
description of financial markets:
Probability and probability spaces: Probability is a set function defined over
a class of events where events are sets of possible outcomes of an experiment.
A probability space is a triple formed by a set of outcomes, a $\sigma$-algebra of
events and a probability measure.
Random variables and random vectors: A random variable is a real-valued
function defined over the set of outcomes such that the inverse image of any
interval is an event. n-dimensional random vectors are functions from the
set of outcomes into the n-dimensional Euclidean space with the property
that the inverse image of n-dimensional generalized rectangles is an event.
Stochastic processes: Stochastic processes are time-dependent random variables.
Information structures and filtrations: An information structure is a class
of partitions of the set of states, one associated to each instant of time,
that become progressively finer with the evolution of time. A filtration is an
increasing class of $\sigma$-algebras associated to each instant of time.
The stochastic representation of financial markets: The states of the economy, intended as full histories of the economy, are represented as a probability space. The revelation of information with time is represented by
information structures or filtrations. Prices and other financial quantities
are represented by adapted stochastic processes.
Conditional probabilities and conditional expectations: Conditioning means
the change in probabilities due to the acquisition of some information. It
is possible to condition with respect to an event if the event has nonzero
probability. In general terms, conditioning is conditioning with respect to a
filtration or an information structure.
Martingales: A martingale is a stochastic process such that the conditional
expected value is always equal to its present value. It embodies the idea of
a fair game, where today's wealth is the best forecast of future wealth.
