Articles
Mathematical finance 1
Asymptotic analysis 4
Calculus 7
Copula (statistics) 20
Differential equation 25
Expected value 31
Ergodic theory 38
Feynman–Kac formula 44
Fourier transform 46
Girsanov's theorem 66
Itô's lemma 68
Martingale representation theorem 72
Mathematical model 73
Monte Carlo method 79
Numerical analysis 90
Real analysis 100
Partial differential equation 102
Probability 114
Probability distribution 120
Binomial distribution 125
Log-normal distribution 132
Heat equation 137
Radon–Nikodym derivative 149
Risk-neutral measure 153
Stochastic calculus 155
Wiener process 157
Lévy process 164
Stochastic differential equations 167
Stochastic volatility 171
Numerical partial differential equations 174
Crank–Nicolson method 175
Finite difference 180
Value at risk 186
Volatility (finance) 194
Autoregressive conditional heteroskedasticity 198
Brownian Model of Financial Markets 202
Rational pricing 207
Arbitrage 214
Futures contract 223
Put–call parity 235
Intrinsic value (finance) 237
Option time value 239
Moneyness 240
Black–Scholes 242
Black model 254
Binomial options pricing model 255
Monte Carlo option model 260
Volatility smile 263
Implied volatility 267
SABR Volatility Model 270
Markov Switching Multifractal 272
Greeks (finance) 275
Finite difference methods for option pricing 284
Trinomial tree 285
Optimal stopping 287
Interest rate derivative 288
Short rate model 291
Hull–White model 293
Cox–Ingersoll–Ross model 296
Chen model 297
LIBOR Market Model 299
Heath–Jarrow–Morton framework 300
References
Article Sources and Contributors 303
Image Sources, Licenses and Contributors 308
Article Licenses
License 309
Mathematical finance
Mathematical finance is applied mathematics concerned with financial markets. The subject has a close relationship
with the discipline of financial economics, which is concerned with much of the underlying theory. Generally,
mathematical finance will derive, and extend, the mathematical or numerical models suggested by financial
economics. Thus, for example, while a financial economist might study the structural reasons why a company may
have a certain share price, a financial mathematician may take the share price as a given, and attempt to use
stochastic calculus to obtain the fair value of derivatives of the stock (see: Valuation of options).
In terms of practice, mathematical finance also overlaps heavily with the field of computational finance (also known
as financial engineering). Arguably, these are largely synonymous, although the latter focuses on application, while
the former focuses on modeling and derivation (see: Quantitative analyst). The fundamental theorem of
arbitrage-free pricing is one of the key theorems in mathematical finance. Many universities around the world now
offer degree and research programs in mathematical finance; see Master of Mathematical Finance.
History
The history of mathematical finance starts with The Theory of Speculation (published 1900) by Louis Bachelier,
which discussed the use of Brownian motion to evaluate stock options. However, it attracted little attention outside academia.
The first influential work of mathematical finance is the theory of portfolio optimization by Harry Markowitz on
using mean-variance estimates of portfolios to judge investment strategies, causing a shift away from the concept of
trying to identify the best individual stock for investment. Using a linear regression strategy to understand and
quantify the risk (i.e. variance) and return (i.e. mean) of an entire portfolio of stocks and bonds, an optimization
strategy was used to choose a portfolio with largest mean return subject to acceptable levels of variance in the return.
Simultaneously, William Sharpe developed the mathematics of determining the correlation between each stock and
the market. For their pioneering work, Markowitz and Sharpe, along with Merton Miller, shared the 1990 Nobel Memorial Prize in Economic Sciences, the first time the prize was awarded for work in finance.
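The mean-variance trade-off described above can be sketched in a few lines of code. All the numbers below (returns, variances, covariance, risk budget) are invented purely for illustration:

```python
# Illustrative two-asset mean-variance choice (all numbers invented):
# for each weight w in asset A (the rest in B), compute the portfolio's
# mean return and variance, then keep the highest-mean portfolio whose
# variance stays within an acceptable level.

mu_a, mu_b = 0.08, 0.03    # expected (mean) returns
var_a, var_b = 0.04, 0.01  # return variances
cov_ab = 0.005             # covariance between the two assets
max_var = 0.02             # acceptable level of variance

best = None
for i in range(101):
    w = i / 100            # fraction of the portfolio in asset A
    mean = w * mu_a + (1 - w) * mu_b
    var = w**2 * var_a + (1 - w)**2 * var_b + 2 * w * (1 - w) * cov_ab
    if var <= max_var and (best is None or mean > best[1]):
        best = (w, mean, var)

w, mean, var = best
print(f"weight in A = {w:.2f}, mean = {mean:.4f}, variance = {var:.4f}")
```

Since the higher-return asset also carries more risk, the search settles on the largest weight in A whose portfolio variance stays under the budget.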
The portfolio-selection work of Markowitz and Sharpe introduced mathematics to the black art of investment
management. With time, the mathematics has become more sophisticated. Thanks to Robert Merton and Paul
Samuelson, one-period models were replaced by continuous-time, Brownian-motion models, and the quadratic utility function implicit in mean–variance optimization was replaced by more general increasing, concave utility functions.[1]
The next major revolution in mathematical finance came with the work of Fischer Black and Myron Scholes along
with fundamental contributions by Robert C. Merton, by modeling financial markets with stochastic models. For this
M. Scholes and R. Merton were awarded the 1997 Nobel Memorial Prize in Economic Sciences. Black was
ineligible for the prize because of his death in 1995.
More sophisticated mathematical models and derivative pricing strategies were then developed but their credibility
was damaged by the financial crisis of 2007–2010. Bodies such as the Institute for New Economic Thinking are now attempting to establish more effective theories and methods.[2]
Mathematical finance articles
Mathematical tools
Asymptotic analysis
Calculus
Copulas
Differential equations
Expected value
Ergodic theory
Feynman–Kac formula
Fourier transform
Gaussian copulas
Girsanov's theorem
Itô's lemma
Martingale representation theorem
Mathematical models
Monte Carlo method
Numerical analysis
Real analysis
Partial differential equations
Probability
Probability distributions
Binomial distribution
Log-normal distribution
Quantile functions
Heat equation
Radon–Nikodym derivative
Risk-neutral measure
Stochastic calculus
Brownian motion
Lévy process
Stochastic differential equations
Stochastic volatility
Numerical partial differential equations
Crank–Nicolson method
Finite difference method
Value at risk
Volatility
ARCH model
GARCH model
Derivatives pricing
The Brownian Motion Model of Financial Markets
Rational pricing assumptions
Risk neutral valuation
Arbitrage-free pricing
Futures contract pricing
Options
Put–call parity (Arbitrage relationships for options)
Intrinsic value, Time value
Moneyness
Pricing models
Black–Scholes model
Black model
Binomial options model
Monte Carlo option model
Implied volatility, Volatility smile
SABR Volatility Model
Markov Switching Multifractal
The Greeks
Finite difference methods for option pricing
Trinomial tree
Optimal stopping (Pricing of American options)
Interest rate derivatives
Short rate model
Hull–White model
Cox–Ingersoll–Ross model
Chen model
LIBOR Market Model
Heath–Jarrow–Morton framework
See also
Computational finance
Quantitative Behavioral Finance
Derivative (finance), list of derivatives topics
Modeling and analysis of financial markets
International Swaps and Derivatives Association
Fundamental financial concepts - topics
Model (economics)
List of finance topics
List of economics topics, List of economists
List of accounting topics
Statistical Finance
Brownian model of financial markets
Master of Mathematical Finance
Notes
[1] Karatzas, I., Methods of Mathematical Finance, Secaucus, NJ, USA: Springer-Verlag New York, Incorporated, 1998
[2] Gillian Tett (April 15, 2010), Mathematicians must get out of their ivory towers (http://www.ft.com/cms/s/0/cfb9c43a-48b7-11df-8af4-00144feab49a.html), Financial Times.
References
Harry Markowitz, Portfolio Selection, Journal of Finance, 7, 1952, pp. 77–91
William Sharpe, Investments, Prentice-Hall, 1985
Asymptotic analysis
In mathematical analysis, asymptotic analysis is a method of describing limiting behavior. The methodology has
applications across science. Examples include: in computer science, the analysis of algorithms, considering the performance of algorithms when applied to very large input datasets; and the behavior of physical systems when they are very large.
The simplest example is the following: when considering a function f(n), there is a need to describe its properties when n becomes very large. Thus, if f(n) = n² + 3n, the term 3n becomes insignificant compared to n² when n is very large. The function f(n) is said to be "asymptotically equivalent to n² as n → ∞", and this is written symbolically as f(n) ~ n².
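The defining behaviour is easy to check numerically; a quick illustrative sketch for the example above:

```python
# Check numerically that f(n) = n**2 + 3*n is asymptotically equivalent
# to g(n) = n**2: the ratio f(n)/g(n) tends to 1 as n grows.

def f(n):
    return n**2 + 3*n

def g(n):
    return n**2

for n in (10, 1_000, 100_000):
    print(n, f(n) / g(n))  # ratios: 1.3, 1.003, 1.00003
```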
Definition
Formally, given complex-valued functions f and g of a natural number variable n, one writes

f ~ g (as n → ∞)

to express the fact that

lim_{n→∞} f(n)/g(n) = 1,

and f and g are called asymptotically equivalent as n → ∞. This defines an equivalence relation on the set of functions being nonzero for all n large enough. Alternatively, a more general definition is that

f ~ g if and only if f = (1 + o(1))·g,

using little o notation, which defines an equivalence relation on all functions. In each case, the equivalence class of f informally consists of all functions g which "behave like" f, in the limit. Here, o(1) stands for some function of n whose value tends to 0 as n → ∞; in general o(h(n)) stands for some function k(n) such that k(n)/h(n) tends to 0 as n → ∞.
Big O notation (also known as Landau notation or asymptotic notation) has been developed to provide a convenient
language for the handling of statements about order of growth and is now ubiquitous in the analysis of algorithms.
The asymptotic point of view is basic in computer science, where the question is typically how to describe the
resource implication of scaling-up the size of a computational problem.
Asymptotic expansion
An asymptotic expansion of a function f(x) is in practice an expression of that function in terms of a series, the
partial sums of which do not necessarily converge, but such that taking any initial partial sum provides an asymptotic
formula for f. The idea is that successive terms provide a more and more accurate description of the order of growth
of f. An example is Stirling's approximation.
In symbols, it means we have

f ~ g₁

but also

f − g₁ ~ g₂

and

f − g₁ − ⋯ − g_{k−1} ~ g_k

for each fixed k, while some limit is taken, usually with the requirement that g_{k+1} = o(g_k), which means the (g_k) form an asymptotic scale. The requirement that the successive sums improve the approximation may then be expressed as

f − (g₁ + ⋯ + g_k) = o(g_k).
In case the asymptotic expansion does not converge, for any particular value of the argument there will be a
particular partial sum which provides the best approximation and adding additional terms will decrease the accuracy.
However, this optimal partial sum will usually have more terms as the argument approaches the limit value.
Asymptotic expansions typically arise in the approximation of certain integrals (Laplace's method, saddle-point
method, method of steepest descent) or in the approximation of probability distributions (Edgeworth series). The
famous Feynman graphs in quantum field theory are another example of asymptotic expansions which often do not
converge.
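Stirling's series illustrates this optimal-truncation behaviour concretely. The sketch below hard-codes the first six correction coefficients B₂ₙ/(2n(2n−1)) of the expansion of ln Γ(x), and uses x = 1 (chosen for illustration, since divergence sets in after only a few terms there):

```python
import math

# Stirling's asymptotic series for ln Gamma(x).  The series does not
# converge for any fixed x, but truncated early it approximates well.
# Coefficients are B_{2n} / (2n(2n - 1)) for n = 1..6.
coeffs = [1/12, -1/360, 1/1260, -1/1680, 1/1188, -691/360360]

def stirling(x, k):
    """Partial sum of the expansion with k correction terms."""
    s = (x - 0.5) * math.log(x) - x + 0.5 * math.log(2 * math.pi)
    for n, c in enumerate(coeffs[:k], start=1):
        s += c / x ** (2 * n - 1)
    return s

x = 1.0
errors = [abs(stirling(x, k) - math.lgamma(x)) for k in range(len(coeffs) + 1)]
print(errors)
# The error shrinks for the first few terms, reaches a minimum, and then
# grows again: the hallmark of a divergent asymptotic expansion.
```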
Use in applied mathematics
Asymptotic analysis is a key tool for exploring the ordinary and partial differential equations which arise in the
mathematical modelling of real-world phenomena.[1] An illustrative example is the derivation of the boundary layer equations from the full Navier-Stokes equations governing fluid flow. In many cases, the asymptotic expansion is in powers of a small parameter ε: in the boundary layer case, this is the nondimensional ratio of the boundary layer thickness to a typical lengthscale of the problem. Indeed, applications of asymptotic analysis in mathematical modelling often[1] centre around a nondimensional parameter which has been shown, or assumed, to be small through a consideration of the scales of the problem at hand.
Method of dominant balance
The method of dominant balance is used to determine the asymptotic behavior of solutions to an ODE without
solving it. The process is iterative in that the result obtained by performing the method once can be used as input
when the method is repeated, to obtain as many terms in the asymptotic expansion as desired.
The process is as follows:
1. Assume that the asymptotic behavior has the form y(x) ~ e^{S(x)}.
2. Make a clever guess as to which terms in the ODE may be negligible in the limit we are interested in.
3. Drop those terms and solve the resulting ODE.
4. Check that the solution is consistent with step 2. If this is the case, then we have the controlling factor of the
asymptotic behavior. Otherwise, we need to try dropping different terms in step 2.
5. Repeat the process using our result as the first term of the solution.
Example
Consider this second order ODE:

x y″ + (c − x) y′ − a y = 0,

where c and a are arbitrary constants.

This differential equation cannot be solved exactly. However, it may be useful to know how the solutions behave for large x.

We start by assuming S″ ≪ (S′)² as x → ∞. We do this with the benefit of hindsight, to make things quicker. Since we only care about the behavior of y in the large x limit, we set y equal to e^{S(x)}, and re-express the ODE in terms of S(x):

x (S″ + (S′)²) + (c − x) S′ − a = 0, or

S″ + (S′)² + (c/x − 1) S′ − a/x = 0,

where we have used the product rule and chain rule to find the derivatives of y.

Now let us suppose that a solution to this new ODE satisfies

S″ ≪ (S′)² as x → ∞,

(c/x) S′, a/x ≪ (S′)² as x → ∞.

We get the dominant asymptotic behaviour by setting

(S′)² = S′, that is, S₀′ = 1 and S₀ = x.

If S₀ = x satisfies the above asymptotic conditions, then everything is consistent. The terms we dropped will indeed have been negligible with respect to the ones we kept. S₀ = x is not a solution to the ODE for S, but it represents the dominant asymptotic behaviour, which is what we are interested in. Let us check that this choice for S₀ is consistent:

S₀′ = 1, (S₀′)² = 1, S₀″ = 0 ≪ 1, (c/x) S₀′ = c/x → 0, a/x → 0.

Everything is indeed consistent. Thus we find the dominant asymptotic behaviour of a solution to our ODE:

y ~ e^x.

By convention, the asymptotic series is written with any power of x out the front made explicit, so to get at least the first term of this series we have to do another step to see if there is a power of x out the front. We proceed by making an ansatz that we can write

S(x) = x + C(x),

and then attempt to find asymptotic solutions for C(x). Substituting into the ODE for S(x) we find

C″ + C′ + (C′)² + (c/x) C′ + (c − a)/x = 0.

Repeating the same process as before, we keep C′ and (c − a)/x and find that

C′ = (a − c)/x, so C = (a − c) ln x.

The leading asymptotic behaviour is therefore

y ~ x^{a−c} e^x.
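As a sanity check (not part of the original derivation), one can verify numerically that y = x^{a−c} e^x satisfies the ODE to leading order, with a relative residual that decays like 1/x. The values of a and c below are arbitrary sample choices:

```python
import math

# Numerical check that y(x) = x**(a - c) * exp(x) satisfies
# x*y'' + (c - x)*y' - a*y = 0 to leading order for large x.
# a and c are arbitrary sample values.
a, c = 2.0, 0.5

def y(x):
    return x ** (a - c) * math.exp(x)

def residual_over_y(x, h=1e-4):
    # second-order central finite differences for y' and y''
    y0, yp, ym = y(x), y(x + h), y(x - h)
    d1 = (yp - ym) / (2 * h)
    d2 = (yp - 2 * y0 + ym) / h ** 2
    return (x * d2 + (c - x) * d1 - a * y0) / y0

for x in (5, 20, 50):
    print(x, residual_over_y(x))
# The relative residual decays roughly like 1/x as x grows.
```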
See also
Asymptotic computational complexity
Asymptotic theory
References
[1] S. Howison, Practical Applied Mathematics, Cambridge University Press, Cambridge, 2005. ISBN 0-521-60369-2
J. P. Boyd, "The Devil's Invention: asymptotic, superasymptotic and hyperasymptotic series", Acta Applicandae Mathematicae, 56: 1-98 (1999). Preprint (http://www-personal.umich.edu/~jpboyd/boydactaapplicreview.pdf).
A. Erdélyi, Asymptotic Expansions. New York: Dover, 1987.
Calculus
Calculus (Latin, calculus, a small stone used for counting) is a branch of mathematics focused on limits, functions,
derivatives, integrals, and infinite series. This subject constitutes a major part of modern mathematics education. It
has two major branches, differential calculus and integral calculus, which are related by the fundamental theorem
of calculus. Calculus is the study of change,[1] in the same way that geometry is the study of shape and algebra is the study of operations and their application to solving equations. A course in calculus is a gateway to other, more advanced courses in mathematics devoted to the study of functions and limits, broadly called mathematical analysis.
Calculus has widespread applications in science, economics, and engineering and can solve many problems for
which algebra alone is insufficient.
Historically, calculus was called "the calculus of infinitesimals", or "infinitesimal calculus". More generally, calculus
(plural calculi) may refer to any method or system of calculation guided by the symbolic manipulation of
expressions. Some examples of other well-known calculi are propositional calculus, variational calculus, lambda
calculus, pi calculus, and join calculus.
History
Ancient
Isaac Newton is one of the most famous contributors to the development of calculus, with, among other things, the use of calculus in his laws of motion and gravitation.
The ancient period introduced some of the ideas that led to integral calculus, but does not seem to have developed these ideas in a rigorous or systematic way. Calculations of volumes and areas, one goal of integral calculus, can be found in the Egyptian Moscow papyrus (c. 1820 BC), but the formulas are mere instructions, with no indication as to method, and some of them are wrong. Some, including Morris Kline in Mathematical thought from ancient to modern times, Vol. I, suggest trial and error.[2]

From the age of Greek mathematics, Eudoxus (c. 408–355 BC) used the method of exhaustion, which prefigures the concept of the limit, to calculate areas and volumes, while Archimedes (c. 287–212 BC) developed this idea further, inventing heuristics which resemble the methods of integral calculus.[3] The method of exhaustion was later reinvented in China by Liu Hui in the 3rd century AD in order to find the area of a circle.[4] In the 5th century AD, Zu Chongzhi established a method which would later be called Cavalieri's principle to find the volume of a sphere.[5]
Medieval
Around AD 1000, the mathematician Ibn al-Haytham (Alhacen) was the first to derive the formula for the sum of the fourth powers of an arithmetic progression, using a method that is readily generalizable to finding the formula for the sum of any higher integer powers.[6] In the 11th century, the Chinese polymath Shen Kuo developed 'packing' equations that prefigure integration. In the 12th century, the Indian mathematician Bhāskara II developed an early method using infinitesimal change, a precursor of the derivative, and he stated a form of Rolle's theorem.[7] Also in the 12th century, the Persian mathematician Sharaf al-Dīn al-Ṭūsī used a method similar to taking the derivative of cubic polynomials.[8] In the 14th century, Indian mathematician Madhava of Sangamagrama, along with other mathematician-astronomers of the Kerala school of astronomy and mathematics, described special cases of Taylor series,[9] which are treated in the text Yuktibhasa.[10][11][12]
Modern
In Europe, the foundational work was a treatise due to Bonaventura Cavalieri, who argued that volumes and areas
should be computed as the sums of the volumes and areas of infinitesimal thin cross-sections. The ideas were similar
to Archimedes' in The Method, but this treatise was lost until the early part of the twentieth century. Cavalieri's work
was not well respected since his methods can lead to erroneous results, and the infinitesimal quantities he introduced
were disreputable at first.
The formal study of calculus combined Cavalieri's infinitesimals with the calculus of finite differences developed in
Europe at around the same time. The combination was achieved by John Wallis, Isaac Barrow, and James Gregory,
the latter two proving the second fundamental theorem of calculus around 1675.
The product rule and chain rule, the notion of higher derivatives, Taylor series, and analytical functions were
introduced by Isaac Newton in an idiosyncratic notation which he used to solve problems of mathematical physics.
In his publications, Newton rephrased his ideas to suit the mathematical idiom of the time, replacing calculations
with infinitesimals by equivalent geometrical arguments which were considered beyond reproach. He used the
methods of calculus to solve the problem of planetary motion, the shape of the surface of a rotating fluid, the
oblateness of the earth, the motion of a weight sliding on a cycloid, and many other problems discussed in his
Principia Mathematica. In other work, he developed series expansions for functions, including fractional and
irrational powers, and it was clear that he understood the principles of the Taylor series. He did not publish all these
discoveries, and at this time infinitesimal methods were still considered disreputable.
Gottfried Wilhelm Leibniz was originally accused of plagiarizing Sir Isaac Newton's unpublished work (only in Britain, not in continental Europe), but is now regarded as an independent inventor of and contributor to calculus.
These ideas were systematized into a true calculus of infinitesimals by Gottfried Wilhelm Leibniz, who was originally accused of plagiarism by Newton.[13] He is now regarded as an independent inventor of and contributor to calculus. His contribution was to provide a clear set of rules for manipulating infinitesimal quantities, allowing the computation of second and higher derivatives, and providing the product rule and chain rule, in their differential and integral forms. Unlike Newton, Leibniz paid a lot of attention to the formalism; he often spent days determining appropriate symbols for concepts.
Leibniz and Newton are usually both credited with the invention
of calculus. Newton was the first to apply calculus to general
physics and Leibniz developed much of the notation used in
calculus today. The basic insights that both Newton and Leibniz
provided were the laws of differentiation and integration, second
and higher derivatives, and the notion of an approximating
polynomial series. By Newton's time, the fundamental theorem of
calculus was known.
When Newton and Leibniz first published their results, there was
great controversy over which mathematician (and therefore which
country) deserved credit. Newton derived his results first, but Leibniz published first. Newton claimed Leibniz stole
ideas from his unpublished notes, which Newton had shared with a few members of the Royal Society. This
controversy divided English-speaking mathematicians from continental mathematicians for many years, to the
detriment of English mathematics. A careful examination of the papers of Leibniz and Newton shows that they
arrived at their results independently, with Leibniz starting first with integration and Newton with differentiation.
Today, both Newton and Leibniz are given credit for developing calculus independently. It is Leibniz, however, who
gave the new discipline its name. Newton called his calculus "the science of fluxions".
Since the time of Leibniz and Newton, many mathematicians have contributed to the continuing development of
calculus. In the 19th century, calculus was put on a much more rigorous footing by mathematicians such as Cauchy,
Riemann, and Weierstrass (see the (ε, δ)-definition of limit). It was also during this period that the ideas of calculus were
generalized to Euclidean space and the complex plane. Lebesgue generalized the notion of the integral so that
virtually any function has an integral, while Laurent Schwartz extended differentiation in much the same way.
Calculus is a ubiquitous topic in most modern high schools and universities around the world.[14]
Significance
While some of the ideas of calculus were developed earlier in Egypt, Greece, China, India, Iraq, Persia, and Japan,
the modern use of calculus began in Europe, during the 17th century, when Isaac Newton and Gottfried Wilhelm
Leibniz built on the work of earlier mathematicians to introduce its basic principles. The development of calculus
was built on earlier concepts of instantaneous motion and area underneath curves.
Applications of differential calculus include computations involving velocity and acceleration, the slope of a curve,
and optimization. Applications of integral calculus include computations involving area, volume, arc length, center
of mass, work, and pressure. More advanced applications include power series and Fourier series. Calculus can be
used to compute the trajectory of a shuttle docking at a space station or the amount of snow in a driveway.
Calculus is also used to gain a more precise understanding of the nature of space, time, and motion. For centuries,
mathematicians and philosophers wrestled with paradoxes involving division by zero or sums of infinitely many
numbers. These questions arise in the study of motion and area. The ancient Greek philosopher Zeno gave several
famous examples of such paradoxes. Calculus provides tools, especially the limit and the infinite series, which
resolve the paradoxes.
Foundations
In mathematics, foundations refers to the rigorous development of a subject from precise axioms and definitions.
Working out a rigorous foundation for calculus occupied mathematicians for much of the century following Newton
and Leibniz and is still to some extent an active area of research today.
There is more than one rigorous approach to the foundation of calculus. The usual one today is via the concept of
limits defined on the continuum of real numbers. An alternative is nonstandard analysis, in which the real number
system is augmented with infinitesimal and infinite numbers, as in the original Newton-Leibniz conception. The
foundations of calculus are included in the field of real analysis, which contains full definitions and proofs of the
theorems of calculus as well as generalizations such as measure theory and distribution theory.
Principles
Limits and infinitesimals
Calculus is usually developed by manipulating very small quantities. Historically, the first method of doing so was
by infinitesimals. These are objects which can be treated like numbers but which are, in some sense, "infinitely
small". An infinitesimal number dx could be greater than 0, but less than any number in the sequence 1, 1/2, 1/3, ...
and less than any positive real number. Any integer multiple of an infinitesimal is still infinitely small, i.e.,
infinitesimals do not satisfy the Archimedean property. From this point of view, calculus is a collection of techniques
for manipulating infinitesimals. This approach fell out of favor in the 19th century because it was difficult to make
the notion of an infinitesimal precise. However, the concept was revived in the 20th century with the introduction of
non-standard analysis and smooth infinitesimal analysis, which provided solid foundations for the manipulation of
infinitesimals.
In the 19th century, infinitesimals were replaced by limits. Limits describe the value of a function at a certain input
in terms of its values at nearby input. They capture small-scale behavior, just like infinitesimals, but use the ordinary
real number system. In this treatment, calculus is a collection of techniques for manipulating certain limits.
Infinitesimals get replaced by very small numbers, and the infinitely small behavior of the function is found by
taking the limiting behavior for smaller and smaller numbers. Limits are the easiest way to provide rigorous
foundations for calculus, and for this reason they are the standard approach.
Differential calculus
Tangent line at (x, f(x)). The derivative f′(x) of a curve at a point is the slope (rise over run) of the line tangent to that curve at that point.
Differential calculus is the study of the
definition, properties, and applications of
the derivative of a function. The process of
finding the derivative is called
differentiation. Given a function and a point
in the domain, the derivative at that point is
a way of encoding the small-scale behavior
of the function near that point. By finding
the derivative of a function at every point in
its domain, it is possible to produce a new
function, called the derivative function or
just the derivative of the original function.
In mathematical jargon, the derivative is a
linear operator which inputs a function and
outputs a second function. This is more
abstract than many of the processes studied
in elementary algebra, where functions usually input a number and output another number. For example, if the
doubling function is given the input three, then it outputs six, and if the squaring function is given the input three,
then it outputs nine. The derivative, however, can take the squaring function as an input. This means that the derivative takes all the information of the squaring function (such as that two is sent to four, three is sent to nine, four is sent to sixteen, and so on) and uses this information to produce another function. (The function it produces turns out to be the doubling function.)
The most common symbol for a derivative is an apostrophe-like mark called prime. Thus, the derivative of the function f is f′, pronounced "f prime." For instance, if f(x) = x² is the squaring function, then f′(x) = 2x is its derivative, the doubling function.
If the input of the function represents time, then the derivative represents change with respect to time. For example,
if f is a function that takes a time as input and gives the position of a ball at that time as output, then the derivative of
f is how the position is changing in time, that is, it is the velocity of the ball.
If a function is linear (that is, if the graph of the function is a straight line), then the function can be written y = mx + b, where:

m = slope = (change in y)/(change in x) = Δy/Δx.
This gives an exact value for the slope of a straight line. If the graph of the function is not a straight line, however,
then the change in y divided by the change in x varies. Derivatives give an exact meaning to the notion of change in
output with respect to change in input. To be concrete, let f be a function, and fix a point a in the domain of f. (a,
f(a)) is a point on the graph of the function. If h is a number close to zero, then a + h is a number close to a.
Therefore (a + h, f(a + h)) is close to (a, f(a)). The slope between these two points is

m = (f(a + h) − f(a)) / ((a + h) − a) = (f(a + h) − f(a)) / h.
This expression is called a difference quotient. A line through two points on a curve is called a secant line, so m is
the slope of the secant line between (a, f(a)) and (a + h, f(a + h)). The secant line is only an approximation to the
behavior of the function at the point a because it does not account for what happens between a and a + h. It is not
possible to discover the behavior at a by setting h to zero because this would require dividing by zero, which is
impossible. The derivative is defined by taking the limit as h tends to zero, meaning that it considers the behavior of f
for all small values of h and extracts a consistent value for the case when h equals zero:

f′(a) = lim_{h→0} (f(a + h) − f(a)) / h.
Geometrically, the derivative is the slope of the tangent line to the graph of f at a. The tangent line is a limit of secant
lines just as the derivative is a limit of difference quotients. For this reason, the derivative is sometimes called the
slope of the function f.
Here is a particular example, the derivative of the squaring function at the input 3. Let f(x) = x² be the squaring function. Then

f′(3) = lim_{h→0} ((3 + h)² − 3²)/h = lim_{h→0} (9 + 6h + h² − 9)/h = lim_{h→0} (6 + h) = 6.
The derivative f′(x) of a curve at a point is the slope of the line tangent to that curve at that point. This slope is determined by considering the limiting value of the slopes of secant lines. Here the function involved (drawn in red) is f(x) = x³ − x. The tangent line (in green) which passes through the point (−3/2, −15/8) has a slope of 23/4. Note that the vertical and horizontal scales in this image are different.
The slope of the tangent line to the squaring function at the point (3, 9) is 6, that is to say, it is going up six times as fast
as it is going to the right. The limit process just described can be performed for any point in the domain of the
squaring function. This defines the derivative function of the squaring function, or just the derivative of the squaring
function for short. A similar computation to the one above shows that the derivative of the squaring function is the
doubling function.
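The limiting process can also be watched numerically; a quick sketch for the squaring function at a = 3:

```python
# Difference quotients (f(a + h) - f(a)) / h for the squaring function
# f(x) = x**2 at a = 3 approach the derivative value 6 as h shrinks.

def f(x):
    return x * x

a = 3.0
for h in (1.0, 0.1, 0.001):
    print(h, (f(a + h) - f(a)) / h)  # 7.0, then about 6.1, then about 6.001
```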
Leibniz notation
A common notation, introduced by Leibniz, for the derivative in the example above is

y = x², dy/dx = 2x.

In an approach based on limits, the symbol dy/dx is to be interpreted not as the quotient of two numbers but as a shorthand for the limit computed above. Leibniz, however, did intend it to represent the quotient of two infinitesimally small numbers, dy being the infinitesimally small change in y caused by an infinitesimally small change dx applied to x. We can also think of d/dx as a differentiation operator, which takes a function as an input and gives another function, the derivative, as the output. For example:

d/dx (x²) = 2x.
In this usage, the dx in the denominator is read as "with respect to x". Even when calculus is developed using limits
rather than infinitesimals, it is common to manipulate symbols like dx and dy as if they were real numbers; although
it is possible to avoid such manipulations, they are sometimes notationally convenient in expressing operations such
as the total derivative.
Integral calculus
Integral calculus is the study of the definitions, properties, and applications of two related concepts, the indefinite
integral and the definite integral. The process of finding the value of an integral is called integration. In technical
language, integral calculus studies two related linear operators.
The indefinite integral is the antiderivative, the inverse operation to the derivative. F is an indefinite integral of f
when f is a derivative of F. (This use of upper- and lower-case letters for a function and its indefinite integral is
common in calculus.)
The definite integral inputs a function and outputs a number, which gives the area between the graph of the input
and the x-axis. The technical definition of the definite integral is the limit of a sum of areas of rectangles, called a
Riemann sum.
A motivating example is the distance traveled in a given time.
If the speed is constant, only multiplication is needed, but if the speed changes, then we need a more powerful
method of finding the distance. One such method is to approximate the distance traveled by breaking up the time into
many short intervals of time, then multiplying the time elapsed in each interval by one of the speeds in that interval,
and then taking the sum (a Riemann sum) of the approximate distance traveled in each interval. The basic idea is that
if only a short time elapses, then the speed will stay more or less the same. However, a Riemann sum only gives an
approximation of the distance traveled. We must take the limit of all such Riemann sums to find the exact distance
traveled.
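The procedure above can be sketched in a few lines of code (the speed function is an illustrative choice, not from the text): break the time interval into n pieces, multiply each piece's length by a sampled speed, and sum.

```python
# Approximating distance traveled from a varying speed v(t) by a Riemann sum.

def v(t):
    return 3 * t * t  # illustrative speed at time t; the exact distance over [0, 2] is 8

def riemann_distance(v, t0, t1, n):
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        # use the left endpoint of each subinterval as "one of the speeds in that interval"
        total += v(t0 + i * dt) * dt
    return total

for n in (10, 100, 10000):
    print(n, riemann_distance(v, 0.0, 2.0, n))
# As n grows, the sums approach the exact distance 8 -- the limit described above.
```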
Integration can be thought of as measuring the area under a curve, defined by
f(x), between two points (here a and b).
If f(x) in the diagram on the left represents speed as it varies over time, the distance traveled (between the times
represented by a and b) is the area of the shaded region s.
To approximate that area, an intuitive method would be to divide up the distance between a and b into a number of
equal segments, the length of each segment represented by the symbol Δx. For each small segment, we can choose
one value of the function f(x). Call that value h. Then the area of the rectangle with base Δx and height h gives the
distance (time Δx multiplied by speed h) traveled in that segment. Associated with each segment is the average value
of the function above it, f(x) = h. The sum of all such rectangles gives an approximation of the area between the axis
and the curve, which is an approximation of the total distance traveled.
A smaller value for Δx will give more rectangles and in most cases a better approximation, but for an exact answer
we need to take a limit as Δx approaches zero.
The symbol of integration is ∫, an elongated S (the S stands for "sum"). The definite integral is written as:

  ∫_a^b f(x) dx

and is read "the integral from a to b of f-of-x with respect to x." The Leibniz notation dx is intended to suggest
dividing the area under the curve into an infinite number of rectangles, so that their width Δx becomes the
infinitesimally small dx. In a formulation of the calculus based on limits, the notation

  ∫_a^b … dx

is to be understood as an operator that takes a function as an input and gives a number, the area, as an output; dx is
not a number, and is not being multiplied by f(x).
The indefinite integral, or antiderivative, is written:

  ∫ f(x) dx.

Functions differing by only a constant have the same derivative, and therefore the antiderivative of a given function
is actually a family of functions differing only by a constant. Since the derivative of the function y = x^2 + C, where C
is any constant, is y′ = 2x, the antiderivative of the latter is given by:

  ∫ 2x dx = x^2 + C.

An undetermined constant like C in the antiderivative is known as a constant of integration.
Fundamental theorem
The fundamental theorem of calculus states that differentiation and integration are inverse operations. More
precisely, it relates the values of antiderivatives to definite integrals. Because it is usually easier to compute an
antiderivative than to apply the definition of a definite integral, the Fundamental Theorem of Calculus provides a
practical way of computing definite integrals. It can also be interpreted as a precise statement of the fact that
differentiation is the inverse of integration.
The Fundamental Theorem of Calculus states: If a function f is continuous on the interval [a, b] and if F is a function
whose derivative is f on the interval (a, b), then

  ∫_a^b f(x) dx = F(b) − F(a).

Furthermore, for every x in the interval (a, b),

  d/dx ∫_a^x f(t) dt = f(x).

This realization, made by both Newton and Leibniz, who based their results on earlier work by Isaac Barrow, was
key to the massive proliferation of analytic results after their work became known. The fundamental theorem
provides an algebraic method of computing many definite integrals, without performing limit processes, by
finding formulas for antiderivatives. It is also a prototype solution of a differential equation. Differential equations
relate an unknown function to its derivatives, and are ubiquitous in the sciences.
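The theorem can be checked numerically in a short sketch (the functions f and F are illustrative choices): the limit of Riemann sums over [a, b] agrees with the difference of antiderivative values F(b) − F(a).

```python
# Fundamental theorem check: for f(x) = 2x with antiderivative F(x) = x**2,
# a Riemann sum over [a, b] matches F(b) - F(a).

def f(x):
    return 2 * x

def F(x):
    return x * x

def riemann_sum(f, a, b, n):
    dx = (b - a) / n
    # midpoint rule: sample f at the center of each subinterval
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(n))

a, b = 1.0, 4.0
approx = riemann_sum(f, a, b, 1000)
exact = F(b) - F(a)   # 16 - 1 = 15
print(approx, exact)
```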
Applications
The logarithmic spiral of the Nautilus shell is a
classical image used to depict the growth and
change related to calculus
Calculus is used in every branch of the physical sciences, actuarial
science, computer science, statistics, engineering, economics, business,
medicine, demography, and in other fields wherever a problem can be
mathematically modeled and an optimal solution is desired. It allows
one to go from (non-constant) rates of change to the total change or
vice versa, and many times in studying a problem we know one and are
trying to find the other.
Physics makes particular use of calculus; all concepts in classical
mechanics and electromagnetism are interrelated through calculus. The
mass of an object of known density, the moment of inertia of objects,
as well as the total energy of an object within a conservative field can
be found by the use of calculus. An example of the use of calculus in
mechanics is Newton's second law of motion: historically stated, it expressly uses the term "rate of change", which
refers to the derivative, saying "The rate of change of momentum of a body is equal to the resultant force acting on
the body and is in the same direction." Commonly expressed today as Force = Mass × acceleration, it involves
differential calculus because acceleration is the time derivative of velocity, or the second time derivative of trajectory or
spatial position. Starting from knowing how an object is accelerating, we use calculus to derive its path.
Maxwell's theory of electromagnetism and Einstein's theory of general relativity are also expressed in the language
of differential calculus. Chemistry also uses calculus in determining reaction rates and radioactive decay. In biology,
population dynamics starts with reproduction and death rates to model population changes.
Calculus can be used in conjunction with other mathematical disciplines. For example, it can be used with linear
algebra to find the "best fit" linear approximation for a set of points in a domain. Or it can be used in probability
theory to determine the probability of a continuous random variable from an assumed density function. In analytic
geometry, the study of graphs of functions, calculus is used to find high points and low points (maxima and minima),
slope, concavity and inflection points.
Green's Theorem, which gives the relationship between a line integral around a simple closed curve C and a double
integral over the plane region D bounded by C, is applied in an instrument known as a planimeter which is used to
calculate the area of a flat surface on a drawing. For example, it can be used to calculate the amount of area taken up
by an irregularly shaped flower bed or swimming pool when designing the layout of a piece of property.
In the realm of medicine, calculus can be used to find the optimal branching angle of a blood vessel so as to
maximize flow. From the decay laws for a particular drug's elimination from the body, it's used to derive dosing
laws. In nuclear medicine, it's used to build models of radiation transport in targeted tumor therapies.
In economics, calculus allows for the determination of maximal profit by providing a way to easily calculate both
marginal cost and marginal revenue.
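As a minimal sketch of that idea (the revenue and cost functions are hypothetical, chosen for illustration): profit is maximized at the quantity where marginal revenue equals marginal cost.

```python
# With hypothetical revenue R(q) = 20q - 0.5*q**2 and cost C(q) = 4q + 2,
# the margins are R'(q) = 20 - q and C'(q) = 4, so MR = MC at q = 16.

def marginal_revenue(q):
    return 20.0 - q     # derivative of R(q) = 20q - 0.5 q^2

def marginal_cost(q):
    return 4.0          # derivative of C(q) = 4q + 2

def profit(q):
    return (20 * q - 0.5 * q * q) - (4 * q + 2)

# Scan integer quantities and confirm the peak sits where MR = MC:
best_q = max(range(0, 41), key=profit)
print(best_q, profit(best_q))  # q = 16
```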
Calculus is also used to find approximate solutions to equations; in practice it's the standard way to solve differential
equations and do root finding in most applications. Examples are methods such as Newton's method, fixed point
iteration, and linear approximation. For instance, spacecraft use a variation of the Euler method to approximate
curved courses within zero gravity environments.
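Newton's method, mentioned above, can be sketched in a few lines: each step replaces the current guess x by x − g(x)/g′(x), following the tangent line down to the axis. Here it finds √2 as a root of x² − 2 (an illustrative equation).

```python
# Newton's method: iterate x <- x - g(x)/g'(x) until the step is negligible.

def newton(g, dg, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)
        x -= step
        if abs(step) < tol:
            break
    return x

root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)  # close to 1.41421356...
```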
See also
Lists
List of differentiation identities
List of calculus topics
Publications in calculus
Table of integrals
Related topics
Calculus of finite differences
Calculus with polynomials
Complex analysis
Differential equation
Differential geometry
Elementary calculus
Fourier series
Integral equation
Mathematical analysis
Mathematics
Multivariable calculus
Non-classical analysis
Non-standard analysis
Non-standard calculus
Precalculus (mathematical education)
Product integral
Stochastic calculus
Taylor series
References
Notes
[1] Latorre, Donald R.; Kenelly, John W.; Reed, Iris B.; Biggers, Sherry (2007), Calculus Concepts: An Applied Approach to the Mathematics of
Change (http:/ / books. google.com/ books?id=bQhX-3k0LS8C), Cengage Learning, p.2, ISBN0-618-78981-2, , Chapter 1, p 2 (http:/ /
books.google. com/ books?id=bQhX-3k0LS8C& pg=PA2)
[2] Morris Kline, Mathematical thought from ancient to modern times, Vol. I
[3] Archimedes, Method, in The Works of Archimedes ISBN 978-0-521-66160-7
[4] Dun, Liu; Fan, Dainian; Cohen, Robert Sonn (1966). A comparison of Archimedes' and Liu Hui's studies of circles (http:/ / books. google.
com/ books?id=jaQH6_8Ju-MC). Chinese studies in the history and philosophy of science and technology. 130. Springer. p. 279.
ISBN0-792-33463-9. ., Chapter , p. 279 (http:/ / books. google. com/ books?id=jaQH6_8Ju-MC& pg=PA279)
[5] Zill, Dennis G.; Wright, Scott; Wright, Warren S. (2009). Calculus: Early Transcendentals (http:/ / books. google. com/
books?id=R3Hk4Uhb1Z0C) (3 ed.). Jones & Bartlett Learning. p.xxvii. ISBN0-763-75995-3. ., Extract of page 27 (http:/ / books. google.
com/ books?id=R3Hk4Uhb1Z0C& pg=PR27)
[6] Victor J. Katz (1995). "Ideas of Calculus in Islam and India", Mathematics Magazine 68 (3), pp. 163-174.
[7] Ian G. Pearce. Bhaskaracharya II. (http:/ / turnbull. mcs. st-and. ac. uk/ ~history/ Projects/ Pearce/ Chapters/ Ch8_5. html)
[8] J. L. Berggren (1990). "Innovation and Tradition in Sharaf al-Din al-Tusi's Muadalat", Journal of the American Oriental Society 110 (2), pp.
304-309.
[9] "Madhava" (http:/ / www-gap. dcs. st-and.ac.uk/ ~history/ Biographies/ Madhava. html). Biography of Madhava. School of Mathematics
and Statistics University of St Andrews, Scotland. . Retrieved 2006-09-13.
[10] "An overview of Indian mathematics" (http:/ / www-history. mcs. st-andrews. ac. uk/ HistTopics/ Indian_mathematics. html). Indian Maths.
School of Mathematics and Statistics University of St Andrews, Scotland. . Retrieved 2006-07-07.
[11] "Science and technology in free India" (http:/ / www.kerala. gov. in/ keralcallsep04/ p22-24. pdf) (PDF). Government of Kerala Kerala
Call, September 2004. Prof.C.G.Ramachandran Nair. . Retrieved 2006-07-09.
[12] Charles Whish (1834), "On the Hindu Quadrature of the circle and the infinite series of the proportion of the circumference to the diameter
exhibited in the four Sastras, the Tantra Sahgraham, Yucti Bhasha, Carana Padhati and Sadratnamala", Transactions of the Royal Asiatic
Society of Great Britain and Ireland (Royal Asiatic Society of Great Britain and Ireland) 3 (3): 509523, doi:10.1017/S0950473700001221,
JSTOR25581775
[13] Leibniz, Gottfried Wilhelm. The Early Mathematical Manuscripts of Leibniz. Cosimo, Inc., 2008. Page 228. Copy (http:/ / books. google.
com/ books?hl=en& lr=& id=7d8_4WPc9SMC& oi=fnd& pg=PA3& dq=Gottfried+ Wilhelm+ Leibniz+ accused+ of+ plagiarism+ by+
Newton& ots=09h9BdTlbE& sig=hu5tNKpBJxHcpj8U3kR_T2bZqrY#v=onepage& q=plagairism& f=false|Online)
[14] UNESCO-World Data on Education (http:/ / nt5.scbbs. com/ cgi-bin/ om_isapi. dll?clientID=137079235& infobase=iwde. nfo&
softpage=PL_frame)
Books
Larson, Ron, Bruce H. Edwards (2010). "Calculus", 9th ed., Brooks Cole Cengage Learning. ISBN
9780547167022
McQuarrie, Donald A. (2003). Mathematical Methods for Scientists and Engineers, University Science Books.
ISBN 9781891389245
Stewart, James (2008). Calculus: Early Transcendentals, 6th ed., Brooks Cole Cengage Learning. ISBN
9780495011668
Thomas, George B., Maurice D. Weir, Joel Hass, Frank R. Giordano (2008), "Calculus", 11th ed.,
Addison-Wesley. ISBN 0-321-48987-X
Other resources
Further reading
Courant, Richard ISBN 978-3540650584 Introduction to calculus and analysis 1.
Edmund Landau. ISBN 0-8218-2830-4 Differential and Integral Calculus, American Mathematical Society.
Robert A. Adams. (1999). ISBN 978-0-201-39607-2 Calculus: A complete course.
Albers, Donald J.; Richard D. Anderson and Don O. Loftsgaarden, ed. (1986) Undergraduate Programs in the
Mathematics and Computer Sciences: The 1985-1986 Survey, Mathematical Association of America No. 7.
John Lane Bell: A Primer of Infinitesimal Analysis, Cambridge University Press, 1998. ISBN 978-0-521-62401-5.
Uses synthetic differential geometry and nilpotent infinitesimals.
Florian Cajori, "The History of Notations of the Calculus." Annals of Mathematics, 2nd Ser., Vol. 25, No. 1 (Sep.,
1923), pp. 1–46.
Leonid P. Lebedev and Michael J. Cloud: "Approximating Perfection: a Mathematician's Journey into the World
of Mechanics, Ch. 1: The Tools of Calculus", Princeton Univ. Press, 2004.
Cliff Pickover. (2003). ISBN 978-0-471-26987-8 Calculus and Pizza: A Math Cookbook for the Hungry Mind.
Michael Spivak. (September 1994). ISBN 978-0-914098-89-8 Calculus. Publish or Perish publishing.
Tom M. Apostol. (1967). ISBN 9780471000051 Calculus, Volume 1, One-Variable Calculus with an Introduction
to Linear Algebra. Wiley.
Tom M. Apostol. (1969). ISBN 9780471000075 Calculus, Volume 2, Multi-Variable Calculus and Linear
Algebra with Applications. Wiley.
Silvanus P. Thompson and Martin Gardner. (1998). ISBN 978-0-312-18548-0 Calculus Made Easy.
Mathematical Association of America. (1988). Calculus for a New Century; A Pump, Not a Filter, The
Association, Stony Brook, NY. ED 300 252.
Thomas/Finney. (1996). ISBN 978-0-201-53174-9 Calculus and Analytic geometry 9th, Addison Wesley.
Weisstein, Eric W. "Second Fundamental Theorem of Calculus." (http:/ / mathworld. wolfram. com/
SecondFundamentalTheoremofCalculus. html) From MathWorldA Wolfram Web Resource.
Online books
Crowell, B. (2003). "Calculus" Light and Matter, Fullerton. Retrieved 6 May 2007 from http:/ / www.
lightandmatter. com/ calc/ calc. pdf (http:/ / www. lightandmatter. com/ calc/ calc. pdf)
Garrett, P. (2006). "Notes on first year calculus" University of Minnesota. Retrieved 6 May 2007 from
http://www.math.umn.edu/~garrett/calculus/first_year/notes.pdf (http:/ / www. math. umn. edu/ ~garrett/ calculus/
first_year/ notes. pdf)
Faraz, H. (2006). "Understanding Calculus" Retrieved 6 May 2007 from Understanding Calculus, URL http:/ /
www. understandingcalculus. com/ (http:/ / www. understandingcalculus. com/ ) (HTML only)
Keisler, H. J. (2000). "Elementary Calculus: An Approach Using Infinitesimals" Retrieved 29 August 2010 from
http://www.math.wisc.edu/~keisler/calc.html (http:/ / www. math. wisc. edu/ ~keisler/ calc. html)
Mauch, S. (2004). "Sean's Applied Math Book" California Institute of Technology. Retrieved 6 May 2007 from
http://www.cacr.caltech.edu/~sean/applied_math.pdf (http:/ / www. cacr. caltech. edu/ ~sean/ applied_math. pdf)
Sloughter, Dan (2000). "Difference Equations to Differential Equations: An introduction to calculus". Retrieved
17 March 2009 from http:/ / synechism. org/ drupal/ de2de/ (http:/ / synechism. org/ drupal/ de2de/ )
Stroyan, K.D. (2004). "A brief introduction to infinitesimal calculus" University of Iowa. Retrieved 6 May 2007
from http:/ / www. math. uiowa. edu/ ~stroyan/ InfsmlCalculus/ InfsmlCalc. htm (http:/ / www. math. uiowa. edu/
~stroyan/ InfsmlCalculus/ InfsmlCalc. htm) (HTML only)
Strang, G. (1991). "Calculus" Massachusetts Institute of Technology. Retrieved 6 May 2007 from http:/ / ocw.
mit. edu/ ans7870/ resources/ Strang/ strangtext. htm (http:/ / ocw. mit. edu/ ans7870/ resources/ Strang/
strangtext. htm)
Smith, William V. (2001). "The Calculus" Retrieved 4 July 2008 (http:/ / www. math. byu. edu/ ~smithw/
Calculus/ ) (HTML only).
External links
Weisstein, Eric W., " Calculus (http:/ / mathworld. wolfram. com/ Calculus. html)" from MathWorld.
Topics on Calculus (http:/ / planetmath. org/ encyclopedia/ TopicsOnCalculus. html) at PlanetMath.
Calculus Made Easy (1914) by Silvanus P. Thompson (http:/ / djm. cc/ library/ Calculus_Made_Easy_Thompson.
pdf) Full text in PDF
Calculus (http:/ / www. bbc. co. uk/ programmes/ b00mrfwq) on In Our Time at the BBC. ( listen now (http:/ /
www. bbc. co. uk/ iplayer/ console/ b00mrfwq/ In_Our_Time_Calculus))
Calculus.org: The Calculus page (http:/ / www. calculus. org) at University of California, Davis contains
resources and links to other sites
COW: Calculus on the Web (http:/ / cow. math. temple. edu/ ) at Temple University contains resources ranging
from pre-calculus and associated algebra
Earliest Known Uses of Some of the Words of Mathematics: Calculus & Analysis (http:/ / www. economics.
soton. ac. uk/ staff/ aldrich/ Calculus and Analysis Earliest Uses. htm)
Online Integrator (WebMathematica) (http:/ / integrals. wolfram. com/ ) from Wolfram Research
The Role of Calculus in College Mathematics (http:/ / www. ericdigests. org/ pre-9217/ calculus. htm) from
ERICDigests.org
OpenCourseWare Calculus (http:/ / ocw. mit. edu/ OcwWeb/ Mathematics/ index. htm) from the Massachusetts
Institute of Technology
Infinitesimal Calculus (http:/ / eom. springer. de/ I/ i050950. htm) an article on its historical development, in
Encyclopaedia of Mathematics, Michiel Hazewinkel ed. .
Elements of Calculus I (http:/ / ocw. nd. edu/ mathematics/ elements-of-calculus-i) and Calculus II for Business
(http:/ / ocw. nd. edu/ mathematics/ calculus-ii-for-business), OpenCourseWare from the University of Notre
Dame with activities, exams and interactive applets.
Calculus for Beginners and Artists (http:/ / math. mit. edu/ ~djk/ calculus_beginners/ ) by Daniel Kleitman, MIT
Calculus Problems and Solutions (http:/ / www. math. ucdavis. edu/ ~kouba/ ProblemsList. html) by D. A. Kouba
Solved problems in calculus (http:/ / calculus. solved-problems. com/ )
Copula (statistics)
In statistics, a copula is used as a general way of formulating a multivariate distribution in such a way that various
general types of dependence can be represented.[1] The approach to formulating a multivariate distribution using a
copula is based on the idea that a simple transformation can be made of each marginal variable in such a way that
each transformed marginal variable has a uniform distribution. Once this is done, the dependence structure can be
expressed as a multivariate distribution on the obtained uniforms, and a copula is precisely a multivariate distribution
on marginally uniform random variables. When applied in a practical context, the above transformations might be
fitted as an initial step for each marginal distribution, or the parameters of the transformations might be fitted jointly
with those of the copula.
There are many families of copulas which differ in the detail of the dependence they represent. A family will
typically have several parameters which relate to the strength and form of the dependence. Some families of copulas
are outlined below. A typical use for copulas is to choose one such family and use it to define the multivariate
distribution to be used, typically in fitting a distribution to a sample of data. However, it is possible to derive the
copula corresponding to any given multivariate distribution.
The basic idea
Consider two random variables X and Y, with continuous cumulative distribution functions F_X and F_Y. The
probability integral transform can be applied separately to the two random variables to define X′ = F_X(X) and
Y′ = F_Y(Y). It follows that X′ and Y′ both have uniform distributions but are, in general, dependent if X and Y were already
dependent (of course, if X and Y were independent, X′ and Y′ remain independent). Since the transforms are
invertible, specifying the dependence between X and Y is, in a way, the same as specifying dependence between X′
and Y′. With X′ and Y′ being uniform random variables, the problem reduces to specifying a bivariate distribution
between two uniforms, that is, a copula. So the idea is to simplify the problem by removing consideration of many
different marginal distributions by transforming the marginal variates to uniforms, and then specifying dependence
as a multivariate distribution on the uniforms.
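The probability integral transform at the heart of this idea can be sketched directly (a minimal illustration using standard normal margins; the function names are illustrative): pushing a sample through its own CDF yields approximately uniform values on [0, 1].

```python
# Probability integral transform: if X ~ N(0,1), then F_X(X) ~ Uniform(0,1).
import math
import random

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
us = [normal_cdf(x) for x in xs]

# The transformed sample should have mean ~0.5 and variance ~1/12, like U(0,1).
mean = sum(us) / len(us)
var = sum((u - mean) ** 2 for u in us) / len(us)
print(mean, var)
```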
Definition
A copula is a multivariate joint distribution defined on the n-dimensional unit cube [0, 1]^n such that every marginal
distribution is uniform on the interval [0, 1].
Specifically, C: [0, 1]^n → [0, 1] is an n-dimensional copula (briefly, n-copula) if:
C(u) = 0 whenever u ∈ [0, 1]^n has at least one component equal to 0;
C(u) = u_i whenever u ∈ [0, 1]^n has all the components equal to 1 except the ith one, which is equal to u_i;
C is n-increasing, i.e., for each hyperrectangle B = [x_1, y_1] × … × [x_n, y_n] ⊆ [0, 1]^n,

  V_C(B) = Σ_{z ∈ vertices of B} (−1)^{N(z)} C(z) ≥ 0,

where N(z) is the number of components of the vertex z equal to the corresponding x_i; the quantity V_C(B) is the
so-called C-volume of B.
Sklar's theorem
The theorem proposed by Sklar[2] underlies most applications of the copula. Sklar's theorem states that given a joint
distribution function for the variables, and respective marginal distribution functions, there exists a copula
such that the copula binds the margins to give the joint distribution.
For the bivariate case, Sklar's theorem can be stated as follows. For any bivariate distribution function H(x, y), let
F(x) = H(x, +∞) and G(y) = H(+∞, y) be the univariate marginal probability distribution functions. Then
there exists a copula C such that

  H(x, y) = C(F(x), G(y))

(where the symbol C for the copula has also been used for its cumulative distribution function). Moreover, if
the marginal distributions F(x) and G(y) are continuous, the copula function C is unique. Otherwise, the copula
is unique on the range of values of the marginal distributions.
To understand the density function h(x, y) of the coupled random variables, it should be noticed that

  h(x, y) = c(F(x), G(y)) · f(x) · g(y),

where c is the density of the copula and f and g are the marginal density functions. The expectation of a function
q(X, Y) can then be written in the following ways:

  E[q(X, Y)] = ∫∫ q(x, y) c(F(x), G(y)) f(x) g(y) dx dy = ∫_0^1 ∫_0^1 q(F^{−1}(u), G^{−1}(v)) c(u, v) du dv.
Fréchet–Hoeffding copula boundaries
Graphs of the Fréchet–Hoeffding copula limits
and of the independence copula (in the middle).
Minimum (antimonotone) copula: This is the lower bound for all copulas. In the bivariate case only, it represents
perfect negative dependence between variates:

  W(u, v) = max(u + v − 1, 0).

For n-variate copulas, the lower bound is given by

  W(u_1, …, u_n) = max(u_1 + … + u_n − (n − 1), 0).

Maximum (comonotone) copula: This is the upper bound for all copulas. It represents perfect positive dependence
between variates:

  M(u, v) = min(u, v).

For n-variate copulas, the upper bound is given by

  M(u_1, …, u_n) = min(u_1, …, u_n).

Conclusion: For all copulas C(u, v),

  W(u, v) ≤ C(u, v) ≤ M(u, v).

In the multivariate case, the corresponding inequality is

  W(u_1, …, u_n) ≤ C(u_1, …, u_n) ≤ M(u_1, …, u_n).
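The bivariate bounds can be checked numerically in a short sketch: here the lower bound W(u, v) = max(u + v − 1, 0) and upper bound M(u, v) = min(u, v) are verified against the independence (product) copula P(u, v) = uv on a grid.

```python
# Frechet-Hoeffding bounds sandwich every bivariate copula; check the product copula.

def W(u, v):
    return max(u + v - 1.0, 0.0)

def M(u, v):
    return min(u, v)

def P(u, v):
    return u * v  # independence copula

grid = [i / 10 for i in range(11)]
# Small tolerance absorbs floating-point rounding at grid boundary points.
ok = all(W(u, v) - 1e-12 <= P(u, v) <= M(u, v) + 1e-12 for u in grid for v in grid)
print(ok)
```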
Families of copulas
Gaussian copula
Cumulative distribution and probability density
functions of a Gaussian copula with ρ = 0.4.
One example of a copula often used for modelling in finance, as
introduced by David X. Li in 2000, is the Gaussian copula,[3] which is
constructed from the bivariate normal distribution via Sklar's theorem.
With Φ_ρ being the standard bivariate normal cumulative distribution
function with correlation ρ, the Gaussian copula function is

  C_ρ(u, v) = Φ_ρ(Φ^{−1}(u), Φ^{−1}(v)),

where u, v ∈ [0, 1] and Φ^{−1} denotes the inverse of the standard normal cumulative distribution function Φ.
Differentiating C yields the copula density function:

  c_ρ(u, v) = φ_ρ(Φ^{−1}(u), Φ^{−1}(v)) / [φ(Φ^{−1}(u)) · φ(Φ^{−1}(v))],

where φ_ρ is the density function for the standard bivariate Gaussian with Pearson's product moment correlation coefficient ρ,
and φ is the standard normal density.
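Sampling from this copula can be sketched with the standard library alone (the function names are illustrative): draw correlated standard normals via the 2×2 Cholesky factor, then map each coordinate through the standard normal CDF, so the resulting pairs (u, v) have uniform margins with Gaussian dependence.

```python
# Sampling from a bivariate Gaussian copula with correlation rho.
import math
import random

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_copula_sample(rho, n, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = z1
        y = rho * z1 + math.sqrt(1.0 - rho * rho) * z2  # corr(x, y) = rho
        out.append((normal_cdf(x), normal_cdf(y)))
    return out

pairs = gaussian_copula_sample(rho=0.4, n=20000)
mean_u = sum(u for u, _ in pairs) / len(pairs)
print(mean_u)  # near 0.5: the margin is (approximately) uniform
```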
Archimedean copulas
Archimedean copulas are an important family of copulas, which have a simple form with properties such as
associativity and have a variety of dependence structures. Unlike elliptical copulas (e.g. Gaussian), most of the
Archimedean copulas have closed-form solutions and are not derived from multivariate distribution functions
using Sklar's theorem.
One particularly simple form of an n-dimensional copula is

  C(u_1, …, u_n) = ψ^{−1}(ψ(u_1) + … + ψ(u_n)),

where ψ is known as a generator function. Such copulas are known as Archimedean. Any generator function which
satisfies the properties below is the basis for a valid copula:

  ψ(1) = 0, and ψ is decreasing and convex on (0, 1].

Product copula: Also called the independence copula, this copula has no dependence between variates. Its density
function is unity everywhere:

  Π(u, v) = uv.

Where the generator function is indexed by a parameter, a whole family of copulas may be Archimedean. For
example:
Clayton copula:

  C_θ(u, v) = max((u^{−θ} + v^{−θ} − 1)^{−1/θ}, 0),  θ ∈ [−1, ∞) \ {0}.

For θ → 0 in the Clayton copula, the random variables become statistically independent. The generator function approach
can be extended to create multivariate copulas, by simply including more additive terms.
Gumbel copula:

  C_θ(u, v) = exp(−[(−ln u)^θ + (−ln v)^θ]^{1/θ}),  θ ∈ [1, ∞).

Frank copula:

  C_θ(u, v) = −(1/θ) ln(1 + (e^{−θu} − 1)(e^{−θv} − 1) / (e^{−θ} − 1)),  θ ≠ 0.
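The closed-form nature of Archimedean copulas is easy to see in code. This sketch evaluates the Clayton copula for θ > 0 (where the max(·, 0) clipping is inactive) and checks numerically that it approaches the independence copula uv as θ → 0.

```python
# Clayton copula in closed form, for theta > 0:
#   C(u, v) = (u**-theta + v**-theta - 1)**(-1/theta)

def clayton(u, v, theta):
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

print(clayton(0.5, 0.5, 2.0))   # about 0.378, above u*v = 0.25: positive dependence
print(clayton(0.5, 0.5, 1e-6))  # near 0.25: near-independence for tiny theta
```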
Periodic copula
In 2005 Aurélien Alfonsi and Damiano Brigo introduced new families of copulas based on periodic functions.[4]
They noticed that if φ is a 1-periodic non-negative function that integrates to 1 over [0, 1] and F is a double primitive
of φ, then the two functions built from F (one from sums of the arguments, one from differences)
are copula functions, the second one not necessarily exchangeable. This may be a tool to introduce asymmetric
dependence, which is absent in most known copula functions.
Empirical copulas
When analysing data with an unknown underlying distribution, one can transform the empirical data distribution into
an "empirical copula" by warping such that the marginal distributions become uniform.[1] Mathematically the
empirical copula frequency function is calculated by

  C_n(i/n, j/n) = (1/n) · #{pairs (x, y) in the sample with x ≤ x_(i) and y ≤ y_(j)},

where x_(i) represents the ith order statistic of x.
Less formally, simply replace the data along each dimension with the data ranks divided by n.
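The rank-based recipe can be sketched directly (an illustrative helper, assuming distinct values in each dimension): replace each observation by its rank divided by n, giving points with discrete uniform margins.

```python
# Empirical copula sketch: warp each dimension to ranks / n.

def empirical_copula_points(xs, ys):
    n = len(xs)
    # Rank each value within its own dimension (1 = smallest); assumes no ties.
    rank_x = {v: r + 1 for r, v in enumerate(sorted(xs))}
    rank_y = {v: r + 1 for r, v in enumerate(sorted(ys))}
    return [(rank_x[x] / n, rank_y[y] / n) for x, y in zip(xs, ys)]

pts = empirical_copula_points([10.0, 3.5, 7.2, 1.1], [0.2, 0.9, 0.4, 0.5])
print(pts)  # [(1.0, 0.25), (0.5, 1.0), (0.75, 0.5), (0.25, 0.75)]
```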
Applications
Dependence modelling with copula functions is widely used in applications of financial risk assessment and actuarial
analysis, for example in the pricing of collateralized debt obligations (CDOs).[5] Some believe the methodology of
applying the Gaussian copula to credit derivatives to be one of the reasons behind the global financial crisis of
2008–2009.[6][7] Despite this perception, there are documented attempts by the financial industry, occurring before
the crisis, to address the limitations of the Gaussian copula and of copula functions more generally, specifically the
lack of dependence dynamics and the poor representation of extreme events.[8] The volume "Credit Correlation: Life
After Copulas", published in 2007 by World Scientific, summarizes a 2006 conference held by Merrill Lynch in
London where several practitioners attempted to propose models rectifying some of the copula limitations. See also
the article by Donnelly and Embrechts[9] and the book by Brigo, Pallavicini and Torresetti.[10]
Whilst the application of copulas in credit has gone through popularity as well as misfortune during the global
financial crisis of 2008–2009,[3][11] it is arguably an industry-standard model for pricing CDOs. Less arguably,
copulas have also been applied to other asset classes as a flexible tool in analyzing multi-asset derivative products.
The first such application outside credit was to use a copula to construct an implied basket volatility surface,[12]
taking into account the volatility smile of basket components. Copulas have since gained popularity in pricing and
risk management[13] of options on multi-assets in the presence of volatility smile/skew, in equity, foreign exchange
and fixed income derivative business. Some typical examples of application of copulas are listed below:
Analyzing and pricing volatility smile/skew of exotic baskets, e.g. best/worst of;
Analyzing and pricing volatility smile/skew of less liquid FX crosses, which are effectively baskets: C = S_1/S_2 or
C = S_1 · S_2;
Analyzing and pricing spread options, in particular in fixed income constant maturity swap (CMS) spread options.
Recently, copula functions have been successfully applied to the database formulation for the reliability analysis of
highway bridges, to the analysis of spike counts in neuroscience[14] and to various multivariate simulation studies in
civil, mechanical and offshore engineering.
See also
Joint probability distribution
References
Notes
[1] Nelsen, Roger B. (1999), An Introduction to Copulas, New York: Springer, ISBN0387986235.
[2] Sklar, A. (1959), "Fonctions de répartition à n dimensions et leurs marges", Publ. Inst. Statist. Univ. Paris 8: 229–231
[3] Li, David X. (2000), "On Default Correlation: A Copula Function Approach" (http:/ / www. defaultrisk. com/ pp_corr_05. htm), Journal of
Fixed Income 9: 4354,
[4] Alfonsi, A. & Brigo, D. (2005), "New families of Copulas based on periodic functions", Communications in Statistics - Theory and Methods
34 (7): 14371447, doi:10.1081/STA-200063351
[5] Meneguzzo, David; Walter Vecchiato (Nov 2003), "Copula sensitivity in collateralized debt obligations and basket default swaps", Journal of
Futures Markets 24 (1): 3770, doi:10.1002/fut.10110
[6] Recipe for Disaster: The Formula That Killed Wall Street (http:/ / www. wired. com/ techbiz/ it/ magazine/ 17-03/
wp_quant?currentPage=all) Wired, 2/23/2009
[7] MacKenzie, Donald (2008), "End-of-the-World Trade" (http:/ / www. lrb. co. uk/ v30/ n09/ mack01_. html), London Review of Books,
2008-05-08, , retrieved 2009-07-27
[8] Lipton, A., and A. Rennie, (Editors) (2007), Credit Correlation: Life after Copulas (http:/ / www. worldscibooks. com/ economics/ 6559.
html), World Scientific,
[9] Donnelly, C, Embrechts, P, (2010), The devil is in the tails: actuarial mathematics and the subprime mortgage crisis, ASTIN Bulletin 40(1),
1-33
[10] Brigo, D, Pallavicini, A, and Torresetti, R, (2010), Credit Models and the Crisis: A Journey into CDOs, Copulas, Correlations and dynamic
Models, Wiley and Sons
[11] Jones, Sam (April 24, 2009), "The formula that felled Wall St" (http:/ / www. ft. com/ cms/ s/ 2/ 912d85e8-2d75-11de-9eba-00144feabdc0.
html), Financial Times,
[12] Qu, Dong, (2001), Basket Implied Volatility Surface, Derivatives Week, 4 June.
[13] Qu, Dong, (2005), Pricing Basket Options With Skew, Wilmott Magazine, July.
[14] Onken, A; Grünewälder, S; Munk, MH; Obermayer, K (2009), "Analyzing Short-Term Noise Dependencies of Spike-Counts in Macaque
Prefrontal Cortex Using Copulas and the Flashlight Transformation" (http:/ / www. ploscompbiol. org/ article/ info:doi/ 10. 1371/ journal.
pcbi.1000577), PLoS Computational Biology 5 (11): e1000577, doi:10.1371/journal.pcbi.1000577, PMID19956759, PMC2776173,
General
David G. Clayton (1978), "A model for association in bivariate life tables and its application in epidemiological
studies of familial tendency in chronic disease incidence", Biometrika 65, 141151. JSTOR (subscription) (http:/ /
links. jstor. org/ sici?sici=0006-3444(197804)65:1<141:AMFAIB>2. 0. CO;2-Y)
Frees, E.W., Valdez, E.A. (1998), "Understanding Relationships Using Copulas", North American Actuarial
Journal 2, 125. Link to NAAJ copy (http:/ / www. soa. org/ library/ journals/ north-american-actuarial-journal/
1998/ january/ naaj9801_1. pdf)
Roger B. Nelsen (1999), An Introduction to Copulas. ISBN 0-387-98623-5.
S. Rachev, C. Menn, F. Fabozzi (2005), Fat-Tailed and Skewed Asset Return Distributions. ISBN 0-471-71886-6.
A. Sklar (1959), "Fonctions de répartition à n dimensions et leurs marges", Publications de l'Institut de Statistique
de l'Université de Paris 8, 229–231.
C. Schlzel, P. Friederichs (2008), "Multivariate non-normally distributed random variables in climate research
introduction to the copula approach", Nonlinear Processes in Geophysics, 15, 761-772 Copernicus (open access)
(http:/ / www. nonlin-processes-geophys. net/ 15/ 761/ 2008/ npg-15-761-2008. html)
W.T. Shaw, K.T.A. Lee (2006), "Copula Methods vs Canonical Multivariate Distributions: The Multivariate
Student T Distibution with General Degrees of Freedom". PDF (http:/ / www. mth. kcl. ac. uk/ ~shaww/
web_page/ papers/ MultiStudentc. pdf)
Srinivas Sriramula, Devdas Menon and A. Meher Prasad (2006), "Multivariate Simulation and Multimodal Dependence Modeling of Vehicle Axle Weights with Copulas", ASCE Journal of Transportation Engineering 132 (12), 945-955. (doi 10.1061/(ASCE)0733-947X(2006)132:12(945)) ASCE (subscription) (http://cedb.asce.org/cgi/WWWdisplay.cgi?0613154)
Genest, C.; MacKay, R.J. (1986), "The Joy of Copulas: Bivariate Distributions with Uniform Marginals" (http://jstor.org/stable/2684602), The American Statistician (American Statistical Association) 40 (4): 280-283, doi:10.2307/2684602
External links
Eric W. Weisstein, "Sklar's Theorem", from MathWorld, a Wolfram Web Resource (http://mathworld.wolfram.com/SklarsTheorem.html)
Copula Wiki: community portal for researchers with interest in copulas (http://sites.google.com/site/copulawiki/)
A collection of Copula simulation and estimation codes (http://www.mathfinance.cn/tags/copula)
Recipe for Disaster: The Formula That Killed Wall Street (http://www.wired.com/techbiz/it/magazine/17-03/wp_quant), by Felix Salmon, Wired News
Did math formula cause financial crisis? (http://marketplace.publicradio.org/display/web/2009/02/24/pm_stock_formula_q/), by Felix Salmon and Kai Ryssdal, Marketplace, American Public Media
Several short articles on copulas (http://www.aghakouchak.com/resources/copulas)
An introduction and some examples to modeling with copulas in Excel (http://www.vosesoftware.com/ModelRiskHelp/Modeling_correlation/Copulas.htm)
Copula Functions and their Application in Pricing and Risk Managing Multiname Credit Derivative Products (http://www.defaultrisk.com/pp_crdrv_41.htm)
Differential equation
Visualization of heat transfer in a pump casing, by solving the heat equation. Heat is
being generated internally in the casing and being cooled at the boundary, providing a
steady state temperature distribution.
A differential equation is a
mathematical equation for an unknown
function of one or several variables
that relates the values of the function
itself and its derivatives of various
orders. Differential equations play a
prominent role in engineering, physics,
economics, and other disciplines.
Differential equations arise in many
areas of science and technology,
specifically whenever a deterministic
relation involving some continuously
varying quantities (modeled by
functions) and their rates of change in
space and/or time (expressed as
derivatives) is known or postulated.
This is illustrated in classical
mechanics, where the motion of a body
is described by its position and velocity as the time varies. Newton's laws allow one to relate the position, velocity,
acceleration and various forces acting on the body and state this relation as a differential equation for the unknown
position of the body as a function of time. In some cases, this differential equation (called an equation of motion)
may be solved explicitly.
An example of modelling a real world problem using differential equations is determination of the velocity of a ball
falling through the air, considering only gravity and air resistance. The ball's acceleration towards the ground is the
acceleration due to gravity minus the deceleration due to air resistance. Gravity is constant but air resistance may be
modelled as proportional to the ball's velocity. This means the ball's acceleration, which is the derivative of its
velocity, depends on the velocity. Finding the velocity as a function of time involves solving a differential equation.
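When no closed form is wanted, the falling-ball equation dv/dt = g − kv can be integrated numerically. The sketch below uses Euler's method; the drag coefficient, time step, and variable names are illustrative assumptions of mine, not taken from the text:

```python
# Euler's method for dv/dt = g - k*v: acceleration is gravity minus
# drag proportional to velocity. g and k are illustrative values.
g = 9.81   # gravitational acceleration (m/s^2)
k = 0.5    # assumed drag coefficient per unit mass (1/s)
dt = 0.01  # time step (s)

v = 0.0    # ball released from rest
t = 0.0
while t < 10.0:
    v += (g - k * v) * dt  # step velocity forward by its derivative
    t += dt

# The exact solution v(t) = (g/k)*(1 - exp(-k*t)) tends to the terminal
# velocity g/k = 19.62 m/s; the numerical value lands close to it.
print(round(v, 2))
```

Shrinking the time step brings the numerical trajectory arbitrarily close to the exact solution, which is the basic idea behind the numerical methods mentioned below.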
Differential equations are mathematically studied from several different perspectives, mostly concerned with their
solutions, the set of functions that satisfy the equation. Only the simplest differential equations admit solutions
given by explicit formulas; however, some properties of solutions of a given differential equation may be determined
without finding their exact form. If a self-contained formula for the solution is not available, the solution may be
numerically approximated using computers. The theory of dynamical systems puts emphasis on qualitative analysis
of systems described by differential equations, while many numerical methods have been developed to determine
solutions with a given degree of accuracy.
Directions of study
The study of differential equations is a wide field in pure and applied mathematics, physics, meteorology, and
engineering. All of these disciplines are concerned with the properties of differential equations of various types. Pure
mathematics focuses on the existence and uniqueness of solutions, while applied mathematics emphasizes the
rigorous justification of the methods for approximating solutions. Differential equations play an important role in
modelling virtually every physical, technical, or biological process, from celestial motion, to bridge design, to
interactions between neurons. Differential equations such as those used to solve real-life problems may not
necessarily be directly solvable, i.e. do not have closed form solutions. Instead, solutions can be approximated using
numerical methods.
Mathematicians also study weak solutions (relying on weak derivatives), which are types of solutions that do not
have to be differentiable everywhere. This extension is often necessary for solutions to exist, and it also results in
more physically reasonable properties of solutions, such as possible presence of shocks for equations of hyperbolic
type.
The study of the stability of solutions of differential equations is known as stability theory.
Nomenclature
The theory of differential equations is quite developed and the methods used to study them vary significantly with
the type of the equation.
An ordinary differential equation (ODE) is a differential equation in which the unknown function (also known as
the dependent variable) is a function of a single independent variable. In the simplest form, the unknown
function is a real or complex valued function, but more generally, it may be vector-valued or matrix-valued: this
corresponds to considering a system of ordinary differential equations for a single function. Ordinary differential
equations are further classified according to the order of the highest derivative of the dependent variable with
respect to the independent variable appearing in the equation. The most important cases for applications are
first-order and second-order differential equations. In the classical literature, a distinction is also made between
differential equations explicitly solved with respect to the highest derivative and differential equations given in
implicit form.
A partial differential equation (PDE) is a differential equation in which the unknown function is a function of
multiple independent variables and the equation involves its partial derivatives. The order is defined similarly to
the case of ordinary differential equations, but further classification into elliptic, hyperbolic, and parabolic
equations, especially for second-order linear equations, is of utmost importance. Some partial differential
equations do not fall into any of these categories over the whole domain of the independent variables and they are
said to be of mixed type.
Both ordinary and partial differential equations are broadly classified as linear and nonlinear. A differential
equation is linear if the unknown function and its derivatives appear to the power 1 (products are not allowed) and
nonlinear otherwise. The characteristic property of linear equations is that their solutions form an affine subspace of
an appropriate function space, which results in much more developed theory of linear differential equations.
Homogeneous linear differential equations are a further subclass for which the space of solutions is a linear
subspace i.e. the sum of any set of solutions or multiples of solutions is also a solution. The coefficients of the
unknown function and its derivatives in a linear differential equation are allowed to be (known) functions of the
independent variable or variables; if these coefficients are constants then one speaks of a constant coefficient linear
differential equation.
There are very few methods of explicitly solving nonlinear differential equations; those that are known typically
depend on the equation having particular symmetries. Nonlinear differential equations can exhibit very complicated
behavior over extended time intervals, characteristic of chaos. Even the fundamental questions of existence,
uniqueness, and extendability of solutions for nonlinear differential equations, and well-posedness of initial and
boundary value problems for nonlinear PDEs are hard problems and their resolution in special cases is considered to
be a significant advance in the mathematical theory (cf. Navier–Stokes existence and smoothness).
Linear differential equations frequently appear as approximations to nonlinear equations. These approximations are
only valid under restricted conditions. For example, the harmonic oscillator equation is an approximation to the
nonlinear pendulum equation that is valid for small amplitude oscillations (see below).
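The small-amplitude claim is easy to illustrate numerically by integrating the pendulum equation and its linearization side by side; the step size, duration, and initial angle below are arbitrary choices for this sketch:

```python
import math

# Pendulum: L*u'' + g*sin(u) = 0; linearization: L*u'' + g*u = 0.
# For a small initial angle the two trajectories stay very close.
L, g = 1.0, 9.81
dt, steps = 0.001, 2000   # integrate up to t = 2 s

u1 = u2 = 0.1  # small initial angle in radians
w1 = w2 = 0.0  # initial angular velocities
for _ in range(steps):
    w1 -= (g / L) * math.sin(u1) * dt  # nonlinear pendulum
    u1 += w1 * dt
    w2 -= (g / L) * u2 * dt            # harmonic-oscillator approximation
    u2 += w2 * dt

print(abs(u1 - u2))  # small: the linear model tracks the pendulum
```

For larger initial angles sin(u) departs from u, and the two solutions drift apart, which is exactly why the approximation is restricted to small oscillations.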
Examples
In the first group of examples, let u be an unknown function of x, and let c and \omega be known constants.
Inhomogeneous first-order linear constant coefficient ordinary differential equation:
\frac{du}{dx} = cu + x^2.
Homogeneous second-order linear ordinary differential equation:
\frac{d^2u}{dx^2} - x\frac{du}{dx} + u = 0.
Homogeneous second-order linear constant coefficient ordinary differential equation describing the harmonic
oscillator:
\frac{d^2u}{dx^2} + \omega^2 u = 0.
First-order nonlinear ordinary differential equation:
\frac{du}{dx} = u^2 + 4.
Second-order nonlinear ordinary differential equation describing the motion of a pendulum of length L:
L\frac{d^2u}{dx^2} + g\sin u = 0.
In the next group of examples, the unknown function u depends on two variables x and t or x and y.
Homogeneous first-order linear partial differential equation:
\frac{\partial u}{\partial t} + t\frac{\partial u}{\partial x} = 0.
Homogeneous second-order linear constant coefficient partial differential equation of elliptic type, the Laplace
equation:
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0.
Third-order nonlinear partial differential equation, the Korteweg–de Vries equation:
\frac{\partial u}{\partial t} = 6u\frac{\partial u}{\partial x} - \frac{\partial^3 u}{\partial x^3}.
Related concepts
A delay differential equation (DDE) is an equation for a function of a single variable, usually called time, in
which the derivative of the function at a certain time is given in terms of the values of the function at earlier
times.
A stochastic differential equation (SDE) is an equation in which the unknown quantity is a stochastic process and
the equation involves some known stochastic processes, for example, the Wiener process in the case of diffusion
equations.
A differential algebraic equation (DAE) is a differential equation comprising differential and algebraic terms,
given in implicit form.
Connection to difference equations
The theory of differential equations is closely related to the theory of difference equations, in which the coordinates
assume only discrete values, and the relationship involves values of the unknown function or functions and values at
nearby coordinates. Many methods to compute numerical solutions of differential equations or study the properties
of differential equations involve approximation of the solution of a differential equation by the solution of a
corresponding difference equation.
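A minimal illustration of this correspondence: discretizing du/dt = u with a forward difference of step h gives the difference equation u_{n+1} = (1 + h)u_n, whose solution (1 + h)^n converges to the ODE solution e^t as h shrinks with nh = t fixed. The step sizes below are arbitrary:

```python
import math

# ODE du/dt = u, u(0) = 1, exact solution u(t) = e^t at t = 1.
# Forward difference: u_{n+1} = u_n * (1 + h), so u_n = (1 + h)**n.
t = 1.0
errors = []
for h in (0.1, 0.01, 0.001):
    n = int(round(t / h))
    approx = (1 + h) ** n
    errors.append(abs(approx - math.e))

print(errors)  # errors shrink roughly in proportion to h
```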
Universality of mathematical description
Many fundamental laws of physics and chemistry can be formulated as differential equations. In biology and
economics differential equations are used to model the behavior of complex systems. The mathematical theory of
differential equations first developed, together with the sciences, where the equations had originated and where the
results found application. However, diverse problems, sometimes originating in quite distinct scientific fields, may
give rise to identical differential equations. Whenever this happens, mathematical theory behind the equations can be
viewed as a unifying principle behind diverse phenomena. As an example, consider propagation of light and sound in
the atmosphere, and of waves on the surface of a pond. All of them may be described by the same second-order
partial differential equation, the wave equation, which allows us to think of light and sound as forms of waves, much
like familiar waves in the water. Conduction of heat, the theory of which was developed by Joseph Fourier, is
governed by another second-order partial differential equation, the heat equation. It turned out that many diffusion
processes, while seemingly different, are described by the same equation; the Black–Scholes equation in finance, for
instance, is related to the heat equation.
Notable differential equations
Newton's Second Law in dynamics (mechanics)
Hamilton's equations in classical mechanics
Radioactive decay in nuclear physics
Newton's law of cooling in thermodynamics
The wave equation
Maxwell's equations in electromagnetism
The heat equation in thermodynamics
Laplace's equation, which defines harmonic functions
Poisson's equation
Einstein's field equation in general relativity
The Schrödinger equation in quantum mechanics
The geodesic equation
The Navier–Stokes equations in fluid dynamics
The Cauchy–Riemann equations in complex analysis
The Poisson–Boltzmann equation in molecular dynamics
The shallow water equations
Universal differential equation
The Lorenz equations whose solutions exhibit chaotic flow.
Biology
Verhulst equation – biological population growth
von Bertalanffy model – biological individual growth
Lotka–Volterra equations – biological population dynamics
Replicator dynamics – may be found in theoretical biology
Economics
The BlackScholes PDE
Exogenous growth model
Malthusian growth model
The Vidale-Wolfe advertising model
See also
Complex differential equation
Exact differential equation
Integral equations
Linear differential equation
Picard–Lindelöf theorem on existence and uniqueness of solutions
References
D. Zwillinger, Handbook of Differential Equations (3rd edition), Academic Press, Boston, 1997.
A. D. Polyanin and V. F. Zaitsev, Handbook of Exact Solutions for Ordinary Differential Equations (2nd edition),
Chapman & Hall/CRC Press, Boca Raton, 2003. ISBN 1-58488-297-2.
W. Johnson, A Treatise on Ordinary and Partial Differential Equations [1], John Wiley and Sons, 1913, in University of Michigan Historical Math Collection [2]
E.L. Ince, Ordinary Differential Equations, Dover Publications, 1956
E.A. Coddington and N. Levinson, Theory of Ordinary Differential Equations, McGraw-Hill, 1955
P. Blanchard, R.L. Devaney, G.R. Hall, Differential Equations, Thompson, 2006
External links
Lectures on Differential Equations [3], MIT Open CourseWare Video
Online Notes / Differential Equations [4], Paul Dawkins, Lamar University
Differential Equations [5], S.O.S. Mathematics
Introduction to modeling via differential equations [6], introduction to modeling by means of differential equations, with critical remarks
Differential Equation Solver [7], Java applet tool used to solve differential equations
Mathematical Assistant on Web [8], symbolic ODE tool, using Maxima
Exact Solutions of Ordinary Differential Equations [9]
Collection of ODE and DAE models of physical systems [10], MATLAB models
Notes on Diffy Qs: Differential Equations for Engineers [11], an introductory textbook on differential equations by Jiri Lebl of UIUC
References
[1] http://www.hti.umich.edu/cgi/b/bib/bibperm?q1=abv5010.0001.001
[2] http://hti.umich.edu/u/umhistmath/
[3] http://ocw.mit.edu/OcwWeb/Mathematics/18-03Spring-2006/VideoLectures/index.htm
[4] http://tutorial.math.lamar.edu/classes/de/de.aspx
[5] http://www.sosmath.com/diffeq/diffeq.html
[6] http://www.diptem.unige.it/patrone/differential_equations_intro.htm
[7] http://publicliterature.org/tools/differential_equation_solver/
[8] http://user.mendelu.cz/marik/maw/index.php?lang=en&form=ode
[9] http://eqworld.ipmnet.ru/en/solutions/ode.htm
[10] http://www.hedengren.net/research/models.htm
[11] http://www.jirka.org/diffyqs/
Expected value
In probability theory and statistics, the expected value (or expectation value, or mathematical expectation, or
mean, or first moment) of a random variable is the integral of the random variable with respect to its probability
measure.[1] [2] Intuitively, expectation is the long-run average: if a test could be repeated many times, expectation is
the mean of all the results.
For discrete random variables this is equivalent to the probability-weighted sum of the possible values.
For continuous random variables with a density function it is the probability density-weighted integral of the
possible values.
The term "expected value" can be misleading. It must not be confused with the "most probable value." The expected
value is in general not a typical value that the random variable can take on. It is often helpful to interpret the
expected value of a random variable as the long-run average value of the variable over many independent repetitions
of an experiment.
The expected value may be intuitively understood by the law of large numbers: The expected value, when it exists, is
almost surely the limit of the sample mean as sample size grows to infinity. The value may not be expected in the
general sense: the "expected value" itself may be unlikely or even impossible (such as having 2.5 children), just
like the sample mean.
The expected value does not exist for some distributions with large "tails", such as the Cauchy distribution.[3]
It is possible to construct an expected value equal to the probability of an event by taking the expectation of an
indicator function that is one if the event has occurred and zero otherwise. This relationship can be used to translate
properties of expected values into properties of probabilities, e.g. using the law of large numbers to justify estimating
probabilities by frequencies.
History
The idea of the expected value originated in the middle of the 17th century from the study of the so-called problem
of points, posed by the French nobleman chevalier de Méré. The problem was that of two players who want to finish a
game early and, given the current circumstances of the game, want to divide the stakes fairly, based on the chance
each has of winning the game from that point. This problem was solved in 1654 by Blaise Pascal in his private
correspondence with Pierre de Fermat; however, the idea was not communicated to the broad scientific community.
Three years later, in 1657, the Dutch mathematician Christiaan Huygens published a treatise (see Huygens (1657)) De
ratiociniis in ludo aleae on probability theory, which not only laid down the foundations of the theory of probability,
but also considered the problem of points, presenting a solution essentially the same as Pascal's.[4]
Neither Pascal nor Huygens used the term "expectation" in its modern sense. In particular, Huygens writes: "That my
Chance or Expectation to win any thing is worth just such a Sum, as wou'd procure me in the same Chance and
Expectation at a fair Lay. ... If I expect a or b, and have an equal Chance of gaining them, my Expectation is worth
(a+b)/2." More than a hundred years later, in 1814, Pierre-Simon Laplace published his tract Théorie analytique des
probabilités, where the concept of expected value was defined explicitly:
This advantage in the theory of chance is the product of the sum hoped for by the probability of obtaining it; it is the partial sum which
ought to result when we do not wish to run the risks of the event in supposing that the division is made proportional to the probabilities. This
division is the only equitable one when all strange circumstances are eliminated; because an equal degree of probability gives an equal right
for the sum hoped for. We will call this advantage mathematical hope.
The use of the letter E to denote expected value goes back to W.A. Whitworth (1901), Choice and Chance. The symbol
has become popular since for English writers it meant "Expectation", for Germans "Erwartungswert", and for French
"Espérance mathématique".[5]
Examples
The expected outcome from one roll of an ordinary (that is, fair) six-sided die is
\mathrm{E}[X] = \tfrac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = 3.5,
which is not among the possible outcomes.[6]
A common application of expected value is gambling. For example, an American roulette wheel has 38 places where
the ball may land, all equally likely. A winning bet on a single number pays 35-to-1, meaning that the original stake
is not lost, and 35 times that amount is won, so you receive 36 times what you've bet. Considering all 38 possible
outcomes, the expected value of the profit resulting from a dollar bet on a single number is the sum of potential net
loss times the probability of losing and potential net gain times the probability of winning, that is,
\mathrm{E}[\text{profit}] = -1 \cdot \tfrac{37}{38} + 35 \cdot \tfrac{1}{38} = -\tfrac{1}{19} \approx -0.0526.
The net change in your financial holdings is -$1 when you lose, and $35 when you win. Thus one may expect, on
average, to lose about five cents for every dollar bet, and the expected value of a one-dollar bet is $0.947368421. In
gambling, an event of which the expected value equals the stake (i.e. the bettor's expected profit, or net gain, is zero)
is called a fair game.
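The roulette expectation above is easy to confirm by simulation; the Monte Carlo sketch below (sample size and seed are arbitrary choices) estimates the expected profit of a repeated $1 single-number bet:

```python
import random

# A $1 single-number bet wins $35 with probability 1/38, else loses $1.
# Exact expected profit: (-1*37 + 35*1)/38 = -1/19, about -$0.0526.
random.seed(0)
trials = 1_000_000
total = sum(35 if random.randrange(38) == 0 else -1 for _ in range(trials))
estimate = total / trials
print(estimate)  # close to -0.0526
```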
Mathematical definition
In general, if X is a random variable defined on a probability space (\Omega, \Sigma, P), then the expected value of X,
denoted by \mathrm{E}[X], \mathrm{E}X, or \langle X \rangle, is defined as
\mathrm{E}[X] = \int_\Omega X \, dP.
When this integral converges absolutely, it is called the expectation of X. The absolute convergence is necessary
because conditional convergence means that a different order of addition gives a different result, which is against the
nature of the expected value. Here the Lebesgue integral is employed. Note that not all random variables have an
expected value, since the integral may not converge absolutely (e.g., the Cauchy distribution). Two variables with the
same probability distribution will have the same expected value, if it is defined.
If X is a discrete random variable with probability mass function p(x), then the expected value becomes
\mathrm{E}[X] = \sum_i x_i \, p(x_i),
as in the gambling example mentioned above.
If the probability distribution of X admits a probability density function f(x), then the expected value can be
computed as
\mathrm{E}[X] = \int_{-\infty}^{\infty} x f(x) \, dx.
It follows directly from the discrete case definition that if X is a constant random variable, i.e. X = b for some
fixed real number b, then the expected value of X is also b.
The expected value of an arbitrary function of X, g(X), with respect to the probability density function f(x) is given
by the inner product of f and g:
\mathrm{E}[g(X)] = \int_{-\infty}^{\infty} g(x) f(x) \, dx.
This is sometimes called the law of the unconscious statistician. Using representations as a Riemann–Stieltjes integral
and integration by parts, the formula can be restated as
\mathrm{E}[X] = a + \int_a^\infty (1 - F(x)) \, dx if P(X \ge a) = 1,
\mathrm{E}[X] = b - \int_{-\infty}^b F(x) \, dx if P(X \le b) = 1.
As a special case, let \alpha denote a positive real number; then
\mathrm{E}[|X|^\alpha] = \alpha \int_0^\infty t^{\alpha - 1} P(|X| > t) \, dt.
In particular, for \alpha = 1, this reduces to:
\mathrm{E}[X] = \int_0^\infty (1 - F(t)) \, dt if P(X \ge 0) = 1, where F is the cumulative distribution function of X.
Conventional terminology
When one speaks of the "expected price", "expected height", etc. one means the expected value of a random
variable that is a price, a height, etc.
When one speaks of the "expected number of attempts needed to get one successful attempt," one might
conservatively approximate it as the reciprocal of the probability of success for such an attempt. Cf. expected
value of the geometric distribution.
Properties
Constants
The expected value of a constant is equal to the constant itself; i.e., if c is a constant, then \mathrm{E}[c] = c.
Monotonicity
If X and Y are random variables such that X \le Y almost surely, then \mathrm{E}[X] \le \mathrm{E}[Y].
Linearity
The expected value operator (or expectation operator) \mathrm{E} is linear in the sense that
\mathrm{E}[X + c] = \mathrm{E}[X] + c,
\mathrm{E}[X + Y] = \mathrm{E}[X] + \mathrm{E}[Y],
\mathrm{E}[aX] = a \, \mathrm{E}[X].
Note that the second result is valid even if X is not statistically independent of Y. Combining the results from the
previous three equations, we can see that
\mathrm{E}[aX + bY] = a \, \mathrm{E}[X] + b \, \mathrm{E}[Y]
for any two random variables X and Y (which need to be defined on the same probability space) and any real
numbers a and b.
Iterated expectation
Iterated expectation for discrete random variables
For any two discrete random variables X, Y one may define the conditional expectation:[7]
\mathrm{E}[X \mid Y = y] = \sum_x x \, P(X = x \mid Y = y),
which means that \mathrm{E}[X \mid Y = y] is a function of y.
Then the expectation of X satisfies
\mathrm{E}[X] = \sum_y \mathrm{E}[X \mid Y = y] \, P(Y = y).
Hence, the following equation holds:[8]
\mathrm{E}[X] = \mathrm{E}\big[\mathrm{E}[X \mid Y]\big].
The right-hand side of this equation is referred to as the iterated expectation and is also sometimes called the tower
rule. This proposition is treated in the law of total expectation.
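The tower rule is easy to check on a small discrete example. The joint distribution below is hypothetical, chosen only to keep the arithmetic visible:

```python
# Hypothetical joint distribution of (X, Y): (x, y) -> probability.
joint = {(1, 0): 0.2, (2, 0): 0.3, (1, 1): 0.1, (3, 1): 0.4}

# Direct computation of E[X].
ex = sum(x * p for (x, _), p in joint.items())

# Marginal of Y, then E[X | Y = y] for each y.
py = {}
for (_, y), p in joint.items():
    py[y] = py.get(y, 0.0) + p
cond = {y: sum(x * p for (x, yy), p in joint.items() if yy == y) / py[y]
        for y in py}

# Averaging the conditional expectations over Y recovers E[X].
iterated = sum(cond[y] * py[y] for y in py)
print(ex, iterated)  # both are approximately 2.1
```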
Iterated expectation for continuous random variables
In the continuous case, the results are completely analogous. The definition of conditional expectation would use
inequalities, density functions, and integrals to replace equalities, mass functions, and summations, respectively.
However, the main result still holds:
\mathrm{E}[X] = \mathrm{E}\big[\mathrm{E}[X \mid Y]\big].
Inequality
If a random variable X is always less than or equal to another random variable Y, the expectation of X is less than or
equal to that of Y:
If X \le Y, then \mathrm{E}[X] \le \mathrm{E}[Y].
In particular, since X \le |X| and -X \le |X|, the absolute value of the expectation of a random variable is less
than or equal to the expectation of its absolute value:
|\mathrm{E}[X]| \le \mathrm{E}[|X|].
Non-multiplicativity
If one considers the joint PDF of X and Y, say j(x, y), then the expectation of XY is
\mathrm{E}[XY] = \iint xy \, j(x, y) \, dx \, dy.
In general, the expected value operator is not multiplicative, i.e. \mathrm{E}[XY] is not necessarily equal to \mathrm{E}[X] \, \mathrm{E}[Y].
In fact, the amount by which multiplicativity fails is called the covariance:
\mathrm{Cov}(X, Y) = \mathrm{E}[XY] - \mathrm{E}[X] \, \mathrm{E}[Y].
Thus multiplicativity holds precisely when \mathrm{Cov}(X, Y) = 0, in which case X and Y are said to be uncorrelated
(independent variables are a notable case of uncorrelated variables).
Now if X and Y are independent, then by definition j(x, y) = f(x)g(y), where f and g are the marginal PDFs for X and Y.
Then
\mathrm{E}[XY] = \iint xy \, f(x) g(y) \, dy \, dx = \left( \int x f(x) \, dx \right) \left( \int y g(y) \, dy \right) = \mathrm{E}[X] \, \mathrm{E}[Y]
and \mathrm{Cov}(X, Y) = 0.
Observe that independence of X and Y is required only to write j(x,y)=f(x)g(y), and this is required to establish the
second equality above. The third equality follows from a basic application of the Fubini-Tonelli theorem.
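A quick empirical illustration: sampling independent X and Y and comparing the sample value of E[XY] with E[X]·E[Y] gives a covariance near zero. The distributions, sample size, and seed below are arbitrary choices:

```python
import random

# For independent X ~ U(0,1) and Y ~ U(0,2), E[XY] = E[X]*E[Y],
# so the empirical covariance should be near zero.
random.seed(1)
n = 200_000
xs = [random.uniform(0, 1) for _ in range(n)]
ys = [random.uniform(0, 2) for _ in range(n)]

ex = sum(xs) / n
ey = sum(ys) / n
exy = sum(x * y for x, y in zip(xs, ys)) / n
cov = exy - ex * ey
print(cov)  # close to 0, up to sampling noise
```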
Functional non-invariance
In general, the expectation operator and functions of random variables do not commute; that is,
\mathrm{E}[g(X)] \ne g(\mathrm{E}[X]).
A notable inequality concerning this topic is Jensen's inequality, involving expected values of convex (or concave)
functions.
Uses and applications
The expected values of the powers of X are called the moments of X; the moments about the mean of X are
expected values of powers of X - \mathrm{E}[X]. The moments of some random variables can be used to specify their
distributions, via their moment generating functions.
To empirically estimate the expected value of a random variable, one repeatedly measures observations of the
variable and computes the arithmetic mean of the results. If the expected value exists, this procedure estimates the
true expected value in an unbiased manner and has the property of minimizing the sum of the squares of the residuals
(the sum of the squared differences between the observations and the estimate). The law of large numbers
demonstrates (under fairly mild conditions) that, as the size of the sample gets larger, the variance of this estimate
gets smaller.
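A short sketch of this estimation procedure, using a fair die (true expectation 3.5); the sample sizes and seed are arbitrary:

```python
import random

# Estimate E[X] for a fair die by averaging repeated rolls; by the
# law of large numbers the sample mean tightens around 3.5 as n grows.
random.seed(2)
for n in (100, 10_000, 1_000_000):
    mean = sum(random.randint(1, 6) for _ in range(n)) / n
    print(n, mean)  # approaches 3.5
```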
This property is often exploited in a wide variety of applications, including general problems of statistical estimation
and machine learning, to estimate (probabilistic) quantities of interest via Monte Carlo methods, since most
quantities of interest can be written in terms of expectation, e.g. P(X \in A) = \mathrm{E}[I_A(X)], where I_A is the
indicator function for the set A, i.e. I_A(x) = 1 if x \in A and I_A(x) = 0 otherwise.
In classical mechanics, the center of mass is an analogous concept to expectation. For example, suppose X is a
discrete random variable with values x_i and corresponding probabilities p_i. Now consider a weightless rod on
which are placed weights, at locations x_i along the rod and having masses p_i (whose sum is one). The point at
which the rod balances is \mathrm{E}[X].
Expected values can also be used to compute the variance, by means of the computational formula for the variance:
\mathrm{Var}(X) = \mathrm{E}[X^2] - (\mathrm{E}[X])^2.
A very important application of the expectation value is in the field of quantum mechanics. The expectation value of
a quantum mechanical operator \hat{A} operating on a quantum state vector |\psi\rangle is written as \langle \hat{A} \rangle = \langle \psi | \hat{A} | \psi \rangle. The
uncertainty in \hat{A} can be calculated using the formula (\Delta A)^2 = \langle \hat{A}^2 \rangle - \langle \hat{A} \rangle^2.
Expectation of matrices
If X is an m \times n matrix, then the expected value of the matrix is defined as the matrix of expected values:
(\mathrm{E}[X])_{ij} = \mathrm{E}[X_{ij}].
This is utilized in covariance matrices.
Formulas for special cases
Discrete distribution taking only non-negative integer values
When a random variable X takes only values in \{0, 1, 2, 3, \ldots\}, we can use the following formula for computing its
expectation:
\mathrm{E}[X] = \sum_{i=1}^{\infty} P(X \ge i).
Proof:
\mathrm{E}[X] = \sum_{j=1}^{\infty} j \, P(X = j) = \sum_{j=1}^{\infty} \sum_{i=1}^{j} P(X = j);
interchanging the order of summation, we have
\mathrm{E}[X] = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} P(X = j) = \sum_{i=1}^{\infty} P(X \ge i),
as claimed. This result can be a useful computational shortcut. For example, suppose we toss a coin where the
probability of heads is p. How many tosses can we expect until the first heads (not including the heads itself)? Let X
be this number. Note that we are counting only the tails and not the heads which ends the experiment; in
particular, we can have X = 0. The expectation of X may be computed by
\mathrm{E}[X] = \sum_{i=1}^{\infty} P(X \ge i) = \sum_{i=1}^{\infty} (1 - p)^i = \frac{1 - p}{p}.
This is because the number of tosses is at least i exactly when the first i tosses yielded tails. This matches the
expectation of a random variable with a geometric distribution (counting failures before the first success). We used
the formula for a geometric progression:
\sum_{i=1}^{\infty} r^i = \frac{r}{1 - r}.
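The tail-sum shortcut can be verified numerically for this coin example; the heads probability below is arbitrary, and both series are truncated where their terms are negligible:

```python
# For X = number of tails before the first heads, P(X >= i) = (1-p)**i,
# so E[X] = sum_{i>=1} (1-p)**i = (1-p)/p. Check against the direct sum.
p = 0.3
tail_sum = sum((1 - p) ** i for i in range(1, 200))     # sum of P(X >= i)
direct = sum(i * p * (1 - p) ** i for i in range(200))  # sum of i*P(X = i)
closed_form = (1 - p) / p
print(tail_sum, direct, closed_form)  # all three agree
```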
Continuous distribution taking non-negative values
Analogously with the discrete case above, when a continuous random variable X takes only non-negative values, we
can use the following formula for computing its expectation:
\mathrm{E}[X] = \int_0^\infty P(X \ge x) \, dx.
Proof: It is first assumed that X has a density f(x). Then
\mathrm{E}[X] = \int_0^\infty x f(x) \, dx = \int_0^\infty \left( \int_0^x dt \right) f(x) \, dx;
interchanging the order of integration, we have
\mathrm{E}[X] = \int_0^\infty \left( \int_t^\infty f(x) \, dx \right) dt = \int_0^\infty P(X \ge t) \, dt,
as claimed. In case no density exists, the same identity is seen to hold by applying Fubini's theorem to the
underlying probability measure.
See also
Conditional expectation
An inequality on location and scale parameters
Expected value is also a key concept in economics, finance, and many other subjects
The general term expectation
Moment (mathematics)
Expectation value (quantum mechanics)
Wald's equation for calculating the expected value of a random number of random variables
Notes
[1] Sheldon M. Ross (2007). "2.4 Expectation of a random variable" (http://books.google.com/books?id=12Pk5zZFirEC&pg=PA38). Introduction to Probability Models (9th ed.). Academic Press. p. 38 ff. ISBN 0125980620.
[2] Richard W. Hamming (1991). "2.5 Random variables, mean and the expected value" (http://books.google.com/books?id=jX_F-77TA3gC&pg=PA64). The Art of Probability for Scientists and Engineers. Addison-Wesley. p. 64 ff. ISBN 0201406861.
[3] For a discussion of the Cauchy distribution, see Richard W. Hamming (1991). "Example 8.7-1 The Cauchy distribution" (http://books.google.com/books?id=jX_F-77TA3gC&printsec=frontcover&dq=isbn:0201406861&cd=1#v=onepage&q=Cauchy&f=false). The Art of Probability for Scientists and Engineers. Addison-Wesley. p. 290 ff. ISBN 0201406861. "Sampling from the Cauchy distribution and averaging gets you nowhere: one sample has the same distribution as the average of 1000 samples!"
[4] In the foreword to his book, Huygens writes: "It should be said, also, that for some time some of the best mathematicians of France have
occupied themselves with this kind of calculus so that no one should attribute to me the honour of the first invention. This does not belong to
me. But these savants, although they put each other to the test by proposing to each other many questions difficult to solve, have hidden their
methods. I have had therefore to examine and go deeply for myself into this matter by beginning with the elements, and it is impossible for me
for this reason to affirm that I have even started from the same principle. But finally I have found that my answers in many cases do not differ
from theirs." (cited in Edwards (2002)). Thus, Huygens learned about de Méré's problem in 1655 during his visit to France; later on in 1656
from his correspondence with Carcavi he learned that his method was essentially the same as Pascal's; so that before his book went to press in
1657 he knew about Pascal's priority in this subject.
[5] "Earliest uses of symbols in probability and statistics" (http://jeff560.tripod.com/stat.html).
[6] Sheldon M Ross. "Example 2.15" (http://books.google.com/books?id=12Pk5zZFirEC&pg=PA39). Cited work. p. 39. ISBN 0125980620.
[7] Sheldon M Ross. "Chapter 3: Conditional probability and conditional expectation" (http://books.google.com/books?id=12Pk5zZFirEC&pg=PA97). Cited work. p. 97 ff. ISBN 0125980620.
[8] Sheldon M Ross. "3.4: Computing expectations by conditioning" (http://books.google.com/books?id=12Pk5zZFirEC&pg=PA105). Cited work. p. 105 ff. ISBN 0125980620.
Historical background
Edwards, A.W.F. (2002). Pascal's arithmetical triangle: the story of a mathematical idea (2nd ed.). JHU Press. ISBN 0-8018-6946-3.
Huygens, Christiaan (1657). De ratiociniis in ludo aleae (English translation, published in 1714: (http://www.york.ac.uk/depts/maths/histstat/huygens.pdf)).
External links
An 8-foot-tall (2.4 m) Probability Machine (named Sir Francis) comparing stock market returns to the randomness of the beans dropping through the quincunx pattern (http://www.youtube.com/watch?v=AUSKTk9ENzg), from Index Funds Advisors IFA.com (http://www.ifa.com), youtube.com
Expectation (http://planetmath.org/?op=getobj&from=objects&id=505) on PlanetMath
Ergodic theory
Ergodic theory is a branch of mathematics that studies dynamical systems with an invariant measure and related
problems. Its initial development was motivated by problems of statistical physics.
A central aspect of ergodic theory is the behavior of a dynamical system when it is allowed to run for a long time.
The first result in this direction is the Poincaré recurrence theorem, which claims that almost all points in any subset
of the phase space eventually revisit the set. More precise information is provided by various ergodic theorems
which assert that, under certain conditions, the time average of a function along the trajectories exists almost
everywhere and is related to the space average. Two of the most important examples are ergodic theorems of
Birkhoff and von Neumann. For the special class of ergodic systems, the time average is the same for almost all
initial points: statistically speaking, the system that evolves for a long time "forgets" its initial state. Stronger
properties, such as mixing and equidistribution, have also been extensively studied.
The problem of metric classification of systems is another important part of the abstract ergodic theory. An
outstanding role in ergodic theory and its applications to stochastic processes is played by the various notions of
entropy for dynamical systems.
Applications of ergodic theory to other parts of mathematics usually involve establishing ergodicity properties for
systems of special kind. In geometry, methods of ergodic theory have been used to study the geodesic flow on
Riemannian manifolds, starting with the results of Eberhard Hopf for Riemann surfaces of negative curvature.
Markov chains form a common context for applications in probability theory. Ergodic theory has fruitful connections
with harmonic analysis, Lie theory (representation theory, lattices in algebraic groups), and number theory (the
theory of Diophantine approximations, L-functions).
Ergodic transformations
Ergodic theory is often concerned with ergodic transformations.
Let T: X → X be a measure-preserving transformation on a measure space (X, Σ, μ), usually assumed to have finite measure. A measure-preserving transformation T as above is ergodic if for every E ∈ Σ with T⁻¹(E) = E, either μ(E) = 0 or μ(X \ E) = 0.
Examples
An irrational rotation of the circle R/Z, T: x ↦ x + θ, where θ is irrational, is ergodic. This transformation has even stronger properties of unique ergodicity, minimality, and equidistribution. By contrast, if θ = p/q is rational (in lowest terms) then T is periodic, with period q, and thus cannot be ergodic: for any interval I of length a, 0 < a < 1/q, its orbit under T is a T-invariant mod 0 set that is a union of q intervals of length a, hence it has measure qa strictly between 0 and 1.
Let G be a compact abelian group, μ the normalized Haar measure, and T a group automorphism of G. Let G* be the Pontryagin dual group, consisting of the continuous characters of G, and T* be the corresponding adjoint automorphism of G*. The automorphism T is ergodic if and only if the equality (T*)ⁿ(χ) = χ is possible only when n = 0 or χ is the trivial character of G. In particular, if G is the n-dimensional torus and the automorphism T is represented by an integral matrix A then T is ergodic if and only if no eigenvalue of A is a root of unity.
A Bernoulli shift is ergodic. More generally, ergodicity of the shift transformation associated with a sequence of
i.i.d. random variables and some more general stationary processes follows from Kolmogorov's zero-one law.
Ergodicity of a continuous dynamical system means that its trajectories "spread around" the phase space. A
system with a compact phase space which has a non-constant first integral cannot be ergodic. This applies, in
particular, to Hamiltonian systems with a first integral I functionally independent from the Hamilton function H
and a compact level set X = {(p,q): H(p,q)=E} of constant energy. Liouville's theorem implies the existence of a
finite invariant measure on X, but the dynamics of the system is constrained to the level sets of I on X, hence the
system possesses invariant sets of positive but less than full measure. A property of continuous dynamical
systems that is the opposite of ergodicity is complete integrability.
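The irrational-rotation example above can be checked numerically. The following minimal sketch (the rotation angle, observable, and step count are illustrative choices, not from the article) compares the time average of an observable along one orbit with its space average over the circle:

```python
import math

# Birkhoff averages for the irrational rotation T: x -> x + theta (mod 1).
# For an ergodic rotation, the time average of f along an orbit should
# approach the space average of f over [0, 1).

theta = math.sqrt(2) - 1                         # an irrational rotation angle
f = lambda x: math.cos(2 * math.pi * x) ** 2     # test observable; its space average is 1/2

n_steps = 100_000
x = 0.123                                        # arbitrary initial point
total = 0.0
for _ in range(n_steps):
    total += f(x)
    x = (x + theta) % 1.0

time_average = total / n_steps
space_average = 0.5                              # integral of cos^2(2*pi*x) over [0, 1)
print(time_average)                              # close to 0.5
```

For a rational angle θ = p/q the same loop would cycle through only q points, and the time average would generally depend on the starting point, in line with the failure of ergodicity described above.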
Ergodic theorems
Let T: X → X be a measure-preserving transformation on a measure space (X, Σ, μ). One may then consider the "time average" of a μ-integrable function f, i.e. f ∈ L¹(μ). The "time average" is defined as the average (if it exists) over iterations of T starting from some initial point x:

f̂(x) = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} f(Tᵏ x).

If μ(X) is finite and nonzero, we can consider the "space average" or "phase average" of f, defined as

f̄ = (1/μ(X)) ∫ f dμ.

In general the time average and space average may be different. But if the transformation is ergodic, and the measure is invariant, then the time average is equal to the space average almost everywhere. This is the celebrated ergodic theorem, in an abstract form due to George David Birkhoff. (Actually, Birkhoff's paper considers not the abstract general case but only the case of dynamical systems arising from differential equations on a smooth manifold.) The equidistribution theorem is a special case of the ergodic theorem, dealing specifically with the distribution of probabilities on the unit interval.
More precisely, the pointwise or strong ergodic theorem states that the limit in the definition of the time average of f exists for almost every x and that the (almost everywhere defined) limit function f̂ is integrable:

f̂ ∈ L¹(μ).
Furthermore, f̂ is T-invariant, that is to say

f̂ ∘ T = f̂

holds almost everywhere, and if μ(X) is finite, then the normalization is the same:

∫ f̂ dμ = ∫ f dμ.

In particular, if T is ergodic, then f̂ must be a constant (almost everywhere), and so one has that f̄ = f̂ almost everywhere. Joining the first to the last claim and assuming that μ(X) is finite and nonzero, one has that

lim_{n→∞} (1/n) Σ_{k=0}^{n−1} f(Tᵏ x) = (1/μ(X)) ∫ f dμ

for almost all x, i.e., for all x except for a set of measure zero.
For an ergodic transformation, the time average equals the space average almost surely.
As an example, assume that the measure space (X, Σ, μ) models the particles of a gas as above, and let f(x) denote the velocity of the particle at position x. Then the pointwise ergodic theorem says that the average velocity of all particles at some given time is equal to the average velocity of one particle over time.
Probabilistic formulation: Birkhoff–Khinchin theorem
Birkhoff–Khinchin theorem. Let f be measurable, E|f| < ∞, and T be a measure-preserving map. Then

lim_{n→∞} (1/n) Σ_{k=0}^{n−1} f(Tᵏ x) = E(f | C)  almost surely,

where E(f | C) is the conditional expectation given the σ-algebra C of invariant sets of T.
Corollary (Pointwise ergodic theorem): In particular, if T is also ergodic, then C is the trivial σ-algebra, and thus

lim_{n→∞} (1/n) Σ_{k=0}^{n−1} f(Tᵏ x) = E(f)  almost surely.
Mean ergodic theorem
Another form of the ergodic theorem, von Neumann's mean ergodic theorem, holds in Hilbert spaces.[1]
Let U be a unitary operator on a Hilbert space H; more generally, an isometric linear operator (that is, a not necessarily surjective linear operator satisfying ‖Ux‖ = ‖x‖ for all x in H, or equivalently, satisfying U*U = I, but not necessarily UU* = I). Let P be the orthogonal projection onto {ψ ∈ H : Uψ = ψ} = ker(I − U).
Then, for any x in H, we have:

lim_{N→∞} (1/N) Σ_{n=0}^{N−1} Uⁿ x = P x,

where the limit is with respect to the norm on H. In other words, the sequence of averages

(1/N) Σ_{n=0}^{N−1} Uⁿ

converges to P in the strong operator topology.
This theorem specializes to the case in which the Hilbert space H consists of L² functions on a measure space and U is an operator of the form

U f(x) = f(T x),
where T is a measure-preserving endomorphism of X, thought of in applications as representing a time-step of a discrete dynamical system.[2] The ergodic theorem then asserts that the average behavior of a function f over sufficiently large time-scales is approximated by the orthogonal component of f which is time-invariant.
In another form of the mean ergodic theorem, let U_t be a strongly continuous one-parameter group of unitary operators on H. Then the operator

(1/T) ∫_0^T U_t dt

converges in the strong operator topology as T → ∞. In fact, this result also extends to the case of a strongly continuous one-parameter semigroup of contractive operators on a reflexive space.
Remark: Some intuition for the mean ergodic theorem can be developed by considering the case where complex numbers of unit length are regarded as unitary transformations on the complex plane (by left multiplication). If we pick a single complex number of unit length (which we think of as U), it is intuitive that its powers will fill up the circle. Since the circle is symmetric around 0, it makes sense that the averages of the powers of U will converge to 0. Also, 0 is the only fixed point of U, and so the projection onto the space of fixed points must be the zero operator (which agrees with the limit just described).
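The remark above can be verified directly. In this sketch (the particular unit-modulus number and number of terms are illustrative choices), averaging the powers of a unit-length complex number different from 1 drives the mean toward 0, the projection onto the fixed-point space of multiplication by that number:

```python
import cmath

# Averages of powers of a unit-modulus complex number lam != 1 tend to 0,
# matching the projection onto the fixed points of z -> lam * z (only 0).
lam = cmath.exp(1j * 1.0)      # unit-length complex number, not a root of unity
N = 100_000
mean = sum(lam ** n for n in range(N)) / N
print(abs(mean))               # close to 0
```

By contrast, taking lam = 1 (whose fixed-point space is the whole plane) makes every average equal to 1, again matching the projection.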
Convergence of the ergodic means in the Lᵖ norms
Let (X, Σ, μ) be as above a probability space with a measure preserving transformation T, and let 1 ≤ p ≤ ∞. The conditional expectation with respect to the sub-σ-algebra Σ_T of the T-invariant sets is a linear projector E_T of norm 1 of the Banach space Lᵖ(X, Σ, μ) onto its closed subspace Lᵖ(X, Σ_T, μ). The latter may also be characterized as the space of all T-invariant Lᵖ-functions on X. The ergodic means, as linear operators on Lᵖ(X, Σ, μ), also have unit operator norm; and, as a simple consequence of the Birkhoff–Khinchin theorem, converge to the projector E_T in the strong operator topology of Lᵖ if 1 ≤ p < ∞, and in the weak operator topology if p = ∞. More is true: if 1 < p ≤ ∞, then the Wiener–Yoshida–Kakutani ergodic dominated convergence theorem states that the ergodic means of f ∈ Lᵖ are dominated in Lᵖ; however, if f ∈ L¹, the ergodic means may fail to be equidominated in L¹. Finally, if f is assumed to be in the Zygmund class, that is |f| log⁺(|f|) is integrable, then the ergodic means are even dominated in L¹.
Sojourn time
Let (X, Σ, μ) be a measure space such that μ(X) is finite and nonzero. The time spent in a measurable set A is called the sojourn time. An immediate consequence of the ergodic theorem is that, in an ergodic system, the relative measure of A is equal to the mean sojourn time:

μ(A)/μ(X) = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} χ_A(Tᵏ x)

for all x except for a set of measure zero, where χ_A is the indicator function of A.
Let the occurrence times of a measurable set A be defined as the set k₁, k₂, k₃, ..., of times k such that Tᵏ(x) is in A, sorted in increasing order. The differences between consecutive occurrence times Rᵢ = kᵢ − kᵢ₋₁ are called the recurrence times of A. Another consequence of the ergodic theorem is that the average recurrence time of A is inversely proportional to the measure of A, assuming that the initial point x is in A, so that k₀ = 0:

(R₁ + R₂ + ⋯ + Rₙ)/n → μ(X)/μ(A)  (almost surely).

(See almost surely.) That is, the smaller A is, the longer it takes to return to it.
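The recurrence-time statement can be illustrated numerically with an ergodic irrational rotation of the circle (the angle, the set A, and the number of steps below are illustrative choices, not from the article). With A an interval of length a, the mean gap between successive visits to A should approach μ(X)/μ(A) = 1/a:

```python
import math

# Mean recurrence time to A = [0, a) under T: x -> x + theta (mod 1).
theta = math.sqrt(2) - 1
a = 0.1
x = 0.0                    # start inside A, so k_0 = 0
visits = []
for k in range(1, 200_000):
    x = (x + theta) % 1.0
    if x < a:
        visits.append(k)

# Gaps between consecutive visit times are the recurrence times R_i.
gaps = [j - i for i, j in zip([0] + visits[:-1], visits)]
mean_recurrence = sum(gaps) / len(gaps)
print(mean_recurrence)     # close to 1/a = 10
```

Shrinking a makes the mean gap grow in proportion to 1/a, matching the qualitative statement that smaller sets take longer to return to.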
Ergodic flows on manifolds
The ergodicity of the geodesic flow on compact Riemann surfaces of variable negative curvature and on compact
manifolds of constant negative curvature of any dimension was proved by Eberhard Hopf in 1939, although special
cases had been studied earlier: see for example, Hadamard's billiards (1898) and Artin billiard (1924). The relation
between geodesic flows on Riemann surfaces and one-parameter subgroups on SL(2,R) was described in 1952 by S.
V. Fomin and I. M. Gelfand. The article on Anosov flows provides an example of ergodic flows on SL(2,R) and on
Riemann surfaces of negative curvature. Much of the development described there generalizes to hyperbolic
manifolds, since they can be viewed as quotients of the hyperbolic space by the action of a lattice in the semisimple
Lie group SO(n,1). Ergodicity of the geodesic flow on Riemannian symmetric spaces was demonstrated by F. I.
Mautner in 1957. In 1967 D. V. Anosov and Ya. G. Sinai proved ergodicity of the geodesic flow on compact
manifolds of variable negative sectional curvature. A simple criterion for the ergodicity of a homogeneous flow on a
homogeneous space of a semisimple Lie group was given by C. C. Moore in 1966. Many of the theorems and results
from this area of study are typical of rigidity theory.
In the 1930s G. A. Hedlund proved that the horocycle flow on a compact hyperbolic surface is minimal and ergodic. Unique ergodicity of the flow was established by Hillel Furstenberg in 1972. Ratner's theorems provide a major generalization of ergodicity for unipotent flows on the homogeneous spaces of the form Γ\G, where G is a Lie group and Γ is a lattice in G.
In the last 20 years, there have been many works trying to find a measure-classification theorem similar to Ratner's theorems but for diagonalizable actions, motivated by conjectures of Furstenberg and Margulis. An important partial result (solving those conjectures with an extra assumption of positive entropy) was proved by Elon Lindenstrauss, and he was awarded the Fields Medal in 2010 for this result.
See also
Chaos theory
Ergodic hypothesis
Ergodic process
Maximal ergodic theorem
Statistical mechanics
References
[1] Michael Reed and Barry Simon, Methods of Modern Mathematical Physics, I: Functional Analysis, Academic Press; revised edition (1980)
[2] (Walters 1982)
Historical references
Birkhoff, George David (1931), "Proof of the ergodic theorem" (http://www.pnas.org/cgi/reprint/17/12/656), Proc Natl Acad Sci USA 17 (12): 656–660, doi:10.1073/pnas.17.12.656, PMID 16577406, PMC 1076138.
Birkhoff, George David (1942), "What is the ergodic theorem?" (http://www.jstor.org/stable/2303229), American Mathematical Monthly (The American Mathematical Monthly, Vol. 49, No. 4) 49 (4): 222–226, doi:10.2307/2303229.
von Neumann, John (1932), "Proof of the Quasi-ergodic Hypothesis" (http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=1076162), Proc Natl Acad Sci USA 18 (1): 70–82, doi:10.1073/pnas.18.1.70, PMID 16577432, PMC 1076162.
von Neumann, John (1932), "Physical Applications of the Ergodic Hypothesis" (http://www.jstor.org/stable/86260), Proc Natl Acad Sci USA 18 (3): 263–266, doi:10.1073/pnas.18.3.263, PMID 16587674, PMC 1076204.
Ergodic theory
43
Hopf, Eberhard (1939), "Statistik der geodätischen Linien in Mannigfaltigkeiten negativer Krümmung", Leipzig Ber. Verhandl. Sächs. Akad. Wiss. 91: 261–304.
Fomin, Sergei V.; Gelfand, I. M. (1952), "Geodesic flows on manifolds of constant negative curvature", Uspehi Mat. Nauk 7 (1): 118–137.
Mautner, F. I. (1957), "Geodesic flows on symmetric Riemann spaces" (http://jstor.org/stable/1970054), Ann. of Math. (The Annals of Mathematics, Vol. 65, No. 3) 65 (3): 416–431, doi:10.2307/1970054.
Moore, C. C. (1966), "Ergodicity of flows on homogeneous spaces" (http://jstor.org/stable/2373052), Amer. J. Math. (American Journal of Mathematics, Vol. 88, No. 1) 88 (1): 154–178, doi:10.2307/2373052.
Modern references
D.V. Anosov (2001), "Ergodic theory" (http://eom.springer.de/e/e036150.htm), in Hazewinkel, Michiel, Encyclopaedia of Mathematics, Springer, ISBN 978-1556080104
This article incorporates material from ergodic theorem on PlanetMath, which is licensed under the Creative Commons Attribution/Share-Alike License.
Vladimir Igorevich Arnol'd and André Avez, Ergodic Problems of Classical Mechanics. New York: W.A. Benjamin. 1968.
Leo Breiman, Probability. Original edition published by Addison-Wesley, 1968; reprinted by Society for Industrial and Applied Mathematics, 1992. ISBN 0-89871-296-3. (See Chapter 6.)
Peter Walters, An introduction to ergodic theory, Springer, New York, 1982, ISBN 0-387-95152-0.
Tim Bedford, Michael Keane and Caroline Series, eds. (1991), Ergodic theory, symbolic dynamics and hyperbolic spaces, Oxford University Press, ISBN 0-19-853390-X (A survey of topics in ergodic theory; with exercises.)
Karl Petersen. Ergodic Theory (Cambridge Studies in Advanced Mathematics). Cambridge: Cambridge University Press. 1990.
Joseph M. Rosenblatt and Máté Wierdl, Pointwise ergodic theorems via harmonic analysis, (1993) appearing in Ergodic Theory and its Connections with Harmonic Analysis, Proceedings of the 1993 Alexandria Conference, (1995) Karl E. Petersen and Ibrahim A. Salama, eds., Cambridge University Press, Cambridge, ISBN 0-521-45999-0. (An extensive survey of the ergodic properties of generalizations of the equidistribution theorem of shift maps on the unit interval. Focuses on methods developed by Bourgain.)
A.N. Shiryaev, Probability, 2nd ed., Springer 1996, Sec. V.3. ISBN 0-387-94549-0.
External links
Ergodic Theory (29 October 2007) (http://www.cscs.umich.edu/~crshalizi/notebooks/ergodic-theory.html) Notes by Cosma Rohilla Shalizi
Feynman–Kac formula
The Feynman–Kac formula, named after Richard Feynman and Mark Kac, establishes a link between parabolic partial differential equations (PDEs) and stochastic processes. It offers a method of solving certain PDEs by simulating random paths of a stochastic process. Conversely, an important class of expectations of random processes can be computed by deterministic methods. Consider the PDE

∂u/∂t + μ(x,t) ∂u/∂x + ½ σ²(x,t) ∂²u/∂x² − V(x,t) u(x,t) + f(x,t) = 0,

defined for all x in R and t in the interval [0, T], subject to the terminal condition

u(x, T) = ψ(x),

where μ, σ, ψ, V, f are known functions, T is a parameter and u is the unknown. Then the Feynman–Kac formula tells us that the solution can be written as an expectation:

u(x, t) = E[ ∫_t^T e^(−∫_t^r V(X_τ, τ) dτ) f(X_r, r) dr + e^(−∫_t^T V(X_τ, τ) dτ) ψ(X_T) | X_t = x ],

where X is an Itô process driven by the equation

dX = μ(X, t) dt + σ(X, t) dW,

with W a Wiener process (also called Brownian motion) and the initial condition for X is X_t = x.
This expectation can then be approximated using Monte Carlo or quasi-Monte Carlo methods.
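The Monte Carlo approach can be sketched in a deliberately simple special case (all parameter choices here are illustrative, not from the article): take μ = 0, σ = 1, V = 0 and f = 0, so the PDE reduces to ∂u/∂t + ½ ∂²u/∂x² = 0 with terminal data u(x, T) = ψ(x), whose Feynman–Kac representation is u(x, t) = E[ψ(X_T) | X_t = x] with X a Brownian motion. For ψ(x) = x² the exact solution is u(x, t) = x² + (T − t):

```python
import random

# Monte Carlo evaluation of the Feynman-Kac expectation in the driftless,
# potential-free case: X_T ~ N(x, T - t), u(x, t) = E[psi(X_T)].
random.seed(0)

def feynman_kac_mc(x, t, T, psi, n_paths=200_000):
    dt = T - t
    total = 0.0
    for _ in range(n_paths):
        x_T = x + random.gauss(0.0, dt ** 0.5)   # sample the terminal state
        total += psi(x_T)
    return total / n_paths

estimate = feynman_kac_mc(x=1.0, t=0.0, T=1.0, psi=lambda y: y * y)
print(estimate)          # close to the exact value 1.0**2 + (1.0 - 0.0) = 2.0
```

For nonzero V the same scheme applies with each path weighted by the exponential discount factor appearing in the expectation above.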
Proof
NOTE: The proof presented below is essentially that of [1], albeit with more detail. Let u(x, t) be the solution to the above PDE. Applying Itô's lemma to the process

Y(s) = e^(−∫_t^s V(X_τ, τ) dτ) u(X_s, s) + ∫_t^s e^(−∫_t^r V(X_τ, τ) dτ) f(X_r, r) dr

one gets

dY = d( e^(−∫_t^s V dτ) ) u(X_s, s) + e^(−∫_t^s V dτ) du(X_s, s) + d( e^(−∫_t^s V dτ) ) du(X_s, s) + e^(−∫_t^s V dτ) f(X_s, s) ds.

Since d( e^(−∫_t^s V dτ) ) = −V(X_s, s) e^(−∫_t^s V dτ) ds, the third term is O(ds · du) and can be dropped. Applying Itô's lemma once again to du(X_s, s), it follows that

dY = e^(−∫_t^s V dτ) ( ∂u/∂s + μ(X_s, s) ∂u/∂x + ½ σ²(X_s, s) ∂²u/∂x² − V(X_s, s) u + f(X_s, s) ) ds + e^(−∫_t^s V dτ) σ(X_s, s) (∂u/∂x) dW.

The first term contains, in parentheses, the above PDE and is therefore zero. What remains is

dY = e^(−∫_t^s V dτ) σ(X_s, s) (∂u/∂x) dW.

Integrating this equation from t to T, one concludes that

Y(T) − Y(t) = ∫_t^T e^(−∫_t^s V dτ) σ(X_s, s) (∂u/∂x) dW.

Upon taking expectations, conditioned on X_t = x, and observing that the right side is an Itô integral, which has expectation zero, it follows that E[Y(T) | X_t = x] = Y(t) = u(x, t). The desired result is obtained by observing that

E[Y(T) | X_t = x] = E[ e^(−∫_t^T V dτ) ψ(X_T) + ∫_t^T e^(−∫_t^r V dτ) f(X_r, r) dr | X_t = x ].
Remarks
When originally published by Kac in 1949,[2] the Feynman–Kac formula was presented as a formula for determining the distribution of certain Wiener functionals. Suppose we wish to find the expected value of the function

e^(−u ∫_0^t V(x(τ)) dτ)

in the case where x(τ) is some realization of a diffusion process starting at x(0) = 0. The Feynman–Kac formula says that this expectation is equivalent to the integral of a solution to a diffusion equation. Specifically, under the conditions that u V(x) ≥ 0,

E[ e^(−u ∫_0^t V(x(τ)) dτ) ] = ∫_{−∞}^{∞} w(x, t) dx,
where w(x, 0) = δ(x) and

∂w/∂t = ½ ∂²w/∂x² − u V(x) w.

The Feynman–Kac formula can also be interpreted as a method for evaluating functional integrals of a certain form. If

I = ∫ f(x(0)) e^(−u ∫_0^t V(x(τ)) dτ) g(x(t)) Dx,

where the integral is taken over all random walks, then

I = ∫ w(x, t) g(x) dx,

where w(x, t) is a solution to the parabolic partial differential equation

∂w/∂t = ½ ∂²w/∂x² − u V(x) w

with initial condition w(x, 0) = f(x).
See also
Itô's lemma
Kunita–Watanabe theorem
Girsanov theorem
Kolmogorov forward equation (also known as Fokker–Planck equation)
References
Simon, Barry (1979). Functional Integration and Quantum Physics. Academic Press.
[1] http://www.math.nyu.edu/faculty/kohn/pde_finance.html
[2] Kac, Mark (1949). "On Distributions of Certain Wiener Functionals" (http://www.jstor.org/stable/1990512). Transactions of the American Mathematical Society 65 (1): 1–13. doi:10.2307/1990512. Retrieved 2008-05-30.
Fourier transform
In mathematics, the Fourier transform (often abbreviated FT) is an operation that transforms one complex-valued
function of a real variable into another. In such applications as signal processing, the domain of the original function
is typically time and is accordingly called the time domain. The domain of the new function is typically called the
frequency domain, and the new function itself is called the frequency domain representation of the original function.
It describes which frequencies are present in the original function. This is analogous to describing a musical chord in
terms of the individual notes being played. In effect, the Fourier transform decomposes a function into oscillatory
functions. The term Fourier transform refers both to the frequency domain representation of a function, and to the
process or formula that "transforms" one function into the other.
The Fourier transform and its generalizations are the subject of Fourier analysis. In this specific case, both the time
and frequency domains are unbounded linear continua. It is possible to define the Fourier transform of a function of
several variables, which is important for instance in the physical study of wave motion and optics. It is also possible
to generalize the Fourier transform on discrete structures such as finite groups. The efficient computation of such
structures, by fast Fourier transform, is essential for high-speed computing.
Fourier transforms
Continuous Fourier transform
Fourier series
Discrete Fourier transform
Discrete-time Fourier transform
Related transforms
Definition
There are several common conventions for defining the Fourier transform of an integrable function f: R → C (Kaiser 1994). This article will use the definition:

f̂(ξ) = ∫_{−∞}^{∞} f(x) e^(−2πixξ) dx,  for every real number ξ.

When the independent variable x represents time (with SI unit of seconds), the transform variable ξ represents frequency (in hertz). Under suitable conditions, f can be reconstructed from f̂ by the inverse transform:

f(x) = ∫_{−∞}^{∞} f̂(ξ) e^(2πixξ) dξ,  for every real number x.

For other common conventions and notations, including using the angular frequency ω instead of the frequency ξ, see Other conventions and Other notations below. The Fourier transform on Euclidean space is treated separately, in which the variable x often represents position and ξ momentum.
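The defining integral can be evaluated numerically by a simple Riemann sum. The sketch below (the grid parameters are illustrative choices) uses the standard Gaussian f(x) = exp(−πx²), whose Fourier transform under this convention is again exp(−πξ²):

```python
import math

# Riemann-sum approximation of f_hat(xi) = integral of f(x) exp(-2*pi*i*x*xi) dx.
def fourier_transform(f, xi, lo=-8.0, hi=8.0, n=4000):
    dx = (hi - lo) / n
    re = im = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx            # midpoint rule
        phase = -2.0 * math.pi * x * xi
        re += f(x) * math.cos(phase) * dx
        im += f(x) * math.sin(phase) * dx
    return re, im

f = lambda x: math.exp(-math.pi * x * x)    # self-transform Gaussian
re, im = fourier_transform(f, xi=1.0)
print(re, im)    # re is approximately exp(-pi) ~ 0.0432, im is approximately 0
```

Because f here is real and even, the transform is real as well, which is why the imaginary part vanishes up to discretization error.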
Introduction
The motivation for the Fourier transform comes from the study of Fourier series. In the study of Fourier series, complicated periodic functions are written as the sum of simple waves mathematically represented by sines and cosines. Due to the properties of sine and cosine it is possible to recover the amount of each wave in the sum by an integral. In many cases it is desirable to use Euler's formula, which states that e^(2πiθ) = cos(2πθ) + i sin(2πθ), to write Fourier series in terms of the basic waves e^(2πiθ). This has the advantage of simplifying many of the formulas involved and providing a formulation for Fourier series that more closely resembles the definition followed in this article. This passage from sines and cosines to complex exponentials makes it necessary for the Fourier coefficients to be complex valued. The usual interpretation of this complex number is that it gives both the amplitude (or size) of the wave present in the function and the phase (or the initial angle) of the wave. This passage also introduces the need for negative "frequencies". If θ were measured in seconds then the waves e^(2πiθ) and e^(−2πiθ) would both complete one cycle per second, but they represent different frequencies in the Fourier transform. Hence, frequency no longer measures the number of cycles per unit time, but is closely related.
We may use Fourier series to motivate the Fourier transform as follows. Suppose that f is a function which is zero outside of some interval [−L/2, L/2]. Then for any T ≥ L we may expand f in a Fourier series on the interval [−T/2, T/2], where the "amount" (denoted by c_n) of the wave e^(2πinx/T) in the Fourier series of f is given by

c_n = (1/T) ∫_{−T/2}^{T/2} f(x) e^(−2πinx/T) dx = (1/T) f̂(n/T),

and f should be given by the formula

f(x) = Σ_{n=−∞}^{∞} f̂(n/T) e^(2πinx/T) (1/T).

If we let ξ_n = n/T, and we let Δξ = (n+1)/T − n/T = 1/T, then this last sum becomes the Riemann sum

f(x) = Σ_{n=−∞}^{∞} f̂(ξ_n) e^(2πiξ_n x) Δξ.

By letting T → ∞ this Riemann sum converges to the integral for the inverse Fourier transform given in the Definition section. Under suitable conditions this argument may be made precise (Stein & Shakarchi 2003). Hence, as in the case of Fourier series, the Fourier transform can be thought of as a function that measures how much of each individual frequency is present in our function, and we can recombine these waves by using an integral (or "continuous sum") to reproduce the original function.
The following images provide a visual illustration of how the Fourier transform measures whether a frequency is present in a particular function. The function depicted, f(t) = cos(6πt) e^(−πt²), oscillates at 3 hertz (if t measures seconds) and tends quickly to 0. This function was specially chosen to have a real Fourier transform which can easily be plotted. The first image contains its graph. In order to calculate f̂(3) we must integrate e^(−2πi(3t)) f(t). The second image shows the plot of the real and imaginary parts of this function. The real part of the integrand is almost always positive; this is because when f(t) is negative, the real part of e^(−2πi(3t)) is negative as well. Because they oscillate at the same rate, when f(t) is positive, so is the real part of e^(−2πi(3t)). The result is that when you integrate the real part of the integrand you get a relatively large number (in this case 0.5). On the other hand, when you try to measure a frequency that is not present, as in the case when we look at f̂(5), the integrand oscillates enough so that the integral is very small. The general situation may be a bit more complicated than this, but this in spirit is how the Fourier transform measures how much of an individual frequency is present in a function f(t).
[Figures: the original function oscillating at 3 hertz; the real and imaginary parts of the integrand for the Fourier transform at 3 hertz; the real and imaginary parts of the integrand for the Fourier transform at 5 hertz; the Fourier transform with 3 and 5 hertz labeled.]
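The worked example above can be reproduced numerically. A function oscillating at 3 hertz with a rapidly decaying envelope, such as f(t) = cos(6πt) exp(−πt²), gives a transform of about 0.5 at 3 hertz and a tiny value at the absent frequency 5 hertz (the integration grid below is an illustrative choice):

```python
import math

# Real part of the transform integral at frequency xi, by midpoint Riemann sum.
def ft_real(f, xi, lo=-8.0, hi=8.0, n=8000):
    dt = (hi - lo) / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * dt
        total += f(t) * math.cos(-2.0 * math.pi * t * xi) * dt
    return total

f = lambda t: math.cos(6 * math.pi * t) * math.exp(-math.pi * t * t)
at3 = ft_real(f, 3.0)   # frequency present in f
at5 = ft_real(f, 5.0)   # frequency absent from f
print(at3, at5)         # about 0.5, and nearly 0
```

The imaginary part of the integrand is omitted here because f is real and even, so its transform is real, as the surrounding text notes.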
Properties of the Fourier transform
An integrable function is a function f on the real line that is Lebesgue-measurable and satisfies

∫_{−∞}^{∞} |f(x)| dx < ∞.
Basic properties
Given integrable functions f(x), g(x), and h(x), denote their Fourier transforms by f̂(ξ), ĝ(ξ), and ĥ(ξ) respectively. The Fourier transform has the following basic properties (Pinsky 2002).
Linearity
For any complex numbers a and b, if h(x) = a f(x) + b g(x), then ĥ(ξ) = a f̂(ξ) + b ĝ(ξ).
Translation
For any real number x₀, if h(x) = f(x − x₀), then ĥ(ξ) = e^(−2πix₀ξ) f̂(ξ).
Modulation
For any real number ξ₀, if h(x) = e^(2πixξ₀) f(x), then ĥ(ξ) = f̂(ξ − ξ₀).
Scaling
For a non-zero real number a, if h(x) = f(ax), then ĥ(ξ) = (1/|a|) f̂(ξ/a). The case a = −1 leads to the time-reversal property, which states: if h(x) = f(−x), then ĥ(ξ) = f̂(−ξ).
Conjugation
If h(x) = conj(f(x)), where conj denotes the complex conjugate, then ĥ(ξ) = conj(f̂(−ξ)).
In particular, if f is real, then one has the reality condition f̂(−ξ) = conj(f̂(ξ)).
And if f is purely imaginary, then f̂(−ξ) = −conj(f̂(ξ)).
Convolution
If h(x) = (f ∗ g)(x), then ĥ(ξ) = f̂(ξ) ĝ(ξ).
Uniform continuity and the Riemann–Lebesgue lemma
[Figures: the rectangular function, which is Lebesgue integrable, and the sinc function, its Fourier transform, which is bounded and continuous but not Lebesgue integrable.]
The Fourier transform of an integrable function is bounded and continuous, but need not be integrable; for example, the Fourier transform of the rectangular function, which is a step function (and hence integrable), is the sinc function, which is not Lebesgue integrable, though it does have an improper integral: one has an analog to the alternating harmonic series, which is a convergent sum but not absolutely convergent.
It is not possible in general to write the inverse transform as a Lebesgue integral. However, when both f and f̂ are integrable, the following inverse equality holds true for almost every x:

f(x) = ∫_{−∞}^{∞} f̂(ξ) e^(2πixξ) dξ.

Almost everywhere, f is equal to the continuous function given by the right-hand side. If f is given as a continuous function on the line, then equality holds for every x.
A consequence of the preceding result is that the Fourier transform is injective on L¹(R).
The Plancherel theorem and Parseval's theorem
Let f(x) and g(x) be integrable, and let f̂(ξ) and ĝ(ξ) be their Fourier transforms. If f(x) and g(x) are also square-integrable, then we have Parseval's theorem (Rudin 1987, p. 187):

∫_{−∞}^{∞} f(x) conj(g(x)) dx = ∫_{−∞}^{∞} f̂(ξ) conj(ĝ(ξ)) dξ,

where conj denotes complex conjugation.
The Plancherel theorem, which is equivalent to Parseval's theorem, states (Rudin 1987, p. 186):

∫_{−∞}^{∞} |f(x)|² dx = ∫_{−∞}^{∞} |f̂(ξ)|² dξ.

The Plancherel theorem makes it possible to define the Fourier transform for functions in L²(R), as described in Generalizations below. The Plancherel theorem has the interpretation in the sciences that the Fourier transform preserves the energy of the original quantity. Depending on the author, either of these theorems might be referred to as the Plancherel theorem or as Parseval's theorem.
See Pontryagin duality for a general formulation of this concept in the context of locally compact abelian groups.
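The energy-preservation statement can be checked numerically. In this sketch (the test function and integration grids are illustrative choices), the energy of f computed in the x domain agrees with the energy of its transform computed in the frequency domain:

```python
import math

# Riemann-sum approximation of the Fourier transform at frequency xi.
def ft(f, xi, lo=-8.0, hi=8.0, n=2000):
    dx = (hi - lo) / n
    total = 0.0 + 0.0j
    for k in range(n):
        x = lo + (k + 0.5) * dx
        total += f(x) * complex(math.cos(-2 * math.pi * x * xi),
                                math.sin(-2 * math.pi * x * xi)) * dx
    return total

f = lambda x: (1.0 + x) * math.exp(-x * x)

# Energy in the x domain.
dx = 16.0 / 4000
energy_x = sum(f(-8.0 + (k + 0.5) * dx) ** 2 * dx for k in range(4000))

# Energy in the xi domain (the transform decays very fast, so [-4, 4] suffices).
dxi = 8.0 / 400
energy_xi = sum(abs(ft(f, -4.0 + (k + 0.5) * dxi)) ** 2 * dxi for k in range(400))

print(energy_x, energy_xi)   # the two energies agree closely
```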
Poisson summation formula
The Poisson summation formula provides a link between the study of Fourier transforms and Fourier series. Given an integrable function f we can consider the periodic summation of f given by:

φ(x) = Σ_{k=−∞}^{∞} f(x + k),

where the summation is taken over the set of all integers k. The Poisson summation formula relates the Fourier series of φ to the Fourier transform of f. Specifically it states that the Fourier series of φ is given by:

φ(x) = Σ_{n=−∞}^{∞} f̂(n) e^(2πinx).
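Evaluating the identity above at x = 0 gives Σ_k f(k) = Σ_n f̂(n), which can be checked numerically for a Gaussian (the width parameter a = 2 below is an illustrative choice): f(x) = exp(−πax²) has transform f̂(ξ) = a^(−1/2) exp(−πξ²/a), and both sums converge extremely fast:

```python
import math

# Poisson summation at x = 0 for a Gaussian: sum of samples of f equals
# sum of samples of its Fourier transform.  Truncating at |k| <= 20 is far
# beyond the point where the terms vanish numerically.
a = 2.0
lhs = sum(math.exp(-math.pi * a * k * k) for k in range(-20, 21))
rhs = sum(math.exp(-math.pi * n * n / a) / math.sqrt(a) for n in range(-20, 21))
print(lhs, rhs)   # the two sums agree
```

This identity for Gaussians is the classical functional equation of the Jacobi theta function.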
Convolution theorem
The Fourier transform translates between convolution and multiplication of functions. If f(x) and g(x) are integrable functions with Fourier transforms f̂(ξ) and ĝ(ξ) respectively, then the Fourier transform of the convolution is given by the product of the Fourier transforms f̂(ξ) and ĝ(ξ) (under other conventions for the definition of the Fourier transform a constant factor may appear).
This means that if:

h(x) = (f ∗ g)(x) = ∫_{−∞}^{∞} f(y) g(x − y) dy,

where ∗ denotes the convolution operation, then:

ĥ(ξ) = f̂(ξ) ĝ(ξ).

In linear time invariant (LTI) system theory, it is common to interpret g(x) as the impulse response of an LTI system with input f(x) and output h(x), since substituting the unit impulse for f(x) yields h(x) = g(x). In this case, ĝ(ξ) represents the frequency response of the system.
Conversely, if f(x) can be decomposed as the product of two square integrable functions p(x) and q(x), then the Fourier transform of f(x) is given by the convolution of the respective Fourier transforms p̂(ξ) and q̂(ξ).
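A discrete analogue of the convolution theorem (for the discrete Fourier transform rather than the continuous transform discussed above) can be verified exactly: the DFT of a circular convolution is the pointwise product of the DFTs. The tiny hand-rolled DFT below keeps the sketch self-contained; the sample sequences are arbitrary:

```python
import math

def dft(seq):
    """Naive discrete Fourier transform of a sequence of numbers."""
    N = len(seq)
    return [sum(seq[n] * complex(math.cos(-2 * math.pi * k * n / N),
                                 math.sin(-2 * math.pi * k * n / N))
                for n in range(N))
            for k in range(N)]

def circular_convolution(f, g):
    N = len(f)
    return [sum(f[m] * g[(n - m) % N] for m in range(N)) for n in range(N)]

f = [1.0, 2.0, 0.0, -1.0]
g = [0.5, 0.0, 1.0, 0.0]
lhs = dft(circular_convolution(f, g))          # transform of the convolution
rhs = [a * b for a, b in zip(dft(f), dft(g))]  # product of the transforms
print(all(abs(x - y) < 1e-9 for x, y in zip(lhs, rhs)))   # True
```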
Cross-correlation theorem
In an analogous manner, it can be shown that if h(x) is the cross-correlation of f(x) and g(x):

h(x) = (f ★ g)(x) = ∫_{−∞}^{∞} conj(f(y)) g(x + y) dy,

then the Fourier transform of h(x) is:

ĥ(ξ) = conj(f̂(ξ)) ĝ(ξ).

As a special case, the autocorrelation of function f(x) is:

h(x) = (f ★ f)(x) = ∫_{−∞}^{∞} conj(f(y)) f(x + y) dy,

for which

ĥ(ξ) = conj(f̂(ξ)) f̂(ξ) = |f̂(ξ)|².
Eigenfunctions
One important choice of an orthonormal basis for L
2
(R) is given by the Hermite functions
where are the "probabilist's" Hermite polynomials, defined by H
n
(x)= (1)
n
exp(x
2
/2)D
n
exp(x
2
/2). Under
this convention for the Fourier transform, we have that
In other words, the Hermite functions form a complete orthonormal system of eigenfunctions for the Fourier
transform on L²(R) (Pinsky 2002). However, this choice of eigenfunctions is not unique. There are only four
different eigenvalues of the Fourier transform (±1 and ±i) and any linear combination of eigenfunctions with the
same eigenvalue gives another eigenfunction. As a consequence of this, it is possible to decompose L²(R) as a direct
sum of four spaces H₀, H₁, H₂, and H₃ where the Fourier transform acts on H_k simply by multiplication by i^k. This
approach to define the Fourier transform is due to N. Wiener (Duoandikoetxea 2001). The choice of Hermite
functions is convenient because they are exponentially localized in both frequency and time domains, and thus give
rise to the fractional Fourier transform used in time–frequency analysis (Boashash 2003).
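The first two eigenvalue relations can be verified numerically. The sketch below uses plain Riemann-sum quadrature under the e^{−2πixξ} convention of this article, and checks that e^{−πx²} is fixed by the transform (eigenvalue 1) while x e^{−πx²} picks up the eigenvalue −i (the functions are the Hermite functions for n = 0 and n = 1 up to normalization):

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

def ft(f, xi):
    # Fourier transform under the e^{-2 pi i x xi} convention, via Riemann sum
    return np.sum(f * np.exp(-2j * np.pi * x * xi)) * dx

h0 = np.exp(-np.pi * x**2)       # n = 0 Hermite function: eigenvalue (-i)^0 = 1
h1 = x * np.exp(-np.pi * x**2)   # n = 1 (up to normalization): eigenvalue -i

for xi in (0.3, 1.1):
    assert abs(ft(h0, xi) - np.exp(-np.pi * xi**2)) < 1e-8
    assert abs(ft(h1, xi) - (-1j) * xi * np.exp(-np.pi * xi**2)) < 1e-8
```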
Fourier transform on Euclidean space
The Fourier transform can be in any arbitrary number of dimensions n. As with the one-dimensional case there are
many conventions, for an integrable function (x) this article takes the definition:
where x and are n-dimensional vectors, and x is the dot product of the vectors. The dot product is sometimes
written as .
All of the basic properties listed above hold for the n-dimensional Fourier transform, as do Plancherel's and
Parseval's theorem. When the function is integrable, the Fourier transform is still uniformly continuous and the
RiemannLebesgue lemma holds. (Stein & Weiss 1971)
Uncertainty principle
Generally speaking, the more concentrated f(x) is, the more spread out its Fourier transform f̂(ξ) must be. In
particular, the scaling property of the Fourier transform may be seen as saying: if we "squeeze" a function in x, its
Fourier transform "stretches out" in ξ. It is not possible to arbitrarily concentrate both a function and its Fourier
transform.
The trade-off between the compaction of a function and its Fourier transform can be formalized in the form of an
uncertainty principle by viewing a function and its Fourier transform as conjugate variables with respect to the
symplectic form on the time–frequency domain: from the point of view of the linear canonical transformation, the
Fourier transform is rotation by 90° in the time–frequency domain, and preserves the symplectic form.
Suppose f(x) is an integrable and square-integrable function. Without loss of generality, assume that f(x) is
normalized:
∫ |f(x)|² dx = 1.
It follows from the Plancherel theorem that f̂(ξ) is also normalized.
The spread around x = 0 may be measured by the dispersion about zero (Pinsky 2002) defined by
D₀(f) = ∫ x² |f(x)|² dx.
In probability terms, this is the second moment of |f(x)|² about zero.
The uncertainty principle states that, if f(x) is absolutely continuous and the functions x·f(x) and f′(x) are square
integrable, then
D₀(f) · D₀(f̂) ≥ 1/(16π²)
(Pinsky 2002).
The equality is attained only in the case f(x) = C₁ e^{−πσx²} (hence f̂(ξ) = σ^{−1/2} C₁ e^{−πξ²/σ}) where σ > 0
is arbitrary and C₁ is such that f is L²-normalized (Pinsky 2002). In other words, where f is a (normalized) Gaussian
function, centered at zero.
In fact, this inequality implies that:
∫ (x − x₀)² |f(x)|² dx · ∫ (ξ − ξ₀)² |f̂(ξ)|² dξ ≥ 1/(16π²)
for any x₀, ξ₀ in R (Stein & Shakarchi 2003).
In quantum mechanics, the momentum and position wave functions are Fourier transform pairs, to within a factor of
Planck's constant. With this constant properly taken into account, the inequality above becomes the statement of the
Heisenberg uncertainty principle (Stein & Shakarchi 2003).
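The equality case can be checked numerically: under the e^{−2πixξ} convention used here, the L²-normalized Gaussian 2^{1/4} e^{−πx²} is its own Fourier transform, so both dispersions equal 1/(4π) and their product is exactly 1/(16π²). A sketch with plain Riemann-sum quadrature (grid limits and tolerances are arbitrary choices):

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
f = 2**0.25 * np.exp(-np.pi * x**2)   # L2-normalized Gaussian, its own transform

norm = np.sum(f**2) * dx              # should be 1
D0 = np.sum(x**2 * f**2) * dx         # dispersion about zero, equals 1/(4 pi)

assert abs(norm - 1.0) < 1e-10
# the Gaussian attains equality in the uncertainty principle
assert abs(D0 * D0 - 1.0 / (16 * np.pi**2)) < 1e-10
```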
Spherical harmonics
Let the set of homogeneous harmonic polynomials of degree k on R^n be denoted by A_k. The set A_k consists of the
solid spherical harmonics of degree k. The solid spherical harmonics play a similar role in higher dimensions to the
Hermite polynomials in dimension one. Specifically, if f(x) = e^{−π|x|²} P(x) for some P(x) in A_k, then
f̂(ξ) = i^{−k} f(ξ). Let the set H_k be the closure in L²(R^n) of linear combinations of functions of the form f(|x|)P(x)
where P(x) is in A_k. The space L²(R^n) is then a direct sum of the spaces H_k, and the Fourier transform maps each
space H_k to itself, and it is possible to characterize the action of the Fourier transform on each space H_k (Stein & Weiss
1971). Let f(x) = f₀(|x|)P(x) (with P(x) in A_k); then f̂(ξ) = F₀(|ξ|)P(ξ), where
F₀(r) = 2π i^{−k} r^{−(n+2k−2)/2} ∫₀^∞ f₀(s) J_{(n+2k−2)/2}(2πrs) s^{(n+2k)/2} ds.
Here J_{(n+2k−2)/2} denotes the Bessel function of the first kind with order (n+2k−2)/2. When k = 0 this gives a
useful formula for the Fourier transform of a radial function (Grafakos 2004).
Restriction problems
In higher dimensions it becomes interesting to study restriction problems for the Fourier transform. The Fourier
transform of an integrable function is continuous and the restriction of this function to any set is defined. But for a
square-integrable function the Fourier transform could be a general class of square integrable functions. As such, the
restriction of the Fourier transform of an L²(R^n) function cannot be defined on sets of measure 0. It is still an active
area of study to understand restriction problems in L^p for 1 < p < 2. Surprisingly, it is possible in some cases to
define the restriction of a Fourier transform to a set S, provided S has non-zero curvature. The case when S is the unit
sphere in R^n is of particular interest. In this case the Tomas–Stein restriction theorem states that the restriction of the
Fourier transform to the unit sphere in R^n is a bounded operator on L^p provided 1 ≤ p ≤ (2n + 2)/(n + 3).
One notable difference between the Fourier transform in 1 dimension versus higher dimensions concerns the partial
sum operator. Consider an increasing collection of measurable sets E_R indexed by R ∈ (0, ∞): such as balls of radius
R centered at the origin, or cubes of side 2R. For a given integrable function f, consider the function f_R defined by:
f_R(x) = ∫_{E_R} f̂(ξ) e^{2πi x·ξ} dξ, x ∈ R^n.
Suppose in addition that f is in L^p(R^n). For n = 1 and 1 < p < ∞, if one takes E_R = (−R, R), then f_R converges to f in
L^p as R tends to infinity, by the boundedness of the Hilbert transform. Naively one may hope the same holds true for
n > 1. In the case that E_R is taken to be a cube with side length R, then convergence still holds. Another natural
candidate is the Euclidean ball E_R = {ξ : |ξ| < R}. In order for this partial sum operator to converge, it is necessary
that the multiplier for the unit ball be bounded in L^p(R^n). For n ≥ 2 it is a celebrated theorem of Charles Fefferman
that the multiplier for the unit ball is never bounded unless p = 2 (Duoandikoetxea 2001). In fact, when p ≠ 2, this
shows that not only may f_R fail to converge to f in L^p, but for some functions f ∈ L^p(R^n), f_R is not even an element of
L^p.
Generalizations
Fourier transform on other function spaces
It is possible to extend the definition of the Fourier transform to other spaces of functions. Since compactly
supported smooth functions are integrable and dense in L²(R), the Plancherel theorem allows us to extend the
definition of the Fourier transform to general functions in L²(R) by continuity arguments. Further, ℱ: L²(R) →
L²(R) is a unitary operator (Stein & Weiss 1971, Thm. 2.3). Many of the properties remain the same for the Fourier
transform. The Hausdorff–Young inequality can be used to extend the definition of the Fourier transform to include
functions in L^p(R) for 1 ≤ p ≤ 2. Unfortunately, further extensions become more technical. The Fourier transform of
functions in L^p for the range 2 < p < ∞ requires the study of distributions (Katznelson 1976). In fact, it can be shown
that there are functions in L^p with p > 2 so that the Fourier transform is not defined as a function (Stein & Weiss
1971).
Fourier–Stieltjes transform
The Fourier transform of a finite Borel measure μ on R^n is given by (Pinsky 2002):
μ̂(ξ) = ∫_{R^n} e^{−2πi x·ξ} dμ(x).
This transform continues to enjoy many of the properties of the Fourier transform of integrable functions. One
notable difference is that the Riemann–Lebesgue lemma fails for measures (Katznelson 1976). In the case that
dμ = f(x) dx, then the formula above reduces to the usual definition for the Fourier transform of f. In the case that μ is
the probability distribution associated to a random variable X, the Fourier–Stieltjes transform is closely related to the
characteristic function, but the typical conventions in probability theory take e^{iξx} instead of e^{−2πiξx} (Pinsky 2002). In
the case when the distribution has a probability density function this definition reduces to the Fourier transform
applied to the probability density function, again with a different choice of constants.
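The probability convention is easy to exercise numerically for a standard normal variable X, whose characteristic function E[e^{itX}] equals e^{−t²/2}. A quadrature sketch (grid extent and tolerance are arbitrary choices):

```python
import numpy as np

# characteristic function of X ~ N(0,1) via Riemann-sum quadrature of the density
x = np.linspace(-12.0, 12.0, 48001)
dx = x[1] - x[0]
pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

for t in (0.5, 1.0, 2.0):
    phi = np.sum(np.exp(1j * t * x) * pdf) * dx
    # probability convention e^{itx}: the result is e^{-t^2/2}
    assert abs(phi - np.exp(-t**2 / 2)) < 1e-8
```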
The Fourier transform may be used to give a characterization of continuous measures. Bochner's theorem
characterizes which functions may arise as the FourierStieltjes transform of a measure (Katznelson 1976).
Furthermore, although the Dirac delta function is not a function, it is a finite Borel measure. Its Fourier transform is a
constant function (whose specific value depends upon the form of the Fourier transform used).
Tempered distributions
The Fourier transform maps the space of Schwartz functions to itself, and gives a homeomorphism of the space to
itself (Stein & Weiss 1971). Because of this it is possible to define the Fourier transform of tempered distributions.
These include all the integrable functions mentioned above and have the added advantage that the Fourier transform
of any tempered distribution is again a tempered distribution.
The following two facts provide some motivation for the definition of the Fourier transform of a distribution. First, let
f and g be integrable functions, and let f̂ and ĝ be their Fourier transforms respectively. Then the Fourier transform
obeys the following multiplication formula (Stein & Weiss 1971),
∫ f̂(x) g(x) dx = ∫ f(x) ĝ(x) dx.
Secondly, every integrable function f defines a distribution T_f by the relation
T_f(φ) = ∫ f(x) φ(x) dx for all Schwartz functions φ.
In fact, given a distribution T, we define the Fourier transform T̂ by the relation
T̂(φ) = T(φ̂) for all Schwartz functions φ.
It follows that
T̂_f = T_f̂.
Distributions can be differentiated and the above mentioned compatibility of the Fourier transform with
differentiation and convolution remains true for tempered distributions.
Locally compact abelian groups
The Fourier transform may be generalized to any locally compact abelian group. A locally compact abelian group is
an abelian group which is at the same time a locally compact Hausdorff topological space so that the group
operations are continuous. If G is a locally compact abelian group, it has a translation invariant measure μ, called
Haar measure. For a locally compact abelian group G it is possible to place a topology on the set of characters Ĝ so
that Ĝ is also a locally compact abelian group. For a function f in L¹(G) it is possible to define the Fourier
transform by (Katznelson 1976):
f̂(ξ) = ∫_G ξ̄(x) f(x) dμ(x) for any ξ in Ĝ.
Locally compact Hausdorff space
The Fourier transform may be generalized to any locally compact Hausdorff space, which recovers the topology but
loses the group structure.
Given a locally compact Hausdorff topological space X, the space A = C₀(X) of continuous complex-valued functions
on X which vanish at infinity is in a natural way a commutative C*-algebra, via pointwise addition, multiplication,
complex conjugation, and with norm as the uniform norm. Conversely, the characters of this algebra A, denoted Φ_A,
are naturally a topological space, and can be identified with evaluation at a point of X, and one has an isometric
isomorphism C₀(X) → C₀(Φ_A). In the case where X = R is the real line, this is exactly the Fourier transform.
Non-abelian groups
The Fourier transform can also be defined for functions on a non-abelian group, provided that the group is compact.
Unlike the Fourier transform on an abelian group, which is scalar-valued, the Fourier transform on a non-abelian
group is operator-valued (Hewitt & Ross 1971, Chapter 8). The Fourier transform on compact groups is a major tool
in representation theory (Knapp 2001) and non-commutative harmonic analysis.
Let G be a compact Hausdorff topological group. Let Σ denote the collection of all isomorphism classes of
finite-dimensional irreducible unitary representations, along with a definite choice of representation U^(σ) on the
Hilbert space H_σ of finite dimension d_σ for each σ in Σ. The Fourier–Stieltjes transform of a finite Borel measure μ on G
is the family of operators μ̂(σ) on H_σ, and the mapping μ ↦ μ̂ defines an isomorphism between the Banach space M(G)
of finite Borel measures on G and a closed subspace of the Banach space C_∞(Σ) consisting of all sequences E = (E_σ)
indexed by Σ of (bounded) linear operators E_σ: H_σ → H_σ for which the norm
‖E‖ = sup_{σ ∈ Σ} ‖E_σ‖
is finite. The "convolution theorem" asserts that, furthermore, this isomorphism of Banach spaces is in fact an
isomorphism of C* algebras into a subspace of C_∞(Σ).
Log-normal distribution
parameters: σ² > 0 squared scale (real), μ ∈ R location
support: x ∈ (0, +∞)
pdf: 1/(xσ√(2π)) · e^{−(ln x − μ)²/(2σ²)}
cdf: ½ + ½ erf((ln x − μ)/(σ√2))
mean: e^{μ + σ²/2}
median: e^{μ}
mode: e^{μ − σ²}
variance: (e^{σ²} − 1) e^{2μ + σ²}
skewness: (e^{σ²} + 2) √(e^{σ²} − 1)
ex. kurtosis: e^{4σ²} + 2e^{3σ²} + 3e^{2σ²} − 6
entropy: ½ + ½ ln(2πσ²) + μ
mgf: (defined only on the negative half-axis, see text)
cf: representation is asymptotically divergent but sufficient for numerical purposes
Fisher information: diag(1/σ², 2/σ²) in the parameters (μ, σ), as for the normal distribution
In probability theory, a log-normal distribution is a probability distribution of a random variable whose logarithm
is normally distributed. If Y is a random variable with a normal distribution, then X=exp(Y) has a log-normal
distribution; likewise, if X is log-normally distributed, then Y=log(X) is normally distributed. (This is true regardless
of the base of the logarithmic function: if log_a(Y) is normally distributed, then so is log_b(Y), for any two positive
numbers a, b ≠ 1.)
Log-normal is also written log normal or lognormal. It is occasionally referred to as the Galton distribution or
Galton's distribution, after Francis Galton.
A variable might be modeled as log-normal if it can be thought of as the multiplicative product of many independent
random variables each of which is positive. For example, in finance, a long-term discount factor can be derived from
the product of short-term discount factors. In wireless communication, the attenuation caused by shadowing or slow
fading from random objects is often assumed to be log-normally distributed. See log-distance path loss model.
Characterization
Probability density function
The probability density function of a log-normal distribution is:
f_X(x; μ, σ) = 1/(xσ√(2π)) · e^{−(ln x − μ)²/(2σ²)}, x > 0,
where μ and σ are the mean and standard deviation of the variable's natural logarithm (by definition, the variable's
logarithm is normally distributed).
Cumulative distribution function
F_X(x; μ, σ) = ½ erfc(−(ln x − μ)/(σ√2)) = Φ((ln x − μ)/σ),
where erfc is the complementary error function, and Φ is the standard normal cdf.
Mean and standard deviation
If X is a lognormally distributed variable, its expected value (mean), variance, and standard deviation are
E[X] = e^{μ + σ²/2},
Var[X] = (e^{σ²} − 1) e^{2μ + σ²},
s.d.[X] = √(Var[X]) = e^{μ + σ²/2} √(e^{σ²} − 1).
Equivalently, parameters μ and σ can be obtained if the values of mean and variance are known:
μ = ln(E[X]) − ½ ln(1 + Var[X]/E[X]²),
σ² = ln(1 + Var[X]/E[X]²).
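These inversion formulas can be sanity-checked with a round trip; the target mean and variance below are arbitrary values chosen for illustration:

```python
import numpy as np

# recover (mu, sigma^2) from a target mean and variance
mean, var = 3.0, 4.0
sigma2 = np.log(1.0 + var / mean**2)
mu = np.log(mean) - sigma2 / 2

# plugging back into the mean/variance formulas returns the inputs
assert np.isclose(np.exp(mu + sigma2 / 2), mean)
assert np.isclose((np.exp(sigma2) - 1.0) * np.exp(2 * mu + sigma2), var)
```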
The geometric mean of the log-normal distribution is e^{μ}, and the geometric standard deviation is equal to e^{σ}.
Mode and median
The mode is the point of global maximum of the pdf. In particular, it solves the equation (ln f)′ = 0:
Mode[X] = e^{μ − σ²}.
The median is the point where F_X = ½:
Med[X] = e^{μ}.
Confidence interval
If X is distributed log-normally with parameters μ and σ, then the (1 − α)-confidence interval for X will be
[e^{μ − q*σ}, e^{μ + q*σ}],
where q* is the (1 − α/2)-quantile of the standard normal distribution: q* = Φ^{−1}(1 − α/2).
Moments
For any real or complex number s, the s-th moment of log-normal X is given by
E[X^s] = e^{sμ + s²σ²/2}.
A log-normal distribution is not uniquely determined by its moments E[X^k] for k ≥ 1, that is, there exists some other
distribution with the same moments for all k. In fact, there is a whole family of distributions with the same moments
as the log-normal distribution.
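The moment formula can be checked by simulation (the parameters, seed, and sample size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 0.2, 0.4
x = np.exp(mu + sigma * rng.standard_normal(2_000_000))

# Monte Carlo estimates of E[X^s] against e^{s mu + s^2 sigma^2 / 2}
for s in (1, 2, 3):
    exact = np.exp(s * mu + s * s * sigma * sigma / 2)
    assert abs(np.mean(x**s) / exact - 1.0) < 0.01
```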
Characteristic function and moment generating function
The characteristic function E[e^{itX}] has a number of representations. The integral itself converges for Im(t) ≤ 0. The
simplest representation is obtained by Taylor expanding e^{itX} and using the formula for moments above:
φ(t) = Σ_{n=0}^{∞} (it)^n/n! · e^{nμ + n²σ²/2}.
This series representation is divergent for Re(σ²) > 0; however, it is sufficient for numerically evaluating the
characteristic function at positive σ as long as the upper limit in the sum above is kept bounded, n ≤ N for a suitable
N, and σ² < 0.1. To bring the numerical values of parameters μ, σ into the domain where the strong inequality holds true
one could use the fact that if X is log-normally distributed then X^m is also log-normally distributed with parameters
mμ, mσ. Since (mσ)² can be made arbitrarily small, the inequality can be satisfied for sufficiently small m. The sum of the series first
converges to the value of φ(t) with arbitrarily high accuracy if m is small enough, and the left part of the strong inequality
is satisfied. If a considerably larger number of terms is taken into account, the sum eventually diverges when the right
part of the strong inequality is no longer valid.
Another useful representation was derived by Roy Leipnik (see references by this author and by Daniel Dufresne
below) by means of a double Taylor expansion of e^{−(ln x − μ)²/(2σ²)}.
The moment-generating function for the log-normal distribution does not exist on the whole real line, but only exists on
the half-interval (−∞, 0].
Partial expectation
The partial expectation of a random variable X with respect to a threshold k is defined as g(k) = E[X | X > k] P[X > k].
For a log-normal random variable the partial expectation is given by
g(k) = e^{μ + σ²/2} Φ((μ + σ² − ln k)/σ).
This formula has applications in insurance and economics; it is used in solving the partial differential equation
leading to the Black–Scholes formula.
Properties
Data that arise from the log-normal distribution have a symmetric Lorenz curve (see also Lorenz asymmetry
coefficient).[1]
Maximum likelihood estimation of parameters
For determining the maximum likelihood estimators of the log-normal distribution parameters μ and σ, we can use
the same procedure as for the normal distribution. To avoid repetition, we observe that
f_L(x; μ, σ) = (1/x) f_N(ln x; μ, σ),
where by f_L we denote the probability density function of the log-normal distribution and by f_N that of the normal
distribution. Therefore, using the same indices to denote distributions, we can write the log-likelihood function thus:
ℓ_L(μ, σ | x₁, …, x_n) = −Σ_k ln x_k + ℓ_N(μ, σ | ln x₁, …, ln x_n).
Since the first term is constant with regard to μ and σ, both logarithmic likelihood functions, ℓ_L and ℓ_N, reach their
maximum with the same μ and σ. Hence, using the formulas for the normal distribution maximum likelihood
parameter estimators and the equality above, we deduce that for the log-normal distribution it holds that
μ̂ = (1/n) Σ_k ln x_k,  σ̂² = (1/n) Σ_k (ln x_k − μ̂)².
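A short simulation illustrates that the MLEs are just the normal-distribution estimators applied to ln x (the true parameters, seed, and sample size below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.0, 0.5
x = np.exp(mu + sigma * rng.standard_normal(100_000))

# log-normal MLEs = normal MLEs applied to the logarithms of the data
mu_hat = np.log(x).mean()
sigma_hat = np.log(x).std()      # numpy default ddof=0 matches the 1/n MLE

assert abs(mu_hat - mu) < 0.01
assert abs(sigma_hat - sigma) < 0.01
```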
Generating log-normally-distributed random variates
Given a random variate N drawn from the normal distribution with 0 mean and 1 standard deviation, then the variate
X = e^{μ + σN}
has a log-normal distribution with parameters μ and σ.
Related distributions
If X ~ N(μ, σ²) is a normal distribution, then e^X ~ lnN(μ, σ²).
If X ~ lnN(μ, σ²) is distributed log-normally, then ln X ~ N(μ, σ²) is a normal random variable.
If X_j ~ lnN(μ_j, σ_j²) are n independent log-normally distributed variables, and Y = Π_j X_j, then Y is
also distributed log-normally:
Y ~ lnN(Σ_j μ_j, Σ_j σ_j²).
Let X_j ~ lnN(μ_j, σ_j²) be independent log-normally distributed variables with possibly varying σ and μ
parameters, and Y = Σ_j X_j. The distribution of Y has no closed-form expression, but can be reasonably
approximated by another log-normal distribution Z at the right tail. Its probability density function at the
neighborhood of 0 is characterized in (Gao et al., 2009) and it does not resemble any log-normal distribution. A
commonly used approximation (due to Fenton and Wilkinson) is obtained by matching the mean and variance:
σ_Z² = ln[ Σ_j e^{2μ_j + σ_j²}(e^{σ_j²} − 1) / (Σ_j e^{μ_j + σ_j²/2})² + 1 ],
μ_Z = ln(Σ_j e^{μ_j + σ_j²/2}) − σ_Z²/2.
In the case that all X_j have the same variance parameter σ_j = σ, these formulas simplify to
σ_Z² = ln[ (e^{σ²} − 1) Σ_j e^{2μ_j} / (Σ_j e^{μ_j})² + 1 ],
μ_Z = ln(Σ_j e^{μ_j}) + σ²/2 − σ_Z²/2.
If X ~ lnN(μ, σ²), then X + c is said to have a shifted log-normal distribution with support x ∈ (c, +∞).
E[X + c] = E[X] + c, Var[X + c] = Var[X].
If X ~ lnN(μ, σ²) and a > 0, then Y = aX is also log-normal, Y ~ lnN(μ + ln a, σ²).
If X ~ lnN(μ, σ²), then Y = 1/X is also log-normal, Y ~ lnN(−μ, σ²).
If X ~ lnN(μ, σ²) and a ≠ 0, then Y = X^a is also log-normal, Y ~ lnN(aμ, a²σ²).
Similar distributions
A substitute for the log-normal whose integral can be expressed in terms of more elementary functions (Swamee,
2002) can be obtained based on the logistic distribution to get the CDF
F(x; μ, σ) = [1 + (e^{μ}/x)^{π/(σ√3)}]^{−1}.
This is a log-logistic distribution.
An exGaussian distribution is the distribution of the sum of a normally distributed random variable and an
exponentially distributed random variable. This has a similar long tail, and has been used as a model for reaction
times.
Further reading
Robert Brooks, Jon Corson, and J. Donal Wales. "The Pricing of Index Options When the Underlying Assets All
Follow a Lognormal Diffusion"
[2]
, in Advances in Futures and Options Research, volume 7, 1994.
References
[1] Damgaard, Christian; Jacob Weiner (2000). "Describing inequality in plant size or fecundity". Ecology 81 (4): 1139–1142.
doi:10.1890/0012-9658(2000)081[1139:DIIPSO]2.0.CO;2.
[2] http://papers.ssrn.com/sol3/papers.cfm?abstract_id=5735
Citations
The Lognormal Distribution, Aitchison, J. and Brown, J.A.C. (1957)
Log-normal Distributions across the Sciences: Keys and Clues (http://stat.ethz.ch/~stahel/lognormal/bioscience.pdf),
E. Limpert, W. Stahel and M. Abbt. BioScience, 51 (5), p. 341–352 (2001).
Eric W. Weisstein et al. Log Normal Distribution (http://mathworld.wolfram.com/LogNormalDistribution.html) at
MathWorld. Electronic document, retrieved October 26, 2006.
Swamee, P.K. (2002). Near Lognormal Distribution (http://scitation.aip.org/getabs/servlet/
GetabsServlet?prog=normal&id=JHYEFF000007000006000441000001&idtype=cvips&gifs=yes), Journal of
Hydrologic Engineering. 7(6): 441–444.
Roy B. Leipnik (1991), On Lognormal Random Variables: I – The Characteristic Function (http://anziamj.austms.org.au/
V32/part3/Leipnik.html), Journal of the Australian Mathematical Society Series B, vol. 32, pp. 327–347.
Gao et al. (2009), Asymptotic Behaviors of Tail Density for Sum of Correlated Lognormal Variables
(http://www.hindawi.com/journals/ijmms/2009/630857.html). International Journal of Mathematics and Mathematical
Sciences.
Daniel Dufresne (2009), Sums of Lognormals (http://www.soa.org/library/proceedings/arch/2009/arch-2009-iss1-dufresne.pdf),
Centre for Actuarial Studies, University of Melbourne.
See also
Normal distribution
Geometric mean
Geometric standard deviation
Error function
Log-distance path loss model
Slow fading
Stochastic volatility
Heat equation
The heat equation predicts that if a hot body is
placed in a box of cold water, the temperature of
the body will decrease, and eventually (after
infinite time, and subject to no external heat
sources) the temperature in the box will equalize.
The heat equation is an important partial differential equation which
describes the distribution of heat (or variation in temperature) in a
given region over time. For a function u(x,y,z,t) of three spatial
variables (x,y,z) and the time variable t, the heat equation is
∂u/∂t − α (∂²u/∂x² + ∂²u/∂y² + ∂²u/∂z²) = 0,
also written
u_t = α ∇²u,
or sometimes
u_t = α Δu,
where α is a positive constant and Δ or ∇² denotes the Laplace operator. For the mathematical treatment it is
sufficient to consider the case α = 1. For the case of variation of temperature, u(x,y,z,t) is the temperature and α is
the thermal diffusivity.
The heat equation is of fundamental importance in diverse scientific fields. In mathematics, it is the prototypical
parabolic partial differential equation. In probability theory, the heat equation is connected with the study of
Brownian motion via the Fokker–Planck equation. In financial mathematics it is used to solve the Black–Scholes
partial differential equation. The diffusion equation, a more general version of the heat equation, arises in connection
with the study of chemical diffusion and other related processes.
General description
Graphical representation of the solution to a 1D heat equation PDE.
Suppose one has a function u which describes the temperature at a
given location (x, y, z). This function will change over time as heat
spreads throughout space. The heat equation is used to determine the
change in the function u over time. The image to the right is animated
and describes the way heat changes in time along a metal bar. One of
the interesting properties of the heat equation is the maximum principle
which says that the maximum value of u is either earlier in time than
the region of concern or on the edge of the region of concern. This is
essentially saying that temperature comes either from some source or
from earlier in time because heat permeates but is not created from
nothingness. This is a property of parabolic partial differential
equations and is not difficult to prove mathematically (see below).
Another interesting property is that even if u has a discontinuity at an
initial time t = t₀, the temperature becomes smooth as soon as t > t₀. For example, if a bar of metal has temperature 0
and another has temperature 100 and they are stuck together end to end, then very quickly the temperature at the
point of connection is 50 and the graph of the temperature is smoothly running from 0 to 100.
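This smoothing behaviour can be illustrated with a simple explicit finite-difference simulation of u_t = α u_xx; the grid size, time step, and insulated-end boundary treatment below are illustrative choices, not from the article:

```python
import numpy as np

# two bars at temperatures 0 and 100 joined end to end
nx = 200
x = np.linspace(0.0, 1.0, nx, endpoint=False)
u = np.where(x < 0.5, 0.0, 100.0)

alpha = 1.0
dx = x[1] - x[0]
dt = 0.25 * dx**2 / alpha        # stable since alpha*dt/dx^2 <= 1/2
for _ in range(200):
    u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    u[0], u[-1] = u[1], u[-2]    # insulated (zero-flux) ends

# by symmetry the junction value is 50, and nearby cold points have warmed
assert abs(u[nx // 2 - 1] + u[nx // 2] - 100.0) < 1e-6
assert u[nx // 2 - 5] > 1.0
```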
The heat equation is used in probability and describes random walks. It is also applied in financial mathematics for
this reason.
It is also important in Riemannian geometry and thus topology: it was adapted by Richard Hamilton when he defined
the Ricci flow that was later used by Grigori Perelman to solve the topological Poincaré conjecture.
The physical problem and the equation
Derivation in one dimension
The heat equation is derived from Fourier's law and conservation of energy (Cannon 1984). By Fourier's law, the
flow rate of heat energy through a surface is proportional to the negative temperature gradient across the surface,
q = −k ∇u,
where k is the thermal conductivity and u is the temperature. In one dimension, the gradient is an ordinary spatial
derivative, and so Fourier's law is
q = −k u_x,
where u_x is ∂u/∂x. In the absence of work done, a change in internal energy per unit volume in the material, ΔQ, is
proportional to the change in temperature, Δu. That is,
ΔQ = c_p ρ Δu,
where c_p is the specific heat capacity and ρ is the mass density of the material. (In this section only, Δ is the ordinary
difference operator, not the Laplacian.) Choosing zero energy at absolute zero temperature, this can be rewritten as
Q = c_p ρ u.
The increase in internal energy in a small spatial region of the material
x − Δx ≤ ξ ≤ x + Δx
over the time period
t − Δt ≤ τ ≤ t + Δt
is given by
c_p ρ ∫_{x−Δx}^{x+Δx} [u(ξ, t + Δt) − u(ξ, t − Δt)] dξ = c_p ρ ∫_{t−Δt}^{t+Δt} ∫_{x−Δx}^{x+Δx} (∂u/∂τ) dξ dτ,[1]
where the fundamental theorem of calculus was used. Additionally, with no work done and absent any heat sources
or sinks, the change in internal energy in the interval [x − Δx, x + Δx] is accounted for entirely by the flux of heat across
the boundaries. By Fourier's law, this is
k ∫_{t−Δt}^{t+Δt} [u_x(x + Δx, τ) − u_x(x − Δx, τ)] dτ = k ∫_{t−Δt}^{t+Δt} ∫_{x−Δx}^{x+Δx} (∂²u/∂ξ²) dξ dτ,
again by the fundamental theorem of calculus.[2]
By conservation of energy,
∫_{t−Δt}^{t+Δt} ∫_{x−Δx}^{x+Δx} [c_p ρ u_τ − k u_ξξ] dξ dτ = 0.
This is true for any rectangle [t − Δt, t + Δt] × [x − Δx, x + Δx]. Consequently, the integrand must vanish identically:
c_p ρ u_t − k u_xx = 0,
which can be rewritten as:
u_t = (k/(c_p ρ)) u_xx,
or:
u_t = α u_xx,
which is the heat equation. The coefficient k/(c_p ρ) is called thermal diffusivity and is often denoted α.
Three-dimensional problem
In the special case of heat propagation in an isotropic and homogeneous medium in a 3-dimensional space, this
equation is
u_t = α (u_xx + u_yy + u_zz),
where:
u = u(x, y, z, t) is temperature as a function of space and time;
u_t = ∂u/∂t is the rate of change of temperature at a point over time;
u_xx, u_yy, and u_zz are the second spatial derivatives (thermal conductions) of temperature in the x, y, and z
directions, respectively;
α = k/(c_p ρ) is the thermal diffusivity, a material-specific quantity depending on the thermal conductivity k, the
mass density ρ, and the specific heat capacity c_p.
The heat equation is a consequence of Fourier's law of cooling (see heat conduction).
If the medium is not the whole space, in order to solve the heat equation uniquely we also need to specify boundary
conditions for u. To determine uniqueness of solutions in the whole space it is necessary to assume an exponential
bound on the growth of solutions; this assumption is consistent with observed experiments.
Solutions of the heat equation are characterized by a gradual smoothing of the initial temperature distribution by the
flow of heat from warmer to colder areas of an object. Generally, many different states and starting conditions will
tend toward the same stable equilibrium. As a consequence, to reverse the solution and conclude something about
earlier times or initial conditions from the present heat distribution is very inaccurate except over the shortest of time
periods.
The heat equation is the prototypical example of a parabolic partial differential equation.
Using the Laplace operator, the heat equation can be simplified, and generalized to similar equations over spaces of
arbitrary number of dimensions, as
u_t = α Δu,
where the Laplace operator, Δ or ∇², the divergence of the gradient, is taken in the spatial variables.
The heat equation governs heat diffusion, as well as other diffusive processes, such as particle diffusion or the
propagation of action potential in nerve cells. Although they are not diffusive in nature, some quantum mechanics
problems are also governed by a mathematical analog of the heat equation (see below). It also can be used to model
some phenomena arising in finance, like the Black–Scholes or the Ornstein–Uhlenbeck processes. The equation, and
various non-linear analogues, has also been used in image analysis.
The heat equation is, technically, in violation of special relativity, because its solutions involve instantaneous
propagation of a disturbance. The part of the disturbance outside the forward light cone can usually be safely
neglected, but if it is necessary to develop a reasonable speed for the transmission of heat, a hyperbolic problem
should be considered instead, such as a partial differential equation involving a second-order time derivative.
Internal heat generation
The function u above represents temperature of a body. Alternatively, it is sometimes convenient to change units and
represent u as the heat density of a medium. Since heat density is proportional to temperature in a homogeneous
medium, the heat equation is still obeyed in the new units.
Suppose that a body obeys the heat equation and, in addition, generates its own heat per unit volume (e.g., in
watts/litre) at a rate given by a known function q varying in space and time.[3] Then the heat per unit volume u satisfies
an equation
∂u/∂t = α (∂²u/∂x² + ∂²u/∂y² + ∂²u/∂z²) + q.
For example, a tungsten light bulb filament generates heat, so it would have a positive nonzero value for q when
turned on. While the light is turned off, the value of q for the tungsten filament would be zero.
Solving the heat equation using Fourier series
Idealized physical setting for heat conduction in a rod with homogeneous boundary
conditions.
The following solution technique for the heat equation was proposed by Joseph Fourier in his treatise Théorie
analytique de la chaleur, published in 1822. Let us consider the heat equation for one space variable. This could be
used to model heat conduction in a rod. The equation is
(1)  u_t = α u_xx,
where u = u(x, t) is a function of two variables x and t. Here
x is the space variable, so x ∈ [0, L], where L is the length of the rod.
t is the time variable, so t ≥ 0.
We assume the initial condition
(2)  u(x, 0) = f(x) for all x ∈ [0, L],
where the function f is given, and the boundary conditions
(3)  u(0, t) = u(L, t) = 0 for all t > 0.
Let us attempt to find a solution of (1) which is not identically zero satisfying the boundary conditions (3) but with
the following property: u is a product in which the dependence of u on x, t is separated, that is:
(4)  u(x, t) = X(x) T(t).
This solution technique is called separation of variables. Substituting u back into equation (1),
T′(t)/(α T(t)) = X″(x)/X(x).
Since the right hand side depends only on x and the left hand side only on t, both sides are equal to some constant
value −λ. Thus:
(5)  T′(t) = −λ α T(t),
and
(6)  X″(x) = −λ X(x).
We will now show that solutions for (6) for values of λ ≤ 0 cannot occur:
1. Suppose that λ < 0. Then there exist real numbers B, C such that
X(x) = B e^{√(−λ) x} + C e^{−√(−λ) x}.
From (3) we get
X(0) = 0 = X(L),
and therefore B = 0 = C which implies u is identically 0.
2. Suppose that λ = 0. Then there exist real numbers B, C such that
X(x) = Bx + C.
From equation (3) we conclude in the same manner as in 1 that u is identically 0.
3. Therefore, it must be the case that λ > 0. Then there exist real numbers A, B, C such that
T(t) = A e^{−λ α t}
and
X(x) = B sin(√λ x) + C cos(√λ x).
From (3) we get C = 0 and that for some positive integer n,
√λ = nπ/L.
This solves the heat equation in the special case that the dependence of u has the special form (4).
In general, the sum of solutions to (1) which satisfy the boundary conditions (3) also satisfies (1) and (3). We can
show that the solution to (1), (2) and (3) is given by
u(x, t) = Σ_{n=1}^{∞} D_n sin(nπx/L) e^{−(nπ/L)² α t},
where
D_n = (2/L) ∫₀^L f(x) sin(nπx/L) dx.
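The separation-of-variables series can be cross-checked against a direct explicit finite-difference discretization of the same problem; the initial profile f(x) = x(L − x) and all grid parameters below are illustrative choices:

```python
import numpy as np

# Compare the Fourier sine series with an explicit finite-difference
# solution of u_t = alpha*u_xx on [0, L] with u(0,t) = u(L,t) = 0.
L, alpha, T = 1.0, 0.1, 0.5
nx = 101
x = np.linspace(0.0, L, nx)
f = x * (L - x)                       # initial condition u(x, 0)

# sine coefficients D_n = (2/L) * integral of f(x) sin(n pi x / L) dx
dx = x[1] - x[0]
series = np.zeros_like(x)
for n in range(1, 51):
    Dn = (2.0 / L) * np.sum(f * np.sin(n * np.pi * x / L)) * dx
    series += Dn * np.sin(n * np.pi * x / L) * np.exp(-((n * np.pi / L) ** 2) * alpha * T)

# explicit finite differences (stable since alpha*dt/dx^2 = 1/4 <= 1/2)
dt = 0.25 * dx**2 / alpha
u, t = f.copy(), 0.0
while t < T:
    u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    u[0] = u[-1] = 0.0
    t += dt

assert np.max(np.abs(u - series)) < 5e-3
```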
Generalizing the solution technique
The solution technique used above can be greatly extended to many other types of equations. The idea is that the
operator u_xx with the zero boundary conditions can be represented in terms of its eigenvectors. This leads naturally to
one of the basic ideas of the spectral theory of linear self-adjoint operators.
Consider the linear operator Δu = u_xx. The infinite sequence of functions
e_n(x) = √(2/L) sin(nπx/L)
for n ≥ 1 are eigenvectors of Δ. Indeed
Δe_n = −(n²π²/L²) e_n.
Moreover, any eigenvector f of Δ with the boundary conditions f(0) = f(L) = 0 is of the form e_n for some n ≥ 1. The
functions e_n for n ≥ 1 form an orthonormal sequence with respect to a certain inner product on the space of
real-valued functions on [0, L]. This means
⟨e_n, e_m⟩ = ∫₀^L e_n(x) e_m(x) dx = 1 if n = m, and 0 otherwise.
Finally, the sequence {e_n}_{n ∈ N} spans a dense linear subspace of L²(0, L). This shows that in effect we have
diagonalized the operator Δ.
Heat conduction in non-homogeneous anisotropic media
In general, the study of heat conduction is based on several principles. Heat flow is a form of energy flow, and as
such it is meaningful to speak of the time rate of flow of heat into a region of space.
The time rate of heat flow into a region V is given by a time-dependent quantity q_t(V). We assume q has a density Q, so that
q_t(V) = ∫_V Q(x, t) dx.
Heat flow is a time-dependent vector function H(x) characterized as follows: the time rate of heat flowing through an infinitesimal surface element with area dS and with unit normal vector n is
H(x) · n(x) dS.
Thus the rate of heat flow into V is also given by the surface integral
q_t(V) = −∫_{∂V} H(x) · n(x) dS,
where n(x) is the outward pointing normal vector at x.
The Fourier law states that heat energy flow has the following linear dependence on the temperature gradient:
H(x) = −A(x) · ∇u(x),
where A(x) is a 3 × 3 real matrix that is symmetric and positive definite.
By Green's theorem, the previous surface integral for heat flow into V can be transformed into the volume integral
The time rate of temperature change at x is proportional to the heat flowing into an infinitesimal volume element, where the constant of proportionality is a coefficient κ(x).
Putting these equations together gives the general equation of heat flow:
∂_t u(x, t) = κ(x) ∇ · (A(x) ∇u(x, t)).
Remarks.
The coefficient κ(x) is the inverse of the specific heat of the substance at x times the density of the substance at x.
In the case of an isotropic medium, the matrix A is a scalar matrix equal to thermal conductivity.
In the anisotropic case where the coefficient matrix A is not scalar (and/or if it depends on x), an explicit
formula for the solution of the heat equation can seldom be written down. However, it is usually possible to
consider the associated abstract Cauchy problem and show that it is a well-posed problem and/or to show some
qualitative properties (like preservation of positive initial data, infinite speed of propagation, convergence toward
an equilibrium, smoothing properties). This is usually done by one-parameter semigroups theory: for instance, if
A is a symmetric matrix, then the elliptic operator defined by
is self-adjoint and dissipative, thus by the spectral theorem it generates a one-parameter semigroup.
Fundamental solutions
A fundamental solution, also called a heat kernel, is a solution of the heat equation corresponding to the initial
condition of an initial point source of heat at a known position. These can be used to find a general solution of the
heat equation over certain domains; see, for instance, (Evans 1998) for an introductory treatment.
In one variable, the Green's function is a solution of the initial value problem
u_t(x, t) − k u_xx(x, t) = 0, u(x, 0) = δ(x),
where δ is the Dirac delta function. The solution to this problem is the fundamental solution
Φ(x, t) = (4 π k t)^(−1/2) exp(−x² / (4 k t)).
One can obtain the general solution of the one-variable heat equation with initial condition u(x, 0) = g(x) for −∞ < x < ∞ and 0 < t < ∞ by applying a convolution:
u(x, t) = ∫ Φ(x − y, t) g(y) dy.
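A numerical sketch of this convolution (all names below are illustrative): smoothing a step initial condition g = 1_{x>0} with the Gaussian kernel reproduces the known closed form (1 + erf(x/√(4kt)))/2.

```python
import numpy as np
from math import erf

def heat_kernel(x, t, k=1.0):
    """Fundamental solution Phi(x,t) = (4 pi k t)^(-1/2) exp(-x^2/(4 k t))."""
    return np.exp(-x**2 / (4 * k * t)) / np.sqrt(4 * np.pi * k * t)

# Convolve a step initial condition g = 1_{x > 0} with the kernel and compare
# to the closed form (1 + erf(x / sqrt(4 k t))) / 2.
k, t = 1.0, 0.05
y = np.linspace(-10.0, 10.0, 40001)
dy = y[1] - y[0]
g = (y > 0).astype(float)

x_pts = np.array([-0.5, 0.0, 0.5])
u = np.array([np.sum(heat_kernel(xp - y, t, k) * g) * dy for xp in x_pts])
exact = np.array([(1 + erf(xp / np.sqrt(4 * k * t))) / 2 for xp in x_pts])
print(u)      # ~ [0.057, 0.500, 0.943]
print(exact)
```

The Riemann-sum convolution matches the analytic answer to roughly the grid resolution, illustrating how the kernel propagates initial data forward in time.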
In several spatial variables, the fundamental solution solves the analogous problem in −∞ < x_i < ∞, i = 1, ..., n, and 0 < t < ∞. The n-variable fundamental solution is the product of the fundamental solutions in each variable; i.e.,
Φ(x, t) = Φ(x_1, t) Φ(x_2, t) ⋯ Φ(x_n, t).
The general solution of the heat equation on R^n is then obtained by a convolution, so that to solve the initial value problem with u(x, t = 0) = g(x), one has
u(x, t) = ∫_{R^n} Φ(x − y, t) g(y) dy.
The general problem on a domain Ω in R^n is the heat equation posed on Ω × (0, ∞), with an initial condition and with either Dirichlet or Neumann boundary data. A Green's function always exists, but unless the domain can be
readily decomposed into one-variable problems (see below), it may not be possible to write it down explicitly. The
method of images provides one additional technique for obtaining Green's functions for non-trivial domains.
Some Green's function solutions in 1D
A variety of elementary Green's function solutions in one dimension are recorded here. In some of these, the spatial domain is the entire real line (−∞, ∞). In others, it is the semi-infinite interval (0, ∞) with either Neumann or Dirichlet boundary conditions. One further variation is that some of these solve the inhomogeneous equation
u_t = k u_xx + f,
where f is some given function of x and t.
Homogeneous heat equation
Initial value problem on (−∞, ∞)
Comment. This solution is the convolution with respect to the variable x of the fundamental solution Φ(x, t) and the initial datum g(x). Therefore, according to the general properties of the convolution with respect to differentiation, u = g ∗ Φ is a solution of the same heat equation for t > 0. Moreover, Φ(·, t) → δ as t → 0 in the sense of distributions, so that, by general facts about approximation to the identity, Φ(·, t) ∗ g → g as t → 0 in various senses, according to the specific regularity of g. For instance, if g is assumed bounded and continuous on R, then Φ(·, t) ∗ g converges uniformly to g as t → 0, meaning that u(x, t) is continuous on R × [0, ∞) with u(x, 0) = g(x).
Initial value problem on (0, ∞) with homogeneous Dirichlet boundary conditions
Comment. This solution is obtained from the preceding formula as applied to the datum g(x), suitably extended to R so as to be an odd function, that is, letting g(−x) := −g(x) for all x. Correspondingly, the solution of the initial value problem on (0, ∞) is an odd function with respect to the variable x for all values of t, and in particular it satisfies the homogeneous Dirichlet boundary condition u(0, t) = 0.
Initial value problem on (0, ∞) with homogeneous Neumann boundary conditions
Comment. This solution is obtained from the first solution formula as applied to the datum g(x), suitably extended to R so as to be an even function, that is, letting g(−x) := g(x) for all x. Correspondingly, the solution of the initial value problem on (0, ∞) is an even function with respect to the variable x for all values of t > 0, and in particular, being smooth, it satisfies the homogeneous Neumann boundary condition u_x(0, t) = 0.
Problem on (0, ∞) with homogeneous initial conditions and non-homogeneous Dirichlet boundary conditions
Comment. This solution is the convolution with respect to the variable t of a multiple of the spatial derivative of the fundamental solution and the boundary datum h(t). Since the x-derivative of the fundamental solution is itself a solution of the same heat equation, so is u, thanks to general properties of the convolution with respect to differentiation. Moreover, by general facts about approximation to the identity, u(x, ·) → h as x → 0 in various senses, according to the specific regularity of h. For instance, if h is assumed continuous with compact support in (0, ∞), then u(x, ·) converges uniformly on compacta to h as x → 0, meaning that u is continuous on [0, ∞) × [0, ∞) with u(0, t) = h(t).
Inhomogeneous heat equation
Problem on (−∞, ∞) with homogeneous initial conditions
Comment. This solution is the convolution in R², that is, with respect to both the variables x and t, of the fundamental solution Φ and the source term f(x, t), both meant as defined on the whole of R² and identically 0 for all t < 0. One verifies that u_t − k u_xx = f(x, t), which expressed in the language of distributions becomes (∂_t − k ∂_x²)Φ = δ, where the distribution δ is the Dirac delta function, that is, the evaluation at 0.
Problem on (0, ∞) with homogeneous Dirichlet boundary conditions and initial conditions
Comment. This solution is obtained from the preceding formula as applied to the source term f(x, t), suitably extended to R × [0, ∞) so as to be an odd function of the variable x, that is, letting f(−x, t) := −f(x, t) for all x and t. Correspondingly, the solution of the inhomogeneous problem on (0, ∞) is an odd function with respect to the variable x for all values of t, and in particular it satisfies the homogeneous Dirichlet boundary condition u(0, t) = 0.
Problem on (0, ∞) with homogeneous Neumann boundary conditions and initial conditions
Comment. This solution is obtained from the first formula as applied to the source term f(x, t), suitably extended to R × [0, ∞) so as to be an even function of the variable x, that is, letting f(−x, t) := f(x, t) for all x and t. Correspondingly, the solution of the inhomogeneous problem on (0, ∞) is an even function with respect to the variable x for all values of t, and in particular, being a smooth function, it satisfies the homogeneous Neumann boundary condition u_x(0, t) = 0.
Examples
Since the heat equation is linear, solutions of other combinations of boundary conditions, inhomogeneous term, and
initial conditions can be found by taking an appropriate linear combination of the above Green's function solutions.
For example, to solve
let
where u and v solve the problems
Similarly, to solve
let
where w, v, and r solve the problems
Mean-value property for the heat equation
Solutions of the heat equation satisfy a mean-value property analogous to the mean-value properties of harmonic functions (solutions of Δu = 0), though a bit more complicated. Precisely, if u solves the heat equation, then u(x, t) equals a suitable weighted average of u over a "heat-ball" around (x, t), that is, a super-level set of the fundamental solution of the heat equation:
E_λ(x, t) = { (y, s) : s ≤ t, Φ(x − y, t − s) ≥ λ }.
Notice that E_λ(x, t) shrinks to (x, t) as λ → ∞, so the above formula holds for any (x, t) in the (open) domain of u for λ large enough. Conversely, any function u satisfying the above mean-value property on an open domain of R^n × R is a solution of the heat equation. This can be shown by an argument similar to the analogous one for harmonic functions.
Applications
Particle diffusion
One can model particle diffusion by an equation involving either:
the volumetric concentration of particles, denoted c, in the case of collective diffusion of a large number of particles, or
the probability density function associated with the position of a single particle, denoted P.
In either case, one uses the heat equation
∂c/∂t = D Δc
or
∂P/∂t = D ΔP.
Both c and P are functions of position and time. D is the diffusion coefficient that controls the speed of the diffusive process, and is typically expressed in meters squared per second. If the diffusion coefficient D is not constant, but depends on the concentration c (or P in the second case), then one gets the nonlinear diffusion equation.
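The collective-diffusion equation above can be integrated with a simple explicit finite-difference scheme; the following sketch (names and parameters are illustrative) uses the forward-time, centred-space update, which is stable when D·Δt/Δx² ≤ 1/2:

```python
import numpy as np

def diffuse_ftcs(c0, D, dx, dt, steps):
    """Forward-time, centred-space update for dc/dt = D * d2c/dx2 with
    reflecting (zero-flux) ends; stable only when D*dt/dx**2 <= 0.5."""
    c = c0.astype(float).copy()
    r = D * dt / dx**2
    assert r <= 0.5, "unstable parameter choice"
    for _ in range(steps):
        lap = np.empty_like(c)
        lap[1:-1] = c[2:] - 2 * c[1:-1] + c[:-2]
        lap[0] = c[1] - c[0]      # reflecting boundary
        lap[-1] = c[-2] - c[-1]
        c += r * lap
    return c

# a narrow pulse spreads while total mass is conserved and the peak decays
x = np.linspace(-1.0, 1.0, 201)
c0 = np.where(np.abs(x) < 0.05, 1.0, 0.0)
c = diffuse_ftcs(c0, D=0.1, dx=x[1] - x[0], dt=2e-4, steps=500)
print(c0.sum(), c.sum())  # equal up to rounding
print(c0.max(), c.max())  # peak decreases
```

With the reflecting boundaries the discrete Laplacian sums to zero exactly, so total mass is conserved step by step, mirroring the conservation law the equation encodes.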
Brownian motion
The random trajectory of a single particle subject to the particle diffusion equation (or heat equation) is a Brownian motion. If a particle is placed at R = 0 at time t = 0, then the probability density function associated with the position vector R of the particle is
P(R, t) = (4 π D t)^(−n/2) exp(−|R|² / (4 D t))  (in n spatial dimensions),
which is a (multivariate) normal distribution evolving in time.
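This Gaussian spreading can be checked by direct simulation; the sketch below (parameters are illustrative) sums independent Gaussian steps of variance 2DΔt and compares the empirical variance with 2Dt:

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent Gaussian steps of variance 2*D*dt reproduce the diffusion
# density: after time t the positions are N(0, 2*D*t) distributed.
D, dt, steps, n = 0.25, 5e-3, 200, 50_000
x = rng.normal(0.0, np.sqrt(2 * D * dt), size=(n, steps)).sum(axis=1)

t = steps * dt
print(x.mean(), x.var(), 2 * D * t)  # mean ~0, variance ~0.5
```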
Schrödinger equation for a free particle
With a simple division, the Schrödinger equation for a single particle of mass m in the absence of any applied force field can be rewritten in the following way:
∂ψ/∂t = (iħ/2m) Δψ,
where i is the imaginary unit, ħ is Planck's constant divided by 2π, and ψ is the wavefunction of the particle.
This equation is formally similar to the particle diffusion equation, which one obtains through the following transformation:
c(R, t) → ψ(R, t),  D → iħ/2m.
Applying this transformation to the expressions of the Green functions determined in the case of particle diffusion yields the Green functions of the Schrödinger equation, which in turn can be used to obtain the wavefunction at any time through an integral on the wavefunction at t = 0.
Remark: this analogy between quantum mechanics and diffusion is a purely formal one. Physically, the evolution of the wavefunction satisfying Schrödinger's equation might have an origin other than diffusion.
Thermal diffusivity in polymers
A direct practical application of the heat equation, in conjunction with Fourier theory, in spherical coordinates, is the
measurement of the thermal diffusivity in polymers (Unsworth and Duarte). The dual theoretical-experimental
method demonstrated by these authors is applicable to rubber and various other materials of practical interest.
Further applications
The heat equation arises in the modeling of a number of phenomena and is often used in financial mathematics in the
modeling of options. The famous Black–Scholes option pricing model's differential equation can be transformed into
the heat equation allowing relatively easy solutions from a familiar body of mathematics. Many of the extensions to
the simple option models do not have closed form solutions and thus must be solved numerically to obtain a modeled
option price. The heat equation is also widely used in image analysis (Perona & Malik 1990) and in
machine-learning as the driving theory behind graph Laplacian methods. The heat equation can be efficiently solved
numerically using the Crank–Nicolson method of (Crank & Nicolson 1947). This method can be extended to many
of the models with no closed form solution, see for instance (Wilmott, Howison & Dewynne 1995).
An abstract form of heat equation on manifolds provides a major approach to the Atiyah–Singer index theorem, and
has led to much further work on heat equations in Riemannian geometry.
Notes
[1] Here we are assuming that the material has constant mass density and heat capacity through space as well as time, although generalizations
are given below.
[2] In higher dimensions, the divergence theorem is used instead.
[3] Note that the units of u must be selected in a manner compatible with those of q. Thus instead of being temperature (K), units of u should be
J/L.
References
Cannon, John (1984), The One-Dimensional Heat Equation, Encyclopedia of mathematics and its applications,
Addison-Wesley, ISBN0-521-30243-9
Crank, J.; Nicolson, P.; Hartree, D. R. (1947), "A Practical Method for Numerical Evaluation of Solutions of Partial Differential Equations of the Heat-Conduction Type", Proceedings of the Cambridge Philosophical Society 43: 50–67, doi:10.1017/S0305004100023197
Einstein, Albert (1905), "Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen", Ann. Phys. Leipzig 17 322: 549–560, doi:10.1002/andp.19053220806
Evans, L.C. (1998), Partial Differential Equations, American Mathematical Society, ISBN0-8218-0772-2
John, Fritz (1991), Partial Differential Equations (4th ed.), Springer, ISBN978-0387906096
Wilmott, P.; Howison, S.; Dewynne, J. (1995), The Mathematics of Financial Derivatives:A Student Introduction,
Cambridge University Press
Carslaw, H. S.; Jaeger, J. C. (1973), Conduction of Heat in Solids (2nd ed.), Oxford University Press,
ISBN9780198533689
Perona, P.; Malik, J. (1990), "Scale-Space and Edge Detection Using Anisotropic Diffusion", IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (7): 629–639
Unsworth, J.; Duarte, F. J. (1979), "Heat diffusion in a solid sphere and Fourier Theory", Am. J. Phys. 47:
891-893, doi:10.1119/1.11601
External links
Derivation of the heat equation (http:/ / www. mathphysics. com/ pde/ HEderiv. html)
Linear heat equations (http:/ / eqworld. ipmnet. ru/ en/ solutions/ lpde/ heat-toc. pdf): Particular solutions and
boundary value problems - from EqWorld
Radon–Nikodym derivative
In mathematics, the Radon–Nikodym theorem is a result in functional analysis that states that, given a measurable space (X, Σ), if a σ-finite measure ν on (X, Σ) is absolutely continuous with respect to a σ-finite measure μ on (X, Σ), then there is a measurable function f on X, taking values in [0, ∞), such that
ν(A) = ∫_A f dμ
for any measurable set A.
The theorem is named after Johann Radon, who proved the theorem for the special case where the underlying space is R^N in 1913, and after Otton Nikodym, who proved the general case in 1930.[1]
Above, the function f was assumed to be complex-valued (or real-valued). If Y is a Banach space and the generalization of the Radon–Nikodym theorem also holds for functions with values in Y (mutatis mutandis), then Y is said to have the Radon–Nikodym property. All Hilbert spaces have the Radon–Nikodym property.
Radon–Nikodym derivative
The function f satisfying the above equality is uniquely defined up to a μ-null set, that is, if g is another function which satisfies the same property, then f = g μ-almost everywhere. f is commonly written dν/dμ and is called the Radon–Nikodym derivative. The choice of notation and the name of the function reflect the fact that the function is analogous to a derivative in calculus in the sense that it describes the rate of change of density of one measure with respect to another (the way the Jacobian determinant is used in multivariable integration). A similar theorem can be proven for signed and complex measures: namely, that if μ is a nonnegative σ-finite measure, and ν is a finite-valued signed or complex measure such that ν ≪ μ, there is a μ-integrable real- or complex-valued function g on X such that
ν(A) = ∫_A g dμ
for any measurable set A.
Applications
The theorem is very important in extending the ideas of probability theory from probability masses and probability densities defined over real numbers to probability measures defined over arbitrary sets. It tells whether and how it is possible to change from one probability measure to another. Specifically, the probability density function of a random variable is the Radon–Nikodym derivative of the induced measure with respect to some base measure (usually the Lebesgue measure for continuous random variables).
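A finite toy sketch of this idea (the set and the masses below are invented for illustration): on a finite set, a measure is a table of point masses, the Radon–Nikodym derivative dν/dμ is the pointwise ratio of masses, and integrating it against μ recovers ν.

```python
from fractions import Fraction as F

# On a finite set, a measure is a dict of point masses and dnu/dmu is just
# the pointwise ratio of masses (defined wherever mu puts positive mass).
X = ["a", "b", "c"]
mu = {"a": F(1, 2), "b": F(1, 4), "c": F(1, 4)}   # base measure
nu = {"a": F(1, 6), "b": F(1, 3), "c": F(1, 2)}   # absolutely continuous w.r.t. mu

f = {x: nu[x] / mu[x] for x in X}                 # dnu/dmu

def integrate(g, m, A):
    """Integral over A of g with respect to the measure m."""
    return sum(g[x] * m[x] for x in A)

# integrating dnu/dmu against mu recovers nu on every subset
for A in [["a"], ["a", "b"], X]:
    assert integrate(f, mu, A) == sum(nu[x] for x in A)
print(f)  # {'a': Fraction(1, 3), 'b': Fraction(4, 3), 'c': Fraction(2, 1)}
```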
For example, it can be used to prove the existence of conditional expectation for probability measures. The latter
itself is a key concept in probability theory, as conditional probability is just a special case of it.
Amongst other fields, financial mathematics uses the theorem extensively. Such changes of probability measure are the cornerstone of the rational pricing of derivative securities and are used for converting actual probabilities into risk-neutral probabilities.
Properties
Let μ, ν, and λ be σ-finite measures on the same measure space. If ν ≪ λ and μ ≪ λ (ν and μ are absolutely continuous with respect to λ), then
d(ν + μ)/dλ = dν/dλ + dμ/dλ   λ-almost everywhere.
If ν ≪ μ ≪ λ, then
dν/dλ = (dν/dμ)(dμ/dλ)   λ-almost everywhere.
If μ ≪ λ and g is a μ-integrable function, then
∫_X g dμ = ∫_X g (dμ/dλ) dλ.
If μ ≪ ν and ν ≪ μ, then
dμ/dν = (dν/dμ)^(−1).
If ν is a finite signed or complex measure, then
d|ν|/dμ = |dν/dμ|.
Further applications
Information divergences
If μ and ν are measures over X with μ ≪ ν:
The Kullback–Leibler divergence from μ to ν is defined to be
D_KL(μ ‖ ν) = ∫_X log(dμ/dν) dμ.
For α > 0, α ≠ 1, the Rényi divergence of order α from μ to ν is defined to be
D_α(μ ‖ ν) = (1/(α − 1)) log ∫_X (dμ/dν)^(α−1) dμ.
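For discrete distributions, dP/dQ is the ratio of probability masses, so the Kullback–Leibler divergence reduces to a finite sum; a minimal sketch (distributions chosen arbitrarily):

```python
import math

# KL divergence of discrete distributions: the Radon-Nikodym derivative
# dP/dQ is the ratio of masses, and D_KL is its log averaged under P.
def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]
print(kl_divergence(p, q))  # ~0.0589
print(kl_divergence(p, p))  # 0.0 (a distribution has zero divergence from itself)
```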
The assumption of σ-finiteness
The Radon–Nikodym theorem makes the assumption that the measure μ with respect to which one computes the rate of change of ν is σ-finite. Here is an example where μ is not σ-finite and the Radon–Nikodym theorem fails to hold.
Consider the Borel σ-algebra on the real line. Let the counting measure, μ, of a Borel set A be defined as the number of elements of A if A is finite, and +∞ otherwise. One can check that μ is indeed a measure. It is not σ-finite, as not every Borel set is at most a countable union of finite sets. Let ν be the usual Lebesgue measure on this Borel algebra. Then ν is absolutely continuous with respect to μ, since for a set A one has μ(A) = 0 only if A is the empty set, and then ν(A) is also zero.
Assume that the Radon–Nikodym theorem holds, that is, for some measurable function f one has
ν(A) = ∫_A f dμ
for all Borel sets. Taking A to be a singleton set, A = {a}, and using the above equality, one finds
0 = f(a)
for all real numbers a. This implies that the function f, and therefore the Lebesgue measure ν, is zero, which is a contradiction.
Proof
This section gives a measure-theoretic proof of the theorem. There is also a functional-analytic proof, using Hilbert
space methods, that was first given by von Neumann.
For finite measures μ and ν, the idea is to consider functions f with f dμ ≤ dν. The supremum of all such functions, along with the monotone convergence theorem, then furnishes the Radon–Nikodym derivative. The fact that the remaining part of μ is singular with respect to ν follows from a technical fact about finite measures. Once the result is established for finite measures, extending to σ-finite, signed, and complex measures can be done naturally. The details are given below.
For finite measures
First, suppose that μ and ν are both finite-valued nonnegative measures. Let F be the set of those measurable functions f : X → [0, +∞] satisfying
∫_A f dμ ≤ ν(A)
for every A ∈ Σ (this set is not empty, for it contains at least the zero function). Let f_1, f_2 ∈ F; let A be an arbitrary measurable set, A_1 = {x ∈ A : f_1(x) > f_2(x)}, and A_2 = {x ∈ A : f_2(x) ≥ f_1(x)}. Then one has
∫_A max{f_1, f_2} dμ = ∫_{A_1} f_1 dμ + ∫_{A_2} f_2 dμ ≤ ν(A_1) + ν(A_2) = ν(A),
and therefore, max{f_1, f_2} ∈ F.
Now, let {f_n} be a sequence of functions in F such that
lim_{n→∞} ∫_X f_n dμ = sup_{f ∈ F} ∫_X f dμ.
By replacing f_n with the maximum of the first n functions, one can assume that the sequence {f_n} is increasing. Let g be the function defined as
g(x) := lim_{n→∞} f_n(x).
By Lebesgue's monotone convergence theorem, one has
∫_A g dμ = lim_{n→∞} ∫_A f_n dμ ≤ ν(A)
for each A ∈ Σ, and hence, g ∈ F. Also, by the construction of g,
∫_X g dμ = sup_{f ∈ F} ∫_X f dμ.
Now, since g ∈ F,
ν_0(A) := ν(A) − ∫_A g dμ
defines a nonnegative measure on Σ. Suppose ν_0 ≠ 0; then, since μ is finite, there is an ε > 0 such that ν_0(X) > ε μ(X).
Let (P, N) be a Hahn decomposition for the signed measure ν_0 − ε μ. Note that for every A ∈ Σ one has ν_0(A ∩ P) ≥ ε μ(A ∩ P), and hence,
ν(A) = ∫_A g dμ + ν_0(A) ≥ ∫_A g dμ + ε μ(A ∩ P) = ∫_A (g + ε 1_P) dμ.
Also, note that μ(P) > 0; for if μ(P) = 0, then (since ν is absolutely continuous in relation to μ) ν_0(P) ≤ ν(P) = 0, so ν_0(P) = 0 and
(ν_0 − ε μ)(X) = (ν_0 − ε μ)(N) ≤ 0,
contradicting the fact that ν_0(X) > ε μ(X).
Then, since
∫_X (g + ε 1_P) dμ ≤ ν(X) < +∞,
the function g + ε 1_P belongs to F and satisfies
∫_X (g + ε 1_P) dμ > ∫_X g dμ = sup_{f ∈ F} ∫_X f dμ.
This is impossible; therefore, the initial assumption that ν_0 ≠ 0 must be false. So ν_0 = 0, as desired.
Now, since g is μ-integrable, the set {x ∈ X : g(x) = +∞} is μ-null. Therefore, if f is defined as
f(x) = g(x) if g(x) < +∞, and f(x) = 0 otherwise,
then f has the desired properties.
As for the uniqueness, let f, g : X → [0, +∞) be measurable functions satisfying
∫_A f dμ = ∫_A g dμ = ν(A)
for every measurable set A. Then, g − f is μ-integrable, and
∫_A (g − f) dμ = 0.
In particular, this holds for A = {x ∈ X : f(x) > g(x)} and for {x ∈ X : f(x) < g(x)}. It follows that
∫_X (g − f)^+ dμ = 0 = ∫_X (g − f)^− dμ,
and so (g − f)^+ = 0 μ-almost everywhere; the same is true for (g − f)^−, and thus f = g μ-almost everywhere, as desired.
For signed measures: if ν is a signed measure, it can be written as ν = ν^+ − ν^− by the Hahn–Jordan decomposition, where one of the measures is finite. Applying the previous result to those two measures, one obtains two functions, g, h : X → [0, +∞), satisfying the Radon–Nikodym theorem for ν^+ and ν^−, respectively, at least one of which is μ-integrable (i.e., its integral with respect to μ is finite). It is clear then that f = g − h satisfies the required properties, including uniqueness, since both g and h are unique up to μ-almost everywhere equality.
If ν is a complex measure, it can be decomposed as ν = ν_1 + iν_2, where both ν_1 and ν_2 are finite-valued signed measures. Applying the above argument, one obtains two functions, g, h, satisfying the required properties for ν_1 and ν_2, respectively. Clearly, f = g + ih is the required function.
Notes
[1] Nikodym, O. (1930). "Sur une généralisation des intégrales de M. J. Radon" (http:/ / matwbn. icm. edu. pl/ ksiazki/ fm/ fm15/ fm15114. pdf) (in French). Fundamenta Mathematicae 15: 131–179. JFM 56.0922.02. Retrieved 2009-05-11.
References
Shilov, G. E., and Gurevich, B. L., 1978. Integral, Measure, and Derivative: A Unified Approach, Richard A.
Silverman, trans. Dover Publications. ISBN 0486635198.
This article incorporates material from Radon-Nikodym theorem on PlanetMath, which is licensed under the
Creative Commons Attribution/Share-Alike License.
Risk-neutral measure
In mathematical finance, a risk-neutral measure (also called an equivalent martingale measure or Q-measure) is a probability measure that results when one assumes that the current value of every financial asset is equal to the expected future payoff of the asset discounted at the risk-free rate. The concept is used in the pricing of derivatives.
Idea
In an actual economy, prices of assets depend crucially on their risk. Investors typically demand payment for bearing
uncertainty. Therefore, today's price of a claim on a risky amount realised tomorrow will generally differ from its
expected value. Most commonly,[1] investors are risk-averse and today's price is below the expectation, remunerating those who bear the risk.
To price assets, consequently, the calculated expected values need to be adjusted for the risk involved (see also
Sharpe ratio).
It turns out, under certain weak conditions (absence of arbitrage) there is an alternative way to do this calculation:
Instead of first taking the expectation and then adjusting for risk, one can first adjust the probabilities of future
outcomes such that they incorporate the effects of risk, and then take the expectation under those different
probabilities. Those adjusted, 'virtual' probabilities are called risk-neutral probabilities, they constitute the
risk-neutral measure.
The probabilities of asset outcomes in the real world are not themselves altered; the constructed probabilities are counterfactual. They are only computed because the second way of pricing, called risk-neutral pricing, is often much simpler to calculate than the first.
The main benefit stems from the fact that once the risk-neutral probabilities are found, every asset can be priced by
simply taking its expected payoff (i.e. calculating as if investors were risk neutral). If we used the real-world,
physical probabilities, every security would require a different adjustment (as they differ in riskiness).
Note that under the risk-neutral measure all assets have the same expected rate of return, the risk-free rate (or short
rate). This does not rely on an assumption that investors are risk neutral. On the contrary, the point is to price assets given exactly the risk aversion we observe in the physical world. Toward that aim, we hypothesize a parallel universe where everybody is risk neutral. The risk-neutral measure is the probability measure of that parallel universe in which all claims have exactly the prices they have in our real world.
Mathematically, adjusting the probabilities is a measure transformation to an equivalent martingale measure; it is
possible if there are no arbitrage opportunities. If the markets are complete, the risk-neutral measure is unique.
Often, the physical measure is called P, and the risk-neutral one Q. The term physical measure is sometimes abused to denote the Lebesgue measure or, occasionally, the measure induced by the corresponding normal density with respect to the Lebesgue measure.
Usage
Risk-neutral measures make it easy to express the value of a derivative in a formula. Suppose at a future time T a derivative (e.g., a call option on a stock) pays H_T units, where H_T is a random variable on the probability space describing the market. Further suppose that the discount factor from now (time zero) until time T is DF(0, T). Then today's fair value of the derivative is
H_0 = DF(0, T) E_Q(H_T),
where the risk-neutral measure is denoted by Q. This can be re-stated in terms of the physical measure P as
H_0 = DF(0, T) E_P((dQ/dP) H_T),
where dQ/dP is the Radon–Nikodym derivative of Q with respect to P.
Another name for the risk-neutral measure is the equivalent martingale measure. If in a financial market there is just
one risk-neutral measure, then there is a unique arbitrage-free price for each asset in the market. This is the
fundamental theorem of arbitrage-free pricing. If there are more such measures, then in an interval of prices no
arbitrage is possible. If no equivalent martingale measure exists, arbitrage opportunities do.
Example 1: Binomial model of stock prices
Given a probability space (Ω, P), consider a single-period binomial model. A probability measure P* is called risk neutral if S_0 = E_{P*}(S_1 / (1 + r)) for all tradable assets S. Suppose we have a two-state economy: the initial stock price S can go either up to S^u or down to S^d. If the interest rate is r > 0, and S^d ≤ (1 + r)S ≤ S^u, then the risk-neutral probability π of an upward stock movement is given by the number
π = ((1 + r)S − S^d) / (S^u − S^d).
Given a derivative with payoff X^u when the stock price moves up and X^d when it goes down, we can price the derivative via
X = (π X^u + (1 − π) X^d) / (1 + r).
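The one-period formulas above translate directly into code; this sketch (numbers are illustrative) prices the stock back to its initial value, as the martingale condition requires, and then prices a call-style payoff:

```python
# One-period binomial pricing: the risk-neutral up-probability makes the
# discounted stock price a martingale, and any payoff is priced as a
# discounted expectation under that probability.
def risk_neutral_up_prob(S, Su, Sd, r):
    return ((1 + r) * S - Sd) / (Su - Sd)

def price(Xu, Xd, S, Su, Sd, r):
    p = risk_neutral_up_prob(S, Su, Sd, r)
    return (p * Xu + (1 - p) * Xd) / (1 + r)

S, Su, Sd, r = 100.0, 120.0, 90.0, 0.05
p = risk_neutral_up_prob(S, Su, Sd, r)
print(p)                               # 0.5
# sanity check: the stock itself is priced back to S
print(price(Su, Sd, S, Su, Sd, r))     # 100.0
# a call struck at 100 pays 20 up, 0 down
print(price(20.0, 0.0, S, Su, Sd, r))  # ~9.52
```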
Example 2: Brownian motion model of stock prices
Suppose our economy consists of two assets, a stock and a risk-free bond, and that we use the Black–Scholes model. In the model the evolution of the stock price can be described by geometric Brownian motion:
dS_t = μ S_t dt + σ S_t dW_t,
where W_t is a standard Brownian motion with respect to the physical measure. If we define
θ = (μ − r) / σ,
Girsanov's theorem states that there exists a measure Q under which W̃_t = W_t + θt is a Brownian motion; θ is known as the market price of risk. Differentiating and rearranging yields:
dW_t = dW̃_t − θ dt.
Put this back in the original equation:
dS_t = r S_t dt + σ S_t dW̃_t.
Q is the unique risk-neutral measure for the model. The discounted payoff process of a derivative on the stock, H_t = E_Q(H_T | F_t), is a martingale under Q. Since the discounted stock price and H are Q-martingales, we can invoke the martingale representation theorem to find a replicating strategy: a holding of stocks and bonds that pays off H_t at all times t ≤ T.
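Risk-neutral pricing in this model can be sketched by Monte Carlo: simulate the terminal stock price with drift r (i.e., under Q), discount the expected payoff, and compare with the Black–Scholes closed form for a European call (parameter values are illustrative):

```python
import math, random

def bs_call(S0, K, r, sigma, T):
    """Black-Scholes closed form for a European call, used as a reference."""
    d1 = (math.log(S0 / K) + (r + sigma**2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    return S0 * N(d1) - K * math.exp(-r * T) * N(d2)

def mc_call(S0, K, r, sigma, T, n=200_000, seed=1):
    """Risk-neutral Monte Carlo: under Q the stock drifts at the risk-free
    rate, so we average the discounted payoff over simulated endpoints."""
    rng = random.Random(seed)
    disc = math.exp(-r * T)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        ST = S0 * math.exp((r - sigma**2 / 2) * T + sigma * math.sqrt(T) * z)
        total += max(ST - K, 0.0)
    return disc * total / n

S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
print(bs_call(S0, K, r, sigma, T))   # ~10.45
print(mc_call(S0, K, r, sigma, T))   # close to the closed form
```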
Notes
[1] At least in large financial markets. Examples of risk-seeking markets are casinos and lotteries.
See also
Mathematical finance
Forward measure
Fundamental theorem of arbitrage-free pricing
Law of one price
Rational pricing
Brownian model of financial markets
Martingale (probability theory)
External links
Gisiger, Nicolas: Risk-Neutral Probabilities Explained (http:/ / ssrn. com/ abstract=1395390)
Tham, Joseph: Risk-neutral Valuation: A Gentle Introduction (http:/ / papers. ssrn. com/ sol3/ papers.
cfm?abstract_id=290044), Part II (http:/ / papers. ssrn. com/ sol3/ papers. cfm?abstract_id=292724)
Stochastic calculus
Stochastic calculus is a branch of mathematics that operates on stochastic processes. It allows a consistent theory of
integration to be defined for integrals of stochastic processes with respect to stochastic processes. It is used to model
systems that behave randomly.
The best-known stochastic process to which stochastic calculus is applied is the Wiener process (named in honor of
Norbert Wiener), which is used for modeling Brownian motion as described by Albert Einstein and other physical
diffusion processes in space of particles subject to random forces. Since the 1970s, the Wiener process has been
widely applied in financial mathematics and economics to model the evolution in time of stock prices and bond
interest rates.
The main flavours of stochastic calculus are the Itô calculus and its variational relative the Malliavin calculus. For technical reasons the Itô integral is the most useful for general classes of processes, but the related Stratonovich integral is frequently useful in problem formulation (particularly in engineering disciplines). The Stratonovich integral can readily be expressed in terms of the Itô integral. The main benefit of the Stratonovich integral is that it obeys the usual chain rule and therefore does not require Itô's lemma. This enables problems to be expressed in a coordinate-system-invariant form, which is invaluable when developing stochastic calculus on manifolds other than R^n. The dominated convergence theorem does not hold for the Stratonovich integral; consequently it is very difficult to prove results without re-expressing the integrals in Itô form.
Itô integral
The Itô integral is central to the study of stochastic calculus. The integral ∫_0^t H dX is defined for a semimartingale X and a locally bounded predictable process H.
Stratonovich integral
The Stratonovich integral of a semimartingale X against another semimartingale Y can be defined in terms of the Itô integral as
∫_0^t X_{s−} ∘ dY_s = ∫_0^t X_{s−} dY_s + (1/2) [X, Y]_t^c,
where [X, Y]_t^c denotes the quadratic covariation of the continuous parts of X and Y. The alternative notation
∫_0^t X_s ∂Y_s
is also used to denote the Stratonovich integral.
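The difference between the two integrals can be seen numerically (a sketch with invented discretization parameters): the left-point Riemann sum approximates the Itô integral of W against itself, the midpoint sum the Stratonovich one, and they differ by half the quadratic variation:

```python
import numpy as np

rng = np.random.default_rng(42)

# One Brownian path on [0, 1]; compare left-point (Ito) and midpoint
# (Stratonovich) sums for the stochastic integral of W against W.
n = 200_000
dW = rng.normal(0.0, np.sqrt(1.0 / n), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])

ito = np.sum(W[:-1] * dW)                    # integrand at left endpoint
strat = np.sum(0.5 * (W[:-1] + W[1:]) * dW)  # integrand at midpoint

W1 = W[-1]
print(ito, (W1**2 - 1) / 2)   # Ito integral of W dW is (W_1^2 - 1)/2
print(strat, W1**2 / 2)       # Stratonovich integral is W_1^2/2
print(strat - ito)            # ~ [W, W]_1 / 2 = 1/2
```

The midpoint sum telescopes to W_1²/2 exactly, while the left-point sum picks up the extra −[W, W]_t/2 term that Itô's lemma accounts for.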
Applications
A very important application of stochastic calculus is in quantitative finance, in which asset prices are often assumed
to follow geometric Brownian motion.
External links
Notes on Stochastic Calculus [1] - A short elementary description of the basic Itô integral.
T. Szabados and B. Szekely, Stochastic integration based on simple, symmetric random walks [1] - A new approach which the authors hope is more transparent and technically less demanding.
References
[1] http:/ / arXiv. org/ abs/ 0712.3908/
Wiener process
A single realization of a one-dimensional Wiener process
A single realization of a three-dimensional Wiener process
In mathematics, the Wiener process is a
continuous-time stochastic process named in
honor of Norbert Wiener. It is often called
Brownian motion, after Robert Brown. It is
one of the best known Lévy processes
(càdlàg stochastic processes with stationary
independent increments) and occurs
frequently in pure and applied mathematics,
economics and physics.
The Wiener process plays an important role
both in pure and applied mathematics. In
pure mathematics, the Wiener process gave
rise to the study of continuous time
martingales. It is a key process in terms of
which more complicated stochastic
processes can be described. As such, it plays
a vital role in stochastic calculus, diffusion
processes and even potential theory. It is the
driving process of Schramm-Loewner
evolution. In applied mathematics, the
Wiener process is used to represent the
integral of a Gaussian white noise process,
and so is useful as a model of noise in
electronics engineering, instrument errors in
filtering theory and unknown forces in
control theory.
The Wiener process has applications
throughout the mathematical sciences. In
physics it is used to study Brownian motion,
the diffusion of minute particles suspended
in fluid, and other types of diffusion via the
Fokker-Planck and Langevin equations. It
also forms the basis for the rigorous path
integral formulation of quantum mechanics
(by the Feynman-Kac formula, a solution to
the Schrödinger equation can be represented in terms of the Wiener process) and the study of eternal inflation in physical cosmology. It is also prominent in the mathematical theory of finance, in particular the Black–Scholes option pricing model.
Characterizations of the Wiener process
The Wiener process W_t is characterized by three properties:[1]
1. W_0 = 0
2. W_t is almost surely continuous
3. W_t has independent increments with W_t − W_s ~ N(0, t − s) (for 0 ≤ s < t).
N(μ, σ²) denotes the normal distribution with expected value μ and variance σ². The condition that it has independent increments means that if 0 ≤ s_1 ≤ t_1 ≤ s_2 ≤ t_2 then W_{t_1} − W_{s_1} and W_{t_2} − W_{s_2} are independent random variables, and the similar condition holds for n increments.
An alternative characterization of the Wiener process is the so-called Lévy characterization, which says that the Wiener process is an almost surely continuous martingale with W_0 = 0 and quadratic variation [W_t, W_t] = t (which means that W_t² − t is also a martingale).
A third characterization is that the Wiener process has a spectral representation as a sine series whose coefficients are independent N(0,1) random variables. This representation can be obtained using the Karhunen–Loève theorem.
The Wiener process can be constructed as the scaling limit of a random walk, or other discrete-time stochastic
processes with stationary independent increments. This is known as Donsker's theorem. Like the random walk, the
Wiener process is recurrent in one or two dimensions (meaning that it returns almost surely to any fixed
neighborhood of the origin infinitely often) whereas it is not recurrent in dimensions three and higher. Unlike the
random walk, it is scale invariant, meaning that α⁻¹W_{α²t} is a Wiener process for any nonzero constant α. The Wiener measure is the probability law on the space of
continuous functions g, with g(0) = 0, induced by the Wiener process. An integral based on Wiener measure may be
called a Wiener integral.
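The scaling-limit construction can be illustrated with a short simulation: rescaling a simple symmetric random walk by √n produces an approximate Wiener path. This is a sketch; the function name, seed and step counts are illustrative choices, not anything from the article.

```python
import math
import random

def wiener_approximation(n, seed=0):
    """Approximate a Wiener path on [0, 1] by a rescaled simple random walk.

    Per Donsker's theorem, S_[nt] / sqrt(n) converges in distribution to W_t,
    where S_k is a sum of k i.i.d. +/-1 steps.
    """
    rng = random.Random(seed)
    path = [0.0]
    s = 0
    for _ in range(n):
        s += rng.choice((-1, 1))
        path.append(s / math.sqrt(n))  # rescale the walk by sqrt(n)
    return path  # path[k] approximates W(k / n)

w = wiener_approximation(10_000)
# W_0 = 0; across many realizations the endpoint has variance close to 1.
print(w[0], w[-1])
```

Averaging the squared endpoint over many seeds gives a quick numerical check that Var(W_1) ≈ 1.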
Properties of a one-dimensional Wiener process
Basic properties
The unconditional probability density function at a fixed time t is
f(x) = (1/√(2πt)) exp(−x²/(2t)).
The expectation is zero: E[W_t] = 0.
The variance is t: Var(W_t) = t.
The covariance and correlation are
cov(W_s, W_t) = min(s, t),  corr(W_s, W_t) = min(s, t)/√(st).
The results for the expectation and variance follow immediately from the definition that increments have a normal distribution, centered at zero. Thus W_t = W_t − W_0 ~ N(0, t).
The results for the covariance and correlation follow from the definition that non-overlapping increments are independent, of which only the property that they are uncorrelated is used. Suppose that t_1 < t_2.
Substitute the simple identity W_{t_2} = W_{t_1} + (W_{t_2} − W_{t_1}):
cov(W_{t_1}, W_{t_2}) = E[W_{t_1} W_{t_2}] = E[W_{t_1}(W_{t_1} + (W_{t_2} − W_{t_1}))].
Since W(t_1) = W(t_1) − W(t_0) and W(t_2) − W(t_1) are independent,
E[W_{t_1}(W_{t_2} − W_{t_1})] = E[W_{t_1}] E[W_{t_2} − W_{t_1}] = 0.
Thus
cov(W_{t_1}, W_{t_2}) = E[W_{t_1}²] = t_1 = min(t_1, t_2).
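The covariance identity cov(W_s, W_t) = min(s, t) can also be checked numerically by averaging over simulated pairs (W_s, W_t). A sketch; the sample size and specific times are illustrative.

```python
import math
import random

def sample_w(s, t, rng):
    """Sample (W_s, W_t) for s < t using independent Gaussian increments."""
    ws = rng.gauss(0.0, math.sqrt(s))           # W_s ~ N(0, s)
    wt = ws + rng.gauss(0.0, math.sqrt(t - s))  # add the independent increment
    return ws, wt

rng = random.Random(42)
s, t, n = 0.4, 1.0, 200_000
cov = sum(ws * wt for ws, wt in (sample_w(s, t, rng) for _ in range(n))) / n
print(round(cov, 1))  # ≈ 0.4 = min(s, t)
```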
Self-similarity
Brownian scaling
For every c > 0 the process V_t = (1/√c) W_{ct} is another Wiener process.
A demonstration of Brownian scaling, showing V_t = (1/√c) W_{ct} for decreasing c. Note that the average features of the function do not change while zooming in, and note that it zooms in quadratically faster horizontally than vertically.
Time reversal
The process V_t = W_1 − W_{1−t} for 0 ≤ t ≤ 1 is distributed like W_t for 0 ≤ t ≤ 1.
Time inversion
The process V_t = t W_{1/t} (with V_0 = 0) is another Wiener process.
A class of Brownian martingales
If a polynomial p(x, t) satisfies the PDE
(∂/∂t + (1/2) ∂²/∂x²) p(x, t) = 0,
then the stochastic process
M_t = p(W_t, t)
is a martingale.
Example: W_t² − t is a martingale, which shows that the quadratic variation of W on [0, t] is equal to t. It follows that the expected time of first exit of W from (−c, c) is equal to c².
More generally, for every polynomial p(x, t) the following stochastic process is a martingale:
M_t = p(W_t, t) − ∫₀ᵗ a(W_s, s) ds,
where a is the polynomial
a = (∂/∂t + (1/2) ∂²/∂x²) p.
Example: p(x, t) = x⁴; the process W_t⁴ − 6 ∫₀ᵗ W_s² ds is a martingale, which shows that the quadratic variation of the martingale W_t² − t on [0, t] is equal to 4 ∫₀ᵗ W_s² ds.
For functions p(x,t) more general than polynomials, see local martingales.
Some properties of sample paths
The set of all functions w with these properties is of full Wiener measure. That is, a path (sample function) of the
Wiener process has all these properties almost surely.
Qualitative properties
For every ε > 0, the function w takes both (strictly) positive and (strictly) negative values on (0, ε).
The function w is continuous everywhere but differentiable nowhere (like the Weierstrass function).
Points of local maximum of the function w are a dense countable set; the maximum values are pairwise different; each local maximum is sharp in the following sense: if w has a local maximum at t then (w(s) − w(t))/|s − t| → −∞ as s tends to t. The same holds for local minima.
The function w has no points of local increase, that is, no t > 0 satisfies the following for some ε in (0, t): first, w(s) ≤ w(t) for all s in (t − ε, t), and second, w(s) ≥ w(t) for all s in (t, t + ε). (Local increase is a weaker condition than that w is increasing on (t − ε, t + ε).) The same holds for local decrease.
The function w is of unbounded variation on every interval.
Zeros of the function w form a nowhere dense perfect set of Lebesgue measure 0 and Hausdorff dimension 1/2 (therefore, uncountable).
Quantitative properties
Law of the iterated logarithm:
limsup_{t→∞} |w(t)| / √(2t log log t) = 1, almost surely.
Modulus of continuity
Local modulus of continuity:
limsup_{h→0+} |w(t+h) − w(t)| / √(2h log log(1/h)) = 1, almost surely.
Global modulus of continuity (Lévy):
limsup_{h→0+} sup_{0≤t≤1−h} |w(t+h) − w(t)| / √(2h log(1/h)) = 1, almost surely.
Local time
The image of the Lebesgue measure on [0, t] under the map w (the pushforward measure) has a density L_t(·). Thus,
∫₀ᵗ f(w(s)) ds = ∫_{−∞}^{+∞} f(x) L_t(x) dx
for a wide class of functions f (namely: all continuous functions; all locally integrable functions; all non-negative measurable functions). The density L_t is (more exactly, can and will be chosen to be) continuous. The number L_t(x) is called the local time at x of w on [0, t]. It is strictly positive for all x of the interval (a, b) where a and b are the least and the greatest value of w on [0, t], respectively. (For x outside this interval the local time evidently vanishes.) Treated as a function of two variables x and t, the local time is still continuous. Treated as a function of t (while x is fixed), the local time is a singular function corresponding to a nonatomic measure on the set of zeros of w.
These continuity properties are fairly non-trivial. Consider that the local time can also be defined (as the density of
the pushforward measure) for a smooth function. Then, however, the density is discontinuous, unless the given
function is monotone. In other words, there is a conflict between good behavior of a function and good behavior of
its local time. In this sense, the continuity of the local time of the Wiener process is another manifestation of
non-smoothness of the trajectory.
Related processes
The generator of a Brownian motion is 1/2 times the Laplace–Beltrami operator. The image above is of the Brownian motion on a special manifold: the surface of a sphere.
The stochastic process defined by
X_t = μt + σW_t
is called a Wiener process with drift μ and infinitesimal variance σ². These processes exhaust continuous Lévy processes.
Two random processes on the time interval [0,1] appear, roughly speaking, when conditioning the Wiener process to
vanish on both ends of [0,1]. With no further conditioning, the process takes both positive and negative values on
[0,1] and is called Brownian bridge. Conditioned also to stay positive on (0,1), the process is called Brownian
excursion.
[2]
In both cases a rigorous treatment involves a limiting procedure, since the formula P(A|B) = P(A ∩ B)/P(B) does not work when P(B) = 0.
A geometric Brownian motion can be written as e^{μt + σW_t}. It is a stochastic process which is used to model processes that can never take on negative values, such as the value of stocks.
The stochastic process X_t = e^{−t} W_{e^{2t}} is distributed like the Ornstein–Uhlenbeck process.
The time of hitting a single point x > 0 by the Wiener process is a random variable with the Lévy distribution. The family of these random variables (indexed by all positive numbers x) is a left-continuous modification of a Lévy process. The right-continuous modification of this process is given by times of first exit from closed intervals [0, x].
The local time L_t(0) treated as a random function of t is a random process distributed like the process max_{0≤s≤t} W_s.
The local time L_t(x) treated as a random function of x (while t is constant) is a random process described by Ray–Knight theorems in terms of Bessel processes.
Brownian martingales
Let A be an event related to the Wiener process (more formally: a set, measurable with respect to the Wiener measure, in the space of functions), and X_t the conditional probability of A given the Wiener process on the time interval [0, t] (more formally: the Wiener measure of the set of trajectories whose concatenation with the given partial trajectory on [0, t] belongs to A). Then the process X_t is a continuous martingale. Its martingale property follows immediately from the definitions, but its continuity is a very special fact: a special case of a general theorem stating that all Brownian martingales are continuous. A Brownian martingale is, by definition, a martingale adapted to the Brownian filtration; and the Brownian filtration is, by definition, the filtration generated by the Wiener process.
Time change
Every continuous martingale (starting at the origin) is a time-changed Wiener process.
Example: 2W_t = V(4t), where V is another Wiener process (different from W but distributed like W).
Example: W_t² − t = V(A(t)), where A(t) = 4 ∫₀ᵗ W_s² ds and V is another Wiener process.
In general, if M is a continuous martingale then M_t − M_0 = V(A(t)), where A(t) is the quadratic variation of M on [0, t], and V is a Wiener process.
Corollary. (See also Doob's martingale convergence theorems.) Let M_t be a continuous martingale, and let M_∞⁻ and M_∞⁺ denote the liminf and limsup of M_t as t → ∞. Then only the following two cases are possible: either M_∞⁻ = M_∞⁺ and the common value is finite, or M_∞⁻ = −∞ and M_∞⁺ = +∞; other cases (such as M_∞⁻ = −∞ with M_∞⁺ finite, etc.) are of probability 0.
In particular, a nonnegative continuous martingale has a finite limit (as t → ∞) almost surely.
Everything stated in this subsection for martingales holds also for local martingales.
Change of measure
A wide class of continuous semimartingales (in particular, diffusion processes) is related to the Wiener process via a
combination of time change and change of measure.
Using this fact, the qualitative properties stated above for the Wiener process can be generalized to a wide class of
continuous semimartingales.
Complex-valued Wiener process
The complex-valued Wiener process may be defined as a complex-valued random process of the form Z_t = X_t + iY_t, where X_t and Y_t are independent Wiener processes (real-valued).
[3]
Self-similarity
Brownian scaling, time reversal, time inversion: the same as in the real-valued case.
Rotation invariance: for every complex number c such that |c| = 1 the process cZ_t is another complex-valued Wiener process.
Time change
If f is an entire function then the process f(Z_t) − f(0) is a time-changed complex-valued Wiener process.
Example: Z_t² = (X_t² − Y_t²) + 2X_tY_t i = U(A(t)), where A(t) = 4 ∫₀ᵗ |Z_s|² ds and U is another complex-valued Wiener process.
In contrast to the real-valued case, a complex-valued martingale is generally not a time-changed complex-valued Wiener process. For example, the martingale 2X_t + iY_t is not (here X_t and Y_t are independent Wiener processes, as before).
See also
Wiener sausage
Abstract Wiener space
Classical Wiener space
Chernoff's distribution
Notes
[1] Durrett 1996, Sect. 7.1
[2] Vervaat, W. (1979). A relation between Brownian bridge and Brownian excursion. Ann. Prob. 7, 143-149.
[3] Navarro-Moreno, J.; Estudillo-Martinez, M.D.; Fernandez-Alcala, R.M.; Ruiz-Molina, J.C., "Estimation of Improper Complex-Valued Random Signals in Colored Noise by Using the Hilbert Space Theory" (http://ieeexplore.ieee.org/Xplore/login.jsp?url=http://ieeexplore.ieee.org/iel5/18/4957623/04957648.pdf?arnumber=4957648&authDecision=-203), IEEE Transactions on Information Theory 55 (6): 2859-2867, doi:10.1109/TIT.2009.2018329, retrieved 2010-03-30
References
Kleinert, Hagen, Path Integrals in Quantum Mechanics, Statistics, Polymer Physics, and Financial Markets, 4th edition, World Scientific (Singapore, 2004); Paperback ISBN 981-238-107-4 (also available online: PDF files (http://www.physik.fu-berlin.de/~kleinert/b5))
Stark, Henry; Woods, John W., Probability and Random Processes with Applications to Signal Processing, 3rd edition, Prentice Hall (New Jersey, 2002); Textbook ISBN 0-13-020071-9
Durrett, R. (2000). Probability: Theory and Examples, 4th edition. Cambridge University Press, ISBN 0521765390
Revuz, Daniel; Yor, Marc, Continuous Martingales and Brownian Motion, second edition, Springer-Verlag 1994.
Lévy process
In probability theory, a Lévy process, named after the French mathematician Paul Lévy, is any continuous-time stochastic process that starts at 0, admits càdlàg modification and has "stationary independent increments"; this phrase will be explained below. Lévy processes are a stochastic analog of independent and identically distributed random variables, and the most well-known examples are the Wiener process and the Poisson process.
Definition
A stochastic process X = {X_t : t ≥ 0} is said to be a Lévy process if:
1. X_0 = 0 almost surely;
2. Independent increments: for any 0 ≤ t_1 < t_2 < ... < t_n < ∞, the increments X_{t_2} − X_{t_1}, X_{t_3} − X_{t_2}, ..., X_{t_n} − X_{t_{n−1}} are independent;
3. Stationary increments: for any s < t, X_t − X_s is equal in distribution to X_{t−s};
4. t ↦ X_t is almost surely right continuous with left limits.
Properties
Independent increments
A continuous-time stochastic process assigns a random variable X_t to each point t ≥ 0 in time. In effect it is a random function of t. The increments of such a process are the differences X_s − X_t between its values at different times t < s. To call the increments of a process independent means that increments X_s − X_t and X_u − X_v are independent random variables whenever the two time intervals do not overlap and, more generally, any finite number of increments assigned to pairwise non-overlapping time intervals are mutually (not just pairwise) independent.
Stationary increments
To call the increments stationary means that the probability distribution of any increment X_s − X_t depends only on the length s − t of the time interval; increments with equally long time intervals are identically distributed.
In the Wiener process, the probability distribution of X_s − X_t is normal with expected value 0 and variance s − t.
In the (homogeneous) Poisson process, the probability distribution of X_s − X_t is a Poisson distribution with expected value λ(s − t), where λ > 0 is the "intensity" or "rate" of the process.
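These two increment laws can be compared directly in simulation. The sketch below draws increments of a Wiener process and of a Poisson process over equal-length intervals and checks their means; the interval length, rate and sample size are illustrative choices.

```python
import math
import random

rng = random.Random(1)

def wiener_increment(dt):
    """Increment of a Wiener process over an interval of length dt: N(0, dt)."""
    return rng.gauss(0.0, math.sqrt(dt))

def poisson_increment(lam, dt):
    """Increment of a Poisson process with rate lam over an interval of length dt:
    count the arrivals whose exponential inter-arrival times fit inside dt."""
    count, t = 0, rng.expovariate(lam)
    while t <= dt:
        count += 1
        t += rng.expovariate(lam)
    return count

n, dt, lam = 50_000, 0.5, 3.0
w_mean = sum(wiener_increment(dt) for _ in range(n)) / n
p_mean = sum(poisson_increment(lam, dt) for _ in range(n)) / n
print(round(w_mean, 1), round(p_mean, 1))  # ≈ 0.0 and ≈ lam * dt = 1.5
```

Because the increments are stationary, only the interval length dt (not its location in time) enters either function.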
Divisibility
Lévy processes correspond to infinitely divisible probability distributions:
The probability distributions of the increments of any Lévy process are infinitely divisible, since the increment of length t is the sum of n increments of length t/n, which are i.i.d. by assumption (independent increments and stationarity).
Conversely, there is a Lévy process for each infinitely divisible probability distribution: given such a distribution D, multiples and dividing define a stochastic process for positive rational time, defining it as a Dirac delta distribution for time 0 defines it for time 0, and taking limits defines it for real time. Independent increments and stationarity follow by assumption of divisibility, though one must check continuity and that taking limits gives a well-defined function for irrational time.
Moments
In any Lévy process with finite moments, the nth moment μ_n(t) = E[X_t^n] is a polynomial function of t; these functions satisfy a binomial identity:
μ_n(t + s) = Σ_{k=0}^{n} (n choose k) μ_k(t) μ_{n−k}(s).
Lévy–Khintchine representation
It is possible to characterise all Lévy processes by looking at their characteristic function. This leads to the Lévy–Khintchine representation. If X = {X_t : t ≥ 0} is a Lévy process, then its characteristic function satisfies the following relation:
E[e^{iθX_t}] = exp( t ( aiθ − (1/2)σ²θ² + ∫_{R∖{0}} (e^{iθx} − 1 − iθx 1_{|x|<1}) W(dx) ) ),
where a ∈ R, σ ≥ 0 and 1 is the indicator function. The Lévy measure W must be such that
∫_{R∖{0}} min(x², 1) W(dx) < ∞.
A Lévy process can be seen as comprising three components: a drift, a diffusion component and a jump component. These three components, and thus the Lévy–Khintchine representation of the process, are fully determined by the Lévy–Khintchine triplet (a, σ², W). So one can see that a purely continuous Lévy process is a Brownian motion with drift.
Lévy–Itô decomposition
We can also construct a Lévy process from any given characteristic function of the form given in the Lévy–Khintchine representation. This expression corresponds to the decomposition of a measure in Lebesgue's decomposition theorem: the drift and diffusion are the absolutely continuous part, while the measure W is the singular measure.
Given a Lévy triplet (a, σ², W) there exist three independent Lévy processes, which lie in the same probability space, X^(1), X^(2), X^(3), such that:
X^(1) is a Brownian motion with drift, corresponding to the absolutely continuous part of a measure and capturing the drift a and diffusion σ²;
X^(2) is a compound Poisson process, corresponding to the pure point part of the singular measure W;
X^(3) is a square integrable pure jump martingale that almost surely has a countable number of jumps on a finite interval, corresponding to the singular continuous part of the singular measure W.
The process defined by X = X^(1) + X^(2) + X^(3) is a Lévy process with triplet (a, σ², W).
Constructing a stochastic probability measure
Consider a random process with independent increments, where the random values occur in, say,
a second countable locally compact abelian group .
Let for denote the (Borel regular) probability measures on the initial
position and the increments. Now for let
. These define bona fide probability measures which,
by the properties of the process, compute appropriate probabilities for properties of paths depending on only finitely
many times.
Now, corresponding to these measures, define continuous linear operators in the obvious way. Then, for any countable set of times (for ease consider the rationals) define a linear functional as follows: if depends only on finitely many times, say
where without loss of generality ; let . It is straightforward to see that this is well-defined and linear. Moreover it is clearly a positive, bounded operator with since . By Stone–Weierstrass, extends (uniquely) to a (linear) continuous (positive) operator (with norm 1) on its domain. By the Riesz representation theorem, this in turn gives rise to a (unique) (Borel regular) probability measure, . Precisely, this measure is the unique one satisfying that for any .
Whereas initially we knew the probability distributions of a path at given times / over time increments, and thus could talk about local properties of the paths in the stochastic process, the constructed measure above allows us to attach a probability distribution to (almost) the full path space, and thus enables us to talk about global properties. Roughly speaking, we are justified in (and compelled to) thinking of the measure as though it calculates "the probability" that a path occurs in (when projected onto the times ).
As an example of our new ability to talk about global properties, we have that "almost every path has left limits / is left continuous", if and only if: for every countable sequence of times , letting , we have -almost-everywhere then converges / converges to . (This makes sense, as it can be shown that (i) has left limits/is left continuous if and only if has limits/is continuous under the topology on generated by ; and (ii) if is second countable then has limits/is continuous if and only if converges / converges to whenever .)
Verifying how global properties of paths over the real line can be translated into properties considering only countably many times can be a little tricky. There is no escaping this. Fortunately, the problem of having to change the countable set of times over which the measure is based can be prevented. If we consider a countable dense subset of the reals (e.g. the rationals), we may apply knowledge of the distribution on the increments together with the stochastic measure to check these global properties. E.g. in the case of the Wiener process, we are able to check that almost every path is (i) everywhere continuous; (ii) has continuity modulus (Lévy); and thus (iii) is nowhere differentiable.
See also
Independent and identically-distributed random variables
External links
Applebaum, David (December 2004), "Lévy Processes: From Probability to Finance and Quantum Groups"
[1]
(PDF), Notices of the American Mathematical Society (Providence, RI: American Mathematical Society) 51 (11): 1336-1347, ISSN 1088-9477
References
[1] http://www.ams.org/notices/200411/fea-applebaum.pdf
Stochastic differential equations
A stochastic differential equation (SDE) is a differential equation in which one or more of the terms is a stochastic process, thus resulting in a solution which is itself a stochastic process. SDEs are used to model diverse phenomena such as fluctuating stock prices or physical systems subject to thermal fluctuations. Typically, SDEs incorporate white noise, which can be thought of as the derivative of Brownian motion (or the Wiener process); however, other types of random fluctuations are possible, such as jump processes.
Background
The earliest work on SDEs was done to describe Brownian motion in Einstein's famous paper, and at the same time by Smoluchowski. However, one of the earlier works related to Brownian motion is credited to Bachelier (1900) in his thesis 'Theory of Speculation'. This work was followed by Langevin. Later Itô and Stratonovich put SDEs on more solid mathematical footing.
Terminology
In physical science, SDEs are usually written as Langevin equations. These are sometimes confusingly called "the
Langevin equation" even though there are many possible forms. These consist of an ordinary differential equation
containing a deterministic part and an additional random white noise term. A second form is the Fokker-Planck
equation. The Fokker-Planck equation is a partial differential equation that describes the time evolution of the
probability distribution function. The third form is the stochastic differential equation that is used most frequently in
mathematics and quantitative finance (see below). This is similar to the Langevin form, but it is usually written in
differential form. SDEs come in two varieties, corresponding to two versions of stochastic calculus.
Stochastic Calculus
Brownian motion or the Wiener process was discovered to be exceptionally complex mathematically. The Wiener process is non-differentiable; thus, it requires its own rules of calculus. There are two dominating versions of stochastic calculus, the Itô stochastic calculus and the Stratonovich stochastic calculus. Each of the two has advantages and disadvantages, and newcomers are often confused about which one is more appropriate in a given situation. Guidelines exist (e.g. Øksendal, 2003) and, conveniently, one can readily convert an Itô SDE to an equivalent Stratonovich SDE and back again. Still, one must be careful which calculus to use when the SDE is initially written down.
Numerical Solutions
Numerical solution of stochastic differential equations, and especially stochastic partial differential equations, is a relatively young field. Almost all algorithms that are used for the solution of ordinary differential equations work very poorly for SDEs, exhibiting very poor numerical convergence. A textbook describing many different algorithms is Kloeden & Platen (1995).
Methods include the Euler–Maruyama method, the Milstein method and the Runge–Kutta method (SDE).
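The simplest of these schemes can be sketched in a few lines: the Euler–Maruyama method advances an SDE dX = a(X) dt + b(X) dW one step at a time, replacing dW with a Gaussian draw of variance Δt. The function name, test equation and coefficient values below are illustrative choices, not part of the article.

```python
import math
import random

def euler_maruyama(a, b, x0, t_end, n, seed=0):
    """Integrate dX = a(X) dt + b(X) dW on [0, t_end] with n Euler-Maruyama steps."""
    rng = random.Random(seed)
    dt = t_end / n
    x = x0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment ~ N(0, dt)
        x = x + a(x) * dt + b(x) * dw
    return x

# Mean-reverting test equation with additive noise: dX = -X dt + 0.3 dW.
x_final = euler_maruyama(a=lambda x: -x, b=lambda x: 0.3, x0=1.0, t_end=5.0, n=5000)
print(abs(x_final) < 5.0)  # the mean-reverting path stays moderate
```

Averaging x_final over many independent seeds gives a quick check that the scheme reproduces the decaying mean of the test equation.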
Use in Physics
In physics, SDEs are typically written in the Langevin form and referred to as "the Langevin equation." For example, a general coupled set of first-order SDEs is often written in the form:
dx_i/dt = f_i(x) + Σ_m g_i^m(x) η_m(t),
where x = (x_1, ..., x_n) is the set of unknowns, the f_i and g_i^m are arbitrary functions and the η_m are random functions of time, often referred to as "noise terms". This form is usually usable because there are standard techniques for transforming higher-order equations into several coupled first-order equations by introducing new unknowns. If the g_i^m are constants, the system is said to be subject to additive noise, otherwise it is said to be subject to multiplicative noise. This term is somewhat misleading as it has come to mean the general case even though it appears to imply the limited case in which g_i^m(x) ∝ x. Additive noise is the simpler of the two cases; in that
situation the correct solution can often be found using ordinary calculus and in particular the ordinary chain rule of
calculus. However, in the case of multiplicative noise, the Langevin equation is not a well-defined entity on its own,
and it must be specified whether the Langevin equation should be interpreted as an Itô SDE or a Stratonovich SDE.
In physics, the main method of solution is to find the probability distribution function as a function of time using the equivalent Fokker-Planck equation (FPE). The Fokker-Planck equation is a deterministic partial differential equation. It tells how the probability distribution function evolves in time, similarly to how the Schrödinger equation gives the time evolution of the quantum wave function or the diffusion equation gives the time evolution of chemical concentration. Alternatively, numerical solutions can be obtained by Monte Carlo simulation. Other techniques include path integration, which draws on the analogy between statistical physics and quantum mechanics (for example, the Fokker-Planck equation can be transformed into the Schrödinger equation by rescaling a few variables), or writing down ordinary differential equations for the statistical moments of the probability distribution function.
Note on "the Langevin equation"
The "the" in "the Langevin equation" is somewhat ungrammatical nomenclature. Each individual physical model has its own Langevin equation. Perhaps "a Langevin equation" or "the associated Langevin equation" would conform better with common English usage.
Use in probability and mathematical finance
The notation used in probability theory (and in many applications of probability theory, for instance mathematical finance) is slightly different. This notation makes the exotic nature of the random function of time in the physics formulation more explicit. It is also the notation used in publications on numerical methods for solving stochastic differential equations. In strict mathematical terms, the noise term cannot be chosen as a usual function, but only as a generalized function. The mathematical formulation treats this complication with less ambiguity than the physics formulation.
A typical equation is of the form
dX_t = μ(X_t, t) dt + σ(X_t, t) dB_t,
where B denotes a Wiener process (standard Brownian motion). This equation should be interpreted as an informal way of expressing the corresponding integral equation
X_{t+s} − X_t = ∫_t^{t+s} μ(X_u, u) du + ∫_t^{t+s} σ(X_u, u) dB_u.
The equation above characterizes the behavior of the continuous time stochastic process X_t as the sum of an ordinary Lebesgue integral and an Itô integral. A heuristic (but very helpful) interpretation of the stochastic differential equation is that in a small time interval of length δ the stochastic process X_t changes its value by an amount that is normally distributed with expectation μ(X_t, t)δ and variance σ(X_t, t)²δ and is independent of the past behavior of the process. This is so because the increments of a Wiener process are independent and normally distributed. The function μ is referred to as the drift coefficient, while σ is called the diffusion coefficient. The stochastic process X_t is called a diffusion process, and is usually a Markov process.
The formal interpretation of an SDE is given in terms of what constitutes a solution to the SDE. There are two main definitions of a solution to an SDE, a strong solution and a weak solution. Both require the existence of a process X_t that solves the integral equation version of the SDE. The difference between the two lies in the underlying probability space. A weak solution consists of a probability space and a process that satisfies the integral equation, while a strong solution is a process that satisfies the equation and is defined on a given probability space.
An important example is the equation for geometric Brownian motion
dX_t = μX_t dt + σX_t dB_t,
which is the equation for the dynamics of the price of a stock in the Black–Scholes options pricing model of financial mathematics.
There are also more general stochastic differential equations where the coefficients μ and σ depend not only on the present value of the process X_t, but also on previous values of the process and possibly on present or previous values of other processes too. In that case the solution process, X, is not a Markov process, and it is called an Itô process and not a diffusion process. When the coefficients depend only on present and past values of X, the defining equation is called a stochastic delay differential equation.
Existence and uniqueness of solutions
As with deterministic ordinary and partial differential equations, it is important to know whether a given SDE has a solution, and whether or not it is unique. The following is a typical existence and uniqueness theorem for Itô SDEs taking values in n-dimensional Euclidean space R^n and driven by an m-dimensional Brownian motion B; the proof may be found in Øksendal (2003, Sect. 5.2).
Let T > 0, and let
μ : R^n × [0, T] → R^n,  σ : R^n × [0, T] → R^{n×m}
be measurable functions for which there exist constants C and D such that
|μ(x, t)| + |σ(x, t)| ≤ C(1 + |x|),
|μ(x, t) − μ(y, t)| + |σ(x, t) − σ(y, t)| ≤ D|x − y|
for all t ∈ [0, T] and all x and y ∈ R^n, where
|σ|² = Σ_{i,j} |σ_{ij}|².
Let Z be a random variable that is independent of the σ-algebra generated by B_s, s ≥ 0, and with finite second moment:
E[|Z|²] < +∞.
Then the stochastic differential equation/initial value problem
dX_t = μ(X_t, t) dt + σ(X_t, t) dB_t for t ∈ [0, T], with X_0 = Z,
has a Pr-almost surely unique t-continuous solution (t, ω) ↦ X_t(ω) such that X is adapted to the filtration F_t^Z generated by Z and B_s, s ≤ t, and
E[ ∫₀ᵀ |X_t|² dt ] < +∞.
See also
Langevin dynamics
Local volatility
Stochastic volatility
Sethi advertising model
Stochastic partial differential equations
References
Adomian, George (1983). Stochastic Systems. Mathematics in Science and Engineering (169). Orlando, FL: Academic Press.
Adomian, George (1986). Nonlinear Stochastic Operator Equations. Orlando, FL: Academic Press.
Adomian, George (1989). Nonlinear Stochastic Systems Theory and Applications to Physics. Mathematics and its Applications (46). Dordrecht: Kluwer Academic Publishers Group.
Øksendal, Bernt K. (2003). Stochastic Differential Equations: An Introduction with Applications. Berlin: Springer. ISBN 3-540-04758-1.
Teugels, J. and Sund, B. (eds.) (2004). Encyclopedia of Actuarial Science. Chichester: Wiley. pp. 523-527.
Gardiner, C. W. (2004). Handbook of Stochastic Methods: for Physics, Chemistry and the Natural Sciences. Springer. p. 415.
Mikosch, Thomas (1998). Elementary Stochastic Calculus: with Finance in View. Singapore: World Scientific Publishing. p. 212. ISBN 981-02-3543-7.
Bachelier, L. (1900). Théorie de la spéculation (in French), PhD Thesis. NUMDAM: http://www.numdam.org/item?id=ASENS_1900_3_17__21_0. In English in the 1971 book 'The Random Character of the Stock Market', ed. P. H. Cootner.
Kloeden, P. E. and Platen, E. (1995). Numerical Solution of Stochastic Differential Equations. Springer.
Stochastic volatility
Stochastic volatility models are used in the field of mathematical finance to evaluate derivative securities, such as options. The name derives from the models' treatment of the underlying security's volatility as a random process, governed by state variables such as the price level of the underlying security, the tendency of volatility to revert to some long-run mean value, and the variance of the volatility process itself, among others.
Stochastic volatility models are one approach to resolving a shortcoming of the Black–Scholes model, which assumes that the underlying volatility is constant over the life of the derivative and unaffected by changes in the price level of the underlying security. Constant-volatility models cannot explain long-observed features of the implied volatility surface such as the volatility smile and skew, which indicate that implied volatility does tend to vary with respect to strike price and expiration. By assuming that the volatility of the underlying price is a stochastic process rather than a constant, it becomes possible to model derivatives more accurately.
Basic model
Starting from a constant volatility approach, assume that the derivative's underlying price S_t follows a standard model for geometric Brownian motion:
dS_t = μ S_t dt + σ S_t dW_t,
where μ is the constant drift (i.e. expected return) of the security price S_t, σ is the constant volatility, and dW_t is a standard Gaussian with zero mean and unit standard deviation. The explicit solution of this stochastic differential equation is
S_t = S_0 e^{(μ − σ²/2) t + σ W_t}.
The Maximum likelihood estimator to estimate the constant volatility for given stock prices at different times
is
;
its expectation value is .
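As a quick numerical sketch (ours, not part of the article; all function and variable names are invented), one can simulate a geometric Brownian motion path with the explicit solution and recover the constant volatility with the estimator above, using only the Python standard library:

```python
import math
import random

def simulate_gbm(s0, mu, sigma, T, n, rng):
    """Simulate a GBM path via the explicit solution
       S_t = S_0 * exp((mu - sigma^2/2) t + sigma W_t)."""
    dt = T / n
    prices = [s0]
    w = 0.0  # Brownian path W_t, built up incrementally
    for _ in range(n):
        w += math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t = len(prices) * dt
        prices.append(s0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * w))
    return prices

def mle_volatility(prices, T):
    """MLE of constant volatility from prices observed at equally spaced times on [0, T]."""
    n = len(prices) - 1
    dt = T / n
    logret = [math.log(prices[i + 1] / prices[i]) for i in range(n)]
    term1 = sum(r * r for r in logret) / (n * dt)
    term2 = (math.log(prices[-1] / prices[0])) ** 2 / (n * T)
    return math.sqrt(term1 - term2)

rng = random.Random(42)
path = simulate_gbm(100.0, 0.1, 0.2, 1.0, 10_000, rng)
sigma_hat = mle_volatility(path, 1.0)
```

With 10,000 observations the estimate lands very close to the true σ = 0.2, as the estimator's standard deviation shrinks like σ/√(2n).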
This basic model with constant volatility is the starting point for non-stochastic volatility models such as
Black-Scholes and Cox-Ross-Rubinstein.
For a stochastic volatility model, replace the constant volatility σ with a function ν_t that models the variance of S_t. This variance function is also modeled as Brownian motion, and the form of ν_t depends on the particular SV model under study:

dS_t = μ S_t dt + √ν_t S_t dW_t,
dν_t = α_{S,t} dt + β_{S,t} dB_t,

where α_{S,t} and β_{S,t} are some functions of ν, and dB_t is another standard Wiener process that is correlated with dW_t with constant correlation factor ρ.
Heston model
The popular Heston model is a commonly used SV model, in which the randomness of the variance process varies as
the square root of variance. In this case, the differential equation for variance takes the form:

dν_t = θ(ω − ν_t) dt + ξ √ν_t dB_t,

where ω is the mean long-term volatility, θ is the rate at which the volatility reverts toward its long-term mean, ξ is the volatility of the volatility process, and dB_t is, like dW_t, a standard Wiener process. However, dW_t and dB_t are correlated with the constant correlation value ρ.
In other words, the Heston SV model assumes that volatility is a random process that
1. exhibits a tendency to revert towards a long-term mean volatility ω at a rate θ,
2. exhibits its own (constant) volatility, ξ,
3. and whose source of randomness is correlated (with correlation ρ) with the randomness of the underlying's price process.
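A minimal Euler–Maruyama simulation of dynamics of this kind can be sketched as follows (our own illustrative code, not from the article; the "full truncation" fix for negative variance and all parameter values are assumptions):

```python
import math
import random

def heston_path(s0, v0, mu, omega, theta, xi, rho, T, n, rng):
    """Euler discretization of
         dS = mu*S dt + sqrt(v)*S dW,   dv = theta*(omega - v) dt + xi*sqrt(v) dB,
       with corr(dW, dB) = rho. Negative variance is truncated to zero before
       taking square roots ('full truncation'), a common practical fix."""
    dt = T / n
    s, v = s0, v0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second draw with correlation rho to the first.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        vp = max(v, 0.0)
        s += mu * s * dt + math.sqrt(vp) * s * math.sqrt(dt) * z1
        v += theta * (omega - vp) * dt + xi * math.sqrt(vp) * math.sqrt(dt) * z2
    return s, max(v, 0.0)

rng = random.Random(7)
s_T, v_T = heston_path(100.0, 0.04, 0.05, 0.04, 1.5, 0.3, -0.7, 1.0, 2_000, rng)
```

In practice one would average a payoff over many such paths to price an option; this sketch only advances a single path.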
SABR volatility model
The SABR model (Stochastic Alpha, Beta, Rho) describes a single forward (related to any asset e.g. an index,
interest rate, bond, currency or equity) under stochastic volatility σ_t:

dF_t = σ_t F_t^β dW_t,
dσ_t = α σ_t dZ_t.

The initial values F_0 and σ_0 are the current forward price and volatility, whereas W_t and Z_t are two correlated Wiener processes (i.e. Brownian motions) with correlation coefficient ρ: dW_t dZ_t = ρ dt. The constant parameters α, β are such that 0 ≤ β ≤ 1 and α ≥ 0.
The main feature of the SABR model is its ability to reproduce the smile effect of the volatility surface.
GARCH model
The Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model is another popular model for
estimating stochastic volatility. It assumes that the randomness of the variance process varies with the variance, as
opposed to the square root of the variance as in the Heston model. The standard GARCH(1,1) model has the following form for the variance differential:

dν_t = θ(ω − ν_t) dt + ξ ν_t dB_t.
The GARCH model has been extended via numerous variants, including the NGARCH, LGARCH, EGARCH,
GJR-GARCH, etc.
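For illustration, the familiar discrete-time GARCH(1,1) recursion can be sketched as follows (our own code; note that ω, α, β here are the usual textbook GARCH parameters, not the same symbols as the diffusion above):

```python
def garch11_variance(returns, omega, alpha, beta):
    """Discrete-time GARCH(1,1) recursion:
         sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1},
       started from the unconditional variance omega / (1 - alpha - beta)."""
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2

# Hypothetical daily returns; parameter values are illustrative only.
variances = garch11_variance([0.01, -0.02, 0.015, 0.0, -0.01], 1e-5, 0.08, 0.90)
```

The recursion shows the defining feature of GARCH: a large squared return yesterday raises today's conditional variance, which then decays geometrically at rate β.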
3/2 model
The 3/2 model is similar to the Heston model, but assumes that the randomness of the variance process varies with ν_t^{3/2}. The form of the variance differential is:

dν_t = ν_t (ω − θ ν_t) dt + ξ ν_t^{3/2} dB_t.
Chen model
In interest rate modelling, Lin Chen in 1994 developed the first stochastic mean and stochastic volatility model, the Chen model. Specifically, the dynamics of the instantaneous interest rate are given by the following stochastic differential equations:

dr_t = κ(θ_t − r_t) dt + √(r_t) √(σ_t) dW_t,
dθ_t = ν(ζ − θ_t) dt + α √(θ_t) dW_t,
dσ_t = μ(β − σ_t) dt + η √(σ_t) dW_t.
Calibration
Once a particular SV model is chosen, it must be calibrated against existing market data. Calibration is the process of
identifying the set of model parameters that are most likely given the observed data. This process is called Maximum
Likelihood Estimation (MLE). For instance, in the Heston model, the set of model parameters Ψ = {ω, θ, ξ, ρ} can be estimated by applying an MLE algorithm such as the Powell Directed Set method [1] to observations of historic underlying security prices.
In this case, one starts with an estimate for Ψ, computes the residual errors when applying the historic price data to the resulting model, and then adjusts Ψ to try to minimize these errors. Once the calibration has been performed, it is standard practice to re-calibrate the model over time.
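The estimate-and-adjust loop can be sketched with a naive coordinate search standing in for Powell's direction-set method (purely illustrative, our own code: the objective below is a toy least-squares problem, not a Heston likelihood, and all names are invented):

```python
def calibrate(objective, psi0, step=0.1, shrink=0.5, iters=200):
    """Crude coordinate search (an illustrative stand-in for Powell's
       direction-set method): probe each parameter up and down, keep any move
       that lowers the objective, and shrink the step once no move helps."""
    psi = list(psi0)
    best = objective(psi)
    for _ in range(iters):
        improved = False
        for i in range(len(psi)):
            for delta in (step, -step):
                trial = list(psi)
                trial[i] += delta
                val = objective(trial)
                if val < best:
                    psi, best, improved = trial, val, True
        if not improved:
            step *= shrink  # refine the search around the current point
    return psi, best

# Toy calibration: recover (a, b) from noiseless observations y = a*x + b.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated with a = 2, b = 1
sse = lambda p: sum((p[0] * x + p[1] - y) ** 2 for x, y in zip(xs, ys))
psi_hat, err = calibrate(sse, [0.0, 0.0])
```

In a real calibration the objective would be the (negative log-)likelihood of observed prices under the chosen SV model, and a library optimizer would replace this hand-rolled search.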
See also
Chen model
Heston model
Local volatility
Risk-neutral measure
SABR Volatility Model
Volatility
Volatility, uncertainty, complexity and ambiguity
Black–Scholes
References
Stochastic Volatility and Mean-variance Analysis [2], Hyungsok Ahn, Paul Wilmott, (2006).
A closed-form solution for options with stochastic volatility [3], S. L. Heston, (1993).
Inside Volatility Arbitrage [4], Alireza Javaheri, (2005).
Accelerating the Calibration of Stochastic Volatility Models [5], Fiodar Kilin, (2006).
Lin Chen (1996). Stochastic Mean and Stochastic Volatility — A Three-Factor Model of the Term Structure of Interest Rates and Its Application to the Pricing of Interest Rate Derivatives. Blackwell Publishers.
References
[1] http://www.library.cornell.edu/nr/bookcpdf.html
[2] http://www.wilmott.com/detail.cfm?articleID=245
[3] http://www.javaquant.net/papers/Heston-original.pdf
[4] http://www.amazon.com/s?platform=gurupa&url=index%3Dblended&keywords=inside+volatility+arbitrage
[5] http://ssrn.com/abstract=982221
Numerical partial differential equations
Numerical partial differential equations is the branch of numerical analysis that studies the numerical solution of
partial differential equations (PDEs).
Numerical techniques for solving PDEs include the following:
The finite difference method, in which functions are represented by their values at certain grid points and
derivatives are approximated through differences in these values.
The method of lines, where all but one variable is discretized. The result is a system of ODEs in the remaining
continuous variable.
The finite element method, where functions are represented in terms of basis functions and the PDE is solved in
its integral (weak) form.
The finite volume method, which divides space into regions or volumes and computes the change within each
volume by considering the flux (flow rate) across the surfaces of the volume.
The spectral method, which represents functions as a sum of particular basis functions, for example using a
Fourier series.
Meshfree methods do not require a grid and so may be better suited for some problems, although the computational effort is usually higher.
Domain decomposition methods solve boundary value problems by splitting them into smaller boundary value
problems on subdomains and iterating to coordinate the solution between the subdomains.
Multigrid methods solve differential equations using a hierarchy of discretizations.
The finite difference method is often regarded as the simplest method to learn and use. The finite element and finite
volume methods are widely used in engineering and in computational fluid dynamics, and are well suited to
problems in complicated geometries. Spectral methods are generally the most accurate, provided that the solutions
are sufficiently smooth.
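As a minimal sketch of the finite difference method described above (our own example, not from the article), the 1D heat equation u_t = a u_xx can be advanced with the explicit forward-time central-space (FTCS) scheme, which is stable when a Δt/Δx² ≤ 1/2:

```python
import math

def heat_ftcs(u, alpha, dx, dt, steps):
    """Forward-time central-space scheme for u_t = alpha * u_xx with fixed
       (Dirichlet) boundary values; stable for alpha*dt/dx**2 <= 1/2."""
    r = alpha * dt / dx ** 2
    for _ in range(steps):
        u = ([u[0]]
             + [u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1]) for i in range(1, len(u) - 1)]
             + [u[-1]])
    return u

n = 20
dx = 1.0 / n
u0 = [math.sin(math.pi * i * dx) for i in range(n + 1)]  # u(x,0) = sin(pi x), u = 0 at both ends
u = heat_ftcs(u0, 1.0, dx, 0.4 * dx * dx, 250)  # advance to t = 0.25
```

For this initial condition the exact solution is u(x, t) = exp(−π² t) sin(πx), so the computed midpoint value can be checked directly against exp(−π²/4).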
See also
List of numerical analysis topics#Numerical partial differential equations
Numerical ordinary differential equations
External links
Numerical Methods for Partial Differential Equations [1], course at MIT OpenCourseWare.
IMS [2], the Open Source IMTEK Mathematica Supplement (IMS).
References
[1] http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-920j-numerical-methods-for-partial-differential-equations-sma-5212-spring-2003/
[2] http://www.imtek.uni-freiburg.de/simulation/mathematica/IMSweb/
Crank–Nicolson method
In numerical analysis, the Crank–Nicolson method is a finite difference method used for numerically solving the heat equation and similar partial differential equations.[1] It is a second-order method in time, implicit in time, and numerically stable. The method was developed by John Crank and Phyllis Nicolson in the mid 20th century.[2]
For diffusion equations (and many other equations), it can be shown that the Crank–Nicolson method is unconditionally stable.[3] However, the approximate solutions can still contain (decaying) spurious oscillations if the ratio of the time step to the square of the space step is large (typically larger than 1/2). For this reason, whenever large time steps or high spatial resolution are necessary, the less accurate backward Euler method is often used, which is both stable and immune to oscillations.
The method
(Figure: the Crank–Nicolson stencil for a 1D problem.)
The Crank–Nicolson method is based on central differences in space and the trapezoidal rule in time, giving second-order convergence in time. For example, in one dimension, if the partial differential equation is

∂u/∂t = F(u, x, t, ∂u/∂x, ∂²u/∂x²),

then, letting u(iΔx, nΔt) = u_i^n, the equation for the Crank–Nicolson method is the average of the forward Euler method at n and the backward Euler method at n + 1 (note, however, that the method itself is not simply the average of those two methods, as the equation has an implicit dependence on the solution):

forward Euler:   (u_i^{n+1} − u_i^n)/Δt = F_i^n(u, x, t, ∂u/∂x, ∂²u/∂x²),
backward Euler:  (u_i^{n+1} − u_i^n)/Δt = F_i^{n+1}(u, x, t, ∂u/∂x, ∂²u/∂x²),
Crank–Nicolson:  (u_i^{n+1} − u_i^n)/Δt = (1/2) [F_i^{n+1}(u, x, t, ∂u/∂x, ∂²u/∂x²) + F_i^n(u, x, t, ∂u/∂x, ∂²u/∂x²)].

The function F must be discretized spatially with a central difference.
Note that this is an implicit method: to get the "next" value of u in time, a system of algebraic equations must be
solved. If the partial differential equation is nonlinear, the discretization will also be nonlinear so that advancing in
time will involve the solution of a system of nonlinear algebraic equations, though linearizations are possible. In
many problems, especially linear diffusion, the algebraic problem is tridiagonal and may be efficiently solved with the tridiagonal matrix algorithm, which gives a fast O(n) direct solution, as opposed to the usual O(n³) for a full matrix.
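The tridiagonal matrix algorithm mentioned above can be sketched as follows (a standard textbook formulation; function and variable names are ours):

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b,
       super-diagonal c, and right-hand side d, in O(n) operations
       (the Thomas algorithm). a[0] and c[-1] are unused."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# 4x4 system with diagonals (-1, 2, -1); the right-hand side is chosen
# so the exact solution is [1, 2, 3, 4].
x = thomas_solve([0, -1, -1, -1], [2, 2, 2, 2], [-1, -1, -1, 0], [0, 0, 0, 5])
```

The algorithm assumes the system needs no pivoting, which holds for the diagonally dominant matrices produced by Crank–Nicolson discretizations of diffusion.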
Example: 1D diffusion
The Crank–Nicolson method is often applied to diffusion problems. As an example, for linear diffusion,

∂u/∂t = a ∂²u/∂x²,

the Crank–Nicolson discretization is then:

(u_i^{n+1} − u_i^n)/Δt = (a / (2Δx²)) [(u_{i+1}^{n+1} − 2u_i^{n+1} + u_{i−1}^{n+1}) + (u_{i+1}^n − 2u_i^n + u_{i−1}^n)]

or, letting r = aΔt/(2Δx²):

−r u_{i+1}^{n+1} + (1 + 2r) u_i^{n+1} − r u_{i−1}^{n+1} = r u_{i+1}^n + (1 − 2r) u_i^n + r u_{i−1}^n,

which is a tridiagonal problem, so that u^{n+1} may be efficiently solved by using the tridiagonal matrix algorithm in favor of a much more costly matrix inversion.
A quasilinear equation, such as (this is a minimalistic example and not general)

∂u/∂t = a(u) ∂²u/∂x²,

would lead to a nonlinear system of algebraic equations, which could not be easily solved as above; however, it is possible in some cases to linearize the problem by using the old value for a, that is a(u_i^n) instead of a(u_i^{n+1}). Other times, it may be possible to estimate a(u_i^{n+1}) using an explicit method and maintain stability.
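A minimal Crank–Nicolson step for linear diffusion with zero Dirichlet boundaries, using the convention r = a·Δt/(2·Δx²), might look like this (illustrative code of ours, with the tridiagonal solve written inline):

```python
import math

def crank_nicolson_step(u, r):
    """One Crank-Nicolson step for u_t = a*u_xx with u = 0 at both boundaries,
       where r = a*dt/(2*dx**2). The implicit tridiagonal system
       (1+2r)*u_i - r*(u_{i-1}+u_{i+1}) = rhs_i is solved by forward
       elimination and back substitution."""
    n = len(u)
    # Right-hand side from the explicit (time level n) half of the scheme.
    rhs = [r * u[i - 1] + (1 - 2 * r) * u[i] + r * u[i + 1] for i in range(1, n - 1)]
    m = n - 2
    cp, dp = [0.0] * m, [0.0] * m
    cp[0] = -r / (1 + 2 * r)
    dp[0] = rhs[0] / (1 + 2 * r)
    for i in range(1, m):                      # forward elimination
        denom = (1 + 2 * r) + r * cp[i - 1]
        cp[i] = -r / denom
        dp[i] = (rhs[i] + r * dp[i - 1]) / denom
    x = [0.0] * m
    x[-1] = dp[-1]
    for i in range(m - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return [0.0] + x + [0.0]

n = 40
dx = 1.0 / n
dt = 0.01
u = [math.sin(math.pi * i * dx) for i in range(n + 1)]
for _ in range(25):  # advance to t = 0.25 with a = 1
    u = crank_nicolson_step(u, 1.0 * dt / (2 * dx * dx))
```

Note that the scheme remains stable here even though a·Δt/Δx² = 16, far above the explicit method's stability limit; the result still tracks the exact decay exp(−π² t) closely.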
Example: 1D diffusion with advection for steady flow, with multiple channel
connections
This is a solution usually employed for many purposes when there's a contamination problem in streams or rivers
under steady flow conditions but information is given in one dimension only. Often the problem can be simplified
into a 1-dimensional problem and still yield useful information.
Here we model the concentration of a solute contaminant in water. This problem is composed of three parts: the
known diffusion equation ( chosen as constant), an advective component (which means the system is evolving
in space due to a velocity field), which we choose to be a constant Ux, and a lateral interaction between longitudinal
channels (k).
where C is the concentration of the contaminant and subscripts N and M correspond to previous and next channel.
The Crank–Nicolson method (where i represents position and j time) transforms each component of the PDE into the following:
Now we create the following constants to simplify the algebra:
and substitute <1>, <2>, <3>, <4>, <5>, and <6> into <0>. We then put the new time terms on the left (j+1) and the present time terms on the right (j) to get:
To model the first channel, we realize that it can only be in contact with the following channel (M), so the expression
is simplified to:
In the same way, to model the last channel, we realize that it can only be in contact with the previous channel (N), so
the expression is simplified to:
To solve this linear system of equations we must now see that boundary conditions must be given first to the
beginning of the channels:
: initial condition for the channel at present time step
: initial condition for the channel at next time step
: initial condition for the previous channel to the one analyzed at present time step
: initial condition for the next channel to the one analyzed at present time step
For the last cell of the channels (z) the most convenient condition becomes an adiabatic one, so
This condition is satisfied if and only if (regardless of a null value)
Let us solve this problem (in a matrix form) for the case of 3 channels and 5 nodes (including the initial boundary
condition). We express this as a linear system problem:
where
and
Now we must realize that AA and BB should be arrays made of four different subarrays (remember that only three
channels are considered for this example but it covers the main part discussed above).
and
where the elements mentioned above correspond to the next arrays and an additional 4x4 full of zeros. Please note
that the sizes of AA and BB are 12x12:
,
,
,
&
The d vector here is used to hold the boundary conditions. In this example it is a 12x1 vector:
To find the concentration at any time, one must iterate the following equation:
Example: 2D diffusion
When extending into two dimensions on a uniform Cartesian grid, the derivation is similar and the results may lead to a system of band-diagonal equations rather than tridiagonal ones. The two-dimensional heat equation

∂u/∂t = a (∂²u/∂x² + ∂²u/∂y²)

can be solved with the Crank–Nicolson discretization of

u_{i,j}^{n+1} = u_{i,j}^n + (aΔt / (2Δx²)) [(u_{i+1,j}^{n+1} + u_{i−1,j}^{n+1} + u_{i,j+1}^{n+1} + u_{i,j−1}^{n+1} − 4u_{i,j}^{n+1}) + (u_{i+1,j}^n + u_{i−1,j}^n + u_{i,j+1}^n + u_{i,j−1}^n − 4u_{i,j}^n)],

assuming that a square grid is used, so that Δx = Δy. This equation can be simplified somewhat by rearranging terms and using the CFL number

μ = aΔt / Δx².

For the Crank–Nicolson numerical scheme, a low CFL number is not required for stability; however, it is required for numerical accuracy. We can now write the scheme as:

(1 + 2μ) u_{i,j}^{n+1} − (μ/2)(u_{i+1,j}^{n+1} + u_{i−1,j}^{n+1} + u_{i,j+1}^{n+1} + u_{i,j−1}^{n+1}) = (1 − 2μ) u_{i,j}^n + (μ/2)(u_{i+1,j}^n + u_{i−1,j}^n + u_{i,j+1}^n + u_{i,j−1}^n).
Application in financial mathematics
Because a number of other phenomena can be modeled with the heat equation (often called the diffusion equation in financial mathematics), the Crank–Nicolson method has been applied to those areas as well.[4] In particular, the Black–Scholes option pricing model's differential equation can be transformed into the heat equation, and thus numerical solutions for option pricing can be obtained with the Crank–Nicolson method.
The importance of this for finance is that option pricing problems, when extended beyond the standard assumptions (e.g. incorporating changing dividends), cannot be solved in closed form, but can be solved using this method. Note, however, that for non-smooth final conditions (which occur for most financial instruments), the Crank–Nicolson method is not satisfactory, as numerical oscillations are not damped. For vanilla options, this results in oscillation in the gamma value around the strike price. Therefore, special damping initialization steps are necessary (e.g. a fully implicit finite difference method).
See also
Financial mathematics
Partial differential equations
References
[1] Tuncer Cebeci (2002). Convective Heat Transfer (http://books.google.com/?id=xfkgT9Fd4t4C&pg=PA257&dq="Crank-Nicolson+method"). Springer. ISBN 0966846141.
[2] Crank, J.; Nicolson, P. (1947). "A practical method for numerical evaluation of solutions of partial differential equations of the heat conduction type". Proc. Camb. Phil. Soc. 43: 50–67. doi:10.1007/BF02127704.
[3] Thomas, J. W. (1995). Numerical Partial Differential Equations: Finite Difference Methods. Texts in Applied Mathematics. 22. Berlin, New York: Springer-Verlag. ISBN 978-0-387-97999-1. Example 3.3.2 shows that Crank–Nicolson is unconditionally stable when applied to the heat equation.
[4] Wilmott, P.; Howison, S.; Dewynne, J. (1995). The Mathematics of Financial Derivatives: A Student Introduction (http://books.google.co.in/books?hl=en&q=The+Mathematics+of+Financial+Derivatives+Wilmott&um=1&ie=UTF-8&sa=N&tab=wp). Cambridge Univ. Press. ISBN 0521497892.
External links
Module for Parabolic P.D.E.'s (http://math.fullerton.edu/mathews/n2003/CrankNicolsonMod.html)
Finite difference
A finite difference is a mathematical expression of the form f(x + b) − f(x + a). If a finite difference is divided by b − a, one gets a difference quotient. The approximation of derivatives by finite differences plays a central role in finite difference methods for the numerical solution of differential equations, especially boundary value problems.
Recurrence relations can be written as difference equations by replacing iteration notation with finite differences.
Forward, backward, and central differences
Only three forms are commonly considered: forward, backward, and central differences.
A forward difference is an expression of the form

Δ_h[f](x) = f(x + h) − f(x).

Depending on the application, the spacing h may be variable or constant.
A backward difference uses the function values at x and x − h, instead of the values at x + h and x:

∇_h[f](x) = f(x) − f(x − h).

Finally, the central difference is given by

δ_h[f](x) = f(x + h/2) − f(x − h/2).
Relation with derivatives
The derivative of a function f at a point x is defined by the limit

f'(x) = lim_{h→0} (f(x + h) − f(x)) / h.

If h has a fixed (non-zero) value, instead of approaching zero, then the right-hand side is

(f(x + h) − f(x)) / h = Δ_h[f](x) / h.

Hence, the forward difference divided by h approximates the derivative when h is small. The error in this approximation can be derived from Taylor's theorem. Assuming that f is continuously differentiable, the error is

Δ_h[f](x) / h − f'(x) = O(h)  (h → 0).

The same formula holds for the backward difference:

∇_h[f](x) / h − f'(x) = O(h).

However, the central difference yields a more accurate approximation. Its error is proportional to the square of the spacing (if f is twice continuously differentiable):

δ_h[f](x) / h − f'(x) = O(h²).

The main problem with the central difference method, however, is that oscillating functions can yield a zero derivative. If f(nh) = 1 for n odd and f(nh) = 2 for n even, then f'(nh) = 0 if it is calculated with the central difference scheme. This is particularly troublesome if the domain of f is discrete.
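The three difference quotients and their error orders can be checked numerically (a small sketch of ours):

```python
import math

def forward_diff(f, x, h):
    return (f(x + h) - f(x)) / h          # error O(h)

def backward_diff(f, x, h):
    return (f(x) - f(x - h)) / h          # error O(h)

def central_diff(f, x, h):
    return (f(x + h / 2) - f(x - h / 2)) / h  # error O(h^2)

h = 1e-3
exact = math.cos(1.0)  # derivative of sin at x = 1
err_fwd = abs(forward_diff(math.sin, 1.0, h) - exact)
err_cen = abs(central_diff(math.sin, 1.0, h) - exact)
```

At h = 10⁻³ the forward-difference error is on the order of h, while the central-difference error is on the order of h², several orders of magnitude smaller.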
Higher-order differences
In an analogous way, one can obtain finite difference approximations to higher order derivatives and differential operators. For example, by using the above central difference formula for f'(x + h/2) and f'(x − h/2) and applying a central difference formula for the derivative of f' at x, we obtain the central difference approximation of the second derivative of f:

f''(x) ≈ δ_h²[f](x) / h² = (f(x + h) − 2 f(x) + f(x − h)) / h².

More generally, the n-th order forward, backward, and central differences are respectively given by:

Δ_h^n[f](x) = Σ_{i=0}^{n} (−1)^i C(n, i) f(x + (n − i) h),
∇_h^n[f](x) = Σ_{i=0}^{n} (−1)^i C(n, i) f(x − i h),
δ_h^n[f](x) = Σ_{i=0}^{n} (−1)^i C(n, i) f(x + (n/2 − i) h).

Note that the central difference will, for odd n, have h multiplied by non-integers. This is often a problem because it amounts to changing the interval of discretization. The problem may be remedied by taking the average of δ^n[f](x − h/2) and δ^n[f](x + h/2).
The relationship of these higher-order differences with the respective derivatives is straightforward:

d^n f / dx^n (x) = Δ_h^n[f](x) / h^n + O(h) = ∇_h^n[f](x) / h^n + O(h) = δ_h^n[f](x) / h^n + O(h²).

Higher-order differences can also be used to construct better approximations. As mentioned above, the first-order difference approximates the first-order derivative up to a term of order h. However, the combination

(Δ_h[f](x) − (1/2) Δ_h²[f](x)) / h = −(f(x + 2h) − 4 f(x + h) + 3 f(x)) / (2h)

approximates f'(x) up to a term of order h². This can be proven by expanding the above expression in a Taylor series, or by using the calculus of finite differences, explained below.
If necessary, the finite difference can be centered about any point by mixing forward, backward, and central differences.
Arbitrarily sized kernels
Using a little linear algebra, one can fairly easily construct approximations, which sample an arbitrary number of
points to the left and a (possibly different) number of points to the right of the center point, for any order of
derivative. This involves solving a linear system such that the Taylor expansion of the sum of those points, around
the center point, well approximates the Taylor expansion of the desired derivative.
This is useful for differentiating a function on a grid, where, as one approaches the edge of the grid, one must sample
fewer and fewer points on one side.
The details are outlined in these notes [1].
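The linear system described above can be sketched concretely. For sample offsets p_j and derivative order m, matching Taylor expansions gives the conditions Σ_j c_j p_j^k = m! δ_{k,m} for k = 0..n−1, after which f^{(m)}(x) ≈ Σ_j c_j f(x + p_j h) / h^m. (The condition is spelled out here as our own assumption, since the linked notes are not reproduced; the code is an illustrative sketch.)

```python
from math import factorial

def stencil_coefficients(offsets, m):
    """Finite difference weights c_j for the m-th derivative from samples
       f(x + p*h), p in offsets: solve sum_j c_j * p_j**k = m! * delta(k, m)
       for k = 0..n-1 by Gaussian elimination with partial pivoting."""
    n = len(offsets)
    A = [[float(p) ** k for p in offsets] for k in range(n)]
    b = [float(factorial(m)) if k == m else 0.0 for k in range(n)]
    for col in range(n):                       # forward elimination
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            fac = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= fac * A[col][c]
            b[r] -= fac * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):             # back substitution
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

w = stencil_coefficients([-1, 0, 1], 2)  # recovers the classic (1, -2, 1) stencil
```

One-sided stencils for grid edges follow from the same routine by choosing offsets such as [0, 1, 2].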
Properties
For all positive k and n,

Δ_{kh}^n[f](x) = Σ_{i₁=0}^{k−1} Σ_{i₂=0}^{k−1} ⋯ Σ_{iₙ=0}^{k−1} Δ_h^n[f](x + (i₁ + i₂ + ⋯ + iₙ) h).

Leibniz rule:

Δ_h[f g](x) = f(x + h) Δ_h[g](x) + g(x) Δ_h[f](x).
Finite difference methods
An important application of finite differences is in numerical analysis, especially in numerical ordinary and partial differential equations, which aim at the numerical solution of ordinary and partial differential equations, respectively. The idea is to replace the derivatives appearing in the differential equation by finite differences that approximate them. The resulting methods are called finite difference methods.
Common applications of the finite difference method are in computational science and engineering disciplines, such
as thermal engineering, fluid mechanics, etc.
Calculus of finite differences
The forward difference can be considered as a difference operator, which maps the function f to Δ_h[f]. This operator satisfies

Δ_h = T_h − I,

where T_h is the shift operator with step h, defined by T_h[f](x) = f(x + h), and I is the identity operator.
Finite differences of higher orders can be defined in a recursive manner as Δ_h^n ≡ Δ_h(Δ_h^{n−1}) or, equivalently, in operator notation, Δ_h^n ≡ (T_h − I)^n.
The difference operator Δ_h is linear and satisfies the Leibniz rule. Similar statements hold for the backward and central differences.
Formally applying the Taylor series with respect to h gives the formula

Δ_h = h D + (h²/2!) D² + (h³/3!) D³ + ⋯ = e^{hD} − I,

where D denotes the derivative operator, mapping f to its derivative f'. The expansion is valid when both sides act on analytic functions, for sufficiently small h. Formally inverting the exponential suggests that

h D = ln(1 + Δ_h) = Δ_h − Δ_h²/2 + Δ_h³/3 − ⋯.

This formula holds in the sense that both operators give the same result when applied to a polynomial. Even for analytic functions, the series on the right is not guaranteed to converge; it may be an asymptotic series. However, it can be used to obtain more accurate approximations for the derivative. For instance, retaining the first two terms of the series yields the second-order approximation to f'(x) mentioned at the end of the section Higher-order differences.
The analogous formulas for the backward and central difference operators are

h D = −ln(1 − ∇_h)   and   h D = 2 arcsinh(δ_h / 2).
The calculus of finite differences is related to the umbral calculus in combinatorics.
The inverse operator of the forward difference operator is the indefinite sum.
In mathematics, a difference operator maps a function f(x) to another function f(x + b) − f(x + a).
The forward difference operator

Δ[f](x) = f(x + 1) − f(x)

occurs frequently in the calculus of finite differences, where it plays a role formally similar to that of the derivative, but used in discrete circumstances. Difference equations can often be solved with techniques very similar to those for solving differential equations. This similarity led to the development of time scale calculus. Analogously, we can have the backward difference operator

∇[f](x) = f(x) − f(x − 1).
When restricted to polynomial functions f, the forward difference operator is a delta operator, i.e., a shift-equivariant
linear operator on polynomials that reduces degree by 1.
n-th difference
The nth forward difference of a function f(x) is given by

Δ_h^n[f](x) = Σ_{k=0}^{n} (−1)^{n−k} C(n, k) f(x + k h),

where C(n, k) is the binomial coefficient. Forward differences applied to a sequence are sometimes called the binomial transform of the sequence, and have a number of interesting combinatorial properties.
Forward differences may be evaluated using the Nørlund–Rice integral. The integral representation for these types of series is interesting, because the integral can often be evaluated using asymptotic expansion or saddle-point techniques; by contrast, the forward difference series can be extremely hard to evaluate numerically, because the binomial coefficients grow rapidly for large n.
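The binomial formula above can be sketched directly (our code):

```python
from math import comb

def nth_forward_difference(f, x, n, h=1):
    """n-th forward difference via the binomial formula:
       Delta_h^n f(x) = sum_{k=0}^{n} (-1)^(n-k) * C(n, k) * f(x + k*h)."""
    return sum((-1) ** (n - k) * comb(n, k) * f(x + k * h) for k in range(n + 1))

cube = lambda x: x ** 3
third = nth_forward_difference(cube, 0, 3)  # for a cubic with h = 1 this is the constant 3! = 6
```

As expected from the relation with derivatives, the third difference of a cubic is the constant 3! = 6, and the fourth difference vanishes.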
Newton series
The Newton series consists of the terms of the Newton forward difference equation, named after Isaac Newton; in essence, it is the Newton interpolation formula, first published in his Principia Mathematica in 1687 [2], namely the relationship

f(x) = Σ_{k=0}^{∞} (Δ^k f)(a) / k! · (x − a)_k = Σ_{k=0}^{∞} C(x − a, k) (Δ^k f)(a),

which holds for any polynomial function f and for some, but not all, analytic functions. Here, the expression

C(x − a, k) = (x − a)_k / k!

is the binomial coefficient, and

(x)_k = x (x − 1) (x − 2) ⋯ (x − k + 1)

is the "falling factorial" or "lower factorial", with the empty product (x)_0 defined to be 1. In this particular case, there is an assumption of unit steps for the changes in the values of x. Note also the formal similarity of this result to Taylor's theorem; this is one of the observations that led to the idea of umbral calculus.
To illustrate how one might use Newton's formula in actual practice, consider the first few terms of the Fibonacci sequence f = 2, 2, 4, ... One can find a polynomial that reproduces these values by first computing a difference table and then substituting the differences which correspond to x₀ (here, f(x₀) = 2, Δf(x₀) = 0, Δ²f(x₀) = 2) into the formula as follows:

x = 0:  f = 2
x = 1:  f = 2,  Δf = 0
x = 2:  f = 4,  Δf = 2,  Δ²f = 2

f(x) = 2 + 0 · C(x, 1) + 2 · C(x, 2) = 2 + x(x − 1) = x² − x + 2.
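The difference-table construction can be sketched as follows (our illustrative code; it assumes unit steps starting at x = 0, as in the example):

```python
def newton_forward_poly(values):
    """Newton forward-difference interpolating polynomial for the values
       f(0), f(1), ..., f(n). Returns a function evaluating
       p(x) = sum_k Delta^k f(0) * x (x-1) ... (x-k+1) / k!."""
    # Difference table: diffs[k] holds Delta^k f(0).
    diffs, row = [], list(values)
    while row:
        diffs.append(row[0])
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]

    def p(x):
        total, falling, kfact = 0.0, 1.0, 1
        for k, d in enumerate(diffs):
            if k > 0:
                falling *= (x - (k - 1))  # falling factorial x(x-1)...(x-k+1)
                kfact *= k
            total += d * falling / kfact
        return total

    return p

p = newton_forward_poly([2, 2, 4])  # the table above: f = 2, Delta f = 0, Delta^2 f = 2
```

Evaluating the returned polynomial reproduces the tabulated values exactly, and extrapolates p(3) = 8 from the fitted quadratic x² − x + 2.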
For the case of nonuniform steps in the values of x Newton computes the divided differences,
the series of products,
and the resulting polynomial is the scalar product of the two.
In analysis with p-adic numbers, Mahler's theorem states that the assumption that f is a polynomial function can be
weakened all the way to the assumption that f is merely continuous.
Carlson's theorem provides necessary and sufficient conditions for a Newton series to be unique, if it exists.
However, a Newton series will not, in general, exist.
The Newton series, together with the Stirling series and the Selberg series, is a special case of the general difference
series, all of which are defined in terms of scaled forward differences.
Rules for calculus of finite difference operators
Analogous to rules for finding the derivative, we have:
Constant rule: If c is a constant, then Δc = 0.
Linearity: If a and b are constants, Δ(a f + b g) = a Δf + b Δg.
All of the above rules apply equally well to any difference operator, including ∇ as well as Δ.
Product rule:

Δ(f g) = f Δg + g Δf + Δf Δg,
∇(f g) = f ∇g + g ∇f − ∇f ∇g.

Quotient rule:

Δ(f / g) = (g Δf − f Δg) / (g · (g + Δg)),
∇(f / g) = (g ∇f − f ∇g) / (g · (g − ∇g)).

Summation rules:

Σ_{n=a}^{b} Δf(n) = f(b + 1) − f(a),
Σ_{n=a}^{b} ∇f(n) = f(b) − f(a − 1).
Indefinite sum
The inverse operator of the forward difference operator is the indefinite sum.
Generalizations
A generalized finite difference is usually defined as

Δ_h^μ[f](x) = Σ_{k=0}^{N} μ_k f(x + k h),

where μ = (μ_0, …, μ_N) is its coefficient vector. An infinite difference is a further generalization, where the finite sum above is replaced by an infinite series. Another way of generalization is making the coefficients μ_k depend on the point x: μ_k = μ_k(x), thus considering a weighted finite difference. Also, one may make the step h depend on the point x: h = h(x). Such generalizations are useful for constructing different moduli of continuity.
The difference operator generalizes to Möbius inversion over a partially ordered set.
As a convolution operator: via the formalism of incidence algebras, difference operators and other Möbius inversion can be represented by convolution with a function on the poset, called the Möbius function μ; for the difference operator, μ is the sequence (1, −1, 0, 0, 0, ...).
Finite difference in several variables
Finite differences can be considered in more than one variable. They are analogous to partial derivatives in several
variables.
Some partial derivative approximations are:

f_x(x, y) ≈ (f(x + h, y) − f(x − h, y)) / (2h),
f_y(x, y) ≈ (f(x, y + k) − f(x, y − k)) / (2k),
f_xx(x, y) ≈ (f(x + h, y) − 2 f(x, y) + f(x − h, y)) / h²,
f_yy(x, y) ≈ (f(x, y + k) − 2 f(x, y) + f(x, y − k)) / k²,
f_xy(x, y) ≈ (f(x + h, y + k) − f(x + h, y − k) − f(x − h, y + k) + f(x − h, y − k)) / (4 h k).
See also
Finite difference coefficients
Taylor series
Numerical differentiation
Five-point stencil
Divided differences
Modulus of continuity
Time scale calculus
Summation by parts
Newton polynomial
Table of Newtonian series
Lagrange polynomial
Gilbreath's conjecture
References
[1] http://commons.wikimedia.org/wiki/File:FDnotes.djvu
[2] See Newton, Isaac, Principia, Book III, Lemma V, Case 1 (http://books.google.com/books?id=KaAIAAAAIAAJ&dq=sir+isaac+newton+principia+mathematica&as_brr=1&pg=PA466#v=onepage&q&f=false)
William F. Ames, Numerical Methods for Partial Differential Equations, Section 1.6. Academic Press, New York, 1977. ISBN 0-12-056760-1.
Francis B. Hildebrand, Finite-Difference Equations and Simulations, Section 2.2, Prentice-Hall, Englewood Cliffs, New Jersey, 1968.
Boole, George, A Treatise On The Calculus of Finite Differences, 2nd ed., Macmillan and Company, 1872. [See also: Dover edition 1960].
Levy, H.; Lessman, F. (1992). Finite Difference Equations. Dover. ISBN 0-486-67260-3.
Robert D. Richtmyer and K. W. Morton, Difference Methods for Initial Value Problems, 2nd ed., Wiley, New York, 1967.
Flajolet, Philippe; Sedgewick, Robert (1995), "Mellin transforms and asymptotics: Finite differences and Rice's integrals" (http://www-rocq.inria.fr/algo/flajolet/Publications/mellin-rice.ps.gz), Theoretical Computer Science 144 (1–2): 101–124, doi:10.1016/0304-3975(94)00281-M.
External links
Table of useful finite difference formulas generated using Mathematica (http://reference.wolfram.com/mathematica/tutorial/NDSolvePDE.html#c:4)
Finite Calculus: A Tutorial for Solving Nasty Sums (http://www.stanford.edu/~dgleich/publications/finite-calculus.pdf)
Value at risk
In financial mathematics and financial risk management, Value at Risk (VaR) is a widely used measure of the risk of loss on a specific portfolio of financial assets. For a given portfolio, probability and time horizon, VaR is defined as a threshold value such that the probability that the mark-to-market loss on the portfolio over the given time horizon exceeds this value (assuming normal markets and no trading in the portfolio) is the given probability level.[1]
For example, if a portfolio of stocks has a one-day 5% VaR of $1 million, there is a 0.05 probability that the portfolio will fall in value by more than $1 million over a one-day period, assuming markets are normal and there is no trading. Informally, a loss of $1 million or more on this portfolio is expected on 1 day in 20. A loss which exceeds the VaR threshold is termed a VaR break.[2]
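As an illustration (ours, not from the article, and using one of several quantile conventions in use), a historical-simulation VaR on a sample of daily returns can be sketched as:

```python
def historical_var(returns, level=0.05):
    """Empirical VaR at the given tail probability: the loss that is exceeded
       with probability `level`, reported as a positive number."""
    ordered = sorted(returns)            # worst returns first
    idx = int(level * len(ordered))
    return -ordered[idx]

# 100 hypothetical daily returns, evenly spread from -5.0% to +4.9%.
rets = [i / 1000 - 0.05 for i in range(100)]
var_95 = historical_var(rets, 0.05)  # one-day 5% VaR
```

On this synthetic sample the one-day 5% VaR is 4.5%: on 5 of the 100 days the loss was worse than that threshold.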
(Figure: the 5% Value at Risk of a hypothetical profit-and-loss probability density function.)
VaR has five main uses in finance: risk management, risk measurement, financial control, financial reporting and computing regulatory capital. VaR is sometimes used in non-financial applications as well.[3]
Important related ideas are economic capital, backtesting, stress testing and expected shortfall.[4]
Details
Common parameters for VaR are 1% and 5% probabilities and one-day and two-week horizons, although other combinations are in use.[5]
The reason for assuming normal markets and no trading, and for restricting the loss to things measured in daily accounts, is to make the loss observable. In some extreme financial events it can be impossible to determine losses, either because market prices are unavailable or because the loss-bearing institution breaks up. Some longer-term consequences of disasters, such as lawsuits, loss of market confidence and employee morale, and impairment of brand names, can take a long time to play out and may be hard to allocate among specific prior decisions. VaR marks the boundary between normal days and extreme events. Institutions can lose far more than the VaR amount; all that can be said is that they will not do so very often.[6]
The probability level is about equally often specified as one minus the probability of a VaR break, so that the VaR in the example above would be called a one-day 95% VaR instead of a one-day 5% VaR. This generally does not lead to confusion because the probability of VaR breaks is almost always small, certainly less than 0.5.[1]
Although it virtually always represents a loss, VaR is conventionally reported as a positive number. A negative VaR would imply the portfolio has a high probability of making a profit; for example, a one-day 5% VaR of negative $1 million implies the portfolio has a 95% chance of making more than $1 million over the next day.[7]
Another inconsistency is that VaR is sometimes taken to refer to profit-and-loss at the end of the period, and sometimes as the maximum loss at any point during the period. The original definition was the latter, but in the early 1990s, when VaR was aggregated across trading desks and time zones, end-of-day valuation was the only reliable number, so the former became the de facto definition. As people began using multiday VaRs in the second half of the 1990s, they almost always estimated the distribution at the end of the period only. It is also easier theoretically to deal with a point-in-time estimate versus a maximum over an interval. Therefore, the end-of-period definition is the most common both in theory and practice today.[8]
Varieties of VaR
The definition of VaR is nonconstructive; it specifies a property VaR must have, but not how to compute VaR.
Moreover, there is wide scope for interpretation in the definition.
[9]
This has led to two broad types of VaR, one used
primarily in risk management and the other primarily for risk measurement. The distinction is not sharp, however,
and hybrid versions are typically used in financial control, financial reporting and computing regulatory capital.[10]
To a risk manager, VaR is a system, not a number. The system is run periodically (usually daily) and the published
number is compared to the computed price movement in opening positions over the time horizon. There is never any
subsequent adjustment to the published VaR, and there is no distinction between VaR breaks caused by input errors
(including Information Technology breakdowns, fraud and rogue trading), computation errors (including failure to
produce a VaR on time) and market movements.[11]
A frequentist claim is made, that the long-term frequency of VaR breaks will equal the specified probability, within
the limits of sampling error, and that the VaR breaks will be independent in time and independent of the level of
VaR. This claim is validated by a backtest, a comparison of published VaRs to actual price movements. In this
interpretation, many different systems could produce VaRs with equally good backtests, but wide disagreements on
daily VaR values.[1]
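The backtest described above can be sketched on simulated data. This is a minimal illustration, not any institution's actual procedure; the P&L series, window lengths and random seed are made-up numbers.

```python
import numpy as np

rng = np.random.default_rng(1)
pnl = rng.normal(size=1000)                 # hypothetical daily profit-and-loss series
var_95 = -np.quantile(pnl[:250], 0.05)      # one-day 95% VaR fitted on the first 250 days
breaks = int((-pnl[250:] > var_95).sum())   # days when the realized loss exceeded VaR
expected = 0.05 * len(pnl[250:])            # 5% of 750 days = 37.5 expected breaks
print(breaks, expected)
```

A well-calibrated system should produce a break count close to the expected number, within sampling error; a backtest also checks that the breaks are spread independently through time rather than clustered.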
For risk measurement a number is needed, not a system. A Bayesian probability claim is made, that given the
information and beliefs at the time, the subjective probability of a VaR break was the specified level. VaR is adjusted
after the fact to correct errors in inputs and computation, but not to incorporate information unavailable at the time of
computation.[7] In this context, backtest has a different meaning. Rather than comparing published VaRs to actual
market movements over the period of time the system has been in operation, VaR is retroactively computed on
scrubbed data over as long a period as data are available and deemed relevant. The same position data and pricing
models are used for computing the VaR as for determining the price movements.[2]
Although some of the sources listed here treat only one kind of VaR as legitimate, most of the recent ones seem to
agree that risk management VaR is superior for making short-term and tactical decisions today, while risk
measurement VaR should be used for understanding the past, and making medium term and strategic decisions for
the future. When VaR is used for financial control or financial reporting it should incorporate elements of both. For
example, if a trading desk is held to a VaR limit, that is both a risk-management rule for deciding what risks to allow
today, and an input into the risk measurement computation of the desk's risk-adjusted return at the end of the
reporting period.[4]
VaR in governance
An interesting takeoff on VaR is its application in governance for endowments, trusts, and pension plans. Essentially,
trustees adopt portfolio Value-at-Risk metrics for the entire pooled account and for the diversified parts individually
managed. Instead of probability estimates they simply define maximum levels of acceptable loss for each. Doing so
provides an easy metric for oversight and adds accountability, as managers are directed to manage within the
additional constraint of avoiding losses beyond a defined risk parameter. VaR used in this manner adds relevance as
well as an easy-to-monitor risk measurement control that is far more intuitive than standard deviation of return. Use
of VaR in this context, as well as a worthwhile critique of board governance practices as they relate to investment
management oversight in general, can be found in "Best Practices in Governance".[12]
Risk measure and risk metric
The term VaR is used both for a risk measure and a risk metric. This sometimes leads to confusion. Sources earlier
than 1995 usually emphasize the risk measure; later sources are more likely to emphasize the metric.
The VaR risk measure defines risk as mark-to-market loss on a fixed portfolio over a fixed time horizon, assuming
normal markets. There are many alternative risk measures in finance. Instead of mark-to-market, which uses market
prices to define loss, loss is often defined as change in fundamental value. For example, if an institution holds a loan
that declines in market price because interest rates go up, but has no change in cash flows or credit quality, some
systems do not recognize a loss. Or we could try to incorporate the economic cost of things not measured in daily
financial statements, such as loss of market confidence or employee morale, impairment of brand names or
lawsuits.[4]
Rather than assuming a fixed portfolio over a fixed time horizon, some risk measures incorporate the effect of
expected trading (such as a stop loss order) and consider the expected holding period of positions. Finally, some risk
measures adjust for the possible effects of abnormal markets, rather than excluding them from the computation.[4]
The VaR risk metric summarizes the distribution of possible losses by a quantile, a point with a specified probability
of greater losses. Common alternative metrics are standard deviation, mean absolute deviation, expected shortfall
and downside risk.[1]
VaR risk management
Supporters of VaR-based risk management claim the first and possibly greatest benefit of VaR is the improvement in
systems and modeling it forces on an institution. In 1997, Philippe Jorion wrote:[13][14]
[T]he greatest benefit of VAR lies in the imposition of a structured methodology for critically thinking
about risk. Institutions that go through the process of computing their VAR are forced to confront their
exposure to financial risks and to set up a proper risk management function. Thus the process of getting
to VAR may be as important as the number itself.
Publishing a daily number, on time and with specified statistical properties, holds every part of a trading organization
to a high objective standard. Robust backup systems and default assumptions must be implemented. Positions that
are reported, modeled or priced incorrectly stand out, as do data feeds that are inaccurate or late and systems that are
too frequently down. Anything that affects profit and loss that is left out of other reports will show up either in
inflated VaR or excessive VaR breaks. A risk-taking institution that does not compute VaR might escape disaster,
but an institution that cannot compute VaR will not.[15]
The second claimed benefit of VaR is that it separates risk into two regimes. Inside the VaR limit, conventional
statistical methods are reliable. Relatively short-term and specific data can be used for analysis. Probability estimates
are meaningful, because there are enough data to test them. In a sense, there is no true risk because you have a sum
of many independent observations with a left bound on the outcome. A casino doesn't worry about whether red or
black will come up on the next roulette spin. Risk managers encourage productive risk-taking in this regime, because
there is little true cost. People tend to worry too much about these risks, because they happen frequently, and not
enough about what might happen on the worst days.[16]
Outside the VaR limit, all bets are off. Risk should be analyzed with stress testing based on long-term and broad
market data.[17] Probability statements are no longer meaningful.[18] Knowing the distribution of losses beyond the
VaR point is both impossible and useless. The risk manager should concentrate instead on making sure good plans
are in place to limit the loss if possible, and to survive the loss if not.[1]
One specific system uses three regimes:[19]
1. One to three times VaR are normal occurrences. You expect periodic VaR breaks. The loss distribution typically
has fat tails, and you might get more than one break in a short period of time. Moreover, markets may be
abnormal and trading may exacerbate losses, and you may take losses not measured in daily marks such as
lawsuits, loss of employee morale and market confidence and impairment of brand names. So an institution that
can't deal with three times VaR losses as routine events probably won't survive long enough to put a VaR system
in place.
2. Three to ten times VaR is the range for stress testing. Institutions should be confident they have examined all the
foreseeable events that will cause losses in this range, and are prepared to survive them. These events are too rare
to estimate probabilities reliably, so risk/return calculations are useless.
3. Foreseeable events should not cause losses beyond ten times VaR. If they do they should be hedged or insured, or
the business plan should be changed to avoid them, or VaR should be increased. It's hard to run a business if
foreseeable losses are orders of magnitude larger than very large everyday losses. It's hard to plan for these
events, because they are out of scale with daily experience. Of course there will be unforeseeable losses more than
ten times VaR, but it's pointless to anticipate them: you can't know much about them, and it results in needless
worrying. Better to hope that the discipline of preparing for all foreseeable three-to-ten times VaR losses will
improve chances for surviving the unforeseen and larger losses that inevitably occur.
"A risk manager has two jobs: make people take more risk the 99% of the time it is safe to do so, and survive the
other 1% of the time. VaR is the border."[15]
VaR risk measurement
The VaR risk measure is a popular way to aggregate risk across an institution. Individual business units have risk
measures such as duration for a fixed income portfolio or beta for an equity business. These cannot be combined in a
meaningful way.[1] It is also difficult to aggregate results available at different times, such as positions marked in
different time zones, or a high frequency trading desk with a business holding relatively illiquid positions. But since
every business contributes to profit and loss in an additive fashion, and many financial businesses mark-to-market
daily, it is natural to define firm-wide risk using the distribution of possible losses at a fixed point in the future.[4]
In risk measurement, VaR is usually reported alongside other risk metrics such as standard deviation, expected
shortfall and greeks (partial derivatives of portfolio value with respect to market factors). VaR is a distribution-free
metric, that is, it does not depend on assumptions about the probability distribution of future gains and losses.[15]
The probability level is chosen deep enough in the left tail of the loss distribution to be relevant for risk decisions,
but not so deep as to be difficult to estimate with accuracy.[20]
Risk measurement VaR is sometimes called parametric VaR. This usage can be confusing, however, because it can
be estimated either parametrically (for example, variance-covariance VaR or delta-gamma VaR) or
nonparametrically (for example, historical simulation VaR or resampled VaR). The inverse usage makes more
logical sense, because risk management VaR is fundamentally nonparametric, but it is seldom referred to as
nonparametric VaR.[4][6]
History of VaR
The problem of risk measurement is an old one in statistics, economics and finance. Financial risk management has
been a concern of regulators and financial executives for a long time as well. Retrospective analysis has found some
VaR-like concepts in this history. But VaR did not emerge as a distinct concept until the late 1980s. The triggering
event was the stock market crash of 1987. This was the first major financial crisis in which a lot of
academically-trained quants were in high enough positions to worry about firm-wide survival.[1]
The crash was so unlikely given standard statistical models, that it called the entire basis of quant finance into
question. A reconsideration of history led some quants to decide there were recurring crises, about one or two per
decade, that overwhelmed the statistical assumptions embedded in models used for trading, investment management
and derivative pricing. These affected many markets at once, including ones that were usually not correlated, and
seldom had discernible economic cause or warning (although after-the-fact explanations were plentiful).[18] Much
later, they were named "Black Swans" by Nassim Taleb and the concept extended far beyond finance.[21]
If these events were included in quantitative analysis they dominated results and led to strategies that did not work
day to day. If these events were excluded, the profits made in between "Black Swans" could be much smaller than
the losses suffered in the crisis. Institutions could fail as a result.[15][18][21]
VaR was developed as a systematic way to segregate extreme events, which are studied qualitatively over long-term
history and broad market events, from everyday price movements, which are studied quantitatively using short-term
data in specific markets. It was hoped that "Black Swans" would be preceded by increases in estimated VaR or
increased frequency of VaR breaks, in at least some markets. The extent to which this has proven to be true is
controversial.[18]
Abnormal markets and trading were excluded from the VaR estimate in order to make it observable.[16] It is not
always possible to define loss if, for example, markets are closed as after 9/11, or severely illiquid, as happened
several times in 2008.[15] Losses can also be hard to define if the risk-bearing institution fails or breaks up.[16] A
measure that depends on traders taking certain actions, and avoiding other actions, can lead to self-reference.[1]
This is risk management VaR. It was well-established in quantitative trading groups at several financial institutions,
notably Bankers Trust, before 1990, although neither the name nor the definition had been standardized. There was
no effort to aggregate VaRs across trading desks.[18]
The financial events of the early 1990s found many firms in trouble because the same underlying bet had been made
at many places in the firm, in non-obvious ways. Since many trading desks already computed risk management VaR,
and it was the only common risk measure that could be both defined for all businesses and aggregated without strong
assumptions, it was the natural choice for reporting firmwide risk. J. P. Morgan CEO Dennis Weatherstone famously
called for a 4:15 report that combined all firm risk on one page, available within 15 minutes of the market close.[9]
Risk measurement VaR was developed for this purpose. Development was most extensive at J. P. Morgan, which
published the methodology and gave free access to estimates of the necessary underlying parameters in 1994. This
was the first time VaR had been exposed beyond a relatively small group of quants. Two years later, the
methodology was spun off into an independent for-profit business now part of RiskMetrics Group.[22][9]
In 1997, the U.S. Securities and Exchange Commission ruled that public corporations must disclose quantitative
information about their derivatives activity. Major banks and dealers chose to implement the rule by including VaR
information in the notes to their financial statements.[1]
Worldwide adoption of the Basel II Accord, beginning in 1999 and nearing completion today, gave further impetus
to the use of VaR. VaR is the preferred measure of market risk, and concepts similar to VaR are used in other parts
of the accord.[1]
Mathematics
"Given some confidence level α ∈ (0, 1), the VaR of the portfolio at the confidence level α is given by the
smallest number l such that the probability that the loss L exceeds l is not larger than (1 − α)":[3]

VaR_α(L) = inf{ l ∈ ℝ : P(L > l) ≤ 1 − α } = inf{ l ∈ ℝ : F_L(l) ≥ α }
The left equality is a definition of VaR. The right equality assumes an underlying probability distribution, which
makes it true only for parametric VaR. Risk managers typically assume that some fraction of the bad events will
have undefined losses, either because markets are closed or illiquid, or because the entity bearing the loss breaks
apart or loses the ability to compute accounts. Therefore, they do not accept results based on the assumption of a
well-defined probability distribution.[6]
Nassim Taleb has labeled this assumption "charlatanism".[23] On the other hand, many academics prefer to assume
a well-defined distribution, albeit usually one with fat tails.[1] This point has probably caused more contention
among VaR theorists than any other.[9]
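Under the assumption of a well-defined loss distribution, the definition above reduces to a quantile of the losses. As a sketch on simulated data (the standard normal loss distribution here is purely illustrative):

```python
import numpy as np

# Simulated loss distribution (losses counted as positive numbers); in practice
# these would come from historical simulation or a parametric model.
rng = np.random.default_rng(0)
losses = rng.normal(loc=0.0, scale=1.0, size=100_000)
alpha = 0.95
# VaR_alpha = inf{ l : P(L > l) <= 1 - alpha }, i.e. the alpha-quantile of L
var_95 = float(np.quantile(losses, alpha))
print(round(var_95, 2))  # close to the theoretical N(0,1) quantile, about 1.645
```

The sample quantile is the historical-simulation estimator; a parametric estimator would instead plug the fitted distribution's inverse CDF into the same definition.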
Criticism
VaR has been controversial since it moved from trading desks into the public eye in 1994. A famous 1997 debate[13]
between Nassim Taleb and Philippe Jorion set out some of the major points of contention. Taleb claimed VaR:[24]
1. Ignored 2,500 years of experience in favor of untested models built by non-traders
2. Was charlatanism because it claimed to estimate the risks of rare events, which is impossible
3. Gave false confidence
4. Would be exploited by traders
More recently David Einhorn and Aaron Brown debated VaR in Global Association of Risk Professionals
Review.[25][15][26] Einhorn compared VaR to an airbag that works all the time, except when you have a car
accident. He further charged that VaR:
1. Led to excessive risk-taking and leverage at financial institutions
2. Focused on the manageable risks near the center of the distribution and ignored the tails
3. Created an incentive to take excessive but remote risks
4. Was potentially catastrophic when its use creates a false sense of security among senior executives and
watchdogs.
New York Times reporter Joe Nocera wrote an extensive piece, Risk Mismanagement,[27][28] on January 4, 2009,
discussing the role VaR played in the Financial crisis of 2007-2008. After interviewing risk managers (including
several of the ones cited above) the article suggests that VaR was very useful to risk experts, but nevertheless
exacerbated the crisis by giving false security to bank executives and regulators. A powerful tool for professional
risk managers, VaR is portrayed as both easy to misunderstand, and dangerous when misunderstood.
A common complaint among academics is that VaR is not subadditive.[4] That means the VaR of a combined
portfolio can be larger than the sum of the VaRs of its components. To a practicing risk manager this makes sense.
For example, the average bank branch in the United States is robbed about once every ten years. A single-branch
bank has about 0.004% chance of being robbed on a specific day, so the risk of robbery would not figure into
one-day 1% VaR. It would not even be within an order of magnitude of that, so it is in the range where the institution
should not worry about it; it should insure against it and take advice from insurers on precautions. The whole point
of insurance is to aggregate risks that are beyond individual VaR limits, and bring them into a large enough portfolio
to get statistical predictability. It does not pay for a one-branch bank to have a security expert on staff.
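Non-subadditivity is easy to reproduce with a textbook-style sketch: two hypothetical independent bonds, each with a default probability just inside the VaR tail (the 4% probability and loss size here are made-up numbers, not from any source).

```python
import numpy as np

# Two hypothetical independent bonds, each defaulting with probability 4%
# for a loss of 100.
rng = np.random.default_rng(2)
n = 200_000
loss_a = np.where(rng.random(n) < 0.04, 100.0, 0.0)
loss_b = np.where(rng.random(n) < 0.04, 100.0, 0.0)

def var(losses, alpha=0.95):
    """One-period VaR as the alpha-quantile of the loss distribution."""
    return float(np.quantile(losses, alpha))

# Each bond alone: a 4% default chance sits inside the 5% tail, so VaR is 0.
# Combined: P(at least one default) = 1 - 0.96**2 = 7.84% > 5%, so VaR is 100.
print(var(loss_a), var(loss_b), var(loss_a + loss_b))
```

Here VaR(A + B) = 100 while VaR(A) + VaR(B) = 0, so combining the positions increases reported VaR even though diversification has not made the portfolio riskier in any economic sense.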
As institutions get more branches, the risk of a robbery on a specific day rises to within an order of magnitude of
VaR. At that point it makes sense for the institution to run internal stress tests and analyze the risk itself. It will
spend less on insurance and more on in-house expertise. For a very large banking institution, robberies are a routine
daily occurrence. Losses are part of the daily VaR calculation, and tracked statistically rather than case-by-case. A
sizable in-house security department is in charge of prevention and control; the general risk manager just tracks the
loss like any other cost of doing business.
As portfolios or institutions get larger, specific risks change from low-probability/low-predictability/high-impact to
statistically predictable losses of low individual impact. That means they move from the range of far outside VaR, to
be insured, to near outside VaR, to be analyzed case-by-case, to inside VaR, to be treated statistically.[15]
Even VaR supporters generally agree there are common abuses of VaR:[6][9]
1. Referring to VaR as a "worst-case" or "maximum tolerable" loss. In fact, you expect two or three losses per year
that exceed one-day 1% VaR.
2. Making VaR control or VaR reduction the central concern of risk management. It is far more important to worry
about what happens when losses exceed VaR.
3. Assuming plausible losses will be less than some multiple, often three, of VaR. The entire point of VaR is that
losses can be extremely large, and sometimes impossible to define, once you get beyond the VaR point. To a risk
manager, VaR is the level of losses at which you stop trying to guess what will happen next, and start preparing
for anything.
4. Reporting a VaR that has not passed a backtest. Regardless of how VaR is computed, it should have produced the
correct number of breaks (within sampling error) in the past. A common specific violation of this is to report a
VaR based on the unverified assumption that everything follows a multivariate normal distribution.
References
[1] Philippe Jorion, Value at Risk: The New Benchmark for Managing Financial Risk, 3rd ed. McGraw-Hill (2006). ISBN 978-0071464956
[2] Glyn Holton, Value-at-Risk: Theory and Practice, Academic Press (2003). ISBN 978-0123540102.
[3] Alexander McNeil, Rüdiger Frey and Paul Embrechts, Quantitative Risk Management: Concepts, Techniques and Tools, Princeton University
Press (2005). ISBN 978-0691122557
[4] Kevin Dowd, Measuring Market Risk. John Wiley & Sons (2005) ISBN 978-0470013038.
[5] Neil Pearson, Risk Budgeting: Portfolio Problem Solving with Value-at-Risk. John Wiley & Sons (2002). ISBN 978-0471405566.
[6] Aaron Brown, The Unbearable Lightness of Cross-Market Risk, Wilmott Magazine, March 2004.
[7] Michel Crouhy, Dan Galai and Robert Mark, The Essentials of Risk Management. McGraw-Hill (2001) ISBN 978-0071429665
[8] Jose A. Lopez, Regulatory Evaluation of Value-at-Risk Models. Wharton Financial Institutions Center Working Paper 96-51, September
1996.
[9] Joe Kolman, Michael Onak, Philippe Jorion, Nassim Taleb, Emanuel Derman, Blu Putnam, Richard Sandor, Stan Jonas, Ron Dembo, George
Holt, Richard Tanenbaum, William Margrabe, Dan Mudge, James Lam and Jim Rozsypal, Roundtable: The Limits of VaR. Derivatives
Strategy, April 1998.
[10] Aaron Brown, The Next Ten VaR Disasters. Derivatives Strategy, March 1997.
[11] Paul Wilmott, Paul Wilmott Introduces Quantitative Finance. Wiley (2007). ISBN 978-0470319581
[12] Best Practices in Governance, Lawrence York, 2009
[13] http://www.derivativesstrategy.com/magazine/archive/1997/0497fea2.asp
[14] Philippe Jorion in Nassim Taleb and Philippe Jorion, The Jorion/Taleb Debate. Derivatives Strategy, April 1997.
[15] Aaron Brown, in David Einhorn and Aaron Brown, Private Profits and Socialized Risk. GARP Risk Review (June/July 2008).
[16] Espen Haug, Derivative Models on Models. John Wiley & Sons (2007). ISBN 978-0470013229
[17] Ezra Zask, Taking the Stress Out of Stress Testing. Derivative Strategy, February 1999.
[18] Joe Kolman, Michael Onak, Philippe Jorion, Nassim Taleb, Emanuel Derman, Blu Putnam, Richard Sandor, Stan Jonas, Ron Dembo,
George Holt, Richard Tanenbaum, William Margrabe, Dan Mudge, James Lam and Jim Rozsypal, Roundtable: The Limits of Models.
Derivatives Strategy, April 1998.
[19] Aaron Brown, On Stressing the Right Size. GARP Risk Review, December 2007.
[20] Paul Glasserman, Monte Carlo Methods in Financial Engineering. Springer (2004). ISBN 978-0387004518.
[21] Taleb, Nassim Nicholas (2007). The Black Swan: The Impact of the Highly Improbable. New York: Random House.
ISBN 978-1-4000-6351-2.
[22] http://www.riskmetrics.com/
[23] Nassim Taleb, The World According to Nassim Taleb. Derivatives Strategy, December 1996/January 1997.
[24] Nassim Taleb in Philippe Jorion in Nassim Taleb and Philippe Jorion, The Jorion/Taleb Debate. Derivatives Strategy, April 1997.
[25] http://www.garpdigitallibrary.org/download/GRR/2012.pdf
[26] David Einhorn in David Einhorn and Aaron Brown, Private Profits and Socialized Risk. GARP Risk Review (June/July 2008).
[27] http://www.nytimes.com/2009/01/04/magazine/04risk-t.html?pagewanted=1&_r=1
[28] Joe Nocera, Risk Mismanagement, The New York Times Magazine (January 4, 2009)
External links
Discussion
Perfect Storms Beautiful & True Lies In Risk Management (http://www.wilmott.com/blogs/satyajitdas/enclosures/perfectstorms(may2007)1.pdf), Satyajit Das
Gloria Mundi All About Value at Risk (http://www.gloriamundi.org/), Barry Schachter
Risk Management (http://www.nytimes.com/2009/01/04/magazine/04risk-t.html?dlbk=&pagewanted=all), Joe Nocera NYTimes article.
Tools
Online real-time VaR calculator (http://www.cba.ua.edu/~rpascala/VaR/VaRForm.php), Razvan Pascalau, University of Alabama
Value-at-Risk (VaR) (http://finance.wharton.upenn.edu/~benninga/mma/MiER74.pdf), Simon Benninga and Zvi Wiener. (Mathematica in Education and Research Vol. 7 No. 4 1998.)
Volatility (finance)
In finance, volatility most frequently refers to the standard deviation of the continuously compounded returns of a
financial instrument within a specific time horizon. It is used to quantify the risk of the financial instrument over the
specified time period. Volatility is normally expressed in annualized terms, and it may either be an absolute number
($5) or a fraction of the mean (5%).
Volatility terminology
Volatility as described here refers to the actual current volatility of a financial instrument for a specified period (for
example 30 days or 90 days). It is the volatility of a financial instrument based on historical prices over the specified
period, with the last observation being the most recent price. This phrase is used particularly when it is wished to
distinguish between the actual current volatility of an instrument and:
actual historical volatility, which refers to the volatility of a financial instrument over a specified period but with
the last observation on a date in the past;
actual future volatility, which refers to the volatility of a financial instrument over a specified period starting at
the current time and ending at a future date (normally the expiry date of an option);
historical implied volatility, which refers to the implied volatility observed from historical prices of the financial
instrument (normally options);
current implied volatility, which refers to the implied volatility observed from current prices of the financial
instrument;
future implied volatility, which refers to the implied volatility observed from future prices of the financial
instrument.
For a financial instrument whose price follows a Gaussian random walk, or Wiener process, the width of the
distribution increases as time increases. This is because there is an increasing probability that the instrument's price
will be farther away from the initial price as time increases. However, rather than increase linearly, the volatility
increases with the square-root of time as time increases, because some fluctuations are expected to cancel each other
out, so the most likely deviation after twice the time will not be twice the distance from zero.
Since observed price changes do not follow Gaussian distributions, others such as the Lévy distribution are often
used.[1] These can capture attributes such as "fat tails".
Volatility for market players
When investing directly in a security, volatility is often viewed as a negative in that it represents uncertainty and risk.
However, with other investing strategies, volatility is often desirable. For example, if an investor is short on the
peaks, and long on the lows of a security, the profit will be greatest when volatility is highest.
In today's markets, it is also possible to trade volatility directly, through the use of derivative securities such as
options and variance swaps. See Volatility arbitrage.
Volatility versus direction
Volatility does not measure the direction of price changes, merely their dispersion. This is because when calculating
standard deviation (or variance), all differences are squared, so that negative and positive differences are combined
into one quantity. Two instruments with different volatilities may have the same expected return, but the instrument
with higher volatility will have larger swings in values over a given period of time.
For example, a lower volatility stock may have an expected (average) return of 7%, with annual volatility of 5%.
This would indicate returns from approximately -3% to 17% most of the time (19 times out of 20, or 95%). A higher
volatility stock, with the same expected return of 7% but with annual volatility of 20%, would indicate returns from
approximately -33% to 47% most of the time (19 times out of 20, or 95%). These estimates assume a normal
distribution; in reality stock returns are found to be leptokurtic.
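The two ranges above follow from the usual mean ± 2 standard deviations rule for a roughly 95% interval under normality; as a quick sketch:

```python
mean_return = 0.07
for vol in (0.05, 0.20):
    # ~95% of outcomes fall within two standard deviations of the mean,
    # assuming normally distributed returns
    lo, hi = mean_return - 2 * vol, mean_return + 2 * vol
    print(f"{lo:+.0%} to {hi:+.0%}")
```

This prints -3% to +17% for the low-volatility stock and -33% to +47% for the high-volatility one, matching the figures in the text.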
Volatility is a poor measure of risk, as explained by Peter Carr, "it is only a good measure of risk if you feel that
being rich then being poor is the same as being poor then rich".
Volatility over time
Although the Black-Scholes equation assumes predictable constant volatility, this is not observed in real markets.
Among the models that address this are Bruno Dupire's local volatility, Poisson processes where volatility jumps to
new levels with a predictable frequency, and the increasingly popular Heston model of stochastic volatility.[2]
It is common knowledge that many types of assets experience periods of high and low volatility. That is, during some
periods prices go up and down quickly, while during other times they might not seem to move at all.
Periods when prices fall quickly (a crash) are often followed by prices going down even more, or going up by an
unusual amount. Also, a time when prices rise quickly (a bubble) may often be followed by prices going up even
more, or going down by an unusual amount.
The converse behavior, 'doldrums', can last for a long time as well.
Most typically, extreme movements do not appear 'out of nowhere'; they are presaged by larger movements than
usual. This is termed autoregressive conditional heteroskedasticity. Of course, whether such large movements have
the same direction, or the opposite, is more difficult to say. And an increase in volatility does not always presage a
further increase; the volatility may simply go back down again.
Mathematical definition
The annualized volatility σ is the standard deviation of the instrument's yearly logarithmic returns.
The generalized volatility σ_T for time horizon T in years is expressed as:

σ_T = σ √T

Therefore, if the daily logarithmic returns of a stock have a standard deviation of σ_SD and the time period of returns
is P, the annualized volatility is

σ = σ_SD / √P

A common assumption is that P = 1/252 (there are 252 trading days in any given year). Then, if σ_SD = 0.01, the
annualized volatility is

σ = 0.01 / √(1/252) ≈ 0.1587

The monthly volatility (i.e., T = 1/12 of a year) would be

σ_month = 0.1587 √(1/12) ≈ 0.0458
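These conversions can be checked numerically; a minimal sketch using the same illustrative daily standard deviation of 0.01:

```python
import math

daily_sd = 0.01          # standard deviation of daily log returns, sigma_SD
P = 1 / 252              # length of one trading day in years
annual_vol = daily_sd / math.sqrt(P)          # sigma = sigma_SD / sqrt(P)
monthly_vol = annual_vol * math.sqrt(1 / 12)  # sigma_T = sigma * sqrt(T), T = 1/12
print(round(annual_vol, 4), round(monthly_vol, 4))  # 0.1587 0.0458
```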
The formulas used above to convert returns or volatility measures from one time period to another assume a particular
underlying model or process. These formulas are accurate extrapolations of a random walk, or Wiener process,
whose steps have finite variance. However, more generally, for natural stochastic processes, the precise relationship
between volatility measures for different time periods is more complicated. Some use the Lévy stability exponent α
to extrapolate natural processes:

σ_T = T^(1/α) σ

If α = 2 you get the Wiener process scaling relation, but some people believe α < 2 for financial activities such as
stocks, indexes and so on. This was discovered by Benoît Mandelbrot, who looked at cotton prices and found that
they followed a Lévy alpha-stable distribution with α = 1.7. (See New Scientist, 19 April 1997.) Mandelbrot's
conclusion is, however, not accepted by mainstream financial econometricians.
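As a sketch, the same daily volatility extrapolated to an annual horizon under α = 2 (the Wiener scaling relation) and under Mandelbrot's α = 1.7, showing how a smaller exponent inflates long-horizon volatility:

```python
sigma_daily = 0.01
T = 252  # horizon measured in trading days
for alpha in (2.0, 1.7):
    sigma_T = T ** (1 / alpha) * sigma_daily  # sigma_T = T**(1/alpha) * sigma
    print(alpha, round(sigma_T, 4))
```

With α = 2 this reproduces the square-root-of-time figure of about 0.159; with α = 1.7 the extrapolated annual volatility is noticeably larger, around 0.26.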
Crude volatility estimation
Using a simplification of the formulas above it is possible to estimate annualized volatility based solely on
approximate observations. Suppose you notice that a market price index, which has a current value near 10,000, has
moved about 100 points a day, on average, for many days. This would constitute a 1% daily movement, up or down.
To annualize this, you can use the "rule of 16", that is, multiply by 16 to get 16% as the annual volatility. The
rationale for this is that 16 is the square root of 256, which is approximately the number of trading days in a year
(252). This also uses the fact that the standard deviation of the sum of n independent variables (with equal standard
deviations) is √n times the standard deviation of the individual variables.
Of course, the average magnitude of the observations is merely an approximation of the standard deviation of the
market index. Assuming that the market index daily changes are normally distributed with mean zero and standard
deviation σ, the expected value of the magnitude of the observations is σ√(2/π) ≈ 0.798σ. The net effect is that this
crude approach underestimates the true volatility by about 20%.
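A minimal sketch of both the crude estimate and the √(2/π) correction (function names are mine, and the 252-day year is the assumption from the text):

```python
import math

def rule_of_16(avg_abs_move: float, price_level: float) -> float:
    """Crude annualization: average absolute daily move as a fraction of
    the price level, times 16 (~ sqrt(256) ~ sqrt(252))."""
    return (avg_abs_move / price_level) * 16

def corrected_vol(avg_abs_move: float, price_level: float) -> float:
    """Undo the E|X| = sigma * sqrt(2/pi) bias of using the mean absolute
    move in place of the standard deviation, then annualize over 252 days."""
    daily_sd = (avg_abs_move / price_level) / math.sqrt(2 / math.pi)
    return daily_sd * math.sqrt(252)
```

For the example in the text (100-point daily moves on an index near 10,000), the crude figure is 16% while the corrected figure is closer to 20%.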
Estimate of compound annual growth rate (CAGR)
Consider the Taylor series:
log(1 + y) = y - y²/2 + y³/3 - y⁴/4 + ...
Taking only the first two terms one has:
CAGR ≈ AR - σ²/2.
Realistically, most financial assets have negative skewness and leptokurtosis, so this formula tends to be
over-optimistic. Some people use a more conservative variant that scales the variance correction by an empirical
factor k (typically five to ten) for a rough estimate.
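The two-term estimate can be sketched as follows (the function name is illustrative):

```python
def cagr_estimate(arith_return: float, sigma: float) -> float:
    """Two-term Taylor estimate log(1 + y) ~ y - y**2 / 2, giving the
    geometric (compound) growth rate as roughly AR - sigma**2 / 2."""
    return arith_return - 0.5 * sigma ** 2

# e.g. a 10% arithmetic return with 20% volatility compounds at about 8%
</pre_sentinel>```

For example, an asset with a 10% arithmetic mean return and 20% annualized volatility has an estimated CAGR of about 10% - 2% = 8%.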
See also
Beta (finance)
Derivative (finance)
Financial economics
Implied volatility
IVX
Risk
Standard deviation
Stochastic volatility
Volatility arbitrage
Volatility smile
References
[1] http://www.wilmottwiki.com/wiki/index.php/Levy_distribution
[2] http://www.wilmottwiki.com/wiki/index.php/Volatility#Definitions
Lin Chen (1996). Stochastic Mean and Stochastic Volatility: A Three-Factor Model of the Term Structure of
Interest Rates and Its Application to the Pricing of Interest Rate Derivatives. Blackwell Publishers.
External links
Complex Options (http://www.optionistics.com/f/strategy_calculator): Multi-Leg Option Strategy Calculator
An introduction to volatility and how it can be calculated in Excel, by Dr A. A. Kotzé (http://quantonline.co.za/Articles/article_volatility.htm)
Interactive Java Applet "What is Historic Volatility?" (http://www.frog-numerics.com/ifs/ifs_LevelA/HistVolaBasic.html)
Diebold, Francis X.; Hickman, Andrew; Inoue, Atsushi & Schuermann, Til (1996) "Converting 1-Day Volatility to h-Day Volatility: Scaling by sqrt(h) is Worse than You Think" (http://citeseer.ist.psu.edu/244698.html)
A short introduction to alternative mathematical concepts of volatility (http://staff.science.uva.nl/~marvisse/volatility.html)
Autoregressive conditional heteroskedasticity
In econometrics, AutoRegressive Conditional Heteroskedasticity (ARCH) models are used to characterize and
model observed time series. They are used whenever there's reason to believe that, at any point in a series, the terms
will have a characteristic size, or variance. In particular, ARCH models assume the variance of the current error term
or innovation to be a function of the actual sizes of the previous time periods' error terms: often the variance is
related to the squares of the previous innovations.
Such models are often called ARCH models (Engle, 1982), although a variety of other acronyms is applied to
particular structures of model which have a similar basis. ARCH models are employed commonly in modeling
financial time series that exhibit time-varying volatility clustering, i.e. periods of swings followed by periods of
relative calm.
ARCH(q) model specification
Suppose one wishes to model a time series with an ARCH process. Let ε_t denote the error terms (return residuals,
w.r.t. a mean process), i.e. the series terms. These ε_t are split into a stochastic piece z_t and a time-dependent
standard deviation σ_t characterizing the typical size of the terms, so that
ε_t = σ_t z_t
where z_t is a random variable drawn from a Gaussian distribution centered at 0 with standard deviation equal to 1
(i.e. z_t ~ N(0,1)), and where the series σ_t² are modeled by
σ_t² = α_0 + α_1 ε_{t-1}² + ... + α_q ε_{t-q}²
and where α_0 > 0 and α_i ≥ 0 for i > 0.
An ARCH(q) model can be estimated using ordinary least squares. A methodology to test for the lag length of
ARCH errors using the Lagrange multiplier test was proposed by Engle (1982). This procedure is as follows:
1. Estimate the best fitting AR(q) model y_t = a_0 + a_1 y_{t-1} + ... + a_q y_{t-q} + ε_t.
2. Obtain the squares of the errors ε̂² and regress them on a constant and q lagged values:
ε̂_t² = α̂_0 + α̂_1 ε̂_{t-1}² + ... + α̂_q ε̂_{t-q}²
where q is the length of ARCH lags.
3. The null hypothesis is that, in the absence of ARCH components, we have α_i = 0 for all i = 1, ..., q. The
alternative hypothesis is that, in the presence of ARCH components, at least one of the estimated α_i coefficients
must be significant. In a sample of T residuals under the null hypothesis of no ARCH errors, the test statistic T·R²
follows a χ² distribution with q degrees of freedom, where R² comes from the auxiliary regression. If T·R² is greater
than the chi-square table value, we reject the null hypothesis and conclude there is an ARCH effect in the ARMA
model. If T·R² is smaller than the chi-square table value, we do not reject the null hypothesis.
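The procedure can be sketched for the one-lag case, where the auxiliary regression has a closed form; this is a minimal illustration (function name mine), not a replacement for a full multi-lag implementation:

```python
def arch_lm_statistic(resid):
    """Engle's Lagrange multiplier test with one ARCH lag: regress the
    squared residuals on a constant and their first lag; the statistic
    T * R^2 is asymptotically chi-square(1) under the null of no ARCH
    effects (compare with 3.84 at the 5% level)."""
    sq = [e * e for e in resid]
    x, y = sq[:-1], sq[1:]          # lagged squares explain current squares
    t = len(y)
    mx, my = sum(x) / t, sum(y) / t
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r2 = (sxy * sxy) / (sxx * syy) if sxx > 0 and syy > 0 else 0.0
    return t * r2
```

A series whose squared terms alternate deterministically yields R² = 1 and a large statistic, while constant-magnitude residuals yield a statistic of zero.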
GARCH
If an autoregressive moving average model (ARMA model) is assumed for the error variance, the model is a
generalized autoregressive conditional heteroskedasticity (GARCH, Bollerslev (1986)) model.
In that case, the GARCH(p, q) model (where p is the order of the GARCH terms σ² and q is the order of the ARCH
terms ε²) is given by
σ_t² = α_0 + α_1 ε_{t-1}² + ... + α_q ε_{t-q}² + β_1 σ_{t-1}² + ... + β_p σ_{t-p}².
Generally, when testing for heteroskedasticity in econometric models, the best test is the White test. However, when
dealing with time series data, this means testing for ARCH errors (as described above) and GARCH errors (below).
Prior to GARCH there was EWMA, which has now been superseded by GARCH, although some people utilise both.
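The recursion above is easy to simulate; this sketch uses a GARCH(1,1) with illustrative parameter values (omega, alpha, beta are my names, chosen so that alpha + beta < 1):

```python
import math
import random

def simulate_garch11(n, omega=0.1, alpha=0.1, beta=0.8, seed=42):
    """Simulate eps_t = sigma_t * z_t with z_t ~ N(0, 1) and
    sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2.
    With alpha + beta < 1 the process is covariance-stationary, with
    unconditional variance omega / (1 - alpha - beta)."""
    rng = random.Random(seed)
    var = omega / (1 - alpha - beta)   # start at the unconditional variance
    out = []
    for _ in range(n):
        e = math.sqrt(var) * rng.gauss(0.0, 1.0)
        out.append(e)
        var = omega + alpha * e * e + beta * var
    return out
```

A long simulated path exhibits the volatility clustering the article describes: quiet stretches interrupted by bursts of large moves, with sample variance near omega / (1 - alpha - beta).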
GARCH(p, q) model specification
The lag length p of a GARCH(p, q) process is established in three steps:
1. Estimate the best fitting AR(q) model
y_t = a_0 + a_1 y_{t-1} + ... + a_q y_{t-q} + ε_t.
2. Compute and plot the sample autocorrelations ρ(i) of the squared residuals ε̂_t².
3. The asymptotic, that is for large samples, standard deviation of ρ(i) is 1/√T. Individual values that are larger
than this indicate GARCH errors. To estimate the total number of lags, use the Ljung-Box test until the values of
these are less than, say, 10% significant. The Ljung-Box Q-statistic follows a χ² distribution with n degrees of
freedom if the squared residuals are uncorrelated. It is recommended to consider up to T/4 values of n. The
null hypothesis states that there are no ARCH or GARCH errors. Rejecting the null thus means that such errors
exist in the conditional variance.
Nonlinear GARCH (NGARCH)
Nonlinear GARCH (NGARCH), also known as Nonlinear Asymmetric GARCH(1,1) (NAGARCH), was introduced
by Engle and Ng in 1993:
σ_t² = ω + α (ε_{t-1} - θ σ_{t-1})² + β σ_{t-1}².
For stock returns, the parameter θ is usually estimated to be positive; in this case, it reflects the leverage effect,
signifying that negative returns increase future volatility by a larger amount than positive returns of the same
magnitude.[1] [2]
This model should not be confused with the NARCH model, together with the NGARCH extension, introduced by
Higgins and Bera in 1992.
IGARCH
Integrated Generalized Autoregressive Conditional Heteroskedasticity (IGARCH) is a restricted version of the
GARCH model, where the persistent parameters sum up to one, and therefore there is a unit root in the GARCH
process. The condition for this is
Σ_{i=1}^{p} β_i + Σ_{i=1}^{q} α_i = 1.
EGARCH
The exponential generalized autoregressive conditional heteroskedastic (EGARCH) model by Nelson (1991) is
another form of the GARCH model. Formally, an EGARCH(p, q):
log σ_t² = ω + Σ_{k=1}^{q} β_k g(Z_{t-k}) + Σ_{k=1}^{p} α_k log σ_{t-k}²
where g(Z_t) = θ Z_t + λ (|Z_t| - E|Z_t|), σ_t² is the conditional variance, ω, β, α, θ and λ are
coefficients, and Z_t may be a standard normal variable or come from a generalized error distribution. The
formulation for g(Z_t) allows the sign and the magnitude of Z_t to have separate effects on the volatility. This is
particularly useful in an asset pricing context.[3]
Since log σ_t² may be negative, there are no (fewer) restrictions on the parameters.
GARCH-M
The GARCH-in-mean (GARCH-M) model adds a heteroskedasticity term into the mean equation. It has the
specification:
y_t = β x_t + λ σ_t + ε_t.
The residual is defined as
ε_t = σ_t z_t.
QGARCH
The Quadratic GARCH (QGARCH) model by Sentana (1995) is used to model asymmetric effects of positive and
negative shocks.
In the example of a GARCH(1,1) model, the residual process is ε_t = σ_t z_t,
where z_t is i.i.d. and
σ_t² = K + α ε_{t-1}² + β σ_{t-1}² + φ ε_{t-1}.
GJR-GARCH
Similar to QGARCH, the Glosten-Jagannathan-Runkle GARCH (GJR-GARCH) model by Glosten, Jagannathan
and Runkle (1993) also models asymmetry in the GARCH process. The suggestion is to model ε_t = σ_t z_t, where
z_t is i.i.d., and
σ_t² = K + δ σ_{t-1}² + α ε_{t-1}² + φ ε_{t-1}² I_{t-1}
where I_{t-1} = 0 if ε_{t-1} ≥ 0, and I_{t-1} = 1 if ε_{t-1} < 0.
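The asymmetric variance update can be sketched in one function; the parameter values are purely illustrative (and must satisfy the usual stationarity constraints in practice):

```python
def gjr_variance(prev_var, prev_eps, k=0.05, delta=0.85, alpha=0.05, phi=0.10):
    """One-step GJR-GARCH(1,1) conditional variance: the indicator term
    adds phi * eps^2 only after a negative shock, so bad news raises
    variance more than good news of the same size."""
    indicator = 1.0 if prev_eps < 0 else 0.0
    return k + delta * prev_var + alpha * prev_eps ** 2 + phi * prev_eps ** 2 * indicator
```

With these parameters, a -1 shock raises next-period variance to 1.05 while a +1 shock of equal size gives only 0.95, which is the leverage effect the section describes.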
TGARCH model
The Threshold GARCH (TGARCH) model by Zakoian (1994) is similar to GJR-GARCH, but the specification is
on the conditional standard deviation instead of the conditional variance:
σ_t = K + δ σ_{t-1} + α⁺ ε⁺_{t-1} + α⁻ ε⁻_{t-1}
where ε⁺_{t-1} = ε_{t-1} if ε_{t-1} > 0, and ε⁺_{t-1} = 0 if ε_{t-1} ≤ 0. Likewise, ε⁻_{t-1} = ε_{t-1} if ε_{t-1} ≤ 0, and
ε⁻_{t-1} = 0 if ε_{t-1} > 0.
fGARCH
Hentschel's fGARCH model,[4] also known as Family GARCH, is an omnibus model that nests a variety of other
popular symmetric and asymmetric GARCH models including APARCH, GJR, AVGARCH, NGARCH, etc.
References
[1] Engle, R.F.; Ng, V.K. "Measuring and testing the impact of news on volatility" (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=262096). Journal of Finance 48 (5): 1749-1778.
[2] Posedel, Petra (2006). "Analysis Of The Exchange Rate And Pricing Foreign Currency Options On The Croatian Market: The Ngarch Model As An Alternative To The Black Scholes Model" (http://www.ijf.hr/eng/FTP/2006/4/posedel.pdf). Financial Theory and Practice 30 (4): 347-368.
[3] St. Pierre, Eilleen F. (1998). "Estimating EGARCH-M Models: Science or Art", The Quarterly Review of Economics and Finance, Vol. 38, No. 2, pp. 167-180 (http://dx.doi.org/10.1016/S1062-9769(99)80110-0)
[4] Hentschel, Ludger (1995). "All in the family: Nesting symmetric and asymmetric GARCH models" (http://www.personal.anderson.ucla.edu/rossen.valkanov/hentschel_1995.pdf), Journal of Financial Economics, Volume 39, Issue 1, Pages 71-104
Tim Bollerslev. "Generalized Autoregressive Conditional Heteroskedasticity", Journal of Econometrics, 31:307-327, 1986.
Enders, W., Applied Econometric Time Series, John Wiley & Sons, 139-149, 1995
Robert F. Engle. "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation", Econometrica 50:987-1008, 1982. (the paper which sparked the general interest in ARCH models)
Robert F. Engle. "GARCH 101: The Use of ARCH/GARCH Models in Applied Econometrics", Journal of Economic Perspectives 15(4):157-168, 2001. (a short, readable introduction) (http://pages.stern.nyu.edu/~rengle/Garch101.doc)
Engle, R.F. (1995). ARCH: Selected Readings. Oxford University Press. ISBN 0-19-877432-X
Gujarati, D. N., Basic Econometrics, 856-862, 2003
Nelson, D. B. (1991). "Conditional heteroskedasticity in asset returns: A new approach", Econometrica 59: 347-370.
Bollerslev, Tim (2008). Glossary to ARCH (GARCH) (ftp://ftp.econ.au.dk/creates/rp/08/rp08_49.pdf), working paper
Hacker, R. S. and Hatemi-J, A. (2005). "A Test for Multivariate ARCH Effects" (http://ideas.repec.org/a/taf/apeclt/v12y2005i7p411-417.html), Applied Economics Letters, Vol. 12(7), pp. 411-417.
Brownian Model of Financial Markets
The Brownian motion models for financial markets are based on the work of Robert C. Merton and Paul A.
Samuelson, as extensions to the one-period market models of Harold Markowitz and William Sharpe, and are
concerned with defining the concepts of financial assets and markets, portfolios, gains and wealth in terms of
continuous-time stochastic processes.
Under this model, these assets have prices that evolve continuously in time and are driven by Brownian
motion processes. The model requires an assumption of perfectly divisible assets and a frictionless market, i.e. that no
transaction costs occur either for buying or selling. Another assumption is that asset prices have no jumps; that is,
there are no surprises in the market.
Financial market processes
Consider a financial market consisting of N + 1 financial assets, where one of these assets, called a bond or
money market, is risk free while the remaining N assets, called stocks, are risky.
Definition
A financial market is defined as consisting of:
1. A probability space (Ω, F, P)
2. A time interval [0, T]
3. A D-dimensional Brownian process W(t) = (W_1(t), ..., W_D(t)), adapted to the augmented
filtration {F_t}
4. A measurable risk-free money market rate process r(t)
5. A measurable mean rate of return process b(t).
6. A measurable dividend rate of return process δ(t).
7. A measurable volatility process σ(t).
8. A measurable, finite variation, singularly continuous stochastic process A(t)
9. The initial conditions given by S(0)
The augmented filtration
Let (Ω, F, P) be a probability space, and W(t) = (W_1(t), ..., W_D(t)), 0 ≤ t ≤ T, be a D-dimensional
Brownian motion stochastic process, with the natural filtration:
F^W(t) = σ( W(s) : 0 ≤ s ≤ t ), for all 0 ≤ t ≤ T.
If N(P) are the measure 0 (i.e. null under measure P) subsets of F^W(T), then define the augmented filtration:
F(t) = σ( F^W(t) ∪ N(P) ), for all 0 ≤ t ≤ T.
The difference between F^W(t) and F(t) is that the latter is both
left-continuous, in the sense that:
F(t) = σ( ∪_{s<t} F(s) )
and right-continuous, such that:
F(t) = ∩_{s>t} F(s)
while the former is only left-continuous.[1]
Bond
A share of a bond (money market) has price S_0(t) at time t, with S_0(0) = 1, where S_0(t) is continuous,
adapted, and has finite variation. Because it has finite variation, it can be decomposed into
an absolutely continuous part S_0^a(t) and a singularly continuous part S_0^s(t), by Lebesgue's decomposition
theorem. Define:
r(t) ≐ (d/dt) S_0^a(t) / S_0(t), and
A(t) ≐ ∫_0^t dS_0^s(u) / S_0(u),
resulting in the SDE:
dS_0(t) = S_0(t) [ r(t) dt + dA(t) ],
which gives:
S_0(t) = exp( ∫_0^t r(s) ds + A(t) ).
Thus, it can be easily seen that if S_0(t) is absolutely continuous (i.e. A ≡ 0), then the price of the bond evolves
like the value of a risk-free savings account with instantaneous interest rate r(t), which is random,
time-dependent and measurable.
Stocks
Stock prices are modeled as being similar to that of bonds, except with a randomly fluctuating component (called its
volatility). As a premium for the risk originating from these random fluctuations, the mean rate of return of a stock is
higher than that of a bond.
Let S_1(t), ..., S_N(t) be the strictly positive prices per share of the N stocks, which are continuous stochastic
processes satisfying:
Here, σ_{n,d}(t) gives the volatility of the n-th stock, while b_n(t) is its mean rate of return.
In order for an arbitrage-free pricing scenario, A(t) must be as defined above. The solution to this is:
and the discounted stock prices are:
Note that the contribution due to the discontinuities in the bond price does not appear in this equation.
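For a single stock with constant mean rate of return and a single Brownian driver, the solution above reduces to the familiar geometric Brownian motion closed form; a minimal sketch under those simplifying assumptions (names are mine):

```python
import math

def gbm_price(s0, b, sigma, t, z):
    """Closed-form solution of dS = S * (b dt + sigma dW) with constant
    coefficients: S(t) = S(0) * exp((b - sigma^2 / 2) * t + sigma * sqrt(t) * z),
    where z is a draw from N(0, 1)."""
    return s0 * math.exp((b - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * z)
```

Setting z = 0 gives the median path: with b = 5%, sigma = 20% and t = 1, a 100 stock sits at 100 * exp(0.03).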
Dividend rate
Each stock may have an associated dividend rate process δ_n(t) giving the rate of dividend payment per unit price of
the stock at time t. Accounting for this in the model gives the yield process:
Portfolio and gain processes
Definition
Consider a financial market .
A portfolio process π(t) = (π_0(t), π_1(t), ..., π_N(t)) for this market is a measurable, vector-valued process such that:
, almost surely,
, almost surely, and
, almost surely.
The gains process G(t) for this portfolio is:
We say that the portfolio is self-financed if:
.
It turns out that for a self-financed portfolio, the appropriate value of π_0 is determined from the other components and
therefore (π_1, ..., π_N) is sometimes referred to as the portfolio process. Also, π_0 < 0 implies borrowing money from the
money market, while π_n < 0 implies taking a short position on the stock.
The term in the SDE of is the risk premium process, and it is the compensation
received in return for investing in the -th stock.
Motivation
Consider time intervals [t_m, t_{m+1}), and let ν_n(t_m) be the number of shares of asset
n = 0, ..., N, held in a portfolio during the time interval [t_m, t_{m+1}). To avoid the case
of insider trading (i.e. foreknowledge of the future), it is required that ν_n(t_m) is F(t_m)-measurable.
Therefore, the incremental gains at each trading interval from such a portfolio are:
and G(t) is the total gain over time [0, t], while the total value of the portfolio is Σ_n ν_n(t) S_n(t).
Define π_n(t) ≐ ν_n(t) S_n(t), let the time partition go to zero, and substitute for S_n(t) as defined earlier, to get the
corresponding SDE for the gains process. Here π_n(t) denotes the dollar amount invested in asset n at time t, not
the number of shares held.
Income and wealth processes
Definition
Given a financial market M, a cumulative income process Γ(t) is a semimartingale and
represents the income accumulated over time [0, t], due to sources other than the investments in the N + 1 assets
of the financial market.
A wealth process X(t) is then defined as:
X(t) ≐ G(t) + Γ(t)
and represents the total wealth of an investor at time 0 ≤ t ≤ T. The portfolio is said to be Γ(t)-financed if:
The corresponding SDE for the wealth process, through appropriate substitutions, becomes:
.
Note that, again in this case, the value of π_0 can be determined from (π_1, ..., π_N).
Viable markets
The standard theory of mathematical finance is restricted to viable financial markets, i.e. those in which there are no
opportunities for arbitrage. If such opportunities exist, it implies the possibility of making an arbitrarily large
risk-free profit.
Definition
In a financial market M, a self-financed portfolio process π(t) is said to be an arbitrage opportunity if the
associated gains process satisfies G(T) ≥ 0, almost surely, and P[ G(T) > 0 ] > 0 strictly. A market M in which no
such portfolio exists is said to be viable.
Implications
In a viable market M, there exists an adapted process θ(t) such that for almost every
t ∈ [0, T]:
b_n(t) + δ_n(t) - r(t) = Σ_{d=1}^{D} σ_{n,d}(t) θ_d(t).
This θ is called the market price of risk and relates the premium for the n-th stock with its volatility σ_{n,·}.
Conversely, if there exists a D-dimensional process θ(t) such that it satisfies the above requirement, and:
∫_0^T ||θ(t)||² dt < ∞,
E[ exp( -∫_0^T θ(t)' dW(t) - ½ ∫_0^T ||θ(t)||² dt ) ] = 1,
then the market is viable.
Also, a viable market M can have only one money market (bond) and hence only one risk-free rate. Therefore, if
the n-th stock entails no risk (i.e. σ_{n,d} = 0 for d = 1, ..., D) and pays no dividend (i.e. δ_n(t) = 0), then its rate
of return is equal to the money market rate (i.e. b_n(t) = r(t)) and its price tracks that of the bond (i.e.
S_n(t) = S_n(0) S_0(t)).
Standard financial market
Definition
A financial market M is said to be standard if:
(i) It is viable.
(ii) The number of stocks N is not greater than the dimension D of the underlying Brownian motion process W(t).
(iii) The market price of risk process θ(t) satisfies:
∫_0^T ||θ(t)||² dt < ∞, almost surely.
(iv) The positive process Z_0(t) = exp( -∫_0^t θ(s)' dW(s) - ½ ∫_0^t ||θ(s)||² ds ) is a
martingale.
Comments
In case the number of stocks N is greater than the dimension D, in violation of point (ii), from linear algebra it
can be seen that there are N - D stocks whose volatilities (given by the vector σ_n) are linear
combinations of the volatilities of the other stocks (because the rank of σ is D). Therefore, the N stocks can be
replaced by D equivalent mutual funds.
The standard martingale measure P_0 on F(T) for the standard market is defined as:
P_0(A) ≐ E[ Z_0(T) 1_A ], for all A ∈ F(T).
Note that P and P_0 are absolutely continuous with respect to each other, i.e. they are equivalent. Also, according to
Girsanov's theorem,
W_0(t) ≐ W(t) + ∫_0^t θ(s) ds,
is a D-dimensional Brownian motion process on the filtration {F(t)} with respect to P_0.
Complete financial markets
A complete financial market is one that allows effective hedging of the risk inherent in any investment strategy.
Definition
Let M be a standard financial market, and B be an F(T)-measurable random variable, such that:
P[ B / S_0(T) > -∞ ] = 1,
x ≐ E_0[ B / S_0(T) ] < ∞.
The market M is said to be complete if every such B is financeable, i.e. if there is an x-financed portfolio
process π(t), such that its associated wealth process X(t) satisfies
X(T) = B, almost surely.
Motivation
If a particular investment strategy calls for a payment B at time T, the amount of which is unknown at time
t = 0, then a conservative strategy would be to set aside an amount sufficient in every case to cover the
payment. However, in a complete market it is possible to set aside less capital (viz. x) and invest it so that at time
T it has grown to match the size of B.
Corollary
A standard financial market M is complete if and only if N = D, and the N × D volatility process σ(t) is
non-singular for almost every t ∈ [0, T], with respect to the Lebesgue measure.
Notes
[1] Karatzas, Ioannis; Shreve, Steven E. (1991). Brownian Motion and Stochastic Calculus. New York: Springer-Verlag. ISBN 0-387-97655-8.
See also
Mathematical finance
Monte Carlo method
Martingale (probability theory)
References
Karatzas, Ioannis; Shreve, Steven E. (1998). Methods of Mathematical Finance. New York: Springer. ISBN 0-387-94839-2.
Korn, Ralf; Korn, Elke (2001). Option Pricing and Portfolio Optimization: Modern Methods of Financial Mathematics. Providence, R.I.: American Mathematical Society. ISBN 0-8218-2123-7.
Merton, R. C. (1 August 1969). "Lifetime Portfolio Selection under Uncertainty: the Continuous-Time Case" (http://jstor.org/stable/1926560). The Review of Economics and Statistics 51 (3): 247-257. doi:10.2307/1926560. ISSN 0034-6535.
Merton, R.C. (1970). "Optimum consumption and portfolio rules in a continuous-time model" (http://www.math.uwaterloo.ca/~mboudalh/Merton1971.pdf). Journal of Economic Theory 3. Retrieved 2009-05-29.
Rational pricing
Rational pricing is the assumption in financial economics that asset prices (and hence asset pricing models) will
reflect the arbitrage-free price of the asset as any deviation from this price will be "arbitraged away". This
assumption is useful in pricing fixed income securities, particularly bonds, and is fundamental to the pricing of
derivative instruments.
Arbitrage mechanics
Arbitrage is the practice of taking advantage of a state of imbalance between two (or possibly more) markets. Where
this mismatch can be exploited (i.e. after transaction costs, storage costs, transport costs, dividends etc.) the
arbitrageur "locks in" a risk free profit without investing any of his own money.
In general, arbitrage ensures that "the law of one price" will hold; arbitrage also equalises the prices of assets with
identical cash flows, and sets the price of assets with known future cash flows.
The law of one price
The same asset must trade at the same price on all markets ("the law of one price"). Where this is not true, the
arbitrageur will:
1. buy the asset on the market where it has the lower price, and simultaneously sell it (short) on the second market at
the higher price
2. deliver the asset to the buyer and receive that higher price
3. pay the seller on the cheaper market with the proceeds and pocket the difference.
Assets with identical cash flows
Two assets with identical cash flows must trade at the same price. Where this is not true, the arbitrageur will:
1. sell the asset with the higher price (short sell) and simultaneously buy the asset with the lower price
2. fund his purchase of the cheaper asset with the proceeds from the sale of the expensive asset and pocket the
difference
3. deliver on his obligations to the buyer of the expensive asset, using the cash flows from the cheaper asset.
An asset with a known future-price
An asset with a known price in the future, must today trade at that price discounted at the risk free rate.
Note that this condition can be viewed as an application of the above, where the two assets in question are the asset
to be delivered and the risk free asset.
(a) where the discounted future price is higher than today's price:
1. The arbitrageur agrees to deliver the asset on the future date (i.e. sells forward) and simultaneously buys it today
with borrowed money.
2. On the delivery date, the arbitrageur hands over the underlying, and receives the agreed price.
3. He then repays the lender the borrowed amount plus interest.
4. The difference between the agreed price and the amount owed is the arbitrage profit.
(b) where the discounted future price is lower than today's price:
1. The arbitrageur agrees to pay for the asset on the future date (i.e. buys forward) and simultaneously sells (short)
the underlying today; he invests the proceeds.
2. On the delivery date, he cashes in the matured investment, which has appreciated at the risk free rate.
3. He then takes delivery of the underlying and pays the agreed price using the matured investment.
4. The difference between the maturity value and the agreed price is the arbitrage profit.
It will be noted that (b) is only possible for those holding the asset but not needing it until the future date. There may
be few such parties if short-term demand exceeds supply, leading to backwardation.
Fixed income securities
Rational pricing is one approach used in pricing fixed rate bonds. Here, each cash flow can be matched by trading in
(a) some multiple of a zero-coupon bond corresponding to the coupon date, and of equivalent credit worthiness (if
possible, from the same issuer as the bond being valued) with the corresponding maturity, or (b) in a corresponding
strip and ZCB.
Given that the cash flows can be replicated, the price of the bond must today equal the sum of each of its cash flows
discounted at the same rate as each ZCB, as above. Were this not the case, arbitrage would be possible and would
bring the price back into line with the price based on ZCBs; see Bond valuation: Arbitrage-free pricing approach
The pricing formula is as below, where each cash flow C_t is discounted at the rate r_t that matches the coupon date:
Price = Σ_t C_t / (1 + r_t)^t.
Often, the formula is expressed as Price = Σ_t C_t × P_t, using the prices P_t of the corresponding ZCBs instead of
rates, as prices are more readily available.
See also Fixed income arbitrage; Bond credit rating.
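The discounting described above can be sketched in a few lines; function names are illustrative, and annual periods are assumed:

```python
def bond_price_from_rates(cash_flows, spot_rates):
    """Price = sum_t C_t / (1 + r_t)^t, each coupon discounted at the
    zero-coupon rate matching its date (annual periods, t = 1, 2, ...)."""
    return sum(c / (1 + r) ** t
               for t, (c, r) in enumerate(zip(cash_flows, spot_rates), start=1))

def bond_price_from_zcbs(cash_flows, zcb_prices):
    """Equivalent form using ZCB prices directly: Price = sum_t C_t * P_t."""
    return sum(c * p for c, p in zip(cash_flows, zcb_prices))
```

As a check, a two-year bond with a 5% annual coupon priced off a flat 5% zero curve comes out at exactly par (100), as rational pricing requires.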
Pricing derivatives
A derivative is an instrument that allows for buying and selling of the same asset on two markets: the spot market
and the derivatives market. Mathematical finance assumes that any imbalance between the two markets will be
arbitraged away. Thus, in a correctly priced derivative contract, the derivative price, the strike price (or reference
rate), and the spot price will be related such that arbitrage is not possible.
see: Fundamental theorem of arbitrage-free pricing
Futures
In a futures contract, for no arbitrage to be possible, the price paid on delivery (the forward price) must be the same
as the cost (including interest) of buying and storing the asset. In other words, the rational forward price represents
the expected future value of the underlying discounted at the risk free rate (the "asset with a known future-price", as
above). Thus, for a simple, non-dividend paying asset, the value of the future/forward, F, will be found by
accumulating the present value S at time t to maturity T by the rate of risk-free return r:
F = S × (1 + r)^(T - t).
This relationship may be modified for storage costs, dividends, dividend yields, and convenience yields; see futures
contract pricing.
Any deviation from this equality allows for arbitrage as follows.
In the case where the forward price is higher:
1. The arbitrageur sells the futures contract and buys the underlying today (on the spot market) with borrowed
money.
2. On the delivery date, the arbitrageur hands over the underlying, and receives the agreed forward price.
3. He then repays the lender the borrowed amount plus interest.
4. The difference between the two amounts is the arbitrage profit.
In the case where the forward price is lower:
1. The arbitrageur buys the futures contract and sells the underlying today (on the spot market); he invests the
proceeds.
2. On the delivery date, he cashes in the matured investment, which has appreciated at the risk free rate.
3. He then receives the underlying and pays the agreed forward price using the matured investment. [If he was short
the underlying, he returns it now.]
4. The difference between the two amounts is the arbitrage profit.
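The forward-price relation and the cash-and-carry arbitrage it enforces can be sketched as follows (names mine; annual compounding assumed, no storage costs or dividends):

```python
def forward_price(spot, r, years):
    """Rational forward price of a non-dividend-paying asset:
    F = S * (1 + r)^T (annual compounding)."""
    return spot * (1 + r) ** years

def cash_and_carry_profit(quoted_forward, spot, r, years):
    """If the quoted forward exceeds S * (1 + r)^T: borrow S, buy spot,
    sell the forward; at delivery, hand over the asset, receive the
    forward price and repay the loan. The difference is the riskless
    profit (a negative value signals the reverse trade instead)."""
    return quoted_forward - forward_price(spot, r, years)
```

For example, with spot 100 and a 5% rate, the one-year rational forward is 105; a quoted forward of 107 leaves a riskless profit of 2 per unit.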
Options
As above, where the value of an asset in the future is known (or expected), this value can be used to determine the
asset's rational price today. In an option contract, however, exercise is dependent on the price of the underlying, and
hence payment is uncertain. Option pricing models therefore include logic that either "locks in" or "infers" this future
value; both approaches deliver identical results. Methods that lock-in future cash flows assume arbitrage free
pricing, and those that infer expected value assume risk neutral valuation.
To do this, (in their simplest, though widely used, form) both approaches assume a binomial model for the behavior
of the underlying instrument, which allows for only two states: up or down. If S is the current price, then in the next
period the price will either be S_up or S_down. Here, the value of the share in the up-state is S × u, and in the
down-state is S × d (where u and d are multipliers with d < 1 < u and assuming d < 1 + r < u; see the binomial options
model). Then, given these two states, the "arbitrage free" approach creates a position that has an identical value in
either state; the cash flow in one period is therefore known, and arbitrage pricing is applicable. The risk neutral
approach infers expected option value from the intrinsic values at the later two nodes.
Although this logic appears far removed from the Black-Scholes formula and the lattice approach in the Binomial
options model, it in fact underlies both models; see The Black-Scholes PDE. The assumption of binomial behaviour
in the underlying price is defensible as the number of time steps between today (valuation) and exercise increases,
and the period per time-step is increasingly short. The Binomial options model allows for a high number of very
short time-steps (if coded correctly), while Black-Scholes, in fact, models a continuous process.
The examples below have shares as the underlying, but may be generalised to other instruments. The value of a put
option can be derived as below, or may be found from the value of the call using put-call parity.
Arbitrage free pricing
Here, the future payoff is "locked in" using either "delta hedging" or the "replicating portfolio" approach. As above,
this payoff is then discounted, and the result is used in the valuation of the option today.
Delta hedging
It is possible to create a position consisting of Δ shares and 1 call sold, such that the position's value will be identical
in the S up and S down states, and hence known with certainty (see Delta hedging). This certain value corresponds to
the forward price above ("An asset with a known future price"), and as above, for no arbitrage to be possible, the
present value of the position must be its expected future value discounted at the risk free rate, r. The value of a call is
then found by equating the two.
1) Solve for Δ such that:
value of position in one period = Δ × S_up - max(S_up - strike price, 0) = Δ × S_down - max(S_down -
strike price, 0)
2) Solve for the value of the call, using Δ, where:
value of position today = value of position in one period ÷ (1 + r) = Δ × S_current - value of call
The replicating portfolio
It is possible to create a position consisting of Δ shares and $B borrowed at the risk free rate, which will produce
identical cash flows to one option on the underlying share. The position created is known as a "replicating portfolio"
since its cash flows replicate those of the option. As shown above ("Assets with identical cash flows"), in the
absence of arbitrage opportunities, since the cash flows produced are identical, the price of the option today must be
the same as the value of the position today.
1) Solve simultaneously for Δ and B such that:
i) Δ × S_up - B × (1 + r) = max(0, S_up - strike price)
ii) Δ × S_down - B × (1 + r) = max(0, S_down - strike price)
2) Solve for the value of the call, using Δ and B, where:
call = Δ × S_current - B
Note that here there is no discounting; the interest rate appears only as part of the construction. This approach is
therefore used in preference to others where it is not clear whether the risk free rate may be applied as the discount
rate at each decision point, or whether, instead, a premium over risk free would be required. The best example of this
would be under real options analysis, where managements' actions actually change the risk characteristics of the
project in question, and hence the required rate of return could differ in the up- and down-states. Here, in the above
formulae, we then have: "Δ × S_up - B × (1 + r_up)..." and "Δ × S_down - B × (1 + r_down)...".
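The replication argument can be sketched directly; a minimal one-period implementation (the function name and the example parameters are illustrative):

```python
def call_by_replication(s, u, d, r, strike):
    """One-period binomial call priced by replication: choose Delta shares
    and borrowing B so the position pays off exactly like the call in both
    states; the call must then cost Delta * S - B today."""
    s_up, s_down = s * u, s * d
    c_up = max(0.0, s_up - strike)
    c_down = max(0.0, s_down - strike)
    delta = (c_up - c_down) / (s_up - s_down)   # shares held
    b = (delta * s_down - c_down) / (1 + r)     # amount borrowed
    return delta * s - b
```

With S = 100, u = 1.2, d = 0.8, r = 5% and strike 100, replication gives Δ = 0.5, B ≈ 38.10, and a call worth about 11.90.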
Risk neutral valuation
Here the value of the option is calculated using the risk neutrality assumption. Under this assumption, the expected
value (as opposed to "locked in" value) is discounted. The expected value is calculated using the intrinsic values
from the later two nodes: Option up and Option down, with u and d as price multipliers as above. These are then
weighted by their respective probabilities: probability p of an up move in the underlying, and probability (1-p) of
a down move. The expected value is then discounted at r, the risk free rate.
1) Solve for p
for no arbitrage to be possible in the share, today's price must represent its expected value discounted at the
risk free rate (i.e., the share price is a Martingale):
S = [ p × (up value) + (1-p) × (down value) ] ÷ (1+r) = [ p × S × u + (1-p) × S × d ] ÷ (1+r)
then, p = [ (1+r) - d ] ÷ [ u - d ]
2) Solve for call value, using p
for no arbitrage to be possible in the call, today's price must represent its expected value discounted at the risk
free rate:
Option value = [ p × Option up + (1-p) × Option down ] ÷ (1+r)
= [ p × Max(S up - strike price, 0) + (1-p) × Max(S down - strike price, 0) ] ÷ (1+r)
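The two steps can be sketched as a one-period binomial calculation; the inputs below are illustrative assumptions, not figures from the text.

```python
def risk_neutral_call(s, u, d, r, k):
    """One-period binomial call price via risk-neutral valuation."""
    p = ((1 + r) - d) / (u - d)        # risk-neutral up-move probability
    payoff_up = max(s * u - k, 0.0)    # Option up
    payoff_down = max(s * d - k, 0.0)  # Option down
    return (p * payoff_up + (1 - p) * payoff_down) / (1 + r)

# Illustrative inputs: S = 100, u = 1.2, d = 0.8, r = 5%, strike = 100.
print(round(risk_neutral_call(100.0, 1.2, 0.8, 0.05, 100.0), 4))  # → 11.9048
```

Here p = 0.625, and the martingale condition holds: 0.625 × 120 + 0.375 × 80 = 105 = 100 × 1.05.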
The risk neutrality assumption
Note that, above, the risk neutral formula does not refer to the volatility of the underlying: p, as solved, relates to the
risk-neutral measure as opposed to the actual probability distribution of prices. Nevertheless, both arbitrage free
pricing and risk neutral valuation deliver identical results. In fact, it can be shown that Delta hedging and Risk
neutral valuation use identical formulae expressed differently. Given this equivalence, it is valid to assume risk
neutrality when pricing derivatives. See Fundamental theorem of arbitrage-free pricing.
Swaps
Rational pricing underpins the logic of swap valuation. Here, two counterparties "swap" obligations, effectively
exchanging cash flow streams calculated against a notional principal amount, and the value of the swap is the present
value (PV) of both sets of future cash flows "netted off" against each other.
Valuation at initiation
To be arbitrage free, the terms of a swap contract are such that, initially, the Net present value of these future cash
flows is equal to zero; see swap valuation. For example, consider the valuation of a fixed-to-floating Interest rate
swap where Party A pays a fixed rate, and Party B pays a floating rate. Here, the fixed rate would be such that the
present value of future fixed rate payments by Party A is equal to the present value of the expected future floating
rate payments (i.e. the NPV is zero). Were this not the case, an Arbitrageur, C, could:
1. Assume the position with the lower present value of payments, and borrow funds equal to this present value
2. Meet the cash flow obligations on the position by using the borrowed funds, and receive the corresponding
payments, which have a higher present value
3. Use the received payments to repay the debt on the borrowed funds
4. Pocket the difference - where the difference between the present value of the loan and the present value of the
inflows is the arbitrage profit
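The NPV-zero condition at initiation can be sketched numerically. This uses the standard textbook decomposition, under which the floating leg is worth par minus the final discount factor per unit notional; the flat curve and annual payment schedule below are illustrative assumptions.

```python
def par_swap_rate(discount_factors):
    """Fixed rate making a plain-vanilla interest rate swap's initial NPV zero.
    discount_factors: one discount factor per annual payment date.
    Textbook identity: PV(floating leg) = 1 - D_n per unit notional, and
    PV(fixed leg) = rate * sum(D_i); setting them equal gives the par rate."""
    return (1 - discount_factors[-1]) / sum(discount_factors)

# Illustrative flat 5% curve with three annual payments:
dfs = [1 / 1.05 ** t for t in (1, 2, 3)]
print(round(par_swap_rate(dfs), 6))  # → 0.05 (a flat curve returns the curve rate)
```

At any other fixed rate one leg would be worth more than the other at inception, and an arbitrageur could profit exactly as described in steps 1-4 above.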
Subsequent valuation
Once traded, swaps can also be priced using rational pricing. For example, the Floating leg of an interest rate swap
can be "decomposed" into a series of Forward rate agreements. Here, since the swap has identical payments to the
FRA, arbitrage free pricing must apply as above - i.e. the value of this leg is equal to the value of the corresponding
FRAs. Similarly, the "receive-fixed" leg of a swap, can be valued by comparison to a Bond with the same schedule
of payments. (Relatedly, given that their underlyings have the same cash flows, bond options and swaptions are
equatable.)
Pricing shares
The Arbitrage pricing theory (APT), a general theory of asset pricing, has become influential in the pricing of shares.
APT holds that the expected return of a financial asset can be modelled as a linear function of various
macro-economic factors, where sensitivity to changes in each factor is represented by a factor specific beta
coefficient:
E(r_j) = r_f + b_j1 × F_1 + b_j2 × F_2 + ... + b_jn × F_n + ε_j
where
E(r_j) is the risky asset's expected return,
r_f is the risk free rate,
F_k is the macroeconomic factor,
b_jk is the sensitivity of the asset to factor k,
and ε_j is the risky asset's idiosyncratic random shock with mean zero.
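The linear factor form can be evaluated directly as a sketch. In the expected-return form each factor enters through its risk premium; the rate, betas, and premia below are illustrative assumptions, not figures from the text.

```python
def apt_expected_return(r_f, betas, factor_premia):
    """Linear APT form: risk-free rate plus each factor's risk premium
    weighted by the asset's sensitivity (beta) to that factor."""
    return r_f + sum(b * rp for b, rp in zip(betas, factor_premia))

# Illustrative two-factor case (e.g. an inflation factor and a GDP-growth factor):
expected = apt_expected_return(r_f=0.03, betas=[1.2, 0.5], factor_premia=[0.02, 0.04])
print(round(expected, 4))  # → 0.074
```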
The model-derived rate of return will then be used to price the asset correctly - the asset price should equal the
expected end of period price discounted at the rate implied by the model. If the price diverges, arbitrage should bring it
back into line. Here, to perform the arbitrage, the investor creates a correctly priced asset (a synthetic asset), a
portfolio with the same net-exposure to each of the macroeconomic factors as the mispriced asset but a different
expected return. See the APT article for detail on the construction of the portfolio. The arbitrageur is then in a
position to make a risk free profit as follows:
Where the asset price is too low, the portfolio should have appreciated at the rate implied by the APT, whereas the
mispriced asset would have appreciated at more than this rate. The arbitrageur could therefore:
1. Today: short sell the portfolio and buy the mispriced-asset with the proceeds.
2. At the end of the period: sell the mispriced asset, use the proceeds to buy back the portfolio, and pocket the
difference.
Where the asset price is too high, the portfolio should have appreciated at the rate implied by the APT, whereas
the mispriced asset would have appreciated at less than this rate. The arbitrageur could therefore:
1. Today: short sell the mispriced-asset and buy the portfolio with the proceeds.
2. At the end of the period: sell the portfolio, use the proceeds to buy back the mispriced-asset, and pocket the
difference.
Note that under "true arbitrage", the investor locks in a guaranteed payoff, whereas under APT arbitrage, the
investor locks in a positive expected payoff. The APT thus assumes "arbitrage in expectations", i.e. that arbitrage
by investors will bring asset prices back into line with the returns expected by the model.
The Capital asset pricing model (CAPM) is an earlier, (more) influential theory on asset pricing. Although based on
different assumptions, the CAPM can, in some ways, be considered a "special case" of the APT; specifically, the
CAPM's Securities market line represents a single-factor model of the asset price, where Beta is exposure to changes
in value of the Market.
See also
Efficient market hypothesis
Fair value
Fundamental theorem of arbitrage-free pricing
Homo economicus
List of valuation topics
Rational choice theory
Rationality
Risk-neutral measure
Volatility arbitrage
External links
Arbitrage free pricing
Pricing by Arbitrage [1], The History of Economic Thought Website
The Idea Behind Arbitrage Pricing [2], Samy Mohammed, Quantnotes
"The Fundamental Theorem" of Finance [3]; part II [4]. Prof. Mark Rubinstein, Haas School of Business
Elementary Asset Pricing Theory [5], Prof. K. C. Border, California Institute of Technology
The Notion of Arbitrage and Free Lunch in Mathematical Finance [6], Prof. Walter Schachermayer
Risk Neutral Pricing in Discrete Time [7] (PDF), Prof. Don M. Chance
No Arbitrage in Continuous Time [8], Prof. Tyler Shumway
Risk neutrality and arbitrage free pricing
Risk-Neutral Probabilities Explained [9]. Nicolas Gisiger
Risk-neutral Valuation: A Gentle Introduction [10], Part II [11]. Joseph Tham, Duke University
Application to derivatives
Option Valuation in the Binomial Model [12], Prof. Ernst Maug
Pricing Futures and Forwards by Arbitrage Argument [13], Quantnotes
The relationship between futures and spot prices [14], Investment Analysts Society of Southern Africa
The illusions of dynamic replication [15], Emanuel Derman and Nassim Taleb
Swaptions and Options [16], Prof. Don M. Chance
References
[1] http://cepa.newschool.edu/het/essays/sequence/arbitpricing.htm
[2] http://www.quantnotes.com/fundamentals/basics/arbitragepricing.htm
[3] http://www.in-the-money.com/artandpap/IV%20Fundamental%20Theorem%20-%20Part%20I.doc
[4] http://www.in-the-money.com/artandpap/IV%20Fundamental%20Theorem%20-%20Part%20II.doc
[5] http://www.hss.caltech.edu/~kcb/Notes/Arbitrage.pdf
[6] http://www.fam.tuwien.ac.at/~wschach/pubs/preprnts/prpr0118a.pdf
[7] http://www.bus.lsu.edu/academics/finance/faculty/dchance/Instructional/TN96-02.pdf
[8] http://www-personal.umich.edu/~shumway/courses.dir/f872.dir/noarb.pdf
[9] http://ssrn.com/abstract=1395390
[10] http://papers.ssrn.com/sol3/papers.cfm?abstract_id=290044
[11] http://papers.ssrn.com/sol3/papers.cfm?abstract_id=292724
[12] http://www.rpi.edu/~olivaa2/binomial.pdf
[13] http://www.quantnotes.com/fundamentals/futures/futureforwardpricing.htm
[14] http://www.iassa.co.za/images/file/indexmain.htm
[15] http://www.ederman.com/new/docs/qf-Illusions-dynamic.pdf
[16] http://papers.ssrn.com/sol3/papers.cfm?abstract_id=291988
Arbitrage
In economics and finance, arbitrage (IPA: /ˈɑːrbɪtrɑːʒ/) is the practice of taking advantage of a price difference
between two or more markets: striking a combination of matching deals that capitalize upon the imbalance, the profit
being the difference between the market prices. When used by academics, an arbitrage is a transaction that involves
no negative cash flow at any probabilistic or temporal state and a positive cash flow in at least one state; in simple
terms, it is the possibility of a risk-free profit at zero cost.
In principle and in academic use, an arbitrage is risk-free; in common use, as in statistical arbitrage, it may refer to
expected profit, though losses may occur, and in practice, there are always risks in arbitrage, some minor (such as
fluctuation of prices decreasing profit margins), some major (such as devaluation of a currency or derivative). In
academic use, an arbitrage involves taking advantage of differences in price of a single asset or identical cash-flows;
in common use, it is also used to refer to differences between similar assets (relative value or convergence trades), as
in merger arbitrage.
People who engage in arbitrage are called arbitrageurs (IPA: /ˌɑːrbɪtrɑːˈʒɜːr/), such as a bank or brokerage firm. The
term is mainly applied to trading in financial instruments, such as bonds, stocks, derivatives, commodities and
currencies.
Arbitrage-free
If the market prices do not allow for profitable arbitrage, the prices are said to constitute an arbitrage equilibrium
or arbitrage-free market. An arbitrage equilibrium is a precondition for a general economic equilibrium. The
assumption that there is no arbitrage is used in quantitative finance to calculate a unique risk neutral price for
derivatives.
Conditions for arbitrage
Arbitrage is possible when one of three conditions is met:
1. The same asset does not trade at the same price on all markets ("the law of one price").
2. Two assets with identical cash flows do not trade at the same price.
3. An asset with a known price in the future does not today trade at its future price discounted at the risk-free
interest rate (or, the asset does not have negligible costs of storage; as such, for example, this condition holds for
grain but not for securities).
Arbitrage is not simply the act of buying a product in one market and selling it in another for a higher price at some
later time. The transactions must occur simultaneously to avoid exposure to market risk, or the risk that prices may
change on one market before both transactions are complete. In practical terms, this is generally only possible with
securities and financial products which can be traded electronically, and even then, when each leg of the trade is
executed the prices in the market may have moved. Missing one of the legs of the trade (and subsequently having to
trade it soon after at a worse price) is called 'execution risk' or more specifically 'leg risk'.
[1]
In the simplest example, any good sold in one market should sell for the same price in another. Traders may, for
example, find that the price of wheat is lower in agricultural regions than in cities, purchase the good, and transport it
to another region to sell at a higher price. This type of price arbitrage is the most common, but this simple example
ignores the cost of transport, storage, risk, and other factors. "True" arbitrage requires that there be no market risk
involved. Where securities are traded on more than one exchange, arbitrage occurs by simultaneously buying in one
and selling on the other.
See rational pricing, particularly arbitrage mechanics, for further discussion.
Mathematically it is defined as follows:
P( V(t) ≥ 0 ) = 1 and P( V(t) > 0 ) > 0, with V(0) = 0,
where V(t) means the value of a portfolio at time t.
Examples
Suppose that the exchange rates (after taking out the fees for making the exchange) in London are £5 = $10 =
¥1000 and the exchange rates in Tokyo are ¥1000 = $12 = £6. Converting ¥1000 to $12 in Tokyo and converting
that $12 into ¥1200 in London, for a profit of ¥200, would be arbitrage. In reality, this "triangle arbitrage" is so
simple that it almost never occurs. But more complicated foreign exchange arbitrages, such as the spot-forward
arbitrage (see interest rate parity) are much more common.
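The quoted-rates example can be checked arithmetically; the rates are those given in the text (fees already taken out), and the function name is just for illustration.

```python
def triangle_profit_yen(start_yen, tokyo_usd_per_1000_yen, london_yen_per_usd):
    """Profit in yen from converting yen -> dollars in Tokyo, then
    dollars -> yen in London, at the quoted (post-fee) rates."""
    usd = start_yen / 1000 * tokyo_usd_per_1000_yen  # Tokyo: 1000 yen buys $12
    return usd * london_yen_per_usd - start_yen      # London: $10 = 1000 yen, i.e. $1 = 100 yen

print(triangle_profit_yen(1000, 12, 100))  # → 200.0, the 200-yen profit
```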
One example of arbitrage involves the New York Stock Exchange and the Chicago Mercantile Exchange. When
the price of a stock on the NYSE and its corresponding futures contract on the CME are out of sync, one can buy
the less expensive one and sell it to the more expensive market. Because the differences between the prices are
likely to be small (and not to last very long), this can only be done profitably with computers examining a large
number of prices and automatically exercising a trade when the prices are far enough out of balance. The activity
of other arbitrageurs can make this risky. Those with the fastest computers and the most expertise take advantage
of series of small differences that would not be profitable if taken individually.
Economists use the term "global labor arbitrage" to refer to the tendency of manufacturing jobs to flow towards
whichever country has the lowest wages per unit output at present and has reached the minimum requisite level of
political and economic development to support industrialization. At present, many such jobs appear to be flowing
towards China, though some which require command of English are going to India and the Philippines. In popular
terms, this is referred to as offshoring. (Note that "offshoring" is not synonymous with "outsourcing", which
means "to subcontract from an outside supplier or source", such as when a business outsources its bookkeeping to
an accounting firm. Unlike offshoring, outsourcing always involves subcontracting jobs to a different company,
and that company can be in the same country as the outsourcing company.)
Sports arbitrage: numerous internet bookmakers offer odds on the outcome of the same event. Any given
bookmaker will weight their odds so that no one customer can cover all outcomes at a profit against their books.
However, in order to remain competitive their margins are usually quite low. Different bookmakers may offer
different odds on the same outcome of a given event; by taking the best odds offered by each bookmaker, a
customer can under some circumstances cover all possible outcomes of the event and lock a small risk-free profit,
known as a Dutch book. This profit would typically be between 1% and 5% but can be much higher. One problem
with sports arbitrage is that bookmakers sometimes make mistakes and this can lead to an invocation of the
'palpable error' rule, which most bookmakers invoke when they have made a mistake by offering or posting
incorrect odds. As bookmakers become more proficient, the odds of making an 'arb' usually last for less than an
hour and typically only a few minutes. Furthermore, huge bets on one side of the market also alert the bookies to
correct the market.
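The best-odds calculation above can be sketched as follows. The decimal odds are hypothetical; staking each outcome in proportion to its inverse odds equalizes the payout across outcomes, which is what locks in the profit when the book total is below 1.0.

```python
def dutch_book(best_odds):
    """Check a set of best decimal odds (one per outcome, possibly from
    different bookmakers) for an arbitrage. Returns the total implied
    probability (book total) and the stake fractions that equalize payout;
    a book total below 1.0 means a risk-free margin of (1/total - 1)."""
    inverse = [1 / o for o in best_odds]
    total = sum(inverse)
    stakes = [i / total for i in inverse]  # fraction of bankroll per outcome
    return total, stakes

# Hypothetical two-outcome event, best odds taken from two bookmakers:
total, stakes = dutch_book([2.10, 2.05])
print(round(total, 4))          # below 1.0, so an 'arb' exists
print(round(1 / total - 1, 4))  # guaranteed margin on the bankroll
```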
Exchange-traded fund arbitrage: Exchange Traded Funds allow authorized participants to exchange back and
forth between shares in underlying securities held by the fund and shares in the fund itself, rather than allowing
the buying and selling of shares in the ETF directly with the fund sponsor. ETFs trade in the open market, with
prices set by market demand. An ETF may trade at a premium or discount to the value of the underlying assets.
When a significant enough premium appears, an arbitrageur will buy the underlying securities, convert them to
shares in the ETF, and sell them in the open market. When a discount appears, an arbitrageur will do the reverse.
In this way, the arbitrageur makes a low-risk profit, while fulfilling a useful function in the ETF marketplace by
keeping ETF prices in line with their underlying value.
Some types of hedge funds make use of a modified form of arbitrage to profit. Rather than exploiting price
differences between identical assets, they will purchase and sell securities, assets and derivatives with similar
characteristics, and hedge any significant differences between the two assets. Any difference between the hedged
positions represents any remaining risk (such as basis risk) plus profit; the belief is that there remains some
difference which, even after hedging most risk, represents pure profit. For example, a fund may see that there is a
substantial difference between U.S. dollar debt and local currency debt of a foreign country, and enter into a
series of matching trades (including currency swaps) to arbitrage the difference, while simultaneously entering
into credit default swaps to protect against country risk and other types of specific risk.
Price convergence
Arbitrage has the effect of causing prices in different markets to converge. As a result of arbitrage, the currency
exchange rates, the price of commodities, and the price of securities in different markets tend to converge to the
same prices, in all markets, in each category. The speed at which prices converge is a measure of market efficiency.
Arbitrage tends to reduce price discrimination by encouraging people to buy an item where the price is low and resell
it where the price is high, as long as the buyers are not prohibited from reselling and the transaction costs of buying,
holding and reselling are small relative to the difference in prices in the different markets.
Arbitrage moves different currencies toward purchasing power parity. As an example, assume that a car purchased in
the United States is cheaper than the same car in Canada. Canadians would buy their cars across the border to exploit
the arbitrage condition. At the same time, Americans would buy US cars, transport them across the border, and sell
them in Canada. Canadians would have to buy American Dollars to buy the cars, and Americans would have to sell
the Canadian dollars they received in exchange for the exported cars. Both actions would increase demand for US
Dollars, and supply of Canadian Dollars, and as a result, there would be an appreciation of the US Dollar.
Eventually, if unchecked, this would make US cars more expensive for all buyers, and Canadian cars cheaper, until
there is no longer an incentive to buy cars in the US and sell them in Canada. More generally, international arbitrage
opportunities in commodities, goods, securities and currencies, on a grand scale, tend to change exchange rates until
the purchasing power is equal.
In reality, of course, one must consider taxes and the costs of travelling back and forth between the US and Canada.
Also, the features built into the cars sold in the US are not exactly the same as the features built into the cars for sale
in Canada, due, among other things, to the different emissions and other auto regulations in the two countries. In
addition, our example assumes that no duties have to be paid on importing or exporting cars from the USA to
Canada. Similarly, most assets exhibit (small) differences between countries, transaction costs, taxes, and other costs
provide an impediment to this kind of arbitrage.
Similarly, arbitrage affects the difference in interest rates paid on government bonds, issued by the various countries,
given the expected depreciations in the currencies, relative to each other (see interest rate parity).
Risks
Arbitrage transactions in modern securities markets involve fairly low day-to-day risks, but can face extremely high
risk in rare situations, particularly financial crises, and can lead to bankruptcy. Formally, arbitrage transactions have
negative skew: prices can get a small amount closer (but often no closer than 0), while they can get very far apart.
The day-to-day risks are generally small because the transactions involve small differences in price, so an execution
failure will generally cause a small loss (unless the trade is very big or the price moves rapidly). The rare case risks
are extremely high because these small price differences are converted to large profits via leverage (borrowed
money), and in the rare event of a large price move, this may yield a large loss.
The main day-to-day risk is that part of the transaction fails (execution risk). The main rare risks are counterparty
risk and liquidity risk: that a counterparty to a large transaction or many transactions fails to pay, or that one is
required to post margin and does not have the money to do so.
In the academic literature, the idea that seemingly very low risk arbitrage trades might not be fully exploited because
of these risk factors and other considerations is often referred to as limits to arbitrage.
[2]
Execution risk
Generally it is impossible to close two or three transactions at the same instant; therefore, there is the possibility that
when one part of the deal is closed, a quick shift in prices makes it impossible to close the other at a profitable price.
Competition in the marketplace can also create risks during arbitrage transactions. As an example, if one were trying
to profit from a price discrepancy between IBM on the NYSE and IBM on the London Stock Exchange, they may
purchase a large number of shares on the NYSE and find that they cannot simultaneously sell on the LSE. This
leaves the arbitrageur in an unhedged risk position.
In the 1980s, risk arbitrage was common. In this form of speculation, one trades a security that is clearly undervalued
or overvalued, when it is seen that the wrong valuation is about to be corrected by events. The standard example is
the stock of a company, undervalued in the stock market, which is about to be the object of a takeover bid; the price
of the takeover will more truly reflect the value of the company, giving a large profit to those who bought at the
current price - if the merger goes through as predicted. Traditionally, arbitrage transactions in the securities markets
involve high speed and low risk. At some moment a price difference exists, and the problem is to execute two or
three balancing transactions while the difference persists (that is, before the other arbitrageurs act). When the
transaction involves a delay of weeks or months, as above, it may entail considerable risk if borrowed money is used
to magnify the reward through leverage. One way of reducing the risk is through the illegal use of inside
information, and in fact risk arbitrage with regard to leveraged buyouts was associated with some of the famous
financial scandals of the 1980s such as those involving Michael Milken and Ivan Boesky.
Mismatch
Another risk occurs if the items being bought and sold are not identical and the arbitrage is conducted under the
assumption that the prices of the items are correlated or predictable; this is more narrowly referred to as a
convergence trade. In the extreme case this is merger arbitrage, described below. In comparison to the classical quick
arbitrage transaction, such an operation can produce disastrous losses.
Counterparty risk
As arbitrages generally involve future movements of cash, they are subject to counterparty risk: if a counterparty
fails to fulfill their side of a transaction. This is a serious problem if one has either a single trade or many related
trades with a single counterparty, whose failure thus poses a threat, or in the event of a financial crisis when many
counterparties fail. This hazard is serious because of the large quantities one must trade in order to make a profit on
small price differences.
For example, if one purchases many risky bonds, then hedges them with CDSes, profiting from the difference
between the bond spread and the CDS premium, in a financial crisis the bonds may default and the CDS writer/seller
may itself fail, due to the stress of the crisis, causing the arbitrageur to face steep losses.
Liquidity risk
The market can stay irrational longer than you can stay solvent.