H. P. Geering, G. Dondi, F. Herzog, S. Keel

Stochastic Systems

April 14, 2011

© by Measurement and Control Laboratory

All rights reserved.
Unauthorized reproduction of any kind prohibited.

Contents

1 Probability
   1.1 Foundations
   1.2 Random Variables
   1.3 Conditional Expectation
   1.4 Convergence of Random Variables
   1.5 Exercises

2 Random Processes
   2.1 Introduction
   2.2 Classes of Processes
      2.2.1 Markov Process
      2.2.2 Gaussian Process
      2.2.3 Martingales
      2.2.4 Diffusions
   2.3 Brownian Motion and White Noise
      2.3.1 Brownian Motion
      2.3.2 White Noise
      2.3.3 Generalizations
   2.4 Poisson Processes

3 Stochastic Differential Equations
   3.1 Introduction
   3.2 Stochastic Integration or Itô Integrals
      3.2.1 Definition
      3.2.2 Examples
      3.2.3 Properties of Itô Integrals
      3.2.4 Stochastic Integrals for Poisson Processes
   3.3 Stochastic Differentials and Itô Calculus
      3.3.1 The Scalar Case
      3.3.2 The Vector Case
      3.3.3 Examples

      3.3.4 Itô Calculus for Poisson Processes
   3.4 Stochastic Differential Equations
      3.4.1 Linear Scalar SDEs
      3.4.2 Popular Scalar Linear Models
      3.4.3 Vector-Valued Linear SDEs
      3.4.4 Popular Vector-Valued Linear Price Models
      3.4.5 Nonlinear SDEs and Popular Nonlinear Pricing Models
   3.5 Partial Differential Equations and SDEs
   3.6 Solutions of Stochastic Differential Equations
      3.6.1 Analytical Solutions of SDEs
      3.6.2 Numerical Solution of SDEs
      3.6.3 Solutions of SDEs as Diffusion Processes
   3.7 Stability
      3.7.1 Introduction
      3.7.2 Moment Method for Stochastic Systems
      3.7.3 Lyapunov's Second Method

4 Model-Based Filtering
   4.1 Linear Filtering
      4.1.1 The Kalman Filter
      4.1.2 Kalman Filter Equations
      4.1.3 Kalman Filter and Parameter Identification
      4.1.4 Numerical Implementation
   4.2 Nonlinear Filtering
      4.2.1 Introduction
      4.2.2 The Extended Kalman Filter
      4.2.3 Parameter Estimation

5 Optimal Control
   5.1 Deterministic Optimal Control
      5.1.1 Deterministic Optimal Control Problems
      5.1.2 Necessary Conditions for Optimality
      5.1.3 Example: The LQ-Regulator Problem
      5.1.4 Deterministic Hamilton-Jacobi-Bellman Theory
      5.1.5 Example: The LQ-Regulator Problem
   5.2 Stochastic Optimal Control
      5.2.1 Stochastic Optimal Control Problems
      5.2.2 Stochastic Hamilton-Jacobi-Bellman Equation
      5.2.3 Solution Procedure
      5.2.4 Stochastic LQG Examples with HJB Equation
      5.2.5 Stochastic Pontryagin's Maximum Principle
      5.2.6 Stochastic LQG Example with Maximum Principle

6 Financial Applications
   6.1 Introduction
      6.1.1 Continuous Compounding
      6.1.2 Net Present Value
      6.1.3 Utility Functions
   6.2 Mean-Variance Portfolio Theory
      6.2.1 Introduction
      6.2.2 The Markowitz Model
      6.2.3 The Capital Asset Pricing Model (CAPM)
      6.2.4 Arbitrage Pricing Theory (APT)
   6.3 Continuous-Time Finance
      6.3.1 Introduction
      6.3.2 The Dynamics of Asset Prices
      6.3.3 Wealth Dynamics and Self-Financing Portfolios
      6.3.4 Portfolio Models and Stochastic Optimal Control
   6.4 Derivatives
      6.4.1 Forward Contracts
      6.4.2 Futures
      6.4.3 Options
      6.4.4 Black-Scholes Formula and PDE
      6.4.5 Black-Scholes Formula for European Put Options
      6.4.6 General Option Pricing

References


1 Probability

In real life, nothing is impossible. Therefore, say "this event has probability zero" if you think it is impossible. — Hans P. Geering

A random variable is neither random nor variable. — Gian-Carlo Rota

Probability theory develops the mathematical tools for describing the nature of uncertainty. It is important to note that these tools are deterministic. Randomness only enters when a concrete experiment is made (e.g., we conduct an observation). Since we want to model random phenomena described by random processes and their stochastic differential equations, we need a more rigorous framework than elementary probability theory. This also includes some measure theory. It is not the purpose of this text to rigorously develop measure theory but to provide the reader with the important results (without proofs) and their practical implications. For a more rigorous treatment, the reader may refer to [5] or [36].

1.1 Foundations

Definition 1.1. Probability space
A probability space W is a unique triple W = {Ω, F, P}, where Ω is its sample space¹, F its σ-algebra of events², and P its probability measure³.

The purpose of this section is to clarify the salient details of this very compact definition.

¹ The sample space Ω is the set of all possible samples or elementary events ω: Ω = {ω | ω ∈ Ω}.
² The σ-algebra F is the set of all of the considered events A, i.e., subsets of Ω: F = {A | A ⊆ Ω, A ∈ F}. — See Definition 1.4.
³ The probability measure P assigns a probability P(A) to every event A ∈ F: P : F → [0, 1]. — See Definition 1.10.

The sample space Ω is sometimes called the universe of all samples or possible outcomes ω. Note that the ωs are a mathematical construct and have per se no real or scientific meaning. The ωs in the die example refer to the numbers of dots observed when the die is thrown.

Example 1.2. Sample space
• Toss of a coin (with head and tail): Ω = {H, T}.
• Two tosses of a coin: Ω = {HH, HT, TH, TT}.
• A cubic die: Ω = {ω1, ω2, ω3, ω4, ω5, ω6}.
• The positive integers: Ω = {1, 2, 3, . . .}.
• The reals: Ω = {ω | ω ∈ R}.

An event A is a subset of Ω. If the outcome ω of the experiment is in the subset A, then the event A is said to have occurred.

Example 1.3. Events
• Head in the coin toss: A = {H}.
• Odd number in the roll of a die: A = {ω1, ω3, ω5}.
• An integer smaller than 5: A = {1, 2, 3, 4}, where Ω = {1, 2, 3, . . .}.
• A real number between 0 and 1: A = [0, 1], where Ω = {ω | ω ∈ R}.

We denote the complementary event of A by Ac = Ω\A. When it is possible to determine whether an event A has occurred or not, we must also be able to determine whether Ac has occurred or not. Furthermore, if A and B are events, we can also detect the events A ∩ B, A ∪ B, Ac ∩ B, etc. Therefore, we call them measurable sets or measurable events. In our probabilistic environment, a σ-algebra represents all of the events of our experiment.

Definition 1.4. σ-algebra
A collection F of subsets of Ω is called a σ-algebra on Ω if the following properties apply:
• Ω ∈ F and ∅ ∈ F (∅ denotes the empty set).
• If A ∈ F then Ω\A = Ac ∈ F: the complementary subset of A is also in F.
• For all Ai ∈ F: ∪_{i=1}^∞ Ai ∈ F.

The pair {Ω, F} is called a measure space and the elements of F are called measurable sets.

When we define the σ-algebra of an experiment, we actually define which events we are able to detect. A simple example is the roll of a die with incomplete information: if we are only told whether an odd or an even number has been rolled by the die, our σ-algebra would be F = {∅, {ω1, ω3, ω5}, {ω2, ω4, ω6}, Ω}. The set of all subsets of the sample space is denoted by 2^Ω. Accordingly, the number of all possible events of a finite sample space is 2^|Ω|, where |Ω| < ∞ is the number of elements in Ω.

This is the meaning of the σ-algebra when modeling the experiment: it determines how much information we obtain once we conduct some observations. If we think of measurability in the engineering context, we think of σ-algebras as the measurable events in an experiment. If we know what kind of events in an experiment we can measure, we also know how much information we get once we make an observation. Therefore, the σ-algebra is the mathematical construct for modeling informational aspects in an experiment, and we say that every element A ∈ F is F-measurable.

It is important to note that not every subset of Ω is an event. Only the elements of F are events and have their assigned probabilities. Any subset of Ω which is not an element of the σ-algebra F is, mathematically speaking, not an event, and hence does not have a probability.

If we have full information, F = 2^Ω, i.e., F consists of all subsets of Ω: we can measure every possible event, and therefore we know for every observation which ω in Ω has been chosen. After we have made a concrete observation, there is no more randomness, since we have full information. This means for the die example that we know how many eyes showed up once we make an observation. Conversely, the simplest or "trivial" σ-algebra is {∅, Ω}. With this σ-algebra as information structure available, we only know that an event has occurred, but we have no information about which element ω of Ω has been chosen. In the die example we would only be told that the die has been rolled, but not how many eyes showed up.

This leads us to the following definition:

Definition 1.5. σ(C): σ-algebra generated by a class C of subsets
Let C be a class of subsets of Ω. The σ-algebra generated by C, denoted by σ(C), is the smallest σ-algebra F which includes all elements of C, i.e., C ∈ F.

The concept of generated σ-algebras is important in probability theory. If, for instance, we are only interested in one subset A ⊆ Ω in our experiment, then we just work with the σ-algebra generated by A, namely {∅, A, Ac, Ω}, and we have avoided all the measure-theoretic technicalities for constructing σ-algebras. This is actually a very convenient tool for the scientific usage of σ-algebras. For the even/odd die example, we just consider the σ-algebra generated by {ω1, ω3, ω5}: σ({ω1, ω3, ω5}) = {∅, {ω1, ω3, ω5}, {ω2, ω4, ω6}, Ω}.

Example 1.6. σ-algebra of two coin tosses
• Ω = {HH, HT, TH, TT} = {ω1, ω2, ω3, ω4}
• Fmin = {∅, Ω}
• Fmax = {∅, {ω1}, {ω2}, {ω3}, {ω4}, {ω1, ω2}, {ω1, ω3}, {ω1, ω4}, {ω2, ω3}, {ω2, ω4}, {ω3, ω4}, {ω1, ω2, ω3}, {ω1, ω2, ω4}, {ω1, ω3, ω4}, {ω2, ω3, ω4}, Ω}

The most important σ-algebra used in this context is the Borel σ-algebra B(R).
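The generated σ-algebra of a finite partition can be enumerated explicitly. The following small Python sketch (an illustration added here, not part of the original text; sets of integers stand in for the ωs) builds the σ-algebra generated by the even/odd partition of the die by forming all unions of partition blocks:

import itertools

# Illustrative sketch: the sigma-algebra generated by a partition of a
# finite sample space is the set of all unions of partition blocks.
omega = frozenset(range(1, 7))                       # die: {1, ..., 6}
partition = [frozenset({1, 3, 5}), frozenset({2, 4, 6})]

def sigma_from_partition(blocks):
    """All unions of partition blocks (including the empty union)."""
    events = set()
    for r in range(len(blocks) + 1):
        for combo in itertools.combinations(blocks, r):
            union = frozenset().union(*combo) if combo else frozenset()
            events.add(union)
    return events

F = sigma_from_partition(partition)
print(sorted(sorted(e) for e in F))
# [[], [1, 3, 5], [2, 4, 6], [1, 2, 3, 4, 5, 6]] -- exactly the four events above

With three blocks the same routine would return 2³ = 8 events, which illustrates why finer partitions carry more information.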

Definition 1.7. Borel σ-algebra B(R)
The Borel σ-algebra B(R) is the smallest σ-algebra containing all open intervals in R. The sets in B(R) are called Borel sets.

The Borel σ-algebra is the σ-algebra generated by all open subsets of R and therefore includes all subsets of R which are of interest in practical applications. It can be shown that B(R) contains (for all real numbers a and b):
• open half-lines: (−∞, a) and (a, ∞);
• unions of open half-lines: (−∞, a) ∪ (b, ∞);
• closed half-lines: (−∞, a] and [a, ∞);
• half-open and half-closed intervals: (a, b] = (−∞, b] ∩ (a, ∞);
• closed intervals: [a, b] = R \ ((−∞, a) ∪ (b, ∞));
• every set containing only one real number: {a} = ∩_{n=1}^∞ (a − 1/n, a + 1/n);
• every set containing finitely many real numbers: {a1, · · · , an} = ∪_{k=1}^n {ak}.

With the Borel σ-algebra B(R) we are now able to measure events such as [0, 1]. This could not be done by just considering the "atomic" elements of Ω. The extension to the multi-dimensional case, B(Rn), is straightforward. The real line R is often considered as sample space.

Mathematically speaking, a measure is some kind of function µ from F to R. Intuitively, we want to assign a probability measure to each event in order to know the frequency of its observation.

Definition 1.8. Measure
Let F be a σ-algebra of Ω and therefore (Ω, F) be a measurable space. The map µ : F → [0, ∞] is called a measure on (Ω, F) if µ is countably additive. The measure µ is countably additive (or σ-additive) if µ(∅) = 0 and, for every sequence of disjoint sets (Fi : i ∈ N) in F with F = ∪_{i∈N} Fi, we have
µ(F) = ∑_{i∈N} µ(Fi).
The triple (Ω, F, µ) is called a measure space.

If µ is countably additive, it is also additive, meaning that for every F, G ∈ F we have
µ(F ∪ G) = µ(F) + µ(G) if and only if F ∩ G = ∅.
Intuitively, this states that if we take two events which cannot occur simultaneously, then the probability that at least one event occurs is just the sum of the probabilities of the original events.

Note that, for example, length is a measure on the real line R. This measure is known as the Lebesgue measure.

Definition 1.9. Lebesgue measure on B(R)
The Lebesgue measure on B(R), denoted by λ, is defined as the measure on (R, B(R)) which assigns the measure of each interval to be its length.

The Lebesgue measure of a set containing only one point must be zero: λ({a}) = 0. The Lebesgue measure of a set containing countably many points (A = {a1, a2, · · · }) must also be zero:
λ(A) = ∑_{i=1}^∞ λ({ai}) = 0.
The Lebesgue measure of a set containing uncountably many points can be either zero, positive and finite, or infinite. At this point, it is worth noting that there indeed exist subsets of the straight line which do not have a determinable length, e.g., the Vitali sets. But these sets are hard to construct and therefore have no practical importance.

The only problem we are still facing is the range of the measure µ. Our goal is to standardize the probability measure. We do this by defining the probability of the certain event, µ(Ω), to have value 1. What is still needed is the definition of the probability measure itself. We will now state Kolmogorov's axioms (1931) for probability, which have generally been accepted.

Definition 1.10. Probability measure
A probability measure P on the sample space Ω with σ-algebra F is a set function P : F → [0, 1], satisfying the following conditions:
• P(Ω) = 1.
• If A ∈ F then P(A) ≥ 0.
• If A1, A2, A3, · · · ∈ F are mutually disjoint, then
P(∪_{i=1}^∞ Ai) = ∑_{i=1}^∞ P(Ai).

As a consequence of this definition, we get the following facts:
• P(∅) = 0.
• P(Ac) = 1 − P(A), where Ac is the complementary set of A: A ∪ Ac = Ω and A ∩ Ac = ∅.

The triple (Ω, F, P) is called a probability space.

Example 1.11. Finite number of coin tosses
In this experiment, a coin is tossed n < ∞ times. Each sample point is a sequence of "Head" (H) and "Tail" (T) with n components: ω = (w1, w2, · · · , wn).
• Sample space: Ω consists of finitely many sample points. For n = 3 the sample space is: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
• σ-algebra: all subsets of Ω (maximal algebra or power set), F = 2^Ω. The total number of subsets in F is 2^8 = 256.
• Probability measure: suppose the probability of H on each toss is P(H) = p with 0 ≤ p ≤ 1. Then the probability of T is P(T) = q = 1 − p. For each ω = (w1, w2, · · · , wn) in Ω, we define P({ω}) = p^{|H ∈ ω|} · q^{|T ∈ ω|}.

The probability of the set A = {HHH, HHT, HTH, HTT} ∈ F is:
P(A) = ∑_{ω∈A} P({ω}) = p³ + p²q + p²q + pq² = p(p + q)² = p.
This is another way of saying that the probability of H on the first toss is p.

1.2 Random Variables
Consider a sample space Ω which includes all possible outcomes of an experiment. A random variable X assigns a real number to every ω ∈ Ω. Glibly speaking, a random variable is just a function from Ω to the real numbers. But X has to be measurable with respect to the σ-algebra of Ω! This is made clearer in the following definition:

Definition 1.12. F-measurable function
The function f : Ω → R defined on (Ω, F, P) is called F-measurable if
f⁻¹(B) = {ω ∈ Ω : f(ω) ∈ B} ∈ F for all B ∈ B(R),
i.e., the inverse f⁻¹ maps all of the Borel sets B ⊂ R to F. — Sometimes, it is easier to work with the following equivalent condition:
y ∈ R ⇒ {ω ∈ Ω : f(ω) ≤ y} ∈ F.
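Example 1.11 lends itself to a quick numerical sanity check. The following Python sketch (illustrative only; the value p = 0.3 is an arbitrary choice) enumerates the eight sample points for n = 3 and confirms that the event "H on the first toss" has probability p:

from itertools import product

# Illustrative sketch of Example 1.11 with n = 3 tosses.
p = 0.3
q = 1 - p

omega = [''.join(w) for w in product('HT', repeat=3)]            # sample space
prob = {w: p**w.count('H') * q**w.count('T') for w in omega}     # P({omega})

A = [w for w in omega if w[0] == 'H']        # event: head on the first toss
print(sum(prob[w] for w in A))               # 0.3 = p, as computed above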

The definition of measurable functions is, at first glance, not obvious to understand. Let us consider the easy case of F = {∅, Ω}. For this σ-algebra, only the constant functions are measurable: consider the equivalent condition of the definition of F-measurable functions with f(ω) = c for all ω ∈ Ω. For y ≥ c we get, by the equivalent condition, always the whole sample space Ω; conversely, for y < c, we always get the empty set ∅, since f(ω) = c. For the case of the power set F = 2^Ω, all functions are measurable: we do not need to care about the set {ω ∈ Ω : f(ω) ≤ y} for arbitrarily chosen y, since every possible subset of Ω is in F.

Example. The concept of random variables
For the even/odd die example we consider the following random variable:
f(ω) = 1 if ω = ω1, ω2, ω3;  f(ω) = −1 if ω = ω4, ω5, ω6.
This can be thought of as a game where the player wins one Euro when the number of eyes is below 4 and loses one Euro if the number of eyes is above 3. Of course, we already know that this is not a measurable function for F = σ({ω1, ω3, ω5}). More formally, we state that the set {ω ∈ Ω : f(ω) ≤ −0.51} = {ω4, ω5, ω6} ∉ F, and therefore f is not F-measurable.

If we regard the measurable sets in F as events, an F-measurable function is consistent with the information of the experiment. This means that once we know the (random) value X(ω), we know which of the events in F have occurred. Figure 1.1 gives an overview of the concept of random variables as F-measurable functions.

[Figure 1.1. The concept of random variables: the F-measurable function X(ω) : Ω → R maps samples ω (lying in events A1, . . . , A4) to values X(ω) ∈ R, and its inverse X⁻¹ : B → F maps Borel sets such as (−∞, a] ∈ B back to events in F.]

Note that in Figure 1.1 the equivalent condition of Definition 1.12 of F-measurable functions is used.

An important example of measurable functions are indicator functions of measurable sets A ∈ F:
IA(ω) = 1 if ω ∈ A;  IA(ω) = 0 if ω ∉ A.
Their importance stems from the fact that indicator functions can be used to build more sophisticated functions (such as limits, etc.).

Before we state the definition of random variables, we introduce the concept of integration in the stochastic environment. The Lebesgue integral of a function f is a generalization of the Riemann integral. Recall that the integral is just the limit of a sum; of course, this is also the case for the Lebesgue integral. The Lebesgue integral is not restricted to the real line, but can be calculated on any sample space Ω; its construction, however, is more complicated.

Definition 1.13. Lebesgue integral
Let (Ω, F) be a measurable space, µ a measure on (Ω, F), and f : Ω → R an F-measurable function.
• If f is a simple function, i.e., f(x) = ci for x ∈ Ai, where each ci is a real number and each Ai is a set in F, we define
∫Ω f dµ = ∑_{i=1}^n ci µ(Ai).
• If f is a nonnegative, measurable function, possibly also taking the value +∞: the important point here is that we can always construct a sequence of simple functions fn with fn(x) ≤ fn+1(x) which converges to f, lim_{n→∞} fn(x) = f(x). With this sequence, the Lebesgue integral is defined by
∫Ω f dµ = lim_{n→∞} ∫Ω fn dµ.
• If f is an arbitrary, measurable but otherwise general function, we have f = f⁺ − f⁻ with f⁺(x) = max(f(x), 0) and f⁻(x) = max(−f(x), 0), and we then define
∫Ω f dµ = ∫Ω f⁺ dµ − ∫Ω f⁻ dµ.
The integral above may be finite or infinite. It is not defined if ∫Ω f⁺ dµ and ∫Ω f⁻ dµ are both infinite.
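The construction by simple functions can be imitated numerically. The sketch below (illustrative only; a fine grid on [0, 1] stands in for the Lebesgue measure, and f(x) = x² is an arbitrary choice) quantizes the range of f into levels k/n, which yields monotonically increasing simple functions fn ≤ f whose integrals approach ∫₀¹ x² dx = 1/3:

import numpy as np

# Illustrative sketch: Lebesgue integration via simple functions that
# partition the *range* of f (rather than its domain).
x = np.linspace(0.0, 1.0, 2_000_001)       # grid standing in for [0, 1]
f = x**2

for n in [4, 16, 64, 256]:
    f_n = np.floor(f * n) / n              # simple function with values k/n
    # sum_i c_i * mu(A_i) estimated as the grid average of f_n
    print(n, f_n.mean())
# The printed values increase towards 1/3 as the simple functions refine.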

As mentioned before, the most important concept of the Lebesgue integral is that it is the limit of approximating sums (as the Riemann-Stieltjes integral is). The Lebesgue integral is more general than the Riemann integral, since it is defined over arbitrary sample spaces Ω. Furthermore, the measure µ does not have to be length (as in the Riemann-Stieltjes case). However, finding a convergent sequence of simple functions is very tedious, and thus the definition of the Lebesgue integral is very inconvenient for practical computations. But fortunately we have the following theorem:

Theorem 1.14. Riemann-Lebesgue integral equivalence
Let f be a bounded function, continuous on [x1, x2] except at a countable number of points in [x1, x2]. Then both the Riemann integral and the Lebesgue integral with Lebesgue measure µ exist and are the same:
∫_{x1}^{x2} f(x) dx = ∫_{[x1,x2]} f dµ.

In the important case where Ω ≡ R, the only difference between the Lebesgue and the Riemann integral is that one is based on the partitioning of the range and the other is based on the partitioning of the domain.

The Lebesgue integral has all the linearity and comparison properties one would expect. In particular, if X : Ω → R and Y : Ω → R are functions and a and b are real constants, then
∫Ω (aX + bY) dP = a ∫Ω X dP + b ∫Ω Y dP.
If X(ω) ≤ Y(ω) for all ω ∈ Ω, then
∫Ω X dP ≤ ∫Ω Y dP.
We will take full advantage of the Lebesgue integral when we introduce the concept of expectation.

A random variable or random vector is defined as follows:

Definition 1.15. Random variable/vector
A real-valued random variable (vector) X is an F-measurable function defined on a probability space (Ω, F, P) mapping its sample space Ω into the real line R (Rn):
X : Ω → R (Rn).

For notational convenience, we use P(X ≤ x) instead of P({ω ∈ Ω | X(ω) ≤ x}). For quantitative purposes, the distribution function is introduced:

Definition 1.16. Distribution function
The distribution function of a random variable X, defined on a probability space (Ω, F, P), is defined by:
F(x) = P(X(ω) ≤ x) = P({ω | X(ω) ≤ x}).
The probability measure of the half-open sets in R is
P(a < X ≤ b) = P({ω | a < X(ω) ≤ b}) = F(b) − F(a).
The extension to the multi-dimensional case, F(x1, . . . , xn), is straightforward.

A close relative of the distribution function is the density function:

Definition 1.17. Density function
The random variable X, defined on a probability space (Ω, F, P), has density f with respect to the Lebesgue measure if f is a non-negative function and, for all A ∈ F:
P({ω | ω ∈ A}) = ∫A f(x) dx.
Again, the extension to the multi-dimensional case, f(x1, . . . , xn), is straightforward.

Example 1.18. Important density functions
• Poisson density or probability mass function (λ > 0):
f(x) = (λ^x / x!) e^{−λ}, x = 0, 1, 2, . . .
• Multivariate normal density (x, µ ∈ Rn, Σ > 0 ∈ Rn×n):
f(x) = (1 / √((2π)^n det(Σ))) e^{−(1/2)(x−µ)ᵀ Σ⁻¹ (x−µ)}.
• Multivariate t-density with ν degrees of freedom (x, µ ∈ Rn, Σ ∈ Rn×n):
f(x) = (Γ((ν+n)/2) / (Γ(ν/2) √((πν)^n det(Σ)))) (1 + (1/ν)(x−µ)ᵀ Σ⁻¹ (x−µ))^{−(ν+n)/2}.

The shorthand notation X ∼ N(µ, σ²) for normally distributed random variables with parameters µ and σ is often found in the literature. The following properties are useful when dealing with normally distributed random variables:
• If X ∼ N(µ, σ²) and Y = aX + b, then Y ∼ N(aµ + b, a²σ²).
• If X1 ∼ N(µ1, σ1²) and X2 ∼ N(µ2, σ2²) are independent, then X1 + X2 ∼ N(µ1 + µ2, σ1² + σ2²).
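A Monte Carlo sketch (illustrative only; the parameter values and sample size are arbitrary) of the second property:

import numpy as np

# Illustrative check: the sum of independent normals is normal with
# added means and added variances.
rng = np.random.default_rng(0)
x1 = rng.normal(1.0, 2.0, size=1_000_000)     # N(mu1 = 1,    sigma1^2 = 4)
x2 = rng.normal(-0.5, 1.5, size=1_000_000)    # N(mu2 = -0.5, sigma2^2 = 2.25)

s = x1 + x2
print(s.mean())   # ~ 0.5  = mu1 + mu2
print(s.var())    # ~ 6.25 = sigma1^2 + sigma2^2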

Instead of defining the probability measure on (Ω, F), we can also define the probability measure on the real line (R, B), which we are far more familiar with from elementary probability theory. Since X is F-measurable, we have X⁻¹ : B → F; this translation is done by the random variable X itself. Therefore, we can always transform (Ω, F) to (R, B). We limit ourselves to the case of (R, B, dF) because it is a sufficient description of "real world" problems (assuming that the distribution function exists).

Rather than describing a random variable X by its distribution function F(x) or its density function f(x), it is sometimes useful to work with its so-called characteristic function φ(ζ).

Definition 1.19. Characteristic function
For the random variable X with the distribution function F and the density function f, the characteristic function φ is obtained via the following functional transformation:
φ(ζ) = ∫_{−∞}^{∞} e^{jζx} dF(x) = ∫_{−∞}^{∞} e^{jζx} f(x) dx for all ζ ∈ R.
Notice that the real variable x in the x-domain is replaced by the new real variable ζ in the ζ-domain⁴. As usual, j denotes √−1. The inverse transformation is:
f(x) = (1/(2π)) ∫_{−∞}^{∞} e^{−jζx} φ(ζ) dζ.
In particular, convolution of two density functions in the x-domain corresponds to multiplication of their characteristic functions in the ζ-domain.

⁴ It seems that the poor fellow who invented the characteristic function was not aware of the Fourier transformation, or else he would have chosen −ζ rather than +ζ in the exponent of the transformation kernel. — Nevertheless, the nice properties of the Fourier transformation are retained.

Since we have defined the Lebesgue integral, we can now define the expectation and the variance of a random variable in a straightforward manner:

Definition 1.20. Expectation of a random variable
The expectation of a random variable X, defined on a probability space (Ω, F, P), is defined by:
E[X] = ∫Ω X dP = ∫R x dF(x) = ∫R x f(x) dx.
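Note that the characteristic function is itself an expectation, φ(ζ) = E[e^{jζX}]. This suggests a simple Monte Carlo sketch (illustrative only; for the standard normal distribution, the known closed form is φ(ζ) = e^{−ζ²/2}):

import numpy as np

# Illustrative sketch: estimate phi(zeta) = E[exp(j*zeta*X)] for a
# standard normal X and compare with the closed form exp(-zeta^2 / 2).
rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)

for zeta in [0.5, 1.0, 2.0]:
    phi_mc = np.exp(1j * zeta * x).mean()      # Monte Carlo estimate
    print(zeta, phi_mc.real, np.exp(-zeta**2 / 2))   # close agreement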

With this definition at hand, it does not matter what the sample space Ω is. The calculations for the two familiar cases of a finite Ω and of Ω ≡ R with continuous random variables remain the same. More generally, the expectation of an arbitrary function g of a random variable X is defined as
E[g(X)] = ∫Ω g(X) dP.

Definition 1.21. Variance of a random variable
The variance of a random variable X, defined on a probability space (Ω, F, P), is defined by:
var(X) = σ²(X) = E[(X − E[X])²] = ∫Ω (X − E[X])² dP = E[X²] − E[X]².
The square root of the variance, σ, is called the standard deviation.

The concept of (in-)dependence of random variables is an important topic in probability. Calculations and reasoning are a lot easier once we know that two random variables are independent.

Definition 1.22. Independence of random variables
The random variables X1, X2, . . . , Xn are independent if
P(∩_{i=1}^n {Xi ∈ Ai}) = ∏_{i=1}^n P({Xi ∈ Ai}) for all Ai ∈ F.
If we assume that fi(xi) is the density of the random variable Xi, then the independence condition is equivalent to
f(x1, . . . , xn) = ∏_{i=1}^n fi(xi).
As an important consequence, this yields
E[∏_{i=1}^n Xi] = ∏_{i=1}^n E[Xi]
for independent random variables.

For two random variables we define their covariance to be
cov(X1, X2) = E[(X1 − E[X1])(X2 − E[X2])],
and the correlation coefficient ρ
ρ(X1, X2) = cov(X1, X2) / (σ(X1)σ(X2)) ∈ [−1, 1].
It is important to notice that uncorrelated random variables need not be independent.
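A standard illustration of this last remark, as a small Python sketch (added here for illustration): with X ∼ N(0, 1) and Y = X², the two variables are clearly dependent, yet uncorrelated, since cov(X, X²) = E[X³] = 0:

import numpy as np

# Illustrative sketch: dependent but uncorrelated random variables.
rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)
y = x**2                                  # Y is a function of X: dependent

print(np.cov(x, y)[0, 1])                 # ~ 0: uncorrelated
print(np.corrcoef(x, y)[0, 1])            # ~ 0 as well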

1.3 Conditional Expectation
The concept of conditional expectation is very important because it plays a fundamental role in many applications of probability. Intuitively, conditional probability makes use of additional information. As we already know from fundamental probability theory, the probability of A, given B, with P(B) > 0, is
P(A|B) = P(A ∩ B) / P(B) = P(B|A)P(A) / P(B).
This formula is also known as Bayes' rule. Actually, this formula is very simple if we look at it in the following way: since we know for sure that ω ∈ B, it is natural to consider B as our new sample space Ω̃. Therefore, we only need to scale P(A ∩ B) by 1/P(B) in order to have P(Ω̃) = 1.

[Figure 1.2. The concept of conditional expectation: the events A and B in Ω, with their intersection A ∩ B.]

From the conditional probability we get the conditional expectation of the random variable Y, given B, P(B) > 0, as
E(Y|B) = E(Y · IB) / P(B),
where IB denotes the indicator function of the set B.

We have considered the set B above as an event, and we have introduced the σ-algebra F as the collection of "measurable" events. Therefore, the natural extension for the conditional expectation is the inclusion of the σ-algebra. The concept for a discrete random variable is rather simple. We consider the sets Ai = {ω | X(ω) = xi}, where the random variable X takes distinct values xi and where the Ai, together, are a disjoint partition of Ω. We then use the concept of generated σ-algebras: choose C = {A1, A2, . . .} and call σ(C) = σ(X) the σ-algebra generated by X. In this setup, we define the conditional expectation of the random variable Y, given the value of the random variable X, to be
E(Y|X = xi) = E(Y|σ(X)).

Note that the values xi do not matter for the conditional expectation. Rather, the sets Ai = {ω | X(ω) = xi} determine the conditional expectation. As already mentioned, the mathematical construct for describing additional information is the σ-algebra. We now want to extend the conditional expectation to the general case of a probability space (Ω, F, P).

Definition 1.23. Conditional expectation
Let X be a random variable defined on the probability space (Ω, F, P) with E[|X|] < ∞. Furthermore, let G be a sub-σ-algebra of F (G ⊆ F). Then there exists a random variable Y with the following properties:
1. E[|Y|] < ∞.
2. Y is G-measurable.
3. For all sets G in G we have
∫G Y dP = ∫G X dP for all G ∈ G.
The random variable Y = E[X|G] is called the conditional expectation. It can be shown that if another random variable Z satisfies the conditions above, we have Z = Y almost surely.

At first glance, this definition seems very unpleasant, but it is not that bad. The most obvious fact is that Y = E[X|G] is constant on all of the sets in G.

Example 1.24. Simple die game
Consider a game where a die is rolled: Ω = {ω1, ω2, ω3, ω4, ω5, ω6}. The player wins one Pound Sterling when the number of eyes is even and loses one Pound Sterling if the number is odd. Therefore, the random variable Y of the player's win or loss is
Y(ω) = 1 if ω = ω2, ω4, ω6;  Y(ω) = −1 if ω = ω1, ω3, ω5.
Consider another random variable X on Ω which indicates whether the number is above three or below four:
X(ω) = 0 if ω = ω1, ω2, ω3;  X(ω) = 1 if ω = ω4, ω5, ω6.
We want to compute the conditional expectation of Y if we know the value of X. The σ-algebra generated by X is
σ({ω1, ω2, ω3}) = {∅, {ω1, ω2, ω3}, {ω4, ω5, ω6}, Ω}.
This yields for the conditional expectation
E(Y|X) = −1/3 if ω ∈ {ω1, ω2, ω3} (X(ω) = 0), and E(Y|X) = 1/3 if ω ∈ {ω4, ω5, ω6} (X(ω) = 1), respectively.
Note that the actual value of X does not influence the value of the conditional expectation.
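Example 1.24 can be checked by brute force. The sketch below (illustrative only) estimates E(Y|X) by averaging Y over the two atoms {X = 0} and {X = 1} of σ(X):

import numpy as np

# Illustrative sketch of Example 1.24.
rng = np.random.default_rng(3)
die = rng.integers(1, 7, size=1_000_000)       # eyes 1..6, uniform

Y = np.where(die % 2 == 0, 1.0, -1.0)          # win on even, lose on odd
X = np.where(die >= 4, 1.0, 0.0)               # above three / below four

print(Y[X == 0].mean())    # ~ -1/3
print(Y[X == 1].mean())    # ~ +1/3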

[Figure 1.3. Conditional expectation as a piecewise constant approximation: the sample space is partitioned into five mutually disjoint subsets, Ω = G1 ∪ G2 ∪ G3 ∪ G4 ∪ G5, and Y = E[X|G] is constant on each Gi.]

Then Y is just a coarser version of X: Y is a piecewise constant approximation of X. This is shown in Figure 1.3. It is easily seen that the conditional expectation for the trivial σ-algebra {∅, Ω} equals the unconditional expectation:
Y = E[X|{∅, Ω}] = ∫Ω X dP = E[X].

Some useful properties of the conditional expectation are stated below:

Property 1.25. Conditional expectation
• E(E(X|F)) = E(X).
• If X is F-measurable, then E(X|F) = X.
• Linearity: E(αX1 + βX2|F) = αE(X1|F) + βE(X2|F).
• Positivity: If X ≥ 0 almost surely, then E(X|F) ≥ 0.
• Tower property: If G is a sub-σ-algebra of F, then E(E(X|F)|G) = E(X|G).
• Taking out what is known: If Z is G-measurable, then E(ZX|G) = Z · E(X|G).

From elementary probability theory we already know the conditional density. For two random variables X1 and X2 which have the joint density function f(x1, x2), the marginal density of X1 is defined by
f_{X1}(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2.
The conditional density of X2, given X1 = x1, is given by
f(x2 | X1 = x1) = f(x1, x2) / f_{X1}(x1).

1.4 Convergence of Random Variables
The best known convergence property in probability is the law of large numbers. Loosely speaking, the law of large numbers states that the probability of an event A can be determined arbitrarily precisely by making sufficiently many observations. This fact was used long before Kolmogorov's axiomatic definition of probability.

There are four convergence concepts which will be discussed in this section. We consider a sequence of random variables {Xn} and a random variable X, all of them defined on the probability space (Ω, F, P).

1. The sequence {Xn} converges to X with probability one (or almost surely) if
P({ω ∈ Ω | lim_{n→∞} Xn(ω) = X(ω)}) = 1.
This means that Xn converges to X in the usual sense except for null sets of Ω.
2. The sequence {Xn} converges to X in probability if
lim_{n→∞} P({ω ∈ Ω | |Xn(ω) − X(ω)| > ε}) = 0 for all ε > 0.
3. The sequence {Xn} converges to X in Lp if
lim_{n→∞} E(|Xn(ω) − X(ω)|^p) = 0.
The most important case is convergence in the mean-square sense (p = 2).
4. The sequence {Xn} converges to X in distribution if
lim_{n→∞} Fn(x) = F(x) for all x ∈ R,
where Fn denotes the distribution function of Xn and F denotes the distribution function of X. This concept only describes the statistical properties of the limit of the sequence. The weakest concept of convergence considered here is convergence in distribution.

In general, the different convergence concepts are not independent of each other. Figure 1.4 summarizes the dependence of the different types of convergence. We cannot compare almost sure convergence and convergence in Lp; nevertheless, both types of convergence imply convergence in probability. The upper right corner of Figure 1.4 states that if a sequence converges in Lp, then it also converges in Lq for all q < p. From the results in this section, we therefore only have to check convergence in L2 in order to also have convergence in L1. Note that almost sure convergence is usually hard to prove, whereas convergence in Lp is usually a lot easier to prove.
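The following Python sketch (illustrative only; path counts and the tolerance ε are arbitrary) shows convergence in probability and in L2 for the running average of fair ±1 coin flips, whose limit is 0:

import numpy as np

# Illustrative sketch: law of large numbers for the running average X_n.
rng = np.random.default_rng(4)
flips = rng.choice([-1.0, 1.0], size=(10_000, 1_000))    # 10000 sample paths
running = flips.cumsum(axis=1) / np.arange(1, 1_001)     # X_n per path

eps = 0.05
for n in [10, 100, 1_000]:
    xn = running[:, n - 1]
    print(n, (np.abs(xn) > eps).mean(), (xn**2).mean())
# Both the exceedance frequency P(|X_n| > eps) and the second moment
# E[X_n^2] = 1/n shrink with n.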

[Figure 1.4. Convergence of random variables: convergence in Lp implies convergence in Lq for q < p; both almost sure convergence and Lp convergence imply convergence in probability, which in turn implies convergence in distribution.]

Notes and Comments
Besides the rigorous treatments in [36], there are very readable textbooks on the subject of this chapter. Among them are [26], [5], [3], [7], [24], and [11].

1.5 Exercises
1. A fair six-faced die is thrown repetitively. What is the probability that in the first ten throws you always get six eyes? What is the probability that you will always get six eyes in the throws eleven through twenty as well?
2. We have two independent real random variables x1 and x2 with the density functions f1 and f2, respectively. Show that the density function f of the sum x1 + x2 is obtained by the convolution of f1 and f2: f = f1 ∗ f2.
3. Verify that the convolution of two density functions, f = f1 ∗ f2, corresponds to the multiplication of their characteristic functions: φ(f) = φ(f1) · φ(f2).
4. You are attending an entertainment show. On the stage, there are three doors. Behind one of them, there is a goat. If you can guess correctly behind which one it is, you can keep it. — You make an initial guess about the door but you do not tell anybody. — Now you must tell the showmaster which door he should open. The showmaster does not know your guess, and he does not know where the goat is. He opens one door at random. It's not the door you have chosen, and the goat is not there. In order to maximize the winning probability, do you stick with your initial guess or do you switch to the other door? — Hint: Start with a stochastic simulation . . .
5. Who invented the characteristic function (Definition 1.19)?

6. The central limit theorem of probability theory says that the sum (and the average) of independent and identically distributed real random variables converges to a random variable with a Gaussian distribution. What is the implication of this when we are working with characteristic functions? Choose an example and verify!

2 Random Processes

She: What is white noise?
He: It is the best model of a totally unpredictable process.
She: Are you implying, I am white noise?
He: No, it does not exist.
Dialogue of an unknown couple

2.1 Introduction
In the first chapter, we have introduced the mathematical framework to describe random observations. This chapter extends these concepts with an additional time dependence component. In order to model randomness in signals (noise signals), we introduce the notion of random processes. Once again we want to stress the fact that the tools are deterministic mathematical constructs; randomness only enters when observations are conducted.

We first state the classic definition of random processes.

Definition 2.1. Random process
A random (or stochastic) process {Xt, t ∈ T} is a collection of random variables on the same probability space (Ω, F, P). The index set T is usually representing time and can be either an interval [t1, t2] or a discrete set.

Therefore, the random process X can be written as a function:
X : R × Ω → R, (t, ω) ↦ X(t, ω).
In the stochastic interpretation, a sample ω is chosen from the sample space Ω "at random". This yields the "stochastic signal" or "noise signal" r(·, ω) defined on the index set T. This signal is also denoted as sample path, realization, or trajectory.

Remark 2.2. Notation
We introduced random or stochastic processes as functions with two arguments: t and ω. We will, however, omit the argument ω for brevity, as it is done in most textbooks: X(t, ω) = X(t).

By the definition of random processes, we know that the amount of information is increasing with time. We assume that information is not lost with increasing time, and therefore the corresponding σ-algebras will increase over time as more and more information becomes available. This concept is called filtration.

Definition 2.3. Filtration/adapted process
A collection {Ft}t≥0 of sub-σ-algebras is called a filtration if, for every s ≤ t, we have Fs ⊆ Ft. The random variables {Xt : 0 ≤ t ≤ ∞} are called adapted to the filtration {Ft}t≥0 if, for every t, Xt is measurable with respect to Ft.

Thus, a stochastic process is said to be defined on a filtered probability space (Ω, F, {Ft}t≥0, P). F0 represents initial information, whereas F∞ represents full information (all we will ever know). The concept of filtration is easily understood with a simple example.

Example 2.4. Suppose we have a sample space of four elements: Ω = {ω1, ω2, ω3, ω4}. At time zero, we do not have any information about which ω has been chosen. At time T/2 we know whether we have {ω1, ω2} or {ω3, ω4}. At time T, we have full information.

[Figure 2.1. Example of a filtration: from the root A, the tree splits at time T/2 into B = {ω1, ω2} and C = {ω3, ω4}, and at time T into the leaves D = {ω1}, E = {ω2}, F = {ω3}, G = {ω4}.]

Therefore, we have the following σ-algebras:
Ft = {∅, Ω} for t ∈ [0, T/2);
Ft = {∅, {ω1, ω2}, {ω3, ω4}, Ω} for t ∈ [T/2, T);
Ft = Fmax = 2^Ω for t = T.

Before going into the topics of random processes, stationary random processes, Gaussian random processes, etc., let us first recapitulate two (almost) trivial properties of deterministic functions: Let x(·) be a real, continuously differentiable function defined on the interval [0, T]. Its continuous differentiability implies both a bounded total variation and a vanishing "sum of squared increments":

1. Total variation:
∫_0^T |dx(t)/dt| dt < ∞

σ 2 (t.22 2 Random Processes 2. From this definition. This is a pretty surprising result because usually a distribution is not completely determined by its first two moments.3.2. the transition probability is uniquely determined by µ(t. µ(t. under certain assumptions.6. for all s ∈ [0. t]. 2. X(t)) is called drift and σ 2 (t. X(t)). .4 Diffusions A diffusion is a Markov process with continuous trajectories such that for each time t and state X(t) the following limits exist 1 E[X(t + ∆t) − X(t)|X(t)]. Actually. P ) if the following conditions hold: • • X(t) is {Ft }t≥0 -adapted. it follows that the best prediction of a martingale process is its current value. X(t)) := lim E[{X(t + ∆t) − X(t)}2 |X(t)]. Brownian motion A stochastic process W(t) is called Brownian motion if 1.3 Martingales A stochastic process X(t) is a martingale on the filtered probability space (Ω. X(t)). We therefore state that martingale processes model fair games. the American mathematician Norbert Wiener stipulated the following assumptions for a stationary random process W (·. X(t)). Independence: W (t+∆t) − W (t) is independent of {W (τ )} for all τ ≤ t. X(t)) is called the diffusion coefficient. X(t)) and σ 2 (t. 2.3 Brownian Motion and White Noise 2.1 Brownian Motion Motivated by the apparently random walk of a tiny particle in a fluid (observed by the Scottish botanist Robert Brown in 1827). If we consider a coin tossing game where the player gains one dollar on head and loses one dollar on tail the wealth of the player follows a martingale. Stationarity: The distribution of W (t + ∆t) − W (t) does not depend on t. 2. ·) with independent increments in 1923: Definition 2. The martingale theory is a fundamental tool in finance. X(t)) := lim ∆t↓0 For these limits. ∆t↓0 ∆t µ(t. F.2.s. ∆t 1 σ 2 (t. Since diffusions are Markov processes we expect a relationship between the transition probability and µ(t. E[|X(t)|] < ∞ for all t ≥ 0. and the theory behind it is vast. {Ft }t≥0 . E[X(t)|Fs ] = X(s) a.

Continuity: lim Please note that the third assumption is expressed with probabilities: discontinuities in sample functions can only occur with probability zero. This is also the case for almost sure convergence. Normally distributed increments of Brownian motion If W (t) is a Brownian motion. 1). Some of them are listed below: • • • • • • Autocovariance E{(W (t) − µt)(W (τ ) − µτ )} = σ 2 min(t. . τ ) { } function: W (t) σ2 Var = t t W (t) − µt lim = 0 with probability 1 t→∞ t The total variation of the Brownian motion over a finite interval [0.2.3 Brownian Motion and White Noise 23 P (|W (t +∆t) − W (t)| ≥ δ) = 0 for all δ > 0 .7. The random process ∞ ∑ Y0 2 Yk X(t) = √ t + sin kt for t ∈ [0. π]. be mutually independent random variables with identical normal distributions√N (0. where µ and σ are constant real numbers. there is a version of the Brownian motion with all sample functions continuous. 2πσ 2 t An irritating property of Brownian motion is that its sample paths are not differentiable. Hence.) This definition induces the distribution of the process Wt : Theorem 2. then W (t)−W (0) is a normal random variable with mean µt and variance σ 2 t. (This technicality is not of any practical importance. . Y1 . ∆t↓0 ∆t 3. As a result of this theorem. we have the following density function of a Brownian motion: (x−µt)2 1 fW (t) (x) = √ e− 2σ2 t . T ] is infinite! The “sum of squares” of a drift-free Brownian motion is deterministic: )2 N ( ( T) ( ∑ T) lim W k − W (k−1) = σ2 T N →∞ N N k=1 Infinite oscillations: Let Y0 . 2 ∆t ∆t This diverges for ∆t → 0 and therefore it is not differentiable in L2 . . π] π k π k=1 is a normalized Brownian motion on the interval [0. This is easily verified in the mean-square sense: E [( W (t +∆t) − W (t) )2 ] ∆t = E[(W (t +∆t) − W (t))2 ] σ2 = . but this is much more difficult to prove. The Brownian motion has many more bizarre and intriguing properties. .

3. the white noise v(·) becomes truly stationary on the infinite time interval (−∞. i.3. t] exists and equals t. every sample of a drift-free Brownian motion has infinitely many zero-crossings. for t = 0. We have already stated that the “sum of squares” of a drift-free Brownian motion is deterministic.e. we can also write (dW (t))2 = dt. Quadratic variation of standard Brownian motion The quadratic variation of standard Brownian motion over [0. E[W 2 (t)] = t (σ 2 = 1).1. Formally. for t > 0. In this way. T ]. E[W (t)] = 0 (µ = 0). This stationary white noise is characterized uniquely as follows: • Expected value: • Autocovariance function: E{v(t)} ≡ µ Σ(τ ) = E{[v(t+τ ) − µ][v(t) − µ]} ≡ σ 2 δ(τ ) . it is customary to define a random process v(·) called stationary white noise as the formal derivative of a general Brownian motion W (·) with the drift parameter µ and the variance parameter σ2 : dW (t) v(t) = .. Standard Brownian motion A Brownian motion is standard if W (0) = 0 a. the “initial” time is shifted from t = 0 to t = −∞.9. then the following process W{∗ (·) is a Brownian motion as well: tW ( 1t ). This can be formulated more generally as follows: Theorem 2. Without loss of generality. Zero crossings: In a finite interval [0. T ]. 2. dt Usually. Nevertheless. ∞)..2 White Noise As we have seen in Section 2.s.8. Note that Brownian motion is usually assumed to be standard if not explicitly stated otherwise. ∞).24 • • 2 Random Processes If W (·) is a Brownian motion on the interval [0. W ∗ (t) = 0. we may assume that v(t) is Gaussian for all t. in engineering circles. no sample path has isolated zero-crossings! Definition 2. a Brownian motion is continuous but nowhere differentiable. The set of zero-crossings is dense in [0.

Using white noise as the model of a completely unpredictable random process. and the initial time t = 0 satisfies the following stochastic differential equation. 0 Mathematicians prefer to write this equation in the following way: ∫ t ∫ t ∫ t dW (α) W (t) = v(α) dα = dα = dW (α) . we can say: the continuous-time measurement y of the third state variable x3 is corrupted by an additive white noise v: y(t) = x3 (t) + v(t) . dα 0 0 0 Consequently. smoothing by averaging is not optimal. Of course.3 Brownian Motion and White Noise • 25 Spectral density function: ∫ S(ω) = F{Σ(τ )} = ∞ −∞ e−jωτ Σ(τ ) dτ ≡ σ 2 . . (See Chapter 4. the characterizations by the autocovariance function and the spectral density function are redundant. we would have to say: The integral of the continuous-time measurement y of the third state variable x3 is corrupted by an additive Brownian motion W : ∫ t ∫ t y(t) dt = x3 (t) dt + W (t) .2. a Kalman filter (or extended Kalman filter) should be used. the variance parameter σ 2 .) The Brownian motion W on the time interval [0. y(t) dt = x3 (t) dt + ∆T t−∆T ∆T t−∆T ∆T It should be obvious where this leads to mathematically as ∆T ↓ 0. Rather. where W is a standard Brownian motion: dX(t) = µdt + σdW (t) X(0) = 0 . 0 0 Yet another way of expressing ourselves in full mathematical correctness could be: The short-time averaged (or smoothed) measurement y of the third state variable x3 is corrupted by an additive increment of a Brownian motion W: ∫ t ∫ t 1 1 W (t) − W (t−∆T ) y(t) = . Expressing the same fact in full mathematical correctness using a Brownian motion. a Brownian motion X with the drift parameter µ. ∞) can be retrieved from the stationary white noise v by integration: ∫ t W (t) = v(α) dα . Of course.

1987 constitutes such a “rare” event. . ∞) defined on (Ω. . 1. . . Definition 2. In this section.3. Ornstein-Uhlenbeck process or exponentially correlated noise: dY (t) = − aY (t)dt + bσdW (t) with a > 0 . .10. . Often. Brownian motion is not a sufficient model and thus there is a need to describe discontinuous stochastic processes. we introduce a stochastic process in continuous time with discontinuous realizations. Q(t3 ) − Q(t2 ). < tn the increments Q(t2 ) − Q(t1 ). 2. t)dt + σ(Y (t). the drop of the Dow Jones Index of 22. For example. . .26 2 Random Processes 2. Q(0)=0 with probability one. these discontinuities in financial time series are called “extreme” or “rare” events.3 Generalizations Defining the Brownian motion via a stochastic differential equation involving the drift parameter µ and the volatility parameter σ leads to the following rather straightforward generalizations: • Instationary Brownian motion: dY (t) = µ(t)dt + σ(t)dW (t) . . the Wiener process or Brownian motion has been introduced. Brownian motion is a stochastic process in continuous-time with continuous realizations. F. • This is a special case of a locally Brownian motion. To account for such a large drop in the time series. This model is very popular and useful in the area of finance. • Locally Brownian motion: dY (t) = µ(Y (t). • Geometric Brownian motion: dY (t) = µY (t)dt + σY (t)dW (t) . .6% on October 19. For each 0 < t1 < t2 < .4 Poisson Processes In the previous section. Poisson process A Poisson process with parameter λ is a collection of random variables Q(t). t ∈ [0. A suitable stochastic model for this kind of behavior is a Poisson process. t)dW (t) .} and satisfying the following properties: 1. {Ft }t≥0 . Q(tn ) − Q(tn−1 ) are independent. Note that both its drift parameter µY (t) and its volatility parameter σY (t) are proportional to the value Y (t) of the random process. 2. P ) having the discrete state space N = {0. 2.

We formally define the differential dQ(t) to be the limit dQ(t) = lim (Q(t + ∆t) − Q(t)) . the probability of at least one event happening in a time period of duration ∆t is given by P (Q(t + ∆t) − Q(t) ̸= 0) = λ∆t + o(∆t2 ) with λ > 0 and ∆t → 0. Note that the probability of an event happening during ∆t is proportional to the time period of duration. For 0 ≤ s < t < ∞ the increment Q(t) − Q(s) has a Poisson distribution with parameter λ. Second. [16]. because the state space contains only discrete numbers. Let Q(t + ∆t) − Q(t) be a Poisson process as defined above with the parameter λ. Essentially. dQ(t) = 0 with probability 1 − λdt 2. |dQ|(t) = 1 with probability λdt . . the distribution of the increments is given by P ([Q(t) − Q(s)] = k) = λk (t − s)k −λ(t−s) e k! for k ∈ N . therefore making this probability extremely small.2. Notes and Comments Textbooks including the subject of stochastic processes are [7]. ∆t→dt From the definition of the Poisson process. The realizations are always positive by definition of N .4 Poisson Processes 27 3.e. [26]. and [11]. it says that the simultaneous occurrence of more than one event during a small ∆t is almost zero. [3].. First. i. [30]. [24]. the probability of two or more events happening during ∆t is of order o(∆t2 ). The result is obtained by expanding the Taylor series of the Poisson distribution to the first order around ∆t ≃ 0. The Poisson process is a continuous-time process with discrete realizations. it follows that dQ(t) has the following properties: 1.


3 Stochastic Differential Equations

Do not worry about your problems with mathematics. I assure you mine are far greater.
Albert Einstein

Why should I refuse a good dinner simply because I do not understand the digestive processes involved? (Reply when criticised for his daring use of operators before they could be justified formally.)
Oliver Heaviside

3.1 Introduction

In order to illustrate stochastic differential equations (SDEs), we first have a look at an ordinary differential equation (ODE). An ODE

dx(t)/dt = f(t, x)    (3.1)

may be viewed as a degenerate form of an SDE, in the absence of randomness. We can write (3.1) in the symbolic differential form

dx(t) = f(t, x)dt

or, more accurately, as an integral equation

x(t) = x0 + ∫_0^t f(s, x(s)) ds,    (3.2)

where x(t) = x(t, x0, t0) is the solution satisfying the given initial condition x(t0) = x0. The idea of an ODE can be augmented to an SDE by adding noise to the system under consideration. For example, consider the following system:

dx(t)/dt = a(t)x(t), x(0) = x0,    (3.3)

where a(t) is not a deterministic parameter. Rather, it is subjected to some random effects, so that we have a(t) = f(t) + h(t)ξ(t). The uncertainty is

represented by the stochastic process ξ(t), where ξ(t) is a white noise process. The differential equation (3.3) can now be rewritten as¹

dX(t)/dt = f(t)X(t) + h(t)X(t)ξ(t).    (3.4)

By writing (3.4) in the differential form and using the substitution dW(t) = ξ(t)dt (where dW(t) is the differential form of a standard Brownian motion W(t)), we get the following SDE:

dX(t) = f(t)X(t)dt + h(t)X(t)dW(t).

In general, an SDE is given by

dX(t, ω) = f(t, X(t, ω))dt + g(t, X(t, ω))dW(t, ω),    (3.5)

where ω indicates that X = X(t, ω) is a random process defined on the appropriate probability space and has the deterministic initial condition X(0, ω) = X0. In analogy to (3.2) we can rewrite (3.5) as

X(t, ω) = X0 + ∫_0^t f(s, X(s, ω)) ds + ∫_0^t g(s, X(s, ω)) dW(s, ω).    (3.6)

At the moment, we assume that f(t, X(t, ω)) ∈ R, g(t, X(t, ω)) ∈ R, and W(t, ω) ∈ R. The first integral in (3.6) is quite familiar, since it is the ordinary Riemann integral. But there is still the problem of understanding what the, as yet undefined, integral ∫_0^t g(s, X(s, ω)) dW(s, ω) exactly means. This is presented in the next section.

¹ With capital letters we denote random variables or stochastic processes, whereas lower case letters denote deterministic variables or processes.

3.2 Stochastic Integration or Itô Integrals

3.2.1 Definition

For the stochastic integral ∫_0^T g(t, ω) dW(t, ω), an approach similar to Riemann integrals is taken. In order to define stochastic integrals, we assume that g(t, ω) changes only at the times ti (i = 1, 2, . . . , N−1), with 0 = t0 < t1 < t2 < . . . < tN−1 < tN = T. We define the integral

S = ∫_0^T g(t, ω) dW(t, ω)    (3.7)

as the limit of the "Riemann sums"

S_N(ω) = Σ_{i=1}^{N} g(t_{i−1}, ω) (W(t_i, ω) − W(t_{i−1}, ω))    (3.8)

for N → ∞. Unfortunately, in contrast to Riemann integrals, the limit lim_{N→∞} S_N(ω) cannot be treated as in the deterministic case. For Riemann integrals, the limit converges to the same number regardless of the point in the interval [t_{i−1}, t_i] at which the value of g(t, ω) is approximated. This is not the case for stochastic integrals: here, it matters at which point of the interval g(t, ω) is approximated, and the limit depends on this choice. For stochastic calculus in the Itô sense, g(t, ω) is approximated at t_{i−1}. Because we choose t_{i−1} for approximating g(t, ω), we call this approximation non-anticipative. This choice implies the stochastic properties of the Itô integral which are discussed later in this chapter.

The limit in the above definition converges to the stochastic integral in the mean-square sense. Thus, the convergence concept of interest is the mean-square or L² convergence.

Definition 3.1. A random variable S is called the Itô integral of a stochastic process g(t, ω) with respect to the Brownian motion W(t, ω) on the interval [0, T] if

lim_{N→∞} E[(S − Σ_{i=1}^{N} g(t_{i−1}, ω)(W(t_i, ω) − W(t_{i−1}, ω)))²] = 0    (3.9)

for each sequence of partitions (t0, t1, . . . , tN) of the interval [0, T] such that max_i(t_i − t_{i−1}) → 0.

There are some natural conditions which guarantee that a random variable as described above does indeed exist, so that the integral is well defined. These conditions are stated later in the text. For a more detailed treatment of this definition refer to [3, Chapter 4].

In the case of stochastic calculus in the Itô sense, the stochastic integral is a random variable, the samples of which depend on the individual realizations of the paths W(., ω). The realizations of the random processes are observable in a path-wise sense adapted to the Brownian motion: at time t = 0, the stochastic integral is deterministic with value 0; at time t = T, the complete realizations of W have been observed and have contributed to the random value of the integral in the corresponding way.

3.2.2 Examples

To give a first illustration of the definition, let us assume that the integrand is a non-random variable, g(t, ω) = g(t). In this case, (3.7) is defined and can be computed. The simplest possible example

is g(t) = c for all t. Taking the definition, we obtain

∫_0^T c dW(t, ω) = c lim_{N→∞} Σ_{i=1}^{N} (W(t_i, ω) − W(t_{i−1}, ω))
= c lim_{N→∞} [(W(t1, ω) − W(t0, ω)) + (W(t2, ω) − W(t1, ω)) + . . . + (W(tN, ω) − W(tN−1, ω))]
= c (W(T, ω) − W(0, ω)).

To simplify things a bit, we now use the definition that the Brownian motion starts from zero, i.e., W(0, ω) = 0. The last result becomes

∫_0^T c dW(t, ω) = c W(T, ω).

This makes sense, because it agrees with our intuition from standard calculus.

In the next example, we allow that g(t, ω) itself is a random function, so that the integrand itself is a random variable. This is still a stochastic process, but a simple one: the random function can be approximated by a random step function g(t, ω) = g(t_{i−1}, ω), t ∈ [t_{i−1}, t_i]. An illustrative example is g(t, ω) = W(t, ω). The following algebraic identity is needed for the next calculation:

y(x − y) = yx − y² + ½x² − ½x² = ½x² − ½y² − ½(x − y)².

By applying the definition for Itô integrals we obtain

∫_0^T W(t, ω) dW(t, ω) = lim_{N→∞} Σ_{i=1}^{N} W(t_{i−1}, ω)(W(t_i, ω) − W(t_{i−1}, ω))
= lim_{N→∞} Σ_{i=1}^{N} [½W²(t_i, ω) − ½W²(t_{i−1}, ω) − ½(W(t_i, ω) − W(t_{i−1}, ω))²]
= ½W²(T, ω) − ½ lim_{N→∞} Σ_{i=1}^{N} (W(t_i, ω) − W(t_{i−1}, ω))².

Due to the sum-of-squares property of the normalized Brownian motion W, the second term is deterministic: the sum of the squared increments converges (in the mean-square sense) to T. We finally get

∫_0^T W(t, ω) dW(t, ω) = ½W²(T, ω) − ½T.    (3.10)
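The result (3.10) can be checked numerically. The sketch below (Python/NumPy; an illustrative addition with arbitrary parameters, not part of the original text) forms the non-anticipative Riemann sums of (3.8) with the integrand evaluated at the left endpoints and compares them path-wise with ½W²(T) − ½T; the gap shrinks like √Δt:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, paths = 1.0, 1_000, 5_000
dt = T / n

# Brownian paths: W[:, k] ~ W(t_k), with W(0) = 0.
dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
W = np.hstack([np.zeros((paths, 1)), np.cumsum(dW, axis=1)])

# Non-anticipative Riemann sums (3.8): integrand W taken at the
# left endpoint t_{i-1} of each subinterval.
ito_sum = np.sum(W[:, :-1] * dW, axis=1)

# Closed form (3.10): 1/2 W^2(T) - 1/2 T.
closed = 0.5 * W[:, -1] ** 2 - 0.5 * T
print("mean |Ito sum - closed form| :", np.mean(np.abs(ito_sum - closed)))
print("sample mean of the integral  :", ito_sum.mean())   # ~ 0
```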

This is in contrast to our intuition from standard calculus: in the case of a deterministic integral, ∫_0^T x(t)dx(t) = ½x²(T), whereas the Itô integral differs by the term −½T. This example shows that the rules of differentiation (in particular the chain rule) and integration need to be re-formulated in the stochastic calculus.

3.2.3 Properties of Itô Integrals

We now state some important properties of Itô integrals. First, we compute the mean and the variance of the stochastic integral. The expectation of stochastic integrals is zero:

E[∫_0^T g(t, ω) dW(t, ω)] = 0.

Proof:

E[∫_0^T g(t, ω) dW(t, ω)] = E[lim_{N→∞} Σ_{i=1}^{N} g(t_{i−1}, ω)(W(t_i, ω) − W(t_{i−1}, ω))]
= lim_{N→∞} Σ_{i=1}^{N} E[g(t_{i−1}, ω)] E[W(t_i, ω) − W(t_{i−1}, ω)] = 0,

since the non-anticipative integrand g(t_{i−1}, ω) is independent of the increment W(t_i, ω) − W(t_{i−1}, ω), which has zero mean. This is what we would expect anyway.

Next, the variance can be computed in a similar way and we obtain

Var[∫_0^T g(t, ω) dW(t, ω)] = ∫_0^T E[g²(t, ω)] dt.

Proof:

Var[∫_0^T g(t, ω) dW(t, ω)] = E[(∫_0^T g(t, ω) dW(t, ω))²]
= E[(lim_{N→∞} Σ_{i=1}^{N} g(t_{i−1}, ω)(W(t_i, ω) − W(t_{i−1}, ω)))²]
= lim_{N→∞} Σ_{i=1}^{N} Σ_{j=1}^{N} E[g(t_{i−1}, ω)g(t_{j−1}, ω)(W(t_i, ω) − W(t_{i−1}, ω))(W(t_j, ω) − W(t_{j−1}, ω))].

All cross terms with i ≠ j vanish, because the later Brownian increment is independent of the other factors and has zero mean. There remains

= lim_{N→∞} Σ_{i=1}^{N} E[g²(t_{i−1}, ω)] E[(W(t_i, ω) − W(t_{i−1}, ω))²]
= lim_{N→∞} Σ_{i=1}^{N} E[g²(t_{i−1}, ω)] (t_i − t_{i−1})

= ∫_0^T E[g²(t, ω)] dt.

The computation of the variance yields another important property:

E[(∫_0^T g(t, ω) dW(t, ω))²] = ∫_0^T E[g²(t, ω)] dt.

Furthermore, the Itô integral is linear, i.e.,

∫_0^T [a1 g1(t, ω) + a2 g2(t, ω)] dW(t, ω) = a1 ∫_0^T g1(t, ω) dW(t, ω) + a2 ∫_0^T g2(t, ω) dW(t, ω)

for any real numbers a1, a2 and any functions g1(t, ω), g2(t, ω) which lead to well-defined results.

The condition that stochastic integrals are well-defined and solvable is briefly discussed here. More generally, one may ask which processes (or functions) g(t, ω) lead to well-defined results. The natural requirement is to impose on g(t, ω) that it does not depend on future values of the Brownian motion W(t, ω). Mathematically speaking, g(t, ω) should be adapted to the Brownian motion. This means that at any time t, the stochastic variable g(t, ω) depends only on the stochastic variables {W(t − h, ω) | h ≥ 0} and possibly on other variables which are independent of the Brownian motion. For this reason, the integrand of the Itô integral is approximated by g(t_{i−1}, ω); this is the non-anticipating approximation. When the stochastic process is interpreted as a stock price process, this is a natural requirement: knowing tomorrow's asset price would make life on the stock market totally different from what it is today. Another requirement is that, roughly speaking, g(t, ω) should only attain large values with low probability, i.e.,

∫_0^T E[g²(t, ω)] dt < ∞.

Otherwise the variance would be without bounds.

In the further discussion of this text, we shall omit the ω argument for convenience.

3.2.4 Stochastic Integrals for Poisson Processes

Let us consider the case that the source of uncertainty of an SDE is a Poisson process rather than a Brownian motion. In analogy to (3.5), we define an SDE driven by a Poisson process as follows:

dX(t) = f(t, X(t))dt + h(t, X(t))A(t)dQ(t),    (3.11)

where dQ(t) is the differential form of the Poisson process, X(0) = x0 is the initial condition, and A(t) is an identically and independently distributed (i.i.d.) random variable which allows the "jump amplitude" to be an independent stochastic variable. As before, we write (3.11) as the solution for X(t) as follows:

X(t) = X0 + ∫_0^t f(s, X(s)) ds + Σ_{i=1}^{N(t)} h(s_i, X(s_i)) A(s_i),    (3.12)

where s_i ∈ [0, t] denote the points in time when a "jump" has occurred, A(s_i) denotes the i-th jump amplitude drawn from the distribution p(A), and N(t) denotes the number of jumps that occurred between [0, t]. Since the Poisson process has discrete realizations, the stochastic integral degenerates to a finite sum: the term Σ_{i=1}^{N(t)} h(s_i, X(s_i)) A(s_i) adds up all Poisson events evaluated at the respective jump times s_i and with the jump amplitudes A(s_i).

The most general SDE is one where a Brownian motion and a Poisson process are driving the uncertainty:

dX(t) = f(t, X(t))dt + g(t, X(t))dW(t) + h(t, X(t), A(t))dQ(t),

where W(t) is a standard Brownian motion and Q(t) is a standard Poisson process. Similarly to g(t, X(t)), h(t, X(t), A(t)) allows us to model complicated non-linear stochastic environments.

3.3 Stochastic Differentials and Itô Calculus

3.3.1 The Scalar Case

As mentioned before, the rules of classical calculus are not valid for stochastic integrals and differential equations. In this section, we shall derive, heuristically, the famous Itô formula. It is the equivalent to the chain rule in classical calculus. The problem can be stated as follows: Given a stochastic differential equation

dX(t) = f(t, X(t))dt + g(t, X(t))dW(t)    (3.13)

and another process Y(t) which is a function of X(t),

Y(t) = φ(t, X(t)),

where the function φ(t, X(t)) is continuously differentiable in t and twice continuously differentiable in X, find the stochastic differential equation for the process Y(t):

dY(t) = f̃(t, X(t))dt + g̃(t, X(t))dW(t).

In the case when we assume that g(t, X(t)) = 0, we know the result: the chain rule for standard calculus. The result is given by

dy(t) = (φ_t(t, x) + φ_x(t, x)f(t, x))dt.

To arrive at a similar result in the case when g(t, X(t)) ≠ 0, we reason as follows. The Taylor expansion of φ(t, X(t)) yields

dY(t) = φ_t(t, X)dt + φ_x(t, X)dX(t) + ½φ_tt(t, X)dt² + ½φ_xx(t, X)(dX(t))² + φ_xt(t, X)dX(t)dt + higher order terms.

Using the expression (3.13) for dX(t), we arrive at

dY(t) = φ_t(t, X)dt + φ_x(t, X)[f(t, X(t))dt + g(t, X(t))dW(t)] + ½φ_tt(t, X)dt² + ½φ_xx(t, X)[f(t, X(t))dt + g(t, X(t))dW(t)]² + φ_xt(t, X)[f(t, X(t))dt + g(t, X(t))dW(t)]dt + higher order terms.

The higher-order differentials tend to be small compared to the first-order terms dt and dW, and thus dt² → 0 and dt·dW(t) → 0. But, due to the sum-of-squares property of the Brownian motion W, we have dW²(t) = dt. Thus, omitting higher-order terms, we arrive at

dY(t) = φ_t(t, X)dt + φ_x(t, X)[f(t, X(t))dt + g(t, X(t))dW(t)] + ½φ_xx(t, X)g²(t, X(t))dt.

Reordering the terms yields the scalar version of Itô's Lemma:

dY(t) = [φ_t(t, X) + φ_x(t, X)f(t, X(t)) + ½φ_xx(t, X)g²(t, X(t))]dt + φ_x(t, X)g(t, X(t))dW(t).

Theorem 3.2. Itô's Lemma
Let φ(t, X(t)) be a suitably differentiable function. For the random process defined by the stochastic differential equation

dX(t) = f(t, X(t))dt + g(t, X(t))dW(t),

the transformed stochastic process Y(t, X(t)) = φ(t, X(t)) satisfies the stochastic differential equation

dY(t) = f̃(t, X(t))dt + g̃(t, X(t))dW(t),

where

f̃(t, X(t)) = φ_t(t, X(t)) + φ_x(t, X(t))f(t, X(t)) + ½φ_xx(t, X(t))g²(t, X(t))
g̃(t, X(t)) = φ_x(t, X(t))g(t, X(t)).
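A quick numerical plausibility check of Itô's Lemma (an illustrative sketch, not from the original text): integrate the SDE for X with a simple first-order discretization and, using the same noise increments, integrate the SDE that the lemma predicts for Y = φ(X) = X². The drift and diffusion chosen below (f = −x, g = 1) are arbitrary; the two results should agree up to discretization error:

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 1.0, 100_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)

# SDE for X: dX = -X dt + dW (hypothetical choice of f and g).
# Transformation phi(t,x) = x^2, so Ito's Lemma predicts
#   dY = (2X*(-X) + 1) dt + 2X dW.
x, y = 1.0, 1.0
for k in range(n):
    x_old = x
    x += -x_old * dt + dW[k]                              # step for X
    y += (-2.0 * x_old**2 + 1.0) * dt + 2.0 * x_old * dW[k]  # step for Y

print("X(T)^2 =", x * x)
print("Y(T)   =", y)   # agrees with X(T)^2 up to discretization error
```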

The term ½φ_xx(t, X(t))g²(t, X(t)) which appears in the expression for f̃(t, X(t)) is sometimes called the Itô correction term, since it is absent in the deterministic case. Note that this term vanishes if φ is linear in X.

Sometimes, it is convenient to write the SDE of the process Y(t) as

dY(t) = φ_t(t, X)dt + φ_x(t, X)dX + ½φ_xx(t, X)dX²,    (3.14)

where dX² is computed according to the rules dt² = 0, dt·dW = 0, and dW² = dt.

Example 3.3. Consider a scalar process Y(t) defined by Y(t) = φ(t, X(t)), where φ(t, X(t)) = X²(t). From the SDE dX(t) = dW(t), we conclude that X(t) = W(t). The relevant partial derivatives are ∂φ(t,X)/∂t = 0, ∂φ(t,X)/∂X = 2X, and ∂²φ(t,X)/∂X² = 2. Itô's formula yields

d(W²(t)) = dt + 2W(t)dW(t).

Rewriting the equation in integral form and using W(0) = 0, we get

W²(t) = t + 2 ∫_0^t W(s) dW(s)

or

∫_0^t W(s) dW(s) = ½W²(t) − ½t.

This is exactly the result we obtained by solving the stochastic integral in (3.10). From this result, we conclude that our intuition based on standard calculus does not work; this is confirmed by the Itô correction term.

3.3.2 The Vector Case

All of the above was for a situation in which the process X(t) is scalar. We now allow that the process X(t) is in R^n. We let W(t) be an m-dimensional standard Brownian motion. This means that W(t) takes values in R^m and its component processes W_i(t) (i = 1, 2, . . . , m) are independent scalar Brownian motions. To make the dimensions fit, we have f(t, X(t)) ∈ R^n and g(t, X(t)) ∈ R^{n×m}. The Itô formula can be written in vector notation as follows:

dY(t) = f̃(t, X(t))dt + g̃(t, X(t))dW(t) with

f̃(t, X(t)) = φ_t(t, X(t)) + φ_x(t, X(t))f(t, X(t)) + ½ tr(g^T(t, X(t)) φ_xx(t, X(t)) g(t, X(t)))

and

g̃(t, X(t)) = φ_x(t, X(t)) g(t, X(t)),

where "tr" denotes the trace operator and φ(t, X) is a scalar function which is continuously differentiable with respect to t and twice continuously differentiable with respect to X.

3.3.3 Examples

In this section, we show how to use Itô's formula. Most of the examples are classical examples which can be found in numerous textbooks.

Example 3.4. Consider the following stochastic differential equation:

dS(t) = µS(t)dt + σS(t)dW(t).    (3.15)

The process S(.) is called geometric Brownian motion. We want to find the stochastic differential equation for the process Y related to S as follows:

Y(t) = φ(t, S(t)) = log(S(t)).

The relevant partial derivatives are ∂φ(t,S)/∂t = 0, ∂φ(t,S)/∂S = 1/S, and ∂²φ(t,S)/∂S² = −1/S². So, according to Itô,

dY(t) = (∂φ(t,S)/∂t + ∂φ(t,S)/∂S µS(t) + ½ ∂²φ(t,S)/∂S² σ²S²(t))dt + ∂φ(t,S)/∂S σS(t)dW(t)

dY(t) = (µ − ½σ²)dt + σdW(t).    (3.16)

Obviously, Y(.) is a non-standard Brownian motion with

E[Y(t)] = (µ − ½σ²)t, Var[Y(t)] = σ²t.

Hence, for the sample path of Y(.), we have the following closed-form solution:

Y(t) = Y0 + (µ − ½σ²)t + σW(t).

Therefore, for the sample path of the geometric Brownian motion S(.) = e^{Y(.)}, we have the closed-form solution

S(t) = e^{Y0} e^{(µ − ½σ²)t + σW(t)}.    (3.17)

The geometric Brownian motion is a popular stock price model, and (3.17) gives the exact solution for this price process.

Example 3.5. Suppose that two processes X1(t) and X2(t) are given by the coupled SDEs

[dX1(t); dX2(t)] = [f1(t, X(t)); f2(t, X(t))] dt + [g1(t, X(t)), 0; 0, g2(t, X(t))] [dW1(t); dW2(t)],

where W1(t) and W2(t) are two independent Brownian motions. Let us now compute a stochastic differential for the product Y = X1X2, i.e., φ(t, X1, X2) = X1X2. An interpretation of φ could be that X1 describes the evolution of an American stock in US $, whereas X2 describes the evolution of the exchange rate CHF/US $, and thus φ describes the evolution of the price of the American stock measured in CHF.

Let us now apply the multivariate form of the Itô calculus. The partial derivatives are

φ_t(t, X1, X2) = 0
φ_{x1}(t, X1, X2) = X2, φ_{x2}(t, X1, X2) = X1
φ_{x1x1}(t, X1, X2) = φ_{x2x2}(t, X1, X2) = 0
φ_{x1x2}(t, X1, X2) = φ_{x2x1}(t, X1, X2) = 1.

Having computed the elements of the Jacobian and the Hessian, we can calculate the SDE for Y(t):

f̃(t, X(t)) = φ_t + φ_x f + ½ tr(g^T φ_xx g) = X2(t)f1(t, X(t)) + X1(t)f2(t, X(t))
g̃(t, X(t))dW(t) = φ_x g dW(t) = X2(t)g1(t, X(t))dW1(t) + X1(t)g2(t, X(t))dW2(t)

dY(t) = [X2(t)f1(t, X(t)) + X1(t)f2(t, X(t))]dt + X2(t)g1(t, X(t))dW1(t) + X1(t)g2(t, X(t))dW2(t)

or, with the almost suspiciously simple final result,

dY(t) = dX1(t)X2(t) + X1(t)dX2(t).

Here the Itô correction vanishes: the diffusion matrix is diagonal, the driving Brownian motions are independent, and the Hessian of φ has zero diagonal, so the trace term is zero. The SDE for Y(t) shows that the investor faces two sources of uncertainty: the first from the uncertainty of the American stock and the second from the uncertainty of the exchange rate. Mathematically, the risk is represented by the two Brownian motions which drive the uncertainty of the SDE for Y(t).

Example 3.6. Suppose that two processes X1(t) and X2(t) are given by the uncoupled SDEs

[dX1(t); dX2(t)] = [f1(t, X1(t)); f2(t, X2(t))] dt + [g1(t, X1(t)); g2(t, X2(t))] dW(t),

both driven by the same scalar Brownian motion W(t). Let us compute a stochastic differential for the product Y = X1X2, i.e., φ(t, X1, X2) = X1X2. Let us apply the multivariate form of the Itô calculus. The partial derivatives are

φ_t(t, X1, X2) = 0
φ_{x1}(t, X1, X2) = X2, φ_{x2}(t, X1, X2) = X1
φ_{x1x1}(t, X1, X2) = φ_{x2x2}(t, X1, X2) = 0
φ_{x1x2}(t, X1, X2) = φ_{x2x1}(t, X1, X2) = 1.

Having computed the elements of the Jacobian and the Hessian, we can calculate the SDE for Y(t):

f̃(t, X(t)) = φ_t + φ_x f + ½ tr(g^T φ_xx g) = X2(t)f1(t, X1(t)) + X1(t)f2(t, X2(t)) + g1(t, X1(t))g2(t, X2(t))
g̃(t, X(t))dW(t) = φ_x g dW(t) = [X2(t)g1(t, X1(t)) + X1(t)g2(t, X2(t))]dW(t)

dY(t) = [X2(t)f1(t, X1(t)) + X1(t)f2(t, X2(t)) + g1(t, X1(t))g2(t, X2(t))]dt + [X2(t)g1(t, X1(t)) + X1(t)g2(t, X2(t))]dW(t)

or, with the surprisingly simple final result,

dY(t) = dX1(t)X2(t) + X1(t)dX2(t) + g1(t, X1(t))g2(t, X2(t))dt.

Since both processes are driven by the same Brownian motion, the Itô correction term g1g2 dt does not vanish here.

3.3.4 Itô Calculus for Poisson Processes

We briefly state the differentiation rules for scalar SDEs driven by a Brownian motion and a Poisson process. The dynamics of X(t) can be described as a linear superposition of an SDE driven by a Brownian motion and an SDE driven by a Poisson process.

Theorem 3.7. Itô calculus for Poisson processes
Let φ(t, X(t)) be a twice continuously differentiable function. For the stochastic differential equation

dX(t) = f(t, X(t))dt + g(t, X(t))dW(t) + h(t, X(t), A(t))dQ(t),

the transformation Y(t, X(t)) = φ(t, X(t)) satisfies the stochastic differential equation

dY(t) = f̃(t, X(t))dt + g̃(t, X(t))dW(t) + h̃(t, X(t), A(t))dQ(t),

with

f̃(t, X(t)) = φ_t(t, X(t)) + φ_x(t, X(t))f(t, X(t)) + ½φ_xx(t, X(t))g²(t, X(t))
g̃(t, X(t)) = φ_x(t, X(t))g(t, X(t))
h̃(t, X(t), A(t)) = φ(t, X(t) + h(t, X(t), A(t))) − φ(t, X(t)).

The term in front of the dQ(t) operator is the difference of the transformation after and before a jump has occurred. In the case where h(t, X(t), A(t)) = 0, we recover the Itô formula in the scalar case.

Example 3.8. The example is a geometric Brownian motion with Poisson process which allows for downward jumps of the stock price process under consideration:

dS(t) = µS(t)dt + σS(t)dW(t) + (e^{−A(t)} − 1)S(t)dQ(t),

where µ and σ are constants, A(t) > 0 is the magnitude of the stochastic jump, and the Poisson process Q(t) has the jump frequency λ. Since A(t) > 0, the factor (e^{−A(t)} − 1) is negative and represents a downward jump which leaves S(t) positive. We assume that the jump magnitude is determined by draws from an exponential distribution with density

p(A) = (1/η) e^{−A/η}.

We want to compute the SDE of the transformation Y(t) = φ(t, S(t)) = log(S(t)). The relevant partial derivatives are ∂φ(t,S)/∂t = 0, ∂φ(t,S)/∂S = 1/S, and ∂²φ(t,S)/∂S² = −1/S². Applying the extended Itô formula, we get

dY(t) = (∂φ(t,S)/∂t + ∂φ(t,S)/∂S µS(t) + ½ ∂²φ(t,S)/∂S² σ²S²(t))dt + ∂φ(t,S)/∂S σS(t)dW(t) + [log(S(t) + (e^{−A(t)} − 1)S(t)) − log(S(t))]dQ(t)

dY(t) = (µ − ½σ²)dt + σdW(t) − A(t)dQ(t).

3.4 Stochastic Differential Equations

We now possess the main tools to deal with stochastic differential equations, namely the Itô calculus and the stochastic integration. We classify SDEs

into two large groups: linear SDEs and non-linear SDEs. Commonly, we distinguish between scalar linear and vector-valued linear SDEs. For both classes, we can derive exact solutions and we are able to compute moments. Furthermore, examples from economics, finance, and engineering are given.

3.4.1 Linear Scalar SDEs

Definition 3.9. A stochastic differential equation

dX(t) = f(t, X(t))dt + g(t, X(t))dW(t)

with the initial condition X(0) = x0 for a one-dimensional stochastic process X(t) is called a linear (scalar) SDE if and only if the functions f(t, X(t)) ∈ R and g(t, X(t)) ∈ R^{1×m} are affine functions of X(t) ∈ R, and thus

f(t, X(t)) = A(t)X(t) + a(t)
g(t, X(t)) = [B1(t)X(t) + b1(t), · · · , Bm(t)X(t) + bm(t)],

where A(t), a(t) ∈ R, Bi(t), bi(t) ∈ R, and W(t) ∈ R^m is an m-dimensional Brownian motion. f(t, X(t)) is called drift and g(t, X(t)) is called diffusion.

Hence, the scalar linear SDE can be written in the form

dX(t) = (A(t)X(t) + a(t))dt + Σ_{i=1}^{m} (Bi(t)X(t) + bi(t))dWi(t).    (3.18)

This linear SDE has the following solution:

X(t) = Φ(t) (x0 + ∫_0^t Φ^{−1}(s)[a(s) − Σ_{i=1}^{m} Bi(s)bi(s)] ds + Σ_{i=1}^{m} ∫_0^t Φ^{−1}(s)bi(s) dWi(s)),    (3.19)

where the fundamental matrix Φ(t) is given by

Φ(t) = exp(∫_0^t [A(s) − Σ_{i=1}^{m} Bi²(s)/2] ds + Σ_{i=1}^{m} ∫_0^t Bi(s) dWi(s)).    (3.20)

A common extension of this equation is the following form of a controlled stochastic differential equation as given by

dX(t) = (A(t)X(t) + C(t)u(t) + a(t))dt + Σ_{i=1}^{m} (Bi(t)X(t) + Di(t)u(t) + bi(t))dWi(t),

with the control u(t) ∈ R^k, C(t) ∈ R^{1×k}, and Di(t) ∈ R^{1×k}.

The fundamental matrix (3.20) is the solution of the following SDE:

dΦ(t) = A(t)Φ(t)dt + Σ_{i=1}^{m} Bi(t)Φ(t)dWi(t),

with the initial condition Φ(0) = 1. For the proof, the reader is referred to [3, Chapter 8].

For scalar linear SDEs, ordinary differential equations exist for computing the first and second moments of the random process X(t). The expectation m(t) = E[X(t)] and the second moment P(t) = E[X²(t)] for (3.18) can be calculated by solving the following system of ODEs:

ṁ(t) = A(t)m(t) + a(t), m(0) = x0
Ṗ(t) = (2A(t) + Σ_{i=1}^{m} Bi²(t))P(t) + 2m(t)(a(t) + Σ_{i=1}^{m} Bi(t)bi(t)) + Σ_{i=1}^{m} bi²(t), P(0) = x0².

The ODE for the expectation is derived by applying the expectation operator on both sides of (3.18). Using the rules for the expectation operator, we get

E[dX(t)] = E[(A(t)X(t) + a(t))dt + Σ_{i=1}^{m} (Bi(t)X(t) + bi(t))dWi(t)]
dm(t) = (A(t)E[X(t)] + a(t))dt + Σ_{i=1}^{m} E[Bi(t)X(t) + bi(t)] E[dWi(t)],

and since E[dWi(t)] = 0,

dm(t) = (A(t)m(t) + a(t))dt.

Example 3.10. We want to compute the solution of the SDE

dX(t) = AX(t)dt + BX(t)dW(t), X(0) = x0.

Let us assume that W(t) ∈ R, A(t) = A, a(t) = 0, B(t) = B, b(t) = 0. We can solve it using (3.19) and (3.20). As we already know, the solution is:

X(t) = x0 e^{(A − ½B²)t + BW(t)}.

In order to compute the second moment, we need to derive the SDE for Y(t) = X²(t):

dY(t) = [2X(t)(A(t)X(t) + a(t)) + Σ_{i=1}^{m} (Bi(t)X(t) + bi(t))²] dt + 2X(t) Σ_{i=1}^{m} (Bi(t)X(t) + bi(t)) dWi(t)
= [2A(t)X²(t) + 2X(t)a(t) + Σ_{i=1}^{m} (Bi²(t)X²(t) + 2Bi(t)bi(t)X(t) + bi²(t))] dt + 2X(t) Σ_{i=1}^{m} (Bi(t)X(t) + bi(t)) dWi(t).    (3.21)

Furthermore, we apply the expectation operator to (3.21) and use P(t) = E[X²(t)] = E[Y(t)] and m(t) = E[X(t)]. Since the dWi(t) terms have zero mean, we obtain

dP(t) = [2A(t)P(t) + 2a(t)m(t) + Σ_{i=1}^{m} (Bi²(t)P(t) + 2Bi(t)bi(t)m(t) + bi²(t))] dt.

In the special case where Bi(t) = 0, i = 1, . . . , m, we can explicitly give the exact probability density function: The solution of the scalar linear SDE

dX(t) = (A(t)X(t) + a(t))dt + Σ_{i=1}^{m} bi(t)dWi(t)    (3.22)

with the initial condition X(0) = x0 is normally distributed,

P(X(t)|x0) ∼ N(m(t), V(t)),

with the mean m(t) and the variance V(t) = P(t) − m²(t), which are the solutions of the following ODEs:

ṁ(t) = A(t)m(t) + a(t), m(0) = x0
V̇(t) = 2A(t)V(t) + Σ_{i=1}^{m} bi²(t), V(0) = 0.
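The moment ODEs can be integrated numerically and compared against the closed-form moments of Example 3.10. A minimal sketch (Python, illustrative parameters; m(t) and P(t) are propagated with explicit Euler steps):

```python
import numpy as np

# Geometric Brownian motion dX = A X dt + B X dW (Example 3.10),
# with hypothetical parameters:
A, B, x0, T, n = -1.0, 0.5, 1.0, 2.0, 20_000
dt = T / n

# Euler integration of the moment ODEs:
#   m' = A m,  P' = (2A + B^2) P,  m(0) = x0, P(0) = x0^2.
m, P = x0, x0 * x0
for _ in range(n):
    m += A * m * dt
    P += (2.0 * A + B * B) * P * dt

# Closed forms follow from X(t) = x0 exp((A - B^2/2) t + B W(t)).
print("m(T):", m, " exact:", x0 * np.exp(A * T))
print("P(T):", P, " exact:", x0**2 * np.exp((2.0 * A + B * B) * T))
```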

3.4.2 Popular Scalar Linear Models

There are some specific scalar linear SDEs which are found to be quite useful in practice. In this subsection, we discuss these cases and show what types of asset prices they could represent and how they could be useful. Examples of price processes can be found in [27] and [28].

• Brownian motion: The simplest case of a stochastic differential equation is where the drift and the diffusion coefficients are independent of the information received over time:

dS(t) = µdt + σdW(t, ω), S(0) = S0.

The process is normally distributed with mean E[S(t)] = µt + S0 and variance Var[S(t)] = σ²t; S(t) fluctuates around the straight line S0 + µt. This model has been used to simulate commodity prices, such as metals or agricultural products.

• Geometric Brownian motion: The standard model of stock prices is the geometric Brownian motion, as given by

dS(t) = µS(t)dt + σS(t)dW(t, ω), S(0) = S0.

This process has a log-normal probability density function. The geometric Brownian motion has two main features which make it popular for stock price models: the first property is that S(t) > 0 for all t ∈ [0, T], and the second is that all returns are in scale with the current price. The mean is given by E[S(t)] = S0 e^{µt} and its variance by Var[S(t)] = S0² e^{2µt}(e^{σ²t} − 1). The parameter σ is also called volatility; it is used in many asset models as an indicator of the risk an investor is taking by buying a certain asset. This model forms the starting point for the famous Black-Scholes formula for option pricing. For µ > 0, the moments of the geometric Brownian motion become infinite, and thus the model is unstable:

lim_{t→∞} E[S(t)] = lim_{t→∞} S0 e^{µt} → ∞
lim_{t→∞} Var[S(t)] = lim_{t→∞} S0² e^{2µt}(e^{σ²t} − 1) → ∞.

• Mean reverting process: Another very popular class of SDEs are the mean reverting linear SDEs. The model is obtained by

dS(t) = κ[µ − S(t)]dt + σdW(t, ω), S(0) = S0.    (3.23)

This class of SDE is used to model, for example, short rates of the term structure of interest rates or electricity prices.
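A short simulation sketch for the mean-reverting SDE (3.23) (Python, hypothetical parameters; the long-run mean µ and variance σ²/(2κ) that the simulation reproduces are derived in the following paragraphs):

```python
import numpy as np

rng = np.random.default_rng(3)
kappa, mu, sigma, S0 = 2.0, 10.0, 1.0, 4.0   # hypothetical parameters
T, n, paths = 5.0, 2_000, 10_000
dt = T / n

S = np.full(paths, S0)
for _ in range(n):                  # Euler discretization of (3.23)
    dW = rng.normal(0.0, np.sqrt(dt), size=paths)
    S += kappa * (mu - S) * dt + sigma * dW

# Long-run statistics: E[S] -> mu, Var[S] -> sigma^2 / (2 kappa).
print("mean:", S.mean(), " (theory:", mu, ")")
print("var :", S.var(),  " (theory:", sigma**2 / (2.0 * kappa), ")")
```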

A special case of this SDE where µ = 0 is called the Ornstein-Uhlenbeck process. Equation (3.23) models a process which naturally falls back to its equilibrium level µ: when the price S(t) is above µ, then κ(µ − S(t)) < 0 and the probability that S(t) decreases is high; when S(t) is below µ, then κ(µ − S(t)) > 0 and the probability is high that S(t) increases. The parameter κ > 0 governs how fast the process moves back to µ. This process is a stationary process which is normally distributed. The expected price is

E[S(t)] = µ − (µ − S0)e^{−κt}

and the variance is

Var[S(t)] = (σ²/(2κ))(1 − e^{−2κt}).

In the long run, the following (unconditional) approximations are valid:

lim_{t→∞} E[S(t)] = µ and lim_{t→∞} Var[S(t)] = σ²/(2κ).

This analysis shows that the process fluctuates around µ and has a variance of σ²/(2κ) which depends on the parameter κ: the higher κ, the lower the variance. This is obvious, since the higher κ, the faster the process reverts back to its mean value.

There are many variations of the mean reverting process. A popular extension is where the diffusion term is in scale with the current value, i.e., the geometric mean reverting process:

dS(t) = κ[µ − S(t)]dt + σS(t)dW(t, ω), S(0) = S0.

• Engineering model: In control engineering science, the most important (scalar) case is

dX(t) = (A(t)X(t) + C(t)u(t))dt + Σ_{i=1}^{m} bi(t)dWi.    (3.24)

In this equation, X(t) is normally distributed because the Brownian motion is just multiplied by time-dependent factors. When we compute an optimal control law for this SDE, the deterministic optimal control law (ignoring the Brownian motion) and the stochastic optimal control law are the same. This feature is called certainty equivalence. For this reason, the stochastics are often ignored in control engineering.

3.4.3 Vector-Valued Linear SDEs

The logical extension of scalar SDEs is to allow X(t) ∈ R^n to be a vector. The rest of this section proceeds in a similar fashion as for scalar linear SDEs.

Definition 3.11. A stochastic vector differential equation

dX(t) = f(t, X(t))dt + g(t, X(t))dW(t)

with the initial condition X(0) = x0 ∈ R^n for an n-dimensional stochastic process X(t) is called a linear SDE if the functions f(t, X(t)) ∈ R^n and g(t, X(t)) ∈ R^{n×m} are affine functions of X(t), and thus

f(t, X(t)) = A(t)X(t) + a(t)
g(t, X(t)) = [B1(t)X(t) + b1(t), · · · , Bm(t)X(t) + bm(t)],

where A(t) ∈ R^{n×n}, a(t) ∈ R^n, Bi(t) ∈ R^{n×n}, bi(t) ∈ R^n, and W(t) ∈ R^m is an m-dimensional Brownian motion.

Alternatively, the vector-valued linear SDE can be written as

dX(t) = (A(t)X(t) + a(t))dt + Σ_{i=1}^{m} (Bi(t)X(t) + bi(t))dWi(t).    (3.25)

A common extension of the above equation is the following form of a controlled stochastic differential equation as given by

dX(t) = (A(t)X(t) + C(t)u(t) + a(t))dt + Σ_{i=1}^{m} (Bi(t)X(t) + Di(t)u(t) + bi(t))dWi,    (3.26)

where u(t) ∈ R^k, C(t) ∈ R^{n×k}, and Di(t) ∈ R^{n×k}.

The linear SDE (3.25) has the following solution:

X(t) = Φ(t) (x0 + ∫_0^t Φ^{−1}(s)[a(s) − Σ_{i=1}^{m} Bi(s)bi(s)] ds + Σ_{i=1}^{m} ∫_0^t Φ^{−1}(s)bi(s) dWi(s)),    (3.27)

where the fundamental matrix Φ(t) ∈ R^{n×n} is the solution of the homogeneous stochastic differential equation

dΦ(t) = A(t)Φ(t)dt + Σ_{i=1}^{m} Bi(t)Φ(t)dWi(t),    (3.28)

with the initial condition Φ(0) = I ∈ R^{n×n}. We now prove that (3.27) and (3.28) are solutions of (3.25). We rewrite (3.27) as

X(t) = Φ(t) (x0 + ∫_0^t Φ^{−1}(s) dY(s)) = Φ(t)Z(t)

with

dY(t) = [a(t) − Σ_{i=1}^{m} Bi(t)bi(t)] dt + Σ_{i=1}^{m} bi(t) dWi(t)
Z(t) = x0 + ∫_0^t Φ^{−1}(s) dY(s), dZ(t) = Φ^{−1}(t) dY(t).

In order to prove that X(t) = Φ(t)Z(t) solves the SDE (3.25), we compute dX(t) and show that this gives us back (3.25). Using Itô's formula, we get

dX(t) = Φ(t)dZ(t) + dΦ(t)Z(t) + Σ_{i=1}^{m} Bi(t)Φ(t)Φ^{−1}(t)bi(t) dt
= dY(t) + A(t)Φ(t)Z(t)dt + Σ_{i=1}^{m} Bi(t)Φ(t)Z(t)dWi(t) + Σ_{i=1}^{m} Bi(t)bi(t)dt.

Noting that Z(t) = Φ^{−1}(t)X(t) and using the SDE for Y(t), we get

dX(t) = [a(t) − Σ_{i=1}^{m} Bi(t)bi(t)] dt + Σ_{i=1}^{m} bi(t)dWi(t) + A(t)X(t)dt + Σ_{i=1}^{m} Bi(t)X(t)dWi(t) + Σ_{i=1}^{m} Bi(t)bi(t)dt
= [a(t) + A(t)X(t)] dt + Σ_{i=1}^{m} (Bi(t)X(t) + bi(t)) dWi(t).

This completes the proof.

For vector-valued linear models, there exist ordinary differential equations for computing the moments of the stochastic process X(t). The expectation m(t) = E[X(t)] ∈ R^n and the second moment matrix P(t) = E[X(t)X^T(t)] ∈ R^{n×n} can be computed as follows:

ṁ(t) = A(t)m(t) + a(t), m(0) = x0
Ṗ(t) = A(t)P(t) + P(t)A^T(t) + a(t)m^T(t) + m(t)a^T(t) + Σ_{i=1}^{m} [Bi(t)P(t)Bi^T(t) + Bi(t)m(t)bi^T(t) + bi(t)m^T(t)Bi^T(t) + bi(t)bi^T(t)], P(0) = x0 x0^T.

The covariance matrix for the system of linear SDEs is given by

V(t) = Var[X(t)] = P(t) − m(t)m^T(t).

The ease of calculating moments is one of the advantages of linear SDEs. Furthermore, due to the general solution theory, the solutions of many linear SDEs can explicitly be computed. In the special case where Bi(t) = 0, i = 1, . . . , m, we can explicitly give the solution in form of a probability density function: The solution of the linear vector SDE

dX(t) = (A(t)X(t) + a(t))dt + Σ_{i=1}^{m} bi(t)dWi(t)

with the initial condition X(0) = x0 ∈ R^n is normally distributed, i.e.,

P(X(t)|x0) ∼ N(m(t), V(t)),

with the expectation m(t) ∈ R^n and the covariance matrix V(t) ∈ R^{n×n} which are the solutions of the following ODEs:

ṁ(t) = A(t)m(t) + a(t), m(0) = x0
V̇(t) = A(t)V(t) + V(t)A^T(t) + Σ_{i=1}^{m} bi(t)bi^T(t), V(0) = 0.

3.4.4 Popular Vector-Valued Linear Price Models

In this part, we show two popular multi-dimensional linear price models:

• Multi-dimensional geometric Brownian motion: The most popular stock price model is the geometric Brownian motion. In order to model two stock price processes which are correlated and have the properties of one-dimensional geometric Brownian motions, we propose the following system of equations:

dS1(t) = µ1 S1(t)dt + S1(t)(σ11 dW1(t) + σ12 dW2(t))
dS2(t) = µ2 S2(t)dt + S2(t)(σ21 dW1(t) + σ22 dW2(t)).

The two price processes are correlated if σ12 = σ21 ≠ 0. This setup can easily be extended to include n assets.
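A simulation sketch for the two correlated geometric Brownian motions (Python, hypothetical volatility loadings and initial prices, not from the original text). Since the log-prices are driven by the shared Brownian motions, their correlation can be predicted from the loadings and compared with the sample estimate:

```python
import numpy as np

rng = np.random.default_rng(4)
mu1, mu2 = 0.05, 0.08
s11, s12, s21, s22 = 0.20, 0.05, 0.05, 0.30   # hypothetical loadings
T, n, paths = 1.0, 250, 50_000
dt = T / n

S1 = np.full(paths, 100.0)
S2 = np.full(paths, 50.0)
for _ in range(n):                  # Euler discretization of the system
    dW1 = rng.normal(0.0, np.sqrt(dt), size=paths)
    dW2 = rng.normal(0.0, np.sqrt(dt), size=paths)
    S1 += mu1 * S1 * dt + S1 * (s11 * dW1 + s12 * dW2)
    S2 += mu2 * S2 * dt + S2 * (s21 * dW1 + s22 * dW2)

# Predicted correlation of the log-prices from the shared drivers:
rho = (s11 * s21 + s12 * s22) / np.sqrt(
    (s11**2 + s12**2) * (s21**2 + s22**2))
est = np.corrcoef(np.log(S1), np.log(S2))[0, 1]
print("correlation of log-prices:", est, " (theory:", rho, ")")
```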

• Linear SDE with stochastic volatility: The observed volatility of real existing price processes, such as stocks or bonds, is not constant as most models assume, but is itself a stochastic process. In order to capture this observed fact, the price is modeled by two SDEs. If this model is used for stock prices, the transformation P(t) = ln(S(t)) is useful. The following model is obtained:

dP(t) = µdt + σ(t)dW1(t), P(0) = P0
dσ(t) = κ(θ − σ(t))dt + σ1 dW2(t), σ(0) = σ0,

where θ is the average volatility, σ1 a volatility, and κ the mean reversion rate of the volatility process σ(t). The first SDE describes the logarithm of the price and the second SDE describes the evolution of the volatility σ(t) over time. The two Brownian motions dW1(t) and dW2(t) are correlated: corr[dW1(t), dW2(t)] = ρ. This model captures the behavior of real existing prices better, and its distribution of returns shows "fatter tails".

3.4.5 Nonlinear SDEs and Popular Nonlinear Pricing Models

In comparison with linear SDEs, nonlinear SDEs are less well understood: no general solution theory exists, and there are no explicit formulae for calculating the moments. In this section, we show some examples of nonlinear SDEs and their properties.

• Square root processes: In general, a scalar square root process can be written as

dX(t) = f(t, X(t))dt + g(t, X(t))dW(t) with
f(t, X(t)) = A(t)X(t) + a(t)
g(t, X(t)) = B(t)√X(t),

where A(t), a(t), and B(t) are real scalars. The nonlinear mean reverting SDEs differ from the linear scalar equations by their nonlinear diffusion term. A mean reverting square root process is described by

dS(t) = κ[µ − S(t)]dt + σ√S(t) dW(t), S(0) = S0.

The process shows a less volatile behavior than its linear geometric counterpart and it has a non-central chi-square distribution. The process is

(3. Its model is given by √ dS(t) = µS(t)dt + σ S(t)dW (t) .3. Here. 2κ Because of the transformation.29) . and electricity prices. X(t) = x . stochastic volatilities. but with a square root diffusion term instead of the linear diffusion term.5 Partial Differential Equations and SDEs 51 often used to model short-term interest rates or stochastic volatility processes for stock prices. S(t) is always positive. X(s))ds + g(s. the variance of this model grows much more slowly over time. Its mean and variance are: E[S(t)] = S0 eµt ) σ 2 S0 ( 2µt Var[S(t)] = e − eµt .) with the deterministic initial value x ∈ Rn at the given initial time t. µ In comparison with the geometric Brownian motion model. • Mean reversion with log-normal distribution: Another widely used mean reversion model is obtained by dS(t) = κS(t)[µ − ln(S(t))]dt + σS(t)dW (t) . S(0) = S0 . which is governed by the stochastic differential equation dX(s) = f (s. Using the transformation P (t) = ln(S(t)) yields the linear mean reverting and normally distributed process P (t): [ ] σ2 dP (t) = κ µ − − P (t) dt + σdW (t) . and g : R × Rn → Rn×k . dW (s) ∈ Rk . and f : R × Rn → Rn . Because S(t) is log-normally distributed. S(t) is log-normally distributed. Consider the stochastic process X(. Another often used square root process is similar to the geometric Brownian motion. we highlight the connection between stochastic differential equations and partial differential equations (PDEs). 3.5 Partial Differential Equations and SDEs In this section. X(s))dW (s) for s ≥ t . This model is used to model stock prices.

We are interested in the stochastic "cost" functional which we associate with the stochastic process X(.) over the time interval [t, T]:

K(X(T)) + ∫_t^T L(s, X(s)) ds.

In particular, we want to compute its expected value, which we denote by

F(t, x) = E_{t,x}[K(X(T)) + ∫_t^T L(s, X(s)) ds].    (3.30)

Note that the expectation operator E_{t,x} is the conditional expectation with respect to the given initial time t and the fixed initial state x. Furthermore, we are also interested in the so-called "cost-to-go" function F(t, x) for all starting times t ≤ T and all starting values x ∈ R^n.

Example 3.12. We are interested in the average volatility of the stock market in a given time period [0, T]. One could for instance buy a derivative that gives a monetary payoff which equals the average volatility. Such instruments are known as volatility swaps. The problem is to find the expected average volatility as described by

E_{0,x}[∫_0^T (1/T) x(t) dt]
dx(t) = κ(θ − x(t))dt + σ√x(t) dW(t), x(0) = x0,    (3.32)

where the SDE describes the dynamics of the market's volatility x(t).

The following result is of paramount importance: the cost-to-go function can be found by solving a partial differential equation!

Theorem 3.13. Feynman-Kač
The cost-to-go function F(t, x) defined in (3.30), which is associated with the stochastic process X(.) governed by the SDE (3.29), satisfies the PDE

∂F/∂t (t, x) + ∂F/∂x (t, x) f(t, x) + ½ tr[g(t, x)g^T(t, x) ∂²F/∂x² (t, x)] + L(t, x) = 0    (3.31)

with the boundary condition

F(T, x) = K(x)

at the final time T.

The solution to the PDE problem is well-defined if the integrability conditions for the SDE are met. This PDE and other similar PDEs will turn up in many applications where the problem is modeled by SDEs, and it is essential for finding solutions to derivative pricing or optimal control applications.

Proof of Theorem 3.13: By definition, F(T, x) = K(x), and F(t, x) is deterministic; for t = T we thus get the claimed boundary condition. Since integration and the expectation operator are linear, using (3.30) and Itô's rule, we can write

F(t, x) = E_{t,x}[K(X(T)) + ∫_t^T L(τ, X(τ)) dτ] = E_{t,x}[F(T, X(T))] + E_{t,x}[∫_t^T L(τ, X(τ)) dτ]

E_{t,x}[F(T, X(T))] = F(t, x) + E_{t,x}[∫_t^T dF(τ, X(τ))]
= F(t, x) + E_{t,x}[∫_t^T ((∂F/∂τ + (∂F/∂X) f + ½ tr{g^T (∂²F/∂X²) g}) dτ + (∂F/∂X) g dW(τ))],

where, for ease of typesetting and reading, all of the identical arguments "(τ, X(τ))" have been suppressed. Considering that the Brownian motion W(.) is zero-mean and cancelling F(t, x) in the above equation yields the claimed partial differential equation in the first preliminary form

E_{t,x}[∫_t^T (L + ∂F/∂τ + (∂F/∂X) f + ½ tr{g^T (∂²F/∂X²) g}) dτ] = 0,

in the second preliminary form

E_{t,x}[L + ∂F/∂τ + (∂F/∂X) f + ½ tr{g^T (∂²F/∂X²) g}] = 0

for all τ ∈ [t, T], and in the final form

∂F/∂t (t, x) + ∂F/∂x (t, x) f(t, x) + ½ tr[g(t, x)g^T(t, x) ∂²F/∂x² (t, x)] + L(t, x) = 0.

This completes the proof of this theorem.
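As a sanity check of Theorem 3.13, consider a case that can be solved by hand (an illustrative sketch with arbitrary numbers, not from the original text): f = 0, g = 1, so X is a Brownian motion, with L = 0 and K(x) = x². Then F(t, x) = x² + (T − t) satisfies F_t + ½F_xx = 0 with F(T, x) = K(x), and it should coincide with the Monte-Carlo estimate of E_{t,x}[K(X(T))]:

```python
import numpy as np

rng = np.random.default_rng(5)

# dX = dW (f = 0, g = 1), L = 0, K(x) = x^2:
# F(t,x) = x^2 + (T - t) solves F_t + 1/2 F_xx = 0, F(T,x) = K(x).
t, x, T = 0.3, 1.5, 1.0      # hypothetical start (t, x) and horizon T
paths = 200_000

# X(T) = x + (W(T) - W(t)) with W(T) - W(t) ~ N(0, T - t).
increments = rng.normal(0.0, np.sqrt(T - t), size=paths)
mc = np.mean((x + increments) ** 2)   # Monte-Carlo E_{t,x}[K(X(T))]

print("Monte Carlo :", mc)
print("PDE solution:", x**2 + (T - t))
```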

We can also investigate another closely related PDE problem:

∂F/∂t (t, x) + ∂F/∂x (t, x) f(t, x) + ½ tr[g(t, x)g^T(t, x) ∂²F/∂x² (t, x)] + r(t, x)F(t, x) + L(t, x) = 0
F(T, x) = K(x)    (3.33)

with an arbitrary scalar-valued function r(.). This PDE often arises in problems for financial derivatives and is also attributed to Feynman and Kač. The related probabilistic solution to (3.33) is given by

F(t, x) = E_{t,x}[e^{∫_t^T r(τ, X(τ))dτ} K(X(T)) + ∫_t^T e^{∫_t^s r(τ, X(τ))dτ} L(s, X(s)) ds]
dX(t) = f(t, X(t))dt + g(t, X(t))dW(t), X(t) = x.

Example 3.14. Possessing all of the necessary tools to solve the problem of the value of a volatility swap, we now solve

F(0, x) = E_{0,x}[∫_0^T (1/T) x(s) ds]
dx(t) = κ(θ − x(t))dt + σ√x(t) dW(t), x(0) = x0.

This leads to the PDE

F_t(t, x) + κ(θ − x)F_x(t, x) + ½σ²xF_xx(t, x) + x/T = 0
F(T, x) = 0.

We now have to solve the PDE; we make an ansatz for F(t, x) and compute the corresponding partial derivatives. This leads us to

F(t, x) = a1(t) + a2(t)x(t)
F_t(t, x) = ȧ1(t) + ȧ2(t)x(t), F_x(t, x) = a2(t), F_xx(t, x) = 0,

with the terminal conditions a1(T) = 0 and a2(T) = 0. Using the partial derivatives, we arrive at

ȧ1(t) + ȧ2(t)x(t) + κ(θ − x(t))a2(t) + x(t)/T = 0
ȧ1(t) + κθa2(t) + (ȧ2(t) − κa2(t) + 1/T) x(t) = 0.

Since this must hold for all x(t), we obtain

ȧ1(t) + κθa2(t) = 0
ȧ2(t) − κa2(t) + 1/T = 0.

We have transformed the problem from finding a solution to a PDE into solving ODEs. Their solution is

a2(t) = (1/(κT))(1 − e^{−κ(T−t)})
a1(t) = θ(T−t)/T − (θ/(κT))(1 − e^{−κ(T−t)}),

and hence

F(t, x) = θ(T−t)/T + (1/(κT))(1 − e^{−κ(T−t)})(x(t) − θ).

3.6 Solutions of Stochastic Differential Equations

In this part, we introduce three major methods to compute solutions of SDEs. The first method is based on the Itô calculus and has already been used for linear SDEs: we derive analytical solutions. Secondly, we introduce numerical methods to compute path-wise solutions of SDEs. The third method is based on partial differential equations, where the problem of finding the probability density function of the solution is transformed into solving a partial differential equation.

3.6.1 Analytical Solutions of SDEs

Definition 3.15. The stochastic process X(t) governed by the stochastic differential equation

dX(t) = f(t, X(t))dt + g(t, X(t))dW(t), X(0) = X0

is explicitly described by the integral form

X(t, ω) = X0 + ∫_0^t f(s, X(s)) ds + ∫_0^t g(s, X(s)) dW(s),

where the first integral is a path-wise Riemann integral and the second integral is an Itô integral. In this definition, it is assumed that the functions f(t, X(t)) and g(t, X(t)) are sufficiently smooth in order to guarantee the existence of the solution X(t).

There are several ways of finding analytical solutions. One way is to guess a solution and use the Itô calculus to verify that it is a solution for the SDE under consideration.

Example 3.16. We assume that the following nonlinear SDE

dX(t) = dt + 2√X(t) dW(t)

has the solution X(t) = (W(t) + √X0)². In order to verify this claim, we use the Itô calculus. We define Z(t) = W(t) and thus dZ(t) = dW(t), which is an SDE with f = 0 and g = 1. We have X(t) = φ(Z), where φ(Z) = (Z(t) + √X0)², so that φ′(Z) = 2(Z(t) + √X0) and φ″(Z) = 2. Using Itô's rule, we get

dX(t) = f̃(t, X)dt + g̃(t, X)dW(t)
f̃(t, X) = φ′(W)·0 + ½φ″(W)·1 = 1
g̃(t, X) = φ′(W)·1 = 2(W(t) + √X0).

Since X(t) = (W(t) + √X0)², we know that W(t) + √X0 = √X(t), and thus the Itô calculation generates the original SDE we started with.

Another way of finding analytical solutions is to use the Itô calculus to transform the SDE in such a way that the resulting SDE can easily be integrated or that it already has a known analytic solution.

Example 3.17. We consider the following SDE:

dX(t) = (−½X(t) − 1/(8X³(t)))dt + (1/(2X(t)))dW(t).

For this rather unpleasant SDE, it is difficult to guess the solution. In order to simplify things, let us try the transformation Y(t) = X²(t), i.e., φ(X(t)) = X²(t), so that φ′(X(t)) = 2X(t) and φ″(X(t)) = 2. Using Itô's rule, we get

dY(t) = f̃(t, X(t))dt + g̃(t, X(t))dW(t)
f̃(t, X(t)) = φ′(X(t))(−½X(t) − 1/(8X³(t))) + ½φ″(X(t))(1/(4X²(t))) = −X²(t) − 1/(4X²(t)) + 1/(4X²(t)) = −X²(t)
g̃(t, X(t)) = φ′(X(t))(1/(2X(t))) = 1,

and therefore

dY(t) = −X²(t)dt + dW(t) = −Y(t)dt + dW(t).

Y(t) is the well-known Ornstein-Uhlenbeck process. With the transformation rule X(t) = √Y(t), we have solved the original SDE.
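Example 3.16 can also be checked by brute force (an illustrative sketch, not from the original text): drive a discretization of the SDE with a simulated Brownian path and compare the endpoint with the claimed solution (W(T) + √X0)². The sketch uses the Euler scheme introduced in the next subsection; the clamp max(x, 0) is a purely numerical guard against small negative excursions of the discretized state:

```python
import numpy as np

rng = np.random.default_rng(6)
T, n, X0 = 1.0, 100_000, 1.0
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)
W = np.cumsum(dW)

# Euler discretization of dX = dt + 2 sqrt(X) dW, driven by the
# same Brownian path as the claimed closed-form solution.
x = X0
for k in range(n):
    x += dt + 2.0 * np.sqrt(max(x, 0.0)) * dW[k]

print("Euler X(T)          :", x)
print("(W(T) + sqrt(X0))^2 :", (W[-1] + np.sqrt(X0)) ** 2)
```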

3.6.2 Numerical Solution of SDEs

In some of the examples, we found explicit solutions for a given stochastic differential equation. In general, however, most SDEs, especially nonlinear SDEs, do not have analytical solutions, so that one has to resort to numerical approximation schemes in order to simulate sample paths of solutions to the given equation. We state here the two most common numerical procedures; for the derivation and proof of the schemes, the reader is referred to [26].

The simplest scheme is obtained by using a first-order approximation. This is called the Euler scheme:

X(t_k) = X(t_{k−1}) + f(t_{k−1}, X(t_{k−1}))Δt + g(t_{k−1}, X(t_{k−1}))ΔW(t_k).

The Brownian motion term can be approximated as follows:

ΔW(t_k) = ε(t_k)√Δt,

where ε(.) is a discrete-time Gaussian white process with mean 0 and standard deviation 1. To get a somewhat better approximation, one can use the so-called Milstein scheme, which includes second-order information:

X(t_k) = X(t_{k−1}) + f(t_{k−1}, X(t_{k−1}))Δt + g(t_{k−1}, X(t_{k−1}))ΔW(t_k) + ½ g(t_{k−1}, X(t_{k−1})) (∂g/∂X)(t_{k−1}, X(t_{k−1})) [(ΔW(t_k))² − Δt].

Both schemes converge to true sample paths if the step size in the numerical approximation is taken smaller and smaller, and higher-order schemes converge more quickly. However, in practice, roundoff errors get in the way when the step size is taken too small.

Given a realization of a Brownian motion, these methods can only deliver one sample path of the given process. Usually, we are interested in knowing the statistical distribution of the possible outcomes of the stochastic process at all times. Since one sample path is just one realization of the unknown time-varying distribution, we therefore need to simulate a significant number of sample paths.
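The two schemes are easy to compare on the geometric Brownian motion, for which the exact path-wise solution (3.17) is available. The sketch below (Python, hypothetical parameters, an illustrative addition) estimates the strong (path-wise) error of both schemes driven by the same Brownian increments; the Milstein error should be noticeably smaller:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, x0, T = 0.1, 0.8, 1.0, 1.0   # hypothetical GBM parameters
n, paths = 2_000, 2_000
dt = T / n

dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
W_T = dW.sum(axis=1)
exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)

xe = np.full(paths, x0)   # Euler scheme
xm = np.full(paths, x0)   # Milstein scheme
for k in range(n):
    dw = dW[:, k]
    xe += mu * xe * dt + sigma * xe * dw
    # Milstein correction 1/2 g (dg/dX) [(dW)^2 - dt]; here g = sigma*X
    # and dg/dX = sigma, so the correction is 1/2 sigma^2 X (dW^2 - dt).
    xm += mu * xm * dt + sigma * xm * dw \
          + 0.5 * sigma**2 * xm * (dw**2 - dt)

print("Euler    strong error:", np.mean(np.abs(xe - exact)))
print("Milstein strong error:", np.mean(np.abs(xm - exact)))
```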

3.6.3 Solutions of SDEs as Diffusion Processes

Introduction and Motivation

In the previous sections, some analytical and numerical examples of solutions for SDEs were given. We now want to describe the solution of an SDE by its statistical distribution; in other words, we want to know the transition probability P(s, x, t, B) of the process. We consider the stochastic differential equation

dX(t) = f(t, X(t))dt + g(t, X(t))dW(t), X ∈ R^d.    (3.34)

We know that the analytical solution of this general equation for X(t) can only be found under certain conditions for f(t, X(t)) and g(t, X(t)). The question we want to answer here is the following: can we obtain the transition probability P(s, x, t, B) of X(t) directly from the stochastic differential equation (3.34), or rather from its drift and diffusion functions f(t, X(t)) and g(t, X(t))?

Transition Probability

We begin here by recalling the definition of the transition probability P(s, x, t, B).

Definition 3.18. Let X(t) be a Markov process. The function P(s, x, t, B) is the conditional distribution of the probability P(X(t) ∈ B | X(s) = x) and is called the transition probability. It is best described as the probability that the process is found inside the set B at time t, when at time s < t it was found to be in state X(s) = x (see Figure 3.1). This is the function we eventually are going to find and use to describe the solution of the stochastic differential equation (3.34).

The associated transition probability density p(s, x, t, y) is given by

P(s, x, t, B) = ∫_B p(s, x, t, y) dy.

A Markov process is called shift-invariant if its transition probability P(s, x, t, B) is stationary, i.e.,

P(s + u, x, t + u, B) = P(s, x, t, B).

In this case, the transition probability is only a function of x, t − s, and B, independently of where the time interval of length t − s lies, and it can be written in the following form:

P(s, x, t, B) = P(t − s, x, B).

Diffusion Processes

Generally speaking, a diffusion process is a kind of Markov process with continuous realizations whose transition probability P(s, x, t, B) has certain properties for t ↓ s. The simplest example of a diffusion process is the Brownian motion.

Definition 3.19. A Markov process X with values in R^d is called a diffusion process if its transition probability P(s, x, t, B) satisfies the following conditions for s ∈ [t0, T), x ∈ R^d, and ε > 0:

[Figure 3.1 appears here in the original: a sample path of a geometric Brownian motion with µ = 0.08 and σ = 0.2, starting in state Xs at time s and reaching the set B at a later time. Caption: "Fig. 3.1. Transition probability P(s, x, t, B)."]

a) Continuity:

lim_{t↓s} (1/(t−s)) ∫_{|y−x|>ε} P(s, x, t, dy) = 0.

b) There exists a function f(t, x) in R^d with

lim_{t↓s} (1/(t−s)) ∫_{|y−x|≤ε} (y − x) P(s, x, t, dy) = f(s, x).

c) There exists a symmetric d × d matrix Σ(t, x) with

lim_{t↓s} (1/(t−s)) ∫_{|y−x|≤ε} (y − x)(y − x)^T P(s, x, t, dy) = Σ(s, x).

The functions f(t, x) and Σ(t, x) are called the coefficients of the diffusion process; f is the drift vector and Σ the diffusion matrix. Conditions a), b), and c) have the following meanings (assuming the first and second moments of X exist, the index denoting the time at which the process is evaluated, and at time s we have the boundary condition Xs = x): Condition a) makes it improbable for the process X to have large changes in values in a short time interval. It thereby rules out any Poisson-type effects in the diffusion process:

P(|Xt − Xs| ≤ ε | Xs = x) = 1 − o(t − s).

For condition b) we can write

E_{s,x}(Xt − Xs) = f(s, x)(t − s) + o(t − s) as t ↓ s,

and with condition c)

Cov_{s,x}(Xt − Xs) = E_{s,x}[(Xt − Xs)(Xt − Xs)^T] = Σ(s, x)(t − s) + o(t − s) as t ↓ s.

We can now state that f is the average speed of the stochastic movement and that Σ describes the magnitude of the fluctuation of Xt − Xs around the mean value of X with respect to the transition probability P(s, x, t, B).

Fokker-Planck Equation

After having seen the link between the transition probability and the stochastic process above, it is now necessary to define a method of directly stating the transition probability as a function of the process parameter functions f and Σ. An analytical approach is the following: Let X be a d-dimensional diffusion process with coefficients f and Σ. Then

Xt − Xs = f(s, Xs)(t − s) + g(s, Xs)ξ,

where E_{s,x}ξ = 0 and Cov_{s,x}ξ = (t − s)I, and where we choose an arbitrary matrix g(t, x) such that Σ = gg^T. We can replace ξ by a Brownian motion increment, as we know that Wt − Ws follows the distribution N(0, (t − s)I), which is exactly the distribution of ξ that we were looking for. So,

Xt − Xs = f(s, Xs)(t − s) + g(s, Xs)(Wt − Ws)

or, in differential notation,

dXt = f(t, Xt)dt + g(t, Xt)dWt.    (3.35)

This closely resembles equation (3.34) of the problem description. Note that this form of the SDE (3.35) has been derived directly from the transition probabilities and the characteristics of diffusion processes, without knowing the general solution of (3.34) in advance. This goes to show how the same process can be described either through transition probabilities or through a stochastic differential equation.

Now let X be the solution of the d-dimensional diffusion process

dX(t) = f(t, X(t))dt + g(t, X(t))dW(t),

x. t. y). t. x. x. y) = δ(y − x). x. given the final state (t. y) 1 + tr{g(s. y) = δ(y − x) . y) starting at the initial state (s. x)Pxx (s. t. x. y) + f T (s. t. B) = IB (X(t)) . t.x IB (X(t)) = P (X(t) ∈ B | X(s) = x) . t.3. x. x) and therefore.36) where IB is the indicator function of the set B. x. (3. ∂t 2 i=1 ∂yi2 d /2t . y)pyy (s. y) and we can directly write the so-called Kolmogorov backward equation for the transition probability density function p(s. The transition probability P (s. t. x. For the Wiener process. the differential equation gives the backward evolution with respect to the initial state. t. x)Px (s. y)} = 0 2 lim p(s.20. B) = P (X(t) ∈ B | X(s) = x) are given as the solution of the equations Ps (s. y)g T (t. Then the transition probabilities P (s. (3. y) + f T (t. t. y) ps (s. x)g T (s. x. x. the forward equation for the homogeneous transition density is p(t. pt (s. y)} = 0 2 lim p(s. t. y) = (2πt)−d/2 e−|y−x| 2 1 ∑ ∂2p ∂p = . B) has the density p(s. x. x. t. Finally. in other words. B) 1 + tr{g(s. x. it is called a forward equation. x. x)pxx (s. We further know that P (s. y)py (s. B) = Es. t. x.37) s↑t The reason for calling this equation a backward equation is that the differential operators are considered with regard to the backward variables (s. t. the Fokker-Planck equation gives the forward evolution with respect to the final state (t. B)} = 0 2 P (t. x. x)px (s. y) 1 − tr{g(t. t. x. x. x)g T (s. t. B) + f T (s.6 Solutions of Stochastic Differential Equations 61 whose drift term f and diffusion coefficient g are sufficiently smooth functions. t↓s Example 3. x). t. t.

Example 3.21. The scalar linear stochastic differential equation

dXt = (AX(t) + a)dt + b dW(t), X(0) = c, t ≥ 0,

has the following solution:

X(t) = c e^{At} + (a/A)(e^{At} − 1) + b ∫_0^t e^{A(t−s)} dW(s).

This includes the Ornstein-Uhlenbeck process (a = 0), a deterministic differential equation (b = 0), and the standard Brownian motion (A = a = 0, b = 1, c = 0) as special cases. With b ≠ 0, we can find a solution for the transition probability density p(s, x, t, y) through the Fokker-Planck equation

∂p/∂t (s, x, t, y) + ∂/∂y [(Ay + a) p(s, x, t, y)] − (b²/2) ∂²/∂y² p(s, x, t, y) = 0.

As boundary conditions, we assume that p vanishes (including all its partial derivatives) for |x| → ∞ and |y| → ∞. We know that

p(s, x, t, y) = (2πK_t(s, x))^{−1/2} exp(−(y − m_t(s, x))²/(2K_t(s, x)))

holds with

m_t(s, x) = x e^{A(t−s)} + (a/A)(e^{A(t−s)} − 1)
K_t(s, x) = (b²/(2A))(e^{2A(t−s)} − 1),

i.e., P(s, x, t, B) is the normal distribution N(m_t(s, x), K_t(s, x)).
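The transition density of Example 3.21 can be verified by simulation (an illustrative sketch with hypothetical coefficients, not from the original text): propagate many Euler paths of dX = (AX + a)dt + b dW and compare the sample mean and variance at time t with m_t(0, c) and K_t(0, c):

```python
import numpy as np

rng = np.random.default_rng(8)
A, a, b, c = -1.0, 0.5, 0.7, 2.0   # hypothetical coefficients, X(0) = c
t, n, paths = 1.5, 1_500, 50_000
dt = t / n

X = np.full(paths, c)
for _ in range(n):   # Euler scheme for dX = (A X + a) dt + b dW
    X += (A * X + a) * dt + b * rng.normal(0.0, np.sqrt(dt), size=paths)

# Fokker-Planck solution: X(t) ~ N(m_t, K_t) with s = 0, x = c.
m_t = c * np.exp(A * t) + (a / A) * (np.exp(A * t) - 1.0)
K_t = b**2 / (2.0 * A) * (np.exp(2.0 * A * t) - 1.0)
print("mean:", X.mean(), " (theory:", m_t, ")")
print("var :", X.var(),  " (theory:", K_t, ")")
```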

3.7 Stability

3.7.1 Introduction

Every autonomous linear time-invariant dynamic system ẋ(t) = Ax(t) has either exactly one equilibrium state or infinitely many equilibrium states. If all of the eigenvalues of the matrix A have strictly negative real parts, x(t) ≡ 0 is the only equilibrium state. This equilibrium state is asymptotically stable because the transition matrix asymptotically vanishes:

lim_{t→∞} e^{At} = 0.

In other words, the state vector x(t) asymptotically vanishes for every initial state x(0) = x0 ∈ R^n; we say the linear time-invariant system is asymptotically stable. If at least one of the eigenvalues of the matrix A is zero, then every vector x in the null space of A, x ∈ N(A), is an equilibrium state. The system may be unstable or stable (but not asymptotically stable), depending on the real part of the dominant pole and the multiplicity of the vanishing eigenvalue of A. The stability properties of a linear time-varying dynamic system are a more complicated issue because the (quasi-stationary) stability properties can change as a function of the time t. This is not pursued here.

An autonomous nonlinear time-invariant dynamic system

ẋ(t) = f(x(t))   (3.38)

may have an arbitrary number of equilibrium states. Each one of them is characterized by f(x) = 0. Some of these equilibrium states may be asymptotically stable, or marginally stable, or unstable. Of course, the system may have no equilibrium state at all. Again, for a time-varying nonlinear dynamic system, the situation is more complicated. The underlying definition of the stability of an equilibrium state is as follows:

Definition 3.22. An equilibrium state x* of the nonlinear time-invariant system (3.38) is called asymptotically stable if, for some value r > 0 and for all initial states x(0) with ∥x(0) − x*∥ < r, lim_{t→∞} x(t) = x*. It is called a stable equilibrium if, for some value R > 0, the state remains bounded by ∥x(t) − x*∥ < R for all t > 0, and it is called an unstable equilibrium otherwise.

Example 3.23. Think of a roller-coaster where the motion of the cart is subject to friction. The state vector consists of the position of the cart and the cart's velocity. Every bottom of a strictly convex (local) valley corresponds to an asymptotically stable equilibrium, every top of a strictly concave (local) hill corresponds to an unstable equilibrium, and every point on a level plateau corresponds to a marginally stable equilibrium.

For a stochastic autonomous nonlinear time-invariant dynamic system

dX(t) = f(X(t))dt + g(X(t))dW(t),

the situation is even more complex due to the omnipresent white noise driving the system: if the state x* is an asymptotically stable equilibrium (of the deterministic system), there is no guarantee that the state will remain close to x*, because the influences of the noise may eventually drive the system "over the nearest hill". Therefore, we need to refine the concept of stability and define different stability measures such as asymptotic stability, exponential stability, and stability in probability.

3.7.2 Moment Method for Stochastic Systems

The moment method for stochastic systems is a way to study the stability of a stochastic system whose solution is already known. Since the solution must be known in advance, this might be a disadvantage of the method, but it provides a very intuitive view on stability. It is an obvious expansion of the known deterministic stability measure and it is very interesting nonetheless. In the following, we want to know about the stability of the p-th moment of a stochastic system; for p = 1 we regard the special case of stability of the expected value of the process.

Consider the stochastic differential equation

dX(t) = f(t, X(t))dt + g(t, X(t))dW(t),   X(t0) = x0.   (3.39)

In the following, we assume f(t, 0) ≡ 0 and g(t, 0) ≡ 0 such that the trivial solution x(t) = 0 holds for x0 = 0. Several different stability statements are possible:

Definition 3.24. The process is called

1. p-stable if, for each ϵ > 0, there is a δ > 0 such that

   sup_{t0 ≤ t < ∞} E|X(t)|^p ≤ ϵ

   for all X(t0) = x0 with |x0| ≤ δ.

2. asymptotically p-stable if it is p-stable and there is an ϵ > 0 such that

   lim_{t→∞} E|X(t)|^p = 0

   for all X(t0) = x0 with |x0| ≤ ϵ. This is a stronger type of stability, as in this case, when the system is disturbed from its nominal position, we know that it will converge back to this point as t → ∞.

3. exponentially p-stable if there are two positive constants, c1 and c2, such that

   E|X(t)|^p ≤ c1 |x0|^p e^{−c2 (t − t0)}

   for all X(t0) = x0. We call c2 the rate of exponential convergence. Expressed in words, exponential stability means that the p-th moment of the state vector of an exponentially p-stable system converges to the origin faster than an exponential function.

4. stable in probability if for arbitrary ϵ1 > 0 and ϵ2 > 0 there is a δ > 0 such that

   P{ sup_{t0 ≤ t < ∞} |X(t)| ≤ ϵ1 } ≥ 1 − ϵ2,   |x0| ≤ δ.

   Here, we know that the state vector will stay within certain limits for all times t with a given probability.

This stability description for the p-th moment of a stochastic system comes close to the basic stability description given in Definition 3.22. As we shall see, for stochastic systems we can look at the asymptotic stability of any p-th moment with this method. We call the special cases p = 1 and p = 2 mean-stability and mean-square stability, respectively.
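For a case where the solution is known, the p = 2 moment can be checked directly. The sketch below (an added illustration, not from the original text) uses geometric Brownian motion dX = aX dt + σX dW, for which E[X(t)²] = x0² e^{(2a+σ²)t}, so the second moment is exponentially 2-stable exactly when 2a + σ² < 0:

import numpy as np

# Mean-square stability of dX = a*X dt + sigma*X dW:
# E[X_t**2] = x0**2 * exp((2*a + sigma**2) * t), stable iff 2*a + sigma**2 < 0.
a, sigma, x0 = -1.0, 1.2, 1.0          # here 2*a + sigma**2 = -0.56 < 0
rng = np.random.default_rng(2)
t, n_steps, n_paths = 3.0, 3000, 20_000
dt = t / n_steps

X = np.full(n_paths, x0)
for _ in range(n_steps):
    X = X + a*X*dt + sigma*X*np.sqrt(dt)*rng.standard_normal(n_paths)

print("Monte Carlo E|X(t)|^2:", np.mean(X**2))
print("analytic   E|X(t)|^2:", x0**2 * np.exp((2*a + sigma**2)*t))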

3.7.3 Lyapunov's Second Method

Lyapunov Method for Deterministic Systems

In the first step, we want to discuss the stability of deterministic systems. We use the stability theory of Lyapunov, which can then be expanded to suit the special needs of stochastic systems. We begin with a deterministic differential equation for the d-dimensional state x(t):

ẋ(t) = f(t, x(t)),   x(t0) = x0,   t ≥ t0.   (3.40)

We set f(t, 0) = 0 such that the solution x(t) ≡ 0 holds for x0 = 0, i.e., x = 0 is an equilibrium state. We call this equilibrium stable if for every ϵ > 0 there exists a δ(ϵ, t0) > 0 such that sup_{t0 ≤ t < ∞} |x(t)| ≤ ϵ as long as x(t0) = x0 and |x0| ≤ δ. The difficulty with this definition of stability is that we need to know the solution of equation (3.40) before being able to investigate stability. A second method, found in 1892 by A. M. Lyapunov, allows us to prove stability without necessarily knowing the solution of the given system.

Definition 3.25. We call a scalar function v(x) positive-definite if v(0) = 0 and v(x) > 0 for all x ≠ 0.

The function is defined in the spherical region U_h = {x : |x| ≤ h} ⊂ R^d, h > 0. We call the function radially unbounded if inf_t v(t, x) → ∞ for |x| → ∞.

We assume that x(t) is a solution to equation (3.40) and that v(t, x) is a positive-definite function with continuous partial first-order derivatives. We further regard V(t) = v(t, x(t)) with its derivative

dV(t)/dt = ∂v/∂t + Σ_{i=1}^d f_i(t, x) ∂v/∂x_i.

If dV(t)/dt ≤ 0, we know that x(t) follows a path such that the value of V(t) does not increase. Thus, the "distance" of x(t) to its equilibrium, "measured" by v(t, x(t)), does not increase, and the system's equilibrium is called stable. More generally, we can say that the equilibrium is stable if there exists a positive-definite function v(t, x) such that the first-order derivative satisfies

dv(t, x)/dt = ∂v/∂t + Σ_{i=1}^d f_i(t, x) ∂v/∂x_i ≤ 0

on the trajectories of the differential equation (3.40). A function v(t, x) which fulfills the conditions for a stability proof is called a Lyapunov function.
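A quick numerical illustration of this condition (added here; the system and the candidate function are arbitrary choices) evaluates dv/dt = v'(x) f(x) along the vector field for v(x) = x²:

import numpy as np

# Deterministic Lyapunov check for x' = f(x) with v(x) = x**2:
# dv/dt = v'(x) * f(x) = 2*x*f(x) must be <= 0 on the region of interest.
f = lambda x: -x - x**3            # f(0) = 0, so x = 0 is an equilibrium
dv_dt = lambda x: 2.0 * x * f(x)   # = -2*x**2 - 2*x**4 <= 0

x = np.linspace(-5.0, 5.0, 1001)
print("max dv/dt on grid:", dv_dt(x).max())  # <= 0 => equilibrium is stable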

Lyapunov Method for Stochastic Systems

We now need to expand the stability analysis according to Lyapunov to the world of stochastic systems. The first problem is to create a suitable definition of stability for stochastic systems and to change the definition of the Lyapunov function v(t, x) and the stability condition dv/dt ≤ 0 accordingly. We consider the stochastic differential equation

dX(t) = f(t, X(t))dt + g(t, X(t))dW(t),   X(t0) = c,   (3.41)

whose solution X(t) is a d-dimensional diffusion process with drift f(t, x) and diffusion matrix Σ(t, x) = g(t, x) g^T(t, x). We set f(t, 0) = 0 and g(t, 0) = 0 for all t ≥ t0, such that the equilibrium X(t) ≡ 0 is the solution of the differential equation with c = 0. Let X(t) be the solution to equation (3.41) and v(t, x) a positive-definite function with continuous partial derivatives. The process V(t) = v(t, X(t)) then has a stochastic differential according to the Itô differential operator

L(·) = ∂(·)/∂t + f^T(t, x) ∂(·)/∂x + (1/2) tr{ g(t, x) g^T(t, x) ∂²(·)/∂x² }.

The differentiation of V(t) = v(t, X_t) yields

dV(t) = (Lv(t, X(t)))dt + Σ_{i=1}^d Σ_{j=1}^m v_{x_i}(t, X(t)) g_{ij}(t, X(t)) dW_j(t).

For a stable equilibrium, V(t) should not increase, i.e., dV(t) ≤ 0 for all t ≥ t0. Due to the last term with the stochastic property, however, this cannot be easily guaranteed in most cases. Therefore, we will only demand that X(t) stays near the equilibrium on average. This results in

E(dV(t)) ≤ 0 for all t ≥ t0.

With E(dV(t)) = E(Lv(t, X(t))dt), we finally get the condition for stochastic stability in the Lyapunov sense:

Lv(t, x) ≤ 0.

We call the function v(t, x) the Lyapunov function belonging to a particular equilibrium state of the stochastic differential equation (3.41).

Notes and Comments

The literature on stochastic calculus and stochastic differential equations is vast. Fairly straightforward introductions can be found in [6], [23], and [27]. Mathematically more rigorous textbooks are [28], [26], and [3].

4

Model-Based Filtering

She: How often does a Kalman filter need to be cleaned?
He: Never.
She: Why not?
He: Because it is optimal.
She: Can you get me one for our coffee machine?
He: No. It is something theoretical.

Dialogue of an unknown couple

Filtering and estimation are two closely related concepts. When a stochastic dynamic system is modeled, noise is present in the system itself as well as in the measurements; this is the case in almost every real-world application. Furthermore, we usually are not able to measure every state variable of the system. In the deterministic case, the Luenberger observer is used for reconstructing the missing information from the measurements and the system dynamics. If we need an estimation of the state of such a noisy system, we make use of the concepts of filtering. We therefore refer to a filter as a state estimator for a stochastic dynamic system. Filtering techniques can also be used for estimating unknown parameters of a deterministic or stochastic dynamic system.

Naturally, we demand some favorable properties for the filter. Most importantly, we want the estimation to be bias-free, i.e., the estimation error should have zero expectation in order to prevent any systematic errors. In addition, the scattering of the estimate around the true value of the state should be as small as possible in order to have more confidence in the estimations. This is achieved by minimizing the variance of the estimation error.

In this chapter, we abandon the rather cumbersome notation of stochastic processes by upper-case letters. We shall use stochastic processes with independent increments w1(.) and w2(.) and switch back and forth between the mathematically precise description of these (normalized) Brownian motions by their increments and the sloppy description by the corresponding white noises v̇(.) and ṙ(.) at the input and the output, respectively:

Q^{1/2}(t) dw1(t) = [v̇(t) − u(t)] dt   (4.1)
R^{1/2}(t) dw2(t) = [ṙ(t) − r(t)] dt,   (4.2)

where Q^{1/2}(t) and R^{1/2}(t) are the volatility parameters.

For white noises, it is customary to only notate the "covariance" or "intensity" parameters; for stationary white noises, Q and R are simply the spectral density matrices. The concept of model-based filtering was published by Kalman and Bucy in the early 1960s.

4.1 Linear Filtering

By linear filtering, we refer to problems with linear system dynamics. Consequently, the dynamics of the filter are linear as well.

4.1.1 The Kalman Filter

Continuous-time version of the Kalman filter

Consider the following linear time-varying dynamic system of order n which is driven by the m-vector-valued white noise v̇(.). Its initial state x(t0) is a random vector ξ, and its p-vector-valued output y(.) is corrupted by the additive white noise ṙ(.).

System description in the mathematically precise form:

dx(t) = A(t)x(t)dt + B(t)dv(t)   (4.3)
      = [A(t)x(t) + B(t)u(t)]dt + B(t)Q^{1/2}(t)dw1(t)   (4.4)
x(t0) = ξ   (4.5)
y(t)dt = C(t)x(t)dt + dr(t)
       = [C(t)x(t) + r(t)]dt + R^{1/2}(t)dw2(t),   (4.6)

where

ξ : N(x0, Σ0)   (4.7)
dv(t) : N(u(t)dt, Q(t)dt)
dr(t) : N(r(t)dt, R(t)dt).   (4.8)

System description in the engineer's form¹:

ẋ(t) = A(t)x(t) + B(t)v̇(t)   (4.9)
x(t0) = ξ   (4.10)
y(t) = C(t)x(t) + ṙ(t),   (4.11)

where

ξ : N(x0, Σ0)   (4.12)
v̇(t) : N(u(t), Q(t))   (4.13)
ṙ(t) : N(r(t), R(t)).   (4.14)

Note that in the last two lines, we have deleted the factor dt which ought to multiply Q(t) and R(t).

¹ Of course, an engineer would not use symbols such as v̇ and ṙ to denote white noise processes. — Just consider the derivative dots as concessions to mathematicians!

v. Requiring that the matrix R(t) must be positive-definite means that the measurement y(t) cannot contain any perfectly uncorrupted information. but there may be a better nonlinear one.21) (4. and r are mutually independent and Gaussian. (4. respectively) are assumed to be mutually independent.4. In addition. the Kalman filter is the best linear filter. v.1.16) (4.2. The error of the state estimate has infimal covariance matrix: Σopt (t) ≤ Σsubopt (t).22) Remark 4.17) These requirements are pretty obvious. the optimal filter is the following linear dynamic system: with x b˙ (t) = A(t)b x(t) + B(t)u(t) + H(t)[y(t)−r(t)−C(t)b x(t)] = [A(t)− H(t)C(t)]b x(t) + B(t)u(t) + H(t)[y(t)−r(t)] x b(t0 ) = x0 (4. (4. Theorem 4. . we impose the following requirements: Σ0 = Σ0T ≥ 0 Q(t) = Q(t) ≥ 0 R(t) = R(t)T > 0 .19) H(t) = Σ(t)C T (t)R−1 (t) .15) (4.18) (4. T (4. or r are not Gaussian. The Kalman Filter With the above-mentioned assumptions that ξ.1 Linear Filtering 69 The random initial state ξ and the white noise processes v˙ and r˙ (or the normalized Wiener processes w1 and w2 . The filtering problem is stated as follows: Find the optimal filter with the state vector x b which is optimal in the following sense: • • The state estimation is bias-free: E{x(t) − x b(t)} ≡ 0.20) where Σ(t) is the covariance matrix of the error x b(t)−x(t) of the state estimate x b(t) satisfying the following matrix Riccati differential equation: ˙ Σ(t) = A(t)Σ(t) + Σ(t)AT (t) − Σ(t)C T (t)R−1 (t)C(t)Σ(t) + B(t)Q(t)B T (t) Σ(t0 ) = Σ0 . If ξ.

A simple proof of Theorem 4.1: The estimation error e(t) = x(t) − x̂(t) satisfies the differential equation

ė = ẋ − x̂̇
  = Ax + Bu + B(v̇ − u) − [A − HC]x̂ − Bu − HCx − H(ṙ − r)
  = [A − HC]e + B(v̇ − u) − H(ṙ − r).   (4.23)

Since the white noise processes v̇ and ṙ are uncorrelated, the covariance matrix Σ(t) of the state estimation error e(t) is governed by the matrix differential equation

Σ̇ = [A − HC]Σ + Σ[A − HC]^T + BQB^T + HRH^T   (4.24)
  = AΣ + ΣA^T + BQB^T − ΣC^T R^{−1}CΣ + [H − ΣC^T R^{−1}] R [H − ΣC^T R^{−1}]^T   (4.25)
  ≥ AΣ + ΣA^T + BQB^T − ΣC^T R^{−1}CΣ.   (4.26)

Obviously, Σ̇(t) is infimized at all times t for

H(t) = Σ(t)C^T(t)R^{−1}(t).   (4.27)

Hence, its integral Σ(t) is infimized at all times as well. The optimal error covariance matrix of the state estimation error e(t) thus satisfies the following matrix Riccati differential equation:

Σ̇(t) = A(t)Σ(t) + Σ(t)A^T(t) + B(t)Q(t)B^T(t) − Σ(t)C^T(t)R^{−1}(t)C(t)Σ(t)
Σ(t0) = Σ0.   (4.28)

A more sophisticated proof of Theorem 4.1: Consider again the differential equation (4.24) for the covariance matrix Σ(t) of the state estimation error e(t):

Σ̇ = [A − HC]Σ + Σ[A − HC]^T + BQB^T + HRH^T.

Its first differential with respect to H, with increment dH, can be formulated as follows:

dΣ̇(H, dH) = (∂Σ̇(H)/∂H) dH
           = −dH CΣ − ΣC^T dH^T + dH RH^T + HR dH^T
           = U{ dH [HR − ΣC^T]^T },   (4.29)

where T is the matrix transposing operator and U is the operator which adds the transposed matrix to its matrix argument²: T : N ↦ N^T and U : M ↦ M + M^T. In order for Σ̇(H) to

have an infimum at some value of the matrix H, it is necessary that the first differential vanishes at H [4]. This is satisfied if and only if the matrix in the square brackets vanishes, i.e., for

H(t) = Σ(t)C^T(t)R^{−1}(t).

For Kalman filters operating over very long time intervals, it is therefore attractive to use the time-invariant Kalman filter:

Corollary 4.3. The Time-Invariant Kalman Filter

For t ∈ [0, ∞): If the dynamic system [A, B, C] is time-invariant, if the two intensity matrices Q and R are constant, if the system [A, B] is completely controllable [16, Ch. 4.6], and if the system [A, C] is completely observable [16, Ch. 4.7], then the solution of the matrix Riccati differential equation becomes asymptotically constant (even if A is not a stability matrix): lim_{t→∞} Σ(t) = Σ∞, where, again, Σ(t) is the solution of the Riccati differential equation (4.27) with the boundary condition (4.28). In the time-invariant/quasi-stationary case, use the following time-invariant Kalman filter:

x̂̇(t) = Ax̂(t) + Bu(t) + H[y(t) − r(t) − Cx̂(t)]   (4.30)
      = [A − HC]x̂(t) + Bu(t) + H[y(t) − r(t)]   (4.31)
x̂(t0) = x0

with

H = Σ∞ C^T R^{−1},   (4.32)

where Σ∞ ≥ 0 is the covariance matrix of the error x̂(t) − x(t) of the state estimate x̂(t), satisfying the following algebraic matrix Riccati equation:

0 = AΣ∞ + Σ∞ A^T − Σ∞ C^T R^{−1} CΣ∞ + BQB^T.   (4.33)

Note that the covariance matrix Σ∞ of the state estimation error is positive-definite (rather than only positive-semidefinite).

Example 4.4. System of first order

Consider the following stochastic system of first order with the random initial condition ξ : N(x0, Σ0) and the uncorrelated white noises v̇ : N(0, Q) and ṙ : N(0, R):

ẋ(t) = ax(t) + b[u(t) + v̇(t)]
x(0) = ξ
y(t) = x(t) + ṙ(t).

Find the time-invariant Kalman filter!

The resulting time-invariant Kalman filter is described by the following equations:

x̂̇(t) = −√(a² + b²Q/R) x̂(t) + bu(t) + ( a + √(a² + b²Q/R) ) y(t)
x̂(0) = x0.

Notice that the pole of the Kalman filter is at the left of −|a| in the complex plane, irrespective of whether the given stochastic system is asymptotically stable (a < 0) or unstable (a > 0). In the former case, the filter gain H is smaller than in the latter. Also note that the ratio Q/R is relevant rather than the absolute sizes of the intensities Q and R.
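For Example 4.4, the algebraic Riccati equation (4.33) is scalar, 0 = 2aΣ∞ − Σ∞²/R + b²Q, so the quantities above can be verified numerically. A small Python sketch (added illustration; the parameter values are arbitrary):

import numpy as np

# Scalar time-invariant Kalman filter of Example 4.4:
# 0 = 2*a*Sigma - Sigma**2/R + b**2*Q, H = Sigma/R, filter pole at a - H.
a, b, Q, R = 1.0, 2.0, 0.5, 0.1         # an unstable plant, a > 0

# positive root of Sigma**2/R - 2*a*Sigma - b**2*Q = 0:
Sigma_inf = R * (a + np.sqrt(a**2 + b**2 * Q / R))
H = Sigma_inf / R
print("Sigma_inf  :", Sigma_inf)
print("gain H     :", H)                 # = a + sqrt(a**2 + b**2*Q/R)
print("filter pole:", a - H)             # = -sqrt(a**2 + b**2*Q/R) < -|a|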

Continuous-time/discrete-time version of the Kalman filter

In the continuous-time/discrete-time problem we consider the case of continuous-time system dynamics and discrete-time measurements. The continuous-time measurement (4.11), which is corrupted by the white noise error ṙ(t) with infinite covariance, must be replaced by an "averaged" discrete-time measurement

y_k = C_k x(t_k) + r̃_k
r̃_k : N(r_k, R_k)
Cov(r̃_i, r̃_j) = R_i δ_ij.

Therefore, we need a discrete-time version of the Kalman filter for digital signal processing. — As a consequence, the continuous-time/discrete-time Kalman filter alternatingly performs update steps at the sampling times t_k, processing the latest measurement information y_k, and open-loop extrapolation steps between the times t_k and t_{k+1} with no measurement information available:

Lemma 4.5. The Continuous-Time/Discrete-Time Kalman Filter

At any sampling time t_k we consider the state estimate x̂(t_k|t_{k−1}) and the corresponding error covariance matrix Σ(t_k|t_{k−1}) before the new measurement y_k has been processed, and the state estimate x̂(t_k|t_k) and the corresponding error covariance matrix Σ(t_k|t_k) after the new measurement y_k has been processed.

Update at time t_k:

x̂(t_k|t_k) = x̂(t_k|t_{k−1}) + Σ(t_k|t_{k−1})C_k^T [C_k Σ(t_k|t_{k−1})C_k^T + R_k]^{−1} [y_k − r_k − C_k x̂(t_k|t_{k−1})]   (4.37)
Σ(t_k|t_k) = Σ(t_k|t_{k−1}) − Σ(t_k|t_{k−1})C_k^T [C_k Σ(t_k|t_{k−1})C_k^T + R_k]^{−1} C_k Σ(t_k|t_{k−1})   (4.38)

or, equivalently:

Σ^{−1}(t_k|t_k) = Σ^{−1}(t_k|t_{k−1}) + C_k^T R_k^{−1} C_k.   (4.39)

Continuous-time extrapolation from t_k to t_{k+1} with the initial conditions x̂(t_k|t_k) and Σ(t_k|t_k):

x̂̇(t|t_k) = A(t)x̂(t|t_k) + B(t)u(t)   (4.40)
Σ̇(t|t_k) = A(t)Σ(t|t_k) + Σ(t|t_k)A^T(t) + B(t)Q(t)B^T(t).   (4.41)

Assuming that the Kalman filter starts with an update at the initial time t0, this Kalman filter is initialized at t0 as follows:

x̂(t0|t_{−1}) = x0   (4.42)
Σ(t0|t_{−1}) = Σ0.   (4.43)

Proof of Lemma 4.5 (update step): A bias-free estimate x̂_{k|k} can be obtained in the following form:

x̂_{k|k} = x̂_{k|k−1} + H_k [y_k − r_k − C_k x̂_{k|k−1}],   (4.44)

where H_k is an arbitrary observer gain matrix. Therefore, the estimation error e = x − x̂ changes as follows due to the processing of the measurement y_k:

e_{k|k} = e_{k|k−1} − H_k [C_k e_{k|k−1} + r̃_k − r_k] = [I − H_k C_k] e_{k|k−1} − H_k [r̃_k − r_k].   (4.45)

By assumption, the white noise random vector r̃_k is independent of e_{k|k−1}. Therefore, the covariance matrix Σ of the estimation error e changes as a consequence of the processing of the new measurement y_k in the following way:

Σ_{k|k} = [I − H_k C_k] Σ_{k|k−1} [I − H_k C_k]^T + H_k R_k H_k^T.   (4.46)

The first derivative of Σ_{k|k} with respect to H_k is

∂Σ_{k|k}/∂H_k = U[ H_k C_k Σ_{k|k−1} C_k^T + H_k R_k − Σ_{k|k−1} C_k^T ]^T.   (4.47)

It vanishes if and only if the following optimal observer gain matrix is chosen:

H_k = Σ_{k|k−1} C_k^T [C_k Σ_{k|k−1} C_k^T + R_k]^{−1}.   (4.48)

This choice yields the infimal error covariance matrix

Σ_{k|k} = Σ_{k|k−1} − Σ_{k|k−1} C_k^T [C_k Σ_{k|k−1} C_k^T + R_k]^{−1} C_k Σ_{k|k−1}.   (4.49)

Discrete-time version of the Kalman filter

In the discrete-time version of the problem, the system dynamics as well as the measurements are modeled in discrete time. We consider the following discrete-time stochastic system:

x_{k+1} = F_k x_k + G_k ṽ_k   (4.50)
x_0 = ξ   (4.51)
y_k = C_k x_k + r̃_k   (4.52)

with

ξ : N(x0, Σ0)   (4.53)
ṽ_k : N(u_k, Q_k)   (4.54)
Cov(ṽ_i, ṽ_j) = Q_i δ_ij   (4.55)
r̃_k : N(r_k, R_k)   (4.56)
Cov(r̃_i, r̃_j) = R_i δ_ij   (4.57)
ξ, ṽ, r̃ : mutually independent.   (4.58)

Of course, it is possible to transform the continuous-time version of the linear filtering problem into this discrete-time version. We have the following (approximate) "zero-order-hold-equivalence" correspondences to the continuous-time stochastic system:

F_k = Φ(t_{k+1}, t_k)   (transition matrix)   (4.59)
G_k = ∫_{t_k}^{t_{k+1}} Φ(t_{k+1}, t) B(t) dt   (zero-order-hold equivalence)   (4.60)
C_k = C(t_k)   (4.61)
Q_k = Q(t_k) / (t_{k+1} − t_k)   (4.62)
R_k = R(t_k) / Δt_k,   Δt_k = averaging time at time t_k.   (4.63)

For this kind of problem, we have:

Lemma 4.6. The Discrete-Time Kalman Filter

Update at time t_k:

x̂_{k|k} = x̂_{k|k−1} + Σ_{k|k−1} C_k^T [C_k Σ_{k|k−1} C_k^T + R_k]^{−1} [y_k − r_k − C_k x̂_{k|k−1}]   (4.64)
Σ_{k|k} = Σ_{k|k−1} − Σ_{k|k−1} C_k^T [C_k Σ_{k|k−1} C_k^T + R_k]^{−1} C_k Σ_{k|k−1}   (4.65)

or, equivalently:

Σ_{k|k}^{−1} = Σ_{k|k−1}^{−1} + C_k^T R_k^{−1} C_k.   (4.66)

Extrapolation from t_k to t_{k+1}:

x̂_{k+1|k} = F_k x̂_{k|k} + G_k u_k   (4.67)
Σ_{k+1|k} = F_k Σ_{k|k} F_k^T + G_k Q_k G_k^T.   (4.68)

Initialization at time t0:

x̂_{0|−1} = x0   (4.69)
Σ_{0|−1} = Σ0.   (4.70)
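The update/extrapolation recursion of Lemma 4.6 fits in a few lines of code. Below is a Python sketch (an added illustration; the matrices and noise levels are invented, and u_k = 0, r_k = 0 for brevity) of one full filter pass over simulated data:

import numpy as np

# Discrete-time Kalman filter (Lemma 4.6) for a time-invariant system.
rng = np.random.default_rng(3)
F = np.array([[1.0, 0.1], [0.0, 1.0]])
G = np.array([[0.005], [0.1]])
C = np.array([[1.0, 0.0]])
Qk, Rk = np.array([[0.2]]), np.array([[0.05]])

x = np.array([0.0, 1.0])                  # true initial state
x_hat, Sigma = np.zeros(2), np.eye(2)     # filter initialization (4.69), (4.70)
for k in range(200):
    # simulate plant and measurement
    x = F @ x + (G * rng.normal(0.0, np.sqrt(Qk[0, 0]))).ravel()
    y = C @ x + rng.normal(0.0, np.sqrt(Rk[0, 0]))
    # update (4.64), (4.65)
    S = C @ Sigma @ C.T + Rk
    K = Sigma @ C.T @ np.linalg.inv(S)
    x_hat = x_hat + K @ (y - C @ x_hat)
    Sigma = Sigma - K @ C @ Sigma
    # extrapolation (4.67), (4.68) with u_k = 0
    x_hat = F @ x_hat
    Sigma = F @ Sigma @ F.T + G @ Qk @ G.T

print("final estimation error:", x - x_hat)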

Example 4.7. System of first order

Consider again the first-order stochastic system of Example 4.4 with the random initial condition ξ : N(x0, Σ0) and the uncorrelated white noises v̇ : N(0, Q) and ṙ : N(0, R):

ẋ(t) = ax(t) + b[u(t) + v̇(t)]
x(0) = ξ
y(t) = x(t) + ṙ(t).

In order to obtain reasonable results, the data are sampled with a constant sampling time T ≪ τ = 1/|a|. — Find the discrete-time Kalman filter!

Using the zero-order-hold equivalence, the discrete-time version of the stochastic system is

x_{k+1} = F x_k + G[u_k + ṽ_k]
x_0 = ξ
y_k = x_k + r̃_k

with

F = e^{aT}
G = (b/a)(e^{aT} − 1)
ṽ_k : N(0, Q̃),   Q̃ = Q/T
r̃_k : N(0, R̃).

The covariance matrix R̃ of the discrete white noise signal r̃_k could be obtained from the intensity R of the continuous-time white noise ṙ as R̃ = R/ΔT, where ΔT is some averaging time of a continuous-time measurement. In practice, it is usually best to model the variance R̃ of the discrete measurement error by considering the physics of the discrete measurement device.

The discrete-time Kalman filter is described by the following equations:

Update at time t_k:

x̂_{k|k} = x̂_{k|k−1} + ( Σ_{k|k−1} / (Σ_{k|k−1} + R̃) ) [y_k − x̂_{k|k−1}]
Σ_{k|k} = Σ_{k|k−1} − Σ_{k|k−1}² / (Σ_{k|k−1} + R̃).

(Approximate) extrapolation from t_k to t_{k+1}:

x̂_{k+1|k} = F x̂_{k|k} + G u_k
Σ_{k+1|k} = F² Σ_{k|k} + G² Q̃.

Initialization at t0:

x̂_{0|−1} = x0
Σ_{0|−1} = Σ0.

— The tedious algebra is left to the reader.

Notice that we have obtained a time-varying discrete Kalman filter. Since we have a time-invariant stochastic system, stationary noise processes, and a constant sampling time, we might as well use the time-invariant discrete Kalman filter (except perhaps during the start-up transient), where both Σ_{k|k−1} and Σ_{k|k} become constants.

4.1.2 The Extended Kalman Filter

Continuous-time version of the extended Kalman filter

Obviously, there are many phenomena whose dynamics are not linear. Unfortunately, there is no closed theory for nonlinear problems as there is for linear problems. When the nonlinear model is too involved, a linearization of the problem is the only way to tackle the filtering problem. Since the linearized problem may not be a good approximation, we stick to the following philosophy:

• Where it does not hurt: Analyze the nonlinear system.
• Where it hurts: Use linearization.

We consider the following nonlinear stochastic system:

ẋ(t) = f(x(t), u(t), t) + B(t)v̇(t)   (4.71)
x(t0) = ξ   (4.72)
y(t) = g(x(t), t) + ṙ(t)   (4.73)

with

ξ : N(x0, Σ0)   (4.74)
v̇ : N(v(t), Q(t))   (4.75)
ṙ : N(r(t), R(t)).   (4.76)

The initial state ξ and the white noise processes v̇ and ṙ are assumed to be mutually independent. The above-mentioned pragmatic approach leads to the extended Kalman filter:

Lemma 4.8. The extended Kalman filter

Dynamics of the state estimation:

x̂̇(t) = f(x̂(t), u(t), t) + B(t)v(t) + H(t)[y(t) − r(t) − g(x̂(t), t)]   (4.77)
x̂(t0) = x0   (4.78)

with

H(t) = Σ(t)C^T(t)R^{−1}(t).   (4.79)

The "error covariance" matrix Σ(t) must be calculated in real-time using the following matrix Riccati differential equation:

Σ̇(t) = A(t)Σ(t) + Σ(t)A^T(t) − Σ(t)C^T(t)R^{−1}(t)C(t)Σ(t) + B(t)Q(t)B^T(t)   (4.80)
Σ(t0) = Σ0.   (4.81)

The matrices A(t) and C(t) correspond to the dynamics matrix and the output matrix, respectively, of the linearization of the nonlinear system around the estimated trajectory:

A(t) = ∂f(x̂(t), u(t), t)/∂x   (4.82)
C(t) = ∂g(x̂(t), t)/∂x.   (4.83)

Because the dynamics of the "error covariance matrix" Σ(t) correspond to the linearized system, we face the following problems:

• This filter is not optimal.
• The state estimate will be biased.
• The reference point for the linearization is questionable.
• The matrix Σ(t) is only an approximation of the state error covariance matrix.
• The state vectors x(t) and x̂(t) are not Gaussian even if ξ, v̇, and ṙ are.

Continuous-time/discrete-time version of the extended Kalman filter

Again, the continuous-time measurement corrupted with the white noise ṙ with infinite covariance,

y(t) = g(x(t), t) + ṙ(t),   (4.84)

must be replaced by an "averaged" discrete-time measurement

y_k = g(x(t_k), t_k) + r̃_k   (4.85)
r̃_k : N(r_k, R_k)   (4.86)
Cov(r̃_i, r̃_j) = R_i δ_ij.   (4.87)

This version of the extended Kalman filter works as follows:

Lemma 4.9. The continuous-time/discrete-time extended Kalman filter

Update at time t_k:

x̂(t_k|t_k) = x̂(t_k|t_{k−1}) + Σ(t_k|t_{k−1})C_k^T [C_k Σ(t_k|t_{k−1})C_k^T + R_k]^{−1} [y_k − r_k − g(x̂(t_k|t_{k−1}), t_k)]   (4.88, 4.89)
Σ(t_k|t_k) = Σ(t_k|t_{k−1}) − Σ(t_k|t_{k−1})C_k^T [C_k Σ(t_k|t_{k−1})C_k^T + R_k]^{−1} C_k Σ(t_k|t_{k−1})   (4.90)

Extrapolation between t_k and t_{k+1}:

x̂̇(t|t_k) = f(x̂(t|t_k), u(t), t) + B(t)v(t)   (4.91)
Σ̇(t|t_k) = A(t)Σ(t|t_k) + Σ(t|t_k)A^T(t) + B(t)Q(t)B^T(t)   (4.92)

Initialization at t0:

x̂(t0|t_{−1}) = x0   (4.95)
Σ(t0|t_{−1}) = Σ0.   (4.96)

Again, the matrices A(t) and C_k are the dynamics matrix and the output matrix, respectively, of the nonlinear system linearized around the estimated trajectory:

A(t) = ∂f(x̂(t), u(t), t)/∂x   (4.93)
C_k = ∂g(x̂(t_k|t_{k−1}), t_k)/∂x.   (4.94)
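A compact sketch of Lemma 4.9 in Python (an added illustration): the plant is a made-up scalar nonlinear system, the Jacobians (4.93) and (4.94) are formed analytically, and the extrapolation ODEs are integrated with Euler steps between measurements:

import numpy as np

# Continuous-time/discrete-time EKF (Lemma 4.9) for an illustrative
# scalar system  dx/dt = -x**3 + noise,  y_k = sin(x(t_k)) + r_k.
f  = lambda x: -x**3
fx = lambda x: -3*x**2          # A(t), eq. (4.93)
g  = lambda x: np.sin(x)
gx = lambda x: np.cos(x)        # C_k, eq. (4.94)
Q, Rk, dt, n_sub = 0.1, 0.01, 1e-3, 100

rng = np.random.default_rng(4)
x, x_hat, Sigma = 1.5, 0.0, 1.0
for k in range(50):
    for _ in range(n_sub):                        # extrapolation (4.91), (4.92)
        x = x + f(x)*dt + np.sqrt(Q*dt)*rng.standard_normal()
        x_hat = x_hat + f(x_hat)*dt
        Sigma = Sigma + (2*fx(x_hat)*Sigma + Q)*dt
    y = g(x) + np.sqrt(Rk)*rng.standard_normal()  # discrete measurement
    C = gx(x_hat)                                 # update (4.88)-(4.90)
    K = Sigma*C / (C*Sigma*C + Rk)
    x_hat = x_hat + K*(y - g(x_hat))
    Sigma = Sigma - K*C*Sigma

print("true state:", x, " EKF estimate:", x_hat)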

Discrete-time version of the extended Kalman filter

In general, it will not be possible to integrate the differential equations of the nonlinear dynamic system analytically. Therefore, there is no discrete-time version of the extended Kalman filter which would be analogous to the discrete-time Kalman filter of Lemma 4.6. Rather, the differential equations for the state estimate x̂(t) and the "error covariance matrix" Σ(t) have to be numerically integrated over the time interval [t_k, t_{k+1}] between two subsequent measurements.

4.2 Nonlinear Filtering

Continuous-time version

The following stochastic nonlinear dynamic system is considered:

ẋ(t) = f(x(t), u(t), t) + B(x(t), t)v̇(t)
x(t0) = ξ
y(t) = g(x(t), t) + ṙ(t)
∂g(x, t)/∂x : has full rank for all x and t
ξ : N(x0, Σ0)
v̇ : N(v(t), Q(x(t), t))
ṙ : N(r(t), R(t)).

The initial state ξ and the white noise processes v̇ and ṙ are assumed to be mutually independent. We use the following philosophy:

• Optimal filtering implies making full usage of the information, i.e., probability density functions must be calculated in continuous time.
• The continuous-time version of optimal filtering is of interest only if all of the relevant equations can be solved analytically.
• In general, state estimates x̂(.) will only be needed at discrete times t_k. Therefore, we will have to live with a continuous-time/discrete-time version of optimal nonlinear filtering.

For a given probability density function p_{x(t_k)} of x(t_k), there are various methods for extracting a suitable estimate x̂(t_k).

Continuous-time/discrete-time version

Again, the continuous-time measurement corrupted with the white noise ṙ with infinite covariance,

y(t) = g(x(t), t) + ṙ(t),

must be replaced by an "averaged" discrete-time measurement

y_k = g(x(t_k), t_k) + r̃_k
r̃_k : N(r_k, R_k)
Cov(r̃_i, r̃_j) = R_i δ_ij.

The following continuous-time/discrete-time scheme for optimal nonlinear filtering can be guessed from the (extended) Kalman filter:

Theorem 4.10. The optimal nonlinear filter

Update at time t_k: Calculate the conditional probability density function of x(t_k|t_k) using the multivariate probability density function of y(t_k|t_{k−1}) and x(t_k|t_{k−1}) and the probability density function of y(t_k|t_{k−1}) at the measured value y(t_k):

p(x(t_k|t_k)) = p(y(t_k|t_{k−1}), x(t_k|t_{k−1})) / p(y(t_k|t_{k−1}))|_{y(t_k)}
             = p(y(t_k|t_{k−1}) | x(t_k|t_{k−1})) p(x(t_k|t_{k−1})) / p(y(t_k|t_{k−1}))|_{y(t_k)}.

Please note:

• p(x(t_k|t_{k−1})) : solution of the corresponding Fokker-Planck partial differential equation;
• p(y_k|x_k) ↔ N(g(x_k, t_k) + r_k, R_k);
• p(y(t_k|t_{k−1})) : to be computed from p(x(t_k|t_{k−1})) and p(r̃_k) using the general relation p(y) = ∫ ··· ∫ p(y|x) p(x) dx.

Extrapolation between t_k and t_{k+1}: Calculate the probability density function p(x(t|t_k)) of x(t|t_k) by solving the following Fokker-Planck partial differential equation over the time interval [t_k, t_{k+1}]:

∂p(x, t)/∂t + Σ_i ∂/∂x_i {[f_i(x, t) + B_i(x, t)v(t)] p(x, t)}
− (1/2) Σ_i Σ_j ∂²/(∂x_i ∂x_j) {[B(x, t)Q(x, t)B^T(x, t)]_{ij} p(x, t)} = 0.

Initialization at time t0:

p(x(t0|t_{−1})) = p(ξ).
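Theorem 4.10 is rarely tractable in closed form, but its predict/update structure can be approximated on a grid in one dimension. The sketch below (an added illustration, not from the text; system, measurement, and step sizes are invented) propagates a discretized density with an explicit Euler step of the Fokker-Planck equation and performs the Bayes update with a Gaussian likelihood:

import numpy as np

# Grid-based approximation of the optimal nonlinear filter (Theorem 4.10)
# for dx = f(x) dt + b dW, y_k = g(x_k) + r_k, r_k ~ N(0, Rk); d = 1.
f = lambda x: -x
g = lambda x: np.sin(x)
b, Rk, dt = 0.5, 0.01, 1e-3

x = np.linspace(-4.0, 4.0, 401)
dx = x[1] - x[0]
p = np.exp(-x**2 / 2) / np.sqrt(2*np.pi)       # prior density p(x(t0))

def normalize(q):
    q = np.clip(q, 0.0, None)
    return q / (q.sum() * dx)

def fokker_planck_step(p):
    # d/dx [f p] and (b**2/2) d^2 p/dx^2 via central differences
    drift = np.gradient(f(x) * p, dx)
    diff = np.gradient(np.gradient(p, dx), dx)
    return normalize(p + dt * (-drift + 0.5 * b**2 * diff))

y_meas = 0.7                                   # one hypothetical measurement
for _ in range(200):                           # extrapolation t_k -> t_{k+1}
    p = fokker_planck_step(p)
lik = np.exp(-(y_meas - g(x))**2 / (2*Rk))     # Bayes update at t_{k+1}
p = normalize(p * lik)
print("conditional mean estimate:", (x * p).sum() * dx)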

4.3 Kalman Filter and Parameter Identification

4.3.1 Introduction

In order to use stochastic differential equations as a tool to model financial prices or technical applications, one important step is to identify the relevant parameters of one's models. In contrast to engineering applications, first-principle models hardly exist for financial problems. Thus, one has to resort to quantitative techniques to identify the parameters of the chosen models. In this chapter, we use the Kalman filter in order to derive a maximum likelihood estimator. In addition, we show some results of probability theory which are useful in the derivation of the maximum likelihood estimator.

4.3.2 Kalman Filter Equations

For ease of reference, the equations of the discrete-time Kalman filter are recapitulated here. We consider the discrete-time stochastic dynamic system

x_{k+1} = F_k x_k + G_k ṽ_k   (4.97)
x_0 = ξ   (4.98)
y_k = C_k x_k + r̃_k   (4.99)

with the stochastic properties

ξ : N(x0, Σ0)   (4.100)
ṽ_k : N(u_k, Q_k)   (4.101)
Cov(ṽ_i, ṽ_j) = Q_i δ_ij   (4.102)
r̃_k : N(r_k, R_k)   (4.103)
Cov(r̃_i, r̃_j) = R_i δ_ij.   (4.104)

We assume that ξ, ṽ(.), and r̃(.) are mutually independent. The equations of the discrete-time Kalman filter are:

Update at time t_k due to the measurement y_k:

e_{k|k−1} = y_k − r_k − C_k x̂_{k|k−1}
x̂_{k|k} = x̂_{k|k−1} + Σ_{k|k−1} C_k^T [C_k Σ_{k|k−1} C_k^T + R_k]^{−1} e_{k|k−1}
Σ_{k|k} = Σ_{k|k−1} − Σ_{k|k−1} C_k^T [C_k Σ_{k|k−1} C_k^T + R_k]^{−1} C_k Σ_{k|k−1},

where e_{k|k−1} denotes the unbiased prediction error or residual at the output, conditioned on the measurements observed up to and including time t_{k−1}.

Extrapolation from time t_k to time t_{k+1}:

x̂_{k+1|k} = F_k x̂_{k|k} + G_k u_k
Σ_{k+1|k} = F_k Σ_{k|k} F_k^T + G_k Q_k G_k^T.

Initialization at t0:

x̂_{0|−1} = x0
Σ_{0|−1} = Σ0.

Note that due to the mutual independence of ξ, ṽ(.), and r̃(.), the unbiased conditional prediction error

e_{k|k−1} = y_k − r_k − C_k x̂_{k|k−1}   (4.105)

has the covariance matrix

Λ_{k|k−1} = C_k Σ_{k|k−1} C_k^T + R_k.   (4.106)

This result is important for building the log-likelihood estimator.

4.3.3 Parameter Estimation

In this section, we denote by Y = {Y0, Y1, ..., YN−1, YN} the discrete-time random process which corresponds to the output of the stochastic system (4.97)–(4.104) for the time interval from t0 through tN. On the other side, we denote by y = {y0, y1, ..., yN−1, yN} an individual sample path of Y which is obtained from the measurements at the times t0 through tN in a single experiment. — This should help the reader distinguish between a priori information which can be obtained from Y and a posteriori or conditional information based on the sample path y.

In the scenario of this section, some or all of the elements of the matrices F_k, G_k, Q_k, and R_k are not perfectly known. But they are parametrized by a "completely" unknown parameter vector ψ ∈ Ψ ⊆ R^d. Obviously, the multivariate probability density function of the random process Y is parametrized by ψ and can be denoted as p(Y, ψ)(y). Given the sample path y of the measurements, we introduce the so-called likelihood function

l(ψ|y) = p(Y, ψ)(y).

Notice that the likelihood function l is a function of ψ alone because for this sample path, the values y0, y1, ..., yN are fixed.

This is explicitly denoted by "given that ..." in l(.) and on the right-hand side; this is standard in the context of conditional probability density functions. Using Bayes' rule for probability density functions repetitively, we can write

l(ψ|y) = p(Y, ψ)(y)
       = p(YN, ψ | y0, ..., yN−1)(yN) · p(YN−1, ψ | y0, ..., yN−2)(yN−1) · ...
         · p(Yk, ψ | y0, ..., yk−1)(yk) · ... · p(Y1, ψ | y0)(y1) · p(Y0, ψ)(y0).   (4.107)

Since the stochastic processes X and Y are Markov processes, conditioning the general term p(Yk, ψ | y0, ..., yk−1)(yk) on y0 through yk−1 amounts to the same as conditioning on the most recent value yk−1 only. Thus, we get the considerably simpler expression

l(ψ|y) = p(YN, ψ | yN−1)(yN) · p(YN−1, ψ | yN−2)(yN−1) · ...
         · p(Yk, ψ | yk−1)(yk) · ... · p(Y1, ψ | y0)(y1) · p(Y0, ψ)(y0).

Equation (4.105) shows that the conditional density functions p(Yk, ψ|yk−1) of the measurement at time tk and p(Ek, ψ|yk−1) of the corresponding residual only differ by "bias" terms. Therefore, we can replace the former by the latter. Since the logarithm is a monotonic function, it is more practical to consider the so-called "log-likelihood function" L(ψ|y) = ln(l(ψ|y)). This yields the expression

L(ψ|y) = − (1/2) Σ_{k=0}^N ( p ln(2π) + ln(det Λ_{k|k−1}) + e_{k|k−1}^T Λ_{k|k−1}^{−1} e_{k|k−1} ).   (4.108)

Given the measurements y0, ..., yN, we want to choose the "most likely" value for the unknown parameter ψ. By this, we mean that we want to find the value of ψ which maximizes the likelihood function l(ψ|y). Thus, the desired maximum likelihood estimate ψ̂_ML(y) for the unknown parameter ψ ∈ Ψ ⊆ R^d is:

ψ̂_ML(y) = arg max_{ψ∈Ψ} L(ψ|y).   (4.109)

In order to estimate the unknown parameters from the log-likelihood function in (4.108), we use an appropriate optimization routine which maximizes L(ψ|y) with respect to ψ. Since all of the distributions are Gaussian, we can find the values for the conditional expectation of the prediction error and the conditional covariance matrix for every given parameter vector ψ using the Kalman filter algorithm.
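The innovation form (4.108) translates directly into code: run the Kalman filter for a candidate ψ, accumulate the residuals and their covariances, and sum the Gaussian log-densities. A Python sketch (added illustration; the scalar model and its parametrization ψ = (F, Q) are invented for demonstration, and R is assumed known):

import numpy as np

# Negative log-likelihood (4.108) of a scalar state-space model
# x_{k+1} = F x_k + v_k, v_k ~ N(0, Q);  y_k = x_k + r_k, r_k ~ N(0, R),
# parametrized by psi = (F, Q).
R = 0.1

def neg_log_likelihood(psi, y, x0=0.0, Sigma0=1.0):
    F, Q = psi
    x_hat, Sigma, nll = x0, Sigma0, 0.0
    for yk in y:
        e = yk - x_hat                          # residual e_{k|k-1}
        Lam = Sigma + R                         # Lambda_{k|k-1}, eq. (4.106)
        nll += 0.5 * (np.log(2*np.pi) + np.log(Lam) + e**2 / Lam)
        K = Sigma / Lam                         # update
        x_hat, Sigma = x_hat + K*e, Sigma - K*Sigma
        x_hat, Sigma = F*x_hat, F*Sigma*F + Q   # extrapolation
    return nll

rng = np.random.default_rng(5)
x_true, y = 0.0, []
for _ in range(400):                            # simulate data with F=0.8, Q=0.2
    x_true = 0.8*x_true + rng.normal(0.0, np.sqrt(0.2))
    y.append(x_true + rng.normal(0.0, np.sqrt(R)))
print(neg_log_likelihood((0.8, 0.2), y), "<", neg_log_likelihood((0.3, 1.0), y))

The quasi-Newton minimization discussed in Section 4.3.4 can then be delegated to an off-the-shelf routine, e.g. scipy.optimize.minimize(neg_log_likelihood, [0.5, 0.5], args=(y,), method="BFGS"), which approximates the required derivatives internally by finite differences (in practice one may want a positivity transform for Q).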

Hence, we are able to calculate the value of the conditional log-likelihood function numerically. In the case of implementing extended Kalman filter algorithms, where we use approximations of the conditional normal distribution, the log-likelihood function (4.108) can be used to yield a quasi maximum likelihood parameter estimate.

Flow Chart of Kalman Filter Parameter Estimation

The recursion starts with a feasible choice of the parameter vector ψ and a state vector x0 with covariance matrix Σ0. For each time step, we use the prediction equations to calculate the a priori estimates and then use the update equations to calculate the a posteriori estimates. For each time step, the prediction error and the covariance matrix are saved. This loop is repeated until we reach the final time T after N steps. Thereupon, with the obtained time series of the state variables, we can evaluate the likelihood function to choose a better estimate of ψ. The Kalman filter recursion is started again, the likelihood function is evaluated again, and so on, until a reasonable stopping criterion is satisfied. The flow chart in Figure 4.1 gives a graphical illustration of the Kalman filter parameter estimation algorithm.

Fig. 4.1. Flow chart of the Kalman filter parameter estimation: initial guess of model parameters → evaluate Kalman filter equations → evaluate likelihood → stop criterion of optimizer met? If no, choose other parameter values and repeat; if yes, stop.

4.3.4 Numerical Implementation

In the case where the parameter space is unrestricted, the estimation includes an unconstrained optimization. In the numerical implementation to obtain the optimal parameter estimates, we actually minimize the negative log-likelihood function of Equation (4.108), i.e.,

ψ̂_ML(y) = arg min_{ψ∈Ψ} −L(ψ|y).

The necessary condition for a minimum is

∂L/∂ψ (ψ̂_ML|y) = 0,

assuming that the relevant partial derivative exists. A sufficient condition is that the Hessian matrix of −L exists and is positive-definite, i.e.,

− ∂²L/∂ψ² (ψ̂_ML|y) > 0.

In order to compute the minimum numerically, we choose a quasi-Newton method. The iteration rule for Newton's method for unconstrained optimization is given for the parameter estimates at the i-th iteration step by

ψ_{i+1} = ψ_i + s_i,

where the variable s_i is found by solving

∂²L/∂ψ² (ψ_i|y) · s_i = − ∂L/∂ψ (ψ_i|y).

In order to calculate ∂L/∂ψ(ψ_i|y) and ∂²L/∂ψ²(ψ_i|y), we use numerical approximations, such as first- and second-order differences. Since the log-likelihood function assumes a global maximum, the Hessian is negative-definite and thus the quasi-Newton method converges.

Notes and Comments

For more details on this chapter, the following books are recommended: [16], [21], [13], [12], [32], [1], [2], [3], [19], [22], and [18].


5
Optimal Control
As in all optimal control problems,
the GIGO principle applies.
Hans P. Geering

5.1 Deterministic Optimal Control
We begin the section on optimal control with a repetition of the deterministic
case. We do this since the methods and results of the deterministic case can
be regarded as a special case of the stochastic case. In addition, it is essential
to see the differences and similarities in the methods as well as the results of
the deterministic and the stochastic cases.
In the deterministic optimal control problem, we regard a deterministic dynamic system, a control, and a cost functional which is to be minimized. In
the optimal control problem, the control strategy is sought that effectively
minimizes the cost functional under the given dynamics. Two approaches are
given here: Pontryagin's minimum principle and Bellman's dynamic programming method leading to the well-known Hamilton-Jacobi-Bellman equation.
The solution procedure for Pontryagin's minimum principle involves solving
a two-point boundary-value problem. The resulting optimal control strategy is then
the optimal feed-forward solution. The optimal feedback strategy is obtained
by solving the Hamilton-Jacobi-Bellman partial differential equation. For an
in-depth treatment of the matter of optimal control we refer to [8], [15], or
[?].
5.1.1 Deterministic Optimal Control Problems
Consider the following dynamical system with the state vector x(t) ∈ Rn and
the control vector u(t) ∈ Rm :
ẋ(t) = f(x(t), u(t), t).
Its initial state at the fixed initial time 0 is given:
x(0) = x0 .

88

5 Optimal Control

The permissible controls over the fixed time interval [0, T ] satisfy the following
condition:
u(t) ∈ U for all t ∈ [0, T ],
where U is a time-invariant, closed, and convex subset of the control space
Rm :
U ⊆ Rm .
Furthermore, consider a cost functional of the following form:

J = K(x(T)) + ∫_0^T L(x(t), u(t), t) dt.

This cost functional should either be minimized or maximized, depending
upon the problem at hand. Consequently, there are two alternative formulations of the optimal control problem.
The Minimization problem:
Find the control trajectory u∗ : [0, T ] → U ⊆ Rm generating the state trajectory x∗ : [0, T ] → Rn such that the cost functional J is minimized.
The Maximization problem:
Find the control trajectory u* : [0, T] → U ⊆ R^m generating the state trajectory x* : [0, T] → R^n such that the cost functional J is maximized.
Definition 5.1. Hamiltonian function
The Hamiltonian function H : R^n × U × R^n × [0, T] → R associated with a regular
optimal control problem is:
H(x(t), u(t), p(t), t) = L(x(t), u(t), t) + p^T(t) f(x(t), u(t), t),
where p(t) ∈ Rn is the so-called costate vector.
5.1.2 Necessary Conditions for Optimality
The Russian mathematician Pontryagin has found the following necessary
conditions for the optimality of a solution:
Theorem 5.2. Pontryagin
If u∗ : [0, T ] → U is an optimal control trajectory, the following conditions are
satisfied:
a) Optimal state trajectory:

ẋ*(t) = ∇_p H|_* = f(x*(t), u*(t), t)   for t ∈ [0, T]
x*(0) = x0.


b) Optimal costate trajectory: There exists an optimal costate trajectory satisfying

ṗ*(t) = −∇_x H|_* = −∇_x L(x*(t), u*(t), t) − f_x^T(x*(t), u*(t), t) p*(t)   for t ∈ [0, T]
p*(T) = ∇_x K(x*(T)).
c) Global static optimization of the Hamiltonian function:

For the minimization problem: For all t ∈ [0, T], the Hamiltonian is globally minimized w.r.t. u, i.e.,

H(x*(t), u*(t), p*(t), t) ≤ H(x*(t), u, p*(t), t)   for all u ∈ U.

For the maximization problem: For all t ∈ [0, T], the Hamiltonian is globally maximized w.r.t. u, i.e.,

H(x*(t), u*(t), p*(t), t) ≥ H(x*(t), u, p*(t), t)   for all u ∈ U.

5.1.3 Example: The LQ-Regulator Problem
For the linear time-varying system
ẋ(t) = A(t)x(t) + B(t)u(t)
with the initial state
x(0) = x0
find the unconstrained optimal control u : [0, T ] → Rm such that the quadratic
cost functional
J = (1/2) x^T(T) F x(T) + ∫_0^T (1/2) [ x^T(t)Q(t)x(t) + u^T(t)R(t)u(t) ] dt
is minimized.
Here, the penalty matrix R(t) is symmetric and positive-definite, and the
penalty matrices F and Q(t) are symmetric and positive-semidefinite.
Analysis of the necessary conditions for optimality:
Hamiltonian function:
H(x(t), u(t), p(t), t) = (1/2) x^T(t)Q(t)x(t) + (1/2) u^T(t)R(t)u(t) + p^T(t)A(t)x(t) + p^T(t)B(t)u(t)


Pontryagin’s necessary conditions for optimality:
ẋ*(t) = A(t)x*(t) + B(t)u*(t)
x*(0) = x0
ṗ*(t) = −Q(t)x*(t) − A^T(t)p*(t)
p*(T) = F x*(T)
u*(t) = arg min_{u∈R^m} ( (1/2) u^T R(t)u + p*^T(t)B(t)u )

H-minimizing control:

u*(t) = −R^{−1}(t)B^T(t)p*(t)
Plugging the H-minimizing control into the differential equations leads to
the following linear two-point-boundary-value problem:
ẋ*(t) = A(t)x*(t) − B(t)R^{−1}(t)B^T(t)p*(t)
ṗ*(t) = −Q(t)x*(t) − A^T(t)p*(t)
x*(0) = x0
p*(T) = F x*(T).
Considering that p∗ (T ) is linear in x∗ (T ) and that the linear differential
equations are homogeneous leads to the educated guess that p∗ (t) is linear in
x∗ (t) at all times, i.e.,
p∗ (t) = K(t)x∗ (t)
with a suitable n by n matrix function K(t).
This leads to the following linear state feedback control:
u∗ (t) = −R−1 (t)B T (t)K(t)x∗ (t) .
Exploiting the two-point-boundary-value problem and the proposed linear
relation leads to the following matrix Riccati differential equation for K(t)
with a boundary condition at the final time T :
K̇(t) = −A^T(t)K(t) − K(t)A(t) + K(t)B(t)R^{−1}(t)B^T(t)K(t) − Q(t)
K(T) = F.
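The Riccati equation above is integrated backward from t = T. A short Python sketch (an added illustration with invented matrices; scipy's solve_ivp handles the reverse-time integration):

import numpy as np
from scipy.integrate import solve_ivp

# Backward integration of the LQ Riccati equation
# K' = -A^T K - K A + K B R^{-1} B^T K - Q, K(T) = F.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
Rinv = np.array([[1.0]])       # R = 1
F = np.zeros((2, 2))
T = 5.0

def riccati_rhs(t, k_flat):
    K = k_flat.reshape(2, 2)
    dK = -A.T @ K - K @ A + K @ B @ Rinv @ B.T @ K - Q
    return dK.ravel()

sol = solve_ivp(riccati_rhs, [T, 0.0], F.ravel())   # integrate from T down to 0
K0 = sol.y[:, -1].reshape(2, 2)
print("K(0):\n", K0)
print("feedback gain R^{-1} B^T K(0):", (Rinv @ B.T @ K0).ravel())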
5.1.4 Deterministic Hamilton-Jacobi-Bellman Theory
In this section, we give a short introduction to the deterministic Hamilton-Jacobi-Bellman theory.
Consider again the optimal control problem formulated in Section 5.1.1:
ẋ(t) = f(x(t), u(t), t)
x(0) = x0

u(t) ∈ U ⊆ Rm

J = K(x(T)) + ∫_0^T L(x(t), u(t), t) dt : min!

t). p. u.) is optimal with respect to all of the admissible trajectories x(. T ] at u e(x. x b(. iii) If X = Rn × [0. t . u. t) + pT f (x.). T ]. have a unique absolute minimum for all p ∈ Rn and all (x. t). p. t) . t) is the optimal-cost-to-go function. t) + H x. T ] → U generating the state trajectory x b : [0. the solution is linear and the optimal value of the cost J is quadratic (in x0 ). Theorem 5. t) = L(x. ii) Let J (x. ∇x J (x. t). u ∈ U. u. ii) The above statement of the theorem can be adapted to the problem of maximizing the cost functional J in the obvious way. T ] → Rn satisfies the condition ( ) u b(t) = u e x b(t). Hamilton-Jacobi-Bellman If the above-mentioned assumptions are satisfied and if the control trajectory u b : [0. t) : X → R be a continuously differentiable function satisfying the following partial differential equation [ ] ( ) ∂J (x.5. ( ) then the solution u b(. . T ] . t = 0 ∂t and the following boundary condition at the final time T : J (x. u e x.) which do not leave X. t for all t ∈ [0. t) as a function of u.5 Example: The LQ-Regulator Problem Consider again the LQ regulator problem: x(t) ˙ = A(t)x(t) + B(t)u(t) x(0) = x0 ∫ T ( ) 1 T 1 T J = x (T )F x(T ) + x (t)Q(t)x(t) + uT (t)R(t)u(t) dt : min ! 2 0 2 Quite generally speaking: Since this is an infinite-dimensional least-squares problem with linear side constraints. ∇x J (x. 5. T ) ∈ X . then the theorem states both necessary and sufficient conditions for the unique globally optimal solution of the control problem. ∇x J (b x(t). T ) = K(x) for all (x.1. t) ∈ X ⊆ Rn × [0.3.1 Deterministic Optimal Control 91 Assumptions: i) Let the Hamiltonian function H(x. Remarks: i) J (x.

Therefore, the optimal-cost-to-go function J(x, t) must be of the form

J = (1/2) x^T K(t) x

with a suitable symmetric and positive-(semi)definite n by n matrix K(t).

Analysis:

∂J(x, t)/∂t = (1/2) x^T K̇(t) x
∇_x J(x, t) = K(t)x
ũ(x, p, t) = −R^{−1}(t)B^T(t)p = −R^{−1}(t)B^T(t)K(t)x

HJB partial differential equation:

(1/2) x^T { K̇(t) + Q + A^T K + KA − KBR^{−1}B^T K } x = 0

Boundary condition:

J(x, T) = (1/2) x^T K(T) x = (1/2) x^T F x.

Thus, the result of Section 5.1.3 is reestablished.

5.2 Stochastic Optimal Control

After having recalled the deterministic optimal control problem, we shall now generalize the deterministic system dynamics to the stochastic case by including diffusion models. The control may independently influence both the deterministic and the stochastic part of the system dynamics. As in the deterministic case, we are interested in finding an optimal control that minimizes the expected cost functional subject to the stochastic dynamic system. The range of stochastic optimal control problems covers a great variety of fields, such as economics, physics, and management systems, among others. The topic of stochastic optimal control is well covered in the literature, among others in [6], [15], and [37].

5.2.1 Stochastic Optimal Control Problems

Consider the following dynamical system with state vector x(t) ∈ R^n and control vector u(t) ∈ R^m, defined for t ∈ [0, T]:

dx(t) = f(x(t), u(t), t)dt + g(x(t), u(t), t)dW(t)
x(0) = x0.   (5.1)

2. Furthermore. Let W (t) be a k-dimensional Brownian motion which is given on the probability space (Ω. u(t).2 Stochastic Hamilton-Jacobi-Bellman Equation Consider the optimal control problem stated in Section 5. the problem can be divided into two parts.4) u:[t. T ]. over the fixed time interval [0. u∗ ). P) on which the usual conditions apply.2 Stochastic Optimal Control 93 The functions f : Rn × Rm × R → Rn g : Rn × Rm × R → Rn×k are given. Bellmann called dynamic programming. Essentially. x0 ) = J(u∗ ) = max J(u) .T ]→U The corresponding optimal state trajectory x∗ : [0. T ] → U form the optimal pair (x∗ . We define the optimal cost-to-go function J(t. This means. closed. the idea is that an optimal path has the property.4. We further consider the cost functional [ ] ∫ T J(u) = E K(x(T )) + L(x(t). (5. 5.5. since the second part is already optimal: . we restrict the control vector u(t) such that u(t) ∈ U for all t ∈ [0. T ] → Rn and the optimal control u∗ : [0. T ]. the control variables over the remaining period must be optimal for the remaining problem. x) as J(0.1. Maximize (5.4. The steps for finding a solution to the stochastic optimal control problem follow the approach by R. {Ft }t≥0 .2) 0 with the given scalar functions L : Rn × Rm × R → R and K : Rn → R. where we assume that from time t + ∆t to time T we have computed the optimal solution and we are still searching for the solution from time t to time t + ∆t. equations (5. T ] → U ⊆ Rm satisfying J(u∗ ) = max J(u). where the cost functional is to be minimized. t) dt (5. It is sufficient to search for the solution of the first part. The set of permissible controls U is a time-invariant. (5. and convex subset of the control space Rm .4) and the Problem 5. there is another optimal control problem.T ]→U Remark: Obviously.3) u:[0.2) by finding u∗ (t) : [0. that whatever the initial conditions and the control values over a certain initial period are. F.1)– (5. Problem 5.2.

u(t) = { u(s) for t ≤ s ≤ t + Δt;  u*(s) for t + Δt < s ≤ T }.   (5.5)

We assume that the control function u(t) in t0 + Δt ≤ t ≤ T is optimal for the problem beginning at t = t0 + Δt. The cost-to-go function can now be split into the two parts, and we obtain

J(x, t) = max_{u:[t,T]→U} E[ ∫_t^{t+Δt} L(x(t), u(t), t) dt + ∫_{t+Δt}^T L(x(t), u(t), t) dt + K(x(T)) ]
        = max_{u:[t,t+Δt]→U} E[ ∫_t^{t+Δt} L(x(t), u(t), t) dt + J(x, t + Δt) ],   (5.6)

where the last two terms of the first line together equal J(x, t + Δt), since u is already optimal on [t + Δt, T]. We then substitute this function by taking the total differential and integrating it. We must be aware that x is a stochastic process and we therefore need to apply the Itô calculus:

J(x, t + Δt) = J(x, t) + ∫_t^{t+Δt} ( ∂J(x, t)/∂t + AJ(x, t) ) dt + ∫_t^{t+Δt} J_x(x, t) g(x, u, t) dW,   (5.7)

where we use the stochastic differential operator

A(·) = f^T(x, u, t) ∂(·)/∂x + (1/2) tr{ g(x, u, t) g^T(x, u, t) ∂²(·)/∂x² }.   (5.8)

Now, we plug J(x, t + Δt) into (5.6) and obtain

J(x, t) = max_{u:[t,t+Δt]→U} E[ ∫_t^{t+Δt} L(x, u, t) dt + J(x, t)
          + ∫_t^{t+Δt} ( ∂J(x, t)/∂t + AJ(x, t) ) dt + ∫_t^{t+Δt} J_x(x, t) g(x, u, t) dW ].   (5.9)

We now take the expectation, which removes the stochastic integral, and subtract J(x, t) on both sides (which we can do, as this is a deterministic function independent of u and therefore not affected by the maximization over u), leading to:

0 = max_{u:[t,t+Δt]→U} E[ ∫_t^{t+Δt} ( L(x, u, t) + ∂J(x, t)/∂t + AJ(x, t) ) dt ].   (5.10)

In order to fulfil the equality in equation (5.10), we set the argument of the integral to zero and interchange the maximum operator with the integral, and finally arrive at the HJB equation

−∂J(x, t)/∂t = max_{u∈U} { L(x, u, t) + AJ(x, t) },   (5.11)

and by expanding the differential operator A we get

−∂J(x, t)/∂t = max_{u∈U} { L(x, u, t) + J_x(x, t) f(x, u, t) + (1/2) tr{ g(x, u, t) g^T(x, u, t) J_xx(x, t) } }.   (5.12)

This is the Hamilton-Jacobi-Bellman equation for a stochastic process of (5.1) and the cost functional (5.2). The solution J(x, t) is the maximal value function.

5.2.3 Solution Procedure

The following steps lead to the optimal feedback control law as well as to the maximal value function:

1. For a fixed J(x, t), find u = u(x, J_x, J_xx, t) such that

   L(x, u, t) + J_x(x, t) f(x, u, t) + (1/2) tr{ g(x, u, t) g^T(x, u, t) J_xx(x, t) }

   is maximal. The maximizing u(t) can now be found in terms of x, t, J_x, and J_xx and reinserted into (5.12).

2. The function u is put back into the HJB equation for u, and the partial differential equation

   J_t(x, t) + L(x, u, t) + J_x(x, t) f(x, u, t) + (1/2) tr{ g(x, u, t) g^T(x, u, t) J_xx(x, t) } = 0

   is solved with the terminal condition J(x, T) = K(x).

3. By solving the resulting PDE for the cost-to-go function J(x, t), the explicit solution for the optimal control u(t) can be found. This results in the optimal feedback control law: u* = u*(x, J_x, t).

In the next section,

this step-by-step "cook book" recipe is applied to a linear dynamic system and a quadratic cost functional.

5.2.4 Stochastic LQG Examples with HJB Equation

Below, the stochastic linear quadratic problem is formulated and its solution is given. Consider the stochastic LQG problem:

dx(t) = [A(t)x(t) + B(t)u(t) + b(t)]dt + [C(t)x(t) + σ(t)]dW(t)
x(0) = x0

J = E{ (1/2) x^T(T)F x(T) + g^T x(T)
     + (1/2) ∫_0^T ( x(t)^T Q(t)x(t) + x(t)^T S(t)u(t) + u(t)^T S^T(t)x(t) + u(t)^T R(t)u(t) ) dt } : min!

where

F ≥ 0,   [ Q(t) S(t); S^T(t) R(t) ] ≥ 0,   Q(t) ≥ 0,   R(t) > 0.

We follow the "cook book" recipe of Section 5.2.3.

1. Compute u(x, J_x, t) for a fixed J(x, t):

J_t + max_u { (1/2) ( x^T Qx + x^T Su + u^T S^T x + u^T Ru ) + J_x(Ax + Bu + b) + (1/2) tr{ (Cx + σ)(Cx + σ)^T J_xx } }.

The maximization with respect to u(x, J_x, t) yields

Ru + S^T x + B^T J_x^T = 0,

which gives us

u(x, J_x, t) = −R^{−1}( S^T x + B^T J_x^T ).

2. The function u(x, J_x, t), which is derived in Step 1, is put back into the HJB equation:

0 = J_t + (1/2) { x^T Qx − x^T SR^{−1}S^T x − J_x BR^{−1}B^T J_x^T + x^T (A − BR^{−1}S^T)^T J_x^T
    + J_x (A − BR^{−1}S^T)x + x^T C^T J_xx Cx + σ^T J_xx σ + x^T C^T J_xx σ + σ^T J_xx Cx + 2b^T J_x^T }.

In order to solve this nonlinear PDE, a quadratic Ansatz is made:

J(x, t) = (1/2) x^T(t)K(t)x(t) + φ(t)^T x(t) + ψ(t),

and consequently, the first-order derivative of the quadratic Ansatz is

J_x^T = Kx + φ.

The Ansatz is plugged back into the PDE, and the conditions for the solution are derived:

0 = (1/2) x^T K̇ x + φ̇^T x + ψ̇
  + (1/2) x^T ( Q − SR^{−1}S^T − KBR^{−1}B^T K + (A − BR^{−1}S^T)^T K + K(A − BR^{−1}S^T) + C^T KC ) x
  + x^T [ (A − BR^{−1}S^T − BR^{−1}B^T K)^T φ + Kb + C^T Kσ ]
  + (1/2) σ^T Kσ − (1/2) φ^T BR^{−1}B^T φ + b^T φ.

In order to assure that the above equation is satisfied for all x ∈ R^n, the quadratic term x^T(...)x is set to zero, the linear term x^T[...] is set to zero, and the constant term is set to zero. The conditions for the quadratic term are

K̇ = −Q + SR^{−1}S^T + KBR^{−1}B^T K − (A − BR^{−1}S^T)^T K − K(A − BR^{−1}S^T) − C^T KC
K(T) = F.

This is a matrix Riccati differential equation. The conditions that the linear and the constant terms vanish are given by

φ̇ = −(A − BR^{−1}S^T − BR^{−1}B^T K)^T φ − Kb − C^T Kσ
φ(T) = g

and

ψ̇ = −(1/2) σ^T Kσ + (1/2) φ^T BR^{−1}B^T φ − b^T φ
ψ(T) = 0.

3. Since a solution for the HJB PDE is found, the linear state feedback law can now be stated explicitly. We remember from Step 1:

u(x, J_x, t) = −R^{−1}( S^T x + B^T J_x^T ),

and with J_x^T = Kx + φ, the resulting optimal feedback law is:

u*(x, t) = −R^{−1}(B^T K + S^T)x − R^{−1}B^T φ.
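The three terminal-value ODEs for K, φ, and ψ can be integrated backward jointly. A Python sketch (an added illustration; the matrices are invented, time-invariant, and S = 0 for simplicity):

import numpy as np
from scipy.integrate import solve_ivp

# Backward integration of K, phi, psi for the stochastic LQG problem (S = 0):
# K'   = -Q + K B R^{-1} B^T K - A^T K - K A - C^T K C,          K(T) = F
# phi' = -(A - B R^{-1} B^T K)^T phi - K b - C^T K sigma,        phi(T) = g
# psi' = -0.5 sigma^T K sigma + 0.5 phi^T B R^{-1} B^T phi - b^T phi,  psi(T) = 0
n = 2
A = np.array([[0.0, 1.0], [-1.0, -0.3]])
B = np.array([[0.0], [1.0]]); Rinv = np.array([[1.0]])
C = 0.1 * np.eye(n); Q = np.eye(n); F = np.eye(n)
b = np.array([0.1, 0.0]); sigma = np.array([0.0, 0.2]); gvec = np.zeros(n)
T = 4.0

def rhs(t, z):
    K = z[:n*n].reshape(n, n); phi = z[n*n:n*n+n]
    BRB = B @ Rinv @ B.T
    dK = -Q + K @ BRB @ K - A.T @ K - K @ A - C.T @ K @ C
    dphi = -(A - BRB @ K).T @ phi - K @ b - C.T @ K @ sigma
    dpsi = -0.5 * sigma @ K @ sigma + 0.5 * phi @ BRB @ phi - b @ phi
    return np.concatenate([dK.ravel(), dphi, [dpsi]])

z_T = np.concatenate([F.ravel(), gvec, [0.0]])
sol = solve_ivp(rhs, [T, 0.0], z_T)
K0 = sol.y[:n*n, -1].reshape(n, n)
print("K(0):\n", K0)
print("the feedback is u*(x, 0) = -R^{-1} B^T K(0) x - R^{-1} B^T phi(0)")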

The value of the optimal cost functional can be computed, because the optimal value function has been found:

J0 = J(0, x0) = (1/2) x0^T K(0) x0 + φ(0)^T x0 + ψ(0).

5.2.5 Stochastic Pontryagin's Maximum Principle

Using the knowledge we have from the HJB equation (5.12), we can derive the stochastic Pontryagin's Maximum Principle. We regard the stochastic optimal control problem as stated in Section 5.2.1. In general, finding the solution of a stochastic optimal control law is a tremendous undertaking, since the nonlinear PDE (of Step 2) is hardly ever solvable. For the LQG problem, an explicit control law can be found because of its special structure, which has been exploited. Here, a system of forward-backward stochastic differential equations (FBSDE) replaces the HJB partial differential equation.

Before stating the stochastic maximum principle, we need to define the adjoint variables for the stochastic optimal control problem:

p(t) = J_x^T(x, t) = ∇_x J(x, t)   (5.13)
p_x(t) = ∂p/∂x = J_xx(x, t)   (5.14)

and the stochastic version of the Hamiltonian function H : R^n × U × R^n × R^{n×n} × [0, T] → R:

H(x, u, p, p_x, t) = L(x, u, t) + p^T f(x, u, t) + (1/2) tr{ g(x, u, t) g^T(x, u, t) p_x }.   (5.15)

With these definitions, the stochastic version (5.12) of the HJB equation is:

−J_t = max_{u∈U} H(x, u, p, p_x, t).   (5.16)

We assume that there is a unique H-maximizing control law u* : R^n × R^n × R^{n×n} × [0, T] → U, u* = u*(x, p, p_x, t). In order to avoid cumbersome notation, we introduce the following substitutions:

H*(x, p, p_x, t) = H(x, u*(x, p, p_x, t), p, p_x, t)   (5.17)
L̃(x, p, p_x, t) = L(x, u*(x, p, p_x, t), t)   (5.18)
f̃(x, p, p_x, t) = f(x, u*(x, p, p_x, t), t)   (5.19)
g̃(x, p, p_x, t) = g(x, u*(x, p, p_x, t), t).   (5.20)

Thus, we arrive at the following explicit version of the HJB equation:

−J_t(x, t) = H*(x, p, p_x, t) = L̃(x, p, p_x, t) + p^T f̃(x, p, p_x, t) + (1/2) tr{ g̃(x, p, p_x, t) g̃^T(x, p, p_x, t) p_x }.   (5.21)

In the next step, we write the differential equations for the state x and the costate p. Combining (5.1) and (5.21), we get the following simple expression for dx:

dx = f(x, u*, t)dt + g̃(x, p, p_x, t)dW = ∇_p H*(x, p, p_x, t)dt + g̃(x, p, p_x, t)dW.   (5.22)

Using Itô's lemma on the definition of p in (5.13) yields:

dp = J_xt dt + J_xx dx + (1/2) J_xxx (dx)²   (5.23)
   = [ J_xt + J_xx f̃ + (1/2) tr{g̃ g̃^T J_xxx} ] dt + J_xx g̃ dW,   (5.24)

where

tr{g̃ g̃^T J_xxx} = [ tr{(g̃ g̃^T J_x1)_xx}, ..., tr{(g̃ g̃^T J_xn)_xx} ]^T.   (5.25)

In order to calculate the term J_xt, we differentiate both sides of the HJB equation (5.21) with respect to x, and we get

−J_xt = H_x* + H_p* ∂p/∂x + H_px* ∂p_x/∂x
−J_xt = ∇_x H* + J_xx f̃ + (1/2) tr{g̃ g̃^T J_xxx}.   (5.26)

Inserting (5.26) into (5.24) leads to

dp = −∇_x H* dt + J_xx g̃ dW = −∇_x H* dt + p_x g̃ dW.   (5.27)

We can finally write the system of forward-backward stochastic differential equations (FBSDE) for the stochastic maximum principle:

dx* = ∇_p H* dt + g̃ dW,   x*(0) = x0
dp* = −∇_x H* dt + p_x g̃ dW,   p*(T) = ∇_x K(x(T)).   (5.28)

This stochastic two-point boundary value problem is similar to the deterministic case:

a) Optimal state trajectory obtained from dx*.
b) Optimal costate trajectory obtained from dp*.
c) Global static maximization of the Hamiltonian function implied.

Thus, we arrive at the following explicit version of the HJB equation:

    −J_t(x, t) = H*(x, p, p_x, t)
               = L̃(x, p, p_x, t) + pᵀf̃(x, p, p_x, t) + ½ tr{ g̃(x, p, p_x, t)g̃ᵀ(x, p, p_x, t) p_x }.   (5.21)

In the next step, we write the differential equations for the state x and the costate p. Combining (5.1) and (5.21), we get the following simple expression for dx:

    dx = f(x, u*, t)dt + g(x, u*, t)dW = ∇_p H*(x, p, p_x, t)dt + g̃(x, p, p_x, t)dW.   (5.22)

Using Itô's lemma on the definition of p in (5.13) yields:

    dp = J_xt dt + J_xx dx + ½ J_xxx (dx)²                        (5.23)
       = [ J_xt + J_xx f̃ + ½ tr{g̃g̃ᵀJ_xxx} ] dt + J_xx g̃ dW,     (5.24)

where tr{g̃g̃ᵀJ_xxx} = [ tr{(g̃g̃ᵀJ_x1)_xx}, ..., tr{(g̃g̃ᵀJ_xn)_xx} ]ᵀ. In order to calculate the term J_xt, we differentiate both sides of the HJB equation (5.21) with respect to x and get

    −J_xt = H*_x + H*_p ∂p/∂x + H*_{p_x} ∂p_x/∂x
          = ∇_x H* + J_xx f̃ + ½ tr{g̃g̃ᵀJ_xxx}.                   (5.25)

Inserting (5.25) into (5.24) leads to

    dp = −∇_x H* dt + J_xx g̃ dW                                  (5.26)
       = −∇_x H* dt + p_x g̃ dW.                                  (5.27)

We can finally write the system of forward-backward stochastic differential equations (FBSDE) for the stochastic maximum principle:

    dx* = ∇_p H* dt + g̃ dW        x*(0) = x₀
    dp* = −∇_x H* dt + p_x g̃ dW   p*(T) = ∇_x K(x(T)).           (5.28)

This stochastic two-point boundary value problem is similar to the deterministic case:
a) The optimal state trajectory is obtained from dx*.
b) The optimal costate trajectory is obtained from dp*.
c) Global static minimization of the Hamiltonian function is implied.

5.2.6 Stochastic LQG Example with Maximum Principle

Consider the stochastic LQG problem

    dx = [A(t)x(t) + B(t)u(t)] dt + σ(t) dW(t)
    x(0) = x₀

    J(u) = E{ ½ xᵀ(T)F x(T) + ½ ∫₀ᵀ ( x(t)ᵀQ(t)x(t) + u(t)ᵀR(t)u(t) ) dt }

with F ≥ 0, Q(t) ≥ 0, R(t) > 0. Find the optimal control u* : [0, T] → ℝᵐ which minimizes the cost functional J(u). The Hamiltonian function is:

    H(t, x, u, p, p_x) = ½ ( xᵀQx + uᵀRu ) + pᵀ( Ax + Bu ) + ½ tr{σσᵀ p_x}.

Pontryagin's necessary conditions are:

    dx*(t) = ∇_p H|* dt + σ dW = [A(t)x*(t) + B(t)u*(t)] dt + σ(t) dW(t)
    x*(0) = x₀

    dp*(t) = −∇_x H|* dt + p*_x σ dW = −[Q(t)x*(t) + Aᵀ(t)p*(t)] dt + p*_x σ(t) dW(t)
    p*(T) = F x*(T)

    H(x*, u*, p*, p*_x, t) ≤ H(x*, u, p*, p*_x, t).

From the last statement we conclude that

    u*(t) = arg min_u ½ ( uᵀ(t)R(t)u(t) + 2uᵀ(t)Bᵀ(t)p*(t) ) = −R⁻¹(t)Bᵀ(t)p*(t).

Plugging this H-minimizing control into the differential equations for x* and p* yields the following forward-backward stochastic differential equations:

    dx*(t) = [A(t)x*(t) − B(t)R⁻¹(t)Bᵀ(t)p*(t)] dt + σ(t) dW(t)
    x*(0) = x₀

    dp*(t) = −[Q(t)x*(t) + Aᵀ(t)p*(t)] dt + p*_x σ(t) dW(t)
    p*(T) = F x*(T).

Next, we make a linear Ansatz p*(t) = K(t)x + φ(t). This makes sense since we can see that p*(t) is linear in x*(t). We take the total differential, taking stochastic calculus into account:

    dp* = [ K̇x + φ̇ + K(Ax − B R⁻¹Bᵀ p*) + ½·0 ] dt + Kσ dW
        = [ K̇x + φ̇ + K( Ax − B R⁻¹Bᵀ(Kx + φ) ) ] dt + p*_x σ dW,   with p*_x = K.

From the FBSDE system we know

    dp* = ( −Qx − Aᵀp* ) dt + p*_x σ dW = −[ Qx + Aᵀ(Kx + φ) ] dt + p*_x σ dW.

Combining the two equations and using p_x = K yields:

    [ K̇x + φ̇ + K( Ax − B R⁻¹Bᵀ(Kx + φ) ) ] dt + Kσ dW = −[ Qx + Aᵀ(Kx + φ) ] dt + Kσ dW

and therefore:

    K̇x + φ̇ + K( Ax − B R⁻¹Bᵀ(Kx + φ) ) = −Qx − Aᵀ(Kx + φ).

This finally leads to the following two differential equations for K and φ:

    K̇(t) = −K(t)A(t) − Aᵀ(t)K(t) + K(t)B(t)R⁻¹(t)Bᵀ(t)K(t) − Q(t)
    K(T) = F

    φ̇(t) = −[ A(t) − B(t)R⁻¹(t)Bᵀ(t)K(t) ]ᵀ φ(t)
    φ(T) = 0.

Note that this result illustrates two important facts. On the one hand, the solution procedure with the FBSDE from Pontryagin's Maximum Principle and the PDE from the HJB equation end up with equivalent results for the optimal control u*. On the other hand, we see that for our process with linear drift and additive Gaussian noise, the result is the same as in the purely deterministic case. We call this property the certainty equivalence principle.


6 Financial Applications

Successful investing is anticipating the anticipations of others.
— John Maynard Keynes (1883–1946)

6.1 Introduction

This section is an introduction to the terms of financial markets. In its simplest form, a security is a contract to receive prospective benefits under stated conditions (e.g., stocks, bonds, options). A financial investment, in contrast to a real investment which involves tangible assets (e.g., land, factories), is an allocation of money with contracts in order to make its value increase over time. Therefore, an investment can be looked at as a sacrifice of current cash for future cash. Conversely, if consumption exceeds the present savings, one has to be willing to pay back more in the future. The interest rate or return is defined as the gain or loss of the investment divided by the initial value of the investment.

The two main attributes that distinguish securities are time and risk. An investment always contains some sort of risk, where risk also includes inflation. According to the above statements, the higher an investor considers the risk of a security, the higher the rate of return he demands, sometimes referred to as the risk premium.

The first type of securities considered are Treasury Bills. They involve loaning on a short-term basis to the U.S. Treasury. Treasury Bills contain almost no risk.

The second type of securities are bonds. Bonds, like Treasury Bills, involve lending money, but on a fairly long-term basis. Bonds result in a cash payment each year (the coupon) up to their expiry, the maturity date, when the final cash payment (the principal or face value) is made. There is a market for bonds where they can be bought and sold; the price of a bond varies over its lifetime. There are two categories of bonds: government bonds and corporate bonds. The bond rating system is a measure of a company's credit risk. There are several rating agencies such as Moody's, Standard & Poor's, and Fitch.

The last type of securities are stocks. Stocks (or shares) represent ownership in a part of a corporation. The board of directors of a corporation decides when to pay cash dividends and how much.

Another important term is short selling. An investor who is short selling (sometimes referred to as "shorting") assets believes in declining prices. Short selling essentially means selling stocks which are borrowed with the intention to buy them back later. Of course, it is not possible to cover all terms of investing in this small section. For more details, the reader should refer to [31], [34], and [23].

6.1.1 Continuous Compounding

In the financial environment, a return is always calculated with respect to the length of the sample period. The return is usually calculated on a yearly basis if not stated otherwise (sometimes denoted by p.a.). In the discrete case, the return r of the sample period [t, t+1] on a security with value S is defined by

    r_t = (S_{t+1} − S_t) / S_t.

From the equation above, it follows that S_{t+1} = S_t(1 + r_t). If we assume that the security S has a constant interest rate of r, the value after k periods is calculated as S_{t+k} = S_t(1 + r)^k. If the sample period is divided into n equal intervals, the return in one such interval would be r/n. This yields the following wealth after k periods:

    S_{t+k} = S_t (1 + r/n)^{nk}.

We now let n tend to infinity and derive the formula for continuous compounding:

    S_{t+k} = S_t lim_{n→∞} (1 + r/n)^{nk} = S_t e^{rk}.

Now, k no longer needs to be an integer, and therefore we have S_{t+Δt} = S_t e^{rΔt}. This result will play an important role in the valuation of securities and derivatives in continuous time.

First of all, we define the present value of a known cashflow (positive or negative) in the future. If r is the interest rate, then receiving (or paying) S dollars in one year is worth S/(1+r) today; in the continuous case, it is worth S e^{−r} today. When the cashflow takes place in T years, then its present value is S (1+r)^{−T}, or S e^{−rT} in the continuous case. Note that r is assumed constant, but the concept can easily be expanded to variable interest rates.
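A small numerical check of the compounding formulas above; the security value, rate, and horizon are illustrative assumptions.

    # Sketch: discrete vs. continuous compounding for S_t = 100, r = 5% p.a., k = 2 years
    import math

    S, r, k = 100.0, 0.05, 2.0
    for n in (1, 4, 12, 365):                     # compounding intervals per period
        print(n, S * (1 + r / n) ** (n * k))      # S_t (1 + r/n)^{nk}
    print("continuous:", S * math.exp(r * k))     # limit n -> infinity: S_t e^{rk}
    print("present value of 100 due in 2y:", 100 * math.exp(-r * k))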

6.1.2 Net Present Value

This section considers the problem of how to value future cash flows, e.g., bonds. How should a coupon-paying bond be valued? The concept of the net present value deals with this kind of question.

The process of valuing future cashflows as equivalent to present values is called discounting. The net present value (NPV) of an investment (or project) is defined as the sum of all discounted future cashflows:

    NPV = Σ_{i=1}^∞ S_i (1/(1 + r_i))^i = Σ_{i=1}^∞ S_i e^{−r(t_i) t_i} = ∫₀^∞ S(t) e^{−r(t) t} dt.

When calculating the present value of future cashflows, one needs to make an assumption about the interest rate r. If an investor has several possibilities to invest his money, he should invest in the one having the highest net present value.

In the case of bonds, we know the present value of the bond since it is sold on the market. Therefore, the only unknown in the equation of the net present value is the interest rate (r_i = r = const). The resulting interest rate is called the yield to maturity. The yield to maturity is used to value bonds.
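The following sketch illustrates the yield-to-maturity computation: given a market price, the constant rate r that makes the NPV of the bond's cashflows equal to the price is found by bisection. The bond data are illustrative assumptions.

    # Sketch: NPV of a coupon bond and its yield to maturity (assumed data:
    # 5% annual coupon, face value 100, 10 years, observed price 95).
    cashflows = [5.0] * 9 + [105.0]                 # coupons, principal at maturity

    def npv(r, cfs):
        return sum(c / (1 + r) ** (i + 1) for i, c in enumerate(cfs))

    price = 95.0
    lo, hi = 0.0, 1.0                               # bisection: npv is decreasing in r
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if npv(mid, cashflows) > price else (lo, mid)
    print(f"yield to maturity ~ {0.5 * (lo + hi):.4%}")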

6.1.3 Utility Functions

Suppose an investor has many different investment opportunities that could influence his wealth at the end of the year. Once the investor decides to allocate his capital among the alternatives, his future wealth is governed by the corresponding random cash-flows of the investment opportunities. If all outcomes were certain, it would be easy to rank the options. In an uncertain environment, this decision is not so obvious. Utility functions provide a ranking to judge uncertain situations. For a risk-averse investor, a utility function U must fulfill certain properties:

• A utility function must be an increasing continuous function: U′ > 0.
• A utility function must be concave: U″ < 0.

The first property makes sure that an investor always prefers more wealth to less wealth. The second property captures the principle of risk aversion. Some commonly used utility functions include

1. the exponential functions (a > 0): U(x) = −e^{−ax}
2. the logarithmic function: U(x) = ln(x)
3. the power functions (b < 1 and b ≠ 0): U(x) = (1/b) x^b
4. the quadratic functions (x < a/(2b)): U(x) = ax − bx².

Any of the utility functions shown above captures the principle of risk aversion. This is accomplished whenever the utility function is concave. To illustrate this point, we consider two wealth possibilities. The first will either yield, with equal probability, x₁ or x₂ (x₂ > x₁). The second possibility will always yield ½(x₁ + x₂). Both options have the same expected value, but the first option involves more risk, since one might end up with U(x₁). The second option is risk free, since the payoff is deterministic. To rank both options, the expected utility is computed. The first option yields ½[U(x₁) + U(x₂)] and the second U(½(x₁ + x₂)). Since U is concave, the line between U(x₁) and U(x₂) lies below the graph of U, as shown in Figure 6.1, and thus the second option has a higher utility.

[Figure 6.1 — Risk averse utility functions: the chord through (x₁, U(x₁)) and (x₂, U(x₂)) lies below the concave graph of U; axes: wealth x vs. utility U.]

Risk aversion coefficients. Since we are not actually interested in the absolute values of utility functions but rather in their shape, Pratt and Arrow have developed measures for risk aversion. These are

• the Arrow-Pratt measure of absolute risk aversion: a(x) = −U″(x)/U′(x),
• the Arrow-Pratt measure of relative risk aversion: b(x) = −x U″(x)/U′(x).
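A quick numerical illustration of the concavity argument and the Arrow-Pratt measures, using the logarithmic utility from the list above; the wealth levels are assumed for the example.

    # Sketch: risk aversion with U(x) = ln(x). The risky option pays x1 or x2
    # with probability 1/2 each; the risk-free option pays the mean.
    import math

    x1, x2 = 50.0, 150.0
    risky = 0.5 * (math.log(x1) + math.log(x2))
    safe = math.log(0.5 * (x1 + x2))
    print(risky, safe, safe > risky)      # True: the risk-free option wins

    # Arrow-Pratt measures for U(x) = ln(x): U' = 1/x, U'' = -1/x^2, so
    # a(x) = -U''/U' = 1/x and b(x) = -x U''/U' = 1 (constant relative risk aversion)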

6.2 Mean-Variance Portfolio Theory

6.2.1 Introduction

The first attempt to systematically solve the asset allocation problem was the mean-variance approach. Mean-variance portfolio selection was introduced by Harry Markowitz in 1952 and is called modern portfolio theory. An investor usually faces the problem of investing money in n different risky securities. Mean-variance portfolio selection gives a solution to this kind of problem.

The fundamental assumption of the mean-variance theory is that the returns are normally distributed. Although this is not always the case, it is still a reasonable assumption. Since the returns of the securities are normally distributed, their distributions are fully described by their means and variances.

A portfolio P is fully described by the vector u = [u₁, u₂, ..., uₙ]ᵀ, whose elements state how much of the total wealth should be invested in each of the n securities. Therefore, the sum of the elements of u must be one: Σᵢ uᵢ = 1. The returns Rᵢ of the n securities are described by their means, denoted by the vector with elements μᵢ, and their covariance matrix Σ. This yields the expected portfolio return

    μ_P = E[R_P] = uᵀμ = Σᵢ uᵢ μᵢ

and the variance of the portfolio

    σ_P² = var(R_P) = uᵀΣu = Σᵢ Σⱼ σᵢⱼ uᵢ uⱼ   (σᵢᵢ = σᵢ²).

With this framework, we can now construct infinitely many possible portfolios. But we want to construct an optimal portfolio in the mean-variance sense. Such a portfolio is also called mean-variance efficient and is defined as follows:

Definition 6.1. A portfolio u* is called mean-variance efficient if there exists no portfolio u with μ_u ≥ μ_{u*} and σ_u² ≤ σ²_{u*}.

Having introduced the concept of utility, it is natural to maximize the expected end-of-period utility:

    max_{u∈ℝⁿ} E{ U[W₀(1 + R_P)] }
    s.t. eᵀu = 1,  with e = [1, ..., 1]ᵀ ∈ ℝⁿ.

In general, there exists no analytical solution for this problem.

6.2.2 The Markowitz Model

The solution provided by Markowitz is very popular and still widely used in practice. Markowitz was the first to include the covariances of the returns in his model. The statement of the Markowitz optimization is the following:

    max_{u∈ℝⁿ} { −½ uᵀΣu + τ μᵀu }
    s.t. eᵀu = 1,  with e = [1, ..., 1]ᵀ ∈ ℝⁿ.

Here, τ ≥ 0 measures the investor's risk tolerance and is closely related to the Arrow-Pratt measure of relative risk aversion. To find a solution to this problem, we use the method of Lagrange multipliers:

    L(u, λ) = −½ uᵀΣu + τ μᵀu + λ(eᵀu − 1)
    ∂L/∂u = −Σu + τμ + λe = 0
    ∂L/∂λ = eᵀu − 1 = 0.

Since Σ is positive-definite, the two conditions are necessary and sufficient conditions for optimality. The solution for λ is

    λ = 1/(eᵀΣ⁻¹e) − τ (eᵀΣ⁻¹μ)/(eᵀΣ⁻¹e).

This gives the following solution for u:

    u* = Σ⁻¹( (1/(eᵀΣ⁻¹e)) e + τ ( μ − (eᵀΣ⁻¹μ)/(eᵀΣ⁻¹e) e ) ).

Every solution of the problem formulation above is a mean-variance efficient portfolio. Every solution u*(τ > 0) lies on the so-called efficient set, sometimes also denoted as the efficient frontier. The optimal allocation u*(τ = 0) has the lowest possible variance and is therefore called the minimum variance point. Figure 6.2 shows the result.
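The closed-form solution above is easy to evaluate numerically; in the following sketch, μ and Σ are illustrative assumptions for three risky assets.

    # Sketch of the closed-form Markowitz solution; data are assumptions.
    import numpy as np

    mu = np.array([0.08, 0.10, 0.12])
    Sigma = np.array([[0.04, 0.01, 0.00],
                      [0.01, 0.09, 0.02],
                      [0.00, 0.02, 0.16]])
    e = np.ones(3)
    Si = np.linalg.inv(Sigma)
    tau = 0.5                                        # risk tolerance

    u = Si @ (e / (e @ Si @ e) + tau * (mu - (e @ Si @ mu) / (e @ Si @ e) * e))
    print("weights:", u, "sum:", u.sum())            # the weights sum to one
    print("mu_P:", u @ mu, "sigma_P:", np.sqrt(u @ Sigma @ u))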

[Figure 6.2 — Mean variance plot and efficient frontier: points u*(τ > 0), û, and the minimum variance point u*(τ = 0) in the σ–μ plane.]

By including m additional, arbitrary constraints (e.g., nonnegativity of the elements of u), the optimization looks as follows:

    max_{u∈ℝⁿ} { −½ uᵀΣu + τ μᵀu }
    s.t. eᵀu = 1,  e = [1, ..., 1]ᵀ ∈ ℝⁿ
         Au ≤ b,  with A ∈ ℝᵐˣⁿ, b ∈ ℝᵐ.

This type of problem is called a quadratic program. There are numerous programs for numerically solving this kind of problem, even for very large dimensions.

Inclusion of a risk-free asset. So far, we have considered only n risky assets with σ > 0. But what happens if we include a riskless asset with return r and σ = 0? In this section, we want to answer this question. It is no longer necessary for the uᵢ's to sum to one; we rather invest the difference to one, (1 − eᵀu), in the riskless security. Intuitively, this makes sense. The new optimization problem is

    max_{u∈ℝⁿ} { −½ uᵀΣu + τ ( μᵀu + (1 − eᵀu)r ) }.

Since we have no constraints (because they are inherent in the problem), the problem can be solved in a straightforward manner. As Σ is positive-definite, the first-order condition is a necessary and a sufficient condition:

    τ(μ − er) − Σu = 0  ⇔  u = τ Σ⁻¹(μ − er).

Hence the portfolio mean and variance are

    μ_P = τ (μ − er)ᵀΣ⁻¹(μ − er) + r
    σ_P² = τ² (μ − er)ᵀΣ⁻¹(μ − er).

Therefore, the efficient frontier in the σ–μ plane is a straight line, which is shown in Figure 6.3.
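The following sketch evaluates the risk-free-asset solution u = τΣ⁻¹(μ − er) and the resulting straight-line frontier; the data are the same illustrative assumptions as above.

    # Sketch: optimal weights with a riskless asset; the remainder 1 - e'u
    # is invested at the risk-free rate r. Data are assumptions.
    import numpy as np

    mu = np.array([0.08, 0.10, 0.12])
    Sigma = np.array([[0.04, 0.01, 0.00],
                      [0.01, 0.09, 0.02],
                      [0.00, 0.02, 0.16]])
    e, r, tau = np.ones(3), 0.03, 0.5

    u = tau * np.linalg.solve(Sigma, mu - e * r)
    u0 = 1.0 - e @ u                                 # fraction in the riskless asset
    excess = (mu - e * r) @ np.linalg.solve(Sigma, mu - e * r)
    print("risky weights:", u, "riskless:", u0)
    print("mu_P:", tau * excess + r, "sigma_P:", tau * np.sqrt(excess))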

[Figure 6.3 — Mean variance plot and efficient frontier with a riskless asset: the capital market line through r is tangent to the risky efficient frontier at ū; u*(τ = 0) marks the minimum variance point.]

Every mean-variance efficient portfolio (without the riskless asset) combined with the riskless asset yields a straight line. Of all such lines, the best one is the tangent to the original efficient set. Therefore, every portfolio is a combination of the tangency portfolio ū and the riskless asset. For a more detailed treatment of the topic of mean-variance, the reader may consult [29] or [23].

6.2.3 The Capital Asset Pricing Model (CAPM)

The capital asset pricing model (CAPM) was developed by Sharpe, Lintner, and Mossin (1964). The assumptions behind the CAPM are pretty daring, but the CAPM is still widely used because of its simplicity. The main assumptions are:

• Everyone has the same estimations for μ and Σ for the available risky assets and can invest at the same risk-free rate.
• All information about the market is freely and instantly available to all investors.
• Everyone uses the same mean-variance approach for constructing his portfolio.
• There are no transaction costs or taxes.

All these assumptions require the market to be in equilibrium. Under these assumptions, everybody chooses the same tangency portfolio ū, as in Figure 6.3. In the case of the CAPM, the portfolio ū is called the market portfolio, and its corresponding tangency line is called the capital market line. The market portfolio consists of all shares available in the market. If a security were not in the market portfolio,

nobody would buy it, and therefore its price would decline until it yields a return that makes it enter the market portfolio again. Therefore, it can be stated: The market portfolio is a portfolio consisting of all securities in which the proportion invested in each security corresponds to its relative market value. The relative market value of a security is simply the aggregate market value¹ of the security divided by the sum of the aggregate market values of all securities.

Since every investor holds a portfolio on the capital market line, the expected portfolio return can be calculated from its standard deviation and vice versa:

    μ_P = r + ((μ_M − r)/σ_M) σ_P

    μ_P : expected return of the portfolio
    r : risk-free interest rate
    μ_M : expected return of the market
    σ_M : standard deviation of the market
    σ_P : standard deviation of the portfolio.

While we know the relation between return and variance for efficient portfolios, we still do not know the relation for individual securities. The relationship is the following:

    μᵢ − r = βᵢ (μ_M − r),   where βᵢ = σᵢM/σ_M² = ρᵢM σᵢ σ_M/σ_M² = ρᵢM σᵢ/σ_M,

with σᵢM the covariance of the i-th asset with the market. For a proof of this formula see [23]. From the results obtained so far, the return R of a security can be described as follows:

    Rᵢ = r + βᵢ (μ_M − r) + εᵢ,

where εᵢ is an arbitrary random variable with E[εᵢ] = 0 and σ_{εᵢM} = 0. By combining n arbitrary assets of the market, the resulting portfolio can again be expressed by the CAPM. The beta of the portfolio is

    β_P = Σᵢ βᵢ uᵢ  ⇒  μ_P = r + β_P (μ_M − r),

and therefore, the expected return of the portfolio can easily be calculated. For more details, see [34].

¹ The aggregate market value for a common stock of a company is equal to the current market price of the stock multiplied by the number of shares outstanding.
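A sketch of the beta relation above, estimating βᵢ = σᵢM/σ_M² from return series; the series are simulated placeholders, not market data.

    # Sketch: estimate beta from returns and apply the CAPM relation.
    import numpy as np

    rng = np.random.default_rng(0)
    r_f, mu_M, sigma_M = 0.02, 0.08, 0.15
    R_M = rng.normal(mu_M, sigma_M, 10_000)          # market returns
    eps = rng.normal(0.0, 0.10, 10_000)              # idiosyncratic part, E[eps] = 0
    R_i = r_f + 1.3 * (R_M - r_f) + eps              # security with true beta = 1.3

    beta = np.cov(R_i, R_M)[0, 1] / np.var(R_M, ddof=1)   # beta_i = sigma_iM / sigma_M^2
    print("estimated beta:", beta)
    print("CAPM expected return:", r_f + beta * (mu_M - r_f))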

From this definition, it is obvious that a security has two components of risk:

    σᵢ² = βᵢ² σ_M² + σ_{εᵢ}².

The first term, βᵢ² σ_M², is called systematic risk, which is tied to the market; hence it cannot be reduced by diversification. Every security with β ≠ 0 includes this type of risk. The second part, σ_{εᵢ}², is the specific risk of every single security. It is also called non-systematic or idiosyncratic risk. It is unrelated to the market and can therefore be reduced by diversification. The total risk of a portfolio under these assumptions is:

    σ_P² = β_P² σ_M² + σ_{εᵢP}² = ( Σᵢ βᵢ uᵢ )² σ_M² + Σᵢ uᵢ² σ_{εᵢ}².

To see the effect of diversification, we assume that the proportion in each of the n securities is uᵢ = 1/n. Then we get the following specific variance of the portfolio, where σ̄_{εᵢ}² denotes the average variance:

    σ_{εᵢP}² = Σᵢ (1/n²) σ_{εᵢ}² = (1/n) ( Σᵢ σ_{εᵢ}² / n ) = (1/n) σ̄_{εᵢ}².

Therefore, the specific variance will vanish as n tends to infinity.
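A quick numerical check of the 1/n decay derived above, with an assumed average specific variance:

    # Sketch: the specific portfolio variance decays like 1/n for equal
    # weights u_i = 1/n; average specific variance 0.04 is an assumption.
    avg_spec_var = 0.04
    for n in (1, 10, 100, 1000):
        print(n, avg_spec_var / n)        # sigma_eps,P^2 = (1/n) * average variance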

6.2.4 Arbitrage Pricing Theory (APT)

The Arbitrage Pricing Theory (APT) does not require the same assumptions as the CAPM. The main assumptions of the APT are that capital markets are perfectly competitive, that investors always prefer more return to less return with the same risk, and that the investment universe is sufficiently large. Before we can state the APT, we need to introduce factor models.

Factor Models

Factor models relate the returns of a security to one or more common factors (e.g., GDP, interest rates, inflation, yield spreads, etc.). Since there is always uncertainty in security returns, there is, for every security, a unique error term, which is uncorrelated with the error terms of the other securities. It is also assumed that the unique error term is uncorrelated with every factor included in the model. Therefore, factor models explain the random return R of a security in the following manner:

    Rᵢ = aᵢ + Σ_{j=1}^m bᵢⱼ Fⱼ + εᵢ

    Rᵢ : the return of the i-th asset
    aᵢ, bᵢⱼ : constants
    Fⱼ : a factor
    εᵢ : error term with E[εᵢ] = 0.

The aᵢ's are sometimes denoted as intercepts and the bᵢⱼ's as factor loadings. Remember that the CAPM is a one-factor model, where the only factor is the return of the market. The only difference is that the CAPM is an equilibrium model whereas factor models usually are not. This is because the intercept in the CAPM is fixed, but in a general factor model, the intercepts may vary for different securities.

As in the CAPM, the diversification effect also reduces the specific risk of a portfolio. This is illustrated as follows. Suppose a portfolio of n securities is constructed with proportion uᵢ of total wealth invested in the i-th security. This yields the random rate of return R_P of the portfolio:

    R_P = Σᵢ uᵢ aᵢ + Σ_{j=1}^m bⱼ Fⱼ + ε,   where bⱼ = Σᵢ uᵢ bᵢⱼ and σ_ε² = Σᵢ uᵢ² σ_{εᵢ}².

It is also assumed that the portfolio is well diversified, i.e., there is no dominating uᵢ. We define ū = max(uᵢ) and σ̄_ε² = max(σ_{εᵢ}²). Since we demand that there is no dominating uᵢ, we know that nū ≈ 1 and therefore ū ≈ 1/n, and we get for the specific portfolio variance

    σ_ε² ≤ Σᵢ (1/n²) σ̄_ε² = (1/n) σ̄_ε².

Therefore, the specific risk tends to zero for a well-diversified portfolio as n tends to infinity. This argument is inherent in the APT.

Statement of the APT

The APT theory was developed by Ross in 1976. APT states that prices are generated by a factor model but does not state which factors. The weights of the different factors determine the expected return, variance, and covariances of every security. APT extends

the factor model approach such that the market is in equilibrium again. The APT simply states that, given all bᵢⱼ, there is no way to choose the aᵢ's freely; they have to be calculated with the equations below. This is the main argument used to construct the theory.

As shown in the previous chapter, we can omit the specific risk in a well-diversified portfolio. As a consequence, the uncertainty stems only from the uncertainty of the factors. Therefore, the return can be expressed as

    Rᵢ = aᵢ + Σ_{j=1}^m bᵢⱼ Fᵢⱼ,   E[Fᵢⱼ] = 0.

Suppose we hold a portfolio u. We now state that there is no way to change this portfolio by a so-called arbitrage portfolio Δu which yields a higher return but bears the same risk as the original portfolio. The arbitrage portfolio Δu has the following characteristics:

    Σᵢ Δuᵢ = 0                       (Δu is self-financing)
    Σᵢ Δuᵢ bᵢⱼ = 0,  j = 1, ..., m   (Δu contains no factor risk)
    Σᵢ Δuᵢ aᵢ = 0                    (Δu must not have a return).

In order for the market to be in equilibrium, there must not be any arbitrage opportunities. We therefore have the following system of m + 2 linear equations:

    [ 1    1    ···  1   ] [ u₁ ]   [ 0 ]
    [ a₁   a₂   ···  aₙ  ] [ u₂ ]   [ 0 ]
    [ b₁₁  b₂₁  ···  bₙ₁ ] [ u₃ ] = [ 0 ]
    [ ...  ...       ... ] [ ...]   [...]
    [ b₁ₘ  b₂ₘ  ···  bₙₘ ] [ uₙ ]   [ 0 ]

Obviously, the matrix of coefficients must be singular. Therefore, we can express any row of the system as a linear combination of the remaining rows. As a consequence, there exist λ₀, λ₁, ..., λₘ such that

    aᵢ = λ₀ + Σ_{j=1}^m bᵢⱼ λⱼ.

Since E[Fᵢⱼ] = 0, we now have the following expected return:

    E[Rᵢ] = λ₀ + Σ_{j=1}^m bᵢⱼ λⱼ.

If there is a riskless asset with return r, we know that λ₀ = r. The λⱼ's are called factor risk premia, that is, the expected excess return on a portfolio that has unit sensitivity to the corresponding factor.

6.3 Continuous-Time Finance

6.3.1 Introduction

Continuous-time finance was developed in the late 1960s; Paul Samuelson and Robert Merton were among the main contributors. The breakthrough came in 1973, when Fischer Black and Myron Scholes presented their formula for pricing options. There is vast research today in the area of continuous-time finance. In the context of continuous-time finance, stochastic optimal control plays a very important role.

6.3.2 The Dynamics of Asset Prices

This section introduces models to describe asset price dynamics. The model usually consists of some type of stochastic differential equation. The following assumptions are necessary to develop the models:

• Trading is continuous.
• The investor can do short selling.
• The investor can borrow and invest money at the same deterministic interest rate.
• The investor cannot influence prices by buying or selling an asset.
• There are no transaction costs, fees, or taxes.

To develop the model, we consider a market in which n risky assets exist. The asset price processes Sᵢ(t) of the risk-bearing investments satisfy the stochastic differential equation

    dSᵢ(t)/Sᵢ(t) = μᵢ(x(t), t) dt + σᵢ(x(t), t) dW_S(t)
    Sᵢ(0) > 0,

where μᵢ ∈ ℝ is the relative change in price of Sᵢ and σᵢσᵢᵀ is the covariance per unit time (σᵢ ∈ ℝ^{1×n} is the i-th row of the matrix σ(x(t), t) ∈ ℝ^{n×n}). The n-dimensional Brownian motion dW_S is defined on a fixed, filtered probability space (Ω, F, {F_t}_{t≥0}, P). In general, μᵢ and σᵢ may be stochastic processes: they can depend on some arbitrary factors x(t), which may be described by SDEs themselves, and on the time t. The type of stochastic differential equation is a generalized geometric Brownian motion with possibly stochastic drift and diffusion. In the case when μᵢ(x(t), t) and σᵢ(x(t), t) are constant, the price process is the famous geometric Brownian motion. This model reflects important characteristics of real stock or bond prices (e.g., no negative prices are observed).

We add a further, riskless asset (bank account) with a deterministic interest rate r and the following dynamics:

    dB(t) = B(t) r(t, x(t)) dt
    B(0) > 0.
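The constant-coefficient special case mentioned above can be simulated directly; the following sketch uses an Euler discretization with illustrative parameters.

    # Sketch: Euler simulation of geometric Brownian motion
    # dS = mu S dt + sigma S dW; parameters are assumptions.
    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, S0, T, n = 0.08, 0.20, 100.0, 1.0, 252
    dt = T / n

    S = np.full(10_000, S0)                           # 10,000 sample paths
    for _ in range(n):
        dW = rng.normal(0.0, np.sqrt(dt), S.size)
        S += mu * S * dt + sigma * S * dW             # exact model keeps prices positive
    print("mean S(T):", S.mean(), "~", S0 * np.exp(mu * T))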

The process x(t) allows us to model variables which describe external influence factors of either macroeconomic, industry-specific, or company-specific nature. Examples of these factors are:

    Macroeconomic factors      Industry specific           Company specific
    GDP growth                 sector growth               dividends
    long term interest rate    industry rate of returns    earnings
    inflation                  industry leverage           cash-flow

The price processes are divided into a function for instantaneous expected returns μᵢ(x(t), t) and a function for instantaneous volatility σᵢ(x(t), t). Both are functions of the external economic variables x(t), and thereby we are able to model the influences of the economic factors on the instantaneous expected returns and volatility of each individual security. By carefully selecting external variables with some predictive capacity, we can model the time-varying features of the asset price dynamics. A further advantage is that these variables are measurable and are not contained invisibly in the price dynamics.

The relationships between the external economic variables and the asset price dynamics, as well as the dynamics of the external variables, can be derived by a theoretical model or by empirical methods applied to financial data. For example, a popular interest rate model could be taken, which theoretically proposes interest rate dynamics and the relationship between the interest rates and the price dynamics of bonds. Alternatively, empirical time series modelling could be used to establish the relationship between asset prices and external economic factors. The setup of equations imposes a structure to model asset price dynamics. The framework is very broad and thus allows us to model different types of securities, such as stocks, bonds, options, etc.

6.3.3 Wealth Dynamics and Self-Financing Portfolios

The derivation of the wealth dynamics of a portfolio was first developed by Merton, see [25]. In this context, we will use a slightly different approach to develop the wealth dynamics. First, we have the fundamental relationship which holds in the discrete-time and in the continuous-time case:

    X(t) = Σᵢ₌₁ⁿ Nᵢ(t) Sᵢ(t)

    X(t) : wealth of the investor at time t
    Nᵢ(t) : number of shares of the i-th security in the portfolio
    Sᵢ(t) : share price of the i-th security
    n : the number of existing shares on the exchange.

The only way an investor can change the wealth dynamics is by buying or selling shares. Once the portfolio is constructed and no shares are either bought or sold, the wealth changes according to the changes of the share prices.

Consider the situation where the investor can only change his portfolio in discrete time, and during the sampling period Δt he cannot change the portfolio. If we also allow the investor to take money out of the portfolio, we get the following budget equation:

    X(t + Δt) − X(t) = Σᵢ Nᵢ(t)[Sᵢ(t + Δt) − Sᵢ(t)] − C(t + Δt)Δt

    C(t) > 0 : consumption per unit time in the interval [t, t + Δt].

Intuitively, the above equation states that stochastic changes in wealth may only occur from changes in the share prices. Letting Δt → 0, the budget equation yields

    dX = Σᵢ Nᵢ(t) dSᵢ − C(t) dt.

We have already modelled the dynamics of the share prices (omitting the time arguments):

    dSᵢ = Sᵢ μᵢ dt + Sᵢ σᵢ dW_S.

Putting this into the wealth equation, we get

    dX = ( Σᵢ Nᵢ Sᵢ μᵢ − C ) dt + Σᵢ Nᵢ Sᵢ σᵢ dW_S.

As in the mean-variance portfolio theory, we define the elements uᵢ of the control vector u as the fraction of wealth invested in the i-th security: uᵢ = NᵢSᵢ/X, whereas u₀ denotes the proportion of wealth invested at the risk-free rate r (σ₀ = 0). This yields the wealth equation

    dX = ( Σᵢ₌₀ⁿ uᵢ X μᵢ − C ) dt + Σᵢ₌₀ⁿ uᵢ X σᵢ dW_S.

By definition, we have Σᵢ₌₀ⁿ uᵢ = 1, therefore u₀ = 1 − Σᵢ₌₁ⁿ uᵢ, so the wealth equation becomes

    dX = ( Xr + X Σᵢ₌₁ⁿ uᵢ(μᵢ − r) − C ) dt + X Σᵢ₌₁ⁿ uᵢ σᵢ dW_S.

In vector notation, with e = (1, ..., 1)ᵀ ∈ ℝⁿ, we get

    dX = [ X( uᵀ(μ − e r) + r ) − C ] dt + X uᵀσ dW_S.            (6.1)

By including the arguments, the equation becomes much harder to read:

    dX(t) = u(t)ᵀ( μ(x(t), t) − e r(t, x(t)) ) X(t) dt + r(t, x(t)) X(t) dt
          + X(t) u(t)ᵀ σ(x(t), t) dW_S(t) − C(t) dt,

where u(t) = (u₁(t), ..., uₙ(t))ᵀ ∈ ℝⁿ, dW_S(t) ∈ ℝⁿ, and

    μ(x(t), t) = (μ₁(x(t), t), ..., μₙ(x(t), t))ᵀ ∈ ℝⁿ
    σ(x(t), t) = (σ₁(x(t), t), ..., σₙ(x(t), t))ᵀ ∈ ℝ^{n×n}.

Before we can conclude the section, there is still one point which needs clarification, which we will show now. If the original wealth equation, X(t) = Σᵢ Nᵢ(t)Sᵢ(t), is differentiated according to Itô, we get:

    dX(t) = Σᵢ Nᵢ(t) dSᵢ(t) + Σᵢ dNᵢ(t) Sᵢ(t) + Σᵢ dNᵢ(t) dSᵢ(t).

The first term on the right-hand side reflects capital gains. Therefore, the other two terms on the right-hand side must be the consumption of the investor. Again, we consider the budget equation, but set X(t) to Σᵢ Nᵢ(t)Sᵢ(t):

    Σᵢ Nᵢ(t + Δt)Sᵢ(t + Δt) − Σᵢ Nᵢ(t)Sᵢ(t) = Σᵢ Nᵢ(t)[Sᵢ(t + Δt) − Sᵢ(t)] − C(t + Δt)Δt.

By rearranging the terms, we get:

    −C(t + Δt)Δt = Σᵢ [Nᵢ(t + Δt) − Nᵢ(t)] Sᵢ(t + Δt)
                 = Σᵢ [Nᵢ(t + Δt) − Nᵢ(t)][Sᵢ(t + Δt) − Sᵢ(t)] + Σᵢ [Nᵢ(t + Δt) − Nᵢ(t)] Sᵢ(t).

For Δt → 0, we arrive at

    −C(t) dt = Σᵢ dNᵢ(t) dSᵢ(t) + Σᵢ dNᵢ(t) Sᵢ(t).

This concludes the derivation of the wealth equation (6.1). In [25], Merton introduces a non-capital gains (wage) addition to the consumption, denoted by dy. In this context, we forego non-capital gains and therefore set dy to zero.

6.3.4 Portfolio Models and Stochastic Optimal Control

Example 1: Merton's Problem

We consider a market in which n + 1 assets are traded continuously. One of the assets is a bond whose price process B(t) is given by the deterministic

differential equation

    dB = r(t)B(t) dt
    B(0) = b₀ > 0,

where r(t) > 0 is called the interest rate of the bond. As time passes, B(t) will grow steadily, and the bond is therefore called a riskless asset. The other n assets are called stocks. Their price processes S₁(t), ..., Sₙ(t) satisfy the stochastic differential equations

    dSᵢ(t) = μᵢ(t)Sᵢ(t) dt + σᵢ(t)ᵀ Sᵢ(t) dW(t)
    Sᵢ(0) = sᵢ > 0,

where μ ∈ ℝⁿ is called the appreciation rate (drift) and σᵢ ∈ ℝᵐ (σ ∈ ℝ^{n×m}) are the volatilities (diffusion), which reflect the fluctuations of the stock prices. The m-dimensional Brownian motion dW(t) is defined on a fixed, filtered probability space (Ω, F, {F_t}_{t≥0}, P). The total wealth at time t ≥ 0 is given by X(t). Now, we can apply the wealth equation (6.1):

    dX = [ X( uᵀ(μ − e r) + r ) − C ] dt + X uᵀσ dW,

where u(t) ∈ ℝⁿ is the usual control vector. As a part of the model, we include the investor's objective function. The expected discounted objective (using a weighting factor π) is the following:

    J(u) = E{ ∫₀ᵀ e^{−ρt} (1/γ) C(t)^γ dt + π (1/γ) X(T)^γ }.

Solution of Example 1

We first state the optimization problem:

    max_u E{ ∫₀ᵀ e^{−ρt} (1/γ) C(t)^γ dt + π (1/γ) X(T)^γ }
    s.t. dX = [ X( uᵀ(μ − e r) + r ) − C ] dt + X uᵀσ dW
         X(0) = x₀ > 0.

We define the optimal cost-to-go function by

    J(X(t), t) = max_{C, u(t)} E{ ∫ₜᵀ e^{−ρs} (1/γ) C(s)^γ ds + π (1/γ) X(T)^γ }.

This problem leads to the following HJB equation (∂J/∂t ≡ J_t, ∂J/∂X ≡ J_x, ∂²J/∂X² ≡ J_xx):

    J_t + max_{C(t), u(t)} [ e^{−ρt} (1/γ) C^γ + ( X( uᵀ(μ − e r) + r ) − C ) J_x + ½ X² J_xx uᵀΣu ] = 0

with Σ = Σ(t) = σ(t)σ(t)ᵀ. To derive the optimal C and u, we first do the maximization as demanded by the equation above (treating the partial derivatives as constant):

    u*(t) = −( J_x / (X J_xx) ) Σ⁻¹(μ − er)
    C*(t) = ( e^{ρt} J_x )^{1/(γ−1)}.

We now put these values back into the HJB equation:

    J_t + (1/γ − 1) e^{ρt/(γ−1)} J_x^{γ/(γ−1)} − ½ (J_x²/J_xx) (μ − er)ᵀΣ⁻¹(μ − er) + X r J_x = 0.

We use the following Ansatz, J(X, t) = X^γ e^{−ρt} h(t), and get

    ∂J/∂t = e^{−ρt} X^γ ( h′(t) − ρ h(t) )
    ∂J/∂X = e^{−ρt} h(t) γ X^{γ−1}
    ∂²J/∂X² = e^{−ρt} h(t) γ(γ−1) X^{γ−2}.

Inserting these partial derivatives into the equations for C* and u* yields the optimal policies for consumption C*(t) and investment strategy u*(t):

    u*(t) = −(1/(γ−1)) Σ⁻¹(μ − er) = (1/(1−γ)) Σ⁻¹(μ − er)
    C*(t) = ( h(t) γ X^{γ−1} e^{ρt} e^{−ρt} )^{1/(γ−1)} = ( γ h(t) )^{1/(γ−1)} X.

Plugging the optimal policies and the partial derivatives from above into the HJB equation results in

    e^{−ρt} X^γ ( h′(t) + h(t) ( rγ − ρ + γ (μ − er)ᵀΣ⁻¹(μ − er)/(2(1−γ)) ) + (1−γ) γ^{1/(γ−1)} h(t)^{γ/(γ−1)} ) = 0.

Thus, the ordinary differential equation

    ḣ(t) + A h(t) + (1−γ) γ^{1/(γ−1)} h(t)^{γ/(γ−1)} = 0
    h(T) = (π/γ) e^{ρT},

with A = rγ − ρ + γ (μ − er)ᵀΣ⁻¹(μ − er)/(2(1−γ)), remains to be solved to specify h(t) and find an explicit solution to the optimal control problem.
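The following sketch integrates the h(t) ODE as reconstructed above backward from its terminal condition and evaluates the optimal policies; the scalar market data are assumptions, and the coefficients follow the self-consistent form of the ODE stated above.

    # Sketch: solve the scalar Merton problem numerically (assumed data).
    import numpy as np
    from scipy.integrate import solve_ivp

    mu, sigma, r, rho, gamma, pi_w, T = 0.08, 0.2, 0.03, 0.05, 0.5, 1.0, 1.0
    Q = (mu - r) ** 2 / sigma ** 2                 # (mu - e r)' Sigma^{-1} (mu - e r)
    A = r * gamma - rho + gamma * Q / (2 * (1 - gamma))

    def ode(s, h):                                 # backward time s = T - t
        return A * h + (1 - gamma) * gamma ** (1 / (gamma - 1)) * h ** (gamma / (gamma - 1))

    sol = solve_ivp(ode, (0.0, T), [pi_w / gamma * np.exp(rho * T)])
    h0 = sol.y[0, -1]                              # h(0)
    u_star = (mu - r) / ((1 - gamma) * sigma ** 2) # constant risky fraction
    c_rate = (gamma * h0) ** (1 / (gamma - 1))     # C*(0) = (gamma h(0))^{1/(gamma-1)} X
    print("u* =", u_star, " C*(0)/X(0) =", c_rate)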

Example 2: Stochastic Returns

The second example is an extension of the geometric Brownian motion model. The assumption that the parameters μ and σ are constant is dropped; instead, we model the drift term as a stochastic process. This example features only one stochastic asset, but the framework can easily be extended to many assets, including many drift term processes.

We assume that an investor has the choice to invest in either the stock market or a short-term bond (or money market account), described by

    dB = rB(t) dt
    B(0) = b₀ > 0,

where r denotes the short-term risk-free interest rate and B(t) the price of the continuously compounding bond. The stock market portfolio process is modelled by a geometric Brownian motion, as given by

    dP(t) = μ(t)P(t) dt + σ_P P(t) dW_P
    P(0) = p₀ > 0.

The drift is modelled as a mean-reverting stochastic process given by

    dμ(t) = κ(θ − μ(t)) dt + σ_μ dW_μ
    μ(0) = μ₀ > 0.

The correlation between W_μ and W_P is denoted by ρ.

Let u(t) be the fraction of the wealth invested in the stock market and 1 − u(t) the fraction of wealth invested in the short-term bond. Again, we apply the wealth equation (6.1) and set C(t) ≡ 0:

    dX(t) = X(t)[u(t)(μ(t) − r) + r] dt + X(t)u(t)σ_P dW_P
    X(0) = x₀ > 0.

The wealth dynamics and the dynamics of μ(t) govern the portfolio of the investor. The optimization problem is to choose u(t) in such a way that a risk-adjusted performance measure is maximized. The investor's objective function is assumed to be

    J(u) = E{ (1/γ) X(T)^γ }.

Solution of Example 2

We first state the optimal control problem:

    max_{u(·)} E{ (1/γ) X(T)^γ }
    s.t. dX(t) = X(t)[u(t)(μ(t) − r) + r] dt + X(t)u(t)σ_P dW_P
         dμ(t) = κ(θ − μ(t)) dt + σ_μ dW_μ
         dW_P dW_μ = ρ dt
         X(0) = x₀ > 0,  μ(0) = μ₀ > 0,

where we assume that the investor does not consume any of his wealth until the end of the time horizon. We define the optimal cost-to-go function by

    J(t, X, μ) = max_{u(·)} E{ (1/γ) X(T)^γ }.

Before stating the HJB equation in the usual manner, we reformulate the state equations as follows:

    dX(t) = X(t)[u(t)(μ(t) − r) + r] dt + X(t)u(t)σ_P dW_P
    dμ(t) = κ(θ − μ(t)) dt + σ_μ ( ρ dW_P + √(1 − ρ²) dW_μ ).

These state dynamics lead us to the following HJB equation:

    J_t + max_{u(t)} [ J_x X(t)[(μ − r)u(t) + r] + J_μ κ(θ − μ(t))
                     + ½ J_xx X²(t)u²(t)σ_P² + J_xμ X(t)u(t)ρσ_Pσ_μ + ½ J_μμ σ_μ² ] = 0
    J(T) = (1/γ) X(T)^γ.

The optimal u*(t) has to satisfy the first-order condition:

    u*(t) = −( J_x(μ(t) − r) + J_xμ ρσ_Pσ_μ ) / ( J_xx X(t)σ_P² ).

The optimal u*(t) is plugged back into the HJB equation, and we obtain:

    J_t + J_x X(t) r + J_μ κ(θ − μ(t)) + ½ J_μμ σ_μ²
        − ½ (J_μx² ρ²σ_μ²)/J_xx − (J_μx J_x (μ(t) − r) ρσ_μ)/(J_xx σ_P) − ½ (J_x² (μ(t) − r)²)/(J_xx σ_P²) = 0.

We now guess that the value function J has the form

    J(t, X, μ) = (1/γ) ( X(t) e^{r(T−t)} )^γ e^{a(t) + b(t)μ(t) + c(t)μ²(t)}.

In order to satisfy the terminal condition J(T, X, μ) = (1/γ)X^γ(T), the following conditions are imposed: a(T) = b(T) = c(T) = 0. The partial derivatives of the value function J are computed and put back into the HJB equation. The task of solving a nonlinear PDE is now turned into a task of solving three coupled ODEs for a(t), b(t), and c(t). The following ODEs are obtained:

    (1/γ) ċ(t) − (2κ/γ) c(t) + (2σ_μ²/γ) c²(t) − 1/(2(γ−1)σ_P²)
        − (2ρσ_μ/((γ−1)σ_P)) c(t) − (2ρ²σ_μ²/(γ−1)) c²(t) = 0

with terminal condition c(T) = 0,

    (1/γ) ḃ(t) + (κ/γ)(2θc(t) − b(t)) + (2σ_μ²/γ) b(t)c(t) + (2r/((γ−1)σ_P²)) c(t)
        − (ρσ_μ/((γ−1)σ_P)) (b(t) − 2r c(t)) − (2ρ²σ_μ²/(γ−1)) b(t)c(t) = 0

with terminal condition b(T) = 0, and

    (1/γ) ȧ(t) + (κθ/γ) b(t) + (σ_μ²/(2γ)) (b²(t) + 2c(t)) − r²/(2(γ−1)σ_P²)
        − (ρσ_μ r/((γ−1)σ_P)) b(t) − (ρ²σ_μ²/(2(γ−1))) b²(t) = 0

with terminal condition a(T) = 0. The three ODEs can be solved offline before the controller is used to make the investment decisions. The optimal portfolio control law is given by

    u*(t, X, μ) = ( (μ(t) − r) + ρσ_Pσ_μ [b(t) + 2c(t)μ(t)] ) / ( (1−γ)σ_P² ).

Example 3: Stochastic Interest Rates

The third model assumes that the interest rate is a stochastic process rather than a given constant. Instead of investing in equities (stocks), we assume the investor invests in long-term bonds. The time-varying short-term interest rate is given by

    dr(t) = κ(θ − r(t)) dt + σ_r dW
    r(0) = r₀ > 0,

where r(t) is the short-term interest rate, κ > 0 the rate of reversion, θ the long-run average, r₀ the initial condition, and σ_r the variance parameter. This model is known as the Vasicek model.
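The Vasicek short rate is easy to simulate; the following sketch uses an Euler discretization with illustrative parameters and also evaluates the bond duration factor a(τ) used below.

    # Sketch: Euler simulation of the Vasicek short rate
    # dr = kappa (theta - r) dt + sigma_r dW; parameters are assumptions.
    import numpy as np

    rng = np.random.default_rng(2)
    kappa, theta, sigma_r, r0, T, n = 2.0, 0.04, 0.01, 0.02, 5.0, 1000
    dt = T / n

    r = np.full(5000, r0)
    for _ in range(n):
        r += kappa * (theta - r) * dt + sigma_r * rng.normal(0.0, np.sqrt(dt), r.size)
    print("mean r(T):", r.mean())                          # reverts toward theta = 0.04
    print("a(tau) for tau = 5:", (1 - np.exp(-kappa * 5.0)) / kappa)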

Let B(r(t), τ) denote the price of a zero-coupon bond with τ = T − t periods to maturity. The dynamics of bond returns can be derived using the no-arbitrage argument together with a constant price of risk:

    dB(r(t), τ) = B(r(t), τ) ( [r(t) + λa(τ)] dt − a(τ)σ_r dW ),

where dW denotes the Brownian motion, λ the price of interest rate risk, and a(τ) = (1/κ)(1 − e^{−κτ}).

The investor is assumed to invest only in the long-term bond or a short-term money market account which pays the short-term interest rate r(t). The second investment possibility, the short-term money market account M, is described by

    dM(t) = r(t)M(t) dt
    M(0) = m₀ > 0.

Let u(t) denote the fraction of the wealth invested in the long-term bond and 1 − u(t) the fraction of the wealth put into the money market account. The wealth dynamics are given by

    dX = X(t)u(t) dB(t)/B(t) + (1 − u(t))X(t) dM(t)/M(t)
       = u(t)X(t) ( [r(t) + λa(τ)] dt − a(τ)σ_r dW ) + (1 − u(t))r(t)X(t) dt
       = [ (u(t)λa(τ) + r(t))X(t) − C(t) ] dt − u(t)X(t)a(τ)σ_r dW,

where X(t) denotes the investor's wealth and C(t) denotes the consumption out of the portfolio.

Solution of Example 3

For this model, we assume that the investor wants to maximize his consumption from the portfolio. Instead of maximizing his wealth for a given horizon, we assume that the investor wants the maximum discounted risk-adjusted cashflow of his investments. The optimal control problem is therefore given by

    max_{C(·)≥0, u(·)} E{ ∫₀^∞ e^{−βt} C^{1−γ}/(1−γ) dt }
    s.t. dX = [ (u(t)λa(τ) + r(t))X(t) − C(t) ] dt − u(t)X(t)a(τ)σ_r dW_P
         dr(t) = κ(θ − r(t)) dt + σ_r dW_r
         X(0) = x₀ > 0,  r(0) = r₀ > 0,

where we assume that the consumption is positive, 0 < γ < 1, β is the discount rate, and the optimal portfolio selection is unrestricted. The cost-to-go function is defined as

    J(X, r, t) = max_{C(·)≥0, u(·)} E{ ∫ₜ^∞ e^{−βs} C^{1−γ}/(1−γ) ds }.

For the optimal portfolio and consumption strategy for this model, we obtain the following HJB equation:

    J_t + sup_{u(t), C(t)} [ e^{−βt} C^{1−γ}/(1−γ) + J_x ( u(t)λa(τ)X(t) + rX(t) − C(t) ) + J_r κ(θ − r(t))
                           + ½ J_xx u²(t)X²(t)a²(τ)σ_r² − J_xr u(t)X(t)a(τ)σ_r² + ½ J_rr σ_r² ] = 0.

From the HJB equation, the first-order conditions for optimal consumption and portfolio choice are:

    C(t) = ( e^{βt} J_x )^{−1/γ}
    u(t) = −( J_x λ )/( J_xx X a(τ)σ_r² ) + J_xr/( J_xx X a(τ) ).

Substituting these conditions into the HJB equation gives a second-order nonlinear PDE for the value function J. To solve this equation, we guess that the solution takes the form

    J(X, r, t) = e^{−βt} H(r(t))^γ X(t)^{1−γ}/(1−γ),

where H(r(t)) is a function of the short-term interest rate. Using the guess, we arrive at an ODE of the form

    0 = γ/((1−γ)H) − β/(1−γ) + λ²/(2γσ_r²) + r + ( γκ(θ − r)/(1−γ) − λ ) H′/H + ( γσ_r²/(2(1−γ)) ) H″/H,

where H′ denotes the derivative with respect to r. The ODE can be solved analytically, and it can be shown that H(r) ≥ 0, but the solution involves complicated expressions with gamma functions. The optimal portfolio control law and the optimal consumption law are given by

    u(t) = λ/( γ a(τ)σ_r² ) − H′(r(t))/( H(r(t)) a(τ) )
    C(t) = X(t)/H(r(t)).

6.4 Derivatives

A derivative security (or simply derivative) is a security whose value depends explicitly on other variables. In this section, we consider financial instruments whose payoff is explicitly tied to the payoff of other financial securities. The security which determines the value of the derivative is called the underlying security (or underlying). Derivatives are traded in a standardized fashion on financial markets (e.g., EUREX, Chicago Board of Trade CBOT) and also in an over-the-counter manner, where tailored contracts are sold to investors. This section deals with forward contracts (forwards), futures, and options.

The concept of arbitrage plays an important role in pricing derivatives. In its simplest form, arbitrage means taking simultaneous positions in different securities guaranteeing a higher profit than the risk-free rate of interest. This would lead to infinite gains by investing infinitely much borrowed money in the arbitrage opportunity. We call the interest rate at which cash can be borrowed or loaned the risk-free rate of interest.

6.4.1 Forward Contracts

A forward contract is the simplest example of a derivative. A forward contract is a contract to purchase or sell an asset at a specific price and at a specific time in the future. The party of the contract who agreed to buy the underlying is said to be long. The opposite party, who is to sell the underlying, takes the short position. The price specified in the forward contract is called the delivery price K. The forward price F_t of a forward contract is the delivery price that would apply if the contract were established at that time. The spot price is the price of the asset for immediate delivery. Note that the initial price of a forward contract is zero, and therefore no money is transferred at the beginning.

Example 6.2. (Currency Risk) A Japanese corporation has a contract with a U.S. customer to sell a machine for 1 million dollars in one year. If the exchange rate Yen/Dollar rises in the forthcoming year, the Japanese firm will earn less. The Japanese firm can eliminate this risk by entering into a forward contract to sell 1 million dollars for a fixed amount of yen in one year.

If we denote the price of the underlying security at maturity by S_T and the delivery price by K, the payoff of the long position is S_T − K. This means that the long party of the contract can sell the security at a price of S_T which he bought for K. So, the gain of the long position

is the loss of the short position and vice versa: the payoff of the short position is K − S_T. Figure 6.4 shows the payoff diagram of a forward contract.

[Figure 6.4 — Payoff of a forward contract at maturity, where K is the delivery price and S_T the price of the security at maturity: the long position pays off S_T − K, the short position K − S_T.]

The question which still remains is how to set the delivery price in order to have an initial price of zero. This is the case when there are no arbitrage opportunities.

Delivery price of a forward contract: Suppose that the current price of an asset is S₀ and assume that the asset can be stored at zero cost. The risk-free interest rate is r. Then, the delivery price in a forward contract with time to delivery T is K = e^{rT}S₀ in the continuous case and K = (1 + r)^T S₀ in the discrete case.

Proof: The proof is straightforward, by showing that there exist arbitrage opportunities otherwise. The proof includes only the continuous-time case; the proof for the discrete-time case is analogous.

Suppose that K > e^{rT}S₀. The arbitrage portfolio can then be constructed as follows: borrow S₀ at rate r and buy the security. At maturity, the security can be sold for K and the loan has to be repaid at e^{rT}S₀. The gain at maturity is K − e^{rT}S₀ > 0. Therefore, K must not be larger than e^{rT}S₀.

Suppose that K < e^{rT}S₀. The arbitrage portfolio for this case is the following: sell the security short and invest the received cash S₀ at the risk-free rate r. At maturity, repurchase the security for K. The guaranteed gain is e^{rT}S₀ − K > 0. Therefore, K must not be smaller than e^{rT}S₀. □
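A small numerical illustration of the no-arbitrage delivery price and the cash-and-carry argument of the proof; the asset price, rate, and mispriced contract are assumptions.

    # Sketch: delivery price K = e^{rT} S0 and the cash-and-carry arbitrage.
    import math

    S0, r, T = 100.0, 0.05, 1.0
    K = S0 * math.exp(r * T)                          # continuous case
    print("delivery price:", K)
    print("discrete case:", S0 * (1 + r) ** T)

    # If someone offered K_high > K: borrow S0, buy the asset, deliver at K_high
    K_high = 108.0
    print("arbitrage gain:", K_high - S0 * math.exp(r * T))   # > 0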

Value of a forward contract: Suppose that the forward price of security S after time t is F_t and that the delivery price of the forward is K. The security can be stored at zero cost. The risk-free interest rate is r. Then the current value of the forward contract for the long position is

    f_t = e^{−r(T−t)} (F_t − K)

for the continuous-time case and

    f_t = (F_t − K)/(1 + r)^{T−t}

for the discrete-time case. Obviously, the value of the short position is equivalent to the negative value of the long position.

Proof: The proof includes only the continuous-time case; the proof for the discrete-time case is analogous. Consider a portfolio of one forward with delivery price F_t in the short position and one forward with delivery price K in the long position at time t. At time T, the portfolio will be worth F_t − K with certainty. In order to establish such a portfolio, the investor has to lend a cash amount f_t, since the forward with delivery price F_t has a value of zero. The repayment for the loan will be f_t e^{r(T−t)}. There is no arbitrage only if the final value of the loan and the value of the portfolio are identical. □

By construction, the value of the contract is zero at the beginning. But as the price of the underlying changes, the value of the forward contract will also change.

6.4.2 Futures

Forwards have one major drawback: it is not possible to trade them on an exchange market. In order to make trading possible, the market has to specify certain standards. Therefore, the two parties of a futures contract do not need to know each other, and the exchange takes care of the risk of default of the counterparty. A futures contract has to specify the following:

• The underlying asset, usually commodities or shares
• The contract size

i. The party in the long position then receives F1 − F0 . arbitrage opportunities exist.e. called the maintenance margin.. Since futures are marked to market every day. the value of the contract is always zero. a minimum margin. the margin account is balanced at the end of every day. The only variable price in the contract is the delivery price. To make sure that the margin account never has a negative balance. In effect. maximum of contracts a speculator may hold. The setting of the forward price is a very subtle problem since the exchange cannot issue new contracts every time the price of the underlying changes. The next day the forward price is F1 which is most probably not equal to F0 . a margin call is issued to the trader and he is requested to top up the margin account to the initial margin. These two problems can be addressed by margin accounts.. The margin account serves as a guarantee that neither party of the contract defaults on its obligation.4 Derivatives • • • • • • 129 Delivery arrangements Delivery date Quality of delivered goods Price quotes Daily price movement limits.. the short party pays F0 − F1 .e. As the contract reaches its maturity.e. If the futures Price 6 Futures Spot - t T Fig. 6. On writing the contract.6.5. the delivery price in the contract has been changed from F0 to F1 and therefore the contract is “renewed” each day. On the opposite. is set to about 75% of the initial margin. Convergence of futures and spot prices as time approaches maturity . a deposit. Consider the situation where the contract has a forward price of F0 at the beginning. there remain two unsolved problems: how to avoid defaults and how to specify the forward price. i. has to be paid (about 5-10% of the current value of the underlying in the contract). the futures price must converge to the spot price (see Figure 6. known as the initial margin. prevention of large movements Position limits. But even by standardizing in the above fashion. If the value of the margin account drops below the maintenance margin. i. If this is not the case.5).

price at maturity were above the spot price, one could buy the underlying asset at the spot price, take a short position in the futures contract, and then settle the contract. Conversely, if the futures price lies below the spot price at maturity, one takes a long position in the futures contract and immediately sells the asset in the spot market.

If the risk-free rate is constant and deterministic, the corresponding futures and forward prices are identical. In practice, however, there are numerous reasons which cause differences between futures and forward prices, such as transaction costs, margin accounts, taxes, etc. For a more detailed discussion of forwards and futures, see [20] and [23].

6.4.3 Options

Option Basics

An option is the right, but not the obligation, to buy (or sell) an asset under specified terms. An option which gives the right to purchase something is called a call option, whereas an option which gives the right to sell something is called a put option. An option is a derivative security (asset) whose underlying asset is the asset that can be bought or sold.

In an option issuing, there are two sides involved: the party who grants the option and the party who purchases the option. The party which grants the option is said to write an option. The party which purchases an option faces no risk of loss other than the original purchase premium. However, the party who writes the option, usually a financial intermediary such as a bank or insurance company, may face a large loss, since this party must buy or sell the asset at the specified terms if the option is exercised. In the case of an exercised call option, if the writer does not already own the asset, he must purchase it in order to deliver it at the specified strike price, which may be much lower than the current market price.

Options on common stocks need specification in order to classify and price them. The specification of an option includes, firstly, a clear description of what can be bought (for a call) or sold (for a put), such as a share. Secondly, the exercise price, or strike price (K), must be specified. This is the price at which the asset can be purchased upon exercise of the option. Thirdly, the period of time for which the option is valid, defined by the expiration date (T), must be specified.

There are two primary conventions regarding acceptable exercise dates before expiration. The terms European and American option classify the different options, but do not imply the location where they are issued. A European option allows exercise only on the expiration date, whereas an American option allows exercise at any time before and including the expiration date.

The Nature of Options

Before we begin to derive the exact pricing formula for options, we start to develop some intuition. Suppose you own a European call option on a stock with strike price K, and suppose that the option has reached the expiration time T. What is the value of the option at this time as a function of the underlying stock price S? If the stock price S is larger than the strike price K, the value of the call option is C = S − K. In the case where S < K, you would not exercise the option, and thus it has no value: C = 0. Therefore, the option payoff at expiration time is

    C(S, T) = max(0, S(T) − K).

The argument for put options is identical, and thus

    P(S, T) = max(K − S(T), 0).

The value of an option at expiration is graphically depicted in Figure 6.6.

[Figure 6.6 — Value of options at the expiration date: the put payoff max(K − S, 0) and the call payoff max(0, S − K) as functions of the stock price S.]

For a call option where S > K, we say the option is in the money; where S = K, we say it is at the money; and where S < K, we say it is out of the money.

In the case where we have not reached the expiration time, the call option value is above its payoff function at expiration time, since the time remaining offers us the possibility that the stock price will further increase; thus the option has a time value.

In addition to the time value, options have a volatility value. Consider two stocks at the same price and two put options with the same strike price. Stock one is more volatile than stock two. The put option on stock one has a higher value, because the probability that stock one falls below the strike price is higher than for stock two. This makes sense, since a put option is similar to an insurance which pays in case of a loss: if the likelihood of this case gets larger, the insurance premium must go up.
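The payoff functions and the moneyness terminology above are easy to encode; the strike and sample prices in this sketch are assumptions.

    # Sketch: payoff and moneyness of European options at expiration.
    def call_payoff(S, K):
        return max(0.0, S - K)           # C(S,T) = max(0, S(T) - K)

    def put_payoff(S, K):
        return max(K - S, 0.0)           # P(S,T) = max(K - S(T), 0)

    K = 100.0
    for S in (80.0, 100.0, 120.0):
        money = "in" if S > K else ("at" if S == K else "out of")
        print(f"S={S}: call={call_payoff(S, K)}, put={put_payoff(S, K)}, "
              f"call is {money} the money")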

Out-of-the-money options must also have a positive time value. Take a call option some time before expiration which is out of the money, and let us assume the value of this option is zero. We could then obtain infinitely large quantities of this option, face no risk of loss, and need no initial capital outlay. This would clearly be an arbitrage possibility, and thus out-of-the-money options must have a positive time value.

Economics of Options

In order to derive pricing concepts for options, we need to think about the underlying economics of options and financial markets. Option pricing centers on the concept of arbitrage-free markets. The notion of arbitrage is, loosely put, the possibility to earn an abnormal profit with no risks involved. In an arbitrage-free market, one can only obtain a risk-free return at the prevailing risk-free interest rate, such as paid by Treasury bonds.

Let us form a portfolio G of a stock at the current price S(0) and a put option P with strike price K. The payoff at expiration is either zero or positive and can be larger than the prevailing risk-free interest rate, since the combination of a put option and the same underlying stock removes all possibility of a loss. If the payoff were larger than the risk-free rate, we could borrow money (L) at the risk-free rate and buy the portfolio. We consequently would achieve a profit larger than the risk-free rate, at no risk and without any capital investment. Because this is such an interesting money machine, everyone would start to buy huge quantities of the portfolio, and this will consequently increase the option price until the payoff at expiration is exactly equal to the risk-free interest rate. A similar argument can be made if the expected payoff is below the risk-free interest rate, because we may short the stock and the option and lend money. When the market is arbitrage-free, the expected payoff return at expiration r₁ is exactly the risk-free interest rate r. The possible scenarios are shown in the table below.

    Initial investment    Return    Expected payoff
    P + S − L = 0         r₁ > r    E[r₁(P+S) − rL] > 0
    −P − S + L = 0        r₁ < r    E[−r₁(P+S) + rL] > 0
    P + S − L = 0         r₁ = r    E[r₁(P+S) − rL] = 0

The same argument can be made to explain the value of options at any time other than expiration.

6.4.4 Black-Scholes Formula and PDE

In this section, we will derive the famous Black-Scholes PDE and formula for call options based on the no-arbitrage argument. The derivation of the Black-Scholes equations can be found in [6], [25], [27], [23], and [14].

6.4.4 Black-Scholes Formula and PDE

In order to price options correctly, we now derive the famous Black-Scholes PDE and the formula for call options based on the no-arbitrage argument. The derivation of the Black-Scholes equations can also be found in [6], [23], [25], [27], and [14].

Asset Dynamics

The starting point for option pricing is a model of the underlying asset dynamics. For stock options, we need to model the dynamics of stocks. As outlined before, we use the geometric Brownian motion model for three reasons: first, asset prices are always positive; second, stock prices resemble a log-normal distribution; and third, stocks behave like bonds with continuous compounding and stochastic returns. The SDE of the stock price is given by

    dS(t) = µS(t)dt + σS(t)dW                                        (6.2)

where µ and σ are constant values. Additionally, we introduce the following more general stock price dynamics

    dS(t) = µ(S, t)S(t)dt + σS(t)dW                                  (6.3)

where µ(S, t) is an arbitrary function of S and t, and σ is a constant.

Derivation of the Black-Scholes PDE

The derivation is based on two central arguments: the first is that in a perfect market no arbitrage possibilities exist, and the second is that the writer of the option should not undertake any risks by writing the option. The value of the option depends on the current stock price S, the strike price K, the expiration time T, and the current risk-free interest rate r. Let us call the value of the call option C(t, S). The stock price dynamics are assumed to be as in equation (6.2). The call option is a derivative security of the underlying stock S, and therefore it changes its value dC according to Itô's formula:

    dC = ( ∂C/∂t + µS ∂C/∂S + (1/2)σ²S² ∂²C/∂S² ) dt + σS (∂C/∂S) dW    (6.4)

The writer of the option does not want to undertake any risks by writing the option and therefore forms a portfolio G which is comprised of the amount x of the underlying stock and of the amount y of a bond B. This portfolio is very often called the replication portfolio, since it should match the option value at all times. The bond dynamics are

    dB = rBdt                                                        (6.5)

The portfolio dynamics are given by

    dG = xdS + ydB + dxS + dyB = xdS + ydB                           (6.6)

The term dxS + dyB equals zero, because a change in the amount of stocks and bonds we hold at constant stock and bond prices equals an in- or outflow of money. The portfolio G = xS + yB should be self-financing, i.e., no money should be needed except for the initial capital outlay.
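To build intuition for the dynamics (6.2), the following Python sketch simulates one path with the Euler-Maruyama scheme and compares the endpoint with the exact solution S(t) = S(0)·exp((µ − σ²/2)t + σW(t)) driven by the same Brownian path; the parameter values are assumed for illustration only:

import math
import random

mu, sigma = 0.08, 0.20       # assumed drift and volatility
S0, T, n = 100.0, 1.0, 252   # initial price, horizon, number of steps
dt = T / n

random.seed(1)
S, W = S0, 0.0
for _ in range(n):
    dW = random.gauss(0.0, math.sqrt(dt))
    W += dW
    S += mu * S * dt + sigma * S * dW   # Euler-Maruyama step for (6.2)

# exact solution of the geometric Brownian motion, same Brownian path
S_exact = S0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * W)
print(f"Euler-Maruyama: {S:.2f}   exact: {S_exact:.2f}")

The two endpoint values agree up to the discretization error of the Euler-Maruyama scheme, which vanishes as the step size dt is refined.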

Since we require that the portfolio is self-financing, we only allow changes of the positions in the stocks and bonds in such a way that the change in one asset finances the change in the other asset. The portfolio dynamics are extended by substituting dS from (6.2) into (6.6). This yields

    dG = x(µS(t)dt + σS(t)dW) + y(rB)dt = (xµS + yrB)dt + xσSdW      (6.7)

Since the portfolio dynamics should match the dynamics of the option, we match the coefficients of dt and dW in (6.7) and (6.4). To do this, we first match the coefficient of dW by setting

    x = ∂C/∂S                                                        (6.8)

Since we want G = C, using G = xS + yB yields

    G = yB + (∂C/∂S)S = C
    y = (1/B)( C − (∂C/∂S)S )                                        (6.9)

Substituting (6.8) and (6.9) into (6.7) and matching the coefficient of dt in (6.4) gives

    µS ∂C/∂S + (1/B)( C − S ∂C/∂S )rB = ∂C/∂t + µS ∂C/∂S + (1/2)σ²S² ∂²C/∂S².

Finally, we arrive at the celebrated Black-Scholes PDE

    rC = ∂C/∂t + rS ∂C/∂S + (1/2)σ²S² ∂²C/∂S²                        (6.10)

The solution of this PDE together with the appropriate boundary conditions gives the price of the option.

The last step in the argument is to show that (6.10) is arbitrage-free, because so far we have only used the argument that the option's writer should not undertake any risks. Consider the strategy of shorting (writing) one call option and investing the money in the replication portfolio. The initial market price is C*(S0, 0) and the theoretical Black-Scholes price is C(S0, 0). If we manage to convince somebody to pay more than the theoretical price, i.e., C*(S0, 0) > C(S0, 0), we could earn a risk-less profit: we invest C(S0, 0) in the replication portfolio and pocket the initial difference C*(S0, 0) − C(S0, 0) as profit. By construction, the replication portfolio matches the option until the expiration date. Since no initial capital is required and the replication portfolio procedure removes all risks, we would have found an arbitrage possibility. The same argument can be made if the market price is below the theoretical price. Only when C*(S0, 0) = C(S0, 0) is any arbitrage possibility removed. The beauty of the theory is that the Black-Scholes PDE gives a framework to price options and yields a replication procedure that covers all risks for the party that writes the option.
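As a numerical illustration (not part of the text's derivation), the PDE (6.10) can be solved on a grid with an explicit finite-difference scheme, marching backwards in time from the payoff at expiration. The following sketch assumes a European call with illustrative parameters; the grid sizes are chosen so that the explicit scheme is stable:

import math

K, r, sigma, T = 100.0, 0.05, 0.20, 1.0
S_max, M, N = 300.0, 150, 2000        # M price steps, N time steps
dS, dt = S_max / M, T / N             # dt small enough for stability

S = [i * dS for i in range(M + 1)]
C = [max(s - K, 0.0) for s in S]      # terminal condition C(S, T)

for n in range(1, N + 1):             # march in tau = T - t
    tau = n * dt
    new = C[:]
    for i in range(1, M):
        C_S = (C[i + 1] - C[i - 1]) / (2 * dS)
        C_SS = (C[i + 1] - 2 * C[i] + C[i - 1]) / dS**2
        new[i] = C[i] + dt * (0.5 * sigma**2 * S[i]**2 * C_SS
                              + r * S[i] * C_S - r * C[i])
    new[0] = 0.0                                # C(0, t) = 0
    new[M] = S_max - K * math.exp(-r * tau)     # deep in-the-money boundary
    C = new

i0 = int(100.0 / dS)
print(f"finite-difference value of the call at S = 100: {C[i0]:.2f}")

The result approximates the closed-form Black-Scholes value derived below (about 10.45 for these parameters).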

Black-Scholes Formula for European Call Options

Having derived the Black-Scholes PDE, we now present the Black-Scholes formula for European call options. Before that, we take a closer look at the PDE. Let us consider the stock itself. It is (in a trivial way) a derivative of the stock, so C(t, S) = S(t) should satisfy the PDE. With this choice we have ∂C/∂S = 1, ∂²C/∂S² = 0, and ∂C/∂t = 0, and therefore (6.10) reduces to rS(t) = rS(t). This proves that C(t, S) = S(t) is a possible solution. There are uncountably many more solutions to this PDE.

The Black-Scholes formula for a European call option with strike price K, expiration time T, a non-dividend-paying stock with current price S, and a constant risk-free interest rate r, is given by

    C(t, S) = S(t)N(d1) − K e^(−r(T−t)) N(d2)                        (6.11)

where N(d) is the cumulative normal probability distribution

    N(d) = (1/√(2π)) ∫ from −∞ to d of e^(−y²/2) dy                  (6.12)

and d1 and d2 are defined as

    d1 = ( ln(S/K) + (r + σ²/2)(T − t) ) / ( σ√(T − t) )
    d2 = d1 − σ√(T − t).

Let us now show that the Black-Scholes formula satisfies the PDE (6.10) and the boundary condition C(T, S) = max(S(T) − K, 0). For t = T (expiration time) we get

    d1(S, T) = d2(S, T) = +∞ if S(T) > K,  −∞ if S(T) < K.

Since N(∞) = 1 and N(−∞) = 0, we obtain

    C(S, T) = S − K if S(T) > K,  0 if S(T) < K.

This proves that (6.11) satisfies the boundary condition. The derivatives of (6.11) are given by

    ∂C/∂S = N(d1)
    ∂²C/∂S² = e^(−d1²/2) / ( Sσ√(2π(T − t)) )
    ∂C/∂t = − Sσ e^(−d1²/2) / ( 2√(2π(T − t)) ) − rK e^(−r(T−t)) N(d2).
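The formula transcribes directly into code. The following sketch expresses the cumulative normal distribution N(d) of (6.12) through the error function math.erf; the parameter values in the example are assumed:

import math

def N(d):
    # cumulative normal distribution via the error function
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, tau):
    # European call value (6.11) with time to expiration tau = T - t
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * N(d1) - K * math.exp(-r * tau) * N(d2)

print(f"{bs_call(100.0, 100.0, 0.05, 0.20, 1.0):.4f}")   # approx. 10.4506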

Thus,

    ∂C/∂t + rS ∂C/∂S + (1/2)σ²S² ∂²C/∂S²
      = − Sσ e^(−d1²/2) / ( 2√(2π(T − t)) ) − rK e^(−r(T−t)) N(d2)
        + N(d1) rS + σ²S² e^(−d1²/2) / ( 2Sσ√(2π(T − t)) )
      = r ( S(t)N(d1) − K e^(−r(T−t)) N(d2) )
      = rC(t, S).

This shows that (6.11) satisfies the Black-Scholes PDE.

The Black-Scholes PDE and the call option formula are valid not only when we model the stock price dynamics with a geometric Brownian motion (6.2), but also with the more general dynamics shown in (6.3). Since the drift term does not enter the PDE, it does not affect the pricing of options. The replication procedure remains the same as well. For a detailed discussion of this property, the reader may refer to [9].

6.4.5 Black-Scholes Formula for European Put Options

The formula for European put options can be easily derived from the formula for European call options. To derive the formula, we need to use a simple theoretical relationship between the prices of the corresponding puts and calls, the so-called put-call parity. If you buy a call and sell a put with the same strike price, the portfolio behaves almost like the stock itself: the payoff at expiration time is max(S(T) − K, 0) − max(K − S(T), 0) = S(T) − K. The difference to the stock price is the strike price K. By lending e^(−r(T−t))K, we obtain a payoff of K at expiration time, and the portfolio of the put, the call, and the credit resembles exactly the stock. Thus the put-call parity is

    C − P + e^(−r(T−t))K = S.

We now use the put-call parity and the Black-Scholes call option formula to derive the European put option formula:

    C − P + e^(−r(T−t))K = S
    S(t)N(d1) − K e^(−r(T−t)) N(d2) − P(t, S) + e^(−r(T−t))K = S(t)
    P(t, S) = S(t)( N(d1) − 1 ) + K e^(−r(T−t)) ( 1 − N(d2) ).

Since N(x) − 1 = −N(−x), we can state the formula for European put options as

    P(t, S) = K e^(−r(T−t)) N(−d2) − S(t)N(−d1).
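The put formula and the put-call parity can be cross-checked numerically. The following self-contained sketch (illustrative parameter values) computes the put price both directly and via parity:

import math

def N(d):
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def d1_d2(S, K, r, sigma, tau):
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)

def bs_call(S, K, r, sigma, tau):
    d1, d2 = d1_d2(S, K, r, sigma, tau)
    return S * N(d1) - K * math.exp(-r * tau) * N(d2)

def bs_put(S, K, r, sigma, tau):
    # P = K*exp(-r*tau)*N(-d2) - S*N(-d1)
    d1, d2 = d1_d2(S, K, r, sigma, tau)
    return K * math.exp(-r * tau) * N(-d2) - S * N(-d1)

S, K, r, sigma, tau = 100.0, 100.0, 0.05, 0.20, 1.0
direct = bs_put(S, K, r, sigma, tau)
via_parity = bs_call(S, K, r, sigma, tau) - S + K * math.exp(-r * tau)
print(f"put directly: {direct:.4f}   put via parity: {via_parity:.4f}")

Both computations yield the same value (about 5.57 for these parameters), as the parity requires.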

American Options

In the option contracts we have discussed so far, the holder may exercise the option at a certain time specified in the contract. Often, however, the holder is given the opportunity to exercise early, which means that the option may be exercised not only on a given date but also at any time before it. Such contracts are known as American options. A mixture between American and European options are options which can be exercised at a fixed number of dates before the expiration date. Those options (you might guess it) are called "Bermuda" options. American options, or any other "exotic" options where the payoff depends not only on the value of the underlying security at the expiration date but on the path of the asset, are a lot more difficult to evaluate.

The valuation of American options is usually based on a worst-case principle: it is assumed that the option will be exercised at a time at which this is most profitable to the holder, which represents a worst case for the writer. We will now use the example of an American put option to illustrate the pricing of American options. For American options, the Black-Scholes PDE is replaced by an inequality

    ∂P/∂t + rS ∂P/∂S + (1/2)σ²S² ∂²P/∂S² − rP ≤ 0                    (6.13)

in which equality holds if the option is not exercised, i.e., if its value exceeds the revenue of exercising. For the put option, this is expressed by the inequality

    P(t, S) > max(K − S(t), 0).

In addition, we impose boundary conditions on the PDE. At maturity (expiration), the terminal value is

    P(T, S) = max(K − S(T), 0).

The first boundary condition states that the put has no value for arbitrarily large stock prices,

    lim (S → ∞) P(t, S) = 0,

and the second boundary condition states that the value of the put option for S(t) = 0 is equal to the discounted strike price,

    P(t, 0) = e^(−r(T−t))K.

The inequality (6.13), together with the boundary conditions and the early-exercise condition, admits no analytical solution. The problem needs to be solved numerically, where one of the popular techniques is the transformation into a linear complementarity system. The solution procedure is given in detail in [33].
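The linear-complementarity approach referenced above requires a PDE solver. As a simpler numerical alternative (a different technique, not the one from [33]), the American put can be priced on a Cox-Ross-Rubinstein binomial tree, where the early-exercise feature appears as a max(continuation, exercise) step in the backward induction. A sketch with illustrative parameters:

import math

def american_put_crr(S0, K, r, sigma, T, n=500):
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))    # up factor
    d = 1.0 / u                            # down factor
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    disc = math.exp(-r * dt)

    # put values at expiration, j = number of up-moves
    values = [max(K - S0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]

    # backward induction with the early-exercise test at every node
    for m in range(n - 1, -1, -1):
        for j in range(m + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
            exercise = max(K - S0 * u**j * d**(m - j), 0.0)
            values[j] = max(cont, exercise)
    return values[0]

print(f"{american_put_crr(100.0, 100.0, 0.05, 0.20, 1.0):.2f}")  # approx. 6.09

The early-exercise premium is visible in the result: the American put is worth more than the European put (about 5.57) computed earlier with the same parameters.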

6.4.6 General Option Pricing

A similar pricing PDE can be derived for any single-factor stock price process. If we assume that the underlying asset follows an SDE given by

    dS(t) = µ(t, S)dt + σ(t, S)dW,

the PDE for pricing any possible option C(t, S) is given by

    rC = ∂C/∂t + rS ∂C/∂S + (1/2)σ(t, S)² ∂²C/∂S²                    (6.14)

The derivation is analogous to the derivation of the Black-Scholes PDE. Again it is noteworthy that the drift does not enter the PDE, whereas the diffusion term does change it. Equation (6.14) applies only when the underlying stock price is modelled by a scalar SDE. In the case that a system of SDEs models the underlying stock, a multi-dimensional form of (6.14) can be derived. The reader may refer to [25] or [14].
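Since the drift µ(t, S) does not enter (6.14), prices can equivalently be computed by simulating the risk-neutral dynamics dS = rS dt + σ(t, S)dW and discounting the expected payoff. The following Monte Carlo sketch uses an assumed diffusion function σ(t, S); here it is chosen as σ(t, S) = 0.20·S, i.e., a geometric Brownian motion, so the result can be compared with the closed-form call value of about 10.45:

import math
import random

def sigma(t, S):
    # assumed local-volatility function; here: geometric Brownian motion
    return 0.20 * S

def mc_call_price(S0, K, r, T, n_steps=100, n_paths=10000):
    dt = T / n_steps
    random.seed(2)
    total = 0.0
    for _ in range(n_paths):
        S, t = S0, 0.0
        for _ in range(n_steps):
            dW = random.gauss(0.0, math.sqrt(dt))
            S += r * S * dt + sigma(t, S) * dW   # risk-neutral drift r
            t += dt
        total += max(S - K, 0.0)                 # call payoff
    return math.exp(-r * T) * total / n_paths

print(f"{mc_call_price(100.0, 100.0, 0.05, 1.0):.2f}")  # approx. 10.4 +/- MC error

Replacing sigma(t, S) with any other single-factor diffusion prices the corresponding model without changing the rest of the code.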

References

1. N. U. Ahmed. Linear and Nonlinear Filtering for Scientists and Engineers. World Scientific, Singapore, 1998.
2. B. D. O. Anderson and J. B. Moore. Optimal Filtering. Prentice-Hall, Englewood Cliffs, NJ, 1979.
3. L. Arnold. Stochastische Differentialgleichungen: Theorie und Anwendung. Oldenbourg, München, 1973.
4. M. Athans and H. P. Geering. Necessary and sufficient conditions for non-scalar-valued functions to attain extrema. IEEE Transactions on Automatic Control, 18:132–139, 1973.
5. N. H. Bingham and R. Kiesel. Risk Neutral Valuation. Springer, London, 1998.
6. T. Björk. Arbitrage Theory in Continuous Time. Oxford University Press, Oxford, 1998.
7. L. Breiman. Probability. SIAM, Philadelphia, PA, reprinted edition, 1992.
8. A. E. Bryson and Y.-C. Ho. Applied Optimal Control: Optimization, Estimation, and Control. Hemisphere, Washington, DC, 1975.
9. J. Y. Campbell, A. W. Lo, and A. C. MacKinlay. The Econometrics of Financial Markets. Princeton University Press, Princeton, NJ, 1997.
10. J. Y. Campbell and L. M. Viceira. Strategic Asset Allocation: Portfolio Choice for Long-Term Investors. Oxford University Press, Oxford, 2002.
11. S. Cyganowski, P. Kloeden, and J. Ombach. From Elementary Probability to Stochastic Differential Equations with MAPLE. Springer, Berlin, 2002.
12. J. E. Dennis and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM, Philadelphia, PA, 1996.
13. J. L. Doob. Stochastic Processes. Wiley, New York, NY, 1953.
14. D. Duffie. Dynamic Asset Pricing Theory. Princeton University Press, Princeton, NJ, second edition, 1996.
15. W. H. Fleming and R. W. Rishel. Deterministic and Stochastic Optimal Control. Springer, Berlin, 1975.
16. H. P. Geering. Regelungstechnik. Springer, Berlin, sixth edition, 2003.
17. H. P. Geering. Optimal Control with Engineering Applications. Springer, Berlin, 2007.
18. J. D. Hamilton. Time Series Analysis. Princeton University Press, Princeton, NJ, 1994.
19. A. C. Harvey. Zeitreihenmodelle. Oldenbourg, München, 1994.

20. J. C. Hull. Options, Futures, and Other Derivatives. Pearson Prentice Hall, Upper Saddle River, NJ, sixth edition, 2006.
21. A. H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, New York, NY, 1970.
22. B. P. Kellerhals. Financial Pricing Models in Continuous Time and Kalman Filtering. Springer, Berlin, 2001.
23. D. G. Luenberger. Investment Science. Oxford University Press, New York, NY, 1998.
24. A. G. Malliaris and W. A. Brock. Stochastic Methods in Economics and Finance. Elsevier Science, Amsterdam, 1982.
25. R. C. Merton. Continuous-Time Finance. Blackwell, Malden, MA, 1992.
26. T. Mikosch. Elementary Stochastic Calculus with Finance in View. World Scientific, Singapore, 1998.
27. S. N. Neftci. An Introduction to the Mathematics of Financial Derivatives. Academic Press, San Diego, CA, second edition, 2000.
28. B. Øksendal. Stochastic Differential Equations. Springer, Berlin, fifth edition, 1998.
29. H. H. Panjer, editor. Financial Economics. The Actuarial Foundation, Schaumburg, IL, 1998.
30. W. Paul and J. Baschnagel. Stochastic Processes: From Physics to Finance. Springer, Berlin, 1999.
31. F. K. Reilly and K. C. Brown. Investment Analysis and Portfolio Management. Thomson South-Western, Mason, OH, eighth edition, 2006.
32. H. Schlitt. Systemtheorie für regellose Vorgänge. Springer, Berlin, 1960.
33. W. F. Sharpe, G. J. Alexander, and J. V. Bailey. Investments. Prentice Hall, Upper Saddle River, NJ, sixth edition, 1999.
34. J.-J. E. Slotine and W. Li. Applied Nonlinear Control. Prentice Hall, Englewood Cliffs, NJ, 1991.
35. M. Vellekoop, B. Hanzon, and J. Schumacher. Finance for Control Engineers: Tutorial Workshop. European Control Conference, Porto, 2001.
36. D. Williams. Probability with Martingales. Cambridge University Press, Cambridge, 1991.
37. J. Yong and X. Y. Zhou. Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer, New York, NY, 1999.