You are on page 1of 103

Advanced Financial Models Michael R.

Tehranchi

Contents
1. Standing assumptions: complications we ignore 2. Prerequisite knowledge Chapter 1. Arbitrage theory for discrete-time models 1. The set-up 2. Investment, consumption and state price densities 3. Arbitrage and the rst fundamental theorem 4. Supermartingales, stopping times and local martingales 5. Num eraires and equivalent martingale measures Chapter 2. Pricing and hedging contingent claims in discrete-time models 1. European claims and the second fundamental theorem of asset pricing 2. Super-replication of American claims Chapter 3. Brownian motion and stochastic calculus 1. Brownian motion 2. It o stochastic integration 3. It os formula 4. Girsanovs theorem 5. A martingale representation theorem Chapter 4. Arbitrage theory for continuous-time models 1. The set-up 2. Admissible strategies 3. Absolute arbitrage and state price densities 4. Relative arbitrage and equivalent local martingale measures 5. The structure of state price densities Chapter 5. Hedging contingent claims in continuous time models 1. An existential replication result 2. An example of relative arbitrage 3. The BlackScholes model and formula 4. Markovian markets and the BlackScholes PDE 5. BlackScholes volatility 6. Local volatility models 7. Computing marginal laws 8. American claims in local volatility models Chapter 6. Interest rate models
3

5 8 9 9 10 17 22 29 33 33 37 41 42 42 46 50 51 53 53 54 56 59 60 63 63 65 65 66 70 72 75 80 83

1. 2. 3. 4. 5.

Bond prices and interest rates Bank accounts to bond prices and interest rates Short rate models Markovian short rate models The HeathJarrowMorton framework

83 84 85 86 88 93 93 93 94 96 97 97 98 98 101

Chapter 7. Crashcourse on probability theory 1. Measures 2. Random variables 3. Expectations and variances 4. Special distributions 5. Conditional probability and expectation, independence 6. Probability inequalities 7. Characteristic functions 8. Fundamental probability results Index

Financial mathematics as a subject is young (as compared to, say, number theory), but it is mature enough now that there has emerged some consensus on the notation, vocabulary and important results. These notes are an attempt to present many of the main ingredients of this theory, mainly concerning the pricing and hedging of derivative securities. But before launching into the story, we will begin by acknowledging some of the real-world complications that will not be discussed at length hereafter. 1. Standing assumptions: complications we ignore Unfortunately, actual nancial markets are very complicated. Of course, in order to develop a systematic nancial theory, it is prudent to concentrate on the essential features of these markets and ignore the less essential complications. Therefore, the theory that will be presented in these notes is concerned with the analysis of market models that have plenty of simplifying assumptions. That is not to say that these complications are not important. Indeed, there is active ongoing research attempting to remove these simplifying assumptions from the canonical theory. Below is a list of these assumptions. 1.1. Dividends. The total stock of a publicly traded rm is divided into a xed number N of shares. The owner of each share is then entitled to the fraction 1/N of the total prot of the rm.1 A portion of the rms prot is usually reinvested by management, for instance by building new factories, but the rest of the prot is paid out to the shareholders. In particular, the owner of each share of stock will receive periodically a dividend payment. However, in this course, we will assume that there are no dividend payments. Actually, this assumption is not as terrible as it sounds. Example sheet 1 will show how to adapt the theory developed for assets that pay no dividends to incorporate assets that have non-zero dividend payments. 1.2. Tick size. Financial markets usually have a smallest increment of price, the tick. (The tick refers back to the days when prices were quoted on ticker tape.) Indeed, the tick size can vary from market to market, and even for assets traded in the same market. There seems to be an industry-wide eort to hamonise tick sizes, but a quick google search found this document http://cdn.batstrading.com/resources/participant resources/BATSEuro Ticks.pdf which highlights the complexity of the system in Europe. However, in this course, we will assume that the tick size is zero. This is a convenient assumption for those who prefer continuous mathematics to discrete. It is usually a harmless assumption, unless the prices of interest are very close to zero.
things are even more complicated. For instance, stocks can be classied as either common or preferred, with implications on dividends, voting rights and claims on the rms assets in case of bankruptcy. Also, the number N of shares outstanding is not necessarily xed.
5
1Actually,

1.3. Transactions costs. Financial transactions are processed by a string of middle men, each of whom charge a fee for their services. Usually the fee is nearly proportional to the size of the transaction. However, in this course, we will assume that there are no transactions costs. This assumption is justied by by the fact that transactions costs are often very small relative to the size of typical transactions. But one must always remember that in some applications, it might not be wise to neglect these costs. 1.4. Short-selling constraints. In the real world, it is actually possible for someone to sell an asset that he does not own. The essential mechanism is to borrow a share of that asset from a broker, and then immediately to sell it to the market. This procedure is called short selling. Brokers, however, place contraints on this behaviour. Indeed, they usually require collateral and charge a fee for their service. Furthemore, if the market price of the asset increases, or if the price of the collateral decreases, the broker may ask the short seller to put up even more collateral. However, in this course, we will assume that there are no short-selling constraints. Indeed, the theory of discrete-time trading is cleaner without additional assumptions on the sizes of trades. But we will see that to overcome some technical problems in the theory of continuous-time trading, it will be natural to restrict trading to what are called admissible strategies. 1.5. Divisibility of assets. There is another real-world trading constraint of a rather technical nature. The smallest unit of stock is the share. A share cannot be further divided it is generally impossible to buy half a share of a particular stock. However, in this course, we will assume that assets are innitely divisible. 1.6. Bid-ask spread. Real-world trading is asymmetrical since the price to buy a share is usually higher than the price to sell it. The reason is that are two dierent ways to buy or sell an asset listed on an exchange: the limit order and the market order. A limit buy order is an oer to buy a certain number of shares of the asset at a certain price. A limit sell order is dened similarly. The collection of unlled limit orders is called the limit order book. At any time, there is the highest price for which there is an order to buy the asset. This is called the bid price. The lowest price for which there is an order to sell is called the ask price. The bid/ask spread is the dierence. Figure 1 illustrates the evolution of a hypothetical limit order book as various orders arrive and are lled. A market order are instructions to execute a transaction at the best available price. In particular, if the market order is to buy, then the lowest limit sell order is lled rst. Therefore, for small market buy orders, the per share price paid is the ask price. Similarly,
6

if a market sell order arrives, then the highest limit buy order is lled rst, and hence the per share price received is the bid price. However, in this course, we will assume that there are no bid-ask spreads. This assumption is justied by the observation that in many markets, the spread is very small. However, in times of crisis, this assumption is not usually applicable, and hence the theory breaks down dramatically.

Figure 1. Top left. The bid price is 8 and the ask is 11. Top right. A limit sell order for three shares at 11 arrives. Bottom left. A limit buy order for two shares at 8 is cancelled. Bottom right. A market order to buy ve shares arrives. Note that four shares are sold at 11 and one at 12. After the transaction, the ask price is 12. 1.7. Market depth. As described above, there are only a nite number of limit orders on the book at one time. If a large market buy order arrives, for instance, then the lowest limit sell order is lled rst. But if the market order is bigger than the total shares available to buy at the ask price, then the limit orders at the next-to-lowest price are lled, and progresses up the book until the market order is nally lled. In this way, the ask price increases.
7

The market depth is the number of shares available to buy or sell at the ask or bid price respectively. Equivalently, the depth of a market is a measure of the size of a market order necessary to move quoted prices. However, in this course, we will assume that there is innite market depth. Equivalently, we will assume that investors are small relative to the limit order book, so they are price takers, not price makers. However, the most recent nancial crisis shows that this assumption does not always approximate reality just ask the traders at Lehman Brothers! 2. Prerequisite knowledge The emphasis of this course is on some of the mathematical aspects of nancial market models. Very little is assumed of the readers knowledge of the workings of nancial markets. However, some mathematical background is needed. Our starting point is the famous observation (sometimes attributed to Niels Bohr) that it is dicult to make predictions, especially about the future. Indeed, anyone with even a passing acquaintance with nance knows that most of us cannot predict with absolute certainty how the the price of an asset will uctuate otherwise we would be much richer! Therefore, the proper language to formulate the models that we will study is the language of probability theory. An attempt is made to keep this course self-contained, but you should be familiar with the basics of the theory, including knowing the denition and key properties of the following concepts: random variable, expected value, variance, conditional probability/expectation, independence, Gaussian (normal) distribution, etc. Familarity with measure theoretical probability is helpful, though a crashcourse on probability theory is given in an appendix. Please send all comments and corrections (including small typos and major blunders) to me at m.tehranchi@statslab.cam.ac.uk.

CHAPTER 1

Arbitrage theory for discrete-time models


1. The set-up The models we will encounter will be of form P = (Pt1 , . . . , Ptn )tT where Pti will model the price of a nancial asset (stock, bond, etc.) at time t T. In this course, the index set T will be one of two sets Z+ = {0, 1, 2, . . .} when time is discrete, and R+ = [0, ) when time is continuous. Usually, the context will be clear and we write t 0 for t T. A modelling assumption that we will use throughout is that the n-dimensional stochastic process P is adapted to a ltration F. We now briey describe what this means. ***** Recall that a probability space is a triple (, F , P) where is a set, called the sample space, whose elements are interpreted as the possible outcomes of an experiment; F is a sigma-eld1 on , whose elements are interpreted as measurable events; P is a probability measure on (, F ), a countably additive function on F such that P() = 1. A random variable X is simply an F -measurable2 function from to R. We say that Y = (Y1 , . . . , YN ) is an N -dimensional random vector if each Yi is a random variable. A stochastic process (Zt )t0 is just a collection of random variables (or vectors) indexed by the parameter t, interpreted as time, either discrete or continuous. We now formalise the concept of information being revealed as time marches forward. The correct notions are that of a ltration and adaptedness. Definition. A ltration F = (Ft )t0 on the probability space (, F , P) is a collection of sigma-elds such that Fs Ft F for all 0 s t. Definition. A process X = (Xt )t0 is adapted to F i the random variable Xt is Ft measurable for all t 0. To gain some intuition about these denitions, consider this example.
1a

2that

non-empty collection of subsets of closed under complementation and countable unions is, the set { : X ( ) x} belongs to F for all real x
9

Example. Consider the experiment of tossing a coin two times. We can model this experiment on the sample space = {HH, HT, T H, T T }. The set of all events is the set F = {, {HH }, . . . , {HH, HT }, . . . , {HH, HT, T H }, . . . , {HH, HT, T H, T T }} of all 24 = 16 subsets of . The probability measure is just the one that assigns P({ }) = 1/4 equal probability to each elementary event. The ow of information is modelled by the following sigma-elds F0 = {, }, F1 = {, {HH, HT }, {T H, T T }, }, F2 = F . Now consider a stochastic process (Xt )t{0,1,2} that is adapted to the ltration (Ft )t{0,1,2} . Intuitively, the value of the random variable Xt is known once after t tosses of the coin. For instance, X0 must be a constant, X0 ( ) = a for all , since there is no information before the experiment. On the other hand, the random variable X1 must be of the form b if {HH, HT } X1 ( ) = c if {T H, T T } since the only information known at time 1 is whether or not the rst coin came up heads. Finally, X2 can be any function on , that is, of the form d if = HH e if = HT X2 ( ) = f if = T H g if = T T. Notice that for all t {0, 1, 2} the event {Xt x} is in Ft for every real x. For this course, it will be convenient to assume that there is no randomness at time 0. This can be made formal by assuming the sigma-eld F0 is trivial. This means that if A is an element F0 then either P(A) = 0 or P(A) = 1. In particular, every F0 -measurable random variable is almost surely constant. In the discrete-time theory, there nothing loss by further assuming F0 = {, }. However, it turns out that this further assumption is technically inconvenient in the continuous-time theory. 2. Investment, consumption and state price densities To the market described by the adapted process P , we now introduce an investor. Suppose that Hti is the number of shares of asset i held during the interval (t 1, t]. We will allow Hti to be either positive, negative, or zero with the interpretation that if Hti > 0 the investor is long asset i and if Hti < 0 the investor is short the asset. Also, we do not demand that the Hti are integers. We also asssume that the investor consumes ct units of money during the interval (t 1, t]. As is natural, we will insist that ct is non-negative.
10

To have an economically meaningful theory, we will make some restrictions on the possible dynamics of the n + 1-dimensional process H = (Ht1 , . . . , Htn , ct )t>0 . Definition. An investment/consumption strategy is an n-dimensional process predictable process H satisfying the self-nancing (with respect to the market P ) condition Ht1 Pt1 Ht Pt1 . Given the strategy H we dene the corresponding consumption process c via ct = Ht1 Pt1 Ht Pt1 . A strategy H is a pure investment strategy of the corresponding predictable consumption process is identically zero.
n i i (We are using the notation a b = i=1 a b to denote the usual Euclidean inner (or dot) product in Rn .) Note that the above denition only makes nancial sense thanks to our assumptions of no market frictions, no dividends, etc. We now recall the denition of a predictable process:

***** This denion is purely mathematical but is useful for us because it is the right way of eliminating clairvoyant investors. Definition. A stochastic process X = (Xt )t1 is predictable 3 (with respect to a ltration F) if Xt is Ft1 -measurable for all t 1. Remark. Note that the time index set for a predictable process (Xt )t1 is (usually) {1, 2, . . .}, not {0, 1, . . .}. Hence X0 is not necessarily dened. Remark. In discrete time, a process X is predictable if and only if the process Y is adapted, where Xt = Yt1 . That is to say, the notion of predictability can be dispensed with by simply changing notation. However, in continuous time, there is a much deeper dierence between the notions of predictability and adaptedness. Therefore, for the sake of a unied treatment of the discrete and continuous time cases, we keep it in. ***** Throughout the rest of the chapter we will nd the following notation helpful. Let X : trading strategies adapted processes be dened by Xt (H ) = Ht Pt = Ht+1 Pt + ct+1 . Of course, Xt (H ) is simply the investors wealth at time t if he is employing trading strategy H. Now that we have our market model and weve introduced an investor into this market, our rst challenge is to nd out how to invest optimally. We consider one such optimal investment problem. Obviously, the following set-up can be generalised in many ways, but since the main motivation for studying this problem is to introduce the very important notion of a state price density, we try to keep it simple.
term predictable is used in the US, while the synonym previsible is more common in the UK. I am American, so I will use predictable out of habit. I hope this will not cause too much confusion.
11
3The

Let T > 0 be some non-random time horizon, and let u be a strictly increasing, strictly concave function on [0, ). This function will model our investors utility from consumption, in the sense that he strictly prefers consumption stream (ct )1tT to (ct )1tT if and only if
T T

E
t=1

u(ct ) > E
t=1

u(ct )

That u is strictly increasing models the assumption that the investor strictly prefers more to less. And the strict concavity models the assumption that the investor strictly prefers to consume the non-random quantity E(C ) to the random quantity C , for any non-constant random variable C . We suppose that investors initial wealth is x 0 given. We also suppose that he will live exactly to age T , and since he derives no utility from wealth in the afterlife, chooses a strategy H such that XT (H ) = 0 a.s. In particular cT = XT 1 (H ) and HT = 0. Summing up, we now consider the problem of maximising
T

E
t=1

u(ct )

over strategies H such that X0 (H ) = x and XT (H ) = 0. We impose some mathematical conditions to help with the analsyis. We will assume that u is dierentiable, that u(0) > that u is bounded from above. For instance, we could take u(c) = ec . With these assumptions we know that there exists at one feasible strategy, namely consuming everything during the rst period c1 = x and never again c2 , . . . , cT = 0 and while never investing H1 , . . . , HT = 0. The key ingredient to the analysis of this problem is the notion of a state price density. We will see that it is related to the dual problem in a certain sense. Definition. A state price density is a strictly positive adapted process Y = (Yt )t0 such that the n-dimensional process (Yt Pt )t0 is a martingale. Remark. In discrete time, a state price density is a positive adapted process Y such that the n-dimensional random variable Yt Pt is integrable for each t 0 and such that E(Yt Pt |Ft1 ) = Yt1 Pt1 for all t 0. ***** We briey recall some notions from probability. Definition. Given a probability space (, F , P), let G F be a sub-sigma-eld of events. A random variable X : R is measurable with respect to G ( or briey, G measurable) if and only if the event {X x} is an element of G for all x R.
12

You know what that the conditional expectation of an integrable random variable X given a non-null event G means E(X 1G ) E(X |G) = P(G) The next theorem leads to a denition of conditional expectation given a sigma-eld: Theorem (Existence and uniqueness of conditional expectations). Let X be an integrable random variable dened on the probability space (, F , P), and let G F be a sub-sigma-eld of F . Then there exists an integrable G -measurable random variable Y such that E(1G Y ) = E(1G X ) for all G G . Furthermore, if there exists another G -measurable random variable Y such that E(1G Y ) = E(1G X ) for all G G , then Y = Y almost surely. Definition. Let X be an integrable random variable and let G F be a sigma-eld. The conditional expectation of X given G , written E(X |G ), is a G -measurable random variable with the property that E [1G E(X |G )] = E(1G X ) for all G G . Example. (Sigma-eld generated by a countable partition) Let X be a non-negative random variable dened on (, F , P). Let G1 , G2 , . . . be a sequence of disjoint events with P(Gn ) > 0 for all n and nN Gn = . Let G be the smallest sigma-eld containing {G1 , G2 , . . . , ...}. That is, every element of G is of the form nI Gn where I N. Then E(X |G )( ) = E(X |Gn ) if Gn where the right-hand side denotes conditional expection given the event Gn . More concretely, suppose = {HH, HT, T H, T T } consists of two tosses of a coin, and let G = {, {HH, HT }, {T H, T T }, } be the sigma-eld containg the information revealed by the rst toss. Suppose the coin is fair, so that each outcome is equally likely. Consider the random variable a if = HH b if = HT X ( ) = c if = T H d if = T T. Then E(X |G )( ) = (a + b)/2 (c + d)/2 if {HH, HT } if {T H, T T }

The important properties of conditional expectations are collected below: Theorem. Let all random variables appearing below be such that the relevant conditional expectations are dened, and let G be a sub-sigma-eld of the sigma-eld F of all events. linearity: E(aX + bY |G ) = aE(X |G ) + bE(Y |G ) for all constants a and b positivity: If X 0 almost surely, then E(X |G ) 0 almost surely, with almost sure equality if and only if X = 0 almost surely.
13

Jensens inequality: If f is convex, then E[f (X )|G ] f [E(X |G )] monotone convergence theorem: If 0 Xn X a.s. then E(Xn |G ) E(X |G ) a.s. Fatous lemma: If Xn 0 a.s. for all n, then E(lim inf n Xn |G ) lim inf n E(Xn |G ) dominated convergence theorem: If supn |Xn | is integrable and Xn X a.s. then E(Xn |G ) E(X |G ) a.s. If X is independent of G (the events {X x} and G are independent for each x R and G G ) then E(X |G ) = E(X ). In particular, E(X |G ) = E(X ) if G is trivial. slot property: If X is G -measurable, then E(XY |G ) = X E(Y |G ). In particular, if X is G -measurable, then E(X |G ) = X . tower property or law of iterated expectations: If H G then E[E(X |G )|H] = E[E(X |H)|G ] = E(X |H)

Now we come to one of the most important concepts in nancial mathematics, the martingale. A martingale is simply an adapted stochastic process that is constant on average in the following sense: Definition. A martingale relative to a ltration F is an adapted stochastic process M = (Mt )t0 with the following properties: E(|Mt |) < for all t T E(Mt |Fs ) = Ms for all 0 s t. Remark. The above denition of martingale is the same both discrete- and continuoustime processes. However, if the time index set is discrete T = Z+ , it is an exercise to show that an integrable process M is a martingale only if E(Mt+1 |Ft ) = Mt for all t 0. That is, it is sucient to verify the conditional expectations of the process one period ahead. Below are some examples of martingales. Before listing them, it is convenient to introduce a denition: Definition. Given a stochastic process Y = (Yt )t0 , let Ft be the smallest sigma-eld for which the random variables Ys is measurable for all 0 s t. The natural ltration of Y is the ltration (Ft )t0 , that is, the smallest ltration for which Y is adapted. In what follows, if a stochastic process is given but a ltration is not explicitly mentioned, then we are implicitly working with the natural ltration of the process. Example. Let X1 , X2 , X3 , . . . be independent integrable random variables such that E(Xi ) = 0 for all i. The process (St )t0 given by S0 = 0 and St = X1 + . . . + Xt is a martingale relative to its natural ltration. Indeed, the random variable St is integrable since E(|St |) E(|X1 |) + . . . + E(|Xt |) by the triangular inequality. Also, E(St+1 |Ft ) = E(St + Xt+1 |Ft ) = E(St |Ft ) + E(Xt+1 |Ft ) = St + E(Xt+1 ) = St .
14

Example. We now construct one of the most important examples of a martingale. Let X be an integrable random variable, and let Mt = E(X |Ft ). Then M = (Mt )t0 is a martingale. Indeed, integrability follows from the theorem on the existence and uniqueness of conditional expectation. Now, for every 0 s t we have E(Mt |Fs ) = E[E(X |Ft )|Fs ] = E(X |Fs ) = Ms by the tower property. Notice that this example also works in continuous time. Example. This last example is theorem shows how to take one martingale and build another one. Let M be a martingale and let K be a bounded predictable process. Then the process N dened by
t

Nt =
s=1

Ks (Ms Ms1 )

is a martingale. Indeed, by assumption, we have E(|Mt |) < and that there exist a constant C > 0 such that |Kt | C almost surely for all t 0. Hence
t

E(|Nt |)
s=1 t

E(|Ks ||Ms Ms1 |) C [E(|Ms |) + E(|Ms1 |)] <


s=1

Using the predictability of K and the slot property of conditional expectation, we have E(Nt+1 Nt |Ft ) = E(Kt+1 (Mt+1 Mt )|Ft ) = Kt+1 E(Mt+1 Mt |Ft ) =0 and were done. Remark. The martingale N above is often called a martingale transform or a discrete time stochastic integral. As we will see, it is one of the key building blocks for the continuous time theory to come. ***** Now we are in a position to make a rst analysis of the optimal investment problem mentioned above. Theorem. Suppose that there exists a state price density Y and an investment strategy H and corresponding consumption stream c such thate
u (c t ) = Yt1 .

15

If (H , c ) is feasible, then (H , c ) maximises


T

E u(ct ) subject to X0 (H ) = x, XT (H ) = 0 a.s.


t=1

Proof. (Assuming for now that the sample space is nite) This is a standard Lagrangian suciency argument. Introduce the Lagrangian
T

L(H, c, Y ) = E
t=1

[u(ct ) + Yt1 (Ht1 Pt1 ct Ht Pt1 )].

Of course,
T

L(H, c, Y ) = E
t=1

u(ct )

when (H, c) is feasible. Now, rearranging the sum we have


T T

L(H, c, Y ) = E
t=1 T

[u(ct ) Yt1 ct ] +
t=1

Ht (Yt Pt Yt1 Pt1 ) + Y0 H0 P0 YT HT PT

=E
t=1

[u(ct ) Yt1 ct ] + E MT + Y0 x
t

where Mt =

Hs (Ys Ps Ys1 Ps1 )


s=1

and we have used the boundary condition X0 (H ) = x and XT (H ) = 0. Now, if Y is a state price density, then Y P is a martingale by assumption. Then M is a martingale transform as dened above, and we would like to conclude that M is a martingale. Unfortunately, in general the holdings H need not be bounded. However, when the sample space is nite, all random variables are automatically bounded, and hence M really is a martingale. In particular, E MT = M0 = 0 so that
T

L(H, c, Y ) = E
t=1

[u(ct ) Yt1 ct ] + Y0 x

for any feasible (H, c) and state price density Y . Finally note that the strategy (H , c ) from the statement of the theorem has the property that L(H , c , Y ) L(H, c, Y ) for any feasible (H, c), since the concave function c u(c) yc is maximised when u (c) = y. Remark. The above proof gives some economic signicance to a state price density: the marginal utility of optimal consumption,
16

the Lagrange multiplier enforcing the self-nancing budget constraint, and consequently, the dual variable to an investors optimal investment problem. Remark. In the above proof, we see that a key step is to nd sucient conditions to ensure that
t

E
s=1

Hs (Ys Ps Ys1 Ps1 ) = 0.

Boundedness of H is sucient since we can appeal to the result about martingale transforms. In the general case, the martingale transform is not necessarily a true martingale, but it is always a local martingale. We will see how to handle local martingales later. One might argue that assuming the sample space is nite is not such a problem. Indeed, our cartoon model of the nancial market is ignoring plenty of other complications of reality. In particular, since prices really move on a discrete grid (because tick sizes are positive) and one could assume that prices are bounded, say, by 10100 , so a large nite sample space might be enough for our modelling needs. There are a couple reasons why we will strive for results that hold on general probability set-ups. Firstly, many popular models are based on random variables with continuous distributions, such as normal random variables. It would be a shame if our theory could not handle such models. Secondly, often it really is possible to prove general results, and since these notes are aimed at a mathematical audience, we are trying to state and prove results with the mimimum of assumptions. The downside, of course, is extra technical work. 3. Arbitrage and the rst fundamental theorem We have seen that the existence of a state price density is intimately related to the existence of a maximiser to a certain investment problem. With this motivation, we investigate a modelling assumption that ensures that there is no optimal investment/consumption strategy. That is, we investigate models that have arbitrage opportunities: Definition. An absolute arbitrage is a strategy H such that there exists a non-random time T > 0 with the properties X0 (H ) = 0 = XT (H ) almost surely and T P t=1 ct > 0 > 0. Note that if H f is a feasible investment strategy for the problem studied in the last section and if H a is an arbitrage, then H f + H a is also feasible but has strictly higher expected utility. Inductively, the strategy H f + nH a is feasible for every n 0, and so there cannot be a maximiser to the optimal investment problem. Remark. We have dened an absolute arbitrage above, and so all wealths must be compared to 0. Later we will dene the concept of a relative arbitrage where wealths are compared to a certain benchmark. In the examples that follow, recall that the terminal condition XT (H ) = 0 a.s. can be acheived by insisting that XT 1 (H ) 0 a.s., because we can then take cT = XT 1 and HT = 0. In particular, we need only specify an arbitrage strategy up to T 1.
17

Example. Consider a market with two assets with prices given by (P 1 , P 2 )


w; ww w ww ww
1/2

(4, 7)

(3, 4)

GG GG GG 1/2 GG#

(2, 3)
1 2 (The above diagram should be read P0 = 3, P(P1 = 7) = 1/2, etc.) Note that if H1 = (3, 2) and c1 = 1 then X @2

0= == == 1/2 ==
1/2

and so X1 0 a.s. Hence H is an arbitrage. In most discussions of arbitrage theory, there is the assumption that at least one asset is a num eraire: Definition. An asset is a num eraire i its price is strictly positive for all time, almost surely. One advantage of assuming that there is a num eraire is this fact: If the investment/consumption strategy H is an arbitrage for the market model, there exists a pure investment strategy H and a non-random time horizon T such that X0 ( H ) = 0 XT (H ) 0 almost surely, and P(XT (H ) > 0) > 0. Therefore, for market models with a num eraire asset, the above can be taken as the denition of arbitrage. Let us return to the last example. Example. Consider a market with two assets with prices given by (P 1 , P 2 )
w; ww w ww ww
1/2

(4, 7)

(3, 4)

GG GG GG 1/2 GG#

(2, 3) Then H1 = (4, 3) is a pure investment arbitrage.


18

For the sake of generality and avoiding unneeded assumptions, we have only assumed that prices are real-valued, but not necessarily positive. Unfortunately, in this generality we cannot limit ourselves to pure investment arbitrages: we need to allow consumption also. Example. Let n = 1 and P0 = 1 and P1 = 1 a.s. Clearly, an arbitrage would be at time 0 to sell short one share of the asset, H1 = 1 and consume the proceeds c1 = 1. At time 1, the wealth would be X1 = 1 > 0 a.s. However, there are no pure investment arbitrages. Markets with many arbitrage opportunities would be nicewe all would be a lot richer. But for the sake of building realistic models, we usually assume that markets are free of arbitrages. The rst theorem of the course is the mathematical classication of such market models: Theorem (First fundamental theorem of asset pricing). A market model has no arbitrage if and only if there exists a state price density. Proof of the 1FTAP, one-period case. Since we are only considering the oneperiod case, we can simplify notation. In particular, if (Yt )t=0,1 is a state price density, we will assume without loss that Y0 = 1 and write Y for Y1 . Note that by the denition of state price density we have P0 = E(Y P1 ). Also, as mentioned above, we will only specify our candidate arbitrage strategies by their initial holding H1 , which we shall denote H for clarity. Recall that in this context H denotes a non-random n-vector. One direction is easy. Suppose there exists a a state price density Y . Let H be a strategy such that H P0 0 H P1 almost surely. Since Y > 0 we must have 0 E(Y H P1 ) = H E(Y P1 ) = H P0 and hence H P0 = 0. The pigeonhole principle4 tells us that Y H P1 = 0 almost surely. Finally, since Y is strictly positive, we can conclude H P1 = 0 almost surely, and hence H is not an arbitrage. The other direction is more dicult5. First dene a set of candidate state price densities by Y = {Y > 0 a.s. and E(Y P1 ) < }. Note that Y is non-empty, since Y0 = e P1 is certainly an element. Also, it is easy to see that Y is convex. Now consider the set P = {E(Y P1 ) : Y Y} Rn .
mean the fact that 0 a.s. and E( ) = 0 together implies = 0 a.s. proof is adapted from Douglas Kennedy, Stochastic Financial Models Chapman & Hall/CRC Financial Mathematics Series
5This 4I

19

which is also non-empty and convex. If P0 P then there would exist a state price density. So suppose P0 P . We will show that there exists an arbitrage. Now, by the separating hyperplane theorem, there exists a vector H Rn such that (1) for all p P we have H (p P0 ) 0 (2) there exists p P such that H (p P0 ) > 0. Since H is our candidate arbitrage, we will let (in line with our notational convention mentioned above) c = H P0 and X = H P1 . We must show that c 0, that X 0 a.s. and P(c + X > 0) > 0. But since every p P can be written p = E(Y P1 ) for some Y Y , we can rewrite the above as (1) for all Y Y we have E(Y X ) c, (2) there exists Y Y such that E(Y X ) > c. Letting Y = Y0 for > 0 in (1) and sending 0 yields c0 Now let Y = (1

1{X<0} + 1)Y0 for > 0 in (1) and multiplying by yields E(Y0 X 1{X<0} ) (c + E Y0 X ) 0
X 0 a.s.

But since X 1{X<0} 0 and Y0 > 0, the pigeonhole principle says P(X < 0) = 0, i.e. Finally, by (2) we see that P(X + c = 0) = 0 and hence H is an arbitrage. 3.1. A separating hyperplane theorem*. In this optional subsection, a version of the separating hyperplane theorem is stated and proved. There are many versions of this theorem, but we give only the version needed to prove the rst fundamental theorem of asset pricing in one-period. Theorem (Separating/supporting hyperplane theorem). Let C RN be convex and x RN not contained in C . Then there exists a RN such that (y x) 0 for all y C , where the inequality is strict for at least one y C . of C . Let Proof. Case 1: Separating hyperplane. The point x is not in the closure C y C be the point in C closest to x. Assuming for the moment the existence of this point, let = y x. Fix a point y C and 0 < < 1, and note that the point (1 )y + y is since C is convex. Then in C

0 = = =

|y x|2 2 |(1 )y + y x|2 2 |(y y ) + |2 2 2 |y y |2 + 2(y y ) .

By rst dividing by and then taking the limit as 0 in the above inequality, we conclude (y y ) 0. Hence (y x) = (y y ) + ||2 > 0 as desired.
20

Case 1. Separating hyperplane Now, we establish the existence of y : Let d = inf yC |y x| > 0 and let yn be a sequence in C such that |yn x| d. Applying the parallelogram law |a + b|2 + |a b|2 = 2|a|2 + 2|b|2 we have
1 (ym + yn ) x |ym yn |2 = 2|ym x|2 + 2|yn x|2 4 2 2

2|ym x|2 + 2|yn x|2 4d2 0


1 (ym + yn ) C and as m, n , where we have used the convexity of C to assert that 2 1 hence 2 (ym + yn ) x d. We have established that the sequence (yn )n is Cauchy, and as claimed. hence converges to some point y C of C . Dene a subspace Case 2: Supporting hyperplane. The point x is in the closure C N of R by S = span{y x : y C }. (The reason we introduce S is to account for the possiblity that C might not have thickness in all N dimensions, i.e. it is contained in some lower dimensional ane space. This observation is needed to prove the strict separation assertion.) Let (xn )n be a sequence in such that xn x is in S and xn x. As in case 1, we can nd the the complement of C point yn C closest to xn . Let y xn n = n . x | |yn n Since (n )n is bounded, there exists a convergent subsequence, still denoted by (n )n , with limit S . Then

(y x) = lim n (y x)
n

= lim n (y xn ) + n (xn x)
n

0.
21

Case 2. Supporting hyperplane Finally, since S and = 0, we can conclude that there exists y C such that (y x) = 0, as desired. 4. Supermartingales, stopping times and local martingales In this section, we introduce some of the notions that will be involved in proving the easy direction of the discrete-time fundamental theorem of asset pricing in the multi-period case. Definition. A supermartingale is an adapted integrable process X (in either discrete or continuous time) such that E(Xt |Fs ) Xs for all 0 s t. (For discrete time, it is enough to show that E(Xt |Ft1 ) Xt1 .) We will nd the following result useful on several occasions. Proposition. Let X be a supermartingale, and suppose there is a non-random T > 0 such that E(XT ) = X0 . Then (Xt )0tT is a martingale. Proof. Let Yt = Xt E(XT |Ft ). Note that Yt is non-negative and integrable since X is a supermartingale. Then E(Yt ) = E(Xt ) E[E(XT )|Ft )] = E(Xt ) E(XT ) by the tower property of conditional expectation. Since X is a supermartingale, we have the inequality E(Xt ) X0 while E(XT ) = X0 by assumption. In particular, we can conclude the E(Yt ) 0. Therefore, by the pigeonhole principle, we have proven Yt = 0 and hence Xt = E(XT |Ft ). Finally, by the tower property, the process (Xt )0tT is a martingale.
22

***** Now we switch briey back to nance. To illustrate the kind of results we expect to hold more generally, we rst investigate the case of a nite sample space. Proposition. Let (H, c) be an investment/consumption strategy, let X be the corresponding wealth process and let Y be a state price density. If the sample space is nite then the process
t

X t Yt +
s=1

Ys1 cs

is a martingale. In particular, the process XY is a supermartingale. Proof. We have the calculation E(Xt Yt |Ft1 ) = E(Ht Pt Yt |Ft1 ) = Ht E(Pt Yt |Ft1 ) = Ht Pt1 Yt1 = (Ht1 Pt1 + ct )Yt1 = (Xt1 + ct )Yt1 where we have used the slot property between the rst and second line. This is justied because Ht is Ft1 -measurable by assumption (since H is predictable), and furthermore, the random variable Xt Yt is integrable because the sample space if nite. We can use this result to extend our proof of the rst fundamental theorem to to the multi-period case. Proof of the 1FTAP, multi-period, finite sample space case. Let Y be a state price density and let H be a strategy such that X0 (H ) = 0 and XT (H ) = 0 a.s. for some non-random time horizon T > 0. In particular, E(XT YT ) = 0 = X0 Y0 , so by the previous two propositions, we conclude that XY is a martingale. But again Xt Yt = E(XT YT |Ft ) = 0 so Xt = 0 a.s. for all 0 t T . Finally, we have Yt1 ct = E(Xt Yt |Ft1 ) Xt1 Yt1 = 0 so ct = 0 a.s. for all 0 t T . In the above discussion, we have seen that intregrability has been a stumbling block that has prevented us from attempting the case of a general (possibly innite) sample space. Fortunately, probablity theory has a trick for dealing with such technicalities. It is called localisation. ***** We now are returning to a general mathematical discussion. Definition. A stopping time for a ltration (Ft )tT is a random variable taking values in T {} such that the event { t} is Ft -measurable for all t T. Example. Obviously, non-random times are stopping times. That is, if = t0 for some xed t0 0, then { t} = if t0 t and otherwise.
23

Example. Here is a typical example of a stopping time. Let (Yt )t0 be a discrete-time adapted process and let A be an open set. Then the random variable = inf {t 0 : Yt A} (with the usual convention that inf = +) corresponding to the rst time the process enters the set A is a stopping time. Indeed,
t

{ t} =

{Ys A}
s=0

is Ft -measurable because each {Ys A} is Fs -measurable by the adaptedness of Y , and Fs Ft by the denition of ltration. Stopping times can be used to stop processes. Definition. For an adapted process X (in discrete or continuous6 time) and a stopping time , the process X dened by Xt = Xt is said to be X stopped at . Stopping times interact well martingales: stopped martingales are still martingales. Proposition. Let X be a martingale and let be a stopping time. Then X is a martingale. Proof. (Discrete time case) Note that
t

Xt = X0 +
s=1

1{s } (Xs Xs1 ).

Since the event {t } = { t 1}c is Ft1 -measurable by the denition of stopping time, the process Kt = 1{t } is predictable. Also, note that K is bounded. Therefore X is the martingale transform as dened earlier, and hence a martingale. The above result says that the martingale property is stable under stopping. We use this property as motivation for the following denition. Definition. A local martingale is an adapted process X = (Xt )t0 , in either discrete or continuous time, such that there exists an increasing sequence of stopping times (N ) with N such that the stopped process X N is a martingale for each N . Remark. Note that martingales are local martingales. Indeed, given a martingale X and any sequence of stopping times N , the stopped process X N is a martingale. Remark. Note that the local martingale property is also stable under stopping. Indeed, let X be a local martingale and a stopping time. Then by denition, there exists a sequence of stopping times N such that X N is a martingale. Hence (X N ) = X N is again a martingale since N is a stopping time. But note that X N = (X )N , implying that the sequence of stopping times N is such that (X )N is a martingale. This means X is a local martingale.
time is continuous, we also need the extra technical assumption that X is progressively measurable in order for this denition to be useful. Fortunately, it is sucient to assume that sample paths of X are continuous, which will be enough for this course.
24
6If

Example. To illustrate the type of technicality local martingales allow us to tackle, let and be independent random variables, and let F1 = ( ) and F2 = (, ). (Recall that the notation ( ) denotes the sigma-eld generated by , i.e. the smallest sigma-eld for which is measurable.) Suppose further that is integrable and E( ) = 0. Let M1 = 0, and M2 = . Is the process M = (Mt )t=1,2 a martingale? It seems like it should be, since E(M2 |F1 ) = E( | ) = E( | ) by the slot property = E( ) by independence =0 = M1 . But THIS ARGUMENT IS NOT CORRECT!!! Indeed, before we can consider the conditional expectation E(M2 |F1 ) we need rst to check that the random variable M2 is integrable. If you doubt the importance of this, please see example sheet 1 for an example of what can go wrong when trying to compute conditional expectations of non-integrable random variables. Therefore, the argument above shows that (Mt )t=1,2 is a martingale, only if we assume that that M2 = is integrable. We now consider the general case. For each N 1 let N = Now and hence M2N is integrable and E(M2N |F1 ) = E(1{||N } | ) = 1{||N } E( | ) by the slot property = 1{||N } E( ) by independence =0 = M1N . Therefore the stopped process M N is a martingale. Since N a.s. since |X | < a.s. we have shown that although M may not necessarily be a martingale, it is always a local martingale. The notion of a local martingale allows us to use martingale techniques on processes that ought to behave like martingales. The following theorem provides a context where local martingales arise naturally. Theorem. Suppose X is a discrete-time local martingale and K is a predictable process. Let Yt =
s=1 t

1 +

if | | > N if | | N

M2N = 1{||N }

Ks (Xs Xs1 )
25

for t 1. Then Y is a local martingale. Remark. This is the martingale transform as before, but now do not insist that K is bounded or that X is a true martingale. As a consequence, we cannot assert that Y is a true martingale, merely a local martingale. The idea is that by localising, we can study the algebraic and measurability structure of the martingale transform without worrying about integrability issues. Proof. Since X is a local martingale, we can nd a sequence of stopping times N that X N is a martingale. Let N = inf {t 0 : |Kt+1 | > N } with the convention inf = +. Note that N is a stopping time since K is predictable. Now writing
t

YtN N =
s=1

N N Ks 1{sN } (Xs Xs 1 )

we see that the stopped process is the martingale transform of the bounded predictable process (Kt 1{tN } )t0 with respect to the martingale X N , and hence is a martingale. The next theorem gives a sucient condition that a local martingale is a true martingale. Theorem. Let X be a local martingale in either discrete or continuous time. Let Yt be a process such that |Xs | Yt almost surely for all 0 s t. If E(Yt ) < for all t 0, then X is a true martingale. Proof. Let (N )N be a localising sequence of stopping times for X . Note that XtN Xt a.s. since N . Furthermore, by assumption |XtN | Yt which is integrable, so we may apply the conditional version of the dominated convergence theorem to conclude E(Xt |Fs ) = E(lim XtN |Fs )
N

= lim E(XtN |Fs )


N N

= lim XsN = Xs for 0 s t, where we have used the fact that the stopped process (XtN )t0 is a martingale. The following corollary is useful: Corollary. Suppose X is a DISCRETE-TIME local martingale such that E(|Xt |) < for all t 0. Then X is a true martingale. Proof. Let Yt = |X0 | + . . . + |Xt |. The process Y is integrable by assumption and |Xs | Yt for all 0 s t. The conclusion follows from the previous theorem. In the absense of integrability, the next best property is non-negativity. Theorem. Suppose X is a local martingale in either continuous or discrete time. If Xt 0 for all t 0, then X is a supermartingale.
26

Proof. In the general case, let (N )N be the localising sequence for X . First we show that Xt is integrable for each t 0. Fatous lemma yields E(|Xt |) = E(Xt ) = E(lim XtN )
N

lim inf E(XtN )


N

= X0 < . Now that we have established integrability, we can discuss conditional expectations. The conditional version of Fatous lemma yields E(Xt |Fs ) = E(lim XtN |Fs )
N

lim inf E(XtN |Fs )


N

= lim inf XsN


N

= Xs for 0 s t, as claimed. As before, discrete time local martingales are particularly nice: Corollary. If X is a DISCRETE-TIME local martingale such that Xt 0 a.s. for all t 0, then X is a martingale. Proof. By the above theorem, we have that E(|Xt |) = E(Xt ) X0 < . Since X is integrable, the previous corollary implies X is a martingale. Actually, in discrete time, the assumption of integrability or non-negativity for all t 0 can be relaxed: Theorem. Suppose Y is a DISCRETE-TIME local martingale such that there is a nonrandom time horizon T > 0 with the property that either Integrability: E(|YT |) < , or Non-negativity: YT 0 a.s. Then (Yt )0tT is a martingale To prove this theorem, we will appeal to the following result, whose proof7 is omitted: Theorem. Suppose Y is a discrete-time local martingale. Then there exists a true martingale X and a predictable process K such that
t

Yt = Y0 +
s=1

Ks (Xs Xs1 )

for t 0.
J. Jacod and A.N. Shiryaev. Local martingales and the fundamental asset pricing theorems in the discrete-time case. Finance and Stochastics 2: 259-273 (1998)
27
7see

Proof that integrability or non-negativity imply martingality. Suppose our local martingale Y can be written as
t

Yt = Y0 +
s=1

K s ( Xs Xs 1 )

as in the above theorem. Let N = inf {t 0 : |Kt+1 | > N } as before. Note YT 1{T N } is integrable since X is integrable by denition of martingale, and K is bounded on {T N }. Hence we have E[YT 1{N T } |FT 1 ] = E[YT 1 1{T N } + KT 1{T N } (XT XT 1 )|FT 1 ] = YT 1 1{T N } + KT 1{T N } E[XT XT 1 |FT 1 ] = YT 1 1{T N } .

(Integrability) If YT is integrable, we take N and apply the dominated convergence theorem to conclude that E(YT |FT 1 ) = YT 1 . In particular, YT 1 is integrable and induction shows that Yt in integrable for all 0 t T . Therefore (Yt )0tT is an integrable local martingale in discrete time and hence a true martingale. (Non-negativity) If YT 0 we have YT 1 1{T N } = E[YT 1{N T } |FT 1 ] 0 Taking N shows YT 1 0, induction shows that Yt 0 for all 0 t T . Therefore (Yt )0tT is a non-negative local martingale in discrete time and hence a true martingale. ***** We can use these results to give full proofs of earlier results. Let (H, c) be an investment/consumption strategy, and let X be the associated wealth process. Let Y be a state price density. Since Xt Yt + Yt1 ct Xt1 = Ht Pt Yt + Yt1 ct Ht1 Pt1 Yt1 = Ht (Pt Yt Pt1 Yt1 ) we see that the process XY + Y c is the martingale transform of the predictable process H with respect to the martingale P Y , and hence a local martingale. Now suppose XT = 0. Since consumption is non-negative we have
T

XT YT +
s=1

Ys1 cs 0.

Therefore we can conclude that the process is a true martingale and hence
T

X0 Y0 = E
s=1

Ys1 cs
28

In particular, if X0 = 0 then ct = 0 for all 0 t T and the strategy (H, c) is not an arbitrage. 5. Num eraires and equivalent martingale measures In this section, we introduce the concepts of num eraire assets and equivalent martingale measures. The primary purpose of this section is to reconcile concepts and terminology used by other authors to the theory developed so far. We will also nd that equivalent martingale measures can be used to simplify some calculations later in the course. We begin with a denition: Definition. Let (, F ) be a measurable space and let P and Q be two probability measures on (, F ). The measures P and Q are equivalent , written P Q, i P(A) = 0 Q(A) = 0. The above denition says that equivalent measures have the same null sets. Complementarily, equivalent measures have the same almost sure sets. Example. Consider the sample space = {1, 2, 3} with the set F of events all subsets of . Consider probability measures P and Q dened by P{1} = 1/2, P{2} = 1/2, and P{3} = 0 Q{1} = 1/1000, Q{2} = 999/1000, and Q{3} = 0. Then P and Q are equivalent. It turns out that equivalent measures can be characterised by the following theorem. When there are more than one probability measure oating around, we use the notation EP to denote expected value with respect to P, etc. Theorem (RadonNikodym theorem). The probability measure Q is equivalent to the probability measure P if and only if there exists a positive random variable such that Q(A) = EP ( 1A ) for each A F . The random variable above is unique in the sense that if is positive random variable such that Q(A) = EP ( 1A ) for all A, then P( = ) = Q( = ) = 1. The random variable is called the density, or the RadonNikodym derivative , of Q with respect to P, and is often denoted dQ = . dP In fact, P also has a density with respect to Q given by dP 1 = . dQ Note that EP ( ) = 1 by putting A = in the conclusion of theorem. More generally, by the usual rules of integration theory, if X is a non-negative random variable then EQ (X ) = EP (X ). *****
29

Now lets return to our nancial model. To start o, we need the following denition: Definition. A num eraire is an asset with a strictly positive price at all times. The idea is that a num eraire can be used to count money. Hence, we can speak in terms of prices relative to the num eraire. Definition. Let P be a market model dened on a probability space (, F , P). The measure P is called the objective (or historical or statistical ) measure for the model. Suppose that we can write our asset price process as P = (N, S ) where N is a positive adapted process (the price of a num eraire) and S is an adapted d dimensional process. An equivalent martingale measure relative to this num eraire is any probability measure Q equivalent to P such that the discounted price processes St Nt is a martingale under Q. Remark. In many accounts of arbitrage theory, the concept of an equivalent martingale measure has taken centre stage. I believe that its importance has been overstressed. In particular, it is a num eraire-dependent concept, unlike that of a state price density. For instance, if there are two assets that both num eraires (for example from the point of view of a British trader, both the euro and the US dollar are num eraires) then one must be very careful to specify which one is the num eraire. We will now see that state price density and equivalent martingale measures are essentially the same concept once a num eraire is specied.

t0

5.1. Finding an EMM given a state price density. Let Y be a state price density for the model, and suppose the prices can be written P = (N, S ). Fix a time horizon T > 0, and dene a new measure Q by the density dQ YT NT = . dP Y0 N0 Then Q is an equivalent martingale measure (relative to the num eraire) for the model. Let us check this claim. First we need to show that the proposed density does in fact dene a probability measure. Since the asset is a num eraire we have Nt > 0 for all t and since Y is a state price density we also have Nt > 0 for all t. In particular, the random variable dQ/dP above is positive-valued. Also, since Y N is a martingale, we have EP (YT NT ) = Y0 N0 EP
30

dQ dP

= 1.

Now we will show that the discounted price process S/N is a martingale under Q. For 0 t T Bayess formula (see the rst example sheet) yields E
Q

ST |Ft NT

= =

EP EP

dQ ST dP NT

|Ft

dQ |Ft dP

EP (YT ST |Ft ) EP (YT NT |Ft ) St = Nt since by the denition of state price density, both SY and N Y are martingales. 5.2. Finding a state price density given an EMM. Suppose Q is an equivalent martingale measure for the market (Pt )0tT with respect to the num eraire asset with price N . Let N0 P dQ Yt = E |Ft . Nt dP Then Y is as state price density. The verication of this fact is left as exercise.

31

CHAPTER 2

Pricing and hedging contingent claims in discrete-time models


A contingent claim is any cash payment where the size of the payment is contingent on the prices of other assets or any other variable (for instance, the weather). There are two major types of contingent claims that we will study in these notes: European and American. European: specied by a time horizon T > 0 and FT -measurable random variable T modelling the payout at the maturity date T . American: specied by a time horizon T > 0 and an adapted process (t )0tT where t models the payout of the claim if the owner of the claim chooses to exercise at time t. Example (Call option). A European call option gives the owner of the option the right, but not the obligation, to buy a given stock at some xed time T at some xed price K , called the strike of the option. Let ST denote the price of the stock at the maturity date T . There are two cases: If K ST , then the option is worthless to the owner since there is no point paying a price above the market price for the underlying stock. On the other hand, if K < ST , then the owner of the option can buy the stock for the price K from the counterparty and immediately sell the stock for the price ST to the market, realising a prot of ST K . Hence, the payout of the call option is T = (ST K )+ , where a+ = max{a, 0} as usual. The hockey-stick graph of the function g (x) = (x K )+ is below.

An American call option gives the owner of the option the right, but not the obligation, to buy a given stock at any time t [0, T ] at some xed strike price K . By the argument above, the payout of the call option exercised at time t is given by t = (St K )+ . 1. European claims and the second fundamental theorem of asset pricing Imagine that you nd yourself in a market with prices P = (Pt )t0 , and you would like to sell a European contingent claim with payout T . What price should you ask at time 0
33

to o-set this liability at time-T ? One criterion would be to ask for enough money to oset the cost of hedging away the liability by trading in the market. With this motivation, we introduce an important class of claims that can be perfectly hedged: Definition. A European contingent claim with payout T is replicable (or attainable ) i there exists a pure investment strategy H such that XT (H ) = T almost surely. One of the reasons to single out attainable claims is that there is an unambiguous way to price them according to the no-arbitrage principle: Theorem. Suppose that the market model with n-dimensional price process P has no arbitrage. Let T be the payout of an attainable European contingent claim with maturity date T > 0, and let H be the n-dimensional replicating strategy. Suppose the claim has price t for 0 t T . If the augmented market with (n + 1)dimensional price process (P, ) has no arbitrage, then t = Xt (H ) almost surely for all 0 t T Proof. The idea is to that if Xt = t for some t, then there would be an arbitrage in the augmented market. To construct an arbitrage wait until the rst time that the price of the replicating portfolio diers from the price of the claim, and then buy the cheap one, sell the expensive one and pocket the dierence. In mathematical notation, let = inf {0 t T : Xt = t }, with the usual convention that inf = +. Consider the (n + 1)-dimensional investment strategy t = sign( X )1{t> } (Ht , 1) H is and consumption ct = | X |1{t= +1} . Note that the corresponding wealth process X 0 = X T = 0. If the augmented market has no arbitrage, then ct = 0 a.s. for all such that X t, implying = a.s. as claimed. Example. (Put-call parity formula) Suppose we start with a market with three assets with prices (Bt,T , St , Ct )0tT . The rst asset is a bond with maturity date T and unit principal value, so that in particular, BT,T = 1 almost surely. The next asset is a stock. The last asset is a call option on that stock with strike K and maturity T , so that CT = (ST K )+ . Suppose that this market is free of arbitrage. Now we introduce another claim, called a put option. A put option gives the owner of the option the right, but not the obligation, to sell the stock for a xed strike price at a xed maturity date. If the strike is K and maturity date is T , then a similar argument as we used for the call option, the payout of a put option is PT = (K ST )+ . It turns out that the put option is replicable in the market (B, S, C ). Indeed, we have the identity PT = (K ST )+ = K ST + (ST K )+ = KBT,T ST + CT = (K, 1, +1) (BT,T , ST , CT ).
34

Hence Ht = (K, 1, +1) for all 1 t T is a replicating strategy. Now, suppose we want to assign prices Pt to the put for 0 t < T . The above theorem says there is no arbitrage in the augmented market (B, S, C, P ) if and only if Pt Ct = KBt,T St . This is the famous put-call parity formula. A diculty in using the above theorem for pricing an attainable contingent claim is that it requires knowing the replicating strategy. The following theorem gives a formula for the no-arbitrage price of the claim which does not require knowledge of this strategy, just that it exists. Theorem. Suppose that the market model with n-dimensional price process P has no arbitrage, and let Y be a state price density process. Let T be the payout of an attainable European contingent claim with maturity date T > 0. Suppose the claim has price t for 0 t T and that the augmented market with (n + 1)dimensional price process (P, ) has no arbitrage. If either YT T is integrable or T 0 a.s., then 1 t = E(T YT |Ft ) Yt for all 0 t T . Remark. Suppose the market has a num eraire asset and the price process decomposes as P = (N, S ) where N is strictly positive. As discussed in the previous chapter, to each state price density procee Y one can associate an equivalent martingale measure Q by the RadonNikodym density dQ NT YT = . dP N0 Y0 With this notation, the pricing formula then becomes t = Nt EQ T |Ft . NT

Proof. Since the claim is attainable there exists a pure investment strategy such that XT (H ) = T a.s. Since the augmented market has no arbitrage, the previous theorem says that Xt = t a.s. for all 0 t T . From our calculation in the last chapter, we know that for any state price density Y and any wealth process X associated with a pure investment strategy is such that XY is a local martingale. Furthermore, if the random variable XT YT is either integrable or non-negative, we know that (Xt Yt )0tT is a true martingale. Hence t Yt = Xt Yt = E(XT YT |Ft ) = E(T YT |Ft ) as claimed. For the sake of comparison, consider the following result:
35

Theorem. Suppose that the market model with n-dimensional price process P has no arbitrage. Let T be the payout of (not necessarily attainable) contingent claim with maturity date T > 0. Suppose the claim has price t for 0 t T and that the augmented market with (n + 1)dimensional price process (P, ) has no arbitrage. Then there exists a state price density Y of the original market such that 1 t = E(T YT |Ft ) Yt for all 0 t T . Proof. This is just the rst fundamental theorem of asset pricing applied to the augmented market with prices (P, ). Remark. The message is this: if a claim is attainable it can be priced with any state price density. On the other hand, the most one can say for a general claim is that there exists some state price density that prices the claim. Since attainable claims have unique no-arbitrage prices, we single out the markets for which every claim is attainable: Definition. A market is complete if and only if every European contingent claim is attainable. A market is incomplete otherwise. We can characterise complete markets: Theorem (Second Fundamental Theorem of Asset Pricing). An arbitrage-free market model is complete if and only if there exists a unique state price density Y such that Y0 = 1. Proof that completeness implies uniqueness. First, suppose the market is complete. Let Y and Y be two state price densities, and T 0 be an arbitrary FT -measurable random variable. Since T is attainable by assumption, there exists a strategy H such that XT (H ) = T a.s. Hence E(YT T ) = X0 (H ) = E(YT T ) so that E[(YT YT )T ] = 0. Since the random variable T above is arbitrary, we can let T = 1{YT >YT } which by the pigeonhole principle implies YT YT a.s. and by symmetry YT = YT as desired. To prove the other direction, we will make use of the following characterisation of attainable claims. Some of the details of the proof are on the second example sheet. Theorem. Let T 0 be an FT -measurable random variable such that for all state price densities E(YT T )/Y0 = x for some constant x. Then there exists a pure investment strategy H such that XT (H ) = T and x = X0 (H ). Proof that uniqueness implies completeness. Suppose that there is a unique state price density such that Y0 = 1. Then for any FT -measurable random variable, exist strategies + H + and H such that XT (H + ) = T and XT (H ) = T by the above characterisation of + attainable claims. Hence XT (H H ) = T and so the claim with payout T is attainable. This proves the if direction.
36

This box summarises the fundamental theorems: 1FTAP: 2FTAP: No arbitrage Completeness Existence of state price density Uniqueness of state price density

In discrete time models complete markets have even more structure: Theorem. If the market model P with n assets is complete, then for each t 0 the probability space can be partitioned into no more than nt Ft -measurable events of positive probability, and in particular, the n-dimensional random vector Pt takes values in a set of at most nt elements. Proof. We rst consider the t = 1 case. Suppose A1 , . . . , Ak are a collection of disjoing F1 -measurable events with P(Ai ) > 0 for all i. Claim: the set {1A1 , . . . , 1Ak } is linearly independent, and in particular, the dimension of the span of {1A1 , . . . , 1Ak } is exactly k . To prove this claim, we must show that if a1 1A1 + . . . ak 1Ak = 0 a.s. for some constants a1 , . . . , ak , then a1 = = ak = 0. To this end, note that if i = j the sets Ai and Aj are disjoint and hence 1Ai 1Aj = 0. By multiplying both sides of the equation by 1Ai we get ai 1Ai = 0. But since P(Ai ) > 0 it must be the case that ai = 0. Now if the market is complete, each of the 1Ai is replicable. Hence span{1A1 , . . . , 1Ak } {H P1 : H Rn }
1 = span{P1 , . . . , Ptn }

Looking at the dimensions of the spaces above, we must conclude k n. The argument in the case t > 1 is similar. Let B1 , . . . , BN be a maximal partition of into disjoint Ft1 -measurable sets of positive measure. If a random vector Ht is Ft1 measurable, then it takes exactly one value on each of the Bj s for a total of at most N values H1 , . . . , HN . Hence {H Pt : H is Ft1 -meas. } = {H1 Pt 1B1 + . . . + HN Pt 1BN : H1 , . . . , HN Rn } = span{Pti 1Bj : 1 i n, 1 j N } and the dimension of the space above is nN . The argument above proves that there are at most nN sets of disjoint Ft -measurable sets of positive measure. Induction completes the proof. 2. Super-replication of American claims We now discuss American claims. Here, things are quite dierent. The canonical example of an American claim is the American put option a contract which gives the buyer the right (but not the obligation) to sell the underlying stock at a xed strike price K > 0 at any time between time 0 and a xed maturity date T . Hence, the payout of the option is (K S )+ where {0, . . . , T } is a time chosen by the holder of the put to exercise the option. The payout of an American claim is specied by two ingredients: a maturity date T > 0, an adapted process (t )0tT .
37

For instance, in the case of an American put, we may take t = (K St )+ . Unlike the European claim, the holder of an American claim can choose to exercise the option at any time before or at maturity. However, to rule out clairvoyance, we insist that is a stopping time. Now, if an American claim matures at T > 0 and is specied by the payout process (t )0tT , then the actual payout of the claim is modelled by the random variable , where is any stopping time for the ltration taking values in {0, . . . , T }. We can think of the American claim then as a family, indexed by the stopping time , of European claims with payouts . To simplify matters, we make the following assumption in this subsection: The market model P = (Pt )0tT is complete. Let Y = (Yt )0tT be the unique state price density such that Y0 = 1. Intuitively, the seller of such a claim should at time 0 charge at least the amount sup E (Y )
T

to be sure that he can hedge the option, where the supremum is taken over the set of stopping times smaller than or equal to T . Indeed, this is the case. Theorem. Suppose that the adapted process (t )0tT species the payout of an American claim maturing at T > 0. There exists a trading strategy H such that Xt (H ) t for all 0 t T , X (H ) = for some stopping time , and X0 (H ) = sup T E (Y ). Remark. The strategy H dominates the payout of the American claim at all times, but is conservative in the sense that it exactly replicates the optimally exercised claim. The rest of this subsection is dedicated to proving this theorem. ***** We will need a result of general interest: Theorem (Doob decomposition theorem). Let U be a discrete-time supermartingale. Then there is a unique decomposition Ut = U0 + Mt At where M is a martingale and A is a predictable non-decreasing process with M0 = A0 = 0. Proof. Let M0 = 0 = A0 and dene Mt+1 = Mt + Ut+1 E(Ut+1 |Ft ) At+1 = At + Ut E(Ut+1 |Ft ) for t 0. Since U is assumed to be supermartingale, and hence integrable, the processes M and A are integrable. It is straightforward to check that M is a martingale, and since U is a supermartingale, that A is non-decreasing. Also by induction, we see that At+1 is Ft -measurable.
38

Summing up,
t

Mt At = M0 A0 +
s=1 t

(Ms Ms1 As + As1 )

=
s=1

(Us Us1 )

= Ut U0 . To show uniqueness, assume that Ut = U0 + Mt At = U0 + Mt At . Then M M is a predictable discrete-time martingale, that is, a constant. Now we introduce the key concept in optimal stopping theory: Definition. Let (Zt )0tT be a given integrable adapted discrete-time process. Dene an adapted process (Ut )0tT by the recursion UT = ZT Ut = max{Zt , E(Ut+1 |Ft )} for 0 t T 1. The process (Ut )0tT is called the Snell envelope of (Zt )0tT . Remark. The Snell envelope clearly satises both Ut Zt and Ut E(Ut+1 |Ft ) almost surely. Thus, another way to describe the Snell envelope of a process is to say it is the smallest supermartingale dominating that process. In our application Z will be the process Y , where Y is the state price density and is the process specifying the payout of the American claim. Theorem. Let (Zt )0tT be an integrable adapted process, let (Ut )0tT be its Snell envelope with Doob decomposition Ut = U0 + Mt At . Let = min{t {0, . . . , T } : At+1 > 0} with the convention = T on {At = 0 for all t}. Then is a stopping time and U = U0 + M = Z . Proof. That is a stopping time follows from the fact that the non-decreasing process (At )0tT is predictable. Now note that E(Ut+1 |Ft ) = E(U0 + Mt+1 At+1 |Ft ) = U0 + Mt At+1 since M is a martingale and A is predictable so that by the denition of Snell envelope U0 + Mt At = max{Zt , U0 + Mt At+1 }. In particular, since A U0 + M = max{Z , U0 + M A +1 } = 0. But since A +1 > 0 we must conclude U = U0 + M = Z .
39

Theorem. Let Z be an adapted integrable process and let U be its Snell envelope. Then U0 = sup E(Z ).
T

Proof. Since U is a supermartingale, U0 E(U ) for any stopping time by the optional sampling theorem. (See example sheet 2.) But since Ut Zt by construction, U0 E(Z ) for any stopping time . But letting = min{t {0, . . . , T } : At+1 > 0} where U = U0 + M A is the Doob decomposition of U , we have U0 = U0 + E(M ) = E(Z ). again by the optional sampling theorem and the previous result. Remark. By a similar argument, one can show that Ut = sup E(Z |Ft )
t T

for all 0 t T . This formula allows us to dene1 the Snell envelope for the innite horizon case T = and also in the continuous time case. Definition. If Z is an integrable adapted proces, a stopping time such that E(Z ) = sup T E(Z ) is called an optimal stopping time. Obviously the stopping time dened above is an optimal stopping time. Example sheet 2 shows how to nd another one. ***** Returning to nance, let (t )0tT be the process specifying the payout of an American option, and let (Ut )0tT be the Snell envelope of Y with Doob decomposition Ut = U0 + Mt At . We now will use the assumption that the market is complete: let H be strategy such that XT (H ) = (U0 + MT )/YT . Since XY is a martingale since it is a local martingale from before, and since the market is complete, it is also bounded. By the martingale property, we have Xt Yt = U0 + Mt for all 0 t T . In particular, Xt (H ) = (U0 + Mt )/Yt Ut /Yt t for all 0 t T , X (H ) = , and X0 (H ) = sup T E(Y ), completing the proof of the theorem.

should be a bit more careful. Notice that the supremum is over the possibly uncountable set of stopping times. In particular, the supremum may fail to be measurable. Therefore, the correct denition replaces the supremum with the essential supremum...
40

1Actually,

CHAPTER 3

Brownian motion and stochastic calculus


Despite the elegance of discrete-time nancial theory, there is at least one glaring problem: explicit computations are dicult. For instance, the fundamental theorems are stated in terms of state price densities, but it is very dicult to classify them except in a few simple examples. The continuous-time theory has the convenient feature that explicit formulae are easy to ndindeed, one of our rst results will be the general formula for a state price density in a continuous-time market model. Before we can describe the continuous-time nancial theory, we need to rst learn about stochastic integration. Recall that in discrete time, the self-nancing condition and budget constraint imply that for the wealth process X corresponding to a pure investment strategy H satises Xt Xt1 = Ht (Pt Pt1 ) so that
t

X t = X0 +
s=1

Hs (Ps Ps1 )

The continuous time analogue ought to be something like


t

Xt = X0 +
0

Hs dPs

What does the integral on the right mean? If we assume that the sample paths t Pt are dierentiable, we could interpret the integral as the Lebesgue integral dPs ds. ds 0 Unfortunately, it turns out that life is not that simple. To see why, remember that in discrete time we dened the state price density Y as a positive process such that Y P is a martingale. We will adopt more-or-less the same denition in continuous time. Now, a theorem of stochastic calculus says that a continuous martingale with dierentiable sample paths is necessarily constant. So if we insist that our price processes have dierentiable sample paths, we will have a very boring theory. This chapter is concerned with an integration theory where we use the martingale property, rather than the dierentiablity of the sample paths, as the key ingredient. This theory is nice, and indeed something like the fundamental theorem of calculus holds. This means we can do explicit computations. The most basic example of a continuous martingale is Brownian motion. We will build up our theory by rst dening Brownian motion, to construct the Brownian stochastic integral, and to learn the rules of the resulting calculus. The following chapter will provide an extremely brief introduction to this theory. Hs
41
t

1. Brownian motion In this section, we introduce one of the most fundamental continuous-time stochastic processes, Brownian motion. As hinted above, our primary interest in this process is that it will be the building block for all of the continuous-time market models studied in these lectures. Definition. A Brownian motion W = (Wt )t0 is a collection of random variables such that W0 ( ) = 0 for all , for all 0 t0 < t1 < ... < tn the increments Wti+1 Wti are independent, and the distribution of Wt Ws is N (0, |t s|), the sample path t Wt ( ) is continuous all . It is not clear that Brownian motion exists. That is, does there exist a probability space (, F , P) on which the uncountable collection of random variables (Wt )t0 can be simultaneously dened in such a way that the above denition holds? The answer, of course, is yes, and the proof of this fact is due to Wiener in 1923. Therefore, the Brownian motion is also often called the Wiener process , especially in the U.S. Although the sample paths of Brownian motion are continuous, they are very irregular. Below is a computer simulation of a one-dimensional Brownian motion: 2. It o stochastic integration We now have sucient motivation to construct a stochastic integral with respect to a Wiener process. What follows is the briefest of sketches of the theory. There are now plenty of places to turn for a proper treatment of the subject. For instance, please consult one of the following references: L.C.G. Rogers and D. Williams, Diusions, Markov Processes, and Martingales: Volume 2 I. Karatzas and S.E. Shreve, Brownian Motion and Stochastic Calculus. 2.1. The L2 theory. To get things started, let W be a scalar Brownian motion. We will assume that W is adapted to a ltration (Ft )t0 . For the record, we will assume that the ltration satises what are called the usual conditions of right-continuity Ft = >0 Ft+ and that F0 contains all P-null events. These are technical assumptions that ensure the existence of stopping times with the right properties. We also will assume that for each 0 s < t the increment Wt Ws is independent of Fs . The rst building block of the theory are the simple predictable integrands. Definition. A simple predictable process is an adapted process = (t )t0 of the form
N

t ( ) =
n=1

1(tn1 ,tn ] (t)an ( )


42

Sample path of Brownian motion 3.5 3.0 2.5 2.0 1.5 W 1.0 0.5 0.0 0.5 1.0 1.5

5 t

10

where an is bounded and Ftn1 -measurable for some 0 t0 < t1 < ... < tN < . For simple predictable processes we dene the stochastic integral by the formula
N

s dWs =
0 n=1

an (Wtn Wtn1 )

Theorem (It os isometry). For a simple predictable integrand , we have


2

E
0

s dWs

=E
0

2 s ds

Proof. Note that


2

s dWs
0

=
n

2 a2 n (Wtn Wtn1 ) + 2 m<n

am an (Wtm Wtm1 )(Wtn Wtn1 ).

Now, if m < n we know that am , an and Wtm Wtm1 are Ftn1 measureable and hence E[am an (Wtm Wtm1 )(Wtn Wtn1 )] = E E[am an (Wtm Wtm1 )(Wtn Wtn1 )|Ftn1 ] = E am an (Wtm Wtm1 )E[Wtn Wtn1 |Ftn1 ] =0
43

by the tower and slot property, where in the last line we used the fact that the increment Wtn Wtn1 has mean zero and is independent of Ftn1 . Similarly,
2 2 2 E[a2 n (Wtn Wtn1 ) ] = E{an E[(Wtn Wtn1 ) |Ftn1 ]}

= E[a2 n (tn tn1 )] where we have used the fact that the increment Wtn Wtn1 has variance tn tn1 . Putting this together implies
2 N

E
0

s dWs

=E =E
0

a2 n (tn tn1 )
n=1 2 ds s

as claimed. Now, the map dened by I () = 0 s dWs is an isometry from the space of simple predictable integrands to the space L2 (, F , P) of square-integrable random variables. The fact that L2 is complete is the key observation which allows us to build the stochastic integral of more general integrands. Definition. The predictable sigma-eld P is the sigma-eld on the product space R+ generated by sets of the form (s, t] A where 0 s < t and A is Fs -measurable. Equivalently, the predictable sigma-eld is that generated by the simple, predictable integrands. A predictable process is a map : R+ R that is P -measurable. Equivalently, predictable processes are limits of simple, predictable integrands. Remark. Every left-continuous, adapted process is predictable. These are the examples to keep in mind, since they are the ones that come up most in application. Now, suppose ((k) )k1 is a sequence of simple predictable integrands converging in L (R+ , P , Leb P) to a predictable process so that
2

E
0

(k) (s s )2 ds

as k . By It os isometry the sequence I ((k) ) is a Cauchy sequence, which by the completeness of L2 , converges to some random variable. This is what we take as the denition. Definition. If is predictable and

E
0

2 s ds

<

then
0

s dWs = lim
k 0 2

(k ) s dWs

where the limit is interpreted in the L () sense where (k) is any sequence of simple, predictable processes converging to in L2 (R+ ).
44

The inspired idea of the above denition of the integral is that it compensates for the roughness of a typical Brownian sample path by using instead the many cancellations that occur on average from the uncorrelated Brownian increments. Of course, we are not really interested in integrals over the whole interval [0, ) but rather nite intervals [0, t]. This is easily handled. Theorem. For every predictable such that
t

E
0

2 s ds

< for all t 0

there exists a continuous martingale X such that Xt =


0

s 1{st} dWs .
t

In this case, we will use the notation Xt =


0

s dWs .

2.2. Localisation. In this section, we show how to extend the denition of stochastic integral to predictable processes such that
t 2 s ds < almost surely 0

for all t 0. The technique is called localisation. Dene the stopping times
t

n = inf t 0 :
0

2 s ds = n

for each n 1, where inf = as usual, and let t Note that since E
t (n) (s )2 ds 0 (n)

= t 1{tn } .

n, the process X (n) dened by Xt


2 (n) t

=
0

(n) s dWs

is a martingale for each n by the L theory. Now x t > 0 and dene the increasing sequence of events An = { : n t}. Since t 2 ds < almost surely for all t 0, we have P nN An = 1. Hence we can dene 0 s the stochastic integral by the formula
t t

s dWs = lim
0

(n) s dWs 0

where the limit is in probability. Note that the X dened by


t

Xt =
0

s dWs
45

is a continuous local martingale. Indeed, for each n the stopped process Xtn = Xt
(n)

is a martingale. To summarise the most frequently used aspects of this construction: If is an adapated continuous process then Xt = local martingale. If in addition we have E X is a true martingale.
t 0 t 0

s dWs denes a continuous

2 s ds < for all t 0, then

3. It os formula In the last section, we sketched very quickly the constructed of a stochastic integral with respect to a Wiener process. What makes the It o stochastic integral useful is that there is a corresponding stochastic calculus. The basic building block of this calculus is the chain rule, called It os formula. 3.1. The scalar version. Let (Wt )t0 be a scalar Brownian motion adapted to a ltration (Ft )t0 satisfying our usual conditions. We can use our stochastic integration theory to dene a useful class of stochastic process: Definition. An It o process X is an adapted process of the form
t t

Xt = X0 +
0

s dWs +
0

s ds.

where X0 is a xed real number and (t )t0 and (t )t0 be predictable real-valued processes such that
t t 2 s ds < and 0 0

|s |ds <

almost surely for all t 0. Note that the two integrals appearing the above denition have dierent meanings: the rst as a stochastic integral and the second as a pathwise Lebesgue integral. We are now ready for the rst version of It os formula: Theorem (It os formula, scalar version). Let X be an It o process and f : R R twice continuously dierentiable. Then
t t

f (Xt ) = f (X0 ) +
0

f (Xs )s dWs +
0

1 2 f (Xs )s + f (Xs )s ds. 2

Let us highlight a dierence between It o and ordinary calculus, by noting the mysterious appearance of the f term in It os formula. This term would not appear in the chain rule of ordinary calculus. But consider the example f (x) = x2 so that
t

Wt2 = 2
0

Ws dWs + t.
t

Note that since E


0

Ws2 ds

=
0

s ds = t2 /2 < ,
46

the local martingale X given by


t

Xt =
0

Ws dWs

is actually a true martingale. Can you verify, directly from the denition of Brownian motion, that the process (Wt2 t)t0 is a martingale? We now introduce a dierential notation which cleans up some of the formulae. We will use the notation dXt = t dWt + t dt to mean t t Xt = X0 +
0

s dWs +
0

s ds.

Recall that the sample paths of the Brownian motion are nowhere dierentiable, so the notation dWt is only formal, and can only be interpreted via the stochastic integration theory. But in this dierential notion, It os formula takes a nicer form 1 2 df (Xt ) = f (Xt )t + f (Xt )t dt + f (Xt )t dWt . 2 Example. Consider the It o process given by Xt = X0 + aWt + bt Yt = eXt , we would like to show that the process (Yt )t0 is an It o process, and write down its decomposition in terms of ordinary and stochastic integrals. Let f (x) = ex . Then f (x) = ex and f (x) = ex . Also, dXt = a dWt + b dt and d X So It os formula says: 1 df (Xt ) = f (Xt )dXt + f (Xt )d X 2 dYt = Yt [(b + a2 /2)dt + a dWt ]
t t

for some constants a, b R. Letting

= a2 dt

We now introduce a notion which helps with computations involving It os formula. Theorem. Let X be an It o process. There exists a continuous non-decreasing process X , called the quadratic variation of X , such that
N

= lim
N n=1

(Xnt/N X(n1)t/N )2

for each t 0, where the limit is in probability. If dXt = t dWt + t dt then dX Remark. Some people write d X
t t 2 = t dt.

= (dXt )2 for obvious reasons.


47

This is not the appropriate place to prove this important result in full, but to get a feeling for why it is true, we will prove it in the case where X is a Brownian motion: Proof. By denition, the increments of Brownian motion are Gaussian randoms so that E[(Wt Ws )2 ] = t s and Var[(Wt Ws )2 ] = 2(t s)2 for every 0 s t. Hence
N N

E
n=1

(Wtn Wtn1 )2 =
n=1

(tn tn1 ) = tN t0

and, by the independence of the increments of Brownian motion,


N N

Var
n=1

(Wtn Wtn1 )

=2
n=1

(tn tn1 )2 .

Letting tn = nt/N , Chebychevs inequality implies


N

P
n=1

(Wnt/N W(n1)t/N )2 t >

2t2 0. N 2

Remark. For comparison, consider a continuously dierentiable function f : [0, 1] R. Recall that for such functions there exists a constant C > 0 such that |f (t) f (s)| C |t s| for all s, t [0, 1]. Hence we have
N N

[f (n/N ) f ((n 1)/N )]2


n=1 n=1 2

C 2 /N 2

= C /N 0 Since the quadratic variation of a Brownian motion is positive, the typical Brownian sample path is not a continuously dierentiable function of time. With the notion of quadratic variation, we can rewrite It os formula once more in a particularly easy to remember form: 1 df (Xt ) = f (Xt )dXt + f (Xt )d X 2 In this form, the idea of the proof becomes clear:
48
t

s formula. Fix a partion of [0, t]. By telescoping a sum and Idea of proof of Ito consider the following second order Taylor approximation we have the following:
N

f (Xt ) f (X0 ) =
n=1 N

f (Xtn ) f (Xtn1 ) 1 f (Xtn1 )(Xtn Xtn1 ) + f (Xtn1 )(Xtn Xtn1 )2 2 n=1


t t

f (Xs )dXs +
0 0

1 f (Xs )d X s . 2

3.2. The multi-dimensional version. We now introduce the vector version of It os formula. It is basically the same as before, but with worse notation. An n-dimensional It o process (Xt )t0 dened by
t t

Xt = X0 +
0

s dWs +
0

s ds,

interpreted component-wise as Xt = X0 +
0 k=1 (i) (i) t d (i,k) s dWs(k) + 0 t (i) s ds

where (Wt )t0 is a d-dimensional Brownian motion so that W (1) , . . . , W (d) , are independent scalar Brownian motions, and the predictable process (t )t0 is valued in the space of n d matrices, and the predictable process (t )t0 is valued in Rn . We insist that
t 0 n d (i,k) 2 (s ) ds i=1 k=1 t n ( i) |s |ds < 0 i=1

< and

almost surely for all t 0 so that all of the integrals make are dened. The aim of this section is to give a formula for the It o decomposition of f (t, Xt ). Now in the scalar case we needed a notion of quadratic variation (dXt )2 = d X t . In the (i) (j ) multi-dimensional case, we now introduce the notion of quadratic co-variation (dXt )(dXt ) = d X (i) , X (j ) t . Theorem. There exists a continuous process of nite variation X (i) , X (j ) , called the quadratic co-variation of X (i) and X (j ) , such that
N

X (i) , X (j )

= lim
n n=1

(Xnt/N X(n1)t/N )(Xnt/N X(n1)t/N )

(i)

(i)

(j )

(j )

for each t 0, where the limit is in probability, given by


d

d X ,X

(i)

(j )

=
j =1

(i,k) (j,k) s s dt.

49

The following multiplication table might help you remember how to compute quadriatic covariation, where W and W denote independant Brownian motions: (dt)2 = 0 (dt)(dWt ) = 0

(dWt )2 = dt (dWt )(dWt ) = 0 Now we are ready for the statement of the theorem: Theorem (It os formula, multi-dimensional version). Let f : R+ Rn R where (t, x) f (t, x) be continuously dierentiable in the t variable and twice-continuously dierentiable in the x variable. Then f df (t, Xt ) = (t, Xt )dt + t
n

i=1

f 1 (i) (t, Xt ) dXt + xi 2

i=1 j =1

2f (t, Xt ) d X (i) , X (j ) xi xj

4. Girsanovs theorem As we have seen in discrete time, the economic notion of an arbitrage-free market model is tied to the existence of an equivalent measure for which the asset prices, when discounted by a num eraire are martingales. Recall that an equivalent measures is related to a positive random variable via the Radon Nikodym theorem. Indeed, let (, F , P) be our probability space and let Q be equivalent to P. Then, by the RadonNikodym theorem there exists a density dQ Z= dP such that Z > 0 has unit P-expectation. Conversely, if Z > 0 and EP (Z ) = 1, we can dene an equivalent measure Q with density Z . Motivated by above discussion, we aim to understand how martingales arise within the context of the It o stochastic integration theory. Consider the stochastic process (Zt )t0 given by t 1 t 2 Zt = e 2 0 |s | ds+ 0 s dWs where (Wt )t0 is a m-dimensional Brownian motion and (t )t0 is a m-dimensional pret dictable process with 0 |s |2 ds < a.s. for all t 0. This process is clearly positive. Furthermore, notice that by It os formula we have dZt = Zt t dWt so that (Zt )t0 is a local martingale, as it is a stochastic integral with respect to a Brownian motion. Recall that since Z is a positive local martingale, it is automatically a supermartingale. Hence, if E(ZT ) = 1 for some non-random T > 0, then (Zt )0tT is a true martingale. In this case, what happens to the Brownian motion when we change to an equivalent measure with density ZT ?
50

Theorem (CameronMartinGirsanov Theorem). Let (, F , P) be a probability space on which a m-dimensional Brownian motion (Wt )t0 is dened, and let (Ft )t0 be a ltration satisfying the usual conditions. Let Zt = e 2
1 t 0

2 ds+ t 0

s dWs

and suppose (Zt )0tT is a martingale. Dene the equivalent measure Q on (, FT ) by the density process dQ = ZT . dP t )0tT dened by Then the m-dimensional process (W
t

t = Wt W
0

s ds

is a Brownian motion on (, FT , Q). Now, you may be asking yourself: When is the process (Zt )t0 not just a local martingale, but a true martingale? Theorem (Novikovs criterion). If E e+ 2 then E e 2
1 T 0 1 T 0 2 ds

< = 1.

2 ds+ T 0

s dWs

5. A martingale representation theorem In this section we will see that all continuous martingales are essentially stochastic integrals with respect to Brownian motion. This will have applications to our continuous-time nancial models in the next chapter. Theorem (It os Martingale Representation Theorem). Let (, F , P) be a probability space on which a m-dimensional Brownian motion W = (Wt )t0 is dened, and let the ltration (Ft )t0 be the ltration generated by W . Let X = (Xt )t0 be a continuous local martingale. Then there exists a unique predictable t m-dimensional process (t )t>0 such that 0 s 2 ds < almost surely for all t 0 and
t

Xt = X 0 +
0

s dWs .
t 0

Furthermore, if Xt > 0 for all t 0 then there exists a predictable such that and T 1 T 2 Xt = X0 e 2 0 s ds+ 0 s dWs Proof of the second claim. Assuming that dXt = t dWt and the positivity of X , apply It os formula to get t t 2 d log Xt = dWt dt Xt 2Xt2
51

s 2 ds <

so the conclusion follows with t = t /Xt . We conclude this section with a useful result. It is not directly applicable to nance, but simplies several arguments. Theorem (L evys Characterisation of Brownian Motion). Let (Xt )t0 be a continuous m-dimensional local martingale such that X (i) , X (j )
t

t 0

if i = j if i = j.

Then (Xt )t0 is a standard m-dimensional Brownian motion. Proof. Fix a constant vector Rm and let i = 1. Consider Mt = eiXt +|| By It os formula, dMt = Mt ||2 1 i dXt + dt Mt 2 2
m m
2 t/2

(i) (j ) d X (i) , X (j )
i=1 j =1

= iMt dXt and so (Mt )t0 is a continuous local martingale, as it is the stochastic integral with re2 spect to a continuous local martingale. On the other hand, since |Mt | = e|| t/2 and hence E(sups[0,t] |Ms |) < the process (Mt )t0 is a true martingale. Thus for all 0 s t we have E(Mt |Fs ) = Ms which implies 2 E(ei (Xt Xs ) |Fs ) = e|| (ts)/2 . The above equation implies that the increment Xt Xs has the Nm (0, (t s)I ) distribution and is independent of Fs .

52

CHAPTER 4

Arbitrage theory for continuous-time models


We now return to the main theme of these lecture, models of nancial markets. We now have the tools to discuss the continuous time case, at least when the asset prices are continuous processes. 1. The set-up As before, our market model consists of a n-dimensional stochastic processes P = (Pt1 , . . . , Ptn )t0 representing the asset prices. This process will be dened on a probability space (, F , P) with a ltration F = (Ft )t0 satisfying the usual conditions. Furthermore, we will make the following assumption to make use of the It o calculus developed in the previous chapter. Assumption. The stochastic process P is assumed to be is an It o process adapted to F. As before, the investors controls consist of the n-dimensional process H = (Ht1 , . . . , Htn )t0 where Hti and corresponds to the number of shares of asset i held at time t. We will assume that H is self-nancing in the continuous time sense: Definition. A (n+1)-dimensional predictable process (H, c) such that H is P -integrable1is a self-nancing investment/consumption strategy i d(Ht Pt ) = Ht dPt ct dt The wealth associated with a self-nancing trading strategy H is denoted
t t

Xt (H ) = Ht Pt = X0 (H ) +
0

Hs dPs
0

cs ds.

Now, following the programme laid out in discrete time, we need to dene an arbitrage: Definition (WARNING: THIS DEFINITION IS INCOMPLETE). An absolute arbitrage is a trading strategy H such that there is a non-random time T such that X0 (H ) = 0 = XT (H ) a.s. and P
0 T

cs > 0

> 0.

The reason for the above warning is spelled out below.


1...this

means the stochastic integral


t

t 0

Hs dPs is well-dened, i.e.


s

i j Hs Hs d P i , P j i,j 0

< a.s. for all t 0

53

2. Admissible strategies In order to make sense of the stochastic integral dening the wealth, we need to impose a technical integrablity condition which holds automatically for continuous processes. However, in moving from discrete to continuous time, we have to be careful. We will now see that this condition isnt strong enough to make our economic analysis interesting. Example. Consider a discrete-time market model with two assets P = (1, S ) where S is a simple symmetric random walk: St = 1 + . . . + t where the random variables 1 , 2 , . . . are independent and P(t = 1) = P(t = 1) = 1/2. Obviously this market has no arbitrage as P is a martingale. Nevertheless, lets explore how to approximate an arbitrage in some sense. Given a predictable process , let
t1

t =
s=1

(s+1 s )Ss

Then the pair (, ) denes a self-nancing pure investment strategy with associated wealth process
t

Xt =
s=1

s (Ss Ss1 ).

In particular, X0 = 0. A simple strategy that resembles an arbitrage is constructed as follows: rst dene the stopping time = inf {t 0 : St > 0}. and consider the strategy with t = 1{t} Note that the associated wealth process is Xt = St . Since < a.s., the conclusion is that if you are willing to wait a while, investing in this strategy will result in an almost sure gain X = 1. But the amount of time you have to wait is very long: one can show that E( ) = +. One can improve upon the above idea by taking larger and larger bets, eectively speeding up the clock. Indeed, dene the stopping time = inf {t 0 : t = 1} and consider the strategy t = 2t1 1{t } . In this case, the associated wealth process is Xt = 1 2t 1{t 1} . This is the classical martingale or doubling strategy. Note that E( ) = 2, so an investor following this strategy does not have to wait very long on average to realise the gain X = 1.
54

But although is small on average, it is not bounded, and hence this strategy does not qualify as an arbitrage. Example. A technical problem with continuous time models is that events that will happen eventually can be made to happen in bounded time by speeding up the clock. Consider the market with prices P = (1, W ) where W is a Brownian motion. We will now construct a pure investment trading strategy such that the corresponding wealth process has X0 = 0 and XT = K a.s. where T > 0 is an arbitrary (non-random) time horizon and the constant K > 0 is also arbitrary. More concretely, by writing H = (, ), we will nd a real-valued adapted process T 2 ds < almost surely, but (t )t[0,1] such that 0 s
T

s dWs = K a.s
0

Let f : [0, T ] [0, ] be a strictly increasing, continuous function such that f (0) = 0 and f (T ) = . In particular we assume that f (t) > 0 for t and there exists an inverse function f 1 : [0, ] [0, T ] such that f f 1 (u) = u. For instance, to be explicit, we may t uT take f (t) = T and f 1 (u) = 1+ . t u Now dene a local martingale (Zu )u0 by
f 1 (u)

Zu =
0

(f (s))1/2 dWs

Note that the quadratic variation is


f 1 (u)

=
0

f (s)ds

= f (f 1 (u)) f (0) = u so by L evys characterisation (Zu )u0 is a Brownian motion. Dene the stopping time by = inf {u 0, Zu = K }. Since (Zu )u0 is a Brownian motion, we have < almost surely since supu0 Zu = almost surely. Now let t = (f (t))1/2 1{tf 1 ( )} and Xt =
0 t

s dWs

for 0 t T . Note that since


T 2 s ds 0 f 1 ( )

=
0

f (s)ds = <

the stochastic integral is well-dened. The strange fact is that (Xt )t[0,T ] is a local martingale with X0 = 0, but XT = Z = K almost surely.
55

We see that integrand (s )s[0,T ] roughly corresponds to an gambler starting at noon with 0, employing a doubling strategy (with borrowed money) at a quicker and quicker pace, until nally he gains K almost surely before the clock strikes one oclock. This situation is rather unrealistic, particularly since the gambler must go arbitrarily far into debt in order to secure the K winning. Indeed, if such strategies were a good model for investor behaviour, we all could be much richer by just spending some time trading over the internet. The above discussion shows that the integrability necessary to dene the stochastic integral is not really sucient for our needs. At this stage, there are several reasonable options. In this course we will insist that the investor cannot go arbitrarily far into debt. Definition. A trading strategy H is L-admissible i the associated wealth process X (H ) is such that Xt (H ) Lt for all t 0 almost surely where L is a given continuous non-negative adapted processes. Remark. In most of the course, we will take L = 0. However, for our arbitrage theory, it is convenient to allow some exibility with the choice of credit limit, otherwise there would not be enough admissible strategies to have an interesting theorem. Note that the doubling strategy is not admissible, since the investor now has only a nite credit line. However, a suicide strategy, that is, a doubling strategy in which the object is to lose a xed amount K by time T , is admissible. 3. Absolute arbitrage and state price densities To see that our restriction to admissible strategies is reasonable, lets now consider continuous-time arbitrage theory. Definition. An admissible investment consumption strategy (H, c) is called an absolute arbitrage i there is a non-random time T such that X0 (H ) = 0 = XT (H ) a.s. and
T

P
0

cs ds > 0

> 0.

As in the discrete-time theory, we now introduce state price densities: Definition. A state price density is a positive It o process Y such that Y P = (Yt Pt )t0 is an n-dimensional local martingale. Our continuous-time version of the rst fundamental theorem follows. Unfortunately, to get a clean statement of this result we need to up the technical ante. Theorem. If there exists a state price density Y such that Y L is locally of class D, then there are no L-admissible absolute arbitrages.
56

We now recall some denitions and results concerning uniform integrability: ***** Definition. A family of random variables Z is called uniformly integrable i
k Z Z

lim sup E(|Z |1{|Z |>k} ) = 0.

Note that this generalises the usual notion of integrability,since E(|Z |) < lim E(|Z |1{|Z |>k} ) = 0
k

by the dominated convergence theorem. Here are some sucient conditions. If there is an integrable random variable such that |Z | a.s. for all Z Z , then Z is uniformly integrable. If supZ Z E(|Z |p ) < for some p > 1, then Z is uniformly integrable. If (Mt )0tT is a martingale on the nite non-random time horizon T , then the family of random variables (Mt )0tT is uniformly integrable. The reason uniform integrability is useful for probabilist is this theorem: Theorem. Let Z1 , Z2 , . . . , Z be a family of integrable random variables. The following statements are equivalent: (1) Zn Z in L1 , and (2) (Zn )n1 is uniformly integrable and Zn Z in probability. In particular, we will use this theorem to justify the claim that if Zn Z a.s. and (Zn )n1 is uniformly integrable then E(Zn ) E(Z ). In the context of a ltered probability space, there is a useful notion that is chosen to make the proofs of our theorems work. Definition. A continuous adapted process Z = (Zt )t0 is of class D if the family of random variables {Z : a nite stopping time } is uniformly integrable. The process is locally of class D (or, of class DL) if {Z t : a stopping time } is uniformly integrable for each t 0. Class D is named in honour of the American probabilist Joseph Doob. Here are some sucient conditions. If E(sup0st |Zs |) < for each t 0, then Z is locally of class D. If Z is a martingale, then Z is locally of class D. Now lets return to the proof of the fundamental theorem. ***** The proof of this fact is based on an important lemma:
57

Lemma. Suppose (H, c) is a self-nancing investment/consumption strategy and let


t t

Xt = Ht Pt = X0 +
0

Hs dPs
0

cs ds.

Then d(Xt Yt ) = Ht d(Yt Pt ) Yt ct dt for any It o process Y . Proof of lemma. . First note Yt dXt = Yt (Ht dPt ct dt) and Xt dYt = Ht Pt dYt . Finally, note that
n

d X, Y

= (Ht dPt )(dYt ) =


i=1

Hti d P i , Y t .

Putting this together with It os formula yields d(Xt Yt ) = Yt dPt + Xt dYt + d X, Y =


i t

Hti (Yt dPti + Pti dYt + Y, P i t ) Yt ct dt

= Ht d(Yt Pt ) Yt ct dt as claimed. Proof that existence of a state price density implies no arbitrage. First note that for any self-nancing investment consumption strategy (H, c) we have the identity
t t

Hs d(Ys Ps ) = Yt Xt Y0 X0 +
0 0

Ys cs ds

from the lemma. In particular, if the strategy is L-admissible then


t

Hs d(Ys Ps ) + Lt Yt Y0 X0 for all t 0.


0

Now by assumption Y P is a local martingale, and therefore so is the stochastic integral H d(Y P ). Hence there exists a sequence of stopping times N so that the stopped stochastic integral is a true martingale. Supposing that Y L is of class D locally, we have
T N T

E
0

Hs d(Ys Ps ) + LT YT

=E

lim

Hs d(Ys Ps ) + LN T YN T
0 N T

lim inf E
N 0 N

Hs d(Ys Ps ) + LN YN

Fatous lemma

= lim inf E (LN YN )

stopped local martingale is a martingale

= E(LT YT ) uniform integrability


58

Now suppose (H, c) is such that X0 = XT = 0. Then


T T

E
0

Ys cs ds

=E
0

Hs d(Ys Ps )

0 which implies that ct ( ) = 0 for almost every (t, ). Hence there is no arbitrage. 4. Relative arbitrage and equivalent local martingale measures In this section we impose a little more structure to the model by insisting one of the assets is a num eraire. That is, we will decompose the n = d + 1 dimensional It o procees as P = (N, S ) where N is a positive scalar process (the price of the num eraire) and a ddimensional process S modelling the price of d other assets. As in the discrete time case, we limit ourselves to pure investment strategies in our discussion of arbitrage theory. Definition. A relative arbitrage is a pure investment strategy with wealth process X such that there is a non-random time T > 0 satisying XT X0 almost surely and P NT N0 XT X0 > NT N0 > 0.

Remark. Note that an absolute arbitrage compares the agents wealth to 0, while an relative arbitrage compares the wealth to some xed num eraire. Definition. An equivalent (local) martingale measure (relative to the num eraire with price N ) is a probability measure Q equivalent to P such that S/N is a local martingale. Another version of the rst fundamental theorem in continuous time is the following: Theorem. Let Q be an equivalent local martingale measure, and suppose L/N is locally in Q-class D. Then there are no L-admissible relative arbitrages. To prove this, we follow the same steps as before. Proof. Let H = (, ) be an agents pure investment strategy. Note that the wealth satises two equations: Xt = t Nt + t St dXt = t dNt + t dSt . Solving for t = (Xt t St )/Nt in the rst equation and plugging into the second yields (by It os formula) St Xt d = t d . Nt Nt Since by denition the discounted price S/N is a Q-local martingale, so is the discounted wealth X/N . If the strategy is L-admissible then (X + L)/N is non-negative. As before we can conclude as before that XT X0 EQ NT N0
59

by Fatous lemma since EQ (LN T /NN T ) EQ (LT /NT ) by the assumed uniform integrability. Now if P(XT /NT X0 /N0 ) = 1 then Q(XT /NT X0 /N0 ) = 1 by the equivalence of P and Q. But by the pigeonhole principle, we have Q(XT /NT = X0 /N0 ) = 1. Again by the equivalence of P and Q we have P(XT /NT = X0 /N0 ) = 1 and there is no arbitrage. Remark. Note that the above theorem doesnt say that no relative arbitrage implies the existence of an equivalent local martingale measure. A weaker version notion of relative arbitrage, called free-lunch-with-vanishing-risk, is needed to have the converse implication. See the recent book of Delbaen and Schachermayer The Mathematics of Arbitrage for an account of the modern theory. In discrete time, the notions of absolute arbitrage and relative arbitrage are essentially equivalent. In continuous time, it is still the case that if H is an absolute pure-investment arbitrage, then it is also an arbitrage relative to the num eraire. However, we will soon nd examples of the surprising fact that there exists continuous time markets that have relative arbitrage but no absolute arbitrage. The point of all of this is to warn you to be careful when making arbitrage arguments in continuous time, since reasonable people can disagree on what kind of strategies should be called arbitrages. 5. The structure of state price densities In this section we will parametrise a fairly general It o market with n = d + 1 assets. All assets in this market are num eraires, and we use the notation P = (B, S ). We will assume the dynamics of the prices are given by the following equations dBt = Bt rt dt
m

dSti

Sti

i t dt

+
j =1

ij t dWtj

for i = 1, . . . , d

where the processes r, i , ij are predictable and suitably integrable, and the W j are independent Brownian motions. The rst asset can be thought of as a bank account, and the random variable rt is the spot interest rate at time t. The (random) ordinary dierential equation can be solved: Bt = B0 e
t 0 rs ds

The d assets can be thought of as risky stocks. The random variable i t is interpreted as the ij 2 1/2 mean instantaneous return of asset i, while the spot volatility is ( j (t ) ) . Note that It os formula yields t i t j ij 2 ij 1 i 0 Sti = S0 e [s 2 j (s ) ]ds+ 0 j s dWs . We will use the notation 1 11 1m t t t . . .. . . t = . . . and t = . . . d1 dm d t t t
60

for the d 1 vector of means and d m matrix of volatilities, respectively. With this more explicit parametrisation, we can describe the structure of state price densities: Theorem. Let be a predictable m-dimensional process such that 0 s 2 ds < a.s. for all t 0 and that t t = t rt 1 for almost all (t, ) where 1 = (1, , 1) is the d 1 vector with the constant 1 in each component. Let t t 2 Yt = Y0 e 0 (rs + s /2)ds 0 s dWs for a constant Y0 > 0 or in equivalent dierential form dYt = Yt (rt dt t dWt ). Then Y is a state price density. Furthermore, if the ltration is generated by the m-dimensional Brownian motion W , all state price densities have this form. Remark. The m-dimensional random vector t appearing the theorem is a generalisation of the Sharpe ratio. The process = (t )t0 is often called the market price of risk, for the state price density, since it measures in some sense the excess return of the stocks per unit of volatility. Proof. We need to show that Y B and Y S are local martingales. Note that by It os formula d(Yt Bt ) = Yt Bt t dWt so Y B is a local martingale since it is the stochastic integral with respect to a Brownian motion W . Also, by It os formula
i i. d(Yt Sti ) = Yt St [rt + i t (t t ) ] + Yt St (t t ) dWt i. = Yt St (t t ) dWt t

where we have used the identity t t = t rt 1 to cancel the dt term. Conversely, if the ltration is generated by the Brownian motion, the martingale representation theorem says that all positive local martingales M are of the form Mt = M0 e 2
1 t 0

2 ds t 0

s dWs

for some predictable , or in dierential form dMt = Mt t dWt . Hence, if Y B = M is a positive local martingale then dYt = Yt (rt dt + t dWt ) by It os formula. Furthermore, if Y S is a local martingale, then It os formula shows that in order to cancel the drift we must have the identity t t = t rt 1.
61

In the discrete time case, corresponding to a state price density Y , there is an equivalent local martingale measure (with respect to the bank account) with the RadonNikodym density BT YT dQ = . dP B0 Y0 for some time horizon T > 0. Recall, that the discrete time case, the denition of state price density implies that BY is a true martingale, while the denition of state price density that we have adopted in continuous time only implies that BY is a local martingale. Therefore, we have to check that BY is a true martingale in order to claim the density process above denes an equivalent probability measure. Theorem. Suppose that is a predictable process such that t t = t rt 1. If Mt = e 2 0 s ds 0 s dWs is a true martingale, then the measure Q dened by dQ = MT dP is an equivalent martingale measure (with respect to the bank account). In particular, the dynamics of the stock prices are given by dSti = Sti t = Wt + where W
t 0
1 t 2 t

rt dt +
j

ij j t dW t

s ds is a Q-Brownian motion. Sti Bt Sti Bt Sti Bt Sti Bt


j j ij j t dWt .

Proof. Note that by It os formula d = = = [i t rt ]dt +


j ij j t (j t dt + dWt ) ij t dWtj

t = Wt + t s ds is a Q-Brownian motion. Therefore, Now Girsanovs theorem says that W 0 each S i /B is the stochastic integral with respect to a Q-Brownian motion, and hence is a Q-local martingale as claimed.

62

CHAPTER 5

Hedging contingent claims in continuous time models


As before, given a market model P we can introduce a contingent claim. Recall that a European contingent claim maturing at a time T > 0 is modelled as random variable that is FT -measurable. We shall assume that there is at least one state price density, so that, in particular, there are no absolute arbitrages. 1. An existential replication result We will assume the market model P = (B, S ) has dynamics dBt = Bt rt dt
m

dSti

Sti

i t dt

+
j =1

ij t dWtj

for i = 1, . . . , d

as before, or in vector notation, these equations can be written as dSt = diag(St )(t dt + t dWt ) 0 .. . 0 0 s2 diag(s1 , . . . , sd ) = . . . . . .. .. . . . . 0 0 sd We will work in the ltration generated by W , so that all state price densities Y are of the form dYt = Yt (rt dt t dWt ). where t t = t rt 1. The following will serve as a version of the second fundamental theorem of asset pricing in continuous time. s1 0 Theorem. Suppose m = d and that the d d matrix t is invertible for all (t, ), so that in particular, there is a unique (up to scaling) state price density Y of the form dYt = Yt (rt dt t dWt ). where
1 t = t (t rt 1). Let T be non-negative, FT -measurable and such that T YT is integrable. Then there exists a 0-admissible strategy H with initial cost

where

X0 (H ) = E(YT T )/Y0
63

which replicates the European claim with payout T . is an L-admissible strategy replicating the Furthermore, if LY is locally of class D and H claim, then ) X0 (H ). X0 (H Remark. That is to say, the quantity E(YT T )/Y0 is the minimal amount of money needed to replicate the claim among reasonable trading strategies. Of course, if you could employ a doubling strategy, you could replicate the claim with strictly less money. Proof. Let Mt = E(YT T |Ft ). Then M is a martingale, and since the ltration is generated by the Brownian motion W the martingale representation theorem tells us that there exists a d-dimensional predictable process such that dMt = t dWt . By It os formula we have Mt (Mt t + t ) Mt rt dt + (dWt + t dt). d = Yt Yt Yt Now let (Mt t + t ) 1 Mt t = diag(St )1 (t )1 and t = t S t Yt Bt Yt Note that t Bt + t St = Mt /Yt and that (after some tedious algebra) t dBt + t dSt = d Mt Yt .

This means H = (, ) is a self-nancing strategy and Mt for all 0 t T. Xt ( H ) = Yt It is 0-admissible since Mt 0 and satises XT = MT /YT = T and X0 = E(T YT )/Y0 as desired. be an L-admissible replicating strategy for the payout T , so that XT (H ) = T . Let H ) is a supermartingale by the assumption that Y L is of class D, we have Since Y X (H ) E(YT T ) = E(YT XT (H ). Y0 X0 (H If we consider the equation t t = t rt 1 where t is an d m matrix, one expects from the rules of linear algebra for there to be no solution if m < d, exactly one solution if m = d, and many solutions if m > d. Of course, this is not a theorem, just a rule of thumb. Financially, the rule of thumb becomes: m < d m = d m > d The market has arbitrage. The market has no arbitrage and is complete. The market has no arbitrage and is incomplete.
64

2. An example of relative arbitrage As promised, we will now exhibit a market model that does not have absolute arbitrages, but does have relative arbitrages. Suppose P = (1, S ). The rst asset is cash and has constant price. The second asset (the stock) has prices given by the positive local martingale S with dynamics dSt = St t dWt . where t > 0 for all t 0 a.s. and W is a scalar Brownian motion. Recall that positive local martingales are supermartingales. We will assume that S is a strictly local martingale, in the sense that E(ST ) < S0 for for some non-random time horizon T > 0. We will assume that the ltration is generated by the the Brownian motion W . An example of such a process is exhibited in the example sheet. Notice that Y = 1 is a state price density for the model. So long as L is of class D, there are no L-admissible absolute arbitrages by the fundamental theorem of asset pricing. Also, if we take cash as the num eraire, then an equivalent local martingale measure is Q = P. So, again, if L is of class D, there are no L-admissible arbitrages, relative to the cash num eraire. Now, what happens when we take take the stock as the num eraire? Note that by our replication theorem, (by taking the contingent claim T = ST to be the stock itself) there exists a 0-admissible strategy H such that XT (H ) = ST with initial cost X0 (H ) = E(ST ) < S0 . Hence H is an arbitrage (!) relative to the stock since XT ( H ) X0 ( H ) > a.s. ST S0 That is, it is possible to trade in the stock and cash in such a way to outperform the stock. = In discrete time, we could use this relative arbitrage to form an absolute arbitrage H H (0, p) by buying the portfolio and selling a certain number p = E(ST )/S0 of shares of ) = 0 and XT (H ) = ST (1 p) > 0. However, in continuous time, we the stock. Indeed, X0 (H have to be careful with admissibility. Notice that the wealth associated with this proposed ) = Xt (H ) St p is not bounded below by a process in class D. So depending strategy Xt (H on our credit limit, it may not be possible to lock in the arbitrage. 3. The BlackScholes model and formula We will consider the simplest possible model of the type studied introduced above. Consider the case of a market with two assets. We will assume that all coecients are constant, so the price dynamics are given by the pair of equations dBt = Bt r dt dSt = St ( dt + dWt ) for real constants r, , where > 0. We will assume that the ltration is generate by the scalar Brownian motion W . This is often called the BlackScholes model. We are interested in nding the replication cost of a European contingent claim with payout T = g (ST ), where g is a given function which we assume to be non-negative and
65

suitably integrable. We know from before that the unique state price density with /Y0 = 1 is given by 2 Yt = e(r /2)tWt where = ( r)/ . Hence, from our existential result there is a trading strategy H which replicates the payout with 1 Xt (H ) = E[YT g (ST )|Ft ]. Yt This is where we see the advantage of working with equivalent martingale measures rather than state price densities. Indeed, dene the equivalent martingale measure Q by the density dQ 2 = e T /2WT dP t = Wt + t is a and recall that by the CameronMartinGirsanov theorem the process W Q-Brownian motion. The price of the stock can be written explicitly: St = S0 e( and hence Xt (H ) = er(T t) EQ g S0 e(r = er(T t) EQ g St e(r

2 /2)T + W T 2 /2)t+W t

= S0 e(r

2 /2)t+ W

|Ft
T Wt )

2 /2)(T t)+ (W

|Ft ez /2 dz. 2
2

=e

r(T t)

g St e

(r 2 /2)(T t)+ T tz

A famous example is the case of the European call option where the payout function is of the form g (S ) = (S K )+ . In this case, we have the the Nobel-prize-winning BlackScholes formula : log(K/St ) Ct (T, K ) =St + (r/ + /2) T t T t log(K/St ) Ker(T t) + (r/ /2) T t T t where (x) = 1 ey /2 dy is the standard normal distribution function. (You are asked 2 to derive this formula on Example Sheet 3.) We have argued that the martingale representation theorem asserts the existence of replicating strategy H , but unfortunately, it gives us no information about how to compute H . This problem will be tackled in the next section. 4. Markovian markets and the BlackScholes PDE We now have a sucient condition that a contingent claim can be replicated. However, at this stage we can only assert the existence of a replicating strategy for a given claim, but we do not yet know how to actually compute it. This problem is the subject of this section.
66
x
2

The rst step is to pose a model for the asset prices (Bt , St )t0 . A good model should give a reasonable statistical t to the actual market data. Furthermore, a useful model is one in which the prices and hedges of contingent claims can be computed reasonably easily. In this section, we will study models in which the asset prices are Markov processes. These models are useful in the above sense, though there seems to be some controversy over how well they t actual market data. Now suppose that the d + 1 assets have It o dynamics which can be expressed as dBt = Bt r(t, St ) dt dSt = diag(St )((t, St )dt + (t, St )dWt ) where the nonrandom functions r : [0, ) Rd R, : [0, ) Rd Rd and : [0, ) Rd Rdm are given. Notice that this is a special case of the set-up of the last section, as now (with an abuse notation) rt ( ) = r(t, St ( )), t ( ) = (t, St ( )), and t ( ) = (t, St ( )). In this special situation, the asset prices (St )t0 are a d-dimensional Markov process. The next theorem says how to nd a replicating strategy for a contingent claim maturing at time T with payout T = g (ST ) d for some non-random function g : R [0, ). Theorem. Suppose the function V : [0, T ] Rd [0, ) satises the partial dierential equation V + t
d

i=1

1 V rS + i S 2
i

ai,j S i S j
i=1 j =1

2V = rV S i S j

V (T, S ) = g (S ) where a = , and where all functions in the PDE are evaluated at the same point (t, S ) [0, T ) Rd . Then there exists a 0-admissible strategy H such that Xt (H ) = V (t, St ). In particular, this strategy replicates the contingent claim with payout g (ST ). Furthermore, if H = (, ) then the strategy can be calculated as t = grad V (t, St ) = and t = V V ( t, S ) , . . . , (t, St ) . t S 1 S d

V (t, St ) t St . Bt

The above theorem says that if the market model is Markovian, the price (i.e. replication cost) of a claim contingent on the future risky asset prices can be written as a deterministic function V of the current market prices. Furthermore, the pricing function V can be found by solving a certain linear parabolic partial dierential equation1 with terminal data to match
1sometimes

called the FeynmanKac PDE. If r = 0, the PDE reduces to the (backward) Kolmogorov
67

equation.

the payout of the claim. Solving this equation may be dicult to do by hand, but it can usually be done by computer if the dimension d is reasonably small. And most importantly for the banker selling such a contingent claim: the replicating portfolio t can be calculated as the gradient of the pricing function V with respect to the spatial variables, evaluated at time t and current price St . Proof. By It os formula we have V dV (t, St ) = dt + t = 1 V + t 2 V 1 dSti + i S 2 2V d S i, S j i j S S dt +
i t

i,j

i,j

2V S i S j aij S i S j V S i dt + +
i

V dS i S i t

=r V
i

Si

V dSti i S

where we have used the assumption that V solves a certain PDE to go from the second to third line above. Now letting and be as in the statement of the theorem we have that V (t, St ) = t Bt + t St dV (t, St ) = t dBt + t dSt . Hence H = (, ) is a self-nancing strategy with associated wealth process Xt (H ) = V (t, St ) as claimed. It is 0-admissible since V 0 by assumption. We have seen that there are two distinct ways to nd replication costs for certain contingent claims: by computing expectations or by solving a PDE. Furthermore, the PDE method also gives the replicating portfolio. But how do you solve the PDE? In many cases, the easiest way to solve the PDE is to compute the expectations. This is illustrated by the BlackScholes model: Example (BlackScholes continued). Lets return to the BlackScholes model dBt = Bt rdt dSt = St (dt + dWt ) with constant coecients r, , . If we would like to replicate a claim with payout g (ST ), the previous theorem says we should solve the BlackScholes PDE V V 1 2V + rS + 2 S 2 2 = rV t S 2 S V (T, S ) = g (S ) How can we solve this PDE? Yes, it is straightward, if tedious, to verify that ez /2 dz. V (t, S ) = e g Se 2 is a solution. Of course, the right-hand side is simply the expected discounted payout of the claim under the unique equivalent martingale measure.
r(T t) (r 2 /2)(T t)+ T tz
2

68

Because of its central importance in the theory, the quantity V is given a special name, S the delta, of the claim. Note that if g is nice enough that dierentiation can be taken under the integral, we have V = er(T t) S where e 22 f (u; ) = 2 2
(ua )2

g (Seu )eu f (u; T t)du

and a = r 2 /2. In particular, if the payout function g is increasing then the delta is non-negative. 2V Another important quantity , called the gamma, measures the sensitivity of the delta S 2 to changes in the price of the underlying. As before, if g is nice enough, we have 2V = er(T t) S 2

g (Seu )e2u f (u; T t)du

so that if the payout function g is convex, then the gamma is non-negative, i.e. the delta is increasing in S . Remark. In practical nance, there is a whole list if quantities, the delta, theta, gamma, vega, vanna, etc. which describe the sensitivities of a pricing formula with respect to some of its parameters. These quantities are collectively knowns as the greeks. Now, lets specialise to the case of the call option where g (S ) = (S K )+ . From last section we have log(K/S ) V (t, S ) =S + (r/ + /2) T t T t log(K/S ) Ker(T t) + (r/ /2) T t . T t The delta, i.e. the replicating portfolio, in this case is (by a miracle of algebra) V log(K/S ) (t, S ) = + (r/ + /2) T t . S T t Note that an agent attempting to replicate a call option using the BlackScholes theory will always hold a fraction of shares of the underlying stock between 0 and 1. Also note that since the sensitivity of the portfolio to the price of the underlying, is given by the formula 2V 1 log(K/S ) (t, S ) = + (r/ + /2) T t 2 S S T t T t where (x) = 1 ex /2 . Since the gamma is always positive, the hedger will buy more 2 shares of the underlying if the price goes up.
69
2

5. BlackScholes volatility What made the BlackScholes formula so popular after its publication in 1973 is the fact that the right-hand-side depends only on six quantities: the current calendar time t, the options maturity time T , the options strike K , the spot interest rate r, the underlying stocks price St at time t, and a volatility parameter . Of these six numbers, only the volatility parameter is neither specied by the option contract nor quoted in the market. To use the BlackScholes formula to nd the price of real call options, one must rst estimate the volatility . 5.1. Estimation: statistics. In the BlackScholes model, the drift and volitility are not directly observable. Nevertheless, they can be estimated by appealing to standard statistical theory. Suppose that we have observed the stock price (St )T t0 . If we sample at S are independent times ti = (i/n 1)T , we see that the n random variables Yi = log St ti i1 with distribution Yi = ( 2 /2)(ti ti1 ) + (Wti Wti1 ) N (aT /n, 2 T /n) where a = 2 /2. The maximum likelihood estimator of a is 1 a = T and of 2 is 2 = 1 T
n n

Yi
i=1

(Yi a T /n)2 .
i=1

Notice that this estimator a can be rewritten as 1 a = log(S0 /ST ), T and hence does not depend on n! That is to say, there is no advantage going to ever higher and higher frequency data to estimate the drift . Fortunately, a careful reading of the previous section shows that the drift parameter is not needed to nd either replication cost or the replicating strategy. This is good news for the BlackScholes theory.2 On the other hand, the variance of 2 is 2 4 /n 0 as n . Hence, there is some hope of accurately estimating the volatility parameter by sampling the historical stock prices regularly enough. If one was to truly believe that the stock price was a geometric Brownian motion, that 2 into the BlackScholes is, of the form St = S0 eat+Wt , then one could insert the value formula to obtain the price of a call option. Notice that we have done the statistics under the objective measure P, not the equivalent martingale measure Q.
this is very bad news for optimal investment. For instance, consider the problem of maximising E log(XT ) over all 0-admissible trading strategies. It turns out that the optimal fraction of wealth to hold in the stock is given by t St r = . Xt 2 However, this formula is useless unless the parameters on the right-hand side can be estimated accurately.
70
2However,

5.2. Calibration: implied volatility. A completely dierent approach to nd the volatility parameter is to observe the prices of contingent claims from the market, and then try to work out which to put into the BlackScholes formula to get the right price. The BlackScholes formula says that in the context of a BlackScholes model the call price is given by (*) Ct (T, K ) = C BS (t, T, K, St , r, )

for an explicit function C BS written out the previous section. But in reality we do not know but can observe the call prices. Therefore, rather than compute the call price from the parameters, we turn the story around by dening the implied volatility of the option to be unique number such that equation (*) holds. We denote by t (T, K ) the implied volatility of at time t of an option with maturity T and strike K . If the market was still pricing call options by BlackScholes formula, then there would exist one parameter such that t (T, K ) = for all 0 t < T and K > 0. However, in realworld markets, is is usually the case that the implied volatility surface (T, K ) t (T, K ) is not at. Indeed, for xed T , the graph of the function K t (T, K ) often resembles a convex parabola3 at least for strikes K close to the money, i.e. such that Ker(T t) /St 1. That is why practictioner refer to the function K t (T, K ) as the implied volatility smile or smirk. One could either conclude BlackScholes model is the true model of the stock price and that the market is mispricing options, or that the BlackScholes model does not quite match reality. The second approach is more prudent. Then, why even consider implied volatility? As Rebonato famously put it: Implied volatility is the wrong number to put into wrong formula to obtain the correct price. However, thanks to the enormous inuence of the BlackScholes theory, the implied volatility is now used as a common language to quote option prices. 5.3. Robustness of BlackScholes. As argued above, since real markets tend to exhibit implied volatility smiles, the BlackScholes model cannot be considered an adequate description of how stock prices uctuate. However, it should be considered an approximation of reality, and we will now do a calculation to see how to quantify how good this approximation is. Suppose a banker wants to sell a contingent claim with payout T = g (ST ). The banker believes that the underlying stocks prices are given by the BlackScholes model, so that the initial price of the claim is given by V (0, S0 , ) where

V (t, S, ) = e

r(T t)

g (S0 e

(r 2 /2)(T t)+ T tz

ez /2 ) dz, 2

for some to be determined. Now, the claim is already traded on the market with initial price 0 , so the banker chooses a = such that V (0, S0 , ) = 0 , i.e. is the initial implied volatility for the claim.
3...but

be careful: for large K , the graph can grow no faster than


71

2 log K/(T t). See example sheet

4.

Now, the banker wants to hedge away the liability associated with the payout of the claim, so again believing the BlackScholes theory, he puts the initial wealth of X0 = 0 in his account and holds a portfolio of V (t, St , ) S shares of the stock at all times. His wealth then evolves as t = (*) dXt = r(Xt t St )dt + t dSt .

The banker knows that according to the BlackScholes theory his strategy should replicate the claim XT = g (ST ) a.s. Suppose that the true dynamics of the market are given by dBt = Bt rdt dSt = St (dt + t dWt ) where r and are the same constants as before, but now (t )t0 is some predictable process. How big is the hedging error XT g (ST )? First note that V solves the BlackScholes PDE: V V 1 2 2 2V + rS + S = rV. t S 2 S 2 V (T, S ) = g (S ). Now note that by It os formula and the BlackScholes PDE V 1 2V V dt + dSt + dS t t S 2 S 2 1 2V 2 = rV dt + t (dSt rSt dt) + St2 (t 2 ) 2 dt 2 S Subtracting this equation from equation (*) and solving yields dV (t, St , ) = 1 2
T 2 er(T t) ( 2 t )St2 0

2V (t, St , )dt = XT V (T, ST , ) erT (X0 V (0, S0 , )) S 2 = XT g (ST )

since X0 = 0 by assumption and 0 = V (0, S0 , ) by the denition of . The above formula show that the naive BlackScholes hedger does reasonably well in a world where the implied volatility is close to the actual spot volatility. For many claims, 2V is positive. Therefore, the naive hedgers strategy may such as call options, the gamma S 2 fall short if the implied volatility is smaller than the realised spot volatility. 6. Local volatility models In the previous section we have considered the BlackScholes modela two asset market model in which the risky asset price is a geometric Brownian motion. The BlackScholes formula gives an explicit representation of the prices Ct (T, K ) of call options in this model in terms of the calendar time t, the current stock price St , spot interest rate r, the option maturity T and strike K , and a volatility parameter .
72

However, since the implied volatility surface t (T, K ) of real-world option prices is usually not at, practitioners and researchers have proposed various generalisations of the Black Scholes model to better match the observed implied volatility surface. We now consider another Markovian model which can match a given implied volatility surface exactly. We consider a model given by dBt = Bt r dt dSt = St ( dt + (t, St )dWt ). That is, the idea is replace the constant volatility parameter in BlackScholes model with a local volatility function : [0, ) (0, ) (0, ). We will assume that is smooth and bounded from below and above. As always, let Q be the equivalent martingale measures with density T 1 T 2 dQ = e 2 0 s ds 0 s dWs dP t = dWt t dt denes a where t = ( r)/ (t, St ). Recall that by Girsanovs theorem dW Q-Brownian motion. The next theorem in the present context is usually attributed to Dupires 1994 paper. Theorem. Suppose that C0 (T, K ) = EQ [erT (ST K )+ ] Then C0 C0 (T, K )2 2 2 C0 (T, K ) + rK (T, K ) = K (T, K ). T K 2 K 2 Remark. We have already seen a PDE for the replication cost of options in Markovian models. In that PDE, the solution V (t, St ) was the time-t value of a replication strategy for the given claim, and the derivatives were respect to the calendar time t and the current price of the underlying asset St . In contrast, Dupires PDE is for the initial replication cost of a call option, and the derivatives are with respect to the maturity date T and the strike K .

Remark. The point of the above theorem is this: Suppose you believe that the stock price is generated by a local volatility model, but you do not know what the local volatility function is. If you can observe todays call price surface {C0 (T, K ) : T > 0, K > 0} then you can solve for the local volatility in Dupires PDE to arrive at Dupires formula (T, K ) =
0 0 2[ C (T, K ) + rK C (T, K )] T K

1/2

C0 K2 (T, K ) K 2

Furthermore, assuming Dupires PDE has a unique solution (it will if is smooth and bounded as assumed) then we have found a model that can reproduce the observed call prices. Of course, plugging the call surface C0 (T, K ) = C BS (t = 0, T, K, S0 , r, 0 ) into Dupires formula yields = 0 , 2C 0 K2 ( T, K ) 2 K as it should. In general, however, the local volatility surface need not be at.
73
0 0 2[ C (T, K ) + rK C (T, K )] T K

1/2

Before we sketch the proof, we will need a simple result observed by Breeden and Litzenberger in 1978. The proof just involves calculus, so there is no need to spell it out. Lemma. Suppose the random variable ST has a continuous density with respect to Lebesgue measure; that is, there exists a continuous function fST : [0, ) [0, ) such that
x

Q(St x) =
0

fSt (y ) dy.

Then C0 (T, K ) = e and


rT 0

fST (y )(y K ) dy = e

rT K

fST (y )(y K )dy.

C0 rT fST (y ) dy (T, K ) = e K K 2 C0 (T, K ) = erT fST (K ) K 2 Sketch of proof of Dupires formula. To outline the argument, we proceed formally T

(ST K )+ = (S0 K )+ +
0 T

1{St K } dSt +

1 2 1 2

K (St )d S
0

= ( S0 K )+ +
0 T

1{St K } St r + K (St )St2 (t, St )2 dt

+
0

t 1{St K } St (t, St )dW

where we have appealed to It os formula4 with g (x) = (x K )+ , g (x) = 1[K,) (x), and g (x) = K (x), the Dirac delta function. Now, by the assumption of smoothness and the bounds on the volatility function, the Q-law of the random variable ST has a density function fST . Computing expected values of both sides
T

(1)

erT C0 (T, K ) = (S0 K )+ +


0 K

fSt (y )y r dy dt +

1 2

fSt (K )K 2 (t, K )2 dt
0

and then dierentiating both sides with respect to T yields erT C0 (T, K ) + rC0 (T, K ) T

=
K

1 fST (y )y r dy + fSt (K )K 2 (t, K )2 2

and the result follows from noting fST (y ) y dy =


K 0

fST (y )(y K )+ dy + K
K

fST (y )dy

and applying the BreedenLitzenberger identities.


version of It os formula for non-smooth convex functions, called Tanakas formula, can actually be rigorously stated in terms of a quantity called local time.
74
4A

7. Computing marginal laws We begin this section with some comments on the BreedenLitzenberger formula. First note that the collection of call prices {C0 (T, K ) : K > 0}, determines the Q-law of the random variable ST , even if a density doesnt exist. Indeed, note that lim
0

C0 (T, K + ) C0 (T, K )

= erT Q(ST > K ).

Also, if the put prices are given by the put-call parity formula P0 (T, K ) = C0 (T, K ) S0 + erT K then the put prices also determine the marginal laws of S . Now, if we know the distribution of ST , we can compute the expectation EQ erT g (ST ) for any non-negative function g . Of course, the above quantity is the replication cost of a contingent claim with payout T = g (ST ). Is there a way to deduce this replication cost directly from the prices of calls and puts? The answer is yes. Indeed, suppose g is C 2 and convex. Then, the following formula holds identically
a

g (S ) = g (a) + g (a)(S a) +
0

g (K )(K S ) dK +
a

g (K )(S K )+ dK

for any a > 0. This identity can be veried by integration by parts. By approximating the integral by a Riemann sum g (S ) g (a) ag (a) + g (a)S +
Ki <a

Ki g (Ki )(Ki S )+ +
K i a

Ki g (Ki )(S Ki )+

and we see that the nancial signicance is that the payout g (ST ) of the claim can be approximated by holding an portfolio consisting of a bond with principal value g (a) ag (a), g (a) shares of the stock, Ki g (Ki ) puts of strike Ki < a and Ki g (Ki ) calls of strike Ki a. And by integration, we get
a

E [e

rT

g (ST )] = [g (a)ag (a)]e

rT

+g (a)S0 +
0

g (K )P0 (T, K )dK +


a

g (K )C0 (T, K )dK.

7.1. Call prices from moment generating functions. Since a portfolio of calls and puts on a stock can essentially replicate any European contingent claim, it is important to have models where the call prices can be computed easily. Unfortunately, there are few models where there exists nice, elementary formulae for the call prices. However, there are many models where the moment generating functions can be computed explicitly, and we will now see that given the moment generating function we can compute call prices by integration: Consider a market model (B, S ) where Bt = B0 ert and S/B is a positive Q-martingale. For complex in the vertical strip = { = p + iq : 0 p 1, q R}
75

dene the moment generating function of the log stock price by Mt () = EQ (e log St ). Note that since S/B is a martingale we have for = p + iq , EQ (|e log St |) = EQ (Stp ) EQ (St )p
p = eprt S0 <

by Jensens inequality, so the moment generating function is well-dened. The following result shows how to recover call prices from the moment generating function. Theorem. For any 0 < p < 1 the identity E[erT (ST K )+ ] = S0 holds. Essentially, we are inverting the moment generating function via a complex integral. Variants of this procedure are often called a Bromwich, Fourier or Mellin transform. To prove this formula, we begin with a lemma: Lemma. For any 0 < p < 1 the identity 1 2 holds.

erT K 1p 2

MT (p + ix)eix log K dx (x ip)(x + i(1 p))

eiax dx = (x ip)(x + i(1 p))

eap ea(1p)

if a 0 if a < 0

Proof. This is a standard application of the Cauchy residue theorem. Consider the case a 0. Dene the semi-circular contour R = {x + i0 : R x R} {Rei : 0 } in the upper half-plane. Cauchys theorem
R

eiaz eiaz dz = i2 (z ip)(z + i(1 p)) z + i(1 p) = 2e


76
ap

z =ip

since the integrand is meromorphic with a simple pole at z = ip inside the contour, and the contour integral is evaluated in the anticlockwise sense. On the other hand, eiaz dz = (z ip)(z + i(1 p))
R R

eiax dx + (x ip)(x + i(1 p))

iReaR sin ei(aR cos +) d (Rei ip)(Rei + i(1 p))

and the second integral vanishes as R since a 0. The case a < 0 is handled in exactly the same way; just integrate around a semi-circular contour in the lower half-plane enclosing the other pole at i(1 p). Proof of theorem. From the lemma, we have the identity (ST K )+ = ST K 1p 2

ep log ST +ix log(ST /K ) dx. (x ip)(x + i(1 p))

Now multiply by erT and compute expectations. The result follows upon interchanging expectation and integration on the right-hand side. This is justied by Fubinis theorem since

ep log ST +ix log(ST /K ) dx = MT (p) (x ip)(x + i(1 p))

dx (x2 + p2 )(x2 + (1 p)2 )

<

Remark. Here is one application of the representation of call prices in terms of the moment generating function. Let T (p) = log MT (p) be the cumulant generating function. By standard arguments, the function T is convex and smooth on p [0, 1]. Suppose that T (p) is minimised at p = p (0, 1). Then since T (p ) = 0 we have 1 T (p) T (p ) + T (p )(p p )2 2 where T (p ) > 0. Hence MT (p + ix) MT (p )e 2 T (p
1 )x2

If we approximate the Bromwich integral by a Gaussian integral, we have the approximation K 1p erT MT (p ) C0 (T, K ) S0 p (1 p ) 2 T (p ) As an application of this estimate the long-time implied volatilty is approximately5 T (T, K )2 8T (p ) + 4(2p 1) log(KerT /S0 ) + 4 log 2[p (1 p )]2 (p ) T (p ) .

as T . For a reality check, in the BlackScholes model we have MT () = e(r


5See
2 /2)T + 2 2 T /2

, T (p) = p(p 1) 2 T /2, p = 1/2 and T (T, K )2 = T 2 .

my paper, Asymptotics of implied volatility far from maturity, Journal of Applied Probability, 2009, for details.
77

7.2. Computing moment generating functions. In order to make use of the previous section, we need to be able to compute the moment generating function for some interesting models. We rst consider a general stochastic volatility model: dBt = Bt rdt dSt = St (rdt + vt dWtS )

dvt = A(vt )dt + B (vt )dWtv Here W S and W v are assumed to be correlated Brownian motions in a xed equivalent martingale measure Q, with correlation . Correlated Brownian motions can be constructed, for instance, by letting W v and W be independent Brownian motions and let WtS = Wtv + 1 2 Wt .

Theorem. For each , let F (, ; ) solve the PDE F 1 2F F + [r + (2 )v/2]F + (A + vB) + B2 2 = 0 t v 2 v with boundary condition F (T ; ) = 1 then e log St F (t, v ; ) is a local martingale. Proof. This is just another application of It os formula. The signicance of this result is that if we can prove the local martingale is a true martingale, then E[e log ST ] = e log S0 F (0, v0 ; ). To use this result, we need to solve a PDE in one spacial variable. Since the PDE for the option prices would involve two spacial variables (v, S ), we are in a better position nding the moment generating function rst via the above theorem, though we still need to evaluate the Bromwich integral. 7.3. The Heston model. We now explore a model where the moment generating function can be computed explicitly. It was introduced by Heston in 1993: dBt = Bt rdt dSt = St (rdt + vt dWtS ) dvt = ( v vt )dt + c vt dWtv

with W S , W v = t. This is just a special case of the stochastic volatility model in the previous subsection with A(v ) = ( v v ) and B (v ) = v for some positive constants , v , . In this model the squared volatility v is a mean-reverting process , i.e. an ergodic Markov process, at least under Q. The interpretation of v is the level of mean reversion, while is the speed of mean reversion. We will come across the stochastic process in the context of the CoxIngersollRoss rate model. It was rst studied by Feller in the 1950s.
78

The Heston PDE is then F F 1 2F + [r + (2 )v/2]F + [v + (c )v ] + c2 v 2 = 0. t v 2 v It turns out that this PDE can be solved explicitly. The trick is to make the ansatz F (t, v ; ) = eR(T t;)v+Q(T t;) . Note that the boundary condition F (T, v ; ) = 1 force R(0; ) = Q(0; ) = 0. The PDE becomes 1 Q + [r + (2 )v/2] + [v Rv + (c )v ]R + c2 vR2 = 0, 2 where the dot indicates dierentiation with respect to the time variable. Notice that the equation can be written in the form (T t; )v + (T t; ) = 0. Now, the above equation should hold for all v so (T t; ) = 0 = (T t; ), i.e = (2 )/2 + (c )R + 1 c2 R2 R 2 Q = r + v R. The equation for R is a Riccati equation which can be solved explicitly. In fact, we do not even have to make any tricky substitutions, separation of variables and partial fractions work well enough: = 1 c2 (R R+ )(R R ) R 2 R 1 = c2 (R R+ )(R R ) 2 1 1 1 R+ R R R+ R R 1 R( )/R+ log = 1 R( )/R R( ; ) = (2 )

= 1 c2 R 2

e () 1 ( () c + )e () + ( () + c )

where () = ( c)2 (2 )c2 and R () = [( c)2 ()]/c2 . And the second equation can be solved

Q( ; ) = r +
0

v R(s; )ds (2 )v () + c 2v log 2 c ( () c + )e () + ( () + c ) 2 ()

r +

It can be shown that for that EQ (e log ST ) = e log S0 +R(T ;)v0 +Q(T ;) .
79

What is the point of this calculation? Although the formula for the moment generating function is hard to call beautiful, it is very explicit. In particular, given the set of model parameters (v0 , , v , c), the function can be evaluated very quickly on a computer, and hence the Bromwich integral for call prices can be computed numerically quickly. Hence, it is possible to calibrate the Heston model to market data in a reasonable amount of time. This is one of the main reasons for its popularity. 8. American claims in local volatility models We have previously considered American claims in a general complete market in discrete time. Our main tool was the Snell envelope. In this section we we will consider American claims in a a continuous time model. We will make a Markovian assumption so that PDE techniques are available. We work in a market with d + 1 assets, a bank account with dynamics dBt = Bt rdt and d stocks with dynamics dSt = diag(St )( dt + (t, St )dWt ) where r 0 and Rd are constants, : [0, ) Rd Rdm is a given function and W is an m-dimensional Brownian motion. Consider the problem faced by a banker who has sold an American contingent claim with maturity T which pays g (St ) if the claim is exercised at time t [0, T ], and who wishes to trade in the underlying market to hedge his exposure to the optimally exercised claim. The main result is this: Theorem. Let L be the dierential operator dened by V LV = + t
d

i=1

1 V rS + i S 2
i

i=1

2V rV ai,j S S S i S j j =1
i j

where a = . Suppose V : [0, T ] Rd [0, ) solves the variational inequality max {LV, g V } = 0 V (T, S ) = g (S ). Let X be the wealth process started with X0 = V (0, S0 ) and with t = gradV (t, St ) shares of stock at time t. Then Xt g (St ) for all t [0, T ], and there exists a stopping time such that X = g (S ). Remark. To see where this variational inequality comes from, lets consider heuristically the Snell envelope of Y which should satisfy an equation like Zt = max{Yt t , E[Zt+ |Ft ]} where > 0 is a small increment of time, the process species the payout of the claim and Y is the state price density. First, let U = Z/Y and let Q the equivalent martingale measure corresponding to Y and the num eraire B . Then U should satisfy Ut = max{t , EQ [er Ut+ |Ft ]}.
80

Now, since t = g (St ) and S is a Markov process under Q, we suspect that Ut = V (t, St ) for some function V . By It os formula
t+ t+

er V (t + , St+ ) = V (t, St ) +
t

ers LV (s, Ss )ds +


t

ers

V s (s, Ss )dW S

is a Q-Brownian motion. Assuming the stochastic integral is mean-zero, we have where W


t+

V (t, S ) = max g (S ), V (t, S ) + EQ


t

ers LV (s, Ss )ds|St = S

Subtracting V (t, S ) from both sides and sending 0 yields the variational inequality appearing in the theorem. Remark. It should not be too surprising that the dierential operator L also appeared in our discussion of hedging European contingent claims. The following proof will proceed in a similar way to the proof of the robustness of BlackScholes implied volatility. Proof. Let X be the wealth process. As usual we have dXt = rXt dt + t (dSt rSt dt). Also by It os formula we have dV (t, St ) = and hence by subtraction (*) d(Xt V (t, St )) = [r(Xt V (t, St )) LV (t, St )]dt V 1 + t 2
d d

ai,j S i S j
i=1 j =1

2V S i S j

dt + t dSt

Since X0 = V (0, S0 ) and LV 0 by assumption, equation (*) implies Xt V (t, St ) for all t [0, T ]. The fact that Xt g (St ) follows from the assumption that g V 0. Finally, let = inf {t 0 : V (t, St ) = g (St )}. We will show that X = V ( , S ) = g (S ). Note that { = 0} we have X = X0 = V (0, S0 ) = V ( , S ) by assumption. Also, on { > 0} note that on the interval t [0, ], we have LV (t, St ) = 0 and hence by equation (*) we have Xt = V (t, St ). Remark. We can identify two subsets of [0, T ] Rd as follows S = {(t, S ) : g (S ) V (t, S ) = 0} C = {(t, S ) : g (S ) V (t, S ) < 0} The set S is called the stopping region since if (t, St ) S it is optimal (from the point of view of the buyer) to exercise the American claim. Similarly, the set C is called the continuation region, since if (t, St ) C it is optimal to wait.
81

In general, it is impossible to nd an explicit solution to the American options PDE. However, there is one case6 where all the calculations can be done. Example (Innite horizon put in the BlackScholes model). Let d = 1 and assume > 0 is constant. That is, the stock price is given by the BlackScholes model. We will study the American option PDE in the case of the innite horizon put, where g (S ) = (K S )+ and T = . Since (St )t0 is a time-homogeneous process, we can restrict ourselves to functions V that depend on S but not on t. Also, since the payout function is decreasing, we guess that there exists some q (0, K ) such that we should continue if S > q and stop if S < q . Hence we guess that (S ) V (S ) = K S if S < q 1 2 2 (C ) S V + rSV rV = 0 if S > q 2 By the usual techniques of ODEs, the general solution to equation (C ) is given by V (S ) = A0 S + A1 S a for some constants A0 and A1 , where a = 2r/ 2 . Since we expect V (S ) 0 as S , we can conclude that A0 = 0. It remains to solve for the constants q and A1 . Since we expect V to be continuous at S = q , we have A1 q a = K q. To nd another equation, we assume the smooth pasting condition that the derivative V is continuous at S = q : aA1 q (a+1) = 1. From this, we have aK K a+1 aa q= and A1 = . a+1 (a + 1)a+1

back at least to H. McKean. Appendix: A free boundary problem for the heat equation arising from a problem in mathematical economics. Industrial Management Review 6: 32-39. (1965)
82

6going

CHAPTER 6

Interest rate models


1. Bond prices and interest rates In this last chapter, we explore models for the interest rate term structure. The basic nancial instruments in this setting are the zero-coupon bonds. Definition. A (zero-coupon) bond with maturity T is a European contingent claim that pays exactly1 one unit of currency at time T . We denote by P (t, T ) the price at time t [0, T ] of the bond. To get a feel for how we should model the bond prices, note that a typical sample path t P (t, T ) of a zero-coupon bond price will look similar to the sample path of any other asset price. However, note that at maturity the bond is worth its principal value, so P (T, T ) = 1. On the other hand, since people prefer to be paid sooner rather than later, the map T P (t, T ) is usually decreasing. Of course, there are only a nite number of maturities of bonds traded on the the xed income market. But since this number is very large, it is common practice to represent the zero-coupon bond prices as a continuous curve, rather than a discrete set of points.

Rather than speak of bond prices, it is often easier to speak of interest rates. A popular interest rate is the yield y (t, T ) at time t of a bond maturing at time T dened by the
assume that the bond issuer is absolutely credit worthy, and there is exactly zero probability of default. Therefore, we are not discussing corporate bonds, mortgage-backed securities or the debt of some countries (for instance, Russia famously defaulted in 1998). In fact, there is probably no real-world example of a perfectly risk-free bond. Nevertheless, many practictioners probably still regard U.S. Treasury bonds, which are backed by the full faith and credit of the U.S. government, as virtually risk-free. Though with the current political situation in Washington, this may well change.
83
1We

formula

1 log P (t, T ). T t For us, a more useful interest rate is the forward rate f (t, T ) at time t for maturity T , dened by log P (t, T ). f (t, T ) = T The yield curve, the forward rate curve and the bond price curve contain the same information, since T P (t, T ) = e(T t) y(t,T ) = e t f (t,s)ds y (t, T ) = The term structure of interest rates refers the function T P (t, T ), or equivalently, the price data encoded in either of the functions T y (t, T ) or T f (t, T ). There are at least two perspectives to bond price modelling. One is to assume that bonds are derivative securities, where the underlying asset is a bank or money market account. A complementary perspective is to consider the bonds as fundamental and the bank account as a derivative asset (see example sheet 1). We will mostly explore the rst perspective in this chapter, but will return to the second perspective in our study of HJM models. 2. Bank accounts to bond prices and interest rates Adopting the rst perspective mentioned above, we assume that there is a num eraire asset, the bank account, with price dynamics dBt = Bt rt dt where the process r = (rt )t0 is called the spot interest rate or the short interest rate. Of course, the above dierential equation has the solution Bt = B0 e
t 0 rs

ds

Now, we formulate a condition so that for any collection of maturities T1 < . . . < Td , the market (Bt , P (t, T1 ), . . . , P (t, Td ))t[0,T1 ] has no arbitrage. Theorem. There is no arbitrage relative to the num eraire if there exists an equivalent measure Q such that the discounted bond price process (P (t, T )/Bt )t[0,T ] is a local martingale for all T > 0. In particular, there is no arbitrage if P (t, T ) = EQ (e for all 0 t T . Notice that if P (t, T ) = EQ (e
T t T t

rs ds

|Ft )

rs ds

|Ft ),

and r is suitably well-behaved, we can dierentiate the bond price with respect to maturity to recover the forward rate: f (t, T ) = EQ (rT e EQ (e
84
T t

rs ds

|Ft )

T t

rs ds

|Ft )

Notice that lim f (t, T ) = rt


T t

so the short rate is the left-hand end point of the forward rate curve. (The long rate limT f (t, T ) is the far right-hand end of the curve.) From common experience, it seems that we should like to model the interest rate (rt )t0 as a non-negative process. Indeed, if rt 0 for all t 0 then the map T P (t, T ) is decreasing. However, for the sake of tractablility, this modelling requirement is frequently dropped. 3. Short rate models We begin with a market that has just the bank account B . We will consider an It o process short interest rate model of the form drt = at dt + t dWt for adapted process (at )t0 and (bt )t0 , and a Brownian motion (Wt )tR+ for P. Note that while in a complete stock market model there was only one equivalent martingale measure, no such choice is possible since the short rate is not traded. However, we know that there is no arbitrage if the market somehow picks an equivalent martingale measure Q to price the bonds. We will assume that the market price of risk is given by the process (t )t0 so that t drt = t dt + t dW t = dWt + t dt denes a Brownian motion for the measure Q whose martingale where dW density process M is given by dMt = Mt t dWt , and where t = at t t denes the risk-neutral drift. Since we are interested in pricing and hedging, there is no need to model the processes (at )t0 and (t )t0 separately. However, we must be careful to realize that is impossible to estimate the distribution of the random variable t directly from a time series rt1 , . . . , rtn . 3.1. Vasicek model. In 1977, Vasicek proposed the following model for the short rate: t drt = ( r rt )dt + dW for a parameter r > 0 interpreted as a mean short rate, a mean-reversion parameter > 0, and a volatility parameter > 0. This stochastic dierential equation can be solved explicitly to yield
t

rt = e

r0 + (1 e

) r+
0

s. e(ts) dW

Note that the short interest rate in the Vasicek model follows an OrnsteinUhlenbeck process, and in particular, that for each t 0 the random variable rt is Gaussian under the measure Q with
t

EQ (rt ) = et r0 + (1 et ) r and VarQ (rt ) =


0

e2(ts) 2 ds =

2 (1 e2t ). 2

85

Moreover, one can show that the process is ergodic and converges to the invariant distri2 bution N r , . In particular, we have 2 1 T
T

rs ds r Q almost surely.
0

Please note, however, that in the present framework we can say absolutely nothing about the distribution of rt for the objective measure P, unless we have a model for the market price of risk. Since the short rate rt is Gaussian, the advantage of this type of model is that it is relatively easy to compute prices, for instance of bonds, explicitly. A disadvantage of this model is that there is a chance that rt < 0 for some time t > 0. Recall that a normal random variable can take any real value, both positive and negative. However, for sensible parameter values, the Q-probabilty of the event {rt < 0} is pretty small. We have learned from example sheet 3 that
T T T t

rt dt =
0 0 T

[et r0 + (1 et ) r]dt +
0 T 0

s dt e(ts) dW
T

=
0

[et r0 + (1 et ) r]dt +
0 T s

s e(ts) dt dW 2 2
T

N
0

[et r0 + (1 et ) r]dt,

(1 et )2 dt
0

under Q, so that, using the moment generating function of a Gaussian random variable we have P (0, T ) = EQ [e
T 0

rt dt T

] et r0 + (1 et ) r 2 (1 et )2 dt 2 2

= exp
0

so that

2 (1 et )2 2 2 By the time-homogeneity of the Vasicek model, we can actually deduce the formula f (0, T ) = et r0 + (1 et ) r f (t, t + x) = rt e
x

+r (1 e

2 ) 2 (1 ex )2 2

This formula says that for the Vasicek model, the forward rates at time t are an ane function of the short rate at time t. (An ane function is of the form g (x) = ax + b, that is, its graph is a line.) 4. Markovian short rate models We now study the case when the short rate is Markovian. Assume that t drt = (t, rt )dt + (t, rt )dW for some non-random functions : R+ R R and : R+ R R.
86

As we have learned for Markovian stock models, the price of contingent claims can be expressed in terms the solution of a PDE: Theorem. Fix T > 0 and suppose V : [0, T ] R R satises the PDE 2V V 1 V (t, r) + (t, r) (t, r) + (t, r)2 2 (t, r) = rV (t, r) t r 2 r V (T, r) = 1 Suppose P (t, T ) = V (t, rt ). Then the discounted price process e is a Q-local martingale. Proof. It os formula implies d e
t 0 rs ds t 0 rs ds

P (t, T )

V (t, rt )

= rt e +e

t 0 rs ds

V (t, rt )dt

V V 1 2V (t, rt ) dt + (t, rt )drt + (t, rt )d r t t r 2 r2 t V 1 2V V (t, rt ) + (t, rt ) (t, rt ) + (t, rt )2 2 (t, rt ) = e 0 rs ds t r 2 r t V t rt V (t, rt ) dt + e 0 rs ds (t, rt ) (t, rt )dW r Since the drift vanishes by assumption, so (P (t, T )/Bt )t[0,T ] is a local martingale.
t 0 rs ds

Remark. In the proof of the preceding theorem, notice that we can only conclude that t Mt = e 0 rs ds P (t, T ) is a local martingale since we are using It os formula. When is it a true martingale? Here is a sucient condition. Suppose that we can show that rt 0 and that 0 P (t, T ) 1 for all t 0. In this case, we would have 0 Mt 1 and hence M is a true martingale (recall that bounded local martingales are true martingales). In particular, we have the formula T P (t, T ) = EQ e t rs ds |Ft . 4.1. CoxIngersoll-Ross model. In 1985, Cox, Ingersoll, and Ross proposed the following model for the short rate: t drt = ( r rt ) + rt dW for a parameter r > 0 interpreted as a mean short rate, a mean-reversion parameter > 0, and a volatility parameter > 0. The process (rt )tR+ satisfying the above stochastic dierential equation is often called a square-root diusion or CIR process, though this stochastic process was studied as early as 1951 by Feller. This process was also used by Heston to model the spot volatility process in an equity market. Althought the CIR stochastic dierential equation cannot be solved explicitly, one can say quite a lot about this process. For instance, one can show that the process is ergodic and its invariant disribution is a gamma distribution with mean r . An advantage of this model over the Vasicek model is that the short rate rt is non-negative for all t 0. Furthermore, explicit formula are still available for the bond prices.
87

We can also use the above theorem to compute bond prices. Indeed, x T > 0 and consider the PDE V 1 2V V (t, r) + ( r r) (t, r) + 2 r 2 (t, r) = rV (t, r) t r 2 r V (T, r) = 1. As we did in the Heston model, we can make the ansatz V (t, r) = erR(T t)+Q(T t) for some functions A and B which satisfy the boundary conditions R(0) = Q(0) = 0. Substituting this into the PDE yields Q ) + ( (Rr r r)R + This time we have
2 = R + R2 1 R 2 Q = r R.

2 2 rR = r 2

The equation for R is a Riccati equation, whose solution is 2(e 1) R( ) = ( + )e + ( )

Q( ) =
0

r R(s)ds

where = 2 + 2 2 . The bond prices are too messy to write down, but the forward rates are given by f (t, t + x) = 4 2 ex 2r (ex 1) r + . t [( + )ex + ( )]2 ( + )ex + ( )

In particular, the forward rates for the CIR model are again given by an ane function of the short rate. 5. The HeathJarrowMorton framework Starting from a short-rate model, the derived bond prices are necessarily It o processes. There is no arbitrage in a factor model since, by construction, there exists an equivalent martingale measure Q such that all discounted bond prices (P (t, T )/Bt )t[0,T ] are local martingales. The insight of Heath, Jarrow, and Morton in 1992 was that we can change perspectives by modelling the bond prices directly. Motivation. Indeed, suppose we start out with just the bond market, but without the bank account. We can construct the bank account by considering an investor holding his wealth in just-maturing bonds. More concretely, suppose at time 0 the investor has B0 units of wealth. Fix a sequence 0 t0 < t1 < . . . of times and suppose that during the interval (ti1 , ti ] the investor holds all of his wealth in the bond which matures at time ti . If the
88

investors wealth at time t is denoted by Bt , and the number of shares of the just-maturing bond by t , the budget constraint is Bti1 = ti P (ti1 , ti ) and the self-nancing condition is Bti = ti since P (t, t) = 1 for all t. Hence, the rate of change of the wealth is given by Bti Bti1 ti ti1 = Bti1 1 P (ti1 , ti ) Pti1 (ti ) ti ti1

By taking the limit as ti ti1 0, we can dene the spot rate by rt = P (t, T )|T =t T so that dBt = Bt rt dt as before. The usual formulation of the HJM idea is in terms of the forward rates. As usual, we put ourselves in the context of a probability space (, F , Q) on which we can dene a t )t0 . d-dimensional Brownian motion (W Theorem. Suppose for each T , the foward rate process (f (t, T ))t[0,T ] has dynamics
n

df (t, T ) = a(t, T )dt +


i=1

t(i) (i) (t, T )dW

for some suitably regular adapted processes (a(t, T ))t[0,T ] and ( (i) (t, T ))t[0,T ] . Let the short rate be given by rt = f (t, t) and the bank account dynamics by dBt = Bt rt dt. Finally, let the bond prices be given by P (t, T ) = e If
d T
T t

f (t,s) ds

a(t, T ) =
i=1

(i) (t, T )
t
t 0 rs ds

(i) (t, s)ds,

then, the discounted bond prices e are local martingales. Remark. The upshot of the HJM result is that the drift and the volatilty of the forward rate dynamics cannot be prescribed independently. Indeed, they must be related by the famous formula
T

P (t, T )

a(t, T ) = (t, T )
t

(t, s)ds,

usually called the HJM drift condition. Notice that this drift/volatility contraint is not present in the factor models from the previous sections.
89

The dierence with the short rate models is that we are now trying to model the dynamics of the whole term structure. Indeed, in the HJM framework, we can initialize the model with any initial forward rate curve T f (0, T ). Nevertheless, note that any of the short rate or factor models can be put into the HJM framework, just by choosing the initial forward rate curve to match the one predicted by the model. Proof. We must show that for each T > 0, the discounted bond price process e 0 rs ds t f (t,s)ds is a local martingale. Now applying some formal manipulations (we assume enough regularity that we can appeal to a stochastic Fubini theorem )
t T T
t T

d
0

rs ds +
t

f (t, s) ds

= (rt f (t, t))dt +


t

df (t, s) ds
T

1 = 2 Hence, by It os formula, we have


T

T t

(t, s)ds dt +
t

t. (t, s)ds dW

dMt = Mt
t
t 0 rs ds

t (t, s)ds dW

where Mt = e P (t, T ). Note that M is a stochastic integral with respect to a Brownian motion, and hence a local martingale. We conclude this section with some examples. In these examples, the forward rates are Gaussian under the measure Q, and hence are vulnerable to the criticism that there is a positive probability that the rates become negative. 5.1. HoLee. (1986) This model is the simplest possible model HJM model. Let d = 1 and (t, T ) = 0 be constant. Then
2 t. df (t, T ) = 0 (T t) dt + 0 dW

or
2 t. f (t, T ) = f (0, T ) + 0 (T t t2 /2) + 0 W

Here is an unusual feature of this model: if the initial forward rate curve T f (0, T ) is bounded from below, then for positive times t the forward rates f (0, T ) as T . The short rate is then given by
2 2 t. rt = f (0, t) + 0 t /2 + 0 W

Hence the HoLee model corresponds to the following short rate model:
2 t. t)dt + 0 dW drt = (f0 (t) + 0

5.2. VasicekHullWhite. (1990) Again let d = 1 but now (t, T ) = 0 e(T t) for positive constants 0 and . Then df (t, T ) =
2 0 t. e(T t) (1 e(T t) )dt + 0 e(T t) dW

90

The short rates are given by


t

rt = f (0, t) +
0

2 0 e(ts) (1 e(ts) )ds + t

s 0 e(ts) dW
0

= f (0, t) +

2 0 (1 22

et )2 +
0

s 0 e(ts) dW

The short rate dynamics are given by


t 2 0 s dt t 0 e(ts) dW et (1 et ) dt + 0 dW 0 2 f0 (t) 0 t = + f0 (t) + 2 (1 e2t ) rt dt + 0 dW 2 Hence, the HullWhite extension of the Vasicek essentially replaces the mean interest rate r with a time-varying, but non-random, mean rate r (t).

drt =

f0 (t) +

5.3. Kennedy. (1994) Note that for the HJM models discussed above, the forward rates are given by
t T t

f (t, T ) = f (0, T ) +
0

(u, T )
u

(u, s)ds du +
0

u. (u, T ) dW

If is not random, then the distribution of f (t, T ) under the risk-neutral measure Q is Gaussian with mean
t T

EQ [f (t, T )] = f0 (T ) +
0

(u, T )
u st

(u, s)ds du

and covariance CovQ [f (s, S ), f (s, T )] =


0

(u, S ) (u, T )du.

Kennedy reversed this logic, and considered a Gaussian random eld {f (t, T ) : 0 t T } with mean (t, T ) and covariance C (s, t; S, T ). Suppose that covariance has the special form C (s, t; S, T ) = cst (S, T ) so that, for each xed T > 0, the increments of (f (t, T ))t[0,T ] are independent. Then the discounted bound prices are local martingales (actually true martingales since everything is Gaussian and we can compute the conditional expectations by hand) when the mean is given by
T

(t, T ) = f (0, T ) +
0

cts (s, T )ds.

An advantage of this formulation of the Gaussian HJM model is that one is no longer restricted to nite dimensional Brownian motions, and, therefore, there is much more exibility to specify the correlation of the increments. For instance, one choice is to have the correlation of the increments decay exponentially in the dierence of the maturities: corr(df (t, t + x), df (t, t + y ) = e |xy| . Since the operator on L2 (R+ ) with kernel e |xy| is not of nite rank, the above correlation could not be realised by a nite rank HJM model. However, since the operator is positive
91

denite, it can be the correlation of a Gaussian random eld. Actually, this model can be realised as an HJM model driven by an innite dimensional Brownian motion. See the book Interest Rate Models: an Innite Dimensional Stochastic Analysis Perspective by Ren e Carmona and me for details.

92

CHAPTER 7

Crashcourse on probability theory


These notes are a list of many of the denitions and results of probability theory needed to follow the Advanced Financial Models course. Since they are free from any motivating exposition or examples, and since no proofs are given for any of the theorems, these notes should be used only as a reference. A table of notation is in the appendix. 1. Measures Definition. Let be a set. A sigma-eld on is a non-empty set F of subsets of such that (1) if A F then Ac F , (2) if A1 , A2 , . . . F then i=1 Ai F . The terms sigma-eld and sigma-algebra are interchangeable. The Borel sigma-eld B on R is the smallest sigma-eld containing every open interval. More generally, if is a topological space, for instance Rn , the Borel sigma-eld on is the smallest sigma-eld containing every open set. Definition. Let be a set and let F be a sigma-eld on . A measure on the measurable space (, F ) is a : F [0, ] such that (1) () = 0 (2) if A1 , A2 , . . . F are disjoint then ( i=1 (Ai ). i=1 Ai ) = Theorem. There exists a unique measure Leb on (R, B ) such that Leb(a, b] = b a for every b > a. This measure is called Lebesgue measure. Definition. A probability measure P on (, F ) is a measure such that P() = 1. Let be a set, F a sigma-eld on , and P a probability measure on (, F ). The triple (, F , P) is called a probability space. The set is called the sample space, and an element of is called an outcome. A subset of which is an element of F is called an event. Let A F be an event. If P(A) = 1 then A is called an almost sure event, and if P(A) = 0 then A is called a null event. The phrase almost surely is often abbreviated a.s. A sigma-eld is called trivial if each of its elements is either almost sure or null. 2. Random variables Definition. Let (, F , P) be a probability space. A random variable is a function X : R such that the set { : X ( ) t} is an element of F for all t R.
93

Let A be a subset of R, and let X be a random variable. We use the notation {X A} to denote the set { : X ( ) A}. For instance, the event {X t} denotes { : X ( ) t}. The distribution function of X is the function FX : R [0, 1] dened by FX (t) = P(X t) for all t R. We also use the term random variable to refer to measurable functions X from to more general spaces. In particular, we call a function X : Rn a random variable or random vector if X ( ) = (X1 ( ), . . . , Xn ( )) and Xi is a random variable for each i {1, . . . , n}. Definition. Let A be an event in . The indicator function of the event A is the random variable 1A : {0, 1} dened by

1A ( ) =
for all .

1 0

if A if Ac

3. Expectations and variances Definition. Let X be a random variable on (, F , P). The expected value of X is denoted by E(X ) and is dened as follows X is simple , i.e. takes only a nite number of values x1 , . . . , xn .
n

E(X ) =
i=1

xi P(X = xi ).

X 0 almost surely. E(X ) = sup{E(Y ) : Y simple and 0 Y X a.s.} Note that the expected value of a non-negative random variable may take the value . Either E(X + ) or E(X ) is nite. E(X ) = E(X + ) E(X ) X is vector valued and E(|X |) < . E[(X1 , . . . , Xd )] = (E[X1 ], . . . , E[Xd ]) A random variable X is integrable i E(|X |) < and is square-integrable i E(X 2 ) < . The terms expected value, expectation, and mean are interchangeable. The variance of an integrable random variable X , written Var(X ), is Var(X ) = E{[X E(X )]2 } = E(X 2 ) E(X )2 . The covariance of square-integrable random variable X and Y , written Cov(X, Y ), is Cov(X, Y ) = E{[X E(X )][Y E(Y )]} = E(XY ) E(X )E(Y ). If neither X or Y is almost surely constant, then their correlation, written (X, Y ), is (X, Y ) = Cov(X, Y ) . Var(X )1/2 Var(Y )1/2
94

Random variables X and Y are called uncorrelated if Cov(X, Y ) = 0. Theorem. Let X and Y be integrable random variables. linearity: E(aX + bY ) = aE(X ) + bE(Y ) for constants a, b. positivity: Suppose X 0 almost surely. Then E(X ) 0 with equality if and only if X = 0 almost surely. Definition. For p 1, the space Lp is the collection of random variables such that E(|X |p ) < . The space L is the collection of random variables which are bounded almost surely. Theorem (Jensens inequality). Let X be a random variable and g : R R be a convex function. Then E[g (X )] g (E[X ]) whenever the expectations exist. If g is strictly convex, the above inequality is strict unless X is constant.
1 p

Theorem (H olders inequality). Let X and Y be random variables and let p, q > 1 with 1 + q = 1. If X Lp and Y Lq then E(XY ) E(|X |p )1/p E(|Y |q )1/q

with equality if and only if either X = 0 almost surely or X and Y have the same sign and |Y | = a|X |p1 almost surely for some constant a 0. The case when p = q = 2 is called the CauchySchwarz inequality. Definition. A random variable X is called discrete if X takes values in a countable set; i.e. there is a countable set S such that X S almost surely. If X is discrete, the function pX : R [0, 1] dened by pX (t) = P(X = t) is called the mass function of X . The random variable X is absolutely continuous (with respect to Lebesgue measure) if and only if there exists a function fX : R [0, ) such that
t

P(X t) =

fX (x)dx

for all t R, in which case the function fX is called the density function of X . If X is a random vector taking values in Rn , then the density of X , if it exists, is the function fX : Rn [0, ) such that P(X A) =
A

fX (x)dx

for all Borel subsets A Rn . Theorem. Let the function g : R R be such that g (X ) is integrable. If X is a discrete random variable with probability mass function pX taking values in a countable set S then E(g (X )) = g (t) pX (t).
tS

If X is an absolutely continuous random variable with density function fX then

E(g (X )) =

g (x) fX (x) dx.


95

More generally, if X is a random vector valued in Rn with density fX and g : Rn R then E(g (X )) =
Rn

g (x) fX (x) dx.

4. Special distributions Definition. Let X be a discrete random variable taking values in Z+ with mass function pX . The random variable X is called Bernoulli with parameter p if pX (0) = 1 p and pX (1) = p. where 0 < p < 1. Then E(X ) = p and Var(X ) = p(1 p). binomial with parameters n and p, written X bin(n, p), if pX (k ) = n k p (1 p)nk for all k {0, 1, . . . , n} k

where n N and 0 < p < 1. Then E(X ) = np and Var(X ) = np(1 p). Poisson with parameter if pX (k ) = k e for all k = 0, 1, 2, . . . k!

where > 0. Then E(X ) = . geometric with parameter p if pX (k ) = p(1 p)k1 for all k = 1, 2, 3, . . . where 0 < p < 1. Then E(X ) = 1/p. Definition. Let X be a continuous random variable with density function fX . The random variable X is called uniform on the interval (a, b), written X unif(a, b), if fX (t) = 1 for all a < t < b ba

b for some a < b. Then E(X ) = a+ . 2 normal or Gaussian with mean and variance 2 , written X N (, 2 ), if

fX (t) =

1 (x )2 exp 2 2 2

for all t R

for some R and 2 > 0. Then E(X ) = and Var(X ) = 2 . exponential with rate , if fX (t) = et for all t 0 for some > 0. Then E(X ) = 1/.
96

If X is a random vector valued in Rn with density 1 fX (x) = (2 )n/2 det(V )1/2 exp (x ) V 1 (x ) 2 for a positive denite n n matrix V and vector Rn , then X is said to have the ndimensional normal (or Gaussian) distribution with mean and variance V , written X Nn (, V ). Then E(Xi ) = i and Cov(Xi , Xj ) = Vij . 5. Conditional probability and expectation, independence Definition. Let B be an event with P(B ) > 0. The conditional probability of an event A given B , written P(A|B ), is P(A B ) P(A|B ) = . P( B ) The conditional expectation of X given B , written E(X |B ), is E(X 1B ) E(X |B ) = . P( B ) Theorem (The law of total probability). Let B1 , B2 , . . . be disjoint, non-null events such that i=1 Bi = . Then

P(A) =
i=1

P(A|Bi )P(Bi )

for all events A. Definition. Let A1 , A2 , . . . be events. If P(


i I

Ai ) =
iI

P(Ai )

for every nite subset I N then the events are said to be independent. Random variables X1 , X2 , . . . are called independent if the events {X1 t1 }, {X2 t2 }, . . . are independent. The phrase independent and identically distributed is often abbreviated i.i.d. Theorem. If X and Y are independent and integrable, then E(XY ) = E(X )E(Y ). 6. Probability inequalities Theorem (Markovs inequality). Let X be a positive random variable. Then E(X ) P(X ) for all > 0. Corollary (Chebychevs inequality). Let X be a random variable with E(X ) = and Var(X ) = 2 . Then 2 P(|X | ) 2 for all > 0.
97

7. Characteristic functions Definition. The characteristic function of a real-valued random variable X is the function X : R C dened by X (t) = E(eitX ) for all t R, where i = 1. More generally, if X is a random vector valued in Rn then X : Rn C dened by X (t) = E(eitX ) is the characteristic function of X . Theorem (Uniqueness of characteristic functions). Let X and Y be real-valued random variables with distribution functions FX and FY . Let X and Y be the characteristic functions of X and Y . Then X (t) = Y (t) for all t R if and only if FX (t) = FY (t) for all t R. 8. Fundamental probability results Definition (Modes of convergence). Let X1 , X2 , . . . and X be random variables. Xn X almost surely if P(Xn X ) = 1 Xn X in Lp , for p 1, if E|X |p < and E|Xn X |p 0 Xn X in probability if P(|Xn X | > ) 0 for all > 0 Xn X in distribution if FXn (t) FX (t) for all points t R of continuity of FX Theorem. The following implications hold: Xn X almost surely or Xn X in probability Xn X in distribution Xn X in Lp , p 1 Furthermore, if r p 1 then Xn X in Lr Xn X in Lp . Definition. Let A1 , A2 , . . . be events. The term eventually is dened by {An eventually} =
N N nN

An

and innitely often by {An innitely often} =


N N nN

An .

[The phrase innitely often is often abbreviated i.o.] Theorem (The rst BorelCantelli lemma). Let A1 , A2 , . . . be a sequence of events. If

P(An ) <
n=1

then P(An innitely often) = 0.


98

Theorem (The second Borel-Cantelli lemma). Let A1 , A2 , . . . be a sequence of independent events. If

P(An ) =
n=1

then P(An innitely often) = 1. Theorem (Monotone convergence theorem). Let X1 , X2 , . . . be positive random variables with Xn Xn+1 almost surely for all n 1, and let X = supnN Xn . Then Xn X almost surely and E(Xn ) E(X ). Theorem (Fatous lemma). Let X1 , X2 , . . . be positive random variables. Then E(lim inf Xn ) lim inf E(Xn ).
n n

Theorem (Dominated convergence theorem). Let X1 , X2 , . . . and X be random variables such that Xn X almost surely. If E(supn1 |Xn |) < then E(Xn ) E(X ). Theorem (A strong law of large numbers). Let X1 , X2 , . . . be independent and identically distributed integrable random variables with common mean E(Xi ) = . Then X 1 + . . . + Xn almost surely. n Theorem (Central limit theorem). Let X1 , X2 , . . . be independent and identically distributed with E(Xi ) = and Var(Xi ) = 2 for each i = 1, 2, . . ., and let X1 + . . . + Xn n Zn = . n Then Zn Z in distribution, where Z N (0, 1).

99

R R+ N C Z Z+ Ac FX pX fX X E(X ) Var(X ) Cov(X, Y ) E(X |B ) ab ab a+ lim supn xn lim inf n xn ab |a| X

the the the the the the the the the the the the the the the

set of real numbers set of non-negative real numbers [0, ) set of natural numbers {1, 2, . . .} set of complex numbers set of integers {. . . , 2, 1, 0, 1, 2, . . .} set of non-negative integers {0, 1, 2, . . .} complement of a set A, Ac = { , / A} distribution function of a random variable X mass function of a discrete random variable X density function of an absolutely continuous random variable X characteristic function of X expected value of the random variable X variance of X covariance of X and Y conditional expectation of X given the event B

min{a, b} max{a, b} max{a, 0} the limit superior of the sequence x1 , x2 , . . . the limit inferior of the sequence x1 , x2 , . . . Euclidean inner (or dot) product in Rn , a b = Euclidean norm in Rn , |a| = (a a)1/2
n i=1

ai b i

1A

N (, 2 ) Nn (, V ) bin(n, p) unif(a, b) Lp

the random variable X is distributed as the probability measure the indicator function of the event A the normal distribution with mean and variance 2 the n-dimensional normal distribution with mean Rn and variance V Rnn the binomial distribution with parameters n and p the uniform distribution on the interval (a, b) the set of random variables X with E|X |p < Table 1. Notation

100

Index

1FTAP continuous time, 56 discrete time, 19 2FTAP continuous time, 63 discrete time, 36 a.s., 93 absolutely continuous random variable, 95 adapted process, 9 admissible trading strategy, 56 almost sure event, 93 American contingent claims, 37 arbitrage absolute continuous time, 56 discret time, 17 attainable discrete-time, 34 Bernoulli random variable, 96 binomial random variable, 96 BlackScholes formula, 66 BlackScholes model, 65 BlackScholes PDE, 68 bond, 83 Borel sigma-eld, 93 BorelCantelli lemmas, 98, 99 BreedenLitzenberger formula, 74 Bromwich integral, 76 Brownian motion, 42 call option American, 33 European, 33 CameronMartinGirsanov theorem, 51 Cauchy residue theorem, 76 CauchySchwarz inequality, 95 central limit theorem, 99 characterisation of attainable claims discrete-time, 36
101

Chebychevs inequality, 97 CIR model, 87 class D, 57 locally, 57 class DL, 57 complete market discrete time, 36 conditional expectation existence and uniqueness, 13 given a sigma-eld, 13 given an event, 97 continuation region, 81 contour integration, 76 CoxIngersollRoss model, 87 density function, 95 discounted price relative to a num eraire, 30 discrete random variable, 95 dominated convergence theorem, 99 Doob decomposition, 38 Dupires formula, 73 equivalent martingale measure continuous time, 59 discrete time, 30 equivalent measures, 29 exponential random variable, 96 Fatous lemma, 99 FeynmanKac PDE, 67 ltration, 9 forward rate, 84 fundamental theorem of asset pricing rst continuous time, 56 discrete time, 19 second continuous time, 63 discrete time, 36 Gaussian random variable, 96 Gaussian random vector, 97

geometric random variable, 96 Girsanovs theorem, 51 H olders inequality, 95 HeathJarrowMorton drift condition, 89 Heston model, 78 historical probability measure, 30 HJM drift condition, 89 HoLee model, 90 HullWhite extension of Vasicek, 90 i.i.d., 97 implied volatility, 71 asymptotics, 77 incomplete market discrete time, 36 independent events, 97 independent random variables, 97 indicator function, 94 integrable random variable, 94 interest rate term structure, 84 It o process, 46 It os formula multi-dimensional version, 50 scalar version, 46 It os isometry, 43 Jensens inequality, 95 Kennedy model, 91 Kolmogorov equation, 67 law of iterated expectations, 14 Lebesgue measure, 93 local martingale, 24 local volatility, 73 long interest rate, 85 market price of risk, 61 Markovs inequality, 97 martingale, 14 martingale representation theorem, 51 martingale transform, 15 mass function, 95 mean-reverting process, 78 measurable with respect to a sigma-eld, 12 measure, 93 probability, 93 monotone convergence theorem, 99 multivariate Gaussian, 97 multivariate normal distribution, 97 natural ltration of a process, 14 normal random variable, 96
102

normal random vector, 97 Novikovs criterion, 51 null event, 93 num eraire, 18, 30 objective probability measure, 30 optimal stopping time, 40 Poisson random variable, 96 predictable discrete time, 11 predictable process continuous time, 44 predictable sigma-eld, 44 previsible discrete time, 11 probability density function, 95 probability mass function, 95 put option, 34 put-call parity, 34 quadratic co-variation, 49 quadratic variation, 47 RadonNikodym derivative, 29 RadonNikodym theorem, 29 relative arbitrage, 59 replicable discrete-time, 34 Riccati equation, 79 self-nancing continuous time, 53 discrete time, 11 separating hyperplane theorem, 20 short interest rate, 84 sigma-algebra, 93 sigma-eld, 93 simple predictable process, 42 simple random variable, 94 smile, implied volatility, 71 smirk, implied volatility, 71 smooth pasting, 82 Snell envelope discrete time, 39 spot interest rate, 84 square-integarable random variable, 94 statistical probability measure, 30 stochastic integral discrete time, 15 stopping region, 81 stopping time, 23 strike price, 33 strong law of large numbers, 99

suicide strategy, 56 supporting hyperplane theorem, 20 term structure of interest rates, 84 tower property, 14 trivial sigma-eld, 10, 93 uniform integrable, 57 uniform random variable, 96 usual conditions, 42 Vasicek model, 85 Wiener process, 42 yield curve, 83 zero-coupon bond, 83

103

You might also like