
Understanding bias Control variates Importance sampling

wi3425TU—Monte Carlo methods

L.E. Meester

Week 5


Week 5—Program for this week

1 Understanding bias (and trying to deal with it)
    Bias detection: analysis
    Envisioning what you can't really see
2 Control variates (Chapter 22)
    Variance reduction: correlation often is the key
    The idea
    A simple example (computational example §22.2)
    Analysis and refinement
    A classic example
    Summary
3 Importance sampling: introduction
    The principle
    Example: determining π (again)

Don't forget: do the weekly practice material and upload your
solutions and we'll provide you with feedback!

Understanding bias (and trying to deal with it)

Recall estimating derivatives (from Lecture 3):

Matlab code: DeltaCall2.m, M = 10^6; exact value: ∆ = 0.95577.

Estimates and standard errors:

    h        ∆̂(h), "right way" (2)   ∆̂(h) − ∆̂(h/10)   "right" − exact
    0.1      0.96044 ± 0.00023        0.00405            0.00466
    0.01     0.95649 ± 0.00024        0.00041            0.00072
    0.001    0.95608 ± 0.00024        0.00004            0.00031
    0.0001   0.95604 ± 0.00024                           0.00026

We see: the error caused by the finite stepsize h is roughly linear in h.
Bias "estimate": the change in the estimate, for example from
h = 0.1 to h = 0.01, coincides in order of magnitude with the bias
(that at h = 0.1!), at least while this difference is well above the
standard error. See also PoAMC 9 and 10.
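
For concreteness, a minimal sketch of such a forward-difference estimator
in which the same random numbers drive both starting values (the "right
way" of Lecture 3); DeltaCall2.m itself is not reproduced here, so the
option data S0, E, r, sigma, T below are illustrative assumptions, not
the values behind the table:

  % forward-difference Delta with common random numbers
  rng(100)                                        % fix the seed
  S0 = 5; E = 4; r = 0.05; sigma = 0.3; T = 1;    % assumed option data
  M = 1e6; h = 0.01;
  Z   = randn(M,1);                               % the SAME Z for both runs
  ST  =  S0   *exp((r - 0.5*sigma^2)*T + sigma*sqrt(T)*Z);
  STh = (S0+h)*exp((r - 0.5*sigma^2)*T + sigma*sqrt(T)*Z);
  D = exp(-r*T)*(max(STh-E,0) - max(ST-E,0))/h;   % difference quotient per path
  Delta   = mean(D)
  seDelta = std(D)/sqrt(M)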

A small bias detection quiz:

A. We estimate (some) Delta for two values of h:

>> DeltaCall2B2
M = 1000
h = 1
Delta = 0.9398
seDelta = 0.0093
h = 0.5000
Delta = 0.9208
seDelta = 0.0105

Does the output show clear evidence of bias? NO: the change in the
estimate (0.0190) is not much larger than its standard error (about
0.014), so it could well be random error.


B. Another run:

>> DeltaCall2B2
M = 100000
h = 1
Delta = 0.9300
seDelta = 0.0010
h = 0.5000
Delta = 0.9094
seDelta = 0.0011

Does the output show clear evidence of bias? YES: the change in the
estimate (0.0206) is about fourteen times its standard error.


Can we say anything about how big it is?



Bias detection: analysis


In the estimates we produce we can distinguish three components:

    true value + bias + random error.

Let's call:
    ∆0 : the true value;
    ∆̂(h) : the estimate at stepsize h;
    bias(h) : the bias at stepsize h (we don't know this);
    ε1, ε2 : the (random) simulation errors (indication of size: the s.e.);
so in formulas (for h = 0.1 and 0.01):

    ∆̂(0.1) = ∆0 + bias(0.1) + ε1,
    ∆̂(0.01) = ∆0 + bias(0.01) + ε2,

and their difference:

    ∆̂(0.1) − ∆̂(0.01) = bias(0.1) − bias(0.01) + ε1 − ε2.

    ∆̂(0.1) − ∆̂(0.01) = bias(0.1) − bias(0.01) + ε1 − ε2.

Note: we know none of the terms on the right, only their sum...
However: the s.e.'s provide an indication of the "size" of ε1 and ε2.
If the standard errors are
    small: then ∆̂(0.1) − ∆̂(0.01) ≈ bias(0.1) − bias(0.01),
    and we "see" how big the reduction in bias is;
    big: the random errors dominate; the reduction in bias is
    smaller than they are, and nothing more can be said.
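
In Matlab, the check is one line of arithmetic (numbers from the table on
the earlier slide; assuming the two runs are independent, the s.e. of the
difference follows by adding the variances):

  d   = 0.96044 - 0.95649              % change in the estimate: 0.00395
  sed = sqrt(0.00023^2 + 0.00024^2)    % s.e. of the difference: about 0.00033
  % d is roughly twelve times sed: it is dominated by bias(0.1) - bias(0.01)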



Seeing what you can’t see

Next, plots for M = 10^3, 10^4 and 10^5, each time:

1 single estimates with a bar the size of 1 s.e., for h = 0.1 and 0.01;
2 superimposed: histograms of one thousand simulations.

In practice:
1 is all the information you have; this is all you know;
2 is what you should try to imagine/visualise; you know it's there, your
results are just "one set from the histograms."

[Figures (not reproduced): for M = 10^3, 10^4 and 10^5, the single
estimates with 1-s.e. bars and the superimposed histograms of one
thousand simulations, h = 0.1 in red, h = 0.01 in blue.]

The quiz, question B again:

>> DeltaCall2B2
M = 100000
h = 1
Delta = 0.9300
seDelta = 0.0010
h = 0.5000
Delta = 0.9094
seDelta = 0.0011

For h = 1 there is clearly bias.


How big the bias is for h = 0.5 one cannot say, but in this
situation one should probably prefer reducing the bias further
over reducing the random error by increasing M.
If the bias is larger than the standard error,
increasing M will not improve the estimate.

Two key things about bias

9 Bias. Some methods produce biased estimates: there is a
systematic deviation with respect to the unknown quantity
you are estimating. Make sure you are aware of this.

10 Accuracy in the presence of bias. The Monte Carlo rule
"one hundred times as many replications gives me an
additional digit of accuracy" is no longer true when there is
bias, so then blindly going for as large an M as possible is
senseless. It is better to find a balance between the size of the
(remaining) bias and the standard error. This is a hard and
sometimes unsolvable problem—do what you can.



Variance reduction: correlation often is the key

Determining derivatives/Greeks, we found:

    Var(∆̂) ≈ (2 Var(W) / (h²M)) · [1 − ρ(W_h, W)].

Analysing antithetic variables:

    Var(Y) = Var(X) · ½ [1 + ρ(X, X^a)].

Today: given a Y that "resembles" X,

    Var(Z_best) = Var(X) · [1 − ρ(X, Y)²].



Control variates: the idea

We want to determine by Monte Carlo simulation: E[X] = I.

Suppose: the random variable Y is
1 "closely connected to or correlated with X", and
2 E[Y] is known.

For example, if they are positively correlated:
    if a realization of Y is above E[Y], then perhaps the
    realization of X is also above E[X];
    similarly: Y < E[Y], then perhaps X < E[X];
    loosely speaking: perhaps X − Y stays close to E[X] − E[Y].

One may try Z = E[Y] + X − Y:
1 note that it is definitely an OK estimator for I:

    E[Z] = E[X] + E[Y] − E[Y] = I;

2 perhaps its variance is smaller than that of X because the
−Y term counteracts the variability in X.


A simple example

Consider once again

    I = ∫₀¹ e^√u du   or, equivalently,
    I = E[X] with X = e^√U for standard uniform U.

A candidate might be Y = e^U, because:
    one might say: the functions √u and u don't differ much on
    the interval [0, 1]; and
    E[Y] = ∫₀¹ e^u du = e − 1, so indeed E[Y] is known.

Let's try it out in Matlab: HigS222a.m compares
1 the estimate based on X against
2 the one based on Z = X + E[Y] − Y.
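
A minimal sketch in the spirit of HigS222a.m (the script itself is not
reproduced; the sample size M is an arbitrary choice):

  rng(100)
  M = 1e4;
  U = rand(M,1);
  X = exp(sqrt(U));             % crude estimator of I
  Y = exp(U);                   % control variate, E[Y] = e - 1 known
  Z = X + (exp(1) - 1) - Y;     % Z = X + E[Y] - Y has the same mean as X
  [mean(X), std(X)/sqrt(M)]     % crude estimate and its s.e.
  [mean(Z), std(Z)/sqrt(M)]     % control-variate estimate and its s.e.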


Some plots (figures not reproduced):

Left: X and Y versus U (in fact the plots of u ↦ e^√u and u ↦ e^u).
Right: simulated pairs (X, Y). They have a strong correlation.



Analysis and refinement (§22.2)

Generalisation: for any number θ the following

    Z_θ = X + θ · (E[Y] − Y)

also has the right expectation: E[Z_θ] = E[X] = I.

We might look for the best θ.
Optimisation: from Var(Z_θ) = Var(X − θY) we get:

    Var(Z_θ) = Var(X) · [1 − ρ(X, Y)²] + Var(Y) · (θ − Cov(X, Y)/Var(Y))².

The squared term shows the optimal value for θ (the one that minimizes
Var(Z_θ)): θ = Cov(X, Y)/Var(Y). The first term then shows the resulting
reduction with respect to the variance Var(X).
Next: implementation...


Implementation

In practice:
    Cov(X, Y) is not known (why?); often neither is Var(Y).
    Solution: estimate them from the simulation.

The simulation then looks like this:
1 Compute µY = E[Y].
2 Generate pairs (X1, Y1), ..., (XM, YM). (Save them.)
3 Compute the sample covariance of X and Y and the sample
variance of Y, and estimate θmin = Cov(X, Y)/Var(Y).
4 Now determine, for i = 1, ..., M: Zi = Xi + θ̂min · (µY − Yi).
5 Process Z1, ..., ZM in the usual way.

Matlab: HigS222b-c.m; a sketch follows below.
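
A sketch of these five steps for the running example (in the spirit of
HigS222b-c.m, which is not reproduced here):

  rng(100)
  M   = 1e4;
  muY = exp(1) - 1;                 % step 1: here E[Y] is known exactly
  U   = rand(M,1);                  % step 2: generate (and save) the pairs
  X   = exp(sqrt(U));
  Y   = exp(U);
  C   = cov(X,Y);                   % step 3: sample covariance matrix
  thetaHat = C(1,2)/C(2,2);         %   estimate of Cov(X,Y)/Var(Y)
  Z   = X + thetaHat*(muY - Y);     % step 4: the controlled values
  Ihat = mean(Z)                    % step 5: process in the usual way
  seI  = std(Z)/sqrt(M)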



A classic example from path-dependent options (§22.3)

Arithmetic average Asian option: payoff

    W^a := max( (1/n) ∑_{i=1}^n S(t_i) − E, 0 ),

with t_1, ..., t_n pre-determined time points.

Geometric average Asian option: payoff

    W^g := max( (∏_{i=1}^n S(t_i))^{1/n} − E, 0 )
         = max( exp( (1/n) ∑_{i=1}^n ln S(t_i) ) − E, 0 ).


Geometric type can be priced exactly. Since

    S(t_i) = S(t_0) · exp( (r − ½σ²)(t_i − t_0) + σ ∑_{j=1}^i √(t_j − t_{j−1}) · Z_j ),

the sum in the exponent of the payoff,

    ∑_{i=1}^n ln S(t_i),

is a linear combination of Z_1, ..., Z_n: it has a normal distribution
and the parameters are easily determined.
Consequence: there is a Black–Scholes formula for the geometric
version and W^g can be used as control variate for W^a; hopefully
the payoffs are (quite) correlated. Matlab: HigS223.m; a sketch follows below.
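
A hedged sketch of this control variate in the spirit of HigS223.m (not
reproduced here); the option data and the monitoring grid are illustrative
assumptions, E[W^g] is computed from the normal distribution of the
exponent as argued above, and normcdf is from the Statistics Toolbox:

  rng(100)
  S0 = 5; E = 5; r = 0.05; sigma = 0.3; T = 1;    % assumed option data
  n = 12; M = 1e4; dt = T/n; t = (1:n)'*dt;       % t_0 = 0, equally spaced t_i
  % exact E[W^g]: A = (1/n)*sum(log S(t_i)) is normal
  muA  = log(S0) + (r - 0.5*sigma^2)*mean(t);
  Cv   = sigma^2*min(repmat(t,1,n), repmat(t',n,1));  % Cov(ln S(t_i), ln S(t_j))
  varA = sum(Cv(:))/n^2;  sA = sqrt(varA);
  muY  = exp(muA + varA/2)*normcdf((muA + varA - log(E))/sA) ...
         - E*normcdf((muA - log(E))/sA);          % E[max(e^A - E, 0)], lognormal
  % simulate the paths and both payoffs
  Zn   = randn(M,n);
  logS = log(S0) + cumsum((r - 0.5*sigma^2)*dt + sigma*sqrt(dt)*Zn, 2);
  Wa = max(mean(exp(logS),2) - E, 0);             % arithmetic payoff (X)
  Wg = max(exp(mean(logS,2)) - E, 0);             % geometric payoff (Y)
  C = cov(Wa,Wg);  thetaHat = C(1,2)/C(2,2);
  Zcv   = Wa + thetaHat*(muY - Wg);
  price = exp(-r*T)*mean(Zcv)
  se    = exp(-r*T)*std(Zcv)/sqrt(M)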



Control variates: summary

We seek E[X] = I and know about Y, the control variate: the
value µY = E[Y], and that Y is "close" to X. We simulate the
pair (X, Y) and do Monte Carlo based on

    Z_θ = X + θ · (µY − Y).

The best θ is: θmin = Cov(X, Y)/Var(Y) ≡ ρ(X, Y) · σX/σY.

This leads to a reduction factor 1 − ρ²:

    Var(Z_θmin) = Var(X) · [1 − ρ(X, Y)²].

The challenge: to find a Y with a high correlation with X.



Importance sampling: the principle

For a function k : [0, 1] → R one can determine I = ∫₀¹ k(x) dx
by Monte Carlo, using that I = E[k(U)], with U ∼ U(0, 1).

Let's make explicit that there is a density term: write

    I = ∫₀¹ k(x) f(x) dx,

where f(x) = 1 for 0 ≤ x ≤ 1, f(x) = 0 elsewhere.

Suppose: g is another probability density on [0, 1] with
g(x) > 0 on the whole interval; then:

    I = ∫₀¹ k(x) (f(x)/g(x)) g(x) dx.


    I = ∫₀¹ k(x) (f(x)/g(x)) g(x) dx.

We can re-interpret this as

    I = E[ k(X) f(X)/g(X) ],

where X is a random variable with pdf g.

Values of X sampled from the interval (x, x + dx):
    occur with relative frequency g(x) dx;
    are multiplied by f(x)/g(x);
    so contribute the correct amount f(x) dx to the integral I.

Conclusion: we have found an alternative way to determine I.
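
A tiny numerical check of this re-interpretation (the choices k(x) = x²,
so that I = 1/3, and g(x) = 2x are purely illustrative, and not yet aimed
at variance reduction):

  rng(100)
  M = 1e5;
  X = sqrt(rand(M,1));      % X ~ g with g(x) = 2x on [0,1]: invert G(x) = x^2
  k = X.^2;                 % integrand values k(X)
  w = 1./(2*X);             % likelihood ratio f(X)/g(X) = 1/(2X)
  Ihat = mean(k.*w)         % close to I = 1/3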


The ratios

    w(x) = f(x)/g(x)   and   w(X) = f(X)/g(X)

are sometimes called the likelihood ratio (LR), so we write

    I = ∫₀¹ k(x) w(x) g(x) dx   and   I = E[k(X) w(X)],

where X has pdf g.

Now, why would we do all this? To get a small variance!
There is a lot of freedom in choosing g; conditions:
    we should be able to simulate from g;
    g should be positive where f is positive;
    more precisely, where k times f is non-zero.


    w(x) = f(x)/g(x)   and   I = E[k(X) w(X)], with X ∼ g.

A small variance for k(X) w(X) could be obtained if

    k(x) w(x) = k(x) f(x)/g(x) ≈ constant,

so we want g(x) ≈ constant · k(x) f(x).

(This is only possible if k ≥ 0 of course!)

(N.B. The optimal g would be g(x) = constant · k(x) f(x); this is
the so-called zero-variance distribution—compute the constant to
find out why this is not possible. It is still a very useful idea.)


The x values that contribute most to the integral

    I = ∫₀¹ k(x) f(x) dx

are those for which k(x) f(x) is large.

Summarizing, we look for a density g that
    is positive where k(x) f(x) is non-zero;
    is large where k(x) f(x) is large.

This method is called importance sampling (IS): we look for a g
that samples the important values with higher frequency.



Example: determining π

To estimate π we used Y = 4√(1 − U²) with U ∼ U(0, 1).
Then E[Y] = ∫₀¹ 4√(1 − x²) dx = π and,
from an earlier simulation, Var(Y) ≈ 0.797.

Here:
    k(x) = 4√(1 − x²) and
    f(x) = 1 for 0 ≤ x ≤ 1, f(x) = 0 elsewhere.

So, by the theory just developed, we look for a g
    positive on [0, 1];
    proportional to √(1 − x²).


We look for a g
    positive on [0, 1];
    proportional to √(1 − x²).

Note: √(1 − x²) decreases from 1 to 0 on [0, 1].
Let's try a decreasing density g(x) for 0 ≤ x ≤ 1:
1 Perhaps g(x) ∝ 1 − x²? (Details on board.)
    Results in G(x) = ½(3x − x³), 0 ≤ x ≤ 1.
    Fails, because we cannot conveniently solve G(x) = u.
2 Try g₂(x) ∝ 2 − x. (Details on board.)
    Results in G₂(x) = (4/3)x − (1/3)x², 0 ≤ x ≤ 1.
    Simulate X ∼ G₂ by X = 2 − √(4 − 3U).

Details/Matlab: SchatPi_IS2.m; a sketch follows below.
Gain: about a factor 4.
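
A sketch along the lines of SchatPi_IS2.m (the script itself is not
reproduced here). Normalizing 2 − x over [0, 1] gives g₂(x) = (2/3)(2 − x),
so the likelihood ratio is w(x) = 1/g₂(x):

  rng(100)
  M = 1e6;
  U = rand(M,1);
  X = 2 - sqrt(4 - 3*U);                    % X ~ G2 by inversion
  Y = 4*sqrt(1 - X.^2)./((2/3)*(2 - X));    % k(X)*w(X)
  piHat = mean(Y)
  sePi  = std(Y)/sqrt(M)                    % compare: crude MC has Var ≈ 0.797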


To be continued next week!

Importance sampling is like a Formula One racing car:
if you know how to drive it, you can go very fast; otherwise,
you might end up in the hay (if you're lucky).
