
Understanding bias Control variates Importance sampling

wi3425TU—Monte Carlo methods

L.E. Meester

Week 5


Week 5—Program for this week

1 Understanding bias (and trying to deal with it)
    Bias detection: analysis
    Envisioning what you can't really see
2 Control variates (Chapter 22)
    Variance reduction: correlation often is the key
    The idea
    A simple example (computational example §22.2)
    Analysis and refinement
    A classic example
    Summary
3 Importance sampling: introduction
    The principle
    Example: determining π (again)

Don't forget: do the weekly practice material and upload your
solutions and we'll provide you with feedback!

Understanding bias (and trying to deal with it)

Recall estimating derivatives (from Lecture 3):

Matlab code: DeltaCall2.m, M = 10^6; exact value: ∆ = 0.95577.

Estimates and standard errors:

    h        ∆̂(h), "right way" (2)   ∆̂(h) − ∆̂(h/10)   "right" − exact
    0.1      0.96044 ± 0.00023        0.00405            0.00466
    0.01     0.95649 ± 0.00024        0.00041            0.00072
    0.001    0.95608 ± 0.00024        0.00004            0.00031
    0.0001   0.95604 ± 0.00024                           0.00026

We see: the error caused by the finite stepsize h is roughly linear in h.
Bias "estimate": the change in the estimate, for example from
h = 0.1 to h = 0.01, coincides in order of magnitude with the bias
(that at h = 0.1!), at least while this difference is well above the
standard error. See also PoAMC 9 and 10.
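
For concreteness, a minimal sketch of such a forward-difference estimator
in which the same random numbers drive both starting values (the "right
way" of Lecture 3); DeltaCall2.m itself is not reproduced here, so the
option data S0, E, r, sigma, T below are illustrative assumptions, not
the values behind the table:

  % forward-difference Delta with common random numbers
  rng(100)                                        % fix the seed
  S0 = 5; E = 4; r = 0.05; sigma = 0.3; T = 1;    % assumed option data
  M = 1e6; h = 0.01;
  Z   = randn(M,1);                               % the SAME Z for both runs
  ST  =  S0   *exp((r - 0.5*sigma^2)*T + sigma*sqrt(T)*Z);
  STh = (S0+h)*exp((r - 0.5*sigma^2)*T + sigma*sqrt(T)*Z);
  D = exp(-r*T)*(max(STh-E,0) - max(ST-E,0))/h;   % difference quotient per path
  Delta   = mean(D)
  seDelta = std(D)/sqrt(M)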

A small bias detection quiz:

A. We estimate (some) Delta for two values of h:

>> DeltaCall2B2
M = 1000
h = 1
Delta = 0.9398
seDelta = 0.0093
h = 0.5000
Delta = 0.9208
seDelta = 0.0105

Does the output show clear evidence of bias? NO: the change in the
estimate (0.0190) is not much larger than its standard error (about
0.014), so it could well be random error.


B. Another run:

>> DeltaCall2B2
M = 100000
h = 1
Delta = 0.9300
seDelta = 0.0010
h = 0.5000
Delta = 0.9094
seDelta = 0.0011

Does the output show clear evidence of bias? YES: the change in the
estimate (0.0206) is about fourteen times its standard error.


Can we say anything about how big it is?



Bias detection: analysis


In the estimates we produce we can distinguish three components:

    true value + bias + random error.

Let's call:
    ∆0 : the true value;
    ∆̂(h) : the estimate at stepsize h;
    bias(h) : the bias at stepsize h (we don't know this);
    ε1, ε2 : the (random) simulation errors (indication of size: the s.e.);
so in formulas (for h = 0.1 and 0.01):

    ∆̂(0.1) = ∆0 + bias(0.1) + ε1,
    ∆̂(0.01) = ∆0 + bias(0.01) + ε2,

and their difference:

    ∆̂(0.1) − ∆̂(0.01) = bias(0.1) − bias(0.01) + ε1 − ε2.

    ∆̂(0.1) − ∆̂(0.01) = bias(0.1) − bias(0.01) + ε1 − ε2.

Note: we know none of the terms on the right, only their sum...
However: the s.e.'s provide an indication of the "size" of ε1 and ε2.
If the standard errors are
    small: then ∆̂(0.1) − ∆̂(0.01) ≈ bias(0.1) − bias(0.01),
    and we "see" how big the reduction in bias is;
    big: the random errors dominate; the reduction in bias is
    smaller than they are, and nothing more can be said.
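
In Matlab, the check is one line of arithmetic (numbers from the table on
the earlier slide; assuming the two runs are independent, the s.e. of the
difference follows by adding the variances):

  d   = 0.96044 - 0.95649              % change in the estimate: 0.00395
  sed = sqrt(0.00023^2 + 0.00024^2)    % s.e. of the difference: about 0.00033
  % d is roughly twelve times sed: it is dominated by bias(0.1) - bias(0.01)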



Seeing what you can’t see

Next, plots for M = 10^3, 10^4 and 10^5, each time:

1 single estimates with a bar the size of 1 s.e., for h = 0.1 and 0.01;
2 superimposed: histograms of one thousand simulations.

In practice:
1 is all the information you have; this is all you know;
2 is what you should try to imagine/visualise; you know it's there, your
results are just "one set from the histograms."

[Figures (not reproduced): for M = 10^3, 10^4 and 10^5, the single
estimates with 1-s.e. bars and the superimposed histograms of one
thousand simulations, h = 0.1 in red, h = 0.01 in blue.]

The quiz, question B again:

>> DeltaCall2B2
M = 100000
h = 1
Delta = 0.9300
seDelta = 0.0010
h = 0.5000
Delta = 0.9094
seDelta = 0.0011

For h = 1 there is clearly bias.


How big the bias is for h = 0.5 one cannot say, but in this
situation one should probably prefer reducing the bias further
over reducing the random error by increasing M.
If the bias is larger than the standard error,
increasing M will not improve the estimate.

Two key things about bias

9 Bias. Some methods produce biased estimates: there is a
systematic deviation with respect to the unknown quantity
you are estimating. Make sure you are aware of this.

10 Accuracy in the presence of bias. The Monte Carlo rule
"one hundred times as many replications gives me an
additional digit of accuracy" is no longer true when there is
bias, so then blindly going for as large an M as possible is
senseless. It is better to find a balance between the size of the
(remaining) bias and the standard error. This is a hard and
sometimes unsolvable problem—do what you can.



Variance reduction: correlation often is the key

Determining derivatives/Greeks, we found:

    Var(∆̂) ≈ (2 Var(W) / (h²M)) · [1 − ρ(W_h, W)].

Analysing antithetic variables:

    Var(Y) = Var(X) · ½ [1 + ρ(X, X^a)].

Today: given a Y that "resembles" X,

    Var(Z_best) = Var(X) · [1 − ρ(X, Y)²].



Control variates: the idea

We want to determine by Monte Carlo simulation: E[X] = I.

Suppose: the random variable Y is
1 "closely connected to or correlated with X", and
2 E[Y] is known.

For example, if they are positively correlated:
    if a realization of Y is above E[Y], then perhaps the
    realization of X is also above E[X];
    similarly: Y < E[Y], then perhaps X < E[X];
    loosely speaking: perhaps X − Y stays close to E[X] − E[Y].

One may try Z = E[Y] + X − Y:
1 note that it is definitely an OK estimator for I:

    E[Z] = E[X] + E[Y] − E[Y] = I;

2 perhaps its variance is smaller than that of X because the
−Y term counteracts the variability in X.


A simple example

Consider once again

    I = ∫₀¹ e^√u du   or, equivalently,
    I = E[X] with X = e^√U for standard uniform U.

A candidate might be Y = e^U, because:
    one might say: the functions √u and u don't differ much on
    the interval [0, 1]; and
    E[Y] = ∫₀¹ e^u du = e − 1, so indeed E[Y] is known.

Let's try it out in Matlab: HigS222a.m compares
1 the estimate based on X against
2 the one based on Z = X + E[Y] − Y.
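
A minimal sketch in the spirit of HigS222a.m (the script itself is not
reproduced; the sample size M is an arbitrary choice):

  rng(100)
  M = 1e4;
  U = rand(M,1);
  X = exp(sqrt(U));             % crude estimator of I
  Y = exp(U);                   % control variate, E[Y] = e - 1 known
  Z = X + (exp(1) - 1) - Y;     % Z = X + E[Y] - Y has the same mean as X
  [mean(X), std(X)/sqrt(M)]     % crude estimate and its s.e.
  [mean(Z), std(Z)/sqrt(M)]     % control-variate estimate and its s.e.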


Some plots (figures not reproduced):

Left: X and Y versus U (in fact the plots of u ↦ e^√u and u ↦ e^u).
Right: simulated pairs (X, Y). They have a strong correlation.



Analysis and refinement (§22.2)

Generalisation: for any number θ the following

    Z_θ = X + θ · (E[Y] − Y)

also has the right expectation: E[Z_θ] = E[X] = I.

We might look for the best θ.
Optimisation: from Var(Z_θ) = Var(X − θY) we get:

    Var(Z_θ) = Var(X) · [1 − ρ(X, Y)²] + Var(Y) · (θ − Cov(X, Y)/Var(Y))².

The squared term shows the optimal value for θ (the one that minimizes
Var(Z_θ)): θ = Cov(X, Y)/Var(Y). The first term then shows the resulting
reduction with respect to the variance Var(X).
Next: implementation...


Implementation

In practice:
    Cov(X, Y) is not known (why?); often neither is Var(Y).
    Solution: estimate them from the simulation.

The simulation then looks like this:
1 Compute µY = E[Y].
2 Generate pairs (X1, Y1), ..., (XM, YM). (Save them.)
3 Compute the sample covariance of X and Y and the sample
variance of Y, and estimate θmin = Cov(X, Y)/Var(Y).
4 Now determine, for i = 1, ..., M: Zi = Xi + θ̂min · (µY − Yi).
5 Process Z1, ..., ZM in the usual way.

Matlab: HigS222b-c.m; a sketch follows below.
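
A sketch of these five steps for the running example (in the spirit of
HigS222b-c.m, which is not reproduced here):

  rng(100)
  M   = 1e4;
  muY = exp(1) - 1;                 % step 1: here E[Y] is known exactly
  U   = rand(M,1);                  % step 2: generate (and save) the pairs
  X   = exp(sqrt(U));
  Y   = exp(U);
  C   = cov(X,Y);                   % step 3: sample covariance matrix
  thetaHat = C(1,2)/C(2,2);         %   estimate of Cov(X,Y)/Var(Y)
  Z   = X + thetaHat*(muY - Y);     % step 4: the controlled values
  Ihat = mean(Z)                    % step 5: process in the usual way
  seI  = std(Z)/sqrt(M)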



A classic example from path-dependent options (§22.3)

Arithmetic average Asian option: payoff

    W^a := max( (1/n) ∑_{i=1}^n S(t_i) − E, 0 ),

with t_1, ..., t_n pre-determined time points.

Geometric average Asian option: payoff

    W^g := max( (∏_{i=1}^n S(t_i))^{1/n} − E, 0 )
         = max( exp( (1/n) ∑_{i=1}^n ln S(t_i) ) − E, 0 ).


Geometric type can be priced exactly. Since

    S(t_i) = S(t_0) · exp( (r − ½σ²)(t_i − t_0) + σ ∑_{j=1}^i √(t_j − t_{j−1}) · Z_j ),

the sum in the exponent of the payoff,

    ∑_{i=1}^n ln S(t_i),

is a linear combination of Z_1, ..., Z_n: it has a normal distribution
and the parameters are easily determined.
Consequence: there is a Black–Scholes formula for the geometric
version and W^g can be used as control variate for W^a; hopefully
the payoffs are (quite) correlated. Matlab: HigS223.m; a sketch follows below.
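
A hedged sketch of this control variate in the spirit of HigS223.m (not
reproduced here); the option data and the monitoring grid are illustrative
assumptions, E[W^g] is computed from the normal distribution of the
exponent as argued above, and normcdf is from the Statistics Toolbox:

  rng(100)
  S0 = 5; E = 5; r = 0.05; sigma = 0.3; T = 1;    % assumed option data
  n = 12; M = 1e4; dt = T/n; t = (1:n)'*dt;       % t_0 = 0, equally spaced t_i
  % exact E[W^g]: A = (1/n)*sum(log S(t_i)) is normal
  muA  = log(S0) + (r - 0.5*sigma^2)*mean(t);
  Cv   = sigma^2*min(repmat(t,1,n), repmat(t',n,1));  % Cov(ln S(t_i), ln S(t_j))
  varA = sum(Cv(:))/n^2;  sA = sqrt(varA);
  muY  = exp(muA + varA/2)*normcdf((muA + varA - log(E))/sA) ...
         - E*normcdf((muA - log(E))/sA);          % E[max(e^A - E, 0)], lognormal
  % simulate the paths and both payoffs
  Zn   = randn(M,n);
  logS = log(S0) + cumsum((r - 0.5*sigma^2)*dt + sigma*sqrt(dt)*Zn, 2);
  Wa = max(mean(exp(logS),2) - E, 0);             % arithmetic payoff (X)
  Wg = max(exp(mean(logS,2)) - E, 0);             % geometric payoff (Y)
  C = cov(Wa,Wg);  thetaHat = C(1,2)/C(2,2);
  Zcv   = Wa + thetaHat*(muY - Wg);
  price = exp(-r*T)*mean(Zcv)
  se    = exp(-r*T)*std(Zcv)/sqrt(M)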



Control variates: summary

We seek E[X] = I and know about Y, the control variate: the
value µY = E[Y], and that Y is "close" to X. We simulate the
pair (X, Y) and do Monte Carlo based on

    Z_θ = X + θ · (µY − Y).

The best θ is: θmin = Cov(X, Y)/Var(Y) ≡ ρ(X, Y) · σX/σY.

This leads to a reduction factor 1 − ρ²:

    Var(Z_θmin) = Var(X) · [1 − ρ(X, Y)²].

The challenge: to find a Y with a high correlation with X.



Importance sampling: the principle

For a function k : [0, 1] → R one can determine I = ∫₀¹ k(x) dx
by Monte Carlo, using that I = E[k(U)], with U ∼ U(0, 1).

Let's make explicit that there is a density term: write

    I = ∫₀¹ k(x) f(x) dx,

where f(x) = 1 for 0 ≤ x ≤ 1, f(x) = 0 elsewhere.

Suppose: g is another probability density on [0, 1] with
g(x) > 0 on the whole interval; then:

    I = ∫₀¹ k(x) (f(x)/g(x)) g(x) dx.


    I = ∫₀¹ k(x) (f(x)/g(x)) g(x) dx.

We can re-interpret this as

    I = E[ k(X) f(X)/g(X) ],

where X is a random variable with pdf g.

Values of X sampled from the interval (x, x + dx):
    occur with relative frequency g(x) dx;
    are multiplied by f(x)/g(x);
    so contribute the correct amount f(x) dx to the integral I.

Conclusion: we have found an alternative way to determine I.
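
A tiny numerical check of this re-interpretation (the choices k(x) = x²,
so that I = 1/3, and g(x) = 2x are purely illustrative, and not yet aimed
at variance reduction):

  rng(100)
  M = 1e5;
  X = sqrt(rand(M,1));      % X ~ g with g(x) = 2x on [0,1]: invert G(x) = x^2
  k = X.^2;                 % integrand values k(X)
  w = 1./(2*X);             % likelihood ratio f(X)/g(X) = 1/(2X)
  Ihat = mean(k.*w)         % close to I = 1/3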


The ratios

    w(x) = f(x)/g(x)   and   w(X) = f(X)/g(X)

are sometimes called the likelihood ratio (LR), so we write

    I = ∫₀¹ k(x) w(x) g(x) dx   and   I = E[k(X) w(X)],

where X has pdf g.

Now, why would we do all this? To get a small variance!
There is a lot of freedom in choosing g; conditions:
    we should be able to simulate from g;
    g should be positive where f is positive;
    more precisely, where k times f is non-zero.


    w(x) = f(x)/g(x)   and   I = E[k(X) w(X)], with X ∼ g.

A small variance for k(X) w(X) could be obtained if

    k(x) w(x) = k(x) f(x)/g(x) ≈ constant,

so we want g(x) ≈ constant · k(x) f(x).

(This is only possible if k ≥ 0 of course!)

(N.B. The optimal g would be g(x) = constant · k(x) f(x); this is
the so-called zero-variance distribution—compute the constant to
find out why this is not possible. It is still a very useful idea.)


The x values that contribute most to the integral

    I = ∫₀¹ k(x) f(x) dx

are those for which k(x) f(x) is large.

Summarizing, we look for a density g that
    is positive where k(x) f(x) is non-zero;
    is large where k(x) f(x) is large.

This method is called importance sampling (IS): we look for a g
that samples the important values with higher frequency.



Example: determining π

To estimate π we used Y = 4√(1 − U²) with U ∼ U(0, 1).
Then E[Y] = ∫₀¹ 4√(1 − x²) dx = π and,
from an earlier simulation, Var(Y) ≈ 0.797.

Here:
    k(x) = 4√(1 − x²) and
    f(x) = 1 for 0 ≤ x ≤ 1, f(x) = 0 elsewhere.

So, by the theory just developed, we look for a g
    positive on [0, 1];
    proportional to √(1 − x²).


We look for a g
    positive on [0, 1];
    proportional to √(1 − x²).

Note: √(1 − x²) decreases from 1 to 0 on [0, 1].
Let's try a decreasing density g(x) for 0 ≤ x ≤ 1:
1 Perhaps g(x) ∝ 1 − x²? (Details on board.)
    Results in G(x) = ½(3x − x³), 0 ≤ x ≤ 1.
    Fails, because we cannot conveniently solve G(x) = u.
2 Try g₂(x) ∝ 2 − x. (Details on board.)
    Results in G₂(x) = (4/3)x − (1/3)x², 0 ≤ x ≤ 1.
    Simulate X ∼ G₂ by X = 2 − √(4 − 3U).

Details/Matlab: SchatPi_IS2.m; a sketch follows below.
Gain: about a factor 4.
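
A sketch along the lines of SchatPi_IS2.m (the script itself is not
reproduced here). Normalizing 2 − x over [0, 1] gives g₂(x) = (2/3)(2 − x),
so the likelihood ratio is w(x) = 1/g₂(x):

  rng(100)
  M = 1e6;
  U = rand(M,1);
  X = 2 - sqrt(4 - 3*U);                    % X ~ G2 by inversion
  Y = 4*sqrt(1 - X.^2)./((2/3)*(2 - X));    % k(X)*w(X)
  piHat = mean(Y)
  sePi  = std(Y)/sqrt(M)                    % compare: crude MC has Var ≈ 0.797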


To be continued next week!

Importance sampling is like a Formula One racing car:
if you know how to drive it, you can go very fast; otherwise,
you might end up in the hay (if you're lucky).
