Random Number Generation
When a ≠ 0, the constant m may be non-prime. The standard C library function
drand48() is an LCG with m = 2^48, a = 11, and b = 25214903917. This random
number generator can be shown to have period 2^48. The following shows a sequence
from this LCG (the results differ from the results of drand48() due to the use of the
shuffle box described below).
x_i                  a + b·x_i                        x_{i+1}            x_{i+1}/2^48
0 11 11 0.000000
11 277363943098 277363943098 0.000985
277363943098 6993705175256325314877 11718085204285 0.041631
11718085204285 295470392517265591684356 49720483695876 0.176643
49720483695876 1253697219098278389146303 102626409374399 0.364602
102626409374399 2587715051722178864620894 25707281917278 0.091331
25707281917278 648206643511396312177937 25979478236433 0.092298
25979478236433 655070047545450703808072 137139456763464 0.487217
137139456763464 3457958225520320556088499 148267022728371 0.526750
148267022728371 3738538732155529954929218 127911637363266 0.454433
127911637363266 3225279645980899415312933 65633894156837 0.233178
65633894156837 1654952334863192683630540 233987836661708 0.831292
233987836661708 5899980819171657253110247 262259097190887 0.931731
262259097190887 6612837937027380313004390 159894566279526 0.568060
159894566279526 4031726125588636254303353 156526639281273 0.556094
156526639281273 3946804169928216632446352 14307911880080 0.050832
14307911880080 360772623309120026273371 215905707320923 0.767051
215905707320923 5444041665228996928755402 5324043867850 0.018915
5324043867850 134245254577730795368461 71032958119949 0.252360
71032958119949 1791089213934798995940244 83935042429844 0.298197
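As a check on the parameters above, the recursion x_{i+1} = (a + b·x_i) mod 2^48 can be sketched in a few lines of Python; starting from x_0 = 0 it reproduces the first column of the table (0, 11, 277363943098, ...):

```python
# Sketch of the LCG described above:
# x_{i+1} = (a + b*x_i) mod 2^48, with a = 11, b = 25214903917.
M = 2**48
A = 11
B = 25214903917

def lcg_sequence(seed, n):
    """Return [x_0, x_1, ..., x_n] for the LCG starting at x_0 = seed."""
    xs = [seed]
    for _ in range(n):
        xs.append((A + B * xs[-1]) % M)
    return xs

xs = lcg_sequence(0, 5)
us = [x / M for x in xs]  # uniform draws on [0, 1), as in the last column
```

Dividing by 2^48 gives the uniform values shown in the last column of the table.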
A rule of the form x_{n+1} = a + b·x_n^{-1} (mod m), where the inverse x^{-1} satisfies
m | (x·x^{-1} - 1), is called an inversive congruential generator, or ICG. ICGs have
slightly better statistical properties than LCGs, but the computation of x_n^{-1} is
expensive.
All LCGs exhibit some positive autocorrelation. In particular, with the MCG, an
extremely small value is always followed by another small value, and this is true of the
LCG as well if a is small relative to m. For example, in the MCG given above, the
frequency of values less than 10000 is 4.7 × 10^-6, but such a value is always followed
by a value less than 0.07·m.
The output of LCGs "falls mainly in the planes." This means that if consecutive
values are binned into k-tuples z_j = (x_{kj}, x_{kj+1}, ..., x_{kj+k-1}), then the z_j
tend to fall into hyperplanes. There are about m^{1/k} distinct hyperplanes for most
LCGs.
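The k = 2 case is easy to see with a deliberately small LCG (toy parameters a = 1, b = 3, m = 64, chosen only for illustration): every consecutive pair (x_i, x_{i+1}) satisfies x_{i+1} = b·x_i + a - t·m for a small integer t, so the pairs fall on just a few parallel lines.

```python
# Toy LCG to illustrate the hyperplane (k = 2) structure.
a, b, m = 1, 3, 64

x = 0
pairs = []
for _ in range(200):
    x_next = (a + b * x) % m
    pairs.append((x, x_next))
    x = x_next

# Each pair lies on a line x_{i+1} = b*x_i + a - t*m, so the intercepts
# (x_{i+1} - b*x_i) can take only a few distinct values: one per line.
intercepts = {x2 - b * x1 for (x1, x2) in pairs}
```

With b = 3 and m = 64 the pairs land on at most three lines, far fewer than a truly random scatter of points would occupy.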
A shuffle box can break up low-order serial correlations and destroy the hyperplane
structure. To initialize, fill an array [v_1, ..., v_32] with random numbers, and let k
denote another random number. To generate one draw, let j = ⌈32·k/m⌉, output v_j,
set k ← v_j, and replace v_j with the next value from the underlying generator.
The Box-Muller method transforms a pair of independent uniform draws into a pair
of independent Gaussian draws. If x_1 and x_2 are independent uniform draws on
(0, 1), let

    g_1(x_1, x_2) = √(-2·log(x_1)) · cos(2π·x_2)
    g_2(x_1, x_2) = √(-2·log(x_1)) · sin(2π·x_2).

It is easy to verify that |J(G)| = (1/2π)·exp(-G_1^2/2)·exp(-G_2^2/2), so (g_1, g_2)
is a pair of independent standard normal draws. This is called the Box-Muller method
for generating Gaussian draws.
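A minimal Python sketch of the transform above, using only the standard library:

```python
import math
import random

def box_muller(rng):
    """Return a pair of independent standard normal draws."""
    x1 = rng.random()
    while x1 == 0.0:      # x1 = 0 would break the log below
        x1 = rng.random()
    x2 = rng.random()
    r = math.sqrt(-2.0 * math.log(x1))
    return r * math.cos(2.0 * math.pi * x2), r * math.sin(2.0 * math.pi * x2)

rng = random.Random(0)
draws = [g for _ in range(20000) for g in box_muller(rng)]
mean = sum(draws) / len(draws)
var = sum((g - mean) ** 2 for g in draws) / len(draws)
```

The sample mean and variance of the 40000 draws should be close to 0 and 1 respectively.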
Many random variables are expressible as functions of other random variables, pro-
viding a way to simulate them. Examples include the Cauchy distribution (Z = X/Y,
where X and Y are independent and normal), and the χ^2_p distribution
(Z = Σ_{i=1}^p X_i^2, where the X_i are independent and normal; note that this is
not an efficient way to simulate a χ^2 variate).
A Bernoulli trial with success probability p can be obtained by rounding: B = 1(U <
p). Binomial draws can be obtained by summing independent Bernoulli draws (although
there are much better ways to simulate a binomial draw).
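In Python, these two recipes look as follows (a sketch; the binomial-by-summing version is the slow O(n) method the text warns about):

```python
import random

def bernoulli(p, rng):
    """One Bernoulli(p) draw by rounding a uniform: B = 1(U < p)."""
    return 1 if rng.random() < p else 0

def binomial(n, p, rng):
    """One Binomial(n, p) draw as a sum of n Bernoulli trials."""
    return sum(bernoulli(p, rng) for _ in range(n))

rng = random.Random(1)
draws = [binomial(20, 0.3, rng) for _ in range(5000)]
mean = sum(draws) / len(draws)  # should be near n*p = 6
```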
Rejection method
Suppose that
1. We want to simulate from the density π, which we can evaluate as a function.
2. There is a density f whose sample space contains the sample space of π, and we
can easily simulate from f.
3. There is a constant c such that sup_x π(x)/f(x) < c. Note that we must have c ≥ 1
since both π and f are densities.
Under these circumstances, we can generate a candidate draw from f (which is called
the trial distribution), and make a random decision as to whether we will accept or
reject the draw. If we specify the probability of accepting the draw in a certain way,
then the marginal distribution of the draws that are accepted will be π.

To carry out rejection sampling, generate Z according to f, and with probability
π(Z)/(c·f(Z)) use Z as the next draw. Otherwise, reject it and draw a new Z. The
resulting distribution has density π:

    P(Z|accept) = P(accept|Z)·P(Z)/P(accept)
                = [π(Z)/(c·f(Z))]·f(Z)/P(accept)
                = π(Z)/(c·P(accept))
                = π(Z).

Since P(Z|accept) and π(Z) are both densities in Z, it follows that P(accept) = 1/c.
For example, φ(x) = exp(-x^2/2)/√(2π) is the standard normal density, and
f(x) = 1/(π(1 + x^2)) is the Cauchy density. It can be shown that

    φ(x)/f(x) ≤ √(2π/e).
Therefore if we simulate a Cauchy draw Z (e.g. using tan(π·U) where U is uniform),
and then accept it with probability

    [exp(-Z^2/2)/√(2π)] / [√(2π/e) · 1/(π(1 + Z^2))] = exp(-Z^2/2)·(1 + Z^2)·√e/2,

then the draws that are accepted will be iid standard normal.
The efficiency of rejection sampling is determined by P(accept) = 1/c; larger bounds c
yield less efficient schemes. In the limit, if π(x)/f(x) is unbounded, rejection sampling
cannot be used. For example, if π(x) is Cauchy and f(x) is Gaussian, rejection
sampling cannot be used.
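The normal-from-Cauchy example can be sketched directly; the acceptance probability exp(-z^2/2)(1 + z^2)√e/2 equals 1 at z = ±1 and never exceeds 1, and the overall acceptance rate should be near 1/c = √(e/2π) ≈ 0.66:

```python
import math
import random

def normal_by_rejection(rng):
    """Standard normal draw by rejection from a Cauchy trial density."""
    while True:
        z = math.tan(math.pi * rng.random())  # Cauchy draw (tan has period pi)
        # Acceptance probability pi(z)/(c*f(z)) = exp(-z^2/2)(1+z^2)sqrt(e)/2.
        p_accept = math.exp(-z * z / 2.0) * (1.0 + z * z) * math.sqrt(math.e) / 2.0
        if rng.random() < p_accept:
            return z

rng = random.Random(2)
draws = [normal_by_rejection(rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((z - mean) ** 2 for z in draws) / len(draws)
```

The accepted draws should have sample mean near 0 and sample variance near 1.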
Suppose we cannot find a tight upper bound for π(Z)/f(Z), but we know π(Z)/f(Z) ≤
c'. Suppose that c is the tight upper bound. Since P(accept) = 1/c, the expected
number of trials that must be made to get one accepted value is c. Thus the rejection
sampling scheme using c' requires c'/c times as many trials as the scheme using c.

Similarly, suppose we can evaluate π and f only up to unknown constants a and b,
i.e. we can compute a·π(x) and b·f(x), and we have a bound a·π(x)/(b·f(x)) < c.
Then the rejection sampling as described above still works, accepting each draw with
probability a·π(Z)/(c·b·f(Z)):

    P(Z|accept) = P(accept|Z)·P(Z)/P(accept)
                = [a·π(Z)/(c·b·f(Z))]·f(Z)/P(accept)
                = π(Z).

However we now have P(accept) = a/(b·c), so we do not know the marginal acceptance
rate.
We can rejection sample from a continuous trial distribution and achieve a discrete
distribution. Suppose p(k) (k = 1, 2, ...) is a probability mass function. We can define
a density p̃(x) = p(⌈x⌉) for x > 0. If we sample x ~ p̃ via rejection sampling, then
⌈x⌉ has mass function p.
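For example (a sketch, not from the text): take p(k) = 2^-k, the geometric(1/2) mass function, so p̃(x) = 2^-⌈x⌉ for x > 0. An Exponential(log 2) trial density f(x) = log(2)·2^-x gives p̃(x)/f(x) ≤ 1/log 2, so c = 1/log 2 and the acceptance probability simplifies to p̃(x)/(c·f(x)) = 2^(x-⌈x⌉):

```python
import math
import random

def geometric_half(rng):
    """Geometric(1/2) draw: rejection sample the density ptilde, take the ceiling."""
    while True:
        u = 1.0 - rng.random()            # in (0, 1]
        x = -math.log(u) / math.log(2.0)  # Exponential(log 2) trial draw
        if x == 0.0:
            continue
        # Accept with probability ptilde(x)/(c*f(x)) = 2^(x - ceil(x)).
        if rng.random() < 2.0 ** (x - math.ceil(x)):
            return math.ceil(x)

rng = random.Random(3)
draws = [geometric_half(rng) for _ in range(20000)]
mean = sum(draws) / len(draws)  # geometric(1/2) has mean 2
```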
Suppose we want to sample from a binomial distribution B(n, p). If n is small then
we can simply add independent Bernoulli trials. If n is large, we can use rejection
sampling from a generalized Cauchy distribution with μ = np.

Simulating a Gamma distribution using rejection sampling (method 1): use a standard
Cauchy trial density f(x) = 1/(π(1 + x^2)). For the Γ(α) density
π(x) = x^{α-1}·exp(-x)/Γ(α), using 1 + x^2 ≤ 2x^2,

    π(x)/f(x) = π·(1 + x^2)·x^{α-1}·exp(-x)/Γ(α)
              ≤ (2π/Γ(α))·exp((α + 1)·log(x) - x)
              ≤ (2π/Γ(α))·exp((α + 1)·(log(α + 1) - 1)) = c.

Note that c goes to ∞ like n^n/n!, so the efficiency of this method is poor for large
α (i.e., for α = 1, 1/c ≈ .294; α = 10, 1/c ≈ .012; α = 100, 1/c ≈ .0004).
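These efficiency values can be checked numerically (assuming the reconstructed bound c = 2π·exp((α + 1)(log(α + 1) - 1))/Γ(α); computed in log space via lgamma to avoid overflow):

```python
import math

def c_bound(alpha):
    """Rejection constant for Gamma(alpha) with a standard Cauchy trial density."""
    return 2.0 * math.pi * math.exp((alpha + 1.0) * (math.log(alpha + 1.0) - 1.0)
                                    - math.lgamma(alpha))

# Acceptance rates 1/c for the shape parameters quoted in the text.
rates = {a: 1.0 / c_bound(a) for a in (1, 10, 100)}
```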
Simulating a Gamma distribution using rejection sampling (method 2): use rejection
sampling from a generalized Cauchy trial distribution. This distribution has density:

    p(x) = (1/(π·σ)) · 1/(1 + ((x - μ)/σ)^2).

We now have the opportunity to optimize the acceptance rate over μ and σ. To
sample from the generalized Cauchy distribution, sample Z from a standard Cauchy
distribution, and transform via Z ← σ·Z + μ.

It is proved in the following paper that the following values of μ, σ, and c are opti-
mal: J.H. Ahrens and U. Dieter. Generating gamma variates by a modified rejection
technique. Comm. ACM, 25(1):47-54, January 1982.

    μ = α - 1
    σ = √(2α - 1)
    c = π·σ·exp((α - 1)·(log(α - 1) - 1))/Γ(α).
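A sketch of method 2 in Python with μ = α - 1 and σ = √(2α - 1); to stay safe about the exact constant, the bound c is found here by a crude numerical maximization of the density ratio (with a small safety margin) rather than taken from a formula:

```python
import math
import random

ALPHA = 3.0
MU = ALPHA - 1.0
SIGMA = math.sqrt(2.0 * ALPHA - 1.0)

def gamma_density(x):
    """Gamma(ALPHA) density (zero for x <= 0)."""
    if x <= 0.0:
        return 0.0
    return math.exp((ALPHA - 1.0) * math.log(x) - x - math.lgamma(ALPHA))

def gc_density(x):
    """Generalized Cauchy trial density with location MU and scale SIGMA."""
    u = (x - MU) / SIGMA
    return 1.0 / (math.pi * SIGMA * (1.0 + u * u))

# Crude numerical bound on the density ratio over a grid, padded by 1%.
c = 1.01 * max(gamma_density(0.001 * i) / gc_density(0.001 * i)
               for i in range(1, 40000))

def gamma_by_rejection(rng):
    while True:
        # Generalized Cauchy draw: scale and shift a standard Cauchy draw.
        z = MU + SIGMA * math.tan(math.pi * (rng.random() - 0.5))
        if z > 0 and rng.random() < gamma_density(z) / (c * gc_density(z)):
            return z

rng = random.Random(4)
draws = [gamma_by_rejection(rng) for _ in range(20000)]
mean = sum(draws) / len(draws)  # Gamma(3) has mean 3
```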
Simulating a Gamma distribution using rejection sampling (method 3): Let

    f(x) = 4·exp(-α)·α^{α+λ}·x^{λ-1} / (Γ(α)·(α^λ + x^λ)^2),

where λ = √(2α - 1).
The accepted draws will be distributed according to the density on the right, below.
[Figure: two panels on [-2, 2]; the standard normal density together with the envelope
1.2*Cauchy, and the density of the accepted draws.]
Importance sampling
One of the main applications of simulation is to estimate a population mean (an in-
tegral) using the sample mean. Given an arbitrary integration problem ∫ h(x)·ω(x)dx,
where ω > 0 is a weighting function, we can consider the integral as an expectation
with respect to a density f by writing

    ∫ h(x)·ω(x)dx = ∫ [h(x)·ω(x)/f(x)]·f(x)dx,
where it is necessary to select f so that h·ω/f is integrable with respect to f. If we
have an iid sample Z_1, ..., Z_n from f, then n^{-1}·Σ_i h(Z_i)·ω(Z_i)/f(Z_i) is a
consistent and unbiased estimate of ∫ h(x)·ω(x)dx. Writing w_i = ω(Z_i)/f(Z_i) for
the importance weights, the estimate is the weighted average n^{-1}·Σ_i w_i·h(Z_i).
The Z_i with small importance weights would likely be rejected under rejection
sampling, but under importance sampling we allow them to contribute a small degree
to the approximation.
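A one-dimensional sketch of this identity in Python: estimate ∫_{-5}^{5} exp(-|x|)dx = 2(1 - e^{-5}) ≈ 1.9865 using a uniform trial density on [-5, 5], so f(x) = 1/10 and every importance weight is 10:

```python
import math
import random

rng = random.Random(5)
n = 20000
total = 0.0
for _ in range(n):
    z = 10.0 * rng.random() - 5.0   # trial draw from Uniform(-5, 5)
    w = 10.0                        # importance weight 1/f(z)
    total += w * math.exp(-abs(z))  # w * h(z)
estimate = total / n
true_value = 2.0 * (1.0 - math.exp(-5.0))
```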
As a simple example, consider ∫∫ exp(-|x| - |y|)dxdy over the region [-5, 5]^2. The
true value is 4·(1 - exp(-5))^2 ≈ 3.95. The following two figures show the convergence
behavior of the importance sampling estimate based on a uniform trial density on
[-5, 5]^2, and on a bivariate standard normal trial density truncated to [-5, 5]^2. For
the uniform trial density, the importance weights are w_i = 100, and for the normal
trial density the importance weights are w_i = 2π·(F(5) - F(-5))^2·exp((z_1^2 + z_2^2)/2),
where F is the standard normal CDF.
## Do 1000 replicates using a uniform [-5,5]^2 trial density.
for r=1:1000
  ## Simulate from the trial density.
  Z = 10*rand(1000,2) - 5;
  ## The integrand values at the points in Z.
  F = e.^(-abs(Z(:,1)) - abs(Z(:,2)));
  ## Estimate the integral using weights equal to 100.
  I1(r) = 100*sum(F)/1000;
endfor

## Do 1000 replicates using a truncated standard normal trial density.
for r=1:1000
  ## Simulate from a truncated normal on [-5,5]^2.
  Z = [];
  while (1)
    X = randn(1000,2);
    ## Keep the rows with both coordinates in [-5,5] (row-wise max).
    ii = find(max(abs(X), [], 2) <= 5);
    Z = [Z; X(ii,:)];
    if (size(Z,1) >= 1000)
      break;
    endif
  endwhile
  Z = Z(1:1000,:);
  ## Importance weights for the truncated normal trial density.
  W = 2*pi*(normal_cdf(5) - normal_cdf(-5))^2*exp((Z(:,1).^2+Z(:,2).^2)/2);
  ## The integrand values at the points in Z.
  F = e.^(-abs(Z(:,1)) - abs(Z(:,2)));
  ## Estimate the integral.
  I2(r) = dot(F, W) / 1000;
endfor
The efficiency of importance sampling depends on the skew of the weights. If the
weights are highly skewed, then the sample mean is mostly determined by just a few
values, so the usual √n convergence will not hold. From survey sampling, there is a
notion of effective sample size (ESS), given by the formula

    ESS = n / (1 + var(w)),

where n is the sample size and the weights w_i are normalized to have mean one; the
weighted sample mean should converge at rate √ESS rather than √n. If the weighting
function ω is itself a density, then E_f w = 1 and var_f(w_i) = E_ω(w_i) - 1.
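The ESS formula, with the weights normalized to have mean one, can be sketched as:

```python
def effective_sample_size(weights):
    """ESS = n / (1 + var(w)), with the weights normalized to mean 1."""
    n = len(weights)
    mean = sum(weights) / n
    w = [x / mean for x in weights]
    var = sum((x - 1.0) ** 2 for x in w) / n
    return n / (1.0 + var)

flat = effective_sample_size([1.0, 1.0, 1.0, 1.0])    # no skew: ESS = n
skewed = effective_sample_size([4.0, 0.0, 0.0, 0.0])  # all mass on one draw
```

Equal weights give ESS = n, while putting all the weight on a single draw gives ESS = 1.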
Example: Suppose we wish to compute the expectation Eh(X), where X has a
Gamma(α, β) distribution. One option is to use rejection sampling from one of the
trial densities given above to obtain an iid sample X_1, ..., X_n from the Gamma(α, β)
distribution, then estimate the expectation using the simple average n^{-1}·Σ_i h(X_i).

Another option is to simulate X_1, ..., X_n from one of the trial densities given above
(call it f), then calculate weights w_i = exp(-X_i/β)·X_i^{α-1}/(β^α·Γ(α)·f(X_i)). In
this case, the estimate of Eh(X) is the weighted average n^{-1}·Σ_i w_i·h(X_i).
Suppose we generate Z_1, ..., Z_n from a rejection sampling trial density f, and let D_i
be the indicators of whether Z_i is accepted (so each D_i is a Bernoulli trial with success
probability π(Z_i)/(c·f(Z_i))). The rejection sampling estimator of EZ can be written

    θ̂ = Σ_i D_i·Z_i / Σ_i D_i.

View this as an estimator of EZ based on data Z_1, ..., Z_n. By the Rao-Blackwell
theorem, θ̃ = E(θ̂ | Z_1, ..., Z_n) is unbiased and at least as efficient as θ̂. For large n,
θ̃ is approximately equal to the weighted average Σ_i w_i·Z_i / Σ_i w_i with importance
weights w_i = π(Z_i)/f(Z_i). The resulting estimate is still consistent, but is no longer
unbiased.