
Uniform random floats

Taylor R. Campbell
Monday 28th April, 2014

Abstract
How to generate a double-precision floating-point number in [0, 1]
uniformly at random given a uniform random source of bits.
The naive approach of generating an integer in {0, 1, . . . , 2^k - 1} for some k
and dividing by 2^k, as used by, e.g., Boost.Random and GCC 4.8's implementation
of C++11 uniform_real_distribution, gives a nonuniform distribution:
If k is 64, a natural choice for the source of bits, then because the
set {0, 1, . . . , 2^53 - 1} is represented exactly, whereas many subsets of
{2^53, 2^53 + 1, . . . , 2^64 - 1} are rounded to common floating-point
numbers, outputs in [0, 2^-11) are underrepresented.
If k is 53, in an attempt to avoid nonuniformity due to rounding, then
numbers in (0, 2^-53) and 1 will never be output. These outputs have
very small, but nonnegligible, probability: the Bitcoin network today
randomly guesses solutions every ten minutes to problems for which
random guessing has much, much lower probability of success, closer
to 2^-64.
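As a concrete illustration of both naive schemes, here is a sketch in C (the function names are mine; the constants are 2^64 and 2^53):

```c
#include <stdint.h>

/* Naive k = 64: the 64-bit integer is first rounded to a double,
 * losing up to 11 low bits, and only then divided by 2^64. */
double naive64(uint64_t u) {
    return (double)u / 18446744073709551616.0;      /* 2^64 */
}

/* Naive k = 53: exact division, but the smallest nonzero output is
 * 2^-53 and the largest is 1 - 2^-53, so (0, 2^-53) and 1 never occur. */
double naive53(uint64_t u) {
    return (double)(u >> 11) / 9007199254740992.0;  /* 2^53 */
}
```

Note that naive64(2^64 - 1) rounds up to exactly 1.0, while adjacent large inputs such as 2^63 and 2^63 + 1 collapse to the same output: exactly the nonuniformity described above.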
What is the uniform distribution we want, anyway? It is obviously not
the uniform discrete distribution on the finite set of floating-point numbers
in [0, 1]; that would be silly. For our uniform distribution, we would like
to imagine[1] drawing a real number in [0, 1] uniformly at random, and then
choosing the nearest floating-point number to it.
[1] For the pedantic who may stumble upon this before reading ahead, we cannot, of
course, assign a meaningful probability to a choice of real number in [0, 1]. What we can
do, however, is imagine that it were meaningful, and then formalize it in the next sentence
with the correct sense of measure.
To formalize this, start with the uniform continuous distribution on [0, 1]
in the real numbers, whose measure is μ([a, b]) = b - a for any [a, b] contained
in [0, 1]. Next, let ρ be the default (round-to-nearest/ties-to-even)
rounding map from real numbers to floating-point numbers, so that, e.g.,
ρ(0.50000000000000001) = 0.5.
Note that the preimage under ρ of any floating-point number (that is,
for a floating-point number x, the set of real numbers that will be rounded
to x, or {u in [0, 1] : ρ(u) = x}) is an interval. Its measure, μ(ρ^-1(x)), is the
measure of the set of numbers in [0, 1] that will be rounded to x, and that is
precisely the probability we want for x in our uniform distribution.
Instead of drawing from real numbers in [0, 1] uniformly at random, we
can imagine drawing from the space of infinite sequences of bits uniformly at
random, say (0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, . . .), and interpreting the result as the
fractional part of the binary expansion of a real number in [0, 1]:

    0·(1/2) + 0·(1/4) + 0·(1/8) + 1·(1/16) + 0·(1/32) + 1·(1/64) + · · ·
Then if we round that number, we'll get a floating-point number with the
desired probability. But we can't just directly compute that sum one term at
a time from left to right: once we get to 53 significant bits, further bits will
be rounded away by addition, so the result will be biased to be even, i.e. to
have zero least significant bit. And we obviously can't choose uniformly at
random from the space of infinite sequences of bits and round the result.
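To see the bits being rounded away, consider naively accumulating the all-ones expansion 1/2 + 1/4 + 1/8 + · · · one term at a time, left to right, in double precision (an illustration of my own):

```c
#include <math.h>

/* Accumulate the first n bits of the all-ones binary expansion
 * 1/2 + 1/4 + 1/8 + ..., one term at a time, left to right. */
double accumulate_bits(int n) {
    double sum = 0.0;
    for (int i = 1; i <= n; i++)
        sum += ldexp(1.0, -i);  /* add bit i, worth 2^-i */
    return sum;
}
```

The first 53 additions are exact, giving 1 - 2^-53; the 54th term looks like a tie and rounds the sum up to 1.0, and every later term is rounded away entirely, so accumulate_bits(64) is exactly 1.0 even though the true 64-term sum is strictly less than 1.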
We can, however, choose finite sequences of bits uniformly at random,
and, except for a detail about ties, after a certain constant number of bits,
no matter what further bits we choose they won't change what floating-point
number we get by rounding. So we can choose a sufficiently large finite
sequence of bits and round that, knowing that the result of rounding would
not change no matter how many extra bits we add.
The detail about ties is that if we simply choose, say, 1088 bits (the least
multiple of 64 exceeding the magnitude of the exponent of the smallest possible,
subnormal, floating-point number, 2^-1074) and round it in the default round-to-nearest/ties-to-even
mode, the result would be biased to be even.
The reason is that the probability of seeing a tie in any finite sequence
of bits is nonzero, but real ties occur only in sets of measure zero:
they happen only when every remaining bit in the infinite sequence is also
zero. To avoid this, before rounding, we can simulate a sticky bit by setting
the least significant bit of the finite sequence we choose, in order to prevent
rounding what merely looks approximately like a tie to even.
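Putting the pieces together, here is a condensed C sketch of the scheme just described (the linked random_real.c is the careful version; here splitmix64 merely stands in for an arbitrary uniform 64-bit source):

```c
#include <stdint.h>
#include <math.h>

/* Stand-in uniform 64-bit source (splitmix64); any uniform source of
 * bits would do. */
static uint64_t state = 12345;
static uint64_t random64(void) {
    uint64_t z = (state += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

double random_real(void) {
    int exponent = -64;
    uint64_t significand;

    /* Each all-zero 64-bit draw pushes the leading one bit of the
     * binary expansion 64 places further down. */
    while ((significand = random64()) == 0) {
        exponent -= 64;
        if (exponent < -1074)   /* below the smallest subnormal */
            return 0.0;
    }

    /* Normalize: shift the remaining leading zeros into the exponent
     * and top up the low bits with fresh random bits. */
    int shift = 0;
    while (!(significand & 0x8000000000000000ULL)) {
        significand <<= 1;
        shift++;
    }
    if (shift != 0) {
        exponent -= shift;
        significand |= random64() >> (64 - shift);
    }

    /* Sticky bit: a finite prefix can only look like a tie, never
     * actually be one, so keep rounding from treating it as a tie. */
    significand |= 1;

    /* The hardware rounds the 64-bit significand to 53 bits here. */
    return ldexp((double)significand, exponent);
}
```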
One little, perhaps counterintuitive, detail remains: the boundary values,
0 and 1, have half the probability you might naively expect, because we
exclude from consideration the half of the interval ρ^-1(0) below 0 and the
half of the interval ρ^-1(1) above 1.
Some PRNG libraries provide variations on the naive algorithms that
omit one or both boundary values, in order to allow, e.g., computing log(x)
or log(1 - x). For example, instead of drawing from {0, 1, . . . , 2^53 - 1} and
dividing by 2^53, some

- divide by 2^53 - 1 instead, to allow both boundary points;

- draw from {1, 2, . . . , 2^53} instead, to omit 0 but allow 1; or

- draw from {-2^52, -2^52 + 1, . . . , 2^52 - 2, 2^52 - 1} (i.e., the signed rather
  than unsigned 53-bit integers) and then add 1/2 before dividing, to
  omit both boundary points.
These algorithms still have the biases described above, however. Given a
uniform distribution on [0, 1], you can always turn it into a uniform distribution
on (0, 1) or (0, 1] or [0, 1) by rejection sampling if 0 or 1 is a problem.
See http://mumble.net/~campbell/tmp/random_real.c for source code.
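The rejection step is only a few lines; in this sketch, open_unit and the deterministic stub generator are names of my own for illustration:

```c
/* Rejection sampling: redraw until the boundary values are excluded,
 * turning a uniform [0, 1] generator into a uniform (0, 1) one. */
double open_unit(double (*gen)(void)) {
    double x;
    do { x = gen(); } while (x == 0.0 || x == 1.0);
    return x;
}

/* Deterministic stub standing in for a real generator, to show the
 * boundary draws 1.0 and 0.0 being skipped. */
static int calls = 0;
static double stub(void) {
    static const double seq[] = { 1.0, 0.0, 0.25 };
    return seq[calls++ % 3];
}
```

In practice the loop almost never iterates, since the boundary values each occur with probability at most about 2^-54.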
Thanks to Steve Canon and Alex Barnett for explaining the binary
expansion approach to me and talking the problem over.
Copyright (c) 2006-2014, Taylor R. Campbell.
Verbatim copying and distribution of this entire article are permitted
worldwide, without royalty, in any medium, provided this notice, and the
copyright notice, are preserved.