
Note to readers: Please ignore these sidenotes; they're just hints to myself for preparing the index, and they're often flaky!
THE ART OF
COMPUTER PROGRAMMING

VOLUME 4 PRE-FASCICLE 9C

ESTIMATING
BACKTRACK
COSTS
(ridiculously preliminary draft)

DONALD E. KNUTH   Stanford University

ADDISON-WESLEY

March 27, 2022

Internet page http://www-cs-faculty.stanford.edu/~knuth/taocp.html contains current information about this book and related books.

See also http://www-cs-faculty.stanford.edu/~knuth/sgb.html for information about The Stanford GraphBase, including downloadable software for dealing with the graphs used in many of the examples in Chapter 7.

See also http://www-cs-faculty.stanford.edu/~knuth/mmixware.html for downloadable software to simulate the MMIX computer.

See also http://www-cs-faculty.stanford.edu/~knuth/programs.html for various experimental programs that I wrote while writing this material (and some data files).

Copyright © 2022 by Addison-Wesley

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher, except that the official electronic file may be used to print single copies for personal (not commercial) use.

Zeroth printing (revision -92), 27 Mar 2022

PREFACE
Life is short, the art is long.
— HIPPOCRATES, Aphorisms (c. 415 B.C.)

Art is long, and Time is fleeting.
— HENRY WADSWORTH LONGFELLOW, A Psalm of Life (1838)

This booklet contains draft material that I'm circulating to experts in the field, in hopes that they can help remove its most egregious errors before too many other people see it. I am also, however, posting it on the Internet for courageous and/or random readers who don't mind the risk of reading a few pages that have not yet reached a very mature state. Beware: This material has not yet been proofread as thoroughly as the manuscripts of Volumes 1, 2, 3, and 4A were at the time of their first printings. And alas, those carefully-checked volumes were subsequently found to contain thousands of mistakes.

Given this caveat, I hope that my errors this time will not be so numerous and/or obtrusive that you will be discouraged from reading the material carefully. I did try to make the text both interesting and authoritative, as far as it goes. But the field is vast; I cannot hope to have surrounded it enough to corral it completely. So I beg you to let me know about any deficiencies that you discover.

To put the material in context, this portion of fascicle 9 previews Section 7.2.2.9 of The Art of Computer Programming, entitled "Estimating backtrack costs." I haven't had time to write much of it yet — not even this preface!

* * *
The explosion of research in combinatorial algorithms since the 1970s has meant that I cannot hope to be aware of all the important ideas in this field. I've tried my best to get the story right, yet I fear that in many respects I'm woefully ignorant. So I beg expert readers to steer me in appropriate directions.

Please look, for example, at the exercises that I've classed as research problems (rated with difficulty level 46 or higher), namely exercises ...; I've also implicitly mentioned or posed additional unsolved questions in the answers to exercises .... Are those problems still open? Please inform me if you know of a solution to any of these intriguing questions. And of course if no solution is known today but you do make progress on any of them in the future, I hope you'll let me know.

I urgently need your help also with respect to some exercises that I made up as I was preparing this material. I certainly don't like to receive credit for things that have already been published by others, and most of these results are quite natural "fruits" that were just waiting to be "plucked." Therefore please tell me if you know who deserves to be credited, with respect to the ideas found in exercises .... Furthermore I've credited exercises ... to unpublished work of .... Have any of those results ever appeared in print, to your knowledge?

* * *
Special thanks are due to ... for their detailed comments on my early attempts at exposition, as well as to numerous other correspondents who have contributed crucial corrections.

* * *
I happily offer a "finder's fee" of $2.56 for each error in this draft when it is first reported to me, whether that error be typographical, technical, or historical. The same reward holds for items that I forgot to put in the index. And valuable suggestions for improvements to the text are worth 32¢ each. (Furthermore, if you find a better solution to an exercise, I'll actually do my best to give you immortal glory, by publishing your name in the eventual book:)

Cross references to yet-unwritten material sometimes appear as '00'; this impossible value is a placeholder for the actual numbers to be supplied later.

Happy reading!

Stanford, California                                          D. E. K.
99 Umbruary 2020

That which purifies us is triall,
and triall is by what is contrary.
— JOHN MILTON, Areopagitica (1644)

Does not my exhausted Reader ... feel
a certain glow of pride at having constructed so splendid a Tree —
such a veritable Monarch of the Forest?
— LEWIS CARROLL, in Symbolic Logic, Part II (1896)

Backtrack programs are full of surprises.
Sometimes they produce instant answers to a supposedly difficult problem.
But sometimes they spin their wheels endlessly,
trying to traverse an astronomically large search tree.
And sometimes they deliver results just about as fast as we might expect.
— DONALD E. KNUTH, in Combinatorial Algorithms, Part 2 (2019)

7.2.2.9. Estimating backtrack costs. Let's return now to a topic that was introduced near the beginning of Section 7.2.2, namely the fact that random sampling can often help us predict the approximate size of a search tree.

Suppose, for example, that we want to look at all of the independent sets of a given graph. Figure 400(a) shows Π_8, the pi graph of order 8, which was introduced in Section 7.2.2.5; and Fig. 400(b) shows a search tree that discovers each of its independent sets. There are 26 such sets, corresponding to the 26 leaves of that tree: ∅, {0}, {1}, {2}, {1,2}, {3}, {0,3}, ..., {0,4,7}, {2,4,7}, {6,7}.

Fig. 400. A "random" graph, and a binary search tree for its independent sets.

Each node s of Fig. 400(b) is labeled with the subset S of vertices that might be in an independent set consistent with the downward path to s. More precisely, every nonleaf node is a binary branch: Its left child leads to the independent sets that do not include the largest candidate v ∈ S, while its right child leads to those that do include v. Thus, for example, the right subtree of the root represents all of the independent sets of Π_8 | {0,2,4,6}, because vertex 7 is adjacent to 1, 3, and 5. The left subtree represents all of the independent sets of Π_7.

In general, when a problem involves a potentially huge search tree T, we often want to estimate

    cost(T) = Σ_{s∈T} c(s),                                            (1)
the total cost of its nodes, where c(s) is the cost associated with a particular node s. For example, if we want to count the total number of nodes in T, we simply define c(s) = 1 for each s. Or if we want to count only the number of leaves, we let c(s) = [s is a leaf]. Or we might want the cost to be the amount of time that's spent when processing node s. Many useful scenarios arise, and we shall discuss methods that work for any function c for which (1) makes sense.

We learned long ago in Algorithm 7.2.2E about a surprisingly easy way to compute a random value X that is an "unbiased estimator" of the cost, in the sense that

    E X = cost(T).                                                     (2)

In fact, we obtain such an X by simply flipping coins and following a random path, starting at the root of T, until coming to a leaf. Here's that algorithm again, slightly reformulated:
Algorithm P (Path sampling). Given a tree T (or a way to generate T, top-down), and a cost function c on the nodes s ∈ T, this Monte Carlo algorithm computes a value X that satisfies (1) and (2).

P1. [Initialize.] Set W ← 1, X ← 0, and s ← root(T).

P2. [Augment X.] Set X ← X + W · c(s).

P3. [Done?] If s is a leaf, terminate the algorithm.

P4. [Branch.] Let s have children {s_1, s_2, ..., s_d}. Choose a uniformly random integer J ∈ {1, 2, ..., d}, and set s ← s_J, W ← W · d. Return to step P2.

The choices made in step P4 should, of course, be independent of each other.
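For concreteness, here is a minimal Python sketch of Algorithm P (an illustration only, not from the fascicle); the interface, a children(s) function returning a list together with a cost(s) function, is a hypothetical stand-in for whatever top-down tree generator is at hand.

  import random

  def path_sample(root, children, cost):
      """Algorithm P: one unbiased estimate of cost(T) = sum of cost(s)."""
      W, X, s = 1, 0, root                    # P1: initialize
      while True:
          X += W * cost(s)                    # P2: augment X
          kids = children(s)
          if not kids:                        # P3: done at a leaf
              return X
          s = random.choice(kids)             # P4: branch uniformly at random
          W *= len(kids)

  # Example: estimate the number of leaves of a small nested-list tree,
  # where a node is the list of its children and a leaf is an empty list.
  tree = [[[], []], [], [[], [], []]]         # this tree has 6 leaves
  runs = [path_sample(tree, lambda s: s, lambda s: 0 if s else 1)
          for _ in range(10000)]
  print(sum(runs) / len(runs))                # hovers near 6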
For example, suppose we apply Algorithm P to Fig. 400(b); then it will sample one of 26 possible paths. If we define c(s) = [s is a leaf], so that cost(T) is the number of independent sets of Π_8, the final value of X will be the final value of W. And since that tree T is binary, the value of W will be a power of 2. Indeed, X will be 2^l when the path ends at a leaf that's l steps down from the root. So we'll get X = 2^8 = 256 if we happen to take the leftmost path; and we'll get X = 2^2 = 4 if we always branch to the right. But we're more likely to obtain X = 16 or X = 32 than either of those extreme values.

Notice that it's easy to calculate the probability Pr(s) that Algorithm P will visit any particular node s of any given tree T: Let d(s) denote the degree of s (the number of s's children), and let ŝ be the parent of s. Then Pr(s) = 1/w(s), where

    w(root) = 1,  and  w(s) = d(ŝ) w(ŝ) if s ≠ root.                   (3)

In fact, w(s) is the value of W whenever Algorithm P visits node s. Therefore

    E X = Σ_{s∈T} Pr(s) w(s) c(s) = Σ_{s∈T} (1/w(s)) w(s) c(s) = Σ_{s∈T} c(s) = cost(T),

and we've verified (2).
Let's write μ = E X and σ² = var(X) as a convenient shorthand for the mean and variance of X. If X_1, X_2, ..., X_n are independent estimates, we know that the empirical "sample mean" Ê X = (X_1 + X_2 + ··· + X_n)/n will almost surely be close to the true mean μ if n is large enough. Indeed, the second moment principle implies that

    Pr(Ê X ≥ μ + aσ/√n) ≤ 1/(a² + 1)  and  Pr(Ê X ≤ μ − aσ/√n) ≤ 1/(a² + 1),    (4)

for all a ≥ 0. (See exercise MPR-48.) Our sample mean will therefore be quite reliable if the variance isn't too large.
On the other hand, a big variance can mean big trouble. Suppose, for example, that we try to estimate the number of independent sets of the complete graph K_n by the method above. (The answer is obviously n + 1; but let's pretend that we need to estimate it.) Then we're applying Algorithm P to a maximally unbalanced binary tree with n + 1 leaves: Every right child is a leaf. So X takes the value 2^l with probability 1/2^l, for 1 ≤ l < n, and the value 2^n occurs with probability 2/2^n. In this case μ = n + 1; but σ² = 2^{n+1} + 2^n − 2 − (n+1)² is huge. There's clearly no practical way to distinguish between n = 1000 and n = 1000000, say, when X almost never exceeds 2^100.
We generally don't know whether or not the variance is suitably small. But there is a way to guess: After making n > 1 independent estimates X_1, X_2, ..., X_n we can compute

    v̂ar(X) = (n/(n−1)) (Ê(X²) − (Ê X)²)
            = (X_1² + X_2² + ··· + X_n²)/(n−1) − (X_1 + X_2 + ··· + X_n)²/(n(n−1)),    (5)

which is an unbiased estimator of var(X). (See exercise 2.) We define v̂ar(X) = 0 when n = 1. Floating point evaluation of v̂ar is best done with Eq. 4.2.2-(16).
Empirical estimates of the mean of a random variable X are often presented with "error bars" in the form a ± δ, where a and δ are approximations to Ê X and √(v̂ar(X)/n), suitably rounded. We might also compute the Chatterjee–Diaconis score

    b̂(X) = max(X_1, X_2, ..., X_n)/(n Ê X) = max(X_1, X_2, ..., X_n)/(X_1 + X_2 + ··· + X_n),    (6)

which should ideally be small; see 7.2.2-(34). For example, after we apply Algorithm P to Fig. 400(b), n = 10 times, we typically get an error-bar estimate like 31 ± 12 for the number of leaves, and an accompanying score b̂ ≈ 0.4. But with n = 1000 we get sharper estimates such as 27 ± 1 and a score of 0.01.
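The following short Python helper (an illustration only, not from the fascicle) summarizes a batch of independent estimates in exactly this form; a production version would accumulate the statistics in one pass, via the numerically stable recurrences of Eqs. 4.2.2-(15) and (16), instead of storing all the samples.

  import math

  def error_bar_and_score(samples):
      """Return (mean, half-width of error bar, Chatterjee-Diaconis score),
      using the unbiased sample variance (5) and the score (6)."""
      n = len(samples)
      mean = sum(samples) / n
      varhat = 0.0 if n == 1 else sum((x - mean)**2 for x in samples) / (n - 1)
      delta = math.sqrt(varhat / n)          # report the estimate as mean +- delta
      score = max(samples) / sum(samples)    # should ideally be small
      return mean, delta, score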
The news isn't so good, however, when we try to estimate the independent sets of the larger graph Π_64. Here n = 1000 estimates typically give results like 11000 ± 2600 (and b̂ ≈ 0.2), which is too low: The actual number is 33607. We need to go up to n = 1000000 to get a decent approximation, and still the error bars are rather large: 29000 ± 6000 (with score 0.15).

The situation is much worse with the really big graph Π_4096. A billion runs yield the estimate (2.0 ± 0.8) × 10¹¹ (with b̂ ≈ 0.36); we'll see later that this is totally off base. It's not even a decent ballpark estimate.
Fig. 401. A multiway search tree for the independent sets of the graph Π_8 in Fig. 400(a).
But let's not give up hope. There's another way to examine all the independent sets of a graph, by using multiway branching instead of binary branching. For example, Fig. 401 shows the multiway search tree for Π_8; this one is more compact than Fig. 400(b): Every node s whose label lists k vertices v_1 ... v_k now has exactly k children, corresponding to the choice of the largest vertex that will be included in the current independent set if we don't stop at node s. All independent sets of size l now appear on level l of the tree, for l ≥ 0; therefore the relevant cost function is now c(s) = 1 for all s.

Algorithm P can take 17 paths in Fig. 401, and those paths are shorter than before. The smallest estimate is now 8, and the largest is now 64. Here are some typical results, together with corresponding statistics from the larger graph Π_64:

    Π_8,  n = 10:    23 ± 3      (b̂ ≈ 0.2)      Π_64, n = 10³:  31000 ± 5000  (b̂ ≈ 0.13)
    Π_8,  n = 1000:  25.6 ± 0.7  (b̂ ≈ 0.004)    Π_64, n = 10⁶:  33700 ± 200   (b̂ ≈ 0.001)

And a million samples with respect to Π_4096 yield (4.1 ± 0.7) × 10¹⁶ (b̂ ≈ 0.1).
By looking a little closer at the tree while making these sample runs, we can actually make several estimates at once, because the individual costs c(s) at each node can be vectors instead of scalars. (The random variable X will then be a vector too.) For example, suppose we want to estimate the number of independent sets of each size, as well as the total number. Every leaf of the binary search tree represents an independent set of size k when the path to that leaf takes k rightward branches; we can let c(s) = e_k at that leaf, where e_k is the unit vector whose jth coordinate is [j = k]. Similarly, we can let c(s) = e_l for every node s on level l of the multiway search tree. Then the expected value of X will be (n_0, n_1, n_2, ...), when the graph has n_k independent k-element sets. And it's easy for Algorithm P to keep track of the relevant values of k or l as it moves down the tree. (For example, a million multiway samples of Π_4096 suggest that n_0 = 1, n_1 = 4096, n_2 ≈ 4191000 ± 100, n_3 ≈ 1429000000 ± 100000, ..., n_12 ≈ (1.454 ± 2) × 10¹⁴, and n_13 = 0, although b̂ ≈ 0.1 is rather high.)
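In code, this bookkeeping is a one-line change to the earlier path_sample sketch: carry k along the walk and return a sparse vector. The following hypothetical variant assumes the binary convention of Fig. 400(b), with each node's children ordered [exclude, include]:

  import random

  def binary_size_sample(root, children):
      """Algorithm P with vector costs c(s) = e_k at the leaves: returns
      {k: W}, an unbiased (sparse) estimate of the size profile (n_0, n_1, ...).
      Here k counts the rightward branches, i.e., the vertices included."""
      W, k, s = 1, 0, root
      while True:
          kids = children(s)
          if not kids:
              return {k: W}                # X = W * e_k at this leaf
          j = random.randrange(len(kids))
          k += (j == 1)                    # right child = include the vertex
          s, W = kids[j], W * len(kids)

Averaging many such sparse vectors coordinatewise gives estimates of n_0, n_1, n_2, ... all at once.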
One of the great advantages of Algorithm P is that it's almost blindingly fast. Exercise 6 shows, for example, how bitwise techniques allow us to sample search trees for independent sets very quickly. Even with the large graph Π_4096, the running time per multiway sample is only about 550 mems, because independent sets in Π_4096 rarely have more than 12 elements. (A binary sample takes longer, about 2700 mems.)
Exercise 8 proves that there's a straightforward way to evaluate the variance of X exactly, if we're able to examine the whole tree T that is to be sampled by Algorithm P:

    var(X) = Σ_{s∈T} w(s) Σ{ (cost(s_j) − cost(s_k))² | s_j and s_k are children of s }.    (7)

Here cost(s) denotes the sum of c(s′) over all nodes s′ in the subtree rooted at s. Of course, we only use Algorithm P in practice when T is too big for this calculation to actually be carried out (or when we're debugging a program to be used later with large T). But formula (7) makes it clear that the variance will be small if and only if the siblings of each family have roughly equal-cost subtrees. Using this formula we can show that the binary search trees for Π_8 and Π_64 have variances 1440 and 155,660,199,823, respectively. The multiway search trees, by contrast, have variances 459 and 40,836,274,204.

Thus multiway trees are the winners in graphs like Π_n. But exercises 9 and 10 show that we get quite different outcomes in different kinds of graphs.
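As a quick sanity check (an illustration only, not from the fascicle), the following Python sketch evaluates (7) bottom-up on an explicit tree and confirms the closed-form variance 2^{n+1} + 2^n − 2 − (n+1)² derived above for the binary search tree of the complete graph K_n:

  def var_eq7(s, c, w=1):
      """Return (cost(s), var contribution of s's subtree) for Algorithm P,
      where a node is the tuple of its children and w = w(s) as in (3)."""
      if not s:
          return c(s), 0
      d = len(s)
      results = [var_eq7(kid, c, w * d) for kid in s]
      costs = [r[0] for r in results]
      v = sum(r[1] for r in results) + w * sum(
          (costs[j] - costs[k])**2 for j in range(d) for k in range(j+1, d))
      return c(s) + sum(costs), v

  def kn_tree(n):
      """Binary search tree for the independent sets of K_n:
      every right child (include a vertex) is immediately a leaf."""
      t = ((), ())                   # branching on the last vertex
      for _ in range(n - 1):
          t = (t, ())                # left = exclude, right = include-and-stop
      return t

  for n in range(1, 12):
      total, v = var_eq7(kn_tree(n), lambda s: 0 if s else 1)
      assert total == n + 1 and v == 2**(n+1) + 2**n - 2 - (n+1)**2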
Stratified trees. The estimates of Algorithm P can be improved if we know more about the tree that we're trying to quantify. Indeed, there's often good reason to believe that many nodes of the tree represent subproblems that are similar to each other. We can take advantage of this knowledge when looking for a representative sample.

For instance, let's redraw Figs. 400(b) and 401 by positioning each node at a height that's based on the length of its label, rather than on its distance from the root:
Fig. 402. Search trees with similar nodes placed into the same horizontal "stratum."
Nodes 0123 and 0246 are likely to have similar subtrees, because they both represent a search for independent sets in a 4-vertex graph. Figure 400(b), by contrast, gave equal prominence to the quite dissimilar nodes 0123456 and 0246.

In general, the idea is to assign a heuristic score h = h(s), called the stratum of s, to every node s of the tree. This can be done in any way, provided only that

    h(root) = 0, and h(s) > h(ŝ) when ŝ is the parent of s.            (8)

But nodes of the same stratum should, ideally, have similar subtrees.
Fig. 403. A small, poorly stratified tree. Here h(A) = 0, h(B) = h(C) = 1, ..., h(L) = h(M) = 5.

Let's look at a somewhat weird example, in order to grasp the generality of this concept. Figure 403 illustrates a scenario in which nodes B and C have been assigned to the same stratum, although their subtrees are quite different; similarly, node E has little in common with F or G.

Given a large, stratified tree T, is there a fast way to obtain an unbiased estimate of its total cost, taking advantage of the strata when the heuristic scores h(s) have been well chosen? Yes! Pang C. Chen has devised just such a "strategy," in SICOMP 21 (1992), 295-315. His elegant method, like Algorithm P, assigns weights W = W(s) to a small subset S of the nodes, in such a way that the random variable X = Σ_{s∈S} W(s) c(s) satisfies E X = cost(T), regardless of the individual costs c(s).

Every sample S produced by Chen's method will be a tree, contained in the big tree T; but it's not necessarily a path. More precisely, S will be what we shall call a Chen subset, defined by the following four properties:

  i) S is not empty.
 ii) If s ∈ S, then ŝ ∈ S. (As above, ŝ denotes the parent of s.)
iii) If ŝ ∈ S, then s′ ∈ S for some node s′ with h(s′) = h(s). (Perhaps s′ = s.)
 iv) If s ∈ S and s′ ∈ S and s ≠ s′, then h(s) ≠ h(s′).

From (i) and (ii) we know that the root of T is in every Chen subset. From (i) and (iii) we know that every Chen subset contains at least one leaf. And property (iv) is perhaps the most important of all: It states that every stratum is represented by at most one node. (Therefore, if there are fewer than 1000 strata, S will have fewer than 1000 nodes. That's what makes the method fast.)

In Fig. 403, for example, S must contain A. So it must contain either B or C, by (iii). But it cannot contain both B and C, by (iv). All told, that small tree turns out to have exactly nine Chen subsets:

    ABE, ABFHL, ACDFHL, ACDFHM, ACDFIL,
    ACDFIM, ACDGIM, ACDGJM, ACDGKM.                                    (9)
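These properties are easy to check by brute force on a small tree. In the following Python sketch (an illustration only), the shape of Fig. 403 is a hypothetical encoding inferred from the arc list that accompanies (12) below; enumerating all node subsets and filtering by (i)-(iv) recovers exactly the nine sets of (9):

  from itertools import combinations

  # Fig. 403: node -> (parent, stratum).
  FIG403 = {'A': (None, 0), 'B': ('A', 1), 'C': ('A', 1), 'D': ('C', 2),
            'E': ('B', 3), 'F': ('A', 3), 'G': ('C', 3), 'H': ('F', 4),
            'I': ('C', 4), 'J': ('G', 4), 'K': ('G', 4), 'L': ('F', 5),
            'M': ('D', 5)}

  def is_chen_subset(S, tree):
      """Test properties (i)-(iv)."""
      if not S:                                                   # (i)
          return False
      strata = [tree[s][1] for s in S]
      kids = [t for t in tree if tree[t][0] in S]                 # parent in S
      return (all(tree[s][0] in S or tree[s][0] is None for s in S)   # (ii)
              and all(tree[t][1] in strata for t in kids)             # (iii)
              and len(strata) == len(set(strata)))                    # (iv)

  chens = [''.join(sorted(S)) for r in range(1, 14)
           for S in combinations(FIG403, r)
           if is_chen_subset(set(S), FIG403)]
  print(sorted(chens))       # the nine Chen subsets listed in (9)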
At one extreme, we could assign every node of T to a distinct stratum all by itself; for example, we could let h(s) be the position of s in preorder. Then every Chen subset would be the entire tree, and we'd be "estimating" cost(T) exactly — but very foolishly, by brute force. At another extreme, we could let h(s) be the depth of s, namely its distance from the root. Then every Chen subset would be a path, and Chen's method would reduce to Algorithm P. In between those extremes there's often a happy medium, where Chen subsets provide efficient and reasonably sharp estimates.

Here then, without further ado, is Chen's method. We'll prove it valid later.
Algorithm C (Chen subset sampling). Given a stratified tree T (or a way to generate T, top-down), a cost function c on the nodes s ∈ T, and a heuristic function h satisfying (8) that defines the strata of T, this Monte Carlo algorithm computes a value X that satisfies (1) and (2). It uses auxiliary arrays S and W, having one entry for every possible value of h(s).

C1. [Initialize.] Set h ← hmax ← 0, X ← 0, S[0] ← root(T), and W[0] ← 1. Also set W[h] ← 0 for all possible strata h > 0.

C2. [Advance h.] If W[h] = 0, set h ← h + 1 and repeat this step.

C3. [Augment X.] Set s ← S[h], W ← W[h], and X ← X + W · c(s).

C4. [Done?] If s is a leaf and h = hmax, terminate the algorithm.

C5. [Branch.] Let s have children {s_1, ..., s_d}. Do the operation merge(s_j, W) below, for 1 ≤ j ≤ d. (This will ensure that W[h(s_j)] > 0 for all j, and that each of those strata h(s_j) will have a representative in S.) Set h ← h + 1 and return to C2.

The set S of all nodes s that occur in step C3 clearly satisfies properties (i)-(iv) above; so it's a Chen subset. Conversely, we'll see below that every Chen subset has a positive probability of occurring.
The key operation needed in step C5 is quite straightforward:

    merge(s, W) = "Set h′ ← h(s) and W′ ← W[h′].
                   If h′ > hmax, set hmax ← h′.
                   If W′ = 0, set S[h′] ← s and W[h′] ← W.
                   Otherwise set W′ ← W′ + W, W[h′] ← W′,
                   and let U be uniform in [0..1);
                   if U < W/W′, set S[h′] ← s."                        (10)

This is where the randomization occurs. The net effect is that, after Algorithm C has seen nodes s⁽¹⁾, ..., s⁽ᵏ⁾ on stratum h′, having the respective weights w⁽¹⁾, ..., w⁽ᵏ⁾, the current representative S[h′] of that stratum will be s⁽ⁱ⁾ with probability w⁽ⁱ⁾/(w⁽¹⁾ + ··· + w⁽ᵏ⁾). (Exercise 14 illustrates an example run.)
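Here is a compact Python rendering of Algorithm C and its merge operation (an illustration only, using the hypothetical FIG403 encoding from the sketch after (9)); the weighted replacement in the inner loop is exactly (10), in the style of reservoir sampling:

  import random

  def chen_sample(root, children, h, c, num_strata):
      """Algorithm C: one unbiased estimate of cost(T) for a stratified tree."""
      S = [None] * num_strata                  # C1: stratum representatives...
      W = [0] * num_strata                     #     ...and their weights
      S[0], W[0] = root, 1
      hmax = X = stratum = 0
      while True:
          while W[stratum] == 0:               # C2: advance to a live stratum
              stratum += 1
          s, w = S[stratum], W[stratum]
          X += w * c(s)                        # C3: augment X
          kids = children(s)
          if not kids and stratum == hmax:     # C4: done?
              return X
          for sj in kids:                      # C5: merge(sj, w), as in (10)
              hp = h(sj)
              hmax = max(hmax, hp)
              W[hp] += w
              if random.random() < w / W[hp]:  # keep sj with probability w/W'
                  S[hp] = sj
          stratum += 1

  # Fig. 403 has 13 nodes and c(s) = 1, so the estimates average to 13:
  kids403 = {p: [t for t in FIG403 if FIG403[t][0] == p] for p in FIG403}
  runs = [chen_sample('A', lambda s: kids403[s], lambda s: FIG403[s][1],
                      lambda s: 1, 6) for _ in range(100000)]
  print(sum(runs) / len(runs))                 # hovers near 13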
How well does Algorithm C handle our independent set problem? The stratification by label size works very well indeed: The variance for the binary tree in Fig. 402 goes down from 1440 to just 2; and the variance for the multiway tree is zero! The corresponding variances for estimates of Π_64 via Algorithm P are also reduced more than a thousandfold, to 137,328,400 (binary) and 24,847,958 (multiway).

Of course the running time is also an issue here. When a node has high degree d, step C5 must look at all d of the children, to determine their strata and to merge them into the current sample. But step P4 looks at just one child and zooms ahead. Therefore we get a more meaningful comparison if we ask, "How reliable an estimate can I obtain by making one gigamem of computation?" — a second or so — instead of asking about the variance of a single estimate.

Consider, for example, the large graph Π_4096. One gigamem with Algorithm P on the binary tree brings us 372,646 independent samples; but this result, roughly 10¹⁰ ± 10¹⁰, is terrible, as mentioned above. We can actually
make 2,019,383 samples on the multiway tree in the same amount of time. And that result, (3.85 ± 0.41) × 10¹⁶, isn't bad. Algorithm C has time to produce only five(!) samples with respect to that giant multiway tree; its estimate is then (3.35 ± 0.22) × 10¹⁶, only slightly better. But Algorithm C wins with the binary tree: 672 samples, leading to the estimate (3.58 ± 0.05) × 10¹⁶. (Indeed, the true value is 35856268459815803; see exercise 20.)
Let's look at a different kind of tree, in order to get more experience with Algorithms P and C. There are 676,157 oriented binary trees that have 20 nodes (see exercise 2.3.4.4-6); and we can stratify them by assigning each node to stratum (l, d), where l is the distance from the root and d is the degree. (Well, Algorithm C actually wants the stratum to be an integer. So we define h(s) = 3l + d, plus a constant to make h(root) = 0.)

Suppose we set c(s) = 1, and try to estimate the tree size. Each tree T will have a certain path variance var_P(T) with respect to Algorithm P, and a certain stratified variance var_C(T) with respect to Algorithm C. Figure 404 summarizes the joint distribution of these statistics.
Fig. 404. A heat map for path variance versus stratified variance, based on the 676,157 oriented 20-node binary trees.

In general var_P tends to be roughly 10 times as large as var_C, although there are many exceptions. Most of the trees — 416,889 of them (62%) — have var_P < 1000 and var_C < 100. In fact 35% have var_P < 500 and var_C < 50; 89% have var_P < 2000 and var_C < 200. But only 82 trees have var_P(T) = var_C(T) = 0.

Several extreme cases are noteworthy (three 20-node trees, drawn in the fascicle):

    var_P = 8255, var_C = 0;    var_P = 751, var_C = 735;    var_P = 215, var_C = 443.

The tree on the left maximizes var_P; the tree in the middle maximizes var_C; the curious tree on the right maximizes var_C − var_P.
Fig. 405. Algorithm C will follow a path in this tree, if given Fig. 403.
*Theoretical considerations. When Algorithm C is invoked on the small example tree of Fig. 403, it essentially does a random walk in the branching tree depicted in Fig. 405. Its task is to choose a representative, unbiased Chen subset.

The top node, '|A|', means that the algorithm is poised to decide what node should represent the stratum containing A. There's only one choice; so the first move is to 'A|BC|F', which means that A has been chosen and that a choice must next be made between B and C on the next stratum. There also will come a time when a representative for the stratum containing F must be selected.

Two equally likely choices proceed from 'A|BC|F'. At the left is 'AB²|E²F|': The representative of stratum 1 is B; and we're going to pretend that A had two children exactly like B, each pulling a child exactly like E into the picture. That makes a three-way tie for stratum 3, two branches of which lead to an assumed "imitation tree" whose six nodes are AB²E³. The other branch leads to 'AB²F³|H³|L³', because each F brings in an H and L on strata 4 and 5; and that leads inexorably to an imitation tree with twelve nodes AB²F³H³L³.

The right branch from 'A|BC|F' leads to similar adventures. Eventually the algorithm will choose among nine possible imitation trees, which are shown in Fig. 406. Each of them appears above the probability that it will be chosen.
Fig. 406. The nine imitation trees that result from the paths in Fig. 405. (Each tree appears above the probability that it will be chosen: 2/6, 9/54, 18/300, 12/300, 12/300, 8/300, 16/192, 24/192, 24/192.)
(The intermediate states shown in Fig. 405 don't actually match the precise behavior of Algorithm C. Instead of being in state 'AC²|D²|FG²I²', for instance, the algorithm will have already merged F with G², so that it never has to deal with more than one node from any stratum. In other words, Algorithm C makes its choices in a somewhat peculiar order. But the final outcomes are equivalent to those shown in Fig. 406.)

Figure 405 explains why Algorithm C produces unbiased estimates, because every step can be seen to preserve the expected cost. Let's write a = c(A), b = c(B), ..., and A = cost(A), B = cost(B), ...; then A = a + B + C + F = a + (b + E) + (c + D + G + I) + (f + H + L) = ··· = a + b + ··· + m. When the
algorithm first reaches step C2, it's in state '|A|', with X = 0 and A yet to be evaluated. The next time it gets to C2 it's in state 'A|BC|F', with X = a and B + C + F yet to be evaluated. Then it branches; we'll either have X = a + 2b and 2E + F pending, or X = a + 2c and 2D + F + 2G + 2I pending.

The main point is that the expected value of X plus the pending costs hasn't changed: We have a + B + C + F = (1/2)(a + 2b + 2E + F) + (1/2)(a + 2c + 2D + F + 2G + 2I).

More formally, let's write for example

    Z(AC²|D²|FG²I²) = a + 2c + cost(D²FG²I²) = a + 2c + 2D + F + 2G + 2I,

by "lowercasing" the nodes before the first vertical bar; this makes Z the sum of a + 2c (the estimated costs so far, namely the current value of X in this state) and 2D + F + 2G + 2I (the pending costs, which are to be estimated later).

It's not difficult to show that Z is a martingale, namely that every random choice preserves E Z — and by extension, that Algorithm C will produce an unbiased estimate for any given stratified tree. Exercise 30 illustrates the details for the 8-way split of |I²J³K³|, which is the most complex branch in Fig. 405.

At the top, Z = A. At the bottom, Z = X has one of nine values a + 2b + 3e, ..., a + 2c + 2d + 3g + 8k + 2m, corresponding to the nine trees in Fig. 406, and with the probabilities shown there. The martingale property proves that

    A = (1/3)(a + 2b + 3e) + (1/6)(a + 2b + 3f + 3h + 3l) + (3/50)(a + 2c + 2d + 3f + 5h + 5l)
        + ··· + (1/8)(a + 2c + 2d + 3g + 8j + 2m) + (1/8)(a + 2c + 2d + 3g + 8k + 2m).
Our main goal, now that we know that X has the desired mean, is to study the variance of X, which is a quadratic polynomial in the individual node costs {a, b, ..., m}. To begin, we observe that there's a direct way to determine the final value X(S) that arises in connection with any given Chen subset S of T, and its corresponding probability p(S), by constructing a directed acyclic graph dag(S) on the nodes of S:

    dag(S) has an arc ŝ → s′ for every pair (s, s′) in property (iii).    (11)

In other words, there's an arc for every (s, s′) with h(s) = h(s′) and ŝ, s′ ∈ S. (We know that S satisfies property (iii), by the definition of Chen subsets.) For example,

    dag(ACDFHL) is a dag on the six nodes A, C, D, F, H, L,               (12)

because the pairs (B,C), (C,C), (D,D), (F,F), (G,F), (H,H), (I,H), (L,L), (M,L) yield the arcs A→C, A→C, C→D, A→F, C→F, F→H, C→H, F→L, D→L. Notice that the out-degree of every node s in dag(S) will be the degree of s in the overall tree T.

Let W_S(s) be the number of paths from the root r of T to node s in dag(S). In (12), for example, we have S = ACDFHL, W_S(A) = 1, W_S(C) = 2, W_S(D) = 2, W_S(F) = 3, W_S(H) = 5, and W_S(L) = 5. An easy induction proves that W_S(s) is, in fact, the value of W that occurs when step C3 adds W · c(s) to X. It's therefore the number of copies of node s in the imitation tree that's based on S (which for our example is the third imitation tree from the left in Fig. 406).
Furthermore, the merging mechanism of Algorithm C clearly causes S to be chosen as the Chen subset with probability

    p_T(S) = p(S) = ∏_{s∈S\r} W_S(ŝ)/W_S(s).                             (13)

(In example (12), p(S) comes to (1/2)(2/2)(1/3)(3/5)(3/5) = 18/300.) We've proved that, in general,

    X(S) = Σ_{s∈S} W_S(s) c(s)  occurs with probability p(S).             (14)
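Both (13) and (14) can be checked mechanically on the running example. The sketch below (an illustration only; it reuses FIG403 and is_chen_subset from the earlier listing) computes W_S by counting dag paths stratum by stratum, then confirms that the probabilities sum to 1 and that the mean of X(S) is cost(T) = 13 when every c(s) = 1:

  from itertools import combinations
  from math import prod

  def WS(S, tree):
      """Path counts W_S(s): arcs of dag(S) go from parent(t) to the
      representative of stratum h(t), for every t whose parent is in S."""
      W = {}
      for s in sorted(S, key=lambda x: tree[x][1]):    # strata upward
          if tree[s][0] is None:
              W[s] = 1
          else:
              W[s] = sum(W[tree[t][0]] for t in tree
                         if tree[t][0] in S and tree[t][1] == tree[s][1])
      return W

  def p_of(S, tree):
      """Probability (13) that Algorithm C delivers the Chen subset S."""
      W = WS(S, tree)
      return prod(W[tree[s][0]] / W[s] for s in S if tree[s][0] is not None)

  chens = [set(S) for r in range(1, 14) for S in combinations(FIG403, r)
           if is_chen_subset(set(S), FIG403)]
  assert abs(sum(p_of(S, FIG403) for S in chens) - 1) < 1e-12
  mean = sum(p_of(S, FIG403) * sum(WS(S, FIG403).values()) for S in chens)
  assert abs(mean - 13) < 1e-12          # E X = cost(T), confirming (14)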

The variance, then, is Σ_S p(S) X(S)² − (Σ_{s∈T} c(s))², where the first sum is over all Chen subsets S in T. And here's where we get a lucky break: We've shown in Fig. 405 that the behavior of Algorithm C can actually be viewed as equivalent to the behavior of Algorithm P, on a special kind of tree T′ — because Algorithm P takes random paths from the root to the leaves! Thus we can use Eq. (7) to evaluate the variance of X in Algorithm C, if we interpret (7) properly.

Let's say that a partial Chen subset is a subset S′ that is obtained from a Chen subset S by removing all elements of strata ≥ h, for some h. For example, the partial Chen subsets contained in the Chen subset ABFHL are ∅, A, AB, ABF, ABFH, and ABFHL. Notice now that the tree T′ in Fig. 405 is precisely the tree of partial Chen subsets of the tree in Fig. 403, except that many of the nodes are present more than once (as indicated by parallel branch lines).
Indeed, Fig. 405 has not nine leaves but 2 + 9 + 18 + 12 + 12 + 8 + 16 + 24 + 24 = 125 of them, because it has two copies of AB²E³, 3·3 copies of AB²F³H³L³, 2·3·3 copies of AC²D²F³H⁵L⁵, and so on. For purposes of calculating the variance, it's best in fact to imagine that there are not nine Chen subsets but 125; each S should be replicated ∏_{s∈S\r} W_S(ŝ) times. This is the numerator of p(S) in (13). Therefore, our new convention is that each of the clones of S occurs with probability p′(S) = 1/∏_{s∈S\r} W_S(s) — retaining only the denominator of p(S). (The leaf 'AB²E³', for instance, which corresponds to X = a + 2b + 3e, doesn't occur once with probability 1/3; it occurs twice, each with probability 1/6. Such splitting of cases with equal X does not affect the mean or variance.) And with this convention there's a perfect correspondence between the nodes of Fig. 405 and the (clones of) partial Chen subsets of Fig. 403.

Exercise 31 explains how to nail this correspondence down precisely, by showing for example that the partial Chen subset ACD has two clones, which correspond to the two nodes labeled AC²D²|FG²|I²M².
The crux of a nonleaf label in T′ is the portion between vertical lines. For example, the most complicated crux in Fig. 405 is '|I²J³K³|', on stratum 4. We say that two nodes (s, s′) form a critical pair if they occur together in at least one crux. The critical pairs in Fig. 405 are therefore (B,C), (E,F), (F,G), (H,I), (I,J), (I,K), (J,K), and (L,M).

Each critical pair has a least common ancestor (lca) in T. For example, lca(E,F) = A; lca(I,J) = C; lca(J,K) = G. The critical pair (s, s′) is said to be critical for s″ if lca(s, s′) = s″.
With these definitions, we're ready to state a sweeping generalization of (7):

Theorem V. If Algorithm C computes X, given a stratified tree T, we have

    var(X) = Σ_{s∈T} w(s) Σ{ (cost(s′) − cost(s″))² | (s′, s″) is critical for s }.    (15)

Here w(s), called the heft of s, is defined by a recurrence that generalizes (3):

    w(s) = 1, if s is the root;
    w(s) = w(ŝ) + Σ{ w(lca(s, s′)) | (s, s′) is a critical pair }, otherwise.          (16)

For example, (w(A), ..., w(M)) = (1, 2, 2, 2, 3, 3, 3, 4, 7, 8, 8, 4, 3) in Fig. 405.

Proof. Theorem V is proved in exercise 36.

Corollary V. If each individual node cost c(s) is an integer, so is var(X).
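The recurrence (16) is easy to exercise on the running example. This Python fragment (an illustration only, reusing FIG403 from the earlier sketches) hard-codes the eight critical pairs listed above, reproduces the hefts (1, 2, 2, 2, 3, 3, 3, 4, 7, 8, 8, 4, 3), and then evaluates (15) with every c(s) = 1; the result, 29, agrees with a direct enumeration of Σ p(S) X(S)² − 13² over the nine Chen subsets:

  CRIT = [('B','C'), ('E','F'), ('F','G'), ('H','I'),
          ('I','J'), ('I','K'), ('J','K'), ('L','M')]

  def ancestors(s):
      chain = []
      while s is not None:
          chain.append(s); s = FIG403[s][0]
      return chain

  def lca(s, t):
      anc = set(ancestors(s))
      return next(u for u in ancestors(t) if u in anc)

  heft = {}
  for s in sorted(FIG403, key=lambda x: FIG403[x][1]):     # strata upward
      parent = FIG403[s][0]
      heft[s] = 1 if parent is None else heft[parent] + sum(
          heft[lca(u, v)] for (u, v) in CRIT if s in (u, v))
  print([heft[s] for s in 'ABCDEFGHIJKLM'])   # [1,2,2,2,3,3,3,4,7,8,8,4,3]

  def cost(s):                                 # subtree size, since c(s) = 1
      return 1 + sum(cost(t) for t in FIG403 if FIG403[t][0] == s)

  var15 = sum(heft[lca(u, v)] * (cost(u) - cost(v))**2 for (u, v) in CRIT)
  print(var15)                                 # 29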


  
Who knows what I might eventually say next?

  

EXERCISES

1. [20] Figure 400(b) is very similar to the ZDD for independent subsets of Π_8. Why?

► 2. [M20] Let Σ_m denote the sum X_1^m + ··· + X_n^m, where X_1, ..., X_n are independent samples of the random variable X. Also let μ_m = E X^m. (Hence var(X) = μ_2 − μ_1².)
   a) Prove that E Σ_1 = nμ_1 and E Σ_2 = nμ_2.
   b) Express E Σ_1² in terms of n, μ_1, and μ_2.
   c) Consequently E v̂ar(X) = var(X), where v̂ar is defined in Eq. (5).

3. [HM21] Continuing exercise 2, find an unbiased estimator for κ_3 = E(X − E X)³, which is the third cumulant of X (see 1.2.10-(23)).

4. [HM22] What is the variance of v̂ar(X)?

5. [M21] Suppose we obtain the successive estimates X_1, X_2, X_3, X_4, X_5, X_6, ... = 1, 2, 1, 4, 1, 2, ..., where X_k = 2^{ρk} is defined by the ruler function ρ. What is the sample variance v̂ar(X) of Eq. (5) when (a) n = 2^k − 1? (b) n = 2^k?

► 6. [21] Devise efficient data structures for using Algorithm P to estimate the number of independent sets in a given graph G, using (a) binary (b) multiway search trees.
► 7. [M21] (Law of total variance.) We discussed the famous law of total expectation, E X = E(E(X | Y)), after MPR-(11), and observed that it is "a truly marvelous identity, great for hand-waving and for impressing outsiders." Prove the even more marvelous formula

    var(X) = var(E(X | Y)) + E(var(X | Y)).

"The overall variance of a random variable X is the variance of its average plus the average of its variance, with respect to any other random variable Y."

8. [M22] Use the law of total variance to prove (7).

9. [M22] For each of the following graphs, does a binary or multiway search tree give a better estimate of the total number of independent sets, using Algorithm P?
   a) The complete graph K_n.
   b) The empty graph K̄_n.
   c) The complete bipartite graph K_{m,n}.

10. [M30] Continuing exercise 9, consider the n-vertex path graph P_n.

► 12. [21] Given a stratified tree T, formulate an XCC problem whose solutions are the Chen subsets of T.

13. [20] A certain tree T with eight strata has 15 nodes, seven of which have degree 2; the other eight nodes are leaves. How many of those leaves can be in a Chen subset S?

14. [17] Play through Algorithm C by hand with respect to the stratified tree of Fig. 403. Which of the nine Chen subsets in (9) does it implicitly construct, if the uniform deviate U in (10) is always exactly 2/5?

15. [15] How many Chen subsets does the multiway tree in Fig. 402 have?

16. [16] Why does Algorithm C reduce to Algorithm P when h(s) = h(ŝ) + 1 for all s?

► 17. [21] Let T be a search tree for the independent sets of a graph. Explain how to use vector-valued costs c(s) in Algorithm C, in order to obtain unbiased estimates of the number of independent sets that have sizes 0, 1, ..., when T is (a) binary (b) multiway.

► 18. [M21] How well does Algorithm C do, when estimating the (a) binary and (b) multiway search trees for the families of graphs in exercises 9 and 10?
19. [M20] How many independent sets does a random graph on n vertices have, if each edge is present with probability 1/2? Evaluate the expected number when n = 4096.

20. [40] Determine the exact number of independent sets of size k in the graph Π_4096, for k = 0, 1, ..., 19.

► 22. [24] Use Algorithms P and C to estimate the number of independent sets in the queen graphs Q_n, namely the number of ways to place k nonattacking queens on an n × n chessboard, for 0 ≤ k ≤ n. Consider the counts for individual k as well as the total over all k, using both binary and multiway search trees. Approximately how many placements of nonattacking queens are possible on a 64 × 64 board?
25. [21] Find an oriented binary tree on 11 vertices for which var_P(T) = 0 < var_C(T).

30. [M19] Write out Z(AC²D²G³|I²J³K³|M²) and the Z equivalents of the three states just below that state in Fig. 405. Verify that the 8-way branch preserves E Z.

31. [M23] Given a partial Chen subset S′ of T, define a directed acyclic graph dag(S′) from which we can readily determine the label of the node that corresponds to S′ in a tree T′ such as Fig. 405. How many clones of that node appear in T′?

32. [M21] Nodes E and G both occur on stratum 3 of Fig. 403, but (E, G) isn't a critical pair. Why not? What is a necessary and sufficient condition that two nodes of the same stratum make a critical pair?

33. [M20] Why is Eq. (7) a special case of (15), and Eq. (3) a special case of (16)?

► 34. [M27] Given the tree T′ constructed in exercise 31, what costs c′(s′) on its nodes will make Algorithm P emulate the behavior of Algorithm C on T? And what is cost′(s′), the sum of all costs c′ in the subtree of T′ rooted at s′?
► 35. [M28] (Partial imitation trees.) Given a stratified tree T with root r, define a tree I(s) for each s ∈ T as follows: I(r) = T. Otherwise I(s) is obtained from I(ŝ) by surgically removing the subtrees of every node s′ ≠ s with h(s′) = h(s), and replacing them with copies of s and its subtree. (These "partial imitation" trees, which encapsulate key information about T, interpolate between T and the "full imitation" trees illustrated in Fig. 406.)
   a) If T is the tree in Fig. 403, its partial imitation I(L) is shown in the margin of the fascicle. Draw I(I).
   b) Prove that nodes s and t can both be present in the same Chen subset S if and only if t appears in I(s).
   c) What are the Chen subsets of I(L) in the example above? With what probabilities do they arise, if Algorithm C is applied to I(L) (instead of to T)?
   d) Every node s′ of I(s) is a clone of some particular node [s′] of T. If S is a Chen subset of I(s), show that the set [S] = {[s′] | s′ ∈ S} is a Chen subset of T.
   e) Write S ≡ S′, and say that S is similar to S′, if S and S′ are Chen subsets of I(s) with [S] = [S′]. In such cases dag([S]) is essentially the same as dag(S) and dag(S′). Illustrate this fact with the answer to (c).
   f) Prove that, for all Chen subsets S and s ∈ S, Σ_{S′} p_{I(s)}(S′) [[S′] = S] = p_T(S) W_S(s). (See (13).) Hint: Consider Algorithm C conditioned on choosing s.
   g) Let s and t be any nodes of T, possibly equal. Prove that E W_S(s)W_S(t), over all Chen subsets S chosen by Algorithm C, is the number of clones of t in I(s).

36. [M25] Exercise 34 defines the costs that enter into (7) when Algorithm P emulates Algorithm C. Therefore (7) evaluates the desired variance var(X) in that algorithm.
   a) Prove that, whenever s′_j and s′_k are children of the same parent s′ ∈ T′, the difference cost′(s′_j) − cost′(s′_k) between their subtree costs is a multiple of the difference between the costs of two nodes s_j and s_k that form a critical pair in T.
   b) Consequently var(X) is the sum of w(s′, s″)(cost(s′) − cost(s″))² over all critical pairs (s′, s″), for some rational constants w(s′, s″). The remaining task is to prove that w(s′, s″) equals the "heft" w(lca(s′, s″)) defined in (16). Apply exercise 35(g).

► 37. [22] Design an algorithm that evaluates var(X), the variance in Eq. (15), given a stratified tree T. Hint: See exercise 31.
99. [00] this is a temporary dummy exercise

999. [M00] this is a temporary exercise (for dummies)

ANSWERS TO EXERCISES
After [this] way of Solving Questions, a man may steale a Nappe,
and fall to worke again afresh where he left off.
— JOHN AUBREY, An Idea of Education of Young Gentlemen (c. 1684)

SECTION 7.2.2.9

1. The ZDD for this family of sets (with elements ordered 7 > 6 > ··· > 0) is obtained by (i) combining all nodes with the same label into a single node; (ii) changing each label to its largest element; and (iii) using ⊤ for the leaves.
2. (a) Indeed, E Σ_m = E Σ_{k=1}^n X_k^m = Σ_{k=1}^n E X_k^m = nμ_m.
   (b) E(Σ_{k=1}^n X_k)² = E(Σ_{k=1}^n X_k² + Σ_{j=1}^n Σ_{k=1}^n X_j X_k [j ≠ k]) = nμ_2 + n(n−1)μ_1².
   (c) (E Σ_2)/n − E(Σ_1/n)² = μ_2 − μ_2/n − (n−1)μ_1²/n = ((n−1)/n) var(X); multiplying by n/(n−1), as in (5), gives var(X).

3. We have E Σ_1³ = nμ_3 + 3n(n−1)μ_2 μ_1 + n(n−1)(n−2)μ_1³, and E Σ_2 Σ_1 = nμ_3 + n(n−1)μ_2 μ_1. Hence (n²Σ_3 − 3nΣ_2 Σ_1 + 2Σ_1³)/(n(n−1)(n−2)) has the desired mean. [R. A. Fisher, Proceedings of the London Math. Society (2) 30 (1929), 199-238.]

4. κ_4/n + 2σ⁴/(n−1). (See CMath, exercise 8.54.)

5. (a) (2^{3k+1} − (k²+4)2^{2k} + 2^{k+1})/(4(2^k−1)(2^k−2)) = 2^{k−1} − k²/4 + O(1);
   (b) (6·2^{3k} − (k²+4k+6)2^{2k})/(2^{k+2}(2^k−1)) = 3·2^{k−1} − k²/4 − k + O(1).

6. If G has vertices {0, 1, ..., n−1}, represent it bitwise as an n × n matrix A, with n rows A_j of ⌈n/64⌉ octabytes each. We will use only the leftmost j bits of A_j; and in that row, we will assume that a_{jk} = [j is not adjacent to k]. (Thus A is actually the complement of the usual adjacency matrix.) We work with a bitwise mask M = m_0 m_1 m_2 ..., represented in ⌈n/64⌉ octabytes M_i = m_{64i} m_{64i+1} ... m_{64i+63}. And we maintain an integer z such that M_{z−1} ≠ 0 but M_i = 0 for all i ≥ z. (Or z = 0 if M is entirely zero.) The presence of z allows us to do most of the necessary multiword operations in a single 64-bit register. Mask M represents the labels on the nodes of Figs. 400(b) and 401; initially m_j = [j < n] for all j, and z = ⌈n/64⌉, representing the root of the tree.
   (a) In step P4, let v be maximum with m_v = 1, and set m_v ← 0. (Bitwise, we set x ← M_{z−1}, M_{z−1} ← x & (x−1), and v ← 64z − 1 − ν(x̄ & (x−1)).) Set d ← 2; and with probability 1/2, also set M ← M & A_v. While doing this, adjust z as necessary.
   (b) In step P4, set d ← νM. After choosing J, zero out the rightmost J 1-bits of M; then set M ← M & A_v, where m_v was the Jth 1-bit removed.
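In Python, unbounded integers can stand in for the multiword mask M, which makes variant (b) especially short. The sketch below is an illustration only; for convenience it reverses the bit order of the answer, letting bit v of an int play m_v, so that "the rightmost 1-bits" become the highest set bits:

  import random

  def multiway_sample(nonadj, n):
      """One Algorithm P estimate of the number of independent sets, via the
      multiway tree. nonadj[v] is a bitmask of the vertices u < v that are
      NOT adjacent to v (the complemented adjacency matrix of the answer)."""
      M = (1 << n) - 1              # root label: every vertex is a candidate
      W = X = 1                     # c(s) = 1 for all s, so X accumulates W's
      while M:
          d = bin(M).count('1')     # degree = number of candidate vertices
          W *= d
          j = random.randrange(d)
          for _ in range(j + 1):    # remove the j+1 largest candidates;
              v = M.bit_length() - 1            # v is the last one removed
              M &= ~(1 << v)
          M &= nonadj[v]            # keep only smaller non-neighbors of v
          X += W
      return X

  # Complete graph K_n: nonadj[v] = 0, so every sample returns exactly n + 1.
  assert multiway_sample([0] * 5, 5) == 6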
7. Working in the slices of probability space where Y is constant, we have (by definition) var(X | Y) = E(X² | Y) − (E(X | Y))² and var(E(X | Y)) = E(E(X | Y))² − (E(E(X | Y)))². Hence E(var(X | Y)) = E(E(X² | Y)) − E(E(X | Y))². The complicated term E(E(X | Y))² fortuitously cancels out, giving var(E(X | Y)) + E(var(X | Y)) = E(E(X² | Y)) − (E(E(X | Y)))² = E X² − (E X)².

8. Let T consist of the root r and d subtrees T_1, ..., T_d rooted at r_1, ..., r_d. Then Algorithm P computes X = X(T) = c(r) + d · X(T_J), where J ∈ {1, ..., d} is uniform. Conditioning on J we have E(X(T) | J = j) = E(c(r) + d · X(T_j)) = c(r) + d · cost(r_j). Hence

    var(E(X | J)) = Σ_{j=1}^d (d · cost(r_j))²/d − (Σ_{j=1}^d d · cost(r_j)/d)²
                  = d Σ_{j=1}^d cost(r_j)² − (Σ_{j=1}^d cost(r_j))²
                  = Σ_{j=1}^d Σ_{k=j+1}^d (cost(r_j) − cost(r_k))².

Also var(X(T) | J = j) = var(d · X(T_j)) = d² var(X(T_j)). Hence E(var(X | J)) = d² Σ_{j=1}^d var(X(T_j))/d = d Σ_{j=1}^d var(X(T_j)).

The total variance satisfies the recurrence var(X) = var(E(X | J)) + E(var(X | J)), which is solved by (7). [A generalization to the case that nonuniform probabilities are used in step P4 was given by D. E. Knuth in Math. Comp. 29 (1975), 130.]

9. (a) This is a worst-case scenario for binary search trees, as discussed in the text. But the multiway search tree has variance 0.
   (b) This is a best-case scenario for binary search trees, since each of the 2^n paths yields a perfect estimate. But the multiway search trees are the binomial trees, which make Algorithm P behave horribly, as shown in exercise 7.2.2-52. Indeed, (7) shows that the variance satisfies v_n = n Σ_{k=0}^{n−1} 4^k − (Σ_{k=0}^{n−1} 2^k)² + n Σ_{k=0}^{n−1} v_k; it's easy to prove that v_k > (k+1)! for all k ≥ 4.
   (c) There are 2^m + 2^n − 1 independent sets. The search trees depend on an ordering of the vertices; let's assume that the m vertices of one part precede the n vertices of the other. With binary branching there are 2^m leaves on level m + n and 2^n − 1 leaves on level n; the variance turns out to be (2^m − 1)²(2^n − 1). But with multiway branching we're stuck with two binomial trees, one for m and one for n, joined at the root.
10. Let the path be 1 −−− 2 −−− ··· −−− n. In the binary case the search tree turns out to be the Fibonacci tree of order n + 1, which has F_{n+2} leaves. (See Fig. 8 in Section 6.2.1.) The variance satisfies the interesting recurrence v_n = (F_{n+1} − F_n)² + 2v_{n−1} + 2v_{n−2}, which can be solved by working with generating functions because we have Σ_{n≥0} v_n z^n = (z² − z³)/((1+z)(1−3z+z²)(1−2z−2z²)). The solution is

    v_n = ((1+√3)^{n+1} + (1−√3)^{n+1})/2 − (4F_{2n} + 7F_{2n+1} − 2(−1)^n)/5,

and it grows slightly faster than the square of the mean, because 1 + √3 ≈ 2.73205 exceeds φ² ≈ 2.61803.
On the other hand, the multiway search tree is bad news. If we call it T_n, the n children of the root are (T_0, T_0, T_1, ..., T_{n−2}). The variance grows superexponentially.

12. Let there be a primary item #s for each node s, and a primary item h for each stratum. Also a secondary item s for each node. The root r has one option, '#r 0 r:1'. Every node s ≠ r has two options, '#s h(s) s:1 ŝ:1' and '#s s:0'. ["Node s is either in or out."] Also append the option 'h s_1:0 ... s_k:0' for each stratum h > 0, where {s_1, ..., s_k} is the set of all parents of nodes on that stratum. ["If a stratum isn't represented, the parents of its nodes mustn't be included."]
For example, the options for Fig. 403 include '#G 3 G:1 C:1'; '4 F:0 C:0 G:0'.
13. There can't be five leaves in S, because at least four parents would be needed and there are only eight strata. But the 15-node tree T drawn in the fascicle has a four-leaf Chen subset ABCDFILO.

14. Examining the children in left-to-right order, we perform merge(B, 1), merge(F, 1), merge(C, 1) when h = 0; then merge(I, 2), merge(D, 2), merge(G, 2) when h = 1; etc. The Chen subset is ACDGJM (because U = 2/5 in merge(K, 3) retains S[4] = J). The final value of X is c(A) + 2c(C) + 2c(D) + 3c(G) + 8c(J) + 2c(M).

15. We can choose arbitrarily on the "bottom three" strata; so the answer is 4 · 2 · 17.

16. We'll always have h′ = h + 1 in (10). So merging {s_1, ..., s_d} will leave a uniformly random s_j in S[h+1], with W[h+1] = d · W. (But Algorithm C does take a lot longer, of course, because each child is examined; Algorithm P peeks at only one of them.)
17. (a) As mentioned in the text, we want c(s) = e_k when s is a leaf, otherwise c(s) = (0, ...). Add a new array K to Algorithm C; set K[0] ← 0 in step C1, k ← K[h] in step C3; and when setting S[h′] ← s in (10), also set K[h′] ← k if s was the left child, K[h′] ← k + 1 if s was the right child. Ignore the value of X until termination — the final hmax will always be n, and there will be only one leaf. Then set X ← W · e_k.
   (b) Now c(s) = e_l when we're l steps from the root. Add a new array L to Algorithm C; set L[0] ← 0 in step C1, l ← L[h] in step C3; and when setting S[h′] ← s in (10), also set L[h′] ← l + 1. Notice that the Chen subset might contain several nodes from different strata with the same l. So X should be maintained as an array X_0, X_1, ..., initially zero, with X_l ← X_l + W in step C3. (It's convenient to maintain a new variable l_max, initially zero, the largest l seen so far in step C3.)
In practice we run Algorithm C many times, and we use Eqs. 4.2.2-(15), (16) to accumulate the overall statistics. Then it is important to realize that each run with a binary search tree estimates zero for all sizes k′ ≠ k; the current means and variances must be updated for every size less than or equal to K_max, the largest k seen so far in different runs. Similarly, with multiway trees, we must update with estimates that are zero for all sizes with l_max < l ≤ L_max, where L_max is the largest l_max seen so far.
With another auxiliary array, for partial sums, we could estimate (say) the independent sets of size k whose sum is a prime number.
18. In seven of those eight cases, Algorithm C gives perfect results (variance 0), because any two nodes with labels of the same length have identical subtrees.
The exception is the binary tree for independent subsets of K_{m,n}, when n > m + 1. If n = m + t for t > 0, the stratum with labels of length m + k has one node of cost 2^m + 2^k − 1 and 2^{t−k} − 1 nodes of cost 2^{m+k}, for 0 ≤ k < t. Fortunately, that stratum has only one critical pair. Thus the variance comes to Σ_{k=0}^{t−1} (2^{m+k} − 2^m − 2^k + 1)² = (2^m − 1)²(2^{2t} − 6·2^t + 3t + 5)/3.

19. The expected number of independent sets of size k is clearly (n choose k)/2^{k(k−1)/2}. This quantity is maximized when k ≈ lg n − lg ln n. (More precisely, k ≈ ξ/ln 2, where ξe^ξ = n ln 2; see 7.2.1.5-(32).) Summing when n = 4096 gives ≈ 3.690196266499 × 10¹⁶. (Of course, Π_n isn't exactly random.)
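A two-line check (an illustration only) reproduces that sum with Python's exact big-integer arithmetic:

  from math import comb

  n = 4096
  total = sum(comb(n, k) / 2**(k*(k-1)//2) for k in range(n + 1))
  print(total)            # about 3.690196266499e+16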
20. (Solution by P. Östergård, after 11929 "virtual core days" on a large computer cluster.) 1, 4096, 4190310, 1427544996, 182194979725, 9291979435922, 197255119600988, 1792775452108880, 7121172972180096, 12558590689855334, 9956022747349148, 3583838539300134, 590649524479701, 44880043687654, 1581582919448, 25981609725, 199845828, 722567, 1250, 0. (Multiway estimates with Algorithm P and binary estimates with Algorithm C are satisfactory only for k ≤ 13. The expected values for a random graph, as in exercise 19, are similar: 1, 4096, 4193280, ..., 802660.4, 1387.7, 1.1.)

22. Consider first the familiar 8 × 8 chessboard. Exercise 7.1.4-241(a) mentioned this family f of sets, and reported that the maximal ones are represented by a ZDD of size 8577. The entire family, with Z(f) = 20243, yields the generating function n_0 + n_1 z + ··· = 1 + 64z + 1288z² + 10320z³ + 34568z⁴ + 46736z⁵ + 22708z⁶ + 3192z⁷ + 92z⁸.
Using one gigamem of computation, Algorithm P estimates (n_0, ..., n_8) = (0, 0, 0, 5700 ± 1700, 26000 ± 2000, 64000 ± 21000, 24000 ± 1000, 3170 ± 40, 88 ± 1), total 123000 ± 21000, with 6565984 samples of binary trees (b̂ ≈ .2); or (1, 64, 1288, 10326 ± 4, 34600 ± 30, 46800 ± 100, 22700 ± 100, 3100 ± 100, 110 ± 30), total 119000 ± 300, with 15086912 samples of multiway trees (b̂ ≈ 0.0003). (The exact variances for the total, according to (7), are 2019317169903 and 1253928499136 for binary and multiway.)
In the same amount of time, Algorithm C estimates (1.7 ± .4, 63 ± 3, 1270 ± 14, 10240 ± 40, 34600 ± 70, 46700 ± 80, 22760 ± 70, 3200 ± 30, 93 ± 6), total 118980 ± 40, with 581247 samples of binary trees (b̂ < 10⁻⁵); or (1, 62 ± 2, 1280 ± 20, 10360 ± 60,
34600 ± 100, 46600 ± 100, 22900 ± 100, 3140 ± 50, 88 ± 8), total 119000 ± 40, with 108069 samples of multiway trees (b̂ ≈ 2 × 10⁻⁵). (The exact variances, by Theorem V, are 842451990 and 159706980. The exact total is 118969.)
The 64 × 64 board is a different story. Both versions of Algorithm P, again with a gigamem time limit, yield miserably inadequate estimates like 10⁴⁸ ± 10⁴⁸ or 10³⁸ ± 10³⁸, with Chatterjee–Diaconis scores > .9. But Algorithm C is promising: The binary version estimates (3.6 ± .3) × 10⁷⁸, after generating 740 samples; the multiway version, with time to make only 2 samples, estimates (4 ± 2) × 10⁷⁸. A further binary run, this time allowing a teramem, suggests that the true value is (3.79 ± .02) × 10⁷⁸, with most of the placements having between 45 and 54 queens.
25. One of two examples is drawn in the fascicle; in both cases var_C(T) = 8.

30. Z = a + 2c + 2d + 3g + 2I + 3J + 3K + 2M; Z_1 = a + 2c + 2d + 3g + 8i + 2M; Z_2 = a + 2c + 2d + 3g + 8j + 2M; Z_3 = a + 2c + 2d + 3g + 8k + 2M. And Z = (2Z_1 + 3Z_2 + 3Z_3)/8, because I = i, J = j, K = k.
31. Let dag(∅) be an isolated node, the root. Otherwise, as in (11), let ŝ → s′ whenever h(s) = h(s′) and ŝ, s′ ∈ S′. Also let ŝ → s whenever ŝ ∈ S′ but S′ contains no node in stratum h(s). For example, dag(ACD) is much like (12), but it lacks H and L, and has additional nodes I, G, M; arcs go C → G, C → I, D → M but not C → F.
Put the nodes of dag(S′), with multiplicities W_{S′}(s), into a label, ordered by strata, with vertical bars surrounding the smallest unrepresented stratum. (For example, 'AC²D²|FG²|I²M²'.) The number of clones is ∏_{s∈S′\r} W_{S′}(ŝ).
32. Nodes E and G can't be "between bars" with respect to dag(S′) for any partial Chen subset S′, because their parents (B and C) lie in the same stratum.
In general, same-stratum nodes form a critical pair if and only if they have no two same-stratum ancestors, on the paths between them and their least common ancestor.

33. Two nodes on the same stratum, having the same parent, always form a critical pair for that parent. In the special case where each stratum is simply the distance from the root, those are the only critical pairs. And in that case if the parent ŝ of node s has degree d, the heft w(s) is w(ŝ) + (d−1)w(ŝ), because s has d − 1 siblings.
34. When Algorithm P reaches a node s′ ∈ T′ that corresponds to the partial Chen subset S′, it will have set W ← ∏_{s∈S′} W_{S′}(s), which is 1/p′(S′). Let s ∈ T be the node of S′ with highest stratum. We want step P2 to set X ← X + W_{S′}(s) c(s), because that's what Algorithm C would do. So we define c′(s′) = p′(S′) W_{S′}(s) c(s). [Incidentally, p′(S′) W_{S′}(s) = p′(S″), where s″ is ŝ′, the parent of s′ in T′.]
This definition of c′ implies that cost′(s′) = p′(S′)(W_{S′}(s) c(s) + Σ W_{S′}(s‴) cost(s‴)), summed over all s‴ that are vertices of dag(S′) but not elements of S′.
Of course those definitions cry out for an example. Suppose S′ = ACD. Then s′ is one of the two nodes labeled AC²D²|FG²|I²M². From the label we know that s = D, S″ = AC, p′(S′) = 1/(1·2·2) = 1/4, and p′(S″) = 1/(1·2) = 1/2. Also cost′(s′) = (1/4)(2c(D) + cost(F) + 2cost(G) + 2cost(I) + 2cost(M)) = (1/2)d + (1/4)(F + 2G + 2I + 2M), where F = f + H + L, G = g + J + K, I = i, M = m.
35. (a) (It's rather like the seventh full imitation tree in Fig. 406; the drawing is omitted here.)
   (b) All surgically removed nodes are incompatible with s. Conversely, if t ∈ I(s) we can do further surgery, if necessary, until obtaining a full imitation tree that includes both s and t.
   (c) Using primes to distinguish same-name nodes, we find ten: ABFHL 18/54; ABF′H′L′ 9/54; ACDF′H′L′ 18/300; ACDF′H′L″ 12/300; ACDF″H″L″ 24/300; ACDF″H″L‴ 36/300; ACDF′IL′ 12/300; ACDF′IL″ 8/300; ACDF″IL″ 16/300; ACDF″IL‴ 24/300.
   (d) Clearly [S] satisfies properties (i), (ii), (iii), and (iv) of the definition.
   (e) The first two Chen subsets in (c) have [S] = ABFHL; the next four reduce to ACDFHL; and the last four to ACDFIL. These are the Chen subsets of T that contain L. Notice that dag(S) = dag([S]), when primes on the node names are removed, by definition of the dag; hence W_S(s′) = W_{[S]}(s′). (Although similar Chen subsets have the same node names and the same dag, they often occur with different probabilities, because the parental structures in formula (13) differ.)
   (f) The behavior of Algorithm C on I(s) is equivalent to its behavior on T, restricted to cases where s and all of its ancestors are part of the final sample. Hence the stated sum is the probability of obtaining S, divided by 1/W_S(s) — which is the probability of including s. (The factors for s and its ancestors in (13) are telescoping.)
   (g) By (f), Σ_S W_S(s) W_S(t) p_T(S) = Σ_{S,S′} p_{I(s)}(S′) [[S′] = S] W_S(t), which equals Σ_{S′} p_{I(s)}(S′) Σ_{s′∈S′} W_{S′}(s′) [[s′] = t]; and since each node s′ of I(s) contributes Σ_{S′∋s′} p_{I(s)}(S′) W_{S′}(s′) = 1, that's just Σ_{s′∈I(s)} [[s′] = t]. (Surprise: By symmetry, it must also be the number of clones of s in I(t)(!) For example, there's one I in I(L), and one L in I(I). This theory was developed in Chen's Ph.D. thesis, Heuristic Sampling on Backtrack Trees (Stanford University, 1989), §3.3, where he noted that Corollary V is an immediate consequence.)
36. (a) This is almost true by definition, once we penetrate the definitions and the notations. Consider, for example, the children of node AC²D²|FG²|I²M²; that node corresponds to S′ = ACD. The left child, corresponding to S′_1 = ACDF, has cost′(s′_1) = f/4 + (3H + 2I + 3L + 2M)/12. The other two children, corresponding to S′_2 = ACDG, have cost′(s′_2) = g/4 + (2I + 3J + 3K + 2M)/12. The difference is (3f + 3H + 3L − 3g − 3J − 3K)/12 = (F − G)/4. And (F, G) is critical.
   (b) Indeed, the heft w(s) is the number of clones of s in I(s), because this quantity satisfies the recurrence (16). Consider, for example, s = I in exercise 35(a); then ŝ = C. There are two I's in I(C); also, on the same stratum, one H, two J's, and two K's. That's because lca(I, H) = A has heft 1, and lca(I, J) = lca(I, K) = C has heft 2.
(Notice furthermore that the coefficient of c(s)² in var(X) is w(s) − 1. If s ≠ t, the coefficient of c(s)c(t) is twice the number of t's in I(s), minus 2.)
37. Let T have nodes {0, 1, ..., n−1}, with root 0, and with link fields LCHILD(p), RSIB(p) in each node, pointing to the leftmost child and right sibling. This algorithm also uses additional fields PARENT, LINK, COST, and HEFT. We assume that LCHILD(p) > p and RSIB(p) > p, unless they're Λ. Furthermore there's an auxiliary array with entries HEAD[h] for 0 ≤ h ≤ hmax, where hmax is the maximum stratum; it's initially zero.

V1. [Compute subtree costs.] For p ← n−1, ..., 1, 0, in decreasing order, do this: Set COST(p) ← c(p), q ← LCHILD(p); while q ≠ Λ, set COST(p) ← COST(p) + COST(q), PARENT(q) ← p, and q ← RSIB(q).

V2. [Link same-stratum nodes.] For p ← n−1, ..., 1, 0 do this: Set h ← h(p), LINK(p) ← HEAD[h], HEAD[h] ← p.

V3. [Loop on h.] Set HEFT(0) ← 1, h ← 1, V ← 0.

V4. [Initialize hefts for h.] Set p ← HEAD[h], and while p ≠ Λ set HEFT(p) ← HEFT(PARENT(p)), p ← LINK(p).

V5. [Loop on p in h.] Set p ← HEAD[h]. Go to V11 if p = Λ.

V6. [Loop on q > p in h.] Set q ← LINK(p). Go to V10 if q = Λ.

V7. [Test criticality.] Set p′ ← PARENT(p), q′ ← PARENT(q). While h(p′) ≠ h(q′), set p′ ← PARENT(p′) if h(p′) > h(q′), otherwise set q′ ← PARENT(q′). Then go to V9 if p′ ≠ q′. (In that case (p, q) isn't critical, by exercise 32.)

V8. [Augment V and hefts.] Set V ← V + HEFT(p′)·(COST(p) − COST(q))², HEFT(p) ← HEFT(p) + HEFT(p′), HEFT(q) ← HEFT(q) + HEFT(p′). (See Eqs. (15) and (16).)

V9. [End loop on q.] Set q ← LINK(q). Return to V7 if q ≠ Λ.

V10. [End loop on p.] Set p ← LINK(p). Return to V6 if p ≠ Λ.

V11. [End loop on h.] Set h ← h + 1. Return to V4 if h ≤ hmax; otherwise terminate (the variance is V).
99. ...

999. ...

INDEX AND GLOSSARY
Index-making has been held to be the driest
as well as lowest species of writing.
We shall not dispute the humbleness of it;
but the task need not be so very dry.
— LEIGH HUNT, in The Indicator (1819)

When an index entry refers to a page containing a relevant exercise, see also the answer to that exercise for further information. An answer page is not indexed here unless it refers to a topic not included in the statement of the exercise.

Preliminary notes for indexing appear in the upper right corner of most pages. If I've mentioned somebody's name and forgotten to make such an index note, it's an error (worth $2.56).

CMath: Concrete Mathematics, a book by R. L. Graham, D. E. Knuth, and O. Patashnik.

Pi graph of order n: The infinite graph Π, restricted to {0, 1, ..., n−1}, 1.

Nothing else is indexed yet (sorry).