
An Introduction to

Mathematical Methods in Combinatorics

Renzo Sprugnoli
Dipartimento di Sistemi e Informatica
Viale Morgagni, 65 - Firenze (Italy)

January 18, 2006


Contents

1 Introduction 5
1.1 What is the Analysis of an Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 The Analysis of Sequential Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Binary Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Closed Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 The Landau notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Special numbers 11
2.1 Mappings and powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 The group structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Counting permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Dispositions and Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6 The Pascal triangle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.7 Harmonic numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.8 Fibonacci numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.9 Walks, trees and Catalan numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.10 Stirling numbers of the first kind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.11 Stirling numbers of the second kind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.12 Bell and Bernoulli numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3 Formal power series 25


3.1 Definitions for formal power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 The basic algebraic structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Formal Laurent Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Operations on formal power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.6 Coefficient extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.7 Matrix representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.8 Lagrange inversion theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.9 Some examples of the LIF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.10 Formal power series and the computer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.11 The internal representation of expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.12 Basic operations of formal power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.13 Logarithm and exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4 Generating Functions 39
4.1 General Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Some Theorems on Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3 More advanced results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 Common Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5 The Method of Shifting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.6 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.7 Some special generating functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.8 Linear recurrences with constant coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47


4.9 Linear recurrences with polynomial coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.10 The summing factor method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.11 The internal path length of binary trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.12 Height balanced binary trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.13 Some special recurrences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

5 Riordan Arrays 55
5.1 Definitions and basic concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.2 The algebraic structure of Riordan arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3 The A-sequence for proper Riordan arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.4 Simple binomial coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.5 Other Riordan arrays from binomial coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.6 Binomial coefficients and the LIF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.7 Coloured walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.8 Stirling numbers and Riordan arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.9 Identities involving the Stirling numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

6 Formal methods 67
6.1 Formal languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.2 Context-free languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.3 Formal languages and programming languages . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.4 The symbolic method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.5 The bivariate case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6.6 The Shift Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.7 The Difference Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.8 Shift and Difference Operators - Example I . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.9 Shift and Difference Operators - Example II . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.10 The Addition Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.11 Definite and Indefinite summation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.12 Definite Summation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.13 The Euler-McLaurin Summation Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.14 Applications of the Euler-McLaurin Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

7 Asymptotics 83
7.1 The convergence of power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2 The method of Darboux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.3 Singularities: poles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.4 Poles and asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.5 Algebraic and logarithmic singularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.6 Subtracted singularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.7 The asymptotic behavior of a trinomial square root . . . . . . . . . . . . . . . . . . . . . . . . 89
7.8 Hayman’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.9 Examples of Hayman’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

8 Bibliography 93
Chapter 1

Introduction

1.1 What is the Analysis of an Algorithm

An algorithm is a finite sequence of unambiguous rules for solving a problem. Once the algorithm is started with a particular input, it will always end, obtaining the correct answer or output, which is the solution to the given problem. An algorithm is realized on a computer by means of a program, i.e., a set of instructions which cause the computer to perform the elaborations intended by the algorithm.

So, an algorithm is independent of any computer and, in fact, the word was used for a long time before computers were invented. Leonardo Fibonacci called "algorisms" the rules for performing the four basic operations (addition, subtraction, multiplication and division) with the newly introduced arabic digits and the positional notation for numbers. "Euclid's algorithm" for evaluating the greatest common divisor of two integer numbers was also well known before the appearance of computers.

Many algorithms can exist which solve the same problem. Some can be very skillful, others can be very simple and straightforward. A natural problem, in these cases, is to choose, if any, the best algorithm, in order to realize it as a program on a particular computer. Simple algorithms are more easily programmed, but they can be very slow or require large amounts of computer memory. The problem is therefore to have some means to judge the speed and the quantity of computer resources a given algorithm uses. The aim of the "Analysis of Algorithms" is just to give the mathematical bases for describing the behavior of algorithms, thus obtaining exact criteria to compare different algorithms performing the same task, or to see if an algorithm is sufficiently good to be efficiently implemented on a computer.

Let us consider, as a simple example, the problem of searching. Let S be a given set. In practical cases S is the set N of natural numbers, or the set Z of integer numbers, or the set R of real numbers, or also the set A* of the words on some alphabet A. However, as a matter of fact, the nature of S is not essential, because we always deal with a suitable binary representation of the elements of S on a computer, and these representations have therefore to be considered as "words" in the computer memory. The "problem of searching" is as follows: we are given a finite ordered subset T = (a1, a2, ..., an) of S (usually called a table, its elements referred to as keys) and an element s ∈ S, and we wish to know whether s ∈ T or s ∉ T, and in the former case which element ak in T it is.

Although the mathematical problem "s ∈ T or not" has almost no relevance, the searching problem is basic in computer science, and many algorithms have been devised to make the process of searching as fast as possible. Surely, the most straightforward algorithm is sequential searching: we begin by comparing s and a1, and we are finished if they are equal. Otherwise, we compare s and a2, and so on, until we find an element ak = s or reach the end of T. In the former case the search is successful and we have determined the element in T equal to s. In the latter case we are convinced that s ∉ T and the search is unsuccessful.

The analysis of this (simple) algorithm consists in finding one or more mathematical expressions describing in some way the number of operations performed by the algorithm as a function of the number n of the elements in T. This definition is intentionally vague, and the following points should be noted:

• an algorithm can present several aspects and therefore may require several mathematical expressions to be fully understood. For example, as far as the sequential search algorithm is concerned, we are interested in what happens during a successful or unsuccessful search. Besides, for a successful search, we wish to know what the worst case is (Worst Case Analysis) and what the average case is (Average Case Analysis), with respect to all the tables containing n elements or, for a fixed table, with respect to the n elements it contains;

• the operations performed by an algorithm can be
of many different kinds. In the example above the only operations involved in the algorithm are comparisons, so no doubt is possible. In other algorithms we can have arithmetic or logical operations. Sometimes we can also consider more complex operations, such as square roots, list concatenation or reversal of words. Operations depend on the nature of the algorithm, and we can decide to consider as an "operation" also very complicated manipulations (e.g., extracting a random number or performing the differentiation of a polynomial of a given degree). The important point is that every instance of the "operation" takes about the same time or has the same complexity. If this is not the case, we can give different weights to different operations or to different instances of the same operation.

We observe explicitly that we never consider execution time as a possible parameter for the behavior of an algorithm. As stated before, algorithms are independent of any particular computer and should never be confused with the programs realizing them. Algorithms are only related to the "logic" used to solve the problem; programs can depend on the ability of the programmer or on the characteristics of the computer.

1.2 The Analysis of Sequential Searching

The analysis of the sequential searching algorithm is very simple. For a successful search, the worst case analysis is immediate, since we have to perform n comparisons if s = an is the last element in T. More interesting is the average case analysis, which introduces the first mathematical device of algorithm analysis. To find the average number of comparisons in a successful search, we should sum the number of comparisons necessary to find each element in T, and then divide by n. It is clear that if s = a1 we only need a single comparison; if s = a2 we need two comparisons, and so on. Consequently, we have:

Cn = (1/n)(1 + 2 + · · · + n) = (1/n) · n(n + 1)/2 = (n + 1)/2

This result is intuitively clear but important, since it shows in mathematical terms that the number of comparisons performed by the algorithm (and hence the time taken on a computer) grows linearly with the dimension of T.

The concluding step of our analysis was the execution of a sum, a well-known sum in the present case. This is typical of many algorithms and, as a matter of fact, the ability in performing sums is an important technical point for the analyst. A large part of our efforts will be dedicated to this topic.

Let us now consider an unsuccessful search, for which we only have an Average Case analysis. If Uk denotes the number of comparisons necessary for a table with k elements, we can determine Un in the following way. We compare s with the first element a1 in T and obviously we find s ≠ a1, so we go on with the table T′ = T \ {a1}, which contains n − 1 elements. Consequently, we have:

Un = 1 + Un−1

This is a recurrence relation, that is, an expression relating the value of Un with other values Uk having k < n. It is clear that if some value, e.g., U0 or U1, is known, then it is possible to find the value of Un for every n ∈ N. In our case, U0 is the number of comparisons necessary to find out that an element s does not belong to a table containing no element. Hence we have the initial condition U0 = 0 and we can unfold the preceding recurrence relation:

Un = 1 + Un−1 = 1 + 1 + Un−2 = · · · = 1 + 1 + · · · + 1 + U0 = n    (n ones)

Recurrence relations are the other mathematical device arising in algorithm analysis. In our example the recurrence is easily transformed into a sum, but as we shall see this is not always the case. In general we have the problem of solving a recurrence, i.e., of finding an explicit expression for Un, starting with the recurrence relation and the initial conditions. So, another large part of our efforts will be dedicated to the solution of recurrences.

1.3 Binary Searching

Another simple example of analysis can be performed with the binary search algorithm. Let S be a given ordered set. The ordering must be total, as the numerical order in N, Z or R, or the lexicographical order in A∗. If T = (a1, a2, ..., an) is a finite ordered subset of S, i.e., a table, we can always imagine that a1 < a2 < · · · < an, and consider the following algorithm, called binary searching, to search for an element s ∈ S in T. Let ai be the median element in T, i.e., i = ⌊(n + 1)/2⌋, and compare it with s. If s = ai then the search is successful; otherwise, if s < ai, perform the same algorithm on the subtable T′ = (a1, a2, ..., ai−1); if instead s > ai, perform the same algorithm on the subtable T′′ = (ai+1, ai+2, ..., an). If at any moment the table on which we perform the search is reduced to the empty set ∅, then the search is unsuccessful.
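To make the two search algorithms concrete, here is a small Python sketch (ours, not from the text, which gives no code). Each probe of a table element is counted as one three-way comparison, as in the analysis above; the empirical average for sequential searching then matches Cn = (n + 1)/2 exactly.

```python
def sequential_search(table, s):
    """Sequential search: return (index, comparisons); index is None on failure."""
    comparisons = 0
    for i, key in enumerate(table):
        comparisons += 1          # one three-way comparison of s with a key
        if key == s:
            return i, comparisons
    return None, comparisons      # reached the end of T: unsuccessful

def binary_search(table, s):
    """Binary search on a sorted table: return (index, comparisons)."""
    lo, hi = 0, len(table) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2      # median element of the current subtable
        comparisons += 1          # one three-way comparison with the median
        if table[mid] == s:
            return mid, comparisons
        if s < table[mid]:
            hi = mid - 1          # continue in T' = (a1, ..., a(i-1))
        else:
            lo = mid + 1          # continue in T'' = (a(i+1), ..., an)
    return None, comparisons      # subtable reduced to the empty set

# Average successful comparisons for sequential searching: Cn = (n + 1)/2
n = 100
table = list(range(1, n + 1))
avg = sum(sequential_search(table, s)[1] for s in table) / n
assert avg == (n + 1) / 2
```

For a table of 2^k − 1 elements, an unsuccessful binary search performs exactly k comparisons, in agreement with Bn = log2(n + 1): for instance, searching 0 in (1, ..., 7) costs 3 comparisons.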
Let us consider first the Worst Case analysis of this algorithm. For a successful search, the element s is only found at the last step of the algorithm, i.e., when the subtable on which we search is reduced to a single element. If Bn is the number of comparisons necessary to find s in a table T with n elements, we have the recurrence:

Bn = 1 + B⌊n/2⌋

In fact, we observe that every step reduces the table to ⌊n/2⌋ or to ⌊(n − 1)/2⌋ elements. Since we are performing a Worst Case analysis, we consider the worse situation. The initial condition is B1 = 1, relative to the table (s), to which we should always reduce. The recurrence is not so simple as in the case of sequential searching, but we can simplify everything by considering a value of n of the form 2^k − 1. In fact, in such a case we have ⌊n/2⌋ = ⌊(n − 1)/2⌋ = 2^(k−1) − 1, and the recurrence takes on the form:

B(2^k − 1) = 1 + B(2^(k−1) − 1)    or    βk = 1 + βk−1

if we write βk for B(2^k − 1). As before, unfolding yields βk = k and, returning to the B's, we find:

Bn = log2(n + 1)

since by our definition n = 2^k − 1. This is valid for every n of the form 2^k − 1; for the other values it is an approximation, a rather good approximation, indeed, because of the very slow growth of logarithms.

We observe explicitly that for n = 1,000,000 a sequential search requires about 500,000 comparisons on the average for a successful search, whereas binary searching only requires log2(1,000,000) ≈ 20 comparisons. This accounts for the dramatic improvement that binary searching operates on sequential searching, and the analysis of algorithms provides a mathematical proof of such a fact.

The Average Case analysis for successful searches can be accomplished in the following way. There is only one element that can be found with a single comparison: the median element in T. There are two elements that can be found with two comparisons: the median elements in T′ and in T′′. Continuing in the same way, we find the average number An of comparisons as:

An = (1/n)(1 + 2 + 2 + 3 + 3 + 3 + 3 + 4 + · · · + (1 + ⌊log2(n)⌋))

The value of this sum can be found explicitly, but the method is rather difficult and we delay it until later (see Section 4.7). When n = 2^k − 1 the expression simplifies:

A(2^k − 1) = (1/(2^k − 1)) Σ_{j=1}^{k} j·2^(j−1) = (k·2^k − 2^k + 1)/(2^k − 1)

This sum also is not immediate, but the reader can check it by using mathematical induction. If we now write k·2^k − 2^k + 1 = k(2^k − 1) + k − (2^k − 1), we find:

An = k + k/n − 1 = log2(n + 1) − 1 + log2(n + 1)/n

which is only a little better than the worst case.

For unsuccessful searches, the analysis is now very simple, since we have to proceed as in the Worst Case analysis, and at the last comparison we have a failure instead of a success. Consequently, Un = Bn.

1.4 Closed Forms

The sign "=" between two numerical expressions denotes their numerical equivalence, as for example:

Σ_{k=0}^{n} k = n(n + 1)/2

Although algebraically or numerically equivalent, two expressions can be computationally quite different. In the example, the left-hand expression requires n sums to be evaluated, whereas the right-hand expression only requires a sum, a multiplication and a halving. For n even moderately large (say n ≥ 5) nobody would prefer computing the left-hand expression rather than the right-hand one. A computer evaluates this latter expression in a few nanoseconds, but can require some milliseconds to compute the former, if only n is greater than 10,000. The important point is that the evaluation of the right-hand expression is independent of n, whilst the left-hand expression requires a number of operations growing linearly with n.

A closed form expression is an expression, depending on some parameter n, the evaluation of which does not require a number of operations depending on n. Another example we have already found is:

Σ_{k=0}^{n} k·2^(k−1) = n·2^n − 2^n + 1 = (n − 1)·2^n + 1

Again, the left-hand expression is not in closed form, whereas the right-hand one is. We observe that 2^n = 2 × 2 × · · · × 2 (n times) seems to require n − 1 multiplications. In fact, however, 2^n is a simple shift in a binary computer and, more in general, every power α^n = exp(n ln α) can always be computed with the maximal accuracy allowed by a computer in constant time, i.e., in a time independent of α and n. This is because the two elementary functions exp(x) and ln(x) have the nice property that their evaluation is independent of their argument. The same property holds true for the most common numerical functions,
such as the trigonometric and hyperbolic functions, the Γ and ψ functions (see below), and so on.

As we shall see, in algorithm analysis there appear many kinds of "special" numbers. Most of them can be reduced to the computation of some basic quantities, which are considered to be in closed form, although apparently they depend on some parameter n. The three main quantities of this kind are the factorial, the harmonic numbers and the binomial coefficients. In order to justify the previous sentence, let us anticipate some definitions, which will be discussed in the next chapter, and give a more precise presentation of the Γ and ψ functions.

The Γ-function is defined by a definite integral:

Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt.

By integrating by parts, we obtain:

Γ(x + 1) = ∫_0^∞ t^x e^(−t) dt = [−t^x e^(−t)]_0^∞ + ∫_0^∞ x t^(x−1) e^(−t) dt = x Γ(x)

which is a basic recurrence property of the Γ-function. It allows us to reduce the computation of Γ(x) to the case 1 ≤ x ≤ 2. In this interval we can use a polynomial approximation:

Γ(x + 1) = 1 + b1 x + b2 x^2 + · · · + b8 x^8 + ε(x)

where:

b1 = −0.577191652    b5 = −0.756704078
b2 =  0.988205891    b6 =  0.482199394
b3 = −0.897056937    b7 = −0.193527818
b4 =  0.918206857    b8 =  0.035868343

The error is |ε(x)| ≤ 3 × 10^(−7). Another method is to use Stirling's approximation:

Γ(x) = e^(−x) x^(x−0.5) √(2π) (1 + 1/(12x) + 1/(288x^2) − 139/(51840x^3) − 571/(2488320x^4) + · · ·).

Some special values of the Γ-function are directly obtained from the definition. For example, when x = 1 the integral simplifies and we immediately find Γ(1) = 1. When x = 1/2 the definition implies:

Γ(1/2) = ∫_0^∞ t^(1/2−1) e^(−t) dt = ∫_0^∞ (e^(−t)/√t) dt.

By performing the substitution y = √t (t = y^2 and dt = 2y dy), we have:

Γ(1/2) = ∫_0^∞ (e^(−y^2)/y) 2y dy = 2 ∫_0^∞ e^(−y^2) dy = √π

the last integral being the famous Gauss' integral. Finally, from the recurrence relation we obtain:

Γ(1/2) = Γ(1 − 1/2) = −(1/2) Γ(−1/2)

and therefore:

Γ(−1/2) = −2 Γ(1/2) = −2√π.

The Γ function is defined for every x ∈ C, except when x is a negative integer or zero, where the function goes to infinity; the following approximation can be important:

Γ(−n + ε) ≈ ((−1)^n / n!) (1/ε).

When we unfold the basic recurrence of the Γ-function for x = n an integer, we find Γ(n + 1) = n × (n − 1) × · · · × 2 × 1. The factorial Γ(n + 1) = n! = 1 × 2 × 3 × · · · × n seems to require n − 2 multiplications. However, for n large it can be computed by means of Stirling's formula, which is obtained from the same formula for the Γ-function:

n! = Γ(n + 1) = n Γ(n) = √(2πn) (n/e)^n (1 + 1/(12n) + 1/(288n^2) + · · ·).

This requires only a fixed amount of operations to reach the desired accuracy.

The function ψ(x), called the ψ-function or digamma function, is defined as the logarithmic derivative of the Γ-function:

ψ(x) = (d/dx) ln Γ(x) = Γ′(x)/Γ(x).

Obviously, we have:

ψ(x + 1) = Γ′(x + 1)/Γ(x + 1) = ((d/dx)(x Γ(x)))/(x Γ(x)) = (Γ(x) + x Γ′(x))/(x Γ(x)) = 1/x + ψ(x)

and this is a basic property of the digamma function. By this formula we can always reduce the computation of ψ(x) to the case 1 ≤ x ≤ 2, where we can use the approximation:

ψ(x) = ln x − 1/(2x) − 1/(12x^2) + 1/(120x^4) − 1/(252x^6) + · · ·.

By the previous recurrence, we see that the digamma function is related to the harmonic numbers Hn = 1 + 1/2 + 1/3 + · · · + 1/n. In fact, we have:

Hn = ψ(n + 1) + γ

where γ = 0.57721566... is the Mascheroni-Euler constant. By using the approximation for ψ(x), we
obtain an approximate formula for the harmonic numbers:

Hn = ln n + γ + 1/(2n) − 1/(12n^2) + · · ·

which shows that the computation of Hn does not require n − 1 sums and n − 1 inversions, as might appear from its definition.

Finally, the binomial coefficient:

(n choose k) = n(n − 1) · · · (n − k + 1)/k! = n!/(k! (n − k)!) = Γ(n + 1)/(Γ(k + 1) Γ(n − k + 1))

can be reduced to the computation of the Γ-function, or can be approximated by using Stirling's formula for factorials. The two methods are indeed the same. We observe explicitly that the last expression shows that binomial coefficients can be defined for every n, k ∈ C, except that k cannot be a negative integer number.

The reader can, as a very useful exercise, write computer programs to realize the various functions mentioned in the present section.

1.5 The Landau notation

To the mathematician Edmund Landau is ascribed a special notation to describe the general behavior of a function f(x) when x approaches some definite value. We are mainly interested in the case x → ∞, but this should not be considered a restriction. Landau notation is also known as O-notation (or big-oh notation), because of the use of the letter O to denote the desired behavior.

Let us consider functions f : N → R (i.e., sequences of real numbers); given two functions f(n) and g(n), we say that f(n) is O(g(n)), or that f(n) is in the order of g(n), if and only if:

lim_{n→∞} f(n)/g(n) < ∞

In formulas we write f(n) = O(g(n)) or also f(n) ∼ g(n). Besides, if we have at the same time:

lim_{n→∞} g(n)/f(n) < ∞

we say that f(n) is in the same order as g(n) and write f(n) = Θ(g(n)) or f(n) ≈ g(n).

It is easy to see that "∼" is an order relation between functions f : N → R, and that "≈" is an equivalence relation. We observe explicitly that when f(n) is in the same order as g(n), a constant K ≠ 0 exists such that:

lim_{n→∞} f(n)/(K g(n)) = 1    or    lim_{n→∞} f(n)/g(n) = K;

the constant K is very important and will often be used.

Before making some important comments on Landau notation, we wish to introduce a last definition: we say that f(n) is of smaller order than g(n), and write f(n) = o(g(n)), iff:

lim_{n→∞} f(n)/g(n) = 0.

Obviously, this is in accordance with the previous definitions, but the notation introduced (the small-oh notation) is used rather frequently and should be known.

If f(n) and g(n) describe the behavior of two algorithms A and B solving the same problem, we will say that A is asymptotically better than B iff f(n) = o(g(n)); instead, the two algorithms are asymptotically equivalent iff f(n) = Θ(g(n)). This is rather clear, because when f(n) = o(g(n)) the number of operations performed by A is substantially less than the number of operations performed by B. However, when f(n) = Θ(g(n)), the number of operations is the same, except for a constant quantity K, which remains the same as n → ∞. The constant K can simply depend on the particular realization of the algorithms A and B, and with two different implementations we may have K < 1 or K > 1. Therefore, in general, when f(n) = Θ(g(n)) we cannot say which algorithm is better, this depending on the particular realization or on the particular computer on which the algorithms are run. Obviously, if A and B are both realized on the same computer and in the best possible way, a value K < 1 tells us that algorithm A is relatively better than B, and vice versa when K > 1.

It is also possible to give an absolute evaluation of the performance of a given algorithm A, whose behavior is described by a sequence of values f(n). This is done by comparing f(n) against an absolute scale of values. The scale most frequently used contains powers of n, logarithms and exponentials:

O(1) < O(ln n) < O(√n) < O(n) < · · · < O(n ln n) < O(n√n) < O(n^2) < · · · < O(n^5) < · · · < O(e^n) < · · · < O(e^(e^n)) < · · ·.

This scale reflects well-known properties: the logarithm grows more slowly than any power n^ε, however small ε, while e^n grows faster than any power
n^k, however large k, as long as k is independent of n. Note that n^n = e^(n ln n) and therefore O(e^n) < O(n^n). As a matter of fact, the scale is not complete, as we obviously have O(n^0.4) < O(√n) and O(ln ln n) < O(ln n), but
the reader can easily fill in other values, which can
be of interest to him or her.
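As a numerical illustration (a sketch of ours, not part of the text), one can evaluate the entries of the scale at a single, moderately large n and observe that the values already appear in the order of the scale; the two extra entries just mentioned fit in where expected. The doubly exponential entry e^(e^n) is omitted because it overflows ordinary floating-point numbers.

```python
import math

# Evaluate each entry of the absolute scale at n = 500 and check the ordering
# O(1) < O(ln n) < O(sqrt(n)) < O(n) < O(n ln n) < O(n sqrt(n)) < O(n^2) < O(n^5) < O(e^n)
n = 500
scale = [1, math.log(n), math.sqrt(n), n, n * math.log(n),
         n * math.sqrt(n), n ** 2, n ** 5, math.exp(n)]
assert scale == sorted(scale)          # the ordering already holds at this n

# Two entries the reader may "fill in":
assert n ** 0.4 < math.sqrt(n)         # O(n^0.4) < O(sqrt(n))
assert math.log(math.log(n)) < math.log(n)   # O(ln ln n) < O(ln n)
```

Of course a single value of n proves nothing about limits; it only makes the relative magnitudes tangible.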
We can compare f (n) to the elements of the scale
and decide, for example, that f (n) = O(n); this
means that algorithm A performs at least as well as
any algorithm B whose behavior is described by a
function g(n) = Θ(n).
Let f (n) be a sequence describing the behavior of
an algorithm A. If f (n) = O(1), then the algorithm
A performs in a constant time, i.e., in a time indepen-
dent of n, the parameter used to evaluate the algo-
rithm behavior. Algorithms of this type are the best
possible algorithms, and they are especially interest-
ing when n becomes very large. If f (n) = O(n), the
algorithm A is said to be linear; if f (n) = O(ln n), A
is said to be logarithmic; if f (n) = O(n^2), A is said to be quadratic; and if f (n) ≥ O(e^n), A is said to be exponential. In general, if f (n) ≤ O(n^r) for some r ∈ R, then A is said to be polynomial.
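The limit definitions of o(·) and Θ(·) can be explored numerically. The following sketch (ours, under the assumption that computing successive ratios f(n)/g(n) is an acceptable stand-in for the limit) shows the two typical behaviors: a ratio vanishing toward 0 suggests f(n) = o(g(n)), while a ratio stabilizing at some constant K suggests f(n) = Θ(g(n)).

```python
import math

def ratio_trend(f, g, ns=(10**2, 10**3, 10**4, 10**5)):
    """Successive ratios f(n)/g(n) for growing n; a vanishing trend suggests
    f(n) = o(g(n)), a stabilizing one suggests f(n) = Theta(g(n))."""
    return [f(n) / g(n) for n in ns]

# n ln n = o(n^2): the ratios decrease toward 0
r = ratio_trend(lambda n: n * math.log(n), lambda n: n ** 2)
assert all(a > b for a, b in zip(r, r[1:])) and r[-1] < 0.01

# 3n + 7 = Theta(n): the ratios approach the constant K = 3
r = ratio_trend(lambda n: 3 * n + 7, lambda n: n)
assert abs(r[-1] - 3) < 1e-3
```

The second example also shows why the constant K is realization-dependent: replacing 3n + 7 by 5n + 7 changes K but not the Θ(n) classification.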
As a standard terminology, mainly used in the
“Theory of Complexity”, polynomial algorithms are
also called tractable, while exponential algorithms are
called intractable. These names are due to the following observation. Suppose we have a linear algorithm A and an exponential algorithm B, not necessarily solving the same problem. Also suppose that an hour of computer time executes both A and B with n = 40. If a new computer is used which is 1000 times faster than the old one, in an hour algorithm A will execute with m such that f (m) = K_A m = 1000 K_A n, or m = 1000n = 40,000. Therefore, the problem solved by the new computer is 1000 times larger than the problem solved by the old computer. For algorithm B we have g(m) = K_B e^m = 1000 K_B e^n; by simplifying and taking logarithms, we find m = n + ln 1000 ≈ n + 6.9 < 47. Therefore, the improvement achieved by the new computer is almost negligible.
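Section 1.4 suggests, as an exercise, writing programs for the functions described there. The following sketch (ours, not the author's) implements the polynomial approximation of the Γ-function, reduced by the recurrence Γ(x + 1) = xΓ(x), and the asymptotic formula for the harmonic numbers, using exactly the coefficients and series given in that section.

```python
import math

# Coefficients b1..b8 of the polynomial approximation of Gamma(x + 1), 0 <= x <= 1
B = [-0.577191652, 0.988205891, -0.897056937, 0.918206857,
     -0.756704078, 0.482199394, -0.193527818, 0.035868343]

def gamma_approx(x):
    """Gamma(x) for x > 0, reduced to 1 <= x <= 2 via Gamma(x + 1) = x*Gamma(x)."""
    if x > 2:
        return (x - 1) * gamma_approx(x - 1)
    if x < 1:
        return gamma_approx(x + 1) / x
    t = x - 1                      # Gamma(x) = Gamma(t + 1) with 0 <= t <= 1
    return 1 + sum(b * t ** (k + 1) for k, b in enumerate(B))

GAMMA = 0.57721566                 # the Mascheroni-Euler constant

def harmonic_approx(n):
    """H_n from the series ln n + gamma + 1/(2n) - 1/(12 n^2)."""
    return math.log(n) + GAMMA + 1 / (2 * n) - 1 / (12 * n ** 2)

# Accuracy checks against direct computation
assert abs(gamma_approx(5.0) - math.gamma(5.0)) < 1e-4   # Gamma(5) = 4! = 24
assert abs(gamma_approx(0.5) - math.sqrt(math.pi)) < 1e-4  # Gamma(1/2) = sqrt(pi)
H_100 = sum(1 / k for k in range(1, 101))
assert abs(harmonic_approx(100) - H_100) < 1e-7
```

As the section remarks, both routines cost a fixed number of operations (plus, here, a number of recurrence steps proportional to x), in contrast with the n-term products and sums of the definitions.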
Chapter 2

Special numbers

A sequence is a mapping from the set N of natural numbers into some other set of numbers. If f : N → R, the sequence is called a sequence of real numbers; if f : N → Q, the sequence is called a sequence of rational numbers; and so on. Usually, the image of a k ∈ N is denoted by f_k instead of the traditional f(k), and the whole sequence is abbreviated as (f_0, f_1, f_2, . . .) = (f_k)_{k∈N}. Because of this notation, an element k ∈ N is called an index.

Often, we also study double sequences, i.e., mappings f : N × N → R or some other numeric set. In this case also, instead of writing f(n, k) we will usually write f_{n,k} and the whole sequence will be denoted by {f_{n,k} | n, k ∈ N} or (f_{n,k})_{n,k∈N}. A double sequence can be displayed as an infinite array of numbers, whose first row is the sequence (f_{0,0}, f_{0,1}, f_{0,2}, . . .), the second row is (f_{1,0}, f_{1,1}, f_{1,2}, . . .), and so on. The array can also be read by columns, and then (f_{0,0}, f_{1,0}, f_{2,0}, . . .) is the first column, (f_{0,1}, f_{1,1}, f_{2,1}, . . .) is the second column, and so on. Therefore, the index n is the row index and the index k is the column index.

We wish to describe here some sequences and double sequences of numbers, frequently occurring in the analysis of algorithms. In fact, they arise in the study of very simple and basic combinatorial problems, and therefore appear in more complex situations, so that they can be considered the fundamental bricks in a solid wall.

2.1 Mappings and powers

In all branches of Mathematics two concepts appear, which have to be taken as basic: the concept of a set and the concept of a mapping. A set is a collection of objects and it must be considered a primitive concept, i.e., a concept which cannot be defined in terms of other and more elementary concepts. If a, b, c, . . . are the objects (or elements) in a set denoted by A, we write A = {a, b, c, . . .}, thus giving an extensional definition of this particular set. If a set B is defined through a property P of its elements, we write B = {x | P(x) is true}, thus giving an intensional definition of the particular set B.

If S is a finite set, then |S| denotes its cardinality or the number of its elements; the order in which we write or consider the elements of S is irrelevant. If we wish to emphasize a particular arrangement or ordering of the elements in S, we write (a_1, a_2, . . . , a_n), if |S| = n and S = {a_1, a_2, . . . , a_n} in any order. This is the vector notation and is used to represent arrangements of S. Two arrangements of S are different if and only if an index k exists for which the elements corresponding to that index in the two arrangements are different; obviously, as sets, the two arrangements continue to be the same.

If A, B are two sets, a mapping or function from A into B, denoted by f : A → B, is a subset of the Cartesian product of A by B, f ⊆ A × B, such that every element a ∈ A is the first element of one and only one pair (a, b) ∈ f. The usual notation for (a, b) ∈ f is f(a) = b, and b is called the image of a under the mapping f. The set A is the domain of the mapping, while B is the range or codomain. A function for which every a_1 ≠ a_2 ∈ A corresponds to pairs (a_1, b_1) and (a_2, b_2) with b_1 ≠ b_2 is called injective. A function in which every b ∈ B belongs to at least one pair (a, b) is called surjective. A bijection or 1-1 correspondence or 1-1 mapping is any injective function which is also surjective.

If |A| = n and |B| = m, the Cartesian product A × B, i.e., the set of all the couples (a, b) with a ∈ A and b ∈ B, contains exactly nm elements or couples. A more difficult problem is to find out how many mappings from A to B exist. We can observe that every element a ∈ A must have its image in B; this means that we can associate to a any one of the m elements in B. Therefore, we have m · m · . . . · m different possibilities, when the product is extended to all the n elements in A. Since all the mappings can be built in this way, we have a total of m^n different mappings from A into B. This also explains why the set of mappings from A into B is often denoted by


B^A; in fact we can write:

    |B^A| = |B|^{|A|} = m^n.

This formula allows us to solve some simple combinatorial problems. For example, if we toss 5 coins, how many different configurations head/tail are possible? The five coins are the domain of our mappings and the set {head, tail} is the codomain. Therefore we have a total of 2^5 = 32 different configurations. Similarly, if we toss three dice, the total number of configurations is 6^3 = 216. In the same way, we can count the number of subsets in a set S having |S| = n. In fact, let us consider, given a subset A ⊆ S, the mapping χ_A : S → {0, 1} defined by:

    χ_A(x) = 1  for x ∈ A
    χ_A(x) = 0  for x ∉ A

This is called the characteristic function of the subset A; every two different subsets of S have different characteristic functions, and every mapping f : S → {0, 1} is the characteristic function of some subset A ⊆ S, i.e., the subset {x ∈ S | f(x) = 1}. Therefore, there are as many subsets of S as there are characteristic functions; but these are 2^n by the formula above.

A finite set A is sometimes called an alphabet and its elements symbols or letters. Any sequence of letters is called a word; the empty sequence is the empty word and is denoted by ε. From the previous considerations, if |A| = n, the number of words of length m is n^m. The first part of Chapter 6 is devoted to some basic notions on special sets of words or languages.

2.2 Permutations

In the usual sense, a permutation of a set of objects is an arrangement of these objects in any order. For example, three objects, denoted by a, b, c, can be arranged into six different ways:

    (a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a).

A very important problem in Computer Science is "sorting": suppose we have n objects from an ordered set, usually some set of numbers or some set of strings (with the common lexicographic order); the objects are given in a random order and the problem consists in sorting them according to the given order. For example, by sorting (60, 51, 80, 77, 44) we should obtain (44, 51, 60, 77, 80) and the real problem is to obtain this ordering in the shortest possible time. In other words, we start with a random permutation of the n objects, and wish to arrive at their standard ordering, the one in accordance with their supposed order relation (e.g., "less than").

In order to abstract from the particular nature of the n objects, we will use the numbers {1, 2, . . . , n} = N_n, and define a permutation as a 1-1 mapping π : N_n → N_n. By identifying a and 1, b and 2, c and 3, the six permutations of 3 objects are written:

    ( 1 2 3 )   ( 1 2 3 )   ( 1 2 3 )   ( 1 2 3 )   ( 1 2 3 )   ( 1 2 3 )
    ( 1 2 3 )   ( 1 3 2 )   ( 2 1 3 )   ( 2 3 1 )   ( 3 1 2 )   ( 3 2 1 )

where, conventionally, the first line contains the elements in N_n in their proper order, and the second line contains the corresponding images. This is the usual representation for permutations, but since the first line can be understood without ambiguity, the vector representation for permutations is more common. This consists in writing the second line (the images) in the form of a vector. Therefore, the six permutations are (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1), respectively.

Let us examine the permutation π = (3, 2, 1), for which we have π(1) = 3, π(2) = 2 and π(3) = 1. If we start with the element 1 and successively apply the mapping π, we have π(1) = 3, π(π(1)) = π(3) = 1, π(π(π(1))) = π(1) = 3, and so on. Since the elements in N_n are finite, by starting with any k ∈ N_n we must obtain a finite chain of numbers, which will repeat always in the same order. These numbers are said to form a cycle, and the permutation (3, 2, 1) is formed by two cycles, the first one composed by 1 and 3, the second one only composed by 2. We write (3, 2, 1) = (1 3)(2), where every cycle is written between parentheses and numbers are separated by blanks, to distinguish a cycle from a vector. Conventionally, a cycle is written with the smallest element first and the various cycles are arranged according to their first element. Therefore, in this cycle representation the six permutations are:

    (1)(2)(3), (1)(2 3), (1 2)(3), (1 2 3), (1 3 2), (1 3)(2).

A number k for which π(k) = k is called a fixed point for π. The corresponding cycles, formed by a single element, are conventionally understood, except in the identity (1, 2, . . . , n) = (1)(2) · · · (n), in which all the elements are fixed points; the identity is simply written (1). Consequently, the usual representation of the six permutations is:

    (1)  (2 3)  (1 2)  (1 2 3)  (1 3 2)  (1 3).

A permutation without any fixed point is called a derangement. A cycle with only two elements is called a transposition. The degree of a cycle is the number of its elements, plus one; the degree of a permutation

is the sum of the degrees of its cycles. The six permutations have degree 2, 3, 3, 4, 4, 3, respectively. A permutation is even or odd according to whether its degree is even or odd.

The permutation (8, 9, 4, 3, 6, 1, 7, 2, 10, 5), in vector notation, has a cycle representation (1 8 2 9 10 5 6)(3 4), the number 7 being a fixed point. The long cycle (1 8 2 9 10 5 6) has degree 8; therefore the permutation degree is 8 + 3 + 2 = 13 and the permutation is odd.

2.3 The group structure

Let n ∈ N; P_n denotes the set of all the permutations of n elements, i.e., according to the previous sections, the set of 1-1 mappings π : N_n → N_n. If π, ρ ∈ P_n, we can perform their composition, i.e., a new permutation σ defined as σ(k) = ρ(π(k)) = (π ◦ ρ)(k). An example in P_7 is:

    π ◦ ρ = ( 1 2 3 4 5 6 7 ) ◦ ( 1 2 3 4 5 6 7 ) = ( 1 2 3 4 5 6 7 )
            ( 1 5 6 7 4 2 3 )   ( 4 5 2 1 7 6 3 )   ( 4 7 6 3 1 5 2 )

In fact, for instance, π(2) = 5 and ρ(5) = 7; therefore σ(2) = ρ(π(2)) = ρ(5) = 7, and so on. The vector representation of permutations is not particularly suited for hand evaluation of composition, although it is very convenient for computer implementation. The opposite situation occurs for cycle representation:

    (2 5 4 7 3 6) ◦ (1 4)(2 5 7 3) = (1 4 3 6 5)(2 7)

Cycles in the left hand member are read from left to right and, by examining one cycle after the other, we find the images of every element by simply remembering the cycle successor of the same element. For example, the image of 4 in the first cycle is 7; the second cycle does not contain 7, but the third cycle tells us that the image of 7 is 3. Therefore, the image of 4 is 3. Fixed points are ignored, and this is in accordance with their meaning. The composition symbol '◦' is usually understood and, in reality, the simple juxtaposition of the cycles can well denote their composition.

The identity mapping acts as an identity for composition, since it is only composed by fixed points. Every permutation has an inverse, which is the permutation obtained by reading the given permutation from below, i.e., by sorting the elements in the second line, which become the elements in the first line, and then rearranging the elements in the first line into the second line. For example, the inverse of the first permutation π in the example above is:

    ( 1 2 3 4 5 6 7 )
    ( 1 6 7 5 2 3 4 )

In fact, in cycle notation, we have:

    (2 5 4 7 3 6)(2 6 3 7 4 5) = (2 6 3 7 4 5)(2 5 4 7 3 6) = (1).

A simple observation is that the inverse of a cycle is obtained by writing its first element followed by all the other elements in reverse order. Hence, the inverse of a transposition is the same transposition. Since composition is associative, we have proved that (P_n, ◦) is a group. The group is not commutative, because, for example:

    ρ ◦ π = (1 4)(2 5 7 3) ◦ (2 5 4 7 3 6) = (1 7 6 2 4)(3 5) ≠ π ◦ ρ.

An involution is a permutation π such that π^2 = π ◦ π = (1). An involution can only be composed by fixed points and transpositions, because by the definition we have π^{−1} = π, and the above observation on the inversion of cycles shows that a cycle with more than 2 elements has an inverse which cannot coincide with the cycle itself.

Till now, we have supposed that in the cycle representation every number is only considered once. However, if we think of a permutation as the product of cycles, we can imagine that its representation is not unique and that an element k ∈ N_n can appear in more than one cycle. The representations of σ or π ◦ ρ are examples of this statement. In particular, we can obtain the transposition representation of a permutation; we observe that we have:

    (2 6)(6 5)(6 4)(6 7)(6 3) = (2 5 4 7 3 6)

We transform the cycle into a product of transpositions by forming a transposition with the first and the last element in the cycle, and then adding other transpositions with the same first element (the last element in the cycle) and the other elements in the cycle, in the same order, as second element. Besides, we note that we can always add a couple of transpositions as (2 5)(2 5), corresponding to the two fixed points (2) and (5), and therefore adding nothing to the permutation. All these remarks show that:

• every permutation can be written as the composition of transpositions;

• this representation is not unique, but every two representations differ by an even number of transpositions;

• the minimal number of transpositions corresponding to a cycle has the same parity as the degree of the cycle (except possibly for fixed points, which however always correspond to an even number of transpositions).
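As observed above, the vector representation is convenient for computer implementation. The following minimal sketch (in Python, rather than the Pascal used elsewhere in the book; the function names compose, inverse and cycles are ours) realizes composition in the left-to-right order of this section, inversion, and the cycle representation:

```python
# A permutation in vector notation is a Python tuple whose entry at
# index i (0-based) is the image of i + 1.  Composition follows this
# section's left-to-right convention: (pi o rho)(k) = rho(pi(k)).
def compose(pi, rho):
    return tuple(rho[pi[k] - 1] for k in range(len(pi)))

def inverse(pi):
    # Invert by "reading the permutation from below": if pi(k) = image,
    # then the inverse maps image back to k.
    inv = [0] * len(pi)
    for k, image in enumerate(pi, start=1):
        inv[image - 1] = k
    return tuple(inv)

def cycles(pi):
    # Cycle representation: each cycle starts at its smallest element,
    # and the cycles are listed by increasing first element.
    seen, result = set(), []
    for start in range(1, len(pi) + 1):
        if start not in seen:
            cycle, k = [], start
            while k not in seen:
                seen.add(k)
                cycle.append(k)
                k = pi[k - 1]
            result.append(tuple(cycle))
    return result

pi  = (1, 5, 6, 7, 4, 2, 3)
rho = (4, 5, 2, 1, 7, 6, 3)
print(compose(pi, rho))   # (4, 7, 6, 3, 1, 5, 2), as in the text
print(inverse(pi))        # (1, 6, 7, 5, 2, 3, 4), as in the text
print(cycles((3, 2, 1)))  # [(1, 3), (2,)], i.e. (1 3)(2)
```

The calls at the bottom reproduce the worked examples of this section: the product π ◦ ρ in P_7, the inverse of π, and the cycle decomposition (3, 2, 1) = (1 3)(2).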

Therefore, we conclude that an even [odd] permutation can be expressed as the composition of an even [odd] number of transpositions. Since the composition of two even permutations is still an even permutation, the set A_n of even permutations is a subgroup of P_n and is called the alternating subgroup, while the whole group P_n is referred to as the symmetric group.

2.4 Counting permutations

How many permutations are there in P_n? If n = 1, we only have a single permutation (1), and if n = 2 we have two permutations, exactly (1, 2) and (2, 1). We have already seen that |P_3| = 6 and if n = 0 we consider the empty vector () as the only possible permutation, that is |P_0| = 1. In this way we obtain a sequence (1, 1, 2, 6, . . .) and we wish to obtain a formula giving us |P_n|, for every n ∈ N.

Let π ∈ P_n be a permutation and (a_1, a_2, . . . , a_n) be its vector representation. We can obtain a permutation in P_{n+1} by simply adding the new element n + 1 in any position of the representation of π:

    (n + 1, a_1, a_2, . . . , a_n)   (a_1, n + 1, a_2, . . . , a_n)   · · ·   (a_1, a_2, . . . , a_n, n + 1)

Therefore, from any permutation in P_n we obtain n + 1 permutations in P_{n+1}, and they are all different. Vice versa, if we start with a permutation in P_{n+1} and eliminate the element n + 1, we obtain one and only one permutation in P_n. Therefore, all permutations in P_{n+1} are obtained in the way just described and are obtained only once. So we find:

    |P_{n+1}| = (n + 1) |P_n|

which is a simple recurrence relation. By unfolding this recurrence, i.e., by substituting the same expression for |P_n| into |P_{n+1}|, and so on, we obtain:

    |P_{n+1}| = (n + 1)|P_n| = (n + 1)n|P_{n−1}| = · · · = (n + 1)n(n − 1) · · · 1 × |P_0|

Since, as we have seen, |P_0| = 1, we have proved that the number of permutations in P_n is given by the product n · (n − 1) · · · 2 · 1. Therefore, our sequence is:

    n      0  1  2  3  4   5    6    7     8
    |P_n|  1  1  2  6  24  120  720  5040  40320

As we mentioned in the Introduction, the number n · (n − 1) · · · 2 · 1 is called n factorial and is denoted by n!. For example we have 10! = 10 · 9 · 8 · 7 · 6 · 5 · 4 · 3 · 2 · 1 = 3,628,800. Factorials grow very fast, but they are one of the most important quantities in Mathematics.

When n ≥ 2, we can add to every permutation in P_n one transposition, say (1 2). This transforms every even permutation into an odd permutation, and vice versa. On the other hand, since (1 2)^{−1} = (1 2), the transformation is its own inverse, and therefore defines a 1-1 mapping between even and odd permutations. This proves that the number of even (odd) permutations is n!/2.

Another simple problem is how to determine the number of involutions on n elements. As we have already seen, an involution is only composed by fixed points and transpositions (without repetitions of the elements!). If we denote by I_n the set of involutions of n elements, we can divide I_n into two subsets: I′_n is the set of involutions in which n is a fixed point, and I″_n is the set of involutions in which n belongs to a transposition, say (k n). If we eliminate n from the involutions in I′_n, we obtain an involution of n − 1 elements, and vice versa every involution in I′_n can be obtained by adding the fixed point n to an involution in I_{n−1}. If we eliminate the transposition (k n) from an involution in I″_n, we obtain an involution in I_{n−2}, which contains the element n − 1, but does not contain the element k. In all cases, however, by eliminating (k n) from all involutions containing it, we obtain a set of involutions in a 1-1 correspondence with I_{n−2}. The element k can assume any value 1, 2, . . . , n − 1, and therefore we obtain (n − 1) times |I_{n−2}| involutions.

We now observe that all the involutions in I_n are obtained in this way from involutions in I_{n−1} and I_{n−2}, and therefore we have:

    |I_n| = |I_{n−1}| + (n − 1)|I_{n−2}|

Since |I_0| = 1, |I_1| = 1 and |I_2| = 2, from this recurrence relation we can successively find all the values of |I_n|. This sequence (see Section 4.9) is therefore:

    n    0  1  2  3  4   5   6   7    8
    I_n  1  1  2  4  10  26  76  232  764

We conclude this section by giving the classical computer program for generating a random permutation of the numbers {1, 2, . . . , n}. The procedure shuffle receives the address of the vector and the number of its elements; it fills the vector with the numbers from 1 to n, and then uses the standard procedure random to produce a random permutation, which is returned in the input vector:

    procedure shuffle( var v : vector; n : integer ) ;
    var i, j, a : integer ;
    begin
      for i := 1 to n do v[ i ] := i ;
      for i := n downto 2 do begin
        { random( i ) returns an integer in 0 .. i - 1 }
        j := random( i ) + 1 ;
        a := v[ i ]; v[ i ] := v[ j ]; v[ j ] := a
      end
    end {shuffle} ;

The procedure exchanges the last element in v with a random element in v, possibly the same last element. In this way the last element is chosen at random, and the procedure goes on with the last but one element. In this way, the elements in v are properly shuffled and eventually v contains the desired random permutation. The procedure obviously performs in linear time.

2.5 Dispositions and Combinations

Permutations are a special case of a more general situation. If we have n objects, we can wonder how many different orderings exist of k among the n objects. For example, if we have 4 objects a, b, c, d, we can make 12 different arrangements with two objects chosen from a, b, c, d. They are:

    (a, b) (a, c) (a, d) (b, a) (b, c) (b, d)
    (c, a) (c, b) (c, d) (d, a) (d, b) (d, c)

These arrangements are called dispositions and, in general, we can use any one of the n objects to be first in the arrangement. There remain only n − 1 objects to be used as second element, and n − 2 objects to be used as a third element, and so on. Therefore, the k objects can be selected in n(n − 1) · · · (n − k + 1) different ways. If D_{n,k} denotes the number of possible dispositions of n elements in groups of k, we have:

    D_{n,k} = n(n − 1) · · · (n − k + 1) = n^{\underline{k}}

The symbol n^{\underline{k}} is called a falling factorial because it consists of k factors beginning with n and decreasing by one down to (n − k + 1). Obviously, n^{\underline{n}} = n! and, by convention, n^{\underline{0}} = 1. There exists also a rising factorial n^{\overline{k}} = n(n + 1) · · · (n + k − 1), often denoted by (n)_k, the so-called Pochhammer symbol.

When in k-dispositions we do not consider the order of the elements, we obtain what are called k-combinations. Therefore, there are only 6 2-combinations of 4 objects, and they are:

    {a, b} {a, c} {a, d} {b, c} {b, d} {c, d}

Combinations are written between braces, because they are simply the subsets with k objects of a set of n objects. If A = {a, b, c}, all the possible combinations of these objects are:

    C_{3,0} = {∅}
    C_{3,1} = {{a}, {b}, {c}}
    C_{3,2} = {{a, b}, {a, c}, {b, c}}
    C_{3,3} = {{a, b, c}}

The number of k-combinations of n objects is denoted by \binom{n}{k}, which is often read "n choose k", and is called a binomial coefficient. By the very definition we have:

    \binom{n}{0} = 1        \binom{n}{n} = 1        ∀n ∈ N

because, given a set of n elements, the empty set is the only subset with 0 elements, and the whole set is the only subset with n elements.

The name "binomial coefficients" is due to the well-known "binomial formula":

    (a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n−k}

which is easily proved. In fact, when expanding the product (a + b)(a + b) · · · (a + b), we choose a term from each factor (a + b); the resulting term a^k b^{n−k} is obtained by summing over all the possible choices of k a's and n − k b's, which, by the definition above, are just \binom{n}{k}.

There exists a simple formula to compute binomial coefficients. As we have seen, there are n^{\underline{k}} different k-dispositions of n objects; by permuting the k objects in the disposition, we obtain k! other dispositions with the same elements. Therefore, k! dispositions correspond to a single combination and we have:

    \binom{n}{k} = n^{\underline{k}} / k! = n(n − 1) · · · (n − k + 1) / k!

This formula gives a simple way to compute binomial coefficients in a recursive way. In fact we have:

    \binom{n}{k} = n(n − 1) · · · (n − k + 1) / k! = (n/k) · [(n − 1) · · · (n − k + 1) / (k − 1)!] = (n/k) \binom{n − 1}{k − 1}

and the expression on the right is successively reduced to \binom{r}{0}, which is 1 as we have already seen. For example, we have:

    \binom{7}{3} = (7/3) \binom{6}{2} = (7/3)(6/2) \binom{5}{1} = (7/3)(6/2)(5/1) \binom{4}{0} = 35

It is not difficult to compute a binomial coefficient such as \binom{100}{3}, but it is a hard job to compute \binom{100}{97}

by means of the above formulas. There exists, however, a symmetry formula which is very useful. Let us begin by observing that:

    \binom{n}{k} = n(n − 1) · · · (n − k + 1) / k! = [n · · · (n − k + 1) · (n − k) · · · 1] / [k! (n − k) · · · 1] = n! / (k! (n − k)!)

This is a very important formula in its own right, and it shows that:

    \binom{n}{n − k} = n! / ((n − k)! (n − (n − k))!) = \binom{n}{k}

From this formula we immediately obtain \binom{100}{97} = \binom{100}{3}. The most difficult computing problem is the evaluation of the central binomial coefficient \binom{2k}{k}, for which symmetry gives no help.

The reader is invited to produce a computer program to evaluate binomial coefficients. He (or she) is warned not to use the formula n!/(k!(n − k)!), which can produce very large numbers, exceeding the capacity of the computer when n, k are not small.

The definition of a binomial coefficient can be easily expanded to any real numerator:

    \binom{r}{k} = r^{\underline{k}} / k! = r(r − 1) · · · (r − k + 1) / k!.

For example we have:

    \binom{1/2}{3} = (1/2)(−1/2)(−3/2) / 3! = 1/16

but in this case the symmetry rule does not make sense. We point out that:

    \binom{−n}{k} = (−n)(−n − 1) · · · (−n − k + 1) / k! = (−1)^k (n + k − 1)^{\underline{k}} / k! = (−1)^k \binom{n + k − 1}{k}

which allows us to express a binomial coefficient with negative, integer numerator as a binomial coefficient with positive numerator. This is known as the negation rule and will be used very often.

If in a combination we are allowed to have several copies of the same element, we obtain a combination with repetitions. A useful exercise is to prove that the number of the k by k combinations with repetitions of n elements is:

    R_{n,k} = \binom{n + k − 1}{k}.

2.6 The Pascal triangle

Binomial coefficients satisfy a very important recurrence relation, which we are now going to prove. As we know, \binom{n}{k} is the number of the subsets with k elements of a set with n elements. We can count the number of such subsets in the following way. Let {a_1, a_2, . . . , a_n} be the elements of the base set, and let us fix one of these elements, e.g., a_n. We can distinguish the subsets with k elements into two classes: the subsets containing a_n and the subsets that do not contain a_n. Let S^+ and S^− be these two classes. We now point out that S^+ can be seen (by eliminating a_n) as the collection of the subsets with k − 1 elements of a set with n − 1 elements; therefore, the number of the elements in S^+ is \binom{n − 1}{k − 1}. The class S^− can be seen as composed by the subsets with k elements of a set with n − 1 elements, i.e., the base set minus a_n: their number is therefore \binom{n − 1}{k}. By summing these two contributions, we obtain the recurrence relation:

    \binom{n}{k} = \binom{n − 1}{k} + \binom{n − 1}{k − 1}

which can be used with the initial conditions:

    \binom{n}{0} = \binom{n}{n} = 1        ∀n ∈ N.

For example, we have:

    \binom{4}{2} = \binom{3}{2} + \binom{3}{1} = \binom{2}{2} + \binom{2}{1} + \binom{2}{1} + \binom{2}{0} =
                 = 2 + 2 \binom{2}{1} = 2 + 2 (\binom{1}{1} + \binom{1}{0}) = 2 + 2 × 2 = 6.

This recurrence is not particularly suitable for numerical computation. However, it gives a simple rule to compute successively all the binomial coefficients. Let us dispose them in an infinite array, whose rows represent the number n and whose columns represent the number k. The recurrence tells us that the element in position (n, k) is obtained by summing two elements in the previous row: the element just above the position (n, k), i.e., in position (n − 1, k), and the element on the left of the latter, i.e., in position (n − 1, k − 1). The array is initially filled by 1's in the first column (corresponding to the various \binom{n}{0}) and the main diagonal (corresponding to the numbers \binom{n}{n}). See Table 2.1.

We actually obtain an infinite, lower triangular array known as the Pascal triangle (or Tartaglia-Pascal triangle). The symmetry rule is quite evident and a simple observation is that the sum of the elements in row n is 2^n. The proof of this fact is immediate, since

    n\k  0  1  2  3  4  5  6
    0    1
    1    1  1
    2    1  2  1
    3    1  3  3  1
    4    1  4  6  4  1
    5    1  5 10 10  5  1
    6    1  6 15 20 15  6  1

    Table 2.1: The Pascal triangle

a row represents the total number of the subsets in a set with n elements, and this number is obviously 2^n. The reader can try to prove that the row alternating sums, e.g., the sum 1 − 4 + 6 − 4 + 1, equal zero, except for the first row, the row with n = 0.

As we have already mentioned, the numbers c_n = \binom{2n}{n} are called central binomial coefficients and their sequence begins:

    n    0  1  2  3   4   5    6    7     8
    c_n  1  2  6  20  70  252  924  3432  12870

They are very important, and, for example, we can express the binomial coefficients \binom{−1/2}{n} in terms of the central binomial coefficients:

    \binom{−1/2}{n} = (−1/2)(−3/2) · · · (−(2n − 1)/2) / n! =
                    = (−1)^n · 1 · 3 · · · (2n − 1) / (2^n n!) =
                    = (−1)^n · 1 · 2 · 3 · 4 · · · (2n − 1) · (2n) / (2^n n! · 2 · 4 · · · (2n)) =
                    = (−1)^n (2n)! / (2^n n! · 2^n (1 · 2 · · · n)) = (−1)^n (2n)! / (4^n n!^2) =
                    = ((−1)^n / 4^n) \binom{2n}{n}.

In a similar way, we can prove the following identities:

    \binom{1/2}{n} = ((−1)^{n−1} / (4^n (2n − 1))) \binom{2n}{n}

    \binom{3/2}{n} = ((−1)^n · 3 / (4^n (2n − 1)(2n − 3))) \binom{2n}{n}

    \binom{−3/2}{n} = ((−1)^n (2n + 1) / 4^n) \binom{2n}{n}.

An important point of this generalization is that the binomial formula can be extended to real exponents. Let us consider the function f(x) = (1 + x)^r; it is continuous and can be differentiated as many times as we wish; in fact we have:

    f^{(n)}(x) = (d^n / dx^n)(1 + x)^r = r^{\underline{n}} (1 + x)^{r−n}

as can be shown by mathematical induction. The first two cases are f^{(0)}(x) = (1 + x)^r = r^{\underline{0}} (1 + x)^{r−0} and f′(x) = r(1 + x)^{r−1}. Suppose now that the formula holds true for some n ∈ N and let us differentiate it once more:

    f^{(n+1)}(x) = r^{\underline{n}} (r − n)(1 + x)^{r−n−1} = r^{\underline{n+1}} (1 + x)^{r−(n+1)}

and this proves our statement. Because of that, (1 + x)^r has a Taylor expansion around the point x = 0 of the form:

    (1 + x)^r = f(0) + (f′(0)/1!) x + (f″(0)/2!) x^2 + · · · + (f^{(n)}(0)/n!) x^n + · · · .

The coefficient of x^n is therefore f^{(n)}(0)/n! = r^{\underline{n}}/n! = \binom{r}{n}, and so:

    (1 + x)^r = \sum_{n=0}^{∞} \binom{r}{n} x^n.

We conclude with the following property, which is called the cross-product rule:

    \binom{n}{k} \binom{k}{r} = (n! / (k!(n − k)!)) · (k! / (r!(k − r)!)) =
                              = (n! / (r!(n − r)!)) · ((n − r)! / ((n − k)!(k − r)!)) = \binom{n}{r} \binom{n − r}{k − r}.

This rule, together with the symmetry and the negation rules, gives the three basic properties of binomial coefficients:

    \binom{n}{k} = \binom{n}{n − k}        \binom{−n}{k} = (−1)^k \binom{n + k − 1}{k}

    \binom{n}{k} \binom{k}{r} = \binom{n}{r} \binom{n − r}{k − r}

and they are to be remembered by heart.

2.7 Harmonic numbers

It is well-known that the harmonic series:

    \sum_{k=1}^{∞} 1/k = 1 + 1/2 + 1/3 + 1/4 + 1/5 + · · ·

diverges. In fact, if we cumulate the 2^m numbers from 1/(2^m + 1) to 1/2^{m+1}, we obtain:

    1/(2^m + 1) + 1/(2^m + 2) + · · · + 1/2^{m+1} >
        > 1/2^{m+1} + 1/2^{m+1} + · · · + 1/2^{m+1} = 2^m / 2^{m+1} = 1/2

and therefore the sum cannot be bounded. On the other hand we can define:

    H_n = 1 + 1/2 + 1/3 + · · · + 1/n

a finite, partial sum of the harmonic series. This number has a well-defined value and is called a harmonic number. Conventionally, we set H_0 = 0 and the sequence of harmonic numbers begins:

    n    0  1  2    3     4      5       6      7        8
    H_n  0  1  3/2  11/6  25/12  137/60  49/20  363/140  761/280

Harmonic numbers arise in the analysis of many algorithms and it is very useful to know an approximate value for them. Let us consider the series:

    1 − 1/2 + 1/3 − 1/4 + 1/5 − · · · = ln 2

and let us define:

    L_n = 1 − 1/2 + 1/3 − 1/4 + · · · + (−1)^{n−1}/n.

Obviously we have:

    H_{2n} − L_{2n} = 2 (1/2 + 1/4 + · · · + 1/(2n)) = H_n

or H_{2n} = L_{2n} + H_n, and since the series for ln 2 is alternating in sign, the error committed by truncating it at any place is less than the first discarded element. Therefore:

    ln 2 − 1/(2n) < L_{2n} < ln 2

and by summing H_n to all members:

    H_n + ln 2 − 1/(2n) < H_{2n} < H_n + ln 2

Let us now consider the two cases n = 2^{k−1} and n = 2^{k−2}:

    H_{2^{k−1}} + ln 2 − 1/2^k < H_{2^k} < H_{2^{k−1}} + ln 2
    H_{2^{k−2}} + ln 2 − 1/2^{k−1} < H_{2^{k−1}} < H_{2^{k−2}} + ln 2.

By summing and simplifying these two expressions, we obtain:

    H_{2^{k−2}} + 2 ln 2 − 1/2^k − 1/2^{k−1} < H_{2^k} < H_{2^{k−2}} + 2 ln 2.

We can now iterate this procedure and eventually find:

    H_{2^0} + k ln 2 − 1/2^k − 1/2^{k−1} − · · · − 1/2 < H_{2^k} < H_{2^0} + k ln 2.

Since H_{2^0} = H_1 = 1, we have the bounds:

    ln 2^k < H_{2^k} < ln 2^k + 1.

These bounds can be extended to every n, and since the values of the H_n's are increasing, this implies that a constant γ should exist (0 < γ < 1) such that:

    H_n → ln n + γ        as n → ∞

This constant is called the Euler-Mascheroni constant and, as we have already mentioned, its value is γ ≈ 0.5772156649 . . .. Later we will prove the more accurate approximation of the H_n's we quoted in the Introduction.

The generalized harmonic numbers are defined as:

    H_n^{(s)} = 1/1^s + 1/2^s + 1/3^s + · · · + 1/n^s

and H_n^{(1)} = H_n. They are the partial sums of the series defining the Riemann ζ function:

    ζ(s) = 1/1^s + 1/2^s + 1/3^s + · · ·

which can be defined in such a way that the sum actually converges, except for s = 1 (the harmonic series). In particular we have:

    ζ(2) = 1 + 1/4 + 1/9 + 1/16 + 1/25 + · · · = π^2/6
    ζ(3) = 1 + 1/8 + 1/27 + 1/64 + 1/125 + · · ·
    ζ(4) = 1 + 1/16 + 1/81 + 1/256 + 1/625 + · · · = π^4/90

and in general:

    ζ(2n) = ((2π)^{2n} / (2 (2n)!)) |B_{2n}|

where the B_n are the Bernoulli numbers (see below). No explicit formula is known for ζ(2n + 1), but numerically we have:

    ζ(3) ≈ 1.202056903 . . . .

Because of the limited value of ζ(s), we can set, for large values of n:

    H_n^{(s)} ≈ ζ(s)

2.8 Fibonacci numbers

At the beginning of 1200, Leonardo Fibonacci introduced in Europe the positional notation for numbers, together with the computing algorithms for performing the four basic operations. In fact, Fibonacci was

the most important mathematician in western Europe at that time. He posed the following problem: a farmer has a couple of rabbits which generates another couple after two months and, from that moment on, a new couple of rabbits every month. The newly generated couples become fertile after two months, when they begin to generate a new couple of rabbits every month. The problem consists in computing how many couples of rabbits the farmer has after n months.

It is a simple matter to find the initial values: there is one couple at the beginning (1st month) and 1 in the second month. The third month the farmer has 2 couples, and 3 couples the fourth month; in fact, the first couple has generated another pair of rabbits, while the previously generated couple of rabbits has not yet become fertile. The couples become 5 on the fifth month: in fact, there are the 3 couples of the preceding month plus the newly generated couples, which are as many as there are fertile couples; but these are just the couples of two months beforehand, i.e., 2 couples. In general, at the nth month, the farmer will have the couples of the previous month plus the new couples, which are generated by the fertile couples, that is the couples he had two months before. If we denote by F_n the number of couples at the nth month, we have the Fibonacci recurrence:

    F_n = F_{n−1} + F_{n−2}

with the initial conditions F_1 = F_2 = 1. By the same rule, we have F_0 = 0 and the sequence of Fibonacci numbers begins:

    n    0  1  2  3  4  5  6  7   8   9   10
    F_n  0  1  1  2  3  5  8  13  21  34  55

every term being obtained by summing the two preceding numbers in the sequence.

Despite the small numbers appearing at the beginning of the sequence, Fibonacci numbers grow very fast, and later we will see how they grow and how they

    Figure 2.1: Fibonacci coverings for a strip 4 dm long

A related problem is to count the coverings of a strip n dm long by means of pieces 1 dm and 2 dm long (see Figure 2.1). If M_n denotes the number of such coverings, a covering either ends with a piece of length 1 or with a piece of length 2; therefore we have the recurrence relation:

    M_n = M_{n−1} + M_{n−2}

which is the same recurrence as for Fibonacci numbers. This time, however, we have the initial conditions M_0 = 1 (the empty covering is just a covering!) and M_1 = 1. Therefore we conclude M_n = F_{n+1}.

Euclid's algorithm for computing the Greatest Common Divisor (gcd) of two positive integer numbers is another instance of the appearance of Fibonacci numbers. The problem is to determine the maximal number of divisions performed by Euclid's algorithm. Obviously, this maximum is attained when every division in the process gives 1 as a quotient, since a greater quotient would drastically cut the number of necessary divisions. Let us consider two consecutive Fibonacci numbers, for example 34 and 55, and let us try to find gcd(34, 55):

    55 = 1 × 34 + 21
    34 = 1 × 21 + 13
    21 = 1 × 13 +  8
    13 = 1 ×  8 +  5
     8 = 1 ×  5 +  3
     5 = 1 ×  3 +  2
     3 = 1 ×  2 +  1
     2 = 2 ×  1

We immediately see that the quotients are all 1 (except the last one) and the remainders are decreasing Fibonacci numbers. The process can be inverted to
can be computed in a fast way. For the moment, we prove that only consecutive Fibonacci numbers enjoy
wish to show how Fibonacci numbers appear in com- this property. Therefore, we conclude that given two
binatorics and in the analysis of algorithms. Suppose integer numbers, n and m, the maximal number of
we have some bricks of dimensions 1 × 2 dm, and we divisions performed by Euclid’s algorithm is attained
wish to cover a strip 2 dm wide and n dm long by when n, m are two consecutive Fibonacci numbers
using these bricks. The problem is to know in how and the actual number of divisions is the order of the
many different ways we can perform this covering. In smaller number in the Fibonacci sequence, minus 1.
Figure 2.1 we show the five ways to cover a strip 4
dm long.
If Mn is this number, we can observe that a cov- 2.9 Walks, trees and Catalan
ering of Mn can be obtained by adding vertically a numbers
brick to a covering in Mn−1 or by adding horizon-
tally two bricks to a covering in Mn−2 . These are the “Walks” or “paths” are common combinatorial ob-
only ways of proceeding to build our coverings, and jects and are defined in the following way. Let Z2 be
the integral lattice, i.e., the set of points in R^2 having integer coordinates. A walk or path is a finite sequence of points in Z^2 with the following properties:

1. the origin (0, 0) belongs to the walk;

2. if (x + 1, y + 1) belongs to the walk, then either (x, y + 1) or (x + 1, y) also belongs to the walk.

A pair of points ((x, y), (x + 1, y)) is called an east step and a pair ((x, y), (x, y + 1)) is a north step. It is a simple matter to show that the number of walks composed by n steps and ending at column k (i.e., the last point is (x, k) = (n - k, k)) is just \binom{n}{k}. In fact, if we denote by 1, 2, ..., n the n steps, starting with the origin, then we can associate to any walk a subset of N_n = {1, 2, ..., n}, that is the subset of the north-step numbers. Since the other steps should be east steps, a 1-1 correspondence exists between the walks and the subsets of N_n with k elements, which, as we know, are exactly \binom{n}{k}. This also proves, in a combinatorial way, the symmetry rule for binomial coefficients.

Figure 2.2: How a walk is decomposed

Among the walks, the ones that never go above the main diagonal, i.e., walks no point of which has the form (x, y) with y > x, are particularly important. They are called underdiagonal walks. Later on, we will be able to count the number of underdiagonal walks composed by n steps and ending at a horizontal distance k from the main diagonal. For the moment, we only wish to establish a recurrence relation for the number b_n of the underdiagonal walks of length 2n and ending on the main diagonal. In Fig. 2.2 we sketch a possible walk, where we have marked the point C = (k, k) at which the walk encounters the main diagonal for the first time. We observe that the walk is decomposed into a first part (the walk between A and B) and a second part (the walk between C and D) which are underdiagonal walks of the same type. There are b_{k-1} walks of the type AB and b_{n-k} walks of the type CD and, obviously, k can be any number between 1 and n. We therefore obtain the recurrence relation:

    b_n = \sum_{k=1}^{n} b_{k-1} b_{n-k} = \sum_{k=0}^{n-1} b_k b_{n-k-1}

with the initial condition b_0 = 1, corresponding to the empty walk or the walk composed by the only point (0, 0). The sequence (b_k)_{k \in N} begins:

    n    0  1  2  3   4   5    6    7     8
    bn   1  1  2  5  14  42  132  429  1430

and, as we shall see:

    b_n = \frac{1}{n+1} \binom{2n}{n}.

The b_n's are called Catalan numbers and they frequently occur in the analysis of algorithms and data structures. For example, if we associate an open parenthesis to every east step and a closed parenthesis to every north step, we obtain the number of possible parenthetizations of an expression. When we have three pairs of parentheses, the 5 possibilities are:

    ()()()  ()(())  (())()  (()())  ((())).

When we build binary trees from permutations, we do not always obtain different trees from different permutations. There are only 5 trees generated by the six permutations of 1, 2, 3, as we show in Figure 2.3.

How many different trees exist with n nodes? If we fix our attention on the root, the left subtree has k nodes for k = 0, 1, ..., n - 1, while the right subtree contains the remaining n - k - 1 nodes. Every tree with k nodes can be combined with every tree with n - k - 1 nodes to form a tree with n nodes, and therefore we have the recurrence relation:

    b_n = \sum_{k=0}^{n-1} b_k b_{n-k-1}

which is the same recurrence as before. Since the initial condition is again b_0 = 1 (the empty tree) there are as many trees as there are walks.

Another kind of walks is obtained by considering steps of type ((x, y), (x + 1, y + 1)), i.e., north-east steps, and of type ((x, y), (x + 1, y - 1)), i.e., south-east steps. The interesting walks are those starting
from the origin and never going below the x-axis; they are called Dyck walks. An obvious 1-1 correspondence exists between Dyck walks and the walks considered above, and again we obtain the sequence of Catalan numbers:

    n    0  1  2  3   4   5    6    7     8
    bn   1  1  2  5  14  42  132  429  1430

Figure 2.3: The five binary trees with three nodes

Finally, the concept of a rooted planar tree is as follows: let us consider a node, which is the root of the tree; if we recursively add branches to the root or to the nodes generated by previous insertions, what we obtain is a "rooted plane tree". If n denotes the number of branches in a rooted plane tree, in Fig. 2.4 we represent all the trees up to n = 3. Again, rooted plane trees are counted by Catalan numbers.

Figure 2.4: Rooted Plane Trees

2.10 Stirling numbers of the first kind

About 1730, the English mathematician James Stirling was looking for a connection between powers of a number x, say x^n, and the falling factorials x^{\underline{k}} = x(x - 1) \cdots (x - k + 1). He developed the first instances:

    x^{\underline{1}} = x
    x^{\underline{2}} = x(x - 1) = x^2 - x
    x^{\underline{3}} = x(x - 1)(x - 2) = x^3 - 3x^2 + 2x
    x^{\underline{4}} = x(x - 1)(x - 2)(x - 3) = x^4 - 6x^3 + 11x^2 - 6x

and picking the coefficients in their proper order (from the smallest power to the largest) he obtained a table of integer numbers. We are mostly interested in the absolute values of these numbers, as shown in Table 2.2. After him, these numbers are called Stirling numbers of the first kind and are now denoted by {n \brack k}, sometimes read "n cycle k", for the reason we are now going to explain.

    n\k   0    1    2    3   4   5   6
    0     1
    1     0    1
    2     0    1    1
    3     0    2    3    1
    4     0    6   11    6   1
    5     0   24   50   35  10   1
    6     0  120  274  225  85  15   1

    Table 2.2: Stirling numbers of the first kind

First note that the above identities can be written:

    x^{\underline{n}} = \sum_{k=0}^{n} {n \brack k} (-1)^{n-k} x^k.

Let us now observe that x^{\underline{n}} = x^{\underline{n-1}} (x - n + 1) and therefore we have:

    x^{\underline{n}} = (x - n + 1) \sum_{k=0}^{n-1} {n-1 \brack k} (-1)^{n-k-1} x^k =

        = \sum_{k=0}^{n-1} {n-1 \brack k} (-1)^{n-k-1} x^{k+1} - (n - 1) \sum_{k=0}^{n-1} {n-1 \brack k} (-1)^{n-k-1} x^k =

        = \sum_{k=0}^{n} {n-1 \brack k-1} (-1)^{n-k} x^k +
        + (n - 1) \sum_{k=0}^{n} {n-1 \brack k} (-1)^{n-k} x^k.

We performed the change of variable k -> k - 1 in the first sum and then extended both sums from 0 to n. This identity is valid for every value of x, and therefore we can equate its coefficients to those of the previous, general Stirling identity, thus obtaining the recurrence relation:

    {n \brack k} = (n - 1) {n-1 \brack k} + {n-1 \brack k-1}.

This recurrence, together with the initial conditions:

    {n \brack n} = 1, \forall n \in N    and    {n \brack 0} = 0, \forall n > 0,

completely defines the Stirling numbers of the first kind.

What is a possible combinatorial interpretation of these numbers? Let us consider the permutations of n elements and count the permutations having exactly k cycles, whose set will be denoted by S_{n,k}. If we fix any element, say the last element n, we observe that the permutations in S_{n,k} can have n as a fixed point, or not. When n is a fixed point and we eliminate it, we obtain a permutation with n - 1 elements having exactly k - 1 cycles; vice versa, any such permutation gives a permutation in S_{n,k} with n as a fixed point if we add (n) to it. Therefore, there are |S_{n-1,k-1}| such permutations in S_{n,k}. When n is not a fixed point and we eliminate it from the permutation, we obtain a permutation with n - 1 elements and k cycles. However, the same permutation is obtained several times, exactly n - 1 times, since n can occur after any other element in the standard cycle representation (it can never occur as the first element in a cycle, by our conventions). For example, all the following permutations in S_{5,2} produce the same permutation in S_{4,2}:

    (1 2 3)(4 5)   (1 2 3 5)(4)   (1 2 5 3)(4)   (1 5 2 3)(4).

The process can be inverted and therefore we have:

    |S_{n,k}| = (n - 1)|S_{n-1,k}| + |S_{n-1,k-1}|

which is just the recurrence relation for the Stirling numbers of the first kind. If we now prove that also the initial conditions are the same, we conclude that |S_{n,k}| = {n \brack k}. First we observe that S_{n,n} is only composed by the identity, the only permutation having n cycles, i.e., n fixed points; so |S_{n,n}| = 1, \forall n \in N. Moreover, for n >= 1, S_{n,0} is empty, because every permutation contains at least one cycle, and so |S_{n,0}| = 0. This concludes the proof.

As an immediate consequence of this reasoning, we find:

    \sum_{k=0}^{n} {n \brack k} = n!,

i.e., the row sums of the Stirling triangle of the first kind equal n!, because they correspond to the total number of permutations of n objects. We also observe that:

• {n \brack 1} = (n - 1)!; in fact, S_{n,1} is composed by all the permutations having a single cycle; this begins by 1 and is followed by any permutation of the n - 1 remaining numbers;

• {n \brack n-1} = \binom{n}{2}; in fact, S_{n,n-1} contains permutations having all fixed points except a single transposition; but this transposition can only be formed by taking two elements among 1, 2, ..., n, which can be done in \binom{n}{2} different ways;

• {n \brack 2} = (n - 1)! H_{n-1}; returning to the numerical definition, the coefficient of x^2 is a sum of products, in each of which a positive integer is missing.

2.11 Stirling numbers of the second kind

James Stirling also tried to invert the process described in the previous section; that is, he was also interested in expressing ordinary powers in terms of falling factorials. The first instances are:

    x^1 = x^{\underline{1}}
    x^2 = x^{\underline{1}} + x^{\underline{2}} = x + x(x - 1)
    x^3 = x^{\underline{1}} + 3x^{\underline{2}} + x^{\underline{3}} = x + 3x(x - 1) + x(x - 1)(x - 2)
    x^4 = x^{\underline{1}} + 7x^{\underline{2}} + 6x^{\underline{3}} + x^{\underline{4}}

The coefficients can be arranged into a triangular array, as shown in Table 2.3, and are called Stirling numbers of the second kind. The usual notation for them is {n \brace k}, often read "n subset k", for the reason we are going to explain.

Stirling's identities can be globally written:

    x^n = \sum_{k=0}^{n} {n \brace k} x^{\underline{k}}.

We obtain a recurrence relation in the following way:

    x^n = x \cdot x^{n-1} = x \sum_{k=0}^{n-1} {n-1 \brace k} x^{\underline{k}} = \sum_{k=0}^{n-1} {n-1 \brace k} (x + k - k) x^{\underline{k}} =
        = \sum_{k=0}^{n-1} {n-1 \brace k} x^{\underline{k+1}} + \sum_{k=0}^{n-1} k {n-1 \brace k} x^{\underline{k}} =

        = \sum_{k=0}^{n} {n-1 \brace k-1} x^{\underline{k}} + \sum_{k=0}^{n} k {n-1 \brace k} x^{\underline{k}}

where, as usual, we performed the change of variable k -> k - 1 and extended the two sums from 0 to n. The identity is valid for every x \in R, and therefore we can equate the coefficients of x^{\underline{k}} in this and the above identity, thus obtaining the recurrence relation:

    {n \brace k} = k {n-1 \brace k} + {n-1 \brace k-1}

which is slightly different from the recurrence for the Stirling numbers of the first kind. Here we have the initial conditions:

    {n \brace n} = 1, \forall n \in N    and    {n \brace 0} = 0, \forall n >= 1.

These relations completely define the Stirling triangle of the second kind.

    n\k   0  1   2   3   4   5   6
    0     1
    1     0  1
    2     0  1   1
    3     0  1   3   1
    4     0  1   7   6   1
    5     0  1  15  25  10   1
    6     0  1  31  90  65  15   1

    Table 2.3: Stirling numbers of the second kind

Every row of the triangle determines a polynomial; for example, from row 4 we obtain S_4(w) = w + 7w^2 + 6w^3 + w^4, which is called the 4th Stirling polynomial.

Let us now look for a combinatorial interpretation of these numbers. If N_n is the usual set {1, 2, ..., n}, we can study the partitions of N_n into k disjoint, non-empty subsets. For example, when n = 4 and k = 2, we have the following 7 partitions:

    {1} U {2, 3, 4}   {1, 2} U {3, 4}   {1, 3} U {2, 4}
    {1, 4} U {2, 3}   {1, 2, 3} U {4}
    {1, 2, 4} U {3}   {1, 3, 4} U {2}.

If P_{n,k} is the corresponding set, we now count |P_{n,k}| by fixing an element in N_n, say the last element n. The partitions in P_{n,k} can contain n as a singleton (i.e., as a subset with n as its only element) or can contain n as an element in a larger subset. In the former case, by eliminating {n} we obtain a partition in P_{n-1,k-1} and, obviously, all partitions in P_{n-1,k-1} can be obtained in such a way. When n belongs to a larger set, we can eliminate it, obtaining a partition in P_{n-1,k}; however, the same partition is obtained several times, exactly by eliminating n from any of the k subsets containing it in the various partitions. For example, the following three partitions in P_{5,3} all produce the same partition in P_{4,3}:

    {1, 2, 5} U {3} U {4}   {1, 2} U {3, 5} U {4}   {1, 2} U {3} U {4, 5}.

This proves the recurrence relation:

    |P_{n,k}| = k|P_{n-1,k}| + |P_{n-1,k-1}|

which is the same recurrence as for the Stirling numbers of the second kind. As far as the initial conditions are concerned, we observe that there is only one partition of N_n composed by n subsets, i.e., the partition containing n singletons; therefore |P_{n,n}| = 1, \forall n \in N (in the case n = 0 the empty set is the only partition of the empty set). When n >= 1, there is no partition of N_n composed by 0 subsets, and therefore |P_{n,0}| = 0. We can conclude that |P_{n,k}| coincides with the corresponding Stirling number of the second kind, and use this fact for observing that:

• {n \brace 1} = 1, \forall n >= 1. In fact, the only partition of N_n in a single subset is when the subset coincides with N_n;

• {n \brace 2} = 2^{n-1} - 1, \forall n >= 2. When the partition is only composed by two subsets, the first one uniquely determines the second. Let us suppose that the first subset always contains 1. By eliminating 1, we obtain as first subset all the subsets of N_n \ {1}, except this last whole set, which would correspond to an empty second set. This proves the identity;

• {n \brace n-1} = \binom{n}{2}, \forall n \in N. In any partition with one subset with 2 elements, and all the others singletons, the two elements can be chosen in \binom{n}{2} different ways.

2.12 Bell and Bernoulli numbers

If we sum the rows of the Stirling triangle of the second kind, we find a sequence:

    n    0  1  2  3   4   5    6    7     8
    Bn   1  1  2  5  15  52  203  877  4140

which represents the total number of partitions relative to the set N_n. For example, the five partitions of a set with three elements are:

    {1, 2, 3}   {1} U {2, 3}   {1, 2} U {3}
    {1, 3} U {2}   {1} U {2} U {3}.

The numbers in this sequence are called Bell numbers and are denoted by B_n; by definition we have:

    B_n = \sum_{k=0}^{n} {n \brace k}.

Bell numbers grow very fast; however, since {n \brace k} <= {n \brack k} for every value of n and k (a subset in P_{n,k} corresponds to one or more cycles in S_{n,k}), we always have B_n <= n!, and in fact B_n < n! for every n > 1.

Another frequently occurring sequence is obtained by ordering the subsets appearing in the partitions of N_n. For example, the partition {1} U {2} U {3} can be ordered in 3! = 6 different ways, and {1, 2} U {3} can be ordered in 2! = 2 ways, i.e., {1, 2} U {3} and {3} U {1, 2}. These are called ordered partitions, and their number is denoted by O_n. By the previous example, we easily see that O_3 = 13, and the sequence begins:

    n    0  1  2   3   4    5     6      7
    On   1  1  3  13  75  541  4683  47293

Because of this definition, the numbers O_n are called ordered Bell numbers and we have:

    O_n = \sum_{k=0}^{n} {n \brace k} k!;

this shows that O_n >= n!, and, in fact, O_n > n!, \forall n > 1.

Another combinatorial interpretation of the ordered Bell numbers is as follows. Let us fix an integer n \in N and for every k <= n let A_k be any multiset with n elements containing at least once all the numbers 1, 2, ..., k. The number of all the possible orderings of the A_k's is just the nth ordered Bell number. For example, when n = 3, the possible multisets are: {1, 1, 1}, {1, 1, 2}, {1, 2, 2}, {1, 2, 3}. Their possible orderings are given by the following 7 vectors:

    (1, 1, 1)   (1, 1, 2)   (1, 2, 1)   (2, 1, 1)
    (1, 2, 2)   (2, 1, 2)   (2, 2, 1)

plus the six permutations of the set {1, 2, 3}. These orderings are called preferential arrangements.

We can find a 1-1 correspondence between the orderings of set partitions and preferential arrangements. If (a_1, a_2, ..., a_n) is a preferential arrangement, we build the corresponding ordered partition by setting the element 1 in the a_1th subset, 2 in the a_2th subset, and so on. If k is the largest number in the arrangement, we build exactly k subsets. For example, the partition corresponding to (1, 2, 2, 1) is {1, 4} U {2, 3}, while the partition corresponding to (2, 1, 1, 2) is {2, 3} U {1, 4}, whose ordering is different. This construction can be easily inverted, and since it is injective, we have proved that it is actually a 1-1 correspondence. Because of that, ordered Bell numbers are also called preferential arrangement numbers.

We conclude this section by introducing another important sequence of numbers. These are (positive or negative) rational numbers and therefore they cannot correspond to any counting problem, i.e., their combinatorial interpretation cannot be direct. However, they arise in many combinatorial problems and therefore they should be examined here, for the moment only introducing their definition. The Bernoulli numbers are implicitly defined by the recurrence relation:

    \sum_{k=0}^{n} \binom{n+1}{k} B_k = \delta_{n,0}.

No initial condition is necessary, because for n = 0 we have \binom{1}{0} B_0 = 1, i.e., B_0 = 1. This is the starting value, and B_1 is obtained by setting n = 1 in the recurrence relation:

    \binom{2}{0} B_0 + \binom{2}{1} B_1 = 0.

We obtain B_1 = -1/2, and we now have a formula for B_2:

    \binom{3}{0} B_0 + \binom{3}{1} B_1 + \binom{3}{2} B_2 = 0.

By performing the necessary computations, we find B_2 = 1/6, and we can go on, successively obtaining all the possible values for the B_n's. The first twelve values are as follows:

    n    0    1    2  3      4  5     6
    Bn   1  -1/2  1/6 0  -1/30  0  1/42

    n    7      8  9    10  11         12
    Bn   0  -1/30  0  5/66   0  -691/2730

Except for B_1, all the other values of B_n for odd n are zero. Initially, Bernoulli numbers seem to be small, but as n grows, they become extremely large in modulus; apart from the zero values, they are alternately one positive and one negative. These and other properties of the Bernoulli numbers are not easily proven in a direct way, i.e., from their definition. However, we'll see later how we can arrange things in such a way that everything becomes accessible to us.
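The implicit recurrence above is easy to run mechanically. Here is a small sketch in Python; the function name `bernoulli` and the use of exact rational arithmetic via `fractions.Fraction` are choices of this illustration, not the book's notation:

```python
from fractions import Fraction
from math import comb

def bernoulli(n_max):
    """B_0..B_{n_max} from the implicit recurrence
    sum_{k=0}^{n} C(n+1, k) B_k = delta_{n,0}."""
    B = []
    for n in range(n_max + 1):
        # Isolate the last term: C(n+1, n) B_n = delta_{n,0} - sum_{k<n} C(n+1, k) B_k,
        # and note that C(n+1, n) = n + 1.
        s = sum((comb(n + 1, k) * B[k] for k in range(n)), Fraction(0))
        delta = 1 if n == 0 else 0
        B.append(Fraction(delta - s, n + 1))
    return B

B = bernoulli(12)
print(B[1], B[2], B[12])  # -1/2 1/6 -691/2730
```

Exact fractions are essential here: each B_n depends on all previous values, so floating-point errors would accumulate quickly.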
Chapter 3

Formal power series

3.1 Definitions for formal power series

Let R be the field of real numbers and let t be any indeterminate over R, i.e., a symbol different from any element in R. A formal power series (f.p.s.) over R in the indeterminate t is an expression:

    f(t) = f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots + f_n t^n + \cdots = \sum_{k=0}^{\infty} f_k t^k

where f_0, f_1, f_2, ... are all real numbers. The same definition applies to every set of numbers, in particular to the field of rational numbers Q and to the field of complex numbers C. The developments we are now going to see, depending on the field structure of the numeric set, can be easily extended to every field F of characteristic 0. The set of formal power series over F in the indeterminate t is denoted by F[[t]]. The use of a particular indeterminate t is irrelevant, and there exists an obvious 1-1 correspondence between, say, F[[t]] and F[[y]]; it is simple to prove that this correspondence is indeed an isomorphism. In order to stress that our results are substantially independent of the particular field F and of the particular indeterminate t, we denote F[[t]] by F, but the reader can think of F as R[[t]]. In fact, in combinatorial analysis and in the analysis of algorithms the coefficients f_0, f_1, f_2, ... of a formal power series are mostly used to count objects, and therefore they are positive integer numbers or, in some cases, positive rational numbers (e.g., when they are the coefficients of an exponential generating function; see below and Section 4.1).

If f(t) \in F, the order of f(t), denoted by ord(f(t)), is the smallest index r for which f_r != 0. The set of all f.p.s. of order exactly r is denoted by F_r or by F_r[[t]]. The formal power series 0 = 0 + 0t + 0t^2 + 0t^3 + \cdots has infinite order.

If (f_0, f_1, f_2, ...) = (f_k)_{k \in N} is a sequence of (real) numbers, there is no substantial difference between the sequence and the f.p.s. \sum_{k=0}^{\infty} f_k t^k, which will be called the (ordinary) generating function of the sequence. The term ordinary is used to distinguish these functions from exponential generating functions, which will be introduced in the next chapter. The indeterminate t is used as a "place-marker", i.e., a symbol to denote the place of the element in the sequence. For example, in the f.p.s. 1 + t + t^2 + t^3 + \cdots, corresponding to the sequence (1, 1, 1, ...), the term t^5 = 1 \cdot t^5 simply denotes that the element in position 5 (starting from 0) in the sequence is the number 1.

Although our study of f.p.s. is mainly justified by the development of a generating function theory, we dedicate the present chapter to the general theory of f.p.s. and postpone the study of generating functions to the next chapter. There are two main reasons why f.p.s. are more easily studied than sequences:

1. the algebraic structure of f.p.s. is very well understood and can be developed in a standard way;

2. many f.p.s. can be "abbreviated" by expressions easily manipulated by elementary algebra.

The present chapter is devoted to these algebraic aspects of f.p.s. For example, we will prove that the series 1 + t + t^2 + t^3 + \cdots can be conveniently abbreviated as 1/(1 - t), and from this fact we will be able to infer that the series has a f.p.s. inverse, which is 1 - t + 0t^2 + 0t^3 + \cdots.

We conclude this section by defining the concept of a formal Laurent (power) series (f.L.s.), as an expression:

    g(t) = g_{-m} t^{-m} + g_{-m+1} t^{-m+1} + \cdots + g_{-1} t^{-1} + g_0 + g_1 t + g_2 t^2 + \cdots = \sum_{k=-m}^{\infty} g_k t^k.

The set of f.L.s. strictly contains the set of f.p.s. For a f.L.s. g(t) the order can be negative; when the order of g(t) is non-negative, then g(t) is actually a f.p.s. We observe explicitly that an expression such as \sum_{k=-\infty}^{\infty} f_k t^k does not represent a f.L.s.
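In computations one often represents a f.p.s. by a finite prefix of its coefficient sequence. A minimal sketch in Python (the list-based representation, the truncation length N, and the function name `ord_fps` are illustrative assumptions, not the book's notation):

```python
# A f.p.s. truncated to N coefficients is just a list [f0, f1, ..., f_{N-1}];
# position k holds the coefficient of t^k.
N = 8

def ord_fps(f):
    """Order of a truncated f.p.s.: the smallest index r with f[r] != 0.
    None stands in for the infinite order of the zero series."""
    for r, c in enumerate(f):
        if c != 0:
            return r
    return None

geometric = [1] * N   # 1 + t + t^2 + ... <-> the sequence (1, 1, 1, ...)
zero = [0] * N
print(ord_fps(geometric), ord_fps([0, 0, 5, 0, 1, 0, 0, 0]), ord_fps(zero))
# 0 2 None
```

The truncation is harmless for coefficient extraction: the coefficient of t^k never depends on coefficients of higher index in the operations considered below.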
3.2 The basic algebraic structure

The set F of f.p.s. can be embedded into several algebraic structures. We are now going to define the most common one, which is related to the usual concepts of sum and (Cauchy) product of series. Given two f.p.s. f(t) = \sum_{k=0}^{\infty} f_k t^k and g(t) = \sum_{k=0}^{\infty} g_k t^k, the sum of f(t) and g(t) is defined as:

    f(t) + g(t) = \sum_{k=0}^{\infty} f_k t^k + \sum_{k=0}^{\infty} g_k t^k = \sum_{k=0}^{\infty} (f_k + g_k) t^k.

From this definition, it immediately follows that F is a commutative group with respect to the sum. The associative and commutative laws directly follow from the analogous properties in the field F; the identity is the f.p.s. 0 = 0 + 0t + 0t^2 + 0t^3 + \cdots, and the opposite series of f(t) = \sum_{k=0}^{\infty} f_k t^k is the series -f(t) = \sum_{k=0}^{\infty} (-f_k) t^k.

Let us now define the Cauchy product of f(t) by g(t):

    f(t)g(t) = \left( \sum_{k=0}^{\infty} f_k t^k \right) \left( \sum_{k=0}^{\infty} g_k t^k \right) = \sum_{k=0}^{\infty} \left( \sum_{j=0}^{k} f_j g_{k-j} \right) t^k.

Because of the form of the t^k coefficient, this is also called the convolution of f(t) and g(t). It is a good idea to write down explicitly the first terms of the Cauchy product:

    f(t)g(t) = f_0 g_0 + (f_0 g_1 + f_1 g_0) t + (f_0 g_2 + f_1 g_1 + f_2 g_0) t^2 + (f_0 g_3 + f_1 g_2 + f_2 g_1 + f_3 g_0) t^3 + \cdots

This clearly shows that the product is commutative, and it is a simple matter to prove that the identity is the f.p.s. 1 = 1 + 0t + 0t^2 + 0t^3 + \cdots. The distributive law is a consequence of the distributive law valid in F. In fact, we have:

    (f(t) + g(t)) h(t) = \sum_{k=0}^{\infty} \left( \sum_{j=0}^{k} (f_j + g_j) h_{k-j} \right) t^k =

        = \sum_{k=0}^{\infty} \left( \sum_{j=0}^{k} f_j h_{k-j} \right) t^k + \sum_{k=0}^{\infty} \left( \sum_{j=0}^{k} g_j h_{k-j} \right) t^k = f(t)h(t) + g(t)h(t).

Finally, we can prove that F does not contain any zero divisor. If f(t) and g(t) are two f.p.s. different from zero, then we can suppose that ord(f(t)) = k_1 and ord(g(t)) = k_2, with 0 <= k_1, k_2 < \infty. This means f_{k_1} != 0 and g_{k_2} != 0; therefore, the product f(t)g(t) has the term of degree k_1 + k_2 with coefficient f_{k_1} g_{k_2} != 0, and so it cannot be zero. We conclude that (F, +, \cdot) is an integrity domain.

The previous reasoning also shows that, in general, we have:

    ord(f(t)g(t)) = ord(f(t)) + ord(g(t)).

The order of the identity 1 is obviously 0; if f(t) is an invertible element in F, we should have f(t) f(t)^{-1} = 1 and therefore ord(f(t)) = 0. On the other hand, if f(t) \in F_0, i.e., f(t) = f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots with f_0 != 0, we can easily prove that f(t) is invertible. In fact, let g(t) = f(t)^{-1} so that f(t)g(t) = 1. From the explicit expression for the Cauchy product, we can determine the coefficients of g(t) by solving the infinite system of linear equations:

    f_0 g_0 = 1
    f_0 g_1 + f_1 g_0 = 0
    f_0 g_2 + f_1 g_1 + f_2 g_0 = 0
    ...

The system can be solved in a simple way, starting with the first equation and going on one equation after the other. Explicitly, we obtain:

    g_0 = f_0^{-1}    g_1 = -\frac{f_1}{f_0^2}    g_2 = \frac{f_1^2}{f_0^3} - \frac{f_2}{f_0^2}    \cdots

and therefore g(t) = f(t)^{-1} is well defined. We conclude by stating the result just obtained: a f.p.s. is invertible if and only if its order is 0. Because of that, F_0 is also called the set of invertible f.p.s. According to standard terminology, the elements of F_0 are called the units of the integrity domain.

As a simple example, let us compute the inverse of the f.p.s. 1 - t = 1 - t + 0t^2 + 0t^3 + \cdots. Here we have f_0 = 1, f_1 = -1 and f_k = 0, \forall k > 1. The system becomes:

    g_0 = 1
    g_1 - g_0 = 0
    g_2 - g_1 = 0
    ...

and we easily obtain that all the g_j's (j = 0, 1, 2, ...) are 1. Therefore the inverse f.p.s. we are looking for is 1 + t + t^2 + t^3 + \cdots. The usual notation for this fact is:

    \frac{1}{1 - t} = 1 + t + t^2 + t^3 + \cdots.

It is well known that this identity is only valid for -1 < t < 1, when t is a variable and f.p.s. are interpreted as functions. In our formal approach, however, these considerations are irrelevant and the identity is valid from a purely formal point of view.
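The triangular system above is exactly how one computes with truncated f.p.s. in practice. A sketch in Python, reusing the list-of-coefficients representation (the names `cauchy` and `invert` and the cutoff N are assumptions of this illustration):

```python
N = 8  # truncation length

def cauchy(f, g):
    """Convolution: the coefficient of t^k is sum_{j<=k} f_j g_{k-j}."""
    return [sum(f[j] * g[k - j] for j in range(k + 1)) for k in range(N)]

def invert(f):
    """Solve f(t) g(t) = 1 one equation after the other; requires f[0] != 0,
    i.e., ord(f) = 0."""
    g = [1 / f[0]]
    for k in range(1, N):
        # f0 g_k + f1 g_{k-1} + ... + fk g_0 = 0  =>  solve for g_k.
        g.append(-sum(f[j] * g[k - j] for j in range(1, k + 1)) / f[0])
    return g

one_minus_t = [1, -1, 0, 0, 0, 0, 0, 0]
print(invert(one_minus_t))                        # all coefficients equal 1
print(cauchy(one_minus_t, invert(one_minus_t)))   # 1 followed by zeros
```

Note that only the first N equations of the infinite system are ever needed, which is why the truncation is legitimate.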
3.3 Formal Laurent Series

In the first section of this chapter, we introduced the concept of a formal Laurent series as an extension of the concept of a f.p.s.; if a(t) = \sum_{k=m}^{\infty} a_k t^k and b(t) = \sum_{k=n}^{\infty} b_k t^k (m, n \in Z) are two f.L.s., we can define the sum and the Cauchy product:

    a(t) + b(t) = \sum_{k=m}^{\infty} a_k t^k + \sum_{k=n}^{\infty} b_k t^k = \sum_{k=p}^{\infty} (a_k + b_k) t^k

    a(t) b(t) = \left( \sum_{k=m}^{\infty} a_k t^k \right) \left( \sum_{k=n}^{\infty} b_k t^k \right) = \sum_{k=q}^{\infty} \left( \sum_{i+j=k} a_i b_j \right) t^k

where p = min(m, n) and q = m + n. As we did for f.p.s., it is not difficult to find out that these operations enjoy the usual properties of sum and product, and if we denote by L the set of f.L.s., we have that (L, +, \cdot) is a field. The only point we should formally prove is that every f.L.s. a(t) = \sum_{k=m}^{\infty} a_k t^k != 0 has an inverse f.L.s. b(t) = \sum_{k=-m}^{\infty} b_k t^k. However, this is proved in the same way we proved that every f.p.s. in F_0 has an inverse. In fact we should have:

    a_m b_{-m} = 1
    a_m b_{-m+1} + a_{m+1} b_{-m} = 0
    a_m b_{-m+2} + a_{m+1} b_{-m+1} + a_{m+2} b_{-m} = 0
    ...

By solving the first equation, we find b_{-m} = a_m^{-1}; then the system can be solved one equation after the other, by substituting the values obtained up to that moment. Since a_m b_{-m} is the coefficient of t^0, we have a(t)b(t) = 1 and the proof is complete.

We can now show that (L, +, \cdot) is the smallest field containing the integrity domain (F, +, \cdot), thus characterizing the set of f.L.s. in an algebraic way. From Algebra we know that, given an integrity domain (K, +, \cdot), the smallest field (F, +, \cdot) containing (K, +, \cdot) can be built in the following way: let us define an equivalence relation ~ on the set K x K:

    (a, b) ~ (c, d)  <=>  ad = bc;

if we now set F = K x K / ~, the set F with the operations + and \cdot defined as the extension of + and \cdot in K is the field we are searching for. This is just the way in which the field Q of rational numbers is constructed from the integrity domain Z of integer numbers, and the field of rational functions is built from the integrity domain of the polynomials.

Our aim is to show that the field (L, +, \cdot) of f.L.s. is isomorphic with the field constructed in the described way starting with the integrity domain of f.p.s. Let L^ = F x F be the set of pairs of f.p.s.; we begin by showing that for every (f(t), g(t)) \in L^ there exists a pair (a(t), b(t)) \in L^ such that (f(t), g(t)) ~ (a(t), b(t)) (i.e., f(t)b(t) = g(t)a(t)) and at least one between a(t) and b(t) belongs to F_0. In fact, let p = min(ord(f(t)), ord(g(t))) and let us define a(t), b(t) by f(t) = t^p a(t) and g(t) = t^p b(t); obviously, either a(t) \in F_0 or b(t) \in F_0, or both are invertible f.p.s. We now have:

    b(t)f(t) = b(t) t^p a(t) = t^p b(t) a(t) = g(t) a(t)

and this shows that (a(t), b(t)) ~ (f(t), g(t)).

If b(t) \in F_0, then a(t)/b(t) \in F and is uniquely determined by a(t), b(t); in this case, therefore, our assertion is proved. So, let us now suppose that b(t) is not in F_0; then we can write b(t) = t^m v(t), where v(t) \in F_0. We have a(t)/v(t) = \sum_{k=0}^{\infty} d_k t^k \in F_0, and consequently let us consider the f.L.s. l(t) = \sum_{k=0}^{\infty} d_k t^{k-m}; by construction, it is uniquely determined by a(t), b(t) or also by f(t), g(t). It is now easy to see that l(t) is the inverse of the f.p.s. b(t)/a(t) in the sense of f.L.s. as considered above, and our proof is complete. This shows that the correspondence is a 1-1 correspondence between L and L^ preserving the inverse, so it is now obvious that the correspondence is also an isomorphism between (L, +, \cdot) and (L^, +, \cdot).

Because of this result, we can identify L^ and L and assert that (L, +, \cdot) is indeed the smallest field containing (F, +, \cdot). From now on, the set L^ will be ignored and we will always refer to L as the field of f.L.s.

3.4 Operations on formal power series

Besides the four basic operations of addition, subtraction, multiplication and division, it is possible to consider other operations on F, only a few of which can be extended to L.

The most important operation is surely taking a power of a f.p.s.; if p \in N we can recursively define:

    f(t)^0 = 1                        if p = 0
    f(t)^p = f(t) f(t)^{p-1}          if p > 0

and observe that ord(f(t)^p) = p \cdot ord(f(t)). Therefore, f(t)^p \in F_0 if and only if f(t) \in F_0; on the other hand, if f(t) is not in F_0, then the order of f(t)^p becomes larger and larger and goes to \infty when p -> \infty. This property will be important in our future developments, when we will reduce many operations to
28 CHAPTER 3. FORMAL POWER SERIES
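The recursive definition of f(t)^p can be tried at once on truncated series; the following is a small sketch (code and names are our own), with a series represented by the list of its first coefficients:

```python
def cauchy(f, g):
    """Cauchy product of two truncated f.p.s., given as lists of coefficients."""
    n = min(len(f), len(g))
    return [sum(f[j] * g[k - j] for j in range(k + 1)) for k in range(n)]

def power(f, p):
    """f(t)^p for p in N, by the recursion f^0 = 1, f^p = f * f^(p-1)."""
    result = [1] + [0] * (len(f) - 1)   # the series 1, truncated like f
    for _ in range(p):
        result = cauchy(result, f)
    return result
```

Note how power([0, 1, 0, 0], 3) returns [0, 0, 0, 1]: the order of the result is p · ord(f(t)), as observed above.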

This property will be important in our future developments, when we will reduce many operations to infinite sums involving the powers f(t)^p with p ∈ N. If f(t) ∉ F_0, i.e., ord(f(t)) > 0, these sums involve elements of larger and larger order, and therefore for every index k we can determine the coefficient of t^k by only a finite number of terms. This assures that our definitions will be good definitions.

We wish also to observe that taking a positive integer power can be easily extended to L; in this case, when ord(f(t)) < 0, ord(f(t)^p) decreases, but remains always finite. In particular, for g(t) = f(t)^{−1} we have g(t)^p = f(t)^{−p}, and powers can be extended to all integers p ∈ Z.

When the exponent p is a real or complex number whatsoever, we should restrict f(t)^p to the case f(t) ∈ F_0; in fact, if f(t) = t^m g(t), we would have f(t)^p = (t^m g(t))^p = t^{mp} g(t)^p; however, t^{mp} is an expression without any mathematical sense. Instead, if f(t) ∈ F_0, let us write f(t) = f_0 + v̂(t), with ord(v̂(t)) > 0. For v(t) = v̂(t)/f_0, we have by Newton's rule:

    f(t)^p = (f_0 + v̂(t))^p = f_0^p (1 + v(t))^p = f_0^p Σ_{k=0}^∞ \binom{p}{k} v(t)^k,

which can be assumed as a definition. In the last expression, we can observe that: i) f_0^p ∈ C; ii) \binom{p}{k} is defined for every value of p, k being a non-negative integer; iii) v(t)^k is well-defined by the considerations above, and ord(v(t)^k) grows indefinitely, so that for every k the coefficient of t^k is obtained by a finite sum. We can conclude that f(t)^p is well-defined.

Particular cases are p = −1 and p = 1/2. In the former case, f(t)^{−1} is the inverse of the f.p.s. f(t). We have already seen a method for computing f(t)^{−1}, but now we obtain the following formula:

    f(t)^{−1} = (1/f_0) Σ_{k=0}^∞ \binom{−1}{k} v(t)^k = (1/f_0) Σ_{k=0}^∞ (−1)^k v(t)^k.

For p = 1/2, we obtain a formula for the square root of a f.p.s.:

    f(t)^{1/2} = √f(t) = √f_0 Σ_{k=0}^∞ \binom{1/2}{k} v(t)^k = √f_0 Σ_{k=0}^∞ ((−1)^{k−1} / (4^k (2k − 1))) \binom{2k}{k} v(t)^k.

In Section 3.12, we will see how f(t)^p can be obtained computationally without actually performing the powers v(t)^k. We conclude by observing that this more general operation of taking the power p ∈ R cannot be extended to f.L.s.: in fact, we would have smaller and smaller terms t^k (k → −∞), and therefore the resulting expression cannot be considered an actual f.L.s., which requires a term with smallest degree.

By applying well-known rules of the exponential and logarithmic functions, we can easily define the corresponding operations for f.p.s., which however, as will be apparent, cannot be extended to f.L.s.. For the exponentiation we have, for f(t) ∈ F_0, f(t) = f_0 + v(t):

    e^{f(t)} = exp(f_0 + v(t)) = e^{f_0} Σ_{k=0}^∞ v(t)^k / k!.

Again, since v(t) ∉ F_0, the order of v(t)^k increases with k, and the sums necessary to compute the coefficient of t^k are always finite. The formula makes clear that exponentiation can be performed on every f(t) ∈ F, and when f(t) ∉ F_0 the factor e^{f_0} is not present.

For the logarithm, let us suppose f(t) ∈ F_0, f(t) = f_0 + v̂(t), v(t) = v̂(t)/f_0; then we have:

    ln(f_0 + v̂(t)) = ln f_0 + ln(1 + v(t)) = ln f_0 + Σ_{k=1}^∞ (−1)^{k+1} v(t)^k / k.

In this case, for f(t) ∉ F_0, we cannot define the logarithm, and this shows an asymmetry between exponential and logarithm.

Another important operation is differentiation:

    D f(t) = (d/dt) f(t) = Σ_{k=1}^∞ k f_k t^{k−1} = f′(t).

This operation can be performed on every f(t) ∈ L, and a very important observation is the following:

Theorem 3.4.1  For every f(t) ∈ L, its derivative f′(t) does not contain any term in t^{−1}.

Proof: In fact, by the general rule, the term in t^{−1} could only have originated from the constant term (i.e., the term in t^0) in f(t), but the product by k reduces it to 0.

This fact will be the basis for very important results in the theory of f.p.s. and f.L.s. (see Section 3.8).

Another operation is integration; because indefinite integration leaves a constant term undefined, we prefer to introduce and use only definite integration; for f(t) ∈ F this is defined as:

    ∫_0^t f(τ) dτ = Σ_{k=0}^∞ f_k ∫_0^t τ^k dτ = Σ_{k=0}^∞ (f_k / (k + 1)) t^{k+1}.

Our purely formal approach allows us to exchange the integration and summation signs; in general, as we know, this is only possible when the convergence is uniform. By this definition, ∫_0^t f(τ) dτ never belongs to F_0.
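Both differentiation and definite integration act coefficientwise, so they are immediate to program on truncated series; a minimal sketch (our own code), using exact rational arithmetic:

```python
from fractions import Fraction

def derivative(f):
    """D f(t): the coefficient of t^(k-1) is k * f_k."""
    return [k * Fraction(f[k]) for k in range(1, len(f))]

def integral(f):
    """Definite integral from 0 to t: the coefficient of t^(k+1) is f_k / (k+1)."""
    return [Fraction(0)] + [Fraction(f[k], k + 1) for k in range(len(f))]
```

As the text observes, derivative(integral(f)) gives back f, while integral(f) always has constant term 0, i.e., it never belongs to F_0.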

Integration can be extended to f.L.s. with an obvious exception: because integration is the inverse operation of differentiation, we cannot apply integration to a f.L.s. containing a term in t^{−1}. Formally, from the definition above, such a term would imply a division by 0, and this is not allowed. In all the other cases, integration does not create any problem.

3.5 Composition

A last operation on f.p.s. is so important that we dedicate a complete section to it. The operation is the composition of two f.p.s.. Let f(t) ∈ F and g(t) ∉ F_0; then we define the "composition" of f(t) by g(t) as the f.p.s.:

    f(g(t)) = Σ_{k=0}^∞ f_k g(t)^k.

This definition justifies the fact that g(t) cannot belong to F_0; in fact, otherwise, infinite sums would be involved in the computation of f(g(t)). In connection with the composition of f.p.s., we will use the following notation:

    f(g(t)) = [ f(y) | y = g(t) ]

which is intended to mean the result of substituting g(t) for the indeterminate y in the f.p.s. f(y). The indeterminate y is a dummy symbol; it should be different from t in order not to create any ambiguity, but it can be substituted by any other symbol. Because every g(t) ∉ F_0 is characterized by the fact that g(0) = 0, we will always understand that, in the notation above, the f.p.s. g(t) is such that g(0) = 0.

The definition can be extended to every f(t) ∈ L, but the function g(t) always has to be such that ord(g(t)) > 0; otherwise the definition would imply infinite sums, which we avoid because, by our formal approach, we do not consider any convergence criterion.

The f.p.s. in F_1 have a particular relevance for composition. They are called quasi-units or delta series. First of all, we wish to observe that the f.p.s. t ∈ F_1 acts as an identity for composition. In fact, [ y | y = g(t) ] = g(t) and [ f(y) | y = t ] = f(t), and therefore t is a left and right identity. As a second fact, we show that a f.p.s. f(t) has an inverse with respect to composition if and only if f(t) ∈ F_1. Note that g(t) is the inverse of f(t) if and only if f(g(t)) = t and g(f(t)) = t. From this, we deduce immediately that f(t) ∉ F_0 and g(t) ∉ F_0. On the other hand, it is clear that ord(f(g(t))) = ord(f(t)) ord(g(t)) by our initial definition, and since ord(t) = 1 and ord(f(t)) > 0, ord(g(t)) > 0, we must have ord(f(t)) = ord(g(t)) = 1.

Let us now come to the main part of the proof and consider the set F_1 with the operation of composition ◦; composition is always associative, and therefore (F_1, ◦) is a group if we prove that every f(t) ∈ F_1 has a left (or right) inverse, because the theory assures that the other inverse exists and coincides with the previously found inverse. Let f(t) = f_1 t + f_2 t^2 + f_3 t^3 + ··· and g(t) = g_1 t + g_2 t^2 + g_3 t^3 + ···; we have:

    f(g(t)) = f_1 (g_1 t + g_2 t^2 + g_3 t^3 + ···) + f_2 (g_1^2 t^2 + 2 g_1 g_2 t^3 + ···) + f_3 (g_1^3 t^3 + ···) + ··· =
            = f_1 g_1 t + (f_1 g_2 + f_2 g_1^2) t^2 + (f_1 g_3 + 2 f_2 g_1 g_2 + f_3 g_1^3) t^3 + ··· = t.

In order to determine g(t) we have to solve the system:

    f_1 g_1 = 1
    f_1 g_2 + f_2 g_1^2 = 0
    f_1 g_3 + 2 f_2 g_1 g_2 + f_3 g_1^3 = 0
    ···

The first equation gives g_1 = 1/f_1; we can substitute this value in the second equation and obtain a value for g_2; the two values for g_1 and g_2 can be substituted in the third equation to obtain a value for g_3. Continuing in this way, we obtain the values of all the coefficients of g(t), and therefore g(t) is determined in a unique way. In fact, we observe that, by construction, in the kth equation g_k appears in linear form and its coefficient is always f_1. Being f_1 ≠ 0, g_k is unique, even if the other g_r (r < k) appear with powers.

The f.p.s. g(t) such that f(g(t)) = t, and therefore such that g(f(t)) = t as well, is called the compositional inverse of f(t). In the literature, it is usually denoted by f̄(t) or f^{[−1]}(t); we will adopt the first notation. Obviously, the compositional inverse of f̄(t) is f(t) itself, and sometimes f̄(t) is also called the reverse of f(t). Given f(t) ∈ F_1, the determination of its compositional inverse is one of the most interesting problems in the theory of f.p.s. or f.L.s.; it was solved by Lagrange, and we will discuss it in the following sections. Note that, in principle, the g_k's can be computed by solving the system above; this, however, is too complicated, and nobody will follow that way, unless for exercising.
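For small examples, however, the iterative solution of the triangular system can be programmed directly; a sketch (our own code and names), where series are lists of rational coefficients and f must satisfy f_0 = 0, f_1 ≠ 0:

```python
from fractions import Fraction

def compose(f, g):
    """f(g(t)) truncated to len(f) terms; requires g[0] = 0."""
    n = len(f)
    result = [Fraction(0)] * n
    gk = [Fraction(1)] + [Fraction(0)] * (n - 1)        # g(t)^k, starting with k = 0
    for k in range(n):
        for i in range(n):
            result[i] += Fraction(f[k]) * gk[i]
        gk = [sum(Fraction(g[j]) * gk[i - j] for j in range(i + 1)) for i in range(n)]
    return result

def comp_inverse(f):
    """Solve f(g(t)) = t for g, one coefficient at a time."""
    n = len(f)
    g = [Fraction(0), Fraction(1) / Fraction(f[1])] + [Fraction(0)] * (n - 2)
    for k in range(2, n):
        # with g_k still 0, [t^k] f(g(t)) collects exactly the terms not
        # containing g_k; the k-th equation is f_1 * g_k + (that value) = 0
        g[k] = -compose(f, g)[k] / Fraction(f[1])
    return g
```

For instance, inverting f(t) = t + t^2 this way yields the coefficients 0, 1, −1, 2, −5, 14, ... .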

3.6 Coefficient extraction

If f(t) ∈ L, or in particular f(t) ∈ F, the notation [t^n] f(t) indicates the extraction of the coefficient of t^n from f(t), and therefore we have [t^n] f(t) = f_n. In this sense, [t^n] can be seen as a mapping [t^n] : L → R or [t^n] : L → C, according to the field underlying the set L or F. Because of that, [t^n] is called an operator: precisely, the "coefficient of" operator or, more simply, the coefficient operator. In Table 3.1 we state formally the main properties of this operator, collecting what we said in the previous sections.

    (linearity)         [t^n] (α f(t) + β g(t)) = α [t^n] f(t) + β [t^n] g(t)          (K1)
    (shifting)          [t^n] t f(t) = [t^{n−1}] f(t)                                  (K2)
    (differentiation)   [t^n] f′(t) = (n + 1) [t^{n+1}] f(t)                           (K3)
    (convolution)       [t^n] f(t) g(t) = Σ_{k=0}^n [t^k] f(t) · [t^{n−k}] g(t)        (K4)
    (composition)       [t^n] f(g(t)) = Σ_{k=0}^∞ ([y^k] f(y)) [t^n] g(t)^k            (K5)

    Table 3.1: The rules for coefficient extraction

We observe that α, β ∈ R or α, β ∈ C are any constants; the use of the indeterminate y is only necessary in order not to confuse the action on different f.p.s.; because g(0) = 0 in composition, the last sum is actually finite. Some points require more lengthy comments. The property of shifting can be easily generalized to [t^n] t^k f(t) = [t^{n−k}] f(t) and also to negative powers: [t^n] f(t)/t^k = [t^{n+k}] f(t). These rules are very important and are often applied in the theory of f.p.s. and f.L.s.. In the former case, some care should be exercised to see whether the properties remain in the realm of F or go beyond it, invading the domain of L, which can be not always correct. The property of differentiation for n = −1 gives [t^{−1}] f′(t) = 0, a situation we already noticed. The operator [t^{−1}] is also called the residue and is denoted by "res"; so, for example, people write res f′(t) = 0, and some authors use the notation res f(t)/t^{n+1} for [t^n] f(t).

We will have many occasions to apply rules (K1)–(K5) of coefficient extraction. However, just to give a meaningful example, let us find the coefficient of t^n in the series expansion of (1 + αt)^r, when α and r are two real numbers whatsoever. Rule (K3) can be written in the form [t^n] f(t) = (1/n) [t^{n−1}] f′(t), and we can successively apply this form to our case:

    [t^n] (1 + αt)^r = (rα/n) [t^{n−1}] (1 + αt)^{r−1} =
                     = (rα/n) ((r−1)α/(n−1)) [t^{n−2}] (1 + αt)^{r−2} = ··· =
                     = (rα/n) ((r−1)α/(n−1)) ··· ((r−n+1)α/1) [t^0] (1 + αt)^{r−n} =
                     = \binom{r}{n} α^n [t^0] (1 + αt)^{r−n}.

We now observe that [t^0] (1 + αt)^{r−n} = 1 because of our observations on f.p.s. operations. Therefore, we conclude with the so-called Newton's rule:

    [t^n] (1 + αt)^r = \binom{r}{n} α^n

which is one of the most frequently used results in coefficient extraction. Let us remark explicitly that when r = −1 (the geometric series) we have:

    [t^n] 1/(1 + αt) = \binom{−1}{n} α^n = (−1)^n \binom{1+n−1}{n} α^n = (−α)^n.

A simple but important use of Newton's rule concerns the extraction of the coefficient of t^n from the inverse of a trinomial at^2 + bt + c, in the case it is reducible, i.e., it can be written (1 + αt)(1 + βt); obviously, we can always reduce the constant c to 1: by the linearity rule, it can be taken outside the "coefficient of" operator. Therefore, our aim is to compute:

    [t^n] 1/((1 + αt)(1 + βt))

with α ≠ β, otherwise Newton's rule would be immediately applicable. The problem can be solved by using the technique of partial fraction expansion. We look for two constants A and B such that:

    1/((1 + αt)(1 + βt)) = A/(1 + αt) + B/(1 + βt) = (A + Aβt + B + Bαt)/((1 + αt)(1 + βt));

if two such constants exist, the numerator in the first expression should equal the numerator in the last one, independently of t or, if one so prefers, for every value of t. Therefore, the term A + B should be equal to 1, while the term (Aβ + Bα)t should always be 0. The values of A and B are therefore the solution of the linear system:

    A + B = 1
    Aβ + Bα = 0.
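The system and the resulting expansion are easy to check numerically; a sketch with sample values of our own choosing (α = 2, β = 3, so the trinomial is 1 + 5t + 6t^2):

```python
from fractions import Fraction

def inv_quadratic(b, c, nterms):
    """Coefficients of 1/(1 + b*t + c*t^2): u_0 = 1, u_n = -(b*u_{n-1} + c*u_{n-2})."""
    u = [Fraction(1)]
    for n in range(1, nterms):
        u.append(-(b * u[n - 1] + (c * u[n - 2] if n >= 2 else 0)))
    return u

alpha, beta = Fraction(2), Fraction(3)
A = alpha / (alpha - beta)          # solving the 2x2 system above
B = 1 - A
direct = inv_quadratic(alpha + beta, alpha * beta, 8)
# geometric-series rule: [t^n] 1/(1+at) = (-a)^n, applied to both fractions
via_partial_fractions = [A * (-alpha) ** n + B * (-beta) ** n for n in range(8)]
```

Here direct and via_partial_fractions agree term by term; for instance, both begin with the coefficients 1, −5, 19.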

The discriminant of this system is α − β, which is always different from 0, because of our hypothesis α ≠ β. The system has therefore only one solution, which is A = α/(α−β) and B = −β/(α−β). We can now substitute these values in the expression above:

    [t^n] 1/((1 + αt)(1 + βt)) = (1/(α−β)) [t^n] ( α/(1 + αt) − β/(1 + βt) ) =
                               = (1/(α−β)) ( [t^n] α/(1 + αt) − [t^n] β/(1 + βt) ) =
                               = (−1)^n (α^{n+1} − β^{n+1}) / (α − β).

Let us now consider a trinomial 1 + bt + ct^2 for which Δ = b^2 − 4c < 0 and b ≠ 0. The trinomial is irreducible, but we can write:

    [t^n] 1/(1 + bt + ct^2) = [t^n] 1 / ( (1 − ((−b + i√|Δ|)/2) t) (1 − ((−b − i√|Δ|)/2) t) ).

This time, a partial fraction expansion does not give a simple closed form for the coefficients; however, we can apply the formula above in the form:

    [t^n] 1/((1 − αt)(1 − βt)) = (α^{n+1} − β^{n+1}) / (α − β).

Since α and β are complex numbers, the resulting expression is not very appealing. We can try to give it a better form. Let us set α = (−b + i√|Δ|)/2, so that α is always contained in the positive imaginary halfplane. This implies 0 < arg(α) < π and we have:

    α − β = −b/2 + i√|Δ|/2 + b/2 + i√|Δ|/2 = i√|Δ| = i√(4c − b^2).

If θ = arg(α) and:

    ρ = |α| = √( b^2/4 + (4c − b^2)/4 ) = √c,

we can set α = ρ e^{iθ} and β = ρ e^{−iθ}. Consequently:

    α^{n+1} − β^{n+1} = ρ^{n+1} ( e^{i(n+1)θ} − e^{−i(n+1)θ} ) = 2i ρ^{n+1} sin((n+1)θ)

and therefore:

    [t^n] 1/(1 + bt + ct^2) = 2 (√c)^{n+1} sin((n+1)θ) / √(4c − b^2).

At this point we only have to find the value of θ. Obviously:

    θ = arctan( (√|Δ|/2) / (−b/2) ) + kπ = arctan( √(4c − b^2) / (−b) ) + kπ.

When b < 0, we have 0 < arctan(√(4c − b^2)/(−b)) < π/2, and this is the correct value for θ. However, when b > 0, the principal branch of arctan is negative, and we should set θ = π + arctan(√(4c − b^2)/(−b)). As a consequence, we have:

    θ = arctan( √(4c − b^2) / (−b) ) + C

where C = π if b > 0 and C = 0 if b < 0.

An interesting and non-trivial example is given by:

    σ_n = [t^n] 1/(1 − 3t + 3t^2) = 2(√3)^{n+1} sin((n+1) arctan(√3/3)) / √3 = 2(√3)^n sin((n+1)π/6).

These coefficients have the following values:

    n = 12k        σ_n = (√3)^{12k} = 729^k
    n = 12k + 1    σ_n = (√3)^{12k+2} = 3 · 729^k
    n = 12k + 2    σ_n = 2(√3)^{12k+2} = 6 · 729^k
    n = 12k + 3    σ_n = (√3)^{12k+4} = 9 · 729^k
    n = 12k + 4    σ_n = (√3)^{12k+4} = 9 · 729^k
    n = 12k + 5    σ_n = 0
    n = 12k + 6    σ_n = −(√3)^{12k+6} = −27 · 729^k
    n = 12k + 7    σ_n = −(√3)^{12k+8} = −81 · 729^k
    n = 12k + 8    σ_n = −2(√3)^{12k+8} = −162 · 729^k
    n = 12k + 9    σ_n = −(√3)^{12k+10} = −243 · 729^k
    n = 12k + 10   σ_n = −(√3)^{12k+10} = −243 · 729^k
    n = 12k + 11   σ_n = 0.

3.7 Matrix representation

Let f(t) ∈ F_0; with the coefficients of f(t) we form the following infinite lower triangular matrix (or array) D = (d_{n,k})_{n,k∈N}: column 0 contains the coefficients f_0, f_1, f_2, ... in this order; column 1 contains the same coefficients shifted down by one position, with d_{0,1} = 0; in general, column k contains the coefficients of f(t) shifted down k positions, so that the first k positions are 0. This definition can be summarized in the formula d_{n,k} = f_{n−k}, ∀n, k ∈ N.
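A finite section of such an array is immediate to build from the formula d_{n,k} = f_{n−k}; the following sketch (our own code) also provides the row-by-column product, so that one can observe on examples how the product of two such sections corresponds to the convolution f(t)g(t):

```python
def series_matrix(f):
    """A finite section of the array (f(t), 1): d[n][k] = f_{n-k}, zero when n < k."""
    N = len(f)
    return [[f[n - k] if n >= k else 0 for k in range(N)] for n in range(N)]

def mat_mult(a, b):
    """Row-by-column product of two square matrices."""
    N = len(a)
    return [[sum(a[n][j] * b[j][k] for j in range(N)) for k in range(N)] for n in range(N)]

def convolution(f, g):
    """Cauchy product of two truncated series."""
    N = min(len(f), len(g))
    return [sum(f[j] * g[k - j] for j in range(k + 1)) for k in range(N)]
```

For instance, mat_mult(series_matrix(f), series_matrix(g)) equals series_matrix(convolution(f, g)) on every pair of samples one tries.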
For a reason which will be apparent only later, the array D will be denoted by (f(t), 1):

    D = (f(t), 1) =
        f_0
        f_1  f_0
        f_2  f_1  f_0
        f_3  f_2  f_1  f_0
        f_4  f_3  f_2  f_1  f_0
        ···

(the entries above the diagonal are all 0).

If (f(t), 1) and (g(t), 1) are the matrices corresponding to the two f.p.s. f(t) and g(t), we are interested in finding out what is the matrix obtained by multiplying the two matrices with the usual row-by-column product. This product will be denoted by (f(t), 1) · (g(t), 1), and it is immediate to see what its generic element d_{n,k} is. The row n in (f(t), 1) is, by definition, {f_n, f_{n−1}, f_{n−2}, ...}, and column k in (g(t), 1) is {0, 0, ..., 0, g_0, g_1, g_2, ...}, where the number of leading 0's is just k. Therefore we have:

    d_{n,k} = Σ_{j=0}^∞ f_{n−j} g_{j−k}

if we conventionally set g_r = 0, ∀r < 0. When k = 0, we have d_{n,0} = Σ_{j=0}^∞ f_{n−j} g_j = Σ_{j=0}^n f_{n−j} g_j, and therefore column 0 contains the coefficients of the convolution f(t)g(t). When k = 1 we have d_{n,1} = Σ_{j=0}^∞ f_{n−j} g_{j−1} = Σ_{j=0}^{n−1} f_{n−1−j} g_j, and this is the coefficient of t^{n−1} in the convolution f(t)g(t). Proceeding in the same way, we see that column k contains the coefficients of the convolution f(t)g(t) shifted down k positions. Therefore we conclude:

    (f(t), 1) · (g(t), 1) = (f(t)g(t), 1)

and this shows that there exists a group isomorphism between (F_0, ·) and the set of matrices (f(t), 1) with the row-by-column product. In particular, (1, 1) is the identity (in fact, it corresponds to the identity matrix) and (f(t)^{−1}, 1) is the inverse of (f(t), 1).

Let us now consider a f.p.s. f(t) ∈ F_1 and let us build an infinite lower triangular matrix in the following way: column k contains the coefficients of f(t)^k in their proper order:

        1
        0   f_1
        0   f_2   f_1^2
        0   f_3   2 f_1 f_2           f_1^3
        0   f_4   2 f_1 f_3 + f_2^2   3 f_1^2 f_2   f_1^4
        ···

The matrix will be denoted by (1, f(t)/t), and we are interested to see how the matrix (1, g(t)/t) · (1, f(t)/t) is composed when f(t), g(t) ∈ F_1. If (f̂_{n,k})_{n,k∈N} = (1, f(t)/t), by definition we have:

    f̂_{n,k} = [t^n] f(t)^k

and therefore the generic element d_{n,k} of the product is:

    d_{n,k} = Σ_{j=0}^∞ ĝ_{n,j} f̂_{j,k} = Σ_{j=0}^∞ [t^n] g(t)^j [y^j] f(y)^k =
            = [t^n] Σ_{j=0}^∞ ([y^j] f(y)^k) g(t)^j = [t^n] f(g(t))^k.

In other words, column k in (1, g(t)/t) · (1, f(t)/t) is the kth power of the composition f(g(t)), and we can conclude:

    (1, g(t)/t) · (1, f(t)/t) = (1, f(g(t))/t).

Clearly, the identity t ∈ F_1 corresponds to the matrix (1, t/t) = (1, 1), the identity matrix, and this is sufficient to prove that the correspondence f(t) ↔ (1, f(t)/t) is a group isomorphism.

Row-by-column product is surely the basic operation on matrices, and its extension to infinite, lower triangular arrays is straightforward, because the sums involved in the product are actually finite. We have shown that we can associate every f.p.s. f(t) ∈ F_0 to a particular matrix (f(t), 1) (let us denote by A the set of such arrays) in such a way that (F_0, ·) is isomorphic to (A, ·), and the Cauchy product becomes the row-by-column product. Besides, we can associate every f.p.s. g(t) ∈ F_1 to a matrix (1, g(t)/t) (let us call B the set of such matrices) in such a way that (F_1, ◦) is isomorphic to (B, ·), and the composition of f.p.s. becomes again the row-by-column product. This reveals a connection between the Cauchy product and the composition: in the Chapter on Riordan Arrays we will explore this connection more deeply; for the moment, we wish to see how this observation leads to a computational method for evaluating the compositional inverse of a f.p.s. in F_1.

3.8 Lagrange inversion theorem

Given an infinite, lower triangular array of the form (1, f(t)/t), with f(t) ∈ F_1, the inverse matrix (1, g(t)/t) is such that (1, g(t)/t) · (1, f(t)/t) = (1, 1), and since the product results in (1, f(g(t))/t) we have f(g(t)) = t. In other words, because of the isomorphism we have seen, the inverse matrix for (1, f(t)/t) is just the matrix corresponding to the compositional inverse of f(t).
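This computational method can be sketched as follows (our own code): build a finite section of (1, f(t)/t), invert it by forward substitution (it is lower triangular with nonzero diagonal), and read the coefficients of the compositional inverse in column 1:

```python
from fractions import Fraction

def power_matrix(f, N):
    """m[n][k] = [t^n] f(t)^k, for f in F_1 given by its first N coefficients."""
    cols, p = [], [Fraction(1)] + [Fraction(0)] * (N - 1)    # f(t)^0
    for _ in range(N):
        cols.append(p[:])
        p = [sum(Fraction(f[j]) * p[n - j] for j in range(n + 1)) for n in range(N)]
    return [[cols[k][n] for k in range(N)] for n in range(N)]

def lower_tri_inverse(m):
    """Invert a lower triangular matrix with nonzero diagonal by forward substitution."""
    N = len(m)
    inv = [[Fraction(0)] * N for _ in range(N)]
    for n in range(N):
        inv[n][n] = 1 / Fraction(m[n][n])
        for k in range(n - 1, -1, -1):
            s = sum(inv[n][j] * Fraction(m[j][k]) for j in range(k + 1, n + 1))
            inv[n][k] = -s / Fraction(m[k][k])
    return inv
```

For f(t) = t/(1 − t) = t + t^2 + t^3 + ···, column 1 of the inverted section gives 0, 1, −1, 1, −1, ..., i.e., the coefficients of the compositional inverse t/(1 + t).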

As we have already said, Lagrange found a noteworthy formula for the coefficients of this compositional inverse. We follow the more recent proof of Stanley, which points out the purely formal aspects of Lagrange's formula. Indeed, we will prove something more, by finding the exact form of the matrix (1, g(t)/t), inverse of (1, f(t)/t). As a matter of fact, we state what the form of (1, g(t)/t) should be and then verify that it is actually so.

Let D = (d_{n,k})_{n,k∈N} be defined as:

    d_{n,k} = (k/n) [t^{n−k}] (t/f(t))^n.

Because f(t)/t ∈ F_0, the power (t/f(t))^k = (f(t)/t)^{−k} is well-defined; in order to show that (d_{n,k})_{n,k∈N} = (1, g(t)/t) we only have to prove that D · (1, f(t)/t) = (1, 1), because we already know that the compositional inverse of f(t) is unique. The generic element v_{n,k} of the row-by-column product D · (1, f(t)/t) is:

    v_{n,k} = Σ_{j=0}^∞ d_{n,j} [y^j] f(y)^k = Σ_{j=0}^∞ (j/n) [t^{n−j}] (t/f(t))^n [y^j] f(y)^k.

By the rule of differentiation for the coefficient of operator, we have:

    j [y^j] f(y)^k = [y^{j−1}] (d/dy) f(y)^k = k [y^j] y f′(y) f(y)^{k−1}.

Therefore, for v_{n,k} we have:

    v_{n,k} = (k/n) Σ_{j=0}^∞ [t^{n−j}] (t/f(t))^n [y^j] y f′(y) f(y)^{k−1} = (k/n) [t^n] (t/f(t))^n t f′(t) f(t)^{k−1}.

In fact, the factor k/n does not depend on j and can be taken out of the summation sign; the sum is actually finite and is the term of the convolution appearing in the last formula. Let us now distinguish between the case k = n and k ≠ n. When k = n we have:

    v_{n,n} = [t^n] t^n f(t)^{−n} t f′(t) f(t)^{n−1} = [t^0] f′(t) (t/f(t)) = 1;

in fact, f′(t) = f_1 + 2 f_2 t + 3 f_3 t^2 + ··· and, being f(t)/t ∈ F_0, (f(t)/t)^{−1} = (f_1 + f_2 t + f_3 t^2 + ···)^{−1} = f_1^{−1} + ···; therefore, the constant term in f′(t)(t/f(t)) is f_1/f_1 = 1. When k ≠ n:

    v_{n,k} = (k/n) [t^n] t^n t f(t)^{k−n−1} f′(t) = (k/n) (1/(k−n)) [t^{−1}] (d/dt) f(t)^{k−n} = 0;

in fact, f(t)^{k−n} is a f.L.s. and, as we observed, the residue of its derivative should be zero. This proves that D · (1, f(t)/t) = (1, 1) and therefore D is the inverse of (1, f(t)/t).

If f̄(t) is the compositional inverse of f(t), column 1 gives us the values of its coefficients; by the formula for d_{n,k} we have:

    f̄_n = [t^n] f̄(t) = d_{n,1} = (1/n) [t^{n−1}] (t/f(t))^n

and this is the celebrated Lagrange Inversion Formula (LIF). The other columns give us the coefficients of the powers f̄(t)^k, for which we have:

    [t^n] f̄(t)^k = (k/n) [t^{n−k}] (t/f(t))^n.

Many times, there is another way of applying the LIF. Suppose we have a functional equation w = t φ(w), where φ(t) ∈ F_0, and we wish to find the f.p.s. w = w(t) satisfying this functional equation. Clearly w(t) ∈ F_1, and if we set f(y) = y/φ(y), we also have f(t) ∈ F_1. However, the functional equation can be written f(w(t)) = t, and this shows that w(t) is the compositional inverse of f(t). We therefore know that w(t) is uniquely determined, and the LIF gives us:

    [t^n] w(t) = (1/n) [t^{n−1}] (t/f(t))^n = (1/n) [t^{n−1}] φ(t)^n.

The LIF can also give us the coefficients of the powers w(t)^k, but we can obtain a still more general result. Let F(t) ∈ F and let us consider the composition F(w(t)), where w = w(t) is, as before, the solution to the functional equation w = t φ(w), with φ(w) ∈ F_0. For the coefficient of t^n in F(w(t)) we have:

    [t^n] F(w(t)) = [t^n] Σ_{k=0}^∞ F_k w(t)^k = Σ_{k=0}^∞ F_k [t^n] w(t)^k =
                  = Σ_{k=0}^∞ F_k (k/n) [t^{n−k}] φ(t)^n =
                  = (1/n) [t^{n−1}] ( Σ_{k=0}^∞ k F_k t^{k−1} ) φ(t)^n =
                  = (1/n) [t^{n−1}] F′(t) φ(t)^n.

Note that [t^0] F(w(t)) = F_0, and this formula can be generalized to every F(t) ∈ L, except for the coefficient [t^{−1}] F(w(t)).
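Both sides of these formulas can be computed mechanically on truncated series: the functional equation w = tφ(w) can be solved by formal fixed-point iteration, and the LIF coefficient (1/n)[t^{n−1}]φ(t)^n by plain polynomial powers. A sketch, under our own naming, for a polynomial φ with φ(0) ≠ 0:

```python
from fractions import Fraction

def cauchy(f, g):
    n = min(len(f), len(g))
    return [sum(f[j] * g[k - j] for j in range(k + 1)) for k in range(n)]

def solve_functional(phi, N):
    """Solve w = t * phi(w) on truncated series by formal fixed-point iteration."""
    w = [Fraction(0)] * N
    for _ in range(N):
        # evaluate phi(w) = sum of phi_k * w^k, truncated to N terms
        val = [Fraction(0)] * N
        wk = [Fraction(1)] + [Fraction(0)] * (N - 1)
        for c in phi:
            for i in range(N):
                val[i] += Fraction(c) * wk[i]
            wk = cauchy(wk, w)
        w = [Fraction(0)] + val[:N - 1]          # multiply phi(w) by t
    return w

def poly_mult(a, b):
    c = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += Fraction(x) * y
    return c

def lif_coefficient(phi, n):
    """(1/n) [t^(n-1)] phi(t)^n."""
    p = [Fraction(1)]
    for _ in range(n):
        p = poly_mult(p, phi)
    return p[n - 1] / n
```

For instance, with φ(t) = 1 + t the equation w = t(1 + w) has the solution t/(1 − t), and both computations give the coefficient 1 for every n ≥ 1.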

3.9 Some examples of the LIF

We found that the number b_n of binary trees with n nodes (and of other combinatorial objects as well) satisfies the recurrence relation b_{n+1} = Σ_{k=0}^n b_k b_{n−k}. Let us consider the f.p.s. b(t) = Σ_{k=0}^∞ b_k t^k; if we multiply the recurrence relation by t^{n+1} and sum for n from 0 to infinity, we find:

    Σ_{n=0}^∞ b_{n+1} t^{n+1} = Σ_{n=0}^∞ t^{n+1} ( Σ_{k=0}^n b_k b_{n−k} ).

Since b_0 = 1, we can add and subtract 1 = b_0 t^0 in the left hand member and can take t outside the summation sign in the right hand member:

    Σ_{n=0}^∞ b_n t^n − 1 = t Σ_{n=0}^∞ ( Σ_{k=0}^n b_k b_{n−k} ) t^n.

In the r.h.s. we recognize a convolution and, substituting b(t) for the corresponding f.p.s., we obtain:

    b(t) − 1 = t b(t)^2.

We are interested in evaluating b_n = [t^n] b(t); let us therefore set w = w(t) = b(t) − 1, so that w(t) ∈ F_1 and w_n = b_n, ∀n > 0. The previous relation becomes w = t(1 + w)^2 and we see that the LIF can be applied (in the form relative to the functional equation) with φ(t) = (1 + t)^2. Therefore we have:

    b_n = [t^n] w(t) = (1/n) [t^{n−1}] (1 + t)^{2n} = (1/n) \binom{2n}{n−1} =
        = (1/n) (2n)! / ((n−1)! (n+1)!) = (1/(n+1)) (2n)! / (n! n!) = (1/(n+1)) \binom{2n}{n}.

As we said in the previous chapter, b_n is called the nth Catalan number and, under this name, it is often denoted by C_n. Now we have its form:

    C_n = (1/(n+1)) \binom{2n}{n}

also valid in the case n = 0, when C_0 = 1.

In the same way we can compute the number of p-ary trees with n nodes. A p-ary tree is a tree in which all the nodes have arity p, except for the leaves, which have arity 0. A non-empty p-ary tree can be decomposed into a root node with p subtrees T_1, T_2, ..., T_p, which proves that T_{n+1} = Σ T_{i_1} T_{i_2} ··· T_{i_p}, where the sum is extended to all the p-tuples (i_1, i_2, ..., i_p) such that i_1 + i_2 + ··· + i_p = n. As before, we can multiply the two members of the recurrence relation by t^{n+1} and sum for n from 0 to infinity. We find:

    T(t) − 1 = t T(t)^p.

This time we have an equation of degree p, which cannot be solved directly. However, if we set w(t) = T(t) − 1, so that w(t) ∈ F_1, we have:

    w = t (1 + w)^p

and the LIF gives:

    T_n = [t^n] w(t) = (1/n) [t^{n−1}] (1 + t)^{pn} = (1/n) \binom{pn}{n−1} =
        = (1/n) (pn)! / ((n−1)! ((p−1)n + 1)!) = (1/((p−1)n + 1)) \binom{pn}{n}

which generalizes the formula for the Catalan numbers.

Finally, let us find the solution of the functional equation w = t e^w. The LIF gives:

    w_n = (1/n) [t^{n−1}] e^{nt} = (1/n) · n^{n−1}/(n−1)! = n^{n−1}/n!.

Therefore, the solution we are looking for is the f.p.s.:

    w(t) = Σ_{n=1}^∞ (n^{n−1}/n!) t^n = t + t^2 + (3/2) t^3 + (8/3) t^4 + (125/24) t^5 + ···.

As noticed, w(t) is the compositional inverse of:

    f(t) = t/φ(t) = t e^{−t} = t − t^2 + t^3/2! − t^4/3! + t^5/4! − ···.

It is a useful exercise to perform the necessary computations to show that f(w(t)) = t, for example up to the term of degree 5 or 6, and to verify that w(f(t)) = t as well.

3.10 Formal power series and the computer

When we are dealing with generating functions or, more in general, with formal power series of any kind, we often have to perform numerical computations in order to verify some theoretical result or to experiment with actual cases. In these and other circumstances the computer can help very much with its speed and precision. Nowadays, several Computer Algebra Systems exist, which offer the possibility of

actually working with formal power series, contain- provided that (m, n) is a reduced rational number.
ing formal parameters as well. The use of these tools The dimension of m and n is limited by the internal
is recommended because they can solve a doubt in representation of integer numbers.
a few seconds, can clarify difficult theoretical points In order to avoid this last problem, Computer Al-
and can give useful hints whenever we are faced with gebra Systems usually realize an indefinite precision
particular problems. integer arithmetic. An integer number has a vari-
However, a Computer Algebra System is not al- able length internal representation and special rou-
ways accessible or, in certain circumstances, one may tines are used to perform the basic operations. These
desire to use less sophisticated tools. For example, routines can also be realized in a high level program-
programmable pocket computers are now available, ming language (such as C or JAVA), but they can
which can perform quite easily the basic operations slow down too much execution time if realized on a
on formal power series. The aim of the present and programmable pocket computer.
of the following sections is to describe the main al-
gorithms for dealing with formal power series. They
can be used to program a computer or to simply un- 3.11 The internal representa-
derstand how an existing system actually works.
The simplest way to represent a formal power se- tion of expressions
ries is surely by means of a vector, in which the kth
component (starting from 0) is the coefficient of tk in The simple representation of a formal power series by
the power series. Obviously, the computer memory a vector of real, or rational, components will be used
only can store a finite number of components, so an in the next sections to explain the main algorithms
upper bound n0 is usually given to the length of vec- for formal power series operations. However, it is
tors and to represent power series. In other words we surely not the best way to represent power series and
have: becomes completely useless when, for example, the
    repr_{n_0} ( Σ_{k=0}^{∞} a_k t^k ) = (a_0, a_1, ..., a_n)        (n ≤ n_0)

Fortunately, most operations on formal power series preserve the number of significant components, so that there is little danger that a number of successive operations could reduce a finite representation to a meaningless sequence of numbers. Differentiation decreases the number of useful components by one; on the contrary, integration and multiplication by t^r, say, increase the number of significant elements, at the cost of introducing some 0 components.

The components a_0, a_1, ..., a_n are usually real numbers, represented with the precision allowed by the particular computer. In most combinatorial applications, however, a_0, a_1, ..., a_n are rational numbers and, with some extra effort, it is not difficult to realize rational arithmetic on a computer. It is sufficient to represent a rational number as a couple (m, n), whose intended meaning is just m/n. So we must have m ∈ Z, n ∈ N, and it is a good idea to keep m and n coprime. This can be achieved by a routine reduce, which computes p = gcd(m, n) by Euclid's algorithm and then divides both m and n by p. The operations on rational numbers are defined in the following way:

    (m, n) + (m′, n′) = reduce(mn′ + m′n, nn′)
    (m, n) × (m′, n′) = reduce(mm′, nn′)
    (m, n)^{−1} = (n, m)
    (m, n)^p = (m^p, n^p)        (p ∈ N)

…coefficients depend on some formal parameter. In other words, our representation can only deal with purely numerical formal power series.

Because of that, Computer Algebra Systems use a more sophisticated internal representation. In fact, power series are simply a particular case of a general mathematical expression. The aim of the present section is to give a rough idea of how an expression can be represented in the computer memory.

In general, an expression consists of operators and operands. For example, in √a + 3 the operators are + and √·, and the operands are a and 3. Every operator has its own arity, or adicity, i.e., the number of operands on which it acts. The adicity of the sum + is two, because it acts on its two terms; the adicity of the square root √· is one, because it acts on a single term. Operands can be numbers (and it is important that the nature of the number be specified, i.e., whether it is a natural, an integer, a rational, a real or a complex number) or can be a formal parameter, such as a. Obviously, if an operator acts on numerical operands, it can be executed, giving a numerical result. But if any of its operands is a formal parameter, the result is a formal expression, which may perhaps be simplified but cannot be evaluated to a numerical result.

An expression can always be transposed into a "tree", the internal nodes of which correspond to operators and whose leaves correspond to operands. The simple tree for the previous expression is given in Figure 3.1.
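To make this concrete, here is a minimal Python sketch of such a tree (the encoding, tuples for internal nodes and plain values or strings for formal parameters, is our own, not the book's):

```python
from math import sqrt

# an internal node is (operator, operands...); a leaf is a number
# or a formal parameter, represented here by a string such as "a"
ADICITY = {"+": 2, "sqrt": 1}
OPS = {"+": lambda x, y: x + y, "sqrt": lambda x: sqrt(x)}

def evaluate(node, env):
    """Visit the tree, executing each operator on its evaluated operands."""
    if isinstance(node, tuple):
        op, *args = node
        assert len(args) == ADICITY[op]    # one branch per operand
        return OPS[op](*[evaluate(x, env) for x in args])
    # a leaf: look up a formal parameter, or return the number itself
    return env.get(node, node) if isinstance(node, str) else node

expr = ("+", ("sqrt", "a"), 3)             # the expression of Figure 3.1
```

Evaluation succeeds only when every formal parameter is bound to a number in env, mirroring the remark above that an expression containing an unbound parameter cannot be evaluated to a numerical result.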
[Figure 3.1: The tree for a simple expression; the root + has two branches, the subtree √· over the leaf a, and the leaf 3]

Each operator has as many branches as its adicity, and a simple visit to the tree can perform its evaluation, that is, it can execute all the operators when only numerical operands are attached to them. Simplification is a rather complicated matter, and it is not quite clear what a "simple" expression is. For example, which is simpler between (a + 1)(a + 2) and a² + 3a + 2? It is easily seen that there are occasions in which either expression can be considered "simpler". Therefore, most Computer Algebra Systems provide a general "simplification" routine together with a series of more specific programs performing specific simplifying tasks, such as expanding parenthesized expressions or collecting like factors.

In the computer memory an expression is represented by its tree, which is called the tree or list representation of the expression. The representation of the formal power series Σ_{k=0}^{n} a_k t^k is shown in Figure 3.2. This representation is very convenient, and when some coefficients a_1, a_2, ..., a_n depend on a formal parameter p nothing is changed, at least conceptually. In fact, where we have drawn a leaf a_j, we simply have a more complex tree representing the expression for a_j.

The reader can develop computer programs for dealing with this representation of formal power series. It should be clear that another important point of this approach is that no limitation is imposed on the length of expressions. A clever and dynamic use of the storage solves every problem without increasing the complexity of the corresponding programs.

3.12 Basic operations of formal power series

We now consider the vector representation of formal power series:

    repr_{n_0} ( Σ_{k=0}^{∞} a_k t^k ) = (a_0, a_1, ..., a_n)        (n ≤ n_0).

The sum of two formal power series is defined in the obvious way:

    (a_0, a_1, ..., a_n) + (b_0, b_1, ..., b_m) = (c_0, c_1, ..., c_r)

where c_i = a_i + b_i for every 0 ≤ i ≤ r, and r = min(n, m). In a similar way the Cauchy product is defined:

    (a_0, a_1, ..., a_n) × (b_0, b_1, ..., b_m) = (c_0, c_1, ..., c_r)

where c_k = Σ_{j=0}^{k} a_j b_{k−j} for every 0 ≤ k ≤ r. Here r is defined as r = min(n + p_B, m + p_A), if p_A is the first index for which a_{p_A} ≠ 0 and p_B is the first index for which b_{p_B} ≠ 0. We point out that the time complexity of the sum is O(r) and the time complexity of the product is O(r²).

Subtraction is similar to addition and does not require any particular comment. Before discussing division, let us consider the operation of raising a formal power series to a power α ∈ R. This includes the inversion of a power series (α = −1) and therefore division as well.

First of all we observe that, whenever α ∈ N, f(t)^α can be reduced to the case (1 + g(t))^α, where g(t) ∉ F_0. In fact we have:

    f(t) = f_h t^h + f_{h+1} t^{h+1} + f_{h+2} t^{h+2} + ··· = f_h t^h ( 1 + (f_{h+1}/f_h) t + (f_{h+2}/f_h) t² + ··· )

and therefore:

    f(t)^α = f_h^α t^{αh} ( 1 + (f_{h+1}/f_h) t + (f_{h+2}/f_h) t² + ··· )^α.

On the contrary, when α ∉ N, f(t)^α can only be performed if f(t) ∈ F_0. In that case we have:

    f(t)^α = ( f_0 + f_1 t + f_2 t² + f_3 t³ + ··· )^α = f_0^α ( 1 + (f_1/f_0) t + (f_2/f_0) t² + (f_3/f_0) t³ + ··· )^α ;

note that in this case, if f_0 ≠ 1, f_0^α is usually not rational. In any case, we are always reduced to computing (1 + g(t))^α, and since:

    (1 + g(t))^α = Σ_{k=0}^{∞} C(α, k) g(t)^k        (3.12.1)

if the coefficients in g(t) are rational numbers, the coefficients in (1 + g(t))^α are rational as well, provided α ∈ Q. These considerations are to be remembered if the operation is realized in some special environment, such as the one described in the previous section.

Whatever α ∈ R is, the exponents involved in the right-hand member of (3.12.1) are all positive integers. Therefore, powers can be realized as successive multiplications, i.e., Cauchy products. This gives a straightforward method for performing (1 + g(t))^α, but it is easily seen to take a time in the order of O(r³), since it requires r products, each executed in time O(r²).
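The two operations translate almost literally into Python (the names are ours; Python's fractions.Fraction can play the role of the reduced couple (m, n) described in the previous section):

```python
from fractions import Fraction   # exact rational coefficients

def ps_add(a, b):
    # componentwise sum; keeps r = min(n, m) + 1 components: O(r)
    return [x + y for x, y in zip(a, b)]

def ps_mul(a, b):
    # Cauchy product, truncated at r = min(n + p_b, m + p_a): O(r^2)
    pa = next((i for i, x in enumerate(a) if x), len(a) - 1)
    pb = next((i for i, x in enumerate(b) if x), len(b) - 1)
    r = min(len(a) - 1 + pb, len(b) - 1 + pa)
    return [sum(a[j] * b[k - j]
                for j in range(k + 1) if j < len(a) and k - j < len(b))
            for k in range(r + 1)]
```

For instance, squaring the truncated geometric series (1, 1, 1, 1) gives (1, 2, 3, 4), the first coefficients of 1/(1 − t)².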
[Figure 3.2: The tree for a formal power series; a right-leaning chain of + nodes whose left branches carry the products a_0 ∗ t^0, a_1 ∗ t^1, ..., a_n ∗ t^n, closed by a node O(t^n)]

Fortunately, however, J. C. P. Miller has devised an algorithm that performs (1 + g(t))^α in time O(r²). In fact, let us write h(t) = a(t)^α, where a(t) is any formal power series with a_0 = 1. By differentiating, we obtain h′(t) = α a(t)^{α−1} a′(t) or, multiplying everything by a(t), a(t) h′(t) = α h(t) a′(t). Therefore, by extracting the coefficient of t^{n−1} we find:

    Σ_{k=0}^{n−1} a_k (n − k) h_{n−k} = α Σ_{k=0}^{n−1} (k + 1) a_{k+1} h_{n−k−1}.

We now isolate the term with k = 0 in the left-hand member and the term having k = n − 1 in the right-hand member (a_0 = 1 by hypothesis):

    n h_n + Σ_{k=1}^{n−1} a_k (n − k) h_{n−k} = α n a_n + α Σ_{k=1}^{n−1} k a_k h_{n−k}

(in the last sum we performed the change of variable k → k − 1, in order to have the same indices as in the left-hand member). We now have an expression for h_n depending only on (a_1, a_2, ..., a_n) and (h_1, h_2, ..., h_{n−1}):

    h_n = α a_n + (1/n) Σ_{k=1}^{n−1} ( (α + 1)k − n ) a_k h_{n−k} = α a_n + Σ_{k=1}^{n−1} ( (α + 1)k/n − 1 ) a_k h_{n−k}.

The computation is now straightforward. We begin by setting h_0 = 1, and then we successively compute h_1, h_2, ..., h_r (r = n, if n is the number of terms in (a_1, a_2, ..., a_n)). The evaluation of h_k requires a number of operations in the order of O(k), and therefore the whole procedure works in time O(r²), as desired.

The inverse of a series, i.e., (1 + g(t))^{−1}, is obtained by setting α = −1. It is worth noting that in this case the previous formula becomes:

    h_k = −a_k − Σ_{j=1}^{k−1} a_j h_{k−j} = −Σ_{j=1}^{k} a_j h_{k−j}        (h_0 = 1)

and can be used to prove properties of the inverse of a power series. As a simple example, the reader can show that the coefficients in (1 − t)^{−1} are all 1.
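Miller's recurrence, with the inverse as its α = −1 special case, can be sketched as follows (Python, our naming; exact rational arithmetic keeps rational input producing rational output):

```python
from fractions import Fraction

def miller_pow(a, alpha):
    """Coefficients of (a_0 + a_1 t + ...)**alpha with a[0] == 1,
    computed by Miller's recurrence in O(r^2) operations."""
    alpha = Fraction(alpha)
    a = [Fraction(x) for x in a]
    h = [Fraction(1)]                       # h_0 = 1
    for n in range(1, len(a)):
        s = alpha * a[n]                    # h_n = alpha*a_n + (1/n)*...
        for k in range(1, n):
            s += ((alpha + 1) * k - n) * a[k] * h[n - k] / n
        h.append(s)
    return h
```

Running it on (1, −1, 0, 0, ...) with α = −1 reproduces the all-ones coefficients of (1 − t)^{−1} mentioned above; α = 1/2 on 1 + t gives the binomial-series coefficients of √(1 + t).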
3.13 Logarithm and exponential

The idea of Miller can be applied to other operations on formal power series. In the present section we wish to use it to perform the (natural) logarithm and the exponentiation of a series. Let us begin with the logarithm and try to compute ln(1 + g(t)). As we know, there is a direct way to perform this operation, i.e.:

    ln(1 + g(t)) = Σ_{k=1}^{∞} ( (−1)^{k+1}/k ) g(t)^k

and this formula only requires a series of successive products. As for the operation of raising to a power, the procedure needs a time in the order of O(r³), and it is worth considering an alternative approach. In fact, if we set h(t) = ln(1 + g(t)), by differentiating we obtain h′(t) = g′(t)/(1 + g(t)), or h′(t) = g′(t) − h′(t) g(t). We can now extract the coefficient of t^{k−1} and obtain:

    k h_k = k g_k − Σ_{j=0}^{k−1} (k − j) h_{k−j} g_j.

However, g_0 = 0 by hypothesis, and therefore we have an expression relating h_k to (g_1, g_2, ..., g_k) and to (h_1, h_2, ..., h_{k−1}):

    h_k = g_k − (1/k) Σ_{j=1}^{k−1} (k − j) h_{k−j} g_j = g_k − Σ_{j=1}^{k−1} (1 − j/k) h_{k−j} g_j.

A program to perform the logarithm of a formal power series 1 + g(t) begins by setting h_0 = 0 and then proceeds by computing h_1, h_2, ..., h_r, if r is the number of significant terms in g(t). The total time is clearly in the order of O(r²).

A similar technique can be applied to the computation of exp(g(t)), provided that g(t) ∉ F_0. If g(t) ∈ F_0, i.e., g(t) = g_0 + g_1 t + g_2 t² + ···, we have exp(g_0 + g_1 t + g_2 t² + ···) = e^{g_0} exp(g_1 t + g_2 t² + ···). In this way we are reduced to the previous case, but we no longer have rational coefficients when g(t) ∈ Q[[t]]. By differentiating the identity h(t) = exp(g(t)) we obtain h′(t) = g′(t) exp(g(t)) = g′(t) h(t). We extract the coefficient of t^{k−1}:

    k h_k = Σ_{j=0}^{k−1} (j + 1) g_{j+1} h_{k−j−1} = Σ_{j=1}^{k} j g_j h_{k−j}.

This formula allows us to compute h_k in terms of (g_1, g_2, ..., g_k) and (h_0 = 1, h_1, h_2, ..., h_{k−1}). A program performing exponentiation can easily be written by setting h_0 = 1 and successively evaluating h_1, h_2, ..., h_r, if r is the number of significant terms in g(t). The time complexity is obviously O(r²).

Unfortunately, a similar trick does not work for series composition. To compute f(g(t)), when g(t) ∉ F_0, we have to resort to the defining formula:

    f(g(t)) = Σ_{k=0}^{∞} f_k g(t)^k.

This requires the successive computation of the integer powers of g(t), which can be performed by repeated applications of the Cauchy product. The execution time is in the order of O(r³), if r is the minimum of the numbers of significant terms in f(t) and g(t).

We conclude this section by sketching the obvious algorithms for the differentiation and integration of a formal power series f(t) = f_0 + f_1 t + f_2 t² + f_3 t³ + ···. If h(t) = f′(t), we have:

    h_k = (k + 1) f_{k+1}

and therefore the number of significant terms is reduced by 1. Conversely, if h(t) = ∫_0^t f(τ) dτ, we have:

    h_k = (1/k) f_{k−1}

and h_0 = 0; consequently, the number of significant terms is increased by 1.
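Both recurrences, together with the initial values h_0 = 0 (logarithm) and h_0 = 1 (exponential), can be sketched as follows (Python, our naming; g must have g_0 = 0):

```python
from fractions import Fraction

def ps_log1p(g):
    # h = ln(1 + g(t)), g[0] == 0:
    # h_k = g_k - sum_{j=1}^{k-1} (1 - j/k) h_{k-j} g_j
    h = [Fraction(0)]
    for k in range(1, len(g)):
        s = Fraction(g[k])
        for j in range(1, k):
            s -= (1 - Fraction(j, k)) * h[k - j] * g[j]
        h.append(s)
    return h

def ps_exp(g):
    # h = exp(g(t)), g[0] == 0:  k h_k = sum_{j=1}^{k} j g_j h_{k-j}
    h = [Fraction(1)]
    for k in range(1, len(g)):
        h.append(sum(j * Fraction(g[j]) * h[k - j]
                     for j in range(1, k + 1)) / k)
    return h
```

As a quick sanity check, ps_exp on the coefficients of t returns the truncated coefficients of e^t, and the two routines are mutually inverse up to the truncation order.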
Chapter 4

Generating Functions

4.1 General Rules

Let us consider a sequence of numbers F = (f_0, f_1, f_2, ...) = (f_k)_{k∈N}; the (ordinary) generating function for the sequence F is defined as f(t) = f_0 + f_1 t + f_2 t² + ···, where the indeterminate t is arbitrary. Given the sequence (f_k)_{k∈N}, we introduce the generating function operator G, which, applied to (f_k)_{k∈N}, produces the ordinary generating function for the sequence, i.e., G(f_k)_{k∈N} = f(t). In this expression t is a bound variable, and a more accurate notation would be G_t(f_k)_{k∈N} = f(t). This notation is essential when (f_k)_{k∈N} depends on some parameter or when we consider multivariate generating functions. In the latter case, for example, we should write G_{t,w}(f_{n,k})_{n,k∈N} = f(t, w) to indicate the fact that f_{n,k} in the double sequence becomes the coefficient of t^n w^k in the function f(t, w). However, whenever no ambiguity can arise, we will use the notation G(f_k) = f(t), understanding also the binding for the variable k. For the sake of completeness, we also define the exponential generating function of the sequence (f_0, f_1, f_2, ...) as:

    E(f_k) = G( f_k/k! ) = Σ_{k=0}^{∞} f_k t^k / k!.

The operator G is clearly linear. The function f(t) can be shifted or differentiated; two functions f(t) and g(t) can be multiplied and composed. This leads to the properties of the operator G listed in Table 4.1. Note that formula (G5) requires g_0 = 0. The first five formulas are easily verified by using the intended interpretation of the operator G; the last formula can be proved by means of the LIF, in the form relative to the composition F(w(t)). In fact we have:

    [t^n] F(t) φ(t)^n = [t^{n−1}] ( F(t)/t ) φ(t)^n = n [t^n] [ ∫ ( F(y)/y ) dy | y = w(t) ];

in the last passage we applied backwards the formula:

    [t^n] F(w(t)) = (1/n) [t^{n−1}] F′(t) φ(t)^n        (w = tφ(w))

and therefore w = w(t) ∈ F_1 is the unique solution of the functional equation w = tφ(w). By now applying the rule of differentiation for the "coefficient of" operator, we can go on:

    [t^n] F(t) φ(t)^n = [t^{n−1}] (d/dt) [ ∫ ( F(y)/y ) dy | y = w(t) ] = [t^{n−1}] [ ( F(w)/w )( dw/dt ) | w = tφ(w) ].

We have applied the chain rule for differentiation, and from w = tφ(w) we have:

    dw/dt = φ(w) + t [ dφ/dw | w = tφ(w) ] ( dw/dt ).

We can therefore compute the derivative of w(t):

    dw/dt = [ φ(w)/( 1 − tφ′(w) ) | w = tφ(w) ]

where φ′(w) denotes the derivative of φ(w) with respect to w. We can substitute this expression in the formula above and observe that w/φ(w) = t can be taken outside of the substitution symbol:

    [t^n] F(t) φ(t)^n = [t^{n−1}] [ ( F(w)/w ) · φ(w)/( 1 − tφ′(w) ) | w = tφ(w) ]
                     = [t^{n−1}] (1/t) [ F(w)/( 1 − tφ′(w) ) | w = tφ(w) ]
                     = [t^n] [ F(w)/( 1 − tφ′(w) ) | w = tφ(w) ]

which is our diagonalization rule (G6).
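Rule (G6) lends itself to a numerical check. The sketch below (our code) takes F(t) = 1 and φ(t) = 1 + αt + βt²; the closed form of the resulting diagonal generating function, 1/√(1 − 2αt + (α² − 4β)t²), is anticipated here from Section 4.6, and its coefficients are produced by the recurrence (n + 1) f_{n+1} = α(2n + 1) f_n − (α² − 4β) n f_{n−1}, which we obtain (our derivation) by differentiating the closed form:

```python
from fractions import Fraction

def diag_seq(alpha, beta, N):
    # c_n = [t^n] (1 + alpha*t + beta*t^2)^n, by direct expansion
    out = []
    for n in range(N):
        p = [1]
        for _ in range(n):                 # multiply by 1 + alpha*t + beta*t^2
            q = [0] * (len(p) + 2)
            for i, x in enumerate(p):
                q[i] += x
                q[i + 1] += alpha * x
                q[i + 2] += beta * x
            p = q
        out.append(p[n])
    return out

def gf_seq(alpha, beta, N):
    # coefficients of 1/sqrt(1 - 2*alpha*t + (alpha^2 - 4*beta)*t^2)
    a, b = alpha, alpha * alpha - 4 * beta
    f = [Fraction(1)]
    for n in range(N - 1):
        prev = f[n - 1] if n >= 1 else 0
        f.append((a * (2 * n + 1) * f[n] - b * n * prev) / (n + 1))
    return f
```

With α = 2, β = 1 both routines produce the central binomial coefficients C(2n, n); with α = β = 1 they produce the central trinomial coefficients of Section 4.6.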
The name "diagonalization" is due to the fact that, if we imagine the coefficients of F(t)φ(t)^n as constituting row n of an infinite matrix, then the quantities [t^n] F(t) φ(t)^n are just the elements on the main diagonal of this array.

    linearity         G(α f_k + β g_k) = α G(f_k) + β G(g_k)                        (G1)
    shifting          G(f_{k+1}) = ( G(f_k) − f_0 ) / t                             (G2)
    differentiation   G(k f_k) = t D G(f_k)                                         (G3)
    convolution       G( Σ_{k=0}^{n} f_k g_{n−k} ) = G(f_k) · G(g_k)               (G4)
    composition       Σ_{n=0}^{∞} f_n ( G(g_k) )^n = G(f_k) ∘ G(g_k)               (G5)
    diagonalisation   G( [t^n] F(t) φ(t)^n ) = [ F(w)/( 1 − tφ′(w) ) | w = tφ(w) ] (G6)

    Table 4.1: The rules for the generating function operator

The rules (G1)−(G6) can also be assumed as axioms of a theory of generating functions and used to derive general theorems as well as specific functions for particular sequences. In the next sections we will prove a number of properties of the generating function operator. The proofs rely on the following fundamental principle of identity:

Given two sequences (f_k)_{k∈N} and (g_k)_{k∈N}, then G(f_k) = G(g_k) if and only if f_k = g_k for every k ∈ N.

The principle is rather obvious from the very definition of the concept of generating function; however, it is important, because it states the condition under which we can pass from an identity about elements to the corresponding identity about generating functions. If the two sequences disagree in even a single element (e.g., the first one), we cannot infer the equality of the generating functions.

4.2 Some Theorems on Generating Functions

We are now going to prove a series of properties of generating functions.

Theorem 4.2.1 Let f(t) = G(f_k) be the generating function of the sequence (f_k)_{k∈N}; then:

    G(f_{k+2}) = ( G(f_k) − f_0 − f_1 t ) / t²        (4.2.1)

Proof: Let g_k = f_{k+1}; by (G2), G(g_k) = ( G(f_k) − f_0 )/t. Since g_0 = f_1, we have:

    G(f_{k+2}) = G(g_{k+1}) = ( G(g_k) − g_0 )/t = ( G(f_k) − f_0 − f_1 t )/t².

By mathematical induction this result can be generalized to:

Theorem 4.2.2 Let f(t) = G(f_k) be as above; then:

    G(f_{k+j}) = ( G(f_k) − f_0 − f_1 t − ··· − f_{j−1} t^{j−1} ) / t^j        (4.2.2)

If we consider right instead of left shifting, we have to be more careful:

Theorem 4.2.3 Let f(t) = G(f_k) be as above; then:

    G(f_{k−j}) = t^j G(f_k)        (4.2.3)

Proof: We have G(f_k) = G(f_{(k−1)+1}) = t^{−1} ( G(f_{k−1}) − f_{−1} ), where f_{−1} is the coefficient of t^{−1} in f(t). If f(t) ∈ F, then f_{−1} = 0 and G(f_{k−1}) = t G(f_k). The theorem then follows by mathematical induction.

Property (G3) can be generalized in several ways:

Theorem 4.2.4 Let f(t) = G(f_k) be as above; then:

    G( (k + 1) f_{k+1} ) = D G(f_k)        (4.2.4)

Proof: If we set g_k = k f_k, we obtain, by (G2):

    G( (k + 1) f_{k+1} ) = G(g_{k+1}) = t^{−1} ( G(k f_k) − 0·f_0 ) = t^{−1} t D G(f_k) = D G(f_k).

Theorem 4.2.5 Let f(t) = G(f_k) be as above; then:

    G(k² f_k) = t D G(f_k) + t² D² G(f_k)        (4.2.5)
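Theorem 4.2.5 can be verified directly on truncated coefficient lists; in the Python sketch below (our code), D is the derivative operator and shift is multiplication by t^j:

```python
def D(f):
    # derivative of a coefficient list: (Df)_k = (k+1) f_{k+1}
    return [(k + 1) * f[k + 1] for k in range(len(f) - 1)]

def shift(f, j):
    # multiplication by t^j moves the coefficients j places to the right
    return [0] * j + list(f)

f = [3, 1, 4, 1, 5, 9, 2, 6]          # an arbitrary sample sequence

# Theorem 4.2.5: t*D G(f) + t^2*D^2 G(f) generates the sequence k^2 f_k
lhs = [x + y for x, y in zip(shift(D(f), 1), shift(D(D(f)), 2))]
```

The k-th coefficient of t·Df is k f_k and that of t²·D²f is k(k − 1) f_k, so the sum is k² f_k, exactly as the theorem states.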
This can be further generalized:

Theorem 4.2.6 Let f(t) = G(f_k) be as above; then:

    G(k^j f_k) = S_j(tD) G(f_k)        (4.2.6)

where S_j(w) = Σ_{r=1}^{j} S(j, r) w^r, with S(j, r) a Stirling number of the second kind, is the jth Stirling polynomial of the second kind (see Section 2.11).

Proof: Formula (4.2.6) is to be understood in the operator sense; so, for example, S_3(w) being w + 3w² + w³, we have:

    G(k³ f_k) = t D G(f_k) + 3 t² D² G(f_k) + t³ D³ G(f_k).

The proof proceeds by induction, as (G3) and (4.2.5) are the first two instances. Now:

    G(k^{j+1} f_k) = G( k (k^j f_k) ) = t D G(k^j f_k)

that is:

    S_{j+1}(tD) = t D S_j(tD) = t D Σ_{r=1}^{j} S(j, r) t^r D^r = Σ_{r=1}^{j} S(j, r) r t^r D^r + Σ_{r=1}^{j} S(j, r) t^{r+1} D^{r+1}.

By equating like coefficients we find S(j + 1, r) = r S(j, r) + S(j, r − 1), which is the classical recurrence for the Stirling numbers of the second kind. Since the initial conditions also coincide, we can conclude that the S(j, r) are indeed the Stirling numbers of the second kind.

For the falling factorial k^(r) = k(k − 1) ··· (k − r + 1) we have a simpler formula, the proof of which is immediate:

Theorem 4.2.7 Let f(t) = G(f_k) be as above; then:

    G( k^(r) f_k ) = t^r D^r G(f_k)        (4.2.7)

Let us now come to integration:

Theorem 4.2.8 Let f(t) = G(f_k) be as above, and let us define g_k = f_k/k for all k ≠ 0 and g_0 = 0; then:

    G( (1/k) f_k ) = G(g_k) = ∫_0^t ( G(f_k) − f_0 ) dz/z        (4.2.8)

Proof: Clearly k g_k = f_k, except for k = 0. Hence we have G(k g_k) = G(f_k) − f_0. By using (G3), we find t D G(g_k) = G(f_k) − f_0, from which (4.2.8) follows by integration and the condition g_0 = 0.

A more classical formula is:

Theorem 4.2.9 Let f(t) = G(f_k) be as above or, equivalently, let f(t) be a f.p.s. but not a f.L.s.; then:

    G( (1/(k+1)) f_k ) = (1/t) ∫_0^t G(f_k) dz = (1/t) ∫_0^t f(z) dz        (4.2.9)

Proof: Let us consider the sequence (g_k)_{k∈N}, where g_{k+1} = f_k and g_0 = 0. So we have G(g_{k+1}) = G(f_k) = t^{−1} ( G(g_k) − g_0 ). Finally:

    G( (1/(k+1)) f_k ) = G( (1/(k+1)) g_{k+1} ) = (1/t) G( (1/k) g_k ) = (1/t) ∫_0^t ( G(g_k) − g_0 ) dz/z = (1/t) ∫_0^t G(f_k) dz.

In the following theorems, f(t) will always denote the generating function G(f_k).

Theorem 4.2.10 Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

    G(p^k f_k) = f(pt)        (4.2.10)

Proof: By setting g(t) = pt in (G5) we have:

    f(pt) = Σ_{n=0}^{∞} f_n (pt)^n = Σ_{n=0}^{∞} p^n f_n t^n = G(p^k f_k).

In particular, for p = −1 we have G( (−1)^k f_k ) = f(−t).

4.3 More advanced results

The results obtained till now can be considered as simple generalizations of the axioms. They are very useful and will be used in many circumstances. However, we can also obtain more advanced results, concerning sequences derived from a given sequence by manipulating its elements in various ways. For example, let us begin by proving the well-known bisection formulas:

Theorem 4.3.1 Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

    G(f_{2k}) = ( f(√t) + f(−√t) ) / 2        (4.3.1)
    G(f_{2k+1}) = ( f(√t) − f(−√t) ) / (2√t)        (4.3.2)

where √t is a symbol with the property that (√t)² = t.

Proof: By (G5) we have:

    ( f(√t) + f(−√t) ) / 2 = ( Σ_{n=0}^{∞} f_n (√t)^n + Σ_{n=0}^{∞} f_n (−√t)^n ) / 2 = Σ_{n=0}^{∞} f_n ( (√t)^n + (−√t)^n ) / 2.

For n odd, (√t)^n + (−√t)^n = 0; hence, by setting n = 2k, we have:

    Σ_{k=0}^{∞} f_{2k} ( t^k + t^k )/2 = Σ_{k=0}^{∞} f_{2k} t^k = G(f_{2k}).
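On coefficient lists, the bisection formulas simply select the even-indexed and odd-indexed entries. As a check (our code), the bisections of f(t) = 1/(1 − 2t), with f_k = 2^k, must give the coefficients 4^k of 1/(1 − 4t) and 2·4^k of 2/(1 − 4t):

```python
def bisect_even(f):
    # G(f_{2k}) = (f(s) + f(-s))/2 with s^2 = t: odd powers cancel
    # and the surviving even powers s^{2k} read as t^k
    return f[0::2]

def bisect_odd(f):
    # G(f_{2k+1}) = (f(s) - f(-s))/(2s)
    return f[1::2]

f = [2 ** k for k in range(10)]       # f(t) = 1/(1 - 2t)
```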
The proof of the second formula is analogous.

The following proof is typical and introduces the use of ordinary differential equations in the calculus of generating functions:

Theorem 4.3.2 Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

    G( f_k/(2k + 1) ) = ( 1/(2√t) ) ∫_0^t ( G(f_k)/√z ) dz        (4.3.3)

Proof: Let us set g_k = f_k/(2k + 1), or 2k g_k + g_k = f_k. If g(t) = G(g_k), by applying (G3) we have the differential equation 2t g′(t) + g(t) = f(t), whose solution with g(0) = f_0 is just formula (4.3.3).

We conclude with two general theorems on sums:

Theorem 4.3.3 (Partial Sum Theorem) Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

    G( Σ_{k=0}^{n} f_k ) = ( 1/(1 − t) ) G(f_k)        (4.3.4)

Proof: If we set s_n = Σ_{k=0}^{n} f_k, then we have s_{n+1} = s_n + f_{n+1} for every n ∈ N, and we can apply the operator G to both members: G(s_{n+1}) = G(s_n) + G(f_{n+1}), i.e.:

    ( G(s_n) − s_0 )/t = G(s_n) + ( G(f_n) − f_0 )/t.

Since s_0 = f_0, we find G(s_n) = t G(s_n) + G(f_n), and from this (4.3.4) follows directly.

The following result is known as the Euler transformation:

Theorem 4.3.4 Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

    G( Σ_{k=0}^{n} C(n, k) f_k ) = ( 1/(1 − t) ) f( t/(1 − t) )        (4.3.5)

Proof: By well-known properties of binomial coefficients we have:

    C(n, k) = C(n, n − k) = (−1)^{n−k} C(−k − 1, n − k)

and this is the coefficient of t^{n−k} in (1 − t)^{−k−1}. We now observe that the sum in (4.3.5) can be extended to infinity, and by (G5) we have:

    Σ_{k=0}^{n} C(n, k) f_k = Σ_{k=0}^{n} (−1)^{n−k} C(−k − 1, n − k) f_k
        = Σ_{k=0}^{∞} [t^{n−k}] (1 − t)^{−k−1} [y^k] f(y)
        = [t^n] ( 1/(1 − t) ) Σ_{k=0}^{∞} ( [y^k] f(y) ) ( t/(1 − t) )^k
        = [t^n] ( 1/(1 − t) ) f( t/(1 − t) ).

Since the last expression does not depend on n, it represents the generating function of the sum.

We observe explicitly that by (K4) we have Σ_{k=0}^{n} C(n, k) f_k = [t^n] (1 + t)^n f(t), but this expression does not represent a generating function, because it depends on n. The Euler transformation can be generalized in several ways, as we shall see when dealing with Riordan arrays.

4.4 Common Generating Functions

The aim of the present section is to derive the most common generating functions by using the apparatus of the previous sections. As a first example, let us consider the constant sequence F = (1, 1, 1, ...), for which we have f_{k+1} = f_k for every k ∈ N. By applying the principle of identity, we find G(f_{k+1}) = G(f_k), that is, by (G2): G(f_k) − f_0 = t G(f_k). Since f_0 = 1, we immediately have:

    G(1) = 1/(1 − t).

For any constant sequence F = (c, c, c, ...), by (G1) we find that G(c) = c (1 − t)^{−1}. Similarly, by using the basic rules and the theorems of the previous sections we have:

    G(n) = G(n · 1) = t D ( 1/(1 − t) ) = t/(1 − t)²
    G(n²) = t D G(n) = t D ( t/(1 − t)² ) = ( t + t² )/(1 − t)³
    G((−1)^n) = G(1) ∘ (−t) = 1/(1 + t)
    G(1/n) = G( (1/n) · 1 ) = ∫_0^t ( 1/(1 − z) − 1 ) dz/z = ∫_0^t dz/(1 − z) = ln( 1/(1 − t) )
    G(H_n) = G( Σ_{k=1}^{n} 1/k ) = ( 1/(1 − t) ) G(1/n) = ( 1/(1 − t) ) ln( 1/(1 − t) )

where H_n is the nth harmonic number.
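The formula G(H_n) = (1/(1 − t)) ln(1/(1 − t)) can be checked with exact arithmetic: by the Partial Sum Theorem, multiplying by 1/(1 − t) takes prefix sums of the coefficients 1/k of ln(1/(1 − t)) (a Python sketch, our code):

```python
from fractions import Fraction

N = 12
# coefficients of ln(1/(1 - t)) = sum_{k>=1} t^k / k
log_coeffs = [Fraction(0)] + [Fraction(1, k) for k in range(1, N)]

# multiplication by 1/(1 - t): running prefix sums of the coefficients
H, s = [], Fraction(0)
for c in log_coeffs:
    s += c
    H.append(s)       # H[n] should be the nth harmonic number
```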
Other generating functions can be obtained from the previous formulas:

    G(n H_n) = t D ( ( 1/(1 − t) ) ln( 1/(1 − t) ) ) = ( t/(1 − t)² ) ( ln( 1/(1 − t) ) + 1 )
    G( (1/(n+1)) H_n ) = (1/t) ∫_0^t ( 1/(1 − z) ) ln( 1/(1 − z) ) dz = ( 1/(2t) ) ( ln( 1/(1 − t) ) )²
    G(δ_{0,n}) = G(1) − t G(1) = (1 − t)/(1 − t) = 1

where δ_{n,m} is Kronecker's delta. This last relation can be readily generalized to G(δ_{n,m}) = t^m.

An interesting example is given by G( 1/(n(n+1)) ). Since 1/(n(n+1)) = 1/n − 1/(n+1), it is tempting to apply the operator G to both members. However, this relation is not valid for n = 0. In order to apply the principle of identity, we must define:

    1/(n(n + 1)) = 1/n − 1/(n + 1) + δ_{n,0}

in accordance with the fact that the first element of the sequence is zero. We thus arrive at the correct generating function:

    G( 1/(n(n + 1)) ) = 1 − ( (1 − t)/t ) ln( 1/(1 − t) ).

Let us now come to binomial coefficients. In order to find G( C(p, k) ) we observe that, from the definition:

    C(p, k + 1) = p(p − 1) ··· (p − k + 1)(p − k) / (k + 1)! = ( (p − k)/(k + 1) ) C(p, k).

Hence, by denoting C(p, k) by f_k, we have G( (k + 1) f_{k+1} ) = G( (p − k) f_k ) = p G(f_k) − G(k f_k). By applying (4.2.4) and (G3) we obtain D G(f_k) = p G(f_k) − t D G(f_k), i.e., the differential equation f′(t) = p f(t) − t f′(t). By separating the variables and integrating, we find ln f(t) = p ln(1 + t) + c, or f(t) = c (1 + t)^p. For t = 0 we should have f(0) = C(p, 0) = 1, and this implies c = 1. Consequently:

    G( C(p, k) ) = (1 + t)^p        (p ∈ R).

We are now in a position to derive the recurrence relation for binomial coefficients. By using (K1)÷(K5) we easily find:

    C(p, k) = [t^k] (1 + t)^p = [t^k] (1 + t)(1 + t)^{p−1} = [t^k] (1 + t)^{p−1} + [t^{k−1}] (1 + t)^{p−1} = C(p − 1, k) + C(p − 1, k − 1).

By (K4), we have the well-known Vandermonde convolution:

    C(m + p, n) = [t^n] (1 + t)^{m+p} = [t^n] (1 + t)^m (1 + t)^p = Σ_{k=0}^{n} C(m, k) C(p, n − k)

which, for m = p = n, becomes Σ_{k=0}^{n} C(n, k)² = C(2n, n).

We can also find G( C(k, p) ), where p is a parameter. The derivation is purely algebraic and makes use of the generating functions already found and of various properties considered in the previous section:

    G( C(k, p) ) = G( C(k, k − p) ) = G( (−1)^{k−p} C(−p − 1, k − p) ) = G( [t^{k−p}] 1/(1 − t)^{p+1} ) = G( [t^k] t^p/(1 − t)^{p+1} ) = t^p/(1 − t)^{p+1}.

Several generating functions for different forms of binomial coefficients can be found by means of this method. They are summarized as follows, where p and m are two parameters and can also be zero:

    G( C(p, m + k) ) = (1 + t)^p / t^m
    G( C(p + k, m) ) = t^{m−p} / (1 − t)^{m+1}
    G( C(p + k, m + k) ) = 1 / ( t^m (1 − t)^{p+1−m} )

These functions can make sense even when they are f.L.s. and not simply f.p.s.

Finally, we list the following generating functions:

    G( k C(p, k) ) = t D G( C(p, k) ) = t D (1 + t)^p = p t (1 + t)^{p−1}
    G( k² C(p, k) ) = ( t D + t² D² ) G( C(p, k) ) = p t (1 + pt) (1 + t)^{p−2}
    G( (1/(k+1)) C(p, k) ) = (1/t) ∫_0^t G( C(p, k) ) dz = (1/t) [ (1 + z)^{p+1}/(p + 1) ]_0^t = ( (1 + t)^{p+1} − 1 ) / ( (p + 1) t )
44 CHAPTER 4. GENERATING FUNCTIONS
µ µ ¶¶ µ ¶
k tm 1 1 1
G k = tD = = − .
m (1 − t)m+1 (p − 1)t 1 − pt 1 − t
mtm + tm+1 The last relation has been obtained by partial fraction
=
(1 − t)m+2 expansion. By using the operator [tk ] we easily find:
µ µ ¶¶ µµ ¶¶
k
2 2 2 k n
X
G k = (tD + t D )G = 1 1
m m pk = [tn ] =
1 − t 1 − pt
m2 tm + (3m + 1)tm+1 + tm+2 k=0
µ ¶
= 1 1 1
(1 − t)m+3 = [tn+1 ] − =
µ µ ¶¶ Z tµ µµ ¶¶ µ ¶¶ (p − 1) 1 − pt 1 − t
1 k k 0 dz
G = G − = pn+1 − 1
k m 0 m m z =
Z t m−1 p−1
z dz tm
= m+1
= the well-known formula for the sum of a geometric
0 (1 − z) m(1 − t)m
progression. We observe explicitly that the formulas
The last integral can be solved by setting y = (1 − above could have been obtained from formulas of the
z)−1 and is valid for m > 0; for m = 0 it reduces to previous section and the general formula (4.2.10). In
G(1/k) = − ln(1 − t). a similar way we also have:
µµ ¶ ¶
m k
4.5 The Method of Shifting G p = (1 + pt)m
k
µµ ¶ ¶ µµ ¶ ¶
When the elements of a sequence F are given by an k k k k−m m
G p = G p p =
explicit formula, we can try to find the generating m k−m
µµ ¶ ¶
function for F by using the technique of shifting: we m −m − 1 k−m
= p G (−p) =
consider the element fn+1 and try to express it in k−m
terms of fn . This can produce a relation to which pm t m
we apply the principle of identity deriving an equa- = .
(1 − pt)m+1
tion in G(fn ), the solution of which is the generating
function. In practice, we find a recurrence for the As a very simple application of the shifting method,
elements fn ∈ F and then try to solve it by using let us observe that:
the rules (G1) ÷ (G5) and their consequences. It can 1 1 1
happen that the recurrence involves several elements =
(n + 1)! n + 1 n!
in F and/or that the resulting equation is indeed a
differential equation. Whatever the case, the method that is (n+1)fn+1 = fn , where fn = 1/n!. By (4.2.4)

of shifting allows us to find the generating function we have f (t) = f (t) or:
of many sequences. µ ¶
1
G = et
¡ Let 2 us3 consider¢ the geometric sequence
k+1 k
n!
1, p, p , p , . . . ; we have p ¡= k+1pp¢ , ∀k ∈ ¡ k ¢N µ ¶ Z t z
or, by applying the operator G, G p = pG p . 1 e −1
¡ ¡ ¢ ¢ ¡ ¢ G = dz
By (G2) we have t−1 G pk − 1 = pG pk , that is: n · n! 0 z
µ ¶ µ ¶
n 1
¡ ¢ 1 G = tDG =
G pk = (n + 1)! (n + 1)!
1 − pt
1 t tet − et + 1
From this we obtain other generating functions: = tD (e − 1) =
t t
¡ k¢ 1 pt
G kp = tD = By this relation the well-known result follows:
1 − pt (1 − pt)2 n µ t ¶
X k 1 te − et + 1
¡ 2 k¢ pt pt + p2 t2 = [t ]n
=
G k p = tD = (k + 1)! 1−t t
(1 − pt)2 (1 − pt)3 k=0
µ ¶ Z tµ ¶ µ ¶
1 k 1 dz n 1 et − 1
G p = −1 = = [t ] − =
k 0 1 − pz z 1−t t
Z 1
p dz 1 = 1− .
= = ln (n + 1)!
(1 − pz) 1 − pt
à n ! ¡ ¢ ¡2n¢
X 1 1 Let us now observe that 2n+2 = 2(2n+1)
G p k
= = ¡2n¢ n+1 n+1 n ;
1 − t 1 − pt by setting fn = n , we have the recurrence (n +
k=0
4.6. DIAGONALIZATION 45

1)fn+1 = 2(2n + 1)fn . By using (4.2.4), (G1) and 4.6 Diagonalization


(G3) we obtain the differential equation: f ′ (t) =
4tf ′ (t) + 2f (t), the simple solution of which is: The technique of shifting is a rather general method
µµ ¶¶ for obtaining generating functions. It produces first
2n 1 order recurrence relations, which will be more closely
G = √
n 1 − 4t studied in the next sections. Not every sequence can
µ µ ¶¶ Z √ be defined by a first order recurrence relation, and
1 2n 1 t dz 1 − 1 − 4t
G = √ = other methods are often necessary to find out gener-
n+1 n t 0 1 − 4z 2t
µ µ ¶¶ Z t µ ¶ ating functions. Sometimes, the rule of diagonaliza-
1 2n 1 dz tion can be used very conveniently. One of the most
G = √ −1 =
n n 0 1 − 4z z simple examples is how to determine the generating

1 − 1 − 4t function of the central binomial coefficients, without
= 2 ln having to pass through the solution of a differential
µ µ ¶¶ 2t
2n 2t equation. In fact we have:
G n = √ µ ¶
n (1 − 4t) 1 − 4t 2n
µ µ ¶¶ Z t = [tn ](1 + t)2n
1 2n 1 dz n
G = √ p =
2n + 1 n t 0 4z(1 − 4z)
r and (G6) can be applied with F (t) = 1 and φ(t) =
1 4t
= √ arctan . (1 + t)2 . In this case, the function w = w(t) is easily
4t 1 − 4t determined by solving the functional equation w =
t(1+w)2 . By expanding, we find tw2 −(1−2t)w+t = 0
A last group of generating functions is obtained by
¡ ¢−1 or: √
considering fn = 4n 2nn . Since: 1 − t ± 1 − 4t
w = w(t) = .
µ ¶−1 µ ¶−1 2t
2n + 2 2n + 2 n 2n
4n+1 = 4 Since w = w(t) should belong to F 1 , we must elim-
n+1 2n + 1 n inate the solution with the + sign; consequently, we
have: µµ ¶¶
we have the recurrence: (2n + 1)fn+1 = 2(n + 1)fn . 2n
By using the operator G and the rules of Section 4.2, G =
n
the differential equation 2t(1 − t)f ′ (t) − (1 + 2t)f (t) +
· ¯ √ ¸
1 = 0 is derived. The solution is: 1 ¯ 1 − t − 1 − 4t
s à Z r ! = ¯w= =
t
1 − 2t(1 + w) 2t
t (1 − z)3 dz
f (t) = − 1
(1 − t)3 0 z 2z(1 − z) =√
1 − 4t
By
p simplifying and using the change of variable y = as we already know.
z/(1 − z), the integral can be computed without The function φ(t) = (1 + t)2 gives rise to a second
difficulty, and the final result is: degree equation. More in general, let us study the
à µ ¶−1 ! s r sequence:
n 2n t t 1 cn = [tn ](1 + αt + βt2 )n
G 4 = arctan + .
n (1 − t)3 1−t 1−t
and look for its generating function C(t) = G(cn ).
Some immediate consequences are: In this case again we have F (t) = 1 and φ(t) = 1 +
αt + βt2 , and therefore we should solve the functional
à µ ¶−1 ! r
4n 2n 1 t equation w = t(1+αw+βw2 ) or βtw2 −(1−αt)w+t =
G = p arctan 0. This gives:
2n + 1 n t(1 − t) 1−t
à ! à !2 p
µ ¶ −1 r 1 − αt ± (1 − αt)2 − 4βt2
4n 2n t w = w(t) =
G 2
= arctan 2βt
2n n 1−t
and again we have to eliminate the solution with the
and finally: + sign. By performing the necessary computations,
à µ ¶−1 ! r r we find:
1 4n 2n 1−t t · ¯
G = 1− arctan . 1 ¯
2n 2n + 1 n t 1−t C(t) = ¯
1 − t(α + 2βw)
46 CHAPTER 4. GENERATING FUNCTIONS

where w must be replaced by the solution with the − sign:

\[ w = \frac{1-\alpha t-\sqrt{1-2\alpha t+(\alpha^2-4\beta)t^2}}{2\beta t} \]

and this gives:

\[ C(t) = \frac{1}{\sqrt{1-2\alpha t+(\alpha^2-4\beta)t^2}} \]

and for α = 2, β = 1 we obtain again the generating function for the central binomial coefficients.

The coefficients of (1 + t + t²)^n are called trinomial coefficients, in analogy with the binomial coefficients. They constitute an infinite array in which every row has two more elements, different from 0, with respect to the previous row (see Table 4.2). If T_{n,k} = [t^k](1 + t + t²)^n is a trinomial coefficient, by the obvious property (1 + t + t²)^{n+1} = (1 + t + t²)(1 + t + t²)^n we immediately deduce the recurrence relation:

\[ T_{n+1,k+1} = T_{n,k-1}+T_{n,k}+T_{n,k+1} \]

from which the array can be built, once we start from the initial conditions T_{n,0} = 1 and T_{n,2n} = 1, for every n ∈ N.

n\k |  0   1   2   3   4   5   6   7   8
 0  |  1
 1  |  1   1   1
 2  |  1   2   3   2   1
 3  |  1   3   6   7   6   3   1
 4  |  1   4  10  16  19  16  10   4   1

Table 4.2: Trinomial coefficients

The elements T_{n,n}, marked in the table, are called the central trinomial coefficients; their sequence begins:

 n  |  0   1   2   3   4    5    6    7     8
T_n |  1   1   3   7   19   51   141  393   1107

and by the formula above their generating function is:

\[ G(T_{n,n}) = \frac{1}{\sqrt{1-2t-3t^2}} = \frac{1}{\sqrt{(1+t)(1-3t)}}. \]

4.7 Some special generating functions

We wish to determine the generating function of the sequence {0, 1, 1, 1, 2, 2, 2, 2, 2, 3, ...}, that is, the sequence whose generic element is ⌊√k⌋. We can think of it as formed by summing an infinite number of simpler sequences {0, 1, 1, 1, 1, 1, 1, ...}, {0, 0, 0, 0, 1, 1, 1, 1, ...}, the next one with the first 1 in position 9, and so on. The generating functions of these sequences are:

\[ \frac{t}{1-t}\qquad\frac{t^4}{1-t}\qquad\frac{t^9}{1-t}\qquad\frac{t^{16}}{1-t}\qquad\cdots \]

and therefore we obtain:

\[ G\left(\lfloor\sqrt{k}\rfloor\right) = \sum_{k=1}^{\infty}\frac{t^{k^2}}{1-t}. \]

In the same way we obtain analogous generating functions:

\[ G\left(\lfloor\sqrt[r]{k}\rfloor\right) = \sum_{k=1}^{\infty}\frac{t^{k^r}}{1-t}\qquad\qquad G\left(\lfloor\log_r k\rfloor\right) = \sum_{k=1}^{\infty}\frac{t^{r^k}}{1-t} \]

where r is any integer number, or also any real number, if we substitute ⌈k^r⌉ and ⌈r^k⌉ for k^r and r^k, respectively.

These generating functions can be used to find the values of several sums in closed or semi-closed form. Let us begin with the following case, where we use the Euler transformation:

\[ \sum_k\binom{n}{k}\lfloor\sqrt{k}\rfloor(-1)^k = (-1)^n\sum_k\binom{n}{k}(-1)^{n-k}\lfloor\sqrt{k}\rfloor = (-1)^n[t^n]\frac{1}{1+t}\left[\left.\sum_{k=1}^{\infty}\frac{y^{k^2}}{1-y}\;\right|\;y=\frac{t}{1+t}\right] = \]
\[ = (-1)^n[t^n]\sum_{k=1}^{\infty}\frac{t^{k^2}}{(1+t)^{k^2}} = (-1)^n\sum_{k=1}^{\lfloor\sqrt{n}\rfloor}[t^{n-k^2}]\frac{1}{(1+t)^{k^2}} = (-1)^n\sum_{k=1}^{\lfloor\sqrt{n}\rfloor}\binom{-k^2}{n-k^2} = \]
\[ = (-1)^n\sum_{k=1}^{\lfloor\sqrt{n}\rfloor}(-1)^{n-k^2}\binom{n-1}{n-k^2} = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor}(-1)^k\binom{n-1}{n-k^2}. \]

We can think of the last sum as a "semi-closed" form, because the number of terms is dramatically reduced from n to √n, although it remains dependent on n. In the same way we find:

\[ \sum_k\binom{n}{k}\lfloor\log_2 k\rfloor(-1)^k = \sum_{k=1}^{\lfloor\log_2 n\rfloor}\binom{n-1}{n-2^k}. \]

A truly closed formula is found for the following sum:

\[ \sum_{k=1}^{n}\lfloor\sqrt{k}\rfloor = [t^n]\frac{1}{1-t}\sum_{k=1}^{\infty}\frac{t^{k^2}}{1-t} = \]
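The trinomial triangle of Table 4.2 can be reproduced directly from its definition by repeated multiplication by 1 + t + t². The following sketch (written for this note; not from the book) does exactly that and checks the central trinomial coefficients listed above.

```python
def trinomial_row(n):
    # Coefficients of (1 + t + t^2)^n, obtained by repeated convolution
    row = [1]
    for _ in range(n):
        prev = [0, 0] + row + [0, 0]          # pad so row[k-1], row[k-2] exist
        row = [prev[i] + prev[i + 1] + prev[i + 2]
               for i in range(len(prev) - 2)]  # T(n+1,k) = T(n,k-2)+T(n,k-1)+T(n,k)
    return row

assert trinomial_row(4) == [1, 4, 10, 16, 19, 16, 10, 4, 1]
central = [trinomial_row(n)[n] for n in range(9)]
assert central == [1, 1, 3, 7, 19, 51, 141, 393, 1107]
```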
\[ = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor}[t^{n-k^2}]\frac{1}{(1-t)^2} = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor}\binom{-2}{n-k^2}(-1)^{n-k^2} = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor}\binom{n-k^2+1}{n-k^2} = \]
\[ = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor}(n-k^2+1) = (n+1)\lfloor\sqrt{n}\rfloor-\sum_{k=1}^{\lfloor\sqrt{n}\rfloor}k^2. \]

The final value of the sum is therefore:

\[ (n+1)\lfloor\sqrt{n}\rfloor-\frac{\lfloor\sqrt{n}\rfloor}{3}\left(\lfloor\sqrt{n}\rfloor+1\right)\left(\lfloor\sqrt{n}\rfloor+\frac{1}{2}\right) \]

whose asymptotic value is (2/3) n√n. Again, for the analogous sum with ⌊log₂ n⌋, we obtain (see Chapter 1):

\[ \sum_{k=1}^{n}\lfloor\log_2 k\rfloor = (n+1)\lfloor\log_2 n\rfloor-2^{\lfloor\log_2 n\rfloor+1}+2. \]

A somewhat more difficult sum is the following one:

\[ \sum_{k=0}^{n}\binom{n}{k}\lfloor\sqrt{k}\rfloor = [t^n]\frac{1}{1-t}\left[\left.\sum_{k=1}^{\infty}\frac{y^{k^2}}{1-y}\;\right|\;y=\frac{t}{1-t}\right] = [t^n]\sum_{k=1}^{\infty}\frac{t^{k^2}}{(1-t)^{k^2}(1-2t)} = \sum_{k=1}^{\infty}[t^{n-k^2}]\frac{1}{(1-t)^{k^2}(1-2t)}. \]

We can now obtain a semi-closed form for this sum by expanding the generating function into partial fractions:

\[ \frac{1}{(1-t)^{k^2}(1-2t)} = \frac{A}{1-2t}+\frac{B}{(1-t)^{k^2}}+\frac{C}{(1-t)^{k^2-1}}+\frac{D}{(1-t)^{k^2-2}}+\cdots+\frac{X}{1-t}. \]

We can show that A = 2^{k²}, B = −1, C = −2, D = −4, ..., X = −2^{k²−1}; in fact, by substituting these values in the previous expression we get:

\[ \frac{2^{k^2}}{1-2t}-\frac{1}{(1-t)^{k^2}}-\frac{2(1-t)}{(1-t)^{k^2}}-\frac{4(1-t)^2}{(1-t)^{k^2}}-\cdots-\frac{2^{k^2-1}(1-t)^{k^2-1}}{(1-t)^{k^2}} = \]
\[ = \frac{2^{k^2}}{1-2t}-\frac{1}{(1-t)^{k^2}}\,\frac{2^{k^2}(1-t)^{k^2}-1}{2(1-t)-1} = \frac{1}{(1-t)^{k^2}(1-2t)}. \]

Therefore, we conclude:

\[ \sum_{k=0}^{n}\binom{n}{k}\lfloor\sqrt{k}\rfloor = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor}2^{k^2}2^{n-k^2}-\sum_{k=1}^{\lfloor\sqrt{n}\rfloor}\binom{n-1}{n-k^2}-2\sum_{k=1}^{\lfloor\sqrt{n}\rfloor}\binom{n-2}{n-k^2}-\cdots-\sum_{k=1}^{\lfloor\sqrt{n}\rfloor}2^{k^2-1}\binom{n-k^2}{n-k^2} = \]
\[ = \lfloor\sqrt{n}\rfloor 2^n-\sum_{k=1}^{\lfloor\sqrt{n}\rfloor}\left(\binom{n-1}{n-k^2}+2\binom{n-2}{n-k^2}+\cdots+2^{k^2-1}\binom{n-k^2}{n-k^2}\right). \]

We observe that for very large values of n the first term dominates all the others, and therefore the asymptotic value of the sum is ⌊√n⌋2^n.

4.8 Linear recurrences with constant coefficients

If (f_k)_{k∈N} is a sequence, it can be defined by means of a recurrence relation, i.e., a relation relating the generic element f_n to other elements f_k having k < n. Usually, the first elements of the sequence must be given explicitly, in order to allow the computation of the successive values; they constitute the initial conditions, and the sequence is well-defined if and only if every element can be computed by starting with the initial conditions and going on with the other elements by means of the recurrence relation. For example, the constant sequence (1, 1, 1, ...) can be defined by the recurrence relation x_n = x_{n−1} and the initial condition x_0 = 1. By changing the initial conditions, the sequence can radically change; if we consider the same relation x_n = x_{n−1} together with the initial condition x_0 = 2, we obtain the constant sequence {2, 2, 2, ...}.

In general, a recurrence relation can be written f_n = F(f_{n−1}, f_{n−2}, ...); when F depends on all the values f_{n−1}, f_{n−2}, ..., f_1, f_0, then the relation is called a full history recurrence. If F only depends on a fixed number of elements f_{n−1}, f_{n−2}, ..., f_{n−p}, then the relation is called a partial history recurrence and p is called the order of the relation. Besides, if F is linear, we have a linear recurrence. Linear recurrences are surely the most common and important type of recurrence relations; if all the coefficients appearing in F are constant, we have a linear recurrence with
constant coefficients, and if the coefficients are polynomials in n, we have a linear recurrence with polynomial coefficients. As we are now going to see, the method of generating functions allows us to find the solution of any linear recurrence with constant coefficients, in the sense that we find a function f(t) such that [t^n]f(t) = f_n, ∀n ∈ N. For linear recurrences with polynomial coefficients, the same method allows us to find a solution on many occasions, but success is not assured. On the other hand, no method is known that solves all the recurrences of this kind, and surely generating functions are the method giving the highest number of positive results. We will discuss this case in the next section.

The Fibonacci recurrence F_n = F_{n−1} + F_{n−2} is an example of a recurrence relation with constant coefficients. When we have a recurrence of this kind, we begin by expressing it in such a way that the relation is valid for every n ∈ N. In the example of the Fibonacci numbers, this is not the case, because for n = 0 we have F_0 = F_{−1} + F_{−2}, and we do not know the values of the two elements in the r.h.s., which have no combinatorial meaning. However, if we write the recurrence as F_{n+2} = F_{n+1} + F_n, we have fulfilled the requirement. This first step has a great importance, because it allows us to apply the operator G to both members of the relation; this was not possible beforehand because of the principle of identity for generating functions.

The recurrence being linear with constant coefficients:

\[ f_{n+p} = \alpha_1f_{n+p-1}+\alpha_2f_{n+p-2}+\cdots+\alpha_pf_n \]

we can apply the axiom of linearity and obtain the relation:

\[ G(f_{n+p}) = \alpha_1G(f_{n+p-1})+\alpha_2G(f_{n+p-2})+\cdots+\alpha_pG(f_n). \]

By Theorem 4.2.2 we can now express every G(f_{n+p−j}) in terms of f(t) = G(f_n) and obtain a linear relation in f(t), from which an explicit expression for f(t) is immediately obtained. This is the solution of the recurrence relation. We observe explicitly that in writing the expressions for G(f_{n+p−j}) we make use of the initial conditions for the sequence.

Let us go on with the example of the Fibonacci sequence (F_k)_{k∈N}. We have:

\[ G(F_{n+2}) = G(F_{n+1})+G(F_n) \]

and by setting F(t) = G(F_n) we find:

\[ \frac{F(t)-F_0-F_1t}{t^2} = \frac{F(t)-F_0}{t}+F(t). \]

Because we know that F_0 = 0, F_1 = 1, we have:

\[ F(t)-t = tF(t)+t^2F(t) \]

and by solving in F(t) we have the explicit expression:

\[ F(t) = \frac{t}{1-t-t^2}. \]

This is the generating function for the Fibonacci numbers. We can now find an explicit expression for F_n in the following way. The denominator of F(t) can be written 1 − t − t² = (1 − φt)(1 − φ̂t), where:

\[ \varphi = \frac{1+\sqrt 5}{2}\approx 1.618033989\qquad\qquad\hat\varphi = \frac{1-\sqrt 5}{2}\approx -0.618033989. \]

The constant φ is known as the golden ratio; its reciprocal is 1/φ = φ − 1 ≈ 0.618033989. By applying the method of partial fraction expansion we find:

\[ F(t) = \frac{t}{(1-\varphi t)(1-\hat\varphi t)} = \frac{A}{1-\varphi t}+\frac{B}{1-\hat\varphi t} = \frac{A-A\hat\varphi t+B-B\varphi t}{1-t-t^2}. \]

We determine the two constants A and B by equating the coefficients in the first and last expressions for F(t):

\[ \begin{cases}A+B = 0\\ -A\hat\varphi-B\varphi = 1\end{cases}\qquad\qquad\begin{cases}A = 1/(\varphi-\hat\varphi) = 1/\sqrt 5\\ B = -A = -1/\sqrt 5.\end{cases} \]

The value of F_n is now obtained by extracting the coefficient of t^n:

\[ F_n = [t^n]F(t) = [t^n]\frac{1}{\sqrt 5}\left(\frac{1}{1-\varphi t}-\frac{1}{1-\hat\varphi t}\right) = \frac{1}{\sqrt 5}\left([t^n]\frac{1}{1-\varphi t}-[t^n]\frac{1}{1-\hat\varphi t}\right) = \frac{\varphi^n-\hat\varphi^n}{\sqrt 5}. \]

This formula allows us to compute F_n in a time independent of n, because φ^n = exp(n ln φ), and shows that F_n grows exponentially. In fact, since |φ̂| < 1, the quantity φ̂^n approaches 0 very rapidly and we have F_n = O(φ^n). In reality, F_n is an integer and therefore we can compute it by finding the integer number closest to φ^n/√5; consequently:

\[ F_n = \mathrm{round}\left(\frac{\varphi^n}{\sqrt 5}\right). \]
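Binet's formula and the rounding trick are easy to test against the recurrence itself; the sketch below (not from the book) does so for the first forty Fibonacci numbers, a range where double precision is comfortably accurate.

```python
from math import sqrt

phi = (1 + sqrt(5)) / 2   # the golden ratio

def fib(n):
    # F_0 = 0, F_1 = 1, F_{n+2} = F_{n+1} + F_n
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# F_n = round(phi^n / sqrt(5)), valid while floating point suffices
for n in range(40):
    assert fib(n) == round(phi**n / sqrt(5))
```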
4.9 Linear recurrences with polynomial coefficients

When a recurrence relation has polynomial coefficients, the method of generating functions does not assure a solution, but no other method is available to solve those recurrences which cannot be solved by a generating function approach. Usually, the rule of differentiation introduces a derivative in the relation for the generating function, and a differential equation has to be solved. This is the actual problem of this approach, because the main difficulty consists just in dealing with the differential equation. We have already seen some examples when we studied the method of shifting, but here we wish to present a case arising from an actual combinatorial problem, and in the next section we will see a very important example taken from the analysis of algorithms.

When we studied permutations, we introduced the concept of an involution, i.e., a permutation π ∈ P_n such that π² = (1), and for the number I_n of involutions in P_n we found the recurrence relation:

\[ I_n = I_{n-1}+(n-1)I_{n-2} \]

which has polynomial coefficients. The number of involutions grows very fast and it can be a good idea to consider the quantity i_n = I_n/n!. Therefore, let us begin by changing the recurrence in such a way that the principle of identity can be applied, and then divide everything by (n + 2)!:

\[ I_{n+2} = I_{n+1}+(n+1)I_n\qquad\qquad\frac{I_{n+2}}{(n+2)!} = \frac{1}{n+2}\,\frac{I_{n+1}}{(n+1)!}+\frac{1}{n+2}\,\frac{I_n}{n!}. \]

The recurrence relation for i_n is:

\[ (n+2)i_{n+2} = i_{n+1}+i_n \]

and we can pass to generating functions. G((n + 2)i_{n+2}) can be seen as the shifting of G((n + 1)i_{n+1}) = i′(t), if with i(t) we denote the generating function of (i_k)_{k∈N} = (I_k/k!)_{k∈N}, which is therefore the exponential generating function of (I_k)_{k∈N}. We have:

\[ \frac{i'(t)-1}{t} = \frac{i(t)-1}{t}+i(t) \]

because of the initial conditions i_0 = i_1 = 1, and so:

\[ i'(t) = (1+t)i(t). \]

This is a simple differential equation with separable variables, and by solving it we find:

\[ \ln i(t) = t+\frac{t^2}{2}+C\qquad\text{or}\qquad i(t) = \exp\left(t+\frac{t^2}{2}+C\right) \]

where C is an integration constant. Because i(0) = e^C = 1, we have C = 0 and we conclude with the formula:

\[ I_n = n!\,[t^n]\exp\left(t+\frac{t^2}{2}\right). \]

4.10 The summing factor method

For linear recurrences of the first order a method exists which allows us to obtain an explicit expression for the generic element of the defined sequence. Usually, this expression is in the form of a sum, and a possible closed form can only be found by manipulating this sum; therefore, the method does not guarantee a closed form. Let us suppose we have a recurrence relation:

\[ a_{n+1}f_{n+1} = b_nf_n+c_n \]

where a_n, b_n, c_n are any expressions, possibly depending on n. As we remarked in the Introduction, if a_{n+1} = b_n = 1, by unfolding the recurrence we can find an explicit expression for f_{n+1}:

\[ f_{n+1} = f_n+c_n = f_{n-1}+c_{n-1}+c_n = \cdots = f_0+\sum_{k=0}^{n}c_k \]

where f_0 is the initial condition relative to the sequence under consideration. Fortunately, we can always change the original recurrence into a relation of this simpler form. In fact, if we multiply everything by the so-called summing factor:

\[ \frac{a_na_{n-1}\cdots a_0}{b_nb_{n-1}\cdots b_0} \]

provided none of a_n, a_{n−1}, ..., a_0, b_n, b_{n−1}, ..., b_0 is zero, we obtain:

\[ \frac{a_{n+1}a_n\cdots a_0}{b_nb_{n-1}\cdots b_0}\,f_{n+1} = \frac{a_na_{n-1}\cdots a_0}{b_{n-1}b_{n-2}\cdots b_0}\,f_n+\frac{a_na_{n-1}\cdots a_0}{b_nb_{n-1}\cdots b_0}\,c_n. \]

We can now define:

\[ g_{n+1} = \frac{a_{n+1}a_n\cdots a_0}{b_nb_{n-1}\cdots b_0}\,f_{n+1} \]

and the relation becomes:

\[ g_{n+1} = g_n+\frac{a_na_{n-1}\cdots a_0}{b_nb_{n-1}\cdots b_0}\,c_n\qquad\qquad g_0 = a_0f_0. \]

Finally, by unfolding this recurrence the result is:

\[ f_{n+1} = \frac{b_nb_{n-1}\cdots b_0}{a_{n+1}a_n\cdots a_0}\left(a_0f_0+\sum_{k=0}^{n}\frac{a_ka_{k-1}\cdots a_0}{b_kb_{k-1}\cdots b_0}\,c_k\right). \]

As a technical remark, we observe that sometimes a_0 and/or b_0 can be 0; in that case, we can unfold the
recurrence down to 1, and accordingly change the last index 0.

In order to show a non-trivial example, let us discuss the problem of determining the coefficient of t^n in the f.p.s. corresponding to the function f(t) = √(1 − t) ln(1/(1 − t)). If we expand the function, we find:

\[ f(t) = \sqrt{1-t}\,\ln\frac{1}{1-t} = t-\frac{1}{24}t^3-\frac{1}{24}t^4-\frac{71}{1920}t^5-\frac{31}{960}t^6-\cdots. \]

A method for finding a recurrence relation for the coefficients f_n of this f.p.s. is to derive a differential equation for f(t). By differentiating:

\[ f'(t) = -\frac{1}{2\sqrt{1-t}}\,\ln\frac{1}{1-t}+\frac{1}{\sqrt{1-t}} \]

and therefore we have the differential equation:

\[ (1-t)f'(t) = -\frac{1}{2}f(t)+\sqrt{1-t}. \]

By extracting the coefficient of t^n, we have the relation:

\[ (n+1)f_{n+1}-nf_n = -\frac{1}{2}f_n+(-1)^n\binom{1/2}{n} \]

which can be written as:

\[ (n+1)f_{n+1} = \frac{2n-1}{2}f_n-\frac{1}{4^n(2n-1)}\binom{2n}{n}. \]

This is a recurrence relation of the first order with the initial condition f_0 = 0. Let us apply the summing factor method, for which we have a_n = n, b_n = (2n − 1)/2. Since a_0 = 0, we have:

\[ \frac{a_na_{n-1}\cdots a_1}{b_nb_{n-1}\cdots b_1} = \frac{n(n-1)\cdots 1\cdot 2^n}{(2n-1)(2n-3)\cdots 1} = \frac{n!\,2^n\cdot 2n(2n-2)\cdots 2}{2n(2n-1)(2n-2)\cdots 1} = \frac{4^nn!^2}{(2n)!}. \]

By multiplying the recurrence relation by this summing factor, we find:

\[ (n+1)\frac{4^nn!^2}{(2n)!}f_{n+1} = \frac{2n-1}{2}\,\frac{4^nn!^2}{(2n)!}f_n-\frac{1}{2n-1}. \]

We are fortunate and c_n simplifies dramatically; besides, we know that the two coefficients of f_{n+1} and f_n are equal, notwithstanding their appearance. Therefore we have:

\[ f_{n+1} = \frac{1}{(n+1)4^n}\binom{2n}{n}\sum_{k=0}^{n}\frac{-1}{2k-1}\qquad\qquad(a_0f_0 = 0). \]

We can somewhat simplify this expression by observing that:

\[ \sum_{k=0}^{n}\frac{1}{2k-1} = \frac{1}{2n-1}+\frac{1}{2n-3}+\cdots+1-1 = \]
\[ = \frac{1}{2n}+\frac{1}{2n-1}+\frac{1}{2n-2}+\cdots+\frac{1}{2}+1-\frac{1}{2n}-\cdots-\frac{1}{2}-1 = \]
\[ = H_{2n+2}-\frac{1}{2n+1}-\frac{1}{2n+2}-\frac{1}{2}H_{n+1}+\frac{1}{2n+2}-1 = H_{2n+2}-\frac{1}{2}H_{n+1}-\frac{2(n+1)}{2n+1}. \]

Furthermore, we have:

\[ \frac{1}{(n+1)4^n}\binom{2n}{n} = \frac{4}{(n+1)4^{n+1}}\binom{2n+2}{n+1}\frac{n+1}{2(2n+1)} = \frac{2}{2n+1}\,\frac{1}{4^{n+1}}\binom{2n+2}{n+1}. \]

Therefore:

\[ f_{n+1} = \left(\frac{1}{2}H_{n+1}-H_{2n+2}+\frac{2(n+1)}{2n+1}\right)\times\frac{2}{2n+1}\,\frac{1}{4^{n+1}}\binom{2n+2}{n+1}. \]

This expression allows us to obtain a formula for f_n:

\[ f_n = \left(\frac{1}{2}H_n-H_{2n}+\frac{2n}{2n-1}\right)\frac{2}{2n-1}\,\frac{1}{4^n}\binom{2n}{n} = \left(H_n-2H_{2n}+\frac{4n}{2n-1}\right)(-1)^{n+1}\binom{1/2}{n}. \]

The reader can numerically check this expression against the actual values of f_n given above. By using the asymptotic approximation H_n ∼ ln n + γ given in the Introduction, we find:

\[ H_n-2H_{2n}+\frac{4n}{2n-1}\sim\ln n+\gamma-2(\ln 2+\ln n+\gamma)+2 = -\ln n-\gamma-\ln 4+2. \]

Besides:

\[ \frac{1}{2n-1}\,\frac{1}{4^n}\binom{2n}{n}\sim\frac{1}{2n\sqrt{\pi n}} \]

and we conclude:

\[ f_n\sim-(\ln n+\gamma+\ln 4-2)\,\frac{1}{2n\sqrt{\pi n}} \]

which shows that |f_n| behaves like ln n/n^{3/2}.
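The closed form produced by the summing factor method can be tested exactly against the series coefficients of √(1 − t) ln(1/(1 − t)). The sketch below (written for this note, with exact rational arithmetic) compares the two for the first dozen coefficients.

```python
from fractions import Fraction

def coeffs_sqrt(N):
    # c_k = [t^k] sqrt(1-t): c_0 = 1 and c_k = c_{k-1} * (2k-3)/(2k)
    c = [Fraction(1)]
    for k in range(1, N + 1):
        c.append(c[-1] * Fraction(2 * k - 3, 2 * k))
    return c

def H(n):
    return sum(Fraction(1, j) for j in range(1, n + 1))

N = 12
c = coeffs_sqrt(N)
# f_n by direct convolution of sqrt(1-t) with ln(1/(1-t)) = t + t^2/2 + ...
f = [sum(c[n - j] * Fraction(1, j) for j in range(1, n + 1)) for n in range(N + 1)]
assert f[1] == 1 and f[3] == Fraction(-1, 24) and f[5] == Fraction(-71, 1920)

# Closed form: f_n = (H_n - 2H_{2n} + 4n/(2n-1)) * (-1)^(n+1) * binom(1/2, n),
# and (-1)^(n+1) binom(1/2, n) equals -c_n
for n in range(1, N + 1):
    assert f[n] == -c[n] * (H(n) - 2 * H(2 * n) + Fraction(4 * n, 2 * n - 1))
```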
4.11 The internal path length of binary trees

Binary trees are often used as a data structure to retrieve information. A set D of keys is given, taken from an ordered universe U. Therefore D is a permutation of the ordered sequence d_1 < d_2 < ··· < d_n, and as the various elements arrive, they are inserted in a binary tree. As we know, there are n! possible permutations of the keys in D, but there are only binom(2n, n)/(n + 1) different binary trees with n nodes. When we are looking for some key d, to find whether d ∈ D or not, we perform a search in the binary tree, comparing d against the root and other keys in the tree, until we find d or we arrive at some leaf and cannot go on with our search. In the former case our search is successful, while in the latter case it is unsuccessful. The problem is: how many comparisons should we perform, on average, to find out that d is present in the tree (successful search)? The answer to this question is very important, because it tells us how good binary trees are for searching information.

The number of nodes along the path in the tree starting at the root and arriving at a given node K is called the internal path length for K. It is just the number of comparisons necessary to find the key in K. Therefore, our previous problem can be stated in the following way: what is the average internal path length for binary trees with n nodes? Knuth has found a rather simple way of answering this question; however, we wish to show how the method of generating functions can be used to find the average internal path length in a standard way. The reasoning is as follows: we evaluate the total internal path length of all the trees generated by the n! possible permutations of our key set D, and then divide this number by n!n, the total number of nodes in all the trees.

A non-empty binary tree can be seen as two subtrees connected to the root (see Section 2.9); the left subtree contains k nodes (k = 0, 1, ..., n − 1) and the right subtree contains the remaining n − 1 − k nodes. Let P_n be the total internal path length (i.p.l.) of all the n! possible trees generated by permutations. The left subtrees have therefore a total i.p.l. equal to P_k, but every search in these subtrees has to pass through the root. This increases the total i.p.l. by the total number of nodes, i.e., it actually is P_k + k!k. We now observe that every left subtree is associated to each possible right subtree, and therefore it should be counted (n − 1 − k)! times. Besides, every permutation generating the left and right subtrees is not to be counted only once: the keys can be arranged in all possible ways in the overall permutation, retaining their relative ordering. These possible ways are binom(n − 1, k), and therefore the total contribution to P_n of the left subtrees is:

\[ \binom{n-1}{k}(n-1-k)!\,(P_k+k!\,k) = (n-1)!\left(\frac{P_k}{k!}+k\right). \]

In a similar way we find the total contribution of the right subtrees:

\[ \binom{n-1}{k}k!\,\bigl(P_{n-1-k}+(n-1-k)!\,(n-1-k)\bigr) = (n-1)!\left(\frac{P_{n-1-k}}{(n-1-k)!}+(n-1-k)\right). \]

It only remains to count the contribution of the roots, which obviously amounts to n!, a single comparison for every tree. We therefore have the following recurrence relation, in which the contributions of the left and right subtrees turn out to be the same:

\[ P_n = n!+(n-1)!\sum_{k=0}^{n-1}\left(\frac{P_k}{k!}+k+\frac{P_{n-1-k}}{(n-1-k)!}+(n-1-k)\right) = \]
\[ = n!+2(n-1)!\sum_{k=0}^{n-1}\left(\frac{P_k}{k!}+k\right) = n!+2(n-1)!\left(\sum_{k=0}^{n-1}\frac{P_k}{k!}+\frac{n(n-1)}{2}\right). \]

We used the formula for the sum of the first n − 1 integers, and now, by dividing by n!, we have:

\[ \frac{P_n}{n!} = 1+\frac{2}{n}\sum_{k=0}^{n-1}\frac{P_k}{k!}+n-1 = n+\frac{2}{n}\sum_{k=0}^{n-1}\frac{P_k}{k!}. \]

Let us now set Q_n = P_n/n!, so that Q_n is the average total i.p.l. relative to a single tree. If we succeed in finding Q_n, the average i.p.l. we are looking for will simply be Q_n/n. We can also reformulate the recurrence for n + 1, in order to be able to apply the generating function operator:

\[ (n+1)Q_{n+1} = (n+1)^2+2\sum_{k=0}^{n}Q_k. \]

Passing to generating functions, with G((n + 1)Q_{n+1}) = Q′(t) and the partial sums giving Q(t)/(1 − t), we obtain:

\[ Q'(t) = \frac{1+t}{(1-t)^3}+2\,\frac{Q(t)}{1-t}\qquad\text{or}\qquad Q'(t)-\frac{2}{1-t}Q(t) = \frac{1+t}{(1-t)^3}. \]

This differential equation can be easily solved:

\[ Q(t) = \frac{1}{(1-t)^2}\left(\int\frac{(1-t)^2(1+t)}{(1-t)^3}\,dt+C\right) = \frac{1}{(1-t)^2}\left(2\ln\frac{1}{1-t}-t+C\right). \]
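The recurrence for P_n and the closed form Q_n = 2(n + 1)H_n − 3n derived below can be cross-checked exactly; the following sketch (illustrative only) does so with rational arithmetic.

```python
from fractions import Fraction
from math import factorial

def total_ipl(nmax):
    # P_0 = 0, P_n = n! + 2(n-1)! * (sum_{k<n} P_k/k! + n(n-1)/2)
    p = [Fraction(0)]
    for n in range(1, nmax + 1):
        s = sum(p[k] / factorial(k) for k in range(n))
        p.append(factorial(n) + 2 * factorial(n - 1) * (s + Fraction(n * (n - 1), 2)))
    return p

def harmonic(n):
    return sum(Fraction(1, j) for j in range(1, n + 1))

p = total_ipl(10)
for n in range(1, 11):
    Qn = p[n] / factorial(n)
    assert Qn == 2 * (n + 1) * harmonic(n) - 3 * n   # Q_n = 2(n+1)H_n - 3n
```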
Since the i.p.l. of the empty tree is 0, we should have Q_0 = Q(0) = 0 and therefore, by setting t = 0, we find C = 0. The final result is:

\[ Q(t) = \frac{2}{(1-t)^2}\,\ln\frac{1}{1-t}-\frac{t}{(1-t)^2}. \]

We can now use the formula for G(nH_n) (see Section 4.4 on common generating functions) to extract the coefficient of t^n:

\[ Q_n = [t^n]\left(\frac{2}{(1-t)^2}\left(\ln\frac{1}{1-t}+1\right)-\frac{2+t}{(1-t)^2}\right) = 2[t^{n+1}]\frac{t}{(1-t)^2}\left(\ln\frac{1}{1-t}+1\right)-[t^n]\frac{2+t}{(1-t)^2} = \]
\[ = 2(n+1)H_{n+1}-2\binom{-2}{n}(-1)^n-\binom{-2}{n-1}(-1)^{n-1} = 2(n+1)H_n+2-2(n+1)-n = 2(n+1)H_n-3n. \]

Thus we conclude with the average i.p.l.:

\[ \frac{P_n}{n!\,n} = \frac{Q_n}{n} = 2\left(1+\frac{1}{n}\right)H_n-3. \]

This formula is asymptotic to 2 ln n + 2γ − 3, and shows that the average number of comparisons necessary to retrieve any key in a binary tree is in the order of O(ln n).

4.12 Height balanced binary trees

We have been able to show that binary trees are a "good" retrieving structure, in the sense that if the elements, or keys, of a set {a_1, a_2, ..., a_n} are stored in random order in a binary (search) tree, then the expected average time for retrieving any key in the tree is in the order of ln n. However, this behavior of binary trees is not always assured; for example, if the keys are stored in the tree in their proper order, the resulting structure degenerates into a linear list and the average retrieving time becomes O(n).

To avoid this drawback, at the beginning of the 1960's, two Russian researchers, Adelson-Velski and Landis, found an algorithm to store keys in a "height balanced" binary tree, a tree for which the height of the left subtree of every node K differs by at most 1 from the height of the right subtree of the same node K. To understand this concept, let us define the height of a tree as the highest level at which a node in the tree is placed. The height is also the maximal number of comparisons necessary to find any key in the tree. Therefore, if we find a limitation for the height of a class of trees, this is also a limitation for the internal path length of the trees in the same class. Formally, a height balanced binary tree is a tree such that for every node K in it, if h′_K and h″_K are the heights of the two subtrees originating from K, then |h′_K − h″_K| ≤ 1.

The algorithm of Adelson-Velski and Landis is very important because, as we are now going to show, height balanced binary trees assure that the retrieving time for every key in the tree is O(ln n). Because of that, height balanced binary trees are also known as AVL trees, and the algorithm for building AVL trees from a set of n keys can be found in any book on algorithms and data structures. Here we only wish to perform a worst case analysis to prove that the retrieval time in any AVL tree cannot be larger than O(ln n).

In order to perform our analysis, let us consider the worst possible AVL tree. Since, by definition, the height of the left subtree of any node cannot exceed the height of the corresponding right subtree plus 1, let us consider trees in which the height of the left subtree of every node exceeds by exactly 1 the height of the right subtree of the same node. In Figure 4.1 we have drawn the first cases. These trees are built in a very simple way: every tree T_n, of height n, is built by using the preceding tree T_{n−1} as the left subtree and the tree T_{n−2} as the right subtree of the root. Therefore, the number of nodes in T_n is just the sum of the nodes in T_{n−1} and in T_{n−2}, plus 1 (the root), and the condition on the heights of the subtrees of every node is satisfied. Because of this construction, T_n can be considered the "worst" tree of height n, in the sense that every other AVL tree of height n will have at least as many nodes as T_n. Since the height n is an upper bound for the number of comparisons necessary to retrieve any key in the tree, the average retrieving time for every such tree will be ≤ n.

If we denote by |T_n| the number of nodes in the tree T_n, we have the simple recurrence relation:

\[ |T_n| = |T_{n-1}|+|T_{n-2}|+1. \]

This resembles the Fibonacci recurrence relation and, in fact, we can easily show that |T_n| = F_{n+2} − 1, as is intuitively apparent from the beginning of the sequence {0, 1, 2, 4, 7, 12, ...}. The proof is done by mathematical induction. For n = 0 we have |T_0| = F_2 − 1 = 1 − 1 = 0, and this is true; similarly we proceed for n = 1. Therefore, let us suppose that for every k < n we have |T_k| = F_{k+2} − 1; this holds for k = n − 1 and k = n − 2, and because of the recurrence relation for |T_n| we have:

\[ |T_n| = |T_{n-1}|+|T_{n-2}|+1 = F_{n+1}-1+F_n-1+1 = F_{n+2}-1 \]

since F_n + F_{n+1} = F_{n+2} by the Fibonacci recurrence.
Figure 4.1: Worst AVL-trees

We have shown that for large values of n we have F_n ≈ φ^n/√5; therefore we have |T_n| ≈ φ^{n+2}/√5 − 1, or φ^{n+2} ≈ √5(|T_n| + 1). By passing to logarithms, we have n ≈ log_φ(√5(|T_n| + 1)) − 2 and, since all logarithms are proportional, n = O(ln |T_n|). As we observed, every AVL tree of height n has a number of nodes not less than |T_n|, and this assures that the retrieving time for every AVL tree with at most |T_n| nodes is bounded from above by log_φ(√5(|T_n| + 1)) − 2.

4.13 Some special recurrences

Not all recurrence relations are linear, and we had occasion to deal with a different sort of relation when we studied the Catalan numbers. They satisfy the recurrence C_n = Σ_{k=0}^{n−1} C_k C_{n−k−1}, which, however, in this particular form, is only valid for n > 0. In order to apply the method of generating functions, we write it for n + 1:

\[ C_{n+1} = \sum_{k=0}^{n}C_kC_{n-k}. \]

The right hand member is a convolution, and therefore, by the initial condition C_0 = 1, we obtain:

\[ \frac{C(t)-1}{t} = C(t)^2\qquad\text{or}\qquad tC(t)^2-C(t)+1 = 0. \]

This is a second degree equation, which can be directly solved; for t = 0 we should have C(0) = C_0 = 1, and therefore the solution with the + sign before the square root is to be ignored; we thus obtain:

\[ C(t) = \frac{1-\sqrt{1-4t}}{2t} \]

which was found in the section "The method of shifting" in a completely different way.

The Bernoulli numbers were introduced by means of the implicit relation:

\[ \sum_{k=0}^{n}\binom{n+1}{k}B_k = \delta_{n,0}. \]

We are now in a position to find out their exponential generating function, i.e., the function B(t) = G(B_n/n!), and prove some of their properties. The defining relation can be written as:

\[ \sum_{k=0}^{n}\binom{n+1}{n-k}B_{n-k} = \sum_{k=0}^{n}(n+1)n\cdots(k+2)\,\frac{B_{n-k}}{(n-k)!} = \sum_{k=0}^{n}\frac{(n+1)!}{(k+1)!}\,\frac{B_{n-k}}{(n-k)!} = \delta_{n,0}. \]

If we divide everything by (n + 1)!, we obtain:

\[ \sum_{k=0}^{n}\frac{1}{(k+1)!}\,\frac{B_{n-k}}{(n-k)!} = \delta_{n,0} \]

and since this relation holds for every n ∈ N, we can pass to the generating functions. The left hand member is a convolution, whose first factor is the shift of the exponential function, and therefore we obtain:

\[ \frac{e^t-1}{t}\,B(t) = 1\qquad\text{or}\qquad B(t) = \frac{t}{e^t-1}. \]

The classical way to see that B_{2n+1} = 0, ∀n > 0, is to show that the function obtained from B(t) by deleting the term of first degree is an even function, and therefore should have all its coefficients of odd order equal to zero. In fact we have:

\[ B(t)+\frac{t}{2} = \frac{t}{e^t-1}+\frac{t}{2} = \frac{t}{2}\,\frac{e^t+1}{e^t-1}. \]

In order to see that this function is even, we substitute t → −t and show that the function remains the same:

\[ -\frac{t}{2}\,\frac{e^{-t}+1}{e^{-t}-1} = -\frac{t}{2}\,\frac{1+e^t}{1-e^t} = \frac{t}{2}\,\frac{e^t+1}{e^t-1}. \]
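The implicit relation defining the Bernoulli numbers determines them one at a time, since only B_n appears with index n in the n-th equation. The following sketch (illustrative; the solver is mine) computes the first few and checks the vanishing of the odd-index values claimed above.

```python
from fractions import Fraction
from math import comb

def bernoulli(nmax):
    # Solve sum_{k=0}^{n} C(n+1, k) B_k = [n == 0] successively for B_n:
    # (n+1) B_n = delta_{n,0} - sum_{k<n} C(n+1, k) B_k
    B = []
    for n in range(nmax + 1):
        rhs = 1 if n == 0 else 0
        s = sum(comb(n + 1, k) * B[k] for k in range(n))
        B.append((Fraction(rhs) - s) / (n + 1))
    return B

B = bernoulli(10)
assert B[0] == 1 and B[1] == Fraction(-1, 2) and B[2] == Fraction(1, 6)
assert B[4] == Fraction(-1, 30) and B[6] == Fraction(1, 42)
assert all(B[n] == 0 for n in range(3, 11, 2))   # B_{2n+1} = 0 for n > 0
```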
Chapter 5

Riordan Arrays

5.1 Definitions and basic concepts

A Riordan array is a couple of formal power series D = (d(t), h(t)); if both d(t), h(t) ∈ F_0, then the Riordan array is called proper. The Riordan array can be identified with the infinite, lower triangular array (or triangle) (d_{n,k})_{n,k∈N} defined by:

\[ d_{n,k} = [t^n]\,d(t)(th(t))^k \tag{5.1.1} \]

In fact, we are mainly interested in the sequence of functions iteratively defined by:

\[ d_0(t) = d(t)\qquad\qquad d_k(t) = d_{k-1}(t)\,th(t) = d(t)(th(t))^k \]

These functions are the column generating functions of the triangle. Another way of characterizing a Riordan array D = (d(t), h(t)) is to consider the bivariate generating function:

\[ d(t,z) = \sum_{k=0}^{\infty}d(t)(th(t))^kz^k = \frac{d(t)}{1-tzh(t)} \tag{5.1.2} \]

A common example of a Riordan array is the Pascal triangle, for which we have d(t) = h(t) = 1/(1 − t). In fact we have:

\[ d_{n,k} = [t^n]\frac{1}{1-t}\left(\frac{t}{1-t}\right)^k = [t^{n-k}]\frac{1}{(1-t)^{k+1}} = \binom{-k-1}{n-k}(-1)^{n-k} = \binom{n}{n-k} = \binom{n}{k} \]

and this shows that the generic element is actually a binomial coefficient. By formula (5.1.2), we find the well-known bivariate generating function d(t, z) = (1 − t − tz)^{−1}.

From our point of view, one of the most important properties of Riordan arrays is the fact that the sums involving the rows of a Riordan array can be performed by operating a suitable transformation on a generating function and then by extracting a coefficient from the resulting function. More precisely, we prove the following theorem:

Theorem 5.1.1 Let D = (d(t), h(t)) be a Riordan array and let f(t) be the generating function of the sequence (f_k)_{k∈N}; then we have:

\[ \sum_{k=0}^{n}d_{n,k}f_k = [t^n]\,d(t)f(th(t)) \tag{5.1.3} \]

Proof: The proof consists in a straightforward computation:

\[ \sum_{k=0}^{n}d_{n,k}f_k = \sum_{k=0}^{\infty}d_{n,k}f_k = \sum_{k=0}^{\infty}[t^n]d(t)(th(t))^kf_k = [t^n]d(t)\sum_{k=0}^{\infty}f_k(th(t))^k = [t^n]\,d(t)f(th(t)). \]

In the case of the Pascal triangle we obtain the Euler transformation:

\[ \sum_{k=0}^{n}\binom{n}{k}f_k = [t^n]\frac{1}{1-t}\,f\!\left(\frac{t}{1-t}\right) \]

which we proved by simple considerations on binomial coefficients and generating functions. By considering the generating functions G(1) = 1/(1 − t), G((−1)^k) = 1/(1 + t) and G(k) = t/(1 − t)², from the previous theorem we immediately find:

row sums:
\[ \sum_kd_{n,k} = [t^n]\frac{d(t)}{1-th(t)} \]
alternating row sums:
\[ \sum_k(-1)^kd_{n,k} = [t^n]\frac{d(t)}{1+th(t)} \]
weighted row sums:
\[ \sum_kkd_{n,k} = [t^n]\frac{td(t)h(t)}{(1-th(t))^2}. \]

Moreover, by observing that D̂ = (d(t), th(t)) is a Riordan array, whose rows are the diagonals of D,
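Both definition (5.1.1) and the row-sums consequence of Theorem 5.1.1 can be checked with truncated power series; the sketch below (an illustration of mine, not from the book) verifies them for the Pascal triangle.

```python
from math import comb

N = 10

def mul(a, b):
    # Product of two truncated power series (coefficient lists of length N+1)
    c = [0] * (N + 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j <= N:
                c[i + j] += x * y
    return c

geom = [1] * (N + 1)                        # 1/(1-t)
t_h = mul([0, 1] + [0] * (N - 1), geom)     # t*h(t) = t/(1-t)

def riordan(d, th):
    # d_{n,k} = [t^n] d(t) (t h(t))^k, via the column generating functions
    cols, col = [], d[:]
    for _ in range(N + 1):
        cols.append(col)
        col = mul(col, th)
    return [[cols[k][n] for k in range(n + 1)] for n in range(N + 1)]

tri = riordan(geom, t_h)
assert all(tri[n][k] == comb(n, k) for n in range(N + 1) for k in range(n + 1))

# Row sums theorem: sum_k d_{n,k} = [t^n] d(t)/(1 - t h(t))
inv = [1] + [0] * N                          # 1/(1 - th) by v_n = sum th_j v_{n-j}
for n in range(1, N + 1):
    inv[n] = sum(t_h[j] * inv[n - j] for j in range(1, n + 1))
rs = mul(geom, inv)
assert all(sum(tri[n]) == rs[n] for n in range(N + 1))
```

For the Pascal triangle the right-hand side collapses to 1/(1 − 2t), so the row sums come out as the powers of two, as expected.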


we have: Proof: For every k ∈ N take the sequence which


X is 0 everywhere except in the kth element fk = 1.
d(t)
diagonal sums dn−k,k = [tn ] . The corresponding
P∞ generating function is f (t) = t
k
1 − t2 h(t) and we have i=0 dn,i fi = dn,k . Hence, according to
k
the theorem’s hypotheses, we find Gt (dn,k )n,k∈N =
Obviously, this observation can be generalized
P to find k
the generating function of any sum k d n−sk,k for dk (t) = d(t)(th(t)) , and this corresponds to the ini-
every s ≥ 1. We obtain well-known results for the tial definition of column generating functions for a
Pascal triangle; for example, diagonal sums give: Riordan array, for every k = 1, 2, . . ..
X µn − k ¶ 1 1
= [tn ] =
k 1 − t 1 − t2 (1 − t)−1 5.2 The algebraic structure of
k
1 Riordan arrays
= [tn ] = Fn+1
1 − t − t2
connecting binomial coefficients and Fibonacci num- The most important algebraic property of Riordan
bers. arrays is the fact that the usual row-by-column prod-
Another general result can be obtained by means of two sequences (f_k)_{k \in N} and (g_k)_{k \in N} and their generating functions f(t), g(t). For p = 1, 2, ..., the generic element of the Riordan array (f(t), t^{p-1}) is:

    d_{n,k} = [t^n] f(t) (t^p)^k = [t^{n-pk}] f(t) = f_{n-pk}.

Therefore, by formula (5.1.3), we have:

    \sum_{k=0}^{\lfloor n/p \rfloor} f_{n-pk} g_k = [t^n] f(t) [ g(y) | y = t^p ] = [t^n] f(t) g(t^p).

This can be called the rule of generalized convolution, since it reduces to the usual convolution rule for p = 1. Suppose, for example, that we wish to sum one out of every three powers of 2, starting with 2^n and going down to the lowest integer exponent >= 0; we have:

    S_n = \sum_{k=0}^{\lfloor n/3 \rfloor} 2^{n-3k} = [t^n] \frac{1}{1-2t} \frac{1}{1-t^3}.

As we will learn studying asymptotics, an approximate value for this sum can be obtained by extracting the coefficient of the first factor and then multiplying it by the second factor, in which t is substituted by 1/2. This gives S_n \approx 2^{n+3}/7, and in fact we have the exact value S_n = \lfloor 2^{n+3}/7 \rfloor.

In a sense, the theorem on the sums involving Riordan arrays is a characterization for them; in fact, we can prove a sort of inverse property:

Theorem 5.1.2  Let (d_{n,k})_{n,k \in N} be an infinite triangle such that for every sequence (f_k)_{k \in N} we have \sum_k d_{n,k} f_k = [t^n] d(t) f(th(t)), where f(t) is the generating function of the sequence and d(t), h(t) are two f.p.s. not depending on f(t). Then the triangle defined by the Riordan array (d(t), h(t)) coincides with (d_{n,k})_{n,k \in N}.

The product of two Riordan arrays is a Riordan array. This is proved by considering two Riordan arrays (d(t), h(t)) and (a(t), b(t)) and performing the product, whose generic element is \sum_j d_{n,j} f_{j,k}, if d_{n,j} is the generic element in (d(t), h(t)) and f_{j,k} is the generic element in (a(t), b(t)). In fact we have:

    \sum_{j=0}^{\infty} d_{n,j} f_{j,k} = \sum_{j=0}^{\infty} [t^n] d(t) (th(t))^j [y^j] a(y) (y b(y))^k =
      = [t^n] d(t) \sum_{j=0}^{\infty} (th(t))^j [y^j] a(y) (y b(y))^k = [t^n] d(t) a(th(t)) (th(t) b(th(t)))^k.

By definition, the last expression denotes the generic element of the Riordan array (f(t), g(t)) where f(t) = d(t) a(th(t)) and g(t) = h(t) b(th(t)). Therefore we have:

    (d(t), h(t)) \cdot (a(t), b(t)) = (d(t) a(th(t)), h(t) b(th(t))).    (5.2.1)

This expression is particularly important and is the basis for many developments of the Riordan array theory.

The product is obviously associative, and we observe that the Riordan array (1, 1) acts as the neutral element or identity. In fact, the array (1, 1) is everywhere 0 except for the elements on the main diagonal, which are 1. Observe that this array is proper.

Let us now suppose that (d(t), h(t)) is a proper Riordan array. By formula (5.2.1), we immediately see that the product of two proper Riordan arrays is proper; therefore, we can look for a proper Riordan array (a(t), b(t)) such that (d(t), h(t)) \cdot (a(t), b(t)) = (1, 1). If this is the case, we should have:

    d(t) a(th(t)) = 1   and   h(t) b(th(t)) = 1.
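The product rule (5.2.1) and the role of (1, 1) as identity can be checked numerically on truncated power series. The following sketch (plain Python with exact rational arithmetic; all helper names are ours, not the book's) builds the matrix of a Riordan array from d(t) and h(t), verifies the identity element, and squares the Pascal triangle (1/(1-t), 1/(1-t)), comparing the result with the array given by the right-hand side of (5.2.1):

```python
from fractions import Fraction

N = 8  # truncation order: we work with the first N coefficients of every series

def mul(p, q):
    """Product of two truncated power series (coefficient lists of length N)."""
    r = [Fraction(0)] * N
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j < N:
                r[i + j] += pi * qj
    return r

def compose(f, g):
    """f(g(t)) for a truncated series g with g(0) = 0."""
    assert g[0] == 0
    r = [Fraction(0)] * N
    p = [Fraction(1)] + [Fraction(0)] * (N - 1)  # running power g(t)^i
    for c in f:
        r = [a + c * b for a, b in zip(r, p)]
        p = mul(p, g)
    return r

def riordan(d, h):
    """Matrix of (d(t), h(t)): entry (n, k) is [t^n] d(t) (t h(t))^k."""
    th = [Fraction(0)] + h[:-1]          # t * h(t)
    M = [[Fraction(0)] * N for _ in range(N)]
    col = d[:]
    for k in range(N):
        for n in range(N):
            M[n][k] = col[n]
        col = mul(col, th)
    return M

def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(N)) for k in range(N)]
            for i in range(N)]

one = [Fraction(1)] + [Fraction(0)] * (N - 1)
geo = [Fraction(1)] * N                  # 1/(1 - t)

P = riordan(geo, geo)                    # the Pascal triangle (1/(1-t), 1/(1-t))
assert P[4][2] == 6                      # binomial(4, 2)

# (1, 1) is the identity of the row-by-column product
I = riordan(one, one)
assert matmul(P, I) == P and matmul(I, P) == P

# formula (5.2.1): (d, h) * (a, b) = (d(t) a(th(t)), h(t) b(th(t)))
th = [Fraction(0)] + geo[:-1]
lhs = matmul(P, P)
rhs = riordan(mul(geo, compose(geo, th)), mul(geo, compose(geo, th)))
assert lhs == rhs                        # P^2 is the Riordan array (1/(1-2t), 1/(1-2t))
```

With N = 8 the check runs instantly; the same harness can be pointed at any other pair of proper Riordan arrays by changing the coefficient lists.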
By setting y = th(t), we have:

    a(y) = [ d(t)^{-1} | t = y h(t)^{-1} ]
    b(y) = [ h(t)^{-1} | t = y h(t)^{-1} ].

Here we are in the hypotheses of the Lagrange Inversion Formula, and therefore there is a unique function t = t(y) such that t(0) = 0 and t = y h(t)^{-1}. Besides, being d(t), h(t) \in F_0, the two f.p.s. a(y) and b(y) are uniquely defined. We have therefore proved:

Theorem 5.2.1  The set A of proper Riordan arrays is a group with the operation of row-by-column product defined functionally by relation (5.2.1).

It is a simple matter to show that some important classes of Riordan arrays are subgroups of A:

• the set of the Riordan arrays (f(t), 1) is an invariant subgroup of A; it is called the Appell subgroup;

• the set of the Riordan arrays (1, g(t)) is a subgroup of A and is called the subgroup of associated operators or the Lagrange subgroup;

• the set of the Riordan arrays (f(t), f(t)) is a subgroup of A and is called the Bell subgroup. Its elements are also known as renewal arrays.

The first two subgroups have already been considered in the Chapter on "Formal Power Series" and show the connection between f.p.s. and Riordan arrays. The notations used in that Chapter are thus explained as particular cases of the most general case of (proper) Riordan arrays.

Let us now return to the formulas for a Riordan array inverse. If h(t) is any fixed invertible f.p.s., let us define:

    d_h(t) = [ d(y)^{-1} | y = t h(y)^{-1} ]

so that we can write (d(t), h(t))^{-1} = (d_h(t), h_h(t)). By the product formula (5.2.1) we immediately find the identities:

    d(t h_h(t)) = d_h(t)^{-1}    d_h(t h(t)) = d(t)^{-1}
    h(t h_h(t)) = h_h(t)^{-1}    h_h(t h(t)) = h(t)^{-1}

which can be reduced to the single and basic rule:

    f(t h_h(t)) = f_h(t)^{-1}    for every f(t) \in F_0.

Observe that, obviously, (f_h)_h(t) = f(t).

We now wish to find an explicit expression for the generic element d_{n,k} in the inverse Riordan array (d(t), h(t))^{-1} in terms of d(t) and h(t). This will be done by using the LIF. As we observed in the first section, the bivariate generating function for (d(t), h(t)) is d(t)/(1 - tz h(t)), and therefore we have:

    d_{n,k} = [t^n z^k] \frac{d_h(t)}{1 - tz h_h(t)} = [z^k][t^n] [ \frac{d_h(t)}{1 - zy} | y = t h_h(t) ].

By the formulas above, we have:

    y = t h_h(t) = t h(t h_h(t))^{-1} = t h(y)^{-1},

which is the same as t = y h(y). Therefore we find d_h(t) = d_h(y h(y)) = d(y)^{-1}, and consequently:

    d_{n,k} = [z^k][t^n] [ \frac{d(y)^{-1}}{1 - zy} | y = t h(y)^{-1} ] =
      = [z^k] \frac{1}{n} [y^{n-1}] \frac{d}{dy} ( \frac{d(y)^{-1}}{1 - zy} ) \frac{1}{h(y)^n} =
      = [z^k] \frac{1}{n} [y^{n-1}] ( \frac{z}{d(y)(1 - zy)^2} - \frac{d'(y)}{d(y)^2 (1 - zy)} ) \frac{1}{h(y)^n} =
      = [z^k] \frac{1}{n} [y^{n-1}] ( \sum_{r=0}^{\infty} (r+1) z^{r+1} y^r - \frac{d'(y)}{d(y)} \sum_{r=0}^{\infty} z^r y^r ) \frac{1}{d(y) h(y)^n} =
      = \frac{1}{n} [y^{n-1}] ( k y^{k-1} - \frac{d'(y)}{d(y)} y^k ) \frac{1}{d(y) h(y)^n} =
      = \frac{1}{n} [y^{n-k}] ( k - \frac{y d'(y)}{d(y)} ) \frac{1}{d(y) h(y)^n}.

This is the formula we were looking for.

5.3 The A-sequence for proper Riordan arrays

Proper Riordan arrays play a very important role in our approach. Let us consider a Riordan array D = (d(t), h(t)) which is not proper, but with d(t) \in F_0. Since h(0) = 0, an s > 0 exists such that h(t) = h_s t^s + h_{s+1} t^{s+1} + \cdots and h_s \neq 0. If we define \hat h(t) = h_s + h_{s+1} t + \cdots, then \hat h(t) \in F_0. Consequently, the Riordan array \hat D = (d(t), \hat h(t)) is proper, and the rows of D can be seen as the s-diagonals (\hat d_{n-sk})_{k \in N} of \hat D. Fortunately, for proper Riordan arrays, Rogers has found an important characterization: every element d_{n+1,k+1}, n, k \in N, can be expressed as a linear combination of the elements in the preceding row, i.e.:

    d_{n+1,k+1} = a_0 d_{n,k} + a_1 d_{n,k+1} + a_2 d_{n,k+2} + \cdots =
      = \sum_{j=0}^{\infty} a_j d_{n,k+j}.    (5.3.1)

The sum is actually finite and the sequence A = (a_k)_{k \in N} is fixed. More precisely, we can prove the following theorem:

Theorem 5.3.1  An infinite lower triangular array D = (d_{n,k})_{n,k \in N} is a Riordan array if and only if a sequence A = {a_0 \neq 0, a_1, a_2, ...} exists such that for every n, k \in N relation (5.3.1) holds.

Proof: Let us suppose that D is the Riordan array (d(t), h(t)) and let us consider the Riordan array (d(t)h(t), h(t)); we define the Riordan array (A(t), B(t)) by the relation:

    (A(t), B(t)) = (d(t), h(t))^{-1} \cdot (d(t)h(t), h(t))

or:

    (d(t), h(t)) \cdot (A(t), B(t)) = (d(t)h(t), h(t)).

By performing the product we find:

    d(t) A(th(t)) = d(t) h(t)   and   h(t) B(th(t)) = h(t).

The latter identity gives B(th(t)) = 1, and this implies B(t) = 1. Therefore we have (d(t), h(t)) \cdot (A(t), 1) = (d(t)h(t), h(t)). The element f_{n,k} of the left-hand member is \sum_{j=0}^{\infty} d_{n,j} a_{j-k} = \sum_{j=0}^{\infty} d_{n,k+j} a_j, if as usual we interpret a_{j-k} as 0 when j < k. The same element in the right-hand member is:

    [t^n] d(t) h(t) (th(t))^k = [t^{n+1}] d(t) (th(t))^{k+1} = d_{n+1,k+1}.

By equating these two quantities, we obtain the identity (5.3.1). For the converse, let us observe that (5.3.1) uniquely defines the array D when the elements {d_{0,0}, d_{1,0}, d_{2,0}, ...} of column 0 are given. Let d(t) be the generating function of this column, A(t) the generating function of the sequence A, and define h(t) as the solution of the functional equation h(t) = A(th(t)), which is uniquely determined because of our hypothesis a_0 \neq 0. We can therefore consider the proper Riordan array \hat D = (d(t), h(t)); by the first part of the theorem, \hat D satisfies relation (5.3.1) for every n, k \in N and therefore, by our previous observation, it must coincide with D. This completes the proof.

The sequence A = (a_k)_{k \in N} is called the A-sequence of the Riordan array D = (d(t), h(t)), and it only depends on h(t). In fact, as we have shown during the proof of the theorem, we have:

    h(t) = A(th(t))    (5.3.2)

and this uniquely determines A when h(t) is given and, vice versa, h(t) is uniquely determined when A is given.

The A-sequence for the Pascal triangle is the solution A(y) of the functional equation 1/(1-t) = A(t/(1-t)). The simple substitution y = t/(1-t) gives A(y) = 1 + y, corresponding to the well-known basic recurrence of the Pascal triangle:

    \binom{n+1}{k+1} = \binom{n}{k} + \binom{n}{k+1}.

At this point, we realize that we could have started with this recurrence relation and directly found A(y) = 1 + y. Now, h(t) is defined by (5.3.2) as the solution of h(t) = 1 + th(t), and this immediately gives h(t) = 1/(1-t). Furthermore, since column 0 is {1, 1, 1, ...}, we have proved that the Pascal triangle corresponds to the Riordan array (1/(1-t), 1/(1-t)), as initially stated.

The pair of functions d(t) and A(t) completely characterizes a proper Riordan array. Another type of characterization is obtained through the following observation:

Theorem 5.3.2  Let (d_{n,k})_{n,k \in N} be any infinite, lower triangular array with d_{n,n} \neq 0 for every n \in N (in particular, let it be a proper Riordan array); then a unique sequence Z = (z_k)_{k \in N} exists such that every element in column 0 can be expressed as a linear combination of all the elements in the preceding row, i.e.:

    d_{n+1,0} = z_0 d_{n,0} + z_1 d_{n,1} + z_2 d_{n,2} + \cdots = \sum_{j=0}^{\infty} z_j d_{n,j}.    (5.3.3)

Proof: Let z_0 = d_{1,0}/d_{0,0}. Now we can uniquely determine the value of z_1 by expressing d_{2,0} in terms of the elements in row 1, i.e.:

    d_{2,0} = z_0 d_{1,0} + z_1 d_{1,1}   or   z_1 = \frac{d_{0,0} d_{2,0} - d_{1,0}^2}{d_{0,0} d_{1,1}}.

In the same way, we determine z_2 by expressing d_{3,0} in terms of the elements in row 2, and by substituting the values just obtained for z_0 and z_1. By proceeding in the same way, we determine the sequence Z in a unique way.

The sequence Z is called the Z-sequence for the (Riordan) array; it characterizes column 0, except for the element d_{0,0}. Therefore, we can say that the triple (d_{0,0}, A(t), Z(t)) completely characterizes a proper Riordan array. To see how the Z-sequence is obtained by starting with the usual definition of a Riordan array, let us prove the following:

Theorem 5.3.3  Let (d(t), h(t)) be a proper Riordan array and let Z(t) be the generating function of the array's Z-sequence. We have:

    d(t) = \frac{d_{0,0}}{1 - t Z(th(t))}.
Proof: By the preceding theorem, the Z-sequence exists and is unique. Therefore, equation (5.3.3) is valid for every n \in N, and we can pass to the generating functions. Since d(t)(th(t))^k is the generating function for column k, we have:

    \frac{d(t) - d_{0,0}}{t} = z_0 d(t) + z_1 d(t) th(t) + z_2 d(t) (th(t))^2 + \cdots =
      = d(t) ( z_0 + z_1 th(t) + z_2 (th(t))^2 + \cdots ) = d(t) Z(th(t)).

By solving this equation in d(t), we immediately find the relation desired.

The relation can be inverted, and this gives us the formula for the Z-sequence:

    Z(y) = [ \frac{d(t) - d_{0,0}}{t d(t)} | t = y h(t)^{-1} ].

We conclude this section by giving a theorem which characterizes renewal arrays by means of the A- and Z-sequences:

Theorem 5.3.4  Let d(0) = h(0) \neq 0. Then d(t) = h(t) if and only if A(y) = d(0) + y Z(y).

Proof: Let us assume that A(y) = d(0) + y Z(y), or Z(y) = (A(y) - d(0))/y. By the previous theorem, we have:

    d(t) = \frac{d(0)}{1 - t Z(th(t))} = \frac{d(0)}{1 - (t A(th(t)) - d(0) t)/(th(t))} = \frac{d(0) th(t)}{d(0) t} = h(t),

because A(th(t)) = h(t) by formula (5.3.2). Vice versa, by the formula for Z(y), we obtain from the hypothesis d(t) = h(t):

    d(0) + y Z(y) = d(0) + y [ \frac{1}{t} - \frac{d(0)}{t h(t)} | t = y h(t)^{-1} ] =
      = d(0) + [ \frac{t h(t)}{t} - \frac{d(0) t h(t)}{t h(t)} | t = y h(t)^{-1} ] =
      = [ h(t) | t = y h(t)^{-1} ] = A(y).

5.4 Simple binomial coefficients

Let us consider simple binomial coefficients, i.e., binomial coefficients of the form \binom{n+ak}{m+bk}, where a, b are two parameters and k is a non-negative integer variable. Depending on whether we consider n a variable and m a parameter, or vice versa, we have two different infinite arrays (d_{n,k}) or (\hat d_{m,k}), whose elements depend on the parameters a, b, m or a, b, n, respectively. In either case, if some conditions on a, b hold, we have Riordan arrays, and therefore we can apply formula (5.1.3) to find the value of many sums.

Theorem 5.4.1  Let d_{n,k} and \hat d_{m,k} be as above. If b > a and b - a is an integer, then D = (d_{n,k}) is a Riordan array. If b < 0 is an integer, then \hat D = (\hat d_{m,k}) is a Riordan array. We have:

    D = ( \frac{t^m}{(1-t)^{m+1}}, \frac{t^{b-a-1}}{(1-t)^b} )    \hat D = ( (1+t)^n, \frac{t^{-b-1}}{(1+t)^{-a}} ).

Proof: By using well-known properties of binomial coefficients, we find:

    d_{n,k} = \binom{n+ak}{m+bk} = \binom{n+ak}{n-m+ak-bk} = \binom{-n-ak+n-m+ak-bk-1}{n-m+ak-bk} (-1)^{n-m+ak-bk} =
      = \binom{-m-bk-1}{(n-m)+(a-b)k} (-1)^{n-m+ak-bk} = [t^{n-m+ak-bk}] \frac{1}{(1-t)^{m+1+bk}} = [t^n] \frac{t^m}{(1-t)^{m+1}} ( \frac{t^{b-a}}{(1-t)^b} )^k;

and:

    \hat d_{m,k} = [t^{m+bk}] (1+t)^{n+ak} = [t^m] (1+t)^n ( t^{-b} (1+t)^a )^k.

The theorem now directly follows from (5.1.1).

For m = a = 0 and b = 1 we again find the Riordan array of the Pascal triangle. The sum (5.1.3) takes on two specific forms which are worth being stated explicitly:

    \sum_k \binom{n+ak}{m+bk} f_k = [t^n] \frac{t^m}{(1-t)^{m+1}} f( \frac{t^{b-a}}{(1-t)^b} )    (b > a)    (5.4.1)

    \sum_k \binom{n+ak}{m+bk} f_k = [t^m] (1+t)^n f( t^{-b} (1+t)^a )    (b < 0).    (5.4.2)
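Formulas (5.4.1) and (5.4.2) can be exercised on small instances. The sketch below (our code; it assumes only the two formulas as stated) takes f_k = 1 in both and compares coefficient extraction from the resulting generating functions, computed with exact rational series arithmetic, against direct summation:

```python
from fractions import Fraction
from math import comb

N = 16  # truncation order for series coefficients

def mul(p, q):
    """Product of truncated power series (coefficient lists of length N)."""
    r = [Fraction(0)] * N
    for i, pi in enumerate(p):
        if pi:
            for j, qj in enumerate(q):
                if i + j < N:
                    r[i + j] += pi * qj
    return r

geo = [Fraction(1)] * N                    # 1/(1 - t)
two = [Fraction(2 ** i) for i in range(N)] # 1/(1 - 2t)

# (5.4.1) with a = 0, b = 1 and f_k = 1, so f(y) = 1/(1 - y):
# sum_k C(n, m+k) = [t^n] t^m/(1-t)^{m+1} * (1-t)/(1-2t) = [t^n] t^m / ((1-t)^m (1-2t))
for m in range(4):
    g = two
    for _ in range(m):
        g = mul(g, geo)                    # 1 / ((1-t)^m (1-2t))
    for n in range(m, N):
        assert sum(comb(n, m + k) for k in range(n - m + 1)) == g[n - m]

# (5.4.2) with a = 0, b = -1 and f_k = 1, so f(t) = 1/(1 - t):
# sum_k C(n, m-k) = [t^m] (1+t)^n / (1-t)
for n in range(8):
    row = [Fraction(comb(n, j)) if j <= n else Fraction(0) for j in range(N)]
    g = mul(row, geo)
    for m in range(N):
        assert g[m] == sum(comb(n, m - k) for k in range(m + 1) if m - k <= n)
```

Both checks reduce to partial row sums of the Pascal triangle, which is the point: the Riordan-array machinery reproduces them mechanically from the generating functions.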
If m and n are independent of each other, these relations can also be stated as generating function identities. The binomial coefficient \binom{n+ak}{m+bk} is so general that a large number of combinatorial sums can be solved by means of the two formulas (5.4.1) and (5.4.2).

Let us begin our set of examples with a simple sum; by the theorem above, the binomial coefficient \binom{n-k}{m} corresponds to the Riordan array (t^m/(1-t)^{m+1}, 1); therefore, by the formula concerning the row sums, we have:

    \sum_k \binom{n-k}{m} = [t^n] \frac{t^m}{(1-t)^{m+1}} \frac{1}{1-t} = [t^{n-m}] \frac{1}{(1-t)^{m+2}} = \binom{n+1}{m+1}.

Another simple example is the sum:

    \sum_k \binom{n}{2k+1} 5^k = [t^n] [ \frac{t}{(1-t)^2} \frac{1}{1-5y} | y = \frac{t^2}{(1-t)^2} ] = \frac{1}{2} [t^n] \frac{2t}{1-2t-4t^2} = 2^{n-1} F_n.

The following sum is a more interesting case. From the generating function of the Catalan numbers we immediately find:

    \sum_k \binom{n+k}{m+2k} \binom{2k}{k} \frac{(-1)^k}{k+1} = [t^n] [ \frac{t^m}{(1-t)^{m+1}} \frac{\sqrt{1+4y} - 1}{2y} | y = \frac{t}{(1-t)^2} ] =
      = [t^{n-m}] \frac{1}{(1-t)^{m+1}} ( \sqrt{1 + \frac{4t}{(1-t)^2}} - 1 ) \frac{(1-t)^2}{2t} = [t^{n-m}] \frac{1}{(1-t)^m} = \binom{n-1}{m-1}.

In the following sum we use the bisection formulas. Because the generating function for \binom{z+1}{k} 2^k is (1+2t)^{z+1}, we have:

    G( \binom{z+1}{2k+1} 2^{2k+1} ) = \frac{1}{2\sqrt{t}} ( (1+2\sqrt{t})^{z+1} - (1-2\sqrt{t})^{z+1} ).

By applying formula (5.4.2):

    \sum_k \binom{z+1}{2k+1} \binom{z-2k}{n-k} 2^{2k+1} = [t^n] (1+t)^z [ \frac{(1+2\sqrt{y})^{z+1}}{2\sqrt{y}} - \frac{(1-2\sqrt{y})^{z+1}}{2\sqrt{y}} | y = \frac{t}{(1+t)^2} ] =
      = [t^n] (1+t)^{z+1} ( \frac{(1+t+2\sqrt{t})^{z+1}}{2\sqrt{t}(1+t)^{z+1}} - \frac{(1+t-2\sqrt{t})^{z+1}}{2\sqrt{t}(1+t)^{z+1}} ) = [t^{2n+1}] (1+t)^{2z+2} = \binom{2z+2}{2n+1};

in the last but one passage we used the bisection rule backwards, since (1+t \pm 2\sqrt{t})^{z+1} = (1 \pm \sqrt{t})^{2z+2}.

We solve the following sum by using (5.4.2):

    \sum_k \binom{2n-2k}{m-k} \binom{n}{k} (-2)^k = [t^m] (1+t)^{2n} [ (1-2y)^n | y = \frac{t}{(1+t)^2} ] = [t^m] (1+t^2)^n = \binom{n}{m/2},

where the binomial coefficient is to be taken as zero when m is odd.

5.5 Other Riordan arrays from binomial coefficients

Other Riordan arrays can be found by using the theorem in the previous section and the rule (valid for \alpha \neq 0):

    \frac{\alpha \pm \beta}{\alpha} \binom{\alpha}{\beta} = \binom{\alpha}{\beta} \pm \binom{\alpha-1}{\beta-1}.

For example, we find:

    \frac{2n}{n+k} \binom{n+k}{2k} = \frac{2n}{n+k} \binom{n+k}{n-k} = \binom{n+k}{n-k} + \binom{n+k-1}{n-k-1} = \binom{n+k}{2k} + \binom{n-1+k}{2k}.

Hence, by formula (5.4.1), we have:

    \sum_k \frac{2n}{n+k} \binom{n+k}{n-k} f_k = \sum_k \binom{n+k}{2k} f_k + \sum_k \binom{n-1+k}{2k} f_k =
      = [t^n] \frac{1}{1-t} f( \frac{t}{(1-t)^2} ) + [t^{n-1}] \frac{1}{1-t} f( \frac{t}{(1-t)^2} ) = [t^n] \frac{1+t}{1-t} f( \frac{t}{(1-t)^2} ).
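Each of the closed forms above can be confirmed by brute-force summation. The following sketch (our code, with ranges kept small) checks all of them, plus one instance of the last formula: taking f_k = 1 gives \sum_k (2n/(n+k)) \binom{n+k}{2k} = L_{2n}, the Lucas numbers, a consequence not stated in the text but easily derived from the formula:

```python
from fractions import Fraction
from math import comb

def fib(n):                      # Fibonacci numbers, F_1 = F_2 = 1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# sum_k C(n-k, m) = C(n+1, m+1)
for n in range(12):
    for m in range(n + 1):
        assert sum(comb(n - k, m) for k in range(n - m + 1)) == comb(n + 1, m + 1)

# sum_k C(n, 2k+1) 5^k = 2^{n-1} F_n
for n in range(1, 14):
    assert sum(comb(n, 2*k + 1) * 5**k for k in range(n)) == 2**(n - 1) * fib(n)

# sum_k C(n+k, m+2k) C(2k, k) (-1)^k/(k+1) = C(n-1, m-1)
for n in range(1, 10):
    for m in range(1, n + 1):
        s = sum(comb(n + k, m + 2*k) * (comb(2*k, k) // (k + 1)) * (-1)**k
                for k in range(n - m + 1))
        assert s == comb(n - 1, m - 1)

# sum_k C(z+1, 2k+1) C(z-2k, n-k) 2^{2k+1} = C(2z+2, 2n+1)
for z in range(1, 9):
    for n in range(z + 1):
        s = sum(comb(z + 1, 2*k + 1) * comb(z - 2*k, n - k) * 2**(2*k + 1)
                for k in range(min(n, z // 2) + 1))
        assert s == comb(2*z + 2, 2*n + 1)

# sum_k C(2n-2k, m-k) C(n, k) (-2)^k = C(n, m/2), zero for odd m
for n in range(1, 9):
    for m in range(2*n + 1):
        s = sum(comb(2*n - 2*k, m - k) * comb(n, k) * (-2)**k
                for k in range(min(n, m) + 1))
        assert s == (comb(n, m // 2) if m % 2 == 0 else 0)

# instance of the last formula with f_k = 1 (our observation, not the book's):
# sum_k (2n/(n+k)) C(n+k, 2k) = L_{2n}, the Lucas numbers
L = [2, 1]
for i in range(2, 20):
    L.append(L[-1] + L[-2])
for n in range(1, 9):
    s = sum(Fraction(2*n, n + k) * comb(n + k, 2*k) for k in range(n + 1))
    assert s == L[2*n]
```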
This proves that the infinite triangle of the elements \frac{2n}{n+k}\binom{n+k}{2k} is a proper Riordan array, and many identities can be proved by means of the previous formula. For example:

    \sum_k \frac{2n}{n+k} \binom{n+k}{n-k} \binom{2k}{k} (-1)^k = [t^n] [ \frac{1+t}{1-t} \frac{1}{\sqrt{1+4y}} | y = \frac{t}{(1-t)^2} ] = [t^n] 1 = \delta_{n,0},

    \sum_k \frac{2n}{n+k} \binom{n+k}{n-k} \binom{2k}{k} \frac{(-1)^k}{k+1} = [t^n] [ \frac{1+t}{1-t} \frac{\sqrt{1+4y} - 1}{2y} | y = \frac{t}{(1-t)^2} ] = [t^n] (1+t) = \delta_{n,0} + \delta_{n,1}.

The following is a quite different case. Let f(t) = G(f_k) and:

    G(t) = G( \frac{f_k}{k} ) = \int_0^t \frac{f(\tau) - f_0}{\tau} d\tau.

Obviously we have:

    \frac{n}{n-k} \binom{n-k}{k} = \frac{n}{k} \binom{n-k-1}{k-1},

except for k = 0, when the left-hand side is 1 and the right-hand side is not defined. By formula (5.4.1):

    \sum_k \frac{n}{n-k} \binom{n-k}{k} f_k = f_0 + n \sum_{k=1}^{\infty} \binom{n-k-1}{k-1} \frac{f_k}{k} = f_0 + n [t^n] G( \frac{t^2}{1-t} ).    (5.5.1)

This gives an immediate proof of the following formula, known as Hardy's identity (n >= 1):

    \sum_k \frac{n}{n-k} \binom{n-k}{k} (-1)^k = 1 + n [t^n] \ln \frac{1-t^2}{1+t^3} =
      = 2(-1)^n, if 3 divides n, and (-1)^{n-1} otherwise.

We also immediately obtain:

    \sum_k \frac{1}{n-k} \binom{n-k}{k} = \frac{\phi^n + \hat\phi^n}{n},

where \phi is the golden ratio and \hat\phi = -\phi^{-1}. The reader can generalize formula (5.5.1) by using the change of variable t \to pt and prove other formulas. The following one is known as Riordan's old identity:

    \sum_k \frac{n}{n-k} \binom{n-k}{k} (a+b)^{n-2k} (-ab)^k = a^n + b^n,

while this is a generalization of Hardy's identity:

    \sum_k \frac{n}{n-k} \binom{n-k}{k} x^{n-2k} (-1)^k = \frac{(x + \sqrt{x^2-4})^n + (x - \sqrt{x^2-4})^n}{2^n}.

5.6 Binomial coefficients and the LIF

In a few cases only, the formulas of the previous sections give the desired result when the m and n in the numerator and denominator of a binomial coefficient are related between them. In fact, in that case, we have to extract the coefficient of t^n from a function depending on the same variable n (or m). This requires applying the Lagrange Inversion Formula, according to the diagonalization rule. Let us suppose we have the binomial coefficient \binom{2n-k}{n-k} and we wish to know whether it corresponds to a Riordan array or not. We have:

    \binom{2n-k}{n-k} = [t^{n-k}] (1+t)^{2n-k} = [t^n] (1+t)^{2n} ( \frac{t}{1+t} )^k.

The function (1+t)^{2n} cannot be assumed as the d(t) function of a Riordan array, because it varies as n varies. Therefore, let us suppose that k is fixed; we can apply the diagonalization rule with F(t) = (t/(1+t))^k and \phi(t) = (1+t)^2, and try to find a true generating function. We have to solve the equation:

    w = t \phi(w),   or   w = t(1+w)^2.
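Both the identities just proved and the functional equation w = t(1+w)^2 admit quick machine checks. The sketch below (our helper names; L_n denotes the Lucas numbers, for which \phi^n + \hat\phi^n = L_n) verifies Hardy's identity, the golden-ratio sum and Riordan's old identity by direct summation, and then solves w = t(1+w)^2 by fixed-point iteration on truncated series; the coefficients of the solution turn out to be Catalan numbers, i.e. w(t) = C(t) - 1, a fact not stated in the text but implied by C(t) = 1 + tC(t)^2:

```python
from fractions import Fraction
from math import comb

# --- identities of the previous section, checked by brute force ---
def S(n, term):
    """sum_k (n/(n-k)) C(n-k, k) term(k), the left-hand side shape of (5.5.1)."""
    return sum(Fraction(n, n - k) * comb(n - k, k) * term(k)
               for k in range(n // 2 + 1))

for n in range(1, 20):                      # Hardy's identity
    expected = 2 * (-1)**n if n % 3 == 0 else (-1)**(n - 1)
    assert S(n, lambda k: (-1)**k) == expected

L = [2, 1]                                  # Lucas numbers
for n in range(2, 20):
    L.append(L[-1] + L[-2])
for n in range(1, 20):
    assert S(n, lambda k: 1) == L[n]        # hence sum_k C(n-k,k)/(n-k) = L_n/n

for a in range(1, 4):                       # Riordan's old identity
    for b in range(1, 4):
        for n in range(1, 10):
            assert S(n, lambda k: (a + b)**(n - 2*k) * (-a*b)**k) == a**n + b**n

# --- the equation w = t (1+w)^2, solved by fixed-point iteration ---
N = 10
def mul(p, q):
    r = [Fraction(0)] * N
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j < N:
                r[i + j] += pi * qj
    return r

w = [Fraction(0)] * N
for _ in range(N):                          # each pass fixes at least one more coefficient
    onew = [Fraction(1) + w[0]] + w[1:]     # 1 + w
    sq = mul(onew, onew)                    # (1 + w)^2
    w = [Fraction(0)] + sq[:-1]             # t (1 + w)^2
assert w[:6] == [0, 1, 2, 5, 14, 42]        # Catalan numbers: w(t) = C(t) - 1
```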
The equation w = t(1+w)^2 can be written as tw^2 - (1-2t)w + t = 0, and we are looking for the unique solution w = w(t) such that w(0) = 0. This is:

    w(t) = \frac{1 - 2t - \sqrt{1-4t}}{2t}.

We now perform the necessary computations:

    F(w) = ( \frac{w}{1+w} )^k = ( \frac{1 - 2t - \sqrt{1-4t}}{1 - \sqrt{1-4t}} )^k = ( \frac{1 - \sqrt{1-4t}}{2} )^k;

furthermore:

    \frac{1}{1 - t\phi'(w)} = \frac{1}{1 - 2t(1+w)} = \frac{1}{\sqrt{1-4t}}.

Therefore, the diagonalization gives:

    \binom{2n-k}{n-k} = [t^n] \frac{1}{\sqrt{1-4t}} ( \frac{1 - \sqrt{1-4t}}{2} )^k.

This shows that the binomial coefficient is the generic element of the Riordan array:

    D = ( \frac{1}{\sqrt{1-4t}}, \frac{1 - \sqrt{1-4t}}{2t} ).

As a check, we observe that column 0 contains all the elements with k = 0, i.e., \binom{2n}{n}, and this is in accordance with the generating function d(t) = 1/\sqrt{1-4t}. A simple example is:

    \sum_{k=0}^{n} \binom{2n-k}{n-k} 2^k = [t^n] [ \frac{1}{\sqrt{1-4t}} \frac{1}{1-2y} | y = \frac{1 - \sqrt{1-4t}}{2} ] = [t^n] \frac{1}{\sqrt{1-4t}} \frac{1}{\sqrt{1-4t}} = [t^n] \frac{1}{1-4t} = 4^n.

By using the diagonalization rule as above, we can show that:

    ( \binom{2n+ak}{n-ck} )_{k \in N} = ( \frac{1}{\sqrt{1-4t}}, t^{c-1} ( \frac{1 - \sqrt{1-4t}}{2t} )^{a+2c} ).

An interesting example is given by the following alternating sum:

    \sum_k (-1)^k \binom{2n}{n-3k} = [t^n] [ \frac{1}{\sqrt{1-4t}} \frac{1}{1+y} | y = t^3 ( \frac{1 - \sqrt{1-4t}}{2t} )^6 ] =
      = [t^n] ( \frac{1}{2\sqrt{1-4t}} + \frac{1-t}{2(1-3t)} ) = \frac{1}{2} \binom{2n}{n} + 3^{n-1} + \frac{\delta_{n,0}}{6}.

The reader is invited to solve, in a similar way, the corresponding non-alternating sum.

In the same way we can deal with binomial coefficients of the form \binom{pn+ak}{n-ck}, but in this case, in order to apply the LIF, we have to solve an equation of degree p > 2. This creates many difficulties, and we do not insist on it any longer.

5.7 Coloured walks

In the section "Walks, trees and Catalan numbers" we introduced the concept of a walk or path on the integral lattice Z^2. The concept can be generalized by defining a walk as a sequence of steps starting from the origin and composed of three kinds of steps:

1. east steps, which go from the point (x, y) to (x+1, y);

2. diagonal steps, which go from the point (x, y) to (x+1, y+1);

3. north steps, which go from the point (x, y) to (x, y+1).

A colored walk is a walk in which every kind of step can assume different colors; we denote by a, b, c (a > 0, b, c >= 0) the number of colors the east, diagonal and north steps can assume. We discuss complete colored walks, i.e., walks without any restriction, and underdiagonal walks, i.e., walks that never go above the main diagonal x - y = 0. The length of a walk is the number of its steps, and we denote by d_{n,k} the number of colored walks which have length n and reach a distance k from the main diagonal, i.e., whose last step ends on the diagonal x - y = k >= 0. A colored walk problem is any (counting) problem corresponding to colored walks; a problem is called symmetric if and only if a = c.

We wish to point out that our considerations are by no means limited to walks on the integral lattice. Many combinatorial problems can be proved to be equivalent to some walk problems; bracketing problems are a typical example and, in fact, a vast literature exists on walk problems.

Let us consider d_{n+1,k+1}, i.e., the number of colored walks of length n+1 reaching the distance k+1 from the main diagonal. We observe that each walk is obtained in a unique way as:

1. a walk of length n reaching the distance k from the main diagonal, followed by any of the a east steps;

2. a walk of length n reaching the distance k+1 from the main diagonal, followed by any of the b diagonal steps;

3. a walk of length n reaching the distance k+2 from the main diagonal, followed by any of the c north steps.

Hence we have: d_{n+1,k+1} = a d_{n,k} + b d_{n,k+1} + c d_{n,k+2}. This proves that A = {a, b, c} is the A-sequence of (d_{n,k})_{n,k \in N}, which therefore is a proper Riordan array. This significant fact can be stated as follows.
Theorem 5.7.1  Let d_{n,k} be the number of colored walks of length n reaching a distance k from the main diagonal; then the infinite triangle (d_{n,k})_{n,k \in N} is a proper Riordan array.

The Pascal, Catalan and Motzkin triangles define walking problems that have different values of a, b, c. When c = 0, it is easily proved that d_{n,k} = \binom{n}{k} a^k b^{n-k}, and so we end up with the Pascal triangle. Consequently, we assume c \neq 0. For any given triple (a, b, c) we obtain one type of array from complete walks and another from underdiagonal walks. However, the function h(t), which only depends on the A-sequence, is the same in both cases, and we can find it by means of formula (5.3.2). In fact, A(t) = a + bt + ct^2, and h(t) is the solution of the functional equation h(t) = a + bth(t) + ct^2 h(t)^2 having h(0) \neq 0:

    h(t) = \frac{1 - bt - \sqrt{1 - 2bt + b^2 t^2 - 4act^2}}{2ct^2}.    (5.7.1)

The radicand 1 - 2bt + (b^2 - 4ac)t^2 = (1 - (b + 2\sqrt{ac})t)(1 - (b - 2\sqrt{ac})t) will be simply denoted by \Delta.

Let us now focus our attention on underdiagonal walks. If we consider d_{n+1,0}, we observe that every walk returning to the main diagonal can only be obtained from another walk returning to the main diagonal followed by any diagonal step, or from a walk ending at distance 1 from the main diagonal followed by any north step. Hence we have d_{n+1,0} = b d_{n,0} + c d_{n,1}, and in terms of the column generating functions this corresponds to d(t) - 1 = btd(t) + ctd(t)th(t). From this relation we easily find d(t) = (1/a)h(t), and therefore by (5.7.1) the Riordan array of underdiagonal colored walks is:

    (d_{n,k})_{n,k \in N} = ( \frac{1 - bt - \sqrt{\Delta}}{2act^2}, \frac{1 - bt - \sqrt{\Delta}}{2ct^2} ).

In the current literature, major importance is usually given to the following three quantities:

1. the number of walks returning to the main diagonal; this is d_n = [t^n] d(t), for every n;

2. the total number of walks of length n; this is \alpha_n = \sum_{k=0}^{n} d_{n,k}, i.e., the value of the row sums of the Riordan array;

3. the average distance from the main diagonal of all the walks of length n; this is \delta_n = \sum_{k=0}^{n} k d_{n,k}, which is the weighted row sum of the Riordan array, divided by \alpha_n.

In Chapter 7 we will learn how to find an asymptotic approximation for d_n. With regard to the last two points, the formulas for the row sums and the weighted row sums given in the first section allow us to find the generating functions \alpha(t) of the total number \alpha_n of underdiagonal walks of length n, and \delta(t) of the total distance \delta_n of these walks from the main diagonal:

    \alpha(t) = \frac{1}{2at} \frac{1 - (b+2a)t - \sqrt{\Delta}}{(a+b+c)t - 1}

    \delta(t) = \frac{1}{4at} ( \frac{1 - (b+2a)t - \sqrt{\Delta}}{(a+b+c)t - 1} )^2.

In the symmetric case these formulas simplify as follows:

    \alpha(t) = \frac{1}{2at} ( \sqrt{\frac{1 - (b-2a)t}{1 - (b+2a)t}} - 1 )

    \delta(t) = \frac{1}{2at} ( \frac{1 - bt}{1 - (b+2a)t} - \sqrt{\frac{1 - (b-2a)t}{1 - (b+2a)t}} ).

The alternating row sums and the diagonal sums sometimes have some combinatorial significance as well, and so they can be treated in the same way.

The study of complete walks follows the same lines, and we only have to derive the form of the corresponding Riordan array, which is:

    (d_{n,k})_{n,k \in N} = ( \frac{1}{\sqrt{\Delta}}, \frac{1 - bt - \sqrt{\Delta}}{2ct^2} ).

The proof is as follows. Since a complete walk can go above the main diagonal, the array (d_{n,k})_{n,k \in N} is only the right part of an infinite triangle in which k can also assume negative values. By following the logic of the theorem above, we see that the generating function of the nth row is ((c/w) + b + aw)^n, and therefore the bivariate generating function of the extended triangle is:

    d(t, w) = \sum_n ( aw + b + \frac{c}{w} )^n t^n = \frac{1}{1 - (aw + b + c/w)t}.

If we expand this expression by partial fractions, we get:

    d(t, w) = \frac{1}{\sqrt{\Delta}} ( \frac{1}{1 - \frac{1 - bt - \sqrt{\Delta}}{2ct} w} - \frac{1}{1 - \frac{1 - bt + \sqrt{\Delta}}{2ct} w} ) =
      = \frac{1}{\sqrt{\Delta}} ( \frac{1}{1 - \frac{1 - bt - \sqrt{\Delta}}{2ct} w} + \frac{1 - bt - \sqrt{\Delta}}{2at} \frac{1}{w} \frac{1}{1 - \frac{1 - bt - \sqrt{\Delta}}{2at} \frac{1}{w}} ).
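The walk interpretation suggests an independent check: build the triangle d_{n,k} by direct dynamic programming over the three step types and compare against known special cases. In the sketch below (our code), with (a, b, c) = (1, 0, 1) the walks are prefixes of Dyck paths, so returns to the diagonal are counted by Catalan numbers and the row sums \alpha_n are the central coefficients \binom{n}{\lfloor n/2 \rfloor}; with (a, b, c) = (1, 1, 1) the diagonal returns are Motzkin numbers; Rogers' recurrence is also verified for an asymmetric triple:

```python
from math import comb

def underdiagonal(a, b, c, nmax):
    """d[n][k] = number of colored underdiagonal walks of length n ending at
    distance k from the diagonal (east: k+1, diagonal: k, north: k-1)."""
    d = [[0] * (nmax + 2) for _ in range(nmax + 1)]
    d[0][0] = 1
    for n in range(nmax):
        for k in range(n + 1):
            w = d[n][k]
            if w:
                d[n + 1][k + 1] += a * w          # any of the a east steps
                d[n + 1][k] += b * w              # any of the b diagonal steps
                if k > 0:
                    d[n + 1][k - 1] += c * w      # north steps stay underdiagonal
    return d

# (a, b, c) = (1, 0, 1): d_n are Catalan numbers, alpha_n = C(n, floor(n/2))
d = underdiagonal(1, 0, 1, 12)
for m in range(7):
    assert d[2 * m][0] == comb(2 * m, m) // (m + 1)
for n in range(13):
    assert sum(d[n]) == comb(n, n // 2)

# (a, b, c) = (1, 1, 1): returns to the diagonal are Motzkin numbers
d = underdiagonal(1, 1, 1, 6)
assert [d[n][0] for n in range(7)] == [1, 1, 2, 4, 9, 21, 51]

# Rogers' recurrence d_{n+1,k+1} = a d_{n,k} + b d_{n,k+1} + c d_{n,k+2}
a, b, c = 2, 1, 3
d = underdiagonal(a, b, c, 10)
for n in range(10):
    for k in range(n + 1):
        assert d[n + 1][k + 1] == a * d[n][k] + b * d[n][k + 1] + c * d[n][k + 2]
```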
In this partial fraction expansion, the first term represents the right part of the extended triangle, corresponding to k >= 0, whereas the second term corresponds to the left part (k < 0). We are interested in the right part, and the corresponding expression can be written as:

    \frac{1}{\sqrt{\Delta}} \frac{1}{1 - \frac{1 - bt - \sqrt{\Delta}}{2ct} w} = \frac{1}{\sqrt{\Delta}} \sum_k ( \frac{1 - bt - \sqrt{\Delta}}{2ct} )^k w^k,

which immediately gives the form of the Riordan array.

5.8 Stirling numbers and Riordan arrays

The connection between Riordan arrays and Stirling numbers is not immediate. If we examine the two infinite triangles of the Stirling numbers of both kinds, we immediately realize that they are not Riordan arrays. It is not difficult to obtain the column generating functions for the Stirling numbers of the second kind; by starting with the recurrence relation:

    {n+1 \brace k+1} = (k+1) {n \brace k+1} + {n \brace k}

and the obvious generating function S_0(t) = 1, we can specialize the recurrence (valid for every n \in N) to the case k = 0. This gives the relation between generating functions:

    \frac{S_1(t) - S_1(0)}{t} = S_1(t) + S_0(t);

because S_1(0) = 0, we immediately obtain S_1(t) = t/(1-t), which is easily checked by looking at column 1 in the array. In a similar way, by specializing the recurrence relation to k = 1, we find S_2(t) = 2t S_2(t) + t S_1(t), whose solution is:

    S_2(t) = \frac{t^2}{(1-t)(1-2t)}.

This proves, in an algebraic way, that {n \brace 2} = 2^{n-1} - 1, and it also indicates the form of the generating function for column m:

    S_m(t) = G( {n \brace m} )_{n \in N} = \frac{t^m}{(1-t)(1-2t) \cdots (1-mt)},

which is now proved by induction when we specialize the recurrence relation above to k = m. This is left to the reader as a simple exercise.

The generating functions for the Stirling numbers of the first kind are not so simple. However, let us go on with the Stirling numbers of the second kind, proceeding in the following way: if we multiply the recurrence relation by (k+1)!/(n+1)!, we obtain the new relation:

    \frac{(k+1)!}{(n+1)!} {n+1 \brace k+1} = \frac{(k+1)!}{n!} {n \brace k+1} \frac{k+1}{n+1} + \frac{k!}{n!} {n \brace k} \frac{k+1}{n+1}.

If we denote by d_{n,k} the quantity k! {n \brace k} / n!, this is a recurrence relation for d_{n,k}, which can be written as:

    (n+1) d_{n+1,k+1} = (k+1) d_{n,k+1} + (k+1) d_{n,k}.

Let us now proceed as above and find the column generating functions for the new array (d_{n,k})_{n,k \in N}. Obviously, d_0(t) = 1; by setting k = 0 in the new recurrence:

    (n+1) d_{n+1,1} = d_{n,1} + d_{n,0}

and passing to generating functions: d_1'(t) = d_1(t) + 1. The solution of this simple differential equation is d_1(t) = e^t - 1 (the reader can simply check this solution, if he or she prefers). We can now go on by setting k = 1 in the recurrence; we obtain (n+1) d_{n+1,2} = 2 d_{n,2} + 2 d_{n,1}, or d_2'(t) = 2 d_2(t) + 2(e^t - 1). Again, this differential equation has the solution d_2(t) = (e^t - 1)^2, and this suggests that, in general, we have d_k(t) = (e^t - 1)^k. A rigorous proof of this fact can be obtained by mathematical induction; the recurrence relation gives d_{k+1}'(t) = (k+1) d_{k+1}(t) + (k+1) d_k(t). By the induction hypothesis, we can substitute d_k(t) = (e^t - 1)^k and solve the differential equation thus obtained. In practice, we can simply verify that d_{k+1}(t) = (e^t - 1)^{k+1}; by substituting, we have:

    (k+1) e^t (e^t - 1)^k = (k+1)(e^t - 1)^{k+1} + (k+1)(e^t - 1)^k,

and this equality is obviously true. The form of this generating function:

    d_k(t) = G( \frac{k!}{n!} {n \brace k} )_{n \in N} = (e^t - 1)^k

proves that (d_{n,k})_{n,k \in N} is a Riordan array having d(t) = 1 and th(t) = e^t - 1. This fact allows us to prove algebraically a lot of identities concerning the Stirling numbers of the second kind, as we shall see in the next section.

For the Stirling numbers of the first kind we proceed in an analogous way, starting from the basic recurrence:

    {n+1 \brack k+1} = n {n \brack k+1} + {n \brack k}.
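The column generating functions (e^t - 1)^k can be checked against the Stirling recurrence directly. The sketch below (our code) expands (e^t - 1)^k as a truncated series with exact rationals, compares each coefficient with k! {n \brace k} / n!, and then verifies the rescaled recurrence:

```python
from fractions import Fraction
from math import factorial

N = 10  # check coefficients t^0 .. t^{N-1}

def mul(p, q):
    r = [Fraction(0)] * N
    for i, pi in enumerate(p):
        if pi:
            for j, qj in enumerate(q):
                if i + j < N:
                    r[i + j] += pi * qj
    return r

# Stirling numbers of the second kind from the basic recurrence
S = [[0] * (N + 1) for _ in range(N + 1)]
S[0][0] = 1
for n in range(1, N + 1):
    for k in range(1, n + 1):
        S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1]

em1 = [Fraction(0)] + [Fraction(1, factorial(m)) for m in range(1, N)]  # e^t - 1
p = [Fraction(1)] + [Fraction(0)] * (N - 1)                             # (e^t - 1)^0
for k in range(N):
    for n in range(N):
        # column generating function: k! {n brace k} / n! = [t^n] (e^t - 1)^k
        assert p[n] == Fraction(factorial(k) * S[n][k], factorial(n))
    p = mul(p, em1)

# the rescaled recurrence (n+1) d_{n+1,k+1} = (k+1)(d_{n,k+1} + d_{n,k})
d = [[Fraction(factorial(k) * S[n][k], factorial(n)) for k in range(N + 1)]
     for n in range(N + 1)]
for n in range(N):
    for k in range(N):
        assert (n + 1) * d[n + 1][k + 1] == (k + 1) * (d[n][k + 1] + d[n][k])
```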
We multiply the first-kind recurrence by (k+1)!/(n+1)! and study the quantity f_{n,k} = k! {n \brack k} / n!:

    \frac{(k+1)!}{(n+1)!} {n+1 \brack k+1} = \frac{(k+1)!}{n!} {n \brack k+1} \frac{n}{n+1} + \frac{k!}{n!} {n \brack k} \frac{k+1}{n+1},

that is:

    (n+1) f_{n+1,k+1} = n f_{n,k+1} + (k+1) f_{n,k}.

In this case also we have f_0(t) = 1, and by specializing the last relation to the case k = 0, we obtain:

    f_1'(t) = t f_1'(t) + f_0(t).

This is equivalent to f_1'(t) = 1/(1-t), and because f_1(0) = 0 we have:

    f_1(t) = \ln \frac{1}{1-t}.

By setting k = 1, we find the simple differential equation f_2'(t) = t f_2'(t) + 2 f_1(t), whose solution is:

    f_2(t) = ( \ln \frac{1}{1-t} )^2.

This suggests the general formula:

    f_k(t) = G( \frac{k!}{n!} {n \brack k} )_{n \in N} = ( \ln \frac{1}{1-t} )^k,

and again this can be proved by induction. In this case, (f_{n,k})_{n,k \in N} is the Riordan array having d(t) = 1 and th(t) = \ln(1/(1-t)).

5.9 Identities involving the Stirling numbers

The two recurrence relations for d_{n,k} and f_{n,k} do not give immediate evidence that the two triangles are indeed Riordan arrays, because they do not correspond to A-sequences. However, the A-sequences for the two arrays can be easily found, once we know their h(t) function. For the Stirling numbers of the first kind we have to solve the functional equation:

    \ln \frac{1}{1-t} = t A( \ln \frac{1}{1-t} ).

By setting y = \ln(1/(1-t)), or t = (e^y - 1)/e^y, we have A(y) = y e^y / (e^y - 1), and this is the generating function for the A-sequence we were looking for. In a similar way, we find that the A-sequence for the triangle related to the Stirling numbers of the second kind is:

    A(t) = \frac{t}{\ln(1+t)}.

A first result we obtain by using the correspondence between Stirling numbers and Riordan arrays concerns the row sums of the two triangles. For the Stirling numbers of the first kind we have:

    \sum_{k=0}^{n} {n \brack k} = n! \sum_{k=0}^{n} \frac{k!}{n!} {n \brack k} \frac{1}{k!} = n! [t^n] [ e^y | y = \ln \frac{1}{1-t} ] = n! [t^n] \frac{1}{1-t} = n!,

as we observed and proved in a combinatorial way. The row sums of the Stirling numbers of the second kind give, as we know, the Bell numbers; thus we can obtain the (exponential) generating function for these numbers:

    \sum_{k=0}^{n} {n \brace k} = n! \sum_{k=0}^{n} \frac{k!}{n!} {n \brace k} \frac{1}{k!} = n! [t^n] [ e^y | y = e^t - 1 ] = n! [t^n] \exp(e^t - 1);

therefore we have:

    G( \frac{B_n}{n!} ) = \exp(e^t - 1).

We also defined the ordered Bell numbers as O_n = \sum_{k=0}^{n} {n \brace k} k!; therefore we have:

    \frac{O_n}{n!} = \sum_{k=0}^{n} \frac{k!}{n!} {n \brace k} = [t^n] [ \frac{1}{1-y} | y = e^t - 1 ] = [t^n] \frac{1}{2 - e^t}.

We have thus obtained the exponential generating function:

    G( \frac{O_n}{n!} ) = \frac{1}{2 - e^t}.

Stirling numbers of the two kinds are related between them in various ways. For example, we have:

    \sum_k {n \brack k} {k \brace m} = \frac{n!}{m!} \sum_k \frac{k!}{n!} {n \brack k} \frac{m!}{k!} {k \brace m} =
      = \frac{n!}{m!} [t^n] [ (e^y - 1)^m | y = \ln \frac{1}{1-t} ] = \frac{n!}{m!} [t^n] \frac{t^m}{(1-t)^m} = \frac{n!}{m!} \binom{n-1}{m-1}.

Besides, two orthogonality relations exist between Stirling numbers. The first one is proved in this way:

    \sum_k {n \brack k} {k \brace m} (-1)^{n-k} = (-1)^n \frac{n!}{m!} \sum_k \frac{k!}{n!} {n \brack k} (-1)^k \frac{m!}{k!} {k \brace m} =
      = (-1)^n \frac{n!}{m!} [t^n] [ (e^{-y} - 1)^m | y = \ln \frac{1}{1-t} ] = (-1)^n \frac{n!}{m!} [t^n] (-t)^m = \delta_{n,m}.

The second orthogonality relation is proved in a similar way and reads:

    \sum_k {n \brace k} {k \brack m} (-1)^{n-k} = \delta_{n,m}.
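The row-sum, product and orthogonality results above lend themselves to direct verification. The following sketch (our code) builds both Stirling triangles from their recurrences and checks every identity on a small range:

```python
from math import comb, factorial

N = 9
# unsigned Stirling numbers of the first kind: [n k] = (n-1)[n-1 k] + [n-1 k-1]
c = [[0] * (N + 1) for _ in range(N + 1)]
c[0][0] = 1
# Stirling numbers of the second kind: {n k} = k {n-1 k} + {n-1 k-1}
S = [[0] * (N + 1) for _ in range(N + 1)]
S[0][0] = 1
for n in range(1, N + 1):
    for k in range(1, n + 1):
        c[n][k] = (n - 1) * c[n - 1][k] + c[n - 1][k - 1]
        S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1]

# row sums: n! for the first kind, Bell numbers for the second kind
bell = [1]
for n in range(N):
    bell.append(sum(comb(n, j) * bell[j] for j in range(n + 1)))
for n in range(N + 1):
    assert sum(c[n]) == factorial(n)
    assert sum(S[n]) == bell[n]

# sum_k [n k] {k m} = (n!/m!) C(n-1, m-1)
for n in range(1, N + 1):
    for m in range(1, n + 1):
        lhs = sum(c[n][k] * S[k][m] for k in range(N + 1))
        assert lhs == factorial(n) // factorial(m) * comb(n - 1, m - 1)

# the two orthogonality relations
for n in range(N + 1):
    for m in range(N + 1):
        assert sum(c[n][k] * S[k][m] * (-1)**(n - k)
                   for k in range(N + 1)) == (1 if n == m else 0)
        assert sum(S[n][k] * c[k][m] * (-1)**(n - k)
                   for k in range(N + 1)) == (1 if n == m else 0)
```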
$$\sum_k \left[{n\atop k}\right]\left\{{k\atop m}\right\}(-1)^{n-k} =
(-1)^n\frac{n!}{m!}\sum_k \frac{k!}{n!}\left[{n\atop k}\right](-1)^k\frac{m!}{k!}\left\{{k\atop m}\right\} =
(-1)^n\frac{n!}{m!}[t^n]\left[(e^{-y}-1)^m \,\Big|\, y = \ln\frac{1}{1-t}\right] =
(-1)^n\frac{n!}{m!}[t^n](-t)^m = \delta_{n,m}.$$

The second orthogonality relation is proved in a similar way and reads:

$$\sum_k \left\{{n\atop k}\right\}\left[{k\atop m}\right](-1)^{n-k} = \delta_{n,m}.$$

We introduced Stirling numbers by means of the Stirling identities relative to powers and falling factorials. We can now prove these identities by using a Riordan array approach. In fact:

$$\sum_{k=0}^n \left[{n\atop k}\right](-1)^{n-k} x^k =
(-1)^n n!\sum_{k=0}^n \frac{k!}{n!}\left[{n\atop k}\right](-1)^k\frac{x^k}{k!} =
(-1)^n n![t^n]\left[e^{-xy} \,\Big|\, y = \ln\frac{1}{1-t}\right] =
(-1)^n n![t^n](1-t)^x = n!\binom{x}{n} = x^{\underline{n}}$$

and:

$$\sum_{k=0}^n \left\{{n\atop k}\right\} x^{\underline{k}} =
n!\sum_{k=0}^n \frac{k!}{n!}\left\{{n\atop k}\right\}\binom{x}{k} =
n![t^n]\left[(1+y)^x \,\big|\, y = e^t-1\right] = n![t^n]e^{tx} = x^n.$$

We conclude this section by showing two possible connections between Stirling numbers and Bernoulli numbers. First we have:

$$\sum_{k=0}^n \left\{{n\atop k}\right\}\frac{(-1)^k k!}{k+1} =
n!\sum_{k=0}^n \frac{k!}{n!}\left\{{n\atop k}\right\}\frac{(-1)^k}{k+1} =
n![t^n]\left[-\frac{1}{y}\ln\frac{1}{1+y} \,\Big|\, y = e^t-1\right] =
n![t^n]\frac{t}{e^t-1} = B_n$$

which proves that Bernoulli numbers can be defined in terms of the Stirling numbers of the second kind. For the Stirling numbers of the first kind we have the identity:

$$\sum_{k=0}^n \left[{n\atop k}\right] B_k =
n!\sum_{k=0}^n \frac{k!}{n!}\left[{n\atop k}\right]\frac{B_k}{k!} =
n![t^n]\left[\frac{y}{e^y-1} \,\Big|\, y = \ln\frac{1}{1-t}\right] =
n![t^n]\frac{1-t}{t}\ln\frac{1}{1-t} =
n![t^{n+1}](1-t)\ln\frac{1}{1-t} = -\frac{(n-1)!}{n+1}.$$

Clearly, this holds for $n > 0$. For $n = 0$ we have:

$$\sum_{k=0}^0 \left[{0\atop k}\right] B_k = B_0 = 1.$$
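These relations are easy to check numerically. The short sketch below (the function names are ours, not part of the text) builds both kinds of Stirling numbers and the Bernoulli numbers from their standard recurrences, and tests the first orthogonality relation together with the two Bernoulli connections on small cases.

```python
from fractions import Fraction
from math import comb, factorial
from functools import lru_cache

@lru_cache(maxsize=None)
def s1(n, k):
    """Unsigned Stirling numbers of the first kind: [n,k] = [n-1,k-1] + (n-1)[n-1,k]."""
    if n == k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return s1(n - 1, k - 1) + (n - 1) * s1(n - 1, k)

@lru_cache(maxsize=None)
def s2(n, k):
    """Stirling numbers of the second kind: {n,k} = {n-1,k-1} + k*{n-1,k}."""
    if n == k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return s2(n - 1, k - 1) + k * s2(n - 1, k)

def bernoulli(n):
    """B_n (with B_1 = -1/2), from sum_{k<m} C(m+1,k) B_k = 0 seeded by B_0 = 1."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B[n]

# first orthogonality relation: sum_k (-1)^(n-k) [n,k]{k,m} = delta_{n,m}
orth = lambda n, m: sum((-1) ** (n - k) * s1(n, k) * s2(k, m) for k in range(n + 1))

# Bernoulli numbers out of the Stirling numbers of the second kind
bern2 = lambda n: sum(s2(n, k) * Fraction((-1) ** k * factorial(k), k + 1)
                      for k in range(n + 1))

# sum_k [n,k] B_k = -(n-1)!/(n+1), for n > 0
bern1 = lambda n: sum(s1(n, k) * bernoulli(k) for k in range(n + 1))
```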
Chapter 6

Formal methods

6.1 Formal languages

During the 1950s, the linguist Noam Chomsky introduced the concept of a formal language. Several definitions have to be provided before a precise statement of the concept can be given. Therefore, let us proceed in the following way.

First, we recall definitions given in Section 2.1. An alphabet is a finite set $A = \{a_1, a_2, \ldots, a_n\}$, whose elements are called symbols or letters. A word on $A$ is a finite sequence of symbols in $A$; the sequence is written by juxtaposing the symbols, and therefore a word $w$ is denoted by $w = a_{i_1}a_{i_2}\ldots a_{i_r}$, and $r = |w|$ is the length of the word. The empty sequence is called the empty word and is conventionally denoted by $\epsilon$; its length is obviously 0, and it is the only word of length 0.

The set of all the words on $A$, the empty word included, is indicated by $A^*$, and by $A^+$ if the empty word is excluded. Algebraically, $A^*$ is the free monoid on $A$, that is, the monoid freely generated by the symbols in $A$. To understand this point, let us consider the operation of juxtaposition and recursively apply it starting with the symbols in $A$. What we get are the words on $A$, and the juxtaposition can be seen as an operation between them. The algebraic structure thus obtained has the following properties:

1. associativity: $w_1(w_2w_3) = (w_1w_2)w_3 = w_1w_2w_3$;

2. $\epsilon$ is the identity or neutral element: $\epsilon w = w\epsilon = w$.

It is called a monoid, which, by construction, has been generated by combining the symbols in $A$ in all the possible ways. Because of that, $(A^*, \cdot)$, if $\cdot$ denotes the juxtaposition, is called the "free monoid" generated by $A$. Observe that a monoid is an algebraic structure more general than a group, in which all the elements have an inverse as well.

If $w \in A^*$ and $z$ is a word such that $w$ can be decomposed $w = w_1zw_2$ ($w_1$ and/or $w_2$ possibly empty), we say that $z$ is a subword of $w$; we also say that $z$ occurs in $w$, and the particular instance of $z$ in $w$ is called an occurrence of $z$ in $w$. Observe that if $z$ is a subword of $w$, it can have more than one occurrence in $w$. If $w = zw_2$, we say that $z$ is a head or prefix of $w$, and if $w = w_1z$, we say that $z$ is a tail or suffix of $w$. Finally, a language on $A$ is any subset $L \subseteq A^*$.

The basic definition concerning formal languages is the following: a grammar is a 4-tuple $G = (T, N, \sigma, \mathcal{P})$, where:

• $T = \{a_1, a_2, \ldots, a_n\}$ is the alphabet of terminal symbols;

• $N = \{\phi_1, \phi_2, \ldots, \phi_m\}$ is the alphabet of non-terminal symbols;

• $\sigma \in N$ is the initial symbol;

• $\mathcal{P}$ is a finite set of productions.

Usually, the symbols in $T$ are denoted by lower case Latin letters; the symbols in $N$ by Greek letters or by upper case Latin letters. A production is a pair $(z_1, z_2)$ of words in $(T \cup N)^*$, such that $z_1$ contains at least a symbol in $N$; the production is often indicated by $z_1 \to z_2$. If $w \in (T \cup N)^*$, we can apply a production $z_1 \to z_2 \in \mathcal{P}$ to $w$ whenever $w$ can be decomposed $w = w_1z_1w_2$, and the result is the new word $w_1z_2w_2 \in (T \cup N)^*$; we will write $w = w_1z_1w_2 \vdash w_1z_2w_2$ when $w_1z_1w_2$ is the decomposition of $w$ in which $z_1$ is the leftmost occurrence of $z_1$ in $w$; in other words, if we also have $w = \hat{w}_1z_1\hat{w}_2$, then $|w_1| < |\hat{w}_1|$.

Given a grammar $G = (T, N, \sigma, \mathcal{P})$, we define the relation $w \vdash \hat{w}$ between words $w, \hat{w} \in (T \cup N)^*$: the relation holds if and only if a production $z_1 \to z_2 \in \mathcal{P}$ exists such that $z_1$ occurs in $w$, $w = w_1z_1w_2$ is the leftmost occurrence of $z_1$ in $w$ and $\hat{w} = w_1z_2w_2$. We also denote by $\vdash^*$ the transitive closure of $\vdash$ and call it generation or derivation; this means that $w \vdash^* \hat{w}$ if and only if a sequence $(w = w_1, w_2, \ldots, w_s = \hat{w})$ exists such that $w_1 \vdash w_2$, $w_2 \vdash w_3$, $\ldots$, $w_{s-1} \vdash w_s$. We observe explicitly that, by our condition that in every production $z_1 \to z_2$ the word $z_1$ should contain
at least a symbol in $N$, if a word $w_i \in T^*$ is produced during a generation, it is terminal, i.e., the generation should stop. By collecting all these definitions, we finally define the language generated by the grammar $G$ as the set:

$$L(G) = \left\{\, w \in T^* \;\middle|\; \sigma \vdash^* w \,\right\}$$

i.e., a word $w \in T^*$ is in $L(G)$ if and only if we can generate it by starting with the initial symbol $\sigma$ and going on by applying the productions in $\mathcal{P}$ until $w$ is generated. At that moment, the generation stops. Note that, sometimes, the generation can go on forever, never generating a word on $T$; however, this is not a problem: it only means that such generations should be ignored.

6.2 Context-free languages

The definition of a formal language is quite general and it is possible to show that formal languages coincide with the class of "partially recursive sets", the largest class of sets which can be constructed recursively, i.e., in finite terms. This means that we can give rules to build such sets (e.g., we can give a grammar for them), but their construction can go on forever, so that, looking at them from another point of view, if we wish to know whether a word $w$ belongs to such a set $S$, we can be unlucky and an infinite process can be necessary to find out that $w \notin S$.

Because of that, people have studied more restricted classes of languages, for which a finite process is always possible for finding out whether $w$ belongs to the language or not. Surely, the most important class of this kind is the class of "context-free languages". They are defined in the following way. A context-free grammar is a grammar $G = (T, N, \sigma, \mathcal{P})$ in which all the productions $z_1 \to z_2$ in $\mathcal{P}$ are such that $z_1 \in N$. The name "context-free" derives from this definition, because a production $z_1 \to z_2$ is applied whenever the non-terminal symbol $z_1$ is the leftmost non-terminal symbol in a word, irrespective of the context in which it appears.

As a very simple example, let us consider the following grammar. Let $T = \{a, b\}$, $N = \{\sigma\}$; $\sigma$ is the initial symbol of the grammar, being the only non-terminal. The set $\mathcal{P}$ is composed of the two productions:

$$\sigma \to \epsilon \qquad\qquad \sigma \to a\sigma b\sigma.$$

This grammar is called the Dyck grammar and the language generated by it the Dyck language. In Figure 6.1 we draw the generation of some words in the Dyck language. The recursive nature of the productions allows us to prove properties of the Dyck language by means of mathematical induction:

Theorem 6.2.1 A word $w \in \{a, b\}^*$ belongs to the Dyck language $D$ if and only if:

i) the number of $a$'s in $w$ equals the number of $b$'s;

ii) in every prefix $z$ of $w$ the number of $a$'s is not less than the number of $b$'s.

Proof: Let $w \in D$; if $w = \epsilon$ nothing has to be proved. Otherwise, $w$ is generated by the second production and $w = aw_1bw_2$ with $w_1, w_2 \in D$; therefore, if we suppose that i) holds for $w_1$ and $w_2$, it also holds for $w$. For ii), any prefix $z$ of $w$ must have one of the forms: $a$, $az_1$ where $z_1$ is a prefix of $w_1$, $aw_1b$ or $aw_1bz_2$ where $z_2$ is a prefix of $w_2$. By the induction hypothesis, ii) should hold for $z_1$ and $z_2$, and therefore it is easily proved for $w$. Vice versa, let us suppose that i) and ii) hold for $w \in T^*$. If $w \ne \epsilon$, then by ii) $w$ should begin by $a$. Let us scan $w$ until we find the first occurrence of the symbol $b$ such that $w = aw_1bw_2$ and in $w_1$ the number of $b$'s equals the number of $a$'s. By i) such occurrence of $b$ must exist, and consequently $w_1$ and $w_2$ must satisfy condition i). Besides, if $w_1$ and $w_2$ are not empty, then they should satisfy condition ii), by the very construction of $w_1$ and the fact that $w$ satisfies condition ii) by hypothesis. We have thus obtained a decomposition of $w$ showing that the second production has been used. This completes the proof.

If we substitute the letter $a$ with the symbol '(' and the letter $b$ with the symbol ')', the theorem shows that the words in the Dyck language are the possible parenthetizations of an expression. Therefore, the number of Dyck words with $n$ pairs of parentheses is the Catalan number $\binom{2n}{n}/(n+1)$. We will see how this result can also be obtained by starting with the definition of the Dyck language and applying a suitable and mechanical method, known as the Schützenberger methodology or symbolic method. The method can be applied to every set of objects which are defined through a non-ambiguous context-free grammar.

A context-free grammar $G$ is ambiguous iff there exists a word $w \in L(G)$ which can be generated by two different leftmost derivations. In other words, a context-free grammar $H$ is non-ambiguous iff every word $w \in L(H)$ can be generated in one and only one way. An example of an ambiguous grammar is $G = (T, N, \sigma, \mathcal{P})$ where $T = \{1\}$, $N = \{\sigma\}$ and $\mathcal{P}$ contains the two productions:

$$\sigma \to 1 \qquad\qquad \sigma \to \sigma\sigma.$$

For example, the word 111 is generated by the two following leftmost derivations:

$$\sigma \vdash \sigma\sigma \vdash 1\sigma \vdash 1\sigma\sigma \vdash 11\sigma \vdash 111$$

$$\sigma \vdash \sigma\sigma \vdash \sigma\sigma\sigma \vdash 1\sigma\sigma \vdash 11\sigma \vdash 111.$$
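Theorem 6.2.1 turns membership in the Dyck language into a quick mechanical test. As an illustration (the code and its names are ours, not part of the text), the sketch below checks conditions i) and ii) directly and confirms by brute force that the number of Dyck words with $n$ pairs is the Catalan number.

```python
from itertools import product
from math import comb

def is_dyck(w):
    """Conditions i) and ii) of Theorem 6.2.1, for a word over {'a', 'b'}."""
    depth = 0
    for c in w:
        depth += 1 if c == 'a' else -1
        if depth < 0:            # some prefix has more b's than a's: violates ii)
            return False
    return depth == 0            # as many a's as b's in the whole word: condition i)

def count_dyck_words(pairs):
    """Brute-force count of Dyck words of length 2*pairs."""
    return sum(1 for w in product('ab', repeat=2 * pairs) if is_dyck(w))

def catalan(n):
    return comb(2 * n, n) // (n + 1)
```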
[Figure 6.1: The generation of some Dyck words — a tree rooted at $\sigma$, in which every node is rewritten by the two productions $\sigma \to \epsilon$ and $\sigma \to a\sigma b\sigma$, generating $\epsilon$, $ab$, $abab$, $aabb$, $\ldots$]

Instead, the Dyck grammar is non-ambiguous; in fact, as we have shown in the proof of the previous theorem, given any word $w \in D$, $w \ne \epsilon$, there is only one decomposition $w = aw_1bw_2$ having $w_1, w_2 \in D$; therefore, $w$ can only be generated in a single way. In general, if we show that any word in a context-free language $L(G)$, generated by some grammar $G$, has a unique decomposition according to the productions in $G$, then the grammar cannot be ambiguous. Because of the connection between the Schützenberger methodology and non-ambiguous context-free grammars, we are mainly interested in this kind of grammars. For the sake of completeness, a context-free language is called intrinsically ambiguous iff every context-free grammar generating it is ambiguous. This definition stresses the fact that, if a language is generated by an ambiguous grammar, it can also be generated by some non-ambiguous grammar, unless it is intrinsically ambiguous. It is possible to show that intrinsically ambiguous languages actually exist; fortunately, they are not very frequent. For example, the language generated by the previous ambiguous grammar is $\{1\}^+$, i.e., the set of all the words composed by any sequence of 1's, except the empty word. Actually, it is not an ambiguous language, and a non-ambiguous grammar generating it is given by the same $T, N, \sigma$ and the two productions:

$$\sigma \to 1 \qquad\qquad \sigma \to 1\sigma.$$

It is a simple matter to show that every word $11\ldots 1$ can be uniquely decomposed according to these productions.

6.3 Formal languages and programming languages

In 1960, the formal definition of the programming language ALGOL'60 was published. ALGOL'60 has surely been the most influential programming language ever created, although it was actually used only by a very limited number of programmers. Most of the concepts we now find in programming languages were introduced by ALGOL'60, of which, for example, PASCAL and C are direct derivations. Here, we are not interested in these aspects of ALGOL'60, but we wish to spend some words on how ALGOL'60 used context-free grammars to define its syntax in a formal and precise way. In practice, a program in ALGOL'60 is a word generated by a (rather complex) context-free grammar, whose initial symbol is ⟨program⟩.

The ALGOL'60 grammar used, as terminal symbol alphabet, the characters available on the standard keyboard of a computer; actually, they were the characters punchable on a card, the input medium used at that time to introduce a program into the computer. The non-terminal symbol notation was one of the most appealing inventions of ALGOL'60: the symbols were composed by entire English sentences enclosed by the two special parentheses ⟨ and ⟩. This allowed to clearly express the intended meaning of the non-terminal symbols. The previous example concerning ⟨program⟩ surely makes sense. Another technical device used by ALGOL'60 was the compaction of productions; if we had several productions with the same left hand symbol, $\beta \to w_1$, $\beta \to w_2$, $\ldots$, $\beta \to w_k$,
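For the ambiguous grammar $\sigma \to 1 \mid \sigma\sigma$, each leftmost derivation of $1^n$ corresponds to exactly one parse tree, so counting derivations amounts to counting trees; a short recursion does this (the code is our illustration, not from the text). The word 111 indeed has two derivations, and in general $1^n$ has $\binom{2n-2}{n-1}/n$ (a Catalan number) of them.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def leftmost_derivations(n):
    """Parse trees of the word 1^n under sigma -> 1 | sigma sigma
    (one tree per leftmost derivation)."""
    if n == 1:
        return 1                          # sigma -> 1
    # sigma -> sigma sigma: the left sigma yields 1^i, the right one 1^(n-i)
    return sum(leftmost_derivations(i) * leftmost_derivations(n - i)
               for i in range(1, n))
```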
they were written as a single rule:

$$\beta ::= w_1 \mid w_2 \mid \cdots \mid w_k$$

where ::= was a metasymbol denoting definition and | was read "or" to denote alternatives. This notation is usually called Backus Normal Form (BNF).

Just to give a very simple example, in Table 6.1 (lines 1 through 6) we show how integer numbers were defined. This definition avoids leading 0's in numbers, but allows both +0 and −0. The productions can be easily changed to avoid +0 or −0 or both. In the same table, line 7 shows the definition of the conditional statements.

This kind of definition gives a precise formulation of all the clauses in the programming language. Besides, since the program has a single generation according to the grammar, it is possible to find this derivation starting from the actual program and therefore give its exact structure. This allows to give precise information to the compiler, which, in a sense, is directed by the formal syntax of the language (syntax directed compilation).

A very interesting aspect is how this context-free grammar definition can avoid ambiguities in the interpretation of a program. Let us consider an expression like $a + b * c$; according to the rules of Algebra, the multiplication should be executed before the addition, and the computer must follow this convention in order to create no confusion. This is done by the simplified productions given by lines 8 through 11 in Table 6.1. The derivation of the simple expression $a + b * c$, or of a more complicated expression, reveals that it is decomposed into the sum of $a$ and $b * c$; this information is passed to the compiler and the multiplication is actually performed before the addition. If powers are also present, they are executed before products.

This ability of context-free grammars in designing the syntax of programming languages is very important, and after ALGOL'60 the syntax of every programming language has always been defined by context-free grammars. We conclude by remembering that a more sophisticated approach to the definition of programming languages was tried with ALGOL'68 by means of van Wijngaarden's grammars, but the method revealed itself too complex and was abandoned.

6.4 The symbolic method

Schützenberger's method allows us to obtain the counting generating function for every non-ambiguous language, starting with the corresponding non-ambiguous grammar and proceeding in a mechanical way. Let us begin with a simple example; Fibonacci words are the words on the alphabet $\{0, 1\}$ beginning and ending with the symbol 1 and never containing two consecutive 0's. For small values of $n$, Fibonacci words of length $n$ are easily displayed:

n = 1:  1
n = 2:  11
n = 3:  111, 101
n = 4:  1111, 1011, 1101
n = 5:  11111, 10111, 11011, 11101, 10101

If we count them by their length, we obtain the sequence $\{0, 1, 1, 2, 3, 5, 8, \ldots\}$, which is easily recognized as the Fibonacci sequence. In fact, a word of length $n$ is obtained by adding a trailing 1 to a word of length $n-1$, or by adding a trailing 01 to a word of length $n-2$. This immediately shows, in a combinatorial way, that Fibonacci words are counted by Fibonacci numbers. Besides, we get the productions of a non-ambiguous context-free grammar $G = (T, N, \sigma, \mathcal{P})$, where $T = \{0, 1\}$, $N = \{\phi\}$, $\sigma = \phi$ and $\mathcal{P}$ contains:

$$\phi \to 1 \qquad \phi \to \phi 1 \qquad \phi \to \phi 01$$

(these productions could have been written $\phi ::= 1 \mid \phi 1 \mid \phi 01$ by using the ALGOL'60 notations).

We are now going to obtain the counting generating function for Fibonacci words by applying Schützenberger's method. This consists in the following steps:

1. every non-terminal symbol $\sigma \in N$ is transformed into the name of its counting generating function $\sigma(t)$;

2. every terminal symbol is transformed into $t$;

3. the empty word is transformed into 1;

4. every | sign is transformed into a + sign, and ::= is transformed into an equal sign.

After having performed these transformations, we obtain a system of equations, which can be solved in the unknown generating functions introduced in the first step. They are the counting generating functions for the languages generated by the corresponding non-terminal symbols, when we consider them as the initial symbols.

The definition of the Fibonacci words produces:

$$\phi(t) = t + t\phi(t) + t^2\phi(t)$$

the solution of which is:

$$\phi(t) = \frac{t}{1 - t - t^2};$$

this is obviously the generating function for the Fibonacci numbers. Therefore, we have shown that the
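The result of the symbolic method can be double-checked mechanically. In the sketch below (our own illustration), the coefficients of $t/(1-t-t^2)$ are read off from $(1-t-t^2)\phi(t) = t$, i.e., $c_n = c_{n-1} + c_{n-2}$ with $c_0 = 0$, $c_1 = 1$, and are compared with a brute-force enumeration of the Fibonacci words themselves.

```python
from itertools import product

def count_fibonacci_words(n):
    """Brute force: binary words beginning and ending with 1, with no '00' factor."""
    def ok(w):
        return w[0] == '1' and w[-1] == '1' and '00' not in w
    return sum(1 for bits in product('01', repeat=n) if ok(''.join(bits)))

def phi_coefficients(n_max):
    """Taylor coefficients of phi(t) = t/(1 - t - t^2):
    (1 - t - t^2) phi(t) = t gives c_n = c_{n-1} + c_{n-2}, c_0 = 0, c_1 = 1."""
    c = [0, 1]
    while len(c) <= n_max:
        c.append(c[-1] + c[-2])
    return c[:n_max + 1]
```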
1   ⟨digit⟩ ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
2   ⟨non-zero digit⟩ ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
3   ⟨sequence of digits⟩ ::= ⟨digit⟩ | ⟨digit⟩⟨sequence of digits⟩
4   ⟨unsigned number⟩ ::= ⟨digit⟩ | ⟨non-zero digit⟩⟨sequence of digits⟩
5   ⟨signed number⟩ ::= +⟨unsigned number⟩ | −⟨unsigned number⟩
6   ⟨integer number⟩ ::= ⟨unsigned number⟩ | ⟨signed number⟩
7   ⟨conditional clause⟩ ::= if ⟨condition⟩ then ⟨instruction⟩ |
                             if ⟨condition⟩ then ⟨instruction⟩ else ⟨instruction⟩
8   ⟨expression⟩ ::= ⟨term⟩ | ⟨term⟩ + ⟨expression⟩
9   ⟨term⟩ ::= ⟨factor⟩ | ⟨term⟩ ∗ ⟨factor⟩
10  ⟨factor⟩ ::= ⟨element⟩ | ⟨factor⟩ ↑ ⟨element⟩
11  ⟨element⟩ ::= ⟨constant⟩ | ⟨variable⟩ | (⟨expression⟩)

Table 6.1: Context-free languages and programming languages

number of Fibonacci words of length $n$ is $F_n$, as we have already proved by combinatorial arguments.

In the case of the Dyck language, the definition yields:

$$\sigma(t) = 1 + t^2\sigma(t)^2$$

and therefore:

$$\sigma(t) = \frac{1 - \sqrt{1 - 4t^2}}{2t^2}.$$

Since every word in the Dyck language has an even length, the number of Dyck words with $2n$ symbols is just the $n$th Catalan number, and this also we knew by combinatorial means.

Another example is given by the Motzkin words; these are words on the alphabet $\{a, b, c\}$ in which $a, b$ act as parentheses, as in the Dyck language, while $c$ is free and can appear everywhere. Therefore, the definition of the language is:

$$\mu ::= \epsilon \mid c\mu \mid a\mu b\mu$$

if $\mu$ is the only non-terminal symbol. Schützenberger's method gives the equation:

$$\mu(t) = 1 + t\mu(t) + t^2\mu(t)^2$$

whose solution is easily found:

$$\mu(t) = \frac{1 - t - \sqrt{1 - 2t - 3t^2}}{2t^2}.$$

By expanding this function we find the sequence of Motzkin numbers, beginning:

n    0  1  2  3  4  5   6   7    8    9
Mn   1  1  2  4  9  21  51  127  323  835

These numbers count the so-called unary-binary trees, i.e., trees whose nodes have arity 1 or 2.
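The functional equation $\mu(t) = 1 + t\mu(t) + t^2\mu(t)^2$ determines the coefficients recursively, so the Motzkin numbers can be computed without the closed form; the sketch below (our own, with our naming) does this and recounts the Motzkin words by brute force.

```python
from itertools import product

def motzkin_numbers(n_max):
    """Coefficients of mu(t) from mu = 1 + t*mu + t^2*mu^2:
    M_0 = 1 and, for n >= 1, M_n = M_{n-1} + sum_{i+j=n-2} M_i M_j."""
    M = [1]
    for n in range(1, n_max + 1):
        conv = sum(M[i] * M[n - 2 - i] for i in range(n - 1))   # [t^(n-2)] of mu^2
        M.append(M[n - 1] + conv)
    return M

def count_motzkin_words(n):
    """Words over {a, b, c} in which a/b are balanced like parentheses, c is free."""
    def ok(w):
        depth = 0
        for ch in w:
            if ch == 'a':
                depth += 1
            elif ch == 'b':
                depth -= 1
                if depth < 0:
                    return False
        return depth == 0
    return sum(1 for w in product('abc', repeat=n) if ok(w))
```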
[Figure 6.2: An object grammar — a pictorial definition of the unary-binary (Motzkin) trees: a tree is either empty, or a root with one subtree, or a root with two subtrees, matching the productions $\mu ::= \epsilon \mid c\mu \mid a\mu b\mu$.]

number we are looking for. The equation is:

$$\phi(t, z) = t + t\phi(t, z) + t^2z\,\phi(t, z).$$

In fact, the production $\phi \to \phi 1$ increases by 1 the length of the words, but does not introduce any 0; the production $\phi \to \phi 01$ increases by 2 the length and introduces a zero. By solving this equation, we find:

$$\phi(t, z) = \frac{t}{1 - t - zt^2}.$$

We can now extract the coefficient $\phi_{n,k}$ of $t^nz^k$ in the following way:

$$\phi_{n,k} = [t^n][z^k]\frac{t}{1 - t - zt^2} =
[t^n][z^k]\frac{t}{1-t}\,\frac{1}{1 - z\frac{t^2}{1-t}} =
[t^n][z^k]\frac{t}{1-t}\sum_{k=0}^{\infty}\left(\frac{t^2}{1-t}\right)^k z^k =
[t^n]\frac{t^{2k+1}}{(1-t)^{k+1}} =
[t^{n-2k-1}]\frac{1}{(1-t)^{k+1}} = \binom{n-k-1}{k}.$$

Therefore, the number of Fibonacci words of length $n$ containing $k$ zeroes is counted by a binomial coefficient. The second expression in the derivation shows that the array $(\phi_{n,k})_{n,k\in\mathbb{N}}$ is indeed a Riordan array $(t/(1-t), t/(1-t))$, which is the Pascal triangle stretched vertically, i.e., column $k$ is shifted down by $k$ positions ($k+1$, in reality). The general formula we know for the row sums of a Riordan array gives:

$$\sum_k \phi_{n,k} = \sum_k\binom{n-k-1}{k} =
[t^n]\left[\frac{t}{1-t}\,\frac{1}{1-y}\,\Big|\, y = \frac{t^2}{1-t}\right] =
[t^n]\frac{t}{1-t-t^2} = F_n$$

as we were expecting. A more interesting problem is to find the average number of zeroes in all the Fibonacci words with $n$ letters. First, we count the total number of zeroes in all the words of length $n$:

$$\sum_k k\,\phi_{n,k} = \sum_k k\binom{n-k-1}{k} =
[t^n]\left[\frac{t}{1-t}\,\frac{y}{(1-y)^2}\,\Big|\, y = \frac{t^2}{1-t}\right] =
[t^n]\frac{t^3}{(1-t-t^2)^2}.$$

We extract the coefficient:

$$[t^n]\frac{t^3}{(1-t-t^2)^2} =
[t^{n-1}]\frac{1}{5}\left(\frac{1}{1-\phi t} - \frac{1}{1-\hat\phi t}\right)^2 =$$

$$= \frac{1}{5}[t^{n-1}]\frac{1}{(1-\phi t)^2} - \frac{2}{5}[t^n]\frac{t}{(1-\phi t)(1-\hat\phi t)} + \frac{1}{5}[t^{n-1}]\frac{1}{(1-\hat\phi t)^2} =$$

$$= \frac{1}{5}[t^{n-1}]\frac{1}{(1-\phi t)^2} - \frac{2}{5\sqrt5}[t^n]\frac{1}{1-\phi t} + \frac{2}{5\sqrt5}[t^n]\frac{1}{1-\hat\phi t} + \frac{1}{5}[t^{n-1}]\frac{1}{(1-\hat\phi t)^2}.$$

The last two terms are negligible because they rapidly tend to 0; therefore we have:

$$\sum_k k\,\phi_{n,k} \approx \frac{n}{5}\phi^{n-1} - \frac{2}{5\sqrt5}\phi^n.$$

To obtain the average number $Z_n$ of zeroes, we need to divide this quantity by $F_n \sim \phi^n/\sqrt5$, the total number of Fibonacci words of length $n$:

$$Z_n = \frac{\sum_k k\,\phi_{n,k}}{F_n} \sim \frac{n}{\sqrt5\,\phi} - \frac{2}{5} = \frac{5-\sqrt5}{10}\,n - \frac{2}{5}.$$

This shows that the average number of zeroes grows linearly with the length of the words and tends to become the 27.64% of this length, because $(5-\sqrt5)/10 \approx 0.2763932022\ldots$.

6.6 The Shift Operator

In the usual mathematical terminology, an operator is a mapping from some set $F_1$ of functions into some
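Both the bivariate coefficient $\phi_{n,k} = \binom{n-k-1}{k}$ and the asymptotic average can be confirmed by exhaustive enumeration; the sketch below is our own check, with our naming, and compares the exact average at a moderate length with the asymptotic formula.

```python
from itertools import product
from math import comb, sqrt

def fibonacci_words(n):
    """All Fibonacci words of length n (begin/end with 1, no '00' factor)."""
    return [''.join(b) for b in product('01', repeat=n)
            if b[0] == '1' and b[-1] == '1' and '00' not in ''.join(b)]

def phi(n, k):
    """Number of Fibonacci words of length n with exactly k zeroes."""
    return sum(1 for w in fibonacci_words(n) if w.count('0') == k)

N = 16
words = fibonacci_words(N)
average_zeroes = sum(w.count('0') for w in words) / len(words)
predicted = (5 - sqrt(5)) / 10 * N - 2 / 5
```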
other set of functions $F_2$. We have already encountered the operator $\mathcal{G}$, acting from the set of sequences (which are properly functions from $\mathbb{N}$ into $\mathbb{R}$ or $\mathbb{C}$) into the set of formal power series (analytic functions). Other usual examples are $D$, the operator of differentiation, and $\int$, the operator of indefinite integration. Since the middle of the 19th century, the English mathematicians (G. Boole in particular) introduced the concept of a finite operator, i.e., an operator acting in finite terms, in contrast to an infinitesimal operator, such as differentiation and integration. The simplest among the finite operators is the identity, denoted by $I$ or by 1, which does not change anything. Operationally, however, the most important finite operator is the shift operator, denoted by $E$, changing the value of any function $f$ at a point $x$ into the value of the same function at the point $x+1$. We write:

$$Ef(x) = f(x+1)$$

Unlike an infinitesimal operator such as $D$, a finite operator can be applied to a sequence as well, and in that case we can write:

$$Ef_n = f_{n+1}$$

This property is particularly interesting from our point of view but, in order to follow the usual notational conventions, we shall adopt the first, functional notation rather than the second one, which is more specific to sequences.

The shift operator can be iterated and, to denote the successive applications of $E$, we write $E^n$ instead of $EE\cdots E$ ($n$ times). So:

$$E^2f(x) = EEf(x) = Ef(x+1) = f(x+2)$$

and in general:

$$E^nf(x) = f(x+n)$$

Conventionally, we set $E^0 = I = 1$, and this is in accordance with the meaning of $I$:

$$E^0f(x) = f(x) = If(x)$$

We wish to observe that every recurrence can be written using the shift operator. For example, the recurrence for the Fibonacci numbers is:

$$E^2F_n = EF_n + F_n$$

and this can be written in the following way, separating the operator parts from the sequence parts:

$$(E^2 - E - I)F_n = 0$$

Some obvious properties of the shift operator are:

$$E(\alpha f(x) + \beta g(x)) = \alpha Ef(x) + \beta Eg(x)$$

$$E(f(x)g(x)) = Ef(x)\,Eg(x)$$

Hence, if $c$ is any constant (i.e., $c \in \mathbb{R}$ or $c \in \mathbb{C}$) we have $Ec = c$.

It is possible to consider negative powers of $E$ as well. So we have:

$$E^{-1}f(x) = f(x-1) \qquad E^{-n}f(x) = f(x-n)$$

and this is in accordance with the usual rules of powers:

$$E^nE^m = E^{n+m} \quad\text{for } n, m \in \mathbb{Z}$$

$$E^nE^{-n} = E^0 = I \quad\text{for } n \in \mathbb{Z}$$

Finite operators are commonly used in Numerical Analysis. In that case, an increment $h$ is defined and the shift operator acts according to this increment, i.e., $Ef(x) = f(x+h)$. When considering sequences, this makes no sense and we constantly use $h = 1$.

We can have occasion to use two or more shift operators, that is, shift operators related to different variables. We'll distinguish them by suitable subscripts:

$$E_xf(x, y) = f(x+1, y) \qquad E_yf(x, y) = f(x, y+1)$$

$$E_xE_yf(x, y) = E_yE_xf(x, y) = f(x+1, y+1)$$

6.7 The Difference Operator

The second most important finite operator is the difference operator $\Delta$; it is defined in terms of the shift and identity operators:

$$\Delta f(x) = (E-1)f(x) = Ef(x) - If(x) = f(x+1) - f(x)$$

Some simple examples are:

$$\Delta c = Ec - c = c - c = 0 \quad \forall c \in \mathbb{C}$$

$$\Delta x = Ex - x = x + 1 - x = 1$$

$$\Delta x^2 = Ex^2 - x^2 = (x+1)^2 - x^2 = 2x + 1$$

$$\Delta x^n = Ex^n - x^n = (x+1)^n - x^n = \sum_{k=0}^{n-1}\binom{n}{k}x^k$$

$$\Delta\frac1x = \frac{1}{x+1} - \frac1x = -\frac{1}{x(x+1)}$$

$$\Delta\binom{x}{m} = \binom{x+1}{m} - \binom{x}{m} = \binom{x}{m-1}$$

The last example makes use of the recurrence relation for binomial coefficients. An important observation
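The finite operators act on ordinary functions in any language that treats functions as values; the following sketch (our own, with our naming) builds $E$ and $\Delta$ as higher-order Python functions and checks the examples above, including $(E^2 - E - I)F_n = 0$.

```python
from math import comb

def E(f):
    """Shift operator: (Ef)(x) = f(x + 1)."""
    return lambda x: f(x + 1)

def delta(f):
    """Difference operator: (Delta f)(x) = f(x + 1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

def iterate(op, n, f):
    """Apply the operator op to f, n times (op^n f)."""
    for _ in range(n):
        f = op(f)
    return f

def fib(n):
    """Fibonacci numbers, F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```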
concerns the behavior of the difference operator with respect to the falling factorial:

$$\Delta x^{\underline{m}} = (x+1)^{\underline{m}} - x^{\underline{m}} =
(x+1)x(x-1)\cdots(x-m+2) - x(x-1)\cdots(x-m+1) =
x(x-1)\cdots(x-m+2)\,(x+1-x+m-1) = m\,x^{\underline{m-1}}$$

This is analogous to the usual rule for the differentiation operator applied to $x^m$:

$$Dx^m = mx^{m-1}$$

As we shall see, many formal properties of the difference operator are similar to the properties of the differentiation operator. The rôle of the powers $x^m$ is however taken by the falling factorials, which therefore assume a central position in the theory of finite operators.

The following general rules are rather obvious:

$$\Delta(\alpha f(x) + \beta g(x)) = \alpha\Delta f(x) + \beta\Delta g(x)$$

$$\Delta(f(x)g(x)) = E(f(x)g(x)) - f(x)g(x) = f(x+1)g(x+1) - f(x)g(x) =$$
$$= f(x+1)g(x+1) - f(x)g(x+1) + f(x)g(x+1) - f(x)g(x) = \Delta f(x)\,Eg(x) + f(x)\Delta g(x)$$

resembling the differentiation rule $D(f(x)g(x)) = (Df(x))g(x) + f(x)Dg(x)$. In a similar way we have:

$$\Delta\frac{f(x)}{g(x)} = \frac{f(x+1)}{g(x+1)} - \frac{f(x)}{g(x)} =
\frac{f(x+1)g(x) - f(x)g(x)}{g(x)g(x+1)} + \frac{f(x)g(x) - f(x)g(x+1)}{g(x)g(x+1)} =
\frac{(\Delta f(x))g(x) - f(x)\Delta g(x)}{g(x)\,Eg(x)}$$

The difference operator can be iterated:

$$\Delta^2 f(x) = \Delta\Delta f(x) = \Delta(f(x+1) - f(x)) = f(x+2) - 2f(x+1) + f(x).$$

From a formal point of view, we have:

$$\Delta^2 = (E-1)^2 = E^2 - 2E + 1$$

and in general:

$$\Delta^n = (E-1)^n = \sum_{k=0}^n\binom{n}{k}(-1)^{n-k}E^k = (-1)^n\sum_{k=0}^n\binom{n}{k}(-E)^k$$

This is a very important formula, and it is the first example of the interest of combinatorics and generating functions in the theory of finite operators. In fact, let us iterate $\Delta$ on $f(x) = 1/x$:

$$\Delta^2\frac1x = -\frac{1}{(x+1)(x+2)} + \frac{1}{x(x+1)} =
\frac{-x+x+2}{x(x+1)(x+2)} = \frac{2}{x(x+1)(x+2)}$$

$$\Delta^n\frac1x = \frac{(-1)^n n!}{x(x+1)\cdots(x+n)}$$

as we can easily show by mathematical induction. In fact:

$$\Delta^{n+1}\frac1x = \frac{(-1)^n n!}{(x+1)\cdots(x+n+1)} - \frac{(-1)^n n!}{x(x+1)\cdots(x+n)} =
\frac{(-1)^{n+1}(n+1)!}{x(x+1)\cdots(x+n+1)}$$

The formula for $\Delta^n$ now gives the following identity:

$$\Delta^n\frac1x = \sum_{k=0}^n\binom{n}{k}(-1)^{n-k}E^k\frac1x$$

By multiplying everything by $(-1)^n$ this identity can be written as:

$$\frac{n!}{x(x+1)\cdots(x+n)} = \frac1x\binom{x+n}{n}^{-1} = \sum_{k=0}^n\binom{n}{k}\frac{(-1)^k}{x+k}$$

and therefore we have both a way to express the inverse of a binomial coefficient as a sum and an expression for the partial fraction expansion of the inverse of the polynomial $x(x+1)\cdots(x+n)$.

6.8 Shift and Difference Operators - Example I

As the difference operator $\Delta$ can be expressed in terms of the shift operator $E$, so $E$ can be expressed in terms of $\Delta$:

$$E = \Delta + 1$$

This rule can be iterated, giving the summation formula:

$$E^n = (\Delta + 1)^n = \sum_{k=0}^n\binom{n}{k}\Delta^k$$

which can be seen as the "dual" formula of the one already considered:

$$\Delta^n = \sum_{k=0}^n\binom{n}{k}(-1)^{n-k}E^k$$
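Since $\Delta^n = \sum_k \binom{n}{k}(-1)^{n-k}E^k$ is a finite sum, all of these formulas can be checked with exact rational arithmetic; the sketch below (our own, with our naming) verifies the closed form for $\Delta^n(1/x)$ and the inverse-binomial identity derived from it.

```python
from fractions import Fraction
from math import comb, factorial

def delta_n(f, n, x):
    """Delta^n f(x) = sum_k C(n,k) (-1)^(n-k) f(x+k)."""
    return sum(comb(n, k) * (-1) ** (n - k) * f(x + k) for k in range(n + 1))

def rising(x, n):
    """The product x (x+1) ... (x+n)."""
    p = 1
    for j in range(n + 1):
        p *= x + j
    return p

inv = lambda x: Fraction(1, x)
```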
The evaluation of the successive differences of any function $f(x)$ allows us to state and prove two identities, which may have combinatorial significance. Here we record some typical examples; we mark with an asterisk the cases when $\Delta^0 f(x) \ne If(x)$.

1) The function $f(x) = 1/x$ has already been developed, at least partially:

$$\Delta\frac1x = -\frac{1}{x(x+1)}$$

$$\Delta^n\frac1x = \frac{(-1)^n n!}{x(x+1)\cdots(x+n)} = \frac{(-1)^n}{x}\binom{x+n}{n}^{-1}$$

$$\sum_k\binom{n}{k}\frac{(-1)^k}{x+k} = \frac{n!}{x(x+1)\cdots(x+n)} = \frac1x\binom{x+n}{n}^{-1}$$

$$\sum_k\binom{n}{k}(-1)^k\binom{x+k}{k}^{-1} = \frac{x}{x+n}.$$

2∗) A somewhat similar situation, but a bit more complex:

$$\Delta\frac{p+x}{m+x} = \frac{m-p}{(m+x)(m+x+1)}$$

$$\Delta^n\frac{p+x}{m+x} = \frac{(-1)^{n-1}(m-p)}{m+x}\binom{m+x+n}{n}^{-1} \qquad (n > 0)$$

$$\sum_k\binom{n}{k}(-1)^k\frac{p+k}{m+k} = \frac{p-m}{m}\binom{m+n}{n}^{-1} \qquad (n > 0)$$

$$\sum_k\binom{n}{k}(-1)^k\binom{m+k}{k}^{-1} = \frac{m}{m+n} \qquad\text{(see above)}.$$

3) Another version of the first example:

$$\Delta\frac{1}{px+m} = \frac{-p}{(px+m)(px+p+m)}$$

$$\Delta^n\frac{1}{px+m} = \frac{(-1)^n n!\,p^n}{(px+m)(px+p+m)\cdots(px+np+m)}$$

According to the rule of the previous example, we should have $\Delta^0(p+x)/(m+x) = (p-m)/(m+x)$; in its second sum, however, we have to set $\Delta^0 = I$, and therefore we also have to subtract 1 from both members in order to obtain a true identity; a similar situation arises whenever we have $\Delta^0 \ne I$.

$$\sum_k\binom{n}{k}\frac{(-1)^k}{pk+m} = \frac{n!\,p^n}{m(m+p)\cdots(m+np)}$$

$$\sum_k\binom{n}{k}\frac{(-1)^k k!\,p^k}{m(m+p)\cdots(m+pk)} = \frac{1}{pn+m}$$

4∗) A case involving the harmonic numbers:

$$\Delta H_x = \frac{1}{x+1}$$

$$\Delta^n H_x = \frac{(-1)^{n-1}}{n}\binom{x+n}{n}^{-1} \qquad (n > 0)$$

$$\sum_k\binom{n}{k}(-1)^kH_{x+k} = -\frac1n\binom{x+n}{n}^{-1} \qquad (n > 0)$$

$$\sum_{k=1}^n\binom{n}{k}\frac{(-1)^{k-1}}{k}\binom{x+k}{k}^{-1} = H_{x+n} - H_x$$

where the case $x = 0$ is to be noted.

5∗) A more complicated case with the harmonic numbers:

$$\Delta\,xH_x = H_x + 1$$

$$\Delta^n\,xH_x = \frac{(-1)^n}{n-1}\binom{x+n-1}{n-1}^{-1} \qquad (n > 1)$$

$$\sum_k\binom{n}{k}(-1)^k(x+k)H_{x+k} = \frac{1}{n-1}\binom{x+n-1}{n-1}^{-1} \qquad (n > 1)$$

$$\sum_k\binom{n}{k}\frac{(-1)^k}{k-1}\binom{x+k-1}{k-1}^{-1} = (x+n)(H_{x+n} - H_x) - n$$

6) Harmonic numbers and binomial coefficients:

$$\Delta\binom{x}{m}H_x = \binom{x}{m-1}\left(H_x + \frac1m\right)$$

$$\Delta^n\binom{x}{m}H_x = \binom{x}{m-n}(H_x + H_m - H_{m-n})$$

$$\sum_k\binom{n}{k}(-1)^k\binom{x+k}{m}H_{x+k} = (-1)^n\binom{x}{m-n}(H_x + H_m - H_{m-n})$$

$$\sum_k\binom{n}{k}\binom{x}{m-k}(H_x + H_m - H_{m-k}) = \binom{x+n}{m}H_{x+n}$$

and by performing the sums on the left containing $H_x$ and $H_m$:

$$\sum_k\binom{n}{k}\binom{x}{m-k}H_{m-k} = \binom{x+n}{m}(H_x + H_m - H_{x+n})$$
76 CHAPTER 6. FORMAL METHODS

7) The function $\ln(x)$ can be inserted in this group:

$$\Delta \ln(x) = \ln\frac{x+1}{x}$$
$$\Delta^n \ln(x) = (-1)^n \sum_k \binom{n}{k}(-1)^k \ln(x+k)$$
$$\sum_k \binom{n}{k}(-1)^k \ln(x+k) = (-1)^n \Delta^n \ln(x)$$
$$\sum_{k=0}^n \sum_{j=0}^k \binom{n}{k}\binom{k}{j}(-1)^{k+j}\ln(x+j) = \ln(x+n)$$

Note that the last but one relation is an identity.

6.9 Shift and Difference Operators - Example II

Here we propose other examples of combinatorial sums obtained by iterating the shift and difference operators.

1) The typical sum containing the binomial coefficients: Newton's binomial theorem:

$$\Delta p^x = (p-1)p^x \qquad \Delta^n p^x = (p-1)^n p^x$$
$$\sum_k \binom{n}{k}(-1)^k p^k = (1-p)^n \qquad \sum_k \binom{n}{k}(p-1)^k = p^n$$

2) Two sums involving Fibonacci numbers:

$$\Delta F_x = F_{x-1} \qquad \Delta^n F_x = F_{x-n}$$
$$\sum_k \binom{n}{k}(-1)^k F_{x+k} = (-1)^n F_{x-n} \qquad \sum_k \binom{n}{k} F_{x-k} = F_{x+n}$$

3) Falling factorials are an introduction to binomial coefficients:

$$\Delta x^{\underline{m}} = m\,x^{\underline{m-1}} \qquad \Delta^n x^{\underline{m}} = m^{\underline{n}}\,x^{\underline{m-n}}$$
$$\sum_k \binom{n}{k}(-1)^k (x+k)^{\underline{m}} = (-1)^n m^{\underline{n}}\,x^{\underline{m-n}}$$
$$\sum_k \binom{n}{k}\, m^{\underline{k}}\,x^{\underline{m-k}} = (x+n)^{\underline{m}}$$

4) Similar sums hold for raising factorials:

$$\Delta x^{\overline{m}} = m\,(x+1)^{\overline{m-1}} \qquad \Delta^n x^{\overline{m}} = m^{\underline{n}}\,(x+n)^{\overline{m-n}}$$
$$\sum_k \binom{n}{k}(-1)^k (x+k)^{\overline{m}} = (-1)^n m^{\underline{n}}\,(x+n)^{\overline{m-n}}$$
$$\sum_k \binom{n}{k}\, m^{\underline{k}}\,(x+k)^{\overline{m-k}} = (x+n)^{\overline{m}}$$

5) Two sums involving the binomial coefficients:

$$\Delta \binom{x}{m} = \binom{x}{m-1} \qquad \Delta^n \binom{x}{m} = \binom{x}{m-n}$$
$$\sum_k \binom{n}{k}(-1)^k \binom{x+k}{m} = (-1)^n \binom{x}{m-n}$$
$$\sum_k \binom{n}{k}\binom{x}{m-k} = \binom{x+n}{m}$$

6) Another case with binomial coefficients:

$$\Delta\binom{p+x}{m+x} = \binom{p+x}{m+x+1} \qquad \Delta^n \binom{p+x}{m+x} = \binom{p+x}{m+x+n}$$
$$\sum_k \binom{n}{k}(-1)^k\binom{p+k}{m+k} = (-1)^n\binom{p}{m+n}$$
$$\sum_k \binom{n}{k}\binom{p}{m+k} = \binom{p+n}{m+n}$$

7) And now a case with the inverse of a binomial coefficient:

$$\Delta\binom{x}{m}^{-1} = -\frac{m}{m+1}\binom{x+1}{m+1}^{-1} \qquad \Delta^n\binom{x}{m}^{-1} = (-1)^n\,\frac{m}{m+n}\binom{x+n}{m+n}^{-1}$$
$$\sum_k \binom{n}{k}(-1)^k\binom{x+k}{m}^{-1} = \frac{m}{m+n}\binom{x+n}{m+n}^{-1}$$
$$\sum_k \binom{n}{k}(-1)^k\,\frac{m}{m+k}\binom{x+k}{m+k}^{-1} = \binom{x+n}{m}^{-1}$$
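As a sanity check on these formulas, here is a small Python script (using only `math.comb`; the function names are ours) that tests the two sums of example 6 over a range of small parameters:

```python
from math import comb

def alt_sum(p, m, n):
    # sum_k C(n,k) (-1)^k C(p+k, m+k); should equal (-1)^n C(p, m+n)
    return sum(comb(n, k) * (-1)**k * comb(p + k, m + k) for k in range(n + 1))

def plain_sum(p, m, n):
    # sum_k C(n,k) C(p, m+k); should equal C(p+n, m+n)
    return sum(comb(n, k) * comb(p, m + k) for k in range(n + 1))

assert all(alt_sum(p, m, n) == (-1)**n * comb(p, m + n)
           for p in range(8) for m in range(4) for n in range(4))
assert all(plain_sum(p, m, n) == comb(p + n, m + n)
           for p in range(8) for m in range(4) for n in range(4))
print("ok")
```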

8) Two sums with the central binomial coefficients:

$$\Delta\left[\frac{1}{4^x}\binom{2x}{x}\right] = -\frac{1}{2(x+1)}\,\frac{1}{4^x}\binom{2x}{x}$$
$$\Delta^n\left[\frac{1}{4^x}\binom{2x}{x}\right] = \frac{(-1)^n\,(2n)!}{n!\,4^n\,(x+1)\cdots(x+n)}\,\frac{1}{4^x}\binom{2x}{x}$$
$$\sum_k\binom{n}{k}(-1)^k\binom{2x+2k}{x+k}\frac{1}{4^k} = \frac{(2n)!}{n!\,4^n\,(x+1)\cdots(x+n)}\binom{2x}{x} = \frac{1}{4^n}\binom{2n}{n}\binom{x+n}{n}^{-1}\binom{2x}{x}$$
$$\sum_k\binom{n}{k}\,\frac{(-1)^k\,(2k)!}{k!\,4^k\,(x+1)\cdots(x+k)}\binom{2x}{x} = \frac{1}{4^n}\binom{2x+2n}{x+n}$$

9) Two sums with the inverse of the central binomial coefficients:

$$\Delta\left[4^x\binom{2x}{x}^{-1}\right] = \frac{4^x}{2x+1}\binom{2x}{x}^{-1}$$
$$\Delta^n\left[4^x\binom{2x}{x}^{-1}\right] = \frac{1}{2n-1}\,\frac{(-1)^{n-1}(2n)!\,4^x}{2^n\,n!\,(2x+1)(2x+3)\cdots(2x+2n-1)}\binom{2x}{x}^{-1}$$
$$\sum_k\binom{n}{k}(-1)^k\,4^k\binom{2x+2k}{x+k}^{-1} = -\frac{1}{2n-1}\,\frac{(2n)!}{2^n\,n!\,(2x+1)\cdots(2x+2n-1)}\binom{2x}{x}^{-1}$$
$$\sum_k\binom{n}{k}\,\frac{1}{2k-1}\,\frac{(2k)!\,(-1)^{k-1}}{2^k\,k!\,(2x+1)\cdots(2x+2k-1)}\binom{2x}{x}^{-1} = 4^n\binom{2x+2n}{x+n}^{-1}$$

6.10 The Addition Operator

The addition operator $S$ is analogous to the difference operator:
$$S = E + 1$$
and in fact a simple connection exists between the two operators:
$$S(-1)^x f(x) = (-1)^{x+1}f(x+1) + (-1)^x f(x) = (-1)^{x+1}\bigl(f(x+1) - f(x)\bigr) = (-1)^{x-1}\Delta f(x)$$

Because of this connection, the addition operator has not been widely considered in the literature, and the symbol $S$ is used here only for convenience. Like the difference operator, the addition operator can be iterated and often produces interesting combinatorial sums according to the rules:
$$S^n = (E+1)^n = \sum_k\binom{n}{k}E^k \qquad E^n = (S-1)^n = \sum_k\binom{n}{k}(-1)^{n-k}S^k$$

Some examples are in order here:

1) Fibonacci numbers are typical:
$$SF_m = F_{m+1} + F_m = F_{m+2} \qquad S^n F_m = F_{m+2n}$$
$$\sum_k\binom{n}{k}F_{m+k} = F_{m+2n}$$
$$\sum_k\binom{n}{k}(-1)^k F_{m+2k} = (-1)^n F_{m+n}$$

2) Here are the binomial coefficients:
$$S\binom{m}{x} = \binom{m+1}{x+1} \qquad S^n\binom{m}{x} = \binom{m+n}{x+n}$$
$$\sum_k\binom{n}{k}\binom{m}{x+k} = \binom{m+n}{x+n}$$
$$\sum_k\binom{n}{k}(-1)^k\binom{m+k}{x+k} = (-1)^n\binom{m}{x+n}$$

3) And finally the inverse of binomial coefficients:
$$S\binom{m}{x}^{-1} = \frac{m+1}{m}\binom{m-1}{x}^{-1} \qquad S^n\binom{m}{x}^{-1} = \frac{m+1}{m-n+1}\binom{m-n}{x}^{-1}$$
$$\sum_k\binom{n}{k}\binom{m}{x+k}^{-1} = \frac{m+1}{m-n+1}\binom{m-n}{x}^{-1}$$
$$\sum_k\binom{n}{k}\,\frac{(-1)^k}{m-k+1}\binom{m-k}{x}^{-1} = \frac{(-1)^n}{m+1}\binom{m}{x+n}^{-1}.$$

We can obviously invent as many expressions as we desire and, correspondingly, may obtain some summation formulas of combinatorial interest. For example:
$$S\Delta = (E+1)(E-1) = E^2 - 1 = (E-1)(E+1) = \Delta S$$
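The Fibonacci identities obtained above from the addition operator are easy to confirm numerically; a minimal Python sketch (with the usual convention $F_0 = 0$, $F_1 = 1$; the helper `fib` is ours):

```python
from math import comb

def fib(n):
    # iterative Fibonacci with F_0 = 0, F_1 = 1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for m in range(1, 8):
    for n in range(8):
        # S^n F_m = F_{m+2n}
        assert sum(comb(n, k) * fib(m + k) for k in range(n + 1)) == fib(m + 2 * n)
        # E^n = (S-1)^n gives the alternating companion sum
        assert sum(comb(n, k) * (-1)**k * fib(m + 2 * k)
                   for k in range(n + 1)) == (-1)**n * fib(m + n)
print("ok")
```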

The derivation of $S\Delta = \Delta S$ above shows that the two operators $S$ and $\Delta$ commute. We can directly verify this property:
$$S\Delta f(x) = S\bigl(f(x+1) - f(x)\bigr) = f(x+2) - f(x) = (E^2-1)f(x)$$
$$\Delta S f(x) = \Delta\bigl(f(x+1) + f(x)\bigr) = f(x+2) - f(x) = (E^2-1)f(x)$$
Consequently, we have the two summation formulas:
$$\Delta^n S^n = (E^2-1)^n = \sum_k\binom{n}{k}(-1)^{n-k}E^{2k} \qquad E^{2n} = (\Delta S + 1)^n = \sum_k\binom{n}{k}\Delta^k S^k$$
A simple example is offered by the Fibonacci numbers:
$$\Delta S F_m = F_{m+1} \qquad (\Delta S)^n F_m = \Delta^n S^n F_m = F_{m+n}$$
$$\sum_k\binom{n}{k}(-1)^k F_{m+2k} = (-1)^n F_{m+n} \qquad \sum_k\binom{n}{k}F_{m+k} = F_{m+2n}$$
but these identities have already been proved using the addition operator $S$.

6.11 Definite and Indefinite Summation

The following result is one of the most important rules connecting the finite operator method and combinatorial sums:
$$\sum_{k=0}^n E^k = [z^n]\,\frac{1}{1-z}\,\frac{1}{1-Ez} = [z^n]\,\frac{1}{(E-1)z}\left(\frac{1}{1-Ez} - \frac{1}{1-z}\right) =$$
$$= [z^{n+1}]\,\frac{1}{E-1}\left(\frac{1}{1-Ez} - \frac{1}{1-z}\right) = \frac{E^{n+1}-1}{E-1} = (E^{n+1}-1)\,\Delta^{-1}$$
We observe that the operator $E$ commutes with the indeterminate $z$, which is constant with respect to the variable $x$, on which $E$ operates. The rule above is called the rule of definite summation; the operator $\Delta^{-1}$ is called indefinite summation and is often denoted by $\Sigma$. In order to make this point clear, let us consider any function $f(x)$ and suppose that a function $g(x)$ exists such that $\Delta g(x) = f(x)$. Hence we have $\Delta^{-1}f(x) = g(x)$ and the rule of definite summation immediately gives:
$$\sum_{k=0}^n f(x+k) = g(x+n+1) - g(x)$$
This is analogous to the rule of definite integration. In fact, the operator of indefinite integration $\int dx$ is the inverse of the differentiation operator $D$, and if $f(x)$ is any function, a primitive function for $f(x)$ is any function $\hat g(x)$ such that $D\hat g(x) = f(x)$, or $D^{-1}f(x) = \int f(x)\,dx = \hat g(x)$. The fundamental theorem of the integral calculus relates definite and indefinite integration:
$$\int_a^b f(x)\,dx = \hat g(b) - \hat g(a)$$
The formula for definite summation can be written in a similar way, if we consider the integer variable $k$ and set $a = x$ and $b = x+n+1$:
$$\sum_{k=a}^{b-1} f(k) = g(b) - g(a)$$
These facts create an analogy between $\Delta^{-1}$ and $D^{-1}$, or $\Sigma$ and $\int dx$, which can be stressed by considering the formal properties of $\Sigma$. First of all, we observe that $g(x) = \Sigma f(x)$ is not uniquely determined. If $C(x)$ is any function periodic of period 1, i.e., $C(x+k) = C(x)$, $\forall k\in\mathbb{Z}$, we have:
$$\Delta\bigl(g(x) + C(x)\bigr) = \Delta g(x) + \Delta C(x) = \Delta g(x)$$
and therefore:
$$\Sigma f(x) = g(x) + C(x)$$
When $f(x)$ is a sequence and is only defined for integer values of $x$, the function $C(x)$ reduces to a constant, and plays the same rôle as the integration constant in the operation of indefinite integration.

The operator $\Sigma$ is obviously linear:
$$\Sigma\bigl(\alpha f(x) + \beta g(x)\bigr) = \alpha\,\Sigma f(x) + \beta\,\Sigma g(x)$$
This is proved by applying the operator $\Delta$ to both sides.

An important property is summation by parts, corresponding to the well-known rule of the indefinite integration operator. Let us begin with the rule for the difference of a product, which we proved earlier:
$$\Delta\bigl(f(x)g(x)\bigr) = \Delta f(x)\,Eg(x) + f(x)\,\Delta g(x)$$
By applying the operator $\Sigma$ to both sides and exchanging terms:
$$\Sigma\bigl(f(x)\Delta g(x)\bigr) = f(x)g(x) - \Sigma\bigl(\Delta f(x)\,Eg(x)\bigr)$$
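The definite summation rule proved in this section can be illustrated concretely. Below is a small Python check in which $f(x) = x^2$ and $g(x) = x(x-1)(2x-1)/6$ (so that $\Delta g = f$); the names `f` and `g` simply mirror the notation of the text:

```python
def f(x):
    return x * x

def g(x):
    # g(x) = x(x-1)(2x-1)/6, an indefinite sum of x^2 (always an integer)
    return x * (x - 1) * (2 * x - 1) // 6

# Delta g = f
assert all(g(x + 1) - g(x) == f(x) for x in range(50))

# sum_{k=0}^{n} f(x+k) = g(x+n+1) - g(x)
for x in range(10):
    for n in range(10):
        assert sum(f(x + k) for k in range(n + 1)) == g(x + n + 1) - g(x)
print("ok")
```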

The summation by parts formula allows us to change a sum of products, of which we know that the second factor is a difference, into a sum involving the difference of the first factor. The transformation can be convenient whenever the difference of the first factor is simpler than the difference of the second factor. For example, let us perform the following indefinite summation:
$$\Sigma\, x\binom{x}{m} = \Sigma\, x\,\Delta\binom{x}{m+1} = x\binom{x}{m+1} - \Sigma\left(\Delta x\cdot E\binom{x}{m+1}\right) =$$
$$= x\binom{x}{m+1} - \Sigma\binom{x+1}{m+1} = x\binom{x}{m+1} - \binom{x+1}{m+2}$$
Obviously, this indefinite sum can be transformed into a definite sum by using the first result in this section:
$$\sum_{k=a}^b k\binom{k}{m} = (b+1)\binom{b+1}{m+1} - \binom{b+2}{m+2} - a\binom{a}{m+1} + \binom{a+1}{m+2}$$
and for $a=0$ and $b=n$:
$$\sum_{k=0}^n k\binom{k}{m} = (n+1)\binom{n+1}{m+1} - \binom{n+2}{m+2}$$

6.12 Definite Summation

In a sense, the rule:
$$\sum_{k=0}^n E^k = (E^{n+1}-1)\,\Delta^{-1} = (E^{n+1}-1)\,\Sigma$$
is the most important result of the operator method. In fact, it reduces the sum of the successive elements in a sequence to the computation of the indefinite sum, and this is just the operator inverse of the difference. Unfortunately, $\Delta^{-1}$ is not easy to compute and, apart from a restricted number of cases, there is no general rule allowing us to guess what $\Delta^{-1}f(x) = \Sigma f(x)$ might be. In this rather pessimistic sense, the rule is very fine, very general and completely useless.

However, from a more positive point of view, we can say that whenever we know, in some way or another, an expression for $\Delta^{-1}f(x) = \Sigma f(x)$, we have solved the problem of finding $\sum_{k=0}^n f(x+k)$. For example, we can look at the differences computed in the previous sections and, for each of them, obtain the $\Sigma$ of some function; in this way we immediately have a number of sums. The negative point is that, sometimes, we do not have a simple function and, therefore, the sum may not have any combinatorial interest.

Here is a number of identities obtained by our previous computations.

1) We have again the partial sums of the geometric series:
$$\Delta^{-1}p^x = \frac{p^x}{p-1} = \Sigma\, p^x$$
$$\sum_{k=0}^n p^{x+k} = \frac{p^{x+n+1} - p^x}{p-1} \qquad \sum_{k=0}^n p^k = \frac{p^{n+1}-1}{p-1}\quad(x=0)$$

2) The sum of consecutive Fibonacci numbers:
$$\Delta^{-1}F_x = F_{x+1} = \Sigma F_x$$
$$\sum_{k=0}^n F_{x+k} = F_{x+n+2} - F_{x+1} \qquad \sum_{k=0}^n F_k = F_{n+2} - 1\quad(x=0)$$

3) The sum of consecutive binomial coefficients with constant denominator:
$$\Delta^{-1}\binom{x}{m} = \binom{x}{m+1} = \Sigma\binom{x}{m}$$
$$\sum_{k=0}^n\binom{x+k}{m} = \binom{x+n+1}{m+1} - \binom{x}{m+1} \qquad \sum_{k=0}^n\binom{k}{m} = \binom{n+1}{m+1}\quad(x=0)$$

4) The sum of consecutive binomial coefficients:
$$\Delta^{-1}\binom{p+x}{m+x} = \binom{p+x}{m+x-1} = \Sigma\binom{p+x}{m+x}$$
$$\sum_{k=0}^n\binom{p+k}{m+k} = \binom{p+n+1}{m+n} - \binom{p}{m-1}$$

5) The sum of falling factorials:
$$\Delta^{-1}x^{\underline m} = \frac{x^{\underline{m+1}}}{m+1} = \Sigma\, x^{\underline m}$$
$$\sum_{k=0}^n (x+k)^{\underline m} = \frac{(x+n+1)^{\underline{m+1}} - x^{\underline{m+1}}}{m+1} \qquad \sum_{k=0}^n k^{\underline m} = \frac{(n+1)^{\underline{m+1}}}{m+1}\quad(x=0).$$

6) The sum of raising factorials:
$$\Delta^{-1}x^{\overline m} = \frac{(x-1)^{\overline{m+1}}}{m+1} = \Sigma\, x^{\overline m}$$
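Identities 2) and 3) above are easy to confirm by direct computation; a minimal Python sketch (with $F_0 = 0$, $F_1 = 1$; the helper `fib` is ours):

```python
from math import comb

def fib(n):
    # Fibonacci numbers with F_0 = 0, F_1 = 1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(12):
    # 2) sum of consecutive Fibonacci numbers
    assert sum(fib(k) for k in range(n + 1)) == fib(n + 2) - 1
    # 3) sum of binomial coefficients with constant denominator
    for m in range(6):
        assert sum(comb(k, m) for k in range(n + 1)) == comb(n + 1, m + 1)
print("ok")
```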

For the raising factorials this gives:
$$\sum_{k=0}^n (x+k)^{\overline m} = \frac{(x+n)^{\overline{m+1}} - (x-1)^{\overline{m+1}}}{m+1} \qquad \sum_{k=0}^n k^{\overline m} = \frac{n^{\overline{m+1}}}{m+1}\quad(x=0).$$

7) The sum of inverse binomial coefficients:
$$\Delta^{-1}\binom{x}{m}^{-1} = -\frac{m}{m-1}\binom{x-1}{m-1}^{-1} = \Sigma\binom{x}{m}^{-1}$$
$$\sum_{k=0}^n\binom{x+k}{m}^{-1} = \frac{m}{m-1}\left(\binom{x-1}{m-1}^{-1} - \binom{x+n}{m-1}^{-1}\right).$$

8) The sum of harmonic numbers. Since $1 = \Delta x$, we have:
$$\Delta^{-1}H_x = xH_x - x = \Sigma H_x$$
$$\sum_{k=0}^n H_{x+k} = (x+n+1)H_{x+n+1} - xH_x - (n+1)$$
$$\sum_{k=0}^n H_k = (n+1)H_{n+1} - (n+1) = (n+1)H_n - n\quad(x=0).$$

6.13 The Euler-McLaurin Summation Formula

One of the most striking applications of the finite operator method is the formal proof of the Euler-McLaurin summation formula. The starting point is the Taylor theorem for the series expansion of a function $f(x)\in C^\infty$, i.e., a function having derivatives of any order. The usual form of the theorem:
$$f(x+h) = f(x) + \frac{h}{1!}f'(x) + \frac{h^2}{2!}f''(x) + \cdots + \frac{h^n}{n!}f^{(n)}(x) + \cdots$$
can be interpreted in the sense of operators as a result connecting the shift and the differentiation operators. In fact, for $h=1$, it can be written as:
$$Ef(x) = If(x) + \frac{Df(x)}{1!} + \frac{D^2 f(x)}{2!} + \cdots$$
and therefore as a relation between operators:
$$E = 1 + \frac{D}{1!} + \frac{D^2}{2!} + \cdots + \frac{D^n}{n!} + \cdots = e^D$$
This formal identity relates the finite operator $E$ and the infinitesimal operator $D$, and subtracting 1 from both sides it can be formulated as:
$$\Delta = e^D - 1$$
By inverting, we have a formula for the $\Sigma$ operator:
$$\Sigma = \frac{1}{e^D - 1} = \frac{1}{D}\,\frac{D}{e^D-1}$$
Now, we recognize the generating function of the Bernoulli numbers, and therefore we have a development for $\Sigma$:
$$\Sigma = \frac{1}{D}\left(B_0 + \frac{B_1}{1!}D + \frac{B_2}{2!}D^2 + \cdots\right) =$$
$$= D^{-1} - \frac{1}{2}I + \frac{1}{12}D - \frac{1}{720}D^3 + \frac{1}{30240}D^5 - \frac{1}{1209600}D^7 + \cdots.$$
This is not a series development since, as we know, the Bernoulli numbers diverge to infinity. We have a case of asymptotic development, which is only defined when we consider a limited number of terms, but in general diverges if we let the number of terms go to infinity. The number of terms for which the sum approaches its true value depends on the function $f(x)$ and on the argument $x$.

From the indefinite sum we can pass to the definite sum by applying the general rule of Section 6.12. Since $D^{-1} = \int dx$, we immediately have:
$$\sum_{k=0}^{n-1}f(k) = \int_0^n f(x)\,dx - \frac12\bigl[f(x)\bigr]_0^n + \frac{1}{12}\bigl[f'(x)\bigr]_0^n - \frac{1}{720}\bigl[f'''(x)\bigr]_0^n + \cdots$$
and this is the celebrated Euler-McLaurin summation formula. It expresses a sum as a function of the integral and the successive derivatives of the function $f(x)$. In this sense, the formula can be seen as a method for approximating a sum by means of an integral or, vice versa, for approximating an integral by means of a sum, and this was just the point of view of the mathematicians who first developed it.

As a simple but very important example, let us find an asymptotic development for the harmonic numbers $H_n$. Since $H_n = H_{n-1} + 1/n$, the Euler-McLaurin formula applies to $H_{n-1}$ and to the function $f(x) = 1/x$, giving:
$$H_{n-1} = \int_1^n\frac{dx}{x} - \frac12\left[\frac1x\right]_1^n + \frac{1}{12}\left[-\frac{1}{x^2}\right]_1^n - \frac{1}{720}\left[-\frac{6}{x^4}\right]_1^n + \frac{1}{30240}\left[-\frac{120}{x^6}\right]_1^n + \cdots =$$
$$= \ln n - \frac{1}{2n} + \frac12 - \frac{1}{12n^2} + \frac{1}{12} + \frac{1}{120n^4} - \frac{1}{120} - \frac{1}{252n^6} + \frac{1}{252} + \cdots$$
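The Euler-McLaurin formula can be checked exactly on a polynomial before trusting it on $1/x$: for $f(x) = x^3$ the derivative $f'''$ is constant, so all bracket terms beyond $f'$ vanish and the formula terminates. A minimal Python verification with exact rationals:

```python
from fractions import Fraction

# sum_{k=0}^{n-1} k^3 = n^4/4 - (1/2) n^3 + (1/12)(3 n^2)   (exact)
for n in range(30):
    s = sum(k**3 for k in range(n))
    em = Fraction(n**4, 4) - Fraction(n**3, 2) + Fraction(3 * n**2, 12)
    assert s == em
print("Euler-McLaurin exact for f(x) = x^3")
```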
µ ¶
In this expansion of $H_{n-1}$ a number of constants appears, and they can be summed together to form a constant $\gamma$, provided that the sum actually converges. However, we observe that as $n\to\infty$ this constant $\gamma$ is the Euler-Mascheroni constant:
$$\lim_{n\to\infty}\bigl(H_{n-1} - \ln n\bigr) = \gamma = 0.577215664902\ldots$$
By adding $1/n$ to both sides of the previous relation, we eventually find:
$$H_n = \ln n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} - \frac{1}{252n^6} + \cdots$$
and this is the asymptotic expansion we were looking for.

6.14 Applications of the Euler-McLaurin Formula

As another application of the Euler-McLaurin summation formula, we now show the derivation of Stirling's approximation for $n!$. The first step consists in taking the logarithm of that quantity:
$$\ln n! = \ln 1 + \ln 2 + \ln 3 + \cdots + \ln n$$
so that we are reduced to computing a sum and hence to applying the Euler-McLaurin formula:
$$\ln(n-1)! = \sum_{k=1}^{n-1}\ln k = \int_1^n \ln x\,dx - \frac12\bigl[\ln x\bigr]_1^n + \frac{1}{12}\left[\frac1x\right]_1^n - \frac{1}{720}\left[\frac{2}{x^3}\right]_1^n + \cdots =$$
$$= n\ln n - n + 1 - \frac12\ln n + \frac{1}{12n} - \frac{1}{12} - \frac{1}{360n^3} + \frac{1}{360} + \cdots.$$
Here we have used the fact that $\int\ln x\,dx = x\ln x - x$. At this point we can add $\ln n$ to both sides and introduce a constant $\sigma = 1 - 1/12 + 1/360 - \cdots$. It is by no means easy to determine directly the value of $\sigma$, but by other approaches to the same problem it is known that $\sigma = \ln\sqrt{2\pi}$. Numerically, we can observe that:
$$1 - \frac{1}{12} + \frac{1}{360} = 0.91944\ldots \qquad\text{and}\qquad \ln\sqrt{2\pi} \approx 0.9189388.$$
We can now go on with our sum:
$$\ln n! = n\ln n - n + \frac12\ln n + \ln\sqrt{2\pi} + \frac{1}{12n} - \frac{1}{360n^3} + \cdots$$
To obtain the value of $n!$ we only have to take exponentials:
$$n! = \frac{n^n}{e^n}\,\sqrt{n}\,\sqrt{2\pi}\,\exp\left(\frac{1}{12n}\right)\exp\left(-\frac{1}{360n^3}\right)\cdots =$$
$$= \sqrt{2\pi n}\,\frac{n^n}{e^n}\left(1 + \frac{1}{12n} + \frac{1}{288n^2} + \cdots\right)\left(1 - \frac{1}{360n^3} + \cdots\right)\cdots =$$
$$= \sqrt{2\pi n}\,\frac{n^n}{e^n}\left(1 + \frac{1}{12n} + \frac{1}{288n^2} - \cdots\right).$$
This is the well-known Stirling's approximation for $n!$. By means of this approximation, we can also find the approximation for another important quantity:
$$\binom{2n}{n} = \frac{(2n)!}{n!^2} = \frac{\sqrt{4\pi n}\left(\frac{2n}{e}\right)^{2n}}{2\pi n\left(\frac{n}{e}\right)^{2n}}\times\frac{1 + \frac{1}{24n} + \frac{1}{1152n^2} - \cdots}{\left(1 + \frac{1}{12n} + \frac{1}{288n^2} - \cdots\right)^2} =$$
$$= \frac{4^n}{\sqrt{\pi n}}\left(1 - \frac{1}{8n} + \frac{1}{128n^2} + \cdots\right).$$

Another application of the Euler-McLaurin summation formula is given by the sum $\sum_{k=1}^n k^p$, when $p$ is any integer constant different from $-1$, which is the case of the harmonic numbers:
$$\sum_{k=0}^{n-1}k^p = \int_0^n x^p\,dx - \frac12\bigl[x^p\bigr]_0^n + \frac{1}{12}\bigl[px^{p-1}\bigr]_0^n - \frac{1}{720}\bigl[p(p-1)(p-2)x^{p-3}\bigr]_0^n + \cdots =$$
$$= \frac{n^{p+1}}{p+1} - \frac{n^p}{2} + \frac{p\,n^{p-1}}{12} - \frac{p(p-1)(p-2)\,n^{p-3}}{720} + \cdots.$$
In this case the evaluation at 0 does not introduce any constant. By adding $n^p$ to both sides, we have the following formula, which only contains a finite number of terms:
$$\sum_{k=0}^n k^p = \frac{n^{p+1}}{p+1} + \frac{n^p}{2} + \frac{p\,n^{p-1}}{12} - \frac{p(p-1)(p-2)\,n^{p-3}}{720} + \cdots$$
If $p$ is not an integer, after $\lceil p\rceil$ differentiations we obtain $x^q$, with $q < 0$, and therefore we cannot consider the limit at 0. We proceed with the Euler-McLaurin formula in the following way:
$$\sum_{k=1}^{n-1}k^p = \int_1^n x^p\,dx - \frac12\bigl[x^p\bigr]_1^n + \frac{1}{12}\bigl[px^{p-1}\bigr]_1^n - \frac{1}{720}\bigl[p(p-1)(p-2)x^{p-3}\bigr]_1^n + \cdots =$$
$$= \frac{n^{p+1}}{p+1} - \frac{1}{p+1} - \frac{n^p}{2} + \frac12 + \frac{p\,n^{p-1}}{12} - \frac{p}{12} - \frac{p(p-1)(p-2)\,n^{p-3}}{720} + \frac{p(p-1)(p-2)}{720} + \cdots$$
$$\sum_{k=1}^n k^p = \frac{n^{p+1}}{p+1} + \frac{n^p}{2} + \frac{p\,n^{p-1}}{12} - \frac{p(p-1)(p-2)\,n^{p-3}}{720} + \cdots + K_p.$$
The constant:
$$K_p = -\frac{1}{p+1} + \frac12 - \frac{p}{12} + \frac{p(p-1)(p-2)}{720} + \cdots$$
has a fundamental rôle when the leading term $n^{p+1}/(p+1)$ does not increase with $n$, i.e., when $p < -1$. In that case, in fact, the sum converges to $K_p$. When $p > -1$ the constant is less important. For example, we have:
$$\sum_{k=1}^n\sqrt{k} = \frac23\,n\sqrt{n} + \frac{\sqrt n}{2} + K_{1/2} + \frac{1}{24\sqrt n} + \cdots \qquad K_{1/2}\approx -0.2078862\ldots$$
$$\sum_{k=1}^n\frac{1}{\sqrt k} = 2\sqrt n + K_{-1/2} + \frac{1}{2\sqrt n} - \frac{1}{24n\sqrt n} + \cdots \qquad K_{-1/2}\approx -1.4603545\ldots$$
For $p = -2$ we find:
$$\sum_{k=1}^n\frac{1}{k^2} = K_{-2} - \frac1n + \frac{1}{2n^2} - \frac{1}{6n^3} + \frac{1}{30n^5} - \cdots$$
It is possible to show that $K_{-2} = \pi^2/6$, and therefore we have a way to approximate the sum (see Section 2.7).
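The three asymptotic results of these closing sections — the expansion of $H_n$, the approximation of the central binomial coefficient, and $K_{-2} = \pi^2/6$ — can all be checked numerically in a few lines of Python (standard library only; the tolerances are chosen generously against the first neglected terms):

```python
import math

gamma = 0.5772156649015329   # Euler-Mascheroni constant
n = 100

# H_n = ln n + gamma + 1/(2n) - 1/(12 n^2) + 1/(120 n^4) - ...
H_n = sum(1.0 / k for k in range(1, n + 1))
approx = math.log(n) + gamma + 1/(2*n) - 1/(12*n**2) + 1/(120*n**4)
assert abs(H_n - approx) < 1e-12

# C(2n, n) = 4^n / sqrt(pi n) * (1 - 1/(8n) + 1/(128 n^2) + ...)
cb = 4**n / math.sqrt(math.pi * n) * (1 - 1/(8*n) + 1/(128*n**2))
assert abs(math.comb(2 * n, n) / cb - 1) < 1e-6

# K_{-2}: sum 1/k^2 = K_{-2} - 1/n + 1/(2 n^2) - 1/(6 n^3) + ...
s = sum(1.0 / k**2 for k in range(1, n + 1))
K_minus2 = s + 1/n - 1/(2*n**2) + 1/(6*n**3)
assert abs(K_minus2 - math.pi**2 / 6) < 1e-10
print("all three expansions confirmed")
```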
Chapter 7

Asymptotics

7.1 The convergence of power series

On many occasions, we have pointed out that our approach to power series was purely formal. Because of that, we always spoke of "formal power series", and never considered convergence problems. As we have seen, a lot of things can be said about formal power series, but now the moment has arrived to talk about the convergence of power series. We will see that this allows us to evaluate the asymptotic behavior of the coefficients $f_n$ of a power series $\sum_n f_n t^n$, thus solving many problems in which the exact value of $f_n$ cannot be found. In fact, many times, the asymptotic evaluation of $f_n$ can be made more precise and an actual approximation of $f_n$ can be found.

The natural setting for talking about convergence is the field $\mathbb{C}$ of the complex numbers and therefore, from now on, we will think of the indeterminate $t$ as of a variable taking its values from $\mathbb{C}$. Obviously, a power series $f(t) = \sum_n f_n t^n$ converges for some $t_0\in\mathbb{C}$ iff $f(t_0) = \sum_n f_n t_0^n < \infty$, and diverges iff $\lim_{t\to t_0}f(t) = \infty$. There are cases for which a series neither converges nor diverges; for example, when $t = 1$, the series $\sum_n(-1)^n t^n$ does not tend to any limit, finite or infinite. Therefore, when we say that a series does not converge (to a finite value) for a given value $t_0\in\mathbb{C}$, we mean that the series in $t_0$ diverges or does not tend to any limit.

A basic result on convergence is given by the following:

Theorem 7.1.1 Let $f(t) = \sum_n f_n t^n$ be a power series such that $f(t_0)$ converges for the value $t_0\in\mathbb{C}$. Then $f(t)$ converges for every $t_1\in\mathbb{C}$ such that $|t_1| < |t_0|$.

Proof: If $f(t_0) < \infty$ then an index $N\in\mathbb{N}$ exists such that for every $n > N$ we have $|f_n t_0^n| \le |f_n||t_0|^n < M$, for some finite $M\in\mathbb{R}$. This means $|f_n| < M/|t_0|^n$ and therefore:
$$\left|\sum_{n=N}^\infty f_n t_1^n\right| \le \sum_{n=N}^\infty |f_n||t_1|^n \le \sum_{n=N}^\infty \frac{M}{|t_0|^n}|t_1|^n = M\sum_{n=N}^\infty\left(\frac{|t_1|}{|t_0|}\right)^n < \infty$$
because the last sum is a geometric series with $|t_1|/|t_0| < 1$ by the hypothesis $|t_1| < |t_0|$. Since the first $N$ terms obviously amount to a finite quantity, the theorem follows.

In a similar way, we can prove that if the series diverges for some value $t_0\in\mathbb{C}$, then it diverges for every value $t_1$ such that $|t_1| > |t_0|$. Obviously, a series can converge for the single value $t_0 = 0$, as happens for $\sum_n n!\,t^n$, or can converge for every value $t\in\mathbb{C}$, as for $\sum_n t^n/n! = e^t$. In all the other cases, the previous theorem implies:

Theorem 7.1.2 Let $f(t) = \sum_n f_n t^n$ be a power series; then there exists a non-negative number $R\in\mathbb{R}$ or $R = \infty$ such that:

1. for every complex number $t_0$ such that $|t_0| < R$ the series (absolutely) converges and, in fact, the convergence is uniform in every circle of radius $\rho < R$;

2. for every complex number $t_0$ such that $|t_0| > R$ the series does not converge.

The uniform convergence derives from the previous proof: the constant $M$ can be made unique by choosing the largest value for all the $t_0$ such that $|t_0| \le \rho$. The value of $R$ is uniquely determined and is called the radius of convergence of the series. From the proof of the theorem, for $r < R$ we have $|f_n|r^n \le M$, or $\sqrt[n]{|f_n|} \le \sqrt[n]{M}/r \to 1/r$; this implies $\limsup\sqrt[n]{|f_n|} \le 1/R$. Besides, for $r > R$ we have $|f_n|r^n \ge 1$ for infinitely many $n$; this implies $\limsup\sqrt[n]{|f_n|} \ge 1/R$, and therefore we have the following formula for the radius of convergence:
$$\frac1R = \limsup_{n\to\infty}\sqrt[n]{|f_n|}.$$

This result is the basis for our considerations on the asymptotics of the coefficients of a power series. In fact, it implies that, as a first approximation, $|f_n|$ grows as $1/R^n$. However, this is a rough estimate, because it can also grow as $n/R^n$ or $1/(nR^n)$, and many possibilities arise which can make the basic approximation more precise; the next sections will be dedicated to this problem. We conclude by noticing that if:
$$\lim_{n\to\infty}\left|\frac{f_{n+1}}{f_n}\right| = S$$
then $R = 1/S$ is the radius of convergence of the series.

7.2 The method of Darboux

Newton's rule is the basis for many considerations on asymptotics. In practice, we used it to prove that $F_n\sim\phi^n/\sqrt5$, and many other proofs can be performed by using Newton's rule together with the following theorem, whose relevance was noted by Bender and which, therefore, will be called Bender's theorem:

Theorem 7.2.1 Let $f(t) = g(t)h(t)$, where $f(t)$, $g(t)$, $h(t)$ are power series and $h(t)$ has a radius of convergence larger than $f(t)$'s (which therefore equals the radius of convergence of $g(t)$); if $\lim_{n\to\infty}g_n/g_{n+1} = b$ and $h(b)\neq0$, then:
$$f_n \sim h(b)\,g_n$$

Let us remember that if $g(t)$ has positive real coefficients, then $g_n/g_{n+1}$ tends to the radius of convergence of $g(t)$. The proof of this theorem is omitted here; instead, we give a simple example. Let us suppose we wish to find the asymptotic value of the Motzkin numbers, whose generating function is:
$$\mu(t) = \frac{1 - t - \sqrt{1-2t-3t^2}}{2t^2}.$$
For $n\ge2$ we obviously have:
$$\mu_n = [t^n]\,\frac{1-t-\sqrt{1-2t-3t^2}}{2t^2} = -\frac12\,[t^{n+2}]\,\sqrt{1+t}\,(1-3t)^{1/2}.$$
We now observe that the radius of convergence of $\mu(t)$ is $R = 1/3$, which is the same as the radius of $g(t) = (1-3t)^{1/2}$, while $h(t) = \sqrt{1+t}$ has 1 as radius of convergence; therefore we have $\mu_n/\mu_{n+1}\to1/3$ as $n\to\infty$. By Bender's theorem we find:
$$\mu_n \sim -\frac12\sqrt{\frac43}\,[t^{n+2}](1-3t)^{1/2} = -\frac{\sqrt3}{3}\binom{1/2}{n+2}(-3)^{n+2} = \frac{\sqrt3}{3(2n+3)}\binom{2n+4}{n+2}\left(\frac34\right)^{n+2}.$$
This is a particular case of a more general result due to Darboux and known as Darboux' method. First of all, let us show how it is possible to obtain an approximation for the binomial coefficient $\binom{\gamma}{n}$, when $\gamma\in\mathbb{C}$ is a fixed number and $n$ is large. We begin by proving the following formula for the ratio of two large values of the $\Gamma$ function ($a$, $b$ are two small parameters with respect to $n$):
$$\frac{\Gamma(n+a)}{\Gamma(n+b)} = n^{a-b}\left(1 + \frac{(a-b)(a+b-1)}{2n} + O\!\left(\frac{1}{n^2}\right)\right).$$
Let us apply the Stirling formula for the $\Gamma$ function:
$$\frac{\Gamma(n+a)}{\Gamma(n+b)} \approx \sqrt{\frac{2\pi}{n+a}}\left(\frac{n+a}{e}\right)^{n+a}\left(1 + \frac{1}{12(n+a)}\right)\times\sqrt{\frac{n+b}{2\pi}}\left(\frac{e}{n+b}\right)^{n+b}\left(1 - \frac{1}{12(n+b)}\right).$$
If we limit ourselves to the term in $1/n$, the two corrections cancel each other and therefore we find:
$$\frac{\Gamma(n+a)}{\Gamma(n+b)} \approx \sqrt{\frac{n+b}{n+a}}\;e^{b-a}\,\frac{(n+a)^{n+a}}{(n+b)^{n+b}} = \sqrt{\frac{n+b}{n+a}}\;e^{b-a}\,n^{a-b}\,\frac{(1+a/n)^{n+a}}{(1+b/n)^{n+b}}.$$
We now obtain asymptotic approximations in the following way:
$$\sqrt{\frac{n+b}{n+a}} = \frac{\sqrt{1+b/n}}{\sqrt{1+a/n}} \approx \left(1+\frac{b}{2n}\right)\left(1-\frac{a}{2n}\right) \approx 1 + \frac{b-a}{2n}$$
$$\left(1+\frac{x}{n}\right)^{n+x} = \exp\left((n+x)\ln\left(1+\frac{x}{n}\right)\right) = \exp\left((n+x)\left(\frac{x}{n} - \frac{x^2}{2n^2} + \cdots\right)\right) =$$
$$= \exp\left(x + \frac{x^2}{n} - \frac{x^2}{2n} + \cdots\right) = e^x\left(1 + \frac{x^2}{2n} + \cdots\right).$$
Therefore, for our expression we have:
$$\frac{\Gamma(n+a)}{\Gamma(n+b)} \approx n^{a-b}\,\frac{e^a}{e^{a-b}\,e^b}\left(1+\frac{a^2}{2n}\right)\left(1-\frac{b^2}{2n}\right)\left(1+\frac{b-a}{2n}\right) = n^{a-b}\left(1 + \frac{a^2-b^2-a+b}{2n} + O\!\left(\frac1{n^2}\right)\right)$$
and, since $(a-b)(a+b-1) = a^2-b^2-a+b$, this is the stated formula. We are now in a position to prove the following theorem.

Theorem 7.2.2 Let $f(t) = h(t)(1-\alpha t)^\gamma$, for some $\gamma$ which is not a positive integer, and $h(t)$ having a radius of convergence larger than $1/\alpha$. Then we have:
$$f_n = [t^n]f(t) \sim h\!\left(\frac1\alpha\right)\binom{\gamma}{n}(-\alpha)^n = \frac{\alpha^n\,h(1/\alpha)}{\Gamma(-\gamma)\,n^{1+\gamma}}.$$

Proof: We simply apply Bender's theorem and the formula for approximating the binomial coefficient:
$$\binom{\gamma}{n} = \frac{\gamma(\gamma-1)\cdots(\gamma-n+1)}{n!} = \frac{(-1)^n(n-\gamma-1)(n-\gamma-2)\cdots(1-\gamma)(-\gamma)}{\Gamma(n+1)}.$$
By repeated applications of the recurrence formula for the $\Gamma$ function, $\Gamma(x+1) = x\Gamma(x)$, we find:
$$\Gamma(n-\gamma) = (n-\gamma-1)(n-\gamma-2)\cdots(1-\gamma)(-\gamma)\,\Gamma(-\gamma)$$
and therefore:
$$\binom{\gamma}{n} = \frac{(-1)^n\,\Gamma(n-\gamma)}{\Gamma(n+1)\,\Gamma(-\gamma)} = \frac{(-1)^n}{\Gamma(-\gamma)}\,n^{-1-\gamma}\left(1 + \frac{\gamma(\gamma+1)}{2n}\right)$$
from which the desired formula follows.

7.3 Singularities: poles

The considerations in the previous sections show how important it is to determine the radius of convergence of a series when we wish to have an approximation of its coefficients. Therefore, we are now going to look more closely at methods for finding the radius of convergence of a given function $f(t)$. In order to do this, we have to distinguish between the function $f(t)$ and its series development, which will be denoted by $\hat f(t)$. So, for example, we have $f(t) = 1/(1-t)$ and $\hat f(t) = 1 + t + t^2 + t^3 + \cdots$. The series $\hat f(t)$ represents $f(t)$ inside the circle of convergence, in the sense that $\hat f(t) = f(t)$ for every $t$ internal to this circle, but $\hat f(t)$ can be different from $f(t)$ on the border of the circle or outside it, where actually the series does not converge. Therefore, the radius of convergence can be determined if we are able to find out values $t_0$ for which $f(t_0)\neq\hat f(t_0)$.

A first case is when $t_0$ is an isolated point for which $\lim_{t\to t_0}f(t) = \infty$. In fact, in this case, for every $t$ such that $|t| > |t_0|$ the series $\hat f(t)$ should diverge, as we have seen in the previous section, and therefore $\hat f(t)$ must be different from $f(t)$. This is the case of $f(t) = 1/(1-t)$, which goes to $\infty$ when $t\to1$. When $|t| > 1$, $f(t)$ assumes a well-defined value while $\hat f(t)$ diverges.

We will call a singularity for $f(t)$ every point $t_0\in\mathbb{C}$ such that in every neighborhood of $t_0$ there is a $t$ for which $f(t)$ and $\hat f(t)$ behave differently and a $t'$ for which $f(t') = \hat f(t')$. Therefore, $t_0 = 1$ is a singularity for $f(t) = 1/(1-t)$. Because of our previous considerations, the singularities of $f(t)$ determine its radius of convergence; on the other hand, no singularity can be contained in the circle of convergence, and therefore the radius of convergence is determined by the singularity or singularities of smallest modulus. These will be called dominating singularities, and we observe explicitly that a function can have more than one dominating singularity. For example, $f(t) = 1/(1-t^2)$ has $t = 1$ and $t = -1$ as dominating singularities, because $|1| = |-1|$. The radius of convergence is always a non-negative real number and we have $R = |t_0|$, if $t_0$ is any one of the dominating singularities of $f(t)$.

An isolated point $t_0$ for which $f(t_0) = \infty$ is therefore a singularity for $f(t)$; as we shall see, not every singularity of $f(t)$ is such that $f(t_0) = \infty$ but, for the moment, let us limit ourselves to this case. The following situation is very important: if $f(t_0) = \infty$ and we set $\alpha = 1/t_0$, we will say that $t_0$ is a pole for $f(t)$ iff there exists a positive integer $m$ such that:
$$\lim_{t\to t_0}(1-\alpha t)^m f(t) = K < \infty \qquad\text{and}\qquad K\neq0.$$
The integer $m$ is called the order of the pole. By this definition, the function $f(t) = 1/(1-t)$ has a pole of order 1 in $t_0 = 1$, while $1/(1-t)^2$ has a pole of order 2 in $t_0 = 1$ and $1/(1-2t)^5$ has a pole of order 5 in $t_0 = 1/2$. A more interesting case is $f(t) = (e^t - e)/(1-t)^2$ which, notwithstanding the $(1-t)^2$, has a pole of order 1 in $t_0 = 1$; in fact:
$$\lim_{t\to1}(1-t)\,\frac{e^t-e}{(1-t)^2} = \lim_{t\to1}\frac{e^t-e}{1-t} = \lim_{t\to1}\frac{e^t}{-1} = -e.$$
The generating function of the Bernoulli numbers, $f(t) = t/(e^t-1)$, has infinitely many poles. Observe first that $t = 0$ is not a pole because:
$$\lim_{t\to0}\frac{t}{e^t-1} = \lim_{t\to0}\frac{1}{e^t} = 1.$$
The denominator becomes 0 when $e^t = 1$, and this happens when $t = 2k\pi i$; in fact, $e^{2k\pi i} = \cos2k\pi + i\sin2k\pi = 1$. In that case, the dominating singularities are $t_0 = 2\pi i$ and $t_1 = -2\pi i$. Finally, the generating function of the ordered Bell numbers, $f(t) = 1/(2-e^t)$, has again an infinite number of poles $t = \ln2 + 2k\pi i$; in this case the dominating singularity is $t_0 = \ln2$.

We conclude this section by observing that if $f(t_0) = \infty$, not necessarily $t_0$ is a pole for $f(t)$. In

fact, let us consider the generating function for the central binomial coefficients, $f(t) = 1/\sqrt{1-4t}$. For $t_0 = 1/4$ we have $f(1/4) = \infty$, but $t_0$ is not a pole of order 1 because:
$$\lim_{t\to1/4}\frac{1-4t}{\sqrt{1-4t}} = \lim_{t\to1/4}\sqrt{1-4t} = 0$$
and the same happens if we try with $(1-4t)^m$ for $m > 1$. As we shall see, this kind of singularity is called "algebraic". Finally, let us consider the function $f(t) = \exp(1/(1-t))$, which goes to $\infty$ as $t\to1$. In this case we have:
$$\lim_{t\to1}(1-t)^m\exp\left(\frac{1}{1-t}\right) = \lim_{t\to1}(1-t)^m\left(1 + \frac{1}{1-t} + \frac{1}{2(1-t)^2} + \cdots\right) = \infty.$$
In fact, the terms with $k < m$ tend to 0 and the term with $k = m$ tends to $1/m!$, but all the other terms go to $\infty$. Therefore, $t_0 = 1$ is not a pole of any order. Whenever we have a function $f(t)$ for which a point $t_0\in\mathbb{C}$ exists such that, for all $m > 0$, $\lim_{t\to t_0}(1-t/t_0)^m f(t) = \infty$, we say that $t_0$ is an essential singularity for $f(t)$. Essential singularities are points at which $f(t)$ goes to $\infty$ too fast; these singularities cannot be treated by Darboux' method and their study will be delayed until we study Hayman's method.

7.4 Poles and asymptotics

Darboux' method can be easily used to deal with functions whose dominating singularities are poles. Actually, a direct application of Bender's theorem is sufficient, and this is the way we will use in the following examples.

Fibonacci numbers are easily approximated:
$$[t^n]\,\frac{t}{1-t-t^2} = [t^n]\,\frac{t}{1-\hat\phi t}\cdot\frac{1}{1-\phi t} \sim \left[\frac{t}{1-\hat\phi t}\;\bigg|\; t = \frac1\phi\right][t^n]\,\frac{1}{1-\phi t} = \frac{1}{\sqrt5}\,\phi^n.$$
Our second example concerns a particular kind of permutations, called derangements (see Section 2.2). A derangement is a permutation without any fixed point. For $n = 0$ the empty permutation is considered a derangement, since no fixed point exists. For $n = 1$ there is no derangement, but for $n = 2$ the permutation $(1\;2)$, written in cycle notation, is actually a derangement. For $n = 3$ we have the two derangements $(1\;2\;3)$ and $(1\;3\;2)$, and for $n = 4$ we have a total of 9 derangements.

Let $D_n$ be the number of derangements in $P_n$; we can count them in the following way. We begin by subtracting from $n!$, the total number of permutations, the number of permutations having at least one fixed point: if the fixed point is 1, we have $(n-1)!$ possible permutations; if the fixed point is 2, we have again $(n-1)!$ permutations of the other elements. Therefore, we have a total of $n(n-1)!$ cases, giving the approximation:
$$D_n = n! - n(n-1)!.$$
This quantity is clearly 0, and this happens because we have subtracted twice every permutation with at least 2 fixed points: in fact, we subtracted it when we considered the first and the second fixed point. Therefore, we now have to add back the permutations with at least two fixed points. These are obtained by choosing the two fixed points in all the $\binom{n}{2}$ possible ways and then permuting the $n-2$ remaining elements. Thus we have the new approximation:
$$D_n = n! - n(n-1)! + \binom{n}{2}(n-2)!.$$
In this way, however, we added twice the permutations with at least three fixed points, which have to be subtracted again. We thus obtain:
$$D_n = n! - n(n-1)! + \binom{n}{2}(n-2)! - \binom{n}{3}(n-3)!.$$
We can now go on with the same method, which is called the inclusion-exclusion principle, and eventually arrive at the final value:
$$D_n = n! - n(n-1)! + \binom{n}{2}(n-2)! - \binom{n}{3}(n-3)! + \cdots =$$
$$= \frac{n!}{0!} - \frac{n!}{1!} + \frac{n!}{2!} - \frac{n!}{3!} + \cdots = n!\sum_{k=0}^n\frac{(-1)^k}{k!}.$$
This formula checks with the previously found values. We obtain the exponential generating function $\mathcal{G}(D_n/n!)$ by observing that the generic element in the sum is a coefficient of $e^{-t}$, and therefore, by the theorem on the generating function of partial sums, we have:
$$\mathcal{G}\!\left(\frac{D_n}{n!}\right) = \frac{e^{-t}}{1-t}.$$
In order to find the asymptotic value of $D_n$, we observe that the radius of convergence of $1/(1-t)$ is 1, while $e^{-t}$ converges for every value of $t$. By Bender's theorem we have:
$$\frac{D_n}{n!}\sim e^{-1} \qquad\text{or}\qquad D_n\sim\frac{n!}{e}.$$
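This asymptotic statement can be made concrete with a short Python check that the inclusion-exclusion value coincides with the integer nearest to $n!/e$ (the helper name `derangements` is ours):

```python
import math

def derangements(n):
    # inclusion-exclusion: D_n = n! sum_{k=0}^{n} (-1)^k / k!
    # (each division is exact, since k! divides n!)
    return sum((-1)**k * math.factorial(n) // math.factorial(k)
               for k in range(n + 1))

assert [derangements(n) for n in range(5)] == [1, 0, 1, 2, 9]
# D_n equals the integer nearest to n!/e
for n in range(1, 15):
    assert derangements(n) == round(math.factorial(n) / math.e)
print("ok")
```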

This asymptotic value is indeed a very good approximation for $D_n$, which can actually be computed as the integer nearest to $n!/e$.

Let us now see how Bender's theorem is applied to the exponential generating function of the ordered Bell numbers. We have shown that the dominating singularity is a pole at $t = \ln2$, which has order 1:
$$\lim_{t\to\ln2}\frac{1-t/\ln2}{2-e^t} = \lim_{t\to\ln2}\frac{-1/\ln2}{-e^t} = \frac{1}{2\ln2}.$$
At this point we have:
$$[t^n]\,\frac{1}{2-e^t} = [t^n]\,\frac{1}{1-t/\ln2}\cdot\frac{1-t/\ln2}{2-e^t} \sim \left[\frac{1-t/\ln2}{2-e^t}\;\bigg|\;t=\ln2\right][t^n]\,\frac{1}{1-t/\ln2} = \frac12\,\frac{1}{(\ln2)^{n+1}}$$
and we conclude with the very good approximation $O_n\sim n!/(2(\ln2)^{n+1})$.

Finally, we find the asymptotic approximation for the Bernoulli numbers. The following statement is very important when we have functions with several dominating singularities:

Principle: If $t_1, t_2, \ldots, t_k$ are all the dominating singularities of a function $f(t)$, then $[t^n]f(t)$ can be found by summing all the contributions obtained by independently considering the $k$ singularities.

We already observed that $\pm2\pi i$ are the two dominating

7.5 Algebraic and logarithmic singularities

Let us consider the generating function for the Catalan numbers, $f(t) = (1-\sqrt{1-4t})/(2t)$, and the corresponding power series $\hat f(t) = 1 + t + 2t^2 + 5t^3 + 14t^4 + \cdots$. Our choice of the $-$ sign was motivated by the initial condition of the recurrence $C_{n+1} = \sum_{k=0}^n C_k C_{n-k}$ defining the Catalan numbers. This is due to the fact that, when the argument is a positive real number, we can choose the positive value as the result of a square root. In other words, we consider the arithmetic square root instead of the algebraic square root. This allows us to identify the power series $\hat f(t)$ with the function $f(t)$, but when we pass to complex numbers this is no longer possible. Actually, in the complex field, a function containing a square root is a two-valued function, and there are two branches defined by the same expression. Only one of these two branches coincides with the function defined by the power series, which is obviously a one-valued function.

The points at which a square root becomes 0 are special points; in them the function is one-valued, but in every neighborhood the function is two-valued. For the smallest in modulus among these points, say $t_0$, we must have the following situation: for $t$ such that $|t| < |t_0|$, $\hat f(t)$ should coincide with a branch of $f(t)$, while for $t$ such that $|t| > |t_0|$, $\hat f(t)$ cannot converge.
nating singularities for the generating function of the In fact, consider a t ∈ R, t > |t0 |; the expression un-
Bernoulli numbers; they are both poles of order 1: der the square root should be a negative real number
and therefore f (t) ∈ C\R; but fb(t) can only be a real
t(1 − t/2πi) 1 − t/πi
lim = lim = −1. number or f (t) does not converge. Because we know
t→2πi et − 1 t→2πi et
that when fb(t) converges we must have fb(t) = f (t),
t(1 + t/2πi) 1 + t/πi we conclude that fb(t) cannot converge. This shows
lim = lim = −1.
t→−2πi et − 1 t→−2πi et that t0 is a singularity for f (t).
Therefore we have: Every kth root originates the same problem and
n t n 1 t(1 − t/2πi) 1 the function is actually a k-valued function; all the
[t ] t = [t ] ∼− .
e −1 1 − t/2πi et − 1 (2πi)n values for which the argument of the root is 0 is a
A similar result is obtained for the other pole; thus singularity, called an algebraic singularity. They can
we have: be treated by Darboux’ method or, directly, by means
of Bender’s theorem, which relies on Newton’s rule.
Bn 1 1
∼− − . Actually, we already used this method to find the
n! (2πi)n (−2πi)n asymptotic evaluation for the Motzkin numbers.
When n is odd, these two values are opposite in sign The same considerations hold when a function con-
and the result is 0; this confirms that the Bernoulli tains a logarithm. In fact, a logarithm is an infinite-
numbers of odd index are 0, except for n = 1. When valued function, because it is the inverse of the expo-
n is even, say n = 2k, we have (2πi)2k = (−2πi)2k = nential, which, in the complex field C, is a periodic
(−1)k (2π)2k ; therefore: function:
2(−1)k (2k)!
B2k ∼ − . et+2kπi = et e2kπi = et (cos 2kπ + i sin 2kπ) = et .
(2π)2k
This formula is a good approximation, also for small The period of et is therefore 2πi and ln t is actually
values of n, and shows that Bernoulli numbers be- ln t + 2kπi, for k ∈ Z. A point t0 for which the ar-
come, in modulus, larger and larger as n increases. gument of a logarithm is 0 is a singularity for the
corresponding function. In every neighborhood of $t_0$, the function has an infinite number of branches; this is the only fact distinguishing a logarithmic singularity from an algebraic one.

Let us suppose we have the sum:

$$S_n = 1 + \frac{2}{2} + \frac{4}{3} + \frac{8}{4} + \cdots + \frac{2^{n-1}}{n} = \frac{1}{2}\sum_{k=1}^{n}\frac{2^k}{k}$$

and we wish to compute an approximate value. The generating function is:

$$\mathcal{G}\!\left(\frac{1}{2}\sum_{k=1}^{n}\frac{2^k}{k}\right) = \frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t}.$$

There are two singularities: $t=1$ is a pole, while $t=1/2$ is a logarithmic singularity. Since the latter has smaller modulus, it is dominating, and $R=1/2$ is the radius of convergence of the function. By Bender's theorem we have:

$$S_n = \frac{1}{2}[t^n]\frac{1}{1-t}\ln\frac{1}{1-2t} \sim \frac{1}{2}\,\frac{1}{1-1/2}[t^n]\ln\frac{1}{1-2t} = \frac{2^n}{n}.$$

This is not a very good approximation. In the next section we will see how it can be improved.

7.6 Subtracted singularities

The methods presented in the preceding sections only give the expression describing the general behavior of the coefficients $f_n$ in the expansion $\hat f(t) = \sum_{k=0}^{\infty} f_k t^k$, i.e., what is called the principal value for $f_n$. Sometimes, this behavior is only achieved for very large values of $n$, but for smaller values it is just a rough approximation of the true value. Because of that, we speak of "asymptotic evaluation" or "asymptotic approximation". When we need a true approximation, we should introduce some corrections, which slightly modify the general behavior and more accurately evaluate the true value of $f_n$.

Many times, the following observation solves the problem. Suppose we have found, by one of the previous methods, that a function $f(t)$ is such that $f(t) \sim A(1-\alpha t)^{\gamma}$, for some $A, \gamma \in\mathbb{R}$, or, more generally, $f(t) \sim g(t)$, for some function $g(t)$ of which we know the coefficients $g_n$ exactly. For example, this is the case of $\ln(1/(1-\alpha t))$. Because $f_n \sim g_n$, the function $h(t) = f(t) - g(t)$ has coefficients $h_n$ that grow more slowly than $f_n$; formally, since $O(f_n) = O(g_n)$, we must have $h_n = o(f_n)$. Therefore, the quantity $g_n + h_n$ is a better approximation to $f_n$ than $g_n$.

When $f(t)$ has a pole $t_0$ of order $m$, the successive application of this method of subtracted singularities completely eliminates the singularity. Therefore, if a second singularity $t_1$ exists such that $|t_1| > |t_0|$, we can express the coefficients $f_n$ in terms of $t_1$ as well. When $f(t)$ has a dominating algebraic singularity, this cannot be eliminated, but the method of subtracted singularities allows us to obtain corrections to the principal value. Formally, by the successive application of this method, we arrive at the following results. If $t_0$ is a dominating pole of order $m$ and $\alpha = 1/t_0$, then we find the expansion:

$$f(t) = \frac{A_{-m}}{(1-\alpha t)^m} + \frac{A_{-m+1}}{(1-\alpha t)^{m-1}} + \frac{A_{-m+2}}{(1-\alpha t)^{m-2}} + \cdots.$$

If $t_0$ is a dominating algebraic singularity and $f(t) = h(t)(1-\alpha t)^{p/m}$, where $\alpha = 1/t_0$ and $h(t)$ has a radius of convergence larger than $|t_0|$, then we find the expansion:

$$f(t) = A_p(1-\alpha t)^{p/m} + A_{p-1}(1-\alpha t)^{(p-1)/m} + A_{p-2}(1-\alpha t)^{(p-2)/m} + \cdots.$$

Newton's rule can obviously be used to pass from these expansions to the asymptotic value of $f_n$.

The same method of subtracted singularities can be used for a logarithmic singularity. Let us consider as an example the sum $S_n = \sum_{k=1}^{n} 2^{k-1}/k$, introduced in the previous section. We found the principal value $S_n \sim 2^n/n$ by studying the generating function:

$$\frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t} \sim \ln\frac{1}{1-2t}.$$

Let us therefore consider the new function:

$$h(t) = \frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t} - \ln\frac{1}{1-2t} = -\frac{1-2t}{2(1-t)}\ln\frac{1}{1-2t}.$$

The generic term $h_n$ should be significantly less than $f_n$; the factor $(1-2t)$ actually reduces the order of growth of the logarithm:

$$-[t^n]\frac{1-2t}{2(1-t)}\ln\frac{1}{1-2t} \sim -\frac{1}{2(1-1/2)}[t^n](1-2t)\ln\frac{1}{1-2t} =$$
$$= -\left(\frac{2^n}{n} - 2\,\frac{2^{n-1}}{n-1}\right) = \frac{2^n}{n(n-1)}.$$

Therefore, a better approximation for $S_n$ is:

$$S_n \sim \frac{2^n}{n} + \frac{2^n}{n(n-1)} = \frac{2^n}{n-1}.$$

The reader can easily verify that this correction greatly reduces the error in the evaluation of $S_n$.
A further correction can now be obtained by considering:

$$k(t) = \frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t} - \ln\frac{1}{1-2t} + (1-2t)\ln\frac{1}{1-2t} = \frac{(1-2t)^2}{2(1-t)}\ln\frac{1}{1-2t}$$

which gives:

$$k_n \sim \frac{1}{2(1-1/2)}[t^n](1-2t)^2\ln\frac{1}{1-2t} = \frac{2^n}{n} - 4\,\frac{2^{n-1}}{n-1} + 4\,\frac{2^{n-2}}{n-2} = \frac{2^{n+1}}{n(n-1)(n-2)}.$$

This correction is still smaller, and we can write:

$$S_n \sim \frac{2^n}{n-1}\left(1 + \frac{2}{n(n-2)}\right).$$

In general, we can obtain the same results if we expand the function $h(t)$ in $f(t) = g(t)h(t)$, $h(t)$ having a radius of convergence larger than that of $f(t)$, around the dominating singularity. This is done in the following way:

$$\frac{1}{2(1-t)} = \frac{1}{1+(1-2t)} = 1 - (1-2t) + (1-2t)^2 - (1-2t)^3 + \cdots.$$

This implies:

$$\frac{1}{2(1-t)}\ln\frac{1}{1-2t} = \ln\frac{1}{1-2t} - (1-2t)\ln\frac{1}{1-2t} + (1-2t)^2\ln\frac{1}{1-2t} - \cdots$$

and the result is the same as the one previously obtained by the method of subtracted singularities.

7.7 The asymptotic behavior of a trinomial square root

In many problems we arrive at a generating function of the form:

$$f(t) = \frac{p(t) - \sqrt{(1-\alpha t)(1-\beta t)}}{r\,t^m}$$

or:

$$g(t) = \frac{q(t)}{\sqrt{(1-\alpha t)(1-\beta t)}}.$$

In the former case, $p(t)$ is a correcting polynomial, which has no effect on $f_n$ for $n$ sufficiently large, and therefore we have:

$$f_n = -\frac{1}{r}[t^{n+m}]\sqrt{(1-\alpha t)(1-\beta t)}$$

where $m$ is a small integer. In the second case, $g_n$ is the sum of various terms, as many as there are terms in the polynomial $q(t)$, each one of the form:

$$q_k[t^{n-k}]\frac{1}{\sqrt{(1-\alpha t)(1-\beta t)}}.$$

It is therefore interesting to compute, once and for all, the asymptotic value of $[t^n]((1-\alpha t)(1-\beta t))^s$, where $s = 1/2$ or $s = -1/2$.

Let us suppose that $|\alpha| > |\beta|$, since the case $\alpha = \beta$ has no interest and the case $\alpha = -\beta$ should be approached in another way. This hypothesis means that $t = 1/\alpha$ is the radius of convergence of the function and we can develop everything around this singularity. In most combinatorial problems we have $\alpha > 0$, because the coefficients of $f(t)$ are positive numbers, but this is not a limiting factor.

Let us consider $s = 1/2$; in this case, a minus sign should precede the square root. The evaluation is shown in Table 7.1. The formula so obtained can be considered sufficient for obtaining both the asymptotic evaluation of $f_n$ and a suitable numerical approximation. However, we can use the following developments:

$$\binom{2n}{n} = \frac{4^n}{\sqrt{\pi n}}\left(1 - \frac{1}{8n} + \frac{1}{128n^2} + O\!\left(\frac{1}{n^3}\right)\right)$$
$$\frac{1}{2n-1} = \frac{1}{2n}\left(1 + \frac{1}{2n} + \frac{1}{4n^2} + O\!\left(\frac{1}{n^3}\right)\right)$$
$$\frac{1}{2n-3} = \frac{1}{2n}\left(1 + \frac{3}{2n} + O\!\left(\frac{1}{n^2}\right)\right)$$

and get:

$$f_n = \sqrt{\frac{\alpha-\beta}{\alpha}}\,\frac{\alpha^n}{2n\sqrt{\pi n}}\left(1 - \frac{6\beta-3(\alpha-\beta)}{8(\alpha-\beta)n} + \frac{25}{128n^2} - \frac{9\beta}{8(\alpha-\beta)n^2} - \frac{9\alpha\beta+6\beta^2}{32(\alpha-\beta)^2n^2} + O\!\left(\frac{1}{n^3}\right)\right).$$

The reader is invited to find a similar formula for the case $s = -1/2$.

7.8 Hayman's method

The method for coefficient evaluation which uses the function singularities (Darboux' method) can be improved and made more accurate, as we have seen, by the technique of "subtracted singularities". Unfortunately, these methods become useless when the function $f(t)$ has no singularity (entire functions) or when the dominating singularity is essential. In fact, in the former case we do not have any singularity to operate on, and in the latter the development around the singularity gives rise to a series with an infinite number of terms of negative degree.
$$[t^n]\left(-(1-\alpha t)^{1/2}(1-\beta t)^{1/2}\right) = [t^n]\left(-(1-\alpha t)^{1/2}\left(\frac{\alpha-\beta}{\alpha} + \frac{\beta}{\alpha}(1-\alpha t)\right)^{1/2}\right) =$$
$$= -\sqrt{\frac{\alpha-\beta}{\alpha}}\,[t^n](1-\alpha t)^{1/2}\left(1 + \frac{\beta}{\alpha-\beta}(1-\alpha t)\right)^{1/2} =$$
$$= -\sqrt{\frac{\alpha-\beta}{\alpha}}\,[t^n](1-\alpha t)^{1/2}\left(1 + \frac{\beta}{2(\alpha-\beta)}(1-\alpha t) - \frac{\beta^2}{8(\alpha-\beta)^2}(1-\alpha t)^2 + \cdots\right) =$$
$$= -\sqrt{\frac{\alpha-\beta}{\alpha}}\,[t^n]\left((1-\alpha t)^{1/2} + \frac{\beta}{2(\alpha-\beta)}(1-\alpha t)^{3/2} - \frac{\beta^2}{8(\alpha-\beta)^2}(1-\alpha t)^{5/2} + \cdots\right) =$$
$$= -\sqrt{\frac{\alpha-\beta}{\alpha}}\left(\binom{1/2}{n}(-\alpha)^n + \frac{\beta}{2(\alpha-\beta)}\binom{3/2}{n}(-\alpha)^n - \frac{\beta^2}{8(\alpha-\beta)^2}\binom{5/2}{n}(-\alpha)^n + \cdots\right) =$$
$$= -\sqrt{\frac{\alpha-\beta}{\alpha}}\,\frac{(-1)^{n-1}}{4^n(2n-1)}\binom{2n}{n}(-\alpha)^n\left(1 - \frac{3\beta}{2(\alpha-\beta)(2n-3)} - \frac{15\beta^2}{8(\alpha-\beta)^2(2n-3)(2n-5)} + \cdots\right) =$$
$$= \sqrt{\frac{\alpha-\beta}{\alpha}}\,\frac{\alpha^n}{4^n(2n-1)}\binom{2n}{n}\left(1 - \frac{3\beta}{2(\alpha-\beta)(2n-3)} - \frac{15\beta^2}{8(\alpha-\beta)^2(2n-3)(2n-5)} + O\!\left(\frac{1}{n^3}\right)\right).$$

Table 7.1: The case $s=1/2$

In these cases, the only method seems to be the Cauchy theorem, which allows us to evaluate $[t^n]f(t)$ by means of an integral:

$$f_n = \frac{1}{2\pi i}\oint_{\gamma}\frac{f(t)}{t^{n+1}}\,dt$$

where $\gamma$ is a suitable path enclosing the origin. We do not intend to develop this method here, but we will limit ourselves to sketching a method, derived from the Cauchy theorem, which allows us to find an asymptotic evaluation for $f_n$ in many practical situations. The method can be implemented on a computer in the following sense: given a function $f(t)$, in an algorithmic way we can check whether $f(t)$ belongs to the class of functions for which the method is applicable (the class of "H-admissible" functions) and, if that is the case, we can evaluate the principal value of the asymptotic estimate for $f_n$. The system ΛΥΩ, by Flajolet, Salvy and Zimmermann, realizes this method. The development of the method was mainly performed by Hayman, and therefore it is known as Hayman's method; this also justifies the use of the letter H in the definition of H-admissibility.

A function is called H-admissible if and only if it belongs to one of the following classes, or can be obtained, in a finite number of steps according to the following rules, from other H-admissible functions:

1. if $f(t)$ and $g(t)$ are H-admissible functions and $p(t)$ is a polynomial with real coefficients and positive leading term, then:
$$\exp(f(t)) \qquad f(t)+g(t) \qquad f(t)+p(t) \qquad p(f(t)) \qquad p(t)f(t)$$
are all H-admissible functions;

2. if $p(t)$ is a polynomial with positive coefficients and not of the form $p(t^k)$ for $k>1$, then the function $\exp(p(t))$ is H-admissible;

3. if $\alpha, \beta$ are positive real numbers and $\gamma, \delta$ are real numbers, then the function:
$$f(t) = \exp\!\left(\frac{\beta}{(1-t)^{\alpha}}\left(\frac{1}{t}\ln\frac{1}{1-t}\right)^{\!\gamma}\left(\frac{1}{t}\ln\!\left(\frac{1}{t}\ln\frac{1}{1-t}\right)\right)^{\!\delta}\right)$$
is H-admissible.

For example, the following functions are all H-admissible:

$$e^t \qquad \exp\!\left(t+\frac{t^2}{2}\right) \qquad \exp\!\left(\frac{t}{1-t}\right) \qquad \exp\!\left(\frac{1}{t(1-t)^2}\ln\frac{1}{1-t}\right).$$

In particular, for the third function we have:

$$\exp\!\left(\frac{t}{1-t}\right) = \exp\!\left(\frac{1}{1-t}-1\right) = \frac{1}{e}\exp\!\left(\frac{1}{1-t}\right)$$

and naturally a constant does not influence the H-admissibility of a function. In this example we have $\alpha=\beta=1$ and $\gamma=\delta=0$.

For H-admissible functions, the following result holds:

Theorem 7.8.1 Let $f(t)$ be an H-admissible function; then:

$$f_n = [t^n]f(t) \sim \frac{f(r)}{r^n\sqrt{2\pi b(r)}} \qquad\text{as } n\to\infty$$

where $r = r(n)$ is the least positive solution of the equation $tf'(t)/f(t) = n$, and $b(t)$ is the function:

$$b(t) = t\,\frac{d}{dt}\left(t\,\frac{f'(t)}{f(t)}\right).$$

As we said before, the proof of this theorem is based on Cauchy's theorem and is beyond the scope of these notes. Instead, let us show some examples to clarify the application of Hayman's method.
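Theorem 7.8.1 is easy to try numerically. The following sketch (our own, not from the text) works with $l(t)=\ln f(t)$ to avoid overflow: it solves $t\,l'(t)=n$ by bisection, approximates $b(r)$ by a central difference, and returns an estimate of $\ln f_n$:

```python
import math

def hayman_log_coeff(logf, n, lo=1e-9, hi=100.0, h=1e-6):
    """Estimate ln([t^n] f(t)) ~ ln f(r) - n ln r - (1/2) ln(2 pi b(r)).

    logf is ln f(t); the saddle point r is the least positive solution
    of t (ln f)'(t) = n, found by bisection on [lo, hi].
    """
    dlogf = lambda t: (logf(t + h) - logf(t - h)) / (2 * h)
    g = lambda t: t * dlogf(t)                 # g(t) = t f'(t)/f(t)
    a, b = lo, hi
    for _ in range(200):                       # bisection for g(r) = n
        m = (a + b) / 2
        if g(m) < n:
            a = m
        else:
            b = m
    r = (a + b) / 2
    br = r * (g(r + h) - g(r - h)) / (2 * h)   # b(t) = t g'(t)
    return logf(r) - n * math.log(r) - 0.5 * math.log(2 * math.pi * br)

# f(t) = e^t: here r = n, b(r) = n, and [t^n] e^t = 1/n!
print(hayman_log_coeff(lambda t: t, 10), -math.log(math.factorial(10)))
```

The two printed values differ by about $1/(12n)$, the usual size of the error of Stirling's formula; for a function such as $\exp(t/(1-t))$, the bracket `hi` must of course be kept below the singularity $t=1$.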
7.9 Examples of Hayman's Theorem

The first example can be easily verified. Let $f(t)=e^t$ be the exponential function, so that we know $f_n = 1/n!$. To apply Hayman's theorem, we have to solve the equation $te^t/e^t = n$, which gives $r=n$. The function $b(t)$ is simply $t$, and therefore we have:

$$[t^n]e^t = \frac{e^n}{n^n\sqrt{2\pi n}}$$

and in this formula we immediately recognize Stirling's approximation for factorials.

Examples soon become rather complex and require a large amount of computation. Let us consider the following sum:

$$\sum_{k=0}^{n-1}\binom{n-1}{k}\frac{1}{(k+1)!} = \sum_{k=0}^{n}\binom{n-1}{k-1}\frac{1}{k!} = \sum_{k=0}^{n}\left(\binom{n}{k}-\binom{n-1}{k}\right)\frac{1}{k!} =$$
$$= \sum_{k=0}^{n}\binom{n}{k}\frac{1}{k!} - \sum_{k=0}^{n-1}\binom{n-1}{k}\frac{1}{k!} =$$
$$= [t^n]\frac{1}{1-t}\exp\!\left(\frac{t}{1-t}\right) - [t^{n-1}]\frac{1}{1-t}\exp\!\left(\frac{t}{1-t}\right) =$$
$$= [t^n]\frac{1-t}{1-t}\exp\!\left(\frac{t}{1-t}\right) = [t^n]\exp\!\left(\frac{t}{1-t}\right).$$

We have already seen that this function is H-admissible, and therefore we can try to evaluate the asymptotic behavior of the sum. Let us define the function $g(t) = tf'(t)/f(t)$, which in the present case is:

$$g(t) = \frac{\dfrac{t}{(1-t)^2}\exp\!\left(\dfrac{t}{1-t}\right)}{\exp\!\left(\dfrac{t}{1-t}\right)} = \frac{t}{(1-t)^2}.$$

The value of $r$ is therefore given by the minimal positive solution of:

$$\frac{t}{(1-t)^2} = n \qquad\text{or}\qquad nt^2 - (2n+1)t + n = 0.$$

Because $\Delta = 4n^2+4n+1-4n^2 = 4n+1$, we have the two solutions:

$$r = \frac{2n+1\pm\sqrt{4n+1}}{2n}$$

and we must accept the one with the '$-$' sign, which is positive and less than the other. It is surely positive, because $\sqrt{4n+1} < 2\sqrt n + 1 < 2n$, for every $n > 1$.

As $n$ grows, we can also give an asymptotic approximation of $r$, by developing the expression inside the square root:

$$\sqrt{4n+1} = 2\sqrt n\,\sqrt{1+\frac{1}{4n}} = 2\sqrt n\left(1 + \frac{1}{8n} + O\!\left(\frac{1}{n^2}\right)\right).$$

From this formula we immediately obtain:

$$r = 1 - \frac{1}{\sqrt n} + \frac{1}{2n} - \frac{1}{8n\sqrt n} + O\!\left(\frac{1}{n^2}\right)$$

which will be used in the next approximations.

First we compute an approximation for $f(r)$, that is $\exp(r/(1-r))$. Since:

$$1-r = \frac{1}{\sqrt n} - \frac{1}{2n} + \frac{1}{8n\sqrt n} + O\!\left(\frac{1}{n^2}\right) = \frac{1}{\sqrt n}\left(1 - \frac{1}{2\sqrt n} + \frac{1}{8n} + O\!\left(\frac{1}{n\sqrt n}\right)\right)$$

we immediately obtain:

$$\frac{r}{1-r} = \left(1 - \frac{1}{\sqrt n} + \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt n}\right)\right)\times\sqrt n\left(1 + \frac{1}{2\sqrt n} + \frac{1}{8n} + O\!\left(\frac{1}{n\sqrt n}\right)\right) =$$
$$= \sqrt n\left(1 - \frac{1}{2\sqrt n} + \frac{1}{8n} + O\!\left(\frac{1}{n\sqrt n}\right)\right) = \sqrt n - \frac{1}{2} + \frac{1}{8\sqrt n} + O\!\left(\frac{1}{n}\right).$$

Finally, the exponential gives:

$$\exp\!\left(\frac{r}{1-r}\right) = \frac{e^{\sqrt n}}{\sqrt e}\exp\!\left(\frac{1}{8\sqrt n} + O\!\left(\frac 1n\right)\right) = \frac{e^{\sqrt n}}{\sqrt e}\left(1 + \frac{1}{8\sqrt n} + O\!\left(\frac 1n\right)\right).$$

Because Hayman's method only gives the principal value of the result, the correction can be ignored (it may not be precise) and we get:

$$\exp\!\left(\frac{r}{1-r}\right) = \frac{e^{\sqrt n}}{\sqrt e}.$$

The second part we have to develop is $1/r^n$, which can be computed when we write it as $\exp(n\ln(1/r))$, that is:

$$\frac{1}{r^n} = \exp\!\left(n\ln\!\left(1 + \frac{1}{\sqrt n} + \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt n}\right)\right)\right) =$$
$$= \exp\!\left(n\left(\frac{1}{\sqrt n} + \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt n}\right) - \frac{1}{2}\left(\frac{1}{\sqrt n} + O\!\left(\frac{1}{\sqrt n\,n}\right)\right)^{\!2} + O\!\left(\frac{1}{n\sqrt n}\right)\right)\right)
$$= \exp\!\left(n\left(\frac{1}{\sqrt n} + \frac{1}{2n} - \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt n}\right)\right)\right) = \exp\!\left(\sqrt n + O\!\left(\frac{1}{\sqrt n}\right)\right) \sim e^{\sqrt n}.$$

Again, the correction is ignored and we only consider the principal value. We now observe that $f(r)/r^n \sim e^{2\sqrt n}/\sqrt e$.

Only $b(r)$ remains to be computed; we have $b(t) = tg'(t)$, where $g(t)$ is as above, and therefore we have:

$$b(t) = t\,\frac{d}{dt}\,\frac{t}{(1-t)^2} = \frac{t(1+t)}{(1-t)^3}.$$

This quantity can be computed a piece at a time. First:

$$r(1+r) = \left(1 - \frac{1}{\sqrt n} + \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt n}\right)\right)\times\left(2 - \frac{1}{\sqrt n} + \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt n}\right)\right) =$$
$$= 2 - \frac{3}{\sqrt n} + \frac{5}{2n} + O\!\left(\frac{1}{n\sqrt n}\right) = 2\left(1 - \frac{3}{2\sqrt n} + \frac{5}{4n} + O\!\left(\frac{1}{n\sqrt n}\right)\right).$$

By using the expression already found for $(1-r)$ we then have:

$$(1-r)^3 = \frac{1}{n\sqrt n}\left(1 - \frac{1}{2\sqrt n} + \frac{1}{8n} + O\!\left(\frac{1}{n\sqrt n}\right)\right)^{\!3} =$$
$$= \frac{1}{n\sqrt n}\left(1 - \frac{1}{\sqrt n} + \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt n}\right)\right)\times\left(1 - \frac{1}{2\sqrt n} + \frac{1}{8n} + O\!\left(\frac{1}{n\sqrt n}\right)\right) =$$
$$= \frac{1}{n\sqrt n}\left(1 - \frac{3}{2\sqrt n} + \frac{9}{8n} + O\!\left(\frac{1}{n\sqrt n}\right)\right).$$

By inverting this quantity, we eventually get:

$$b(r) = \frac{r(1+r)}{(1-r)^3} = 2n\sqrt n\left(1 + \frac{3}{2\sqrt n} + \frac{9}{8n} + O\!\left(\frac{1}{n\sqrt n}\right)\right)\times\left(1 - \frac{3}{2\sqrt n} + \frac{5}{4n} + O\!\left(\frac{1}{n\sqrt n}\right)\right) =$$
$$= 2n\sqrt n\left(1 + \frac{1}{8n} + O\!\left(\frac{1}{n\sqrt n}\right)\right).$$

The principal value is $2n\sqrt n$ and therefore:

$$\sqrt{2\pi b(r)} = \sqrt{4\pi n\sqrt n} = 2\sqrt\pi\,n^{3/4};$$

the final result is:

$$f_n = [t^n]\exp\!\left(\frac{t}{1-t}\right) \sim \frac{e^{2\sqrt n}}{2\sqrt{\pi e}\,n^{3/4}}.$$

It is a simple matter to compute the original sum $\sum_{k=0}^{n-1}\binom{n-1}{k}/(k+1)!$ by using a personal computer for, say, $n = 100, 200, 300$. The results obtained can be compared with the evaluation of the previous formula. We can thus verify that the relative error decreases as $n$ increases.
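Such a check might look like the following sketch (names are ours; the sum is evaluated in exact rational arithmetic):

```python
import math
from fractions import Fraction

def exact_sum(n):
    # sum_{k=0}^{n-1} binom(n-1, k) / (k+1)!
    return sum(Fraction(math.comb(n - 1, k), math.factorial(k + 1))
               for k in range(n))

def hayman_estimate(n):
    # principal value  e^{2 sqrt(n)} / (2 sqrt(pi e) n^{3/4})
    return math.exp(2 * math.sqrt(n)) / (2 * math.sqrt(math.pi * math.e) * n ** 0.75)

for n in (100, 200, 300):
    err = abs(hayman_estimate(n) / float(exact_sum(n)) - 1)
    print(n, err)
```

The relative error is of the order of one percent in this range, in agreement with the remark above that it decreases as $n$ increases.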
Chapter 8

Bibliography

The birth of Computer Science and the need of analyzing the behavior of algorithms and data structures have given a strong impetus to Combinatorial Analysis, and to the mathematical methods used to study combinatorial objects. So, alongside the traditional literature on Combinatorics, a number of books and papers have been produced, relating Computer Science and the methods of Combinatorial Analysis. The first author who systematically worked in this direction was surely Donald Knuth, who published in 1968 the first volume of his monumental "The Art of Computer Programming", the first part of which is dedicated to the mathematical methods used in Combinatorial Analysis. Without this basic knowledge, there is little hope of understanding the developments of the analysis of algorithms and data structures:

Donald E. Knuth: The Art of Computer Programming: Fundamental Algorithms, Vol. I, Addison-Wesley (1968).

Many additional concepts and techniques are also contained in the third volume:

Donald E. Knuth: The Art of Computer Programming: Sorting and Searching, Vol. III, Addison-Wesley (1973).

Numerical and probabilistic developments are to be found in the central volume:

Donald E. Knuth: The Art of Computer Programming: Numerical Algorithms, Vol. II, Addison-Wesley (1973).

A concise exposition of several important techniques is given in:

Daniel H. Greene, Donald E. Knuth: Mathematics for the Analysis of Algorithms, Birkhäuser (1981).

The most systematic work in the field is perhaps the following text, very clear and very comprehensive. It deals, in an elementary but rigorous way, with the main topics of Combinatorial Analysis, with an eye to Computer Science. Many exercises (with solutions) are proposed, and the reader is never left alone in front of the many-faced problems he or she is encouraged to tackle:

Ronald L. Graham, Donald E. Knuth, Oren Patashnik: Concrete Mathematics, Addison-Wesley (1989).

Many texts on Combinatorial Analysis are worth considering, because they contain information on general concepts, both from a combinatorial and a mathematical point of view:

William Feller: An Introduction to Probability Theory and Its Applications, Wiley (1950) (1957) (1968).

John Riordan: An Introduction to Combinatorial Analysis, Wiley (1953) (1958).

Louis Comtet: Advanced Combinatorics, Reidel (1974).

Ian P. Goulden, David M. Jackson: Combinatorial Enumeration, Dover Publ. (2004).

Richard P. Stanley: Enumerative Combinatorics, Vol. I, Cambridge Univ. Press (1986) (2000).

Richard P. Stanley: Enumerative Combinatorics, Vol. II, Cambridge Univ. Press (1997) (2001).

Robert Sedgewick, Philippe Flajolet: An Introduction to the Analysis of Algorithms, Addison-Wesley (1995).
***

Chapter 1: Introduction

The concepts relating to the problem of searching can be found in any text on Algorithms and Data Structures, in particular Volume III of the quoted "The Art of Computer Programming". Volume I can be consulted for the Landau notation, even if it is common in Mathematics. For the mathematical concepts of the Γ and ψ functions, the reader is referred to:

Milton Abramowitz, Irene A. Stegun: Handbook of Mathematical Functions, Dover Publ. (1972).

***

Chapter 2: Special numbers

The quoted book by Graham, Knuth and Patashnik is appropriate. Anyhow, the quantities we consider are common to all parts of Mathematics, in particular of Combinatorial Analysis. The ζ function and the Bernoulli numbers are also covered by the book of Abramowitz and Stegun. Stanley dedicates to Catalan numbers a special survey in his second volume. Probably, they are the most frequently used quantity in Combinatorics, just after binomial coefficients and before Fibonacci numbers. Stirling numbers are very important in Numerical Analysis, but here we are more interested in their combinatorial meaning. Harmonic numbers are the "discrete" version of logarithms, and from this fact their relevance in Combinatorics and in the Analysis of Algorithms originates.

Sequences studied in Combinatorial Analysis have been collected and annotated by Sloane. The resulting book (and the corresponding Internet site) is one of the most important reference points:

N. J. A. Sloane, Simon Plouffe: The Encyclopedia of Integer Sequences, Academic Press (1995). Available at: http://www.research.att.com/~njas/sequences/.

***

Chapter 3: Formal Power Series

Formal power series have a long tradition; the reader can find their algebraic foundation in the book:

Peter Henrici: Applied and Computational Complex Analysis, Wiley (1974) Vol. I; (1977) Vol. II; (1986) Vol. III.

Coefficient extraction is central in our approach; a formalization can be found in:

Donatella Merlini, Renzo Sprugnoli, M. Cecilia Verri: The method of coefficients, The American Mathematical Monthly (to appear).

The Lagrange Inversion Formula can be found in most of the quoted texts, in particular in Stanley and Henrici. Nowadays, almost every part of Mathematics is available by means of several Computer Algebra Systems, implementing on a computer the ideas described in this chapter, and many, many others. Maple and Mathematica are among the most used systems; other software is available, free or not. These systems have become an essential tool for developing any new aspect of Mathematics.

***

Chapter 4: Generating functions

In our present approach the concept of an (ordinary) generating function is essential, and this justifies our insistence on the two operators "generating function" and "coefficient of". Since their introduction in Mathematics, due to de Moivre in the XVIII Century, generating functions have been a controversial concept. Now they are accepted almost universally, and their theory has been developed in many of the quoted texts. In particular:

Herbert S. Wilf: Generatingfunctionology, Academic Press (1990).

A practical device has been realized in Maple:

Bruno Salvy, Paul Zimmermann: GFUN: a Maple package for the manipulation of generating and holonomic functions in one variable, INRIA, Rapport Technique N. 143 (1992).

The number of applications of generating functions is almost infinite, so we limit our considerations to some classical cases relative to Computer Science. Sometimes, we intentionally complicate proofs in order to stress the use of generating functions as a unifying approach. So, our proofs should be compared with those given (for the same problems) by Knuth or by Flajolet and Sedgewick.

***

Chapter 5: Riordan arrays
Riordan arrays are part of the general method of coefficients and are particularly important in proving combinatorial identities and generating function transformations. They were introduced by:

Louis W. Shapiro, Seyoum Getu, Wen-Jin Woan, Leon C. Woodson: The Riordan group, Discrete Applied Mathematics, 34 (1991) 229 – 239.

Their practical relevance was noted in:

Renzo Sprugnoli: Riordan arrays and combinatorial sums, Discrete Mathematics, 132 (1992) 267 – 290.

The theory was further developed in:

Donatella Merlini, Douglas G. Rogers, Renzo Sprugnoli, M. Cecilia Verri: On some alternative characterizations of Riordan arrays, Canadian Journal of Mathematics 49 (1997) 301 – 320.

Riordan arrays are strictly related to "convolution matrices" and to "Umbral calculus", although, rather strangely, nobody seems to have noticed the connection of these concepts with combinatorial sums:

Steve Roman: The Umbral Calculus, Academic Press (1984).

Donald E. Knuth: Convolution polynomials, The Mathematica Journal, 2 (1991) 67 – 78.

Collections of combinatorial identities are:

Henry W. Gould: Combinatorial Identities. A Standardized Set of Tables Listing 500 Binomial Coefficient Summations, West Virginia University (1972).

Josef Kaucky: Combinatorial Identities, Veda, Bratislava (1975).

Other methods to prove combinatorial identities are important in Combinatorial Analysis. We quote:

John Riordan: Combinatorial Identities, Wiley (1968).

G. P. Egorychev: Integral Representation and the Computation of Combinatorial Sums, American Math. Society Translations, Vol. 59 (1984).

Doron Zeilberger: A holonomic systems approach to special functions identities, Journal of Computational and Applied Mathematics 32 (1990), 321 – 368.

Doron Zeilberger: A fast algorithm for proving terminating hypergeometric identities, Discrete Mathematics 80 (1990), 207 – 211.

M. Petkovšek, Herbert S. Wilf, Doron Zeilberger: A = B, A. K. Peters (1996).

In particular, the method of Wilf and Zeilberger completely solves the problem of combinatorial identities "with hypergeometric terms". This means that we can algorithmically establish whether an identity is true or false, provided the two members of the identity $\sum_k L(k,n) = R(n)$ have a special form: the ratios

$$\frac{L(k+1,n)}{L(k,n)} \qquad \frac{L(k,n+1)}{L(k,n)} \qquad \frac{R(n+1)}{R(n)}$$

are all rational functions in $n$ and $k$. This actually means that $L(k,n)$ and $R(n)$ are composed of factorials, powers and binomial coefficients. In this sense, Riordan arrays are less powerful, but can also be used when non-hypergeometric terms are involved, as for example in the case of harmonic numbers and of Stirling numbers of both kinds.

***

Chapter 6: Formal Methods

In this chapter we have considered two important methods: the symbolic method for deducing counting generating functions from the syntactic definition of combinatorial objects, and the method of "operators" for obtaining combinatorial identities from relations between transformations of sequences defined by operators.

The symbolic method was started by Schützenberger and Viennot, who devised a technique to automatically generate counting generating functions from a context-free non-ambiguous grammar. When the grammar defines a class of combinatorial objects, this method gives a direct way to obtain monovariate or multivariate generating functions, which allow us to solve many problems relative to the given objects. Since context-free languages only define algebraic generating functions
(the subset of regular grammars is limited to rational functions), the method is not very general, but is very effective whenever it can be applied. The method was extended by Flajolet to some classes of exponential generating functions and implemented in Maple.

Marcel-Paul Schützenberger: Context-free languages and pushdown automata, Information and Control 6 (1963) 246 – 264.

Maylis Delest, Xavier Viennot: Algebraic languages and polyominoes enumeration, X Colloquium on Automata, Languages and Programming - Lecture Notes in Computer Science (1983) 173 – 181.

Philippe Flajolet: Symbolic enumerative combinatorics and complex asymptotic analysis, Algorithms Seminar, (2001). Available at: http://algo.inria.fr/seminars/sem00-01/flajolet.html

The method of operators is very old and was developed in the XIX Century by English mathematicians, especially George Boole. A classical book in this direction is:

Charles Jordan: Calculus of Finite Differences, Chelsea Publ. (1965).

Actually, the method is used in Numerical Analysis, but it has a clear connection with Combinatorial Analysis, as our numerous examples show. The important concepts of indefinite and definite summation are used by Wilf and Zeilberger in the quoted texts. The Euler-Maclaurin summation formula is the first connection between the finite methods (considered up to this moment) and asymptotics.

***

Chapter 7: Asymptotics

The methods treated in the previous chapters are "exact", in the sense that every time they give the solution to a problem, this solution is a precise formula. This, however, is not always possible, and many times we are not able to find a solution of this kind. In these cases, we would also be content with an approximate solution, provided we can give an upper bound to the error committed. The purpose of asymptotic methods is just that.

The natural settings for these problems are Complex Analysis and the theory of series. We have used a rather descriptive approach, limiting our considerations to elementary cases. These situations are covered by the quoted texts, especially Knuth, Wilf and Henrici. The treatment of Hayman's method is based on:

Micha Hofri: Probabilistic Analysis of Algorithms, Springer (1987).
Index

Γ function, 8, 84
ΛΥΩ, 90
ψ function, 8
1-1 correspondence, 11
1-1 mapping, 11
A-sequence, 58
absolute scale, 9
addition operator, 77
Adelson-Velski, 52
adicity, 35
algebraic singularity, 86, 87
algebraic square root, 87
Algol’60, 69
Algol’68, 70
algorithm, 5
alphabet, 12, 67
alternating subgroup, 14
ambiguous grammar, 68
Appell subgroup, 57
arithmetic square root, 87
arrangement, 11
asymptotic approximation, 88
asymptotic development, 80
asymptotic evaluation, 88
average case analysis, 5
AVL tree, 52
Backus Normal Form, 70
Bell number, 24
Bell subgroup, 57
Bender’s theorem, 84
Bernoulli number, 18, 24, 53, 80, 85
big-oh notation, 9
bijection, 11
binary searching, 6
binary tree, 20
binomial coefficient, 8, 15
binomial formula, 15
bisection formula, 41
bivariate generating function, 55
BNF, 70
Boole, George, 73
branch, 87
C language, 69
cardinality, 11
Cartesian product, 11
Catalan number, 20, 34
Catalan triangle, 63
Cauchy product, 26
Cauchy theorem, 90
central binomial coefficient, 16, 17
central trinomial coefficient, 46
characteristic function, 12
Chomsky, Noam, 67
choose, 15
closed form expression, 7
codomain, 11
coefficient extraction rules, 30
coefficient of operator, 30
coefficient operator, 30
colored walk, 62
colored walk problem, 62
column, 11
column index, 11
combination, 15
combination with repetitions, 16
complete colored walk, 62
composition of f.p.s., 29
composition of permutations, 13
composition rule for coefficient of, 30
composition rule for generating functions, 40
compositional inverse of a f.p.s., 29
Computer Algebra System, 34
context free grammar, 68
context free language, 68
convergence, 83
convolution, 26
convolution rule for coefficient of, 30
convolution rule for generating functions, 40
cross product rule, 17
cycle, 12, 21
cycle degree, 12
cycle representation, 12
Darboux’ method, 84
definite integration of a f.p.s., 28
definite summation, 78
degree of a permutation, 12
delta series, 29
derangement, 12, 86
derivation, 67
diagonal step, 62


diagonalisation rule for generating functions, 40
difference operator, 73
differentiation of a f.p.s., 28
differentiation rule for coefficient of, 30
differentiation rule for generating functions, 40
digamma function, 8
disposition, 15
divergence, 83
domain, 11
dominating singularity, 85
double sequence, 11
Dyck grammar, 68
Dyck language, 68
Dyck walk, 21
Dyck word, 69
east step, 20, 62
empty word, 12, 67
entire function, 89
essential singularity, 86
Euclid’s algorithm, 19
Euler constant, 8, 18
Euler transformation, 42, 55
Euler-Maclaurin summation formula, 80
even permutation, 13
exponential algorithm, 10
exponential generating function, 25, 39
exponentiation of f.p.s., 28
extensional definition, 11
extraction of the coefficient, 29
f.p.s., 25
factorial, 8, 14
falling factorial, 15
Fibonacci number, 19
Fibonacci problem, 19
Fibonacci word, 70
Fibonacci, Leonardo, 18
finite operator, 73
fixed point, 12
Flajolet, Philippe, 90
formal grammar, 67
formal language, 67
formal Laurent series, 25, 27
formal power series, 25
free monoid, 67
full history recurrence, 47
function, 11
Gauss’ integral, 8
generalized convolution rule, 56
generalized harmonic number, 18
generating function, 25
generating function rules, 40
generation, 67
geometric series, 30
grammar, 67
group, 67
H-admissible function, 90
Hardy’s identity, 61
harmonic number, 8, 18
harmonic series, 17
Hayman’s method, 90
head, 67
height balanced binary tree, 52
height of a tree, 52
i.p.l., 51
identity, 12
identity for composition, 29
identity operator, 73
identity permutation, 13
image, 11
inclusion exclusion principle, 86
indefinite precision, 35
indefinite summation, 78
indeterminate, 25
index, 11
infinitesimal operator, 73
initial condition, 6, 47
injective function, 11
input, 5
integral lattice, 20
integration of a f.p.s., 28
intensional definition, 11
internal path length, 51
intractable algorithm, 10
intrinsically ambiguous grammar, 69
invertible f.p.s., 26
involution, 13, 49
juxtaposition, 67
k-combination, 15
key, 5
Kronecker’s delta, 43
Lagrange, 32
Lagrange inversion formula, 33
Lagrange subgroup, 57
Landau, Edmund, 9
Landis, 52
language, 12, 67
language generated by the grammar, 68
leftmost occurrence, 67
length of a walk, 62
length of a word, 67
letter, 12, 67
LIF, 33
linear algorithm, 10
linear recurrence, 47
linear recurrence with constant coefficients, 48
linear recurrence with polynomial coefficients, 48
linearity rule for coefficient of, 30
linearity rule for generating functions, 40
list representation, 36
logarithm of a f.p.s., 28
logarithmic algorithm, 10
logarithmic singularity, 88
mapping, 11
Mascheroni constant, 8, 18
metasymbol, 70
method of shifting, 44
Miller, J. C. P., 37
monoid, 67
Motzkin number, 71
Motzkin triangle, 63
Motzkin word, 71
multiset, 24
negation rule, 16
Newton’s rule, 28, 30, 84
non convergence, 83
non-ambiguous grammar, 68
north step, 20, 62
north-east step, 20
number of involutions, 14
number of mappings, 11
Numerical Analysis, 73
O-notation, 9
object grammar, 71
occurrence, 67
occurs in, 67
odd permutation, 13
operand, 35
operations on rational numbers, 35
operator, 35, 72
order of a f.p.s., 25
order of a pole, 85
order of a recurrence, 47
ordered Bell number, 24, 85
ordered partition, 24
ordinary generating function, 25
output, 5
p-ary tree, 34
parenthetization, 20, 68
partial fraction expansion, 30, 48
partial history recurrence, 47
partially recursive set, 68
Pascal language, 69
Pascal triangle, 16
path, 20
permutation, 12
place marker, 25
Pochhammer symbol, 15
pole, 85
polynomial algorithm, 10
power of a f.p.s., 27
preferential arrangement number, 24
preferential arrangements, 24
prefix, 67
principal value, 88
principle of identity, 40
problem of searching, 5
product of f.L.s., 27
product of f.p.s., 26
production, 67
program, 5
proper Riordan array, 55
quadratic algorithm, 10
quasi-unit, 29
rabbit problem, 19
radius of convergence, 83
random permutation, 14
range, 11
recurrence relation, 6, 47
renewal array, 57
residue of a f.L.s., 30
reverse of a f.p.s., 29
Riemann zeta function, 18
Riordan array, 55
Riordan’s old identity, 61
rising factorial, 15
Rogers, Douglas, 57
root, 21
rooted planar tree, 21
row, 11
row index, 11
row-by-column product, 32
Salvy, Bruno, 90
Schützenberger methodology, 68
semi-closed form, 46
sequence, 11
sequential searching, 5
set, 11
set partition, 23
shift operator, 73
shifting rule for coefficient of, 30
shifting rule for generating functions, 40
shuffling, 14
simple binomial coefficients, 59
singularity, 85
small-oh notation, 9
solving a recurrence, 6
sorting, 12
south-east step, 20
square root of a f.p.s., 28
Stanley, 33
Stirling number of the first kind, 21
Stirling number of the second kind, 22
Stirling polynomial, 23
Stirling, James, 21, 22
subgroup of associated operators, 57
subtracted singularity, 88
subword, 67
successful search, 5
suffix, 67
sum of a geometric progression, 44
sum of f.L.s, 27
sum of f.p.s., 26
summation by parts, 78
summing factor method, 49
surjective function, 11
symbol, 12, 67
symbolic method, 68
symmetric colored walk problem, 62
symmetric group, 14
symmetry formula, 16
syntax directed compilation, 70

table, 5
tail, 67
Tartaglia triangle, 16
Taylor theorem, 80
terminal word, 68
tractable algorithm, 10
transposition, 12
transposition representation, 13
tree representation, 36
triangle, 55
trinomial coefficient, 46

unary-binary tree, 71
underdiagonal colored walk, 62
underdiagonal walk, 20
unfold a recurrence, 6, 49
uniform convergence, 83
unit, 26
unsuccessful search, 5

van Wijngaarden grammar, 70
Vandermonde convolution, 43
vector notation, 11
vector representation, 12

walk, 20
word, 12, 67
worst AVL tree, 52
worst case analysis, 5

Z-sequence, 58
zeta function, 18
Zimmermann, Paul, 90
