An Introduction to

Mathematical Methods in Combinatorics
Renzo Sprugnoli
Dipartimento di Sistemi e Informatica
Viale Morgagni, 65 - Firenze (Italy)
January 18, 2006
Contents
1 Introduction 5
1.1 What is the Analysis of an Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 The Analysis of Sequential Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Binary Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Closed Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 The Landau notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Special numbers 11
2.1 Mappings and powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 The group structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Counting permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Dispositions and Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6 The Pascal triangle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.7 Harmonic numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.8 Fibonacci numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.9 Walks, trees and Catalan numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.10 Stirling numbers of the first kind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.11 Stirling numbers of the second kind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.12 Bell and Bernoulli numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3 Formal power series 25
3.1 Definitions for formal power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 The basic algebraic structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Formal Laurent Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Operations on formal power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.6 Coefficient extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.7 Matrix representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.8 Lagrange inversion theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.9 Some examples of the LIF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.10 Formal power series and the computer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.11 The internal representation of expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.12 Basic operations of formal power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.13 Logarithm and exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4 Generating Functions 39
4.1 General Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Some Theorems on Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3 More advanced results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 Common Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5 The Method of Shifting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.6 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.7 Some special generating functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.8 Linear recurrences with constant coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.9 Linear recurrences with polynomial coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.10 The summing factor method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.11 The internal path length of binary trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.12 Height balanced binary trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.13 Some special recurrences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5 Riordan Arrays 55
5.1 Definitions and basic concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.2 The algebraic structure of Riordan arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3 The A-sequence for proper Riordan arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.4 Simple binomial coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.5 Other Riordan arrays from binomial coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.6 Binomial coefficients and the LIF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.7 Coloured walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.8 Stirling numbers and Riordan arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.9 Identities involving the Stirling numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6 Formal methods 67
6.1 Formal languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.2 Context-free languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.3 Formal languages and programming languages . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.4 The symbolic method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.5 The bivariate case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6.6 The Shift Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.7 The Difference Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.8 Shift and Difference Operators - Example I . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.9 Shift and Difference Operators - Example II . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.10 The Addition Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.11 Definite and Indefinite summation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.12 Definite Summation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.13 The Euler-McLaurin Summation Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.14 Applications of the Euler-McLaurin Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7 Asymptotics 83
7.1 The convergence of power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2 The method of Darboux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.3 Singularities: poles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.4 Poles and asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.5 Algebraic and logarithmic singularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.6 Subtracted singularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.7 The asymptotic behavior of a trinomial square root . . . . . . . . . . . . . . . . . . . . . . . . 89
7.8 Hayman’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.9 Examples of Hayman’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
8 Bibliography 93
Chapter 1
Introduction
1.1 What is the Analysis of an Algorithm
An algorithm is a finite sequence of unambiguous
rules for solving a problem. Once the algorithm is
started with a particular input, it will always end,
obtaining the correct answer or output, which is the
solution to the given problem. An algorithm is real-
ized on a computer by means of a program, i.e., a set
of instructions which cause the computer to perform
the elaborations intended by the algorithm.
So, an algorithm is independent of any computer,
and, in fact, the word was used for a long time before
computers were invented. Leonardo Fibonacci called
“algorisms” the rules for performing the four basic
operations (addition, subtraction, multiplication and
division) with the newly introduced arabic digits and
the positional notation for numbers. “Euclid’s algo-
rithm” for evaluating the greatest common divisor of
two integer numbers was also well known before the
appearance of computers.
Many algorithms can exist which solve the same
problem. Some can be very skillful, others can be
very simple and straight-forward. A natural prob-
lem, in these cases, is to choose, if any, the best al-
gorithm, in order to realize it as a program on a par-
ticular computer. Simple algorithms are more easily
programmed, but they can be very slow or require
large amounts of computer memory. The problem
is therefore to have some means to judge the speed
and the quantity of computer resources a given algo-
rithm uses. The aim of the “Analysis of Algorithms”
is just to give the mathematical bases for describing
the behavior of algorithms, thus obtaining exact cri-
teria to compare different algorithms performing the
same task, or to see if an algorithm is sufficiently good
to be efficiently implemented on a computer.
Let us consider, as a simple example, the problem of searching. Let S be a given set. In practical cases S is the set N of natural numbers, or the set Z of integer numbers, or the set R of real numbers, or also the set $A^*$ of the words on some alphabet A. However, as a matter of fact, the nature of S is not essential, because on a computer we always deal with a suitable binary representation of the elements in S, which have therefore to be considered as "words" in the computer memory. The "problem of searching" is as follows: we are given a finite ordered subset $T = (a_1, a_2, \ldots, a_n)$ of S (usually called a table, its elements referred to as keys) and an element s ∈ S, and we wish to know whether s ∈ T or s ∉ T, and in the former case which element $a_k$ in T it is.
Although the mathematical problem "s ∈ T or not" has almost no relevance, the searching problem is basic in computer science, and many algorithms have been devised to make the process of searching as fast as possible. Surely, the most straightforward algorithm is sequential searching: we begin by comparing s and $a_1$, and we are finished if they are equal. Otherwise, we compare s and $a_2$, and so on, until we find an element $a_k = s$ or reach the end of T. In the former case the search is successful and we have determined the element in T equal to s. In the latter case we conclude that s ∉ T and the search is unsuccessful.
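The following sketch realizes the algorithm in Pascal, the language of the programs given later in these notes; the function name and the declarations (e.g., type table = array [ 1 .. maxn ] of integer) are our own illustrative assumptions, and we suppose n ≥ 1:

function seqsearch( var t : table; n, s : integer ) : integer ;
  { compares s with t[1], t[2], ... and stops at the first match;
    returns its index for a successful search, 0 for an unsuccessful one }
  var i : integer ;
begin
  i := 0 ;
  repeat
    i := i + 1
  until ( i = n ) or ( t[ i ] = s ) ;
  if t[ i ] = s then seqsearch := i
  else seqsearch := 0
end {seqsearch} ;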
The analysis of this (simple) algorithm consists in
finding one or more mathematical expressions de-
scribing in some way the number of operations per-
formed by the algorithm as a function of the number
n of the elements in T. This definition is intentionally
vague and the following points should be noted:
• an algorithm can present several aspects and
therefore may require several mathematical ex-
pressions to be fully understood. For example,
for what concerns the sequential search algo-
rithm, we are interested in what happens during
a successful or unsuccessful search. Besides, for
a successful search, we wish to know what the
worst case is (Worst Case Analysis) and what
the average case is (Average Case Analysis) with
respect to all the tables containing n elements or,
for a fixed table, with respect to the n elements
it contains;
• the operations performed by an algorithm can be
of many different kinds. In the example above
the only operations involved in the algorithm are
comparisons, so no doubt is possible. In other
algorithms we can have arithmetic or logical op-
erations. Sometimes we can also consider more
complex operations as square roots, list concate-
nation or reversal of words. Operations depend
on the nature of the algorithm, and we can de-
cide to consider as an “operation” also very com-
plicated manipulations (e.g., extracting a ran-
dom number or performing the differentiation of
a polynomial of a given degree). The important
point is that every instance of the “operation”
takes on about the same time or has the same
complexity. If this is not the case, we can give
different weights to different operations or to dif-
ferent instances of the same operation.
We observe explicitly that we never consider exe-
cution time as a possible parameter for the behavior
of an algorithm. As stated before, algorithms are
independent of any particular computer and should
never be confused with the programs realizing them.
Algorithms are only related to the “logic” used to
solve the problem; programs can depend on the abil-
ity of the programmer or on the characteristics of the
computer.
1.2 The Analysis of Sequential Searching
The analysis of the sequential searching algorithm is
very simple. For a successful search, the worst case
analysis is immediate, since we have to perform n
comparisons if $s = a_n$ is the last element in T. More interesting is the average case analysis, which introduces the first mathematical device of algorithm analysis. To find the average number of comparisons in a successful search we should sum the number of comparisons necessary to find any element in T, and then divide by n. It is clear that if $s = a_1$ we only need a single comparison; if $s = a_2$ we need two comparisons, and so on. Consequently, we have:
$$C_n = \frac{1}{n}(1 + 2 + \cdots + n) = \frac{1}{n}\cdot\frac{n(n+1)}{2} = \frac{n+1}{2}$$
This result is intuitively clear but important, since
it shows in mathematical terms that the number of
comparisons performed by the algorithm (and hence
the time taken on a computer) grows linearly with
the dimension of T.
The concluding step of our analysis was the execu-
tion of a sum—a well-known sum in the present case.
This is typical of many algorithms and, as a matter of
fact, the ability to perform sums is an important
technical point for the analyst. A large part of our
efforts will be dedicated to this topic.
Let us now consider an unsuccessful search, for which we only have an Average Case analysis. If $U_k$ denotes the number of comparisons necessary for a table with k elements, we can determine $U_n$ in the following way. We compare s with the first element $a_1$ in T and obviously we find $s \ne a_1$, so we should go on with the table $T' = T \setminus \{a_1\}$, which contains n − 1 elements. Consequently, we have:
$$U_n = 1 + U_{n-1}$$
This is a recurrence relation, that is, an expression relating the value of $U_n$ with other values $U_k$ having k < n. It is clear that if some value, e.g., $U_0$ or $U_1$, is known, then it is possible to find the value of $U_n$ for every n ∈ N. In our case, $U_0$ is the number of comparisons necessary to find out that an element s does not belong to a table containing no element. Hence we have the initial condition $U_0 = 0$ and we can unfold the preceding recurrence relation:
$$U_n = 1 + U_{n-1} = 1 + 1 + U_{n-2} = \cdots = \underbrace{1 + 1 + \cdots + 1}_{n\ \mathrm{times}} + U_0 = n$$
Recurrence relations are the other mathematical
device arising in algorithm analysis. In our example
the recurrence is easily transformed into a sum, but
as we shall see this is not always the case. In general
we have the problem of solving a recurrence, i.e., to
find an explicit expression for $U_n$, starting with the recurrence relation and the initial conditions. So, another large part of our efforts will be dedicated to the solution of recurrences.
1.3 Binary Searching
Another simple example of analysis can be performed with the binary search algorithm. Let S be a given ordered set. The ordering must be total, as the numerical order in N, Z or R, or the lexicographical order in $A^*$. If $T = (a_1, a_2, \ldots, a_n)$ is a finite ordered subset of S, i.e., a table, we can always imagine that $a_1 < a_2 < \cdots < a_n$, and consider the following algorithm, called binary searching, to search for an element s ∈ S in T. Let $a_i$ be the median element in T, i.e., $i = \lfloor (n+1)/2 \rfloor$, and compare it with s. If $s = a_i$ then the search is successful; otherwise, if $s < a_i$, perform the same algorithm on the subtable $T' = (a_1, a_2, \ldots, a_{i-1})$; if instead $s > a_i$, perform the same algorithm on the subtable $T'' = (a_{i+1}, a_{i+2}, \ldots, a_n)$. If at any moment the table on which we perform the search is reduced to the empty set ∅, then the search is unsuccessful.
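A Pascal sketch of the algorithm follows, under the same illustrative assumptions used for sequential searching (our own type table and names); the variables lo and hi delimit the subtable currently searched:

function binsearch( var t : table; n, s : integer ) : integer ;
  { binary search of s in the ordered table t[1..n]; returns the
    index of s for a successful search, 0 for an unsuccessful one }
  var lo, hi, i : integer ;
begin
  binsearch := 0 ;
  lo := 1 ; hi := n ;
  while lo <= hi do begin
    i := ( lo + hi ) div 2 ;       { the median element of the subtable }
    if s = t[ i ] then begin
      binsearch := i ;             { successful search: }
      lo := hi + 1                 { leave the loop }
    end
    else if s < t[ i ] then hi := i - 1   { go on with T' }
    else lo := i + 1                      { go on with T'' }
  end
end {binsearch} ;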
Let us consider first the Worst Case analysis of this algorithm. For a successful search, the element s is only found at the last step of the algorithm, i.e., when the subtable on which we search is reduced to a single element. If $B_n$ is the number of comparisons necessary to find s in a table T with n elements, we have the recurrence:
$$B_n = 1 + B_{\lfloor n/2 \rfloor}$$
In fact, we observe that every step reduces the table to $\lfloor n/2 \rfloor$ or to $\lfloor (n-1)/2 \rfloor$ elements. Since we are performing a Worst Case analysis, we consider the worst situation. The initial condition is $B_1 = 1$, relative to the table (s), to which we should always reduce. The recurrence is not as simple as in the case of sequential searching, but we can simplify everything by considering a value of n of the form $2^k - 1$. In fact, in such a case, we have $\lfloor n/2 \rfloor = \lfloor (n-1)/2 \rfloor = 2^{k-1} - 1$, and the recurrence takes on the form:
$$B_{2^k-1} = 1 + B_{2^{k-1}-1} \qquad\text{or}\qquad \beta_k = 1 + \beta_{k-1}$$
if we write $\beta_k$ for $B_{2^k-1}$. As before, unfolding yields $\beta_k = k$, and returning to the B's we find:
$$B_n = \log_2(n+1)$$
by our definition n = $2^k - 1$. This is valid for every n of the form $2^k - 1$ and for the other values this is an approximation, a rather good approximation, indeed, because of the very slow growth of logarithms.

We observe explicitly that for n = 1,000,000 a sequential search requires about 500,000 comparisons on the average for a successful search, whereas binary searching only requires $\log_2(1{,}000{,}000) \approx 20$ comparisons. This accounts for the dramatic improvement that binary searching operates on sequential searching, and the analysis of algorithms provides a mathematical proof of such a fact.
The Average Case analysis for successful searches can be accomplished in the following way. There is only one element that can be found with a single comparison: the median element in T. There are two elements that can be found with two comparisons: the median elements in $T'$ and in $T''$. Continuing in the same way we find the average number $A_n$ of comparisons as:
$$A_n = \frac{1}{n}\bigl(1 + 2 + 2 + 3 + 3 + 3 + 3 + 4 + \cdots + (1 + \lfloor \log_2 n \rfloor)\bigr)$$
The value of this sum can be found explicitly, but the method is rather difficult and we delay it until later (see Section 4.7). When $n = 2^k - 1$ the expression simplifies:
$$A_{2^k-1} = \frac{1}{2^k-1}\sum_{j=1}^{k} j\,2^{j-1} = \frac{k2^k - 2^k + 1}{2^k - 1}$$
This sum also is not immediate, but the reader can check it by using mathematical induction. If we now write $k2^k - 2^k + 1 = k(2^k - 1) + k - (2^k - 1)$, we find:
$$A_n = k + \frac{k}{n} - 1 = \log_2(n+1) - 1 + \frac{\log_2(n+1)}{n}$$
which is only a little better than the worst case.
For unsuccessful searches, the analysis is now very simple, since we have to proceed as in the Worst Case analysis and at the last comparison we have a failure instead of a success. Consequently, $U_n = B_n$.
1.4 Closed Forms
The sign "=" between two numerical expressions denotes their numerical equivalence, as for example:
$$\sum_{k=0}^{n} k = \frac{n(n+1)}{2}$$
Although algebraically or numerically equivalent, two
expressions can be computationally quite different.
In the example, the left-hand expression requires n
sums to be evaluated, whereas the right-hand ex-
pression only requires a sum, a multiplication and
a halving. For n even moderately large (say n ≥ 5) nobody would prefer computing the left-hand expression rather than the right-hand one. A computer evaluates this latter expression in a few nanoseconds, but can require some milliseconds to compute the former as soon as n exceeds 10,000. The important
point is that the evaluation of the right-hand expres-
sion is independent of n, whilst the left-hand expres-
sion requires a number of operations growing linearly
with n.
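The difference can be made concrete by a small Pascal sketch (the function names are ours): the first function performs the n additions of the left-hand expression, the second evaluates the closed form in a fixed number of operations:

function sumupto( n : integer ) : integer ;
  { evaluates 0 + 1 + 2 + ... + n by n additions }
  var k, s : integer ;
begin
  s := 0 ;
  for k := 0 to n do s := s + k ;
  sumupto := s
end ;

function sumclosed( n : integer ) : integer ;
  { evaluates the same quantity in constant time }
begin
  sumclosed := n * ( n + 1 ) div 2
end ;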
A closed form expression is an expression, depend-
ing on some parameter n, the evaluation of which
does not require a number of operations depending
on n. Another example we have already found is:
$$\sum_{k=0}^{n} k\,2^{k-1} = n2^n - 2^n + 1 = (n-1)2^n + 1$$
Again, the left-hand expression is not in closed form, whereas the right-hand one is. We observe that $2^n = 2\cdot 2\cdots 2$ (n times) seems to require n − 1 multiplications. In fact, however, $2^n$ is a simple shift in a binary computer and, more in general, every power $\alpha^n = \exp(n\ln\alpha)$ can always be computed with the maximal accuracy allowed by a computer in constant time, i.e., in a time independent of α and n. This is because the two elementary functions exp(x) and ln(x) have the nice property that their evaluation is independent of their argument. The same property holds true for the most common numerical functions, such as the trigonometric and hyperbolic functions, the Γ and ψ functions (see below), and so on.
As we shall see, in algorithm analysis there appear
many kinds of “special” numbers. Most of them can
be reduced to the computation of some basic quan-
tities, which are considered to be in closed form, al-
though apparently they depend on some parameter
n. The three main quantities of this kind are the fac-
torial, the harmonic numbers and the binomial co-
efficients. In order to justify the previous sentence,
let us anticipate some definitions, which will be dis-
cussed in the next chapter, and give a more precise
presentation of the Γ and ψ functions.
The Γ-function is defined by a definite integral:
$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt.$$
By integrating by parts, we obtain:
$$\Gamma(x+1) = \int_0^{\infty} t^x e^{-t}\,dt = \Bigl[-t^x e^{-t}\Bigr]_0^{\infty} + \int_0^{\infty} x\,t^{x-1} e^{-t}\,dt = x\,\Gamma(x)$$
which is the basic recurrence property of the Γ-function. It allows us to reduce the computation of Γ(x) to the case 1 ≤ x ≤ 2. In this interval we can use a polynomial approximation:
$$\Gamma(x+1) = 1 + b_1 x + b_2 x^2 + \cdots + b_8 x^8 + \epsilon(x)$$
where:
$$\begin{array}{ll}
b_1 = -0.577191652 & b_5 = -0.756704078 \\
b_2 = \phantom{-}0.988205891 & b_6 = \phantom{-}0.482199394 \\
b_3 = -0.897056937 & b_7 = -0.193527818 \\
b_4 = \phantom{-}0.918206857 & b_8 = \phantom{-}0.035868343
\end{array}$$
The error is $|\epsilon(x)| \le 3\cdot 10^{-7}$. Another method is to use Stirling's approximation:
$$\Gamma(x) = e^{-x} x^{x-0.5} \sqrt{2\pi}\left(1 + \frac{1}{12x} + \frac{1}{288x^2} - \frac{139}{51840x^3} - \frac{571}{2488320x^4} + \cdots\right).$$
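As an illustration of how the reduction and the polynomial approximation work together, here is a hedged Pascal sketch (the function names are ours; the coefficients are the b's quoted above, and the argument is assumed positive):

function gammapoly( y : real ) : real ;
  { the polynomial approximation of Gamma(y) for 1 <= y <= 2,
    evaluated by Horner's rule with x = y - 1 }
  const b : array [ 1 .. 8 ] of real =
    ( -0.577191652, 0.988205891, -0.897056937, 0.918206857,
      -0.756704078, 0.482199394, -0.193527818, 0.035868343 ) ;
  var x, p : real ;
      i : integer ;
begin
  x := y - 1.0 ;
  p := b[ 8 ] ;
  for i := 7 downto 1 do p := p * x + b[ i ] ;
  gammapoly := 1.0 + p * x
end ;

function gammafun( x : real ) : real ;
  { reduces the argument to the interval [1,2] by means of the
    recurrence Gamma(x+1) = x Gamma(x), then applies gammapoly }
  var f : real ;
begin
  f := 1.0 ;
  while x > 2.0 do begin x := x - 1.0 ; f := f * x end ;
  while x < 1.0 do begin f := f / x ; x := x + 1.0 end ;
  gammafun := f * gammapoly( x )
end ;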
Some special values of the Γ-function are directly obtained from the definition. For example, when x = 1 the integral simplifies and we immediately find Γ(1) = 1. When x = 1/2 the definition implies:
$$\Gamma(1/2) = \int_0^{\infty} t^{1/2-1} e^{-t}\,dt = \int_0^{\infty} \frac{e^{-t}}{\sqrt{t}}\,dt.$$
By performing the substitution $y = \sqrt{t}$ ($t = y^2$ and $dt = 2y\,dy$), we have:
$$\Gamma(1/2) = \int_0^{\infty} \frac{e^{-y^2}}{y}\,2y\,dy = 2\int_0^{\infty} e^{-y^2}\,dy = \sqrt{\pi}$$
the last integral being the famous Gauss' integral.
Finally, from the recurrence relation we obtain:
$$\Gamma(1/2) = \Gamma(1 - 1/2) = -\frac{1}{2}\,\Gamma(-1/2)$$
and therefore:
$$\Gamma(-1/2) = -2\,\Gamma(1/2) = -2\sqrt{\pi}.$$
The Γ function is defined for every x ∈ C, except when x is zero or a negative integer, where the function goes to infinity; the following approximation can be important:
$$\Gamma(-n+\epsilon) \approx \frac{(-1)^n}{n!}\,\frac{1}{\epsilon}.$$
When we unfold the basic recurrence of the Γ-function for x = n an integer, we find $\Gamma(n+1) = n(n-1)\cdots 2\cdot 1$. The factorial $\Gamma(n+1) = n! = 1\cdot 2\cdot 3\cdots n$ seems to require n − 2 multiplications. However, for n large it can be computed by means of Stirling's formula, which is obtained from the same formula for the Γ-function:
$$n! = \Gamma(n+1) = n\,\Gamma(n) = \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^n\left(1 + \frac{1}{12n} + \frac{1}{288n^2} + \cdots\right).$$
This requires only a fixed amount of operations to reach the desired accuracy.
The function ψ(x), called ψ-function or digamma function, is defined as the logarithmic derivative of the Γ-function:
$$\psi(x) = \frac{d}{dx}\ln\Gamma(x) = \frac{\Gamma'(x)}{\Gamma(x)}.$$
Obviously, we have:
$$\psi(x+1) = \frac{\Gamma'(x+1)}{\Gamma(x+1)} = \frac{\frac{d}{dx}\bigl(x\Gamma(x)\bigr)}{x\Gamma(x)} = \frac{\Gamma(x) + x\Gamma'(x)}{x\Gamma(x)} = \frac{1}{x} + \psi(x)$$
and this is a basic property of the digamma function. By this formula we can always reduce the computation of ψ(x) to the case 1 ≤ x ≤ 2, where we can use the approximation:
$$\psi(x) = \ln x - \frac{1}{2x} - \frac{1}{12x^2} + \frac{1}{120x^4} - \frac{1}{252x^6} + \cdots.$$
By the previous recurrence, we see that the digamma function is related to the harmonic numbers $H_n = 1 + 1/2 + 1/3 + \cdots + 1/n$. In fact, we have:
$$H_n = \psi(n+1) + \gamma$$
where γ = 0.57721566... is the Mascheroni-Euler constant. By using the approximation for ψ(x), we obtain an approximate formula for the harmonic numbers:
$$H_n = \ln n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \cdots$$
which shows that the computation of $H_n$ does not require n − 1 sums and n − 1 inversions, as it might appear from its definition.
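A Pascal sketch of this observation (names and the threshold are ours; the constant is the Mascheroni-Euler γ quoted above): for small n we sum directly, for larger n the approximate formula already gives a very good value:

function harmonic( n : integer ) : real ;
  { H_n by direct summation for small n, and by the approximate
    formula ln n + gamma + 1/(2n) - 1/(12n^2) otherwise }
  const eulergamma = 0.5772156649 ;
  var k : integer ;
      h, x : real ;
begin
  if n < 10 then begin
    h := 0.0 ;
    for k := 1 to n do h := h + 1.0 / k ;
    harmonic := h
  end
  else begin
    x := n ;
    harmonic := ln( x ) + eulergamma + 1.0 / ( 2.0 * x ) - 1.0 / ( 12.0 * x * x )
  end
end ;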
Finally, the binomial coefficient:
$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{k!\,(n-k)!} = \frac{\Gamma(n+1)}{\Gamma(k+1)\,\Gamma(n-k+1)}$$
can be reduced to the computation of the Γ function or can be approximated by using Stirling's formula for factorials. The two methods are indeed the same. We observe explicitly that the last expression shows that binomial coefficients can be defined for every n, k ∈ C, except that k cannot be a negative integer number.
The reader can, as a very useful exercise, write
computer programs to realize the various functions
mentioned in the present section.
1.5 The Landau notation
To the mathematician Edmund Landau is ascribed
a special notation to describe the general behavior
of a function f(x) when x approaches some definite
value. We are mainly interested in the case x → ∞, but this should not be considered a restriction.
Landau notation is also known as O-notation (or big-
oh notation), because of the use of the letter O to
denote the desired behavior.
Let us consider functions f : N → R (i.e., sequences of real numbers); given two functions f(n) and g(n), we say that f(n) is O(g(n)), or that f(n) is in the order of g(n), if and only if:
$$\lim_{n\to\infty} \frac{f(n)}{g(n)} < \infty$$
In formulas we write f(n) = O(g(n)) or also f(n) ∼ g(n). Besides, if we have at the same time:
$$\lim_{n\to\infty} \frac{g(n)}{f(n)} < \infty$$
we say that f(n) is in the same order as g(n) and write f(n) = Θ(g(n)) or f(n) ≈ g(n).
It is easy to see that "∼" is an order relation between functions f : N → R, and that "≈" is an equivalence relation. We observe explicitly that when f(n) is in the same order as g(n), a constant K ≠ 0 exists such that:
$$\lim_{n\to\infty} \frac{f(n)}{K\,g(n)} = 1 \qquad\text{or}\qquad \lim_{n\to\infty} \frac{f(n)}{g(n)} = K;$$
the constant K is very important and will often be used.
Before making some important comments on Landau notation, we wish to introduce a last definition: we say that f(n) is of smaller order than g(n), and write f(n) = o(g(n)), iff:
$$\lim_{n\to\infty} \frac{f(n)}{g(n)} = 0.$$
Obviously, this is in accordance with the previous definitions, but the notation introduced (the small-oh notation) is used rather frequently and should be known.
If f(n) and g(n) describe the behavior of two al-
gorithms A and B solving the same problem, we
will say that A is asymptotically better than B iff
f(n) = o(g(n)); instead, the two algorithms are
asymptotically equivalent iff f(n) = Θ(g(n)). This is
rather clear, because when f(n) = o(g(n)) the num-
ber of operations performed by A is substantially less
than the number of operations performed by B. How-
ever, when f(n) = Θ(g(n)), the number of operations is the same except for a constant factor K, which remains the same as n → ∞. The constant K can
simply depend on the particular realization of the al-
gorithms A and B, and with two different implemen-
tations we may have K < 1 or K > 1. Therefore, in
general, when f(n) = Θ(g(n)) we cannot say which
algorithm is better, this depending on the particular
realization or on the particular computer on which
the algorithms are run. Obviously, if A and B are
both realized on the same computer and in the best
possible way, a value K < 1 tells us that algorithm
A is relatively better than B, and vice versa when
K > 1.
It is also possible to give an absolute evaluation
for the performance of a given algorithm A, whose
behavior is described by a sequence of values f(n).
This is done by comparing f(n) against an absolute
scale of values. The scale most frequently used con-
tains powers of n, logarithms and exponentials:
$$O(1) < O(\ln n) < O(\sqrt{n}) < O(n) < O(n\ln n) < O(n\sqrt{n}) < O(n^2) < O(n^5) < \cdots < O(e^n) < O\bigl(e^{e^n}\bigr) < \cdots$$
This scale reflects well-known properties: the logarithm grows more slowly than any power $n^{\epsilon}$, however small ǫ, while $e^n$ grows faster than any power $n^k$, however large k (k independent of n). Note that $n^n = e^{n\ln n}$ and therefore $O(e^n) < O(n^n)$. As a matter of fact, the scale is not complete, as we obviously have $O(n^{0.4}) < O(\sqrt{n})$ and $O(\ln\ln n) < O(\ln n)$, but the reader can easily fill in other values, which can be of interest to him or her.
We can compare f(n) to the elements of the scale
and decide, for example, that f(n) = O(n); this
means that algorithm A performs at least as well as
any algorithm B whose behavior is described by a
function g(n) = Θ(n).
Let f(n) be a sequence describing the behavior of an algorithm A. If f(n) = O(1), then the algorithm A performs in a constant time, i.e., in a time independent of n, the parameter used to evaluate the algorithm behavior. Algorithms of this type are the best possible algorithms, and they are especially interesting when n becomes very large. If f(n) = O(n), the algorithm A is said to be linear; if f(n) = O(ln n), A is said to be logarithmic; if $f(n) = O(n^2)$, A is said to be quadratic; and if $f(n) \ge O(e^n)$, A is said to be exponential. In general, if $f(n) \le O(n^r)$ for some r ∈ R, then A is said to be polynomial.
As a standard terminology, mainly used in the
“Theory of Complexity”, polynomial algorithms are
also called tractable, while exponential algorithms are
called intractable. These names are due to the follow-
ing observation. Suppose we have a linear algorithm A and an exponential algorithm B, not necessarily solving the same problem. Also suppose that an hour of computer time executes both A and B with n = 40. If a new computer is used which is 1000 times faster than the old one, in an hour algorithm A will execute with m such that $f(m) = K_A m = 1000 K_A n$, or m = 1000n = 40,000. Therefore, the problem solved by the new computer is 1000 times larger than the problem solved by the old computer. For algorithm B we have $g(m) = K_B e^m = 1000 K_B e^n$; by simplifying and taking logarithms, we find $m = n + \ln 1000 \approx n + 6.9 < 47$. Therefore, the improvement achieved by the new computer is almost negligible.
Chapter 2
Special numbers
A sequence is a mapping from the set N of nat-
ural numbers into some other set of numbers. If
f : N → R, the sequence is called a sequence of real
numbers; if f : N → Q, the sequence is called a se-
quence of rational numbers; and so on. Usually, the image of a k ∈ N is denoted by $f_k$ instead of the traditional f(k), and the whole sequence is abbreviated as $(f_0, f_1, f_2, \ldots) = (f_k)_{k\in\mathbb{N}}$. Because of this notation, an element k ∈ N is called an index.
Often, we also study double sequences, i.e., mappings $f : \mathbb{N}\times\mathbb{N} \to \mathbb{R}$ or some other numeric set. In this case also, instead of writing f(n, k) we will usually write $f_{n,k}$, and the whole sequence will be denoted by $\{f_{n,k} \mid n, k \in \mathbb{N}\}$ or $(f_{n,k})_{n,k\in\mathbb{N}}$. A double sequence can be displayed as an infinite array of numbers, whose first row is the sequence $(f_{0,0}, f_{0,1}, f_{0,2}, \ldots)$, the second row is $(f_{1,0}, f_{1,1}, f_{1,2}, \ldots)$, and so on. The array can also be read by columns, and then $(f_{0,0}, f_{1,0}, f_{2,0}, \ldots)$ is the first column, $(f_{0,1}, f_{1,1}, f_{2,1}, \ldots)$ is the second column, and so on. Therefore, the index n is the row index and the index k is the column index.
We wish to describe here some sequences and dou-
ble sequences of numbers, frequently occurring in the
analysis of algorithms. In fact, they arise in the study
of very simple and basic combinatorial problems, and
therefore appear in more complex situations as well, so that they can be considered the fundamental bricks in a solid wall.
2.1 Mappings and powers
In all branches of Mathematics two concepts appear,
which have to be taken as basic: the concept of a
set and the concept of a mapping. A set is a col-
lection of objects and it must be considered a primi-
tive concept, i.e., a concept which cannot be defined
in terms of other and more elementary concepts. If
a, b, c, ... are the objects (or elements) in a set denoted by A, we write A = {a, b, c, ...}, thus giving an extensional definition of this particular set. If a set B is defined through a property P of its elements, we write B = {x | P(x) is true}, thus giving an intensional definition of the particular set B.
If S is a finite set, then |S| denotes its cardinality, or the number of its elements. The order in which we write or consider the elements of S is irrelevant. If we wish to emphasize a particular arrangement or ordering of the elements in S, we write $(a_1, a_2, \ldots, a_n)$, if |S| = n and $S = \{a_1, a_2, \ldots, a_n\}$ in any order. This is the vector notation and is used to represent arrangements of S. Two arrangements of S are different if and only if an index k exists, for which the elements corresponding to that index in the two arrangements are different; obviously, as sets, the two arrangements continue to be the same.
If A, B are two sets, a mapping or function from A into B, denoted f : A → B, is a subset of the Cartesian product of A by B, $f \subseteq A \times B$, such that every element a ∈ A is the first element of one and only one pair (a, b) ∈ f. The usual notation for (a, b) ∈ f is f(a) = b, and b is called the image of a under the mapping f. The set A is the domain of the mapping, while B is the range or codomain. A function for which every pair of elements $a_1 \ne a_2 \in A$ corresponds to pairs $(a_1, b_1)$ and $(a_2, b_2)$ with $b_1 \ne b_2$ is called injective. A function in which every b ∈ B belongs at least to one pair (a, b) is called surjective. A bijection or 1-1 correspondence or 1-1 mapping is any injective function which is also surjective.
If |A| = n and |B| = m, the cartesian product $A \times B$, i.e., the set of all the couples (a, b) with a ∈ A and b ∈ B, contains exactly nm elements or couples. A more difficult problem is to find out how many mappings from A to B exist. We can observe that every element a ∈ A must have its image in B; this means that we can associate to a any one of the m elements in B. Therefore, we have $m \cdot m \cdots m$ different possibilities, when the product is extended to all the n elements in A. Since all the mappings can be built in this way, we have a total of $m^n$ different mappings from A into B. This also explains why the set of mappings from A into B is often denoted by $B^A$; in fact we can write:
$$\bigl|B^A\bigr| = |B|^{|A|} = m^n.$$
This formula allows us to solve some simple combinatorial problems. For example, if we toss 5 coins, how many different configurations head/tail are possible? The five coins are the domain of our mappings and the set {head, tail} is the codomain. Therefore we have a total of $2^5 = 32$ different configurations. Similarly, if we toss three dice, the total number of configurations is $6^3 = 216$. In the same way, we can count the number of subsets in a set S having |S| = n. In fact, let us consider, given a subset A ⊆ S, the mapping $\chi_A : S \to \{0, 1\}$ defined by:
$$\chi_A(x) = 1 \ \text{for}\ x \in A \qquad\qquad \chi_A(x) = 0 \ \text{for}\ x \notin A$$
This is called the characteristic function of the subset A; every two different subsets of S have different characteristic functions, and every mapping $f : S \to \{0, 1\}$ is the characteristic function of some subset A ⊆ S, i.e., the subset $\{x \in S \mid f(x) = 1\}$. Therefore, there are as many subsets of S as there are characteristic functions; but these are $2^n$ by the formula above.
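The correspondence between subsets and characteristic functions also suggests a procedure enumerating all the subsets of {1, 2, ..., n}: it suffices to let a vector of 0's and 1's run through all its 2^n configurations, which is just counting in binary. A Pascal sketch (names ours; n is assumed to be at most 20):

procedure listsubsets( n : integer ) ;
  { prints the characteristic vectors of all subsets of {1..n};
    chi[i] = 1 means that the element i belongs to the subset }
  var chi : array [ 1 .. 21 ] of integer ;
      i : integer ;
begin
  for i := 1 to n do chi[ i ] := 0 ;    { the empty set }
  repeat
    for i := 1 to n do write( chi[ i ] ) ;
    writeln ;
    i := 1 ;                            { binary increment of chi }
    while ( i <= n ) and ( chi[ i ] = 1 ) do begin
      chi[ i ] := 0 ;
      i := i + 1
    end ;
    if i <= n then chi[ i ] := 1
  until i > n                           { overflow: all subsets printed }
end ;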
A finite set A is sometimes called an alphabet and its elements symbols or letters. Any sequence of letters is called a word; the empty sequence is the empty word and is denoted by ǫ. From the previous considerations, if |A| = n, the number of words of length m is $n^m$. The first part of Chapter 6 is devoted to some basic notions on special sets of words or languages.
2.2 Permutations
In the usual sense, a permutation of a set of objects
is an arrangement of these objects in any order. For
example, three objects, denoted by a, b, c, can be ar-
ranged into six different ways:
(a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a).
A very important problem in Computer Science is
“sorting”: suppose we have n objects from an or-
dered set, usually some set of numbers or some set of
strings (with the common lexicographic order); the
objects are given in a random order and the problem
consists in sorting them according to the given order.
For example, by sorting (60, 51, 80, 77, 44) we should
obtain (44, 51, 60, 77, 80) and the real problem is to
obtain this ordering in the shortest possible time. In
other words, we start with a random permutation of
the n objects, and wish to arrive to their standard
ordering, the one in accordance with their supposed
order relation (e.g., “less than”).
In order to abstract from the particular nature of the n objects, we will use the numbers $\{1, 2, \ldots, n\} = N_n$, and define a permutation as a 1-1 mapping $\pi : N_n \to N_n$. By identifying a and 1, b and 2, c and 3, the six permutations of 3 objects are written:
$$\begin{pmatrix}1&2&3\\1&2&3\end{pmatrix},\ \begin{pmatrix}1&2&3\\1&3&2\end{pmatrix},\ \begin{pmatrix}1&2&3\\2&1&3\end{pmatrix},\ \begin{pmatrix}1&2&3\\2&3&1\end{pmatrix},\ \begin{pmatrix}1&2&3\\3&1&2\end{pmatrix},\ \begin{pmatrix}1&2&3\\3&2&1\end{pmatrix},$$
where, conventionally, the first line contains the elements in $N_n$ in their proper order, and the second line contains the corresponding images. This is the usual representation for permutations, but since the first line can be understood without ambiguity, the vector representation for permutations is more common. This consists in writing the second line (the images) in the form of a vector. Therefore, the six permutations are (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1), respectively.
Let us examine the permutation π = (3, 2, 1), for which we have π(1) = 3, π(2) = 2 and π(3) = 1. If we start with the element 1 and successively apply the mapping π, we have π(1) = 3, π(π(1)) = π(3) = 1, π(π(π(1))) = π(1) = 3, and so on. Since the elements in $N_n$ are finite in number, by starting with any $k \in N_n$ we must obtain a finite chain of numbers, which will repeat always in the same order. These numbers are said to form a cycle, and the permutation (3, 2, 1) is formed by two cycles, the first one composed by 1 and 3, the second one only composed by 2. We write (3, 2, 1) = (1 3)(2), where every cycle is written between parentheses and numbers are separated by blanks, to distinguish a cycle from a vector. Conventionally, a cycle is written with the smallest element first and the various cycles are arranged according to their first element. Therefore, in this cycle representation the six permutations are:

(1)(2)(3), (1)(2 3), (1 2)(3), (1 2 3), (1 3 2), (1 3)(2).

A number k for which π(k) = k is called a fixed point for π. The corresponding cycles, formed by a single element, are conventionally understood, except in the identity $(1, 2, \ldots, n) = (1)(2)\cdots(n)$, in which all the elements are fixed points; the identity is simply written (1). Consequently, the usual representation of the six permutations is:

(1) (2 3) (1 2) (1 2 3) (1 3 2) (1 3).
A permutation without any fixed point is called a
derangement. A cycle with only two elements is called
a transposition. The degree of a cycle is the number
of its elements, plus one; the degree of a permutation
is the sum of the degrees of its cycles. The six per-
mutations have degree 2, 3, 3, 4, 4, 3, respectively. A
permutation is even or odd according to the fact that
its degree is even or odd.
The permutation (8, 9, 4, 3, 6, 1, 7, 2, 10, 5), in vector notation, has the cycle representation (1 8 2 9 10 5 6)(3 4), the number 7 being a fixed point. The long cycle (1 8 2 9 10 5 6) has degree 8; therefore the permutation degree is 8 + 3 + 2 = 13 and the permutation is odd.
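Finding the cycle representation of a permutation given in vector form is a simple exercise of following the chains k, π(k), π(π(k)), ...; here is a Pascal sketch (names ours, assuming a declaration such as type vector = array [ 1 .. 100 ] of integer):

procedure cycles( var p : vector; n : integer ) ;
  { prints the cycle representation of the permutation p[1..n];
    fixed points appear as cycles with a single element }
  var visited : array [ 1 .. 100 ] of boolean ;
      i, j : integer ;
begin
  for i := 1 to n do visited[ i ] := false ;
  for i := 1 to n do
    if not visited[ i ] then begin   { i is the smallest element }
      write( '(' ) ;                 { of a new cycle }
      j := i ;
      repeat
        write( j : 3 ) ;
        visited[ j ] := true ;
        j := p[ j ]                  { follow the chain }
      until j = i ;
      write( ') ' )
    end ;
  writeln
end ;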
2.3 The group structure
Let n ∈ N; $P_n$ denotes the set of all the permutations of n elements, i.e., according to the previous sections, the set of 1-1 mappings $\pi : N_n \to N_n$. If $\pi, \rho \in P_n$, we can perform their composition, i.e., a new permutation σ defined as $\sigma(k) = \rho(\pi(k)) = (\pi \circ \rho)(k)$: the permutation written first is applied first, so that compositions are read from left to right. An example in $P_7$ is:
$$\pi \circ \rho = \begin{pmatrix}1&2&3&4&5&6&7\\1&5&6&7&4&2&3\end{pmatrix} \circ \begin{pmatrix}1&2&3&4&5&6&7\\4&5&2&1&7&6&3\end{pmatrix} = \begin{pmatrix}1&2&3&4&5&6&7\\4&7&6&3&1&5&2\end{pmatrix}.$$
In fact, for instance, π(2) = 5 and ρ(5) = 7; therefore σ(2) = ρ(π(2)) = ρ(5) = 7, and so on. The vector representation of permutations is not particularly suited for hand evaluation of composition, although it is very convenient for computer implementation. The opposite situation occurs for cycle representation:
(2 5 4 7 3 6) ◦ (1 4)(2 5 7 3) = (1 4 3 6 5)(2 7)
Cycles in the left hand member are read from left to
right and, by examining one cycle after the other, we find
the images of every element by simply remembering
the cycle successor of the same element. For example,
the image of 4 in the first cycle is 7; the second cycle
does not contain 7, but the third cycle tells us that the
image of 7 is 3. Therefore, the image of 4 is 3. Fixed
points are ignored, and this is in accordance with
their meaning. The composition symbol ‘◦’ is usually
understood and, in reality, the simple juxtaposition
of the cycles can well denote their composition.
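In vector representation, in fact, the left-to-right composition and the inversion are one-line loops; a Pascal sketch (names ours, assuming the same array type vector used for the shuffle procedure in the next section):

procedure compose( var p, r, s : vector; n : integer ) ;
  { s := the composition of p followed by r, i.e., s[k] = r[p[k]] }
  var k : integer ;
begin
  for k := 1 to n do s[ k ] := r[ p[ k ] ]
end ;

procedure invert( var p, q : vector; n : integer ) ;
  { q := the inverse of p, defined by q[p[k]] = k }
  var k : integer ;
begin
  for k := 1 to n do q[ p[ k ] ] := k
end ;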
The identity mapping acts as an identity for com-
position, since it is only composed by fixed points.
Every permutation has an inverse, which is the per-
mutation obtained by reading the given permutation
from below, i.e., by sorting the elements in the second
line, which become the elements in the first line, and
then rearranging the elements in the first line into
the second line. For example, the inverse of the first
permutation π in the example above is:
$$\begin{pmatrix}1&2&3&4&5&6&7\\1&6&7&5&2&3&4\end{pmatrix}.$$
In fact, in cycle notation, we have:
$$(2\ 5\ 4\ 7\ 3\ 6)(2\ 6\ 3\ 7\ 4\ 5) = (2\ 6\ 3\ 7\ 4\ 5)(2\ 5\ 4\ 7\ 3\ 6) = (1).$$
A simple observation is that the inverse of a cycle
is obtained by writing its first element followed by
all the other elements in reverse order. Hence, the
inverse of a transposition is the same transposition.
Since composition is associative, we have proved that $(P_n, \circ)$ is a group. The group is not commutative, because, for example:
$$\rho \circ \pi = (1\ 4)(2\ 5\ 7\ 3) \circ (2\ 5\ 4\ 7\ 3\ 6) = (1\ 7\ 6\ 2\ 4)(3\ 5) \ne \pi \circ \rho.$$
An involution is a permutation π such that $\pi^2 = \pi \circ \pi = (1)$. An involution can only be composed by fixed points and transpositions, because by the definition we have $\pi^{-1} = \pi$, and the above observation on the inversion of cycles shows that a cycle with more than 2 elements has an inverse which cannot coincide with the cycle itself.
Till now, we have supposed that in the cycle representation every number is only considered once. However, if we think of a permutation as the product of cycles, we can imagine that its representation is not unique and that an element $k \in N_n$ can appear in more than one cycle. The representations of σ or π ◦ ρ are examples of this statement. In particular, we can obtain the transposition representation of a permutation; we observe that we have:
(2 6)(6 5)(6 4)(6 7)(6 3) = (2 5 4 7 3 6)
We transform the cycle into a product of transposi-
tions by forming a transposition with the first and
the last element in the cycle, and then adding other
transpositions with the same first element (the last
element in the cycle) and the other elements in the
cycle, in the same order, as second element. Besides,
we note that we can always add a couple of transpo-
sitions such as (2 5)(2 5), corresponding to the two fixed
points (2) and (5), and therefore adding nothing to
the permutation. All these remarks show that:
• every permutation can be written as the compo-
sition of transpositions;
• this representation is not unique, but every
two representations differ by an even number of
transpositions;
• the minimal number of transpositions corre-
sponding to a cycle is the degree of the cycle
(except possibly for fixed points, which however
always correspond to an even number of trans-
positions).
Therefore, we conclude that an even [odd] permutation can be expressed as the composition of an even [odd] number of transpositions. Since the composition of two even permutations is still an even permutation, the set $A_n$ of even permutations is a subgroup of $P_n$ and is called the alternating subgroup, while the whole group $P_n$ is referred to as the symmetric group.
2.4 Counting permutations
How many permutations are there in $P_n$? If n = 1, we only have a single permutation (1), and if n = 2 we have two permutations, exactly (1, 2) and (2, 1). We have already seen that $|P_3| = 6$, and if n = 0 we consider the empty vector () as the only possible permutation, that is $|P_0| = 1$. In this way we obtain a sequence (1, 1, 2, 6, ...) and we wish to obtain a formula giving us $|P_n|$ for every n ∈ N.
Let $\pi \in P_n$ be a permutation and $(a_1, a_2, \ldots, a_n)$ be its vector representation. We can obtain a permutation in $P_{n+1}$ by simply adding the new element n + 1 in any position of the representation of π:
$$(n+1, a_1, a_2, \ldots, a_n) \qquad (a_1, n+1, a_2, \ldots, a_n) \qquad \cdots \qquad (a_1, a_2, \ldots, a_n, n+1)$$
Therefore, from any permutation in $P_n$ we obtain n + 1 permutations in $P_{n+1}$, and they are all different. Vice versa, if we start with a permutation in $P_{n+1}$, and eliminate the element n + 1, we obtain one and only one permutation in $P_n$. Therefore, all permutations in $P_{n+1}$ are obtained in the way just described and are obtained only once. So we find:
$$|P_{n+1}| = (n+1)\,|P_n|$$
which is a simple recurrence relation. By unfolding this recurrence, i.e., by substituting for $|P_n|$ the analogous expression involving $|P_{n-1}|$, and so on, we obtain:
$$|P_{n+1}| = (n+1)|P_n| = (n+1)n|P_{n-1}| = \cdots = (n+1)n(n-1)\cdots 1\cdot|P_0|$$
Since, as we have seen, $|P_0| = 1$, we have proved that the number of permutations in $P_n$ is given by the product $n(n-1)\cdots 2\cdot 1$. Therefore, our sequence is:

n      0  1  2  3  4   5    6    7     8
|P_n|  1  1  2  6  24  120  720  5040  40320
As we mentioned in the Introduction, the number $n(n-1)\cdots 2\cdot 1$ is called n factorial and is denoted by n!. For example we have $10! = 10\cdot 9\cdot 8\cdot 7\cdot 6\cdot 5\cdot 4\cdot 3\cdot 2\cdot 1 = 3{,}628{,}800$. Factorials grow very fast, but they are one of the most important quantities in Mathematics.
When n ≥ 2, we can add to every permutation in $P_n$ one transposition, say (1 2). This transforms every even permutation into an odd permutation, and vice versa. On the other hand, since $(1\ 2)^{-1} = (1\ 2)$, the transformation is its own inverse, and therefore defines a 1-1 mapping between even and odd permutations. This proves that the number of even (odd) permutations is n!/2.
Another simple problem is how to determine the number of involutions on n elements. As we have already seen, an involution is only composed by fixed points and transpositions (without repetitions of the elements!). If we denote by $I_n$ the set of involutions of n elements, we can divide $I_n$ into two subsets: $I'_n$ is the set of involutions in which n is a fixed point, and $I''_n$ is the set of involutions in which n belongs to a transposition, say (k n). If we eliminate n from the involutions in $I'_n$, we obtain an involution of n − 1 elements, and vice versa every involution in $I'_n$ can be obtained by adding the fixed point n to an involution in $I_{n-1}$. If we eliminate the transposition (k n) from an involution in $I''_n$, we obtain an involution in $I_{n-2}$, which contains the element n − 1, but does not contain the element k. In all cases, however, by eliminating (k n) from all involutions containing it, we obtain a set of involutions in a 1-1 correspondence with $I_{n-2}$. The element k can assume any value 1, 2, ..., n − 1, and therefore we obtain (n − 1) times $|I_{n-2}|$ involutions.
We now observe that all the involutions in $I_n$ are obtained in this way from involutions in $I_{n-1}$ and $I_{n-2}$, and therefore we have:
$$|I_n| = |I_{n-1}| + (n-1)\,|I_{n-2}|$$
Since $|I_0| = 1$, $|I_1| = 1$ and $|I_2| = 2$, from this recurrence relation we can successively find all the values of $|I_n|$. This sequence (see Section 4.9) is therefore:

n    0  1  2  3  4   5   6   7    8
I_n  1  1  2  4  10  26  76  232  764
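The recurrence translates directly into a program; a Pascal sketch (names ours) which keeps only the last two values, using real arithmetic because the numbers grow very quickly:

function involutions( n : integer ) : real ;
  { computes |I_n| from |I_n| = |I_{n-1}| + (n-1)|I_{n-2}|,
    with |I_0| = |I_1| = 1 }
  var a, b, c : real ;
      k : integer ;
begin
  a := 1.0 ;                 { |I_{k-2}| }
  b := 1.0 ;                 { |I_{k-1}| }
  if n <= 1 then involutions := 1.0
  else begin
    for k := 2 to n do begin
      c := b + ( k - 1 ) * a ;
      a := b ;
      b := c
    end ;
    involutions := b
  end
end ;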
We conclude this section by giving the classical computer program for generating a random permutation of the numbers {1, 2, ..., n}. The procedure shuffle receives the address of the vector and the number of its elements; it fills the vector with the numbers from 1 to n, then uses the standard function random to produce a random permutation, which is returned in the input vector:

procedure shuffle( var v : vector; n : integer ) ;
  var i, j, a : integer ;
begin
  for i := 1 to n do v[ i ] := i ;
  for i := n downto 2 do begin
    j := random( i ) + 1 ;
    a := v[ i ]; v[ i ] := v[ j ]; v[ j ] := a
  end
end {shuffle} ;
The procedure exchanges the last element in v with
a random element in v, possibly the same last ele-
ment. In this way the last element is chosen at ran-
dom, and the procedure goes on with the last but one
element. In this way, the elements in v are properly
shuffled and eventually v contains the desired random
permutation. The procedure obviously performs in
linear time.
2.5 Dispositions and Combinations
Permutations are a special case of a more general sit-
uation. If we have n objects, we can wonder how
many different orderings exist of k among the n ob-
jects. For example, if we have 4 objects a, b, c, d, we
can make 12 different arrangements with two objects
chosen from a, b, c, d. They are:
(a, b) (a, c) (a, d) (b, a) (b, c) (b, d)
(c, a) (c, b) (c, d) (d, a) (d, b) (d, c)
These arrangements are called dispositions and, in general, we can use any one of the n objects as the first element of the disposition. There remain only n − 1 objects to be used as second element, and n − 2 objects to be used as a third element, and so on. Therefore, the k objects can be selected in $n(n-1)\cdots(n-k+1)$ different ways. If $D_{n,k}$ denotes the number of possible dispositions of n elements in groups of k, we have:
$$D_{n,k} = n(n-1)\cdots(n-k+1) = n^{\underline{k}}$$
The symbol $n^{\underline{k}}$ is called a falling factorial because it consists of k factors beginning with n and decreasing by one down to (n − k + 1). Obviously, $n^{\underline{n}} = n!$ and, by convention, $n^{\underline{0}} = 1$. There exists also a rising factorial $n^{\overline{k}} = n(n+1)\cdots(n+k-1)$, often denoted by $(n)_k$, the so-called Pochhammer symbol.
When in k-dispositions we do not consider the order of the elements, we obtain what are called k-combinations. Therefore, there are only 6 2-combinations of 4 objects, and they are:

{a, b} {a, c} {a, d} {b, c} {b, d} {c, d}

Combinations are written between braces, because they are simply the subsets with k objects of a set of n objects. If A = {a, b, c}, all the possible combinations of these objects are:

$C_{3,0} = \{\emptyset\}$
$C_{3,1} = \{\{a\}, \{b\}, \{c\}\}$
$C_{3,2} = \{\{a, b\}, \{a, c\}, \{b, c\}\}$
$C_{3,3} = \{\{a, b, c\}\}$
The number of k-combinations of n objects is denoted by $\binom{n}{k}$, which is often read "n choose k", and is called a binomial coefficient. By the very definition we have:
$$\binom{n}{0} = 1 \qquad \binom{n}{n} = 1 \qquad \forall n \in \mathbb{N}$$
because, given a set of n elements, the empty set is the only subset with 0 elements, and the whole set is the only subset with n elements.
The name "binomial coefficients" is due to the well-known "binomial formula":
$$(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}$$
which is easily proved. In fact, when expanding the product $(a+b)(a+b)\cdots(a+b)$, we choose a term from each factor (a + b); the resulting term $a^k b^{n-k}$ is obtained by summing over all the possible choices of k a's and n − k b's, which, by the definition above, are just $\binom{n}{k}$.
There exists a simple formula to compute binomial coefficients. As we have seen, there are $n^{\underline{k}}$ different k-dispositions of n objects; by permuting the k objects in the disposition, we obtain k! other dispositions with the same elements. Therefore, k! dispositions correspond to a single combination and we have:
$$\binom{n}{k} = \frac{n^{\underline{k}}}{k!} = \frac{n(n-1)\cdots(n-k+1)}{k!}$$
This formula gives a simple way to compute binomial coefficients in a recursive way. In fact we have:
$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n}{k}\cdot\frac{(n-1)\cdots(n-k+1)}{(k-1)!} = \frac{n}{k}\binom{n-1}{k-1}$$
and the expression on the right is successively reduced to $\binom{r}{0}$, which is 1 as we have already seen. For example, we have:
$$\binom{7}{3} = \frac{7}{3}\binom{6}{2} = \frac{7}{3}\cdot\frac{6}{2}\binom{5}{1} = \frac{7}{3}\cdot\frac{6}{2}\cdot\frac{5}{1}\binom{4}{0} = 35$$
It is not difficult to compute a binomial coefficient such as $\binom{100}{3}$, but it is a hard job to compute $\binom{100}{97}$ by means of the above formulas. There exists, however, a symmetry formula which is very useful. Let us begin by observing that:
$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n\cdots(n-k+1)}{k!}\cdot\frac{(n-k)\cdots 1}{(n-k)\cdots 1} = \frac{n!}{k!\,(n-k)!}$$
This is a very important formula in its own right, and it shows that:
$$\binom{n}{n-k} = \frac{n!}{(n-k)!\,(n-(n-k))!} = \binom{n}{k}$$
From this formula we immediately obtain $\binom{100}{97} = \binom{100}{3}$. The most difficult computing problem is the evaluation of the central binomial coefficient $\binom{2k}{k}$, for which symmetry gives no help.
The reader is invited to produce a computer pro-
gram to evaluate binomial coefficients. He (or she) is warned not to use the formula $n!/(k!\,(n-k)!)$, which can produce very large numbers, exceeding the capacity of the computer when n, k are not small.
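A safer computation follows the recursive formula $\binom{n}{k} = \frac{n}{k}\binom{n-1}{k-1}$, multiplying and dividing at every step so that intermediate values stay small; a Pascal sketch (name ours), which also exploits the symmetry rule:

function binomial( n, k : integer ) : real ;
  { evaluates n choose k, for 0 <= k <= n, without computing
    any factorial; a real result mitigates overflow problems }
  var i : integer ;
      c : real ;
begin
  if k > n - k then k := n - k ;    { symmetry rule }
  c := 1.0 ;
  for i := 1 to k do
    c := c * ( n - k + i ) / i ;    { c = binomial(n-k+i, i) }
  binomial := c
end ;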
The definition of a binomial coefficient can be easily expanded to any real numerator:
$$\binom{r}{k} = \frac{r^{\underline{k}}}{k!} = \frac{r(r-1)\cdots(r-k+1)}{k!}.$$
For example we have:
$$\binom{1/2}{3} = \frac{(1/2)(-1/2)(-3/2)}{3!} = \frac{1}{16}$$
but in this case the symmetry rule does not make sense. We point out that:
$$\binom{-n}{k} = \frac{-n(-n-1)\cdots(-n-k+1)}{k!} = \frac{(-1)^k\,(n+k-1)^{\underline{k}}}{k!} = \binom{n+k-1}{k}(-1)^k$$
which allows us to express a binomial coefficient with negative, integer numerator as a binomial coefficient with positive numerator. This is known as the negation rule and will be used very often.
If in a combination we are allowed to have several copies of the same element, we obtain a combination with repetitions. A useful exercise is to prove that the number of the k by k combinations with repetitions of n elements is:
$$R_{n,k} = \binom{n+k-1}{k}.$$
2.6 The Pascal triangle
Binomial coefficients satisfy a very important recurrence relation, which we are now going to prove. As we know, $\binom{n}{k}$ is the number of the subsets with k elements of a set with n elements. We can count the number of such subsets in the following way. Let $\{a_1, a_2, \ldots, a_n\}$ be the elements of the base set, and let us fix one of these elements, e.g., $a_n$. We can distinguish the subsets with k elements into two classes: the subsets containing $a_n$ and the subsets that do not contain $a_n$. Let $S^+$ and $S^-$ be these two classes. We now point out that $S^+$ can be seen (by eliminating $a_n$) as the subsets with k − 1 elements of a set with n − 1 elements; therefore, the number of the elements in $S^+$ is $\binom{n-1}{k-1}$. The class $S^-$ can be seen as composed by the subsets with k elements of a set with n − 1 elements, i.e., the base set minus $a_n$: their number is therefore $\binom{n-1}{k}$. By summing these two contributions, we obtain the recurrence relation:
$$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$$
which can be used with the initial conditions:
$$\binom{n}{0} = \binom{n}{n} = 1 \qquad \forall n \in \mathbb{N}.$$
For example, we have:
$$\binom{4}{2} = \binom{3}{2} + \binom{3}{1} = \binom{2}{2} + \binom{2}{1} + \binom{2}{1} + \binom{2}{0} = 2 + 2\binom{2}{1} = 2 + 2\left(\binom{1}{1} + \binom{1}{0}\right) = 2 + 2\cdot 2 = 6.$$
This recurrence is not particularly suitable for numerical computation. However, it gives a simple rule to compute successively all the binomial coefficients. Let us arrange them in an infinite array, whose rows represent the number $n$ and whose columns represent the number $k$. The recurrence tells us that the element in position $(n, k)$ is obtained by summing two elements in the previous row: the element just above the position $(n, k)$, i.e., in position $(n-1, k)$, and the element on its left, i.e., in position $(n-1, k-1)$. The array is initially filled by 1's in the first column (corresponding to the various $\binom{n}{0}$) and the main diagonal (corresponding to the numbers $\binom{n}{n}$). See Table 2.1.
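The rule translates directly into a program; a small sketch (Python, illustrative) that builds the triangle row by row from the recurrence:

    def pascal_triangle(rows):
        """Return the first `rows` rows of the Pascal triangle."""
        triangle = [[1]]
        for n in range(1, rows):
            prev = triangle[-1]
            # boundary 1's; inner entries from C(n,k) = C(n-1,k) + C(n-1,k-1)
            triangle.append([1] + [prev[k - 1] + prev[k] for k in range(1, n)] + [1])
        return triangle

    for row in pascal_triangle(7):
        print(row)          # reproduces Table 2.1 row by row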
We actually obtain an infinite, lower triangular array known as the Pascal triangle (or Tartaglia-Pascal triangle). The symmetry rule is quite evident, and a simple observation is that the sum of the elements in row $n$ is $2^n$. The proof of this fact is immediate, since
n\k 0 1 2 3 4 5 6
0 1
1 1 1
2 1 2 1
3 1 3 3 1
4 1 4 6 4 1
5 1 5 10 10 5 1
6 1 6 15 20 15 6 1
Table 2.1: The Pascal triangle
the sum of row $n$ counts the total number of the subsets of a set with $n$ elements, and this number is obviously $2^n$. The reader can try to prove that the row alternating sums, e.g., the sum $1 - 4 + 6 - 4 + 1$, equal zero, except for the first row, the row with $n = 0$.
As we have already mentioned, the numbers $c_n = \binom{2n}{n}$ are called central binomial coefficients and their sequence begins:

n    0  1  2   3   4    5    6     7     8
c_n  1  2  6  20  70  252  924  3432  12870
They are very important, and, for example, we can express the binomial coefficients $\binom{-1/2}{n}$ in terms of the central binomial coefficients:

$$\binom{-1/2}{n} = \frac{(-1/2)(-3/2)\cdots(-(2n-1)/2)}{n!} = (-1)^n\,\frac{1\cdot 3\cdots(2n-1)}{2^n\,n!} = (-1)^n\,\frac{1\cdot 2\cdot 3\cdot 4\cdots(2n-1)(2n)}{2^n\,n!\;2\cdot 4\cdots(2n)} = (-1)^n\,\frac{(2n)!}{2^n\,n!\;2^n(1\cdot 2\cdots n)} = \frac{(-1)^n}{4^n}\,\frac{(2n)!}{n!^2} = \frac{(-1)^n}{4^n}\binom{2n}{n}.$$
In a similar way, we can prove the following identities:

$$\binom{1/2}{n} = \frac{(-1)^{n-1}}{4^n(2n-1)}\binom{2n}{n} \qquad \binom{3/2}{n} = \frac{(-1)^n\,3}{4^n(2n-1)(2n-3)}\binom{2n}{n} \qquad \binom{-3/2}{n} = \frac{(-1)^n(2n+1)}{4^n}\binom{2n}{n}.$$
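These identities are easy to verify by machine for small $n$ (a sketch, not from the text; Python with exact rational arithmetic). Note the sign in the $\binom{-3/2}{n}$ identity: the check only passes with the factor $(-1)^n$ given above.

    from fractions import Fraction
    from math import comb

    def gbinom(r, n):
        """Generalized binomial coefficient r(r-1)...(r-n+1)/n!."""
        p = Fraction(1)
        for j in range(n):
            p = p * (r - j) / (j + 1)
        return p

    for n in range(1, 10):
        c = comb(2 * n, n)
        assert gbinom(Fraction(1, 2), n) == Fraction((-1)**(n - 1) * c, 4**n * (2*n - 1))
        assert gbinom(Fraction(-3, 2), n) == Fraction((-1)**n * (2*n + 1) * c, 4**n)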
An important point of this generalization is that the binomial formula can be extended to real exponents. Let us consider the function $f(x) = (1+x)^r$; it is continuous and can be differentiated as many times as we wish; in fact we have:

$$f^{(n)}(x) = \frac{d^n}{dx^n}(1+x)^r = r^{\underline{n}}(1+x)^{r-n}$$

as can be shown by mathematical induction. The first two cases are $f^{(0)}(x) = (1+x)^r = r^{\underline{0}}(1+x)^{r-0}$ and $f'(x) = r(1+x)^{r-1}$. Suppose now that the formula holds true for some $n \in \mathbb{N}$ and let us differentiate it once more:

$$f^{(n+1)}(x) = r^{\underline{n}}(r-n)(1+x)^{r-n-1} = r^{\underline{n+1}}(1+x)^{r-(n+1)}$$

and this proves our statement. Because of that, $(1+x)^r$ has a Taylor expansion around the point $x = 0$ of the form:

$$(1+x)^r = f(0) + \frac{f'(0)}{1!}x + \frac{f''(0)}{2!}x^2 + \cdots + \frac{f^{(n)}(0)}{n!}x^n + \cdots.$$
The coefficient of $x^n$ is therefore $f^{(n)}(0)/n! = r^{\underline{n}}/n! = \binom{r}{n}$, and so:

$$(1+x)^r = \sum_{n=0}^{\infty} \binom{r}{n} x^n.$$
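Truncating this series gives a practical way of approximating $(1+x)^r$ for $|x| < 1$; a quick numeric sketch (Python, illustrative, not part of the text) for the case $r = 1/2$:

    from math import isclose, sqrt

    def gbinom(r, n):
        """Generalized binomial coefficient r(r-1)...(r-n+1)/n! as a float."""
        p = 1.0
        for j in range(n):
            p *= (r - j) / (j + 1)
        return p

    def binomial_series(r, x, terms=60):
        """Partial sum of (1+x)^r = sum_n C(r,n) x^n; meaningful for |x| < 1."""
        return sum(gbinom(r, n) * x**n for n in range(terms))

    assert isclose(binomial_series(0.5, 0.3), sqrt(1.3), rel_tol=1e-12)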
We conclude with the following property, which is called the cross-product rule:

$$\binom{n}{k}\binom{k}{r} = \frac{n!}{k!\,(n-k)!}\,\frac{k!}{r!\,(k-r)!} = \frac{n!}{r!\,(n-r)!}\,\frac{(n-r)!}{(n-k)!\,(k-r)!} = \binom{n}{r}\binom{n-r}{k-r}.$$

This rule, together with the symmetry and the negation rules, gives the three basic properties of binomial coefficients:

$$\binom{n}{k} = \binom{n}{n-k} \qquad \binom{-n}{k} = \binom{n+k-1}{k}(-1)^k \qquad \binom{n}{k}\binom{k}{r} = \binom{n}{r}\binom{n-r}{k-r}$$

and they should be learned by heart.
2.7 Harmonic numbers
It is well-known that the harmonic series:

$$\sum_{k=1}^{\infty} \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots$$

diverges. In fact, if we cumulate the $2^m$ numbers from $1/(2^m+1)$ to $1/2^{m+1}$, we obtain:

$$\frac{1}{2^m+1} + \frac{1}{2^m+2} + \cdots + \frac{1}{2^{m+1}} > \frac{1}{2^{m+1}} + \frac{1}{2^{m+1}} + \cdots + \frac{1}{2^{m+1}} = \frac{2^m}{2^{m+1}} = \frac{1}{2}$$
and therefore the sum cannot be bounded. On the other hand we can define:

$$H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$$

a finite, partial sum of the harmonic series. This number has a well-defined value and is called a harmonic number. Conventionally, we set $H_0 = 0$ and the sequence of harmonic numbers begins:

n    0  1   2     3      4       5      6        7        8
H_n  0  1  3/2  11/6  25/12  137/60  49/20  363/140  761/280
Harmonic numbers arise in the analysis of many algorithms and it is very useful to know an approximate value for them. Let us consider the series:

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots = \ln 2$$

and let us define:

$$L_n = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{(-1)^{n-1}}{n}.$$
Obviously we have:

$$H_{2n} - L_{2n} = 2\left(\frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2n}\right) = H_n$$

or $H_{2n} = L_{2n} + H_n$, and since the series for $\ln 2$ is alternating in sign, the error committed by truncating it at any place is less than the first discarded element. Therefore:

$$\ln 2 - \frac{1}{2n} < L_{2n} < \ln 2$$

and by adding $H_n$ to all members:

$$H_n + \ln 2 - \frac{1}{2n} < H_{2n} < H_n + \ln 2.$$
Let us now consider the two cases $n = 2^{k-1}$ and $n = 2^{k-2}$:

$$H_{2^{k-1}} + \ln 2 - \frac{1}{2^k} < H_{2^k} < H_{2^{k-1}} + \ln 2$$

$$H_{2^{k-2}} + \ln 2 - \frac{1}{2^{k-1}} < H_{2^{k-1}} < H_{2^{k-2}} + \ln 2.$$

By summing and simplifying these two expressions, we obtain:

$$H_{2^{k-2}} + 2\ln 2 - \frac{1}{2^k} - \frac{1}{2^{k-1}} < H_{2^k} < H_{2^{k-2}} + 2\ln 2.$$

We can now iterate this procedure and eventually find:

$$H_{2^0} + k\ln 2 - \frac{1}{2^k} - \frac{1}{2^{k-1}} - \cdots - \frac{1}{2^1} < H_{2^k} < H_{2^0} + k\ln 2.$$

Since $H_{2^0} = H_1 = 1$ and the subtracted geometric sum is less than 1, we have the bounds:

$$k\ln 2 < H_{2^k} < k\ln 2 + 1.$$
These bounds can be extended to every $n$, and since the values of the $H_n$'s are increasing, this implies that a constant $\gamma$ should exist ($0 < \gamma < 1$) such that:

$$H_n - \ln n \to \gamma \quad \text{as } n \to \infty,$$

that is, $H_n \approx \ln n + \gamma$ for large $n$. This constant is called the Euler-Mascheroni constant and, as we have already mentioned, its value is $\gamma \approx 0.5772156649\ldots$. Later we will prove the more accurate approximation of the $H_n$'s we quoted in the Introduction.
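A short numeric illustration (a sketch, not from the text; Python): computing $H_n$ directly and comparing it with $\ln n + \gamma$ shows how good the approximation already is for moderate $n$; the difference shrinks like $1/(2n)$.

    from math import log

    GAMMA = 0.5772156649015329      # Euler-Mascheroni constant

    def harmonic(n):
        """H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0 by convention."""
        return sum(1.0 / k for k in range(1, n + 1))

    for n in (10, 100, 1000, 10000):
        print(n, harmonic(n), log(n) + GAMMA, harmonic(n) - log(n) - GAMMA)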
The generalized harmonic numbers are defined as:

$$H_n^{(s)} = \frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \cdots + \frac{1}{n^s}$$

and $H_n^{(1)} = H_n$. They are the partial sums of the series defining the Riemann $\zeta$ function:

$$\zeta(s) = \frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \cdots$$

which can be defined in such a way that the sum actually converges except for $s = 1$ (the harmonic series). In particular we have:

$$\zeta(2) = 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \frac{1}{25} + \cdots = \frac{\pi^2}{6}$$

$$\zeta(3) = 1 + \frac{1}{8} + \frac{1}{27} + \frac{1}{64} + \frac{1}{125} + \cdots$$

$$\zeta(4) = 1 + \frac{1}{16} + \frac{1}{81} + \frac{1}{256} + \frac{1}{625} + \cdots = \frac{\pi^4}{90}$$

and in general:

$$\zeta(2n) = \frac{(2\pi)^{2n}}{2\,(2n)!}\,|B_{2n}|$$

where the $B_n$ are the Bernoulli numbers (see below). No explicit formula is known for $\zeta(2n+1)$, but numerically we have:

$$\zeta(3) \approx 1.202056903\ldots$$

Because $\zeta(s)$ is finite for every $s > 1$, we can set, for large values of $n$:

$$H_n^{(s)} \approx \zeta(s).$$
2.8 Fibonacci numbers
At the beginning of the 13th century, Leonardo Fibonacci introduced in Europe the positional notation for numbers, together with the computing algorithms for performing the four basic operations. In fact, Fibonacci was the most important mathematician in western Europe at that time. He posed the following problem: a farmer has a couple of rabbits which generates another couple after two months and, from that moment on, a new couple of rabbits every month. The newly generated couples become fertile after two months, when they begin to generate a new couple of rabbits every month. The problem consists in computing how many couples of rabbits the farmer has after n months.
It is a simple matter to find the initial values: there
is one couple at the beginning (1st month) and 1 in
the second month. The third month the farmer has
2 couples, and 3 couples the fourth month; in fact,
the first couple has generated another pair of rabbits,
while the previously generated couple of rabbits has
not yet become fertile. The couples become 5 on the
fifth month: in fact, there are the 3 couples of the
preceding month plus the newly generated couples,
which are as many as there are fertile couples; but
these are just the couples of two months beforehand,
i.e., 2 couples. In general, at the nth month, the
farmer will have the couples of the previous month
plus the new couples, which are generated by the fer-
tile couples, that is the couples he had two months
before. If we denote by $F_n$ the number of couples at the $n$th month, we have the Fibonacci recurrence:

$$F_n = F_{n-1} + F_{n-2}$$

with the initial conditions $F_1 = F_2 = 1$. By the same rule, we have $F_0 = 0$ and the sequence of Fibonacci numbers begins:

n    0  1  2  3  4  5  6   7   8   9  10
F_n  0  1  1  2  3  5  8  13  21  34  55

every term being obtained by summing the two preceding numbers in the sequence.
Despite the small numbers appearing at the begin-
ning of the sequence, Fibonacci numbers grow very
fast, and later we will see how they grow and how they
can be computed in a fast way. For the moment, we
wish to show how Fibonacci numbers appear in com-
binatorics and in the analysis of algorithms. Suppose
we have some bricks of dimensions 1 × 2 dm, and we
wish to cover a strip 2 dm wide and n dm long by
using these bricks. The problem is to know in how
many different ways we can perform this covering. In
Figure 2.1 we show the five ways to cover a strip 4
dm long.
Figure 2.1: Fibonacci coverings for a strip 4 dm long

If $M_n$ is this number, we can observe that a covering counted by $M_n$ can be obtained by adding vertically a brick to a covering counted by $M_{n-1}$, or by adding horizontally two bricks to a covering counted by $M_{n-2}$. These are the only ways of proceeding to build our coverings, and therefore we have the recurrence relation:

$$M_n = M_{n-1} + M_{n-2}$$

which is the same recurrence as for Fibonacci numbers. This time, however, we have the initial conditions $M_0 = 1$ (the empty covering is just a covering!) and $M_1 = 1$. Therefore we conclude that $M_n = F_{n+1}$.
Euclid’s algorithm for computing the Greatest
Common Divisor (gcd) of two positive integer num-
bers is another instance of the appearance of Fi-
bonacci numbers. The problem is to determine the
maximal number of divisions performed by Euclid’s
algorithm. Obviously, this maximum is attained
when every division in the process gives 1 as a quo-
tient, since a greater quotient would drastically cut
the number of necessary divisions. Let us consider
two consecutive Fibonacci numbers, for example 34
and 55, and let us try to find gcd(34, 55):
55 = 1 · 34 + 21
34 = 1 · 21 + 13
21 = 1 · 13 + 8
13 = 1 · 8 + 5
 8 = 1 · 5 + 3
 5 = 1 · 3 + 2
 3 = 1 · 2 + 1
 2 = 2 · 1
We immediately see that the quotients are all 1 (ex-
cept the last one) and the remainders are decreasing
Fibonacci numbers. The process can be inverted to
prove that only consecutive Fibonacci numbers enjoy
this property. Therefore, we conclude that given two
integer numbers, n and m, the maximal number of
divisions performed by Euclid’s algorithm is attained
when n, m are two consecutive Fibonacci numbers
and the actual number of divisions is the order of the
smaller number in the Fibonacci sequence, minus 1.
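A sketch of this worst-case behaviour (Python, illustrative): the function counts the divisions Euclid's algorithm performs, and consecutive Fibonacci numbers indeed realize the maximum among inputs of comparable size.

    def gcd_divisions(m, n):
        """Return (gcd, number of divisions) of Euclid's algorithm, m <= n."""
        count = 0
        while m != 0:
            n, m = m, n % m
            count += 1
        return n, count

    # consecutive Fibonacci numbers 34 = F_9 and 55 = F_10
    print(gcd_divisions(34, 55))    # -> (1, 8), i.e., 9 - 1 divisions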
2.9 Walks, trees and Catalan
numbers
“Walks” or “paths” are common combinatorial objects and are defined in the following way. Let $\mathbb{Z}^2$ be the integral lattice, i.e., the set of points in $\mathbb{R}^2$ having integer coordinates. A walk or path is a finite sequence of points in $\mathbb{Z}^2$ with the following properties:

1. the origin $(0, 0)$ belongs to the walk;

2. if $(x+1, y+1)$ belongs to the walk, then either $(x, y+1)$ or $(x+1, y)$ also belongs to the walk.

Figure 2.2: How a walk is decomposed
A pair of points $((x, y), (x+1, y))$ is called an east step and a pair $((x, y), (x, y+1))$ is a north step. It is a simple matter to show that the number of walks composed by $n$ steps and ending at column $k$ (i.e., the last point is $(x, k) = (n-k, k)$) is just $\binom{n}{k}$. In fact, if we denote by $1, 2, \ldots, n$ the $n$ steps, starting with the origin, then we can associate to any walk a subset of $N_n = \{1, 2, \ldots, n\}$, that is, the subset of the north-step numbers. Since the other steps should be east steps, a 1-1 correspondence exists between the walks and the subsets of $N_n$ with $k$ elements, which, as we know, are exactly $\binom{n}{k}$ in number. This also proves, in a combinatorial way, the symmetry rule for binomial coefficients.
Among the walks, the ones that never go above the main diagonal, i.e., walks no point of which has the form $(x, y)$ with $y > x$, are particularly important. They are called underdiagonal walks. Later on, we will be able to count the number of underdiagonal walks composed by $n$ steps and ending at a horizontal distance $k$ from the main diagonal. For the moment, we only wish to establish a recurrence relation for the number $b_n$ of the underdiagonal walks of length $2n$ ending on the main diagonal. In Figure 2.2 we sketch a possible walk, where we have marked the point $C = (k, k)$ at which the walk encounters the main diagonal for the first time. We observe that the walk is decomposed into a first part (the walk between $A$ and $B$) and a second part (the walk between $C$ and $D$), which are underdiagonal walks of the same type. There are $b_{k-1}$ walks of the type $AB$ and $b_{n-k}$ walks of the type $CD$ and, obviously, $k$ can be any number between 1 and $n$. We therefore obtain the recurrence relation:

$$b_n = \sum_{k=1}^{n} b_{k-1} b_{n-k} = \sum_{k=0}^{n-1} b_k b_{n-k-1}$$

with the initial condition $b_0 = 1$, corresponding to the empty walk, i.e., the walk composed by the only point $(0, 0)$. The sequence $(b_k)_{k \in \mathbb{N}}$ begins:

n    0  1  2  3   4   5    6    7     8
b_n  1  1  2  5  14  42  132  429  1430
and, as we shall see:

$$b_n = \frac{1}{n+1}\binom{2n}{n}.$$
The b
n
’s are called Catalan numbers and they fre-
quently occur in the analysis of algorithms and data
structures. For example, if we associate an open parenthesis to every east step and a closed parenthesis to every north step, we see that $b_n$ also counts the possible parenthetizations of an expression. When we have three pairs of parentheses, the 5 possibilities are:

()()()   ()(())   (())()   (()())   ((())).
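The recurrence and the closed form for the Catalan numbers are easy to compare by machine; a small sketch (Python, illustrative):

    from math import comb

    def catalan_by_recurrence(m):
        """First m Catalan numbers from b_n = sum_k b_k b_{n-k-1}, b_0 = 1."""
        b = [1]
        for n in range(1, m):
            b.append(sum(b[k] * b[n - k - 1] for k in range(n)))
        return b

    b = catalan_by_recurrence(9)
    assert b == [comb(2 * n, n) // (n + 1) for n in range(9)]
    print(b)                        # [1, 1, 2, 5, 14, 42, 132, 429, 1430]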
When we build binary trees from permutations, we
do not always obtain different trees from different
permutations. There are only 5 trees generated by
the six permutations of 1, 2, 3, as we show in Figure
2.3.
How many different trees exist with $n$ nodes? If we fix our attention on the root, the left subtree has $k$ nodes for $k = 0, 1, \ldots, n-1$, while the right subtree contains the remaining $n-k-1$ nodes. Every tree with $k$ nodes can be combined with every tree with $n-k-1$ nodes to form a tree with $n$ nodes, and therefore we have the recurrence relation:

$$b_n = \sum_{k=0}^{n-1} b_k b_{n-k-1}$$

which is the same recurrence as before. Since the initial condition is again $b_0 = 1$ (the empty tree), there are as many trees as there are walks.
Figure 2.3: The five binary trees with three nodes

Figure 2.4: Rooted Plane Trees

Another kind of walk is obtained by considering steps of type $((x, y), (x+1, y+1))$, i.e., north-east steps, and of type $((x, y), (x+1, y-1))$, i.e., south-east steps. The interesting walks are those starting from the origin and never going below the $x$-axis; they are called Dyck walks. An obvious 1-1 correspondence exists between Dyck walks and the walks considered above, and again we obtain the sequence of Catalan numbers:

n    0  1  2  3   4   5    6    7     8
b_n  1  1  2  5  14  42  132  429  1430
Finally, the concept of a rooted planar tree is as
follows: let us consider a node, which is the root of
the tree; if we recursively add branches to the root or
to the nodes generated by previous insertions, what
we obtain is a “rooted plane tree”. If n denotes the
number of branches in a rooted plane tree, in Fig. 2.4
we represent all the trees up to n = 3. Again, rooted
plane trees are counted by Catalan numbers.
2.10 Stirling numbers of the
first kind
About 1730, the Scottish mathematician James Stirling was looking for a connection between the powers of a number $x$, say $x^n$, and the falling factorials $x^{\underline{k}} = x(x-1)\cdots(x-k+1)$. He developed the first instances:

$$x^{\underline{1}} = x$$
$$x^{\underline{2}} = x(x-1) = x^2 - x$$
$$x^{\underline{3}} = x(x-1)(x-2) = x^3 - 3x^2 + 2x$$
$$x^{\underline{4}} = x(x-1)(x-2)(x-3) = x^4 - 6x^3 + 11x^2 - 6x$$

n\k  0    1    2    3   4   5  6
0    1
1    0    1
2    0    1    1
3    0    2    3    1
4    0    6   11    6   1
5    0   24   50   35  10   1
6    0  120  274  225  85  15  1

Table 2.2: Stirling numbers of the first kind
and picking the coefficients in their proper order
(from the smallest power to the largest) he obtained a
table of integer numbers. We are mostly interested in
the absolute values of these numbers, as shown in Table 2.2. After him, these numbers are called Stirling numbers of the first kind and are now denoted by $\left[{n \atop k}\right]$, sometimes read "$n$ cycle $k$", for the reason we are now going to explain.
First note that the above identities can be written:

$$x^{\underline{n}} = \sum_{k=0}^{n} \left[{n \atop k}\right] (-1)^{n-k} x^k.$$
Let us now observe that $x^{\underline{n}} = x^{\underline{n-1}}(x-n+1)$ and therefore we have:

$$x^{\underline{n}} = (x-n+1)\sum_{k=0}^{n-1} \left[{n-1 \atop k}\right] (-1)^{n-k-1} x^k = \sum_{k=0}^{n-1} \left[{n-1 \atop k}\right] (-1)^{n-k-1} x^{k+1} - \sum_{k=0}^{n-1} (n-1)\left[{n-1 \atop k}\right] (-1)^{n-k-1} x^k = \sum_{k=0}^{n} \left[{n-1 \atop k-1}\right] (-1)^{n-k} x^k + \sum_{k=0}^{n} (n-1)\left[{n-1 \atop k}\right] (-1)^{n-k} x^k.$$
We performed the change of variable $k \to k-1$ in the first sum and then extended both sums from 0 to $n$. This identity is valid for every value of $x$, and therefore we can equate its coefficients to those of the previous, general Stirling identity, thus obtaining the recurrence relation:

$$\left[{n \atop k}\right] = (n-1)\left[{n-1 \atop k}\right] + \left[{n-1 \atop k-1}\right].$$

This recurrence, together with the initial conditions:

$$\left[{n \atop n}\right] = 1 \quad \forall n \in \mathbb{N}, \qquad \left[{n \atop 0}\right] = 0 \quad \forall n > 0,$$

completely defines the Stirling numbers of the first kind.
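The recurrence fills the triangle mechanically; a sketch (Python, illustrative) reproducing Table 2.2:

    def stirling_first(rows):
        """Unsigned Stirling numbers of the first kind, via the recurrence."""
        s = [[1]]
        for n in range(1, rows):
            row = [0] * (n + 1)
            row[n] = 1                          # [n, n] = 1
            for k in range(1, n):
                row[k] = (n - 1) * s[n - 1][k] + s[n - 1][k - 1]
            s.append(row)
        return s

    for row in stirling_first(7):
        print(row)      # last row: [0, 120, 274, 225, 85, 15, 1]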
What is a possible combinatorial interpretation of these numbers? Let us consider the permutations of $n$ elements and count the permutations having exactly $k$ cycles, whose set will be denoted by $S_{n,k}$. If we fix any element, say the last element $n$, we observe that the permutations in $S_{n,k}$ can have $n$ as a fixed point, or not. When $n$ is a fixed point and we eliminate it, we obtain a permutation with $n-1$ elements having exactly $k-1$ cycles; vice versa, any such permutation gives a permutation in $S_{n,k}$ with $n$ as a fixed point if we add $(n)$ to it. Therefore, there are $|S_{n-1,k-1}|$ such permutations in $S_{n,k}$. When $n$ is not a fixed point and we eliminate it from the permutation, we obtain a permutation with $n-1$ elements and $k$ cycles. However, the same permutation is obtained several times, exactly $n-1$ times, since $n$ can occur after any other element in the standard cycle representation (it can never occur as the first element in a cycle, by our conventions). For example, all the following permutations in $S_{5,2}$ produce the same permutation in $S_{4,2}$:

(1 2 3)(4 5)   (1 2 3 5)(4)   (1 2 5 3)(4)   (1 5 2 3)(4).

The process can be inverted and therefore we have:

$$|S_{n,k}| = (n-1)\,|S_{n-1,k}| + |S_{n-1,k-1}|$$

which is just the recurrence relation for the Stirling numbers of the first kind. If we now prove that also the initial conditions are the same, we conclude that $|S_{n,k}| = \left[{n \atop k}\right]$. First we observe that $S_{n,n}$ is only composed by the identity, the only permutation having $n$ cycles, i.e., $n$ fixed points; so $|S_{n,n}| = 1$, $\forall n \in \mathbb{N}$. Moreover, for $n \geq 1$, $S_{n,0}$ is empty, because every permutation contains at least one cycle, and so $|S_{n,0}| = 0$. This concludes the proof.
As an immediate consequence of this reasoning, we find:

$$\sum_{k=0}^{n} \left[{n \atop k}\right] = n!,$$

i.e., the row sums of the Stirling triangle of the first kind equal $n!$, because they correspond to the total number of permutations of $n$ objects. We also observe that:

- $\left[{n \atop 1}\right] = (n-1)!$; in fact, $S_{n,1}$ is composed by all the permutations having a single cycle; this begins by 1 and is followed by any permutation of the $n-1$ remaining numbers;

- $\left[{n \atop n-1}\right] = \binom{n}{2}$; in fact, $S_{n,n-1}$ contains the permutations having all fixed points except a single transposition; but this transposition can only be formed by taking two elements among $1, 2, \ldots, n$, which can be done in $\binom{n}{2}$ different ways;

- $\left[{n \atop 2}\right] = (n-1)!\,H_{n-1}$; returning to the numerical definition, the coefficient of $x^2$ is a sum of products, in each of which a positive integer is missing.
2.11 Stirling numbers of the
second kind
James Stirling also tried to invert the process described in the previous section, that is, he was also interested in expressing ordinary powers in terms of falling factorials. The first instances are:

$$x^1 = x^{\underline{1}}$$
$$x^2 = x^{\underline{1}} + x^{\underline{2}} = x + x(x-1)$$
$$x^3 = x^{\underline{1}} + 3x^{\underline{2}} + x^{\underline{3}} = x + 3x(x-1) + x(x-1)(x-2)$$
$$x^4 = x^{\underline{1}} + 7x^{\underline{2}} + 6x^{\underline{3}} + x^{\underline{4}}$$
The coefficients can be arranged into a triangular array, as shown in Table 2.3, and are called Stirling numbers of the second kind. The usual notation for them is $\left\{{n \atop k}\right\}$, often read "$n$ subset $k$", for the reason we are going to explain.
Stirling's identities can be globally written:

$$x^n = \sum_{k=0}^{n} \left\{{n \atop k}\right\} x^{\underline{k}}.$$
n\k  0  1   2   3   4   5  6
0    1
1    0  1
2    0  1   1
3    0  1   3   1
4    0  1   7   6   1
5    0  1  15  25  10   1
6    0  1  31  90  65  15  1

Table 2.3: Stirling numbers of the second kind

We obtain a recurrence relation in the following way:

$$x^n = x\,x^{n-1} = x\sum_{k=0}^{n-1} \left\{{n-1 \atop k}\right\} x^{\underline{k}} = \sum_{k=0}^{n-1} \left\{{n-1 \atop k}\right\} (x+k-k)\,x^{\underline{k}} = \sum_{k=0}^{n-1} \left\{{n-1 \atop k}\right\} x^{\underline{k+1}} + \sum_{k=0}^{n-1} k\left\{{n-1 \atop k}\right\} x^{\underline{k}} = \sum_{k=0}^{n} \left\{{n-1 \atop k-1}\right\} x^{\underline{k}} + \sum_{k=0}^{n} k\left\{{n-1 \atop k}\right\} x^{\underline{k}}$$
where, as usual, we performed the change of variable $k \to k-1$ and extended the two sums from 0 to $n$. The identity is valid for every $x \in \mathbb{R}$, and therefore we can equate the coefficients of $x^{\underline{k}}$ in this and the above identity, thus obtaining the recurrence relation:

$$\left\{{n \atop k}\right\} = k\left\{{n-1 \atop k}\right\} + \left\{{n-1 \atop k-1}\right\}$$
which is slightly different from the recurrence for the Stirling numbers of the first kind. Here we have the initial conditions:

$$\left\{{n \atop n}\right\} = 1 \quad \forall n \in \mathbb{N}, \qquad \left\{{n \atop 0}\right\} = 0 \quad \forall n \geq 1.$$

These relations completely define the Stirling triangle of the second kind. Every row of the triangle determines a polynomial; for example, from row 4 we obtain $S_4(w) = w + 7w^2 + 6w^3 + w^4$, which is called the 4th Stirling polynomial.
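A sketch for the second kind (Python, illustrative), which also spot-checks the defining identity $x^n = \sum_k \left\{{n \atop k}\right\} x^{\underline{k}}$ at small integer points:

    def stirling_second(rows):
        """Stirling numbers of the second kind, via the recurrence."""
        s = [[1]]
        for n in range(1, rows):
            row = [0] * (n + 1)
            row[n] = 1
            for k in range(1, n):
                row[k] = k * s[n - 1][k] + s[n - 1][k - 1]
            s.append(row)
        return s

    def falling(x, k):
        """Falling factorial x(x-1)...(x-k+1)."""
        p = 1
        for j in range(k):
            p *= x - j
        return p

    s = stirling_second(7)
    for x in range(1, 6):
        for n in range(7):
            assert x**n == sum(s[n][k] * falling(x, k) for k in range(n + 1))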
Let us now look for a combinatorial interpretation of these numbers. If $N_n$ is the usual set $\{1, 2, \ldots, n\}$, we can study the partitions of $N_n$ into $k$ disjoint, non-empty subsets. For example, when $n = 4$ and $k = 2$, we have the following 7 partitions:

{1} ∪ {2, 3, 4}   {1, 2} ∪ {3, 4}   {1, 3} ∪ {2, 4}   {1, 4} ∪ {2, 3}
{1, 2, 3} ∪ {4}   {1, 2, 4} ∪ {3}   {1, 3, 4} ∪ {2}.
If $P_{n,k}$ is the corresponding set, we now count $|P_{n,k}|$ by fixing an element in $N_n$, say the last element $n$. The partitions in $P_{n,k}$ can contain $n$ as a singleton (i.e., as a subset with $n$ as its only element) or can contain $n$ as an element in a larger subset. In the former case, by eliminating $\{n\}$ we obtain a partition in $P_{n-1,k-1}$ and, obviously, all partitions in $P_{n-1,k-1}$ can be obtained in such a way. When $n$ belongs to a larger set, we can eliminate it, obtaining a partition in $P_{n-1,k}$; however, the same partition is obtained several times, exactly by eliminating $n$ from any of the $k$ subsets containing it in the various partitions. For example, the following three partitions in $P_{5,3}$ all produce the same partition in $P_{4,3}$:

{1, 2, 5} ∪ {3} ∪ {4}   {1, 2} ∪ {3, 5} ∪ {4}   {1, 2} ∪ {3} ∪ {4, 5}.
This proves the recurrence relation:

$$|P_{n,k}| = k\,|P_{n-1,k}| + |P_{n-1,k-1}|$$

which is the same recurrence as for the Stirling numbers of the second kind. As far as the initial conditions are concerned, we observe that there is only one partition of $N_n$ composed by $n$ subsets, i.e., the partition containing $n$ singletons; therefore $|P_{n,n}| = 1$, $\forall n \in \mathbb{N}$ (in the case $n = 0$ the empty set is the only partition of the empty set). When $n \geq 1$, there is no partition of $N_n$ composed by 0 subsets, and therefore $|P_{n,0}| = 0$. We can conclude that $|P_{n,k}|$ coincides with the corresponding Stirling number of the second kind, and use this fact to observe that:

- $\left\{{n \atop 1}\right\} = 1$, $\forall n \geq 1$: in fact, the only partition of $N_n$ into a single subset is the one whose subset coincides with $N_n$;

- $\left\{{n \atop 2}\right\} = 2^{n-1} - 1$, $\forall n \geq 2$: when the partition is composed by only two subsets, the first one uniquely determines the second. Let us suppose that the first subset always contains 1. By eliminating 1, we obtain as first subset all the subsets of $N_n \setminus \{1\}$ except the whole set itself (which would correspond to an empty second subset), and these are $2^{n-1} - 1$ in number. This proves the identity;

- $\left\{{n \atop n-1}\right\} = \binom{n}{2}$, $\forall n \in \mathbb{N}$: in any partition with one subset of 2 elements and all the other subsets singletons, the two elements can be chosen in $\binom{n}{2}$ different ways.
2.12 Bell and Bernoulli num-
bers
If we sum the rows of the Stirling triangle of the second kind, we find a sequence:

n    0  1  2  3   4   5    6    7     8
B_n  1  1  2  5  15  52  203  877  4140

which represents the total number of partitions of the set $N_n$. For example, the five partitions of a set with three elements are:

{1, 2, 3}   {1} ∪ {2, 3}   {1, 2} ∪ {3}   {1, 3} ∪ {2}   {1} ∪ {2} ∪ {3}.
The numbers in this sequence are called Bell numbers and are denoted by $B_n$; by definition we have:

$$B_n = \sum_{k=0}^{n} \left\{{n \atop k}\right\}.$$

Bell numbers grow very fast; however, since $\left\{{n \atop k}\right\} \leq \left[{n \atop k}\right]$ for every value of $n$ and $k$ (a subset in $P_{n,k}$ corresponds to one or more cycles in $S_{n,k}$), we always have $B_n \leq n!$, and in fact $B_n < n!$ for every $n > 1$.
Another frequently occurring sequence is obtained by ordering the subsets appearing in the partitions of $N_n$. For example, the partition {1} ∪ {2} ∪ {3} can be ordered in $3! = 6$ different ways, and {1, 2} ∪ {3} can be ordered in $2! = 2$ ways, i.e., {1, 2} ∪ {3} and {3} ∪ {1, 2}. These are called ordered partitions, and their number is denoted by $O_n$. By the previous example, we easily see that $O_3 = 13$, and the sequence begins:

n    0  1  2   3   4    5     6      7
O_n  1  1  3  13  75  541  4683  47293
Because of this definition, the numbers $O_n$ are called ordered Bell numbers and we have:

$$O_n = \sum_{k=0}^{n} \left\{{n \atop k}\right\} k!;$$

this shows that $O_n \geq n!$ and, in fact, $O_n > n!$, $\forall n > 1$.
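Both sequences follow mechanically from the Stirling triangle of the second kind; a sketch (Python, illustrative):

    from math import factorial

    def stirling_second_row(n):
        """Row n of the Stirling triangle of the second kind."""
        row = [1]
        for m in range(1, n + 1):
            new = [0] * (m + 1)
            new[m] = 1
            for k in range(1, m):
                new[k] = k * row[k] + row[k - 1]
            row = new
        return row

    print([sum(stirling_second_row(n)) for n in range(9)])
    # Bell numbers: [1, 1, 2, 5, 15, 52, 203, 877, 4140]
    print([sum(c * factorial(k) for k, c in enumerate(stirling_second_row(n)))
           for n in range(8)])
    # ordered Bell numbers: [1, 1, 3, 13, 75, 541, 4683, 47293]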
Another combinatorial interpretation of the ordered Bell numbers is as follows. Let us fix an integer $n \in \mathbb{N}$ and for every $k \leq n$ let $A_k$ be any multiset with $n$ elements containing at least once all the numbers $1, 2, \ldots, k$. The number of all the possible orderings of the $A_k$'s is just the $n$th ordered Bell number. For example, when $n = 3$, the possible multisets are: {1, 1, 1}, {1, 1, 2}, {1, 2, 2}, {1, 2, 3}. Their possible orderings are given by the following 7 vectors:

(1, 1, 1)   (1, 1, 2)   (1, 2, 1)   (2, 1, 1)   (1, 2, 2)   (2, 1, 2)   (2, 2, 1)

plus the six permutations of the set {1, 2, 3}. These orderings are called preferential arrangements.
We can find a 1-1 correspondence between the orderings of set partitions and preferential arrangements. If $(a_1, a_2, \ldots, a_n)$ is a preferential arrangement, we build the corresponding ordered partition by setting the element 1 in the $a_1$th subset, 2 in the $a_2$th subset, and so on. If $k$ is the largest number in the arrangement, we build exactly $k$ subsets. For example, the partition corresponding to (1, 2, 2, 1) is {1, 4} ∪ {2, 3}, while the partition corresponding to (2, 1, 1, 2) is {2, 3} ∪ {1, 4}, whose ordering is different. This construction can be easily inverted, and since it is injective, we have proved that it is actually a 1-1 correspondence. Because of that, ordered Bell numbers are also called preferential arrangement numbers.
We conclude this section by introducing another important sequence of numbers. These are (positive or negative) rational numbers and therefore they cannot correspond to any counting problem, i.e., their combinatorial interpretation cannot be direct. However, they arise in many combinatorial problems and therefore they should be examined here, for the moment only introducing their definition. The Bernoulli numbers are implicitly defined by the recurrence relation:

$$\sum_{k=0}^{n} \binom{n+1}{k} B_k = \delta_{n,0}.$$
No initial condition is necessary, because for $n = 0$ we have $\binom{1}{0} B_0 = 1$, i.e., $B_0 = 1$. This is the starting value, and $B_1$ is obtained by setting $n = 1$ in the recurrence relation:

$$\binom{2}{0} B_0 + \binom{2}{1} B_1 = 0.$$

We obtain $B_1 = -1/2$, and we now have a formula for $B_2$:

$$\binom{3}{0} B_0 + \binom{3}{1} B_1 + \binom{3}{2} B_2 = 0.$$

By performing the necessary computations, we find $B_2 = 1/6$, and we can go on, successively obtaining all the values of the $B_n$'s. The first values are as follows:

n    0    1    2   3     4   5     6
B_n  1  −1/2  1/6  0  −1/30  0  1/42

n    7     8   9    10  11        12
B_n  0  −1/30  0  5/66   0  −691/2730
Except for $B_1$, all the other values of $B_n$ for odd $n$ are zero. Initially, the Bernoulli numbers seem to be small, but as $n$ grows they become extremely large in modulus and, apart from the zero values, they are alternately positive and negative. These and other properties of the Bernoulli numbers are not easily proven in a direct way, i.e., from their definition. However, we'll see later how we can arrange things in such a way that everything becomes accessible to us.
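The implicit recurrence is directly computable with exact rational arithmetic; a sketch (Python's fractions module, illustrative) reproducing the table above:

    from fractions import Fraction
    from math import comb

    def bernoulli(m):
        """First m+1 Bernoulli numbers from sum_k C(n+1, k) B_k = [n = 0]."""
        B = [Fraction(1)]
        for n in range(1, m + 1):
            # C(n+1, n) B_n = -sum_{k<n} C(n+1, k) B_k, and C(n+1, n) = n + 1
            acc = sum(comb(n + 1, k) * B[k] for k in range(n))
            B.append(-acc / (n + 1))
        return B

    print(bernoulli(12))
    # [1, -1/2, 1/6, 0, -1/30, 0, 1/42, 0, -1/30, 0, 5/66, 0, -691/2730]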
Chapter 3
Formal power series
3.1 Definitions for formal
power series
Let R be the field of real numbers and let $t$ be any indeterminate over R, i.e., a symbol different from any element in R. A formal power series (f.p.s.) over R in the indeterminate $t$ is an expression:

$$f(t) = f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots + f_n t^n + \cdots = \sum_{k=0}^{\infty} f_k t^k$$

where $f_0, f_1, f_2, \ldots$ are all real numbers.
definition applies to every set of numbers, in particu-
lar to the field of rational numbers Q and to the field
of complex numbers C. The developments we are now going to see, which depend on the field structure of the numeric set, can be easily extended to every field F of characteristic 0. The set of formal power series
over F in the indeterminate t is denoted by F[[t]]. The
use of a particular indeterminate t is irrelevant, and
there exists an obvious 1-1 correspondence between,
say, F[[t]] and F[[y]]; it is simple to prove that this
correspondence is indeed an isomorphism. In order
to stress that our results are substantially indepen-
dent of the particular field F and of the particular
indeterminate t, we denote F[[t]] by T, but the reader
can think of T as of R[[t]]. In fact, in combinatorial
analysis and in the analysis of algorithms the coeffi-
cients f
0
, f
1
, f
2
, . . . of a formal power series are mostly
used to count objects, and therefore they are positive
integer numbers, or, in some cases, positive rational
numbers (e.g., when they are the coefficients of an ex-
ponential generating function. See below and Section
4.1).
If $f(t) \in T$, the order of $f(t)$, denoted by $\mathrm{ord}(f(t))$, is the smallest index $r$ for which $f_r \neq 0$. The set of all f.p.s. of order exactly $r$ is denoted by $T_r$ or by $F_r[[t]]$. The formal power series $0 = 0 + 0t + 0t^2 + 0t^3 + \cdots$ has infinite order.
If $(f_0, f_1, f_2, \ldots) = (f_k)_{k \in \mathbb{N}}$ is a sequence of (real) numbers, there is no substantial difference between the sequence and the f.p.s. $\sum_{k=0}^{\infty} f_k t^k$, which will be called the (ordinary) generating function of the sequence. The term ordinary is used to distinguish these functions from exponential generating functions, which will be introduced in the next chapter. The indeterminate $t$ is used as a "place-marker", i.e., a symbol to denote the place of the element in the sequence. For example, in the f.p.s. $1 + t + t^2 + t^3 + \cdots$, corresponding to the sequence $(1, 1, 1, \ldots)$, the term $t^5 = 1 \cdot t^5$ simply denotes that the element in position 5 (starting from 0) in the sequence is the number 1.
Although our study of f.p.s. is mainly justified by
the development of a generating function theory, we
dedicate the present chapter to the general theory of
f.p.s., and postpone the study of generating functions
to the next chapter.
There are two main reasons why f.p.s. are more
easily studied than sequences:
1. the algebraic structure of f.p.s. is very well un-
derstood and can be developed in a standard
way;
2. many f.p.s. can be “abbreviated” by expressions
easily manipulated by elementary algebra.
The present chapter is devoted to these algebraic aspects of f.p.s.. For example, we will prove that the series $1 + t + t^2 + t^3 + \cdots$ can be conveniently abbreviated as $1/(1-t)$, and from this fact we will be able to infer that the series has a f.p.s. inverse, which is $1 - t + 0t^2 + 0t^3 + \cdots$.
We conclude this section by defining the concept of a formal Laurent (power) series (f.L.s.), as an expression:

$$g(t) = g_{-m} t^{-m} + g_{-m+1} t^{-m+1} + \cdots + g_{-1} t^{-1} + g_0 + g_1 t + g_2 t^2 + \cdots = \sum_{k=-m}^{\infty} g_k t^k.$$
The set of f.L.s. strictly contains the set of f.p.s..
For a f.L.s. g(t) the order can be negative; when the
order of g(t) is non-negative, then g(t) is actually a
f.p.s.. We observe explicitly that an expression such as $\sum_{k=-\infty}^{\infty} f_k t^k$ does not represent a f.L.s..
3.2 The basic algebraic struc-
ture
The set T of f.p.s. can be embedded into several al-
gebraic structures. We are now going to define the
most common one, which is related to the usual con-
cept of sum and (Cauchy) product of series. Given two f.p.s. $f(t) = \sum_{k=0}^{\infty} f_k t^k$ and $g(t) = \sum_{k=0}^{\infty} g_k t^k$, the sum of $f(t)$ and $g(t)$ is defined as:

$$f(t) + g(t) = \sum_{k=0}^{\infty} f_k t^k + \sum_{k=0}^{\infty} g_k t^k = \sum_{k=0}^{\infty} (f_k + g_k) t^k.$$
From this definition, it immediately follows that T is a commutative group with respect to the sum. The associative and commutative laws directly follow from the analogous properties in the field F; the identity is the f.p.s. $0 = 0 + 0t + 0t^2 + 0t^3 + \cdots$, and the opposite series of $f(t) = \sum_{k=0}^{\infty} f_k t^k$ is the series $-f(t) = \sum_{k=0}^{\infty} (-f_k) t^k$.
Let us now define the Cauchy product of $f(t)$ by $g(t)$:

$$f(t)g(t) = \left(\sum_{k=0}^{\infty} f_k t^k\right)\left(\sum_{k=0}^{\infty} g_k t^k\right) = \sum_{k=0}^{\infty} \left(\sum_{j=0}^{k} f_j g_{k-j}\right) t^k.$$
Because of the form of the coefficient of $t^k$, this is also called the convolution of $f(t)$ and $g(t)$. It is a good idea to write down explicitly the first terms of the Cauchy product:

$$f(t)g(t) = f_0 g_0 + (f_0 g_1 + f_1 g_0) t + (f_0 g_2 + f_1 g_1 + f_2 g_0) t^2 + (f_0 g_3 + f_1 g_2 + f_2 g_1 + f_3 g_0) t^3 + \cdots$$
This clearly shows that the product is commutative, and it is a simple matter to prove that the identity is the f.p.s. $1 = 1 + 0t + 0t^2 + 0t^3 + \cdots$. The distributive law is a consequence of the distributive law valid in F. In fact, we have:
$$(f(t) + g(t))h(t) = \sum_{k=0}^{\infty} \left(\sum_{j=0}^{k} (f_j + g_j) h_{k-j}\right) t^k = \sum_{k=0}^{\infty} \left(\sum_{j=0}^{k} f_j h_{k-j}\right) t^k + \sum_{k=0}^{\infty} \left(\sum_{j=0}^{k} g_j h_{k-j}\right) t^k = f(t)h(t) + g(t)h(t).$$
Finally, we can prove that T does not contain any zero divisor. If $f(t)$ and $g(t)$ are two f.p.s. different from zero, then we can suppose that $\mathrm{ord}(f(t)) = k_1$ and $\mathrm{ord}(g(t)) = k_2$, with $0 \leq k_1, k_2 < \infty$. This means $f_{k_1} \neq 0$ and $g_{k_2} \neq 0$; therefore, the product $f(t)g(t)$ has the term of degree $k_1 + k_2$ with coefficient $f_{k_1} g_{k_2} \neq 0$, and so it cannot be zero. We conclude that $(T, +, \cdot)$ is an integrity domain.
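On a computer one works with truncations of f.p.s. to finitely many coefficients; a sketch (Python, coefficient lists, an illustrative convention) of the two operations just defined:

    def fps_add(f, g):
        """Sum of two truncated f.p.s. given as equal-length coefficient lists."""
        return [a + b for a, b in zip(f, g)]

    def fps_mul(f, g):
        """Cauchy product, truncated to the shorter of the two lists."""
        n = min(len(f), len(g))
        return [sum(f[j] * g[k - j] for j in range(k + 1)) for k in range(n)]

    geometric = [1] * 6                       # 1 + t + t^2 + ...
    one_minus_t = [1, -1, 0, 0, 0, 0]
    print(fps_mul(geometric, one_minus_t))    # [1, 0, 0, 0, 0, 0]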
The previous reasoning also shows that, in general, we have:

$$\mathrm{ord}(f(t)g(t)) = \mathrm{ord}(f(t)) + \mathrm{ord}(g(t)).$$
The order of the identity 1 is obviously 0; if $f(t)$ is an invertible element in T, we should have $f(t) f(t)^{-1} = 1$ and therefore $\mathrm{ord}(f(t)) = 0$. On the other hand, if $f(t) \in T_0$, i.e., $f(t) = f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots$ with $f_0 \neq 0$, we can easily prove that $f(t)$ is invertible. In fact, let $g(t) = f(t)^{-1}$, so that $f(t)g(t) = 1$. From the explicit expression for the Cauchy product, we can determine the coefficients of $g(t)$ by solving the infinite system of linear equations:

$$f_0 g_0 = 1, \qquad f_0 g_1 + f_1 g_0 = 0, \qquad f_0 g_2 + f_1 g_1 + f_2 g_0 = 0, \qquad \ldots$$

The system can be solved in a simple way, starting with the first equation and going on one equation after the other. Explicitly, we obtain:

$$g_0 = f_0^{-1} \qquad g_1 = -\frac{f_1}{f_0^2} \qquad g_2 = \frac{f_1^2}{f_0^3} - \frac{f_2}{f_0^2} \qquad \ldots$$

and therefore $g(t) = f(t)^{-1}$ is well defined. We conclude by stating the result just obtained: a f.p.s. is invertible if and only if its order is 0. Because of that, $T_0$ is also called the set of invertible f.p.s.. According to standard terminology, the elements of $T_0$ are called the units of the integrity domain.
As a simple example, let us compute the inverse of the f.p.s. $1 - t = 1 - t + 0t^2 + 0t^3 + \cdots$. Here we have $f_0 = 1$, $f_1 = -1$ and $f_k = 0$, $\forall k > 1$. The system becomes:

$$g_0 = 1, \qquad g_1 - g_0 = 0, \qquad g_2 - g_1 = 0, \qquad \ldots$$

and we easily obtain that all the $g_j$'s ($j = 0, 1, 2, \ldots$) are 1. Therefore the inverse f.p.s. we are looking for is $1 + t + t^2 + t^3 + \cdots$. The usual notation for this fact is:

$$\frac{1}{1-t} = 1 + t + t^2 + t^3 + \cdots.$$

It is well-known that this identity is only valid for $-1 < t < 1$ when $t$ is a variable and f.p.s. are interpreted as functions. In our formal approach, however, these considerations are irrelevant and the identity is valid from a purely formal point of view.
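Solving the triangular system is a short loop once the coefficients are in a list; a sketch (Python, illustrative; exact if the inputs are Fractions):

    from fractions import Fraction

    def fps_inverse(f, n):
        """First n coefficients of 1/f(t); requires f[0] != 0."""
        g = [Fraction(1) / f[0]]
        for k in range(1, n):
            # f_0 g_k + f_1 g_{k-1} + ... + f_k g_0 = 0
            s = sum(f[j] * g[k - j] for j in range(1, k + 1) if j < len(f))
            g.append(-s / f[0])
        return g

    print(fps_inverse([Fraction(1), Fraction(-1)], 6))
    # all 1's: the geometric series 1/(1 - t)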
3.3 Formal Laurent Series
In the first section of this chapter, we introduced the concept of a formal Laurent series as an extension of the concept of a f.p.s.; if $a(t) = \sum_{k=m}^{\infty} a_k t^k$ and $b(t) = \sum_{k=n}^{\infty} b_k t^k$ ($m, n \in \mathbb{Z}$) are two f.L.s., we can define the sum and the Cauchy product:

$$a(t) + b(t) = \sum_{k=m}^{\infty} a_k t^k + \sum_{k=n}^{\infty} b_k t^k = \sum_{k=p}^{\infty} (a_k + b_k) t^k$$

$$a(t)b(t) = \left(\sum_{k=m}^{\infty} a_k t^k\right)\left(\sum_{k=n}^{\infty} b_k t^k\right) = \sum_{k=q}^{\infty} \left(\sum_{i+j=k} a_i b_j\right) t^k$$

where $p = \min(m, n)$ and $q = m + n$. As we did for f.p.s., it is not difficult to find out that these operations enjoy the usual properties of sum and product, and if we denote by L the set of f.L.s., we have that $(L, +, \cdot)$ is a field. The only point we should formally prove is that every f.L.s. $a(t) = \sum_{k=m}^{\infty} a_k t^k \neq 0$ has an inverse f.L.s. $b(t) = \sum_{k=-m}^{\infty} b_k t^k$. However, this is proved in the same way we proved that every f.p.s. in $T_0$ has an inverse. In fact we should have:

$$a_m b_{-m} = 1$$
$$a_m b_{-m+1} + a_{m+1} b_{-m} = 0$$
$$a_m b_{-m+2} + a_{m+1} b_{-m+1} + a_{m+2} b_{-m} = 0$$
$$\cdots$$

By solving the first equation, we find b
−m
= a
−1
m
;
then the system can be solved one equation after the
other, by substituting the values obtained up to the
moment. Since a
m
b
−m
is the coefficient of t
0
, we have
a(t)b(t) = 1 and the proof is complete.
We can now show that $(L, +, \cdot)$ is the smallest field containing the integrity domain $(T, +, \cdot)$, thus characterizing the set of f.L.s. in an algebraic way. From algebra we know that, given an integrity domain $(K, +, \cdot)$, the smallest field $(F, +, \cdot)$ containing $(K, +, \cdot)$ can be built in the following way: let us define an equivalence relation $\sim$ on the set $K \times K$:

$$(a, b) \sim (c, d) \iff ad = bc;$$

if we now set $F = K \times K / \sim$, the set $F$ with the operations $+$ and $\cdot$ defined as the extension of $+$ and $\cdot$ in $K$ is the field we are searching for. This is just the way in which the field Q of rational numbers is constructed from the integrity domain Z of integer numbers, and the field of rational functions is built from the integrity domain of the polynomials.
Our aim is to show that the field $(L, +, \cdot)$ of f.L.s. is isomorphic with the field constructed in the described way starting with the integrity domain of f.p.s.. Let $\hat{L} = T \times T$ be the set of pairs of f.p.s.; we begin by showing that for every $(f(t), g(t)) \in \hat{L}$ there exists a pair $(a(t), b(t)) \in \hat{L}$ such that $(f(t), g(t)) \sim (a(t), b(t))$ (i.e., $f(t)b(t) = g(t)a(t)$) and at least one between $a(t)$ and $b(t)$ belongs to $T_0$. In fact, let $p = \min(\mathrm{ord}(f(t)), \mathrm{ord}(g(t)))$ and let us define $a(t), b(t)$ by $f(t) = t^p a(t)$ and $g(t) = t^p b(t)$; obviously, either $a(t) \in T_0$ or $b(t) \in T_0$, or both are invertible f.p.s.. We now have:

$$b(t)f(t) = b(t)\,t^p a(t) = t^p b(t) a(t) = g(t)a(t)$$

and this shows that $(a(t), b(t)) \sim (f(t), g(t))$.
If $b(t) \in T_0$, then $a(t)/b(t) \in T$ and is uniquely determined by $a(t), b(t)$; in this case, therefore, our assertion is proved. So, let us now suppose that $b(t) \notin T_0$; then we can write $b(t) = t^m v(t)$, where $v(t) \in T_0$. We have $a(t)/v(t) = \sum_{k=0}^{\infty} d_k t^k \in T_0$, and consequently let us consider the f.L.s. $l(t) = \sum_{k=0}^{\infty} d_k t^{k-m}$; by construction, it is uniquely determined by $a(t), b(t)$, or also by $f(t), g(t)$. It is now easy to see that $l(t)$ is the inverse of the f.p.s. $b(t)/a(t)$ in the sense of f.L.s. considered above, and our proof is complete. This shows that the correspondence between L and $\hat{L}$ is a 1-1 correspondence preserving the inverse, so it is now obvious that the correspondence is also an isomorphism between $(L, +, \cdot)$ and $(\hat{L}, +, \cdot)$.
Because of this result, we can identify $\hat{L}$ and L, and assert that $(L, +, \cdot)$ is indeed the smallest field containing $(T, +, \cdot)$. From now on, the set $\hat{L}$ will be ignored and we will always refer to L as the field of f.L.s..
3.4 Operations on formal
power series
Besides the four basic operations (addition, subtraction, multiplication and division) it is possible to consider other operations on T, only a few of which can be extended to L.
The most important operation is surely taking a power of a f.p.s.; if $p \in \mathbb{N}$ we can recursively define:

$$f(t)^0 = 1, \qquad f(t)^p = f(t)\,f(t)^{p-1} \quad \text{if } p > 0$$
and observe that $\mathrm{ord}(f(t)^p) = p\,\mathrm{ord}(f(t))$. Therefore, $f(t)^p \in T_0$ if and only if $f(t) \in T_0$; on the other hand, if $f(t) \notin T_0$, then the order of $f(t)^p$ becomes larger and larger, and goes to $\infty$ when $p \to \infty$. This property will be important in our future developments, when we will reduce many operations to infinite sums involving the powers $f(t)^p$ with $p \in \mathbb{N}$. If $f(t) \notin T_0$, i.e., $\mathrm{ord}(f(t)) > 0$, these sums involve elements of larger and larger order, and therefore for every index $k$ we can determine the coefficient of $t^k$ by only a finite number of terms. This assures that our definitions will be good definitions.
We wish also to observe that taking a positive integer power can be easily extended to L; in this case, when $\mathrm{ord}(f(t)) < 0$, $\mathrm{ord}(f(t)^p)$ decreases, but always remains finite. In particular, for $g(t) = f(t)^{-1}$ we have $g(t)^p = f(t)^{-p}$, and powers can be extended to all integers $p \in \mathbb{Z}$.
When the exponent $p$ is an arbitrary real or complex number, we should restrict $f(t)^p$ to the case $f(t) \in T_0$; in fact, if $f(t) = t^m g(t)$, we would have $f(t)^p = (t^m g(t))^p = t^{mp} g(t)^p$; however, $t^{mp}$ is an expression without any mathematical sense. Instead, if $f(t) \in T_0$, let us write $f(t) = f_0 + \hat{v}(t)$, with $\mathrm{ord}(\hat{v}(t)) > 0$. For $v(t) = \hat{v}(t)/f_0$, we have by Newton's rule:

$$f(t)^p = (f_0 + \hat{v}(t))^p = f_0^p (1 + v(t))^p = f_0^p \sum_{k=0}^{\infty} \binom{p}{k} v(t)^k,$$

which can be assumed as a definition. In the last expression, we can observe that: i) $f_0^p \in \mathbb{C}$; ii) $\binom{p}{k}$ is defined for every value of $p$, $k$ being a non-negative integer; iii) $v(t)^k$ is well-defined by the considerations above, and $\mathrm{ord}(v(t)^k)$ grows indefinitely, so that for every $k$ the coefficient of $t^k$ is obtained by a finite sum. We can conclude that $f(t)^p$ is well-defined.
Particular cases are $p = -1$ and $p = 1/2$. In the former case, $f(t)^{-1}$ is the inverse of the f.p.s. $f(t)$. We have already seen a method for computing $f(t)^{-1}$, but now we obtain the following formula:

$$f(t)^{-1} = \frac{1}{f_0} \sum_{k=0}^{\infty} \binom{-1}{k} v(t)^k = \frac{1}{f_0} \sum_{k=0}^{\infty} (-1)^k v(t)^k.$$
For $p = 1/2$, we obtain a formula for the square root of a f.p.s.:

$$f(t)^{1/2} = \sqrt{f(t)} = \sqrt{f_0} \sum_{k=0}^{\infty} \binom{1/2}{k} v(t)^k = \sqrt{f_0} \sum_{k=0}^{\infty} \frac{(-1)^{k-1}}{4^k (2k-1)} \binom{2k}{k} v(t)^k.$$
In Section 3.12, we will see how $f(t)^p$ can be obtained computationally without actually performing the powers $v(t)^k$. We conclude by observing that this more general operation of taking the power $p \in \mathbb{R}$ cannot be extended to f.L.s.: in fact, we would have smaller and smaller terms $t^k$ ($k \to -\infty$), and therefore the resulting expression cannot be considered an actual f.L.s., which requires a term of smallest degree.
By applying well-known rules of the exponential and logarithmic functions, we can easily define the corresponding operations for f.p.s., which however, as will be apparent, cannot be extended to f.L.s.. For the exponentiation we have, for $f(t) \in T_0$, $f(t) = f_0 + v(t)$:

$$e^{f(t)} = \exp(f_0 + v(t)) = e^{f_0} \sum_{k=0}^{\infty} \frac{v(t)^k}{k!}.$$
Again, since $v(t) \notin T_0$, the order of $v(t)^k$ increases with $k$ and the sums necessary to compute the coefficient of $t^k$ are always finite. The formula makes clear that exponentiation can be performed on every $f(t) \in T$, and when $f(t) \notin T_0$ the factor $e^{f_0}$ is not present.
For the logarithm, let us suppose $f(t) \in T_0$, $f(t) = f_0 + \hat{v}(t)$, $v(t) = \hat{v}(t)/f_0$; then we have:

$$\ln(f_0 + \hat{v}(t)) = \ln f_0 + \ln(1 + v(t)) = \ln f_0 + \sum_{k=1}^{\infty} (-1)^{k+1} \frac{v(t)^k}{k}.$$

In this case, for $f(t) \notin T_0$, we cannot define the logarithm, and this shows an asymmetry between exponential and logarithm.
Another important operation is differentiation:

$$Df(t) = \frac{d}{dt} f(t) = \sum_{k=1}^{\infty} k f_k t^{k-1} = f'(t).$$
This operation can be performed on every $f(t) \in L$, and a very important observation is the following:

Theorem 3.4.1  For every $f(t) \in L$, its derivative $f'(t)$ does not contain any term in $t^{-1}$.

Proof:  In fact, by the general rule, the term in $t^{-1}$ could only originate from the constant term (i.e., the term in $t^0$) in $f(t)$, but the multiplication by $k = 0$ reduces it to 0.

This fact will be the basis for very important results in the theory of f.p.s. and f.L.s. (see Section 3.8).
Another operation is integration; because indefinite integration leaves a constant term undefined, we prefer to introduce and use only definite integration; for $f(t) \in T$ this is defined as:

$$\int_0^t f(\tau)\,d\tau = \sum_{k=0}^{\infty} f_k \int_0^t \tau^k\,d\tau = \sum_{k=0}^{\infty} \frac{f_k}{k+1}\,t^{k+1}.$$
Our purely formal approach allows us to exchange the integration and summation signs; in general, as we know, this is only possible when the convergence is uniform. By this definition, $\int_0^t f(\tau)\,d\tau$ never belongs to $T_0$. Integration can be extended to f.L.s. with an obvious exception: because integration is the inverse operation of differentiation, we cannot apply integration to a f.L.s. containing a term in $t^{-1}$. Formally, from the definition above, such a term would imply a division by 0, and this is not allowed. In all the other cases, integration does not create any problem.
3.5 Composition
A last operation on f.p.s. is so important that we dedicate a complete section to it: the composition of two f.p.s.. Let $f(t) \in T$ and $g(t) \notin T_0$; then we define the "composition" of $f(t)$ by $g(t)$ as the f.p.s.:

$$f(g(t)) = \sum_{k=0}^{\infty} f_k\,g(t)^k.$$
This definition justifies the fact that $g(t)$ cannot belong to $T_0$; otherwise, infinite sums would be involved in the computation of $f(g(t))$. In connection with the composition of f.p.s., we will use the following notation:

$$f(g(t)) = \left[\, f(y) \;\middle|\; y = g(t) \,\right]$$

which is intended to mean the result of substituting $g(t)$ for the indeterminate $y$ in the f.p.s. $f(y)$. The indeterminate $y$ is a dummy symbol; it should be different from $t$ in order not to create any ambiguity, but it can be substituted by any other symbol. Because every $g(t) \notin T_0$ is characterized by the fact that $g(0) = 0$, we will always understand that, in the notation above, the f.p.s. $g(t)$ is such that $g(0) = 0$. The definition can be extended to every $f(t) \in L$, but the function $g(t)$ must always be such that $\mathrm{ord}(g(t)) > 0$; otherwise the definition would imply infinite sums, which we avoid because, by our formal approach, we do not consider any convergence criterion.
The f.p.s. in $T_1$ have a particular relevance for composition. They are called quasi-units or delta series. First of all, we wish to observe that the f.p.s. $t \in T_1$ acts as an identity for composition. In fact $\left[\, y \mid y = g(t) \,\right] = g(t)$ and $\left[\, f(y) \mid y = t \,\right] = f(t)$, and therefore $t$ is a left and right identity. As a second fact, we show that a f.p.s. $f(t)$ has an inverse with respect to composition if and only if $f(t) \in T_1$. Note that $g(t)$ is the inverse of $f(t)$ if and only if $f(g(t)) = t$ and $g(f(t)) = t$. From this, we deduce immediately that $f(t) \notin T_0$ and $g(t) \notin T_0$. On the other hand, it is clear that $\mathrm{ord}(f(g(t))) = \mathrm{ord}(f(t))\,\mathrm{ord}(g(t))$ by our initial definition, and since $\mathrm{ord}(t) = 1$ and $\mathrm{ord}(f(t)) > 0$, $\mathrm{ord}(g(t)) > 0$, we must have $\mathrm{ord}(f(t)) = \mathrm{ord}(g(t)) = 1$.
Let us now come to the main part of the proof and consider the set $T_1$ with the operation of composition $\circ$; composition is always associative, and therefore $(T_1, \circ)$ is a group if we prove that every $f(t) \in T_1$ has a left (or right) inverse, because the theory assures us that the other inverse exists and coincides with the previously found one. Let $f(t) = f_1 t + f_2 t^2 + f_3 t^3 + \cdots$ and $g(t) = g_1 t + g_2 t^2 + g_3 t^3 + \cdots$; we have:

$$f(g(t)) = f_1(g_1 t + g_2 t^2 + g_3 t^3 + \cdots) + f_2(g_1^2 t^2 + 2 g_1 g_2 t^3 + \cdots) + f_3(g_1^3 t^3 + \cdots) + \cdots = f_1 g_1 t + (f_1 g_2 + f_2 g_1^2) t^2 + (f_1 g_3 + 2 f_2 g_1 g_2 + f_3 g_1^3) t^3 + \cdots = t.$$
In order to determine $g(t)$ we have to solve the system:

$$f_1 g_1 = 1$$
$$f_1 g_2 + f_2 g_1^2 = 0$$
$$f_1 g_3 + 2 f_2 g_1 g_2 + f_3 g_1^3 = 0$$
$$\cdots$$
The first equation gives $g_1 = 1/f_1$; we can substitute this value in the second equation and obtain a value for $g_2$; the two values for $g_1$ and $g_2$ can be substituted in the third equation to obtain a value for $g_3$. Continuing in this way, we obtain the values of all the coefficients of $g(t)$, and therefore $g(t)$ is determined in a unique way. In fact, we observe that, by construction, in the $k$th equation $g_k$ appears linearly and its coefficient is always $f_1$. Since $f_1 \neq 0$, $g_k$ is unique, even if the other $g_r$ ($r < k$) appear with powers.
The f.p.s. $g(t)$ such that $f(g(t)) = t$, and therefore such that $g(f(t)) = t$ as well, is called the compositional inverse of $f(t)$. In the literature, it is usually denoted by $\overline{f}(t)$ or $f^{[-1]}(t)$; we will adopt the first notation. Obviously, $\overline{\overline{f}}(t) = f(t)$, and sometimes $\overline{f}(t)$ is also called the reverse of $f(t)$. Given $f(t) \in T_1$, the determination of its compositional inverse is one of the most interesting problems in the theory of f.p.s. or f.L.s.; it was solved by Lagrange and we will discuss it in the following sections. Note that, in principle, the $g_k$'s can be computed by solving the system above; this, however, is too complicated and nobody would follow that way, except as an exercise.
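A machine, however, solves the system happily for the first few coefficients; a sketch (Python, illustrative) that determines $\overline{f}$ term by term, using the fact noted above that $g_k$ enters the $k$th equation linearly with coefficient $f_1$:

    from fractions import Fraction

    def fps_compose(f, g, n):
        """First n coefficients of f(g(t)), assuming g[0] = 0."""
        result = [Fraction(0)] * n
        power = [Fraction(1)] + [Fraction(0)] * (n - 1)     # g(t)^0
        for k in range(len(f)):
            for i in range(n):
                result[i] += f[k] * power[i]
            power = [sum(power[j] * g[i - j] for j in range(i + 1))
                     for i in range(n)]                      # now g(t)^(k+1)
        return result

    def comp_inverse(f, n):
        """First n coefficients of the compositional inverse; f[0]=0, f[1]!=0."""
        g = [Fraction(0), Fraction(1) / f[1]]
        for k in range(2, n):
            g.append(Fraction(0))
            c = fps_compose(f, g, k + 1)[k]   # linear in g_k, coefficient f_1
            g[k] = -c / f[1]
        return g

    # f(t) = t - t^2: the inverse is t + t^2 + 2t^3 + 5t^4 + 14t^5 + ...,
    # whose coefficients are the Catalan numbers of Section 2.9
    print(comp_inverse([Fraction(0), Fraction(1), Fraction(-1)], 6))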
3.6 Coefficient extraction
If $f(t) \in L$, or in particular $f(t) \in T$, the notation $[t^n] f(t)$ indicates the extraction of the coefficient of $t^n$ from $f(t)$, and therefore we have $[t^n] f(t) = f_n$. In this sense, $[t^n]$ can be seen as a mapping $[t^n] : L \to \mathbb{R}$ or $[t^n] : L \to \mathbb{C}$, according to the field underlying the set L or T. Because of that, $[t^n]$
(linearity)        $[t^n](\alpha f(t) + \beta g(t)) = \alpha [t^n] f(t) + \beta [t^n] g(t)$   (K1)
(shifting)         $[t^n]\, t f(t) = [t^{n-1}] f(t)$   (K2)
(differentiation)  $[t^n] f'(t) = (n+1)\,[t^{n+1}] f(t)$   (K3)
(convolution)      $[t^n] f(t) g(t) = \sum_{k=0}^{n} [t^k] f(t)\,[t^{n-k}] g(t)$   (K4)
(composition)      $[t^n] f(g(t)) = \sum_{k=0}^{\infty} \left([y^k] f(y)\right) [t^n] g(t)^k$   (K5)

Table 3.1: The rules for coefficient extraction
is called an operator, precisely the "coefficient of" operator or, more simply, the coefficient operator.
In Table 3.1 we state formally the main properties of this operator, collecting what we said in the previous sections. We observe that $\alpha, \beta \in \mathbb{R}$ or $\alpha, \beta \in \mathbb{C}$ are any constants; the use of the indeterminate $y$ is only necessary in order not to confuse the action on different f.p.s.; because $g(0) = 0$ in composition, the last sum is actually finite. Some points require lengthier comments. The property of shifting can be easily generalized to $[t^n] t^k f(t) = [t^{n-k}] f(t)$ and also to negative powers: $[t^n] f(t)/t^k = [t^{n+k}] f(t)$. These rules are very important and are often applied in the theory of f.p.s. and f.L.s.. In the former case, some care should be exercised to see whether the properties remain in the realm of T or go beyond it, invading the domain of L, which may not always be correct. The property of differentiation for $n = -1$ gives $[t^{-1}] f'(t) = 0$, a situation we have already noticed. The operator $[t^{-1}]$ is also called the residue and is denoted by "res"; so, for example, people write $\mathrm{res}\, f'(t) = 0$, and some authors use the notation $\mathrm{res}\, t^{-n-1} f(t)$ for $[t^n] f(t)$.
We will have many occasions to apply rules (K1)–(K5) of coefficient extraction. However, just to give a meaningful example, let us find the coefficient of $t^n$ in the series expansion of $(1 + \alpha t)^r$, when $\alpha$ and $r$ are any two real numbers. Rule (K3) can be written in the form $[t^n] f(t) = \frac{1}{n} [t^{n-1}] f'(t)$ and we can successively apply this form to our case:

$$[t^n](1+\alpha t)^r = \frac{r\alpha}{n}\,[t^{n-1}](1+\alpha t)^{r-1} = \frac{r\alpha}{n}\,\frac{(r-1)\alpha}{n-1}\,[t^{n-2}](1+\alpha t)^{r-2} = \cdots = \frac{r\alpha}{n}\,\frac{(r-1)\alpha}{n-1} \cdots \frac{(r-n+1)\alpha}{1}\,[t^0](1+\alpha t)^{r-n} = \alpha^n \binom{r}{n}\,[t^0](1+\alpha t)^{r-n}.$$
We now observe that $[t^0](1+\alpha t)^{r-n} = 1$ because of our observations on f.p.s. operations. Therefore, we conclude with the so-called Newton's rule:

$$[t^n](1+\alpha t)^r = \binom{r}{n} \alpha^n$$
which is one of the most frequently used results in coefficient extraction. Let us remark explicitly that when $r = -1$ (the geometric series) we have:

$$[t^n] \frac{1}{1+\alpha t} = \binom{-1}{n} \alpha^n = \binom{1+n-1}{n} (-1)^n \alpha^n = (-\alpha)^n.$$
A simple but important use of Newton's rule concerns the extraction of the coefficient of $t^n$ from the inverse of a trinomial $at^2 + bt + c$, in the case it is reducible, i.e., it can be written $(1+\alpha t)(1+\beta t)$; obviously, we can always reduce the constant $c$ to 1, since by the linearity rule it can be taken outside the "coefficient of" operator. Therefore, our aim is to compute:

$$[t^n] \frac{1}{(1+\alpha t)(1+\beta t)}$$

with $\alpha \neq \beta$, otherwise Newton's rule would be immediately applicable. The problem can be solved by using the technique of partial fraction expansion. We look for two constants $A$ and $B$ such that:

$$\frac{1}{(1+\alpha t)(1+\beta t)} = \frac{A}{1+\alpha t} + \frac{B}{1+\beta t} = \frac{A + A\beta t + B + B\alpha t}{(1+\alpha t)(1+\beta t)};$$

if two such constants exist, the numerator in the first expression should equal the numerator in the last one, independently of $t$, or, if one so prefers, for every value of $t$. Therefore, the term $A + B$ should be equal to 1, while the term $(A\beta + B\alpha)t$ should always be 0. The values for $A$ and $B$ are therefore the solution of the linear system:

$$A + B = 1, \qquad A\beta + B\alpha = 0.$$
The determinant of this system is $\alpha - \beta$, which is always different from 0 because of our hypothesis $\alpha \neq \beta$. The system therefore has exactly one solution, which is $A = \alpha/(\alpha-\beta)$ and $B = -\beta/(\alpha-\beta)$. We can now substitute these values in the expression above:

$$[t^n] \frac{1}{(1+\alpha t)(1+\beta t)} = [t^n] \frac{1}{\alpha-\beta}\left(\frac{\alpha}{1+\alpha t} - \frac{\beta}{1+\beta t}\right) = \frac{1}{\alpha-\beta}\left([t^n]\frac{\alpha}{1+\alpha t} - [t^n]\frac{\beta}{1+\beta t}\right) = \frac{\alpha^{n+1} - \beta^{n+1}}{\alpha-\beta}\,(-1)^n.$$
Let us now consider a trinomial $1 + bt + ct^2$ for which $\Delta = b^2 - 4c < 0$ and $b \neq 0$. The trinomial is irreducible over the reals, but we can write:

$$[t^n] \frac{1}{1 + bt + ct^2} = [t^n] \frac{1}{\left(1 - \frac{-b + i\sqrt{|\Delta|}}{2}\,t\right)\left(1 - \frac{-b - i\sqrt{|\Delta|}}{2}\,t\right)}.$$
This time, a partial fraction expansion does not give a simple closed form for the coefficients; however, we can apply the formula above in the form:

$$[t^n] \frac{1}{(1-\alpha t)(1-\beta t)} = \frac{\alpha^{n+1} - \beta^{n+1}}{\alpha - \beta}.$$
Since $\alpha$ and $\beta$ are complex numbers, the resulting expression is not very appealing. We can try to give it a better form. Let us set $\alpha = \left(-b + i\sqrt{|\Delta|}\right)/2$, so that $\alpha$ is always contained in the positive imaginary halfplane. This implies $0 < \arg(\alpha) < \pi$ and we have:

$$\alpha - \beta = -\frac{b}{2} + i\frac{\sqrt{|\Delta|}}{2} + \frac{b}{2} + i\frac{\sqrt{|\Delta|}}{2} = i\sqrt{|\Delta|} = i\sqrt{4c - b^2}.$$
If $\theta = \arg(\alpha)$ and:

$$\rho = |\alpha| = \sqrt{\frac{b^2}{4} + \frac{4c - b^2}{4}} = \sqrt{c}$$

we can set $\alpha = \rho e^{i\theta}$ and $\beta = \rho e^{-i\theta}$. Consequently:

$$\alpha^{n+1} - \beta^{n+1} = \rho^{n+1}\left(e^{i(n+1)\theta} - e^{-i(n+1)\theta}\right) = 2i\rho^{n+1}\sin((n+1)\theta)$$
and therefore:

$$[t^n] \frac{1}{1 + bt + ct^2} = \frac{2\,(\sqrt{c})^{n+1}\,\sin((n+1)\theta)}{\sqrt{4c - b^2}}.$$
At this point we only have to find the value of $\theta$. Obviously:

$$\theta = \arctan\left(\frac{\sqrt{|\Delta|}/2}{-b/2}\right) + k\pi = \arctan\frac{\sqrt{4c - b^2}}{-b} + k\pi.$$

When $b < 0$, we have $0 < \arctan\left(\sqrt{4c - b^2}/(-b)\right) < \pi/2$, and this is the correct value for $\theta$. However, when $b > 0$, the principal branch of the arctangent is negative, and we should set $\theta = \pi + \arctan\left(\sqrt{4c - b^2}/(-b)\right)$. As a consequence, we have:

$$\theta = \arctan\frac{\sqrt{4c - b^2}}{-b} + C$$

where $C = \pi$ if $b > 0$ and $C = 0$ if $b < 0$.
An interesting and non-trivial example is given by:

$$\sigma_n = [t^n] \frac{1}{1 - 3t + 3t^2} = \frac{2(\sqrt{3})^{n+1} \sin((n+1)\arctan(\sqrt{3}/3))}{\sqrt{3}} = 2(\sqrt{3})^n \sin\frac{(n+1)\pi}{6}.$$
These coefficients have the following values:

n = 12k       σ_n = (√3)^{12k} = 729^k
n = 12k + 1   σ_n = (√3)^{12k+2} = 3 · 729^k
n = 12k + 2   σ_n = 2(√3)^{12k+2} = 6 · 729^k
n = 12k + 3   σ_n = (√3)^{12k+4} = 9 · 729^k
n = 12k + 4   σ_n = (√3)^{12k+4} = 9 · 729^k
n = 12k + 5   σ_n = 0
n = 12k + 6   σ_n = −(√3)^{12k+6} = −27 · 729^k
n = 12k + 7   σ_n = −(√3)^{12k+8} = −81 · 729^k
n = 12k + 8   σ_n = −2(√3)^{12k+8} = −162 · 729^k
n = 12k + 9   σ_n = −(√3)^{12k+10} = −243 · 729^k
n = 12k + 10  σ_n = −(√3)^{12k+10} = −243 · 729^k
n = 12k + 11  σ_n = 0
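As a sanity check (a sketch, Python): multiplying both sides of $\sigma(t)(1 - 3t + 3t^2) = 1$ gives the recurrence $\sigma_n = 3\sigma_{n-1} - 3\sigma_{n-2}$, which reproduces the same pattern:

    sigma = [1, 3]          # sigma_0 = 1, sigma_1 = 3
    for n in range(2, 13):
        sigma.append(3 * sigma[-1] - 3 * sigma[-2])

    print(sigma)
    # [1, 3, 6, 9, 9, 0, -27, -81, -162, -243, -243, 0, 729]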
3.7 Matrix representation
Let $f(t) \in T_0$; with the coefficients of $f(t)$ we form the following infinite lower triangular matrix (or array) $D = (d_{n,k})_{n,k \in \mathbb{N}}$: column 0 contains the coefficients $f_0, f_1, f_2, \ldots$ in this order; column 1 contains the same coefficients shifted down by one position, with $d_{0,1} = 0$; in general, column $k$ contains the coefficients of $f(t)$ shifted down $k$ positions, so that the first $k$ positions are 0. This definition can be summarized in the formula $d_{n,k} = f_{n-k}$, $\forall n, k \in \mathbb{N}$ (with $f_r = 0$ for $r < 0$). For a reason which will become apparent only later, the array $D$ will be denoted by $(f(t), 1)$:
$$D = (f(t), 1) = \begin{pmatrix} f_0 & 0 & 0 & 0 & 0 & \cdots \\ f_1 & f_0 & 0 & 0 & 0 & \cdots \\ f_2 & f_1 & f_0 & 0 & 0 & \cdots \\ f_3 & f_2 & f_1 & f_0 & 0 & \cdots \\ f_4 & f_3 & f_2 & f_1 & f_0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$
If $(f(t), 1)$ and $(g(t), 1)$ are the matrices corresponding to the two f.p.s. $f(t)$ and $g(t)$, we are interested in finding out what matrix is obtained by multiplying the two matrices with the usual row-by-column product. This product will be denoted by $(f(t), 1) \cdot (g(t), 1)$, and it is immediate to see what its generic element $d_{n,k}$ is. The row $n$ in $(f(t), 1)$ is, by definition, $(f_n, f_{n-1}, f_{n-2}, \ldots)$, and column $k$ in $(g(t), 1)$ is $(0, 0, \ldots, 0, g_0, g_1, g_2, \ldots)$, where the number of leading 0's is just $k$. Therefore we have:

$$d_{n,k} = \sum_{j=0}^{\infty} f_{n-j}\, g_{j-k}$$
if we conventionally set $g_r = 0$, $\forall r < 0$. When $k = 0$, we have $d_{n,0} = \sum_{j=0}^{\infty} f_{n-j} g_j = \sum_{j=0}^{n} f_{n-j} g_j$, and therefore column 0 contains the coefficients of the convolution $f(t)g(t)$. When $k = 1$ we have $d_{n,1} = \sum_{j=0}^{\infty} f_{n-j} g_{j-1} = \sum_{j=0}^{n-1} f_{n-1-j} g_j$, and this is the coefficient of $t^{n-1}$ in the convolution $f(t)g(t)$. Proceeding in the same way, we see that column $k$ contains the coefficients of the convolution $f(t)g(t)$ shifted down $k$ positions. Therefore we conclude:

$$(f(t), 1) \cdot (g(t), 1) = (f(t)g(t), 1)$$
and this shows that there exists a group isomorphism between $(T_0, \cdot)$ and the set of matrices $(f(t), 1)$ with the row-by-column product. In particular, $(1, 1)$ is the identity (in fact, it corresponds to the identity matrix) and $(f(t)^{-1}, 1)$ is the inverse of $(f(t), 1)$.
Let us now consider a f.p.s. f(t) ∈ T
1
and let
us build an infinite lower triangular matrix in the
following way: column k contains the coefficients of
f(t)
k
in their proper order:

¸
¸
¸
¸
¸
¸
¸
¸
1 0 0 0 0
0 f
1
0 0 0
0 f
2
f
2
1
0 0
0 f
3
2f
1
f
2
f
3
1
0
0 f
4
2f
1
f
3
+f
2
2
3f
2
1
f
2
f
4
1

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
¸

.
The matrix will be denoted by (1, f(t)/t) and we are
interested to see how the matrix (1, g(t)/t)(1, f(t)/t)
is composed when f(t), g(t) ∈ T
1
.
If

´
f
n,k

n,k∈N
= (1, f(t)/t), by definition we have:
´
f
n,k
= [t
n
]f(t)
k
and therefore the generic element d
n,k
of the product
is:
d
n,k
=

¸
j=0
´ g
n,j
´
f
j,k
=

¸
j=0
[t
n
]g(t)
j
[y
j
]f(y)
k
=
= [t
n
]

¸
j=0

[y
j
]f(t)
k

g(t)
j
= [t
n
]f(g(t))
k
.
In other words, column k in (1, g(t)/t) (1, f(t)/t) is
the kth power of the composition f(g(t)), and we can
conclude:
(1, g(t)/t) (1, f(t)/t) = (1, f(g(t))/t).
Clearly, the identity t ∈ T
1
corresponds to the ma-
trix (1, t/t) = (1, 1), the identity matrix, and this is
sufficient to prove that the correspondence f(t) ↔
(1, f(t)/t) is a group isomorphism.
Row-by-column product is surely the basic op-
eration on matrices and its extension to infinite,
lower triangular arrays is straight-forward, because
the sums involved in the product are actually finite.
We have shown that we can associate every f.p.s.
f(t) ∈ T
0
to a particular matrix (f(t), 1) (let us
denote by A the set of such arrays) in such a way
that (T
0
, ) is isomorphic to (A, ), and the Cauchy
product becomes the row-by-column product. Be-
sides, we can associate every f.p.s. g(t) ∈ T
1
to a
matrix (1, g(t)/t) (let us call B the set of such matri-
ces) in such a way that (T
1
, ◦) is isomorphic to (B, ),
and the composition of f.p.s. becomes again the row-
by-column product. This reveals a connection be-
tween the Cauchy product and the composition: in
the Chapter on Riordan Arrays we will explore more
deeply this connection; for the moment, we wish to
see how this observation yields to a computational
method for evaluating the compositional inverse of a
f.p.s. in T
1
.
3.8 Lagrange inversion theo-
rem
Given an infinite, lower triangular array of the form
(1, f(t)/t), with f(t) ∈ T
1
, the inverse matrix
(1, g(t)/t) is such that (1, g(t)/t) (1, f(t)/t) = (1, 1),
and since the product results in (1, f(g(t))/t) we have
f(g(t)) = t. In other words, because of the isomor-
phism we have seen, the inverse matrix for (1, f(t)/t)
is just the matrix corresponding to the compositional
inverse of f(t). As we have already said, Lagrange
3.9. SOME EXAMPLES OF THE LIF 33
found a noteworthy formula for the coefficients of
this compositional inverse. We follow the more recent
proof of Stanley, which points out the purely formal
aspects of Lagrange’s formula. Indeed, we will prove
something more, by finding the exact form of the ma-
trix (1, g(t)/t), inverse of (1, f(t)/t). As a matter of
fact, we state what the form of (1, g(t)/t) should be
and then verify that it is actually so.
Let D = (d
n,k
)
n,k∈N
be defined as:
d
n,k
=
k
n
[t
n−k
]

t
f(t)

n
.
Because f(t)/t ∈ T
0
, the power (t/f(t))
k
=
(f(t)/t)
−k
is well-defined; in order to show that
(d
n,k
)
n,k∈N
= (1, g(t)/t) we only have to prove that
D (1, f(t)/t) = (1, 1), because we already know that
the compositional inverse of f(t) is unique. The
generic element v
n,k
of the row-by-column product
D (1, f(t)/t) is:
v
n,k
=

¸
j=0
d
n,j
[y
j
]f(y)
k
=
=

¸
j=0
j
n
[t
n−j
]

t
f(t)

n
[y
j
]f(y)
k
.
By the rule of differentiation for the coefficient of op-
erator, we have:
j[y
j
]f(y)
k
= [y
j−1
]
d
dy
f(y)
k
= k[y
j
]yf

(y)f(y)
k−1
.
Therefore, for v
n,k
we have:
v
n,k
=
k
n

¸
j=0
[t
n−j
]

t
f(t)

n
[y
j
]yf

(y)f(y)
k−1
=
=
k
n
[t
n
]

t
f(t)

n
tf

(t)f(t)
k−1
.
In fact, the factor k/n does not depend on j and
can be taken out of the summation sign; the sum
is actually finite and is the term of the convolution
appearing in the last formula. Let us now distinguish
between the case k = n and k = n. When k = n we
have:
v
n,n
= [t
n
]t
n
tf(t)
−1
f

(t) =
= [t
0
]f

(t)

t
f(t)

= 1;
in fact, f

(t) = f
1
+ 2f
2
t + 3f
3
t
2
+ and, being
f(t)/t ∈ T
0
, (f(t)/t)
−1
= (f
1
+f
2
t +f
3
t
2
+ )
−1
=
f
−1
1
+ ; therefore, the constant term in f

(t)(t/f(t))
is f
1
/f
1
= 1. When k = n:
v
n,k
=
k
n
[t
n
]t
n
tf(t)
k−n−1
f

(t) =
=
k
n
[t
−1
]
1
k −n
d
dt

f(t)
k−n

= 0;
in fact, f(t)
k−n
is a f.L.s. and, as we observed, the
residue of its derivative should be zero. This proves
that D (1, f(t)/t) = (1, 1) and therefore D is the
inverse of (1, f(t)/t).
If f(t) is the compositional inverse of f(t), the col-
umn 1 gives us the value of its coefficients; by the
formula for d
n,k
we have:
f
n
= [t
n
]f(t) = d
n,1
=
1
n
[t
n−1
]

t
f(t)

n
and this is the celebrated Lagrange Inversion For-
mula (LIF). The other columns give us the coefficients
of the powers f(t)
k
, for which we have:
[t
n
]f(t)
k
=
k
n
[t
n−k
]

t
f(t)

n
.
Many times, there is another way for applying the
LIF. Suppose we have a functional equation w =
tφ(w), where φ(t) ∈ T
0
, and we wish to find the
f.p.s. w = w(t) satisfying this functional equation.
Clearly w(t) ∈ T
1
and if we set f(y) = y/φ(y), we
also have f(t) ∈ T
1
. However, the functional equa-
tion can be written f(w(t)) = t, and this shows that
w(t) is the compositional inverse of f(t). We there-
fore know that w(t) is uniquely determined and the
LIF gives us:
[t
n
]w(t) =
1
n
[t
n−1
]

t
f(t)

n
=
1
n
[t
n−1
]φ(t)
n
.
The LIF can also give us the coefficients of the
powers w(t)
k
, but we can obtain a still more general
result. Let F(t) ∈ T and let us consider the com-
position F(w(t)) where w = w(t) is, as before, the
solution to the functional equation w = tφ(w), with
φ(w) ∈ T
0
. For the coefficient of t
n
in F(w(t)) we
have:
[t
n
]F(w(t)) = [t
n
]

¸
k=0
F
k
w(t)
k
=
=

¸
k=0
F
k
[t
n
]w(t)
k
=

¸
k=0
F
k
k
n
[t
n−k
]φ(t)
n
=
=
1
n
[t
n−1
]


¸
k=0
kF
k
t
k−1

φ(t)
n
=
=
1
n
[t
n−1
]F

(t)φ(t)
n
.
Note that [t
0
]F(w(t)) = F
0
, and this formula can be
generalized to every F(t) ∈ L, except for the coeffi-
cient [t
0
]F(w(t)).
3.9 Some examples of the LIF
We found that the number b
n
of binary trees
with n nodes (and of other combinatorial objects
34 CHAPTER 3. FORMAL POWER SERIES
as well) satisfies the recurrence relation: b
n+1
=
¸
n
k=0
b
k
b
n−k
. Let us consider the f.p.s. b(t) =
¸

k=0
b
k
t
k
; if we multiply the recurrence relation by
t
n+1
and sum for n from 0 to infinity, we find:

¸
n=0
b
n+1
t
n+1
=

¸
n=0
t
n+1

n
¸
k=0
b
k
b
n−k

.
Since b
0
= 1, we can add and subtract 1 = b
0
t
0
from
the left hand member and can take t outside the sum-
mation sign in the right hand member:

¸
n=0
b
n
t
n
−1 = t

¸
n=0

n
¸
k=0
b
k
b
n−k

t
n
.
In the r.h.s. we recognize a convolution and substi-
tuting b(t) for the corresponding f.p.s., we obtain:
b(t) −1 = tb(t)
2
.
We are interested in evaluating b
n
= [t
n
]b(t); let us
therefore set w = w(t) = b(t) −1, so that w(t) ∈ T
1
and w
n
= b
n
, ∀n > 0. The previous relation becomes
w = t(1+w)
2
and we see that the LIF can be applied
(in the form relative to the functional equation) with
φ(t) = (1 +t)
2
. Therefore we have:
b
n
= [t
n
]w(t) =
1
n
[t
n−1
](1 +t)
2n
=
=
1
n

2n
n −1

=
1
n
(2n)!
(n −1)!(n + 1)!
=
=
1
n + 1
(2n)!
n!n!
=
1
n + 1

2n
n

.
As we said in the previous chapter, b
n
is called the
nth Catalan number and, under this name, it is often
denoted by C
n
. Now we have its form:
C
n
=
1
n + 1

2n
n

also valid in the case n = 0 when C
0
= 1.
In the same way we can compute the number of
p-ary trees with n nodes. A p-ary tree is a tree in
which all the nodes have arity p, except for leaves,
which have arity 0. A non-empty p-ary tree can be
decomposed in the following way:
,
, , , ¨
¨
¨
¨
¨
¨
¨
¨¨

r
r
r
r
r
r
r
r r

t
t
tt
T
1

t
t
tt
T
2
. . .

t
t
tt
T
p
which proves that T
n+1
=
¸
T
i1
T
i2
T
ip
, where the
sum is extended to all the p-uples (i
1
, i
2
, . . . , i
p
) such
that i
1
+i
2
+ +i
p
= n. As before, we can multiply
by t
n+1
the two members of the recurrence relation
and sum for n from 0 to infinity. We find:
T(t) −1 = tT(t)
p
.
This time we have a p degree equation, which cannot
be directly solved. However, if we set w(t) = T(t)−1,
so that w(t) ∈ T
1
, we have:
w = t(1 +w)
p
and the LIF gives:
T
n
= [t
n
]w(t) =
1
n
[t
n−1
](1 +t)
pn
=
=
1
n

pn
n −1

=
1
n
(pn)!
(n −1)!((p −1)n + 1)!
=
=
1
(p −1)n + 1

pn
n

which generalizes the formula for the Catalan num-
bers.
Finally, let us find the solution of the functional
equation w = te
w
. The LIF gives:
w
n
=
1
n
[t
n−1
]e
nt
=
1
n
n
n−1
(n −1)!
=
n
n−1
n!
.
Therefore, the solution we are looking for is the f.p.s.:
w(t) =

¸
n=1
n
n−1
n!
t
n
= t +t
2
+
3
2
t
3
+
8
3
t
4
+
125
24
t
5
+ .
As noticed, w(t) is the compositional inverse of
f(t) = t/φ(t) = te
−t
= t −t
2
+
t
3
2!

t
4
3!
+
t
5
4!
− .
It is a useful exercise to perform the necessary com-
putations to show that f(w(t)) = t, for example up to
the term of degree 5 or 6, and verify that w(f(t)) = t
as well.
3.10 Formal power series and
the computer
When we are dealing with generating functions, or
more in general with formal power series of any kind,
we often have to perform numerical computations in
order to verify some theoretical result or to experi-
ment with actual cases. In these and other circum-
stances the computer can help very much with its
speed and precision. Nowadays, several Computer
Algebra Systems exist, which offer the possibility of
3.11. THE INTERNAL REPRESENTATION OF EXPRESSIONS 35
actually working with formal power series, contain-
ing formal parameters as well. The use of these tools
is recommended because they can solve a doubt in
a few seconds, can clarify difficult theoretical points
and can give useful hints whenever we are faced with
particular problems.
However, a Computer Algebra System is not al-
ways accessible or, in certain circumstances, one may
desire to use less sophisticated tools. For example,
programmable pocket computers are now available,
which can perform quite easily the basic operations
on formal power series. The aim of the present and
of the following sections is to describe the main al-
gorithms for dealing with formal power series. They
can be used to program a computer or to simply un-
derstand how an existing system actually works.
The simplest way to represent a formal power se-
ries is surely by means of a vector, in which the kth
component (starting from 0) is the coefficient of t
k
in
the power series. Obviously, the computer memory
only can store a finite number of components, so an
upper bound n
0
is usually given to the length of vec-
tors and to represent power series. In other words we
have:
repr
n0


¸
k=0
a
k
t
k

= (a
0
, a
1
, . . . , a
n
) (n ≤ n
0
)
Fortunately, most operations on formal power se-
ries preserve the number of significant components,
so that there is little danger that a number of succes-
sive operations could reduce a finite representation to
a meaningless sequence of numbers. Differentiation
decreases by one the number of useful components;
on the contrary, integration and multiplication by t
r
,
say, increase the number of significant elements, at
the cost of introducing some 0 components.
The components a
0
, a
1
, . . . , a
n
are usually real
numbers, represented with the precision allowed by
the particular computer. In most combinatorial ap-
plications, however, a
0
, a
1
, . . . , a
n
are rational num-
bers and, with some extra effort, it is not difficult
to realize rational arithmetic on a computer. It is
sufficient to represent a rational number as a couple
(m, n), whose intended meaning is just m/n. So we
must have m ∈ Z, n ∈ N and it is a good idea to
keep m and n coprime. This can be performed by
a routine reduce computing p = gcd(m, n) using Eu-
clid’s algorithm and then dividing both m and n by
p. The operations on rational numbers are defined in
the following way:
(m, n) + (m

, n

) = reduce(mn

+m

n, nn

)
(m, n) (m

, n

) = reduce(mm

, nn

)
(m, n)
−1
= (n, m)
(m, n)
p
= (m
p
, n
p
) (p ∈ N)
provided that (m, n) is a reduced rational number.
The dimension of m and n is limited by the internal
representation of integer numbers.
In order to avoid this last problem, Computer Al-
gebra Systems usually realize an indefinite precision
integer arithmetic. An integer number has a vari-
able length internal representation and special rou-
tines are used to perform the basic operations. These
routines can also be realized in a high level program-
ming language (such as C or JAVA), but they can
slow down too much execution time if realized on a
programmable pocket computer.
3.11 The internal representa-
tion of expressions
The simple representation of a formal power series by
a vector of real, or rational, components will be used
in the next sections to explain the main algorithms
for formal power series operations. However, it is
surely not the best way to represent power series and
becomes completely useless when, for example, the
coefficients depend on some formal parameter. In
other words, our representation only can deal with
purely numerical formal power series.
Because of that, Computer Algebra Systems use a
more sophisticated internal representation. In fact,
power series are simply a particular case of a general
mathematical expression. The aim of the present sec-
tion is to give a rough idea of how an expression can
be represented in the computer memory.
In general, an expression consists in operators and
operands. For example, in a +

3, the operators are
+ and

, and the operands are a and 3. Every
operator has its own arity or adicity, i.e., the number
of operands on which it acts. The adicity of the sum
+ is two, because it acts on its two terms; the adicity
of the square root

is one, because it acts on a
single term. Operands can be numbers (and it is
important that the nature of the number be specified,
i.e., if it is a natural, an integer, a rational, a real or a
complex number) or can be a formal parameter, as a.
Obviously, if an operator acts on numerical operands,
it can be executed giving a numerical result. But if
any of its operands is a formal parameter, the result is
a formal expression, which may perhaps be simplified
but cannot be evaluated to a numerical result.
An expression can always be transposed into a
“tree”, the internal nodes of which correspond to
operators and whose leaves correspond to operands.
The simple tree for the previous expression is given
in Figure 3.1. Each operator has as many branches as
its adicity is and a simple visit to the tree can perform
its evaluation, that is it can execute all the operators
36 CHAPTER 3. FORMAL POWER SERIES
+

a
`

3
`

d
d
d
Figure 3.1: The tree for a simple expression
when only numerical operands are attached to them.
Simplification is a rather complicated matter and it
is not quite clear what a “simple” expression is. For
example, which is simpler between (a+1)(a+2) and
a
2
+3a+2? It is easily seen that there are occasions in
which either expression can be considered “simpler”.
Therefore, most Computer Algebra Systems provide
both a general “simplification” routine together with
a series of more specific programs performing some
specific simplifying tasks, such as expanding paren-
thetized expressions or collecting like factors.
In the computer memory an expression is repre-
sented by its tree, which is called the tree or list repre-
sentation of the expression. The representation of the
formal power series
¸
n
k=0
a
k
t
k
is shown in Figure 3.2.
This representation is very convenient and when some
coefficients a
1
, a
2
, . . . , a
n
depend on a formal param-
eter p nothing is changed, at least conceptually. In
fact, where we have drawn a leaf a
j
, we simply have
a more complex tree representing the expression for
a
j
.
The reader can develop computer programs for
dealing with this representation of formal power se-
ries. It should be clear that another important point
of this approach is that no limitation is given to the
length of expressions. A clever and dynamic use of
the storage solves every problem without increasing
the complexity of the corresponding programs.
3.12 Basic operations of formal
power series
We are now considering the vector representation of
formal power series:
repr
n0


¸
k=0
a
k
t
k

= (a
0
, a
1
, . . . , a
n
) n ≤ n
0
.
The sum of the formal power series is defined in an
obvious way:
(a
0
, a
1
, . . . , a
n
) + (b
0
, b
1
, . . . , b
m
) = (c
0
, c
1
, . . . , c
r
)
where c
i
= a
i
+ b
i
for every 0 ≤ i ≤ r, and r =
min(n, m). In a similar way the Cauchy product is
defined:
(a
0
, a
1
, . . . , a
n
) (b
0
, b
1
, . . . , b
m
) = (c
0
, c
1
, . . . , c
r
)
where c
k
=
¸
k
j=0
a
j
b
k−j
for every 0 ≤ k ≤ r. Here
r is defined as r = min(n + p
B
, m + p
A
), if p
A
is
the first index for which a
pA
= 0 and p
B
is the first
index for which b
pB
= 0. We point out that the
time complexity for the sum is O(r) and the time
complexity for the product is O(r
2
).
Subtraction is similar to addition and does not re-
quire any particular comment. Before discussing divi-
sion, let us consider the operation of rising a formal
power series to a power α ∈ R. This includes the
inversion of a power series (α = −1) and therefore
division as well.
First of all we observe that whenever α ∈ N, f(t)
α
can be reduced to that case (1 +g(t))
α
, where g(t) / ∈
T
0
. In fact we have:
f(t) = f
h
t
h
+f
h+1
t
h+1
+f
h+2
t
h+2
+ =
= f
h
t
h

1 +
f
h+1
f
h
t +
f
h+2
f
h
t
2
+

and therefore:
f(t)
α
= f
α
h
t
αh

1 +
f
h+1
f
h
t +
f
h+2
f
h
t
2
+

α
.
On the contrary, when α / ∈ N, f(t)
α
only can be
performed if f(t) ∈ T
0
. In that case we have:
f(t)
α
=

f
0
+f
1
t +f
2
t
2
+f
3
t
3
+

α
=
= f
α
0

1 +
f
1
f
0
t +
f
2
f
0
t
2
+
f
3
f
0
t
3
+

α
;
note that in this case if f
0
= 1, usually f
α
0
is not ratio-
nal. In any case, we are always reduced to compute
(1 +g(t))
α
and since:
(1 +g(t))
α
=

¸
k=0

α
k

g(t)
k
(3.12.1)
if the coefficients in g(t) are rational numbers, also
the coefficients in (1 + g(t))
α
are, provided α ∈ Q.
These considerations are to be remembered if the op-
eration is realized in some special environment, as
described in the previous section.
Whatever α ∈ R is, the exponents involved in
the right hand member of (3.12.1) are all positive
integer numbers. Therefore, powers can be real-
ized as successive multiplications or Cauchy products.
This gives a straight-forward method for perform-
ing (1 + g(t))
α
, but it is easily seen to take a time
in the order of O(r
3
), since it requires r products
each executed in time O(r
2
). Fortunately, however,
3.13. LOGARITHM AND EXPONENTIAL 37
+
+
+
+



d
d
d

d
d
d

.
.
.
.
.

d
d
d
`

a
0
`

t
0
`

a
1
`

t
1
`

a
2
`

t
2
`

a
n
`

t
n
`

O(t
n
)

d
d
d
Figure 3.2: The tree for a formal power series
J. C. P. Miller has devised an algorithm allowing to
perform (1 + g(t))
α
in time O(r
2
). In fact, let us
write h(t) = a(t)
α
, where a(t) is any formal power
series with a
0
= 1. By differentiating, we obtain
h

(t) = αa(t)
α−1
a

(t), or, by multiplying everything
by a(t), a(t)h

(t) = αh(t)a

(t). Therefore, by extract-
ing the coefficient of t
n−1
we find:
n−1
¸
k=0
a
k
(n −k)h
n−k
= α
n−1
¸
k=0
(k + 1)a
k+1
h
n−k−1
We now isolate the term with k = 0 in the left hand
member and the term having k = n − 1 in the right
hand member (a
0
= 1 by hypothesis):
nh
n
+
n−1
¸
k=1
a
k
(n −k)h
n−k
= αna
n
+
n−1
¸
k=1
αka
k
h
n−k
(in the last sum we performed the change of vari-
able k → k −1, in order to have the same indices as
in the left hand member). We now have an expres-
sion for h
n
only depending on (a
1
, a
2
, . . . , a
n
) and
(h
1
, h
2
, . . . , h
n−1
):
h
n
= αa
n
+
1
n
n−1
¸
k=1
((α + 1)k −n) a
k
h
n−k
=
= αa
n
+
n−1
¸
k=1

(α + 1)k
n
−1

a
k
h
n−k
The computation is now straight-forward. We be-
gin by setting h
0
= 1, and then we successively com-
pute h
1
, h
2
, . . . , h
r
(r = n, if n is the number of terms
in (a
1
, a
2
, . . . , a
n
)). The evaluation of h
k
requires a
number of operations in the order O(k), and therefore
the whole procedure works in time O(r
2
), as desired.
The inverse of a series, i.e., (1+g(t))
−1
, is obtained
by setting α = −1. It is worth noting that the previ-
ous formula becomes:
h
k
= −a
k

k−1
¸
j=1
a
j
h
k−j
= −
k
¸
j=1
a
j
h
k−j
(h
0
= 1)
and can be used to prove properties of the inverse of
a power series. As a simple example, the reader can
show that the coefficients in (1 −t)
−1
are all 1.
3.13 Logarithm and exponen-
tial
The idea of Miller can be applied to other operations
on formal power series. In the present section we
wish to use it to perform the (natural) logarithm and
the exponentiation of a series. Let us begin with the
logarithm and try to compute ln(1 + g(t)). As we
know, there is a direct way to perform this operation,
i.e.:
ln(1 +g(t)) =

¸
k=1
1
k
g(t)
k
38 CHAPTER 3. FORMAL POWER SERIES
and this formula only requires a series of successive
products. As for the operation of rising to a power,
the procedure needs a time in the order of O(r
3
),
and it is worth considering an alternative approach.
In fact, if we set h(t) = ln(1+g(t)), by differentiating
we obtain h

(t) = g

(t)/(1 + g(t)), or h

(t) = g

(t) −
h

(t)g(t). We can now extract the coefficient of t
k−1
and obtain:
kh
k
= kg
k

k−1
¸
j=0
(k −j)h
k−j
g
j
However, g
0
= 0 by hypothesis, and therefore we have
an expression relating h
k
to (g
1
, g
2
, . . . , g
k
) and to
(h
1
, h
2
, . . . , h
k−1
):
h
k
= g
k

1
k
k−1
¸
j=1
(k −j)h
k−j
g
j
=
= g
k

k−1
¸
j=1

1 −
j
k

h
k−j
g
j
A program to perform the logarithm of a formal
power series 1+g(t) begins by setting h
0
= 0 and then
proceeds computing h
1
, h
2
, . . . , h
r
if r is the number
of significant terms in g(t). The total time is clearly
in the order of O(r
2
).
A similar technique can be applied to the com-
putation of exp(g(t)) provided that g(t) / ∈ T
0
. If
g(t) ∈ T
0
, i.e., g(t) = g
0
+ g
1
t + g
2
t
2
+ , we have
exp(g
0
+g
1
t+g
2
t
2
+ ) = e
g0
exp(g
1
t+g
2
t
2
+ ). In
this way we are reduced to the previous case, but we
no longer have rational coefficients when g(t) ∈ Q[[t]].
By differentiating the identity h(t) = exp(g(t)) we
obtain h

(t) = g

(t) exp(g(t)) = g

(t)h(t). We extract
the coefficient of t
k−1
:
kh
k
=
k−1
¸
j=0
(j + 1)g
j+1
h
k−j−1
=
k
¸
j=1
jg
j
h
k−j
This formula allows us to compute h
k
in terms of
(g
1
, g
2
, . . . , g
k
) and (h
0
= 1, h
1
, h
2
, . . . , h
k−1
). A pro-
gram performing exponentiation can be easily writ-
ten by defining h
0
= 1 and successively evaluating
h
1
, h
2
, . . . , h
r
. if r is the number of significant terms
in g(t). Time complexity is obviously O(r
2
).
Unfortunately, a similar trick does not work for
series composition. To compute f(g(t)), when g(t) / ∈
T
0
, we have to resort to the defining formula:
f(g(t)) =

¸
k=0
f
k
g(t)
k
This requires the successive computation of the in-
teger powers of g(t), which can be performed by re-
peated applications of the Cauchy product. The exe-
cution time is in the order of O(r
3
), if r is the minimal
number of significant terms in f(t) and g(t), respec-
tively.
We conclude this section by sketching the obvious
algorithms to compute differentiation and integration
of a formal power series f(t) = f
0
+f
1
t+f
2
t
2
+f
3
t
3
+
. If h(t) = f

(t), we have:
h
k
= (k + 1)f
k+1
and therefore the number of significant terms is re-
duced by 1. Conversely, if h(t) =

t
0
f(τ)dτ, we have:
h
k
=
1
k
f
k−1
and h
0
= 0; consequently the number of significant
terms is increased by 1.
Chapter 4
Generating Functions
4.1 General Rules
Let us consider a sequence of numbers F =
(f
0
, f
1
, f
2
, . . .) = (f
k
)
k∈N
; the (ordinary) generat-
ing function for the sequence F is defined as f(t) =
f
0
+ f
1
t + f
2
t
2
+ , where the indeterminate t is
arbitrary. Given the sequence (f
k
)
k∈N
, we intro-
duce the generating function operator (, which ap-
plied to (f
k
)
k∈N
produces the ordinary generating
function for the sequence, i.e., ((f
k
)
k∈N
= f(t). In
this expression t is a bound variable, and a more ac-
curate notation would be (
t
(f
k
)
k∈N
= f(t). This
notation is essential when (f
k
)
k∈N
depends on some
parameter or when we consider multivariate gener-
ating functions. In this latter case, for example, we
should write (
t,w
(f
n,k
)
n,k∈N
= f(t, w) to indicate the
fact that f
n,k
in the double sequence becomes the co-
efficient of t
n
w
k
in the function f(t, w). However,
whenever no ambiguity can arise, we will use the no-
tation ((f
k
) = f(t), understanding also the binding
for the variable k. For the sake of completeness, we
also define the exponential generating function of the
sequence (f
0
, f
1
, f
2
, . . .) as:
c(f
k
) = (

f
k
k!

=

¸
k=0
f
k
t
k
k!
.
The operator ( is clearly linear. The function f(t)
can be shifted or differentiated. Two functions f(t)
and g(t) can be multiplied and composed. This leads
to the properties for the operator (listed in Table 4.1.
Note that formula (G5) requires g
0
= 0. The first
five formulas are easily verified by using the intended
interpretation of the operator (; the last formula can
be proved by means of the LIF, in the form relative
to the composition F(w(t)). In fact we have:
[t
n
]F(t)φ(t)
n
= [t
n−1
]
F(t)
t
φ(t)
n
=
= n[t
n
]
¸
F(y)
y
dy

y = w(t)

;
in the last passage we applied backwards the formula:
[t
n
]F(w(t)) =
1
n
[t
n−1
]F

(t)φ(t)
n
(w = tφ(w))
and therefore w = w(t) ∈ T
1
is the unique solution
of the functional equation w = tφ(w). By now apply-
ing the rule of differentiation for the “coefficient of”
operator, we can go on:
[t
n
]F(t)φ(t)
n
= [t
n−1
]
d
dt
¸
F(y)
y
dy

y = w(t)

=
= [t
n−1
]
¸
F(w)
w

w = tφ(w)

dw
dt
.
We have applied the chain rule for differentiation, and
from w = tφ(w) we have:
dw
dt
= φ(w) +t
¸

dw

w = tφ(w)

dw
dt
.
We can therefore compute the derivative of w(t):
dw
dt
=
¸
φ(w)
1 −tφ

(w)

w = tφ(w)

where φ

(w) denotes the derivative of φ(w) with re-
spect to w. We can substitute this expression in the
formula above and observe that w/φ(w) = t can be
taken outside of the substitution symbol:
[t
n
]F(t)φ(t)
n
=
= [t
n−1
]
¸
F(w)
w
φ(w)
1 −tφ

(w)

w = tφ(w)

=
= [t
n−1
]
1
t
¸
F(w)
1 −tφ

(w)

w = tφ(w)

=
= [t
n
]
¸
F(w)
1 −tφ

(w)

w = tφ(w)

which is our diagonalization rule (G6).
The name “diagonalization” is due to the fact
that if we imagine the coefficients of F(t)φ(t)
n
as
constituting the row n in an infinite matrix, then
39
40 CHAPTER 4. GENERATING FUNCTIONS
linearity ((αf
k
+βg
k
) = α((f
k
) +β((g
k
) (G1)
shifting ((f
k+1
) =
(((f
k
) −f
0
)
t
(G2)
differentiation ((kf
k
) = tD((f
k
) (G3)
convolution (

n
¸
k=0
f
k
g
n−k

= ((f
k
) ((g
k
) (G4)
composition

¸
n=0
f
n
(((g
k
))
n
= ((f
k
) ◦ ((g
k
) (G5)
diagonalisation (([t
n
]F(t)φ(t)
n
) =
¸
F(w)
1 −tφ

(w)

w = tφ(w)

(G6)
Table 4.1: The rules for the generating function operator
[t
n
]F(t)φ(t)
n
are just the elements in the main di-
agonal of this array.
The rules (G1) −(G6) can also be assumed as ax-
ioms of a theory of generating functions and used to
derive general theorems as well as specific functions
for particular sequences. In the next sections, we
will prove a number of properties of the generating
function operator. The proofs rely on the following
fundamental principle of identity:
Given two sequences (f
k
)
k∈N
and (g
k
)
k∈N
, then
((f
k
) = ((g
k
) if and only if for every k ∈ N f
k
= g
k
.
The principle is rather obvious from the very defini-
tion of the concept of generating functions; however,
it is important, because it states the condition under
which we can pass from an identity about elements
to the corresponding identity about generating func-
tions. It is sufficient that the two sequences do not
agree by a single element (e.g., the first one) and we
cannot infer the equality of generating functions.
4.2 Some Theorems on Gener-
ating Functions
We are now going to prove a series of properties of
generating functions.
Theorem 4.2.1 Let f(t) = ((f
k
) be the generating
function of the sequence (f
k
)
k∈N
, then
((f
k+2
) =
((f
k
) −f
0
−f
1
t
t
2
(4.2.1)
Proof: Let g
k
= f
k+1
; by (G2), ((g
k
) =
(((f
k
) −f
0
) /t. Since g
0
= f
1
, we have:
((f
k+2
) = ((g
k+1
) =
((g
k
) −g
0
t
=
((f
k
) −f
0
−f
1
t
t
2
By mathematical induction this result can be gen-
eralized to:
Theorem 4.2.2 Let f(t) = ((f
k
) be as above; then:
((f
k+j
) =
((f
k
) −f
0
−f
1
t − −f
j−1
t
j−1
t
j
(4.2.2)
If we consider right instead of left shifting we have
to be more careful:
Theorem 4.2.3 Let f(t) = ((f
k
) be as above, then:
((f
k−j
) = t
j
((f
k
) (4.2.3)
Proof: We have ((f
k
) = (

f
(k−1)+1

=
t
−1
(((f
n−1
) −f
−1
) where f
−1
is the coefficient of t
−1
in f(t). If f(t) ∈ T, f
−1
= 0 and ((f
k−1
) = t((f
k
).
The theorem then follows by mathematical induction.
Property (G3) can be generalized in several ways:
Theorem 4.2.4 Let f(t) = ((f
k
) be as above; then:
(((k + 1)f
k+1
) = D((f
k
) (4.2.4)
Proof: If we set g
k
= kf
k
, we obtain:
(((k + 1)f
k+1
) = ((g
k+1
) =
= t
−1
(((kf
k
) −0f
0
)
(G2)
=
= t
−1
tD((f
k
) = D((f
k
) .
Theorem 4.2.5 Let f(t) = ((f
k
) be as above; then:
(

k
2
f
k

= tD((f
k
) +t
2
D
2
((f
k
) (4.2.5)
This can be further generalized:
4.3. MORE ADVANCED RESULTS 41
Theorem 4.2.6 Let f(t) = ((f
k
) be as above; then:
(

k
j
f
k

= o
j
(tD)((f
k
) (4.2.6)
where o
j
(w) =
¸
j
r=1
¸
j
r
¸
w
r
is the jth Stirling poly-
nomial of the second kind (see Section 2.11).
Proof: Formula (4.2.6) is to be understood in the
operator sense; so, for example, being o
3
(w) = w +
3w
2
+w
3
, we have:
(

k
3
f
k

= tD((f
k
) + 3t
2
D
2
((f
k
) +t
3
D
3
((f
k
) .
The proof proceeds by induction, as (G3) and (4.2.5)
are the first two instances. Now:
(

k
j+1
f
k

= (

k(k
j
)f
k

= tD(

k
j
f
k

that is:
o
j+1
(tD) = tDo
j
(tD) = tD
j
¸
r=1
S(j, r)t
r
D
r
=
=
j
¸
r=1
S(j, r)rt
r
D
r
+
j
¸
r=1
S(j, r)t
r+1
D
r+1
.
By equating like coefficients we find S(j + 1, r) =
rS(j, r) + S(j, r − 1), which is the classical recur-
rence for the Stirling numbers of the second kind.
Since initial conditions also coincide, we can conclude
S(j, r) =
¸
j
r
¸
.
For the falling factorial k
r
= k(k − 1) (k − r +
1) we have a simpler formula, the proof of which is
immediate:
Theorem 4.2.7 Let f(t) = ((f
k
) be as above; then:
((k
r
f
k
) = t
r
D
r
((f
k
) (4.2.7)
Let us now come to integration:
Theorem 4.2.8 Let f(t) = ((f
k
) be as above and
let us define g
k
= f
k
/k, ∀k = 0 and g
0
= 0; then:
(

1
k
f
k

= ((g
k
) =

t
0
(((f
k
) −f
0
)
dz
z
(4.2.8)
Proof: Clearly, kg
k
= f
k
, except for k = 0. Hence
we have ((kg
k
) = ((f
k
) −f
0
. By using (G3), we find
tD((g
k
) = ((f
k
) −f
0
, from which (4.2.8) follows by
integration and the condition g
0
= 0.
A more classical formula is:
Theorem 4.2.9 Let f(t) = ((f
k
) be as above or,
equivalently, let f(t) be a f.p.s. but not a f.L.s.; then:
(

1
k + 1
f
k

=
=
1
t

t
0
((f
k
) dz =
1
t

t
0
f(z) dz (4.2.9)
Proof: Let us consider the sequence (g
k
)
k∈N
, where
g
k+1
= f
k
and g
0
= 0. So we have: ((g
k+1
) =
((f
k
) = t
−1
(((g
k
) −g
0
). Finally:
(

1
k + 1
f
k

= (

1
k + 1
g
k+1

=
1
t
(

1
k
g
k

=
=
1
t

t
0
(((g
k
) −g
0
)
dz
z
=
1
t

t
0
((f
k
) dz
In the following theorems, f(t) will always denote
the generating function ((f
k
).
Theorem 4.2.10 Let f(t) = ((f
k
) denote the gen-
erating function of the sequence (f
k
)
k∈N
; then:
(

p
k
f
k

= f(pt) (4.2.10)
Proof: By setting g(t) = pt in (G5) we have:
f(pt) =
¸

n=0
f
n
(pt)
n
=
¸

n=0
p
n
f
n
t
n
= (

p
k
f
k

In particular, for p = −1 we have: (

(−1)
k
f
k

=
f(−t).
4.3 More advanced results
The results obtained till now can be considered as
simple generalizations of the axioms. They are very
useful and will be used in many circumstances. How-
ever, we can also obtain more advanced results, con-
cerning sequences derived from a given sequence by
manipulating its elements in various ways. For exam-
ple, let us begin by proving the well-known bisection
formulas:
Theorem 4.3.1 Let f(t) = ((f
k
) denote the gener-
ating function of the sequence (f
k
)
k∈N
; then:
((f
2k
) =
f(

t) +f(−

t)
2
(4.3.1)
((f
2k+1
) =
f(

t) −f(−

t)
2

t
(4.3.2)
where

t is a symbol with the property that (

t)
2
= t.
Proof: By (G5) we have:
f(

t) +f(−

t)
2
=
=
¸

n=0
f
n

t
n
+
¸

n=0
f
n
(−

t)
n
2
=
=

¸
n=0
f
n
(

t)
n
+ (−

t)
n
2
For n odd (

t)
n
+ (−

t)
n
= 0; hence, by setting
n = 2k, we have:

¸
k=0
f
2k
(t
k
+t
k
)/2 =

¸
k=0
f
2k
t
k
= ((f
2k
) .
42 CHAPTER 4. GENERATING FUNCTIONS
The proof of the second formula is analogous.
The following proof is typical and introduces the
use of ordinary differential equations in the calculus
of generating functions:
Theorem 4.3.2 Let f(t) = ((f
k
) denote the gener-
ating function of the sequence (f
k
)
k∈N
; then:
(

f
k
2k + 1

=
1
2

t

t
0
((f
k
)

z
dz (4.3.3)
Proof: Let us set g
k
= f
k
/(2k+1), or 2kg
k
+g
k
= f
k
.
If g(t) = ((g
k
), by applying (G3) we have the differ-
ential equation 2tg

(t) + g(t) = f(t), whose solution
having g(0) = f
0
is just formula (4.3.3).
We conclude with two general theorems on sums:
Theorem 4.3.3 (Partial Sum Theorem) Let
f(t) = ((f
k
) denote the generating function of the
sequence (f
k
)
k∈N
; then:
(

n
¸
k=0
f
k

=
1
1 −t
((f
k
) (4.3.4)
Proof: If we set s
n
=
¸
n
k=0
f
k
, then we have s
n+1
=
s
n
+f
n+1
for every n ∈ N and we can apply the oper-
ator ( to both members: ((s
n+1
) = ((s
n
) +((f
n+1
),
i.e.:
((s
n
) −s
0
t
= ((s
n
) +
((f
n
) −f
0
t
Since s
0
= f
0
, we find ((s
n
) = t((s
n
) + ((f
n
) and
from this (4.3.4) follows directly.
The following result is known as Euler transforma-
tion:
Theorem 4.3.4 Let f(t) = ((f
k
) denote the gener-
ating function of the sequence (f
k
)
k∈N
; then:
(

n
¸
k=0

n
k

f
k

=
1
1 −t
f

t
1 −t

(4.3.5)
Proof: By well-known properties of binomial coeffi-
cients we have:

n
k

=

n
n −k

=

−n +n −k −1
n −k

(−1)
n−k
=
=

−k −1
n −k

(−1)
n−k
and this is the coefficient of t
n−k
in (1 −t)
−k−1
. We
now observe that the sum in (4.3.5) can be extended
to infinity, and by (G5) we have:
n
¸
k=0

n
k

f
k
=
n
¸
k=0

−k −1
n −k

(−1)
n−k
f
k
=
=

¸
k=0
[t
n−k
](1 −t)
−k−1
[y
k
]f(y) =
= [t
n
]
1
1 −t

¸
k=0
[y
k
]f(y)

t
1 −t

k
=
= [t
n
]
1
1 −t
f

t
1 −t

.
Since the last expression does not depend on n, it
represents the generating function of the sum.
We observe explicitly that by (K4) we have:
¸
n
k=0

n
k

f
k
= [t
n
](1 + t)
n
f(t), but this expression
does not represent a generating function because it
depends on n. The Euler transformation can be gen-
eralized in several ways, as we shall see when dealing
with Riordan arrays.
4.4 Common Generating Func-
tions
The aim of the present section is to derive the most
common generating functions by using the apparatus
of the previous sections. As a first example, let us
consider the constant sequence F = (1, 1, 1, . . .), for
which we have f
k+1
= f
k
for every k ∈ N. By ap-
plying the principle of identity, we find: ((f
k+1
) =
((f
k
), that is by (G2): ((f
k
) − f
0
= t((f
k
). Since
f
0
= 1, we have immediately:
((1) =
1
1 −t
For any constant sequence F = (c, c, c, . . .), by (G1)
we find that ((c) = c(1−t)
−1
. Similarly, by using the
basic rules and the theorems of the previous sections
we have:
((n) = ((n 1) = tD
1
1 −t
=
t
(1 −t)
2
(

n
2

= tD((n) = tD
1
(1 −t)
2
=
t +t
2
(1 −t)
3
(((−1)
n
) = ((1) ◦ (−t) =
1
1 +t
(

1
n

= (

1
n
1

=

t
0

1
1 −z
−1

dz
z
=
=

t
0
dz
1 −z
= ln
1
1 −t
((H
n
) = (

n
¸
k=0
1
k

=
1
1 −t
(

1
n

=
=
1
1 −t
ln
1
1 −t
where H
n
is the nth harmonic number. Other gen-
erating functions can be obtained from the previous
formulas:
((nH
n
) = tD
1
1 −t
ln
1
1 −t
=
4.4. COMMON GENERATING FUNCTIONS 43
=
t
(1 −t)
2

ln
1
1 −t
+ 1

(

1
n + 1
H
n

=
1
t

t
0
1
1 −z
ln
1
1 −z
dz =
=
1
2t

ln
1
1 −t

2
((δ
0,n
) = ((1) −t((1) =
1 −t
1 −t
= 1
where δ
n,m
is the Kronecker’s delta. This last re-
lation can be readily generalized to ((δ
n,m
) = t
m
.
An interesting example is given by (

1
n(n+1)

. Since
1
n(n+1)
=
1
n

1
n+1
, it is tempting to apply the op-
erator ( to both members. However, this relation is
not valid for n = 0. In order to apply the principle
of identity, we must define:
1
n(n + 1)
=
1
n

1
n + 1

n,0
in accordance with the fact that the first element of
the sequence is zero. We thus arrive to the correct
generating function:
(

1
n(n + 1)

= 1 −
1 −t
t
ln
1
1 −t
Let us now come to binomial coefficients. In order
to find (

p
k

we observe that, from the definition:

p
k + 1

=
p(p −1) (p −k + 1)(p −k)
(k + 1)!
=
=
p −k
k + 1

p
k

.
Hence, by denoting

p
k

as f
k
, we have:
(((k + 1)f
k+1
) = (((p −k)f
k
) = p((f
k
) − ((kf
k
).
By applying (4.2.4) and (G3) we have
D((f
k
) = p((f
k
) − tD((f
k
), i.e., the differen-
tial equation f

(t) = pf(t) − tf

(t). By sep-
arating the variables and integrating, we find
ln f(t) = p ln(1 +t) +c, or f(t) = c(1 +t)
p
. For t = 0
we should have f(0) =

p
0

= 1, and this implies
c = 1. Consequently:
(

p
k

= (1 +t)
p
p ∈ R
We are now in a position to derive the recurrence
relation for binomial coefficients. By using (K1) ÷
(K5) we find easily:

p
k

= [t
k
](1 +t)
p
= [t
k
](1 +t)(1 +t)
p−1
=
= [t
k
](1 +t)
p−1
+ [t
k−1
](1 +t)
p−1
=
=

p −1
k

+

p −1
k −1

.
By (K4), we have the well-known Vandermonde
convolution:

m+p
n

= [t
n
](1 +t)
m+p
=
= [t
n
](1 +t)
m
(1 +t)
p
=
=
n
¸
k=0

m
k

p
n −k

which, for m = p = n becomes
¸
n
k=0

n
k

2
=

2n
n

.
We can also find (

k
p

, where p is a parameter.
The derivation is purely algebraic and makes use of
the generating functions already found and of various
properties considered in the previous section:
(

k
p

= (

k
k −p

=
= (

−k +k −p + 1
k −p

(−1)
k−p

=
= (

−p −1
k −p

(−1)
k−p

=
= (

[t
k−p
]
1
(1 −t)
p+1

=
= (

[t
k
]
t
p
(1 −t)
p+1

=
t
p
(1 −t)
p+1
.
Several generating functions for different forms of bi-
nomial coefficients can be found by means of this
method. They are summarized as follows, where p
and m are two parameters and can also be zero:
(

p
m+k

=
(1 +t)
p
t
m
(

p +k
m

=
t
m−p
(1 −t)
m+1
(

p +k
m+k

=
1
t
m
(1 −t)
p+1−m
These functions can make sense even when they are
f.L.s. and not simply f.p.s..
Finally, we list the following generating functions:
(

k

p
k

= tD(

p
k

=
= tD(1 +t)
p
= pt(1 +t)
p−1
(

k
2

p
k

= (tD +t
2
D
2
)(

p
k

=
= pt(1 +pt)(1 +t)
p−2
(

1
k + 1

p
k

=
1
t

t
0
(

p
k

dz =
=
1
t
¸
(1 +t)
p+1
p + 1

t
0
=
=
(1 +t)
p+1
−1
(p + 1)t
44 CHAPTER 4. GENERATING FUNCTIONS
(

k

k
m

= tD
t
m
(1 −t)
m+1
=
=
mt
m
+t
m+1
(1 −t)
m+2
(

k
2

k
m

= (tD +t
2
D
2
)(

k
m

=
=
m
2
t
m
+ (3m+ 1)t
m+1
+t
m+2
(1 −t)
m+3
(

1
k

k
m

=

t
0

(

k
m

0
m

dz
z
=
=

t
0
z
m−1
dz
(1 −z)
m+1
=
t
m
m(1 −t)
m
The last integral can be solved by setting y = (1 −
z)
−1
and is valid for m > 0; for m = 0 it reduces to
((1/k) = −ln(1 −t).
4.5 The Method of Shifting
When the elements of a sequence F are given by an
explicit formula, we can try to find the generating
function for F by using the technique of shifting: we
consider the element f
n+1
and try to express it in
terms of f
n
. This can produce a relation to which
we apply the principle of identity deriving an equa-
tion in ((f
n
), the solution of which is the generating
function. In practice, we find a recurrence for the
elements f
n
∈ F and then try to solve it by using
the rules (G1) ÷(G5) and their consequences. It can
happen that the recurrence involves several elements
in F and/or that the resulting equation is indeed a
differential equation. Whatever the case, the method
of shifting allows us to find the generating function
of many sequences.
Let us consider the geometric sequence

1, p, p
2
, p
3
, . . .

; we have p
k+1
= pp
k
, ∀k ∈ N
or, by applying the operator (, (

p
k+1

= p(

p
k

.
By (G2) we have t
−1

(

p
k

−1

= p(

p
k

, that is:
(

p
k

=
1
1 −pt
From this we obtain other generating functions:
(

kp
k

= tD
1
1 −pt
=
pt
(1 −pt)
2
(

k
2
p
k

= tD
pt
(1 −pt)
2
=
pt +p
2
t
2
(1 −pt)
3
(

1
k
p
k

=

t
0

1
1 −pz
−1

dz
z
=
=

p dz
(1 −pz)
= ln
1
1 −pt
(

n
¸
k=0
p
k

=
1
1 −t
1
1 −pt
=
=
1
(p −1)t

1
1 −pt

1
1 −t

.
The last relation has been obtained by partial fraction
expansion. By using the operator [t
k
] we easily find:
n
¸
k=0
p
k
= [t
n
]
1
1 −t
1
1 −pt
=
=
1
(p −1)
[t
n+1
]

1
1 −pt

1
1 −t

=
=
p
n+1
−1
p −1
the well-known formula for the sum of a geometric
progression. We observe explicitly that the formulas
above could have been obtained from formulas of the
previous section and the general formula (4.2.10). In
a similar way we also have:
(

m
k

p
k

= (1 +pt)
m
(

k
m

p
k

= (

k
k −m

p
k−m
p
m

=
= p
m
(

−m−1
k −m

(−p)
k−m

=
=
p
m
t
m
(1 −pt)
m+1
.
As a very simple application of the shifting method,
let us observe that:
1
(n + 1)!
=
1
n + 1
1
n!
that is (n+1)f
n+1
= f
n
, where f
n
= 1/n!. By (4.2.4)
we have f

(t) = f(t) or:
(

1
n!

= e
t
(

1
n n!

=

t
0
e
z
−1
z
dz
(

n
(n + 1)!

= tD(

1
(n + 1)!

=
= tD
1
t
(e
t
−1) =
te
t
−e
t
+ 1
t
By this relation the well-known result follows:
n
¸
k=0
k
(k + 1)!
= [t
n
]
1
1 −t

te
t
−e
t
+ 1
t

=
= [t
n
]

1
1 −t

e
t
−1
t

=
= 1 −
1
(n + 1)!
.
Let us now observe that

2n+2
n+1

=
2(2n+1)
n+1

2n
n

;
by setting f
n
=

2n
n

, we have the recurrence (n +
4.6. DIAGONALIZATION 45
1)f
n+1
= 2(2n + 1)f
n
. By using (4.2.4), (G1) and
(G3) we obtain the differential equation: f

(t) =
4tf

(t) + 2f(t), the simple solution of which is:
(

2n
n

=
1

1 −4t
(

1
n + 1

2n
n

=
1
t

t
0
dz

1 −4z
=
1 −

1 −4t
2t
(

1
n

2n
n

=

t
0

1

1 −4z
−1

dz
z
=
= 2 ln
1 −

1 −4t
2t
(

n

2n
n

=
2t
(1 −4t)

1 −4t
(

1
2n + 1

2n
n

=
1

t

t
0
dz

4z(1 −4z)
=
=
1

4t
arctan

4t
1 −4t
.
A last group of generating functions is obtained by
considering f
n
= 4
n

2n
n

−1
. Since:
4
n+1

2n + 2
n + 1

−1
=
2n + 2
2n + 1
4
n

2n
n

−1
we have the recurrence: (2n + 1)f
n+1
= 2(n + 1)f
n
.
By using the operator ( and the rules of Section 4.2,
the differential equation 2t(1−t)f

(t) −(1+2t)f(t) +
1 = 0 is derived. The solution is:
f(t) =

t
(1 −t)
3

t
0

(1 −z)
3
z
dz
2z(1 −z)

By simplifying and using the change of variable y =

z/(1 −z), the integral can be computed without
difficulty, and the final result is:
(

4
n

2n
n

−1

=

t
(1 −t)
3
arctan

t
1 −t
+
1
1 −t
.
Some immediate consequences are:
(

4
n
2n + 1

2n
n

−1

=
1

t(1 −t)
arctan

t
1 −t
(

4
n
2n
2

2n
n

−1

=

arctan

t
1 −t

2
and finally:
(

1
2n
4
n
2n + 1

2n
n

−1

= 1−

1 −t
t
arctan

t
1 −t
.
4.6 Diagonalization
The technique of shifting is a rather general method
for obtaining generating functions. It produces first
order recurrence relations, which will be more closely
studied in the next sections. Not every sequence can
be defined by a first order recurrence relation, and
other methods are often necessary to find out gener-
ating functions. Sometimes, the rule of diagonaliza-
tion can be used very conveniently. One of the most
simple examples is how to determine the generating
function of the central binomial coefficients, without
having to pass through the solution of a differential
equation. In fact we have:

2n
n

= [t
n
](1 +t)
2n
and (G6) can be applied with F(t) = 1 and φ(t) =
(1 +t)
2
. In this case, the function w = w(t) is easily
determined by solving the functional equation w =
t(1+w)
2
. By expanding, we find tw
2
−(1−2t)w+t = 0
or:
w = w(t) =
1 −t ±

1 −4t
2t
.
Since w = w(t) should belong to T
1
, we must elim-
inate the solution with the + sign; consequently, we
have:
(

2n
n

=
=
¸
1
1 −2t(1 +w)

w =
1 −t −

1 −4t
2t

=
=
1

1 −4t
as we already know.
The function φ(t) = (1 +t)
2
gives rise to a second
degree equation. More in general, let us study the
sequence:
c
n
= [t
n
](1 +αt +βt
2
)
n
and look for its generating function C(t) = ((c
n
).
In this case again we have F(t) = 1 and φ(t) = 1 +
αt +βt
2
, and therefore we should solve the functional
equation w = t(1+αw+βw
2
) or βtw
2
−(1−αt)w+t =
0. This gives:
w = w(t) =
1 −αt ±

(1 −αt)
2
−4βt
2
2βt
and again we have to eliminate the solution with the
+ sign. By performing the necessary computations,
we find:
C(t) =
¸
1
1 −t(α + 2βw)

46 CHAPTER 4. GENERATING FUNCTIONS
n`k 0 1 2 3 4 5 6 7 8
0 1
1 1 1 1
2 1 2 3 2 1
3 1 3 6 7 6 3 1
4 1 4 10 16 19 16 10 4 1
Table 4.2: Trinomial coefficients

w =
1 −αt −

1 −2αt + (α
2
−4β)t
2
2βt
¸
=
1

1 −2αt + (α
2
−4β)t
2
and for α = 2, β = 1 we obtain again the generating
function for the central binomial coefficients.
The coefficients of (1+t +t
2
)
n
are called trinomial
coefficients, in analogy with the binomial coefficients.
They constitute an infinite array in which every row
has two more elements, different from 0, with respect
to the previous row (see Table 4.2).
If T
n,k
is [t
k
](1 + t + t
2
)
n
, a trinomial coefficient,
by the obvious property (1 + t + t
2
)
n+1
= (1 + t +
t
2
)(1+t+t
2
)
n
, we immediately deduce the recurrence
relation:
T
n+1,k+1
= T
n,k−1
+T
n,k
+T
n,k+1
from which the array can be built, once we start from
the initial conditions T
n,0
= 1 and T
n,2n
= 1, for ev-
ery n ∈ N. The elements T
n,n
, marked in the table,
are called the central trinomial coefficients; their se-
quence begins:
n 0 1 2 3 4 5 6 7 8
T
n
1 1 3 7 19 51 141 393 1107
and by the formula above their generating function
is:
((T
n,n
) =
1

1 −2t −3t
2
=
1

(1 +t)(1 −3t)
.
4.7 Some special generating
functions
We wish to determine the generating function of
the sequence ¦0, 1, 1, 1, 2, 2, 2, 2, 3, . . .¦, that is the
sequence whose generic element is ⌊

k⌋. We can
think that it is formed up by summing an infinite
number of simpler sequences ¦0, 1, 1, 1, 1, 1, 1, . . .¦,
¦0, 0, 0, 0, 1, 1, 1, 1, . . .¦, the next one with the first 1
in position 9, and so on. The generating functions of
these sequences are:
t
1
1 −t
t
4
1 −t
t
9
1 −t
t
16
1 −t

and therefore we obtain:
(



k⌋

=

¸
k=0
t
k
2
1 −t
.
In the same way we obtain analogous generating
functions:
(


r

k⌋

=

¸
k=0
t
k
r
1 −t
((⌊log
r
k⌋) =

¸
k=0
t
r
k
1 −t
where r is any integer number, or also any real num-
ber, if we substitute to k
r
and r
k
, respectively, ⌈k
r

and ⌈r
k
⌉.
These generating functions can be used to find the
values of several sums in closed or semi-closed form.
Let us begin by the following case, where we use the
Euler transformation:
¸
k

n
k



k⌋(−1)
k
=
= (−1)
n
¸
k

n
k

(−1)
n−k


k⌋ =
= (−1)
n
[t
n
]
1
1 +t
¸

¸
k=0
y
k
2
1 −y

y =
t
1 +t
¸
=
= (−1)
n
[t
n
]

¸
k=0
t
k
2
(1 +t)
k
2
=
= (−1)
n


n⌋
¸
k=0
[t
n−k
2
]
1
(1 +t)
k
2
=
= (−1)
n


n⌋
¸
k=0

−k
2
n −k
2

=
= (−1)
n


n⌋
¸
k=0

n −1
n −k
2

(−1)
n−k
2
=
=


n⌋
¸
k=0

n −1
n −k
2

(−1)
k
2
.
We can think of the last sum as a “semi-closed”
form, because the number of terms is dramatically
reduced from n to

n, although it remains depending
on n. In the same way we find:
¸
k

n
k

⌊log
2
k⌋(−1)
k
=
⌊log
2
n⌋
¸
k=1

n −1
n −2
k

.
A truly closed formula is found for the following
sum:
n
¸
k=1


k⌋ = [t
n
]
1
1 −t

¸
k=1
t
k
2
1 −t
=
4.8. LINEAR RECURRENCES WITH CONSTANT COEFFICIENTS 47
=

¸
k=1
[t
n−k
2
]
1
(1 −t)
2
=
=


n⌋
¸
k=1

−2
n −k
2

(−1)
n−k
2
=
=


n⌋
¸
k=1

n −k
2
+ 1
n −k
2

=
=


n⌋
¸
k=1
(n −k
2
+ 1) =
= (n + 1)⌊

n⌋ −


n⌋
¸
k=1
k
2
.
The final value of the sum is therefore:
(n + 1)⌊

n⌋ −


n⌋
3



n⌋ + 1



n⌋ +
1
2

whose asymptotic value is
2
3
n

n. Again, for the anal-
ogous sum with ⌊log
2
n⌋, we obtain (see Chapter 1):
n
¸
k=1
⌊log
2
k⌋ = (n + 1)⌊log
2
n⌋ −2
⌊log
2
n⌋+1
+ 1.
A somewhat more difficult sum is the following one:
n
¸
k=0

n
k



k⌋ = [t
n
]
1
1 −t
¸

¸
k=0
y
k
2
1 −y

y =
t
1 −t
¸
= [t
n
]

¸
k=0
t
k
2
(1 −t)
k
2
(1 −2t)
=
=

¸
k=0
[t
n−k
2
]
1
(1 −t)
k
2
(1 −2t)
.
We can now obtain a semi-closed form for this sum by
expanding the generating function into partial frac-
tions:
1
(1 −t)
k
2
(1 −2t)
=
A
1 −2t
+
B
(1 −t)
k
2
+
+
C
(1 −t)
k
2
−1
+
D
(1 −t)
k
2
−2
+ +
X
1 −t
.
We can show that A = 2
k
2
, B = −1, C = −2, D =
−4, . . . , X = −2
k
2
−1
; in fact, by substituting these
values in the previous expression we get:
2
k
2
1 −2t

1
(1 −t)
k
2

2(1 −t)
(1 −t)
k
2


4(1 −t)
2
(1 −t)
k
2
− −
2
k
2
−1
(1 −t)
k
2
−1
(1 −t)
k
2
=
=
2
k
2
1 −2t

1
(1 −t)
k
2

2
k
2
(1 −t)
k
2
−1
2(1 −t) −1

=
=
2
k
2
1 −2t

2
k
2
(1 −t)
k
2
−1
(1 −t)
k
2
(1 −2t)
=
=
1
(1 −t)
k
2
(1 −2t)
.
Therefore, we conclude:
n
¸
k=0

n
k



k⌋ =


n⌋
¸
k=1
2
k
2
2
n−k
2



n⌋
¸
k=1

n −1
n −k
2





n⌋
¸
k=1
2

n −2
n −k
2

− −


n⌋
¸
k=1
2
k
2
−1

n −k
2
n −k
2

=
= ⌊

n⌋2
n



n⌋
¸
k=1

n −1
n −k
2

+ 2

n −2
n −k
2

+ +
+ + 2
k
2
−1

n −k
2
n −k
2

.
We observe that for very large values of n the first
term dominates all the others, and therefore the
asymptotic value of the sum is ⌊

n⌋2
n
.
4.8 Linear recurrences with
constant coefficients
If (f
k
)
k∈N
is a sequence, it can be defined by means
of a recurrence relation, i.e., a relation relating the
generic element f
n
to other elements f
k
having k < n.
Usually, the first elements of the sequence must be
given explicitly, in order to allow the computation
of the successive values; they constitute the initial
conditions and the sequence is well-defined if and only
if every element can be computed by starting with
the initial conditions and going on with the other
elements by means of the recurrence relation. For
example, the constant sequence (1, 1, 1, . . .) can be
defined by the recurrence relation x
n
= x
n−1
and
the initial condition x
0
= 1. By changing the initial
conditions, the sequence can radically change; if we
consider the same relation x
n
= x
n−1
together with
the initial condition x
0
= 2, we obtain the constant
sequence ¦2, 2, 2, . . .¦.
In general, a recurrence relation can be written
f
n
= F(f
n−1
, f
n−2
, . . .); when F depends on all
the values f
n−1
, f
n−2
, . . . , f
1
, f
0
, then the relation is
called a full history recurrence. If F only depends on
a fixed number of elements f
n−1
, f
n−2
, . . . , f
n−p
, then
the relation is called a partial history recurrence and p
is called the order of the relation. Besides, if F is lin-
ear, we have a linear recurrence. Linear recurrences
are surely the most common and important type of
recurrence relations; if all the coefficients appearing
in F are constant, we have a linear recurrence with
48 CHAPTER 4. GENERATING FUNCTIONS
constant coefficients, and if the coefficients are poly-
nomials in n, we have a linear recurrence with poly-
nomial coefficients. As we are now going to see, the
method of generating functions allows us to find the
solution of any linear recurrence with constant coeffi-
cients, in the sense that we find a function f(t) such
that [t
n
]f(t) = f
n
, ∀n ∈ N. For linear recurrences
with polynomial coefficients, the same method allows
us to find a solution in many occasions, but the suc-
cess is not assured. On the other hand, no method
is known that solves all the recurrences of this kind,
and surely generating functions are the method giv-
ing the highest number of positive results. We will
discuss this case in the next section.
The Fibonacci recurrence F
n
= F
n−1
+F
n−2
is an
example of a recurrence relation with constant co-
efficients. When we have a recurrence of this kind,
we begin by expressing it in such a way that the re-
lation is valid for every n ∈ N. In the example of
Fibonacci numbers, this is not the case, because for
n = 0 we have F
0
= F
−1
+F
−2
, and we do not know
the values for the two elements in the r.h.s., which
have no combinatorial meaning. However, if we write
the recurrence as F
n+2
= F
n+1
+F
n
we have fulfilled
the requirement. This first step has a great impor-
tance, because it allows us to apply the operator (
to both members of the relation; this was not pos-
sible beforehand because of the principle of identity
for generating functions.
The recurrence being linear with constant coeffi-
cients, we can apply the axiom of linearity to the
recurrence:
f
n+p
= α
1
f
n+p−1

2
f
n+p−2
+ +α
p
f
n
and obtain the relation:
((f
n+p
) = α
1
((f
n+p−1
)+

2
((f
n+p−2
) + +α
p
((f
n
).
By Theorem 4.2.2 we can now express every
((f
n+p−j
) in terms of f(t) = ((f
n
) and obtain a lin-
ear relation in f(t), from which an explicit expression
for f(t) is immediately obtained. This is the solution
of the recurrence relation. We observe explicitly that
in writing the expressions for ((f
n+p−j
) we make use
of the initial conditions for the sequence.
Let us go on with the example of the Fibonacci
sequence (F
k
)
k∈N
. We have:
((F
n+2
) = ((F
n+1
) +((F
n
)
and by setting F(t) = ((F
n
) we find:
F(t) −F
0
−F
1
t
t
2
=
F(t) −F
0
t
+F(t).
Because we know that F
0
= 0, F
1
= 1, we have:
F(t) −t = tF(t) +t
2
F(t)
and by solving in F(t) we have the explicit expression:
F(t) =
t
1 −t −t
2
.
This is the generating function for the Fibonacci
numbers. We can now find an explicit expression for
F
n
in the following way. The denominator of F(t)
can be written 1 −t −t
2
= (1 −φt)(1 −
´
φt) where:
φ =
1 +

5
2
≈ 1.618033989
´
φ =
1 −

5
2
≈ −0.618033989.
The constant 1/φ ≈ 0.618033989 is known as the
golden ratio. By applying the method of partial frac-
tion expansion we find:
F(t) =
t
(1 −φt)(1 −
´
φt)
=
A
1 −φt
+
B
1 −
´
φt
=
=
A−A
´
φt +B −Bφt
1 −t −t
2
.
We determine the two constants A and B by equat-
ing the coefficients in the first and last expression for
F(t):

A+B = 0
−A
´
φ −Bφ = 1

A = 1/(φ −
´
φ) = 1/

5
B = −A = −1/

5
The value of F
n
is now obtained by extracting the
coefficient of t
n
:
F
n
= [t
n
]F(t) =
= [t
n
]
1

5

1
1 −φt

1
1 −
´
φt

=
=
1

5

[t
n
]
1
1 −φt
−[t
n
]
1
1 −
´
φt

=
=
φ
n

´
φ
n

5
.
This formula allows us to compute F
n
in a time in-
dependent of n, because φ
n
= exp(nln φ), and shows
that F
n
grows exponentially. In fact, since [
´
φ[ < 1,
the quantity
´
φ
n
approaches 0 very rapidly and we
have F
n
= O(φ
n
). In reality, F
n
should be an inte-
ger and therefore we can compute it by finding the
integer number closest to φ
n
/

5; consequently:
F
n
= round

φ
n

5

.
4.10. THE SUMMING FACTOR METHOD 49
4.9 Linear recurrences with
polynomial coefficients
When a recurrence relation has polynomial coeffi-
cients, the method of generating functions does not
assure a solution, but no other method is available
to solve those recurrences which cannot be solved by
a generating function approach. Usually, the rule of
differentiation introduces a derivative in the relation
for the generating function and a differential equa-
tion has to be solved. This is the actual problem of
this approach, because the main difficulty just con-
sists in dealing with the differential equation. We
have already seen some examples when we studied
the method of shifting, but here we wish to present
a case arising from an actual combinatorial problem,
and in the next section we will see a very important
example taken from the analysis of algorithms.
When we studied permutations, we introduced the
concept of an involution, i.e., a permutation π ∈ P
n
such that π
2
= (1), and for the number I
n
of involu-
tions in P
n
we found the recurrence relation:
I
n
= I
n−1
+ (n −1)I
n−2
which has polynomial coefficients. The number of
involutions grows very fast and it can be a good idea
to consider the quantity i
n
= I
n
/n!. Therefore, let
us begin by changing the recurrence in such a way
that the principle of identity can be applied, and then
divide everything by (n + 2)!:
I
n+2
= I
n+1
+ (n + 1)I
n
I
n+2
(n + 2)!
=
1
n + 2
I
n+1
(n + 1)!
+
1
n + 2
I
n
n!
.
The recurrence relation for i
n
is:
(n + 2)i
n+2
= i
n+1
+i
n
and we can pass to generating functions.
(((n + 2)i
n+2
) can be seen as the shifting of
(((n + 1)i
n+1
) = i

(t), if with i(t) we denote the
generating function of (i
k
)
k∈N
= (I
k
/k!)
k∈N
, which
is therefore the exponential generating function for
(I
k
)
k∈N
. We have:
i

(t) −1
t
=
i(t) −1
t
+i(t)
because of the initial conditions i
0
= i
1
= 1, and so:
i

(t) = (1 +t)i(t).
This is a simple differential equation with separable
variables and by solving it we find:
ln i(t) = t +
t
2
2
+C or i(t) = exp

t +
t
2
2
+C

where C is an integration constant. Because i(0) =
e
C
= 1, we have C = 0 and we conclude with the
formula:
I
n
= n![t
n
] exp

t +
t
2
2

.
4.10 The summing factor
method
For linear recurrences of the first order a method ex-
ists, which allows us to obtain an explicit expression
for the generic element of the defined sequence. Usu-
ally, this expression is in the form of a sum, and a pos-
sible closed form can only be found by manipulating
this sum; therefore, the method does not guarantee
a closed form. Let us suppose we have a recurrence
relation:
a
n+1
f
n+1
= b
n
f
n
+c
n
where a
n
, b
n
, c
n
are any expressions, possibly depend-
ing on n. As we remarked in the Introduction, if
a
n+1
= b
n
= 1, by unfolding the recurrence we can
find an explicit expression for f
n+1
or f
n
:
f
n+1
= f
n
+c
n
= f
n−1
+c
n−1
+c
n
= = f
0
+
n
¸
k=0
c
k
where f
0
is the initial condition relative to the se-
quence under consideration. Fortunately, we can al-
ways change the original recurrence into a relation of
this more simple form. In fact, if we multiply every-
thing by the so-called summing factor:
a
n
a
n−1
. . . a
0
b
n
b
n−1
. . . b
0
provided none of a
n
, a
n−1
, . . . , a
0
, b
n
, b
n−1
, . . . , b
0
is
zero, we obtain:
a
n+1
a
n
. . . a
0
b
n
b
n−1
. . . b
0
f
n+1
=
=
a
n
a
n−1
. . . a
0
b
n−1
b
n−2
. . . b
0
f
n
+
a
n
a
n−1
. . . a
0
b
n
b
n−1
. . . b
0
c
n
.
We can now define:
g
n+1
= a
n+1
a
n
. . . a
0
f
n+1
/(b
n
b
n−1
. . . b
0
),
and the relation becomes:
g
n+1
= g
n
+
a
n
a
n−1
. . . a
0
b
n
b
n−1
. . . b
0
c
n
g
0
= a
0
f
0
.
Finally, by unfolding this recurrence the result is:
f
n+1
=
b
n
b
n−1
. . . b
0
a
n+1
a
n
. . . a
0

a
0
f
0
+
n
¸
k=0
a
k
a
k−1
. . . a
0
b
k
b
k−1
. . . b
0
c
k

.
As a technical remark, we observe that sometimes
a
0
and/or b
0
can be 0; in that case, we can unfold the
50 CHAPTER 4. GENERATING FUNCTIONS
recurrence down to 1, and accordingly change the last
index 0.
In order to show a non-trivial example, let us
discuss the problem of determining the coefficient
of t
n
in the f.p.s. corresponding to the function
f(t) =

1 −t ln(1/(1 − t)). If we expand the func-
tion, we find:
f(t) =

1 −t ln
1
1 −t
=
= t −
1
24
t
3

1
24
t
4

71
1920
t
5

31
960
t
6
+ .
A method for finding a recurrence relation for the
coefficients f
n
of this f.p.s. is to derive a differential
equation for f(t). By differentiating:
f

(t) = −
1
2

1 −t
ln
1
1 −t
+
1

1 −t
and therefore we have the differential equation:
(1 −t)f

(t) = −
1
2
f(t) +

1 −t.
By extracting the coefficient of t
n
, we have the rela-
tion:
(n + 1)f
n+1
−nf
n
= −
1
2
f
n
+

1/2
n

(−1)
n
which can be written as:
(n + 1)f
n+1
=
2n −1
2
f
n

1
4
n
(2n −1)

2n
n

.
This is a recurrence relation of the first order with the
initial condition f
0
= 0. Let us apply the summing
factor method, for which we have a
n
= n, b
n
= (2n−
1)/2. Since a
0
= 0, we have:
a
n
a
n−1
. . . a
1
b
n
b
n−1
. . . b
1
=
n(n −1) 1 2
n
(2n −1)(2n −3) 1
=
=
n!2
n
2n(2n −2) 2
2n(2n −1)(2n −2) 1
=
=
4
n
n!
2
(2n)!
.
By multiplying the recurrence relation by this sum-
ming factor, we find:
(n + 1)
4
n
n!
2
(2n)!
f
n+1
=
2n −1
2
4
n
n!
2
(2n)!
f
n

1
2n −1
.
We are fortunate and c
n
simplifies dramatically;
besides, we know that the two coefficients of f
n+1
and f
n
are equal, notwithstanding their appearance.
Therefore we have:
f
n+1
=
1
(n + 1)4
n

2n
n

n
¸
k=0
−1
2k −1
(a
0
f
0
= 0).
We can somewhat simplify this expression by observ-
ing that:
n
¸
k=0
1
2k −1
=
1
2n −1
+
1
2n −3
+ + 1 −1 =
=
1
2n
+
1
2n −1
+
1
2n −2
+ +
1
2
+1−
1
2n
− −
1
2
−1 =
= H
2n+2

1
2n + 1

1
2n + 2

1
2
H
n+1
+
1
2n + 2
−1 =
= H
2n+2

1
2
H
n+1

2(n + 1)
2n + 1
.
Furthermore, we have:
1
(n + 1)4
n

2n
n

=
4
(n + 1)4
n+1

2n + 2
n + 1

n + 1
2(2n + 1)
=
2
2n + 1
1
4
n+1

2n + 2
n + 1

.
Therefore:
f
n+1
=

1
2
H
n+1
−H
2n+2
+
2(n + 1)
2n + 1

2
2n + 1
1
4
n+1

2n + 2
n + 1

.
This expression allows us to obtain a formula for f
n
:
f
n
=

1
2
H
n
−H
2n
+
2n
2n −1

2
2n −1
1
4
n

2n
n

=
=

H
n
−2H
2n
+
4n
2n −1

1/2
n

.
The reader can numerically check this expression
against the actual values of f
n
given above. By using
the asymptotic approximation H
n
∼ ln n+γ given in
the Introduction, we find:
H
n
−2H
2n
+
4n
2n −1

∼ ln n +γ −2(ln 2 + lnn +γ) + 2 =
= −ln n −γ −ln 4 + 2.
Besides:
1
2n −1
1
4
n

2n
n


1
2n

πn
and we conclude:
f
n
∼ −(ln n +γ + ln 4 −2)
1
2n

πn
which shows that [f
n
[ grows as ln n/n
3/2
.
4.11. THE INTERNAL PATH LENGTH OF BINARY TREES 51
4.11 The internal path length
of binary trees
Binary trees are often used as a data structure to re-
trieve information. A set D of keys is given, taken
from an ordered universe U. Therefore D is a permu-
tation of the ordered sequence d
1
< d
2
< < d
n
,
and as the various elements arrive, they are inserted
in a binary tree. As we know, there are n! pos-
sible permutations of the keys in D, but there are
only

2n
n

/(n+1) different binary trees with n nodes.
When we are looking for some key d to find whether
d ∈ D or not, we perform a search in the binary tree,
comparing d against the root and other keys in the
tree, until we find d or we arrive to some leaf and we
cannot go on with our search. In the former case our
search is successful, while in the latter case it is un-
successful. The problem is: how many comparisons
should we perform, on the average, to find out that d
is present in the tree (successful search)? The answer
to this question is very important, because it tells us
how good binary trees are for searching information.
The number of nodes along a path in the tree, start-
ing at the root and arriving to a given node K is
called the internal path length for K. It is just the
number of comparisons necessary to find the key in
K. Therefore, our previous problem can be stated in
the following way: what is the average internal path
length for binary trees with n nodes? Knuth has
found a rather simple way for answering this ques-
tion; however, we wish to show how the method of
generating functions can be used to find the average
internal path length in a standard way. The reason-
ing is as follows: we evaluate the total internal path
length for all the trees generated by the n! possible
permutations of our key set D, and then divide this
number by n!n, the total number of nodes in all the
trees.
A non-empty binary tree can be seen as two sub-
trees connected to the root (see Section 2.9); the left
subtree contains k nodes (k = 0, 1, . . . , n−1) and the
right subtree contains the remaining n−1 −k nodes.
Let P
n
the total internal path length (i.p.l.) of all the
n! possible trees generated by permutations. The left
subtrees have therefore a total i.p.l. equal to P
k
, but
every search in these subtrees has to pass through
the root. This increases the total i.p.l. by the total
number of nodes, i.e., it actually is P
k
+ k!k. We
now observe that every left subtree is associated to
each possible right subtree and therefore it should be
counted (n − 1 − k)! times. Besides, every permuta-
tion generating the left and right subtrees is not to
be counted only once: the keys can be arranged in
all possible ways in the overall permutation, retain-
ing their relative ordering. These possible ways are

n−1
k

, and therefore the total contribution to P
n
of
the left subtrees is:

n −1
k

(n−1 −k)!(P
k
+k!k) = (n−1)!

P
k
k!
+k

.
In a similar way we find the total contribution of the
right subtrees:

n −1
k

k!(P
n−1−k
+ (n −1 −k)!(n −1 −k)) =
= (n −1)!

P
n−1−k
(n −1 −k)!
+ (n −1 −k)

.
It only remains to count the contribution of the roots,
which obviously amounts to n!, a single comparisons
for every tree. We therefore have the following recur-
rence relation, in which the contributions of the left
and right subtrees turn out to be the same:
P
n
= n! + (n −1)!

n−1
¸
k=0

P
k
k!
+k +
P
n−1−k
(n −1 −k)!
+ (n −1 −k)

=
= n! + 2(n −1)!
n−1
¸
k=0

P
k
k!
+k

=
= n! + 2(n −1)!

n−1
¸
k=0
P
k
k!
+
n(n −1)
2

.
We used the formula for the sum of the first n − 1
integers, and now, by dividing by n!, we have:
P
n
n!
= 1 +
2
n
n−1
¸
k=0
P
k
k!
+n −1 = n +
2
n
n−1
¸
k=0
P
k
k!
.
Let us now set Q
n
= P
n
/n!, so that Q
n
is the average
total i.p.l. relative to a single tree. If we’ll succeed
in finding Q
n
, the average i.p.l. we are looking for
will simply be Q
n
/n. We can also reformulate the
recurrence for n +1, in order to be able to apply the
generating function operator:
(n + 1)Q
n+1
= (n + 1)
2
+ 2
n
¸
k=0
Q
k
Q

(t) =
1 +t
(1 −t)
3
+ 2
Q(t)
1 −t
Q

(t) −
2
1 −t
Q(t) =
1 +t
(1 −t)
3
.
This differential equation can be easily solved:
Q(t) =
1
(1 −t)
2

(1 −t)
2
(1 +t)
(1 −t)
3
dt +C

=
=
1
(1 −t)
2

2 ln
1
1 −t
−t +C

.
52 CHAPTER 4. GENERATING FUNCTIONS
Since the i.p.l. of the empty tree is 0, we should have
Q
0
= Q(0) = 0 and therefore, by setting t = 0, we
find C = 0. The final result is:
Q(t) =
2
(1 −t)
2
ln
1
1 −t

t
(1 −t)
2
.
We can now use the formula for ((nH
n
) (see the Sec-
tion 4.4 on Common Generating Functions) to ex-
tract the coefficient of t
n
:
Q
n
= [t
n
]
1
(1 −t)
2

2

ln
1
1 −t
+ 1

−(2 +t)

=
= 2[t
n+1
]
t
(1 −t)
2

ln
1
1 −t
+ 1

−[t
n
]
2 +t
(1 −t)
2
=
= 2(n+1)H
n+1
−2

−2
n

(−1)
n

−2
n −1

(−1)
n−1
=
= 2(n +1)H
n
+2 −2(n +1) −n = 2(n +1)H
n
−3n.
Thus we conclude with the average i.p.l.:
P
n
n!n
=
Q
n
n
= 2

1 +
1
n

H
n
−3.
This formula is asymptotic to 2 ln n+γ−3, and shows
that the average number of comparisons necessary to
retrieve any key in a binary tree is in the order of
O(ln n).
4.12 Height balanced binary
trees
We have been able to show that binary trees are a
“good” retrieving structure, in the sense that if the
elements, or keys, of a set ¦a
1
, a
2
, . . . , a
n
¦ are stored
in random order in a binary (search) tree, then the
expected average time for retrieving any key in the
tree is in the order of lnn. However, this behavior of
binary trees is not always assured; for example, if the
keys are stored in the tree in their proper order, the
resulting structure degenerates into a linear list and
the average retrieving time becomes O(n).
To avoid this drawback, at the beginning of the
1960’s, two Russian researchers, Adelson-Velski and
Landis, found an algorithm to store keys in a “height
balanced” binary tree, a tree for which the height of
the left subtree of every node K differs by at most
1 from the height of the right subtree of the same
node K. To understand this concept, let us define the
height of a tree as the highest level at which a node
in the tree is placed. The height is also the maximal
number of comparisons necessary to find any key in
the tree. Therefore, if we find a limitation for the
height of a class of trees, this is also a limitation for
the internal path length of the trees in the same class.
Formally, a height balanced binary tree is a tree such
that for every node K in it, if h

K
and h
′′
k
are the
heights of the two subtrees originating from K, then
[h

K
−h
′′
K
[ ≤ 1.
The algorithm of Adelson-Velski and Landis is very
important because, as we are now going to show,
height balanced binary trees assure that the retriev-
ing time for every key in the tree is O(ln n). Because
of that, height balanced binary trees are also known
as AVL trees, and the algorithm for building AVL-
trees from a set of n keys can be found in any book
on algorithms and data structures. Here we only wish
to perform a worst case analysis to prove that the re-
trieval time in any AVL tree cannot be larger than
O(ln n).
In order to perform our analysis, let us consider
to worst possible AVL tree. Since, by definition, the
height of the left subtree of any node cannot exceed
the height of the corresponding right subtree plus 1,
let us consider trees in which the height of the left
subtree of every node exceeds exactly by 1 the height
of the right subtree of the same node. In Figure 4.1
we have drawn the first cases. These trees are built in
a very simple way: every tree T
n
, of height n, is built
by using the preceding tree T
n−1
as the left subtree
and the tree T
n−2
as the right subtree of the root.
Therefore, the number of nodes in T
n
is just the sum
of the nodes in T
n−1
and in T
n−2
, plus 1 (the root),
and the condition on the heights of the subtrees of
every node is satisfied. Because of this construction,
T
n
can be considered as the “worst” tree of height n,
in the sense that every other AVL-tree of height n will
have at least as many nodes as T
n
. Since the height
n is an upper bound for the number of comparisons
necessary to retrieve any key in the tree, the average
retrieving time for every such tree will be ≤ n.
If we denote by [T
n
[ the number of nodes in the
tree T
n
, we have the simple recurrence relation:
[T
n
[ = [T
n−1
[ +[T
n−2
[ + 1.
This resembles the Fibonacci recurrence relation,
and, in fact, we can easily show that [T
n
[ = F
n+1
−1,
as is intuitively apparent from the beginning of the
sequence ¦0, 1, 2, 4, 7, 12, . . .¦. The proof is done by
mathematical induction. For n = 0 we have [T
0
[ =
F
1
− 1 = 1 − 1 = 0, and this is true; similarly we
proceed for n + 1. Therefore, let us suppose that for
every k < n we have [T
k+1
[ = F
k
− 1; this holds for
k = n−1 and k = n−2, and because of the recurrence
relation for [T
n
[ we have:
[T
n
[ = [T
n−1
[ +[T
n−2
[ + 1 =
= F
n
−1 +F
n−1
−1 + 1 = F
n+1
−1
since F
n
+F
n−1
= F
n+1
by the Fibonacci recurrence.
4.13. SOME SPECIAL RECURRENCES 53
T
0
T
1
T
2
T
3
T
4
T
5
,
,
,

,
,
,
,

d
d
,
,
,
,
,
,
,

e
e
d
d
¡
¡
,
,
,
,
,
,
,
,
,
,
,
,

e
e
d
d
¡
¡
d
d
d
d
¡
¡
¡
¡
Figure 4.1: Worst AVL-trees
We have shown that for large values of n we have
F
n
≈ φ
n
/

5; therefore we have [T
n
[ ≈ φ
n+1
/

5 −1
or φ
n+1


5([T
n
[+1). By passing to the logarithms,
we have: n ≈ log
φ
(

5([T
n
[ + 1)) − 1, and since all
logarithms are proportional, n = O(ln [T
n
[). As we
observed, every AVL-tree of height n has a number
of nodes not less than [T
n
[, and this assures that the
retrieving time for every AVL-tree with at most [T
n
[
nodes is bounded from above by log
φ
(

5([T
n
[+1))−1
4.13 Some special recurrences
Not all recurrence relations are linear and we had
occasions to deal with a different sort of relation when
we studied the Catalan numbers. They satisfy the
recurrence C
n
=
¸
n−1
k=0
C
k
C
n−k−1
, which however,
in this particular form, is only valid for n > 0. In
order to apply the method of generating functions,
we write it for n + 1:
C
n+1
=
n
¸
k=0
C
k
C
n−k
.
The right hand member is a convolution, and there-
fore, by the initial condition C
0
= 1, we obtain:
C(t) −1
t
= C(t)
2
or tC(t)
2
−C(t) + 1 = 0.
This is a second degree equation, which can be di-
rectly solved; for t = 0 we should have C(0) = C
0
=
1, and therefore the solution with the + sign before
the square root is to be ignored; we thus obtain:
C(t) =
1 −

1 −4t
2t
which was found in the section “The method of shift-
ing” in a completely different way.
The Bernoulli numbers were introduced by means
of the implicit relation:
n
¸
k=0

n + 1
k

B
k
= δ
n,0
.
We are now in a position to find out their expo-
nential generating function, i.e., the function B(t) =
((B
n
/n!), and prove some of their properties. The
defining relation can be written as:
n
¸
k=0

n + 1
n −k

B
n−k
=
=
n
¸
k=0
(n + 1)n (k + 2)
B
n−k
(n −k)!
=
=
n
¸
k=0
(n + 1)!
(k + 1)!
B
n−k
(n −k)!
= δ
n,0
.
If we divide everything by (n + 1)!, we obtain:
n
¸
k=0
1
(k + 1)!
B
n−k
(n −k)!
= δ
n,0
and since this relation holds for every n ∈ N, we
can pass to the generating functions. The left hand
member is a convolution, whose first factor is the shift
of the exponential function and therefore we obtain:
e
t
−1
t
B(t) = 1 or B(t) =
t
e
t
−1
.
The classical way to see that B
2n+1
= 0, ∀n > 0,
is to show that the function obtained from B(t) by
deleting the term of first degree is an even function,
and therefore should have all its coefficients of odd
order equal to zero. In fact we have:
B(t) +
t
2
=
t
e
t
−1
+
t
2
=
t
2
e
t
+ 1
e
t
−1
.
In order to see that this function is even, we substi-
tute t → −t and show that the function remains the
same:

t
2
e
−t
+ 1
e
−t
−1
= −
t
2
1 +e
t
1 −e
t
=
t
2
e
t
+ 1
e
t
−1
.
54 CHAPTER 4. GENERATING FUNCTIONS
Chapter 5
Riordan Arrays
5.1 Definitions and basic con-
cepts
A Riordan array is a couple of formal power series
D = (d(t), h(t)); if both d(t), h(t) ∈ T
0
, then the
Riordan array is called proper. The Riordan array
can be identified with the infinite, lower triangular
array (or triangle) (d
n,k
)
n,k∈N
defined by:
d
n,k
= [t
n
]d(t)(th(t))
k
(5.1.1)
In fact, we are mainly interested in the sequence of
functions iteratively defined by:

d
0
(t) = d(t)
d
k
(t) = d
k−1
(t)th(t) = d(t)(th(t))
k
These functions are the column generating functions
of the triangle.
Another way of characterizing a Riordan array
D = (d(t), h(t)) is to consider the bivariate gener-
ating function:
d(t, z) =

¸
k=0
d(t)(th(t))
k
z
k
=
d(t)
1 −tzh(t)
(5.1.2)
A common example of a Riordan array is the Pascal
triangle, for which we have d(t) = h(t) = 1/(1 − t).
In fact we have:
d
n,k
= [t
n
]
1
1 −t

t
1 −t

k
= [t
n−k
]
1
(1 −t)
k+1
=
=

−k −1
n −k

(−1)
n−k
=

n
n −k

=

n
k

and this shows that the generic element is actually a
binomial coefficient. By formula (5.1.2), we find the
well-known bivariate generating function d(t, z) =
(1 −t −tz)
−1
.
From our point of view, one of the most impor-
tant properties of Riordan arrays is the fact that the
sums involving the rows of a Riordan array can be
performed by operating a suitable transformation on
a generating function and then by extracting a coeffi-
cient from the resulting function. More precisely, we
prove the following theorem:
Theorem 5.1.1 Let D = (d(t), h(t)) be a Riordan
array and let f(t) be the generating function of the
sequence (f
k
)
k∈N
; then we have:
n
¸
k=0
d
n,k
f
k
= [t
n
]d(t)f(th(t)) (5.1.3)
Proof: The proof consists in a straight-forward com-
putation:
n
¸
k=0
d
n,k
f
k
=

¸
k=0
d
n,k
f
k
=
=

¸
k=0
[t
n
]d(t)(th(t))
k
f
k
=
= [t
n
]d(t)

¸
k=0
f
k
(th(t))
k
=
= [t
n
]d(t)f(th(t)).
In the case of Pascal triangle we obtain the Euler
transformation:
n
¸
k=0

n
k

f
k
= [t
n
]
1
1 −t
f

t
1 −t

which we proved by simple considerations on bino-
mial coefficients and generating functions. By con-
sidering the generating functions ((1) = 1/(1 − t),
(

(−1)
k

= 1/(1+t) and ((k) = t/(1−t)
2
, from the
previous theorem we immediately find:
row sums (r. s.)
¸
k
d
n,k
= [t
n
]
d(t)
1 −th(t)
alternating r. s.
¸
k
(−1)
k
d
n,k
= [t
n
]
d(t)
1 +th(t)
weighted r. s.
¸
k
kd
n,k
= [t
n
]
td(t)h(t)
(1 −th(t))
2
.
Moreover, by observing that
´
D = (d(t), th(t)) is a
Riordan array, whose rows are the diagonals of D,
55
56 CHAPTER 5. RIORDAN ARRAYS
we have:
diagonal sums
¸
k
d
n−k,k
= [t
n
]
d(t)
1 −t
2
h(t)
.
Obviously, this observation can be generalized to find
the generating function of any sum
¸
k
d
n−sk,k
for
every s ≥ 1. We obtain well-known results for the
Pascal triangle; for example, diagonal sums give:
¸
k

n −k
k

= [t
n
]
1
1 −t
1
1 −t
2
(1 −t)
−1
=
= [t
n
]
1
1 −t −t
2
= F
n+1
connecting binomial coefficients and Fibonacci num-
bers.
Another general result can be obtained by means of
two sequences (f
k
)
k∈N
and (g
k
)
k∈N
and their gener-
ating functions f(t), g(t). For p = 1, 2, . . ., the generic
element of the Riordan array (f(t), t
p−1
) is:
d
n,k
= [t
n
]f(t)(t
p
)
k
= [t
n−pk
]f(t) = f
n−pk
.
Therefore, by formula (5.1.3), we have:
⌊n/p⌋
¸
k=0
f
n−pk
g
k
= [t
n
]f(t)

g(y)

y = t
p

=
= [t
n
]f(t)g(t
p
).
This can be called the rule of generalized convolution
since it reduces to the usual convolution rule for p =
1. Suppose, for example, that we wish to sum one
out of every three powers of 2, starting with 2
n
and
going down to the lowest integer exponent ≥ 0; we
have:
S
n
=
⌊n/3⌋
¸
k=0
2
n−3k
= [t
n
]
1
1 −2t
1
1 −t
3
.
As we will learn studying asymptotics, an approxi-
mate value for this sum can be obtained by extracting
the coefficient of the first factor and then by multiply-
ing it by the second factor, in which t is substituted
by 1/2. This gives S
n
≈ 2
n+3
/7, and in fact we have
the exact value S
n
= ⌊2
n+3
/7⌋.
In a sense, the theorem on the sums involving the
Riordan arrays is a characterization for them; in fact,
we can prove a sort of inverse property:
Theorem 5.1.2 Let (d
n,k
)
n,k∈N
be an infinite tri-
angle such that for every sequence (f
k
)
k∈N
we have
¸
k
d
n,k
f
k
= [t
n
]d(t)f(th(t)), where f(t) is the gen-
erating function of the sequence and d(t), h(t) are two
f.p.s. not depending on f(t). Then the triangle de-
fined by the Riordan array (d(t), h(t)) coincides with
(d
n,k
)
n,k∈N
.
Proof: For every k ∈ N take the sequence which
is 0 everywhere except in the kth element f
k
= 1.
The corresponding generating function is f(t) = t
k
and we have
¸

i=0
d
n,i
f
i
= d
n,k
. Hence, according to
the theorem’s hypotheses, we find (
t
(d
n,k
)
n,k∈N
=
d
k
(t) = d(t)(th(t))
k
, and this corresponds to the ini-
tial definition of column generating functions for a
Riordan array, for every k = 1, 2, . . ..
5.2 The algebraic structure of
Riordan arrays
The most important algebraic property of Riordan
arrays is the fact that the usual row-by-column prod-
uct of two Riordan arrays is a Riordan array. This is
proved by considering two Riordan arrays (d(t), h(t))
and (a(t), b(t)) and performing the product, whose
generic element is
¸
j
d
n,j
f
j,k
, if d
n,j
is the generic
element in (d(t), h(t)) and f
j,k
is the generic element
in (a(t), b(t)). In fact we have:

¸
j=0
d
n,j
f
j,k
=
=

¸
j=0
[t
n
]d(t)(th(t))
j
[y
j
]a(y)(yb(y))
k
=
= [t
n
]d(t)

¸
j=0
(th(t))
j
[y
j
]a(y)(yb(y))
k
=
= [t
n
]d(t)a(th(t))(th(t)b(th(t)))
k
.
By definition, the last expression denotes the generic
element of the Riordan array (f(t), g(t)) where f(t) =
d(t)a(th(t)) and g(t) = h(t)b(th(t)). Therefore we
have:
(d(t), h(t)) (a(t), b(t)) = (d(t)a(th(t)), h(t)b(th(t))).
(5.2.1)
This expression is particularly important and is the
basis for many developments of the Riordan array
theory.
The product is obviously associative, and we ob-
serve that the Riordan array (1, 1) acts as the neutral
element or identity. In fact, the array (1, 1) is every-
where 0 except for the elements on the main diagonal,
which are 1. Observe that this array is proper.
Let us now suppose that (d(t), h(t)) is a proper
Riordan array. By formula (5.2.1), we immediately
see that the product of two proper Riordan arrays is
proper; therefore, we can look for a proper Riordan
array (a(t), b(t)) such that (d(t), h(t)) (a(t), b(t)) =
(1, 1). If this is the case, we should have:
d(t)a(th(t)) = 1 and h(t)b(th(t)) = 1.
5.3. THE A-SEQUENCE FOR PROPER RIORDAN ARRAYS 57
By setting y = th(t) we have:
a(y) =

d(t)
−1

t = yh(t)
−1

b(y) =

h(t)
−1

t = yh(t)
−1

.
Here we are in the hypotheses of the Lagrange Inver-
sion Formula, and therefore there is a unique function
t = t(y) such that t(0) = 0 and t = yh(t)
−1
. Besides,
being d(t), h(t) ∈ T
0
, the two f.p.s. a(y) and b(y) are
uniquely defined. We have therefore proved:
Theorem 5.2.1 The set / of proper Riordan arrays
is a group with the operation of row-by-column prod-
uct defined functionally by relation (5.2.1).
It is a simple matter to show that some important
classes of Riordan arrays are subgroups of /:
• the set of the Riordan arrays (f(t), 1) is an in-
variant subgroup of /; it is called the Appell
subgroup;
• the set of the Riordan arrays (1, g(t)) is a sub-
group of / and is called the subgroup of associ-
ated operators or the Lagrange subgroup;
• the set of Riordan arrays (f(t), f(t)) is a sub-
group of / and is called the Bell subgroup. Its
elements are also known as renewal arrays.
The first two subgroups have already been consid-
ered in the Chapter on “Formal Power Series” and
show the connection between f.p.s. and Riordan ar-
rays. The notations used in that Chapter are thus
explained as particular cases of the most general case
of (proper) Riordan arrays.
Let us now return to the formulas for a Riordan
array inverse. If h(t) is any fixed invertible f.p.s., let
us define:
d
h
(t) =

d(y)
−1

y = th(y)
−1

so that we can write (d(t), h(t))
−1
= (d
h
(t), h
h
(t)).
By the product formula (5.2.1) we immediately find
the identities:
d(th
h
(t)) = d
h
(t)
−1
d
h
(th(t)) = d(t)
−1
h(th
h
(t)) = h
h
(t)
−1
h
h
(th(t)) = h(t)
−1
which can be reduced to the single and basic rule:
f(th
h
(t)) = f
h
(t)
−1
∀f(t) ∈ T
0
.
Observe that obviously f
h
(t) = f(t).
We wish now to find an explicit expression for the
generic element d
n,k
in the inverse Riordan array
(d(t), h(t))
−1
in terms of d(t) and h(t). This will be
done by using the LIF. As we observed in the first sec-
tion, the bivariate generating function for (d(t), h(t))
is d(t)/(1 −tzh(t)) and therefore we have:
d
n,k
= [t
n
z
k
]
d
h
(t)
1 −tzh
h
(t)
=
= [z
k
][t
n
]
¸
d
h
(t)
1 −zy

y = th
h
(t)

.
By the formulas above, we have:
y = th
h
(t) = th(th
h
(t))
−1
= th(y)
−1
which is the same as t = yh(y). Therefore we find:
d
h
(t) = d
h
(yh(y)) = d(t)
−1
, and consequently:
d
n,k
= [z
k
][t
n
]
¸
d(y)
−1
1 −zy

y = th(y)
−1

=
= [z
k
]
1
n
[y
n−1
]

d
dy
d(y)
−1
1 −zy

1
h(y)
n
=
= [z
k
]
1
n
[y
n−1
]

z
d(y)(1 −zy)
2


d

(y)
d(y)
2
(1 −zy)

1
h(y)
n
=
= [z
k
]
1
n
[y
n−1
]


¸
r=0
z
r+1
y
r
(r + 1)−

d

(y)
d(y)

¸
r=0
z
r
y
r

1
d(y)h(y)
n
=
=
1
n
[y
n−1
]

ky
k−1
−y
k
d

(y)
d(y)

1
d(y)h(y)
n
=
=
1
n
[y
n−k
]

k −
yd

(y)
d(y)

1
d(y)h(y)
n
.
This is the formula we were looking for.
5.3 The A-sequence for proper
Riordan arrays
Proper Riordan arrays play a very important role
in our approach. Let us consider a Riordan array
D = (d(t), h(t)), which is not proper, but d(t) ∈
T
0
. Since h(0) = 0, an s > 0 exists such that
h(t) = h
s
t
s
+h
s+1
t
s+1
+ and h
s
= 0. If we define
´
h(t) = h
s
+h
s+1
t+ , then
´
h(t) ∈ T
0
. Consequently,
the Riordan array
´
D = (d(t),
´
h(t)) is proper and the
rows of D can be seen as the s-diagonals (
´
d
n−sk
)
k∈N
of
´
D. Fortunately, for proper Riordan arrays, Rogers
has found an important characterization: every ele-
ment d
n+1,k+1
, n, k ∈ N, can be expressed as a linear
combination of the elements in the preceding row,
i.e.:
d
n+1,k+1
= a
0
d
n,k
+a
1
d
n,k+1
+a
2
d
n,k+2
+ =
58 CHAPTER 5. RIORDAN ARRAYS
=

¸
j=0
a
j
d
n,k+j
. (5.3.1)
The sum is actually finite and the sequence A =
(a
k
)
k∈N
is fixed. More precisely, we can prove the
following theorem:
Theorem 5.3.1 An infinite lower triangular array
D = (d
n,k
)
n,k∈N
is a Riordan array if and only if a
sequence A = ¦a
0
= 0, a
1
, a
2
, . . .¦ exists such that for
every n, k ∈ N relation (5.3.1) holds
Proof: Let us suppose that D is the Riordan
array (d(t), h(t)) and let us consider the Riordan
array (d(t)h(t), h(t)); we define the Riordan array
(A(t), B(t)) by the relation:
(A(t), B(t)) = (d(t), h(t))
−1
(d(t)h(t), h(t))
or:
(d(t), h(t)) (A(t), B(t)) = (d(t)h(t), h(t)).
By performing the product we find:
d(t)A(th(t)) = d(t)h(t) and h(t)B(th(t)) = h(t).
The latter identity gives B(th(t)) = 1 and this implies
B(t) = 1. Therefore we have (d(t), h(t)) (A(t), 1) =
(d(t)h(t), h(t)). The element f
n,k
of the left hand
member is
¸

j=0
d
n,j
a
k−j
=
¸

j=0
d
n,k+j
a
j
, if as
usual we interpret a
k−j
as 0 when k < j. The same
element in the right hand member is:
[t
n
]d(t)h(t)(th(t))
k
=
= [t
n+1
]d(t)(th(t))
k+1
= d
n+1,k+1
.
By equating these two quantities, we have the iden-
tity (5.3.1). For the converse, let us observe that
(5.3.1) uniquely defines the array D when the ele-
ments ¦d
0,0
, d
1,0
, d
2,0
, . . .¦ of column 0 are given. Let
d(t) be the generating function of this column, A(t)
the generating function of the sequence A and de-
fine h(t) as the solution of the functional equation
h(t) = A(th(t)), which is uniquely determined be-
cause of our hypothesis a
0
= 0. We can therefore
consider the proper Riordan array
´
D = (d(t), h(t));
by the first part of the theorem,
´
D satisfies relation
(5.3.1), for every n, k ∈ N and therefore, by our previ-
ous observation, it must coincide with D. This com-
pletes the proof.
The sequence A = (a
k
)
k∈N
is called the A-sequence
of the Riordan array D = (d(t), h(t)) and it only
depends on h(t). In fact, as we have shown during
the proof of the theorem, we have:
h(t) = A(th(t)) (5.3.2)
and this uniquely determines A when h(t) is given
and, vice versa, h(t) is uniquely determined when A
is given.
The A-sequence for the Pascal triangle is the so-
lution A(y) of the functional equation 1/(1 − t) =
A(t/(1 − t)). The simple substitution y = t/(1 − t)
gives A(y) = 1 +y, corresponding to the well-known
basic recurrence of the Pascal triangle:

n+1
k+1

=

n
k

+

n
k+1

. At this point, we realize that we could
have started with this recurrence relation and directly
found A(y) = 1+y. Now, h(t) is defined by (5.3.2) as
the solution of h(t) = 1+th(t), and this immediately
gives h(t) = 1/(1−t). Furthermore, since column 0 is
¦1, 1, 1, . . .¦, we have proved that the Pascal triangle
corresponds to the Riordan array (1/(1−t), 1/(1−t))
as initially stated.
The pair of functions d(t) and A(t) completely
characterize a proper Riordan array. Another type
of characterization is obtained through the following
observation:
Theorem 5.3.2 Let (d
n,k
)
n,k∈N
be any infinite,
lower triangular array with d
n,n
= 0, ∀n ∈ N (in
particular, let it be a proper Riordan array); then a
unique sequence Z = (z
k
)
k∈N
exists such that every
element in column 0 can be expressed as a linear com-
bination of all the elements in the preceding row, i.e.:
d
n+1,0
= z
0
d
n,0
+z
1
d
n,1
+z
2
d
n,2
+ =

¸
j=0
z
j
d
n,j
.
(5.3.3)
Proof: Let z
0
= d
1,0
/d
0,0
. Now we can uniquely
determine the value of z
1
by expressing d
2,0
in terms
of the elements in row 1, i.e.:
d
2,0
= z
0
d
1,0
+z
1
d
1,1
or z
1
=
d
0,0
d
2,0
−d
2
1,0
d
0,0
d
1,1
.
In the same way, we determine z
2
by expressing d
3,0
in terms of the elements in row 2, and by substituting
the values just obtained for z
0
and z
1
. By proceeding
in the same way, we determine the sequence Z in a
unique way.
The sequence Z is called the Z-sequence for the
(Riordan) array; it characterizes column 0, except
for the element d
0,0
. Therefore, we can say that
the triple (d
0,0
, A(t), Z(t)) completely characterizes
a proper Riordan array. To see how the Z-sequence
is obtained by starting with the usual definition of a
Riordan array, let us prove the following:
Theorem 5.3.3 Let (d(t), h(t)) be a proper Riordan
array and let Z(t) be the generating function of the
array’s Z-sequence. We have:
d(t) =
d
0,0
1 −tZ(th(t))
5.4. SIMPLE BINOMIAL COEFFICIENTS 59
Proof: By the preceding theorem, the Z-sequence
exists and is unique. Therefore, equation (5.3.3) is
valid for every n ∈ N, and we can go on to the gener-
ating functions. Since d(t)(th(t))
k
is the generating
function for column k, we have:
d(t) −d
0,0
t
=
= z
0
d(t) +z
1
d(t)th(t) +z
2
d(t)(th(t))
2
+ =
= d(t)(z
0
+z
1
th(t) +z
2
(th(t))
2
+ ) =
= d(t)Z(th(t)).
By solving this equation in d(t), we immediately find
the relation desired.
The relation can be inverted and this gives us the
formula for the Z-sequence:
Z(y) =
¸
d(t) −d
0,0
td(t)

t = yh(t)
−1

.
We conclude this section by giving a theorem,
which characterizes renewal arrays by means of the
A- and Z-sequences:
Theorem 5.3.4 Let d(0) = h(0) = 0. Then d(t) =
h(t) if and only if: A(y) = d(0) +yZ(y).
Proof: Let us assume that A(y) = d(0) + yZ(y) or
Z(y) = (A(y) − d(0))/y. By the previous theorem,
we have:
d(t) =
d(0)
1 −tZ(th(t))
=
=
d(0)
1 −(tA(th(t)) −d(0)t)/th(t)
=
=
d(0)th(t)
d(0)t
= h(t),
because A(th(t)) = h(t) by formula (5.3.2). Vice
versa, by the formula for Z(y), we obtain from the
hypothesis d(t) = h(t):
d(0) +yZ(y) =
=
¸
d(0) +y

1
t

d(0)
th(t)

t = yh(t)
−1

=
=
¸
d(0) +
th(t)
t

d(0)th(t)
th(t)

t = yh(t)
−1

=
=

h(t)

t = yh(t)
−1

= A(y).
5.4 Simple binomial coeffi-
cients
Let us consider simple binomial coefficients, i.e., bi-
nomial coefficients of the form

n+ak
m+bk

, where a, b are
two parameters and k is a non-negative integer vari-
able. Depending if we consider n a variable and m a
parameter, or vice versa, we have two different infi-
nite arrays (d
n,k
) or (
´
d
m,k
), whose elements depend
on the parameters a, b, m or a, b, n, respectively. In
either case, if some conditions on a, b hold, we have
Riordan arrays and therefore we can apply formula
(5.1.3) to find the value of many sums.
Theorem 5.4.1 Let d
n,k
and
´
d
m,k
be as above. If
b > a and b − a is an integer, then D = (d
n,k
) is
a Riordan array. If b < 0 is an integer, then
´
D =
(
´
d
m,k
) is a Riordan array. We have:
D =

t
m
(1 −t)
m+1
,
t
b−a−1
(1 −t)
b

´
D =

(1 +t)
n
,
t
−b−1
(1 +t)
−a

.
Proof: By using well-known properties of binomial
coefficients, we find:
d
n,k
=

n +ak
m+bk

=

n +ak
n −m+ak −bk

=
=

−n −ak +n −m+ak −bk −1
n −m+ak −bk

(−1)
n−m+ak−bk
=
=

−m−bk −1
(n −m) + (a −b)k

(−1)
n−m+ak−bk
=
= [t
n−m+ak−bk
]
1
(1 −t)
m+1+bk
=
= [t
n
]
t
m
(1 −t)
m+1

t
b−a
(1 −t)
b

k
;
and:
´
d
m,k
= [t
m+bk
](1 +t)
n+ak
=
= [t
m
](1 +t)
n
(t
−b
(1 +t)
a
)
k
.
The theorem now directly follows from (5.1.1)
For m = a = 0 and b = 1 we again find the Riordan
array of the Pascal triangle. The sum (5.1.3) takes
on two specific forms which are worth being stated
explicitly:
¸
k

n +ak
m+bk

f
k
=
= [t
n
]
t
m
(1 −t)
m+1
f

t
b−a
(1 −t)
b

b > a (5.4.1)
¸
k

n +ak
m+bk

f
k
=
= [t
m
](1 +t)
n
f(t
−b
(1 +t)
a
) b < 0. (5.4.2)
60 CHAPTER 5. RIORDAN ARRAYS
If m and n are independent of each other, these
relations can also be stated as generating function
identities. The binomial coefficient

n+ak
m+bk

is so gen-
eral that a large number of combinatorial sums can
be solved by means of the two formulas (5.4.1) and
(5.4.2).
Let us begin our set of examples with a simple sum;
by the theorem above, the binomial coefficients

n−k
m

corresponds to the Riordan array (t
m
/(1 +t)
m+1
, 1);
therefore, by the formula concerning the row sums,
we have:
¸
k

n −k
m

= [t
n
]
t
m
(1 −t)
m+1
1
1 −t
=
= [t
n−m
]
1
(1 −t)
m+2
=

n + 1
m+ 1

.
Another simple example is the sum:
¸
k

n
2k + 1

5
k
=
= [t
n
]
t
(1 −t)
2
¸
1
1 −5y

y =
t
2
(1 −t)
2

=
=
1
2
[t
n
]
2t
1 −2t −4t
2
= 2
n−1
F
n
.
The following sum is a more interesting case. From
the generating function of the Catalan numbers we
immediately find:
¸
k

n +k
m+ 2k

2k
k

(−1)
k
k + 1
=
= [t
n
]
t
m
(1 −t)
m+1
¸√
1 + 4y −1
2y

y =
t
(1 −t)
2

=
= [t
n−m
]
1
(1 −t)
m+1

1 +
4t
(1 −t)
2
−1

(1 −t)
2
2t
=
= [t
n−m
]
1
(1 −t)
m
=

n −1
m−1

.
In the following sum we use the bisection formulas.
Because the generating function for

z+1
k

2
k
is (1 +
2t)
z+1
, we have:
(

z + 1
2k + 1

2
2k+1

=
=
1
2

t

(1 + 2

t)
z+1
−(1 −2

t)
z+1

.
By applying formula (5.4.2):
¸
k

z + 1
2k + 1

z −2k
n −k

2
2k+1
=
= [t
n
](1 +t)
z
¸
(1 + 2

y)
z+1
2

y


(1 −2

y)
z+1
2

y

y =
t
(1 +t)
2

=
= [t
n
](1 +t)
z+1

(1 +t + 2

t)
z+1
2

t(1 +t)
z+1


(1 +t −2

t)
z+1
2

t(1 +t)
z+1

=
= [t
2n+1
](1 +t)
2z+2
=

2z + 2
2n + 1

;
in the last but one passage, we used backwards the
bisection rule, since (1 +t ±2

t)
z+1
= (1 ±

t)
2z+2
.
We solve the following sum by using (5.4.2):
¸
k

2n −2k
m−k

n
k

(−2)
k
=
= [t
m
](1 +t)
2n
¸
(1 −2y)
n

y =
t
(1 −t)
2

=
= [t
m
](1 +t
2
)
n
=

n
m/2

where the binomial coefficient is to be taken as zero
when m is odd.
5.5 Other Riordan arrays from
binomial coefficients
Other Riordan arrays can be found by using the the-
orem in the previous section and the rule (α = 0, if
− is considered):
α ±β
α

α
β

=

α
β

±

α −1
β −1

.
For example we find:
2n
n +k

n +k
2k

=
2n
n +k

n +k
n −k

=
=

n +k
n −k

+

n +k −1
n −k −1

=
=

n +k
2k

+

n −1 +k
2k

.
Hence, by formula (5.4.1), we have:
¸
k
2n
n +k

n +k
n −k

f
k
=
=
¸
k

n +k
2k

f
k
+
¸
k

n −1 +k
2k

f
k
=
= [t
n
]
1
1 −t
f

t
(1 −t)
2

+
+ [t
n−1
]
1
1 −t
f

t
(1 −t)
2

=
= [t
n
]
1 +t
1 −t
f

t
(1 −t)
2

.
5.6. BINOMIAL COEFFICIENTS AND THE LIF 61
This proves that the infinite triangle of the elements
2n
n+k

n+k
2k

is a proper Riordan array and many identi-
ties can be proved by means of the previous formula.
For example:
¸
k
2n
n +k

n +k
n −k

2k
k

(−1)
k
=
= [t
n
]
1 +t
1 −t
¸
1

1 + 4y

y =
t
(1 −t)
2

=
= [t
n
]1 = δ
n,0
,
¸
k
2n
n +k

n +k
n −k

2k
k

(−1)
k
k + 1
=
= [t
n
]
1 +t
1 −t
¸√
1 + 4y −1
2y

y =
t
(1 −t)
2

=
= [t
n
](1 +t) = δ
n,0

n,1
.
The following is a quite different case. Let f(t) =
((f
k
) and:
G(t) = (

f
k
k

=

t
0
f(τ) −f
0
τ
dτ.
Obviously we have:

n −k
k

=
n −k
k

n −k −1
k −1

except for k = 0, when the left-hand side is 1 and the
right-hand side is not defined. By formula (5.4.1):
¸
k
n
n −k

n −k
k

f
k
=
= f
0
+n

¸
k=1

n −k −1
k −1

f
k
k
=
= f
0
+n[t
n
]G

t
2
1 −t

. (5.5.1)
This gives an immediate proof of the following for-
mula known as Hardy’s identity:
¸
k
n
n −k

n −k
k

(−1)
k
=
= [t
n
] ln
1 −t
2
1 +t
3
=
= [t
n
]

ln
1
1 +t
3
−ln
1
1 −t
2

=
=

(−1)
n
2/n if 3 divides n
(−1)
n−1
/n otherwise.
We also immediately obtain:
¸
k
1
n −k

n −k
k

=
φ
n
+
´
φ
n
n
where φ is the golden ratio and
´
φ = −φ
−1
. The
reader can generalize formula (5.5.1) by using the
change of variable t → pt and prove other formulas.
The following one is known as Riordan’s old identity:
¸
k
n
n −k

n −k
k

(a +b)
n−2k
(−ab)
k
= a
n
+b
n
while this is a generalization of Hardy’s identity:
¸
k
n
n −k

n −k
k

x
n−2k
(−1)
k
=
=
(x +

x
2
−4)
n
+ (x −

x
2
−4)
n
2
n
.
5.6 Binomial coefficients and
the LIF
In a few cases only, the formulas of the previous sec-
tions give the desired result when the m and n in
the numerator and denominator of a binomial coeffi-
cient are related between them. In fact, in that case,
we have to extract the coefficient of t
n
from a func-
tion depending on the same variable n (or m). This
requires to apply the Lagrange Inversion Formula, ac-
cording to the diagonalization rule. Let us suppose
we have the binomial coefficient

2n−k
n−k

and we wish
to know whether it corresponds to a Riordan array
or not. We have:

2n −k
n −k

= [t
n−k
](1 +t)
2n−k
=
= [t
n
](1 +t)
2n

t
1 +t

k
.
The function (1 + t)
2n
cannot be assumed as the
d(t) function of a Riordan array because it varies
as n varies. Therefore, let us suppose that k is
fixed; we can apply the diagonalization rule with
F(t) = (t/(1 + t))
k
and φ(t) = (1 + t)
2
, and try to
find a true generating function. We have to solve the
equation:
w = tφ(w) or w = t(1 +w)
2
.
This equation is tw
2
− (1 − 2t)w + t = 0 and we are
looking for the unique solution w = w(t) such that
w(0) = 0. This is:
w(t) =
1 −2t −

1 −4t
2t
.
We now perform the necessary computations:
F(w) =

w
1 +w

k
=
62 CHAPTER 5. RIORDAN ARRAYS
=

1 −2t −

1 −4t
1 −

1 −4t

k
=
=

1 −

1 −4t
2

k
;
furthermore:
1
1 −tφ

(w)
=
1
1 −2t(1 +w)
=
1

1 −4t
.
Therefore, the diagonalization gives:

2n −k
n −k

= [t
n
]
1

1 −4t

1 −

1 −4t
2

k
.
This shows that the binomial coefficient is the generic
element of the Riordan array:
D =

1

1 −4t
,
1 −

1 −4t
2t

.
As a check, we observe that column 0 contains all the
elements with k = 0, i.e.,

2n
n

, and this is in accor-
dance with the generating function d(t) = 1/

1 −4t.
A simple example is:
n
¸
k=0

2n −k
n −k

2
k
=
= [t
n
]
1

1 −4t
¸
1
1 −2y

y =
1 −

1 −4t
2

=
= [t
n
]
1

1 −4t
1

1 −4t
= [t
n
]
1
1 −4t
= 4
n
.
By using the diagonalization rule as above, we can
show that:

2n +ak
n −ck

k∈N
=
=

1

1 −4t
, t
c−1

1 −

1 −4t
2t

a+2c

.
An interesting example is given by the following al-
ternating sum:
¸
k

2n
n −3k

(−1)
k
=
= [t
n
]
1

1 −4t
¸
1
1 +y

y = t
3

1 −

1 −4t
2t

6
¸
= [t
n
]

1
2

1 −4t
+
1 −t
2(1 −3t)

=
=
1
2

2n
n

+ 3
n−1
+
δ
n,0
6
.
The reader is invited to solve, in a similar way, the
corresponding non-alternating sum.
In the same way we can deal with binomial coeffi-
cients of the form

pn+ak
n−ck

, but in this case, in order
to apply the LIF, we have to solve an equation of de-
gree p > 2. This creates many difficulties, and we do
not insist on it any longer.
5.7 Coloured walks
In the section “Walks, trees and Catalan numbers”
we introduced the concept of a walk or path on the
integral lattice Z
2
. The concept can be generalized by
defining a walk as a sequence of steps starting from
the origin and composed by three kinds of steps:
1. east steps, which go from the point (x, y) to (x+
1, y);
2. diagonal steps, which go from the point (x, y) to
(x + 1, y + 1);
3. north steps, which go from the point (x, y) to
(x, y + 1).
A colored walk is a walk in which every kind of step
can assume different colors; we denote by a, b, c (a >
0, b, c ≥ 0) the number of colors the east, diagonal
and north steps can be. We discuss complete colored
walks, i.e., walks without any restriction, and under-
diagonal walks, i.e., walks that never go above the
main diagonal x −y = 0. The length of a walk is the
number of its steps, and we denote by d
n,k
the num-
ber of colored walks which have length n and reach a
distance k from the main diagonal, i.e., the last step
ends on the diagonal x − y = k ≥ 0. A colored walk
problem is any (counting) problem corresponding to
colored walks; a problem is called symmetric if and
only if a = c.
We wish to point out that our considerations are
by no means limited to the walks on the integral lat-
tice. Many combinatorial problems can be proved
to be equivalent to some walk problems; bracketing
problems are a typical example and, in fact, a vast
literature exists on walk problems.
Let us consider d
n+1,k+1
, i.e., the number of col-
ored walks of length n+1 reaching the distance k +1
from the main diagonal. We observe that each walk
is obtained in a unique way as:
1. a walk of length n reaching the distance k from
the main diagonal, followed by any of the a east
steps;
2. a walk of length n reaching the distance k + 1
from the main diagonal, followed by any of the b
diagonal steps;
3. a walk of length n reaching the distance k + 2
from the main diagonal, followed by any of the
c north steps.
Hence we have: d
n+1,k+1
= ad
n,k
+bd
n,k+1
+cd
n,k+2
.
This proves that A = ¦a, b, c¦ is the A-sequence of
(d
n,k
)
n,k∈N
, which therefore is a proper Riordan ar-
ray. This significant fact can be stated as:
5.7. COLOURED WALKS 63
Theorem 5.7.1 Let d
n,k
be the number of colored
walks of length n reaching a distance k from the main
diagonal, then the infinite triangle (d
n,k
)
n,k∈N
is a
proper Riordan array.
The Pascal, Catalan and Motzkin triangles define
walking problems that have different values of a, b, c.
When c = 0, it is easily proved that d
n,k
=

n
k

a
k
b
n−k
and so we end up with the Pascal triangle. Con-
sequently, we assume c = 0. For any given triple
(a, b, c) we obtain one type of array from complete
walks and another from underdiagonal walks. How-
ever, the function h(t), that only depends on the A-
sequence, is the same in both cases, and we can find it
by means of formula (5.3.2). In fact, A(t) = a+bt+ct
2
and h(t) is the solution of the functional equation
h(t) = a +bth(t) +ct
2
h(t)
2
having h(0) = 0:
h(t) =
1 −bt −

1 −2bt +b
2
t
2
−4act
2
2ct
2
(5.7.1)
The radicand 1 − 2bt + (b
2
− 4ac)t
2
= (1 − (b +
2

ac)t)(1 − (b − 2

ac)t) will be simply denoted by
∆.
Let us now focus our attention on underdiagonal
walks. If we consider d
n+1,0
, we observe that every
walk returning to the main diagonal can only be ob-
tained from another walk returning to the main diag-
onal followed by any diagonal step, or a walk ending
at distance 1 from the main diagonal followed by any
north step. Hence, we have d
n+1,0
= bd
n,0
+ cd
n,1
and in the column generating functions this corre-
sponds to d(t) − 1 = btd(t) + ctd(t)th(t). From this
relation we easily find d(t) = (1/a)h(t), and therefore
by (5.7.1) the Riordan array of underdiagonal colored
walk is:
(d
n,k
)
n,k∈N
=

1 −bt −


2act
2
,
1 −bt −


2ct
2

.
In current literature, major importance is usually
given to the following three quantities:
1. the number of walks returning to the main diag-
onal; this is d
n
= [t
n
]d(t), for every n,
2. the total number of walks of length n; this is
α
n
=
¸
n
k=0
d
n,k
, i.e., the value of the row sums
of the Riordan array;
3. the average distance from the main diagonal of
all the walks of length n; this is δ
n
=
¸
n
k=0
kd
n,k
,
which is the weighted row sum of the Riordan
array, divided by α
n
.
In Chapter 7 we will learn how to find an asymp-
totic approximation for d
n
. With regard to the last
two points, the formulas for the row sums and the
weighted row sums given in the first section allow us
to find the generating functions α(t) of the total num-
ber α
n
of underdiagonal walks of length n, and δ(t)
of the total distance δ
n
of these walks from the main
diagonal:
α(t) =
1
2at
1 −(b + 2a)t −


(a +b +c)t −1
δ(t) =
1
4at

1 −(b + 2a)t −


(a +b +c)t −1

2
.
In the symmetric case these formulas simplify as fol-
lows:
α(t) =
1
2at

1 −(b −2a)t
1 −(b + 2a)t
−1

δ(t) =
1
2at

1 −bt
1 −(b + 2a)t

1 −(b −2a)t
1 −(b + 2a)t

.
The alternating row sums and the diagonal sums
sometimes have some combinatorial significance as
well, and so they can be treated in the same way.
The study of complete walks follows the same lines
and we only have to derive the form of the corre-
sponding Riordan array, which is:
(d
n,k
)
n,k∈N
=

1


,
1 −bt −


2ct
2

.
The proof is as follows. Since a complete walk can
go above the main diagonal, the array (d
n,k
)
n,k∈N
is
only the right part of an infinite triangle, in which
k can also assume the negative values. By following
the logic of the theorem above, we see that the gen-
erating function of the nth row is ((c/w) +b +aw)
n
,
and therefore the bivariate generating function of the
extended triangle is:
d(t, w) =
¸
n

c
w
+b +aw

n
t
n
=
=
1
1 −(aw +b +c/w)t
.
If we expand this expression by partial fractions, we
get:
d(t, w) =
1

1
1 −
1−bt−


2ct
w

1
1 −
1−bt+


2ct
w

=
1

1
1 −
1−bt−


2ct
w
+
+
1 −bt −


2at
1
w
1
1 −
1−bt−


2ct
1
w

.
64 CHAPTER 5. RIORDAN ARRAYS
The first term represents the right part of the ex-
tended triangle and this corresponds to k ≥ 0,
whereas the second term corresponds to the left part
(k < 0). We are interested in the right part, and the
expression can be written as:
1


1
1 −
1−bt−


2ct
w
=
1


¸
k

1 −bt −


2ct

k
w
k
which immediately gives the form of the Riordan ar-
ray.
5.8 Stirling numbers and Rior-
dan arrays
The connection between Riordan arrays and Stirling
numbers is not immediate. If we examine the two in-
finite triangles of the Stirling numbers of both kinds,
we immediately realize that they are not Riordan ar-
rays. It is not difficult to obtain the column generat-
ing functions for the Stirling numbers of the second
kind; by starting with the recurrence relation:

n + 1
k + 1

= (k + 1)

n
k + 1

+

n
k
¸
and the obvious generating function S
0
(t) = 1, we
can specialize the recurrence (valid for every n ∈ N)
to the case k = 0. This gives the relation between
generating functions:
S
1
(t) −S
1
(0)
t
= S
1
(t) +S
0
(t);
because S
1
(0) = 0, we immediately obtain S
1
(t) =
t/(1−t), which is easily checked by looking at column
1 in the array. In a similar way, by specializing the
recurrence relation to k = 1, we find S
2
(t) = 2tS
2
(t)+
tS
1
(t), whose solution is:
S
2
(t) =
t
2
(1 −t)(1 −2t)
.
This proves, in an algebraic way, that
¸
n
2
¸
= 2
n−1
−1,
and also indicates the form of the generating function
for column m:
S
m
(t) = (

n
m
¸
n∈N
=
t
m
(1 −t)(1 −2t) (1 −mt)
which is now proved by induction when we specialize
the recurrence relation above to k = m. This is left
to the reader as a simple exercise.
The generating functions for the Stirling numbers
of the first kind are not so simple. However, let us
go on with the Stirling numbers of the second kind
proceeding in the following way; if we multiply the
recurrence relation by (k +1)!/(n+1)! we obtain the
new relation:
(k + 1)!
(n + 1)!

n + 1
k + 1

=
=
(k + 1)!
n!

n
k + 1

k + 1
n + 1
+
k!
n!

n
k
¸
k + 1
n + 1
.
If we denote by d
n,k
the quantity k!
¸
n
k
¸
/n!, this is a
recurrence relation for d
n,k
, which can be written as:
(n + 1)d
n+1,k+1
= (k + 1)d
n,k+1
+ (k + 1)d
n,k
.
Let us now proceed as above and find the column
generating functions for the new array (d
n,k
)
n,k∈N
.
Obviously, d
0
(t) = 1; by setting k = 0 in the new
recurrence:
(n + 1)d
n+1,1
= d
n,1
+d
n,0
and passing to generating functions: d

1
(t) = d
1
(t) +
1. The solution of this simple differential equation
is d
1
(t) = e
t
− 1 (the reader can simply check this
solution, if he or she prefers). We can now go on
by setting k = 1 in the recurrence; we obtain: (n +
1)d
n+1,2
= 2d
n,2
+ 2d
n,1
, or d

2
(t) = 2d
2
(t) + 2(e
t

1). Again, this differential equation has the solution
d
2
(t) = (e
t
− 1)
2
, and this suggests that, in general,
we have: d
k
(t) = (e
t
− 1)
k
. A rigorous proof of this
fact can be obtained by mathematical induction; the
recurrence relation gives: d

k+1
(t) = (k +1)d
k+1
(t) +
(k + 1)d
k
(t). By the induction hypothesis, we can
substitute d
k
(t) = (e
t
−1)
k
and solve the differential
equation thus obtained. In practice, we can simply
verify that d
k+1
(t) = (e
t
−1)
k+1
; by substituting, we
have:
(k + 1)e
t
(e
t
−1)
k
=
(k + 1)(e
t
−1)
k+1
+ (k + 1)(e
t
−1)
k
and this equality is obviously true.
The form of this generating function:
d
k
(t) = (

k!
n!

n
k
¸

n∈N
= (e
t
−1)
k
proves that (d
n,k
)
n,k∈N
is a Riordan array having
d(t) = 1 and th(t) = (e
t
− 1). This fact allows us
to prove algebraically a lot of identities concerning
the Stirling numbers of the second kind, as we shall
see in the next section.
For the Stirling numbers of the first kind we pro-
ceed in an analogous way. We multiply the basic
recurrence:
¸
n + 1
k + 1

= n
¸
n
k + 1

+

n
k

5.9. IDENTITIES INVOLVING THE STIRLING NUMBERS 65
by (k + 1)!/(n + 1)! and study the quantity f
n,k
=
k!

n
k

/n!:
(k + 1)!
(n + 1)!
¸
n + 1
k + 1

=
=
(k + 1)!
n!
¸
n
k + 1

n
n + 1
+
k!
n!

n
k

k + 1
n + 1
,
that is:
(n + 1)f
n+1,k+1
= nf
n,k+1
+ (k + 1)f
n,k
.
In this case also we have f
0
(t) = 1 and by special-
izing the last relation to the case k = 0, we obtain:
f

1
(t) = tf

1
(t) +f
0
(t).
This is equivalent to f

1
(t) = 1/(1 − t) and because
f
1
(0) = 0 we have:
f
1
(t) = ln
1
1 −t
.
By setting k = 1, we find the simple differential equa-
tion f

2
(t) = tf

2
(t) + 2f
1
(t), whose solution is:
f
2
(t) =

ln
1
1 −t

2
.
This suggests the general formula:
f
k
(t) = (

k!
n!

n
k

n∈N
=

ln
1
1 −t

k
and again this can be proved by induction. In this
case, (f
n,k
)
n,k∈N
is the Riordan array having d(t) = 1
and th(t) = ln(1/(1 −t)).
5.9 Identities involving the
Stirling numbers
The two recurrence relations for d
n,k
and f
n,k
do not
give an immediate evidence that the two triangles
are indeed Riordan arrays., because they do not cor-
respond to A-sequences. However, the A-sequences
for the two arrays can be easily found, once we know
their h(t) function. For the Stirling numbers of the
first kind we have to solve the functional equation:
ln
1
1 −t
= tA

ln
1
1 −t

.
By setting y = ln(1/(1 − t)) or t = (e
y
− 1)/y, we
have A(y) = ye
y
/(e
y
− 1) and this is the generating
function for the A-sequence we were looking for. In
a similar way, we find that the A-sequence for the
triangle related to the Stirling numbers of the second
kind is:
A(t) =
t
ln(1 +t)
.
A first result we obtain by using the correspon-
dence between Stirling numbers and Riordan arrays
concerns the row sums of the two triangles. For the
Stirling numbers of the first kind we have:
n
¸
k=0

n
k

= n!
n
¸
k=0
k!
n!

n
k

1
k!
=
= n![t
n
]
¸
e
y

y = ln
1
1 −t

=
= n![t
n
]
1
1 −t
= n!
as we observed and proved in a combinatorial way.
The row sums of the Stirling numbers of the second
kind give, as we know, the Bell numbers; thus we
can obtain the (exponential) generating function for
these numbers:
n
¸
k=0

n
k
¸
= n!
n
¸
k=0
k!
n!

n
k
¸
1
k!
=
= n![t
n
]

e
y

y = e
t
−1

=
= n![t
n
] exp(e
t
−1);
therefore we have:
(

B
n
n!

= exp(e
t
−1).
We also defined the ordered Bell numbers as O
n
=
¸
n
k=0
¸
n
k
¸
k!; therefore we have:
O
n
n!
=
n
¸
k=0
k!
n!

n
k
¸
=
= [t
n
]
¸
1
1 −y

y = e
t
−1

= [t
n
]
1
2 −e
t
.
We have thus obtained the exponential generating
function:
(

O
n
n!

=
1
2 −e
t
.
Stirling numbers of the two kinds are related be-
tween them in various ways. For example, we have:
¸
k

n
k

k
m

=
n!
m!
¸
k
k!
n!

n
k

m!
k!

k
m

=
=
n!
m!
[t
n
]
¸
(e
y
−1)
m

y = ln
1
1 −t

=
=
n!
m!
[t
n
]
t
m
(1 −t)
m
=
n!
m!

n −1
m−1

.
Besides, two orthogonality relations exist between
Stirling numbers. The first one is proved in this way:
¸
k

n
k

k
m

(−1)
n−k
=
66 CHAPTER 5. RIORDAN ARRAYS
= (−1)
n
n!
m!
¸
k
k!
n!

n
k

m!
k!

k
m

(−1)
k
=
= (−1)
n
n!
m!
[t
n
]
¸
(e
−y
−1)
m

y = ln
1
1 −t

=
= (−1)
n
n!
m!
[t
n
](−t)
m
= δ
n,m
.
The second orthogonality relation is proved in a sim-
ilar way and reads:
¸
k

n
k
¸
¸
k
m

(−1)
n−k
= δ
n,m
.
We introduced Stirling numbers by means of Stir-
ling identities relative to powers and falling factorials.
We can now prove these identities by using a Riordan
array approach. In fact:
n
¸
k=0

n
k

(−1)
n−k
x
k
=
= (−1)
n
n!
n
¸
k=0
k!
n!

n
k

x
k
k!
(−1)
k
=
= (−1)
n
n![t
n
]
¸
e
−xy

y = ln
1
1 −t

=
= (−1)
n
n![t
n
](1 −t)
x
= n!

x
n

= x
n
and:
n
¸
k=0

n
k
¸
x
k
= n!
n
¸
k=0
k!
n!

n
k
¸

x
k

=
= n![t
n
]

(1 +y)
x

y = e
t
−1

=
= n![t
n
]e
tx
= x
n
.
We conclude this section by showing two possible
connections between Stirling numbers and Bernoulli
numbers. First we have:
n
¸
k=0

n
k
¸
(−1)
k
k!
k + 1
= n!
n
¸
k=0
k!
n!

n
k
¸
(−1)
k
k + 1
=
= n![t
n
]
¸

1
y
ln
1
1 +y

y = e
t
−1

=
= n![t
n
]
t
e
t
−1
= B
n
which proves that Bernoulli numbers can be defined
in terms of the Stirling numbers of the second kind.
For the Stirling numbers of the first kind we have the
identity:
n
¸
k=0

n
k

B
k
= n!
n
¸
k=0
k!
n!

n
k

B
k
k!
=
= n![t
n
]
¸
y
e
y
−1

y = ln
1
1 −t

=
= n![t
n
]
1 −t
t
ln
1
1 −t
=
= n![t
n+1
](1 −t) ln
1
1 −t
=
(n −1)!
n + 1
.
Clearly, this holds for n > 0. For n = 0 we have:
n
¸
k=0

n
k

B
k
= B
0
= 1.
Chapter 6
Formal methods
6.1 Formal languages
During the 1950’s, the linguist Noam Chomski in-
troduced the concept of a formal language. Several
definitions have to be provided before a precise state-
ment of the concept can be given. Therefore, let us
proceed in the following way.
First, we recall definitions given in Section 2.1. An
alphabet is a finite set A = ¦a
1
, a
2
, . . . , a
n
}, whose elements are called symbols or letters. A word on $A$ is a finite sequence of symbols in $A$; the sequence is written by juxtaposing the symbols, and therefore a word $w$ is denoted by $w = a_{i_1} a_{i_2} \cdots a_{i_r}$, and $r = |w|$ is the length of the word. The empty sequence is called the empty word and is conventionally denoted by $\epsilon$; its length is obviously 0, and it is the only word of length 0.

The set of all the words on $A$, the empty word included, is denoted by $A^*$, and by $A^+$ if the empty word is excluded. Algebraically, $A^*$ is the free monoid on $A$, that is, the monoid freely generated by the symbols in $A$. To understand this point, let us consider the operation of juxtaposition and recursively apply it starting with the symbols in $A$. What we get are the words on $A$, and juxtaposition can be seen as an operation between them. The algebraic structure thus obtained has the following properties:

1. associativity: $w_1(w_2 w_3) = (w_1 w_2)w_3 = w_1 w_2 w_3$;

2. $\epsilon$ is the identity or neutral element: $\epsilon w = w\epsilon = w$.

It is therefore a monoid which, by construction, has been generated by combining the symbols in $A$ in all possible ways. Because of that, $(A^*, \cdot)$, if $\cdot$ denotes juxtaposition, is called the "free monoid" generated by $A$. Observe that a monoid is an algebraic structure more general than a group, in which all the elements have an inverse as well.

If $w \in A^*$ and $z$ is a word such that $w$ can be decomposed as $w = w_1 z w_2$ ($w_1$ and/or $w_2$ possibly empty), we say that $z$ is a subword of $w$; we also say that $z$ occurs in $w$, and the particular instance of $z$ in $w$ is called an occurrence of $z$ in $w$. Observe that if $z$ is a subword of $w$, it can have more than one occurrence in $w$. If $w = z w_2$, we say that $z$ is a head or prefix of $w$, and if $w = w_1 z$, we say that $z$ is a tail or suffix of $w$. Finally, a language on $A$ is any subset $L \subseteq A^*$.

The basic definition concerning formal languages is the following: a grammar is a 4-tuple $G = (T, N, \sigma, \mathcal{P})$, where:

• $T = \{a_1, a_2, \ldots, a_n\}$ is the alphabet of terminal symbols;

• $N = \{\phi_1, \phi_2, \ldots, \phi_m\}$ is the alphabet of non-terminal symbols;

• $\sigma \in N$ is the initial symbol;

• $\mathcal{P}$ is a finite set of productions.

Usually, the symbols in $T$ are denoted by lower case Latin letters; the symbols in $N$ by Greek letters or by upper case Latin letters. A production is a pair $(z_1, z_2)$ of words on $T \cup N$, such that $z_1$ contains at least one symbol in $N$; the production is often indicated by $z_1 \to z_2$. If $w \in (T \cup N)^*$, we can apply a production $z_1 \to z_2 \in \mathcal{P}$ to $w$ whenever $w$ can be decomposed as $w = w_1 z_1 w_2$, and the result is the new word $w_1 z_2 w_2 \in (T \cup N)^*$; we will write $w = w_1 z_1 w_2 \vdash w_1 z_2 w_2$ when $w_1 z_1 w_2$ is the decomposition of $w$ in which $z_1$ is the leftmost occurrence of $z_1$ in $w$; in other words, if we also have $w = \hat{w}_1 z_1 \hat{w}_2$, then $|w_1| < |\hat{w}_1|$.

Given a grammar $G = (T, N, \sigma, \mathcal{P})$, we define the relation $w \vdash \hat{w}$ between words $w, \hat{w} \in (T \cup N)^*$: the relation holds if and only if a production $z_1 \to z_2 \in \mathcal{P}$ exists such that $z_1$ occurs in $w$, $w = w_1 z_1 w_2$ is the leftmost occurrence of $z_1$ in $w$ and $\hat{w} = w_1 z_2 w_2$. We also denote by $\vdash^*$ the transitive closure of $\vdash$ and call it generation or derivation; this means that $w \vdash^* \hat{w}$ if and only if a sequence $(w = w_1, w_2, \ldots, w_s = \hat{w})$ exists such that $w_1 \vdash w_2$, $w_2 \vdash w_3$, \ldots, $w_{s-1} \vdash w_s$.

We observe explicitly that, by our condition that in every production $z_1 \to z_2$ the word $z_1$ should contain at least one symbol in $N$, if a word $w_i \in T^*$ is produced during a generation, it is terminal, i.e., the generation should stop. By collecting all these definitions, we finally define the language generated by the grammar $G$ as the set:

$$L(G) = \{ w \in T^* \mid \sigma \vdash^* w \}$$

i.e., a word $w \in T^*$ is in $L(G)$ if and only if we can generate it by starting with the initial symbol $\sigma$ and going on by applying the productions in $\mathcal{P}$ until $w$ is generated. At that moment, the generation stops. Note that, sometimes, the generation can go on forever, never generating a word on $T$; however, this is not a problem: it only means that such generations should be ignored.
6.2 Context-free languages
The definition of a formal language is quite general, and it is possible to show that formal languages coincide with the class of "partially recursive sets", the largest class of sets which can be constructed recursively, i.e., in finite terms. This means that we can give rules to build such sets (e.g., we can give a grammar for them), but their construction can go on forever; so, looking at them from another point of view, if we wish to know whether a word $w$ belongs to such a set $S$, we can be unlucky and an infinite process can be necessary to find out that $w \in S$.

Because of that, people have studied more restricted classes of languages, for which a finite process is always possible for finding out whether $w$ belongs to the language or not. Surely, the most important class of this kind is the class of "context-free languages". They are defined in the following way. A context-free grammar is a grammar $G = (T, N, \sigma, \mathcal{P})$ in which all the productions $z_1 \to z_2$ in $\mathcal{P}$ are such that $z_1 \in N$. The name "context-free" derives from this definition, because a production $z_1 \to z_2$ is applied whenever the non-terminal symbol $z_1$ is the leftmost non-terminal symbol in a word, irrespective of the context in which it appears.

As a very simple example, let us consider the following grammar. Let $T = \{a, b\}$, $N = \{\sigma\}$; $\sigma$ is the initial symbol of the grammar, being the only non-terminal. The set $\mathcal{P}$ is composed of the two productions:

$$\sigma \to \epsilon \qquad \sigma \to a\sigma b\sigma.$$

This grammar is called the Dyck grammar and the language generated by it the Dyck language. In Figure 6.1 we draw the generation of some words in the Dyck language. The recursive nature of the productions allows us to prove properties of the Dyck language by means of mathematical induction:

Theorem 6.2.1 A word $w \in \{a, b\}^*$ belongs to the Dyck language $D$ if and only if:

i) the number of $a$'s in $w$ equals the number of $b$'s;

ii) in every prefix $z$ of $w$ the number of $a$'s is not less than the number of $b$'s.

Proof: Let $w \in D$; if $w = \epsilon$ nothing has to be proved. Otherwise, $w$ is generated by the second production and $w = aw_1bw_2$ with $w_1, w_2 \in D$; therefore, if we suppose that i) holds for $w_1$ and $w_2$, it also holds for $w$. For ii), any prefix $z$ of $w$ must have one of the forms: $a$, $az_1$ where $z_1$ is a prefix of $w_1$, $aw_1b$, or $aw_1bz_2$ where $z_2$ is a prefix of $w_2$. By the induction hypothesis, ii) holds for $z_1$ and $z_2$, and therefore it is easily proved for $w$. Vice versa, let us suppose that i) and ii) hold for $w \in T^*$. If $w \ne \epsilon$, then by ii) $w$ should begin with $a$. Let us scan $w$ until we find the first occurrence of the symbol $b$ such that $w = aw_1bw_2$ and in $w_1$ the number of $b$'s equals the number of $a$'s. By i) such an occurrence of $b$ must exist, and consequently $w_1$ and $w_2$ must satisfy condition i). Besides, if $w_1$ and $w_2$ are not empty, then they satisfy condition ii), by the very construction of $w_1$ and the fact that $w$ satisfies condition ii) by hypothesis. We have thus obtained a decomposition of $w$ showing that the second production has been used. This completes the proof.
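Conditions i) and ii) of the theorem amount to a single left-to-right scan of the word, which also makes the characterization easy to test by brute force. The following Python sketch (the function name and the test loop are our own illustration) checks both conditions and, by counting, recovers the Catalan numbers mentioned in the next paragraph:

    from itertools import product

    def is_dyck(w):
        # condition ii): every prefix has at least as many a's as b's
        balance = 0
        for c in w:
            balance += 1 if c == 'a' else -1
            if balance < 0:
                return False
        return balance == 0      # condition i): equal numbers of a's and b's

    for n in range(6):
        count = sum(is_dyck(w) for w in product('ab', repeat=2 * n))
        print(n, count)          # prints the Catalan numbers 1, 1, 2, 5, 14, 42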
If we substitute the letter $a$ with the symbol '(' and the letter $b$ with the symbol ')', the theorem shows that the words in the Dyck language are the possible parenthetizations of an expression. Therefore, the number of Dyck words with $n$ pairs of parentheses is the Catalan number $\binom{2n}{n}/(n+1)$. We will see how this result can also be obtained by starting with the definition of the Dyck language and applying a suitable and mechanical method, known as the Schützenberger methodology or symbolic method. The method can be applied to every set of objects which is defined through a non-ambiguous context-free grammar.

A context-free grammar $G$ is ambiguous iff there exists a word $w \in L(G)$ which can be generated by two different leftmost derivations. In other words, a context-free grammar $H$ is non-ambiguous iff every word $w \in L(H)$ can be generated in one and only one way. An example of an ambiguous grammar is $G = (T, N, \sigma, \mathcal{P})$ where $T = \{1\}$, $N = \{\sigma\}$ and $\mathcal{P}$ contains the two productions:

$$\sigma \to 1 \qquad \sigma \to \sigma\sigma.$$

For example, the word 111 is generated by the two following leftmost derivations:

$$\sigma \vdash \sigma\sigma \vdash 1\sigma \vdash 1\sigma\sigma \vdash 11\sigma \vdash 111$$
$$\sigma \vdash \sigma\sigma \vdash \sigma\sigma\sigma \vdash 1\sigma\sigma \vdash 11\sigma \vdash 111.$$
[Figure 6.1: The generation of some Dyck words — a derivation tree rooted at $\sigma$, whose first levels contain $\epsilon$ and $a\sigma b\sigma$; further expansions include $ab\sigma$, $aa\sigma b\sigma b\sigma$, $ab$, $aba\sigma b\sigma$, $aab\sigma b\sigma$, $abab\sigma$, $aabb\sigma$, $abab$ and $aabb$.]
Instead, the Dyck grammar is non-ambiguous; in fact, as we have shown in the proof of the previous theorem, given any word $w \in D$, $w \ne \epsilon$, there is only one decomposition $w = aw_1bw_2$ having $w_1, w_2 \in D$; therefore, $w$ can only be generated in a single way. In general, if we show that any word in a context-free language $L(G)$, generated by some grammar $G$, has a unique decomposition according to the productions in $G$, then the grammar cannot be ambiguous. Because of the connection between the Schützenberger methodology and non-ambiguous context-free grammars, we are mainly interested in this kind of grammar. For the sake of completeness, a context-free language is called intrinsically ambiguous iff every context-free grammar generating it is ambiguous. This definition stresses the fact that, if a language is generated by an ambiguous grammar, it can also be generated by some non-ambiguous grammar, unless it is intrinsically ambiguous. It is possible to show that intrinsically ambiguous languages actually exist; fortunately, they are not very frequent. For example, the language generated by the previous ambiguous grammar is $\{1\}^+$, i.e., the set of all the words composed of any sequence of 1's, except the empty word. Actually, it is not an ambiguous language, and a non-ambiguous grammar generating it is given by the same $T, N, \sigma$ and the two productions:

$$\sigma \to 1 \qquad \sigma \to 1\sigma.$$

It is a simple matter to show that every word $11\ldots1$ can be uniquely decomposed according to these productions.
6.3 Formal languages and programming languages
In 1960, the formal definition of the programming language ALGOL'60 was published. ALGOL'60 has surely been the most influential programming language ever created, although it was actually used only by a very limited number of programmers. Most of the concepts we now find in programming languages were introduced by ALGOL'60, of which, for example, PASCAL and C are direct derivations. Here, we are not interested in these aspects of ALGOL'60, but we wish to spend some words on how ALGOL'60 used context-free grammars to define its syntax in a formal and precise way. In practice, a program in ALGOL'60 is a word generated by a (rather complex) context-free grammar, whose initial symbol is ⟨program⟩.

The ALGOL'60 grammar used, as its terminal symbol alphabet, the characters available on the standard keyboard of a computer; actually, they were the characters punchable on a card, the input means used at that time to introduce a program into the computer. The non-terminal symbol notation was one of the most appealing inventions of ALGOL'60: the symbols were composed of entire English sentences enclosed by the two special parentheses ⟨ and ⟩. This allowed the intended meaning of the non-terminal symbols to be expressed clearly; the previous example concerning ⟨program⟩ surely makes sense. Another technical device used by ALGOL'60 was the compaction of productions: if we had several productions with the same left-hand symbol, $\beta \to w_1$, $\beta \to w_2$, \ldots, $\beta \to w_k$, they were written as a single rule:

$$\beta ::= w_1 \mid w_2 \mid \cdots \mid w_k$$

where $::=$ was a metasymbol denoting definition and $\mid$ was read "or" to denote alternatives. This notation is usually called Backus Normal Form (BNF).

Just to give a very simple example, in Table 6.1 (lines 1 through 6) we show how integer numbers were defined. This definition avoids leading 0's in numbers, but allows both +0 and −0. The productions can easily be changed to avoid +0 or −0 or both. In the same table, line 7 shows the definition of the conditional statements.

 1  ⟨digit⟩ ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
 2  ⟨non-zero digit⟩ ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
 3  ⟨sequence of digits⟩ ::= ⟨digit⟩ | ⟨digit⟩ ⟨sequence of digits⟩
 4  ⟨unsigned number⟩ ::= ⟨digit⟩ | ⟨non-zero digit⟩ ⟨sequence of digits⟩
 5  ⟨signed number⟩ ::= +⟨unsigned number⟩ | −⟨unsigned number⟩
 6  ⟨integer number⟩ ::= ⟨unsigned number⟩ | ⟨signed number⟩
 7  ⟨conditional clause⟩ ::= if ⟨condition⟩ then ⟨instruction⟩ |
                             if ⟨condition⟩ then ⟨instruction⟩ else ⟨instruction⟩
 8  ⟨expression⟩ ::= ⟨term⟩ | ⟨term⟩ + ⟨expression⟩
 9  ⟨term⟩ ::= ⟨factor⟩ | ⟨term⟩ ∗ ⟨factor⟩
10  ⟨factor⟩ ::= ⟨element⟩ | ⟨factor⟩ ↑ ⟨element⟩
11  ⟨element⟩ ::= ⟨constant⟩ | ⟨variable⟩ | (⟨expression⟩)

Table 6.1: Context-free languages and programming languages

This kind of definition gives a precise formulation of all the clauses in the programming language. Besides, since the program has a single generation according to the grammar, it is possible to find this derivation starting from the actual program and therefore give its exact structure. This allows precise information to be passed to the compiler, which, in a sense, is directed by the formal syntax of the language (syntax directed compilation).

A very interesting aspect is how this context-free grammar definition can avoid ambiguities in the interpretation of a program. Let us consider an expression like $a + b * c$; according to the rules of Algebra, the multiplication should be executed before the addition, and the computer must follow this convention in order to create no confusion. This is done by the simplified productions given by lines 8 through 11 in Table 6.1. The derivation of the simple expression $a + b * c$, or of a more complicated expression, reveals that it is decomposed into the sum of $a$ and $b * c$; this information is passed to the compiler, and the multiplication is actually performed before the addition. If powers are also present, they are executed before products.
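How such a grammar "directs" the compiler can be made concrete with a small sketch. The following recursive-descent parser mirrors the productions of lines 8 through 11 of Table 6.1 in a simplified form (single-letter variables, no powers, no parentheses); the function names and the token format are our own illustrative assumptions:

    def parse_expression(tokens):
        # <expression> ::= <term> | <term> + <expression>
        left = parse_term(tokens)
        if tokens and tokens[0] == '+':
            tokens.pop(0)
            return ('+', left, parse_expression(tokens))
        return left

    def parse_term(tokens):
        # <term> ::= <factor> | <term> * <factor>  (right-recursive variant)
        left = parse_factor(tokens)
        if tokens and tokens[0] == '*':
            tokens.pop(0)
            return ('*', left, parse_term(tokens))
        return left

    def parse_factor(tokens):
        # a bare <element>: a constant or a variable
        return tokens.pop(0)

    print(parse_expression(list('a+b*c')))   # ('+', 'a', ('*', 'b', 'c'))

Since ⟨expression⟩ can only produce a sum of ⟨term⟩s, the parse tree of $a + b * c$ necessarily groups $b * c$ together, which is exactly the precedence rule described above.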
This ability of context-free grammars to describe the syntax of programming languages is very important, and after ALGOL'60 the syntax of every programming language has always been defined by context-free grammars. We conclude by recalling that a more sophisticated approach to the definition of programming languages was tried with ALGOL'68 by means of van Wijngaarden's grammars, but the method proved too complex and was abandoned.
6.4 The symbolic method
The Schützenberger method allows us to obtain the counting generating function of every non-ambiguous language, starting with the corresponding non-ambiguous grammar and proceeding in a mechanical way. Let us begin with a simple example; Fibonacci words are the words on the alphabet $\{0, 1\}$ beginning and ending with the symbol 1 and never containing two consecutive 0's. For small values of $n$, Fibonacci words of length $n$ are easily displayed:

n = 1:  1
n = 2:  11
n = 3:  111, 101
n = 4:  1111, 1011, 1101
n = 5:  11111, 10111, 11011, 11101, 10101

If we count them by their length, we obtain the sequence $(0, 1, 1, 2, 3, 5, 8, \ldots)$, which is easily recognized as the Fibonacci sequence. In fact, a word of length $n$ is obtained by adding a trailing 1 to a word of length $n-1$, or by adding a trailing 01 to a word of length $n-2$. This immediately shows, in a combinatorial way, that Fibonacci words are counted by Fibonacci numbers. Besides, we get the productions of a non-ambiguous context-free grammar $G = (T, N, \sigma, \mathcal{P})$, where $T = \{0, 1\}$, $N = \{\phi\}$, $\sigma = \phi$ and $\mathcal{P}$ contains:

$$\phi \to 1 \qquad \phi \to \phi 1 \qquad \phi \to \phi 01$$

(these productions could have been written $\phi ::= 1 \mid \phi 1 \mid \phi 01$ by using the ALGOL'60 notation).

We are now going to obtain the counting generating function for Fibonacci words by applying the Schützenberger method. This consists of the following steps:

1. every non-terminal symbol $\sigma \in N$ is transformed into the name of its counting generating function $\sigma(t)$;

2. every terminal symbol is transformed into $t$;

3. the empty word is transformed into 1;

4. every $\mid$ sign is transformed into a $+$ sign, and $::=$ is transformed into an equals sign.

After having performed these transformations, we obtain a system of equations, which can be solved for the unknown generating functions introduced in the first step. They are the counting generating functions of the languages generated by the corresponding non-terminal symbols, when we consider them as the initial symbols.

The definition of the Fibonacci words produces:

$$\phi(t) = t + t\phi(t) + t^2\phi(t)$$

the solution of which is:

$$\phi(t) = \frac{t}{1 - t - t^2};$$

this is obviously the generating function of the Fibonacci numbers. Therefore, we have shown that the number of Fibonacci words of length $n$ is $F_n$, as we have already proved by combinatorial arguments.
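The combinatorial statement is easy to confirm by exhaustive enumeration. A small Python check (our own, assuming the convention $F_1 = F_2 = 1$):

    from itertools import product

    def fibonacci_words(n):
        # words over {0,1} beginning and ending with 1, no two consecutive 0's
        return [w for w in product('01', repeat=n)
                if w[0] == '1' and w[-1] == '1' and '00' not in ''.join(w)]

    F = [0, 1, 1]
    for n in range(3, 11):
        F.append(F[-1] + F[-2])
    print([len(fibonacci_words(n)) for n in range(1, 9)])  # 1, 1, 2, 3, 5, 8, 13, 21
    assert all(len(fibonacci_words(n)) == F[n] for n in range(1, 9))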
In the case of the Dyck language, the definition yields:

$$\sigma(t) = 1 + t^2\sigma(t)^2$$

and therefore:

$$\sigma(t) = \frac{1 - \sqrt{1 - 4t^2}}{2t^2}.$$

Since every word in the Dyck language has an even length, the number of Dyck words with $2n$ symbols is just the $n$th Catalan number, and this too we knew by combinatorial means.

Another example is given by the Motzkin words; these are words on the alphabet $\{a, b, c\}$ in which $a, b$ act as parentheses, as in the Dyck language, while $c$ is free and can appear everywhere. Therefore, the definition of the language is:

$$\mu ::= \epsilon \mid c\mu \mid a\mu b\mu$$

if $\mu$ is the only non-terminal symbol. The Schützenberger method gives the equation:

$$\mu(t) = 1 + t\mu(t) + t^2\mu(t)^2$$

whose solution is easily found:

$$\mu(t) = \frac{1 - t - \sqrt{1 - 2t - 3t^2}}{2t^2}.$$

By expanding this function we find the sequence of Motzkin numbers, beginning:

n    0  1  2  3  4  5   6   7    8    9
M_n  1  1  2  4  9  21  51  127  323  835

These numbers count the so-called unary-binary trees, i.e., trees whose nodes have arity 1 or 2. They can be defined in a pictorial way by means of an object grammar (see Figure 6.2). An object grammar defines combinatorial objects instead of simple letters or words; however, most times it is rather easy to pass from an object grammar to an equivalent context-free grammar, and therefore obtain counting generating functions by means of the Schützenberger method. For example, the object grammar in Figure 6.2 is obviously equivalent to the context-free grammar for Motzkin words.
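The functional equation itself can be used to compute the Motzkin numbers: iterating $\mu \mapsto 1 + t\mu + t^2\mu^2$ on truncated power series (here, plain coefficient lists) converges to the solution, one further coefficient becoming exact at each iteration. A minimal sketch:

    N = 10
    mu = [0] * (N + 1)
    for _ in range(N + 1):
        # square of the truncated series mu(t)
        sq = [sum(mu[i] * mu[n - i] for i in range(n + 1)) for n in range(N + 1)]
        mu = [(1 if n == 0 else 0)
              + (mu[n - 1] if n >= 1 else 0)
              + (sq[n - 2] if n >= 2 else 0)
              for n in range(N + 1)]
    print(mu)   # [1, 1, 2, 4, 9, 21, 51, 127, 323, 835, 2188]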
6.5 The bivariate case
In the Schützenberger method, the rôle of the indeterminate $t$ is to count the number of letters or symbols occurring in the generated words; because of that, every symbol appearing in a production is transformed into a $t$. However, we may wish to count other parameters instead of, or in conjunction with, the number of symbols. This is accomplished by modifying the intended meaning of the indeterminate $t$ and/or introducing some other indeterminate to take the other parameters into account.

For example, in the Dyck language, we may wish to count the number of pairs $a, b$ occurring in the words; this means that $t$ no longer counts the single letters, but counts the pairs. Therefore, the Schützenberger method gives the equation $\sigma(t) = 1 + t\sigma(t)^2$, whose solution is just the generating function of the Catalan numbers.

An interesting application is as follows. Let us suppose we wish to know how many Fibonacci words of length $n$ contain $k$ zeroes. Besides the indeterminate $t$ counting the total number of symbols, we introduce a new indeterminate $z$ counting the number of zeroes. From the productions of the Fibonacci grammar, we derive an equation for the bivariate generating function $\phi(t, z)$, in which the coefficient of $t^n z^k$ is just the number we are looking for.
[Figure 6.2: An object grammar — the object $\mu$ is defined pictorially as either the empty object $\epsilon$, or a node carrying one subobject $\mu$, or a node carrying two subobjects $\mu$ and $\mu$.]
The equation is:

$$\phi(t, z) = t + t\phi(t, z) + t^2 z\phi(t, z).$$

In fact, the production $\phi \to \phi 1$ increases the length of the words by 1 but does not introduce any 0; the production $\phi \to \phi 01$ increases the length by 2 and introduces a zero. By solving this equation, we find:

$$\phi(t, z) = \frac{t}{1 - t - zt^2}.$$
We can now extract the coefficient $\phi_{n,k}$ of $t^n z^k$ in the following way:

$$\phi_{n,k} = [t^n][z^k]\frac{t}{1-t-zt^2} = [t^n][z^k]\frac{t}{1-t}\cdot\frac{1}{1 - z\,\frac{t^2}{1-t}} = [t^n][z^k]\frac{t}{1-t}\sum_{k=0}^{\infty} z^k\left(\frac{t^2}{1-t}\right)^{\!k} =$$
$$= [t^n][z^k]\sum_{k=0}^{\infty}\frac{t^{2k+1}z^k}{(1-t)^{k+1}} = [t^n]\frac{t^{2k+1}}{(1-t)^{k+1}} = [t^{n-2k-1}]\frac{1}{(1-t)^{k+1}} = \binom{n-k-1}{k}.$$
Therefore, the number of Fibonacci words of length $n$ containing $k$ zeroes is counted by a binomial coefficient. The second expression in the derivation shows that the array $(\phi_{n,k})_{n,k\in\mathbb{N}}$ is indeed the Riordan array $(t/(1-t),\ t^2/(1-t))$, which is the Pascal triangle stretched vertically, i.e., column $k$ is shifted down by $k$ positions ($k+1$, in reality). The general formula we know for the row sums of a Riordan array gives:

$$\sum_k \phi_{n,k} = \sum_k \binom{n-k-1}{k} = [t^n]\frac{t}{1-t}\left[\frac{1}{1-y}\,\middle|\;y = \frac{t^2}{1-t}\right] = [t^n]\frac{t}{1-t-t^2} = F_n$$
as we were expecting.

A more interesting problem is to find the average number of zeroes in all the Fibonacci words with $n$ letters. First, we count the total number of zeroes in all the words of length $n$:

$$\sum_k k\,\phi_{n,k} = \sum_k \binom{n-k-1}{k}k = [t^n]\frac{t}{1-t}\left[\frac{y}{(1-y)^2}\,\middle|\;y = \frac{t^2}{1-t}\right] = [t^n]\frac{t^3}{(1-t-t^2)^2}.$$

We extract the coefficient:

$$[t^n]\frac{t^3}{(1-t-t^2)^2} = [t^{n-1}]\left(\frac{1}{\sqrt 5}\left(\frac{1}{1-\phi t} - \frac{1}{1-\hat\phi t}\right)\right)^{\!2} =$$
$$= \frac{1}{5}[t^{n-1}]\frac{1}{(1-\phi t)^2} - \frac{2}{5}[t^n]\frac{t}{(1-\phi t)(1-\hat\phi t)} + \frac{1}{5}[t^{n-1}]\frac{1}{(1-\hat\phi t)^2} =$$
$$= \frac{1}{5}[t^{n-1}]\frac{1}{(1-\phi t)^2} - \frac{2}{5\sqrt 5}[t^n]\frac{1}{1-\phi t} + \frac{2}{5\sqrt 5}[t^n]\frac{1}{1-\hat\phi t} + \frac{1}{5}[t^{n-1}]\frac{1}{(1-\hat\phi t)^2}.$$

The last two terms are negligible because they rapidly tend to 0; therefore we have:

$$\sum_k k\,\phi_{n,k} \approx \frac{n}{5}\phi^{n-1} - \frac{2}{5\sqrt 5}\phi^n.$$
To obtain the average number $Z_n$ of zeroes, we need to divide this quantity by $F_n \sim \phi^n/\sqrt 5$, the total number of Fibonacci words of length $n$:

$$Z_n = \frac{\sum_k k\,\phi_{n,k}}{F_n} \approx \frac{n}{\phi\sqrt 5} - \frac{2}{5} = \frac{5-\sqrt 5}{10}\,n - \frac{2}{5}.$$

This shows that the average number of zeroes grows linearly with the length of the words and tends to become 27.64% of this length, because $(5-\sqrt 5)/10 \approx 0.2763932022\ldots$.
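Both the exact bivariate result and the asymptotic average can be checked numerically; the following sketch (ours) reuses the brute-force enumeration of Fibonacci words introduced earlier:

    from itertools import product
    from math import comb, sqrt

    n = 12
    words = [w for w in product('01', repeat=n)
             if w[0] == '1' and w[-1] == '1' and '00' not in ''.join(w)]
    counts = {}
    for w in words:
        k = w.count('0')
        counts[k] = counts.get(k, 0) + 1
    assert all(c == comb(n - k - 1, k) for k, c in counts.items())
    average = sum(k * c for k, c in counts.items()) / len(words)
    print(average, (5 - sqrt(5)) / 10 * n - 2 / 5)   # 2.9166..., 2.9167...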
6.6 The Shift Operator
In the usual mathematical terminology, an operator is a mapping from some set $\mathcal{F}_1$ of functions into some other set of functions $\mathcal{F}_2$. We have already encountered the operator $\mathcal{G}$, acting from the set of sequences (which are properly functions from $\mathbb{N}$ into $\mathbb{R}$ or $\mathbb{C}$) into the set of formal power series (analytic functions). Other usual examples are $D$, the operator of differentiation, and $\int$, the operator of indefinite integration. Since the middle of the 19th century, the English mathematicians (G. Boole in particular) introduced the concept of a finite operator, i.e., an operator acting in finite terms, in contrast to an infinitesimal operator, such as differentiation and integration. The simplest among the finite operators is the identity, denoted by $I$ or by 1, which does not change anything. Operationally, however, the most important finite operator is the shift operator, denoted by $E$, changing the value of any function $f$ at a point $x$ into the value of the same function at the point $x+1$. We write:

$$Ef(x) = f(x+1).$$

Unlike an infinitesimal operator such as $D$, a finite operator can be applied to a sequence as well, and in that case we can write:

$$Ef_n = f_{n+1}.$$

This property is particularly interesting from our point of view but, in order to follow the usual notational conventions, we shall adopt the first, functional notation rather than the second one, which is more specific to sequences.
The shift operator can be iterated and, to denote the successive applications of $E$, we write $E^n$ instead of $EE\cdots E$ ($n$ times). So:

$$E^2 f(x) = EEf(x) = Ef(x+1) = f(x+2)$$

and in general:

$$E^n f(x) = f(x+n).$$

Conventionally, we set $E^0 = I = 1$, and this is in accordance with the meaning of $I$:

$$E^0 f(x) = f(x) = If(x).$$

We wish to observe that every recurrence can be written using the shift operator. For example, the recurrence for the Fibonacci numbers is:

$$E^2 F_n = EF_n + F_n$$

and this can be written in the following way, separating the operator part from the sequence part:

$$(E^2 - E - I)F_n = 0.$$

Some obvious properties of the shift operator are:

$$E(\alpha f(x) + \beta g(x)) = \alpha Ef(x) + \beta Eg(x) \qquad E(f(x)g(x)) = Ef(x)\,Eg(x).$$

Hence, if $c$ is any constant (i.e., $c \in \mathbb{R}$ or $c \in \mathbb{C}$) we have $Ec = c$.

It is possible to consider negative powers of $E$ as well. So we have:

$$E^{-1}f(x) = f(x-1) \qquad E^{-n}f(x) = f(x-n)$$

and this is in accordance with the usual rules of powers:

$$E^n E^m = E^{n+m} \quad (n, m \in \mathbb{Z}) \qquad E^n E^{-n} = E^0 = I \quad (n \in \mathbb{Z}).$$

Finite operators are commonly used in Numerical Analysis. In that case, an increment $h$ is defined and the shift operator acts according to this increment, i.e., $Ef(x) = f(x+h)$. When considering sequences, this makes no sense and we constantly use $h = 1$.

We may have occasion to use two or more shift operators, that is, shift operators related to different variables. We'll distinguish them by suitable subscripts:

$$E_x f(x,y) = f(x+1, y) \qquad E_y f(x,y) = f(x, y+1)$$
$$E_x E_y f(x,y) = E_y E_x f(x,y) = f(x+1, y+1).$$
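In a programming language with first-class functions, finite operators are just function transformers, which makes the operator identities directly executable. A sketch (all names ours):

    def E(f):                        # shift: (E f)(x) = f(x + 1)
        return lambda x: f(x + 1)

    def I(f):                        # identity operator
        return f

    def fib(n):                      # F(0) = 0, F(1) = 1, ...
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    # the Fibonacci recurrence in operator form: (E^2 - E - I) F = 0
    assert all(E(E(fib))(n) - E(fib)(n) - I(fib)(n) == 0 for n in range(20))
    print("(E^2 - E - I) F = 0 verified")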
6.7 The Difference Operator
The second most important finite operator is the difference operator $\Delta$; it is defined in terms of the shift and identity operators:

$$\Delta f(x) = (E-1)f(x) = Ef(x) - If(x) = f(x+1) - f(x).$$

Some simple examples are:

$$\Delta c = Ec - c = c - c = 0 \quad \forall c \in \mathbb{C}$$
$$\Delta x = Ex - x = x + 1 - x = 1$$
$$\Delta x^2 = Ex^2 - x^2 = (x+1)^2 - x^2 = 2x + 1$$
$$\Delta x^n = Ex^n - x^n = (x+1)^n - x^n = \sum_{k=0}^{n-1}\binom{n}{k}x^k$$
$$\Delta\frac{1}{x} = \frac{1}{x+1} - \frac{1}{x} = -\frac{1}{x(x+1)}$$
$$\Delta\binom{x}{m} = \binom{x+1}{m} - \binom{x}{m} = \binom{x}{m-1}.$$

The last example makes use of the recurrence relation for binomial coefficients. An important observation concerns the behavior of the difference operator with respect to the falling factorial:

$$\Delta x^{\underline{m}} = (x+1)^{\underline{m}} - x^{\underline{m}} = (x+1)x(x-1)\cdots(x-m+2) - x(x-1)\cdots(x-m+1) =$$
$$= x(x-1)\cdots(x-m+2)\,(x+1-x+m-1) = m\,x^{\underline{m-1}}.$$

This is analogous to the usual rule for the differentiation operator applied to $x^m$:

$$Dx^m = mx^{m-1}.$$

As we shall see, many formal properties of the difference operator are similar to the properties of the differentiation operator. The rôle of the powers $x^m$ is, however, taken by the falling factorials, which therefore assume a central position in the theory of finite operators.

The following general rules are rather obvious:

$$\Delta(\alpha f(x) + \beta g(x)) = \alpha\Delta f(x) + \beta\Delta g(x)$$

$$\Delta(f(x)g(x)) = E(f(x)g(x)) - f(x)g(x) = f(x+1)g(x+1) - f(x)g(x) =$$
$$= f(x+1)g(x+1) - f(x)g(x+1) + f(x)g(x+1) - f(x)g(x) = \Delta f(x)\,Eg(x) + f(x)\,\Delta g(x)$$

resembling the differentiation rule $D(f(x)g(x)) = (Df(x))g(x) + f(x)Dg(x)$. In a similar way we have:

$$\Delta\frac{f(x)}{g(x)} = \frac{f(x+1)}{g(x+1)} - \frac{f(x)}{g(x)} = \frac{f(x+1)g(x) - f(x)g(x)}{g(x)g(x+1)} + \frac{f(x)g(x) - f(x)g(x+1)}{g(x)g(x+1)} = \frac{(\Delta f(x))g(x) - f(x)\Delta g(x)}{g(x)\,Eg(x)}.$$

The difference operator can be iterated:

$$\Delta^2 f(x) = \Delta\Delta f(x) = \Delta(f(x+1) - f(x)) = f(x+2) - 2f(x+1) + f(x).$$

From a formal point of view, we have:

$$\Delta^2 = (E-1)^2 = E^2 - 2E + 1$$

and in general:

$$\Delta^n = (E-1)^n = \sum_{k=0}^n\binom{n}{k}(-1)^{n-k}E^k = (-1)^n\sum_{k=0}^n\binom{n}{k}(-E)^k.$$

This is a very important formula, and it is the first example of the interest of combinatorics and generating functions in the theory of finite operators. In fact, let us iterate $\Delta$ on $f(x) = 1/x$:

$$\Delta^2\frac{1}{x} = \frac{-1}{(x+1)(x+2)} + \frac{1}{x(x+1)} = \frac{-x+x+2}{x(x+1)(x+2)} = \frac{2}{x(x+1)(x+2)}$$

$$\Delta^n\frac{1}{x} = \frac{(-1)^n n!}{x(x+1)\cdots(x+n)}$$

as we can easily show by mathematical induction. In fact:

$$\Delta^{n+1}\frac{1}{x} = \frac{(-1)^n n!}{(x+1)\cdots(x+n+1)} - \frac{(-1)^n n!}{x(x+1)\cdots(x+n)} = \frac{(-1)^{n+1}(n+1)!}{x(x+1)\cdots(x+n+1)}.$$

The formula for $\Delta^n$ now gives the following identity:

$$\Delta^n\frac{1}{x} = \sum_{k=0}^n\binom{n}{k}(-1)^{n-k}E^k\frac{1}{x}.$$

By multiplying everything by $(-1)^n$ this identity can be written as:

$$\frac{n!}{x(x+1)\cdots(x+n)} = \frac{1}{x\binom{x+n}{n}} = \sum_{k=0}^n\binom{n}{k}\frac{(-1)^k}{x+k}$$

and therefore we have both a way to express the inverse of a binomial coefficient as a sum and an expression for the partial fraction expansion of the inverse of the polynomial $x(x+1)\cdots(x+n)$.
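Both readings of the identity are easy to confirm in exact rational arithmetic, for instance with the following check (our own):

    from fractions import Fraction
    from math import comb, factorial, prod

    for n in range(1, 7):
        for x in range(1, 6):
            lhs = sum(Fraction((-1) ** k * comb(n, k), x + k) for k in range(n + 1))
            rhs = Fraction(factorial(n), prod(range(x, x + n + 1)))
            assert lhs == rhs == Fraction(1, x) / comb(x + n, n)
    print("partial fraction expansion verified")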
6.8 Shift and Difference Operators - Example I
As the difference operator $\Delta$ can be expressed in terms of the shift operator $E$, so $E$ can be expressed in terms of $\Delta$:

$$E = \Delta + 1.$$

This rule can be iterated, giving the summation formula:

$$E^n = (\Delta + 1)^n = \sum_{k=0}^n\binom{n}{k}\Delta^k$$

which can be seen as the "dual" of the formula already considered:

$$\Delta^n = \sum_{k=0}^n\binom{n}{k}(-1)^{n-k}E^k.$$

The evaluation of the successive differences of any function $f(x)$ allows us to state and prove two identities, which may have combinatorial significance. Here we record some typical examples; we mark with an asterisk the cases in which $\Delta^0 f(x) \ne If(x)$, i.e., in which the general formula for $\Delta^n$ does not reduce to $f(x)$ for $n = 0$.

1) The function $f(x) = 1/x$ has already been developed, at least partially:

$$\Delta\frac{1}{x} = \frac{-1}{x(x+1)} \qquad \Delta^n\frac{1}{x} = \frac{(-1)^n n!}{x(x+1)\cdots(x+n)} = \frac{(-1)^n}{x}\binom{x+n}{n}^{-1}$$

$$\sum_k\binom{n}{k}\frac{(-1)^k}{x+k} = \frac{n!}{x(x+1)\cdots(x+n)} = \frac{1}{x}\binom{x+n}{n}^{-1}$$

$$\sum_k\binom{n}{k}(-1)^k\binom{x+k}{k}^{-1} = \frac{x}{x+n}.$$

2*) A somewhat similar situation, but a bit more complex:

$$\Delta\frac{p+x}{m+x} = \frac{m-p}{(m+x)(m+x+1)} \qquad \Delta^n\frac{p+x}{m+x} = \frac{m-p}{m+x}(-1)^{n-1}\binom{m+x+n}{n}^{-1}$$

$$\sum_k\binom{n}{k}(-1)^k\frac{p+k}{m+k} = \frac{p-m}{m}\binom{m+n}{n}^{-1} \qquad (n > 0)$$

$$\sum_k\binom{n}{k}\binom{m+k}{k}^{-1}(-1)^k = \frac{m}{m+n} \qquad \text{(see above)}.$$

According to the general formula, we should have $\Delta^0\frac{p+x}{m+x} = \frac{p-m}{m+x}$; in the last sum, however, we have to set $\Delta^0 = I$, and therefore we also have to subtract 1 from both members in order to obtain a true identity; a similar situation arises whenever $\Delta^0 \ne I$.

3) Another version of the first example:

$$\Delta\frac{1}{px+m} = \frac{-p}{(px+m)(px+p+m)}$$

$$\Delta^n\frac{1}{px+m} = \frac{(-1)^n n!\,p^n}{(px+m)(px+p+m)\cdots(px+np+m)}$$

$$\sum_k\binom{n}{k}\frac{(-1)^k}{pk+m} = \frac{n!\,p^n}{m(m+p)\cdots(m+np)}$$

$$\sum_k\binom{n}{k}\frac{(-1)^k k!\,p^k}{m(m+p)\cdots(m+pk)} = \frac{1}{pn+m}.$$

4*) A case involving the harmonic numbers:

$$\Delta H_x = \frac{1}{x+1} \qquad \Delta^n H_x = \frac{(-1)^{n-1}}{n}\binom{x+n}{n}^{-1}$$

$$\sum_k\binom{n}{k}(-1)^k H_{x+k} = -\frac{1}{n}\binom{x+n}{n}^{-1} \qquad (n > 0)$$

$$\sum_{k=1}^n\binom{n}{k}\frac{(-1)^{k-1}}{k}\binom{x+k}{k}^{-1} = H_{x+n} - H_x$$

where the case $x = 0$ is to be noted.

5*) A more complicated case with the harmonic numbers:

$$\Delta(xH_x) = H_x + 1 \qquad \Delta^n(xH_x) = \frac{(-1)^n}{n-1}\binom{x+n-1}{n-1}^{-1} \quad (n \ge 2)$$

$$\sum_k\binom{n}{k}(-1)^k(x+k)H_{x+k} = \frac{1}{n-1}\binom{x+n-1}{n-1}^{-1}$$

$$\sum_k\binom{n}{k}\frac{(-1)^k}{k-1}\binom{x+k-1}{k-1}^{-1} = (x+n)(H_{x+n} - H_x) - n.$$

6) Harmonic numbers and binomial coefficients:

$$\Delta\binom{x}{m}H_x = \binom{x}{m-1}\left(H_x + \frac{1}{m}\right) \qquad \Delta^n\binom{x}{m}H_x = \binom{x}{m-n}(H_x + H_m - H_{m-n})$$

$$\sum_k\binom{n}{k}(-1)^k\binom{x+k}{m}H_{x+k} = (-1)^n\binom{x}{m-n}(H_x + H_m - H_{m-n})$$

$$\sum_k\binom{n}{k}\binom{x}{m-k}(H_x + H_m - H_{m-k}) = \binom{x+n}{m}H_{x+n}$$

and by performing the sums on the left containing $H_x$ and $H_m$:

$$\sum_k\binom{n}{k}\binom{x}{m-k}H_{m-k} = \binom{x+n}{m}(H_x + H_m - H_{x+n}).$$

7) The function $\ln(x)$ can be inserted in this group:

$$\Delta\ln(x) = \ln\frac{x+1}{x} \qquad \Delta^n\ln(x) = (-1)^n\sum_k\binom{n}{k}(-1)^k\ln(x+k)$$

$$\sum_k\binom{n}{k}(-1)^k\ln(x+k) = \sum_k\binom{n}{k}(-1)^k\ln(x+k)$$

$$\sum_{k=0}^n\sum_{j=0}^k(-1)^{k+j}\binom{n}{k}\binom{k}{j}\ln(x+j) = \ln(x+n).$$

Note that the last but one relation is an identity.
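Identities of this kind are conveniently spot-checked in exact arithmetic; here is such a check (ours) for case 1) and for the harmonic-number case 4*):

    from fractions import Fraction
    from math import comb

    def H(n):
        return sum(Fraction(1, j) for j in range(1, n + 1))

    for n in range(1, 8):
        for x in range(0, 6):
            s1 = sum(Fraction((-1) ** k * comb(n, k), comb(x + k, k))
                     for k in range(n + 1))
            assert s1 == Fraction(x, x + n)
            s4 = sum(Fraction((-1) ** (k - 1) * comb(n, k), k * comb(x + k, k))
                     for k in range(1, n + 1))
            assert s4 == H(x + n) - H(x)
    print("cases 1) and 4*) verified")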
6.9 Shift and Difference Operators - Example II
Here we propose other examples of combinatorial sums obtained by iterating the shift and difference operators.

1) The typical sum containing the binomial coefficients, Newton's binomial theorem:

$$\Delta p^x = (p-1)p^x \qquad \Delta^n p^x = (p-1)^n p^x$$

$$\sum_k\binom{n}{k}(-1)^{n-k}p^k = (p-1)^n \qquad \sum_k\binom{n}{k}(p-1)^k = p^n.$$

2) Two sums involving Fibonacci numbers:

$$\Delta F_x = F_{x-1} \qquad \Delta^n F_x = F_{x-n}$$

$$\sum_k\binom{n}{k}(-1)^k F_{x+k} = (-1)^n F_{x-n} \qquad \sum_k\binom{n}{k}F_{x-k} = F_{x+n}.$$

3) Falling factorials are an introduction to binomial coefficients:

$$\Delta x^{\underline{m}} = m\,x^{\underline{m-1}} \qquad \Delta^n x^{\underline{m}} = m^{\underline{n}}\,x^{\underline{m-n}}$$

$$\sum_k\binom{n}{k}(-1)^k(x+k)^{\underline{m}} = (-1)^n m^{\underline{n}}\,x^{\underline{m-n}} \qquad \sum_k\binom{n}{k}\,m^{\underline{k}}\,x^{\underline{m-k}} = (x+n)^{\underline{m}}.$$

4) Similar sums hold for rising factorials:

$$\Delta x^{\overline{m}} = m\,(x+1)^{\overline{m-1}} \qquad \Delta^n x^{\overline{m}} = m^{\underline{n}}\,(x+n)^{\overline{m-n}}$$

$$\sum_k\binom{n}{k}(-1)^k(x+k)^{\overline{m}} = (-1)^n m^{\underline{n}}\,(x+n)^{\overline{m-n}} \qquad \sum_k\binom{n}{k}\,m^{\underline{k}}\,(x+k)^{\overline{m-k}} = (x+n)^{\overline{m}}.$$

5) Two sums involving the binomial coefficients:

$$\Delta\binom{x}{m} = \binom{x}{m-1} \qquad \Delta^n\binom{x}{m} = \binom{x}{m-n}$$

$$\sum_k\binom{n}{k}(-1)^k\binom{x+k}{m} = (-1)^n\binom{x}{m-n} \qquad \sum_k\binom{n}{k}\binom{x}{m-k} = \binom{x+n}{m}.$$

6) Another case with binomial coefficients:

$$\Delta\binom{p+x}{m+x} = \binom{p+x}{m+x+1} \qquad \Delta^n\binom{p+x}{m+x} = \binom{p+x}{m+x+n}$$

$$\sum_k\binom{n}{k}(-1)^k\binom{p+k}{m+k} = (-1)^n\binom{p}{m+n} \qquad \sum_k\binom{n}{k}\binom{p}{m+k} = \binom{p+n}{m+n}.$$

7) And now a case with the inverse of a binomial coefficient:

$$\Delta\binom{x}{m}^{-1} = -\frac{m}{m+1}\binom{x+1}{m+1}^{-1} \qquad \Delta^n\binom{x}{m}^{-1} = (-1)^n\frac{m}{m+n}\binom{x+n}{m+n}^{-1}$$

$$\sum_k\binom{n}{k}(-1)^k\binom{x+k}{m}^{-1} = \frac{m}{m+n}\binom{x+n}{m+n}^{-1}$$

$$\sum_k\binom{n}{k}(-1)^k\frac{m}{m+k}\binom{x+k}{m+k}^{-1} = \binom{x+n}{m}^{-1}.$$

8) Two sums with the central binomial coefficients:

$$\Delta\frac{1}{4^x}\binom{2x}{x} = -\frac{1}{2(x+1)}\,\frac{1}{4^x}\binom{2x}{x}$$

$$\Delta^n\frac{1}{4^x}\binom{2x}{x} = \frac{(-1)^n(2n)!}{n!\,(x+1)\cdots(x+n)}\,\frac{1}{4^{x+n}}\binom{2x}{x}$$

$$\sum_k\binom{n}{k}(-1)^k\binom{2x+2k}{x+k}\frac{1}{4^k} = \frac{(2n)!}{n!\,(x+1)\cdots(x+n)}\,\frac{1}{4^n}\binom{2x}{x} = \frac{1}{4^n}\binom{2n}{n}\binom{x+n}{n}^{-1}\binom{2x}{x}$$

$$\sum_k\binom{n}{k}\frac{(-1)^k(2k)!}{k!\,(x+1)\cdots(x+k)}\,\frac{1}{4^k}\binom{2x}{x} = \frac{1}{4^n}\binom{2x+2n}{x+n}.$$

9) Two sums with the inverse of the central binomial coefficients:

$$\Delta\,4^x\binom{2x}{x}^{-1} = \frac{4^x}{2x+1}\binom{2x}{x}^{-1}$$

$$\Delta^n\,4^x\binom{2x}{x}^{-1} = \frac{1}{2n-1}\,\frac{(-1)^{n-1}(2n)!\,4^x}{2^n n!\,(2x+1)(2x+3)\cdots(2x+2n-1)}\binom{2x}{x}^{-1}$$

$$\sum_k\binom{n}{k}(-1)^k 4^k\binom{2x+2k}{x+k}^{-1} = -\frac{1}{2n-1}\,\frac{(2n)!}{2^n n!\,(2x+1)\cdots(2x+2n-1)}\binom{2x}{x}^{-1}$$

$$\sum_k\binom{n}{k}\frac{1}{2k-1}\,\frac{(2k)!\,(-1)^{k-1}}{2^k k!\,(2x+1)\cdots(2x+2k-1)} = 4^n\binom{2x+2n}{x+n}^{-1}\binom{2x}{x}.$$
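A quick machine check of two of these families — the falling-factorial sums of 3) and the Vandermonde-type sum of 5) — in integer arithmetic (our own sketch):

    from math import comb

    def falling(x, m):               # x(x-1)...(x-m+1), with falling(x, 0) = 1
        p = 1
        for j in range(m):
            p *= x - j
        return p

    for n in range(6):
        for x in range(8):
            for m in range(8):
                assert sum(comb(n, k) * falling(m, k) * falling(x, m - k)
                           for k in range(n + 1)) == falling(x + n, m)
                assert sum(comb(n, k) * comb(x, m - k)
                           for k in range(min(n, m) + 1)) == comb(x + n, m)
    print("cases 3) and 5) verified")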
6.10 The Addition Operator
The addition operator $S$ is analogous to the difference operator:

$$S = E + 1$$

and in fact a simple connection exists between the two operators:

$$S\,(-1)^x f(x) = (-1)^{x+1}f(x+1) + (-1)^x f(x) = (-1)^{x+1}(f(x+1) - f(x)) = (-1)^{x-1}\Delta f(x).$$

Because of this connection, the addition operator has not been widely considered in the literature, and the symbol $S$ is used here only for convenience. Like the difference operator, the addition operator can be iterated, and it often produces interesting combinatorial sums according to the rules:

$$S^n = (E+1)^n = \sum_k\binom{n}{k}E^k \qquad E^n = (S-1)^n = \sum_k\binom{n}{k}(-1)^{n-k}S^k.$$

Some examples are in order here:

1) Fibonacci numbers are typical:

$$SF_m = F_{m+1} + F_m = F_{m+2} \qquad S^n F_m = F_{m+2n}$$

$$\sum_k\binom{n}{k}F_{m+k} = F_{m+2n} \qquad \sum_k\binom{n}{k}(-1)^k F_{m+2k} = (-1)^n F_{m+n}.$$

2) Here are the binomial coefficients:

$$S\binom{m}{x} = \binom{m+1}{x+1} \qquad S^n\binom{m}{x} = \binom{m+n}{x+n}$$

$$\sum_k\binom{n}{k}\binom{m}{x+k} = \binom{m+n}{x+n} \qquad \sum_k\binom{n}{k}(-1)^k\binom{m+k}{x+k} = (-1)^n\binom{m}{x+n}.$$

3) And finally the inverse of binomial coefficients:

$$S\binom{m}{x}^{-1} = \frac{m+1}{m}\binom{m-1}{x}^{-1} \qquad S^n\binom{m}{x}^{-1} = \frac{m+1}{m-n+1}\binom{m-n}{x}^{-1}$$

$$\sum_k\binom{n}{k}\binom{m}{x+k}^{-1} = \frac{m+1}{m-n+1}\binom{m-n}{x}^{-1}$$

$$\sum_k\binom{n}{k}\frac{(-1)^k}{m-k+1}\binom{m-k}{x}^{-1} = \frac{(-1)^n}{m+1}\binom{m}{x+n}^{-1}.$$

We can obviously invent as many expressions as we desire and, correspondingly, obtain summation formulas of combinatorial interest. For example:

$$S\Delta = (E+1)(E-1) = E^2 - 1 = (E-1)(E+1) = \Delta S.$$

This derivation shows that the two operators $S$ and $\Delta$ commute. We can directly verify this property:

$$S\Delta f(x) = S(f(x+1) - f(x)) = f(x+2) - f(x) = (E^2-1)f(x)$$
$$\Delta S f(x) = \Delta(f(x+1) + f(x)) = f(x+2) - f(x) = (E^2-1)f(x).$$

Consequently, we have the two summation formulas:

$$\Delta^n S^n = (E^2-1)^n = \sum_k\binom{n}{k}(-1)^{n-k}E^{2k} \qquad E^{2n} = (\Delta S + 1)^n = \sum_k\binom{n}{k}\Delta^k S^k.$$

A simple example is offered by the Fibonacci numbers:

$$\Delta S F_m = F_{m+1} \qquad (\Delta S)^n F_m = \Delta^n S^n F_m = F_{m+n}$$

$$\sum_k\binom{n}{k}(-1)^k F_{m+2k} = (-1)^n F_{m+n} \qquad \sum_k\binom{n}{k}F_{m+k} = F_{m+2n}$$

but these identities have already been proved using the addition operator $S$.
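Both the commutation $S\Delta = \Delta S$ and the two Fibonacci sums can be verified mechanically (a sketch of ours):

    from math import comb

    def fib(n):
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    f = lambda x: x ** 3 + 2 * x                  # an arbitrary test function
    D = lambda g: (lambda x: g(x + 1) - g(x))     # difference operator
    S = lambda g: (lambda x: g(x + 1) + g(x))     # addition operator

    assert all(S(D(f))(x) == D(S(f))(x) for x in range(10))
    for n in range(8):
        for m in range(8):
            assert sum(comb(n, k) * fib(m + k) for k in range(n + 1)) == fib(m + 2 * n)
            assert sum(comb(n, k) * (-1) ** k * fib(m + 2 * k)
                       for k in range(n + 1)) == (-1) ** n * fib(m + n)
    print("commutation and Fibonacci sums verified")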
6.11 Definite and Indefinite summation
The following result is one of the most important rules connecting the finite operator method and combinatorial sums:

$$\sum_{k=0}^n E^k = [z^n]\frac{1}{1-z}\,\frac{1}{1-Ez} = [z^n]\frac{1}{(E-1)z}\left(\frac{1}{1-Ez} - \frac{1}{1-z}\right) =$$
$$= \frac{1}{E-1}\,[z^{n+1}]\left(\frac{1}{1-Ez} - \frac{1}{1-z}\right) = \frac{E^{n+1}-1}{E-1} = (E^{n+1}-1)\Delta^{-1}.$$

We observe that the operator $E$ commutes with the indeterminate $z$, which is constant with respect to the variable $x$, on which $E$ operates. The rule above is called the rule of definite summation; the operator $\Delta^{-1}$ is called indefinite summation and is often denoted by $\Sigma$. In order to make this point clear, let us consider any function $f(x)$ and suppose that a function $g(x)$ exists such that $\Delta g(x) = f(x)$. Hence we have $\Delta^{-1}f(x) = g(x)$ and the rule of definite summation immediately gives:

$$\sum_{k=0}^n f(x+k) = g(x+n+1) - g(x).$$

This is analogous to the rule of definite integration. In fact, the operator of indefinite integration $\int dx$ is the inverse of the differentiation operator $D$, and if $f(x)$ is any function, a primitive function of $f(x)$ is any function $\hat g(x)$ such that $D\hat g(x) = f(x)$, or $D^{-1}f(x) = \int f(x)\,dx = \hat g(x)$. The fundamental theorem of the integral calculus relates definite and indefinite integration:

$$\int_a^b f(x)\,dx = \hat g(b) - \hat g(a).$$

The formula for definite summation can be written in a similar way, if we consider the integer variable $k$ and set $a = x$ and $b = x+n+1$:

$$\sum_{k=a}^{b-1} f(k) = g(b) - g(a).$$

These facts create an analogy between $\Delta^{-1}$ and $D^{-1}$, or $\Sigma$ and $\int dx$, which can be stressed by considering the formal properties of $\Sigma$. First of all, we observe that $g(x) = \Sigma f(x)$ is not uniquely determined. If $C(x)$ is any function periodic with period 1, i.e., $C(x+k) = C(x)$, $\forall k \in \mathbb{Z}$, we have:

$$\Delta(g(x) + C(x)) = \Delta g(x) + \Delta C(x) = \Delta g(x)$$

and therefore:

$$\Sigma f(x) = g(x) + C(x).$$

When $f(x)$ is a sequence and is only defined for integer values of $x$, the function $C(x)$ reduces to a constant, and plays the same rôle as the integration constant in the operation of indefinite integration.

The operator $\Sigma$ is obviously linear:

$$\Sigma(\alpha f(x) + \beta g(x)) = \alpha\Sigma f(x) + \beta\Sigma g(x).$$

This is proved by applying the operator $\Delta$ to both sides.

An important property is summation by parts, corresponding to the well-known rule of the indefinite integration operator. Let us begin with the rule for the difference of a product, which we proved earlier:

$$\Delta(f(x)g(x)) = \Delta f(x)\,Eg(x) + f(x)\,\Delta g(x).$$

By applying the operator $\Sigma$ to both sides and exchanging terms:

$$\Sigma(f(x)\Delta g(x)) = f(x)g(x) - \Sigma(\Delta f(x)\,Eg(x)).$$

This formula allows us to change a sum of products, of which we know that the second factor is a difference, into a sum involving the difference of the first factor. The transformation can be convenient every time the difference of the first factor is simpler than the difference of the second factor. For example, let us perform the following indefinite summation:

$$\Sigma\,x\binom{x}{m} = \Sigma\,x\,\Delta\binom{x}{m+1} = x\binom{x}{m+1} - \Sigma(\Delta x)\binom{x+1}{m+1} =$$
$$= x\binom{x}{m+1} - \Sigma\binom{x+1}{m+1} = x\binom{x}{m+1} - \binom{x+1}{m+2}.$$

Obviously, this indefinite sum can be transformed into a definite sum by using the first result of this section:

$$\sum_{k=a}^b k\binom{k}{m} = (b+1)\binom{b+1}{m+1} - a\binom{a}{m+1} - \binom{b+2}{m+2} + \binom{a+1}{m+2}$$

and for $a = 0$ and $b = n$:

$$\sum_{k=0}^n k\binom{k}{m} = (n+1)\binom{n+1}{m+1} - \binom{n+2}{m+2}.$$
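The definite sum just obtained is easily tested (our check):

    from math import comb

    for m in range(5):
        for n in range(12):
            lhs = sum(k * comb(k, m) for k in range(n + 1))
            assert lhs == (n + 1) * comb(n + 1, m + 1) - comb(n + 2, m + 2)
    print("summation by parts result verified")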
6.12 Definite Summation
In a sense, the rule:

$$\sum_{k=0}^n E^k = (E^{n+1}-1)\Delta^{-1} = (E^{n+1}-1)\Sigma$$

is the most important result of the operator method. In fact, it reduces the sum of the successive elements of a sequence to the computation of the indefinite sum, which is just the operator inverse of the difference. Unfortunately, $\Delta^{-1}$ is not easy to compute and, apart from a restricted number of cases, there is no general rule allowing us to guess what $\Delta^{-1}f(x) = \Sigma f(x)$ might be. In this rather pessimistic sense, the rule is very fine, very general and completely useless.

However, from a more positive point of view, we can say that whenever we know, in some way or another, an expression for $\Delta^{-1}f(x) = \Sigma f(x)$, we have solved the problem of finding $\sum_{k=0}^n f(x+k)$. For example, we can look at the differences computed in the previous sections and, for each of them, obtain the $\Sigma$ of some function; in this way we immediately have a number of sums. The negative point is that, sometimes, we do not have a simple function and, therefore, the sum may not have any combinatorial interest.

Here are a number of identities obtained from our previous computations.

1) We have again the partial sums of the geometric series:

$$\Delta^{-1}p^x = \frac{p^x}{p-1} = \Sigma p^x$$

$$\sum_{k=0}^n p^{x+k} = \frac{p^{x+n+1} - p^x}{p-1} \qquad \sum_{k=0}^n p^k = \frac{p^{n+1}-1}{p-1} \quad (x=0).$$

2) The sum of consecutive Fibonacci numbers:

$$\Delta^{-1}F_x = F_{x+1} = \Sigma F_x$$

$$\sum_{k=0}^n F_{x+k} = F_{x+n+2} - F_{x+1} \qquad \sum_{k=0}^n F_k = F_{n+2} - 1 \quad (x=0).$$

3) The sum of consecutive binomial coefficients with constant denominator:

$$\Delta^{-1}\binom{x}{m} = \binom{x}{m+1} = \Sigma\binom{x}{m}$$

$$\sum_{k=0}^n\binom{x+k}{m} = \binom{x+n+1}{m+1} - \binom{x}{m+1} \qquad \sum_{k=0}^n\binom{k}{m} = \binom{n+1}{m+1} \quad (x=0).$$

4) The sum of consecutive binomial coefficients:

$$\Delta^{-1}\binom{p+x}{m+x} = \binom{p+x}{m+x-1} = \Sigma\binom{p+x}{m+x}$$

$$\sum_{k=0}^n\binom{p+k}{m+k} = \binom{p+n+1}{m+n} - \binom{p}{m-1}.$$

5) The sum of falling factorials:

$$\Delta^{-1}x^{\underline{m}} = \frac{x^{\underline{m+1}}}{m+1} = \Sigma x^{\underline{m}}$$

$$\sum_{k=0}^n(x+k)^{\underline{m}} = \frac{(x+n+1)^{\underline{m+1}} - x^{\underline{m+1}}}{m+1} \qquad \sum_{k=0}^n k^{\underline{m}} = \frac{(n+1)^{\underline{m+1}}}{m+1} \quad (x=0).$$

6) The sum of rising factorials:

$$\Delta^{-1}x^{\overline{m}} = \frac{(x-1)^{\overline{m+1}}}{m+1} = \Sigma x^{\overline{m}}$$

$$\sum_{k=0}^n(x+k)^{\overline{m}} = \frac{(x+n)^{\overline{m+1}} - (x-1)^{\overline{m+1}}}{m+1} \qquad \sum_{k=0}^n k^{\overline{m}} = \frac{n^{\overline{m+1}}}{m+1} \quad (x=0).$$

7) The sum of inverse binomial coefficients:

$$\Delta^{-1}\binom{x}{m}^{-1} = -\frac{m}{m-1}\binom{x-1}{m-1}^{-1} = \Sigma\binom{x}{m}^{-1}$$

$$\sum_{k=0}^n\binom{x+k}{m}^{-1} = \frac{m}{m-1}\left(\binom{x-1}{m-1}^{-1} - \binom{x+n}{m-1}^{-1}\right).$$

8) The sum of harmonic numbers. Since $1 = \Delta x$, we have:

$$\Delta^{-1}H_x = xH_x - x = \Sigma H_x$$

$$\sum_{k=0}^n H_{x+k} = (x+n+1)H_{x+n+1} - xH_x - (n+1)$$

$$\sum_{k=0}^n H_k = (n+1)H_{n+1} - (n+1) = (n+1)H_n - n \quad (x=0).$$
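As usual, a few of these sums can be spot-checked mechanically; the sketch below (ours) verifies 3), 5) and 8) at $x = 0$:

    from fractions import Fraction
    from math import comb

    def falling(x, m):
        p = 1
        for j in range(m):
            p *= x - j
        return p

    def H(n):
        return sum(Fraction(1, j) for j in range(1, n + 1))

    for n in range(10):
        for m in range(6):
            assert sum(comb(k, m) for k in range(n + 1)) == comb(n + 1, m + 1)
            assert (m + 1) * sum(falling(k, m) for k in range(n + 1)) \
                   == falling(n + 1, m + 1)
        assert sum(H(k) for k in range(n + 1)) == (n + 1) * H(n) - n
    print("sums 3), 5) and 8) verified")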
6.13 The Euler-McLaurin Summation Formula
One of the most striking applications of the finite operator method is the formal proof of the Euler-McLaurin summation formula. The starting point is the Taylor theorem for the series expansion of a function $f(x) \in C^\infty$, i.e., a function having derivatives of any order. The usual form of the theorem:

$$f(x+h) = f(x) + \frac{h}{1!}f'(x) + \frac{h^2}{2!}f''(x) + \cdots + \frac{h^n}{n!}f^{(n)}(x) + \cdots$$

can be interpreted in the sense of operators as a result connecting the shift and the differentiation operators. In fact, for $h = 1$, it can be written as:

$$Ef(x) = If(x) + \frac{Df(x)}{1!} + \frac{D^2 f(x)}{2!} + \cdots$$

and therefore as a relation between operators:

$$E = 1 + \frac{D}{1!} + \frac{D^2}{2!} + \cdots + \frac{D^n}{n!} + \cdots = e^D.$$

This formal identity relates the finite operator $E$ and the infinitesimal operator $D$, and by subtracting 1 from both sides it can be formulated as:

$$\Delta = e^D - 1.$$

By inverting, we have a formula for the $\Sigma$ operator:

$$\Sigma = \frac{1}{e^D - 1} = \frac{1}{D}\left(\frac{D}{e^D - 1}\right).$$

Now, we recognize the generating function of the Bernoulli numbers, and therefore we have a development for $\Sigma$:

$$\Sigma = \frac{1}{D}\left(B_0 + \frac{B_1}{1!}D + \frac{B_2}{2!}D^2 + \cdots\right) =$$
$$= D^{-1} - \frac{1}{2}I + \frac{1}{12}D - \frac{1}{720}D^3 + \frac{1}{30240}D^5 - \frac{1}{1209600}D^7 + \cdots.$$

This is not a series development since, as we know, the Bernoulli numbers diverge to infinity. We have a case of asymptotic development, which is only defined when we consider a limited number of terms, but in general diverges if we let the number of terms go to infinity. The number of terms for which the sum approaches its true value depends on the function $f(x)$ and on the argument $x$.

From the indefinite sum we can pass to the definite sum by applying the general rule of Section 6.12. Since $D^{-1} = \int dx$, we immediately have:

$$\sum_{k=0}^{n-1} f(k) = \int_0^n f(x)\,dx - \frac{1}{2}\Big[f(x)\Big]_0^n + \frac{1}{12}\Big[f'(x)\Big]_0^n - \frac{1}{720}\Big[f'''(x)\Big]_0^n + \cdots$$

and this is the celebrated Euler-McLaurin summation formula. It expresses a sum as a function of the integral and the successive derivatives of the function $f(x)$. In this sense, the formula can be seen as a method for approximating a sum by means of an integral or, vice versa, for approximating an integral by means of a sum, and this was just the point of view of the mathematicians who first developed it.

As a simple but very important example, let us find an asymptotic development for the harmonic numbers $H_n$. Since $H_n = H_{n-1} + 1/n$, the Euler-McLaurin formula applies to $H_{n-1}$ and to the function $f(x) = 1/x$, giving:

$$H_{n-1} = \int_1^n\frac{dx}{x} - \frac{1}{2}\left[\frac{1}{x}\right]_1^n + \frac{1}{12}\left[-\frac{1}{x^2}\right]_1^n - \frac{1}{720}\left[-\frac{6}{x^4}\right]_1^n + \frac{1}{30240}\left[-\frac{120}{x^6}\right]_1^n + \cdots =$$
$$= \ln n - \frac{1}{2n} + \frac{1}{2} - \frac{1}{12n^2} + \frac{1}{12} + \frac{1}{120n^4} - \frac{1}{120} - \frac{1}{252n^6} + \frac{1}{252} + \cdots$$

In this expression a number of constants appears, and they can be summed together to form a constant $\gamma$, provided that the sum actually converges. Considering the limit $n \to \infty$, we see that this constant $\gamma$ is the Euler-Mascheroni constant:

$$\lim_{n\to\infty}(H_{n-1} - \ln n) = \gamma = 0.577215664902\ldots$$

By adding $1/n$ to both sides of the previous relation, we eventually find:

$$H_n = \ln n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} - \frac{1}{252n^6} + \cdots$$

and this is the asymptotic expansion we were looking for.
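The quality of the expansion is easily appreciated numerically ($\gamma$ taken here as a known constant):

    from math import log

    gamma = 0.5772156649015329

    def H(n):
        return sum(1 / k for k in range(1, n + 1))

    for n in (10, 100, 1000):
        approx = log(n) + gamma + 1/(2*n) - 1/(12*n**2) + 1/(120*n**4)
        print(n, H(n), approx)   # the two values agree to many decimal digits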
6.14 Applications of the Euler-McLaurin Formula
As another application of the Euler-McLaurin summation formula, we now show the derivation of Stirling's approximation for $n!$. The first step consists in taking the logarithm of that quantity:

$$\ln n! = \ln 1 + \ln 2 + \ln 3 + \cdots + \ln n$$

so that we are reduced to computing a sum, and hence can apply the Euler-McLaurin formula:

$$\ln(n-1)! = \sum_{k=1}^{n-1}\ln k = \int_1^n \ln x\,dx - \frac{1}{2}[\ln x]_1^n + \frac{1}{12}\left[\frac{1}{x}\right]_1^n - \frac{1}{720}\left[\frac{2}{x^3}\right]_1^n + \cdots =$$
$$= n\ln n - n + 1 - \frac{1}{2}\ln n + \frac{1}{12n} - \frac{1}{12} - \frac{1}{360n^3} + \frac{1}{360} + \cdots.$$

Here we have used the fact that $\int\ln x\,dx = x\ln x - x$. At this point we can add $\ln n$ to both sides and introduce a constant $\sigma = 1 - 1/12 + 1/360 - \cdots$. It is not by any means easy to determine the value of $\sigma$ directly, but by other approaches to the same problem it is known that $\sigma = \ln\sqrt{2\pi}$. Numerically, we can observe that:

$$1 - \frac{1}{12} + \frac{1}{360} = 0.919(4) \qquad\text{and}\qquad \ln\sqrt{2\pi} \approx 0.9189388.$$

We can now go on with our sum:

$$\ln n! = n\ln n - n + \frac{1}{2}\ln n + \ln\sqrt{2\pi} + \frac{1}{12n} - \frac{1}{360n^3} + \cdots$$

To obtain the value of $n!$ we only have to take exponentials:

$$n! = \frac{n^n}{e^n}\sqrt n\,\sqrt{2\pi}\,\exp\left(\frac{1}{12n}\right)\exp\left(-\frac{1}{360n^3}\right) =$$
$$= \sqrt{2\pi n}\,\frac{n^n}{e^n}\left(1 + \frac{1}{12n} + \frac{1}{288n^2} + \cdots\right)\left(1 - \frac{1}{360n^3} + \cdots\right) = \sqrt{2\pi n}\,\frac{n^n}{e^n}\left(1 + \frac{1}{12n} + \frac{1}{288n^2} + \cdots\right).$$

This is the well-known Stirling approximation for $n!$. By means of this approximation, we can also find the approximation for another important quantity:

$$\binom{2n}{n} = \frac{(2n)!}{n!^2} = \frac{\sqrt{4\pi n}\left(\dfrac{2n}{e}\right)^{2n}\left(1 + \dfrac{1}{24n} + \dfrac{1}{1152n^2} + \cdots\right)}{2\pi n\left(\dfrac{n}{e}\right)^{2n}\left(1 + \dfrac{1}{12n} + \dfrac{1}{288n^2} + \cdots\right)^{2}} = \frac{4^n}{\sqrt{\pi n}}\left(1 - \frac{1}{8n} + \frac{1}{128n^2} + \cdots\right).$$
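Both approximations can be compared with the exact values; in the check below (ours) the ratios tend to 1 quickly:

    from math import comb, e, factorial, pi, sqrt

    for n in (5, 10, 20, 40):
        stirling = sqrt(2*pi*n) * (n/e)**n * (1 + 1/(12*n) + 1/(288*n**2))
        central = 4**n / sqrt(pi*n) * (1 - 1/(8*n) + 1/(128*n**2))
        print(n, factorial(n) / stirling, comb(2*n, n) / central)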
Another application of the Euler-McLaurin summation formula is given by the sum $\sum_{k=1}^n k^p$, where $p$ is any constant different from $-1$ (the case $p = -1$ is that of the harmonic numbers):

$$\sum_{k=0}^{n-1} k^p = \int_0^n x^p\,dx - \frac{1}{2}[x^p]_0^n + \frac{1}{12}[px^{p-1}]_0^n - \frac{1}{720}[p(p-1)(p-2)x^{p-3}]_0^n + \cdots =$$
$$= \frac{n^{p+1}}{p+1} - \frac{n^p}{2} + \frac{pn^{p-1}}{12} - \frac{p(p-1)(p-2)n^{p-3}}{720} + \cdots.$$

In this case the evaluation at 0 does not introduce any constant. By adding $n^p$ to both sides, we have the following formula, which only contains a finite number of terms when $p$ is a positive integer:

$$\sum_{k=0}^n k^p = \frac{n^{p+1}}{p+1} + \frac{n^p}{2} + \frac{pn^{p-1}}{12} - \frac{p(p-1)(p-2)n^{p-3}}{720} + \cdots$$

If $p$ is not an integer, after $\lceil p\rceil$ differentiations we obtain $x^q$, with $q < 0$, and therefore we cannot consider the limit at 0. We proceed with the Euler-McLaurin formula in the following way:

$$\sum_{k=1}^{n-1} k^p = \int_1^n x^p\,dx - \frac{1}{2}[x^p]_1^n + \frac{1}{12}[px^{p-1}]_1^n - \frac{1}{720}[p(p-1)(p-2)x^{p-3}]_1^n + \cdots =$$
$$= \frac{n^{p+1}}{p+1} - \frac{1}{p+1} - \frac{n^p}{2} + \frac{1}{2} + \frac{pn^{p-1}}{12} - \frac{p}{12} - \frac{p(p-1)(p-2)n^{p-3}}{720} + \frac{p(p-1)(p-2)}{720} + \cdots$$

$$\sum_{k=1}^n k^p = \frac{n^{p+1}}{p+1} + \frac{n^p}{2} + \frac{pn^{p-1}}{12} - \frac{p(p-1)(p-2)n^{p-3}}{720} + \cdots + K_p.$$

The constant:

$$K_p = -\frac{1}{p+1} + \frac{1}{2} - \frac{p}{12} + \frac{p(p-1)(p-2)}{720} + \cdots$$

has a fundamental rôle when the leading term $n^{p+1}/(p+1)$ does not increase with $n$, i.e., when $p < -1$. In that case, in fact, the sum converges to $K_p$. When $p > -1$ the constant is less important. For example, we have:

$$\sum_{k=1}^n \sqrt k = \frac{2}{3}n\sqrt n + \frac{\sqrt n}{2} + K_{1/2} + \frac{1}{24\sqrt n} + \cdots \qquad K_{1/2} \approx -0.2078862\ldots$$

$$\sum_{k=1}^n \frac{1}{\sqrt k} = 2\sqrt n + K_{-1/2} + \frac{1}{2\sqrt n} - \frac{1}{24n\sqrt n} + \cdots \qquad K_{-1/2} \approx -1.4603545\ldots$$

For $p = -2$ we find:

$$\sum_{k=1}^n \frac{1}{k^2} = K_{-2} - \frac{1}{n} + \frac{1}{2n^2} - \frac{1}{6n^3} + \frac{1}{30n^5} - \cdots$$

It is possible to show that $K_{-2} = \pi^2/6$, and therefore we have a way to approximate the sum (see Section 2.7).
Chapter 7
Asymptotics
7.1 The convergence of power series
On many occasions, we have pointed out that our approach to power series was purely formal. Because of that, we always spoke of "formal power series", and never considered convergence problems. As we have seen, a lot of things can be said about formal power series, but now the moment has arrived when we must turn to the convergence of power series. We will see that this allows us to evaluate the asymptotic behavior of the coefficients $f_n$ of a power series $\sum_n f_n t^n$, thus solving many problems in which the exact value of $f_n$ cannot be found. In fact, many times, the asymptotic evaluation of $f_n$ can be made more precise and an actual approximation of $f_n$ can be found.

The natural setting for talking about convergence is the field $\mathbb{C}$ of the complex numbers and therefore, from now on, we will think of the indeterminate $t$ as a variable taking its values in $\mathbb{C}$. Obviously, a power series $f(t) = \sum_n f_n t^n$ converges for some $t_0 \in \mathbb{C}$ iff $f(t_0) = \sum_n f_n t_0^n < \infty$, and diverges iff $\lim_{t\to t_0} f(t) = \infty$. There are cases in which a series neither converges nor diverges; for example, when $t = 1$, the series $\sum_n(-1)^n t^n$ does not tend to any limit, finite or infinite. Therefore, when we say that a series does not converge (to a finite value) for a given value $t_0 \in \mathbb{C}$, we mean that the series in $t_0$ diverges or does not tend to any limit.

A basic result on convergence is given by the following:

Theorem 7.1.1 Let $f(t) = \sum_n f_n t^n$ be a power series such that $f(t_0)$ converges for the value $t_0 \in \mathbb{C}$. Then $f(t)$ converges for every $t_1 \in \mathbb{C}$ such that $|t_1| < |t_0|$.

Proof: If $f(t_0) < \infty$ then an index $N \in \mathbb{N}$ exists such that for every $n > N$ we have $|f_n t_0^n| = |f_n||t_0|^n < M$, for some finite $M \in \mathbb{R}$. This means $|f_n| < M/|t_0|^n$ and therefore:

$$\left|\sum_{n=N}^\infty f_n t_1^n\right| \le \sum_{n=N}^\infty |f_n||t_1|^n \le \sum_{n=N}^\infty \frac{M}{|t_0|^n}|t_1|^n = M\sum_{n=N}^\infty\left(\frac{|t_1|}{|t_0|}\right)^{\!n} < \infty$$

because the last sum is a geometric series with $|t_1|/|t_0| < 1$ by the hypothesis $|t_1| < |t_0|$. Since the first $N$ terms obviously amount to a finite quantity, the theorem follows.

In a similar way, we can prove that if the series diverges for some value $t_0 \in \mathbb{C}$, then it diverges for every value $t_1$ such that $|t_1| > |t_0|$. Obviously, a series can converge only for the single value $t_0 = 0$, as happens for $\sum_n n!\,t^n$, or can converge for every value $t \in \mathbb{C}$, as for $\sum_n t^n/n! = e^t$. In all the other cases, the previous theorem implies:

Theorem 7.1.2 Let $f(t) = \sum_n f_n t^n$ be a power series; then there exists a non-negative number $R \in \mathbb{R}$, or $R = \infty$, such that:

1. for every complex number $t_0$ such that $|t_0| < R$ the series (absolutely) converges and, in fact, the convergence is uniform in every circle of radius $\rho < R$;

2. for every complex number $t_0$ such that $|t_0| > R$ the series does not converge.

The uniform convergence derives from the previous proof: the constant $M$ can be made unique by choosing the largest value for all the $t_0$ such that $|t_0| \le \rho$. The value of $R$ is uniquely determined and is called the radius of convergence of the series. From the proof of the theorem, for $r < R$ we have $|f_n|r^n \le M$, or $\sqrt[n]{|f_n|} \le \sqrt[n]{M}/r \to 1/r$; this implies $\limsup\sqrt[n]{|f_n|} \le 1/R$. Besides, for $r > R$ we have $|f_n|r^n \ge 1$ for infinitely many $n$; this implies $\limsup\sqrt[n]{|f_n|} \ge 1/R$, and therefore we have the following formula for the radius of convergence:

$$\frac{1}{R} = \limsup_{n\to\infty}\sqrt[n]{|f_n|}.$$

This result is the basis for our considerations on the asymptotics of power series coefficients. In fact, it implies that, as a first approximation, $|f_n|$ grows as $1/R^n$. However, this is a rough estimate, because it can also grow as $n/R^n$ or $1/(nR^n)$, and many possibilities arise which can make the basic approximation more precise; the next sections will be dedicated to this problem. We conclude by noticing that if:

$$\lim_{n\to\infty}\left|\frac{f_{n+1}}{f_n}\right| = S$$

then $R = 1/S$ is the radius of convergence of the series.
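For a series with positive coefficients, the ratio $f_n/f_{n+1}$ itself gives a practical estimate of $R$; for the Fibonacci series, for instance:

    F = [0, 1]
    for _ in range(40):
        F.append(F[-1] + F[-2])
    print(F[40] / F[41])   # ~ 0.6180339887, i.e. R = 1/phi for t/(1 - t - t^2)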
7.2 The method of Darboux
Newton's rule is the basis for many considerations on asymptotics. In practice, we used it to prove that $F_n \sim \phi^n/\sqrt 5$, and many other proofs can be performed by using Newton's rule together with the following theorem, whose relevance was noted by Bender and which, therefore, will be called Bender's theorem:

Theorem 7.2.1 Let $f(t) = g(t)h(t)$, where $f(t)$, $g(t)$, $h(t)$ are power series and $h(t)$ has a radius of convergence larger than $f(t)$'s (which therefore equals the radius of convergence of $g(t)$); if $\lim_{n\to\infty} g_n/g_{n+1} = b$ and $h(b) \ne 0$, then:

$$f_n \sim h(b)\,g_n.$$

Let us remember that if $g(t)$ has positive real coefficients, then $g_n/g_{n+1}$ tends to the radius of convergence of $g(t)$. The proof of this theorem is omitted here; instead, we give a simple example. Let us suppose we wish to find the asymptotic value of the Motzkin numbers, whose generating function is:

$$\mu(t) = \frac{1 - t - \sqrt{1 - 2t - 3t^2}}{2t^2}.$$

For $n \ge 2$ we obviously have:

$$\mu_n = [t^n]\frac{1 - t - \sqrt{1-2t-3t^2}}{2t^2} = -\frac{1}{2}[t^{n+2}]\sqrt{1+t}\,(1-3t)^{1/2}.$$

We now observe that the radius of convergence of $\mu(t)$ is $R = 1/3$, which is the same as that of $g(t) = (1-3t)^{1/2}$, while $h(t) = \sqrt{1+t}$ has 1 as its radius of convergence; therefore $\mu_n/\mu_{n+1} \to 1/3$ as $n \to \infty$. By Bender's theorem we find:

$$\mu_n \sim -\frac{1}{2}\sqrt{\frac{4}{3}}\,[t^{n+2}](1-3t)^{1/2} = -\frac{\sqrt 3}{3}\binom{1/2}{n+2}(-3)^{n+2} = \frac{\sqrt 3}{3(2n+3)}\binom{2n+4}{n+2}\left(\frac{3}{4}\right)^{\!n+2}.$$
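Comparing the asymptotic value with the exact Motzkin numbers (computed here from the standard convolution recurrence $M_n = M_{n-1} + \sum_k M_k M_{n-2-k}$) shows how the ratio approaches 1:

    from math import comb, sqrt

    M = [1]
    for n in range(1, 31):
        M.append(M[n - 1] + sum(M[k] * M[n - 2 - k] for k in range(n - 1)))

    for n in (10, 20, 30):
        approx = sqrt(3) / (3 * (2*n + 3)) * comb(2*n + 4, n + 2) * (3/4) ** (n + 2)
        print(n, M[n] / approx)    # the ratios tend to 1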
This is a particular case of a more general result due to Darboux and known as Darboux' method. First of all, let us show how it is possible to obtain an approximation for the binomial coefficient $\binom{\gamma}{n}$, when $\gamma \in \mathbb{C}$ is a fixed number and $n$ is large. We begin by proving the following formula for the ratio of two large values of the $\Gamma$ function ($a, b$ are two small parameters with respect to $n$):

$$\frac{\Gamma(n+a)}{\Gamma(n+b)} = n^{a-b}\left(1 + \frac{(a-b)(a+b-1)}{2n} + O\!\left(\frac{1}{n^2}\right)\right).$$

Let us apply the Stirling formula for the $\Gamma$ function:

$$\frac{\Gamma(n+a)}{\Gamma(n+b)} \approx \frac{\sqrt{\dfrac{2\pi}{n+a}}\left(\dfrac{n+a}{e}\right)^{n+a}\left(1 + \dfrac{1}{12(n+a)}\right)}{\sqrt{\dfrac{2\pi}{n+b}}\left(\dfrac{n+b}{e}\right)^{n+b}\left(1 + \dfrac{1}{12(n+b)}\right)}.$$

If we limit ourselves to the terms in $1/n$, the two corrections cancel each other and therefore we find:

$$\frac{\Gamma(n+a)}{\Gamma(n+b)} \approx \sqrt{\frac{n+b}{n+a}}\;e^{b-a}\;\frac{(n+a)^{n+a}}{(n+b)^{n+b}} = \sqrt{\frac{n+b}{n+a}}\;e^{b-a}\;n^{a-b}\;\frac{(1+a/n)^{n+a}}{(1+b/n)^{n+b}}.$$

We now obtain asymptotic approximations in the following way:

$$\sqrt{\frac{n+b}{n+a}} = \frac{\sqrt{1+b/n}}{\sqrt{1+a/n}} \approx \left(1+\frac{b}{2n}\right)\left(1-\frac{a}{2n}\right) \approx 1 + \frac{b-a}{2n},$$

$$\left(1+\frac{x}{n}\right)^{\!n+x} = \exp\left((n+x)\ln\left(1+\frac{x}{n}\right)\right) = \exp\left((n+x)\left(\frac{x}{n} - \frac{x^2}{2n^2} + \cdots\right)\right) =$$
$$= \exp\left(x + \frac{x^2}{n} - \frac{x^2}{2n} + \cdots\right) = e^x\left(1 + \frac{x^2}{2n} + \cdots\right).$$

Therefore, for our expression we have:

$$\frac{\Gamma(n+a)}{\Gamma(n+b)} \approx n^{a-b}\,e^{b-a}\,\frac{e^a}{e^b}\left(1+\frac{a^2}{2n}\right)\left(1-\frac{b^2}{2n}\right)\left(1+\frac{b-a}{2n}\right) = n^{a-b}\left(1 + \frac{a^2-b^2-a+b}{2n} + O\!\left(\frac{1}{n^2}\right)\right).$$

We are now in a position to prove the following:

Theorem 7.2.2 Let $f(t) = h(t)(1-\alpha t)^\gamma$, for some $\gamma$ which is not a positive integer, and $h(t)$ having a radius of convergence larger than $1/\alpha$. Then we have:

$$f_n = [t^n]f(t) \sim h\!\left(\frac{1}{\alpha}\right)\binom{\gamma}{n}(-\alpha)^n = \frac{\alpha^n\,h(1/\alpha)}{\Gamma(-\gamma)\,n^{1+\gamma}}.$$

Proof: We simply apply Bender's theorem and the formula for approximating the binomial coefficient:

$$\binom{\gamma}{n} = \frac{\gamma(\gamma-1)\cdots(\gamma-n+1)}{n!} = \frac{(-1)^n(n-\gamma-1)(n-\gamma-2)\cdots(1-\gamma)(-\gamma)}{\Gamma(n+1)}.$$

By repeated applications of the recurrence formula for the $\Gamma$ function, $\Gamma(x+1) = x\Gamma(x)$, we find:

$$\Gamma(n-\gamma) = (n-\gamma-1)(n-\gamma-2)\cdots(1-\gamma)(-\gamma)\Gamma(-\gamma)$$

and therefore:

$$\binom{\gamma}{n} = \frac{(-1)^n\,\Gamma(n-\gamma)}{\Gamma(n+1)\,\Gamma(-\gamma)} = \frac{(-1)^n}{\Gamma(-\gamma)}\,n^{-1-\gamma}\left(1 + \frac{\gamma(\gamma+1)}{2n} + \cdots\right)$$

from which the desired formula follows.
7.3 Singularities: poles
The considerations in the previous sections show how important it is to determine the radius of convergence of a series when we wish to have an approximation of its coefficients. Therefore, we are now going to look more closely at methods for finding the radius of convergence of a given function $f(t)$. In order to do this, we have to distinguish between the function $f(t)$ and its series development, which will be denoted by $\hat f(t)$. So, for example, we have $f(t) = 1/(1-t)$ and $\hat f(t) = 1 + t + t^2 + t^3 + \cdots$. The series $\hat f(t)$ represents $f(t)$ inside the circle of convergence, in the sense that $\hat f(t) = f(t)$ for every $t$ internal to this circle, but $\hat f(t)$ can be different from $f(t)$ on the border of the circle or outside it, where actually the series does not converge. Therefore, the radius of convergence can be determined if we are able to find values $t_0$ for which $f(t_0) \ne \hat f(t_0)$.

A first case is when $t_0$ is an isolated point for which $\lim_{t\to t_0} f(t) = \infty$. In fact, in this case, for every $t$ such that $|t| > |t_0|$ the series $\hat f(t)$ should diverge, as we have seen in the previous section, and therefore $\hat f(t)$ must be different from $f(t)$. This is the case of $f(t) = 1/(1-t)$, which goes to $\infty$ when $t \to 1$. When $|t| > 1$, $f(t)$ assumes a well-defined value while $\hat f(t)$ diverges.

We will call a singularity of $f(t)$ every point $t_0 \in \mathbb{C}$ such that in every neighborhood of $t_0$ there is a $t$ for which $f(t)$ and $\hat f(t)$ behave differently and a $t'$ for which $f(t') = \hat f(t')$. Therefore, $t_0 = 1$ is a singularity of $f(t) = 1/(1-t)$. Because of our previous considerations, the singularities of $f(t)$ determine its radius of convergence; on the other hand, no singularity can be contained in the circle of convergence, and therefore the radius of convergence is determined by the singularity or singularities of smallest modulus. These will be called dominating singularities, and we observe explicitly that a function can have more than one dominating singularity. For example, $f(t) = 1/(1-t^2)$ has $t = 1$ and $t = -1$ as dominating singularities, because $|1| = |-1|$. The radius of convergence is always a non-negative real number, and we have $R = |t_0|$ if $t_0$ is any one of the dominating singularities of $f(t)$.

An isolated point $t_0$ for which $f(t_0) = \infty$ is therefore a singularity of $f(t)$; as we shall see, not every singularity of $f(t)$ is such that $f(t_0) = \infty$ but, for the moment, let us limit ourselves to this case. The following situation is very important: if $f(t_0) = \infty$ and we set $\alpha = 1/t_0$, we will say that $t_0$ is a pole of $f(t)$ iff there exists a positive integer $m$ such that:

$$\lim_{t\to t_0}(1-\alpha t)^m f(t) = K < \infty \qquad\text{and}\qquad K \ne 0.$$

The integer $m$ is called the order of the pole. By this definition, the function $f(t) = 1/(1-t)$ has a pole of order 1 at $t_0 = 1$, while $1/(1-t)^2$ has a pole of order 2 at $t_0 = 1$ and $1/(1-2t)^5$ has a pole of order 5 at $t_0 = 1/2$. A more interesting case is $f(t) = (e^t - e)/(1-t)^2$, which, notwithstanding the $(1-t)^2$, has a pole of order 1 at $t_0 = 1$; in fact:

$$\lim_{t\to 1}(1-t)\frac{e^t - e}{(1-t)^2} = \lim_{t\to 1}\frac{e^t - e}{1-t} = \lim_{t\to 1}\frac{e^t}{-1} = -e.$$

The generating function of the Bernoulli numbers, $f(t) = t/(e^t - 1)$, has infinitely many poles. Observe first that $t = 0$ is not a pole because:

$$\lim_{t\to 0}\frac{t}{e^t - 1} = \lim_{t\to 0}\frac{1}{e^t} = 1.$$

The denominator becomes 0 when $e^t = 1$, and this happens when $t = 2k\pi i$; in fact, $e^{2k\pi i} = \cos 2k\pi + i\sin 2k\pi = 1$. In this case, the dominating singularities are $t_0 = 2\pi i$ and $t_1 = -2\pi i$. Finally, the generating function of the ordered Bell numbers, $f(t) = 1/(2 - e^t)$, has again an infinite number of poles, $t = \ln 2 + 2k\pi i$; in this case the dominating singularity is $t_0 = \ln 2$.

We conclude this section by observing that if $f(t_0) = \infty$, $t_0$ is not necessarily a pole of $f(t)$. In fact, let us consider the generating function of the central binomial coefficients, $f(t) = 1/\sqrt{1-4t}$. For $t_0 = 1/4$ we have $f(1/4) = \infty$, but $t_0$ is not a pole of order 1 because:

$$\lim_{t\to 1/4}\frac{1-4t}{\sqrt{1-4t}} = \lim_{t\to 1/4}\sqrt{1-4t} = 0$$

and the same happens if we try with $(1-4t)^m$ for $m > 1$. As we shall see, this kind of singularity is called "algebraic". Finally, let us consider the function $f(t) = \exp(1/(1-t))$, which goes to $\infty$ as $t \to 1$. In this case we have:

$$\lim_{t\to 1}(1-t)^m\exp\left(\frac{1}{1-t}\right) = \lim_{t\to 1}(1-t)^m\left(1 + \frac{1}{1-t} + \frac{1}{2(1-t)^2} + \cdots\right) = \infty.$$

In fact, the first $m$ terms tend to 0, the next term tends to $1/m!$, but all the other terms go to $\infty$. Therefore, $t_0 = 1$ is not a pole of any order. Whenever we have a function $f(t)$ for which a point $t_0 \in \mathbb{C}$ exists such that $\forall m > 0$: $\lim_{t\to t_0}(1 - t/t_0)^m f(t) = \infty$, we say that $t_0$ is an essential singularity of $f(t)$. Essential singularities are points at which $f(t)$ goes to $\infty$ too fast; these singularities cannot be treated by Darboux' method, and their study is delayed until we study Hayman's method.
7.4 Poles and asymptotics
Darboux’ method can be easily used to deal with
functions, whose dominating singularities are poles.
Actually, a direct application of Bender’s theorem is
sufficient, and this is the way we will use in the fol-
lowing examples.
Fibonacci numbers are easily approximated:
[t
n
]
t
1 −t −t
2
= [t
n
]
t
1 −
´
φt
1
1 −φt


¸
t
1 −
´
φt

t =
1
φ

[t
n
]
1
1 −φt
=
1

5
φ
n
.
Our second example concerns a particular kind of
permutations, called derangements (see Section 2.2).
A derangement is a permutation without any fixed
point. For n = 0 the empty permutation is consid-
ered a derangement, since no fixed point exists. For
n = 1, there is no derangement, but for n = 2 the
permutation (1 2), written in cycle notation, is ac-
tually a derangement. For n = 3 we have the two
derangements (1 2 3) and (1 3 2), and for n = 4 we
have a total of 9 derangements.
Let $D_n$ be the number of derangements in $P_n$; we can count them in the following way. We begin by subtracting from $n!$, the total number of permutations, the number of permutations having at least one fixed point: if the fixed point is 1, we have $(n-1)!$ possible permutations; if the fixed point is 2, we have again $(n-1)!$ permutations of the other elements, and so on. Therefore, we have a total of $n(n-1)!$ cases, giving the approximation:
$$D_n = n! - n(n-1)!.$$
This quantity is clearly 0, and this happens because we have subtracted twice every permutation with at least 2 fixed points: in fact, we subtracted it when we considered the first and the second fixed point. Therefore, we now have to add back the permutations with at least two fixed points. These are obtained by choosing the two fixed points in all the $\binom{n}{2}$ possible ways and then permuting the $n-2$ remaining elements. Thus we have the new approximation:
$$D_n = n! - n(n-1)! + \binom{n}{2}(n-2)!.$$
In this way, however, we added twice the permutations with at least three fixed points, which have to be subtracted again. We thus obtain:
$$D_n = n! - n(n-1)! + \binom{n}{2}(n-2)! - \binom{n}{3}(n-3)!.$$
We can now go on with the same method, which is called the inclusion-exclusion principle, and eventually arrive at the final value:
$$D_n = n! - n(n-1)! + \binom{n}{2}(n-2)! - \binom{n}{3}(n-3)! + \cdots = \frac{n!}{0!} - \frac{n!}{1!} + \frac{n!}{2!} - \frac{n!}{3!} + \cdots = n!\sum_{k=0}^{n}\frac{(-1)^k}{k!}.$$
This formula checks with the previously found values. We obtain the exponential generating function $\mathcal{G}(D_n/n!)$ by observing that the generic element in the sum is the coefficient $[t^n]e^{-t}$, and therefore, by the theorem on the generating function for the partial sums, we have:
$$\mathcal{G}\!\left(\frac{D_n}{n!}\right) = \frac{e^{-t}}{1-t}.$$
In order to find the asymptotic value for $D_n$, we observe that the radius of convergence of $1/(1-t)$ is 1, while $e^{-t}$ converges for every value of $t$. By Bender's theorem we have:
$$\frac{D_n}{n!} \sim e^{-1} \qquad\text{or}\qquad D_n \sim \frac{n!}{e}.$$
This value is indeed a very good approximation for $D_n$, which can actually be computed as the integer nearest to $n!/e$.
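This can be verified with a small program; the following plain-Python sketch computes $D_n$ by the inclusion-exclusion formula (all terms are exact integers) and compares it with the nearest integer to $n!/e$:

    # D_n by inclusion-exclusion vs. the nearest integer to n!/e.
    from math import factorial, e

    def derangements(n):
        # D_n = n! * sum_{k=0}^{n} (-1)^k / k!
        return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))

    for n in range(2, 11):
        print(n, derangements(n), round(factorial(n) / e))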
Let us now see how Bender's theorem is applied to the exponential generating function of the ordered Bell numbers. We have shown that the dominating singularity is a pole of order 1 at $t = \ln 2$:
$$\lim_{t \to \ln 2} \frac{1 - t/\ln 2}{2 - e^t} = \lim_{t \to \ln 2} \frac{-1/\ln 2}{-e^t} = \frac{1}{2\ln 2}.$$
At this point we have:
$$[t^n]\,\frac{1}{2 - e^t} = [t^n]\,\frac{1}{1 - t/\ln 2}\cdot\frac{1 - t/\ln 2}{2 - e^t} \sim \left[\frac{1 - t/\ln 2}{2 - e^t}\right]_{t = \ln 2}[t^n]\,\frac{1}{1 - t/\ln 2} = \frac{1}{2}\,\frac{1}{(\ln 2)^{n+1}}$$
and we conclude with the very good approximation $O_n \sim n!/(2(\ln 2)^{n+1})$.
Finally, we find the asymptotic approximation for the Bernoulli numbers. The following statement is very important when we have functions with several dominating singularities:

Principle: If $t_1, t_2, \ldots, t_k$ are all the dominating singularities of a function $f(t)$, then $[t^n]f(t)$ can be found by summing all the contributions obtained by independently considering the $k$ singularities.

We already observed that $\pm 2\pi i$ are the two dominating singularities for the generating function of the Bernoulli numbers; they are both poles of order 1:
$$\lim_{t \to 2\pi i} \frac{t(1 - t/2\pi i)}{e^t - 1} = \lim_{t \to 2\pi i} \frac{1 - t/\pi i}{e^t} = -1, \qquad \lim_{t \to -2\pi i} \frac{t(1 + t/2\pi i)}{e^t - 1} = \lim_{t \to -2\pi i} \frac{1 + t/\pi i}{e^t} = -1.$$
Therefore we have:
$$[t^n]\,\frac{t}{e^t - 1} = [t^n]\,\frac{1}{1 - t/2\pi i}\cdot\frac{t(1 - t/2\pi i)}{e^t - 1} \sim -\frac{1}{(2\pi i)^n}.$$
A similar result is obtained for the other pole; thus we have:
$$\frac{B_n}{n!} \sim -\frac{1}{(2\pi i)^n} - \frac{1}{(-2\pi i)^n}.$$
When $n$ is odd, these two values are opposite in sign and the result is 0; this confirms that the Bernoulli numbers of odd index are 0, except for $n = 1$. When $n$ is even, say $n = 2k$, we have $(2\pi i)^{2k} = (-2\pi i)^{2k} = (-1)^k(2\pi)^{2k}$; therefore:
$$B_{2k} \sim -\frac{2(-1)^k(2k)!}{(2\pi)^{2k}}.$$
This formula is a good approximation, even for small values of $n$, and shows that the Bernoulli numbers become, in modulus, larger and larger as $n$ increases.
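The following Python sketch (using sympy for the exact Bernoulli numbers) shows how quickly the approximation improves as $k$ grows:

    # Exact Bernoulli numbers against -2 (-1)^k (2k)! / (2 pi)^(2k).
    from math import factorial, pi
    from sympy import bernoulli

    for k in (1, 2, 3, 5, 8):
        exact = float(bernoulli(2 * k))
        approx = -2 * (-1) ** k * factorial(2 * k) / (2 * pi) ** (2 * k)
        print(2 * k, exact, approx)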
7.5 Algebraic and logarithmic singularities

Let us consider the generating function for the Catalan numbers $f(t) = (1 - \sqrt{1-4t})/(2t)$ and the corresponding power series $\hat f(t) = 1 + t + 2t^2 + 5t^3 + 14t^4 + \cdots$. Our choice of the $-$ sign was motivated by the initial condition of the recurrence $C_{n+1} = \sum_{k=0}^{n} C_k C_{n-k}$ defining the Catalan numbers. This is due to the fact that, when the argument is a positive real number, we can choose the positive value as the result of a square root. In other words, we consider the arithmetic square root instead of the algebraic square root. This allows us to identify the power series $\hat f(t)$ with the function $f(t)$, but when we pass to complex numbers this is no longer possible. Actually, in the complex field, a function containing a square root is a two-valued function, and there are two branches defined by the same expression. Only one of these two branches coincides with the function defined by the power series, which is obviously a one-valued function.
The points at which a square root becomes 0 are special points; in them the function is one-valued, but in every neighborhood the function is two-valued. For the smallest in modulus among these points, say $t_0$, we must have the following situation: for $t$ such that $|t| < |t_0|$, $\hat f(t)$ should coincide with a branch of $f(t)$, while for $t$ such that $|t| > |t_0|$, $\hat f(t)$ cannot converge. In fact, consider a $t \in \mathbb{R}$, $t > |t_0|$; the expression under the square root should be a negative real number and therefore $f(t) \in \mathbb{C}\setminus\mathbb{R}$; but $\hat f(t)$, when it converges, can only be a real number. Because we know that when $\hat f(t)$ converges we must have $\hat f(t) = f(t)$, we conclude that $\hat f(t)$ cannot converge. This shows that $t_0$ is a singularity for $f(t)$.

Every $k$th root originates the same problem, and the function is actually a $k$-valued function; every point for which the argument of the root is 0 is a singularity, called an algebraic singularity. These singularities can be treated by Darboux' method or, directly, by means of Bender's theorem, which relies on Newton's rule. Actually, we already used this method to find the asymptotic evaluation for the Motzkin numbers.
The same considerations hold when a function contains a logarithm. In fact, a logarithm is an infinitely-valued function, because it is the inverse of the exponential, which, in the complex field $\mathbb{C}$, is a periodic function:
$$e^{t + 2k\pi i} = e^t e^{2k\pi i} = e^t(\cos 2k\pi + i\sin 2k\pi) = e^t.$$
The period of $e^t$ is therefore $2\pi i$, and $\ln t$ is actually $\ln t + 2k\pi i$, for $k \in \mathbb{Z}$. A point $t_0$ for which the argument of a logarithm is 0 is a singularity for the corresponding function. In every neighborhood of $t_0$, the function has an infinite number of branches; this is the only fact distinguishing a logarithmic singularity from an algebraic one.
Let us suppose we have the sum:
$$S_n = 1 + \frac{2}{2} + \frac{4}{3} + \frac{8}{4} + \cdots + \frac{2^{n-1}}{n} = \frac{1}{2}\sum_{k=1}^{n}\frac{2^k}{k}$$
and we wish to compute an approximate value. The generating function is:
$$\mathcal{G}\!\left(\frac{1}{2}\sum_{k=1}^{n}\frac{2^k}{k}\right) = \frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t}.$$
There are two singularities: $t = 1$ is a pole, while $t = 1/2$ is a logarithmic singularity. Since the latter has smaller modulus, it is dominating and $R = 1/2$ is the radius of convergence of the function. By Bender's theorem we have:
$$S_n = \frac{1}{2}[t^n]\,\frac{1}{1-t}\ln\frac{1}{1-2t} \sim \frac{1}{2}\,\frac{1}{1-1/2}\,[t^n]\ln\frac{1}{1-2t} = \frac{2^n}{n}.$$
This is not a very good approximation. In the next section we will see how it can be improved.
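The quality of this principal value can be measured directly; a minimal plain-Python sketch (the relative error turns out to be roughly $1/n$):

    # Relative error of the principal value 2^n / n.
    def S(n):
        return sum(2 ** k / k for k in range(1, n + 1)) / 2

    for n in (10, 20, 50):
        approx = 2 ** n / n
        print(n, S(n), approx, abs(S(n) - approx) / S(n))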
7.6 Subtracted singularities
The methods presented in the preceding sections only give the expression describing the general behavior of the coefficients $f_n$ in the expansion $\hat f(t) = \sum_{k=0}^{\infty} f_k t^k$, i.e., what is called the principal value for $f_n$. Sometimes, this behavior is only achieved for very large values of $n$, while for smaller values it is just a rough approximation of the true value. Because of that, we speak of "asymptotic evaluation" or "asymptotic approximation". When we need a true approximation, we should introduce some corrections, which slightly modify the general behavior and more accurately evaluate the true value of $f_n$.
Many times, the following observation solves the problem. Suppose we have found, by one of the previous methods, that a function $f(t)$ is such that $f(t) \sim A(1 - \alpha t)^\gamma$, for some $A, \gamma \in \mathbb{R}$, or, more generally, $f(t) \sim g(t)$ for some function $g(t)$ of which we know the coefficients $g_n$ exactly. For example, this is the case of $\ln(1/(1 - \alpha t))$. Because $f_n \sim g_n$, the function $h(t) = f(t) - g(t)$ has coefficients $h_n$ that grow more slowly than $f_n$; formally, since $O(f_n) = O(g_n)$, we must have $h_n = o(f_n)$. Therefore, the quantity $g_n + h_n$ is a better approximation to $f_n$ than $g_n$ alone.
When $f(t)$ has a pole $t_0$ of order $m$, the successive application of this method of subtracted singularities completely eliminates the singularity. Therefore, if a second singularity $t_1$ exists such that $|t_1| > |t_0|$, we can express the coefficients $f_n$ in terms of $t_1$ as well. When $f(t)$ has a dominating algebraic singularity, this cannot be eliminated, but the method of subtracted singularities allows us to obtain corrections to the principal value. Formally, by the successive application of this method, we arrive at the following results. If $t_0$ is a dominating pole of order $m$ and $\alpha = 1/t_0$, then we find the expansion:
$$f(t) = \frac{A_{-m}}{(1-\alpha t)^m} + \frac{A_{-m+1}}{(1-\alpha t)^{m-1}} + \frac{A_{-m+2}}{(1-\alpha t)^{m-2}} + \cdots.$$
If $t_0$ is a dominating algebraic singularity and $f(t) = h(t)(1-\alpha t)^{p/m}$, where $\alpha = 1/t_0$ and $h(t)$ has a radius of convergence larger than $|t_0|$, then we find the expansion:
$$f(t) = A_p(1-\alpha t)^{p/m} + A_{p-1}(1-\alpha t)^{(p-1)/m} + A_{p-2}(1-\alpha t)^{(p-2)/m} + \cdots.$$
Newton's rule can obviously be used to pass from these expansions to the asymptotic value of $f_n$.
The same method of subtracted singularities can be used for a logarithmic singularity. Let us consider as an example the sum $S_n = \sum_{k=1}^{n} 2^{k-1}/k$, introduced in the previous section. We found the principal value $S_n \sim 2^n/n$ by studying the generating function:
$$\frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t} \sim \ln\frac{1}{1-2t}.$$
Let us therefore consider the new function:
$$h(t) = \frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t} - \ln\frac{1}{1-2t} = -\frac{1-2t}{2(1-t)}\ln\frac{1}{1-2t}.$$
The generic term $h_n$ should be significantly less than $f_n$; the factor $(1-2t)$ actually reduces the order of growth of the logarithm:
$$-[t^n]\,\frac{1-2t}{2(1-t)}\ln\frac{1}{1-2t} \sim -\frac{1}{2(1-1/2)}\,[t^n](1-2t)\ln\frac{1}{1-2t} = -\left(\frac{2^n}{n} - 2\,\frac{2^{n-1}}{n-1}\right) = \frac{2^n}{n(n-1)}.$$
Therefore, a better approximation for $S_n$ is:
$$S_n \approx \frac{2^n}{n} + \frac{2^n}{n(n-1)} = \frac{2^n}{n-1}.$$
The reader can easily verify that this correction greatly reduces the error in the evaluation of $S_n$.
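For instance, the following minimal Python sketch compares the relative errors of the two approximations against the exact sum:

    # The corrected value 2^n/(n-1) against the principal value 2^n/n.
    def S(n):
        return sum(2 ** k / k for k in range(1, n + 1)) / 2

    for n in (10, 20, 50):
        exact = S(n)
        print(n, abs(exact - 2 ** n / n) / exact,        # principal value
                 abs(exact - 2 ** n / (n - 1)) / exact)  # after one correction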
A further correction can now be obtained by considering:
$$k(t) = \frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t} - \left(\ln\frac{1}{1-2t} - (1-2t)\ln\frac{1}{1-2t}\right) = \frac{(1-2t)^2}{2(1-t)}\ln\frac{1}{1-2t}$$
which gives:
$$k_n \sim \frac{1}{2(1-1/2)}\,[t^n](1-2t)^2\ln\frac{1}{1-2t} = \frac{2^n}{n} - 4\,\frac{2^{n-1}}{n-1} + 4\,\frac{2^{n-2}}{n-2} = \frac{2^{n+1}}{n(n-1)(n-2)}.$$
This correction is still smaller, and we can write:
$$S_n \approx \frac{2^n}{n-1}\left(1 + \frac{2}{n(n-2)}\right).$$
In general, we can obtain the same results if we expand the factor $h(t)$ in $f(t) = g(t)h(t)$, where $h(t)$ has a radius of convergence larger than that of $f(t)$, around the dominating singularity. This is done in the following way:
$$\frac{1}{2(1-t)} = \frac{1}{1 + (1-2t)} = 1 - (1-2t) + (1-2t)^2 - (1-2t)^3 + \cdots.$$
This implies:
$$\frac{1}{2(1-t)}\ln\frac{1}{1-2t} = \ln\frac{1}{1-2t} - (1-2t)\ln\frac{1}{1-2t} + (1-2t)^2\ln\frac{1}{1-2t} - \cdots$$
and the result is the same as the one previously obtained by the method of subtracted singularities.
7.7 The asymptotic behavior of a trinomial square root

In many problems we arrive at a generating function of the form:
$$f(t) = \frac{p(t) - \sqrt{(1-\alpha t)(1-\beta t)}}{r\,t^m}$$
or:
$$g(t) = \frac{q(t)}{\sqrt{(1-\alpha t)(1-\beta t)}}.$$
In the former case, $p(t)$ is a correcting polynomial, which has no effect on $f_n$ for $n$ sufficiently large, and therefore we have:
$$f_n = -\frac{1}{r}\,[t^{n+m}]\sqrt{(1-\alpha t)(1-\beta t)}$$
where $m$ is a small integer. In the second case, $g_n$ is the sum of various terms, as many as there are terms in the polynomial $q(t)$, each one of the form:
$$q_k\,[t^{n-k}]\,\frac{1}{\sqrt{(1-\alpha t)(1-\beta t)}}.$$
It is therefore interesting to compute, once and for all, the asymptotic value of $[t^n]((1-\alpha t)(1-\beta t))^s$, where $s = 1/2$ or $s = -1/2$.

Let us suppose that $|\alpha| > |\beta|$, since the case $\alpha = \beta$ has no interest and the case $\alpha = -\beta$ should be approached in another way. This hypothesis means that $t = 1/\alpha$ is the radius of convergence of the function, and we can develop everything around this singularity. In most combinatorial problems we have $\alpha > 0$, because the coefficients of $f(t)$ are positive numbers, but this is not a limiting factor.

Let us consider $s = 1/2$; in this case, a minus sign should precede the square root. The evaluation is shown in Table 7.1. The formula so obtained can be considered sufficient for obtaining both the asymptotic evaluation of $f_n$ and a suitable numerical approximation. However, we can use the following developments:
$$\binom{2n}{n} = \frac{4^n}{\sqrt{\pi n}}\left(1 - \frac{1}{8n} + \frac{1}{128n^2} + O\!\left(\frac{1}{n^3}\right)\right)$$
$$\frac{1}{2n-1} = \frac{1}{2n}\left(1 + \frac{1}{2n} + \frac{1}{4n^2} + O\!\left(\frac{1}{n^3}\right)\right) \qquad \frac{1}{2n-3} = \frac{1}{2n}\left(1 + \frac{3}{2n} + O\!\left(\frac{1}{n^2}\right)\right)$$
and get:
$$f_n = \sqrt{\frac{\alpha-\beta}{\alpha}}\,\frac{\alpha^n}{2n\sqrt{\pi n}}\left(1 + \frac{3(\alpha - 3\beta)}{8(\alpha-\beta)\,n} + \frac{25}{128\,n^2} - \frac{15\beta(3\alpha - 2\beta)}{32(\alpha-\beta)^2\,n^2} + O\!\left(\frac{1}{n^3}\right)\right).$$
The reader is invited to find a similar formula for the case $s = -1/2$.
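The expansion can be tested numerically. The following Python sketch computes the exact coefficient of $-\sqrt{(1-\alpha t)(1-\beta t)}$ by convolving two binomial series in rational arithmetic, for the illustrative values $\alpha = 2$, $\beta = 1$, and compares it with the formula above, written here in terms of $x = \beta/(\alpha-\beta)$:

    # Check of the trinomial square-root expansion for alpha = 2, beta = 1.
    from fractions import Fraction
    from math import pi, sqrt

    def sqrt_series(c, N):
        """Exact coefficients of (1 - c t)^(1/2) up to t^N."""
        a = [Fraction(1)]
        for n in range(1, N + 1):
            a.append(a[-1] * c * Fraction(2 * n - 3, 2 * n))
        return a

    alpha, beta, N = 2, 1, 40
    u, v = sqrt_series(alpha, N), sqrt_series(beta, N)
    exact = -sum(u[k] * v[N - k] for k in range(N + 1))  # [t^N] of -sqrt((1-at)(1-bt))

    n, x = N, beta / (alpha - beta)
    approx = sqrt((alpha - beta) / alpha) * alpha ** n / (2 * n * sqrt(pi * n)) * (
        1 + (3 / 8 - 3 * x / 4) / n
          + (25 / 128 - 45 * x / 32 - 15 * x ** 2 / 32) / n ** 2)
    print(float(exact), approx)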
7.8 Hayman’s method
The method for coefficient evaluation which uses the function singularities (Darboux' method) can be improved and made more accurate, as we have seen, by the technique of "subtracted singularities". Unfortunately, these methods become useless when the function $f(t)$ has no singularity (entire functions) or when the dominating singularity is essential. In fact, in the former case we do not have any singularity to operate on, and in the latter the development around the singularity gives rise to a series with an infinite number of terms of negative degree.
$$[t^n]\left(-(1-\alpha t)^{1/2}(1-\beta t)^{1/2}\right) = [t^n]\left(-(1-\alpha t)^{1/2}\left(\frac{\alpha-\beta}{\alpha} + \frac{\beta}{\alpha}(1-\alpha t)\right)^{1/2}\right) =$$
$$= -\sqrt{\frac{\alpha-\beta}{\alpha}}\,[t^n]\,(1-\alpha t)^{1/2}\left(1 + \frac{\beta}{\alpha-\beta}(1-\alpha t)\right)^{1/2} =$$
$$= -\sqrt{\frac{\alpha-\beta}{\alpha}}\,[t^n]\,(1-\alpha t)^{1/2}\left(1 + \frac{\beta}{2(\alpha-\beta)}(1-\alpha t) - \frac{\beta^2}{8(\alpha-\beta)^2}(1-\alpha t)^2 + \cdots\right) =$$
$$= -\sqrt{\frac{\alpha-\beta}{\alpha}}\,[t^n]\left((1-\alpha t)^{1/2} + \frac{\beta}{2(\alpha-\beta)}(1-\alpha t)^{3/2} - \frac{\beta^2}{8(\alpha-\beta)^2}(1-\alpha t)^{5/2} + \cdots\right) =$$
$$= -\sqrt{\frac{\alpha-\beta}{\alpha}}\left(\binom{1/2}{n}(-\alpha)^n + \frac{\beta}{2(\alpha-\beta)}\binom{3/2}{n}(-\alpha)^n - \frac{\beta^2}{8(\alpha-\beta)^2}\binom{5/2}{n}(-\alpha)^n + \cdots\right) =$$
$$= -\sqrt{\frac{\alpha-\beta}{\alpha}}\,\frac{(-1)^{n-1}}{4^n(2n-1)}\binom{2n}{n}(-\alpha)^n\left(1 - \frac{\beta}{2(\alpha-\beta)}\,\frac{3}{2n-3} - \frac{\beta^2}{8(\alpha-\beta)^2}\,\frac{15}{(2n-3)(2n-5)} + \cdots\right) =$$
$$= \sqrt{\frac{\alpha-\beta}{\alpha}}\,\frac{\alpha^n}{4^n(2n-1)}\binom{2n}{n}\left(1 - \frac{3\beta}{2(\alpha-\beta)(2n-3)} - \frac{15\beta^2}{8(\alpha-\beta)^2(2n-3)(2n-5)} + O\!\left(\frac{1}{n^3}\right)\right).$$

Table 7.1: The case $s = 1/2$
In these cases, the only method seems to be Cauchy's theorem, which allows us to evaluate $[t^n]f(t)$ by means of an integral:
$$f_n = \frac{1}{2\pi i}\oint_\gamma \frac{f(t)}{t^{n+1}}\,dt$$
where $\gamma$ is a suitable path enclosing the origin. We do not intend to develop this method here, but we will limit ourselves to sketching a method, derived from Cauchy's theorem, which allows us to find an asymptotic evaluation for $f_n$ in many practical situations. The method can be implemented on a computer in the following sense: given a function $f(t)$, we can check in an algorithmic way whether $f(t)$ belongs to the class of functions for which the method is applicable (the class of "H-admissible" functions) and, if that is the case, we can evaluate the principal value of the asymptotic estimate for $f_n$. The system ΛΥΩ, by Flajolet, Salvy and Zimmermann, realizes this method. The development of the method was mainly performed by Hayman, and therefore it is known as Hayman's method; this also justifies the use of the letter H in the definition of H-admissibility.
A function is called H-admissible if and only if it belongs to one of the following classes or can be obtained, in a finite number of steps according to the following rules, from other H-admissible functions:

1. if $f(t)$ and $g(t)$ are H-admissible functions and $p(t)$ is a polynomial with real coefficients and positive leading term, then:
$$\exp(f(t)) \qquad f(t) + g(t) \qquad f(t) + p(t) \qquad p(f(t)) \qquad p(t)f(t)$$
are all H-admissible functions;

2. if $p(t)$ is a polynomial with positive coefficients and not of the form $p(t^k)$ for $k > 1$, then the function $\exp(p(t))$ is H-admissible;

3. if $\alpha, \beta$ are positive real numbers and $\gamma, \delta$ are real numbers, then the function:
$$f(t) = \exp\left(\frac{\beta}{(1-t)^\alpha}\left(\frac{1}{t}\ln\frac{1}{1-t}\right)^{\gamma}\left(\frac{1}{t}\ln\left(\frac{1}{t}\ln\frac{1}{1-t}\right)\right)^{\delta}\right)$$
is H-admissible.
For example, the following functions are all H-admissible:
$$e^t \qquad \exp\left(t + \frac{t^2}{2}\right) \qquad \exp\left(\frac{t}{1-t}\right) \qquad \exp\left(\frac{1}{t(1-t)^2}\ln\frac{1}{1-t}\right).$$
In particular, for the third function we have:
$$\exp\left(\frac{t}{1-t}\right) = \exp\left(\frac{1}{1-t} - 1\right) = \frac{1}{e}\exp\left(\frac{1}{1-t}\right)$$
and naturally a constant does not influence the H-admissibility of a function. In this example we have $\alpha = \beta = 1$ and $\gamma = \delta = 0$.
For H-admissible functions, the following result holds:

Theorem 7.8.1 Let $f(t)$ be an H-admissible function; then:
$$f_n = [t^n]f(t) \sim \frac{f(r)}{r^n\sqrt{2\pi b(r)}} \qquad\text{as } n \to \infty$$
where $r = r(n)$ is the least positive solution of the equation $t f'(t)/f(t) = n$ and $b(t)$ is the function:
$$b(t) = t\,\frac{d}{dt}\left(t\,\frac{f'(t)}{f(t)}\right).$$
As we said before, the proof of this theorem is
based on Cauchy’s theorem and is beyond the scope
of these notes. Instead, let us show some examples
to clarify the application of Hayman’s method.
7.9 Examples of Hayman's Theorem

The first example can be easily verified. Let $f(t) = e^t$ be the exponential function, so that we know $f_n = 1/n!$. To apply Hayman's theorem, we have to solve the equation $t e^t/e^t = n$, which gives $r = n$. The function $b(t)$ is simply $t$, and therefore we have:
$$[t^n]e^t \sim \frac{e^n}{n^n\sqrt{2\pi n}}$$
and in this formula we immediately recognize Stirling's approximation for factorials.
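Theorem 7.8.1 is easily turned into a small numerical routine. The following sketch is based on sympy (its nsolve needs a starting point, which the caller must supply) and reproduces $1/n!$ for $f(t) = e^t$:

    # Theorem 7.8.1 as a numerical routine (a sketch using sympy).
    from math import factorial
    from sympy import symbols, exp, sqrt, pi, diff, nsolve

    t = symbols('t', positive=True)

    def hayman_estimate(f, n, guess):
        g = t * diff(f, t) / f           # g(t) = t f'(t)/f(t)
        r = nsolve(g - n, t, guess)      # the positive root near the guess
        b = (t * diff(g, t)).subs(t, r)  # b(t) = t (d/dt) g(t)
        return f.subs(t, r) / (r ** n * sqrt(2 * pi * b))

    # For f(t) = e^t the equation gives r = n exactly, and the estimate
    # is Stirling's approximation of 1/n!.
    for n in (5, 10, 20):
        print(n, float(hayman_estimate(exp(t), n, n)), 1 / factorial(n))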
Examples soon become rather complex and require a large amount of computation. Let us consider the following sum:
$$\sum_{k=0}^{n-1}\binom{n-1}{k}\frac{1}{(k+1)!} = \sum_{k=0}^{n}\binom{n-1}{k-1}\frac{1}{k!} = \sum_{k=0}^{n}\left(\binom{n}{k} - \binom{n-1}{k}\right)\frac{1}{k!} =$$
$$= \sum_{k=0}^{n}\binom{n}{k}\frac{1}{k!} - \sum_{k=0}^{n-1}\binom{n-1}{k}\frac{1}{k!} = [t^n]\,\frac{1}{1-t}\exp\left(\frac{t}{1-t}\right) - [t^{n-1}]\,\frac{1}{1-t}\exp\left(\frac{t}{1-t}\right) =$$
$$= [t^n]\,\frac{1-t}{1-t}\exp\left(\frac{t}{1-t}\right) = [t^n]\exp\left(\frac{t}{1-t}\right).$$
We have already seen that this function is H-admissible, and therefore we can try to evaluate the asymptotic behavior of the sum. Let us define the function $g(t) = t f'(t)/f(t)$, which in the present case is:
$$g(t) = \frac{\dfrac{t}{(1-t)^2}\exp\left(\dfrac{t}{1-t}\right)}{\exp\left(\dfrac{t}{1-t}\right)} = \frac{t}{(1-t)^2}.$$
The value of $r$ is therefore given by the minimal positive solution of:
$$\frac{t}{(1-t)^2} = n \qquad\text{or}\qquad nt^2 - (2n+1)t + n = 0.$$
Because $\Delta = 4n^2 + 4n + 1 - 4n^2 = 4n + 1$, we have the two solutions:
$$r = \frac{2n + 1 \pm \sqrt{4n+1}}{2n}$$
and we must accept the one with the '$-$' sign, which is positive and less than the other. It is surely positive, because $\sqrt{4n+1} < 2\sqrt{n+1} < 2n$, for every $n > 1$. As $n$ grows, we can also give an asymptotic approximation of $r$, by developing the expression inside the square root:
$$\sqrt{4n+1} = 2\sqrt{n}\sqrt{1 + \frac{1}{4n}} = 2\sqrt{n}\left(1 + \frac{1}{8n} + O\!\left(\frac{1}{n^2}\right)\right).$$
From this formula we immediately obtain:
$$r = 1 - \frac{1}{\sqrt{n}} + \frac{1}{2n} - \frac{1}{8n\sqrt{n}} + O\!\left(\frac{1}{n^2}\right)$$
which will be used in the next approximations.
First we compute an approximation for $f(r)$, that is, $\exp(r/(1-r))$. Since:
$$1 - r = \frac{1}{\sqrt{n}} - \frac{1}{2n} + \frac{1}{8n\sqrt{n}} + O\!\left(\frac{1}{n^2}\right) = \frac{1}{\sqrt{n}}\left(1 - \frac{1}{2\sqrt{n}} + \frac{1}{8n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right)$$
we immediately obtain:
$$\frac{r}{1-r} = \left(1 - \frac{1}{\sqrt{n}} + \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\sqrt{n}\left(1 + \frac{1}{2\sqrt{n}} + \frac{1}{8n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right) =$$
$$= \sqrt{n}\left(1 - \frac{1}{2\sqrt{n}} + \frac{1}{8n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right) = \sqrt{n} - \frac{1}{2} + \frac{1}{8\sqrt{n}} + O\!\left(\frac{1}{n}\right).$$
Finally, the exponential gives:
$$\exp\left(\frac{r}{1-r}\right) = \frac{e^{\sqrt{n}}}{\sqrt{e}}\exp\left(\frac{1}{8\sqrt{n}} + O\!\left(\frac{1}{n}\right)\right) = \frac{e^{\sqrt{n}}}{\sqrt{e}}\left(1 + \frac{1}{8\sqrt{n}} + O\!\left(\frac{1}{n}\right)\right).$$
Because Hayman's method only gives the principal value of the result, the correction can be ignored (it may not be precise) and we get:
$$\exp\left(\frac{r}{1-r}\right) \approx \frac{e^{\sqrt{n}}}{\sqrt{e}}.$$
The second part we have to develop is $1/r^n$, which can be computed by writing it as $\exp(n\ln(1/r))$, that is:
$$\frac{1}{r^n} = \exp\left(n\ln\left(1 + \frac{1}{\sqrt{n}} + \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\right) =$$
$$= \exp\left(n\left(\frac{1}{\sqrt{n}} + \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt{n}}\right) - \frac{1}{2}\left(\frac{1}{\sqrt{n}} + O\!\left(\frac{1}{n}\right)\right)^2 + O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\right) =$$
$$= \exp\left(n\left(\frac{1}{\sqrt{n}} + \frac{1}{2n} - \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\right) = \exp\left(\sqrt{n} + O\!\left(\frac{1}{\sqrt{n}}\right)\right) \sim e^{\sqrt{n}}.$$
Again, the correction is ignored and we only consider the principal value. We now observe that $f(r)/r^n \approx e^{2\sqrt{n}}/\sqrt{e}$.
Only $b(r)$ remains to be computed; we have $b(t) = t g'(t)$, where $g(t)$ is as above, and therefore:
$$b(t) = t\,\frac{d}{dt}\,\frac{t}{(1-t)^2} = \frac{t(1+t)}{(1-t)^3}.$$
This quantity can be computed a piece at a time. First:
$$r(1+r) = \left(1 - \frac{1}{\sqrt{n}} + \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\left(2 - \frac{1}{\sqrt{n}} + \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right) =$$
$$= 2 - \frac{3}{\sqrt{n}} + \frac{5}{2n} + O\!\left(\frac{1}{n\sqrt{n}}\right) = 2\left(1 - \frac{3}{2\sqrt{n}} + \frac{5}{4n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right).$$
By using the expression already found for $(1-r)$ we then have:
$$(1-r)^3 = \frac{1}{n\sqrt{n}}\left(1 - \frac{1}{2\sqrt{n}} + \frac{1}{8n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right)^3 =$$
$$= \frac{1}{n\sqrt{n}}\left(1 - \frac{1}{\sqrt{n}} + \frac{1}{2n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\left(1 - \frac{1}{2\sqrt{n}} + \frac{1}{8n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right) =$$
$$= \frac{1}{n\sqrt{n}}\left(1 - \frac{3}{2\sqrt{n}} + \frac{9}{8n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right).$$
By inverting this quantity, we eventually get:
$$b(r) = \frac{r(1+r)}{(1-r)^3} = 2n\sqrt{n}\left(1 + \frac{3}{2\sqrt{n}} + \frac{9}{8n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\left(1 - \frac{3}{2\sqrt{n}} + \frac{5}{4n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right) =$$
$$= 2n\sqrt{n}\left(1 + \frac{1}{8n} + O\!\left(\frac{1}{n\sqrt{n}}\right)\right).$$
The principal value is $2n\sqrt{n}$ and therefore:
$$\sqrt{2\pi b(r)} = \sqrt{4\pi n\sqrt{n}} = 2\sqrt{\pi}\,n^{3/4};$$
the final result is:
$$f_n = [t^n]\exp\left(\frac{t}{1-t}\right) \sim \frac{e^{2\sqrt{n}}}{2\sqrt{\pi e}\,n^{3/4}}.$$
It is a simple matter to compute the original sum $\sum_{k=0}^{n-1}\binom{n-1}{k}\frac{1}{(k+1)!}$ on a personal computer for, say, $n = 100, 200, 300$. The results obtained can be compared with the evaluation of the previous formula. We can thus verify that the relative error decreases as $n$ increases.
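Such a comparison can be carried out with the following Python sketch, which evaluates the sum exactly with rational arithmetic:

    # Exact sum vs. e^(2 sqrt(n)) / (2 sqrt(pi e) n^(3/4)).
    from fractions import Fraction
    from math import comb, factorial, exp, sqrt, pi, e

    def exact_sum(n):
        return sum(Fraction(comb(n - 1, k), factorial(k + 1)) for k in range(n))

    for n in (100, 200, 300):
        s = float(exact_sum(n))
        approx = exp(2 * sqrt(n)) / (2 * sqrt(pi * e) * n ** 0.75)
        print(n, s, approx, abs(s - approx) / s)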
Chapter 8
Bibliography
The birth of Computer Science and the need of analyzing the behavior of algorithms and data structures have given a strong impulse to Combinatorial Analysis, and to the mathematical methods used to study combinatorial objects. So, alongside the traditional literature on Combinatorics, a number of books and papers have been produced relating Computer Science and the methods of Combinatorial Analysis. The first author who systematically worked in this direction was surely Donald Knuth, who published in 1968 the first volume of his monumental "The Art of Computer Programming", the first part of which is dedicated to the mathematical methods used in Combinatorial Analysis. Without this basic knowledge, there is little hope of understanding the developments of the analysis of algorithms and data structures:
Donald E. Knuth: The Art of Computer Pro-
gramming: Fundamental Algorithms, Vol. I,
Addison-Wesley (1968).
Many additional concepts and techniques are also
contained in the third volume:
Donald E. Knuth: The Art of Computer Pro-
gramming: Sorting and Searching, Vol. III,
Addison-Wesley (1973).
Numerical and probabilistic developments are to
be found in the central volume:
Donald E. Knuth: The Art of Computer Programming: Seminumerical Algorithms, Vol. II, Addison-Wesley (1973).
A concise exposition of several important tech-
niques is given in:
Daniel H. Greene, Donald E. Knuth: Mathematics for the Analysis of Algorithms, Birkhäuser (1981).
The most systematic work in the field is perhaps the following text, very clear and very comprehensive. It deals, in an elementary but rigorous way, with the main topics of Combinatorial Analysis, with an eye to Computer Science. Many exercises (with solutions) are proposed, and the reader is never left alone in front of the many-faceted problems he or she is encouraged to tackle.
Ronald L. Graham, Donald E. Knuth, Oren
Patashnik: Concrete Mathematics, Addison-
Wesley (1989).
Many texts on Combinatorial Analysis are worth considering, because they contain information on general concepts, both from a combinatorial and a mathematical point of view:
William Feller: An Introduction to Probabil-
ity Theory and Its Applications, Wiley (1950) -
(1957) - (1968).
John Riordan: An Introduction to Combinatorial
Analysis, Wiley (1953) (1958).
Louis Comtet: Advanced Combinatorics, Reidel
(1974).
Ian P. Goulden, David M. Jackson: Combinato-
rial Enumeration, Dover Publ. (2004).
Richard P. Stanley: Enumerative Combinatorics,
Vol. I, Cambridge Univ. Press (1986) (2000).
Richard P. Stanley: Enumerative Combinatorics, Vol. II, Cambridge Univ. Press (1997) (2001).
Robert Sedgewick, Philippe Flajolet: An Intro-
duction to the Analysis of Algorithms, Addison-
Wesley (1995).
* * *
Chapter 1: Introduction
The concepts relating to the problem of searching
can be found in any text on Algorithms and Data
Structures, in particular Volume III of the quoted
“The Art of Computer Programming”. Volume I can
be consulted for Landau notation, even if it is com-
mon in Mathematics. For the mathematical concepts
of Γ and ψ functions, the reader is referred to:
Milton Abramowitz, Irene A. Stegun: Handbook
of Mathematical Functions, Dover Publ. (1972).
* * *
Chapter 2: Special numbers
The quoted book by Graham, Knuth and Patash-
nik is appropriate. Anyhow, the quantities we consider
are common to all parts of Mathematics, in partic-
ular of Combinatorial Analysis. The ζ function and
the Bernoulli numbers are also covered by the book of
Abramowitz and Stegun. Stanley dedicates to Cata-
lan numbers a special survey in his second volume.
Probably, they are the most frequently used quantities in Combinatorics, just after binomial coefficients
and before Fibonacci numbers. Stirling numbers are
very important in Numerical Analysis, but here we
are more interested in their combinatorial meaning.
Harmonic numbers are the “discrete” version of log-
arithms, and from this fact their relevance in Combi-
natorics and in the Analysis of Algorithms originates.
Sequences studied in Combinatorial Analysis have
been collected and annotated by Sloane. The result-
ing book (and the corresponding Internet site) is one of the most important reference points:

N. J. A. Sloane, Simon Plouffe: The Encyclopedia of Integer Sequences, Academic Press (1995). Available at: http://www.research.att.com/~njas/sequences/.
* * *
Chapter 3: Formal Power Series
Formal power series have a long tradition; the
reader can find their algebraic foundation in the book:
Peter Henrici: Applied and Computational Com-
plex Analysis, Wiley (1974) Vol. I; (1977) Vol.
II; (1986) Vol. III.
Coefficient extraction is central in our approach; a
formalization can be found in:
Donatella Merlini, Renzo Sprugnoli, M. Cecilia Verri: The method of coefficients, The American Mathematical Monthly (to appear).
The Lagrange Inversion Formula can be found in
most of the quoted texts, in particular in Stanley
and Henrici. Nowadays, almost every part of Math-
ematics is available by means of several Computer
Algebra Systems, implementing on a computer the
ideas described in this chapter, and many, many oth-
ers. Maple and Mathematica are among the most
used systems; other software is available, free or not.
These systems have become an essential tool for de-
veloping any new aspect of Mathematics.
* * *
Chapter 4: Generating functions
In our present approach the concept of an (ordi-
nary) generating function is essential, and this justi-
fies our insistence on the two operators “generating
function” and “coefficient of”. Since their introduc-
tion in Mathematics, due to de Moivre in the XVIII
Century, generating functions have been a controver-
sial concept. Now they are accepted almost univer-
sally, and their theory has been developed in many
of the quoted texts. In particular:
Herbert S. Wilf: Generatingfunctionology, Aca-
demic Press (1990).
A practical device has been realized in Maple:
Bruno Salvy, Paul Zimmermann: GFUN: a Maple
package for the manipulation of generating and
holonomic functions in one variable, INRIA, Rap-
port Technique N. 143 (1992).
The number of applications of generating functions
is almost infinite, so we limit our considerations to
some classical cases relative to Computer Science.
Sometimes, we intentionally complicate proofs in order to stress the use of generating functions as a unifying approach. So, our proofs should be compared
with those given (for the same problems) by Knuth
or Flajolet and Sedgewick.
* * *
Chapter 5: Riordan arrays
Riordan arrays are part of the general method of
coefficients and are particularly important in prov-
ing combinatorial identities and generating function
transformations. They were introduced by:
Louis W. Shapiro, Seyoum Getu, Wen-Jin Woan,
Leon C. Woodson: The Riordan group, Discrete
Applied Mathematics, 34 (1991) 229 – 239.
Their practical relevance was noted in:
Renzo Sprugnoli: Riordan arrays and combinato-
rial sums, Discrete Mathematics, 132 (1992) 267
– 290.
The theory was further developed in:
Donatella Merlini, Douglas G. Rogers, Renzo
Sprugnoli, M. Cecilia Verri: On some alterna-
tive characterizations of Riordan arrays, Cana-
dian Journal of Mathematics 49 (1997) 301 – 320.
Riordan arrays are strictly related to “convolution
matrices” and to “Umbral calculus”, although, rather
strangely, nobody seems to have noticed the connections between these concepts and combinatorial sums:

Steve Roman: The Umbral Calculus, Academic Press (1984).

Donald E. Knuth: Convolution polynomials, The Mathematica Journal, 2 (1991) 67 – 78.
Collections of combinatorial identities are:
Henry W. Gould: Combinatorial Identities. A Standardized Set of Tables Listing 500 Binomial Coefficient Summations, West Virginia University (1972).

Josef Kaucký: Combinatorial Identities, Veda, Bratislava (1975).
Other methods to prove combinatorial identities
are important in Combinatorial Analysis. We quote:
John Riordan: Combinatorial Identities, Wiley
(1968).
G. P. Egorychev: Integral Representation and the
Computation of Combinatorial Sums American
Math. Society Translations, Vol. 59 (1984).
Doron Zeilberger: A holonomic systems approach
to special functions identities, Journal of Compu-
tational and Applied Mathematics 32 (1990), 321
– 368.
Doron Zeilberger: A fast algorithm for proving
terminating hypergeometric identities, Discrete
Mathematics 80 (1990), 207 – 211.
M. Petkovšek, Herbert S. Wilf, Doron Zeilberger: A = B, A. K. Peters (1996).
In particular, the method of Wilf and Zeilberger completely solves the problem of combinatorial identities "with hypergeometric terms". This means that we can algorithmically establish whether an identity is true or false, provided the two members of the identity $\sum_k L(k,n) = R(n)$ have a special form:
$$\frac{L(k+1,n)}{L(k,n)} \qquad \frac{L(k,n+1)}{L(k,n)} \qquad \frac{R(n+1)}{R(n)}$$
are all rational functions in $n$ and $k$. This actually means that $L(k,n)$ and $R(n)$ are composed of factorials, powers and binomial coefficients. In this sense, Riordan arrays are less powerful, but they can also be used when non-hypergeometric terms are involved, as for example in the case of harmonic numbers and of Stirling numbers of both kinds.
* * *
Chapter 6: Formal Methods
In this chapter we have considered two important
methods: the symbolic method for deducing counting
generating functions from the syntactic definition of
combinatorial objects, and the method of “operators”
for obtaining combinatorial identities from relations
between transformations of sequences defined by op-
erators.
The symbolic method was started by Schützenberger and Viennot, who devised a technique to automatically generate counting generating functions from a context-free non-ambiguous grammar. When the grammar defines a class of combinatorial objects, this method gives a direct way to obtain monovariate or multivariate generating functions, which allow us to solve many problems relative to the given objects. Since context-free languages only define algebraic generating functions (the subset of regular grammars is limited to rational functions), the method is not very general, but it is very effective whenever it can be applied. The method was extended by Flajolet to some classes of exponential generating functions and implemented in Maple.
Marcel P. Schützenberger: Context-free languages and pushdown automata, Information and Control 6 (1963) 246 – 264.
Maylis Delest, Xavier Viennot: Algebraic lan-
guages and polyominoes enumeration, X Collo-
quium on Automata, Languages and Program-
ming - Lecture Notes in Computer Science (1983)
173 – 181.
Philippe Flajolet: Symbolic enumerative combinatorics and complex asymptotic analysis, Algorithms Seminar (2001). Available at: http://algo.inria.fr/seminars/sem00-01/flajolet.html
The method of operators is very old and was devel-
oped in the XIX Century by English mathematicians,
especially George Boole. A classical book in this di-
rection is:
Charles Jordan: Calculus of Finite Differences,
Chelsea Publ. (1965).
Actually, the method is used in Numerical Anal-
ysis, but it has a clear connection with Combinato-
rial Analysis, as our numerous examples show. The
important concepts of indefinite and definite summa-
tions are used by Wilf and Zeilberger in the quoted
texts. The Euler-McLaurin summation formula is the
first connection between finite methods (considered
up to this moment) and asymptotics.
* * *
Chapter 7: Asymptotics
The methods treated in the previous chapters are
“exact”, in the sense that every time they give the so-
lution to a problem, this solution is a precise formula.
This, however, is not always possible, and many times
we are not able to find a solution of this kind. In these
cases, we would also be content with an approximate
solution, provided we can give an upper bound to the
error committed. The purpose of asymptotic meth-
ods is just that.
The natural settings for these problems are Com-
plex Analysis and the theory of series. We have used
a rather descriptive approach, limiting our consid-
erations to elementary cases. These situations are
covered by the quoted texts, especially Knuth, Wilf and Henrici. The treatment of Hayman's method is based on:
Micha Hofri: Probabilistic Analysis of Algo-
rithms, Springer (1987).
Index
Γ function, 8, 84
ΛΥΩ, 90
ψ function, 8
1-1 correspondence, 11
1-1 mapping, 11
A-sequence, 58
absolute scale, 9
addition operator, 77
Adelson-Velski, 52
adicity, 35
algebraic singularity, 86, 87
algebraic square root, 87
Algol’60, 69
Algol’68, 70
algorithm, 5
alphabet, 12, 67
alternating subgroup, 14
ambiguous grammar, 68
Appell subgroup, 57
arithmetic square root, 87
arrangement, 11
asymptotic approximation, 88
asymptotic development, 80
asymptotic evaluation, 88
average case analysis, 5
AVL tree, 52
Backus Normal Form, 70
Bell number, 24
Bell subgroup, 57
Bender’s theorem, 84
Bernoulli number, 18, 24, 53, 80, 85
big-oh notation, 9
bijection, 11
binary searching, 6
binary tree, 20
binomial coefficient, 8, 15
binomial formula, 15
bisection formula, 41
bivariate generating function, 55
BNF, 70
Boole, George, 73
branch, 87
C language, 69
cardinality, 11
Cartesian product, 11
Catalan number, 20, 34
Catalan triangle, 63
Cauchy product, 26
Cauchy theorem, 90
central binomial coefficient, 16, 17
central trinomial coefficient, 46
characteristic function, 12
Chomsky, Noam, 67
choose, 15
closed form expression, 7
codomain, 11
coefficient extraction rules, 30
coefficient of operator, 30
coefficient operator, 30
colored walk, 62
colored walk problem, 62
column, 11
column index, 11
combination, 15
combination with repetitions, 16
complete colored walk, 62
composition of f.p.s., 29
composition of permutations, 13
composition rule for coefficient of, 30
composition rule for generating functions, 40
compositional inverse of a f.p.s., 29
Computer Algebra System, 34
context free grammar, 68
context free language, 68
convergence, 83
convolution, 26
convolution rule for coefficient of, 30
convolution rule for generating functions, 40
cross product rule, 17
cycle, 12, 21
cycle degree, 12
cycle representation, 12
Darboux’ method, 84
definite integration of a f.p.s., 28
definite summation, 78
degree of a permutation, 12
delta series, 29
derangement, 12, 86
derivation, 67
diagonal step, 62
diagonalisation rule for generating functions, 40
difference operator, 73
differentiation of a f.p.s., 28
differentiation rule for coefficient of, 30
differentiation rule for generating functions, 40
digamma function, 8
disposition, 15
divergence, 83
domain, 11
dominating singularity, 85
double sequence, 11
Dyck grammar, 68
Dyck language, 68
Dyck walk, 21
Dyck word, 69
east step, 20, 62
empty word, 12, 67
entire function, 89
essential singularity, 86
Euclid’s algorithm, 19
Euler constant, 8, 18
Euler transformation, 42, 55
Euler-McLaurin summation formula, 80
even permutation, 13
exponential algorithm, 10
exponential generating function, 25, 39
exponentiation of f.p.s., 28
extensional definition, 11
extraction of the coefficient, 29
f.p.s., 25
factorial, 8, 14
falling factorial, 15
Fibonacci number, 19
Fibonacci problem, 19
Fibonacci word, 70
Fibonacci, Leonardo, 18
finite operator, 73
fixed point, 12
Flajolet, Philippe, 90
formal grammar, 67
formal language, 67
formal Laurent series, 25, 27
formal power series, 25
free monoid, 67
full history recurrence, 47
function, 11
Gauss’ integral, 8
generalized convolution rule, 56
generalized harmonic number, 18
generating function, 25
generating function rules, 40
generation, 67
geometric series, 30
grammar, 67
group, 67
H-admissible function, 90
Hardy’s identity, 61
harmonic number, 8, 18
harmonic series, 17
Hayman’s method, 90
head, 67
height balanced binary tree, 52
height of a tree, 52
i.p.l., 51
identity, 12
identity for composition, 29
identity operator, 73
identity permutation, 13
image, 11
inclusion exclusion principle, 86
indefinite precision, 35
indefinite summation, 78
indeterminate, 25
index, 11
infinitesimal operator, 73
initial condition, 6, 47
injective function, 11
input, 5
integral lattice, 20
integration of a f.p.s., 28
intensional definition, 11
internal path length, 51
intractable algorithm, 10
intrinsically ambiguous grammar, 69
invertible f.p.s., 26
involution, 13, 49
juxtaposition, 67
k-combination, 15
key, 5
Kronecker’s delta, 43
Lagrange, 32
Lagrange inversion formula, 33
Lagrange subgroup, 57
Landau, Edmund, 9
Landis, 52
language, 12, 67
language generated by the grammar, 68
leftmost occurrence, 67
length of a walk, 62
length of a word, 67
letter, 12, 67
LIF, 33
linear algorithm, 10
linear recurrence, 47
linear recurrence with constant coefficients, 48
linear recurrence with polynomial coefficients, 48
linearity rule for coefficient of, 30
linearity rule for generating functions, 40
list representation, 36
logarithm of a f.p.s., 28
logarithmic algorithm, 10
logarithmic singularity, 88
mapping, 11
Mascheroni constant, 8, 18
metasymbol, 70
method of shifting, 44
Miller, J. C. P., 37
monoid, 67
Motzkin number, 71
Motzkin triangle, 63
Motzkin word, 71
multiset, 24
negation rule, 16
Newton’s rule, 28, 30, 84
non convergence, 83
non-ambiguous grammar, 68
north step, 20, 62
north-east step, 20
number of involutions, 14
number of mappings, 11
Numerical Analysis, 73
O-notation, 9
object grammar, 71
occurrence, 67
occurs in, 67
odd permutation, 13
operand, 35
operations on rational numbers, 35
operator, 35, 72
order of a f.p.s., 25
order of a pole, 85
order of a recurrence, 47
ordered Bell number, 24, 85
ordered partition, 24
ordinary generating function, 25
output, 5
p-ary tree, 34
parenthetization, 20, 68
partial fraction expansion, 30, 48
partial history recurrence, 47
partially recursive set, 68
Pascal language, 69
Pascal triangle, 16
path, 20
permutation, 12
place marker, 25
Pochhammer symbol, 15
pole, 85
polynomial algorithm, 10
power of a f.p.s., 27
preferential arrangement number, 24
preferential arrangements, 24
prefix, 67
principal value, 88
principle of identity, 40
problem of searching, 5
product of f.L.s., 27
product of f.p.s., 26
production, 67
program, 5
proper Riordan array, 55
quadratic algorithm, 10
quasi-unit, 29
rabbit problem, 19
radius of convergence, 83
random permutation, 14
range, 11
recurrence relation, 6, 47
renewal array, 57
residue of a f.L.s., 30
reverse of a f.p.s., 29
Riemann zeta function, 18
Riordan array, 55
Riordan’s old identity, 61
rising factorial, 15
Rogers, Douglas, 57
root, 21
rooted planar tree, 21
row, 11
row index, 11
row-by-column product, 32
Salvy, Bruno, 90
Schützenberger methodology, 68
semi-closed form, 46
sequence, 11
sequential searching, 5
set, 11
set partition, 23
shift operator, 73
shifting rule for coefficient of, 30
shifting rule for generating functions, 40
shuffling, 14
simple binomial coefficients, 59
singularity, 85
small-oh notation, 9
solving a recurrence, 6
sorting, 12
south-east step, 20
square root of a f.p.s., 28
Stanley, 33
Stirling number of the first kind, 21
Stirling number of the second kind, 22
Stirling polynomial, 23
Stirling, James, 21, 22
subgroup of associated operators, 57
subtracted singularity, 88
subword, 67
successful search, 5
suffix, 67
sum of a geometric progression, 44
sum of f.L.s, 27
sum of f.p.s., 26
summation by parts, 78
summing factor method, 49
surjective function, 11
symbol, 12, 67
symbolic method, 68
symmetric colored walk problem, 62
symmetric group, 14
symmetry formula, 16
syntax directed compilation, 70
table, 5
tail, 67
Tartaglia triangle, 16
Taylor theorem, 80
terminal word, 68
tractable algorithm, 10
transposition, 12
transposition representation, 13
tree representation, 36
triangle, 55
trinomial coefficient, 46
unary-binary tree, 71
underdiagonal colored walk, 62
underdiagonal walk, 20
unfold a recurrence, 6, 49
uniform convergence, 83
unit, 26
unsuccessful search, 5
van Wijngarden grammar, 70
Vandermonde convolution, 43
vector notation, 11
vector representation, 12
walk, 20
word, 12, 67
worst AVL tree, 52
worst case analysis, 5
Z-sequence, 58
zeta function, 18
Zimmermann, Paul, 90

2

Contents
1 Introduction 1.1 What is the Analysis of an Algorithm 1.2 The Analysis of Sequential Searching . 1.3 Binary Searching . . . . . . . . . . . . 1.4 Closed Forms . . . . . . . . . . . . . . 1.5 The Landau notation . . . . . . . . . . 2 Special numbers 2.1 Mappings and powers . . . . . . . . 2.2 Permutations . . . . . . . . . . . . . 2.3 The group structure . . . . . . . . . 2.4 Counting permutations . . . . . . . . 2.5 Dispositions and Combinations . . . 2.6 The Pascal triangle . . . . . . . . . . 2.7 Harmonic numbers . . . . . . . . . . 2.8 Fibonacci numbers . . . . . . . . . . 2.9 Walks, trees and Catalan numbers . 2.10 Stirling numbers of the first kind . . 2.11 Stirling numbers of the second kind . 2.12 Bell and Bernoulli numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 5 6 6 7 9 11 11 12 13 14 15 16 17 18 19 21 22 23 25 25 26 27 27 29 29 31 32 33 34 35 36 37 39 39 40 41 42 44 45 46

3 Formal power series 3.1 Definitions for formal power series . . . . 3.2 The basic algebraic structure . . . . . . . 3.3 Formal Laurent Series . . . . . . . . . . . 3.4 Operations on formal power series . . . . 3.5 Composition . . . . . . . . . . . . . . . . 3.6 Coefficient extraction . . . . . . . . . . . . 3.7 Matrix representation . . . . . . . . . . . 3.8 Lagrange inversion theorem . . . . . . . . 3.9 Some examples of the LIF . . . . . . . . . 3.10 Formal power series and the computer . . 3.11 The internal representation of expressions 3.12 Basic operations of formal power series . . 3.13 Logarithm and exponential . . . . . . . . 4 Generating Functions 4.1 General Rules . . . . . . . . . . . . . . . . 4.2 Some Theorems on Generating Functions 4.3 More advanced results . . . . . . . . . . . 4.4 Common Generating Functions . . . . . . 4.5 The Method of Shifting . . . . . . . . . . 4.6 Diagonalization . . . . . . . . . . . . . . . 4.7 Some special generating functions . . . . .

8 Hayman’s method . . . . . . . . . . . . 7. . 7. . . . . . 6. . . . . .8 Shift and Difference Operators . . . . 7. . . . . . . . . . . . . . . .4 Poles and asymptotics . . . . . . . . . . . . . . 6. . . . . . . . . . .4 Simple binomial coefficients . . .8 Stirling numbers and Riordan arrays . . . . .5 The bivariate case . . . 5. . . . . . . . . . . . . .11 4. . . . . . . . . . . . . . . . . 5. . . . . . . . . . . .13 Linear recurrences with constant coefficients . . . . . . . . . . . . . . . . . . .9 Identities involving the Stirling numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 The A-sequence for proper Riordan arrays . . . . . . . . . . . . . . . . . . . . . . . . 5. .1 The convergence of power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6. . . . . . . . . . . . . . . . . . 5. . . CONTENTS .7 The asymptotic behavior of a trinomial square root 7. . . . . . . . . . . . . . . . .12 Definite Summation . . . . . . . . . . . . . . . . . . . . . . . . 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8 4. . . . . .10 The Addition Operator . . . . 8 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9 Examples of Hayman’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . .1 Definitions and basic concepts . . . . . . . . . . . . 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 The Shift Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 The algebraic structure of Riordan arrays . . . . . . . . .9 Shift and Difference Operators . . . . . . . . . . . .7 The Difference Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5. . . . . . . . . 6. . . . . . . . . . . . . 7. . . . . . .6 Subtracted singularities . . . .11 Definite and Indefinite summation . . . . 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7 Coloured walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Context-free languages . . . . .5 Algebraic and logarithmic singularities . . . . . . . . . . . . . . 6. . . . . . .Example II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Asymptotics 7. . . . . . . . . . . . . . . 6 Formal methods 6. . . . . .9 4. . . . .1 Formal languages . . . . . . . . . . . . . . . . 6. . 6. . . . . . . . . . . . . . . . . . . . . . . . . . 7. . . . . . . . . . . .13 The Euler-McLaurin Summation Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Binomial coefficients and the LIF . . . . . . . . . . . . . . . . . . . . . . . . . .10 4. . . . . . . . . . . . . . . . . . . Some special recurrences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6. . . . . . . . . . . 6. . . . . . . Linear recurrences with polynomial coefficients The summing factor method . . . . . . . 7. . . . . . .Example I . . . . . . . . . . . . . . . . . .5 Other Riordan arrays from binomial coefficients 5. . . . . . . . . . . . . . . . . . . . . . . .3 Singularities: poles . 
. . . . . . . . . . . . . . . . . . . . . . . The internal path length of binary trees . .4 The symbolic method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5. . . .4 4. .3 Formal languages and programming languages 6. . . . . .14 Applications of the Euler-McLaurin Formula . . . . . . . . . . 47 49 49 51 52 53 55 55 56 57 59 60 61 62 64 65 67 67 68 69 70 71 72 73 74 76 77 78 79 80 81 83 83 84 85 86 87 88 89 89 91 93 5 Riordan Arrays 5. .2 The method of Darboux . . . . . . . . . . . . . . 6. . . . . Height balanced binary trees . . . . . . . . . . . . . . . . 7. . . . . . . . . . . . . . . . . . . . . . . . . .

and have therefore to be considered as “words” in the computer memory. A natural problem. and so on. Otherwise. In practical cases S is the set N of natural numbers. In the latter case we are convinced that s ∈ T and the search is unsuccessful. we wish to know what the worst case is (Worst Case Analysis) and what the average case is (Average Case Analysis) with respect to all the tables containing n elements or. the most straight-forward algorithm is sequential searching: we begin by comparing s and a1 and we are finished if they are equal. multiplication and division) with the newly introduced arabic digits and the positional notation for numbers. is to choose. Many algorithms can exist which solve the same problem. the problem of searching. . thus obtaining exact criteria to compare different algorithms performing the same task. the word was used for a long time before computers were invented. but they can be very slow or require large amounts of computer memory. in order to realize it as a program on a particular computer. we compare s and a2 . Surely.1 What is the Analysis of an Algorithm ever. The “problem of searching” is as follows: we are given a finite ordered subset T = (a1 . in these cases. its elements referred to as keys). . Leonardo Fibonacci called “algorisms” the rules for performing the four basic operations (addition. others can be very simple and straight-forward. which is the solution to the given problem. In the former case the search is successful and we have determined the element in T equal to s. The problem is therefore to have some means to judge the speed and the quantity of computer resources a given algorithm uses. Besides. Although the mathematical problem “s ∈ T or not” has almost no relevance. An algorithm is realized on a computer by means of a program. How- . the searching problem is basic in computer science and many algorithms have been devised to make the process of searching as fast as possible. an element s ∈ S and we wish to know whether s ∈ T or s ∈ T . as a matter of fact. for what concerns the sequential search algorithm. or the set R of real numbers or also the set A∗ of the words on some alphabet A. . For example. i. This definition is intentionally vague and the following points should be noted: • an algorithm can present several aspects and therefore may require several mathematical expressions to be fully understood. the best algorithm. The aim of the “Analysis of Algorithms” is just to give the mathematical bases for describing the behavior of algorithms. obtaining the correct answer or output. • the operations performed by an algorithm can be 5 An algorithm is a finite sequence of unambiguous rules for solving a problem. subtraction. a set of instructions which cause the computer to perform the elaborations intended by the algorithm. and.Chapter 1 Introduction 1.e. if any.. as a simple example. with respect to the n elements it contains. Let us consider. Once the algorithm is started with a particular input. Simple algorithms are more easily programmed. because we always deal with a suitable binary representation of the elements in S on a computer. we are interested in what happens during a successful or unsuccessful search. until we find an element ak = s or reach the end of T . an algorithm is independent of any computer. Let S be a given set. Some can be very skillful. in fact. or to see if an algorithm is sufficiently good to be efficiently implemented on a computer. an ) of S (usually called a table. . So. 
and in the former case which element ak in T it is. or the set Z of integer numbers. for a fixed table. The analysis of this (simple) algorithm consists in finding one or more mathematical expressions describing in some way the number of operations performed by the algorithm as a function of the number n of the elements in T . a2 . it will always end. “Euclid’s algorithm” for evaluating the greatest common divisor of two integer numbers was also well known before the appearance of computers. for a successful search. the nature of S is not essential.

i. and we can decide to consider as an “operation” also very complicated manipulations (e. then the search is unsuccessful.2 The Analysis of Sequential the recurrence is easily transformed into a sum.. . Let S be a given divide by n.merical order in N. In our case. of an algorithm. The important point is that every instance of the “operation” takes on about the same time or has the same complexity. Algorithms are only related to the “logic” used to Un = 1 + Un−1 = 1 + 1 + Un−2 = · · · = solve the problem.6 of many different kinds. Let ai the median elit shows in mathematical terms that the number of ement in T .s > ai perform the same algorithm on the subtable tion of a sum—a well-known sum in the present case.comparisons necessary to find out that an element cution time as a possible parameter for the behavior s does not belong to a table containing no element. i. . If this is not the case. If T = (a1 . . In our example 1.Another simple example of analysis can be performed parisons necessary to find any element in T . More interesting is the average case analysis. U0 is the number of We observe explicitly that we never consider exe. perform the same algorithm on the subtable T ′ = (a1 . an ). and compare it comparisons performed by the algorithm (and hence with s. . we can determine Un in the following way. the time taken on a computer) grows linearly with otherwise. algorithms are Hence we have the initial condition U0 = 0 and we independent of any particular computer and should can unfold the preceding recurrence relation: never be confused with the programs realizing them. . i = ⌊(n + 1)/2⌋. an ) is a finite ordered subset of S. e. In general Searching we have the problem of solving a recurrence. which contains n − 1 elements. The ordering must be total. Operations depend on the nature of the algorithm. we have: order in A∗ . . ai−1 ). as the nua single comparison. . if instead the dimension of T .. starting with the The analysis of the sequential searching algorithm is recurrence relation and the initial conditions. since for an element s ∈ S in T . Z or R or the lexicographical isons. If s = ai then the search is successful. and then with the binary search algorithm. if s < ai . we have: Un = 1 + Un−1 This is a recurrence relation. . . which introduces the first mathematical device of algorithm anal. that is an expression relating the value of Un with other values of Uk having k < n. and so on. Sometimes we can also consider more complex operations as square roots. ai+2 . Recurrence relations are the other mathematical device arising in algorithm analysis. Let us now consider an unsuccessful search. since we have to perform n solution or recurrences. T ′′ = (ai+1 . but as we shall see this is not always the case. as a matter of ble on which we perform the search is reduced to the fact. . . Consequently. i.1. It is clear that if s = a1 we only need ordered set. So. CHAPTER 1. As stated before. A large part of our efforts will be dedicated to this topic. if s = a2 we need two compar.. If Uk denotes the number of comparisons necessary for a table with k elements. extracting a random number or performing the differentiation of a polynomial of a given degree). In the example above the only operations involved in the algorithm are comparisons.. . list concatenation or reversal of words. The concluding step of our analysis was the execu. the worst case other large part of our efforts will be dedicated to the analysis is immediate. anvery simple. 
so we should go on with the table T ′ = T \ {a1 }. If at any moment the taThis is typical of many algorithms and. for which we only have an Average Case analysis. then it is possible to find the value of Un .e. programs can depend on the abil= 1 + 1 + · · · + 1 +U0 = n ity of the programmer or on the characteristics of the n times computer. called binary searching. to find an explicit expression for Un . INTRODUCTION technical point for the analyst. comparisons if s = an is the last element in T . a table.3 Binary Searching ysis. To find the average number of comparisons in a successful search we should sum the number of com. U0 or U1 is known.g. For a successful search. we can always imag1 n(n + 1) n+1 1 = Cn = (1 + 2 + · · · + n) = ine that a1 < a2 < · · · < an and consider the foln n 2 2 lowing algorithm. a2 .e. In other algorithms we can have arithmetic or logical operations. Consequently. . It is clear that if some value.g. to search This result is intuitively clear but important. the ability in performing sums is an important empty set ∅.. so no doubt is possible. we can give different weights to different operations or to different instances of the same operation. for every n ∈ N. We compare s with the first element a1 in T and obviously we find s = a1 . a2 .e.

Let us consider first the Worst Case analysis of this algorithm, i.e., the situation in which the element s is only found at the last step of the algorithm, when the subtable on which we search is reduced to a single element. We observe that every step reduces the table to ⌊n/2⌋ or to ⌊(n − 1)/2⌋ elements. If Bn is the number of comparisons necessary to find s in a table T with n elements, we have the recurrence:

    Bn = 1 + B⌊n/2⌋

with the initial condition B1 = 1. This recurrence is not so simple as in the case of sequential searching, but we can simplify everything by considering a value of n of the form 2^k − 1. In such a case we have ⌊n/2⌋ = ⌊(n − 1)/2⌋ = 2^(k−1) − 1, and the recurrence takes on the form:

    B_(2^k − 1) = 1 + B_(2^(k−1) − 1)        or        βk = 1 + βk−1

if we write βk for B_(2^k − 1). Unfolding yields βk = k and, returning to the B’s, we find:

    Bn = log2(n + 1).

This is valid for every n of the form 2^k − 1; for the other values it is an approximation, indeed a rather good approximation, because of the very slow growth of logarithms. For unsuccessful searches we have Un = Bn, since we have to proceed as in the Worst Case analysis, and at the last comparison we have a failure instead of a success.

The Average Case analysis for successful searches proceeds as follows. There is only one element that can be found with a single comparison: the median element in T. There are two elements that can be found with two comparisons: the median elements in T' and in T''. Continuing in the same way, we find the average number An of comparisons as:

    An = (1/n)(1 + 2 + 2 + 3 + 3 + 3 + 3 + 4 + · · · + (1 + ⌊log2(n)⌋)).

This sum also is not immediate, but, as before, we can consider the case n = 2^k − 1:

    A_(2^k − 1) = (1/(2^k − 1)) Σ_{j=1}^{k} j 2^(j−1) = (k 2^k − 2^k + 1)/(2^k − 1).

The value of this sum can be found explicitly, but the method is rather difficult and we delay it until later (see Section 4.7); the reader can check the result by using mathematical induction. If we now write k 2^k − 2^k + 1 = k(2^k − 1) + k − (2^k − 1), we find:

    An = k + k/n − 1 = log2(n + 1) − 1 + log2(n + 1)/n

which is only a little better than the worst case. This accounts for the dramatic improvement that binary searching operates on sequential searching: on a table with 1,000,000 elements, sequential search requires about 500,000 comparisons on the average for a successful search, whereas binary searching only requires log2(1,000,000) ≈ 20 comparisons.

1.4 Closed Forms

The sign “=” between two numerical expressions denotes their numerical equivalence, as for example in:

    Σ_{k=0}^{n} k = n(n + 1)/2.

Although algebraically or numerically equivalent, the two expressions can be computationally quite different. The left-hand expression requires n sums to be evaluated, so the number of its operations grows linearly with n; the important point is that the evaluation of the right-hand expression is instead independent of n, since it requires a sum, a multiplication and a halving, whatever the value of n. For n also moderately large (say n ≥ 5), nobody would prefer computing the left-hand expression rather than the right-hand one. A closed form expression is an expression whose evaluation does not require a number of operations depending on a parameter n; in the example above, the left-hand expression is not in closed form, whereas the right-hand one is.

Another example we have already found is:

    Σ_{k=0}^{n} k 2^(k−1) = n 2^n − 2^n + 1 = (n − 1) 2^n + 1.

Again, the evaluation of the right-hand expression is independent of n, whilst the left-hand expression requires a number of operations growing linearly with n: a computer evaluates the right-hand expression in a few nanoseconds, but can require some milliseconds to compute the left-hand one, if only n is greater than 10,000,000.

We observe that 2^n = 2 × 2 × · · · × 2 (n times) seems to require n − 1 multiplications. In fact, 2^n is a simple shift in a binary computer and, more in general, every power α^n = exp(n ln α) can always be computed with the maximal accuracy allowed by a computer in constant time, i.e., in a time independent of α and n. This is because the two elementary functions exp(x) and ln(x) have the nice property that the cost of their evaluation is independent of their argument. The same property holds true for the most common numerical functions, such as the trigonometric and hyperbolic functions, the Γ and ψ functions (see below), and so on.

As we shall see, in algorithm analysis there appear many kinds of “special” numbers, which will be discussed in the next chapter. Most of them can be reduced to the computation of some basic quantities, which are considered to be in closed form although, apparently, they depend on some parameter n. The three main quantities of this kind are the factorial, the harmonic numbers and the binomial coefficients. The factorial n! = 1 × 2 × 3 × · · · × n seems to require n − 2 multiplications; in fact, for n large it can be computed by means of the Stirling formula.
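The gap between the two sides of the last identity is easy to observe experimentally. The little Pascal program below, written by us just for this check, evaluates the sum both term by term and through the closed form:

    program closedform ;
    var n, k : integer ;
        loopsum, closed, power : longint ;
    begin
      for n := 1 to 12 do begin
        { left-hand side: evaluate the sum term by term (n products and sums) }
        loopsum := 0 ;  power := 1 ;              { power = 2^(k-1) }
        for k := 1 to n do begin
          loopsum := loopsum + k * power ;
          power := power * 2
        end ;
        { right-hand side: the closed form, a fixed number of operations;
          after the loop, power = 2^n }
        closed := ( n - 1 ) * power + 1 ;
        writeln( n : 3, loopsum : 10, closed : 10 )
      end
    end .

The two printed columns coincide for every n, while the work done on the left grows with n.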

In order to justify the previous sentence, let us anticipate some definitions and give a more precise presentation of the Γ and ψ functions. The Γ-function is defined by a definite integral:

    Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt.

The Γ-function is defined for every x ∈ C, except when x is a negative integer, where the function goes to infinity; near these points the following approximation can be important:

    Γ(−n + ε) ≈ ((−1)^n / n!) (1/ε).

By integrating by parts, we obtain:

    Γ(x + 1) = ∫_0^∞ t^x e^(−t) dt = [−t^x e^(−t)]_0^∞ + ∫_0^∞ x t^(x−1) e^(−t) dt = x Γ(x)

which is a basic, recurrence property of the Γ-function. It allows us to reduce the computation of Γ(x) to the case 1 ≤ x ≤ 2. In this interval we can use a polynomial approximation:

    Γ(x + 1) = 1 + b1 x + b2 x² + · · · + b8 x^8 + ε(x)        (0 ≤ x ≤ 1)

where:

    b1 = −0.577191652        b5 = −0.756704078
    b2 =  0.988205891        b6 =  0.482199394
    b3 = −0.897056937        b7 = −0.193527818
    b4 =  0.918206857        b8 =  0.035868343

The error is |ε(x)| ≤ 3 × 10^(−7). This requires only a fixed amount of operations to reach the desired accuracy. Another method is to use Stirling’s approximation:

    Γ(x) = √(2π) x^(x−0.5) e^(−x) (1 + 1/(12x) + 1/(288x²) − 139/(51840x³) − 571/(2488320x⁴) + · · ·).

When x = 1 the integral simplifies and we immediately find Γ(1) = 1. Therefore, when we unfold the basic recurrence of the Γ-function for x = n an integer, we find:

    Γ(n + 1) = n × (n − 1) × · · · × 2 × 1 = n!

and for n large the factorial can be computed by means of the Stirling formula, which is obtained from the same formula for the Γ-function:

    n! = Γ(n + 1) = n Γ(n) = √(2πn) (n/e)^n (1 + 1/(12n) + 1/(288n²) + · · ·).

Some special values of the Γ-function are directly obtained from the definition. When x = 1/2 the definition implies:

    Γ(1/2) = ∫_0^∞ t^(1/2 − 1) e^(−t) dt = ∫_0^∞ (e^(−t)/√t) dt.

By performing the substitution y = √t (t = y² and dt = 2y dy), we have:

    Γ(1/2) = ∫_0^∞ (e^(−y²)/y) 2y dy = 2 ∫_0^∞ e^(−y²) dy = √π

the last integral being the famous Gauss integral. Besides, from the recurrence relation we obtain:

    Γ(1/2) = Γ(1 − 1/2) = −(1/2) Γ(−1/2)

and therefore:

    Γ(−1/2) = −2 Γ(1/2) = −2√π.

The function ψ(x), called ψ-function or digamma function, is defined as the logarithmic derivative of the Γ-function:

    ψ(x) = (d/dx) ln Γ(x) = Γ'(x)/Γ(x).

From the recurrence property of the Γ-function we have:

    ψ(x + 1) = Γ'(x + 1)/Γ(x + 1) = (d/dx) ln(x Γ(x)) = (Γ(x) + x Γ'(x))/(x Γ(x)) = 1/x + ψ(x)

and this is a basic property of the digamma function. By this formula we can always reduce the computation of ψ(x) to the case 1 ≤ x ≤ 2; besides, for large values of x we can use the approximation:

    ψ(x) = ln x − 1/(2x) − 1/(12x²) + 1/(120x⁴) − 1/(252x⁶) + · · ·.

By the previous recurrence, we see that the digamma function is related to the harmonic numbers Hn = 1 + 1/2 + 1/3 + · · · + 1/n. In fact, we have:

    Hn = ψ(n + 1) + γ

where γ = 0.57721566 . . . is the Mascheroni-Euler constant.
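As an illustration of the reduction-plus-approximation scheme, here is a Pascal sketch of a Γ-function routine; the function name and the loop structure are ours, while the coefficients are the b1, . . . , b8 quoted above. It is only a sketch, valid under the assumption x ≥ 1:

    { Gamma(x) for x >= 1: reduce the argument to 1 <= x <= 2 with the
      recurrence Gamma(x+1) = x * Gamma(x), then apply the polynomial
      approximation Gamma(1+z) = 1 + b1 z + ... + b8 z^8 }
    function gamma( x : real ) : real ;
      const b1 = -0.577191652 ;  b2 =  0.988205891 ;
            b3 = -0.897056937 ;  b4 =  0.918206857 ;
            b5 = -0.756704078 ;  b6 =  0.482199394 ;
            b7 = -0.193527818 ;  b8 =  0.035868343 ;
      var f, z : real ;
    begin
      f := 1.0 ;
      while x > 2.0 do begin
        x := x - 1.0 ;
        f := f * x               { Gamma(x+1) = x * Gamma(x) }
      end ;
      z := x - 1.0 ;             { now 0 <= z <= 1 }
      gamma := f * ( 1.0 + z*(b1 + z*(b2 + z*(b3 + z*(b4 + z*(b5
                   + z*(b6 + z*(b7 + z*b8)))))))) 
    end ;

The polynomial is evaluated in Horner form, so the fixed amount of operations promised above is evident.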

The two methods are indeed the same: by using the approximation for ψ(x), we obtain an approximate formula for the harmonic numbers:

    Hn = ln n + γ + 1/(2n) − 1/(12n²) + · · ·

which shows that the computation of Hn does not require n − 1 sums and n − 1 inversions, as it can appear from its definition. Finally, the binomial coefficient:

    C(n, k) = n(n − 1) · · · (n − k + 1)/k! = n!/(k! (n − k)!) = Γ(n + 1)/(Γ(k + 1) Γ(n − k + 1))

can be reduced to the computation of the Γ-function, or can be approximated by using the Stirling formula for factorials. We observe explicitly that the last expression shows that binomial coefficients can be defined for every n, k ∈ C, except that k cannot be a negative integer number. The reader can, as a very useful exercise, write computer programs to realize the various functions mentioned in the present section.
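In that spirit, here is one such program for the harmonic numbers: a small Pascal sketch of ours, comparing the definition with the approximate formula above:

    program harmonic ;
    const gamma = 0.5772156649 ;        { the Mascheroni-Euler constant }
    var n, k : integer ;
        direct, approx : real ;
    begin
      n := 1000 ;
      { the definition: n - 1 sums and n inversions }
      direct := 0.0 ;
      for k := 1 to n do direct := direct + 1.0 / k ;
      { the approximation: a fixed number of operations }
      approx := ln( n ) + gamma + 1.0 / ( 2.0 * n ) - 1.0 / ( 12.0 * n * n ) ;
      writeln( 'H_n by the definition    : ', direct : 14 : 10 ) ;
      writeln( 'H_n by the approximation : ', approx : 14 : 10 )
    end .

Already for n = 1000 the two printed values agree to many decimal places.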
1.5 The Landau notation

To the mathematician Edmund Landau is ascribed a special notation describing the general behavior of a function f(x) when x approaches some definite value; we are mainly interested in the case x → ∞. Landau notation is also known as O-notation (or big-oh notation), because of the use of the letter O to denote the desired behavior. Let us consider functions f : N → R (i.e., sequences of real numbers); this is the case of the sequences f(n) describing the behavior of algorithms, and it should not be considered a restriction. Given two functions f(n) and g(n), we say that f(n) is O(g(n)), or that f(n) is in the order of g(n), if and only if:

    lim_{n→∞} f(n)/g(n) < ∞.

In formulas we write f(n) = O(g(n)) or also f(n) ∼ g(n). If we have at the same time:

    lim_{n→∞} g(n)/f(n) < ∞

we say that f(n) is in the same order as g(n), and write f(n) = Θ(g(n)) or f(n) ≈ g(n). It is easy to see that “∼” is an order relation between functions f : N → R, and that “≈” is an equivalence relation. Finally, we say that f(n) is of smaller order than g(n), and write f(n) = o(g(n)), if and only if:

    lim_{n→∞} f(n)/g(n) = 0.

The notation so introduced (the small-oh notation) is used rather frequently and should be known. We observe explicitly that when f(n) = o(g(n)) we also have f(n) = O(g(n)), and this is in accordance with the previous definitions.

Before making some important comments on Landau notation, let us note that when f(n) = Θ(g(n)) a constant K ≠ 0 exists such that:

    lim_{n→∞} f(n)/(K g(n)) = 1        or        lim_{n→∞} f(n)/g(n) = K

i.e., f(n) behaves as g(n) except for a constant quantity K, which remains the same as n → ∞. The constant K is very important and will often be used.

If f(n) and g(n) describe the behavior of two algorithms A and B solving the same problem, we will say that A is asymptotically better than B if and only if f(n) = o(g(n)). This is rather clear, because when f(n) = o(g(n)) the number of operations performed by A is substantially less than the number of operations performed by B. Instead, when f(n) = Θ(g(n)), we cannot say, in general, which algorithm is better: the number of operations is the same, except for the constant K, and a value K < 1 tells us that algorithm A is relatively better than B, and vice versa when K > 1. This can depend on the particular realization or on the particular computer on which the algorithms are run, and with two different implementations we may have K < 1 or K > 1. However, if A and B are both realized on the same computer and in the best possible way, the constant K becomes significant; the two algorithms are asymptotically equivalent if and only if f(n) = Θ(g(n)).

It is also possible to give an absolute evaluation of the performance of a given algorithm A, whose behavior is described by a sequence f(n) depending on some parameter n. This is done by comparing f(n) against an absolute scale of values. The scale most frequently used contains powers of n, logarithms and exponentials:

    O(1) < O(ln n) < O(√n) < O(n) < O(n ln n) < O(n√n) < O(n²) < · · · < O(n⁵) < · · · < O(e^n) < O(e^(e^n)) < · · ·

This scale reflects well-known properties: the logarithm grows more slowly than any power n^ε, however small ε, while e^n grows faster than any power n^k, however large k.

Obviously, the scale is not complete, as we obviously have O(n^0.4) < O(√n) and O(ln ln n) < O(ln n); the reader can easily fill in other values which can be of interest to him or her. Note also that n^n = e^(n ln n), and therefore O(e^n) < O(n^n).

Let f(n) be a sequence describing the behavior of an algorithm A. We can compare f(n) to the elements of the scale and decide, for example, that f(n) = O(n); this means that algorithm A performs at least as well as any algorithm B whose behavior is described by a function g(n) = Θ(n). As a standard terminology, mainly used in the “Theory of Complexity”: if f(n) = O(1), then the algorithm A performs in constant time, i.e., in a time independent of n, and algorithms of this type are the best possible algorithms; if f(n) = O(ln n), A is said to be logarithmic; if f(n) = O(n), the algorithm A is said to be linear; if f(n) = O(n²), A is said to be quadratic. In general, if f(n) ≤ O(n^r) for some r ∈ R, then A is said to be polynomial, and if f(n) ≥ O(e^n), A is said to be exponential. Polynomial algorithms are also called tractable, while exponential algorithms are called intractable.

These names are due to the following observation. Suppose we have a linear algorithm A, with f(n) = KA n, and an exponential algorithm B, with g(n) = KB e^n, not necessarily solving the same problem, and suppose that an hour of computer time executes both A and B with n = 40. If a new computer is used which is 1000 times faster than the old one, in an hour algorithm A will execute with m such that f(m) = KA m = 1000 KA n, i.e., m = 1000 n = 40,000: the problem solved by the new computer is 1000 times larger than the problem solved by the old computer. For algorithm B we have g(m) = KB e^m = 1000 KB e^n and, by simplifying and taking logarithms, we find m = n + ln 1000 ≈ n + 6.9 < 47. Therefore, for exponential algorithms the improvement achieved by the new computer is almost negligible.
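The arithmetic of this observation is trivial but instructive to run; the following throwaway Pascal fragment is entirely ours:

    program speedup ;
    var n, mlinear : longint ;
        mexp : real ;
    begin
      n := 40 ;                      { problem size solved in one hour today }
      mlinear := 1000 * n ;          { linear algorithm: K*m = 1000*K*n }
      mexp := n + ln( 1000.0 ) ;     { exponential algorithm: e^m = 1000*e^n }
      writeln( 'linear      : m = ', mlinear ) ;
      writeln( 'exponential : m = ', mexp : 8 : 2 )
    end .

It prints 40000 against roughly 46.91: the same thousandfold speedup buys a factor of 1000 in one case and six more units in the other.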
Chapter 2

Special numbers

A sequence is a mapping from the set N of natural numbers into some other set of numbers. Because of this notation, an element k ∈ N is called an index, and the image of k is denoted by fk instead of the traditional f(k); the whole sequence is abbreviated as (f0, f1, f2, . . .) = (fk)k∈N. If f : N → R, the sequence is called a sequence of real numbers; if f : N → Q, the sequence is called a sequence of rational numbers, and so on. Often, we also study double sequences, i.e., mappings f : N × N → R or some other numeric set. In this case also, instead of writing f(n, k) we will usually write fn,k, and the whole sequence will be denoted by {fn,k | n, k ∈ N} or (fn,k)n,k∈N. A double sequence can be displayed as an infinite array of numbers, whose first row is the sequence (f0,0, f0,1, f0,2, . . .), whose second row is (f1,0, f1,1, f1,2, . . .), and so on; usually, the index n is the row index and the index k is the column index. The array can also be read by columns: then (f0,0, f1,0, f2,0, . . .) is the first column, (f0,1, f1,1, f2,1, . . .) is the second column, and so on.

We wish to describe here some sequences and double sequences of numbers frequently occurring in the analysis of algorithms. They arise in the study of very simple and basic combinatorial problems, and therefore appear in more complex situations as well.

2.1 Mappings and powers

In all branches of Mathematics two concepts appear which have to be taken as basic: the concept of a set and the concept of a mapping. A set is a collection of objects and it must be considered a primitive concept, i.e., a concept which cannot be defined in terms of other and more elementary concepts, so to be considered one of the fundamental bricks in a solid wall. If a, b, c, . . . are the objects (or elements) in a set denoted by A, we write A = {a, b, c, . . .}, thus giving an extensional definition of this particular set. If a set B is defined through a property P of its elements, we write B = {x | P(x) is true}, thus giving an intensional definition of the particular set B.

If S is a finite set, then |S| denotes its cardinality, or the number of its elements. The order in which we write or consider the elements of S is irrelevant: if |S| = n and S = {a1, a2, . . . , an} in any order, the two arrangements continue to be the same, as sets. If we wish to emphasize a particular arrangement or ordering of the elements of S, we write (a1, a2, . . . , an). This is the vector notation and is used to represent arrangements of S. Two arrangements of S are different if and only if an index k exists for which the elements corresponding to that index in the two arrangements are different.

If A, B are two sets, the cartesian product A × B is the set of all the couples (a, b) with a ∈ A and b ∈ B; if |A| = n and |B| = m, the cartesian product contains exactly nm elements or couples. A mapping or function from A into B, denoted by f : A → B, is a subset of the cartesian product of A by B, i.e., f ⊆ A × B, such that every element a ∈ A is the first element of one and only one pair (a, b) ∈ f. The usual notation for (a, b) ∈ f is f(a) = b, and b is called the image of a under the mapping f. The set A is the domain of the mapping, while B is the range or codomain. A function for which every a1 ≠ a2 ∈ A corresponds to pairs (a1, b1) and (a2, b2) with b1 ≠ b2 is called injective. A function in which every b ∈ B belongs at least to one pair (a, b) is called surjective. A bijection or 1-1 correspondence or 1-1 mapping is any injective function which is also surjective.

A more difficult problem is to find out how many mappings from A to B exist. We can observe that every element a ∈ A must have its image in B; this means that we can associate to a any one of the m elements in B, and we have m · m · . . . · m different possibilities when the product is extended to all the n elements in A. Since all the mappings can be built in this way, we have a total of m^n different mappings from A into B. This also explains why the set of mappings from A into B is often denoted by B^A.

In fact we can write:

    |B^A| = |B|^|A| = m^n.
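The counting argument can be mirrored by a program that builds every mapping explicitly, choosing the image of each element in turn. The following recursive Pascal sketch (names and structure ours) counts what it builds; for n = 5 and m = 2 it prints 32 = 2^5:

    program mappings ;
    var count : longint ;

    { build all the mappings f : {1..n} -> {1..m} by choosing the
      image b of the element i, for i = 1, 2, ..., n }
    procedure build( i, n, m : integer ) ;
      var b : integer ;
    begin
      if i > n then count := count + 1      { a complete mapping }
      else
        for b := 1 to m do build( i + 1, n, m )
    end ;

    begin
      count := 0 ;
      build( 1, 5, 2 ) ;
      writeln( count )                       { prints 32 }
    end .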

This formula allows us to solve some simple combinatorial problems. For example, if we toss 5 coins, how many different configurations head/tail are possible? The five coins are the domain of our mappings and the set {head, tail} is the codomain; therefore, we have a total of 2^5 = 32 different configurations. Similarly, if we toss three dice, the total number of configurations is 6³ = 216. In the same way, we can count the number of subsets in a set S having |S| = n. In fact, given a subset A ⊆ S, let us consider the mapping χA : S → {0, 1} defined by:

    χA(x) = 1 for x ∈ A        χA(x) = 0 for x ∉ A.

This is called the characteristic function of the subset A. Every two different subsets of S have different characteristic functions, and every mapping f : S → {0, 1} is the characteristic function of some subset A ⊆ S, i.e., of the subset {x ∈ S | f(x) = 1}. Therefore, there are as many subsets of S as there are characteristic functions; but these are 2^n by the formula above.

A finite set A is sometimes called an alphabet and its elements symbols or letters. Any sequence of letters is called a word; the empty sequence is the empty word and is denoted by ε. From the previous considerations, if |A| = n, the number of words of length m is n^m. The first part of Chapter 6 is devoted to some basic notions on special sets of words, or languages.

2.2 Permutations

In the usual sense, a permutation of a set of objects is an arrangement of these objects in any order. For example, three objects, denoted by a, b, c, can be arranged into six different ways:

    (a, b, c)  (a, c, b)  (b, a, c)  (b, c, a)  (c, a, b)  (c, b, a).

A very important problem in Computer Science is “sorting”: suppose we have n objects from an ordered set, usually some set of numbers or some set of strings (with the common lexicographic order); the objects are given in a random order and the problem consists in sorting them according to the given order. For example, by sorting (60, 51, 80, 77, 44) we should obtain (44, 51, 60, 77, 80), and the real problem is to obtain this ordering in the shortest possible time. In other words, we start with a random permutation of the n objects and wish to arrive at their standard ordering, the one in accordance with their supposed order relation (e.g., “less than”).

In order to abstract from the particular nature of the n objects, we will use the numbers Nn = {1, 2, . . . , n}, and define a permutation as a 1-1 mapping π : Nn → Nn. By identifying a and 1, b and 2, c and 3, the six permutations of 3 objects are written:

    ( 1 2 3 )   ( 1 2 3 )   ( 1 2 3 )   ( 1 2 3 )   ( 1 2 3 )   ( 1 2 3 )
    ( 1 2 3 )   ( 1 3 2 )   ( 2 1 3 )   ( 2 3 1 )   ( 3 1 2 )   ( 3 2 1 )

where, conventionally, the first line contains the elements of Nn in their proper order, and the second line contains the corresponding images. This is the usual representation for permutations but, since the first line can be understood without ambiguity, the vector representation for permutations is more common: it consists in writing the second line (the images) in the form of a vector. Therefore, the six permutations are (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1), respectively.

Let us examine the permutation π = (3, 2, 1), for which we have π(1) = 3, π(2) = 2 and π(3) = 1. If we start with the element 1 and successively apply the mapping π, we have π(1) = 3, π(π(1)) = π(3) = 1, π(π(π(1))) = π(1) = 3, and so on. Since the elements of Nn are finite in number, by starting with any k ∈ Nn we must obtain a finite chain of numbers, which will repeat always in the same order. These numbers are said to form a cycle, and the permutation (3, 2, 1) is formed by two cycles, the first one composed by 1 and 3, the second one only composed by 2. We write (3, 2, 1) = (1 3)(2), where every cycle is written between parentheses and numbers are separated by blanks, to distinguish a cycle from a vector. Conventionally, a cycle is written with its smallest element first, and the various cycles are arranged according to their first element. In this cycle representation the six permutations are:

    (1)(2)(3)   (1)(2 3)   (1 2)(3)   (1 2 3)   (1 3 2)   (1 3)(2).

A number k for which π(k) = k is called a fixed point for π. The corresponding cycles, formed by a single element, are conventionally understood, except in the identity (1, 2, . . . , n) = (1)(2) · · · (n), in which all the elements are fixed points; the identity is simply written (1). Consequently, the usual representation of the six permutations is:

    (1)   (2 3)   (1 2)   (1 2 3)   (1 3 2)   (1 3).

A permutation without any fixed point is called a derangement. A cycle with only two elements is called a transposition. The degree of a cycle is the number of its elements, plus one; the degree of a permutation is the sum of the degrees of its cycles. The six permutations above have degree 2, 3, 3, 4, 4, 3, respectively. A permutation is even or odd according to the fact that its degree is even or odd. For example, the permutation (8, 9, 4, 3, 6, 1, 7, 2, 10, 5), given in vector notation, has the cycle representation (1 8 2 9 10 5 6)(3 4), the number 7 being a fixed point. The long cycle (1 8 2 9 10 5 6) has degree 8; therefore the permutation degree is 8 + 3 + 2 = 13 and the permutation is odd.

2.3 The group structure

Let n ∈ N; Pn denotes the set of all the permutations of n elements, i.e., according to the previous sections, the set of the 1-1 mappings π : Nn → Nn. If π, ρ ∈ Pn, we can perform their composition, i.e., build a new permutation σ defined as σ(k) = ρ(π(k)) = (π ◦ ρ)(k). An example in P7 is:

    π ◦ ρ  =  ( 1 2 3 4 5 6 7 )  ◦  ( 1 2 3 4 5 6 7 )  =  ( 1 2 3 4 5 6 7 )
              ( 1 5 6 7 4 2 3 )     ( 4 5 2 1 7 6 3 )     ( 4 7 6 3 1 5 2 ).

In fact, for instance, π(2) = 5 and ρ(5) = 7; therefore σ(2) = ρ(π(2)) = ρ(5) = 7, and so on. The vector representation of permutations is not particularly suited for the hand evaluation of composition, although it is very convenient for computer implementation. The opposite situation occurs for the cycle representation:

    (2 5 4 7 3 6) ◦ (1 4)(2 5 7 3) = (1 4 3 6 5)(2 7).

Cycles in the left-hand member are read from left to right and, by examining one cycle after the other, we find the image of every element by simply remembering the cycle successor of the same element. For example, the image of 4 in the first cycle is 7; the second cycle does not contain 7, but the third cycle tells us that the image of 7 is 3. Therefore, the image of 4 is 3. Fixed points are ignored, and this is in accordance with their meaning. The composition symbol “◦” is usually understood and, in reality, the simple juxtaposition of the cycles can well denote their composition.

The identity mapping acts as an identity for composition, since it is only composed by fixed points. Every permutation has an inverse, which is the permutation obtained by reading the given permutation from below, i.e., by sorting the elements in the second line, which become the elements in the first line, and then rearranging the elements in the first line into the second line. For example, the inverse of the first permutation π in the example above is:

    ( 1 2 3 4 5 6 7 )
    ( 1 6 7 5 2 3 4 ).

A simple observation is that the inverse of a cycle is obtained by writing its first element followed by all the other elements in reverse order; hence, the inverse of a transposition is the same transposition. In fact, in cycle notation, we have:

    (2 5 4 7 3 6)(2 6 3 7 4 5) = (2 6 3 7 4 5)(2 5 4 7 3 6) = (1).

Since composition is associative, we have proved that (Pn, ◦) is a group. The group is not commutative because, for example:

    ρ ◦ π = (1 4)(2 5 7 3) ◦ (2 5 4 7 3 6) = (1 7 6 2 4)(3 5) ≠ π ◦ ρ.

An involution is a permutation π such that π² = π ◦ π = (1). An involution can only be composed by fixed points and transpositions because, by the definition, we have π⁻¹ = π, and the above observation on the inversion of cycles shows that a cycle with more than 2 elements has an inverse which cannot coincide with the cycle itself.

Till now, we have supposed that in the cycle representation every number is only considered once. However, if we think of a permutation as the product of cycles, we can imagine that its representation is not unique and that an element k ∈ Nn can appear in more than one cycle. The representations of σ = π ◦ ρ given above are examples of this statement. In particular, we can obtain the transposition representation of a permutation; for example, we observe that we have:

    (2 6)(6 5)(6 4)(6 7)(6 3) = (2 5 4 7 3 6).

We transform the cycle into a product of transpositions by forming a transposition with the first and the last element in the cycle, and then adding other transpositions with the same first element (the last element in the cycle) and the other elements of the cycle, in the same order, as second element. Besides, we note that we can always add a couple of transpositions such as (2 5)(2 5), corresponding to the two fixed points (2) and (5), and therefore adding nothing to the permutation. All these remarks show that:

• every permutation can be written as the composition of transpositions;
• this representation is not unique, but every two representations differ by an even number of transpositions;
• the minimal number of transpositions corresponding to a cycle has the same parity as the degree of the cycle (fixed points, in particular, always correspond to an even number of transpositions).

Therefore, we conclude that an even [odd] permutation can be expressed as the composition of an even [odd] number of transpositions. Since the composition of two even permutations is still an even permutation, the set An of the even permutations is a subgroup of Pn; it is called the alternating subgroup, while the whole group Pn is referred to as the symmetric group.
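Finding the cycle representation of a permutation given in vector form is a simple linear scan; the following Pascal sketch is ours (the type name and bound are assumptions), and it prints every cycle, including the fixed points that the text's convention would suppress:

    const MaxN = 100 ;
    type vector = array [ 1 .. MaxN ] of integer ;

    { print the cycle representation of the permutation v[1..n] }
    procedure cycles( var v : vector ; n : integer ) ;
      var i, j : integer ;
          seen : array [ 1 .. MaxN ] of boolean ;
    begin
      for i := 1 to n do seen[ i ] := false ;
      for i := 1 to n do
        if not seen[ i ] then begin
          write( '(', i ) ;         { each cycle starts at its smallest element }
          seen[ i ] := true ;
          j := v[ i ] ;
          while j <> i do begin     { follow the images until the cycle closes }
            write( ' ', j ) ;
            seen[ j ] := true ;
            j := v[ j ]
          end ;
          write( ')' )
        end ;
      writeln
    end ;

Suppressing the single-element cycles, as the convention of the previous section suggests, is an easy exercise.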

2.4 Counting permutations

How many permutations are there in Pn? If n = 1 we only have a single permutation, (1); if n = 2 we have two permutations, exactly (1, 2) and (2, 1). We have already seen that |P3| = 6 and, if n = 0, we consider the empty vector () as the only possible permutation, that is |P0| = 1. In this way we obtain a sequence {1, 1, 2, 6, . . .}, and we wish to obtain a formula giving us |Pn| for every n ∈ N.

Let π ∈ Pn be a permutation and (a1, a2, . . . , an) be its vector representation. We can obtain a permutation in Pn+1 by simply adding the new element n + 1 in any position of the representation of π:

    (n + 1, a1, a2, . . . , an)    (a1, n + 1, a2, . . . , an)    · · ·    (a1, a2, . . . , an, n + 1).

Therefore, from any permutation in Pn we obtain n + 1 permutations in Pn+1, and they are all different. Vice versa, if we start with a permutation in Pn+1 and eliminate the element n + 1, we obtain one and only one permutation in Pn. Therefore, all the permutations in Pn+1 are obtained in the way just described, and each is obtained only once. So we find:

    |Pn+1| = (n + 1) |Pn|

which is a simple recurrence relation. By unfolding this recurrence, i.e., by substituting for |Pn| the analogous expression, and so on, we obtain:

    |Pn+1| = (n + 1)|Pn| = (n + 1) n |Pn−1| = · · · = (n + 1) n (n − 1) · · · 1 × |P0|.

Since, as we have seen, |P0| = 1, we have proved that the number of the permutations in Pn is given by the product n · (n − 1) · . . . · 2 · 1. Therefore, our sequence is:

    n     0  1  2  3   4    5    6     7      8
    |Pn|  1  1  2  6  24  120  720  5040  40320

As we mentioned in the Introduction, the number n·(n−1)·. . .·2·1 is called n factorial and is denoted by n!. For example, we have 10! = 10·9·8·7·6·5·4·3·2·1 = 3,628,800. Factorials grow very fast, but they are one of the most important quantities in Mathematics.

When n ≥ 2, we can compose every permutation in Pn with one transposition, say (1 2). This transforms every even permutation into an odd permutation, and vice versa. On the other hand, since (1 2)⁻¹ = (1 2), the transformation is its own inverse, and therefore it defines a 1-1 mapping between even and odd permutations. This proves that the number of the even (odd) permutations is n!/2.

Another simple problem is how to determine the number of involutions on n elements. As we have already seen, an involution is only composed by fixed points and transpositions (without repetitions of the elements!). If we denote by In the set of the involutions of n elements, we can divide In into two subsets: I'n is the set of the involutions in which n is a fixed point, and I''n is the set of the involutions in which n belongs to a transposition, say (k n). If we eliminate n from the involutions in I'n, we obtain an involution of n − 1 elements and, vice versa, every involution in I'n can be obtained by adding the fixed point n to an involution in In−1. If we eliminate the transposition (k n) from an involution in I''n, we obtain an involution of the remaining n − 2 elements, which contains the element n − 1 but does not contain the element k; in all cases, however, by eliminating (k n) from all the involutions containing it, we obtain a set of involutions in a 1-1 correspondence with In−2. The element k can assume any value 1, 2, . . . , n − 1, and therefore we obtain (n − 1) times |In−2| involutions. We now observe that all the involutions in In are obtained in this way from involutions in In−1 and In−2, and therefore we have:

    |In| = |In−1| + (n − 1)|In−2|.

Since |I0| = 1, |I1| = 1 and |I2| = 2, from this recurrence relation we can successively find all the values of |In|. The sequence (see Section 4.9) is therefore:

    n   0  1  2  3   4   5   6    7    8
    In  1  1  2  4  10  26  76  232  764

We conclude this section by giving the classical computer program for generating a random permutation of the numbers {1, 2, . . . , n}. The procedure shuffle receives the address of the vector and the number of its elements; it fills the vector with the numbers from 1 to n and then uses the standard procedure random to produce a random permutation, which is returned in the input vector:

    procedure shuffle( var v : vector ; n : integer ) ;
      var i, j, a : integer ;
    begin
      for i := 1 to n do v[ i ] := i ;
      for i := n downto 2 do begin
        j := random( i ) + 1 ;
        a := v[ i ] ;  v[ i ] := v[ j ] ;  v[ j ] := a
      end
    end {shuffle} ;

The procedure exchanges the last element in v with a random element in v, possibly the same last element. In this way the last element is chosen at random, and the procedure goes on with the last but one element, and so on. In this way the elements in v are properly shuffled, and eventually v contains the desired random permutation. The procedure obviously performs in linear time.
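The involution recurrence also translates directly into a few lines of Pascal (a sketch of ours, using the initial values |I0| = |I1| = 1):

    program involutions ;
    const N = 15 ;
    var n : integer ;
        inv : array [ 0 .. N ] of longint ;
    begin
      inv[ 0 ] := 1 ;  inv[ 1 ] := 1 ;
      for n := 2 to N do
        inv[ n ] := inv[ n - 1 ] + ( n - 1 ) * inv[ n - 2 ] ;   { the recurrence }
      for n := 0 to N do write( inv[ n ], ' ' ) ;
      writeln
    end .

The output begins 1 1 2 4 10 26 76 232 764, exactly the table above.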

2.5 Dispositions and Combinations

Permutations are a special case of a more general situation. If we have n objects, we can wonder how many different orderings exist of k among the n objects. For example, we can make 12 different arrangements with two objects chosen from the four objects a, b, c, d; they are:

    (a, b)  (a, c)  (a, d)  (b, a)  (b, c)  (b, d)
    (c, a)  (c, b)  (c, d)  (d, a)  (d, b)  (d, c)

These arrangements are called dispositions and, in general, we can use any one of the n objects to be the first element of the disposition; there remain only n − 1 objects to be used as second element, n − 2 objects to be used as third element, and so on. Therefore, if Dn,k denotes the number of the possible dispositions of n elements in groups of k, we have:

    Dn,k = n(n − 1) · · · (n − k + 1) = (n)k.

The symbol (n)k is called a falling factorial, because it consists of k factors beginning with n and decreasing by one down to n − k + 1; it is also known as the Pochhammer symbol. There exists also a rising factorial n(n + 1) · · · (n + k − 1). Obviously, (n)n = n! and, by convention, (n)0 = 1.

When in k-dispositions we do not consider the order of the elements, we obtain what are called k-combinations. Combinations are written between braces, because they are simply the subsets with k objects of a set of n objects; for example, there are only 6 combinations of two objects chosen from a, b, c, d:

    {a, b}  {a, c}  {a, d}  {b, c}  {b, d}  {c, d}

In fact, by permuting the k objects of a disposition, we obtain k! other dispositions with the same elements; therefore, k! dispositions correspond to a single combination. The number of the k-combinations of n objects is denoted by C(n, k), which is often read “n choose k”, and is called a binomial coefficient:

    C(n, k) = (n)k / k! = n(n − 1) · · · (n − k + 1) / k!.

For example, all the possible combinations of the three objects a, b, c are:

    C3,0 = {∅}    C3,1 = {{a}, {b}, {c}}    C3,2 = {{a, b}, {a, c}, {b, c}}    C3,3 = {{a, b, c}}.

Obviously, the empty set is the only subset with 0 elements, and the whole set is the only subset with n elements; by the very definition we have:

    C(n, 0) = 1  ∀n ∈ N        C(n, n) = 1  ∀n ∈ N.

There exists a simple formula to compute binomial coefficients in a recursive way:

    C(n, k) = (n/k) C(n − 1, k − 1)

which is easily proved from the definition, and the expression on the right is successively reduced down to a coefficient with lower index 0, which is 1 as we have already seen. For example:

    C(7, 3) = (7/3) C(6, 2) = (7·6/(3·2)) C(5, 1) = (7·6·5/(3·2·1)) C(4, 0) = 35.

The name “binomial coefficients” is due to the well-known binomial formula:

    (a + b)^n = Σ_{k=0}^{n} C(n, k) a^k b^(n−k)

which is easily proved. In fact, when expanding the product (a + b)(a + b) · · · (a + b), we choose a term from each factor (a + b); the term a^k b^(n−k) is obtained by summing over all the possible choices of k a’s and n − k b’s, which, as we know, are just C(n, k).

By these rules it is not difficult to compute a binomial coefficient such as C(100, 3), but it is a hard job to compute C(100, 97).
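The recursive rule gives a safe computer evaluation as well. The following Pascal function (a sketch of ours) builds C(n − k + i, i) for i = 1, . . . , k, which is the rule above read from the bottom up; multiplying before dividing keeps every intermediate value an integer, because the product of i consecutive integers is divisible by i!:

    function binomial( n, k : integer ) : longint ;
      var i : integer ;
          c : longint ;
    begin
      if k > n - k then k := n - k ;          { use the smaller index }
      c := 1 ;
      for i := 1 to k do
        c := ( c * ( n - k + i ) ) div i ;    { c = C(n-k+i, i) at every step }
      binomial := c
    end ;

For example, binomial( 7, 3 ) returns 35, as computed by hand above.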

2.6 The Pascal triangle

As we know, C(n, k) is the number of the subsets with k elements of a set with n elements. We can count the number of such subsets in the following way. Let {a1, a2, . . . , an} be the elements of the base set, and let us fix one of these elements, say the last element an. We can distinguish the subsets with k elements into two classes: the subsets containing an and the subsets that do not contain an. Let S⁺ and S⁻ be these two classes. We now point out that S⁺ can be seen (by eliminating an) as composed by the subsets with k − 1 elements of a set with n − 1 elements, i.e., the base set minus an; the number of the elements in S⁺ is therefore C(n − 1, k − 1). The class S⁻ can be seen as composed by the subsets with k elements of a set with n − 1 elements, and their number is C(n − 1, k). By summing these two contributions, we obtain the recurrence relation:

    C(n, k) = C(n − 1, k) + C(n − 1, k − 1)

which can be used with the initial conditions:

    C(n, 0) = 1  ∀n ∈ N        C(n, n) = 1  ∀n ∈ N.

This recurrence completely defines the binomial coefficients and gives a simple rule to compute them all successively. Let us dispose them in an infinite array, whose rows represent the number n and whose columns represent the number k; the array is initially filled by 1’s in the first column (corresponding to the various C(n, 0)) and on the main diagonal (corresponding to the numbers C(n, n)). The recurrence tells us that the element in position (n, k) is obtained by summing two elements in the previous row: the element just above, in position (n − 1, k), and the element on the left, in position (n − 1, k − 1). We actually obtain an infinite, lower triangular array, known as the Pascal triangle (or Tartaglia-Pascal triangle). See Table 2.1.

    n\k   0   1   2   3   4   5   6
     0    1
     1    1   1
     2    1   2   1
     3    1   3   3   1
     4    1   4   6   4   1
     5    1   5  10  10   5   1
     6    1   6  15  20  15   6   1

    Table 2.1: The Pascal triangle

The symmetry of the triangle is quite evident, and a simple observation is that the sum of the elements in a row represents the total number of the subsets of a set with n elements, which is obviously 2^n. The reader can try to prove that the row alternating sums, e.g., the sum 1 − 4 + 6 − 4 + 1, equal zero. As an example of the recurrence, by unfolding down to the initial conditions we have:

    C(4, 2) = C(3, 1) + C(3, 2) = C(2, 0) + C(2, 1) + C(2, 1) + C(2, 2) =
            = 2 + 2 C(2, 1) = 2 + 2 [C(1, 0) + C(1, 1)] = 2 + 2 × 2 = 6.

In fact, multiplying and dividing by (n − k)!, we know that:

    C(n, k) = n(n − 1) · · · (n − k + 1)/k! = n(n − 1) · · · (n − k + 1)(n − k) · · · 1/(k! (n − k)!) = n!/(k! (n − k)!).

This is a very important formula by its own, and it immediately shows the symmetry rule:

    C(n, n − k) = n!/((n − k)! (n − (n − k))!) = n!/(k! (n − k)!) = C(n, k).

From this formula we immediately obtain C(100, 97) = C(100, 3). The reader is invited to produce a computer program to evaluate binomial coefficients; he (or she) is warned not to use the formula n!/(k!(n − k)!) as it stands, which can produce very large numbers, exceeding the capacity of the computer when n, k are not small.

The definition of a binomial coefficient can be easily expanded to any real numerator:

    C(r, k) = r(r − 1) · · · (r − k + 1)/k!        r ∈ R.

For example we have:

    C(1/2, 3) = (1/2)(−1/2)(−3/2)/3! = 1/16

but in this case the symmetry rule does not make sense. For a negative integer numerator we point out that:

    C(−n, k) = (−n)(−n − 1) · · · (−n − k + 1)/k! = (−1)^k n(n + 1) · · · (n + k − 1)/k! = (−1)^k C(n + k − 1, k)

which allows us to express a binomial coefficient with negative, integer numerator as a binomial coefficient with positive numerator. This is known as the negation rule and will be used very often.

A useful exercise is to prove the following identity, which is called the cross-product rule:

    C(n, k) C(k, r) = C(n, r) C(n − r, k − r).

In fact:

    C(n, k) C(k, r) = [n!/(k!(n − k)!)] [k!/(r!(k − r)!)] = n!/((n − k)! r! (k − r)!) =
                    = [n!/(r!(n − r)!)] [(n − r)!/((n − k)!(k − r)!)] = C(n, r) C(n − r, k − r).

The cross-product rule, together with the symmetry and the negation rules, are the three basic properties of binomial coefficients, and are to be remembered by heart.
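Following the invitation above, here is a small Pascal program of ours that evaluates binomial coefficients by building the triangle with the additive recurrence, thus avoiding the factorials altogether:

    program pascaltriangle ;
    const N = 10 ;
    var n, k : integer ;
        p : array [ 0 .. N, 0 .. N ] of longint ;
    begin
      for n := 0 to N do begin
        p[ n, 0 ] := 1 ;                        { first column }
        p[ n, n ] := 1 ;                        { main diagonal }
        for k := 1 to n - 1 do
          p[ n, k ] := p[ n - 1, k ] + p[ n - 1, k - 1 ] ;
        for k := 0 to n do write( p[ n, k ] : 6 ) ;
        writeln
      end
    end .

Every entry is obtained with a single sum, and no intermediate value ever exceeds the final coefficients of the last row.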

As we have already mentioned, the binomial formula can be extended to real exponents:

    (1 + x)^r = Σ_{n=0}^{∞} C(r, n) x^n.

In fact, let us consider the function f(x) = (1 + x)^r; it is continuous and can be differentiated as many times as we wish:

    f⁽ⁿ⁾(x) = (dⁿ/dxⁿ)(1 + x)^r = (r)n (1 + x)^(r−n)

(here (r)n is the falling factorial), as can be shown by mathematical induction. The first two cases are f(x) = (r)0 (1 + x)^(r−0) and f'(x) = r(1 + x)^(r−1). Suppose now that the formula holds true for some n ∈ N and let us differentiate it once more:

    f⁽ⁿ⁺¹⁾(x) = (r)n (r − n)(1 + x)^(r−n−1) = (r)n+1 (1 + x)^(r−(n+1))

and this proves our statement. Therefore, (1 + x)^r has a Taylor expansion around the point x = 0 of the form:

    (1 + x)^r = f(0) + (f'(0)/1!) x + (f''(0)/2!) x² + · · · + (f⁽ⁿ⁾(0)/n!) x^n + · · ·.

The coefficient of x^n is therefore f⁽ⁿ⁾(0)/n! = (r)n/n! = C(r, n), as desired. An important point of this generalization is that, when r ∈ N, we have C(r, n) = 0 for every n > r, so that the series reduces to the usual, finite binomial formula.

The numbers cn = C(2n, n) are called central binomial coefficients and their sequence begins:

    n   0  1  2   3   4    5    6     7      8
    cn  1  2  6  20  70  252  924  3432  12870

They are very important. For example, we can express the binomial coefficients C(−1/2, n) in terms of the central binomial coefficients:

    C(−1/2, n) = (−1/2)(−3/2) · · · (−(2n − 1)/2)/n! = (−1)^n · 1·3 · · · (2n − 1)/(2^n n!) =
               = (−1)^n · 1·2·3·4 · · · (2n − 1)·(2n)/(2^n n! · 2·4 · · · (2n)) = (−1)^n (2n)!/(4^n n!²) = ((−1)^n/4^n) C(2n, n).

In a similar way, we can prove the following identities:

    C(1/2, n)  = ((−1)^(n−1)/(4^n (2n − 1))) C(2n, n)
    C(3/2, n)  = ((−1)^n · 3/(4^n (2n − 1)(2n − 3))) C(2n, n)
    C(−3/2, n) = ((−1)^n (2n + 1)/4^n) C(2n, n).

2.7 Harmonic numbers

It is well-known that the harmonic series:

    Σ_{k=1}^{∞} 1/k = 1 + 1/2 + 1/3 + 1/4 + 1/5 + · · ·

diverges. In fact, if we cumulate the 2^m numbers from 1/(2^m + 1) to 1/2^(m+1), we obtain:

    1/(2^m + 1) + 1/(2^m + 2) + · · · + 1/2^(m+1) > 2^m/2^(m+1) = 1/2

and therefore the sum cannot be limited.

Let us now consider a finite, partial sum of the harmonic series:

    Hn = 1 + 1/2 + 1/3 + · · · + 1/n.

This number has a well-defined value and is called a harmonic number. Conventionally, we set H0 = 0, and the sequence of harmonic numbers begins:

    n   0  1   2     3      4      5       6       7        8
    Hn  0  1  3/2  11/6  25/12  137/60  49/20  363/140  761/280

Harmonic numbers arise in the analysis of many algorithms, and it is very useful to know an approximate value for them. Let us consider the series:

    1 − 1/2 + 1/3 − 1/4 + 1/5 − · · · = ln 2

and let us define:

    Ln = 1 − 1/2 + 1/3 − 1/4 + · · · + (−1)^(n−1)/n.

Obviously we have:

    H2n − L2n = 2 (1/2 + 1/4 + · · · + 1/(2n)) = Hn        or        H2n = L2n + Hn.

Since the series for ln 2 is alternating in sign, the error committed by truncating it at any place is less than the first discarded element. Therefore:

    ln 2 − 1/(2n) < L2n < ln 2

and, by summing Hn to all members:

    Hn + ln 2 − 1/(2n) < H2n < Hn + ln 2.

Let us now consider the two cases n = 2^(k−1) and n = 2^(k−2):

    H_(2^(k−1)) + ln 2 − 1/2^k < H_(2^k) < H_(2^(k−1)) + ln 2
    H_(2^(k−2)) + ln 2 − 1/2^(k−1) < H_(2^(k−1)) < H_(2^(k−2)) + ln 2.

By summing and simplifying these two expressions, we obtain:

    H_(2^(k−2)) + 2 ln 2 − 1/2^k − 1/2^(k−1) < H_(2^k) < H_(2^(k−2)) + 2 ln 2.

We can now iterate this procedure and eventually find:

    H_(2^0) + k ln 2 − 1/2^k − 1/2^(k−1) − · · · − 1/2 < H_(2^k) < H_(2^0) + k ln 2.

Since H_(2^0) = H1 = 1, we have the limitations ln 2^k < H_(2^k) < ln 2^k + 1. These limitations can be extended to every n and, since the values of the Hn’s are increasing, this implies that a constant γ should exist (0 < γ < 1) such that:

    Hn → ln n + γ        as n → ∞.

This constant is called the Euler-Mascheroni constant and, as we have already mentioned, its value is:

    γ ≈ 0.5772156649 . . . .

Later we will prove the more accurate approximation of the Hn’s we quoted in the Introduction.

The generalized harmonic numbers are defined as:

    Hn^(s) = 1/1^s + 1/2^s + 1/3^s + · · · + 1/n^s

and Hn^(1) = Hn. They are the partial sums of the series defining the Riemann ζ function:

    ζ(s) = 1/1^s + 1/2^s + 1/3^s + · · ·

which can be defined in such a way that the sum actually converges, except for s = 1 (the harmonic series). Because of the limited value of ζ(s), we can set, for large values of n: Hn^(s) ≈ ζ(s). In particular we have:

    ζ(2) = 1 + 1/4 + 1/9 + 1/16 + 1/25 + · · · = π²/6
    ζ(4) = 1 + 1/16 + 1/81 + 1/256 + 1/625 + · · · = π⁴/90

and in general:

    ζ(2n) = (2π)^(2n) |B2n| / (2 (2n)!)

where the Bn are the Bernoulli numbers (see below). No explicit formula is known for ζ(2n + 1), but numerically we have, for example:

    ζ(3) = 1 + 1/8 + 1/27 + 1/64 + 1/125 + · · · ≈ 1.202056903.

2.8 Fibonacci numbers

At the beginning of 1200, Leonardo Fibonacci introduced in Europe the positional notation for numbers, together with the computing algorithms for performing the four basic operations. Fibonacci was the most important mathematician in western Europe at that time.

He posed the following problem: a farmer has a couple of rabbits which generates another couple after two months and, from that moment on, a new couple of rabbits every month. The newly generated couples become fertile after two months, when they too begin to generate a new couple of rabbits every month. The problem consists in computing how many couples of rabbits the farmer has after n months.

It is a simple matter to find the initial values: there is one couple at the beginning (1st month) and 1 couple in the second month. The third month the farmer has 2 couples, and 3 couples the fourth month, while the previously generated couple of rabbits has not yet become fertile. The couples become 5 on the fifth month: in fact, there are the 3 couples of the preceding month plus the newly generated couples, which are as many as there are fertile couples; but these are just the couples of two months beforehand, i.e., 2 couples. In general, at the nth month the farmer will have the couples of the previous month plus the new couples, which are generated by the fertile couples, that is the couples he had two months before. If we denote by Fn the number of couples at the nth month, we have the Fibonacci recurrence:

    Fn = Fn−1 + Fn−2

with the initial conditions F1 = F2 = 1. By the same rule, we have F0 = 0 and the sequence of Fibonacci numbers begins:

    n   0  1  2  3  4  5  6   7   8   9  10
    Fn  0  1  1  2  3  5  8  13  21  34  55

every term being obtained by summing the two preceding numbers in the sequence. Despite the small numbers appearing at the beginning of the sequence, Fibonacci numbers grow very fast; later we will see how they grow and how they can be computed in a fast way. For the moment, we wish to show how Fibonacci numbers appear in combinatorics and in the analysis of algorithms.

Suppose we have some bricks of dimensions 1 × 2 dm, and we wish to cover a strip 2 dm wide and n dm long by using these bricks. The problem is to know in how many different ways we can perform this covering. In Figure 2.1 we show the five ways to cover a strip 4 dm long.

    [Figure 2.1: Fibonacci coverings for a strip 4 dm long]

Let Mn be the number of coverings of the strip of length n; we have the initial conditions M0 = 1 (the empty covering is just a covering!) and M1 = 1. We can observe that a covering counted by Mn can be obtained by adding vertically a brick to a covering counted by Mn−1, or by adding horizontally two bricks to a covering counted by Mn−2; these are the only ways of proceeding to build our coverings. Therefore, we have the recurrence relation:

    Mn = Mn−1 + Mn−2

which is the same recurrence as for the Fibonacci numbers; since M0 = M1 = 1, we conclude Mn = Fn+1.

Euclid’s algorithm for computing the Greatest Common Divisor (gcd) of two positive integer numbers is another instance of the appearance of Fibonacci numbers. The problem is to determine the maximal number of divisions performed by Euclid’s algorithm. Let us consider two consecutive Fibonacci numbers, for example 34 and 55, and let us try to find gcd(34, 55):

    55 = 1 × 34 + 21
    34 = 1 × 21 + 13
    21 = 1 × 13 + 8
    13 = 1 × 8 + 5
     8 = 1 × 5 + 3
     5 = 1 × 3 + 2
     3 = 1 × 2 + 1
     2 = 2 × 1.

We immediately see that the quotients are all 1 (except the last one) and that the remainders are decreasing Fibonacci numbers. The maximal number of divisions is attained when every division in the process gives 1 as a quotient, since a greater quotient would drastically cut the number of necessary divisions; the process can be inverted to prove that only consecutive Fibonacci numbers enjoy this property. Therefore, we conclude that, given two integer numbers n and m, the maximal number of divisions performed by Euclid’s algorithm is attained when n, m are two consecutive Fibonacci numbers, and the actual number of divisions is the order of the smaller number in the Fibonacci sequence, minus 1.
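The count of divisions is easy to check by machine; the following Pascal sketch of ours runs Euclid’s algorithm on the pair of the example and counts one division per loop iteration:

    program fibgcd ;
    var m, n, r, count : longint ;
    begin
      m := 34 ;  n := 55 ;           { two consecutive Fibonacci numbers }
      count := 0 ;
      while m <> 0 do begin
        r := n mod m ;               { one division }
        n := m ;  m := r ;
        count := count + 1
      end ;
      writeln( 'gcd = ', n, '   divisions = ', count )
    end .

It prints gcd = 1 and 8 divisions: 34 = F9 is the 9th Fibonacci number, and 9 − 1 = 8, in agreement with the rule just stated.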

2.9 Walks, trees and Catalan numbers

“Walks” or “paths” are common combinatorial objects and are defined in the following way. Let Z² be the integral lattice, i.e., the set of the points in R² having integer coordinates. A walk or path is a finite sequence of points in Z² with the following properties:

1. the origin (0, 0) belongs to the walk, and is its first point;
2. every point (x, y) of the walk, except the last one, is followed by one of its two neighbors (x + 1, y) or (x, y + 1).

A pair of successive points ((x, y), (x + 1, y)) is called an east step, and a pair ((x, y), (x, y + 1)) is called a north step. It is a simple matter to show that the number of the walks composed by n steps and ending at column k, i.e., at the point (k, n − k), with k east steps and n − k north steps, is C(n, k). In fact, if we denote by 1, 2, . . . , n the n steps in their order, we can associate to any walk the subset of Nn = {1, 2, . . . , n} formed by the numbers of its east steps; since the other steps should be north steps, a 1-1 correspondence exists between the walks and the subsets of Nn with k elements, which, as we know, are exactly C(n, k). By exchanging east and north steps, i.e., by taking instead the subset of the north-step numbers, we also prove, in a combinatorial way, the symmetry rule for binomial coefficients.

Among the walks, particularly important are the ones that never go above the main diagonal, i.e., the walks no point of which has the form (x, y) with y > x. They are called underdiagonal walks. Later on, we will be able to count the number of the underdiagonal walks composed by n steps and ending at a horizontal distance k from the main diagonal. For the moment, we only wish to establish a recurrence relation for the number bn of the underdiagonal walks of length 2n ending on the main diagonal. In Figure 2.2 we sketch a possible walk; we have marked the point C = (k, k) at which the walk encounters the main diagonal for the first time.

    [Figure 2.2: How a walk is decomposed]

The walk is thus decomposed into a first part (the walk between A and B) and a second part (the walk between C and D), which are underdiagonal walks of the same type; here A = (1, 0) is reached by the forced first east step, B is the point from which the walk reaches the diagonal at C, and D = (n, n) is the endpoint. There are bk−1 walks of the type AB and bn−k walks of the type CD and, obviously, k can be any number between 1 and n. We therefore obtain the recurrence relation:

    bn = Σ_{k=1}^{n} bk−1 bn−k = Σ_{k=0}^{n−1} bk bn−k−1

with the initial condition b0 = 1, corresponding to the empty walk, or the walk composed by the only point (0, 0). The bn’s are called Catalan numbers and, as we shall see:

    bn = (1/(n + 1)) C(2n, n).

The sequence (bk)k∈N begins:

    n   0  1  2  3   4   5    6    7     8
    bn  1  1  2  5  14  42  132  429  1430

Catalan numbers frequently occur in the analysis of algorithms and data structures. For example, when we build binary trees from permutations, we do not always obtain different trees from different permutations: there are only 5 trees generated by the six permutations of 1, 2, 3, as we show in Figure 2.3.

    [Figure 2.3: The five binary trees with three nodes]

How many different binary trees exist with n nodes? If we fix our attention on the root, the left subtree has k nodes, for some k = 0, 1, . . . , n − 1, while the right subtree contains the remaining n − k − 1 nodes. Every tree with k nodes can be combined with every tree with n − k − 1 nodes to form a tree with n nodes, and therefore we have the recurrence relation:

    bn = Σ_{k=0}^{n−1} bk bn−k−1

which is the same recurrence as before. Since the initial condition is again b0 = 1 (the empty tree), there are as many binary trees as there are walks. Moreover, if we associate an open parenthesis to every east step and a closed parenthesis to every north step, we obtain the number of the possible parenthetizations of an expression. When we have three pairs of parentheses, the 5 possibilities are:

    ()()()   ()(())   (())()   (()())   ((())).

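The common recurrence makes the Catalan numbers as easy to tabulate as the other sequences of this chapter; a short Pascal sketch of ours:

    program catalan ;
    const N = 15 ;
    var n, k : integer ;
        b : array [ 0 .. N ] of longint ;
    begin
      b[ 0 ] := 1 ;                   { the empty walk (or the empty tree) }
      for n := 1 to N do begin
        b[ n ] := 0 ;
        for k := 0 to n - 1 do        { b(n) = sum of b(k) * b(n-1-k) }
          b[ n ] := b[ n ] + b[ k ] * b[ n - 1 - k ] ;
        write( b[ n ], ' ' )
      end ;
      writeln
    end .

The output begins 1 2 5 14 42 132 429 1430, matching the table above.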
Another kind of walks is obtained by considering steps of type ((x, y), (x + 1, y + 1)), i.e., north-east steps, and of type ((x, y), (x + 1, y − 1)), i.e., south-east steps. The interesting walks are those starting from the origin and never going below the x-axis; they are called Dyck walks. An obvious 1-1 correspondence exists between Dyck walks and the walks considered above.

Finally, the concept of a rooted planar tree is as follows: let us consider a node, which is the root of the tree, and let us recursively add branches to the root or to the nodes generated by previous insertions; what we obtain is a “rooted plane tree”. In Figure 2.4 we represent all the trees with up to n = 3 branches.

    [Figure 2.4: Rooted Plane Trees]

If n denotes the number of branches in a rooted plane tree, again we obtain the sequence of Catalan numbers, i.e., rooted plane trees are counted by Catalan numbers.

2.10 Stirling numbers of the first kind

About 1730, the English mathematician James Stirling was looking for a connection between the powers of a number x, say x^n, and the falling factorials (x)n = x(x − 1) · · · (x − n + 1). He developed the first instances:

    (x)1 = x
    (x)2 = x(x − 1) = x² − x
    (x)3 = x(x − 1)(x − 2) = x³ − 3x² + 2x
    (x)4 = x(x − 1)(x − 2)(x − 3) = x⁴ − 6x³ + 11x² − 6x

and, picking the coefficients in their proper order (from the smallest power to the largest), he obtained a table of integer numbers. We are mostly interested in the absolute values of these numbers, which are shown in Table 2.2. After him, these numbers are called Stirling numbers of the first kind and are now denoted by [n k], sometimes read “n cycle k”, for the reason we are going to explain.

    n\k   0    1    2    3   4   5  6
     0    1
     1    0    1
     2    0    1    1
     3    0    2    3    1
     4    0    6   11    6   1
     5    0   24   50   35  10   1
     6    0  120  274  225  85  15  1

    Table 2.2: Stirling numbers of the first kind

First note that the above identities can be globally written as:

    (x)n = Σ_{k=0}^{n} (−1)^(n−k) [n k] x^k.

Let us now observe that (x)n = (x − n + 1)(x)n−1; therefore we have:

    (x)n = (x − n + 1) Σ_{k=0}^{n−1} (−1)^(n−1−k) [n−1 k] x^k =
         = Σ_{k=0}^{n−1} (−1)^(n−k−1) [n−1 k] x^(k+1) − (n − 1) Σ_{k=0}^{n−1} (−1)^(n−k−1) [n−1 k] x^k =
         = Σ_{k=0}^{n} (−1)^(n−k) [n−1 k−1] x^k + (n − 1) Σ_{k=0}^{n} (−1)^(n−k) [n−1 k] x^k

where we performed the change of variable k → k − 1 in the first sum and then extended both sums from 0 to n. This identity is valid for every value of x, and therefore we can equate its coefficients to those of the previous, general Stirling identity, obtaining:

    [n k] = (n − 1) [n−1 k] + [n−1 k−1].

This recurrence, together with the initial conditions [0 0] = 1 and [n 0] = 0, ∀n > 0, completely defines the Stirling numbers of the first kind.

What is a possible combinatorial interpretation of these numbers? Let us consider the permutations of n elements and count the permutations having exactly k cycles, whose set will be denoted by Sn,k. If we fix any element, say the last element n, a permutation in Sn,k can have n as a fixed point, or not. When n is a fixed point and we eliminate it, we obtain a permutation with n − 1 elements and k − 1 cycles; vice versa, any such permutation of n − 1 elements gives a permutation in Sn,k with n as a fixed point, if we add (n) to it. There are therefore |Sn−1,k−1| such permutations in Sn,k. When n is not a fixed point and we eliminate it from the permutation, we obtain a permutation with n − 1 elements still having k cycles; however, the same permutation is obtained several times, exactly n − 1 times, since n can occur after any other element in the standard cycle representation (it can never occur as the first element in a cycle, by our conventions). For example, by eliminating 5, all the following permutations in S5,2 produce the same permutation (1 2 3)(4) in S4,2:

    (1 2 3)(4 5)   (1 2 3 5)(4)   (1 2 5 3)(4)   (1 5 2 3)(4).

Consequently, we obtain the recurrence relation:

    |Sn,k| = (n − 1)|Sn−1,k| + |Sn−1,k−1|

which is just the recurrence relation for the Stirling numbers of the first kind. If we now prove that also the initial conditions are the same, we can conclude that |Sn,k| = [n k]. First we observe that Sn,0 is empty for every n > 0, because every permutation contains at least one cycle, and so |Sn,0| = 0, ∀n > 0; moreover, |S0,0| = 1, corresponding to the empty permutation. This concludes the proof, and justifies the reading “n cycle k”.

We also observe that:

• [n 1] = (n − 1)!, for n ≥ 1: in fact, Sn,1 is composed by all the permutations having a single cycle; in the standard representation this cycle begins by 1, and is followed by any permutation of the n − 1 remaining numbers;
• [n 2] = (n − 1)! Hn−1;
• [n n] = 1, ∀n ∈ N: in fact, Sn,n is only composed by the identity, the only permutation having n cycles;
• [n n−1] = C(n, 2), ∀n ≥ 2: in fact, Sn,n−1 contains the permutations having all fixed points except a single transposition, and this transposition can only be formed by taking two elements among 1, 2, . . . , n, which can be done in C(n, 2) different ways.

As an immediate consequence of this reasoning, we find:

    Σ_{k=0}^{n} [n k] = n!        ∀n ∈ N

i.e., the row sums of the Stirling triangle of the first kind equal n!, because they correspond to the total number of the permutations of n objects.
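The triangle of Table 2.2 can be generated directly from the recurrence; a Pascal sketch of ours:

    program stirling1 ;
    const N = 8 ;
    var n, k : integer ;
        s : array [ 0 .. N, 0 .. N ] of longint ;
    begin
      for n := 0 to N do
        for k := 0 to N do s[ n, k ] := 0 ;
      s[ 0, 0 ] := 1 ;
      for n := 1 to N do
        for k := 1 to n do          { [n k] = (n-1)[n-1 k] + [n-1 k-1] }
          s[ n, k ] := ( n - 1 ) * s[ n - 1, k ] + s[ n - 1, k - 1 ] ;
      for n := 0 to N do begin
        for k := 0 to n do write( s[ n, k ] : 8 ) ;
        writeln
      end
    end .

Summing each printed row reproduces the factorials, as the last identity above predicts.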
that is he was also interested in expressing ordinary powers in terms of falling factorials.11 Stirling numbers of the second kind James Stirling also tried to invert the process described in the previous section. k As an immediate consequence of this reasoning. First we observe that Sn. i.k | + |Sn−1. we observe that the permutations in Sn. so |Sn. k=0 xn = .e.k can have n as a fixed point. ∀n ∈ N n and n = 0. For example. 0 i.

which would determines a polynomial. 3. When n belongs to a larger set. n Let us now look for a combinatorial interpretation • n−1 = n .k | = k|Pn−1. Here we have the initial conditions: • n = 2n−1 .k | + |Pn−1.k | by fixing an element in Nn . by eliminating {n} we obtain a partition in Pn−1. and therecan equate the coefficients of xk in this and the above fore |Pn. In any partition with 2 of these numbers. ∀n ≥ 2. and therefore we no partition of Nn composed by 0 subsets. The partitions in Pn. . obviously. thus obtaining the recurrence relation: with the corresponding Stirling number of the second kind. 4} ∪ {2} . Stirling numbers of the first kind. 2.n | = where.12.k−1 | which is the same recurrence as for the Stirling numbers of the second kind. we obtain as first subset all the subsets in gle of the second kind. . 2 different ways. 4} ∪ {2. nonsingletons. In fact. This proves obtain: S4 (w) = w + 7w2 + 6w3 + w4 and it is called the identity. the following three partitions in P5. we have the following 7 partitions: k=0 n k=0 = n−1 k n−1 x k+1 + k n−1 k x = k {1} ∪ {2. partition of the empty set). As far as the initial conn n−1 n−1 ditions are concerned. ∀n ∈ N and = 0. 4} {1. 4} 2. the two elements can be chosen in n empty subsets. 5} ∪ {3} ∪ {4} {1. In the former case. i. When n ≥ 1. 3} {1} ∪ {2. we observe that there is only k k k x x + = k k−1 one partition of Nn composed by n subsets. 3} ∪ {2. . all partitions in Pn−1.k . except this last whole set. 2. Bell and Bernoulli numbers If Pn. we can eliminate it obtaining a partition If we sum the rows of the Stirling triangle of the second kind.. If Nn is the usual set {1. ∀n ≥ 1.3 : {1. and use this fact for observing that: n n−1 n−1 =k + k k k−1 • n = 1.3: Stirling numbers of the second kind n−1 {1. for example. the first one uniquely n n = 1. the five partitions of a set with three elements are: {1. For example. there is The identity is valid for every x ∈ R. 2. 2. 5} ∪ {4} 1 3 7 15 31 1 6 25 90 1 10 65 1 15 1 Table 2. When the partition is only 2 composed by two subsets. ∀n ∈ N.3 all produce the same partition in P4. the only partition of 1 Nn in a single subset is when the subset coincides which is slightly different from the recurrence for the with Nn . For example. . 4} {1.k is the corresponding set. Every row of the triangle Nn \ {1}. 2. therefore |Pn.0 | = 0. 2} ∪ {3} ∪ {4. say the last element n. 2} ∪ {3. 5} This proves the recurrence relation: |Pn. 4} ∪ {3} {1. n}..e. and all the others we can study the partitions of Nn into k disjoint. Let us suppose that the 0 n first subset always contains 1.k | coincides identity. For example. when n = 4 and k = 2. 3} ∪ {4} {1. the k=0 k=0 partition containing n singletons.e. however. as usual.2. 3} {1. 2} ∪ {3.k can contain n as a singleton (i.k−1 can be obtained in such a way. we find a sequence: n Bn 0 1 1 1 2 3 2 5 4 15 5 52 6 203 7 877 8 4140 which represents the total number of partitions relative to the set Nn . By eliminating These relations completely define the Stirling trian1. we performed the change of variable 1. ∀n ∈ N (in the case n = 0 the empty set is the only k → k − 1 and extended the two sums from 0 to n. exactly by eliminating n from any of the k subsets containing it in the various partitions. as a subset with n as its only element) or can contain n as an element in a larger subset. ∀n ≥ 1. the same partition is obtained several times. We can conclude that |Pn. we now count |Pn. 3. one subset with 2 elements. the 4-th Stirling polynomial.k−1 and. 
from row 4 we correspond to an empty second set.12 {1. 3} {1. 2} ∪ {3} . determines the second. BELL AND BERNOULLI NUMBERS n\k 0 1 2 3 4 5 6 0 1 0 0 0 0 0 0 1 1 1 1 1 1 1 2 3 4 5 6 23 in Pn−1.

the possible multisets n 7 8 9 10 11 12 are: {1. i. . i. By the previous example. 1} . Initially.. 3}∪{1. {1. we build the corresponding ordered partition us. k.24 {1.all the possible values for the Bn ’s. but. for the moment only introducing their definition. 1) (2. 2) (2. 2. {1. ordered Bell numbers are also called preferential arrangement numbers.. i. For example. 1. n k .. B0 = 1. The Bernoulli numbers are implicitly defined by the recurrence relation: n n+1 Bk = δn. 2}. 3} ∪ {2} {1} ∪ {2} ∪ {3} . they arise in many combinatorial problems and therefore they should be examined here. orderings of set partitions and preferential arrange. in fact. If k is the largest number in the arrangement. we have proved that it is actually a 1-1 correspondence. and we now have a formula for B2 : n n On = k!. 1. B0 + 1 0 Because of this definition. 1) n are zero. On > n!. 1. and their No initial condition is necessary. 2) is {2.However. all the other values of Bn for odd (1. For example. {1. Another frequently occurring sequence is obtained by ordering the subsets appearing in the partitions of Nn . . . the partition {1} ∪ {2} ∪ {3} can be ordered in 3! = 6 different ways. 1) (1.. 3}. and. We conclude this section by introducing another important sequence of numbers. 2) (2. However. 2) (1. 1. . 2. ∀n > By performing the necessary computations. SPECIAL NUMBERS This construction can be easily inverted and since it is injective. the numbers On are called ordered Bell numbers and we have: We obtain B1 = −1/2. For example. 2. we have 1 B0 = 1. by definition we have: n Bn = k=0 n . 4}. 1. . apart from the zero values. k Bell numbers grow very fast. 2. and so on. but as n grows. from their definition. 2} . and in fact Bn < n! for every n > 1. Their posBn 0 −1/30 0 5/66 0 −691/2730 sible orderings are given by the following 7 vectors: Except for B1 . .values are as follows: tiset with n elements containing at least once all the n 0 1 2 3 4 5 6 numbers 1. We can find a 1-1 correspondence between the ily proven in a direct way.0 . 2. These are called ordered partitions. they are alplus the six permutations of the set {1. and B1 is obtained by setting n = 1 in the recurrence relation: n 0 1 2 3 4 5 6 7 2 2 On 1 1 3 13 75 541 4683 47293 B1 = 0. 4} ∪ {2. 3}. . This is the starting 0 we easily see that O3 = 13. These and other properties of the Bernoulli numbers are not easorderings are called preferential arrangements. 1. These ternatively one positive and one negative. by setting the element 1 in the a1 th subset.k ). and {1. B1 + B0 + 2 1 0 this shows that On ≥ n!. 1) modulus. the partition corresponding to (1. 2. 2 in the a2 th subset. Another combinatorial interpretation of the or. 2. 3}. their combinatorial interpretation cannot be direct. since n ≤ k . 2} ∪ {3} and {3} ∪ {1. while the partition corresponding to (2. {1. The first twelve teger n ∈ N and for every k ≤ n let Ak be any mul. The number of all the possible Bn 1 −1/2 1/6 0 −1/30 0 1/42 orderings of the Ak ’s is just the nth ordered Bell number. and the sequence begins: value.e. whose ordering is different. i. CHAPTER 2. Because of that.in such a way that everything becomes accessible to ment. k The numbers in this sequence are called Bell numbers and are denoted by Bn . a2 . however. they become extremely large in (1. 3 3 3 k k=0 B2 = 0.k corresponds to one or more cycles in Sn. 2. we always have Bn ≤ n!.B2 = 1/6.e.e. we exactly build k subsets. . 2} ∪ {3} can k=0 be ordered in 2! = 2 ways. 2} . Bernoulli numbers seem to be small. Let us fix an in. 
2. and we can go on successively obtaining dered Bell numbers is as follows. an ) is a preferential arrange. 1. for every value of n and k (a subset in Pn. 1. If (a1 . These are (positive or negative) rational numbers and therefore they cannot correspond to any counting problem. 1) is {1.e. we find 1. we’ll see later how we can arrange things ments. because for n = 0 number is denoted by On . when n = 3.

) = (fk )k∈N is a sequence of (real) numbers. In order to stress that our results are substantially independent of the particular field F and of the particular indeterminate t.L.L. are more easily studied than sequences: 1. denoted by ord(f (t)).) over R in the indeterminate t is an expression: f (t) = f0 +f1 t+f2 t2 +f3 t3 +· · ·+fn tn +· · · = ∞ k=0 fk tk where f0 . f1 .L. which will be introduced in the next chapter. f1 . but the reader can think of F as of R[[t]]. Although our study of f.p. . .s. which will 25 gk t k . we denote F[[t]] by F.p. 1. of order exactly r is denoted by F r or by Fr [[t]].1 Definitions power series for formal be called the (ordinary) generating function of the sequence. then g(t) is actually a f. is mainly justified by the development of a generating function theory. g(t) the order can be negative. 1. in some cases..p.p.s..p.s.. . In fact. . are all real numbers.s. The term ordinary is used to distinguish these functions from exponential generating functions. The same definition applies to every set of numbers.s.p. in the f.p. f1 . We conclude this section by defining the concept of a formal Laurent (power) series (f.. f2 .s.s. inverse. and depending on the field structure of the numeric set.s. F[[t]] and F[[y]]. .. For example. If f (t) ∈ F. many f. strictly contains the set of f. in combinatorial analysis and in the analysis of algorithms the coefficients f0 .p.).s.s. The use of a particular indeterminate t is irrelevant.p. i. positive rational numbers (e. The set of formal power series over F in the indeterminate t is denoted by F[[t]]. of a formal power series are mostly used to count objects.g.). . 1 + t + t2 + t3 + · · ·. or. the algebraic structure of f. k=−m The set of f. f2 . and postpone the study of generating functions to the next chapter. can be “abbreviated” by expressions easily manipulated by elementary algebra. and therefore they are positive integer numbers.e. The formal power series 0 = 0 + 0t + 0t2 + 0t3 + · · · has infinite order.p.s. .1).p.e..s. when they are the coefficients of an exponential generating function.s.Chapter 3 Formal power series 3. 2. k=0 fk t . There are two main reasons why f. The developments we are now going to see. there is no substantial difference between ∞ k the sequence and the f. the order of f (t). .. we will prove that the series 1 + t + t2 + t3 + · · · can be conveniently abbreviated as 1/(1 − t). is very well understood and can be developed in a standard way. say.s. and there exists an obvious 1-1 correspondence between. . i. corresponding to the sequence (1. The present chapter is devoted to these algebraic aspects of f. The set of all f. We observe explicitly that an expression as ∞ k k=−∞ fk t does not represent a f. which is 1 − t + 0t2 + 0t3 + · · ·. as an expression: g(t) = g−m t−m + g−m+1 t−m+1 + · · · + g−1 t−1 + + g0 + g1 t + g 2 t + · · · = 2 ∞ Let R be the field of real numbers and let t be any indeterminate over R. The indeterminate t is used as a “place-marker”. . A formal power series (f. we dedicate the present chapter to the general theory of f. . is the smallest index r for which fr = 0. it is simple to prove that this correspondence is indeed an isomorphism. and from this fact we will be able to infer that the series has a f.s. a symbol to denote the place of the element in the sequence. a symbol different from any element in R.p. If (f0 .p. when the order of g(t) is non-negative. For example. See below and Section 4.L. For a f. f2 .s. 
in particular to the field of rational numbers Q and to the field of complex numbers C.. . . can be easily extended to every field F of 0 characteristic. the term t5 = 1 · t5 simply denotes that the element in position 5 (starting from 0) in the sequence is the number 1.s.

s. 1 = 1 + 0t + 0t2 + 0t3 + · · ·.f (t)g(t) has the term of degree k1 +k2 with coefficient gebraic structures.) k ∞ are 1. In our formal approach. f1 = −1 and fk = 0. Here we have This clearly shows that the product is commutative f0 = 1. is invertible if and only if its order is 0. FORMAL POWER SERIES from zero.26 CHAPTER 3. It is a good and therefore g(t) = f (t) is well defined. the product The set F of f.s. 1 − t = 1 − t + 0t2 + 0t3 + · · ·.s. are inter= f (t)h(t) + g(t)h(t) preted as functions.e. with 0 ≤ k1 .s.p.p.. Explicitly. We conclude most common one. In fact. when t is a variable and f. different valid from a purely formal point of view.p. The previous reasoning also shows that. . ∀k > 1. + (f0 g2 + f1 g1 + f2 g0 )t2 + As a simple example.p.s. . and can determine the coefficients of g(t) by solving the ∞ the opposite series of f (t) = k=0 fk tk is the series infinite system of linear equations: ∞ k  −f (t) = k=0 (−fk )t . can be embedded into several al. and so it cannot be zero. Therefore the inverse f. we can easily prove that f (t) is invertible. ∞ ∞ ∞ 3. 1. let us compute the inverse of + (f0 g3 + f1 g2 + f2 g1 + f3 g0 )t3 + · · · the f. We are now going to define the fk1 gk2 = 0. Finally. +.  f0 g0 = 1  Let us now define the Cauchy product of f (t) by  f0 g1 + f1 g0 = 0 g(t):  f0 g2 + f1 g1 + f2 g0 = 0   ∞ ∞ ··· k k f (t)g(t) = = fk t gk t The system can be solved in a simple way. In The associative and commutative laws directly fol. On the other hand.that (F.fact. however. The usual notation for this j=0 k=0 fact is:     1 ∞ k ∞ k = 1 + t + t 2 + t3 + · · · . If f (t) and g(t) are two f. in general.s. the the explicit expression for the Cauchy product. Accordf (t)g(t) = f0 g0 + (f0 g1 + f1 g0 )t + ing to standard terminology. we identity is the f. if f (t) is an invertible element in F. f0 = 0. The system and it is a simple matter to prove that the identity is becomes:   g0 = 1 the f. i. This means fk1 = 0 and gk2 = 0. it immediately follows that F f (t) ∈ F 0 . we should have f (t)f (t)−1 = k=0 k=0 k=0 1 and therefore ord(f (t)) = 0. we obtain:  = fj gk−j  tk f1 f2 f2 −1 j=0 k=0 g0 = f0 g1 = − 2 g2 = − 1 − 2 · · · 3 f0 f0 f0 Because of the form of the tk coefficient. k2 < ∞. the elements of F 0 are called the units of the integrity domain.2 The basic algebraic structure .. this is also −1 called the convolution of f (t) and g(t).p. 1−t   tk + tk = = fj hk−j gj hk−j It is well-known that this identity is only valid for j=0 j=0 k=0 k=0 −1 < t < 1. From low from the analogous properties in the field F. cept of sum and (Cauchy) product of series. We conidea to write down explicitly the first terms of the clude stating the result just obtained: a f. then we can suppose that ord(f (t)) = k1 and ord(g(t)) = k2 . . Cauchy product: F 0 is also called the set of invertible f.p. 2. if From this definition.s. f (t) = k=0 fk tk and g(t) = k=0 gk tk . we are looking for  (fj + gj )hk−j  tk = = is 1 + t + t2 + t3 + · · ·.p. starting k=0 k=0 with the first equation and going on one equation   ∞ k after the other. f (t) = f0 + f1 t + f2 t2 + f3 t3 + · · · with is a commutative group with respect to the sum.p. which is related to the usual con. we can prove that F does not contain any these considerations are irrelevant and the identity is zero divisor. Because of that. we have:   ··· (f (t) + g(t))h(t) =   and we easily obtain that all the gj ’s (j = 0. 0 = 0 + 0t + 0t2 + 0t3 + · · ·.s. 
the sum of f (t) and g(t) is defined as: ord(f (t)g(t)) = ord(f (t)) + ord(g(t)) The order of the identity 1 is obviously 0. let g(t) = f (t)−1 so that f (t)g(t) = 1. The distributive   g1 − g 0 = 0 law is a consequence of the distributive law valid in  g2 − g 1 = 0 F. ·) is an integrity domain. therefore.p.s.p.s. Given ∞ ∞ we have: two f. f (t) + g(t) = fk tk + gk t k = (fk + gk )tk .

f (t)p ∈ F 0 if and only if f (t) ∈ F 0 . and if we denote by L the set of f. then a(t)/b(t) ∈ F and is uniquely determined by a(t). +.4. Therefore. We ∞ have a(t)/v(t) = k=0 dk tk ∈ F 0 . Let concept of a formal Laurent series.s. multiplication and division. b(t)) ∼ (f (t). subtraction. as considered above... our assert is proved.s. therefore.p.L. The only point we should formally ∞ prove is that every f. The most important operation is surely taking a power of a f. then we can write b(t) = tm v(t). if p ∈ N we can recursively define: f (t)0 f (t)p = 1 = f (t)f (t)p−1 if p = 0 if p > 0 and observe that ord(f (t)p ) = p ord(f (t)).L.p. we have that (L. n ∈ Z). = k=q   i+j=k where p = min(m. b(t)) ∈ L such that (f (t). ·) of f. It is now easy to see that l(t) is the inverse of the f. in F 0 has an inverse. g(t).s. thus characterizing the set of f. let us now suppose that b(t) ∈ F 0 . a(t) + b(t) = ak tk + bk t k = let p = min(ord(f (t)).L.3. d) ⇐⇒ ad = bc. So. +. and our proof is complete.e. +. is isomorphic with the field constructed in the described In the first section of this Chapter. ·) is indeed the smallest field containing (F. we can identify L and L and assert that (L. ∞ either a(t) ∈ F 0 or b(t) ∈ F 0 or both are invertible = (ak + bk )tk f. ord(g(t))) and let us define k=m k=n a(t). g(t)) ∈ L there ex∞ k b(t) = k=n bk t (m. it is not difficult to find out that these operations enjoy the usual properties of sum and product.s.L.3 Our aim is to show that the field (L. this is proved in the same way we proved that every f.s. we have a(t)b(t) = 1 and the proof is complete. b(t)/a(t) in the sense of f. are two f. This property will be important in our future developments... OPERATIONS ON FORMAL POWER SERIES 27 3. +. we begin ∞ of the concept of a f. +. and consequently ∞ k−m .. then the order of f (t)p becomes larger and larger and goes to ∞ when p → ∞. it is uniquely determined by a(t). by substituting the values obtained up to the moment. ·) containing (K. Because of this result. f (t)b(t) = g(t)a(t)) and at least ∞ ∞ one between a(t) and b(t) belongs to F 0 . If b(t) ∈ F 0 . +.L.p. in this case. ai bj  tk  3. ·).L.p. it is possible to consider other operations on F.. only a few of which can be extended to L.s.L.. b(t). a(t) = k=m ak tk = 0 has ∞ an inverse f. In fact we should have: am b−m = 1 am b−m+1 + am+1 b−m = 0 am b−m+2 + am+1 b−m+1 + am+2 b−m = 0 ··· By solving the first equation. when we will reduce many operations to . +. ·) is the smallest field containing the integrity domain (F. We can now show that (L.4 Operations power series on formal Besides the four basic operations: addition. b(t) or also by f (t). ·) is a field. b(t)) (i.L. we can ists a pair (a(t). However. l(t) = k=0 dk t construction.s. obviously.s..L. b(t) as f (t) = tp a(t) and g(t) = tp b(t). the set F with the operations + and · defined as the extension of + and · in K is the field we are searching for. and the field of rational functions is built from the integrity domain of the polynomials.. ·) the smallest field (F. where v(t) ∈ F 0 . As we did for f. we find b−m = a−1 .s. We now have: k=p Formal Laurent Series a(t)b(t) = ∞ ∞ k=m ak t k ∞ k=n bk t k b(t)f (t) = b(t)tp a(t) = tp b(t)a(t) = g(t)a(t) = and this shows that (a(t). we introduced the way starting with the integrity domain of f. This is just the way in which the field Q of rational numbers is constructed from the integrity domain Z of integers numbers. on the other hand.s. From now on. 
m then the system can be solved one equation after the other. Since am b−m is the coefficient of t0 . g(t)). From Algebra we know that given an integrity domain (K. so it is now obvious that the correspondence is also an isomorphism between (L. +. +. ·).s.s. g(t)) ∼ define the sum and the Cauchy product: (a(t).p. ·) and (L. if a(t) = k=m ak tk and by showing that for every (f (t). the set L will be ignored and we will always refer to L as the field of f.s.s.p. by let us consider the f. +.. This shows that the correspondence is a 1-1 correspondence between L and L preserving the inverse. b(t) = k=−m bk tk .p. ·). n) and q = m + n.p. b) ∼ (c. in an algebraic way. if f (t) ∈ F 0 . ·) can be built in the following way: let us define an equivalence relation ∼ on the set K × K: (a. In fact. if we now set F = K × K/ ∼. as an extension L = F × F be the set of pairs of f.s. +.s.s.

28 infinite sums involving the powers f (t)p with p ∈ N. If f (t) ∈ F 0 , i.e., ord(f (t)) > 0, these sums involve elements of larger and larger order, and therefore for every index k we can determine the coefficient of tk by only a finite number of terms. This assures that our definitions will be good definitions. We wish also to observe that taking a positive integer power can be easily extended to L; in this case, when ord(f (t)) < 0, ord(f (t)p ) decreases, but remains always finite. In particular, for g(t) = f (t)−1 , g(t)p = f (t)−p , and powers can be extended to all integers p ∈ Z. When the exponent p is a real or complex number whatsoever, we should restrict f (t)p to the case f (t) ∈ F 0 ; in fact, if f (t) = tm g(t), we would have: f (t)p = (tm g(t))p = tmp g(t)p ; however, tmp is an expression without any mathematical sense. Instead, if f (t) ∈ F 0 , let us write f (t) = f0 + v(t), with ord(v(t)) > 0. For v(t) = v(t)/f0 , we have by Newton’s rule: f (t) = (f0 +v(t)) =
p p p f0 (1+v(t))p

CHAPTER 3. FORMAL POWER SERIES By applying well-known rules of the exponential and logarithmic functions, we can easily define the corresponding operations for f.p.s., which however, as will be apparent, cannot be extended to f.L.s.. For the exponentiation we have for f (t) ∈ F 0 , f (t) = f0 + v(t): ef (t) = exp(f0 + v(t)) = ef0
∞ k=0

v(t)k . k!

Again, since v(t) ∈ F 0 , the order of v(t)k increases with k and the sums necessary to compute the coefficient of tk are always finite. The formula makes clear that exponentiation can be performed on every f (t) ∈ F, and when f (t) ∈ F 0 the factor ef0 is not present. For the logarithm, let us suppose f (t) ∈ F 0 , f (t) = f0 + v(t), v(t) = v(t)/f0 ; then we have: ln(f0 + v(t)) = = ln f0 + ln(1 + v(t)) = ln f0 +

=

p f0

∞ k=0

p v(t)k , k

(−1)k+1

k=1

v(t)k . k

which can be assumed as a definition. In the last p p expression, we can observe that: i) f0 ∈ C; ii) k is defined for every value of p, k being a non-negative integer; iii) v(t)k is well-defined by the considerations above and ord(v(t)k ) grows indefinitely, so that for every k the coefficient of tk is obtained by a finite sum. We can conclude that f (t)p is well-defined. Particular cases are p = −1 and p = 1/2. In the former case, f (t)−1 is the inverse of the f.p.s. f (t). We have already seen a method for computing f (t)−1 , but now we obtain the following formula: f (t)−1 = 1 f0
∞ k=0

In this case, for f (t) ∈ F 0 , we cannot define the logarithm, and this shows an asymmetry between exponential and logarithm. Another important operation is differentiation: Df (t) = d f (t) = dt
∞ k=1

kfk tk−1 = f ′ (t).

This operation can be performed on every f (t) ∈ L, and a very important observation is the following: Theorem 3.4.1 For every f (t) ∈ L, its derivative f ′ (t) does not contain any term in t−1 .

Proof: In fact, by the general rule, the term in t−1 should have been originated by the constant term For p = 1/2, we obtain a formula for the square root (i.e., the term in t0 ) in f (t), but the product by k reduces it to 0. of a f.p.s.:
k=0

−1 1 v(t)k = k f0

(−1)k v(t)k .

f (t)1/2

= =

f (t) = f0
∞ k=0

f0

∞ k=0

1/2 v(t)k = k

(−1)k−1 2k v(t)k . 4k (2k − 1) k

In Section 3.12, we will see how f (t)p can be obtained computationally without actually performing the powers v(t)k . We conclude by observing that this more general operation of taking the power p ∈ R cannot be extended to f.L.s.: in fact, we would have smaller and smaller terms tk (k → −∞) and therefore the resulting expression cannot be considered an actual f.L.s., which requires a term with smallest degree.

This fact will be the basis for very important results on the theory of f.p.s. and f.L.s. (see Section 3.8). Another operation is integration; because indefinite integration leaves a constant term undefined, we prefer to introduce and use only definite integration; for f (t) ∈ F this is defined as:
t

f (τ )dτ =
0

∞ k=0

t

fk
0

τ k dτ =

∞ k=0

fk k+1 t . k+1

Our purely formal approach allows us to exchange the integration and summation signs; in general, as we know, this is only possible when the convergence is t uniform. By this definition, 0 f (τ )dτ never belongs to F 0 . Integration can be extended to f.L.s. with an

3.6. COEFFICIENT EXTRACTION obvious exception: because integration is the inverse operation of differentiation, we cannot apply integration to a f.L.s. containing a term in t−1 . Formally, from the definition above, such a term would imply a division by 0, and this is not allowed. In all the other cases, integration does not create any problem.

29 position ◦; composition is always associative and therefore (F 1 , ◦) is a group if we prove that every f (t) ∈ F 1 has a left (or right) inverse, because the theory assures that the other inverse exists and coincides with the previously found inverse. Let f (t) = f1 t+f2 t2 +f3 t3 +· · · and g(t) = g1 t+g2 t2 +g3 t3 +· · ·; we have: f (g(t)) = f1 (g1 t + g2 t2 + g3 t3 + · · ·) + 2 + f2 (g1 t2 + 2g1 g2 t3 + · · ·) +
3 + f3 (g1 t3 + · · ·) + · · · = 2 = f1 g1 t + (f1 g2 + f2 g1 )t2 +

3.5

Composition

A last operation on f.p.s. is so important that we dedicate to it a complete section. The operation is the composition of two f.p.s.. Let f (t) ∈ F and g(t) ∈ F 0 , then we define the “composition” of f (t) by g(t) as the f.p.s: f (g(t)) =

3 + (f1 g3 + 2f2 g1 g2 + f3 g1 )t3 + · · · = t

fk g(t)k .

k=0

This definition justifies the fact that g(t) cannot belong to F0 ; in fact, otherwise, infinite sums were involved in the computation of f (g(t)). In connection with the composition of f.p.s., we will use the following notation: f (g(t)) = f (y) y = g(t) which is intended to mean the result of substituting g(t) to the indeterminate y in the f.p.s. f (y). The indeterminate y is a dummy symbol; it should be different from t in order not to create any ambiguity, but it can be substituted by any other symbol. Because every f (t) ∈ F 0 is characterized by the fact that g(0) = 0, we will always understand that, in the notation above, the f.p.s. g(t) is such that g(0) = 0. The definition can be extended to every f (t) ∈ L, but the function g(t) has always to be such that ord(g(t)) > 0, otherwise the definition would imply infinite sums, which we avoid because, by our formal approach, we do not consider any convergence criterion. The f.p.s. in F 1 have a particular relevance for composition. They are called quasi-units or delta series. First of all, we wish to observe that the f.p.s. t ∈ F 1 acts as an identity for composition. In fact y y = g(t) = g(t) and f (y) y = t = f (t), and therefore t is a left and right identity. As a second fact, we show that a f.p.s. f (t) has an inverse with respect to composition if and only if f (t) ∈ F 1 . Note that g(t) is the inverse of f (t) if and only if f (g(t)) = t and g(f (t)) = t. From this, we deduce immediately that f (t) ∈ F 0 and g(t) ∈ F 0 . On the other hand, it is clear that ord(f (g(t))) = ord(f (t))ord(g(t)) by our initial definition, and since ord(t) = 1 and ord(f (t)) > 0, ord(g(t)) > 0, we must have ord(f (t)) = ord(g(t)) = 1. Let us now come to the main part of the proof and consider the set F 1 with the operation of com-

The first equation gives g1 = 1/f1 ; we can substitute this value in the second equation and obtain a value for g2 ; the two values for g1 and g2 can be substituted in the third equation and obtain a value for g3 . Continuing in this way, we obtain the value of all the coefficients of g(t), and therefore g(t) is determined in a unique way. In fact, we observe that, by construction, in the kth equation, gk appears in linear form and its coefficient is always f1 . Being f1 = 0, gk is unique even if the other gr (r < k) appear with powers. The f.p.s. g(t) such that f (g(t)) = t, and therefore such that g(f (t)) = t as well, is called the compositional inverse of f (t). In the literature, it is usually denoted by f (t) or f [−1] (t); we will adopt the first notation. Obviously, f (t) = f (t), and sometimes f (t) is also called the reverse of f (t). Given f (t) ∈ F 1 , the determination of its compositional inverse is one of the most interesting problems in the theory of f.p.s. or f.L.s.; it was solved by Lagrange and we will discuss it in the following sections. Note that, in principle, the gk ’s can be computed by solving the system above; this, however, is too complicated and nobody will follow that way, unless for exercising.

In order to determine g(t) we have to solve the system:   f1 g1 = 1   2 f1 g2 + f2 g1 = 0 3  f1 g3 + 2f2 g1 g2 + f3 g1 = 0   ···

3.6

Coefficient extraction

If f (t) ∈ L, or in particular f (t) ∈ F, the notation [tn ]f (t) indicates the extraction of the coefficient of tn from f (t), and therefore we have [tn ]f (t) = fn . In this sense, [tn ] can be seen as a mapping: [tn ] : L → R or [tn ] : L → C, according to what is the field underlying the set L or F. Because of that, [tn ]

30

CHAPTER 3. FORMAL POWER SERIES

(linearity) (shifting) (differentiation) (convolution) (composition)

[tn ](αf (t) + βg(t)) [tn ]tf (t) [tn ]f ′ (t) [tn ]f (t)g(t) [tn ]f (g(t))

= α[tn ]f (t) + β[tn ]g(t) = = = = [tn−1 ]f (t) (n + 1)[tn+1 ]f (t)
n

(K1) (K2) (K3) (K4) (K5)

[tk ]f (t)[tn−k ]g(t)
k=0 ∞

([y k ]f (y))[tn ]g(t)k

k=0

Table 3.1: The rules for coefficient extraction is called an operator and exactly the “coefficient of ” operator or, more simply, the coefficient operator. In Table 3.1 we state formally the main properties of this operator, by collecting what we said in the previous sections. We observe that α, β ∈ R or α, β ∈ C are any constants; the use of the indeterminate y is only necessary not to confuse the action on different f.p.s.; because g(0) = 0 in composition, the last sum is actually finite. Some points require more lengthy comments. The property of shifting can be easily generalized to [tn ]tk f (t) = [tn−k ]f (t) and also to negative powers: [tn ]f (t)/tk = [tn+k ]f (t). These rules are very important and are often applied in the theory of f.p.s. and f.L.s.. In the former case, some care should be exercised to see whether the properties remain in the realm of F or go beyond it, invading the domain of L, which can be not always correct. The property of differentiation for n = 1 gives [t−1 ]f ′ (t) = 0, a situation we already noticed. The operator [t−1 ] is also called the residue and is noted as “res”; so, for example, people write resf ′ (t) = 0 and some authors use the notation res tn+1 f (t) for [tn ]f (t). We will have many occasions to apply rules (K1) ÷ (K5) of coefficient extraction. However, just to give a meaningful example, let us find the coefficient of tn in the series expansion of (1 + αt)r , when α and r are two real numbers whatsoever. Rule (K3) can 1 be written in the form [tn ]f (t) = n [tn−1 ]f ′ (t) and we can successively apply this form to our case: [tn ](1 + αt)r = rα n−1 [t ](1 + αt)r−1 = n rα (r − 1)α n−2 = [t ](1 + αt)r−2 = · · · = n n−1 (r − n + 1)α 0 rα (r − 1)α ··· [t ](1 + αt)r−n = = n n−1 1 r 0 = αn [t ](1 + αt)r−n . n conclude with the so-called Newton’s rule: [tn ](1 + αt)r = r n α n

which is one of the most frequently used results in coefficient extraction. Let us remark explicitly that when r = −1 (the geometric series) we have: [tn ] 1 1 + αt = = −1 n α = n 1+n−1 (−1)n αn = (−α)n . n

A simple, but important use of Newton’s rule concerns the extraction of the coefficient of tn from the inverse of a trinomial at2 + bt + c, in the case it is reducible, i.e., it can be written (1 + αt)(1 + βt); obviously, we can always reduce the constant c to 1; by the linearity rule, it can be taken outside the “coefficient of” operator. Therefore, our aim is to compute: [tn ] 1 (1 + αt)(1 + βt)

with α = β, otherwise Newton’s rule would be immediately applicable. The problem can be solved by using the technique of partial fraction expansion. We look for two constants A and B such that: 1 A B = + = (1 + αt)(1 + βt) 1 + αt 1 + βt A + Aβt + B + Bαt ; = (1 + αt)(1 + βt) if two such constants exist, the numerator in the first expression should equal the numerator in the last one, independently of t, or, if one so prefers, for every value of t. Therefore, the term A + B should be equal to 1, while the term (Aβ + Bα)t should always be 0. The values for A and B are therefore the solution of the linear system: A+B =1 Aβ + Bα = 0

We now observe that [t0 ](1 + αt)r−n = 1 because of our observations on f.p.s. operations. Therefore, we

the principal branch of arctan is negative. However. we have 0 < arctan − 4c − b2 /2 < π/2. As a consequence. so that the first k positions are 0. √ and we should set θ = π + arctan − 4c − b2 /2. We can try to give n = 12k + 4 σn = 3 = 9 × 729k n = 12k + 5 σn = 0 it a better form. .1 = 0.7. For a reason which will be apparent only later.3.7 Matrix representation we can set α = ρeiθ and β = ρe−iθ . f2 . The trinomial is irreducible. the array . which is A = α/(α−β) and B = −β/(α−β). when b > 0. we √ 12k can apply the formula above in the form: n = 12k σn = 3 = 729k √ 12k+2 αn+1 − β n+1 1 = 3 × 729k n = 12k + 1 σn = 3 = . we have: √ 4c − b2 θ = arctan +C −b where C = π if b > 0 and C = 0 if b < 0.k )n. Consequently: αn+1 − β n+1 = ρn+1 ei(n+1)θ − e−i(n+1)θ = = 2iρn+1 sin(n + 1)θ and therefore: √ 2( c)n+1 sin(n + 1)θ 1 √ = [t ] 1 + bt + ct2 4c − b2 n Let f (t) ∈ F 0 . Obviously: θ = arctan |∆| −b + kπ = arctan 2 2 √ 4c − b2 + kπ −b √ When b < 0. An interesting and non-trivial example is given by: σn 1 = = [tn ] 1 − 3t + 3t2 √ n+1 √ 2( 3) sin((n + 1) arctan( 3/3)) √ = = 3 √ (n + 1)π = 2( 3)n sin 6 Let us now consider a trinomial 1 + bt + ct2 for which ∆ = b2 − 4c < 0 and b = 0. We can now substitute these values in the expression above: [tn ] 1 = (1 + αt)(1 + βt) 1 α β = [tn ] − = α − β 1 + αt 1 + βt α β 1 [tn ] = − [tn ] = α−β 1 + αt 1 + βt αn+1 − β n+1 (−1)n = α−β 31 At this point we only have to find the value of θ. which is always different from 0. The system has therefore only one solution. MATRIX REPRESENTATION The discriminant of this system is α − β. in general. with the coefficients of f (t) we form the following infinite lower triangular matrix (or array) D = (dn. however. . Let us set α = −b + i |∆| /2. column k contains the coefficients of f (t) shifted down k positions. This implies 0 < arg(α) < π and we have: = −81 × 729k n = 12k + 7 σn = − 3 √ 12k+8 n = 12k + 8 σn = −2 3 = −162 × 729k |∆| |∆| b b √ 12k+10 α−β = − +i + +i = = −243 × 729k n = 12k + 9 σn = − 3 2 2 2 2 √ 12k+10 n = 12k + 10 σn = − 3 = −243 × 729k = i |∆| = i 4c − b2 n = 12k + 11 σn = 0 If θ = arg(α) and: ρ = |α| = √ b2 4c − b2 − = c 4 4 3. k ∈ N. because of our hypothesis α = β. in this order.k∈N : column 0 contains the coefficients f0 . f1 . a partial fraction expansion does not give These coefficients have the following values: a simple closed form for the coefficients. the resulting √ 12k+4 expression is not very appealing. . √ 12k+6 n = 12k + 6 σn = − 3 = −27 × 729k so α is always contained in the positive imaginary √ 12k+8 halfplane. This definition can be summarized in the formula dn. column 1 contains the same coefficients shifted down by one position and d0. but we can write: [tn ] = [tn ] 1− 1 = 1 + bt + ct2 √ 1 1− −b−i |∆| t 2 −b+i |∆| t 2 √ This time. [tn ] √ 12k+2 (1 − αt)(1 − βt) α−β n = 12k + 2 σn = 2 3 = 6 × 729k √ 12k+4 = 9 × 729k n = 12k + 3 σn = 3 Since α and β are complex numbers. ∀n. and this is the correct value for θ.k = fn−k .

. by definition we have: fn.} where the number of leading 0’s is just k. We have shown that we can associate every f. 0. . 1) and (g(t).   2 2 4  0 f4 2f1 f3 + f2 3f1 f2 f1 · · ·    .    = (1. f (t)/t) and we are interested to see how the matrix (1. . Proceeding in the same way. f (g(t))/t) we have f (g(t)) = t. 1) = (f (t)g(t). . ◦) is isomorphic to (B. f (t) ∈ F 1 and let us build an infinite lower triangular matrix in the following way: column k contains the coefficients of f (t)k in their proper order:   1 0 0 0 0 ···  0 f1 0 0 0 ···    2  0 f2 f1 0 0 ···    3  0 f3 2f1 f2 f1 0 ··· . (1. g2 . f (t)/t) is just the matrix corresponding to the compositional inverse of f (t). .p. . .k = [tn ]f (t)k and therefore the generic element dn. This reveals a connection between the Cauchy product and the composition: in the Chapter on Riordan Arrays we will explore more deeply this connection. fn−1 . Besides. column k in (1. 1) is {0. Lagrange .k of the product is: dn. 1) (let us denote by A the set of such arrays) in such a way that (F 0 . g(t)/t) · (1. because of the isomorphism we have seen. f (t)/t) = (1. and the composition of f. 1) and this shows that there exists a group isomorphism between (F 0 . . . g(t)/t) · (1. . . . 1):  f0 0 0  f1 f0 0   f2 f1 f0  D = (f (t). . ·) and the set of matrices (f (t). When k = 1 we have n−1 ∞ dn. As we have already said. g(t)/t) is such that (1. t/t) = (1. .k = ∞ j=0 ∞ j=0 [tn ]g(t)j [y j ]f (y)k = If (f (t). g(t) ∈ F 1 to a matrix (1. . . . 1) is. f (t) and g(t). 0 0 0 0 f0 . with f (t) ∈ F 1 . and column k in (g(t).s. . ∀r < 0. . f (g(t))/t). . the inverse matrix for (1.1 = j=0 fn−1−j gj . and we can conclude: (1. we have dn.k is. we see that column k contains the coefficients of the convolution f (t)g(t) shifted down k positions.32 D will be denoted by (f (t). and this is sufficient to prove that the correspondence f (t) ↔ (1.p. The row n in (f (t).k = = ∞ j=0 n. Therefore we conclude: (f (t). .p. 1) are the matrices corresponding to the two f. In other words. it corresponds to the identity matrix) and (f (t)−1 . 1). fn−2 . . . FORMAL POWER SERIES If fn. in F 1 .      .p. the inverse matrix (1. . 1) with the row-by-column product. . CHAPTER 3. g(t) ∈ F 1 . ·). Let us now consider a f. fn−j gj−k if we conventionally set gr = 0. 1). ·) is isomorphic to (A.p. for the moment. 3. we wish to see how this observation yields to a computational method for evaluating the compositional inverse of a f.}. . . . . the identity matrix. because the sums involved in the product are actually finite. becomes again the rowby-column product. . g1 . When n ∞ k = 0.k 0 0 0 f0 f1 . Given an infinite. and it is immediate to see what its generic element dn.8 Lagrange inversion rem theo- The matrix will be denoted by (1. g(t)/t) · (1. .s. g(t)/t)·(1. we are interested in finding out what is the matrix obtained by multiplying the two matrices with the usual rowby-column product. This product will be denoted by (f (t). . and therefore column 0 contains the coefficients of the convolution f (t)g(t). {fn . .p. the identity t ∈ F 1 corresponds to the matrix (1. ·). .s. 1). Therefore we have: dn. 0. by definition. lower triangular arrays is straight-forward. 1) =  f3 f2 f1   f4 f3 f2  . . 1) · (g(t).k = ∞ j=0 [tn ] [y j ]f (t)k g(t)j = [tn ]f (g(t))k .s..j fj. and this j=0 fn−j gj−1 = n−1 is the coefficient of t in the convolution f (t)g(t).k∈N gn. . and the Cauchy product becomes the row-by-column product. Clearly. In particular. ··· ··· ··· ··· ··· . 
we can associate every f. 1) · (g(t).s.s. . and since the product results in (1. 1) is the inverse of (f (t). . f (t)/t). f (t)/t) is composed when f (t). In other words. f (t)/t). f (t)/t) is a group isomorphism. . f (t)/t) = (1. Row-by-column product is surely the basic operation on matrices and its extension to infinite. f (t)/t) is the kth power of the composition f (g(t)).0 = j=0 fn−j gj = j=0 fn−j gj . 1). g(t)/t) (let us call B the set of such matrices) in such a way that (F 1 .. 1) is the identity (in fact. lower triangular array of the form (1. f (t) ∈ F 0 to a particular matrix (f (t).

the constant term in f ′ (t)(t/f (t)) cient [t ]F (w(t)). there is another way for applying the the compositional inverse of f (t) is unique.k we have: result. n f (t) ∞ [tn ]F (w(t)) = [tn ] Fk w(t)k = In fact. the residue of its derivative should be zero.s. where φ(t) ∈ F 0 . (f (t)/t)−1 = (f1 + f2 t + f3 t2 + · · ·)−1 = generalized to every F (t) ∈ L. As a matter of fact. f (t)/t). the sum ∞ ∞ is actually finite and is the term of the convolution k n k Fk [tn−k ]φ(t)n = Fk [t ]w(t) = = appearing in the last formula. Let us now distinguish n k=0 k=0 between the case k = n and k = n. For the coefficient of tn in F (w(t)) we n k n t ′ k−1 have: = [t ] tf (t)f (t) . g(t)/t) should be and then verify that it is actually so. [tn ]f (t)k = [tn−k ] (dn. we have: t 1 1 [tn ]w(t) = [tn−1 ] = [tn−1 ]φ(t)n . Let F (t) ∈ F and let us consider the com∞ n k t position F (w(t)) where w = w(t) is. the column 1 gives us the value of its coefficients. The other columns give us the coefficients k Because f (t)/t ∈ F 0 . and we wish to find the D · (1. n f (t) n d j[y j ]f (y)k = [y j−1 ] f (y)k = k[y j ]yf ′ (y)f (y)k−1 . we state what the form of (1.3. 1) and therefore D is the inverse of (1. with φ(w) ∈ F 0 . because we already know that Many times. f (t)/t) is: f. n.s. f (t)k−n is a f. Suppose we have a functional equation w = generic element vn.k∈N be defined as: dn. If f (t) is the compositional inverse of f (t). This proves that D · (1. = = 1.L. = [t0 ]f ′ (t) n f (t) 0 in fact. except for the coeffi0 −1 f1 +· · ·. n k − n dt 3. by the formula for dn. being Note that [t ]F (w(t)) = F0 .1 = 1 n−1 [t ] n t f (t) n and this is the celebrated Lagrange Inversion Formula (LIF).9. dy The LIF can also give us the coefficients of the powers w(t)k . We follow the more recent proof of Stanley. f ′ (t) = f1 + 2f2 t + 3f3 t2 + · · · and. When k = n: vn. the power (t/f (t))k = of the powers f (t) . However. inverse of (1.k = = k n n [t ]t tf (t)k−n−1 f ′ (t) = n k −1 1 d [t ] f (t)k−n = 0. is f1 /f1 = 1. 1).k = k n−k [t ] n t f (t) n 33 in fact. n f (t) fore know that w(t) is uniquely determined and the j=0 LIF gives us: By the rule of differentiation for the coefficient of opn erator. but we can obtain a still more general Therefore. the functional equaj=0 tion can be written f (w(t)) = t. as we observed.k = [t ] n j=0 f (t) solution to the functional equation w = tφ(w). and this shows that ∞ n j n−j t w(t) is the compositional inverse of f (t). by finding the exact form of the matrix (1.k )n. for which we have: n (f (t)/t)−k is well-defined. Let D = (dn.k∈N = (1.k of the row-by-column product tφ(w).k = dn. we vn. and. When k = n we ∞ 1 n−1 have: = [t ] kFk tk−1 φ(t)n = n n n −1 ′ v = [t ]t tf (t) f (t) = . f (t)/t) = (1. g(t)/t) we only have to prove that n f (t) D · (1. g(t)/t).9 Some examples of the LIF We found that the number bn of binary trees with n nodes (and of other combinatorial objects .k we have: f n = [tn ]f (t) = dn. therefore. for vn.j [y j ]f (y)k = also have f (t) ∈ F 1 . f (t)/t). the j ′ k−1 n−j [y ]yf (y)f (y) = vn. SOME EXAMPLES OF THE LIF found a noteworthy formula for the coefficients of this compositional inverse. Indeed. and this formula can be f (t)/t ∈ F 0 . as before. in order to show that t k . The LIF. ∞ Clearly w(t) ∈ F 1 and if we set f (y) = y/φ(y).p. f (t)/t) = (1. which points out the purely formal aspects of Lagrange’s formula.n k=0 1 n−1 ′ t [t ]F (t)φ(t)n . We there= [t ] [y j ]f (y)k . we will prove something more. 
the factor k/n does not depend on j and k=0 can be taken out of the summation sign. w = w(t) satisfying this functional equation.k )n.

A non-empty p-ary tree can be decomposed in the following way: r ¨r ¨r ¨  rr r ¨¨  rr ¨  ¨ rrr ¨  ¨ r r t t t  Tpt  T2t  T1t . As before. This time we have a p degree equation. we can multiply n+1 by tn+1 the two members of the recurrence relation t and sum for n from 0 to infinity.s. several Computer Algebra Systems exist. t t t    3. we can add and subtract 1 = b0 t0 from the left hand member and can take t outside the summation sign in the right hand member: ∞ n=0 bn t n − 1 = t ∞ n=0 n bk bn−k k=0 tn . Now we have its form: 2! 3! 4! Cn = 2n 1 n+1 n It is a useful exercise to perform the necessary computations to show that f (w(t)) = t. under this name. 1 n n (n − 1)! n! bn = [tn ]w(t) = [tn−1 ](1 + t)2n = n Therefore.p. b(t) = sum is extended to all the p-uples (i1 ..s. .s. we obtain: b(t) − 1 = tb(t)2 .: (2n)! 2n 1 1 = = = n n−1 n (n − 1)!(n + 1)! ∞ 3 8 125 5 nn−1 n w(t) = t = t + t 2 + t3 + t4 + t +· · · . . which cannot be directly solved. so that w(t) ∈ F 1 which generalizes the formula for the Catalan numand wn = bn . i2 . .10 Formal power series and the computer When we are dealing with generating functions. The LIF gives: φ(t) = (1 + t)2 . FORMAL POWER SERIES Ti1 Ti2 · · · Tip . we recognize a convolution and substituting b(t) for the corresponding f. so that w(t) ∈ F 1 . Therefore we have: 1 1 nn−1 nn−1 wn = [tn−1 ]ent = = . n=1 n + 1 n!n! n+1 n As we said in the previous chapter. or more in general with formal power series of any kind.p. w(t) is the compositional inverse of nth Catalan number and. which have arity 0. it is often t4 t5 t3 f (t) = t/φ(t) = te−t = t − t2 + − + − · · · . we often have to perform numerical computations in order to verify some theoretical result or to experiment with actual cases. Nowadays. 1 (2n)! 1 2n n! 2 3 24 = = . bn is called the As noticed. also valid in the case n = 0 when C0 = 1. In the r. if we multiply the recurrence relation by that i1 + i2 + · · · + ip = n. We find: ∞ bn+1 tn+1 = ∞ n tn+1 k=0 bk bn−k . n=0 n=0 T (t) − 1 = tT (t)p . ∀n > 0.h. let us therefore set w = w(t) = b(t) − 1. ip ) such k=0 bk bn−k . for example up to the term of degree 5 or 6. The previous relation becomes bers.34 CHAPTER 3. except for leaves. w = t(1 + w)2 and we see that the LIF can be applied Finally.p. . and verify that w(f (t)) = t as well. . we find: and sum for n from 0 to infinity. which offer the possibility of . if we set w(t) = T (t)−1. In these and other circumstances the computer can help very much with its speed and precision. . n We are interested in evaluating bn = [t ]b(t). the solution we are looking for is the f. In the same way we can compute the number of p-ary trees with n nodes.s. A p-ary tree is a tree in which all the nodes have arity p. ∞ k k=0 bk t . let us find the solution of the functional (in the form relative to the functional equation) with equation w = tew . However. where the as well) satisfies the recurrence relation: bn+1 = which proves that Tn+1 = n Let us consider the f. denoted by Cn . we have: w = t(1 + w)p and the LIF gives: Tn = = = [tn ]w(t) = 1 n−1 [t ](1 + t)pn = n 1 (pn)! pn 1 = = n n−1 n (n − 1)!((p − 1)n + 1)! pn 1 (p − 1)n + 1 n Since b0 = 1.

e. n′ ) = reduce(mm′ . at the cost of introducing some 0 components. These routines can also be realized in a high level programming language (such as C or JAVA). So we must have m ∈ Z. the adicity acts of the square root · is one. The components a0 . n) is a reduced rational number. In other words. An integer number has a variable length internal representation and special routines are used to perform the basic operations. most operations on formal power series preserve the number of significant components. The simplest way to represent a formal power series is surely by means of a vector. n) using Euclid’s algorithm and then dividing both m and n by p. and the operands are a and 3. THE INTERNAL REPRESENTATION OF EXPRESSIONS actually working with formal power series. an are usually real numbers. . in a + 3. n)−1 = (n. it can be executed giving a numerical result.3. it is not difficult to realize rational arithmetic on a computer. it is surely not the best way to represent power series and becomes completely useless when. For example. Obviously. however. a1 . a Computer Algebra System is not always accessible or. . which can perform quite easily the basic operations on formal power series. represented with the precision allowed by the particular computer. . programmable pocket computers are now available.11 The internal representation of expressions ak tk = (a0 . because it acts on a single term. . a rational. . . In general. . for example. The aim of the present and of the following sections is to describe the main algorithms for dealing with formal power series. increase the number of significant elements. Differentiation decreases by one the number of useful components. if it is a natural. in which the kth component (starting from 0) is the coefficient of tk in the power series. one may desire to use less sophisticated tools.. the computer memory only can store a finite number of components. An expression can always be transposed into a “tree”. In most combinatorial applications. m) (m.e.11. n′ ) = reduce(mn′ + m′ n. The operations on rational numbers are defined in the following way: (m. the number of operands on which it acts. say. our representation only can deal with purely numerical formal power series. n ∈ N and it is a good idea to keep m and n coprime. They can be used to program a computer or to simply understand how an existing system actually works. so that there is little danger that a number of successive operations could reduce a finite representation to a meaningless sequence of numbers. an are rational numbers and. The dimension of m and n is limited by the internal representation of integer numbers. . on the contrary. so an upper bound n0 is usually given to the length of vectors and to represent power series. It is sufficient to represent a rational number as a couple (m. Computer Algebra Systems use a more sophisticated internal representation. n) + (m′ . For example. that is it can execute all the operators . if an operator acts on numerical operands. Obviously. an integer. or rational. because it √ on its two terms. n). .. . Each operator has as many branches as its adicity is and a simple visit to the tree can perform its evaluation. The aim of the present section is to give a rough idea of how an expression can be represented in the computer memory. the coefficients depend on some formal parameter. integration and multiplication by tr . containing formal parameters as well. Because of that. i. the operators are √ + and ·. 
the internal nodes of which correspond to operators and whose leaves correspond to operands. The use of these tools is recommended because they can solve a doubt in a few seconds. np ) (p ∈ N) The simple representation of a formal power series by a vector of real. in certain circumstances. whose intended meaning is just m/n. In fact. . n) × (m′ . but they can slow down too much execution time if realized on a programmable pocket computer. The simple tree for the previous expression is given in Figure 3. as a. However. nn′ ) (m. the result is a formal expression. In other words we have: reprn0 ∞ k=0 35 provided that (m. Computer Algebra Systems usually realize an indefinite precision integer arithmetic. a1 . can clarify difficult theoretical points and can give useful hints whenever we are faced with particular problems. i. power series are simply a particular case of a general mathematical expression. with some extra effort. . an ) (n ≤ n0 ) Fortunately. In order to avoid this last problem. 3. a1 . Operands can be numbers (and it is important that the nature of the number be specified. an expression consists in operators and √ operands. which may perhaps be simplified but cannot be evaluated to a numerical result. Every operator has its own arity or adicity. n)p = (mp . However. a0 .1. components will be used in the next sections to explain the main algorithms for formal power series operations. But if any of its operands is a formal parameter. The adicity of the sum + is two. a real or a complex number) or can be a formal parameter. This can be performed by a routine reduce computing p = gcd(m. nn′ ) (m.

where we have drawn a leaf aj . . we are always reduced to compute (1 + g(t))α and since: (1 + g(t))α = ∞ k=0 α g(t)k k (3. . Therefore. . bm ) = (c0 . . . First of all we observe that whenever α ∈ N. In a similar way the Cauchy product is defined: d √ · (a0 . . let us consider the operation of rising a formal power series to a power α ∈ R. . . the exponents involved in the right hand member of (3. f (t)α can be reduced to that case (1 + g(t))α . . The reader can develop computer programs for dealing with this representation of formal power series. . . m + pA ). an ) n ≤ n0 . Before discussing division. We point out that the time complexity for the sum is O(r) and the time complexity for the product is O(r2 ). . In any case. In fact. . Therefore. which is simpler between (a + 1)(a + 2) and a2 +3a+2? It is easily seen that there are occasions in which either expression can be considered “simpler”. k=0 This gives a straight-forward method for performThe sum of the formal power series is defined in an ing (1 + g(t))α . This includes the inversion of a power series (α = −1) and therefore division as well. f (t)α only can be / performed if f (t) ∈ F0 . most Computer Algebra Systems provide both a general “simplification” routine together with a series of more specific programs performing some specific simplifying tasks. . when α ∈ N. . a2 . powers can be realreprn0 ak tk = (a0 . c1 .12. . 3. Here r is defined as r = min(n + pB . . at least conceptually. b1 . In fact we have: f (t) = fh th + fh+1 th+1 + fh+2 th+2 + · · · = fh+1 fh+2 2 = fh th 1 + t+ t + ··· fh fh α k +        a  d d  3  Figure 3. which is called the tree or list representation of the expression. In the computer memory an expression is represented by its tree. . . A clever and dynamic use of the storage solves every problem without increasing the complexity of the corresponding programs. . we simply have a more complex tree representing the expression for aj . however. ized as successive multiplications or Cauchy products. where g(t) ∈ / F0 .36 CHAPTER 3.2. cr ) each executed in time O(r2 ). . .1) if the coefficients in g(t) are rational numbers. formal power series: Whatever α ∈ R is. an ) + (b0 . Fortunately. c1 . a1 . and r = min(n. In that case we have: f (t)α = f0 + f1 t + f2 t2 + f3 t3 + · · · α = α α = f0 1 + f2 f3 f1 t + t2 + t3 + · · · f0 f0 f0 . b1 .12. .12 Basic operations of formal power series . This representation is very convenient and when some coefficients a1 . . . and therefore: α f (t)α = fh tαh 1 + fh+1 fh+2 2 t+ t + ··· fh fh . such as expanding parenthetized expressions or collecting like factors.1) are all positive ∞ integer numbers. For example. . . . . It should be clear that another important point of this approach is that no limitation is given to the length of expressions. Subtraction is similar to addition and does not require any particular comment. a1 . an ) × (b0 . . since it requires r products (a0 . FORMAL POWER SERIES where ci = ai + bi for every 0 ≤ i ≤ r. On the contrary. as We are now considering the vector representation of described in the previous section. α note that in this case if f0 = 1. a1 . . if pA is the first index for which apA = 0 and pB is the first index for which bpB = 0. . an depend on a formal parameter p nothing is changed. provided α ∈ Q. These considerations are to be remembered if the operation is realized in some special environment. but it is easily seen to take a time obvious way: in the order of O(r3 ). 
Simplification is a rather complicated matter and it is not quite clear what a “simple” expression is. cr ) where ck = j=0 aj bk−j for every 0 ≤ k ≤ r. The representation of the n formal power series k=0 ak tk is shown in Figure 3. also the coefficients in (1 + g(t))α are.1: The tree for a simple expression when only numerical operands are attached to them. m). usually f0 is not rational. bm ) = (c0 .

and therefore the whole procedure works in time O(r2 ). in order to have the same indices as in the left hand member).e. or. The inverse of a series. . n k=1 i. an ) and on formal power series. where a(t) is any formal power series with a0 = 1. The evaluation of hk requires a number of operations in the order O(k). .13.3. hr (r = n. h2 . In the present section we wish to use it to perform the (natural) logarithm and (h1 . is obtained by setting α = −1. hn−1 ): the exponentiation of a series.: ∞ n−1 1 (α + 1)k ln(1 + g(t)) = − 1 ak hn−k g(t)k = αan + n k k=1 k=1 Logarithm and exponential .e. It is worth noting that the previous formula becomes: k−1 k nhn + k=1 ak (n − k)hn−k = αnan + αkak hn−k k=1 3. . As a simple example.    . we obtain h′ (t) = αa(t)α−1 a′ (t). an )). Miller has devised an algorithm allowing to perform (1 + g(t))α in time O(r2 ).. C.2: The tree for a formal power series J. . . LOGARITHM AND EXPONENTIAL 37 +     ∗   d d d + d       d     d     0 a0 ∗ + t   .13 (in the last sum we performed the change of variable k → k − 1. if n is the number of terms in (a1 . . h2 . there is a direct way to perform this operation. We now have an expres. n−1 n−1 k=0 ak (n − k)hn−k = α (k + 1)ak+1 hn−k−1 k=0 The computation is now straight-forward. .     . . i. . .      . a(t)h′ (t) = αh(t)a′ (t).The idea of Miller can be applied to other operations sion for hn only depending on (a1 . In fact. a2 . . the reader can show that the coefficients in (1 − t)−1 are all 1. by extracting the coefficient of tn−1 we find: n−1 n−1 hk = −ak − aj hk−j = − aj hk−j (h0 = 1) j=1 j=1 We now isolate the term with k = 0 in the left hand member and the term having k = n − 1 in the right and can be used to prove properties of the inverse of hand member (a0 = 1 by hypothesis): a power series. as desired. . (1+g(t))−1 . Therefore. As we ((α + 1)k − n) ak hn−k = hn = αan + know. a2 . . and then we successively compute h1 .   . P. We begin by setting h0 = 1. let us write h(t) = a(t)α . . 1 a1 ∗ t   +     d       d     d a2 t2   ∗ O(tn )          an tn   Figure 3. by multiplying everything by a(t). By differentiating. . Let us begin with the n−1 1 logarithm and try to compute ln(1 + g(t)). .

We conclude this section by sketching the obvious algorithms to compute differentiation and integration of a formal power series f (t) = f0 +f1 t+f2 t2 +f3 t3 + · · ·. . gk ) and to (h1 .. by differentiating we obtain h′ (t) = g ′ (t)/(1 + g(t)). . We extract the coefficient of tk−1 : k−1 k khk = j=0 (j + 1)gj+1 hk−j−1 = j=1 jgj hk−j This formula allows us to compute hk in terms of (g1 . . h2 . we have to resort to the defining formula: f (g(t)) = ∞ fk g(t)k k=0 This requires the successive computation of the integer powers of g(t). . Conversely. . FORMAL POWER SERIES number of significant terms in f (t) and g(t). h2 . . but we no longer have rational coefficients when g(t) ∈ Q[[t]]. . h2 . h1 . hr . if r is the minimal . hr if r is the number of significant terms in g(t). In this way we are reduced to the previous case. To compute f (g(t)). hk−1 ): and h0 = 0. . A similar technique can be applied to the computation of exp(g(t)) provided that g(t) ∈ F0 . . By differentiating the identity h(t) = exp(g(t)) we obtain h′ (t) = g ′ (t) exp(g(t)) = g ′ (t)h(t). if we set h(t) = ln(1 + g(t)). . we have: hk = (k + 1)fk+1 and therefore the number of significant terms is ret duced by 1. g0 = 0 by hypothesis. or h′ (t) = g ′ (t) − h′ (t)g(t). If / g(t) ∈ F0 . . if h(t) = 0 f (τ )dτ . . We can now extract the coefficient of tk−1 and obtain: k−1 CHAPTER 3. which can be performed by repeated applications of the Cauchy product. A program performing exponentiation can be easily written by defining h0 = 1 and successively evaluating h1 . when g(t) ∈ / F0 . if r is the number of significant terms in g(t). consequently the number of significant k−1 terms is increased by 1. If h(t) = f ′ (t). . g2 .38 and this formula only requires a series of successive products. g(t) = g0 + g1 t + g2 t2 + · · ·. g2 . gk ) and (h0 = 1. In fact. hk−1 ). The execution time is in the order of O(r3 ). . . the procedure needs a time in the order of O(r3 ). . a similar trick does not work for series composition. . As for the operation of rising to a power. Time complexity is obviously O(r2 ). i. The total time is clearly in the order of O(r2 ). . Unfortunately. and it is worth considering an alternative approach. . h2 . . respectively. . we have exp(g0 +g1 t+g2 t2 +· · ·) = eg0 exp(g1 t+g2 t2 +· · ·). . 1 (k − j)hk−j gj = h k = gk − k j=1 k−1 = gk − j=1 1− j k hk−j gj A program to perform the logarithm of a formal power series 1+g(t) begins by setting h0 = 0 and then proceeds computing h1 . . and therefore we have hk = fk−1 k an expression relating hk to (g1 . we have: khk = kgk − j=0 (k − j)hk−j gj 1 However.e. .

w). and from w = tφ(w) we have: dφ dw = φ(w) + t dt dw w = tφ(w) dw ..Chapter 4 Generating Functions 4. the last formula can be proved by means of the LIF. We can substitute this expression in the formula above and observe that w/φ(w) = t can be taken outside of the substitution symbol: [tn ]F (t)φ(t)n = F (w) φ(w) w = tφ(w) = = [tn−1 ] w 1 − tφ′ (w) 1 F (w) = [tn−1 ] w = tφ(w) = t 1 − tφ′ (w) F (w) = [tn ] w = tφ(w) 1 − tφ′ (w) which is our diagonalization rule (G6).) = (fk )k∈N . we should write Gt. The name “diagonalization” is due to the fact that if we imagine the coefficients of F (t)φ(t)n as constituting the row n in an infinite matrix. we can go on: [tn ]F (t)φ(t)n d F (y) = [tn−1 ] dy y = w(t) = dt y F (w) dw . . In this latter case. In this expression t is a bound variable. whenever no ambiguity can arise. w) to indicate the fact that fn. f2 .) as: E(fk ) = G fk k! = ∞ k=0 and therefore w = w(t) ∈ F 1 is the unique solution of the functional equation w = tφ(w). in the form relative to the composition F (w(t)). for example. Note that formula (G5) requires g0 = 0. the (ordinary) generating function for the sequence F is defined as f (t) = f0 + f1 t + f2 t2 + · · ·. then . . k! The operator G is clearly linear. This leads to the properties for the operator G listed in Table 4. Two functions f (t) and g(t) can be multiplied and composed. . Given the sequence (fk )k∈N . This notation is essential when (fk )k∈N depends on some parameter or when we consider multivariate generating functions. f1 . G(fk )k∈N = f (t). understanding also the binding for the variable k. For the sake of completeness.k in the double sequence becomes the coefficient of tn wk in the function f (t.e. . and a more accurate notation would be Gt (fk )k∈N = f (t). = [tn−1 ] w = tφ(w) w dt We have applied the chain rule for differentiation. The function f (t) can be shifted or differentiated.k )n. = n[tn ] y = [tn−1 ] 39 where φ′ (w) denotes the derivative of φ(w) with respect to w. i. dt We can therefore compute the derivative of w(t): dw φ(w) = dt 1 − tφ′ (w) w = tφ(w) fk tk . we will use the notation G(fk ) = f (t). By now applying the rule of differentiation for the “coefficient of” operator. The first five formulas are easily verified by using the intended interpretation of the operator G. which applied to (fk )k∈N produces the ordinary generating function for the sequence. where the indeterminate t is arbitrary. f1 . In fact we have: [tn ]F (t)φ(t)n F (t) φ(t)n = t F (y) dy y = w(t) . . we also define the exponential generating function of the sequence (f0 .k∈N = f (t.w (fn. . However.1. f2 . we introduce the generating function operator G.1 General Rules in the last passage we applied backwards the formula: [tn ]F (w(t)) = 1 n−1 ′ [t ]F (t)φ(t)n n (w = tφ(w)) Let us consider a sequence of numbers F = (f0 .

The theorem then follows by mathematical induction. because it states the condition under which we can pass from an identity about elements to the corresponding identity about generating functions.5 Let f (t) = G(fk ) be as above. we will prove a number of properties of the generating G(fk ) − f0 − f1 t − · · · − fj−1 tj−1 (4. then: (G(fk ) − f0 ) /t. then G(fk ) = G(gk ) if and only if for every k ∈ N fk = gk .5) G(fk ) − f0 − f1 t G(gk ) − g0 = G(fk+2 ) = G(gk+1 ) = This can be further generalized: t t2 . we obtain: We are now going to prove a series of properties of generating functions.2.2 Let f (t) = G(fk ) be as above. In the next sections. The proofs rely on the following tj fundamental principle of identity: If we consider right instead of left shifting we have to be more careful: Given two sequences (fk )k∈N and (gk )k∈N .4 Let f (t) = G(fk ) be as above. we have: G k 2 fk = tDG(fk ) + t2 D2 G(fk ) (4. it is important. the first one) and we cannot infer the equality of generating functions. By mathematical induction this result can be genThe rules (G1) − (G6) can also be assumed as axeralized to: ioms of a theory of generating functions and used to derive general theorems as well as specific functions Theorem 4. G(gk ) = Theorem 4.g.2. Theorem 4. f−1 = 0 and G(fk−1 ) = tG(fk ). then: The principle is rather obvious from the very definition of the concept of generating functions. by (G2). If f (t) ∈ F.3 Let f (t) = G(fk ) be as above. however.2.2.2.4) 4. then G(fk+2 ) = G(fk ) − f0 − f1 t t2 (4.2. GENERATING FUNCTIONS linearity shifting differentiation G(αfk + βgk ) = αG(fk ) + βG(gk ) G(fk+1 ) = n (G1) (G2) (G3) (G4) (G5) (G(fk ) − f0 ) t G(kfk ) = tDG(fk ) fk gn−k = G(fk ) · G(gk ) n convolution composition diagonalisation G k=0 ∞ fn (G(gk )) n=0 = G(fk ) ◦ G(gk ) F (w) 1 − tφ′ (w) w = tφ(w) G([tn ]F (t)φ(t)n ) = (G6) Table 4.3) Proof: We have G(fk ) = G f(k−1)+1 = t−1 (G(fn−1 ) − f−1 ) where f−1 is the coefficient of t−1 in f (t). G(fk−j ) = tj G(fk ) (4. Since g0 = f1 . then: for particular sequences. Property (G3) can be generalized in several ways: Theorem 4. then: G((k + 1)fk+1 ) = DG(fk ) Proof: If we set gk = kfk .2.2) G(fk+j ) = function operator.1: The rules for the generating function operator [tn ]F (t)φ(t)n are just the elements in the main diagonal of this array.1) G((k + 1)fk+1 ) = G(gk+1 ) = (4.40 CHAPTER 4.. Theorem 4.2.2. = t−1 (G(kfk ) − 0f0 ) = (G2) Proof: Let gk = fk+1 .2 Some Theorems on Generating Functions = t−1 tDG(fk ) = DG(fk ) . It is sufficient that the two sequences do not agree by a single element (e.1 Let f (t) = G(fk ) be the generating function of the sequence (fk )k∈N .2.

for p = −1 we have: G (−1)k fk f (−t). f (t) will always denote the generating function G(fk ). Now: G k j+1 fk = G k(k j )fk = tDG k j fk that is: j t = (G(gk ) − g0 ) dz 1 = z t 0 G(fk ) dz In the following theorems. then: G pk fk = f (pt) (4.4.2.9) ∞ k=0 f2k (t + t )/2 = k k ∞ k=0 f2k tk = G(f2k ) .2) G(f2k+1 ) = 2 t √ √ where t is a symbol with the property that ( t)2 = t. The proof proceeds by induction.8) follows by f ( t) + f (− t) = 2 integration and the condition g0 = 0.6 Let f (t) = G(fk ) be as above.2. the proof of which is manipulating its elements in various ways. let f (t) be a f. we have: = 1 t t 0 G(fk ) dz = 1 t t f (z) dz 0 (4.1 Let f (t) = G(fk ) denote the generating function of the sequence (fk )k∈N . √n √ n ∞ ∞ n=0 fn t + n=0 fn (− t) = = A more classical formula is: 2 √ √ ∞ Theorem 4. conFor the falling factorial k r = k(k − 1) · · · (k − r + cerning sequences derived from a given sequence by 1) we have a simpler formula. we find tDG(gk ) = G(fk ) − f0 . = = r=1 S(j. which is the classical recurrence for the Stirling numbers of the second kind.3. r − 1).8) Theorem 4. by setting fk = G k+1 n = 2k. So we have: G(gk+1 ) = G(fk ) = t−1 (G(gk ) − g0 ). kgk = fk .10) Sj+1 (tD) = tDSj (tD) = tD j j S(j. By using (G3).6) j where Sj (w) = r=1 r wr is the jth Stirling polynomial of the second kind (see Section 2.6) is to be understood in the operator sense. then: formulas: G(k r fk ) = tr Dr G(fk ) (4.2. Hence Proof: By (G5) we have: √ √ we have G(kgk ) = G(fk ) − f0 . we can conclude simple generalizations of the axioms. except for k = 0. =G 1 gk+1 k+1 = 1 1 G gk t k t = Proof: Formula (4. ever. being S3 (w) = w + 3w2 + w3 . The results obtained till now can be considered as Since initial conditions also coincide. then: G k fk = Sj (tD)G(fk ) j j 41 Proof: Let us consider the sequence (gk )k∈N . HowS(j.7) Let us now come to integration: Theorem 4.2. They are very j useful and will be used in many circumstances. hence. but not a f. where gk+1 = fk and g0 = 0. ∀k = 0 and g0 = 0.11). r) = r .7 Let f (t) = G(fk ) be as above. for example.s. r)rt D + r=1 r r S(j. then: √ √ f ( t) + f (− t) (4.9 Let f (t) = G(fk ) be as above or. r)tr Dr = r=1 r+1 r+1 Proof: By setting g(t) = pt in (G5) we have: ∞ ∞ n n n f (pt) = = = G pk fk n=0 fn (pt) n=0 p fn t In particular.3.1) G(f2k ) = 2 √ √ f ( t) − f (− t) √ (4.2. Proof: Clearly.3. r)t D . ( t)n + (− t)n = fn equivalently. we can also obtain more advanced results.10 Let f (t) = G(fk ) denote the generating function of the sequence (fk )k∈N . r) + S(j.2.2.2.L.3 More advanced results rS(j. then: 2 n=0 √ √ n 1 For n odd ( t) + (− t)n = 0.2.8 Let f (t) = G(fk ) be as above and let us define gk = fk /k. r) = 4. as (G3) and (4. then: G 1 fk k t = G(gk ) = 0 (G(fk ) − f0 ) dz z (4. MORE ADVANCED RESULTS Theorem 4. from which (4. Finally: G 1 fk k+1 1 t 0 (4. For examimmediate: ple. By equating like coefficients we find S(j + 1. let us begin by proving the well-known bisection Theorem 4.5) are the first two instances.3.. .2.2.p.2. we have: G k 3 fk = tDG(fk ) + 3t2 D2 G(fk ) + t3 D3 G(fk ) . Theorem 4.s. so.2.

k=0 t 1−t k = 1 f 1−t Since the last expression does not depend on n.3. The following proof is typical and introduces the use of ordinary differential equations in the calculus of generating functions: Theorem 4. or 2kgk +gk = fk . We observe explicitly that by (K4) we have: n n n n k=0 k fk = [t ](1 + t) f (t).basic rules and the theorems of the previous sections tion: we have: t 1 Theorem 4. G(fk ). 1. c. G(fk ) √ dz z (4. Since f0 = 1. let us G fk = 1−t k=0 consider the constant sequence F = (1. By apProof: If we set sn = k=0 fk .3 (Partial Sum Theorem) Let f (t) = G(fk ) denote the generating function of the sequence (fk )k∈N . We conclude with two general theorems on sums: Theorem 4.4 Common Generating Functions The aim of the present section is to derive the most common generating functions by using the apparatus 1 G(fk ) (4.5) can be extended 1 1 ln = to infinity. by (G1) from this (4.4) follows directly. .).3. We k=0 now observe that the sum in (4. then: n 4. then: G fk 2k + 1 1 = √ 2 t t 0 CHAPTER 4. for n which we have fk+1 = fk for every k ∈ N. that is by (G2): G(fk ) − f0 = tG(fk ).: G(sn ) − s0 G(fn ) − f0 1 = G(sn ) + G(1) = t t 1−t Since s0 = f0 . as we shall see when dealing with Riordan arrays.4) of the previous sections. but this expression does not represent a generating function because it depends on n. we have immediately: i.4 Let f (t) = G(fk ) denote the gener= G(n) = G(n · 1) = tD ating function of the sequence (fk )k∈N . .3).5) fk = G (1 − t) (1 − t)3 1−t 1−t k k=0 1 G((−1)n ) = G(1) ◦ (−t) = 1+t Proof: By well-known properties of binomial coeffit cients we have: 1 1 dz 1 = G ·1 = −1 = G n n 1−z z n n −n + n − k − 1 0 = = (−1)n−k = t k n−k n−k dz 1 = = ln −k − 1 1−t 0 1−z = (−1)n−k n n−k 1 1 1 = G = G(Hn ) = G n−k −k−1 k 1−t n and this is the coefficient of t in (1 − t) .3. by using the The following result is known as Euler transforma.plying the principle of identity. Other generating functions can be obtained from the previous formulas: 1 1 ln = G(nHn ) = tD 1−t 1−t . GENERATING FUNCTIONS [tn ] [tn ] 1 1−t ∞ = = [y k ]f (y) t 1−t . we find: G(fk+1 ) = ator G to both members: G(sn+1 ) = G(sn ) + G(fn+1 ). .3. it represents the generating function of the sum. we find that G(c) = c(1−t)−1 . then we have sn+1 = sn + fn+1 for every n ∈ N and we can apply the oper. we find G(sn ) = tG(sn ) + G(fn ) and For any constant sequence F = (c.3. by applying (G3) we have the differential equation 2tg ′ (t) + g(t) = f (t). . Similarly. If g(t) = G(gk ). c. and by (G5) we have: 1−t 1−t n k=0 n fk k n = = k=0 ∞ −k − 1 (−1)n−k fk = n−k k=0 [tn−k ](1 − t)−k−1 [y k ]f (y) = where Hn is the nth harmonic number. .3.).3.3.3. then: 1−t (1 − t)2 t + t2 1 n t n 1 = G n2 = tDG(n) = tD 2 f (4.2 Let f (t) = G(fk ) denote the generating function of the sequence (fk )k∈N . . The Euler transformation can be generalized in several ways. 1. As a first example.42 The proof of the second formula is analogous. whose solution having g(0) = f0 is just formula (4.3) Proof: Let us set gk = fk /(2k+1).e.

where p is a parameter. = By applying (4..n ) t (1 − t)2 1 t t 0 43 By (K4).4. we find f. + k−1 k 1 (1 + t)p+1 t p+1 (1 + t)p+1 − 1 (p + 1)t .m ) = tm . For t = 0 we should have f (0) = p = 1.p. from the definition: p k+1 = = p(p − 1) · · · (p − k + 1)(p − k) = (k + 1)! p−k p . we have the well-known Vandermonde convolution: m+p = [tn ](1 + t)m+p = n = [tn ](1 + t)m (1 + t)p = n ln 1 2t 1 1 ln dz = 1−z 1−z 1 1−t 2 1 +1 1−t ln We can also find G .e. by denoting k G((k + 1)fk+1 ) = G((p − k)fk ) = pG(fk ) − G(kfk ).These functions can make sense even when they are arating the variables and integrating. Let us now come to binomial coefficients. this relation is k k not valid for n = 0. 1−t = G(1) − tG(1) = =1 1−t = k=0 m k p n−k n n 2 k=0 k which. k+1 k Several generating functions for different forms of binomial coefficients can be found by means of this method. This last relation can be readily generalized to G(δn.m is the Kronecker’s delta.4.s. n(n + 1) t 1−t (1 − t)p+1 (1 − t)p+1 1 n(n+1) where δn. By using (K1) ÷ (K5) we find easily: p k = = = [t ](1 + t) = [t ](1 + t)(1 + t) k p k p−1 = pt(1 + pt)(1 + t)p−2 = = = 1 t t 0 = G 1 p k+1 k G p k t dz = = 0 [tk ](1 + t)p−1 + [tk−1 ](1 + t)p−1 = p−1 p−1 . and not simply f.0 k−p n(n + 1) n n+1 −p − 1 = G (−1)k−p = in accordance with the fact that the first element of k−p the sequence is zero. the differential equation f ′ (t) = pf (t) − tf ′ (t). or f (t) = c(1 + t)p . COMMON GENERATING FUNCTIONS = G 1 Hn n+1 = = G(δ0.4) and (G3) we have G DG(fk ) = pG(fk ) − tDG(fk ).2. However.s.L. . i. it is tempting to apply the op. We thus arrive to the correct 1 generating function: = = G [tk−p ] (1 − t)p+1 1−t 1 1 tp tp =1− ln G = G [tk ] = . Finally. we must define: −k + k − p + 1 1 1 1 = G (−1)k−p = = − + δn.. By sep. we have: Hence. we list the following generating functions: ln f (t) = p ln(1 + t) + c. In order p to find G k we observe that. Since The derivation is purely algebraic and makes use of An interesting example is given by G the generating functions already found and of various 1 1 1 n(n+1) = n − n+1 . In order to apply the principle G = G = p k−p of identity. They are summarized as follows. for m = p = n becomes k p = 2n n . and this implies 0 p p G k = tDG = c = 1. where p and m are two parameters and can also be zero: G p m+k p+k m p+k m+k = (1 + t)p tm tm−p (1 − t)m+1 1 tm (1 − t)p+1−m p G = as fk . Consequently: k k p = tD(1 + t)p = pt(1 + t)p−1 G = (1 + t)p p∈R k p p G k2 = (tD + t2 D2 )G = k k We are now in a position to derive the recurrence relation for binomial coefficients.properties considered in the previous section: erator G to both members.

It can 1 1 1 = happen that the recurrence involves several elements (n + 1)! n + 1 n! in F and/or that the resulting equation is indeed a differential equation. By (4. tm z m−1 dz = m+1 (1 − z) m(1 − t)m the well-known formula for the sum of a geometric progression. We observe explicitly that the formulas above could have been obtained from formulas of the previous section and the general formula (4. p . GENERATING FUNCTIONS tm = (1 − t)m+1 mtm + tm+1 (1 − t)m+2 = 1 (p − 1)t 1 1 − 1 − pt 1 − t . The last relation has been obtained by partial fraction expansion. we can try to find the generating −m − 1 function for F by using the technique of shifting: we (−p)k−m = = pm G consider the element fn+1 and try to express it in k−m terms of fn . that is: 1 n = tDG = G 1 k (n + 1)! (n + 1)! G p = 1 − pt 1 tet − et + 1 = tD (et − 1) = From this we obtain other generating functions: t t pt 1 k By this relation the well-known result follows: = G kp = tD 1 − pt (1 − pt)2 n 1 tet − et + 1 k pt + p2 t2 pt = [tn ] = 2 k = G k p = tD (k + 1)! 1−t t (1 − pt)2 (1 − pt)3 k=0 t et − 1 1 dz 1 1 k − = = [tn ] = p −1 = G 1−t t k 1 − pz z 0 1 1 p dz .10). Whatever the case. p . . the method that is (n+1)fn+1 = fn .2. we have p = pp . n+1 n+1 n by setting fn = 2n . we find a recurrence for the elements fn ∈ F and then try to solve it by using let us observe that: the rules (G1) ÷ (G5) and their consequences. the solution of which is the generating As a very simple application of the shifting method. by applying the operator G. we have the recurrence (n + n . for m = 0 it reduces to G(1/k) = − ln(1 − t). In a similar way we also have: G m k p k k k p m = (1 + pt)m 1 [tn+1 ] (p − 1) pn+1 − 1 p−1 4. = 1− = ln = (n + 1)! (1 − pz) 1 − pt n G pk k=0 = 1 1 = 1 − t 1 − pt Let us now observe that 2n+2 = 2(2n+1) 2n . function.5 The Method of Shifting k When the elements of a sequence F are given by an G = G pk−m pm = k−m explicit formula. . . G = dz n · n! z 0 By (G2) we have t−1 G pk − 1 = pG pk . By using the operator [tk ] we easily find: n (tD + t2 D2 )G k = m m2 tm + (3m + 1)tm+1 + tm+2 (1 − t)m+3 t pk k=0 = = = [tn ] 1 1 = 1 − t 1 − pt 1 1 − 1 − pt 1 − t = G k m 0 − m dz = z = 0 The last integral can be solved by setting y = (1 − z)−1 and is valid for m > 0.4) ′ of shifting allows us to find the generating function we have f (t) = f (t) or: of many sequences. 1 G = et Let us consider the geometric sequence n! 2 3 k+1 k 1. = we apply the principle of identity deriving an equa(1 − pt)m+1 tion in G(fn ). ∀k ∈ N t z e −1 1 or. p.2. G pk+1 = pG pk . . In practice.44 G k k m = tD = G k2 k m = = 1 k G k m = 0 t CHAPTER 4. This can produce a relation to which pm t m . where fn = 1/n!.

In fact we have: 2n 2t √ G n = n (1 − 4t) 1 − 4t 2n t = [tn ](1 + t)2n dz 2n 1 1 n = = √ G 2n + 1 n t 0 4z(1 − 4z) and (G6) can be applied with F (t) = 1 and φ(t) = 1 4t (1 + t)2 . the simple solution of which is: 45 4. let us study the sequence: cn = [tn ](1 + αt + βt2 )n and look for its generating function C(t) = G(cn ). In this case. By expanding. we have: G 1 = 1 − 2t(1 + w) 2n n = we have the recurrence: (2n + 1)fn+1 = 2(n + 1)fn . 1−t 1−t as we already know. The function φ(t) = (1 + t)2 gives rise to a second degree equation.4). More in general. and 1 1 2n dz √ = G = other methods are often necessary to find out genern+1 n t 0 1 − 4z 2t ating functions. By using (4. the integral can be computed without difficulty. This gives: w = w(t) = 1 − αt ± (1 − αt)2 − 4βt2 2βt Some immediate consequences are: G 2n 4n 2n + 1 n 4n 2n G 2n2 n and finally: G 2n 1 4n 2n 2n + 1 n −1 −1 = 1 t(1 − t) arctan arctan t 1−t 2 t 1−t −1 = = 1− 1−t arctan t t . Not every sequence can n 1 − 4t √ t 1 − 1 − 4t be defined by a first order recurrence relation. 1−t and again we have to eliminate the solution with the + sign. It produces first 2n 1 order recurrence relations. One of the most √ −1 G = = n n z 1 − 4z simple examples is how to determine the generating 0 √ function of the central binomial coefficients. Since: considering fn = 4n 2n 1 − t ± 1 − 4t n .6. we find: C(t) = 1 1 − t(α + 2βw) . Sometimes.4. DIAGONALIZATION 1)fn+1 = 2(2n + 1)fn . By performing the necessary computations. we find tw2 −(1−2t)w+t = 0 A last group of generating functions is obtained by or: −1 √ . By using the operator G and the rules of Section 4. w = w(t) = 2t −1 −1 2n + 2 2n + 2 n 2n Since w = w(t) should belong to F 1 . and the final result is: G 4n 2n n −1 = t arctan (1 − t)3 t 1 + . 1 − 4t 4t determined by solving the functional equation w = t(1+w)2 . In this case again we have F (t) = 1 and φ(t) = 1 + αt + βt2 . (G1) and (G3) we obtain the differential equation: f ′ (t) = 4tf ′ (t) + 2f (t).2. which will be more closely = √ G studied in the next sections. the differential equation 2t(1 − t)f ′ (t) − (1 + 2t)f (t) + 1 = 0 is derived. the function w = w(t) is easily = √ arctan . we must elim4n+1 = 4 n+1 2n + 1 n inate the solution with the + sign. and therefore we should solve the functional equation w = t(1+αw+βw2 ) or βtw2 −(1−αt)w+t = 0. consequently. the rule of diagonalizat dz 1 2n 1 tion can be used very conveniently. without 1 − 1 − 4t = 2 ln having to pass through the solution of a differential 2t equation. The solution is: f (t) = t (1 − t)3 t − 0 (1 − z)3 dz z 2z(1 − z) √ 1 − t − 1 − 4t w= = 2t 1 1 − 4t =√ By simplifying and using the change of variable y = z/(1 − z).2.6 Diagonalization The technique of shifting is a rather general method for obtaining generating functions.

1. 1. 2. 2.n . These generating functions can be used to find the and for α = 2. different from 0.2). 1. 2 n−k We can think of the last sum as a “semi-closed” form. 1. once we start from (1 + t)k2 k=0 the initial conditions Tn. √ n If Tn. ⌈k r ⌉ 1 − 2αt + (α2 − 4β)t2 and ⌈rk ⌉.k is [tk ](1 + t + t2 )n . . We can n n−1 k ⌊log2 k⌋(−1) = . 2. . function for the central binomial coefficients. 0. because the√ number of terms is dramatically reduced from n to n. The elements Tn.2: Trinomial coefficients 1 − αt − 1 1 − 2αt + (α2 − 4β)t2 2βt In the same way we obtain analogous generating functions: √ r G ⌊ k⌋ = ∞ k=0 w= = tk 1−t r G(⌊logr k⌋) = ∞ k=0 tr 1−t k where r is any integer number.n ) = √ 1 − 2t − 3t2 1 (1 + t)(1 − 3t) . In the same way we find: the sequence {0. . {0. . GENERATING FUNCTIONS and therefore we obtain: √ G ⌊ k⌋ = 3 16 1 10 4 1 ∞ k=0 2 7 16 1 6 19 tk .k+1 = Tn. marked in the table. although it remains depending We wish to determine the generating function of on n. in analogy with the binomial coefficients. we immediately deduce the recurrence 2 ∞ 1 yk t relation: = (−1)n [tn ] = y= 1+t 1−y 1+t k=0 Tn+1. 3.7 Some special functions generating . 1. 0. = (−1)n √ ⌊ n⌋ k=0 2 n−1 (−1)n−k = n − k2 = 2 n−1 (−1)k . or also any real number. think that it is formed up by summing an infinite k n − 2k k k=1 number of simpler sequences {0. The generating functions of sum: these sequences are: 2 ∞ n √ 1 tk n t4 t9 t16 t1 = ⌊ k⌋ = [t ] ··· 1−t 1−t k=1 k=1 1−t 1−t 1−t 1−t 4. 1.k−1 + Tn. with respect k k to the previous row (see Table 4. that is the √ ⌊log2 n⌋ sequence whose generic element is ⌊ k⌋. . 1.0 = 1 and Tn. . 1.2n = 1. 1. respectively. 1. Euler transformation: They constitute an infinite array in which every row n √ ⌊ k⌋(−1)k = has two more elements. the next one with the first 1 A truly closed formula is found for the following in position 9. and so on. .}. 1. for ev√ ⌊ n⌋ ery n ∈ N. . β = 1 we obtain again the generating values of several sums in closed or semi-closed form. where we use the coefficients.}. 1. 2 1 n [tn−k ] = = (−1) are called the central trinomial coefficients. 2. a trinomial coefficient. 0. (−1)n−k ⌊ k⌋ = = (−1)n by the obvious property (1 + t + t2 )n+1 = (1 + t + k k t2 )(1+t+t2 )n . 1.k + Tn.46 n\k 0 1 2 3 4 0 1 1 1 1 1 1 1 2 3 4 2 1 3 6 10 3 4 5 6 7 8 CHAPTER 4. their se(1 + t)k2 k=0 quence begins: √ n Tn 0 1 2 1 1 3 3 4 7 19 5 51 6 141 7 393 8 1107 = (−1)n ⌊ n⌋ k=0 √ ⌊ n⌋ k=0 −k 2 n − k2 = and by the formula above their generating function is: 1 = G(Tn. .k+1 2 ∞ tk = = (−1)n [tn ] from which the array can be built.}. if we substitute to k r and rk . The coefficients of (1 + t + t2 )n are called trinomial Let us begin by the following case. 1−t 2 Table 4.

then the relation is values in the previous expression we get: called a full history recurrence. .8 A somewhat more difficult sum is the following one: n Linear recurrences constant coefficients with If (fk )k∈N is a sequence. . if F is lin2k −1 (1 − t)k −1 4(1 − t)2 − ··· − = − ear. then − − − 1 − 2t (1 − t)k2 (1 − t)k2 the relation is called a partial history recurrence and p 2 2 is called the order of the relation. k2 t = [tn ] = Usually. 2. B = −1. 1.). we obtain the constant k2 −1 k2 −2 1−t (1 − t) (1 − t) sequence {2. . i. C = −2.. k2 (1 − 2t) (1 − t) √ ⌊ n⌋ k=1 √ ⌊ n⌋ k=1 2 Therefore. 1. we have a linear recurrence with = y= k=0 k=0 n √ ⌊ k⌋ k [tn ] 1 1−t ∞ yk 1−y 2 t 1−t . 4. .e. we have a linear recurrence. If F only depends on 2 1 2k 2(1 − t) a fixed number of elements fn−1 . + + ··· + the initial condition x0 = 2. . f1 . Again. for the anal. . By changing the initial + = + 1 − 2t (1 − t)k2 (1 − t)k2 (1 − 2t) conditions. we conclude: n = n−k +1 n − k2 (n − k 2 + 1) = √ 2 = k=0 n √ ⌊ k⌋ = k n−2 2 n − k2 √ ⌊ n⌋ 2k 2n−k − ⌊ n⌋ k=1 √ 2 2 n−1 − n − k2 n − k2 n − k2 = = = n−1 n−2 = ⌊ n⌋2n − +2 + ···+ The final value of the sum is therefore: n − k2 n − k2 k=1 √ √ √ ⌊ n⌋ √ 1 2 n − k2 (n + 1)⌊ n⌋ − ⌊ n⌋ + 1 ⌊ n⌋ + + · · · + 2k −1 . . For example. if all the coefficients appearing = = − 1 − 2t (1 − t)k2 2(1 − t) − 1 in F are constant.) can be tions: defined by the recurrence relation xn = xn−1 and B 1 A the initial condition x0 = 1. 2. if we consider the same relation xn = xn−1 together with C D X + . LINEAR RECURRENCES WITH CONSTANT COEFFICIENTS ∞ 47 2 2 = [tn−k ] 2 = k=1 √ ⌊ n⌋ k=1 √ ⌊ n⌋ k=1 √ ⌊ n⌋ k=1 1 = (1 − t)2 = = 2 −2 (−1)n−k = n − k2 2k 2k (1 − t)k − 1 = − 1 − 2t (1 − t)k2 (1 − 2t) 1 .}. by substituting these fn = F (fn−1 . f0 . fn−p . . . fn−2 .8. fn−2 . the first elements of the sequence must be (1 − t)k2 (1 − 2t) k=0 given explicitly.√and therefore the asymptotic value of the sum is ⌊ n⌋2n . when F depends on all the values fn−1 . we obtain (see Chapter 1): term dominates all the others. . . . 2 − √ ⌊ n⌋ k=1 − ··· − 2k 2 −1 √ ⌊log2 k⌋ = (n + 1)⌊log2 n⌋ − 2⌊log2 n⌋+1 + 1. the constant sequence (1.4. . they constitute the initial 2 1 . [tn−k ] = conditions and the sequence is well-defined if and only k2 (1 − 2t) (1 − t) k=0 if every element can be computed by starting with We can now obtain a semi-closed form for this sum by the initial conditions and going on with the other expanding the generating function into partial frac. Linear recurrences k2 k2 (1 − t) (1 − t) are surely the most common and important type of 2 k2 k2 2 (1 − t)k − 1 2 1 recurrence relations. X = −2k −1 . . . 2 In general. D = 2 −4.elements by means of the recurrence relation. a relation relating the ∞ generic element fn to other elements fk having k < n. . . . n k=1 (n + 1)⌊ n⌋ − √ ⌊ n⌋ k=1 k . . . a recurrence relation can be written We can show that A = 2k .We observe that for very large values of n the first 3 ogous sum with ⌊log2 n⌋. the sequence can radically change. Besides. fn−2 . . in fact. 3 2 n − k2 √ whose asymptotic value is 2 n n. in order to allow the computation ∞ of the successive values. it can be defined by means of a recurrence relation.

For linear recurrences with polynomial coefficients. if we write the recurrence as Fn+2 = Fn+1 + Fn we have fulfilled the requirement. the method of generating functions allows us to find the solution of any linear recurrence with constant coefficients.48 constant coefficients. We can now find an explicit expression for Fn in the following way. no method is known that solves all the recurrences of this kind. in writing the expressions for G(fn+p−j ) we make use 5 of the initial conditions for the sequence. In fact. and if the coefficients are polynomials in n. In reality. As we are now going to see. However. Fn should be an integer and therefore we can compute it by finding the √ F (t) − F0 F (t) − F0 − F1 t = + F (t).. Let us go on with the example of the Fibonacci This formula allows us to compute Fn in a time independent of n. this was not possible beforehand because of the principle of identity for generating functions. φ= 2 The constant 1/φ ≈ 0. and shows sequence (Fk )k∈N . and surely generating functions are the method giving the highest number of positive results. we begin by expressing it in such a way that the relation is valid for every n ∈ N.s. In the example of Fibonacci numbers. We have: that Fn grows exponentially. 5 F (t) − t = tF (t) + t2 F (t) . On the other hand. This first step has a great importance. F1 = 1. since |φ| < 1. from which an explicit expression for f (t) is immediately obtained. integer number closest to φn / 5. G(Fn+2 ) = G(Fn+1 ) + G(Fn ) the quantity φn approaches 0 very rapidly and we and by setting F (t) = G(Fn ) we find: have Fn = O(φn ). the same method allows us to find a solution in many occasions. GENERATING FUNCTIONS and by solving in F (t) we have the explicit expression: F (t) = t . which have no combinatorial meaning. this is not the case. The denominator of F (t) can be written 1 − t − t2 = (1 − φt)(1 − φt) where: √ 1+ 5 φ= ≈ 1. By applying the method of partial fraction expansion we find: F (t) = = t (1 − φt)(1 − φt) = B A + = 1 − φt 1 − φt A − Aφt + B − Bφt . We observe explicitly that √ = . 1 − t − t2 We determine the two constants A and B by equating the coefficients in the first and last expression for F (t): A+B =0 −Aφ − Bφ = 1 √ A = 1/(φ − φ) = 1/ 5 √ B = −A = −1/ 5 The value of Fn is now obtained by extracting the coefficient of tn : Fn = [tn ]F (t) = 1 1 1 = − [tn ] √ 5 1 − φt 1 − φt 1 1 1 √ [tn ] − [tn ] = 1 − φt 5 1 − φt = By Theorem 4. The Fibonacci recurrence Fn = Fn−1 + Fn−2 is an example of a recurrence relation with constant coefficients. consequently: 2 t t Because we know that F0 = 0. This is the solution φn − φn of the recurrence relation. ∀n ∈ N. We will discuss this case in the next section. The recurrence being linear with constant coefficients.h. and we do not know the values for the two elements in the r.2. When we have a recurrence of this kind. in the sense that we find a function f (t) such that [tn ]f (t) = fn .618033989 is known as the golden ratio.618033989. 1 − t − t2 This is the generating function for the Fibonacci numbers. because for n = 0 we have F0 = F−1 + F−2 . but the success is not assured. we can apply the axiom of linearity to the recurrence: fn+p = α1 fn+p−1 + α2 fn+p−2 + · · · + αp fn and obtain the relation: G(fn+p ) = α1 G(fn+p−1 )+ + α2 G(fn+p−2 ) + · · · + αp G(fn ).2 we can now express every G(fn+p−j ) in terms of f (t) = G(fn ) and obtain a lin= ear relation in f (t). because φn = exp(n ln φ). because it allows us to apply the operator G to both members of the relation. 
we have: φn Fn = round √ .618033989 2 √ 1− 5 ≈ −0. we have a linear recurrence with polynomial coefficients. CHAPTER 4.

10. a0 = fn + cn . a permutation π ∈ Pn such that π 2 = (1). . . . b0 is zero. When we studied permutations. if we multiply everything by the so-called summing factor: In+2 = In+1 + (n + 1)In an an−1 . which allows us to obtain an explicit expression for the generic element of the defined sequence. . we can althat the principle of identity can be applied. but no other method is available to solve those recurrences which cannot be solved by a generating function approach. bk bk−1 . As we remarked in the Introduction. bn . G((n + 2)in+2 ) can be seen as the shifting of G((n + 1)in+1 ) = i′ (t). the rule of differentiation introduces a derivative in the relation for the generating function and a differential equation has to be solved.9 Linear recurrences with polynomial coefficients When a recurrence relation has polynomial coefficients. . . the method of generating functions does not assure a solution. . . Usually. In fact. b0 a0 f0 + an+1 an . which We can now define: is therefore the exponential generating function for gn+1 = an+1 an . Therefore. a0 an an−1 . because the main difficulty just consists in dealing with the differential equation. This is the actual problem of this approach. an−1 . if with i(t) we denote the generating function of (ik )k∈N = (Ik /k!)k∈N . and in the next section we will see a very important example taken from the analysis of algorithms. . a0 fn+1 = bn bn−1 . by unfolding this recurrence the result is: fn+1 = bn bn−1 . . The number of fn+1 = fn +cn = fn−1 +cn−1 +cn = · · · = f0 + k=0 involutions grows very fast and it can be a good idea to consider the quantity in = In /n!. this expression is in the form of a sum. . but here we wish to present a case arising from an actual combinatorial problem. bn−1 . and for the number In of involutions in Pn we found the recurrence relation: In = In−1 + (n − 1)In−2 4. b0 ). a0 . . Usually. b0 (n + 2)! n + 2 (n + 1)! n + 2 n! The recurrence relation for in is: (n + 2)in+2 = in+1 + in provided none of an . . . i. and so: i′ (t) = (1 + t)i(t). . . cn are any expressions. we have C = 0 and we conclude with the formula: t2 In = n![tn ] exp t + . . a0 ck . . . . . the method does not guarantee a closed form. a0 In+2 In+1 1 1 In = + . bn .10 The summing method factor For linear recurrences of the first order a method exists. . therefore. . . . . let where f0 is the initial condition relative to the seus begin by changing the recurrence in such a way quence under consideration. a0 fn+1 /(bn bn−1 ..4. . Fortunately. . bn−1 bn−2 . . b0 g0 = a0 f0 . We have already seen some examples when we studied the method of shifting. by unfolding the recurrence we can find an explicit expression for fn+1 or fn : n ck which has polynomial coefficients. we introduced the concept of an involution. a0 n k=0 ak ak−1 . . and a possible closed form can only be found by manipulating this sum. and then ways change the original recurrence into a relation of divide everything by (n + 2)!: this more simple form. This is a simple differential equation with separable variables and by solving it we find: ln i(t) = t + t2 +C 2 or i(t) = exp t + t2 +C 2 and the relation becomes: an an−1 . . b0 and we can pass to generating functions. a0 gn+1 = gn + cn bn bn−1 . THE SUMMING FACTOR METHOD 49 where C is an integration constant. . . in that case. b0 bn bn−1 . if an+1 = bn = 1. 2 4. . . . Because i(0) = eC = 1. . We have: i′ (t) − 1 i(t) − 1 = + i(t) t t because of the initial conditions i0 = i1 = 1. .e. possibly depending on n. b0 As a technical remark. . 
we observe that sometimes a0 and/or b0 can be 0. Let us suppose we have a recurrence relation: an+1 fn+1 = bn fn + cn where an . b0 an an−1 . (Ik )k∈N . bn bn−1 . we can unfold the . we obtain: an+1 an . Finally. .

we know that the two coefficients of fn+1 and we conclude: and fn are equal. we find: Hn − 2H2n + 4n ∼ 2n − 1 ∼ ln n + γ − 2(ln 2 + ln n + γ) + 2 = = − ln n − γ − ln 4 + 2. fn − n 2 4 (2n − 1) n This expression allows us to obtain a formula for fn : fn = 1 2n Hn − H2n + 2 2n − 1 = Hn − 2H2n + 2 1 2n 2n − 1 4n n 1/2 . bn = (2n − 1)/2. a1 bn bn−1 . = H2n+2 − 2n + 1 − 2n + 2 − 2 Hn+1 + 2n + 2 − 1 = 24 24 1920 960 A method for finding a recurrence relation for the coefficients fn of this f. 2n + 1 4n+1 n + 1 and therefore we have the differential equation: √ 1 (1 − t)f (t) = − f (t) + 1 − t. we find: 1 1 1 1 1 1 = + + +· · ·+ +1− −· · ·− −1 = √ 1 2n 2n − 1 2n − 2 2 2n 2 1 − t ln f (t) = = 1−t 1 1 1 1 1 1 71 5 31 6 = t − t3 − t4 − t − t + · · · . we have: 2n 1 (n + 1)4n n = = Therefore: fn+1 = 1 2(n + 1) Hn+1 − H2n+2 + 2 2n + 1 × 2 2n + 2 1 . (2n)! 4n 2n − 1 The reader can numerically check this expression against the actual values of fn given above. we find: (n + 1) 4n n!2 2n − 1 4n n!2 1 fn+1 = fn − .p. which shows that |fn | grows as ln n/n3/2 .s. By using the asymptotic approximation Hn ∼ ln n + γ given in the Introduction. 2 2n + 1 Furthermore. 2n + 1 4n+1 n + 1 × 2n + 2 4 n+1 (n + 1)4n+1 n + 1 2(2n + 1) 1 2 2n + 2 . . notwithstanding their appearance. we have: an an−1 . (2n)! 2 (2n)! 2n − 1 Besides: ∼ 1 √ 2n πn We are fortunate and cn simplifies dramatically. let us n discuss the problem of determining the coefficient 1 1 1 = + + ··· + 1 − 1 = of tn in the f.s. If we expand the function. . . b1 = = = n(n − 1) · · · 1 · 2n = (2n − 1)(2n − 3) · · · 1 n!2n 2n(2n − 2) · · · 2 = 2n(2n − 1)(2n − 2) · · · 1 4n n!2 . besides. 2 ′ By extracting the coefficient of tn . we have the relation: 1 1/2 (n + 1)fn+1 − nfn = − fn + (−1)n 2 n which can be written as: (n + 1)fn+1 = 2n − 1 1 2n . is to derive a differential equation for f (t). for which we have an = n. In order to show a non-trivial example. Therefore we have: fn ∼ −(ln n + γ + ln 4 − 2) fn+1 2n 1 = (n + 1)4n n n k=0 −1 2k − 1 1 √ 2n πn (a0 f0 = 0). 1 2n 1 2n − 1 4n n By multiplying the recurrence relation by this summing factor. By differentiating: 1 1 1 ln +√ f (t) = − √ 1−t 2 1−t 1−t ′ 1 2(n + 1) = H2n+2 − Hn+1 − . .50 CHAPTER 4. Since a0 = 0. . and accordingly change the last We can somewhat simplify this expression by observing that: index 0. Let us apply the summing factor method. corresponding to the function 2k − 1 2n − 1 2n − 3 √ k=0 f (t) = 1 − t ln(1/(1 − t)). n = This is a recurrence relation of the first order with the initial condition f0 = 0. GENERATING FUNCTIONS recurrence down to 1.p.

we perform a search in the binary tree. We used the formula for the sum of the first n − 1 integers. but there are only 2n /(n + 1) different binary trees with n nodes. (n − 1 − k)! It only remains to count the contribution of the roots. in order to be able to apply the generating function operator: n (n + 1)Qn+1 Q′ (t) Q′ (t) = = − (n + 1)2 + 2 k=0 Qk Q(t) 1+t +2 (1 − t)3 1−t 1+t 2 Q(t) = .9). a single comparisons for every tree. . starting at the root and arriving to a given node K is called the internal path length for K.l. so that Qn is the average total i. We can also reformulate the recurrence for n + 1. These possible ways are In a similar way we find the total contribution of the right subtrees: n−1 k!(Pn−1−k + (n − 1 − k)!(n − 1 − k)) = k = (n − 1)! Pn−1−k + (n − 1 − k) . The left subtrees have therefore a total i.11. We now observe that every left subtree is associated to each possible right subtree and therefore it should be counted (n − 1 − k)! times. If we’ll succeed in finding Qn . we have: Pn 2 =1+ n! n n−1 k=0 2 Pk +n−1=n+ k! n n−1 k=0 Pk . on the average. our previous problem can be stated in the following way: what is the average internal path length for binary trees with n nodes? Knuth has found a rather simple way for answering this question. relative to a single tree. As we know. the total number of nodes in all the trees.4. .l. k! n−1 k Binary trees are often used as a data structure to retrieve information. the average i. Therefore D is a permutation of the ordered sequence d1 < d2 < · · · < dn . comparing d against the root and other keys in the tree. in which the contributions of the left and right subtrees turn out to be the same: Pn = n! + (n − 1)!× n−1 × k=0 Pn−1−k Pk +k+ + (n − 1 − k) k! (n − 1 − k)! n−1 = = n! + 2(n − 1)! = n! + 2(n − 1)! k=0 n−1 Pk +k k! = k=0 n(n − 1) Pk + k! 2 . The reasoning is as follows: we evaluate the total internal path length for all the trees generated by the n! possible permutations of our key set D. every permutation generating the left and right subtrees is not to be counted only once: the keys can be arranged in all possible ways in the overall permutation. by dividing by n!.. THE INTERNAL PATH LENGTH OF BINARY TREES 51 4. there are n! possible permutations of the keys in D. which obviously amounts to n!. n − 1) and the right subtree contains the remaining n − 1 − k nodes. Besides. retaining their relative ordering. i. Let Pn the total internal path length (i. 1. Therefore. and as the various elements arrive. by the total number of nodes.e. equal to Pk . A set D of keys is given.l.p.11 The internal path length of binary trees . taken from an ordered universe U . In the former case our search is successful. however. until we find d or we arrive to some leaf and we cannot go on with our search. The number of nodes along a path in the tree. while in the latter case it is unsuccessful.l. the left subtree contains k nodes (k = 0.p.l. This increases the total i. A non-empty binary tree can be seen as two subtrees connected to the root (see Section 2. 2 ln 1−t = This differential equation can be easily solved: Q(t) = = 1 (1 − t)2 1 (1 − t)2 . they are inserted in a binary tree.p. and then divide this number by n!n. 1−t (1 − t)3 (1 − t)2 (1 + t) dt + C (1 − t)3 1 −t+C . It is just the number of comparisons necessary to find the key in K. . and therefore the total contribution to Pn of the left subtrees is: n−1 (n − 1 − k)!(Pk + k!k) = (n − 1)! k Pk +k .p. we are looking for will simply be Qn /n. 
because it tells us how good binary trees are for searching information. but every search in these subtrees has to pass through the root. and now. we wish to show how the method of generating functions can be used to find the average internal path length in a standard way.) of all the n! possible trees generated by permutations. We therefore have the following recurrence relation. The problem is: how many comparisons should we perform. to find out that d is present in the tree (successful search)? The answer to this question is very important. k! Let us now set Qn = Pn /n!.p. n When we are looking for some key d to find whether d ∈ D or not. . it actually is Pk + k!k.

In Figure 4. keys are stored in the tree in their proper order. if we find a limitation for the = Fn − 1 + Fn−1 − 1 + 1 = Fn+1 − 1 height of a class of trees. .1 Hn − 3. height balanced binary trees are also known tract the coefficient of tn : as AVL trees. . or keys. by setting t = 0. Q(t) = The algorithm of Adelson-Velski and Landis is very 2 2 (1 − t) 1 − t (1 − t) important because. of a set {a1 . we should have Formally. and the condition on the heights of the subtrees of is satisfied.: subtree of every node exceeds exactly by 1 the height Qn 1 Pn of the right subtree of the same node.52 CHAPTER 4. a2 . Here we only wish to perform a worst case analysis to prove that the ret 1 2+t trieval time in any AVL tree cannot be larger than = 2[tn+1 ] ln + 1 − [tn ] = (1 − t)2 1−t (1 − t)2 O(ln n). Therefore. 7.p. The height is also the maximal number of comparisons necessary to find any key in |Tn | = |Tn−1 | + |Tn−2 | + 1 = the tree.12 Height balanced binary every nodeconsidered asBecause of this construction.4 on Common Generating Functions) to ex. Therefore.ing time for every key in the tree is O(ln n). if h′ and h′′ are the K k find C = 0. an } are stored retrieving time for every such tree will be ≤ n. 12. and because of the recurrence height of a tree as the highest level at which a node relation for |Tn | we have: in the tree is placed. The proof is done by Landis.l. and the algorithm for building AVL1 1 n trees from a set of n keys can be found in any book Qn = [t ] 2 ln + 1 − (2 + t) = (1 − t)2 1−t on algorithms and data structures. in fact. of height n. . the average elements. 4. Tn can be the “worst” tree of height n. the number of nodes in Tn is just the sum O(ln n). the height of the corresponding right subtree plus 1. let us consider −2 −2 = 2(n+1)Hn+1 −2 (−1)n − (−1)n−1 = to worst possible AVL tree. We can now use the formula for G(nHn ) (see the Sec. we have the simple recurrence relation: tree is in the order of ln n. Because of that. two Russian researchers. in random order in a binary (search) tree. at the beginning of the as is intuitively apparent from the beginning of the 1960’s.p. = =2 1+ we have drawn the first cases.height balanced binary trees assure that the retrievtion 4. the average retrieving time becomes O(n). Since the height We have been able to show that binary trees are a n is an upper bound for the number of comparisons “good” retrieving structure. let us define the k = n−1 and k = n−2.}. Adelson-Velski and sequence {0. retrieve any key in a binary tree is in the order of Therefore. we that for every node K in it. To avoid this drawback. . For n = 0 we have |T0 | = balanced” binary tree. and this is true. . .l. and shows by using the preceding tree Tn−1 as the left subtree that the average number of comparisons necessary to and the tree Tn−2 as the right subtree of the root. is built This formula is asymptotic to 2 ln n+γ −3. then |h′ − h′′ | ≤ 1. the resulting structure degenerates into a linear list and This resembles the Fibonacci recurrence relation. The final result is: heights of the two subtrees originating from K. 1. However. similarly we the left subtree of every node K differs by at most proceed for n + 1. as we are now going to show. we can easily show that |Tn | = Fn+1 − 1. this is also a limitation for the internal path length of the trees in the same class. if the |Tn | = |Tn−1 | + |Tn−2 | + 1. K K 1 t 2 ln − . and. plus 1 (the root). for example. a tree for which the height of F1 − 1 = 1 − 1 = 0. this holds for node K. of the empty tree is 0. 
trees in the sense that every other AVL-tree of height n will have at least as many nodes as Tn . found an algorithm to store keys in a “height mathematical induction. let us consider trees in which the height of the left Thus we conclude with the average i. These trees are built in n!n n n a very simple way: every tree Tn . the n n−1 height of the left subtree of any node cannot exceed = 2(n + 1)Hn + 2 − 2(n + 1) − n = 2(n + 1)Hn − 3n. let us suppose that for 1 from the height of the right subtree of the same every k < n we have |Tk+1 | = Fk − 1. since Fn + Fn−1 = Fn+1 by the Fibonacci recurrence. this behavior of binary trees is not always assured. To understand this concept. . a height balanced binary tree is a tree such Q0 = Q(0) = 0 and therefore. GENERATING FUNCTIONS Since the i. 4. In order to perform our analysis. in the sense that if the necessary to retrieve any key in the tree. . by definition. 2. then the If we denote by |Tn | the number of nodes in the expected average time for retrieving any key in the tree Tn . Since. of the nodes in Tn−1 and in Tn−2 .

As we observed. the function B(t) = G(Bn /n!). √ 2 e −1 2 2 et − 1 1 − 1 − 4t C(t) = 2t In order to see that this function is even. and therefore the solution with the + sign before order equal to zero. In fact we have: the square root is to be ignored.0 . ∀n > 0. Cn+1 = of the exponential function and therefore we obtain: k=0 The right hand member is a convolution. In k=0 order to apply the method of generating functions. By passing to the logarithms. rectly solved. n = O(ln |Tn |). by the initial condition C0 = 1. we obtain: C(t) − 1 = C(t)2 t or tC(t)2 − C(t) + 1 = 0. is to show that the function obtained from B(t) by This is a second degree equation.13. every AVL-tree of height n has a number of nodes not less than |Tn |. √ we have: n ≈ logφ ( 5(|Tn | + 1)) − 1. t et − 1 B(t) = 1 or B(t) = t . which however. which can be di. Fn ≈ φn / √ therefore we have |Tn | ≈ φn+1 / 5 − 1 or φn+1 ≈ 5(|Tn |+1). we obtain: we studied the Catalan numbers. and therefore.. − −t t 2e −1 21−e 2 et − 1 n n+1 Bk = δn. (k + 1)! (n − k)! The classical way to see that B2n+1 = 0.13 Some special recurrences Not all recurrence relations are linear and we had occasions to deal with a different sort of relation when If we divide everything by (n + 1)!. They satisfy the n n−1 1 Bn−k recurrence Cn = k=0 Ck Cn−k−1 . = δn.e.1: Worst AVL-trees We have shown that for large values of n we have √ √ 5. we substiwhich was found in the section “The method of shift. is only valid for n > 0. for t = 0 we should have C(0) = C0 = and therefore should have all its coefficients of odd 1.0 . The left hand n member is a convolution. The Bernoulli numbers were introduced by means t 1 + et t et + 1 t e−t + 1 of the implicit relation: =− = . and prove some of their properties. and this assures that the retrieving time for every AVL-tree with at most |Tn | √ nodes is bounded from above by logφ ( 5(|Tn |+1))−1 We are now in a position to find out their exponential generating function. SOME SPECIAL RECURRENCES 53 T0 T1 r T2 r   r     r T3 r  d r  dr     r T4 r    d dr r     eer ¡¡ r r   r   T5 r  d r r   d ¡  d ¡ d r dr r d r   ¡   eer r ¡r ¡¡ r   Figure 4.tute t → −t and show that the function remains the same: ing” in a completely different way. t e −1 (n + 1)! Bn−k = δn. and since all logarithms are proportional. k k=0 . we we write it for n + 1: can pass to the generating functions. The defining relation can be written as: n k=0 n+1 Bn−k = n−k n k=0 n = = k=0 (n + 1)n · · · (k + 2) Bn−k = (n − k)! 4.4. i. whose first factor is the shift Ck Cn−k .deleting the term of first degree is an even function.0 (k + 1)! (n − k)! in this particular form. we thus obtain: t t t t et + 1 B(t) + = t + = . and since this relation holds for every n ∈ N.

GENERATING FUNCTIONS .54 CHAPTER 4.

lower triangular Proof: The proof consists in a straight-forward comarray (or triangle) (dn. d(t)(th(t))k z k = d(t) 1 − tzh(t) (5.2). th(t)) is a Riordan array. 1 + th(t) k tant properties of Riordan arrays is the fact that the td(t)h(t) sums involving the rows of a Riordan array can be .1. we find the row sums (r.k = = [tn ] t 1 1 = [tn−k ] = 1−t 1−t (1 − t)k+1 −k − 1 n n (−1)n−k = = n−k n−k k k k=0 n 1 fk = [tn ] f k 1−t t 1−t which we proved by simple considerations on binomial coefficients and generating functions.1) ∞ n dn.1 Definitions and basic concepts Theorem 5. we are mainly interested in the sequence of functions iteratively defined by: d0 (t) = d(t) dk (t) = dk−1 (t)th(t) = d(t)(th(t))k These functions are the column generating functions of the triangle. h(t)) be a Riordan array and let f (t) be the generating function of the sequence (fk )k∈N .1.) dn.k fk = [tn ]d(t)f (th(t)) (5.1. if both d(t).1 Let D = (d(t).k = [tn ] (1 − th(t))2 performed by operating a suitable transformation on k a generating function and then by extracting a coefficient from the resulting function. from the previous theorem we immediately find: and this shows that the generic element is actually a d(t) binomial coefficient. By formula (5. More precisely.alternating r. G (−1)k = 1/(1 + t) and G(k) = t/(1 − t)2 . s. whose rows are the diagonals of D. by observing that D = (d(t). h(t)) is to consider the bivariate generating function: d(t. then the dn.k fk = dn. z) = k −1 (1 − t − tz) . kdn. h(t)). Another way of characterizing a Riordan array D = (d(t). one of the most impor. d(t) (−1)k dn.Chapter 5 Riordan Arrays 5.k = [tn ] 1 − th(t) well-known bivariate generating function d(t.2) In the case of Pascal triangle we obtain the Euler transformation: n A common example of a Riordan array is the Pascal triangle.k fk = In fact.k∈N defined by: putation: dn. h(t) ∈ F 0 . s. we Moreover. In fact we have: dn. for which we have d(t) = h(t) = 1/(1 − t). s. weighted r. z) = ∞ k=0 k=0 = k=0 ∞ k=0 [tn ]d(t)(th(t))k fk = ∞ k=0 = [tn ]d(t) fk (th(t))k = = [tn ]d(t)f (th(t)).3) Riordan array is called proper.k = [tn ] From our point of view. By considering the generating functions G(1) = 1/(1 − t). prove the following theorem: 55 . The Riordan array k=0 can be identified with the infinite.k )n. then we have: A Riordan array is a couple of formal power series n D = (d(t).1.1.k = [tn ]d(t)(th(t))k (5.

Suppose.k = [tn ]f (t)(tp )k = [tn−pk ]f (t) = fn−pk . that we wish to sum one out of every three powers of 2.56 we have: CHAPTER 5. if dn.k is the generic element in (a(t). which are 1. By formula (5. .i fi = dn. the generic element of the Riordan array (f (t). we element of the Riordan array (f (t). we find Gt (dn. not depending on f (t).k )n. This gives Sn ≈ 2n+3 /7. . h(t)) is a proper Theorem 5. b(t)) = (d(t)a(th(t)). h(t)) and fj. Therefore. h(t)) · (a(t). . 1 − 2t 1 − t3 k=0 [tn ]d(t) As we will learn studying asymptotics. 2. In fact we have: ∞ j=0 dn. In a sense. . = = [tn ]d(t)(th(t))j [y j ]a(y)(yb(y))k = ∞ (th(t))j [y j ]a(y)(yb(y))k = This can be called the rule of generalized convolution j=0 since it reduces to the usual convolution rule for p = n = [t ]d(t)a(th(t))(th(t)b(th(t)))k .2 Let (dn. For p = 1. the theorem on the sums involving the Riordan arrays is a characterization for them. for every k = 1. Then the triangle de. . an approximate value for this sum can be obtained by extracting the coefficient of the first factor and then by multiplying it by the second factor. The product is obviously associative. h(t)b(th(t))). g(t)) where f (t) = have: d(t)a(th(t)) and g(t) = h(t)b(th(t)).1) This expression is particularly important and is the basis for many developments of the Riordan array theory. we have: ⌊n/p⌋ k=0 1 = Fn+1 [tn ] 1 − t − t2 The algebraic structure of Riordan arrays The most important algebraic property of Riordan arrays is the fact that the usual row-by-column product of two Riordan arrays is a Riordan array. Therefore we have: ⌊n/3⌋ 1 1 . therefore. .k = [tn ] diagonal sums ∞ 1 − t2 h(t) and we have i=0 dn. Observe that this array is proper. (dn.2.. Pascal triangle.. g(t).1.2 connecting binomial coefficients and Fibonacci numbers. h(t)) · (a(t).j fj. In fact. in fact. If this is the case.1. This is proved by considering two Riordan arrays (d(t). the array (1.j fj. Let us now suppose that (d(t).k .j is the generic element in (d(t). where f (t) is the gen. we can prove a sort of inverse property: (5. b(t)) = f. . starting with 2n and By definition. b(t)) and performing the product. 2. we can look for a proper Riordan erating function of the sequence and d(t). diagonal sums give: n−k k = = [tn ] 1 1 = 1 − t 1 − t2 (1 − t)−1 k 5.2. 1). for example. b(t)). 1) acts as the neutral element or identity. h(t) are two array (a(t).proper. and we observe that the Riordan array (1.k . and this corresponds to the inithe generating function of any sum k dn−sk. h(t)) and (a(t). the last expression denotes the generic going down to the lowest integer exponent ≥ 0. by formula (5. Another general result can be obtained by means of two sequences (fk )k∈N and (gk )k∈N and their generating functions f (t).k∈N be an infinite triRiordan array.k fk = [t ]d(t)f (th(t)). d(t) The corresponding generating function is f (t) = tk . dn−k. this observation can be generalized to find dk (t) = d(t)(th(t))k . according to k the theorem’s hypotheses.k∈N .(1. in which t is substituted by 1/2.p. we immediately angle such that for every sequence (fk )k∈N we have see that the product of two proper Riordan arrays is n k dn. 1) is everywhere 0 except for the elements on the main diagonal.1). we should have: fined by the Riordan array (d(t). RIORDAN ARRAYS Proof: For every k ∈ N take the sequence which is 0 everywhere except in the kth element fk = 1.k for every s ≥ 1. We obtain well-known results for the tial definition of column generating functions for a Riordan array.s. 
tp−1 ) is: dn.k = ∞ j=0 fn−pk gk = [tn ]f (t) g(y) y = tp = = [tn ]f (t)g(tp ). Hence.k∈N = Obviously. b(t)) such that (d(t). 1. Sn = 2n−3k = [tn ] (d(t). and in fact we have the exact value Sn = ⌊2n+3 /7⌋. whose generic element is j dn.k )n. h(t)) coincides with d(t)a(th(t)) = 1 and h(t)b(th(t)) = 1.k )n.3). for example.

. If we define dh (th(t)) = d(t)−1 d(thh (t)) = dh (t)−1 h(t) = hs +hs+1 t+· · ·. Its ∞ ′ d (y) 1 elements are also known as renewal arrays. but d(t) ∈ the identities: F 0 . Proper Riordan arrays play a very important role By the product formula (5. for proper Riordan arrays.k = [z ][t ] 1 − zy y = th(y) classes of Riordan arrays are subgroups of A: 1 d d(y)−1 1 = = [z k ] [y n−1 ] • the set of the Riordan arrays (f (t). let us define: 5. The notations used in that Chapter are thus yd′ (y) 1 1 n−k [y ] k− . = explained as particular cases of the most general case n d(y) d(y)h(y)n of (proper) Riordan arrays. can be expressed as a linear Observe that obviously f h (t) = f (t).5. We have therefore proved: y = thh (t) = th(thh (t))−1 = th(y)−1 Theorem 5. the bivariate generating function for (d(t).s. combination of the elements in the preceding row. Rogers ∀f (t) ∈ F 0 . array inverse. f (thh (t)) = f h (t)−1 has found an important characterization: every element dn+1. sion Formula. 1) is an inn dy 1 − zy h(y)n variant subgroup of A. d(y)−1 −1 k n = It is a simple matter to show that some important dn. which is not proper. h(t)). the two f. Therefore we find: is a group with the operation of row-by-column prod.2.s. − = zr yr d(y) r=0 d(y)h(y)n The first two subgroups have already been consid1 d′ (y) 1 n−1 ered in the Chapter on “Formal Power Series” and [y ] ky k−1 − y k = = n d(y) d(y)h(y)n show the connection between f. then h(t) ∈ F 0 .3.1).p.1) we immediately find in our approach. hh (t)). Since h(0) = 0. k ∈ N. h(t))−1 = (dh (t). 57 done by using the LIF. and Riordan arrays. g(t)) is a subd′ (y) 1 − = 2 (1 − zy) group of A and is called the subgroup of associd(y) h(y)n ated operators or the Lagrange subgroup.: generic element dn. h(t)) is d(t)/(1 − tzh(t)) and therefore we have: dn. THE A-SEQUENCE FOR PROPER RIORDAN ARRAYS By setting y = th(t) we have: a(y) = d(t)−1 t = yh(t)−1 b(y) = h(t)−1 t = yh(t)−1 . a(y) and b(y) are uniquely defined. and consequently: h h uct defined functionally by relation (5.p. f (t)) is a subn r=0 group of A and is called the Bell subgroup.1 The set A of proper Riordan arrays which is the same as t = yh(y). Consequently. an s > 0 exists such that h(t) = hs ts + hs+1 ts+1 + · · · and hs = 0. Let us consider a Riordan array D = (d(t). h(t))−1 in terms of d(t) and h(t). ∞ 1 z r+1 y r (r + 1)− = [z k ] [y n−1 ] • the set of Riordan arrays (f (t).2. By the formulas above. As we observed in the first section.p.d (t) = d (yh(y)) = d(t)−1 . n d(y)(1 − zy)2 • the set of the Riordan arrays (1.s.k+2 + · · · = . and therefore there is a unique function 1 − zy −1 t = t(y) such that t(0) = 0 and t = yh(t) . This will be dn+1.3 The A-sequence for proper dh (t) = d(y)−1 y = th(y)−1 Riordan arrays so that we can write (d(t). h(t)) is proper and the rows of D can be seen as the s-diagonals (dn−sk )k∈N which can be reduced to the single and basic rule: of D. Fortunately. If h(t) is any fixed invertible f. hh (th(t)) = h(t)−1 h(thh (t)) = hh (t)−1 the Riordan array D = (d(t).e.k+1 = a0 dn. it is called the Appell z 1 − = [z k ] [y n−1 ] subgroup. Besides. h(t) ∈ F 0 .k in the inverse Riordan array (d(t). Let us now return to the formulas for a Riordan This is the formula we were looking for. we have: being d(t). n.k = [tn z k ] dh (t) = 1 − tzhh (t) dh (t) Here we are in the hypotheses of the Lagrange Inver= [z k ][tn ] y = thh (t) . We wish now to find an explicit expression for the i.2.k+1 .k+1 + a2 dn.k + a1 dn.

1 In the same way. The A-sequence for the Pascal triangle is the solution A(y) of the functional equation 1/(1 − t) = A(t/(1 − t)). h(t)).k )n. Now we can uniquely determine the value of z1 by expressing d2.k of the left hand ∞ ∞ member is j=0 dn. let us observe that (5. A(t) the generating function of the sequence A and define h(t) as the solution of the functional equation h(t) = A(th(t)). .2 Let (dn.0 /d0.e. then a unique sequence Z = (zk )k∈N exists such that every element in column 0 can be expressed as a linear combination of all the elements in the preceding row.3.0 d(t) = 1 − tZ(th(t)) h(t) = A(th(t)) (5.3. i. Therefore.3. let it be a proper Riordan array). Z(t)) completely characterizes a proper Riordan array. . it characterizes column 0.k∈N is a Riordan array if and only if a sequence A = {a0 = 0. .1 An infinite lower triangular array D = (dn.3. . which is uniquely determined because of our hypothesis a0 = 0. By performing the product we find: d(t)A(th(t)) = d(t)h(t) and h(t)B(th(t)) = h(t).1 or z1 = d0.3. vice versa. (5.0 . Furthermore.n = 0. we determine the sequence Z in a unique way. The simple substitution y = t/(1 − t) gives A(y) = 1 + y. we realize that we could k have started with this recurrence relation and directly found A(y) = 1+y. we determine z2 by expressing d3. .3.3) Proof: Let z0 = d1.0 = z0 dn. B(t)) = (d(t).1) holds Proof: Let us suppose that D is the Riordan array (d(t).0 − d2 1. B(t)) = (d(t)h(t).0 = z0 d1. h(t)) or: (d(t). The element fn. by our previous observation. h(t)). Let d(t) be the generating function of this column.0 in terms of the elements in row 1. The sequence Z is called the Z-sequence for the (Riordan) array. and this immediately gives h(t) = 1/(1−t).3. . i.0 .1) uniquely defines the array D when the elements {d0.k∈N be any infinite. We have: the proof of the theorem.2 + · · · = ∞ j=0 aj dn. In fact. we have: d0. h(t))−1 · (d(t)h(t). h(t)) and it only array and let Z(t) be the generating function of the depends on h(t). if as usual we interpret ak−j as 0 when k < j. 1.2) as the solution of h(t) = 1 + th(t). Therefore we have (d(t). a2 .1 + z2 dn.1) The sum is actually finite and the sequence A = (ak )k∈N is fixed.j . d0.k+1 . RIORDAN ARRAYS and this uniquely determines A when h(t) is given and. We can therefore consider the proper Riordan array D = (d(t). More precisely. . for every n. Now.0 + z1 dn. The latter identity gives B(th(t)) = 1 and this implies B(t) = 1. By equating these two quantities.0 .0 + z1 d1. The same element in the right hand member is: [tn ]d(t)h(t)(th(t))k = = [tn+1 ]d(t)(th(t))k+1 = dn+1. At this point.3.k )n.2) . we can prove the following theorem: Theorem 5. 1.: dn+1. zj dn. d1. A(t).0 d1.1). h(t)) be a proper Riordan of the Riordan array D = (d(t). corresponding to the well-known basic recurrence of the Pascal triangle: n+1 = k+1 n n + k+1 .3.3 Let (d(t). By proceeding in the same way. h(t)) and let us consider the Riordan array (d(t)h(t). h(t)) · (A(t). (5.3. k ∈ N relation (5.0 in terms of the elements in row 2.}. For the converse.e. by the first part of the theorem. This completes the proof. since column 0 is {1. D satisfies relation (5. h(t) is defined by (5. B(t)) by the relation: (A(t). .} of column 0 are given. it must coincide with D.} exists such that for every n.0 . h(t) is uniquely determined when A is given. ∀n ∈ N (in particular. h(t)). 1/(1−t)) as initially stated. we have the identity (5.: d2.0 .3.58 = ∞ j=0 CHAPTER 5. we can say that the triple (d0. k ∈ N and therefore. h(t)). 
let us prove the following: The sequence A = (ak )k∈N is called the A-sequence Theorem 5. d2. Another type of characterization is obtained through the following observation: Theorem 5.k+j aj .0 . The pair of functions d(t) and A(t) completely characterize a proper Riordan array. we define the Riordan array (A(t).j ak−j = j=0 dn. To see how the Z-sequence is obtained by starting with the usual definition of a Riordan array. h(t)) · (A(t). 1) = (d(t)h(t).1). . as we have shown during array’s Z-sequence. except for the element d0.0 d2.k+j . we have proved that the Pascal triangle corresponds to the Riordan array (1/(1−t). and by substituting the values just obtained for z0 and z1 .0 . a1 . lower triangular array with dn.

equation (5. the Z-sequence exists and is unique.and Z-sequences: Theorem 5. and we can go on to the generating functions. then D = (dn. respectively. The theorem now directly follows from (5. then D = By solving this equation in d(t). n. tb−a−1 tm The relation can be inverted and this gives us the .. Then d(t) = h(t) if and only if: A(y) = d(0) + yZ(y). In either case. bin+ak nomial coefficients of the form m+bk . (5. because A(th(t)) = h(t) by formula (5. whose elements depend on the parameters a. we have Riordan arrays and therefore we can apply formula (5. m or a.k ) or (dm.4. t−b−1 (1 + t)−a .4 Let d(0) = h(0) = 0.5. we have: d(t) − d0. or vice versa.4. we immediately find (d ) is a Riordan array. Depending if we consider n a variable and m a parameter. Theorem 5.0 = t = z0 d(t) + z1 d(t)th(t) + z2 d(t)(th(t))2 + · · · = = d(t)(z0 + z1 th(t) + z2 (th(t))2 + · · ·) = 59 two parameters and k is a non-negative integer variable.k hypothesis d(t) = h(t): d(0) + yZ(y) = = = = d(0) 1 − d(0) + y t = yh(t)−1 = t th(t) th(t) d(0)th(t) t = yh(t)−1 = − d(0) + t th(t) h(t) t = yh(t)−1 = A(y).k be as above.k ). we have: d(t) = = = d(0) = 1 − tZ(th(t)) d(0) = 1 − (tA(th(t)) − d(0)t)/th(t) d(0)th(t) = h(t).1. Since d(t)(th(t))k is the generating function for column k. = = [tm+bk ](1 + t)n+ak = [tm ](1 + t)n (t−b (1 + t)a )k .3) takes on two specific forms which are worth being stated explicitly: n + ak fk = m + bk k 5. By the previous theorem. Vice and: versa.3. a Riordan array. If b < 0 is an integer. b are = [tm ](1 + t)n f (t−b (1 + t)a ) .1) n + ak fk = m + bk b < 0. If b > a and b − a is an integer. b.2) Let us consider simple binomial coefficients.3. Therefore.1 Let dn. we find: dn.4. Proof: Let us assume that A(y) = d(0) + yZ(y) or Z(y) = (A(y) − d(0))/y. D= (1 + t)n .4. b. if some conditions on a. we have two different infinite arrays (dn. We conclude this section by giving a theorem.k the relation desired.1) For m = a = 0 and b = 1 we again find the Riordan array of the Pascal triangle.0 td(t) t = yh(t)−1 .2). where a. We have: m.4 Simple cients binomial coeffi- = [tn ] tm f (1 − t)m+1 k tb−a (1 − t)b b>a (5. we obtain from the dm.k ) is = d(t)Z(th(t)). D= (1 − t)m+1 (1 − t)b formula for the Z-sequence: Z(y) = d(t) − d0. by the formula for Z(y).k and dm.3. which characterizes renewal arrays by means of the A.e. d(0)t Proof: By using well-known properties of binomial coefficients.1.3) is valid for every n ∈ N. The sum (5.3) to find the value of many sums.1. b hold.k = = n + ak n + ak = = n − m + ak − bk m + bk −n − ak + n − m + ak − bk − 1 × n − m + ak − bk = = = × (−1)n−m+ak−bk = −m − bk − 1 (−1)n−m+ak−bk = (n − m) + (a − b)k 1 [tn−m+ak−bk ] = (1 − t)m+1+bk [tn ] tm (1 − t)m+1 tb−a (1 − t)b k . i. SIMPLE BINOMIAL COEFFICIENTS Proof: By the preceding theorem.

1). 2k 2k = α α−1 ± . − √ = Let us begin our set of examples with a simple sum. corresponds to the Riordan array (tm /(1 + t)m+1 .1) and √ (1 + t − 2 t)z+1 (5. 2 t(1 + t)z+1 n−k by the theorem above. we have: 2n n + k fk = n+k n−k = k k n+k fk + 2k k n−1+k fk = 2k + = = 1 f [t ] 1−t n k + [tn−1 ] = [tn ] 1+t f 1−t 1 f 1−t t (1 − t)2 t (1 − t)2 t (1 − t)2 .2).4. since (1 + t ± 2 t)z+1 = (1 ± t)2z+2 . we used backwards the we have: √ √ bisection rule. m t n−k 1 We solve the following sum by using (5. by the formula concerning the row sums. . if − is considered): α±β α α β For example we find: 2n n + k n+k 2k = = = 2n n + k = n+k n−k n+k n+k−1 + = n−k n−k−1 n+k n−1+k + . the binomial coefficients m 2z + 2 = [t2n+1 ](1 + t)2z+2 = .4. = 2 t By applying formula (5.2): z+1 2k + 1 = z − 2k 2k+1 2 = n−k √ (1 + 2 y)z+1 − [tn ](1 + t)z √ 2 y Hence. m−1 In the following sum we use the bisection formulas.4. by formula (5.2): = [tn ] = m (1 − t)m+1 1 − t k n 2n − 2k (−2)k = 1 n+1 n−m k m−k . The following sum is a more interesting case. = [t ] = k m+1 (1 − t)m+2 t = [tm ](1 + t)2n (1 − 2y)n y = = Another simple example is the sum: (1 − t)2 n n 5k = = [tm ](1 + t2 )n = 2k + 1 m/2 k = = [tn ] t2 1 t y= = (1 − t)2 1 − 5y (1 − t)2 1 n 2t [t ] = 2n−1 Fn . 1). From the generating function of the Catalan numbers we immediately find: n+k m + 2k = = 2k (−1)k = k k+1 √ tm 1 + 4y − 1 [tn ] m+1 (1 − t) 2y [tn−m ] × 1 (1 − t)m+1 2 5. β β−1 y= t = (1 − t)2 × 1+ 4t −1 (1 − t)2 = (1 − t) = 2t 1 = [tn−m ] (1 − t)m n−1 .60 CHAPTER 5. The binomial coefficient m+bk is so gen(1 + t + 2 t)z+1 √ − = [tn ](1 + t)z+1 eral that a large number of combinatorial sums can 2 t(1 + t)z+1 be solved by means of the two formulas (5. RIORDAN ARRAYS √ (1 − 2 y)z+1 t If m and n are independent of each other. 2 1 − 2t − 4t2 where the binomial coefficient is to be taken as zero when m is odd. 2n + 1 therefore.5 Other Riordan arrays from binomial coefficients k Other Riordan arrays can be found by using the theorem in the previous section and the rule (α = 0. these y= − = √ relations can also be stated as generating function 2 y (1 + t)2 n+ak √ identities. Because the generating function for z+1 2k is (1 + k 2t)z+1 . in the last but one passage.4. we have: z+1 G 22k+1 = 2k + 1 √ √ 1 √ (1 + 2 t)z+1 − (1 − 2 t)z+1 .4.

The following one is known as Riordan’s old identity: n n−k (a + b)n−2k (−ab)k = an + bn k n−k k k 1+t 1 √ = [tn ] 1−t 1 + 4y = [tn ]1 = δn.1) by using the change of variable t → pt and prove other formulas. let us suppose that k is This gives an immediate proof of the following for. For example: 2n n + k n+k n−k 2k (−1)k = k t = y= (1 − t)2 61 where φ is the golden ratio and φ = −φ−1 . By formula (5. Let f (t) = G(fk ) and: Binomial coefficients and the LIF In a few cases only. w(t) = . In fact.6 The following is a quite different case.6. and try to mula known as Hardy’s identity: find a true generating function. when the left-hand side is 1 and the to know whether it corresponds to a Riordan array or not.1) d(t) function of a Riordan array because it varies as n varies. the formulas of the previous sections give the desired result when the m and n in t f (τ ) − f0 fk the numerator and denominator of a binomial coeffi= dτ. acn−k n−k n−k−1 = cording to the diagonalization rule. in that case. We have to solve the n−k n k equation: (−1) = n−k k k w = tφ(w) or w = t(1 + w)2 . 1 − t2 n = [t ] ln = 1 + t3 This equation is tw2 − (1 − 2t)w + t = 0 and we are 1 1 looking for the unique solution w = w(t) such that − ln = = [tn ] ln 3 2 w(0) = 0.0 + δn. 0 we have to extract the coefficient of tn from a funcObviously we have: tion depending on the same variable n (or m). G(t) = G k τ cient are related between them.0 .1): 2n − k n n−k = [tn−k ](1 + t)2n−k = fk = n−k n−k k k = f0 + n ∞ The function (1 + t)2n cannot be assumed as the = f0 + n[t ]G . The reader can generalize formula (5. This is: 1+t 1−t √ (−1)n 2/n if 3 divides n 1 − 2t − 1 − 4t = (−1)n−1 /n otherwise. 2n k 2k (−1)k = k k+1 k √ t 1+t 1 + 4y − 1 y= [t ] = 1−t 2y (1 − t)2 [tn ](1 + t) = δn. k n−k 1 k n−k = φn + φn n F (w) = w 1+w k = .4. 2n n + k n+k n−k = = n while this is a generalization of Hardy’s identity: n n − k n−2k x (−1)k = k n−k √ √ x2 − 4)n + (x − x2 − 4)n .5.5. we can apply the diagonalization rule with F (t) = (t/(1 + t))k and φ(t) = (1 + t)2 .fixed. (5. This requires to apply the Lagrange Inversion Formula.5. BINOMIAL COEFFICIENTS AND THE LIF This proves that the infinite triangle of the elements 2n n+k is a proper Riordan array and many identin+k 2k ties can be proved by means of the previous formula. We have: right-hand side is not defined. Let us suppose k k−1 k we have the binomial coefficient 2n−k and we wish n−k except for k = 0. = (x + 5.1 . 2t We also immediately obtain: We now perform the necessary computations: n k=1 n − k − 1 fk = k−1 k t2 1−t = [tn ](1 + t)2n t 1+t k . Therefore.

which therefore is a proper Riordan array. i. which go from the point (x. c (a > As a check. east steps. b.k∈N . y). = [tn ] A colored walk is a walk in which every kind of step can assume different colors.e.ored walks of length n + 1 reaching the distance k + 1 ternating sum: from the main diagonal. walks.e. c} is the A-sequence of (dn. y) to (x + 1. we observe that column 0 contains all the 0. This significant fact can be stated as: . D= √ .k the numn−k k=0 ber of colored walks which have length n and reach a √ 1 − 1 − 4t 1 1 n distance k from the main diagonal. we can We wish to point out that our considerations are show that: by no means limited to the walks on the integral lat2n + ak = tice. y + 1). a walk of length n reaching the distance k from √ 6 1 1 1 − 1 − 4t the main diagonal. trees and Catalan numbers” we introduced the concept of a walk or path on the integral lattice Z2 . = =√ 1 − tφ′ (w) 1 − 2t(1 + w) 1 − 4t Therefore. colored walks. in order n−ck to apply the LIF. y) to (x.k+1 + cdn. followed by any of the b diagonal steps. . n−k 2 1 − 4t In the section “Walks. b.... the diagonalization gives: √ k 2n − k 1 1 − 1 − 4t = [tn ] √ . 2 n 6 The reader is invited to solve. The concept can be generalized by defining a walk as a sequence of steps starting from the origin and composed by three kinds of steps: 1. diagonal elements with k = 0. i.. diagonal steps. walks that never go above the n main diagonal x − y = 0.k+1 = adn. and underA simple example is: diagonal walks. y) to (x + 1.e. This creates many difficulties. 3.62 √ 1 − 2t − 1 − 4t √ 1 − 1 − 4t √ k 1 − 1 − 4t . the corresponding non-alternating sum. but in this case. c ≥ 0) the number of colors the east. the number of colAn interesting example is given by the following al. a problem is called symmetric if and 1 − 4t 1 − 4t 1 − 4t only if a = c. we denote by a. We observe that each walk 2n is obtained in a unique way as: k (−1) = n − 3k k 1. b. By using the diagonalization rule as above. 2t 1 − 4t 1 + y 2. 2t 1 − 4t Let us consider dn+1.k + bdn.k+1 . tc−1 = literature exists on walk problems. In the same way we can deal with binomial coefficients of the form pn+ak . 2n . followed by any of the c north steps.k+2 . followed by any of the a east = [tn ] √ y = t3 steps. which go from the point (x. which go from the point (x.k )n. we have to solve an equation of degree p > 2. We discuss complete colored dance with the generating function d(t) = 1/ 1 − 4t.e. and we do not insist on it any longer.e. and we denote by dn. i. walks without any restriction. RIORDAN ARRAYS = = = 5. The length of a walk is the 2n − k k 2 = number of its steps. north steps. Many combinatorial problems can be proved n − ck k∈N to be equivalent to some walk problems. 2. Hence we have: dn+1.. and this is in accorn √ and north steps can be. A colored walk 1 1 1 problem is any (counting) problem corresponding to √ = [tn ] = [tn ] √ = 4n . This proves that A = {a. a vast 1 − 1 − 4t 1 √ . y + 1). i. 1−t 1 √ + = 2 1 − 4t 2(1 − 3t) δn. in a similar way. bracketing √ a+2c problems are a typical example and.0 1 2n + 3n−1 + = . 2t 1 − 4t furthermore: 1 1 1 .7 Coloured walks This shows that the binomial coefficient is the generic element of the Riordan array: √ 1 − 1 − 4t 1 . in fact. i. a walk of length n reaching the distance k + 2 from the main diagonal. the last step y= = [t ] √ = 2 1 − 4t 1 − 2y ends on the diagonal x − y = k ≥ 0. 2 k CHAPTER 5. 3. a walk of length n reaching the distance k + 1 from the main diagonal.

c. the formulas for the row sums and the d(t. Since a complete walk can relation we easily find d(t) = (1/a)h(t).. is the same in both cases.k )n.7. . Con√ 2 sequently. Hence. In fact. and we can find it lows: by means of formula (5. w) = + b + aw tn = w n 1.1 2ct2 ∆ and in the column generating functions this corresponds to d(t) − 1 = btd(t) + ctd(t)th(t).k be the number of colored walks of length n reaching a distance k from the main diagonal. From this The proof is as follows.k∈N is a proper Riordan array. in which walk is: k can also assume the negative values.k )n. that only depends on the A.k . divided by αn .k )n. major importance is usually n c given to the following three quantities: d(t. . we assume c = 0.k = k a b 2at (a + b + c)t − 1 and so we end up with the Pascal triangle. 4at (a + b + c)t − 1 walks and another from underdiagonal walks.3.and we only have to derive the form of the corretained from another walk returning to the main diag. then the infinite triangle (dn.k∈N = .0 + cdn. b. 63 weighted row sums given in the first section allow us to find the generating functions α(t) of the total number αn of underdiagonal walks of length n. In Chapter 7 we will learn how to find an asymptotic approximation for dn . (dn. . the value of the row sums If we expand this expression by partial fractions. erating function of the nth row is ((c/w) + b + aw)n .7. i. COLOURED WALKS Theorem 5. With regard to the last two points.0 = bdn. we have dn+1.1) the Riordan array of underdiagonal colored only the right part of an infinite triangle. we get: of the Riordan array. we observe that every The study of complete walks follows the same lines walk returning to the main diagonal can only be ob. 1 1 − (b + 2a)t − ∆ n k n−k α(t) = When c = 0. However.k∈N is by (5. this is dn = [tn ]d(t). the average distance from the main diagonal of n all the walks of length n. By following √ √ the logic of the theorem above. 3. the total number of walks of length n. A(t) = a+bt+ct2 and h(t) is the solution of the functional equation 1 − (b − 2a)t 1 −1 α(t) = h(t) = a + bth(t) + ct2 h(t)2 having h(0) = 0: 2at 1 − (b + 2a)t √ 1 − bt − 1 − 2bt + b2 t2 − 4act2 h(t) = (5.k )n.k .In the symmetric case these formulas simplify as folsequence. Catalan and Motzkin triangles define √ walking problems that have different values of a. for every n. the array (dn. this is n αn = k=0 dn.k∈N = √ .5. the number of walks returning to the main diag1 onal. which is: onal followed by any diagonal step. sometimes have some combinatorial significance as Let us now focus our attention on underdiagonal well. walks.7. and δ(t) of the total distance δn of these walks from the main diagonal: The Pascal.sponding Riordan array. we see that the gen1 − bt − ∆ 1 − bt − ∆ (dn. For any given triple 1 1 − (b + 2a)t − ∆ (a. b. north step. which is the weighted row sum of the Riordan array.2). If we consider dn+1. 2at 1 − (b + 2a)t 1 − (b + 2a)t The radicand 1 −√ + (b2 − 4ac)t2 = (1 − (b + 2bt √ 2 ac)t)(1 − (b − 2 ac)t) will be simply denoted by The alternating row sums and the diagonal sums ∆. w) = = 1 √ ∆ 1− 1 √ ∆ 1− √ 1−bt− ∆ w 2ct 1 − 1− √ 1−bt+ ∆ w 2ct 1 √ + 1−bt− ∆ w 2ct 1 1 − bt − + 2at √ ∆1 w1− √ 1−bt− ∆ 1 2ct w 1 .0 . = 1 − (aw + b + c/w)t 2. the function h(t).7. or a walk ending √ at distance 1 from the main diagonal followed by any 1 1 − bt − ∆ .1 Let dn. this is δn = k=0 kdn. 2act2 2ct2 and therefore the bivariate generating function of the extended triangle is: In current literature.1) 1 1 − (b − 2a)t 1 − bt 2ct2 δ(t) = − . 
and so they can be treated in the same way.e. and therefore go above the main diagonal. c) we obtain one type of array from complete δ(t) = . it is easily proved that dn.

k . by specializing the (k + 1)et (et − 1)k = recurrence relation to k = 1. RIORDAN ARRAYS The first term represents the right part of the ex.1 + dn. We can now go on n n n+1 by setting k = 1 in the recurrence. Obviously.k .2 + 2dn.k+1 + (k + 1)dn.k+1 = (k + 1)dn. By the induction hypothesis. if we multiply the k+1 k k+1 n m tm = (1 − t)(1 − 2t) · · · (1 − mt) This proves. Let us now proceed as above and find the column generating functions for the new array (dn. the generating functions: recurrence relation gives: d′ (t) = (k + 1)dk+1 (t) + k+1 (k + 1)dk (t). if he or she prefers).2 = 2dn. This gives the relation between fact can be obtained by mathematical induction. We are interested in the right part.1 = dn. We multiply the basic The generating functions for the Stirling numbers recurrence: of the first kind are not so simple. If we examine the two in(n + 1)dn+1. substitute dk (t) = (et − 1)k and solve the differential t equation thus obtained.k )n. and the = (n + 1)! k + 1 expression can be written as: 1 √ ∆1− 1 √ If we denote by dn. we t/(1−t).64 CHAPTER 5. which is easily checked by looking at column have: 1 in the array. let us go on with the Stirling numbers of the second kind n n n+1 + =n proceeding in the following way. as we shall which is now proved by induction when we specialize see in the next section. we immediately realize that they are not Riordan ar′ rays. However. we obtain: (n + + = (k + 1) 1)dn+1. can specialize the recurrence (valid for every n ∈ N) we have: dk (t) = (et − 1)k . the recurrence relation above to k = m. in general. Again. The solution of this simple differential equation ing functions for the Stirling numbers of the second is d1 (t) = et − 1 (the reader can simply check this kind. this is a k which immediately gives the form of the Riordan ar. d0 (t) = 1.k∈N is a Riordan array having d(t) = 1 and th(t) = (et − 1). we immediately obtain S1 (t) = verify that dk+1 (t) = (et − 1)k+1 .k the quantity k! n /n!. that n = 2n−1 −1. (n + 1)dn+1. we can S1 (t) − S1 (0) = S1 (t) + S0 (t).recurrence relation by (k + 1)!/(n + 1)! we obtain the tended triangle and this corresponds to k ≥ 0. by substituting. In practice. n+1 5. This is left For the Stirling numbers of the first kind we proto the reader as a simple exercise. In a similar way. we d2 (t) = (et − 1)2 . which can be written as: ray. by starting with the recurrence relation: solution. ceed in an analogous way. (1 − t)(1 − 2t) and this equality is obviously true. This fact allows us Sm (t) = G n∈N to prove algebraically a lot of identities concerning the Stirling numbers of the second kind.8 Stirling numbers and Riordan arrays proves that (dn. A rigorous proof of this to the case k = 0.k )n. It is not difficult to obtain the column generat. whose solution is: S2 (t) = t2 .recurrence relation for dn. new relation: whereas the second term corresponds to the left part (k + 1)! n + 1 (k < 0).and passing to generating functions: d1 (t) = d1 (t) + 1.1 . this differential equation has the solution and the obvious generating function S0 (t) = 1. in an algebraic way.k∈N . and this suggests that. we find S2 (t) = 2tS2 (t)+ (k + 1)(et − 1)k+1 + (k + 1)(et − 1)k tS1 (t). we can simply because S1 (0) = 0. or d′ (t) = 2d2 (t) + 2(et − k+1 k k+1 2 1). 2 and also indicates the form of the generating function for column m: n∈N .0 finite triangles of the Stirling numbers of both kinds. by setting k = 0 in the new The connection between Riordan arrays and Stirling recurrence: numbers is not immediate. 
The form of this generating function: dk (t) = G k! n! n k = (et − 1)k 1−bt− 2ct 1 =√ ∆ ∆ w k 1 − bt − 2ct √ ∆ k w k = (k + 1)! n! n k+1 k+1 k! + n + 1 n! n k k+1 .

This suggests the general formula: fk (t) = G k! n n! k = n∈N ln 1 1−t k and again this can be proved by induction. therefore we have: On n! n = k=0 k! n! n k = y = et − 1 = [tn ] 1 . ′ This is equivalent to f1 (t) = 1/(1 − t) and because f1 (0) = 0 we have: (k + 1)! n n k! n k + 1 + .k+1 + (k + 1)fn.k )n. For the (k + 1)! n + 1 = Stirling numbers of the first kind we have: (n + 1)! k + 1 = that is: (n + 1)fn+1. The row sums of the Stirling numbers of the second kind give. we [t ] = . A(t) = m k ln(1 + t) k Identities involving Stirling numbers the = [tn ] 1 1−y . = tA ln ln [t ] (ey − 1)m y = ln = = 1−t 1−t m! 1−t tm n! n − 1 n! n By setting y = ln(1/(1 − t)) or t = (ey − 1)/y. as we know. IDENTITIES INVOLVING THE STIRLING NUMBERS 65 by (k + 1)!/(n + 1)! and study the quantity fn. We also defined the ordered Bell numbers as On = n n k=0 k k!.k+1 = nfn. the A-sequences tween them in various ways. we find that the A-sequence for the Besides. we obtain: ′ ′ f1 (t) = tf1 (t) + f0 (t).5. For example. The first one is proved in this way: kind is: k n t (−1)n−k = . whose solution is: f2 (t) = 1 ln 1−t 2 1 .k∈N is the Riordan array having d(t) = 1 and th(t) = ln(1/(1 − t)).9. thus we can obtain the (exponential) generating function for these numbers: n f1 (t) = ln By setting k = 1. once we know their h(t) function. k + 1 n + 1 n! k n + 1 n! n k=0 n k n = n! k=0 k! n 1 = n! k k! 1 = 1−t = n![tn ] ey y = ln = n![tn ] 1 = n! 1−t as we observed and proved in a combinatorial way. Bn n! therefore we have: G = exp(et − 1). two orthogonality relations exist between triangle related to the Stirling numbers of the second Stirling numbers.k . (fn. because they do not corStirling numbers of the two kinds are related berespond to A-sequences..k = A first result we obtain by using the corresponk! n /n!: dence between Stirling numbers and Riordan arrays k concerns the row sums of the two triangles. = y y have A(y) = ye /(e − 1) and this is the generating m! (1 − t)m m! m − 1 function for the A-sequence we were looking for. we have: for the two arrays can be easily found. For the Stirling numbers of the k! n m! k n k n! = = first kind we have to solve the functional equation: m! n! k k! m k m k k 1 1 1 n! n . In this case also we have f0 (t) = 1 and by specializing the last relation to the case k = 0. we find the simple differential equa′ ′ tion f2 (t) = tf2 (t) + 2f1 (t). 1−t k=0 n k n = n! k=0 k! n! n k 1 = k! = n![tn ] ey y = et − 1 = = n![tn ] exp(et − 1).k do not 1 On = . In a similar way. In this case. the Bell numbers. However.k and fn. . 2 − et 5.9 We have thus obtained the exponential generating function: The two recurrence relations for dn. G give an immediate evidence that the two triangles n! 2 − et are indeed Riordan arrays.

k The second orthogonality relation is proved in a similar way and reads: n k k (−1)n−k = δn. For n = 0 we have: n Bk = B0 = 1.m . this holds for n > 0. For the Stirling numbers of the first kind we have the identity: n k=0 n Bk k n = n! k=0 k! n Bk = n! k k! y ey − 1 y = ln 1 = 1−t = n![tn ] . y = et − 1 = We conclude this section by showing two possible connections between Stirling numbers and Bernoulli numbers. m! Clearly. We can now prove these identities by using a Riordan array approach.m . RIORDAN ARRAYS n! m! k! n m! n! k k! k m (−1)k = 1−t 1 ln = t 1−t (n − 1)! 1 = . m k=0 k We introduced Stirling numbers by means of Stirling identities relative to powers and falling factorials. In fact: n k=0 n (−1)n−k xk = k n = = = and: n (−1)n n! k=0 k! n xk (−1)k = n! k k! (−1)n n![tn ] e−xy y = ln 1 = 1−t x (−1)n n![tn ](1 − t)x = n! = xn n n k=0 n xk = n! k = n![tn ] (1 + y) k=0 x k! n! n k x k = = n![tn ]etx = xn . = n![tn+1 ](1 − t) ln 1−t n+1 = n![tn ] n k 1 n! n [t ] (e−y − 1)m y = ln = (−1) m! 1−t n! (−1)n [tn ](−t)m = δn.66 = = = (−1)n n CHAPTER 5. First we have: n k=0 n k (−1)k k! = n! k+1 n k=0 k! n! n k (−1)k = k+1 1 1 = n![tn ] − ln y 1+y t = n![tn ] t = Bn e −1 y = et − 1 = which proves that Bernoulli numbers can be defined in terms of the Stirling numbers of the second kind.

or prefix of w. Usually. Observe During the 1950’s. z2 ) of words in T ∪ N . An The basic definition concerning formal languages alphabet is a finite set A = {a1 . . Given a grammar G = (T. . is called the “free monoid” generated by A. in other words. φm } is the alphabet of nonterminal symbols. If w ∈ A∗ and z is a word such that w can be decomposed w = w1 zw2 (w1 and/or w2 possibly empty). N. A production is a pair (z1 . the empty word included. associativity: w1 w2 w3 . . Therefore. and therefore a symbols. To understand this point. ǫ is the identity or neutral element: ǫw = wǫ = w. . . . w2 ⊢ w3 . . • σ ∈ N is the initial symbol. by construction. let us or suffix of w. P). . . a language on A is any subset proceed in the following way. . Observe that a monoid is an algebraic structure more general than a group. . and is the only word of 0 length. the symbols in T are denoted by lower case Latin letters. such that z1 contains at least a symbol in N .1. σ. then |w1 | < |w1 |. Several occurrence in w. φ2 . w = w1 z1 w2 is the leftmost occurrence of z1 in w and w = w1 z2 w2 . A∗ is the free monoid on A. . if · denotes the juxtaposition. an }. that is the monoid freely generated by the symbols in A. If w ∈ (T ∪ N )∗ . we can apply a production z1 → z2 ∈ P to w whenever w can be decomposed w = w1 z1 w2 . What we get are the words on A. w1 (w2 w3 ) = (w1 w2 )w3 = • N = {φ1 . a2 . • P is a finite set of productions. let us consider the operation of juxtaposition and recursively apply it starting with the symbols in A. which. in which all the elements have an inverse as well.1 Formal languages the length of the word. we also 67 . is indicated by A∗ . Because of that. and if w = w1 z. we will write w = w1 z1 w2 ⊢ w1 z2 w2 when w1 z1 w2 is the decomposition of w in which z1 is the leftmost occurrence of z1 in w. we define the relation w ⊢ w between words w. word w is denoted by w = ai ai . N. we say that z is a head definitions have to be provided before a precise state. w ∈ (T ∪ N )∗ : the relation holds if and only if a production z1 → z2 ∈ P exists such that z1 occurs in w. ws−1 ⊢ ws . an } is the alphabet of terminal written by juxtaposing the symbols. . we recall definitions given in Section 2. If w = zw2 . Algebraically. we say that z is a subword of w. The empty sequence is called the empty word and is conventionally denoted by ǫ. . (A∗ . L ⊆ A∗ . w2 . a2 .Chapter 6 Formal methods say that z occurs in w. and the particular instance of z in w is called an occurrence of z in w. . P). the sequence is • T = {a1 . . has been generated by combining the symbols in A in all the possible ways. We observe explicitly that by our condition that in every production z1 → z2 the word z1 should contain 2. this means that w ⊢∗ w if and only if a sequence (w = w1 . and r = |w| is 1 2 r 6.that if z is a subword of w. and the result is the new word w1 z2 w2 ∈ (T ∪ N )∗ . Finally. The algebraic structure thus obtained has the following properties: 1. the production is often indicated by z1 → z2 . the symbols in N by Greek letters or by upper case Latin letters. . We also denote by ⊢∗ the transitive closure of ⊢ and call it generation or derivation. . First. ·). . if we also have w = w1 z1 w2 . A word on A (T. we say that z is a tail ment of the concept can be given. It is called a monoid. whose is the following: a grammar is a 4-tuple G = elements are called symbols or letters. σ. it can have more than one troduced the concept of a formal language. ai . ws = w) exists such that w1 ⊢ w2 . 
The set of all the words on A. . . the linguist Noam Chomski in. and by A+ if the empty word is excluded. and the juxtaposition can be seen as an operation between them. its length is obviously 0. where: is a finite sequence of symbols in A. .

.For example. finally define the language generated by the grammar G as the set: ii) in every prefix z of w the number of a’s is not less than the number of b’s. By the induction hypothesis. Besides. az1 where z1 is a prefix of w1 . By i) such occurrence of b must exist. the word 111 is generated by the two ure 6. the generation Dyck language D if and only if: should stop. the most important class of this kind is the class of “context-free languages”. b}. We have thus obtained a decomposition of w showing that the second production has been used. the largest class of sets which can be constructed recursively.e. In Fig. then by ii) w should begin by a. a word w ∈ T ∗ is in L(G) if and only if we can generate it by starting with the initial symbol σ and go on by applying the productions in P until w is generated. we can give a grammar for them). An example of an ambiguous grammar is G = (T. it is terminal. the theorem shows that the words in the Dyck language are the possible parenthetizations of an expression. it also holds for w. let us suppose that i) and ii) hold for w ∈ T ∗ . We will see how this n result can also be obtained by starting with the definition of the Dyck language and applying a suitable and mechanical method. The naming “context-free” derives from this definition. people have studied more restricted classes of languages. sometimes. however. P) where T = {1}. i. σ. which are defined through a non-ambiguous context-free grammar. because a production z1 → z2 is applied whenever the non-terminal symbol z1 is the leftmost non-terminal symbol in a word.1 we draw the generation of some words in the following leftmost derivations: Dyck language. b} belongs to the during a generation. if we suppose that i) holds for w1 and w2 . in finite terms. we can be unlucky and an infinite process can be necessary to find out that w ∈ S. never generating a word on T .. Otherwise. FORMAL METHODS ∗ at least a symbol in N . the generation stops. Note that.1 A word w ∈ {a. any prefix z of w must have one of the forms: a. . The method can be applied to every set of objects. If we substitute the letter a with the symbol ‘(′ and the letter b with the symbol ‘)’. N. ∗ ∗ L(G) = w ∈ T σ ⊢ w i. Proof: Let w ∈ D. Surely. this is not a problem: it only means that such generations should be ignored. if a word wi ∈ T ∗ is produced Theorem 6.. 6. N = {σ} and P contains the two productions: σ→1 σ → σσ. therefore. aw1 b or aw1 bz2 where z2 is a prefix of w2 . Let T = {a. w2 ∈ D. for which a finite process is always possible for finding out whether w belongs to the language or not. For ii). the number of Dyck words with n pairs of parentheses is the Catalan number 2n /(n+1). ii) should hold for z1 and z2 .g. The recursive nature of the producσ ⊢ σσ ⊢ 1σ ⊢ 1σσ ⊢ 11σ ⊢ 111 tions allows us to prove properties of the Dyck language by means of mathematical induction: σ ⊢ σσ ⊢ σσσ ⊢ 1σσ ⊢ 11σ ⊢ 111. a context-free grammar H is non-ambiguous iff every word w ∈ L(H) can be generated in one and only one way. Because of that. irrespective of the context in which it appears. The set P is composed of the two productions: σ→ǫ σ → aσbσ. being the only non-terminal. looking at them from another point of view. and consequently w1 and w2 must satisfy condition i). σ. P) in which all the productions z1 → z2 in P are such that z1 ∈ N . so that. They are defined in the following way. As a very simple example. Therefore. 
Let us scan w until we find the first occurrence of the symbol b such that w = aw1 bw2 and in w1 the number of b’s equals the number of a’s. σ is the initial symbol of the grammar. let us consider the following grammar. and therefore it is easily proved for w. This grammar is called the Dyck grammar and the language generated by it the Dyck language. by the very construction of w1 and the fact that w satisfies condition ii) by hypothesis. w is generated by the second production and w = aw1 bw2 with w1 .2 Context-free languages The definition of a formal language is quite general and it is possible to show that formal languages coincide with the class of “partially recursive sets”.e. but their construction can go on forever. we i) the number of a’s in w equals the number of b’s.2. If w = ǫ. N = {σ}. known as Sch¨tzenberger u methodology or symbolic method. Vice versa.. At that moment. if we wish to know whether a word w belongs to such a set S. i. if w1 and w2 are not empty.68 CHAPTER 6. In other words. A context-free grammar is a grammar G = (T. A context-free grammar G is ambiguous iff there exists a word w ∈ L(G) which can be generated by two different leftmost derivations. then they should satisfy condition ii). This completes the proof. This means that we can give rules to build such sets (e. By collecting all these definitions. if w = ǫ nothing has to be proved. the generation can go on forever. N.e.

the formal definition of the programming language ALGOL’60 was published. for example. in fact. The ALGOL’60 grammar used. This σ→1 σ → 1σ. FORMAL LANGUAGES AND PROGRAMMING LANGUAGES 69 $$ ǫ $$ $ $$ σ ˆ ˆˆ ˆˆˆ ˆ @ @@ aσbσ h hhh hh hh hh h aaσbσbσ d d aaaσbσbσbσ  d   d aabaσbσbσ  d   d @@ @@ @@@@ abσ     d d abaσbσ     ababσ d d abaaσbσbσ ab     aabσbσ     aabbσ     aabb d d     abab d d ababaσbσ  d   d d d aabbaσbσ Figure 6. generated by some grammar G.of productions. has a unique decomposition according to the productions in G. ductions. if we had several production with the same left hand symbol β → w1 . In general. i. except the empty word. 1 nical device used by ALGOL’60 was the compaction can be uniquely decomposed according to these pro. N. actually. ALGOL’60 has surely been the most influential programming language ever created. Most of the concepts we now find in programming languages were introduced by ALGOL’60. they were the characters punchable on a card. a context-free language is called intrinsically ambiguous iff every context-free grammar generating it is ambiguous. For example.1: The generation of some Dyck words Instead. a program in ALGOL’60 is a word generated by a (rather complex) context-free grammar. The previous example concerning program makes surely sense.3. as terminal symbol alphabet. w2 ∈ D. Another techIt is a simple matter to show that every word 11 . the language generated by the previous ambiguous grammar is {1}+ . it can also be generated by some non-ambiguous grammar. Because of the connection between the Sch¨tzenberger methodology and non-ambiguous u context-free grammars.. we are not interested in these aspects of ALGOL’60. fortunately. . β → wk . In practice.3 Formal languages and programming languages In 1960. given any word w ∈ D. whose initial symbol is program . Here. although it was actually used only by a very limited number of programmers. w can only be generated in a single way. the set of all the words composed by any sequence of 1’s. . The non-terminal symbol notation was one of the most appealing inventions of ALGOL’60: the symbols were composed by entire English sentences enclosed by the two special parentheses and . but we wish to spend some words on how ALGOL’60 used context-free grammars to define its syntax in a formal and precise way. the Dyck grammar is non-ambiguous. w = ǫ. β → w2 . it is not an ambiguous language and a non-ambiguous grammar generating it is given by the same T.6. then the grammar cannot be ambiguous. they are not very frequent. as we have shown in the proof of the previous theorem. of which. having w1 . if a language is generated by an ambiguous grammar. For the sake of completeness. . σ and the two productions: 6. PASCAL and C are direct derivations.e. allowed to clearly express the intended meaning of the non-terminal symbols. the input mean used at that time to introduce a program into the computer. the characters available on the standard keyboard of a computer. there is only one decomposition w = aw1 bw2 . It is possible to show that intrinsically ambiguous languages actually exist. we are mainly interested in this kind of grammars. therefore. unless it is intrinsically ambiguous. This definition stresses the fact that. . actually. . . . if we show that any word in a context-free language L(G).

σ = φ and P contains: φ→1 φ → φ1 φ → φ01 (these productions could have been written φ ::= 1 | φ1 | φ01 by using the ALGOL’60 notations). or adding a trailing 01 to a word of length n − 2. After having performed these transformations. they are executed before products. the empty word is transformed into 1. 10111. we have shown that the .1 (lines 1 through 6) we show how integer numbers were defined. a word of length n is obtained by adding a trailing 1 to a word of length n−1. σ. 3. the multiplication should be executed before the addition. In fact. this information is passed to the compiler and the multiplication is actually performed before addition. but allows both +0 and −0. we obtain a system of equations. 1. which is easily recognized as the Fibonacci sequence. If powers are also present. Just to do a very simple example. since the program has a single generation according to the grammar. 1. 10101 If we count them by their length. 11011. every | sign is transformed into a + sign. in a sense. and after ALGOL’60 the syntax of every programming language has always been defined by context-free grammars. we get the productions of a nonambiguous context-free grammar G = (T. Besides. it is possible to find this derivation starting from the actual program and therefore give its exact structure. In the same figure. 1}. in Figure 6. starting with the correspond1 − t − t2 ing non-ambiguous grammar and proceeding in a mechanical way. reveals that it is decomposed into the sum of a and b ∗ c. 1} bonacci numbers.this is obviously the generating function for the Fibonacci words are the words on the alphabet {0. according to the rules of Algebra. We conclude by remembering that a more sophisticated approach to the definition of programming languages was tried with ALGOL’68 by means of van Wijngaarden’s grammars. 2. The derivation of the simple expression a + b ∗ c. 4. in a combinatorial way. when we consider them as the initial symbols. but the method revealed too complex and was abandoned. 3. P). This consists in the folu lowing steps: 1. This ability of context-free grammars in designing the syntax of programming languages is very important. 1101 11111. every non-terminal symbol σ ∈ N is transformed into the name of its counting generating function σ(t). every terminal symbol is transformed into t. 101 1111.}. Therefore. This allows to give precise information to the compiler. FORMAL METHODS beginning and ending by the symbol 1 and never containing two consecutive 0’s.4 The symbolic method The Sch¨tzenberger’s method allows us to obtain the solution of which is: u the counting generating function for every nont . line 7 shows the definition of the conditional statements. . 2. A very interesting aspect is how this context-free grammar definition can avoid ambiguities in the interpretation of a program. 1011. CHAPTER 6. and ::= is transformed into an equal sign. This notation is usually called Backus Normal Form (BNF). Fi. This kind of definition gives a precise formulation of all the clauses in the programming language. and the computer must follow this convention in order to create no confusion. For small values of n. This immediately shows. N = {φ}. They are the counting generating functions for the languages generated by the corresponding nonterminal symbols. The definition of the Fibonacci words produces: φ(t) = t + tφ(t) + t2 φ(t) 6. Productions can be easily changed to avoid +0 or −0 or both. . 
11101.70 they were written as a single rule: β ::= w1 | w2 | · · · | wk where ::= was a metasymbol denoting definition and | was read “or” to denote alternatives. which. where T = {0. . we obtain the sequence {0. Let us consider an expression like a + b ∗ c. 8. This definition avoids leading 0’s in numbers. that Fibonacci words are counted by Fibonacci numbers. is directed from the formal syntax of the language (syntax directed compilation). φ(t) = ambiguous language. This is done by the simplified productions given by lines 8 through 11 in Figure 6. Fibonacci words of length n are easily displayed: n=1 n=2 n=3 n=4 n=5 1 11 111. We are now going to obtain the counting generating function for Fibonacci words by applying the Sch¨tzenberger’s method.1. Besides. which can be solved in the unknown generating functions introduced in the first step. N. 5. or of a more complicated expression. Let us begin by a simple example.

the definition of the language is: µ ::= ǫ | cµ | aµbµ if µ is the only non-terminal symbol. b occurring in the words. This is accomplished by modifying the intended meaning of the indeterminate t and/or introducing some other indeterminate to take into account the other parameters. Therefore. and therefore obtain counting generating functions by means of Sch¨tzenberger’s method. b act as parentheses in the Dyck language. as we have already proved by combinatorial arguments. these are words on the alphabet {a. Another example is given by the Motzkin words.e. whose solution is just the generating function of the Catalan numbers. 2t2 71 8 9 10 11 They can be defined in a pictorial way by means of an object grammar: An object grammar defines combinatorial objects instead of simple letters or words. Besides the indeterminate t counting the total number of symbols. we can wish to count the number of pairs a. b. but counts the pairs. most times. For example. In the case of the Dyck language. Let us suppose we wish to know how many Fibonacci words of length n contain k zeroes. because of that. we can wish to count other parameters instead of or in conjunction with the number of symbols. in the Dyck language. the Sch¨tzenberger’s u method gives the equation σ(t) = 1 + tσ(t)2 . From the productions of the Fibonacci grammar. In Sch¨tzenberger’s method. we derive an equation for the bivariate generating function φ(t. beginning: n Mn 0 1 1 2 1 2 3 4 5 4 9 21 6 51 7 127 8 323 9 835 The 6. however. it is rather easy to pass from an object grammar to an equivalent context-free grammar. and this also we knew by combinatorial means.5. Since every word in the Dyck language has an even length. i. we introduce a new indeterminate z counting the number of zeroes. the rˆle of the indeteru o minate t is to count the number of letters or symbols occurring in the generated words. Therefore. the number of Dyck words with 2n symbols is just the nth Catalan number. every symbol appearing in a production is transformed into a t. For example. in which the coefficient of tn z k is just the . THE BIVARIATE CASE 1 2 3 4 5 6 7 digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 non − zero digit ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 sequence of digits ::= digit | digit sequence of digits unsigned number ::= digit | non − zero digit sequence of digits signed number ::= + unsigned number | − unsigned number integer number ::= unsigned number | signed number conditional clause ::= if condition then instruction | if condition then instruction else instruction expression ::= term | term + expression term ::= factor | term ∗ factor factor ::= element | factor ↑ element element ::= constant | variable | ( expression ) Table 6. this means that t no longer counts the single letters. trees the nodes of which have ariety 1 or 2. the definition yields: σ(t) = 1 + t2 σ(t)2 and therefore: σ(t) = 1− √ 1 − 4t2 .1: Context-free languages and programming languages number of Fibonacci words of length n is Fn . An interesting application is as follows. z). c} in which a. Sch¨tzenberger’s method gives the equation: u µ(t) = 1 + tµ(t) + t2 µ(t)2 whose solution is easily found: √ 1 − t − 1 − 2t − 3t2 µ(t) = .2 is obviously equivalent to the context-free grammar for Motzkin words.5 The bivariate case These numbers count the so-called unary-binary trees. 2t2 By expanding this function we find the sequence of Motzkin numbers. the u object grammar in Figure 6. while c is free and can appear everywhere. However.6..

2763932022 . φ(t. 5 5 5 1 − φt (1 − φt)2 n−k−1 1 .72 t CHAPTER 6. = [tn−2k−1 ] = The last two terms are negligible because they rapidly k (1 − t)k+1 tend to 0. therefore we have: Therefore. FORMAL METHODS t   d   d t t  µ t  t ::=  ǫ  t t  µ t  t     t t  µ t  t d d t t  µ t  t Figure 6. The general formula √ kφn. an operator bonacci words with n letters.In the usual mathematical terminology. . we need n ray (t/(1 − t). The equation is: φ(t. the production φ → φ1 increases by 1 the t t2 y length of the words.k 2 n 5− 5 2 we know for the row sums of a Riordan array gives: Zn = k ∼ √ − = n− . z) + t2 zφ(t. column k is shifted down by number of Fibonacci words of length n: k positions (k + 1.k )n.2: An object grammar number we are looking for. 5 5 5 cient. the number of Fibonacci words of length 2 n n containing k zeroes is counted by a binomial coeffikφn. t/(1 − t)). (1 − t − t2 )2 find: t . because (5 − n √ y= = = [t ] 1−t 1−y 1−t 5)/10 ≈ 0. = [tn ] y= = 1 − t (1 − y)2 1−t the production φ → φ01. .k = [t ][z ] 5 1 − φt 1 − φt 1 − t − zt2 1 2 t 1 n−1 t 1 + [t ] − [tn ] = = [tn ][z k ] = 5 (1 − φt)2 5 t2 (1 − φt)(1 − φt) 1 − t 1 − z 1−t 1 1 ∞ k + [tn−1 ] = t2 t k n k 5 z = = [t ][z ] (1 − φt)2 1−t 1−t k=0 1 1 n−1 1 2 = [t ] − √ [tn ] + ∞ 5 (1 − φt)2 1 − φt t2k+1 z k t2k+1 5 5 = [tn ][z k ] = [tn ] = 1 2 1 1 (1 − t)k+1 (1 − t)k+1 k=0 + √ [tn ] + [tn−1 ] . First.k = = This shows that the average number of zeroes grows k k k linearly with the length of the words and tends to t2 1 t become the 27. in reality). By solving this equation.k ≈ φn−1 − √ φn . z) = t + tφ(t.To obtain the average number Z of zeroes.64% of this length. k total number of zeroes in all the words of length n: kφn. The second expression in the derivation shows k that the array (φn.e. increases by 2 the length t3 and introduces a zero. z). the total stretched vertically.k∈N is indeed a Riordan ar.k = k n−k−1 k= k In fact.6 The Shift Operator . i. t = Fn = [tn ] 1 − t − t2 as we were expecting. A more interesting problem is to find the average number of zeroes in all the Fi.k of tn z k in the [t ] (1 − t − t2 )2 following way: 2 1 1 1 t = [tn−1 ] √ − = n k = φn. but does not introduce any 0. Fn 10 5 φ 5 5 n−k−1 φn. which is the Pascal triangle to divide this quantity by Fn ∼ φn /√5.. z) = We extract the coefficient: 1 − t − zt2 t3 n = We can now extract the coefficient φn. we = [tn ] . we count the is a mapping from some set F1 of functions into some 6..

We write: Ef (x) = f (x + 1) Unlike an infinitesimal operator as D. Since the middle of the 19th century. c ∈ R or c ∈ C) we have Ec = c. 6. an increment h is defined and the shift operator acts according to this increment. acting from the set of sequences (which are properly functions from N into R or C) into the set of formal power series (analytic functions). We have already encountered the operator G. So we have: E −1 f (x) = f (x − 1) E −n f (x) = f (x − n) and this is in accordance with the usual rules of powers: E n E m = E n+m for n. It is possible to consider negative powers of E. THE DIFFERENCE OPERATOR other set of functions F2 . An important observation . i. an operator acting in finite terms. changing the value of any function f at a point x into the value of the same function at the point x + 1. in contrast to an infinitesimal operator. the English mathematicians (G. For example. we shall adopt the first. y) = f (x. in order to follow the usual notaEx Ey f (x.. such as differentiation and integration.. Boole in particular) introduced the concept of a finite operator. the operator of differentiation. however. y + 1) This property is particularly interesting from our point of view but.. We can have occasions to use two or more shift operators. We’ll distinguish them by suitable subscripts: Ex f (x. Ef (x) = f (x + h). the operator of indefinite integration. and in that case we can write: Efn = fn+1 E(f (x)g(x)) = Ef (x)Eg(x) 73 Hence. y + 1) tional conventions. i. In that case. functional notation rather than the second one. more specific for sequences.e. The most simple among the finite operators is the identity. y) = f (x + 1.e. this makes no sense and we constantly use h = 1. E (n times). as well. and . which does not change anything. Operationally. we write E n instead The second most important finite operator is the difof EE . we set E 0 = I = 1. that is shift operators related to different variables. denoted by I or by 1. to denote the successive applications of E. the most important finite operator is the shift operator. separating the operator parts from the sequence parts: (E 2 − E − I)Fn = 0 Some obvious properties of the shift operator are: E(αf (x) + βg(x)) = αEf (x) + βEg(x) 1 ∆ x x ∆ m = = 1 1 1 − =− x+1 x x(x + 1) x+1 x x − = m m m−1 The last example makes use of the recurrence relation for binomial coefficients. Other usual examples are D. When considering sequences. . m ∈ Z E n E −n = E 0 = I for n∈Z Finite operators are commonly used in Numerical Analysis.e.6. the ∆xn = Exn − xn = (x + 1)n − xn = recurrence for the Fibonacci numbers is: n−1 n k x 2 E Fn = EFn + Fn k k=0 and this can be written in the following way.7 The Difference Operator The shift operator can be iterated and. . as well. it is defined in terms of the shift 2 and identity operators: E f (x) = EEf (x) = Ef (x + 1) = f (x + 2) and in general: E n f (x) = f (x + n) ∆f (x) = = Ef (x) − If (x) = f (x + 1) − f (x) (E − 1)f (x) = Conventionally. if c is any constant (i. y) = Ey Ex f (x.7. denoted by E. So: ference operator ∆. y) Ey f (x. y) = f (x + 1. and this is in Some simple examples are: accordance with the meaning of I: ∆c = Ec − c = c − c = 0 ∀c ∈ C E 0 f (x) = f (x) = If (x) ∆x = Ex − x = x + 1 − x = 1 ∆x2 = Ex2 − x2 = (x + 1)2 − x2 = 2x + 1 We wish to observe that every recurrence can be written using the shift operator. a finite operator can be applied to a sequence.

In a similar way we have: ∆ f (x) g(x) = = f (x + 1) f (x) − = g(x + 1) g(x) f (x + 1)g(x) − f (x)g(x) + g(x)g(x + 1) f (x)g(x) − f (x)g(x + 1) = + g(x)g(x + 1) (∆f (x))g(x) − f (x)∆g(x) g(x)Eg(x) 1 n (−1)k n! = x+n = x(x + 1) · · · (x + n) k x+k x n k=0 n and therefore we have both a way to express the inverse of a binomial coefficient as a sum and an expression for the partial fraction expansion of the polynomial x(x + 1) · · · (x + n) inverse. x(x + 1) · · · (x + n) The following general rules are rather obvious: (−1)n+1 (n + 1)! = x(x + 1) · · · (x + n + 1) ∆(αf (x) + βg(x)) = α∆f (x) + β∆g(x) The formula for ∆n now gives the following identity: ∆(f (x)g(x)) = n 1 n 1 = E(f (x)g(x)) − f (x)g(x) = ∆n = (−1)n−k E k k x x k=0 = f (x + 1)g(x + 1) − f (x)g(x) = = f (x + 1)g(x + 1) − f (x)g(x + 1) + By multiplying everything by (−1)n this identity can be written as: +f (x)g(x + 1) − f (x)g(x) = = ∆f (x)Eg(x) + f (x)∆g(x) resembling the differentiation rule D(f (x)g(x)) = (Df (x))g(x) + f (x)Dg(x).74 concerns the behavior of the difference operator with respect to the falling factorial: ∆xm = (x + 1)m − xm = (x + 1)x(x − 1) · · · · · · (x − m + 2) − x(x − 1) · · · (x − m + 1) = CHAPTER 6. FORMAL METHODS This is a very important formula. let us iterate ∆ on f (x) = 1/x: ∆2 1 x = = ∆n 1 x = −1 1 + = (x + 1)(x + 2) x(x + 1) −x + x + 2 2 = x(x + 1)(x + 2) x(x + 1)(x + 2) (−1)n n! x(x + 1) · · · (x + n) = x(x − 1) · · · (x − m + 2)(x + 1 − x + m − 1) = = mxm−1 This is analogous to the usual rule for the differentiation operator applied to xm : Dxm = mxm−1 as we can easily show by mathematical induction. many formal properties of the dif. In As we shall see.fact: ference operator are similar to the properties of the (−1)n n! differentiation operator. ∆2 = (E − 1)2 = E 2 − 2E + 1 and in general: n As the difference operator ∆ can be expressed in terms of the shift operator E. giving the summation formula: n From a formal point of view. which there(−1)n n! fore assume a central position in the theory of finite = − operators.8 Shift and Difference Operators . so E can be expressed in terms of ∆: E =∆+1 This rule can be iterated. and it is the first example for the interest of combinatorics and generating functions in the theory of finite operators. 6. we have: E n = (∆ + 1)n = k=0 n ∆k k ∆n = = (E − 1)n = n k=0 n (−1)n−k E k = k which can be seen as the “dual” formula of the one already considered: n (−1) n k=0 n (−E)k k ∆ = k=0 n n (−1)n−k E k k . In fact. The rˆle of the powers xm is ∆n+1 1 = o − x (x + 1) · · · (x + n + 1) however taken by the falling factorials.Example I = The difference operator can be iterated: ∆2 f (x) = ∆∆f (x) = ∆(f (x + 1) − f (x)) = = f (x + 2) − 2f (x + 1) + f (x).

SHIFT AND DIFFERENCE OPERATORS . which may have combinatorial significance. Here we record some typical examples. x+n ∆n xHx 2∗ ) A somewhat similar situation. we should have: ∆0 (p + (Hx + Hm − Hm−k ) = k m−k x)/(m + x) = (p − m)(m + x). a similar situation arises and by performing the sums on the left containing whenever we have ∆0 = I. 1) The function f (x) = 1/x has already been developed. however. ∆ ∆n = = k m (−1)k = m+n 3) Another version of the first example: −p 1 = ∆ px + m (px + m)(px + p + m) 1 ∆n = px + m (−1)n n!pn = (px + m)(px + p + m) · · · (px + np + m) k n x+k (−1)k Hx+k = k m = (−1)n x (Hx + Hm − Hm−n ) m−n n x According to this rule. but a bit more complex: p+x ∆ m+x ∆n p+x m+x = = m−p (m + x)(m + x + 1) m+x+n m−p (−1)n−1 m+x n −1 −1 k n (−1)k (x + k)Hx+k = k = 1 x+n−1 n−1 n−1 −1 k p+k n m−p m+n (−1)k = k n m+k m n k m+k k −1 k n (−1)k x + k − 1 k k−1 k−1 = −1 = (n > 0) (x + n) (Hx+n − Hx ) − n x Hx m x Hx m x 1 Hx + m m−1 x (Hx + Hm − Hm−n ) m−n 6) Harmonic numbers and binomial coefficients: (see above).EXAMPLE I The evaluation of the successive differences of any function f (x) allows us to state and prove two identities. Hx and Hm : n!pn n (−1)k = x n m(m + p) · · · (m + np) k pk + m Hm−k = k m−k k k (−1)k k!pk 1 n x+n = (Hx + Hm − Hx+n ) = k m(m + p) · · · (m + pk) pn + m m k . and therefore x+n = Hx+n we also have to subtract 1 from both members in orm der to obtain a true identity.6. at least partially: ∆ ∆n 1 x 1 x = = −1 x(x + 1) (−1)n x + n (−1)n n! x(x + 1) · · · (x + n) x n −1 75 4∗ ) A case involving the harmonic numbers: ∆Hx ∆n Hx = = 1 x+1 (−1)n−1 x + n n n −1 k n 1 x+n n (−1)k Hx+k = − k n n n (−1)k−1 x + k k k k −1 −1 (n > 0) k=1 = Hx+n − Hx k n (−1)k n! = = k x+k x(x + 1) · · · (x + n) = 1 x+n x n −1 where is to be noted the case x = 0. 5∗ ) A more complicated case with the harmonic numbers: ∆xHx = Hx + 1 = (−1)n x + n − 1 n−1 n−1 −1 k x+k n (−1)k k k −1 = x .8. we have to set ∆0 = I. in the second next k sum. we mark with an asterisk the cases when ∆0 f (x) = If (x).

76 7) The function ln(x) can be inserted in this group: ∆ ln(x) ∆n ln(x) = = ln x+1 x n (−1)k ln(x + k) k k CHAPTER 6. FORMAL METHODS n (−1)k (x + k)m k n mk xm−k k = = (−1)n mn xm−n (x + n)m (−1)n k k 4) Similar sums hold for raising factorials: ∆xm ∆n xm = m(x + 1)m−1 = mn (x + n)m−n = = (−1)n mn (x + n)m−n (x + n)m k n (−1)k ln(x + k) = k = k n k n (−1)k ln(x + k) k k n (−1)k (x + k)m k n mk (x + k)m−k k (−1) k=0 j=0 k+j n k k ln(x + j) = ln(x + n) j k Note that the last but one relation is an identity.Example II ∆n Here we propose other examples of combinatorial sums obtained by iterating the shift and difference operators. 1) The typical sum containing the binomial coefficients: Newton’s binomial theorem: ∆p ∆n px x k n x+k (−1)k k m n k x m−k k = = (p − 1)p (p − 1)n px = (p − 1)n x 6) Another case with binomial coefficients: ∆ ∆n p+x m+x p+x m+x = = p+x m+x+1 p+x m+x+n = = (−1)n p m+n k n (−1)k pk k n (p − 1)k k = pn k k n p+k (−1)k k m+k n k p m+k 2) Two sums involving Fibonacci numbers: ∆Fx ∆n Fx = Fx−1 = Fx−n (−1)n Fx−n k p+n m+n 7) And now a case with the inverse of a binomial coefficient: = ∆ ∆n x m x m −1 k n (−1)k Fx+k k n Fx−k k = − = = Fx+n −1 m x+1 m−1 m+1 −1 (−1)n k x+n m m+n m+n −1 −1 3) Falling factorials are an introduction to binomial coefficients: ∆xm ∆n xm = mxm−1 = mn xm−n k x+k n (−1)k m k = x+n m m+n m+n −1 −1 k m n x+k (−1)k k m+k m+k = x+n m −1 .9 Shift and Difference Operators . 5) Two sums involving the binomial coefficients: ∆ x m x m = = x m−1 x m−n = = (−1)n x+n m x m−n 6.

10. the addition operator can be iterated and often produces interesting combinatorial sums according to the rules: Sn En = = (E + 1)n = k k 2x + 2k 1 n (−1)k = x + k 4k k = = (2n)! 1 2x n!(x + 1) · · · (x + n) 4n x 1 2n 4n n x+n n −1 n Ek k n (−1)n−k S k k = (S − 1)n = k 2x x Some examples are in order here: 1) Fibonacci numbers are typical: SFm S n Fm = Fm+1 + Fm = Fm+2 = Fm+2n n Fm+k k = Fm+2n = (−1)n Fm+n k 1 2x n (−1)k (2k)! k k!(x + 1) · · · (x + k) 4k x = 1 2x + 2n 4n x + n = k 9) Two sums with the inverse of the central binomial coefficients: ∆4x ∆n 4x = 2x x 2x x −1 k n (−1)k Fm+2k k = 2x 4x 2x + 1 x −1 2) Here are the binomial coefficients: S Sn m x m x = = n k m+1 x+1 m+n x+n m x+k = = m+n x+n (−1)n m x+n −1 = −1 1 (−1)n−1 (2n)!4x 2x n n!(2x + 1) · · · (2x + 2n − 1) 2n − 1 2 x n 2x + 2k (−1)k 4k k x+k −1 k = −1 k k m+k n (−1)k x+k k = k 1 n (2k)!(−1)k−1 = k k!(2x + 1) · · · (2x + 2k − 1) k 2k − 1 2 = 4 n (2n)! 2x 1 n n!(2x + 1) · · · (2x + 2n − 1) 2n − 1 2 x 3) And finally the inverse of binomial coefficients: S m x −1 = = m+1 m−1 m x −1 2x + 2n x+n −1 2x −1 x Sn m x n k m−n m+1 m−n+1 x −1 −1 6.10 The Addition Operator k m x+k = m+1 m+n m−n+1 x −1 −1 The addition operator S is analogous to the difference operator: S =E+1 and in fact a simple connection exists between the two operators: S(−1)x f (x) = (−1)x+1 f (x + 1) + (−1)x f (x) = = (−1)x+1 (f (x + 1) − f (x)) = = (−1)x−1 ∆f (x) k n (−1)k m+k k m−k+1 x = m 1 m+1 x+n −1 . the addition operator has not widely considered in the literature. For example: S∆ = (E + 1)(E − 1) = E 2 − 1 = = (E − 1)(E + 1) = ∆S . correspondingly. may obtain some summation formulas of combinatorial interest. THE ADDITION OPERATOR 8) Two sums with the central binomial coefficients: 1 2x 4x x 1 2x ∆n x 4 x ∆ = = 1 2x 1 2(x + 1) 4x x 1 2n (−1)n (2n)! n!(x + 1) · · · (x + n) 4n n 77 Because of this connection. We can obviously invent as many expressions as we desire and. Likewise the difference operator.6. and the symbol S only is used here for convenience.

. which we proved earlier: the variable x.By applying the operator Σ to both sides and exnoted by Σ. FORMAL METHODS This derivation shows that the two operators S and have ∆−1 f (x) = g(x) and the rule of definite summation immediately gives: ∆ commute. which can be stressed by conk sidering the formal properties of Σ. In order to make this point clear. mined. or Σ and dx. We can directly verify this property: S∆f (x) ∆Sf (x) = f (x + 2) − f (x) = (E 2 − 1)f (x) = ∆(f (x + 1) + f (x)) = = f (x + 2) − f (x) = (E 2 − 1)f (x) Consequently. which is constant with respect to the difference of a product. The rule above ∆(f (x)g(x)) = ∆f (x)Eg(x) + f (x)∆g(x) is called the rule of definite summation. if we consider the integer variable k and set a = x and b = x + n + 1: b−1 k n (−1)k Fm+2k k n Fm+k k k=a f (k) = g(b) − g(a) These facts create an analogy between ∆−1 and D . the function C(x) reduces to a conn 1 1 stant. we but these identities have already been proved using observe that g(x) = Σf (x) is not uniquely deterthe addition operator S.78 CHAPTER 6. let us changing terms: consider any function f (x) and suppose that a function g(x) exists such that ∆g(x) = f (x). and plays the same rˆle as the integration cono = E k = [z n ] 1 − z 1 − Ez stant in the operation of indefinite integration. E n+1 − 1 n+1 −1 An important property is summation by parts. the operator ∆−1 is called indefinite summation and is often de. a primitive function for f (x) is any function g (x) such that Dˆ(x) = f (x) or ˆ g D−1 f (x) = f (x)dx = g (x). i.11 Definite and summation Indefinite ∆(g(x) + C(x)) = ∆g(x) + ∆C(x) = ∆g(x) and therefore: The following result is one of the most important rule Σf (x) = g(x) + C(x) connecting the finite operator method and combinaWhen f (x) is a sequence and only is defined for intetorial sums: ger values of x. k=0 The operator Σ is obviously linear: 1 1 1 = = [z n ] − (E − 1)z 1 − Ez 1−z Σ(αf (x) + βg(x)) = αΣf (x) + βΣg(x) 1 1 1 = = [z n+1 ] − This is proved by applying the operator ∆ to both E−1 1 − Ez 1−z sides. In fact. we have: −1 6.e. C(x + k) = C(x). Let us begin with the rule for indeterminate z. and if f (x) is any function. Hence we Σ(f (x)∆g(x)) = f (x)g(x) − Σ(∆f (x)Eg(x)) . First of all. cor= (E − 1)∆ = E−1 responding to the well-known rule of the indefinite We observe that the operator E commutes with the integration operator. ∀k ∈ Z. we have the two summation formulas: ∆n S n E 2n = = (E 2 − 1)n = (∆S + 1)n = k = S(f (x + 1) − f (x)) = n k=0 f (x + k) = g(x + n + 1) − g(x) k n (−1)n−k E 2k k n ∆k S k k This is analogous to the rule of definite integration. on which E operates. the operator of indefinite integration dx is inverse of the differentiation operator D. The fundamental theˆ orem of the integral calculus relates definite and indefinite integration: b a A simple example is offered by the Fibonacci numbers: ∆SFm (∆S)n Fm = Fm+1 = ∆n S n Fm = Fm+n = (−1)n Fm+n = Fm+2n f (x)dx = g (b) − g (a) ˆ ˆ The formula for definite summation can be written in a similar way. If C(x) is any function periodic of period 1.

∆−1 is not easy to com. 1) We have again the partial sums of the geometric series: ∆−1 px = n = = = px = Σpx p−1 px+n+1 − px p−1 px+k k=0 n pk k=0 pn+1 − 1 p−1 (x = 0) Obviously. the rule: k=0 k m k=0 E k = (E n+1 − 1)∆−1 = (E n+1 − 1)Σ 4) The sum of consecutive binomial coefficients: ∆−1 p+x m+x = = p+x m+x−1 =Σ p+x m+x is the most important result of the operator method. into a sum involving the difference of the first factor. falling factorials: = = = raising factorials: = 1 (x − 1)m+1 = Σxm m+1 . Unfortunately. from a more positive point of view. and this is just the operator inverse of the difference. The negative point is that. n In fact. For example. For example. apart from a restricted number of cases. in this way we immediately p+n+1 p − m+n m−1 1 xm+1 = Σxm m+1 (x + n + 1)m+1 − xm+1 m+1 1 (n + 1)m+1 m+1 (x = 0). we n can say that whenever we know. Here is a number of identities obtained by our previous computations. obtain ∆−1 xm the Σ of some function. it reduces the sum of the successive elements p+k in a sequence to the computation of the indefinite m+k k=0 sum. the rule is very fine. in some way or ankm other.5) The sum of pute and. we do not have a simple function and. for each of them. of which we know that the second factor is a difference. k=0 However. there is no general rule allowing us to guess what ∆−1 xm −1 ∆ f (x) = Σf (x) might be. DEFINITE SUMMATION This formula allows us to change a sum of products. very general and (x + k)m completely useless. we can look at the differences computed in 6) The sum of the previous sections and.6. The transformation can be convenient every time when the difference of the first factor is simpler than the difference of the second factor. sometimes. therefore.12. In this rather pesn simistic sense. the sum may not have any combinatorial interest. we have k=0 n solved the problem of finding k=0 f (x + k). this indefinite sum can be transformed into a definite sum by using the first result in this section: b 2) The sum of consecutive Fibonacci numbers: ∆−1 Fx n = Fx+1 = ΣFx = Fx+n+2 − Fx+1 = Fn+2 − 1 (x = 0) Fx+k k=0 n k=a k k m b+1 = (b + 1) − m+1 a b+2 a+1 − a − + m+1 m+2 m+2 and for a = 0 and b = n: n Fk k=0 3) The sum of consecutive binomial coefficients with constant denominator: ∆−1 n k k=0 k m = (n + 1) n+2 n+1 − m+2 m+1 x m = = = x m+1 =Σ x m 6. let us perform the following indefinite summation: Σx x m = x = m+1 x x+1 = x − Σ(∆x) m+1 m+1 x x+1 = x −Σ = m+1 m+1 x x+1 = x − m+1 m+2 Σx∆ 79 have a number of sums.12 n Definite Summation k=0 x+k m n x+n+1 x − m+1 m+1 n+1 m+1 (x = 0) In a sense. an expression for ∆−1 f (x) = Σf (x).

i. and therefore we have a development for Σ: Σ = B1 B2 2 D+ D + ··· = 1! 2! 1 1 1 3 = D−1 − I + D − D + 2 12 720 1 1 D5 − D7 + · · · .13 The Euler-McLaurin Summation Formula f (k) k=0 = 0 f (x) dx − 1 n [f (x)]0 + 2 One of the most striking applications of the finite operator method is the formal proof of the EulerMcLaurin summation formula. The usual form of the theorem: + 1 ′ 1 n n [f (x)]0 − [f ′′′ (x)]0 + · · · 12 720 and this is the celebrated Euler-McLaurin summation formula. let us n! can be interpreted in the sense of operators as a result find an asymptotic development for the harmonic connecting the shift and the differentiation operators. it can be written as: tion f (x) = 1/x. the EulerMcLaurin formula applies to Hn−1 and to the funcIn fact. we immediately have: n−1 n 6. + 30240 1209600 B0 + 1 D k=0 x+k m m m−1 −1 = x−1 m−1 −1 − x+n m−1 −1 . In this sense. which only is defined when we consider a limited number of terms. for approximating an integral by f (x + h) = f (x) + f ′ (x) + f ′′ (x) + means of a sum. The number of terms for which the sum approaches its true value depends on the function f (x) and on the argument x. we recognize the generating function of the Bernoulli numbers. but in general diverges if we let the number of terms go to infinity.e. the Bernoulli numbers diverge to infinity. By inverting.80 n CHAPTER 6. hn (n) f (x) + · · · + ··· + As a simple but very important example. Since 1 = ∆x. we have a formula for the Σ operator: Σ= 1 1 = eD − 1 D D eD − 1 = = k=0 n km k=0 7) The sum of inverse binomial coefficients: ∆−1 n x m −1 = m x−1 m−1 m−1 −1 =Σ x m Now. the formula can be seen as a method for approximating a sum by means of an inh h2 tegral or.12. Hx+k k=0 n Hk k=0 This is not a series development since. and this was just the point of view 1! 2! of the mathematicians who first developed it. It expresses a sum as a function of the integral and the successive derivatives of the function f (x). We have a case of asymptotic development. FORMAL METHODS (x + k)m (x + n)m+1 − (x − 1)m+1 m+1 1 nm+1 m+1 (x = 0). and subtracting 1 from both sides it can be formulated as: ∆ = eD − 1 = . The starting point is the Taylor theorem for the series expansion of a function f (x) ∈ C ∞ . From the indefinite we can pass to the definite sum by applying the general rule of Section 6. Since D−1 = dx.. we have: ∆−1 Hx n = xHx − x = ΣHx = = = (x + n + 1)Hx+n+1 − xHx − (n + 1) (n + 1)Hn+1 − (n + 1) = (n + 1)Hn − n (x = 0). vice versa. 8) The sum of harmonic numbers. for h = 1. giving: Df (x) D2 f (x) + + ··· Ef (x) = If (x) + n n n 1! 2! dx 1 1 1 1 Hn−1 = + − − 2 − x 2 x 1 12 x 1 and therefore as a relation between operators: 1 6 1 120 1 − 4 + − 6 + ··· 720 x 1 30240 x 1 1 1 1 1 1 + − + + − ln n − 2 2n 2 12n 12 120n4 1 1 1 − + + ··· − 6 120 256n 252 − n n D D2 Dn E =1+ + + ··· + + · · · = eD 1! 2! n! This formal identity relates the finite operator E and the infinitesimal operator D. numbers Hn . Since Hn = Hn−1 + 1/n. as we know. a function having derivatives of any order.

2πn n 1 + = e 12n 288n2 × 1− This is the well-known Stirling’s approximation for n!. we can obtain xq . provided that the sum actually converges. . and they can be summed together to form a constant γ. when k=1 mation formula.14 Applications of the EulerMcLaurin Formula 4n √ πn 1− 1 1 + + ··· . 1 + ··· ··· = 360n3 √ 1 1 nn + − ··· . We proceed with the Euler√ 1 1 McLaurin formula in the following way: 1− + = 0. However.mation formula is given by the sum n k p . we observe that as n → ∞ this constant γ is the Euler-Mascheroni constant: n→∞ 81 nn en 1+ 1 1 + + ··· × 12n 288n2 = √ 2πn lim (Hn−1 − ln n) = γ = 0. APPLICATIONS OF THE EULER-MCLAURIN FORMULA In this expression a number of constants appears. after ⌈p⌉ differentiations. we eventually find: Hn = ln n + γ + 1 1 1 1 − + − + ··· 2n 12n2 120n4 252n6 and this is the asymptotic expansion we were looking for. 8n 128n2 e 1 1142n2 − · · · 2 1 288n2 − · · · = Another application of the Euler-McLaurin sumAs another application of the Euler-McLaurin sum. 720 − In this case the evaluation at 0 does not introduce any constant. − 12 360n2 360 + 1 1 12 x − 1 n p(p − 1)(p − 2)xp−3 0 + · · · = 720 np+1 np pnp−1 − + − p+1 2 12 p(p − 1)(p − 2)np−3 − + ···. 12 360 n−1 n 1 1 n n pxp−1 1 − xp dx − [xp ]1 + kp = We can now go on with our sum: 2 12 1 k=1 √ 1 1 1 1 n − +· · · ln n! = n ln n−n+ ln n+ln 2π + − p(p − 1)(p − 2)xp−3 1 + · · · = 2 12n 360n3 720 To obtain the value of n! we only have to take expo1 1 1 pnp−1 np+1 − − np + + − = nentials: p+1 p+1 2 2 12 1 1 nn √ √ p p(p − 1)(p − 2)np−3 n 2π exp exp − ··· n! = − − + en 12n 360n3 12 720 1 2 + ··· = 720 x3 1 1 1 1 = n ln n − n + 1 − ln n + − 2 12n 1 1 1 − + + ···. By adding np to both sides.6.577215664902 . By means of this approximation. which only contains a finite number of terms: n . The first step consists the case of the harmonic numbers: in taking the logarithm of that quantity: n−1 n 1 1 n n kp = xp dx − [xp ]0 + pxp−1 0 − ln n! = ln 1 + ln 2 + ln 3 + · · · + ln n 2 12 0 k=0 so that we are reduced to compute a sum and hence to apply the Euler-McLaurin formula: n−1 n ln(n − 1)! = k=1 ln k = 1 n ln x dx − 1 n [ln x]1 + 2 n = np pnp−1 np+1 + + − kp = Here we have used the fact that ln x dx = x ln x − p+1 2 12 x.919(4) and ln 2π ≈ 0. + + 6. we have the following formula. which is Stirling’s approximation for n!. where q < 0. we it is known that σ = ln 2π. we now show the derivation of the p is any integer constant different from −1. . At this point we can add ln n to both sides and k=0 p(p − 1)(p − 2)np−3 introduce a constant σ = 1 − 1/12 + 1/360 − · · · It is − + ··· not by all means easy to determine directly the value 720 of σ. we can also find the approximation for another important quantity: √ 2n 4πn 2n 2n (2n)! e = = 2n × n n!2 2πn n × = 1+ 1+ 1 24n 1 12n By adding 1/n to both sides of the previous relation. and therefore we cannot observe that: consider the limit 0.14.9189388. Numerically. but by other approaches to the same problem √ If p is not an integer.

. √ 1 1 1 √ = 2 n + K−1/2 + √ − √ + ··· 2 n 24n n k k=1 K−1/2 ≈ −1.4603545 . . in fact. For example. .7). When p > −1 the constant is less important. we have: n √ k= k=1 √ 1 2 √ n n n+ + K1/2 + √ + · · · 3 2 24 n K1/2 ≈ −0. FORMAL METHODS n kp k=1 = has a fundamental rˆle when the leading term o np+1 /(p + 1) does not increase with n. . the sum converges to Kp . when p < −1.82 p(p − 1)(p − 2) ··· 720 np+1 np pnp−1 + + − p+1 2 12 + − The constant: Kp = − 1 1 p p(p − 1)(p − 2) + − + + ··· p + 1 2 12 720 p(p − 1)(p − 2)np−3 + · · · + Kp . . 720 CHAPTER 6. i.2078862 .e. For p = −2 we find: n n k=1 1 1 1 1 1 = K−2 − + 2 − 3 + − ··· k2 n 2n 6n 30n5 It is possible to show that K−2 = π 2 /6 and therefore we have a way to approximate the sum (see Section 2.. In that case.

and therefore we have the folProof: If f (t0 ) < ∞ then an index N ∈ N ex. for every complex number t0 such that |t0 | > R the series does not converge.1 Let f (t) = n fn tn be a power se.proof of the theorem.2 Let f (t) = n fn tn be a power series. we will think of the indeterminate t as of a variable taking its values from C. the series n (−1)n tn does not tend to any limit. A basic result on convergence is given by the following: n=N |fn ||t1 |n ≤ |t1 | |t0 | n ≤ ∞ n=N M |t1 |n = M |t0 |n ∞ n=N <∞ because the last sum is a geometric series with |t1 |/|t0 | < 1 by the hypothesis |t1 | < |t0 |. As we have seen. from now on. for every complex number t0 such that |t0 | < R the series (absolutely) converges and. Obviously. as for n tn /n! = et . when we say that a series does not converge (to a finite value) for a given value t0 ∈ C.Chapter 7 Asymptotics 7. we have pointed out that our approach to power series was purely formal. There are cases for which a series neither converges nor diverges. we can prove that if the series diverges for some value t0 ∈ C.1 The convergence of power series |fn | < M/|t0 |n and therefore: ∞ n=N ∞ fn tn ≤ 1 The uniform convergence derives from the previous proof: the constant M can be made unique by choosing the largest value for all the t0 such that |t0 | ≤ ρ. We will see that this allows us to evaluate the asymptotic behavior of the coefficients fn of a power series n fn tn . and never considered convergence problems. Therefore. This means R n→∞ 83 In many occasions. this implies lim sup n |fn | ≥ 1/R. n |fn ||t0 | < M . when t = 1. Because of that. and diverges iff t0 ∈ C iff f (t0 ) = 0 n limt→t0 f (t) = ∞. as it happens for n n!tn . many times. finite or infinite.lowing formula for the radius of convergence: n ists such that for every n > N we have |fn t0 | ≤ 1 = lim sup n |fn |. for example. Obviously. then there exists a non-negative number R ∈ R or R = ∞ such that: 1. The natural setting for talking about convergence is the field C of the complex numbers and therefore. n a power series f (t) = n fn t converges for some fn tn < ∞. for r < R we have |fn |rn ≤ √ n M /r → 1/r. In fact.1.1. for some finite M ∈ R. The value of R is uniquely determined and is called the radius of convergence for the series. . we always spoke of “formal power series”. we mean that the series in t0 diverges or does not tend to any limit. in fact. then it diverges for every value t1 such that |t1 | > |t0 |. have |fn |rn ≥ 1 for infinitely many n. Besides. this implies that ries such that f (t0 ) converges for the value t0 ∈ C. but now the moment has arrived that we must turn to talk about the convergence of power series. a series can converge for the single value t0 = 0. thus solving many problems in which the exact value of fn cannot be found. the theorem follows. a lot of things can be said about formal power series. the convergence is uniform in every circle of radius ρ < R. Since the first N terms obviously amount to a finite quantity. the asymptotic evaluation of fn can be made more precise and an actual approximation of fn can be found. 2. From the Theorem 7. the previous theorem implies: Theorem 7. In a similar way. for r > R we |t1 | < |t0 |. In all the other cases. M or n |fn | ≤ n Then f (t) converges for every t1 ∈ C such that lim sup |fn | ≤ 1/R. or can converge for every value t ∈ C.

for our expression we have: of convergence. we used it to prove that √ Fn ∼ φn / 5. this is a rough estimate. b are two small parameters with respect to n): Γ(n + a) = Γ(n + b) = na−b 1 + (a − b)(a + b − 1) +O 2n 1 n2 . will be called Bender’s theorem: Γ(n + a) ≈ Γ(n + b) ≈ × 2π n+a n+b 2π n+a e e n+b n+a 1+ n+b 1 12(n + a) 1 12(n + b) × . which is the same as the radius of 2n √ 1/2 g(t) = (1−3t) . therefore we have µn /µn+1 → 1/3 as Γ(n + a) n → ∞. then: n + b b−a a−b (1 + a/n)n+a = e n . if Γ(n + b) limn→∞ gn /gn+1 = b and h(b) = 0.2 The method of Darboux If we limit ourselves to the term in 1/n. In practice. 1− Let us remember that if g(t) has positive real coef. = na−b 1 + √ n+2 2n n2 3 2n + 4 3 = .We now obtain asymptotic approximations in the folficients. |fn | grows as 1/Rn . = ex 1 + µ(t) is R = 1/3. the next sections will be dedicated to this problem. which can make more precise the basic approximation. and many other proofs can be performed by using Newton’s rule together with the following theorem. By Bender’s theorem we find: ≈ Γ(n + b) 1 4 n+2 1/2 [t ](1 − 3t) = µn ∼ − na−b ea b2 b−a a2 2 3 ≈ 1− 1+ = 1+ √ a−b eb e 2n 2n 2n 3 1/2 n+2 (−3) = = − 1 a2 − b2 − a + b 3 n+2 +O . Let us supn+a 1 + a/n pose we wish to find the asymptotic value for the b a b−a Motzkin numbers.84 This result is the basis for our considerations on the asymptotics of a power series coefficients. because it can also grow as n/Rn or 1/(nRn ). ASYMPTOTICS This is a particular case of a more general result due to Darboux and known as Darboux’ method.2. when γ ∈ C is a fixed number and n is large. and many possibilities arise.lowing way: gence of g(t). First of all. We are now in a position to prove the following: 3(2n + 3) n + 2 4 . lim fn+1 =S fn then R = 1/S is the radius of convergence of the series. We begin by proving the following formula for the ratio of two large values of the Γ function (a. then gn /gn+1 tends to the radius of conver. whose generating function is: . Let us apply the Stirling formula for the Γ function: 7.Γ(n + a) ≈ e = n+a (n + b)n+b fore equals the radius of convergence of g(t)). rections cancellate each other and therefore we find: g(t). while h(t) = 1 + t has 1 as radius Therefore. In fact. n+a (1 + b/n)n+b fn ∼ h(b)gn Newton’s rule is the basis for many considerations on asymptotics. instead. therefore. whose relevance was noted by Bender and. n 2n 2 We now observe that the radius of convergence of x2 + ··· . µ(t) = 2t2 x x n+x = exp (n + x) ln 1 + = 1+ For n ≥ 2 we obviously have: n n √ x x2 1 − t − 1 − 2t − 3t2 = exp (n + x) − 2 + ··· = µn = [tn ] = n 2n 2t2 x2 x2 √ 1 = exp x + − + ··· = = − [tn+2 ] 1 + t (1 − 3t)1/2 . we give a simple example. let us show how it is possible to obtain an γ approximation for the binomial coefficient n . However. it implies that. as a first approximation. ≈ 1+ 1− ≈1+ √ 2n 2n 2n 2 1 − t − 1 − 2t − 3t . We conclude by noticing that if: n→∞ CHAPTER 7. h(t) are power series and h(t) has a ran + b b−a (n + a)n+a dius of convergence larger than f (t)’s (which there. The proof of this theorem is omitted 1 + b/n n+b = ≈ here.1 Let f (t) = g(t)h(t) where f (t). the two corTheorem 7.

f (t) = 1/(1 − t2 ) By repeated applications of the recurrence formula has t = 1 and t = −1 as dominating singularities. Observe first that t = 0 is not a pole because: t→0 et lim 1 t = lim t = 1. In order to do this. Therefore. where actually the series does not converge. A more interesting case is f (t) = (et − e)/(1 − t)2 . 2 t→1 1 − t t→1 −1 (1 − t) The generating function of Bernoulli numbers f (t) = t/(et − 1) has infinitely many poles. not necessarily t0 is a pole for f (t). SINGULARITIES: POLES 85 Theorem 7.2 Let f (t) = h(t)(1 − αt)γ .3 Singularities: poles The considerations in the previous sections show how important is to determine the radius of convergence of a series. plicitly that a function can have more than one domΓ(n + 1) inating singularity. In fact.2. These will be called dominating singularities and we observe ex(−1)n (n − γ − 1)(n − γ − 2) · · · (1 − γ)(−γ) = . when we wish to have an approximation of its coefficients. Because our previous consideraProof: We simply apply Bender’s theorem and the tions. We conclude this section by observing that if f (t0 ) = ∞. the generating function of the ordered Bell numbers f (t) = 1/(2−et ) has again an infinite number of poles t = ln 2+2kπi. By this definition. f (t) assumes a well-defined value while f (t) γ which is not a positive integer. For example. notwithstanding the (1 − t)2 . the dominating singularities are t0 = 2πi and t1 = −2πi. we find: cause |1| = |−1|. and therefore f (t) must be different from f (t).3. in this case the dominating singularity is t0 = ln 2. for the moment. on the other hand. In that case. we are now going to look more closely to methods for finding the radius of convergence of a given function f (t). from which the desired formula follows. This is the case of f (t) = 1/(1 − t). while 1/(1 − t)2 has a pole of order 2 in t0 = 1 and 1/(1 − 2t)5 has a pole of order 5 in t0 = 1/2. the radius of convergence can be determined if we are able to find out values t0 for which f (t0 ) = f (t0 ). in the sense that f (t) = f (t) for every t internal to this circle. When The integer m is called the order of the pole. let us limit ourselves to this case. not every and therefore: singularity of f (t) is such that f (t) = ∞. in fact. The series f (t) represents f (t) inside the circle of convergence. and h(t) having a diverges. but. if Γ(n − γ) = t0 is any one of the dominating singularities for f (t).7. in fact: t→1 lim (1 − t) et − e et et − e = lim = lim = −e. So. has a pole of order 1 in t0 = 1. n α Γ(−γ)n1+γ which f (t′ ) = f (t′ ). In . − 1 t→0 e The denominator becomes 0 when et = 1. we have f (t) = 1/(1 − t) and f (t) = 1 + t + t2 + t3 + · · ·. but f (t) can be different from f (t) on the border of the circle or outside it. e2kπi = cos 2kπ + i sin 2kπ = 1. for some |t| > 1. The radius of convergence is always a non-negative real number and we have R = |t0 |. and therefore γ γ(γ − 1) · · · (γ − n + 1) = = the radius of convergence is determined by the singun! n larity or singularities of smallest modulus. the function f (t) = 1/(1 − t) has a pole of order 1 in t0 = 1. and this happens when t = 2kπi. in this case. Therefore. Finally. for every t such that |t| > |t0 | the series f (t) should diverge as we have seen in the previous section. which will be denoted by f (t). we will say that t0 is a pole for n γ(γ + 1) (−1) −1−γ f (t) iff there exists a positive integer m such that: n 1+ = Γ(−γ) 2n lim (1 − αt)m f (t) = K < ∞ and K = 0. A first case is when t0 is an isolated point for which limt→t0 f (t) = ∞. 
t→t0 7. no singularity can be contained in the circle of convergence. radius of convergence larger than 1/α. befor the Γ function Γ(x + 1) = xΓ(x). Therefore. as we shall see. t0 = 1 is a singularity for f (t) = 1/(1 − t). The γ (−1)n Γ(n − γ) = = following situation is very important: if f (t0 ) = ∞ Γ(n + 1)Γ(−γ) n and we set α = 1/t0 . the singularities of f (t) determine its radius of formula for approximating the binomial coefficient: convergence. which goes to ∞ when t → 1. we have to distinguish between the function f (t) and its series development. for example. Then we have: We will call a singularity for f (t) every point t0 ∈ C such that in every neighborhood of t0 there is a t for 1 αn h(1/α) γ which f (t) and f (t) behaves differently and a t′ for fn = [tn ]f (t) ∼ h (−α)n = . which. An isolated point t0 for which f (t0 ) = ∞ is there= (n − γ − 1)(n − γ − 2) · · · (1 − γ)(−γ)Γ(−γ) fore a singularity for f (t).

which have to be sential singularities are points at which f (t) goes to subtracted again. For n = 3 we have the two observe that the radius of convergence of 1/(1 − t) derangements (1 2 3) and (1 3 2). let us consider the generating function for the √ central binomial coefficients f (t) = 1/ 1 − 4t. but t0 is not a pole of order 1 because: √ 1 − 4t = lim 1 − 4t = 0 lim √ t→1/4 1 − 4t t→1/4 CHAPTER 7. a direct application of Bender’s theorem is sufficient. but all the other terms go to ∞. but for n = 2 the permutation (1 2). Finally.Dn = n! − n(n − 1)! + (n − 3)!. Let Dn the number of derangements in Pn . the total number of permutations. n! 1−t n = 1. and this is the way we will use in the following examples. 1+ = lim (1 − t) 2 t→1 1 − t 2(1 − t) Thus we have the new approximation: In fact. the first m − 1 terms tend to 0. We thus obtain: ∞ too fast. (n − 2)! − 3 2 til we study the Hayman’s method. Actually. Therefore. we can Bender’s theorem we have: count them in the following way: we begin by subn! Dn ∼ e−1 or Dn ∼ . 1 m Therefore. As we shall see. For t0 = 1/4 we have f (1/4) = ∞. is acIn order to find the asymptotic value for Dn . giving the approximation: Dn = n! − n(n − 1)!. which goes to ∞ as t → 1. called derangements (see Section 2. t0 = 1 is not a pole of any order. we tually a derangement. we have a total of n(n − 1)! cases. Whenever we have a function f (t) for which a point t0 ∈ C exists such that ∀m > 0: limt→t0 (1−t/t0 )m f (t) = ∞. while e−t converges for every value of t. if the fixed point is 2. these singularities cannot be treated by n n Darboux’ method and their study will be delayed un. this kind of singularity is called “algebraic”.2).86 fact. the mth n term tends to 1/m!. however. and for n = 4 we is 1. In this way. since no fixed point exists. φ 1 − φt 5 . Dn = n! − n(n − 1)! + (n − 2)!.4 Poles and asymptotics We can now go on with the same method. Es. we have again (n − 1)! permutations of the other elements. we subtracted it when In this case we have: we considered the first and the second fixed point.This quantity is clearly 0 and this happens because tion f (t) = exp(1/(1 − t)). k! This formula checks with the previously found values. we have subtracted twice every permutation with at least 2 fixed points: in fact. and then permuting the n − 2 remaining elements. and therefore by permutations. we have (n − 1)! possible permutations. 7. let us consider the func. and the same happens if we try with (1 − 4t)m for m > 1. We obtain the exponential generating function 1 − φt G(Dn /n!) by observing that the generic element in Our second example concerns a particular kind of the sum is the coefficient [tn ]e−t . we added twice permutations we say that t0 is an essential singularity for f (t). tracting from n!. we have now to add permutations with at = lim (1 − t) exp t→1 1−t least two fixed points. Fibonacci numbers are easily approximated: 1 t t = [tn ] ∼ [tn ] 2 1−t−t 1 − φt 1 − φt ∼ t t= n (n − 3)! + · · · = 3 n n! n! n! n! − + − + · · · = n! 0! 1! 2! 3! k=0 (−1)k . written in cycle notation. and eventually arrive to the final value: Dn = n! − n(n − 1)! + − = n (n − 2)! − 2 Darboux’ method can be easily used to deal with functions. ASYMPTOTICS the number of permutations having at least a fixed point: if the fixed point is 1. For G = .with at least three fixed points. By have a total of 9 derangements. 2 Therefore. n! e 1 1 1 [tn ] = √ φn . These are obtained by choosing the two fixed points in all the n possible ways 2 1 1 m + + · · · = ∞. 
whose dominating singularities are poles. the theorem on the generating function for the parA derangement is a permutation without any fixed tial sums we have: point. which is called the inclusion exclusion principle. there is no derangement. For n = 0 the empty permutation is considDn e−t ered a derangement.

which. say t0 . and shows that Bernoulli numbers be. . thus singularity. n! (2πi)n (−2πi)n asymptotic evaluation for the Motzkin numbers. in them the function is one-valued. lim n t→ln 2 t→ln 2 2 − et −et 2 ln 2 k=0 Ck Cn−k defining the Catalan numbers. Let us now see how Bender’s theorem is applied Let us consider the generating function for the Cata√ to the exponential generating function of the ordered lan numbers f (t) = (1 − 1 − 4t)/(2t) and the corBell numbers. but when we 1 1 pass to complex numbers this is no longer possible. When n is odd. by means we have: of Bender’s theorem. tk are all the dominating in every neighborhood the function is two-valued.7.5 Dn . Our choice of the − sign was motivated 1 −1/ ln 2 1 − t/ ln 2 by the initial condition of the recurrence Cn+1 = = lim = . We already observed that ±2πi are the two dominating singularities for the generating function of the In fact. = Actually. except for n = 1. but Principle: If t1 . also for small The period of et is therefore 2πi and ln t is actually values of n. We have shown that the dominating responding power series f (t) = 1 + t + 2t2 + 5t3 + singularity is a pole at t = ln 2 which has order 1: 14t4 + · · ·. then [tn ]f (t) can be the smallest in modulus among these points. in the complex field C. f (t) cannot converge. the expression under the square root should be a negative real number Bernoulli numbers. This shows = lim = −1. which is obviously a very important when we have functions with several one-valued function.ln t + 2kπi. a logarithm is an infinitenumbers of odd index are 0. t2 . When valued function. The points at which a square root becomes 0 are dominating singularity: special points. Bn 1 1 Actually. Finally. all the n n [t ] t = [t ] ∼− . they are both poles of order 1: and therefore f (t) ∈ C\R. in modulus. we can choose the positive value 1 − t/ ln 2 1 1 [tn ] = [tn ] ∼ as the result of a square root. these two values are opposite in sign The same considerations hold when a function conand the result is 0. . A point t0 for which the arcome. is a periodic (−1)k (2π)2k . larger and larger as n increases. In other words. but f (t) can only be a real t(1 − t/2πi) 1 − t/πi number or f (t) does not converge. this confirms that the Bernoulli tains a logarithm. therefore: function: B2k ∼ − 2(−1)k (2k)! . while for t such that |t| > |t0 |. Because we know lim = lim = −1. This formula is a good approximation. we find the asymptotic approximation for one of these two branches coincides with the functhe Bernoulli numbers. f (t) should coincide with a branch of f (t). (2π)2k et+2kπi = et e2kπi = et (cos 2kπ + i sin 2kπ) = et . when the argument is a posAt this point we have: itive real number. and there are two branches defined by the same expression. we 2 − et 1 − t/ ln 2 2 − et consider the arithmetic square root instead of the al1 1 − t/ ln 2 n gebraic square root. t > |t0 |. found by summing all the contributions obtained by we must have the following situation: for t such that independently considering the k singularities. For singularities of a function f (t). They can be treated by Darboux’ method or. we have (2πi)2k = (−2πi)2k = nential. gument of a logarithm is 0 is a singularity for the . . values for which the argument of the root is 0 is a e −1 1 − t/2πi et − 1 (2πi)n A similar result is obtained for the other pole. lim t→−2πi t→−2πi et − 1 et that t0 is a singularity for f (t). |t| < |t0 |. say n = 2k. which can actually be computed as the integer singularities nearest to n!/e. 
because it is the inverse of the expon is even. in the complex field. In fact. a function containing 2 (ln 2)n+1 and we conclude with the very good approximation a square root is a two-valued function. called an algebraic singularity. . which relies on Newton’s rule. Only On ∼ n!/(2(ln 2)n+1 ). for k ∈ Z. consider a t ∈ R. Therefore we have: Every kth root originates the same problem and t(1 − t/2πi) t 1 1 the function is actually a k-valued function.5. directly. we already used this method to find the ∼− − . This allows us to identify the t = ln 2 [t ] = ∼ 2 − et 1 − t/ ln 2 power series f (t) with the function f (t). The following statement is tion defined by the power series. 1 + t/πi t(1 + t/2πi) we conclude that f (t) cannot converge. t→2πi t→2πi et − 1 et that when f (t) converges we must have f (t) = f (t). This is due to the fact that. ALGEBRAIC AND LOGARITHMIC SINGULARITIES 87 Algebraic and logarithmic This value is indeed a very good approximation for 7.

Let us suppose we have the sum: CHAPTER 7. this cannot be eliminated. Formally. where α = 1/t0 and h(t) has a raThere are two singularities: t = 1 is a pole. what is called the principal value for fn . formally. i. this is the only fact distinguishing a logarithmic singularity from an algebraic one. G 2 k 2 1 − t 1 − 2t k=1 If t0 is a dominating algebraic singularity and f (t) = h(t)(1 − αt)p/m . the function h(t) = f (t) − g(t) has coefficients hn that grow more slowly than fn . which slightly modify the general behavior and more accurately evaluate the true value of fn . while t = dius of convergence larger than |t0 |. the quantity gn + hn is a better approximation to fn than gn . Because fn ∼ gn . the successive application of this method of subtracted singularities 1 1 1 1 ln ∼ ln . since O(fn ) = O(gn ). the function has an infinite number of branches. 1 1 n 1 [t ] ln ∼ Sn = Newton’s rule can obviously be used to pass from 2 1 − t 1 − 2t 1 1 2n 1 these expansions to the asymptotic value of fn . this is the case of ln(1/(1 − αt)). for some A. this behavior is only achieved for very large values of n. then we find the expansion: and we wish to compute an approximate value. the factor (1 − 2t) actually reduces the order of growth of the logarithm: −[tn ] 1 − 2t 1 ln = 2(1 − t) 1 − 2t 1 1 = − [tn ](1 − 2t) ln = 2(1 − 1/2) 1 − 2t 2n 2n−1 2n = −2 . 2 1 − t 1 − 2t 1 − 2t Let us therefore consider the new function: h(t) = 1 1 1 1 ln − ln = 2 1 − t 1 − 2t 1 − 2t 1 − 2t 1 = − ln . Since the latter has expansion: smaller modulus. Let us consider as This is not a very good approximation. Suppose we have found. in the previous section. Therefore. that a function f (t) is such that f (t) ∼ A(1−αt)γ . If t0 is a dominating pole of order m and α = 1/t0 . .. Therefore. In every neighborhood of t0 . When we need a true approximation. we can express the coefficients fn in terms of t1 as well. but the method of subtracted singularities allows us to obtain corrections to n 2 4 8 2n−1 1 2k the principal value.6 Subtracted singularities The methods presented in the preceding sections only give the expression describing the general behavior of ∞ the coefficients fn in the expansion f (t) = k=0 fk tk . Because of that. 2(1 − t) 1 − 2t The generic term hn should be significantly less than fn . f (t) = n k (1 − αt)m (1 − αt)m−1 (1 − αt)m−2 1 1 1 2 1 = ln . Many times. f (t) ∼ g(t). we arrive to the following 2 3 4 n 2 k k=1 results. When f (t) has a dominating algebraic singularity. or. by the successive apSn = 1 + + + + · · · + = plication of this method. n n(n − 1) n−1 Therefore. The generating function is: A−m+1 A−m+2 A−m + + +· · · . it is dominating and R = 1/2 is the radius of convergence of the function. a better approximation for Sn is: Sn = The reader can easily verify that this correction greatly reduces the error in the evaluation of Sn . more in general. Sometimes. introduced n k=1 section we will see how it can be improved. we speak of “asymptotic evaluation” or “asymptotic approximation”. by one of the previous methods. We found the principal value Sn ∼ 2n /n by studying the generating function: 7. then we find the 1/2 is a logarithmic singularity. ∼ The same method of subtracted singularities can be 2 1 − 1/2 1 − 2t n used for a logarithmic singularity. we must have hn = o(fn ). [tn ] ln = .88 corresponding function. = − n n−1 n(n − 1) 2n 2n 2n + = . ASYMPTOTICS completely eliminates the singularity.e. In the next an example the sum S = n 2k−1 /k. When f (t) has a pole t0 of order m. 
the following observation solves the problem. if a second singularity t1 exists such that |t1 | > |t0 |. we should introduce some corrections. γ ∈ R. By Bender’s f (t) = A (1 − αt)p/m + A (1 − αt)(p−1)/m + p p−1 theorem we have: (p−2)/m + Ap−2 (1 − αt) + ···. but for smaller values it is just a rough approximation of the true value. for some function g(t) of which we exactly know the coefficients gn . For example.

gn is the sum of various terms. This is done in the folvelopments: lowing way: 2n 4n 1 1 1 1 1 = √ 1− + +O = = n 8n 128n2 n3 πn 2(1 − t) 1 + (1 − 2t) 1 1 1 1 1 = 1 − (1 − 2t) + (1 − 2t)2 − (1 − 2t)3 + · · · . + 9 9αβ + 51β 2 − +O 2 8(α − β)n 32(α − β)2 n2 1 n3 . HAYMAN’S METHOD 89 A further correction can now be obtained by con. Unfortunately. The evaluation is shown in Table 7. In fact.8 Hayman’s method In the former case. since the case α = β has no interest and the case α = −β should be ap1 1 [tn ](1 − 2t)2 ln = kn = proached in another way.1. and we can develop everything around this singular= n n−1 n−2 n(n − 1)(n − 2) ity. .7 The asymptotic behavior of a trinomial square root The reader is invited to find a similar formula for the case s = 1/2. we can obtain the same results if we exconsidered sufficient for obtaining both the asymppand the function h(t) in f (t) = g(t)h(t). a minus sign Sn ∼ 1+ . However.7.8. which gives: Let us suppose that |α| > |β|. In the second case. and therefore we have: 1 fn = − [tn+m ] r (1 − αt)(1 − βt) The method for coefficient evaluation which uses the function singularities (Darboux’ method). each one of the form: 1 1 1 k(t) = ln − 2 1 − t 1 − 2t 1 qk [tn−k ] . we can use the following dethe dominating singularity. where 2(1 − t) 1 − 2t s = 1/2 or s = −1/2. p(t) is a correcting polynomial. these methods become useless when the function f (t) has no singularity (entire functions) or when the dominating singularity is essential. as we have seen. around proximation. 1 1 (1 − αt)(1 − βt) − ln + (1 − 2t) ln = 1 − 2t 1 − 2t It is therefore interesting to compute. the asymptotic value [tn ]((1−αt)(1−βt))s . This hypothesis means that 2(1 − 1/2) 1 − 2t t = 1/α is the radius of convergence of the function 2n−1 2n−2 2n+1 2n −4 +4 = . The formula so obtained can be In general. 7. in this case. This correction is still smaller. because the coefficients of f (t) are positive numbers. In most combinatorial problems we have α > 0. which has no effect on fn .where m is a small integer. by the technique of “subtracted singularities”. can be improved and made more accurate. 2n 2 Let us consider s = 1/2. for n sufficiently large. and in the latter the development around the singularity gives rise to a series with an infinite number of terms of negative degree. in the former case we do not have any singularity to operate on. once and for 1 (1 − 2t)2 ln = all. n−1 n(n − 2) should precede the square root. In many problems we arrive to a generating function of the form: f (t) = or: g(t) = p(t) − (1 − αt)(1 − βt) rtm q(t) (1 − αt)(1 − βt) . h(t) with a totic evaluation of fn and a suitable numerical apradius of convergence larger than that of f (t). 7. 1+ = + +O 2n − 1 2n 2n 4n2 n3 This implies: 3 1 1 1 1+ = +O 2n − 3 2n 2n n2 1 1 1 ln = ln − 2(1 − t) 1 − 2t 1 − 2t and get: 1 1 2 + (1 − 2t) ln − ··· − (1 − 2t) ln α − β αn 25 6 + 3(α − β) 1 − 2t 1 − 2t √ fn = 1− + + α 2n πn 8(α − β)n 128n2 and the result is the same as the one previously obtained by the method of subtracted singularities. as many as there are terms sidering: in the polynomial q(t). and we can write: but this is not a limiting factor.

in a finite number of steps according to the following rules. let us show some examples function exp(p(t)) is H-admissible. The development of the method was mainly performed by Hayman and therefore it is known as Hayman’s method.1: The case s = 1/2 In these cases. δ are real numbers. for the third function we have: exp t 1−t = exp 1 −1 1−t = 1 exp e 1 1−t and naturally a constant does not influence the Hadmissibility of a function. then: fn = [tn ]f (t) ∼ f (r) rn 2πb(r) as n→∞ where r = r(n) is the least positive solution of the equation tf ′ (t)/f (t) = n and b(t) is the function: b(t) = t d dt t f ′ (t) f (t) . the proof of this theorem is 2. t(1 − t)2 1 − t In particular. are all H-admissible functions. we can evaluate the principal value of the asymptotic estimate for fn . Salvy and Zimmermann. β are positive real numbers and γ. Instead. Table 7. if p(t) is a polynomial with positive coefficients based on Cauchy’s theorem and is beyond the scope and not of the form p(tk ) for k > 1. which allows us to find an asymptotic evaluation for fn in many practical situations. which allows us to evaluate [tn ]f (t) by means of an integral: fn = 1 2πi f (t) dt tn+1 3. this also justifies the use of the letter H in the definition of H-admissibility. if that is the case. if f (t) and g(t) are H-admissible functions and p(t) is a polynomial with real coefficients and positive leading term. p(f (t)) As we said before.8. then the function: f (t) = exp × β (1 − t)α 1 1 ln t (1 − t) δ γ × γ where γ is a suitable path enclosing the origin. realizes this method. the following result holds: Theorem 7. For example. if α. . A function is called H-admissible if and only if it belongs to one of the following classes or can be obtained. then: exp(f (t)) f (t) + g(t) f (t) + p(t) p(t)f (t) 2 ln t 1 1 ln t (1 − t) is H-admissible. then the of these notes.1 Let f (t) be an H-admissible function. the only method seems to be the Cauchy theorem. to clarify the application of Hayman’s method. The method can be implemented on a computer in the following sense: given a function f (t). ASYMPTOTICS β + α (1 − αt) 1/2 1/2 = =− α−β n α [t ](1 − αt)1/2 1 + β 2(α−β) (1 β α−β (1 − αt)1/2 1 + (−α)n + (1 − αt)1/2 + 1/2 n β 2(α−β) (1 − αt) − β 8(α−β)2 (1 β2 8(α−β)2 (1 2 − αt) 2 = − αt)2 + · · · = − αt)5/2 + · · · = (−α)n + · · · = + ··· = 1 n3 β 3/2 2(α−β) n − αt)3/2 − (−α)n − − − β 5/2 8(α−β)2 n α−β (−1)n−1 2n α 4n (2n−1) n α−β 2n α α 4n (2n−1) n n (−α)n 1 − 1− β 3 2(α−β) 2n−3 β2 15 8(α−β)2 (2n−3)(2n−5) 2 3β 2(α−β)(2n−3) 15β 8(α−β)2 (2n−3)(2n−5) +O .90 [tn ] − (1 − αt)1/2 (1 − βt)1/2 = [tn ] − (1 − αt)1/2 =− =− =− =− = α−β n α [t ](1 α−β n α [t ] α−β α α−β α CHAPTER 7. the following functions are all Hadmissible: et exp t + exp t2 2 exp t 1−t 1 1 ln . but we’ll limit ourselves to sketch a method. We do not intend to develop this method here. derived from Cauchy theorem. In this example we have α = β = 1 and γ = δ = 0. The system ΛΥΩ. from other H-admissible functions: 1. in an algorithmic way we can check whether f (t) belongs to the class of functions for which the method is applicable (the class of “H-admissible” functions) and. by Flajolet. For H-admissible functions.

we have to solve the equation tet /et = n. The function b(t) is simply t and therefore we have: [tn ]et = en √ nn 2πn 1 n2 . which gives r = n. = n− + √ +O 2 8 n n t t 1−t = [tn ] exp . It is surely positive. which (1 − t)2 can be computed when we write it as exp(n ln 1/r). which is 2 1 1 1 1 positive and less than√ other. For applying Hayman’s theorem. Examples become early rather complex and require First we compute an approximation for f (r).9. that a large amount of computations. Let us consider the is exp(r/(1 − r)). for every n > 1. √ . Since: following sum: 1 1 1 1 = 1−r = √ − + √ +O n n−1 n2 n 2n 8n n n−1 1 n−1 1 = = 1 1 1 1 k − 1 k! (k + 1)! k √ 1− √ + = √ +O k=0 k=0 n 2 n 8n n n n 1 n−1 n − = = we immediately obtain: k k k! k=0 r 1 1 1 n−1 n √ × = 1− √ + +O n−1 1 n 1 1−r n 2n n n − = = k k! k k! √ 1 1 1 k=0 k=0 √ × n 1+ √ + = +O 2 n 8n n n 1 t − = [tn ] exp √ 1−t 1−t 1 1 1 √ = = n 1− √ + +O 2 n 8n n n 1 t − [tn−1 ] = exp √ 1−t 1−t 1 1 1 . 1−r e t 2 = n or nt − (2n + 1)t + n = 0. the √ √ +O √ √ +O − 2 n n n n n because 4n + 1 < 2 n + 1 < 2n. EXAMPLES OF HAYMAN’S THEOREM 91 As n grows. by developing the expression inside the square root: √ 4n + 1 = = 1 = 4n √ 1 2 n 1+ +O 8n 1+ √ 2 n 7. = √ n e 8 n is: t t Because Hayman’s method only gives the principal (1−t)2 exp 1−t t = . so that we know fn = 1/n!. g(t) = value of the result.9 Examples Theorem of Hayman’s The first example can be easily verified. and therefore we can try to evaluate the exp 1 − r n e 8 n asymptotic behavior of the sum. approximation for factorials. Because ∆ = 4n2 + 4n + 1 − 4n2 = 4n + 1. exp = [tn ] 1−t 1−t 1−t Finally. the exponential gives: We have already seen that this function is Hr e n 1 1 √ +O = √ exp = admissible. the correction can be ignored (it t (1 − t)2 exp 1−t can be not precise) and we get: √ The value of r is therefore given by the minimal posr e n itive solution of: exp = √ . Let us define the √ 1 1 e n function g(t) = tf ′ (t)/f (t). which in the present case 1+ √ +O . Let f (t) = et be the exponential function.7. we have that is: the two solutions: 1 1 1 1 √ = = exp n ln 1 + √ + +O √ n r n 2n n n 2n + 1 ± 4n + 1 r= 1 1 1 2n √ − +O = exp n √ + n 2n n n and we must accept the one with the ‘−’ sign. The second part we have to develop is 1/rn . we can also give an asymptotic approximation of r. From this formula we immediately obtain: 1 1 1 r =1− √ + − √ +O n 2n 8n n 1 n2 and in this formula we immediately recognize Stirling which will be used in the next approximations.

the principal value. . First: r(1 + r) = 1 1 1 √ × +O 1− √ + n 2n n n 1 1 1 √ × 2− √ + +O = n 2n n n 1 5 3 √ +O = = 2− √ + n 2n n n 5 3 1 √ . the correction is ignored and we only consider creases as n increases. and therefore we have: b(t) = t t t(1 + t) d = . Only b(r) remains to be computed. = 2 1− √ + +O 2 n 4n n n By using the expression already found for (1 − r) we then have: 1 1 1 1 √ 1− √ + +O (1 − r)3 = √ n n 2 n 8n n n 1 1 1 1 √ √ 1− √ + × = +O n 2n n n n n 1 1 1 √ = × 1− √ + +O 2 n 8n n n 3 9 1 1 √ √ 1− √ + . we have b(t) = tg ′ (t) where g(t) is as above. ASYMPTOTICS It is a simple matter to execute the original sum n−1 n−1 k=0 k /(k + 1)! by using a personal computer for. exp n+O √ n exp n = CHAPTER 7. = +O n n 2 n 8n n n By inverting this quantity. n = 100. 300. we eventually get: r(1 + r) = (1 − r)3 √ 9 3 1 √ × +O = 2n n 1 + √ + 2 n 8n n n 3 5 1 √ × 1− √ + = +O 2 n 4n n n √ 1 1 √ .92 = 1 1 1 1 √ √ + − +O n 2n 2n n n √ √ 1 ∼ e n. say. dt (1 − t)2 (1 − t)3 This quantity can be computed a piece at a time. ∼ √ 2 πen3/4 √ 3 = √ √ 4πn n = 2 πn3/4 . 200. = 2n n 1 + +O 8n n n √ The principal value is 2n n and therefore: b(r) = 2πb(r) = the final result is: fn = [tn ] exp t 1−t e2 n . The results obtained can = be compared with the evaluation of the previous formula. We can thus verify that the relative error deAgain. We now observe that f (r)/rn ∼ √ √ e2 n / e.

Louis Comtet: Advanced Combinatorics. in an elementary but rigorous way. Birkh¨user a (1981). I. Addison-Wesley (1968). I. relating Computer Science and the methods of Combinatorial Analysis. Press !997) (2001). with the main topics of Combinatorial Analysis. there is little hope to understand the developments of the analysis of algorithms and data structures: Donald E. Without this basic knowledge. So. Cambridge Univ. Graham. Donald E. Philippe Flajolet: An Introduction to the Analysis of Algorithms. David M. Knuth. the first part of which is dedicated to the mathematical methods used in Combinatorial Analysis. with an eye to Computer Science.Chapter 8 Bibliography The birth of Computer Science and the need of analyzing the behavior of algorithms and data structures have given a strong twirl to Combinatorial Analysis. Many exercise (with solutions) are proposed. and to the mathematical methods used to study combinatorial objects.(1968). Oren Patashnik: Concrete Mathematics. Numerical and probabilistic developments are to be found in the central volume: Donald E. Knuth: The Art of Computer Programming: Fundamental Algorithms. Vol. (2004). Reidel (1974). near to the traditional literature on Combinatorics. Addison-Wesley (1973). II. Vol. Dover Publ. Addison-Wesley (1973) A concise exposition of several important techniques is given in: Daniel H. Jackson: Combinatorial Enumeration. Stanley: Enumerative Combinatorics. Richard P. Vol. Wiley (1950) (1957) . Robert Sedgewick. Greene. John Riordan: An Introduction to Combinatorial Analysis. Vol. 93 It deals. Cambridge Univ. Knuth: The Art of Computer Programming: Numerical Algorithms. very clear and very comprehensive. who published in 1968 the first volume of his monumental “The Art of Computer Programming”. Knuth: Mathematics for the Analysis of Algorithms. and the reader is never left alone in front of the many-faced problems he or she is encouraged to tackle. Ian P. Ronald L. Press (1986) (2000). The most systematic work in the field is perhaps the following text. because they contain information on general concepts. II. Many texts on Combinatorial Analysis are worth of being considered. a number of books and papers have been produced. The first author who systematically worked in this direction was surely Donald Knuth. Richard P. Many additional concepts and techniques are also contained in the third volume: Donald E. . III. Stanley: Enumerative Combinatorics. both from a combinatorial and a mathematical point of view: William Feller: An Introduction to Probability Theory and Its Applications. Goulden. Donald E. AddisonWesley (1989). AddisonWesley (1995). Knuth: The Art of Computer Programming: Sorting and Searching. Wiley (1953) (1958). Vol.

143 (1992). Rapport Technique N. (1986) Vol. (1972). so we limit our considerations to some classical cases relative to Computer Science. The concepts relating to the problem of searching can be found in any text on Algorithms and Data Structures. in particular of Combinatorial Analysis. J. generating functions have been a controversial concept. and their theory has been developed in many of the quoted texts. Sloane. Volume I can most of the quoted texts. in particular in Stanley be consulted for Landau notation. Stirling numbers are very important in Numerical Analysis. Academic Press (1990). .94 *** Chapter 1: Introduction CHAPTER 8. Paul Zimmermann: GFUN: a Maple package for the manipulation of generating and holonomic functions in one variable. the quantity we consider are common to all parts of Mathematics.com/ njas/sequences/. For the mathematical concepts ematics is available by means of several Computer of Γ and ψ functions. In particular: Herbert S. they are the most frequently used quantity in Combinatorics.att. due to de Moivre in the XVIII Century. the reader can find their algebraic foundation in the book: Peter Henrici: Applied and Computational Complex Analysis. III. Available at: http://www. in particular Volume III of the quoted The Lagrange Inversion Formula can be found in “The Art of Computer Programming”. Sometimes. Renzo Sprugnol. Anyhow. and from this fact their relevance in Combinatorics and in the Analysis of Algorithms originates. (1977) Vol. The resulting book (and the corresponding Internet site) is one of the most important reference point: N. *** Chapter 2: Special numbers The quoted book by Graham. M. Stegun: Handbook ideas described in this chapter. So. Irene A. Stanley dedicates to Catalan numbers a special survey in his second volume. A practical device has been realized in Maple: Bruno Salvy. but here we are more interested in their combinatorial meaning. free or not. other software is available. Harmonic numbers are the “discrete” version of logarithms. the reader is referred to: Algebra Systems. Wiley (1974) Vol. almost every part of Mathmon in Mathematics. II. many others. used systems. A. just after binomial coefficients and before Fibonacci numbers. Simon Plouffe: The Encyclopedia of Integer Sequences Academic Press (1995). Since their introduction in Mathematics. Maple and Mathematica are among the most of Mathematical Functions. Cecilia Verri: The method of coefficients. The number of applications of generating functions is almost infinite.research. The American Mathematical Monthly (to appear). I. BIBLIOGRAPHY Coefficient extraction is central in our approach. implementing on a computer the Milton Abramowitz. Knuth and Patashnik is appropriate. INRIA. *** Chapter 4: Generating functions In our present approach the concept of an (ordinary) generating function is essential. and many. Nowadays. Now they are accepted almost universally. *** Chapter 5: Riordan arrays *** Chapter 3: Formal Power Series Formal power series have a long tradition. and this justifies our insistence on the two operators “generating function” and “coefficient of”. These systems have become an essential tool for developing any new aspect of Mathematics. our proofs should be compared with those given (for the same problems) by Knuth or Flajolet and Sedgewick. we intentionally complicate proofs in order to stress the use of generating function as a unifying approach. Wilf: Generatingfunctionology. Probably. 
The ζ function and the Bernoulli numbers are also covered by the book of Abramowitz and Stegun. a formalization can be found in: Donatella Merlini. even if it is comand Henrici. Sequences studied in Combinatorial Analysis have been collected and annotated by Sloane. Dover Publ.

M. Wilf. Seyoum Getu. Riordan arrays are less powerfy+ul. powers and binomial coefficients. Knuth: Convolution polynomials. K. 132 (1992) 267 – 290. Peters (1996). Canadian Journal of Mathematics 49 (1997) 301 – 320. the method of Wilf and Zeilberger completely solves the problem of combinatorial identities “with hypergeometric terms”. which allow to solve many problems relative to the given objects. although. Discrete Mathematics 80 (1990). *** Collections of combinatorial identities are: Henry W. nobody seems to have noticed the connec. The Stirling numbers of both kind. 207 – 211. In this Steve Roman: The Umbral Calculus Academic sense. Other methods to prove combinatorial identities are important in Combinatorial Analysis. When the grammar defines a class of combinatorial objects. Since context-free languages only define algebraic generating functions (the subset of regular grammars is limited to rational . A. Their practical relevance was noted in: Renzo Sprugnoli: Riordan arrays and combinatorial sums. Doron Zeilberger: A fast algorithm for proving terminating hypergeometric identities. Cecilia Verri: On some alternative characterizations of Riordan arrays. n) R(n + 1) Riordan arrays are strictly related to “convolution R(n) matrices” and to “Umbral calculus”. The theory was further developed in: Donatella Merlini. n) L(k. but can Press (1984). G. n + 1) L(k. n) L(k. Josef Kaucky: Combinatorial Identities Veda. Discrete Applied Mathematics. this method gives a direct way to obtain monovariate or multivariate generating functions. Woodson: The Riordan group. as for examples in the case of harmonic and Donald E. M. This actutions of these concepts and combinatorial sums: ally means that L(k. Leon C. Discrete Mathematics. In particular. 59 (1984).95 Riordan arrays are part of the general method of coefficients and are particularly important in proving combinatorial identities and generating function transformations. Herbert S. The symbolic method was started by Sch¨tzenberger and Viennot. Douglas G. 2 (1991) 67 – 78. Rogers. We quote: John Riordan: Combinatorial Identities. Gould: Combinatorial Identities. Shapiro. Chapter 6: Formal Methods In this chapter we have considered two important methods: the symbolic method for deducing counting generating functions from the syntactic definition of combinatorial objects. Vol. Society Translations.are all rational functions in n and k. Wen-Jin Woan. n) and R(n) are composed of factorial. P. who devised a u technique to automatically generate counting generating functions from a context-free non-ambiguous grammar. Doron Zeilberger: A holonomic systems approach to special functions identities. Doron Zeilberger: s A = B. This means that we can algorithmically establish whether an identity is true or false. Wiley (1968). n) = R(n) have a special form: L(k + 1. Egorychev: Integral Representation and the Computation of Combinatorial Sums American Math. Bratislava (1975). 321 – 368. Methematica Journal. and the method of “operators” for obtaining combinatorial identities from relations between transformations of sequences defined by operators. A Standardized Set of Tables Listing 500 Binomial Coefficient Summations West Virginia University (1972). be used also when non-hypergeometric terms are not involved. Petkovˇek. They were introduced by: Louis W. provided the two members of the identity: k L(k. Renzo Sprugnoli. rather strangely. 34 (1991) 229 – 239. Journal of Computational and Applied Mathematics 32 (1990).

In these cases.inria. this solution is a precise formula. The and Henrici. however. Actually. The method of Heyman is based on the method was extended by Flajolet to some classes of paper: exponential generating functions and implemented Micha Hofri: Probabilistic Analysis of Algoin Maple. X Colloquium on Automata. Algorithms Seminar. Information and Control 6 (1963) 246 – 264 Maylis Delest. Philippe Flajolet: Symbolic enumerative combinatorics and complex asymptotic analysis. Wilf is very effective whenever it can be applied. Chelsea Publ. The purpose of asymptotic methods is just that. Springer (1987). (1965). The important concepts of indefinite and definite summations are used by Wilf and Zeilberger in the quoted texts. but it has a clear connection with Combinatorial Analysis. as our numerous examples show. BIBLIOGRAPHY functions). rithms. the method is not very general. we would also be content with an approximate solution. (2001). Available at: http://algo. provided we can give an upper bound to the error committed. especially George Boole.fr/seminars/sem0001/flajolet. We have used a rather descriptive approach. Languages and Programming .Lecture Notes in Computer Science (1983) 173 – 181. especially Knuth.html The method of operators is very old and was developed in the XIX Century by English mathematicians. This.96 CHAPTER 8. Xavier Viennot: Algebraic languages and polyominoes enumeration. the method is used in Numerical Analysis. The natural settings for these problems are Complex Analysis and the theory of series. but covered by the quoted text. limiting our considerations to elementary cases. *** Chapter 7: Asymptotics The methods treated in the previous chapters are “exact”. A classical book in this direction is: Charles Jordan: Calculus of Finite Differences. These situations are . is not always possible. in the sense that every time they give the solution to a problem. and many times we are not able to find a solution of this kind. Marco Sch¨tzenberger: Context-free languages u and pushdown automata. The Euler-McLaurin summation formula is the first connection between finite methods (considered up to this moment) and asymptotics.

