
# An Introduction to Mathematical Methods in Combinatorics

Renzo Sprugnoli

Dipartimento di Sistemi e Informatica

Viale Morgagni, 65 - Firenze (Italy)

January 18, 2006


Contents

1 Introduction 5

1.1 What is the Analysis of an Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2 The Analysis of Sequential Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.3 Binary Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.4 Closed Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.5 The Landau notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Special numbers 11

2.1 Mappings and powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.3 The group structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.4 Counting permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.5 Dispositions and Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.6 The Pascal triangle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.7 Harmonic numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.8 Fibonacci numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.9 Walks, trees and Catalan numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.10 Stirling numbers of the ﬁrst kind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.11 Stirling numbers of the second kind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.12 Bell and Bernoulli numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3 Formal power series 25

3.1 Deﬁnitions for formal power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2 The basic algebraic structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.3 Formal Laurent Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.4 Operations on formal power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.5 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.6 Coeﬃcient extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.7 Matrix representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.8 Lagrange inversion theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.9 Some examples of the LIF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.10 Formal power series and the computer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.11 The internal representation of expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.12 Basic operations of formal power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.13 Logarithm and exponential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4 Generating Functions 39

4.1 General Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.2 Some Theorems on Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4.3 More advanced results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.4 Common Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.5 The Method of Shifting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4.6 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.7 Some special generating functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46


4.8 Linear recurrences with constant coeﬃcients . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.9 Linear recurrences with polynomial coeﬃcients . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.10 The summing factor method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.11 The internal path length of binary trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.12 Height balanced binary trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.13 Some special recurrences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

5 Riordan Arrays 55

5.1 Deﬁnitions and basic concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

5.2 The algebraic structure of Riordan arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.3 The A-sequence for proper Riordan arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

5.4 Simple binomial coeﬃcients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

5.5 Other Riordan arrays from binomial coeﬃcients . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.6 Binomial coeﬃcients and the LIF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.7 Coloured walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.8 Stirling numbers and Riordan arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.9 Identities involving the Stirling numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

6 Formal methods 67

6.1 Formal languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

6.2 Context-free languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

6.3 Formal languages and programming languages . . . . . . . . . . . . . . . . . . . . . . . . . . 69

6.4 The symbolic method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

6.5 The bivariate case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

6.6 The Shift Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

6.7 The Diﬀerence Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

6.8 Shift and Diﬀerence Operators - Example I . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

6.9 Shift and Diﬀerence Operators - Example II . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

6.10 The Addition Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

6.11 Deﬁnite and Indeﬁnite summation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

6.12 Deﬁnite Summation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

6.13 The Euler-McLaurin Summation Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

6.14 Applications of the Euler-McLaurin Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

7 Asymptotics 83

7.1 The convergence of power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

7.2 The method of Darboux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

7.3 Singularities: poles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

7.4 Poles and asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

7.5 Algebraic and logarithmic singularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

7.6 Subtracted singularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

7.7 The asymptotic behavior of a trinomial square root . . . . . . . . . . . . . . . . . . . . . . . . 89

7.8 Hayman’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

7.9 Examples of Hayman’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

8 Bibliography 93

Chapter 1

Introduction

1.1 What is the Analysis of an Algorithm

An algorithm is a finite sequence of unambiguous rules for solving a problem. Once the algorithm is started with a particular input, it will always end, obtaining the correct answer or output, which is the solution to the given problem. An algorithm is realized on a computer by means of a program, i.e., a set of instructions which cause the computer to perform the elaborations intended by the algorithm.

So, an algorithm is independent of any computer and, in fact, the word was used for a long time before computers were invented. Leonardo Fibonacci called "algorisms" the rules for performing the four basic operations (addition, subtraction, multiplication and division) with the newly introduced Arabic digits and the positional notation for numbers. "Euclid's algorithm" for evaluating the greatest common divisor of two integers was also well known before the appearance of computers.

Many algorithms can exist which solve the same problem. Some can be very skillful; others can be very simple and straightforward. A natural problem, in these cases, is to choose the best algorithm, if any, in order to realize it as a program on a particular computer. Simple algorithms are more easily programmed, but they can be very slow or require large amounts of computer memory. The problem is therefore to have some means to judge the speed and the quantity of computer resources a given algorithm uses. The aim of the "Analysis of Algorithms" is precisely to give the mathematical bases for describing the behavior of algorithms, thus obtaining exact criteria to compare different algorithms performing the same task, or to see whether an algorithm is sufficiently good to be efficiently implemented on a computer.

Let us consider, as a simple example, the problem of searching. Let S be a given set. In practical cases S is the set N of natural numbers, the set Z of integers, the set R of real numbers, or the set A* of words on some alphabet A. However, as a matter of fact, the nature of S is not essential, because on a computer we always deal with a suitable binary representation of the elements of S, which have therefore to be considered as "words" in the computer memory. The "problem of searching" is as follows: we are given a finite ordered subset T = (a_1, a_2, . . . , a_n) of S (usually called a table, its elements referred to as keys) and an element s ∈ S, and we wish to know whether s ∈ T or s ∉ T, and in the former case which element a_k in T it is.

Although the mathematical problem "s ∈ T or not" has almost no relevance, the searching problem is basic in computer science, and many algorithms have been devised to make the process of searching as fast as possible. Surely, the most straightforward algorithm is sequential searching: we begin by comparing s and a_1, and we are finished if they are equal. Otherwise, we compare s and a_2, and so on, until we find an element a_k = s or reach the end of T. In the former case the search is successful and we have determined the element in T equal to s. In the latter case we conclude that s ∉ T and the search is unsuccessful.
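The sequential algorithm just described can be sketched in a few lines of Python (the function name and the convention of returning the index, or None for an unsuccessful search, are our own choices):

```python
def sequential_search(table, s):
    """Scan the table left to right, comparing each key with s.

    Returns the index of the key equal to s (successful search),
    or None when the end of the table is reached (unsuccessful search).
    """
    for k, key in enumerate(table):
        if key == s:          # one comparison per examined key
            return k
    return None               # s is not in the table

# A successful and an unsuccessful search on a small table:
T = [7, 3, 9, 1, 4]
print(sequential_search(T, 9))   # found at index 2
print(sequential_search(T, 5))   # not in T: prints None
```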

The analysis of this (simple) algorithm consists in finding one or more mathematical expressions describing in some way the number of operations performed by the algorithm as a function of the number n of elements in T. This definition is intentionally vague, and the following points should be noted:

• an algorithm can present several aspects and therefore may require several mathematical expressions to be fully understood. For example, as far as the sequential search algorithm is concerned, we are interested in what happens during a successful or unsuccessful search. Besides, for a successful search, we wish to know what the worst case is (Worst Case Analysis) and what the average case is (Average Case Analysis), with respect to all the tables containing n elements or, for a fixed table, with respect to the n elements it contains;

• the operations performed by an algorithm can be of many different kinds. In the example above the only operations involved in the algorithm are comparisons, so no doubt is possible. In other algorithms we can have arithmetic or logical operations. Sometimes we can also consider more complex operations such as square roots, list concatenation or reversal of words. Operations depend on the nature of the algorithm, and we can decide to consider as an "operation" even very complicated manipulations (e.g., extracting a random number or performing the differentiation of a polynomial of a given degree). The important point is that every instance of the "operation" takes about the same time or has the same complexity. If this is not the case, we can give different weights to different operations or to different instances of the same operation.

We observe explicitly that we never consider execution time as a possible parameter for the behavior of an algorithm. As stated before, algorithms are independent of any particular computer and should never be confused with the programs realizing them. Algorithms are only related to the "logic" used to solve the problem; programs can depend on the ability of the programmer or on the characteristics of the computer.

1.2 The Analysis of Sequential Searching

The analysis of the sequential searching algorithm is very simple. For a successful search, the worst case analysis is immediate, since we have to perform n comparisons if s = a_n is the last element in T. More interesting is the average case analysis, which introduces the first mathematical device of algorithm analysis. To find the average number of comparisons in a successful search we should sum the number of comparisons necessary to find any element in T, and then divide by n. It is clear that if s = a_1 we only need a single comparison; if s = a_2 we need two comparisons, and so on. Consequently, we have:

C_n = (1/n)(1 + 2 + ··· + n) = (1/n) · n(n + 1)/2 = (n + 1)/2

This result is intuitively clear but important, since it shows in mathematical terms that the number of comparisons performed by the algorithm (and hence the time taken on a computer) grows linearly with the dimension of T.

The concluding step of our analysis was the execution of a sum (a well-known sum, in the present case). This is typical of many algorithms and, as a matter of fact, the ability to perform sums is an important technical point for the analyst. A large part of our efforts will be dedicated to this topic.

Let us now consider an unsuccessful search, for which we only have an Average Case analysis. If U_k denotes the number of comparisons necessary for a table with k elements, we can determine U_n in the following way. We compare s with the first element a_1 in T and obviously we find s ≠ a_1, so we should go on with the table T′ = T \ {a_1}, which contains n − 1 elements. Consequently, we have:

U_n = 1 + U_{n−1}

This is a recurrence relation, that is, an expression relating the value of U_n with other values U_k having k < n. It is clear that if some value, e.g., U_0 or U_1, is known, then it is possible to find the value of U_n for every n ∈ N. In our case, U_0 is the number of comparisons necessary to find out that an element s does not belong to a table containing no elements. Hence we have the initial condition U_0 = 0 and we can unfold the preceding recurrence relation:

U_n = 1 + U_{n−1} = 1 + 1 + U_{n−2} = ··· = 1 + 1 + ··· + 1 (n times) + U_0 = n

Recurrence relations are the other mathematical device arising in algorithm analysis. In our example the recurrence is easily transformed into a sum, but as we shall see this is not always the case. In general we have the problem of solving a recurrence, i.e., finding an explicit expression for U_n, starting from the recurrence relation and the initial conditions. So, another large part of our efforts will be dedicated to the solution of recurrences.

1.3 Binary Searching

Another simple example of analysis can be performed on the binary search algorithm. Let S be a given ordered set. The ordering must be total, as the numerical order in N, Z or R, or the lexicographical order in A*. If T = (a_1, a_2, . . . , a_n) is a finite ordered subset of S, i.e., a table, we can always imagine that a_1 < a_2 < ··· < a_n, and consider the following algorithm, called binary searching, to search for an element s ∈ S in T. Let a_i be the median element in T, i.e., i = ⌊(n + 1)/2⌋, and compare it with s. If s = a_i then the search is successful; otherwise, if s < a_i, perform the same algorithm on the subtable T′ = (a_1, a_2, . . . , a_{i−1}); if instead s > a_i, perform the same algorithm on the subtable T′′ = (a_{i+1}, a_{i+2}, . . . , a_n). If at any moment the table on which we perform the search is reduced to the empty set ∅, then the search is unsuccessful.
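The procedure above translates directly into Python; note that the sketch below uses 0-based indices, whereas the text numbers the keys from 1 (the function name and the None convention are again our own):

```python
def binary_search(table, s):
    """Search for s in a sorted table by repeated halving.

    Returns the index of the key equal to s, or None if the
    subtable shrinks to the empty set.
    """
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        i = (lo + hi) // 2        # the median element of the subtable
        if s == table[i]:
            return i              # successful search
        elif s < table[i]:
            hi = i - 1            # continue in T' = (a_lo, ..., a_{i-1})
        else:
            lo = i + 1            # continue in T'' = (a_{i+1}, ..., a_hi)
    return None                   # unsuccessful search

T = [1, 3, 4, 7, 9, 12, 15]
print(binary_search(T, 12))   # found at index 5
print(binary_search(T, 8))    # not in T: prints None
```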


Let us first consider the Worst Case analysis of this algorithm. For a successful search, the element s is only found at the last step of the algorithm, i.e., when the subtable on which we search is reduced to a single element. If B_n is the number of comparisons necessary to find s in a table T with n elements, we have the recurrence:

B_n = 1 + B_{⌊n/2⌋}

In fact, we observe that every step reduces the table to ⌊n/2⌋ or to ⌊(n − 1)/2⌋ elements; since we are performing a Worst Case analysis, we consider the worse of the two situations. The initial condition is B_1 = 1, relative to the table (s), to which we should always reduce. The recurrence is not as simple as in the case of sequential searching, but we can simplify everything by considering a value of n of the form 2^k − 1. In fact, in such a case we have ⌊n/2⌋ = ⌊(n − 1)/2⌋ = 2^{k−1} − 1, and the recurrence takes on the form:

B_{2^k−1} = 1 + B_{2^{k−1}−1}    or    β_k = 1 + β_{k−1}

if we write β_k for B_{2^k−1}. As before, unfolding yields β_k = k and, returning to the B's, we find:

B_n = log_2(n + 1)

since by our definition n = 2^k − 1. This is valid for every n of the form 2^k − 1; for the other values it is an approximation (a rather good one, indeed, because of the very slow growth of logarithms).
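As a quick check of this derivation, we can unfold the recurrence B_n = 1 + B_{⌊n/2⌋} mechanically and compare it with log_2(n + 1) at the values n = 2^k − 1 (a small Python sketch with our own function name):

```python
import math

def B(n):
    """Worst-case comparisons of binary search, by the recurrence."""
    if n == 1:
        return 1          # initial condition B_1 = 1
    return 1 + B(n // 2)  # every step roughly halves the table

# At n = 2^k - 1 the recurrence gives exactly log2(n + 1) = k:
for k in range(1, 11):
    n = 2**k - 1
    assert B(n) == k == round(math.log2(n + 1))
```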

We observe explicitly that for n = 1,000,000 a sequential search requires about 500,000 comparisons on average for a successful search, whereas binary searching only requires log_2(1,000,000) ≈ 20 comparisons. This accounts for the dramatic improvement of binary searching over sequential searching, and the analysis of algorithms provides a mathematical proof of this fact.

The Average Case analysis for successful searches can be accomplished in the following way. There is only one element that can be found with a single comparison: the median element in T. There are two elements that can be found with two comparisons: the median elements of T′ and of T′′. Continuing in the same way, we find the average number A_n of comparisons as:

A_n = (1/n)(1 + 2 + 2 + 3 + 3 + 3 + 3 + 4 + ··· + (1 + ⌊log_2 n⌋))

The value of this sum can be found explicitly, but the method is rather difficult and we delay it until later (see Section 4.7). When n = 2^k − 1 the expression simplifies:

A_{2^k−1} = (1/(2^k − 1)) Σ_{j=1}^{k} j·2^{j−1} = (k·2^k − 2^k + 1)/(2^k − 1)

This sum also is not immediate, but the reader can check it by using mathematical induction. If we now write k·2^k − 2^k + 1 = k(2^k − 1) + k − (2^k − 1), we find:

A_n = k + k/n − 1 = log_2(n + 1) − 1 + log_2(n + 1)/n

which is only a little better than the worst case.
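The closed form for A_{2^k−1} can also be checked by brute force: simulate a binary search for each of the n keys, count the comparisons, and compare the total with k·2^k − 2^k + 1. The sketch below (function name ours) does exactly that:

```python
def comparisons(table, s):
    """Number of three-way comparisons binary search uses to find s."""
    lo, hi, count = 0, len(table) - 1, 0
    while lo <= hi:
        i = (lo + hi) // 2
        count += 1
        if s == table[i]:
            return count
        elif s < table[i]:
            hi = i - 1
        else:
            lo = i + 1
    return count

for k in range(1, 8):
    n = 2**k - 1
    table = list(range(n))                     # any sorted keys will do
    total = sum(comparisons(table, s) for s in table)
    # total / n should equal (k*2^k - 2^k + 1) / (2^k - 1):
    assert total == k * 2**k - 2**k + 1
```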

For unsuccessful searches, the analysis is now very simple, since we proceed as in the Worst Case analysis, and at the last comparison we have a failure instead of a success. Consequently, U_n = B_n.

1.4 Closed Forms

The sign "=" between two numerical expressions denotes their numerical equivalence, as for example:

Σ_{k=0}^{n} k = n(n + 1)/2

Although algebraically or numerically equivalent, two expressions can be computationally quite different. In the example, the left-hand expression requires n sums to be evaluated, whereas the right-hand expression only requires a sum, a multiplication and a halving. For n even moderately large (say n ≥ 5) nobody would prefer computing the left-hand expression rather than the right-hand one. A computer evaluates the latter expression in a few nanoseconds, but can require some milliseconds to compute the former as soon as n is greater than 10,000. The important point is that the evaluation of the right-hand expression is independent of n, whilst the left-hand expression requires a number of operations growing linearly with n.

A closed form expression is an expression, depending on some parameter n, the evaluation of which does not require a number of operations depending on n. Another example we have already found is:

Σ_{k=0}^{n} k·2^{k−1} = n·2^n − 2^n + 1 = (n − 1)·2^n + 1
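A two-line check makes the computational difference concrete: the naive sum performs a number of operations growing with n, while the closed form costs the same few operations for any n (a Python sketch, with our own function names):

```python
def naive(n):
    """Left-hand side: about n additions and multiplications."""
    return sum(k * 2**(k - 1) for k in range(1, n + 1))  # k = 0 term is zero

def closed(n):
    """Right-hand side: a constant number of operations."""
    return (n - 1) * 2**n + 1

# The two expressions agree for every n:
assert all(naive(n) == closed(n) for n in range(0, 200))
```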

Again, the left-hand expression is not in closed form, whereas the right-hand one is. We observe that 2^n = 2·2···2 (n times) seems to require n − 1 multiplications. In fact, however, 2^n is a simple shift in a binary computer and, more generally, every power α^n = exp(n ln α) can always be computed with the maximal accuracy allowed by a computer in constant time, i.e., in a time independent of α and n. This is because the two elementary functions exp(x) and ln(x) have the nice property that the time for their evaluation is independent of their argument. The same property holds true for the most common numerical functions, such as the trigonometric and hyperbolic functions, the Γ and ψ functions (see below), and so on.

As we shall see, many kinds of "special" numbers appear in algorithm analysis. Most of them can be reduced to the computation of some basic quantities, which are considered to be in closed form, although apparently they depend on some parameter n. The three main quantities of this kind are the factorial, the harmonic numbers and the binomial coefficients. In order to justify the previous sentence, let us anticipate some definitions, which will be discussed in the next chapter, and give a more precise presentation of the Γ and ψ functions.

The Γ-function is defined by a definite integral:

Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt.

By integrating by parts, we obtain:

Γ(x + 1) = ∫_0^∞ t^x e^{−t} dt = [−t^x e^{−t}]_0^∞ + ∫_0^∞ x t^{x−1} e^{−t} dt = x Γ(x)

which is a basic recurrence property of the Γ-function. It allows us to reduce the computation of Γ(x) to the case 1 ≤ x ≤ 2. In this interval we can use a polynomial approximation:

Γ(x + 1) = 1 + b_1 x + b_2 x² + ··· + b_8 x⁸ + ε(x)

where:

b_1 = −0.577191652    b_5 = −0.756704078
b_2 =  0.988205891    b_6 =  0.482199394
b_3 = −0.897056937    b_7 = −0.193527818
b_4 =  0.918206857    b_8 =  0.035868343

The error is |ε(x)| ≤ 3·10^{−7}. Another method is to use Stirling's approximation:

Γ(x) = e^{−x} x^{x−0.5} √(2π) (1 + 1/(12x) + 1/(288x²) − 139/(51840x³) − 571/(2488320x⁴) + ···).

Some special values of the Γ-function are obtained directly from the definition. For example, when x = 1 the integral simplifies and we immediately find Γ(1) = 1. When x = 1/2 the definition implies:

Γ(1/2) = ∫_0^∞ t^{1/2−1} e^{−t} dt = ∫_0^∞ (e^{−t}/√t) dt.

By performing the substitution y = √t (t = y² and dt = 2y dy), we have:

Γ(1/2) = ∫_0^∞ (e^{−y²}/y) · 2y dy = 2 ∫_0^∞ e^{−y²} dy = √π

the last integral being the famous Gauss integral.

Finally, from the recurrence relation we obtain:

Γ(1/2) = Γ(1 − 1/2) = −(1/2) Γ(−1/2)

and therefore:

Γ(−1/2) = −2 Γ(1/2) = −2√π.

The Γ-function is defined for every x ∈ C, except when x is zero or a negative integer, where the function goes to infinity; near these points the following approximation can be important:

Γ(−n + ε) ≈ ((−1)^n / n!) · (1/ε).

When we unfold the basic recurrence of the Γ-function for an integer x = n, we find Γ(n + 1) = n(n − 1) ··· 2 · 1. The factorial Γ(n + 1) = n! = 1 · 2 · 3 ··· n seems to require n − 2 multiplications. However, for n large it can be computed by means of Stirling's formula, which is obtained from the same formula for the Γ-function:

n! = Γ(n + 1) = n Γ(n) = √(2πn) (n/e)^n (1 + 1/(12n) + 1/(288n²) + ···).

This requires only a fixed amount of operations to reach the desired accuracy.
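As a sketch of this constant-time evaluation, the following Python function truncates Stirling's series after the 1/(288n²) term; with only these two correction terms the relative error is already of the order of 10⁻⁶ for n = 10 (the function name is ours):

```python
import math

def stirling_factorial(n):
    """Approximate n! by Stirling's formula with two correction terms."""
    return (math.sqrt(2 * math.pi * n) * (n / math.e)**n
            * (1 + 1/(12*n) + 1/(288*n**2)))

# 10! = 3628800; the truncated series is accurate to about 3e-6:
approx = stirling_factorial(10)
assert abs(approx / math.factorial(10) - 1) < 1e-5
```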

The function ψ(x), called the ψ-function or digamma function, is defined as the logarithmic derivative of the Γ-function:

ψ(x) = (d/dx) ln Γ(x) = Γ′(x)/Γ(x).

Obviously, we have:

ψ(x + 1) = Γ′(x + 1)/Γ(x + 1) = ((d/dx)(x Γ(x)))/(x Γ(x)) = (Γ(x) + x Γ′(x))/(x Γ(x)) = 1/x + ψ(x)

and this is a basic property of the digamma function. By this formula we can always reduce the computation of ψ(x) to the case 1 ≤ x ≤ 2, where we can use the approximation:

ψ(x) = ln x − 1/(2x) − 1/(12x²) + 1/(120x⁴) − 1/(252x⁶) + ···.

By the previous recurrence, we see that the digamma function is related to the harmonic numbers H_n = 1 + 1/2 + 1/3 + ··· + 1/n. In fact, we have:

H_n = ψ(n + 1) + γ

where γ = 0.57721566... is the Mascheroni-Euler constant. By using the approximation for ψ(x), we obtain an approximate formula for the harmonic numbers:

H_n = ln n + γ + 1/(2n) − 1/(12n²) + ···

which shows that the computation of H_n does not require n − 1 sums and n − 1 inversions, as might appear from its definition.
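A short numerical check of this asymptotic formula in Python (the value of γ below is the constant from the text, extended to machine precision):

```python
import math

GAMMA = 0.5772156649015329  # the Mascheroni-Euler constant

def harmonic_direct(n):
    """H_n by its definition: n - 1 sums and n - 1 inversions."""
    return sum(1 / k for k in range(1, n + 1))

def harmonic_approx(n):
    """H_n by the asymptotic formula: a fixed number of operations."""
    return math.log(n) + GAMMA + 1/(2*n) - 1/(12*n**2)

# For n = 100 the first neglected term, 1/(120 n^4), is about 8e-11:
assert abs(harmonic_direct(100) - harmonic_approx(100)) < 1e-8
```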

Finally, the binomial coefficient:

\binom{n}{k} = n(n − 1) ··· (n − k + 1)/k! = n!/(k!(n − k)!) = Γ(n + 1)/(Γ(k + 1) Γ(n − k + 1))

can be reduced to the computation of the Γ-function, or can be approximated by using Stirling's formula for factorials. The two methods are indeed the same. We observe explicitly that the last expression shows that binomial coefficients can be defined for every n, k ∈ C, except that k cannot be a negative integer. The reader can, as a very useful exercise, write computer programs to realize the various functions mentioned in the present section.
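As a first step in that exercise, here is a sketch of Γ(x) restricted to x > 0, using the recurrence Γ(x + 1) = xΓ(x) to bring the argument into the interval [1, 2] and then the eight-coefficient polynomial given above (the function name and argument handling are our own choices):

```python
import math

# Coefficients b_1 .. b_8 of the polynomial approximation of Gamma(x + 1):
B = [-0.577191652, 0.988205891, -0.897056937, 0.918206857,
     -0.756704078, 0.482199394, -0.193527818, 0.035868343]

def gamma_poly(x):
    """Gamma(x) for x > 0 via the recurrence and the polynomial.

    The polynomial error is |eps| <= 3e-7 on [1, 2]; the recurrence
    propagates it (multiplicatively) outside that interval.
    """
    if x < 1:
        return gamma_poly(x + 1) / x           # Gamma(x) = Gamma(x + 1) / x
    prod = 1.0
    while x > 2:                               # reduce to 1 <= x <= 2
        x -= 1
        prod *= x                              # Gamma(x + 1) = x * Gamma(x)
    t = x - 1                                  # evaluate Gamma(t + 1), 0 <= t <= 1
    poly = 1.0 + sum(b * t**(i + 1) for i, b in enumerate(B))
    return prod * poly

assert abs(gamma_poly(1.0) - 1.0) < 1e-6                   # Gamma(1) = 1
assert abs(gamma_poly(0.5) - math.sqrt(math.pi)) < 1e-5    # Gamma(1/2) = sqrt(pi)
assert abs(gamma_poly(6.0) - 120.0) < 1e-3                 # Gamma(6) = 5! = 120
```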

1.5 The Landau notation

To the mathematician Edmund Landau is ascribed a special notation describing the general behavior of a function f(x) when x approaches some definite value. We are mainly interested in the case x → ∞, but this should not be considered a restriction. Landau notation is also known as O-notation (or big-oh notation), because of the use of the letter O to denote the desired behavior.

Let us consider functions f: N → R (i.e., sequences of real numbers); given two functions f(n) and g(n), we say that f(n) is O(g(n)), or that f(n) is in the order of g(n), if and only if:

lim_{n→∞} f(n)/g(n) < ∞

In formulas we write f(n) = O(g(n)) or also f(n) ∼ g(n). Besides, if at the same time we have:

lim_{n→∞} g(n)/f(n) < ∞

we say that f(n) is in the same order as g(n) and write f(n) = Θ(g(n)) or f(n) ≈ g(n).

It is easy to see that "∼" is an order relation between functions f: N → R, and that "≈" is an equivalence relation. We observe explicitly that when f(n) is in the same order as g(n), a constant K ≠ 0 exists such that:

lim_{n→∞} f(n)/(K g(n)) = 1    or    lim_{n→∞} f(n)/g(n) = K;

the constant K is very important and will often be used.

Before making some important comments on Landau notation, we wish to introduce a last definition: we say that f(n) is of smaller order than g(n), and write f(n) = o(g(n)), iff:

lim_{n→∞} f(n)/g(n) = 0.

Obviously, this is in accordance with the previous definitions, but the notation introduced (the small-oh notation) is used rather frequently and should be known.
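These limit definitions can be illustrated numerically; for instance (our own example), with f(n) = n ln n and g(n) = n² the ratio f(n)/g(n) shrinks toward 0, so n ln n = o(n²), while 3n² + n against n² tends to the constant K = 3, so the two are Θ of each other:

```python
import math

for n in (10**3, 10**6, 10**9):
    print(n * math.log(n) / n**2)   # tends to 0:  n ln n = o(n^2)
    print((3 * n**2 + n) / n**2)    # tends to K = 3:  3n^2 + n = Theta(n^2)
```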

If f(n) and g(n) describe the behavior of two algorithms A and B solving the same problem, we will say that A is asymptotically better than B iff f(n) = o(g(n)); instead, the two algorithms are asymptotically equivalent iff f(n) = Θ(g(n)). This is rather clear, because when f(n) = o(g(n)) the number of operations performed by A is substantially less than the number of operations performed by B. However, when f(n) = Θ(g(n)), the number of operations is the same except for a constant factor K, which remains the same as n → ∞. The constant K can simply depend on the particular realization of the algorithms A and B, and with two different implementations we may have K < 1 or K > 1. Therefore, in general, when f(n) = Θ(g(n)) we cannot say which algorithm is better, this depending on the particular realization or on the particular computer on which the algorithms are run. Obviously, if A and B are both realized on the same computer and in the best possible way, a value K < 1 tells us that algorithm A is relatively better than B, and vice versa when K > 1.

It is also possible to give an absolute evaluation of the performance of a given algorithm A, whose behavior is described by a sequence of values f(n). This is done by comparing f(n) against an absolute scale of values. The scale most frequently used contains powers of n, logarithms and exponentials:

O(1) < O(ln n) < O(√n) < O(n) < O(n ln n) < O(n√n) < O(n²) < ··· < O(n⁵) < ··· < O(eⁿ) < O(e^{eⁿ}) < ···.

This scale reflects well-known properties: the logarithm grows more slowly than any power n^ε, however small ε, while eⁿ grows faster than any power n^k, however large k, independent of n. Note that nⁿ = e^{n ln n} and therefore O(eⁿ) < O(nⁿ). As a matter of fact, the scale is not complete, as we obviously have O(n^{0.4}) < O(√n) and O(ln ln n) < O(ln n), but the reader can easily fill in other values which can be of interest to him or her.

We can compare f(n) to the elements of the scale and decide, for example, that f(n) = O(n); this means that algorithm A performs at least as well as any algorithm B whose behavior is described by a function g(n) = Θ(n).

Let f(n) be a sequence describing the behavior of an algorithm A. If f(n) = O(1), then the algorithm A performs in constant time, i.e., in a time independent of n, the parameter used to evaluate the algorithm's behavior. Algorithms of this type are the best possible algorithms, and they are especially interesting when n becomes very large. If f(n) = O(n), the algorithm A is said to be linear; if f(n) = O(ln n), A is said to be logarithmic; if f(n) = O(n²), A is said to be quadratic; and if f(n) ≥ O(eⁿ), A is said to be exponential. In general, if f(n) ≤ O(n^r) for some r ∈ R, then A is said to be polynomial.

As a standard terminology, mainly used in the "Theory of Complexity", polynomial algorithms are also called tractable, while exponential algorithms are called intractable. These names are due to the following observation. Suppose we have a linear algorithm $A$ and an exponential algorithm $B$, not necessarily solving the same problem, and suppose that an hour of computer time executes both $A$ and $B$ with $n = 40$. If a new computer is used which is 1000 times faster than the old one, in an hour algorithm $A$ will execute with $m$ such that $f(m) = K_A m = 1000 K_A n$, or $m = 1000n = 40{,}000$. Therefore, the problem solved by the new computer is 1000 times larger than the problem solved by the old computer. For algorithm $B$ we have $g(m) = K_B e^m = 1000 K_B e^n$; by simplifying and taking logarithms, we find $m = n + \ln 1000 \approx n + 6.9 < 47$. Therefore, the improvement achieved by the new computer is almost negligible.
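The back-of-the-envelope computation above is easy to reproduce. The following is an illustrative Python sketch (not from the book); the function names are my own:

```python
import math

def new_size_linear(n, speedup):
    # K*m = speedup * K*n  =>  m = speedup * n
    return speedup * n

def new_size_exponential(n, speedup):
    # K*e**m = speedup * K*e**n  =>  m = n + ln(speedup)
    return n + math.log(speedup)

print(new_size_linear(40, 1000))        # 40000
print(new_size_exponential(40, 1000))   # about 46.9
```

A thousandfold speedup multiplies the tractable problem size but only adds a constant to the intractable one.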

Chapter 2

Special numbers

A sequence is a mapping from the set $\mathbb{N}$ of natural numbers into some other set of numbers. If $f : \mathbb{N} \to \mathbb{R}$, the sequence is called a sequence of real numbers; if $f : \mathbb{N} \to \mathbb{Q}$, the sequence is called a sequence of rational numbers; and so on. Usually, the image of a $k \in \mathbb{N}$ is denoted by $f_k$ instead of the traditional $f(k)$, and the whole sequence is abbreviated as $(f_0, f_1, f_2, \ldots) = (f_k)_{k \in \mathbb{N}}$. Because of this notation, an element $k \in \mathbb{N}$ is called an index.

Often, we also study double sequences, i.e., mappings $f : \mathbb{N} \times \mathbb{N} \to \mathbb{R}$ or into some other numeric set. In this case also, instead of writing $f(n, k)$ we will usually write $f_{n,k}$, and the whole sequence will be denoted by $\{f_{n,k} \mid n, k \in \mathbb{N}\}$ or $(f_{n,k})_{n,k \in \mathbb{N}}$. A double sequence can be displayed as an infinite array of numbers, whose first row is the sequence $(f_{0,0}, f_{0,1}, f_{0,2}, \ldots)$, the second row is $(f_{1,0}, f_{1,1}, f_{1,2}, \ldots)$, and so on. The array can also be read by columns, and then $(f_{0,0}, f_{1,0}, f_{2,0}, \ldots)$ is the first column, $(f_{0,1}, f_{1,1}, f_{2,1}, \ldots)$ is the second column, and so on. Therefore, the index $n$ is the row index and the index $k$ is the column index.

We wish to describe here some sequences and double sequences of numbers which occur frequently in the analysis of algorithms. They arise in the study of very simple and basic combinatorial problems, and therefore appear in more complex situations as well; they can be considered the fundamental bricks in a solid wall.

2.1 Mappings and powers

In all branches of Mathematics two concepts appear which have to be taken as basic: the concept of a set and the concept of a mapping. A set is a collection of objects and it must be considered a primitive concept, i.e., a concept which cannot be defined in terms of other, more elementary concepts. If $a, b, c, \ldots$ are the objects (or elements) in a set denoted by $A$, we write $A = \{a, b, c, \ldots\}$, thus giving an extensional definition of this particular set. If a set $B$ is defined through a property $P$ of its elements, we write $B = \{x \mid P(x) \text{ is true}\}$, thus giving an intensional definition of the particular set $B$.

If $S$ is a finite set, then $|S|$ denotes its cardinality, or the number of its elements. The order in which we write or consider the elements of $S$ is irrelevant. If we wish to emphasize a particular arrangement or ordering of the elements in $S$, we write $(a_1, a_2, \ldots, a_n)$, if $|S| = n$ and $S = \{a_1, a_2, \ldots, a_n\}$ in any order. This is the vector notation and is used to represent arrangements of $S$. Two arrangements of $S$ are different if and only if an index $k$ exists for which the elements corresponding to that index in the two arrangements are different; obviously, as sets, the two arrangements continue to be the same.

If $A, B$ are two sets, a mapping or function from $A$ into $B$, denoted $f : A \to B$, is a subset of the Cartesian product of $A$ by $B$, $f \subseteq A \times B$, such that every element $a \in A$ is the first element of one and only one pair $(a, b) \in f$. The usual notation for $(a, b) \in f$ is $f(a) = b$, and $b$ is called the image of $a$ under the mapping $f$. The set $A$ is the domain of the mapping, while $B$ is the range or codomain. A function for which every $a_1 \neq a_2 \in A$ corresponds to pairs $(a_1, b_1)$ and $(a_2, b_2)$ with $b_1 \neq b_2$ is called injective. A function in which every $b \in B$ belongs to at least one pair $(a, b)$ is called surjective. A bijection or 1-1 correspondence or 1-1 mapping is any injective function which is also surjective.

If $|A| = n$ and $|B| = m$, the Cartesian product $A \times B$, i.e., the set of all the couples $(a, b)$ with $a \in A$ and $b \in B$, contains exactly $nm$ elements or couples. A more difficult problem is to find out how many mappings from $A$ to $B$ exist. We can observe that every element $a \in A$ must have its image in $B$; this means that we can associate to $a$ any one of the $m$ elements in $B$. Therefore, we have $m \cdot m \cdots m$ different possibilities, when the product is extended to all the $n$ elements in $A$. Since all the mappings can be built in this way, we have a total of $m^n$ different mappings from $A$ into $B$. This also explains why the set of mappings from $A$ into $B$ is often denoted by $B^A$; in fact we can write:

$$|B^A| = |B|^{|A|} = m^n.$$

This formula allows us to solve some simple combinatorial problems. For example, if we toss 5 coins, how many different head/tail configurations are possible? The five coins are the domain of our mappings and the set $\{\text{head}, \text{tail}\}$ is the codomain. Therefore we have a total of $2^5 = 32$ different configurations. Similarly, if we toss three dice, the total number of configurations is $6^3 = 216$. In the same way, we can count the number of subsets of a set $S$ having $|S| = n$. In fact, let us consider, given a subset $A \subseteq S$, the mapping $\chi_A : S \to \{0, 1\}$ defined by:

$$\chi_A(x) = 1 \text{ for } x \in A \qquad \chi_A(x) = 0 \text{ for } x \notin A.$$

This is called the characteristic function of the subset $A$; every two different subsets of $S$ have different characteristic functions, and every mapping $f : S \to \{0, 1\}$ is the characteristic function of some subset $A \subseteq S$, namely the subset $\{x \in S \mid f(x) = 1\}$. Therefore, there are as many subsets of $S$ as there are characteristic functions; but these are $2^n$ by the formula above.
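This correspondence between subsets and characteristic functions is also how subsets are commonly enumerated on a computer: every integer from $0$ to $2^n - 1$, read in binary, is one characteristic function. An illustrative Python sketch (not part of the original text):

```python
def subsets(s):
    """Enumerate all subsets of the list s, one per characteristic function."""
    n = len(s)
    for mask in range(2 ** n):          # each mask is a function s -> {0,1}
        yield [s[i] for i in range(n) if mask & (1 << i)]

all_subs = list(subsets(['a', 'b', 'c']))
print(len(all_subs))   # 2^3 = 8 subsets
```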

A finite set $A$ is sometimes called an alphabet and its elements symbols or letters. Any sequence of letters is called a word; the empty sequence is the empty word and is denoted by $\epsilon$. From the previous considerations, if $|A| = n$, the number of words of length $m$ is $n^m$. The first part of Chapter 6 is devoted to some basic notions on special sets of words, or languages.

2.2 Permutations

In the usual sense, a permutation of a set of objects is an arrangement of these objects in any order. For example, three objects, denoted by $a, b, c$, can be arranged in six different ways:

$$(a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a).$$

A very important problem in Computer Science is "sorting": suppose we have $n$ objects from an ordered set, usually some set of numbers or some set of strings (with the common lexicographic order); the objects are given in a random order and the problem consists in sorting them according to the given order. For example, by sorting $(60, 51, 80, 77, 44)$ we should obtain $(44, 51, 60, 77, 80)$, and the real problem is to obtain this ordering in the shortest possible time. In other words, we start with a random permutation of the $n$ objects, and wish to arrive at their standard ordering, the one in accordance with their supposed order relation (e.g., "less than").

In order to abstract from the particular nature of the $n$ objects, we will use the numbers $\{1, 2, \ldots, n\} = \mathbb{N}_n$, and define a permutation as a 1-1 mapping $\pi : \mathbb{N}_n \to \mathbb{N}_n$. By identifying $a$ with 1, $b$ with 2 and $c$ with 3, the six permutations of 3 objects are written:

$$\begin{pmatrix}1&2&3\\1&2&3\end{pmatrix}, \begin{pmatrix}1&2&3\\1&3&2\end{pmatrix}, \begin{pmatrix}1&2&3\\2&1&3\end{pmatrix}, \begin{pmatrix}1&2&3\\2&3&1\end{pmatrix}, \begin{pmatrix}1&2&3\\3&1&2\end{pmatrix}, \begin{pmatrix}1&2&3\\3&2&1\end{pmatrix},$$

where, conventionally, the first line contains the elements of $\mathbb{N}_n$ in their proper order, and the second line contains the corresponding images. This is the usual representation for permutations, but since the first line can be understood without ambiguity, the vector representation for permutations is more common. This consists in writing the second line (the images) in the form of a vector. Therefore, the six permutations are $(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)$, respectively.

Let us examine the permutation $\pi = (3, 2, 1)$, for which we have $\pi(1) = 3$, $\pi(2) = 2$ and $\pi(3) = 1$. If we start with the element 1 and successively apply the mapping $\pi$, we have $\pi(1) = 3$, $\pi(\pi(1)) = \pi(3) = 1$, $\pi(\pi(\pi(1))) = \pi(1) = 3$, and so on. Since $\mathbb{N}_n$ is finite, by starting with any $k \in \mathbb{N}_n$ we must obtain a finite chain of numbers, which will repeat always in the same order. These numbers are said to form a cycle, and the permutation $(3, 2, 1)$ is formed by two cycles: the first one composed of 1 and 3, the second one composed only of 2. We write $(3, 2, 1) = (1\ 3)(2)$, where every cycle is written between parentheses and numbers are separated by blanks, to distinguish a cycle from a vector. Conventionally, a cycle is written with its smallest element first, and the various cycles are arranged according to their first elements. Therefore, in this cycle representation the six permutations are:

$$(1)(2)(3), \quad (1)(2\ 3), \quad (1\ 2)(3), \quad (1\ 2\ 3), \quad (1\ 3\ 2), \quad (1\ 3)(2).$$

A number $k$ for which $\pi(k) = k$ is called a fixed point of $\pi$. The corresponding cycles, formed by a single element, are conventionally understood, except in the identity $(1, 2, \ldots, n) = (1)(2)\cdots(n)$, in which all the elements are fixed points; the identity is simply written $(1)$. Consequently, the usual representation of the six permutations is:

$$(1) \quad (2\ 3) \quad (1\ 2) \quad (1\ 2\ 3) \quad (1\ 3\ 2) \quad (1\ 3).$$

A permutation without any fixed point is called a derangement. A cycle with only two elements is called a transposition. The degree of a cycle is the number of its elements plus one; the degree of a permutation is the sum of the degrees of its cycles. The six permutations above have degree 2, 3, 3, 4, 4, 3, respectively. A permutation is even or odd according as its degree is even or odd.

The permutation $(8, 9, 4, 3, 6, 1, 7, 2, 10, 5)$, in vector notation, has cycle representation $(1\ 8\ 2\ 9\ 10\ 5\ 6)(3\ 4)$, the number 7 being a fixed point. The long cycle $(1\ 8\ 2\ 9\ 10\ 5\ 6)$ has degree 8; therefore the permutation degree is $8 + 3 + 2 = 13$ and the permutation is odd.
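The passage from vector notation to cycle notation is purely mechanical. An illustrative Python sketch (not from the book) that extracts the cycles of the example above and computes the degree:

```python
def cycles(perm):
    """Cycle representation of a permutation given in vector notation.

    perm[i] is the image of i+1; each cycle starts at its smallest element."""
    n = len(perm)
    seen, result = set(), []
    for start in range(1, n + 1):
        if start not in seen:
            cycle, k = [], start
            while k not in seen:       # follow images until the cycle closes
                seen.add(k)
                cycle.append(k)
                k = perm[k - 1]
            result.append(tuple(cycle))
    return result

def degree(perm):
    """Degree of a permutation: sum over cycles of (length + 1)."""
    return sum(len(c) + 1 for c in cycles(perm))

print(cycles((8, 9, 4, 3, 6, 1, 7, 2, 10, 5)))
# [(1, 8, 2, 9, 10, 5, 6), (3, 4), (7,)]
print(degree((8, 9, 4, 3, 6, 1, 7, 2, 10, 5)))   # 13, an odd permutation
```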

2.3 The group structure

Let $n \in \mathbb{N}$; $P_n$ denotes the set of all the permutations of $n$ elements, i.e., according to the previous sections, the set of 1-1 mappings $\pi : \mathbb{N}_n \to \mathbb{N}_n$. If $\pi, \rho \in P_n$, we can perform their composition, i.e., a new permutation $\sigma$ defined as $\sigma(k) = \rho(\pi(k))$, written $\sigma = \pi \circ \rho$ with the left-to-right convention that $\pi$ is applied first. An example in $P_7$ is:

$$\pi \circ \rho = \begin{pmatrix}1&2&3&4&5&6&7\\1&5&6&7&4&2&3\end{pmatrix} \circ \begin{pmatrix}1&2&3&4&5&6&7\\4&5&2&1&7&6&3\end{pmatrix} = \begin{pmatrix}1&2&3&4&5&6&7\\4&7&6&3&1&5&2\end{pmatrix}.$$

In fact, for instance, $\pi(2) = 5$ and $\rho(5) = 7$; therefore $\sigma(2) = \rho(\pi(2)) = \rho(5) = 7$, and so on. The vector representation of permutations is not particularly suited for hand evaluation of composition, although it is very convenient for computer implementation. The opposite situation occurs for the cycle representation:

$$(2\ 5\ 4\ 7\ 3\ 6) \circ (1\ 4)(2\ 5\ 7\ 3) = (1\ 4\ 3\ 6\ 5)(2\ 7).$$

Cycles in the left-hand member are read from left to right, and by examining one cycle after the other we find the image of every element by simply remembering the cycle successor of that element. For example, the image of 4 in the first cycle is 7; the second cycle does not contain 7, but the third cycle tells us that the image of 7 is 3. Therefore, the image of 4 is 3. Fixed points are ignored, and this is in accordance with their meaning. The composition symbol '$\circ$' is usually understood and, in reality, the simple juxtaposition of the cycles can well denote their composition.

The identity mapping acts as an identity for composition, since it is composed only of fixed points. Every permutation has an inverse, which is the permutation obtained by reading the given permutation from below, i.e., by sorting the elements in the second line, which become the elements of the first line, and then rearranging the elements of the first line into the second line. For example, the inverse of the first permutation $\pi$ in the example above is:

$$\begin{pmatrix}1&2&3&4&5&6&7\\1&6&7&5&2&3&4\end{pmatrix}.$$

In fact, in cycle notation, we have:

$$(2\ 5\ 4\ 7\ 3\ 6)(2\ 6\ 3\ 7\ 4\ 5) = (2\ 6\ 3\ 7\ 4\ 5)(2\ 5\ 4\ 7\ 3\ 6) = (1).$$

A simple observation is that the inverse of a cycle is obtained by writing its first element followed by all the other elements in reverse order. Hence, the inverse of a transposition is the same transposition. Since composition is associative, we have proved that $(P_n, \circ)$ is a group. The group is not commutative because, for example:

$$\rho \circ \pi = (1\ 4)(2\ 5\ 7\ 3) \circ (2\ 5\ 4\ 7\ 3\ 6) = (1\ 7\ 6\ 2\ 4)(3\ 5) \neq \pi \circ \rho.$$

An involution is a permutation $\pi$ such that $\pi^2 = \pi \circ \pi = (1)$. An involution can only be composed of fixed points and transpositions: by the definition we have $\pi^{-1} = \pi$, and the above observation on the inversion of cycles shows that a cycle with more than 2 elements has an inverse which cannot coincide with the cycle itself.

Till now, we have supposed that in the cycle representation every number is considered only once. However, if we think of a permutation as a product of cycles, we can imagine that its representation is not unique and that an element $k \in \mathbb{N}_n$ can appear in more than one cycle. The representations of $\sigma$ or $\pi \circ \rho$ are examples of this statement. In particular, we can obtain the transposition representation of a permutation; for example, we observe that we have:

$$(2\ 6)(6\ 5)(6\ 4)(6\ 7)(6\ 3) = (2\ 5\ 4\ 7\ 3\ 6).$$

We transform the cycle into a product of transpositions by forming a transposition with the first and the last element of the cycle, and then adding further transpositions with the same first element (the last element of the cycle) and, as second elements, the other elements of the cycle in the same order. Besides, we note that we can always add a couple of transpositions such as $(2\ 5)(2\ 5)$, corresponding to the two fixed points $(2)$ and $(5)$, and therefore adding nothing to the permutation. All these remarks show that:

• every permutation can be written as the composition of transpositions;

• this representation is not unique, but any two representations differ by an even number of transpositions;

• the minimal number of transpositions corresponding to a cycle with $\ell$ elements is $\ell - 1$, which has the same parity as the degree $\ell + 1$ of the cycle (fixed points always correspond to an even number of transpositions).

Therefore, we conclude that an even [odd] permutation can be expressed as the composition of an even [odd] number of transpositions. Since the composition of two even permutations is still an even permutation, the set $A_n$ of even permutations is a subgroup of $P_n$ and is called the alternating subgroup, while the whole group $P_n$ is referred to as the symmetric group.

2.4 Counting permutations

How many permutations are there in $P_n$? If $n = 1$, we only have the single permutation $(1)$, and if $n = 2$ we have two permutations, namely $(1, 2)$ and $(2, 1)$. We have already seen that $|P_3| = 6$, and if $n = 0$ we consider the empty vector $()$ as the only possible permutation, that is $|P_0| = 1$. In this way we obtain a sequence $(1, 1, 2, 6, \ldots)$, and we wish to obtain a formula giving us $|P_n|$ for every $n \in \mathbb{N}$.

Let $\pi \in P_n$ be a permutation and $(a_1, a_2, \ldots, a_n)$ its vector representation. We can obtain a permutation in $P_{n+1}$ by simply adding the new element $n+1$ in any position of the representation of $\pi$:

$$(n+1, a_1, a_2, \ldots, a_n) \quad (a_1, n+1, a_2, \ldots, a_n) \quad \cdots \quad (a_1, a_2, \ldots, a_n, n+1).$$

Therefore, from any permutation in $P_n$ we obtain $n+1$ permutations in $P_{n+1}$, and they are all different. Vice versa, if we start with a permutation in $P_{n+1}$ and eliminate the element $n+1$, we obtain one and only one permutation in $P_n$. Therefore, all permutations in $P_{n+1}$ are obtained in the way just described and are obtained only once. So we find:

$$|P_{n+1}| = (n+1) \, |P_n|,$$

which is a simple recurrence relation. By unfolding this recurrence, i.e., by substituting for $|P_n|$ the analogous expression in $|P_{n-1}|$, and so on, we obtain:

$$|P_{n+1}| = (n+1)|P_n| = (n+1)n|P_{n-1}| = \cdots = (n+1)n(n-1)\cdots 1 \cdot |P_0|.$$

Since, as we have seen, $|P_0| = 1$, we have proved that the number of permutations in $P_n$ is given by the product $n(n-1)\cdots 2 \cdot 1$. Therefore, our sequence is:

n      0  1  2  3   4    5    6     7      8
|P_n|  1  1  2  6  24  120  720  5040  40320

As we mentioned in the Introduction, the number $n(n-1)\cdots 2 \cdot 1$ is called $n$ factorial and is denoted by $n!$. For example we have $10! = 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 3{,}628{,}800$. Factorials grow very fast, but they are one of the most important quantities in Mathematics.

When $n \geq 2$, we can compose every permutation in $P_n$ with one transposition, say $(1\ 2)$. This transforms every even permutation into an odd permutation, and vice versa. On the other hand, since $(1\ 2)^{-1} = (1\ 2)$, the transformation is its own inverse, and therefore defines a 1-1 mapping between even and odd permutations. This proves that the number of even (odd) permutations is $n!/2$.

Another simple problem is to determine the number of involutions of $n$ elements. As we have already seen, an involution is composed only of fixed points and transpositions (without repetitions of the elements!). If we denote by $I_n$ the set of involutions of $n$ elements, we can divide $I_n$ into two subsets: $I'_n$ is the set of involutions in which $n$ is a fixed point, and $I''_n$ is the set of involutions in which $n$ belongs to a transposition, say $(k\ n)$. If we eliminate $n$ from the involutions in $I'_n$, we obtain an involution of $n-1$ elements, and vice versa every involution in $I'_n$ can be obtained by adding the fixed point $n$ to an involution in $I_{n-1}$. If we eliminate the transposition $(k\ n)$ from an involution in $I''_n$, we obtain an involution of the remaining $n-2$ elements, a set which contains the element $n-1$ but not the element $k$. In all cases, however, by eliminating $(k\ n)$ from all involutions containing it, we obtain a set of involutions in a 1-1 correspondence with $I_{n-2}$. The element $k$ can assume any of the values $1, 2, \ldots, n-1$, and therefore we obtain $(n-1)$ times $|I_{n-2}|$ involutions.

We now observe that all the involutions in $I_n$ are obtained in this way from involutions in $I_{n-1}$ and $I_{n-2}$, and therefore we have:

$$|I_n| = |I_{n-1}| + (n-1)|I_{n-2}|.$$

Since $|I_0| = 1$, $|I_1| = 1$ and $|I_2| = 2$, from this recurrence relation we can successively find all the values of $|I_n|$. This sequence (see Section 4.9) is therefore:

n      0  1  2  3   4   5   6    7    8
|I_n|  1  1  2  4  10  26  76  232  764
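The recurrence is immediately programmable. An illustrative Python sketch (not from the book) that reproduces the table above:

```python
def involutions(n):
    """Number of involutions of n elements, via |I_n| = |I_{n-1}| + (n-1)|I_{n-2}|."""
    if n < 2:
        return 1
    prev2, prev1 = 1, 1                  # |I_0|, |I_1|
    for m in range(2, n + 1):
        prev2, prev1 = prev1, prev1 + (m - 1) * prev2
    return prev1

print([involutions(n) for n in range(9)])
# [1, 1, 2, 4, 10, 26, 76, 232, 764]
```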

We conclude this section by giving the classical computer program for generating a random permutation of the numbers $\{1, 2, \ldots, n\}$. The procedure shuffle receives the address of the vector and the number of its elements; it fills the vector with the numbers from 1 to $n$ and then uses the standard function random to produce a random permutation, which is returned in the input vector:

    procedure shuffle( var v : vector; n : integer ) ;
      var i, j, a : integer ;
    begin
      for i := 1 to n do v[ i ] := i ;
      for i := n downto 2 do begin
        j := random( i ) + 1 ;
        a := v[ i ] ; v[ i ] := v[ j ] ; v[ j ] := a
      end
    end {shuffle} ;

The procedure exchanges the last element of v with a random element of v, possibly the last element itself. In this way the last element is chosen at random, and the procedure goes on with the last but one element. In this way, the elements of v are properly shuffled and eventually v contains the desired random permutation. The procedure obviously performs in linear time.
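The same idea, commonly known as the Fisher-Yates (or Knuth) shuffle, can be sketched in Python; this is my illustrative translation of the Pascal procedure above, not code from the book:

```python
import random

def shuffle(n):
    """Return a random permutation of 1..n, mirroring the Pascal procedure."""
    v = list(range(1, n + 1))
    for i in range(n - 1, 0, -1):      # positions n-1 .. 1 (0-based indices)
        j = random.randint(0, i)       # uniform in 0..i, inclusive
        v[i], v[j] = v[j], v[i]        # swap, possibly with itself
    return v

p = shuffle(10)
print(sorted(p))   # [1, 2, ..., 10]: it is indeed a permutation
```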

2.5 Dispositions and Combinations

Permutations are a special case of a more general situation. If we have $n$ objects, we can ask how many different orderings exist of $k$ among the $n$ objects. For example, if we have 4 objects $a, b, c, d$, we can make 12 different arrangements with two objects chosen from $a, b, c, d$. They are:

$$(a, b)\ (a, c)\ (a, d)\ (b, a)\ (b, c)\ (b, d)\ (c, a)\ (c, b)\ (c, d)\ (d, a)\ (d, b)\ (d, c).$$

These arrangements are called dispositions and, in general, we can use any one of the $n$ objects as the first element of the disposition. There remain only $n-1$ objects to be used as the second element, $n-2$ objects to be used as the third element, and so on. Therefore, the $k$ objects can be selected in $n(n-1)\cdots(n-k+1)$ different ways. If $D_{n,k}$ denotes the number of possible dispositions of $n$ elements in groups of $k$, we have:

$$D_{n,k} = n(n-1)\cdots(n-k+1) = n^{\underline{k}}.$$

The symbol $n^{\underline{k}}$ is called a falling factorial, because it consists of $k$ factors beginning with $n$ and decreasing by one down to $n-k+1$. Obviously, $n^{\underline{n}} = n!$ and, by convention, $n^{\underline{0}} = 1$. There also exists a rising factorial $n^{\overline{k}} = n(n+1)\cdots(n+k-1)$, often denoted by $(n)_k$, the so-called Pochhammer symbol.

When in $k$-dispositions we do not consider the order of the elements, we obtain what are called $k$-combinations. Therefore, there are only 6 2-combinations of 4 objects, and they are:

$$\{a, b\}\ \{a, c\}\ \{a, d\}\ \{b, c\}\ \{b, d\}\ \{c, d\}.$$

Combinations are written between braces, because they are simply the subsets with $k$ objects of a set of $n$ objects. If $A = \{a, b, c\}$, all the possible combinations of these objects are:

$$C_{3,0} = \{\emptyset\} \quad C_{3,1} = \{\{a\}, \{b\}, \{c\}\} \quad C_{3,2} = \{\{a, b\}, \{a, c\}, \{b, c\}\} \quad C_{3,3} = \{\{a, b, c\}\}.$$

The number of $k$-combinations of $n$ objects is denoted by $\binom{n}{k}$, which is often read "$n$ choose $k$", and is called a binomial coefficient. By the very definition we have:

$$\binom{n}{0} = 1 \qquad \binom{n}{n} = 1 \qquad \forall n \in \mathbb{N},$$

because, given a set of $n$ elements, the empty set is the only subset with 0 elements, and the whole set is the only subset with $n$ elements.

The name "binomial coefficients" is due to the well-known "binomial formula":

$$(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k},$$

which is easily proved. In fact, when expanding the product $(a+b)(a+b)\cdots(a+b)$, we choose a term from each factor $(a+b)$; the coefficient of the resulting term $a^k b^{n-k}$ is obtained by summing over all the possible choices of $k$ $a$'s and $n-k$ $b$'s, which, by the definition above, are just $\binom{n}{k}$.

There exists a simple formula to compute binomial coefficients. As we have seen, there are $n^{\underline{k}}$ different $k$-dispositions of $n$ objects; by permuting the $k$ objects in a disposition, we obtain $k!$ dispositions with the same elements. Therefore, $k!$ dispositions correspond to a single combination and we have:

$$\binom{n}{k} = \frac{n^{\underline{k}}}{k!} = \frac{n(n-1)\cdots(n-k+1)}{k!}.$$

This formula gives a simple way to compute binomial coefficients recursively. In fact we have:

$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n}{k} \cdot \frac{(n-1)\cdots(n-k+1)}{(k-1)!} = \frac{n}{k}\binom{n-1}{k-1},$$

and the expression on the right is successively reduced to $\binom{r}{0}$, which is 1 as we have already seen. For example, we have:

$$\binom{7}{3} = \frac{7}{3}\binom{6}{2} = \frac{7}{3}\cdot\frac{6}{2}\binom{5}{1} = \frac{7}{3}\cdot\frac{6}{2}\cdot\frac{5}{1}\binom{4}{0} = 35.$$

It is not difficult to compute a binomial coefficient such as $\binom{100}{3}$, but it is a hard job to compute $\binom{100}{97}$ by means of the above formulas. There exists, however, a symmetry formula which is very useful. Let us begin by observing that:

$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n\cdots(n-k+1)}{k!}\cdot\frac{(n-k)\cdots 1}{(n-k)\cdots 1} = \frac{n!}{k!(n-k)!}.$$

This is a very important formula in its own right, and it shows that:

$$\binom{n}{n-k} = \frac{n!}{(n-k)!(n-(n-k))!} = \binom{n}{k}.$$

From this formula we immediately obtain $\binom{100}{97} = \binom{100}{3}$. The most difficult computing problem is the evaluation of the central binomial coefficient $\binom{2k}{k}$, for which symmetry gives no help.

The reader is invited to produce a computer program to evaluate binomial coefficients. He (or she) is warned not to use the formula $n!/(k!(n-k)!)$, which can produce very large numbers, exceeding the capacity of the computer when $n, k$ are not small.
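Following this advice, one possible sketch of such a program (illustrative Python, not from the book) combines the symmetry rule with the multiplicative formula, so that intermediate values never exceed the final result by much:

```python
def binomial(n, k):
    """Compute C(n, k) multiplicatively, applying the symmetry rule first."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)                  # symmetry: C(n, k) = C(n, n-k)
    result = 1
    for i in range(1, k + 1):
        # result * (n-k+i) is always divisible by i, so // is exact
        result = result * (n - k + i) // i
    return result

print(binomial(100, 97))   # 161700, the same as C(100, 3)
```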

The definition of a binomial coefficient can easily be extended to any real numerator:

$$\binom{r}{k} = \frac{r^{\underline{k}}}{k!} = \frac{r(r-1)\cdots(r-k+1)}{k!}.$$

For example we have:

$$\binom{1/2}{3} = \frac{(1/2)(-1/2)(-3/2)}{3!} = \frac{1}{16},$$

but in this case the symmetry rule does not make sense. We point out that:

$$\binom{-n}{k} = \frac{-n(-n-1)\cdots(-n-k+1)}{k!} = \frac{(-1)^k (n+k-1)^{\underline{k}}}{k!} = \binom{n+k-1}{k}(-1)^k,$$

which allows us to express a binomial coefficient with a negative integer numerator as a binomial coefficient with a positive numerator. This is known as the negation rule and will be used very often.

If in a combination we are allowed to have several copies of the same element, we obtain a combination with repetitions. A useful exercise is to prove that the number of the $k$ by $k$ combinations with repetitions of $n$ elements is:

$$R_{n,k} = \binom{n+k-1}{k}.$$
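As a quick empirical check of this exercise (an illustrative Python sketch, not from the book), we can enumerate combinations with repetitions directly and compare the count with the formula:

```python
from itertools import combinations_with_replacement
from math import comb

n, k = 4, 3
# count all multisets of size k drawn from n distinct elements
count = sum(1 for _ in combinations_with_replacement(range(n), k))
print(count, comb(n + k - 1, k))   # both are 20
```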

2.6 The Pascal triangle

Binomial coefficients satisfy a very important recurrence relation, which we are now going to prove. As we know, $\binom{n}{k}$ is the number of subsets with $k$ elements of a set with $n$ elements. We can count the number of such subsets in the following way. Let $\{a_1, a_2, \ldots, a_n\}$ be the elements of the base set, and let us fix one of these elements, e.g., $a_n$. We can distinguish the subsets with $k$ elements into two classes: the subsets containing $a_n$ and the subsets that do not contain $a_n$. Let $S^+$ and $S^-$ be these two classes. We now point out that $S^+$ can be seen (by eliminating $a_n$) as the class of subsets with $k-1$ elements of a set with $n-1$ elements; therefore, the number of elements of $S^+$ is $\binom{n-1}{k-1}$. The class $S^-$ can be seen as composed of the subsets with $k$ elements of a set with $n-1$ elements, i.e., the base set minus $a_n$: their number is therefore $\binom{n-1}{k}$. By summing these two contributions, we obtain the recurrence relation:

$$\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1},$$

which can be used with the initial conditions:

$$\binom{n}{0} = \binom{n}{n} = 1 \qquad \forall n \in \mathbb{N}.$$

For example, we have:

$$\binom{4}{2} = \binom{3}{2} + \binom{3}{1} = \binom{2}{2} + \binom{2}{1} + \binom{2}{1} + \binom{2}{0} = 2 + 2\binom{2}{1} = 2 + 2\left[\binom{1}{1} + \binom{1}{0}\right] = 2 + 2 \cdot 2 = 6.$$

This recurrence is not particularly suitable for numerical computation. However, it gives a simple rule to compute all the binomial coefficients successively. Let us arrange them in an infinite array, whose rows represent the number $n$ and whose columns represent the number $k$. The recurrence tells us that the element in position $(n, k)$ is obtained by summing two elements in the previous row: the element just above position $(n, k)$, i.e., in position $(n-1, k)$, and the element to its left, i.e., in position $(n-1, k-1)$. The array is initially filled with 1's in the first column (corresponding to the various $\binom{n}{0}$) and on the main diagonal (corresponding to the numbers $\binom{n}{n}$). See Table 2.1.

n\k  0  1  2  3  4  5  6
0    1
1    1  1
2    1  2  1
3    1  3  3  1
4    1  4  6  4  1
5    1  5 10 10  5  1
6    1  6 15 20 15  6  1

Table 2.1: The Pascal triangle

We actually obtain an infinite, lower triangular array known as the Pascal triangle (or Tartaglia-Pascal triangle). The symmetry rule is quite evident, and a simple observation is that the sum of the elements in row $n$ is $2^n$. The proof of this fact is immediate, since row $n$ represents the total number of subsets of a set with $n$ elements, and this number is obviously $2^n$. The reader can try to prove that the alternating row sums, e.g., the sum $1 - 4 + 6 - 4 + 1$, equal zero, except for the first row, the row with $n = 0$.
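Both observations are easy to check mechanically. An illustrative Python sketch (not from the book) builds the rows by the addition recurrence and verifies the sums:

```python
def pascal_row(n):
    """Row n of the Pascal triangle, built with the addition recurrence."""
    row = [1]
    for _ in range(n):
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return row

row4 = pascal_row(4)
print(row4)                                           # [1, 4, 6, 4, 1]
print(sum(row4))                                      # 2^4 = 16
print(sum((-1) ** k * c for k, c in enumerate(row4)))  # alternating sum: 0
```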

As we have already mentioned, the numbers $c_n = \binom{2n}{n}$ are called central binomial coefficients, and their sequence begins:

n    0  1  2   3   4    5    6     7     8
c_n  1  2  6  20  70  252  924  3432  12870

They are very important and, for example, we can express the binomial coefficients $\binom{-1/2}{n}$ in terms of the central binomial coefficients:

$$\binom{-1/2}{n} = \frac{(-1/2)(-3/2)\cdots(-(2n-1)/2)}{n!} = \frac{(-1)^n \, 1 \cdot 3 \cdots (2n-1)}{2^n \, n!} = \frac{(-1)^n \, 1 \cdot 2 \cdot 3 \cdot 4 \cdots (2n-1)(2n)}{2^n \, n! \; 2 \cdot 4 \cdots (2n)} = \frac{(-1)^n (2n)!}{2^n n! \; 2^n (1 \cdot 2 \cdots n)} = \frac{(-1)^n}{4^n}\frac{(2n)!}{n!^2} = \frac{(-1)^n}{4^n}\binom{2n}{n}.$$

In a similar way, we can prove the following identities:

$$\binom{1/2}{n} = \frac{(-1)^{n-1}}{4^n(2n-1)}\binom{2n}{n} \qquad \binom{3/2}{n} = \frac{(-1)^n \, 3}{4^n(2n-1)(2n-3)}\binom{2n}{n} \qquad \binom{-3/2}{n} = \frac{(-1)^n(2n+1)}{4^n}\binom{2n}{n}.$$

An important point of this generalization is that the binomial formula can be extended to real exponents. Let us consider the function $f(x) = (1+x)^r$; it is continuous and can be differentiated as many times as we wish; in fact we have:

$$f^{(n)}(x) = \frac{d^n}{dx^n}(1+x)^r = r^{\underline{n}}(1+x)^{r-n},$$

as can be shown by mathematical induction. The first two cases are $f^{(0)}(x) = (1+x)^r = r^{\underline{0}}(1+x)^{r-0}$ and $f'(x) = r(1+x)^{r-1}$. Suppose now that the formula holds true for some $n \in \mathbb{N}$ and let us differentiate it once more:

$$f^{(n+1)}(x) = r^{\underline{n}}(r-n)(1+x)^{r-n-1} = r^{\underline{n+1}}(1+x)^{r-(n+1)},$$

and this proves our statement. Because of that, $(1+x)^r$ has a Taylor expansion around the point $x = 0$ of the form:

$$(1+x)^r = f(0) + \frac{f'(0)}{1!}x + \frac{f''(0)}{2!}x^2 + \cdots + \frac{f^{(n)}(0)}{n!}x^n + \cdots.$$

The coefficient of $x^n$ is therefore $f^{(n)}(0)/n! = r^{\underline{n}}/n! = \binom{r}{n}$, and so:

$$(1+x)^r = \sum_{n=0}^{\infty} \binom{r}{n} x^n.$$
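The extended binomial series is easy to test numerically. The following illustrative Python sketch (not from the book) sums the first terms of the series for $(1+x)^{1/2}$ and compares the result with the square root:

```python
def gen_binomial(r, k):
    """Generalized binomial coefficient r(r-1)...(r-k+1)/k! for real r."""
    result = 1.0
    for i in range(k):
        result = result * (r - i) / (i + 1)
    return result

x, r = 0.2, 0.5
partial = sum(gen_binomial(r, n) * x ** n for n in range(20))
print(partial)   # close to (1.2)**0.5 = 1.0954451...
```

For $|x| < 1$ the terms shrink rapidly, so a few terms already give an excellent approximation.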

We conclude with the following property, which is called the cross-product rule:

$$\binom{n}{k}\binom{k}{r} = \frac{n!}{k!(n-k)!}\cdot\frac{k!}{r!(k-r)!} = \frac{n!}{r!(n-r)!}\cdot\frac{(n-r)!}{(n-k)!(k-r)!} = \binom{n}{r}\binom{n-r}{k-r}.$$

This rule, together with the symmetry and negation rules, gives the three basic properties of binomial coefficients:

$$\binom{n}{k} = \binom{n}{n-k} \qquad \binom{-n}{k} = \binom{n+k-1}{k}(-1)^k \qquad \binom{n}{k}\binom{k}{r} = \binom{n}{r}\binom{n-r}{k-r},$$

which should be memorized.
**

2.7 Harmonic numbers

It is well known that the harmonic series:

$$\sum_{k=1}^{\infty} \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots$$

diverges. In fact, if we cumulate the $2^m$ numbers from $1/(2^m+1)$ to $1/2^{m+1}$, we obtain:

$$\frac{1}{2^m+1} + \frac{1}{2^m+2} + \cdots + \frac{1}{2^{m+1}} > \frac{1}{2^{m+1}} + \frac{1}{2^{m+1}} + \cdots + \frac{1}{2^{m+1}} = \frac{2^m}{2^{m+1}} = \frac{1}{2},$$

and therefore the sum cannot be bounded. On the other hand, we can define:

$$H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n},$$

a finite, partial sum of the harmonic series. This number has a well-defined value and is called a harmonic number. Conventionally, we set $H_0 = 0$, and the sequence of harmonic numbers begins:

n    0  1  2    3     4      5       6      7        8
H_n  0  1  3/2  11/6  25/12  137/60  49/20  363/140  761/280
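The table of exact values is easy to reproduce with rational arithmetic. An illustrative Python sketch (not from the book):

```python
from fractions import Fraction

def harmonic(n):
    """Exact harmonic number H_n as a fraction (H_0 = 0 by convention)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

print([str(harmonic(n)) for n in range(9)])
# ['0', '1', '3/2', '11/6', '25/12', '137/60', '49/20', '363/140', '761/280']
```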

Harmonic numbers arise in the analysis of many algorithms, and it is very useful to know an approximate value for them. Let us consider the series:

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots = \ln 2$$

and let us define:

$$L_n = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{(-1)^{n-1}}{n}.$$

Obviously we have:

$$H_{2n} - L_{2n} = 2\left(\frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2n}\right) = H_n,$$

or $H_{2n} = L_{2n} + H_n$, and since the series for $\ln 2$ is alternating in sign, the error committed by truncating it at any place is less than the first discarded element. Therefore:

$$\ln 2 - \frac{1}{2n} < L_{2n} < \ln 2,$$

and by adding $H_n$ to all members:

$$H_n + \ln 2 - \frac{1}{2n} < H_{2n} < H_n + \ln 2.$$

Let us now consider the two cases $n = 2^{k-1}$ and $n = 2^{k-2}$:

$$H_{2^{k-1}} + \ln 2 - \frac{1}{2^k} < H_{2^k} < H_{2^{k-1}} + \ln 2$$

$$H_{2^{k-2}} + \ln 2 - \frac{1}{2^{k-1}} < H_{2^{k-1}} < H_{2^{k-2}} + \ln 2.$$

By summing and simplifying these two expressions, we obtain:

$$H_{2^{k-2}} + 2\ln 2 - \frac{1}{2^k} - \frac{1}{2^{k-1}} < H_{2^k} < H_{2^{k-2}} + 2\ln 2.$$

We can now iterate this procedure and eventually find:

$$H_{2^0} + k\ln 2 - \frac{1}{2^k} - \frac{1}{2^{k-1}} - \cdots - \frac{1}{2^1} < H_{2^k} < H_{2^0} + k\ln 2.$$

Since $H_{2^0} = H_1 = 1$, we have the bounds:

$$k\ln 2 < H_{2^k} < k\ln 2 + 1.$$

These limitations can be extended to every n, and

since the values of the H

n

’s are increasing, this im-

plies that a constant γ should exist (0 < γ < 1) such

that:

H

n

→ ln n +γ as n → ∞

This constant is called the Euler-Mascheroni con-

stant and, as we have already mentioned, its value

is: γ ≈ 0.5772156649 . . .. Later we will prove the

more accurate approximation of the H

n

’s we quoted

in the Introduction.
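The approximation $H_n \approx \ln n + \gamma$ is easy to check numerically. The following sketch (in Python; it is not part of the book) compares the exact partial sums with the approximation:

```python
import math

GAMMA = 0.5772156649  # Euler-Mascheroni constant, truncated

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0 by convention."""
    return sum(1.0 / k for k in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, harmonic(n), math.log(n) + GAMMA)
```

Already at $n = 1000$ the two values agree to three decimal places, in accordance with the error term of roughly $1/(2n)$ that will be proven later.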

The generalized harmonic numbers are defined as:

$$H_n^{(s)} = \frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \cdots + \frac{1}{n^s}$$

and $H_n^{(1)} = H_n$. They are the partial sums of the series defining the Riemann $\zeta$ function:

$$\zeta(s) = \frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \cdots$$

which can be defined in such a way that the sum actually converges except for $s = 1$ (the harmonic series). In particular we have:

$$\zeta(2) = 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \frac{1}{25} + \cdots = \frac{\pi^2}{6}$$

$$\zeta(3) = 1 + \frac{1}{8} + \frac{1}{27} + \frac{1}{64} + \frac{1}{125} + \cdots$$

$$\zeta(4) = 1 + \frac{1}{16} + \frac{1}{81} + \frac{1}{256} + \frac{1}{625} + \cdots = \frac{\pi^4}{90}$$

and in general:

$$\zeta(2n) = \frac{(2\pi)^{2n}}{2(2n)!}\,|B_{2n}|$$

where the $B_n$ are the Bernoulli numbers (see below). No explicit formula is known for $\zeta(2n+1)$, but numerically we have:

$$\zeta(3) \approx 1.202056903\ldots$$

Because of the limited value of $\zeta(s)$, we can set, for large values of $n$:

$$H_n^{(s)} \approx \zeta(s).$$
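The approximation $H_n^{(s)} \approx \zeta(s)$ sets in quickly for $s > 1$; a small numerical check (a sketch, not from the book) makes this visible:

```python
import math

def gen_harmonic(n, s):
    """Generalized harmonic number H_n^(s) = sum_{k=1}^{n} 1/k^s."""
    return sum(1.0 / k**s for k in range(1, n + 1))

print(gen_harmonic(1000, 2), math.pi**2 / 6)   # both close to 1.6449...
print(gen_harmonic(1000, 4), math.pi**4 / 90)  # both close to 1.0823...
```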

2.8 Fibonacci numbers

At the beginning of 1200, Leonardo Fibonacci introduced in Europe the positional notation for numbers, together with the computing algorithms for performing the four basic operations. In fact, Fibonacci was the most important mathematician in western Europe at that time. He posed the following problem: a farmer has a couple of rabbits which generates another couple after two months and, from that moment on, a new couple of rabbits every month. The newly generated couples become fertile after two months, when they begin to generate a new couple of rabbits every month. The problem consists in computing how many couples of rabbits the farmer has after $n$ months.

It is a simple matter to find the initial values: there is one couple at the beginning (1st month) and 1 in the second month. The third month the farmer has 2 couples, and 3 couples the fourth month; in fact, the first couple has generated another pair of rabbits, while the previously generated couple of rabbits has not yet become fertile. The couples become 5 on the fifth month: in fact, there are the 3 couples of the preceding month plus the newly generated couples, which are as many as there are fertile couples; but these are just the couples of two months beforehand, i.e., 2 couples. In general, at the $n$th month, the farmer will have the couples of the previous month plus the new couples, which are generated by the fertile couples, that is the couples he had two months before. If we denote by $F_n$ the number of couples at the $n$th month, we have the Fibonacci recurrence:

$$F_n = F_{n-1} + F_{n-2}$$

with the initial conditions $F_1 = F_2 = 1$. By the same rule, we have $F_0 = 0$, and the sequence of Fibonacci numbers begins:

n     0  1  2  3  4  5  6   7   8   9  10
F_n   0  1  1  2  3  5  8  13  21  34  55

every term being obtained by summing the two preceding numbers in the sequence.
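The recurrence translates directly into a simple iterative program; the following sketch (in Python, not from the book) reproduces the table above. Faster methods of computation will be discussed later:

```python
def fibonacci(n):
    """Iteratively compute F_n, with F_0 = 0 and F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b  # shift the window (F_{k-1}, F_k) forward
    return a

print([fibonacci(n) for n in range(11)])
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```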

Despite the small numbers appearing at the beginning of the sequence, Fibonacci numbers grow very fast, and later we will see how they grow and how they can be computed in a fast way. For the moment, we wish to show how Fibonacci numbers appear in combinatorics and in the analysis of algorithms. Suppose we have some bricks of dimensions $1 \times 2$ dm, and we wish to cover a strip 2 dm wide and $n$ dm long by using these bricks. The problem is to know in how many different ways we can perform this covering. In Figure 2.1 we show the five ways to cover a strip 4 dm long.

Figure 2.1: Fibonacci coverings for a strip 4 dm long

If $M_n$ is this number, we can observe that a covering counted by $M_n$ can be obtained by adding vertically a brick to a covering counted by $M_{n-1}$, or by adding horizontally two bricks to a covering counted by $M_{n-2}$. These are the only ways of proceeding to build our coverings, and therefore we have the recurrence relation:

$$M_n = M_{n-1} + M_{n-2}$$

which is the same recurrence as for Fibonacci numbers. This time, however, we have the initial conditions $M_0 = 1$ (the empty covering is just a covering!) and $M_1 = 1$. Therefore we conclude $M_n = F_{n+1}$.

Euclid's algorithm for computing the Greatest Common Divisor (gcd) of two positive integer numbers is another instance of the appearance of Fibonacci numbers. The problem is to determine the maximal number of divisions performed by Euclid's algorithm. Obviously, this maximum is attained when every division in the process gives 1 as a quotient, since a greater quotient would drastically cut the number of necessary divisions. Let us consider two consecutive Fibonacci numbers, for example 34 and 55, and let us try to find gcd(34, 55):

55 = 1 · 34 + 21
34 = 1 · 21 + 13
21 = 1 · 13 + 8
13 = 1 · 8 + 5
 8 = 1 · 5 + 3
 5 = 1 · 3 + 2
 3 = 1 · 2 + 1
 2 = 2 · 1

We immediately see that the quotients are all 1 (except the last one) and the remainders are decreasing Fibonacci numbers. The process can be inverted to prove that only consecutive Fibonacci numbers enjoy this property. Therefore, we conclude that given two integer numbers $n$ and $m$, the maximal number of divisions performed by Euclid's algorithm is attained when $n, m$ are two consecutive Fibonacci numbers, and the actual number of divisions is the order of the smaller number in the Fibonacci sequence, minus 1.
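The division count is easy to verify by instrumenting Euclid's algorithm; the sketch below (Python, not from the book) reproduces the example: 34 is the 9th Fibonacci number, and gcd(34, 55) takes 9 − 1 = 8 divisions:

```python
def euclid_divisions(m, n):
    """Run Euclid's algorithm on (m, n) and count the divisions performed."""
    count = 0
    while n != 0:
        m, n = n, m % n  # one division step: replace (m, n) by (n, m mod n)
        count += 1
    return count

print(euclid_divisions(55, 34))  # 8
```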

2.9 Walks, trees and Catalan numbers

"Walks" or "paths" are common combinatorial objects and are defined in the following way. Let $\mathbb{Z}^2$ be the integral lattice, i.e., the set of points in $\mathbb{R}^2$ having integer coordinates. A walk or path is a finite sequence of points in $\mathbb{Z}^2$ with the following properties:

1. the origin (0, 0) belongs to the walk;

2. if (x + 1, y + 1) belongs to the walk, then either (x, y + 1) or (x + 1, y) also belongs to the walk.

A pair of points ((x, y), (x + 1, y)) is called an east step and a pair ((x, y), (x, y + 1)) is a north step. It is a simple matter to show that the number of walks composed by $n$ steps and ending at column $k$ (i.e., the last point is $(x, k) = (n - k, k)$) is just $\binom{n}{k}$. In fact, if we denote by $1, 2, \ldots, n$ the $n$ steps, starting with the origin, then we can associate to any walk a subset of $N_n = \{1, 2, \ldots, n\}$, that is the subset of the north-step numbers. Since the other steps should be east steps, a 1-1 correspondence exists between the walks and the subsets of $N_n$ with $k$ elements, which, as we know, are exactly $\binom{n}{k}$. This also proves, in a combinatorial way, the symmetry rule for binomial coefficients.

Figure 2.2: How a walk is decomposed

Among the walks, the ones that never go above the main diagonal, i.e., walks no point of which has the form $(x, y)$ with $y > x$, are particularly important. They are called underdiagonal walks. Later on, we will be able to count the number of underdiagonal walks composed by $n$ steps and ending at a horizontal distance $k$ from the main diagonal. For the moment, we only wish to establish a recurrence relation for the number $b_n$ of the underdiagonal walks of length $2n$ ending on the main diagonal. In Fig. 2.2 we sketch a possible walk, where we have marked the point $C = (k, k)$ at which the walk encounters the main diagonal for the first time. We observe that the walk is decomposed into a first part (the walk between $A$ and $B$) and a second part (the walk between $C$ and $D$), which are underdiagonal walks of the same type. There are $b_{k-1}$ walks of the type $AB$ and $b_{n-k}$ walks of the type $CD$ and, obviously, $k$ can be any number between 1 and $n$. We therefore obtain the recurrence relation:

$$b_n = \sum_{k=1}^{n} b_{k-1} b_{n-k} = \sum_{k=0}^{n-1} b_k b_{n-k-1}$$

with the initial condition $b_0 = 1$, corresponding to the empty walk or the walk composed by the only point (0, 0). The sequence $(b_k)_{k \in \mathbb{N}}$ begins:

n     0  1  2  3   4   5    6    7     8
b_n   1  1  2  5  14  42  132  429  1430

and, as we shall see:

$$b_n = \frac{1}{n+1}\binom{2n}{n}.$$
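The recurrence and the closed form can be cross-checked with a few lines of Python (a sketch, not part of the book):

```python
from math import comb

def catalan_by_recurrence(n_max):
    """Build b_0..b_{n_max} from b_n = sum_{k=0}^{n-1} b_k * b_{n-k-1}, b_0 = 1."""
    b = [1]
    for n in range(1, n_max + 1):
        b.append(sum(b[k] * b[n - k - 1] for k in range(n)))
    return b

b = catalan_by_recurrence(8)
print(b)  # [1, 1, 2, 5, 14, 42, 132, 429, 1430]
print(all(b[n] == comb(2 * n, n) // (n + 1) for n in range(9)))  # True
```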

The $b_n$'s are called Catalan numbers and they frequently occur in the analysis of algorithms and data structures. For example, if we associate an open parenthesis to every east step and a closed parenthesis to every north step, we obtain the number of possible parenthetizations of an expression. When we have three pairs of parentheses, the 5 possibilities are:

()()()   ()(())   (())()   (()())   ((())).

When we build binary trees from permutations, we do not always obtain different trees from different permutations. There are only 5 trees generated by the six permutations of 1, 2, 3, as we show in Figure 2.3.

How many different trees exist with $n$ nodes? If we fix our attention on the root, the left subtree has $k$ nodes for $k = 0, 1, \ldots, n-1$, while the right subtree contains the remaining $n - k - 1$ nodes. Every tree with $k$ nodes can be combined with every tree with $n - k - 1$ nodes to form a tree with $n$ nodes, and therefore we have the recurrence relation:

$$b_n = \sum_{k=0}^{n-1} b_k b_{n-k-1}$$

which is the same recurrence as before. Since the initial condition is again $b_0 = 1$ (the empty tree), there are as many trees as there are walks.

Figure 2.3: The five binary trees with three nodes

Figure 2.4: Rooted Plane Trees

Another kind of walks is obtained by considering steps of type $((x, y), (x+1, y+1))$, i.e., north-east steps, and of type $((x, y), (x+1, y-1))$, i.e., south-east steps. The interesting walks are those starting from the origin and never going below the $x$-axis; they are called Dyck walks. An obvious 1-1 correspondence exists between Dyck walks and the walks considered above, and again we obtain the sequence of Catalan numbers:

n     0  1  2  3   4   5    6    7     8
b_n   1  1  2  5  14  42  132  429  1430

Finally, the concept of a rooted plane tree is as follows: let us consider a node, which is the root of the tree; if we recursively add branches to the root or to the nodes generated by previous insertions, what we obtain is a "rooted plane tree". If $n$ denotes the number of branches in a rooted plane tree, in Fig. 2.4 we represent all the trees up to $n = 3$. Again, rooted plane trees are counted by Catalan numbers.

2.10 Stirling numbers of the first kind

About 1730, the Scottish mathematician James Stirling was looking for a connection between powers of a number $x$, say $x^n$, and the falling factorials $x^{\underline{k}} = x(x-1)\cdots(x-k+1)$. He developed the first instances:

$$x^{\underline{1}} = x$$
$$x^{\underline{2}} = x(x-1) = x^2 - x$$
$$x^{\underline{3}} = x(x-1)(x-2) = x^3 - 3x^2 + 2x$$
$$x^{\underline{4}} = x(x-1)(x-2)(x-3) = x^4 - 6x^3 + 11x^2 - 6x$$

and picking the coefficients in their proper order (from the smallest power to the largest) he obtained a table of integer numbers. We are mostly interested in the absolute values of these numbers, which are shown in Table 2.2. After him, these numbers are called Stirling numbers of the first kind and are now denoted by ${n \brack k}$, sometimes read "n cycle k", for the reason we are now going to explain.

n\k   0    1    2    3   4   5  6
0     1
1     0    1
2     0    1    1
3     0    2    3    1
4     0    6   11    6   1
5     0   24   50   35  10   1
6     0  120  274  225  85  15  1

Table 2.2: Stirling numbers of the first kind

First note that the above identities can be written:

$$x^{\underline{n}} = \sum_{k=0}^{n} {n \brack k} (-1)^{n-k} x^k.$$

Let us now observe that $x^{\underline{n}} = x^{\underline{n-1}}(x - n + 1)$ and therefore we have:

$$x^{\underline{n}} = (x - n + 1) \sum_{k=0}^{n-1} {n-1 \brack k} (-1)^{n-k-1} x^k =$$

$$= \sum_{k=0}^{n-1} {n-1 \brack k} (-1)^{n-k-1} x^{k+1} - \sum_{k=0}^{n-1} (n-1) {n-1 \brack k} (-1)^{n-k-1} x^k =$$

$$= \sum_{k=0}^{n} {n-1 \brack k-1} (-1)^{n-k} x^k + \sum_{k=0}^{n} (n-1) {n-1 \brack k} (-1)^{n-k} x^k.$$

We performed the change of variable $k \to k-1$ in the first sum and then extended both sums from 0 to $n$. This identity is valid for every value of $x$, and therefore we can equate its coefficients to those of the previous, general Stirling identity, thus obtaining the recurrence relation:

$${n \brack k} = (n-1){n-1 \brack k} + {n-1 \brack k-1}.$$

This recurrence, together with the initial conditions:

$${n \brack n} = 1, \ \forall n \in \mathbb{N} \qquad \text{and} \qquad {n \brack 0} = 0, \ \forall n > 0,$$

completely defines the Stirling numbers of the first kind.

What is a possible combinatorial interpretation of these numbers? Let us consider the permutations of $n$ elements and count the permutations having exactly $k$ cycles, whose set will be denoted by $S_{n,k}$. If we fix any element, say the last element $n$, we observe that the permutations in $S_{n,k}$ can have $n$ as a fixed point, or not. When $n$ is a fixed point and we eliminate it, we obtain a permutation with $n-1$ elements having exactly $k-1$ cycles; vice versa, any such permutation gives a permutation in $S_{n,k}$ with $n$ as a fixed point if we add $(n)$ to it. Therefore, there are $|S_{n-1,k-1}|$ such permutations in $S_{n,k}$. When $n$ is not a fixed point and we eliminate it from the permutation, we obtain a permutation with $n-1$ elements and $k$ cycles. However, the same permutation is obtained several times, exactly $n-1$ times, since $n$ can occur after any other element in the standard cycle representation (it can never occur as the first element in a cycle, by our conventions). For example, all the following permutations in $S_{5,2}$ produce the same permutation in $S_{4,2}$:

(1 2 3)(4 5)   (1 2 3 5)(4)   (1 2 5 3)(4)   (1 5 2 3)(4).

The process can be inverted and therefore we have:

$$|S_{n,k}| = (n-1)|S_{n-1,k}| + |S_{n-1,k-1}|$$

which is just the recurrence relation for the Stirling numbers of the first kind. If we now prove that also the initial conditions are the same, we conclude that $|S_{n,k}| = {n \brack k}$. First we observe that $S_{n,n}$ is only composed of the identity, the only permutation having $n$ cycles, i.e., $n$ fixed points; so $|S_{n,n}| = 1, \forall n \in \mathbb{N}$. Moreover, for $n \geq 1$, $S_{n,0}$ is empty, because every permutation contains at least one cycle, and so $|S_{n,0}| = 0$. This concludes the proof.

As an immediate consequence of this reasoning, we find:

$$\sum_{k=0}^{n} {n \brack k} = n!,$$

i.e., the row sums of the Stirling triangle of the first kind equal $n!$, because they correspond to the total number of permutations of $n$ objects. We also observe that:

- ${n \brack 1} = (n-1)!$; in fact, $S_{n,1}$ is composed of all the permutations having a single cycle; this begins with 1 and is followed by any permutation of the $n-1$ remaining numbers;

- ${n \brack n-1} = \binom{n}{2}$; in fact, $S_{n,n-1}$ contains permutations having all fixed points except a single transposition; but this transposition can only be formed by taking two elements among $1, 2, \ldots, n$, which can be done in $\binom{n}{2}$ different ways;

- ${n \brack 2} = (n-1)!\,H_{n-1}$; returning to the numerical definition, the coefficient of $x^2$ is a sum of products, in each of which a positive integer is missing.
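The recurrence and the row-sum identity can be verified mechanically; the following sketch (in Python, not from the book) builds the triangle of Table 2.2:

```python
from math import factorial

def stirling1_triangle(n_max):
    """Unsigned Stirling numbers of the first kind, built from the
    recurrence [n,k] = (n-1)*[n-1,k] + [n-1,k-1], with [0,0] = 1."""
    s = [[1]]
    for n in range(1, n_max + 1):
        prev = s[n - 1]
        row = [0] * (n + 1)
        for k in range(1, n + 1):
            row[k] = prev[k - 1] + (n - 1) * (prev[k] if k < n else 0)
        s.append(row)
    return s

s = stirling1_triangle(6)
print(s[6])                       # [0, 120, 274, 225, 85, 15, 1]
print(sum(s[6]) == factorial(6))  # True: row sums equal n!
```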

2.11 Stirling numbers of the second kind

James Stirling also tried to invert the process described in the previous section, that is, he was also interested in expressing ordinary powers in terms of falling factorials. The first instances are:

$$x^1 = x^{\underline{1}}$$
$$x^2 = x^{\underline{1}} + x^{\underline{2}} = x + x(x-1)$$
$$x^3 = x^{\underline{1}} + 3x^{\underline{2}} + x^{\underline{3}} = x + 3x(x-1) + x(x-1)(x-2)$$
$$x^4 = x^{\underline{1}} + 7x^{\underline{2}} + 6x^{\underline{3}} + x^{\underline{4}}$$

The coefficients can be arranged into a triangular array, as shown in Table 2.3, and are called Stirling numbers of the second kind. The usual notation for them is ${n \brace k}$, often read "n subset k", for the reason we are going to explain.

Stirling's identities can be globally written:

$$x^n = \sum_{k=0}^{n} {n \brace k} x^{\underline{k}}.$$

n\k   0  1   2   3   4   5  6
0     1
1     0  1
2     0  1   1
3     0  1   3   1
4     0  1   7   6   1
5     0  1  15  25  10   1
6     0  1  31  90  65  15  1

Table 2.3: Stirling numbers of the second kind

We obtain a recurrence relation in the following way:

$$x^n = x \cdot x^{n-1} = x \sum_{k=0}^{n-1} {n-1 \brace k} x^{\underline{k}} = \sum_{k=0}^{n-1} {n-1 \brace k} (x + k - k)\, x^{\underline{k}} =$$

$$= \sum_{k=0}^{n-1} {n-1 \brace k} x^{\underline{k+1}} + \sum_{k=0}^{n-1} k\,{n-1 \brace k} x^{\underline{k}} = \sum_{k=0}^{n} {n-1 \brace k-1} x^{\underline{k}} + \sum_{k=0}^{n} k\,{n-1 \brace k} x^{\underline{k}}$$

(here we used $(x - k)\,x^{\underline{k}} = x^{\underline{k+1}}$) where, as usual, we performed the change of variable $k \to k-1$ and extended the two sums from 0 to $n$. The identity is valid for every $x \in \mathbb{R}$, and therefore we can equate the coefficients of $x^{\underline{k}}$ in this and the above identity, thus obtaining the recurrence relation:

$${n \brace k} = k{n-1 \brace k} + {n-1 \brace k-1}$$

which is slightly different from the recurrence for the Stirling numbers of the first kind. Here we have the initial conditions:

$${n \brace n} = 1, \ \forall n \in \mathbb{N} \qquad \text{and} \qquad {n \brace 0} = 0, \ \forall n \geq 1.$$

These relations completely define the Stirling triangle of the second kind. Every row of the triangle determines a polynomial; for example, from row 4 we obtain $S_4(w) = w + 7w^2 + 6w^3 + w^4$, which is called the 4th Stirling polynomial.
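The triangle of Table 2.3 follows at once from the recurrence; the sketch below (Python, not from the book) builds it and prints the rows giving $S_4(w)$ and row 6:

```python
def stirling2(n_max):
    """Stirling triangle of the second kind from the recurrence
    {n,k} = k*{n-1,k} + {n-1,k-1}, with {0,0} = 1."""
    tri = [[1]]
    for n in range(1, n_max + 1):
        prev = tri[-1]
        row = [0] * (n + 1)
        for k in range(1, n + 1):
            row[k] = prev[k - 1] + (k * prev[k] if k < n else 0)
        tri.append(row)
    return tri

tri = stirling2(6)
print(tri[4])  # [0, 1, 7, 6, 1] -- the coefficients of S_4(w)
print(tri[6])  # [0, 1, 31, 90, 65, 15, 1]
```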

Let us now look for a combinatorial interpretation of these numbers. If $N_n$ is the usual set $\{1, 2, \ldots, n\}$, we can study the partitions of $N_n$ into $k$ disjoint, non-empty subsets. For example, when $n = 4$ and $k = 2$, we have the following 7 partitions:

{1} ∪ {2,3,4}   {1,2} ∪ {3,4}   {1,3} ∪ {2,4}   {1,4} ∪ {2,3}
{1,2,3} ∪ {4}   {1,2,4} ∪ {3}   {1,3,4} ∪ {2}.

If $P_{n,k}$ is the corresponding set, we now count $|P_{n,k}|$ by fixing an element of $N_n$, say the last element $n$. The partitions in $P_{n,k}$ can contain $n$ as a singleton (i.e., as a subset with $n$ as its only element) or can contain $n$ as an element of a larger subset. In the former case, by eliminating $\{n\}$ we obtain a partition in $P_{n-1,k-1}$ and, obviously, all partitions in $P_{n-1,k-1}$ can be obtained in such a way. When $n$ belongs to a larger set, we can eliminate it, obtaining a partition in $P_{n-1,k}$; however, the same partition is obtained several times, exactly by eliminating $n$ from any of the $k$ subsets containing it in the various partitions. For example, the following three partitions in $P_{5,3}$ all produce the same partition in $P_{4,3}$:

{1,2,5} ∪ {3} ∪ {4}   {1,2} ∪ {3,5} ∪ {4}   {1,2} ∪ {3} ∪ {4,5}.

This proves the recurrence relation:

$$|P_{n,k}| = k|P_{n-1,k}| + |P_{n-1,k-1}|$$

which is the same recurrence as for the Stirling numbers of the second kind. As far as the initial conditions are concerned, we observe that there is only one partition of $N_n$ composed of $n$ subsets, i.e., the partition containing $n$ singletons; therefore $|P_{n,n}| = 1, \forall n \in \mathbb{N}$ (in the case $n = 0$ the empty set is the only partition of the empty set). When $n \geq 1$, there is no partition of $N_n$ composed of 0 subsets, and therefore $|P_{n,0}| = 0$. We can conclude that $|P_{n,k}|$ coincides with the corresponding Stirling number of the second kind, and use this fact for observing that:

- ${n \brace 1} = 1, \forall n \geq 1$. In fact, the only partition of $N_n$ into a single subset is when the subset coincides with $N_n$;

- ${n \brace 2} = 2^{n-1} - 1, \forall n \geq 2$. When the partition is only composed of two subsets, the first one uniquely determines the second. Let us suppose that the first subset always contains 1. By eliminating 1, we obtain as first subset all the subsets of $N_n \setminus \{1\}$, except this last whole set, which would correspond to an empty second set. This proves the identity;

- ${n \brace n-1} = \binom{n}{2}, \forall n \in \mathbb{N}$. In any partition with one subset with 2 elements and all the others singletons, the two elements can be chosen in $\binom{n}{2}$ different ways.

2.12 Bell and Bernoulli numbers

If we sum the rows of the Stirling triangle of the second kind, we find a sequence:

n     0  1  2  3   4   5    6    7     8
B_n   1  1  2  5  15  52  203  877  4140

which represents the total number of partitions relative to the set $N_n$. For example, the five partitions of a set with three elements are:

{1,2,3}   {1} ∪ {2,3}   {1,2} ∪ {3}   {1,3} ∪ {2}   {1} ∪ {2} ∪ {3}.

The numbers in this sequence are called Bell numbers and are denoted by $B_n$; by definition we have:

$$B_n = \sum_{k=0}^{n} {n \brace k}.$$

Bell numbers grow very fast; however, since ${n \brace k} \leq {n \brack k}$ for every value of $n$ and $k$ (a subset in $P_{n,k}$ corresponds to one or more cycles in $S_{n,k}$), we always have $B_n \leq n!$, and in fact $B_n < n!$ for every $n > 1$.

Another frequently occurring sequence is obtained by ordering the subsets appearing in the partitions of $N_n$. For example, the partition {1} ∪ {2} ∪ {3} can be ordered in 3! = 6 different ways, and {1,2} ∪ {3} can be ordered in 2! = 2 ways, i.e., {1,2} ∪ {3} and {3} ∪ {1,2}. These are called ordered partitions, and their number is denoted by $O_n$. By the previous example, we easily see that $O_3 = 13$, and the sequence begins:

n     0  1  2   3   4    5     6      7
O_n   1  1  3  13  75  541  4683  47293

Because of this definition, the numbers $O_n$ are called ordered Bell numbers and we have:

$$O_n = \sum_{k=0}^{n} {n \brace k} k!;$$

this shows that $O_n \geq n!$ and, in fact, $O_n > n!, \forall n > 1$.
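Both sequences come straight from the rows of the second-kind triangle; the sketch below (Python, not from the book) reproduces the two tables above:

```python
from math import factorial

def stirling2_row(n):
    """Row n of the Stirling triangle of the second kind, via
    {m,k} = k*{m-1,k} + {m-1,k-1} with {0,0} = 1."""
    row = [1]
    for m in range(1, n + 1):
        prev = row
        row = [0] * (m + 1)
        for k in range(1, m + 1):
            row[k] = prev[k - 1] + (k * prev[k] if k < m else 0)
    return row

bell = [sum(stirling2_row(n)) for n in range(9)]
ordered = [sum(c * factorial(k) for k, c in enumerate(stirling2_row(n)))
           for n in range(8)]
print(bell)     # [1, 1, 2, 5, 15, 52, 203, 877, 4140]
print(ordered)  # [1, 1, 3, 13, 75, 541, 4683, 47293]
```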

Another combinatorial interpretation of the ordered Bell numbers is as follows. Let us fix an integer $n \in \mathbb{N}$ and for every $k \leq n$ let $A_k$ be any multiset with $n$ elements containing at least once all the numbers $1, 2, \ldots, k$. The number of all the possible orderings of the $A_k$'s is just the $n$th ordered Bell number. For example, when $n = 3$, the possible multisets are: {1,1,1}, {1,1,2}, {1,2,2}, {1,2,3}. Their possible orderings are given by the following 7 vectors:

(1,1,1)   (1,1,2)   (1,2,1)   (2,1,1)   (1,2,2)   (2,1,2)   (2,2,1)

plus the six permutations of the set {1,2,3}. These orderings are called preferential arrangements.

We can find a 1-1 correspondence between the orderings of set partitions and preferential arrangements. If $(a_1, a_2, \ldots, a_n)$ is a preferential arrangement, we build the corresponding ordered partition by setting the element 1 in the $a_1$th subset, 2 in the $a_2$th subset, and so on. If $k$ is the largest number in the arrangement, we build exactly $k$ subsets. For example, the partition corresponding to (1,2,2,1) is {1,4} ∪ {2,3}, while the partition corresponding to (2,1,1,2) is {2,3} ∪ {1,4}, whose ordering is different. This construction can be easily inverted, and since it is injective, we have proved that it is actually a 1-1 correspondence. Because of that, ordered Bell numbers are also called preferential arrangement numbers.

We conclude this section by introducing another important sequence of numbers. These are (positive or negative) rational numbers and therefore they cannot correspond to any counting problem, i.e., their combinatorial interpretation cannot be direct. However, they arise in many combinatorial problems and therefore they should be examined here, for the moment only introducing their definition. The Bernoulli numbers are implicitly defined by the recurrence relation:

$$\sum_{k=0}^{n} \binom{n+1}{k} B_k = \delta_{n,0}.$$

No initial condition is necessary, because for $n = 0$ we have $\binom{1}{0} B_0 = 1$, i.e., $B_0 = 1$. This is the starting value, and $B_1$ is obtained by setting $n = 1$ in the recurrence relation:

$$\binom{2}{0} B_0 + \binom{2}{1} B_1 = 0.$$

We obtain $B_1 = -1/2$, and we now have a formula for $B_2$:

$$\binom{3}{0} B_0 + \binom{3}{1} B_1 + \binom{3}{2} B_2 = 0.$$

By performing the necessary computations, we find $B_2 = 1/6$, and we can go on, successively obtaining all the possible values of the $B_n$'s. The first twelve values are as follows:

n     0   1     2    3    4     5    6
B_n   1  −1/2  1/6   0  −1/30   0  1/42

n     7    8     9   10    11   12
B_n   0  −1/30   0  5/66   0  −691/2730

Except for $B_1$, all the other values of $B_n$ for odd $n$ are zero. Initially, Bernoulli numbers seem to be small, but as $n$ grows, they become extremely large in modulus; apart from the zero values, they are alternately one positive and one negative. These and other properties of the Bernoulli numbers are not easily proven in a direct way, i.e., from their definition. However, we'll see later how we can arrange things in such a way that everything becomes accessible to us.

us.

Chapter 3

Formal power series

3.1 Definitions for formal power series

Let $\mathbb{R}$ be the field of real numbers and let $t$ be any indeterminate over $\mathbb{R}$, i.e., a symbol different from any element of $\mathbb{R}$. A formal power series (f.p.s.) over $\mathbb{R}$ in the indeterminate $t$ is an expression:

$$f(t) = f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots + f_n t^n + \cdots = \sum_{k=0}^{\infty} f_k t^k$$

where $f_0, f_1, f_2, \ldots$ are all real numbers. The same definition applies to every set of numbers, in particular to the field of rational numbers $\mathbb{Q}$ and to the field of complex numbers $\mathbb{C}$. The developments we are now going to see, which depend on the field structure of the numeric set, can be easily extended to every field $\mathbb{F}$ of characteristic 0. The set of formal power series over $\mathbb{F}$ in the indeterminate $t$ is denoted by $\mathbb{F}[[t]]$. The use of a particular indeterminate $t$ is irrelevant, and there exists an obvious 1-1 correspondence between, say, $\mathbb{F}[[t]]$ and $\mathbb{F}[[y]]$; it is simple to prove that this correspondence is indeed an isomorphism. In order to stress that our results are substantially independent of the particular field $\mathbb{F}$ and of the particular indeterminate $t$, we denote $\mathbb{F}[[t]]$ by $\mathcal{T}$, but the reader can think of $\mathcal{T}$ as of $\mathbb{R}[[t]]$. In fact, in combinatorial analysis and in the analysis of algorithms the coefficients $f_0, f_1, f_2, \ldots$ of a formal power series are mostly used to count objects, and therefore they are positive integer numbers or, in some cases, positive rational numbers (e.g., when they are the coefficients of an exponential generating function; see below and Section 4.1).

If $f(t) \in \mathcal{T}$, the order of $f(t)$, denoted by $\mathrm{ord}(f(t))$, is the smallest index $r$ for which $f_r \neq 0$. The set of all f.p.s. of order exactly $r$ is denoted by $\mathcal{T}_r$ or by $\mathbb{F}_r[[t]]$. The formal power series $0 = 0 + 0t + 0t^2 + 0t^3 + \cdots$ has infinite order.

If $(f_0, f_1, f_2, \ldots) = (f_k)_{k \in \mathbb{N}}$ is a sequence of (real) numbers, there is no substantial difference between the sequence and the f.p.s. $\sum_{k=0}^{\infty} f_k t^k$, which will be called the (ordinary) generating function of the sequence. The term ordinary is used to distinguish these functions from exponential generating functions, which will be introduced in the next chapter. The indeterminate $t$ is used as a "place-marker", i.e., a symbol to denote the place of the element in the sequence. For example, in the f.p.s. $1 + t + t^2 + t^3 + \cdots$, corresponding to the sequence $(1, 1, 1, \ldots)$, the term $t^5 = 1 \cdot t^5$ simply denotes that the element in position 5 (starting from 0) in the sequence is the number 1.

Although our study of f.p.s. is mainly justified by the development of a generating function theory, we dedicate the present chapter to the general theory of f.p.s., and postpone the study of generating functions to the next chapter.

There are two main reasons why f.p.s. are more easily studied than sequences:

1. the algebraic structure of f.p.s. is very well understood and can be developed in a standard way;

2. many f.p.s. can be "abbreviated" by expressions easily manipulated by elementary algebra.

The present chapter is devoted to these algebraic aspects of f.p.s. For example, we will prove that the series $1 + t + t^2 + t^3 + \cdots$ can be conveniently abbreviated as $1/(1-t)$, and from this fact we will be able to infer that the series has a f.p.s. inverse, which is $1 - t + 0t^2 + 0t^3 + \cdots$.

We conclude this section by defining the concept of a formal Laurent (power) series (f.L.s.), as an expression:

$$g(t) = g_{-m} t^{-m} + g_{-m+1} t^{-m+1} + \cdots + g_{-1} t^{-1} + g_0 + g_1 t + g_2 t^2 + \cdots = \sum_{k=-m}^{\infty} g_k t^k.$$

The set of f.L.s. strictly contains the set of f.p.s. For a f.L.s. $g(t)$ the order can be negative; when the order of $g(t)$ is non-negative, then $g(t)$ is actually a f.p.s. We observe explicitly that an expression such as $\sum_{k=-\infty}^{\infty} f_k t^k$ does not represent a f.L.s.

3.2 The basic algebraic structure

The set $\mathcal{T}$ of f.p.s. can be embedded into several algebraic structures. We are now going to define the most common one, which is related to the usual concepts of sum and (Cauchy) product of series. Given two f.p.s. $f(t) = \sum_{k=0}^{\infty} f_k t^k$ and $g(t) = \sum_{k=0}^{\infty} g_k t^k$, the sum of $f(t)$ and $g(t)$ is defined as:

$$f(t) + g(t) = \sum_{k=0}^{\infty} f_k t^k + \sum_{k=0}^{\infty} g_k t^k = \sum_{k=0}^{\infty} (f_k + g_k) t^k.$$

From this definition, it immediately follows that $\mathcal{T}$ is a commutative group with respect to the sum. The associative and commutative laws directly follow from the analogous properties in the field $\mathbb{F}$; the identity is the f.p.s. $0 = 0 + 0t + 0t^2 + 0t^3 + \cdots$, and the opposite series of $f(t) = \sum_{k=0}^{\infty} f_k t^k$ is the series $-f(t) = \sum_{k=0}^{\infty} (-f_k) t^k$.

Let us now define the Cauchy product of $f(t)$ by $g(t)$:

$$f(t)g(t) = \left(\sum_{k=0}^{\infty} f_k t^k\right)\left(\sum_{k=0}^{\infty} g_k t^k\right) = \sum_{k=0}^{\infty} \left(\sum_{j=0}^{k} f_j g_{k-j}\right) t^k.$$

Because of the form of the coefficient of $t^k$, this is also called the convolution of $f(t)$ and $g(t)$. It is a good idea to write down explicitly the first terms of the Cauchy product:

$$f(t)g(t) = f_0 g_0 + (f_0 g_1 + f_1 g_0)t + (f_0 g_2 + f_1 g_1 + f_2 g_0)t^2 + (f_0 g_3 + f_1 g_2 + f_2 g_1 + f_3 g_0)t^3 + \cdots$$
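On truncated series the convolution is directly computable; the following sketch (Python, not from the book) multiplies two f.p.s. given as coefficient lists:

```python
def convolve(f, g, n_terms):
    """First n_terms coefficients of the Cauchy product of two f.p.s.,
    each given as a list of coefficients [f_0, f_1, ...]."""
    return [sum(f[j] * g[k - j]
                for j in range(k + 1)
                if j < len(f) and k - j < len(g))
            for k in range(n_terms)]

ones = [1] * 6                  # 1 + t + t^2 + ... (truncated)
print(convolve(ones, ones, 6))  # [1, 2, 3, 4, 5, 6]
```

The result is the expected one: the $k$th coefficient of $(1 + t + t^2 + \cdots)^2$ is $k + 1$.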

This clearly shows that the product is commutative, and it is a simple matter to prove that the identity is the f.p.s. $1 = 1 + 0t + 0t^2 + 0t^3 + \cdots$. The distributive law is a consequence of the distributive law valid in $\mathbb{F}$. In fact, we have:

$$(f(t) + g(t))h(t) = \sum_{k=0}^{\infty} \left(\sum_{j=0}^{k} (f_j + g_j) h_{k-j}\right) t^k =$$

$$= \sum_{k=0}^{\infty} \left(\sum_{j=0}^{k} f_j h_{k-j}\right) t^k + \sum_{k=0}^{\infty} \left(\sum_{j=0}^{k} g_j h_{k-j}\right) t^k = f(t)h(t) + g(t)h(t).$$

Finally, we can prove that $\mathcal{T}$ does not contain any zero divisors. If $f(t)$ and $g(t)$ are two f.p.s. different from zero, then we can suppose that $\mathrm{ord}(f(t)) = k_1$ and $\mathrm{ord}(g(t)) = k_2$, with $0 \leq k_1, k_2 < \infty$. This means $f_{k_1} \neq 0$ and $g_{k_2} \neq 0$; therefore, the product $f(t)g(t)$ has the term of degree $k_1 + k_2$ with coefficient $f_{k_1} g_{k_2} \neq 0$, and so it cannot be zero. We conclude that $(\mathcal{T}, +, \cdot)$ is an integrity domain.

The previous reasoning also shows that, in general, we have:

$$\mathrm{ord}(f(t)g(t)) = \mathrm{ord}(f(t)) + \mathrm{ord}(g(t)).$$

The order of the identity 1 is obviously 0; if $f(t)$ is an invertible element of $\mathcal{T}$, we should have $f(t)f(t)^{-1} = 1$ and therefore $\mathrm{ord}(f(t)) = 0$. On the other hand, if $f(t) \in \mathcal{T}_0$, i.e., $f(t) = f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots$ with $f_0 \neq 0$, we can easily prove that $f(t)$ is invertible. In fact, let $g(t) = f(t)^{-1}$, so that $f(t)g(t) = 1$. From the explicit expression for the Cauchy product, we can determine the coefficients of $g(t)$ by solving the infinite system of linear equations:

$$f_0 g_0 = 1$$
$$f_0 g_1 + f_1 g_0 = 0$$
$$f_0 g_2 + f_1 g_1 + f_2 g_0 = 0$$
$$\cdots$$

The system can be solved in a simple way, starting with the first equation and going on one equation after the other. Explicitly, we obtain:

$$g_0 = f_0^{-1} \qquad g_1 = -\frac{f_1}{f_0^2} \qquad g_2 = \frac{f_1^2}{f_0^3} - \frac{f_2}{f_0^2}$$

and therefore $g(t) = f(t)^{-1}$ is well defined. We conclude by stating the result just obtained: a f.p.s. is invertible if and only if its order is 0. Because of that, $\mathcal{T}_0$ is also called the set of invertible f.p.s. According to standard terminology, the elements of $\mathcal{T}_0$ are called the units of the integrity domain.

As a simple example, let us compute the inverse of

the f.p.s. 1 −t = 1 −t +0t

2

+0t

3

+ . Here we have

f

0

= 1, f

1

= −1 and f

k

= 0, ∀k > 1. The system

becomes:

g

0

= 1

g

1

−g

0

= 0

g

2

−g

1

= 0

and we easily obtain that all the g

j

’s (j = 0, 1, 2, . . .)

are 1. Therefore the inverse f.p.s. we are looking for

is 1 + t + t

2

+ t

3

+ . The usual notation for this

fact is:

1

1 −t

= 1 +t +t

2

+t

3

+ .

It is well-known that this identity is only valid for

−1 < t < 1, when t is a variable and f.p.s. are inter-

preted as functions. In our formal approach, however,

these considerations are irrelevant and the identity is

valid from a purely formal point of view.
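The triangular system above translates directly into a procedure on truncated f.p.s.; the following sketch (the function name is mine, not the book's) computes the first coefficients of f(t)^{-1}:

```python
def fps_inverse(f, n):
    """First n coefficients of g = 1/f, where f is a coefficient
    list [f0, f1, ...] with f0 != 0 (i.e., f in T_0)."""
    g = [1 / f[0]]                      # f0*g0 = 1
    for k in range(1, n):
        # k-th equation: f0*gk + f1*g(k-1) + ... + fk*g0 = 0
        s = sum(f[j] * g[k - j] for j in range(1, min(k, len(f) - 1) + 1))
        g.append(-s / f[0])
    return g

# inverse of 1 - t: all coefficients equal 1
print(fps_inverse([1, -1], 6))  # -> [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Each g_k uses only previously computed coefficients, mirroring the fact that the system is lower triangular with constant diagonal f_0.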


3.3 Formal Laurent Series

In the first section of this chapter, we introduced the concept of a formal Laurent series as an extension of the concept of a f.p.s.; if a(t) = Σ_{k=m}^∞ a_k t^k and b(t) = Σ_{k=n}^∞ b_k t^k (m, n ∈ Z) are two f.L.s., we can define the sum and the Cauchy product:

    a(t) + b(t) = Σ_{k=m}^∞ a_k t^k + Σ_{k=n}^∞ b_k t^k = Σ_{k=p}^∞ (a_k + b_k) t^k

    a(t)b(t) = (Σ_{k=m}^∞ a_k t^k)(Σ_{k=n}^∞ b_k t^k) = Σ_{k=q}^∞ (Σ_{i+j=k} a_i b_j) t^k

where p = min(m, n) and q = m + n. As we did for f.p.s., it is not difficult to verify that these operations enjoy the usual properties of sum and product, and if we denote by L the set of f.L.s., we have that (L, +, ·) is a field. The only point we should formally prove is that every f.L.s. a(t) = Σ_{k=m}^∞ a_k t^k ≠ 0 has an inverse f.L.s. b(t) = Σ_{k=−m}^∞ b_k t^k. However, this is proved in the same way we proved that every f.p.s. in T_0 has an inverse. In fact we should have:

    a_m b_{−m} = 1
    a_m b_{−m+1} + a_{m+1} b_{−m} = 0
    a_m b_{−m+2} + a_{m+1} b_{−m+1} + a_{m+2} b_{−m} = 0
    ···

By solving the first equation, we find b_{−m} = a_m^{−1}; then the system can be solved one equation after the other, substituting the values obtained up to that moment. Since a_m b_{−m} is the coefficient of t^0, we have a(t)b(t) = 1 and the proof is complete.

We can now show that (L, +, ·) is the smallest field containing the integrity domain (T, +, ·), thus characterizing the set of f.L.s. in an algebraic way. From algebra we know that, given an integrity domain (K, +, ·), the smallest field (F, +, ·) containing (K, +, ·) can be built in the following way: let us define an equivalence relation ∼ on the set K × K:

    (a, b) ∼ (c, d) ⟺ ad = bc;

if we now set F = K × K / ∼, the set F with the operations + and · defined as the extension of + and · in K is the field we are searching for. This is just the way in which the field Q of rational numbers is constructed from the integrity domain Z of the integers, and the field of rational functions is built from the integrity domain of the polynomials.

Our aim is to show that the field (L, +, ·) of f.L.s. is isomorphic with the field constructed in the described way starting with the integrity domain of f.p.s.. Let L̂ = T × T be the set of pairs of f.p.s.; we begin by showing that for every (f(t), g(t)) ∈ L̂ there exists a pair (a(t), b(t)) ∈ L̂ such that (f(t), g(t)) ∼ (a(t), b(t)) (i.e., f(t)b(t) = g(t)a(t)) and at least one between a(t) and b(t) belongs to T_0. In fact, let p = min(ord(f(t)), ord(g(t))) and let us define a(t), b(t) by f(t) = t^p a(t) and g(t) = t^p b(t); obviously, either a(t) ∈ T_0 or b(t) ∈ T_0, or both are invertible f.p.s.. We now have:

    b(t)f(t) = b(t) t^p a(t) = t^p b(t)a(t) = g(t)a(t)

and this shows that (a(t), b(t)) ∼ (f(t), g(t)).

If b(t) ∈ T_0, then a(t)/b(t) ∈ T and is uniquely determined by a(t), b(t); in this case, therefore, our assertion is proved. So, let us now suppose that b(t) ∉ T_0; then we can write b(t) = t^m v(t), where v(t) ∈ T_0. We have a(t)/v(t) = Σ_{k=0}^∞ d_k t^k ∈ T_0, and consequently let us consider the f.L.s. l(t) = Σ_{k=0}^∞ d_k t^{k−m}; by construction, it is uniquely determined by a(t), b(t), or also by f(t), g(t). It is now easy to see that l(t) is the inverse of the f.p.s. b(t)/a(t) in the sense of f.L.s. considered above, and our proof is complete. This shows that the correspondence is a 1-1 correspondence between L and L̂ preserving the inverse, so it is now obvious that the correspondence is also an isomorphism between (L, +, ·) and (L̂, +, ·).

Because of this result, we can identify L̂ and L and assert that (L, +, ·) is indeed the smallest field containing (T, +, ·). From now on, the set L̂ will be ignored and we will always refer to L as the field of f.L.s..

3.4 Operations on formal power series

Besides the four basic operations (addition, subtraction, multiplication and division), it is possible to consider other operations on T, only a few of which can be extended to L.

The most important operation is surely taking a power of a f.p.s.; if p ∈ N we can recursively define:

    f(t)^0 = 1                  if p = 0
    f(t)^p = f(t) f(t)^{p−1}    if p > 0

and observe that ord(f(t)^p) = p·ord(f(t)). Therefore, f(t)^p ∈ T_0 if and only if f(t) ∈ T_0; on the other hand, if f(t) ∉ T_0, then the order of f(t)^p becomes larger and larger and goes to ∞ when p → ∞. This property will be important in our future developments, when we reduce many operations to infinite sums involving the powers f(t)^p with p ∈ N. If f(t) ∉ T_0, i.e., ord(f(t)) > 0, these sums involve elements of larger and larger order, and therefore for every index k we can determine the coefficient of t^k by only a finite number of terms. This assures that our definitions will be good definitions.

We wish also to observe that taking a positive integer power can easily be extended to L; in this case, when ord(f(t)) < 0, ord(f(t)^p) decreases, but always remains finite. In particular, for g(t) = f(t)^{−1}, g(t)^p = f(t)^{−p}, and powers can be extended to all integers p ∈ Z.

When the exponent p is any real or complex number, we should restrict f(t)^p to the case f(t) ∈ T_0; in fact, if f(t) = t^m g(t), we would have f(t)^p = (t^m g(t))^p = t^{mp} g(t)^p; however, t^{mp} is an expression without any mathematical sense. Instead, if f(t) ∈ T_0, let us write f(t) = f_0 + v̂(t), with ord(v̂(t)) > 0. For v(t) = v̂(t)/f_0, we have by Newton's rule:

    f(t)^p = (f_0 + v̂(t))^p = f_0^p (1 + v(t))^p = f_0^p Σ_{k=0}^∞ C(p,k) v(t)^k,

which can be assumed as a definition. In the last expression, we can observe that: i) f_0^p ∈ C; ii) the binomial coefficient C(p,k) is defined for every value of p, k being a non-negative integer; iii) v(t)^k is well defined by the considerations above and ord(v(t)^k) grows indefinitely, so that for every k the coefficient of t^k is obtained by a finite sum. We can conclude that f(t)^p is well defined.

Particular cases are p = −1 and p = 1/2. In the former case, f(t)^{−1} is the inverse of the f.p.s. f(t). We have already seen a method for computing f(t)^{−1}, but now we obtain the following formula:

    f(t)^{−1} = (1/f_0) Σ_{k=0}^∞ C(−1,k) v(t)^k = (1/f_0) Σ_{k=0}^∞ (−1)^k v(t)^k.

For p = 1/2, we obtain a formula for the square root of a f.p.s.:

    f(t)^{1/2} = √f(t) = √f_0 Σ_{k=0}^∞ C(1/2,k) v(t)^k = √f_0 Σ_{k=0}^∞ ((−1)^{k−1} / (4^k (2k−1))) C(2k,k) v(t)^k.
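Newton's rule as stated here is already an algorithm on truncated series: since ord(v(t)^k) ≥ k, only the first n terms of the binomial sum contribute to the first n coefficients. A minimal sketch (the helper names are mine):

```python
def binom(p, k):
    """Generalized binomial coefficient C(p, k) for real p."""
    r = 1.0
    for i in range(k):
        r *= (p - i) / (i + 1)
    return r

def mul(a, b, n):
    """Cauchy product of two coefficient lists, truncated to n terms."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)
                if i < len(a) and k - i < len(b)) for k in range(n)]

def fps_power(f, p, n):
    """First n coefficients of f(t)**p for real p, f[0] != 0."""
    f0 = f[0]
    v = [0.0] + [c / f0 for c in f[1:]]        # v(t) = (f(t) - f0)/f0, ord > 0
    out = [0.0] * n
    vk = [1.0] + [0.0] * (n - 1)               # v(t)**0
    for k in range(n):                          # ord(v**k) >= k: finite sums
        b = binom(p, k)
        for i in range(n):
            out[i] += f0 ** p * b * vk[i]
        vk = mul(vk, v, n)
    return out

# square root of 1 + t: coefficients 1, 1/2, -1/8, 1/16, ...
print(fps_power([1.0, 1.0], 0.5, 4))  # -> [1.0, 0.5, -0.125, 0.0625]
```

Squaring the result with `mul` reproduces 1 + t up to the truncation order, which is a quick check of the construction.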

In Section 3.12, we will see how f(t)^p can be obtained computationally without actually performing the powers v(t)^k. We conclude by observing that this more general operation of taking a power p ∈ R cannot be extended to f.L.s.: in fact, we would have smaller and smaller terms t^k (k → −∞), and therefore the resulting expression could not be considered an actual f.L.s., which requires a term with smallest degree.

By applying well-known rules of the exponential and logarithmic functions, we can easily define the corresponding operations for f.p.s., which however, as will be apparent, cannot be extended to f.L.s.. For the exponentiation we have, for f(t) ∈ T written as f(t) = f_0 + v(t):

    e^{f(t)} = exp(f_0 + v(t)) = e^{f_0} Σ_{k=0}^∞ v(t)^k / k!.

Again, since v(t) ∉ T_0, the order of v(t)^k increases with k, and the sums necessary to compute the coefficient of t^k are always finite. The formula makes clear that exponentiation can be performed on every f(t) ∈ T, and when f(t) ∉ T_0 the factor e^{f_0} is not present.

For the logarithm, let us suppose f(t) ∈ T_0, f(t) = f_0 + v̂(t), v(t) = v̂(t)/f_0; then we have:

    ln(f_0 + v̂(t)) = ln f_0 + ln(1 + v(t)) = ln f_0 + Σ_{k=1}^∞ (−1)^{k+1} v(t)^k / k.

In this case, for f(t) ∉ T_0 we cannot define the logarithm, and this shows an asymmetry between exponential and logarithm.

Another important operation is differentiation:

    Df(t) = (d/dt) f(t) = Σ_{k=1}^∞ k f_k t^{k−1} = f′(t).

This operation can be performed on every f(t) ∈ L, and a very important observation is the following:

Theorem 3.4.1 For every f(t) ∈ L, its derivative f′(t) does not contain any term in t^{−1}.

Proof: In fact, by the general rule, the term in t^{−1} could only originate from the constant term (i.e., the term in t^0) in f(t), but the multiplication by k reduces it to 0.

This fact will be the basis for very important results in the theory of f.p.s. and f.L.s. (see Section 3.8).

Another operation is integration; because indefinite integration leaves a constant term undefined, we prefer to introduce and use only definite integration; for f(t) ∈ T this is defined as:

    ∫_0^t f(τ) dτ = Σ_{k=0}^∞ f_k ∫_0^t τ^k dτ = Σ_{k=0}^∞ (f_k / (k+1)) t^{k+1}.

Our purely formal approach allows us to exchange the integration and summation signs; in general, as we know, this is only possible when the convergence is uniform. By this definition, ∫_0^t f(τ) dτ never belongs to T_0. Integration can be extended to f.L.s. with an obvious exception: because integration is the inverse operation of differentiation, we cannot apply integration to a f.L.s. containing a term in t^{−1}. Formally, from the definition above, such a term would imply a division by 0, and this is not allowed. In all the other cases, integration does not create any problem.

3.5 Composition

A last operation on f.p.s. is so important that we dedicate a complete section to it. The operation is the composition of two f.p.s.. Let f(t) ∈ T and g(t) ∉ T_0; then we define the "composition" of f(t) by g(t) as the f.p.s.:

    f(g(t)) = Σ_{k=0}^∞ f_k g(t)^k.

This definition justifies the fact that g(t) cannot belong to T_0; in fact, otherwise, infinite sums would be involved in the computation of f(g(t)). In connection with the composition of f.p.s., we will use the following notation:

    f(g(t)) = f(y) |_{y = g(t)}

which is intended to mean the result of substituting g(t) for the indeterminate y in the f.p.s. f(y). The indeterminate y is a dummy symbol; it should be different from t in order not to create any ambiguity, but it can be replaced by any other symbol. Because every g(t) ∉ T_0 is characterized by the fact that g(0) = 0, we will always understand that, in the notation above, the f.p.s. g(t) is such that g(0) = 0.

The definition can be extended to every f(t) ∈ L, but the function g(t) must always be such that ord(g(t)) > 0; otherwise the definition would imply infinite sums, which we avoid because, by our formal approach, we do not consider any convergence criterion.
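On truncated series the composition is again a finite computation for each coefficient, precisely because ord(g(t)^k) ≥ k when g(0) = 0. A minimal sketch (helper names are mine, not the book's):

```python
def mul(a, b, n):
    """Cauchy product truncated to n coefficients."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)
                if i < len(a) and k - i < len(b)) for k in range(n)]

def compose(f, g, n):
    """First n coefficients of f(g(t)); requires g[0] == 0."""
    assert g[0] == 0, "g must have zero constant term (g not in T_0)"
    out = [0.0] * n
    gk = [1.0] + [0.0] * (n - 1)          # g(t)**0
    for k in range(min(n, len(f))):       # terms with k >= n cannot reach t**(n-1)
        for i in range(n):
            out[i] += f[k] * gk[i]
        gk = mul(gk, g, n)
    return out

# 1/(1-t) composed with g(t) = t + t**2 gives
# 1/(1 - t - t**2): the Fibonacci numbers
f = [1.0] * 8                              # coefficients of 1/(1-t)
print(compose(f, [0.0, 1.0, 1.0], 8))      # -> [1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0]
```

The Fibonacci example also illustrates why g(0) = 0 matters: each output coefficient is touched by only finitely many powers g(t)^k.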

The f.p.s. in T_1 have a particular relevance for composition. They are called quasi-units or delta series. First of all, we wish to observe that the f.p.s. t ∈ T_1 acts as an identity for composition. In fact y |_{y = g(t)} = g(t) and f(y) |_{y = t} = f(t), and therefore t is a left and right identity. As a second fact, we show that a f.p.s. f(t) has an inverse with respect to composition if and only if f(t) ∈ T_1. Note that g(t) is the inverse of f(t) if and only if f(g(t)) = t and g(f(t)) = t. From this, we deduce immediately that f(t) ∉ T_0 and g(t) ∉ T_0. On the other hand, it is clear that ord(f(g(t))) = ord(f(t))·ord(g(t)) by our initial definition, and since ord(t) = 1 and ord(f(t)) > 0, ord(g(t)) > 0, we must have ord(f(t)) = ord(g(t)) = 1.

Let us now come to the main part of the proof and consider the set T_1 with the operation of composition ◦; composition is always associative, and therefore (T_1, ◦) is a group if we prove that every f(t) ∈ T_1 has a left (or right) inverse, because the theory assures that the other inverse exists and coincides with the previously found one. Let f(t) = f_1 t + f_2 t^2 + f_3 t^3 + ··· and g(t) = g_1 t + g_2 t^2 + g_3 t^3 + ···; we have:

    f(g(t)) = f_1 (g_1 t + g_2 t^2 + g_3 t^3 + ···) + f_2 (g_1^2 t^2 + 2 g_1 g_2 t^3 + ···) + f_3 (g_1^3 t^3 + ···) + ··· =
            = f_1 g_1 t + (f_1 g_2 + f_2 g_1^2) t^2 + (f_1 g_3 + 2 f_2 g_1 g_2 + f_3 g_1^3) t^3 + ··· = t.

In order to determine g(t) we have to solve the system:

    f_1 g_1 = 1
    f_1 g_2 + f_2 g_1^2 = 0
    f_1 g_3 + 2 f_2 g_1 g_2 + f_3 g_1^3 = 0
    ···

The first equation gives g_1 = 1/f_1; we can substitute this value into the second equation and obtain a value for g_2; the two values for g_1 and g_2 can then be substituted into the third equation to obtain a value for g_3. Continuing in this way, we obtain the values of all the coefficients of g(t), and therefore g(t) is determined in a unique way. In fact, we observe that, by construction, in the kth equation g_k appears linearly and its coefficient is always f_1. Since f_1 ≠ 0, g_k is unique even if the other g_r (r < k) appear with powers.

The f.p.s. g(t) such that f(g(t)) = t, and therefore such that g(f(t)) = t as well, is called the compositional inverse of f(t). In the literature, it is usually denoted by f̄(t) or f^{[−1]}(t); we will adopt the first notation. Obviously, the compositional inverse of f̄(t) is f(t) again, and sometimes f̄(t) is also called the reverse of f(t). Given f(t) ∈ T_1, the determination of its compositional inverse is one of the most interesting problems in the theory of f.p.s. or f.L.s.; it was solved by Lagrange and we will discuss it in the following sections. Note that, in principle, the g_k's can be computed by solving the system above; this, however, is too complicated and nobody would follow that way, except as an exercise.
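The successive-substitution scheme is nevertheless easy to mechanize for small truncation orders; the sketch below (my names, reusing the truncated `compose` idea) exploits the fact that g_k appears linearly with coefficient f_1 in the kth equation:

```python
def mul(a, b, n):
    return [sum(a[i] * b[k - i] for i in range(k + 1)
                if i < len(a) and k - i < len(b)) for k in range(n)]

def compose(f, g, n):
    out, gk = [0.0] * n, [1.0] + [0.0] * (n - 1)
    for k in range(min(n, len(f))):
        for i in range(n):
            out[i] += f[k] * gk[i]
        gk = mul(gk, g, n)
    return out

def comp_inverse(f, n):
    """First n coefficients of the compositional inverse of f,
    where f = [0, f1, f2, ...] with f1 != 0 (f in T_1)."""
    g = [0.0, 1.0 / f[1]] + [0.0] * (n - 2)   # g1 = 1/f1
    for k in range(2, n):
        # with g_k still 0, [t**k] f(g(t)) collects all other terms;
        # the k-th equation then reads f1*g_k + c = 0
        c = compose(f, g, k + 1)[k]
        g[k] = -c / f[1]
    return g

# f(t) = t - t**2: its inverse w satisfies w = t + w**2,
# so the coefficients are shifted Catalan numbers
print(comp_inverse([0.0, 1.0, -1.0], 6))  # -> [0.0, 1.0, 1.0, 2.0, 5.0, 14.0]
```

Composing f with the computed inverse gives back t up to the truncation order, which is a direct check of f(g(t)) = t.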

3.6 Coefficient extraction

If f(t) ∈ L, or in particular f(t) ∈ T, the notation [t^n]f(t) indicates the extraction of the coefficient of t^n from f(t), and therefore we have [t^n]f(t) = f_n. In this sense, [t^n] can be seen as a mapping [t^n] : L → R or [t^n] : L → C, according to which field underlies the set L or T. Because of that, [t^n] is called an operator, precisely the "coefficient of" operator or, more simply, the coefficient operator.

    (K1) (linearity)        [t^n](αf(t) + βg(t)) = α[t^n]f(t) + β[t^n]g(t)
    (K2) (shifting)         [t^n] t f(t) = [t^{n−1}] f(t)
    (K3) (differentiation)  [t^n] f′(t) = (n+1) [t^{n+1}] f(t)
    (K4) (convolution)      [t^n] f(t)g(t) = Σ_{k=0}^{n} [t^k]f(t) [t^{n−k}]g(t)
    (K5) (composition)      [t^n] f(g(t)) = Σ_{k=0}^{∞} ([y^k]f(y)) [t^n] g(t)^k

    Table 3.1: The rules for coefficient extraction

In Table 3.1 we state formally the main properties of this operator, collecting what we said in the previous sections. We observe that α, β ∈ R or α, β ∈ C are arbitrary constants; the use of the indeterminate y is only necessary in order not to confuse the action on different f.p.s.; because g(0) = 0 in composition, the last sum is actually finite. Some points require lengthier comments. The shifting property can easily be generalized to [t^n] t^k f(t) = [t^{n−k}] f(t) and also to negative powers: [t^n] f(t)/t^k = [t^{n+k}] f(t). These rules are very important and are often applied in the theory of f.p.s. and f.L.s.. In the former case, some care should be exercised to see whether the properties remain in the realm of T or go beyond it, invading the domain of L, which may not always be correct. The differentiation property for n = −1 gives [t^{−1}] f′(t) = 0, a situation we already noticed. The operator [t^{−1}] is also called the residue and is denoted by "res"; so, for example, people write res f′(t) = 0, and some authors use the notation res t^{−n−1} f(t) for [t^n] f(t).

We will have many occasions to apply rules (K1)-(K5) of coefficient extraction. However, just to give a meaningful example, let us find the coefficient of t^n in the series expansion of (1 + αt)^r, where α and r are any two real numbers. Rule (K3) can be written in the form [t^n] f(t) = (1/n) [t^{n−1}] f′(t), and we can successively apply this form to our case:

    [t^n] (1 + αt)^r = (rα/n) [t^{n−1}] (1 + αt)^{r−1} =
                     = (rα/n) ((r−1)α/(n−1)) [t^{n−2}] (1 + αt)^{r−2} = ··· =
                     = (rα/n) ((r−1)α/(n−1)) ··· ((r−n+1)α/1) [t^0] (1 + αt)^{r−n} =
                     = α^n C(r,n) [t^0] (1 + αt)^{r−n}.

We now observe that [t^0] (1 + αt)^{r−n} = 1 because of our observations on f.p.s. operations. Therefore, we conclude with the so-called Newton's rule:

    [t^n] (1 + αt)^r = C(r,n) α^n

which is one of the most frequently used results in coefficient extraction. Let us remark explicitly that when r = −1 (the geometric series) we have:

    [t^n] 1/(1 + αt) = C(−1,n) α^n = C(1+n−1, n) (−1)^n α^n = (−α)^n.

A simple but important use of Newton's rule concerns the extraction of the coefficient of t^n from the inverse of a trinomial at^2 + bt + c, in the case it is reducible, i.e., it can be written c(1 + αt)(1 + βt); obviously, we can always reduce the constant c to 1: by the linearity rule, it can be taken outside the "coefficient of" operator. Therefore, our aim is to compute:

    [t^n] 1/((1 + αt)(1 + βt))

with α ≠ β; otherwise Newton's rule would be immediately applicable. The problem can be solved by using the technique of partial fraction expansion. We look for two constants A and B such that:

    1/((1 + αt)(1 + βt)) = A/(1 + αt) + B/(1 + βt) = (A + Aβt + B + Bαt)/((1 + αt)(1 + βt));

if two such constants exist, the numerator in the first expression should equal the numerator in the last one, independently of t or, if one so prefers, for every value of t. Therefore, the term A + B should be equal to 1, while the term (Aβ + Bα)t should always be 0. The values for A and B are therefore the solution of the linear system:

    A + B = 1
    Aβ + Bα = 0


The determinant of this system is α − β, which is always different from 0 because of our hypothesis α ≠ β. The system therefore has exactly one solution, which is A = α/(α−β) and B = −β/(α−β). We can now substitute these values in the expression above:

    [t^n] 1/((1 + αt)(1 + βt)) = [t^n] (1/(α−β)) (α/(1 + αt) − β/(1 + βt)) =
                               = (1/(α−β)) ([t^n] α/(1 + αt) − [t^n] β/(1 + βt)) =
                               = ((α^{n+1} − β^{n+1})/(α − β)) (−1)^n.
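The closed form can be checked numerically against the product of the two geometric series; a small sketch (function names mine, sample values α = 2, β = 3 chosen arbitrarily):

```python
def coeff_closed(alpha, beta, n):
    """[t^n] 1/((1+alpha*t)(1+beta*t)) via the partial-fraction formula."""
    return (alpha ** (n + 1) - beta ** (n + 1)) / (alpha - beta) * (-1) ** n

def coeff_direct(alpha, beta, n):
    """Same coefficient as a convolution of the two geometric series:
    [t^n] = sum_k (-alpha)**k * (-beta)**(n-k)."""
    return sum((-alpha) ** k * (-beta) ** (n - k) for k in range(n + 1))

for n in range(8):
    assert abs(coeff_closed(2.0, 3.0, n) - coeff_direct(2.0, 3.0, n)) < 1e-9
print("partial-fraction formula agrees up to t**7")
```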

Let us now consider a trinomial 1 + bt + ct^2 for which ∆ = b^2 − 4c < 0 and b ≠ 0. The trinomial is irreducible, but we can write:

    [t^n] 1/(1 + bt + ct^2) = [t^n] 1/((1 − ((−b + i√|∆|)/2) t)(1 − ((−b − i√|∆|)/2) t)).

This time, a partial fraction expansion does not give a simple closed form for the coefficients; however, we can apply the formula above in the form:

    [t^n] 1/((1 − αt)(1 − βt)) = (α^{n+1} − β^{n+1})/(α − β).

Since α and β are complex numbers, the resulting expression is not very appealing. We can try to give it a better form. Let us set α = (−b + i√|∆|)/2, so that α is always contained in the positive imaginary half-plane. This implies 0 < arg(α) < π and we have:

    α − β = (−b/2 + i√|∆|/2) − (−b/2 − i√|∆|/2) = i√|∆| = i√(4c − b^2).

If θ = arg(α) and:

    ρ = |α| = √(b^2/4 + (4c − b^2)/4) = √c,

we can set α = ρ e^{iθ} and β = ρ e^{−iθ}. Consequently:

    α^{n+1} − β^{n+1} = ρ^{n+1} (e^{i(n+1)θ} − e^{−i(n+1)θ}) = 2iρ^{n+1} sin((n+1)θ)

and therefore:

    [t^n] 1/(1 + bt + ct^2) = 2(√c)^{n+1} sin((n+1)θ) / √(4c − b^2).

At this point we only have to find the value of θ. Obviously:

    θ = arctan(√|∆|/(−b)) + kπ = arctan(√(4c − b^2)/(−b)) + kπ.

When b < 0, we have 0 < arctan(√(4c − b^2)/(−b)) < π/2, and this is the correct value for θ. However, when b > 0, the principal branch of arctan is negative, and we should set θ = π + arctan(√(4c − b^2)/(−b)). As a consequence, we have:

    θ = arctan(√(4c − b^2)/(−b)) + C

where C = π if b > 0 and C = 0 if b < 0.

An interesting and non-trivial example is given by:

    σ_n = [t^n] 1/(1 − 3t + 3t^2) = 2(√3)^{n+1} sin((n+1) arctan(√3/3)) / √3 = 2(√3)^n sin((n+1)π/6).

These coefficients have the following values:

    n = 12k       σ_n = (√3)^{12k}       = 729^k
    n = 12k + 1   σ_n = (√3)^{12k+2}     = 3·729^k
    n = 12k + 2   σ_n = 2(√3)^{12k+2}    = 6·729^k
    n = 12k + 3   σ_n = (√3)^{12k+4}     = 9·729^k
    n = 12k + 4   σ_n = (√3)^{12k+4}     = 9·729^k
    n = 12k + 5   σ_n = 0
    n = 12k + 6   σ_n = −(√3)^{12k+6}    = −27·729^k
    n = 12k + 7   σ_n = −(√3)^{12k+8}    = −81·729^k
    n = 12k + 8   σ_n = −2(√3)^{12k+8}   = −162·729^k
    n = 12k + 9   σ_n = −(√3)^{12k+10}   = −243·729^k
    n = 12k + 10  σ_n = −(√3)^{12k+10}   = −243·729^k
    n = 12k + 11  σ_n = 0
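These values can be double-checked by expanding 1/(1 − 3t + 3t^2) directly: from (1 − 3t + 3t^2)·Σ σ_n t^n = 1 one gets σ_0 = 1, σ_1 = 3 and σ_n = 3σ_{n−1} − 3σ_{n−2}. A sketch comparing this recurrence with the closed form (names are mine):

```python
from math import sqrt, sin, pi

def sigma_closed(n):
    """sigma_n = 2*(sqrt 3)**n * sin((n+1)*pi/6)."""
    return 2 * sqrt(3) ** n * sin((n + 1) * pi / 6)

# recurrence obtained from (1 - 3t + 3t**2) * sum(sigma_n t**n) = 1
sigma = [1, 3]
for n in range(2, 25):
    sigma.append(3 * sigma[-1] - 3 * sigma[-2])

for n in range(25):
    # relative tolerance: the closed form is evaluated in floating point
    assert abs(sigma[n] - sigma_closed(n)) < 1e-6 * max(1, abs(sigma[n]))
print(sigma[:13])  # -> [1, 3, 6, 9, 9, 0, -27, -81, -162, -243, -243, 0, 729]
```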

3.7 Matrix representation

Let f(t) ∈ T_0; with the coefficients of f(t) we form the following infinite lower triangular matrix (or array) D = (d_{n,k})_{n,k∈N}: column 0 contains the coefficients f_0, f_1, f_2, ... in this order; column 1 contains the same coefficients shifted down by one position, with d_{0,1} = 0; in general, column k contains the coefficients of f(t) shifted down k positions, so that the first k positions are 0. This definition can be summarized in the formula d_{n,k} = f_{n−k}, ∀n, k ∈ N. For a reason which will become apparent only later, the array D will be denoted by (f(t), 1):

    D = (f(t), 1) =
        | f_0  0    0    0    0    ... |
        | f_1  f_0  0    0    0    ... |
        | f_2  f_1  f_0  0    0    ... |
        | f_3  f_2  f_1  f_0  0    ... |
        | f_4  f_3  f_2  f_1  f_0  ... |
        | ...  ...  ...  ...  ...      |

If (f(t), 1) and (g(t), 1) are the matrices corresponding to the two f.p.s. f(t) and g(t), we are interested in finding out what matrix is obtained by multiplying the two with the usual row-by-column product. This product will be denoted by (f(t), 1)·(g(t), 1), and it is immediate to see what its generic element d_{n,k} is. Row n in (f(t), 1) is, by definition, {f_n, f_{n−1}, f_{n−2}, ...}, and column k in (g(t), 1) is {0, 0, ..., 0, g_0, g_1, g_2, ...}, where the number of leading 0's is just k. Therefore we have:

    d_{n,k} = Σ_{j=0}^∞ f_{n−j} g_{j−k}

if we conventionally set g_r = 0, ∀r < 0. When k = 0, we have d_{n,0} = Σ_{j=0}^∞ f_{n−j} g_j = Σ_{j=0}^n f_{n−j} g_j, and therefore column 0 contains the coefficients of the convolution f(t)g(t). When k = 1 we have d_{n,1} = Σ_{j=0}^∞ f_{n−j} g_{j−1} = Σ_{j=0}^{n−1} f_{n−1−j} g_j, and this is the coefficient of t^{n−1} in the convolution f(t)g(t). Proceeding in the same way, we see that column k contains the coefficients of the convolution f(t)g(t) shifted down k positions. Therefore we conclude:

    (f(t), 1)·(g(t), 1) = (f(t)g(t), 1)

and this shows that there exists a group isomorphism between (T_0, ·) and the set of matrices (f(t), 1) with the row-by-column product. In particular, (1, 1) is the identity (in fact, it corresponds to the identity matrix) and (f(t)^{−1}, 1) is the inverse of (f(t), 1).
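The isomorphism is easy to verify on finite sections of these matrices; the sketch below (my names) builds the N×N upper-left corner of (f(t), 1) and checks that the matrix product matches the Cauchy product:

```python
def toeplitz(f, N):
    """N x N upper-left corner of (f(t), 1): d[n][k] = f_{n-k}."""
    get = lambda i: f[i] if 0 <= i < len(f) else 0
    return [[get(n - k) for k in range(N)] for n in range(N)]

def matmul(A, B):
    N = len(A)
    return [[sum(A[n][j] * B[j][k] for j in range(N)) for k in range(N)]
            for n in range(N)]

def cauchy(f, g, N):
    return [sum(f[i] * g[k - i] for i in range(k + 1)
                if i < len(f) and k - i < len(g)) for k in range(N)]

N = 5
f, g = [1, 2, 3], [1, 1]
P = matmul(toeplitz(f, N), toeplitz(g, N))
assert P == toeplitz(cauchy(f, g, N), N)   # (f,1)(g,1) = (fg,1) on the corner
print([row[0] for row in P])               # column 0 = coefficients of f(t)g(t)
```

Because the matrices are lower triangular, the finite corner suffers no truncation error: the product of the corners equals the corner of the product.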

Let us now consider a f.p.s. f(t) ∈ T_1 and let us build an infinite lower triangular matrix in the following way: column k contains the coefficients of f(t)^k in their proper order:

        | 1  0    0                0           0      ... |
        | 0  f_1  0                0           0      ... |
        | 0  f_2  f_1^2            0           0      ... |
        | 0  f_3  2f_1 f_2         f_1^3       0      ... |
        | 0  f_4  2f_1 f_3 + f_2^2 3f_1^2 f_2  f_1^4  ... |
        | ...                                             |

The matrix will be denoted by (1, f(t)/t) and we are interested in seeing how the matrix (1, g(t)/t)·(1, f(t)/t) is composed when f(t), g(t) ∈ T_1. If (f̂_{n,k})_{n,k∈N} = (1, f(t)/t), by definition we have f̂_{n,k} = [t^n] f(t)^k, and therefore the generic element d_{n,k} of the product is:

    d_{n,k} = Σ_{j=0}^∞ ĝ_{n,j} f̂_{j,k} = Σ_{j=0}^∞ [t^n] g(t)^j [y^j] f(y)^k =
            = [t^n] Σ_{j=0}^∞ ([y^j] f(y)^k) g(t)^j = [t^n] f(g(t))^k.

In other words, column k in (1, g(t)/t)·(1, f(t)/t) contains the coefficients of the kth power of the composition f(g(t)), and we can conclude:

    (1, g(t)/t)·(1, f(t)/t) = (1, f(g(t))/t).

Clearly, the identity t ∈ T_1 corresponds to the matrix (1, t/t) = (1, 1), the identity matrix, and this is sufficient to prove that the correspondence f(t) ↔ (1, f(t)/t) is a group isomorphism.

Row-by-column product is surely the basic operation on matrices, and its extension to infinite lower triangular arrays is straightforward, because the sums involved in the product are actually finite. We have shown that we can associate every f.p.s. f(t) ∈ T_0 with a particular matrix (f(t), 1) (let us denote by A the set of such arrays) in such a way that (T_0, ·) is isomorphic to (A, ·), and the Cauchy product becomes the row-by-column product. Besides, we can associate every f.p.s. g(t) ∈ T_1 with a matrix (1, g(t)/t) (let us call B the set of such matrices) in such a way that (T_1, ◦) is isomorphic to (B, ·), and the composition of f.p.s. again becomes the row-by-column product. This reveals a connection between the Cauchy product and the composition: in the chapter on Riordan arrays we will explore this connection more deeply; for the moment, we wish to see how this observation leads to a computational method for evaluating the compositional inverse of a f.p.s. in T_1.

3.8 Lagrange inversion theorem

Given an infinite lower triangular array of the form (1, f(t)/t), with f(t) ∈ T_1, the inverse matrix (1, g(t)/t) is such that (1, g(t)/t)·(1, f(t)/t) = (1, 1), and since the product results in (1, f(g(t))/t) we have f(g(t)) = t. In other words, because of the isomorphism we have seen, the inverse matrix of (1, f(t)/t) is just the matrix corresponding to the compositional inverse of f(t). As we have already said, Lagrange found a noteworthy formula for the coefficients of this compositional inverse. We follow the more recent proof of Stanley, which points out the purely formal aspects of Lagrange's formula. Indeed, we will prove something more, by finding the exact form of the matrix (1, g(t)/t), inverse of (1, f(t)/t). As a matter of fact, we state what the form of (1, g(t)/t) should be and then verify that it is actually so.

Let D = (d_{n,k})_{n,k∈N} be defined as:

    d_{n,k} = (k/n) [t^{n−k}] (t/f(t))^n.

Because f(t)/t ∈ T_0, the power (t/f(t))^n = (f(t)/t)^{−n} is well defined; in order to show that (d_{n,k})_{n,k∈N} = (1, g(t)/t), we only have to prove that D·(1, f(t)/t) = (1, 1), because we already know that the compositional inverse of f(t) is unique. The generic element v_{n,k} of the row-by-column product D·(1, f(t)/t) is:

    v_{n,k} = Σ_{j=0}^∞ d_{n,j} [y^j] f(y)^k = Σ_{j=0}^∞ (j/n) [t^{n−j}] (t/f(t))^n [y^j] f(y)^k.

By the differentiation rule for the coefficient of operator, we have:

    j [y^j] f(y)^k = [y^{j−1}] (d/dy) f(y)^k = k [y^j] y f′(y) f(y)^{k−1}.

Therefore, for v_{n,k} we have:

    v_{n,k} = (k/n) Σ_{j=0}^∞ [t^{n−j}] (t/f(t))^n [y^j] y f′(y) f(y)^{k−1} =
            = (k/n) [t^n] (t/f(t))^n t f′(t) f(t)^{k−1}.

In fact, the factor k/n does not depend on j and can be taken out of the summation sign; the sum is actually finite and is the term of the convolution appearing in the last formula. Let us now distinguish between the cases k = n and k ≠ n. When k = n we have:

    v_{n,n} = [t^n] t^n · t f(t)^{−1} f′(t) = [t^0] f′(t) (t/f(t)) = 1;

in fact, f′(t) = f_1 + 2f_2 t + 3f_3 t^2 + ··· and, being f(t)/t ∈ T_0, (f(t)/t)^{−1} = (f_1 + f_2 t + f_3 t^2 + ···)^{−1} = f_1^{−1} + ···; therefore, the constant term in f′(t)(t/f(t)) is f_1/f_1 = 1. When k ≠ n:

    v_{n,k} = (k/n) [t^n] t^n · t f(t)^{k−n−1} f′(t) = (k/n) [t^{−1}] (1/(k−n)) (d/dt) f(t)^{k−n} = 0;

in fact, f(t)^{k−n} is a f.L.s. and, as we observed, the residue of its derivative must be zero. This proves that D·(1, f(t)/t) = (1, 1) and therefore D is the inverse of (1, f(t)/t).

If f̄(t) is the compositional inverse of f(t), column 1 gives us the values of its coefficients; by the formula for d_{n,k} we have:

    f̄_n = [t^n] f̄(t) = d_{n,1} = (1/n) [t^{n−1}] (t/f(t))^n

and this is the celebrated Lagrange Inversion Formula (LIF). The other columns give us the coefficients of the powers f̄(t)^k, for which we have:

    [t^n] f̄(t)^k = (k/n) [t^{n−k}] (t/f(t))^n.

Many times there is another way of applying the LIF. Suppose we have a functional equation w = tφ(w), where φ(t) ∈ T_0, and we wish to find the f.p.s. w = w(t) satisfying this functional equation. Clearly w(t) ∈ T_1 and, if we set f(y) = y/φ(y), we also have f(t) ∈ T_1. However, the functional equation can be written f(w(t)) = t, and this shows that w(t) is the compositional inverse of f(t). We therefore know that w(t) is uniquely determined, and the LIF gives us:

    [t^n] w(t) = (1/n) [t^{n−1}] (t/f(t))^n = (1/n) [t^{n−1}] φ(t)^n.

The LIF can also give us the coefficients of the powers w(t)^k, but we can obtain a still more general result. Let F(t) ∈ T and let us consider the composition F(w(t)), where w = w(t) is, as before, the solution of the functional equation w = tφ(w), with φ(w) ∈ T_0. For the coefficient of t^n in F(w(t)) we have:

    [t^n] F(w(t)) = [t^n] Σ_{k=0}^∞ F_k w(t)^k = Σ_{k=0}^∞ F_k [t^n] w(t)^k =
                  = Σ_{k=0}^∞ F_k (k/n) [t^{n−k}] φ(t)^n =
                  = (1/n) [t^{n−1}] (Σ_{k=0}^∞ k F_k t^{k−1}) φ(t)^n =
                  = (1/n) [t^{n−1}] F′(t) φ(t)^n.

Note that [t^0] F(w(t)) = F_0, and this formula can be generalized to every F(t) ∈ L, except for the coefficient [t^0] F(w(t)).

3.9 Some examples of the LIF

We found that the number b_n of binary trees with n nodes (and of other combinatorial objects as well) satisfies the recurrence relation b_{n+1} = Σ_{k=0}^n b_k b_{n−k}. Let us consider the f.p.s. b(t) = Σ_{k=0}^∞ b_k t^k; if we multiply the recurrence relation by t^{n+1} and sum for n from 0 to infinity, we find:

    Σ_{n=0}^∞ b_{n+1} t^{n+1} = Σ_{n=0}^∞ t^{n+1} Σ_{k=0}^n b_k b_{n−k}.

Since b_0 = 1, we can add and subtract 1 = b_0 t^0 in the left-hand member and take t outside the summation sign in the right-hand member:

    Σ_{n=0}^∞ b_n t^n − 1 = t Σ_{n=0}^∞ (Σ_{k=0}^n b_k b_{n−k}) t^n.

In the right-hand side we recognize a convolution and, substituting b(t) for the corresponding f.p.s., we obtain:

    b(t) − 1 = t b(t)^2.

We are interested in evaluating b_n = [t^n] b(t); let us therefore set w = w(t) = b(t) − 1, so that w(t) ∈ T_1 and w_n = b_n, ∀n > 0. The previous relation becomes w = t(1 + w)^2 and we see that the LIF can be applied (in the form relative to the functional equation) with φ(t) = (1 + t)^2. Therefore we have:

    b_n = [t^n] w(t) = (1/n) [t^{n−1}] (1 + t)^{2n} = (1/n) C(2n, n−1) =
        = (1/n) (2n)!/((n−1)!(n+1)!) = (1/(n+1)) (2n)!/(n! n!) = (1/(n+1)) C(2n, n).

As we said in the previous chapter, b_n is called the nth Catalan number and, under this name, it is often denoted by C_n. Now we have its closed form:

    C_n = (1/(n+1)) C(2n, n)

which is also valid in the case n = 0, when C_0 = 1.
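Both the recurrence b_{n+1} = Σ b_k b_{n−k} and the closed form are computable in a few lines, and comparing them is a good sanity check of the LIF computation above (helper names are mine):

```python
from math import comb

def catalan_closed(n):
    """C_n = C(2n, n) / (n + 1), as given by the LIF."""
    return comb(2 * n, n) // (n + 1)

# the recurrence b_0 = 1, b_{n+1} = sum_k b_k * b_{n-k}
b = [1]
for n in range(12):
    b.append(sum(b[k] * b[n - k] for k in range(n + 1)))

assert b == [catalan_closed(n) for n in range(13)]
print(b)  # -> [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012]
```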

In the same way we can compute the number of p-ary trees with n nodes. A p-ary tree is a tree in which all the nodes have arity p, except for the leaves, which have arity 0. A non-empty p-ary tree can be decomposed into a root node with p subtrees T_1, T_2, ..., T_p, which proves that T_{n+1} = Σ T_{i_1} T_{i_2} ··· T_{i_p}, where the sum is extended to all the p-tuples (i_1, i_2, ..., i_p) such that i_1 + i_2 + ··· + i_p = n. As before, we can multiply the two members of the recurrence relation by t^{n+1} and sum for n from 0 to infinity. We find:

    T(t) − 1 = t T(t)^p.

This time we have an equation of degree p, which cannot be solved directly. However, if we set w(t) = T(t) - 1, so that w(t) ∈ T_1, we have:

w = t(1 + w)^p

and the LIF gives:

T_n = [t^n] w(t) = \frac{1}{n} [t^{n-1}] (1 + t)^{pn} = \frac{1}{n} \binom{pn}{n-1} = \frac{1}{n} \frac{(pn)!}{(n-1)!\,((p-1)n+1)!} = \frac{1}{(p-1)n+1} \binom{pn}{n},

which generalizes the formula for the Catalan numbers.

Finally, let us find the solution of the functional equation w = t e^w. The LIF gives:

w_n = \frac{1}{n} [t^{n-1}] e^{nt} = \frac{1}{n} \frac{n^{n-1}}{(n-1)!} = \frac{n^{n-1}}{n!}.

Therefore, the solution we are looking for is the f.p.s.:

w(t) = \sum_{n=1}^{\infty} \frac{n^{n-1}}{n!} t^n = t + t^2 + \frac{3}{2} t^3 + \frac{8}{3} t^4 + \frac{125}{24} t^5 + \cdots.

As noticed, w(t) is the compositional inverse of

f(t) = t/φ(t) = t e^{-t} = t - t^2 + \frac{t^3}{2!} - \frac{t^4}{3!} + \frac{t^5}{4!} - \cdots.

It is a useful exercise to perform the necessary computations to show that f(w(t)) = t, for example up to the term of degree 5 or 6, and to verify that w(f(t)) = t as well.
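The suggested exercise can also be carried out mechanically with exact rational arithmetic. The following Python sketch (ours; `mul` and `compose` are hypothetical helper names implementing the truncated Cauchy product and composition) verifies both compositions up to degree 6:

```python
from fractions import Fraction
from math import factorial

N = 7  # number of coefficients kept, i.e. degrees 0..6

def mul(a, b):
    """Truncated Cauchy product of two coefficient lists of length N."""
    c = [Fraction(0)] * N
    for i in range(N):
        for j in range(N - i):
            c[i + j] += a[i] * b[j]
    return c

def compose(f, g):
    """f(g(t)) for g with g[0] = 0, by Horner's scheme on truncated series."""
    assert g[0] == 0
    r = [Fraction(0)] * N
    for coeff in reversed(f):
        r = mul(r, g)
        r[0] += coeff
    return r

# w(t) = sum_{n>=1} n^(n-1) t^n / n!   and   f(t) = t e^(-t)
w = [Fraction(0)] + [Fraction(n ** (n - 1), factorial(n)) for n in range(1, N)]
f = [Fraction(0)] + [Fraction((-1) ** (n - 1), factorial(n - 1)) for n in range(1, N)]

assert compose(f, w) == [0, 1, 0, 0, 0, 0, 0]
assert compose(w, f) == [0, 1, 0, 0, 0, 0, 0]
```

Truncation is harmless here: since g has order at least 1, the discarded terms of degree greater than 6 can never flow back into the lower-degree coefficients.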

3.10 Formal power series and the computer

When we are dealing with generating functions, or

more in general with formal power series of any kind,

we often have to perform numerical computations in

order to verify some theoretical result or to experi-

ment with actual cases. In these and other circum-

stances the computer can help very much with its

speed and precision. Nowadays, several Computer

Algebra Systems exist, which oﬀer the possibility of


actually working with formal power series, contain-

ing formal parameters as well. The use of these tools

is recommended because they can solve a doubt in

a few seconds, can clarify diﬃcult theoretical points

and can give useful hints whenever we are faced with

particular problems.

However, a Computer Algebra System is not al-

ways accessible or, in certain circumstances, one may

desire to use less sophisticated tools. For example,

programmable pocket computers are now available,

which can perform quite easily the basic operations

on formal power series. The aim of the present and

of the following sections is to describe the main al-

gorithms for dealing with formal power series. They

can be used to program a computer or to simply un-

derstand how an existing system actually works.

The simplest way to represent a formal power series is surely by means of a vector, in which the kth component (starting from 0) is the coefficient of t^k in the power series. Obviously, the computer memory can only store a finite number of components, so an upper bound n_0 is usually given to the length of the vectors used to represent power series. In other words we have:

repr_{n_0} \left( \sum_{k=0}^{\infty} a_k t^k \right) = (a_0, a_1, \ldots, a_n) \qquad (n ≤ n_0).

Fortunately, most operations on formal power series preserve the number of significant components, so that there is little danger that a number of successive operations could reduce a finite representation to a meaningless sequence of numbers. Differentiation decreases by one the number of useful components; on the contrary, integration and multiplication by t^r, say, increase the number of significant elements, at the cost of introducing some 0 components.

The components a_0, a_1, ..., a_n are usually real numbers, represented with the precision allowed by the particular computer. In most combinatorial applications, however, a_0, a_1, ..., a_n are rational numbers and, with some extra effort, it is not difficult to realize rational arithmetic on a computer. It is sufficient to represent a rational number as a couple (m, n), whose intended meaning is just m/n. So we must have m ∈ Z, n ∈ N, and it is a good idea to keep m and n coprime. This can be performed by a routine reduce computing p = gcd(m, n) using Euclid's algorithm and then dividing both m and n by p. The operations on rational numbers are defined in the following way:

(m, n) + (m', n') = reduce(mn' + m'n, nn')
(m, n) × (m', n') = reduce(mm', nn')
(m, n)^{-1} = (n, m)
(m, n)^p = (m^p, n^p) \qquad (p ∈ N)

provided that (m, n) is a reduced rational number. The dimension of m and n is limited by the internal representation of integer numbers.
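As a sketch of the couple representation just described (Python; the routine names `reduce`, `add`, `mul`, `inv`, `power` mirror the text's notation and are our choice), one might write:

```python
from math import gcd

def reduce(m, n):
    """Return the couple (m, n) with n > 0 and gcd(|m|, n) = 1; meaning is m/n."""
    if n < 0:
        m, n = -m, -n
    g = gcd(abs(m), n)
    return (m // g, n // g)

def add(a, b):
    (m, n), (m2, n2) = a, b
    return reduce(m * n2 + m2 * n, n * n2)

def mul(a, b):
    (m, n), (m2, n2) = a, b
    return reduce(m * m2, n * n2)

def inv(a):
    m, n = a
    return reduce(n, m)

def power(a, p):
    m, n = a
    return (m ** p, n ** p)  # already reduced if (m, n) is reduced

assert add((1, 2), (1, 3)) == (5, 6)
assert mul((2, 3), (3, 4)) == (1, 2)
assert inv((3, 4)) == (4, 3)
assert power((2, 3), 3) == (8, 27)
```

In practice, Python's standard `fractions.Fraction` class already implements exactly this arithmetic on top of indefinite-precision integers.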

In order to avoid this last problem, Computer Al-

gebra Systems usually realize an indeﬁnite precision

integer arithmetic. An integer number has a vari-

able length internal representation and special rou-

tines are used to perform the basic operations. These

routines can also be realized in a high level program-

ming language (such as C or JAVA), but they can

slow down execution time too much if realized on a programmable pocket computer.

3.11 The internal representation of expressions

The simple representation of a formal power series by

a vector of real, or rational, components will be used

in the next sections to explain the main algorithms

for formal power series operations. However, it is

surely not the best way to represent power series and

becomes completely useless when, for example, the

coeﬃcients depend on some formal parameter. In

other words, our representation can only deal with purely numerical formal power series.

Because of that, Computer Algebra Systems use a

more sophisticated internal representation. In fact,

power series are simply a particular case of a general

mathematical expression. The aim of the present sec-

tion is to give a rough idea of how an expression can

be represented in the computer memory.

In general, an expression consists of operators and operands. For example, in a + √3, the operators are + and √, and the operands are a and 3. Every operator has its own arity or adicity, i.e., the number of operands on which it acts. The adicity of the sum + is two, because it acts on its two terms; the adicity of the square root √ is one, because it acts on a single term. Operands can be numbers (and it is important that the nature of the number be specified, i.e., whether it is a natural, an integer, a rational, a real or a complex number) or can be a formal parameter, such as a.

Obviously, if an operator acts on numerical operands,

it can be executed giving a numerical result. But if

any of its operands is a formal parameter, the result is

a formal expression, which may perhaps be simpliﬁed

but cannot be evaluated to a numerical result.

An expression can always be transposed into a

“tree”, the internal nodes of which correspond to

operators and whose leaves correspond to operands.

The simple tree for the previous expression is given

in Figure 3.1. Each operator has as many branches as its adicity, and a simple visit to the tree can perform its evaluation, that is, it can execute all the operators when only numerical operands are attached to them.

Figure 3.1: The tree for a simple expression

Simplification is a rather complicated matter and it

is not quite clear what a “simple” expression is. For example, which is simpler between (a + 1)(a + 2) and a^2 + 3a + 2? It is easily seen that there are occasions in

which either expression can be considered “simpler”.

Therefore, most Computer Algebra Systems provide

both a general “simpliﬁcation” routine together with

a series of more speciﬁc programs performing some

speciﬁc simplifying tasks, such as expanding paren-

thetized expressions or collecting like factors.

In the computer memory an expression is represented by its tree, which is called the tree or list representation of the expression. The representation of the formal power series \sum_{k=0}^{n} a_k t^k is shown in Figure 3.2. This representation is very convenient, and when some coefficients a_1, a_2, ..., a_n depend on a formal parameter p nothing is changed, at least conceptually. In fact, where we have drawn a leaf a_j, we simply have a more complex tree representing the expression for a_j.

The reader can develop computer programs for

dealing with this representation of formal power se-

ries. It should be clear that another important point

of this approach is that no limitation is given to the

length of expressions. A clever and dynamic use of

the storage solves every problem without increasing

the complexity of the corresponding programs.

3.12 Basic operations on formal power series

We now consider the vector representation of formal power series:

repr_{n_0} \left( \sum_{k=0}^{\infty} a_k t^k \right) = (a_0, a_1, \ldots, a_n) \qquad (n ≤ n_0).

The sum of two formal power series is defined in the obvious way:

(a_0, a_1, \ldots, a_n) + (b_0, b_1, \ldots, b_m) = (c_0, c_1, \ldots, c_r)

where c_i = a_i + b_i for every 0 ≤ i ≤ r, and r = min(n, m). In a similar way the Cauchy product is defined:

(a_0, a_1, \ldots, a_n) × (b_0, b_1, \ldots, b_m) = (c_0, c_1, \ldots, c_r)

where c_k = \sum_{j=0}^{k} a_j b_{k-j} for every 0 ≤ k ≤ r. Here r is defined as r = min(n + p_B, m + p_A), if p_A is the first index for which a_{p_A} ≠ 0 and p_B is the first index for which b_{p_B} ≠ 0. We point out that the time complexity of the sum is O(r) and the time complexity of the product is O(r^2).
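A direct transcription of these definitions into Python might look as follows (a sketch under the text's truncation rule; the function names are ours):

```python
def fps_sum(a, b):
    """Componentwise sum, truncated to r = min(n, m) significant terms: O(r)."""
    r = min(len(a), len(b))
    return [a[i] + b[i] for i in range(r)]

def cauchy(a, b):
    """Truncated Cauchy product c_k = sum_{j=0}^{k} a_j b_{k-j}, O(r^2).

    r = min(n + p_B, m + p_A), where p_A, p_B are the orders (first
    nonzero indices) of a and b, as in the text's significance rule."""
    pA = next((i for i, x in enumerate(a) if x != 0), len(a))
    pB = next((i for i, x in enumerate(b) if x != 0), len(b))
    r = min(len(a) - 1 + pB, len(b) - 1 + pA)
    c = [0] * (r + 1)
    for j, aj in enumerate(a):
        for k in range(j, min(j + len(b), r + 1)):
            c[k] += aj * b[k - j]
    return c

assert fps_sum([1, 2, 3], [1, 1]) == [2, 3]
# (1 - t)(1 + t + t^2 + ...) = 1, but only min(1 + 0, 5 + 0) + 1 = 2
# coefficients of the product are significant:
assert cauchy([1, -1], [1] * 6) == [1, 0]
```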

Subtraction is similar to addition and does not require any particular comment. Before discussing division, let us consider the operation of raising a formal power series to a power α ∈ R. This includes the inversion of a power series (α = -1) and therefore division as well.

First of all we observe that, whenever α ∈ N, f(t)^α can be reduced to the case (1 + g(t))^α, where g(t) ∉ T_0. In fact we have:

f(t) = f_h t^h + f_{h+1} t^{h+1} + f_{h+2} t^{h+2} + \cdots = f_h t^h \left( 1 + \frac{f_{h+1}}{f_h} t + \frac{f_{h+2}}{f_h} t^2 + \cdots \right)

and therefore:

f(t)^α = f_h^α t^{αh} \left( 1 + \frac{f_{h+1}}{f_h} t + \frac{f_{h+2}}{f_h} t^2 + \cdots \right)^α.

On the contrary, when α ∉ N, f(t)^α can only be performed if f(t) ∈ T_0. In that case we have:

f(t)^α = (f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots)^α = f_0^α \left( 1 + \frac{f_1}{f_0} t + \frac{f_2}{f_0} t^2 + \frac{f_3}{f_0} t^3 + \cdots \right)^α;

note that in this case, if f_0 ≠ 1, usually f_0^α is not rational. In any case, we are always reduced to computing (1 + g(t))^α, and since:

(1 + g(t))^α = \sum_{k=0}^{\infty} \binom{α}{k} g(t)^k \qquad (3.12.1)

if the coefficients in g(t) are rational numbers, the coefficients in (1 + g(t))^α are rational as well, provided α ∈ Q. These considerations are to be remembered if the operation is realized in some special environment, as described in the previous section.

Whatever α ∈ R is, the exponents involved in the right hand member of (3.12.1) are all positive integers. Therefore, powers can be realized as successive multiplications or Cauchy products. This gives a straightforward method for performing (1 + g(t))^α, but it is easily seen to take time in the order of O(r^3), since it requires r products, each executed in time O(r^2).

Figure 3.2: The tree for a formal power series

Fortunately, however, J. C. P. Miller has devised an algorithm allowing us to perform (1 + g(t))^α in time O(r^2). In fact, let us write h(t) = a(t)^α, where a(t) is any formal power series with a_0 = 1. By differentiating, we obtain h'(t) = α a(t)^{α-1} a'(t), or, by multiplying both sides by a(t), a(t) h'(t) = α h(t) a'(t). Therefore, by extracting the coefficient of t^{n-1} we find:

\sum_{k=0}^{n-1} a_k (n-k) h_{n-k} = α \sum_{k=0}^{n-1} (k+1) a_{k+1} h_{n-k-1}.

We now isolate the term with k = 0 in the left hand member and the term having k = n-1 in the right hand member (a_0 = 1 by hypothesis):

n h_n + \sum_{k=1}^{n-1} a_k (n-k) h_{n-k} = α n a_n + \sum_{k=1}^{n-1} α k a_k h_{n-k}

(in the last sum we performed the change of variable k → k-1, in order to have the same indices as in the left hand member). We now have an expression for h_n depending only on (a_1, a_2, ..., a_n) and (h_1, h_2, ..., h_{n-1}):

h_n = α a_n + \frac{1}{n} \sum_{k=1}^{n-1} ((α+1)k - n)\, a_k h_{n-k} = α a_n + \sum_{k=1}^{n-1} \left( \frac{(α+1)k}{n} - 1 \right) a_k h_{n-k}.

The computation is now straightforward. We begin by setting h_0 = 1, and then we successively compute h_1, h_2, ..., h_r (r = n, if n is the number of terms in (a_1, a_2, ..., a_n)). The evaluation of h_k requires a number of operations in the order of O(k), and therefore the whole procedure works in time O(r^2), as desired.
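Miller's recurrence translates almost literally into code. Here is a Python sketch (ours) using exact rational arithmetic:

```python
from fractions import Fraction

def miller_power(a, alpha, r):
    """h(t) = a(t)^alpha with a_0 = 1, via Miller's recurrence:
       h_n = alpha*a_n + sum_{k=1}^{n-1} ((alpha+1)*k/n - 1) * a_k * h_{n-k}."""
    a = [Fraction(x) for x in a] + [Fraction(0)] * (r + 1 - len(a))
    h = [Fraction(1)]  # h_0 = a_0^alpha = 1
    for n in range(1, r + 1):
        s = alpha * a[n]
        for k in range(1, n):
            s += ((alpha + 1) * Fraction(k, n) - 1) * a[k] * h[n - k]
        h.append(s)
    return h

# (1 - t)^(-1) = 1 + t + t^2 + ...
assert miller_power([1, -1], Fraction(-1), 5) == [1, 1, 1, 1, 1, 1]
# (1 + t)^(1/2) = 1 + t/2 - t^2/8 + t^3/16 - ...
assert miller_power([1, 1], Fraction(1, 2), 3) == \
    [1, Fraction(1, 2), Fraction(-1, 8), Fraction(1, 16)]
```

Each h_n costs O(n) operations, so the total work is O(r^2), as stated in the text.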

The inverse of a series, i.e., (1 + g(t))^{-1}, is obtained by setting α = -1. It is worth noting that the previous formula becomes:

h_k = -a_k - \sum_{j=1}^{k-1} a_j h_{k-j} = - \sum_{j=1}^{k} a_j h_{k-j} \qquad (h_0 = 1)

and can be used to prove properties of the inverse of a power series. As a simple example, the reader can show that the coefficients in (1 - t)^{-1} are all 1.

3.13 Logarithm and exponential

The idea of Miller can be applied to other operations on formal power series. In the present section we wish to use it to perform the (natural) logarithm and the exponential of a series. Let us begin with the logarithm and try to compute ln(1 + g(t)). As we know, there is a direct way to perform this operation, i.e.:

\ln(1 + g(t)) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} g(t)^k

and this formula only requires a series of successive products. As for the operation of raising to a power, the procedure needs a time in the order of O(r^3), and it is worth considering an alternative approach.

In fact, if we set h(t) = ln(1 + g(t)), by differentiating we obtain h'(t) = g'(t)/(1 + g(t)), or h'(t) = g'(t) - h'(t) g(t). We can now extract the coefficient of t^{k-1} and obtain:

k h_k = k g_k - \sum_{j=0}^{k-1} (k-j) h_{k-j} g_j.

However, g_0 = 0 by hypothesis, and therefore we have an expression relating h_k to (g_1, g_2, ..., g_k) and to (h_1, h_2, ..., h_{k-1}):

h_k = g_k - \frac{1}{k} \sum_{j=1}^{k-1} (k-j) h_{k-j} g_j = g_k - \sum_{j=1}^{k-1} \left( 1 - \frac{j}{k} \right) h_{k-j} g_j.

A program to perform the logarithm of a formal power series 1 + g(t) begins by setting h_0 = 0 and then proceeds by computing h_1, h_2, ..., h_r, if r is the number of significant terms in g(t). The total time is clearly in the order of O(r^2).

A similar technique can be applied to the computation of exp(g(t)), provided that g(t) ∉ T_0. If g(t) ∈ T_0, i.e., g(t) = g_0 + g_1 t + g_2 t^2 + \cdots, we have exp(g_0 + g_1 t + g_2 t^2 + \cdots) = e^{g_0} exp(g_1 t + g_2 t^2 + \cdots). In this way we are reduced to the previous case, but we no longer have rational coefficients when g(t) ∈ Q[[t]]. By differentiating the identity h(t) = exp(g(t)) we obtain h'(t) = g'(t) exp(g(t)) = g'(t) h(t). We extract the coefficient of t^{k-1}:

k h_k = \sum_{j=0}^{k-1} (j+1) g_{j+1} h_{k-j-1} = \sum_{j=1}^{k} j g_j h_{k-j}.

This formula allows us to compute h_k in terms of (g_1, g_2, ..., g_k) and (h_0 = 1, h_1, h_2, ..., h_{k-1}). A program performing exponentiation can easily be written by defining h_0 = 1 and successively evaluating h_1, h_2, ..., h_r, if r is the number of significant terms in g(t). Time complexity is obviously O(r^2).
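Both recurrences are easy to implement. The following Python sketch (ours) computes the logarithm and the exponential of a series with g_0 = 0 and checks that they are mutually inverse on a small example:

```python
from fractions import Fraction

def fps_log1p(g, r):
    """h = ln(1 + g(t)), g_0 = 0:  h_k = g_k - sum_{j=1}^{k-1} (1 - j/k) h_{k-j} g_j."""
    g = [Fraction(x) for x in g] + [Fraction(0)] * (r + 1 - len(g))
    h = [Fraction(0)]
    for k in range(1, r + 1):
        s = g[k]
        for j in range(1, k):
            s -= (1 - Fraction(j, k)) * h[k - j] * g[j]
        h.append(s)
    return h

def fps_exp(g, r):
    """h = exp(g(t)), g_0 = 0:  k h_k = sum_{j=1}^{k} j g_j h_{k-j}."""
    g = [Fraction(x) for x in g] + [Fraction(0)] * (r + 1 - len(g))
    h = [Fraction(1)]
    for k in range(1, r + 1):
        s = sum(j * g[j] * h[k - j] for j in range(1, k + 1))
        h.append(s / k)
    return h

# ln(1 + t) = t - t^2/2 + t^3/3 - t^4/4 + ...
assert fps_log1p([0, 1], 4) == [0, 1, Fraction(-1, 2), Fraction(1, 3), Fraction(-1, 4)]
# exp(ln(1 + t)) = 1 + t
assert fps_exp(fps_log1p([0, 1], 5), 5) == [1, 1, 0, 0, 0, 0]
```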

Unfortunately, a similar trick does not work for series composition. To compute f(g(t)), when g(t) ∉ T_0, we have to resort to the defining formula:

f(g(t)) = \sum_{k=0}^{\infty} f_k g(t)^k.

This requires the successive computation of the integer powers of g(t), which can be performed by repeated applications of the Cauchy product. The execution time is in the order of O(r^3), if r is the minimal number of significant terms in f(t) and g(t).

We conclude this section by sketching the obvious algorithms to compute the differentiation and the integration of a formal power series f(t) = f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots. If h(t) = f'(t), we have:

h_k = (k+1) f_{k+1}

and therefore the number of significant terms is reduced by 1. Conversely, if h(t) = \int_0^t f(τ)\, dτ, we have:

h_k = \frac{1}{k} f_{k-1}

and h_0 = 0; consequently the number of significant terms is increased by 1.

Chapter 4

Generating Functions

4.1 General Rules

Let us consider a sequence of numbers F = (f_0, f_1, f_2, ...) = (f_k)_{k∈N}; the (ordinary) generating function for the sequence F is defined as f(t) = f_0 + f_1 t + f_2 t^2 + \cdots, where the indeterminate t is arbitrary. Given the sequence (f_k)_{k∈N}, we introduce the generating function operator G, which, applied to (f_k)_{k∈N}, produces the ordinary generating function for the sequence, i.e., G(f_k)_{k∈N} = f(t). In this expression t is a bound variable, and a more accurate notation would be G_t(f_k)_{k∈N} = f(t). This notation is essential when (f_k)_{k∈N} depends on some parameter or when we consider multivariate generating functions. In this latter case, for example, we should write G_{t,w}(f_{n,k})_{n,k∈N} = f(t, w) to indicate the fact that f_{n,k} in the double sequence becomes the coefficient of t^n w^k in the function f(t, w). However, whenever no ambiguity can arise, we will use the notation G(f_k) = f(t), understanding also the binding for the variable k. For the sake of completeness, we also define the exponential generating function of the sequence (f_0, f_1, f_2, ...) as:

E(f_k) = G\left( \frac{f_k}{k!} \right) = \sum_{k=0}^{\infty} f_k \frac{t^k}{k!}.

The operator G is clearly linear. The function f(t) can be shifted or differentiated. Two functions f(t) and g(t) can be multiplied and composed. This leads to the properties of the operator G listed in Table 4.1. Note that formula (G5) requires g_0 = 0. The first five formulas are easily verified by using the intended interpretation of the operator G; the last formula can be proved by means of the LIF, in the form relative to the composition F(w(t)). In fact we have:

[t^n] F(t) φ(t)^n = [t^{n-1}] \frac{F(t)}{t} φ(t)^n = n [t^n] \left[ \int \frac{F(y)}{y}\, dy \;\Big|\; y = w(t) \right];

in the last passage we applied backwards the formula:

[t^n] F(w(t)) = \frac{1}{n} [t^{n-1}] F'(t) φ(t)^n \qquad (w = tφ(w))

and therefore w = w(t) ∈ T_1 is the unique solution of the functional equation w = tφ(w). By now applying the rule of differentiation for the "coefficient of" operator, we can go on:

[t^n] F(t) φ(t)^n = [t^{n-1}] \frac{d}{dt} \left[ \int \frac{F(y)}{y}\, dy \;\Big|\; y = w(t) \right] = [t^{n-1}] \left[ \frac{F(w)}{w} \;\Big|\; w = tφ(w) \right] \frac{dw}{dt}.

We have applied the chain rule for differentiation, and from w = tφ(w) we have:

\frac{dw}{dt} = φ(w) + t \left[ \frac{dφ}{dw} \;\Big|\; w = tφ(w) \right] \frac{dw}{dt}.

We can therefore compute the derivative of w(t):

\frac{dw}{dt} = \left[ \frac{φ(w)}{1 - tφ'(w)} \;\Big|\; w = tφ(w) \right]

where φ'(w) denotes the derivative of φ(w) with respect to w. We can substitute this expression in the formula above and observe that w/φ(w) = t can be taken outside of the substitution symbol:

[t^n] F(t) φ(t)^n = [t^{n-1}] \left[ \frac{F(w)}{w} \frac{φ(w)}{1 - tφ'(w)} \;\Big|\; w = tφ(w) \right] = [t^{n-1}] \frac{1}{t} \left[ \frac{F(w)}{1 - tφ'(w)} \;\Big|\; w = tφ(w) \right] = [t^n] \left[ \frac{F(w)}{1 - tφ'(w)} \;\Big|\; w = tφ(w) \right],

which is our diagonalization rule (G6).

linearity         G(α f_k + β g_k) = α G(f_k) + β G(g_k)                              (G1)
shifting          G(f_{k+1}) = (G(f_k) - f_0)/t                                       (G2)
differentiation   G(k f_k) = t D G(f_k)                                               (G3)
convolution       G(\sum_{k=0}^{n} f_k g_{n-k}) = G(f_k) · G(g_k)                     (G4)
composition       \sum_{n=0}^{\infty} f_n (G(g_k))^n = G(f_k) ◦ G(g_k)                (G5)
diagonalisation   G([t^n] F(t) φ(t)^n) = [ F(w)/(1 - tφ'(w)) | w = tφ(w) ]            (G6)

Table 4.1: The rules for the generating function operator

The name "diagonalization" is due to the fact that, if we imagine the coefficients of F(t) φ(t)^n as constituting the row n in an infinite matrix, then the coefficients [t^n] F(t) φ(t)^n are just the elements on the main diagonal of this array.

The rules (G1)-(G6) can also be assumed as axioms of a theory of generating functions and used to derive general theorems as well as specific functions for particular sequences. In the next sections, we will prove a number of properties of the generating function operator. The proofs rely on the following fundamental principle of identity:

Given two sequences (f_k)_{k∈N} and (g_k)_{k∈N}, then G(f_k) = G(g_k) if and only if f_k = g_k for every k ∈ N.

The principle is rather obvious from the very definition of the concept of generating function; however, it is important, because it states the condition under which we can pass from an identity about elements to the corresponding identity about generating functions. If the two sequences disagree in even a single element (e.g., the first one), we cannot infer the equality of the generating functions.

4.2 Some Theorems on Generating Functions

We are now going to prove a series of properties of generating functions.

Theorem 4.2.1 Let f(t) = G(f_k) be the generating function of the sequence (f_k)_{k∈N}; then:

G(f_{k+2}) = \frac{G(f_k) - f_0 - f_1 t}{t^2} \qquad (4.2.1)

Proof: Let g_k = f_{k+1}; by (G2), G(g_k) = (G(f_k) - f_0)/t. Since g_0 = f_1, we have:

G(f_{k+2}) = G(g_{k+1}) = \frac{G(g_k) - g_0}{t} = \frac{G(f_k) - f_0 - f_1 t}{t^2}

By mathematical induction this result can be generalized to:

Theorem 4.2.2 Let f(t) = G(f_k) be as above; then:

G(f_{k+j}) = \frac{G(f_k) - f_0 - f_1 t - \cdots - f_{j-1} t^{j-1}}{t^j} \qquad (4.2.2)

If we consider right instead of left shifting we have to be more careful:

Theorem 4.2.3 Let f(t) = G(f_k) be as above; then:

G(f_{k-j}) = t^j G(f_k) \qquad (4.2.3)

Proof: We have G(f_k) = G(f_{(k-1)+1}) = t^{-1}(G(f_{k-1}) - f_{-1}), where f_{-1} is the coefficient of t^{-1} in f(t). If f(t) ∈ T, f_{-1} = 0 and G(f_{k-1}) = t G(f_k). The theorem then follows by mathematical induction.

Property (G3) can be generalized in several ways:

Theorem 4.2.4 Let f(t) = G(f_k) be as above; then:

G((k+1) f_{k+1}) = D G(f_k) \qquad (4.2.4)

Proof: If we set g_k = k f_k, we obtain:

G((k+1) f_{k+1}) = G(g_{k+1}) = t^{-1}(G(k f_k) - 0 \cdot f_0) \qquad [by (G2)]
= t^{-1}\, t D G(f_k) = D G(f_k).

Theorem 4.2.5 Let f(t) = G(f_k) be as above; then:

G(k^2 f_k) = t D G(f_k) + t^2 D^2 G(f_k) \qquad (4.2.5)

This can be further generalized:

Theorem 4.2.6 Let f(t) = G(f_k) be as above; then:

G(k^j f_k) = S_j(tD)\, G(f_k) \qquad (4.2.6)

where S_j(w) = \sum_{r=1}^{j} S(j, r)\, w^r is the jth Stirling polynomial of the second kind, S(j, r) denoting the Stirling numbers of the second kind (see Section 2.11).

Proof: Formula (4.2.6) is to be understood in the operator sense; so, for example, since S_3(w) = w + 3w^2 + w^3, we have:

G(k^3 f_k) = t D G(f_k) + 3 t^2 D^2 G(f_k) + t^3 D^3 G(f_k).

The proof proceeds by induction, as (G3) and (4.2.5) are the first two instances. Now:

G(k^{j+1} f_k) = G(k \cdot k^j f_k) = t D\, G(k^j f_k)

that is:

S_{j+1}(tD) = tD\, S_j(tD) = tD \sum_{r=1}^{j} S(j, r)\, t^r D^r = \sum_{r=1}^{j} S(j, r)\, r t^r D^r + \sum_{r=1}^{j} S(j, r)\, t^{r+1} D^{r+1}.

By equating like coefficients we find S(j+1, r) = r S(j, r) + S(j, r-1), which is the classical recurrence for the Stirling numbers of the second kind. Since the initial conditions also coincide, we can conclude that the S(j, r) are indeed the Stirling numbers of the second kind.

For the falling factorial k^{\underline{r}} = k(k-1) \cdots (k-r+1) we have a simpler formula, the proof of which is immediate:

Theorem 4.2.7 Let f(t) = G(f_k) be as above; then:

G(k^{\underline{r}} f_k) = t^r D^r G(f_k) \qquad (4.2.7)

Let us now come to integration:

Theorem 4.2.8 Let f(t) = G(f_k) be as above and let us define g_k = f_k/k, ∀k ≠ 0, and g_0 = 0; then:

G\left( \frac{1}{k} f_k \right) = G(g_k) = \int_0^t (f(z) - f_0)\, \frac{dz}{z} \qquad (4.2.8)

Proof: Clearly, k g_k = f_k, except for k = 0. Hence we have G(k g_k) = G(f_k) - f_0. By using (G3), we find t D G(g_k) = G(f_k) - f_0, from which (4.2.8) follows by integration and the condition g_0 = 0.

A more classical formula is:

Theorem 4.2.9 Let f(t) = G(f_k) be as above or, equivalently, let f(t) be a f.p.s. but not a f.L.s.; then:

G\left( \frac{1}{k+1} f_k \right) = \frac{1}{t} \int_0^t G(f_k)\, dz = \frac{1}{t} \int_0^t f(z)\, dz \qquad (4.2.9)

Proof: Let us consider the sequence (g_k)_{k∈N}, where g_{k+1} = f_k and g_0 = 0. So we have: G(g_{k+1}) = G(f_k) = t^{-1}(G(g_k) - g_0). Finally:

G\left( \frac{1}{k+1} f_k \right) = G\left( \frac{1}{k+1} g_{k+1} \right) = \frac{1}{t}\, G\left( \frac{1}{k} g_k \right) = \frac{1}{t} \int_0^t (G(g_k) - g_0)\, \frac{dz}{z} = \frac{1}{t} \int_0^t G(f_k)\, dz

In the following theorems, f(t) will always denote the generating function G(f_k).

Theorem 4.2.10 Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

G(p^k f_k) = f(pt) \qquad (4.2.10)

Proof: By setting g(t) = pt in (G5) we have:

f(pt) = \sum_{n=0}^{\infty} f_n (pt)^n = \sum_{n=0}^{\infty} p^n f_n t^n = G(p^k f_k)

In particular, for p = -1 we have: G((-1)^k f_k) = f(-t).

4.3 More advanced results

The results obtained till now can be considered as simple generalizations of the axioms. They are very useful and will be used in many circumstances. However, we can also obtain more advanced results, concerning sequences derived from a given sequence by manipulating its elements in various ways. For example, let us begin by proving the well-known bisection formulas:

Theorem 4.3.1 Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

G(f_{2k}) = \frac{f(\sqrt{t}) + f(-\sqrt{t})}{2} \qquad (4.3.1)

G(f_{2k+1}) = \frac{f(\sqrt{t}) - f(-\sqrt{t})}{2\sqrt{t}} \qquad (4.3.2)

where \sqrt{t} is a symbol with the property that (\sqrt{t})^2 = t.

Proof: By (G5) we have:

\frac{f(\sqrt{t}) + f(-\sqrt{t})}{2} = \frac{\sum_{n=0}^{\infty} f_n (\sqrt{t})^n + \sum_{n=0}^{\infty} f_n (-\sqrt{t})^n}{2} = \sum_{n=0}^{\infty} f_n \frac{(\sqrt{t})^n + (-\sqrt{t})^n}{2}

For n odd, (\sqrt{t})^n + (-\sqrt{t})^n = 0; hence, by setting n = 2k, we have:

\sum_{k=0}^{\infty} f_{2k} (t^k + t^k)/2 = \sum_{k=0}^{\infty} f_{2k} t^k = G(f_{2k}).

The proof of the second formula is analogous.

The following proof is typical and introduces the use of ordinary differential equations in the calculus of generating functions:

Theorem 4.3.2 Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

G\left( \frac{f_k}{2k+1} \right) = \frac{1}{2\sqrt{t}} \int_0^t \frac{G(f_k)}{\sqrt{z}}\, dz \qquad (4.3.3)

Proof: Let us set g_k = f_k/(2k+1), or 2k g_k + g_k = f_k. If g(t) = G(g_k), by applying (G3) we have the differential equation 2t g'(t) + g(t) = f(t), whose solution having g(0) = f_0 is just formula (4.3.3).

We conclude with two general theorems on sums:

Theorem 4.3.3 (Partial Sum Theorem) Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

G\left( \sum_{k=0}^{n} f_k \right) = \frac{1}{1-t}\, G(f_k) \qquad (4.3.4)

Proof: If we set s_n = \sum_{k=0}^{n} f_k, then we have s_{n+1} = s_n + f_{n+1} for every n ∈ N, and we can apply the operator G to both members: G(s_{n+1}) = G(s_n) + G(f_{n+1}), i.e.:

\frac{G(s_n) - s_0}{t} = G(s_n) + \frac{G(f_n) - f_0}{t}

Since s_0 = f_0, we find G(s_n) = t G(s_n) + G(f_n), and from this (4.3.4) follows directly.
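The theorem says that multiplying a generating function by 1/(1-t) replaces each coefficient with a partial sum; since 1/(1-t) = 1 + t + t^2 + ..., the convolution is just a running total. A tiny Python illustration (ours):

```python
from itertools import accumulate

def times_one_over_one_minus_t(f):
    """Coefficients of f(t)/(1 - t): c_n = sum_{k=0}^{n} f_k, the partial sums."""
    return list(accumulate(f))

# f_k = 2k + 1; the partial sums are the perfect squares (n + 1)^2
f = [1, 3, 5, 7, 9]
assert times_one_over_one_minus_t(f) == [1, 4, 9, 16, 25]
```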

The following result is known as the Euler transformation:

Theorem 4.3.4 Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

G\left( \sum_{k=0}^{n} \binom{n}{k} f_k \right) = \frac{1}{1-t}\, f\left( \frac{t}{1-t} \right) \qquad (4.3.5)

Proof: By well-known properties of binomial coefficients we have:

\binom{n}{k} = \binom{n}{n-k} = \binom{-n+n-k-1}{n-k} (-1)^{n-k} = \binom{-k-1}{n-k} (-1)^{n-k}

and this is the coefficient of t^{n-k} in (1-t)^{-k-1}. We now observe that the sum in (4.3.5) can be extended to infinity, and by (G5) we have:

\sum_{k=0}^{n} \binom{n}{k} f_k = \sum_{k=0}^{n} \binom{-k-1}{n-k} (-1)^{n-k} f_k = \sum_{k=0}^{\infty} [t^{n-k}] (1-t)^{-k-1}\, [y^k] f(y) = [t^n] \frac{1}{1-t} \sum_{k=0}^{\infty} [y^k] f(y) \left( \frac{t}{1-t} \right)^k = [t^n] \frac{1}{1-t}\, f\left( \frac{t}{1-t} \right).

Since the last expression does not depend on n, it represents the generating function of the sum.

We observe explicitly that by (K4) we have \sum_{k=0}^{n} \binom{n}{k} f_k = [t^n](1+t)^n f(t), but this expression does not represent a generating function because it depends on n. The Euler transformation can be generalized in several ways, as we shall see when dealing with Riordan arrays.
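A quick numerical check of the Euler transformation (Python, ours): taking f_k = k as an example, whose generating function is t/(1-t)^2, direct substitution in (4.3.5) gives (1/(1-t)) f(t/(1-t)) = t/(1-2t)^2, whose nth coefficient is n·2^{n-1}; the binomial sums agree with this closed form:

```python
from math import comb

# For f_k = k, formula (4.3.5) yields t/(1 - 2t)^2, i.e. coefficients n * 2^(n-1).
for n in range(1, 10):
    euler_sum = sum(comb(n, k) * k for k in range(n + 1))
    assert euler_sum == n * 2 ** (n - 1)
```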

4.4 Common Generating Functions

The aim of the present section is to derive the most common generating functions by using the apparatus of the previous sections. As a first example, let us consider the constant sequence F = (1, 1, 1, ...), for which we have f_{k+1} = f_k for every k ∈ N. By applying the principle of identity, we find G(f_{k+1}) = G(f_k), that is, by (G2): G(f_k) - f_0 = t G(f_k). Since f_0 = 1, we have immediately:

G(1) = \frac{1}{1-t}

For any constant sequence F = (c, c, c, ...), by (G1) we find that G(c) = c(1-t)^{-1}. Similarly, by using the basic rules and the theorems of the previous sections we have:

G(n) = G(n \cdot 1) = tD\, \frac{1}{1-t} = \frac{t}{(1-t)^2}

G(n^2) = tD\, G(n) = tD\, \frac{t}{(1-t)^2} = \frac{t + t^2}{(1-t)^3}

G((-1)^n) = G(1) ◦ (-t) = \frac{1}{1+t}

G\left( \frac{1}{n} \right) = G\left( \frac{1}{n} \cdot 1 \right) = \int_0^t \left( \frac{1}{1-z} - 1 \right) \frac{dz}{z} = \int_0^t \frac{dz}{1-z} = \ln \frac{1}{1-t}

G(H_n) = G\left( \sum_{k=1}^{n} \frac{1}{k} \right) = \frac{1}{1-t}\, G\left( \frac{1}{n} \right) = \frac{1}{1-t} \ln \frac{1}{1-t}

where H_n is the nth harmonic number. Other generating functions can be obtained from the previous formulas:

G(n H_n) = tD \left( \frac{1}{1-t} \ln \frac{1}{1-t} \right) = \frac{t}{(1-t)^2} \left( \ln \frac{1}{1-t} + 1 \right)

G\left( \frac{1}{n+1} H_n \right) = \frac{1}{t} \int_0^t \frac{1}{1-z} \ln \frac{1}{1-z}\, dz = \frac{1}{2t} \left( \ln \frac{1}{1-t} \right)^2

G(δ_{0,n}) = G(1) - t\, G(1) = \frac{1-t}{1-t} = 1

where δ_{n,m} is Kronecker's delta. This last relation can be readily generalized to G(δ_{n,m}) = t^m.

An interesting example is given by G(1/(n(n+1))). Since 1/(n(n+1)) = 1/n - 1/(n+1), it is tempting to apply the operator G to both members. However, this relation is not valid for n = 0. In order to apply the principle of identity, we must define:

\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1} + δ_{n,0}

in accordance with the fact that the first element of the sequence is zero. We thus arrive at the correct generating function:

G\left( \frac{1}{n(n+1)} \right) = 1 - \frac{1-t}{t} \ln \frac{1}{1-t}

Let us now come to binomial coefficients. In order to find G(\binom{p}{k}) we observe that, from the definition:

\binom{p}{k+1} = \frac{p(p-1) \cdots (p-k+1)(p-k)}{(k+1)!} = \frac{p-k}{k+1} \binom{p}{k}.

Hence, by denoting \binom{p}{k} as f_k, we have:

G((k+1) f_{k+1}) = G((p-k) f_k) = p\, G(f_k) - G(k f_k).

By applying (4.2.4) and (G3) we have D G(f_k) = p G(f_k) - tD G(f_k), i.e., the differential equation f'(t) = p f(t) - t f'(t). By separating the variables and integrating, we find ln f(t) = p ln(1+t) + c, or f(t) = c(1+t)^p. For t = 0 we should have f(0) = \binom{p}{0} = 1, and this implies c = 1. Consequently:

G\left( \binom{p}{k} \right) = (1+t)^p \qquad (p ∈ R)

We are now in a position to derive the recurrence

relation for binomial coeﬃcients. By using (K1) ÷

(K5) we ﬁnd easily:

p

k

= [t

k

](1 +t)

p

= [t

k

](1 +t)(1 +t)

p−1

=

= [t

k

](1 +t)

p−1

+ [t

k−1

](1 +t)

p−1

=

=

p −1

k

+

p −1

k −1

.

By (K4), we have the well-known Vandermonde

convolution:

m+p

n

= [t

n

](1 +t)

m+p

=

= [t

n

](1 +t)

m

(1 +t)

p

=

=

n

¸

k=0

m

k

p

n −k

**which, for m = p = n becomes
**

¸

n

k=0

n

k

2

=

2n

n

.
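Both identities are easy to spot-check numerically; the following minimal sketch verifies the Vandermonde convolution for small parameters and its special case $m = p = n$:

```python
from math import comb

# Vandermonde convolution: C(m+p, n) = sum_k C(m,k) C(p,n-k)
for m in range(6):
    for p in range(6):
        for n in range(6):
            assert comb(m + p, n) == sum(comb(m, k) * comb(p, n - k)
                                         for k in range(n + 1))

# Special case m = p = n: sum_k C(n,k)^2 = C(2n,n)
assert all(sum(comb(n, k) ** 2 for k in range(n + 1)) == comb(2 * n, n)
           for n in range(12))
print("Vandermonde convolution verified")
```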

We can also find $\mathcal{G}\binom{k}{p}$, where $p$ is a parameter. The derivation is purely algebraic and makes use of the generating functions already found and of various properties considered in the previous section:

$$\mathcal{G}\binom{k}{p} = \mathcal{G}\binom{k}{k-p} = \mathcal{G}\left(\binom{k-p-k-1}{k-p}(-1)^{k-p}\right) = \mathcal{G}\left(\binom{-p-1}{k-p}(-1)^{k-p}\right) =$$
$$= \mathcal{G}\left([t^{k-p}]\frac{1}{(1-t)^{p+1}}\right) = \mathcal{G}\left([t^k]\frac{t^p}{(1-t)^{p+1}}\right) = \frac{t^p}{(1-t)^{p+1}}.$$

Several generating functions for different forms of binomial coefficients can be found by means of this method. They are summarized as follows, where $p$ and $m$ are two parameters and can also be zero:

$$\mathcal{G}\binom{p}{m+k} = \frac{(1+t)^p}{t^m}\qquad \mathcal{G}\binom{p+k}{m} = \frac{t^{m-p}}{(1-t)^{m+1}}\qquad \mathcal{G}\binom{p+k}{m+k} = \frac{1}{t^m(1-t)^{p+1-m}}$$

These functions can make sense even when they are f.L.s. and not simply f.p.s.

Finally, we list the following generating functions:

$$\mathcal{G}\left(k\binom{p}{k}\right) = tD\,\mathcal{G}\binom{p}{k} = tD(1+t)^p = pt(1+t)^{p-1}$$

$$\mathcal{G}\left(k^2\binom{p}{k}\right) = (tD+t^2D^2)\,\mathcal{G}\binom{p}{k} = pt(1+pt)(1+t)^{p-2}$$

$$\mathcal{G}\left(\frac{1}{k+1}\binom{p}{k}\right) = \frac{1}{t}\int_0^t\mathcal{G}\binom{p}{k}\,dz = \frac{1}{t}\left[\frac{(1+z)^{p+1}}{p+1}\right]_0^t = \frac{(1+t)^{p+1}-1}{(p+1)t}$$

$$\mathcal{G}\left(k\binom{k}{m}\right) = tD\,\frac{t^m}{(1-t)^{m+1}} = \frac{mt^m+t^{m+1}}{(1-t)^{m+2}}$$

$$\mathcal{G}\left(k^2\binom{k}{m}\right) = (tD+t^2D^2)\,\mathcal{G}\binom{k}{m} = \frac{m^2t^m+(3m+1)t^{m+1}+t^{m+2}}{(1-t)^{m+3}}$$

$$\mathcal{G}\left(\frac{1}{k}\binom{k}{m}\right) = \int_0^t\left(\mathcal{G}\binom{k}{m}-\binom{0}{m}\right)\frac{dz}{z} = \int_0^t\frac{z^{m-1}\,dz}{(1-z)^{m+1}} = \frac{t^m}{m(1-t)^m}$$

The last integral can be solved by setting $y = (1-z)^{-1}$ and is valid for $m > 0$; for $m = 0$ it reduces to $\mathcal{G}(1/k) = -\ln(1-t)$.
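The first and third of these closed forms can be checked at the level of coefficients, for integer $p$, through the absorption identities that underlie them; a minimal sketch:

```python
from math import comb

# [t^k] pt(1+t)^(p-1) = p*C(p-1,k-1), which must equal k*C(p,k);
# [t^k] ((1+t)^(p+1)-1)/((p+1)t) = C(p+1,k+1)/(p+1), which must equal C(p,k)/(k+1).
for p in range(1, 10):
    for k in range(1, p + 2):
        assert k * comb(p, k) == p * comb(p - 1, k - 1)
        assert (p + 1) * comb(p, k) == (k + 1) * comb(p + 1, k + 1)
print("binomial generating functions verified")
```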

4.5 The Method of Shifting

When the elements of a sequence F are given by an

explicit formula, we can try to ﬁnd the generating

function for F by using the technique of shifting: we

consider the element $f_{n+1}$ and try to express it in terms of $f_n$. This can produce a relation to which we apply the principle of identity, deriving an equation in $\mathcal{G}(f_n)$, the solution of which is the generating function. In practice, we find a recurrence for the elements $f_n \in F$ and then try to solve it by using

the rules (G1)–(G5) and their consequences. It can

happen that the recurrence involves several elements

in F and/or that the resulting equation is indeed a

diﬀerential equation. Whatever the case, the method

of shifting allows us to ﬁnd the generating function

of many sequences.

Let us consider the geometric sequence $1, p, p^2, p^3, \ldots$; we have $p^{k+1} = p\cdot p^k$, $\forall k\in\mathbb{N}$, or, by applying the operator $\mathcal{G}$, $\mathcal{G}(p^{k+1}) = p\,\mathcal{G}(p^k)$. By (G2) we have $t^{-1}(\mathcal{G}(p^k)-1) = p\,\mathcal{G}(p^k)$, that is:

$$\mathcal{G}(p^k) = \frac{1}{1-pt}$$

From this we obtain other generating functions:

$$\mathcal{G}(kp^k) = tD\,\frac{1}{1-pt} = \frac{pt}{(1-pt)^2}$$

$$\mathcal{G}(k^2p^k) = tD\,\frac{pt}{(1-pt)^2} = \frac{pt+p^2t^2}{(1-pt)^3}$$

$$\mathcal{G}\left(\frac{p^k}{k}\right) = \int_0^t\left(\frac{1}{1-pz}-1\right)\frac{dz}{z} = \int_0^t\frac{p\,dz}{1-pz} = \ln\frac{1}{1-pt}$$

$$\mathcal{G}\left(\sum_{k=0}^np^k\right) = \frac{1}{1-t}\cdot\frac{1}{1-pt} = \frac{1}{(p-1)t}\left(\frac{1}{1-pt}-\frac{1}{1-t}\right).$$

The last relation has been obtained by partial fraction expansion. By using the operator $[t^k]$ we easily find:

$$\sum_{k=0}^np^k = [t^n]\frac{1}{1-t}\cdot\frac{1}{1-pt} = \frac{1}{p-1}[t^{n+1}]\left(\frac{1}{1-pt}-\frac{1}{1-t}\right) = \frac{p^{n+1}-1}{p-1},$$

the well-known formula for the sum of a geometric

progression. We observe explicitly that the formulas

above could have been obtained from formulas of the

previous section and the general formula (4.2.10). In

a similar way we also have:

$$\mathcal{G}\left(\binom{m}{k}p^k\right) = (1+pt)^m$$

$$\mathcal{G}\left(\binom{k}{m}p^k\right) = \mathcal{G}\left(\binom{k}{k-m}p^{k-m}p^m\right) = p^m\,\mathcal{G}\left(\binom{-m-1}{k-m}(-p)^{k-m}\right) = \frac{p^mt^m}{(1-pt)^{m+1}}.$$
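These rational generating functions can be expanded mechanically: since $D(t)f(t) = N(t)$ determines the coefficients of $f$ by convolution, a few lines suffice to compare an expansion with its target sequence. The helper below is a sketch written for this check (`series` is not from the text):

```python
# Expand N(t)/D(t) as a power series: den[0]*f[k] + sum_j den[j]*f[k-j] = num[k].
def series(num, den, n):
    f = []
    for k in range(n):
        s = num[k] if k < len(num) else 0
        s -= sum(den[j] * f[k - j] for j in range(1, min(k, len(den) - 1) + 1))
        f.append(s // den[0])
    return f

p = 3
# G(p^k) = 1/(1-pt)
assert series([1], [1, -p], 8) == [p ** k for k in range(8)]
# G(k p^k) = pt/(1-pt)^2, with denominator 1 - 2pt + p^2 t^2
assert series([0, p], [1, -2 * p, p * p], 8) == [k * p ** k for k in range(8)]
print("geometric generating functions verified")
```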

As a very simple application of the shifting method, let us observe that:

$$\frac{1}{(n+1)!} = \frac{1}{n+1}\cdot\frac{1}{n!}$$

that is, $(n+1)f_{n+1} = f_n$, where $f_n = 1/n!$. By (4.2.4) we have $f'(t) = f(t)$ or:

$$\mathcal{G}\left(\frac{1}{n!}\right) = e^t\qquad\qquad \mathcal{G}\left(\frac{1}{n\,n!}\right) = \int_0^t\frac{e^z-1}{z}\,dz$$

$$\mathcal{G}\left(\frac{n}{(n+1)!}\right) = tD\,\mathcal{G}\left(\frac{1}{(n+1)!}\right) = tD\,\frac{1}{t}(e^t-1) = \frac{te^t-e^t+1}{t}$$

By this relation the well-known result follows:

$$\sum_{k=0}^n\frac{k}{(k+1)!} = [t^n]\frac{1}{1-t}\cdot\frac{te^t-e^t+1}{t} = [t^n]\left(\frac{1}{1-t}-\frac{e^t-1}{t}\right) = 1 - \frac{1}{(n+1)!}.$$
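The telescoping result lends itself to an exact check with rational arithmetic; a short sketch:

```python
from fractions import Fraction
from math import factorial

# Exact check of sum_{k=0}^{n} k/(k+1)! = 1 - 1/(n+1)!
for n in range(12):
    s = sum(Fraction(k, factorial(k + 1)) for k in range(n + 1))
    assert s == 1 - Fraction(1, factorial(n + 1))
print("sum k/(k+1)! verified")
```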

Let us now observe that $\binom{2n+2}{n+1} = \frac{2(2n+1)}{n+1}\binom{2n}{n}$; by setting $f_n = \binom{2n}{n}$, we have the recurrence $(n+1)f_{n+1} = 2(2n+1)f_n$. By using (4.2.4), (G1) and (G3) we obtain the differential equation $f'(t) = 4tf'(t) + 2f(t)$, the simple solution of which is:

$$\mathcal{G}\binom{2n}{n} = \frac{1}{\sqrt{1-4t}}$$

$$\mathcal{G}\left(\frac{1}{n+1}\binom{2n}{n}\right) = \frac{1}{t}\int_0^t\frac{dz}{\sqrt{1-4z}} = \frac{1-\sqrt{1-4t}}{2t}$$

$$\mathcal{G}\left(\frac{1}{n}\binom{2n}{n}\right) = \int_0^t\left(\frac{1}{\sqrt{1-4z}}-1\right)\frac{dz}{z} = 2\ln\frac{1-\sqrt{1-4t}}{2t}$$

$$\mathcal{G}\left(n\binom{2n}{n}\right) = \frac{2t}{(1-4t)\sqrt{1-4t}}$$

$$\mathcal{G}\left(\frac{1}{2n+1}\binom{2n}{n}\right) = \frac{1}{\sqrt t}\int_0^t\frac{dz}{\sqrt{4z(1-4z)}} = \frac{1}{\sqrt{4t}}\arctan\sqrt{\frac{4t}{1-4t}}.$$
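The recurrence that produced the square-root generating function is itself the quickest way to check it: iterating $(n+1)f_{n+1} = 2(2n+1)f_n$ from $f_0 = 1$ must reproduce the central binomial coefficients. A minimal sketch:

```python
from math import comb

# (n+1) f_{n+1} = 2(2n+1) f_n generates the coefficients of (1-4t)^(-1/2),
# i.e. the central binomial coefficients C(2n,n).
f = 1
for n in range(20):
    assert f == comb(2 * n, n)
    f = 2 * (2 * n + 1) * f // (n + 1)
print("central binomial recurrence verified")
```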

A last group of generating functions is obtained by considering $f_n = 4^n\binom{2n}{n}^{-1}$. Since:

$$4^{n+1}\binom{2n+2}{n+1}^{-1} = \frac{2n+2}{2n+1}\,4^n\binom{2n}{n}^{-1}$$

we have the recurrence $(2n+1)f_{n+1} = 2(n+1)f_n$. By using the operator $\mathcal{G}$ and the rules of Section 4.2, the differential equation $2t(1-t)f'(t) - (1+2t)f(t) + 1 = 0$ is derived. The solution is:

$$f(t) = \sqrt{\frac{t}{(1-t)^3}}\left(C - \int_0^t\sqrt{\frac{(1-z)^3}{z}}\,\frac{dz}{2z(1-z)}\right)$$

By simplifying and using the change of variable $y = z/(1-z)$, the integral can be computed without difficulty, and the final result is:

$$\mathcal{G}\left(4^n\binom{2n}{n}^{-1}\right) = \sqrt{\frac{t}{(1-t)^3}}\arctan\sqrt{\frac{t}{1-t}} + \frac{1}{1-t}.$$

Some immediate consequences are:

$$\mathcal{G}\left(\frac{4^n}{2n+1}\binom{2n}{n}^{-1}\right) = \frac{1}{\sqrt{t(1-t)}}\arctan\sqrt{\frac{t}{1-t}}$$

$$\mathcal{G}\left(\frac{4^n}{2n^2}\binom{2n}{n}^{-1}\right) = \left(\arctan\sqrt{\frac{t}{1-t}}\right)^2$$

and finally:

$$\mathcal{G}\left(\frac{1}{2n}\cdot\frac{4^n}{2n+1}\binom{2n}{n}^{-1}\right) = 1 - \sqrt{\frac{1-t}{t}}\arctan\sqrt{\frac{t}{1-t}}.$$

4.6 Diagonalization

The technique of shifting is a rather general method

for obtaining generating functions. It produces ﬁrst

order recurrence relations, which will be more closely

studied in the next sections. Not every sequence can

be deﬁned by a ﬁrst order recurrence relation, and

other methods are often necessary to ﬁnd out gener-

ating functions. Sometimes, the rule of diagonaliza-

tion can be used very conveniently. One of the most

simple examples is how to determine the generating

function of the central binomial coeﬃcients, without

having to pass through the solution of a diﬀerential

equation. In fact we have:

$$\binom{2n}{n} = [t^n](1+t)^{2n}$$

and (G6) can be applied with $F(t) = 1$ and $\phi(t) = (1+t)^2$. In this case, the function $w = w(t)$ is easily determined by solving the functional equation $w = t(1+w)^2$. By expanding, we find $tw^2 - (1-2t)w + t = 0$ or:

$$w = w(t) = \frac{1-2t\pm\sqrt{1-4t}}{2t}.$$

Since $w = w(t)$ should be a formal power series of order 1, we must eliminate the solution with the $+$ sign; consequently, we have:

$$\mathcal{G}\binom{2n}{n} = \left[\frac{1}{1-2t(1+w)}\,\middle|\,w = \frac{1-2t-\sqrt{1-4t}}{2t}\right] = \frac{1}{\sqrt{1-4t}}$$

as we already know.

The function $\phi(t) = (1+t)^2$ gives rise to a second degree equation. More in general, let us study the sequence:

$$c_n = [t^n](1+\alpha t+\beta t^2)^n$$

and look for its generating function $C(t) = \mathcal{G}(c_n)$. In this case again we have $F(t) = 1$ and $\phi(t) = 1+\alpha t+\beta t^2$, and therefore we should solve the functional equation $w = t(1+\alpha w+\beta w^2)$, or $\beta tw^2 - (1-\alpha t)w + t = 0$. This gives:

$$w = w(t) = \frac{1-\alpha t\pm\sqrt{(1-\alpha t)^2-4\beta t^2}}{2\beta t}$$

and again we have to eliminate the solution with the $+$ sign. By performing the necessary computations, we find:

$$C(t) = \left[\frac{1}{1-t(\alpha+2\beta w)}\,\middle|\,w = \frac{1-\alpha t-\sqrt{1-2\alpha t+(\alpha^2-4\beta)t^2}}{2\beta t}\right] = \frac{1}{\sqrt{1-2\alpha t+(\alpha^2-4\beta)t^2}}$$

and for $\alpha = 2$, $\beta = 1$ we obtain again the generating function for the central binomial coefficients.

    n\k |  0  1  2  3  4  5  6  7  8
    ----+---------------------------
      0 |  1
      1 |  1  1  1
      2 |  1  2  3  2  1
      3 |  1  3  6  7  6  3  1
      4 |  1  4 10 16 19 16 10  4  1

Table 4.2: Trinomial coefficients
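The diagonalization formula can be tested numerically: $c_n$ is computable by the multinomial theorem, and the square root expands through a Legendre-style three-term recurrence. The sketch below (helper names are ours, not from the text) compares the two for a few $(\alpha,\beta)$:

```python
from math import comb
from fractions import Fraction

def c(n, a, b):
    # c_n = [t^n](1 + a t + b t^2)^n: choose j quadratic factors and
    # n - 2j linear factors out of n.
    return sum(comb(n, j) * comb(n - j, n - 2 * j) * a ** (n - 2 * j) * b ** j
               for j in range(n // 2 + 1))

def sqrt_series(a, b, n):
    # Coefficients of (1 - 2at + (a^2-4b)t^2)^(-1/2) via
    # (k+1) P_{k+1} = (2k+1) a P_k - k (a^2-4b) P_{k-1}.
    cst = a * a - 4 * b
    P = [Fraction(1), Fraction(a)]
    for k in range(1, n):
        P.append(((2 * k + 1) * a * P[k] - k * cst * P[k - 1]) / (k + 1))
    return P[: n + 1]

for a, b in [(1, 1), (2, 1), (3, 2)]:
    assert sqrt_series(a, b, 10) == [c(n, a, b) for n in range(11)]
print("diagonalization formula verified")
```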

The coefficients of $(1+t+t^2)^n$ are called trinomial coefficients, in analogy with the binomial coefficients. They constitute an infinite array in which every row has two more elements, different from 0, with respect to the previous row (see Table 4.2).

If $T_{n,k} = [t^k](1+t+t^2)^n$ is a trinomial coefficient, by the obvious property $(1+t+t^2)^{n+1} = (1+t+t^2)(1+t+t^2)^n$ we immediately deduce the recurrence relation:

$$T_{n+1,k+1} = T_{n,k-1} + T_{n,k} + T_{n,k+1}$$

from which the array can be built, once we start from the initial conditions $T_{n,0} = 1$ and $T_{n,2n} = 1$, for every $n\in\mathbb{N}$. The elements $T_{n,n}$, marked in the table, are called the central trinomial coefficients; their sequence begins:

    n    0  1  2  3   4   5    6    7     8
    T_n  1  1  3  7  19  51  141  393  1107

and by the formula above their generating function is:

$$\mathcal{G}(T_{n,n}) = \frac{1}{\sqrt{1-2t-3t^2}} = \frac{1}{\sqrt{(1+t)(1-3t)}}.$$
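Building the array from the recurrence reproduces the central values listed above; a short sketch:

```python
# Trinomial triangle from T[n+1][k] = T[n][k-2] + T[n][k-1] + T[n][k]
# (row n has 2n+1 entries); check the central coefficients T[n][n].
rows = [[1]]
for n in range(8):
    prev = rows[-1]
    get = lambda k: prev[k] if 0 <= k < len(prev) else 0
    rows.append([get(k - 2) + get(k - 1) + get(k)
                 for k in range(2 * (n + 1) + 1)])
central = [rows[n][n] for n in range(9)]
assert central == [1, 1, 3, 7, 19, 51, 141, 393, 1107]
print(central)  # → [1, 1, 3, 7, 19, 51, 141, 393, 1107]
```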

4.7 Some special generating functions

We wish to determine the generating function of the sequence $\{0, 1, 1, 1, 2, 2, 2, 2, 2, 3, \ldots\}$, that is, the sequence whose generic element is $\lfloor\sqrt k\rfloor$. We can think that it is formed by summing an infinite number of simpler sequences $\{0, 1, 1, 1, 1, 1, 1, \ldots\}$, $\{0, 0, 0, 0, 1, 1, 1, 1, \ldots\}$, the next one with the first 1 in position 9, and so on. The generating functions of these sequences are:

$$\frac{t}{1-t}\qquad\frac{t^4}{1-t}\qquad\frac{t^9}{1-t}\qquad\frac{t^{16}}{1-t}\qquad\cdots$$

and therefore we obtain:

$$\mathcal{G}(\lfloor\sqrt k\rfloor) = \sum_{k=1}^\infty\frac{t^{k^2}}{1-t}.$$

In the same way we obtain analogous generating functions:

$$\mathcal{G}(\lfloor\sqrt[r]{k}\rfloor) = \sum_{k=1}^\infty\frac{t^{k^r}}{1-t}\qquad\qquad \mathcal{G}(\lfloor\log_rk\rfloor) = \sum_{k=1}^\infty\frac{t^{r^k}}{1-t}$$

where $r$ is any integer number, or also any real number, if we substitute $\lceil k^r\rceil$ and $\lceil r^k\rceil$ for $k^r$ and $r^k$, respectively.

These generating functions can be used to find the values of several sums in closed or semi-closed form. Let us begin by the following case, where we use the Euler transformation:

$$\sum_k\binom{n}{k}\lfloor\sqrt k\rfloor(-1)^k = (-1)^n\sum_k\binom{n}{k}(-1)^{n-k}\lfloor\sqrt k\rfloor =$$
$$= (-1)^n[t^n]\frac{1}{1+t}\left[\sum_{k=1}^\infty\frac{y^{k^2}}{1-y}\,\middle|\,y=\frac{t}{1+t}\right] = (-1)^n[t^n]\sum_{k=1}^\infty\frac{t^{k^2}}{(1+t)^{k^2}} =$$
$$= (-1)^n\sum_{k=1}^{\lfloor\sqrt n\rfloor}[t^{n-k^2}]\frac{1}{(1+t)^{k^2}} = (-1)^n\sum_{k=1}^{\lfloor\sqrt n\rfloor}\binom{-k^2}{n-k^2} =$$
$$= (-1)^n\sum_{k=1}^{\lfloor\sqrt n\rfloor}\binom{n-1}{n-k^2}(-1)^{n-k^2} = \sum_{k=1}^{\lfloor\sqrt n\rfloor}\binom{n-1}{n-k^2}(-1)^{k^2}.$$

We can think of the last sum as a "semi-closed" form, because the number of terms is dramatically reduced from $n$ to $\sqrt n$, although it remains depending on $n$. In the same way we find:

$$\sum_k\binom{n}{k}\lfloor\log_2k\rfloor(-1)^k = \sum_{k=1}^{\lfloor\log_2n\rfloor}\binom{n-1}{n-2^k}.$$
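Both semi-closed forms can be confirmed directly against the full sums; a minimal sketch (the helper names are ours):

```python
from math import comb, isqrt

# floor(log2 k) = k.bit_length() - 1 for k >= 1.
for n in range(1, 40):
    sqrt_lhs = sum(comb(n, k) * isqrt(k) * (-1) ** k for k in range(n + 1))
    sqrt_rhs = sum(comb(n - 1, n - k * k) * (-1) ** (k * k)
                   for k in range(1, isqrt(n) + 1))
    assert sqrt_lhs == sqrt_rhs

    log_lhs = sum(comb(n, k) * (k.bit_length() - 1) * (-1) ** k
                  for k in range(1, n + 1))
    log_rhs = sum(comb(n - 1, n - 2 ** k) for k in range(1, n.bit_length()))
    assert log_lhs == log_rhs
print("semi-closed alternating sums verified")
```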

A truly closed formula is found for the following sum:

$$\sum_{k=1}^n\lfloor\sqrt k\rfloor = [t^n]\frac{1}{1-t}\sum_{k=1}^\infty\frac{t^{k^2}}{1-t} = \sum_{k=1}^\infty[t^{n-k^2}]\frac{1}{(1-t)^2} =$$
$$= \sum_{k=1}^{\lfloor\sqrt n\rfloor}\binom{-2}{n-k^2}(-1)^{n-k^2} = \sum_{k=1}^{\lfloor\sqrt n\rfloor}\binom{n-k^2+1}{n-k^2} = \sum_{k=1}^{\lfloor\sqrt n\rfloor}(n-k^2+1) = (n+1)\lfloor\sqrt n\rfloor - \sum_{k=1}^{\lfloor\sqrt n\rfloor}k^2.$$

The final value of the sum is therefore:

$$(n+1)\lfloor\sqrt n\rfloor - \frac{\lfloor\sqrt n\rfloor}{3}\left(\lfloor\sqrt n\rfloor+1\right)\left(\lfloor\sqrt n\rfloor+\frac12\right)$$

whose asymptotic value is $\frac23n\sqrt n$. Again, for the analogous sum with $\lfloor\log_2n\rfloor$, we obtain (see Chapter 1):

$$\sum_{k=1}^n\lfloor\log_2k\rfloor = (n+1)\lfloor\log_2n\rfloor - 2^{\lfloor\log_2n\rfloor+1} + 2.$$
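Both closed formulas check out exactly; a short sketch (note $\frac{m}{3}(m+1)(m+\frac12) = \frac{m(m+1)(2m+1)}{6}$):

```python
from math import isqrt

for n in range(1, 200):
    m = isqrt(n)
    assert sum(isqrt(k) for k in range(1, n + 1)) == \
        (n + 1) * m - m * (m + 1) * (2 * m + 1) // 6
    L = n.bit_length() - 1          # floor(log2 n)
    assert sum(k.bit_length() - 1 for k in range(1, n + 1)) == \
        (n + 1) * L - 2 ** (L + 1) + 2
print("closed formulas verified")
```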

A somewhat more difficult sum is the following one:

$$\sum_{k=0}^n\binom{n}{k}\lfloor\sqrt k\rfloor = [t^n]\frac{1}{1-t}\left[\sum_{k=1}^\infty\frac{y^{k^2}}{1-y}\,\middle|\,y=\frac{t}{1-t}\right] =$$
$$= [t^n]\sum_{k=1}^\infty\frac{t^{k^2}}{(1-t)^{k^2}(1-2t)} = \sum_{k=1}^\infty[t^{n-k^2}]\frac{1}{(1-t)^{k^2}(1-2t)}.$$

We can now obtain a semi-closed form for this sum by expanding the generating function into partial fractions:

$$\frac{1}{(1-t)^{k^2}(1-2t)} = \frac{A}{1-2t} + \frac{B}{(1-t)^{k^2}} + \frac{C}{(1-t)^{k^2-1}} + \frac{D}{(1-t)^{k^2-2}} + \cdots + \frac{X}{1-t}.$$

We can show that $A = 2^{k^2}$, $B = -1$, $C = -2$, $D = -4$, \ldots, $X = -2^{k^2-1}$; in fact, by substituting these values in the previous expression we get:

$$\frac{2^{k^2}}{1-2t} - \frac{1}{(1-t)^{k^2}} - \frac{2(1-t)}{(1-t)^{k^2}} - \frac{4(1-t)^2}{(1-t)^{k^2}} - \cdots - \frac{2^{k^2-1}(1-t)^{k^2-1}}{(1-t)^{k^2}} =$$
$$= \frac{2^{k^2}}{1-2t} - \frac{1}{(1-t)^{k^2}}\cdot\frac{2^{k^2}(1-t)^{k^2}-1}{2(1-t)-1} = \frac{2^{k^2}}{1-2t} - \frac{2^{k^2}(1-t)^{k^2}-1}{(1-t)^{k^2}(1-2t)} = \frac{1}{(1-t)^{k^2}(1-2t)}.$$

Therefore, we conclude:

$$\sum_{k=0}^n\binom{n}{k}\lfloor\sqrt k\rfloor = \sum_{k=1}^{\lfloor\sqrt n\rfloor}2^{k^2}2^{n-k^2} - \sum_{k=1}^{\lfloor\sqrt n\rfloor}\binom{n-1}{n-k^2} - \sum_{k=1}^{\lfloor\sqrt n\rfloor}2\binom{n-2}{n-k^2} - \cdots - \sum_{k=1}^{\lfloor\sqrt n\rfloor}2^{k^2-1}\binom{n-k^2}{n-k^2} =$$
$$= \lfloor\sqrt n\rfloor2^n - \sum_{k=1}^{\lfloor\sqrt n\rfloor}\left[\binom{n-1}{n-k^2} + 2\binom{n-2}{n-k^2} + \cdots + 2^{k^2-1}\binom{n-k^2}{n-k^2}\right].$$

We observe that for very large values of $n$ the first term dominates all the others, and therefore the asymptotic value of the sum is $\lfloor\sqrt n\rfloor2^n$.
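The semi-closed form above can be verified exactly for small $n$; a minimal sketch:

```python
from math import comb, isqrt

# sum_k C(n,k) floor(sqrt k) = floor(sqrt n) 2^n
#   - sum_{k=1}^{floor(sqrt n)} sum_{j=1}^{k^2} 2^(j-1) C(n-j, n-k^2)
for n in range(1, 30):
    lhs = sum(comb(n, k) * isqrt(k) for k in range(n + 1))
    rhs = isqrt(n) * 2 ** n - sum(
        sum(2 ** (j - 1) * comb(n - j, n - k * k) for j in range(1, k * k + 1))
        for k in range(1, isqrt(n) + 1))
    assert lhs == rhs
print("semi-closed binomial sum verified")
```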

4.8 Linear recurrences with constant coefficients

If $(f_k)_{k\in\mathbb{N}}$ is a sequence, it can be defined by means of a recurrence relation, i.e., a relation relating the generic element $f_n$ to other elements $f_k$ having $k < n$.

Usually, the ﬁrst elements of the sequence must be

given explicitly, in order to allow the computation

of the successive values; they constitute the initial

conditions and the sequence is well-deﬁned if and only

if every element can be computed by starting with

the initial conditions and going on with the other

elements by means of the recurrence relation. For

example, the constant sequence $(1, 1, 1, \ldots)$ can be defined by the recurrence relation $x_n = x_{n-1}$ and the initial condition $x_0 = 1$. By changing the initial conditions, the sequence can radically change; if we consider the same relation $x_n = x_{n-1}$ together with the initial condition $x_0 = 2$, we obtain the constant sequence $\{2, 2, 2, \ldots\}$.

In general, a recurrence relation can be written $f_n = F(f_{n-1}, f_{n-2}, \ldots)$; when $F$ depends on all the values $f_{n-1}, f_{n-2}, \ldots, f_1, f_0$, then the relation is called a full history recurrence. If $F$ only depends on a fixed number of elements $f_{n-1}, f_{n-2}, \ldots, f_{n-p}$, then the relation is called a partial history recurrence, and $p$ is called the order of the relation. Besides, if $F$ is linear, we have a linear recurrence. Linear recurrences

are surely the most common and important type of

recurrence relations; if all the coeﬃcients appearing

in F are constant, we have a linear recurrence with


constant coeﬃcients, and if the coeﬃcients are poly-

nomials in n, we have a linear recurrence with poly-

nomial coeﬃcients. As we are now going to see, the

method of generating functions allows us to ﬁnd the

solution of any linear recurrence with constant coeﬃ-

cients, in the sense that we ﬁnd a function f(t) such

that $[t^n]f(t) = f_n$, $\forall n\in\mathbb{N}$. For linear recurrences

with polynomial coeﬃcients, the same method allows

us to ﬁnd a solution in many occasions, but the suc-

cess is not assured. On the other hand, no method

is known that solves all the recurrences of this kind,

and surely generating functions are the method giv-

ing the highest number of positive results. We will

discuss this case in the next section.

The Fibonacci recurrence $F_n = F_{n-1} + F_{n-2}$ is an example of a recurrence relation with constant coefficients. When we have a recurrence of this kind, we begin by expressing it in such a way that the relation is valid for every $n\in\mathbb{N}$. In the example of Fibonacci numbers, this is not the case, because for $n = 0$ we have $F_0 = F_{-1} + F_{-2}$, and we do not know the values of the two elements in the r.h.s., which have no combinatorial meaning. However, if we write the recurrence as $F_{n+2} = F_{n+1} + F_n$, we have fulfilled

the requirement. This first step has a great importance, because it allows us to apply the operator $\mathcal{G}$

to both members of the relation; this was not pos-

sible beforehand because of the principle of identity

for generating functions.

The recurrence being linear with constant coefficients, we can apply the axiom of linearity to the recurrence:

$$f_{n+p} = \alpha_1f_{n+p-1} + \alpha_2f_{n+p-2} + \cdots + \alpha_pf_n$$

and obtain the relation:

$$\mathcal{G}(f_{n+p}) = \alpha_1\mathcal{G}(f_{n+p-1}) + \alpha_2\mathcal{G}(f_{n+p-2}) + \cdots + \alpha_p\mathcal{G}(f_n).$$

By Theorem 4.2.2 we can now express every $\mathcal{G}(f_{n+p-j})$ in terms of $f(t) = \mathcal{G}(f_n)$ and obtain a linear relation in $f(t)$, from which an explicit expression for $f(t)$ is immediately obtained. This is the solution of the recurrence relation. We observe explicitly that in writing the expressions for $\mathcal{G}(f_{n+p-j})$ we make use of the initial conditions for the sequence.

Let us go on with the example of the Fibonacci sequence $(F_k)_{k\in\mathbb{N}}$. We have:

$$\mathcal{G}(F_{n+2}) = \mathcal{G}(F_{n+1}) + \mathcal{G}(F_n)$$

and by setting $F(t) = \mathcal{G}(F_n)$ we find:

$$\frac{F(t) - F_0 - F_1t}{t^2} = \frac{F(t)-F_0}{t} + F(t).$$

Because we know that $F_0 = 0$, $F_1 = 1$, we have:

$$F(t) - t = tF(t) + t^2F(t)$$

and by solving in $F(t)$ we have the explicit expression:

$$F(t) = \frac{t}{1-t-t^2}.$$

This is the generating function for the Fibonacci numbers. We can now find an explicit expression for $F_n$ in the following way. The denominator of $F(t)$ can be written $1-t-t^2 = (1-\phi t)(1-\hat\phi t)$ where:

$$\phi = \frac{1+\sqrt5}{2}\approx1.618033989\qquad\qquad \hat\phi = \frac{1-\sqrt5}{2}\approx-0.618033989.$$

The constant $1/\phi\approx0.618033989$ is known as the golden ratio. By applying the method of partial fraction expansion we find:

$$F(t) = \frac{t}{(1-\phi t)(1-\hat\phi t)} = \frac{A}{1-\phi t} + \frac{B}{1-\hat\phi t} = \frac{A-A\hat\phi t+B-B\phi t}{1-t-t^2}.$$

We determine the two constants $A$ and $B$ by equating the coefficients in the first and last expression for $F(t)$:

$$A+B = 0\qquad -A\hat\phi - B\phi = 1\qquad\Longrightarrow\qquad A = \frac{1}{\phi-\hat\phi} = \frac{1}{\sqrt5}\qquad B = -A = -\frac{1}{\sqrt5}.$$

The value of $F_n$ is now obtained by extracting the coefficient of $t^n$:

$$F_n = [t^n]F(t) = [t^n]\frac{1}{\sqrt5}\left(\frac{1}{1-\phi t}-\frac{1}{1-\hat\phi t}\right) = \frac{1}{\sqrt5}\left([t^n]\frac{1}{1-\phi t}-[t^n]\frac{1}{1-\hat\phi t}\right) = \frac{\phi^n-\hat\phi^n}{\sqrt5}.$$

This formula allows us to compute $F_n$ in a time independent of $n$, because $\phi^n = \exp(n\ln\phi)$, and shows that $F_n$ grows exponentially. In fact, since $|\hat\phi| < 1$, the quantity $\hat\phi^n$ approaches 0 very rapidly and we have $F_n = O(\phi^n)$. In reality, $F_n$ should be an integer and therefore we can compute it by finding the integer number closest to $\phi^n/\sqrt5$; consequently:

$$F_n = \mathrm{round}\left(\frac{\phi^n}{\sqrt5}\right).$$
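The closed form agrees with the recurrence as long as floating-point precision holds (comfortably so for the first few dozen values); a minimal sketch:

```python
from math import sqrt

# Binet's formula F_n = round(phi^n / sqrt(5)) versus the recurrence.
phi = (1 + sqrt(5)) / 2
a, b = 0, 1                      # F_0, F_1
for n in range(40):
    assert a == round(phi ** n / sqrt(5))
    a, b = b, a + b
print("Binet formula verified up to n = 39")
```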

4.9 Linear recurrences with polynomial coefficients

When a recurrence relation has polynomial coeﬃ-

cients, the method of generating functions does not

assure a solution, but no other method is available

to solve those recurrences which cannot be solved by

a generating function approach. Usually, the rule of

diﬀerentiation introduces a derivative in the relation

for the generating function and a diﬀerential equa-

tion has to be solved. This is the actual problem of

this approach, because the main diﬃculty just con-

sists in dealing with the diﬀerential equation. We

have already seen some examples when we studied

the method of shifting, but here we wish to present

a case arising from an actual combinatorial problem,

and in the next section we will see a very important

example taken from the analysis of algorithms.

When we studied permutations, we introduced the concept of an involution, i.e., a permutation $\pi\in P_n$ such that $\pi^2 = (1)$, and for the number $I_n$ of involutions in $P_n$ we found the recurrence relation:

$$I_n = I_{n-1} + (n-1)I_{n-2}$$

which has polynomial coefficients. The number of involutions grows very fast and it can be a good idea to consider the quantity $i_n = I_n/n!$. Therefore, let us begin by changing the recurrence in such a way that the principle of identity can be applied, and then divide everything by $(n+2)!$:

$$I_{n+2} = I_{n+1} + (n+1)I_n$$

$$\frac{I_{n+2}}{(n+2)!} = \frac{1}{n+2}\cdot\frac{I_{n+1}}{(n+1)!} + \frac{1}{n+2}\cdot\frac{I_n}{n!}.$$

The recurrence relation for $i_n$ is:

$$(n+2)i_{n+2} = i_{n+1} + i_n$$

and we can pass to generating functions. $\mathcal{G}((n+2)i_{n+2})$ can be seen as the shifting of $\mathcal{G}((n+1)i_{n+1}) = i'(t)$, if with $i(t)$ we denote the generating function of $(i_k)_{k\in\mathbb{N}} = (I_k/k!)_{k\in\mathbb{N}}$, which is therefore the exponential generating function for $(I_k)_{k\in\mathbb{N}}$. We have:

$$\frac{i'(t)-1}{t} = \frac{i(t)-1}{t} + i(t)$$

because of the initial conditions $i_0 = i_1 = 1$, and so:

$$i'(t) = (1+t)i(t).$$

This is a simple differential equation with separable variables and by solving it we find:

$$\ln i(t) = t + \frac{t^2}{2} + C\qquad\text{or}\qquad i(t) = \exp\left(t+\frac{t^2}{2}+C\right)$$

where $C$ is an integration constant. Because $i(0) = e^C = 1$, we have $C = 0$ and we conclude with the formula:

$$I_n = n!\,[t^n]\exp\left(t+\frac{t^2}{2}\right).$$
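The exponential generating function can be expanded exactly through the same relation $i'(t) = (1+t)i(t)$, read coefficientwise as $(k+1)i_{k+1} = i_k + i_{k-1}$, and compared with the original recurrence; a sketch:

```python
from fractions import Fraction
from math import factorial

N = 15
# coefficients of exp(t + t^2/2) from (k+1) i_{k+1} = i_k + i_{k-1}
i = [Fraction(1)]
for k in range(N):
    i.append((i[k] + (i[k - 1] if k >= 1 else 0)) / (k + 1))

I = [1, 1]
for n in range(2, N + 1):
    I.append(I[n - 1] + (n - 1) * I[n - 2])

assert all(factorial(n) * i[n] == I[n] for n in range(N + 1))
print("involution numbers:", I[:8])  # → [1, 1, 2, 4, 10, 26, 76, 232]
```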

4.10 The summing factor method

For linear recurrences of the ﬁrst order a method ex-

ists, which allows us to obtain an explicit expression

for the generic element of the deﬁned sequence. Usu-

ally, this expression is in the form of a sum, and a pos-

sible closed form can only be found by manipulating

this sum; therefore, the method does not guarantee

a closed form. Let us suppose we have a recurrence

relation:

$$a_{n+1}f_{n+1} = b_nf_n + c_n$$

where $a_n$, $b_n$, $c_n$ are any expressions, possibly depending on $n$. As we remarked in the Introduction, if $a_{n+1} = b_n = 1$, by unfolding the recurrence we can find an explicit expression for $f_{n+1}$ or $f_n$:

$$f_{n+1} = f_n + c_n = f_{n-1} + c_{n-1} + c_n = \cdots = f_0 + \sum_{k=0}^nc_k$$

where $f_0$ is the initial condition relative to the sequence under consideration. Fortunately, we can always change the original recurrence into a relation of this more simple form. In fact, if we multiply everything by the so-called summing factor:

$$\frac{a_na_{n-1}\cdots a_0}{b_nb_{n-1}\cdots b_0}$$

provided none of $a_n, a_{n-1}, \ldots, a_0, b_n, b_{n-1}, \ldots, b_0$ is zero, we obtain:

$$\frac{a_{n+1}a_n\cdots a_0}{b_nb_{n-1}\cdots b_0}f_{n+1} = \frac{a_na_{n-1}\cdots a_0}{b_{n-1}b_{n-2}\cdots b_0}f_n + \frac{a_na_{n-1}\cdots a_0}{b_nb_{n-1}\cdots b_0}c_n.$$

We can now define:

$$g_{n+1} = \frac{a_{n+1}a_n\cdots a_0}{b_nb_{n-1}\cdots b_0}f_{n+1}$$

and the relation becomes:

$$g_{n+1} = g_n + \frac{a_na_{n-1}\cdots a_0}{b_nb_{n-1}\cdots b_0}c_n\qquad\qquad g_0 = a_0f_0.$$

Finally, by unfolding this recurrence the result is:

$$f_{n+1} = \frac{b_nb_{n-1}\cdots b_0}{a_{n+1}a_n\cdots a_0}\left(a_0f_0 + \sum_{k=0}^n\frac{a_ka_{k-1}\cdots a_0}{b_kb_{k-1}\cdots b_0}c_k\right).$$
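The unfolded formula is generic, so it can be implemented once and checked against direct iteration of the recurrence; the sketch below uses arbitrary test data for $a_n$, $b_n$, $c_n$ (all nonzero, as the derivation requires):

```python
from fractions import Fraction

def iterate(a, b, c, f0, n):
    # direct iteration of a_{k+1} f_{k+1} = b_k f_k + c_k
    f = f0
    for k in range(n + 1):
        f = (b(k) * f + c(k)) / a(k + 1)
    return f

def summing_factor(a, b, c, f0, n):
    # f_{n+1} = (b_n...b_0)/(a_{n+1}...a_0) * (a_0 f_0 + sum_k ratio_k c_k)
    ratio = Fraction(1)              # a_k...a_0 / (b_k...b_0)
    s = a(0) * f0                    # g_0 = a_0 f_0
    for k in range(n + 1):
        ratio = ratio * a(k) / b(k)
        s += ratio * c(k)
    return s / (ratio * a(n + 1))

a = lambda k: k + 2
b = lambda k: 2 * k + 1
c = lambda k: k * k + 1
for n in range(8):
    assert iterate(a, b, c, Fraction(3), n) == summing_factor(a, b, c, Fraction(3), n)
print("summing factor method verified")
```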

As a technical remark, we observe that sometimes $a_0$ and/or $b_0$ can be 0; in that case, we can unfold the recurrence down to 1, and accordingly change the last index 0.

In order to show a non-trivial example, let us discuss the problem of determining the coefficient of $t^n$ in the f.p.s. corresponding to the function $f(t) = \sqrt{1-t}\,\ln(1/(1-t))$. If we expand the function, we find:

$$f(t) = \sqrt{1-t}\,\ln\frac{1}{1-t} = t - \frac{1}{24}t^3 - \frac{1}{24}t^4 - \frac{71}{1920}t^5 - \frac{31}{960}t^6 - \cdots$$

A method for finding a recurrence relation for the coefficients $f_n$ of this f.p.s. is to derive a differential equation for $f(t)$. By differentiating:

$$f'(t) = -\frac{1}{2\sqrt{1-t}}\ln\frac{1}{1-t} + \frac{1}{\sqrt{1-t}}$$

and therefore we have the differential equation:

$$(1-t)f'(t) = -\frac12f(t) + \sqrt{1-t}.$$

By extracting the coefficient of $t^n$, we have the relation:

$$(n+1)f_{n+1} - nf_n = -\frac12f_n + \binom{1/2}{n}(-1)^n$$

which can be written as:

$$(n+1)f_{n+1} = \frac{2n-1}{2}f_n - \frac{1}{4^n(2n-1)}\binom{2n}{n}.$$

This is a recurrence relation of the first order with the initial condition $f_0 = 0$. Let us apply the summing factor method, for which we have $a_n = n$, $b_n = (2n-1)/2$. Since $a_0 = 0$, we have:

$$\frac{a_na_{n-1}\cdots a_1}{b_nb_{n-1}\cdots b_1} = \frac{n(n-1)\cdots1\cdot2^n}{(2n-1)(2n-3)\cdots1} = \frac{n!\,2^n\cdot2n(2n-2)\cdots2}{2n(2n-1)(2n-2)\cdots1} = \frac{4^nn!^2}{(2n)!}.$$

By multiplying the recurrence relation by this summing factor, we find:

$$(n+1)\frac{4^nn!^2}{(2n)!}f_{n+1} = \frac{2n-1}{2}\cdot\frac{4^nn!^2}{(2n)!}f_n - \frac{1}{2n-1}.$$

We are fortunate and $c_n$ simplifies dramatically; besides, we know that the two coefficients of $f_{n+1}$ and $f_n$ are equal, notwithstanding their appearance. Therefore we have:

$$f_{n+1} = \frac{1}{(n+1)4^n}\binom{2n}{n}\sum_{k=0}^n\frac{-1}{2k-1}\qquad(a_0f_0 = 0).$$

We can somewhat simplify this expression by observing that:

$$\sum_{k=0}^n\frac{1}{2k-1} = \frac{1}{2n-1} + \frac{1}{2n-3} + \cdots + 1 - 1 =$$
$$= \frac{1}{2n} + \frac{1}{2n-1} + \frac{1}{2n-2} + \cdots + \frac12 + 1 - \frac{1}{2n} - \cdots - \frac12 - 1 =$$
$$= H_{2n+2} - \frac{1}{2n+1} - \frac{1}{2n+2} - \frac12H_{n+1} + \frac{1}{2n+2} - 1 = H_{2n+2} - \frac12H_{n+1} - \frac{2(n+1)}{2n+1}.$$

Furthermore, we have:

$$\frac{1}{(n+1)4^n}\binom{2n}{n} = \frac{4}{(n+1)4^{n+1}}\binom{2n+2}{n+1}\frac{n+1}{2(2n+1)} = \frac{2}{2n+1}\cdot\frac{1}{4^{n+1}}\binom{2n+2}{n+1}.$$

Therefore:

$$f_{n+1} = \left(\frac12H_{n+1} - H_{2n+2} + \frac{2(n+1)}{2n+1}\right)\frac{2}{2n+1}\cdot\frac{1}{4^{n+1}}\binom{2n+2}{n+1}.$$

This expression allows us to obtain a formula for $f_n$:

$$f_n = \left(\frac12H_n - H_{2n} + \frac{2n}{2n-1}\right)\frac{2}{2n-1}\cdot\frac{1}{4^n}\binom{2n}{n} = \left(H_n - 2H_{2n} + \frac{4n}{2n-1}\right)\frac{1}{(2n-1)4^n}\binom{2n}{n},$$

the last factor being $(-1)^{n-1}\binom{1/2}{n}$.

The reader can numerically check this expression against the actual values of $f_n$ given above. By using the asymptotic approximation $H_n\sim\ln n+\gamma$ given in the Introduction, we find:

$$H_n - 2H_{2n} + \frac{4n}{2n-1} \sim \ln n + \gamma - 2(\ln2+\ln n+\gamma) + 2 = -\ln n - \gamma - \ln4 + 2.$$

Besides:

$$\frac{1}{2n-1}\cdot\frac{1}{4^n}\binom{2n}{n} \sim \frac{1}{2n\sqrt{\pi n}}$$

and we conclude:

$$f_n \sim -(\ln n+\gamma+\ln4-2)\,\frac{1}{2n\sqrt{\pi n}}$$

which shows that $|f_n|$ is of order $\ln n/n^{3/2}$.
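The numerical check the text suggests can be done exactly: iterate the first-order recurrence with rational arithmetic and compare with the closed form in harmonic numbers. A sketch:

```python
from fractions import Fraction
from math import comb

N = 12
# (n+1) f_{n+1} = (2n-1)/2 f_n - C(2n,n)/(4^n (2n-1)),  f_0 = 0
f = [Fraction(0)]
for n in range(N):
    f.append((Fraction(2 * n - 1, 2) * f[n]
              - Fraction(comb(2 * n, n), 4 ** n * (2 * n - 1))) / (n + 1))

H = [Fraction(0)]
for k in range(1, 2 * N + 1):
    H.append(H[k - 1] + Fraction(1, k))

for n in range(1, N + 1):
    closed = (H[n] - 2 * H[2 * n] + Fraction(4 * n, 2 * n - 1)) \
        * Fraction(comb(2 * n, n), (2 * n - 1) * 4 ** n)
    assert f[n] == closed
print("f_3 =", f[3], " f_4 =", f[4])  # → f_3 = -1/24  f_4 = -1/24
```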

4.11 The internal path length of binary trees

Binary trees are often used as a data structure to retrieve information. A set $D$ of keys is given, taken from an ordered universe $U$. Therefore $D$ is a permutation of the ordered sequence $d_1 < d_2 < \cdots < d_n$, and as the various elements arrive, they are inserted in a binary tree. As we know, there are $n!$ possible permutations of the keys in $D$, but there are only $\binom{2n}{n}/(n+1)$ different binary trees with $n$ nodes.
**

When we are looking for some key d to ﬁnd whether

d ∈ D or not, we perform a search in the binary tree,

comparing d against the root and other keys in the

tree, until we ﬁnd d or we arrive to some leaf and we

cannot go on with our search. In the former case our

search is successful, while in the latter case it is un-

successful. The problem is: how many comparisons

should we perform, on the average, to ﬁnd out that d

is present in the tree (successful search)? The answer

to this question is very important, because it tells us

how good binary trees are for searching information.

The number of nodes along a path in the tree, start-

ing at the root and arriving to a given node K is

called the internal path length for K. It is just the

number of comparisons necessary to ﬁnd the key in

K. Therefore, our previous problem can be stated in

the following way: what is the average internal path

length for binary trees with n nodes? Knuth has

found a rather simple way for answering this ques-

tion; however, we wish to show how the method of

generating functions can be used to ﬁnd the average

internal path length in a standard way. The reason-

ing is as follows: we evaluate the total internal path

length for all the trees generated by the n! possible

permutations of our key set D, and then divide this

number by n!n, the total number of nodes in all the

trees.

A non-empty binary tree can be seen as two subtrees connected to the root (see Section 2.9); the left subtree contains $k$ nodes ($k = 0, 1, \ldots, n-1$) and the right subtree contains the remaining $n-1-k$ nodes. Let $P_n$ be the total internal path length (i.p.l.) of all the $n!$ possible trees generated by permutations. The left subtrees have therefore a total i.p.l. equal to $P_k$, but every search in these subtrees has to pass through the root. This increases the total i.p.l. by the total number of nodes, i.e., it actually is $P_k + k!\,k$. We now observe that every left subtree is associated to each possible right subtree and therefore it should be counted $(n-1-k)!$ times. Besides, every permutation generating the left and right subtrees is not to be counted only once: the keys can be arranged in all possible ways in the overall permutation, retaining their relative ordering. These possible ways are $\binom{n-1}{k}$, and therefore the total contribution to $P_n$ of the left subtrees is:

$$\binom{n-1}{k}(n-1-k)!\,(P_k+k!\,k) = (n-1)!\left(\frac{P_k}{k!}+k\right).$$

In a similar way we find the total contribution of the right subtrees:

$$\binom{n-1}{k}k!\,(P_{n-1-k}+(n-1-k)!\,(n-1-k)) = (n-1)!\left(\frac{P_{n-1-k}}{(n-1-k)!}+(n-1-k)\right).$$

It only remains to count the contribution of the roots, which obviously amounts to $n!$, a single comparison for every tree. We therefore have the following recurrence relation, in which the contributions of the left and right subtrees turn out to be the same:

$$P_n = n! + (n-1)!\sum_{k=0}^{n-1}\left(\frac{P_k}{k!}+k+\frac{P_{n-1-k}}{(n-1-k)!}+(n-1-k)\right) =$$
$$= n! + 2(n-1)!\sum_{k=0}^{n-1}\left(\frac{P_k}{k!}+k\right) = n! + 2(n-1)!\left(\sum_{k=0}^{n-1}\frac{P_k}{k!}+\frac{n(n-1)}{2}\right).$$

We used the formula for the sum of the first $n-1$ integers, and now, by dividing by $n!$, we have:

$$\frac{P_n}{n!} = 1 + \frac{2}{n}\sum_{k=0}^{n-1}\frac{P_k}{k!} + n - 1 = n + \frac{2}{n}\sum_{k=0}^{n-1}\frac{P_k}{k!}.$$

Let us now set $Q_n = P_n/n!$, so that $Q_n$ is the average total i.p.l. relative to a single tree. If we succeed in finding $Q_n$, the average i.p.l. we are looking for will simply be $Q_n/n$. We can also reformulate the recurrence for $n+1$, in order to be able to apply the generating function operator:

$$(n+1)Q_{n+1} = (n+1)^2 + 2\sum_{k=0}^nQ_k$$

$$Q'(t) = \frac{1+t}{(1-t)^3} + \frac{2Q(t)}{1-t}\qquad\Longrightarrow\qquad Q'(t) - \frac{2}{1-t}Q(t) = \frac{1+t}{(1-t)^3}.$$

This differential equation can be easily solved:

$$Q(t) = \frac{1}{(1-t)^2}\left(\int(1-t)^2\,\frac{1+t}{(1-t)^3}\,dt + C\right) = \frac{1}{(1-t)^2}\left(2\ln\frac{1}{1-t} - t + C\right).$$

Since the i.p.l. of the empty tree is 0, we should have $Q_0 = Q(0) = 0$ and therefore, by setting $t = 0$, we find $C = 0$. The final result is:

$$Q(t) = \frac{2}{(1-t)^2}\ln\frac{1}{1-t} - \frac{t}{(1-t)^2}.$$

We can now use the formula for $\mathcal{G}(nH_n)$ (see Section 4.4 on common generating functions) to extract the coefficient of $t^n$:

$$Q_n = [t^n]\left(\frac{1}{(1-t)^2}\cdot2\left(\ln\frac{1}{1-t}+1\right) - \frac{2+t}{(1-t)^2}\right) =$$
$$= 2[t^{n+1}]\frac{t}{(1-t)^2}\left(\ln\frac{1}{1-t}+1\right) - [t^n]\frac{2+t}{(1-t)^2} =$$
$$= 2(n+1)H_{n+1} - 2\binom{-2}{n}(-1)^n - \binom{-2}{n-1}(-1)^{n-1} =$$
$$= 2(n+1)H_n + 2 - 2(n+1) - n = 2(n+1)H_n - 3n.$$

Thus we conclude with the average i.p.l.:

$$\frac{P_n}{n!\,n} = \frac{Q_n}{n} = 2\left(1+\frac1n\right)H_n - 3.$$

This formula is asymptotic to $2\ln n + 2\gamma - 3$, and shows that the average number of comparisons necessary to retrieve any key in a binary tree is in the order of $O(\ln n)$.
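The closed form for $Q_n$ can be confirmed exactly against the recurrence it came from; a minimal sketch:

```python
from fractions import Fraction

# Q_n = n + (2/n) sum_{k<n} Q_k (Q_0 = 0) versus Q_n = 2(n+1)H_n - 3n.
Q = [Fraction(0)]
H = Fraction(0)
for n in range(1, 20):
    Q.append(n + Fraction(2, n) * sum(Q))
    H += Fraction(1, n)
    assert Q[n] == 2 * (n + 1) * H - 3 * n
print("average internal path length formula verified")
```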

4.12 Height balanced binary trees

We have been able to show that binary trees are a "good" retrieving structure, in the sense that if the elements, or keys, of a set $\{a_1, a_2, \ldots, a_n\}$ are stored in random order in a binary (search) tree, then the expected average time for retrieving any key in the tree is in the order of $\ln n$. However, this behavior of binary trees is not always assured; for example, if the keys are stored in the tree in their proper order, the resulting structure degenerates into a linear list and the average retrieving time becomes $O(n)$.

To avoid this drawback, at the beginning of the

1960’s, two Russian researchers, Adelson-Velski and

Landis, found an algorithm to store keys in a “height

balanced” binary tree, a tree for which the height of

the left subtree of every node K diﬀers by at most

1 from the height of the right subtree of the same

node K. To understand this concept, let us deﬁne the

height of a tree as the highest level at which a node

in the tree is placed. The height is also the maximal

number of comparisons necessary to ﬁnd any key in

the tree. Therefore, if we ﬁnd a limitation for the

height of a class of trees, this is also a limitation for

the internal path length of the trees in the same class.

Formally, a height balanced binary tree is a tree such

that for every node K in it, if h

′

K

and h

′′

k

are the

heights of the two subtrees originating from K, then

[h

′

K

−h

′′

K

[ ≤ 1.

The algorithm of Adelson-Velski and Landis is very important because, as we are now going to show, height balanced binary trees assure that the retrieving time for every key in the tree is O(ln n). Because of that, height balanced binary trees are also known as AVL trees, and the algorithm for building AVL trees from a set of n keys can be found in any book on algorithms and data structures. Here we only wish to perform a worst case analysis to prove that the retrieval time in any AVL tree cannot be larger than O(ln n).

In order to perform our analysis, let us consider the worst possible AVL tree. Since, by definition, the height of the left subtree of any node cannot exceed the height of the corresponding right subtree plus 1, let us consider trees in which the height of the left subtree of every node exceeds exactly by 1 the height of the right subtree of the same node. In Figure 4.1 we have drawn the first cases. These trees are built in a very simple way: every tree T_n, of height n, is built by using the preceding tree T_{n-1} as the left subtree and the tree T_{n-2} as the right subtree of the root. Therefore, the number of nodes in T_n is just the sum of the nodes in T_{n-1} and in T_{n-2}, plus 1 (the root), and the condition on the heights of the subtrees of every node is satisfied. Because of this construction, T_n can be considered as the "worst" tree of height n, in the sense that every other AVL tree of height n will have at least as many nodes as T_n. Since the height n is an upper bound for the number of comparisons necessary to retrieve any key in the tree, the average retrieving time for every such tree will be ≤ n.

If we denote by |T_n| the number of nodes in the tree T_n, we have the simple recurrence relation:

|T_n| = |T_{n-1}| + |T_{n-2}| + 1.

This resembles the Fibonacci recurrence relation and, in fact, we can easily show that |T_n| = F_{n+1} - 1, as is intuitively apparent from the beginning of the sequence {0, 1, 2, 4, 7, 12, ...}. The proof is done by mathematical induction. For n = 0 we have |T_0| = F_1 - 1 = 1 - 1 = 0, and this is true; similarly we proceed for n = 1. Therefore, let us suppose that for every k < n we have |T_k| = F_{k+1} - 1; this holds for k = n-1 and k = n-2, and because of the recurrence relation for |T_n| we have:

|T_n| = |T_{n-1}| + |T_{n-2}| + 1 = F_n - 1 + F_{n-1} - 1 + 1 = F_{n+1} - 1,

since F_n + F_{n-1} = F_{n+1} by the Fibonacci recurrence.
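A small computational sketch (ours, not from the text; it assumes the convention F_0 = F_1 = 1 used in this argument) confirms both the recurrence and the claim |T_n| = F_{n+1} - 1:

```python
def fib(n):
    """Fibonacci numbers with F_0 = F_1 = 1 (the convention assumed here)."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# |T_0| = 0, |T_1| = 1, |T_n| = |T_{n-1}| + |T_{n-2}| + 1
sizes = [0, 1]
for _ in range(2, 11):
    sizes.append(sizes[-1] + sizes[-2] + 1)

assert sizes[:6] == [0, 1, 2, 4, 7, 12]                   # the sequence quoted above
assert all(sizes[n] == fib(n + 1) - 1 for n in range(11))
```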


[Figure 4.1 originally appears here: it shows the worst AVL trees T_0, T_1, T_2, T_3, T_4 and T_5.]

Figure 4.1: Worst AVL-trees

We have shown that for large values of n we have F_n ≈ φ^n/√5; therefore we have |T_n| ≈ φ^{n+1}/√5 - 1, or φ^{n+1} ≈ √5(|T_n| + 1). By passing to the logarithms, we have n ≈ log_φ(√5(|T_n| + 1)) - 1, and since all logarithms are proportional, n = O(ln |T_n|). As we observed, every AVL tree of height n has a number of nodes not less than |T_n|, and this assures that the retrieving time for every AVL tree with at most |T_n| nodes is bounded from above by log_φ(√5(|T_n| + 1)) - 1.

4.13 Some special recurrences

Not all recurrence relations are linear, and we had occasion to deal with a different sort of relation when we studied the Catalan numbers. They satisfy the recurrence C_n = \sum_{k=0}^{n-1} C_k C_{n-k-1}, which, however, in this particular form is only valid for n > 0. In order to apply the method of generating functions, we write it for n + 1:

C_{n+1} = \sum_{k=0}^{n} C_k C_{n-k}.

The right hand member is a convolution and therefore, by the initial condition C_0 = 1, we obtain:

\frac{C(t) - 1}{t} = C(t)^2    or    t C(t)^2 - C(t) + 1 = 0.

This is a second degree equation, which can be directly solved; for t = 0 we should have C(0) = C_0 = 1, and therefore the solution with the + sign before the square root is to be ignored; we thus obtain:

C(t) = \frac{1 - \sqrt{1 - 4t}}{2t},

which was found in the section "The method of shifting" in a completely different way.

The Bernoulli numbers were introduced by means of the implicit relation:

\sum_{k=0}^{n} \binom{n+1}{k} B_k = \delta_{n,0}.

We are now in a position to find out their exponential generating function, i.e., the function B(t) = G(B_n/n!), and prove some of their properties. The defining relation can be written as:

\sum_{k=0}^{n} \binom{n+1}{n-k} B_{n-k} = \sum_{k=0}^{n} (n+1) n \cdots (k+2) \frac{B_{n-k}}{(n-k)!} = \sum_{k=0}^{n} \frac{(n+1)!}{(k+1)!} \frac{B_{n-k}}{(n-k)!} = \delta_{n,0}.

If we divide everything by (n + 1)!, we obtain:

\sum_{k=0}^{n} \frac{1}{(k+1)!} \frac{B_{n-k}}{(n-k)!} = \delta_{n,0},

and since this relation holds for every n ∈ N, we can pass to the generating functions. The left hand member is a convolution, whose first factor is the shift of the exponential function, and therefore we obtain:

\frac{e^t - 1}{t} B(t) = 1    or    B(t) = \frac{t}{e^t - 1}.

The classical way to see that B_{2n+1} = 0, ∀n > 0, is to show that the function obtained from B(t) by deleting the term of first degree is an even function, and therefore should have all its coefficients of odd order equal to zero. In fact we have:

B(t) + \frac{t}{2} = \frac{t}{e^t - 1} + \frac{t}{2} = \frac{t}{2} \cdot \frac{e^t + 1}{e^t - 1}.

In order to see that this function is even, we substitute t → -t and show that the function remains the same:

-\frac{t}{2} \cdot \frac{e^{-t} + 1}{e^{-t} - 1} = -\frac{t}{2} \cdot \frac{1 + e^t}{1 - e^t} = \frac{t}{2} \cdot \frac{e^t + 1}{e^t - 1}.
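These facts are easy to verify numerically; the sketch below (ours, not the book's) computes B_n directly from the implicit relation and checks that the odd-index values beyond B_1 vanish:

```python
from fractions import Fraction
from math import comb

def bernoulli(N):
    """B_0..B_N from sum_{k=0}^n binom(n+1, k) B_k = delta_{n,0}."""
    B = [Fraction(1)]                       # the n = 0 case gives B_0 = 1
    for n in range(1, N + 1):
        s = sum(comb(n + 1, k) * B[k] for k in range(n))
        B.append(-s / (n + 1))              # binom(n+1, n) = n + 1
    return B

B = bernoulli(12)
assert B[1] == Fraction(-1, 2) and B[2] == Fraction(1, 6)
assert B[12] == Fraction(-691, 2730)
assert all(B[n] == 0 for n in range(3, 13, 2))   # B_3 = B_5 = ... = 0
```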


Chapter 5

Riordan Arrays

5.1 Definitions and basic concepts

A Riordan array is a couple of formal power series D = (d(t), h(t)); if both d(t), h(t) ∈ F_0 (i.e., both have a non-zero constant term), then the Riordan array is called proper. The Riordan array can be identified with the infinite, lower triangular array (or triangle) (d_{n,k})_{n,k∈N} defined by:

d_{n,k} = [t^n] d(t) (t h(t))^k.    (5.1.1)

In fact, we are mainly interested in the sequence of functions iteratively defined by:

d_0(t) = d(t),    d_k(t) = d_{k-1}(t) \, t h(t) = d(t) (t h(t))^k.

These functions are the column generating functions of the triangle.

Another way of characterizing a Riordan array D = (d(t), h(t)) is to consider the bivariate generating function:

d(t, z) = \sum_{k=0}^{\infty} d(t) (t h(t))^k z^k = \frac{d(t)}{1 - t z h(t)}.    (5.1.2)

A common example of a Riordan array is the Pascal triangle, for which we have d(t) = h(t) = 1/(1-t). In fact we have:

d_{n,k} = [t^n] \frac{1}{1-t} \left( \frac{t}{1-t} \right)^k = [t^{n-k}] \frac{1}{(1-t)^{k+1}} = \binom{-k-1}{n-k} (-1)^{n-k} = \binom{n}{n-k} = \binom{n}{k},

and this shows that the generic element is actually a binomial coefficient. By formula (5.1.2), we find the well-known bivariate generating function d(t, z) = (1 - t - tz)^{-1}.
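Definition (5.1.1) is easy to put to work with truncated power series; in this sketch (the helper names are ours, not the book's) the Riordan array (1/(1-t), 1/(1-t)) is expanded and compared with the binomial coefficients:

```python
from math import comb

def riordan_triangle(d, h, rows):
    """Rows 0..rows-1 of (d(t), h(t)); d, h are coefficient lists."""
    th = [0] + h[:rows - 1]                        # t * h(t), truncated
    tri = [[0] * rows for _ in range(rows)]
    col = d[:rows]                                 # column k gf: d(t)(t h(t))^k
    for k in range(rows):
        for n in range(rows):
            tri[n][k] = col[n]
        col = [sum(col[i] * th[n - i] for i in range(n + 1)) for n in range(rows)]
    return tri

geom = [1] * 8                                     # 1/(1-t) truncated to 8 terms
pascal = riordan_triangle(geom, geom, 8)
assert all(pascal[n][k] == comb(n, k) for n in range(8) for k in range(n + 1))
```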

From our point of view, one of the most important properties of Riordan arrays is the fact that the sums involving the rows of a Riordan array can be performed by operating a suitable transformation on a generating function and then by extracting a coefficient from the resulting function. More precisely, we prove the following theorem:

Theorem 5.1.1  Let D = (d(t), h(t)) be a Riordan array and let f(t) be the generating function of the sequence (f_k)_{k∈N}; then we have:

\sum_{k=0}^{n} d_{n,k} f_k = [t^n] d(t) f(t h(t)).    (5.1.3)

Proof:  The proof consists in a straightforward computation:

\sum_{k=0}^{n} d_{n,k} f_k = \sum_{k=0}^{\infty} d_{n,k} f_k = \sum_{k=0}^{\infty} [t^n] d(t) (t h(t))^k f_k = [t^n] d(t) \sum_{k=0}^{\infty} f_k (t h(t))^k = [t^n] d(t) f(t h(t)).

In the case of the Pascal triangle we obtain the Euler transformation:

\sum_{k=0}^{n} \binom{n}{k} f_k = [t^n] \frac{1}{1-t} f\!\left( \frac{t}{1-t} \right),

which we proved by simple considerations on binomial coefficients and generating functions. By considering the generating functions G(1) = 1/(1-t), G((-1)^k) = 1/(1+t) and G(k) = t/(1-t)^2, from the previous theorem we immediately find:

row sums (r. s.):    \sum_k d_{n,k} = [t^n] \frac{d(t)}{1 - t h(t)}

alternating r. s.:    \sum_k (-1)^k d_{n,k} = [t^n] \frac{d(t)}{1 + t h(t)}

weighted r. s.:    \sum_k k \, d_{n,k} = [t^n] \frac{t d(t) h(t)}{(1 - t h(t))^2}.
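As a quick numeric illustration (our own check, not from the book): taking f_k = 2^k in the Euler transformation above gives f(t) = 1/(1-2t), and (1/(1-t)) f(t/(1-t)) = 1/(1-3t), so the transformed sum must equal 3^n:

```python
from math import comb

# Euler transformation with f_k = 2^k: sum_k binom(n,k) 2^k = [t^n] 1/(1-3t) = 3^n
for n in range(15):
    assert sum(comb(n, k) * 2 ** k for k in range(n + 1)) == 3 ** n
```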

Moreover, by observing that \hat{D} = (d(t), t h(t)) is a Riordan array, whose rows are the diagonals of D,


we have:

diagonal sums:    \sum_k d_{n-k,k} = [t^n] \frac{d(t)}{1 - t^2 h(t)}.

Obviously, this observation can be generalized to find the generating function of any sum \sum_k d_{n-sk,k} for every s ≥ 1. We obtain well-known results for the Pascal triangle; for example, diagonal sums give:

\sum_k \binom{n-k}{k} = [t^n] \frac{1}{1-t} \cdot \frac{1}{1 - t^2 (1-t)^{-1}} = [t^n] \frac{1}{1 - t - t^2} = F_{n+1},

connecting binomial coefficients and Fibonacci numbers.
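A direct check of this connection (our own snippet, using the standard convention F_1 = F_2 = 1):

```python
from math import comb

def fib(n):
    a, b = 0, 1                 # F_0 = 0, F_1 = 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(20):
    assert sum(comb(n - k, k) for k in range(n // 2 + 1)) == fib(n + 1)
```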

Another general result can be obtained by means of two sequences (f_k)_{k∈N} and (g_k)_{k∈N} and their generating functions f(t), g(t). For p = 1, 2, ..., the generic element of the Riordan array (f(t), t^{p-1}) is:

d_{n,k} = [t^n] f(t) (t^p)^k = [t^{n-pk}] f(t) = f_{n-pk}.

Therefore, by formula (5.1.3), we have:

\sum_{k=0}^{\lfloor n/p \rfloor} f_{n-pk} \, g_k = [t^n] f(t) \left[ g(y) \,\Big|\, y = t^p \right] = [t^n] f(t) g(t^p).

This can be called the rule of generalized convolution since it reduces to the usual convolution rule for p = 1. Suppose, for example, that we wish to sum one out of every three powers of 2, starting with 2^n and going down to the lowest integer exponent ≥ 0; we have:

S_n = \sum_{k=0}^{\lfloor n/3 \rfloor} 2^{n-3k} = [t^n] \frac{1}{1 - 2t} \cdot \frac{1}{1 - t^3}.

As we will learn studying asymptotics, an approximate value for this sum can be obtained by extracting the coefficient of the first factor and then by multiplying it by the second factor, in which t is substituted by 1/2. This gives S_n ≈ 2^{n+3}/7, and in fact we have the exact value S_n = \lfloor 2^{n+3}/7 \rfloor.
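Both the exact floor value and the closeness of the approximation are easy to check directly (an illustrative snippet of ours):

```python
for n in range(25):
    s = sum(2 ** (n - 3 * k) for k in range(n // 3 + 1))
    assert s == 2 ** (n + 3) // 7             # the exact value floor(2^{n+3}/7)
    assert abs(s - 2 ** (n + 3) / 7) < 1      # the approximation 2^{n+3}/7
```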

In a sense, the theorem on the sums involving the Riordan arrays is a characterization for them; in fact, we can prove a sort of inverse property:

Theorem 5.1.2  Let (d_{n,k})_{n,k∈N} be an infinite triangle such that for every sequence (f_k)_{k∈N} we have \sum_k d_{n,k} f_k = [t^n] d(t) f(t h(t)), where f(t) is the generating function of the sequence and d(t), h(t) are two f.p.s. not depending on f(t). Then the triangle defined by the Riordan array (d(t), h(t)) coincides with (d_{n,k})_{n,k∈N}.

Proof:  For every k ∈ N take the sequence which is 0 everywhere except in the kth element f_k = 1. The corresponding generating function is f(t) = t^k, and we have \sum_{i=0}^{\infty} d_{n,i} f_i = d_{n,k}. Hence, according to the theorem's hypotheses, the generating function of the column (d_{n,k})_{n∈N} is d_k(t) = d(t) (t h(t))^k, and this corresponds to the initial definition of the column generating functions for a Riordan array, for every k = 0, 1, 2, ....

5.2 The algebraic structure of Riordan arrays

The most important algebraic property of Riordan arrays is the fact that the usual row-by-column product of two Riordan arrays is a Riordan array. This is proved by considering two Riordan arrays (d(t), h(t)) and (a(t), b(t)) and performing the product, whose generic element is \sum_j d_{n,j} f_{j,k}, if d_{n,j} is the generic element in (d(t), h(t)) and f_{j,k} is the generic element in (a(t), b(t)). In fact we have:

\sum_{j=0}^{\infty} d_{n,j} f_{j,k} = \sum_{j=0}^{\infty} [t^n] d(t) (t h(t))^j \, [y^j] a(y) (y b(y))^k =

= [t^n] d(t) \sum_{j=0}^{\infty} (t h(t))^j [y^j] a(y) (y b(y))^k = [t^n] d(t) a(t h(t)) (t h(t) b(t h(t)))^k.

By definition, the last expression denotes the generic element of the Riordan array (f(t), g(t)) where f(t) = d(t) a(t h(t)) and g(t) = h(t) b(t h(t)). Therefore we have:

(d(t), h(t)) \cdot (a(t), b(t)) = (d(t) a(t h(t)), h(t) b(t h(t))).    (5.2.1)

This expression is particularly important and is the basis for many developments of the Riordan array theory.

The product is obviously associative, and we observe that the Riordan array (1, 1) acts as the neutral element or identity. In fact, the array (1, 1) is everywhere 0 except for the elements on the main diagonal, which are 1. Observe that this array is proper.
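The product rule (5.2.1) can be verified on a small example (our own sketch, not the book's): squaring the Pascal array (1/(1-t), 1/(1-t)) must give (1/(1-2t), 1/(1-2t)), whose generic element is \binom{n}{k} 2^{n-k}.

```python
from math import comb

def rio(d, h, rows):
    """d_{n,k} = [t^n] d(t)(t h(t))^k from truncated coefficient lists."""
    th = [0] + h[:rows - 1]
    tri, col = [[0] * rows for _ in range(rows)], d[:rows]
    for k in range(rows):
        for n in range(rows):
            tri[n][k] = col[n]
        col = [sum(col[i] * th[n - i] for i in range(n + 1)) for n in range(rows)]
    return tri

rows = 8
P = rio([1] * rows, [1] * rows, rows)                          # Pascal triangle
Q = rio([2 ** n for n in range(rows)], [2 ** n for n in range(rows)], rows)
prod = [[sum(P[n][j] * P[j][k] for j in range(rows)) for k in range(rows)]
        for n in range(rows)]
assert prod == Q                                               # (5.2.1) in action
assert all(Q[n][k] == comb(n, k) * 2 ** (n - k)
           for n in range(rows) for k in range(n + 1))
```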

Let us now suppose that (d(t), h(t)) is a proper Riordan array. By formula (5.2.1), we immediately see that the product of two proper Riordan arrays is proper; therefore, we can look for a proper Riordan array (a(t), b(t)) such that (d(t), h(t)) \cdot (a(t), b(t)) = (1, 1). If this is the case, we should have:

d(t) a(t h(t)) = 1    and    h(t) b(t h(t)) = 1.


By setting y = t h(t) we have:

a(y) = \left[ d(t)^{-1} \,\Big|\, t = y h(t)^{-1} \right],    b(y) = \left[ h(t)^{-1} \,\Big|\, t = y h(t)^{-1} \right].

Here we are in the hypotheses of the Lagrange Inversion Formula, and therefore there is a unique function t = t(y) such that t(0) = 0 and t = y h(t)^{-1}. Besides, being d(t), h(t) ∈ F_0, the two f.p.s. a(y) and b(y) are uniquely defined. We have therefore proved:

Theorem 5.2.1  The set A of proper Riordan arrays is a group with the operation of row-by-column product defined functionally by relation (5.2.1).

It is a simple matter to show that some important classes of Riordan arrays are subgroups of A:

• the set of the Riordan arrays (f(t), 1) is an invariant subgroup of A; it is called the Appell subgroup;

• the set of the Riordan arrays (1, g(t)) is a subgroup of A and is called the subgroup of associated operators or the Lagrange subgroup;

• the set of the Riordan arrays (f(t), f(t)) is a subgroup of A and is called the Bell subgroup. Its elements are also known as renewal arrays.

The first two subgroups have already been considered in the Chapter on "Formal Power Series" and show the connection between f.p.s. and Riordan arrays. The notations used in that Chapter are thus explained as particular cases of the most general case of (proper) Riordan arrays.

Let us now return to the formulas for a Riordan array inverse. If h(t) is any fixed invertible f.p.s., let us define:

d_h(t) = \left[ d(y)^{-1} \,\Big|\, y = t h(y)^{-1} \right],

so that we can write (d(t), h(t))^{-1} = (d_h(t), h_h(t)). By the product formula (5.2.1) we immediately find the identities:

d(t h_h(t)) = d_h(t)^{-1}        d_h(t h(t)) = d(t)^{-1}

h(t h_h(t)) = h_h(t)^{-1}        h_h(t h(t)) = h(t)^{-1},

which can be reduced to the single and basic rule:

f(t h_h(t)) = f_h(t)^{-1}    ∀ f(t) ∈ F_0.

Observe that obviously (f_h)_h(t) = f(t).

We wish now to find an explicit expression for the generic element d_{n,k} in the inverse Riordan array (d(t), h(t))^{-1} in terms of d(t) and h(t). This will be done by using the LIF. As we observed in the first section, the bivariate generating function for (d(t), h(t)) is d(t)/(1 - t z h(t)), and therefore we have:

d_{n,k} = [t^n z^k] \frac{d_h(t)}{1 - t z h_h(t)} = [z^k] [t^n] \left[ \frac{d_h(t)}{1 - z y} \,\Big|\, y = t h_h(t) \right].

By the formulas above, we have:

y = t h_h(t) = t h(t h_h(t))^{-1} = t h(y)^{-1},

which is the same as t = y h(y). Therefore we find d_h(t) = d_h(y h(y)) = d(y)^{-1}, and consequently:

d_{n,k} = [z^k] [t^n] \left[ \frac{d(y)^{-1}}{1 - z y} \,\Big|\, y = t h(y)^{-1} \right] =

= [z^k] \frac{1}{n} [y^{n-1}] \frac{d}{dy} \left( \frac{d(y)^{-1}}{1 - z y} \right) \frac{1}{h(y)^n} =

= [z^k] \frac{1}{n} [y^{n-1}] \left( \frac{z}{d(y)(1 - z y)^2} - \frac{d'(y)}{d(y)^2 (1 - z y)} \right) \frac{1}{h(y)^n} =

= [z^k] \frac{1}{n} [y^{n-1}] \left( \sum_{r=0}^{\infty} z^{r+1} y^r (r+1) - \frac{d'(y)}{d(y)} \sum_{r=0}^{\infty} z^r y^r \right) \frac{1}{d(y) h(y)^n} =

= \frac{1}{n} [y^{n-1}] \left( k y^{k-1} - y^k \frac{d'(y)}{d(y)} \right) \frac{1}{d(y) h(y)^n} =

= \frac{1}{n} [y^{n-k}] \left( k - \frac{y \, d'(y)}{d(y)} \right) \frac{1}{d(y) h(y)^n}.

This is the formula we were looking for.

5.3 The A-sequence for proper Riordan arrays

Proper Riordan arrays play a very important role in our approach. Let us consider a Riordan array D = (d(t), h(t)), which is not proper, but d(t) ∈ F_0. Since h(0) = 0, an s > 0 exists such that h(t) = h_s t^s + h_{s+1} t^{s+1} + ··· and h_s ≠ 0. If we define \hat{h}(t) = h_s + h_{s+1} t + ···, then \hat{h}(t) ∈ F_0. Consequently, the Riordan array \hat{D} = (d(t), \hat{h}(t)) is proper, and the rows of D can be seen as the s-diagonals (\hat{d}_{n-sk,k})_{k∈N} of \hat{D}. Fortunately, for proper Riordan arrays, Rogers has found an important characterization: every element d_{n+1,k+1}, n, k ∈ N, can be expressed as a linear combination of the elements in the preceding row, i.e.:

d_{n+1,k+1} = a_0 d_{n,k} + a_1 d_{n,k+1} + a_2 d_{n,k+2} + \cdots = \sum_{j=0}^{\infty} a_j d_{n,k+j}.    (5.3.1)

The sum is actually finite and the sequence A = (a_k)_{k∈N} is fixed. More precisely, we can prove the following theorem:

Theorem 5.3.1  An infinite lower triangular array D = (d_{n,k})_{n,k∈N} is a Riordan array if and only if a sequence A = {a_0 ≠ 0, a_1, a_2, ...} exists such that for every n, k ∈ N relation (5.3.1) holds.

Proof:  Let us suppose that D is the Riordan array (d(t), h(t)) and let us consider the Riordan array (d(t)h(t), h(t)); we define the Riordan array (A(t), B(t)) by the relation:

(A(t), B(t)) = (d(t), h(t))^{-1} \cdot (d(t)h(t), h(t)),

or:

(d(t), h(t)) \cdot (A(t), B(t)) = (d(t)h(t), h(t)).

By performing the product we find:

d(t) A(t h(t)) = d(t) h(t)    and    h(t) B(t h(t)) = h(t).

The latter identity gives B(t h(t)) = 1, and this implies B(t) = 1. Therefore we have (d(t), h(t)) \cdot (A(t), 1) = (d(t)h(t), h(t)). The element f_{n,k} of the left hand member is \sum_{j=0}^{\infty} d_{n,j} a_{j-k} = \sum_{j=0}^{\infty} d_{n,k+j} a_j, if as usual we interpret a_{j-k} as 0 when j < k. The same element in the right hand member is:

[t^n] d(t) h(t) (t h(t))^k = [t^{n+1}] d(t) (t h(t))^{k+1} = d_{n+1,k+1}.

By equating these two quantities, we have the identity (5.3.1). For the converse, let us observe that (5.3.1) uniquely defines the array D when the elements {d_{0,0}, d_{1,0}, d_{2,0}, ...} of column 0 are given. Let d(t) be the generating function of this column, A(t) the generating function of the sequence A, and define h(t) as the solution of the functional equation h(t) = A(t h(t)), which is uniquely determined because of our hypothesis a_0 ≠ 0. We can therefore consider the proper Riordan array \hat{D} = (d(t), h(t)); by the first part of the theorem, \hat{D} satisfies relation (5.3.1) for every n, k ∈ N and therefore, by our previous observation, it must coincide with D. This completes the proof.

The sequence A = (a_k)_{k∈N} is called the A-sequence of the Riordan array D = (d(t), h(t)), and it only depends on h(t). In fact, as we have shown during the proof of the theorem, we have:

h(t) = A(t h(t)),    (5.3.2)

and this uniquely determines A when h(t) is given and, vice versa, h(t) is uniquely determined when A is given.

The A-sequence for the Pascal triangle is the solution A(y) of the functional equation 1/(1-t) = A(t/(1-t)). The simple substitution y = t/(1-t) gives A(y) = 1 + y, corresponding to the well-known basic recurrence of the Pascal triangle:

\binom{n+1}{k+1} = \binom{n}{k} + \binom{n}{k+1}.

At this point, we realize that we could have started with this recurrence relation and directly found A(y) = 1 + y. Now, h(t) is defined by (5.3.2) as the solution of h(t) = 1 + t h(t), and this immediately gives h(t) = 1/(1-t). Furthermore, since column 0 is {1, 1, 1, ...}, we have proved that the Pascal triangle corresponds to the Riordan array (1/(1-t), 1/(1-t)), as initially stated.
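The A-sequence can also be recovered numerically from the triangle itself, by solving (5.3.1) one coefficient at a time; here is a sketch (ours, not the book's) for the Pascal triangle, which uses column k = 0 of the recurrence and the fact that d_{n,n} = 1 there:

```python
from math import comb

rows = 10
d = [[comb(n, k) for k in range(rows)] for n in range(rows)]
a = []
for n in range(rows - 1):
    # d_{n+1,1} = a_0 d_{n,0} + ... + a_n d_{n,n}: solve for a_n
    rhs = d[n + 1][1] - sum(a[j] * d[n][j] for j in range(n))
    a.append(rhs // d[n][n])        # exact, since d_{n,n} = 1 for Pascal
assert a == [1, 1] + [0] * (rows - 3)          # i.e. A(y) = 1 + y
```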

The pair of functions d(t) and A(t) completely characterizes a proper Riordan array. Another type of characterization is obtained through the following observation:

Theorem 5.3.2  Let (d_{n,k})_{n,k∈N} be any infinite, lower triangular array with d_{n,n} ≠ 0, ∀n ∈ N (in particular, let it be a proper Riordan array); then a unique sequence Z = (z_k)_{k∈N} exists such that every element in column 0 can be expressed as a linear combination of all the elements in the preceding row, i.e.:

d_{n+1,0} = z_0 d_{n,0} + z_1 d_{n,1} + z_2 d_{n,2} + \cdots = \sum_{j=0}^{\infty} z_j d_{n,j}.    (5.3.3)

Proof:  Let z_0 = d_{1,0}/d_{0,0}. Now we can uniquely determine the value of z_1 by expressing d_{2,0} in terms of the elements in row 1, i.e.:

d_{2,0} = z_0 d_{1,0} + z_1 d_{1,1}    or    z_1 = \frac{d_{0,0} d_{2,0} - d_{1,0}^2}{d_{0,0} d_{1,1}}.

In the same way, we determine z_2 by expressing d_{3,0} in terms of the elements in row 2, and by substituting the values just obtained for z_0 and z_1. By proceeding in the same way, we determine the sequence Z in a unique way.

The sequence Z is called the Z-sequence for the (Riordan) array; it characterizes column 0, except for the element d_{0,0}. Therefore, we can say that the triple (d_{0,0}, A(t), Z(t)) completely characterizes a proper Riordan array. To see how the Z-sequence is obtained by starting with the usual definition of a Riordan array, let us prove the following:

Theorem 5.3.3  Let (d(t), h(t)) be a proper Riordan array and let Z(t) be the generating function of the array's Z-sequence. We have:

d(t) = \frac{d_{0,0}}{1 - t Z(t h(t))}.


Proof:  By the preceding theorem, the Z-sequence exists and is unique. Therefore, equation (5.3.3) is valid for every n ∈ N, and we can go on to the generating functions. Since d(t)(t h(t))^k is the generating function for column k, we have:

\frac{d(t) - d_{0,0}}{t} = z_0 d(t) + z_1 d(t) \, t h(t) + z_2 d(t) (t h(t))^2 + \cdots = d(t) \left( z_0 + z_1 t h(t) + z_2 (t h(t))^2 + \cdots \right) = d(t) Z(t h(t)).

By solving this equation in d(t), we immediately find the relation desired.

The relation can be inverted, and this gives us the formula for the Z-sequence:

Z(y) = \left[ \frac{d(t) - d_{0,0}}{t \, d(t)} \,\Big|\, t = y h(t)^{-1} \right].

We conclude this section by giving a theorem which characterizes renewal arrays by means of the A- and Z-sequences:

Theorem 5.3.4  Let d(0) = h(0) ≠ 0. Then d(t) = h(t) if and only if: A(y) = d(0) + y Z(y).

Proof:  Let us assume that A(y) = d(0) + y Z(y), or Z(y) = (A(y) - d(0))/y. By the previous theorem, we have:

d(t) = \frac{d(0)}{1 - t Z(t h(t))} = \frac{d(0)}{1 - (t A(t h(t)) - d(0) t)/(t h(t))} = \frac{d(0) \, t h(t)}{d(0) \, t} = h(t),

because A(t h(t)) = h(t) by formula (5.3.2). Vice versa, by the formula for Z(y), we obtain from the hypothesis d(t) = h(t):

d(0) + y Z(y) = \left[ d(0) + y \left( \frac{1}{t} - \frac{d(0)}{t h(t)} \right) \Big|\, t = y h(t)^{-1} \right] = \left[ d(0) + \frac{t h(t)}{t} - \frac{d(0) \, t h(t)}{t h(t)} \,\Big|\, t = y h(t)^{-1} \right] = \left[ h(t) \,\Big|\, t = y h(t)^{-1} \right] = A(y).

5.4 Simple binomial coefficients

Let us consider simple binomial coefficients, i.e., binomial coefficients of the form \binom{n+ak}{m+bk}, where a, b are two parameters and k is a non-negative integer variable. Depending on whether we consider n a variable and m a parameter, or vice versa, we have two different infinite arrays (d_{n,k}) or (\hat{d}_{m,k}), whose elements depend on the parameters a, b, m or a, b, n, respectively. In either case, if some conditions on a, b hold, we have Riordan arrays and therefore we can apply formula (5.1.3) to find the value of many sums.

Theorem 5.4.1  Let d_{n,k} and \hat{d}_{m,k} be as above. If b > a and b - a is an integer, then D = (d_{n,k}) is a Riordan array. If b < 0 is an integer, then \hat{D} = (\hat{d}_{m,k}) is a Riordan array. We have:

D = \left( \frac{t^m}{(1-t)^{m+1}}, \; \frac{t^{b-a-1}}{(1-t)^b} \right)        \hat{D} = \left( (1+t)^n, \; t^{-b-1} (1+t)^a \right).

Proof:  By using well-known properties of binomial coefficients, we find:

d_{n,k} = \binom{n+ak}{m+bk} = \binom{n+ak}{n-m+ak-bk} = \binom{-n-ak+n-m+ak-bk-1}{n-m+ak-bk} (-1)^{n-m+ak-bk} =

= \binom{-m-bk-1}{(n-m)+(a-b)k} (-1)^{n-m+ak-bk} = [t^{n-m+ak-bk}] \frac{1}{(1-t)^{m+1+bk}} = [t^n] \frac{t^m}{(1-t)^{m+1}} \left( \frac{t^{b-a}}{(1-t)^b} \right)^k;

and:

\hat{d}_{m,k} = [t^{m+bk}] (1+t)^{n+ak} = [t^m] (1+t)^n \left( t^{-b} (1+t)^a \right)^k.

The theorem now directly follows from (5.1.1).

For m = a = 0 and b = 1 we again find the Riordan array of the Pascal triangle. The sum (5.1.3) takes on two specific forms which are worth being stated explicitly:

\sum_k \binom{n+ak}{m+bk} f_k = [t^n] \frac{t^m}{(1-t)^{m+1}} \, f\!\left( \frac{t^{b-a}}{(1-t)^b} \right)    (b > a)    (5.4.1)

\sum_k \binom{n+ak}{m+bk} f_k = [t^m] (1+t)^n \, f\!\left( t^{-b} (1+t)^a \right)    (b < 0).    (5.4.2)


If m and n are independent of each other, these relations can also be stated as generating function identities. The binomial coefficient \binom{n+ak}{m+bk} is so general that a large number of combinatorial sums can be solved by means of the two formulas (5.4.1) and (5.4.2).

Let us begin our set of examples with a simple sum; by the theorem above, the binomial coefficient \binom{n-k}{m} corresponds to the Riordan array (t^m/(1-t)^{m+1}, 1); therefore, by the formula concerning the row sums, we have:

\sum_k \binom{n-k}{m} = [t^n] \frac{t^m}{(1-t)^{m+1}} \cdot \frac{1}{1-t} = [t^{n-m}] \frac{1}{(1-t)^{m+2}} = \binom{n+1}{m+1}.
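A direct check of this first sum (an illustrative snippet of ours):

```python
from math import comb

for n in range(12):
    for m in range(n + 1):
        assert sum(comb(n - k, m) for k in range(n + 1)) == comb(n + 1, m + 1)
```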

Another simple example is the sum:

\sum_k \binom{n}{2k+1} 5^k = [t^n] \frac{t}{(1-t)^2} \left[ \frac{1}{1-5y} \,\Big|\, y = \frac{t^2}{(1-t)^2} \right] = \frac{1}{2} [t^n] \frac{2t}{1 - 2t - 4t^2} = 2^{n-1} F_n.
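This one too is easy to verify directly (our snippet, standard convention F_1 = F_2 = 1):

```python
from math import comb

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(1, 16):
    s = sum(comb(n, 2 * k + 1) * 5 ** k for k in range(n // 2 + 1))
    assert s == 2 ** (n - 1) * fib(n)
```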

The following sum is a more interesting case. From the generating function of the Catalan numbers we immediately find:

\sum_k \binom{n+k}{m+2k} \binom{2k}{k} \frac{(-1)^k}{k+1} = [t^n] \frac{t^m}{(1-t)^{m+1}} \left[ \frac{\sqrt{1+4y} - 1}{2y} \,\Big|\, y = \frac{t}{(1-t)^2} \right] =

= [t^{n-m}] \frac{1}{(1-t)^{m+1}} \left( \sqrt{1 + \frac{4t}{(1-t)^2}} - 1 \right) \frac{(1-t)^2}{2t} = [t^{n-m}] \frac{1}{(1-t)^m} = \binom{n-1}{m-1}.
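This identity can also be confirmed numerically (our snippet; the k-th summand carries the Catalan number C_k = \binom{2k}{k}/(k+1) with alternating sign):

```python
from math import comb

for n in range(1, 12):
    for m in range(1, n + 1):
        s = sum(comb(n + k, m + 2 * k) * comb(2 * k, k) * (-1) ** k // (k + 1)
                for k in range(n + 1))
        assert s == comb(n - 1, m - 1)
```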

In the following sum we use the bisection formulas. Because the generating function of \binom{z+1}{k} 2^k is (1 + 2t)^{z+1}, we have:

G\!\left( \binom{z+1}{2k+1} 2^{2k+1} \right) = \frac{1}{2\sqrt{t}} \left( (1 + 2\sqrt{t})^{z+1} - (1 - 2\sqrt{t})^{z+1} \right).

By applying formula (5.4.2):

\sum_k \binom{z+1}{2k+1} \binom{z-2k}{n-k} 2^{2k+1} = [t^n] (1+t)^z \left[ \frac{(1+2\sqrt{y})^{z+1} - (1-2\sqrt{y})^{z+1}}{2\sqrt{y}} \,\Big|\, y = \frac{t}{(1+t)^2} \right] =

= [t^n] (1+t)^{z+1} \left( \frac{(1+t+2\sqrt{t})^{z+1}}{2\sqrt{t}\,(1+t)^{z+1}} - \frac{(1+t-2\sqrt{t})^{z+1}}{2\sqrt{t}\,(1+t)^{z+1}} \right) = [t^{2n+1}] (1+t)^{2z+2} = \binom{2z+2}{2n+1};

in the last but one passage, we used backwards the bisection rule, since (1 + t ± 2\sqrt{t})^{z+1} = (1 ± \sqrt{t})^{2z+2}.

We solve the following sum by using (5.4.2):

\sum_k \binom{2n-2k}{m-k} \binom{n}{k} (-2)^k = [t^m] (1+t)^{2n} \left[ (1-2y)^n \,\Big|\, y = \frac{t}{(1+t)^2} \right] = [t^m] (1+t^2)^n = \binom{n}{m/2},

where the binomial coefficient is to be taken as zero when m is odd.

5.5 Other Riordan arrays from binomial coefficients

Other Riordan arrays can be found by using the theorem in the previous section and the rule (α ≠ 0, if − is considered):

\frac{\alpha \pm \beta}{\alpha} \binom{\alpha}{\beta} = \binom{\alpha}{\beta} \pm \binom{\alpha-1}{\beta-1}.

For example we find:

\frac{2n}{n+k} \binom{n+k}{2k} = \frac{2n}{n+k} \binom{n+k}{n-k} = \binom{n+k}{n-k} + \binom{n+k-1}{n-k-1} = \binom{n+k}{2k} + \binom{n-1+k}{2k}.

Hence, by formula (5.4.1), we have:

\sum_k \frac{2n}{n+k} \binom{n+k}{n-k} f_k = \sum_k \binom{n+k}{2k} f_k + \sum_k \binom{n-1+k}{2k} f_k =

= [t^n] \frac{1}{1-t} f\!\left( \frac{t}{(1-t)^2} \right) + [t^{n-1}] \frac{1}{1-t} f\!\left( \frac{t}{(1-t)^2} \right) = [t^n] \frac{1+t}{1-t} f\!\left( \frac{t}{(1-t)^2} \right).


This proves that the infinite triangle of the elements \frac{2n}{n+k}\binom{n+k}{2k} is a proper Riordan array, and many identities can be proved by means of the previous formula. For example:

\sum_k \frac{2n}{n+k} \binom{n+k}{n-k} \binom{2k}{k} (-1)^k = [t^n] \frac{1+t}{1-t} \left[ \frac{1}{\sqrt{1+4y}} \,\Big|\, y = \frac{t}{(1-t)^2} \right] = [t^n] \, 1 = \delta_{n,0},

\sum_k \frac{2n}{n+k} \binom{n+k}{n-k} \binom{2k}{k} \frac{(-1)^k}{k+1} = [t^n] \frac{1+t}{1-t} \left[ \frac{\sqrt{1+4y} - 1}{2y} \,\Big|\, y = \frac{t}{(1-t)^2} \right] = [t^n] (1+t) = \delta_{n,0} + \delta_{n,1}.

The following is a quite different case. Let f(t) = G(f_k) and:

G(t) = G\!\left( \frac{f_k}{k} \right) = \int_0^t \frac{f(\tau) - f_0}{\tau} \, d\tau.

Obviously we have:

\binom{n-k}{k} = \frac{n-k}{k} \binom{n-k-1}{k-1},

except for k = 0, when the left-hand side is 1 and the right-hand side is not defined. By formula (5.4.1):

\sum_k \frac{n}{n-k} \binom{n-k}{k} f_k = f_0 + n \sum_{k=1}^{\infty} \binom{n-k-1}{k-1} \frac{f_k}{k} = f_0 + n \, [t^n] \, G\!\left( \frac{t^2}{1-t} \right).    (5.5.1)

This gives an immediate proof of the following formula, known as Hardy's identity:

\sum_k \frac{n}{n-k} \binom{n-k}{k} (-1)^k = 1 + n \, [t^n] \ln\frac{1-t^2}{1+t^3} = 1 + n \, [t^n] \left( \ln\frac{1}{1+t^3} - \ln\frac{1}{1-t^2} \right),

and, by extracting the coefficients of the two logarithms, the sum turns out to be periodic with the values 2, 1, -1, -2, -1, 1 according as n ≡ 0, 1, 2, 3, 4, 5 (mod 6), that is, 2 cos(nπ/3).
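The values of Hardy's identity are easy to tabulate (our own check; exact rationals avoid any rounding in the factor n/(n-k)):

```python
from fractions import Fraction
from math import comb

def hardy(n):
    return sum(Fraction(n, n - k) * comb(n - k, k) * (-1) ** k
               for k in range(n // 2 + 1))

# period 6: the values 1, -1, -2, -1, 1, 2 for n = 1, 2, 3, 4, 5, 6, ...
assert [hardy(n) for n in range(1, 13)] == [1, -1, -2, -1, 1, 2] * 2
```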

We also immediately obtain:

\sum_k \frac{1}{n-k} \binom{n-k}{k} = \frac{\varphi^n + \hat{\varphi}^n}{n},

where φ is the golden ratio and \hat{\varphi} = -\varphi^{-1}. The reader can generalize formula (5.5.1) by using the change of variable t → pt and prove other formulas.

The following one is known as Riordan's old identity:

\sum_k \frac{n}{n-k} \binom{n-k}{k} (a+b)^{n-2k} (-ab)^k = a^n + b^n,

while this is a generalization of Hardy's identity:

\sum_k \frac{n}{n-k} \binom{n-k}{k} x^{n-2k} (-1)^k = \frac{(x + \sqrt{x^2 - 4})^n + (x - \sqrt{x^2 - 4})^n}{2^n}.
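Riordan's old identity (and hence, with a + b = x and ab = 1, the generalization above) can be spot-checked numerically (our snippet):

```python
from fractions import Fraction
from math import comb

def riordan_old(n, a, b):
    return sum(Fraction(n, n - k) * comb(n - k, k)
               * (a + b) ** (n - 2 * k) * (-a * b) ** k
               for k in range(n // 2 + 1))

for a in range(1, 5):
    for b in range(1, 5):
        for n in range(1, 10):
            assert riordan_old(n, a, b) == a ** n + b ** n
```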

5.6 Binomial coefficients and the LIF

In a few cases only, the formulas of the previous sections give the desired result when the m and n in the numerator and denominator of a binomial coefficient are related between them. In fact, in that case, we have to extract the coefficient of t^n from a function depending on the same variable n (or m). This requires applying the Lagrange Inversion Formula, according to the diagonalization rule. Let us suppose we have the binomial coefficient \binom{2n-k}{n-k} and we wish to know whether it corresponds to a Riordan array or not. We have:

\binom{2n-k}{n-k} = [t^{n-k}] (1+t)^{2n-k} = [t^n] (1+t)^{2n} \left( \frac{t}{1+t} \right)^k.

The function (1+t)^{2n} cannot be assumed as the d(t) function of a Riordan array, because it varies as n varies. Therefore, let us suppose that k is fixed; we can apply the diagonalization rule with F(t) = (t/(1+t))^k and φ(t) = (1+t)^2, and try to find a true generating function. We have to solve the equation:

w = t φ(w)    or    w = t (1+w)^2.

This equation is t w^2 - (1-2t) w + t = 0, and we are looking for the unique solution w = w(t) such that w(0) = 0. This is:

w(t) = \frac{1 - 2t - \sqrt{1-4t}}{2t}.

We now perform the necessary computations:

F(w) = \left( \frac{w}{1+w} \right)^k = \left( \frac{1 - 2t - \sqrt{1-4t}}{1 - \sqrt{1-4t}} \right)^k = \left( \frac{1 - \sqrt{1-4t}}{2} \right)^k;

furthermore:

\frac{1}{1 - t φ'(w)} = \frac{1}{1 - 2t(1+w)} = \frac{1}{\sqrt{1-4t}}.

Therefore, the diagonalization gives:

\binom{2n-k}{n-k} = [t^n] \frac{1}{\sqrt{1-4t}} \left( \frac{1 - \sqrt{1-4t}}{2} \right)^k.

This shows that the binomial coefficient is the generic element of the Riordan array:

D = \left( \frac{1}{\sqrt{1-4t}}, \; \frac{1 - \sqrt{1-4t}}{2t} \right).

As a check, we observe that column 0 contains all the elements with k = 0, i.e., \binom{2n}{n}, and this is in accordance with the generating function d(t) = 1/\sqrt{1-4t}.

A simple example is:

\sum_{k=0}^{n} \binom{2n-k}{n-k} 2^k = [t^n] \frac{1}{\sqrt{1-4t}} \left[ \frac{1}{1-2y} \,\Big|\, y = \frac{1 - \sqrt{1-4t}}{2} \right] = [t^n] \frac{1}{\sqrt{1-4t}} \cdot \frac{1}{\sqrt{1-4t}} = [t^n] \frac{1}{1-4t} = 4^n.
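A direct check (ours, not the book's):

```python
from math import comb

for n in range(15):
    assert sum(comb(2 * n - k, n - k) * 2 ** k for k in range(n + 1)) == 4 ** n
```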

By using the diagonalization rule as above, we can show that:

\left( \binom{2n+ak}{n-ck} \right)_{n,k∈N} = \left( \frac{1}{\sqrt{1-4t}}, \; t^{c-1} \left( \frac{1 - \sqrt{1-4t}}{2t} \right)^{a+2c} \right).

An interesting example is given by the following alternating sum:

\sum_k \binom{2n}{n-3k} (-1)^k = [t^n] \frac{1}{\sqrt{1-4t}} \left[ \frac{1}{1+y} \,\Big|\, y = t^3 \left( \frac{1 - \sqrt{1-4t}}{2t} \right)^6 \right] =

= [t^n] \left( \frac{1}{2\sqrt{1-4t}} + \frac{1-t}{2(1-3t)} \right) = \frac{1}{2} \binom{2n}{n} + 3^{n-1} + \frac{\delta_{n,0}}{6}.
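The closed form above, including the 3^{n-1} and δ_{n,0}/6 corrections, checks out numerically (our snippet, with exact rationals so that the n = 0 case 1/2 + 1/3 + 1/6 = 1 comes out right):

```python
from fractions import Fraction
from math import comb

for n in range(12):
    s = sum(comb(2 * n, n - 3 * k) * (-1) ** k for k in range(n // 3 + 1))
    rhs = Fraction(comb(2 * n, n), 2) + Fraction(3) ** (n - 1) + Fraction(n == 0, 6)
    assert s == rhs
```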

The reader is invited to solve, in a similar way, the corresponding non-alternating sum.

In the same way we can deal with binomial coefficients of the form \binom{pn+ak}{n-ck}, but in this case, in order to apply the LIF, we have to solve an equation of degree p > 2. This creates many difficulties, and we do not insist on it any longer.

5.7 Coloured walks

In the section "Walks, trees and Catalan numbers" we introduced the concept of a walk or path on the integral lattice Z^2. The concept can be generalized by defining a walk as a sequence of steps starting from the origin and composed by three kinds of steps:

1. east steps, which go from the point (x, y) to (x+1, y);

2. diagonal steps, which go from the point (x, y) to (x+1, y+1);

3. north steps, which go from the point (x, y) to (x, y+1).

A colored walk is a walk in which every kind of step can assume different colors; we denote by a, b, c (a > 0; b, c ≥ 0) the number of colors the east, diagonal and north steps can assume. We discuss complete colored walks, i.e., walks without any restriction, and underdiagonal walks, i.e., walks that never go above the main diagonal x - y = 0. The length of a walk is the number of its steps, and we denote by d_{n,k} the number of colored walks which have length n and reach a distance k from the main diagonal, i.e., the last step ends on the diagonal x - y = k ≥ 0. A colored walk problem is any (counting) problem corresponding to colored walks; a problem is called symmetric if and only if a = c.

We wish to point out that our considerations are by no means limited to walks on the integral lattice. Many combinatorial problems can be proved to be equivalent to walk problems; bracketing problems are a typical example, and, in fact, a vast literature exists on walk problems.

Let us consider $d_{n+1,k+1}$, i.e., the number of colored walks of length $n+1$ reaching distance $k+1$ from the main diagonal. We observe that each such walk is obtained in a unique way as:

1. a walk of length $n$ reaching distance $k$ from the main diagonal, followed by any of the $a$ east steps;

2. a walk of length $n$ reaching distance $k+1$ from the main diagonal, followed by any of the $b$ diagonal steps;

3. a walk of length $n$ reaching distance $k+2$ from the main diagonal, followed by any of the $c$ north steps.

Hence we have: $d_{n+1,k+1} = a\,d_{n,k} + b\,d_{n,k+1} + c\,d_{n,k+2}$.

This proves that $A = \{a, b, c\}$ is the A-sequence of $(d_{n,k})_{n,k\in\mathbb{N}}$, which therefore is a proper Riordan array. This significant fact can be stated as:


Theorem 5.7.1 Let $d_{n,k}$ be the number of colored walks of length $n$ reaching a distance $k$ from the main diagonal; then the infinite triangle $(d_{n,k})_{n,k\in\mathbb{N}}$ is a proper Riordan array.

The Pascal, Catalan and Motzkin triangles define walking problems that have different values of $a, b, c$. When $c = 0$, it is easily proved that $d_{n,k} = \binom{n}{k}a^k b^{n-k}$, and so we end up with the Pascal triangle. Consequently, we assume $c \ne 0$. For any given triple $(a, b, c)$ we obtain one type of array from complete walks and another from underdiagonal walks. However, the function $h(t)$, which only depends on the A-sequence, is the same in both cases, and we can find it by means of formula (5.3.2). In fact, $A(t) = a + bt + ct^2$, and $h(t)$ is the solution of the functional equation $h(t) = a + bt\,h(t) + ct^2 h(t)^2$ having $h(0) = a$ (the solution that remains finite at $t = 0$):

$$h(t) = \frac{1 - bt - \sqrt{1 - 2bt + b^2t^2 - 4act^2}}{2ct^2} \qquad (5.7.1)$$

The radicand $1 - 2bt + (b^2 - 4ac)t^2 = (1 - (b + 2\sqrt{ac})t)(1 - (b - 2\sqrt{ac})t)$ will be simply denoted by $\Delta$.

Let us now focus our attention on underdiagonal walks. If we consider $d_{n+1,0}$, we observe that every walk returning to the main diagonal can only be obtained from another walk returning to the main diagonal followed by any diagonal step, or from a walk ending at distance 1 from the main diagonal followed by any north step. Hence we have $d_{n+1,0} = b\,d_{n,0} + c\,d_{n,1}$, and in terms of column generating functions this corresponds to $d(t) - 1 = bt\,d(t) + ct\,d(t)\,th(t)$. From this relation we easily find $d(t) = (1/a)h(t)$, and therefore, by (5.7.1), the Riordan array of underdiagonal colored walks is:

$$(d_{n,k})_{n,k\in\mathbb{N}} = \left(\frac{1 - bt - \sqrt{\Delta}}{2act^2},\ \frac{1 - bt - \sqrt{\Delta}}{2ct^2}\right).$$
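The recurrence $d_{n+1,k+1} = a\,d_{n,k} + b\,d_{n,k+1} + c\,d_{n,k+2}$, together with the boundary condition $d_{n+1,0} = b\,d_{n,0} + c\,d_{n,1}$, determines the whole underdiagonal triangle, so it can be checked by dynamic programming. A minimal sketch in plain Python (`walk_triangle` is our own name): with $a = b = c = 1$ column 0 should reproduce the Motzkin numbers, and with $a = c = 1$, $b = 0$ the even rows of column 0 should give the Catalan numbers.

```python
def walk_triangle(a, b, c, rows):
    # d[n][k]: underdiagonal colored walks of length n ending at distance k
    d = [[1]]
    for n in range(rows - 1):
        prev = d[-1] + [0, 0]                 # pad so k+1, k+2 always exist
        new = [b * prev[0] + c * prev[1]]     # boundary: d_{n+1,0}
        for k in range(n + 1):                # recurrence: d_{n+1,k+1}
            new.append(a * prev[k] + b * prev[k + 1] + c * prev[k + 2])
        d.append(new)
    return d

print([row[0] for row in walk_triangle(1, 1, 1, 10)])
# [1, 1, 2, 4, 9, 21, 51, 127, 323, 835]
```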

In the current literature, major importance is usually given to the following three quantities:

1. the number of walks returning to the main diagonal; this is $d_n = [t^n]d(t)$, for every $n$;

2. the total number of walks of length $n$; this is $\alpha_n = \sum_{k=0}^{n} d_{n,k}$, i.e., the value of the row sums of the Riordan array;

3. the average distance from the main diagonal of all the walks of length $n$; this is $\delta_n/\alpha_n$, where $\delta_n = \sum_{k=0}^{n} k\,d_{n,k}$ is the weighted row sum of the Riordan array.

In Chapter 7 we will learn how to find an asymptotic approximation for $d_n$. With regard to the last two points, the formulas for the row sums and the weighted row sums given in the first section allow us to find the generating functions $\alpha(t)$ of the total number $\alpha_n$ of underdiagonal walks of length $n$, and $\delta(t)$ of the total distance $\delta_n$ of these walks from the main diagonal:

$$\alpha(t) = \frac{1}{2at}\,\frac{1 - (b+2a)t - \sqrt{\Delta}}{(a+b+c)t - 1}$$

$$\delta(t) = \frac{1}{4at}\left(\frac{1 - (b+2a)t - \sqrt{\Delta}}{(a+b+c)t - 1}\right)^2.$$

In the symmetric case these formulas simplify as follows:

$$\alpha(t) = \frac{1}{2at}\left(\sqrt{\frac{1 - (b-2a)t}{1 - (b+2a)t}} - 1\right)$$

$$\delta(t) = \frac{1}{2at}\left(\frac{1 - bt}{1 - (b+2a)t} - \sqrt{\frac{1 - (b-2a)t}{1 - (b+2a)t}}\right).$$

The alternating row sums and the diagonal sums

sometimes have some combinatorial signiﬁcance as

well, and so they can be treated in the same way.

The study of complete walks follows the same lines, and we only have to derive the form of the corresponding Riordan array, which is:

$$(d_{n,k})_{n,k\in\mathbb{N}} = \left(\frac{1}{\sqrt{\Delta}},\ \frac{1 - bt - \sqrt{\Delta}}{2ct^2}\right).$$

The proof is as follows. Since a complete walk can go above the main diagonal, the array $(d_{n,k})_{n,k\in\mathbb{N}}$ is only the right part of an infinite triangle, in which $k$ can also assume negative values. By following the logic of the theorem above, we see that the generating function of the $n$th row is $\left(\frac{c}{w} + b + aw\right)^n$, and therefore the bivariate generating function of the extended triangle is:

$$d(t, w) = \sum_{n}\left(\frac{c}{w} + b + aw\right)^n t^n = \frac{1}{1 - (aw + b + c/w)t}.$$

If we expand this expression by partial fractions, we get:

$$d(t, w) = \frac{1}{\sqrt{\Delta}}\left(\frac{1}{1 - \frac{1-bt-\sqrt{\Delta}}{2ct}\,w} - \frac{1}{1 - \frac{1-bt+\sqrt{\Delta}}{2ct}\,w}\right) = \frac{1}{\sqrt{\Delta}}\left(\frac{1}{1 - \frac{1-bt-\sqrt{\Delta}}{2ct}\,w} + \frac{1-bt-\sqrt{\Delta}}{2at}\cdot\frac{1}{w}\cdot\frac{1}{1 - \frac{1-bt-\sqrt{\Delta}}{2at}\cdot\frac{1}{w}}\right).$$

64 CHAPTER 5. RIORDAN ARRAYS

The first term represents the right part of the extended triangle, corresponding to $k \ge 0$, whereas the second term corresponds to the left part ($k < 0$). We are interested in the right part, whose expression can be written as:

$$\frac{1}{\sqrt{\Delta}}\,\frac{1}{1 - \frac{1-bt-\sqrt{\Delta}}{2ct}\,w} = \frac{1}{\sqrt{\Delta}}\sum_{k}\left(\frac{1-bt-\sqrt{\Delta}}{2ct}\right)^k w^k,$$

which immediately gives the form of the Riordan array.

5.8 Stirling numbers and Riordan arrays

The connection between Riordan arrays and Stirling numbers is not immediate. If we examine the two infinite triangles of the Stirling numbers of both kinds, we immediately realize that they are not Riordan arrays. It is not difficult to obtain the column generating functions for the Stirling numbers of the second kind; we start with the recurrence relation:

$${n+1 \brace k+1} = (k+1){n \brace k+1} + {n \brace k}$$

and the obvious generating function $S_0(t) = 1$, and we can specialize the recurrence (valid for every $n \in \mathbb{N}$) to the case $k = 0$. This gives the relation between generating functions:

$$\frac{S_1(t) - S_1(0)}{t} = S_1(t) + S_0(t);$$

because $S_1(0) = 0$, we immediately obtain $S_1(t) = t/(1-t)$, which is easily checked by looking at column 1 in the array. In a similar way, by specializing the recurrence relation to $k = 1$, we find $S_2(t) = 2tS_2(t) + tS_1(t)$, whose solution is:

$$S_2(t) = \frac{t^2}{(1-t)(1-2t)}.$$

This proves, in an algebraic way, that ${n \brace 2} = 2^{n-1} - 1$, and also indicates the form of the generating function for column $m$:

$$S_m(t) = \mathcal{G}\left({n \brace m}\right)_{n\in\mathbb{N}} = \frac{t^m}{(1-t)(1-2t)\cdots(1-mt)},$$

which is now proved by induction when we specialize the recurrence relation above to $k = m$. This is left to the reader as a simple exercise.
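The closed form for column 2 can be checked directly from the recurrence. A small sketch in plain Python (`stirling2_triangle` is our own name, not from the book):

```python
def stirling2_triangle(N):
    # S[n][k] built from S(n+1, k+1) = (k+1) S(n, k+1) + S(n, k)
    S = [[1]]
    for n in range(N):
        prev = S[-1] + [0]
        S.append([0] + [(k + 1) * prev[k + 1] + prev[k] for k in range(n + 1)])
    return S

S = stirling2_triangle(12)
print(all(S[n][2] == 2 ** (n - 1) - 1 for n in range(2, 13)))  # True
```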

The generating functions for the Stirling numbers of the first kind are not so simple. However, let us go on with the Stirling numbers of the second kind, proceeding in the following way: if we multiply the recurrence relation by $(k+1)!/(n+1)!$, we obtain the new relation:

$$\frac{(k+1)!}{(n+1)!}{n+1 \brace k+1} = \frac{(k+1)!}{n!}{n \brace k+1}\,\frac{k+1}{n+1} + \frac{k!}{n!}{n \brace k}\,\frac{k+1}{n+1}.$$

If we denote by $d_{n,k}$ the quantity $k!{n \brace k}/n!$, this is a recurrence relation for $d_{n,k}$, which can be written as:

$$(n+1)\,d_{n+1,k+1} = (k+1)\,d_{n,k+1} + (k+1)\,d_{n,k}.$$

Let us now proceed as above and find the column generating functions for the new array $(d_{n,k})_{n,k\in\mathbb{N}}$. Obviously, $d_0(t) = 1$; by setting $k = 0$ in the new recurrence:

$$(n+1)\,d_{n+1,1} = d_{n,1} + d_{n,0}$$

and passing to generating functions: $d_1'(t) = d_1(t) + 1$. The solution of this simple differential equation is $d_1(t) = e^t - 1$ (the reader can simply check this solution, if he or she prefers). We can now go on by setting $k = 1$ in the recurrence; we obtain $(n+1)\,d_{n+1,2} = 2d_{n,2} + 2d_{n,1}$, or $d_2'(t) = 2d_2(t) + 2(e^t - 1)$. Again, this differential equation has the solution $d_2(t) = (e^t - 1)^2$, and this suggests that, in general, we have $d_k(t) = (e^t - 1)^k$. A rigorous proof of this fact can be obtained by mathematical induction; the recurrence relation gives $d_{k+1}'(t) = (k+1)\,d_{k+1}(t) + (k+1)\,d_k(t)$. By the induction hypothesis, we can substitute $d_k(t) = (e^t - 1)^k$ and solve the differential equation thus obtained. In practice, we can simply verify that $d_{k+1}(t) = (e^t - 1)^{k+1}$; by substituting, we have:

$$(k+1)\,e^t(e^t - 1)^k = (k+1)(e^t - 1)^{k+1} + (k+1)(e^t - 1)^k$$

and this equality is obviously true, since $e^t = (e^t - 1) + 1$.

The form of this generating function:

$$d_k(t) = \mathcal{G}\left(\frac{k!}{n!}{n \brace k}\right)_{n\in\mathbb{N}} = (e^t - 1)^k$$

proves that $(d_{n,k})_{n,k\in\mathbb{N}}$ is a Riordan array having $d(t) = 1$ and $th(t) = e^t - 1$. This fact allows us to prove algebraically a lot of identities concerning the Stirling numbers of the second kind, as we shall see in the next section.
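Expanding $(e^t-1)^k$ by the binomial theorem gives $n!\,[t^n](e^t-1)^k = \sum_j (-1)^{k-j}\binom{k}{j}j^n$, which must therefore equal $k!{n \brace k}$. A quick cross-check in plain Python (helper names are ours, not the book's):

```python
from math import comb, factorial

def stirling2(n, k):
    # second-kind recurrence: S(n,k) = k S(n-1,k) + S(n-1,k-1)
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def egf_coeff(n, k):
    # n! [t^n] (e^t - 1)^k, via (e^t - 1)^k = sum_j C(k,j) (-1)^(k-j) e^(jt)
    return sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))

print(all(egf_coeff(n, k) == factorial(k) * stirling2(n, k)
          for n in range(10) for k in range(n + 1)))  # True
```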

For the Stirling numbers of the first kind we proceed in an analogous way. We multiply the basic recurrence:

$${n+1 \brack k+1} = n{n \brack k+1} + {n \brack k}$$

by $(k+1)!/(n+1)!$ and study the quantity $f_{n,k} = k!{n \brack k}/n!$:

$$\frac{(k+1)!}{(n+1)!}{n+1 \brack k+1} = \frac{(k+1)!}{n!}{n \brack k+1}\,\frac{n}{n+1} + \frac{k!}{n!}{n \brack k}\,\frac{k+1}{n+1},$$

that is:

$$(n+1)\,f_{n+1,k+1} = n\,f_{n,k+1} + (k+1)\,f_{n,k}.$$

In this case also we have $f_0(t) = 1$, and by specializing the last relation to the case $k = 0$ we obtain:

$$f_1'(t) = t f_1'(t) + f_0(t).$$

This is equivalent to $f_1'(t) = 1/(1-t)$, and because $f_1(0) = 0$ we have:

$$f_1(t) = \ln\frac{1}{1-t}.$$

By setting $k = 1$, we find the simple differential equation $f_2'(t) = t f_2'(t) + 2f_1(t)$, whose solution is:

$$f_2(t) = \left(\ln\frac{1}{1-t}\right)^2.$$

This suggests the general formula:

$$f_k(t) = \mathcal{G}\left(\frac{k!}{n!}{n \brack k}\right)_{n\in\mathbb{N}} = \left(\ln\frac{1}{1-t}\right)^k$$

and again this can be proved by induction. In this case, $(f_{n,k})_{n,k\in\mathbb{N}}$ is the Riordan array having $d(t) = 1$ and $th(t) = \ln(1/(1-t))$.

5.9 Identities involving the Stirling numbers

The two recurrence relations for $d_{n,k}$ and $f_{n,k}$ do not give immediate evidence that the two triangles are indeed Riordan arrays, because they do not correspond to A-sequences. However, the A-sequences for the two arrays can easily be found, once we know their $h(t)$ function. For the Stirling numbers of the first kind we have to solve the functional equation:

$$\ln\frac{1}{1-t} = t\,A\!\left(\ln\frac{1}{1-t}\right).$$

By setting $y = \ln(1/(1-t))$, or $t = (e^y - 1)/e^y$, we have $A(y) = ye^y/(e^y - 1)$, and this is the generating function for the A-sequence we were looking for. In a similar way, we find that the A-sequence for the triangle related to the Stirling numbers of the second kind is:

$$A(t) = \frac{t}{\ln(1+t)}.$$

A first result we obtain by using the correspondence between Stirling numbers and Riordan arrays concerns the row sums of the two triangles. For the Stirling numbers of the first kind we have:

$$\sum_{k=0}^{n}{n \brack k} = n!\sum_{k=0}^{n}\frac{k!}{n!}{n \brack k}\frac{1}{k!} = n!\,[t^n]\left[e^y\ \middle|\ y = \ln\frac{1}{1-t}\right] = n!\,[t^n]\frac{1}{1-t} = n!$$

as we observed and proved in a combinatorial way.

The row sums of the Stirling numbers of the second kind give, as we know, the Bell numbers; thus we can obtain the (exponential) generating function for these numbers:

$$\sum_{k=0}^{n}{n \brace k} = n!\sum_{k=0}^{n}\frac{k!}{n!}{n \brace k}\frac{1}{k!} = n!\,[t^n]\left[e^y\ \middle|\ y = e^t - 1\right] = n!\,[t^n]\exp(e^t - 1);$$

therefore we have:

$$\mathcal{G}\left(\frac{B_n}{n!}\right) = \exp(e^t - 1).$$
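From $B(t) = \exp(e^t - 1)$ we get $B'(t) = e^t B(t)$, which translates into the binomial recurrence $B_{n+1} = \sum_k \binom{n}{k} B_k$. A sketch in plain Python (our own code) checks this against the row sums of the second-kind Stirling triangle:

```python
from math import comb

# Bell numbers as row sums of the second-kind Stirling triangle
rows = [[1]]
for n in range(14):
    prev = rows[-1] + [0]
    rows.append([0] + [(k + 1) * prev[k + 1] + prev[k] for k in range(n + 1)])
bell = [sum(r) for r in rows]

# B'(t) = e^t B(t)  <=>  B_{n+1} = sum_k C(n,k) B_k
print(all(bell[n + 1] == sum(comb(n, k) * bell[k] for k in range(n + 1))
          for n in range(14)))  # True
```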

We also defined the ordered Bell numbers as $O_n = \sum_{k=0}^{n}{n \brace k}k!$; therefore we have:

$$\frac{O_n}{n!} = \sum_{k=0}^{n}\frac{k!}{n!}{n \brace k} = [t^n]\left[\frac{1}{1-y}\ \middle|\ y = e^t - 1\right] = [t^n]\frac{1}{2-e^t}.$$

We have thus obtained the exponential generating function:

$$\mathcal{G}\left(\frac{O_n}{n!}\right) = \frac{1}{2-e^t}.$$

The Stirling numbers of the two kinds are related to each other in various ways. For example, we have:

$$\sum_{k}{n \brack k}{k \brace m} = \frac{n!}{m!}\sum_{k}\frac{k!}{n!}{n \brack k}\,\frac{m!}{k!}{k \brace m} = \frac{n!}{m!}[t^n]\left[(e^y - 1)^m\ \middle|\ y = \ln\frac{1}{1-t}\right] = \frac{n!}{m!}[t^n]\frac{t^m}{(1-t)^m} = \frac{n!}{m!}\binom{n-1}{m-1}.$$

Besides, two orthogonality relations exist between Stirling numbers. The first one is proved in this way:

$$\sum_{k}{n \brack k}{k \brace m}(-1)^{n-k} = (-1)^n\frac{n!}{m!}\sum_{k}\frac{k!}{n!}{n \brack k}\,\frac{m!}{k!}{k \brace m}(-1)^k = (-1)^n\frac{n!}{m!}[t^n]\left[(e^{-y} - 1)^m\ \middle|\ y = \ln\frac{1}{1-t}\right] = (-1)^n\frac{n!}{m!}[t^n](-t)^m = \delta_{n,m}.$$

The second orthogonality relation is proved in a similar way and reads:

$$\sum_{k}{n \brace k}{k \brack m}(-1)^{n-k} = \delta_{n,m}.$$
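Both orthogonality relations are easy to verify numerically on the two triangles. A plain-Python sketch (function names are ours, not from the book):

```python
def stirling1(N):
    # unsigned first kind: [n+1, k+1] = n [n, k+1] + [n, k]
    c = [[1]]
    for n in range(N):
        prev = c[-1] + [0]
        c.append([0] + [n * prev[k + 1] + prev[k] for k in range(n + 1)])
    return c

def stirling2(N):
    # second kind: S(n+1, k+1) = (k+1) S(n, k+1) + S(n, k)
    s = [[1]]
    for n in range(N):
        prev = s[-1] + [0]
        s.append([0] + [(k + 1) * prev[k + 1] + prev[k] for k in range(n + 1)])
    return s

N = 9
c, s = stirling1(N), stirling2(N)
ok = all(
    sum(c[n][k] * s[k][m] * (-1) ** (n - k) for k in range(m, n + 1))
    == int(n == m)
    and sum(s[n][k] * c[k][m] * (-1) ** (n - k) for k in range(m, n + 1))
    == int(n == m)
    for n in range(N + 1) for m in range(n + 1))
print(ok)  # True
```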

We introduced Stirling numbers by means of the Stirling identities relating powers and falling factorials. We can now prove these identities by using a Riordan array approach. In fact:

$$\sum_{k=0}^{n}{n \brack k}(-1)^{n-k}x^k = (-1)^n n!\sum_{k=0}^{n}\frac{k!}{n!}{n \brack k}\,\frac{x^k}{k!}(-1)^k = (-1)^n n!\,[t^n]\left[e^{-xy}\ \middle|\ y = \ln\frac{1}{1-t}\right] = (-1)^n n!\,[t^n](1-t)^x = n!\binom{x}{n} = x^{\underline{n}}$$

and:

$$\sum_{k=0}^{n}{n \brace k}x^{\underline{k}} = n!\sum_{k=0}^{n}\frac{k!}{n!}{n \brace k}\binom{x}{k} = n!\,[t^n]\left[(1+y)^x\ \middle|\ y = e^t - 1\right] = n!\,[t^n]e^{tx} = x^n.$$
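Since both sides are polynomials in $x$, checking the second identity at enough integer points amounts to a proof. A small sketch in plain Python (our own names, not the book's):

```python
def stirling2_triangle(N):
    # second kind via S(n+1, k+1) = (k+1) S(n, k+1) + S(n, k)
    S = [[1]]
    for n in range(N):
        prev = S[-1] + [0]
        S.append([0] + [(k + 1) * prev[k + 1] + prev[k] for k in range(n + 1)])
    return S

def falling(x, k):
    # falling factorial x (x-1) ... (x-k+1)
    p = 1
    for i in range(k):
        p *= x - i
    return p

S = stirling2_triangle(7)
print(all(sum(S[n][k] * falling(x, k) for k in range(n + 1)) == x ** n
          for n in range(8) for x in range(-4, 7)))  # True
```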

We conclude this section by showing two possible connections between Stirling numbers and Bernoulli numbers. First we have:

$$\sum_{k=0}^{n}{n \brace k}\frac{(-1)^k k!}{k+1} = n!\sum_{k=0}^{n}\frac{k!}{n!}{n \brace k}\frac{(-1)^k}{k+1} = n!\,[t^n]\left[-\frac{1}{y}\ln\frac{1}{1+y}\ \middle|\ y = e^t - 1\right] = n!\,[t^n]\frac{t}{e^t - 1} = B_n$$
which proves that Bernoulli numbers can be defined in terms of the Stirling numbers of the second kind. For the Stirling numbers of the first kind we have the identity:

$$\sum_{k=0}^{n}{n \brack k}B_k = n!\sum_{k=0}^{n}\frac{k!}{n!}{n \brack k}\frac{B_k}{k!} = n!\,[t^n]\left[\frac{y}{e^y - 1}\ \middle|\ y = \ln\frac{1}{1-t}\right] = n!\,[t^n]\frac{1-t}{t}\ln\frac{1}{1-t} = n!\,[t^{n+1}](1-t)\ln\frac{1}{1-t} = -\frac{(n-1)!}{n+1}.$$

Clearly, this holds for $n > 0$; indeed, for $n > 0$ we have $[t^{n+1}](1-t)\ln\frac{1}{1-t} = \frac{1}{n+1} - \frac{1}{n} = -\frac{1}{n(n+1)}$. For $n = 0$ we have:

$$\sum_{k=0}^{n}{n \brack k}B_k = B_0 = 1.$$
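This identity, including its minus sign, can be tested with exact rational arithmetic. A plain-Python sketch (our own names; $B_1 = -1/2$, as dictated by the generating function $t/(e^t-1)$ used above):

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli(N):
    # B_n from the defining recurrence sum_{k=0}^{n} C(n+1, k) B_k = [n = 0]
    B = []
    for n in range(N + 1):
        s = sum(comb(n + 1, k) * B[k] for k in range(n))
        B.append(Fraction((1 if n == 0 else 0) - s, n + 1))
    return B

def stirling1(N):
    # unsigned first kind: [n+1, k+1] = n [n, k+1] + [n, k]
    c = [[1]]
    for n in range(N):
        prev = c[-1] + [0]
        c.append([0] + [n * prev[k + 1] + prev[k] for k in range(n + 1)])
    return c

B, c = bernoulli(10), stirling1(10)
print(all(sum(c[n][k] * B[k] for k in range(n + 1))
          == Fraction(-factorial(n - 1), n + 1) for n in range(1, 11)))  # True
```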

Chapter 6

Formal methods

6.1 Formal languages

During the 1950's, the linguist Noam Chomsky introduced the concept of a formal language. Several definitions have to be provided before a precise statement of the concept can be given. Therefore, let us proceed in the following way.

First, we recall the definitions given in Section 2.1. An alphabet is a finite set $A = \{a_1, a_2, \ldots, a_n\}$, whose elements are called symbols or letters. A word on $A$ is a finite sequence of symbols of $A$; the sequence is written by juxtaposing the symbols, and therefore a word $w$ is denoted by $w = a_{i_1}a_{i_2}\cdots a_{i_r}$, where $r = |w|$ is the length of the word. The empty sequence is called the empty word and is conventionally denoted by $\epsilon$; its length is obviously 0, and it is the only word of length 0.

The set of all the words on $A$, the empty word included, is denoted by $A^*$, and by $A^+$ if the empty word is excluded. Algebraically, $A^*$ is the free monoid on $A$, that is, the monoid freely generated by the symbols in $A$. To understand this point, let us consider the operation of juxtaposition and recursively apply it starting with the symbols in $A$. What we get are the words on $A$, and juxtaposition can be seen as an operation between them. The algebraic structure thus obtained has the following properties:

1. associativity: $w_1(w_2w_3) = (w_1w_2)w_3 = w_1w_2w_3$;

2. $\epsilon$ is the identity or neutral element: $\epsilon w = w\epsilon = w$.

Such a structure is called a monoid and, by construction, it has been generated by combining the symbols in $A$ in all possible ways. Because of that, $(A^*, \cdot)$, if $\cdot$ denotes juxtaposition, is called the "free monoid" generated by $A$. Observe that a monoid is an algebraic structure more general than a group; a group is a monoid in which every element also has an inverse.

If $w \in A^*$ and $z$ is a word such that $w$ can be decomposed as $w = w_1zw_2$ ($w_1$ and/or $w_2$ possibly empty), we say that $z$ is a subword of $w$; we also say that $z$ occurs in $w$, and the particular instance of $z$ in $w$ is called an occurrence of $z$ in $w$. Observe that if $z$ is a subword of $w$, it can have more than one occurrence in $w$. If $w = zw_2$, we say that $z$ is a head or prefix of $w$, and if $w = w_1z$, we say that $z$ is a tail or suffix of $w$. Finally, a language on $A$ is any subset $L \subseteq A^*$.

The basic definition concerning formal languages is the following: a grammar is a 4-tuple $G = (T, N, \sigma, \mathcal{P})$, where:

• $T = \{a_1, a_2, \ldots, a_n\}$ is the alphabet of terminal symbols;

• $N = \{\phi_1, \phi_2, \ldots, \phi_m\}$ is the alphabet of non-terminal symbols;

• $\sigma \in N$ is the initial symbol;

• $\mathcal{P}$ is a finite set of productions.

Usually, the symbols in $T$ are denoted by lower-case Latin letters, and the symbols in $N$ by Greek letters or by upper-case Latin letters. A production is a pair $(z_1, z_2)$ of words on $T \cup N$, such that $z_1$ contains at least one symbol in $N$; the production is often indicated by $z_1 \to z_2$. If $w \in (T \cup N)^*$, we can apply a production $z_1 \to z_2 \in \mathcal{P}$ to $w$ whenever $w$ can be decomposed as $w = w_1z_1w_2$, and the result is the new word $w_1z_2w_2 \in (T \cup N)^*$; we will write $w = w_1z_1w_2 \vdash w_1z_2w_2$ when $w_1z_1w_2$ is the decomposition of $w$ in which $z_1$ is the leftmost occurrence of $z_1$ in $w$; in other words, if we also have $w = \hat{w}_1z_1\hat{w}_2$, then $|w_1| < |\hat{w}_1|$.

Given a grammar $G = (T, N, \sigma, \mathcal{P})$, we define the relation $w \vdash \hat{w}$ between words $w, \hat{w} \in (T \cup N)^*$: the relation holds if and only if a production $z_1 \to z_2 \in \mathcal{P}$ exists such that $z_1$ occurs in $w$, $w = w_1z_1w_2$ is the leftmost occurrence of $z_1$ in $w$, and $\hat{w} = w_1z_2w_2$. We also denote by $\vdash^*$ the transitive closure of $\vdash$ and call it generation or derivation; this means that $w \vdash^* \hat{w}$ if and only if a sequence $(w = w_1, w_2, \ldots, w_s = \hat{w})$ exists such that $w_1 \vdash w_2$, $w_2 \vdash w_3$, $\ldots$, $w_{s-1} \vdash w_s$.

We observe explicitly that, by our condition that in every production $z_1 \to z_2$ the word $z_1$ should contain at least one symbol in $N$, if a word $w_i \in T^*$ is produced during a generation, it is terminal, i.e., the generation must stop. By collecting all these definitions, we finally define the language generated by the grammar $G$ as the set:

$$L(G) = \{\, w \in T^* \mid \sigma \vdash^* w \,\}$$

i.e., a word $w \in T^*$ is in $L(G)$ if and only if we can generate it by starting with the initial symbol $\sigma$ and applying the productions in $\mathcal{P}$ until $w$ is generated. At that moment, the generation stops. Note that, sometimes, the generation can go on forever, never generating a word on $T$; however, this is not a problem: it only means that such generations should be ignored.

6.2 Context-free languages

The definition of a formal language is quite general, and it is possible to show that formal languages coincide with the class of "partially recursive sets", the largest class of sets which can be constructed recursively, i.e., in finite terms. This means that we can give rules to build such sets (e.g., we can give a grammar for them), but their construction can go on forever; so, looking at them from another point of view, if we wish to know whether a word $w$ belongs to such a set $S$, we can be unlucky, and an infinite process can be necessary to decide whether $w \in S$.

Because of that, people have studied more restricted classes of languages, for which a finite process is always possible for finding out whether or not $w$ belongs to the language. Surely, the most important class of this kind is the class of "context-free languages". They are defined in the following way. A context-free grammar is a grammar $G = (T, N, \sigma, \mathcal{P})$ in which all the productions $z_1 \to z_2$ in $\mathcal{P}$ are such that $z_1 \in N$. The name "context-free" derives from this definition, because a production $z_1 \to z_2$ is applied whenever the non-terminal symbol $z_1$ is the leftmost non-terminal symbol in a word, irrespective of the context in which it appears.

As a very simple example, let us consider the following grammar. Let $T = \{a, b\}$ and $N = \{\sigma\}$; $\sigma$ is the initial symbol of the grammar, being the only non-terminal. The set $\mathcal{P}$ is composed of the two productions:

$$\sigma \to \epsilon \qquad \sigma \to a\sigma b\sigma.$$

This grammar is called the Dyck grammar, and the language generated by it the Dyck language. In Figure 6.1 we draw the generation of some words in the Dyck language. The recursive nature of the productions allows us to prove properties of the Dyck language by means of mathematical induction:

Theorem 6.2.1 A word $w \in \{a, b\}^*$ belongs to the Dyck language $D$ if and only if:

i) the number of $a$'s in $w$ equals the number of $b$'s;

ii) in every prefix $z$ of $w$ the number of $a$'s is not less than the number of $b$'s.

Proof: Let $w \in D$; if $w = \epsilon$, nothing has to be proved. Otherwise, $w$ is generated by the second production and $w = aw_1bw_2$ with $w_1, w_2 \in D$; therefore, if we suppose that i) holds for $w_1$ and $w_2$, it also holds for $w$. For ii), any prefix $z$ of $w$ must have one of the forms: $a$; $az_1$, where $z_1$ is a prefix of $w_1$; $aw_1b$; or $aw_1bz_2$, where $z_2$ is a prefix of $w_2$. By the induction hypothesis, ii) holds for $z_1$ and $z_2$, and therefore it is easily proved for $w$. Vice versa, let us suppose that i) and ii) hold for $w \in T^*$. If $w \ne \epsilon$, then by ii) $w$ must begin with $a$. Let us scan $w$ until we find the first occurrence of the symbol $b$ such that $w = aw_1bw_2$ and in $w_1$ the number of $b$'s equals the number of $a$'s. By i) such an occurrence of $b$ must exist, and consequently $w_1$ and $w_2$ must satisfy condition i). Besides, if $w_1$ and $w_2$ are not empty, they satisfy condition ii), by the very construction of $w_1$ and by the fact that $w$ satisfies condition ii) by hypothesis. We have thus obtained a decomposition of $w$ showing that the second production has been used. This completes the proof.

If we substitute the letter $a$ with the symbol '(' and the letter $b$ with the symbol ')', the theorem shows that the words in the Dyck language are the possible parenthetizations of an expression. Therefore, the number of Dyck words with $n$ pairs of parentheses is the Catalan number $\binom{2n}{n}/(n+1)$. We will see how this result can also be obtained by starting with the definition of the Dyck language and applying a suitable and mechanical method, known as the Schützenberger methodology or symbolic method. The method can be applied to every set of objects defined through a non-ambiguous context-free grammar.
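Theorem 6.2.1 also yields a brute-force count: enumerating all words in $\{a,b\}^{2n}$ and keeping those satisfying i) and ii) must give the Catalan numbers. A small sketch in plain Python (our own names, not from the book):

```python
from itertools import product
from math import comb

def is_dyck(word):
    # conditions i) and ii) of Theorem 6.2.1
    bal = 0
    for ch in word:
        bal += 1 if ch == 'a' else -1
        if bal < 0:          # a prefix with more b's than a's
            return False
    return bal == 0          # as many a's as b's overall

def dyck_count(n):
    return sum(is_dyck(w) for w in product('ab', repeat=2 * n))

print(all(dyck_count(n) == comb(2 * n, n) // (n + 1) for n in range(7)))  # True
```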

A context-free grammar $G$ is ambiguous iff there exists a word $w \in L(G)$ which can be generated by two different leftmost derivations. In other words, a context-free grammar $H$ is non-ambiguous iff every word $w \in L(H)$ can be generated in one and only one way. An example of an ambiguous grammar is $G = (T, N, \sigma, \mathcal{P})$, where $T = \{1\}$, $N = \{\sigma\}$ and $\mathcal{P}$ contains the two productions:

$$\sigma \to 1 \qquad \sigma \to \sigma\sigma.$$

For example, the word 111 is generated by the two following leftmost derivations:

$$\sigma \vdash \sigma\sigma \vdash 1\sigma \vdash 1\sigma\sigma \vdash 11\sigma \vdash 111$$
$$\sigma \vdash \sigma\sigma \vdash \sigma\sigma\sigma \vdash 1\sigma\sigma \vdash 11\sigma \vdash 111.$$


Figure 6.1 shows the generation tree of some Dyck words; its first levels are:

σ
ε — aσbσ
abσ — aaσbσbσ
ab — abaσbσ — aabσbσ — aaaσbσbσbσ
ababσ — abaaσbσbσ — aabbσ — aabaσbσbσ
abab — ababaσbσ — aabb — aabbaσbσ

Figure 6.1: The generation of some Dyck words

Instead, the Dyck grammar is non-ambiguous; in fact, as we have shown in the proof of the previous theorem, given any word $w \in D$, $w \ne \epsilon$, there is only one decomposition $w = aw_1bw_2$ having $w_1, w_2 \in D$; therefore, $w$ can only be generated in a single way. In general, if we show that any word in a context-free language $L(G)$, generated by some grammar $G$, has a unique decomposition according to the productions in $G$, then the grammar cannot be ambiguous. Because of the connection between the Schützenberger methodology and non-ambiguous context-free grammars, we are mainly interested in this kind of grammar. For the sake of completeness, a context-free language is called intrinsically ambiguous iff every context-free grammar generating it is ambiguous. This definition stresses the fact that, if a language is generated by an ambiguous grammar, it can also be generated by some non-ambiguous grammar, unless it is intrinsically ambiguous. It is possible to show that intrinsically ambiguous languages actually exist; fortunately, they are not very frequent. For example, the language generated by the previous ambiguous grammar is $\{1\}^+$, i.e., the set of all the words composed of any sequence of 1's, except the empty word. Actually, it is not an ambiguous language, and a non-ambiguous grammar generating it is given by the same $T, N, \sigma$ and the two productions:

$$\sigma \to 1 \qquad \sigma \to 1\sigma.$$

It is a simple matter to show that every word $11\ldots1$ can be uniquely decomposed according to these productions.

6.3 Formal languages and programming languages

In 1960, the formal definition of the programming language ALGOL'60 was published. ALGOL'60 has surely been the most influential programming language ever created, although it was actually used only by a very limited number of programmers. Most of the concepts we now find in programming languages were introduced by ALGOL'60, of which, for example, PASCAL and C are direct derivations. Here we are not interested in these aspects of ALGOL'60, but we wish to spend some words on how ALGOL'60 used context-free grammars to define its syntax in a formal and precise way. In practice, a program in ALGOL'60 is a word generated by a (rather complex) context-free grammar, whose initial symbol is ⟨program⟩.

The ALGOL'60 grammar used, as its terminal symbol alphabet, the characters available on the standard keyboard of a computer; actually, they were the characters punchable on a card, the input medium used at that time to introduce a program into the computer. The non-terminal symbol notation was one of the most appealing inventions of ALGOL'60: the symbols were composed of entire English sentences enclosed by the two special parentheses ⟨ and ⟩. This allowed the intended meaning of the non-terminal symbols to be expressed clearly. The previous example concerning ⟨program⟩ surely makes sense. Another technical device used by ALGOL'60 was the compaction of productions: if there were several productions with the same left-hand symbol, $\beta \to w_1$, $\beta \to w_2$, $\ldots$, $\beta \to w_k$,


they were written as a single rule:

$$\beta ::= w_1 \mid w_2 \mid \cdots \mid w_k$$

where ::= was a metasymbol denoting definition, and | was read "or" and denoted alternatives. This notation is usually called Backus Normal Form (BNF).

Just to give a very simple example, in Table 6.1 (lines 1 through 6) we show how integer numbers were defined. This definition avoids leading 0's in numbers, but allows both +0 and −0. The productions can easily be changed to avoid +0 or −0 or both. In the same table, line 7 shows the definition of conditional statements.

This kind of definition gives a precise formulation of all the clauses in the programming language. Besides, since a program has a single generation according to the grammar, it is possible to find this derivation starting from the actual program and therefore recover its exact structure. This makes it possible to give precise information to the compiler, which, in a sense, is directed by the formal syntax of the language (syntax-directed compilation).

A very interesting aspect is how this context-free grammar definition can avoid ambiguities in the interpretation of a program. Let us consider an expression like $a + b * c$; according to the rules of algebra, the multiplication should be executed before the addition, and the computer must follow this convention in order to create no confusion. This is done by the simplified productions given in lines 8 through 11 of Table 6.1. The derivation of the simple expression $a + b * c$, or of a more complicated expression, reveals that it is decomposed into the sum of $a$ and $b * c$; this information is passed to the compiler, and the multiplication is actually performed before the addition. If powers are also present, they are executed before products.

This ability of context-free grammars to describe the syntax of programming languages is very important, and after ALGOL'60 the syntax of every programming language has always been defined by context-free grammars. We conclude by recalling that a more sophisticated approach to the definition of programming languages was tried with ALGOL'68 by means of van Wijngaarden's grammars, but the method proved too complex and was abandoned.

6.4 The symbolic method

The Schützenberger method allows us to obtain the counting generating function for every non-ambiguous language, starting with the corresponding non-ambiguous grammar and proceeding in a mechanical way. Let us begin with a simple example; Fibonacci words are the words on the alphabet $\{0, 1\}$ beginning and ending with the symbol 1 and never containing two consecutive 0's. For small values of $n$, the Fibonacci words of length $n$ are easily displayed:

n = 1: 1
n = 2: 11
n = 3: 111, 101
n = 4: 1111, 1011, 1101
n = 5: 11111, 10111, 11011, 11101, 10101

If we count them by their length, we obtain the sequence $\{0, 1, 1, 2, 3, 5, 8, \ldots\}$, which is easily recognized as the Fibonacci sequence. In fact, a word of length $n$ is obtained by adding a trailing 1 to a word of length $n-1$, or by adding a trailing 01 to a word of length $n-2$. This immediately shows, in a combinatorial way, that Fibonacci words are counted by Fibonacci numbers. Besides, we get the productions of a non-ambiguous context-free grammar $G = (T, N, \sigma, \mathcal{P})$, where $T = \{0, 1\}$, $N = \{\phi\}$, $\sigma = \phi$, and $\mathcal{P}$ contains:

$$\phi \to 1 \qquad \phi \to \phi 1 \qquad \phi \to \phi 01$$

(these productions could have been written $\phi ::= 1 \mid \phi 1 \mid \phi 01$ by using the ALGOL'60 notation).

We are now going to obtain the counting generating function for Fibonacci words by applying the Schützenberger method. This consists of the following steps:

1. every non-terminal symbol $\sigma \in N$ is transformed into the name of its counting generating function $\sigma(t)$;

2. every terminal symbol is transformed into $t$;

3. the empty word is transformed into 1;

4. every $\mid$ sign is transformed into a $+$ sign, and ::= is transformed into an equals sign.

After having performed these transformations, we obtain a system of equations, which can be solved for the unknown generating functions introduced in the first step. They are the counting generating functions for the languages generated by the corresponding non-terminal symbols, when we consider them as the initial symbols.

The definition of the Fibonacci words produces:

$$\phi(t) = t + t\phi(t) + t^2\phi(t)$$

the solution of which is:

$$\phi(t) = \frac{t}{1 - t - t^2};$$
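The coefficients of $t/(1-t-t^2)$ can be compared with a brute-force enumeration of the Fibonacci words themselves. A plain-Python sketch (our own names, not from the book):

```python
def count_fibonacci_words(n):
    # binary words of length n, starting and ending with 1, with no "00"
    return sum(1 for m in range(2 ** n)
               if (w := format(m, '0%db' % n)).startswith('1')
               and w.endswith('1') and '00' not in w)

# coefficients of phi(t) = t / (1 - t - t^2): the Fibonacci numbers
fib = [0, 1]
while len(fib) < 13:
    fib.append(fib[-1] + fib[-2])

print(all(count_fibonacci_words(n) == fib[n] for n in range(1, 13)))  # True
```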

this is obviously the generating function for the Fi-

bonacci numbers. Therefore, we have shown that the


1  ⟨digit⟩ ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
2  ⟨non-zero digit⟩ ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
3  ⟨sequence of digits⟩ ::= ⟨digit⟩ | ⟨digit⟩ ⟨sequence of digits⟩
4  ⟨unsigned number⟩ ::= ⟨digit⟩ | ⟨non-zero digit⟩ ⟨sequence of digits⟩
5  ⟨signed number⟩ ::= +⟨unsigned number⟩ | −⟨unsigned number⟩
6  ⟨integer number⟩ ::= ⟨unsigned number⟩ | ⟨signed number⟩
7  ⟨conditional clause⟩ ::= if ⟨condition⟩ then ⟨instruction⟩ |
                            if ⟨condition⟩ then ⟨instruction⟩ else ⟨instruction⟩
8  ⟨expression⟩ ::= ⟨term⟩ | ⟨term⟩ + ⟨expression⟩
9  ⟨term⟩ ::= ⟨factor⟩ | ⟨term⟩ ∗ ⟨factor⟩
10 ⟨factor⟩ ::= ⟨element⟩ | ⟨factor⟩ ↑ ⟨element⟩
11 ⟨element⟩ ::= ⟨constant⟩ | ⟨variable⟩ | (⟨expression⟩)

Table 6.1: Context-free languages and programming languages

number of Fibonacci words of length n is F

n

, as we

have already proved by combinatorial arguments.
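The combinatorial claim can be checked mechanically. The following Python sketch (our own illustration, not part of the book) enumerates the words generated by φ → 1 | φ1 | φ01 — the binary words starting and ending with 1 and containing no two consecutive 0’s — and compares their number with the Fibonacci numbers:

```python
from itertools import product

def fib_words(n):
    """All Fibonacci words of length n: words over {0,1} that start
    and end with 1 and contain no two consecutive 0's (the language
    generated by phi -> 1 | phi1 | phi01)."""
    return [w for w in map("".join, product("01", repeat=n))
            if w[0] == "1" and w[-1] == "1" and "00" not in w]

def fibonacci(n):
    # F_0 = 0, F_1 = 1, F_2 = 1, ...
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# The number of Fibonacci words of length n is F_n.
for n in range(1, 12):
    assert len(fib_words(n)) == fibonacci(n)
```

For n = 5 this produces the five words listed above.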

In the case of the Dyck language, the definition yields:

$$\sigma(t) = 1 + t^2\sigma(t)^2$$

and therefore:

$$\sigma(t) = \frac{1 - \sqrt{1 - 4t^2}}{2t^2}.$$

Since every word in the Dyck language has an even length, the number of Dyck words with 2n symbols is just the nth Catalan number, and this, too, we already knew by combinatorial means.

Another example is given by the Motzkin words; these are words on the alphabet {a, b, c} in which a, b act as parentheses, as in the Dyck language, while c is free and can appear everywhere. Therefore, the definition of the language is:

$$\mu ::= \epsilon \;|\; c\mu \;|\; a\mu b\mu$$

if µ is the only non-terminal symbol. Schützenberger’s method gives the equation:

$$\mu(t) = 1 + t\mu(t) + t^2\mu(t)^2$$

whose solution is easily found:

$$\mu(t) = \frac{1 - t - \sqrt{1 - 2t - 3t^2}}{2t^2}.$$

By expanding this function we find the sequence of Motzkin numbers, beginning:

n    0  1  2  3  4  5   6   7    8    9
M_n  1  1  2  4  9  21  51  127  323  835
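Expanding µ(t) need not be done by hand: the functional equation µ(t) = 1 + tµ(t) + t²µ(t)² translates directly into a recurrence for the coefficients. A short Python sketch (our own, under the stated recurrence):

```python
def motzkin(N):
    """Motzkin numbers M_0..M_N from the coefficient recurrence of
    mu(t) = 1 + t*mu(t) + t^2*mu(t)^2, i.e.
    M_n = M_{n-1} + sum_{k=0}^{n-2} M_k * M_{n-2-k}."""
    M = [1]
    for n in range(1, N + 1):
        M.append(M[n - 1] + sum(M[k] * M[n - 2 - k] for k in range(n - 1)))
    return M

# reproduces the table: 1, 1, 2, 4, 9, 21, 51, 127, 323, 835
```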

These numbers count the so-called unary-binary trees, i.e., trees whose nodes have arity 1 or 2. They can be defined in a pictorial way by means of an object grammar. An object grammar defines combinatorial objects instead of simple letters or words; however, most times it is rather easy to pass from an object grammar to an equivalent context-free grammar, and therefore to obtain counting generating functions by means of Schützenberger’s method. For example, the object grammar in Figure 6.2 is obviously equivalent to the context-free grammar for Motzkin words.

6.5 The bivariate case

In Schützenberger’s method, the rôle of the indeterminate t is to count the number of letters or symbols occurring in the generated words; because of that, every symbol appearing in a production is transformed into a t. However, we may wish to count other parameters instead of, or in conjunction with, the number of symbols. This is accomplished by modifying the intended meaning of the indeterminate t and/or introducing some other indeterminate to take into account the other parameters.

For example, in the Dyck language, we may wish to count the number of pairs a, b occurring in the words; this means that t no longer counts the single letters, but counts the pairs. Therefore, Schützenberger’s method gives the equation $\sigma(t) = 1 + t\sigma(t)^2$, whose solution is just the generating function of the Catalan numbers.

An interesting application is as follows. Let us suppose we wish to know how many Fibonacci words of length n contain k zeroes. Besides the indeterminate t counting the total number of symbols, we introduce a new indeterminate z counting the number of zeroes. From the productions of the Fibonacci grammar, we derive an equation for the bivariate generating function φ(t, z), in which the coefficient of $t^n z^k$ is just the number we are looking for.

[Figure 6.2: An object grammar]

The equation is:

$$\varphi(t, z) = t + t\,\varphi(t, z) + t^2 z\,\varphi(t, z).$$

In fact, the production φ → φ1 increases by 1 the length of the words, but does not introduce any 0; the production φ → φ01 increases by 2 the length and introduces a zero. By solving this equation, we find:

$$\varphi(t, z) = \frac{t}{1 - t - zt^2}.$$
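The bivariate equation also yields a simple recurrence for its coefficients, which we can check against a brute-force count of the words themselves. A Python sketch (names and the bound N are ours, purely for illustration):

```python
from itertools import product

def phi_coeff(n, k, N=20):
    """Coefficient of t^n z^k in phi(t,z) = t/(1 - t - z t^2),
    from the recurrence phi = t + t*phi + z*t^2*phi, i.e.
    f(n,k) = [n==1 and k==0] + f(n-1,k) + f(n-2,k-1)."""
    f = {}
    for nn in range(N):
        for kk in range(N):
            f[nn, kk] = ((1 if (nn, kk) == (1, 0) else 0)
                         + f.get((nn - 1, kk), 0)
                         + f.get((nn - 2, kk - 1), 0))
    return f[n, k]

def count_words(n, k):
    """Brute force: Fibonacci words of length n containing k zeroes."""
    return sum(1 for w in map("".join, product("01", repeat=n))
               if w[0] == w[-1] == "1" and "00" not in w
               and w.count("0") == k)

for n in range(1, 10):
    for k in range(5):
        assert phi_coeff(n, k) == count_words(n, k)
```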

We can now extract the coefficient $\varphi_{n,k}$ of $t^n z^k$ in the following way:

$$\varphi_{n,k} = [t^n][z^k]\,\frac{t}{1 - t - zt^2} = [t^n][z^k]\,\frac{t}{1-t}\;\frac{1}{1 - z\,\dfrac{t^2}{1-t}} =$$
$$= [t^n][z^k]\,\frac{t}{1-t}\sum_{k=0}^{\infty} z^k\left(\frac{t^2}{1-t}\right)^{\!k} = [t^n][z^k]\sum_{k=0}^{\infty}\frac{t^{2k+1}z^k}{(1-t)^{k+1}} =$$
$$= [t^n]\,\frac{t^{2k+1}}{(1-t)^{k+1}} = [t^{n-2k-1}]\,\frac{1}{(1-t)^{k+1}} = \binom{n-k-1}{k}.$$

Therefore, the number of Fibonacci words of length n containing k zeroes is a binomial coefficient. The second expression in the derivation shows that the array $(\varphi_{n,k})_{n,k\in\mathbb{N}}$ is indeed the Riordan array $(t/(1-t),\; t/(1-t))$, which is the Pascal triangle stretched vertically, i.e., column k is shifted down by k positions (k + 1, in reality). The general formula we know for the row sums of a Riordan array gives:

$$\sum_k\varphi_{n,k} = \sum_k\binom{n-k-1}{k} = [t^n]\,\frac{t}{1-t}\left[\frac{1}{1-y}\,\middle|\; y = \frac{t^2}{1-t}\right] = [t^n]\,\frac{t}{1-t-t^2} = F_n$$

as we were expecting. A more interesting problem is to find the average number of zeroes in all the Fibonacci words with n letters. First, we count the total number of zeroes in all the words of length n:

$$\sum_k k\,\varphi_{n,k} = \sum_k\binom{n-k-1}{k}k = [t^n]\,\frac{t}{1-t}\left[\frac{y}{(1-y)^2}\,\middle|\; y = \frac{t^2}{1-t}\right] = [t^n]\,\frac{t^3}{(1-t-t^2)^2}.$$

We extract the coefficient:

$$[t^n]\,\frac{t^3}{(1-t-t^2)^2} = [t^{n-1}]\left(\frac{1}{\sqrt5}\left(\frac{1}{1-\phi t} - \frac{1}{1-\hat\phi t}\right)\right)^{\!2} =$$
$$= \frac{1}{5}[t^{n-1}]\frac{1}{(1-\phi t)^2} - \frac{2}{5}[t^n]\frac{t}{(1-\phi t)(1-\hat\phi t)} + \frac{1}{5}[t^{n-1}]\frac{1}{(1-\hat\phi t)^2} =$$
$$= \frac{1}{5}[t^{n-1}]\frac{1}{(1-\phi t)^2} - \frac{2}{5\sqrt5}[t^n]\frac{1}{1-\phi t} + \frac{2}{5\sqrt5}[t^n]\frac{1}{1-\hat\phi t} + \frac{1}{5}[t^{n-1}]\frac{1}{(1-\hat\phi t)^2}.$$

The last two terms are negligible because they rapidly tend to 0; therefore we have:

$$\sum_k k\,\varphi_{n,k} \approx \frac{n}{5}\,\phi^{n-1} - \frac{2}{5\sqrt5}\,\phi^n.$$

To obtain the average number $Z_n$ of zeroes, we need to divide this quantity by $F_n \sim \phi^n/\sqrt5$, the total number of Fibonacci words of length n:

$$Z_n = \frac{\sum_k k\,\varphi_{n,k}}{F_n} \sim \frac{n}{\phi\sqrt5} - \frac{2}{5} = \frac{5-\sqrt5}{10}\,n - \frac{2}{5}.$$

This shows that the average number of zeroes grows linearly with the length of the words and tends to 27.64% of this length, because $(5-\sqrt5)/10 \approx 0.2763932022\ldots$.
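Both the exact count and the asymptotic estimate can be compared numerically; a Python sketch (our own, using the binomial formula for $\varphi_{n,k}$ derived above):

```python
from math import comb, sqrt

def avg_zeroes(n):
    """Exact average number of zeroes over all Fibonacci words of
    length n, using phi_{n,k} = C(n-k-1, k)."""
    total = sum(k * comb(n - k - 1, k) for k in range(n))
    count = sum(comb(n - k - 1, k) for k in range(n))  # = F_n
    return total / count

n = 200
approx = (5 - sqrt(5)) / 10 * n - 2 / 5
# avg_zeroes(200) and approx agree to many decimal digits,
# since the neglected terms decay geometrically
```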

6.6 The Shift Operator

In the usual mathematical terminology, an operator is a mapping from some set $\mathcal{F}_1$ of functions into some other set of functions $\mathcal{F}_2$. We have already encountered the generating function operator $\mathcal{G}$, acting from the set of sequences (which are properly functions from N into R or C) into the set of formal power series (analytic functions). Other usual examples are D, the operator of differentiation, and $\int$, the operator of indefinite integration. Since the middle of the 19th century, English mathematicians (G. Boole in particular) introduced the concept of a finite operator, i.e., an operator acting in finite terms, in contrast to an infinitesimal operator, such as differentiation and integration. The simplest among the finite operators is the identity, denoted by I or by 1, which does not change anything. Operationally, however, the most important finite operator is the shift operator, denoted by E, changing the value of any function f at a point x into the value of the same function at the point x + 1. We write:

$$Ef(x) = f(x+1)$$

Unlike an infinitesimal operator such as D, a finite operator can be applied to a sequence as well, and in that case we can write:

$$Ef_n = f_{n+1}$$

This property is particularly interesting from our point of view but, in order to follow the usual notational conventions, we shall adopt the first, functional notation rather than the second one, which is more specific to sequences.

The shift operator can be iterated and, to denote the successive applications of E, we write $E^n$ instead of $EE\cdots E$ (n times). So:

$$E^2 f(x) = EEf(x) = Ef(x+1) = f(x+2)$$

and in general:

$$E^n f(x) = f(x+n)$$

Conventionally, we set $E^0 = I = 1$, and this is in accordance with the meaning of I:

$$E^0 f(x) = f(x) = If(x)$$
We wish to observe that every recurrence can be written using the shift operator. For example, the recurrence for the Fibonacci numbers is:

$$E^2 F_n = EF_n + F_n$$

and this can be written in the following way, separating the operator parts from the sequence parts:

$$(E^2 - E - I)F_n = 0$$

Some obvious properties of the shift operator are:

$$E(\alpha f(x) + \beta g(x)) = \alpha Ef(x) + \beta Eg(x) \qquad E(f(x)g(x)) = Ef(x)\,Eg(x)$$

Hence, if c is any constant (i.e., c ∈ R or c ∈ C) we have Ec = c.

It is possible to consider negative powers of E as well. So we have:

$$E^{-1}f(x) = f(x-1) \qquad E^{-n}f(x) = f(x-n)$$

and this is in accordance with the usual rules of powers:

$$E^n E^m = E^{n+m} \quad (n, m \in \mathbb{Z}) \qquad E^n E^{-n} = E^0 = I \quad (n \in \mathbb{Z})$$

Finite operators are commonly used in Numerical Analysis. In that case, an increment h is defined and the shift operator acts according to this increment, i.e., Ef(x) = f(x + h). When considering sequences, this makes no sense and we constantly use h = 1.

We may have occasion to use two or more shift operators, that is, shift operators related to different variables. We’ll distinguish them by suitable subscripts:

$$E_x f(x,y) = f(x+1, y) \qquad E_y f(x,y) = f(x, y+1)$$
$$E_x E_y f(x,y) = E_y E_x f(x,y) = f(x+1, y+1)$$

6.7 The Difference Operator

The second most important finite operator is the difference operator ∆; it is defined in terms of the shift and identity operators:

$$\Delta f(x) = (E - 1)f(x) = Ef(x) - If(x) = f(x+1) - f(x)$$

Some simple examples are:

$$\Delta c = Ec - c = c - c = 0 \quad \forall c \in \mathbb{C}$$
$$\Delta x = Ex - x = x + 1 - x = 1$$
$$\Delta x^2 = Ex^2 - x^2 = (x+1)^2 - x^2 = 2x + 1$$
$$\Delta x^n = Ex^n - x^n = (x+1)^n - x^n = \sum_{k=0}^{n-1}\binom{n}{k}x^k$$
$$\Delta\frac{1}{x} = \frac{1}{x+1} - \frac{1}{x} = -\frac{1}{x(x+1)}$$
$$\Delta\binom{x}{m} = \binom{x+1}{m} - \binom{x}{m} = \binom{x}{m-1}$$

The last example makes use of the recurrence relation for binomial coefficients. An important observation concerns the behavior of the difference operator with respect to the falling factorial:

$$\Delta x^{\underline{m}} = (x+1)^{\underline{m}} - x^{\underline{m}} = (x+1)x(x-1)\cdots(x-m+2) - x(x-1)\cdots(x-m+1) =$$
$$= x(x-1)\cdots(x-m+2)\,(x+1-x+m-1) = m\,x^{\underline{m-1}}$$

This is analogous to the usual rule for the differentiation operator applied to $x^m$:

$$Dx^m = m\,x^{m-1}$$

As we shall see, many formal properties of the difference operator are similar to the properties of the differentiation operator. The rôle of the powers $x^m$ is however taken by the falling factorials, which therefore assume a central position in the theory of finite operators.

The following general rules are rather obvious:

$$\Delta(\alpha f(x) + \beta g(x)) = \alpha\Delta f(x) + \beta\Delta g(x)$$
$$\Delta(f(x)g(x)) = E(f(x)g(x)) - f(x)g(x) = f(x+1)g(x+1) - f(x)g(x) =$$
$$= f(x+1)g(x+1) - f(x)g(x+1) + f(x)g(x+1) - f(x)g(x) = \Delta f(x)\,Eg(x) + f(x)\,\Delta g(x)$$

resembling the differentiation rule $D(f(x)g(x)) = (Df(x))g(x) + f(x)Dg(x)$. In a similar way we have:

$$\Delta\frac{f(x)}{g(x)} = \frac{f(x+1)}{g(x+1)} - \frac{f(x)}{g(x)} = \frac{f(x+1)g(x) - f(x)g(x)}{g(x)g(x+1)} + \frac{f(x)g(x) - f(x)g(x+1)}{g(x)g(x+1)} =$$
$$= \frac{(\Delta f(x))\,g(x) - f(x)\,\Delta g(x)}{g(x)\,Eg(x)}$$

The difference operator can be iterated:

$$\Delta^2 f(x) = \Delta\Delta f(x) = \Delta(f(x+1) - f(x)) = f(x+2) - 2f(x+1) + f(x).$$

From a formal point of view, we have:

$$\Delta^2 = (E-1)^2 = E^2 - 2E + 1$$

and in general:

$$\Delta^n = (E-1)^n = \sum_{k=0}^n\binom{n}{k}(-1)^{n-k}E^k = (-1)^n\sum_{k=0}^n\binom{n}{k}(-E)^k$$

This is a very important formula, and it is the first example of the interest of combinatorics and generating functions in the theory of finite operators. In fact, let us iterate ∆ on f(x) = 1/x:

$$\Delta^2\frac{1}{x} = \frac{-1}{(x+1)(x+2)} + \frac{1}{x(x+1)} = \frac{-x + x + 2}{x(x+1)(x+2)} = \frac{2}{x(x+1)(x+2)}$$

$$\Delta^n\frac{1}{x} = \frac{(-1)^n\,n!}{x(x+1)\cdots(x+n)}$$

as we can easily show by mathematical induction. In fact:

$$\Delta^{n+1}\frac{1}{x} = \frac{(-1)^n\,n!}{(x+1)\cdots(x+n+1)} - \frac{(-1)^n\,n!}{x(x+1)\cdots(x+n)} = \frac{(-1)^{n+1}\,(n+1)!}{x(x+1)\cdots(x+n+1)}$$
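The closed form for $\Delta^n(1/x)$ can be confirmed by literally iterating the difference operator; a Python sketch (our own, using exact rational arithmetic):

```python
from fractions import Fraction
from math import factorial

def delta_n(f, n):
    """n-th iterate of the difference operator Delta = E - 1."""
    if n == 0:
        return f
    g = delta_n(f, n - 1)
    return lambda x: g(x + 1) - g(x)

inv = lambda x: Fraction(1, x)

# Delta^n (1/x) = (-1)^n n! / (x(x+1)...(x+n))
for n in range(6):
    for x in range(1, 6):
        denom = 1
        for j in range(n + 1):
            denom *= x + j
        assert delta_n(inv, n)(x) == Fraction((-1) ** n * factorial(n), denom)
```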

The formula for $\Delta^n$ now gives the following identity:

$$\Delta^n\frac{1}{x} = \sum_{k=0}^n\binom{n}{k}(-1)^{n-k}E^k\,\frac{1}{x}$$

By multiplying everything by $(-1)^n$, this identity can be written as:

$$\frac{n!}{x(x+1)\cdots(x+n)} = \frac{1}{x}\binom{x+n}{n}^{-1} = \sum_{k=0}^n\binom{n}{k}\frac{(-1)^k}{x+k}$$

and therefore we have both a way to express the inverse of a binomial coefficient as a sum and an expression for the partial fraction expansion of the inverse of the polynomial $x(x+1)\cdots(x+n)$.
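This partial-fraction identity is exactly checkable with rational arithmetic; a Python sketch (our own illustration):

```python
from fractions import Fraction
from math import comb

# 1/(x * C(x+n, n)) = sum_k C(n,k) (-1)^k / (x+k)
for n in range(8):
    for x in range(1, 8):
        lhs = Fraction(1, x * comb(x + n, n))
        rhs = sum(Fraction(comb(n, k) * (-1) ** k, x + k)
                  for k in range(n + 1))
        assert lhs == rhs
```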

6.8 Shift and Difference Operators - Example I

As the difference operator ∆ can be expressed in terms of the shift operator E, so E can be expressed in terms of ∆:

$$E = \Delta + 1$$

This rule can be iterated, giving the summation formula:

$$E^n = (\Delta + 1)^n = \sum_{k=0}^n\binom{n}{k}\Delta^k$$

which can be seen as the “dual” formula of the one already considered:

$$\Delta^n = \sum_{k=0}^n\binom{n}{k}(-1)^{n-k}E^k$$


The evaluation of the successive differences of any function f(x) allows us to state and prove two identities, which may have combinatorial significance. Here we record some typical examples; we mark with an asterisk the cases in which the general formula for $\Delta^n$ does not reduce to $\Delta^0 f(x) = If(x)$ at n = 0.

1) The function f(x) = 1/x has already been developed, at least partially:

$$\Delta\frac{1}{x} = \frac{-1}{x(x+1)} \qquad \Delta^n\frac{1}{x} = \frac{(-1)^n\,n!}{x(x+1)\cdots(x+n)} = \frac{(-1)^n}{x}\binom{x+n}{n}^{-1}$$

$$\sum_k\binom{n}{k}\frac{(-1)^k}{x+k} = \frac{n!}{x(x+1)\cdots(x+n)} = \frac{1}{x}\binom{x+n}{n}^{-1}$$

$$\sum_k\binom{n}{k}(-1)^k\binom{x+k}{k}^{-1} = \frac{x}{x+n}.$$

2∗) A somewhat similar situation, but a bit more complex:

$$\Delta\frac{p+x}{m+x} = \frac{m-p}{(m+x)(m+x+1)} \qquad \Delta^n\frac{p+x}{m+x} = \frac{m-p}{m+x}\,(-1)^{n-1}\binom{m+x+n}{n}^{-1}$$

$$\sum_k\binom{n}{k}(-1)^k\,\frac{p+k}{m+k} = \frac{p-m}{m}\binom{m+n}{n}^{-1} \quad (n > 0)$$

$$\sum_k\binom{n}{k}\binom{m+k}{k}^{-1}(-1)^k = \frac{m}{m+n}$$

(see above).

3) Another version of the first example:

$$\Delta\frac{1}{px+m} = \frac{-p}{(px+m)(px+p+m)}$$

$$\Delta^n\frac{1}{px+m} = \frac{(-1)^n\,n!\,p^n}{(px+m)(px+p+m)\cdots(px+np+m)}$$

Returning to the previous example: according to the general rule, we should have $\Delta^0\,(p+x)/(m+x) = (p-m)/(m+x)$; in the corresponding sum, however, we have to set $\Delta^0 = I$, and therefore we also have to subtract 1 from both members in order to obtain a true identity; a similar situation arises whenever the formula for $\Delta^n$ does not give $\Delta^0 = I$ (the starred cases).

$$\sum_k\binom{n}{k}\frac{(-1)^k}{pk+m} = \frac{n!\,p^n}{m(m+p)\cdots(m+np)}$$

$$\sum_k\binom{n}{k}\frac{(-1)^k\,k!\,p^k}{m(m+p)\cdots(m+pk)} = \frac{1}{pn+m}$$

4∗) A case involving the harmonic numbers:

$$\Delta H_x = \frac{1}{x+1} \qquad \Delta^n H_x = \frac{(-1)^{n-1}}{n}\binom{x+n}{n}^{-1}$$

$$\sum_k\binom{n}{k}(-1)^k H_{x+k} = -\frac{1}{n}\binom{x+n}{n}^{-1} \quad (n > 0)$$

$$\sum_{k=1}^n\binom{n}{k}\frac{(-1)^{k-1}}{k}\binom{x+k}{k}^{-1} = H_{x+n} - H_x$$

where the case x = 0 is to be noted.
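The last identity — and in particular the noteworthy case x = 0, where it expresses $H_n$ as an alternating binomial sum — is easy to verify exactly; a Python sketch (our own illustration):

```python
from fractions import Fraction
from math import comb

def H(n):
    """Harmonic number H_n as an exact rational."""
    return sum(Fraction(1, j) for j in range(1, n + 1))

# sum_{k=1}^n C(n,k) (-1)^{k-1}/k * C(x+k,k)^{-1} = H_{x+n} - H_x
for n in range(1, 8):
    for x in range(6):
        s = sum(Fraction(comb(n, k) * (-1) ** (k - 1), k) / comb(x + k, k)
                for k in range(1, n + 1))
        assert s == H(x + n) - H(x)
```

At x = 0 the inner binomial is 1, giving $H_n = \sum_{k\ge1}\binom{n}{k}(-1)^{k-1}/k$.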

5∗) A more complicated case with the harmonic numbers:

$$\Delta\,xH_x = H_x + 1 \qquad \Delta^n\,xH_x = \frac{(-1)^n}{n-1}\binom{x+n-1}{n-1}^{-1} \quad (n \ge 2)$$

$$\sum_k\binom{n}{k}(-1)^k\,(x+k)H_{x+k} = \frac{1}{n-1}\binom{x+n-1}{n-1}^{-1} \quad (n \ge 2)$$

$$\sum_{k=2}^n\binom{n}{k}\frac{(-1)^k}{k-1}\binom{x+k-1}{k-1}^{-1} = (x+n)(H_{x+n} - H_x) - n$$

6) Harmonic numbers and binomial coefficients:

$$\Delta\binom{x}{m}H_x = \binom{x}{m-1}\left(H_x + \frac{1}{m}\right) \qquad \Delta^n\binom{x}{m}H_x = \binom{x}{m-n}(H_x + H_m - H_{m-n})$$

$$\sum_k\binom{n}{k}(-1)^k\binom{x+k}{m}H_{x+k} = (-1)^n\binom{x}{m-n}(H_x + H_m - H_{m-n})$$

$$\sum_k\binom{n}{k}\binom{x}{m-k}(H_x + H_m - H_{m-k}) = \binom{x+n}{m}H_{x+n}$$

and by performing the sums on the left containing $H_x$ and $H_m$:

$$\sum_k\binom{n}{k}\binom{x}{m-k}H_{m-k} = \binom{x+n}{m}(H_x + H_m - H_{x+n})$$


7) The function ln(x) can be inserted in this group:

$$\Delta\ln(x) = \ln\frac{x+1}{x} \qquad \Delta^n\ln(x) = (-1)^n\sum_k\binom{n}{k}(-1)^k\ln(x+k)$$

$$\sum_k\binom{n}{k}(-1)^k\ln(x+k) = \sum_k\binom{n}{k}(-1)^k\ln(x+k)$$

$$\sum_{k=0}^n\sum_{j=0}^k(-1)^{k+j}\binom{n}{k}\binom{k}{j}\ln(x+j) = \ln(x+n)$$

Note that the last but one relation is an identity.

6.9 Shift and Difference Operators - Example II

Here we propose other examples of combinatorial sums obtained by iterating the shift and difference operators.

1) The typical sum containing the binomial coefficients: Newton’s binomial theorem:

$$\Delta p^x = (p-1)p^x \qquad \Delta^n p^x = (p-1)^n p^x$$

$$\sum_k\binom{n}{k}(-1)^{n-k}p^k = (p-1)^n \qquad \sum_k\binom{n}{k}(p-1)^k = p^n$$

2) Two sums involving Fibonacci numbers:

$$\Delta F_x = F_{x-1} \qquad \Delta^n F_x = F_{x-n}$$

$$\sum_k\binom{n}{k}(-1)^k F_{x+k} = (-1)^n F_{x-n} \qquad \sum_k\binom{n}{k}F_{x-k} = F_{x+n}$$

3) Falling factorials are an introduction to binomial coefficients:

$$\Delta x^{\underline{m}} = m\,x^{\underline{m-1}} \qquad \Delta^n x^{\underline{m}} = m^{\underline{n}}\,x^{\underline{m-n}}$$

$$\sum_k\binom{n}{k}(-1)^k(x+k)^{\underline{m}} = (-1)^n m^{\underline{n}}\,x^{\underline{m-n}} \qquad \sum_k\binom{n}{k}m^{\underline{k}}\,x^{\underline{m-k}} = (x+n)^{\underline{m}}$$

4) Similar sums hold for rising factorials:

$$\Delta x^{\overline{m}} = m\,(x+1)^{\overline{m-1}} \qquad \Delta^n x^{\overline{m}} = m^{\underline{n}}\,(x+n)^{\overline{m-n}}$$

$$\sum_k\binom{n}{k}(-1)^k(x+k)^{\overline{m}} = (-1)^n m^{\underline{n}}\,(x+n)^{\overline{m-n}} \qquad \sum_k\binom{n}{k}m^{\underline{k}}\,(x+k)^{\overline{m-k}} = (x+n)^{\overline{m}}$$

5) Two sums involving the binomial coefficients:

$$\Delta\binom{x}{m} = \binom{x}{m-1} \qquad \Delta^n\binom{x}{m} = \binom{x}{m-n}$$

$$\sum_k\binom{n}{k}(-1)^k\binom{x+k}{m} = (-1)^n\binom{x}{m-n} \qquad \sum_k\binom{n}{k}\binom{x}{m-k} = \binom{x+n}{m}$$

6) Another case with binomial coefficients:

$$\Delta\binom{p+x}{m+x} = \binom{p+x}{m+x+1} \qquad \Delta^n\binom{p+x}{m+x} = \binom{p+x}{m+x+n}$$

$$\sum_k\binom{n}{k}(-1)^k\binom{p+k}{m+k} = (-1)^n\binom{p}{m+n} \qquad \sum_k\binom{n}{k}\binom{p}{m+k} = \binom{p+n}{m+n}$$
7) And now a case with the inverse of a binomial coefficient:

$$\Delta\binom{x}{m}^{-1} = -\frac{m}{m+1}\binom{x+1}{m+1}^{-1} \qquad \Delta^n\binom{x}{m}^{-1} = (-1)^n\frac{m}{m+n}\binom{x+n}{m+n}^{-1}$$

$$\sum_k\binom{n}{k}(-1)^k\binom{x+k}{m}^{-1} = \frac{m}{m+n}\binom{x+n}{m+n}^{-1}$$

$$\sum_k\binom{n}{k}(-1)^k\,\frac{m}{m+k}\binom{x+k}{m+k}^{-1} = \binom{x+n}{m}^{-1}$$

8) Two sums with the central binomial coefficients:

$$\Delta\,\frac{1}{4^x}\binom{2x}{x} = -\frac{1}{2(x+1)}\,\frac{1}{4^x}\binom{2x}{x}$$

$$\Delta^n\,\frac{1}{4^x}\binom{2x}{x} = \frac{(-1)^n\,(2n)!}{n!\,(x+1)\cdots(x+n)}\,\frac{1}{4^n}\,\frac{1}{4^x}\binom{2x}{x}$$

$$\sum_k\binom{n}{k}(-1)^k\binom{2x+2k}{x+k}\frac{1}{4^k} = \frac{(2n)!}{n!\,(x+1)\cdots(x+n)}\,\frac{1}{4^n}\binom{2x}{x} = \frac{1}{4^n}\binom{2n}{n}\binom{x+n}{n}^{-1}\binom{2x}{x}$$

$$\sum_k\binom{n}{k}(-1)^k\,\frac{(2k)!}{k!\,(x+1)\cdots(x+k)}\,\frac{1}{4^k}\binom{2x}{x} = \frac{1}{4^n}\binom{2x+2n}{x+n}$$
9) Two sums with the inverse of the central binomial coefficients:

$$\Delta\,4^x\binom{2x}{x}^{-1} = \frac{4^x}{2x+1}\binom{2x}{x}^{-1}$$

$$\Delta^n\,4^x\binom{2x}{x}^{-1} = \frac{(-1)^{n-1}}{2n-1}\,\frac{(2n)!\,4^x}{2^n\,n!\,(2x+1)(2x+3)\cdots(2x+2n-1)}\binom{2x}{x}^{-1}$$

$$\sum_k\binom{n}{k}(-1)^k\,4^k\binom{2x+2k}{x+k}^{-1} = -\frac{1}{2n-1}\,\frac{(2n)!}{2^n\,n!\,(2x+1)(2x+3)\cdots(2x+2n-1)}\binom{2x}{x}^{-1} \quad (n>0)$$

$$\sum_k\binom{n}{k}\frac{(-1)^{k-1}}{2k-1}\,\frac{(2k)!}{2^k\,k!\,(2x+1)(2x+3)\cdots(2x+2k-1)}\binom{2x}{x}^{-1} = 4^n\binom{2x+2n}{x+n}^{-1}$$
6.10 The Addition Operator

The addition operator S is analogous to the difference operator:

$$S = E + 1$$

and in fact a simple connection exists between the two operators:

$$S\,(-1)^x f(x) = (-1)^{x+1}f(x+1) + (-1)^x f(x) = (-1)^{x+1}(f(x+1) - f(x)) = (-1)^{x-1}\Delta f(x)$$

Because of this connection, the addition operator has not been widely considered in the literature, and the symbol S is used here only for convenience. Like the difference operator, the addition operator can be iterated and often produces interesting combinatorial sums according to the rules:

$$S^n = (E+1)^n = \sum_k\binom{n}{k}E^k \qquad E^n = (S-1)^n = \sum_k\binom{n}{k}(-1)^{n-k}S^k$$

Some examples are in order here:

1) Fibonacci numbers are typical:

$$SF_m = F_{m+1} + F_m = F_{m+2} \qquad S^n F_m = F_{m+2n}$$

$$\sum_k\binom{n}{k}F_{m+k} = F_{m+2n} \qquad \sum_k\binom{n}{k}(-1)^k F_{m+2k} = (-1)^n F_{m+n}$$

2) Here are the binomial coefficients:

$$S\binom{m}{x} = \binom{m+1}{x+1} \qquad S^n\binom{m}{x} = \binom{m+n}{x+n}$$

$$\sum_k\binom{n}{k}\binom{m}{x+k} = \binom{m+n}{x+n} \qquad \sum_k\binom{n}{k}(-1)^k\binom{m+k}{x+k} = (-1)^n\binom{m}{x+n}$$
3) And finally the inverse of binomial coefficients:

$$S\binom{m}{x}^{-1} = \frac{m+1}{m}\binom{m-1}{x}^{-1} \qquad S^n\binom{m}{x}^{-1} = \frac{m+1}{m-n+1}\binom{m-n}{x}^{-1}$$

$$\sum_k\binom{n}{k}\binom{m}{x+k}^{-1} = \frac{m+1}{m-n+1}\binom{m-n}{x}^{-1}$$

$$\sum_k\binom{n}{k}\frac{(-1)^{n-k}}{m-k+1}\binom{m-k}{x}^{-1} = \frac{1}{m+1}\binom{m}{x+n}^{-1}.$$
We can obviously invent as many expressions as we desire and, correspondingly, may obtain some summation formulas of combinatorial interest. For example:

$$S\Delta = (E+1)(E-1) = E^2 - 1 = (E-1)(E+1) = \Delta S$$

This derivation shows that the two operators S and ∆ commute. We can directly verify this property:

$$S\Delta f(x) = S(f(x+1) - f(x)) = f(x+2) - f(x) = (E^2-1)f(x)$$
$$\Delta Sf(x) = \Delta(f(x+1) + f(x)) = f(x+2) - f(x) = (E^2-1)f(x)$$
Consequently, we have the two summation formulas:

$$\Delta^n S^n = (E^2-1)^n = \sum_k\binom{n}{k}(-1)^{n-k}E^{2k} \qquad E^{2n} = (\Delta S + 1)^n = \sum_k\binom{n}{k}\Delta^k S^k$$
A simple example is offered by the Fibonacci numbers:

$$\Delta SF_m = F_{m+1} \qquad (\Delta S)^n F_m = \Delta^n S^n F_m = F_{m+n}$$

$$\sum_k\binom{n}{k}(-1)^k F_{m+2k} = (-1)^n F_{m+n} \qquad \sum_k\binom{n}{k}F_{m+k} = F_{m+2n}$$

but these identities have already been proved using the addition operator S.
6.11 Definite and Indefinite Summation

The following result is one of the most important rules connecting the finite operator method and combinatorial sums:

$$\sum_{k=0}^n E^k = [z^n]\,\frac{1}{1-z}\,\frac{1}{1-Ez} = [z^n]\,\frac{1}{(E-1)z}\left(\frac{1}{1-Ez} - \frac{1}{1-z}\right) =$$
$$= \frac{1}{E-1}\,[z^{n+1}]\left(\frac{1}{1-Ez} - \frac{1}{1-z}\right) = \frac{E^{n+1}-1}{E-1} = (E^{n+1}-1)\Delta^{-1}$$
We observe that the operator E commutes with the indeterminate z, which is constant with respect to the variable x, on which E operates. The rule above is called the rule of definite summation; the operator $\Delta^{-1}$ is called indefinite summation and is often denoted by Σ. In order to make this point clear, let us consider any function f(x) and suppose that a function g(x) exists such that ∆g(x) = f(x). Hence we have $\Delta^{-1}f(x) = g(x)$ and the rule of definite summation immediately gives:

$$\sum_{k=0}^n f(x+k) = g(x+n+1) - g(x)$$
This is analogous to the rule of definite integration. In fact, the operator of indefinite integration $\int dx$ is the inverse of the differentiation operator D, and if f(x) is any function, a primitive function for f(x) is any function $\hat g(x)$ such that $D\hat g(x) = f(x)$ or $D^{-1}f(x) = \int f(x)\,dx = \hat g(x)$. The fundamental theorem of the integral calculus relates definite and indefinite integration:

$$\int_a^b f(x)\,dx = \hat g(b) - \hat g(a)$$

The formula for definite summation can be written in a similar way, if we consider the integer variable k and set a = x and b = x + n + 1:

$$\sum_{k=a}^{b-1} f(k) = g(b) - g(a)$$
These facts create an analogy between $\Delta^{-1}$ and $D^{-1}$, or Σ and $\int dx$, which can be stressed by considering the formal properties of Σ. First of all, we observe that g(x) = Σf(x) is not uniquely determined. If C(x) is any periodic function of period 1, i.e., C(x + k) = C(x), ∀k ∈ Z, we have:

$$\Delta(g(x) + C(x)) = \Delta g(x) + \Delta C(x) = \Delta g(x)$$

and therefore:

$$\Sigma f(x) = g(x) + C(x)$$

When f(x) is a sequence and is only defined for integer values of x, the function C(x) reduces to a constant, and plays the same rôle as the integration constant in the operation of indefinite integration.
The operator Σ is obviously linear:

$$\Sigma(\alpha f(x) + \beta g(x)) = \alpha\Sigma f(x) + \beta\Sigma g(x)$$

This is proved by applying the operator ∆ to both sides.

An important property is summation by parts, corresponding to the well-known rule for the indefinite integration operator. Let us begin with the rule for the difference of a product, which we proved earlier:

$$\Delta(f(x)g(x)) = \Delta f(x)\,Eg(x) + f(x)\,\Delta g(x)$$

By applying the operator Σ to both sides and exchanging terms:

$$\Sigma(f(x)\Delta g(x)) = f(x)g(x) - \Sigma(\Delta f(x)\,Eg(x))$$

This formula allows us to change a sum of products, in which we know that the second factor is a difference, into a sum involving the difference of the first factor. The transformation can be convenient whenever the difference of the first factor is simpler than the difference of the second factor. For example, let us perform the following indefinite summation:

$$\Sigma\,x\binom{x}{m} = \Sigma\,x\,\Delta\binom{x}{m+1} = x\binom{x}{m+1} - \Sigma(\Delta x)\binom{x+1}{m+1} =$$
$$= x\binom{x}{m+1} - \Sigma\binom{x+1}{m+1} = x\binom{x}{m+1} - \binom{x+1}{m+2}$$
Obviously, this indefinite sum can be transformed into a definite sum by using the first result in this section:

$$\sum_{k=a}^b k\binom{k}{m} = (b+1)\binom{b+1}{m+1} - a\binom{a}{m+1} - \binom{b+2}{m+2} + \binom{a+1}{m+2}$$

and for a = 0 and b = n:

$$\sum_{k=0}^n k\binom{k}{m} = (n+1)\binom{n+1}{m+1} - \binom{n+2}{m+2}$$
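The closed form produced by summation by parts is readily checked; a Python sketch (our own illustration):

```python
from math import comb

# sum_{k=0}^n k*C(k,m) = (n+1)*C(n+1,m+1) - C(n+2,m+2)
for m in range(5):
    for n in range(12):
        lhs = sum(k * comb(k, m) for k in range(n + 1))
        rhs = (n + 1) * comb(n + 1, m + 1) - comb(n + 2, m + 2)
        assert lhs == rhs
```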

6.12 Definite Summation

In a sense, the rule:

$$\sum_{k=0}^n E^k = (E^{n+1}-1)\Delta^{-1} = (E^{n+1}-1)\Sigma$$

is the most important result of the operator method. In fact, it reduces the sum of the successive elements of a sequence to the computation of the indefinite sum, which is just the operator inverse of the difference. Unfortunately, $\Delta^{-1}$ is not easy to compute and, apart from a restricted number of cases, there is no general rule allowing us to guess what $\Delta^{-1}f(x) = \Sigma f(x)$ might be. In this rather pessimistic sense, the rule is very fine, very general and completely useless.
However, from a more positive point of view, we can say that whenever we know, in some way or another, an expression for $\Delta^{-1}f(x) = \Sigma f(x)$, we have solved the problem of finding $\sum_{k=0}^n f(x+k)$. For example, we can look at the differences computed in the previous sections and, for each of them, obtain the Σ of some function; in this way we immediately have a number of sums. The negative point is that, sometimes, we do not have a simple function and, therefore, the sum may not have any combinatorial interest.
Here is a number of identities obtained from our previous computations.

1) We have again the partial sums of the geometric series:

$$\Delta^{-1}p^x = \frac{p^x}{p-1} = \Sigma\,p^x$$

$$\sum_{k=0}^n p^{x+k} = \frac{p^{x+n+1} - p^x}{p-1} \qquad \sum_{k=0}^n p^k = \frac{p^{n+1}-1}{p-1} \quad (x=0)$$
2) The sum of consecutive Fibonacci numbers:

$$\Delta^{-1}F_x = F_{x+1} = \Sigma F_x$$

$$\sum_{k=0}^n F_{x+k} = F_{x+n+2} - F_{x+1} \qquad \sum_{k=0}^n F_k = F_{n+2} - 1 \quad (x=0)$$
3) The sum of consecutive binomial coefficients with constant denominator:

$$\Delta^{-1}\binom{x}{m} = \binom{x}{m+1} = \Sigma\binom{x}{m}$$

$$\sum_{k=0}^n\binom{x+k}{m} = \binom{x+n+1}{m+1} - \binom{x}{m+1} \qquad \sum_{k=0}^n\binom{k}{m} = \binom{n+1}{m+1} \quad (x=0)$$
4) The sum of consecutive binomial coefficients:

$$\Delta^{-1}\binom{p+x}{m+x} = \binom{p+x}{m+x-1} = \Sigma\binom{p+x}{m+x}$$

$$\sum_{k=0}^n\binom{p+k}{m+k} = \binom{p+n+1}{m+n} - \binom{p}{m-1}$$
5) The sum of falling factorials:

$$\Delta^{-1}x^{\underline{m}} = \frac{x^{\underline{m+1}}}{m+1} = \Sigma\,x^{\underline{m}}$$

$$\sum_{k=0}^n (x+k)^{\underline{m}} = \frac{(x+n+1)^{\underline{m+1}} - x^{\underline{m+1}}}{m+1} \qquad \sum_{k=0}^n k^{\underline{m}} = \frac{(n+1)^{\underline{m+1}}}{m+1} \quad (x=0).$$
6) The sum of rising factorials:

$$\Delta^{-1}x^{\overline{m}} = \frac{(x-1)^{\overline{m+1}}}{m+1} = \Sigma\,x^{\overline{m}}$$

$$\sum_{k=0}^n (x+k)^{\overline{m}} = \frac{(x+n)^{\overline{m+1}} - (x-1)^{\overline{m+1}}}{m+1} \qquad \sum_{k=0}^n k^{\overline{m}} = \frac{n^{\overline{m+1}}}{m+1} \quad (x=0).$$
7) The sum of inverse binomial coefficients:

$$\Delta^{-1}\binom{x}{m}^{-1} = -\frac{m}{m-1}\binom{x-1}{m-1}^{-1} = \Sigma\binom{x}{m}^{-1}$$

$$\sum_{k=0}^n\binom{x+k}{m}^{-1} = \frac{m}{m-1}\left(\binom{x-1}{m-1}^{-1} - \binom{x+n}{m-1}^{-1}\right).$$
8) The sum of harmonic numbers. Since 1 = ∆x, we have:

$$\Delta^{-1}H_x = xH_x - x = \Sigma H_x$$

$$\sum_{k=0}^n H_{x+k} = (x+n+1)H_{x+n+1} - xH_x - (n+1)$$

$$\sum_{k=0}^n H_k = (n+1)H_{n+1} - (n+1) = (n+1)H_n - n \quad (x=0).$$
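The harmonic-number sum is another identity that exact arithmetic confirms immediately; a Python sketch (our own illustration):

```python
from fractions import Fraction

def H(n):
    """Harmonic number H_n as an exact rational (H_0 = 0)."""
    return sum(Fraction(1, j) for j in range(1, n + 1))

# sum_{k=0}^n H_k = (n+1)H_n - n
for n in range(20):
    assert sum(H(k) for k in range(n + 1)) == (n + 1) * H(n) - n
```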

6.13 The Euler-McLaurin Summation Formula

One of the most striking applications of the finite operator method is the formal proof of the Euler-McLaurin summation formula. The starting point is Taylor’s theorem for the series expansion of a function $f(x) \in C^\infty$, i.e., a function having derivatives of any order. The usual form of the theorem:

$$f(x+h) = f(x) + \frac{h}{1!}f'(x) + \frac{h^2}{2!}f''(x) + \cdots + \frac{h^n}{n!}f^{(n)}(x) + \cdots$$
can be interpreted in the sense of operators as a result connecting the shift and the differentiation operators. In fact, for h = 1, it can be written as:

$$Ef(x) = If(x) + \frac{Df(x)}{1!} + \frac{D^2f(x)}{2!} + \cdots$$

and therefore as a relation between operators:

$$E = 1 + \frac{D}{1!} + \frac{D^2}{2!} + \cdots + \frac{D^n}{n!} + \cdots = e^D$$

This formal identity relates the finite operator E and the infinitesimal operator D and, subtracting 1 from both sides, it can be formulated as:

$$\Delta = e^D - 1$$
By inverting, we have a formula for the Σ operator:

$$\Sigma = \frac{1}{e^D - 1} = \frac{1}{D}\;\frac{D}{e^D - 1}$$

Now we recognize the generating function of the Bernoulli numbers, and therefore we have a development for Σ:

$$\Sigma = \frac{1}{D}\left(B_0 + \frac{B_1}{1!}D + \frac{B_2}{2!}D^2 + \cdots\right) =$$
$$= D^{-1} - \frac{1}{2}I + \frac{1}{12}D - \frac{1}{720}D^3 + \frac{1}{30240}D^5 - \frac{1}{1209600}D^7 + \cdots.$$
This is not a series development since, as we know, the Bernoulli numbers diverge to infinity. We have a case of asymptotic development, which is only defined when we consider a limited number of terms, but in general diverges if we let the number of terms go to infinity. The number of terms for which the sum best approaches its true value depends on the function f(x) and on the argument x.
From the indefinite sum we can pass to the definite sum by applying the general rule of Section 6.12. Since $D^{-1} = \int dx$, we immediately have:

$$\sum_{k=0}^{n-1} f(k) = \int_0^n f(x)\,dx - \frac{1}{2}\Big[f(x)\Big]_0^n + \frac{1}{12}\Big[f'(x)\Big]_0^n - \frac{1}{720}\Big[f'''(x)\Big]_0^n + \cdots$$
and this is the celebrated Euler-McLaurin summation formula. It expresses a sum as a function of the integral and the successive derivatives of the function f(x). In this sense, the formula can be seen as a method for approximating a sum by means of an integral or, vice versa, for approximating an integral by means of a sum, and this was just the point of view of the mathematicians who first developed it.
As a simple but very important example, let us find an asymptotic development for the harmonic numbers $H_n$. Since $H_n = H_{n-1} + 1/n$, the Euler-McLaurin formula applies to $H_{n-1}$ and to the function f(x) = 1/x, giving:

$$H_{n-1} = \int_1^n\frac{dx}{x} - \frac{1}{2}\left[\frac{1}{x}\right]_1^n + \frac{1}{12}\left[-\frac{1}{x^2}\right]_1^n - \frac{1}{720}\left[-\frac{6}{x^4}\right]_1^n + \frac{1}{30240}\left[-\frac{120}{x^6}\right]_1^n + \cdots =$$
$$= \ln n - \frac{1}{2n} + \frac{1}{2} - \frac{1}{12n^2} + \frac{1}{12} + \frac{1}{120n^4} - \frac{1}{120} - \frac{1}{252n^6} + \frac{1}{252} + \cdots$$

In this expression a number of constants appears, and they can be summed together to form a constant γ, provided that the sum actually converges. By letting n → ∞ we see that this constant γ is the Euler-Mascheroni constant:

$$\lim_{n\to\infty}(H_{n-1} - \ln n) = \gamma = 0.577215664902\ldots$$

By adding 1/n to both sides of the previous relation, we eventually find:

$$H_n = \ln n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} - \frac{1}{252n^6} + \cdots$$

and this is the asymptotic expansion we were looking for.
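A few terms of the expansion already approximate $H_n$ extremely well; a Python sketch (our own illustration, with γ written out numerically):

```python
from math import log

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def H(n):
    """Harmonic number H_n, summed directly."""
    return sum(1.0 / j for j in range(1, n + 1))

def H_asym(n):
    """Euler-McLaurin asymptotic expansion of H_n."""
    return (log(n) + GAMMA + 1 / (2 * n) - 1 / (12 * n ** 2)
            + 1 / (120 * n ** 4) - 1 / (252 * n ** 6))

# already at n = 100 the two values agree to roughly machine precision
```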

6.14 Applications of the Euler-McLaurin Formula

As another application of the Euler-McLaurin summation formula, we now show the derivation of Stirling’s approximation for n!. The first step consists in taking the logarithm of that quantity:

$$\ln n! = \ln 1 + \ln 2 + \ln 3 + \cdots + \ln n$$

so that we are reduced to computing a sum, and hence can apply the Euler-McLaurin formula:
$$\ln(n-1)! = \sum_{k=1}^{n-1}\ln k = \int_1^n\ln x\,dx - \frac{1}{2}\Big[\ln x\Big]_1^n + \frac{1}{12}\left[\frac{1}{x}\right]_1^n - \frac{1}{720}\left[\frac{2}{x^3}\right]_1^n + \cdots =$$
$$= n\ln n - n + 1 - \frac{1}{2}\ln n + \frac{1}{12n} - \frac{1}{12} - \frac{1}{360n^3} + \frac{1}{360} + \cdots.$$
Here we have used the fact that $\int\ln x\,dx = x\ln x - x$. At this point we can add ln n to both sides and introduce a constant $\sigma = 1 - 1/12 + 1/360 - \cdots$. It is by no means easy to determine the value of σ directly, but by other approaches to the same problem it is known that $\sigma = \ln\sqrt{2\pi}$. Numerically, we can observe that:

$$1 - \frac{1}{12} + \frac{1}{360} = 0.91944\ldots \qquad \ln\sqrt{2\pi} \approx 0.9189388.$$
We can now go on with our sum:

$$\ln n! = n\ln n - n + \frac{1}{2}\ln n + \ln\sqrt{2\pi} + \frac{1}{12n} - \frac{1}{360n^3} + \cdots$$
To obtain the value of n! we only have to take exponentials:

$$n! = \frac{n^n}{e^n}\,\sqrt{n}\,\sqrt{2\pi}\,\exp\!\left(\frac{1}{12n}\right)\exp\!\left(-\frac{1}{360n^3}\right)\cdots =$$
$$= \sqrt{2\pi n}\,\frac{n^n}{e^n}\left(1 + \frac{1}{12n} + \frac{1}{288n^2} + \cdots\right)\left(1 - \frac{1}{360n^3} + \cdots\right) =$$
$$= \sqrt{2\pi n}\,\frac{n^n}{e^n}\left(1 + \frac{1}{12n} + \frac{1}{288n^2} - \cdots\right).$$
This is the well-known Stirling’s approximation for n!. By means of this approximation, we can also find the approximation for another important quantity:

$$\binom{2n}{n} = \frac{(2n)!}{n!^2} = \frac{\sqrt{4\pi n}\,\left(\dfrac{2n}{e}\right)^{2n}\left(1 + \dfrac{1}{24n} + \dfrac{1}{1152n^2} - \cdots\right)}{2\pi n\left(\dfrac{n}{e}\right)^{2n}\left(1 + \dfrac{1}{12n} + \dfrac{1}{288n^2} - \cdots\right)^{2}} = \frac{4^n}{\sqrt{\pi n}}\left(1 - \frac{1}{8n} + \frac{1}{128n^2} + \cdots\right).$$
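Both approximations can be compared against the exact values; a Python sketch (our own illustration):

```python
from math import comb, e, factorial, pi, sqrt

def stirling(n):
    """Stirling's approximation for n! with two correction terms."""
    return sqrt(2 * pi * n) * (n / e) ** n * (1 + 1 / (12 * n)
                                              + 1 / (288 * n ** 2))

def central(n):
    """Asymptotic approximation of the central binomial coefficient."""
    return 4 ** n / sqrt(pi * n) * (1 - 1 / (8 * n) + 1 / (128 * n ** 2))

# the relative errors of both approximations are O(n^-3)
```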

Another application of the Euler-McLaurin summation formula is given by the sum $\sum_{k=1}^n k^p$, when p is any integer constant different from −1 (p = −1 being the case of the harmonic numbers):

$$\sum_{k=0}^{n-1} k^p = \int_0^n x^p\,dx - \frac{1}{2}\Big[x^p\Big]_0^n + \frac{1}{12}\Big[px^{p-1}\Big]_0^n - \frac{1}{720}\Big[p(p-1)(p-2)x^{p-3}\Big]_0^n + \cdots =$$
$$= \frac{n^{p+1}}{p+1} - \frac{n^p}{2} + \frac{p\,n^{p-1}}{12} - \frac{p(p-1)(p-2)\,n^{p-3}}{720} + \cdots.$$
In this case the evaluation at 0 does not introduce any constant. By adding \(n^p\) to both sides, we have the following formula, which only contains a finite number of terms:
\[ \sum_{k=0}^{n} k^p = \frac{n^{p+1}}{p+1} + \frac{n^p}{2} + \frac{pn^{p-1}}{12} - \frac{p(p-1)(p-2)n^{p-3}}{720} + \cdots \]
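For a positive integer p the expansion terminates and is exact. A quick check (our own naming) for p = 2 and p = 3 against direct summation:

```python
def sum_powers(n, p):
    return sum(k ** p for k in range(n + 1))

def euler_maclaurin_p2(n):
    # n^3/3 + n^2/2 + n/6: the p = 2 instance of the formula above
    return n ** 3 / 3 + n ** 2 / 2 + n / 6

def euler_maclaurin_p3(n):
    # n^4/4 + n^3/2 + n^2/4: the p = 3 instance; the x^0 term of the
    # expansion cancels between the two evaluation limits
    return n ** 4 / 4 + n ** 3 / 2 + n ** 2 / 4

print(sum_powers(100, 2), euler_maclaurin_p2(100))
print(sum_powers(100, 3), euler_maclaurin_p3(100))
```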

If p is not an integer, after ⌈p⌉ differentiations we obtain a power \(x^q\) with q < 0, and therefore we cannot evaluate it at the limit 0. We proceed with the Euler-McLaurin formula in the following way:

\[ \sum_{k=1}^{n-1} k^p = \int_1^n x^p\,dx - \frac{1}{2}\Big[x^p\Big]_1^n + \frac{1}{12}\Big[px^{p-1}\Big]_1^n - \frac{1}{720}\Big[p(p-1)(p-2)x^{p-3}\Big]_1^n + \cdots = \]
\[ = \frac{n^{p+1}}{p+1} - \frac{1}{p+1} - \frac{n^p}{2} + \frac{1}{2} + \frac{pn^{p-1}}{12} - \frac{p}{12} - \frac{p(p-1)(p-2)n^{p-3}}{720} + \frac{p(p-1)(p-2)}{720} + \cdots \]
By adding \(n^p\) to both sides and collecting all the constant terms into a single constant, we get:
\[ \sum_{k=1}^{n} k^p = \frac{n^{p+1}}{p+1} + \frac{n^p}{2} + \frac{pn^{p-1}}{12} - \frac{p(p-1)(p-2)n^{p-3}}{720} + \cdots + K_p. \]

The constant:
\[ K_p = -\frac{1}{p+1} + \frac{1}{2} - \frac{p}{12} + \frac{p(p-1)(p-2)}{720} + \cdots \]

has a fundamental rôle when the leading term \(n^{p+1}/(p+1)\) does not increase with n, i.e., when p < −1. In that case, in fact, the sum converges to \(K_p\). When p > −1 the constant is less important.

For example, we have:

\[ \sum_{k=1}^{n}\sqrt{k} = \frac{2}{3}n\sqrt{n} + \frac{\sqrt{n}}{2} + K_{1/2} + \frac{1}{24\sqrt{n}} + \cdots \qquad K_{1/2} \approx -0.2078862\ldots \]
\[ \sum_{k=1}^{n}\frac{1}{\sqrt{k}} = 2\sqrt{n} + K_{-1/2} + \frac{1}{2\sqrt{n}} - \frac{1}{24n\sqrt{n}} + \cdots \qquad K_{-1/2} \approx -1.4603545\ldots \]

For p = −2 we find:
\[ \sum_{k=1}^{n}\frac{1}{k^2} = K_{-2} - \frac{1}{n} + \frac{1}{2n^2} - \frac{1}{6n^3} + \frac{1}{30n^5} - \cdots \]
It is possible to show that \(K_{-2} = \pi^2/6\), and therefore we have a way to approximate the sum (see Section 2.7).
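The constants \(K_p\) can be estimated numerically by subtracting the n-dependent part of each expansion from the exact partial sum. The following sketch (our own naming) reproduces the values quoted above:

```python
import math

def K_half(n):
    s = sum(math.sqrt(k) for k in range(1, n + 1))
    return s - (2 / 3) * n * math.sqrt(n) - math.sqrt(n) / 2 - 1 / (24 * math.sqrt(n))

def K_minus_half(n):
    s = sum(1 / math.sqrt(k) for k in range(1, n + 1))
    return s - 2 * math.sqrt(n) - 1 / (2 * math.sqrt(n)) + 1 / (24 * n * math.sqrt(n))

def K_minus_two(n):
    s = sum(1 / k ** 2 for k in range(1, n + 1))
    return s + 1 / n - 1 / (2 * n ** 2) + 1 / (6 * n ** 3) - 1 / (30 * n ** 5)

print(K_half(1000))        # ≈ -0.2078862
print(K_minus_half(1000))  # ≈ -1.4603545
print(K_minus_two(1000))   # ≈ pi^2/6 ≈ 1.6449341
```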

Chapter 7

Asymptotics

7.1 The convergence of power series

On many occasions we have pointed out that our approach to power series was purely formal. Because of that, we always spoke of "formal power series" and never considered convergence problems. As we have seen, a lot can be said about formal power series, but the moment has now arrived to talk about the convergence of power series. We will see that this allows us to evaluate the asymptotic behavior of the coefficients \(f_n\) of a power series \(\sum_n f_n t^n\), thus solving many problems in which the exact value of \(f_n\) cannot be found. In fact, many times the asymptotic evaluation of \(f_n\) can be made more precise, and an actual approximation of \(f_n\) can be found.

The natural setting for talking about convergence is the field C of the complex numbers; therefore, from now on, we will think of the indeterminate t as a variable taking its values in C. Obviously, a power series \(f(t) = \sum_n f_n t^n\) converges for some \(t_0 \in C\) iff \(f(t_0) = \sum_n f_n t_0^n < \infty\), and diverges iff \(\lim_{t\to t_0} f(t) = \infty\). There are cases in which a series neither converges nor diverges; for example, when t = 1, the series \(\sum_n (-1)^n t^n\) does not tend to any limit, finite or infinite. Therefore, when we say that a series does not converge (to a finite value) for a given value \(t_0 \in C\), we mean that the series in \(t_0\) diverges or does not tend to any limit.

A basic result on convergence is given by the following:

Theorem 7.1.1 Let \(f(t) = \sum_n f_n t^n\) be a power series such that \(f(t_0)\) converges for the value \(t_0 \in C\). Then f(t) converges for every \(t_1 \in C\) such that \(|t_1| < |t_0|\).

Proof: If \(f(t_0) < \infty\) then an index \(N \in \mathbf{N}\) exists such that for every n > N we have \(|f_n t_0^n| \le |f_n||t_0|^n < M\), for some finite \(M \in \mathbf{R}\). This means \(|f_n| < M/|t_0|^n\) and therefore:
\[ \sum_{n=N}^{\infty} f_n t_1^n \le \sum_{n=N}^{\infty} |f_n||t_1|^n \le \sum_{n=N}^{\infty} \frac{M}{|t_0|^n}\,|t_1|^n = M\sum_{n=N}^{\infty}\left(\frac{|t_1|}{|t_0|}\right)^n < \infty \]
because the last sum is a geometric series with ratio \(|t_1|/|t_0| < 1\) by the hypothesis \(|t_1| < |t_0|\). Since the first N terms obviously amount to a finite quantity, the theorem follows.

In a similar way, we can prove that if the series diverges for some value \(t_0 \in C\), then it diverges for every value \(t_1\) such that \(|t_1| > |t_0|\). Obviously, a series can converge for the single value \(t_0 = 0\), as happens for \(\sum_n n!\,t^n\), or can converge for every value \(t \in C\), as for \(\sum_n t^n/n! = e^t\). In all the other cases, the previous theorem implies:

Theorem 7.1.2 Let \(f(t) = \sum_n f_n t^n\) be a power series; then there exists a non-negative number \(R \in \mathbf{R}\), or \(R = \infty\), such that:

1. for every complex number \(t_0\) such that \(|t_0| < R\) the series (absolutely) converges and, in fact, the convergence is uniform in every circle of radius \(\rho < R\);

2. for every complex number \(t_0\) such that \(|t_0| > R\) the series does not converge.

The uniform convergence derives from the previous proof: the constant M can be made unique by choosing the largest value for all the \(t_0\) such that \(|t_0| \le \rho\). The value of R is uniquely determined and is called the radius of convergence of the series. From the proof of the theorem, for r < R we have \(|f_n|r^n \le M\), or \(\sqrt[n]{|f_n|} \le \sqrt[n]{M}/r \to 1/r\); this implies \(\limsup_n \sqrt[n]{|f_n|} \le 1/R\). Besides, for r > R we have \(|f_n|r^n \ge 1\) for infinitely many n; this implies \(\limsup_n \sqrt[n]{|f_n|} \ge 1/R\), and therefore we have the following formula for the radius of convergence:
\[ \frac{1}{R} = \limsup_{n\to\infty} \sqrt[n]{|f_n|}. \]


This result is the basis for our considerations on the asymptotics of the coefficients of a power series. In fact, it implies that, as a first approximation, \(|f_n|\) grows as \(1/R^n\). However, this is a rough estimate, because the coefficients can also grow as \(n/R^n\) or \(1/(nR^n)\), and many possibilities arise which can make the basic approximation more precise; the next sections will be dedicated to this problem. We conclude by noticing that if:
\[ \lim_{n\to\infty}\frac{f_{n+1}}{f_n} = S \]
then R = 1/S is the radius of convergence of the series.
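As a quick illustration of this ratio criterion, the Catalan numbers (whose generating function reappears in Section 7.5) have radius of convergence R = 1/4; a Python sketch with our own helper name:

```python
# Catalan numbers via C_{n+1} = C_n * 2(2n+1)/(n+2); the coefficient ratio
# f_{n+1}/f_n tends to S = 4, so the radius of convergence is R = 1/S = 1/4.
def catalan(n_max):
    c = [1]
    for n in range(n_max):
        c.append(c[-1] * 2 * (2 * n + 1) // (n + 2))
    return c

c = catalan(2000)
ratio = c[-1] / c[-2]
print(ratio, 1 / ratio)   # ≈ 4 and ≈ 0.25
```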

7.2 The method of Darboux

Newton's rule is the basis for many considerations on asymptotics. In practice, we used it to prove that \(F_n \sim \phi^n/\sqrt{5}\), and many other proofs can be performed by using Newton's rule together with the following theorem, whose relevance was noted by Bender and which, therefore, will be called Bender's theorem:

Theorem 7.2.1 Let \(f(t) = g(t)h(t)\), where f(t), g(t), h(t) are power series and h(t) has a radius of convergence larger than f(t)'s (which therefore equals the radius of convergence of g(t)); if \(\lim_{n\to\infty} g_n/g_{n+1} = b\) and \(h(b) \ne 0\), then:
\[ f_n \sim h(b)\,g_n. \]

Let us remember that if g(t) has positive real coefficients, then \(g_n/g_{n+1}\) tends to the radius of convergence of g(t). The proof of this theorem is omitted here; instead, we give a simple example. Let us suppose we wish to find the asymptotic value of the Motzkin numbers, whose generating function is:
\[ \mu(t) = \frac{1 - t - \sqrt{1 - 2t - 3t^2}}{2t^2}. \]

For n ≥ 2 we obviously have:
\[ \mu_n = [t^n]\,\frac{1 - t - \sqrt{1 - 2t - 3t^2}}{2t^2} = -\frac{1}{2}\,[t^{n+2}]\sqrt{1+t}\,(1 - 3t)^{1/2}. \]

We now observe that the radius of convergence of \(\mu(t)\) is R = 1/3, which is the same as the radius of \(g(t) = (1-3t)^{1/2}\), while \(h(t) = \sqrt{1+t}\) has 1 as its radius of convergence; therefore we have \(\mu_n/\mu_{n+1} \to 1/3\) as \(n \to \infty\). By Bender's theorem we find:

\[ \mu_n \sim -\frac{1}{2}\sqrt{\frac{4}{3}}\,[t^{n+2}](1-3t)^{1/2} = -\frac{\sqrt{3}}{3}\binom{1/2}{n+2}(-3)^{n+2} = \frac{\sqrt{3}}{3(2n+3)}\binom{2n+4}{n+2}\left(\frac{3}{4}\right)^{n+2}. \]
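A numerical sketch of this estimate, computing the Motzkin numbers through their standard convolution recurrence \(M_{n+1} = M_n + \sum_{k=0}^{n-1} M_k M_{n-1-k}\) (the function names are ours):

```python
import math

def motzkin(n_max):
    m = [1, 1]
    for n in range(1, n_max):
        m.append(m[n] + sum(m[k] * m[n - 1 - k] for k in range(n)))
    return m

def motzkin_asym(n):
    # the Bender estimate derived above
    return math.sqrt(3) / (3 * (2 * n + 3)) * math.comb(2 * n + 4, n + 2) * (3 / 4) ** (n + 2)

m = motzkin(400)
print(m[400] / motzkin_asym(400))   # ratio tends to 1 as n grows
```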

This is a particular case of a more general result due to Darboux, known as Darboux' method. First of all, let us show how it is possible to obtain an approximation for the binomial coefficient \(\binom{\gamma}{n}\), when \(\gamma \in C\) is a fixed number and n is large. We begin by proving the following formula for the ratio of two large values of the \(\Gamma\) function (a, b are two small parameters with respect to n):
\[ \frac{\Gamma(n+a)}{\Gamma(n+b)} = n^{a-b}\left(1 + \frac{(a-b)(a+b-1)}{2n} + O\!\left(\frac{1}{n^2}\right)\right). \]

Let us apply the Stirling formula for the \(\Gamma\) function:
\[ \frac{\Gamma(n+a)}{\Gamma(n+b)} \approx \sqrt{\frac{2\pi}{n+a}}\left(\frac{n+a}{e}\right)^{n+a}\left(1 + \frac{1}{12(n+a)}\right)\sqrt{\frac{n+b}{2\pi}}\left(\frac{e}{n+b}\right)^{n+b}\left(1 - \frac{1}{12(n+b)}\right). \]
If we limit ourselves to the terms in 1/n, the two corrections cancel each other and therefore we find:
\[ \frac{\Gamma(n+a)}{\Gamma(n+b)} \approx \sqrt{\frac{n+b}{n+a}}\;e^{b-a}\,\frac{(n+a)^{n+a}}{(n+b)^{n+b}} = \sqrt{\frac{n+b}{n+a}}\;e^{b-a}\,n^{a-b}\,\frac{(1+a/n)^{n+a}}{(1+b/n)^{n+b}}. \]

We now obtain asymptotic approximations in the following way:
\[ \sqrt{\frac{n+b}{n+a}} = \sqrt{\frac{1+b/n}{1+a/n}} \approx \left(1 + \frac{b}{2n}\right)\left(1 - \frac{a}{2n}\right) \approx 1 + \frac{b-a}{2n}. \]

\[ \left(1+\frac{x}{n}\right)^{n+x} = \exp\!\left((n+x)\ln\!\left(1+\frac{x}{n}\right)\right) = \exp\!\left((n+x)\left(\frac{x}{n} - \frac{x^2}{2n^2} + \cdots\right)\right) = \]
\[ = \exp\!\left(x + \frac{x^2}{n} - \frac{x^2}{2n} + \cdots\right) = e^x\left(1 + \frac{x^2}{2n} + \cdots\right). \]

Therefore, for our expression we have:
\[ \frac{\Gamma(n+a)}{\Gamma(n+b)} \approx n^{a-b}\,e^{b-a}\,\frac{e^a}{e^b}\left(1 + \frac{a^2}{2n}\right)\left(1 - \frac{b^2}{2n}\right)\left(1 + \frac{b-a}{2n}\right) = \]
\[ = n^{a-b}\left(1 + \frac{a^2 - b^2 - a + b}{2n} + O\!\left(\frac{1}{n^2}\right)\right). \]
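The formula can be checked numerically through `math.lgamma`, which avoids overflow for large arguments (the parameter values below are arbitrary choices of ours):

```python
import math

def gamma_ratio(n, a, b):
    """Exact Gamma(n+a)/Gamma(n+b) via log-gamma."""
    return math.exp(math.lgamma(n + a) - math.lgamma(n + b))

def gamma_ratio_approx(n, a, b):
    """n^(a-b) * (1 + (a-b)(a+b-1)/(2n))."""
    return n ** (a - b) * (1 + (a - b) * (a + b - 1) / (2 * n))

n, a, b = 1000, 0.3, -0.7
print(gamma_ratio(n, a, b) / gamma_ratio_approx(n, a, b))  # ≈ 1
```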

We are now in a position to prove the following:


Theorem 7.2.2 Let \(f(t) = h(t)(1-\alpha t)^{\gamma}\), for some \(\gamma\) which is not a positive integer, and h(t) having a radius of convergence larger than \(1/\alpha\). Then we have:
\[ f_n = [t^n]f(t) \sim h\!\left(\frac{1}{\alpha}\right)\binom{\gamma}{n}(-\alpha)^n = \frac{\alpha^n\,h(1/\alpha)}{\Gamma(-\gamma)\,n^{1+\gamma}}. \]

Proof: We simply apply Bender's theorem and the formula for approximating the binomial coefficient:
\[ \binom{\gamma}{n} = \frac{\gamma(\gamma-1)\cdots(\gamma-n+1)}{n!} = (-1)^n\,\frac{(n-\gamma-1)(n-\gamma-2)\cdots(1-\gamma)(-\gamma)}{\Gamma(n+1)}. \]
By repeated applications of the recurrence formula for the \(\Gamma\) function, \(\Gamma(x+1) = x\Gamma(x)\), we find:
\[ \Gamma(n-\gamma) = (n-\gamma-1)(n-\gamma-2)\cdots(1-\gamma)(-\gamma)\,\Gamma(-\gamma) \]
and therefore:
\[ \binom{\gamma}{n} = (-1)^n\,\frac{\Gamma(n-\gamma)}{\Gamma(n+1)\Gamma(-\gamma)} = \frac{(-1)^n}{\Gamma(-\gamma)}\,n^{-1-\gamma}\left(1 + \frac{\gamma(\gamma+1)}{2n} + \cdots\right) \]
from which the desired formula follows.

7.3 Singularities: poles

The considerations in the previous sections show how important it is to determine the radius of convergence of a series when we wish to have an approximation of its coefficients. Therefore, we are now going to look more closely at methods for finding the radius of convergence of a given function f(t). In order to do this, we have to distinguish between the function f(t) and its series development, which will be denoted by \(\hat{f}(t)\). So, for example, we have \(f(t) = 1/(1-t)\) and \(\hat{f}(t) = 1 + t + t^2 + t^3 + \cdots\). The series \(\hat{f}(t)\) represents f(t) inside the circle of convergence, in the sense that \(\hat{f}(t) = f(t)\) for every t internal to this circle, but \(\hat{f}(t)\) can be different from f(t) on the border of the circle or outside it, where actually the series does not converge. Therefore, the radius of convergence can be determined if we are able to find values \(t_0\) for which \(f(t_0) \ne \hat{f}(t_0)\).
).

A first case is when \(t_0\) is an isolated point for which \(\lim_{t\to t_0} f(t) = \infty\). In fact, in this case, for every t such that \(|t| > |t_0|\) the series \(\hat{f}(t)\) should diverge, as we have seen in the previous section, and therefore \(\hat{f}(t)\) must be different from f(t). This is the case of \(f(t) = 1/(1-t)\), which goes to \(\infty\) as \(t \to 1\). When \(|t| > 1\), f(t) assumes a well-defined value while \(\hat{f}(t)\) diverges.

We will call a singularity of f(t) every point \(t_0 \in C\) such that in every neighborhood of \(t_0\) there is a t for which f(t) and \(\hat{f}(t)\) behave differently, and a \(t'\) for which \(f(t') = \hat{f}(t')\). Therefore, \(t_0 = 1\) is a singularity of \(f(t) = 1/(1-t)\). Because of our previous considerations, the singularities of f(t) determine its radius of convergence; on the other hand, no singularity can be contained in the circle of convergence, and therefore the radius of convergence is determined by the singularity or singularities of smallest modulus. These will be called dominating singularities, and we observe explicitly that a function can have more than one dominating singularity. For example, \(f(t) = 1/(1-t^2)\) has t = 1 and t = −1 as dominating singularities, because \(|1| = |-1|\). The radius of convergence is always a non-negative real number and we have \(R = |t_0|\), if \(t_0\) is any one of the dominating singularities of f(t).

An isolated point \(t_0\) for which \(f(t_0) = \infty\) is therefore a singularity for f(t); as we shall see, not every singularity \(t_0\) of f(t) is such that \(f(t_0) = \infty\), but, for the moment, let us limit ourselves to this case. The following situation is very important: if \(f(t_0) = \infty\) and we set \(\alpha = 1/t_0\), we will say that \(t_0\) is a pole for f(t) iff there exists a positive integer m such that:
\[ \lim_{t\to t_0}(1-\alpha t)^m f(t) = K \quad\text{with } K \text{ finite and } K \ne 0. \]

The integer m is called the order of the pole. By this definition, the function \(f(t) = 1/(1-t)\) has a pole of order 1 at \(t_0 = 1\), while \(1/(1-t)^2\) has a pole of order 2 at \(t_0 = 1\), and \(1/(1-2t)^5\) has a pole of order 5 at \(t_0 = 1/2\). A more interesting case is \(f(t) = (e^t - e)/(1-t)^2\), which, notwithstanding the \((1-t)^2\), has a pole of order 1 at \(t_0 = 1\); in fact:

\[ \lim_{t\to 1}(1-t)\,\frac{e^t - e}{(1-t)^2} = \lim_{t\to 1}\frac{e^t - e}{1-t} = \lim_{t\to 1}\frac{e^t}{-1} = -e, \]
the last step by l'Hôpital's rule.

The generating function of the Bernoulli numbers, \(f(t) = t/(e^t - 1)\), has infinitely many poles. Observe first that t = 0 is not a pole because:
\[ \lim_{t\to 0}\frac{t}{e^t - 1} = \lim_{t\to 0}\frac{1}{e^t} = 1. \]

The denominator becomes 0 when \(e^t = 1\), and this happens when \(t = 2k\pi i\); in fact, \(e^{2k\pi i} = \cos 2k\pi + i\sin 2k\pi = 1\). In this case, the dominating singularities are \(t_0 = 2\pi i\) and \(t_1 = -2\pi i\). Finally, the generating function of the ordered Bell numbers, \(f(t) = 1/(2 - e^t)\), has again an infinite number of poles \(t = \ln 2 + 2k\pi i\); in this case the dominating singularity is \(t_0 = \ln 2\).

We conclude this section by observing that if \(f(t_0) = \infty\), then \(t_0\) is not necessarily a pole for f(t). In

fact, let us consider the generating function for the central binomial coefficients, \(f(t) = 1/\sqrt{1-4t}\). For \(t_0 = 1/4\) we have \(f(1/4) = \infty\), but \(t_0\) is not a pole of order 1 because:
\[ \lim_{t\to 1/4}\frac{1-4t}{\sqrt{1-4t}} = \lim_{t\to 1/4}\sqrt{1-4t} = 0 \]
and the same happens if we try with \((1-4t)^m\) for m > 1. As we shall see, this kind of singularity is called "algebraic". Finally, let us consider the function \(f(t) = \exp(1/(1-t))\), which goes to \(\infty\) as \(t \to 1\). In this case we have:

In this case we have:

\[ \lim_{t\to 1}(1-t)^m \exp\!\left(\frac{1}{1-t}\right) = \lim_{t\to 1}(1-t)^m\left(1 + \frac{1}{1-t} + \frac{1}{2(1-t)^2} + \cdots\right) = \infty. \]

In fact, the first m − 1 terms tend to 0, the mth term tends to 1/m!, but all the other terms go to \(\infty\). Therefore, \(t_0 = 1\) is not a pole of any order. Whenever we have a function f(t) for which a point \(t_0 \in C\) exists such that \(\forall m > 0:\ \lim_{t\to t_0}(1 - t/t_0)^m f(t) = \infty\), we say that \(t_0\) is an essential singularity for f(t). Essential singularities are points at which f(t) goes to \(\infty\) too fast; these singularities cannot be treated by Darboux' method and their study will be delayed until we study Hayman's method.

7.4 Poles and asymptotics

Darboux' method can easily be used to deal with functions whose dominating singularities are poles. Actually, a direct application of Bender's theorem is sufficient, and this is the approach we will follow in the examples below.

Fibonacci numbers are easily approximated:
\[ [t^n]\frac{t}{1-t-t^2} = [t^n]\,\frac{t}{1-\hat{\phi}t}\cdot\frac{1}{1-\phi t} \sim \left[\frac{t}{1-\hat{\phi}t}\right]_{t=1/\phi}[t^n]\frac{1}{1-\phi t} = \frac{1}{\sqrt{5}}\,\phi^n. \]
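A one-line numerical confirmation (our own sketch): \(\phi^n/\sqrt{5}\) is so close to \(F_n\) that rounding it recovers the exact value.

```python
import math

phi = (1 + math.sqrt(5)) / 2

def fib(n):
    # F_0 = 0, F_1 = 1, matching the generating function t/(1 - t - t^2)
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

n = 30
print(fib(n), round(phi ** n / math.sqrt(5)))  # the two values coincide
```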

Our second example concerns a particular kind of permutation, called derangements (see Section 2.2). A derangement is a permutation without any fixed point. For n = 0 the empty permutation is considered a derangement, since no fixed point exists. For n = 1 there is no derangement, but for n = 2 the permutation (1 2), written in cycle notation, is actually a derangement. For n = 3 we have the two derangements (1 2 3) and (1 3 2), and for n = 4 we have a total of 9 derangements.

Let \(D_n\) be the number of derangements in \(P_n\); we can count them in the following way: we begin by subtracting from n!, the total number of permutations, the number of permutations having at least one fixed point. If the fixed point is 1, we have (n−1)! possible permutations; if the fixed point is 2, we again have (n−1)! permutations of the other elements. Therefore, we have a total of n(n−1)! cases, giving the approximation:
\[ D_n = n! - n(n-1)!. \]

This quantity is clearly 0, and this happens because we have subtracted twice every permutation with at least 2 fixed points: in fact, we subtracted it once when we considered the first fixed point and again when we considered the second. Therefore, we now have to add back permutations with at least two fixed points. These are obtained by choosing the two fixed points in all the \(\binom{n}{2}\) possible ways and then permuting the n − 2 remaining elements. Thus we have the new approximation:
\[ D_n = n! - n(n-1)! + \binom{n}{2}(n-2)!. \]

In this way, however, we have added twice the permutations with at least three fixed points, which have to be subtracted again. We thus obtain:
\[ D_n = n! - n(n-1)! + \binom{n}{2}(n-2)! - \binom{n}{3}(n-3)!. \]

We can now go on with the same method, which is called the inclusion-exclusion principle, and eventually arrive at the final value:
\[ D_n = n! - n(n-1)! + \binom{n}{2}(n-2)! - \binom{n}{3}(n-3)! + \cdots = \]
\[ = \frac{n!}{0!} - \frac{n!}{1!} + \frac{n!}{2!} - \frac{n!}{3!} + \cdots = n!\sum_{k=0}^{n}\frac{(-1)^k}{k!}. \]

This formula checks with the previously found values. We obtain the exponential generating function \(\mathcal{G}(D_n/n!)\) by observing that the generic element in the sum is the coefficient \([t^n]e^{-t}\), and therefore, by the theorem on the generating function for the partial sums, we have:
\[ \mathcal{G}\!\left(\frac{D_n}{n!}\right) = \frac{e^{-t}}{1-t}. \]

In order to find the asymptotic value of \(D_n\), we observe that the radius of convergence of \(1/(1-t)\) is 1, while \(e^{-t}\) converges for every value of t. By Bender's theorem we have:
\[ \frac{D_n}{n!} \sim e^{-1} \qquad\text{or}\qquad D_n \sim \frac{n!}{e}. \]


This value is indeed a very good approximation for \(D_n\), which can actually be computed as the integer nearest to n!/e.
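This can be verified with the standard derangement recurrence \(D_n = (n-1)(D_{n-1} + D_{n-2})\) (the helper name below is ours):

```python
import math

def derangements(n_max):
    # D_n = (n-1) * (D_{n-1} + D_{n-2}), with D_0 = 1 and D_1 = 0
    d = [1, 0]
    for n in range(2, n_max + 1):
        d.append((n - 1) * (d[n - 1] + d[n - 2]))
    return d

d = derangements(10)
print(d[4], round(math.factorial(4) / math.e))    # both are 9
print(d[10], round(math.factorial(10) / math.e))  # both are 1334961
```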

Let us now see how Bender's theorem is applied to the exponential generating function of the ordered Bell numbers. We have shown that the dominating singularity is a pole at t = ln 2, which has order 1:
\[ \lim_{t\to\ln 2}\frac{1 - t/\ln 2}{2 - e^t} = \lim_{t\to\ln 2}\frac{-1/\ln 2}{-e^t} = \frac{1}{2\ln 2}. \]

At this point we have:
\[ [t^n]\frac{1}{2-e^t} = [t^n]\,\frac{1}{1-t/\ln 2}\cdot\frac{1-t/\ln 2}{2-e^t} \sim \left[\frac{1-t/\ln 2}{2-e^t}\right]_{t=\ln 2}[t^n]\frac{1}{1-t/\ln 2} = \frac{1}{2}\,\frac{1}{(\ln 2)^{n+1}} \]
and we conclude with the very good approximation \(O_n \sim n!/(2(\ln 2)^{n+1})\).
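How good this approximation is can be seen by computing the ordered Bell numbers through the recurrence \(O_n = \sum_{k=1}^{n}\binom{n}{k}O_{n-k}\) (a sketch with our own naming):

```python
import math

def ordered_bell(n_max):
    o = [1]
    for n in range(1, n_max + 1):
        o.append(sum(math.comb(n, k) * o[n - k] for k in range(1, n + 1)))
    return o

o = ordered_bell(15)
n = 15
approx = math.factorial(n) / (2 * math.log(2) ** (n + 1))
print(o[n] / approx)   # already extremely close to 1
```

The next poles lie at \(\ln 2 \pm 2\pi i\), far from the dominating one, which is why the single-pole estimate is so accurate even for moderate n.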

Finally, we find the asymptotic approximation for the Bernoulli numbers. The following statement is very important when we have functions with several dominating singularities:

Principle: If \(t_1, t_2, \ldots, t_k\) are all the dominating singularities of a function f(t), then \([t^n]f(t)\) can be found by summing all the contributions obtained by independently considering the k singularities.

We already observed that \(\pm 2\pi i\) are the two dominating singularities for the generating function of the Bernoulli numbers; they are both poles of order 1:
\[ \lim_{t\to 2\pi i}\frac{t(1 - t/2\pi i)}{e^t - 1} = \lim_{t\to 2\pi i}\frac{1 - t/\pi i}{e^t} = -1, \]
\[ \lim_{t\to -2\pi i}\frac{t(1 + t/2\pi i)}{e^t - 1} = \lim_{t\to -2\pi i}\frac{1 + t/\pi i}{e^t} = -1. \]

Therefore we have:
\[ [t^n]\frac{t}{e^t - 1} = [t^n]\,\frac{1}{1 - t/2\pi i}\cdot\frac{t(1 - t/2\pi i)}{e^t - 1} \sim -\frac{1}{(2\pi i)^n}. \]

A similar result is obtained for the other pole; thus we have:
\[ \frac{B_n}{n!} \sim -\frac{1}{(2\pi i)^n} - \frac{1}{(-2\pi i)^n}. \]

When n is odd, these two values are opposite in sign and the result is 0; this confirms that the Bernoulli numbers of odd index are 0, except for n = 1. When n is even, say n = 2k, we have \((2\pi i)^{2k} = (-2\pi i)^{2k} = (-1)^k(2\pi)^{2k}\); therefore:
\[ B_{2k} \sim -\frac{2(-1)^k(2k)!}{(2\pi)^{2k}}. \]
This formula is a good approximation, even for small values of n, and shows that the Bernoulli numbers become, in modulus, larger and larger as n increases.
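The quality of the approximation can be tested with exact Bernoulli numbers computed from the classical recurrence \(\sum_{j=0}^{m}\binom{m+1}{j}B_j = 0\) (a sketch using exact rationals; the helper name is ours):

```python
import math
from fractions import Fraction

def bernoulli(n_max):
    # sum_{j=0}^{m} C(m+1, j) B_j = 0 gives B_m = -s / (m+1), with B_0 = 1
    b = [Fraction(1)]
    for m in range(1, n_max + 1):
        s = sum(Fraction(math.comb(m + 1, j)) * b[j] for j in range(m))
        b.append(-s / (m + 1))
    return b

b = bernoulli(20)
k = 10                                              # look at B_20
approx = -2 * (-1) ** k * math.factorial(2 * k) / (2 * math.pi) ** (2 * k)
print(float(b[20]), approx)                         # both ≈ -529.124
```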

7.5 Algebraic and logarithmic singularities

Let us consider the generating function for the Catalan numbers, \(f(t) = (1 - \sqrt{1-4t})/(2t)\), and the corresponding power series \(\hat{f}(t) = 1 + t + 2t^2 + 5t^3 + 14t^4 + \cdots\). Our choice of the − sign was motivated by the initial condition of the recurrence \(C_{n+1} = \sum_{k=0}^{n} C_k C_{n-k}\) defining the Catalan numbers. This is due to the fact that, when the argument is a positive real number, we can choose the positive value as the result of a square root. In other words, we consider the arithmetic square root instead of the algebraic square root. This allows us to identify the power series \(\hat{f}(t)\) with the function f(t), but when we pass to complex numbers this is no longer possible. Actually, in the complex field, a function containing a square root is a two-valued function, and there are two branches defined by the same expression. Only one of these two branches coincides with the function defined by the power series, which is obviously a one-valued function.

The points at which a square root becomes 0 are special points; there the function is one-valued, but in every neighborhood the function is two-valued. For the smallest in modulus among these points, say \(t_0\), we must have the following situation: for t such that \(|t| < |t_0|\), \(\hat{f}(t)\) should coincide with a branch of f(t), while for t such that \(|t| > |t_0|\), \(\hat{f}(t)\) cannot converge. In fact, consider a \(t \in \mathbf{R}\), \(t > |t_0|\); the expression under the square root should be a negative real number and therefore \(f(t) \in C\setminus\mathbf{R}\); but \(\hat{f}(t)\) can only be a real number, or else it does not converge. Because we know that when \(\hat{f}(t)\) converges we must have \(\hat{f}(t) = f(t)\), we conclude that \(\hat{f}(t)\) cannot converge. This shows that \(t_0\) is a singularity for f(t).

Every kth root originates the same problem, the function being actually a k-valued function; every point at which the argument of the root is 0 is a singularity, called an algebraic singularity. Such singularities can be treated by Darboux' method or, directly, by means of Bender's theorem, which relies on Newton's rule. Actually, we already used this method to find the asymptotic evaluation of the Motzkin numbers.

The same considerations hold when a function contains a logarithm. In fact, a logarithm is an infinite-valued function, because it is the inverse of the exponential, which, in the complex field C, is a periodic function:
\[ e^{t+2k\pi i} = e^t e^{2k\pi i} = e^t(\cos 2k\pi + i\sin 2k\pi) = e^t. \]
The period of \(e^t\) is therefore \(2\pi i\), and \(\ln t\) is actually \(\ln t + 2k\pi i\), for \(k \in \mathbf{Z}\). A point \(t_0\) for which the argument of a logarithm is 0 is a singularity for the


corresponding function. In every neighborhood of \(t_0\), the function has an infinite number of branches; this is the only fact distinguishing a logarithmic singularity from an algebraic one.

Let us suppose we have the sum:
\[ S_n = 1 + \frac{2}{2} + \frac{4}{3} + \frac{8}{4} + \cdots + \frac{2^{n-1}}{n} = \frac{1}{2}\sum_{k=1}^{n}\frac{2^k}{k} \]

and we wish to compute an approximate value. The generating function is:
\[ \mathcal{G}\!\left(\frac{1}{2}\sum_{k=1}^{n}\frac{2^k}{k}\right) = \frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t}. \]

There are two singularities: t = 1 is a pole, while t = 1/2 is a logarithmic singularity. Since the latter has smaller modulus, it is dominating, and R = 1/2 is the radius of convergence of the function. By Bender's theorem we have:
\[ S_n = \frac{1}{2}[t^n]\frac{1}{1-t}\ln\frac{1}{1-2t} \sim \frac{1}{2}\,\frac{1}{1-1/2}\,[t^n]\ln\frac{1}{1-2t} = \frac{2^n}{n}. \]

This is not a very good approximation. In the next

section we will see how it can be improved.

7.6 Subtracted singularities

The methods presented in the preceding sections only give the expression describing the general behavior of the coefficients \(f_n\) in the expansion \(\hat{f}(t) = \sum_{k=0}^{\infty} f_k t^k\), i.e., what is called the principal value of \(f_n\). Sometimes this behavior is only achieved for very large values of n, while for smaller values it is just a rough approximation of the true value. Because of that, we speak of "asymptotic evaluation" or "asymptotic approximation". When we need a true approximation, we should introduce some corrections, which slightly modify the general behavior and more accurately evaluate the true value of \(f_n\).

Many times, the following observation solves the problem. Suppose we have found, by one of the previous methods, that a function f(t) is such that \(f(t) \sim A(1-\alpha t)^{\gamma}\), for some \(A, \gamma \in \mathbf{R}\), or, more generally, \(f(t) \sim g(t)\), for some function g(t) of which we know the coefficients \(g_n\) exactly. For example, this is the case of \(\ln(1/(1-\alpha t))\). Because \(f_n \sim g_n\), the function \(h(t) = f(t) - g(t)\) has coefficients \(h_n\) that grow more slowly than \(f_n\); formally, since \(O(f_n) = O(g_n)\), we must have \(h_n = o(f_n)\). Therefore, the quantity \(g_n + h_n\) is a better approximation to \(f_n\) than \(g_n\).

When f(t) has a pole \(t_0\) of order m, the successive application of this method of subtracted singularities completely eliminates the singularity. Therefore, if a second singularity \(t_1\) exists such that \(|t_1| > |t_0|\), we can express the coefficients \(f_n\) in terms of \(t_1\) as well. When f(t) has a dominating algebraic singularity, this cannot be eliminated, but the method of subtracted singularities allows us to obtain corrections to the principal value. Formally, by the successive application of this method, we arrive at the following results. If \(t_0\) is a dominating pole of order m and \(\alpha = 1/t_0\), then we find the expansion:
\[ f(t) = \frac{A_{-m}}{(1-\alpha t)^m} + \frac{A_{-m+1}}{(1-\alpha t)^{m-1}} + \frac{A_{-m+2}}{(1-\alpha t)^{m-2}} + \cdots. \]

If \(t_0\) is a dominating algebraic singularity and \(f(t) = h(t)(1-\alpha t)^{p/m}\), where \(\alpha = 1/t_0\) and h(t) has a radius of convergence larger than \(|t_0|\), then we find the expansion:
\[ f(t) = A_p(1-\alpha t)^{p/m} + A_{p-1}(1-\alpha t)^{(p-1)/m} + A_{p-2}(1-\alpha t)^{(p-2)/m} + \cdots. \]

Newton's rule can obviously be used to pass from these expansions to the asymptotic value of \(f_n\).

The same method of subtracted singularities can be used for a logarithmic singularity. Let us consider as an example the sum \(S_n = \sum_{k=1}^{n} 2^{k-1}/k\), introduced in the previous section. We found the principal value \(S_n \sim 2^n/n\) by studying the generating function:
\[ \frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t} \sim \ln\frac{1}{1-2t}. \]

Let us therefore consider the new function:
\[ h(t) = \frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t} - \ln\frac{1}{1-2t} = -\frac{1-2t}{2(1-t)}\ln\frac{1}{1-2t}. \]

The generic term \(h_n\) should be significantly smaller than \(f_n\); the factor (1 − 2t) actually reduces the order of growth of the logarithm:
\[ -[t^n]\frac{1-2t}{2(1-t)}\ln\frac{1}{1-2t} \sim -\frac{1}{2(1-1/2)}\,[t^n](1-2t)\ln\frac{1}{1-2t} = \]
\[ = -\left(\frac{2^n}{n} - 2\,\frac{2^{n-1}}{n-1}\right) = \frac{2^n}{n(n-1)}. \]

Therefore, a better approximation for \(S_n\) is:
\[ S_n = \frac{2^n}{n} + \frac{2^n}{n(n-1)} = \frac{2^n}{n-1}. \]
The reader can easily verify that this correction greatly reduces the error in the evaluation of \(S_n\).


A further correction can now be obtained by considering:
\[ k(t) = \frac{1}{2}\,\frac{1}{1-t}\ln\frac{1}{1-2t} - \ln\frac{1}{1-2t} + (1-2t)\ln\frac{1}{1-2t} = \frac{(1-2t)^2}{2(1-t)}\ln\frac{1}{1-2t} \]

which gives:
\[ k_n = \frac{1}{2(1-1/2)}\,[t^n](1-2t)^2\ln\frac{1}{1-2t} = \frac{2^n}{n} - 4\,\frac{2^{n-1}}{n-1} + 4\,\frac{2^{n-2}}{n-2} = \frac{2^{n+1}}{n(n-1)(n-2)}. \]

This correction is still smaller, and we can write:
\[ S_n \sim \frac{2^n}{n-1}\left(1 + \frac{2}{n(n-2)}\right). \]
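A direct comparison of the three successive approximations (our own sketch) shows how each subtracted singularity shrinks the error:

```python
def S(n):
    """Exact S_n = sum_{k=1}^{n} 2^(k-1)/k."""
    return sum(2 ** (k - 1) / k for k in range(1, n + 1))

n = 30
exact = S(n)
first = 2 ** n / n                                    # principal value
second = 2 ** n / (n - 1)                             # one subtracted singularity
third = 2 ** n / (n - 1) * (1 + 2 / (n * (n - 2)))    # two corrections
for approx in (first, second, third):
    print(approx / exact)   # ratios approach 1
```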

In general, we can obtain the same results if, in \(f(t) = g(t)h(t)\) with h(t) having a radius of convergence larger than that of f(t), we expand the function h(t) around the dominating singularity. This is done in the following way:
\[ \frac{1}{2(1-t)} = \frac{1}{1+(1-2t)} = 1 - (1-2t) + (1-2t)^2 - (1-2t)^3 + \cdots. \]

This implies:
\[ \frac{1}{2(1-t)}\ln\frac{1}{1-2t} = \ln\frac{1}{1-2t} - (1-2t)\ln\frac{1}{1-2t} + (1-2t)^2\ln\frac{1}{1-2t} - \cdots \]
and the result is the same as the one previously obtained by the method of subtracted singularities.

7.7 The asymptotic behavior of a trinomial square root

In many problems we arrive at a generating function of the form:
\[ f(t) = \frac{p(t) - \sqrt{(1-\alpha t)(1-\beta t)}}{rt^m} \qquad\text{or}\qquad g(t) = \frac{q(t)}{\sqrt{(1-\alpha t)(1-\beta t)}}. \]

In the former case, p(t) is a correcting polynomial, which has no effect on \(f_n\) for n sufficiently large, and therefore we have:
\[ f_n = -\frac{1}{r}\,[t^{n+m}]\sqrt{(1-\alpha t)(1-\beta t)} \]

where m is a small integer. In the second case, \(g_n\) is the sum of various terms, as many as there are terms in the polynomial q(t), each one of the form:
\[ q_k\,[t^{n-k}]\frac{1}{\sqrt{(1-\alpha t)(1-\beta t)}}. \]
It is therefore interesting to compute, once and for all, the asymptotic value of \([t^n]\big((1-\alpha t)(1-\beta t)\big)^s\), where s = 1/2 or s = −1/2.

Let us suppose that \(|\alpha| > |\beta|\), since the case \(\alpha = \beta\) has no interest and the case \(\alpha = -\beta\) should be approached in another way. This hypothesis means that \(t = 1/\alpha\) is the dominating singularity, determining the radius of convergence of the function, and we can develop everything around it. In most combinatorial problems we have α > 0, because the coefficients of f(t) are positive numbers, but this is not a limiting factor.

Let us consider s = 1/2; in this case, a minus sign should precede the square root. The evaluation is shown in Table 7.1. The formula so obtained can be considered sufficient for obtaining both the asymptotic evaluation of \(f_n\) and a suitable numerical approximation. However, we can use the following developments:

\[ \binom{2n}{n} = \frac{4^n}{\sqrt{\pi n}}\left(1 - \frac{1}{8n} + \frac{1}{128n^2} + O\!\left(\frac{1}{n^3}\right)\right) \]
\[ \frac{1}{2n-1} = \frac{1}{2n}\left(1 + \frac{1}{2n} + \frac{1}{4n^2} + O\!\left(\frac{1}{n^3}\right)\right) \qquad \frac{1}{2n-3} = \frac{1}{2n}\left(1 + \frac{3}{2n} + O\!\left(\frac{1}{n^2}\right)\right) \]

and get:
\[ f_n = \sqrt{\frac{\alpha-\beta}{\alpha}}\,\frac{\alpha^n}{2n\sqrt{\pi n}}\left(1 - \frac{6\beta - 3(\alpha-\beta)}{8(\alpha-\beta)n} + \frac{25}{128n^2} - \frac{45\beta}{32(\alpha-\beta)n^2} - \frac{15\beta^2}{32(\alpha-\beta)^2 n^2} + O\!\left(\frac{1}{n^3}\right)\right). \]

The reader is invited to find a similar formula for the case s = −1/2.

7.8 Hayman's method

The method of coefficient evaluation which uses the singularities of the function (Darboux' method) can be improved and made more accurate, as we have seen, by the technique of "subtracted singularities". Unfortunately, these methods become useless when the function f(t) has no singularity (entire functions) or when the dominating singularity is essential. In fact, in the former case we do not have any singularity to operate on, and in the latter the development around the singularity gives rise to a series with an infinite number of terms of negative degree.


\[ [t^n]\left(-(1-\alpha t)^{1/2}(1-\beta t)^{1/2}\right) = [t^n]\left(-(1-\alpha t)^{1/2}\left(\frac{\alpha-\beta}{\alpha} + \frac{\beta}{\alpha}(1-\alpha t)\right)^{1/2}\right) = \]
\[ = -\sqrt{\frac{\alpha-\beta}{\alpha}}\,[t^n](1-\alpha t)^{1/2}\left(1 + \frac{\beta}{\alpha-\beta}(1-\alpha t)\right)^{1/2} = \]
\[ = -\sqrt{\frac{\alpha-\beta}{\alpha}}\,[t^n](1-\alpha t)^{1/2}\left(1 + \frac{\beta}{2(\alpha-\beta)}(1-\alpha t) - \frac{\beta^2}{8(\alpha-\beta)^2}(1-\alpha t)^2 + \cdots\right) = \]
\[ = -\sqrt{\frac{\alpha-\beta}{\alpha}}\,[t^n]\left((1-\alpha t)^{1/2} + \frac{\beta}{2(\alpha-\beta)}(1-\alpha t)^{3/2} - \frac{\beta^2}{8(\alpha-\beta)^2}(1-\alpha t)^{5/2} + \cdots\right) = \]
\[ = -\sqrt{\frac{\alpha-\beta}{\alpha}}\left(\binom{1/2}{n}(-\alpha)^n + \frac{\beta}{2(\alpha-\beta)}\binom{3/2}{n}(-\alpha)^n - \frac{\beta^2}{8(\alpha-\beta)^2}\binom{5/2}{n}(-\alpha)^n + \cdots\right) = \]
\[ = -\sqrt{\frac{\alpha-\beta}{\alpha}}\,\frac{(-1)^{n-1}}{4^n(2n-1)}\binom{2n}{n}(-\alpha)^n\left(1 - \frac{\beta}{2(\alpha-\beta)}\,\frac{3}{2n-3} - \frac{\beta^2}{8(\alpha-\beta)^2}\,\frac{15}{(2n-3)(2n-5)} + \cdots\right) = \]
\[ = \sqrt{\frac{\alpha-\beta}{\alpha}}\,\frac{\alpha^n}{4^n(2n-1)}\binom{2n}{n}\left(1 - \frac{3\beta}{2(\alpha-\beta)(2n-3)} - \frac{15\beta^2}{8(\alpha-\beta)^2(2n-3)(2n-5)} + O\!\left(\frac{1}{n^3}\right)\right). \]

Table 7.1: The case s = 1/2

In these cases, the only method seems to be Cauchy's theorem, which allows us to evaluate \([t^n]f(t)\) by means of an integral:
\[ f_n = \frac{1}{2\pi i}\oint_{\gamma}\frac{f(t)}{t^{n+1}}\,dt \]

where \(\gamma\) is a suitable path enclosing the origin. We do not intend to develop this method here; we limit ourselves to sketching a method, derived from Cauchy's theorem, which allows us to find an asymptotic evaluation of \(f_n\) in many practical situations. The method can be implemented on a computer in the following sense: given a function f(t), we can check in an algorithmic way whether f(t) belongs to the class of functions for which the method is applicable (the class of "H-admissible" functions) and, if that is the case, we can evaluate the principal value of the asymptotic estimate for \(f_n\). The system ΛΥΩ, by Flajolet, Salvy and Zimmermann, realizes this method. The development of the method was mainly performed by Hayman, and therefore it is known as Hayman's method; this also justifies the use of the letter H in the definition of H-admissibility.

A function is called H-admissible if and only if it

belongs to one of the following classes or can be ob-

tained, in a ﬁnite number of steps according to the

following rules, from other H-admissible functions:

1. if f(t) and g(t) are H-admissible functions and p(t) is a polynomial with real coefficients and positive leading term, then:
$$\exp(f(t)) \qquad f(t)+g(t) \qquad f(t)+p(t) \qquad p(f(t)) \qquad p(t)f(t)$$
are all H-admissible functions;

2. if p(t) is a polynomial with positive coefficients and not of the form p(t^k) for k > 1, then the function exp(p(t)) is H-admissible;

3. if α, β are positive real numbers and γ, δ are real numbers, then the function:
$$f(t) = \exp\left(\frac{\beta}{(1-t)^\alpha}\left(\frac{1}{t}\ln\frac{1}{1-t}\right)^{\gamma}\left(\frac{1}{t}\ln\left(\frac{1}{t}\ln\frac{1}{1-t}\right)\right)^{\delta}\right)$$
is H-admissible.

For example, the following functions are all H-admissible:
$$e^t \qquad \exp\left(t+\frac{t^2}{2}\right) \qquad \exp\left(\frac{t}{1-t}\right) \qquad \exp\left(\frac{1}{t(1-t)^2}\ln\frac{1}{1-t}\right).$$
In particular, for the third function we have:
$$\exp\left(\frac{t}{1-t}\right) = \exp\left(\frac{1}{1-t}-1\right) = \frac{1}{e}\exp\left(\frac{1}{1-t}\right)$$
and naturally a constant does not influence the H-admissibility of a function. In this example we have α = β = 1 and γ = δ = 0.

For H-admissible functions, the following result holds:

Theorem 7.8.1 Let f(t) be an H-admissible function; then:
$$f_n = [t^n]f(t) \sim \frac{f(r)}{r^n\sqrt{2\pi b(r)}} \qquad \text{as } n\to\infty$$
where r = r(n) is the least positive solution of the equation t f'(t)/f(t) = n and b(t) is the function:
$$b(t) = t\,\frac{d}{dt}\left(t\,\frac{f'(t)}{f(t)}\right).$$

As we said before, the proof of this theorem is based on Cauchy's theorem and is beyond the scope of these notes. Instead, let us show some examples to clarify the application of Hayman's method.

7.9 Examples of Hayman's Theorem

The first example can be easily verified. Let f(t) = e^t be the exponential function, so that we know f_n = 1/n!. In order to apply Hayman's theorem, we have to solve the equation te^t/e^t = n, which gives r = n. The function b(t) is simply t, and therefore we have:
$$[t^n]e^t \sim \frac{e^n}{n^n\sqrt{2\pi n}}$$
and in this formula we immediately recognize the Stirling approximation for factorials.
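This estimate is easy to check numerically. Below is a minimal sketch in Python (the function name is ours); it compares Hayman's value $e^n/(n^n\sqrt{2\pi n})$ with the exact coefficient 1/n!.

```python
import math

def hayman_estimate_exp(n):
    # Hayman's estimate for [t^n] e^t: here r = n and b(t) = t, so the
    # theorem gives f(r)/(r^n*sqrt(2*pi*b(r))) = e^n/(n^n*sqrt(2*pi*n)).
    # Computed in log space to avoid overflow for moderately large n.
    return math.exp(n - n * math.log(n)) / math.sqrt(2 * math.pi * n)

for n in (10, 50, 100):
    ratio = hayman_estimate_exp(n) * math.factorial(n)  # estimate / exact
    print(n, ratio)  # the ratio approaches 1 as n grows
```

The ratio behaves like 1 + 1/(12n), exactly as the Stirling approximation predicts.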

Examples soon become rather complex and require a large amount of computation. Let us consider the following sum:
$$\begin{aligned}
\sum_{k=0}^{n-1}\binom{n-1}{k}\frac{1}{(k+1)!}
&= \sum_{k=0}^{n}\binom{n-1}{k-1}\frac{1}{k!}
 = \sum_{k=0}^{n}\left(\binom{n}{k}-\binom{n-1}{k}\right)\frac{1}{k!} = \\
&= \sum_{k=0}^{n}\binom{n}{k}\frac{1}{k!}-\sum_{k=0}^{n-1}\binom{n-1}{k}\frac{1}{k!} = \\
&= [t^n]\,\frac{1}{1-t}\exp\frac{t}{1-t}-[t^{n-1}]\,\frac{1}{1-t}\exp\frac{t}{1-t} = \\
&= [t^n]\,\frac{1-t}{1-t}\exp\frac{t}{1-t} = [t^n]\exp\frac{t}{1-t}.
\end{aligned}$$
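The last equality can be verified with exact rational arithmetic. In the sketch below (all names are ours), the coefficients of F(t) = exp(t/(1−t)) are produced by the standard recurrence for F = exp(G), namely n f_n = Σ_{m=1}^{n} m g_m f_{n−m}, where here g_m = 1 for every m ≥ 1 since t/(1−t) = t + t² + t³ + ···.

```python
from fractions import Fraction
from math import comb, factorial

def exp_series_coeffs(N):
    # Coefficients of F(t) = exp(G(t)) with G(t) = t/(1-t) = t + t^2 + ...
    # From F' = G'F one gets n*f_n = sum_{m=1}^{n} m*g_m*f_{n-m}, g_m = 1.
    f = [Fraction(1)]                       # f_0 = exp(g_0) = exp(0) = 1
    for n in range(1, N + 1):
        f.append(sum(m * f[n - m] for m in range(1, n + 1)) / n)
    return f

f = exp_series_coeffs(12)
for n in range(1, 13):
    s = sum(Fraction(comb(n - 1, k), factorial(k + 1)) for k in range(n))
    assert s == f[n]                        # the sum equals [t^n] exp(t/(1-t))
print("identity verified for n = 1..12")
```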

We have already seen that this function is H-admissible, and therefore we can try to evaluate the asymptotic behavior of the sum. Let us define the function g(t) = t f'(t)/f(t), which in the present case is:
$$g(t) = \frac{\dfrac{t}{(1-t)^2}\exp\dfrac{t}{1-t}}{\exp\dfrac{t}{1-t}} = \frac{t}{(1-t)^2}.$$

The value of r is therefore given by the minimal positive solution of:
$$\frac{t}{(1-t)^2} = n \qquad\text{or}\qquad nt^2-(2n+1)t+n = 0.$$
Because Δ = 4n² + 4n + 1 − 4n² = 4n + 1, we have the two solutions:
$$r = \frac{2n+1\pm\sqrt{4n+1}}{2n}$$
and we must accept the one with the '−' sign, which is positive and less than the other. It is surely positive, because √(4n+1) < 2√n + 1 < 2n for every n > 1.
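As a quick numerical sanity check (a Python sketch, with names of our choosing), the '−' root can be substituted back into t/(1−t)² = n:

```python
import math

def r_of(n):
    # The '-' root of n*t^2 - (2n+1)*t + n = 0, i.e. of t/(1-t)^2 = n.
    return (2 * n + 1 - math.sqrt(4 * n + 1)) / (2 * n)

for n in (10, 100, 1000):
    r = r_of(n)
    assert 0 < r < 1                              # r lies strictly inside (0, 1)
    assert abs(r / (1 - r) ** 2 - n) < 1e-6 * n   # r solves t/(1-t)^2 = n
print("root verified")
```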

As n grows, we can also give an asymptotic approximation of r by developing the expression inside the square root:
$$\sqrt{4n+1} = 2\sqrt{n}\,\sqrt{1+\frac{1}{4n}} = 2\sqrt{n}\left(1+\frac{1}{8n}+O\!\left(\frac{1}{n^2}\right)\right).$$
From this formula we immediately obtain:
$$r = 1-\frac{1}{\sqrt{n}}+\frac{1}{2n}-\frac{1}{8n\sqrt{n}}+O\!\left(\frac{1}{n^2}\right)$$
which will be used in the next approximations.

First we compute an approximation for f(r), that is, for exp(r/(1−r)). Since:
$$1-r = \frac{1}{\sqrt{n}}-\frac{1}{2n}+\frac{1}{8n\sqrt{n}}+O\!\left(\frac{1}{n^2}\right) = \frac{1}{\sqrt{n}}\left(1-\frac{1}{2\sqrt{n}}+\frac{1}{8n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right)$$
we immediately obtain:

$$\begin{aligned}
\frac{r}{1-r} &= \left(1-\frac{1}{\sqrt{n}}+\frac{1}{2n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\sqrt{n}\left(1+\frac{1}{2\sqrt{n}}+\frac{1}{8n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right) = \\
&= \sqrt{n}\left(1-\frac{1}{2\sqrt{n}}+\frac{1}{8n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right) = \sqrt{n}-\frac{1}{2}+\frac{1}{8\sqrt{n}}+O\!\left(\frac{1}{n}\right).
\end{aligned}$$

Finally, the exponential gives:
$$\exp\left(\frac{r}{1-r}\right) = \frac{e^{\sqrt{n}}}{\sqrt{e}}\exp\left(\frac{1}{8\sqrt{n}}+O\!\left(\frac{1}{n}\right)\right) = \frac{e^{\sqrt{n}}}{\sqrt{e}}\left(1+\frac{1}{8\sqrt{n}}+O\!\left(\frac{1}{n}\right)\right).$$
Because Hayman's method only gives the principal value of the result, the correction can be ignored (it may not be accurate anyway), and we get:
$$\exp\left(\frac{r}{1-r}\right) \sim \frac{e^{\sqrt{n}}}{\sqrt{e}}.$$

The second part we have to develop is 1/r^n, which can be computed by writing it as exp(n ln(1/r)), that is:
$$\begin{aligned}
\frac{1}{r^n} &= \exp\left(n\ln\left(1+\frac{1}{\sqrt{n}}+\frac{1}{2n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\right) = \\
&= \exp\left(n\left(\frac{1}{\sqrt{n}}+\frac{1}{2n}+O\!\left(\frac{1}{n\sqrt{n}}\right)-\frac{1}{2}\left(\frac{1}{\sqrt{n}}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right)^2+O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\right) = \\
&= \exp\left(n\left(\frac{1}{\sqrt{n}}+\frac{1}{2n}-\frac{1}{2n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\right) = \exp\left(\sqrt{n}+O\!\left(\frac{1}{\sqrt{n}}\right)\right) \sim e^{\sqrt{n}}.
\end{aligned}$$

Again, the correction is ignored and we only consider the principal value. We now observe that $f(r)/r^n \sim e^{2\sqrt{n}}/\sqrt{e}$.

Only b(r) remains to be computed; we have b(t) = t g'(t), where g(t) is as above, and therefore we have:
$$b(t) = t\,\frac{d}{dt}\,\frac{t}{(1-t)^2} = \frac{t(1+t)}{(1-t)^3}.$$

This quantity can be computed a piece at a time. First:
$$\begin{aligned}
r(1+r) &= \left(1-\frac{1}{\sqrt{n}}+\frac{1}{2n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\left(2-\frac{1}{\sqrt{n}}+\frac{1}{2n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right) = \\
&= 2-\frac{3}{\sqrt{n}}+\frac{5}{2n}+O\!\left(\frac{1}{n\sqrt{n}}\right) = 2\left(1-\frac{3}{2\sqrt{n}}+\frac{5}{4n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right).
\end{aligned}$$

By using the expression already found for (1 − r) we then have:
$$\begin{aligned}
(1-r)^3 &= \frac{1}{n\sqrt{n}}\left(1-\frac{1}{2\sqrt{n}}+\frac{1}{8n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right)^3 = \\
&= \frac{1}{n\sqrt{n}}\left(1-\frac{1}{\sqrt{n}}+\frac{1}{2n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\left(1-\frac{1}{2\sqrt{n}}+\frac{1}{8n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right) = \\
&= \frac{1}{n\sqrt{n}}\left(1-\frac{3}{2\sqrt{n}}+\frac{9}{8n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right).
\end{aligned}$$

By inverting this quantity, we eventually get:
$$\begin{aligned}
b(r) &= \frac{r(1+r)}{(1-r)^3} = 2n\sqrt{n}\left(1+\frac{3}{2\sqrt{n}}+\frac{9}{8n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right)\left(1-\frac{3}{2\sqrt{n}}+\frac{5}{4n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right) = \\
&= 2n\sqrt{n}\left(1+\frac{1}{8n}+O\!\left(\frac{1}{n\sqrt{n}}\right)\right).
\end{aligned}$$
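These expansions can be sanity-checked numerically; the sketch below (names are ours) compares the exact b(r) with the approximation 2n√n(1 + 1/(8n)):

```python
import math

def r_of(n):
    # The '-' root of n*t^2 - (2n+1)*t + n = 0, i.e. of t/(1-t)^2 = n.
    return (2 * n + 1 - math.sqrt(4 * n + 1)) / (2 * n)

n = 10_000
r = r_of(n)
b_exact = r * (1 + r) / (1 - r) ** 3            # b(r) = r(1+r)/(1-r)^3
b_approx = 2 * n * math.sqrt(n) * (1 + 1 / (8 * n))
print(b_exact / b_approx)  # close to 1, up to the O(1/(n*sqrt(n))) term
```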

The principal value is 2n√n and therefore:
$$\sqrt{2\pi b(r)} = \sqrt{4\pi n\sqrt{n}} = 2\sqrt{\pi}\,n^{3/4};$$
the final result is:
$$f_n = [t^n]\exp\frac{t}{1-t} \sim \frac{e^{2\sqrt{n}}}{2\sqrt{\pi e}\,n^{3/4}}.$$

It is a simple matter to compute the original sum $\sum_{k=0}^{n-1}\binom{n-1}{k}/(k+1)!$ on a personal computer for, say, n = 100, 200, 300. The results obtained can be compared with the evaluation of the previous formula. We can thus verify that the relative error decreases as n increases.
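A minimal sketch of this experiment in Python (names are ours): the sum is evaluated with exact rational arithmetic and the asymptotic estimate in floating point.

```python
import math
from fractions import Fraction

def exact_sum(n):
    # sum_{k=0}^{n-1} C(n-1, k) / (k+1)!, computed exactly, then rounded
    return float(sum(Fraction(math.comb(n - 1, k), math.factorial(k + 1))
                     for k in range(n)))

def hayman_sum(n):
    # The asymptotic estimate e^(2*sqrt(n)) / (2*sqrt(pi*e) * n^(3/4))
    return math.exp(2 * math.sqrt(n)) / (2 * math.sqrt(math.pi * math.e) * n ** 0.75)

for n in (100, 200, 300):
    rel_err = abs(hayman_sum(n) / exact_sum(n) - 1)
    print(n, rel_err)  # the relative error decreases as n increases
```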

Chapter 8

Bibliography

The birth of Computer Science and the need of ana-

lyzing the behavior of algorithms and data structures

have given a strong impulse to Combinatorial Analysis, and to the mathematical methods used to study combinatorial objects. So, alongside the traditional literature on Combinatorics, a number of books and papers

have been produced, relating Computer Science and

the methods of Combinatorial Analysis. The ﬁrst au-

thor who systematically worked in this direction was

surely Donald Knuth, who published in 1968 the ﬁrst

volume of his monumental “The Art of Computer

Programming”, the ﬁrst part of which is dedicated

to the mathematical methods used in Combinatorial

Analysis. Without this basic knowledge, there is little

hope to understand the developments of the analysis

of algorithms and data structures:

Donald E. Knuth: The Art of Computer Pro-

gramming: Fundamental Algorithms, Vol. I,

Addison-Wesley (1968).

Many additional concepts and techniques are also

contained in the third volume:

Donald E. Knuth: The Art of Computer Pro-

gramming: Sorting and Searching, Vol. III,

Addison-Wesley (1973).

Numerical and probabilistic developments are to

be found in the central volume:

Donald E. Knuth: The Art of Computer Pro-

gramming: Numerical Algorithms, Vol. II,

Addison-Wesley (1973)

A concise exposition of several important tech-

niques is given in:

Daniel H. Greene, Donald E. Knuth: Mathemat-

ics for the Analysis of Algorithms, Birkh¨auser

(1981).

The most systematic work in the ﬁeld is perhaps

the following text, very clear and very comprehensive.

It deals, in an elementary but rigorous way, with the

main topics of Combinatorial Analysis, with an eye

to Computer Science. Many exercises (with solutions)

are proposed, and the reader is never left alone in

front of the many-faced problems he or she is encour-

aged to tackle.

Ronald L. Graham, Donald E. Knuth, Oren

Patashnik: Concrete Mathematics, Addison-

Wesley (1989).

Many texts on Combinatorial Analysis are worth considering, because they contain information

on general concepts, both from a combinatorial and

a mathematical point of view:

William Feller: An Introduction to Probabil-

ity Theory and Its Applications, Wiley (1950) -

(1957) - (1968).

John Riordan: An Introduction to Combinatorial

Analysis, Wiley (1953) (1958).

Louis Comtet: Advanced Combinatorics, Reidel

(1974).

Ian P. Goulden, David M. Jackson: Combinato-

rial Enumeration, Dover Publ. (2004).

Richard P. Stanley: Enumerative Combinatorics,

Vol. I, Cambridge Univ. Press (1986) (2000).

Richard P. Stanley: Enumerative Combinatorics,

Vol. II, Cambridge Univ. Press (1997) (2001).

Robert Sedgewick, Philippe Flajolet: An Intro-

duction to the Analysis of Algorithms, Addison-

Wesley (1995).


* * *

Chapter 1: Introduction

The concepts relating to the problem of searching

can be found in any text on Algorithms and Data

Structures, in particular Volume III of the quoted

“The Art of Computer Programming”. Volume I can

be consulted for Landau notation, even if it is com-

mon in Mathematics. For the mathematical concepts

of Γ and ψ functions, the reader is referred to:

Milton Abramowitz, Irene A. Stegun: Handbook

of Mathematical Functions, Dover Publ. (1972).

* * *

Chapter 2: Special numbers

The quoted book by Graham, Knuth and Patash-

nik is appropriate. Anyhow, the quantity we consider

are common to all parts of Mathematics, in partic-

ular of Combinatorial Analysis. The ζ function and

the Bernoulli numbers are also covered by the book of

Abramowitz and Stegun. Stanley dedicates to Cata-

lan numbers a special survey in his second volume.

Probably, they are the most frequently used quan-

tity in Combinatorics, just after binomial coeﬃcients

and before Fibonacci numbers. Stirling numbers are

very important in Numerical Analysis, but here we

are more interested in their combinatorial meaning.

Harmonic numbers are the “discrete” version of log-

arithms, and from this fact their relevance in Combi-

natorics and in the Analysis of Algorithms originates.

Sequences studied in Combinatorial Analysis have

been collected and annotated by Sloane. The result-

ing book (and the corresponding Internet site) is one

of the most important reference points:

N. J. A. Sloane, Simon Plouﬀe: The Encyclope-

dia of Integer Sequences Academic Press (1995).

Available at:

http://www.research.att.com/~njas/sequences/.

* * *

Chapter 3: Formal Power Series

Formal power series have a long tradition; the

reader can ﬁnd their algebraic foundation in the book:

Peter Henrici: Applied and Computational Com-

plex Analysis, Wiley (1974) Vol. I; (1977) Vol.

II; (1986) Vol. III.

Coeﬃcient extraction is central in our approach; a

formalization can be found in:

Donatella Merlini, Renzo Sprugnoli, M. Cecilia

Verri: The method of coeﬃcients, The American

Mathematical Monthly (to appear).

The Lagrange Inversion Formula can be found in

most of the quoted texts, in particular in Stanley

and Henrici. Nowadays, almost every part of Math-

ematics is available by means of several Computer

Algebra Systems, implementing on a computer the

ideas described in this chapter, and many, many oth-

ers. Maple and Mathematica are among the most

used systems; other software is available, free or not.

These systems have become an essential tool for de-

veloping any new aspect of Mathematics.

* * *

Chapter 4: Generating functions

In our present approach the concept of an (ordi-

nary) generating function is essential, and this justi-

ﬁes our insistence on the two operators “generating

function” and “coeﬃcient of”. Since their introduc-

tion in Mathematics, due to de Moivre in the XVIII

Century, generating functions have been a controver-

sial concept. Now they are accepted almost univer-

sally, and their theory has been developed in many

of the quoted texts. In particular:

Herbert S. Wilf: Generatingfunctionology, Aca-

demic Press (1990).

A practical device has been realized in Maple:

Bruno Salvy, Paul Zimmermann: GFUN: a Maple

package for the manipulation of generating and

holonomic functions in one variable, INRIA, Rap-

port Technique N. 143 (1992).

The number of applications of generating functions

is almost inﬁnite, so we limit our considerations to

some classical cases relative to Computer Science.

Sometimes, we intentionally complicate proofs in or-

der to stress the use of generating function as a uni-

fying approach. So, our proofs should be compared

with those given (for the same problems) by Knuth

or Flajolet and Sedgewick.

* * *

Chapter 5: Riordan arrays


Riordan arrays are part of the general method of

coeﬃcients and are particularly important in prov-

ing combinatorial identities and generating function

transformations. They were introduced by:

Louis W. Shapiro, Seyoum Getu, Wen-Jin Woan,

Leon C. Woodson: The Riordan group, Discrete

Applied Mathematics, 34 (1991) 229 – 239.

Their practical relevance was noted in:

Renzo Sprugnoli: Riordan arrays and combinato-

rial sums, Discrete Mathematics, 132 (1992) 267

– 290.

The theory was further developed in:

Donatella Merlini, Douglas G. Rogers, Renzo

Sprugnoli, M. Cecilia Verri: On some alterna-

tive characterizations of Riordan arrays, Cana-

dian Journal of Mathematics 49 (1997) 301 – 320.

Riordan arrays are strictly related to “convolution

matrices” and to “Umbral calculus”, although, rather

strangely, nobody seems to have noticed the connec-

tions of these concepts and combinatorial sums:

Steve Roman: The Umbral Calculus Academic

Press (1984).

Donald E. Knuth: Convolution polynomials, The

Mathematica Journal, 2 (1991) 67 – 78.

Collections of combinatorial identities are:

Henry W. Gould: Combinatorial Identities. A

Standardized Set of Tables Listing 500 Binomial

Coeﬃcient Summations West Virginia University

(1972).

Josef Kaucky: Combinatorial Identities Veda,

Bratislava (1975).

Other methods to prove combinatorial identities

are important in Combinatorial Analysis. We quote:

John Riordan: Combinatorial Identities, Wiley

(1968).

G. P. Egorychev: Integral Representation and the

Computation of Combinatorial Sums American

Math. Society Translations, Vol. 59 (1984).

Doron Zeilberger: A holonomic systems approach

to special functions identities, Journal of Compu-

tational and Applied Mathematics 32 (1990), 321

– 368.

Doron Zeilberger: A fast algorithm for proving

terminating hypergeometric identities, Discrete

Mathematics 80 (1990), 207 – 211.

M. Petkovšek, Herbert S. Wilf, Doron Zeilberger:

A = B, A. K. Peters (1996).

In particular, the method of Wilf and Zeilberger completely solves the problem of combinatorial identities "with hypergeometric terms". This means that we can algorithmically establish whether an identity is true or false, provided the two members of the identity $\sum_k L(k,n) = R(n)$ have a special form:
$$\frac{L(k+1,n)}{L(k,n)} \qquad \frac{L(k,n+1)}{L(k,n)} \qquad \frac{R(n+1)}{R(n)}$$
are all rational functions in n and k. This actually means that L(k, n) and R(n) are composed of factorials, powers and binomial coefficients. In this sense, Riordan arrays are less powerful, but they can also be used when non-hypergeometric terms are involved, as for example in the case of harmonic numbers and Stirling numbers of both kinds.

* * *

Chapter 6: Formal Methods

In this chapter we have considered two important

methods: the symbolic method for deducing counting

generating functions from the syntactic deﬁnition of

combinatorial objects, and the method of “operators”

for obtaining combinatorial identities from relations

between transformations of sequences deﬁned by op-

erators.

The symbolic method was started by

Schützenberger and Viennot, who devised a

technique to automatically generate counting gener-

ating functions from a context-free non-ambiguous

grammar. When the grammar deﬁnes a class of

combinatorial objects, this method gives a direct way

to obtain monovariate or multivariate generating

functions, which allow us to solve many problems

relative to the given objects. Since context-free

languages only deﬁne algebraic generating functions

(the subset of regular grammars is limited to rational


functions), the method is not very general, but

is very eﬀective whenever it can be applied. The

method was extended by Flajolet to some classes of

exponential generating functions and implemented

in Maple.

Marco Schützenberger: Context-free languages

and pushdown automata, Information and Con-

trol 6 (1963) 246 – 264

Maylis Delest, Xavier Viennot: Algebraic lan-

guages and polyominoes enumeration, X Collo-

quium on Automata, Languages and Program-

ming - Lecture Notes in Computer Science (1983)

173 – 181.

Philippe Flajolet: Symbolic enumerative com-

binatorics and complex asymptotic analysis,

Algorithms Seminar, (2001). Available at:

http://algo.inria.fr/seminars/sem00-

01/ﬂajolet.html

The method of operators is very old and was devel-

oped in the XIX Century by English mathematicians,

especially George Boole. A classical book in this di-

rection is:

Charles Jordan: Calculus of Finite Diﬀerences,

Chelsea Publ. (1965).

Actually, the method is used in Numerical Anal-

ysis, but it has a clear connection with Combinato-

rial Analysis, as our numerous examples show. The

important concepts of indeﬁnite and deﬁnite summa-

tions are used by Wilf and Zeilberger in the quoted

texts. The Euler-McLaurin summation formula is the

ﬁrst connection between ﬁnite methods (considered

up to this moment) and asymptotics.

* * *

Chapter 7: Asymptotics

The methods treated in the previous chapters are

“exact”, in the sense that every time they give the so-

lution to a problem, this solution is a precise formula.

This, however, is not always possible, and many times

we are not able to ﬁnd a solution of this kind. In these

cases, we would also be content with an approximate

solution, provided we can give an upper bound to the

error committed. The purpose of asymptotic meth-

ods is just that.

The natural settings for these problems are Com-

plex Analysis and the theory of series. We have used

a rather descriptive approach, limiting our consid-

erations to elementary cases. These situations are

covered by the quoted texts, especially Knuth, Wilf and Henrici. The method of Hayman is also treated in:

Micha Hofri: Probabilistic Analysis of Algo-

rithms, Springer (1987).

Index

Γ function, 8, 84

ΛΥΩ, 90

ψ function, 8

1-1 correspondence, 11

1-1 mapping, 11

A-sequence, 58

absolute scale, 9

addition operator, 77

Adelson-Velski, 52

adicity, 35

algebraic singularity, 86, 87

algebraic square root, 87

Algol’60, 69

Algol’68, 70

algorithm, 5

alphabet, 12, 67

alternating subgroup, 14

ambiguous grammar, 68

Appell subgroup, 57

arithmetic square root, 87

arrangement, 11

asymptotic approximation, 88

asymptotic development, 80

asymptotic evaluation, 88

average case analysis, 5

AVL tree, 52

Backus Normal Form, 70

Bell number, 24

Bell subgroup, 57

Bender’s theorem, 84

Bernoulli number, 18, 24, 53, 80, 85

big-oh notation, 9

bijection, 11

binary searching, 6

binary tree, 20

binomial coeﬃcient, 8, 15

binomial formula, 15

bisection formula, 41

bivariate generating function, 55

BNF, 70

Boole, George, 73

branch, 87

C language, 69

cardinality, 11

Cartesian product, 11

Catalan number, 20, 34

Catalan triangle, 63

Cauchy product, 26

Cauchy theorem, 90

central binomial coeﬃcient, 16, 17

central trinomial coeﬃcient, 46

characteristic function, 12

Chomsky, Noam, 67

choose, 15

closed form expression, 7

codomain, 11

coeﬃcient extraction rules, 30

coeﬃcient of operator, 30

coeﬃcient operator, 30

colored walk, 62

colored walk problem, 62

column, 11

column index, 11

combination, 15

combination with repetitions, 16

complete colored walk, 62

composition of f.p.s., 29

composition of permutations, 13

composition rule for coeﬃcient of, 30

composition rule for generating functions, 40

compositional inverse of a f.p.s., 29

Computer Algebra System, 34

context free grammar, 68

context free language, 68

convergence, 83

convolution, 26

convolution rule for coeﬃcient of, 30

convolution rule for generating functions, 40

cross product rule, 17

cycle, 12, 21

cycle degree, 12

cycle representation, 12

Darboux’ method, 84

deﬁnite integration of a f.p.s., 28

deﬁnite summation, 78

degree of a permutation, 12

delta series, 29

derangement, 12, 86

derivation, 67

diagonal step, 62


diagonalisation rule for generating functions, 40

diﬀerence operator, 73

diﬀerentiation of a f.p.s., 28

diﬀerentiation rule for coeﬃcient of, 30

diﬀerentiation rule for generating functions, 40

digamma function, 8

disposition, 15

divergence, 83

domain, 11

dominating singularity, 85

double sequence, 11

Dyck grammar, 68

Dyck language, 68

Dyck walk, 21

Dyck word, 69

east step, 20, 62

empty word, 12, 67

entire function, 89

essential singularity, 86

Euclid’s algorithm, 19

Euler constant, 8, 18

Euler transformation, 42, 55

Euler-McLaurin summation formula, 80

even permutation, 13

exponential algorithm, 10

exponential generating function, 25, 39

exponentiation of f.p.s., 28

extensional deﬁnition, 11

extraction of the coeﬃcient, 29

f.p.s., 25

factorial, 8, 14

falling factorial, 15

Fibonacci number, 19

Fibonacci problem, 19

Fibonacci word, 70

Fibonacci, Leonardo, 18

ﬁnite operator, 73

ﬁxed point, 12

Flajolet, Philippe, 90

formal grammar, 67

formal language, 67

formal Laurent series, 25, 27

formal power series, 25

free monoid, 67

full history recurrence, 47

function, 11

Gauss’ integral, 8

generalized convolution rule, 56

generalized harmonic number, 18

generating function, 25

generating function rules, 40

generation, 67

geometric series, 30

grammar, 67

group, 67

H-admissible function, 90

Hardy’s identity, 61

harmonic number, 8, 18

harmonic series, 17

Hayman’s method, 90

head, 67

height balanced binary tree, 52

height of a tree, 52

i.p.l., 51

identity, 12

identity for composition, 29

identity operator, 73

identity permutation, 13

image, 11

inclusion exclusion principle, 86

indeﬁnite precision, 35

indeﬁnite summation, 78

indeterminate, 25

index, 11

inﬁnitesimal operator, 73

initial condition, 6, 47

injective function, 11

input, 5

integral lattice, 20

integration of a f.p.s., 28

intensional deﬁnition, 11

internal path length, 51

intractable algorithm, 10

intrinsically ambiguous grammar, 69

invertible f.p.s., 26

involution, 13, 49

juxtaposition, 67

k-combination, 15

key, 5

Kronecker’s delta, 43

Lagrange, 32

Lagrange inversion formula, 33

Lagrange subgroup, 57

Landau, Edmund, 9

Landis, 52

language, 12, 67

language generated by the grammar, 68

leftmost occurrence, 67

length of a walk, 62

length of a word, 67

letter, 12, 67

LIF, 33

linear algorithm, 10

linear recurrence, 47

linear recurrence with constant coeﬃcients, 48


linear recurrence with polynomial coeﬃcients, 48

linearity rule for coeﬃcient of, 30

linearity rule for generating functions, 40

list representation, 36

logarithm of a f.p.s., 28

logarithmic algorithm, 10

logarithmic singularity, 88

mapping, 11

Mascheroni constant, 8, 18

metasymbol, 70

method of shifting, 44

Miller, J. C. P., 37

monoid, 67

Motzkin number, 71

Motzkin triangle, 63

Motzkin word, 71

multiset, 24

negation rule, 16

Newton’s rule, 28, 30, 84

non convergence, 83

non-ambiguous grammar, 68

north step, 20, 62

north-east step, 20

number of involutions, 14

number of mappings, 11

Numerical Analysis, 73

O-notation, 9

object grammar, 71

occurrence, 67

occurs in, 67

odd permutation, 13

operand, 35

operations on rational numbers, 35

operator, 35, 72

order of a f.p.s., 25

order of a pole, 85

order of a recurrence, 47

ordered Bell number, 24, 85

ordered partition, 24

ordinary generating function, 25

output, 5

p-ary tree, 34

parenthetization, 20, 68

partial fraction expansion, 30, 48

partial history recurrence, 47

partially recursive set, 68

Pascal language, 69

Pascal triangle, 16

path, 20

permutation, 12

place marker, 25

Pochhammer symbol, 15

pole, 85

polynomial algorithm, 10

power of a f.p.s., 27

preferential arrangement number, 24

preferential arrangements, 24

preﬁx, 67

principal value, 88

principle of identity, 40

problem of searching, 5

product of f.L.s., 27

product of f.p.s., 26

production, 67

program, 5

proper Riordan array, 55

quadratic algorithm, 10

quasi-unit, 29

rabbit problem, 19

radius of convergence, 83

random permutation, 14

range, 11

recurrence relation, 6, 47

renewal array, 57

residue of a f.L.s., 30

reverse of a f.p.s., 29

Riemann zeta function, 18

Riordan array, 55

Riordan’s old identity, 61

rising factorial, 15

Rogers, Douglas, 57

root, 21

rooted planar tree, 21

row, 11

row index, 11

row-by-column product, 32

Salvy, Bruno, 90

Schützenberger methodology, 68

semi-closed form, 46

sequence, 11

sequential searching, 5

set, 11

set partition, 23

shift operator, 73

shifting rule for coeﬃcient of, 30

shifting rule for generating functions, 40

shuﬄing, 14

simple binomial coeﬃcients, 59

singularity, 85

small-oh notation, 9

solving a recurrence, 6

sorting, 12

south-east step, 20

square root of a f.p.s., 28

Stanley, 33


Stirling number of the ﬁrst kind, 21

Stirling number of the second kind, 22

Stirling polynomial, 23

Stirling, James, 21, 22

subgroup of associated operators, 57

subtracted singularity, 88

subword, 67

successful search, 5

suﬃx, 67

sum of a geometric progression, 44

sum of f.L.s, 27

sum of f.p.s., 26

summation by parts, 78

summing factor method, 49

surjective function, 11

symbol, 12, 67

symbolic method, 68

symmetric colored walk problem, 62

symmetric group, 14

symmetry formula, 16

syntax directed compilation, 70

table, 5

tail, 67

Tartaglia triangle, 16

Taylor theorem, 80

terminal word, 68

tractable algorithm, 10

transposition, 12

transposition representation, 13

tree representation, 36

triangle, 55

trinomial coeﬃcient, 46

unary-binary tree, 71

underdiagonal colored walk, 62

underdiagonal walk, 20

unfold a recurrence, 6, 49

uniform convergence, 83

unit, 26

unsuccessful search, 5

van Wijngaarden grammar, 70

Vandermonde convolution, 43

vector notation, 11

vector representation, 12

walk, 20

word, 12, 67

worst AVL tree, 52

worst case analysis, 5

Z-sequence, 58

zeta function, 18

Zimmermann, Paul, 90

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8 Stirling numbers and Riordan arrays . .

Chapter 1

Introduction

1.1 What is the Analysis of an Algorithm

An algorithm is a finite sequence of unambiguous rules for solving a problem. Surely, the word was used for a long time before computers were invented. Leonardo Fibonacci called "algorisms" the rules for performing the four basic operations (addition, subtraction, multiplication and division) with the newly introduced arabic digits and the positional notation for numbers; "Euclid's algorithm" for evaluating the greatest common divisor of two integer numbers was also well known before the appearance of computers. So, as a matter of fact, an algorithm is independent of any computer. Once the algorithm is started with a particular input, it will always end, obtaining the correct answer or output, which is the solution to the given problem. An algorithm is realized on a computer by means of a program, i.e., a set of instructions which cause the computer to perform the elaborations intended by the algorithm.

This definition is intentionally vague, and the following points should be noted:

- an algorithm can present several aspects, and therefore may require several mathematical expressions to be fully understood;
- the operations performed by an algorithm can be of many different kinds.

Many algorithms can exist which solve the same problem. Some can be very skillful, others can be very simple and straight-forward. Simple algorithms are more easily programmed, but they can be very slow or require large amounts of computer memory. The problem is therefore to have some means to judge the speed and the quantity of computer resources a given algorithm uses, in order to choose, in these cases, the best algorithm, or to see if an algorithm is sufficiently good to be efficiently implemented on a computer.

Let us consider, as a simple example, the problem of searching. Let S be a given set. The "problem of searching" is as follows: we are given a finite ordered subset T = (a_1, a_2, ..., a_n) of S (usually called a table, its elements referred to as keys) and an element s in S, and we wish to know whether s is in T or not, and in the former case which element a_k in T it is. In practical cases S is the set N of natural numbers, or the set Z of integer numbers, or the set R of real numbers, or also the set A* of the words on some alphabet A. The nature of S, however, is not essential, because on a computer we always deal with a suitable binary representation of the elements of S, which have therefore to be considered as "words" in the computer memory.

Although the mathematical problem "s in T or not" has almost no relevance, the searching problem is basic in computer science, and many algorithms have been devised to make the process of searching as fast as possible. The most straight-forward algorithm is sequential searching: we begin by comparing s and a_1, and we are finished if they are equal. Otherwise, we compare s and a_2, and so on, until we find an element a_k = s or reach the end of T. In the former case the search is successful, and we have determined the element in T equal to s; in the latter case we are convinced that s is not in T, and the search is unsuccessful.

The analysis of this (simple) algorithm consists in finding one or more mathematical expressions describing in some way the number of operations performed by the algorithm as a function of the number n of the elements in T. A natural problem, for what concerns the sequential search algorithm, is what happens during a successful or unsuccessful search; for a successful search, we wish to know what the worst case is (Worst Case Analysis) and what the average case is (Average Case Analysis), with respect to all the tables containing n elements or, for a fixed table, with respect to the n elements it contains.

The aim of the "Analysis of Algorithms" is just to give the mathematical bases for describing the behavior of algorithms, thus obtaining exact criteria to compare different algorithms performing the same task, or to see if an algorithm is sufficiently good to be efficiently implemented on a computer.
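As an illustration (not part of the original text), the sequential searching algorithm just described can be sketched in Python; the function name and the explicit comparison counter are ours, added to make the later analysis concrete:

```python
def sequential_search(table, s):
    """Scan the table left to right, counting key comparisons.

    Returns (index, comparisons) on a successful search,
    (None, comparisons) on an unsuccessful one.
    """
    comparisons = 0
    for i, key in enumerate(table):
        comparisons += 1          # one comparison per probed key
        if key == s:
            return i, comparisons # successful search: s = a_{i+1}
    return None, comparisons      # unsuccessful: s is not in T


# Worst case for a successful search: s is the last element (n comparisons).
print(sequential_search([7, 3, 9, 1], 1))   # (3, 4)
```

Averaging the comparison counts over all n successful searches reproduces the (n+1)/2 figure derived in the next section.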

Operations depend on the nature of the algorithm. In the example above, the only operations involved in the algorithm are comparisons. In other algorithms we can have arithmetic or logical operations; sometimes we can also consider more complex operations, such as square roots, list concatenation or reversal of words, and we can decide to consider as an "operation" also very complicated manipulations (e.g., extracting a random number or performing the differentiation of a polynomial of a given degree). The important point is that every instance of the "operation" takes on about the same time, or has the same complexity. If this is not the case, we can give different weights to different operations or to different instances of the same operation.

We observe explicitly that we never consider execution time as a possible parameter for the behavior of an algorithm. Algorithms are only related to the "logic" used to solve the problem; programs can depend on the ability of the programmer or on the characteristics of the computer. Consequently, algorithms are independent of any particular computer and should never be confused with the programs realizing them.

1.2 The Analysis of Sequential Searching

The analysis of the sequential searching algorithm is very simple. It is clear that if s = a_1 we only need a single comparison; if s = a_2 we need two comparisons, and so on. For a successful search, the worst case is immediate: we have to perform n comparisons if s = a_n is the last element in T. This result is intuitively clear but important: it shows in mathematical terms that the number of comparisons performed by the algorithm (and hence the time taken on a computer) grows linearly with the dimension of T.

More interesting is the average case analysis. To find the average number of comparisons in a successful search, we should sum the number of comparisons necessary to find any element in T and divide by n:

    C_n = (1/n)(1 + 2 + ... + n) = (1/n) n(n+1)/2 = (n+1)/2.

The concluding step of our analysis was the execution of a sum, a well-known sum in the present case, but as we shall see this is not always the case. The ability in performing sums is an important technical point for the analyst, and a large part of our efforts will be dedicated to this topic.

Let us now consider an unsuccessful search, which introduces the first mathematical device of algorithm analysis. Let U_n denote the number of comparisons performed by the algorithm on a table with n elements. Since for an element s in S but not in T we compare s with a_1 and then go on with the table T' = T \ {a_1}, which contains n - 1 elements, we have:

    U_n = 1 + U_{n-1}.

This is a recurrence relation, that is, an expression relating the value of U_n with other values U_k having k < n. It is clear that if some value, e.g., U_0 or U_1, is known, then it is possible to find the value of U_n. In our case, U_0 is the number of comparisons necessary to find out that an element s does not belong to a table containing no element, so no doubt is possible. Hence we have the initial condition U_0 = 0, and we can unfold the preceding recurrence relation:

    U_n = 1 + U_{n-1} = 1 + 1 + U_{n-2} = ... = 1 + 1 + ... + 1 + U_0 = n
                                                  (n times)

for every n in N. This is typical of many algorithms and, as a matter of fact, in these cases the recurrence is easily transformed into a sum. In general, however, we have the problem of solving a recurrence, i.e., starting with the recurrence relation and the initial conditions, to find an explicit expression for U_n. Recurrence relations are the other mathematical device arising in algorithm analysis, and another large part of our efforts will be dedicated to the solution of recurrences.

1.3 Binary Searching

Let S be a given ordered set. The ordering must be total, as the numerical order in N, Z or R, or the lexicographical order in A*. If T = (a_1, a_2, ..., a_n) is a finite ordered subset of S, we can always imagine that a_1 < a_2 < ... < a_n, and consider the following algorithm, called binary searching, to search for an element s in S. Let a_i be the median element in T, i.e., i = floor((n+1)/2), and compare it with s. If s = a_i, then the search is successful. Otherwise, if s < a_i, perform the same algorithm on the subtable T' = (a_1, a_2, ..., a_{i-1}); if instead s > a_i, perform the same algorithm on the subtable T'' = (a_{i+1}, a_{i+2}, ..., a_n). If at any moment the table on which we perform the search is reduced to the empty set, then the search is unsuccessful.
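A minimal Python sketch of binary searching (again, the naming and the counter are ours, not the book's); each three-way probe of the median is counted as one comparison:

```python
def binary_search(table, s):
    """Binary searching on a sorted table; returns (found, comparisons)."""
    lo, hi = 0, len(table) - 1        # current subtable T[lo..hi]
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2          # median element of the subtable
        comparisons += 1
        if table[mid] == s:
            return True, comparisons  # successful search
        if s < table[mid]:
            hi = mid - 1              # continue on T'
        else:
            lo = mid + 1              # continue on T''
    return False, comparisons         # subtable became empty: unsuccessful


# For n = 2**3 - 1 = 7 the worst case takes log2(n+1) = 3 comparisons.
print(binary_search(list(range(1, 8)), 1))   # (True, 3)
```

Running this for every element of a table with n = 2^k - 1 elements reproduces the comparison counts 1, 2, 2, 3, 3, 3, 3, ... used in the average case analysis of the next section.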

Let us consider first the Worst Case analysis of this algorithm. Since we are performing a Worst Case analysis, we consider the worse situation, i.e., the element s is only found at the last step of the algorithm, when the subtable on which we search is reduced to a single element. We observe that every step reduces the table to floor(n/2) or to floor((n-1)/2) elements. Consequently, if B_n is the number of comparisons necessary to find s in a table T with n elements, we have the recurrence:

    B_n = 1 + B_{floor(n/2)}

with the initial condition B_1 = 1. The recurrence is not so simple as in the case of sequential searching, but we can simplify everything by considering a value of n of the form 2^k - 1. In such a case we have floor(n/2) = floor((n-1)/2) = 2^{k-1} - 1, and the recurrence takes on the form:

    B_{2^k - 1} = 1 + B_{2^{k-1} - 1}    or    beta_k = 1 + beta_{k-1},

if we write beta_k for B_{2^k - 1}. As before, unfolding yields beta_k = k and, since n = 2^k - 1 by our definition, we can express the number of comparisons as:

    B_n = log_2(n + 1).

This is valid for every n of the form 2^k - 1; for the other values it is an approximation, indeed a rather good approximation, because of the very slow growth of logarithms.

The Average Case analysis for successful searches proceeds as follows. There is only one element that can be found with a single comparison: the median element in T. There are two elements that can be found with two comparisons: the median elements in T' and in T''. Continuing in the same way, we find the average number A_n of comparisons as:

    A_n = (1/n)(1 + 2 + 2 + 3 + 3 + 3 + 3 + 4 + ... + (1 + floor(log_2 n))).

This sum also is not immediate, but we can again restrict ourselves to n = 2^k - 1:

    A_{2^k - 1} = (1/(2^k - 1)) \sum_{j=1}^{k} j 2^{j-1}.

The value of this sum can be found explicitly:

    \sum_{k=1}^{n} k 2^{k-1} = n 2^n - 2^n + 1 = (n - 1) 2^n + 1,

but the method is rather difficult and we delay it until later (see Section 4.7); the reader can check it by using mathematical induction. If we now write k 2^k - 2^k + 1 = k(2^k - 1) + k - (2^k - 1), and divide by n = 2^k - 1, we find:

    A_n = k - 1 + k/n = log_2(n + 1) - 1 + log_2(n + 1)/n,

which is only a little better than the worst case. For unsuccessful searches, the analysis is now very simple: we proceed as in the Worst Case analysis and at the last comparison we have a failure instead of a success, so that U_n = B_n.

This accounts for the dramatic improvement that binary searching operates on sequential searching. For n = 1,000,000, sequential search requires about 500,000 comparisons on the average for a successful search, whereas binary searching only requires log_2(1,000,000), i.e., about 20 comparisons.

1.4 Closed Forms

A closed form expression is an expression, the evaluation of which does not require a number of operations depending on some parameter n. For example, in:

    \sum_{k=0}^{n} k = n(n+1)/2

the left-hand expression is not in closed form, whereas the right-hand one is. In fact, the left-hand expression requires n sums to be evaluated, whilst the right-hand expression only requires a sum, a multiplication and a halving, and can therefore be evaluated in a time independent of n. The sign "=" between two numerical expressions denotes their numerical equivalence; although algebraically or numerically equivalent, two expressions can be computationally quite different. For n also moderately large (say n >= 5), nobody would prefer computing the left-hand expression rather than the right-hand one. Another example we have already found is:

    \sum_{k=1}^{n} k 2^{k-1} = (n - 1) 2^n + 1;

again, the left-hand expression is not in closed form, whereas the right-hand one is. We observe that 2^n = 2 x 2 x ... x 2 (n times) seems to require n - 1 multiplications; 2^n, however, is a simple shift in a binary computer and, more in general, every power alpha^n = exp(n ln alpha) can always be computed with the maximal accuracy allowed by a computer in constant time, i.e., in a time independent of alpha and n: a computer can require some milliseconds to compute the former, but evaluates the latter expression in a few nanoseconds. This is because the two elementary functions exp(x) and ln(x) have the nice property that their evaluation is independent of their argument. The same property holds true for the most common numerical functions, such
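The computational contrast between the two sides of these identities can be checked directly (an illustrative sketch, not part of the text; the helper names are ours):

```python
# Each identity equates a sum needing O(n) operations with a closed form
# needing O(1) operations; numerically, the two sides agree for every n.

def sum_first_n(n):
    return sum(range(n + 1))                           # n additions

def sum_k_2k(n):
    return sum(k * 2**(k - 1) for k in range(1, n + 1))

for n in (1, 5, 20):
    assert sum_first_n(n) == n * (n + 1) // 2          # closed form
    assert sum_k_2k(n) == (n - 1) * 2**n + 1           # closed form

print(sum_k_2k(5), (5 - 1) * 2**5 + 1)   # 129 129
```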

as the trigonometric and hyperbolic functions, the Gamma and psi functions (see below), the harmonic numbers and the binomial coefficients, which are all considered to be in closed form. In fact, in algorithm analysis there appear many kinds of "special" numbers, which will be discussed in the next chapter; most of them, although apparently depending on some parameter n, can be reduced to the computation of some basic quantities. The three main quantities of this kind are the factorial, the harmonic numbers and the binomial coefficients. In order to justify the previous sentence, let us anticipate some definitions, and give a more precise presentation of the Gamma and psi functions.

The Gamma function is defined by a definite integral:

    Gamma(x) = \int_0^infty t^{x-1} e^{-t} dt.

By integrating by parts, we obtain:

    Gamma(x+1) = \int_0^infty t^x e^{-t} dt
               = [-t^x e^{-t}]_0^infty + \int_0^infty x t^{x-1} e^{-t} dt = x Gamma(x),

which is a basic, recurrence property of the Gamma function. It allows us to reduce the computation of Gamma(x) to the case 1 <= x <= 2. The Gamma function is defined for every x in C, except when x is a negative integer, where the function goes to infinity; there, the following approximation can be important:

    Gamma(-n + epsilon) ~ (-1)^n / (n! epsilon).

Some special values of the Gamma function are directly obtained from the definition. When x = 1 the integral simplifies and we immediately find Gamma(1) = 1. When we unfold the basic recurrence of the Gamma function for x = n an integer, we find Gamma(n+1) = n x (n-1) x ... x 2 x 1, i.e., the factorial. The factorial Gamma(n+1) = n! = 1 x 2 x 3 x ... x n seems to require n - 2 multiplications; for n large, however, it can be computed by means of the Stirling formula, which is obtained from the same formula for the Gamma function:

    n! = Gamma(n+1) = n Gamma(n) = sqrt(2 pi n) (n/e)^n (1 + 1/(12n) + 1/(288n^2) + ...).

When x = 1/2 the definition implies:

    Gamma(1/2) = \int_0^infty t^{1/2 - 1} e^{-t} dt = \int_0^infty (e^{-t}/sqrt(t)) dt.

By performing the substitution y = sqrt(t) (t = y^2 and dt = 2y dy), we have:

    Gamma(1/2) = \int_0^infty (e^{-y^2}/y) 2y dy = 2 \int_0^infty e^{-y^2} dy = sqrt(pi),

the last integral being the famous Gauss integral. From the recurrence relation we then obtain:

    Gamma(1/2) = Gamma(1 - 1/2) = -(1/2) Gamma(-1/2)

and therefore:

    Gamma(-1/2) = -2 Gamma(1/2) = -2 sqrt(pi).

As said, by the basic recurrence we can always reduce the computation of Gamma(x) to the case 1 <= x <= 2. In this interval we can use a polynomial approximation:

    Gamma(x+1) = 1 + b_1 x + b_2 x^2 + ... + b_8 x^8 + epsilon(x),

where:

    b_1 = -0.577191652    b_5 = -0.756704078
    b_2 =  0.988205891    b_6 =  0.482199394
    b_3 = -0.897056937    b_7 = -0.193527818
    b_4 =  0.918206857    b_8 =  0.035868343

The error is |epsilon(x)| <= 3 x 10^{-7}. This requires only a fixed amount of operations to reach the desired accuracy. Another method is to use Stirling's approximation:

    Gamma(x) = sqrt(2 pi) x^{x-0.5} e^{-x} (1 + 1/(12x) + 1/(288x^2) - 139/(51840x^3) - 571/(2488320x^4) + ...).

The function psi(x), called the psi-function or digamma function, is defined as the logarithmic derivative of the Gamma function:

    psi(x) = d/dx ln Gamma(x) = Gamma'(x) / Gamma(x).

Obviously, we have:

    psi(x+1) = Gamma'(x+1)/Gamma(x+1) = d/dx (x Gamma(x)) / (x Gamma(x))
             = (Gamma(x) + x Gamma'(x)) / (x Gamma(x)) = 1/x + psi(x),

and this is a basic property of the digamma function. By this formula we can always reduce the computation of psi(x) to the case 1 <= x <= 2, where we can use the approximation:

    psi(x) = ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) + ....

As we shall see, the digamma function is related to the harmonic numbers H_n = 1 + 1/2 + 1/3 + ... + 1/n; in fact, we have:

    H_n = psi(n + 1) + gamma,

where gamma = 0.57721566... is the Mascheroni-Euler constant.
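The polynomial approximation just quoted, combined with the recurrence Gamma(x+1) = x Gamma(x), gives a constant-time routine for the Gamma function; the following sketch (function name ours, coefficients as listed above) checks it against Python's built-in `math.gamma`:

```python
import math

# The eight coefficients b_1 .. b_8 of the polynomial approximation of
# Gamma(x+1) on 0 <= x <= 1, with error bound about 3e-7 on that interval.
B = [-0.577191652, 0.988205891, -0.897056937, 0.918206857,
     -0.756704078, 0.482199394, -0.193527818, 0.035868343]

def gamma_poly(x):
    """Gamma(x) for real x >= 1, via reduction to 1 <= x <= 2."""
    prod = 1.0
    while x > 2.0:              # unfold Gamma(x) = (x-1) * Gamma(x-1)
        x -= 1.0
        prod *= x
    t = x - 1.0                 # now Gamma(x) = Gamma(t+1) with 0 <= t <= 1
    poly = 1.0
    for i, b in enumerate(B):
        poly += b * t**(i + 1)
    return prod * poly

print(abs(gamma_poly(1.5) - math.gamma(1.5)) < 1e-5)   # True
```

Note that only a fixed number of operations is executed, whatever the argument, which is exactly the sense in which the Gamma function counts as a closed form.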

By using the approximation for psi(x), we obtain an approximate formula for the harmonic numbers:

    H_n = ln n + gamma + 1/(2n) - 1/(12n^2) + ...,

which shows that the computation of H_n does not require n - 1 sums and n - 1 inversions, as it can appear from its definition. Finally, the binomial coefficient:

    C(n, k) = n(n-1)...(n-k+1) / k! = n! / (k!(n-k)!) = Gamma(n+1) / (Gamma(k+1) Gamma(n-k+1))

can be reduced to the computation of the Gamma function, or can be approximated by using the Stirling formula for factorials. We observe explicitly that the last expression shows that binomial coefficients can be defined for every n, k in C, except that k cannot be a negative integer. The reader can, as a very useful exercise, write computer programs to realize the various functions mentioned in the present section.

1.5 The Landau notation

To the mathematician Edmund Landau is ascribed a special notation to describe the general behavior of a function f(x) when x approaches some definite value. We are mainly interested in the case x -> infinity, and we consider functions f : N -> R (i.e., sequences of real numbers), but this should not be considered a restriction. Given two functions f(n) and g(n), we say that f(n) is O(g(n)), or that f(n) is in the order of g(n), if and only if:

    lim_{n -> infinity} f(n)/g(n) < infinity.

In formulas we write f(n) = O(g(n)), or also f(n) ~ g(n). If we have at the same time:

    lim_{n -> infinity} g(n)/f(n) < infinity,

we say that f(n) is in the same order as g(n), and write f(n) = Theta(g(n)) or f(n) ~~ g(n). It is easy to see that "~" is an order relation between functions f : N -> R, and that "~~" is an equivalence relation.

We observe explicitly that when f(n) is in the same order as g(n), a constant K != 0 exists such that:

    lim_{n -> infinity} f(n)/(K g(n)) = 1    or    lim_{n -> infinity} f(n)/g(n) = K;

the constant K is very important and will often be used. If f(n) and g(n) describe the behavior of two algorithms A and B solving the same problem, then, when f(n) = Theta(g(n)), the number of operations is the same, except for the constant quantity K, which remains the same as n -> infinity. The constant K can simply depend on the particular realization of the algorithms A and B: if A and B are both realized on the same computer and in the best possible way, a value K < 1 tells us that algorithm A is relatively better than B, and vice versa when K > 1. With two different implementations, however, we may have K < 1 or K > 1, this depending on the particular realization or on the particular computer on which the algorithms are run. Therefore, in general, when f(n) = Theta(g(n)) we cannot say which algorithm is better: the two algorithms are asymptotically equivalent iff f(n) = Theta(g(n)).

Before making some important comments on Landau notation, we wish to introduce a last definition: we say that f(n) is of smaller order than g(n), and write f(n) = o(g(n)), if and only if:

    lim_{n -> infinity} f(n)/g(n) = 0.

Obviously, this is in accordance with the previous definitions. If f(n) and g(n) describe the behavior of two algorithms A and B, we will say that A is asymptotically better than B iff f(n) = o(g(n)). This is rather clear, because when f(n) = o(g(n)) the number of operations performed by A is substantially less than the number of operations performed by B. Landau notation is also known as O-notation (or big-oh notation), because of the use of the letter O to denote the desired behavior; the notation just introduced (the small-oh notation) is used rather frequently and should be known.

It is also possible to give an absolute evaluation of the performance of a given algorithm A, whose behavior is described by a sequence of values f(n). This is done by comparing f(n) against an absolute scale of values. The scale most frequently used contains powers of n, logarithms and exponentials:

    O(1) < O(ln n) < O(sqrt(n)) < O(n) < O(n ln n) < O(n sqrt(n)) < O(n^2) < ...
         < O(n^5) < ... < O(e^n) < ... < O(e^{e^n}) < ....

This scale reflects well-known properties: the logarithm grows more slowly than any power n^epsilon, however small epsilon, while e^n grows faster than any power n^k, however large k.
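The constant-time evaluation of H_n can be checked numerically (an illustration of ours, not the book's; the value of gamma is quoted to machine precision):

```python
import math

def harmonic_direct(n):
    """H_n by its definition: n - 1 sums and n inversions."""
    return sum(1.0 / k for k in range(1, n + 1))

def harmonic_approx(n):
    """H_n ~ ln n + gamma + 1/(2n) - 1/(12 n^2): a fixed number of operations."""
    gamma = 0.57721566490153286   # Euler-Mascheroni constant
    return math.log(n) + gamma + 1 / (2 * n) - 1 / (12 * n**2)

print(harmonic_direct(100))
print(harmonic_approx(100))      # agrees to about 10 decimal digits
```

The discrepancy is of the order of the first omitted term, 1/(120 n^4), so already for moderate n the approximation is far more accurate than is ever needed in practice.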

Obviously, the scale is not complete: for example, O(n^{0.4}) < O(sqrt(n)) and O(ln ln n) < O(ln n); the reader can easily fill in other values which can be of interest to him or her. Note also that n^n = e^{n ln n}, and therefore O(e^n) < O(n^n). We can compare f(n) to the elements of the scale and decide, for example, that f(n) = O(n); this means that algorithm A performs at least as well as any algorithm B whose behavior is described by a function g(n) = Theta(n).

Let f(n) be a sequence describing the behavior of an algorithm A. As a standard terminology, if f(n) = O(1), then the algorithm A performs in a constant time, i.e., in a time independent of n, the parameter used to evaluate the algorithm behavior; algorithms of this type are the best possible algorithms. If f(n) = O(ln n), A is said to be logarithmic; if f(n) = O(n), the algorithm A is said to be linear; if f(n) = O(n^2), A is said to be quadratic; if f(n) <= O(n^r) for some r in R, then A is said to be polynomial; finally, if f(n) >= O(e^n), A is said to be exponential.

In the "Theory of Complexity", polynomial algorithms are also called tractable, while exponential algorithms are called intractable; these names become especially meaningful when n is very large, and they are due to the following observation. Suppose we have a linear algorithm A and an exponential algorithm B, not necessarily solving the same problem, and suppose that an hour of computer time executes both A and B with n = 40. If a new computer is used which is 1000 times faster than the old one, in an hour algorithm A will execute with m such that f(m) = K_A m = 1000 K_A n, i.e., m = 1000n = 40,000. Therefore, the problem solved by the new computer is 1000 times larger than the problem solved by the old computer. For algorithm B we have g(m) = K_B e^m = 1000 K_B e^n and, by simplifying and taking logarithms, we find m = n + ln 1000 ~ n + 6.9 < 47. Therefore, the improvement achieved by the new computer is almost negligible.
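The arithmetic behind this observation is worth writing out (a small sketch of ours, using the same n = 40 and speed-up factor 1000 as in the text):

```python
import math

n = 40          # problem size solved in one hour on the old computer
speedup = 1000  # the new computer is 1000 times faster

# Linear algorithm: K*m = 1000*K*n  =>  m = 1000*n.
m_linear = speedup * n

# Exponential algorithm: K*e**m = 1000*K*e**n  =>  m = n + ln(1000).
m_exponential = n + math.log(speedup)

print(m_linear)                      # 40000
print(round(m_exponential, 1))       # 46.9
```

So a thousandfold increase in raw speed multiplies the tractable problem size by 1000 for the linear algorithm, but adds less than 7 units to it for the exponential one.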

Chapter 2

Special numbers

A sequence is a mapping from the set N of natural numbers into some other set of numbers. If f : N -> R, the sequence is called a sequence of real numbers; if f : N -> Q, the sequence is called a sequence of rational numbers, and so on. Usually, the image of an element k in N is denoted by f_k instead of the traditional f(k); an element k in N is called an index, and the whole sequence is abbreviated as (f_0, f_1, f_2, ...) = (f_k)_{k in N}. Often, we also study double sequences, i.e., mappings f : N x N -> R or some other numeric set. In this case also, instead of writing f(n, k) we will usually write f_{n,k}, and the whole sequence will be denoted by {f_{n,k} | n, k in N} or (f_{n,k})_{n,k in N}. A double sequence can be displayed as an infinite array of numbers, whose first row is the sequence (f_{0,0}, f_{0,1}, f_{0,2}, ...), the second row is (f_{1,0}, f_{1,1}, f_{1,2}, ...), and so on; because of this notation, the index n is the row index and the index k is the column index. The array can also be read by columns: (f_{0,0}, f_{1,0}, f_{2,0}, ...) is the first column, (f_{0,1}, f_{1,1}, f_{2,1}, ...) is the second column, and so on.

We wish to describe here some sequences and double sequences of numbers frequently occurring in the analysis of algorithms. They arise in the study of very simple and basic combinatorial problems, and therefore appear in more complex situations, so to be considered as the fundamental bricks in a solid wall.

2.1 Mappings and powers

In all branches of Mathematics two concepts appear, which have to be taken as basic: the concept of a set and the concept of a mapping. A set is a collection of objects and it must be considered a primitive concept, i.e., a concept which cannot be defined in terms of other and more elementary concepts. If a, b, c, ... are the objects (or elements) in a set denoted by A, we write A = {a, b, c, ...}, thus giving an extensional definition of this particular set. If a set B is defined through a property P of its elements, we write B = {x | P(x) is true}, thus giving an intensional definition of the particular set B.

If S is a finite set, then |S| denotes its cardinality, or the number of its elements. The order in which we write or consider the elements of S is irrelevant. If we wish to emphasize a particular arrangement or ordering of the elements of S, with |S| = n and S = {a_1, a_2, ..., a_n} in any order, we write (a_1, a_2, ..., a_n). This is the vector notation, and is used to represent arrangements of S. Two arrangements of S are different if and only if an index k exists for which the elements corresponding to that index in the two arrangements are different; obviously, as sets, the two arrangements continue to be the same.

If A, B are two sets, the Cartesian product A x B is the set of all the couples (a, b) with a in A and b in B. A mapping or function from A into B, denoted f : A -> B, is a subset of the Cartesian product of A by B, i.e., f is a subset of A x B, such that every element a in A is the first element of one and only one pair (a, b) in f. The usual notation for (a, b) in f is f(a) = b, and b is called the image of a under the mapping f. The set A is the domain of the mapping, while B is the range or codomain. A function for which every a_1 != a_2 in A corresponds to pairs (a_1, b_1) and (a_2, b_2) with b_1 != b_2 is called injective; a function in which every b in B belongs at least to one pair (a, b) is called surjective. A bijection, or 1-1 correspondence, or 1-1 mapping, is any injective function which is also surjective.

If |A| = n and |B| = m, the Cartesian product A x B contains exactly nm elements or couples. A more difficult problem is to find out how many mappings from A to B exist. We can observe that every element a in A must have its image in B; this means that we can associate to a any one of the m elements in B. The same holds for every element of A and so, when the product is extended to all the n elements in A, we have m x m x ... x m = m^n different possibilities. Since all the mappings can be built in this way, we have a total of m^n different mappings from A into B. This also explains why the set of mappings from A into B is often denoted by

B^A; in fact, we can write:

    |B^A| = |B|^|A| = m^n.
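This count is small enough to verify by brute force for small sets (an illustration of ours; the sets A and B are arbitrary examples):

```python
from itertools import product

# Every mapping A -> B assigns to each of the n elements of A one of the
# m elements of B independently, so there are m**n mappings in all.
A = ['x', 'y', 'z']   # n = 3
B = [0, 1]            # m = 2

mappings = [dict(zip(A, images)) for images in product(B, repeat=len(A))]
print(len(mappings))                       # 8 = 2**3

# Mappings into a two-element set pick out subsets of A: each mapping f
# corresponds to the subset {a in A | f(a) = 1}.
subsets = [frozenset(a for a in A if f[a] == 1) for f in mappings]
print(len(set(subsets)))                   # 8 distinct subsets of A
```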

This formula allows us to solve some simple combinatorial problems. For example, if we toss 5 coins, how many different configurations head/tail are possible? The five coins are the domain of our mappings and the set {head, tail} is the codomain; therefore, we have a total of 2^5 = 32 different configurations. Similarly, if we toss three dice, the total number of configurations is 6^3 = 216. In the same way, we can count the number of subsets in a set S having |S| = n. In fact, let us consider, given a subset A of S, the mapping chi_A : S -> {0, 1} defined by:

    chi_A(x) = 1  for x in A,
    chi_A(x) = 0  for x not in A.

This is called the characteristic function of the subset A. Every two different subsets of S have different characteristic functions, and every mapping f : S -> {0, 1} is the characteristic function of some subset A of S, i.e., of the subset {x in S | f(x) = 1}. Therefore, there are as many subsets of S as there are characteristic functions; but these are 2^n by the formula above.

A finite set A is sometimes called an alphabet, and its elements symbols or letters. Any sequence of letters is called a word; the empty sequence is the empty word and is denoted by epsilon. From the previous considerations, if |A| = n, the number of words of length m is n^m. The first part of Chapter 6 is devoted to some basic notions on special sets of words, or languages.

2.2 Permutations

In the usual sense, a permutation of a set of objects is an arrangement of these objects in any order. For example, three objects, denoted by a, b, c, can be arranged into six different ways:

    (a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a).

A very important problem in Computer Science is "sorting": suppose we have n objects from an ordered set, usually some set of numbers or some set of strings (with the common lexicographic order); the objects are given in a random order, and the problem consists in sorting them according to the given order. For example, by sorting (60, 51, 80, 77, 44) we should obtain (44, 51, 60, 77, 80), and the real problem is to obtain this ordering in the shortest possible time. In other words, we start with a random permutation of the n objects, and wish to arrive to their standard ordering, the one in accordance with their supposed order relation (e.g., "less than").

In order to abstract from the particular nature of the n objects, we will use the numbers {1, 2, ..., n} = N_n, and define a permutation as a 1-1 mapping pi : N_n -> N_n. By identifying a and 1, b and 2, c and 3, the six permutations of 3 objects are written:

    1 2 3    1 2 3    1 2 3    1 2 3    1 2 3    1 2 3
    1 2 3    1 3 2    2 1 3    2 3 1    3 1 2    3 2 1

where, conventionally, the first line contains the elements in N_n in their proper order, and the second line contains the corresponding images. This is the usual representation for permutations but, since the first line can be understood without ambiguity, the vector representation for permutations is more common. This consists in writing the second line (the images) in the form of a vector. Therefore, the six permutations are (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1), respectively.

Let us examine the permutation pi = (3, 2, 1), for which we have pi(1) = 3, pi(2) = 2 and pi(3) = 1. If we start with the element 1 and successively apply the mapping pi, we have pi(1) = 3, pi(pi(1)) = pi(3) = 1, pi(pi(pi(1))) = pi(1) = 3, and so on. Since the elements in N_n are finite, by starting with any k in N_n we must obtain a finite chain of numbers, which will repeat always in the same order. These numbers are said to form a cycle, and the permutation (3, 2, 1) is formed by two cycles, the first one composed by 1 and 3, the second one only composed by 2. We write (3, 2, 1) = (1 3)(2), where every cycle is written between parentheses and numbers are separated by blanks, to distinguish a cycle from a vector. Conventionally, a cycle is written with the smallest element first, and the various cycles are arranged according to their first element. Therefore, in this cycle representation, the six permutations are:

    (1)(2)(3), (1)(2 3), (1 2)(3), (1 2 3), (1 3 2), (1 3)(2).

A number k for which pi(k) = k is called a fixed point for pi. The corresponding cycles, formed by a single element, are conventionally understood, except in the identity (1, 2, ..., n) = (1)(2)...(n), in which all the elements are fixed points; the identity is simply written (1). Consequently, the usual representation of the six permutations is:

    (1), (2 3), (1 2), (1 2 3), (1 3 2), (1 3).

A permutation without any fixed point is called a derangement. A cycle with only two elements is called a transposition. The degree of a cycle is the number of its elements, plus one; the degree of a permutation

is the sum of the degrees of its cycles. The six permutations above have degree 2, 3, 3, 4, 4, 3, respectively. A permutation is even or odd according to whether its degree is even or odd. For example, the permutation (8, 9, 4, 3, 6, 1, 7, 2, 10, 5), in vector notation, has the cycle representation (1 8 2 9 10 5 6)(3 4), the number 7 being a fixed point. The long cycle (1 8 2 9 10 5 6) has degree 8; therefore the permutation degree is 8 + 3 + 2 = 13 and the permutation is odd.

2.3 The group structure

Let n ∈ N; Pn denotes the set of all the permutations of n elements, i.e., according to the previous sections, the set of 1-1 mappings π : Nn → Nn. If π, ρ ∈ Pn, we can perform their composition, i.e., form a new permutation σ defined by σ(k) = ρ(π(k)) = (π ◦ ρ)(k); note that π is applied first, so that π ◦ ρ is read from left to right. An example in P7 is:

    π ◦ ρ  =  1 2 3 4 5 6 7  ◦  1 2 3 4 5 6 7  =  1 2 3 4 5 6 7
              1 5 6 7 4 2 3     4 5 2 1 7 6 3     4 7 6 3 1 5 2

In fact, for instance, π(2) = 5 and ρ(5) = 7; therefore σ(2) = ρ(π(2)) = ρ(5) = 7, and so on. The vector representation of permutations is not particularly suited for hand evaluation of composition, although it is very convenient for computer implementation. The opposite situation occurs for the cycle representation:

    (2 5 4 7 3 6) ◦ (1 4)(2 5 7 3) = (1 4 3 6 5)(2 7)

The cycles in the left-hand member are read from left to right, and by examining one cycle after the other we find the image of every element by simply remembering the cycle successor of that element. For example, the image of 4 in the first cycle is 7; the second cycle does not contain 7, but the third cycle tells us that the image of 7 is 3; therefore, the image of 4 is 3. Fixed points are ignored, and this is in accordance with their meaning. The composition symbol '◦' is usually understood and, in fact, simple juxtaposition of the cycles can well denote their composition.

The identity mapping acts as an identity for composition, since it is composed only of fixed points. Every permutation has an inverse, which is the permutation obtained by reading the given permutation from below, i.e., by sorting the elements in the second line, which become the elements of the first line, and then rearranging the elements of the first line into the second line. For example, the inverse of the permutation π in the example above is:

    1 2 3 4 5 6 7
    1 6 7 5 2 3 4

In fact, in cycle notation, we have:

    (2 5 4 7 3 6)(2 6 3 7 4 5) = (2 6 3 7 4 5)(2 5 4 7 3 6) = (1).

A simple observation is that the inverse of a cycle is obtained by writing its first element followed by all the other elements in reverse order; hence, the inverse of a transposition is the same transposition. Since composition is associative, we have proved that (Pn, ◦) is a group. The group is not commutative because, for example:

    ρ ◦ π = (1 4)(2 5 7 3) ◦ (2 5 4 7 3 6) = (1 7 6 2 4)(3 5) ≠ π ◦ ρ.

An involution is a permutation π such that π² = π ◦ π = (1). An involution can only be composed of fixed points and transpositions: by the definition we have π⁻¹ = π, and the above observation on the inversion of cycles shows that a cycle with more than 2 elements has an inverse which cannot coincide with the cycle itself.

Till now, we have supposed that in the cycle representation every number is considered only once. However, if we think of a permutation as a product of cycles, we can imagine representations which are not unique, in which an element k ∈ Nn appears in more than one cycle; the representations of σ = π ◦ ρ above are examples of this statement. In particular, we can obtain the transposition representation of a permutation. For example:

    (2 6)(6 5)(6 4)(6 7)(6 3) = (2 5 4 7 3 6)

We transform a cycle into a product of transpositions by forming a transposition with the first and the last element of the cycle, and then adding further transpositions having the same first element (the last element of the cycle) and, as second elements, the other elements of the cycle, in the same order. Besides, we note that we can always add a couple of transpositions such as (2 5)(2 5), corresponding to the two fixed points (2) and (5), and therefore adding nothing to the permutation. All these remarks show that:

• every permutation can be written as a composition of transpositions;
• this representation is not unique, but any two representations differ by an even number of transpositions;
• the minimal number of transpositions representing a cycle is the number of its elements minus one, which has the same parity as the degree of the cycle (fixed points, when written out, always correspond to an even number of transpositions).
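These conventions (left-to-right composition, standard cycle form with the smallest element first) are easy to check mechanically. The following Python sketch is not from the book; permutations are represented as tuples of images of 1..n, and the function names are my own:

```python
def compose(p, q):
    """Left-to-right composition: (p o q)(k) = q(p(k)), p applied first."""
    return tuple(q[p[k] - 1] for k in range(len(p)))

def cycles(p):
    """Standard cycle representation: each cycle starts at its smallest
    element; cycles (including singleton fixed points) are listed in
    order of their first elements."""
    seen, result = set(), []
    for start in range(1, len(p) + 1):
        if start not in seen:
            cyc, k = [], start
            while k not in seen:
                seen.add(k)
                cyc.append(k)
                k = p[k - 1]
            result.append(tuple(cyc))
    return result

pi  = (1, 5, 6, 7, 4, 2, 3)   # = (2 5 4 7 3 6) in cycle notation
rho = (4, 5, 2, 1, 7, 6, 3)   # = (1 4)(2 5 7 3)
print(cycles(compose(pi, rho)))  # [(1, 4, 3, 6, 5), (2, 7)]
print(cycles(compose(rho, pi)))  # [(1, 7, 6, 2, 4), (3, 5)] -- not commutative
```

The two printed cycle decompositions reproduce the worked example of the text, confirming that composition is not commutative.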

Therefore, we conclude that an even [odd] permutation can be expressed as the composition of an even [odd] number of transpositions. Since the composition of two even permutations is still an even permutation, the set An of even permutations is a subgroup of Pn and is called the alternating subgroup, while the whole group Pn is referred to as the symmetric group.

When n ≥ 2, we can add to every permutation in Pn one transposition, say (1 2). This transforms every even permutation into an odd permutation, and vice versa. On the other hand, since (1 2)⁻¹ = (1 2), the transformation is its own inverse, and therefore defines a 1-1 mapping between even and odd permutations. This proves that the number of even (odd) permutations is n!/2.

Another simple problem is to determine the number of involutions on n elements. As we have already seen, an involution is composed only of fixed points and transpositions (without repetitions of the elements!). If we denote by In the set of involutions of n elements, we can divide In into two subsets: I′n is the set of involutions in which n is a fixed point, and I″n is the set of involutions in which n belongs to a transposition, say (k n). If we eliminate n from the involutions in I′n, we obtain involutions of n − 1 elements, and vice versa every involution in I′n can be obtained by adding the fixed point n to an involution in In−1. If we eliminate the transposition (k n) from an involution in I″n, we obtain an involution of n − 2 elements, one which contains the element n − 1 but does not contain the element k. In all cases, however, by eliminating (k n) from all the involutions containing it, we obtain a set of involutions in a 1-1 correspondence with In−2. The element k can assume any value 1, 2, . . . , n − 1, and therefore we obtain (n − 1) times |In−2| involutions. We now observe that all the involutions in In are obtained in this way from involutions in In−1 and In−2, and therefore we have:

    |In| = |In−1| + (n − 1)|In−2|
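The recurrence |In| = |In−1| + (n − 1)|In−2| can be checked against a direct enumeration of the permutations satisfying π ◦ π = (1). A Python sketch, not from the book (function names are mine):

```python
from itertools import permutations

def involutions_by_recurrence(n):
    """|I_0| = |I_1| = 1, |I_n| = |I_{n-1}| + (n-1)|I_{n-2}|."""
    a, b = 1, 1
    for m in range(2, n + 1):
        a, b = b, b + (m - 1) * a
    return b

def involutions_by_force(n):
    """Count permutations p of {0,...,n-1} with p(p(k)) = k for all k."""
    elems = range(n)
    return sum(1 for p in permutations(elems)
               if all(p[p[k]] == k for k in elems))

print([involutions_by_recurrence(n) for n in range(9)])
# [1, 1, 2, 4, 10, 26, 76, 232, 764]
```

The printed values match the sequence tabulated in the next section.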

2.4 Counting permutations

How many permutations are there in Pn? If n = 1, we only have the single permutation (1), and if n = 2 we have exactly two permutations, (1, 2) and (2, 1). We have already seen that |P3| = 6, and if n = 0 we consider the empty vector () as the only possible permutation, that is |P0| = 1. In this way we obtain a sequence {1, 1, 2, 6, . . .}, and we wish to obtain a formula giving us |Pn| for every n ∈ N.

Let π ∈ Pn be a permutation and (a1, a2, . . . , an) its vector representation. We can obtain a permutation in Pn+1 by simply adding the new element n + 1 in any position of the representation of π:

    (n + 1, a1, a2, . . . , an)   (a1, n + 1, a2, . . . , an)   · · ·   (a1, a2, . . . , an, n + 1)

Therefore, from any permutation in Pn we obtain n+1 permutations in Pn+1 , and they are all diﬀerent. Vice versa, if we start with a permutation in Pn+1 , and eliminate the element n + 1, we obtain one and only one permutation in Pn . Therefore, all permutations in Pn+1 are obtained in the way just described and are obtained only once. So we ﬁnd: |Pn+1 | = (n + 1) |Pn |

which is a simple recurrence relation. By unfolding this recurrence, i.e., by substituting for |Pn| the same expression written for |Pn+1|, and so on, we obtain:

    |P_{n+1}| = (n+1)|P_n| = (n+1)n|P_{n-1}| = \cdots = (n+1)n(n-1)\cdots 1 \times |P_0|

Since, as we have seen, |P0| = 1, we have proved that the number of permutations in Pn is given by the product n · (n − 1) · · · 2 · 1. Therefore, our sequence is:

    n     0  1  2  3  4   5    6    7     8
    |Pn|  1  1  2  6  24  120  720  5040  40320

As we mentioned in the Introduction, the number n · (n − 1) · · · 2 · 1 is called n factorial and is denoted by n!. For example, we have 10! = 10 · 9 · 8 · 7 · 6 · 5 · 4 · 3 · 2 · 1 = 3,628,800. Factorials grow very fast, but they are one of the most important quantities in Mathematics.

Since |I0| = 1, |I1| = 1 and |I2| = 2, from the recurrence relation of the previous section we can successively find all the values of |In|. This sequence (see Section 4.9) is therefore:

    n    0  1  2  3  4   5   6   7    8
    In   1  1  2  4  10  26  76  232  764

We conclude this section by giving the classical computer program for generating a random permutation of the numbers {1, 2, . . . , n}. The procedure shuffle receives the address of the vector and the number of its elements; it fills the vector with the numbers from 1 to n, and then uses the standard procedure random to produce a random permutation, which is returned in the input vector:

    procedure shuffle( var v : vector; n : integer ) ;
    var i, j, a : integer ;
    begin
      for i := 1 to n do v[ i ] := i ;
      for i := n downto 2 do begin
        j := random( i ) + 1 ;
        a := v[ i ] ; v[ i ] := v[ j ] ; v[ j ] := a
      end
    end {shuffle} ;
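The same Fisher–Yates idea translates directly into Python; this sketch is mine, not the book's, and it assumes (as in many Pascal dialects) that random(i) returns an integer in 0..i−1, here replaced by randrange:

```python
import random

def shuffle(n):
    """Return a random permutation of 1..n: fill the vector, then swap
    position i with a random position j <= i, from the last position
    down to the second (0-based indices below)."""
    v = list(range(1, n + 1))
    for i in range(n - 1, 0, -1):
        j = random.randrange(i + 1)    # 0 <= j <= i
        v[i], v[j] = v[j], v[i]
    return v

p = shuffle(10)
print(sorted(p) == list(range(1, 11)))  # True: the result is a permutation
```

Each of the n! permutations is produced with the same probability, since at every step the element for the current position is chosen uniformly among the remaining candidates.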

The procedure exchanges the last element of v with a random element of v (possibly the last element itself); in this way the last element is chosen at random, and the procedure goes on with the last but one position. Eventually, the elements of v are properly shuffled and v contains the desired random permutation. The procedure obviously performs in linear time.

2.5 Dispositions and Combinations

Permutations are a special case of a more general situation: given a set of n objects, we can wonder how many different orderings exist of k among the n objects. For example, if we have 4 objects a, b, c, d, we can make 12 different arrangements with two objects chosen from a, b, c, d:

    (a, b)  (a, c)  (a, d)  (b, a)  (b, c)  (b, d)
    (c, a)  (c, b)  (c, d)  (d, a)  (d, b)  (d, c)

These arrangements are called dispositions and, in general, there are n(n − 1) · · · (n − k + 1) different k-dispositions of n objects: we can use any one of the n objects to be first in the arrangement, there remain only n − 1 objects to be used as second element, n − 2 objects to be used as third element, and so on. Therefore, if Dn,k denotes the number of k-dispositions of n elements, we have:

    D_{n,k} = n(n-1)\cdots(n-k+1) = n^{\underline{k}}

The symbol n^{\underline{k}}, often denoted also by (n)_k, the so-called Pochhammer symbol, is called a falling factorial, because it consists of k factors beginning with n and decreasing by one down to n − k + 1. By convention, n^{\underline{0}} = 1; obviously, n^{\underline{n}} = n!. There exists also a rising factorial n^{\overline{k}} = n(n+1)\cdots(n+k-1).

When in k-dispositions we do not consider the order of the elements, we obtain what are called k-combinations, because they are simply the subsets with k objects of a set of n objects. There are only 6 2-combinations of the 4 objects above:

    {a, b}  {a, c}  {a, d}  {b, c}  {b, d}  {c, d}

Combinations are written between braces, as is usual for sets. For example, all the possible combinations of the set {a, b, c} are:

    C3,0 = {∅}
    C3,1 = {{a}, {b}, {c}}
    C3,2 = {{a, b}, {a, c}, {b, c}}
    C3,3 = {{a, b, c}}

since the empty set is the only subset with 0 elements and the whole set is the only subset with n elements.

The number of k-combinations of n objects is denoted by \binom{n}{k}, which is often read "n choose k" and is called a binomial coefficient. By permuting the k objects of a disposition we obtain k! dispositions with the same elements; hence k! dispositions correspond to a single combination, and we have:

    \binom{n}{k} = \frac{n^{\underline{k}}}{k!} = \frac{n(n-1)\cdots(n-k+1)}{k!}

By the very definition we have \binom{n}{0} = \binom{n}{n} = 1 for every n ∈ N. For example:

    \binom{7}{3} = \frac{7 \cdot 6 \cdot 5}{3 \cdot 2 \cdot 1} = 35

There exists also a simple formula to compute binomial coefficients in a recursive way, which follows at once from the definition:

    \binom{n}{k} = \frac{n}{k}\binom{n-1}{k-1}

The name "binomial coefficients" is due to the well-known binomial formula:

    (a + b)^n = \sum_{k=0}^{n}\binom{n}{k} a^k b^{n-k}

which is easily proved: when expanding the product (a + b)(a + b) · · · (a + b), we choose a term from each factor (a + b), and the resulting term a^k b^{n−k} is obtained by summing over all the possible choices of k a's and n − k b's, which, as we know, are \binom{n}{k} in number.

It is not difficult to compute a binomial coefficient such as \binom{100}{3} by means of the above formulas, but it is a hard job to compute \binom{100}{97} in the same way.
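The definition \binom{n}{k} = n^{\underline{k}}/k! also suggests a computation that alternates multiplications and divisions, so intermediate values stay small. A Python sketch, not from the book (the intermediate quotients are themselves binomial coefficients, hence exact integers):

```python
def falling(n, k):
    """Falling factorial n(n-1)...(n-k+1); n may be any number."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

def binomial(n, k):
    """n choose k: after step i the partial result is C(n, i+1),
    so the integer division is always exact."""
    result = 1
    for i in range(k):
        result = result * (n - i) // (i + 1)
    return result

print(falling(7, 3))      # 7 * 6 * 5 = 210
print(binomial(7, 3))     # 35
print(binomial(100, 3))   # 161700
```

Note that binomial(100, 97) computed this way performs 97 steps; the symmetry rule of the next section reduces it to the 3 steps of binomial(100, 3).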
2.6 The Pascal triangle

Binomial coefficients satisfy a very important recurrence relation, which we are now going to prove:

    \binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}

As we know, \binom{n}{k} is the number of the subsets with k elements of a set with n elements. Let {a1, a2, . . . , an} be the elements of the base set and let us fix one of these elements, say an. We can distinguish the subsets with k elements into two classes: the subsets containing an and the subsets that do not contain an. Let S+ and S− be these two classes. We now point out that S+ can be seen (by eliminating an) as composed of the subsets with k − 1 elements of a set with n − 1 elements; the number of its elements is therefore \binom{n-1}{k-1}. The class S− can be seen as composed of the subsets with k elements of a set with n − 1 elements, the base set minus an; their number is therefore \binom{n-1}{k}. By summing these two contributions we obtain the recurrence relation, which can be used with the initial conditions:

    \binom{n}{0} = \binom{n}{n} = 1    ∀n ∈ N.

This recurrence is not particularly suitable for numerical computation; however, it gives a simple rule to compute successively all the binomial coefficients. Let us dispose them in an infinite array, whose rows represent the number n and whose columns represent the number k; we actually obtain an infinite, lower triangular array known as the Pascal triangle (or Tartaglia–Pascal triangle). The array is initially filled with 1's in the first column (corresponding to the various \binom{n}{0}) and on the main diagonal (corresponding to the numbers \binom{n}{n}); the recurrence then tells us that the element in position (n, k) is obtained by summing two elements in the previous row: the element just above, in position (n − 1, k), and the element on its left, in position (n − 1, k − 1). See Table 2.1. For example, we have:

    \binom{4}{2} = \binom{3}{1} + \binom{3}{2}
                 = \left[\binom{2}{0} + \binom{2}{1}\right] + \left[\binom{2}{1} + \binom{2}{2}\right]
                 = 1 + 2 + 2 + 1 = 6.

Let us now observe that:

    \binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!}
                 = \frac{n(n-1)\cdots(n-k+1)}{k!} \cdot \frac{(n-k)(n-k-1)\cdots 1}{(n-k)(n-k-1)\cdots 1}
                 = \frac{n!}{k!\,(n-k)!}

This is a very important formula by its own, and it immediately yields a symmetry formula which is very useful and will be used very often:

    \binom{n}{k} = \binom{n}{n-k}

From this formula we immediately obtain, e.g., \binom{100}{97} = \binom{100}{3}, which we already know how to compute easily.

The definition of a binomial coefficient can be easily extended to any real numerator:

    \binom{r}{k} = \frac{r^{\underline{k}}}{k!} = \frac{r(r-1)\cdots(r-k+1)}{k!}

For example we have:

    \binom{1/2}{3} = \frac{(1/2)(-1/2)(-3/2)}{3!} = \frac{1}{16}

but in this case the symmetry rule does not make sense. We also point out that:

    \binom{-n}{k} = \frac{(-n)(-n-1)\cdots(-n-k+1)}{k!} = \frac{(-1)^k\,(n+k-1)^{\underline{k}}}{k!} = (-1)^k \binom{n+k-1}{k}

This rule, known as negation, allows us to express a binomial coefficient with a negative, integer numerator as a binomial coefficient with a positive numerator.

The reader is invited to produce a computer program to evaluate binomial coefficients. He (or she) is warned not to use the formula n!/(k!(n − k)!), which can produce very large numbers, exceeding the capacity of the computer even when n and k are not large. The most difficult computing problem is the evaluation of the central binomial coefficient \binom{2k}{k}, for which symmetry gives no help.
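One possible solution to the suggested exercise, following the book's advice to use the recurrence rather than factorials, is the following Python sketch (mine, not the book's own program); it builds the triangle row by row, so no number larger than the answers themselves ever appears:

```python
def pascal_rows(nmax):
    """Rows 0..nmax of the Pascal triangle, via the recurrence
    C(n, k) = C(n-1, k) + C(n-1, k-1), with C(n, 0) = C(n, n) = 1."""
    rows = [[1]]
    for n in range(1, nmax + 1):
        prev = rows[-1]
        row = [1] + [prev[k] + prev[k - 1] for k in range(1, n)] + [1]
        rows.append(row)
    return rows

triangle = pascal_rows(6)
print(triangle[4])        # [1, 4, 6, 4, 1]
print(sum(triangle[6]))   # row sums are 2^n: here 64
```

In Python the factorial formula would not overflow (integers are unbounded), but in a language with fixed-size integers this row-by-row scheme is the safe one.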

Symmetry, negation and the following cross-product rule:

    \binom{n}{k}\binom{k}{r} = \binom{n}{r}\binom{n-r}{k-r}

are the three basic properties of binomial coefficients, and they are to be remembered by heart. The cross-product rule is proved by a simple computation:

    \binom{n}{k}\binom{k}{r} = \frac{n!}{k!\,(n-k)!} \cdot \frac{k!}{r!\,(k-r)!}
                             = \frac{n!}{r!\,(n-r)!} \cdot \frac{(n-r)!}{(n-k)!\,(k-r)!}
                             = \binom{n}{r}\binom{n-r}{k-r}

The numbers cn = \binom{2n}{n} are called central binomial coefficients and their sequence begins:

    n    0  1  2  3   4   5    6    7     8
    cn   1  2  6  20  70  252  924  3432  12870

They are very important; for example, we can express the binomial coefficients \binom{-1/2}{n} in terms of the central binomial coefficients:

    \binom{-1/2}{n} = \frac{(-1/2)(-3/2)\cdots(-(2n-1)/2)}{n!}
                    = \frac{(-1)^n\; 1 \cdot 3 \cdots (2n-1)}{2^n\, n!}
                    = \frac{(-1)^n\; 1 \cdot 2 \cdot 3 \cdot 4 \cdots (2n-1) \cdot (2n)}{2^n\, n!\; 2 \cdot 4 \cdots (2n)}
                    = \frac{(-1)^n\,(2n)!}{2^n\, n!\; 2^n\, n!}
                    = \frac{(-1)^n}{4^n}\binom{2n}{n}

In a similar way, we can prove the following identities:

    \binom{1/2}{n}  = \frac{(-1)^{n-1}}{4^n\,(2n-1)}\binom{2n}{n}
    \binom{3/2}{n}  = \frac{(-1)^n\, 3}{4^n\,(2n-1)(2n-3)}\binom{2n}{n}
    \binom{-3/2}{n} = \frac{(-1)^n\,(2n+1)}{4^n}\binom{2n}{n}

As we have already mentioned, an important point of this generalization is that the binomial formula can be extended to real exponents:

    (1+x)^r = \sum_{n=0}^{\infty} \binom{r}{n}\, x^n

In fact, let us consider the function f(x) = (1 + x)^r; it is continuous and can be differentiated as many times as we wish:

    f^{(n)}(x) = \frac{d^n}{dx^n}(1+x)^r = r^{\underline{n}}\,(1+x)^{r-n}

as can be shown by mathematical induction. The first two cases are f^{(0)}(x) = (1+x)^r = r^{\underline{0}}(1+x)^{r-0} and f'(x) = r(1+x)^{r-1}. Suppose now that the formula holds true for some n ∈ N and let us differentiate it once more:

    f^{(n+1)}(x) = r^{\underline{n}}\,(r-n)\,(1+x)^{r-n-1} = r^{\underline{n+1}}\,(1+x)^{r-(n+1)}

and this proves our statement. Therefore, (1 + x)^r has a Taylor expansion around the point x = 0 of the form:

    (1+x)^r = f(0) + \frac{f'(0)}{1!}x + \frac{f''(0)}{2!}x^2 + \cdots + \frac{f^{(n)}(0)}{n!}x^n + \cdots

The coefficient of x^n is therefore f^{(n)}(0)/n! = r^{\underline{n}}/n! = \binom{r}{n}, which proves the expansion above.

    n\k   0  1  2  3  4  5  6
    0     1
    1     1  1
    2     1  2  1
    3     1  3  3  1
    4     1  4  6  4  1
    5     1  5 10 10  5  1
    6     1  6 15 20 15  6  1

    Table 2.1: The Pascal triangle

A simple observation is that the sum of the elements in row n of the Pascal triangle is 2^n, i.e., the total number of the subsets of a set with n elements. The reader can try to prove that the row alternating sums, e.g., the sum 1 − 4 + 6 − 4 + 1, equal zero, except for the first row, the one with n = 0.

2.7 Harmonic numbers

It is well-known that the harmonic series:

    \sum_{k=1}^{\infty}\frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots

diverges. In fact, if we cumulate the 2^m numbers from 1/(2^m + 1) to 1/2^{m+1}, we obtain:

    \frac{1}{2^m+1} + \frac{1}{2^m+2} + \cdots + \frac{1}{2^{m+1}} > \frac{2^m}{2^{m+1}} = \frac{1}{2}

and therefore the sum cannot be limited.

Conventionally. its value 3 11 25 137 49 363 761 is: γ ≈ 0. 2 3 4 n ζ(s) = 1 1 1 + s + s + ··· s 1 2 3 Obviously we have: H2n − L2n = 2 1 1 1 + + ··· + 2 4 2n = Hn which can be deﬁned in such a way that the sum actually converges except for s = 1 (the harmonic series). 8 27 64 125 Therefore: 1 1 1 1 π4 1 ζ(4) = 1 + + + + + ··· = ln 2 − < L2n < ln 2 16 81 256 625 90 2n and in general: and by summing Hn to all members: Hn + ln 2 − 1 < H2n < Hn + ln 2 2n ζ(2n) = (2π)2n |B2n | 2(2n)! 1 Because of the limited value of ζ(s). . partial sum of the harmonic series. They are the partial sums of the series deﬁning the Riemann ζ function: and let us deﬁne: Ln = 1 − 1 1 1 (−1)n−1 + − + ··· + . for < H2k−1 < H2k−2 + ln 2. Fibonacci was .18 CHAPTER 2. Let us consider the series: 1 1 1 1 (s) Hn = s + s + s + · · · + s 1 2 3 n 1 1 1 1 1 − + − + − · · · = ln 2 (1) 2 3 4 5 and Hn = Hn . On the Since H20 = H1 = 1. .202056903 . (s) Hn ≈ ζ(s) we obtain: H2k−2 + ln 2 − H2k−2 + 2 ln 2 − 1 1 − k−1 < H2k < H2k−2 + 2 ln 2. Leonardo Fibonacci introduced in Europe the positional notation for numbers.5772156649 . the error committed by truncating 1 1 1 1 ζ(3) = 1 + + + + + ··· it at any place is less than the ﬁrst discarded element. . and 2 3 n since the values of the Hn ’s are increasing. In fact. 2. SPECIAL NUMBERS and therefore the sum cannot be limited.that: monic number. k 2 2 2 At the beginning of 1200. No explicit formula is known for ζ(2n + 1). In particular we have: ζ(2) = 1 + 1 1 π2 1 1 + + + + ··· = 4 9 16 25 6 or H2n = L2n + Hn . 2k−1 large values of n: By summing and simplifying these two expressions.8 Fibonacci numbers We can now iterate this procedure and eventually ﬁnd: H20 +k ln 2− 1 1 1 − k−1 −· · ·− 1 < H2k < H20 +k ln 2. and since the series for ln 2 is alternating in sign. we set H0 = 0 and Hn → ln n + γ as n→∞ the sequence of harmonic numbers begins: This constant is called the Euler-Mascheroni con4 5 6 7 8 n 0 1 2 3 stant and. . 
we have the limitations: other hand we can deﬁne: ln 2k < H2k < ln 2k + 1. This plies that a constant γ should exist (0 < γ < 1) such number has a well-deﬁned value and is called a har. 1 1 1 Hn = 1 + + + · · · + These limitations can be extended to every n.. but numeri2k−2 : cally we have: 1 H2k−1 + ln 2 − k < H2k < H2k−1 + ln 2 2 ζ(3) ≈ 1. as we have already mentioned. together with the computing algorithms for performing the four basic operations. . 2k 2 Let us now consider the two cases n = 2k−1 and n = where Bn are the Bernoulli numbers (see below). we can set. Harmonic numbers arise in the analysis of many The generalized harmonic numbers are deﬁned as: algorithms and it is very useful to know an approximate value for them. Later we will prove the Hn 0 1 2 6 12 60 20 140 280 more accurate approximation of the Hn ’s we quoted in the Introduction. this ima ﬁnite.

the most important mathematician in western Europe at that time. He posed the following problem: a farmer has a couple of rabbits which generates another couple after two months and, from that moment on, a new couple of rabbits every month; the newly generated couples become fertile after two months, when they begin to generate a new couple of rabbits every month in their turn. The problem consists in computing how many couples of rabbits the farmer has after n months.

It is a simple matter to find the initial values: there is one couple at the beginning (the first month) and still 1 couple in the second month. The third month, the first couple has generated another pair of rabbits, so the farmer has 2 couples; there are 3 couples the fourth month, while the previously generated couple of rabbits has not yet become fertile. The couples become 5 on the fifth month: in fact, there are the 3 couples of the preceding month plus the newly generated couples, which are as many as the fertile couples, that is the couples the farmer had two months before. In general, at the nth month, the farmer will have the couples of the previous month plus the new couples, which are generated by the fertile couples, i.e., by the couples already present two months beforehand. If we denote by Fn the number of couples at the nth month, we have the Fibonacci recurrence:

    F_n = F_{n-1} + F_{n-2}

with the initial conditions F1 = F2 = 1. By the same rule, we also have F0 = 0, and the sequence of Fibonacci numbers begins:

    n    0  1  2  3  4  5  6  7   8   9   10
    Fn   0  1  1  2  3  5  8  13  21  34  55

every term being obtained by summing the two preceding numbers in the sequence. Despite the small numbers appearing at the beginning of the sequence, Fibonacci numbers grow very fast; later we will see how they grow and how they can be computed in a fast way. For the moment, we wish to show how Fibonacci numbers appear in combinatorics and in the analysis of algorithms.

Suppose we have some bricks of dimensions 1 × 2 dm, and we wish to cover a strip 2 dm wide and n dm long by using these bricks. The problem is to know in how many different ways we can perform this covering. In Figure 2.1 we show the five ways to cover a strip 4 dm long.

    Figure 2.1: Fibonacci coverings for a strip 4 dm long

If Mn is this number, we can observe that a covering of length n is obtained either by adding vertically a brick to a covering of length n − 1, or by adding horizontally two bricks to a covering of length n − 2. These are the only ways of proceeding to build our coverings; therefore we have the recurrence relation:

    M_n = M_{n-1} + M_{n-2}

which is the same recurrence as for Fibonacci numbers. This time, however, we have the initial conditions M0 = 1 (the empty covering is just a covering!) and M1 = 1. Therefore we conclude Mn = Fn+1.

Euclid's algorithm for computing the Greatest Common Divisor (gcd) of two positive integer numbers is another instance of the appearance of Fibonacci numbers. The problem is to determine the maximal number of divisions performed by Euclid's algorithm. Let us consider two consecutive Fibonacci numbers, for example 34 and 55, and let us try to find gcd(34, 55):

    55 = 1 × 34 + 21
    34 = 1 × 21 + 13
    21 = 1 × 13 + 8
    13 = 1 × 8  + 5
     8 = 1 × 5  + 3
     5 = 1 × 3  + 2
     3 = 1 × 2  + 1
     2 = 2 × 1

We immediately see that the quotients are all 1 (except the last one) and that the remainders are decreasing Fibonacci numbers. The maximal number of divisions is attained when every division in the process gives 1 as a quotient, since a greater quotient would drastically cut the number of necessary divisions; the process can be inverted to prove that only consecutive Fibonacci numbers enjoy this property. Therefore, we conclude that, given two integer numbers n and m, the maximal number of divisions performed by Euclid's algorithm is attained when n and m are two consecutive Fibonacci numbers, and the actual number of divisions is the order of the smaller number in the Fibonacci sequence, minus 1.
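Both appearances of the Fibonacci numbers are easy to verify by direct computation. A Python sketch, not from the book (function names are mine):

```python
def fib(n):
    """F_0 = 0, F_1 = F_2 = 1, F_n = F_{n-1} + F_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def coverings(n):
    """M_n: coverings of a 2 x n strip with 1 x 2 bricks;
    M_0 = M_1 = 1, M_n = M_{n-1} + M_{n-2}."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return b if n > 0 else 1

def euclid_divisions(n, m):
    """Number of divisions performed by Euclid's algorithm on (n, m)."""
    count = 0
    while m != 0:
        n, m = m, n % m
        count += 1
    return count

print(coverings(4))              # 5, the coverings of Figure 2.1
print(euclid_divisions(55, 34))  # 8, since 34 = F_9 and 9 - 1 = 8
```

A loop over small n confirms Mn = Fn+1, and the pair (55, 34) reproduces the 8 divisions of the worked example.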
2.9 Walks, trees and Catalan numbers

"Walks" or "paths" are common combinatorial objects and are defined in the following way. Let Z² be the integral lattice, i.e., the set of points of R² having integer coordinates. A walk or path is a finite sequence of points in Z² with the following properties:

1. the origin (0, 0) belongs to the walk;
2. if (x + 1, y + 1) belongs to the walk, then either (x, y + 1) or (x + 1, y) also belongs to the walk;
3. the last point of the walk is its end-point (x, y).

A pair of points ((x, y), (x + 1, y)) is called an east step, and a pair ((x, y), (x, y + 1)) is called a north step. In Figure 2.2 we sketch a possible walk.

If we denote by 1, 2, . . . , n the n steps of a walk, we can associate to the walk a subset of Nn = {1, 2, . . . , n}, namely the subset of the north-step numbers. Since the other steps must be east steps, a 1-1 correspondence exists between the walks with k north steps and the subsets of Nn with k elements. Therefore, the number of walks composed by n steps and ending at the point (n − k, k), i.e., at column k, is exactly \binom{n}{k}. This also proves, in a combinatorial way, the symmetry rule for binomial coefficients.

Among the walks, the ones that never go above the main diagonal, i.e., walks no point of which has the form (x, y) with y > x, are particularly important; they are called underdiagonal walks. Later on, we will be able to count the number of underdiagonal walks composed by n steps and ending at a horizontal distance k from the main diagonal. For the moment, we only wish to establish a recurrence relation for the number bn of the underdiagonal walks of length 2n ending on the main diagonal. In Figure 2.2 we have marked the point C = (k, k) at which the walk encounters the main diagonal for the first time; the walk is thus decomposed into a first part (the walk between A and B) and a second part (the walk between C and D), which are underdiagonal walks of the same type. Since k can be any number between 1 and n, and there are b_{k−1} walks of the type AB and b_{n−k} walks of the type CD, we obtain the recurrence relation:

    b_n = \sum_{k=1}^{n} b_{k-1} b_{n-k} = \sum_{k=0}^{n-1} b_k b_{n-k-1}

with the initial condition b0 = 1, corresponding to the empty walk, the walk composed by the only point (0, 0). The sequence (b_k)_{k∈N} begins:

    n    0  1  2  3  4   5   6    7    8
    bn   1  1  2  5  14  42  132  429  1430

    Figure 2.2: How a walk is decomposed

The bn's are called Catalan numbers; they frequently occur in the analysis of algorithms and data structures and, as we shall see:

    b_n = \frac{1}{n+1}\binom{2n}{n}

If we associate an open parenthesis to every east step and a closed parenthesis to every north step, we obtain the number of possible parenthetizations of an expression. When we have three pairs of parentheses, the 5 possibilities are:

    ()()()   ()(())   (())()   (()())   ((()))

Catalan numbers also count the binary trees. How many different trees exist with n nodes? If we fix our attention on the root, the left subtree has k nodes, for some k = 0, 1, . . . , n − 1, while the right subtree contains the remaining n − k − 1 nodes. Every tree with k nodes can be combined with every tree with n − k − 1 nodes to form a tree with n nodes, and therefore we have the recurrence relation:

    b_n = \sum_{k=0}^{n-1} b_k b_{n-k-1}

which is the same recurrence as before. Since the initial condition is again b0 = 1 (the empty tree), there are as many trees as there are walks. However, when we build binary trees from permutations, we do not always obtain different trees from different permutations: there are only 5 trees generated by the six permutations of 1, 2, 3, as we show in Figure 2.3.
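The recurrence and the closed form quoted above agree, as a quick Python check (not from the book) shows:

```python
from math import comb

def catalan_by_recurrence(nmax):
    """b_0 = 1, b_n = sum_{k=0}^{n-1} b_k * b_{n-k-1}."""
    b = [1]
    for n in range(1, nmax + 1):
        b.append(sum(b[k] * b[n - k - 1] for k in range(n)))
    return b

b = catalan_by_recurrence(8)
print(b)   # [1, 1, 2, 5, 14, 42, 132, 429, 1430]
# the closed form b_n = C(2n, n) / (n + 1) gives the same values
assert all(b[n] == comb(2 * n, n) // (n + 1) for n in range(9))
```

The printed list reproduces the table of Catalan numbers above.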
Another kind of walk is obtained by considering steps of type ((x, y), (x + 1, y + 1)), i.e., north-east steps, and steps of type ((x, y), (x + 1, y − 1)), i.e., south-east steps. The interesting walks are those starting from the origin and never going below the x-axis; they are called Dyck walks. An obvious 1-1 correspondence exists between Dyck walks and the walks considered above, so Dyck walks are counted by Catalan numbers as well.

    Figure 2.3: The five binary trees with three nodes

Finally, the concept of a rooted plane tree is as follows: let us consider a node, which is the root of the tree, and let us recursively add branches to the root or to the nodes generated by previous insertions. In Figure 2.4 we represent all the rooted plane trees with up to n = 3 branches. If n denotes the number of branches in a rooted plane tree, rooted plane trees are again counted by Catalan numbers, and we obtain once more the sequence:

    n    0  1  2  3  4   5   6    7    8
    bn   1  1  2  5  14  42  132  429  1430

    Figure 2.4: Rooted plane trees

2.10 Stirling numbers of the first kind

About 1730, the Scottish mathematician James Stirling was looking for a connection between the powers of a number x, say x^n, and the falling factorials x^{\underline{k}} = x(x-1)\cdots(x-k+1). He developed the first instances:

    x^{\underline{1}} = x
    x^{\underline{2}} = x(x-1) = x^2 - x
    x^{\underline{3}} = x(x-1)(x-2) = x^3 - 3x^2 + 2x
    x^{\underline{4}} = x(x-1)(x-2)(x-3) = x^4 - 6x^3 + 11x^2 - 6x

and, picking the coefficients in their proper order (from the smallest power to the largest), he obtained a table of integer numbers. We are mostly interested in the absolute values of these numbers; they are called Stirling numbers of the first kind and are now denoted by {n \brack k}, sometimes read "n cycle k" for the reason we are going to explain, as shown in Table 2.2.

    n\k   0    1    2    3    4   5   6
    0     1
    1     0    1
    2     0    1    1
    3     0    2    3    1
    4     0    6   11    6    1
    5     0   24   50   35   10   1
    6     0  120  274  225   85  15   1

    Table 2.2: Stirling numbers of the first kind

First note that the above identities can be written:

    x^{\underline{n}} = \sum_{k=0}^{n} (-1)^{n-k} {n \brack k} x^k

Let us now observe that x^{\underline{n}} = x^{\underline{n-1}}\,(x - n + 1); therefore we have:

    x^{\underline{n}} = (x - n + 1) \sum_{k=0}^{n-1} (-1)^{n-k-1} {n-1 \brack k} x^k
      = \sum_{k=0}^{n-1} (-1)^{n-k-1} {n-1 \brack k} x^{k+1} - (n-1) \sum_{k=0}^{n-1} (-1)^{n-k-1} {n-1 \brack k} x^k
      = \sum_{k=0}^{n} (-1)^{n-k} {n-1 \brack k-1} x^k + (n-1) \sum_{k=0}^{n} (-1)^{n-k} {n-1 \brack k} x^k

We performed the change of variable $k \to k-1$ in the first sum and then extended both sums from $0$ to $n$. This identity is valid for every value of $x$, and therefore we can equate its coefficients to those of the previous, general Stirling identity:
$${n \brack k} = (n-1){n-1 \brack k} + {n-1 \brack k-1}.$$
This recurrence, together with the initial conditions:
$${n \brack n} = 1 \quad \forall n \in \mathbb{N} \qquad\qquad {n \brack 0} = 0 \quad \forall n > 0,$$
completely defines the Stirling numbers of the first kind. This concludes the proof.

What is a possible combinatorial interpretation of these numbers? Let us consider the permutations of $n$ elements and count the permutations having exactly $k$ cycles, whose set will be denoted by $S_{n,k}$. We obtain a recurrence relation in the following way: if we fix any element, say the last element $n$, the permutations in $S_{n,k}$ can have $n$ as a fixed point, or not. When $n$ is a fixed point and we eliminate it, we obtain a permutation with $n-1$ elements having exactly $k-1$ cycles; vice versa, any such permutation gives a permutation in $S_{n,k}$ with $n$ as a fixed point if we add $(n)$ to it. Therefore, there are $|S_{n-1,k-1}|$ such permutations in $S_{n,k}$. When $n$ is not a fixed point and we eliminate it from the permutation, we obtain a permutation with $n-1$ elements and $k$ cycles; in this case, however, the same permutation is obtained several times, exactly $n-1$ times, since $n$ can occur after any other element in the standard cycle representation (it can never occur as the first element in a cycle, by our conventions). For example, the following permutations in $S_{5,2}$ all produce the same permutation in $S_{4,2}$:

(1 2 3)(4 5)   (1 2 3 5)(4)   (1 2 5 3)(4)   (1 5 2 3)(4).

We conclude that:
$$|S_{n,k}| = (n-1)|S_{n-1,k}| + |S_{n-1,k-1}|,$$
which is just the recurrence relation for the Stirling numbers of the first kind. If we now prove that also the initial conditions are the same, we can conclude that $|S_{n,k}| = {n \brack k}$. First we observe that $S_{n,0}$ is empty for $n \geq 1$, because every permutation contains at least one cycle, so $|S_{n,0}| = 0$; moreover, $S_{n,n}$ is only composed by the identity, the only permutation having $n$ cycles, i.e., $n$ fixed points, so $|S_{n,n}| = 1$.

We also observe that:

- ${n \brack 1} = (n-1)!$, $\forall n > 0$; in fact, $S_{n,1}$ is composed by all the permutations having a single cycle: in the standard representation this begins by $1$ and is followed by any permutation of the $n-1$ remaining numbers.
- ${n \brack n-1} = \binom{n}{2}$; in fact, $S_{n,n-1}$ contains the permutations having all fixed points except a single transposition, and this transposition can only be formed by taking two elements among $1, 2, \ldots, n$, which can be done in $\binom{n}{2}$ different ways.
- ${n \brack 2} = (n-1)!\,H_{n-1}$.

As an immediate consequence of this reasoning, the row sums of the Stirling triangle of the first kind equal $n!$:
$$\sum_{k=0}^{n} {n \brack k} = n!,$$
because they correspond to the total number of permutations of $n$ objects.

2.11 Stirling numbers of the second kind

James Stirling also tried to invert the process described in the previous section, that is, he was also interested in expressing ordinary powers in terms of falling factorials. The first instances are:
$$x^1 = x^{\underline{1}}$$
$$x^2 = x^{\underline{1}} + x^{\underline{2}} = x + x(x-1)$$
$$x^3 = x^{\underline{1}} + 3x^{\underline{2}} + x^{\underline{3}} = x + 3x(x-1) + x(x-1)(x-2)$$
$$x^4 = x^{\underline{1}} + 6x^{\underline{2}} + 7x^{\underline{3}} + x^{\underline{4}}.$$
The coefficients can be arranged into a triangular array, as shown in Table 2.3, and are called Stirling numbers of the second kind. The usual notation for them is ${n \brace k}$, often read "$n$ subset $k$", for the reason we are going to explain. Stirling's identities can be globally written:
$$x^n = \sum_{k=0}^{n} {n \brace k} x^{\underline{k}}.$$
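The first-kind recurrence and the identities just listed can be verified with a short Python sketch (all names are ours; `Fraction` keeps the harmonic-number check exact):

```python
from math import factorial
from fractions import Fraction

def stirling1(N):
    """Unsigned Stirling numbers of the first kind via
    [n,k] = (n-1)*[n-1,k] + [n-1,k-1], [0,0] = 1."""
    c = [[0] * (N + 1) for _ in range(N + 1)]
    c[0][0] = 1
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            c[n][k] = (n - 1) * c[n - 1][k] + c[n - 1][k - 1]
    return c

c = stirling1(6)
print(c[6])       # [0, 120, 274, 225, 85, 15, 1]
print(sum(c[6]))  # 720 == 6!
# [n,2] = (n-1)! * H_{n-1}, checked for n = 6
H5 = sum(Fraction(1, i) for i in range(1, 6))
print(c[6][2] == factorial(5) * H5)  # True
```

The last check confirms the identity ${6 \brack 2} = 5!\,H_5 = 120 \cdot \frac{137}{60} = 274$.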

The process can be inverted, and the second-kind triangle can be generated by a recurrence. Let us observe that $x\,x^{\underline{k}} = (x + k - k)\,x^{\underline{k}} = x^{\underline{k+1}} + k\,x^{\underline{k}}$; therefore:
$$x^n = x \cdot x^{n-1} = x \sum_{k=0}^{n-1}{n-1 \brace k} x^{\underline{k}} = \sum_{k=0}^{n-1}{n-1 \brace k} x^{\underline{k+1}} + \sum_{k=0}^{n-1} k\,{n-1 \brace k} x^{\underline{k}} = \sum_{k=0}^{n}{n-1 \brace k-1} x^{\underline{k}} + \sum_{k=0}^{n} k\,{n-1 \brace k} x^{\underline{k}},$$
where, as usual, we performed the change of variable $k \to k-1$ and extended the two sums from $0$ to $n$. The identity is valid for every $x \in \mathbb{R}$, and therefore we can equate the coefficients of $x^{\underline{k}}$ in this and the defining identity, thus obtaining the recurrence relation:
$${n \brace k} = k\,{n-1 \brace k} + {n-1 \brace k-1},$$
which is slightly different from the recurrence for the Stirling numbers of the first kind. Here we have the initial conditions:
$${n \brace n} = 1 \quad \forall n \in \mathbb{N} \qquad\qquad {n \brace 0} = 0 \quad \forall n \geq 1.$$
These relations completely define the Stirling triangle of the second kind.

Table 2.3: Stirling numbers of the second kind

n\k   0    1    2    3    4    5    6
0     1
1     0    1
2     0    1    1
3     0    1    3    1
4     0    1    7    6    1
5     0    1    15   25   10   1
6     0    1    31   90   65   15   1

Every row of the triangle determines a polynomial; for example, from row 4 we obtain $S_4(w) = w + 7w^2 + 6w^3 + w^4$, which is called the 4-th Stirling polynomial.

Let us now look for a combinatorial interpretation of these numbers. If $\mathbb{N}_n$ is the usual set $\{1, 2, \ldots, n\}$, we can study the partitions of $\mathbb{N}_n$ into $k$ disjoint, non-empty subsets; let $P_{n,k}$ be the corresponding set. For example, the five partitions of a set with three elements are:

{1,2,3}   {1,2}∪{3}   {1,3}∪{2}   {1}∪{2,3}   {1}∪{2}∪{3}.

We now count $|P_{n,k}|$ by fixing an element in $\mathbb{N}_n$, say the last element $n$. The partitions in $P_{n,k}$ can contain $n$ as a singleton (i.e., as a subset with $n$ as its only element) or can contain $n$ as an element in a larger subset. In the former case, by eliminating $\{n\}$ we obtain a partition in $P_{n-1,k-1}$, and, obviously, all partitions in $P_{n-1,k-1}$ can be obtained in such a way. When $n$ belongs to a larger set, we can eliminate it, obtaining a partition in $P_{n-1,k}$; in this case, however, the same partition is obtained several times, exactly $k$ times, produced by eliminating $n$ from any of the $k$ subsets containing it in the various partitions. For example, the following three partitions in $P_{5,3}$ all produce the same partition in $P_{4,3}$:

{1,2,5}∪{3}∪{4}   {1,2}∪{3,5}∪{4}   {1,2}∪{3}∪{4,5}.

This proves the recurrence relation:
$$|P_{n,k}| = k\,|P_{n-1,k}| + |P_{n-1,k-1}|,$$
which is the same recurrence as for the Stirling numbers of the second kind. As far as the initial conditions are concerned, there is only one partition of $\mathbb{N}_n$ composed by $n$ subsets, the partition containing $n$ singletons, so $|P_{n,n}| = 1$; and there is no partition of $\mathbb{N}_n$ composed by $0$ subsets when $n \geq 1$, so $|P_{n,0}| = 0$ (in the case $n = 0$ the empty set is the only partition of the empty set). We can conclude that $|P_{n,k}|$ coincides with the corresponding Stirling number of the second kind.

We also observe that:

- ${n \brace 1} = 1$, $\forall n \geq 1$: the only partition of $\mathbb{N}_n$ in a single subset is the one whose subset coincides with $\mathbb{N}_n$.
- ${n \brace 2} = 2^{n-1} - 1$, $\forall n \geq 1$: when the partition is composed by only two subsets, the first one uniquely determines the second. Let us suppose that the first subset always contains $1$; by eliminating $1$, we obtain as first subset all the subsets of $\mathbb{N}_n \setminus \{1\}$, except this last whole set, which would correspond to an empty second set. For example, when $n = 4$ and $k = 2$, we have the following 7 partitions:

  {1}∪{2,3,4}   {1,2}∪{3,4}   {1,3}∪{2,4}   {1,4}∪{2,3}   {1,2,3}∪{4}   {1,2,4}∪{3}   {1,3,4}∪{2}.

- ${n \brace n-1} = \binom{n}{2}$, $\forall n \geq 2$: a partition into $n-1$ subsets is made of singletons except one subset with 2 elements, and the two elements can be chosen in $\binom{n}{2}$ different ways.

2.12 Bell and Bernoulli numbers

If we sum the rows of the Stirling triangle of the second kind, we find a sequence:

n:    0  1  2  3  4   5   6    7    8
B_n:  1  1  2  5  15  52  203  877  4140

which represents the total number of partitions of the set $\mathbb{N}_n$.
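The second-kind recurrence and the row-sum property can be checked numerically; a small Python sketch (the names `stirling2` and `bell` are ours):

```python
def stirling2(N):
    """Stirling numbers of the second kind via
    {n,k} = k*{n-1,k} + {n-1,k-1}, {0,0} = 1."""
    s = [[0] * (N + 1) for _ in range(N + 1)]
    s[0][0] = 1
    for n in range(1, N + 1):
        for k in range(1, n + 1):
            s[n][k] = k * s[n - 1][k] + s[n - 1][k - 1]
    return s

s = stirling2(8)
print(s[6][:7])  # [0, 1, 31, 90, 65, 15, 1]
# row sums give the total number of partitions of N_n
bell = [sum(row) for row in s]
print(bell)      # [1, 1, 2, 5, 15, 52, 203, 877, 4140]
```

The row sums reproduce the sequence of total partition counts introduced in the text.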

The numbers in this sequence are called Bell numbers and are denoted by $B_n$; by definition we have:
$$B_n = \sum_{k=0}^{n} {n \brace k}.$$
Bell numbers grow very fast; however, since ${n \brace k} \leq |S_{n,k}| = {n \brack k}$ for every value of $n$ and $k$ (a subset in $P_{n,k}$ corresponds to one or more cycles in $S_{n,k}$), we always have $B_n \leq n!$, and in fact $B_n < n!$ for every $n > 1$.

Another frequently occurring sequence is obtained by ordering the subsets appearing in the partitions of $\mathbb{N}_n$; these are called ordered partitions. For example, the partition $\{1\}\cup\{2\}\cup\{3\}$ can be ordered in $3! = 6$ different ways, and the partition $\{1,2\}\cup\{3\}$ can be ordered in $2! = 2$ ways: $\{1,2\}\cup\{3\}$ and $\{3\}\cup\{1,2\}$. By the previous example, we easily see that $O_3 = 13$. The numbers $O_n$ are called ordered Bell numbers and we have:
$$O_n = \sum_{k=0}^{n} k!\,{n \brace k};$$
this shows that $O_n \geq n!$ and, as $n$ grows, $O_n > n!$. The first values are as follows:

n:    0  1  2  3   4   5    6     7
O_n:  1  1  3  13  75  541  4683  47293

Another combinatorial interpretation of the ordered Bell numbers is as follows. Let us fix an integer $n \in \mathbb{N}$ and for every $k \leq n$ let $A_k$ be any multiset with $n$ elements containing at least once all the numbers $1, 2, \ldots, k$. For example, when $n = 3$, the possible multisets are: $\{1,1,1\}$, $\{1,1,2\}$, $\{1,2,2\}$, $\{1,2,3\}$. The number of all the possible orderings of the $A_k$'s is just the $n$th ordered Bell number: the single ordering $(1,1,1)$, the orderings

(1,1,2) (1,2,1) (2,1,1)   and   (1,2,2) (2,1,2) (2,2,1),

plus the six permutations of the set $\{1,2,3\}$, give a total of $13 = O_3$. These orderings are called preferential arrangements, and because of that, ordered Bell numbers are also called preferential arrangement numbers. We can find a 1-1 correspondence between orderings of set partitions and preferential arrangements. If $(a_1, a_2, \ldots, a_n)$ is a preferential arrangement, we build the corresponding ordered partition by setting the element $1$ in the $a_1$th subset, $2$ in the $a_2$th subset, and so on. If $k$ is the largest number in the arrangement, we build exactly $k$ subsets. For example, the partition corresponding to $(1,1,2)$ is $\{1,2\}\cup\{3\}$, while the partition corresponding to $(2,1,1)$ is $\{2,3\}\cup\{1\}$, whose ordering is different. This construction can be easily inverted and, since it is injective, we have proved that it is actually a 1-1 correspondence.

We conclude this section by introducing another important sequence of numbers: the Bernoulli numbers. These are (positive or negative) rational numbers and therefore they cannot correspond to any counting problem; however, they arise in many combinatorial problems and therefore they should be examined here, for the moment only introducing their definition. The Bernoulli numbers are implicitly defined by the recurrence relation:
$$\sum_{k=0}^{n} \binom{n+1}{k} B_k = \delta_{n,0}.$$
No initial condition is necessary: for $n = 0$ the relation reads $\binom{1}{0}B_0 = 1$, and we have $B_0 = 1$; $B_1$ is obtained by setting $n = 1$ in the recurrence relation:
$$\binom{2}{0}B_0 + \binom{2}{1}B_1 = 0,$$
whence $B_1 = -1/2$. By performing the necessary computations we obtain $B_2 = 1/6$, and the first twelve values are as follows:

n:    0  1     2    3  4      5  6
B_n:  1  −1/2  1/6  0  −1/30  0  1/42

n:    7  8      9  10    11  12
B_n:  0  −1/30  0  5/66  0   −691/2730

Except for $B_1$, all the values of $B_n$ for odd $n$ are zero, and, apart from the zero values, the others are alternately one positive and one negative. Initially the Bernoulli numbers seem to be small, but as $n$ grows they become extremely large in modulus; we'll see later how we can arrange things in such a way that everything becomes accessible to computation. These and other properties of the Bernoulli numbers are not easily proven in a direct way.
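Both the ordered Bell numbers and the Bernoulli recurrence lend themselves to direct computation; a hedged Python sketch (all helper names are ours; `Fraction` keeps the Bernoulli values exact):

```python
from fractions import Fraction
from math import comb, factorial

def stirling2_row(n):
    """Row n of the second-kind triangle, built in place."""
    row = [0] * (n + 1)
    row[0] = 1
    for m in range(1, n + 1):
        new = [0] * (n + 1)
        for k in range(1, m + 1):
            new[k] = k * row[k] + row[k - 1]
        row = new
    return row

# Ordered Bell numbers: O_n = sum_k k! * {n,k}
O = [sum(factorial(k) * v for k, v in enumerate(stirling2_row(n)))
     for n in range(8)]
print(O)  # [1, 1, 3, 13, 75, 541, 4683, 47293]

# Bernoulli numbers from sum_{k=0}^{n} C(n+1,k) B_k = delta_{n,0}
B = []
for n in range(13):
    s = sum(comb(n + 1, k) * B[k] for k in range(n))
    B.append(((1 if n == 0 else 0) - s) / Fraction(n + 1))
print(B[1], B[2], B[12])  # -1/2 1/6 -691/2730
```

The Bernoulli loop simply isolates $B_n$ from the implicit relation: its coefficient in the $n$th equation is $\binom{n+1}{n} = n+1$.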

Chapter 3

Formal power series

3.1 Definitions

Let $\mathbb{R}$ be the field of real numbers and let $t$ be any indeterminate over $\mathbb{R}$, i.e., a symbol different from any element in $\mathbb{R}$. A formal power series (f.p.s.) over $\mathbb{R}$ in the indeterminate $t$ is an expression:
$$f(t) = f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots + f_n t^n + \cdots = \sum_{k=0}^{\infty} f_k t^k$$
where $f_0, f_1, f_2, \ldots$ are all real numbers. The set of formal power series over $\mathbb{R}$ in the indeterminate $t$ is denoted by $\mathbb{R}[[t]]$. The indeterminate $t$ is used as a "place-marker", i.e., a symbol to denote the place of the element in the sequence. If $(f_0, f_1, f_2, \ldots) = (f_k)_{k\in\mathbb{N}}$ is a sequence of (real) numbers, there is no substantial difference between the sequence and the f.p.s. $\sum_{k=0}^{\infty} f_k t^k$, which will be called the (ordinary) generating function of the sequence. The term ordinary is used to distinguish these functions from exponential generating functions, which will be introduced in the next chapter. For example, in the f.p.s. corresponding to the sequence $(1,1,1,\ldots)$, the term $t^5 = 1 \cdot t^5$ simply denotes that the element in position 5 (starting from 0) in the sequence is the number 1. The use of a particular indeterminate $t$ is irrelevant, and an obvious 1-1 correspondence exists between, say, $\mathbb{F}[[t]]$ and $\mathbb{F}[[y]]$; it is simple to prove that this correspondence is indeed an isomorphism.

The same definition applies to every set of numbers, in particular to the field of rational numbers $\mathbb{Q}$ and to the field of complex numbers $\mathbb{C}$. In combinatorial analysis and in the analysis of algorithms the coefficients $f_0, f_1, f_2, \ldots$ of a formal power series are mostly used to count objects, and therefore they are positive integer numbers, or, in some cases, positive rational numbers (e.g., when they are the coefficients of an exponential generating function). The developments we are now going to see can be easily extended to every field $\mathbb{F}$ of characteristic $0$. In order to stress that our results are substantially independent of the particular field $\mathbb{F}$ and of the particular indeterminate $t$, we denote $\mathbb{F}[[t]]$ by $\mathcal{F}$; but the reader can think of $\mathcal{F}$ as of $\mathbb{R}[[t]]$.

If $f(t) \in \mathcal{F}$, the order of $f(t)$, denoted by $\mathrm{ord}(f(t))$, is the smallest index $r$ for which $f_r \neq 0$. The formal power series $0 = 0 + 0t + 0t^2 + 0t^3 + \cdots$ has infinite order. The set of f.p.s. of order exactly $r$ is denoted by $\mathcal{F}_r$ or by $\mathcal{F}_r[[t]]$.

There are two main reasons why f.p.s. are more easily studied than sequences:

1. many f.p.s. can be "abbreviated" by expressions easily manipulated by elementary algebra; for example, we will prove that the series $1 + t + t^2 + t^3 + \cdots$ can be conveniently abbreviated as $1/(1-t)$, and from this fact we will be able to infer that the series has an inverse, which is $1 - t + 0t^2 + 0t^3 + \cdots$;

2. the algebraic structure of f.p.s. is very well understood and can be developed in a standard way.

Although our study of f.p.s. is mainly justified by the development of a generating function theory, we dedicate the present chapter to the general theory of f.p.s., and postpone the study of generating functions to the next chapter.

We conclude this section by defining the concept of a formal Laurent (power) series (f.L.s.), i.e., an expression:
$$g(t) = g_{-m}t^{-m} + g_{-m+1}t^{-m+1} + \cdots + g_{-1}t^{-1} + g_0 + g_1 t + g_2 t^2 + \cdots = \sum_{k=-m}^{\infty} g_k t^k.$$
For a f.L.s. $g(t)$ the order can be negative; obviously, when the order of $g(t)$ is non-negative, then $g(t)$ is actually a f.p.s. The set of f.L.s. strictly contains the set of f.p.s. We observe explicitly that an expression such as $\sum_{k=-\infty}^{\infty} f_k t^k$ does not represent a f.L.s. (see below and Section 4.1).

3.2 The basic algebraic structure

The set $\mathcal{F}$ of f.p.s. can be embedded into several algebraic structures. Here we consider the most common one, which is related to the usual concepts of sum and (Cauchy) product of series. Given two f.p.s. $f(t) = \sum_{k=0}^{\infty} f_k t^k$ and $g(t) = \sum_{k=0}^{\infty} g_k t^k$, the sum of $f(t)$ and $g(t)$ is defined as:
$$f(t) + g(t) = \sum_{k=0}^{\infty} f_k t^k + \sum_{k=0}^{\infty} g_k t^k = \sum_{k=0}^{\infty} (f_k + g_k) t^k.$$
The associative and commutative laws directly follow from the analogous properties in the field $\mathbb{F}$, and it is a simple matter to prove that $\mathcal{F}$ is a commutative group with respect to the sum: the identity is the f.p.s. $0 = 0 + 0t + 0t^2 + 0t^3 + \cdots$, and the opposite series of $f(t) = \sum_{k=0}^{\infty} f_k t^k$ is the series $-f(t) = \sum_{k=0}^{\infty} (-f_k) t^k$.

Let us now define the Cauchy product of $f(t)$ by $g(t)$:
$$f(t)g(t) = \left(\sum_{k=0}^{\infty} f_k t^k\right)\left(\sum_{k=0}^{\infty} g_k t^k\right) = \sum_{k=0}^{\infty}\left(\sum_{j=0}^{k} f_j g_{k-j}\right) t^k;$$
this is also called the convolution of $f(t)$ and $g(t)$. It is a good idea to write down explicitly the first terms of the product:
$$f(t)g(t) = f_0 g_0 + (f_0 g_1 + f_1 g_0)t + (f_0 g_2 + f_1 g_1 + f_2 g_0)t^2 + (f_0 g_3 + f_1 g_2 + f_2 g_1 + f_3 g_0)t^3 + \cdots$$
This clearly shows that the product is commutative. The distributive law is a consequence of the distributive law valid in $\mathbb{F}$:
$$(f(t)+g(t))h(t) = \sum_{k=0}^{\infty}\left(\sum_{j=0}^{k} (f_j+g_j)h_{k-j}\right)t^k = \sum_{k=0}^{\infty}\left(\sum_{j=0}^{k} f_j h_{k-j}\right)t^k + \sum_{k=0}^{\infty}\left(\sum_{j=0}^{k} g_j h_{k-j}\right)t^k = f(t)h(t) + g(t)h(t).$$
Moreover, we can prove that $\mathcal{F}$ does not contain any zero divisor. In fact, if $f(t)$ and $g(t)$ are two f.p.s. different from zero, we can suppose that $\mathrm{ord}(f(t)) = k_1$ and $\mathrm{ord}(g(t)) = k_2$, with $0 \leq k_1, k_2 < \infty$; this means $f_{k_1} \neq 0$ and $g_{k_2} \neq 0$. Because of the form of the $t^k$ coefficient in the explicit expression for the Cauchy product, $f(t)g(t)$ has the term of degree $k_1 + k_2$ with coefficient $f_{k_1} g_{k_2} \neq 0$, and so it cannot be zero. The previous reasoning also shows that, in general:
$$\mathrm{ord}(f(t)g(t)) = \mathrm{ord}(f(t)) + \mathrm{ord}(g(t)).$$
We conclude by stating the result just obtained: $(\mathcal{F}, +, \cdot)$ is an integrity domain.

The identity of the product is the f.p.s. $1 = 1 + 0t + 0t^2 + 0t^3 + \cdots$, whose order is obviously $0$. If $f(t)$ is an invertible element in $\mathcal{F}$, we should have $f(t)f(t)^{-1} = 1$ and therefore $\mathrm{ord}(f(t)) = 0$. On the other hand, if $f(t) = f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots$ with $f_0 \neq 0$, we can easily prove that $f(t)$ is invertible. In fact, let $g(t) = f(t)^{-1}$, so that $f(t)g(t) = 1$; we can determine the coefficients of $g(t)$ by solving the infinite system of linear equations:
$$f_0 g_0 = 1 \qquad f_0 g_1 + f_1 g_0 = 0 \qquad f_0 g_2 + f_1 g_1 + f_2 g_0 = 0 \qquad \cdots$$
The system can be solved in a simple way, starting with the first equation and going on one equation after the other, by substituting the values obtained up to the moment:
$$g_0 = \frac{1}{f_0} \qquad g_1 = -\frac{f_1}{f_0^2} \qquad g_2 = \frac{f_1^2}{f_0^3} - \frac{f_2}{f_0^2} \qquad \cdots$$
Therefore $g(t) = f(t)^{-1}$ is well defined, and we conclude: a f.p.s. is invertible if and only if its order is $0$. Because of that, $\mathcal{F}_0$ is also called the set of invertible f.p.s.; according to standard terminology, the elements of $\mathcal{F}_0$ are called the units of the integrity domain.

As a simple example, let us compute the inverse of the f.p.s. $1 - t = 1 - t + 0t^2 + 0t^3 + \cdots$. Here we have $f_0 = 1$, $f_1 = -1$ and $f_k = 0$, $\forall k > 1$. The system becomes:
$$g_0 = 1 \qquad g_1 - g_0 = 0 \qquad g_2 - g_1 = 0 \qquad \cdots$$
and we easily obtain that all the $g_j$'s $(j = 0, 1, 2, \ldots)$ are $1$; the inverse f.p.s. is therefore $1 + t + t^2 + t^3 + \cdots$. The usual notation for this fact is:
$$\frac{1}{1-t} = 1 + t + t^2 + t^3 + \cdots.$$
It is well-known that this identity is only valid for $-1 < t < 1$ when $t$ is a variable and f.p.s. are interpreted as functions. In our formal approach, however, these considerations are irrelevant and the identity is valid from a purely formal point of view.
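The triangular system for the inverse translates directly into code; a minimal Python sketch of the Cauchy product and of series inversion on truncated coefficient lists (a representation we choose here; it is not used in the book):

```python
def cauchy(f, g, N):
    """First N coefficients of the product of two f.p.s.
    given as coefficient lists (index = power of t)."""
    return [sum(f[j] * g[k - j] for j in range(k + 1)
                if j < len(f) and k - j < len(g))
            for k in range(N)]

def inverse(f, N):
    """First N coefficients of 1/f(t); requires f[0] != 0."""
    g = [1 / f[0]]
    for k in range(1, N):
        # solve f0*gk + f1*g_{k-1} + ... + fk*g0 = 0 for gk
        s = sum(f[j] * g[k - j] for j in range(1, min(k, len(f) - 1) + 1))
        g.append(-s / f[0])
    return g

print(inverse([1, -1], 6))                       # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(cauchy([1, -1], inverse([1, -1], 6), 6))   # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The first line reproduces $1/(1-t) = 1 + t + t^2 + \cdots$; the second confirms that the product of the series and its inverse is the identity $1$, up to the truncation order.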

3.3 Formal Laurent Series

In the first section of this Chapter, we introduced the concept of a formal Laurent series. As we did for f.p.s., we define the sum and the Cauchy product: if $a(t) = \sum_{k=m}^{\infty} a_k t^k$ and $b(t) = \sum_{k=n}^{\infty} b_k t^k$ are two f.L.s. ($m, n \in \mathbb{Z}$), we set:
$$a(t) + b(t) = \sum_{k=m}^{\infty} a_k t^k + \sum_{k=n}^{\infty} b_k t^k = \sum_{k=p}^{\infty} (a_k + b_k) t^k \qquad\qquad a(t)b(t) = \sum_{k=q}^{\infty}\Big(\sum_{i+j=k} a_i b_j\Big) t^k,$$
where $p = \min(m, n)$ and $q = m + n$. It is not difficult to find out that these operations enjoy the usual properties of sum and product, and if we denote by $\mathcal{L}$ the set of f.L.s., we have that $(\mathcal{L}, +, \cdot)$ is a field. The only point we should formally prove is that every f.L.s. $a(t) = \sum_{k=m}^{\infty} a_k t^k \neq 0$ has an inverse f.L.s. $b(t) = \sum_{k=-m}^{\infty} b_k t^k$; this is proved in the same way we proved that every f.p.s. in $\mathcal{F}_0$ has an inverse. In fact we should have:
$$a_m b_{-m} = 1 \qquad a_m b_{-m+1} + a_{m+1} b_{-m} = 0 \qquad a_m b_{-m+2} + a_{m+1} b_{-m+1} + a_{m+2} b_{-m} = 0 \qquad \cdots$$
By solving the first equation, we find $b_{-m} = a_m^{-1}$; then the system can be solved one equation after the other, by substituting the values obtained up to the moment. Since $a_m b_{-m}$ is the coefficient of $t^0$, we have $a(t)b(t) = 1$ and the proof is complete.

Our aim is to show that the field $(\mathcal{L}, +, \cdot)$ is the smallest field containing the integrity domain $(\mathcal{F}, +, \cdot)$. From Algebra we know that, given an integrity domain $(\mathbb{K}, +, \cdot)$, the smallest field $(\mathbb{F}, +, \cdot)$ containing $(\mathbb{K}, +, \cdot)$ can be built in the following way: let us define an equivalence relation $\sim$ on the set $\mathbb{K} \times \mathbb{K}$:
$$(a, b) \sim (c, d) \iff ad = bc;$$
if we now set $\mathbb{F} = \mathbb{K} \times \mathbb{K} / \sim$, the set $\mathbb{F}$ with the operations $+$ and $\cdot$ defined as the extension of $+$ and $\cdot$ in $\mathbb{K}$ is the field we are searching for. This is just the way in which the field $\mathbb{Q}$ of rational numbers is constructed from the integrity domain $\mathbb{Z}$ of integer numbers, and the field of rational functions is built from the integrity domain of the polynomials.

So, let $\widehat{\mathcal{L}} = \mathcal{F} \times \mathcal{F}$ be the set of pairs of f.p.s., as considered above, and let us show that the field $\widehat{\mathcal{L}}$ constructed in the described way is isomorphic with $(\mathcal{L}, +, \cdot)$, thus characterizing the set of f.L.s. in an algebraic way. We begin by showing that for every $(f(t), g(t)) \in \widehat{\mathcal{L}}$ there exists a pair $(a(t), b(t)) \in \widehat{\mathcal{L}}$ such that $(f(t), g(t)) \sim (a(t), b(t))$ (i.e., $f(t)b(t) = g(t)a(t)$) and at least one between $a(t)$ and $b(t)$ belongs to $\mathcal{F}_0$: let $p = \min(\mathrm{ord}(f(t)), \mathrm{ord}(g(t)))$ and let us define $a(t)$, $b(t)$ by $f(t) = t^p a(t)$ and $g(t) = t^p b(t)$; then either $a(t) \in \mathcal{F}_0$ or $b(t) \in \mathcal{F}_0$, or both are invertible f.p.s. We now have:
$$b(t)f(t) = b(t)t^p a(t) = t^p b(t)a(t) = g(t)a(t),$$
and this shows that $(a(t), b(t)) \sim (f(t), g(t))$. The pair can be written $a(t)/b(t)$, or also $f(t)/g(t)$, and it is uniquely determined. If $b(t) \in \mathcal{F}_0$, then $a(t)/b(t) \in \mathcal{F}$; if, on the other hand, $b(t) \notin \mathcal{F}_0$, then we can write $b(t) = t^m v(t)$, where $v(t) \in \mathcal{F}_0$, and we have $a(t)/v(t) = \sum_{k=0}^{\infty} d_k t^k \in \mathcal{F}$, and consequently $a(t)/b(t) = \sum_{k=0}^{\infty} d_k t^{k-m}$, a f.L.s. It is now easy to see that this correspondence is a 1-1 correspondence between $\widehat{\mathcal{L}}$ and $\mathcal{L}$ preserving sum and product, so it is an isomorphism between $(\widehat{\mathcal{L}}, +, \cdot)$ and $(\mathcal{L}, +, \cdot)$, and our assertion is proved. Because of this result, we can identify $\widehat{\mathcal{L}}$ and $\mathcal{L}$ and assert that $(\mathcal{L}, +, \cdot)$ is indeed the smallest field containing the integrity domain $(\mathcal{F}, +, \cdot)$. From now on, the set $\widehat{\mathcal{L}}$ will be ignored and we will always refer to $\mathcal{L}$ as the field of f.L.s.

3.4 Operations on formal power series

Besides the four basic operations — addition, subtraction, multiplication and division — it is possible to consider other operations on $\mathcal{F}$, only a few of which can be extended to $\mathcal{L}$. The most important operation is surely taking a power of a f.p.s. So, if $p \in \mathbb{N}$ we can recursively define:
$$f(t)^0 = 1 \qquad\qquad f(t)^p = f(t)\,f(t)^{p-1} \quad (p > 0)$$
and observe that $\mathrm{ord}(f(t)^p) = p\,\mathrm{ord}(f(t))$; therefore, $f(t)^p \in \mathcal{F}_0$ if and only if $f(t) \in \mathcal{F}_0$, while if $f(t) \notin \mathcal{F}_0$, then the order of $f(t)^p$ becomes larger and larger and goes to $\infty$ when $p \to \infty$. This property will be important in our future developments, when we will reduce many operations to
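The rule $\mathrm{ord}(f(t)^p) = p\,\mathrm{ord}(f(t))$ can be illustrated on truncated coefficient lists; a small Python sketch (the representation and all names are ours):

```python
def cauchy(f, g):
    """Truncated Cauchy product of two equal-length coefficient lists."""
    N = len(f)
    return [sum(f[j] * g[k - j] for j in range(k + 1)) for k in range(N)]

def power(f, p):
    """f(t)^p by repeated multiplication, f(t)^0 = 1."""
    N = len(f)
    r = [1] + [0] * (N - 1)
    for _ in range(p):
        r = cauchy(r, f)
    return r

def order(f):
    """Index of the first nonzero coefficient."""
    return next(k for k, c in enumerate(f) if c != 0)

f = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]    # t^2 + t^3, order 2
print(order(f), order(power(f, 3)))   # 2 6
```

Here $(t^2+t^3)^3 = t^6(1+t)^3 = t^6 + 3t^7 + 3t^8 + t^9$, and the order is indeed $3 \cdot 2 = 6$.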

infinite sums involving the powers $f(t)^p$ with $p \in \mathbb{N}$. If $f(t) \notin \mathcal{F}_0$, i.e., $\mathrm{ord}(f(t)) > 0$, these sums involve elements of larger and larger order, and therefore for every index $k$ we can determine the coefficient of $t^k$ by only a finite number of terms. This assures that our definitions will be good definitions. We wish also to observe that taking a positive integer power can be easily extended to $\mathcal{L}$; in this case, when $\mathrm{ord}(f(t)) < 0$, $\mathrm{ord}(f(t)^p)$ decreases, but remains always finite. In particular, for $g(t) = f(t)^{-1}$ we have $g(t)^p = f(t)^{-p}$, and powers can be extended to all integers $p \in \mathbb{Z}$.

When the exponent $p$ is a real or complex number whatsoever, we should restrict $f(t)^p$ to the case $f(t) \in \mathcal{F}_0$: in fact, if $f(t) = t^m g(t)$, we would have $f(t)^p = (t^m g(t))^p = t^{mp} g(t)^p$; however, $t^{mp}$ is an expression without any mathematical sense. Instead, if $f(t) \in \mathcal{F}_0$, let us write $f(t) = f_0 + v(t)$, with $\mathrm{ord}(v(t)) > 0$. For $\overline{v}(t) = v(t)/f_0$, we have by Newton's rule:
$$f(t)^p = (f_0 + v(t))^p = f_0^p\,(1 + \overline{v}(t))^p = f_0^p \sum_{k=0}^{\infty} \binom{p}{k} \overline{v}(t)^k,$$
which can be assumed as a definition. In the last expression we can observe that: i) $f_0^p \in \mathbb{C}$; ii) $\binom{p}{k}$ is defined for every value of $p$, $k$ being a non-negative integer; iii) $\overline{v}(t)^k$ is well-defined by the considerations above, and $\mathrm{ord}(\overline{v}(t)^k)$ grows indefinitely, so that for every $k$ the coefficient of $t^k$ is obtained by a finite sum. We can conclude that $f(t)^p$ is well-defined.

Particular cases are $p = -1$ and $p = 1/2$. In the former case, $f(t)^{-1}$ is the inverse of the f.p.s. $f(t)$. We have already seen a method for computing $f(t)^{-1}$, but now we obtain the following formula:
$$f(t)^{-1} = \frac{1}{f_0} \sum_{k=0}^{\infty} \binom{-1}{k} \overline{v}(t)^k = \frac{1}{f_0} \sum_{k=0}^{\infty} (-1)^k\, \overline{v}(t)^k.$$
For $p = 1/2$, we obtain a formula for the square root of a f.p.s.:
$$f(t)^{1/2} = \sqrt{f(t)} = \sqrt{f_0} \sum_{k=0}^{\infty} \binom{1/2}{k} \overline{v}(t)^k = \sqrt{f_0} \sum_{k=0}^{\infty} \frac{(-1)^{k-1}}{4^k (2k-1)} \binom{2k}{k} \overline{v}(t)^k.$$
In Section 3.12, we will see how $f(t)^p$ can be obtained computationally without actually performing the powers $\overline{v}(t)^k$. We conclude by observing that this more general operation of taking the power $p \in \mathbb{R}$ cannot be extended to f.L.s.: in fact, we would have smaller and smaller terms $t^k$ ($k \to -\infty$), and therefore the resulting expression cannot be considered an actual f.L.s., which requires a term with smallest degree.

By applying well-known rules of the exponential and logarithmic functions, we can easily define the corresponding operations for f.p.s., which however, as will be apparent, cannot be extended to f.L.s. For the exponentiation we have, for $f(t) = f_0 + v(t)$:
$$e^{f(t)} = \exp(f_0 + v(t)) = e^{f_0} \sum_{k=0}^{\infty} \frac{v(t)^k}{k!}.$$
Again, since $v(t) \notin \mathcal{F}_0$, the order of $v(t)^k$ increases with $k$ and the sums necessary to compute the coefficient of $t^k$ are always finite. The formula makes clear that exponentiation can be performed on every $f(t) \in \mathcal{F}$; when $f(t) \notin \mathcal{F}_0$ we have $f_0 = 0$ and the factor $e^{f_0}$ is not present. For the logarithm, let us suppose $f(t) \in \mathcal{F}_0$, $f(t) = f_0 + v(t)$, $\overline{v}(t) = v(t)/f_0$; then we have:
$$\ln(f_0 + v(t)) = \ln f_0 + \ln(1 + \overline{v}(t)) = \ln f_0 + \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} \overline{v}(t)^k.$$
In this case, for $f(t) \notin \mathcal{F}_0$ we cannot define the logarithm, and this shows an asymmetry between exponential and logarithm.

Another important operation is differentiation:
$$Df(t) = \frac{d}{dt} f(t) = \sum_{k=1}^{\infty} k f_k t^{k-1} = f'(t).$$
This operation can be performed on every $f(t) \in \mathcal{L}$, and a very important observation is the following:

Theorem 3.4.1  For every $f(t) \in \mathcal{L}$, its derivative $f'(t)$ does not contain any term in $t^{-1}$.

Proof:  By the general rule, the term in $t^{-1}$ could only have been originated by the constant term (i.e., the term in $t^0$) of $f(t)$, but the product by $k$ reduces it to $0$.

This fact will be the basis for very important results in the theory of f.p.s. and f.L.s. (see Section 3.8).

Another operation is integration; because indefinite integration leaves a constant term undefined, we prefer to introduce and use only definite integration, which for $f(t) \in \mathcal{F}$ is defined as:
$$\int_0^t f(\tau)\,d\tau = \sum_{k=0}^{\infty} f_k \int_0^t \tau^k\,d\tau = \sum_{k=0}^{\infty} \frac{f_k}{k+1}\,t^{k+1}.$$
Our purely formal approach allows us to exchange the integration and summation signs; in general, as we know, this is only possible when the convergence is uniform. By this definition, $\int_0^t f(\tau)\,d\tau$ never belongs to $\mathcal{F}_0$. Integration can be extended to f.L.s. with an obvious exception: because integration is the inverse operation of differentiation, we cannot apply integration to a f.L.s. containing a term in $t^{-1}$. Formally, from the definition above, such a term would imply a division by $0$, and this is not allowed. In all the other cases, integration does not create any problem.
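The closed form for the square-root coefficients can be checked against the generalized binomial coefficients; a Python sketch (all names are ours; note that for $k = 0$ both the sign and the factor $2k-1$ are negative, so the closed form correctly gives $1$):

```python
from fractions import Fraction
from math import comb

def binom_half(k):
    """Generalized binomial coefficient C(1/2, k), computed exactly."""
    v = Fraction(1)
    for i in range(k):
        v = v * (Fraction(1, 2) - i) / (i + 1)
    return v

def closed_form(k):
    """(-1)^(k-1) C(2k,k) / (4^k (2k-1)), with an integer sign
    to avoid a float from a negative integer exponent."""
    sign = -1 if (k - 1) % 2 else 1
    return Fraction(sign * comb(2 * k, k), 4**k * (2 * k - 1))

print([str(binom_half(k)) for k in range(5)])
# ['1', '1/2', '-1/8', '1/16', '-5/128']
print(all(binom_half(k) == closed_form(k) for k in range(10)))  # True
```

These are exactly the coefficients of $\sqrt{1+t} = 1 + \frac{1}{2}t - \frac{1}{8}t^2 + \frac{1}{16}t^3 - \cdots$.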

29 position ◦; composition is always associative and therefore (F 1 , ◦) is a group if we prove that every f (t) ∈ F 1 has a left (or right) inverse, because the theory assures that the other inverse exists and coincides with the previously found inverse. Let f (t) = f1 t+f2 t2 +f3 t3 +· · · and g(t) = g1 t+g2 t2 +g3 t3 +· · ·; we have: f (g(t)) = f1 (g1 t + g2 t2 + g3 t3 + · · ·) + 2 + f2 (g1 t2 + 2g1 g2 t3 + · · ·) +

3 + f3 (g1 t3 + · · ·) + · · · = 2 = f1 g1 t + (f1 g2 + f2 g1 )t2 +

3.5

Composition

A last operation on f.p.s. is so important that we dedicate to it a complete section. The operation is the composition of two f.p.s.. Let f (t) ∈ F and g(t) ∈ F 0 , then we deﬁne the “composition” of f (t) by g(t) as the f.p.s: f (g(t)) =

∞

3 + (f1 g3 + 2f2 g1 g2 + f3 g1 )t3 + · · · = t

fk g(t)k .

k=0

This deﬁnition justiﬁes the fact that g(t) cannot belong to F0 ; in fact, otherwise, inﬁnite sums were involved in the computation of f (g(t)). In connection with the composition of f.p.s., we will use the following notation: f (g(t)) = f (y) y = g(t) which is intended to mean the result of substituting g(t) to the indeterminate y in the f.p.s. f (y). The indeterminate y is a dummy symbol; it should be different from t in order not to create any ambiguity, but it can be substituted by any other symbol. Because every f (t) ∈ F 0 is characterized by the fact that g(0) = 0, we will always understand that, in the notation above, the f.p.s. g(t) is such that g(0) = 0. The deﬁnition can be extended to every f (t) ∈ L, but the function g(t) has always to be such that ord(g(t)) > 0, otherwise the deﬁnition would imply inﬁnite sums, which we avoid because, by our formal approach, we do not consider any convergence criterion. The f.p.s. in F 1 have a particular relevance for composition. They are called quasi-units or delta series. First of all, we wish to observe that the f.p.s. t ∈ F 1 acts as an identity for composition. In fact y y = g(t) = g(t) and f (y) y = t = f (t), and therefore t is a left and right identity. As a second fact, we show that a f.p.s. f (t) has an inverse with respect to composition if and only if f (t) ∈ F 1 . Note that g(t) is the inverse of f (t) if and only if f (g(t)) = t and g(f (t)) = t. From this, we deduce immediately that f (t) ∈ F 0 and g(t) ∈ F 0 . On the other hand, it is clear that ord(f (g(t))) = ord(f (t))ord(g(t)) by our initial deﬁnition, and since ord(t) = 1 and ord(f (t)) > 0, ord(g(t)) > 0, we must have ord(f (t)) = ord(g(t)) = 1. Let us now come to the main part of the proof and consider the set F 1 with the operation of com-

In order to determine g(t) we have to solve the system:

  f1 g1 = 1
  f1 g2 + f2 g1^2 = 0
  f1 g3 + 2 f2 g1 g2 + f3 g1^3 = 0
  ...

The first equation gives g1 = 1/f1; we can substitute this value into the second equation and obtain a value for g2; the two values for g1 and g2 can be substituted into the third equation to obtain a value for g3. Continuing in this way, we obtain the values of all the coefficients of g(t), and therefore g(t) is determined in a unique way. In fact, we observe that, by construction, gk appears in the kth equation in linear form and its coefficient is always f1. Since f1 is different from 0, gk is uniquely determined, even though the other gr (r < k) appear with powers. The f.p.s. g(t) such that f(g(t)) = t, and therefore such that g(f(t)) = t as well, is called the compositional inverse of f(t). In the literature, it is usually denoted by f̄(t) or f^[−1](t); we will adopt the first notation. Obviously, the compositional inverse of f̄(t) is f(t) itself, and f̄(t) is sometimes also called the reverse of f(t). Given f(t) in F1, the determination of its compositional inverse is one of the most interesting problems in the theory of f.p.s. or f.L.s.; it was solved by Lagrange and we will discuss it in the following sections. Note that, in principle, the gk's can be computed by solving the system above; this, however, is too laborious, and no one would proceed that way except as an exercise.
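The triangular shape of the system translates directly into an algorithm: solve for g1, then g2, and so on, at each step using the already-known coefficients. A minimal Python sketch of this procedure (the function names are ours; the book does not prescribe any implementation):

```python
from fractions import Fraction

def cauchy(a, b, n):
    """Coefficients 0..n of the product of two truncated series."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n + 1)]

def compose(f, g, n):
    """Coefficients 0..n of f(g(t)); requires g(0) = 0, so each sum is finite."""
    assert g[0] == 0
    result = [Fraction(0)] * (n + 1)
    power = [Fraction(1)] + [Fraction(0)] * n        # g(t)^0
    for j in range(n + 1):                           # powers beyond n cannot contribute
        for k in range(n + 1):
            result[k] += f[j] * power[k]
        power = cauchy(power, g, n)
    return result

def reversion(f, n):
    """Compositional inverse g of f (f0 = 0, f1 != 0), coefficients 0..n."""
    f = [Fraction(c) for c in f[:n + 1]] + [Fraction(0)] * max(0, n + 1 - len(f))
    g = [Fraction(0)] * (n + 1)
    g[1] = 1 / f[1]                                  # first equation: f1*g1 = 1
    for k in range(2, n + 1):
        # with g_k still 0, the kth equation reads f1*g_k + r = 0
        r = compose(f, g, n)[k]
        g[k] = -r / f[1]
    return g
```

For instance, for f(t) = t + t^2 + t^3 + ... = t/(1−t) this returns the coefficients of t/(1+t) = t − t^2 + t^3 − ...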

3.6 Coefficient extraction

If f(t) is in L, or in particular in F, the notation [t^n]f(t) indicates the extraction of the coefficient of t^n from f(t), and therefore we have [t^n]f(t) = fn. In this sense, [t^n] can be seen as a mapping [t^n] : L -> R or [t^n] : L -> C, according to the field underlying the set L or F. Because of that, [t^n] is called an operator, and precisely the "coefficient of" operator or, more simply, the coefficient operator.

CHAPTER 3. FORMAL POWER SERIES

In Table 3.1 we state formally the main properties of this operator, collecting what we said in the previous sections. We observe that α, β in R (or in C) are arbitrary constants; the use of the indeterminate y is only necessary in order not to confuse the action on different f.p.s.; and, because g(0) = 0 in a composition, the last sum is actually finite.

  (K1) (linearity)        [t^n](α f(t) + β g(t)) = α [t^n]f(t) + β [t^n]g(t)
  (K2) (shifting)         [t^n] t f(t) = [t^{n−1}] f(t)
  (K3) (differentiation)  [t^n] f'(t) = (n+1) [t^{n+1}] f(t)
  (K4) (convolution)      [t^n] f(t)g(t) = Σ_{k=0}^{n} [t^k]f(t) [t^{n−k}]g(t)
  (K5) (composition)      [t^n] f(g(t)) = Σ_{k=0}^{∞} ([y^k]f(y)) [t^n] g(t)^k

Table 3.1: The rules for coefficient extraction

Some points require more lengthy comments. The property of shifting can be easily generalized to [t^n] t^k f(t) = [t^{n−k}] f(t) and also to negative powers: [t^n] f(t)/t^k = [t^{n+k}] f(t). These rules are very important and are often applied in the theory of f.p.s. and f.L.s. In the former case, some care should be exercised to see whether the properties remain in the realm of F or go beyond it, invading the domain of L, which is not always correct. The property of differentiation for n = −1 gives [t^{−1}] f'(t) = 0, a situation we already noticed. The operator [t^{−1}] is also called the residue and is denoted by "res"; so, for example, people write res f'(t) = 0, and some authors use the notation res t^{−n−1} f(t) for [t^n] f(t).

We will have many occasions to apply rules (K1)÷(K5) of coefficient extraction. However, just to give a meaningful example, let us find the coefficient of t^n in the series expansion of (1 + αt)^r, where α and r are two real numbers whatsoever. Rule (K3) can be written in the form [t^n] f(t) = (1/n) [t^{n−1}] f'(t), and we can successively apply this form to our case:

  [t^n](1 + αt)^r = (rα/n) [t^{n−1}](1 + αt)^{r−1}
                  = (rα/n) ((r−1)α/(n−1)) [t^{n−2}](1 + αt)^{r−2} = ...
                  = (rα/n) ((r−1)α/(n−1)) ... ((r−n+1)α/1) [t^0](1 + αt)^{r−n}
                  = binom(r, n) α^n [t^0](1 + αt)^{r−n}.

We now observe that [t^0](1 + αt)^{r−n} = 1 because of our observations on f.p.s. operations. Therefore, we conclude with the so-called Newton's rule:

  [t^n](1 + αt)^r = binom(r, n) α^n,

which is one of the most frequently used results in coefficient extraction. Let us remark explicitly that when r = −1 (the geometric series) we have:

  [t^n] 1/(1 + αt) = binom(−1, n) α^n = binom(1+n−1, n) (−1)^n α^n = (−α)^n.
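Newton's rule is easy to evaluate with exact rational arithmetic, since the generalized binomial coefficient is a finite product. A small sketch of ours (not the book's code):

```python
from fractions import Fraction

def gbinom(r, n):
    """Generalized binomial coefficient C(r, n) = r(r-1)...(r-n+1)/n!, any rational r."""
    prod = Fraction(1)
    for i in range(n):
        prod = prod * (Fraction(r) - i) / (i + 1)
    return prod

def newton_coeff(alpha, r, n):
    """[t^n](1 + alpha*t)^r by Newton's rule: C(r, n) * alpha^n."""
    return gbinom(r, n) * Fraction(alpha) ** n
```

For r = −1 this reproduces the geometric series case: newton_coeff(alpha, −1, n) equals (−alpha)^n.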

A simple, but important use of Newton's rule concerns the extraction of the coefficient of t^n from the inverse of a trinomial at^2 + bt + c, in the case it is reducible, i.e., it can be written c(1 + αt)(1 + βt). Obviously, we can always reduce the constant c to 1: by the linearity rule, it can be taken outside the "coefficient of" operator. Therefore, our aim is to compute:

  [t^n] 1/((1 + αt)(1 + βt))

with α ≠ β; otherwise, Newton's rule would be immediately applicable. The problem can be solved by using the technique of partial fraction expansion. We look for two constants A and B such that:

  1/((1 + αt)(1 + βt)) = A/(1 + αt) + B/(1 + βt) = (A + Aβt + B + Bαt)/((1 + αt)(1 + βt));

if two such constants exist, the numerator in the first expression should equal the numerator in the last one, independently of t or, if one so prefers, for every value of t. Therefore, the term A + B should be equal to 1, while the term (Aβ + Bα)t should always be 0. The values for A and B are therefore the solution of the linear system:

  A + B = 1
  Aβ + Bα = 0

The discriminant of this system is α − β, which is always different from 0 because of our hypothesis α ≠ β. The system has therefore only one solution, which is A = α/(α−β) and B = −β/(α−β). We can now substitute these values in the expression above:

  [t^n] 1/((1 + αt)(1 + βt)) = (1/(α−β)) [t^n] (α/(1 + αt) − β/(1 + βt))
                             = (1/(α−β)) (α [t^n] 1/(1 + αt) − β [t^n] 1/(1 + βt))
                             = (−1)^n (α^{n+1} − β^{n+1})/(α − β).

Let us now consider a trinomial 1 + bt + ct^2 for which ∆ = b^2 − 4c < 0 and b ≠ 0. The trinomial is irreducible, but we can write:

  [t^n] 1/(1 + bt + ct^2) = [t^n] 1/((1 − ((−b + i√|∆|)/2) t)(1 − ((−b − i√|∆|)/2) t)).

Let us set α = (−b + i√|∆|)/2 and β = (−b − i√|∆|)/2. Since α and β are complex numbers, a partial fraction expansion does not give, in general, a simple closed form for the coefficients:

  [t^n] 1/((1 − αt)(1 − βt)) = (α^{n+1} − β^{n+1})/(α − β).

However, we can try to give it a better form. Since α and β are conjugate, we can set α = ρ e^{iθ} and β = ρ e^{−iθ}, where θ = arg(α) and:

  ρ = |α| = √(b^2/4 + (4c − b^2)/4) = √c,
  α − β = i√|∆| = i√(4c − b^2).

Consequently:

  α^{n+1} − β^{n+1} = ρ^{n+1} (e^{i(n+1)θ} − e^{−i(n+1)θ}) = 2i ρ^{n+1} sin((n+1)θ)

and therefore:

  [t^n] 1/(1 + bt + ct^2) = 2(√c)^{n+1} sin((n+1)θ) / √(4c − b^2).

At this point we only have to find the value of θ. Since 4c − b^2 > 0, α is always contained in the positive imaginary halfplane; this implies 0 < arg(α) < π. Obviously:

  θ = arctan((√|∆|/2)/(−b/2)) + kπ = arctan(√(4c − b^2)/(−b)) + kπ.

When b < 0, we have 0 < arctan(√(4c − b^2)/(−b)) < π/2, and this is the correct value for θ; when b > 0, the principal branch of arctan is negative, and we should set θ = π + arctan(√(4c − b^2)/(−b)). In conclusion:

  θ = arctan(√(4c − b^2)/(−b)) + C,   where C = π if b > 0 and C = 0 if b < 0.

An interesting and non-trivial example is given by:

  σn = [t^n] 1/(1 − 3t + 3t^2) = (2(√3)^{n+1}/√3) sin((n+1) arctan(√3/3)) = 2(√3)^n sin((n+1)π/6).

These coefficients have the following values:

  n = 12k      σn = (√3)^{12k}     = 729^k
  n = 12k + 1  σn = (√3)^{12k+2}   = 3 × 729^k
  n = 12k + 2  σn = 2(√3)^{12k+2}  = 6 × 729^k
  n = 12k + 3  σn = (√3)^{12k+4}   = 9 × 729^k
  n = 12k + 4  σn = (√3)^{12k+4}   = 9 × 729^k
  n = 12k + 5  σn = 0
  n = 12k + 6  σn = −(√3)^{12k+6}  = −27 × 729^k
  n = 12k + 7  σn = −(√3)^{12k+8}  = −81 × 729^k
  n = 12k + 8  σn = −2(√3)^{12k+8} = −162 × 729^k
  n = 12k + 9  σn = −(√3)^{12k+10} = −243 × 729^k
  n = 12k + 10 σn = −(√3)^{12k+10} = −243 × 729^k
  n = 12k + 11 σn = 0

3.7 Matrix representation

Let f(t) be in F0. For a reason which will become apparent only later, with the coefficients of f(t) we form the following infinite lower triangular matrix (or array) D = (d_{n,k})_{n,k∈N}: column 0 contains the coefficients f0, f1, f2, ..., in this order; column 1 contains the same coefficients shifted down by one position, with d_{0,1} = 0; and, in general, column k contains the coefficients of f(t) shifted down k positions, so that the first k positions are 0. This definition can be summarized in the formula:

  d_{n,k} = f_{n−k},   for all n, k in N.
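Both closed forms are easy to check against a direct expansion. The check below (our own illustration) expands 1/(1 + bt + ct^2) through the recurrence u_n = −b u_{n−1} − c u_{n−2}, which follows at once from (1 + bt + ct^2) u(t) = 1:

```python
import math

def trinomial_inv_coeffs(b, c, r):
    """Coefficients u_0..u_r of 1/(1 + b t + c t^2), via u_n = -b u_{n-1} - c u_{n-2}."""
    u = [1.0]
    for n in range(1, r + 1):
        u.append(-b * u[n - 1] - (c * u[n - 2] if n >= 2 else 0.0))
    return u

# Reducible case (1 + 2t)(1 + 3t) = 1 + 5t + 6t^2: compare with the closed form
alpha, beta = 2.0, 3.0
closed = [(-1) ** n * (alpha ** (n + 1) - beta ** (n + 1)) / (alpha - beta)
          for n in range(8)]
assert trinomial_inv_coeffs(5.0, 6.0, 7) == closed

# Irreducible case 1 - 3t + 3t^2: sigma_n = 2*sqrt(3)^n * sin((n+1)*pi/6)
for n, s in enumerate(trinomial_inv_coeffs(-3.0, 3.0, 12)):
    assert math.isclose(s, 2 * math.sqrt(3) ** n * math.sin((n + 1) * math.pi / 6),
                        rel_tol=1e-9, abs_tol=1e-9)
```

The recurrence reproduces, in particular, the σn values of the table above (1, 3, 6, 9, 9, 0, −27, ...).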

This array D will be denoted by (f(t), 1):

  D = (f(t), 1) =
    ( f0  0   0   0   0  ...
      f1  f0  0   0   0  ...
      f2  f1  f0  0   0  ...
      f3  f2  f1  f0  0  ...
      f4  f3  f2  f1  f0 ...
      ...                   )

Row-by-column product is surely the basic operation on matrices, and its extension to infinite, lower triangular arrays is straightforward, because the sums involved in the product are actually finite. If (f(t), 1) and (g(t), 1) are the matrices corresponding to the two f.p.s. f(t) and g(t), we are interested in finding out what matrix is obtained by multiplying the two matrices with the usual row-by-column product. The row n in (f(t), 1) is, by definition, {fn, f_{n−1}, f_{n−2}, ...}, and column k in (g(t), 1) is {0, ..., 0, g0, g1, g2, ...}, where the number of leading 0's is just k. The generic element d_{n,k} of the product is therefore:

  d_{n,k} = Σ_{j=0}^{n} f_{n−j} g_{j−k},

if we conventionally set gr = 0 for all r < 0. When k = 0 we have d_{n,0} = Σ_{j=0}^{n} f_{n−j} gj, and this is the coefficient of t^n in the convolution f(t)g(t); therefore column 0 contains the coefficients of the convolution f(t)g(t). When k = 1 we have d_{n,1} = Σ_{j=0}^{n−1} f_{n−1−j} gj, and this is the coefficient of t^{n−1} in the convolution f(t)g(t). Proceeding in the same way, we see that column k contains the coefficients of the convolution f(t)g(t) shifted down k positions. Therefore we conclude:

  (f(t), 1) · (g(t), 1) = (f(t)g(t), 1).

In other words, we can associate every f.p.s. f(t) in F0 to a particular matrix (f(t), 1) (let us denote by A the set of such arrays) in such a way that the Cauchy product becomes the row-by-column product; this shows that there exists a group isomorphism between (F0, ·) and (A, ·). In particular, (1, 1) is the identity (in fact, it corresponds to the identity matrix) and (f(t)^{−1}, 1) is the inverse of (f(t), 1).

Let us now consider a f.p.s. f(t) in F1 and let us build an infinite lower triangular matrix in the following way: column k contains the coefficients of f(t)^k, in their proper order:

    ( 1  0   0               0           0     ...
      0  f1  0               0           0     ...
      0  f2  f1^2            0           0     ...
      0  f3  2 f1 f2         f1^3        0     ...
      0  f4  2 f1 f3 + f2^2  3 f1^2 f2   f1^4  ...
      ...                                         )

This matrix will be denoted by (1, f(t)/t); in fact, since f(t) is in F1, f(t)/t is in F0. By definition we have d_{n,k} = [t^n] f(t)^k; in particular, column 0 is {1, 0, 0, ...}. In this way we can associate every f.p.s. g(t) in F1 to a matrix (1, g(t)/t) (let us call B the set of such matrices). If (1, g(t)/t) and (1, f(t)/t) are two such matrices, the generic element d_{n,k} of their row-by-column product is:

  d_{n,k} = Σ_{j=0}^{∞} [t^n] g(t)^j [y^j] f(y)^k = [t^n] Σ_{j=0}^{∞} ([y^j] f(y)^k) g(t)^j = [t^n] f(g(t))^k,

where the sum is actually finite because the sums involved in the product are finite. In other words, column k of the product contains the coefficients of the kth power of the composition f(g(t)), and we can conclude:

  (1, g(t)/t) · (1, f(t)/t) = (1, f(g(t))/t).

This reveals a connection between the Cauchy product and the composition; in the Chapter on Riordan Arrays we will explore this connection more deeply. The correspondence g(t) <-> (1, g(t)/t) maps composition into the row-by-column product, and this is sufficient to prove that (F1, ◦) is isomorphic to (B, ·). The identity t in F1 corresponds to the matrix (1, t/t) = (1, 1), i.e., the identity matrix. Besides, the inverse matrix (1, g(t)/t) of (1, f(t)/t) is such that (1, g(t)/t) · (1, f(t)/t) = (1, 1), and since the product results in (1, f(g(t))/t), we have f(g(t)) = t: the inverse matrix is just the matrix corresponding to the compositional inverse of f(t).

3.8 The Lagrange inversion theorem

We have shown that we can associate every f.p.s. f(t) in F0 to a matrix (f(t), 1), the Cauchy product becoming the row-by-column product, and, because of the isomorphism we have seen, every f.p.s. f(t) in F1 to a matrix (1, f(t)/t), the composition of f.p.s. becoming again the row-by-column product. For the moment, we wish to see how this observation yields a computational method for evaluating the compositional inverse of a f.p.s. f(t) in F1. If f̄(t) is the compositional inverse of f(t), the matrix (1, f̄(t)/t) is the inverse matrix of (1, f(t)/t): its column 1 gives us the coefficients of f̄(t), and the other columns give us the coefficients of the powers f̄(t)^k.

In this way, Lagrange found a noteworthy formula for the coefficients of this compositional inverse. We follow the more recent proof of Stanley, which points out the purely formal aspects of Lagrange's formula. We state what the form of the inverse matrix should be, and then verify that it is actually so. Let D = (d_{n,k})_{n,k∈N} be defined as:

  d_{n,k} = (k/n) [t^{n−k}] (t/f(t))^n ;

because f(t)/t is in F0, the power (t/f(t))^n = (f(t)/t)^{−n} is well-defined; for example:

  (f(t)/t)^{−1} = (f1 + f2 t + f3 t^2 + ...)^{−1} = f1^{−1} + ... .

In order to show that D · (1, f(t)/t) = (1, 1), we compute the generic element v_{n,k} of the row-by-column product. Since the entries of column k of (1, f(t)/t) are [y^j] f(y)^k, we have:

  v_{n,k} = Σ_{j=0}^{∞} (j/n) [t^{n−j}] (t/f(t))^n [y^j] f(y)^k .

Now:

  j [y^j] f(y)^k = [y^{j−1}] (d/dy) f(y)^k = k [y^j] y f'(y) f(y)^{k−1},

and therefore, recognizing in the sum a term of a convolution (the sum is actually finite):

  v_{n,k} = (k/n) Σ_{j=0}^{∞} [t^{n−j}] (t/f(t))^n [y^j] y f'(y) f(y)^{k−1} = (k/n) [t^n] (t/f(t))^n t f'(t) f(t)^{k−1} .

Let us now distinguish between the case k = n and the case k ≠ n. When k = n we have:

  v_{n,n} = [t^n] t^{n+1} f'(t) f(t)^{−1} = [t^0] f'(t) (t/f(t)),

and the constant term in f'(t)(t/f(t)) is f1/f1 = 1. When k ≠ n:

  v_{n,k} = (k/n) [t^n] t^{n+1} f(t)^{k−n−1} f'(t) = (k/(n(k−n))) [t^{−1}] (d/dt) f(t)^{k−n} = 0 ;

in fact, f(t)^{k−n} is a f.L.s. and, as we already noticed, the residue of its derivative should be zero. This proves that D · (1, f(t)/t) = (1, 1), and therefore D is the inverse of (1, f(t)/t). Because the compositional inverse of f(t) is unique, we have D = (1, f̄(t)/t), and by the formula for d_{n,k}:

  [t^n] f̄(t)^k = (k/n) [t^{n−k}] (t/f(t))^n .

In particular, for k = 1:

  f̄n = [t^n] f̄(t) = (1/n) [t^{n−1}] (t/f(t))^n ,

and this is the celebrated Lagrange Inversion Formula (LIF).

Many times, the LIF is used in another way. Suppose we have a functional equation w = tφ(w), with φ(w) in F0, and we wish to find the f.p.s. w = w(t) satisfying this functional equation. Clearly w(t) is in F1, and if we set f(y) = y/φ(y), we also have f(t) in F1; the functional equation can be written f(w(t)) = t, and this shows that w(t) is the compositional inverse of f(t). We therefore know that w(t) is uniquely determined, and the LIF gives us:

  [t^n] w(t) = (1/n) [t^{n−1}] (t/f(t))^n = (1/n) [t^{n−1}] φ(t)^n .

The LIF can also give us the coefficients of the powers w(t)^k; however, we can obtain a still more general result. Let F(t) be in F, and let us consider the composition F(w(t)), where w = w(t) is, as before, the solution to the functional equation w = tφ(w). Note that [t^0] F(w(t)) = F0. For n > 0, the coefficient of t^n in F(w(t)) is:

  [t^n] F(w(t)) = [t^n] Σ_{k=0}^{∞} Fk w(t)^k = Σ_{k=0}^{∞} Fk (k/n) [t^{n−k}] φ(t)^n
               = (1/n) [t^{n−1}] Σ_{k=0}^{∞} k Fk t^{k−1} φ(t)^n = (1/n) [t^{n−1}] F'(t) φ(t)^n ;

the factor 1/n does not depend on k and can be taken out of the summation sign, and the sum is actually finite, being the term of a convolution.
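When φ is a polynomial, the LIF in the form [t^n]w(t) = (1/n)[t^{n−1}]φ(t)^n is immediate to evaluate by repeated polynomial multiplication. A small sketch of ours:

```python
from fractions import Fraction

def poly_pow_coeff(phi, n, m):
    """[t^m] phi(t)^n for a polynomial phi given by its coefficient list."""
    p = [Fraction(1)]
    for _ in range(n):
        # multiply by phi, truncating at degree m (higher terms cannot matter)
        p = [sum(p[i] * phi[k - i]
                 for i in range(len(p)) if 0 <= k - i < len(phi))
             for k in range(min(len(p) + len(phi) - 1, m + 1))]
    return p[m] if m < len(p) else Fraction(0)

def lif(phi, n):
    """[t^n] w(t) for w = t*phi(w), by the LIF: (1/n) [t^{n-1}] phi(t)^n."""
    return poly_pow_coeff(phi, n, n - 1) / Fraction(n)
```

With phi(t) = 1 + t the equation w = t(1 + w) has the solution w = t/(1 − t), so every coefficient is 1; with phi(t) = (1 + t)^2 one obtains the Catalan numbers of the next section.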

3.9 Some examples of the LIF

We found that the number bn of binary trees with n nodes (and of other combinatorial objects as well) satisfies the recurrence relation:

  b_{n+1} = Σ_{k=0}^{n} b_k b_{n−k} ,

which cannot be directly solved. However, if we multiply the recurrence relation by t^{n+1} and sum for n from 0 to infinity, we find:

  Σ_{n=0}^{∞} b_{n+1} t^{n+1} = Σ_{n=0}^{∞} t^{n+1} Σ_{k=0}^{n} b_k b_{n−k} .

Since b0 = 1, we can add and subtract 1 = b0 t^0 from the left hand member and take t outside the summation sign in the right hand member:

  Σ_{n=0}^{∞} b_n t^n − 1 = t Σ_{n=0}^{∞} ( Σ_{k=0}^{n} b_k b_{n−k} ) t^n .

In the r.h.s. we recognize a convolution, and substituting b(t) = Σ_{k=0}^{∞} b_k t^k for the corresponding f.p.s. we obtain:

  b(t) − 1 = t b(t)^2 .

We are interested in evaluating bn = [t^n] b(t); let us therefore set w = w(t) = b(t) − 1, so that w(t) is in F1 and wn = bn for all n > 0. The previous relation becomes w = t(1 + w)^2, and we see that the LIF can be applied with φ(t) = (1 + t)^2. The LIF gives:

  bn = [t^n] w(t) = (1/n) [t^{n−1}] (1 + t)^{2n} = (1/n) binom(2n, n−1) = (1/n) (2n)!/((n−1)!(n+1)!) = (1/(n+1)) binom(2n, n) .

The number bn is called the nth Catalan number and is denoted by Cn:

  Cn = (1/(n+1)) binom(2n, n) ,

a formula also valid in the case n = 0, when C0 = 1.

In the same way we can compute the number of p-ary trees with n nodes. A p-ary tree is a tree in which all the nodes have arity p, except for leaves, which have arity 0. A non-empty p-ary tree can be decomposed into a root and its p subtrees T1, T2, ..., Tp. [Figure: decomposition of a p-ary tree into its root and the subtrees T1, T2, ..., Tp.] This proves that:

  T_{n+1} = Σ T_{i1} T_{i2} ... T_{ip} ,

where the sum is extended to all the p-uples (i1, i2, ..., ip) such that i1 + i2 + ... + ip = n. Multiplying by t^{n+1} and summing for n from 0 to infinity, as before, we obtain:

  T(t) − 1 = t T(t)^p .

This time we have an equation of degree p, which cannot be solved directly. However, if we set w(t) = T(t) − 1, so that w(t) is in F1 and wn = Tn for n > 0, we have w = t(1 + w)^p and the LIF gives:

  Tn = [t^n] w(t) = (1/n) [t^{n−1}] (1 + t)^{pn} = (1/n) binom(pn, n−1) = (1/n) (pn)!/((n−1)!((p−1)n+1)!) = (1/((p−1)n+1)) binom(pn, n) ,

which generalizes the formula for the Catalan numbers.

Finally, let us find the solution of the functional equation w = t e^w. Applying the LIF (in the form relative to the functional equation) with φ(t) = e^t, we have:

  wn = (1/n) [t^{n−1}] e^{nt} = (1/n) n^{n−1}/(n−1)! = n^{n−1}/n! .

Therefore:

  w(t) = Σ_{n=1}^{∞} (n^{n−1}/n!) t^n = t + t^2 + (3/2) t^3 + (8/3) t^4 + (125/24) t^5 + ... .

It is a useful exercise to perform the necessary computations to show that f(w(t)) = t, where:

  f(t) = t/φ(t) = t e^{−t} = t − t^2 + t^3/2! − t^4/3! + t^5/4! − ... ,

for example up to the term of degree 5 or 6, and to verify that w(f(t)) = t as well.

3.10 Formal power series and the computer

When we are dealing with generating functions, or more in general with formal power series of any kind, we often have to perform numerical computations in order to verify some theoretical result or to experiment with actual cases. In these and other circumstances the computer can help very much with its speed and precision. Nowadays, several Computer Algebra Systems exist, which offer the possibility of actually working with formal power series.
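As a first such computation, the tree counts of the previous section are easy to check against their defining recurrences. A quick illustration (our own code, not from the text):

```python
from math import comb

def catalan(n):
    """C_n = C(2n, n)/(n+1), from the LIF with phi(t) = (1+t)^2."""
    return comb(2 * n, n) // (n + 1)

def p_ary(p, n):
    """T_n = C(pn, n)/((p-1)n + 1), from the LIF with phi(t) = (1+t)^p."""
    return comb(p * n, n) // ((p - 1) * n + 1)

# check against the defining convolution b_{n+1} = sum_k b_k b_{n-k}, b_0 = 1
b = [1]
for n in range(10):
    b.append(sum(b[k] * b[n - k] for k in range(n + 1)))
assert b[:10] == [catalan(n) for n in range(10)]
assert p_ary(2, 6) == catalan(6)      # binary trees are the case p = 2
```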

These systems can perform quite easily the basic operations on formal power series. Their use is recommended, because they can solve a doubt in a few seconds, can clarify difficult theoretical points and can give useful hints whenever we are faced with particular problems. However, a Computer Algebra System is not always accessible or, on the contrary, one may desire to use less sophisticated tools; fortunately, programmable pocket computers are now available. The aim of the present and of the following sections is to describe the main algorithms for dealing with formal power series. They can be used to program a computer or to simply understand how an existing system actually works.

The simplest way to represent a formal power series is surely by means of a vector, in which the kth component (starting from 0) is the coefficient of t^k in the power series. Obviously, the computer memory only can store a finite number of components, so an upper bound n0 is usually given to the length of vectors, and to represent power series we set:

  repr_{n0}( Σ_{k=0}^{∞} a_k t^k ) = (a0, a1, ..., an)   (n ≤ n0).

The components a0, a1, ..., an are usually real numbers, represented with the precision allowed by the particular computer. Fortunately, most operations on formal power series preserve the number of significant components, so that there is little danger that a number of successive operations could reduce a finite representation to a meaningless sequence of numbers. Differentiation decreases by one the number of useful components; integration and multiplication by t^r, with some extra effort, increase the number of significant elements, at the cost of introducing some 0 components.

In most combinatorial applications the coefficients a0, a1, ..., an are rational numbers, and it is not difficult to realize rational arithmetic on a computer. It is sufficient to represent a rational number as a couple (m, n), whose intended meaning is just m/n. So we must have m in Z, n in N, and it is a good idea to keep m and n coprime. This can be performed by a routine reduce computing p = gcd(m, n) using Euclid's algorithm and then dividing both m and n by p. The operations on rational numbers are defined in the following way:

  (m, n) + (m', n') = reduce(mn' + m'n, nn')
  (m, n) × (m', n') = reduce(mm', nn')
  (m, n)^{−1} = (n, m)
  (m, n)^p = (m^p, n^p)   (p in N)

provided that (m, n) is a reduced rational number. The dimension of m and n is limited by the internal representation of integer numbers. Because of that, Computer Algebra Systems usually realize an indefinite precision integer arithmetic: an integer number has a variable length internal representation, and special routines are used to perform the basic operations. These routines can also be realized in a high level programming language (such as C or JAVA), but they can slow down execution time too much if realized on a programmable pocket computer.

The simple representation of a formal power series by a vector of real, or rational, components will be used in the next sections to explain the main algorithms for formal power series operations. However, this representation only can deal with purely numerical formal power series; it becomes completely useless when, for example, the coefficients depend on some formal parameter. Computer Algebra Systems use a more sophisticated internal representation, which can also handle power series containing formal parameters.

3.11 The internal representation of expressions

The aim of the present section is to give a rough idea of how an expression can be represented in the computer memory. In general, an expression consists in operators and operands. For example, in a + √3 the operators are + and √·, and the operands are a and 3. Every operator has its own arity or adicity, i.e., the number of operands on which it acts. The adicity of the sum + is two, because it acts on its two terms; the adicity of the square root √· is one, because it acts on a single term. Operands can be numbers (and it is important that the nature of the number be specified: an integer, a rational, a real or a complex number) or can be formal parameters, as a. An expression can always be transposed into a "tree", the internal nodes of which correspond to operators and whose leaves correspond to operands. Each operator has as many branches as its adicity is, and a simple visit to the tree can perform its evaluation: if an operator acts on numerical operands, it can be executed, giving a numerical result; but if any of its operands is a formal parameter, the result is a formal expression, which may perhaps be simplified but cannot be evaluated to a numerical result. The simple tree for the previous expression is given in Figure 3.1.
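Returning for a moment to rational arithmetic: the definitions above translate almost verbatim into code. A toy version of ours (a real system would add indefinite precision integers, which Python already provides):

```python
from math import gcd

def reduce_frac(m, n):
    """Bring (m, n) to the reduced form with m in Z, n in N, gcd(m, n) = 1."""
    if n < 0:
        m, n = -m, -n
    p = gcd(abs(m), n)
    return (m // p, n // p)

def frac_add(a, b):
    (m, n), (mp, np) = a, b
    return reduce_frac(m * np + mp * n, n * np)

def frac_mul(a, b):
    (m, n), (mp, np) = a, b
    return reduce_frac(m * mp, n * np)

def frac_inv(a):
    m, n = a
    return reduce_frac(n, m)
```

For example, frac_add((1, 2), (1, 3)) yields (5, 6), already in reduced form.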

also the coeﬃcients in (1 + g(t))α are. . we are always reduced to compute (1 + g(t))α and since: (1 + g(t))α = ∞ k=0 α g(t)k k (3. when α ∈ N. . if pA is the ﬁrst index for which apA = 0 and pB is the ﬁrst index for which bpB = 0. most Computer Algebra Systems provide both a general “simpliﬁcation” routine together with a series of more speciﬁc programs performing some speciﬁc simplifying tasks. α note that in this case if f0 = 1. but it is easily seen to take a time obvious way: in the order of O(r3 ). k=0 This gives a straight-forward method for performThe sum of the formal power series is deﬁned in an ing (1 + g(t))α . as We are now considering the vector representation of described in the previous section. . . an depend on a formal parameter p nothing is changed. This includes the inversion of a power series (α = −1) and therefore division as well. In the computer memory an expression is represented by its tree. . Subtraction is similar to addition and does not require any particular comment.2.12. . . where g(t) ∈ / F0 . . Before discussing division. let us consider the operation of rising a formal power series to a power α ∈ R.36 CHAPTER 3. . This representation is very convenient and when some coeﬃcients a1 . which is simpler between (a + 1)(a + 2) and a2 +3a+2? It is easily seen that there are occasions in which either expression can be considered “simpler”. 3. . usually f0 is not rational. b1 . The representation of the n formal power series k=0 ak tk is shown in Figure 3. a1 . such as expanding parenthetized expressions or collecting like factors. For example. First of all we observe that whenever α ∈ N. and r = min(n. . .1: The tree for a simple expression when only numerical operands are attached to them. Simpliﬁcation is a rather complicated matter and it is not quite clear what a “simple” expression is. Here r is deﬁned as r = min(n + pB . at least conceptually.

. is obtained by setting α = −1. the reader can show that the coeﬃcients in (1 − t)−1 are all 1. n−1 n−1 k=0 ak (n − k)hn−k = α (k + 1)ak+1 hn−k−1 k=0 The computation is now straight-forward. . We now have an expres. if n is the number of terms in (a1 . . LOGARITHM AND EXPONENTIAL 37 + ∗ d d d + d d d 0 a0 ∗ + t . hn−1 ): the exponentiation of a series. In fact. . a(t)h′ (t) = αh(t)a′ (t). . The evaluation of hk requires a number of operations in the order O(k). or. . and therefore the whole procedure works in time O(r2 ). . .13 (in the last sum we performed the change of variable k → k − 1. P. Miller has devised an algorithm allowing to perform (1 + g(t))α in time O(r2 ). We begin by setting h0 = 1. as desired. by extracting the coeﬃcient of tn−1 we ﬁnd: n−1 n−1 hk = −ak − aj hk−j = − aj hk−j (h0 = 1) j=1 j=1 We now isolate the term with k = 0 in the left hand member and the term having k = n − 1 in the right and can be used to prove properties of the inverse of hand member (a0 = 1 by hypothesis): a power series. Therefore. By diﬀerentiating. As we ((α + 1)k − n) ak hn−k = hn = αan + know. in order to have the same indices as in the left hand member).2: The tree for a formal power series J. It is worth noting that the previous formula becomes: k−1 k nhn + k=1 ak (n − k)hn−k = αnan + αkak hn−k k=1 3. h2 .3. Let us begin with the n−1 1 logarithm and try to compute ln(1 + g(t)). C. . by multiplying everything by a(t). let us write h(t) = a(t)α . .e. an )). i. an ) and on formal power series.The idea of Miller can be applied to other operations sion for hn only depending on (a1 . In the present section we wish to use it to perform the (natural) logarithm and (h1 . there is a direct way to perform this operation. . . we obtain h′ (t) = αa(t)α−1 a′ (t). hr (r = n. .13. The inverse of a series. h2 . n k=1 i. . . where a(t) is any formal power series with a0 = 1.. . a2 . . a2 .e. As a simple example. (1+g(t))−1 . and then we successively compute h1 . . . 
1 a1 ∗ t + d d d a2 t2 ∗ O(tn ) an tn Figure 3.: ∞ n−1 1 (α + 1)k ln(1 + g(t)) = − 1 ak hn−k g(t)k = αan + n k k=1 k=1 Logarithm and exponential . .
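Before turning to logarithms, Miller's recurrence can be sketched in code (our own illustration, not from the text):

```python
from fractions import Fraction

def miller_pow(a, alpha, r):
    """h(t) = a(t)^alpha for a truncated series with a[0] == 1, in time O(r^2).
    Uses n*h_n = alpha*n*a_n + sum_{k=1}^{n-1} ((alpha+1)k - n) a_k h_{n-k}."""
    assert a[0] == 1
    alpha = Fraction(alpha)
    a = [Fraction(c) for c in a] + [Fraction(0)] * max(0, r + 1 - len(a))
    h = [Fraction(1)] + [Fraction(0)] * r
    for n in range(1, r + 1):
        s = alpha * n * a[n]
        for k in range(1, n):
            s += ((alpha + 1) * k - n) * a[k] * h[n - k]
        h[n] = s / n
    return h
```

With alpha = −1 and a(t) = 1 + t this yields the alternating geometric coefficients, and with alpha = 1/2 it computes square roots of series.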

Let us begin with the logarithm and try to compute ln(1 + g(t)), where g(t) is not in F0. If we set h(t) = ln(1 + g(t)), by differentiating we obtain h'(t) = g'(t)/(1 + g(t)), or:

  h'(t) (1 + g(t)) = g'(t),   that is   h'(t) = g'(t) − h'(t) g(t) .

We can now extract the coefficient of t^{k−1}; since g0 = 0 by hypothesis, we have:

  k h_k = k g_k − Σ_{j=1}^{k−1} (k−j) h_{k−j} g_j ,

that is:

  h_k = g_k − (1/k) Σ_{j=1}^{k−1} (k−j) h_{k−j} g_j = g_k − Σ_{j=1}^{k−1} (1 − j/k) h_{k−j} g_j .

This is an expression relating h_k to (g1, g2, ..., g_k) and to (h1, h2, ..., h_{k−1}). A program to perform the logarithm of a formal power series 1 + g(t) begins by setting h0 = 0 and then proceeds computing h1, h2, ..., hr, if r is the number of significant terms in g(t). The total time is clearly in the order of O(r^2).

A similar technique can be applied to the computation of exp(g(t)), provided that g(t) is not in F0. If g(t) = g0 + g1 t + g2 t^2 + ... with g0 ≠ 0, we have exp(g0 + g1 t + g2 t^2 + ...) = e^{g0} exp(g1 t + g2 t^2 + ...); in this way we are reduced to the previous case, but we no longer have rational coefficients when g(t) is in Q[[t]]. By differentiating the identity h(t) = exp(g(t)) we obtain h'(t) = g'(t) exp(g(t)) = g'(t) h(t). We extract the coefficient of t^{k−1}:

  k h_k = Σ_{j=0}^{k−1} (j+1) g_{j+1} h_{k−j−1} = Σ_{j=1}^{k} j g_j h_{k−j} .

This formula allows us to compute h_k in terms of (g1, g2, ..., g_k) and (h0 = 1, h1, h2, ..., h_{k−1}). A program performing exponentiation can be easily written by defining h0 = 1 and successively evaluating h1, h2, ..., hr, if r is the number of significant terms in g(t). Time complexity is obviously O(r^2).

Unfortunately, a similar trick does not work for series composition. To compute f(g(t)), where g(t) is not in F0, we have to resort to the defining formula:

  f(g(t)) = Σ_{k=0}^{∞} f_k g(t)^k .

This requires the successive computation of the integer powers of g(t), which can be performed by repeated applications of the Cauchy product; because g0 = 0, the sum is actually finite for every coefficient. However, if r is the minimal number of significant terms in f(t) and g(t), the procedure needs a time in the order of O(r^3).

We conclude this section by sketching the obvious algorithms to compute differentiation and integration of a formal power series f(t) = f0 + f1 t + f2 t^2 + f3 t^3 + ... If h(t) = f'(t), we have:

  h_k = (k+1) f_{k+1} ,

and therefore the number of significant terms is reduced by 1. Conversely, if h(t) is the integral of f(τ) from 0 to t, we have h0 = 0 and:

  h_k = f_{k−1}/k ,

and consequently the number of significant terms is increased by 1.
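The two recurrences for the logarithm and the exponential translate directly into O(r^2) programs; a sketch with exact rational coefficients (our own code):

```python
from fractions import Fraction

def series_log1p(g, r):
    """h(t) = ln(1 + g(t)), g(0) = 0: h_k = g_k - (1/k) sum_{j=1}^{k-1} (k-j) h_{k-j} g_j."""
    g = [Fraction(c) for c in g] + [Fraction(0)] * max(0, r + 1 - len(g))
    h = [Fraction(0)] * (r + 1)
    for k in range(1, r + 1):
        h[k] = g[k] - sum(((k - j) * h[k - j] * g[j] for j in range(1, k)),
                          Fraction(0)) / k
    return h

def series_exp(g, r):
    """h(t) = exp(g(t)), g(0) = 0: k h_k = sum_{j=1}^{k} j g_j h_{k-j}."""
    g = [Fraction(c) for c in g] + [Fraction(0)] * max(0, r + 1 - len(g))
    h = [Fraction(1)] + [Fraction(0)] * r
    for k in range(1, r + 1):
        h[k] = sum((j * g[j] * h[k - j] for j in range(1, k + 1)),
                   Fraction(0)) / k
    return h
```

The two routines are mutual inverses: applying series_log1p to exp(t) − 1 recovers the coefficients of t.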

Chapter 4

Generating Functions

4.1 General Rules

Let us consider a sequence of numbers $F = (f_0, f_1, f_2, \ldots) = (f_k)_{k\in\mathbb{N}}$; the (ordinary) generating function for the sequence $F$ is defined as:

$$f(t) = f_0 + f_1 t + f_2 t^2 + \cdots = \sum_{k=0}^{\infty} f_k t^k,$$

where the indeterminate $t$ is arbitrary. Given the sequence $(f_k)_{k\in\mathbb{N}}$, we introduce the generating function operator $G$, which applied to $(f_k)_{k\in\mathbb{N}}$ produces the ordinary generating function for the sequence: $G(f_k)_{k\in\mathbb{N}} = f(t)$. In this expression $t$ is a bound variable, and a more accurate notation would be $G_t(f_k)_{k\in\mathbb{N}} = f(t)$, understanding also the binding for the variable $k$. This notation is essential when $(f_k)_{k\in\mathbb{N}}$ depends on some parameter, or when we consider multivariate generating functions; for a double sequence, for example, we should write $G_{t,w}(f_{n,k})_{n,k\in\mathbb{N}} = f(t,w)$, to indicate the fact that the element $f_{n,k}$ in the double sequence becomes the coefficient of $t^n w^k$ in the function $f(t,w)$. We will use the simple notation $G(f_k) = f(t)$ whenever no ambiguity can arise.

The operator $G$ is clearly linear. We also define the exponential generating function of the sequence $(f_0, f_1, f_2, \ldots)$ as:

$$E(f_k) = G\!\left(\frac{f_k}{k!}\right) = \sum_{k=0}^{\infty} f_k \frac{t^k}{k!}.$$

This leads to the properties for the operator $G$ listed in Table 4.1. The first five formulas are easily verified by using the intended interpretation of the operator $G$; note that formula (G5) requires $g_0 = 0$. For the sake of completeness, the last formula can be proved by means of the LIF, in the form relative to the composition $F(w(t))$:

$$[t^n] F(w(t)) = \frac{1}{n}\,[t^{n-1}]\, F'(t)\,\varphi(t)^n \qquad (w = t\varphi(w)),$$

where $w = w(t) \in \mathcal{F}^1$ is the unique solution of the functional equation $w = t\varphi(w)$. In fact, in the last passage below we apply this formula backwards to the function $\widehat F(t) = \int_0^t F(y)\,dy/y$, for which $\widehat F'(t) = F(t)/t$:

$$[t^n] F(t)\varphi(t)^n = n\,[t^n]\,\widehat F(w(t)) = [t^{n-1}]\,\frac{d}{dt}\,\widehat F(w(t)) = [t^{n-1}]\,\frac{F(w)}{w}\,\frac{dw}{dt},$$

where we have applied the chain rule for differentiation. On the other hand, from $w = t\varphi(w)$ we have:

$$\frac{dw}{dt} = \varphi(w) + t\,\varphi'(w)\,\frac{dw}{dt}, \qquad\text{and therefore}\qquad \frac{dw}{dt} = \left.\frac{\varphi(w)}{1 - t\varphi'(w)}\,\right|_{\,w = t\varphi(w)},$$

where $\varphi'(w)$ denotes the derivative of $\varphi(w)$ with respect to $w$. We can substitute this expression in the formula above and observe that $\varphi(w)/w = 1/t$ can be taken outside of the substitution symbol:

$$[t^n] F(t)\varphi(t)^n = [t^{n-1}]\,\left.\frac{F(w)\,\varphi(w)}{w\,(1 - t\varphi'(w))}\,\right|_{\,w = t\varphi(w)} = [t^{n-1}]\,\frac{1}{t}\,\left.\frac{F(w)}{1 - t\varphi'(w)}\,\right|_{\,w = t\varphi(w)} = [t^n]\,\left.\frac{F(w)}{1 - t\varphi'(w)}\,\right|_{\,w = t\varphi(w)},$$

which is our diagonalization rule (G6). The name "diagonalization" is due to the fact that, if we imagine the coefficients of $F(t)\varphi(t)^n$ as constituting the row $n$ in an infinite matrix, then the quantities $[t^n]F(t)\varphi(t)^n$ are just the elements in the main diagonal of this array.

Table 4.1: The rules for the generating function operator

linearity: $\;G(\alpha f_k + \beta g_k) = \alpha\,G(f_k) + \beta\,G(g_k)$  (G1)

shifting: $\;G(f_{k+1}) = \dfrac{G(f_k) - f_0}{t}$  (G2)

differentiation: $\;G(k f_k) = tD\,G(f_k)$  (G3)

convolution: $\;G\!\left(\sum_{k=0}^{n} f_k\, g_{n-k}\right) = G(f_k)\cdot G(g_k)$  (G4)

composition: $\;G\!\left(\sum_{n=0}^{\infty} f_n\,(G(g_k))^n\right) = G(f_k)\circ G(g_k)$  (G5)

diagonalisation: $\;G([t^n]F(t)\varphi(t)^n) = \left.\dfrac{F(w)}{1 - t\varphi'(w)}\,\right|_{\,w = t\varphi(w)}$  (G6)

4.2 Some Theorems on Generating Functions

We are now going to prove a series of properties of the generating function operator. The proofs rely on the following fundamental principle of identity: given two sequences $(f_k)_{k\in\mathbb{N}}$ and $(g_k)_{k\in\mathbb{N}}$, then $G(f_k) = G(g_k)$ if and only if $f_k = g_k$ for every $k \in \mathbb{N}$. The principle is rather obvious from the very definition of the concept of generating functions; however, it is important, because it states the condition under which we can pass from an identity about elements to the corresponding identity about generating functions. It is sufficient that the two sequences do not agree in a single element (e.g., the first one) and we cannot infer the equality of the generating functions.

Theorem 4.2.1  Let $f(t) = G(f_k)$ be the generating function of the sequence $(f_k)_{k\in\mathbb{N}}$; then:

$$G(f_{k+2}) = \frac{G(f_k) - f_0 - f_1 t}{t^2}. \tag{4.2.1}$$

Proof: Let $g_k = f_{k+1}$; by (G2), $G(f_{k+2}) = G(g_{k+1}) = (G(g_k) - g_0)/t$. Since $g_0 = f_1$ and $G(g_k) = (G(f_k) - f_0)/t$, the result follows.

This can be further generalized; the theorem then follows by mathematical induction:

Theorem 4.2.2  Let $f(t) = G(f_k)$ be as above; then:

$$G(f_{k+j}) = \frac{G(f_k) - f_0 - f_1 t - \cdots - f_{j-1} t^{j-1}}{t^j}. \tag{4.2.2}$$

If we consider right instead of left shifting, we have to be more careful:

Theorem 4.2.3  Let $f(t) = G(f_k)$ be as above; then $f_{-1} = 0$ and $G(f_{k-1}) = t\,G(f_k)$.

Proof: We have $G(f_k) = G(f_{(k-1)+1}) = t^{-1}\bigl(G(f_{k-1}) - f_{-1}\bigr)$, where $f_{-1}$ is the coefficient of $t^{-1}$ in $f(t)$; since $f(t)$ is a formal power series, $f_{-1} = 0$, and hence $G(f_{k-1}) = t\,G(f_k)$.

By mathematical induction this result can be generalized to:

$$G(f_{k-j}) = t^j\,G(f_k). \tag{4.2.3}$$

Property (G3) can be generalized in several ways:

Theorem 4.2.4  Let $f(t) = G(f_k)$ be as above; then:

$$G((k+1)\,f_{k+1}) = D\,G(f_k). \tag{4.2.4}$$

Proof: If we set $g_k = k f_k$, then $g_0 = 0\cdot f_0 = 0$ and, by (G2) and (G3):

$$G((k+1)f_{k+1}) = G(g_{k+1}) = t^{-1}\bigl(G(k f_k) - 0 f_0\bigr) = t^{-1}\, tD\,G(f_k) = D\,G(f_k).$$

Theorem 4.2.5  Let $f(t) = G(f_k)$ be as above; then:

$$G(k^2 f_k) = tD\,G(f_k) + t^2 D^2\,G(f_k). \tag{4.2.5}$$

Proof: By applying (G3) twice we obtain $G(k^2 f_k) = tD\bigl(tD\,G(f_k)\bigr) = t\bigl(D\,G(f_k) + tD^2\,G(f_k)\bigr)$.

The rules (G1)–(G6) can also be assumed as axioms of a theory of generating functions, and used to derive general theorems as well as specific generating functions for particular sequences. By mathematical induction, the last result can be generalized to arbitrary powers $k^j$.
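Theorem 4.2.5 can be checked at the coefficient level: on a coefficient list, $tD$ multiplies the $k$-th entry by $k$ and $t^2D^2$ multiplies it by $k(k-1)$. A small Python sketch (function names ours) on the sample sequence $f_k = 2^k$:

```python
def tD(a):
    """Coefficient list of t*D f(t): the k-th coefficient becomes k * a_k."""
    return [k * a[k] for k in range(len(a))]

def t2D2(a):
    """Coefficient list of t^2*D^2 f(t): the k-th coefficient becomes k*(k-1) * a_k."""
    return [k * (k - 1) * a[k] for k in range(len(a))]

# Theorem 4.2.5, coefficientwise: G(k^2 f_k) = (tD + t^2 D^2) G(f_k)
f = [2 ** k for k in range(12)]
lhs = [k * k * f[k] for k in range(12)]
rhs = [x + y for x, y in zip(tD(f), t2D2(f))]
```

Since $k^2 = k + k(k-1)$, the identity holds entrywise for any sequence, not just this sample.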

Theorem 4.2.6  Let $f(t) = G(f_k)$ be as above; then:

$$G(k^j f_k) = S_j(tD)\,G(f_k), \tag{4.2.6}$$

where $S_j(w) = \sum_{r=1}^{j} S(j,r)\,w^r$ is the $j$-th Stirling polynomial of the second kind (see Section 2.11). Formula (4.2.6) is to be understood in the operator sense; (G3) and (4.2.5) are its first two instances.

Proof: The proof proceeds by induction. We have $G(k^{j+1} f_k) = G(k\,(k^j f_k)) = tD\,G(k^j f_k)$, that is:

$$S_{j+1}(tD) = tD\,S_j(tD) = tD\sum_{r=1}^{j} S(j,r)\,t^r D^r = \sum_{r=1}^{j} S(j,r)\,r\,t^r D^r + \sum_{r=1}^{j} S(j,r)\,t^{r+1} D^{r+1}.$$

By equating like coefficients we find $S(j+1,r) = r\,S(j,r) + S(j,r-1)$, which is the classical recurrence for the Stirling numbers of the second kind; since the initial conditions also coincide, we can conclude that the coefficients of $S_j(w)$ are indeed these Stirling numbers.

So, for example, being $S_3(w) = w + 3w^2 + w^3$, we have:

$$G(k^3 f_k) = tD\,G(f_k) + 3t^2D^2\,G(f_k) + t^3D^3\,G(f_k).$$

For the falling factorial $k^{\underline{r}} = k(k-1)\cdots(k-r+1)$ we have a simpler formula:

$$G(k^{\underline{r}}\, f_k) = t^r D^r\,G(f_k). \tag{4.2.7}$$

Theorem 4.2.7  Let $f(t) = G(f_k)$ be as above; then:

$$G(p^k f_k) = f(pt). \tag{4.2.8}$$

Proof: By setting $g(t) = pt$ in (G5) we have $f(pt) = \sum_{n=0}^{\infty} f_n (pt)^n = \sum_{n=0}^{\infty} p^n f_n t^n = G(p^k f_k)$. In particular, for $p = -1$ we have $G((-1)^k f_k) = f(-t)$.

Let us now come to integration:

Theorem 4.2.8  Let $f(t) = G(f_k)$ be as above, and let us define $g_k = f_k/k$ for $k \neq 0$ and $g_0 = 0$; then:

$$G(g_k) = \int_0^t \bigl(G(f_k) - f_0\bigr)\,\frac{dz}{z}. \tag{4.2.9}$$

Proof: Clearly $k g_k = f_k$ for every $k \neq 0$; by (G3) we find $tD\,G(g_k) = G(f_k) - f_0$, from which (4.2.9) follows by integration and the condition $g_0 = 0$.

Theorem 4.2.9  Let $f(t) = G(f_k)$ be as above; then:

$$G\!\left(\frac{f_k}{k+1}\right) = \frac{1}{t}\int_0^t G(f_k)\,dz = \frac{1}{t}\int_0^t f(z)\,dz. \tag{4.2.10}$$

Proof: Let $g_{k+1} = f_k/(k+1)$ with $g_0 = 0$, so that $(k+1)g_{k+1} = f_k$. By Theorem 4.2.4, $D\,G(g_k) = G((k+1)g_{k+1}) = G(f_k)$, whence $G(g_k) = \int_0^t f(z)\,dz$. Finally, $G(f_k/(k+1)) = G(g_{k+1}) = t^{-1}(G(g_k) - g_0)$, and the result follows.

4.3 More advanced results

The results obtained till now can be considered as simple generalizations of the axioms. By using them, however, we can also obtain more advanced results, concerning sequences derived from a given sequence by manipulating its elements in various ways. They are very useful and will be used in many circumstances. In the following theorems, $f(t)$ will always denote the generating function $G(f_k)$. As a first example, let us begin by proving the well-known bisection formulas:

Theorem 4.3.1  Let $f(t) = G(f_k)$ denote the generating function of the sequence $(f_k)_{k\in\mathbb{N}}$; then:

$$G(f_{2k}) = \frac{f(\sqrt{t}\,) + f(-\sqrt{t}\,)}{2}, \tag{4.3.1}$$

$$G(f_{2k+1}) = \frac{f(\sqrt{t}\,) - f(-\sqrt{t}\,)}{2\sqrt{t}}, \tag{4.3.2}$$

where $\sqrt{t}$ is a symbol with the property that $(\sqrt{t}\,)^2 = t$.

Proof: For $n$ odd, $(\sqrt{t}\,)^n + (-\sqrt{t}\,)^n = 0$, while for $n = 2k$ even, $\bigl((\sqrt{t}\,)^n + (-\sqrt{t}\,)^n\bigr)/2 = t^k$. Hence, by setting $n = 2k$:

$$\frac{f(\sqrt{t}\,) + f(-\sqrt{t}\,)}{2} = \sum_{n=0}^{\infty} f_n\,\frac{(\sqrt{t}\,)^n + (-\sqrt{t}\,)^n}{2} = \sum_{k=0}^{\infty} f_{2k}\,t^k = G(f_{2k}).$$

The proof of the second formula is analogous.
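Since $t^rD^r$ turns $f_k$ into $k^{\underline{r}} f_k$, Theorem 4.2.6 reduces, coefficientwise, to the classical identity $k^j = \sum_r S(j,r)\,k^{\underline{r}}$. A Python sketch (helper names ours) that builds $S(j,r)$ from the recurrence proved above and verifies the identity:

```python
def stirling2(j, r):
    """Stirling numbers of the second kind, via S(j+1,r) = r*S(j,r) + S(j,r-1)."""
    if j == 0:
        return 1 if r == 0 else 0
    if r <= 0 or r > j:
        return 0
    return r * stirling2(j - 1, r) + stirling2(j - 1, r - 1)

def falling(k, r):
    """Falling factorial k(k-1)...(k-r+1)."""
    p = 1
    for i in range(r):
        p *= k - i
    return p

# k^j = sum_r S(j,r) * k^(falling r), for a range of j and k
ok = all(k ** j == sum(stirling2(j, r) * falling(k, r) for r in range(j + 1))
         for j in range(7) for k in range(12))
```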

Theorem 4.3.2  Let $f(t) = G(f_k)$ denote the generating function of the sequence $(f_k)_{k\in\mathbb{N}}$; then:

$$G\!\left(\frac{f_k}{2k+1}\right) = \frac{1}{2\sqrt{t}}\int_0^t \frac{G(f_k)}{\sqrt{z}}\,dz. \tag{4.3.3}$$

Proof: Let us set $g_k = f_k/(2k+1)$, i.e., $2k g_k + g_k = f_k$; by applying (G3) we have the differential equation $2t g'(t) + g(t) = f(t)$, whose solution having $g(0) = f_0$ is just formula (4.3.3). This proof is typical, and introduces the use of ordinary differential equations in the calculus of generating functions.

We conclude with two general theorems on sums:

Theorem 4.3.3 (Partial Sum Theorem)  Let $f(t) = G(f_k)$ denote the generating function of the sequence $(f_k)_{k\in\mathbb{N}}$; then:

$$G\!\left(\sum_{k=0}^{n} f_k\right) = \frac{1}{1-t}\,G(f_k). \tag{4.3.4}$$

Proof: If we set $s_n = \sum_{k=0}^{n} f_k$, then we have $s_{n+1} = s_n + f_{n+1}$ for every $n \in \mathbb{N}$, and we can apply the operator $G$ to both members: $G(s_{n+1}) = G(s_n) + G(f_{n+1})$, that is, by (G2):

$$\frac{G(s_n) - s_0}{t} = G(s_n) + \frac{G(f_n) - f_0}{t}.$$

Since $s_0 = f_0$, we find $G(s_n) = t\,G(s_n) + G(f_n)$, and from this (4.3.4) follows directly.

The following result is known as the Euler transformation:

Theorem 4.3.4  Let $f(t) = G(f_k)$ denote the generating function of the sequence $(f_k)_{k\in\mathbb{N}}$; then:

$$G\!\left(\sum_{k=0}^{n}\binom{n}{k} f_k\right) = \frac{1}{1-t}\,f\!\left(\frac{t}{1-t}\right). \tag{4.3.5}$$

Proof: By well-known properties of binomial coefficients we have:

$$\binom{n}{k} = \binom{n}{n-k} = (-1)^{n-k}\binom{-n+n-k-1}{n-k} = (-1)^{n-k}\binom{-k-1}{n-k},$$

and $(-1)^{n-k}\binom{-k-1}{n-k}$ is the coefficient of $t^{n-k}$ in $(1-t)^{-k-1}$. Hence:

$$\sum_{k=0}^{n}\binom{n}{k} f_k = \sum_{k=0}^{\infty} [t^{n-k}](1-t)^{-k-1}\,[y^k]f(y) = [t^n]\,\frac{1}{1-t}\sum_{k=0}^{\infty}\left(\frac{t}{1-t}\right)^{k}[y^k]f(y) = [t^n]\,\frac{1}{1-t}\,f\!\left(\frac{t}{1-t}\right),$$

where we observed that the sum can be extended to infinity. Since the last expression does not depend on $n$, it represents the generating function of the sums. We observe explicitly that by (K4) we have $\sum_{k=0}^{n}\binom{n}{k}f_k = [t^n](1+t)^n f(t)$; this expression, however, does not represent a generating function, because it depends on $n$. The Euler transformation can be generalized in several ways, as we shall see when dealing with Riordan arrays.

4.4 Common Generating Functions

The aim of the present section is to derive the most common generating functions by using the apparatus of the previous sections. Let us consider the constant sequence $F = (1,1,1,\ldots)$, for which we have $f_{k+1} = f_k$ for every $k \in \mathbb{N}$. By applying the principle of identity and (G2) we find $(G(f_k) - f_0)/t = G(f_k)$; since $f_0 = 1$, we have immediately:

$$G(1) = \frac{1}{1-t}.$$

For any constant sequence $(c, c, c, \ldots)$, by (G1), we find $G(c) = c/(1-t)$. Similarly, by using the basic rules and the theorems of the previous sections we have:

$$G((-1)^n) = G(1)\circ(-t) = \frac{1}{1+t}$$

$$G\!\left(\frac{1}{n}\right) = \int_0^t\left(\frac{1}{1-z} - 1\right)\frac{dz}{z} = \int_0^t\frac{dz}{1-z} = \ln\frac{1}{1-t}$$

$$G(H_n) = G\!\left(\sum_{k=1}^{n}\frac{1}{k}\right) = \frac{1}{1-t}\,\ln\frac{1}{1-t},$$

where $H_n$ is the $n$-th harmonic number, and:

$$G(n) = G(n\cdot 1) = tD\,\frac{1}{1-t} = \frac{t}{(1-t)^2}, \qquad G(n^2) = tD\,\frac{t}{(1-t)^2} = \frac{t(1+t)}{(1-t)^3}.$$
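The Euler transformation can be checked by expanding $\frac{1}{1-t} f\!\left(\frac{t}{1-t}\right)$ directly as a truncated power series and comparing its coefficients with the binomial sums. A Python sketch (helper names ours), using the Fibonacci numbers as a test sequence:

```python
from math import comb

def series_mul(a, b):
    """Truncated Cauchy product of two coefficient lists of equal length."""
    n = len(a)
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(n)]

N = 10
f = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]    # test sequence (Fibonacci numbers)
y = [0] + [1] * (N - 1)                   # y(t) = t/(1-t) = t + t^2 + ...
g = [0] * N                               # accumulates f(t/(1-t))
yk = [1] + [0] * (N - 1)                  # current power y(t)^k
for k in range(N):
    g = [g[i] + f[k] * yk[i] for i in range(N)]
    yk = series_mul(yk, y)
g = series_mul(g, [1] * N)                # multiply by 1/(1-t)

# [t^n] (1/(1-t)) f(t/(1-t))  must equal  sum_k C(n,k) f_k
check = all(g[n] == sum(comb(n, k) * f[k] for k in range(n + 1)) for n in range(N))
```

The truncation is safe because $y(t)^k$ starts at $t^k$, so only powers $y^k$ with $k < N$ contribute to the first $N$ coefficients.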

Other generating functions can be obtained from the previous formulas:

$$G(nH_n) = tD\left(\frac{1}{1-t}\ln\frac{1}{1-t}\right) = \frac{t}{(1-t)^2}\ln\frac{1}{1-t} + \frac{t}{(1-t)^2}$$

$$G\!\left(\frac{H_n}{n+1}\right) = \frac{1}{t}\int_0^t\frac{1}{1-z}\ln\frac{1}{1-z}\,dz = \frac{1}{2t}\left(\ln\frac{1}{1-t}\right)^2.$$

An interesting example is given by $G\bigl(\frac{1}{n(n+1)}\bigr)$. Since $\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1}$, it is tempting to apply the operator $G$ to both members. However, this relation is not valid for $n = 0$ (the sequences $(1/n)$ and $(1/(n+1))$ have first elements $0$ and $1$, respectively), and in order to apply the principle of identity we must define:

$$\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1} + \delta_{n,0},$$

where $\delta_{n,m}$ is the Kronecker delta. We have $G(\delta_{n,0}) = 1$; this last relation can be readily generalized to $G(\delta_{n,m}) = t^m$. We thus arrive at the correct generating function:

$$G\!\left(\frac{1}{n(n+1)}\right) = \ln\frac{1}{1-t} - \frac{1}{t}\ln\frac{1}{1-t} + 1 = 1 - \frac{1-t}{t}\ln\frac{1}{1-t},$$

in accordance with the fact that the first element of the sequence is zero.

Let us now come to binomial coefficients. In order to find $G\binom{p}{k}$, where $p$ is a parameter, we observe that, from the definition:

$$\binom{p}{k+1} = \frac{p(p-1)\cdots(p-k+1)(p-k)}{(k+1)!} = \frac{p-k}{k+1}\binom{p}{k}.$$

Hence, by denoting $\binom{p}{k}$ as $f_k$, we have $(k+1)f_{k+1} = (p-k)f_k$, and by applying (4.2.4) and (G3):

$$G((k+1)f_{k+1}) = G((p-k)f_k) = p\,G(f_k) - G(kf_k), \qquad\text{i.e.}\qquad D\,G(f_k) = p\,G(f_k) - tD\,G(f_k),$$

the differential equation $f'(t)(1+t) = p f(t)$. By separating the variables and integrating, we find $\ln f(t) = p\ln(1+t) + c$, or $f(t) = C(1+t)^p$. For $t = 0$ we should have $f(0) = \binom{p}{0} = 1$, and this implies $C = 1$. Consequently:

$$G\binom{p}{k} = (1+t)^p \qquad (p \in \mathbb{R}),$$

$$G\!\left(k\binom{p}{k}\right) = tD\,(1+t)^p = pt(1+t)^{p-1}, \qquad G\!\left(k^2\binom{p}{k}\right) = (tD + t^2D^2)(1+t)^p = pt(1+pt)(1+t)^{p-2}.$$

We are now in a position to derive the recurrence relation for binomial coefficients. By using (K1) $\div$ (K5) we easily find:

$$\binom{p}{k} = [t^k](1+t)^p = [t^k](1+t)(1+t)^{p-1} = [t^k](1+t)^{p-1} + [t^{k-1}](1+t)^{p-1} = \binom{p-1}{k} + \binom{p-1}{k-1}.$$

By (K4), we have the well-known Vandermonde convolution:

$$\binom{m+p}{n} = [t^n](1+t)^{m+p} = [t^n](1+t)^m(1+t)^p = \sum_{k=0}^{n}\binom{m}{k}\binom{p}{n-k},$$

which, for $m = p = n$, becomes $\sum_{k=0}^{n}\binom{n}{k}^2 = \binom{2n}{n}$. Finally:

$$G\!\left(\frac{1}{k+1}\binom{p}{k}\right) = \frac{1}{t}\int_0^t (1+z)^p\,dz = \frac{(1+t)^{p+1} - 1}{(p+1)\,t}.$$

Several generating functions for different forms of binomial coefficients can be found by means of this method. The derivation is purely algebraic and makes use of the generating functions already found and of the various properties considered in the previous section. They are summarized as follows, where $p$ and $m$ are two parameters and can also be zero; these functions can make sense even when they are f.L.s. (formal Laurent series), and not simply f.p.s.:

$$G\binom{p}{k} = (1+t)^p \qquad G\binom{p}{m+k} = \frac{(1+t)^p}{t^m} \qquad G\binom{p+k}{m} = \frac{t^{m-p}}{(1-t)^{m+1}} \qquad G\binom{p+k}{m+k} = \frac{1}{t^m(1-t)^{p+1-m}}.$$
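The recurrence $\binom{p}{k+1} = \binom{p}{k}\frac{p-k}{k+1}$ computes the coefficients of $(1+t)^p$ for any real $p$. A Python sketch (function name ours) that checks the case $p = 1/2$ in a non-circular way, by squaring the series and recovering $1 + t$:

```python
from fractions import Fraction

def binomial_series(p, n):
    """First n coefficients of (1+t)^p from C(p,k+1) = C(p,k)*(p-k)/(k+1)."""
    c = [Fraction(1)]
    for k in range(n - 1):
        c.append(c[-1] * (p - k) / (k + 1))
    return c

half = binomial_series(Fraction(1, 2), 8)
# (1+t)^(1/2) squared must give back 1 + t  (truncated Cauchy product)
square = [sum(half[j] * half[k - j] for j in range(k + 1)) for k in range(8)]
```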

In a similar way we also have:

$$G\!\left(\binom{m}{k}p^k\right) = (1+pt)^m, \qquad G\!\left(\binom{k}{m}p^k\right) = p^m\,G\!\left(\binom{-m-1}{k-m}(-p)^{k-m}\right) = \frac{p^m t^m}{(1-pt)^{m+1}},$$

and, for $p = 1$:

$$G\binom{k}{m} = \frac{t^m}{(1-t)^{m+1}}, \qquad G\!\left(k\binom{k}{m}\right) = tD\,\frac{t^m}{(1-t)^{m+1}} = \frac{mt^m + t^{m+1}}{(1-t)^{m+2}},$$

$$G\!\left(k^2\binom{k}{m}\right) = (tD + t^2D^2)\,\frac{t^m}{(1-t)^{m+1}} = \frac{m^2 t^m + (3m+1)t^{m+1} + t^{m+2}}{(1-t)^{m+3}},$$

$$G\!\left(\frac{1}{k}\binom{k}{m}\right) = \int_0^t \frac{z^{m-1}}{(1-z)^{m+1}}\,dz = \frac{t^m}{m(1-t)^m}.$$

The last integral can be solved by setting $y = (1-z)^{-1}$, and is valid for $m > 0$; for $m = 0$ it reduces to $G(1/k) = \ln(1/(1-t))$.

4.5 The Method of Shifting

When the elements of a sequence $F$ are given by an explicit formula, we can try to find the generating function for $F$ by using the technique of shifting: we consider the element $f_{n+1}$ and try to express it in terms of $f_n$. This can produce a relation to which we apply the principle of identity, deriving an equation in $G(f_n)$, the solution of which is the generating function. It can happen that the recurrence involves several elements of $F$, and/or that the resulting equation is indeed a differential equation. Whatever the case, we find a recurrence for the elements $f_n \in F$ and then try to solve it by using the rules (G1) $\div$ (G5) and their consequences.

Let us consider the geometric sequence $1, p, p^2, p^3, \ldots$, for which $f_{n+1} = p f_n$ for every $n \in \mathbb{N}$. By (G2), $t^{-1}(G(p^k) - 1) = p\,G(p^k)$, or:

$$G(p^k) = \frac{1}{1-pt}.$$

From this we obtain other generating functions:

$$G(k p^k) = tD\,\frac{1}{1-pt} = \frac{pt}{(1-pt)^2}, \qquad G(k^2 p^k) = \frac{pt + p^2t^2}{(1-pt)^3},$$

$$G\!\left(\frac{p^k - 1}{k}\right) = \int_0^t\left(\frac{1}{1-pz} - \frac{1}{1-z}\right)\frac{dz}{z} = \ln\frac{1-t}{1-pt}.$$

By the partial sum theorem:

$$G\!\left(\sum_{k=0}^{n} p^k\right) = \frac{1}{1-t}\,\frac{1}{1-pt},$$

and by partial fraction expansion $\frac{1}{(1-t)(1-pt)} = \frac{1}{p-1}\left(\frac{p}{1-pt} - \frac{1}{1-t}\right)$; hence, by extracting the coefficient of $t^n$, the well-known formula for the sum of a geometric progression follows:

$$\sum_{k=0}^{n} p^k = [t^n]\,\frac{1}{(1-t)(1-pt)} = \frac{p^{n+1} - 1}{p-1}.$$

As a very simple application of the shifting method, let us consider $f_n = 1/n!$. From $(n+1)! = (n+1)\,n!$ we have the recurrence $(n+1)f_{n+1} = f_n$; by (4.2.4) this gives $f'(t) = f(t)$, with $f(0) = 1$, and therefore:

$$G\!\left(\frac{1}{n!}\right) = e^t.$$

By (G2) and Theorem 4.2.9:

$$G\!\left(\frac{1}{(n+1)!}\right) = \frac{e^t - 1}{t}, \qquad G\!\left(\frac{n}{(n+1)!}\right) = tD\,\frac{e^t - 1}{t} = \frac{te^t - e^t + 1}{t}.$$

By this last relation and the partial sum theorem, the well-known result follows:

$$\sum_{k=0}^{n}\frac{k}{(k+1)!} = [t^n]\,\frac{1}{1-t}\,\frac{te^t - e^t + 1}{t} = 1 - \frac{1}{(n+1)!}.$$

Finally:

$$G\!\left(\frac{1}{n\cdot n!}\right) = \int_0^t \frac{e^z - 1}{z}\,dz.$$

We observe explicitly that the formulas above could have also been obtained from the formulas of the previous sections and the general rule $G(p^k f_k) = f(pt)$.
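The sum $\sum_{k=0}^{n} k/(k+1)! = 1 - 1/(n+1)!$ derived above can be verified exactly with rational arithmetic; a minimal Python sketch (function names ours):

```python
from fractions import Fraction
from math import factorial

def lhs(n):
    """sum_{k=0}^{n} k/(k+1)!  computed term by term, exactly."""
    return sum(Fraction(k, factorial(k + 1)) for k in range(n + 1))

def rhs(n):
    """The closed form 1 - 1/(n+1)!."""
    return 1 - Fraction(1, factorial(n + 1))

ok = all(lhs(n) == rhs(n) for n in range(15))
```

The identity can also be seen directly by telescoping, since $k/(k+1)! = 1/k! - 1/(k+1)!$.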

As a less trivial application of shifting, observe that:

$$\binom{2n+2}{n+1} = \frac{(2n+2)(2n+1)}{(n+1)^2}\binom{2n}{n} = \frac{2(2n+1)}{n+1}\binom{2n}{n};$$

by setting $f_n = \binom{2n}{n}$ we have the recurrence $(n+1)f_{n+1} = 2(2n+1)f_n$. By using (4.2.4), (G1) and (G3) we obtain the differential equation $f'(t) = 4tf'(t) + 2f(t)$, the simple solution of which is:

$$G\binom{2n}{n} = \frac{1}{\sqrt{1-4t}}.$$

4.6 Diagonalization

The technique of shifting is a rather general method for obtaining generating functions, but it produces first order recurrence relations, and not every sequence can be defined by a first order recurrence relation; other methods are often necessary to find out generating functions. Sometimes, the rule of diagonalization can be used very conveniently. One of the most simple examples is how to determine, in this way, the generating function of the central binomial coefficients. In fact we have:

$$\binom{2n}{n} = [t^n](1+t)^{2n},$$

and (G6) can be applied with $F(t) = 1$ and $\varphi(t) = (1+t)^2$. The function $\varphi(t) = (1+t)^2$ gives rise to a second degree equation: from $w = t(1+w)^2$ we find $tw^2 - (1-2t)w + t = 0$, or:

$$w = w(t) = \frac{1 - 2t \pm \sqrt{1-4t}}{2t}.$$

Since $w = w(t)$ should belong to $\mathcal{F}^1$, we must eliminate the solution with the $+$ sign:

$$w = \frac{1 - 2t - \sqrt{1-4t}}{2t},$$

and consequently, since $2t(1+w) = 1 - \sqrt{1-4t}$:

$$G\binom{2n}{n} = \frac{1}{1 - t\varphi'(w)} = \frac{1}{1 - 2t(1+w)} = \frac{1}{\sqrt{1-4t}},$$

as we already know. Some immediate consequences are:

$$G\!\left(\frac{1}{n+1}\binom{2n}{n}\right) = \frac{1}{t}\int_0^t\frac{dz}{\sqrt{1-4z}} = \frac{1 - \sqrt{1-4t}}{2t}$$

(the generating function of the Catalan numbers), and, by Theorem 4.3.2:

$$G\!\left(\frac{1}{2n+1}\binom{2n}{n}\right) = \frac{1}{2\sqrt{t}}\int_0^t\frac{dz}{\sqrt{z(1-4z)}} = \frac{1}{2\sqrt{t}}\arctan\sqrt{\frac{4t}{1-4t}}.$$

A last group of generating functions is obtained by considering $f_n = 4^n\binom{2n}{n}^{-1}$. By the relation above we have the recurrence $(2n+1)f_{n+1} = 2(n+1)f_n$ and, by using the operator $G$ and the rules of Section 4.2, the differential equation $2t(1-t)f'(t) - (1+2t)f(t) + 1 = 0$ is derived. By solving this linear equation with $f(0) = 1$, simplifying, and using the change of variable $y = z/(1-z)$, the integral appearing in the solution can be computed without difficulty, and the final result is:

$$G\!\left(4^n\binom{2n}{n}^{-1}\right) = \frac{1}{1-t} + \sqrt{\frac{t}{(1-t)^3}}\,\arctan\sqrt{\frac{t}{1-t}}.$$

In a similar way:

$$G\!\left(\frac{4^n}{2n+1}\binom{2n}{n}^{-1}\right) = \frac{1}{\sqrt{t(1-t)}}\,\arctan\sqrt{\frac{t}{1-t}}.$$

More in general, let us study the sequence $c_n = [t^n](1 + \alpha t + \beta t^2)^n$ and look for its generating function $C(t) = G(c_n)$, which will be more closely studied in the next sections. In this case again we have $F(t) = 1$ and $\varphi(t) = 1 + \alpha t + \beta t^2$, and therefore we should solve the functional equation $w = t(1 + \alpha w + \beta w^2)$, or $\beta t w^2 - (1-\alpha t)w + t = 0$. This gives:

$$w = w(t) = \frac{1 - \alpha t - \sqrt{(1-\alpha t)^2 - 4\beta t^2}}{2\beta t},$$

where again we have to eliminate the solution with the $+$ sign; by (G6), $C(t) = 1/(1 - t(\alpha + 2\beta w))$.
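The diagonalization result can be checked by expanding $\sqrt{1-4t}$ through the binomial recurrence and reading the Catalan numbers off $(1-\sqrt{1-4t})/(2t)$. A Python sketch (names ours), with exact arithmetic:

```python
from fractions import Fraction
from math import comb

N = 10
# Coefficients of sqrt(1-4t), i.e. C(1/2,k)*(-4)^k, by the binomial recurrence
c = [Fraction(1)]
for k in range(N):
    c.append(c[-1] * (Fraction(1, 2) - k) / (k + 1) * (-4))

# (1 - sqrt(1-4t)) / (2t): drop the constant term, shift by one, negate, halve
catalan = [-c[n + 1] / 2 for n in range(N)]
expected = [Fraction(comb(2 * n, n), n + 1) for n in range(N)]
```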

By performing the necessary computations, we find:

$$C(t) = \frac{1}{1 - t(\alpha + 2\beta w)} = \frac{1}{\sqrt{1 - 2\alpha t + (\alpha^2 - 4\beta)t^2}},$$

and for $\alpha = 2$, $\beta = 1$ we obtain again the generating function for the central binomial coefficients.

The coefficients of $(1+t+t^2)^n$ are called trinomial coefficients, in analogy with the binomial coefficients: $T_{n,k} = [t^k](1+t+t^2)^n$. They constitute an infinite array in which every row has two more elements with respect to the previous row (see Table 4.2):

Table 4.2: Trinomial coefficients

n\k |  0   1   2   3   4   5   6   7   8
 0  |  1
 1  |  1   1   1
 2  |  1   2   3   2   1
 3  |  1   3   6   7   6   3   1
 4  |  1   4  10  16  19  16  10   4   1

By the obvious property $(1+t+t^2)^{n+1} = (1+t+t^2)(1+t+t^2)^n$ we immediately deduce the recurrence relation:

$$T_{n+1,k+1} = T_{n,k-1} + T_{n,k} + T_{n,k+1},$$

from which the array can be built, once we start from the initial conditions $T_{n,0} = 1$ and $T_{n,2n} = 1$, for every $n \in \mathbb{N}$. The elements $T_{n,n}$, marked in the table, are called the central trinomial coefficients; their sequence begins:

 n  | 0  1  2  3   4   5    6    7     8
T_n | 1  1  3  7  19  51  141  393  1107

and by the formula above their generating function is:

$$G(T_{n,n}) = \frac{1}{\sqrt{1 - 2t - 3t^2}} = \frac{1}{\sqrt{(1+t)(1-3t)}}.$$

4.7 Some special generating functions

We wish to determine the generating function of the sequence whose generic element is $\lfloor\sqrt{k}\rfloor$, that is $\{0, 1, 1, 1, 2, 2, \ldots\}$. We can think that it is formed up by summing an infinite number of simpler sequences $\{0, \ldots, 0, 1, 1, 1, \ldots\}$: the first with the first $1$ in position $1$, the next one with the first $1$ in position $4$, then in position $9$, and so on. The generating function of such a sequence is $t^{k^2}/(1-t)$, and therefore:

$$G(\lfloor\sqrt{k}\rfloor) = \frac{1}{1-t}\sum_{k=1}^{\infty} t^{k^2} = \frac{t + t^4 + t^9 + t^{16} + \cdots}{1-t}.$$

In the same way we obtain the analogous generating functions:

$$G(\lfloor\sqrt[r]{k}\rfloor) = \frac{1}{1-t}\sum_{k=1}^{\infty} t^{k^r}, \qquad G(\lfloor\log_r k\rfloor) = \frac{1}{1-t}\sum_{k=1}^{\infty} t^{r^k},$$

where $r$ is any integer number; if we substitute $\lceil\sqrt[r]{k}\rceil$ and $\lceil\log_r k\rceil$, respectively, $r$ can also be any real number different from $0$.

These generating functions can be used to find the values of several sums in closed or semi-closed form. For example, since $\sum_{k=0}^{n}(-1)^{n-k} f_k = [t^n]\,f(t)/(1+t)$, we have:

$$\sum_{k=0}^{n}(-1)^{n-k}\lfloor\sqrt{k}\rfloor = [t^n]\,\frac{1}{(1+t)(1-t)}\sum_{k\ge 1} t^{k^2} = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor}[t^{n-k^2}]\,\frac{1}{1-t^2} = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor}\frac{1 + (-1)^{n-k}}{2},$$

because $[t^m]\,1/(1-t^2)$ is $1$ for $m$ even and $0$ for $m$ odd, and $n - k^2$ has the same parity as $n - k$. We can think of the last sum as a "semi-closed" form, because the number of terms is dramatically reduced from $n$ to $\sqrt{n}$.
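The trinomial recurrence builds the array of Table 4.2 directly, and the central column reproduces the sequence $1, 1, 3, 7, 19, 51, \ldots$. A Python sketch (function name ours):

```python
def trinomial_rows(nmax):
    """Rows of the trinomial triangle T_{n,k} = [t^k](1+t+t^2)^n, built with
    the recurrence T_{n+1,k} = T_{n,k-2} + T_{n,k-1} + T_{n,k}."""
    rows, row = [], [1]
    for _ in range(nmax + 1):
        rows.append(row)
        padded = [0, 0] + row + [0, 0]        # each new row gains two entries
        row = [padded[k] + padded[k + 1] + padded[k + 2]
               for k in range(len(padded) - 2)]
    return rows

rows = trinomial_rows(8)
central = [rows[n][n] for n in range(9)]       # central trinomial coefficients
```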

A truly closed formula is found for the following sum, by using the partial sum theorem:

$$\sum_{k=1}^{n}\lfloor\sqrt{k}\rfloor = [t^n]\,\frac{1}{(1-t)^2}\sum_{k\ge 1}t^{k^2} = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor}[t^{n-k^2}]\,\frac{1}{(1-t)^2} = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor}(n - k^2 + 1) =$$

$$= (n+1)\lfloor\sqrt{n}\rfloor - \sum_{k=1}^{\lfloor\sqrt{n}\rfloor}k^2 = (n+1)\lfloor\sqrt{n}\rfloor - \frac{1}{6}\lfloor\sqrt{n}\rfloor\bigl(\lfloor\sqrt{n}\rfloor + 1\bigr)\bigl(2\lfloor\sqrt{n}\rfloor + 1\bigr).$$

In the same way we find, for the analogous sum with $\lfloor\log_2 k\rfloor$:

$$\sum_{k=1}^{n}\lfloor\log_2 k\rfloor = (n+1)\lfloor\log_2 n\rfloor - 2^{\lfloor\log_2 n\rfloor + 1} + 2.$$

A somewhat more difficult sum is the following one: $\sum_{k=0}^{n}\binom{n}{k}\lfloor\sqrt{k}\rfloor$. By using the Euler transformation we have:

$$\sum_{k=0}^{n}\binom{n}{k}\lfloor\sqrt{k}\rfloor = [t^n]\,\frac{1}{1-t}\,g\!\left(\frac{t}{1-t}\right), \qquad g(y) = \frac{1}{1-y}\sum_{k\ge 1}y^{k^2};$$

setting $y = t/(1-t)$, so that $1 - y = (1-2t)/(1-t)$, we obtain:

$$\sum_{k=0}^{n}\binom{n}{k}\lfloor\sqrt{k}\rfloor = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor}[t^{n-k^2}]\,\frac{1}{(1-t)^{k^2}(1-2t)}.$$

We can now obtain a semi-closed form for this sum by expanding the generating function into partial fractions. We can show that:

$$\frac{1}{(1-t)^{m}(1-2t)} = \frac{2^{m}}{1-2t} - \frac{2^{m-1}}{(1-t)^{m}} - \frac{2^{m-2}}{(1-t)^{m-1}} - \cdots - \frac{1}{1-t};$$

by substituting $m = k^2$ and extracting the coefficient of $t^{n-k^2}$ we get:

$$[t^{n-k^2}]\,\frac{1}{(1-t)^{k^2}(1-2t)} = 2^{n} - \sum_{j=1}^{k^2} 2^{k^2-j}\binom{n-k^2+j-1}{j-1},$$

and therefore:

$$\sum_{k=0}^{n}\binom{n}{k}\lfloor\sqrt{k}\rfloor = \lfloor\sqrt{n}\rfloor\,2^{n} - \sum_{k=1}^{\lfloor\sqrt{n}\rfloor}\sum_{j=1}^{k^2} 2^{k^2-j}\binom{n-k^2+j-1}{j-1}.$$

We observe that for very large values of $n$ the first term dominates all the others, and therefore the asymptotic value of the sum is $\lfloor\sqrt{n}\rfloor\,2^{n}$.

4.8 Linear recurrences with constant coefficients

If $(f_k)_{k\in\mathbb{N}}$ is a sequence, it can be defined by means of a recurrence relation, i.e., a relation relating the generic element $f_n$ to other elements $f_k$ having $k < n$:

$$f_n = F(f_{n-1}, f_{n-2}, \ldots).$$

When $F$ depends on all the values $f_{n-1}, f_{n-2}, \ldots, f_1, f_0$, the relation is called a full history recurrence. If $F$ only depends on a fixed number of elements $f_{n-1}, f_{n-2}, \ldots, f_{n-p}$, then the relation is called a partial history recurrence, and $p$ is the order of the relation. In order to allow the computation of the successive values, the first elements of the sequence must be given explicitly; they constitute the initial conditions, and the sequence is well-defined if and only if every element can be computed by starting with the initial conditions and going on with the other elements by means of the recurrence relation. For example, the constant sequence $(1, 1, 1, \ldots)$ can be defined by the recurrence relation $x_n = x_{n-1}$ and the initial condition $x_0 = 1$; if we consider the same relation $x_n = x_{n-1}$ together with the initial condition $x_0 = 2$, we obtain the constant sequence $\{2, 2, 2, \ldots\}$: by changing the initial conditions, the sequence can radically change.

If $F$ is linear, we have a linear recurrence; in fact, if all the coefficients appearing in $F$ are constant, we have a linear recurrence with constant coefficients, and if the coefficients are polynomials in $n$, we have a linear recurrence with polynomial coefficients. Linear recurrences are surely the most common and important type of recurrence relations. As we are now going to see,
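Both closed forms above can be verified against the direct sums; a Python sketch (function names ours), exhaustively checking the first few hundred values:

```python
from math import isqrt

def sum_floor_sqrt(n):
    """(n+1)*floor(sqrt(n)) - (1/6)*m*(m+1)*(2m+1) with m = floor(sqrt(n))."""
    m = isqrt(n)
    return (n + 1) * m - m * (m + 1) * (2 * m + 1) // 6

def sum_floor_log2(n):
    """(n+1)*floor(log2(n)) - 2^(floor(log2(n))+1) + 2, for n >= 1."""
    m = n.bit_length() - 1       # floor(log2 n)
    return (n + 1) * m - 2 ** (m + 1) + 2

ok_sqrt = all(sum_floor_sqrt(n) == sum(isqrt(k) for k in range(1, n + 1))
              for n in range(1, 300))
ok_log = all(sum_floor_log2(n) == sum(k.bit_length() - 1 for k in range(1, n + 1))
             for n in range(1, 300))
```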

the method of generating functions allows us to find the solution of any linear recurrence with constant coefficients, in the sense that we find a function $f(t)$ such that $[t^n]f(t) = f_n$. The Fibonacci recurrence $F_n = F_{n-1} + F_{n-2}$ is an example of a recurrence relation with constant coefficients. When we have a recurrence of this kind, we begin by expressing it in such a way that the relation is valid for every $n \in \mathbb{N}$; this is not the case for the Fibonacci recurrence as written, because for $n = 0$ we have $F_0 = F_{-1} + F_{-2}$, and we do not know the values for the two elements in the r.h.s., which have no combinatorial meaning. However, if we write the recurrence as $F_{n+2} = F_{n+1} + F_n$, we have fulfilled the requirement. This first step has a great importance, because it allows us to apply the operator $G$ to both members of the relation; this was not possible beforehand, because of the principle of identity for generating functions.

The recurrence being linear with constant coefficients, we can apply the axiom of linearity to the general recurrence:

$$f_{n+p} = \alpha_1 f_{n+p-1} + \alpha_2 f_{n+p-2} + \cdots + \alpha_p f_n$$

and obtain the relation:

$$G(f_{n+p}) = \alpha_1 G(f_{n+p-1}) + \alpha_2 G(f_{n+p-2}) + \cdots + \alpha_p G(f_n).$$

By Theorem 4.2.2 we can now express every $G(f_{n+p-j})$ in terms of $f(t) = G(f_n)$ and obtain a linear relation in $f(t)$, from which an explicit expression for $f(t)$ is immediately obtained; in writing the expressions for $G(f_{n+p-j})$ we make use of the initial conditions for the sequence.

Let us go on with the example of the Fibonacci sequence $(F_k)_{k\in\mathbb{N}}$. Because we know that $F_0 = 0$, $F_1 = 1$, we have $G(F_{n+2}) = G(F_{n+1}) + G(F_n)$ and, by setting $F(t) = G(F_n)$:

$$\frac{F(t) - F_0 - F_1 t}{t^2} = \frac{F(t) - F_0}{t} + F(t), \qquad\text{i.e.}\qquad F(t) - t = tF(t) + t^2F(t),$$

and by solving in $F(t)$ we have the explicit expression:

$$F(t) = \frac{t}{1 - t - t^2}.$$

This is the generating function for the Fibonacci numbers. We can now find an explicit expression for $F_n$ in the following way. The denominator of $F(t)$ can be written $1 - t - t^2 = (1 - \phi t)(1 - \hat\phi t)$, where:

$$\phi = \frac{1 + \sqrt{5}}{2} \approx 1.618033989, \qquad \hat\phi = \frac{1 - \sqrt{5}}{2} \approx -0.618033989.$$

The constant $\phi$ is known as the golden ratio; we observe explicitly that $1/\phi = \phi - 1 \approx 0.618033989$, so that $\hat\phi = -1/\phi$. By applying the method of partial fraction expansion we find:

$$F(t) = \frac{t}{(1-\phi t)(1-\hat\phi t)} = \frac{A}{1-\phi t} + \frac{B}{1-\hat\phi t} = \frac{A - A\hat\phi t + B - B\phi t}{1 - t - t^2}.$$

We determine the two constants $A$ and $B$ by equating the coefficients in the first and last expression for $F(t)$:

$$A + B = 0, \qquad -A\hat\phi - B\phi = 1, \qquad\text{whence}\qquad A = \frac{1}{\phi - \hat\phi} = \frac{1}{\sqrt{5}}, \quad B = -A = -\frac{1}{\sqrt{5}}.$$

The value of $F_n$ is now obtained by extracting the coefficient of $t^n$:

$$F_n = [t^n]F(t) = \frac{1}{\sqrt{5}}\left([t^n]\frac{1}{1-\phi t} - [t^n]\frac{1}{1-\hat\phi t}\right) = \frac{\phi^n - \hat\phi^n}{\sqrt{5}}.$$

This is the solution of the recurrence relation, and shows that $F_n$ grows exponentially: $F_n = O(\phi^n)$, because $\phi^n = \exp(n\ln\phi)$. On the other hand, since $|\hat\phi| < 1$, the quantity $\hat\phi^n$ approaches $0$ very rapidly. In reality, $F_n$ should be an integer, and therefore we can compute it by finding the integer number closest to $\phi^n/\sqrt{5}$:

$$F_n = \operatorname{round}\!\left(\frac{\phi^n}{\sqrt{5}}\right), \qquad \forall n \in \mathbb{N}.$$

This formula allows us to compute $F_n$ in a time independent of $n$. For linear recurrences with polynomial coefficients, no method is known that solves all the recurrences of this kind. However, the same method allows us to find a solution in many occasions, although the success is not assured; and surely generating functions are the method giving the highest number of positive results. We will discuss this case in the next section.
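The rounding formula can be checked directly against the recurrence; a short Python sketch (function name ours), valid as long as floating-point precision suffices:

```python
from math import sqrt

def fib_binet(n):
    """F_n as the integer closest to phi^n / sqrt(5)."""
    phi = (1 + sqrt(5)) / 2
    return round(phi ** n / sqrt(5))

# cross-check against the recurrence F_{n+2} = F_{n+1} + F_n
a, b = 0, 1
ok = True
for n in range(60):
    ok = ok and fib_binet(n) == a
    a, b = b, a + b
```

For very large $n$, double-precision rounding errors eventually exceed $1/2$ and the formula must be evaluated with extended precision.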

4.9 Linear recurrences with polynomial coefficients

When a recurrence relation has polynomial coefficients, the rule of differentiation introduces a derivative in the relation for the generating function, and a differential equation has to be solved. In that case, the method of generating functions does not assure a solution: this is the actual problem of this approach, because the main difficulty just consists in dealing with the differential equation. Fortunately, this is not always the case, but no other method is available to solve those recurrences which cannot be solved by a generating function approach. We have already seen some examples when we studied the method of shifting, and in the next section we will see a very important example taken from the analysis of algorithms; here we wish to present a case arising from an actual combinatorial problem.

When we studied permutations, we introduced the concept of an involution, i.e., a permutation $\pi \in P_n$ such that $\pi^2 = (1)$, and for the number $I_n$ of involutions in $P_n$ we found the recurrence relation:

$$I_n = I_{n-1} + (n-1)I_{n-2},$$

which has polynomial coefficients. The number of involutions grows very fast, and it can be a good idea to consider the quantity $i_n = I_n/n!$. Let us begin by changing the recurrence in such a way that the principle of identity can be applied, i.e., $I_{n+2} = I_{n+1} + (n+1)I_n$, and then divide everything by $(n+2)!$:

$$\frac{I_{n+2}}{(n+2)!} = \frac{1}{n+2}\,\frac{I_{n+1}}{(n+1)!} + \frac{1}{n+2}\,\frac{I_n}{n!}.$$

The recurrence relation for $i_n$ is therefore:

$$(n+2)\,i_{n+2} = i_{n+1} + i_n.$$

If with $i(t)$ we denote the generating function of $(i_k)_{k\in\mathbb{N}} = (I_k/k!)_{k\in\mathbb{N}}$ — which is therefore the exponential generating function of $(I_k)_{k\in\mathbb{N}}$ — then $G((n+2)i_{n+2})$ can be seen as the shifting of $G((n+1)i_{n+1}) = i'(t)$, and, because of the initial conditions $i_0 = i_1 = 1$, we have:

$$\frac{i'(t) - 1}{t} = \frac{i(t) - 1}{t} + i(t), \qquad\text{and so}\qquad i'(t) = (1+t)\,i(t).$$

This is a simple differential equation with separable variables, and by solving it we find:

$$\ln i(t) = t + \frac{t^2}{2} + C, \qquad\text{or}\qquad i(t) = \exp\!\left(t + \frac{t^2}{2} + C\right),$$

where $C$ is an integration constant. Because $i(0) = e^C = 1$, we have $C = 0$ and we conclude with the formula:

$$I_n = n!\,[t^n]\exp\!\left(t + \frac{t^2}{2}\right).$$

4.10 The summing factor method

For linear recurrences of the first order, a method exists which allows us to obtain an explicit expression for the generic element of the defined sequence. Usually this expression is in the form of a sum, and a possible closed form can only be found by manipulating this sum; as we remarked in the Introduction, the method does not guarantee a closed form. Let us suppose we have a recurrence relation:

$$a_{n+1}\,f_{n+1} = b_n\,f_n + c_n,$$

where $a_n$, $b_n$, $c_n$ are any expressions, possibly depending on $n$. If we multiply everything by the so-called summing factor

$$\frac{a_n a_{n-1}\cdots a_0}{b_n b_{n-1}\cdots b_0}$$

(provided none of $a_n, \ldots, a_0, b_n, \ldots, b_0$ is zero), we obtain:

$$\frac{a_{n+1} a_n \cdots a_0}{b_n b_{n-1}\cdots b_0}\,f_{n+1} = \frac{a_n \cdots a_0}{b_{n-1}\cdots b_0}\,f_n + \frac{a_n \cdots a_0}{b_n \cdots b_0}\,c_n.$$

We can now define:

$$g_{n+1} = \frac{a_{n+1} a_n \cdots a_0}{b_n b_{n-1}\cdots b_0}\,f_{n+1}, \qquad g_0 = a_0 f_0,$$

and the relation becomes $g_{n+1} = g_n + \dfrac{a_n\cdots a_0}{b_n\cdots b_0}\,c_n$. Therefore, by unfolding this recurrence down to $g_0$, we find an explicit expression for $f_{n+1}$:

$$f_{n+1} = \frac{b_n b_{n-1}\cdots b_0}{a_{n+1} a_n \cdots a_0}\left(a_0 f_0 + \sum_{k=0}^{n}\frac{a_k a_{k-1}\cdots a_0}{b_k b_{k-1}\cdots b_0}\,c_k\right),$$

where $f_0$ is the initial condition relative to the sequence under consideration. As a technical remark, we observe that sometimes $a_0$ and/or $b_0$ can be $0$; in that case we can always change the original recurrence into a relation of this more simple form, and accordingly change the last index $0$.
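The involution formula can be checked without expanding $\exp(t + t^2/2)$ explicitly: the transformed recurrence for $i_n = I_n/n!$ must reproduce, after multiplication by $n!$, the original recurrence for $I_n$. A Python sketch (names ours):

```python
from fractions import Fraction
from math import factorial

N = 20
# i_n = I_n / n!  from the transformed recurrence (n+2) i_{n+2} = i_{n+1} + i_n
i = [Fraction(1), Fraction(1)]
for n in range(N - 2):
    i.append((i[-1] + i[-2]) / (n + 2))

# I_n from the original recurrence I_n = I_{n-1} + (n-1) I_{n-2}
I = [1, 1]
for n in range(2, N):
    I.append(I[-1] + (n - 1) * I[-2])

ok = all(I[n] == factorial(n) * i[n] for n in range(N))
```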

Let us apply the summing factor method to a non-trivial example: the problem of determining the coefficient of $t^n$ in the f.p.s. corresponding to the function $f(t) = \sqrt{1-t}\,\ln\frac{1}{1-t}$. If we expand the function, we find:

$$\sqrt{1-t}\,\ln\frac{1}{1-t} = t - \frac{1}{24}t^3 - \frac{1}{24}t^4 - \frac{71}{1920}t^5 - \frac{31}{960}t^6 - \cdots$$

and, notwithstanding their appearance, the coefficients $f_n$ admit a rather simple expression. A method for finding a recurrence relation for the coefficients of this f.p.s. is to derive a differential equation for $f(t)$. By differentiating:

$$f'(t) = -\frac{1}{2\sqrt{1-t}}\ln\frac{1}{1-t} + \frac{1}{\sqrt{1-t}}$$

and therefore we have the differential equation:

$$(1-t)f'(t) = -\frac{1}{2}f(t) + \sqrt{1-t}.$$

By extracting the coefficient of $t^n$, we have the relation:

$$(n+1)f_{n+1} - nf_n = -\frac{1}{2}f_n + (-1)^n\binom{1/2}{n}$$

which can be written as:

$$(n+1)f_{n+1} = \frac{2n-1}{2}f_n - \frac{1}{4^n}\binom{2n}{n}\frac{1}{2n-1}.$$

This is a recurrence relation of the first order with the initial condition $f_0 = 0$, for which we have $a_n = n$ and $b_n = (2n-1)/2$. Let us apply the summing factor method; the summing factor is:

$$\frac{a_na_{n-1}\cdots a_1}{b_nb_{n-1}\cdots b_1} = \frac{n(n-1)\cdots 1\cdot 2^n}{(2n-1)(2n-3)\cdots 1} = \frac{n!\,2^n\cdot 2n(2n-2)\cdots 2}{2n(2n-1)(2n-2)\cdots 1} = \frac{4^nn!^2}{(2n)!}.$$

By multiplying the recurrence relation by this summing factor, we find:

$$(n+1)\frac{4^nn!^2}{(2n)!}f_{n+1} = \frac{2n-1}{2}\,\frac{4^nn!^2}{(2n)!}f_n - \frac{n!^2}{(2n)!}\binom{2n}{n}\frac{1}{2n-1}.$$

We are fortunate and $c_n$ simplifies dramatically, because $\frac{n!^2}{(2n)!}\binom{2n}{n} = 1$. Since $a_0 = 0$ (so that $a_0f_0 = 0$), by unfolding the recurrence down to 1 we obtain:

$$f_{n+1} = \frac{1}{(n+1)4^n}\binom{2n}{n}\sum_{k=0}^{n}\frac{-1}{2k-1}.$$

We can somewhat simplify this expression by observing that:

$$\sum_{k=0}^{n}\frac{1}{2k-1} = \frac{1}{2n-1} + \frac{1}{2n-3} + \cdots + 1 - 1 = \frac{1}{2n} + \frac{1}{2n-1} + \cdots + 1 - \frac{1}{2n} - \cdots - \frac{1}{2} - 1 = H_{2n} - \frac{1}{2}H_n - 1.$$

Therefore:

$$f_{n+1} = \frac{1}{(n+1)4^n}\binom{2n}{n}\left(1 + \frac{1}{2}H_n - H_{2n}\right).$$

Since $H_n = H_{n+1} - \frac{1}{n+1}$ and $H_{2n} = H_{2n+2} - \frac{1}{2n+1} - \frac{1}{2n+2}$, we have:

$$1 + \frac{1}{2}H_n - H_{2n} = 1 + \frac{1}{2}H_{n+1} - H_{2n+2} + \frac{1}{2n+1};$$

besides:

$$\frac{1}{(n+1)4^n}\binom{2n}{n} = \frac{1}{(n+1)4^n}\binom{2n+2}{n+1}\frac{(n+1)^2}{(2n+1)(2n+2)} = \frac{2}{(2n+1)4^{n+1}}\binom{2n+2}{n+1}.$$

Therefore:

$$f_{n+1} = \left(H_{n+1} - 2H_{2n+2} + 2 + \frac{2}{2n+1}\right)\frac{1}{(2n+1)4^{n+1}}\binom{2n+2}{n+1}$$

and, by changing $n+1$ into $n$, we conclude with the formula (valid for $n \ge 1$):

$$f_n = \left(H_n - 2H_{2n} + 2 + \frac{2}{2n-1}\right)\frac{1}{(2n-1)4^n}\binom{2n}{n}.$$

By using the asymptotic approximation $H_n \sim \ln n + \gamma$ given in the Introduction, we find:

$$H_n - 2H_{2n} + 2 + \frac{2}{2n-1} \sim \ln n + \gamma - 2(\ln 2 + \ln n + \gamma) + 2 = -\ln n - \gamma - \ln 4 + 2.$$

Besides:

$$\frac{1}{2n-1}\binom{2n}{n}\frac{1}{4^n} \sim \frac{1}{2n}\,\frac{1}{\sqrt{\pi n}}.$$

Therefore we have:

$$f_n \sim -(\ln n + \gamma + \ln 4 - 2)\,\frac{1}{2n\sqrt{\pi n}}$$

which shows that $|f_n|$ behaves as $\ln n/n^{3/2}$. The reader can numerically check this expression against the actual values of $f_n$ given above.
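The closed form just obtained can be checked numerically against the series expansion. The following Python sketch (not from the book; all names are mine) computes the coefficients of $\sqrt{1-t}\,\ln\frac{1}{1-t}$ by convolution with exact rational arithmetic and compares them with the formula for $f_n$.

```python
from fractions import Fraction
from math import comb

N = 10
# coefficients of sqrt(1-t): a_n = a_{n-1} * (2n-3)/(2n), a_0 = 1
a = [Fraction(1)]
for n in range(1, N + 1):
    a.append(a[-1] * Fraction(2 * n - 3, 2 * n))
# coefficients of ln(1/(1-t)): 0, 1, 1/2, 1/3, ...
l = [Fraction(0)] + [Fraction(1, n) for n in range(1, N + 1)]
# Cauchy product gives the coefficients f_n of the function
f = [sum(a[j] * l[n - j] for j in range(n + 1)) for n in range(N + 1)]

def H(n):
    """Harmonic number H_n."""
    return sum(Fraction(1, j) for j in range(1, n + 1))

def closed(n):
    """f_n = (H_n - 2H_{2n} + 2 + 2/(2n-1)) * C(2n,n) / ((2n-1) 4^n), n >= 1."""
    return (H(n) - 2 * H(2 * n) + 2 + Fraction(2, 2 * n - 1)) \
           * comb(2 * n, n) / Fraction((2 * n - 1) * 4 ** n)
```

The first coefficients reproduce the expansion printed above, $t - \frac{1}{24}t^3 - \frac{1}{24}t^4 - \cdots$.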

4.11 The internal path length of binary trees

Binary trees are often used as a data structure to retrieve information. A set $D$ of keys is given, taken from an ordered universe $U$, and as the various elements arrive, they are inserted in a binary (search) tree; therefore $D$ is a permutation of the ordered sequence $d_1 < d_2 < \cdots < d_n$. When we are looking for some key $d$, to find whether $d \in D$ or not, we perform a search in the binary tree, comparing $d$ against the root and other keys in the tree, until we find $d$ or we arrive at some leaf and we cannot go on with our search. In the former case our search is successful, while in the latter case it is unsuccessful. The problem is: how many comparisons should we perform, on the average, to find out that $d$ is present in the tree (successful search)? The answer to this question is very important, because it tells us how good binary trees are for searching information.

The number of nodes along the path starting at the root and arriving at a given node $K$ is called the internal path length (i.p.l.) for $K$; it is just the number of comparisons necessary to find the key in $K$. Therefore, our previous problem can be stated in the following way: what is the average internal path length for binary trees with $n$ nodes? As we know, there are $n!$ possible permutations of the keys in $D$, but there are only $\binom{2n}{n}/(n+1)$ different binary trees with $n$ nodes.

Knuth has found a rather simple way for answering this question; here, however, we wish to show how the method of generating functions can be used to find the average internal path length in a standard way. The reasoning is as follows: we evaluate the total i.p.l. of all the trees generated by the $n!$ possible permutations of our key set $D$, and then divide this number by $n!\,n$, i.e., by the total number of nodes in all the trees. Let $P_n$ be the total internal path length of all the $n!$ possible trees generated by permutations.

A non-empty binary tree can be seen as two subtrees connected to the root (see Section 2.9): if the whole tree contains $n$ nodes, the left subtree contains $k$ nodes ($k = 0, 1, \ldots, n-1$) and the right subtree contains the remaining $n-1-k$ nodes. The total i.p.l. of the $k!$ trees with $k$ nodes, each counted once, is $P_k$; every search in these subtrees, however, has to pass through the root, and this increases the total i.p.l. of each tree by the number of its nodes, so that the contribution actually is $P_k + k!\,k$. Besides, every permutation generating a left subtree is not to be counted only once: the $k$ keys can be arranged in $\binom{n-1}{k}$ possible ways in the overall permutation, retaining their relative ordering, and every left subtree is associated to each of the $(n-1-k)!$ permutations generating the right subtree. Therefore the total contribution to $P_n$ of the left subtrees is:

$$\sum_{k=0}^{n-1}\binom{n-1}{k}(n-1-k)!\,(P_k + k!\,k) = (n-1)!\sum_{k=0}^{n-1}\left(\frac{P_k}{k!} + k\right).$$

In a similar way we find the total contribution of the right subtrees:

$$\sum_{k=0}^{n-1}\binom{n-1}{k}k!\,\bigl(P_{n-1-k} + (n-1-k)!\,(n-1-k)\bigr) = (n-1)!\sum_{k=0}^{n-1}\left(\frac{P_{n-1-k}}{(n-1-k)!} + (n-1-k)\right).$$

It only remains to count the contribution of the roots: a single comparison for every tree, which obviously amounts to $n!$. We therefore have the following recurrence relation, in which the contributions of the left and the right subtrees turn out to be the same:

$$P_n = n! + (n-1)!\sum_{k=0}^{n-1}\left(\frac{P_k}{k!} + k + \frac{P_{n-1-k}}{(n-1-k)!} + (n-1-k)\right) = n! + 2(n-1)!\left(\sum_{k=0}^{n-1}\frac{P_k}{k!} + \frac{n(n-1)}{2}\right)$$

where we used the formula for the sum of the first $n-1$ integers. By dividing by $n!$ we have:

$$\frac{P_n}{n!} = 1 + \frac{2}{n}\sum_{k=0}^{n-1}\frac{P_k}{k!} + n - 1 = n + \frac{2}{n}\sum_{k=0}^{n-1}\frac{P_k}{k!}.$$

Let us now set $Q_n = P_n/n!$, so that $Q_n$ is the average total i.p.l. relative to a single tree; if we succeed in finding $Q_n$, the average i.p.l. we are looking for will simply be $Q_n/n$. The recurrence above becomes $nQ_n = n^2 + 2\sum_{k=0}^{n-1}Q_k$, which we can reformulate for $n+1$, in order to be able to apply the generating function operator:

$$(n+1)Q_{n+1} = (n+1)^2 + 2\sum_{k=0}^{n}Q_k \qquad\Longrightarrow\qquad Q'(t) = \frac{1+t}{(1-t)^3} + \frac{2\,Q(t)}{1-t}.$$

This differential equation can be easily solved:

$$Q(t) = \frac{1}{(1-t)^2}\left(\int\frac{(1-t)^2(1+t)}{(1-t)^3}\,dt + C\right) = \frac{1}{(1-t)^2}\left(2\ln\frac{1}{1-t} - t + C\right).$$
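The recurrence $nQ_n = n^2 + 2\sum_{k<n}Q_k$ can be checked by brute force for small $n$. The Python sketch below (not from the book; the helper names are mine) builds the binary search tree of every permutation, sums the internal path lengths, and verifies the recurrence with exact rationals.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def ipl(keys):
    """Total internal path length of the BST built by inserting the keys in
    order (each node contributes its depth, the root counting 1)."""
    if not keys:
        return 0
    root = keys[0]
    left = [x for x in keys[1:] if x < root]
    right = [x for x in keys[1:] if x > root]
    # every node lies one level deeper than in its own subtree, whence len(keys)
    return len(keys) + ipl(left) + ipl(right)

def P(n):
    """Total i.p.l. over the trees generated by all n! permutations."""
    return sum(ipl(list(p)) for p in permutations(range(n)))
```

For instance, the six permutations of three keys give the total $P_3 = 34$, so $Q_3 = 34/6 = 17/3$.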

Now $Q_0 = Q(0) = 0$ and therefore, by setting $t = 0$, we find $C = 0$. We can now use the formula for $G(nH_n)$ (see Section 4.4 on Common Generating Functions) to extract the coefficient of $t^n$:

$$Q_n = [t^n]\frac{1}{(1-t)^2}\left(2\ln\frac{1}{1-t} - t\right) = 2[t^{n+1}]\frac{t}{(1-t)^2}\ln\frac{1}{1-t} - [t^n]\frac{t}{(1-t)^2} = 2(n+1)H_{n+1} - 2(n+1) - n = 2(n+1)H_n + 2 - 2(n+1) - n = 2(n+1)H_n - 3n.$$

Since the i.p.l. of the empty tree is 0, this formula also holds for $n = 0$. Thus we conclude with the average i.p.l.:

$$\frac{Q_n}{n} = \frac{2(n+1)H_n - 3n}{n} = 2\left(1 + \frac{1}{n}\right)H_n - 3.$$

This formula is asymptotic to $2\ln n + 2\gamma - 3$. We have therefore been able to show that binary trees are a "good" retrieving structure, in the sense that if the keys are stored in random order in a binary (search) tree, the average number of comparisons necessary to retrieve any key in the tree is in the order of $O(\ln n)$.

4.12 Height balanced binary trees

However, this behavior of binary trees is not always assured. If, for example, the keys of a set $\{a_1, a_2, \ldots, a_n\}$ are stored in the tree in their proper order, the resulting structure degenerates into a linear list, the retrieving time for such a tree can be as large as $n$, and the average retrieving time becomes $O(n)$. To avoid this drawback, at the beginning of the 1960's, Adelson-Velskii and Landis, two Russian researchers, found an algorithm to store keys in a "height balanced" binary tree; because of that, height balanced binary trees are also known as AVL trees.

To understand this concept, let us define the height of a tree as the highest level at which a node in the tree is placed. Formally, a height balanced binary tree is a tree such that for every node $K$ in it, if $h'$ and $h''$ are the heights of the two subtrees originating from $K$, then $|h' - h''| \le 1$. The height is also the maximal number of comparisons necessary to find any key in the tree; therefore, if we find a limitation for the height of a class of trees, this is also a limitation for the number of comparisons necessary to retrieve any key in the trees of the same class. The algorithm of Adelson-Velskii and Landis is very important because, as we are now going to show, height balanced binary trees assure that the retrieving time for every key in the tree is $O(\ln n)$; the algorithm for building AVL trees from a set of $n$ keys can be found in any book on algorithms and data structures. Here we only wish to perform a worst case analysis, to prove that the retrieval time in any AVL tree cannot be larger than $O(\ln n)$.

In order to perform our analysis, let us consider the worst possible AVL trees, i.e., trees in which the height of the left subtree of every node exceeds exactly by 1 the height of the right subtree of the same node. Let $T_n$ denote such a tree of height $n$; $T_n$ can be considered as the "worst" tree of height $n$, in the sense that every other AVL tree of height $n$ will have at least as many nodes as $T_n$. These trees are built in a very simple way: every tree $T_n$, of height $n$, is built by using the preceding tree $T_{n-1}$ as the left subtree and the tree $T_{n-2}$ as the right subtree of the root; obviously, the condition on the heights of the subtrees of every node is satisfied. In Figure 4.1 we have drawn the first cases.

Because of this construction, the number of nodes in $T_n$ is just the sum of the nodes in $T_{n-1}$ and in $T_{n-2}$, plus 1 (the root). If we denote by $|T_n|$ the number of nodes in the tree $T_n$, we have the simple recurrence relation:

$$|T_n| = |T_{n-1}| + |T_{n-2}| + 1$$

which produces the sequence $\{0, 1, 2, 4, 7, 12, \ldots\}$. This resembles the Fibonacci recurrence relation and, in fact, we can easily show that $|T_n| = F_{n+1} - 1$. The proof is done by mathematical induction. For $n = 0$ we have $|T_0| = F_1 - 1 = 1 - 1 = 0$, and this is true; similarly we proceed for $n = 1$. Let us now suppose that for every $k < n$ we have $|T_k| = F_{k+1} - 1$; then, because of the recurrence relation for $|T_n|$, we have:

$$|T_n| = |T_{n-1}| + |T_{n-2}| + 1 = F_n - 1 + F_{n-1} - 1 + 1 = F_{n+1} - 1$$

by the Fibonacci recurrence.
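The sequence $|T_n|$ and its Fibonacci connection are easy to verify numerically. In the Python sketch below (mine, not from the book) I use the common convention $F_1 = F_2 = 1$, under which the relation reads $|T_n| = F_{n+2} - 1$; this is the same statement as in the text, which starts its Fibonacci sequence one index later. The last assertion checks the logarithmic height bound $n \approx \log_\phi(\sqrt5(|T_n|+1)) - 1$ in this indexing.

```python
import math

def worst_avl_sizes(N):
    """|T_n| = |T_{n-1}| + |T_{n-2}| + 1, with |T_0| = 0 and |T_1| = 1."""
    sizes = [0, 1]
    for _ in range(2, N + 1):
        sizes.append(sizes[-1] + sizes[-2] + 1)
    return sizes

def fib(N):
    """Standard Fibonacci numbers, F_1 = F_2 = 1."""
    F = [0, 1]
    for _ in range(2, N + 1):
        F.append(F[-1] + F[-2])
    return F
```

So the worst AVL tree of height $n$ already contains exponentially many nodes, which is exactly why the height of any AVL tree is logarithmic in its size.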

[Figure 4.1: Worst AVL-trees (the trees $T_0, T_1, \ldots, T_5$)]

Since, by definition, every AVL tree of height $n$ has a number of nodes not less than $|T_n|$, the height $n$ is an upper bound for the number of comparisons necessary to retrieve any key in such a tree. We have shown that for large values of $n$ we have $F_n \approx \phi^n/\sqrt5$; therefore $|T_n| \approx \phi^{n+1}/\sqrt5 - 1$, or $\phi^{n+1} \approx \sqrt5(|T_n|+1)$. By passing to the logarithms, we have:

$$n \approx \log_\phi\bigl(\sqrt5(|T_n|+1)\bigr) - 1$$

i.e., $n = O(\ln|T_n|)$. Since all logarithms are proportional, this assures that the retrieving time for every AVL tree with at most $|T_n|$ nodes is bounded from above by $\log_\phi(\sqrt5(|T_n|+1)) - 1$, as is intuitively apparent from the beginning of the sequence.

4.13 Some special recurrences

Not all recurrence relations are linear, and we had occasion to deal with a different sort of relation when we studied the Catalan numbers. They satisfy the recurrence:

$$C_n = \sum_{k=0}^{n-1}C_kC_{n-k-1}$$

which, however, is only valid for $n > 0$. If we write it for $n+1$:

$$C_{n+1} = \sum_{k=0}^{n}C_kC_{n-k}$$

the relation holds for every $n \in \mathbb{N}$ and, since the right hand member is a convolution, we can pass to the generating functions. We obtain:

$$\frac{C(t)-1}{t} = C(t)^2 \qquad\text{or}\qquad tC(t)^2 - C(t) + 1 = 0.$$

This is a second degree equation, which can be directly solved:

$$C(t) = \frac{1 \pm \sqrt{1-4t}}{2t}.$$

By the initial condition $C_0 = 1$, for $t = 0$ we should have $C(0) = C_0 = 1$, and therefore the solution with the $+$ sign before the square root is to be ignored:

$$C(t) = \frac{1-\sqrt{1-4t}}{2t}.$$

The Bernoulli numbers were introduced by means of the implicit relation:

$$\sum_{k=0}^{n}\binom{n+1}{k}B_k = \delta_{n,0}, \qquad \forall n \in \mathbb{N}.$$

We are now in a position to find their exponential generating function $B(t) = G(B_n/n!)$ and to prove some of their properties. The defining relation can be written as:

$$\sum_{k=0}^{n}\binom{n+1}{n-k}B_{n-k} = \sum_{k=0}^{n}\frac{(n+1)n\cdots(k+2)}{(n-k)!}B_{n-k} = \delta_{n,0}$$

and, by dividing everything by $(n+1)!$, we obtain:

$$\sum_{k=0}^{n}\frac{1}{(k+1)!}\,\frac{B_{n-k}}{(n-k)!} = \frac{\delta_{n,0}}{(n+1)!} = \delta_{n,0}.$$

In this particular form the principle of identity can be applied: the left hand member is a convolution, whose first factor is the shift of the exponential function, and therefore we obtain:

$$\frac{e^t-1}{t}B(t) = 1 \qquad\text{or}\qquad B(t) = \frac{t}{e^t-1}.$$

The classical way to see that $B_{2n+1} = 0$, $\forall n > 0$, is to show that the function obtained from $B(t)$ by deleting the term of first degree is an even function, and therefore has all its coefficients of odd order equal to zero. In fact we have:

$$B(t) + \frac{t}{2} = \frac{t}{e^t-1} + \frac{t}{2} = \frac{t}{2}\,\frac{e^t+1}{e^t-1}.$$

In order to see that this function is even, we substitute $t \to -t$ and show that the function remains the same:

$$-\frac{t}{2}\,\frac{e^{-t}+1}{e^{-t}-1} = -\frac{t}{2}\,\frac{1+e^t}{1-e^t} = \frac{t}{2}\,\frac{e^t+1}{e^t-1}.$$
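Both special recurrences of this section lend themselves to direct computation. The Python sketch below (an illustration of mine, not part of the book) generates the Bernoulli numbers from the defining relation and the Catalan numbers from their quadratic recurrence, and checks the properties just proved: the odd Bernoulli numbers beyond $B_1$ vanish, and $C_n = \binom{2n}{n}/(n+1)$.

```python
from fractions import Fraction
from math import comb

def bernoulli(N):
    """B_0..B_N from sum_{k=0}^{n} C(n+1,k) B_k = [n == 0]."""
    B = []
    for n in range(N + 1):
        s = sum(comb(n + 1, k) * B[k] for k in range(n))
        B.append(Fraction((1 if n == 0 else 0) - s, n + 1))
    return B

def catalan(N):
    """C_0..C_N from C_{n+1} = sum_{k=0}^{n} C_k C_{n-k}."""
    C = [1]
    for n in range(N):
        C.append(sum(C[k] * C[n - k] for k in range(n + 1)))
    return C
```

Running it reproduces $B_0 = 1$, $B_1 = -1/2$, $B_2 = 1/6$ and the Catalan sequence $1, 1, 2, 5, 14, \ldots$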

GENERATING FUNCTIONS .54 CHAPTER 4.

Chapter 5

Riordan Arrays

5.1 Definitions and basic concepts

A Riordan array is a couple of formal power series $D = (d(t), h(t))$; if both $d(t), h(t) \in \mathcal{F}^0$, then the Riordan array is called proper. The Riordan array can be identified with the infinite, lower triangular array (or triangle) $(d_{n,k})_{n,k\in\mathbb{N}}$ defined by:

$$d_{n,k} = [t^n]\,d(t)(th(t))^k \tag{5.1.1}$$

From our point of view, we are mainly interested in the sequence of functions iteratively defined by:

$$d_0(t) = d(t), \qquad d_k(t) = d_{k-1}(t)\,th(t) = d(t)(th(t))^k.$$

These functions are the column generating functions of the triangle.

A common example of a Riordan array is the Pascal triangle, for which we have $d(t) = h(t) = 1/(1-t)$. In fact we have:

$$d_{n,k} = [t^n]\frac{1}{1-t}\left(\frac{t}{1-t}\right)^k = [t^{n-k}]\frac{1}{(1-t)^{k+1}} = \binom{-k-1}{n-k}(-1)^{n-k} = \binom{n}{n-k} = \binom{n}{k}$$

and this shows that the generic element is actually a binomial coefficient.

Another way of characterizing a Riordan array $D = (d(t), h(t))$ is to consider the bivariate generating function:

$$d(t,z) = \sum_{k=0}^{\infty}d(t)(th(t))^kz^k = \frac{d(t)}{1-tzh(t)} \tag{5.1.2}$$

In the case of the Pascal triangle we obtain the well-known bivariate generating function $d(t,z) = (1-t-tz)^{-1}$.

From our point of view, one of the most important properties of Riordan arrays is the fact that the sums involving the rows of a Riordan array can be performed by operating a suitable transformation on a generating function and then by extracting a coefficient from the resulting function. More precisely, we can prove the following theorem:

Theorem 5.1.1 Let $D = (d(t), h(t))$ be a Riordan array and let $f(t)$ be the generating function of the sequence $(f_k)_{k\in\mathbb{N}}$; then we have:

$$\sum_{k=0}^{n}d_{n,k}f_k = [t^n]\,d(t)f(th(t)) \tag{5.1.3}$$

Proof: The proof consists in a straightforward computation:

$$\sum_{k=0}^{\infty}d_{n,k}f_k = \sum_{k=0}^{\infty}[t^n]d(t)(th(t))^kf_k = [t^n]d(t)\sum_{k=0}^{\infty}f_k(th(t))^k = [t^n]d(t)f(th(t)).$$

By considering the generating functions $G(1) = 1/(1-t)$, $G((-1)^k) = 1/(1+t)$ and $G(k) = t/(1-t)^2$, we find the row sums (r.s.), the alternating r.s. and the weighted r.s. of a Riordan array:

$$\sum_{k}d_{n,k} = [t^n]\frac{d(t)}{1-th(t)}, \qquad \sum_{k}(-1)^kd_{n,k} = [t^n]\frac{d(t)}{1+th(t)}, \qquad \sum_{k}k\,d_{n,k} = [t^n]\frac{td(t)h(t)}{(1-th(t))^2}.$$

In the case of the Pascal triangle we obtain the Euler transformation:

$$\sum_{k=0}^{n}\binom{n}{k}f_k = [t^n]\frac{1}{1-t}\,f\!\left(\frac{t}{1-t}\right)$$

which we proved by simple considerations on binomial coefficients and generating functions.
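Definition (5.1.1) translates directly into a computation with truncated power series. The Python sketch below (my illustration, not from the book) builds the columns $d(t)(th(t))^k$ by repeated Cauchy products and checks that the choice $d(t) = h(t) = 1/(1-t)$ reproduces the Pascal triangle.

```python
from fractions import Fraction
from math import comb

N = 8  # truncation order

def mul(p, q):
    """Cauchy product of two coefficient lists, truncated at degree N."""
    return [sum(p[j] * q[n - j] for j in range(n + 1)) for n in range(N + 1)]

def riordan(d, h):
    """Triangle d_{n,k} = [t^n] d(t) (t h(t))^k for 0 <= n,k <= N."""
    th = [Fraction(0)] + h[:N]          # coefficients of t*h(t)
    cols, col = [], d[:N + 1]
    for _ in range(N + 1):
        cols.append(col)                # column k is d(t)*(t h(t))^k
        col = mul(col, th)
    return [[cols[k][n] for k in range(N + 1)] for n in range(N + 1)]

one_over_1mt = [Fraction(1)] * (N + 1)  # coefficients of 1/(1-t)
pascal = riordan(one_over_1mt, one_over_1mt)
```

The same function accepts any pair of coefficient lists, so other Riordan arrays of this chapter can be tabulated in the same way.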

In a sense, the theorem on the sums involving the rows of a Riordan array is a characterization for them; in fact, we can prove a sort of inverse property:

Theorem 5.1.2 Let $(d_{n,k})_{n,k\in\mathbb{N}}$ be an infinite triangle such that for every sequence $(f_k)_{k\in\mathbb{N}}$ we have $\sum_kd_{n,k}f_k = [t^n]d(t)f(th(t))$, where $f(t)$ is the generating function of the sequence and $d(t), h(t)$ are two f.p.s. not depending on $f(t)$; then the triangle defined by the Riordan array $(d(t), h(t))$ coincides with $(d_{n,k})_{n,k\in\mathbb{N}}$.

Proof: For every $k \in \mathbb{N}$ take the sequence which is 0 everywhere except in the $k$th element, $f_k = 1$; the corresponding generating function is $f(t) = t^k$. By hypothesis, $\sum_jd_{n,j}f_j = d_{n,k} = [t^n]d(t)(th(t))^k$, and this corresponds to the initial definition of the column generating functions of a Riordan array.

Let us now suppose that $(d(t), h(t))$ is a Riordan array which is not proper; since $h(0) = 0$, an $s > 0$ exists such that $h(t) = h_st^s + h_{s+1}t^{s+1} + \cdots$ with $h_s \ne 0$. In this case the rows of $D$ can be seen as the $s$-diagonals $(d_{n-sk,k})_{k\in\mathbb{N}}$ of a proper Riordan array, and formula (5.1.3) gives the generating function of any sum $\sum_kd_{n-sk,k}f_k$, for every $s \ge 1$. For example, the diagonal sums of the Pascal triangle give:

$$\sum_{k}\binom{n-k}{k} = [t^n]\frac{1}{1-t}\cdot\frac{1}{1-t^2(1-t)^{-1}} = [t^n]\frac{1}{1-t-t^2} = F_{n+1}$$

connecting binomial coefficients and Fibonacci numbers.

Another general result can be obtained by means of two sequences $(f_k)_{k\in\mathbb{N}}$ and $(g_k)_{k\in\mathbb{N}}$ and their generating functions $f(t), g(t)$. If $d_{n,k} = [t^n]f(t)(t^p)^k = [t^{n-pk}]f(t) = f_{n-pk}$, then $d_{n,k}$ is the generic element of the Riordan array $(f(t), t^{p-1})$, and we have:

$$\sum_{k=0}^{\lfloor n/p\rfloor}f_{n-pk}\,g_k = [t^n]f(t)\bigl[g(y)\,\big|\,y = t^p\bigr] = [t^n]f(t)g(t^p).$$

This can be called the rule of generalized convolution, since it reduces to the usual convolution rule for $p = 1$. Suppose, for example, that we wish to sum one out of every three powers of 2, starting with $2^n$ and going down to the lowest integer exponent $\ge 0$. We have:

$$S_n = \sum_{k=0}^{\lfloor n/3\rfloor}2^{n-3k} = [t^n]\frac{1}{1-2t}\cdot\frac{1}{1-t^3}.$$

As we will learn studying asymptotics, an approximate value for this sum can be obtained by extracting the coefficient of the first factor and then multiplying it by the second factor, in which $t$ is substituted by $1/2$. This gives $S_n \approx 2^n/(1-1/8) = 2^{n+3}/7$, and in fact we have the exact value $S_n = \lfloor 2^{n+3}/7\rfloor$.

5.2 The algebraic structure of Riordan arrays

The most important algebraic property of Riordan arrays is the fact that the usual row-by-column product of two Riordan arrays is a Riordan array. This is proved by considering two Riordan arrays $(d(t), h(t))$ and $(a(t), b(t))$ and performing their product, whose generic element is $\sum_jd_{n,j}f_{j,k}$, where $d_{n,j}$ is the generic element in $(d(t), h(t))$ and $f_{j,k}$ is the generic element in $(a(t), b(t))$. In fact we have:

$$\sum_{j=0}^{\infty}d_{n,j}f_{j,k} = \sum_{j=0}^{\infty}[t^n]d(t)(th(t))^j\,[y^j]a(y)(yb(y))^k = [t^n]d(t)\sum_{j=0}^{\infty}(th(t))^j[y^j]a(y)(yb(y))^k = [t^n]d(t)a(th(t))\bigl(th(t)b(th(t))\bigr)^k.$$

The last expression denotes the generic element of the Riordan array $(f(t), g(t))$ where $f(t) = d(t)a(th(t))$ and $g(t) = h(t)b(th(t))$; therefore:

$$(d(t), h(t))\cdot(a(t), b(t)) = \bigl(d(t)a(th(t)),\ h(t)b(th(t))\bigr) \tag{5.2.1}$$

This expression is particularly important and is the basis for many developments of the Riordan array theory. The product is obviously associative, and we observe that the Riordan array $(1, 1)$ acts as the neutral element or identity; in fact, the array $(1, 1)$ is everywhere 0 except for the elements on the main diagonal, which are 1. By formula (5.2.1) we immediately see that the product of two proper Riordan arrays is proper.

Let us now suppose that $(d(t), h(t))$ is a proper Riordan array, and look for a proper Riordan array $(a(t), b(t))$ such that $(d(t), h(t))\cdot(a(t), b(t)) = (1, 1)$. If this is the case, we should have:

$$d(t)a(th(t)) = 1 \qquad\text{and}\qquad h(t)b(th(t)) = 1.$$
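The two worked examples above, the Fibonacci diagonal sums and the powers-of-two sum, are easy to confirm numerically. The following Python check is mine, not the book's:

```python
from math import comb

def S(n):
    """Sum of one out of every three powers of 2, starting with 2^n."""
    return sum(2 ** (n - 3 * k) for k in range(n // 3 + 1))

def diag(n):
    """Diagonal sums of the Pascal triangle."""
    return sum(comb(n - k, k) for k in range(n // 2 + 1))

fib = [1, 1]  # F_1, F_2, ... so that fib[n] = F_{n+1}
for _ in range(20):
    fib.append(fib[-1] + fib[-2])
```

Both the exact value $S_n = \lfloor 2^{n+3}/7\rfloor$ and the identity $\sum_k\binom{n-k}{k} = F_{n+1}$ hold for every $n$ tested.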

By setting $y = th(t)$, which is the same as $t = yh(t)^{-1}$, the two f.p.s. $a(y)$ and $b(y)$ are uniquely defined by:

$$a(y) = \Bigl[d(t)^{-1}\,\Big|\,t = yh(t)^{-1}\Bigr], \qquad b(y) = \Bigl[h(t)^{-1}\,\Big|\,t = yh(t)^{-1}\Bigr].$$

In fact, since $h(t) \in \mathcal{F}^0$, there is a unique function $t = t(y)$ such that $t(0) = 0$ and $t = yh(t)^{-1}$. If $h(t)$ is any fixed invertible f.p.s., let us therefore define:

$$d^h(th(t)) = d(t)^{-1}, \qquad\text{i.e.}\qquad d^h(t) = \Bigl[d(y)^{-1}\,\Big|\,y = th(y)^{-1}\Bigr]$$

and analogously $h^h(th(t)) = h(t)^{-1}$, so that we can write:

$$(d(t), h(t))^{-1} = (d^h(t), h^h(t)).$$

Observe that obviously $(f^h)^h(t) = f(t)$. Consequently, the set $\mathcal{A}$ of proper Riordan arrays is a group with the operation of row-by-column product defined functionally by relation (5.2.1). It is a simple matter to show that some important classes of Riordan arrays are subgroups of $\mathcal{A}$:

• the set of the Riordan arrays $(f(t), 1)$, $\forall f(t) \in \mathcal{F}^0$, is an invariant subgroup of $\mathcal{A}$; it is called the Appell subgroup;

• the set of the Riordan arrays $(1, f(t))$, $\forall f(t) \in \mathcal{F}^0$, is a subgroup of $\mathcal{A}$ and is called the subgroup of associated operators or the Lagrange subgroup;

• the set of the Riordan arrays $(f(t), f(t))$ is a subgroup of $\mathcal{A}$ and is called the Bell subgroup; its elements are also known as renewal arrays.

The first two subgroups have already been considered in the Chapter on "Formal Power Series" and show the connection between f.p.s. and Riordan arrays; the notations used in that Chapter are thus explained as particular cases of the most general case of (proper) Riordan arrays.

We wish now to find an explicit expression for the generic element $\bar d_{n,k}$ of the inverse Riordan array $(d(t), h(t))^{-1}$ in terms of $d(t)$ and $h(t)$. This will be done by using the LIF. By the formulas above, the bivariate generating function of $(d^h(t), h^h(t))$ is $d^h(t)/(1-tzh^h(t))$ and therefore we have:

$$\bar d_{n,k} = [t^nz^k]\frac{d^h(t)}{1-tzh^h(t)} = [z^k][t^n]\left[\frac{d(y)^{-1}}{1-zy}\,\middle|\,y = th(y)^{-1}\right].$$

Here we are in the hypotheses of the Lagrange Inversion Formula, and we find:

$$\bar d_{n,k} = [z^k]\frac{1}{n}[y^{n-1}]\left(\frac{z}{(1-zy)^2}\,\frac{1}{d(y)} - \frac{1}{1-zy}\,\frac{d'(y)}{d(y)^2}\right)\frac{1}{h(y)^n} = \frac{1}{n}[y^{n-1}]\left(ky^{k-1} - y^k\frac{d'(y)}{d(y)}\right)\frac{1}{d(y)h(y)^n}$$

since $[z^k]\,z(1-zy)^{-2} = ky^{k-1}$ and $[z^k]\,(1-zy)^{-1} = y^k$; therefore:

$$\bar d_{n,k} = \frac{1}{n}[y^{n-k}]\left(k - \frac{yd'(y)}{d(y)}\right)\frac{1}{d(y)h(y)^n}.$$

This is the formula we were looking for.

5.3 The A-sequence for proper Riordan arrays

Proper Riordan arrays play a very important role in our approach. Rogers has found an important characterization: every element $d_{n+1,k+1}$, $n, k \in \mathbb{N}$, of a proper Riordan array can be expressed as a linear combination of the elements in the preceding row:

$$d_{n+1,k+1} = a_0d_{n,k} + a_1d_{n,k+1} + a_2d_{n,k+2} + \cdots = \sum_{j=0}^{\infty}a_jd_{n,k+j} \tag{5.3.1}$$

The sum is actually finite and the sequence $A = (a_k)_{k\in\mathbb{N}}$ is fixed, i.e., it does not depend on $n$ or $k$.
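A concrete instance of the group structure: the inverse of the Pascal triangle is its signed version, $\bar d_{n,k} = (-1)^{n-k}\binom{n}{k}$ (the Riordan array $(1/(1+t), 1/(1+t))$). The Python check below is mine; it verifies the inverse by an explicit matrix product on a finite section.

```python
from math import comb

N = 7
pascal = [[comb(n, k) for k in range(N + 1)] for n in range(N + 1)]
signed = [[(-1) ** (n - k) * comb(n, k) for k in range(N + 1)] for n in range(N + 1)]

def matmul(A, B):
    """Row-by-column product of two (N+1)x(N+1) matrices."""
    return [[sum(A[i][j] * B[j][k] for j in range(N + 1)) for k in range(N + 1)]
            for i in range(N + 1)]

identity = [[1 if i == k else 0 for k in range(N + 1)] for i in range(N + 1)]
```

Since lower triangular matrices are closed under product and inversion, checking a finite section suffices for the entries it contains.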

More precisely, we can prove the following theorem:

Theorem 5.3.1 An infinite lower triangular array $D = (d_{n,k})_{n,k\in\mathbb{N}}$ is a proper Riordan array if and only if a sequence $A = \{a_0 \ne 0, a_1, a_2, \ldots\}$ exists such that for every $n, k \in \mathbb{N}$ relation (5.3.1) holds.

Proof: Let us suppose that $D$ is the Riordan array $(d(t), h(t))$ and let us consider the Riordan array $(d(t)h(t), h(t))$; we define the pair $(A(t), B(t))$ by the relation:

$$(d(t), h(t))\cdot(A(t), B(t)) = (d(t)h(t), h(t)), \qquad\text{i.e.}\qquad (A(t), B(t)) = (d(t), h(t))^{-1}\cdot(d(t)h(t), h(t)).$$

By performing the product we find $d(t)A(th(t)) = d(t)h(t)$ and $h(t)B(th(t)) = h(t)$. The latter identity gives $B(th(t)) = 1$, and this implies $B(t) = 1$; the former gives:

$$h(t) = A(th(t)) \tag{5.3.2}$$

and this uniquely determines $A$ when $h(t)$ is given and, vice versa, $h(t)$ is uniquely determined when $A$ is given. Now, the element of indices $n+1, k+1$ is, in the array $(d(t)h(t), h(t))$:

$$[t^n]d(t)h(t)(th(t))^k = [t^{n+1}]d(t)(th(t))^{k+1} = d_{n+1,k+1}$$

while, computed through the product $(d(t), h(t))\cdot(A(t), 1)$, it is $\sum_jd_{n,j}a_{j-k} = \sum_{j=0}^{\infty}a_jd_{n,k+j}$, if as usual we interpret $a_{j-k}$ as 0 when $j < k$. By equating these two quantities, we obtain the identity (5.3.1).

For the converse, let us observe that (5.3.1) uniquely defines the array $D$ when the elements $\{d_{0,0}, d_{1,0}, d_{2,0}, \ldots\}$ of column 0 are given. Let $d(t)$ be the generating function of this column, $A(t)$ the generating function of the sequence $A$, and define $h(t)$ as the solution of the functional equation $h(t) = A(th(t))$, which is uniquely determined because of our hypothesis $a_0 \ne 0$. We can therefore consider the proper Riordan array $\hat D = (d(t), h(t))$: by the first part of the theorem, $\hat D$ satisfies relation (5.3.1), and therefore, by our previous observation, it must coincide with $D$. This completes the proof.

The sequence $A = (a_k)_{k\in\mathbb{N}}$ is called the A-sequence of the Riordan array $D = (d(t), h(t))$, and it only depends on $h(t)$.

The A-sequence for the Pascal triangle is the solution $A(y)$ of the functional equation $1/(1-t) = A(t/(1-t))$. The simple substitution $y = t/(1-t)$, for which $1/(1-t) = 1+y$, gives $A(y) = 1+y$, corresponding to the well-known basic recurrence of the Pascal triangle:

$$\binom{n+1}{k+1} = \binom{n}{k} + \binom{n}{k+1}.$$

By proceeding in the opposite direction, we realize that we could have started with this recurrence relation and directly found $A(y) = 1+y$; $h(t)$ is then defined by (5.3.2) as the solution of $h(t) = 1 + th(t)$, and this immediately gives $h(t) = 1/(1-t)$. Since column 0 is $\{1, 1, 1, \ldots\}$, we have proved that the Pascal triangle corresponds to the Riordan array $(1/(1-t), 1/(1-t))$, as initially stated.

By proceeding in the same way, we can prove the following theorem:

Theorem 5.3.2 Let $(d_{n,k})_{n,k\in\mathbb{N}}$ be any infinite, lower triangular array with $d_{n,n} \ne 0$, $\forall n \in \mathbb{N}$ (in particular, let it be a proper Riordan array); then a unique sequence $Z = (z_k)_{k\in\mathbb{N}}$ exists such that every element in column 0 can be expressed as a linear combination of all the elements in the preceding row, i.e.:

$$d_{n+1,0} = z_0d_{n,0} + z_1d_{n,1} + z_2d_{n,2} + \cdots = \sum_{j=0}^{\infty}z_jd_{n,j} \tag{5.3.3}$$

Proof: Let $z_0 = d_{1,0}/d_{0,0}$, which is uniquely determined. Now we can uniquely determine the value of $z_1$ by expressing $d_{2,0}$ in terms of the elements in row 1: $d_{2,0} = z_0d_{1,0} + z_1d_{1,1}$, and by substituting the value just obtained for $z_0$:

$$z_1 = \frac{d_{0,0}d_{2,0} - d_{1,0}^2}{d_{0,0}d_{1,1}}.$$

By proceeding in the same way, we determine $z_2$ by expressing $d_{3,0}$ in terms of the elements in row 2, and so on; since (5.3.3) is valid for every $n \in \mathbb{N}$, we determine the sequence $Z$ in a unique way.

The sequence $Z$ is called the Z-sequence for the (Riordan) array; it characterizes column 0, except for the element $d_{0,0}$. Therefore, we can say that the triple $(d_{0,0}, A(t), Z(t))$ completely characterizes a proper Riordan array.

Another type of characterization is obtained through the following observation:

Theorem 5.3.3 Let $(d(t), h(t))$ be a proper Riordan array and let $Z(t)$ be the generating function of the array's Z-sequence; then we have:

$$d(t) = \frac{d_{0,0}}{1-tZ(th(t))} \tag{5.3.4}$$

Proof: By the definition (5.3.3) of the Z-sequence, and since the generating function of column 0 deprived of its first element is $(d(t)-d_{0,0})/t$, we have:

$$\frac{d(t)-d_{0,0}}{t} = z_0d(t) + z_1d(t)th(t) + z_2d(t)(th(t))^2 + \cdots = d(t)Z(th(t)).$$

By solving this equation in $d(t)$, we immediately find the relation desired.

The relation can be inverted, and this gives us the formula for the Z-sequence:

$$Z(y) = \left[\frac{d(t)-d_{0,0}}{td(t)}\,\middle|\,t = yh(t)^{-1}\right].$$
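The constructive proofs of Theorems 5.3.1 and 5.3.2 suggest a direct algorithm: the A- and Z-sequences can be extracted from any finite section of the triangle by solving the triangular linear systems. The Python sketch below is my illustration of that extraction; applied to the Pascal triangle it recovers $A = (1, 1, 0, 0, \ldots)$ and $Z = (1, 0, 0, \ldots)$.

```python
from fractions import Fraction
from math import comb

N = 8
d = [[Fraction(comb(n, k)) for k in range(N + 1)] for n in range(N + 1)]

def a_sequence(d, m):
    """Solve d[j+1][1] = sum_i a_i d[j][i] for a_0..a_{m-1} (relation (5.3.1) at k = 0)."""
    a = []
    for j in range(m):
        s = sum(a[i] * d[j][i] for i in range(j))
        a.append((d[j + 1][1] - s) / d[j][j])
    return a

def z_sequence(d, m):
    """Solve d[j+1][0] = sum_i z_i d[j][i] for z_0..z_{m-1} (relation (5.3.3))."""
    z = []
    for j in range(m):
        s = sum(z[i] * d[j][i] for i in range(j))
        z.append((d[j + 1][0] - s) / d[j][j])
    return z
```

Once extracted, the A-sequence can be checked against Rogers' relation on the whole section of the triangle.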

We conclude this section by giving a theorem which characterizes renewal arrays by means of the A- and Z-sequences:

Theorem 5.3.4 Let $d(0) = h(0) \ne 0$. Then $d(t) = h(t)$ if and only if:

$$A(y) = d(0) + yZ(y).$$

Proof: Let us assume that $A(y) = d(0) + yZ(y)$, or $Z(y) = (A(y)-d(0))/y$. By the previous theorem, we have:

$$d(t) = \frac{d(0)}{1-tZ(th(t))} = \frac{d(0)}{1 - t\bigl(A(th(t))-d(0)\bigr)/(th(t))} = \frac{d(0)h(t)}{h(t)-A(th(t))+d(0)} = h(t)$$

because $A(th(t)) = h(t)$ by formula (5.3.2). Vice versa, if $d(t) = h(t)$, then by the formula for $Z(y)$ we have:

$$d(0) + yZ(y) = \left[d(0) + \frac{th(t)\bigl(d(t)-d(0)\bigr)}{td(t)}\,\middle|\,t = yh(t)^{-1}\right] = \left[d(0) + h(t) - d(0)\frac{h(t)}{d(t)}\,\middle|\,t = yh(t)^{-1}\right] = \Bigl[h(t)\,\Big|\,t = yh(t)^{-1}\Bigr] = A(y)$$

where we used the hypothesis $d(t) = h(t)$ and, in the last passage, formula (5.3.2).

5.4 Simple binomial coefficients

Let us consider simple binomial coefficients, i.e., binomial coefficients of the form $\binom{n+ak}{m+bk}$, where $a, b$ are two parameters and $k$ is a non-negative integer variable. Depending on whether we consider $n$ a variable and $m$ a parameter, or vice versa, we have two different infinite arrays $(d_{n,k})$ or $(d_{m,k})$, whose elements depend on the parameters $a, b$ and $m$, or $a, b$ and $n$, respectively. In either case, if some conditions on $a, b$ hold, we have Riordan arrays, and therefore we can apply formula (5.1.3) to find the value of many sums.

Theorem 5.4.1 Let $d_{n,k}$ and $d_{m,k}$ be as above. If $b > a$ and $b - a$ is an integer, then $(d_{n,k})$ is the Riordan array:

$$D = \left(\frac{t^m}{(1-t)^{m+1}},\ \frac{t^{b-a-1}}{(1-t)^b}\right);$$

if $b < 0$ is an integer, then $(d_{m,k})$ is the Riordan array:

$$D = \bigl((1+t)^n,\ t^{-b-1}(1+t)^a\bigr).$$

Proof: By using well-known properties of binomial coefficients, we have:

$$d_{n,k} = \binom{n+ak}{m+bk} = \binom{n+ak}{(n-m)+(a-b)k} = \binom{-m-bk-1}{(n-m)+(a-b)k}(-1)^{(n-m)+(a-b)k} = [t^{(n-m)+(a-b)k}]\frac{1}{(1-t)^{m+1+bk}} = [t^n]\frac{t^m}{(1-t)^{m+1}}\left(\frac{t^{b-a}}{(1-t)^b}\right)^k$$

and:

$$d_{m,k} = \binom{n+ak}{m+bk} = [t^{m+bk}](1+t)^{n+ak} = [t^m](1+t)^n\bigl(t^{-b}(1+t)^a\bigr)^k.$$

Therefore, the sum (5.1.3) takes on two specific forms, which are worth being stated explicitly:

$$\sum_{k}\binom{n+ak}{m+bk}f_k = [t^n]\frac{t^m}{(1-t)^{m+1}}\,f\!\left(\frac{t^{b-a}}{(1-t)^b}\right) \qquad (b > a) \tag{5.4.1}$$

$$\sum_{k}\binom{n+ak}{m+bk}f_k = [t^m](1+t)^n\,f\bigl(t^{-b}(1+t)^a\bigr) \qquad (b < 0) \tag{5.4.2}$$

For $m = a = 0$ and $b = 1$ we again find the Euler transformation, relative to the Riordan array of the Pascal triangle. The binomial coefficient $\binom{n+ak}{m+bk}$ is so general that a large number of combinatorial sums can be solved by means of the two formulas (5.4.1) and (5.4.2).
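Formula (5.4.1) can be verified mechanically with truncated power series. The following Python sketch is mine, not the book's: for $a = 1$, $b = 2$ and $f_k = 1$ it compares the direct sum $\sum_k\binom{n+k}{m+2k}$ with the coefficient extraction $[t^n]\frac{t^m}{(1-t)^{m+1}}\cdot\frac{1}{1-t/(1-t)^2}$, computed with a small series toolkit.

```python
from fractions import Fraction
from math import comb

N = 15

def pad(p):
    return [Fraction(c) for c in p] + [Fraction(0)] * (N + 1 - len(p))

def mul(p, q):
    return [sum(p[j] * q[n - j] for j in range(n + 1)) for n in range(N + 1)]

def inv(p):
    """Reciprocal of a series with p[0] != 0, truncated at degree N."""
    r = [Fraction(1) / p[0]]
    for n in range(1, N + 1):
        r.append(-sum(p[j] * r[n - j] for j in range(1, n + 1)) / p[0])
    return r

def lhs(n, m):
    return sum(comb(n + k, m + 2 * k) for k in range(n + 1))

def rhs(n, m):
    th = mul(pad([0, 1]), inv(pad([1, -2, 1])))            # t*h(t) = t/(1-t)^2
    geo = inv([Fraction(1) - th[0]] + [-c for c in th[1:]])  # 1/(1 - t*h(t))
    dm = [Fraction(comb(i + m, m)) for i in range(N + 1)]    # 1/(1-t)^{m+1}
    series = mul(mul(pad([0] * m + [1]), dm), geo)           # t^m/(1-t)^{m+1} * geo
    return series[n]
```

For $m = 0$ the right-hand side reduces to $(1-t)/(1-3t+t^2)$, whose coefficients are the odd-indexed Fibonacci numbers, a well-known value of these row sums.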

Let us begin our set of examples with a simple sum. From the generating function of the Catalan numbers we immediately find:

$$\sum_{k}\binom{n+k}{m+2k}\binom{2k}{k}\frac{(-1)^k}{k+1} = [t^n]\frac{t^m}{(1-t)^{m+1}}\left[\frac{\sqrt{1+4y}-1}{2y}\,\middle|\,y = \frac{t}{(1-t)^2}\right].$$

Since:

$$\sqrt{1 + \frac{4t}{(1-t)^2}} = \frac{1+t}{1-t}$$

the inner function reduces to $1-t$, and therefore:

$$\sum_{k}\binom{n+k}{m+2k}\binom{2k}{k}\frac{(-1)^k}{k+1} = [t^n]\frac{t^m(1-t)}{(1-t)^{m+1}} = [t^{n-m}]\frac{1}{(1-t)^m} = \binom{n-1}{m-1}.$$

Another simple example is the sum:

$$\sum_{k}\binom{n}{2k+1}5^k = [t^n]\frac{t}{(1-t)^2}\left[\frac{1}{1-5y}\,\middle|\,y = \frac{t^2}{(1-t)^2}\right] = [t^n]\frac{t}{1-2t-4t^2} = \frac{1}{2}[t^n]\frac{2t}{1-(2t)-(2t)^2} = 2^{n-1}F_n.$$

In the following sum we use the bisection formulas. Because the generating function for $\binom{z+1}{k}$ is $(1+t)^{z+1}$, by the bisection rule we have:

$$G\!\left(\binom{z+1}{2k+1}2^{2k+1}\right) = \frac{(1+2\sqrt y)^{z+1} - (1-2\sqrt y)^{z+1}}{2\sqrt y}.$$

Since $\binom{z-2k}{n-k} = [t^{n-k}](1+t)^{z-2k} = [t^n](1+t)^z\bigl(t/(1+t)^2\bigr)^k$, by applying formula (5.1.3) we have:

$$\sum_{k}\binom{z+1}{2k+1}\binom{z-2k}{n-k}2^{2k+1} = [t^n](1+t)^z\left[\frac{(1+2\sqrt y)^{z+1}-(1-2\sqrt y)^{z+1}}{2\sqrt y}\,\middle|\,y = \frac{t}{(1+t)^2}\right] = [t^n]\frac{(1+t+2\sqrt t)^{z+1} - (1+t-2\sqrt t)^{z+1}}{2\sqrt t}.$$

Therefore, since $(1+t\pm2\sqrt t)^{z+1} = (1\pm\sqrt t)^{2z+2}$, we have:

$$\sum_{k}\binom{z+1}{2k+1}\binom{z-2k}{n-k}2^{2k+1} = [t^n]\frac{(1+\sqrt t)^{2z+2} - (1-\sqrt t)^{2z+2}}{2\sqrt t} = [t^{2n+1}](1+t)^{2z+2} = \binom{2z+2}{2n+1}$$

where, in the last but one passage, we used the bisection rule backwards.

We solve the following sum by using formula (5.4.2):

$$\sum_{k}\binom{n}{k}\binom{2n-2k}{m-k}(-2)^k = [t^m](1+t)^{2n}\left[(1-2y)^n\,\middle|\,y = \frac{t}{(1+t)^2}\right] = [t^m](1+t)^{2n}\frac{(1+t^2)^n}{(1+t)^{2n}} = [t^m](1+t^2)^n = \binom{n}{m/2}$$

where the binomial coefficient is to be taken as zero when $m$ is odd.

5.5 Other Riordan arrays from binomial coefficients

Other Riordan arrays can be found by using the theorem in the previous section and the rule (valid for $\alpha \ne 0$):

$$\frac{\alpha\pm\beta}{\alpha}\binom{\alpha}{\beta} = \binom{\alpha}{\beta} \pm \binom{\alpha-1}{\beta-1}.$$

For example we find:

$$\frac{2n}{n+k}\binom{n+k}{2k} = \frac{2n}{n+k}\binom{n+k}{n-k} = \binom{n+k}{n-k} + \binom{n+k-1}{n-k-1} = \binom{n+k}{2k} + \binom{n-1+k}{2k}.$$

Therefore, by formula (5.1.3):

$$\sum_{k}\frac{2n}{n+k}\binom{n+k}{2k}f_k = [t^n]\frac{1}{1-t}f\!\left(\frac{t}{(1-t)^2}\right) + [t^{n-1}]\frac{1}{1-t}f\!\left(\frac{t}{(1-t)^2}\right) = [t^n]\frac{1+t}{1-t}f\!\left(\frac{t}{(1-t)^2}\right).$$
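Two of the sums just derived can be checked directly, which is a good safeguard against sign and index slips in this kind of manipulation. The Python check below is my own illustration:

```python
from math import comb

def bisection_sum(z, n):
    """Sum over k of C(z+1, 2k+1) C(z-2k, n-k) 2^(2k+1)."""
    return sum(comb(z + 1, 2 * k + 1) * comb(z - 2 * k, n - k) * 2 ** (2 * k + 1)
               for k in range(min(n, z // 2) + 1))

def minus_two_sum(n, m):
    """Sum over k of C(n,k) C(2n-2k, m-k) (-2)^k."""
    return sum(comb(n, k) * comb(2 * n - 2 * k, m - k) * (-2) ** k
               for k in range(min(n, m) + 1))
```

The ranges of $k$ are limited so that all binomial arguments stay non-negative; the omitted terms vanish anyway.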

5.6 Binomial coefficients and the LIF

In a few cases only, the formulas of the previous sections give the desired result when the $m$ and the $n$ appearing in the numerator and denominator of a binomial coefficient are related to each other: in that case, we have to extract the coefficient of $t^n$ from a function depending on the same variable $n$ (or $m$). This creates many difficulties, and we are forced to apply the Lagrange Inversion Formula through the diagonalization rule.

Let us suppose that $k$ is fixed, that we have the binomial coefficient $\binom{2n-k}{n-k}$, and that we wish to know whether it corresponds to a Riordan array or not. Obviously we have:

$$f_k = \binom{2n-k}{n-k} = [t^{n-k}](1+t)^{2n-k} = [t^n]\left(\frac{t}{1+t}\right)^{k}(1+t)^{2n},$$

so we can apply the diagonalization rule with $F(t) = (t/(1+t))^k$ and $\phi(t) = (1+t)^2$. We have to solve the equation $w = t\phi(w)$, or $w = t(1+w)^2$. This equation is $tw^2 - (1-2t)w + t = 0$, and we are looking for the unique solution $w = w(t)$ such that $w(0) = 0$:

$$w(t) = \frac{1-2t-\sqrt{1-4t}}{2t}.$$

We now perform the necessary computations:

$$F(w) = \left(\frac{w}{1+w}\right)^{k} = \left(\frac{1-\sqrt{1-4t}}{2}\right)^{k}, \qquad
\frac{1}{1-t\phi'(w)} = \frac{1}{1-2t(1+w)} = \frac{1}{\sqrt{1-4t}}.$$

Therefore, the diagonalization gives:

$$\binom{2n-k}{n-k} = [t^n]\,\frac{1}{\sqrt{1-4t}}\left(\frac{1-\sqrt{1-4t}}{2}\right)^{k}.$$

This shows that the binomial coefficient is the generic element of the Riordan array:

$$D = \left(\frac{1}{\sqrt{1-4t}},\ \frac{1-\sqrt{1-4t}}{2t}\right),$$

which therefore is a proper Riordan array. In an analogous way it can be proved that the infinite triangle of the elements $\frac{2n}{n+k}\binom{n+k}{2k}$ is a proper Riordan array, namely $\left(\frac{1+t}{1-t},\ \frac{t}{(1-t)^2}\right)$, and many identities can be proved by means of the previous formula. For example:

$$\sum_k \frac{2n}{n+k}\binom{n+k}{2k}\binom{2k}{k}\frac{(-1)^k}{k+1}
= [t^n]\,\frac{1+t}{1-t}\cdot\frac{\sqrt{1+4y}-1}{2y}\ \bigg|_{\,y=t/(1-t)^2}
= [t^n](1+t) = \delta_{n,0}+\delta_{n,1}.$$

The reader can generalize formula (5.5.1) by using the change of variable $t \to pt$ and prove other formulas.

The following is a quite different case. We have:

$$\frac{n}{n-k}\binom{n-k}{k} = \frac{n}{k}\binom{n-k-1}{k-1},$$

except for $k = 0$, when the left-hand side is $1$ and the right-hand side is not defined. Since $\binom{n-k-1}{k-1} = [t^{n-2k}](1-t)^{-k}$, the column generating functions for $k \geq 1$ are $\left(\frac{t^2}{1-t}\right)^{k}$. Let $f(t) = \mathcal G(f_k)$ and:

$$G(t) = \int_0^t \frac{f(\tau)-f_0}{\tau}\,d\tau = \mathcal G\!\left(\frac{f_k}{k}\right).$$

We immediately obtain:

$$\sum_k \frac{n}{n-k}\binom{n-k}{k}f_k
= f_0 + n\sum_{k=1}^{n}\binom{n-k-1}{k-1}\frac{f_k}{k}
= f_0 + n\,[t^n]\,G\!\left(\frac{t^2}{1-t}\right).$$

This gives an immediate proof of the formula known as Hardy's identity. For $f_k = (-1)^k$ we have $f(t) = 1/(1+t)$ and $G(t) = -\ln(1+t)$; therefore:

$$\sum_k \frac{(-1)^k}{n-k}\binom{n-k}{k}
= \frac{1}{n} + [t^n]\ln\frac{1-t^2}{1+t^3}
= \begin{cases} (-1)^n\,2/n & \text{if 3 divides } n,\\ (-1)^{n-1}/n & \text{otherwise.}\end{cases}$$

The following one is known as Riordan's old identity:

$$\sum_k \frac{n}{n-k}\binom{n-k}{k}(a+b)^{n-2k}(-ab)^k = a^n + b^n,$$

while this is a generalization of Hardy's identity:

$$\sum_k \frac{n}{n-k}\binom{n-k}{k}\,x^{n-2k}(-1)^k
= \frac{(x+\sqrt{x^2-4})^n + (x-\sqrt{x^2-4})^n}{2^n}.$$

In the same way we also immediately obtain:

$$\sum_k \frac{n}{n-k}\binom{n-k}{k} = \phi^n + \hat\phi^{\,n},$$

where $\phi$ is the golden ratio and $\hat\phi = -\phi^{-1}$. Binomial coefficients of the form $\binom{pn+ak}{n-ck}$ can be dealt with in the same way, but in that case we have to solve an equation of degree $p > 2$, and we do not insist on this point any longer.
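Hardy's identity and Riordan's old identity lend themselves to direct numerical verification. The following Python sketch (an added illustration, not part of the original derivation) checks both for small values of $n$, $a$, $b$, using exact rational arithmetic:

```python
from math import comb
from fractions import Fraction

def riordan_old(n, a, b):
    """Left-hand side of Riordan's old identity:
    sum_k n/(n-k) * C(n-k, k) * (a+b)^(n-2k) * (-a*b)^k."""
    return sum(Fraction(n, n - k) * comb(n - k, k)
               * (a + b) ** (n - 2 * k) * (-a * b) ** k
               for k in range(n // 2 + 1))

# Riordan's old identity: the sum equals a^n + b^n.
for n in range(1, 8):
    assert riordan_old(n, 2, 3) == 2 ** n + 3 ** n

def hardy(n):
    """Left-hand side of Hardy's identity: sum_k (-1)^k/(n-k) * C(n-k, k)."""
    return sum(Fraction((-1) ** k, n - k) * comb(n - k, k)
               for k in range(n // 2 + 1))

# Hardy's identity: (-1)^n * 2/n when 3 | n, and (-1)^(n-1)/n otherwise.
for n in range(1, 13):
    expected = Fraction(2 * (-1) ** n, n) if n % 3 == 0 else Fraction((-1) ** (n - 1), n)
    assert hardy(n) == expected
```

Since for $k \le \lfloor n/2\rfloor$ we always have $n-k \ge 1$, no division by zero can occur in either sum.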

Proceeding in the same way, the diagonalization rule also gives, for a fixed $j$:

$$\binom{2n}{n-j} = [t^n]\,\frac{w^j}{\sqrt{1-4t}}, \qquad w = \frac{1-2t-\sqrt{1-4t}}{2t}.$$

An interesting example is given by the following alternating sum, which we can evaluate by summing a geometric series in $w^3$:

$$\sum_{k\ge 0}(-1)^k\binom{2n}{n-3k} = [t^n]\,\frac{1}{\sqrt{1-4t}}\cdot\frac{1}{1+w^3}.$$

Since $1+w^3 = (1+w)(1-w+w^2)$ and $tw^2 = (1-2t)w - t$, we find $1-w+w^2 = (1-3t)w/t$, and a simple computation shows that:

$$\frac{1}{\sqrt{1-4t}}\cdot\frac{1}{1+w^3} = \frac{1}{2\sqrt{1-4t}} + \frac{1-t}{2(1-3t)}.$$

Therefore:

$$\sum_{k\ge 0}(-1)^k\binom{2n}{n-3k} = \frac{1}{2}\binom{2n}{n} + 3^{n-1} + \frac{\delta_{n,0}}{6}.$$

The reader is invited to solve, in a similar way, the corresponding non-alternating sum.

5.7 Coloured walks

In the section "Walks, trees and Catalan numbers" we introduced the concept of a walk or path on the integral lattice Z^2. The concept can be generalized by defining a walk as a sequence of steps starting from the origin and composed by three kinds of steps:

1. east steps, which go from the point (x, y) to (x + 1, y);
2. diagonal steps, which go from the point (x, y) to (x + 1, y + 1);
3. north steps, which go from the point (x, y) to (x, y + 1).

The length of a walk is the number of its steps. We discuss complete walks, i.e., walks without any restriction, and underdiagonal walks, i.e., walks that never go above the main diagonal x − y = 0. We wish to point out that our considerations are by no means limited to the walks on the integral lattice: many combinatorial problems can be proved to be equivalent to some walk problem. Bracketing problems are a typical example and, in fact, a vast literature exists on walk problems.

A colored walk is a walk in which every kind of step can assume different colors; we denote by a, b, c (a > 0; b, c ≥ 0) the number of colors the east, diagonal and north steps can be. A colored walk problem is any (counting) problem corresponding to colored walks; a problem is called symmetric if and only if a = c. The Pascal, Catalan and Motzkin triangles define walking problems that have different values of a, b, c, and so they can be treated in the same way.

Let us consider underdiagonal colored walks, and let us denote by $d_{n,k}$ the number of colored walks which have length n and reach a distance k from the main diagonal, i.e., whose last step ends on the diagonal x − y = k ≥ 0. We observe that each walk of length n + 1 reaching the distance k + 1 from the main diagonal is obtained in a unique way as:

1. a walk of length n reaching the distance k from the main diagonal, followed by any of the a east steps; or
2. a walk of length n reaching the distance k + 1 from the main diagonal, followed by any of the b diagonal steps; or
3. a walk of length n reaching the distance k + 2 from the main diagonal, followed by any of the c north steps.

Hence we have:

$$d_{n+1,k+1} = a\,d_{n,k} + b\,d_{n,k+1} + c\,d_{n,k+2}.$$

This proves that A = {a, b, c} is the A-sequence of $(d_{n,k})_{n,k\in\mathbb N}$, which therefore is a proper Riordan array. This significant fact can be stated as:
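The closed form of the alternating sum above is easy to test by machine. The following Python sketch (added here as an illustration under the stated conventions; the sum runs over $k \ge 0$ and vanishing binomials contribute nothing) compares both sides for small $n$:

```python
from math import comb
from fractions import Fraction

def alt_sum(n):
    """sum_{k>=0} (-1)^k * C(2n, n-3k); terms with n-3k < 0 vanish."""
    return sum((-1) ** k * comb(2 * n, n - 3 * k) for k in range(n // 3 + 1))

def closed_form(n):
    """(1/2)*C(2n, n) + 3^(n-1) + delta_{n,0}/6."""
    return (Fraction(comb(2 * n, n), 2) + Fraction(3) ** (n - 1)
            + (Fraction(1, 6) if n == 0 else 0))

for n in range(10):
    assert alt_sum(n) == closed_form(n)
```

Note that for $n = 0$ the three fractional terms $1/2 + 1/3 + 1/6$ add up to the single walk of length 0, which is why the correction $\delta_{n,0}/6$ appears.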

Theorem 5.7.1 Let $d_{n,k}$ be the number of underdiagonal colored walks of length n reaching a distance k from the main diagonal. Then the infinite triangle $(d_{n,k})_{n,k\in\mathbb N}$ is the proper Riordan array:

$$(d_{n,k})_{n,k\in\mathbb N} = \left(\frac{1-bt-\sqrt{\Delta}}{2act^2},\ \frac{1-bt-\sqrt{\Delta}}{2ct}\right),$$

where the A-sequence has generating function $A(t) = a + bt + ct^2$, and $h(t)$ is the solution of the functional equation $h(t) = a + bth(t) + ct^2h(t)^2$ which is regular at $t = 0$ (so that $h(0) = a$):

$$h(t) = \frac{1-bt-\sqrt{1-2bt+(b^2-4ac)t^2}}{2ct^2}. \qquad (5.7.1)$$

The radicand $1-2bt+(b^2-4ac)t^2 = (1-(b+2\sqrt{ac})t)(1-(b-2\sqrt{ac})t)$ will be simply denoted by $\Delta$.

The proof is as follows. We observe that every walk returning to the main diagonal can only be obtained from another walk returning to the main diagonal followed by any of the b diagonal steps, or from a walk ending at distance 1 from the main diagonal followed by any of the c north steps. Hence we have $d_{n+1,0} = b\,d_{n,0} + c\,d_{n,1}$, and in terms of the column generating functions this corresponds to $d(t) - 1 = btd(t) + ct\,d(t)\,th(t)$. From this relation we easily find $d(t) = (1/a)h(t)$, and the theorem follows. When $c = 0$, it is easily proved that $d_{n,k} = \binom{n}{k}a^k b^{n-k}$, and for a = b = 1 we end up with the Pascal triangle.

The study of complete walks follows the same lines, and we only have to derive the form of the corresponding Riordan array. In this case the array $(d_{n,k})_{n,k\in\mathbb N}$ is only the right part of an infinite triangle, the extended triangle, in which k can also assume negative values. The generating function of the nth row of the extended triangle is $((c/w) + b + aw)^n$, and therefore the bivariate generating function of the extended triangle is:

$$d(t,w) = \sum_{n}\left(\frac{c}{w} + b + aw\right)^{n} t^n = \frac{1}{1-(aw+b+c/w)t}.$$

If we expand this expression by partial fractions, we get:

$$d(t,w) = \frac{1}{\sqrt{\Delta}}\left(\frac{1}{1-\dfrac{1-bt-\sqrt{\Delta}}{2ct}\,w} - \frac{1}{1-\dfrac{1-bt+\sqrt{\Delta}}{2ct}\,w}\right),$$

where the first term has to be expanded in positive powers of w and the second one, formally, in negative powers of w. The first term therefore represents the right part of the extended triangle (k ≥ 0), whereas the second term corresponds to the left part (k < 0). We are interested in the right part, and the expression above immediately gives the form of the Riordan array for complete colored walks:

$$(d_{n,k})_{n,k\in\mathbb N} = \left(\frac{1}{\sqrt{\Delta}},\ \frac{1-bt-\sqrt{\Delta}}{2ct}\right).$$

As a check, for a = c = 1 and b = 2 we have $\Delta = 1-4t$, in accordance with the generating function $d(t) = 1/\sqrt{1-4t}$, and the row sums of the extended triangle give the total number of complete walks, $(a+b+c)^n = 4^n$.

In current literature, major importance is usually given to the following three quantities:

1. the number of walks returning to the main diagonal: for every n, this is $d_n = [t^n]d(t)$;
2. the total number of walks of length n: this is $\alpha_n = \sum_{k=0}^{n} d_{n,k}$, the value of the row sums of the Riordan array;
3. the total distance from the main diagonal of all the walks of length n: this is $\delta_n = \sum_{k=0}^{n} k\,d_{n,k}$, the weighted row sum of the Riordan array; divided by $\alpha_n$, it gives the average distance from the main diagonal of all the walks of length n.

The formulas for the row sums and the weighted row sums given in the first section allow us to find the generating functions $\alpha(t)$ and $\delta(t)$:

$$\alpha(t) = \frac{d(t)}{1-th(t)} = \frac{1-(b+2a)t-\sqrt{\Delta}}{2at\,((a+b+c)t-1)}, \qquad
\delta(t) = \frac{d(t)\,th(t)}{(1-th(t))^2}.$$

In the symmetric case a = c these formulas simplify; for instance:

$$\alpha(t) = \frac{1}{2at}\left(\sqrt{\frac{1-(b-2a)t}{1-(b+2a)t}} - 1\right).$$

In Chapter 7 we will learn how to find an asymptotic approximation for $d_n$. The alternating row sums and the diagonal sums sometimes have some combinatorial significance as well, and they can be treated in the same way; we do not insist on this point any longer.
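The recurrence $d_{n+1,k+1} = a\,d_{n,k} + b\,d_{n,k+1} + c\,d_{n,k+2}$, together with $d_{n+1,0} = b\,d_{n,0} + c\,d_{n,1}$, makes the triangle easy to build by machine. The following Python sketch (an added illustration, not from the original text) constructs it and checks two of the special cases mentioned above, the Motzkin case a = b = c = 1 and the Pascal case c = 0:

```python
def colored_walk_triangle(a, b, c, rows):
    """Build d[n][k] from d[0][0] = 1 and the recurrence
    d[n+1][k+1] = a*d[n][k] + b*d[n][k+1] + c*d[n][k+2];
    column 0 obeys d[n+1][0] = b*d[n][0] + c*d[n][1]."""
    d = [[1]]
    for n in range(rows - 1):
        prev = d[-1] + [0, 0]          # pad with zeros beyond the diagonal
        row = [b * prev[0] + c * prev[1]]
        for k in range(n + 1):
            row.append(a * prev[k] + b * prev[k + 1] + c * prev[k + 2])
        d.append(row)
    return d

# a = b = c = 1: column 0 gives the Motzkin numbers ...
t = colored_walk_triangle(1, 1, 1, 8)
assert [row[0] for row in t] == [1, 1, 2, 4, 9, 21, 51, 127]
# ... and the row sums count all underdiagonal walks: 1, 2, 5, 13, 35, ...
assert [sum(row) for row in t[:5]] == [1, 2, 5, 13, 35]

# c = 0, a = b = 1: the triangle collapses to Pascal's triangle.
p = colored_walk_triangle(1, 1, 0, 5)
assert p[4] == [1, 4, 6, 4, 1]
```

The row sums 1, 2, 5, 13, 35, ... agree with the expansion of the symmetric-case formula for $\alpha(t)$ with a = b = c = 1.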

5.8 Stirling numbers and Riordan arrays

The connection between Riordan arrays and Stirling numbers is not immediate. If we examine the two infinite triangles of the Stirling numbers of both kinds, we immediately realize that they are not Riordan arrays. However, let us denote by $d_{n,k}$ the quantity $k!\left\{{n\atop k}\right\}/n!$. If we multiply the basic recurrence:

$$\left\{{n+1\atop k+1}\right\} = (k+1)\left\{{n\atop k+1}\right\} + \left\{{n\atop k}\right\}$$

by $(k+1)!/(n+1)!$, we obtain the new relation:

$$(n+1)\,d_{n+1,k+1} = (k+1)\,d_{n,k+1} + (k+1)\,d_{n,k},$$

which is a recurrence relation for the $d_{n,k}$. Passing to the column generating functions $d_k(t) = \mathcal G(d_{n,k})$, this recurrence gives:

$$d'_{k+1}(t) = (k+1)\,d_{k+1}(t) + (k+1)\,d_k(t).$$

Obviously $d_0(t) = 1$, and by setting k = 0 in the new recurrence we have $d'_1(t) = d_1(t) + 1$. The solution of this simple differential equation is $d_1(t) = e^t - 1$ (the reader can simply check this solution, or verify it by looking at column 1 in the array). We can now go on by setting k = 1 in the recurrence, which gives $d'_2(t) = 2d_2(t) + 2(e^t - 1)$, and we can verify that $d_2(t) = (e^t-1)^2$. This suggests that, in general:

$$d_k(t) = \mathcal G\!\left(\frac{k!}{n!}\left\{{n\atop k}\right\}\right) = (e^t-1)^k.$$

A rigorous proof of this fact can be obtained by mathematical induction: we simply substitute $d_k(t) = (e^t-1)^k$ into the differential equation and verify that $d_{k+1}(t) = (e^t-1)^{k+1}$, which amounts to the identity:

$$(k+1)e^t(e^t-1)^k = (k+1)(e^t-1)^{k+1} + (k+1)(e^t-1)^k.$$

This proves that $(d_{n,k})_{n,k\in\mathbb N}$ is a Riordan array having $d(t) = 1$ and $th(t) = e^t - 1$. This fact allows us to prove algebraically a lot of identities concerning the Stirling numbers of the second kind, as we shall see in the next section.

It is not difficult to obtain the ordinary column generating functions for the Stirling numbers of the second kind as well. By specializing the basic recurrence to k = 1, and with the obvious generating function $S_0(t) = 1$, we immediately obtain $S_1(t) = t/(1-t)$, and then, by setting k = 2, $S_2(t) = 2tS_2(t) + tS_1(t)$, whose solution is:

$$S_2(t) = \frac{t^2}{(1-t)(1-2t)}.$$

This proves, in an algebraic way, that $\left\{{n\atop 2}\right\} = 2^{n-1} - 1$, and also indicates the form of the generating function for column m:

$$S_m(t) = \mathcal G\!\left(\left\{{n\atop m}\right\}\right)_{n\in\mathbb N} = \frac{t^m}{(1-t)(1-2t)\cdots(1-mt)},$$

which is proved by induction when we specialize the recurrence relation to k = m. The corresponding ordinary generating functions for the Stirling numbers of the first kind are not so simple. For the normalized triangle, however, we can proceed in an analogous way, by starting with the recurrence relation:

$$\left[{n+1\atop k+1}\right] = n\left[{n\atop k+1}\right] + \left[{n\atop k}\right].$$

We multiply this recurrence by $(k+1)!/(n+1)!$ and study the quantity $f_{n,k} = k!\left[{n\atop k}\right]/n!$.
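Both the recurrence for $\left\{{n\atop k}\right\}$ and the column generating functions $(e^t-1)^k$ can be checked numerically. The Python sketch below (an added illustration; the second check uses the binomial expansion $n!\,[t^n](e^t-1)^k = \sum_j \binom{k}{j}(-1)^{k-j}j^n$, which is not stated in the text but follows directly from $e^{jt}$) verifies both facts for small n:

```python
from math import comb, factorial

def stirling2(n, k):
    """Stirling number of the second kind via the standard recurrence
    S(n+1, k+1) = (k+1)*S(n, k+1) + S(n, k)."""
    S = [[0] * (n + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S[n][k]

# [t^n] of the column gf S_2(t) = t^2/((1-t)(1-2t)) is 2^(n-1) - 1.
for n in range(2, 12):
    assert stirling2(n, 2) == 2 ** (n - 1) - 1

# d_k(t) = (e^t - 1)^k means k! * S(n, k) = n! [t^n](e^t - 1)^k
#        = sum_j C(k, j) * (-1)^(k-j) * j^n.
for n in range(8):
    for k in range(n + 1):
        alt = sum(comb(k, j) * (-1) ** (k - j) * j ** n for j in range(k + 1))
        assert factorial(k) * stirling2(n, k) == alt
```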

Multiplying the recurrence by $(k+1)!/(n+1)!$, we obtain:

$$(n+1)\,f_{n+1,k+1} = n\,f_{n,k+1} + (k+1)\,f_{n,k},$$

and, passing to the column generating functions $f_k(t) = \mathcal G(f_{n,k})$:

$$f'_{k+1}(t) = t\,f'_{k+1}(t) + (k+1)\,f_k(t).$$

In this case also we have $f_0(t) = 1$, and by specializing the last relation to the case k = 0 we obtain $f'_1(t) = tf'_1(t) + f_0(t)$. This is equivalent to $f'_1(t) = 1/(1-t)$ and, because $f_1(0) = 0$, we have:

$$f_1(t) = \ln\frac{1}{1-t}.$$

By setting k = 1, we find the simple differential equation $f'_2(t) = tf'_2(t) + 2f_1(t)$, whose solution is:

$$f_2(t) = \left(\ln\frac{1}{1-t}\right)^{2}.$$

This suggests the general formula:

$$f_k(t) = \mathcal G\!\left(\frac{k!}{n!}\left[{n\atop k}\right]\right) = \left(\ln\frac{1}{1-t}\right)^{k},$$

and again this can be proved by induction. This proves that $(f_{n,k})_{n,k\in\mathbb N}$ is the Riordan array having $d(t) = 1$ and $th(t) = \ln(1/(1-t))$. We have thus obtained the column generating functions for both infinite triangles.

5.9 Identities involving the Stirling numbers

The two recurrence relations for $d_{n,k}$ and $f_{n,k}$ do not give an immediate evidence that the two triangles are indeed Riordan arrays, because they do not correspond to A-sequences.
However, the A-sequences for the two arrays can be easily found, once we know their h(t) functions. For the Stirling numbers of the second kind we have to solve the functional equation $h(t) = A(th(t))$ with $th(t) = e^t - 1$; by setting $y = e^t - 1$, that is $t = \ln(1+y)$, we find that the A-sequence for the triangle related to the Stirling numbers of the second kind is:

$$A(y) = \frac{y}{\ln(1+y)}.$$

In a similar way, for the Stirling numbers of the first kind, by setting $y = \ln(1/(1-t))$, or $t = (e^y-1)/e^y$, we have $A(y) = ye^y/(e^y-1)$, and this is the generating function for the A-sequence we were looking for.

A first result we obtain by using the correspondence between Stirling numbers and Riordan arrays concerns the row sums of the two triangles. For the Stirling numbers of the first kind we have:

$$\sum_{k=0}^{n}\left[{n\atop k}\right]
= n!\sum_{k=0}^{n}\frac{k!}{n!}\left[{n\atop k}\right]\frac{1}{k!}
= n!\,[t^n]\,e^{y}\ \Big|_{\,y=\ln\frac{1}{1-t}}
= n!\,[t^n]\frac{1}{1-t} = n!,$$

as we observed and proved in a combinatorial way. The row sums of the Stirling numbers of the second kind give, as we know, the Bell numbers:

$$B_n = \sum_{k=0}^{n}\left\{{n\atop k}\right\}
= n!\sum_{k=0}^{n}\frac{k!}{n!}\left\{{n\atop k}\right\}\frac{1}{k!}
= n!\,[t^n]\,e^{y}\ \Big|_{\,y=e^t-1}
= n!\,[t^n]\exp(e^t-1);$$

therefore we have:

$$\mathcal G\!\left(\frac{B_n}{n!}\right) = \exp(e^t-1).$$

We also defined the ordered Bell numbers as $O_n = \sum_{k=0}^{n}\left\{{n\atop k}\right\}k!$; therefore:

$$O_n = n!\sum_{k=0}^{n}\frac{k!}{n!}\left\{{n\atop k}\right\}
= n!\,[t^n]\frac{1}{1-y}\ \Big|_{\,y=e^t-1}
= n!\,[t^n]\frac{1}{2-e^t},
\qquad \mathcal G\!\left(\frac{O_n}{n!}\right) = \frac{1}{2-e^t}.$$

Besides, two orthogonality relations exist between the Stirling numbers of the two kinds. The first one is proved in this way:

$$\sum_k \left\{{n\atop k}\right\}\left[{k\atop m}\right](-1)^{n-k}
= (-1)^n\,n!\,[t^n]\,\frac{(-1)^m}{m!}\bigl(\ln(1+y)\bigr)^{m}\ \Big|_{\,y=e^t-1}
= (-1)^{n+m}\frac{n!}{m!}\,[t^n]\,t^m = \delta_{n,m}.$$
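The three row-sum identities are easily confirmed by machine. The following Python sketch (an added illustration, not part of the original text) builds both triangles from their recurrences and checks the sums against $n!$, the Bell numbers and the ordered Bell numbers:

```python
from math import factorial

def stirling_triangles(N):
    """Unsigned first-kind c[n][k] and second-kind S[n][k] for 0 <= n, k < N."""
    c = [[0] * N for _ in range(N)]
    S = [[0] * N for _ in range(N)]
    c[0][0] = S[0][0] = 1
    for n in range(1, N):
        for k in range(1, n + 1):
            c[n][k] = (n - 1) * c[n - 1][k] + c[n - 1][k - 1]
            S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1]
    return c, S

c, S = stirling_triangles(9)

# Row sums of the first kind give n!  (n![t^n] 1/(1-t)).
for n in range(9):
    assert sum(c[n]) == factorial(n)

# Row sums of the second kind give the Bell numbers.
assert [sum(S[n]) for n in range(8)] == [1, 1, 2, 5, 15, 52, 203, 877]

# Weighted row sums sum_k k!*S(n,k) give the ordered Bell numbers.
assert [sum(factorial(k) * S[n][k] for k in range(9))
        for n in range(6)] == [1, 1, 3, 13, 75, 541]
```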

The second orthogonality relation is proved in a similar way and reads:

$$\sum_k \left[{n\atop k}\right]\left\{{k\atop m}\right\}(-1)^{n-k}
= (-1)^n\frac{n!}{m!}\,[t^n]\,(e^{-y}-1)^{m}\ \Big|_{\,y=\ln\frac{1}{1-t}}
= (-1)^n\frac{n!}{m!}\,[t^n](-t)^m = \delta_{n,m}.$$

We introduced Stirling numbers by means of the identities relating powers and falling factorials. We can now prove these identities by using a Riordan array approach. In fact:

$$\sum_{k=0}^{n}\left[{n\atop k}\right](-1)^{n-k}x^k
= (-1)^n\,n!\,[t^n]\,e^{-xy}\ \Big|_{\,y=\ln\frac{1}{1-t}}
= (-1)^n\,n!\,[t^n](1-t)^{x}
= n!\binom{x}{n} = x^{\underline{n}},$$

and:

$$\sum_{k=0}^{n}\left\{{n\atop k}\right\}x^{\underline{k}}
= n!\sum_{k=0}^{n}\frac{k!}{n!}\left\{{n\atop k}\right\}\binom{x}{k}
= n!\,[t^n](1+y)^{x}\ \Big|_{\,y=e^t-1}
= n!\,[t^n]e^{tx} = x^n.$$

We conclude this section by showing two possible connections between Stirling numbers and Bernoulli numbers. First we have:

$$\sum_{k=0}^{n}\left\{{n\atop k}\right\}\frac{(-1)^k\,k!}{k+1}
= n!\,[t^n]\,\frac{\ln(1+y)}{y}\ \Big|_{\,y=e^t-1}
= n!\,[t^n]\frac{t}{e^t-1} = B_n,$$

which proves that Bernoulli numbers can be defined in terms of the Stirling numbers of the second kind. For the Stirling numbers of the first kind we have the identity:

$$\sum_{k=0}^{n}\left[{n\atop k}\right]B_k
= n!\,[t^n]\,\frac{y}{e^y-1}\ \Big|_{\,y=\ln\frac{1}{1-t}}
= n!\,[t^n]\,\frac{1-t}{t}\ln\frac{1}{1-t}
= n!\left(\frac{1}{n+1}-\frac{1}{n}\right) = -\frac{(n-1)!}{n+1}.$$

Clearly, this holds for n > 0; for n = 0 we have $\sum_k\left[{0\atop k}\right]B_k = B_0 = 1$.
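The orthogonality relations and the Bernoulli connection can be verified exactly with rational arithmetic. The following Python sketch (an added illustration, not part of the original text) checks both on the first few rows of the triangles:

```python
from fractions import Fraction
from math import factorial

N = 9
c = [[0] * N for _ in range(N)]   # unsigned Stirling numbers, first kind
S = [[0] * N for _ in range(N)]   # Stirling numbers, second kind
c[0][0] = S[0][0] = 1
for n in range(1, N):
    for k in range(1, n + 1):
        c[n][k] = (n - 1) * c[n - 1][k] + c[n - 1][k - 1]
        S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1]

# Orthogonality: sum_k S(n,k) * c(k,m) * (-1)^(n-k) = delta_{n,m}.
for n in range(N):
    for m in range(N):
        assert sum(S[n][k] * c[k][m] * (-1) ** (n - k)
                   for k in range(N)) == (n == m)

def bernoulli(n):
    """B_n = sum_k (-1)^k * k!/(k+1) * S(n,k)."""
    return sum(Fraction((-1) ** k * factorial(k), k + 1) * S[n][k]
               for k in range(n + 1))

assert [bernoulli(n) for n in range(7)] == [
    1, Fraction(-1, 2), Fraction(1, 6), 0,
    Fraction(-1, 30), 0, Fraction(1, 42)]
```

Note that this gives $B_1 = -1/2$, consistent with the generating function $t/(e^t-1)$ used in the text.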

that if z is a subword of w. If w ∈ A∗ and z is a word such that w can be decomposed w = w1 zw2 (w1 and/or w2 possibly empty). if we also have w = w1 z1 w2 . and r = |w| is 1 2 r 6. . we say that z is a subword of w.Chapter 6 Formal methods say that z occurs in w. and if w = w1 z. associativity: w1 w2 w3 . If w ∈ (T ∪ N )∗ . if · denotes the juxtaposition. we recall deﬁnitions given in Section 2. w = w1 z1 w2 is the leftmost occurrence of z1 in w and w = w1 z2 w2 . the symbols in T are denoted by lower case Latin letters. . Observe During the 1950’s. The set of all the words on A. . . φ2 . we deﬁne the relation w ⊢ w between words w. . . a2 . . N. N. let us consider the operation of juxtaposition and recursively apply it starting with the symbols in A. has been generated by combining the symbols in A in all the possible ways. Therefore. . • σ ∈ N is the initial symbol. the linguist Noam Chomski in. where: is a ﬁnite sequence of symbols in A. and the result is the new word w1 z2 w2 ∈ (T ∪ N )∗ . w ∈ (T ∪ N )∗ : the relation holds if and only if a production z1 → z2 ∈ P exists such that z1 occurs in w. • P is a ﬁnite set of productions. Because of that. . and the juxtaposition can be seen as an operation between them. A word on A (T. To understand this point. . Finally. The algebraic structure thus obtained has the following properties: 1. and the particular instance of z in w is called an occurrence of z in w. in other words. in which all the elements have an inverse as well. w1 (w2 w3 ) = (w1 w2 )w3 = • N = {φ1 . let us or suﬃx of w. First. Algebraically. P). a2 . . We also denote by ⊢∗ the transitive closure of ⊢ and call it generation or derivation. . . this means that w ⊢∗ w if and only if a sequence (w = w1 . A∗ is the free monoid on A. by construction. it can have more than one troduced the concept of a formal language. .1. σ. . Observe that a monoid is an algebraic structure more general than a group. we say that z is a tail ment of the concept can be given. 
The basic definition concerning formal languages is the following: a grammar is a 4-tuple $G = (T, N, \sigma, P)$, where:

- $T = \{a_1, a_2, \ldots, a_n\}$ is the alphabet of terminal symbols;
- $N = \{\phi_1, \phi_2, \ldots, \phi_m\}$ is the alphabet of non-terminal symbols;
- $\sigma \in N$ is the initial symbol;
- P is a finite set of productions.

A production is a pair $(z_1, z_2)$ of words on $T \cup N$, such that $z_1$ contains at least a symbol in N; the production is often indicated by $z_1 \to z_2$. Usually, the symbols in T are denoted by lower case Latin letters, and the symbols in N by Greek letters or by upper case Latin letters.

If $w \in (T \cup N)^*$, we can apply a production $z_1 \to z_2 \in P$ to w whenever w can be decomposed as $w = w_1z_1w_2$; the result is the new word $w_1z_2w_2 \in (T \cup N)^*$. Since $z_1$ may have several occurrences in w, we will write $w = w_1z_1w_2 \vdash w_1z_2w_2$ only when $w_1z_1w_2$ is the decomposition of w in which $z_1$ is the leftmost occurrence of $z_1$ in w. Given a grammar $G = (T, N, \sigma, P)$, we thus define the relation $w \vdash \bar w$ between words $w, \bar w \in (T \cup N)^*$: the relation holds if and only if a production $z_1 \to z_2 \in P$ exists such that $z_1$ occurs in w, $w = w_1z_1w_2$ is the leftmost occurrence of $z_1$ in w, and $\bar w = w_1z_2w_2$. We also denote by $\vdash^*$ the transitive closure of $\vdash$ and call it generation or derivation; this means that $w \vdash^* \bar w$ if and only if a sequence $(w = w_1, w_2, \ldots, w_s = \bar w)$ exists such that $w_1 \vdash w_2$, $w_2 \vdash w_3$, ..., $w_{s-1} \vdash w_s$.

By collecting all these definitions, we can finally define the language generated by the grammar G as the set:

$$L(G) = \{\,w \in T^* \mid \sigma \vdash^* w\,\};$$

in other words, a word $w \in T^*$ is in L(G) if and only if we can generate it by starting with the initial symbol $\sigma$ and going on by applying the productions in P until w is generated. When a word $w_i \in T^*$ is produced during a generation, the generation stops, since $w_i$ is terminal. We observe explicitly that, by our condition that in every production $z_1 \to z_2$ the word $z_1$ should contain at least a symbol in N, a generation can also go on forever, never producing a word on T; this is not a problem: it only means that such generations should be ignored.

6.2 Context-free languages

The definition of a formal language is quite general, and it is possible to show that formal languages coincide with the class of "partially recursive sets", i.e., the largest class of sets which can be constructed recursively. This means that we can give rules to build such sets, but, if we wish to know whether a word w belongs to such a set S, we can be unlucky, and an infinite process can be necessary to find out whether w belongs to S or not. Because of that, people have studied more restricted classes of languages, for which a finite process is always possible for finding out whether w belongs to the language or not. Surely, the most important class of this kind is the class of "context-free languages". They are defined in the following way: a context-free grammar is a grammar $G = (T, N, \sigma, P)$ in which all the productions $z_1 \to z_2$ in P are such that $z_1 \in N$. The naming "context-free" derives from this definition: a production $z_1 \to z_2$ is applied whenever the non-terminal symbol $z_1$ is the leftmost non-terminal symbol in a word, irrespective of the context in which it appears.

As a very simple example, let T = {a, b} and N = {σ}, and let the set P be composed of the two productions:

σ → ε        σ → aσbσ.

This grammar is called the Dyck grammar and the language generated by it the Dyck language. If we substitute the letter a with the symbol '(' and the letter b with the symbol ')', the theorem below shows that the words in the Dyck language are the possible parenthetizations of an expression. In the section "Walks, trees and Catalan numbers" we proved that the number of Dyck words with n pairs of parentheses is the Catalan number $\binom{2n}{n}/(n+1)$; we will see how this result can also be obtained by starting with the definition of the Dyck language and applying a suitable and mechanical method, known as the Schützenberger methodology or symbolic method. In Figure 6.1 we draw the generation of some words in the Dyck language. The recursive nature of the productions allows us to prove properties of the Dyck language by means of mathematical induction:

Theorem 6.2.1 A word $w \in \{a, b\}^*$ belongs to the Dyck language D if and only if:

i) the number of a's in w equals the number of b's;
ii) in every prefix z of w, the number of a's is not less than the number of b's.

Proof: Let $w \in D$; if $w = \epsilon$, nothing has to be proved. Otherwise, w is generated by the second production and $w = aw_1bw_2$ with $w_1, w_2 \in D$. If we suppose that i) holds for $w_1$ and $w_2$, then it also holds for w. For ii), any prefix z of w must have one of the forms: a; $az_1$, where $z_1$ is a prefix of $w_1$; $aw_1b$; or $aw_1bz_2$, where $z_2$ is a prefix of $w_2$. By the induction hypothesis, ii) holds for $z_1$ and $z_2$, and therefore it is easily proved for w. Vice versa, let us suppose that i) and ii) hold for $w \in T^*$, $w \ne \epsilon$; then by ii) w should begin with a. Let us scan w until we find the first occurrence of the symbol b such that $w = aw_1bw_2$ and in $w_1$ the number of b's equals the number of a's; by i), such an occurrence of b must exist. By the very construction of $w_1$ and the fact that w satisfies condition ii) by hypothesis, $w_1$ and $w_2$, if they are not empty, should satisfy condition ii); and, consequently, $w_1$ and $w_2$ must satisfy condition i) as well. We have thus obtained a decomposition of w showing that the second production has been used, so that $w_1, w_2 \in D$ by induction. This completes the proof.

A context-free grammar G is ambiguous iff there exists a word $w \in L(G)$ which can be generated by two different leftmost derivations; a context-free grammar H is non-ambiguous iff every word $w \in L(H)$ can be generated in one and only one way. An example of an ambiguous grammar is G = (T, N, σ, P) with T = {1}, N = {σ} and the two productions σ → 1 and σ → σσ: the word 111 is generated by the two following leftmost derivations:

σ ⊢ σσ ⊢ 1σ ⊢ 1σσ ⊢ 11σ ⊢ 111
σ ⊢ σσ ⊢ σσσ ⊢ 1σσ ⊢ 11σ ⊢ 111.

Instead, the Dyck grammar is non-ambiguous: as we have shown in the proof of the previous theorem, given any word $w \in D$, there is only one decomposition $w = aw_1bw_2$ having $w_1, w_2 \in D$; therefore, w can only be generated in a single way.

In general, if a language is generated by an ambiguous grammar, it can also be generated by some non-ambiguous grammar, unless it is intrinsically ambiguous: a context-free language is called intrinsically ambiguous iff every context-free grammar generating it is ambiguous. It is possible to show that intrinsically ambiguous languages actually exist; fortunately, however, they are not very frequent. For example, the language generated by the previous ambiguous grammar is $\{1\}^+$, the set of all the words composed by any sequence of 1's, except the empty word; it is not an intrinsically ambiguous language, and a non-ambiguous grammar generating it is given by the same T, N, σ and the two productions:

σ → 1        σ → 1σ.

It is a simple matter to show that, if every word in a context-free language L(G) has a unique decomposition according to the productions in G, then the grammar cannot be ambiguous. Because of the connection between the Schützenberger methodology and non-ambiguous context-free grammars, we are mainly interested in this kind of grammars.
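Conditions i) and ii) of the theorem give a direct membership test, and counting the words that pass it recovers the Catalan numbers. The following Python sketch (an added illustration, not part of the original proof) performs this check by brute-force enumeration:

```python
from math import comb
from itertools import product

def is_dyck(w):
    """Conditions i) and ii): as many a's as b's, and no prefix
    with more b's than a's."""
    height = 0
    for ch in w:
        height += 1 if ch == 'a' else -1
        if height < 0:
            return False
    return height == 0

# The number of Dyck words with n pairs of parentheses is the
# Catalan number C(2n, n)/(n+1).
for n in range(1, 8):
    count = sum(is_dyck(w) for w in product('ab', repeat=2 * n))
    assert count == comb(2 * n, n) // (n + 1)
```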

Figure 6.1: The generation of some Dyck words

6.3 Formal languages and programming languages

In 1960, the formal definition of the programming language ALGOL'60 was published. ALGOL'60 has surely been the most influential programming language ever created, although it was actually used only by a very limited number of programmers; most of the concepts we now find in programming languages were introduced by ALGOL'60, of which, for example, PASCAL and C are direct derivations. Here we are not interested in these aspects of ALGOL'60, but we wish to spend some words on how ALGOL'60 used context-free grammars to define its syntax in a formal and precise way.

In practice, a program in ALGOL'60 is a word generated by a (rather complex) context-free grammar, whose initial symbol is ⟨program⟩. This grammar used, as terminal symbol alphabet, the characters available on the standard keyboard of a computer — actually, the characters punchable on a card, the input means used at that time to introduce a program into the computer. The non-terminal symbol notation was one of the most appealing inventions of ALGOL'60: the symbols were composed by entire English sentences enclosed by the two special parentheses ⟨ and ⟩, which allowed to clearly express the intended meaning of the non-terminal symbols; the previous example concerning ⟨program⟩ surely makes sense. Another technical device used by ALGOL'60 was the compaction of productions: we often have several productions with the same left hand symbol, say β → w₁, β → w₂, ..., β → w_k.

In ALGOL'60 they were written as a single rule:

β ::= w₁ | w₂ | ··· | w_k

where ::= was a metasymbol denoting definition and | was read "or", to denote alternatives. This notation is usually called Backus Normal Form (BNF).

In Table 6.1 (lines 1 through 6) we show how integer numbers were defined. This definition avoids leading 0's in numbers, but allows both +0 and −0; the productions can be easily changed to avoid +0 or −0 or both. This kind of definition gives a precise formulation of all the clauses in the programming language. In the same table, line 7 shows the definition of the conditional statements.

A very interesting aspect is how this context-free grammar definition can avoid ambiguities in the interpretation of a program. Let us consider an expression like a + b∗c: according to the rules of Algebra, the multiplication should be executed before the addition, and the computer must follow this convention in order to create no confusion. This is done by the simplified productions given by lines 8 through 11 in Table 6.1. The derivation of the simple expression a + b∗c reveals that it is decomposed into the sum of a and b∗c, possibly a part of a more complicated expression; this information is passed to the compiler, and the multiplication is actually performed before the addition. If powers are also present, they are executed before products. In fact, since the program has a single generation according to the grammar, it is possible to find this derivation starting from the actual program, and therefore give its exact structure. This allows to give precise information to the compiler: the compilation, in a sense, is directed by the formal syntax of the language (syntax directed compilation).

This ability of context-free grammars in designing the syntax of programming languages is very important, and after ALGOL'60 the syntax of every programming language has always been defined by context-free grammars. We conclude by remembering that a more sophisticated approach to the definition of programming languages was tried with ALGOL'68 by means of van Wijngaarden's grammars, but the method revealed too complex and was abandoned.

6.4 The symbolic method

The Schützenberger method allows us to obtain the counting generating function for every non-ambiguous language, starting with the corresponding non-ambiguous grammar and proceeding in a mechanical way. This consists in the following steps:

1. every non-terminal symbol σ ∈ N is transformed into the name of its counting generating function σ(t);
2. every terminal symbol is transformed into t;
3. the empty word is transformed into 1;
4. every | sign is transformed into a + sign, and ::= is transformed into an equal sign.

After having performed these transformations, we obtain a system of equations, which can be solved in the unknown generating functions introduced in the first step. They are the counting generating functions for the languages generated by the corresponding non-terminal symbols, when we consider them as the initial symbols.

Let us begin with a simple example. The Fibonacci words are the words on the alphabet {0, 1} beginning and ending with the symbol 1 and never containing two consecutive 0's. For small values of n, the Fibonacci words of length n are easily displayed:

n = 1:  1
n = 2:  11
n = 3:  111, 101
n = 4:  1111, 1011, 1101
n = 5:  11111, 10111, 11011, 11101, 10101

If we count them by their length, we obtain the sequence {0, 1, 1, 2, 3, 5, 8, 13, ...}, which is easily recognized as the Fibonacci sequence. In fact, a word of length n is obtained by adding a trailing 1 to a word of length n − 1, or by adding a trailing 01 to a word of length n − 2. Besides, we get the productions of a non-ambiguous context-free grammar G = (T, N, σ, P), where T = {0, 1}, N = {φ}, σ = φ, and P contains:

φ → 1        φ → φ1        φ → φ01

(these productions could have been written φ ::= 1 | φ1 | φ01 by using the ALGOL'60 notation). We are now going to obtain the counting generating function for Fibonacci words by applying the Schützenberger method. The definition of the Fibonacci words produces:

φ(t) = t + tφ(t) + t²φ(t)

the solution of which is:

φ(t) = t / (1 − t − t²);

this is obviously the generating function for the Fibonacci numbers. This immediately shows, in a combinatorial way, that Fibonacci words are counted by Fibonacci numbers.
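The claim that $F_n = [t^n]\,t/(1-t-t^2)$ counts the Fibonacci words can be confirmed by brute-force enumeration. The following Python sketch (an added illustration, not part of the original text) counts the words of each length directly from their definition:

```python
from itertools import product

def is_fibonacci_word(w):
    """Words over {0,1} beginning and ending with 1, with no '00' factor."""
    return w.startswith('1') and w.endswith('1') and '00' not in w

counts = [sum(is_fibonacci_word(''.join(p)) for p in product('01', repeat=n))
          for n in range(1, 11)]

# [t^n] t/(1 - t - t^2) gives the Fibonacci numbers 1, 1, 2, 3, 5, ...
assert counts == [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```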

Table 6.1 shows how some typical constructs of programming languages can be defined by means of a context-free grammar:

    digit              ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    non-zero digit     ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    sequence of digits ::= digit | digit sequence of digits
    unsigned number    ::= digit | non-zero digit sequence of digits
    signed number      ::= + unsigned number | − unsigned number
    integer number     ::= unsigned number | signed number
    conditional clause ::= if condition then instruction |
                           if condition then instruction else instruction
    expression         ::= term | term + expression
    term               ::= factor | term ∗ factor
    factor             ::= element | factor ↑ element
    element            ::= constant | variable | ( expression )

Table 6.1: Context-free languages and programming languages

An interesting application of Schützenberger's method is as follows. The words of the Dyck language are the words on the alphabet {a, b} in which a, b act as parentheses; the definition of the language is σ ::= ε | aσbσ. Since every symbol appearing in a production is transformed into a t, the definition yields:

    σ(t) = 1 + t²σ(t)²        and therefore        σ(t) = (1 − √(1 − 4t²)) / (2t²)

Therefore, the number of Dyck words with 2n symbols is just the nth Catalan number, as we have already proved by combinatorial arguments. Since every word in the Dyck language has an even length, we can also let t count the pairs a, b instead of the single letters; in that case Schützenberger's method gives the equation σ(t) = 1 + tσ(t)², whose solution is just the generating function of the Catalan numbers.

Another example is given by the Motzkin words: these are words on the alphabet {a, b, c} in which a, b act as parentheses, as in the Dyck language, while c is free and can appear everywhere. If μ is the only nonterminal symbol, the definition of the language is μ ::= ε | cμ | aμbμ, and Schützenberger's method gives the equation:

    μ(t) = 1 + tμ(t) + t²μ(t)²

whose solution is easily found:

    μ(t) = (1 − t − √(1 − 2t − 3t²)) / (2t²)

By expanding this function we find the sequence of Motzkin numbers, beginning:

    n    0  1  2  3  4  5   6   7    8    9
    Mn   1  1  2  4  9  21  51  127  323  835

These numbers also count the so-called unary-binary trees, i.e., trees the nodes of which have arity 1 or 2. They can be defined in a pictorial way by means of an object grammar: an object grammar defines combinatorial objects instead of simple letters or words. However, it is rather easy to pass from an object grammar to an equivalent context-free grammar, and therefore to obtain counting generating functions by means of Schützenberger's method; the object grammar in Figure 6.2 is obviously equivalent to the context-free grammar for Motzkin words.

Figure 6.2: An object grammar

6.5 The bivariate case

In Schützenberger's method, the rôle of the indeterminate t is to count the number of letters or symbols occurring in the generated words. However, we can wish to count other parameters instead of, or in conjunction with, the number of symbols; in the Dyck language, for example, we can wish to count the number of pairs a, b occurring in the words. This is accomplished by modifying the intended meaning of the indeterminate t and/or by introducing some other indeterminate to take into account the other parameters.

Let us suppose we wish to know how many Fibonacci words of length n contain k zeroes. Besides the indeterminate t, counting the total number of symbols, we introduce a new indeterminate z counting the number of zeroes.
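The expansion of μ(t) can be checked through the convolution recurrence M(n+1) = M(n) + Σ_k M(k)·M(n−1−k), which follows directly from μ = 1 + tμ + t²μ². A short Python sketch (ours, illustrative only) reproduces the table of Motzkin numbers:

```python
# Motzkin numbers from mu(t) = 1 + t*mu(t) + t^2*mu(t)^2, i.e.
# M_{n+1} = M_n + sum_{k=0}^{n-1} M_k * M_{n-1-k}, with M_0 = 1.
def motzkin(n_max):
    m = [1]
    for n in range(n_max):
        m.append(m[n] + sum(m[k] * m[n - 1 - k] for k in range(n)))
    return m

M = motzkin(9)
print(M)  # [1, 1, 2, 4, 9, 21, 51, 127, 323, 835]
```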

From the productions of the Fibonacci grammar, we derive an equation for the bivariate generating function φ(t, z), in which the coefficient of tⁿz^k is just the number we are looking for. The production φ → 1 contributes a t; the production φ → φ1 increases by 1 the length of the words, but does not introduce any 0, and therefore contributes a factor t; the production φ → φ01 increases by 2 the length and introduces a zero, and therefore contributes a factor t²z. The equation is:

    φ(t, z) = t + tφ(t, z) + t²zφ(t, z)

By solving this equation, we find:

    φ(t, z) = t / (1 − t − t²z)

We can now extract the coefficient φ(n,k) of tⁿz^k in the following way:

    φ(n,k) = [tⁿ][z^k] t/(1 − t − t²z) = [tⁿ][z^k] (t/(1 − t)) · 1/(1 − z t²/(1 − t)) =
           = [tⁿ][z^k] Σ_{k≥0} t^{2k+1} z^k / (1 − t)^{k+1} = [t^{n−2k−1}] 1/(1 − t)^{k+1} = C(n−k−1, k)

where C(n, k) denotes the binomial coefficient. In fact, the number of Fibonacci words of length n containing k zeroes is counted by a binomial coefficient, and this also we knew by combinatorial means. The second expression in the derivation shows that the array (φ(n,k)) is indeed a Riordan array (t/(1 − t), t/(1 − t)): it is the Pascal triangle in which column k is shifted down by k positions (k + 1, in reality). Setting z = 1 we obtain φ(t, 1) = t/(1 − t − t²), and Fn = [tⁿ] t/(1 − t − t²), as we were expecting.

A more interesting problem is to find the average number of zeroes in all the Fibonacci words with n letters. First, we count the total number of zeroes in all the words of length n; the formula we know for the (weighted) row sums of a Riordan array gives:

    Σ_k k φ(n,k) = [tⁿ] t³/(1 − t − t²)²

Expanding by partial fractions, with φ = (1 + √5)/2 and φ̂ = (1 − √5)/2, we find:

    Σ_k k φ(n,k) ≈ (n/5) φ^{n−1} − (2/(5√5)) φⁿ

the terms containing φ̂ being negligible, because they rapidly tend to 0. To obtain the average number Zn of zeroes, we divide this quantity by the total number Fn ∼ φⁿ/√5 of Fibonacci words of length n:

    Zn ∼ ((5 − √5)/10) n − 2/5

because √5/(5φ) = (5 − √5)/10 ≈ 0.2763932022. This shows that the average number of zeroes grows linearly with the length of the words and tends to become the 27.64% of this length.
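Both the closed form C(n−k−1, k) and the 27.64% growth rate can be confirmed numerically. The Python sketch below (ours, purely illustrative) checks the binomial formula against brute-force enumeration and observes that the increments of the average number of zeroes tend to (5 − √5)/10:

```python
from math import comb, sqrt

# phi(n, k) = C(n-k-1, k): Fibonacci words of length n with k zeroes.
def phi(n, k):
    return comb(n - k - 1, k)

# Brute-force enumeration of all Fibonacci words up to length nmax.
def words_up_to(nmax):
    ws, frontier = {"1"}, {"1"}
    while frontier:
        frontier = {w + s for w in frontier for s in ("1", "01")
                    if len(w) + len(s) <= nmax} - ws
        ws |= frontier
    return ws

ws = words_up_to(12)
ok = all(
    phi(n, k) == sum(1 for w in ws if len(w) == n and w.count("0") == k)
    for n in range(1, 13) for k in range(n)
)

# Average number of zeroes: successive differences tend to (5 - sqrt(5))/10.
def avg(n):
    total = sum(k * phi(n, k) for k in range(n))
    count = sum(phi(n, k) for k in range(n))  # = F_n
    return total / count

growth = avg(30) - avg(29)
print(ok, abs(growth - (5 - sqrt(5)) / 10) < 1e-4)
```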

6.6 The Shift Operator

In the usual mathematical terminology, an operator is a mapping from some set F1 of functions into some other set of functions F2. Usual examples are D, the operator of differentiation, and the operator of indefinite integration; we have also already encountered the operator G, acting from the set of sequences (which are properly functions from N into R or C) into the set of formal power series (analytic functions).

Since the middle of the 19th century, the English mathematicians (G. Boole in particular) introduced the concept of a finite operator, i.e., an operator acting in finite terms, in contrast to an infinitesimal operator such as differentiation and integration. The most simple among the finite operators is the identity, denoted by I or by 1, which does not change anything. The most important finite operator is the shift operator, denoted by E, changing the value of any function f at a point x into the value of the same function at the point x + 1. We write:

    Ef(x) = f(x + 1)

In the usual mathematical terminology, an increment h is defined and the shift operator acts according to this increment, i.e., Ef(x) = f(x + h); when considering sequences, however, this makes no sense and we constantly use h = 1. Operationally, a finite operator can be applied to a sequence, and in that case we can write Efn = f(n+1); most times, we shall adopt the first, functional notation, rather than the second one, more specific for sequences.

The shift operator can be iterated and, to denote the successive applications of E, we write Eⁿ instead of EE···E (n times):

    E²f(x) = EEf(x) = Ef(x + 1) = f(x + 2)        and in general        Eⁿf(x) = f(x + n)

Conventionally, we set E⁰ = I = 1, and this is in accordance with the meaning of I: E⁰f(x) = f(x) = If(x). It is also possible to consider negative powers of E:

    E⁻¹f(x) = f(x − 1)        E⁻ⁿf(x) = f(x − n)

and this is in accordance with the usual rules of powers: EⁿE^m = E^{n+m} for n, m ∈ Z, and EⁿE⁻ⁿ = E⁰ = I for n ∈ Z. If c is any constant (i.e., c ∈ R or c ∈ C) we have Ec = c. Some obvious properties of the shift operator are:

    E(αf(x) + βg(x)) = αEf(x) + βEg(x)        E(f(x)g(x)) = Ef(x)Eg(x)

We can have occasions to use two or more shift operators, that is shift operators related to different variables. We distinguish them by suitable subscripts:

    Ex f(x, y) = f(x + 1, y)        Ey f(x, y) = f(x, y + 1)        Ex Ey f(x, y) = Ey Ex f(x, y) = f(x + 1, y + 1)

We wish to observe that every recurrence can be written using the shift operator. For example, the recurrence for the Fibonacci numbers is:

    E²Fn = EFn + Fn

and this can be written in the following way, separating the operator parts from the sequence parts:

    (E² − E − I)Fn = 0

6.7 The Difference Operator

The second most important finite operator is the difference operator ∆; it is defined in terms of the shift and identity operators:

    ∆f(x) = (E − I)f(x) = Ef(x) − If(x) = f(x + 1) − f(x)

Some simple examples are:

    ∆c = Ec − c = c − c = 0    ∀c ∈ C
    ∆x = Ex − x = x + 1 − x = 1
    ∆x² = Ex² − x² = (x + 1)² − x² = 2x + 1
    ∆xⁿ = Exⁿ − xⁿ = (x + 1)ⁿ − xⁿ = Σ_{k=0}^{n−1} C(n,k) x^k

Other usual examples are:

    ∆(1/x) = 1/(x+1) − 1/x = −1/(x(x+1))
    ∆C(x, m) = C(x+1, m) − C(x, m) = C(x, m−1)

The last example makes use of the recurrence relation for binomial coefficients.
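The operator calculus lends itself to direct experimentation. The following Python sketch (ours, purely illustrative) encodes E, I and ∆ as higher-order functions and checks the small examples above, including the Fibonacci recurrence written as E²Fn = EFn + Fn:

```python
# Finite operators as higher-order functions on integer-argument functions.
E = lambda f: (lambda x: f(x + 1))              # shift:      E f(x) = f(x + 1)
I = lambda f: (lambda x: f(x))                  # identity:   I f(x) = f(x)
Delta = lambda f: (lambda x: f(x + 1) - f(x))   # difference: (E - I) f(x)

sq = lambda x: x * x
checks = [Delta(sq)(x) == 2 * x + 1 for x in range(10)]   # Delta x^2 = 2x + 1

# The Fibonacci recurrence written with the shift operator: E^2 F = E F + F.
def F(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

fib_check = all(E(E(F))(n) == E(F)(n) + F(n) for n in range(15))
print(all(checks), fib_check)
```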

The difference operator can be iterated:

    ∆²f(x) = ∆∆f(x) = ∆(f(x + 1) − f(x)) = f(x + 2) − 2f(x + 1) + f(x)

The following general rules are rather obvious:

    ∆(αf(x) + βg(x)) = α∆f(x) + β∆g(x)

For the difference of a product we have:

    ∆(f(x)g(x)) = E(f(x)g(x)) − f(x)g(x) =
                = f(x+1)g(x+1) − f(x)g(x+1) + f(x)g(x+1) − f(x)g(x) =
                = ∆f(x)Eg(x) + f(x)∆g(x)

resembling the differentiation rule D(f(x)g(x)) = (Df(x))g(x) + f(x)Dg(x). In a similar way we have:

    ∆(f(x)/g(x)) = f(x+1)/g(x+1) − f(x)/g(x) = ((∆f(x))g(x) − f(x)∆g(x)) / (g(x)Eg(x))

As we shall see, many formal properties of the difference operator are similar to the properties of the differentiation operator. An important observation concerns the behavior of the difference operator with respect to the falling factorial x^(m) = x(x−1)···(x−m+1):

    ∆x^(m) = (x+1)^(m) − x^(m) = (x+1)x(x−1)···(x−m+2) − x(x−1)···(x−m+1) =
           = x(x−1)···(x−m+2)((x+1) − (x−m+1)) = m x^(m−1)

This is analogous to the usual rule Dx^m = m x^{m−1} for the differentiation operator applied to the powers x^m. The rôle of the powers is however taken by the falling factorials, which therefore assume a central position in the theory of finite operators.

As an example, let us iterate ∆ on f(x) = 1/x:

    ∆²(1/x) = −1/((x+1)(x+2)) + 1/(x(x+1)) = (−x + x + 2)/(x(x+1)(x+2)) = 2/(x(x+1)(x+2))

and, as we can easily show by mathematical induction:

    ∆ⁿ(1/x) = (−1)ⁿ n! / (x(x+1)···(x+n))

In fact:

    ∆^{n+1}(1/x) = (−1)ⁿ n! (1/((x+1)···(x+n+1)) − 1/(x(x+1)···(x+n))) = (−1)^{n+1}(n+1)! / (x(x+1)···(x+n+1))

6.8 Shift and Difference Operators

As the difference operator ∆ can be expressed in terms of the shift operator E, so E can be expressed in terms of ∆:

    E = ∆ + 1

This rule can be iterated: ∆² = (E − 1)² = E² − 2E + 1 and in general:

    ∆ⁿ = (E − 1)ⁿ = Σ_{k=0}^{n} C(n,k)(−1)^{n−k} E^k        Eⁿ = (∆ + 1)ⁿ = Σ_{k=0}^{n} C(n,k) ∆^k

By multiplying everything by (−1)ⁿ, the first identity can be written as:

    (−1)ⁿ ∆ⁿ = Σ_{k=0}^{n} C(n,k)(−E)^k

which can be seen as the "dual" formula of the second one. The formula for ∆ⁿ now gives, applied to f(x) = 1/x, the following identity:

    Σ_{k=0}^{n} C(n,k)(−1)^k/(x+k) = n!/(x(x+1)···(x+n)) = (1/x) C(x+n, n)^{−1}

and therefore we have both a way to express the inverse of a binomial coefficient as a sum and an expression for the partial fraction expansion of the inverse of the polynomial x(x+1)···(x+n). This is a very important formula, and it is the first example of the interest of combinatorics and generating functions in the theory of finite operators.
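Both the closed form for ∆ⁿ(1/x) and the resulting inverse-binomial identity are easy to test numerically with exact rational arithmetic; an illustrative Python check (ours, not part of the text):

```python
from fractions import Fraction
from math import comb, factorial, prod

# Delta^n f(x) = sum_k (-1)^(n-k) * C(n,k) * f(x+k)
def delta_n(f, n, x):
    return sum((-1) ** (n - k) * comb(n, k) * f(x + k) for k in range(n + 1))

inv = lambda x: Fraction(1, x)

# Delta^n (1/x) = (-1)^n * n! / (x (x+1) ... (x+n))
ok1 = all(
    delta_n(inv, n, x) == Fraction((-1) ** n * factorial(n), prod(range(x, x + n + 1)))
    for n in range(6) for x in range(1, 8)
)

# Equivalent sum: sum_k (-1)^k C(n,k) / (x+k) = 1 / (x * C(x+n, n))
ok2 = all(
    sum(Fraction((-1) ** k * comb(n, k), x + k) for k in range(n + 1))
    == Fraction(1, x * comb(x + n, n))
    for n in range(6) for x in range(1, 8)
)
print(ok1, ok2)
```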

6.8 Shift and Difference Operators – Example I

The evaluation of the successive differences of any function f(x) allows us to state and prove two identities, which may have combinatorial significance: from ∆ⁿ = (E − 1)ⁿ we obtain a closed form for the alternating sum Σ_k C(n,k)(−1)^k f(x+k), and from Eⁿ = (∆ + 1)ⁿ we obtain the "dual" expansion of f(x+n) in terms of the successive differences of f. Here we record some typical examples; we mark with an asterisk the cases in which the general formula for ∆ⁿ does not reduce to If(x) for n = 0. In those cases we have to set ∆⁰ = I, and in performing the corresponding sums the term k = 0 must be treated separately; whenever we have to force ∆⁰ = I, a similar situation arises, and this is why some identities carry the condition n > 0.

1) The function f(x) = 1/x has already been developed, at least partially:

    ∆(1/x) = −1/(x(x+1))        ∆ⁿ(1/x) = (−1)ⁿ n!/(x(x+1)···(x+n)) = ((−1)ⁿ/x) C(x+n, n)^{−1}

    Σ_k C(n,k)(−1)^k/(x+k) = (1/x) C(x+n, n)^{−1}        Σ_k C(n,k)(−1)^k C(x+k, k)^{−1} = x/(x+n)

2*) A somewhat similar situation, but a bit more complex:

    ∆((p+x)/(m+x)) = (m−p)/((m+x)(m+x+1))
    ∆ⁿ((p+x)/(m+x)) = (−1)^{n−1}(m−p)/((m+x) C(m+x+n, n))    (n > 0)

According to this rule, we should have ∆⁰(p+x)/(m+x) = (p−m)/(m+x), which is wrong; hence the asterisk.

3) Another version of the first example:

    ∆(1/(px+m)) = −p/((px+m)(px+p+m))
    ∆ⁿ(1/(px+m)) = (−1)ⁿ n! pⁿ/((px+m)(px+p+m)···(px+np+m))

For x = 0 the two resulting sums are:

    Σ_k C(n,k)(−1)^k/(pk+m) = n!pⁿ/(m(m+p)···(m+np))
    Σ_k C(n,k)(−1)^k k!p^k/((m+p)(m+2p)···(m+kp)) = m/(pn+m)

4*) A case involving the harmonic numbers:

    ∆Hx = 1/(x+1)        ∆ⁿHx = (−1)^{n−1}/(n C(x+n, n))    (n > 0)

    Σ_k C(n,k)(−1)^k H(x+k) = −(1/n) C(x+n, n)^{−1}    (n > 0)
    H(x+n) − H(x) = Σ_{k=1}^{n} C(n,k)(−1)^{k−1}/(k C(x+k, k))

where the case x = 0 is to be noted: Σ_k C(n,k)(−1)^k H(k) = −1/n and H(n) = Σ_{k=1}^{n} C(n,k)(−1)^{k−1}/k.

5*) A more complicated case with the harmonic numbers:

    ∆(xH(x)) = H(x) + 1        ∆ⁿ(xH(x)) = (−1)ⁿ/((n−1) C(x+n−1, n−1))    (n > 1)

6) Harmonic numbers and binomial coefficients:

    ∆(C(x,m) H(x)) = C(x, m−1)(H(x) + 1/m)
    ∆ⁿ(C(x,m) H(x)) = C(x, m−n)(H(x) + H(m) − H(m−n))

    Σ_k C(n,k)(−1)^k C(x+k, m) H(x+k) = (−1)ⁿ C(x, m−n)(H(x) + H(m) − H(m−n))
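The harmonic-number entries can likewise be verified mechanically; a hedged Python check (ours, illustrative) of the two identities from the example with ∆ⁿHx:

```python
from fractions import Fraction
from math import comb

def H(m):  # harmonic number H_m, with H_0 = 0
    return sum(Fraction(1, j) for j in range(1, m + 1))

# sum_k (-1)^k C(n,k) H_{x+k} = -1 / (n * C(x+n, n)),  for n > 0
ok1 = all(
    sum((-1) ** k * comb(n, k) * H(x + k) for k in range(n + 1))
    == Fraction(-1, n * comb(x + n, n))
    for n in range(1, 7) for x in range(0, 7)
)

# H_{x+n} - H_x = sum_{k=1}^{n} C(n,k) (-1)^(k-1) / (k * C(x+k, k))
ok2 = all(
    H(x + n) - H(x)
    == sum(Fraction((-1) ** (k - 1) * comb(n, k), k * comb(x + k, k))
           for k in range(1, n + 1))
    for n in range(1, 7) for x in range(0, 7)
)
print(ok1, ok2)
```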

7) The function ln(x) can be inserted in this group:

    ∆ ln(x) = ln((x+1)/x)        ∆ⁿ ln(x) = Σ_k C(n,k)(−1)^{n−k} ln(x+k)

Note that the last relation is an identity, not a closed form: the two expansions simply invert each other, as in:

    ln(x+n) = Σ_{k=0}^{n} Σ_{j=0}^{k} C(n,k) C(k,j) (−1)^{k+j} ln(x+j)

6.9 Shift and Difference Operators – Example II

Here we propose other examples of combinatorial sums obtained by iterating the shift and difference operators.

1) The typical sum containing the powers p^x, i.e., Newton's binomial theorem:

    ∆p^x = (p−1)p^x        ∆ⁿp^x = (p−1)ⁿp^x

    Σ_k C(n,k)(−1)^k p^k = (1−p)ⁿ        Σ_k C(n,k)(p−1)^k = pⁿ

2) Two sums involving Fibonacci numbers:

    ∆Fx = F(x−1)        ∆ⁿFx = F(x−n)

    Σ_k C(n,k)(−1)^k F(x+k) = (−1)ⁿ F(x−n)        Σ_k C(n,k) F(x−k) = F(x+n)

3) Falling factorials are an introduction to binomial coefficients (x^(m) = x(x−1)···(x−m+1)):

    ∆x^(m) = m x^(m−1)        ∆ⁿx^(m) = m^(n) x^(m−n)

    Σ_k C(n,k)(−1)^k (x+k)^(m) = (−1)ⁿ m^(n) x^(m−n)        (x+n)^(m) = Σ_k C(n,k) m^(k) x^(m−k)

4) Similar sums hold for the raising factorials, here written x^((m)) = x(x+1)···(x+m−1):

    ∆x^((m)) = m(x+1)^((m−1))        ∆ⁿx^((m)) = m^(n) (x+n)^((m−n))

    Σ_k C(n,k)(−1)^k (x+k)^((m)) = (−1)ⁿ m^(n) (x+n)^((m−n))        (x+n)^((m)) = Σ_k C(n,k) m^(k) (x+k)^((m−k))

5) Two sums involving the binomial coefficients:

    ∆C(x, m) = C(x, m−1)        ∆ⁿC(x, m) = C(x, m−n)

    Σ_k C(n,k)(−1)^k C(x+k, m) = (−1)ⁿ C(x, m−n)        C(x+n, m) = Σ_k C(n,k) C(x, m−k)

6) Another case with binomial coefficients:

    ∆C(p+x, m+x) = C(p+x, m+x+1)        ∆ⁿC(p+x, m+x) = C(p+x, m+x+n)

and, for x = 0:

    Σ_k C(n,k)(−1)^k C(p+k, m+k) = (−1)ⁿ C(p, m+n)        C(p+n, m+n) = Σ_k C(n,k) C(p, m+k)

7) And now a case with the inverse of a binomial coefficient:

    ∆C(x, m)^{−1} = −(m/(m+1)) C(x+1, m+1)^{−1}        ∆ⁿC(x, m)^{−1} = (−1)ⁿ (m/(m+n)) C(x+n, m+n)^{−1}

    Σ_k C(n,k)(−1)^k C(x+k, m)^{−1} = (m/(m+n)) C(x+n, m+n)^{−1}
    C(x+n, m)^{−1} = Σ_k C(n,k)(−1)^k (m/(m+k)) C(x+k, m+k)^{−1}
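Several of these pairs of sums can be bundled into one numerical check; an illustrative Python sketch (ours, not the book's):

```python
from math import comb

def fib(n):  # Fibonacci numbers with F_0 = 0, F_1 = 1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

R = range(7)

# Newton: sum_k C(n,k) (p-1)^k = p^n  (from Delta^n p^x = (p-1)^n p^x)
ok1 = all(sum(comb(n, k) * (p - 1) ** k for k in range(n + 1)) == p ** n
          for n in R for p in range(2, 6))

# Fibonacci: F_{x+n} = sum_k C(n,k) F_{x-k}
ok2 = all(fib(x + n) == sum(comb(n, k) * fib(x - k) for k in range(n + 1))
          for n in R for x in range(6, 13))

# Vandermonde: C(x+n, m) = sum_k C(n,k) C(x, m-k)
ok3 = all(comb(x + n, m)
          == sum(comb(n, k) * comb(x, m - k) for k in range(min(n, m) + 1))
          for n in R for x in R for m in range(x + n + 1))
print(ok1, ok2, ok3)
```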

8) Two sums with the central binomial coefficients:

    ∆(C(2x,x)/4^x) = −(1/(2(x+1))) C(2x,x)/4^x
    ∆ⁿ(C(2x,x)/4^x) = ((−1)ⁿ/4ⁿ) (2n)!/(n!(x+1)(x+2)···(x+n)) · C(2x,x)/4^x = (−1)ⁿ (C(2n,n)/4ⁿ) C(x+n,n)^{−1} C(2x,x)/4^x

    Σ_k C(n,k)(−1)^k C(2x+2k, x+k)/4^k = (C(2n,n)/4ⁿ) C(x+n,n)^{−1} C(2x,x)
    Σ_k C(n,k)(−1)^k (C(2k,k)/4^k) C(x+k,k)^{−1} C(2x,x) = C(2x+2n, x+n)/4ⁿ

9) Two sums with the inverse of the central binomial coefficients:

    ∆(4^x C(2x,x)^{−1}) = (4^x/(2x+1)) C(2x,x)^{−1}
    ∆ⁿ(4^x C(2x,x)^{−1}) = (−1)^{n−1} ((2n)!/(n! 2ⁿ (2n−1))) · 4^x C(2x,x)^{−1}/((2x+1)(2x+3)···(2x+2n−1))    (n > 0)

    Σ_k C(n,k)(−1)^k 4^k C(2x+2k, x+k)^{−1} = −((2n)!/(n! 2ⁿ (2n−1))) C(2x,x)^{−1}/((2x+1)(2x+3)···(2x+2n−1))    (n > 0)

We can obviously invent as many expressions as we desire and, correspondingly, may obtain some summation formulas of combinatorial interest.

6.10 The Addition Operator

The addition operator S is analogous to the difference operator:

    S = E + 1

and in fact a simple connection exists between the two operators:

    S(−1)^x f(x) = (−1)^{x+1} f(x+1) + (−1)^x f(x) = (−1)^{x+1}(f(x+1) − f(x)) = (−1)^{x−1}∆f(x)

Because of this connection, the addition operator has not been widely considered in the literature, and the symbol S is used here only for convenience. Like the difference operator, the addition operator can be iterated and often produces interesting combinatorial sums according to the rules:

    Sⁿ = (E + 1)ⁿ = Σ_k C(n,k) E^k        Eⁿ = (S − 1)ⁿ = Σ_k C(n,k)(−1)^{n−k} S^k

Some examples are in order here:

1) Fibonacci numbers are typical:

    SFm = F(m+1) + F(m) = F(m+2)        SⁿFm = F(m+2n)

    Σ_k C(n,k) F(m+k) = F(m+2n)        Σ_k C(n,k)(−1)^k F(m+2k) = (−1)ⁿ F(m+n)

2) Here are the binomial coefficients:

    S C(m, x) = C(m, x+1) + C(m, x) = C(m+1, x+1)        Sⁿ C(m, x) = C(m+n, x+n)

    Σ_k C(n,k) C(m, x+k) = C(m+n, x+n)        Σ_k C(n,k)(−1)^k C(m+k, x+k) = (−1)ⁿ C(m, x+n)

3) And finally the inverse of binomial coefficients:

    S C(m, x)^{−1} = ((m+1)/m) C(m−1, x)^{−1}        Sⁿ C(m, x)^{−1} = ((m+1)/(m−n+1)) C(m−n, x)^{−1}

    Σ_k C(n,k) C(m, x+k)^{−1} = ((m+1)/(m−n+1)) C(m−n, x)^{−1}
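The addition-operator identities can be spot-checked in the same style; a Python sketch (ours, illustrative only):

```python
from math import comb

def fib(n):  # F_0 = 0, F_1 = 1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

R = range(8)

# S^n F_m = F_{m+2n}:  sum_k C(n,k) F_{m+k} = F_{m+2n}
ok1 = all(sum(comb(n, k) * fib(m + k) for k in range(n + 1)) == fib(m + 2 * n)
          for n in R for m in R)

# E^n = (S-1)^n:  sum_k C(n,k) (-1)^k F_{m+2k} = (-1)^n F_{m+n}
ok2 = all(sum(comb(n, k) * (-1) ** k * fib(m + 2 * k) for k in range(n + 1))
          == (-1) ** n * fib(m + n)
          for n in R for m in R)

# S^n C(m,x) = C(m+n, x+n):  sum_k C(n,k) C(m, x+k) = C(m+n, x+n)
ok3 = all(sum(comb(n, k) * comb(m, x + k) for k in range(n + 1))
          == comb(m + n, x + n)
          for n in R for m in R for x in range(m + 1))
print(ok1, ok2, ok3)
```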

This derivation shows that the two operators S and ∆ commute. We can directly verify this property:

    S∆ = (E + 1)(E − 1) = E² − 1 = (E − 1)(E + 1) = ∆S
    S∆f(x) = ∆Sf(x) = f(x + 2) − f(x) = (E² − 1)f(x)

Consequently, we have the two expansions:

    ∆ⁿSⁿ = (E² − 1)ⁿ = Σ_k C(n,k)(−1)^{n−k} E^{2k}        E^{2n} = (∆S + 1)ⁿ = Σ_k C(n,k) ∆^k S^k

A simple example is offered by the Fibonacci numbers:

    ∆SFm = F(m+1)        (∆S)ⁿFm = F(m+n)

    Σ_k C(n,k)(−1)^k F(m+2k) = (−1)ⁿ F(m+n)        Σ_k C(n,k) F(m+k) = F(m+2n)

but these identities have already been proved using the addition operator S.

6.11 Definite and Indefinite Summation

The following result is one of the most important rules connecting the finite operator method and combinatorial sums. We observe that the operator E commutes with the indeterminate z; in fact:

    Σ_{k=0}^{n} E^k = [zⁿ] Σ_k E^k z^k · 1/(1−z) = [zⁿ] 1/((1−z)(1−Ez)) =
                    = [zⁿ] (1/(E−1)) (E/(1−Ez) − 1/(1−z)) = (E^{n+1} − 1)/(E − 1) = (E^{n+1} − 1)∆^{−1}

and here ∆^{−1} is just the operator inverse of the difference. In order to make this point clear, let us consider any function f(x) and suppose that a function g(x) exists such that ∆g(x) = f(x). Then:

    Σ_{k=0}^{n} f(x+k) = g(x+n+1) − g(x)

The rule above is called the rule of definite summation. This is analogous to the rule of definite integration: in the usual mathematical terminology, a primitive function for f(x) is any function ĝ(x) such that Dĝ(x) = f(x), or D^{−1}f(x) = ∫f(x)dx = ĝ(x); the fundamental theorem of the integral calculus relates definite and indefinite integration:

    ∫_a^b f(x)dx = ĝ(b) − ĝ(a)

The formula for definite summation can be written in a similar way, if we consider the integer variable k and set a = x and b = x + n + 1:

    Σ_{k=a}^{b−1} f(k) = g(b) − g(a)

These facts create an analogy between ∆^{−1} and D^{−1}, which can be stressed by considering the formal properties of ∆^{−1}. The operator ∆^{−1} is called indefinite summation and is often denoted by Σ. First of all, we observe that g(x) = Σf(x) is not uniquely determined. If C(x) is any function periodic of period 1, i.e., C(x+k) = C(x) ∀k ∈ Z, we have:

    ∆(g(x) + C(x)) = ∆g(x) + ∆C(x) = ∆g(x)

and therefore Σf(x) = g(x) + C(x). When f(x) is a sequence and only is defined for integer values of x, the function C(x) reduces to a constant, which is constant with respect to the variable x on which E operates, and plays the same rôle as the integration constant in the operation of indefinite integration.

The operator Σ is obviously linear:

    Σ(αf(x) + βg(x)) = αΣf(x) + βΣg(x)

This is proved by applying the operator ∆ to both sides. An important property is summation by parts, corresponding to the well-known rule of the indefinite integration operator. Let us begin with the rule for the difference of a product, which we proved earlier:

    ∆(f(x)g(x)) = ∆f(x)Eg(x) + f(x)∆g(x)

By applying the operator Σ to both sides and exchanging terms, we have:

    Σ(f(x)∆g(x)) = f(x)g(x) − Σ(∆f(x)Eg(x))
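The rule of definite summation is pure telescoping, as a quick Python check shows (ours, purely illustrative):

```python
from fractions import Fraction

# If Delta g = f, then sum_{k=0}^{n} f(x+k) = g(x+n+1) - g(x).
def check(f, g, x, n):
    return sum(f(x + k) for k in range(n + 1)) == g(x + n + 1) - g(x)

# g(x) = x(x-1)/2 has Delta g(x) = x
ok1 = all(check(lambda x: x, lambda x: Fraction(x * (x - 1), 2), x, n)
          for x in range(6) for n in range(10))

# g(x) = -1/x has Delta g(x) = 1/(x(x+1))
ok2 = all(check(lambda x: Fraction(1, x * (x + 1)), lambda x: Fraction(-1, x), x, n)
          for x in range(1, 6) for n in range(10))
print(ok1, ok2)
```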

This formula allows us to change a sum of products, of which we know that the second factor is a difference, into a sum involving the difference of the first factor. The transformation can be convenient every time the difference of the first factor is simpler than the difference of the second factor. For example, let us perform the following indefinite summation, using C(x, m) = ∆C(x, m+1):

    Σ x C(x, m) = Σ x ∆C(x, m+1) = x C(x, m+1) − Σ(∆x · E C(x, m+1)) =
                = x C(x, m+1) − Σ C(x+1, m+1) = x C(x, m+1) − C(x+1, m+2)

Obviously, this indefinite sum can be transformed into a definite sum by using the first result in this section:

    Σ_{k=a}^{b} k C(k, m) = (b+1) C(b+1, m+1) − C(b+2, m+2) − a C(a, m+1) + C(a+1, m+2)

and for a = 0 and b = n:

    Σ_{k=0}^{n} k C(k, m) = (n+1) C(n+1, m+1) − C(n+2, m+2)

6.12 Definite Summation

In a sense, the rule:

    Σ_{k=0}^{n} E^k = (E^{n+1} − 1)∆^{−1} = (E^{n+1} − 1)Σ

is the most important result of the operator method: it reduces the sum of the successive elements in a sequence to the computation of the indefinite sum. Unfortunately, ∆^{−1} is not easy to compute and, apart from a restricted number of cases, there is no general rule allowing us to guess what ∆^{−1}f(x) = Σf(x) might be; sometimes, we do not have a simple function, and the sum may not have any combinatorial interest. In this rather pessimistic sense, the rule is very general and completely useless. From a more positive point of view, however, we can say that whenever we know, in some way or another, an expression for ∆^{−1}f(x) = Σf(x), we have solved the problem of finding Σ_{k=0}^{n} f(x+k). In fact, we can look at the differences computed in the previous sections and, for each of them, immediately obtain the Σ of some function. Here is a number of identities obtained by our previous computations.

1) We have again the partial sums of the geometric series:

    ∆^{−1} p^x = p^x/(p−1) = Σp^x
    Σ_{k=0}^{n} p^{x+k} = (p^{x+n+1} − p^x)/(p−1)        Σ_{k=0}^{n} p^k = (p^{n+1} − 1)/(p−1)    (x = 0)

2) The sum of consecutive Fibonacci numbers:

    ∆^{−1} Fx = F(x+1) = ΣFx
    Σ_{k=0}^{n} F(x+k) = F(x+n+2) − F(x+1)        Σ_{k=0}^{n} F(k) = F(n+2) − 1    (x = 0)

3) The sum of consecutive binomial coefficients with constant denominator:

    ∆^{−1} C(x, m) = C(x, m+1) = Σ C(x, m)
    Σ_{k=0}^{n} C(x+k, m) = C(x+n+1, m+1) − C(x, m+1)        Σ_{k=0}^{n} C(k, m) = C(n+1, m+1)    (x = 0)

4) The sum of consecutive binomial coefficients:

    ∆^{−1} C(p+x, m+x) = C(p+x, m+x−1) = Σ C(p+x, m+x)
    Σ_{k=0}^{n} C(p+k, m+k) = C(p+n+1, m+n) − C(p, m−1)

5) The sum of falling factorials:

    ∆^{−1} x^(m) = x^(m+1)/(m+1) = Σ x^(m)
    Σ_{k=0}^{n} (x+k)^(m) = ((x+n+1)^(m+1) − x^(m+1))/(m+1)        Σ_{k=0}^{n} k^(m) = (n+1)^(m+1)/(m+1)    (x = 0)
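Identities 1)–3) are easy to confirm numerically; an illustrative Python check (ours):

```python
from math import comb

def fib(n):  # F_0 = 0, F_1 = 1
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

N = range(12)

# 1) sum_{k=0}^{n} p^k = (p^{n+1} - 1)/(p - 1)
ok1 = all(sum(p ** k for k in range(n + 1)) == (p ** (n + 1) - 1) // (p - 1)
          for n in N for p in range(2, 6))

# 2) sum_{k=0}^{n} F_k = F_{n+2} - 1
ok2 = all(sum(fib(k) for k in range(n + 1)) == fib(n + 2) - 1 for n in N)

# 3) sum_{k=0}^{n} C(k,m) = C(n+1, m+1)
ok3 = all(sum(comb(k, m) for k in range(n + 1)) == comb(n + 1, m + 1)
          for n in N for m in range(6))
print(ok1, ok2, ok3)
```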

6) The sum of raising factorials:

    ∆^{−1} x^((m)) = (x−1)^((m+1))/(m+1) = Σ x^((m))
    Σ_{k=0}^{n} (x+k)^((m)) = ((x+n)^((m+1)) − (x−1)^((m+1)))/(m+1)        Σ_{k=0}^{n} k^((m)) = n^((m+1))/(m+1)    (x = 0)

7) The sum of inverse binomial coefficients:

    ∆^{−1} C(x, m)^{−1} = −(m/(m−1)) C(x−1, m−1)^{−1} = Σ C(x, m)^{−1}
    Σ_{k=0}^{n} C(x+k, m)^{−1} = (m/(m−1)) (C(x−1, m−1)^{−1} − C(x+n, m−1)^{−1})

8) The sum of harmonic numbers. Since 1 = ∆x, summation by parts gives:

    ∆^{−1} Hx = xH(x) − x = ΣHx
    Σ_{k=0}^{n} H(x+k) = (x+n+1)H(x+n+1) − xH(x) − (n+1)
    Σ_{k=0}^{n} H(k) = (n+1)H(n+1) − (n+1) = (n+1)H(n) − n    (x = 0)

6.13 The Euler-McLaurin Summation Formula

One of the most striking applications of the finite operator method is the formal proof of the Euler-McLaurin summation formula. The starting point is the Taylor theorem for the series expansion of a function f(x) ∈ C^∞, i.e., a function having derivatives of any order:

    f(x + h) = f(x) + (h/1!)f′(x) + (h²/2!)f″(x) + ··· + (hⁿ/n!)f⁽ⁿ⁾(x) + ···

For h = 1, the theorem can be interpreted in the sense of operators as a result connecting the shift and the differentiation operators:

    Ef(x) = If(x) + (D/1!)f(x) + (D²/2!)f(x) + ···

and therefore as a relation between operators:

    E = 1 + D/1! + D²/2! + ··· + Dⁿ/n! + ··· = e^D

This formal identity relates the finite operator E and the infinitesimal operator D; by subtracting 1 from both sides, it can be formulated as:

    ∆ = e^D − 1

By inverting, we have a formula for the Σ operator:

    Σ = 1/(e^D − 1) = (1/D) · D/(e^D − 1)

Now, in D/(e^D − 1) we recognize the generating function of the Bernoulli numbers, and therefore we have a development for Σ:

    Σ = (1/D)(B0 + (B1/1!)D + (B2/2!)D² + ···) = D^{−1} − (1/2)I + (1/12)D − (1/720)D³ + (1/30240)D⁵ − (1/1209600)D⁷ + ···

Since D^{−1} is the operator of indefinite integration, the rule of definite summation immediately gives:

    Σ_{k=0}^{n−1} f(k) = ∫_0^n f(x)dx − (1/2)[f(x)]_0^n + (1/12)[f′(x)]_0^n − (1/720)[f‴(x)]_0^n + ···

and this is the celebrated Euler-McLaurin summation formula. It expresses a sum as a function of the integral and the successive derivatives of the function f(x). This is not a series development since, as we know, the Bernoulli numbers diverge to infinity: we have a case of asymptotic development, which only is defined when we consider a limited number of terms, but in general diverges if we let the number of terms go to infinity. The number of terms for which the sum approaches its true value depends on the function f(x) and on the argument x. In this sense, the formula can be seen as a method for approximating a sum by means of an integral or, vice versa, for approximating an integral by means of a sum, and this was just the point of view of the mathematicians who first developed it.

As a simple but very important example, let us find an asymptotic development for the harmonic numbers Hn. Since Hn = H(n−1) + 1/n, the Euler-McLaurin formula applies to H(n−1) and to the function f(x) = 1/x, giving:

    H(n−1) = Σ_{k=1}^{n−1} 1/k = ∫_1^n dx/x − (1/2)[1/x]_1^n − (1/12)[1/x²]_1^n + (1/120)[1/x⁴]_1^n − ··· =
           = ln n + (1/2 + 1/12 − 1/120 + ···) − 1/(2n) − 1/(12n²) + 1/(120n⁴) − ···

In this expression a number of constants appears, and they can be summed together to form a constant γ. It is not easy to evaluate γ directly from this expression (we cannot consider the limit of the series of constants); however, we observe that as n → ∞:

    lim_{n→∞} (H(n−1) − ln n) = γ = 0.577215664902...

and this constant γ is the Euler-Mascheroni constant. By adding 1/n to both sides of the previous relation, we eventually find:

    Hn = ln n + γ + 1/(2n) − 1/(12n²) + 1/(120n⁴) − 1/(252n⁶) + ···

and this is the asymptotic expansion we were looking for.
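The quality of the truncated expansion of Hn can be judged numerically; a Python sketch (ours, with the value of γ hard-coded):

```python
from math import log

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def H_exact(n):
    return sum(1.0 / k for k in range(1, n + 1))

def H_approx(n):
    # H_n ~ ln n + gamma + 1/(2n) - 1/(12 n^2) + 1/(120 n^4)
    return log(n) + GAMMA + 1 / (2 * n) - 1 / (12 * n ** 2) + 1 / (120 * n ** 4)

err = abs(H_exact(100) - H_approx(100))
print(err < 1e-12)  # the first omitted term is -1/(252 n^6), about 4e-15 here
```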

6.14 Applications of the Euler-McLaurin Formula

As a first application of the Euler-McLaurin summation formula, we now show the derivation of Stirling's approximation for n!. The first step consists in taking the logarithm of that quantity:

    ln n! = ln 1 + ln 2 + ln 3 + ··· + ln n

so that we are reduced to compute a sum, and hence to apply the Euler-McLaurin formula:

    ln(n−1)! = Σ_{k=1}^{n−1} ln k = ∫_1^n ln x dx − (1/2)[ln x]_1^n + (1/12)[1/x]_1^n − ···

Here we have used the fact that ∫ ln x dx = x ln x − x. Therefore:

    ln(n−1)! = n ln n − n + 1 − (1/2) ln n + (1/12)(1/n − 1) − (1/360)(1/n³ − 1) + ···

At this point we can add ln n to both sides and introduce a constant σ = 1 − 1/12 + 1/360 − ···. It is not by any means easy to determine directly the value of σ, but by other approaches to the same problem it is known that σ = ln √(2π); numerically, 1 − 1/12 + 1/360 ≈ 0.919(4) and ln √(2π) ≈ 0.9189385. We can now go on with our sum:

    ln n! = n ln n − n + (1/2) ln n + ln √(2π) + 1/(12n) − 1/(360n³) + ···

To obtain the value of n! we only have to take exponentials:

    n! = (nⁿ/eⁿ) √(2πn) exp(1/(12n)) exp(−1/(360n³)) ··· = (nⁿ/eⁿ) √(2πn) (1 + 1/(12n) + 1/(288n²) + ···)

This is the well-known Stirling approximation for n!. By means of this approximation, we can also find the approximation for another important quantity, the central binomial coefficient:

    C(2n, n) = (2n)!/n!² ≈ (√(4πn)(2n/e)^{2n}(1 + 1/(24n) + ···)) / (2πn (n/e)^{2n}(1 + 1/(12n) + 1/(288n²) + ···)²) =
             = (4ⁿ/√(πn)) (1 − 1/(8n) + 1/(128n²) + ···)

Another application of the Euler-McLaurin summation formula is given by the sum Σ_{k=1}^{n} k^p, where p is any constant different from −1. When p is a positive integer, we have the following formula, which only contains a finite number of terms:

    Σ_{k=0}^{n−1} k^p = ∫_0^n x^p dx − (1/2)[x^p]_0^n + (p/12)[x^{p−1}]_0^n − (p(p−1)(p−2)/720)[x^{p−3}]_0^n + ··· =
                      = n^{p+1}/(p+1) − n^p/2 + p n^{p−1}/12 − p(p−1)(p−2) n^{p−3}/720 + ···

In this case the evaluation at 0 does not introduce any constant; by adding n^p to both sides, we find:

    Σ_{k=1}^{n} k^p = n^{p+1}/(p+1) + n^p/2 + p n^{p−1}/12 − p(p−1)(p−2) n^{p−3}/720 + ···

If p is not an integer, after ⌈p⌉ differentiations we obtain a power x^q with q < 0, so the development no longer terminates; starting the summation at k = 1, the evaluations at the lower limit now produce constants.
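Both approximations can be tested against exact values; illustrative Python (ours):

```python
from math import comb, e, factorial, pi, sqrt

def stirling(n):
    # n! ~ sqrt(2 pi n) (n/e)^n (1 + 1/(12n) + 1/(288 n^2))
    return sqrt(2 * pi * n) * (n / e) ** n * (1 + 1 / (12 * n) + 1 / (288 * n * n))

def central(n):
    # C(2n,n) ~ 4^n / sqrt(pi n) * (1 - 1/(8n) + 1/(128 n^2))
    return 4 ** n / sqrt(pi * n) * (1 - 1 / (8 * n) + 1 / (128 * n * n))

n = 50
rel1 = abs(stirling(n) - factorial(n)) / factorial(n)
rel2 = abs(central(n) - comb(2 * n, n)) / comb(2 * n, n)
print(rel1 < 1e-6, rel2 < 1e-6)  # relative errors are O(1/n^3) here
```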

Proceeding as before, but with lower limit 1, we obtain:

    Σ_{k=1}^{n} k^p = n^{p+1}/(p+1) + n^p/2 + p n^{p−1}/12 − p(p−1)(p−2) n^{p−3}/720 + ··· + Kp

The constant:

    Kp = −1/(p+1) + 1/2 − p/12 + p(p−1)(p−2)/720 + ···

has a fundamental rôle when the leading term n^{p+1}/(p+1) does not increase with n, i.e., when p < −1. In that case, in fact, the sum converges to Kp. When p > −1 the constant is less important. For p = −2 we find:

    Σ_{k=1}^{n} 1/k² = K(−2) − 1/n + 1/(2n²) − 1/(6n³) + 1/(30n⁵) − ···

It is possible to show that K(−2) = π²/6, and therefore we have a way to approximate the sum (see Section 2.7). For example, we have:

    Σ_{k=1}^{n} √k = (2/3) n√n + √n/2 + K(1/2) + 1/(24√n) + ···        K(1/2) ≈ −0.2078862...

    Σ_{k=1}^{n} 1/√k = 2√n + K(−1/2) + 1/(2√n) − 1/(24n√n) + ···        K(−1/2) ≈ −1.4603545...
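For p = −2 the formula gives a fast way to recover π²/6 from a short partial sum; illustrative Python (ours):

```python
from math import pi

n = 20
partial = sum(1 / k ** 2 for k in range(1, n + 1))
# sum_{k=1}^{n} 1/k^2 = K_{-2} - 1/n + 1/(2 n^2) - 1/(6 n^3) + 1/(30 n^5) - ...
K2 = partial + 1 / n - 1 / (2 * n ** 2) + 1 / (6 * n ** 3) - 1 / (30 * n ** 5)
print(abs(K2 - pi ** 2 / 6) < 1e-10)  # the next correction term is ~2e-11
```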

Chapter 7

Asymptotics

7.1 The convergence of power series

In many occasions, we have pointed out that our approach to power series was purely formal: we always spoke of "formal power series" and never considered convergence problems. Obviously, a lot of things can be said about formal power series; but now the moment has arrived that we must turn to talk about the convergence of power series. We will see that this allows us to evaluate the asymptotic behavior of the coefficients fn of a power series Σ fn tⁿ, thus solving many problems in which the exact value of fn cannot be found; many times, the asymptotic evaluation of fn can be made more precise, and an actual approximation of fn can be found.

The natural setting for talking about convergence is the field C of the complex numbers and therefore, from now on, we will think of the indeterminate t as of a variable taking its values from C. A series can converge for the single value t0 = 0, as it happens for Σ n! tⁿ, or can converge for every value t ∈ C, as for Σ tⁿ/n! = e^t. There are cases for which a series neither converges nor diverges: for example, the series Σ (−1)ⁿtⁿ does not tend to any limit when t = 1. A series f(t) diverges in t0 ∈ C iff lim_{t→t0} f(t) = ∞; when we say that a series does not converge (to a finite value) for a given value t0 ∈ C, we mean that the series in t0 diverges or does not tend to any limit.

A basic result on convergence is given by the following:

Theorem 7.1.1  Let f(t) = Σ fn tⁿ be a power series such that f(t0) converges for the value t0 ∈ C. Then f(t) converges for every t1 ∈ C such that |t1| < |t0|.

Proof: If f(t0) < ∞ then an index N ∈ N exists such that for every n > N we have |fn t0ⁿ| ≤ M for some finite M ∈ R, i.e., |fn| < M/|t0|ⁿ. Therefore:

    Σ_{n=N}^{∞} |fn||t1|ⁿ ≤ Σ_{n=N}^{∞} (M/|t0|ⁿ)|t1|ⁿ = M Σ_{n=N}^{∞} (|t1|/|t0|)ⁿ < ∞

because the last sum is a geometric series with |t1|/|t0| < 1, by the hypothesis |t1| < |t0|. Since the first N terms obviously amount to a finite quantity, the theorem follows.

In a similar way, we can prove that if the series diverges for some value t0 ∈ C, then it diverges for every value t1 such that |t1| > |t0|. Therefore, the previous theorem implies:

Theorem 7.1.2  Let f(t) = Σ fn tⁿ be a power series. Then there exists a non-negative number R ∈ R, or R = ∞, such that:

1. for every complex number t0 such that |t0| < R the series (absolutely) converges and, moreover, the convergence is uniform in every circle of radius ρ < R;

2. for every complex number t0 such that |t0| > R the series does not converge.

The value of R is uniquely determined and is called the radius of convergence of the series. The uniform convergence derives from the previous proof: the constant M can be made unique by choosing the largest value for all the t0 such that |t0| ≤ ρ.

Obviously, for r < R we have |fn|rⁿ ≤ M, or ⁿ√|fn| ≤ ⁿ√M / r → 1/r; this implies lim sup ⁿ√|fn| ≤ 1/R. For r > R, instead, we have |fn|rⁿ ≥ 1 for infinitely many n, and this implies lim sup ⁿ√|fn| ≥ 1/R. Therefore we have the following formula for the radius of convergence:

    1/R = lim sup_{n→∞} ⁿ√|fn|
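For the Fibonacci generating function t/(1 − t − t²), whose radius of convergence is 1/φ, the root-test formula can be observed numerically; illustrative Python (ours):

```python
from math import sqrt

# f(t) = sum F_n t^n has R = 1/phi, so |F_n|^(1/n) -> phi and F_{n+1}/F_n -> phi.
phi = (1 + sqrt(5)) / 2
a, b = 0, 1
for _ in range(200):
    a, b = b, a + b          # a = F_200 after the loop

root = a ** (1 / 200)        # n-th root of |F_n|
ratio = b / a                # ratio of consecutive coefficients
ok = abs(root - phi) < 1e-2 and abs(ratio - phi) < 1e-9
print(ok)
```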

This result is the basis for our considerations on the asymptotics of the coefficients of a power series: as a first approximation, |fn| grows as 1/Rⁿ, where R is the radius of convergence. In practice, this is a rough estimate, because |fn| can also grow, for example, as n/Rⁿ or as 1/(nRⁿ); many possibilities arise, which can make the basic approximation more precise, and the next sections will be dedicated to this problem. We conclude by noticing that if:

    lim_{n→∞} f(n+1)/fn = S

then R = 1/S is the radius of convergence of the series.

7.2 The method of Darboux

First of all, let us show how it is possible to obtain an approximation for the binomial coefficient C(γ, n), when γ ∈ C is a fixed number and n is large. We begin by proving the following formula for the ratio of two large values of the Γ function (a, b being two small parameters with respect to n):

    Γ(n+a)/Γ(n+b) = n^{a−b} (1 + (a−b)(a+b−1)/(2n) + O(1/n²))

Let us apply the Stirling formula for the Γ function:

    Γ(n+a) ≈ √(2π/(n+a)) ((n+a)/e)^{n+a} (1 + 1/(12(n+a)) + ···)
    Γ(n+b) ≈ √(2π/(n+b)) ((n+b)/e)^{n+b} (1 + 1/(12(n+b)) + ···)

We now obtain asymptotic approximations in the following way:

    (1 + x/n)^{n+x} = exp((n+x) ln(1 + x/n)) = exp((n+x)(x/n − x²/(2n²) + ···)) = exp(x + x²/(2n) + ···) = e^x (1 + x²/(2n) + ···)

so that:

    (n+a)^{n+a}/(n+b)^{n+b} = n^{a−b} (1+a/n)^{n+a}/(1+b/n)^{n+b} ≈ n^{a−b} e^{a−b} (1 + (a² − b²)/(2n))
    √((n+b)/(n+a)) ≈ 1 + (b − a)/(2n)

Multiplying everything (the factor e^{a−b} cancels against the factor e^{b−a} coming from the powers of e), the two corrections combine and we find:

    Γ(n+a)/Γ(n+b) = n^{a−b} (1 + (a² − b² − a + b)/(2n) + O(1/n²)) = n^{a−b} (1 + (a−b)(a+b−1)/(2n) + O(1/n²))

from which the desired formula follows.

Let us remember that if g(t) has positive real coefficients, then gn/g(n+1) tends to the radius of convergence of g(t). We are now in a position to prove the following theorem, whose relevance was noted by Bender and which, therefore, will be called Bender's theorem:

Theorem 7.2.1  Let f(t) = g(t)h(t), where f(t), g(t), h(t) are power series, and let h(t) have a radius of convergence larger than f(t)'s (which therefore equals the radius of convergence of g(t)). If lim_{n→∞} gn/g(n+1) = b and h(b) ≠ 0, then:

    fn ∼ h(b) gn

The proof of this theorem is omitted here. Newton's rule is the basis for many considerations on asymptotics; for example, we used it to prove that Fn ∼ φⁿ/√5, and many other proofs can be performed by using Newton's rule together with Bender's theorem. Let us give a simple example: suppose we wish to find the asymptotic value of the Motzkin numbers, whose generating function is:

    μ(t) = (1 − t − √(1 − 2t − 3t²)) / (2t²)

Since 1 − 2t − 3t² = (1 + t)(1 − 3t), for n ≥ 2 we obviously have:

    μn = [tⁿ] (1 − t − √(1 − 2t − 3t²))/(2t²) = −(1/2) [t^{n+2}] √(1+t) (1 − 3t)^{1/2}

The radius of convergence of g(t) = (1 − 3t)^{1/2} is R = 1/3, and therefore gn/g(n+1) → 1/3 as n → ∞, while h(t) = √(1+t) has 1 as radius of convergence. By Bender's theorem we find:

    μn ∼ −(1/2) √(1 + 1/3) [t^{n+2}] (1 − 3t)^{1/2} = −(1/√3) C(1/2, n+2)(−3)^{n+2} = (1/√3) · (3^{n+2}/(2n+3)) · C(2n+4, n+2)/4^{n+2}

because [t^m](1 − 3t)^{1/2} = C(1/2, m)(−3)^m = −(C(2m, m)/((2m−1)4^m)) 3^m for m ≥ 1. Using the approximation C(2m, m) ≈ 4^m/√(πm), this gives:

    μn ≈ 3^{n+2} / ((2n+3) √(3π(n+2))) ∼ (3√3/(2√π)) 3ⁿ n^{−3/2}

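The prediction of Bender's theorem for the Motzkin numbers can be tested numerically. This sketch (our code, not from the text) computes μn exactly with the standard recurrence (n+2)μn = (2n+1)μ_{n−1} + 3(n−1)μ_{n−2} and compares it with the principal value (√3/2)·3^{n+1}/√(πn³) obtained above.

```python
import math

def motzkin(n):
    """Motzkin numbers via (k+2)*M_k = (2k+1)*M_{k-1} + 3*(k-1)*M_{k-2}."""
    m = [1, 1]
    for k in range(2, n + 1):
        m.append(((2 * k + 1) * m[k - 1] + 3 * (k - 1) * m[k - 2]) // (k + 2))
    return m[n]

def motzkin_principal(n):
    """Principal value from Bender's theorem."""
    return math.sqrt(3) * 3 ** (n + 1) / (2 * math.sqrt(math.pi) * n ** 1.5)

assert motzkin(10) == 2188                    # exact value, as a sanity check
ratio = motzkin_principal(500) / motzkin(500)
assert abs(ratio - 1) < 0.02                  # within 2% at n = 500
```

The relative error decreases like 1/n, in accordance with the O(1/n) correction terms discarded by the theorem.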
7.3 Singularities: poles

The considerations in the previous sections show how important it is to determine the radius of convergence of a series. We are now going to look more closely at methods for finding the radius of convergence of a given function f(t), when we wish to have an approximation of its coefficients. In order to do this, we have to distinguish between the function f(t) and its series development, which, for the moment, will be denoted by f̂(t). The series f̂(t) represents f(t) inside the circle of convergence, in the sense that f̂(t) = f(t) for every t internal to this circle; but f̂(t) can be different from f(t) on the border of the circle or outside it, where the series does not converge. For example, for f(t) = 1/(1−t) we have f̂(t) = 1 + t + t² + t³ + ···; for some |t| > 1, say t = 2, f(2) = −1 is a well-defined value, while f̂(2) diverges, and therefore f̂(t) must be different from f(t).

We will call a singularity of f(t) every point t0 ∈ C such that in every neighborhood of t0 there is a t′ for which f(t′) = f̂(t′) and a t for which f(t) and f̂(t) behave differently. By this definition, no singularity can be contained in the circle of convergence; therefore, the radius of convergence is determined by the singularity or singularities of smallest modulus. These will be called the dominating singularities, and we have R = |t0| whenever t0 is any one of the dominating singularities of f(t). We observe explicitly that a function can have more than one dominating singularity: for example, f(t) = 1/(1−t²) has both t = 1 and t = −1 as dominating singularities, because |1| = |−1|.

A first case is when t0 is an isolated point for which lim_{t→t0} f(t) = ∞. An isolated point t0 for which f(t0) = ∞ is a singularity for f(t): in fact, the series cannot converge at t0 to the value ∞, and therefore f(t) and f̂(t) behave differently there. So, for example, t0 = 1 is a singularity for f(t) = 1/(1−t). The following situation is very important: if f(t0) = ∞ and we set α = 1/t0, we will say that t0 is a pole for f(t) iff there exists a positive integer m such that:

    lim_{t→t0} (1 − αt)^m f(t) = K,    K ≠ 0, K < ∞.

The integer m is called the order of the pole. By this definition, the function f(t) = 1/(1−t) has a pole of order 1 at t0 = 1, while 1/(1−t)² has a pole of order 2 at t0 = 1 and 1/(1−2t)^5 has a pole of order 5 at t0 = 1/2. A more interesting case is f(t) = (e^t − e)/(1−t)², which has a pole of order 1 at t0 = 1, notwithstanding the (1−t)² appearing in the denominator; in fact:

    lim_{t→1} (1−t) (e^t − e)/(1−t)² = lim_{t→1} (e^t − e)/(1−t) = lim_{t→1} e^t/(−1) = −e.

The generating function of the Bernoulli numbers, f(t) = t/(e^t − 1), has infinitely many poles: the denominator becomes 0 when e^t = 1, and this happens when t = 2kπi, because e^{2kπi} = cos 2kπ + i sin 2kπ = 1. Observe first that t = 0 is not a pole, because:

    lim_{t→0} t/(e^t − 1) = 1.

Therefore, the dominating singularities are t0 = 2πi and t1 = −2πi. In a similar way, the generating function of the ordered Bell numbers, f(t) = 1/(2 − e^t), has an infinite number of poles t = ln 2 + 2kπi; in this case the dominating singularity is unique, t0 = ln 2.

Let us now show how it is possible to obtain an approximation for the binomial coefficient binom(γ, n), when γ ∈ C is a fixed number, not a positive integer, and n is large. We have:

    binom(γ, n) = γ(γ−1)···(γ−n+1)/n! = (−1)^n (n−γ−1)(n−γ−2)···(1−γ)(−γ)/Γ(n+1).

By repeated applications of the recurrence formula for the Γ function, Γ(x+1) = xΓ(x), we find:

    Γ(n−γ) = (n−γ−1)(n−γ−2)···(1−γ)(−γ) Γ(−γ),

and therefore:

    binom(γ, n) = (−1)^n Γ(n−γ)/(Γ(n+1) Γ(−γ)).

By the formula for the ratio of two large values of the Γ function, we obtain:

    binom(γ, n) = ((−1)^n/Γ(−γ)) n^{−1−γ} (1 + γ(γ+1)/(2n) + O(1/n²)).

We are now in a position to prove the following theorem, by simply applying Bender's theorem and the formula just found for approximating the binomial coefficient:

Theorem 7.3.2 Let f(t) = h(t)(1−αt)^γ, with γ not a positive integer, and h(t) having a radius of convergence larger than 1/|α|. Then we have:

    fn = [t^n] f(t) ∼ h(1/α) binom(γ, n) (−α)^n ∼ α^n h(1/α)/(Γ(−γ) n^{1+γ}).

We conclude this section by observing that if f(t0) = ∞, not necessarily t0 is a pole for f(t); besides, not every singularity of f(t) is such that f(t0) = ∞. In general, the radius of convergence can be determined if we are able to find the values t0 at which f(t) and f̂(t) begin to behave differently.

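The approximation of binom(γ, n) can be verified directly. The following sketch (our code; Γ(−γ) is taken from the standard library) uses γ = 1/2, for which Γ(−1/2) = −2√π:

```python
import math

def gbinom(g, n):
    """Generalized binomial coefficient: g(g-1)...(g-n+1)/n!."""
    c = 1.0
    for k in range(n):
        c *= (g - k) / (k + 1)
    return c

def gbinom_approx(g, n):
    """(-1)^n * n^(-1-g) / Gamma(-g) * (1 + g(g+1)/(2n))."""
    return (-1) ** n * n ** (-1 - g) / math.gamma(-g) * (1 + g * (g + 1) / (2 * n))

g, n = 0.5, 200
assert abs(gbinom_approx(g, n) / gbinom(g, n) - 1) < 1e-3   # residual error is O(1/n^2)
```

Including the 1/(2n) correction pushes the error down to O(1/n²), as the formula promises.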
7.4 Poles and asymptotics

When the dominating singularities of a function are poles, a direct application of Bender's theorem is sufficient, and this is the way we will use in the following examples.

Fibonacci numbers are easily approximated. Since 1 − t − t² = (1 − φt)(1 − φ̂t), where φ = (1+√5)/2 and φ̂ = (1−√5)/2:

    Fn = [t^n] t/(1 − t − t²) = [t^n] (t/(1 − φ̂t)) · 1/(1 − φt) ∼ (1/(φ − φ̂)) φ^n = φ^n/√5,

because g(t) = 1/(1 − φt) has g_n/g_{n+1} = 1/φ, while h(t) = t/(1 − φ̂t) gives h(1/φ) = (1/φ)/(1 − φ̂/φ) = 1/(φ − φ̂) = 1/√5.

Our second example concerns a particular kind of permutations, called derangements (see Section 2.2). A derangement is a permutation without any fixed point. For n = 0 the empty permutation is considered a derangement; for n = 1 there is no derangement, since the single element is fixed; for n = 2 the only derangement is (1 2), written in cycle notation; for n = 3 we have the two derangements (1 2 3) and (1 3 2), and for n = 4 we have a total of 9 derangements.

Let Dn be the number of derangements in Pn. We can count them in the following way. We begin by subtracting from n!, the total number of permutations, the number of permutations having at least one fixed point: if the fixed point is 1, we have (n−1)! possible permutations of the other elements; if the fixed point is 2, we have again (n−1)! permutations, and so on; in this way we have a total of n(n−1)! cases, giving the approximation Dn = n! − n(n−1)!. This quantity is clearly 0, and this happens because we have subtracted twice every permutation with at least 2 fixed points: a permutation fixing both 1 and 2, say, was subtracted when we considered the first fixed point and again when we considered the second. Therefore, we now have to add back the permutations with at least two fixed points; these are obtained by choosing the two fixed points in all the binom(n,2) possible ways, and then permuting the n − 2 remaining elements:

    Dn = n! − n(n−1)! + binom(n,2)(n−2)!.

In this way, however, we added twice the permutations with at least three fixed points, which have to be subtracted again:

    Dn = n! − n(n−1)! + binom(n,2)(n−2)! − binom(n,3)(n−3)! + ···.

Going on in the same way, by the method which is called the inclusion-exclusion principle, we eventually arrive at the final value:

    Dn = n! − n!/1! + n!/2! − n!/3! + ··· = n! Σ_{k=0}^{n} (−1)^k/k!.

This formula checks with the previously found values. We obtain the exponential generating function G(Dn/n!) by observing that the sum above is the partial sum of the series for e^{−1}; by the theorem on the generating function of partial sums we have:

    G(Dn/n!) = e^{−t}/(1 − t).

In order to find the asymptotic value of Dn, a direct application of Bender's theorem is again sufficient: we observe that the radius of convergence of 1/(1−t) is 1, while e^{−t} converges for every value of t. By Bender's theorem we have:

    Dn/n! ∼ e^{−1},    or    Dn ∼ n!/e.

This formula is a good approximation also for small values of n; actually, Dn can be computed as the integer nearest to n!/e.

Not every singularity, however, is a pole. Let us consider the generating function of the central binomial coefficients, f(t) = 1/√(1 − 4t). For t0 = 1/4 we have f(1/4) = ∞, but t0 is not a pole of order 1 because:

    lim_{t→1/4} (1 − 4t) · 1/√(1 − 4t) = lim_{t→1/4} √(1 − 4t) = 0,

and the same happens if we try with (1 − 4t)^m for m > 1. Therefore, t0 = 1/4 is not a pole of any order; this kind of singularity is called "algebraic", as we shall see in the next section. Finally, whenever we have a function f(t) for which a point t0 ∈ C exists such that for every m > 0:

    lim_{t→t0} (1 − t/t0)^m f(t) = ∞,

we say that t0 is an essential singularity for f(t). Essential singularities are points at which f(t) goes to ∞ too fast. For example, let us consider the function f(t) = exp(1/(1−t)) and its expansion:

    (1−t)^m exp(1/(1−t)) = (1−t)^m (1 + 1/(1−t) + 1/(2!(1−t)²) + ··· + 1/(m!(1−t)^m) + ···);

as t → 1, the first m terms tend to 0, the term of index m tends to 1/m!, but all the other terms go to ∞. Therefore, these singularities cannot be treated by Darboux's method, and their study will be delayed until we study Hayman's method.

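The statement that Dn is the integer nearest to n!/e is easy to check on a computer (a sketch; the one-line recurrence Dn = n·D_{n−1} + (−1)^n is equivalent to the inclusion-exclusion formula):

```python
import math

def derangements(n):
    """D_n = n * D_{n-1} + (-1)^n, with D_0 = 1 (inclusion-exclusion)."""
    d = 1
    for k in range(1, n + 1):
        d = k * d + (-1) ** k
    return d

assert derangements(4) == 9                   # as counted in the text
for n in range(1, 16):
    assert derangements(n) == round(math.factorial(n) / math.e)
```

The rounding works for every n because the error |Dn − n!/e| = n!·|Σ_{k>n}(−1)^k/k!| is smaller than 1/(n+1).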
Let us now see how Bender's theorem is applied to the exponential generating function of the ordered Bell numbers. We have shown that the dominating singularity of f(t) = 1/(2 − e^t) is a pole of order 1 at t = ln 2; in fact:

    lim_{t→ln 2} (1 − t/ln 2)/(2 − e^t) = lim_{t→ln 2} (−1/ln 2)/(−e^t) = 1/(2 ln 2).

Therefore:

    On/n! = [t^n] 1/(2 − e^t) = [t^n] ((1 − t/ln 2)/(2 − e^t)) · 1/(1 − t/ln 2) ∼ (1/(2 ln 2)) (1/ln 2)^n,

and we conclude with the very good approximation:

    On ∼ n!/(2 (ln 2)^{n+1}).

We already observed that ±2πi are the two dominating singularities of the generating function of the Bernoulli numbers. When a function has several dominating singularities, the following principle applies:

Principle: If t1, t2, ..., tk are all the dominating singularities of a function f(t), then [t^n] f̂(t) can be found by summing all the contributions obtained by independently considering the k singularities.

In our case, both singularities are poles of order 1: since e^t − 1 ∼ (t − 2πi) as t → 2πi, we have:

    lim_{t→2πi} (1 − t/(2πi)) t/(e^t − 1) = lim_{t→2πi} (−(t − 2πi)/(2πi)) · t/(t − 2πi) = −1,

and the same computation at t = −2πi also gives −1. Therefore:

    Bn/n! = [t^n] t/(e^t − 1) ∼ −1/(2πi)^n − 1/(−2πi)^n.

When n is odd, these two values are opposite in sign and the result is 0; this confirms that the Bernoulli numbers of odd index are 0 (except for n = 1). When n is even, say n = 2k, we have (2πi)^{2k} = (−2πi)^{2k} = (−1)^k (2π)^{2k}, and therefore:

    B_{2k} ∼ −2(−1)^k (2k)!/(2π)^{2k}.

This formula is a good approximation also for small values of k, and shows that the Bernoulli numbers become larger and larger, in modulus, as k increases.

7.5 Algebraic and logarithmic singularities

In the real field we can choose the positive value as the result of a square root (the arithmetic square root instead of the algebraic square root), but when we pass to complex numbers this is no longer possible: in the complex field C, a function containing a square root is a two-valued function, and there are two branches defined by the same expression. The points at which the argument of the root becomes 0 are special points: in them the function is one-valued. A point t0 for which the argument of a square root is 0 is called an algebraic singularity; in every neighborhood of t0 the function is two-valued.

When a power series f̂(t) corresponds to such a function, f̂(t) coincides with one of the two branches inside the circle of convergence. Let us consider the generating function of the Catalan numbers, f(t) = (1 − √(1 − 4t))/(2t), and the corresponding power series f̂(t) = 1 + t + 2t² + 5t³ + 14t⁴ + ···; our choice of the '−' sign was motivated by the initial condition of the recurrence C_{n+1} = Σ_{k=0}^{n} C_k C_{n−k} defining the Catalan numbers. In order to see that t0 = 1/4 is a singularity, consider a t ∈ R with t > 1/4: the expression under the square root is a negative real number, and therefore f(t) ∈ C∖R; but f̂(t), a series with real coefficients evaluated at a real argument, can only be a real number or fail to converge. Because we know that when f̂(t) converges we must have f̂(t) = f(t), we conclude that f̂(t) cannot converge; this shows that t0 = 1/4 is a singularity for f(t).

Every kth root originates the same problem, the function being actually a k-valued function. The same considerations hold when a function contains a logarithm: a logarithm is an infinite-valued function, because it is the inverse of the exponential, which, in the complex field, is a periodic function; in fact:

    e^{t+2kπi} = e^t e^{2kπi} = e^t (cos 2kπ + i sin 2kπ) = e^t.

The period of e^t is therefore 2πi, and ln t is actually ln t + 2kπi, for k ∈ Z. A point t0 for which the argument of a logarithm is 0 is a singularity for the corresponding function, called a logarithmic singularity; in every neighborhood of t0 the function has an infinite number of branches.

Algebraic singularities can be treated by Darboux's method or, directly, by Newton's rule; in fact, we already used this approach to find the asymptotic evaluation of the Motzkin numbers.

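The approximation for the ordered Bell numbers is so accurate because the remaining poles ln 2 + 2kπi are much farther from the origin than ln 2. A numerical sketch (our code and names):

```python
import math

def ordered_bell(n):
    """Ordered Bell (Fubini) numbers: O_m = sum_{k=1}^m C(m,k) O_{m-k}, O_0 = 1."""
    o = [1]
    for m in range(1, n + 1):
        o.append(sum(math.comb(m, k) * o[m - k] for k in range(1, m + 1)))
    return o[n]

for n in range(1, 13):
    approx = math.factorial(n) / (2 * math.log(2) ** (n + 1))
    assert round(approx) == ordered_bell(n)   # exact after rounding, for these n
```

For larger n the absolute contribution of the neglected poles eventually exceeds 1/2, so, unlike the derangement case, the rounding trick cannot be pushed arbitrarily far.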
7.6 The method of subtracted singularities

The methods presented in the preceding sections only give the expression describing the general behavior of the coefficients fn in the expansion f(t) = Σ_{k=0}^{∞} fk t^k — what is called the principal value for fn. Sometimes this behavior is only achieved for very large values of n, while for smaller values the principal value is just a rough approximation of the true value; because of that, we speak of "asymptotic evaluation" or "asymptotic approximation". When we need a true approximation, we should introduce some corrections, which slightly modify the general behavior and more accurately evaluate the true value of fn.

The following observation solves the problem. Suppose we have found, by one of the previous methods, that f(t) ∼ g(t) for some function g(t) of which we exactly know the coefficients gn, so that fn ∼ gn. Then the function h(t) = f(t) − g(t) has coefficients hn that grow more slowly than fn: formally, since O(fn) = O(gn), we must have hn = o(fn). Therefore, the quantity gn + hn is a better approximation of fn than gn alone, and the successive application of this method of subtracted singularities produces better and better approximations.

If t0 is a dominating pole of order m and α = 1/t0, in this way we find the expansion:

    f(t) = A_{−m}/(1 − αt)^m + A_{−m+1}/(1 − αt)^{m−1} + A_{−m+2}/(1 − αt)^{m−2} + ···.

If t0 is a dominating algebraic singularity and f(t) = h(t)(1 − αt)^{p/m}, where α = 1/t0 and h(t) has a radius of convergence larger than |t0|, we find the expansion:

    f(t) = A_p (1 − αt)^{p/m} + A_{p−1} (1 − αt)^{(p−1)/m} + A_{p−2} (1 − αt)^{(p−2)/m} + ···.

Newton's rule can obviously be used to pass from these expansions to the asymptotic value of fn. When f(t) has a dominating logarithmic singularity — this is the case of ln(1/(1 − αt)) — the singularity cannot be eliminated: repeated subtractions reduce its weight but never remove it. As we shall see, this is the only fact distinguishing a logarithmic singularity from an algebraic one.

Let us consider, as an example, the sum:

    Sn = Σ_{k=1}^{n} 2^{k−1}/k = 1 + 2/2 + 4/3 + 8/4 + ··· + 2^{n−1}/n,

and let us compute an approximate value for it. Since Σ_{k≥1} (2t)^k/k = ln(1/(1 − 2t)), the theorem on the generating function of partial sums gives:

    Sn = (1/2) [t^n] (1/(1−t)) ln(1/(1 − 2t)).

There are two singularities: t = 1 is a pole, while t = 1/2 is a logarithmic singularity; since the latter has smaller modulus, it is dominating, and R = 1/2 is the radius of convergence of the function. The principal value is therefore:

    Sn ∼ (1/2) · (1/(1 − 1/2)) [t^n] ln(1/(1 − 2t)) = [t^n] ln(1/(1 − 2t)) = 2^n/n.

This is not a very good approximation, but the method of subtracted singularities allows us to obtain corrections to the principal value. Let us consider the new function:

    h(t) = (1/2)(1/(1−t)) ln(1/(1 − 2t)) − ln(1/(1 − 2t)) = −((1 − 2t)/(2(1−t))) ln(1/(1 − 2t)).

The factor (1 − 2t) actually reduces the order of growth of the logarithm:

    hn = −[t^n] ((1 − 2t)/(2(1−t))) ln(1/(1 − 2t)) ∼ −(1/(2(1 − 1/2))) [t^n] (1 − 2t) ln(1/(1 − 2t)) = −(2^n/n − 2 · 2^{n−1}/(n−1)) = 2^n/(n(n−1)).

Therefore, a better approximation for Sn is:

    Sn ≈ 2^n/n + 2^n/(n(n−1)).

The reader can easily verify that this correction greatly reduces the error in the evaluation of Sn.

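The successive corrections to Sn can be compared on a computer (a sketch with our naming):

```python
def S_exact(n):
    """S_n = sum_{k=1}^n 2^(k-1)/k, computed directly."""
    return sum(2 ** (k - 1) / k for k in range(1, n + 1))

def S_approx(n, terms):
    """Principal value plus up to two corrections from subtracted singularities."""
    a = [2 ** n / n,
         2 ** n / (n * (n - 1)),
         2 ** (n + 1) / (n * (n - 1) * (n - 2))]
    return sum(a[:terms])

n = 30
errs = [abs(S_approx(n, t) - S_exact(n)) / S_exact(n) for t in (1, 2, 3)]
assert errs[2] < errs[1] < errs[0]   # each correction reduces the error
assert errs[2] < 1e-3                # three terms are within 0.1% at n = 30
```

Each subtracted term removes another power of n from the relative error, exactly as the expansion predicts.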
A further correction can now be obtained by observing that 2(1−t) = 1 + (1 − 2t), so that:

    Sn = [t^n] (1/(2(1−t))) ln(1/(1 − 2t)) = [t^n] (1 − (1 − 2t) + (1 − 2t)² − (1 − 2t)³ + ···) ln(1/(1 − 2t)).

The term with (1 − 2t)² gives:

    kn = [t^n] (1 − 2t)² ln(1/(1 − 2t)) = 2^n/n − 4 · 2^{n−1}/(n−1) + 4 · 2^{n−2}/(n−2) = 2^{n+1}/(n(n−1)(n−2)),

and therefore:

    Sn ≈ 2^n/n + 2^n/(n(n−1)) + 2^{n+1}/(n(n−1)(n−2)),

which is the same result as the one previously obtained by the method of subtracted singularities. This correction is still smaller, and, in general, each factor (1 − 2t)^j produces a contribution which is the sum of various terms of the form q_k [t^{n−k}] ln(1/(1 − 2t)), as many as there are terms in the polynomial (1 − 2t)^j.

7.7 The asymptotic behavior of a trinomial square root

In many problems we arrive at a generating function of the form:

    f(t) = p(t)/√((1 − αt)(1 − βt))    or    g(t) = (p(t) − √((1 − αt)(1 − βt)))/(r t^m),

where p(t) is a correcting polynomial, m is a small integer and r a constant. In the second case, a minus sign should precede the square root because, in most combinatorial problems, the coefficients are positive numbers; moreover, p(t) has no effect on fn for n sufficiently large, and we have:

    fn = −(1/r) [t^{n+m}] √((1 − αt)(1 − βt)).

It is therefore interesting to compute, once and for all, the asymptotic value of [t^n] ((1 − αt)(1 − βt))^s, where s = 1/2 or s = −1/2 (the case α = β has no interest, and the case α = −β should be approached in another way). Let us suppose that |α| > |β|; in most combinatorial problems we have α > 0, because the coefficients of f(t) are positive numbers, but this is not a limiting factor. This hypothesis means that t = 1/α is the radius of convergence of the function, and we can develop everything around this dominating singularity.

Let us consider s = 1/2. Writing:

    √((1 − αt)(1 − βt)) = √((α−β)/α) (1 − αt)^{1/2} (1 + β(1 − αt)/(α−β))^{1/2}

and expanding the last factor in powers of (1 − αt), the evaluation — shown in detail in Table 7.1 — gives:

    [t^n] √((1 − αt)(1 − βt)) = −√((α−β)/α) (binom(2n,n)/(4^n (2n−1))) α^n (1 − 3β/(2(α−β)(2n−3)) − 15β²/(8(α−β)²(2n−3)(2n−5)) + O(1/n³)).

The reader is invited to carry out the analogous computation for the case s = −1/2, which gives:

    [t^n] 1/√((1 − αt)(1 − βt)) = √(α/(α−β)) (binom(2n,n)/4^n) α^n (1 + β/(2(α−β)(2n−1)) + 9β²/(8(α−β)²(2n−1)(2n−3)) + O(1/n³)) ∼ √(α/((α−β)πn)) α^n.

7.8 Hayman's method

The method for coefficient evaluation which uses the function's singularities (Darboux's method), as we have seen, can be improved and made more accurate by the technique of subtracted singularities; the formula so obtained can be considered sufficient for obtaining both the asymptotic evaluation of fn and a suitable numerical approximation. Alternatively, we can obtain the same results by expanding the function h(t) in f(t) = g(t)h(t) around the dominating singularity. Unfortunately, these methods become useless when the function f(t) has no singularity at all (entire functions) or when the dominating singularity is essential: in the former case we do not have any singularity to operate on, and in the latter the development around the singularity gives rise to a series with an infinite number of terms of negative degree.

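As a concrete check of the s = −1/2 case (a sketch of ours, not from the text): for α = 3, β = −1 we have 1/√((1 − 3t)(1 + t)) = 1/√(1 − 2t − 3t²), whose coefficients are the central trinomial coefficients Tn = [x^n](1 + x + x²)^n, and the formula predicts Tn ∼ 3^n √(3/(4πn)).

```python
import math

def central_trinomial(n):
    """T_n = [x^n](1+x+x^2)^n via k*T_k = (2k-1)*T_{k-1} + 3*(k-1)*T_{k-2}."""
    t = [1, 1]
    for k in range(2, n + 1):
        t.append(((2 * k - 1) * t[k - 1] + 3 * (k - 1) * t[k - 2]) // k)
    return t[n]

assert central_trinomial(4) == 19                      # T: 1, 1, 3, 7, 19, ...
n = 300
approx = 3 ** n * math.sqrt(3 / (4 * math.pi * n))     # sqrt(a/((a-b)*pi*n)) * a^n
assert abs(approx / central_trinomial(n) - 1) < 0.01   # leading term, O(1/n) error
```

Adding the 1/(2n−1) and 1/((2n−1)(2n−3)) correction terms of the formula would reduce the error further, to O(1/n³).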
Table 7.1: the case s = 1/2. The computation runs as follows:

    [t^n] (1 − αt)^{1/2} (1 − βt)^{1/2}
      = √((α−β)/α) [t^n] (1 − αt)^{1/2} (1 + β(1 − αt)/(α−β))^{1/2}
      = √((α−β)/α) [t^n] ((1 − αt)^{1/2} + (β/(2(α−β))) (1 − αt)^{3/2} − (β²/(8(α−β)²)) (1 − αt)^{5/2} + ···)
      = −√((α−β)/α) (binom(2n,n) α^n/(4^n (2n−1))) (1 − 3β/(2(α−β)(2n−3)) − 15β²/(8(α−β)²(2n−3)(2n−5)) + O(1/n³)),

where we used:

    [t^n](1 − αt)^{1/2} = −binom(2n,n) α^n/(4^n(2n−1)),
    [t^n](1 − αt)^{3/2} = 3 binom(2n,n) α^n/(4^n(2n−1)(2n−3)),
    [t^n](1 − αt)^{5/2} = −15 binom(2n,n) α^n/(4^n(2n−1)(2n−3)(2n−5)).

In the cases of entire functions or essential singularities, the only general tool seems to be Cauchy's theorem, which allows us to evaluate [t^n] f(t) by means of an integral:

    fn = (1/(2πi)) ∮_γ f(t)/t^{n+1} dt,

where γ is a suitable path enclosing the origin. We do not intend to develop this method here; we limit ourselves to sketching a method, derived from Cauchy's theorem, which allows us to find an asymptotic evaluation of fn in many practical situations. The development of the method was mainly performed by Hayman, and it is therefore known as Hayman's method; this also justifies the use of the letter H in the following definition.

A function is called H-admissible if and only if it belongs to one of the following classes, or can be obtained, in a finite number of steps, from other H-admissible functions according to the following rules:

1. if f(t) and g(t) are H-admissible functions and p(t) is a polynomial with real coefficients and positive leading term, then exp(f(t)), f(t) + g(t), f(t) + p(t) and p(f(t)) are H-admissible;

2. if p(t) is a polynomial with positive coefficients, not of the form p(t^k) for some k > 1, then the function exp(p(t)) is H-admissible;

3. if α, β are positive real numbers and γ, δ are real numbers, then the function:

    f(t) = exp(β (1 − t)^{−α} (ln(1/(1−t)))^γ) · (ln(1/(1−t)))^δ

is H-admissible.

For example, the following functions are all H-admissible:

    e^t,    exp(t + t²/2),    exp(t/(1−t)),

and similar compositions. In particular, for the third function we have:

    exp(t/(1−t)) = exp(1/(1−t) − 1) = (1/e) exp(1/(1−t)),

and naturally a constant factor does not influence the H-admissibility of a function; in this example we have α = β = 1 and γ = δ = 0.

For H-admissible functions, the following result holds:

Theorem 7.8.1 Let f(t) be an H-admissible function. Then:

    fn = [t^n] f(t) ∼ f(r)/(r^n √(2π b(r)))    as n → ∞,

where r = r(n) is the least positive solution of the equation t f′(t)/f(t) = n, and b(t) is the function:

    b(t) = t (d/dt)(t f′(t)/f(t)).

The proof of this theorem is based on Cauchy's theorem and is beyond the scope of these notes. The method can be implemented on a computer in the following sense: given a function f(t), in an algorithmic way we can check whether f(t) belongs to the class of H-admissible functions and, if that is the case, we can evaluate the principal value of the asymptotic estimate for fn. The system ΛΥΩ, by Flajolet, Salvy and Zimmermann, realizes this method. In order to clarify the application of Hayman's method, let us now show some examples.

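Before the worked examples, Hayman's theorem can be checked in two lines for f(t) = e^t (a sketch): here r = n and b(r) = n, so the estimate e^n/(n^n √(2πn)) must approach 1/n! — Stirling's formula in disguise.

```python
import math

def hayman_exp_coeff(n):
    """Hayman estimate of [t^n] e^t: f(r) / (r^n * sqrt(2*pi*b(r))), r = b(r) = n."""
    return math.exp(n) / (n ** n * math.sqrt(2 * math.pi * n))

n = 50
exact = 1 / math.factorial(n)
assert abs(hayman_exp_coeff(n) / exact - 1) < 0.01   # error ~ 1/(12n)
```

The residual error is the familiar 1/(12n) correction of Stirling's series, which Hayman's theorem (a principal-value statement) does not capture.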
7.9 Examples of Hayman's theorem

Examples soon become rather complex and require a large amount of computations; the first one, however, can be easily verified. Let f(t) = e^t be the exponential function, so that we know fn = 1/n!. For applying Hayman's theorem, we have to solve the equation t e^t/e^t = n, which gives r = n. The function b(t) is simply t, and therefore we have:

    [t^n] e^t ∼ e^n/(n^n √(2πn)),

in which we immediately recognize the Stirling approximation for factorials.

As a second example, let us consider the following sum:

    Σ_{k=0}^{n−1} binom(n−1, k) (1/k! − 1/(k+1)!).

By manipulating the binomial coefficients and passing to generating functions, the sum reduces to:

    [t^n] (1/(1−t)) exp(t/(1−t)) − [t^{n−1}] (1/(1−t)) exp(t/(1−t)) = [t^n] ((1−t)/(1−t)) exp(t/(1−t)) = [t^n] exp(t/(1−t)),

and therefore we can try to evaluate the asymptotic behavior of the sum by studying the coefficients of g(t) = exp(t/(1−t)). We have already seen that this function is H-admissible. Since g′(t)/g(t) = 1/(1−t)², the value of r is given by the minimal positive solution of:

    t/(1−t)² = n,    or    n t² − (2n+1) t + n = 0.

Because Δ = (2n+1)² − 4n² = 4n + 1, we have the two solutions:

    r = (2n + 1 ± √(4n+1))/(2n),

and we must accept the one with the '−' sign, which is positive and less than 1. Developing the expression inside the square root:

    √(4n+1) = 2√n √(1 + 1/(4n)) = 2√n (1 + 1/(8n) + O(1/n²)) = 2√n + 1/(4√n) + O(1/(n√n)),

we immediately obtain:

    r = 1 − 1/√n + 1/(2n) − 1/(8n√n) + O(1/n²),

which will be used in the next approximations.

First we compute an approximation for f(r), which is exp(r/(1−r)). Since:

    1 − r = 1/√n − 1/(2n) + 1/(8n√n) + O(1/n²),

we obtain:

    1/(1−r) = √n (1 − 1/(2√n) + 1/(8n) + ···)^{−1} = √n + 1/2 + 1/(8√n) + O(1/n),

and therefore:

    r/(1−r) = 1/(1−r) − 1 = √n − 1/2 + 1/(8√n) + O(1/n);

the exponential then gives:

    exp(r/(1−r)) = (e^{√n}/√e) (1 + 1/(8√n) + O(1/n)).

The second part we have to develop is 1/r^n, which can be computed when we write it as exp(n ln(1/r)):

    ln(1/r) = −ln(1 − (1/√n − 1/(2n) + ···)) = 1/√n + O(1/(n√n)),

so that 1/r^n = exp(√n + O(1/√n)). Because Hayman's method only gives the principal value of the result, the correction terms of order 1/√n can be ignored (they would not be precise anyway), and we get:

    f(r)/r^n ∼ e^{2√n}/√e.

Only b(r) remains to be computed. We have b(t) = t g′(t)? No: b(t) = t (d/dt)(t/(1−t)²), and since (d/dt)(t/(1−t)²) = (1+t)/(1−t)³, we have:

    b(t) = t(1+t)/(1−t)³.

This quantity can be computed a piece at a time:

    r(1+r) = (1 − 1/√n + ···)(2 − 1/√n + ···) = 2 − 3/√n + O(1/n),
    (1−r)³ = (1/(n√n)) (1 − 3/(2√n) + O(1/n)),

and, by using the expression already found for (1−r), we eventually get:

    b(r) = r(1+r)/(1−r)³ = 2n√n (1 + O(1/n)).

The principal value is 2n√n, and therefore:

    √(2π b(r)) = √(4π n√n) = 2√π n^{3/4}.

The final result is:

    fn = [t^n] exp(t/(1−t)) ∼ e^{2√n}/(2√(πe) n^{3/4}).

It is a simple matter to execute the original sum on a personal computer for, say, n = 100, 200, 300, and to compare the results with the evaluation given by this formula: we can thus verify that the relative error decreases as n increases.

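Following the text's suggestion, the comparison for exp(t/(1−t)) can be run on a computer. In this sketch (our code), the exact coefficients [t^n] exp(t/(1−t)) = Σ_{k=1}^{n} binom(n−1, k−1)/k! — obtained by expanding (t/(1−t))^k — are evaluated through logarithms of Γ to avoid overflow, and compared with the principal value e^{2√n − 1/2}/(2√π n^{3/4}).

```python
import math

def coeff(n):
    """[t^n] exp(t/(1-t)) = sum_{k=1}^n C(n-1, k-1)/k!, via lgamma to avoid overflow."""
    return sum(
        math.exp(math.lgamma(n) - math.lgamma(k) - math.lgamma(n - k + 1)
                 - math.lgamma(k + 1))
        for k in range(1, n + 1)
    )

def hayman_estimate(n):
    """Principal value e^(2*sqrt(n) - 1/2) / (2*sqrt(pi) * n^(3/4))."""
    return math.exp(2 * math.sqrt(n) - 0.5) / (2 * math.sqrt(math.pi) * n ** 0.75)

assert abs(coeff(4) - 73 / 24) < 1e-9        # 1 + 3/2 + 1/2 + 1/24, a sanity check
errors = [abs(hayman_estimate(n) / coeff(n) - 1) for n in (100, 200, 300)]
assert errors[2] < errors[0] < 0.1           # relative error decreases with n
```

The error decays only like 1/√n — the price of keeping just the principal value in a saddle-point estimate.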
Chapter 8

Bibliography

The birth of Computer Science and the need of analyzing the behavior of algorithms and data structures have given a strong impulse to Combinatorial Analysis and to the mathematical methods used to study combinatorial objects. The first author who systematically worked in this direction was surely Donald Knuth, who published in 1968 the first volume of his monumental "The Art of Computer Programming", the first part of which is dedicated to the mathematical methods used in Combinatorial Analysis. Without this basic knowledge, there is little hope to understand the developments of the analysis of algorithms and data structures:

Donald E. Knuth: The Art of Computer Programming: Fundamental Algorithms, Vol. I, Addison-Wesley (1968).

Numerical and probabilistic developments are to be found in the central volume:

Donald E. Knuth: The Art of Computer Programming: Seminumerical Algorithms, Vol. II, Addison-Wesley (1973).

Many additional concepts and techniques are also contained in the third volume:

Donald E. Knuth: The Art of Computer Programming: Sorting and Searching, Vol. III, Addison-Wesley (1973).

Since then, a number of books and papers relating Computer Science to the methods of Combinatorial Analysis have been produced. A concise exposition of several important techniques is given in:

Daniel H. Greene, Donald E. Knuth: Mathematics for the Analysis of Algorithms, Birkhäuser (1981).

The most systematic work in the field is perhaps the following text, very clear and very comprehensive:

Robert Sedgewick, Philippe Flajolet: An Introduction to the Analysis of Algorithms, Addison-Wesley (1995).

The following book deals, with an eye to Computer Science, with the main topics of Combinatorial Analysis; many exercises (with solutions) are proposed, and the reader is never left alone in front of the many-faceted problems he or she is encouraged to tackle:

Ronald L. Graham, Donald E. Knuth, Oren Patashnik: Concrete Mathematics, Addison-Wesley (1989).

Many other texts on Combinatorial Analysis are worth considering, because they contain information on general concepts, treated in an elementary but rigorous way, both from a combinatorial and a mathematical point of view:

William Feller: An Introduction to Probability Theory and Its Applications, Vol. I, Wiley (1950) (1957) (1968).

John Riordan: An Introduction to Combinatorial Analysis, Wiley (1958).

Louis Comtet: Advanced Combinatorics, Reidel (1974).

Ian P. Goulden, David M. Jackson: Combinatorial Enumeration, Wiley (1983); reprinted by Dover Publ. (2004).

Richard P. Stanley: Enumerative Combinatorics, Vol. I, Cambridge Univ. Press (1986) (2000).

Richard P. Stanley: Enumerative Combinatorics, Vol. II, Cambridge Univ. Press (1997) (2001).

Stanley dedicates to Catalan numbers a special survey in his second volume. Academic Press (1990). Sometimes. Renzo Sprugnol. Wiley (1974) Vol. Irene A. but here we are more interested in their combinatorial meaning. The concepts relating to the problem of searching can be found in any text on Algorithms and Data Structures. INRIA. Maple and Mathematica are among the most of Mathematical Functions. The resulting book (and the corresponding Internet site) is one of the most important reference point: N. Now they are accepted almost universally. so we limit our considerations to some classical cases relative to Computer Science. Probably. the quantity we consider are common to all parts of Mathematics. A practical device has been realized in Maple: Bruno Salvy.com/ njas/sequences/. Stirling numbers are very important in Numerical Analysis. Sequences studied in Combinatorial Analysis have been collected and annotated by Sloane. For the mathematical concepts ematics is available by means of several Computer of Γ and ψ functions. even if it is comand Henrici. The American Mathematical Monthly (to appear). The number of applications of generating functions is almost inﬁnite. Simon Plouﬀe: The Encyclopedia of Integer Sequences Academic Press (1995). and their theory has been developed in many of the quoted texts. they are the most frequently used quantity in Combinatorics. These systems have become an essential tool for developing any new aspect of Mathematics. *** Chapter 5: Riordan arrays *** Chapter 3: Formal Power Series Formal power series have a long tradition. almost every part of Mathmon in Mathematics.94 *** Chapter 1: Introduction CHAPTER 8. Since their introduction in Mathematics. (1977) Vol. in particular Volume III of the quoted The Lagrange Inversion Formula can be found in “The Art of Computer Programming”. the reader can ﬁnd their algebraic foundation in the book: Peter Henrici: Applied and Computational Complex Analysis. other software is available. I. 
generating functions have been a controversial concept. The ζ function and the Bernoulli numbers are also covered by the book of Abramowitz and Stegun. Sloane. used systems. 143 (1992). Knuth and Patashnik is appropriate.research. Dover Publ. . Nowadays. in particular of Combinatorial Analysis. Rapport Technique N. in particular in Stanley be consulted for Landau notation. we intentionally complicate proofs in order to stress the use of generating function as a unifying approach. and from this fact their relevance in Combinatorics and in the Analysis of Algorithms originates. So. In particular: Herbert S. implementing on a computer the Milton Abramowitz. the reader is referred to: Algebra Systems. Harmonic numbers are the “discrete” version of logarithms. *** Chapter 2: Special numbers The quoted book by Graham. M. III. *** Chapter 4: Generating functions In our present approach the concept of an (ordinary) generating function is essential. Wilf: Generatingfunctionology. due to de Moivre in the XVIII Century. (1972). a formalization can be found in: Donatella Merlini. BIBLIOGRAPHY Coeﬃcient extraction is central in our approach. Paul Zimmermann: GFUN: a Maple package for the manipulation of generating and holonomic functions in one variable. and this justiﬁes our insistence on the two operators “generating function” and “coeﬃcient of”. and many. Cecilia Verri: The method of coeﬃcients. Stegun: Handbook ideas described in this chapter. our proofs should be compared with those given (for the same problems) by Knuth or Flajolet and Sedgewick.att. just after binomial coeﬃcients and before Fibonacci numbers. free or not. Volume I can most of the quoted texts. (1986) Vol. Anyhow. A. many others. Available at: http://www. II. J.
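The "coefficient of" operator on which this approach insists can be experimented with directly. The following sketch is ours, not taken from any of the quoted texts: it expands a rational generating function as a truncated formal power series and extracts [z^n], using the Fibonacci generating function z/(1 − z − z²) as the test case.

```python
# Illustrative sketch (ours, not from the quoted texts): the "coefficient of"
# operator [z^n] applied to the Fibonacci generating function
#     F(z) = z / (1 - z - z^2) = sum_{n>=0} F_n z^n.
# The reciprocal series is computed by formal long division.

def series_inverse(denom, order):
    """Truncated power series of 1/denom(z); denom[0] must be 1."""
    inv = [0] * order
    inv[0] = 1
    for n in range(1, order):
        inv[n] = -sum(denom[k] * inv[n - k]
                      for k in range(1, min(n, len(denom) - 1) + 1))
    return inv

def series_mul(a, b, order):
    """Cauchy product of two truncated power series."""
    c = [0] * order
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < order:
                c[i + j] += ai * bj
    return c

def coefficient_of(n, num, denom):
    """[z^n] num(z)/denom(z), both given as coefficient lists."""
    order = n + 1
    return series_mul(num, series_inverse(denom, order), order)[n]

# F(z) = z / (1 - z - z^2): numerator [0, 1], denominator [1, -1, -1]
fib = [coefficient_of(n, [0, 1], [1, -1, -1]) for n in range(10)]
print(fib)  # the Fibonacci numbers 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
```

The same two helpers suffice for any rational generating function; only the numerator and denominator coefficient lists change.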

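A Riordan array, the subject of Chapter 5, also admits a very small computational illustration. The sketch below is ours, not from the quoted references; it uses only the standard definition d_{n,k} = [z^n] d(z) h(z)^k, with the classical pair d(z) = 1/(1 − z), h(z) = z/(1 − z) that reproduces the Pascal triangle.

```python
# Sketch (ours): a Riordan array is a pair (d(z), h(z)) of formal power
# series with generic element  d_{n,k} = [z^n] d(z) * h(z)^k.
# The pair d(z) = 1/(1-z), h(z) = z/(1-z) yields the Pascal triangle.
from math import comb

N = 8  # truncation order for all series

def mul(a, b):
    """Cauchy product, truncated at order N."""
    c = [0] * N
    for i in range(N):
        for j in range(N - i):
            c[i + j] += a[i] * b[j]
    return c

# 1/(1-z) = 1 + z + z^2 + ... ;  z/(1-z) = z + z^2 + z^3 + ...
d = [1] * N
h = [0] + [1] * (N - 1)

# Build the triangle column by column: column k has g.f. d(z) * h(z)^k.
col = d[:]
riordan = []
for k in range(N):
    riordan.append(col[:])
    col = mul(col, h)

# riordan[k][n] = d_{n,k}; compare with the binomial coefficients C(n, k).
pascal_ok = all(riordan[k][n] == comb(n, k)
                for n in range(N) for k in range(N))
print(pascal_ok)  # True
```

Replacing d(z) and h(z) with other pairs of series gives other classical triangles in exactly the same way.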
*** Chapter 5: Riordan arrays

Riordan arrays are part of the general method of coefficients and are particularly important in proving combinatorial identities and generating function transformations. They were introduced by:

Louis W. Shapiro, Seyoum Getu, Wen-Jin Woan, Leon C. Woodson: The Riordan group, Discrete Applied Mathematics 34 (1991) 229 – 239.

Their practical relevance was noted in:

Renzo Sprugnoli: Riordan arrays and combinatorial sums, Discrete Mathematics 132 (1994) 267 – 290.

The theory was further developed in:

Donatella Merlini, Douglas G. Rogers, Renzo Sprugnoli, M. Cecilia Verri: On some alternative characterizations of Riordan arrays, Canadian Journal of Mathematics 49 (1997) 301 – 320.

Riordan arrays are strictly related to "convolution matrices" and to "Umbral calculus", although, rather strangely, nobody seems to have noticed the connections of these concepts and combinatorial sums:

Donald E. Knuth: Convolution polynomials, The Mathematica Journal 2 (1992) 67 – 78.

Steve Roman: The Umbral Calculus, Academic Press (1984).

Other methods to prove combinatorial identities are important in Combinatorial Analysis. We quote:

G. P. Egorychev: Integral Representation and the Computation of Combinatorial Sums, American Math. Society Translations, Vol. 59 (1984).

Collections of combinatorial identities are:

Henry W. Gould: Combinatorial Identities. A Standardized Set of Tables Listing 500 Binomial Coefficient Summations, West Virginia University (1972).

John Riordan: Combinatorial Identities, Wiley (1968).

Josef Kaucky: Combinatorial Identities, Veda, Bratislava (1975).

In particular, the method of Wilf and Zeilberger completely solves the problem of combinatorial identities "with hypergeometric terms", provided the two members of the identity ∑_k L(k, n) = R(n) have a special form: the ratios L(k + 1, n)/L(k, n), L(k, n + 1)/L(k, n) and R(n + 1)/R(n) must all be rational functions in n and k. This actually means that L(k, n) and R(n) are composed of factorials, powers and binomial coefficients, and in this case we can algorithmically establish whether an identity is true or false. In this sense, Riordan arrays are less powerful, but can be used also when non-hypergeometric terms are involved.

Doron Zeilberger: A fast algorithm for proving terminating hypergeometric identities, Discrete Mathematics 80 (1990) 207 – 211.

Doron Zeilberger: A holonomic systems approach to special functions identities, Journal of Computational and Applied Mathematics 32 (1990) 321 – 368.

Marko Petkovšek, Herbert S. Wilf, Doron Zeilberger: A = B, A. K. Peters (1996).

*** Chapter 6: Formal Methods

In this chapter we have considered two important methods: the symbolic method for deducing counting generating functions from the syntactic definition of combinatorial objects, and the method of "operators" for obtaining combinatorial identities from relations between transformations of sequences defined by operators. The symbolic method was started by Schützenberger and Viennot, who devised a technique to automatically generate counting generating functions from a context-free non-ambiguous grammar:

Marco Schützenberger: Context-free languages and pushdown automata, Information and Control 6 (1963) 246 – 264.

Maylis Delest, Xavier Viennot: Algebraic languages and polyominoes enumeration, X Colloquium on Automata, Languages and Programming, Lecture Notes in Computer Science (1983) 173 – 181.

When the grammar defines a class of combinatorial objects, this method gives a direct way to obtain monovariate or multivariate generating functions, which allow to solve many problems relative to the given objects. Since context-free languages only define algebraic generating functions (the subset of regular grammars is limited to rational functions), the method is not very general. The method was extended by Flajolet to some classes of exponential generating functions and implemented in Maple:

Philippe Flajolet: Symbolic enumerative combinatorics and complex asymptotic analysis, Algorithms Seminar (2001). Available at: http://algo.inria.fr/seminars/sem0001/flajolet.html

The method of operators is very old and was developed in the XIX Century by English mathematicians, especially George Boole. Actually, the method is used in Numerical Analysis, but it has a clear connection with Combinatorial Analysis. A classical book in this direction is:

Charles Jordan: Calculus of Finite Differences, Chelsea Publ. (1965).

We have used a rather descriptive approach, limiting our considerations to elementary cases. The important concepts of indefinite and definite summations are used by Wilf and Zeilberger in the quoted texts.

*** Chapter 7: Asymptotics

The methods treated in the previous chapters are "exact", in the sense that every time they give the solution to a problem, this solution is a precise formula. This, however, is not always possible, and many times we are not able to find a solution of this kind. In these cases, we would also be content with an approximate solution, provided we can give an upper bound to the error committed. The purpose of asymptotic methods is just that, as our numerous examples show. The natural settings for these problems are Complex Analysis and the theory of series; these situations are only touched upon here, but covered by the quoted texts, especially Knuth. The Euler-McLaurin summation formula is the first connection between finite methods (considered up to this moment) and asymptotics. The method of Hayman is covered in:

Micha Hofri: Probabilistic Analysis of Algorithms, Springer (1987).
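The Schützenberger–Viennot translation can be sketched on the smallest interesting example. The non-ambiguous grammar for Dyck words, D → ε | ( D ) D, translates symbolically into the counting functional equation D(z) = 1 + z·D(z)², where z marks one pair of parentheses; its coefficients are the Catalan numbers. The code below is our own illustration (not from the quoted papers): it solves the equation by fixed-point iteration on truncated power series.

```python
# Sketch (ours) of the symbolic method: the non-ambiguous Dyck grammar
#     D -> epsilon | ( D ) D
# translates into the counting functional equation
#     D(z) = 1 + z * D(z)^2        (z marks one pair of parentheses),
# whose solution is the generating function of the Catalan numbers.
# We solve it by fixed-point iteration on truncated power series.

N = 10  # truncation order

def mul(a, b):
    """Cauchy product, truncated at order N."""
    c = [0] * N
    for i in range(N):
        for j in range(N - i):
            c[i + j] += a[i] * b[j]
    return c

D = [0] * N
for _ in range(N):          # each pass fixes one more coefficient
    sq = mul(D, D)
    D = [1] + sq[:N - 1]    # D := 1 + z * D^2

print(D)  # Catalan numbers: [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862]
```

The iteration converges because the coefficient of z^n in z·D(z)² only depends on coefficients of D of order less than n, exactly the property that makes the grammar's translation well founded.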

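The Euler-McLaurin summation formula mentioned for Chapter 7 yields, for instance, the classical asymptotic expansion of the harmonic numbers, H_n = ln n + γ + 1/(2n) − 1/(12n²) + O(n⁻⁴), where γ is the Euler (Mascheroni) constant. A quick numerical check of ours:

```python
# Numerical check (ours) of the Euler-McLaurin expansion of the
# harmonic numbers:
#     H_n = ln n + gamma + 1/(2n) - 1/(12 n^2) + O(n^-4),
# where gamma = 0.5772156649... is the Euler (Mascheroni) constant.
import math

GAMMA = 0.5772156649015329

def harmonic(n):
    """Exact harmonic number H_n by direct summation."""
    return sum(1.0 / k for k in range(1, n + 1))

def harmonic_asymptotic(n):
    """Three-term Euler-McLaurin approximation of H_n."""
    return math.log(n) + GAMMA + 1.0 / (2 * n) - 1.0 / (12 * n * n)

n = 100
exact, approx = harmonic(n), harmonic_asymptotic(n)
# The discrepancy is of order n^-4 (roughly 1e-10 for n = 100).
print(exact, approx, abs(exact - approx))
```

This is the typical situation described above: no closed form for H_n is available, but the error of the approximation is explicitly bounded.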
Index

Γ function; ΛΥΩ; ψ function; 1-1 correspondence; 1-1 mapping

A: absolute scale; addition operator; Adelson-Velskii; adicity; Algol'60; Algol'68; algebraic singularity; algebraic square root; algorithm; alphabet; alternating subgroup; ambiguous grammar; Appell subgroup; arithmetic square root; arrangement; A-sequence; asymptotic approximation; asymptotic development; asymptotic evaluation; average case analysis; AVL tree

B: Backus Normal Form; Bell number; Bell subgroup; Bender's theorem; Bernoulli number; big-oh notation; bijection; binary searching; binary tree; binomial coefficient; binomial formula; bisection formula; bivariate generating function; BNF; Boole, George; branch

C: C language; cardinality; Cartesian product; Catalan number; Catalan triangle; Cauchy product; Cauchy theorem; central binomial coefficient; central trinomial coefficient; characteristic function; Chomsky, Noam; choose; closed form expression; codomain; coefficient extraction rules; coefficient of operator; colored walk; colored walk problem; column; column index; combination; combination with repetitions; complete colored walk; composition of f.p.s.; composition of permutations; composition rule for coefficient of; composition rule for generating functions; compositional inverse of a f.p.s.; Computer Algebra System; context free grammar; context free language; convergence; convolution; convolution rule for coefficient of; convolution rule for generating functions; cross product rule; cycle; cycle degree; cycle representation

D: Darboux' method; definite integration of a f.p.s.; definite summation; degree of a permutation; delta series; derangement; derivation; diagonal step; diagonalisation rule for generating functions; difference operator; differentiation of a f.p.s.; differentiation rule for coefficient of; differentiation rule for generating functions; digamma function; disposition; divergence; domain; dominating singularity; double sequence; Dyck grammar; Dyck language; Dyck walk; Dyck word

E: east step; empty word; entire function; essential singularity; Euclid's algorithm; Euler constant; Euler transformation; Euler-McLaurin summation formula; even permutation; exponential algorithm; exponential generating function; exponentiation of f.p.s.; extensional definition; extraction of the coefficient

F: factorial; falling factorial; Fibonacci, Leonardo; Fibonacci number; Fibonacci problem; Fibonacci word; finite operator; fixed point; Flajolet, Philippe; formal grammar; formal language; formal Laurent series; formal power series; f.p.s.; free monoid; full history recurrence; function

G: Gauss' integral; generalized convolution rule; generalized harmonic number; generating function; generating function rules; generation; geometric series; grammar; group

H: H-admissible function; Hardy's identity; harmonic number; harmonic series; Hayman's method; head; height balanced binary tree; height of a tree

I: identity; identity for composition; identity operator; identity permutation; image; inclusion exclusion principle; indefinite precision; indefinite summation; indeterminate; index; infinitesimal operator; initial condition; injective function; input; integral lattice; integration of a f.p.s.; intensional definition; internal path length; i.p.l.; intractable algorithm; intrinsically ambiguous grammar; invertible f.p.s.; involution

J: juxtaposition

K: k-combination; key; Kronecker's delta

L: Lagrange; Lagrange inversion formula; Lagrange subgroup; Landau, Edmund; Landis; language; language generated by the grammar; leftmost occurrence; length of a walk; length of a word; letter; LIF; linear algorithm; linear recurrence; linear recurrence with constant coefficients; linear recurrence with polynomial coefficients; list representation; logarithm of a f.p.s.; logarithmic algorithm; logarithmic singularity

M: mapping; Mascheroni constant; metasymbol; method of shifting; Miller, J. C. P.; monoid; Motzkin number; Motzkin triangle; Motzkin word; multiset

N: negation rule; Newton's rule; non convergence; non-ambiguous grammar; north step; north-east step; number of involutions; number of mappings; Numerical Analysis

O: object grammar; occurrence; occurs in; odd permutation; O-notation; operand; operations on rational numbers; operator; order of a f.p.s.; order of a pole; order of a recurrence; ordered Bell number; ordered partition; ordinary generating function; output

P: parenthetization; p-ary tree; partial fraction expansion; partial history recurrence; partially recursive set; Pascal language; Pascal triangle; path; permutation; place marker; Pochhammer symbol; pole; polynomial algorithm; power of a f.p.s.; preferential arrangement; preferential arrangement number; prefix; principal value; principle of identity; problem of searching; product of f.L.s.; product of f.p.s.; production; program; proper Riordan array

Q: quadratic algorithm; quasi-unit

R: rabbit problem; radius of convergence; random permutation; range; recurrence relation; renewal array; residue of a f.p.s.; reverse of a f.p.s.; Riemann zeta function; Riordan array; Riordan's old identity; rising factorial; Rogers, Douglas; root; rooted planar tree; row; row-by-column product; row index

S: Salvy, Bruno; Schützenberger methodology; semi-closed form; sequence; sequential searching; set; set partition; shift operator; shifting rule for coefficient of; shifting rule for generating functions; shuffling; simple binomial coefficients; singularity; small-oh notation; solving a recurrence; sorting; south-east step; square root of a f.p.s.; Stanley; Stirling, James; Stirling number of the first kind; Stirling number of the second kind; Stirling polynomial; subgroup of associated operators; subtracted singularity; subword; successful search; suffix; sum of a geometric progression; sum of f.L.s.; sum of f.p.s.; summation by parts; summing factor method; surjective function; symbol; symbolic method; symmetric colored walk problem; symmetric group; symmetry formula; syntax directed compilation

T: table; tail; Tartaglia triangle; Taylor theorem; terminal word; tractable algorithm; transposition; transposition representation; tree representation; triangle; trinomial coefficient

U: unary-binary tree; underdiagonal colored walk; underdiagonal walk; unfold a recurrence; uniform convergence; unit; unsuccessful search

V: van Wijngaarden grammar; Vandermonde convolution; vector notation; vector representation

W: walk; word; worst AVL tree; worst case analysis

Z: zeta function; Zimmermann, Paul; Z-sequence