
Contents

1.1 What is Ergodic Theory?
1.2 Measure Preserving Transformations
1.3 Basic Examples
1.4 Recurrence
1.5 Induced and Integral Transformations
1.5.1 Induced Transformations
1.5.2 Integral Transformations
1.6 Ergodicity
1.7 Other Characterizations of Ergodicity
1.8 Examples of Ergodic Transformations

2.1 The Ergodic Theorem and its consequences
2.2 Characterization of Irreducible Markov Chains
2.3 Mixing

3.1 Measure Preserving Isomorphisms
3.2 Factor Maps
3.3 Natural Extensions

4 Entropy
4.1 Randomness and Information
4.2 Definitions and Properties
4.3 Calculation of Entropy and Examples
4.4 The Shannon-McMillan-Breiman Theorem
4.5 Lochs' Theorem

5.1 Equivalent measures
5.2 Non-singular and conservative transformations
5.3 Hurewicz Ergodic Theorem

6.1 Existence
6.2 Unique Ergodicity

7.1 Basic Notions
7.2 Topological Entropy
7.2.1 Two Definitions
7.2.2 Equivalence of the two Definitions
7.3 Examples

8.1 Main Theorem
8.2 Measures of Maximal Entropy

Chapter 1

Introduction and Preliminaries

1.1 What is Ergodic Theory?

It is not easy to give a simple definition of Ergodic Theory because it uses techniques and examples from many fields, such as probability theory, statistical mechanics, number theory, vector fields on manifolds, group actions on homogeneous spaces, and many more.

The word ergodic is a mixture of two Greek words: ergon (work) and odos (path). The word was introduced by Boltzmann (in statistical mechanics) in connection with his hypothesis: for large systems of interacting particles in equilibrium, the time average along a single trajectory equals the space average. The hypothesis as it was stated was false, and the investigation of the conditions under which these two quantities are equal led to the birth of ergodic theory as it is known nowadays.

A modern description of ergodic theory would be: it is the study of the long-term average behavior of systems evolving in time. The collection of all states of the system forms a space X, and the evolution is represented by either

- a transformation T : X → X, where Tx is the state of the system at time t = 1 given that the system was initially (at time t = 0) in state x (this is analogous to the setup of discrete-time stochastic processes), or

- if the evolution is continuous or if the configurations have spatial structure, a group of transformations G (like Z², R, R²) acting on X, i.e., every g ∈ G is identified with a transformation T_g : X → X, with T_{gg'} = T_g ∘ T_{g'}.


The evolution should be compatible with the basic structure on X. For example:

- if X is a measure space, then T must be measurable;
- if X is a topological space, then T must be continuous;
- if X has a differentiable structure, then T is a diffeomorphism.

1.2 Measure Preserving Transformations

In this course our space is a probability space (X, B, µ), and our time is discrete. So the evolution is described by a measurable map T : X → X, so that T⁻¹A ∈ B for all A ∈ B. For each x ∈ X, the orbit of x is the sequence

    x, Tx, T²x, … .

If T is invertible, one considers the two-sided orbit

    …, T⁻¹x, x, Tx, … .

We also want the evolution to be in steady state, i.e., stationary. In the language of ergodic theory, we want T to be measure preserving.

Definition 1.2.1 Let (X, B, µ) be a probability space, and T : X → X measurable. The map T is said to be measure preserving with respect to µ if

    µ(T⁻¹A) = µ(A) for all A ∈ B.

If T is measure preserving with respect to µ and f : X → R is measurable, then the process

    f, f∘T, f∘T², …

is stationary. This means that for all Borel sets B₁, …, Bₙ, all integers r₁ < r₂ < … < rₙ, and any k ≥ 1, one has

    µ({x : f(T^{r₁}x) ∈ B₁, …, f(T^{rₙ}x) ∈ Bₙ}) = µ({x : f(T^{r₁+k}x) ∈ B₁, …, f(T^{rₙ+k}x) ∈ Bₙ}).

We can generalize the definition of measure preserving to the following case. Let T : (X₁, B₁, µ₁) → (X₂, B₂, µ₂) be measurable; then T is measure preserving if µ₁(T⁻¹A) = µ₂(A) for all A ∈ B₂. The following gives a useful criterion for checking that a transformation is measure preserving.


For this we need the notions of algebra and semi-algebra.

Recall that a collection S of subsets of X is said to be a semi-algebra if (i) ∅ ∈ S; (ii) A ∩ B ∈ S whenever A, B ∈ S; and (iii) if A ∈ S, then X \ A = ∪ᵢ₌₁ⁿ Eᵢ is a finite disjoint union of elements of S. For example, if X = [0, 1) and S is the collection of all subintervals, then S is a semi-algebra. Or if X = {0, 1}^Z, then the collection of all cylinder sets {x : xᵢ = aᵢ, …, x_j = a_j} is a semi-algebra.

An algebra A is a collection of subsets of X satisfying: (i) ∅ ∈ A; (ii) if A, B ∈ A, then A ∪ B ∈ A; and finally (iii) if A ∈ A, then X \ A ∈ A. Clearly an algebra is a semi-algebra. Furthermore, given a semi-algebra S one can form an algebra by taking all finite disjoint unions of elements of S. We denote this algebra by A(S), and we call it the algebra generated by S. It is in fact the smallest algebra containing S. Likewise, given a semi-algebra S (or an algebra A), the σ-algebra generated by S (respectively A) is denoted by B(S) (respectively B(A)), and is the smallest σ-algebra containing S (respectively A).

A monotone class C is a collection of subsets of X with the following two properties: (i) if E₁ ⊂ E₂ ⊂ … are elements of C, then ∪ᵢ₌₁^∞ Eᵢ ∈ C; (ii) if F₁ ⊃ F₂ ⊃ … are elements of C, then ∩ᵢ₌₁^∞ Fᵢ ∈ C. The monotone class generated by a collection S of subsets of X is the smallest monotone class containing S. The Monotone Class Theorem states that, for an algebra A, the σ-algebra generated by A equals the monotone class generated by A.

Using the above theorem, one can get an easier criterion for checking that a transformation is measure preserving.

Theorem. Let (Xᵢ, Bᵢ, µᵢ), i = 1, 2, be probability spaces, and T : X₁ → X₂ a transformation. Suppose S₂ is a generating semi-algebra of B₂. Then T is measurable and measure preserving if and only if for each A ∈ S₂, we have T⁻¹A ∈ B₁ and µ₁(T⁻¹A) = µ₂(A).

Proof. Let

    C = {B ∈ B₂ : T⁻¹B ∈ B₁ and µ₁(T⁻¹B) = µ₂(B)}.

By assumption S₂ ⊂ C, and hence A(S₂) ⊂ C, since every element of A(S₂) is a finite disjoint union of elements of S₂. We show that C is a monotone class. Let E₁ ⊂ E₂ ⊂ … be elements of C, and let E = ∪ᵢ₌₁^∞ Eᵢ. Then

    T⁻¹E = ∪ᵢ₌₁^∞ T⁻¹Eᵢ ∈ B₁,

and

    µ₁(T⁻¹E) = µ₁(∪ₙ₌₁^∞ T⁻¹Eₙ) = limₙ→∞ µ₁(T⁻¹Eₙ) = limₙ→∞ µ₂(Eₙ) = µ₂(∪ₙ₌₁^∞ Eₙ) = µ₂(E).

Thus E ∈ C. A similar argument shows that if F₁ ⊃ F₂ ⊃ … are elements of C, then ∩ᵢ₌₁^∞ Fᵢ ∈ C. Hence, C is a monotone class containing the algebra A(S₂). By the Monotone Class Theorem, B₂ is the smallest monotone class containing A(S₂); hence B₂ ⊂ C. This shows that B₂ = C, and therefore T is measurable and measure preserving.

For example:

- Let X = [0, 1) with the Borel σ-algebra B, and µ a probability measure on B. Then a transformation T : X → X is measurable and measure preserving if and only if T⁻¹[a, b) ∈ B and µ(T⁻¹[a, b)) = µ([a, b)) for any interval [a, b).

- Let X = {0, 1}^N with the product σ-algebra and product measure µ. A transformation T : X → X is measurable and measure preserving if and only if

    T⁻¹({x : x₀ = a₀, …, xₙ = aₙ}) ∈ B

and

    µ(T⁻¹{x : x₀ = a₀, …, xₙ = aₙ}) = µ({x : x₀ = a₀, …, xₙ = aₙ})

for every cylinder set.

In what follows, ∆ denotes the symmetric difference:

    A ∆ B = (A ∪ B) \ (A ∩ B) = (A \ B) ∪ (B \ A).

Lemma 1.2.1 Let (X, B, µ) be a probability space, and A an algebra generating B. Then, for any A ∈ B and any ε > 0, there exists C ∈ A such that µ(A ∆ C) < ε.

Proof. Let

    D = {A ∈ B : for any ε > 0, there exists C ∈ A such that µ(A ∆ C) < ε}.

Clearly A ⊂ D; we need to show that D is a monotone class. To this end, let A₁ ⊂ A₂ ⊂ … be a sequence in D, and let A = ∪ₙ₌₁^∞ Aₙ; notice that µ(A) = limₙ→∞ µ(Aₙ). Let ε > 0; there exists an N such that µ(A ∆ A_N) = µ(A) − µ(A_N) < ε/2. Since A_N ∈ D, there exists C ∈ A such that µ(A_N ∆ C) < ε/2. Then

    µ(A ∆ C) ≤ µ(A ∆ A_N) + µ(A_N ∆ C) < ε.

Hence, A ∈ D. Similarly, one can show that D is closed under decreasing intersections, so that D is a monotone class containing A; hence by the Monotone Class Theorem B ⊂ D. Therefore, B = D, and the lemma is proved.

1.3 Basic Examples

(a) Translations. Let X = [0, 1) with the Lebesgue σ-algebra B, and Lebesgue measure µ. Let 0 < θ < 1, and define T : X → X by

    Tx = x + θ (mod 1) = x + θ − ⌊x + θ⌋.

Using the criterion above, it is easy to check that T is measure preserving.

(b) Multiplication by 2 modulo 1. Let (X, B, µ) be as in example (a), and let T : X → X be given by

    Tx = 2x (mod 1) =
        2x,      0 ≤ x < 1/2,
        2x − 1,  1/2 ≤ x < 1.

For any interval [a, b) ⊂ [0, 1),

    T⁻¹[a, b) = [a/2, b/2) ∪ [(a+1)/2, (b+1)/2),

and hence

    µ(T⁻¹[a, b)) = b − a = µ([a, b)),

so T is measurable and measure preserving.

Although this map is very simple, it has in fact many facets. For example, iterations of this map yield the binary expansion of points in [0, 1); i.e., using T one can associate with each point of [0, 1) an infinite sequence of 0s and 1s. To do so, we define the function a₁ by

    a₁(x) =
        0 if 0 ≤ x < 1/2,
        1 if 1/2 ≤ x < 1,

and for n ≥ 1 we set aₙ = aₙ(x) = a₁(Tⁿ⁻¹x); for simplicity, we write aₙ instead of aₙ(x). Then Tx = 2x − a₁. Rewriting, we get x = a₁/2 + Tx/2. Similarly, Tx = a₂/2 + T²x/2. Continuing in this manner, we see that for each n ≥ 1,

    x = a₁/2 + a₂/2² + … + aₙ/2ⁿ + Tⁿx/2ⁿ.

Since 0 ≤ Tⁿx < 1, we get

    |x − Σᵢ₌₁ⁿ aᵢ/2ⁱ| = Tⁿx/2ⁿ → 0 as n → ∞.

Thus, x = Σᵢ₌₁^∞ aᵢ/2ⁱ. We shall later see that the sequence of digits a₁, a₂, … forms an i.i.d. sequence of Bernoulli random variables.
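The iteration just described is easy to run. The following sketch (the helper name binary_digits is ours, not from the notes) uses exact rational arithmetic so that the orbit of T is computed without rounding; the recorded digits are exactly the binary digits of x.

```python
from fractions import Fraction

def binary_digits(x, n):
    """First n binary digits of x in [0, 1), obtained by iterating the
    doubling map T(x) = 2x mod 1 and recording a_k = 0 or 1 according to
    whether the current point lies in [0, 1/2) or [1/2, 1)."""
    digits = []
    for _ in range(n):
        a = 0 if x < Fraction(1, 2) else 1
        digits.append(a)
        x = 2 * x - a                # T(x) = 2x mod 1
    return digits

x = Fraction(5, 16)                  # binary 0.0101
print(binary_digits(x, 6))           # [0, 1, 0, 1, 0, 0]
```

The partial sums Σ aᵢ/2ⁱ reconstruct x exactly once the expansion terminates, in line with the formula above.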

(c) Baker's Transformation. This example is the two-dimensional version of example (b). The underlying probability space is [0, 1)² with the product Lebesgue σ-algebra B × B and product Lebesgue measure µ × µ. Define T : [0, 1)² → [0, 1)² by

    T(x, y) =
        (2x, y/2),             0 ≤ x < 1/2,
        (2x − 1, (y+1)/2),     1/2 ≤ x < 1.

One can check that T is invertible, measurable and measure preserving.

(d) β-transformations. Let X = [0, 1) with the Lebesgue σ-algebra B. Let β = (1+√5)/2, the golden mean; notice that β² = β + 1. Define a transformation T : X → X by

    Tx = βx (mod 1) =
        βx,      0 ≤ x < 1/β,
        βx − 1,  1/β ≤ x < 1.

This map is not measure preserving with respect to Lebesgue measure (find a counterexample), but is measure preserving with respect to the measure µ given by

    µ(B) = ∫_B g(x) dx,

where

    g(x) =
        (5 + 3√5)/10,  0 ≤ x < 1/β,
        (5 + √5)/10,   1/β ≤ x < 1.

One can show that (similar to example (b)) iterations of this map generate expansions for points x ∈ [0, 1) (known as β-expansions) of the form

    x = Σᵢ₌₁^∞ bᵢ/βⁱ,

where the digits bᵢ take the values 0 and 1.
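As a quick numerical illustration (a sketch, not part of the notes; the function name beta_digits is ours), one can iterate T_β for the golden mean and check that the partial sums Σ bᵢ/βⁱ approach x:

```python
import math

BETA = (1 + math.sqrt(5)) / 2   # golden mean, BETA**2 = BETA + 1

def beta_digits(x, n):
    """First n digits of the beta-expansion of x in [0, 1), obtained by
    iterating T(x) = BETA*x mod 1; each digit is floor(BETA*x), i.e. 0 or 1."""
    digits = []
    for _ in range(n):
        y = BETA * x
        b = int(y)              # 0 or 1 for the golden mean
        digits.append(b)
        x = y - b               # T(x) = BETA*x mod 1
    return digits

x = 0.5
digits = beta_digits(x, 25)
approx = sum(b / BETA ** (i + 1) for i, b in enumerate(digits))
print(digits[:6], abs(x - approx))   # the partial sums converge to x
```

For x = 1/2 the digit string is the repeating pattern 010, and one can observe that the block 11 never occurs, as βx − 1 < 1/β whenever a digit 1 is produced.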

(e) Bernoulli Shifts. Let X = {0, 1, …, k−1}^Z, and F the σ-algebra generated by the cylinders. Let p = (p₀, p₁, …, p_{k−1}) be a positive probability vector; define a measure µ on F by specifying it on the cylinder sets as follows:

    µ({x : x_{−n} = a_{−n}, …, xₙ = aₙ}) = p_{a_{−n}} ⋯ p_{aₙ}.

Then T : X → X defined by Tx = y, where yₙ = xₙ₊₁ (the left shift), is measurable and measure preserving, since

    T⁻¹{x : x_{−n} = a_{−n}, …, xₙ = aₙ} = {x : x_{−n+1} = a_{−n}, …, x_{n+1} = aₙ},

and

    µ({x : x_{−n+1} = a_{−n}, …, x_{n+1} = aₙ}) = p_{a_{−n}} ⋯ p_{aₙ}.

Notice that in case X = {0, 1, …, k−1}^N, one should consider cylinder sets of the form {x : x₀ = a₀, …, xₙ = aₙ}. In this case

    T⁻¹{x : x₀ = a₀, …, xₙ = aₙ} = ∪_{j=0}^{k−1} {x : x₀ = j, x₁ = a₀, …, x_{n+1} = aₙ},

whose measure is Σⱼ pⱼ p_{a₀} ⋯ p_{aₙ} = p_{a₀} ⋯ p_{aₙ}, so the one-sided shift is measure preserving as well.

(f) Markov Shifts. Let (X, F, T) be as in example (e). We define a measure µ on F as follows. Let P = (p_{ij}) be a stochastic k × k matrix, and q = (q₀, q₁, …, q_{k−1}) a positive probability vector such that qP = q. Define µ on cylinders by

    µ({x : x_{−n} = a_{−n}, …, xₙ = aₙ}) = q_{a_{−n}} p_{a_{−n} a_{−n+1}} ⋯ p_{a_{n−1} aₙ}.

Just as in example (e), one sees that T is measurable and measure preserving.

(g) Stationary Stochastic Processes. Let (Ω, F, IP) be a probability space, and

    …, Y₋₂, Y₋₁, Y₀, Y₁, Y₂, …

a stationary stochastic process on Ω with values in R. Hence, for each k ∈ Z,

    IP(Y_{n₁} ∈ B₁, …, Y_{n_r} ∈ B_r) = IP(Y_{n₁+k} ∈ B₁, …, Y_{n_r+k} ∈ B_r)

for any n₁ < n₂ < … < n_r and any Lebesgue sets B₁, …, B_r. We want to see this process as coming from a measure preserving transformation.

Let X = R^Z = {x = (…, x₋₁, x₀, x₁, …) : xᵢ ∈ R} with the product σ-algebra (i.e., generated by the cylinder sets). Let T : X → X be the left shift, i.e., Tx = z where zₙ = xₙ₊₁. Define φ : Ω → X by

    φ(ω) = (…, Y₋₂(ω), Y₋₁(ω), Y₀(ω), Y₁(ω), Y₂(ω), …).

Then φ is measurable, since if B₁, …, B_r are Lebesgue sets in R, then

    φ⁻¹({x ∈ X : x_{n₁} ∈ B₁, …, x_{n_r} ∈ B_r}) = Y_{n₁}⁻¹(B₁) ∩ … ∩ Y_{n_r}⁻¹(B_r) ∈ F.

Define a measure µ on X by

    µ(E) = IP(φ⁻¹(E)).

Then

    µ({x ∈ X : x_{n₁} ∈ B₁, …, x_{n_r} ∈ B_r}) = IP(Y_{n₁} ∈ B₁, …, Y_{n_r} ∈ B_r).

Since

    T⁻¹({x : x_{n₁} ∈ B₁, …, x_{n_r} ∈ B_r}) = {x : x_{n₁+1} ∈ B₁, …, x_{n_r+1} ∈ B_r},

stationarity of the process (Yₙ) implies that T is measure preserving. Furthermore, if we let πᵢ : X → R be the natural projection onto the i-th coordinate, then Yᵢ(ω) = πᵢ(φ(ω)) = π₀(Tⁱφ(ω)).

(h) Random Shifts. Let (X, B, µ) be a probability space, and T : X → X an invertible measure preserving transformation. Then T⁻¹ is measurable and measure preserving with respect to µ. Suppose now that at each moment, instead of moving forward by T (x ↦ Tx), we first flip a fair coin to decide whether we will use T or T⁻¹. We can describe this random system by means of a measure preserving transformation in the following way.

Let Ω = {−1, 1}^Z with product σ-algebra F (i.e., the σ-algebra generated by the cylinder sets) and the uniform product measure IP (see example (e)), and let σ : Ω → Ω be the left shift. As in example (e), the map σ is measure preserving. Now, let Y = Ω × X with the product σ-algebra, and product measure IP × µ. Define S : Y → Y by

    S(ω, x) = (σω, T^{ω₀} x).

Then S is invertible (why?), and measure preserving with respect to IP × µ. To see the latter, for any set C ∈ F and any A ∈ B, we have

    (IP × µ)(S⁻¹(C × A)) = (IP × µ)({(ω, x) : S(ω, x) ∈ C × A})
        = (IP × µ)({(ω, x) : ω₀ = 1, σω ∈ C, Tx ∈ A})
          + (IP × µ)({(ω, x) : ω₀ = −1, σω ∈ C, T⁻¹x ∈ A})
        = (IP × µ)(({ω₀ = 1} ∩ σ⁻¹C) × T⁻¹A)
          + (IP × µ)(({ω₀ = −1} ∩ σ⁻¹C) × TA)
        = IP({ω₀ = 1} ∩ σ⁻¹C) µ(T⁻¹A) + IP({ω₀ = −1} ∩ σ⁻¹C) µ(TA)
        = IP({ω₀ = 1} ∩ σ⁻¹C) µ(A) + IP({ω₀ = −1} ∩ σ⁻¹C) µ(A)
        = IP(σ⁻¹C) µ(A) = IP(C) µ(A) = (IP × µ)(C × A).

(i) Continued Fractions. Consider ([0, 1), B), where B is the Lebesgue σ-algebra. Define a transformation T : [0, 1) → [0, 1) by T0 = 0 and, for x ≠ 0,

    Tx = 1/x − ⌊1/x⌋.

This map is not measure preserving with respect to Lebesgue measure, but is measure preserving with respect to the so-called Gauss probability measure µ given by

    µ(B) = ∫_B (1/log 2) · 1/(1 + x) dx.

An interesting feature of this map is that its iterations generate the continued fraction expansion for points in (0, 1). For if we define

    a₁ = a₁(x) =
        1 if x ∈ (1/2, 1),
        n if x ∈ (1/(n+1), 1/n], n ≥ 2,

then Tx = 1/x − a₁, and hence x = 1/(a₁ + Tx). For n ≥ 1, let aₙ = aₙ(x) = a₁(Tⁿ⁻¹x). Then, after n iterations we see that

    x = 1/(a₁ + Tx) = … = 1/(a₁ + 1/(a₂ + ⋱ + 1/(aₙ + Tⁿx))).

In fact, if pₙ/qₙ = 1/(a₁ + 1/(a₂ + ⋱ + 1/aₙ)), then one can show that the qₙ are monotonically increasing, and

    |x − pₙ/qₙ| < 1/qₙ² → 0 as n → ∞.

Thus,

    x = 1/(a₁ + 1/(a₂ + 1/(a₃ + ⋱))),

the continued fraction expansion of x.
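For rational x the Gauss map reaches 0 after finitely many steps, so the digits aₙ can be computed exactly. A small sketch (the helper name cf_digits is ours, using exact arithmetic):

```python
from fractions import Fraction

def cf_digits(x):
    """Continued fraction digits of a rational x in (0, 1), generated by
    iterating the Gauss map T(x) = 1/x - floor(1/x); the n-th digit is
    a_n = floor(1 / T^{n-1} x).  For rational x the orbit hits 0 and stops."""
    digits = []
    while x != 0:
        inv = 1 / x                 # exact, since x is a Fraction
        a = int(inv)                # floor for positive rationals
        digits.append(a)
        x = inv - a                 # Gauss map step
    return digits

print(cf_digits(Fraction(7, 30)))   # [4, 3, 2]: 7/30 = 1/(4 + 1/(3 + 1/2))
```

Rebuilding the nested fraction from the digits returns the starting point, mirroring the finite form of the expansion above.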


1.4 Recurrence

Let T be a measure preserving transformation on a probability space (X, F, µ), and let B ∈ F. A point x ∈ B is said to be B-recurrent if there exists k ≥ 1 such that T^k x ∈ B.

Theorem (Poincaré Recurrence Theorem). If µ(B) > 0, then a.e. x ∈ B is B-recurrent.

Proof. Let F be the subset of B consisting of all elements that are not B-recurrent. Then,

    F = {x ∈ B : T^k x ∉ B for all k ≥ 1}.

We want to show that µ(F) = 0. First notice that F ∩ T⁻ᵏF = ∅ for all k ≥ 1; hence T⁻ˡF ∩ T⁻ᵐF = ∅ for all l ≠ m. Thus, the sets F, T⁻¹F, T⁻²F, … are pairwise disjoint, and µ(T⁻ⁿF) = µ(F) for all n ≥ 1 (T is measure preserving). If µ(F) > 0, then

    1 = µ(X) ≥ µ(∪_{k≥0} T⁻ᵏF) = Σ_{k≥0} µ(F) = ∞,

a contradiction.

The proof of the above theorem implies that almost every x ∈ B returns to B infinitely often. In other words, there exist infinitely many integers n₁ < n₂ < … such that T^{nᵢ}x ∈ B. To see this, let

    D = {x ∈ B : T^k x ∈ B for finitely many k ≥ 1}.

Then,

    D = {x ∈ B : T^k x ∈ F for some k ≥ 0} ⊂ ∪_{k=0}^∞ T⁻ᵏF.

Thus, µ(D) = 0, since µ(F) = 0 and T is measure preserving.

1.5 Induced and Integral Transformations

1.5.1 Induced Transformations

Let T be a measure preserving transformation on the probability space (X, F, µ), and let A ⊂ X with µ(A) > 0. By Poincaré's Recurrence Theorem, almost every x ∈ A returns to A infinitely often under the action of T. For x ∈ A, let n(x) := inf{n ≥ 1 : Tⁿx ∈ A}. We call n(x) the first return time of x to A; it is finite a.e. on A. In the sequel we remove from A the set of measure zero on which n(x) = ∞, and we denote the new set again by A. Consider the σ-algebra F ∩ A on A, which is the restriction of F to A. Furthermore, let µ_A be the probability measure on A defined by

    µ_A(B) = µ(B)/µ(A), for B ∈ F ∩ A,

and define the induced map T_A : A → A by

    T_A x = T^{n(x)} x, for x ∈ A.

From the above we see that T_A is defined on A. What kind of a transformation is T_A?

Proposition. T_A is measure preserving with respect to µ_A on (A, F ∩ A).

Proof. For k ≥ 1, let

    A_k = {x ∈ A : n(x) = k},
    B_k = {x ∈ X \ A : Tx, …, T^{k−1}x ∉ A, T^k x ∈ A}.

Notice that A = ∪_{k=1}^∞ A_k, and

    T⁻¹A = A₁ ∪ B₁,  T⁻¹B_k = A_{k+1} ∪ B_{k+1}.    (1.1)

Let C ∈ F ∩ A. To show that µ_A(C) = µ_A(T_A⁻¹C), we show that

    µ(T_A⁻¹C) = µ(T⁻¹C) (= µ(C), since T is measure preserving).

Induced and Integral Transformations 17

B. 1

..

........

...

..

..

B. 1 B. 2 .

.... ....

....... .......

.... ....

T 2 A \ A T A (A, 3)

B. 1 B. 2 B. 3

.. .. ..

........ ........ ........

... ... ...

.. .. ..

T A \ A (A, 2)

B. 1 B. 2 B. 3 B. 4

.... .... .... ....

....... ....... ....... .......

.... .... .... ....

A (A, 1)

A1 A2 A3 A4 A5

Now,

    T_A⁻¹C = ∪_{k=1}^∞ (A_k ∩ T_A⁻¹C) = ∪_{k=1}^∞ (A_k ∩ T⁻ᵏC),

hence

    µ(T_A⁻¹C) = Σ_{k=1}^∞ µ(A_k ∩ T⁻ᵏC).

On the other hand, using (1.1) repeatedly, one gets for any n ≥ 1,

    µ(C) = µ(T⁻¹C) = µ(A₁ ∩ T⁻¹C) + µ(B₁ ∩ T⁻¹C)
        = µ(A₁ ∩ T⁻¹C) + µ(A₂ ∩ T⁻²C) + µ(B₂ ∩ T⁻²C)
        ⋮
        = Σ_{k=1}^n µ(A_k ∩ T⁻ᵏC) + µ(Bₙ ∩ T⁻ⁿC).

Since

    1 ≥ µ(∪_{n=1}^∞ (Bₙ ∩ T⁻ⁿC)) = Σ_{n=1}^∞ µ(Bₙ ∩ T⁻ⁿC)

(the sets Bₙ ∩ T⁻ⁿC being pairwise disjoint), it follows that

    limₙ→∞ µ(Bₙ ∩ T⁻ⁿC) = 0.

Thus,

    µ(C) = µ(T⁻¹C) = Σ_{k=1}^∞ µ(A_k ∩ T⁻ᵏC) = µ(T_A⁻¹C).

This shows that µ_A(C) = µ_A(T_A⁻¹C), which implies that T_A is measure preserving with respect to µ_A.
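The return-time structure is easy to explore numerically. A sketch (function names are ours; the choices of the rotation angle and of A are arbitrary): for an irrational rotation, sampling points of A and averaging their first return times illustrates the well-known companion fact, Kac's formula, that the µ_A-average of n(x) equals 1/µ(A). Kac's formula is not proved in these notes; the code merely observes it.

```python
import math

def first_return_time(x, theta, a, b):
    """First return time n(x) of x in A = [a, b) under the rotation
    T(y) = y + theta mod 1: the least n >= 1 with T^n x back in A."""
    n, y = 1, (x + theta) % 1.0
    while not (a <= y < b):
        n, y = n + 1, (y + theta) % 1.0
    return n

theta = math.sqrt(2) - 1             # an irrational rotation angle
a, b = 0.0, 0.25                     # A = [0, 1/4), so mu(A) = 1/4
xs = [a + (b - a) * k / 1000 for k in range(1000)]   # grid sample of A
avg = sum(first_return_time(x, theta, a, b) for x in xs) / len(xs)
print(avg)                           # close to 1/mu(A) = 4
```

Here the return time takes only the values 2, 3 and 5, yet their average over A balances out to 4, the reciprocal of µ(A).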

Exercise. Show that for all C ∈ F ∩ A, one has T_A⁻¹C ∈ F ∩ A, so that T_A is indeed measurable.

Exercise 1.5.4 Let G = (1 + √5)/2, so that G² = G + 1. Consider the set

    X = ([0, 1/G) × [0, 1)) ∪ ([1/G, 1) × [0, 1/G)),

endowed with the product Borel σ-algebra, and the normalized Lebesgue measure µ. Define the transformation

    T(x, y) =
        (Gx, y/G),            (x, y) ∈ [0, 1/G) × [0, 1),
        (Gx − 1, (1 + y)/G),  (x, y) ∈ [1/G, 1) × [0, 1/G).

(a) Show that T is measure preserving with respect to µ.

(b) Determine the induced transformation of T on the set [0, 1/G) × [0, 1).

1.5.2 Integral Transformations

Let S be a measure preserving transformation on a probability space (A, F, µ), and let f ∈ L¹(A, µ) be positive and integer valued. We now construct a measure preserving transformation T on a probability space (X, C, ν), such that the original transformation S can be seen as the induced transformation on X with return time f.

Define

    X = {(y, i) : y ∈ A, i = 1, …, f(y)},

and let C be the σ-algebra generated by sets of the form

    (B, i) = {(y, i) : y ∈ B},

where B ⊂ A, B ∈ F and i ∈ N. Define a measure ν on C by

    ν(B, i) = µ(B) / ∫_A f(y) dµ(y),

and then extend ν to all of C. Finally, define T : X → X by

    T(y, i) =
        (y, i + 1),  if i + 1 ≤ f(y),
        (Sy, 1),     if i + 1 > f(y).

One can now show that T is ν-measure preserving. In fact, it suffices to check this on the generators. Let B ⊂ A be F-measurable, and let i ≥ 1. We have to discern the following two cases:

(1) If i > 1, then T⁻¹(B, i) = (B, i − 1), and clearly

    ν(T⁻¹(B, i)) = ν(B, i − 1) = ν(B, i) = µ(B) / ∫_A f(y) dµ(y).

(2) If i = 1, then, with Aₙ = {y ∈ A : f(y) = n},

    T⁻¹(B, 1) = ∪_{n=1}^∞ (Aₙ ∩ S⁻¹B, n)  (disjoint union).

Since ∪_{n=1}^∞ Aₙ = A, we therefore find that

    ν(T⁻¹(B, 1)) = Σ_{n=1}^∞ µ(Aₙ ∩ S⁻¹B) / ∫_A f(y) dµ(y)
        = µ(S⁻¹B) / ∫_A f(y) dµ(y)
        = µ(B) / ∫_A f(y) dµ(y) = ν(B, 1).

Finally, if we consider the induced transformation of T on the set (A, 1), then the first return time n(x, 1) = inf{k ≥ 1 : T^k(x, 1) ∈ (A, 1)} is given by n(x, 1) = f(x), and T_{(A,1)}(x, 1) = (Sx, 1); in this sense S is the induced transformation of T.
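The construction above can be checked on a toy example (a hypothetical three-point base space and roof values, chosen only for illustration): starting from a base point (y, 1), T climbs the column over y in f(y) steps and lands at (Sy, 1), so the induced map on the base is S and the return time is f.

```python
# Toy integral transformation ("tower") over a base map S with roof f:
# the space consists of pairs (y, i) with 1 <= i <= f(y); T moves one
# level up the column over y, and applies S when it leaves the top.
A = [0, 1, 2]                      # base space: three points
S = {0: 1, 1: 2, 2: 0}             # base transformation (a cyclic permutation)
f = {0: 2, 1: 1, 2: 3}             # roof function: column heights

def T(point):
    y, i = point
    return (y, i + 1) if i + 1 <= f[y] else (S[y], 1)

def induced_on_base(y):
    """First-return map of T to the base level {(y, 1) : y in A}."""
    p, n = T((y, 1)), 1
    while p[1] != 1:
        p, n = T(p), n + 1
    return p, n

for y in A:
    print(y, induced_on_base(y))   # image is (S[y], 1), return time is f[y]
```

For each y the first component of the result is (S[y], 1) and the second is exactly f[y], as asserted in the text.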

1.6 Ergodicity

Definition 1.6.1 Let T be a measure preserving transformation on a probability space (X, F, µ). The map T is said to be ergodic if for every measurable set A satisfying T⁻¹A = A, we have µ(A) = 0 or 1.

Theorem. Let (X, F, µ) be a probability space and T : X → X measure preserving. The following are equivalent:

(i) T is ergodic.

(ii) If B ∈ F with µ(T⁻¹B ∆ B) = 0, then µ(B) = 0 or 1.

(iii) If A ∈ F with µ(A) > 0, then µ(∪_{n=1}^∞ T⁻ⁿA) = 1.

(iv) If A, B ∈ F with µ(A) > 0 and µ(B) > 0, then there exists n > 0 such that µ(T⁻ⁿA ∩ B) > 0.

Remark 1.6.1

1. In case T is invertible, then in the above characterization one can replace T⁻ⁿ by Tⁿ.

2. Note that if µ(B ∆ T⁻¹B) = 0, then µ(B \ T⁻¹B) = µ(T⁻¹B \ B) = 0. Since

    B = (B \ T⁻¹B) ∪ (B ∩ T⁻¹B)

and

    T⁻¹B = (T⁻¹B \ B) ∪ (B ∩ T⁻¹B),

we see that after removing a set of measure 0 from B and a set of measure 0 from T⁻¹B, the remaining parts are equal. In this case we say that B equals T⁻¹B modulo sets of measure 0.

3. In words, (iii) says that if A is a set of positive measure, then almost every x ∈ X eventually (in fact, infinitely often) will visit A.

4. Similarly, (iv) says that elements of B will eventually enter A.

Proof.

(i) ⇒ (ii): Let B ∈ F be such that µ(B ∆ T⁻¹B) = 0. We shall define a measurable set C with C = T⁻¹C and µ(C ∆ B) = 0. Let

    C = {x ∈ X : Tⁿx ∈ B i.o.} = ∩_{n=1}^∞ ∪_{k=n}^∞ T⁻ᵏB.

Then T⁻¹C = C, and

    µ(C ∆ B) = µ((∩_{n=1}^∞ ∪_{k=n}^∞ T⁻ᵏB) ∩ Bᶜ) + µ((∩_{n=1}^∞ ∪_{k=n}^∞ T⁻ᵏB)ᶜ ∩ B)
        ≤ µ(∪_{k=1}^∞ (T⁻ᵏB ∩ Bᶜ)) + µ(∪_{k=1}^∞ (T⁻ᵏBᶜ ∩ B))
        ≤ Σ_{k=1}^∞ µ(T⁻ᵏB ∆ B).

Using induction (and the fact that µ(E ∆ F) ≤ µ(E ∆ G) + µ(G ∆ F)), one can show that µ(T⁻ᵏB ∆ B) = 0 for each k ≥ 1. Hence, µ(C ∆ B) = 0, which implies that µ(C) = µ(B). Since C is T-invariant, by (i) µ(C) = 0 or 1; therefore µ(B) = 0 or 1.

(ii) ⇒ (iii): Let µ(A) > 0 and let B = ∪_{n=1}^∞ T⁻ⁿA. Then T⁻¹B ⊂ B. Since T is measure preserving, µ(B) ≥ µ(T⁻¹A) = µ(A) > 0 and

    µ(T⁻¹B ∆ B) = µ(B \ T⁻¹B) = µ(B) − µ(T⁻¹B) = 0.

By (ii), µ(B) = 0 or 1; since µ(B) > 0, we conclude µ(B) = 1.

(iii) ⇒ (iv): Suppose µ(A)µ(B) > 0. By (iii),

    µ(B) = µ(B ∩ ∪_{n=1}^∞ T⁻ⁿA) = µ(∪_{n=1}^∞ (B ∩ T⁻ⁿA)) > 0,

hence there exists n ≥ 1 with µ(B ∩ T⁻ⁿA) > 0.

(iv) ⇒ (i): Suppose T⁻¹A = A with µ(A) > 0. If µ(Aᶜ) > 0, then by (iv) there exists k ≥ 1 such that µ(Aᶜ ∩ T⁻ᵏA) > 0. Since T⁻ᵏA = A, it follows that µ(Aᶜ ∩ A) > 0, a contradiction. Hence, µ(A) = 1 and T is ergodic.

1.7 Other Characterizations of Ergodicity

We denote by L⁰(X, F, µ) the space of all complex-valued measurable functions on the probability space (X, F, µ). For p ≥ 1, let

    Lᵖ(X, F, µ) = {f ∈ L⁰(X, F, µ) : ∫_X |f|ᵖ dµ < ∞}.

We use the subscript R whenever we are dealing only with real-valued functions.

Let (Xᵢ, Fᵢ, µᵢ), i = 1, 2, be two probability spaces, and T : X₁ → X₂ a measure preserving transformation, i.e., µ₂(A) = µ₁(T⁻¹A). Define the induced operator U_T : L⁰(X₂, F₂, µ₂) → L⁰(X₁, F₁, µ₁) by

    U_T f = f ∘ T.

Among the properties of U_T: it is linear;

    ∫_{X₁} U_T f dµ₁ = ∫_{X₂} f dµ₂ for all f ∈ L⁰(X₂, F₂, µ₂)

(where if one side doesn't exist or is infinite, then the other side has the same property); and consequently ||U_T f||_p = ||f||_p for all f ∈ Lᵖ(X₂, F₂, µ₂). Using the operator U_T one obtains further characterizations of ergodicity.

Theorem 1.7.1 Let (X, F, µ) be a probability space and T : X → X measure preserving. The following are equivalent:

(i) T is ergodic.

(ii) If f ∈ L⁰(X, F, µ) with f(Tx) = f(x) for all x, then f is a constant a.e.

(iii) If f ∈ L⁰(X, F, µ) with f(Tx) = f(x) for a.e. x, then f is a constant a.e.

(iv) If f ∈ L²(X, F, µ) with f(Tx) = f(x) for all x, then f is a constant a.e.

(v) If f ∈ L²(X, F, µ) with f(Tx) = f(x) for a.e. x, then f is a constant a.e.

Proof.

The implications (iii) ⇒ (ii), (ii) ⇒ (iv), (v) ⇒ (iv), and (iii) ⇒ (v) are all clear. It remains to show (i) ⇒ (iii) and (iv) ⇒ (i).

(i) ⇒ (iii): Suppose f(Tx) = f(x) a.e., and assume without any loss of generality that f is real (otherwise we consider separately the real and imaginary parts of f). For each n ≥ 1 and k ∈ Z, let

    X(k, n) = {x ∈ X : k/2ⁿ ≤ f(x) < (k+1)/2ⁿ}.

Then,

    T⁻¹X(k, n) ∆ X(k, n) ⊂ {x : f(Tx) ≠ f(x)},

which implies that µ(T⁻¹X(k, n) ∆ X(k, n)) = 0. By ergodicity (using characterization (ii) of the theorem in Section 1.6), µ(X(k, n)) = 0 or 1. Moreover, for each n ≥ 1, we have

    X = ∪_{k∈Z} X(k, n)  (disjoint union).

Hence, for each n ≥ 1, there exists a unique integer kₙ such that µ(X(kₙ, n)) = 1. In fact, X(k₁, 1) ⊃ X(k₂, 2) ⊃ …, and {kₙ/2ⁿ} is a bounded increasing sequence, hence limₙ→∞ kₙ/2ⁿ exists. Let Y = ∩_{n≥1} X(kₙ, n); then µ(Y) = 1. Now, if x ∈ Y, then 0 ≤ |f(x) − kₙ/2ⁿ| < 1/2ⁿ for all n. Hence, f(x) = limₙ→∞ kₙ/2ⁿ, and f is a constant on Y.

(iv) ⇒ (i): Let A ∈ F with T⁻¹A = A and µ(A) > 0; we must show µ(A) = 1. Consider 1_A, the indicator function of A. We have 1_A ∈ L²(X, F, µ), and

    1_A ∘ T = 1_{T⁻¹A} = 1_A.

Hence, by (iv), 1_A is a constant a.e.; since µ(A) > 0, 1_A = 1 a.e., and therefore µ(A) = 1.

1.8 Examples of Ergodic Transformations

Example 1. Irrational Rotations. Consider ([0, 1), B, µ), where B is the Lebesgue σ-algebra and µ Lebesgue measure. For θ ∈ (0, 1), consider the transformation T_θ : [0, 1) → [0, 1) defined by T_θ x = x + θ (mod 1). We have seen in example (a) that T_θ is measure preserving with respect to µ. When is T_θ ergodic?

If θ is rational, then T_θ is not ergodic. Consider for example θ = 1/4; then the set

    A = [0, 1/8) ∪ [1/4, 3/8) ∪ [1/2, 5/8) ∪ [3/4, 7/8)

is T_θ-invariant, but µ(A) = 1/2.

Exercise 1.8.1 Suppose θ = p/q with gcd(p, q) = 1. Find a non-trivial T_θ-invariant set. Conclude that T_θ is not ergodic if θ is rational.

Claim. If θ is irrational, then T_θ is ergodic.

Proof of Claim. Let f ∈ L²([0, 1), B, µ) be such that f ∘ T_θ = f a.e. Expand f in its Fourier series

    f(x) = Σ_{n∈Z} aₙ e^{2πinx}.

Then

    f(T_θ x) = Σ_{n∈Z} aₙ e^{2πin(x+θ)} = Σ_{n∈Z} aₙ e^{2πinθ} e^{2πinx},

while

    f(x) = Σ_{n∈Z} aₙ e^{2πinx}.

Hence, Σ_{n∈Z} aₙ(1 − e^{2πinθ}) e^{2πinx} = 0. By the uniqueness of the Fourier coefficients, we have aₙ(1 − e^{2πinθ}) = 0 for all n ∈ Z. If n ≠ 0, then, since θ is irrational, we have 1 − e^{2πinθ} ≠ 0. Thus, aₙ = 0 for all n ≠ 0, and therefore f(x) = a₀ is a constant. By Theorem 1.7.1, T_θ is ergodic.

Exercise 1.8.2 Consider the probability space ([0, 1) × [0, 1), B × B, µ × µ), where as above B is the Lebesgue σ-algebra on [0, 1), and µ normalized Lebesgue measure. Suppose θ ∈ (0, 1) is irrational, and define T_θ × T_θ : [0, 1) × [0, 1) → [0, 1) × [0, 1) by

    (T_θ × T_θ)(x, y) = (x + θ (mod 1), y + θ (mod 1)).

Show that T_θ × T_θ is measure preserving, but is not ergodic.
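Ergodicity of the rotation can also be seen numerically through visit frequencies (a sketch under our own choices of angle and target interval; the function name is ours): for irrational θ the frequency of visits to an interval approaches the measure of that interval from every starting point, while for rational θ it depends on the starting point.

```python
import math

def visit_frequency(x, theta, n):
    """Fraction of the orbit points x, Tx, ..., T^{n-1}x of the rotation
    T(y) = y + theta mod 1 that fall in the interval [0, 1/8)."""
    count = 0
    for _ in range(n):
        if x < 0.125:
            count += 1
        x = (x + theta) % 1.0
    return count / n

# irrational angle: the frequency tends to mu([0, 1/8)) = 0.125 (ergodicity)
print(visit_frequency(0.1, math.sqrt(2) - 1, 100000))
# rational angle 1/4: orbits have period 4 and the frequency depends on x
print(visit_frequency(0.1, 0.25, 100000), visit_frequency(0.15, 0.25, 100000))
```

For θ = 1/4 the two starting points give frequencies 1/4 and 0, neither equal to 1/8, which is exactly the failure of ergodicity exhibited by the invariant set of Exercise 1.8.1.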

Example 2. One- (or Two-) sided shift. Let X = {0, 1, …, k−1}^N, F the σ-algebra generated by the cylinders, and µ the product measure defined on cylinder sets by

    µ({x : x₀ = a₀, …, xₙ = aₙ}) = p_{a₀} ⋯ p_{aₙ},

where p = (p₀, p₁, …, p_{k−1}) is a positive probability vector. Consider the left shift T defined on X by Tx = y, where yₙ = xₙ₊₁ (see Example (e) in Subsection 1.3). We show that T is ergodic. Let E be a measurable subset of X which is T-invariant, i.e., T⁻¹E = E. For any ε > 0, by Lemma 1.2.1 (see Subsection 1.2), there exists A ∈ F which is a finite disjoint union of cylinders such that µ(E ∆ A) < ε. Then

    |µ(E) − µ(A)| = |µ(E \ A) − µ(A \ E)| ≤ µ(E \ A) + µ(A \ E) = µ(E ∆ A) < ε.

Since A depends on finitely many coordinates only, there exists n₀ > 0 such that T^{−n₀}A depends on different coordinates than A. Since µ is a product measure, we have

    µ(A ∩ T^{−n₀}A) = µ(A) µ(T^{−n₀}A) = µ(A)².

Further, since T^{−n₀}E = E and T is measure preserving,

    µ(E ∆ T^{−n₀}A) = µ(T^{−n₀}E ∆ T^{−n₀}A) = µ(E ∆ A) < ε,

and

    µ(E ∆ (A ∩ T^{−n₀}A)) ≤ µ(E ∆ A) + µ(E ∆ T^{−n₀}A) < 2ε.

Hence,

    |µ(E) − µ(A ∩ T^{−n₀}A)| < 2ε.

Thus,

    |µ(E) − µ(E)²| ≤ |µ(E) − µ(A ∩ T^{−n₀}A)| + |µ(A)² − µ(E)²|
        ≤ 2ε + (µ(A) + µ(E)) |µ(A) − µ(E)| < 4ε.

Since ε > 0 was arbitrary, µ(E) = µ(E)², so µ(E) = 0 or 1. Therefore, T is ergodic.

The following lemma provides, in some cases, a useful tool to verify that a measure preserving transformation defined on ([0, 1), B, µ) is ergodic, where B is the Lebesgue σ-algebra, and µ is a probability measure equivalent to Lebesgue measure (i.e., µ(A) = 0 if and only if A has Lebesgue measure 0).

Lemma 1.8.1 (Knopp's Lemma) Let B ∈ B, and let C be a class of subintervals of [0, 1) satisfying

(a) every open subinterval of [0, 1) is at most a countable union of disjoint elements from C;

(b) for all A ∈ C, µ(A ∩ B) ≥ γ µ(A), where γ > 0 is a constant independent of A.

Then µ(B) = 1.

Proof. The proof is by contradiction: suppose µ(Bᶜ) > 0. Given ε > 0, there exists by Lemma 1.2.1 a set E_ε that is a finite disjoint union of open intervals such that µ(Bᶜ ∆ E_ε) < ε. Now by conditions (a) and (b) (that is, writing E_ε as a countable union of disjoint elements of C) one gets that µ(B ∩ E_ε) ≥ γ µ(E_ε). Since

    µ(Bᶜ ∆ E_ε) ≥ µ(B ∩ E_ε) ≥ γ µ(E_ε) ≥ γ µ(Bᶜ ∩ E_ε) > γ(µ(Bᶜ) − ε),

we have that

    γ(µ(Bᶜ) − ε) < µ(Bᶜ ∆ E_ε) < ε.

Hence µ(Bᶜ) < ε/γ + ε, and since ε > 0 is arbitrary, µ(Bᶜ) = 0, a contradiction.

Example 3. Let ([0, 1), B, µ) be as in Example 1 above, and let T : X → X be given by

    Tx = 2x (mod 1) =
        2x,      0 ≤ x < 1/2,
        2x − 1,  1/2 ≤ x < 1

(see Example (b), Subsection 1.3). We have seen that T is measure preserving. We will use Lemma 1.8.1 to show that T is ergodic. Let C be the collection of all intervals of the form [k/2ⁿ, (k+1)/2ⁿ) with n ≥ 1 and 0 ≤ k ≤ 2ⁿ − 1. Notice that the set {k/2ⁿ : n ≥ 1, 0 ≤ k ≤ 2ⁿ − 1} of dyadic rationals is dense in [0, 1); hence each open interval is at most a countable union of disjoint elements of C, so C satisfies the first hypothesis of Knopp's Lemma. Now, Tⁿ maps each dyadic interval of the form [k/2ⁿ, (k+1)/2ⁿ) linearly onto [0, 1) (we call such an interval dyadic of order n); in fact, on such an interval Tⁿx = 2ⁿx (mod 1). Let B ∈ B be T-invariant, and assume µ(B) > 0. Let A ∈ C be dyadic of order n. Then TⁿA = [0, 1), and since Tⁿ is linear with slope 2ⁿ on A,

    µ(A ∩ B) = µ(A ∩ T⁻ⁿB) = (1/2ⁿ) µ(TⁿA ∩ B) = (1/2ⁿ) µ(B) = µ(A) µ(B).

Thus, the second hypothesis of Knopp's Lemma is satisfied with γ = µ(B) > 0. Hence, µ(B) = 1. Therefore T is ergodic.

Exercise 1.8.3 Consider, for β the golden mean as in example (d), the transformation T : [0, 1) → [0, 1) given by Tx = βx (mod 1) = βx − ⌊βx⌋. Use Lemma 1.8.1 to show that T is ergodic with respect to Lebesgue measure µ, i.e., if T⁻¹A = A, then µ(A) = 0 or 1.

Proposition. Let T be an ergodic measure preserving transformation on the probability space (X, F, µ), and A ∈ F with µ(A) > 0. Consider the induced transformation T_A on (A, F ∩ A, µ_A) of T (see Subsection 1.5); recall that T_A x = T^{n(x)}x, where n(x) := inf{n ≥ 1 : Tⁿx ∈ A}. Then T_A is ergodic on (A, F ∩ A, µ_A). Let (as before)

    A_k = {x ∈ A : n(x) = k},
    B_k = {x ∈ X \ A : Tx, …, T^{k−1}x ∉ A, T^k x ∈ A}.

Proof. Let C ∈ F ∩ A with T_A⁻¹C = C; we must show µ_A(C) = 0 or 1, equivalently, µ(C) = 0 or µ(C) = µ(A). Since A = ∪_{k≥1} A_k, we have

    C = T_A⁻¹C = ∪_{k≥1} (A_k ∩ T⁻ᵏC).

Let E = ∪_{k≥1} (B_k ∩ T⁻ᵏC), and F = E ∪ C (disjoint union). Recall that (see Subsection 1.5) T⁻¹A = A₁ ∪ B₁, and T⁻¹B_k = A_{k+1} ∪ B_{k+1}. Hence,

    T⁻¹F = T⁻¹E ∪ T⁻¹C
        = ∪_{k≥1} ((A_{k+1} ∪ B_{k+1}) ∩ T^{−(k+1)}C) ∪ ((A₁ ∪ B₁) ∩ T⁻¹C)
        = ∪_{k≥1} (A_k ∩ T⁻ᵏC) ∪ ∪_{k≥1} (B_k ∩ T⁻ᵏC)
        = C ∪ E = F.

Hence, F is T-invariant, and by ergodicity of T we have µ(F) = 0 or 1.

If µ(F) = 0, then µ(C) = 0, and hence µ_A(C) = 0.

If µ(F) = 1, then µ(X \ F) = 0. Since

    X \ F = (A \ C) ∪ ((X \ A) \ E) ⊃ A \ C,

it follows that

    µ(A \ C) ≤ µ(X \ F) = 0.

Since µ(A \ C) = µ(A) − µ(C), we have µ(A) = µ(C), i.e., µ_A(C) = 1.

Exercise 1.8.4 Show that if T_A is ergodic and µ(∪_{k≥1} T⁻ᵏA) = 1, then T is ergodic.

Chapter 2

The Ergodic Theorem

2.1 The Ergodic Theorem and its consequences

The Ergodic Theorem is also known as Birkhoff's Ergodic Theorem or the Individual Ergodic Theorem (1931). This theorem is in fact a generalization of the Strong Law of Large Numbers (SLLN), which states that for a sequence Y₁, Y₂, … of i.i.d. random variables on a probability space (X, F, µ), with E|Yᵢ| < ∞, one has

    limₙ→∞ (1/n) Σᵢ₌₁ⁿ Yᵢ = EY₁  (a.e.).

For example, consider X = {0, 1}^N, F the σ-algebra generated by the cylinder sets, and µ the uniform product measure, i.e.,

    µ({x : x₁ = a₁, x₂ = a₂, …, xₙ = aₙ}) = 1/2ⁿ.

Suppose one is interested in finding the frequency of the digit 1. More precisely, for a.e. x we would like to find

    limₙ→∞ (1/n) #{1 ≤ i ≤ n : xᵢ = 1}.

Using the Strong Law of Large Numbers one can answer this question easily. Define

    Yᵢ(x) :=
        1, if xᵢ = 1,
        0, otherwise.

The random variables Y₁, Y₂, … form an i.i.d. Bernoulli process, and EYᵢ = E|Yᵢ| = 1/2. Further, #{1 ≤ i ≤ n : xᵢ = 1} = Σᵢ₌₁ⁿ Yᵢ(x). Hence, by the SLLN one has

    limₙ→∞ (1/n) #{1 ≤ i ≤ n : xᵢ = 1} = 1/2.

Suppose now we are interested in the frequency of the block 011, i.e., we would like to find

    limₙ→∞ (1/n) #{1 ≤ i ≤ n : xᵢ = 0, xᵢ₊₁ = 1, xᵢ₊₂ = 1}.

Define

    Zᵢ(x) :=
        1, if xᵢ = 0, xᵢ₊₁ = 1, xᵢ₊₂ = 1,
        0, otherwise.

Then,

    (1/n) #{1 ≤ i ≤ n : xᵢ = 0, xᵢ₊₁ = 1, xᵢ₊₂ = 1} = (1/n) Σᵢ₌₁ⁿ Zᵢ(x).

It is not hard to see that this sequence is stationary but not independent, so one cannot directly apply the Strong Law of Large Numbers. Notice that if T is the left shift on X, then Yₙ = Y₁ ∘ Tⁿ⁻¹ and Zₙ = Z₁ ∘ Tⁿ⁻¹.
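The limit in question can be anticipated by simulation (a sketch; the seed and the sample size are arbitrary choices of ours). For the uniform product measure, µ({x : x₀ = 0, x₁ = 1, x₂ = 1}) = 1/8, and the Ergodic Theorem below confirms that the running average of the Zᵢ converges to this value a.e.:

```python
import random

random.seed(0)                        # arbitrary seed, for reproducibility
n = 200000
bits = [random.randint(0, 1) for _ in range(n + 2)]   # fair coin tosses

# Z_i = 1 exactly when the block 011 starts at position i; the Z_i are
# stationary but not independent, yet their average still settles down
count = sum(1 for i in range(n)
            if (bits[i], bits[i + 1], bits[i + 2]) == (0, 1, 1))
print(count / n)                      # close to 1/8 = 0.125
```

The observed frequency fluctuates around 1/8 on the scale of 1/√n, as one would expect from the central limit behavior of weakly dependent averages.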

In general, suppose (X, F, µ) is a probability space and T : X → X a measure preserving transformation. For f ∈ L¹(X, F, µ), we would like to know under what conditions the limit

    limₙ→∞ (1/n) Σᵢ₌₀ⁿ⁻¹ f(Tⁱx)

exists a.e., and, if it does exist, what its value is. This is answered by the Ergodic Theorem, which was originally proved by G. D. Birkhoff in 1931. Since then, several proofs of this important theorem have been obtained; here we present a recent proof given by T. Kamae and M. S. Keane in [KK].

Theorem 2.1.1 (The Ergodic Theorem) Let (X, F, µ) be a probability space and T : X → X a measure preserving transformation. Then, for any f ∈ L¹(µ),

    limₙ→∞ (1/n) Σᵢ₌₀ⁿ⁻¹ f(Tⁱx) = f*(x)

exists a.e., is T-invariant, and ∫_X f* dµ = ∫_X f dµ. If moreover T is ergodic, then f* is a constant a.e. and f* = ∫_X f dµ.

The Ergodic Theorem and its consequences 31

For the proof of the above theorem, we need the following simple lemma.

Lemma 2.1.1 Let M > 0 be an integer, and suppose {a_n}_{n≥0}, {b_n}_{n≥0} are sequences of non-negative real numbers such that for each n = 0, 1, 2, … there exists an integer 1 ≤ m ≤ M with

    a_n + ⋯ + a_{n+m−1} ≥ b_n + ⋯ + b_{n+m−1}.

Then, for each positive integer N > M, one has

    a_0 + ⋯ + a_{N−1} ≥ b_0 + ⋯ + b_{N−M−1}.

Proof Using the hypothesis, we recursively find integers m_0 < m_1 < … < m_k ≤ N with m_k > N − M and the following properties:

    a_0 + … + a_{m_0−1} ≥ b_0 + … + b_{m_0−1},
    a_{m_0} + … + a_{m_1−1} ≥ b_{m_0} + … + b_{m_1−1},
    ⋮
    a_{m_{k−1}} + … + a_{m_k−1} ≥ b_{m_{k−1}} + … + b_{m_k−1}.

Then, since all the terms are non-negative,

    a_0 + … + a_{N−1} ≥ a_0 + … + a_{m_k−1} ≥ b_0 + … + b_{m_k−1} ≥ b_0 + … + b_{N−M−1}.

Proof of Theorem 2.1.1 Assume with no loss of generality that f ≥ 0 (otherwise we write f = f⁺ − f⁻, and we consider each part separately). Let f_n(x) = f(x) + … + f(T^{n−1}x), f̄(x) = lim sup_{n→∞} f_n(x)/n, and f̲(x) = lim inf_{n→∞} f_n(x)/n. Then f̄ and f̲ are T-invariant. This follows from

    f̄(Tx) = lim sup_{n→∞} f_n(Tx)/n
           = lim sup_{n→∞} [ (f_{n+1}(x)/(n+1)) · ((n+1)/n) − f(x)/n ]
           = lim sup_{n→∞} f_{n+1}(x)/(n+1) = f̄(x),

and similarly for f̲.

Since f̲ ≤ f̄ always holds, and both functions are T-invariant, it is enough to show that

    ∫_X f̄ dμ ≤ ∫_X f dμ ≤ ∫_X f̲ dμ;

for then f̄ = f̲ = f* a.e., and ∫_X f* dμ = ∫_X f dμ. We first prove that ∫_X f̄ dμ ≤ ∫_X f dμ. Fix any 0 < ε < 1, and let L > 0 be any real number. By definition of f̄, for any x ∈ X there exists an integer m > 0 such that

    f_m(x)/m ≥ min(f̄(x), L)(1 − ε).

Now, for any δ > 0 there exists an integer M > 0 such that the set

    X_0 = { x ∈ X : f_m(x)/m ≥ min(f̄(x), L)(1 − ε) for some 1 ≤ m ≤ M }

has measure at least 1 − δ. Define F on X by

    F(x) = f(x) if x ∈ X_0, and F(x) = L if x ∉ X_0.

For x ∈ X, let a_n = a_n(x) = F(T^n x) and b_n = b_n(x) = min(f̄(x), L)(1 − ε) (so b_n is independent of n). We now show that {a_n} and {b_n} satisfy the hypothesis of Lemma 2.1.1 with M > 0 as above. For any n = 0, 1, 2, …, if T^n x ∈ X_0, then there exists 1 ≤ m ≤ M such that

    f_m(T^n x) ≥ m min(f̄(T^n x), L)(1 − ε) = m min(f̄(x), L)(1 − ε) = b_n + … + b_{n+m−1},

using the T-invariance of f̄. Hence,

    a_n + … + a_{n+m−1} = F(T^n x) + … + F(T^{n+m−1} x)
                        ≥ f(T^n x) + … + f(T^{n+m−1} x) = f_m(T^n x)
                        ≥ b_n + … + b_{n+m−1}.

If T^n x ∉ X_0, then take m = 1, since

    a_n = F(T^n x) = L ≥ min(f̄(x), L)(1 − ε) = b_n.

Hence, by Lemma 2.1.1, for all integers N > M one has

    F(x) + F(Tx) + … + F(T^{N−1} x) ≥ (N − M) min(f̄(x), L)(1 − ε).

Integrating both sides, and using the fact that T is measure preserving, one gets

    N ∫_X F(x) dμ(x) ≥ (N − M) ∫_X min(f̄(x), L)(1 − ε) dμ(x).

Since

    ∫_X F(x) dμ(x) = ∫_{X_0} f(x) dμ(x) + L μ(X \ X_0),

one has

    ∫_X f(x) dμ(x) ≥ ∫_{X_0} f(x) dμ(x)
                   = ∫_X F(x) dμ(x) − L μ(X \ X_0)
                   ≥ ((N − M)/N) ∫_X min(f̄(x), L)(1 − ε) dμ(x) − Lδ.

Letting first N → ∞, then δ → 0, then ε → 0, and finally L → ∞, one gets, together with the monotone convergence theorem, that f̄ is integrable, and

    ∫_X f̄(x) dμ(x) ≤ ∫_X f(x) dμ(x).

We now prove that

    ∫_X f(x) dμ(x) ≤ ∫_X f̲(x) dμ(x).

Fix ε > 0. By definition of f̲, for any x ∈ X there exists an integer m > 0 such that

    f_m(x)/m ≤ f̲(x) + ε.

For any δ > 0 there exists an integer M > 0 such that the set

    Y_0 = { x ∈ X : f_m(x)/m ≤ f̲(x) + ε for some 1 ≤ m ≤ M }

has measure at least 1 − δ. Define G on X by

    G(x) = f(x) if x ∈ Y_0, and G(x) = 0 if x ∉ Y_0.

Now let a_n = a_n(x) = f̲(x) + ε and b_n = b_n(x) = G(T^n x) (so a_n is independent of n). One can easily check that the sequences {a_n} and {b_n} satisfy the hypothesis of Lemma 2.1.1 with M > 0 as above. Hence, for any N > M, one has

    G(x) + … + G(T^{N−M−1} x) ≤ N (f̲(x) + ε).

Integrating both sides yields

    (N − M) ∫_X G(x) dμ(x) ≤ N ( ∫_X f̲(x) dμ(x) + ε ).

Since f ≥ 0, the measure ν defined by ν(A) = ∫_A f(x) dμ(x) is absolutely continuous with respect to the measure μ. Hence, given ε_0 > 0, we may choose δ > 0 above so small that μ(A) < δ implies ν(A) < ε_0. Since μ(X \ Y_0) < δ, it follows that ν(X \ Y_0) = ∫_{X\Y_0} f(x) dμ(x) < ε_0. Hence,

    ∫_X f(x) dμ(x) = ∫_X G(x) dμ(x) + ∫_{X\Y_0} f(x) dμ(x)
                   ≤ (N/(N − M)) ∫_X (f̲(x) + ε) dμ(x) + ε_0.

Letting first N → ∞, then ε → 0 and δ → 0 (and hence ε_0 → 0), one gets

    ∫_X f(x) dμ(x) ≤ ∫_X f̲(x) dμ(x).

This shows that

    ∫_X f̄ dμ ≤ ∫_X f dμ ≤ ∫_X f̲ dμ ≤ ∫_X f̄ dμ,

hence f̄ = f̲ = f* a.e. and ∫_X f* dμ = ∫_X f dμ. Finally, if T is ergodic, then the T-invariance of f* implies that f* is a constant a.e. Therefore,

    f*(x) = ∫_X f*(y) dμ(y) = ∫_X f(y) dμ(y).

Remarks

(1) Let us study further the limit f* in the case that T is not ergodic. Let I be the sub-σ-algebra of F consisting of all T-invariant subsets A ∈ F. Notice that if f ∈ L¹(μ), then the conditional expectation of f given I (denoted by E_μ(f|I)) is the unique a.e. I-measurable L¹(μ) function with the property that

    ∫_A f(x) dμ(x) = ∫_A E_μ(f|I)(x) dμ(x)

for all A ∈ I. We claim that f* = E_μ(f|I). Since the limit function f* is T-invariant, it follows that f* is I-measurable. Furthermore, for any A ∈ I, by the ergodic theorem and the T-invariance of 1_A,

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} (f 1_A)(T^i x) = 1_A(x) lim_{n→∞} (1/n) Σ_{i=0}^{n−1} f(T^i x) = 1_A(x) f*(x)  a.e.,

and, applying the ergodic theorem to f 1_A,

    ∫_X 1_A(x) f*(x) dμ(x) = ∫_X f 1_A(x) dμ(x).

Hence ∫_A f* dμ = ∫_A f dμ for all A ∈ I, and therefore f* = E_μ(f|I).

(2) Suppose T is ergodic and measure preserving with respect to μ, and let ν be a probability measure which is equivalent to μ (i.e., μ and ν have the same sets of measure zero, so μ(A) = 0 if and only if ν(A) = 0). Then for every f ∈ L¹(μ) one has

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} f(T^i(x)) = ∫_X f dμ

ν-a.e.

Exercise 2.1.1 Let T be an ergodic measure preserving transformation on a probability space (X, F, μ). Let A be a measurable subset of X of positive measure, denote by n the first return time map, and let T_A be the induced transformation of T on A (see Section 1.5). Prove that

    ∫_A n(x) dμ = 1.

Conclude that

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} n(T_A^i(x)) = 1/μ(A)

almost everywhere on A.
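The statement of Exercise 2.1.1 can be checked numerically. In the sketch below (our own illustration, with our own names), T is the ergodic rotation x ↦ x + α (mod 1), A = [0, 1/4), and the observed mean return time to A approaches 1/λ(A) = 4.

```python
import math

# Mean first-return time to A = [0, a) along the orbit of the rotation
# T(x) = x + alpha mod 1. Illustrative sketch, not from the notes.
def mean_return_time(alpha, a, n_iter, x=0.0):
    times, last_visit = [], None
    for k in range(n_iter):
        if x < a:                       # orbit point lies in A = [0, a)
            if last_visit is not None:
                times.append(k - last_visit)
            last_visit = k
        x = (x + alpha) % 1.0
    return sum(times) / len(times)

print(mean_return_time(math.sqrt(2) % 1, 0.25, 100_000))  # close to 4
```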

Exercise 2.1.2 Let β = (1 + √5)/2, and consider the transformation T : [0, 1) → [0, 1) given by Tx = βx (mod 1) = βx − ⌊βx⌋. Define b_1 on [0, 1) by

    b_1(x) = 0 if 0 ≤ x < 1/β, and b_1(x) = 1 if 1/β ≤ x < 1,

and set b_n = b_n(x) = b_1(T^{n−1} x) for n ≥ 1. Fix k ≥ 0. Find the a.e. value (with respect to Lebesgue measure) of the following limit:

    lim_{n→∞} (1/n) #{1 ≤ i ≤ n : b_i = 0, b_{i+1} = 0, …, b_{i+k} = 0}.
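A limit of this kind can be guessed from a simulation before it is computed exactly. The sketch below (our own exploration; names and the starting point are arbitrary) iterates Tx = βx (mod 1) and records the empirical frequency of the digit 0; by Remark (2) above, the a.e. limit equals the measure of [0, 1/β) under the T-invariant measure equivalent to Lebesgue measure, so the observed frequency comes out noticeably larger than 1/2.

```python
# Empirical frequency of the digit b_i = 0 for the map T(x) = beta*x mod 1.
# Exploratory sketch only; floating-point orbits are pseudo-orbits.
beta = (1 + 5 ** 0.5) / 2

def zero_digit_frequency(x, n):
    zeros = 0
    for _ in range(n):
        if x < 1 / beta:          # b = 0 exactly when x lies in [0, 1/beta)
            zeros += 1
        x = (beta * x) % 1.0
    return zeros / n

print(zero_digit_frequency(0.123456789, 200_000))
```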

Using the Ergodic Theorem, one can give yet another characterization of

ergodicity.

Corollary Let (X, F, μ) be a probability space, and T : X → X a measure preserving transformation. Then, T is ergodic if and only if for all A, B ∈ F, one has

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} μ(T^{−i}A ∩ B) = μ(A)μ(B).    (2.1)

Proof Suppose T is ergodic, and let A, B ∈ F. Since 1_A ∈ L¹(X, F, μ), by the ergodic theorem one has

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} 1_A(T^i x) = ∫_X 1_A(x) dμ(x) = μ(A)  a.e.

Then,

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} 1_{T^{−i}A∩B}(x) = lim_{n→∞} (1/n) Σ_{i=0}^{n−1} 1_{T^{−i}A}(x) 1_B(x)
        = 1_B(x) lim_{n→∞} (1/n) Σ_{i=0}^{n−1} 1_A(T^i x)
        = 1_B(x) μ(A)  a.e.

Since for each n the function (1/n) Σ_{i=0}^{n−1} 1_{T^{−i}A∩B} is dominated by the constant function 1, it follows by the dominated convergence theorem that

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} μ(T^{−i}A ∩ B) = lim_{n→∞} ∫_X (1/n) Σ_{i=0}^{n−1} 1_{T^{−i}A∩B}(x) dμ(x)
        = ∫_X 1_B(x) μ(A) dμ(x) = μ(A)μ(B).

Conversely, suppose (2.1) holds for all A, B ∈ F. Let E ∈ F be such that T^{−1}E = E and μ(E) > 0. By invariance of E, we have μ(T^{−i}E ∩ E) = μ(E), hence

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} μ(T^{−i}E ∩ E) = μ(E).

On the other hand, by (2.1),

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} μ(T^{−i}E ∩ E) = μ(E)².

Hence, μ(E) = μ(E)². Since μ(E) > 0, this implies μ(E) = 1. Therefore, T is ergodic.

To show ergodicity, one needs to verify equation (2.1) only for sets A and B belonging to a generating semi-algebra, as the next proposition shows.

Proposition Let (X, F, μ) be a probability space, and S a generating semi-algebra of F. Let T : X → X be a measure preserving transformation. Then, T is ergodic if and only if for all A, B ∈ S, one has

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} μ(T^{−i}A ∩ B) = μ(A)μ(B).    (2.2)

Proof We only need to show that if (2.2) holds for all A, B ∈ S, then it holds for all A, B ∈ F. Let ε > 0, and A, B ∈ F. Then, by Lemma 1.2.1 (Subsection 1.2) there exist sets A_0, B_0, each of which is a finite disjoint union of elements of S, such that μ(A△A_0) < ε and μ(B△B_0) < ε. Since

    (T^{−i}A ∩ B) △ (T^{−i}A_0 ∩ B_0) ⊆ (T^{−i}A △ T^{−i}A_0) ∪ (B△B_0),

it follows that

    |μ(T^{−i}A ∩ B) − μ(T^{−i}A_0 ∩ B_0)| ≤ μ((T^{−i}A ∩ B) △ (T^{−i}A_0 ∩ B_0))
        ≤ μ(T^{−i}A △ T^{−i}A_0) + μ(B△B_0) < 2ε.

Further,

    |μ(A)μ(B) − μ(A_0)μ(B_0)| ≤ |μ(B) − μ(B_0)| + |μ(A) − μ(A_0)|
        ≤ μ(B△B_0) + μ(A△A_0) < 2ε.

Hence,

    | ( (1/n) Σ_{i=0}^{n−1} μ(T^{−i}A ∩ B) − μ(A)μ(B) ) − ( (1/n) Σ_{i=0}^{n−1} μ(T^{−i}A_0 ∩ B_0) − μ(A_0)μ(B_0) ) |
        ≤ (1/n) Σ_{i=0}^{n−1} |μ(T^{−i}A ∩ B) − μ(T^{−i}A_0 ∩ B_0)| + |μ(A)μ(B) − μ(A_0)μ(B_0)|
        < 4ε.

Since (2.2) holds for A_0 and B_0 (being finite disjoint unions of elements of S), and ε > 0 is arbitrary, it follows that

    lim_{n→∞} [ (1/n) Σ_{i=0}^{n−1} μ(T^{−i}A ∩ B) − μ(A)μ(B) ] = 0.

Theorem 2.1.2 Suppose μ_1 and μ_2 are probability measures on (X, F), and T : X → X is measurable and measure preserving with respect to μ_1 and μ_2. Then,

(i) if T is ergodic with respect to μ_1, and μ_2 is absolutely continuous with respect to μ_1, then μ_1 = μ_2;

(ii) if T is ergodic with respect to μ_1 and μ_2, then either μ_1 = μ_2, or μ_1 and μ_2 are singular with respect to each other.

Proof (i) Suppose T is ergodic with respect to μ_1, and μ_2 is absolutely continuous with respect to μ_1. For any A ∈ F, by the ergodic theorem, for μ_1-a.e. x one has

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} 1_A(T^i x) = μ_1(A).

Let

    C_A = { x ∈ X : lim_{n→∞} (1/n) Σ_{i=0}^{n−1} 1_A(T^i x) = μ_1(A) };

then μ_1(C_A) = 1, and by absolute continuity also μ_2(C_A) = 1. Since T is measure preserving with respect to μ_2, for each n ≥ 1 one has

    ∫_X (1/n) Σ_{i=0}^{n−1} 1_A(T^i x) dμ_2(x) = μ_2(A).

Since the integrands are dominated by the constant function 1, the dominated convergence theorem yields

    μ_2(A) = lim_{n→∞} ∫_X (1/n) Σ_{i=0}^{n−1} 1_A(T^i x) dμ_2(x) = ∫_X μ_1(A) dμ_2(x) = μ_1(A).

Therefore, μ_1 = μ_2.

(ii) Suppose T is ergodic with respect to μ_1 and μ_2. Assume that μ_1 ≠ μ_2. Then, there exists a set A ∈ F such that μ_1(A) ≠ μ_2(A). For i = 1, 2 let

    C_i = { x ∈ X : lim_{n→∞} (1/n) Σ_{j=0}^{n−1} 1_A(T^j x) = μ_i(A) }.

By the ergodic theorem, μ_i(C_i) = 1 for i = 1, 2, and since μ_1(A) ≠ μ_2(A) we have C_1 ∩ C_2 = ∅. Thus μ_1 and μ_2 are supported on disjoint sets, and hence μ_1 and μ_2 are mutually singular.

We end this subsection with a short discussion showing that the assumption of ergodicity is not very restrictive. Let T be a transformation on the probability space (X, F, μ), and suppose T is measure preserving but not necessarily ergodic. We assume that X is a complete separable metric space, and F the corresponding Borel σ-algebra (in order to make sure that the conditional expectation is well-defined a.e.). Let I be the sub-σ-algebra of T-invariant measurable sets. We can decompose μ into T-invariant ergodic components in the following way. For x ∈ X, define a measure μ_x on F by

    μ_x(A) = E_μ(1_A|I)(x).

Then, for any f ∈ L¹(X, F, μ),

    ∫_X f(y) dμ_x(y) = E_μ(f|I)(x).

Note that

    μ(A) = ∫_X E_μ(1_A|I)(x) dμ(x) = ∫_X μ_x(A) dμ(x),

so that μ is an average of the measures μ_x.

We claim that μ_x is T-invariant and ergodic for a.e. x ∈ X. So let A ∈ F; then for a.e. x,

    μ_x(T^{−1}A) = E_μ(1_A ∘ T|I)(x) = E_μ(1_A|I)(Tx) = E_μ(1_A|I)(x) = μ_x(A).

Now, let A ∈ F be such that T^{−1}A = A. Then, 1_A is T-invariant, and hence I-measurable. Thus,

    μ_x(A) = E_μ(1_A|I)(x) = 1_A(x)  a.e.

Hence, for a.e. x and for any B ∈ F,

    μ_x(A ∩ B) = E_μ(1_A 1_B|I)(x) = 1_A(x) E_μ(1_B|I)(x) = μ_x(A) μ_x(B).

In particular, if A = B, then the latter equality yields μ_x(A) = μ_x(A)², which implies that for a.e. x, μ_x(A) = 0 or 1. Therefore, μ_x is ergodic. (One in fact needs to work a little harder to show that one can find a set N of μ-measure zero such that for any x ∈ X \ N and any T-invariant set A, one has μ_x(A) = 0 or 1; in the above analysis the a.e. set depended on the choice of A. Hence, the above analysis is just a rough sketch of the proof of what is called the ergodic decomposition of measure preserving transformations.)

2.2 Characterization of Irreducible Markov Chains

Consider the Markov chain of Example (f), Subsection 1.3. That is, X = {0, 1, …, N − 1}^ℤ, F the σ-algebra generated by the cylinders, T : X → X the left shift, and μ the Markov measure defined by the stochastic N × N matrix P = (p_{ij}) and the positive probability vector π = (π_0, π_1, …, π_{N−1}) satisfying πP = π. That is,

    μ({x : x_0 = i_0, x_1 = i_1, …, x_n = i_n}) = π_{i_0} p_{i_0 i_1} p_{i_1 i_2} ⋯ p_{i_{n−1} i_n}.

We want to find necessary and sufficient conditions for T to be ergodic. To achieve this, we first set

    Q = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} P^k,

where P^k = (p_{ij}^{(k)}) is the k-th power of the matrix P, and P⁰ is the N × N identity matrix. More precisely, Q = (q_{ij}), where

    q_{ij} = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} p_{ij}^{(k)}.

Lemma 2.2.1 For each i, j ∈ {0, 1, …, N − 1}, the limit lim_{n→∞} (1/n) Σ_{k=0}^{n−1} p_{ij}^{(k)} exists, i.e., q_{ij} is well-defined.

Proof Notice that

    (1/n) Σ_{k=0}^{n−1} p_{ij}^{(k)} = (1/π_i) (1/n) Σ_{k=0}^{n−1} μ({x ∈ X : x_0 = i, x_k = j}).

By the ergodic theorem, the limit

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} 1_{{x : x_k = j}}(x) = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} 1_{{x : x_0 = j}}(T^k x) = f*(x)

exists a.e., and hence so does

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} 1_{{x : x_0 = i, x_k = j}}(x) = 1_{{x : x_0 = i}}(x) lim_{n→∞} (1/n) Σ_{k=0}^{n−1} 1_{{x : x_0 = j}}(T^k x) = f*(x) 1_{{x : x_0 = i}}(x).

Since 0 ≤ (1/n) Σ_{k=0}^{n−1} 1_{{x : x_0 = i, x_k = j}}(x) ≤ 1 for all n, by the dominated convergence theorem,

    q_{ij} = (1/π_i) lim_{n→∞} (1/n) Σ_{k=0}^{n−1} μ({x ∈ X : x_0 = i, x_k = j})
           = (1/π_i) ∫_X lim_{n→∞} (1/n) Σ_{k=0}^{n−1} 1_{{x : x_0 = i, x_k = j}}(x) dμ(x)
           = (1/π_i) ∫_X f*(x) 1_{{x : x_0 = i}}(x) dμ(x)
           = (1/π_i) ∫_{{x : x_0 = i}} f*(x) dμ(x),

which is finite since f* is integrable. Hence q_{ij} exists.

Exercise 2.2.1 Show that the matrix Q has the following properties:

(a) Q is stochastic.

(b) Q = QP = PQ = Q².

(c) πQ = π.
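These properties are easy to observe numerically. The sketch below (our own example matrix, not from the text) approximates Q by the Cesàro average of the powers of a 2 × 2 stochastic matrix P; for this irreducible P, both rows of Q come out equal to the stationary vector π = (5/6, 1/6).

```python
# Approximate Q = lim (1/n) sum_{k<n} P^k for a small stochastic matrix.
# Illustrative sketch; the matrix P below is our own example.
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def cesaro_q(p, n):
    size = len(p)
    power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]  # P^0
    acc = [[0.0] * size for _ in range(size)]
    for _ in range(n):
        for i in range(size):
            for j in range(size):
                acc[i][j] += power[i][j] / n
        power = mat_mul(power, p)
    return acc

P = [[0.9, 0.1], [0.5, 0.5]]
Q = cesaro_q(P, 5000)
print(Q)  # both rows close to the stationary vector (5/6, 1/6)
```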

Definition A stochastic matrix P is said to be irreducible if for every i, j ∈ {0, 1, …, N − 1} there exists n ≥ 1 such that p_{ij}^{(n)} > 0.
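Irreducibility is a purely combinatorial condition on the positions of the positive entries of P, so it can be checked by a graph search. The following sketch (ours; the function name is an assumption) tests reachability of every state from every state along positive transitions.

```python
# Check irreducibility: every state j reachable from every state i in >= 1 steps.
# Illustrative sketch, not part of the notes.
def is_irreducible(p):
    size = len(p)
    for i in range(size):
        reached, frontier = set(), {i}
        while frontier:  # breadth-first search along positive entries
            nxt = {j for s in frontier for j in range(size) if p[s][j] > 0}
            frontier = nxt - reached
            reached |= nxt
        if reached != set(range(size)):  # some j is never reached
            return False
    return True

print(is_irreducible([[0.0, 1.0], [1.0, 0.0]]))  # True: periodic yet irreducible
print(is_irreducible([[1.0, 0.0], [0.5, 0.5]]))  # False: state 1 unreachable from 0
```

The first example shows that irreducibility does not require aperiodicity; the Cesàro averages defining Q are what make the periodic case work.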

Theorem 2.2.1 The following are equivalent,

(i) T is ergodic.

(ii) All rows of Q are identical.

(iii) qij > 0 for all i, j.

(iv) P is irreducible.

(v) 1 is a simple eigenvalue of P .

Proof

(i) ⇒ (ii): By the ergodic theorem, for each i, j,

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} 1_{{x : x_0 = i, x_k = j}}(x) = 1_{{x : x_0 = i}}(x) π_j  a.e.

By the dominated convergence theorem,

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} μ({x ∈ X : x_0 = i, x_k = j}) = π_i π_j.

Hence,

    q_{ij} = (1/π_i) lim_{n→∞} (1/n) Σ_{k=0}^{n−1} μ({x ∈ X : x_0 = i, x_k = j}) = π_j,

i.e., q_{ij} is independent of i. Therefore, all rows of Q are identical.

(ii) ⇒ (iii): If all the rows of Q are identical, then all the columns of Q are constant. Thus, for each j there exists a constant c_j such that q_{ij} = c_j for all i. Since πQ = π, it follows that q_{ij} = c_j = π_j > 0 for all i, j.

(iii) ⇒ (iv): For any i, j,

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} p_{ij}^{(k)} = q_{ij} > 0.

Hence, there exists n such that p_{ij}^{(n)} > 0; therefore P is irreducible.

(iv) ⇒ (iii): Suppose P is irreducible. For any state i ∈ {0, 1, …, N − 1}, let S_i = {j : q_{ij} > 0}. Since Q is a stochastic matrix, it follows that S_i ≠ ∅. Let l ∈ S_i, so that q_{il} > 0. Since Q = QP = QP^n for all n, then for any state j,

    q_{ij} = Σ_{m=0}^{N−1} q_{im} p_{mj}^{(n)} ≥ q_{il} p_{lj}^{(n)}

for any n. Since P is irreducible, there exists n such that p_{lj}^{(n)} > 0. Hence, q_{ij} > 0 for all i, j.

(iii) ⇒ (ii): Suppose q_{ij} > 0 for all i, j ∈ {0, 1, …, N − 1}. Fix any state j, and let q_j = max_{0≤i≤N−1} q_{ij}. Suppose that not all the q_{ij}'s are the same. Then there exists k ∈ {0, 1, …, N − 1} such that q_{kj} < q_j. Since Q is stochastic and Q² = Q, then for any i ∈ {0, 1, …, N − 1} we have

    q_{ij} = Σ_{l=0}^{N−1} q_{il} q_{lj} < Σ_{l=0}^{N−1} q_{il} q_j = q_j,

where the strict inequality uses q_{ik} > 0 and q_{kj} < q_j. Taking for i a state with q_{ij} = q_j gives a contradiction. Hence, all the columns of Q are constant, i.e., all the rows are identical.

(ii) ⇒ (i): Suppose all the rows of Q are identical. We have shown above that this implies q_{ij} = π_j for all i, j ∈ {0, 1, …, N − 1}. Hence

    π_j = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} p_{ij}^{(k)}.

Let

    A = {x : x_r = i_0, …, x_{r+l} = i_l}  and  B = {x : x_s = j_0, …, x_{s+m} = j_m}

be two arbitrary cylinders. By the proposition of the previous section, it suffices to show that

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} μ(T^{−k}A ∩ B) = μ(A)μ(B).

Since T is the left shift, for all n sufficiently large the cylinders T^{−n}A and B depend on different coordinates. Hence, for n sufficiently large,

    μ(T^{−n}A ∩ B) = π_{j_0} p_{j_0 j_1} ⋯ p_{j_{m−1} j_m} p_{j_m i_0}^{(n+r−s−m)} p_{i_0 i_1} ⋯ p_{i_{l−1} i_l}.

Thus,

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} μ(T^{−k}A ∩ B)
        = π_{j_0} p_{j_0 j_1} ⋯ p_{j_{m−1} j_m} p_{i_0 i_1} ⋯ p_{i_{l−1} i_l} lim_{n→∞} (1/n) Σ_{k=0}^{n−1} p_{j_m i_0}^{(k)}
        = (π_{j_0} p_{j_0 j_1} ⋯ p_{j_{m−1} j_m}) (π_{i_0} p_{i_0 i_1} ⋯ p_{i_{l−1} i_l})
        = μ(B)μ(A).

Therefore, T is ergodic.

(ii) ⇒ (v): If all the rows of Q are identical, then q_{ij} = π_j for all i, j. If vP = v, then vQ = v. This implies that for all j, v_j = (Σ_{i=0}^{N−1} v_i) π_j. Thus, v is a multiple of π. Therefore, 1 is a simple eigenvalue.

(v) ⇒ (ii): Suppose 1 is a simple eigenvalue. For any i, let q_i = (q_{i0}, …, q_{i(N−1)}) denote the i-th row of Q; then q_i is a probability vector. From Q = QP, we get q_i = q_i P. By hypothesis, π is the only probability vector satisfying πP = π; hence π = q_i, and all the rows of Q are identical.


2.3 Mixing

As a corollary to the ergodic theorem we found a new definition of ergodicity;

namely, asymptotic average independence. Based on the same idea, we now

define other notions of weak independence that are stronger than ergodicity.

Definition Let (X, F, μ) be a probability space, and T : X → X a measure preserving transformation. Then,

(i) T is weakly mixing if for all A, B ∈ F, one has

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} |μ(T^{−i}A ∩ B) − μ(A)μ(B)| = 0.    (2.3)

(ii) T is strongly mixing if for all A, B ∈ F, one has

    lim_{n→∞} μ(T^{−n}A ∩ B) = μ(A)μ(B).    (2.4)

Notice that strongly mixing implies weakly mixing, and weakly mixing implies ergodicity. This follows from the simple fact that if {a_n} is a sequence of real numbers such that lim_{n→∞} a_n = 0, then lim_{n→∞} (1/n) Σ_{i=0}^{n−1} |a_i| = 0, and hence lim_{n→∞} (1/n) Σ_{i=0}^{n−1} a_i = 0. Furthermore, if {a_n} is a bounded sequence, then the following are equivalent:

(i) lim_{n→∞} (1/n) Σ_{i=0}^{n−1} |a_i| = 0;

(ii) lim_{n→∞} (1/n) Σ_{i=0}^{n−1} |a_i|² = 0;

(iii) there exists a subset J of the non-negative integers of density zero, i.e.,

    lim_{n→∞} (1/n) #({0, 1, …, n − 1} ∩ J) = 0,

such that lim_{n→∞, n∉J} a_n = 0.
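Condition (iii) is worth a small numerical illustration (our own toy example): take a_n = 1 when n is a perfect square and a_n = 0 otherwise. Then a_n does not converge to 0, but the exceptional set J = {squares} has density zero, and the Cesàro averages of |a_n| vanish.

```python
# Cesaro average of the indicator of the perfect squares: about sqrt(n)/n,
# so it tends to 0 although a_n = 1 infinitely often. Toy example of ours.
def cesaro_average(n):
    squares = {k * k for k in range(int(n ** 0.5) + 1)}
    return sum(1 for i in range(n) if i in squares) / n

print(cesaro_average(10_000))     # 0.01
print(cesaro_average(1_000_000))  # 0.001
```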

Using this, one can give three equivalent characterizations of weakly mixing transformations; can you state them?

Exercise Let (X, F, μ) be a probability space, and T : X → X a measure preserving transformation. Let S be a generating semi-algebra of F.

(a) Show that if equation (2.3) holds for all A, B ∈ S, then T is weakly mixing.

(b) Show that if equation (2.4) holds for all A, B ∈ S, then T is strongly mixing.

Exercise Let T be the Bernoulli shift of Example (e) in Subsection 1.3 (see also Example (2) in Subsection 1.8). Show that T is strongly mixing.

Exercise Let (X, F, μ) be a probability space, and T : X → X a measure preserving transformation. Consider the transformation T × T defined on (X × X, F × F, μ × μ) by T × T(x, y) = (Tx, Ty). Show that T is weakly mixing if and only if T × T is ergodic.

Chapter 3

Measure Preserving

Isomorphisms and Factor Maps

Given a measure preserving transformation T on a probability space (X, F, μ), we call the quadruple (X, F, μ, T) a dynamical system. Now, given two dynamical systems (X, F, μ, T) and (Y, C, ν, S), what should we mean by: these systems are the same? On each space there are two important structures:

(1) The measure structure, given by the σ-algebra and the probability measure. Note that, in this context, sets of measure zero can be ignored.

(2) The dynamical structure, given by a measure preserving transformation.

So our notion of being the same must mean that we have a map

    φ : (X, F, μ, T) → (Y, C, ν, S)

satisfying

(i) φ is one-to-one and onto a.e.: if we remove a (suitable) set N_X of measure 0 in X, and a (suitable) set N_Y of measure 0 in Y, the map φ : X \ N_X → Y \ N_Y is a bijection;

(ii) φ is measurable, i.e., φ^{−1}(C) ∈ F for all C ∈ C;

(iii) φ preserves the measures: ν = μ ∘ φ^{−1}, i.e., ν(C) = μ(φ^{−1}(C)) for all C ∈ C;

(iv) φ preserves the dynamics of T and S, i.e., φ ∘ T = S ∘ φ, which is the same as saying that the following diagram commutes:

            T
      N ---------> N
      |            |
    φ |            | φ
      v            v
      N' --------> N'
            S

Here N and N′ are the full-measure sets on which φ is a bijection. This means that the T-orbit x, Tx, T²x, …, Tⁿx, … is carried by φ to the S-orbit φ(x), S(φ(x)), S²(φ(x)), …, Sⁿ(φ(x)), …

Definition We say that two dynamical systems (X, F, μ, T) and (Y, C, ν, S) are isomorphic if there exist measurable sets N ⊆ X and M ⊆ Y with μ(X \ N) = ν(Y \ M) = 0 and T(N) ⊆ N, S(M) ⊆ M, and finally if there exists a measurable map φ : N → M such that (i)–(iv) are satisfied.

Exercise 3.1.1 Suppose (X, F, μ, T) and (Y, C, ν, S) are two isomorphic dynamical systems. Show that T is ergodic if and only if S is ergodic.

Examples

(1) Let K = {z ∈ ℂ : |z| = 1} be equipped with the Borel σ-algebra B on K, and Haar measure (i.e., normalized Lebesgue measure on the unit circle). Define S : K → K by Sz = z². One can easily check that S is measure preserving. In fact, the map S is isomorphic to the map T on ([0, 1), B, λ) given by Tx = 2x (mod 1) (see Example (b) in Subsection 1.3, and Example (3) in Subsection 1.8). Define a map φ : [0, 1) → K by φ(x) = e^{2πix}. We leave it to the reader to check that φ is a measurable isomorphism, i.e., φ is a measurable and measure preserving bijection such that Sφ(x) = φ(Tx) for all x ∈ [0, 1).

(2) Consider ([0, 1), B, λ), the unit interval with the Lebesgue σ-algebra and Lebesgue measure. Let T : [0, 1) → [0, 1) be given by Tx = Nx − ⌊Nx⌋. Iterations of T generate the N-adic expansion of points in the unit interval. Let Y := {0, 1, …, N − 1}^ℕ, the set of all sequences (y_n)_{n≥1} with y_n ∈ {0, 1, …, N − 1} for n ≥ 1. We now construct an isomorphism between ([0, 1), B, λ, T) and (Y, F, ν, S), where F is the σ-algebra generated by the cylinders, ν the uniform product measure defined on cylinders by

    ν({(y_i)_{i≥1} ∈ Y : y_1 = a_1, y_2 = a_2, …, y_n = a_n}) = 1/N^n

for any (a_1, a_2, a_3, …) ∈ Y, and where S is the left shift.

Define φ : [0, 1) → Y = {0, 1, …, N − 1}^ℕ by

    φ : x = Σ_{k=1}^∞ a_k/N^k ↦ (a_k)_{k≥1},

where Σ_{k=1}^∞ a_k/N^k is the N-adic expansion of x (for example, if N = 2 we get the binary expansion, and if N = 10 we get the decimal expansion). Let C(i_1, …, i_n) = {y ∈ Y : y_1 = i_1, …, y_n = i_n} be an arbitrary cylinder; we check measurability and measure preservingness on cylinders:

    φ^{−1}(C(i_1, …, i_n)) = [ i_1/N + i_2/N² + ⋯ + i_n/N^n , i_1/N + i_2/N² + ⋯ + (i_n + 1)/N^n )

and

    λ(φ^{−1}(C(i_1, …, i_n))) = 1/N^n = ν(C(i_1, …, i_n)).

Note that φ is, up to a set of measure zero, a bijection, since every x ∈ [0, 1) has a unique N-adic expansion generated by T. Finally, it is easy to see that φ ∘ T = S ∘ φ.
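The conjugacy φ ∘ T = S ∘ φ can be verified digit by digit on examples. The sketch below (our own check, using exact rational arithmetic to avoid rounding) computes the N-adic digits generated by T and confirms that the digits of Tx are the shifted digits of x.

```python
from fractions import Fraction

# Digits of the N-adic expansion generated by T(x) = N*x mod 1.
# Our own finite check of phi(Tx) = shift(phi(x)), for N = 2.
def digits(x, n, base=2):
    out = []
    for _ in range(n):
        x *= base
        d = int(x)       # the next digit
        out.append(d)
        x -= d
    return out

x = Fraction(5, 13)
Tx = (2 * x) % 1
print(digits(x, 12)[1:])  # shift of phi(x)
print(digits(Tx, 11))     # phi(Tx): the same list
```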

Exercise 3.1.2 Consider ([0, 1)², B × B, λ × λ), where B × B is the product Lebesgue σ-algebra and λ × λ the product Lebesgue measure. Let T : [0, 1)² → [0, 1)² be given by

    T(x, y) = (2x, y/2)           if 0 ≤ x < 1/2,
    T(x, y) = (2x − 1, (y + 1)/2) if 1/2 ≤ x < 1.

Show that T is isomorphic to the two-sided Bernoulli shift S on ({0, 1}^ℤ, F, μ), where F is the σ-algebra generated by cylinders of the form

    Δ = {x : x_{−k} = a_{−k}, …, x_ℓ = a_ℓ},  a_i ∈ {0, 1}, i = −k, …, ℓ,  k, ℓ ≥ 0,

and μ the product measure with weights (1/2, 1/2) (so μ(Δ) = (1/2)^{k+ℓ+1}).

Exercise 3.1.3 Let G = (1 + √5)/2, so that G² = G + 1. Consider the set

    X = ( [0, 1/G) × [0, 1) ) ∪ ( [1/G, 1) × [0, 1/G) ),

endowed with the product Borel σ-algebra. Define the transformation

    T(x, y) = (Gx, y/G)           if (x, y) ∈ [0, 1/G) × [0, 1),
    T(x, y) = (Gx − 1, (1 + y)/G) if (x, y) ∈ [1/G, 1) × [0, 1/G).

(a) Show that T is measure preserving with respect to normalized Lebesgue measure on X.

(b) Now let S : [0, 1) × [0, 1) → [0, 1) × [0, 1) be given by

    S(x, y) = (Gx, y/G)              if (x, y) ∈ [0, 1/G) × [0, 1),
    S(x, y) = (G²x − G, (G + y)/G²)  if (x, y) ∈ [1/G, 1) × [0, 1).

Show that S is measure preserving with respect to Lebesgue measure on [0, 1) × [0, 1).

(c) Let Y = [0, 1) × [0, 1/G), and let U be the induced transformation of T on Y, i.e., for (x, y) ∈ Y, U(x, y) = T^{n(x,y)}(x, y), where n(x, y) = inf{n ≥ 1 : T^n(x, y) ∈ Y}. Show that the map [0, 1) × [0, 1) → Y given by

    (x, y) ↦ (x, y/G)

defines an isomorphism from S to U, where Y has the induced measure structure (see Section 1.5).

3.2 Factor Maps

In the above section, we discussed the notion of isomorphism, which describes when two dynamical systems are considered the same. Now, we give a precise definition of what it means for a dynamical system to be a subsystem of another one.

Definition Let (X, F, μ, T) and (Y, C, ν, S) be two dynamical systems. We say that S is a factor of T if there exist measurable sets M₁ ∈ F and M₂ ∈ C such that μ(M₁) = ν(M₂) = 1 and T(M₁) ⊆ M₁, S(M₂) ⊆ M₂, and finally if there exists a measurable and measure preserving map ψ : M₁ → M₂ which is surjective and satisfies ψ(T(x)) = S(ψ(x)) for all x ∈ M₁. We call ψ a factor map.

Remark If ψ is a factor map, then G = ψ^{−1}C is a T-invariant sub-σ-algebra of F, since

    T^{−1}G = T^{−1}ψ^{−1}C = ψ^{−1}S^{−1}C ⊆ ψ^{−1}C = G.

Example Let T be the baker's transformation on ([0, 1)², B × B, λ × λ) given by

    T(x, y) = (2x, y/2)           if 0 ≤ x < 1/2,
    T(x, y) = (2x − 1, (y + 1)/2) if 1/2 ≤ x < 1,

and let S be the left shift on X = {0, 1}^ℕ with the σ-algebra F generated by the cylinders, and the uniform product measure μ. Define ψ : [0, 1) × [0, 1) → X by

    ψ(x, y) = (a_1, a_2, …),

where x = Σ_{n=1}^∞ a_n/2^n is the binary expansion of x. It is easy to check that ψ is a factor map.

Exercise 3.2.1 Let T be the left shift on X = {0, 1, 2}^ℕ, which is endowed with the σ-algebra F generated by the cylinder sets, and the uniform product measure μ giving each symbol probability 1/3, i.e.,

    μ({x ∈ X : x_1 = i_1, x_2 = i_2, …, x_n = i_n}) = 1/3^n,

where i_1, i_2, …, i_n ∈ {0, 1, 2}.

Let S be the left shift on Y = {0, 1}^ℕ, which is endowed with the σ-algebra G generated by the cylinder sets, and the product measure ν giving the symbol 0 probability 1/3 and the symbol 1 probability 2/3, i.e.,

    ν({y ∈ Y : y_1 = j_1, y_2 = j_2, …, y_n = j_n}) = (2/3)^{j_1+j_2+…+j_n} (1/3)^{n−(j_1+j_2+…+j_n)},

where j_1, j_2, …, j_n ∈ {0, 1}. Show that S is a factor of T.

Exercise Show that a factor of an ergodic (weakly mixing/strongly mixing) transformation is also ergodic (weakly mixing/strongly mixing).

3.3 Natural Extensions

Suppose (Y, G, ν, S) is a non-invertible measure-preserving dynamical system. An invertible measure-preserving dynamical system (X, F, μ, T) is called a natural extension of (Y, G, ν, S) if S is a factor of T and the factor map ψ satisfies

    ⋁_{m=0}^∞ T^m ψ^{−1} G = F,

where ⋁_{k=0}^∞ T^k ψ^{−1} G is the smallest σ-algebra containing the σ-algebras T^k ψ^{−1} G for all k ≥ 0.

Example Let T on ({0, 1}^ℤ, F, μ) be the two-sided Bernoulli shift, and S on ({0, 1}^{ℕ∪{0}}, G, ν) the one-sided Bernoulli shift, both spaces endowed with the uniform product measure. Notice that T is invertible, while S is not. Set X = {0, 1}^ℤ, Y = {0, 1}^{ℕ∪{0}}, and define ψ : X → Y by

    ψ(…, x_{−1}, x_0, x_1, …) = (x_0, x_1, …).

Then ψ is a factor map. We claim that

    ⋁_{k=0}^∞ T^k ψ^{−1} G = F.

It suffices to show that ⋁_{k=0}^∞ T^k ψ^{−1} G contains all cylinders generating F. Let Δ = {x ∈ X : x_{−k} = a_{−k}, …, x_ℓ = a_ℓ} be an arbitrary cylinder in F, and let D = {y ∈ Y : y_0 = a_{−k}, …, y_{k+ℓ} = a_ℓ}, which is a cylinder in G. Then,

    ψ^{−1}D = {x ∈ X : x_0 = a_{−k}, …, x_{k+ℓ} = a_ℓ}  and  T^k ψ^{−1}D = Δ.

Hence every cylinder belongs to ⋁_{k=0}^∞ T^k ψ^{−1} G, and therefore

    ⋁_{k=0}^∞ T^k ψ^{−1} G = F.

Thus T is the natural extension of S.


Chapter 4

Entropy

Given a measure preserving transformation T on a probability space (X, F, ),

we want to define a nonnegative quantity h(T ) which measures the average

uncertainty about where T moves the points of X. That is, the value of

h(T ) reflects the amount of randomness generated by T . We want to define

h(T) in such a way that (i) the amount of information gained by an application of T is proportional to the amount of uncertainty removed, and (ii)

that h(T ) is isomorphism invariant, so that isomorphic transformations have

equal entropy.

The connection between entropy (that is randomness, uncertainty) and

the transmission of information was first studied by Claude Shannon in 1948.

As a motivation let us look at the following simple example. Consider a source (for example a ticker-tape) that produces a string of symbols … x_{−1} x_0 x_1 … from the alphabet {a_1, a_2, …, a_n}. Suppose that the probability of receiving symbol a_i at any given time is p_i, and that each symbol is transmitted independently of what has been transmitted earlier. Of course we must have here that each p_i ≥ 0 and that Σ_i p_i = 1. In ergodic theory we view this process as the dynamical system (X, F, μ, T), where X = {a_1, a_2, …, a_n}^ℤ, F the σ-algebra generated by the cylinder sets, μ the product measure assigning probability p_i to the symbol a_i in each coordinate, and T the left shift. We define the entropy of this system by

    H(p_1, …, p_n) = h(T) := − Σ_{i=1}^n p_i log₂ p_i.    (4.1)

If we interpret −log₂ p_i as the amount of information conveyed by receiving the symbol a_i, then H is the average amount of information gained (or uncertainty removed) per symbol (notice that H is in fact an expected value). To see why this is an

appropriate definition, notice that if the source is degenerate, that is, pi = 1

for some i (i.e., the source only transmits the symbol ai ), then H = 0. In

this case we indeed have no randomness. Another reason to see why this

definition is appropriate is that H is maximal if p_i = 1/n for all i, and this

agrees with the fact that the source is most random when all the symbols are

equiprobable. To see this maximum, consider the function f : [0, 1] → ℝ⁺ defined by

    f(t) = 0 if t = 0, and f(t) = −t log₂ t if 0 < t ≤ 1.

Then f is continuous and concave downward, and Jensen's Inequality implies that for any p_1, …, p_n with p_i ≥ 0 and p_1 + … + p_n = 1,

    (1/n) H(p_1, …, p_n) = (1/n) Σ_{i=1}^n f(p_i) ≤ f( (1/n) Σ_{i=1}^n p_i ) = f(1/n) = (1/n) log₂ n,

so that H(p_1, …, p_n) ≤ log₂ n. Since

    H(1/n, …, 1/n) = log₂ n,

the maximum value is attained at (1/n, …, 1/n).
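Formula (4.1) and the maximality of the uniform distribution are easy to check directly. The short sketch below (ours) implements H with the convention f(0) = 0.

```python
import math

# H(p_1, ..., p_n) = -sum p_i log2 p_i, with the convention 0 * log 0 = 0.
def entropy(ps):
    return sum(-p * math.log2(p) for p in ps if p > 0)

print(entropy([1.0]))                            # degenerate source: 0.0
print(entropy([0.5, 0.5]))                       # fair coin: 1.0 bit
print(entropy([0.25] * 4))                       # uniform on 4 symbols: 2.0
print(entropy([0.7, 0.2, 0.1]) < math.log2(3))   # below the uniform maximum: True
```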

So far H is defined as the average information per symbol. The above definition can be extended to define the information transmitted by the occurrence of an event E as −log₂ P(E). This definition has the property that the information transmitted by E ∩ F for independent events E and F is the sum of the information transmitted by each one individually, i.e.,

    −log₂ P(E ∩ F) = −log₂ P(E) − log₂ P(F).


The only function with this property is the logarithm function to any base.

We choose base 2 because information is usually measured in bits.

In the above example of the ticker-tape the symbols were transmitted

independently. In general, the symbol generated might depend on what has

been received before. In fact these dependencies are often built-in to be

able to check the transmitted sequence of symbols for errors (think here of

the Morse sequence, sequences on compact discs etc.). Such dependencies

must be taken into consideration in the calculation of the average information

per symbol. This can be achieved if one replaces the symbols ai by blocks of

symbols of particular size. More precisely, for every n, let Cn be the collection

of all possible n-blocks (or cylinder sets) of length n, and define

    H_n := − Σ_{C∈C_n} P(C) log₂ P(C).

Then (1/n) H_n can be seen as the average information per symbol when a block of length n is transmitted. The entropy of the source is now defined by

    h := lim_{n→∞} H_n / n.    (4.2)

The existence of the limit in (4.2) follows from the fact that {H_n} is a subadditive sequence, i.e., H_{n+m} ≤ H_n + H_m; see Propositions 4.2.2 and 4.2.3 below.
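The block entropies H_n are finite sums, so for simple sources they can be computed exactly. The sketch below (our own example) does this for a stationary 2-state Markov source: H_n/n decreases with n toward the entropy rate, whereas for an i.i.d. source H_n/n would be constant.

```python
import math
from itertools import product

# Block entropies H_n for a stationary 2-state Markov source (our example).
P = [[0.9, 0.1], [0.5, 0.5]]
pi = [5 / 6, 1 / 6]                     # stationary vector: pi P = pi

def block_entropy(n):
    h = 0.0
    for word in product(range(2), repeat=n):
        prob = pi[word[0]]
        for a, b in zip(word, word[1:]):
            prob *= P[a][b]
        if prob > 0:
            h -= prob * math.log2(prob)
    return h

for n in (1, 2, 8):
    print(n, block_entropy(n) / n)      # decreasing in n
```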

Now replace the source by a measure preserving system (X, B, μ, T). How can one define the entropy of this system similarly to the case of a source? The symbols {a_1, a_2, …, a_n} can now be viewed as a partition α = {A_1, A_2, …, A_n} of X, so that X is the disjoint union (up to sets of measure zero) of A_1, A_2, …, A_n. The source can be seen as follows: with each point x ∈ X, we associate an infinite sequence …, x_{−1}, x_0, x_1, …, where x_i is a_j if and only if T^i x ∈ A_j. We define the entropy of the partition α by

    H(α) = H_μ(α) := − Σ_{i=1}^n μ(A_i) log μ(A_i).

The entropy of T itself should not depend on the particular partition we choose; in fact, h(T) must be the maximal entropy over all possible finite partitions. But first we need a few facts about partitions.

Exercise Let α = {A_1, …, A_n} and β = {B_1, …, B_m} be two partitions of X. Show that

    T^{−1}α := {T^{−1}A_1, …, T^{−1}A_n}

and

    α ∨ β := {A_i ∩ B_j : A_i ∈ α, B_j ∈ β}

are both partitions of X.

The members of a partition are called the atoms of the partition. We say that the partition β = {B_1, …, B_m} is a refinement of the partition α = {A_1, …, A_n}, and write α ≤ β, if for every 1 ≤ j ≤ m there exists 1 ≤ i ≤ n such that B_j ⊆ A_i (up to sets of measure zero). The partition α ∨ β is called the common refinement of α and β.

Notice that if α ≤ β, then each atom of α is a finite (disjoint) union of atoms of β.

Given two partitions α and β of X, we define the conditional entropy of α given β by

    H(α|β) := − Σ_{A∈α} Σ_{B∈β} μ(A ∩ B) log ( μ(A ∩ B) / μ(B) ).

The quantity H(α|β) is interpreted as the average uncertainty about which element of the partition α the point x will enter (under T) if we already know which element of β the point x will enter.

Proposition 4.2.1 Let α, β and γ be finite partitions of X. Then,

(a) H(α) ≥ 0;

(b) H(α ∨ β) = H(α) + H(β|α);

(c) H(β|α) ≤ H(β);

(d) H(α ∨ β) ≤ H(α) + H(β);

(e) if α ≤ β, then H(α) ≤ H(β);

(f) H(α ∨ β|γ) = H(α|γ) + H(β|α ∨ γ);

(g) if β ≤ γ, then H(α|γ) ≤ H(α|β);

(h) if α ≤ β, then H(α|β) = 0;

(i) we call two partitions α and β independent if

    μ(A ∩ B) = μ(A)μ(B) for all A ∈ α, B ∈ β.

If α and β are independent partitions, one has

    H(α ∨ β) = H(α) + H(β).

Proof We prove properties (b) and (c); the rest are left as an exercise. For (b),

    H(α ∨ β) = − Σ_{A∈α} Σ_{B∈β} μ(A ∩ B) log μ(A ∩ B)
             = − Σ_{A∈α} Σ_{B∈β} μ(A ∩ B) log ( μ(A ∩ B) / μ(A) )
               − Σ_{A∈α} Σ_{B∈β} μ(A ∩ B) log μ(A)
             = H(β|α) + H(α).

We now show that H(β|α) ≤ H(β). Recall that the function f(t) = −t log t for 0 < t ≤ 1 is concave down. Thus,

    H(β|α) = − Σ_{B∈β} Σ_{A∈α} μ(A ∩ B) log ( μ(A ∩ B) / μ(A) )
           = − Σ_{B∈β} Σ_{A∈α} μ(A) ( μ(A ∩ B)/μ(A) ) log ( μ(A ∩ B)/μ(A) )
           = Σ_{B∈β} Σ_{A∈α} μ(A) f( μ(A ∩ B)/μ(A) )
           ≤ Σ_{B∈β} f( Σ_{A∈α} μ(A) · μ(A ∩ B)/μ(A) )
           = Σ_{B∈β} f(μ(B)) = H(β).

Now consider the partition ⋁_{i=0}^{n−1} T^{−i} α, whose atoms are of the form A_{i_0} ∩ T^{−1}A_{i_1} ∩ … ∩ T^{−(n−1)}A_{i_{n−1}}, consisting of all points x ∈ X with the property that x ∈ A_{i_0}, Tx ∈ A_{i_1}, …, T^{n−1}x ∈ A_{i_{n−1}}. Using property (b), the fact that T is measure preserving, and induction, one can show that

    H( ⋁_{i=0}^{n−1} T^{−i} α ) = H(α) + Σ_{j=1}^{n−1} H( α | ⋁_{i=1}^{j} T^{−i} α ).

To show the existence of the entropy of T with respect to a partition, we need the following two propositions.

Proposition 4.2.2 If {a_n} is a sequence of non-negative real numbers such that a_{n+p} ≤ a_n + a_p for all n, p, then

    lim_{n→∞} a_n / n

exists.

Proof Fix any m > 0. For any n ≥ 0 one has n = km + i for some k ≥ 0 and some i with 0 ≤ i ≤ m − 1. By subadditivity it follows that

    a_n / n = a_{km+i} / (km + i) ≤ a_{km}/km + a_i/km ≤ k a_m/km + a_i/km = a_m/m + a_i/(km).

Note that if n → ∞, then k → ∞, and so lim sup_{n→∞} a_n/n ≤ a_m/m. Since m is arbitrary, one has

    lim sup_{n→∞} a_n/n ≤ inf_m a_m/m ≤ lim inf_{n→∞} a_n/n,

so the limit exists (and equals inf_m a_m/m).

Proposition 4.2.3 Let α be a finite partition of the probability space (X, F, μ), and let T : X → X be a measure preserving transformation. Then, lim_{n→∞} (1/n) H( ⋁_{i=0}^{n−1} T^{−i} α ) exists.

Proof Let a_n = H( ⋁_{i=0}^{n−1} T^{−i} α ) ≥ 0. Then, by Proposition 4.2.1, we have

    a_{n+p} = H( ⋁_{i=0}^{n+p−1} T^{−i} α )
            ≤ H( ⋁_{i=0}^{n−1} T^{−i} α ) + H( ⋁_{i=n}^{n+p−1} T^{−i} α )
            = a_n + H( ⋁_{i=0}^{p−1} T^{−i} α )
            = a_n + a_p,

where the last-but-one equality uses that T is measure preserving. Hence, by Proposition 4.2.2,

    lim_{n→∞} a_n / n = lim_{n→∞} (1/n) H( ⋁_{i=0}^{n−1} T^{−i} α )

exists.

We are now in a position to give the definition of the entropy of the transformation T.

Definition The entropy of the measure preserving transformation T with respect to the partition α is given by

    h(α, T) = h_μ(α, T) := lim_{n→∞} (1/n) H( ⋁_{i=0}^{n−1} T^{−i} α ),

where

    H( ⋁_{i=0}^{n−1} T^{−i} α ) = − Σ_{D ∈ ⋁_{i=0}^{n−1} T^{−i} α} μ(D) log μ(D).

Finally, the entropy of T is defined by

    h(T) = h_μ(T) := sup_α h(α, T),

where the supremum is taken over all finite partitions α of X.

Proposition The entropy of the measure preserving transformation T with respect to the partition α is also given by

    h(α, T) = lim_{n→∞} H( α | ⋁_{i=1}^{n} T^{−i} α ).

Proof By property (g), the sequence H( α | ⋁_{i=1}^{n} T^{−i} α ) is non-increasing in n; since it is bounded from below, lim_{n→∞} H( α | ⋁_{i=1}^{n} T^{−i} α ) exists. Furthermore, since a convergent sequence is Cesàro convergent to the same limit,

    lim_{n→∞} H( α | ⋁_{i=1}^{n} T^{−i} α ) = lim_{n→∞} (1/n) Σ_{j=1}^{n} H( α | ⋁_{i=1}^{j} T^{−i} α ).

Recall that

    H( ⋁_{i=0}^{n−1} T^{−i} α ) = H(α) + Σ_{j=1}^{n−1} H( α | ⋁_{i=1}^{j} T^{−i} α ).

Now, dividing by n and taking the limit as n → ∞, one gets the desired result.

Theorem Entropy is an isomorphism invariant. More precisely, suppose (X, F, μ, T) and (Y, C, ν, S) are two isomorphic measure preserving dynamical systems (see Definition 1.2.3 for a definition), with φ : X → Y the corresponding isomorphism. Then h_μ(T) = h_ν(S).

Proof Let β = {B_1, …, B_n} be any partition of Y; then φ^{−1}β = {φ^{−1}B_1, …, φ^{−1}B_n} is a partition of X. Set A_i = φ^{−1}B_i for 1 ≤ i ≤ n. Since φ : X → Y is an isomorphism, we have ν = μ ∘ φ^{−1} and φ ∘ T = S ∘ φ, so that for any n ≥ 0 and B_{i_0}, …, B_{i_{n−1}} ∈ β,

    ν(B_{i_0} ∩ S^{−1}B_{i_1} ∩ … ∩ S^{−(n−1)}B_{i_{n−1}}) = μ(A_{i_0} ∩ T^{−1}A_{i_1} ∩ … ∩ T^{−(n−1)}A_{i_{n−1}}).

Setting A(n) = φ^{−1}B(n) for an atom B(n) of ⋁_{i=0}^{n−1} S^{−i} β, we get

    h_ν(S) = sup_β h_ν(β, S) = sup_β lim_{n→∞} (1/n) H_ν( ⋁_{i=0}^{n−1} S^{−i} β )
           = sup_β lim_{n→∞} −(1/n) Σ_{B(n) ∈ ⋁_{i=0}^{n−1} S^{−i} β} ν(B(n)) log ν(B(n))
           = sup_{φ^{−1}β} lim_{n→∞} −(1/n) Σ_{A(n) ∈ ⋁_{i=0}^{n−1} T^{−i} φ^{−1}β} μ(A(n)) log μ(A(n))
           = sup_{φ^{−1}β} h_μ(φ^{−1}β, T)
           ≤ sup_α h_μ(α, T) = h_μ(T),

where in the last inequality the supremum is taken over all possible finite partitions α of X. Thus h_ν(S) ≤ h_μ(T). The proof of h_μ(T) ≤ h_ν(S) is done similarly. Therefore h_ν(S) = h_μ(T), and the proof is complete.

Calculating the entropy of a transformation directly from the definition does not seem feasible, since one needs to take the supremum over all finite partitions, which is practically impossible. However, the entropy of a partition is relatively easy to calculate if one has full information about the partition under consideration. So the question is whether it is possible to find a partition α of X with h(α, T) = h(T). Naturally, such a partition contains all the information transmitted by T. To answer this question we need some notation and definitions.

For α = {A₁, ..., A_N} and all m, n ≥ 0, consider the partitions
$$\bigvee_{i=-n}^{m}T^{i}\alpha\qquad\text{and}\qquad\bigvee_{i=-m}^{n}T^{-i}\alpha,$$
together with the σ-algebras they generate. Furthermore, let $\sigma\big(\bigvee_{i=-\infty}^{\infty}T^{i}\alpha\big)$ be the smallest σ-algebra containing all the partitions $\bigvee_{i=-n}^{m}T^{i}\alpha$ for all n and m. We call a partition α a generator with respect to an invertible transformation T if $\sigma\big(\bigvee_{i=-\infty}^{\infty}T^{i}\alpha\big)=F$, where F is the σ-algebra on X. If T is non-invertible, then α is said to be a generator if $\sigma\big(\bigvee_{i=0}^{\infty}T^{-i}\alpha\big)=F$. Naturally, this equality is modulo sets of measure zero. One has also the following characterization of generators, saying basically that each measurable set in X can be approximated by a finite disjoint union of cylinder sets: α is a generator if and only if for each A ∈ F and for each ε > 0 there exists a finite disjoint union C of elements of the partitions $\bigvee_{i=-n}^{m}T^{i}\alpha$ such that μ(A△C) < ε. See also [W] for more details and proofs.

We next state Sinai's Theorem and Krieger's Generator Theorem. For the proofs, we refer the interested reader to the book of Karl Petersen or Peter Walters.

Theorem (Sinai) If α is a generator with respect to T and H(α) < ∞, then h(T) = h(α, T).

Theorem (Krieger) If T is an ergodic measure preserving transformation with h(T) < ∞, then T has a finite generator.

We now use these results to calculate the entropy of a Bernoulli shift.

Example (Entropy of a Bernoulli shift) Let T be the left shift on X = {1, 2, ..., n}^ℤ endowed with the σ-algebra F generated by the cylinder sets, and the product measure μ giving symbol i probability p_i, where p₁ + p₂ + ... + p_n = 1. Our aim is to calculate h(T). To this end we need to find a partition α which generates the σ-algebra F under the action of T. The natural choice of α is what is known as the time-zero partition α = {A₁, ..., A_n}, where
$$A_i:=\{x\in X : x_0=i\},\qquad i=1,\dots,n.$$
Then, for any m,
$$T^{-m}A_i=\{x\in X : x_m=i\},$$
and intersections of such sets are cylinder sets. In other words, $\bigvee_{i=0}^{m}T^{-i}\alpha$ is precisely the collection of cylinders of length m (i.e., the collection of all m-blocks), and these by definition generate F. Hence, α is a generating partition, so that
$$h(T)=h(\alpha,T)=\lim_{m\to\infty}\frac1m H\Big(\bigvee_{i=0}^{m-1}T^{-i}\alpha\Big).$$
Since μ is a product measure, the partitions α, T⁻¹α, ..., T^{-(m-1)}α are independent, so
$$H\big(\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-(m-1)}\alpha\big)=H(\alpha)+H(T^{-1}\alpha)+\cdots+H(T^{-(m-1)}\alpha)=mH(\alpha)=-m\sum_{i=1}^{n}p_i\log p_i.$$
Thus,
$$h(T)=\lim_{m\to\infty}\frac1m\Big(-m\sum_{i=1}^{n}p_i\log p_i\Big)=-\sum_{i=1}^{n}p_i\log p_i.$$
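For a product measure the block entropies $H\big(\bigvee_{i=0}^{m-1}T^{-i}\alpha\big)/m$ are constant in m, so the limit can be checked exactly. A minimal sketch (not from the notes; base-2 logarithms, and the alphabet {1, 2, 3} with p = (1/2, 1/4, 1/4) is a hypothetical choice):

```python
from itertools import product
from math import log2, prod

p = {1: 0.5, 2: 0.25, 3: 0.25}   # hypothetical symbol probabilities, summing to 1

# H(alpha) = -sum_i p_i log p_i  (time-zero partition)
h_alpha = -sum(q * log2(q) for q in p.values())

# (1/m) * H of the partition into m-blocks; equals h_alpha for every m,
# because the product measure makes the coordinates independent
m = 3
h_block = -sum(prod(p[s] for s in b) * log2(prod(p[s] for s in b))
               for b in product(p, repeat=m)) / m

print(h_alpha, h_block)   # both equal 1.5 bits
assert abs(h_alpha - 1.5) < 1e-9 and abs(h_block - h_alpha) < 1e-9
```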

Exercise 4.3.1 Let T be the left shift on X = {1, 2, ..., n}^ℤ endowed with the σ-algebra F generated by the cylinder sets, and the Markov measure given by the stochastic matrix P = (p_{ij}) and the probability vector π = (π₁, ..., π_n) with πP = π. Show that
$$h(T)=-\sum_{i=1}^{n}\sum_{j=1}^{n}\pi_i\,p_{ij}\log p_{ij}.$$
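The formula $h(T) = -\sum_{i,j}\pi_i p_{ij}\log p_{ij}$ is straightforward to evaluate once π is known. A minimal sketch (not from the notes) for a hypothetical two-state chain, finding π by iterating the row action of P:

```python
from math import log2

# hypothetical stochastic matrix
P = [[0.5, 0.5],
     [1/3, 2/3]]

# stationary vector: iterate pi <- pi P starting from the uniform vector
pi = [0.5, 0.5]
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]

# h(T) = -sum_i sum_j pi_i p_ij log p_ij   (base-2 logarithm)
h = -sum(pi[i] * P[i][j] * log2(P[i][j])
         for i in range(2) for j in range(2))

print(pi, h)   # pi is close to (0.4, 0.6)
assert abs(pi[0] - 0.4) < 1e-9
```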

ical systems. Show that

4.4 The Shannon-McMillan-Breiman Theorem

In the previous sections we have considered only finite partitions of X; however, all the definitions and results continue to hold if we consider countable partitions of finite entropy. Before we state and prove the Shannon-McMillan-Breiman Theorem, we need to introduce the information function associated with a partition.

Let (X, F, μ) be a probability space, and let α = {A₁, A₂, ...} be a finite or a countable partition of X into measurable sets. For each x ∈ X, let α(x) be the element of α to which x belongs. Then, the information function associated to α is defined to be
$$I_\alpha(x)=-\log\mu(\alpha(x))=-\sum_{A\in\alpha}1_A(x)\log\mu(A).$$
Given a second finite or countable partition β, we define the conditional information function of α given β by
$$I_{\alpha|\beta}(x)=-\sum_{B\in\beta}\sum_{A\in\alpha}1_{A\cap B}(x)\log\frac{\mu(A\cap B)}{\mu(B)}.$$

We claim that
$$I_{\alpha|\beta}(x)=-\log E_\mu(1_{\alpha(x)}\mid\sigma(\beta))(x)=-\sum_{A\in\alpha}1_A(x)\log E(1_A\mid\sigma(\beta))(x),\qquad(4.3)$$
(see the remark following the proof of Theorem (2.1.1)). This follows from the fact (which is easy to prove using the definition of conditional expectations) that if β is finite or countable, then for any f ∈ L¹(μ) one has
$$E_\mu(f\mid\sigma(\beta))=\sum_{B\in\beta}\frac{1_B}{\mu(B)}\int_B f\,d\mu.$$
Clearly, $H(\alpha|\beta)=\int_X I_{\alpha|\beta}(x)\,d\mu(x)$.

One can also show that
$$I_{\alpha\vee\beta}=I_\beta+I_{\alpha|\beta}.$$
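The identity $I_{\alpha\vee\beta}=I_\beta+I_{\alpha|\beta}$ is easy to verify numerically for interval partitions of [0, 1) under Lebesgue measure. A small sketch (not from the notes; the two partitions below are hypothetical):

```python
from math import log2

# interval partitions of [0,1), each given by its breakpoints
alpha = [0, 0.5, 0.75, 1]   # atoms [0,.5), [.5,.75), [.75,1)
beta  = [0, 0.25, 1]        # atoms [0,.25), [.25,1)

def atom(part, x):
    """Return (left, right) of the atom of `part` containing x."""
    for a, b in zip(part, part[1:]):
        if a <= x < b:
            return a, b

def info(part, x):                      # I_part(x) = -log mu(part(x))
    a, b = atom(part, x)
    return -log2(b - a)

def cond_info(pa, pb, x):               # I_{alpha|beta}(x)
    a1, b1 = atom(pa, x)
    a2, b2 = atom(pb, x)
    joint = min(b1, b2) - max(a1, a2)   # length of the (alpha v beta)-atom
    return -log2(joint / (b2 - a2))

for x in (0.1, 0.3, 0.6, 0.9):
    a1, b1 = atom(alpha, x); a2, b2 = atom(beta, x)
    i_join = -log2(min(b1, b2) - max(a1, a2))   # I_{alpha v beta}(x)
    assert abs(i_join - (info(beta, x) + cond_info(alpha, beta, x))) < 1e-9
```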

Now let T be a measure preserving transformation on (X, F, μ), and let α = {A₁, A₂, ...} be any countable partition. Then T⁻¹α = {T⁻¹A₁, T⁻¹A₂, ...} is also a countable partition. Since T is measure preserving, one has
$$I_{T^{-1}\alpha}(x)=-\sum_{A_i}1_{T^{-1}A_i}(x)\log\mu(T^{-1}A_i)=-\sum_{A_i}1_{A_i}(Tx)\log\mu(A_i)=I_\alpha(Tx).$$
Furthermore,
$$\lim_{n\to\infty}\frac{1}{n+1}H\Big(\bigvee_{i=0}^{n}T^{-i}\alpha\Big)=\lim_{n\to\infty}\frac{1}{n+1}\int_X I_{\bigvee_{i=0}^{n}T^{-i}\alpha}(x)\,d\mu(x)=h(\alpha,T).$$

The Shannon-McMillan-Breiman Theorem states that if, moreover, T is ergodic and α has finite entropy, then in fact the integrand $\frac{1}{n+1}I_{\bigvee_{i=0}^{n}T^{-i}\alpha}(x)$ converges a.e. to h(α, T). Notice that the integrand can be written as
$$\frac{1}{n+1}I_{\bigvee_{i=0}^{n}T^{-i}\alpha}(x)=-\frac{1}{n+1}\log\mu\Big(\Big(\bigvee_{i=0}^{n}T^{-i}\alpha\Big)(x)\Big),$$
where $\big(\bigvee_{i=0}^{n}T^{-i}\alpha\big)(x)$ is the element of $\bigvee_{i=0}^{n}T^{-i}\alpha$ containing x (referred to as the α-cylinder of order n containing x). Before we proceed we need the following proposition.

Proposition Let α = {A₁, A₂, ...} be a countable partition with finite entropy. For each n = 1, 2, 3, ..., let $f_n(x)=I_{\alpha|\bigvee_{i=1}^{n}T^{-i}\alpha}(x)$, and let $f^*=\sup_{n\ge1}f_n$. Then, for each t ≥ 0 and for each A ∈ α,
$$\mu(\{x\in A : f^*(x)>t\})\le 2^{-t}.$$
Furthermore, f* ∈ L¹(X, F, μ).

Proof For each A ∈ α, let
$$f_n^A(x)=-\log E\Big(1_A\,\Big|\,\bigvee_{i=1}^{n}T^{-i}\alpha\Big)(x),$$
and, for t ≥ 0,
$$B_n=\{x\in X : f_1^A(x)\le t,\dots,f_{n-1}^A(x)\le t,\ f_n^A(x)>t\}.$$

Notice that for x ∈ A one has $f_n(x)=f_n^A(x)$, and for x ∈ B_n one has $E\big(1_A\mid\bigvee_{i=1}^{n}T^{-i}\alpha\big)(x)<2^{-t}$. Since $B_n\in\sigma\big(\bigvee_{i=1}^{n}T^{-i}\alpha\big)$, then
$$\mu(B_n\cap A)=\int_{B_n}1_A(x)\,d\mu(x)=\int_{B_n}E\Big(1_A\,\Big|\,\bigvee_{i=1}^{n}T^{-i}\alpha\Big)(x)\,d\mu(x)\le\int_{B_n}2^{-t}\,d\mu(x)=2^{-t}\mu(B_n).$$

Thus,
$$\mu(\{x\in A : f^*(x)>t\})=\mu(\{x\in A : f_n(x)>t\text{ for some }n\})=\mu(\{x\in A : f_n^A(x)>t\text{ for some }n\})$$
$$=\mu\Big(\bigcup_{n=1}^{\infty}(A\cap B_n)\Big)=\sum_{n=1}^{\infty}\mu(A\cap B_n)\le\sum_{n=1}^{\infty}2^{-t}\mu(B_n)\le 2^{-t};$$
hence,
$$\int_X f^*(x)\,d\mu(x)=\int_0^\infty\mu(\{x\in X : f^*(x)>t\})\,dt=\int_0^\infty\sum_{A\in\alpha}\mu(\{x\in A : f^*(x)>t\})\,dt$$
$$=\sum_{A\in\alpha}\int_0^\infty\mu(\{x\in A : f^*(x)>t\})\,dt\le\sum_{A\in\alpha}\int_0^\infty\min(\mu(A),2^{-t})\,dt$$
$$=\sum_{A\in\alpha}\int_0^{-\log\mu(A)}\mu(A)\,dt+\sum_{A\in\alpha}\int_{-\log\mu(A)}^{\infty}2^{-t}\,dt=-\sum_{A\in\alpha}\mu(A)\log\mu(A)+\sum_{A\in\alpha}\frac{\mu(A)}{\log_e 2}=H(\alpha)+\frac{1}{\log_e 2}<\infty.$$

So far we have defined the conditional information function I_{α|β} when α and β are countable partitions. We can generalize the definition to the case where α is a countable partition and G is a σ-algebra as follows (see equation (4.3)):
$$I_{\alpha|G}(x)=-\log E_\mu(1_{\alpha(x)}\mid G)(x).$$
If we denote by $\bigvee_{i=1}^{\infty}T^{-i}\alpha$ the σ-algebra $\sigma\big(\bigcup_n\bigvee_{i=1}^{n}T^{-i}\alpha\big)$, then
$$I_{\alpha|\bigvee_{i=1}^{\infty}T^{-i}\alpha}(x)=\lim_{n\to\infty}I_{\alpha|\bigvee_{i=1}^{n}T^{-i}\alpha}(x).\qquad(4.4)$$

Exercise 4.4.2 Give a proof of equation (4.4) using the following important theorem, known as the Martingale Convergence Theorem (stated here in our setting).

Theorem 4.4.1 (Martingale Convergence Theorem) Let C₁ ⊆ C₂ ⊆ ⋯ be an increasing sequence of σ-algebras, and let C = σ(⋃_n C_n). If f ∈ L¹(μ), then
$$E_\mu(f\mid C)=\lim_{n\to\infty}E_\mu(f\mid C_n)\quad\text{a.e.}$$

70 Entropy

space (X, F, ) and f L1 (), then

f (T n x)

lim = 0, a.e.

n n

Theorem (The Shannon-McMillan-Breiman Theorem) Let T be an ergodic measure preserving transformation on a probability space (X, F, μ), and let α be a countable partition with H(α) < ∞. Then,
$$\lim_{n\to\infty}\frac{1}{n+1}I_{\bigvee_{i=0}^{n}T^{-i}\alpha}(x)=h(\alpha,T)\quad\text{a.e.}$$

Proof Applying the identity $I_{\alpha\vee\beta}=I_\beta+I_{\alpha|\beta}$ repeatedly, one gets
$$I_{\bigvee_{i=0}^{n}T^{-i}\alpha}(x)=I_{\bigvee_{i=0}^{n-1}T^{-i}\alpha}(Tx)+f_n(x)$$
$$=I_{\bigvee_{i=1}^{n-1}T^{-i}\alpha}(Tx)+I_{\alpha|\bigvee_{i=1}^{n-1}T^{-i}\alpha}(Tx)+f_n(x)$$
$$=I_{\bigvee_{i=0}^{n-2}T^{-i}\alpha}(T^2x)+f_{n-1}(Tx)+f_n(x)$$
$$\vdots$$
$$=I_\alpha(T^n x)+f_1(T^{n-1}x)+\cdots+f_{n-1}(Tx)+f_n(x).$$
Let $f(x)=I_{\alpha|\bigvee_{i=1}^{\infty}T^{-i}\alpha}(x)=\lim_{n\to\infty}f_n(x)$. Notice that f ∈ L¹(X, F, μ), since $\int_X f(x)\,d\mu(x)=h(\alpha,T)$. Now, letting $f_0=I_\alpha$, we have
$$\frac{1}{n+1}I_{\bigvee_{i=0}^{n}T^{-i}\alpha}(x)=\frac{1}{n+1}\sum_{k=0}^{n}f_{n-k}(T^k x)=\frac{1}{n+1}\sum_{k=0}^{n}f(T^k x)+\frac{1}{n+1}\sum_{k=0}^{n}(f_{n-k}-f)(T^k x).$$

By the Ergodic Theorem,
$$\lim_{n\to\infty}\frac{1}{n+1}\sum_{k=0}^{n}f(T^k x)=\int_X f(x)\,d\mu(x)=h(\alpha,T)\quad\text{a.e.}$$

We now study the sequence $\big\{\frac{1}{n+1}\sum_{k=0}^{n}(f_{n-k}-f)(T^k x)\big\}$. Let $F_N=\sup_{k\ge N}|f_k-f|$; since $f_n\to f$ a.e., $F_N$ decreases to 0 a.e. as N → ∞. Also, for any k, $|f_k-f|\le f^*+f$, so that $|f_k-f|\in L^1(X,F,\mu)$ and $\lim_{n\to\infty}|f_n-f|=0$ a.e.

For any N ≤ n,
$$\frac{1}{n+1}\sum_{k=0}^{n}|f_{n-k}-f|(T^k x)=\frac{1}{n+1}\sum_{k=0}^{n-N}|f_{n-k}-f|(T^k x)+\frac{1}{n+1}\sum_{k=n-N+1}^{n}|f_{n-k}-f|(T^k x)$$
$$\le\frac{1}{n+1}\sum_{k=0}^{n-N}F_N(T^k x)+\frac{1}{n+1}\sum_{k=0}^{N-1}|f_k-f|(T^{n-k}x).$$
If we take the limit as n → ∞, then by Exercise (4.4.3) the second term tends to 0 a.e., and by the ergodic theorem as well as the dominated convergence theorem, the first term tends to zero a.e. Hence,
$$\lim_{n\to\infty}\frac{1}{n+1}I_{\bigvee_{i=0}^{n}T^{-i}\alpha}(x)=h(\alpha,T)\quad\text{a.e.}$$

The above theorem can be interpreted as providing an estimate of the size of the atoms of $\bigvee_{i=0}^{n}T^{-i}\alpha$: for n sufficiently large, a typical element $A_n\in\bigvee_{i=0}^{n}T^{-i}\alpha$ satisfies
$$-\frac{1}{n+1}\log\mu(A_n)\approx h(\alpha,T),\qquad\text{or}\qquad\mu(A_n)\approx 2^{-(n+1)h(\alpha,T)}.$$
Furthermore, if α is a generating partition (i.e. $\sigma\big(\bigvee_{i=0}^{\infty}T^{-i}\alpha\big)=F$), then in the conclusion of the Shannon-McMillan-Breiman Theorem one can replace h(α, T) by h(T).
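For an i.i.d. (Bernoulli) source this cylinder-size estimate is just the law of large numbers applied to −log₂ of the symbol probabilities, and it is easy to simulate. A minimal sketch (not from the notes; the parameter p = 0.3 and the seed are hypothetical choices, logarithms base 2):

```python
import random
from math import log2

random.seed(1)
p = 0.3                        # hypothetical P(symbol = 1); P(0) = 1 - p
n = 100_000

# one sampled n-block, and -(1/n) log2 of its cylinder measure
x = [1 if random.random() < p else 0 for _ in range(n)]
empirical = -sum(log2(p) if s == 1 else log2(1 - p) for s in x) / n

h = -p * log2(p) - (1 - p) * log2(1 - p)   # entropy of the shift, about 0.88 bits
print(empirical, h)
assert abs(empirical - h) < 0.02           # cylinder measure is about 2^{-n h}
```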

4.5 Lochs' Theorem

In 1964, G. Lochs compared the decimal and the continued fraction expansions. Let x ∈ [0, 1) be an irrational number, and suppose x = .d₁d₂⋯ is the decimal expansion of x (which is generated by iterating the map Sx = 10x (mod 1)). Suppose further that
$$x=\cfrac{1}{a_1+\cfrac{1}{a_2+\cfrac{1}{a_3+\ddots}}}=[0;a_1,a_2,\dots]\qquad(4.5)$$
is its continued fraction expansion (which is generated by iterating the continued fraction map $Tx=\frac1x-\lfloor\frac1x\rfloor$). Let y = .d₁d₂⋯d_n be the rational number determined by the first n decimal digits of x, and let z = y + 10⁻ⁿ. Then, [y, z) is the decimal cylinder of order n containing x, which we also denote by B_n(x).

Now let
$$y=\cfrac{1}{b_1+\cfrac{1}{b_2+\ddots+\cfrac{1}{b_l}}}\qquad\text{and}\qquad z=\cfrac{1}{c_1+\cfrac{1}{c_2+\ddots+\cfrac{1}{c_k}}}$$
be the continued fraction expansions of y and z, and let
$$m(n,x)=\max\{j\ge0 : b_i=c_i\text{ for all }1\le i\le j\}.$$
In other words, if B_n(x) denotes the decimal cylinder consisting of all points y in [0, 1) such that the first n decimal digits of y agree with those of x, and if C_j(x) denotes the continued fraction cylinder of order j containing x, i.e., C_j(x) is the set of all points in [0, 1) such that the first j digits in their continued fraction expansion are the same as those of x, then m(n, x) is the largest integer such that B_n(x) ⊆ C_{m(n,x)}(x). Lochs proved the following theorem.

Theorem 4.5.1 Let λ denote Lebesgue measure on [0, 1). Then for a.e. x ∈ [0, 1),
$$\lim_{n\to\infty}\frac{m(n,x)}{n}=\frac{6\log 2\log 10}{\pi^2}.$$
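The constant $6\log2\log10/\pi^2\approx0.9703$ — roughly 97 continued fraction digits per 100 decimal digits — can be observed with exact rational arithmetic. A sketch (not from the notes): we take x = 2^{1/3} − 1 as a presumably typical point, build y and z from its first 100 decimals, and count the common leading partial quotients of y and z, which matches m(n, x) up to the usual ambiguity in the final partial quotient of a rational:

```python
from decimal import Decimal, getcontext
from fractions import Fraction
from math import log, pi

def cf_digits(x: Fraction):
    """Partial quotients of a rational x in (0,1) via the Euclidean algorithm."""
    p, q = x.numerator, x.denominator
    out = []
    while p:
        a, r = divmod(q, p)
        out.append(a)
        q, p = p, r
    return out

getcontext().prec = 150
x = Decimal(2) ** (Decimal(1) / Decimal(3)) - 1   # fractional part of 2^(1/3)

n = 100
y = Fraction(int(x * 10**n), 10**n)               # first n decimal digits of x
z = y + Fraction(1, 10**n)                        # right endpoint of B_n(x)

by, bz = cf_digits(y), cf_digits(z)
m = 0
while m < min(len(by), len(bz)) and by[m] == bz[m]:
    m += 1

# m/n should be close to the Lochs constant for a typical point
print(m / n, 6 * log(2) * log(10) / pi**2)
```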

The approach presented here allows one to compare any two known expansions of numbers: we show that Lochs' theorem is true for any two sequences of interval partitions on [0, 1) satisfying the conclusion of the Shannon-McMillan-Breiman theorem. The content of this section as well as the proofs can be found in [DF]. We begin with a few definitions that will be used in the arguments to follow.

Definition 4.5.1 By an interval partition we mean a finite or countable partition of [0, 1) into subintervals. If P is an interval partition and x ∈ [0, 1), we let P(x) denote the interval of P containing x. Throughout, λ denotes Lebesgue probability measure on [0, 1).

Definition 4.5.2 Let c ≥ 0 and let P = {P_n}_{n=1}^∞ be a sequence of interval partitions. We say that P has entropy c a.e. with respect to λ if
$$-\frac{\log\lambda(P_n(x))}{n}\to c\quad\text{a.e.}$$
Suppose that P = {P_n}_{n=1}^∞ and Q = {Q_n}_{n=1}^∞ are sequences of interval partitions. For each n ∈ ℕ and x ∈ [0, 1), define
$$m_{P,Q}(n,x)=\sup\{m\ge0 : P_n(x)\subseteq Q_m(x)\}.$$

Theorem 4.5.2 Let P = {P_n}_{n=1}^∞ and Q = {Q_n}_{n=1}^∞ be sequences of interval partitions and λ Lebesgue probability measure on [0, 1). Suppose that for some constants c > 0 and d > 0, P has entropy c a.e. with respect to λ and Q has entropy d a.e. with respect to λ. Then
$$\frac{m_{P,Q}(n,x)}{n}\to\frac{c}{d}\quad\text{a.e.}$$

Proof We first show that
$$\limsup_{n\to\infty}\frac{m_{P,Q}(n,x)}{n}\le\frac{c}{d}\quad\text{a.e.}$$
Fix ε > 0. Let x ∈ [0, 1) be a point at which the convergence conditions of the hypotheses are met. Fix δ > 0 so that $\frac{c+\delta}{d-\delta}<(1+\varepsilon)\frac{c}{d}$. Choose N so that for all n ≥ N,
$$\lambda(P_n(x))>2^{-n(c+\delta)}\qquad\text{and}\qquad\lambda(Q_n(x))<2^{-n(d-\delta)}.$$
Fix n so that $\min\{n,\frac{c}{d}n\}\ge N$, and let m₀ denote any integer greater than $(1+\varepsilon)\frac{c}{d}n$. By the choice of δ,
$$\lambda(P_n(x))>\lambda(Q_{m_0}(x)),$$
so that P_n(x) cannot be contained in Q_{m₀}(x). Hence
$$m_{P,Q}(n,x)\le(1+\varepsilon)\frac{c}{d}\,n,$$
and so
$$\limsup_{n\to\infty}\frac{m_{P,Q}(n,x)}{n}\le(1+\varepsilon)\frac{c}{d}\quad\text{a.e.}$$
Since ε > 0 was arbitrary, we have the desired result.

Now we show that
$$\liminf_{n\to\infty}\frac{m_{P,Q}(n,x)}{n}\ge\frac{c}{d}\quad\text{a.e.}$$
Fix ε ∈ (0, 1). Choose δ > 0 so that
$$\gamma:=\varepsilon c-\delta\Big(1+(1-\varepsilon)\frac{c}{d}\Big)>0.$$
For each n ∈ ℕ let $m(n)=\lfloor(1-\varepsilon)\frac{c}{d}n\rfloor$. For brevity, for each n ∈ ℕ we call an element of P_n (respectively Q_n) (n, δ)-good if
$$\lambda(P_n(x))<2^{-n(c-\delta)}\qquad(\text{respectively }\lambda(Q_n(x))>2^{-n(d+\delta)}).$$
Let
$$D_n(\varepsilon)=\big\{x : P_n(x)\text{ is }(n,\delta)\text{-good},\ Q_{m(n)}(x)\text{ is }(m(n),\delta)\text{-good, and }P_n(x)\nsubseteq Q_{m(n)}(x)\big\}.$$
If x ∈ D_n(ε), then P_n(x) contains an endpoint of the (m(n), δ)-good interval Q_{m(n)}(x). By the definition of D_n(ε) and m(n),
$$\frac{\lambda(P_n(x))}{\lambda(Q_{m(n)}(x))}<2^{-n\gamma}.$$
Since no more than one atom of P_n can contain a particular endpoint of an atom of Q_{m(n)}, we see that $\lambda(D_n(\varepsilon))<2\cdot2^{-n\gamma}$, and so
$$\sum_{n=1}^{\infty}\lambda(D_n(\varepsilon))<\infty;$$
hence, by the Borel-Cantelli Lemma,
$$\lambda(\{x : x\in D_n(\varepsilon)\ \text{i.o.}\})=0.$$

Since m(n) goes to infinity as n does, we have shown that for almost every x ∈ [0, 1) there exists N ∈ ℕ so that for all n ≥ N, P_n(x) is (n, δ)-good, Q_{m(n)}(x) is (m(n), δ)-good, and x ∉ D_n(ε). In other words, for almost every x ∈ [0, 1) there exists N ∈ ℕ so that for all n ≥ N, P_n(x) is (n, δ)-good, Q_{m(n)}(x) is (m(n), δ)-good, and P_n(x) ⊆ Q_{m(n)}(x). Thus, for almost every x ∈ [0, 1) there exists N ∈ ℕ so that for all n ≥ N, m_{P,Q}(n, x) ≥ m(n), so that
$$\frac{m_{P,Q}(n,x)}{n}\ge\frac1n\Big\lfloor(1-\varepsilon)\frac{c}{d}n\Big\rfloor.$$
This proves that
$$\liminf_{n\to\infty}\frac{m_{P,Q}(n,x)}{n}\ge(1-\varepsilon)\frac{c}{d}\quad\text{a.e.}$$
Since ε > 0 was arbitrary, we have established the theorem.

The above result allows us to compare any two well-known expansions of numbers. Since the commonly used expansions are usually performed for points in the unit interval, our underlying space will be ([0, 1), B, λ), where B is the Lebesgue σ-algebra and λ the Lebesgue measure. The expansions we have in mind share the following two properties.

Definition 4.5.3 A surjective map T : [0, 1) → [0, 1) is called a number-theoretic fibered map (NTFM) if it satisfies the following conditions:

(a) there is a finite or countable interval partition α = {A_d : d ∈ D} of [0, 1) such that T restricted to each atom of α (cylinder set of order 0) is monotone, continuous and injective; furthermore, α is a generating partition;

(b) T is ergodic, and there exists a T-invariant probability measure μ equivalent to λ with bounded density. (Both dμ/dλ and dλ/dμ are bounded, and μ(A) = 0 if and only if λ(A) = 0 for all Lebesgue sets A.)

Fix such a map T, with invariant measure μ equivalent to λ. Let L, M > 0 be such that
$$L\,\lambda(A)\le\mu(A)\le M\,\lambda(A)\qquad\text{for all Lebesgue sets }A.$$
If P_n denotes the partition $\bigvee_{i=0}^{n-1}T^{-i}\alpha$, then by property (a), P_n is an interval partition. If H_μ(α) < ∞, then the Shannon-McMillan-Breiman Theorem gives
$$\lim_{n\to\infty}\frac{-\log\mu(P_n(x))}{n}=h_\mu(T)\quad\text{a.e. with respect to }\mu.$$

Exercise 4.5.1 Show that the conclusion of the Shannon-McMillan-Breiman Theorem holds if we replace μ by λ, i.e.
$$\lim_{n\to\infty}\frac{-\log\lambda(P_n(x))}{n}=h_\mu(T)\quad\text{a.e. with respect to }\lambda.$$

Iterations of T generate expansions of points x ∈ [0, 1) with digits in D. We refer to the resulting expansion as the T-expansion of x. Almost all known expansions on [0, 1) are generated by a NTFM. Among them are the n-adic expansions (Tx = nx (mod 1), where n is a positive integer), β-expansions (Tx = βx (mod 1), where β > 1 is a real number), continued fraction expansions ($Tx=\frac1x-\lfloor\frac1x\rfloor$), and many others (see the book Ergodic Theory of Numbers).

Exercise 4.5.2 Prove Theorem (4.5.1) using Theorem (4.5.2). Use the fact that the continued fraction map T is ergodic with respect to the Gauss measure μ, given by
$$\mu(B)=\int_B\frac{1}{\log 2}\,\frac{1}{1+x}\,dx,$$
and has entropy equal to $h_\mu(T)=\dfrac{\pi^2}{6\log 2}$.
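The value $\pi^2/(6\log 2)\approx 2.3731$ (with natural logarithms) can be glimpsed numerically. A rough sketch, not from the notes: assuming Rokhlin's formula $h_\mu(T)=\int\log|T'|\,d\mu$ (which is not proved in these notes) and the Ergodic Theorem, the Birkhoff averages of $\log|T'(x)|=2\log(1/x)$ along a floating-point Gauss map orbit approach the entropy:

```python
from math import floor, log, pi

# Gauss map orbit started at a hypothetically typical seed point
x = 0.5772156649015329        # arbitrary irrational-looking seed
total, n = 0.0, 0
for _ in range(100_000):
    if x == 0.0:              # guard: a rational orbit would terminate
        break
    total += 2 * log(1 / x)   # log |T'(x)| for T x = 1/x - floor(1/x)
    x = 1 / x - floor(1 / x)
    n += 1

birkhoff = total / n
print(birkhoff, pi**2 / (6 * log(2)))   # both near 2.37
assert abs(birkhoff - pi**2 / (6 * log(2))) < 0.3
```

The floating-point orbit is only a pseudo-orbit, but its statistics are empirically Gauss-typical.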

Exercise 4.5.3 Reformulate and prove Lochs' Theorem for any two NTFM maps S and T on [0, 1).

Chapter 5

Hurewicz Ergodic Theorem

In this chapter we study invertible, non-singular and conservative transformations on a probability space. We first start with a quick review of equivalent measures; we then define non-singular and conservative transformations and state some of their properties. We end the chapter by giving a new proof of the Hurewicz Ergodic Theorem, which is a generalization of the Birkhoff Ergodic Theorem to non-singular conservative transformations.

Recall that two measures μ and ν on a measure space (Y, F) are equivalent if μ and ν have the same null sets, i.e.,
$$\mu(A)=0\iff\nu(A)=0,\qquad A\in F.$$
By the Radon-Nikodym Theorem, if the σ-finite measures μ and ν are equivalent, then there exist measurable functions f, g ≥ 0 such that
$$\mu(A)=\int_A f\,d\nu\qquad\text{and}\qquad\nu(A)=\int_A g\,d\mu.$$
Moreover, for h ∈ L¹ one has
$$\int h\,d\mu=\int hf\,d\nu\qquad\text{and}\qquad\int h\,d\nu=\int hg\,d\mu.$$
Usually the function f is denoted by $\frac{d\mu}{d\nu}$ and the function g by $\frac{d\nu}{d\mu}$.

Now suppose that (X, B, μ) is a probability space, and T : X → X a measurable transformation. One can define a new measure μ∘T⁻¹ on (X, B) by (μ∘T⁻¹)(A) = μ(T⁻¹A) for A ∈ B. It is not hard to prove that for f ∈ L¹(μ),
$$\int f\,d(\mu\circ T^{-1})=\int f\circ T\,d\mu.\qquad(5.1)$$
Similarly, if T is invertible,
$$\int f\,d(\mu\circ T)=\int f\circ T^{-1}\,d\mu.\qquad(5.2)$$

5.2 Non-singular and conservative transformations

Definition 5.2.1 Let (X, B, μ) be a probability space and T : X → X an invertible measurable function. T is said to be non-singular if for any A ∈ B, μ(A) = 0 if and only if μ(TA) = 0. In this case all the measures μ∘Tⁿ, n ∈ ℤ, are equivalent to μ (and hence equivalent to each other). By the theorem of Radon-Nikodym, there exists for each n ≠ 0 a non-negative measurable function $\omega_n(x)=\frac{d\,\mu\circ T^n}{d\mu}(x)$ such that
$$\mu(T^n A)=\int_A\omega_n(x)\,d\mu(x).$$

Proposition 5.2.1 Suppose T is invertible and non-singular. Then for every f ∈ L¹(μ),
$$\int_X f(x)\,d\mu(x)=\int_X f(Tx)\,\omega_1(x)\,d\mu(x)=\int_X f(T^n x)\,\omega_n(x)\,d\mu(x).$$

Proof We show the result for indicator functions only; the rest of the proof is left to the reader. For A ∈ B,
$$\int_X 1_A(x)\,d\mu(x)=\mu(A)=\mu(T(T^{-1}A))=\int_{T^{-1}A}\omega_1(x)\,d\mu(x)=\int_X 1_A(Tx)\,\omega_1(x)\,d\mu(x).$$

Proposition 5.2.2 Under the assumptions of Proposition 5.2.1, one has for all n, m ≥ 1 that
$$\omega_{n+m}(x)=\omega_n(x)\,\omega_m(T^n x)\quad\text{for a.e. }x.$$

Proof For any A ∈ B, since $\omega_n\,d\mu=d(\mu\circ T^n)$,
$$\int_A\omega_n(x)\,\omega_m(T^n x)\,d\mu(x)=\int_X 1_A(x)\,\omega_m(T^n x)\,d(\mu\circ T^n)(x)=\int_X 1_A(T^{-n}x)\,\omega_m(x)\,d\mu(x)$$
$$=\int_X 1_{T^n A}(x)\,d(\mu\circ T^m)(x)=\mu(T^m(T^n A))=\mu(T^{m+n}A)=\int_A\omega_{n+m}(x)\,d\mu(x).$$
Since A was arbitrary, the result follows.

Exercise 5.2.1 Let (X, B, μ) be a probability space, and T : X → X an invertible non-singular transformation. For any measurable function f, set $f_n(x)=\sum_{i=0}^{n-1}f(T^i x)\,\omega_i(x)$, n ≥ 1, where ω₀(x) = 1. Show that for all n, m ≥ 1,
$$f_{n+m}(x)=f_n(x)+\omega_n(x)\,f_m(T^n x).$$

Definition 5.2.2 Let (X, B, μ) be a probability space and T : X → X an invertible non-singular measurable transformation. We say that T is conservative if for any A ∈ B with μ(A) > 0, there exists an n ≥ 1 such that μ(A ∩ TⁿA) > 0. From now on we assume that T is invertible, non-singular and conservative. In this case, for any A ∈ B with μ(A) > 0, there exists an n ≠ 0 such that μ(A ∩ TⁿA) > 0.

Proposition 5.2.3 Let T be an invertible, non-singular and conservative transformation on the probability space (X, B, μ), and let A ∈ B with μ(A) > 0. Then for a.e. x ∈ A there exist infinitely many positive and negative integers n such that Tⁿx ∈ A.

Proof Let $B=\{x\in A : T^n x\notin A\text{ for all }n\ge1\}$; then B ∩ T⁻ⁿB = ∅ for all n ≥ 1. If μ(B) > 0, then by conservativity there exists an n ≥ 1 such that μ(B ∩ TⁿB) is positive, which is a contradiction. Hence μ(B) = 0, and by non-singularity we have μ(T⁻ⁿB) = 0 for all n ≥ 1. Now, let $C=\{x\in A : T^n x\in A\text{ for only finitely many }n\ge1\}$; then $C\subseteq\bigcup_{n=1}^{\infty}T^{-n}B$, implying that
$$\mu(C)\le\sum_{n=1}^{\infty}\mu(T^{-n}B)=0.$$
Therefore, for almost every x ∈ A there exist infinitely many n ≥ 1 such that Tⁿx ∈ A. Replacing T by T⁻¹ yields the result for n ≤ −1.

Proposition 5.2.4 If T is conservative, then
$$\sum_{n=1}^{\infty}\omega_n(x)=\infty\quad\text{a.e.}$$

Proof Let $A=\{x\in X : \sum_{n=1}^{\infty}\omega_n(x)<\infty\}$. Note that
$$A=\bigcup_{M=1}^{\infty}\Big\{x\in X : \sum_{n=1}^{\infty}\omega_n(x)<M\Big\},$$
so it suffices to show that each set
$$B=\Big\{x\in X : \sum_{n=1}^{\infty}\omega_n(x)<M\Big\}$$
has measure zero.

Suppose μ(B) > 0. Then $\int_B\sum_{n=1}^{\infty}\omega_n(x)\,d\mu(x)<M\mu(B)<\infty$. However,
$$\int_B\sum_{n=1}^{\infty}\omega_n(x)\,d\mu(x)=\sum_{n=1}^{\infty}\int_B\omega_n(x)\,d\mu(x)=\sum_{n=1}^{\infty}\mu(T^n B)=\sum_{n=1}^{\infty}\int_X 1_{T^n B}(x)\,d\mu(x)=\int_X\sum_{n=1}^{\infty}1_B(T^{-n}x)\,d\mu(x).$$
Hence $\int_X\sum_{n=1}^{\infty}1_B(T^{-n}x)\,d\mu(x)<\infty$, which implies that
$$\sum_{n=1}^{\infty}1_B(T^{-n}x)<\infty\quad\text{a.e.},$$
contradicting Proposition 5.2.3. Thus μ(B) = 0, hence μ(A) = 0, and
$$\sum_{n=1}^{\infty}\omega_n(x)=\infty\quad\text{a.e.}$$

The following theorem of Hurewicz is a generalization of Birkhoff's Ergodic Theorem to our setting; see also Hurewicz' original paper [H]. We give a new proof, similar to the proof of Birkhoff's Theorem; see Section 2.1 and [KK].

Theorem (Hurewicz Ergodic Theorem) Let (X, B, μ) be a probability space and T : X → X an invertible, non-singular and conservative transformation. If f ∈ L¹(μ), then
$$\lim_{n\to\infty}\frac{\displaystyle\sum_{i=0}^{n-1}f(T^i x)\,\omega_i(x)}{\displaystyle\sum_{i=0}^{n-1}\omega_i(x)}=f^*(x)$$
exists a.e., the limit function f* is T-invariant, and
$$\int_X f^*(x)\,d\mu(x)=\int_X f(x)\,d\mu(x).$$

Proof Assume without loss of generality that f ≥ 0 (otherwise we write f = f⁺ − f⁻ and consider each part separately). Let
$$f_n(x)=f(x)+f(Tx)\,\omega_1(x)+\cdots+f(T^{n-1}x)\,\omega_{n-1}(x),\qquad g_n(x)=\sum_{i=0}^{n-1}\omega_i(x),$$
with ω₀(x) = 1, and set
$$\overline f(x)=\limsup_{n\to\infty}\frac{f_n(x)}{g_n(x)},\qquad\underline f(x)=\liminf_{n\to\infty}\frac{f_n(x)}{g_n(x)}.$$

By Proposition (5.2.2), one has $g_{n+m}(x)=g_n(x)+\omega_n(x)\,g_m(T^n x)$. Using Exercise (5.2.1) and Proposition (5.2.4), we will show that $\overline f$ and $\underline f$ are T-invariant. To this end, note that $f_n(Tx)=\frac{f_{n+1}(x)-f(x)}{\omega_1(x)}$ and $g_n(Tx)=\frac{g_{n+1}(x)-g_1(x)}{\omega_1(x)}$, so
$$\overline f(Tx)=\limsup_{n\to\infty}\frac{f_n(Tx)}{g_n(Tx)}=\limsup_{n\to\infty}\frac{f_{n+1}(x)-f(x)}{g_{n+1}(x)-g_1(x)}$$
$$=\limsup_{n\to\infty}\Big(\frac{f_{n+1}(x)}{g_{n+1}(x)}\cdot\frac{g_{n+1}(x)}{g_{n+1}(x)-g_1(x)}-\frac{f(x)}{g_{n+1}(x)-g_1(x)}\Big)=\limsup_{n\to\infty}\frac{f_{n+1}(x)}{g_{n+1}(x)}=\overline f(x),$$
where the last equality uses that $g_{n+1}(x)\to\infty$ a.e. by Proposition (5.2.4). (Similarly $\underline f$ is T-invariant.)

Now, to prove that f* exists, is integrable and T-invariant, it is enough to show that
$$\int_X\overline f\,d\mu\le\int_X f\,d\mu\le\int_X\underline f\,d\mu.$$
We first prove that $\int_X\overline f\,d\mu\le\int_X f\,d\mu$. Fix any 0 < ε < 1, and let L > 0 be any real number. By definition of $\overline f$, for any x ∈ X there exists an integer m > 0 such that
$$\frac{f_m(x)}{g_m(x)}\ge\min(\overline f(x),L)(1-\varepsilon).$$
Now, for any δ > 0 there exists an integer M > 0 such that the set
$$X_0=\big\{x\in X : f_m(x)\ge g_m(x)\min(\overline f(x),L)(1-\varepsilon)\ \text{for some }1\le m\le M\big\}$$
has measure at least 1 − δ. Define
$$F(x)=\begin{cases}f(x) & x\in X_0\\ L & x\notin X_0,\end{cases}$$
and let $a_n=a_n(x)=F(T^n x)\,\omega_n(x)$ and $b_n=b_n(x)=\min(\overline f(x),L)(1-\varepsilon)\,\omega_n(x)$. We now show that {a_n} and {b_n} satisfy the hypothesis of Lemma 2.1.1 with M > 0 as above.

For any n = 0, 1, 2, ..., if Tⁿx ∈ X₀, then there exists 1 ≤ m ≤ M such that
$$f_m(T^n x)\ge g_m(T^n x)\min(\overline f(T^n x),L)(1-\varepsilon)=g_m(T^n x)\min(\overline f(x),L)(1-\varepsilon),$$
using the T-invariance of $\overline f$. Hence,
$$\omega_n(x)\,f_m(T^n x)\ge\min(\overline f(x),L)(1-\varepsilon)\,g_m(T^n x)\,\omega_n(x)=b_n+b_{n+1}+\cdots+b_{n+m-1}.$$
Now,
$$\omega_n(x)\,f_m(T^n x)=f(T^n x)\,\omega_n(x)+f(T^{n+1}x)\,\omega_{n+1}(x)+\cdots+f(T^{n+m-1}x)\,\omega_{n+m-1}(x)$$
$$\le F(T^n x)\,\omega_n(x)+F(T^{n+1}x)\,\omega_{n+1}(x)+\cdots+F(T^{n+m-1}x)\,\omega_{n+m-1}(x)=a_n+a_{n+1}+\cdots+a_{n+m-1}.$$

If Tⁿx ∉ X₀, then take m = 1, since
$$b_n=\min(\overline f(x),L)(1-\varepsilon)\,\omega_n(x)\le L\,\omega_n(x)=F(T^n x)\,\omega_n(x)=a_n.$$
Hence, by T-invariance of $\overline f$ and Lemma 2.1.1, for all integers N > M one has
$$\min(\overline f(x),L)(1-\varepsilon)\,g_{N-M}(x)\le F(x)+F(Tx)\,\omega_1(x)+\cdots+F(T^{N-1}x)\,\omega_{N-1}(x).$$
Integrating both sides, and using Proposition (5.2.1) together with the T-invariance of $\overline f$, one gets
$$N\int_X F(x)\,d\mu(x)\ge\int_X\min(\overline f(x),L)(1-\varepsilon)\,g_{N-M}(x)\,d\mu(x)=(N-M)\int_X\min(\overline f(x),L)(1-\varepsilon)\,d\mu(x).$$

Since
$$\int_X F(x)\,d\mu(x)=\int_{X_0}f(x)\,d\mu(x)+L\,\mu(X\setminus X_0),$$
one has
$$\int_X f(x)\,d\mu(x)\ge\int_{X_0}f(x)\,d\mu(x)=\int_X F(x)\,d\mu(x)-L\,\mu(X\setminus X_0)\ge\frac{N-M}{N}\int_X\min(\overline f(x),L)(1-\varepsilon)\,d\mu(x)-L\delta.$$
Now letting first N → ∞, then δ → 0, then ε → 0, and finally L → ∞, one gets together with the monotone convergence theorem that $\overline f$ is integrable, and
$$\int_X\overline f(x)\,d\mu(x)\le\int_X f(x)\,d\mu(x).$$

We now prove that
$$\int_X f(x)\,d\mu(x)\le\int_X\underline f(x)\,d\mu(x).$$
Fix ε > 0. By definition of $\underline f$, for any x ∈ X there exists an integer m > 0 such that
$$\frac{f_m(x)}{g_m(x)}\le(\underline f(x)+\varepsilon).$$
For any δ > 0 there exists an integer M > 0 such that the set
$$Y_0=\big\{x\in X : f_m(x)\le g_m(x)(\underline f(x)+\varepsilon)\ \text{for some }1\le m\le M\big\}$$
has measure at least 1 − δ. Define
$$G(x)=\begin{cases}f(x) & x\in Y_0\\ 0 & x\notin Y_0.\end{cases}$$
Notice that G ≤ f. Let $b_n=G(T^n x)\,\omega_n(x)$ and $a_n=(\underline f(x)+\varepsilon)\,\omega_n(x)$. We now check that the sequences {a_n} and {b_n} satisfy the hypothesis of Lemma 2.1.1 with M > 0 as above. If Tⁿx ∈ Y₀, then there exists 1 ≤ m ≤ M such that
$$f_m(T^n x)\le g_m(T^n x)(\underline f(T^n x)+\varepsilon)=g_m(T^n x)(\underline f(x)+\varepsilon).$$
Hence,
$$G(T^n x)\,\omega_n(x)+\cdots+G(T^{n+m-1}x)\,\omega_{n+m-1}(x)\le f(T^n x)\,\omega_n(x)+\cdots+f(T^{n+m-1}x)\,\omega_{n+m-1}(x)$$
$$=\omega_n(x)\,f_m(T^n x)\le(\underline f(x)+\varepsilon)(\omega_n(x)+\cdots+\omega_{n+m-1}(x))=a_n+\cdots+a_{n+m-1}.$$

If Tⁿx ∉ Y₀, then take m = 1, since $b_n=G(T^n x)\,\omega_n(x)=0\le a_n$. Hence, by Lemma 2.1.1, for all integers N > M,
$$G(x)+G(Tx)\,\omega_1(x)+\cdots+G(T^{N-M-1}x)\,\omega_{N-M-1}(x)\le(\underline f(x)+\varepsilon)\,g_N(x).$$

Integrating both sides yields
$$(N-M)\int_X G(x)\,d\mu(x)\le N\Big(\int_X\underline f(x)\,d\mu(x)+\varepsilon\Big).$$
Since f ≥ 0, the measure ν defined by $\nu(A)=\int_A f(x)\,d\mu(x)$ is absolutely continuous with respect to the measure μ. Hence, for every δ₀ > 0 there exists δ > 0 such that if μ(A) < δ, then ν(A) < δ₀. Since μ(X∖Y₀) < δ, then $\nu(X\setminus Y_0)=\int_{X\setminus Y_0}f(x)\,d\mu(x)<\delta_0$. Hence,
$$\int_X f(x)\,d\mu(x)=\int_X G(x)\,d\mu(x)+\int_{X\setminus Y_0}f(x)\,d\mu(x)\le\frac{N}{N-M}\Big(\int_X\underline f(x)\,d\mu(x)+\varepsilon\Big)+\delta_0.$$

Now letting first N → ∞, then δ → 0 (so that δ₀ → 0), and finally ε → 0, one gets
$$\int_X f(x)\,d\mu(x)\le\int_X\underline f(x)\,d\mu(x).$$
This shows that
$$\int_X\overline f\,d\mu\le\int_X f\,d\mu\le\int_X\underline f\,d\mu,$$
and since $\underline f\le\overline f$, it follows that $\underline f=\overline f=f^*$ a.e., which has the stated properties.

Finally, when T is invertible, non-singular and conservative, we say that T is ergodic if for any measurable set A satisfying μ(A△T⁻¹A) = 0 one has μ(A) = 0 or 1. It is easy to check that the proof of Proposition (1.7.1) holds in this case, so that T ergodic implies that each T-invariant function is constant a.e. Hence, if T is invertible, non-singular, conservative and ergodic, then by the Hurewicz Ergodic Theorem one has, for any f ∈ L¹(μ),
$$\lim_{n\to\infty}\frac{\displaystyle\sum_{i=0}^{n-1}f(T^i x)\,\omega_i(x)}{\displaystyle\sum_{i=0}^{n-1}\omega_i(x)}=\int_X f\,d\mu\quad\text{a.e.}$$

Chapter 6

Invariant Measures for Continuous Transformations

6.1 Existence

Suppose X is a compact metric space, and let B be the Borel σ-algebra, i.e., the σ-algebra generated by the open sets. Let M(X) be the collection of all Borel probability measures on X. There is a natural embedding of the space X in M(X) via the map x ↦ δ_x, where δ_x is the Dirac measure concentrated at x (δ_x(A) = 1 if x ∈ A, and is zero otherwise). Furthermore, M(X) is a convex set, i.e., pμ + (1−p)ν ∈ M(X) whenever μ, ν ∈ M(X) and 0 ≤ p ≤ 1. Theorem 6.1.2 below shows that a member of M(X) is determined by how it integrates continuous functions. We denote by C(X) the Banach space of all complex valued continuous functions on X under the supremum norm.

Theorem 6.1.1 Every member of M(X) is regular, i.e., for every B ∈ B and every ε > 0 there exist an open set U_ε and a closed set C_ε such that C_ε ⊆ B ⊆ U_ε and μ(U_ε ∖ C_ε) < ε.

Idea of proof Call a set B ∈ B with the above property a regular set. Let R = {B ∈ B : B is regular}. Show that R is a σ-algebra containing all the closed sets.

Corollary 6.1.1 For any B ∈ B and any μ ∈ M(X),
$$\mu(B)=\sup_{C\subseteq B:\,C\text{ closed}}\mu(C)=\inf_{B\subseteq U:\,U\text{ open}}\mu(U).$$

Theorem 6.1.2 Let μ, m ∈ M(X). If
$$\int_X f\,d\mu=\int_X f\,dm$$
for all f ∈ C(X), then μ = m.

Proof From the above corollary, it suffices to show that μ(C) = m(C) for all closed subsets C of X. Let ε > 0; by regularity of the measure m there exists an open set U such that C ⊆ U and m(U ∖ C) < ε. Define f ∈ C(X) as follows:
$$f(x)=\begin{cases}0 & x\notin U\\[4pt]\dfrac{d(x,X\setminus U)}{d(x,X\setminus U)+d(x,C)} & x\in U.\end{cases}$$
Then 1_C ≤ f ≤ 1_U, so
$$\mu(C)\le\int_X f\,d\mu=\int_X f\,dm\le m(U)\le m(C)+\varepsilon.$$
Using a similar argument, one can show that m(C) ≤ μ(C) + ε. Therefore, μ(C) = m(C) for all closed sets, and hence for all Borel sets.

This allows us to define a metric structure on M(X) as follows. A sequence {μ_n} in M(X) converges to μ ∈ M(X) if and only if
$$\lim_{n\to\infty}\int_X f(x)\,d\mu_n(x)=\int_X f(x)\,d\mu(x)$$
for all f ∈ C(X). We will show that under this notion of convergence the space M(X) is compact, but first we need the Riesz Representation Theorem.

Theorem 6.1.3 (Riesz Representation Theorem) Let X be a compact metric space and J : C(X) → ℂ a continuous linear map such that J is a positive operator and J(1) = 1. Then there exists a μ ∈ M(X) such that $J(f)=\int_X f(x)\,d\mu(x)$.

Theorem 6.1.4 The space M(X) is compact.

Proof Let {μ_n} be a sequence in M(X), and let {f_k} be a countable dense subset of C(X). The sequence $\{\int_X f_1\,d\mu_n\}$ is a bounded sequence of complex numbers, hence has a convergent subsequence $\{\int_X f_1\,d\mu_n^{(1)}\}$. Now, the sequence $\{\int_X f_2\,d\mu_n^{(1)}\}$ is bounded, and hence has a convergent subsequence $\{\int_X f_2\,d\mu_n^{(2)}\}$. Notice that $\{\int_X f_1\,d\mu_n^{(2)}\}$ is also convergent. We continue in this manner to get, for each i, a subsequence $\{\mu_n^{(i)}\}$ of {μ_n} such that for all j ≤ i, $\{\mu_n^{(i)}\}$ is a subsequence of $\{\mu_n^{(j)}\}$ and $\{\int_X f_j\,d\mu_n^{(i)}\}$ converges. Consider the diagonal sequence $\{\mu_n^{(n)}\}$; then $\{\int_X f_j\,d\mu_n^{(n)}\}$ converges for all j, and hence $\{\int_X f\,d\mu_n^{(n)}\}$ converges for all f ∈ C(X). Now define J : C(X) → ℂ by $J(f)=\lim_{n\to\infty}\int_X f\,d\mu_n^{(n)}$. Then J is linear, continuous ($|J(f)|\le\sup_{x\in X}|f(x)|$), positive and J(1) = 1. Thus, by the Riesz Representation Theorem, there exists a μ ∈ M(X) such that $J(f)=\lim_{n\to\infty}\int_X f\,d\mu_n^{(n)}=\int_X f\,d\mu$. Therefore $\lim_{n\to\infty}\mu_n^{(n)}=\mu$, and M(X) is compact.

Let T : X → X be a continuous transformation. Since B is generated by the open sets, T is measurable with respect to B. Furthermore, T induces in a natural way an operator T_* : M(X) → M(X) given by
$$(T_*\mu)(A)=\mu(T^{-1}A)$$
for all A ∈ B. Then $(T_*^i\mu)(A)=\mu(T^{-i}A)$. Using a standard argument, one can easily show that
$$\int_X f(x)\,d(T_*\mu)(x)=\int_X f(Tx)\,d\mu(x).$$
Note that T is measure preserving with respect to μ ∈ M(X) if and only if T_*μ = μ. Equivalently, T is measure preserving if and only if
$$\int_X f(x)\,d\mu(x)=\int_X f(Tx)\,d\mu(x)$$
for all f ∈ C(X). Let
$$M(X,T)=\{\mu\in M(X) : T_*\mu=\mu\}$$
be the collection of all probability measures under which T is measure preserving.

92 Invariant measures for continuous transformations

M (X). Define a sequence {n } in M (X) by

n1

1X i

n = T n .

n i=0

Proof We need to show that for any continuous function f on X one has $\int_X f(x)\,d\mu(x)=\int_X f(Tx)\,d\mu(x)$. Since M(X) is compact, there exist a μ ∈ M(X) and a subsequence {μ_{n_j}} such that μ_{n_j} → μ in M(X). Now for any f continuous, we have
$$\Big|\int_X f(Tx)\,d\mu(x)-\int_X f(x)\,d\mu(x)\Big|=\lim_{j\to\infty}\Big|\int_X f(Tx)\,d\mu_{n_j}(x)-\int_X f(x)\,d\mu_{n_j}(x)\Big|$$
$$=\lim_{j\to\infty}\frac{1}{n_j}\Big|\int_X\sum_{i=0}^{n_j-1}\big(f(T^{i+1}x)-f(T^i x)\big)\,d\nu(x)\Big|=\lim_{j\to\infty}\frac{1}{n_j}\Big|\int_X\big(f(T^{n_j}x)-f(x)\big)\,d\nu(x)\Big|\le\lim_{j\to\infty}\frac{2\sup_{x\in X}|f(x)|}{n_j}=0.$$
Therefore μ ∈ M(X, T).
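These averaged empirical measures can be watched converging in a concrete case. A minimal sketch (not from the notes): for the irrational rotation Tx = x + θ (mod 1) with ν = δ_x, integrals of test functions against μ_n = (1/n)Σᵢ δ_{Tⁱx} approach the corresponding Lebesgue integrals, Lebesgue measure being the invariant measure here (the angle and starting point below are hypothetical choices):

```python
from math import cos, pi, sqrt

theta = (sqrt(5) - 1) / 2    # golden-ratio-based rotation angle
x, n = 0.1, 100_000          # hypothetical starting point and orbit length

orbit = []
y = x
for _ in range(n):
    orbit.append(y)
    y = (y + theta) % 1.0

# integrals of test functions against the empirical measure mu_n
mean_id  = sum(orbit) / n                            # -> integral of y dy = 1/2
mean_cos = sum(cos(2 * pi * y) for y in orbit) / n   # -> 0

print(mean_id, mean_cos)
assert abs(mean_id - 0.5) < 0.01
assert abs(mean_cos) < 0.01
```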

Theorem 6.1.6 Let T : X → X be a continuous transformation on a compact metric space X. Then:

(i) M(X, T) is a closed (hence compact) convex subset of M(X);

(ii) μ ∈ M(X, T) is an extreme point of M(X, T) (i.e., μ cannot be written in a non-trivial way as a convex combination of elements of M(X, T)) if and only if T is ergodic with respect to μ.

Proof (i) Let {μ_n} be a sequence in M(X, T) converging to μ in M(X). We need to show that μ ∈ M(X, T). Let f ∈ C(X); then f∘T is also continuous. Hence,
$$\int_X f(Tx)\,d\mu(x)=\lim_{n\to\infty}\int_X f(Tx)\,d\mu_n(x)=\lim_{n\to\infty}\int_X f(x)\,d\mu_n(x)=\int_X f(x)\,d\mu(x).$$

(ii) Suppose T is ergodic with respect to μ, and assume that
$$\mu=p\,\nu_1+(1-p)\,\nu_2$$
for some ν₁, ν₂ ∈ M(X, T) and some 0 < p ≤ 1. We will show that μ = ν₁. Notice that the measure ν₁ is absolutely continuous with respect to μ, and T is ergodic with respect to μ; hence by Theorem (2.1.2) we have ν₁ = μ.

Conversely (we prove the contrapositive), suppose that T is not ergodic with respect to μ. Then there exists a measurable set E such that T⁻¹E = E and 0 < μ(E) < 1. Define measures ν₁, ν₂ on X by
$$\nu_1(B)=\frac{\mu(B\cap E)}{\mu(E)}\qquad\text{and}\qquad\nu_2(B)=\frac{\mu(B\cap(X\setminus E))}{\mu(X\setminus E)}.$$
Since E and X ∖ E are T-invariant sets, ν₁, ν₂ ∈ M(X, T), and ν₁ ≠ ν₂ since ν₁(E) = 1 while ν₂(E) = 0. Furthermore, for any measurable set B,
$$\mu(B)=\mu(E)\,\nu_1(B)+(1-\mu(E))\,\nu_2(B),$$
i.e. μ is a non-trivial convex combination of elements of M(X, T). Thus, μ is not an extreme point of M(X, T).

Since the Banach space C(X) of all continuous functions on X (under the sup norm) is separable, i.e. C(X) has a countable dense subset, one gets the following strengthening of the Ergodic Theorem.

Theorem 6.1.7 If T : X → X is continuous and μ ∈ M(X, T) is ergodic, then there exists a measurable set Y such that μ(Y) = 1 and
$$\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}f(T^i x)=\int_X f(x)\,d\mu(x)$$
for all x ∈ Y and all f ∈ C(X).

Proof Let {f_k} be a countable dense subset of C(X). By the Ergodic Theorem, for each k there exists a subset X_k with μ(X_k) = 1 and
$$\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}f_k(T^i x)=\int_X f_k(x)\,d\mu(x)$$
for all x ∈ X_k. Let $Y=\bigcap_{k=1}^{\infty}X_k$; then μ(Y) = 1, and
$$\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}f_k(T^i x)=\int_X f_k(x)\,d\mu(x)$$
for all k and all x ∈ Y. Now, let f ∈ C(X); then there exists a subsequence {f_{k_j}} converging to f in the supremum norm, hence uniformly. For any x ∈ Y, using uniform convergence and the dominated convergence theorem, one gets
$$\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}f(T^i x)=\lim_{n\to\infty}\lim_{j\to\infty}\frac1n\sum_{i=0}^{n-1}f_{k_j}(T^i x)=\lim_{j\to\infty}\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}f_{k_j}(T^i x)=\lim_{j\to\infty}\int_X f_{k_j}\,d\mu=\int_X f\,d\mu.$$

Theorem 6.1.8 Let T : X → X be continuous and μ ∈ M(X, T). Then T is ergodic with respect to μ if and only if
$$\frac1n\sum_{i=0}^{n-1}\delta_{T^i x}\to\mu\quad\text{a.e.}$$

Proof Suppose T is ergodic with respect to μ. Since, for f ∈ C(X),
$$\int_X f(y)\,d(\delta_{T^i x})(y)=f(T^i x),$$
it follows from Theorem 6.1.7 that there is a set Y with μ(Y) = 1 such that
$$\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}\int_X f(y)\,d(\delta_{T^i x})(y)=\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}f(T^i x)=\int_X f(y)\,d\mu(y)$$
for all x ∈ Y and f ∈ C(X). Thus $\frac1n\sum_{i=0}^{n-1}\delta_{T^i x}\to\mu$ for all x ∈ Y.

Conversely, suppose $\frac1n\sum_{i=0}^{n-1}\delta_{T^i x}\to\mu$ for all x ∈ Y, where μ(Y) = 1. Then for any f ∈ C(X) and any g ∈ L¹(X, B, μ) one has
$$\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}f(T^i x)\,g(x)=g(x)\int_X f(y)\,d\mu(y)\quad\text{a.e.}$$
By the dominated convergence theorem,
$$\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}\int_X f(T^i x)\,g(x)\,d\mu(x)=\int_X g(x)\,d\mu(x)\int_X f(y)\,d\mu(y).$$
We claim that, in fact, for all F, G ∈ L²(X, B, μ),
$$\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}\int_X F(T^i x)\,G(x)\,d\mu(x)=\int_X G(x)\,d\mu(x)\int_X F(y)\,d\mu(y).$$
To see this, fix ε > 0. There exists f ∈ C(X) such that ‖F − f‖₂ < ε, which implies $|\int_X F\,d\mu-\int_X f\,d\mu|<\varepsilon$. Furthermore, there exists N so that for n ≥ N one has
$$\Big|\frac1n\sum_{i=0}^{n-1}\int_X f(T^i x)\,G(x)\,d\mu(x)-\int_X G\,d\mu\int_X f\,d\mu\Big|<\varepsilon.$$

Hence, for n ≥ N,
$$\Big|\frac1n\sum_{i=0}^{n-1}\int_X F(T^i x)\,G(x)\,d\mu(x)-\int_X G\,d\mu\int_X F\,d\mu\Big|$$
$$\le\int_X\frac1n\sum_{i=0}^{n-1}\big|F(T^i x)-f(T^i x)\big|\,|G(x)|\,d\mu(x)+\Big|\frac1n\sum_{i=0}^{n-1}\int_X f(T^i x)\,G(x)\,d\mu(x)-\int_X G\,d\mu\int_X f\,d\mu\Big|+\Big|\int_X G\,d\mu\int_X f\,d\mu-\int_X G\,d\mu\int_X F\,d\mu\Big|$$
$$<\varepsilon\|G\|_2+\varepsilon+\varepsilon\|G\|_2,$$
where the first term is bounded using the Cauchy-Schwarz inequality and the T-invariance of μ. Thus,
$$\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}\int_X F(T^i x)\,G(x)\,d\mu(x)=\int_X G(x)\,d\mu(x)\int_X F(y)\,d\mu(y)$$
for all F, G ∈ L²(X, B, μ). Taking F and G to be indicator functions, one gets that T is ergodic.

Exercise 6.1.1 Let X be a compact metric space and T : X → X a continuous homeomorphism. Let x ∈ X be a periodic point of T of period n, i.e. Tⁿx = x and Tʲx ≠ x for 0 < j < n. Show that if μ ∈ M(X, T) is ergodic and μ({x}) > 0, then
$$\mu=\frac1n\sum_{i=0}^{n-1}\delta_{T^i x}.$$

6.2 Unique Ergodicity

A continuous transformation T : X → X on a compact metric space is uniquely ergodic if there is only one T-invariant Borel probability measure μ on X. In this case, M(X, T) = {μ}, and μ is necessarily ergodic, since μ is an extreme point of M(X, T). Recall that if μ ∈ M(X, T) is ergodic, then there exists a measurable subset Y such that μ(Y) = 1 and
$$\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}f(T^i x)=\int_X f(y)\,d\mu(y)$$
for all x ∈ Y and all f ∈ C(X). When T is uniquely ergodic, we will see that we have a much stronger result.

Theorem Let T : X → X be a continuous transformation on a compact metric space X. Then the following are equivalent:

(i) For every f ∈ C(X), the sequence $\{\frac1n\sum_{j=0}^{n-1}f(T^j x)\}$ converges uniformly to a constant.

(ii) For every f ∈ C(X), the sequence $\{\frac1n\sum_{j=0}^{n-1}f(T^j x)\}$ converges pointwise to a constant.

(iii) There exists a μ ∈ M(X, T) such that for every f ∈ C(X) and all x ∈ X,
$$\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}f(T^i x)=\int_X f(y)\,d\mu(y).$$

(iv) T is uniquely ergodic.

Proof (i) ⇒ (ii) is immediate.

(ii) (iii) Define L : C(X) C by

n1

1X

L(f ) = lim f (T i x).

n n

i=0

to see that L is linear, continuous (|L(f )| supxX |f (x)|), positive and

L(1) = 1. Thus, by Riesz Representation Theorem there exists a probability

measure M (X) such that

Z

L(f ) = f (x) d(x)

X

n1

1X

L(f T ) = lim f (T i+1 x) = L(f ).

n n

i=0

98 Invariant measures for continuous transformations

Hence,

  ∫_X f(Tx) dμ(x) = ∫_X f(x) dμ(x).

Thus μ ∈ M(X, T), and for every f ∈ C(X),

  lim_{n→∞} (1/n) ∑_{i=0}^{n−1} f(T^i x) = ∫_X f(x) dμ(x)

for all x ∈ X.

(iii) ⇒ (iv): Suppose μ ∈ M(X, T) is such that for every f ∈ C(X),

  lim_{n→∞} (1/n) ∑_{i=0}^{n−1} f(T^i x) = ∫_X f(x) dμ(x)

for all x ∈ X. Let ν ∈ M(X, T); we show that ν = μ. For f ∈ C(X), since the sequence {(1/n) ∑_{j=0}^{n−1} f(T^j x)} converges pointwise to the constant function ∫_X f(x) dμ(x), and since each term of the sequence is bounded in absolute value by the constant sup_{x∈X} |f(x)|, it follows by the Dominated Convergence Theorem that

  lim_{n→∞} ∫_X (1/n) ∑_{i=0}^{n−1} f(T^i x) dν(x) = ∫_X ( ∫_X f(x) dμ(x) ) dν(y) = ∫_X f(x) dμ(x).

On the other hand, since ν is T-invariant,

  ∫_X (1/n) ∑_{i=0}^{n−1} f(T^i x) dν(x) = ∫_X f(x) dν(x).

Thus ∫_X f dμ = ∫_X f dν for all f ∈ C(X), and hence ν = μ.

(iv) ⇒ (i): The proof is by contradiction. Assume M(X, T) = {μ} and suppose g ∈ C(X) is such that the sequence {(1/n) ∑_{j=0}^{n−1} g ∘ T^j} does not converge uniformly on X to the constant ∫_X g dμ. Then there exists an ε > 0 such that for each N there exist n > N and x_n ∈ X such that

  | (1/n) ∑_{j=0}^{n−1} g(T^j x_n) − ∫_X g dμ | ≥ ε.


Let

  μ_n = (1/n) ∑_{j=0}^{n−1} δ_{T^j x_n}.

Then,

  | ∫_X g dμ_n − ∫_X g dμ | ≥ ε.

Since M(X) is compact, there exists a subsequence μ_{n_i} converging to some ν ∈ M(X). Hence,

  | ∫_X g dν − ∫_X g dμ | ≥ ε.

By Theorem (6.1.5), ν ∈ M(X, T), and by unique ergodicity ν = μ, which is a contradiction.

Example If T_θ is an irrational rotation, then T_θ is uniquely ergodic. This is a consequence of the above theorem and Weyl's theorem on uniform distribution: for any Riemann integrable function f on [0, 1) and any x ∈ [0, 1), one has

  lim_{n→∞} (1/n) ∑_{i=0}^{n−1} f(x + iθ − ⌊x + iθ⌋) = ∫_0^1 f(y) dy.

Example Consider the powers of 2, i.e. 2^0, 2^1, 2^2, …, and the sequence of first digits

  {1, 2, 4, 8, 1, 3, 6, 1, …}

obtained by writing the first decimal digit of each term in the sequence. For each k = 1, 2, …, 9, let p_k(n) be the number of times the digit k appears in the first n terms of the first-digit sequence. The asymptotic relative frequency of the digit k is then lim_{n→∞} p_k(n)/n. We want to identify this limit for each k ∈ {1, 2, …, 9}. To do this, let θ = log_10 2; then θ is irrational. For k = 1, 2, …, 9, let J_k = [log_10 k, log_10(k+1)). By unique ergodicity of T_θ, we have for each k = 1, 2, …, 9,

  lim_{n→∞} (1/n) ∑_{i=0}^{n−1} 1_{J_k}(T_θ^i(0)) = λ(J_k) = log_10 ((k+1)/k).


Returning to our original problem, notice that the first digit of 2^i is k if and only if

  k · 10^r ≤ 2^i < (k+1) · 10^r

for some r ≥ 0. In this case, taking logarithms to base 10 gives

  log_10 k + r ≤ iθ < log_10(k+1) + r,

and since 0 ≤ log_10 k < log_10(k+1) ≤ 1, necessarily r = ⌊iθ⌋. But T_θ^i(0) = iθ − ⌊iθ⌋, so that T_θ^i(0) ∈ J_k. Summarizing, we see that the first digit of 2^i is k if and only if T_θ^i(0) ∈ J_k. Thus,

  lim_{n→∞} p_k(n)/n = lim_{n→∞} (1/n) ∑_{i=0}^{n−1} 1_{J_k}(T_θ^i(0)) = log_10 ((k+1)/k).
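This limit is easy to test numerically. The sketch below (plain Python; the helper names are our own) counts the first digits of 2^i with exact integer arithmetic and compares the empirical frequencies with log_10((k+1)/k).

```python
import math

def first_digit(m: int) -> int:
    """First decimal digit of a positive integer m."""
    return int(str(m)[0])

def first_digit_freqs(n: int) -> dict:
    """Relative frequencies of the first digits of 2^0, 2^1, ..., 2^(n-1)."""
    counts = {k: 0 for k in range(1, 10)}
    power = 1  # 2^0, kept exact as a Python big integer
    for _ in range(n):
        counts[first_digit(power)] += 1
        power *= 2
    return {k: c / n for k, c in counts.items()}

freqs = first_digit_freqs(10000)
for k in range(1, 10):
    print(f"digit {k}: empirical {freqs[k]:.4f}, "
          f"predicted {math.log10((k + 1) / k):.4f}")
```

Already with n = 10000 the empirical frequency of the digit 1 is within one percentage point of log_10 2 ≈ 0.30103.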

Chapter 7

Topological Dynamics

In this chapter we will shift our attention from the theory of measure preserv-

ing transformations and ergodic theory to the study of topological dynamics.

The field of topological dynamics also studies dynamical systems, but in a

different setting: where ergodic theory is set in a measure space with a mea-

sure preserving transformation, topological dynamics focuses on a topological

space with a continuous transformation. The two fields bear great similarity

and seem to co-evolve. Whenever a new notion has proven its value in one

field, analogies will be sought in the other. The two fields are, however, also

complementary. Indeed, we shall encounter a most striking example of a map

which exhibits its most interesting topological dynamics in a set of measure

zero, right in the blind spot of the measure theoretic eye.

In order to illuminate this interplay between the two fields, we will restrict our attention to compact metric spaces. The compactness assumption will play

the role of the assumption of a finite measure space. The assumption of the

existence of a metric is sufficient for applications and will greatly simplify

all of our proofs. The chapter is structured as follows. We will first intro-

duce several basic notions from topological dynamics and discuss how they

are related to each other and to analogous notions from ergodic theory. We

will then devote a separate section to topological entropy. To conclude the

chapter, we will discuss some examples and applications.


Basic Notions

Throughout this chapter X will denote a compact metric space. Unless

stated otherwise, we will always denote the metric by d and occasionally

write (X, d) to denote a metric space if there is any risk of confusion. We will

assume that the reader has some familiarity with basic (analytic) topology.

To jog the reader's memory, a brief outline of these basics is included as an

appendix. Here we will only summarize the properties of the space X, for

future reference.

Theorem 7.1.1 Let X be a compact metric space, let A, B be disjoint closed subsets of X, let O be an open cover of X, and let f : X → Y be a continuous map into a metric space Y. Then,

1. X is a Hausdorff space;

2. there are disjoint open sets U, V containing A and B, respectively;

3. there exists a δ > 0 such that every subset A of X with diam(A) < δ is contained in an element of O; we call such a δ > 0 a Lebesgue number for O;

4. X is a Baire space;

5. f is uniformly continuous;

6. every continuous real-valued function on X attains a maximum and a minimum value;

7. for a < b there exists a continuous map g : X → [a, b] such that g(x) = a for all x ∈ A and g(x) = b for all x ∈ B;

8. X has a countable basis for its topology.


The reader may recognize that this theorem contains some of the most

important results in analytic topology, most notably the Lebesgue number

lemma, Baire's category theorem, the extreme value theorem and Urysohn's

lemma.

Let us now introduce some concepts of topological dynamics: topological

transitivity, topological conjugacy, periodicity and expansiveness.

Definition Let T : X → X be a continuous transformation. For x ∈ X, the forward orbit of x is the set FO_T(x) = {T^n(x) | n ∈ Z≥0}. T is called one-sided topologically transitive if for some x ∈ X, FO_T(x) is dense in X.

If T is invertible, the orbit of x is defined as O_T(x) = {T^n(x) | n ∈ Z}. If T is a homeomorphism, T is called topologically transitive if for some x ∈ X, O_T(x) is dense in X.

A related, stronger notion can be introduced, called minimality.

Definition T is called minimal if every x ∈ X has a dense orbit in X.

Example Let X = {1, 2, …, 1000} with the discrete topology. X is a finite space, hence compact, and the discrete metric

  d_disc(x, y) = 0 if x = y, 1 if x ≠ y

induces the discrete topology on X. Thus X is a compact metric space. If we now define T : X → X by the cyclic permutation (1 2 ⋯ 1000), i.e. T(1) = 2, T(2) = 3, …, T(1000) = 1, then it is easy to see that T is a homeomorphism. Moreover, O_T(x) = X for all x ∈ X, so T is minimal.
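Minimality in this finite example can be verified by brute force; a small sketch (function names are our own):

```python
def T(x: int) -> int:
    """The cyclic permutation (1 2 ... 1000): T(x) = x + 1, wrapping 1000 to 1."""
    return 1 if x == 1000 else x + 1

def orbit(x: int) -> set:
    """The orbit {T^n(x) : n >= 0}; finite because the space is finite."""
    seen = set()
    while x not in seen:
        seen.add(x)
        x = T(x)
    return seen

# Every orbit is all of X = {1, ..., 1000}, hence dense, so T is minimal.
print(all(orbit(x) == set(range(1, 1001)) for x in (1, 137, 1000)))  # True
```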

Example Fix an integer n ≥ 2 and let X = {0} ∪ {1/n^k | k ∈ Z≥0}, equipped with the subspace topology inherited from R. Then X is a compact metric space. Indeed, if O is an open cover of X, then O contains an open set U which contains 0. U contains all but finitely many points of X, so by adding to U an open set U_y from O with y ∈ U_y, for each y not in U, we obtain a finite subcover. Define T : X → X by T(x) = x/n. One checks that FO_T(1) = X \ {0} and that the closure of X \ {0} is X, so T is one-sided topologically transitive, as x = 1 has a dense forward orbit.

The following theorem gives several equivalent characterizations of topological transitivity.

Theorem 7.1.2 Let T : X X be a homeomorphism of a compact metric

space. Then the following are equivalent:

1. T is topologically transitive

2. If A is a closed set in X satisfying TA = A, then either A = X or A has empty interior (i.e. X \ A is dense in X)

3. For any pair of non-empty open sets U, V in X there exists an n ∈ Z such that T^n U ∩ V ≠ ∅

Proof
(1) ⇒ (2): Suppose that O_T(x) is dense in X; note that, for y ∈ X, y lies in the closure of O_T(x) if and only if U ∩ O_T(x) ≠ ∅ for every open neighborhood U of y. Let A be as stated, and suppose that A does not have empty interior. Then there exists a set U such that U is open, non-empty and U ⊆ A. Now we can find a p ∈ Z such that T^p(x) ∈ U ⊆ A. Since T^n A = A for all n ∈ Z, O_T(x) ⊆ A, and taking closures we obtain X ⊆ A. Hence, either A has empty interior, or A = X.

(2) ⇒ (3): Suppose U, V are open and non-empty. Then W = ∪_{n=−∞}^{∞} T^n U is open, non-empty and satisfies TW = W, so A = X \ W is closed with TA = A and A ≠ X. By (2), A has empty interior and therefore W is dense in X. Thus W ∩ V ≠ ∅, which implies (3).

(3) ⇒ (1): For an arbitrary y ∈ X we have: y lies in the closure of O_T(x) if and only if every open neighborhood V of y intersects O_T(x), i.e. T^m(x) ∈ V for some m ∈ Z. By theorem 7.1.1 there exists a countable basis U = {U_n}_{n=1}^{∞} for the topology of X. Since every open neighborhood of y contains a basis element U_i such that y ∈ U_i, we see that O_T(x) is dense in X if and only if for every n ∈ Z>0 there is some m ∈ Z such that T^m(x) ∈ U_n. Hence,

  {x ∈ X | O_T(x) is dense in X} = ∩_{n=1}^{∞} ∪_{m=−∞}^{∞} T^m U_n.


But ∪_{m=−∞}^{∞} T^m U_n is T-invariant and therefore intersects every open set in X by (3). Thus ∪_{m=−∞}^{∞} T^m U_n is open and dense in X for every n. Now, since X is a Baire space (cf. theorem 7.1.1), we see that {x ∈ X | O_T(x) is dense in X} is itself dense in X and thus certainly non-empty (as X ≠ ∅).

See [W].

Note that theorem 7.1.2 clearly resembles theorem (1.6.1), and (2) implies that we can view topological transitivity as an analogue of ergodicity (in some sense). This is also reflected in the following (partial) analogue of theorem (1.7.1).

Corollary Let T : X → X be either one-sided topologically transitive or a topologically transitive homeomorphism. Then every continuous T-invariant function is constant.

Proof
Let f : X → Y be continuous and T-invariant, so that f ∘ T^n = f and f is constant on (forward) orbits of points. Let x_0 ∈ X have a dense (forward) orbit in X and suppose f(x_0) = c for some c ∈ Y. Fix ε > 0 and let x ∈ X be arbitrary. By continuity of f at x, we can find a δ > 0 such that d(f(x), f(x′)) < ε for any x′ ∈ X with d(x, x′) < δ. But then, since x_0 has a dense (forward) orbit in X, there is some n ∈ Z (n ∈ Z≥0) such that d(x, T^n(x_0)) < δ. Hence,

  d(f(x), c) = d(f(x), f(T^n(x_0))) < ε.

Since ε > 0 was arbitrary, f(x) = c.

Exercise 7.1.1 Let X be a compact metric space with metric d and let T : X → X be a topologically transitive homeomorphism. Show that if T is an isometry (i.e. d(T(x), T(y)) = d(x, y) for all x, y ∈ X), then T is minimal.

Definition Let T : X → X be a continuous transformation and x ∈ X. Then x is called a periodic point of T of period n (n ∈ Z>0) if T^n(x) = x and T^m(x) ≠ x for 0 < m < n. A periodic point of period 1 is called a fixed point.


Example The map x ↦ −x on [−1, 1], for instance, has exactly one fixed point, namely x = 0. All other points in the domain are periodic points of period 2.

Definition Let X be a compact metric space and T : X → X a homeomorphism. T is said to be expansive if there exists a δ > 0 such that: if x ≠ y then there exists an n ∈ Z such that d(T^n(x), T^n(y)) > δ. δ is called an expansive constant for T.

Example Any finite set X carrying the discrete topology is a compact metric space, since the discrete metric d_disc induces the topology on X. Now, any bijective map T : X → X is an expansive homeomorphism, and any 0 < δ < 1 is an expansive constant for T.

Expansiveness is closely related to the concept of a generator.

Definition 7.1.5 Let X be a compact metric space and T : X → X a homeomorphism. A finite open cover α of X is called a generator for T if the set ∩_{n=−∞}^{∞} T^{−n} Ā_n contains at most one point of X, for any collection {A_n}_{n=−∞}^{∞} with A_n ∈ α (here Ā denotes the closure of A).

Theorem 7.1.4 Let X be a compact metric space and T : X → X a homeomorphism. Then T is expansive if and only if T has a generator.

Proof
Suppose T is expansive with expansive constant δ. Let γ be the open cover of X defined by γ = {B(a, δ/2) | a ∈ X}. By compactness, there exists a finite subcover α of γ. Suppose that x, y ∈ ∩_{n=−∞}^{∞} T^{−n} B̄_n, with B_n ∈ α for all n. Then d(T^m(x), T^m(y)) ≤ δ for all m ∈ Z. But if x ≠ y then, by expansiveness, there exists an l ∈ Z such that d(T^l(x), T^l(y)) > δ, a contradiction. Hence x = y. We conclude that α is a generator for T.

Conversely, suppose that α is a generator for T. By theorem 7.1.1, there exists a Lebesgue number δ > 0 for the open cover α. We claim that δ/4 is an expansive constant for T. Indeed, suppose x, y ∈ X are such that d(T^m(x), T^m(y)) ≤ δ/4 for all m ∈ Z. Then, for all m ∈ Z, the closed ball B̄(T^m(x), δ/4) contains T^m(y) and has diameter at most δ/2 < δ. Hence, for each m there is an A_m ∈ α with B̄(T^m(x), δ/4) ⊆ A_m, so x, y ∈ ∩_{m=−∞}^{∞} T^{−m} Ā_m. But α is a generator, so ∩_{m=−∞}^{∞} T^{−m} Ā_m contains at most one point. We conclude that x = y; T is expansive.

Exercise If in definition 7.1.5 one replaces ∩_{n=−∞}^{∞} T^{−n} Ā_n by ∩_{n=−∞}^{∞} T^{−n} A_n, the resulting notion is called a weak generator. Show that in a compact metric space both concepts are in fact equivalent.

Exercise Show the following, for a homeomorphism T : X → X of a compact metric space:

a. T is expansive if and only if T^n is expansive (n ≠ 0);

b. if A is a closed T-invariant subset of X and T is expansive, then the restriction of T to A, T|_A, is expansive;

c. if S : Y → Y is an expansive homeomorphism, then the product map T × S : X × Y → X × Y is expansive with respect to the metric d = max{d_X, d_Y}.

Definition Let X, Y be compact metric spaces and let T : X → X, S : Y → Y be homeomorphisms. T is said to be topologically conjugate to S if there exists a homeomorphism φ : X → Y such that φ ∘ T = S ∘ φ. φ is called a (topological) conjugacy.

Topological conjugacy defines an equivalence relation on the collection of all homeomorphisms.

The term topological conjugacy is, in a sense, a misnomer. The following

theorem shows that topological conjugacy can be considered as the counter-

part of a measure preserving isomorphism.

Theorem 7.1.5 Let X, Y be compact metric spaces and let T : X → X, S : Y → Y be topologically conjugate homeomorphisms. Then,

1. T is topologically transitive if and only if S is topologically transitive

2. T is minimal if and only if S is minimal

3. T is expansive if and only if S is expansive

Two Definitions

Proof
Let φ : X → Y be a conjugacy. The proof of the first two statements is trivial. For the third statement we note that

  ∩_{n=−∞}^{∞} T^n (φ^{−1} A_n) = ∩_{n=−∞}^{∞} φ^{−1} (S^n A_n) = φ^{−1} ( ∩_{n=−∞}^{∞} S^n A_n ).

The left-hand side of the equation contains at most one point if and only if the right-hand side does, so a finite open cover α is a generator for S if and only if φ^{−1}α = {φ^{−1}A | A ∈ α} is a generator for T. Hence the result follows from theorem 7.1.4.

Topological entropy was first introduced as an analogue of the successful concept of measure theoretic entropy. It will turn out to be a conjugacy invariant

and is therefore a useful tool for distinguishing between topological dynamical

systems. The definition of topological entropy comes in two flavors. The first

is in terms of open covers and is very similar to the definition of measure the-

oretic entropy in terms of partitions. The second (and chronologically later)

definition uses (n, ε)-separated and (n, ε)-spanning sets. Interestingly, the latter

definition was a topological dynamical discovery, which was later engineered

into a similar definition of measure theoretic entropy.

We will start with a definition of topological entropy similar to the defini-

tion of measure theoretic entropy introduced earlier. To spark the reader's

recognition of the similarities between the two, we use analogous notation

and terminology. Before kicking off, let us first make a remark on the as-

sumptions about our space X. The first definition presented will only require

X to be compact. The second, on the other hand, only requires X to be a

metric space. We will set these subtleties aside and simply stick to the

context of a compact metric space, in which the two definitions will turn out

to be equivalent. Nevertheless, the reader should be aware that both defini-

tions represent generalizations of topological entropy in different directions


Let X be a compact metric space and let α, β be open covers of X. We say that β is a refinement of α, and write α ≤ β, if for every B ∈ β there is an A ∈ α such that B ⊆ A. The common refinement of α and β is defined to be α ∨ β = {A ∩ B | A ∈ α, B ∈ β}. For a finite collection {α_i}_{i=1}^{n} of open covers we may define ∨_{i=1}^{n} α_i = {∩_{i=1}^{n} A_{j_i} | A_{j_i} ∈ α_i}. For a continuous transformation T : X → X we define T^{−1}α = {T^{−1}A | A ∈ α}. Note that these are all again open covers of X. Finally, we define the diameter of an open cover α as diam(α) := sup_{A∈α} diam(A), where diam(A) = sup_{x,y∈A} d(x, y).

Exercise 7.2.1 Show that α ≤ β implies T^{−1}α ≤ T^{−1}β.

Definition Let α be an open cover of X, and let N(α) denote the number of sets in a finite subcover of α of minimal cardinality. We define the entropy of α to be H(α) = log(N(α)).

The following properties are easy to verify; the proof is left as an exercise.

Proposition 7.2.1 Let α, β be open covers of X and let H(α) be the entropy of α. Then

1. H(α) ≥ 0

2. H(α) = 0 if and only if N(α) = 1 if and only if X ∈ α

3. If α ≤ β, then H(α) ≤ H(β)

4. H(α ∨ β) ≤ H(α) + H(β)

5. For T : X → X continuous we have H(T^{−1}α) ≤ H(α). If T is surjective then H(T^{−1}α) = H(α).

We can now define the entropy of a continuous transformation with respect to an open cover, and subsequently make this definition independent of open covers.


Definition Let α be an open cover of X and T : X → X continuous. We define the topological entropy of T with respect to α to be:

  h(T, α) = lim_{n→∞} (1/n) H( ∨_{i=0}^{n−1} T^{−i}α ).

We must show that h(T, α) is well-defined, i.e. that the limit on the right hand side exists.

Theorem 7.2.1 lim_{n→∞} (1/n) H( ∨_{i=0}^{n−1} T^{−i}α ) exists.

Proof

Define

  a_n = H( ∨_{i=0}^{n−1} T^{−i}α ).

Now, by (4) of proposition 7.2.1 and exercise 7.2.1,

  a_{n+p} = H( ∨_{i=0}^{n+p−1} T^{−i}α ) ≤ H( ∨_{i=0}^{n−1} T^{−i}α ) + H( T^{−n} ∨_{j=0}^{p−1} T^{−j}α ) ≤ a_n + a_p.

Thus the sequence {a_n} is subadditive, and the limit lim_{n→∞} a_n/n exists.

Proposition 7.2.2 Let α, β be open covers of X and T : X → X continuous. Then

1. h(T, α) ≥ 0

2. If α ≤ β, then h(T, α) ≤ h(T, β)

3. h(T, α) ≤ H(α)


Proof
These are easy consequences of proposition 7.2.1. We will only prove the third statement. We have:

  H( ∨_{i=0}^{n−1} T^{−i}α ) ≤ ∑_{i=0}^{n−1} H(T^{−i}α) ≤ nH(α),

and the claim follows upon dividing by n and letting n → ∞.

Definition (I) We define the topological entropy of T to be:

  h_1(T) = sup_α h(T, α),

where the supremum is taken over all open covers α of X.

We will defer a discussion of the properties of topological entropy till the end

of this chapter.

This second approach, via spanning and separated sets, was first taken by E.I. Dinaburg.

We shall first define the main ingredients. Let d be the metric on the compact metric space X, and for each n ∈ Z≥0 define a new metric by:

  d_n(x, y) = max_{0≤i≤n−1} d(T^i(x), T^i(y)).

A subset A of X is called (n, ε)-spanning for X if for all x ∈ X there exists a y ∈ A such that d_n(x, y) < ε. Define span(n, ε, T) to be the minimum cardinality of an (n, ε)-spanning set. A is called (n, ε)-separated if for any distinct x, y ∈ A, d_n(x, y) > ε. We define sep(n, ε, T) to be the maximum cardinality of an (n, ε)-separated set.


Figure 7.1: An (n, ε)-separated set (left) and an (n, ε)-spanning set (right). The dotted circles represent open balls of d_n-radius ε/2 (left) and of d_n-radius ε (right), respectively.

These notions can also be defined relative to a compact subset of X rather than X itself. This generalization to the context of non-compact metric spaces was devised by R.E. Bowen. See [W] for an outline of this extended framework.

We can also formulate the above in terms of open balls. If B(x, r) = {y ∈ X | d(x, y) < r} is an open ball in the metric d, then the open ball with centre x and radius r in the metric d_n is given by:

  B_n(x, r) := ∩_{i=0}^{n−1} T^{−i} B(T^i(x), r).

In particular, a set A is (n, ε)-spanning for X precisely when

  X = ∪_{a∈A} B_n(a, ε).

Finally, define cov(n, ε, T) to be the minimum cardinality of a covering of X by open sets of d_n-diameter less than ε.

The following theorem shows that the above notions are actually two sides

of the same coin.


Theorem 7.2.2 For every n ∈ Z≥0 and ε > 0,

  cov(n, 2ε, T) ≤ span(n, ε, T) ≤ sep(n, ε, T) ≤ cov(n, ε, T).

Furthermore, sep(n, ε, T), span(n, ε, T) and cov(n, ε, T) are finite, for all n, ε > 0 and continuous T.

Proof
We will only prove the last two inequalities. The first is left as an exercise to the reader. Let A be an (n, ε)-separated set of maximal cardinality sep(n, ε, T). Suppose A is not (n, ε)-spanning for X. Then there is some x ∈ X such that d_n(x, a) ≥ ε for all a ∈ A. But then A ∪ {x} is an (n, ε)-separated set of cardinality larger than sep(n, ε, T). This contradiction shows that A is an (n, ε)-spanning set for X. The second inequality now follows, since the cardinality of A is at least as large as span(n, ε, T).

To prove the third inequality, let A be an (n, ε)-separated set of cardinality sep(n, ε, T). Note that if β is an open cover of d_n-diameter less than ε, then no element of β can contain more than one element of A. This holds in particular for such an open cover of minimal cardinality, so the third inequality is proved.

The final statement follows for span(n, ε, T) and cov(n, ε, T) from the compactness of X, and subsequently for sep(n, ε, T) by the last inequality.

Exercise 7.2.3 Finish the proof of the above theorem by proving the first

inequality.

Lemma 7.2.1 The limit cov(ε, T) := lim_{n→∞} (1/n) log(cov(n, ε, T)) exists and is finite.

diameter and dn -diameter, smaller than and with cardinalities cov(m, , T )

and cov(n, , T ), respectively. Pick any A and B . Then, for

x, y A T m B,

dm+n (x, y) = max d(T i (x), T i (y))

0im+n1

0im1 mjm+n1

<

Equivalence of the two Definitions

Thus, A ∩ T^{−m}B has d_{m+n}-diameter less than ε, and the sets {A ∩ T^{−m}B | A ∈ β, B ∈ γ} form a cover of X. Hence,

  cov(m + n, ε, T) ≤ cov(m, ε, T) · cov(n, ε, T).

Now, since log is an increasing function, the sequence {a_n}_{n=1}^{∞} defined by a_n = log cov(n, ε, T) is subadditive, so the result follows from proposition 5.

Motivated by the above lemma and theorem, we have the following definition.

Definition (II) The topological entropy of a continuous transformation T : X → X is given by:

  h_2(T) = lim_{ε→0} lim_{n→∞} (1/n) log(cov(n, ε, T))
         = lim_{ε→0} lim_{n→∞} (1/n) log(span(n, ε, T))
         = lim_{ε→0} lim_{n→∞} (1/n) log(sep(n, ε, T)).

h2 (T ) still depends on the chosen metric d. As we will see later on, in

compact metric spaces h2 (T ) is independent of d, as long as it induces the

topology of X. However, there is definitely an issue at this point if we drop

our assumption of compactness of the space X.

We will now show that definitions (I) and (II) of topological entropy coincide

on compact metric spaces. This will justify the use of the notation h(T ) for

both definitions of topological entropy.

In the meantime, we will continue to write h1 (T ) for topological entropy in

the definition using open covers and h2 (T ) for the second definition of entropy.

After all the work done in the last section, our only missing ingredient is the

following lemma:


Lemma 7.2.2 Let T : X → X be continuous and let {α_n} be a sequence of open covers of X such that lim_{n→∞} diam(α_n) = 0, where diam is the diameter under the metric d. Then lim_{n→∞} h_1(T, α_n) = h_1(T).

Proof
We will only prove the lemma for the case h_1(T) < ∞. Let ε > 0 be arbitrary and let α be an open cover of X such that h_1(T, α) > h_1(T) − ε. By theorem 7.1.1 there exists a Lebesgue number δ > 0 for α. By assumption, there exists an N > 0 such that for n ≥ N we have diam(α_n) < δ. So, for such n, we find that for any A ∈ α_n there exists a B ∈ α such that A ⊆ B, i.e. α ≤ α_n. Hence, by proposition 7.2.2 and the above,

  h_1(T) − ε < h_1(T, α) ≤ h_1(T, α_n) ≤ h_1(T)

for all n ≥ N.

Exercise 7.2.4 Finish the proof of lemma 7.2.2. That is, show that if h_1(T) = ∞, then lim_{n→∞} h_1(T, α_n) = ∞.

Theorem 7.2.3 For a continuous transformation T : X X of a compact

metric space X, definitions (I) and (II) of topological entropy coincide.

Proof

Step 1: h_2(T) ≤ h_1(T).
Let {α_k}_{k=1}^{∞} be the sequence of open covers of X defined by α_k = {B(x, 1/(3k)) | x ∈ X}. For any n, ∨_{i=0}^{n−1} T^{−i}α_k is an open cover of X by sets of d_n-diameter smaller than 1/k, hence

  cov(n, 1/k, T) ≤ N( ∨_{i=0}^{n−1} T^{−i}α_k ).

Since lim_{k→∞} diam(α_k) = 0, by lemma 7.2.2 we can subsequently take the log, divide by n, take the limit for n → ∞ and then the limit for k → ∞ to obtain the desired result.

Step 2: h_1(T) ≤ h_2(T).
Let {β_k}_{k=1}^{∞} be defined by β_k = {B(x, 1/k) | x ∈ X}. Note that δ_k = 1/(2k) is a Lebesgue number for β_k, for each k. Let A be an (n, 1/(4k))-spanning set for X of cardinality span(n, 1/(4k), T). Then, for each a ∈ A, the ball B(T^i(a), 1/(4k)) has diameter at most 1/(2k) and is hence contained in a member of β_k (for all 0 ≤ i < n), so that B_n(a, 1/(4k)) is contained in a member of ∨_{i=0}^{n−1} T^{−i}β_k. Since the balls B_n(a, 1/(4k)), a ∈ A, cover X, it follows that

  N( ∨_{i=0}^{n−1} T^{−i}β_k ) ≤ span(n, 1/(4k), T).

Proceeding in the same way as in step 1, we obtain the desired inequality.


Properties of Entropy

We will now list some elementary properties of topological entropy.

Theorem 7.2.4 Let X be a compact metric space and let T : X → X be continuous. Then h(T) satisfies the following properties:

1. If the metrics d and d̃ both generate the topology of X, then h(T) is the same under both metrics

2. If S is topologically conjugate to T, then h(S) = h(T)

3. h(T^n) = n h(T), for all n ∈ Z≥0

4. If T is a homeomorphism, then h(T^{−1}) = h(T); consequently h(T^n) = |n| h(T), for all n ∈ Z

Proof
(1): Let (X, d) and (X, d̃) be as stated. Since d and d̃ generate the same topology, the identity maps i : (X, d) → (X, d̃) and j : (X, d̃) → (X, d) are continuous, and hence by theorem 7.1.1 uniformly continuous. Fix ε_1 > 0. Then, by uniform continuity, we can subsequently find an ε_2 > 0 and an ε_3 > 0 such that for all x, y ∈ X:

  d̃(x, y) < ε_2 ⟹ d(x, y) < ε_1,   d(x, y) < ε_3 ⟹ d̃(x, y) < ε_2.

Hence every (n, ε_2)-spanning set for T under d̃ is (n, ε_1)-spanning for T under d, and every (n, ε_3)-spanning set for T under d is (n, ε_2)-spanning for T under d̃. Therefore,

  span(n, ε_1, T, d) ≤ span(n, ε_2, T, d̃) ≤ span(n, ε_3, T, d).

Taking the log, dividing by n, taking the limit for n → ∞ and the limit for ε_1 → 0 (so that ε_2, ε_3 → 0) in these inequalities, we obtain the desired result.

(2): Suppose that S : Y → Y is topologically conjugate to T, where Y is a compact metric space, and let φ : X → Y be a conjugacy. Let d_X, d_Y be the metrics on X and Y. Then the map d̃_Y defined by d̃_Y(y_1, y_2) = d_X(φ^{−1}(y_1), φ^{−1}(y_2)) defines a metric on Y which induces the topology of Y. Indeed, let B_{d_Y}(y_0, ε) be a basis element of the topology of Y. Then φ^{−1}(B_{d_Y}(y_0, ε)) is open in X, since φ is continuous. Hence, there is an open ball B_{d_X}(φ^{−1}(y_0), δ) ⊆ φ^{−1}(B_{d_Y}(y_0, ε)), since the open balls are a basis for the topology of X. Now, for y ∈ Y,

  y ∈ φ(B_{d_X}(φ^{−1}(y_0), δ)) ⟺ d̃_Y(y_0, y) = d_X(φ^{−1}(y), φ^{−1}(y_0)) < δ,

hence B_{d̃_Y}(y_0, δ) ⊆ B_{d_Y}(y_0, ε). Analogously, for any y_0 ∈ Y and ε > 0 there is a δ > 0 such that B_{d_Y}(y_0, δ) ⊆ B_{d̃_Y}(y_0, ε). Thus, d̃_Y induces the topology of Y.

Now, for x_1, x_2 ∈ X we have:

  d_X(T(x_1), T(x_2)) = d_X(φ^{−1}(S(φ(x_1))), φ^{−1}(S(φ(x_2)))) = d̃_Y(S(φ(x_1)), S(φ(x_2))),

and d_X(x_1, x_2) = d̃_Y(φ(x_1), φ(x_2)), since φ is a bijection. Thus, we see that the d_n-diameters (in the metrics d_X and d̃_Y) of sets and their φ-images remain the same. Hence, cov(n, ε, T, d_X) = cov(n, ε, S, d̃_Y), for all ε > 0 and n ∈ Z≥0. By (1), h(S) is the same under d_Y and d̃_Y, so it now easily follows that h(S) = h(T).

(3): Observe that

  max_{0≤i≤m−1} d(T^{ni}(x), T^{ni}(y)) ≤ max_{0≤j≤nm−1} d(T^j(x), T^j(y)),

so every (nm, ε)-spanning set for T is an (m, ε)-spanning set for T^n. Hence span(m, ε, T^n) ≤ span(nm, ε, T), thus h(T^n) ≤ nh(T).

Fix ε > 0. Then, by uniform continuity of T^i on X (cf. theorem 7.1.1), we can find a δ > 0 such that d(T^i(x), T^i(y)) < ε for all x, y ∈ X satisfying d(x, y) < δ and i = 0, …, n − 1. Hence, for x, y ∈ X satisfying d(x, y) < δ, we find d_n(x, y) < ε. Let A be an (m, δ)-spanning set for T^n. Then, for all x ∈ X there exists some a ∈ A such that

  max_{0≤i≤m−1} d(T^{ni}(x), T^{ni}(a)) < δ.


Hence,

  max_{0≤k≤n−1} max_{0≤i≤m−1} d(T^{ni+k}(x), T^{ni+k}(a)) = max_{0≤j≤nm−1} d(T^j(x), T^j(a)) < ε,

so A is an (nm, ε)-spanning set for T. Therefore span(nm, ε, T) ≤ span(m, δ, T^n), and it follows that nh(T) ≤ h(T^n). We conclude that h(T^n) = nh(T).

(4): Let A be an (n, ε)-separated set for T. Then, for any distinct x, y ∈ A, d_n(x, y) > ε. But then,

  max_{0≤i≤n−1} d(T^{−i}(T^{n−1}x), T^{−i}(T^{n−1}y)) = max_{0≤i≤n−1} d(T^i(x), T^i(y)) > ε,

so T^{n−1}A is an (n, ε)-separated set for T^{−1}. Conversely, every (n, ε)-separated set B for T^{−1} gives the (n, ε)-separated set T^{−(n−1)}B for T. Thus, sep(n, ε, T) = sep(n, ε, T^{−1}) and the first statement readily follows. The second statement is a consequence of (3).

Exercise 7.2.5 Show that assertion (4) of theorem 7.2.4, i.e. h(T^{−1}) = h(T), still holds if X is merely assumed to be compact.

Exercise Let (X, d_X), (Y, d_Y) be compact metric spaces and T : X → X, S : Y → Y continuous. Show that h(T × S) = h(T) + h(S). (Hint: first show that the metric d = max{d_X, d_Y} induces the product topology on X × Y.)

7.3 Examples

In this section we provide a few detailed examples to get some familiarity

with the material discussed in this chapter. We derive some simple properties

of the dynamical systems presented below, but leave most of the good stuff

for you to discover in the exercises. After all, the proof of the pudding is in

the eating!

Our first example comes from a simple population model. Suppose we are studying a colony of penguins and wish to

model the evolution of the population through time. We presume that there

is a certain limiting population level, P , which cannot be exceeded. When-

ever the population level is low, it is supposed to increase, since there is


plenty of fish to go around. If, on the other hand, population is near the up-

per limit level P , it is expected to decrease as a result of overcrowding. We

arrive at the following simple model for the population at time n, denoted

by Pn :

  P_{n+1} = μ P_n (P − P_n),

where μ is a speed parameter. If we now set P = 1, we can think of P_n

as the population at time n as a percentage of the limiting population level.

Then, writing x = P0 , the population dynamics are described by iteration of

the following map:

Definition The quadratic map with parameter μ is defined to be

  Q_μ(x) = μ x(1 − x).

Of course, in the context of the biological model formulated above, our atten-

tion is restricted to the interval [0, 1] since the population percentage cannot

lie outside this interval.

The quadratic map is deceivingly simple, as it in fact displays a wide range

of interesting dynamics, such as periodicity, topological transitivity, bifurca-

tions and chaos.

For μ > 4 and x outside [0, 1], it can be shown that Q_μ^n(x) → −∞ as n → ∞. The interesting dynamics occur in the interval [0, 1]. Figure 7.2 shows a phase portrait for Q_μ

on [0, 1], which is nothing more than the graph of Q_μ together with the graph of the line y = x. In a phase portrait the trajectory of a point under iteration of the map Q_μ is easily visualized by iteratively projecting from the line y = x to the graph and vice versa. From our phase

portrait for Q_μ it is immediately visible that there is a "leak" in our interval, i.e. the subset A_1 = {x ∈ [0, 1] | Q_μ(x) ∉ [0, 1]} is non-empty. Define

  A_n = {x ∈ [0, 1] | Q_μ^i(x) ∈ [0, 1] for 0 ≤ i ≤ n − 1, Q_μ^n(x) ∉ [0, 1]}

and set C = [0, 1] \ ∪_{n=1}^{∞} A_n. This procedure is reminiscent of the construction of the classical Cantor set, by removing middle thirds in subsequent steps. In the exercises below it is shown that C is indeed a Cantor set.
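The sets A_n can be explored numerically. The following sketch (our own helper names; μ = 4.5 is an arbitrary choice of parameter greater than 4) computes the escape time of a point, i.e. the n with x ∈ A_n, when it exists.

```python
def Q(x: float, mu: float = 4.5) -> float:
    """The quadratic map Q_mu(x) = mu * x * (1 - x)."""
    return mu * x * (1 - x)

def escape_time(x: float, mu: float = 4.5, max_iter: int = 100):
    """Smallest n >= 1 with Q_mu^n(x) outside [0, 1] (so x lies in A_n), or None."""
    for n in range(1, max_iter + 1):
        x = Q(x, mu)
        if not 0.0 <= x <= 1.0:
            return n
    return None  # still in [0, 1] after max_iter steps: a candidate point of C

print(escape_time(0.5))  # 1, since Q(0.5) = 4.5 * 0.25 = 1.125 > 1
print(escape_time(0.0))  # None: 0 is a fixed point and never leaves [0, 1]
```

Points that survive many iterations approximate the Cantor set C.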


Figure 7.2: Phase portrait of the quadratic map Q_μ for μ > 4 on [0, 1]. The arrows indicate the (partial) trajectory of a point in [0, 1]. The dotted lines divide [0, 1] into three parts: I_0, A_1 and I_1 (from left to right).

Example Let S^1 = {z ∈ C : |z| = 1} denote the unit circle. For θ ∈ [0, 2π) we define the rotation map R_θ : S^1 → S^1 with parameter θ by R_θ(z) = e^{iθ} z = e^{i(θ+φ)}, where z = e^{iφ} for some φ ∈ [0, 2π). It is easily seen that R_θ is a homeomorphism for every θ. The rotation map is very similar to the earlier introduced map T_θ on [0, 1). This is not surprising, considering the fact that the interval [0, 1] is homeomorphic to S^1 after identification of the endpoints 0 and 1.

Proposition If θ = 2π(a/b) with a/b ∈ Q in lowest terms, then every z ∈ S^1 is periodic with period b. R_θ is minimal if and only if θ/(2π) is irrational.

Proof
The first statement follows trivially from the fact that e^{2πni} = 1 for n ∈ Z. Now suppose θ/(2π) is irrational, fix ε > 0 and let z = e^{iφ} with φ ∈ [0, 2π). Then, for n ≠ m,

  R_θ^n(z) = R_θ^m(z) ⟺ e^{i(n−m)θ} = 1 ⟺ (n − m) θ/(2π) ∈ Z,

which is impossible. Thus, the points {R_θ^n(z) | n ∈ Z} are all distinct and it follows that {R_θ^n(z)}_{n=1}^{∞} is an infinite sequence. By compactness, this sequence has a convergent subsequence, which is in particular Cauchy. Therefore, we can find integers n > m such that

  d(R_θ^n(z), R_θ^m(z)) < ε,

where d is the arc length distance function. Now, since R_θ is distance preserving with respect to this metric, we can set l = n − m to obtain d(R_θ^l(z), z) < ε. Also, by continuity of R_θ^l, R_θ^l maps the connected, closed arc from z to R_θ^l(z) onto the connected closed arc from R_θ^l(z) to R_θ^{2l}(z), and this one onto the arc connecting R_θ^{2l}(z) to R_θ^{3l}(z), etc. Since these arcs have positive and equal length, they cover S^1. The result now follows, since the arcs have length smaller than ε, and ε > 0 was arbitrary.
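Orbit density is easy to observe numerically. In the sketch below (our own names; S^1 is modelled as [0, 1), with θ taken to be the irrational golden-ratio rotation number) the first 1000 orbit points of 0 come within 0.01 of every target point.

```python
import math

theta = (math.sqrt(5) - 1) / 2  # an irrational rotation number on [0, 1)

def circle_dist(a: float, b: float) -> float:
    """Arc-length distance on the circle R/Z."""
    t = abs(a - b) % 1.0
    return min(t, 1.0 - t)

def closest_approach(target: float, n_points: int = 1000) -> float:
    """Minimum distance from the orbit {i * theta mod 1 : 0 <= i < n_points} to target."""
    x, best = 0.0, 1.0
    for _ in range(n_points):
        best = min(best, circle_dist(x, target))
        x = (x + theta) % 1.0
    return best

print(max(closest_approach(t / 100) for t in range(100)))  # well below 0.01
```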

Note that R_θ is an isometry with respect to the arc length distance function, and this metric induces the topology on S^1. Hence, we see that span(n, ε, R_θ) = span(1, ε, R_θ) for all n ∈ Z≥0, and it follows that h(R_θ) = 0, for any θ ∈ [0, 2π).
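The key step, that d_n coincides with d when T is an isometry (so spanning numbers do not grow with n), can be checked directly; a sketch with our own helper names, again modelling S^1 as [0, 1):

```python
theta = 0.3561  # any rotation number will do: every rotation is an isometry

def circle_dist(a: float, b: float) -> float:
    t = abs(a - b) % 1.0
    return min(t, 1.0 - t)

def R(x: float) -> float:
    return (x + theta) % 1.0

def dn(x: float, y: float, n: int) -> float:
    """d_n(x, y) = max over 0 <= i < n of d(R^i(x), R^i(y))."""
    best = 0.0
    for _ in range(n):
        best = max(best, circle_dist(x, y))
        x, y = R(x), R(y)
    return best

pairs = [(0.1, 0.7), (0.25, 0.9), (0.0, 0.5)]
# Since R preserves circle_dist, d_n equals d up to rounding error.
print(all(abs(dn(x, y, 50) - circle_dist(x, y)) < 1e-9 for x, y in pairs))  # True
```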

Example We now introduce the Bernoulli shift in a topological context. Let X_k = {0, 1, …, k−1}^Z (or X_k^+ = {0, …, k−1}^{N∪{0}}) and recall that the left shift on X_k is defined by L : X_k → X_k, L(x) = y with y_n = x_{n+1}. We can define a topology on X_k (or X_k^+) which makes the shift map continuous (in fact, a homeomorphism in the case of X_k) as follows: we equip {0, …, k−1} with the discrete topology and subsequently give X_k the corresponding product topology. It is easy to see that the cylinder sets form a basis for this topology. By the Tychonoff theorem (see e.g. [Mu]), which asserts that an arbitrary product of compact spaces is compact in the product topology, X_k is a compact space. In the


exercises you are asked to show that the metric

  d(x, y) = 2^{−l}, where l = min{|j| : x_j ≠ y_j},

induces the product topology, i.e. the open balls B(x, r) with respect to this metric form a basis for the product topology. Here we set d(x, y) = 0 if x = y; this corresponds to the case l = ∞.

We will end this example by calculating the topological entropy of the full

shift L.

Proposition 7.3.2 The topological entropy of the full shift map is equal to

h(L) = log(k).

Proof

We will prove the proposition for the left shift on X_k^+; the case of the left shift on X_k is similar. Fix 0 < ε < 1 and pick any x = {x_i}_{i=0}^{∞}, y = {y_i}_{i=0}^{∞} ∈ X_k^+. Notice that if at least one of the first n symbols in the itineraries of x and y differ, then

  d_n(x, y) = max_{0≤i≤n−1} d(L^i(x), L^i(y)) = 1 > ε,

where we have used the metric d on X_k^+ defined above. Thus, the set A_n consisting of sequences a ∈ X_k^+ such that a_i ∈ {0, …, k−1} for 0 ≤ i ≤ n−1 and a_j = 0 for j ≥ n is (n, ε)-separated. Explicitly, a ∈ A_n is of the form

  a_0 a_1 a_2 … a_{n−1} 0 0 0 …

Since there are k^n possibilities to choose the first n symbols of an itinerary, we obtain sep(n, ε, L) ≥ k^n. Hence,

  h(L) = lim_{ε→0} lim_{n→∞} (1/n) log(sep(n, ε, L)) ≥ log(k).

To prove the reverse inequality, take l ∈ Z>0 such that 2^{−l} < ε. Then A_{n+l} is an (n, ε)-spanning set, since for every x ∈ X_k^+ there is some a ∈ A_{n+l} for which the first n + l symbols coincide; in other words, d_n(x, a) ≤ 2^{−l} < ε. Therefore,

  h(L) = lim_{ε→0} lim_{n→∞} (1/n) log(span(n, ε, L)) ≤ lim_{ε→0} lim_{n→∞} ((n+l)/n) log(k) = log(k),

and our proof is complete.
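The separated set A_n used above can be built explicitly. The sketch below (our own names) verifies, for k = 2 and n = 4, that the k^n sequences of A_n are pairwise (n, ε)-separated for ε = 1/2 in the metric defined earlier.

```python
from itertools import product

K, N, PAD = 2, 4, 6  # alphabet size, block length, extra zero padding

def d(x, y):
    """d(x, y) = 2^(-l) with l the first index where x and y differ (0 if equal)."""
    for j, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return 2.0 ** (-j)
    return 0.0

def dn(x, y, n):
    """d_n under the left shift: max of d over the first n shifts."""
    best = 0.0
    for _ in range(n):
        best = max(best, d(x, y))
        x, y = x[1:], y[1:]  # the left shift L
    return best

# A_N: every block of length N, followed by zeros.
A = [blk + (0,) * PAD for blk in product(range(K), repeat=N)]
eps = 0.5
separated = all(dn(x, y, N) > eps for i, x in enumerate(A) for y in A[i + 1:])
print(len(A), separated)  # 16 True
```

Two distinct elements of A_N differ at some index j < N, so after j shifts they differ at index 0 and their distance is 1 > ε, exactly as in the proof.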


The sequence space X_k is a topological dynamical system whose dynamics are quite well understood. The subject of symbolic dynamics is devoted to using this knowledge to unravel the dynamical properties of more difficult systems. The scheme works as follows. Let T : X → X be a topological dynamical system and let α = {A_0, . . . , A_{k−1}} be a partition of X such that A_i ∩ A_j = ∅ for i ≠ j (note the difference with the earlier introduced notion of a partition). We assume that T is invertible. Now, for x ∈ X, we let ψ_i(x) be the index of the element of α which contains T^i(x), i ∈ Z. This defines a map ψ : X → X_k, called the itinerary map generated by T on α,

ψ(x) = {ψ_i(x)}_{i=−∞}^{∞},

which satisfies ψ ∘ T = L ∘ ψ. Of course, if T is not invertible, we may apply the same procedure by replacing X_k by X_k^+.
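In code, the itinerary construction for a non-invertible map looks as follows (a sketch; the helper names and the doubling-map example are ours). The partition is given as a list of membership tests, and finitely many symbols of the one-sided itinerary are computed:

```python
def itinerary(T, x, partition, n):
    """One-sided itinerary: psi_i(x) = index of the partition element
    containing T^i(x), for i = 0, ..., n-1."""
    out = []
    for _ in range(n):
        out.append(next(i for i, member in enumerate(partition) if member(x)))
        x = T(x)
    return out

# doubling map with the partition {[0, 1/2), [1/2, 1)}
double = lambda x: (2 * x) % 1.0
parts = [lambda x: x < 0.5, lambda x: x >= 0.5]
assert itinerary(double, 0.625, parts, 3) == [1, 0, 1]
```

For the doubling map with this partition the itinerary of x is exactly the binary expansion of x, which is why 0.625 = (0.101)_2 yields the symbols 1, 0, 1.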

We will now apply the above procedure to determine the number of periodic points of the teepee (or tent) map T : [0, 1] → [0, 1], defined by (see figure 7.3)

T(x) = 2x          if 0 ≤ x ≤ 1/2,
T(x) = 2(1 − x)    if 1/2 ≤ x ≤ 1.

Let I_0 = [0, 1/2], I_1 = [1/2, 1]. We would like to define an itinerary map

124 Equivalence of the two Definitions

using I_0 and I_1 as above, but the fact that I_0 ∩ I_1 ≠ ∅ causes some complications. Indeed, we have an ambiguous choice of the itinerary for x = 1/2, since it can be described by either (01000 . . .) or (11000 . . .), and the same happens for any other point in [0, 1] which will end up in x = 1/2 upon iteration of T. We will therefore identify any pair of sequences of the form (a_0 a_1 . . . a_l 01000 . . .) and (a_0 a_1 . . . a_l 11000 . . .) and replace the two sequences by a single equivalence class. The resulting space will be denoted by X_2^+ and the resulting itinerary map again by ψ : [0, 1] → X_2^+.

We will first show that ψ is injective. Suppose that there are x, y ∈ [0, 1] with ψ(x) = ψ(y). Then T^n(x) and T^n(y) lie on the same side of x = 1/2 for all n ∈ Z_{≥0}. Suppose that x ≠ y. Then, for all n ∈ Z_{≥0}, T^n([x, y]) does not contain the point x = 1/2. But then no rational number of the form p/2^n, p ∈ {1, 2, 3, . . . , 2^n − 1}, lies in [x, y], for any n ∈ Z_{>0}. Since these form a dense set in [0, 1], we obtain a contradiction. We conclude that x = y, i.e. ψ is one-to-one.

Now let us show that ψ is also surjective. Let a = {a_i}_{i=0}^∞ ∈ X_2^+. If a is one of the aforementioned equivalence classes, we use the representative with the symbol 0 in the ambiguous position to represent a. Now, define

A_n = {x ∈ [0, 1] | x ∈ I_{a_0}, T(x) ∈ I_{a_1}, . . . , T^n(x) ∈ I_{a_n}}
    = I_{a_0} ∩ T^{−1}I_{a_1} ∩ . . . ∩ T^{−n}I_{a_n}.

Since T is continuous, T^{−i}I_{a_i} is closed for all i and therefore A_n is closed for n ∈ Z_{≥0}. As [0, 1] is a compact metric space, it follows from theorem 7.1.1 that A_n is compact. Moreover, {A_n}_{n=0}^∞ forms a decreasing sequence, in the sense that A_{n+1} ⊆ A_n. Therefore, the intersection ∩_{n=0}^∞ A_n is not empty, and a is precisely the itinerary of any x ∈ ∩_{n=0}^∞ A_n. We conclude that ψ is onto.

Now, since ψ ∘ T = L ∘ ψ, with L : X_2^+ → X_2^+ the (induced) left shift, we have found a bijection between the periodic points of L on X_2^+ and those of T on [0, 1]. The periodic points for T are quite difficult to find, but the periodic points of L are easily identified. First observe that there are no periodic points in the earlier defined equivalence classes in X_2^+, since these points are eventually mapped to 0. Now, any periodic point a of L of period n in X_2^+ must have an itinerary of the form

(a_0 a_1 . . . a_{n−1} a_0 a_1 . . . a_{n−1} a_0 a_1 . . .).

Hence, T has 2^n periodic points of period n in [0, 1].
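The count 2^n can also be checked directly: on each of the 2^n dyadic subintervals [j/2^n, (j+1)/2^n], the iterate T^n is linear with slope ±2^n, so the equation T^n(x) = x can be solved branch by branch. A sketch with exact rational arithmetic (the function names are ours):

```python
from fractions import Fraction

def T(x):
    # tent map: T(x) = 2x on [0, 1/2], T(x) = 2(1 - x) on [1/2, 1]
    return 2 * x if x <= Fraction(1, 2) else 2 * (1 - x)

def Tn(x, n):
    for _ in range(n):
        x = T(x)
    return x

def count_periodic(n):
    """Count the solutions of T^n(x) = x by solving the linear equation
    on each branch [j/2^n, (j+1)/2^n], where T^n has slope +-2^n."""
    roots = set()
    for j in range(2 ** n):
        a, b = Fraction(j, 2 ** n), Fraction(j + 1, 2 ** n)
        fa, fb = Tn(a, n), Tn(b, n)
        slope = (fb - fa) / (b - a)          # equals +-2^n, never 1
        # solve fa + slope*(x - a) = x on [a, b]
        x = (fa - slope * a) / (1 - slope)
        if a <= x <= b:
            roots.add(x)
    return len(roots)

assert [count_periodic(n) for n in (1, 2, 3)] == [2, 4, 8]
```

For n = 2 the four points found are 0, 2/5, 2/3 and 4/5, in agreement with the symbolic count.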


Exercise 7.3.1 In this exercise you are asked to prove some properties of the shift map. Let X_k = {0, 1, . . . , k − 1}^Z and X_k^+ = {0, 1, . . . , k − 1}^{N∪{0}}. Consider the metrics

d(x, y) = 2^{−l}, where l = min{|j| : x_j ≠ y_j},

and

d'(x, y) = Σ_{n=−∞}^{∞} |x_n − y_n| / 2^{|n|}.

c. Show that the one-sided shift on X_k^+ is continuous. Show that the two-sided shift on X_k is a homeomorphism.

Exercise 7.3.2 Let Q_μ be the quadratic map and let C, A_1 be as in the first example. Set [0, 1] − A_1 = I_0 ∪ I_1, where I_0 is the interval to the left of A_1 and I_1 the interval to the right. We assume that μ > 2 + √5, so that |Q'_μ(x)| > 1 for all x ∈ I_0 ∪ I_1 (check this!). Let X_2^+ = {0, 1}^{N∪{0}} and let ψ : C → X_2^+ be the itinerary map generated by Q_μ on {I_0, I_1}. This exercise serves as another demonstration of the usefulness of symbolic dynamics. Show that ψ conjugates Q_μ on C to the shift map on X_2^+. Deduce that the periodic points of Q_μ are dense in C. Finally, prove that Q_μ is topologically transitive on C.

Exercise 7.3.3 Let Q_μ be the quadratic map and let C be as in the first example. This exercise shows that C is a Cantor set, i.e. a closed, totally disconnected and perfect subset of [0, 1]. As in the previous exercise, we will assume that μ > 2 + √5. Show, in particular, that the connected components of C are one-point sets.

Chapter 8

The Variational Principle

In this chapter we prove the fundamental relation between measure-theoretic and topological entropy, known as the Variational Principle. It asserts that for a continuous transformation T of a compact metric space X the topological entropy is given by h(T) = sup{h_μ(T) | μ ∈ M(X, T)}. To prove this statement, we will proceed along the shortest and most popular route to victory, provided by [M].¹ The proof is at times quite technical and is therefore divided into several digestible pieces.

For the first part of the proof of the Variational Principle we will only use some properties of measure theoretic entropy. All the necessary ingredients are listed in the following lemma:

Lemma 8.1.1 Let α, β be partitions and let T be a measure preserving transformation of the probability space (X, F, μ). Then,

1. H_μ(α) ≤ log(N(α)), where N(α) denotes the number of elements of α of non-zero measure. Equality holds only if μ(A) = 1/N(α), for all A ∈ α with μ(A) > 0.

2. If β ≤ α, then h_μ(T, β) ≤ h_μ(T, α).

3. h_μ(T, β) ≤ h_μ(T, α) + H_μ(β|α).

4. h_μ(T^n, ∨_{i=0}^{n−1} T^{−i}α) = n h_μ(T, α) and h_μ(T^n) = n h_μ(T), for n ∈ Z_{>0}.

¹Our exposition of Misiurewicz's proof follows [W] to a certain extent.


Proofs

(1): Consider the function f : [0, ∞) → R, defined by f(t) = −t log(t) for t > 0 and f(0) = 0. In this notation, H_μ(α) = Σ_{A∈α} f(μ(A)). Since f is concave, Jensen's inequality applied to the N(α) elements of non-zero measure gives H_μ(α) ≤ N(α) f(1/N(α)) = log(N(α)), with equality precisely when all these elements have measure 1/N(α).
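Numerically, with a partition represented by the list of measures of its atoms, the inequality in part (1) is easy to observe (a sketch; the names are ours):

```python
import math

def f(t):
    # f(t) = -t log t for t > 0, f(0) = 0
    return 0.0 if t == 0 else -t * math.log(t)

def H(masses):
    # H_mu(alpha) = sum over A in alpha of f(mu(A))
    return sum(f(m) for m in masses)

# H(alpha) <= log N(alpha), with equality for the uniform distribution
N = 4
assert abs(H([1 / N] * N) - math.log(N)) < 1e-12
assert H([0.7, 0.1, 0.1, 0.1]) < math.log(N)
```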

(2): β ≤ α implies that every B ∈ β is a union of elements of α, and hence every T^{−i}B is a union of elements of T^{−i}α, i ∈ Z_{≥0}. Therefore, for n ≥ 1,

∨_{i=0}^{n−1} T^{−i}β ≤ ∨_{i=0}^{n−1} T^{−i}α,

and the result follows from the monotonicity of H_μ under refinement.

(3): By proposition (4.2.1)(e) and (b), respectively,

H_μ(∨_{i=0}^{n−1} T^{−i}β) ≤ H_μ((∨_{i=0}^{n−1} T^{−i}α) ∨ (∨_{i=0}^{n−1} T^{−i}β))
= H_μ(∨_{i=0}^{n−1} T^{−i}α) + H_μ(∨_{i=0}^{n−1} T^{−i}β | ∨_{i=0}^{n−1} T^{−i}α).

Now, by successively applying proposition (4.2.1) (d), (g) and finally using the fact that T is measure preserving, we get

H_μ(∨_{i=0}^{n−1} T^{−i}β | ∨_{i=0}^{n−1} T^{−i}α) ≤ Σ_{i=0}^{n−1} H_μ(T^{−i}β | ∨_{j=0}^{n−1} T^{−j}α)
≤ Σ_{i=0}^{n−1} H_μ(T^{−i}β | T^{−i}α)
= n H_μ(β|α).

Combining the inequalities, dividing both sides by n and taking the limit for n → ∞ gives the desired result.

Main Theorem 129

(4): First,

h_μ(T^n, ∨_{i=0}^{n−1} T^{−i}α) = lim_{k→∞} (1/k) H_μ(∨_{j=0}^{k−1} T^{−nj}(∨_{i=0}^{n−1} T^{−i}α))
= lim_{k→∞} (n/(kn)) H_μ(∨_{i=0}^{kn−1} T^{−i}α)
= lim_{m→∞} (n/m) H_μ(∨_{i=0}^{m−1} T^{−i}α)
= n h_μ(T, α).

Notice that {∨_{i=0}^{n−1} T^{−i}α | α a partition of X} ⊆ {β | β a partition of X}. Hence,

n h_μ(T) = sup_α h_μ(T^n, ∨_{i=0}^{n−1} T^{−i}α) ≤ sup_β h_μ(T^n, β) = h_μ(T^n).

Finally, since α ≤ ∨_{i=0}^{n−1} T^{−i}α, we obtain by (2),

h_μ(T^n, α) ≤ h_μ(T^n, ∨_{i=0}^{n−1} T^{−i}α) = n h_μ(T, α),

and taking the supremum over all partitions α yields h_μ(T^n) ≤ n h_μ(T).

Exercise 8.1.1 Use lemma 8.1.1 and proposition (4.2.1) to show that for an invertible measure preserving transformation T on (X, F, μ): h_μ(T^{−1}) = h_μ(T).

We are now ready to prove the first part of the theorem.

Theorem 8.1.1 Let T : X → X be a continuous transformation of a compact metric space. Then h(T) ≥ sup{h_μ(T) | μ ∈ M(X, T)}.

Proof
Let μ ∈ M(X, T) and let α = {A_1, . . . , A_n} be a partition of X. Choose ε > 0 such that ε < 1/(n log n). By theorem 16, μ is regular, so we can find closed sets B_i ⊆ A_i such that μ(A_i − B_i) < ε, i = 1, . . . , n. Define B_0 = X − ∪_{i=1}^n B_i and let β be the partition β = {B_0, . . . , B_n}. Then, writing f(t) = −t log(t) as before,

H_μ(α|β) = −Σ_{i=0}^{n} μ(B_i) Σ_{j=1}^{n} (μ(B_i ∩ A_j)/μ(B_i)) log(μ(B_i ∩ A_j)/μ(B_i))
= Σ_{i=0}^{n} μ(B_i) Σ_{j=1}^{n} f(μ(B_i ∩ A_j)/μ(B_i))
= μ(B_0) Σ_{j=1}^{n} f(μ(B_0 ∩ A_j)/μ(B_0)),

since for i = 1, . . . , n we have

B_i ∩ A_i = B_i and B_i ∩ A_j = ∅ if i ≠ j,

so that all terms with i ≥ 1 vanish. Define the conditional measure μ(·|B_0) : F ∩ B_0 → [0, 1] by

μ(A|B_0) = μ(A ∩ B_0)/μ(B_0).

Now, noting that α_0 := {A ∩ B_0 | A ∈ α} is a partition of B_0 under μ(·|B_0), we get

H_μ(α|β) = μ(B_0) Σ_{j=1}^{n} f(μ(B_0 ∩ A_j)/μ(B_0)) = μ(B_0) H_{μ(·|B_0)}(α_0).

Hence, we can apply (1) of lemma 8.1.1 and use the fact that μ(B_0) ≤ Σ_{i=1}^{n} μ(A_i − B_i) < nε to obtain

H_μ(α|β) ≤ μ(B_0) log(n) < nε log(n) < 1.

Define for each i, i = 1, . . . , n, the open set C_i by C_i = B_0 ∪ B_i = X − ∪_{j≠i} B_j. Then we can define an open cover of X by γ = {C_1, . . . , C_n}. By (1) of lemma 8.1.1 we have, for m ≥ 1, in the notation of the lemma,

H_μ(∨_{i=0}^{m−1} T^{−i}β) ≤ log(N(∨_{i=0}^{m−1} T^{−i}β)).

Note that γ is not necessarily a partition, but, by the proof of (1) of lemma 8.1.1, we still have

H_μ(∨_{i=0}^{m−1} T^{−i}β) ≤ log(2^m N(∨_{i=0}^{m−1} T^{−i}γ)),

since each C_i contains, besides B_i, (at most) one more set of non-zero measure, namely B_0, so that every element of ∨_{i=0}^{m−1} T^{−i}γ contains at most 2^m elements of ∨_{i=0}^{m−1} T^{−i}β. Thus, it follows that

h_μ(T, β) ≤ log 2 + h(T, γ) ≤ log 2 + h(T),

and therefore, by (3) of lemma 8.1.1,

h_μ(T, α) ≤ h_μ(T, β) + H_μ(α|β) < h(T) + log 2 + 1.

The same inequality holds for T^m as well. Applying (4) of lemma 8.1.1 and (3) of theorem 7.2.4 leads us to m h_μ(T) ≤ m h(T) + log 2 + 1. By dividing by m, taking the limit for m → ∞ and taking the supremum over all μ ∈ M(X, T), we obtain the desired result.

We will now finish the proof of the Variational Principle by proving the opposite inequality. This is the hard part of the proof, since it does not only require more machinery, but also a clever trick.

Lemma 8.1.2 Let α be a partition of X. Then, for any μ, ν ∈ M(X, T) and p ∈ [0, 1] we have

H_{pμ+(1−p)ν}(α) ≥ p H_μ(α) + (1 − p) H_ν(α).

Proof
Consider again the function f(t) = −t log(t). Then f is concave and so, for A measurable, we have:

f(pμ(A) + (1 − p)ν(A)) ≥ p f(μ(A)) + (1 − p) f(ν(A)).

The result now follows easily.
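The concavity inequality of the lemma is easy to test numerically for a finite partition (a sketch; the names and random test data are ours):

```python
import math
import random

def H(masses):
    # entropy of a partition with atom measures `masses`
    return -sum(x * math.log(x) for x in masses if x > 0)

random.seed(1)
for _ in range(100):
    k = 5
    mu = [random.random() for _ in range(k)]
    nu = [random.random() for _ in range(k)]
    mu = [x / sum(mu) for x in mu]          # normalize to probability vectors
    nu = [x / sum(nu) for x in nu]
    p = random.random()
    mix = [p * a + (1 - p) * b for a, b in zip(mu, nu)]
    # H_{p mu + (1-p) nu}(alpha) >= p H_mu(alpha) + (1-p) H_nu(alpha)
    assert H(mix) >= p * H(mu) + (1 - p) * H(nu) - 1e-12
```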


Exercise 8.1.2 Let T : X → X be a continuous transformation. Use (the proof of) lemma 8.1.2 to show that for any μ, ν ∈ M(X, T) and p ∈ [0, 1] we have h_{pμ+(1−p)ν}(T) ≥ p h_μ(T) + (1 − p) h_ν(T).

Exercise 8.1.3 Improve the result in the exercise above by showing that we can replace the inequality by an equality sign, i.e. for any μ, ν ∈ M(X, T) and p ∈ [0, 1] we have h_{pμ+(1−p)ν}(T) = p h_μ(T) + (1 − p) h_ν(T).

Lemma 8.1.3 Let T : X → X be a continuous transformation of a compact metric space X, let μ ∈ M(X, T) and let {μ_n} be a sequence in M(X). Then,

1. For any x ∈ X and δ > 0, there is a 0 < δ' < δ such that μ(∂B(x, δ')) = 0.

2. For any δ > 0 there is a finite partition α = {A_1, . . . , A_k} of X such that diam(A_j) < δ and μ(∂A_j) = 0, for all j.

3. If A_0, . . . , A_{n−1} are measurable sets such that μ(∂A_j) = 0, j = 0, . . . , n − 1, then μ(∂(∩_{j=0}^{n−1} T^{−j}A_j)) = 0.

4. If μ(∂A) = 0 and μ_n → μ in M(X), then lim_{n→∞} μ_n(A) = μ(A).

Proof
(1): Suppose the statement is false. Then there exist x ∈ X and δ > 0 such that μ(∂B(x, δ')) > 0 for all 0 < δ' < δ. Since there are uncountably many such δ', there is an m ∈ Z_{>0} and a sequence {δ_i}_{i=1}^∞ of distinct real numbers satisfying 0 < δ_i < δ and μ(∂B(x, δ_i)) > 1/m for all i. The boundaries ∂B(x, δ_i) are pairwise disjoint, so

μ(∪_{i=1}^∞ ∂B(x, δ_i)) = Σ_{i=1}^∞ μ(∂B(x, δ_i)) = ∞,

contradicting the fact that μ is a probability measure on X.

(2): Fix δ > 0. By (1), for each x ∈ X, we can find an 0 < δ_x < δ/2 such that μ(∂B(x, δ_x)) = 0. The collection {B(x, δ_x) | x ∈ X} forms an open cover of X, so by compactness there exists a finite subcover which we denote by {B_1, . . . , B_n}. Define α by letting A_1 = B_1 and, for 1 < j ≤ n, A_j = B_j − (∪_{k=1}^{j−1} B_k). Then α is a partition of X, diam(A_j) ≤ diam(B_j) < δ and μ(∂A_j) ≤ μ(∪_{i=1}^n ∂B_i) = 0, since ∂A_j ⊆ ∪_{i=1}^n ∂B_i.

(3): Let x ∈ ∂(∩_{j=0}^{n−1} T^{−j}A_j). Then x ∈ cl(∩_{j=0}^{n−1} T^{−j}A_j), but x ∉ int(∩_{j=0}^{n−1} T^{−j}A_j). That is, every open neighborhood of x intersects every T^{−j}A_j, so x ∈ cl(T^{−j}A_j) for all j, while x ∉ int(T^{−k}A_k) for some k. By continuity of T^k,

T^k(x) ∈ T^k(cl(T^{−k}A_k)) ⊆ cl(A_k),

and T^k(x) ∉ int(A_k), so that T^k(x) ∈ cl(A_k) − int(A_k) = ∂A_k. Thus,

∂(∩_{j=0}^{n−1} T^{−j}A_j) ⊆ ∪_{j=0}^{n−1} T^{−j}∂A_j,

and the statement readily follows.

(4): Recall that μ_n → μ in M(X) for n → ∞ if and only if

lim_{n→∞} ∫_X f(x) dμ_n(x) = ∫_X f(x) dμ(x)

for every f ∈ C(X). Let A be such that μ(∂A) = 0 and set A_k = {x ∈ X | d(x, cl(A)) < 1/k}, k ∈ Z_{>0}. Then, since cl(A) and X − A_k are closed and disjoint, it follows from theorem 7.1.1 that for every k there exists a function f_k ∈ C(X) such that 0 ≤ f_k ≤ 1, f_k(x) = 1 for all x ∈ cl(A) and f_k(x) = 0 for all x ∈ X − A_k. Now, for each k,

limsup_{n→∞} μ_n(A) = limsup_{n→∞} ∫_X 1_A(x) dμ_n(x) ≤ lim_{n→∞} ∫_X f_k(x) dμ_n(x) = ∫_X f_k(x) dμ(x).

Since f_k → 1_{cl(A)} pointwise, by the Dominated Convergence Theorem,

limsup_{n→∞} μ_n(A) ≤ lim_{k→∞} ∫_X f_k(x) dμ(x) = μ(cl(A)) = μ(A),

where the final equality follows from the assumption μ(∂A) = 0. The proof of the opposite inequality is similar.

Lemma 8.1.4 Let q, n be integers such that 1 < q < n. Define, for 0 ≤ j ≤ q − 1, a(j) = ⌊(n − j)/q⌋, where ⌊·⌋ means taking the integer part. Then we have the following.

1. Fix 0 ≤ j ≤ q − 1. Then (a(j) − 1)q + j ≤ n − q, and the numbers {j + rq | 0 ≤ r ≤ a(j) − 1} are distinct and no greater than n − q.

2. Fix 0 ≤ j ≤ q − 1. Define S_j = {0, 1, . . . , j − 1} ∪ {j + a(j)q, j + a(j)q + 1, . . . , n − 1}. Then

{0, 1, . . . , n − 1} = {j + rq + i | 0 ≤ r ≤ a(j) − 1, 0 ≤ i ≤ q − 1} ∪ S_j,

and card(S_j) ≤ 2q.

Example 8.1.1 Let us work out what is going on in lemma 8.1.4 for n = 10, q = 4. Then a(0) = ⌊(10 − 0)/4⌋ = 2 and similarly, a(1) = 2, a(2) = 2 and a(3) = 1. This gives us S_0 = {8, 9}, S_1 = {0, 9}, S_2 = {0, 1} and S_3 = {0, 1, 2, 7, 8, 9}. For example, S_3 is obtained by taking all nonnegative integers smaller than 3 and adding the numbers 3 + a(3)q = 7, 3 + a(3)q + 1 = 8 and 3 + a(3)q + 2 = 9. Note that card(S_j) ≤ 8 = 2q. One can readily check all properties stated in the lemma, e.g. (a(3) − 1)q + 3 = (⌊(10 − 3)/4⌋ − 1) · 4 + 3 = 3 ≤ 6 = n − q.
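The bookkeeping in lemma 8.1.4 and this example can be checked mechanically (a sketch; the function names are ours):

```python
def a(j, n, q):
    # a(j) = floor((n - j)/q)
    return (n - j) // q

def S(j, n, q):
    # S_j = {0, ..., j-1} together with the tail {j + a(j)q, ..., n-1}
    return set(range(j)) | set(range(j + a(j, n, q) * q, n))

n, q = 10, 4
assert [a(j, n, q) for j in range(q)] == [2, 2, 2, 1]
assert S(3, n, q) == {0, 1, 2, 7, 8, 9}
for j in range(q):
    blocks = {j + r * q + i for r in range(a(j, n, q)) for i in range(q)}
    # the q-blocks starting at j, j+q, ... together with S_j cover {0,...,n-1}
    assert blocks | S(j, n, q) == set(range(n))
    # and the tail has at most 2q elements
    assert len(S(j, n, q)) <= 2 * q
```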

We shall now finish the proof of our main theorem. The proof is presented in a logical order, which is in this case not quite the same as thinking order. Therefore, before plunging into a formal proof, we will briefly discuss the main ideas.

We would like to construct a Borel probability measure μ with measure theoretic entropy h_μ(T) ≥ sep(ε, T) := lim_{n→∞} (1/n) log(sep(n, ε, T)). To do this, we first find a measure σ_n with H_{σ_n}(∨_{i=0}^{n−1} T^{−i}α) equal to log(sep(n, ε, T)), where α is a suitably chosen partition. This is not too difficult. The problem is to find an element μ of M(X, T) with this property. Theorem (6.1.5) suggests a suitable measure μ which can be obtained from the σ_n. The trick in lemma 8.1.4 plays a crucial role in getting from an estimate for H_{σ_n} to one for H_μ. The idea is to first remove the tails S_j of the partition ∨_{i=0}^{n−1} T^{−i}α, make a crude estimate for these tails and later add them back on again. Lemma 8.1.3 fills in the remaining technicalities.

Theorem 8.1.2 Let T : X → X be a continuous transformation of a compact metric space. Then h(T) ≤ sup{h_μ(T) | μ ∈ M(X, T)}.

Proof
Fix ε > 0 and let E_n be an (n, ε)-separated set of cardinality sep(n, ε, T). Define σ_n ∈ M(X) by σ_n = (1/sep(n, ε, T)) Σ_{x∈E_n} δ_x, where δ_x denotes the Dirac measure concentrated at x, and define

μ_n = (1/n) Σ_{i=0}^{n−1} σ_n ∘ T^{−i}.

By theorem 19, M(X) is compact, hence there exists a subsequence {μ_{n_j}}_{j=1}^∞ such that {μ_{n_j}} converges in M(X) to some μ ∈ M(X). By theorem (6.1.5), μ ∈ M(X, T). We will show that h_μ(T) ≥ sep(ε, T), from which the result clearly follows.

By (2) of lemma 8.1.3, we can find a partition α = {A_1, . . . , A_k} of X such that diam(A_j) < ε and μ(∂A_j) = 0, j = 1, . . . , k. We may assume that every x ∈ E_n is contained in some A_j. Now, if A ∈ ∨_{i=0}^{n−1} T^{−i}α, then A cannot contain more than one element of E_n. Hence, σ_n(A) = 0 or 1/sep(n, ε, T). Since ∪_{j=1}^{k} A_j contains all of E_n, we see that

H_{σ_n}(∨_{i=0}^{n−1} T^{−i}α) = log(sep(n, ε, T)).

Fix integers q, n with 1 < q < n and define for each 0 ≤ j ≤ q − 1 a(j) as in lemma 8.1.4. Fix 0 ≤ j ≤ q − 1. Since

∨_{i=0}^{n−1} T^{−i}α = (∨_{r=0}^{a(j)−1} T^{−(rq+j)}(∨_{i=0}^{q−1} T^{−i}α)) ∨ (∨_{i∈S_j} T^{−i}α),

we find that

log(sep(n, ε, T)) = H_{σ_n}(∨_{i=0}^{n−1} T^{−i}α)
≤ Σ_{r=0}^{a(j)−1} H_{σ_n}(T^{−(rq+j)}(∨_{i=0}^{q−1} T^{−i}α)) + Σ_{i∈S_j} H_{σ_n}(T^{−i}α)
≤ Σ_{r=0}^{a(j)−1} H_{σ_n∘T^{−(rq+j)}}(∨_{i=0}^{q−1} T^{−i}α) + 2q log k.

Here we used proposition (4.2.1)(d) and lemma 8.1.1. Now if we sum the above inequality over j and divide both sides by n, we obtain, by (1) and (2) of lemma 8.1.4 together with lemma 8.1.2,

(q/n) log(sep(n, ε, T)) ≤ (1/n) Σ_{l=0}^{n−1} H_{σ_n∘T^{−l}}(∨_{i=0}^{q−1} T^{−i}α) + (2q²/n) log k
≤ H_{μ_n}(∨_{i=0}^{q−1} T^{−i}α) + (2q²/n) log k.

136 Measures of Maximal Entropy

By (3) of lemma 8.1.3, every A ∈ ∨_{i=0}^{q−1} T^{−i}α has boundary of μ-measure zero, so by (4) of the same lemma, lim_{j→∞} μ_{n_j}(A) = μ(A). Hence, if we replace n by n_j in the above inequality and take the limit for j → ∞, we obtain

q · sep(ε, T) ≤ H_μ(∨_{i=0}^{q−1} T^{−i}α).

If we now divide both sides by q and take the limit for q → ∞, we get the desired inequality sep(ε, T) ≤ h_μ(T, α) ≤ h_μ(T).

Theorem (The Variational Principle) The topological entropy of a continuous transformation T : X → X of a compact metric space X is given by h(T) = sup{h_μ(T) | μ ∈ M(X, T)}.

To get a taste of the power of this statement, let us recast our proof of the invariance of topological entropy under conjugacy ((2) of theorem 7.2.4). Let ψ : X_1 → X_2 denote the conjugacy. We note that μ ∈ M(X_1, T_1) if and only if μ ∘ ψ^{−1} ∈ M(X_2, T_2), and we observe that h_μ(T_1) = h_{μ∘ψ^{−1}}(T_2). The result now follows from the Variational Principle. It is that simple.

The Variational Principle suggests an educated way of choosing a Borel probability measure on X, namely one that maximizes the entropy of T.

Definition Let T : X → X be continuous. A measure μ ∈ M(X, T) is called a measure of maximal entropy if h_μ(T) = h(T). Let M_max(X, T) = {μ ∈ M(X, T) | h_μ(T) = h(T)}. If M_max(X, T) = {μ}, then μ is called a unique measure of maximal entropy.

Example Recall that the topological entropy of the circle rotation R_θ is given by h(R_θ) = 0. Since h_μ(R_θ) ≥ 0 for all μ ∈ M(X, R_θ), we see that every μ ∈ M(X, R_θ) is a measure of maximal entropy, i.e. M_max(X, R_θ) = M(X, R_θ). More generally, we have h(T) = 0 and hence M_max(X, T) = M(X, T) for any continuous isometry T : X → X.

In general, however, a transformation may have many, or no, such measures, as will become apparent from the following theorem.

Theorem Let T : X → X be continuous. Then

1. M_max(X, T) is convex.

2. If h(T) < ∞, then the extreme points of M_max(X, T) are precisely the ergodic members of M_max(X, T).

3. If h(T) = ∞, then M_max(X, T) ≠ ∅; if, in addition, T has a unique measure of maximal entropy, then T is uniquely ergodic.

4. If T has a unique measure of maximal entropy μ, then μ is ergodic; conversely, if T is uniquely ergodic, then T has a measure of maximal entropy.

Proofs
(2): Suppose μ ∈ M_max(X, T) is ergodic. Then, by theorem 6.1.6, μ cannot be written as a non-trivial convex combination of elements of M(X, T). Since M_max(X, T) ⊆ M(X, T), μ is an extreme point of M_max(X, T). Conversely, suppose μ is an extreme point of M_max(X, T) and suppose there are p ∈ (0, 1) and μ_1, μ_2 ∈ M(X, T) such that μ = pμ_1 + (1 − p)μ_2. By exercise 8.1.3, h(T) = h_μ(T) = p h_{μ_1}(T) + (1 − p) h_{μ_2}(T). But by the Variational Principle, h(T) = sup{h_ν(T) | ν ∈ M(X, T)}, so h(T) = h_{μ_1}(T) = h_{μ_2}(T). In other words, μ_1, μ_2 ∈ M_max(X, T), thus μ = μ_1 = μ_2. Therefore, μ is also an extreme point of M(X, T) and we conclude that μ is ergodic.

For the second part, suppose that M_max(X, T) ≠ ∅.

(3): By the Variational Principle, for any n ∈ Z_{>0} we can find a μ_n ∈ M(X, T) such that h_{μ_n}(T) > 2^n. Define μ ∈ M(X, T) by

μ = Σ_{n=1}^{∞} μ_n/2^n = Σ_{n=1}^{N} μ_n/2^n + Σ_{n=N+1}^{∞} μ_n/2^n.

But then, by exercise 8.1.3,

h_μ(T) ≥ Σ_{n=1}^{N} h_{μ_n}(T)/2^n > N.

This holds for arbitrary N ∈ Z_{>0}, so h_μ(T) = h(T) = ∞ and μ is a measure of maximal entropy.

Now suppose that T has a unique measure of maximal entropy μ. Then, for any ν ∈ M(X, T),

h_{ν/2+μ/2}(T) = (1/2) h_ν(T) + (1/2) h_μ(T) = ∞.

Hence, ν/2 + μ/2 = μ, so that ν = μ and M(X, T) = {μ}, i.e. T is uniquely ergodic.

(4): The first statement follows from (2), for h(T) < ∞, and (3), for h(T) = ∞. If T is uniquely ergodic, then M(X, T) = {μ} for some μ. By the Variational Principle, h_μ(T) = h(T).

Example Recall that the irrational rotation R_θ is uniquely ergodic, the unique invariant measure being Lebesgue measure. By the theorem above, it follows that Lebesgue measure is also a unique measure of maximal entropy for R_θ.

Exercise Let T be the full two-sided shift. Use proposition 7.3.2 to show that the uniform product measure is a unique measure of maximal entropy for T.
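Since the entropy of a Bernoulli measure with weight vector p on the full shift is −Σ_i p_i log p_i (by the results of chapter 4), the exercise amounts to showing that this quantity is at most log k, with equality only for the uniform weights. A quick numerical check (a sketch; the names and random test data are ours):

```python
import math
import random

def bernoulli_entropy(p):
    # h_mu(shift) for the Bernoulli measure with weight vector p
    return -sum(x * math.log(x) for x in p if x > 0)

k = 3
random.seed(0)
for _ in range(1000):
    p = [random.random() for _ in range(k)]
    p = [x / sum(p) for x in p]
    assert bernoulli_entropy(p) <= math.log(k) + 1e-12
# equality exactly at the uniform product measure
assert abs(bernoulli_entropy([1 / k] * k) - math.log(k)) < 1e-12
```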

Proposition Every expansive homeomorphism of a compact metric space has a measure of maximal entropy.

Proof
Let δ be an expansive constant for T. Fix 0 < ε < δ. Define μ ∈ M(X, T) as in the proof of theorem 8.1.2. Then, by that proof, h_μ(T) ≥ sep(ε, T). We will show that h(T) = sep(ε, T). It then immediately follows that μ ∈ M_max(X, T).

Pick any 0 < ε' < ε and let A be an (n, ε')-separated set of cardinality sep(n, ε', T). By expansiveness, we can find for any distinct x, y ∈ A some k = k_{x,y} ∈ Z such that

d(T^k(x), T^k(y)) > δ.

Since A is finite, there is an l ∈ Z_{>0} with |k_{x,y}| ≤ l for all distinct x, y ∈ A. Then, by our choice of l, for distinct x, y ∈ T^{−l}A we have

max_{0≤i≤2l+n−1} d(T^i(x), T^i(y)) > ε,

so T^{−l}A is a (2l + n, ε)-separated set of cardinality sep(n, ε', T). Therefore,

sep(ε', T) = lim_{n→∞} (1/n) log(sep(n, ε', T))
≤ lim_{n→∞} (1/n) log(sep(2l + n, ε, T))
= lim_{n→∞} (1/(2l + n)) log(sep(2l + n, ε, T))
= sep(ε, T).

On the other hand, since ε' < ε, every (n, ε)-separated set is also (n, ε')-separated, so sep(ε, T) ≤ sep(ε', T). We conclude that sep(ε', T) = sep(ε, T). Since 0 < ε' < ε was arbitrary, this shows that h(T) = lim_{ε'→0} sep(ε', T) = sep(ε, T). This completes our proof.

Corollary Let T : X → X be an expansive homeomorphism of a compact metric space and let δ be an expansive constant for T. Then h(T) = sep(ε, T), for any 0 < ε < δ.

At this point we would like to make a remark on the proof of the above proposition. One might be inclined to believe that the measure μ constructed in the proof of theorem 8.1.2 is a measure of maximal entropy for any continuous transformation T. The reader should note, though, that μ still depends on ε, and it is therefore only because of the above corollary that μ is a measure of maximal entropy for expansive homeomorphisms.


Bibliography

Cambridge University Press, 2002.

Mathematical Monographs, 29. Mathematical Association of America, Washington, DC, 2002.

Application to Number Theory, Proc. AMS, Vol. 129, no. 12 (2001), 3453–3460.

Second Edition, Addison-Wesley Publishing Company, Inc.

tems, Springer-Verlag New York, Inc.

Math., 45 (1944), 192–206.

[KK] Kamae, Teturo; Keane, Michael, A simple proof of the ratio ergodic theorem, Osaka J. Math. 34 (1997), no. 3, 653–657.

bridge Press, 1966.

[M] Misiurewicz, M., A short proof of the variational principle for a Z_+^N action on a compact space, Astérisque, 40 (1976), 147–187.

[Mu] Munkres, J.R., Topology, Second Edition, Prentice Hall, Inc., 2000.

[Pa] Parry, William, Topics in Ergodic Theory, reprint of the 1981 original, Cambridge Tracts in Mathematics, 75, Cambridge University Press, Cambridge, 2004.

Mathematics, 2. Cambridge University Press, Cambridge, 1989.

[W] Walters, Peter, An Introduction to Ergodic Theory, Graduate Texts in Mathematics, 79, Springer-Verlag, New York–Berlin, 1982.

Index

Symbols:
(n, ε)-separated, 111
(n, ε)-spanning, 111
B(x, r), 112
B_n(x, r), 112
FO_T(x), 103
H(α), 109
M_max(X, T), 136
N(α), 109
O_T(x), 103
Q_μ, 119
R_θ, 120
T^{−1}α, 109
X, 102
α ∨ β, 109
∨_{i=1}^{n} α_i, 109
∂A, 132
β-transformations, 10
cov(ε, T), 113
cov(n, ε, T), 112
d, 102
d_disc, 103
d_n, 111
diam(A), 109
diam(α), 109
h(T), 114
h(T, α), 110
h_1(T), 111
h_2(T), 114
sep(ε, T), 134

Terms:
algebra, 7
algebra generated, 7
atoms of a partition, 58
Baker's Transformation, 10
Bernoulli Shifts, 11
binary expansion, 10
Birkhoff's Ergodic Theorem, 29
Cantor set, 125
circle rotation, 120
common refinement, 58
  of open cover, 109
conditional expectation, 35
conditional information function, 66
conservative, 82
Continued Fractions, 13
diameter
  of a set, 109
  of open cover, 109
dynamical system, 47
entropy of the partition, 57
entropy of the transformation, 61
equivalent measures, 79
ergodic, 20
expansive, 106
  constant, 106
extreme point, 92
first return time, 16
fixed point, 105
generator, 64, 106
  weak, 107
homeomorphism
  conjugate, 107
  expansive, 106, 107
  generator for, 106
  minimal, 103
  topologically transitive, 103
  weak generator for, 107
Hurewicz Ergodic Theorem, 83
induced map, 16
induced operator, 22
information function, 66
integral system, 19
irreducible Markov chain, 42
isometry, 105
isomorphic, 47
Kac's Lemma, 35
Knopp's Lemma, 26
Lebesgue number, 102
Lochs' Theorem, 72
Markov measure, 41
Markov Shifts, 12
measure of maximal entropy, 136
  unique, 136
measure preserving, 6
natural extension, 52
non-singular transformation, 80
orbit
  forward, 103
periodic point, 105
phase portrait, 119
Poincaré Recurrence Theorem, 15
Radon–Nikodym Theorem, 79
random shifts, 13
refinement
  of open cover, 109
regular measure, 89
Riesz Representation Theorem, 90
semi-algebra, 7
Shannon–McMillan–Breiman Theorem, 70
shift map
  full, 121
Stationary Stochastic Processes, 12
strongly mixing, 45
subadditive sequence, 60
symbolic dynamics, 123
topological conjugacy, 107
topological entropy
  definition (I), 111
  definition (II), 114
  of open cover, 109
  properties, 116
  wrt open cover, 110
topologically transitive, 103
  homeomorphism, 103
  one-sided, 103
translations, 9
uniquely ergodic, 96
weakly mixing, 45
