
TOPICS IN ERGODIC THEORY, MICHAELMAS 2016

PETER VARJU

These are brief lecture notes for my course. Please send comments

to: pv270@dpmms.cam.ac.uk.

Ergodic Theory is the study of measure preserving systems.

Definition 1. A measure preserving system (MPS) is a quadruple $(X, \mathcal{B}, \mu, T)$, where

- $X$ is a set,
- $\mathcal{B}$ is a $\sigma$-algebra on $X$,
- $\mu$ is a probability measure, that is, $\mu(X) = 1$ and $\mu(A) \ge 0$ for all $A \in \mathcal{B}$,
- $T : X \to X$ is a measure preserving transformation, that is, a measurable transformation such that $\mu(T^{-1}(A)) = \mu(A)$ for all $A \in \mathcal{B}$.

Example 2 (Circle rotation). Fix a number $\alpha \in [0, 1)$. Let $X = [0, 1] = \mathbb{R}/\mathbb{Z}$, let $\mathcal{B}$ be the collection of Borel sets in $[0, 1]$, let $\mu$ be the Lebesgue measure, and let $T = R_\alpha$, where $R_\alpha$ is the map
$$R_\alpha : x \mapsto x + \alpha \bmod 1.$$

Example 3 (Times 2 map). Let $X, \mathcal{B}, \mu$ be as in the previous example and let $T = T_2$, where
$$T_2(x) = 2x \bmod 1.$$

These examples can be generalized to more general compact groups.

In the first case the map is addition (or multiplication) by a fixed

element of the group, in the second case it is an endomorphism of the

group.

Proof that $T_2$ is measure preserving. Easy to see for intervals:
$$\mu((a, b)) = b - a = \mu((a/2, b/2) \cup (a/2 + 1/2, b/2 + 1/2)) = \mu(T_2^{-1}((a, b))).$$
An open set $U \subset \mathbb{T}$ is the disjoint union of intervals $I_1 \cup I_2 \cup \ldots$, hence
$$\mu(U) = \sum_j \mu(I_j) = \sum_j \mu(T_2^{-1}(I_j)) = \mu(T_2^{-1}(U)).$$
A compact set $K$ is the complement of an open set $U$, hence
$$\mu(K) = 1 - \mu(U) = 1 - \mu(T_2^{-1}(U)) = \mu(X \setminus T_2^{-1}(U)) = \mu(T_2^{-1}(K)).$$
Now let $B \in \mathcal{B}$ be arbitrary and fix $\varepsilon > 0$. By regularity of the Lebesgue measure, there are an open set $U$ and a compact set $K$ such that $K \subset B \subset U$ and $\mu(U) \le \mu(K) + \varepsilon$. On the other hand
$$\mu(T_2^{-1}(B)) \le \mu(T_2^{-1}(U)) = \mu(U)$$
and similarly $\mu(T_2^{-1}(B)) \ge \mu(K)$, so we must have $|\mu(T_2^{-1}(B)) - \mu(B)| \le \varepsilon$, which proves the claim letting $\varepsilon \to 0$. $\square$
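The interval computation above is easy to check numerically. The following sketch is illustrative only (it is not part of the notes; the interval and sample size are arbitrary choices): it estimates the Lebesgue measure of $T_2^{-1}((a, b))$ by uniform sampling and compares it with $b - a$.

```python
import random

def T2(x):
    """The times 2 map on [0, 1)."""
    return (2 * x) % 1.0

def estimate_preimage_measure(a, b, samples=200_000, seed=0):
    """Monte Carlo estimate of mu(T2^{-1}((a, b))) for the Lebesgue
    measure mu: the fraction of uniform points x with T2(x) in (a, b)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples) if a < T2(rng.random()) < b)
    return hits / samples

a, b = 0.2, 0.7
est = estimate_preimage_measure(a, b)
# Measure preservation predicts est to be close to b - a = 0.5.
```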

Ergodic theory studies the long term behaviour of orbits in MPSs. The orbit of a point $x \in X$ is the sequence $x, Tx, T^2 x, \ldots$. In particular, the following questions are asked:

- Let $A \in \mathcal{B}$ and $x \in A$. Does the orbit of $x$ visit $A$ infinitely often?
- What is the proportion of the $n$'s such that $T^n x \in A$?
- What is the measure of $\{x \in A : T^n x \in A\}$ for some large $n$?

Example 4. Let $A = [0, 1/4)$. Then $T_2^n x \in A$ if and only if the $n$th and $(n+1)$st binary digits of $x$ are both $0$. Hence the orbit of
$$x = \frac{1}{6} = 0.00101010101010\ldots_{(2)}$$
never visits $A$ again. However, the opposite is true for most points. In addition:
$$\mu(\{x \in A : T_2^n x \in A\}) = \frac{1}{16} \quad \text{for all } n \ge 2.$$
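The value $1/16$ can be checked by sampling binary digit strings. This is an illustrative sketch, not part of the notes; digits are 0-indexed here, so $x \in A$ iff its first two binary digits vanish, and $T_2^n x \in A$ iff digits $n$ and $n+1$ vanish.

```python
import random

def block_hits(n, trials=200_000, seed=1):
    """Estimate mu({x in A : T2^n x in A}) for A = [0, 1/4) by sampling
    random binary digit strings of a point x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        digits = [rng.randint(0, 1) for _ in range(n + 2)]
        in_A = digits[0] == 0 and digits[1] == 0          # x in A
        shifted_in_A = digits[n] == 0 and digits[n + 1] == 0  # T2^n x in A
        hits += in_A and shifted_in_A
    return hits / trials

est = block_hits(4)
# For n >= 2 the four digit constraints are on distinct digits and
# independent, so est should be close to 1/16 = 0.0625.
```

For $n = 1$ the digit constraints overlap and the answer is $1/8$ instead, which shows why the statement requires $n \ge 2$.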

We mention one further example of an MPS.

Example 5 (Markov shifts). Let $n \ge 2$ be an integer, let $p_1, \ldots, p_n$ be a probability vector and $A = (a_{i,j}) \in \mathbb{R}_{\ge 0}^{n \times n}$ a matrix, called the matrix of transition probabilities. We assume
$$A \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}, \qquad (p_1, \ldots, p_n) A = (p_1, \ldots, p_n).$$
The following MPS is called a Markov shift.

- $X = \{1, 2, \ldots, n\}^{\mathbb{Z}}$,
- $\mathcal{B}$ is the Borel $\sigma$-algebra generated by the product topology,
- $T = \sigma$ is the shift map $(\sigma x)_m = x_{m+1}$,
- the measure $\mu$ is defined by the formula
$$(1) \qquad \mu(\{x \in X : x_m = i_0, x_{m+1} = i_1, \ldots, x_{m+k} = i_k\}) = p_{i_0} a_{i_0 i_1} a_{i_1 i_2} \cdots a_{i_{k-1} i_k},$$
which is required to hold for all $m, k, i_0, \ldots, i_k$.

It requires a proof that the set-function defined by (1) is additive and that it can be extended to a probability measure on $\mathcal{B}$ that is $\sigma$-invariant. This will appear in the first example sheet.


Suppose now that $a_{i,j} = p_j$ for all $i, j$. Inspection of (1) shows that $\mu$ is then the infinite product of the measure on $\{1, \ldots, n\}$ given by the probability vector $(p_1, \ldots, p_n)$. In this special case the MPS is called a Bernoulli shift.
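The cylinder formula (1) can be illustrated by simulation. The sketch below is illustrative only: the $2 \times 2$ transition matrix and its stationary vector are made-up values, and the comparison of a path frequency with (1) implicitly relies on an ergodic theorem proved later in the notes.

```python
import random

# A hypothetical 2-state transition matrix with rows summing to 1,
# and its stationary probability vector p (so p A = p).
A = [[0.9, 0.1],
     [0.3, 0.7]]
p = [0.75, 0.25]   # check: 0.75*0.9 + 0.25*0.3 = 0.75

def sample_path(length, seed=2):
    """Sample a stationary two-state Markov chain path."""
    rng = random.Random(seed)
    state = 0 if rng.random() < p[0] else 1
    path = [state]
    for _ in range(length - 1):
        state = 0 if rng.random() < A[state][0] else 1
        path.append(state)
    return path

path = sample_path(200_000)
# Empirical frequency of the cylinder {x : x_m = 0, x_{m+1} = 1};
# formula (1) predicts p_0 * a_{01} = 0.75 * 0.1 = 0.075.
freq = sum(1 for i in range(len(path) - 1)
           if path[i] == 0 and path[i + 1] == 1) / (len(path) - 1)
```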

Theorem 6 (Szemerédi). Let $S \subset \mathbb{Z}$ be a set of positive upper Banach density, that is,
$$d^*(S) := \limsup_{N - M \to \infty} \frac{1}{N - M} |\{x \in S : M \le x < N\}| > 0.$$
Then for each $l \in \mathbb{Z}_{>0}$, there are $a \in \mathbb{Z}$ and $d \in \mathbb{Z}_{>0}$ such that
$$\{a + jd : j = 0, \ldots, l - 1\} \subset S.$$

Fix a set $S$ with positive upper Banach density. We construct below an MPS and show that Szemerédi's theorem follows from the so-called multiple recurrence property of this system.

We define $X = \{0, 1\}^{\mathbb{Z}}$ and take $\mathcal{B}$ to be the Borel $\sigma$-algebra generated by the product topology. We consider the shift transformation
$$\sigma : X \to X, \qquad (\sigma(x))_n = x_{n+1}.$$
The invariant measure will be constructed below, and it will encode the set $S$.

Define the element $x^S \in X$ by
$$x^S_n := \begin{cases} 1 & \text{if } n \in S \\ 0 & \text{if } n \notin S, \end{cases}$$
and consider the set
$$A := \{x \in X : x_0 = 1\}.$$
The following observation will be used to relate the set $S$ to the dynamics: for each $n \in \mathbb{Z}$ we have
$$n \in S \quad \text{if and only if} \quad \sigma^n x^S \in A.$$
Indeed,
$$n \in S \iff 1 = x^S_n = (\sigma^n x^S)_0 \iff \sigma^n x^S \in A.$$

The invariant measure in the MPS will be constructed as the limit of normalized counting measures on longer and longer segments of the orbit of $x^S$. Before we give the details, we recall the notion and properties of weak limits of probability measures.

In what follows, $X$ denotes a compact metric space.

Definition 7. Let $\mu_n$ be a sequence of Borel probability measures on $X$ and let $\mu$ be another Borel probability measure. We say that $\mu_n$ weakly converges(1) to $\mu$ if
$$\lim_{n \to \infty} \int f \, d\mu_n = \int f \, d\mu \quad \text{for all } f \in C(X).$$
In notation, we will write
$$\operatorname{lim-w}_{n \to \infty} \mu_n = \mu.$$

This concept is useful thanks to the following result, which is an

analogue of the Bolzano-Weierstrass theorem.

Theorem 8 (Banach-Alaoglu/Helly). The space of Borel probability measures $\mathcal{M}(X)$ on a compact metric space $X$, endowed with the topology of weak convergence, is compact and metrizable. In other words, any sequence of probability measures has a weakly convergent subsequence.

Remark: If we drop the condition that X is metric (but it should

be Hausdorff, at least), then the compactness of M (X) will still hold,

but we may lose metrizability and the sequential version needs to be

reformulated in terms of nets.

2.2. The invariant measure. Let $S, X, \mathcal{B}, \sigma$ and $x^S$ be the same as above. We now construct the invariant measure on the MPS that will be used to prove Szemerédi's theorem.

We write $\delta_x$ for the probability measure defined by
$$\delta_x(B) = \begin{cases} 1 & \text{if } x \in B \\ 0 & \text{if } x \notin B. \end{cases}$$

We choose two sequences of integers $\{M_m\}, \{N_m\} \subset \mathbb{Z}$ with $N_m - M_m \to \infty$ such that
$$d^*(S) = \lim_{m \to \infty} \frac{1}{N_m - M_m} |\{x \in S : M_m \le x < N_m\}|.$$
We define the measures $\mu_m \in \mathcal{M}(X)$ by
$$\mu_m = \frac{1}{N_m - M_m} \sum_{i = M_m}^{N_m - 1} \delta_{\sigma^i x^S}.$$
In other words, $\mu_m(B)$ is the normalized number of elements of the segment $\sigma^{M_m} x^S, \ldots, \sigma^{N_m - 1} x^S$ in the set $B$.

We define $\mu$ to be the weak limit of a subsequence of $\mu_m$.
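For $B = A = \{x : x_0 = 1\}$, the quantity $\mu_m(A)$ is simply the density of $S$ in the window $[M_m, N_m)$. A tiny illustrative sketch (the choice $S = 3\mathbb{Z}$ is a made-up example, not from the notes):

```python
def mu_m_of_A(S_indicator, M, N):
    """mu_m(A) for A = {x : x_0 = 1}: the fraction of i in [M, N) with
    (sigma^i x^S)_0 = x^S_i = 1, i.e. the density of S in the window."""
    return sum(S_indicator(i) for i in range(M, N)) / (N - M)

# Hypothetical example: S = multiples of 3.
S_ind = lambda i: 1 if i % 3 == 0 else 0
val = mu_m_of_A(S_ind, -300, 300)
# The density of S in this window is exactly 1/3.
```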

Lemma 9. The above defined $(X, \mathcal{B}, \mu, \sigma)$ is an MPS.

(1) This is also called weak-* convergence.


Proof sketch. Later in the course we prove a general result which contains this lemma. For this reason, we do not give a rigorous argument now.

Without loss of generality, we assume that $\operatorname{lim-w} \mu_m = \mu$. (Otherwise we pass to a suitable subsequence.) We need to prove that $\sigma$ preserves $\mu$, but we only show that $\sigma$ almost preserves $\mu_m$.

We can write for a set $B \in \mathcal{B}$:
$$\mu_m(B) = \frac{1}{N_m - M_m} |\{i : M_m \le i < N_m,\ \sigma^i x^S \in B\}|$$
and similarly
$$\mu_m(\sigma^{-1}(B)) = \frac{1}{N_m - M_m} |\{i : M_m \le i < N_m,\ \sigma^i x^S \in \sigma^{-1}(B)\}| = \frac{1}{N_m - M_m} |\{i : M_m + 1 \le i < N_m + 1,\ \sigma^i x^S \in B\}|.$$
Hence
$$|\mu_m(B) - \mu_m(\sigma^{-1}(B))| \le \frac{1}{N_m - M_m}. \qquad \square$$

Remark 10. If $B$ is a cylinder set(2), i.e. a set of the form
$$B = \{x \in X : (x_{-N}, \ldots, x_N) \in \overline{B}\}$$
for some $N \in \mathbb{Z}_{\ge 0}$ and $\overline{B} \subset \{0, 1\}^{2N+1}$, then $B$ is both open and closed, hence $\chi_B$ is continuous.

By the definition of weak convergence, we have then that
$$\mu(\sigma^{-1}(B)) = \int \chi_{\sigma^{-1}(B)} \, d\mu = \lim_{m \to \infty} \int \chi_{\sigma^{-1}(B)} \, d\mu_m = \lim_{m \to \infty} \int \chi_B \, d\mu_m = \int \chi_B \, d\mu = \mu(B),$$
where the third equality holds by the estimate in the proof sketch. Hence $\mu(\sigma^{-1}(B)) = \mu(B)$ for cylinder sets. It is possible to make the proof of Lemma 9 rigorous by approximating arbitrary Borel sets with cylinder sets in a suitable way.

Proposition 11. Let $S \subset \mathbb{Z}$ be a set of positive upper Banach density and let $(X, \mathcal{B}, \mu, \sigma)$ be the measure preserving system constructed above. Let $A \subset X$ be the set $A = \{x \in X : x_0 = 1\}$. Let $l \ge 1$ be an integer. Suppose that
$$\mu(A \cap \sigma^{-n}(A) \cap \sigma^{-2n}(A) \cap \ldots \cap \sigma^{-(l-1)n}(A)) > 0$$
for an integer $n \ge 1$. Then $S$ contains an arithmetic progression of length $l$.

(2) Some sources use a more restrictive terminology and call only sets of the form $\{x \in X : x_{-N} = a_{-N}, \ldots, x_N = a_N\}$ for some $a_{-N}, \ldots, a_N \in \{0, 1\}$ cylinder sets.


Proof. The set
$$B := A \cap \sigma^{-n}(A) \cap \sigma^{-2n}(A) \cap \ldots \cap \sigma^{-(l-1)n}(A)$$
is a cylinder set, hence $\mu(B) = \lim \mu_m(B)$, as we have seen in the preceding proof. This means that $\mu_m(B) > 0$ for some $m$ large enough. By the definition of $\mu_m$, we must have $\sigma^j x^S \in B$ for some $M_m \le j < N_m$. Then $\sigma^j x^S \in \sigma^{-in}(A)$, hence $\sigma^{j + in} x^S \in A$ for all $0 \le i \le l - 1$. As we observed in the beginning of the lecture, this translates as
$$\{j + in : 0 \le i \le l - 1\} \subset S,$$
which was to be shown. $\square$

We note that $\mu(A) = d^*(S)$, as can be seen easily from the construction of $\mu$. We can now conclude that the following theorem of Furstenberg implies Szemerédi's theorem.

Theorem 12 (Multiple Recurrence, Furstenberg). Let $(X, \mathcal{B}, \mu, T)$ be an MPS and $A \in \mathcal{B}$ with $\mu(A) > 0$. Then for any integer $l \ge 1$ we have
$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu(A \cap T^{-n}A \cap T^{-2n}A \cap \ldots \cap T^{-(l-1)n}A) > 0.$$

The following lemma can be considered the pigeonhole principle of ergodic theory. Combined with the Furstenberg correspondence principle, it implies that every set of integers of positive upper Banach density contains an arithmetic progression of length 2, which is not very impressive yet.

Lemma 13. Let $(X, \mathcal{B}, \mu, T)$ be an MPS and let $A \in \mathcal{B}$ with $\mu(A) > 0$. Then there is some $n \ge 1$ such that $\mu(T^{-n}(A) \cap A) > 0$.

Proof. Assume to the contrary that $\mu(A \cap T^{-n}(A)) = 0$ for all $n \in \mathbb{Z}_{>0}$. Let $i < j \in \mathbb{Z}_{\ge 0}$. Then
$$\mu(T^{-i}(A) \cap T^{-j}(A)) = \mu(T^{-i}(A \cap T^{-(j-i)}(A))) = 0.$$
We fix a number $N$ and write
$$\mu(A \cup T^{-1}(A) \cup \ldots \cup T^{-N}(A)) = \mu(A) + \mu(T^{-1}A) - \mu(T^{-1}A \cap A) + \mu(T^{-2}A) - \mu(T^{-2}A \cap (A \cup T^{-1}A)) + \ldots + \mu(T^{-N}A) - \mu(T^{-N}A \cap (A \cup \ldots \cup T^{-(N-1)}A)) = (N+1)\mu(A),$$
which is a contradiction if $N > \mu(A)^{-1}$, since then the right hand side exceeds 1. $\square$


Theorem 14 (Poincaré Recurrence). Let $(X, \mathcal{B}, \mu, T)$ be an MPS and let $A \in \mathcal{B}$ with $\mu(A) > 0$. Then almost every point of $A$ goes back to $A$ infinitely often. That is to say,
$$\mu\Big(A \setminus \bigcap_{N=1}^{\infty} \bigcup_{n=N}^{\infty} T^{-n}(A)\Big) = 0.$$

Note that $T^{-n}(A)$ consists of the points $x$ that satisfy $T^n x \in A$, $\bigcup_{n=N}^{\infty} T^{-n}(A)$ is the set of points that visit $A$ at least once after the $N$th iteration of $T$, and $\bigcap_{N=1}^{\infty} \bigcup_{n=N}^{\infty} T^{-n}(A)$ is the set of points that visit $A$ infinitely often.

Proof. We denote by $A_0$ the set of points in $A$ that never come back to $A$. Then $A_0 \cap T^{-n}(A_0) \subset A_0 \cap T^{-n}(A) = \emptyset$ for all $n \ge 1$, hence $\mu(A_0) = 0$ by the lemma.

Let $x \in A$ be a point that does not come back to $A$ infinitely often. Let $n$ be the largest integer such that $T^n x \in A$. Then $T^n x \in A_0$. This shows that
$$A \setminus \bigcap_{N=1}^{\infty} \bigcup_{n=N}^{\infty} T^{-n}(A) \subset \bigcup_{n=0}^{\infty} T^{-n}(A_0).$$
This latter set is of measure 0, because it is the countable union of measure 0 sets. $\square$
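A quick numerical illustration (not part of the notes; the rotation number and the set $A$ are arbitrary choices): for an irrational circle rotation, the orbit of a point of $A = [0, 0.1)$ keeps returning to $A$, with frequency close to the measure of $A$.

```python
import math

alpha = math.sqrt(2) - 1   # a hypothetical irrational rotation number
A = (0.0, 0.1)             # the set A = [0, 0.1), an arbitrary choice

def return_times(x, max_n=10_000):
    """Times 1 <= n <= max_n at which the orbit of x under R_alpha lies in A."""
    times, y = [], x
    for n in range(1, max_n + 1):
        y = (y + alpha) % 1.0
        if A[0] <= y < A[1]:
            times.append(n)
    return times

times = return_times(0.05)
# The orbit returns again and again; the return frequency is close to
# the measure of A (this stronger statement uses unique ergodicity,
# discussed later in the notes).
```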

Definition 15 (Ergodicity). An MPS $(X, \mathcal{B}, \mu, T)$ is called ergodic if $A = T^{-1}(A)$ implies $\mu(A) = 0$ or $\mu(A) = 1$ for any $A \in \mathcal{B}$.

Ergodicity is a notion of indecomposability for MPSs. If $A \in \mathcal{B}$ is invariant (that is, $T^{-1}(A) = A$), then we can restrict $\mathcal{B}$, $\mu$ and $T$ to $A$ and, after renormalizing the measure, we obtain a new MPS.

Lemma 16. The following are equivalent:

(1) The MPS $(X, \mathcal{B}, \mu, T)$ is ergodic.
(2) For any $A \in \mathcal{B}$ with $\mu(A) > 0$, we have $\mu\big(\bigcap_{N=1}^{\infty} \bigcup_{n=N}^{\infty} T^{-n}(A)\big) = 1$.
(3) For any $A \in \mathcal{B}$, $\mu(A \triangle T^{-1}(A)) = 0$ implies $\mu(A) = 0$ or $\mu(A) = 1$.
(4) For any bounded measurable function $f : X \to \mathbb{R}$, if $f \circ T = f$ holds almost everywhere, then $f$ is constant almost everywhere.
(5) For any measurable function $f : X \to \mathbb{C}$, if $f \circ T = f$ holds almost everywhere, then $f$ is constant almost everywhere.

The second item shows that for ergodic systems Poincaré recurrence holds in a stronger form: not only almost every point in $A$ but also almost every point in $X$ visits $A$ infinitely often. The third, fourth and fifth items give characterizations that are useful in practice. In particular, we will use the fourth item to show that the circle rotation $R_\alpha$ is ergodic if and only if $\alpha$ is irrational.

Proof. (1) $\Rightarrow$ (2): The set $B := \bigcap_{N=1}^{\infty} \bigcup_{n=N}^{\infty} T^{-n}(A)$ is invariant. Indeed, $x \in B$ if and only if its orbit visits $A$ infinitely often. This is true for $x$ if and only if it is true for $Tx$. This shows that $B = T^{-1}(B)$, as claimed. By Poincaré recurrence $\mu(B) \ge \mu(A) > 0$, hence $\mu(B) = 1$ by ergodicity.

(2) $\Rightarrow$ (3): Let $A$ satisfy $\mu(A \triangle T^{-1}(A)) = 0$ and let $B$ be as in the previous paragraph. We show that $\mu(B) \le \mu(A)$. For $n \ge 1$, we define $D_n$ as the set of points $x \in B \setminus A$ such that $n$ is the smallest integer with $T^n x \in A$. The sets $D_n$ clearly partition $B \setminus A$. We can write
$$\mu(D_n) \le \mu(T^{-n}(A) \setminus T^{-n+1}(A)) = \mu(T^{-1}(A) \setminus A) \le \mu(T^{-1}(A) \triangle A) = 0.$$
Hence $\mu\big(\bigcup_{n=1}^{\infty} D_n\big) = 0$ and $\mu(B) \le \mu(A)$, as required. If $\mu(A) \ne 0$, then $\mu(A) \ge \mu(B) = 1$, which proves (3).

(3) $\Rightarrow$ (4): For $t \in \mathbb{R}$ write
$$A_t := \{x \in X : f(x) \le t\}.$$
Clearly $\mu(A_t \triangle T^{-1}(A_t)) = 0$, hence $\mu(A_t) = 0$ or $\mu(A_t) = 1$ for each $t$. Observe that the sets $A_t$ are increasing with $t$, hence the function $t \mapsto \mu(A_t)$ is a monotone increasing function.

Since $f$ is bounded, we have $\mu(A_t) = 0$ if $t$ is sufficiently small, and $\mu(A_t) = 1$ if $t$ is sufficiently large. Therefore, there is a finite real number $c$ such that $\mu(A_t) = 0$ for $t < c$ and $\mu(A_t) = 1$ for $t > c$. By the previous observations,
$$\{x : f(x) = c\} = \bigcap_{n=1}^{\infty} (A_{c+1/n} \setminus A_{c-1/n})$$
is of measure 1.

(4) $\Rightarrow$ (1): Let $A \in \mathcal{B}$ be such that $T^{-1}(A) = A$. Then $\chi_A \circ T = \chi_A$, hence $\chi_A$ is constant almost everywhere. If this constant is 0, then $\mu(A) = 0$; if it is 1, then $\mu(A) = 1$. Hence the system is ergodic, indeed. $\square$

Example 17. The circle rotation $R_\alpha$ is ergodic if and only if $\alpha$ is irrational.

Indeed: let $f : \mathbb{R}/\mathbb{Z} \to \mathbb{R}$ be a bounded measurable function. We can expand $f$ in Fourier series
$$f = \sum_{n \in \mathbb{Z}} a_n \exp(2\pi i n x).$$
Similarly for $f \circ R_\alpha$:
$$f \circ R_\alpha = \sum_{n \in \mathbb{Z}} a_n \exp(2\pi i n (x + \alpha)) = \sum_{n \in \mathbb{Z}} (a_n \exp(2\pi i n \alpha)) \exp(2\pi i n x).$$
If $f$ is $R_\alpha$-invariant, then $a_n = a_n \exp(2\pi i n \alpha)$ for all $n$ by the uniqueness of Fourier series.

If $\alpha$ is irrational, then $\exp(2\pi i n \alpha) \ne 1$ for all $n \ne 0$. Hence $a_n = 0$ for all $n \ne 0$, if $f$ is invariant. This proves that $f(x) = a_0$ almost everywhere.

If $\alpha$ is rational, then $n_1 \alpha$ is an integer for some $n_1 \ne 0$, hence the $n_1$th condition holds irrespective of the value of $a_{n_1}$. Thus $f(x) = \cos(2\pi n_1 x)$ is $R_\alpha$-invariant and not constant almost everywhere.

4. Ergodic theorems

In the previous section, we discussed recurrence, which is concerned with orbits visiting a given set infinitely often. We have not yet addressed the question of how frequent these visits are, which is what we pursue now.

Theorem 18 (Mean Ergodic Theorem, von Neumann). Let $(X, \mathcal{B}, \mu, T)$ be an MPS and denote by
$$I := \{f \in L^2(X) : f \circ T = f \text{ a.e.}\}$$
the closed subspace of $T$-invariant functions. Denote by $P_T : L^2(X) \to I$ the orthogonal projection. Then for every $f \in L^2(X)$:
$$\frac{1}{N} \sum_{n=0}^{N-1} f \circ T^n \to P_T f \quad \text{in } L^2(X).$$

Theorem 19 (Pointwise Ergodic Theorem, Birkhoff). Let $(X, \mathcal{B}, \mu, T)$ be an MPS. Then for every $f \in L^1(X)$, there is a $T$-invariant function $\overline{f} \in L^1(X)$ such that
$$\frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) \to \overline{f}(x) \quad \text{pointwise $\mu$-almost everywhere.}$$

When $f \in L^2(X)$, then of course the function $\overline{f}$ in the Pointwise Ergodic Theorem is $P_T f$. There are also $L^p$ variants of the Mean Ergodic Theorem for $1 \le p < \infty$; see the example sheet. The quantity $(1/N) \sum_{n=0}^{N-1} f \circ T^n$ is called an ergodic average.

To understand the meaning of these results, we look at the case when the system is ergodic. In that case, the limit function $\overline{f}$ (or $P_T f$ in the case of the Mean Ergodic Theorem) is constant almost everywhere, being $T$-invariant. To figure out the value of this constant, we look at the integral of the ergodic averages. We see that
$$\int \frac{1}{N} \sum_{n=0}^{N-1} f \circ T^n \, d\mu = \int f \, d\mu$$
for all $N$. Using the $L^1$ version of the Mean Ergodic Theorem (or assuming $f \in L^2(X)$), we get $\int \overline{f} \, d\mu = \int f \, d\mu$. Thus $\overline{f} = \int f \, d\mu$ almost everywhere.

That is, in the ergodic case,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} f(T^n x) = \int_X f \, d\mu \quad \text{for $\mu$-almost every } x.$$
The quantities on the left are called time averages, because in a physical system they correspond to averages of the results of measurements at times $0, 1, \ldots, N-1$. (The measurement is the evaluation of the function $f$ at a given point, which represents a state of the physical system.) The quantity on the right is called the space average, as it corresponds to the average of the result of the measurement taken at all possible states of the system with respect to the invariant measure. In this language the ergodic theorems are interpreted as the convergence of the time averages to the space average.
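This convergence of time averages to the space average is easy to observe numerically. A minimal sketch (illustrative only; the rotation number, observable and starting point are arbitrary choices, and for irrational rotations the convergence in fact holds for every starting point, by unique ergodicity discussed later):

```python
import math

alpha = math.pi % 1.0      # irrational rotation number (fractional part of pi)
f = lambda x: x * x        # observable; its space average is int_0^1 x^2 dx = 1/3

def time_average(x0, N=100_000):
    """Time average (1/N) sum_{n<N} f(T^n x0) along the rotation orbit."""
    total, x = 0.0, x0
    for _ in range(N):
        total += f(x)
        x = (x + alpha) % 1.0
    return total / N

avg = time_average(0.123)
# avg should be close to the space average 1/3.
```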

We begin with some background material concerning the operator $f \mapsto f \circ T$.

Lemma 20. Let $(X, \mathcal{B}, \mu)$ be a probability space and let $T : X \to X$ be a measurable transformation. Then $T$ is measure preserving if and only if
$$(2) \qquad \int_X f \circ T \, d\mu = \int_X f \, d\mu$$
for all $f \in L^1(X)$.

Proof. The "if" part: Let $A \in \mathcal{B}$ and note that $x \in T^{-1}(A)$ is equivalent to $Tx \in A$. Hence we can write:
$$\mu(T^{-1}(A)) = \int \chi_{T^{-1}(A)} \, d\mu = \int \chi_A \circ T \, d\mu = \int \chi_A \, d\mu = \mu(A).$$
The "only if" part: Similarly to the argument of the "if" part, we can show that (2) holds for characteristic functions. Then, by linearity of integration, it follows also for simple functions, that is, functions that take only finitely many different values.

Now let $f \in L^1(X)$ and assume that it is non-negative. Let $f_1, f_2, \ldots$ be a monotone increasing sequence of simple functions with $f = \lim f_n$ almost everywhere. Then $f \circ T = \lim f_n \circ T$ almost everywhere. (Here we used the measure preserving property.) Hence, using the definition of integration and (2) for simple functions:
$$\int f \circ T \, d\mu = \lim \int f_n \circ T \, d\mu = \lim \int f_n \, d\mu = \int f \, d\mu.$$
Finally, we write a general $f \in L^1(X)$ as the difference of two non-negative functions and conclude (2) by linearity of integration. $\square$

Definition 21. If $T : X \to X$ is a measure preserving transformation and $f$ is a measurable function, we write
$$U_T f = f \circ T$$
and call $U_T$ the Koopman operator.

Lemma 22. The Koopman operator $U_T$ is an isometry on the Hilbert space $L^2(X)$. That is to say,
$$\langle f, g \rangle = \langle U_T f, U_T g \rangle$$
for all $f, g \in L^2(X)$.

Proof. Apply the previous lemma to the function $f \overline{g} \in L^1(X)$. $\square$

Definition 23. An MPS $(X, \mathcal{B}, \mu, T)$ is said to be invertible if there is a measure preserving map $S : X \to X$ such that $T \circ S = S \circ T = \mathrm{Id}_X$ almost everywhere. If such a map exists, it is customary to denote it by $T^{-1}$.

Example 24. The circle rotation is invertible, but the times 2 map is not.

Lemma 25. If the system $(X, \mathcal{B}, \mu, T)$ is invertible, then $U_T$ is unitary on $L^2(X)$ and $U_T^* = U_{T^{-1}}$.

Proof. We apply the first lemma of the lecture to the function $f \cdot \overline{(g \circ T^{-1})}$:
$$\langle f, U_{T^{-1}} g \rangle = \int f \, \overline{(g \circ T^{-1})} \, d\mu = \int \big(f \, \overline{(g \circ T^{-1})}\big) \circ T \, d\mu = \int (f \circ T) \, \overline{g} \, d\mu = \langle U_T f, g \rangle.$$
This proves $U_T^* = U_{T^{-1}}$. In addition, $U_T U_{T^{-1}} = U_{T^{-1}} U_T = \mathrm{Id}_{L^2(X)}$, which together with the first part proves that $U_T$ is unitary. $\square$

The proofs of both ergodic theorems (at least the proofs that we will give) rely on the observation that the convergence can be proved easily for certain special functions. First, the ergodic averages of a $T$-invariant function $f \in I$ are equal to $f$, so they converge to $f$.

If $f = U_T g - g$ for some $g$, then the ergodic averages become telescoping sums and only the boundary terms remain, that is:
$$\frac{1}{N} \sum_{n=0}^{N-1} U_T^n (U_T g - g) = \frac{1}{N} (U_T^N g - g),$$
and the right hand side is easily seen to converge to 0.
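The telescoping identity is easy to verify numerically at a single point. A small illustrative sketch (the rotation, test function and base point are arbitrary choices), using the rotation $Tx = x + \alpha \bmod 1$, for which $(U_T^n h)(x) = h(x + n\alpha \bmod 1)$:

```python
import math

alpha = 0.31                                            # any rotation number works here
g = lambda x: math.sin(2 * math.pi * x) + x * (1 - x)   # arbitrary test function
f = lambda x: g((x + alpha) % 1.0) - g(x)               # the cocycle f = U_T g - g

x0, N = 0.2, 1_000
# Left hand side: the ergodic average (1/N) sum_{n<N} (U_T^n f)(x0).
lhs = sum(f((x0 + n * alpha) % 1.0) for n in range(N)) / N
# Right hand side: the telescoped form (U_T^N g - g)(x0) / N.
rhs = (g((x0 + N * alpha) % 1.0) - g(x0)) / N
# lhs and rhs agree up to floating point error, and both are O(1/N).
```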

The following lemma shows that these two types of functions are enough to look at.

Lemma 26. Write
$$B := \{U_T g - g : g \in L^2(X)\}.$$
Then $I = B^{\perp}$.

It follows that $L^2(X) = I \oplus \overline{B}$, but not every function in $L^2(X)$ is the sum of a $T$-invariant function and a cocycle. (The elements of $B$ are called cocycles.)

Proof. We can write
$$f \in B^{\perp} \iff \langle f, U_T g - g \rangle = 0 \text{ for all } g \in L^2(X)$$
$$\iff \langle f, g \rangle = \langle f, U_T g \rangle = \langle U_T^* f, g \rangle \text{ for all } g \in L^2(X) \iff f = U_T^* f.$$
If the system were invertible, we could finish the proof by applying $U_T = (U_T^*)^{-1}$ to both sides of the last equation.

We prove that $f = U_T^* f \iff f = U_T f$ holds in the general case, too. We can write
$$f = U_T^* f \iff \langle f - U_T^* f, f - U_T^* f \rangle = 0$$
$$\iff \|f\|_2^2 + \|U_T^* f\|_2^2 - \langle f, U_T^* f \rangle - \langle U_T^* f, f \rangle = 0$$
$$\iff \|f\|_2^2 + \|U_T f\|_2^2 - \langle U_T f, f \rangle - \langle f, U_T f \rangle - (\|U_T f\|_2^2 - \|U_T^* f\|_2^2) = 0$$
$$\iff \|f - U_T f\|_2^2 = \|U_T f\|_2^2 - \|U_T^* f\|_2^2,$$
where we used $\langle f, U_T^* f \rangle = \langle U_T f, f \rangle$ and $\langle U_T^* f, f \rangle = \langle f, U_T f \rangle$. Both sides of the last equation are non-negative. This is clear for the left hand side. For the right hand side, we observe that $\|U_T f\|_2 = \|f\|_2$, since $U_T$ is an isometry, and $\|U_T^* f\|_2 \le \|f\|_2$, as $\|U_T^*\| = \|U_T\| = 1$.

It follows that $f = U_T^* f$ implies $\|U_T^* f\|_2 = \|f\|_2 = \|U_T f\|_2$, so the right hand side of the last equation vanishes and $f = U_T f$. Conversely, if $f = U_T f$, then applying $U_T^*$ and using $U_T^* U_T = \mathrm{Id}$ (another consequence of the isometry property) gives $f = U_T^* f$. Hence $f = U_T^* f \iff f = U_T f$, as required. $\square$

Proof of the Mean Ergodic Theorem. Let $f \in L^2(X)$ and fix an $\varepsilon > 0$. By the lemma, we can write $f = P_T f + U_T g - g + e$ for some $e, g \in L^2(X)$ such that $\|e\|_2 < \varepsilon$. Thus
$$\limsup_{N \to \infty} \Big\| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f - P_T f \Big\|_2 \le \limsup_{N \to \infty} \Big( \Big\| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n (U_T g - g) \Big\|_2 + \Big\| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n e \Big\|_2 \Big) < \varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, the theorem follows. $\square$

Theorem 27 (Wiener, Maximal Ergodic Theorem). Let $(X, \mathcal{B}, \mu, T)$ be an MPS, let $f \in L^1(X)$ be a function and let $\lambda > 0$. Write
$$E_\lambda = \Big\{ x \in X : \sup_{N \in \mathbb{Z}_{>0}} \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f(x) > \lambda \Big\}.$$
Then $\mu(E_\lambda) \le \lambda^{-1} \|f\|_1$.

We first prove the following lemma.

Lemma 28. Let $(X, \mathcal{B}, \mu, T)$ be an MPS and let $f \in L^1(X)$ be a function. Write
$$f_0 = 0, \quad f_1 = f, \quad f_2 = f + U_T f, \quad \ldots, \quad f_n = f + U_T f + \ldots + U_T^{n-1} f, \quad \ldots$$
and
$$F_N(x) = \max_{0 \le i \le N-1} \{f_i(x)\}.$$
Then
$$\int_{\{x \in X : F_N(x) > 0\}} f \, d\mu \ge 0.$$

Proof. Fix an $x \in X$ such that $F_N(x) > 0$. Then $F_N(x) = f_j(x)$ for some $1 \le j \le N-1$. Hence
$$F_N(x) = U_T f_{j-1}(x) + f(x) \le U_T F_N(x) + f(x).$$
Thus $F_N(x) > 0$ implies that
$$f(x) \ge F_N(x) - U_T F_N(x).$$
We can write
$$\int_{\{x \in X : F_N(x) > 0\}} f \, d\mu \ge \int_{\{x \in X : F_N(x) > 0\}} (F_N(x) - U_T F_N(x)) \, d\mu \ge \int_X (F_N(x) - U_T F_N(x)) \, d\mu = 0.$$
The second inequality holds because $U_T F_N$ is always non-negative and hence $F_N(x) = 0$ implies that the integrand $F_N(x) - U_T F_N(x)$ is non-positive. The last equality holds by Lemma 20. $\square$

Proof of the Maximal Ergodic Theorem. Put
$$E_{\lambda, M} = \Big\{ x \in X : \max_{1 \le N \le M} \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f(x) > \lambda \Big\} = \Big\{ x \in X : \max_{1 \le N \le M} \sum_{n=0}^{N-1} (U_T^n f(x) - \lambda) > 0 \Big\}.$$
Applying the previous lemma to the function $f - \lambda$, we get
$$0 \le \int_{E_{\lambda, M}} (f - \lambda) \, d\mu = \int_{E_{\lambda, M}} f \, d\mu - \lambda \mu(E_{\lambda, M}) \le \|f\|_1 - \lambda \mu(E_{\lambda, M}).$$
This implies $\mu(E_{\lambda, M}) \le \lambda^{-1} \|f\|_1$. We can conclude the proof by noting $E_\lambda = \bigcup_M E_{\lambda, M}$. (The sets $E_{\lambda, M}$ are increasing with $M$.) $\square$

Proof of the Pointwise Ergodic Theorem. We fix a number $\varepsilon > 0$. We can write $f = f_\varepsilon + e_{1,\varepsilon}$ such that $f_\varepsilon \in L^2$ and $\|e_{1,\varepsilon}\|_1 < \varepsilon$.

Recall that $I = \{f \in L^2(X) : U_T f = f\}$ denotes the space of invariant functions and $B = \{U_T g - g : g \in L^2(X)\}$. As we have seen, $L^2(X) = I \oplus \overline{B}$.

Hence we can write
$$f_\varepsilon = P_T f_\varepsilon + U_T g - g + e_{2,\varepsilon}$$
such that $\|e_{2,\varepsilon}\|_1 \le \|e_{2,\varepsilon}\|_2 < \varepsilon$. (Recall that $P_T$ denotes the orthogonal projection to the space $I$.)

Furthermore, we can write $g = h_\varepsilon + e_{3,\varepsilon}$ such that $h_\varepsilon \in L^\infty(X)$ and $\|e_{3,\varepsilon}\|_1 < \varepsilon$. Putting everything together, we obtain
$$f = P_T f_\varepsilon + U_T h_\varepsilon - h_\varepsilon + e_\varepsilon,$$
where $e_\varepsilon = e_{1,\varepsilon} + e_{2,\varepsilon} + U_T e_{3,\varepsilon} - e_{3,\varepsilon}$, so $\|e_\varepsilon\|_1 < 4\varepsilon$. (Here we used that the $L^2$-norm dominates the $L^1$-norm on a probability space.)

We note that for every $x \in X$, we have
$$(3) \qquad \Big| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f(x) - P_T f_\varepsilon(x) \Big| \le \frac{2}{N} \|h_\varepsilon\|_\infty + \Big| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n e_\varepsilon(x) \Big|.$$

We estimate the measure of the set of points $x$ where the right hand side is large using the Maximal Ergodic Theorem. We consider the set
$$E_{m,\varepsilon} := \Big\{ x \in X : \limsup_{N \to \infty} \Big| \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f(x) - P_T f_\varepsilon(x) \Big| > \frac{1}{m} \Big\}.$$
Using (3) and the Maximal Ergodic Theorem for $e_\varepsilon$ and $-e_\varepsilon$ with $\lambda = 1/m$, we obtain:
$$\mu(E_{m,\varepsilon}) \le 2m \cdot 4\varepsilon = 8m\varepsilon.$$

At this point, we intend to take the limit $\varepsilon \to 0$; however, the fact that $P_T f_\varepsilon$ depends on $\varepsilon$ presents a difficulty. One way to overcome this problem is to first focus on the convergence of the ergodic averages only, without identifying the limit.

We denote by $F$ the set of points $x$ such that the ergodic averages $(1/N) \sum_{n=0}^{N-1} U_T^n f(x)$ do not converge at $x$. We aim to show that $\mu(F) = 0$. Write for all $m \in \mathbb{Z}_{>0}$
$$F_m := \Big\{ x \in X : \limsup_{N_1, N_2 \to \infty} \Big| \frac{1}{N_1} \sum_{n=0}^{N_1 - 1} U_T^n f(x) - \frac{1}{N_2} \sum_{n=0}^{N_2 - 1} U_T^n f(x) \Big| > \frac{2}{m} \Big\}.$$
It is clear that $F \subset \bigcup F_m$, so it is enough to show that $\mu(F_m) = 0$ for all $m$.

We observe that $F_m \subset E_{m,\varepsilon}$ for all $m, \varepsilon$, hence $\mu(F_m) \le 8m\varepsilon$. We take $\varepsilon \to 0$ and conclude that $\mu(F_m) = 0$, as required.

We proved that the ergodic averages converge to a limit function $\overline{f}$ almost everywhere. By Fatou's Lemma, $\overline{f} \in L^1$.

It remains to show that $\overline{f}$ is $T$-invariant. For almost every $x$,
$$\overline{f}(x) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f(x), \qquad \overline{f}(Tx) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} U_T^n f(Tx).$$
Subtracting these two equations, we find
$$\overline{f}(Tx) - \overline{f}(x) = \lim_{N \to \infty} \frac{1}{N} \big(f(T^N x) - f(x)\big) = \lim_{N \to \infty} \frac{1}{2^N} \big(f(T^{2^N} x) - f(x)\big).$$
We observe that the series
$$\sum_{N=1}^{\infty} \frac{1}{2^N} \big(f(T^{2^N} x) - f(x)\big)$$
has summable $L^1$-norms, hence it converges absolutely almost everywhere. This implies that the terms converge to 0 almost everywhere, hence $\overline{f}(Tx) - \overline{f}(x) = 0$, as required. $\square$

Definition 29. Let $x \in [0, 1)$ be a number and $K \ge 2$ an integer. Write $x = 0.a_1 a_2 \ldots_{(K)}$ for the base $K$ expansion of $x$. We say that the number $x$ is normal in base $K$ if for all integers $M \ge 1$ and $b_1, \ldots, b_M \in \{0, \ldots, K-1\}$ we have
$$\frac{|\{1 \le n \le N : a_{n+j} = b_j \text{ for all } 1 \le j \le M\}|}{N} \to \frac{1}{K^M}.$$

Theorem 30. Almost every number $x \in [0, 1)$ is normal in every base $K \ge 2$.

Proof. We apply the pointwise ergodic theorem for the system $(\mathbb{R}/\mathbb{Z}, \mathcal{B}, m, T_K)$, where $m$ is the Lebesgue measure and $T_K$ is the map $x \mapsto Kx \bmod 1$. This system is ergodic; see the first example sheet.

Put $A = \big[\sum_{j=1}^{M} b_j K^{-j}, \sum_{j=1}^{M} b_j K^{-j} + K^{-M}\big)$. Observe that $T_K^n x \in A$ if and only if $a_{n+j} = b_j$ for all $1 \le j \le M$. Hence the pointwise ergodic theorem applied for $f = \chi_A$, for every set $A$ of this form, implies that almost every $x \in [0, 1)$ is normal in base $K$. Here we use that there is a countable collection of sets $A$ that we need to consider, hence the union of the countably many sets of exceptional points is of measure 0.

The union over $K$ of the sets of non-normal numbers in base $K$ is of measure 0, since there are only countably many $K$. $\square$
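The statement can be illustrated by sampling a random base-$K$ expansion and counting digit blocks. This is an illustrative sketch only (the base, block and sample size are arbitrary choices); sampling i.i.d. uniform digits is the same as sampling $x$ from the Lebesgue measure.

```python
import random

def block_frequency(K, block, n_digits=300_000, seed=3):
    """Empirical frequency of a given digit block in a random base-K
    expansion; for almost every x this tends to K^(-len(block))."""
    rng = random.Random(seed)
    digits = [rng.randrange(K) for _ in range(n_digits)]
    M = len(block)
    hits = sum(1 for n in range(n_digits - M + 1)
               if digits[n:n + M] == block)
    return hits / (n_digits - M + 1)

freq = block_frequency(10, [4, 2])
# Theorem 30 predicts a frequency close to 10^(-2) = 0.01 for a typical x.
```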

6. Unique ergodicity

In the previous lecture, we applied the pointwise ergodic theorem to show that a property holds for almost every number. This result gives no insight into whether the property holds for specific numbers, e.g. $e$, $\pi$ or $2^{1/3}$. In this lecture, we study the question: in which situations does the pointwise ergodic theorem hold for all points?

For this question to make sense, we restrict attention to continuous functions (measurable functions are not even defined everywhere).

If there is more than one $T$-invariant measure for a transformation $T$, then we can apply the Pointwise Ergodic Theorem to each of them, which may predict different limits for the ergodic averages. This is not a contradiction, because the set where convergence holds in one case may be a null set with respect to the other measure. However, this suggests that the everywhere convergence of ergodic averages prohibits the presence of multiple invariant measures. The next result confirms this intuition.

Definition 31. Let $T : X \to X$ be a continuous map on a compact metric space. We say that the topological dynamical system $(X, T)$ is uniquely ergodic if there is exactly one $T$-invariant Borel probability measure.

Theorem 32. Let $X$ be a compact metric space and let $T : X \to X$ be a continuous map. The following are equivalent:

(1) The system $(X, T)$ is uniquely ergodic.
(2) For every $f \in C(X)$ there is a number $c_f \in \mathbb{C}$ such that
$$\frac{1}{N} \sum_{n=0}^{N-1} U_T^n f(x) \to c_f$$
uniformly in $x \in X$.
(3) There is a dense subset $A \subset C(X)$ such that for every $f \in A$ there is a number $c_f \in \mathbb{C}$ such that
$$\frac{1}{N} \sum_{n=0}^{N-1} U_T^n f(x) \to c_f$$
for every $x \in X$.

We begin by recalling the following result.

Theorem 33 (Riesz Representation Theorem). Let $X$ be a compact metric space and denote by $C(X)$ the space of continuous functions. To a complex Borel measure $\mu$, we associate a bounded functional $L_\mu$ on $C(X)$ as follows:
$$L_\mu f := \int f \, d\mu.$$
The map $\mu \mapsto L_\mu$ is a bijection (in fact, an isomorphism) between the space of complex Borel measures on $X$ and the space of bounded linear functionals on $C(X)$.

Most of the time we will only need the following corollary, which is much simpler than the theorem itself.

Corollary 34. Let $\mu_1, \mu_2$ be two Borel probability measures on $X$ such that
$$\int f \, d\mu_1 = \int f \, d\mu_2$$
for all $f \in C(X)$. Then $\mu_1 = \mu_2$.

For the rest of this lecture, $X$ denotes a compact metric space. We denote by $\mathcal{M}(X)$ the space of Borel probability measures on $X$. If $T : X \to X$ is a continuous map, then we define the push-forward of a measure $\mu \in \mathcal{M}(X)$ as
$$T_* \mu(A) := \mu(T^{-1}(A))$$
for all Borel sets $A \in \mathcal{B}$. We note that $T$ preserves the measure $\mu$ (or in other words $\mu$ is $T$-invariant) if and only if $T_* \mu = \mu$.

Lemma 35. We have
$$\int f \, dT_* \mu = \int f \circ T \, d\mu$$
for all bounded Borel functions $f$.

Proof sketch. First we consider a characteristic function $f = \chi_A$ for a Borel set $A \in \mathcal{B}$. Then
$$\int \chi_A \, dT_* \mu = T_* \mu(A) = \mu(T^{-1}(A))$$
and
$$\int \chi_A \circ T \, d\mu = \int \chi_{T^{-1}(A)} \, d\mu = \mu(T^{-1}(A)),$$
and we see that the claim holds. The claim can be verified for simple functions and then for general Borel functions in the same way as we did before. $\square$

The following is a useful characterization of the measure preserving property of continuous maps.

Lemma 36. Let $T : X \to X$ be a continuous transformation. A measure $\mu \in \mathcal{M}(X)$ is $T$-invariant if and only if
$$(4) \qquad \int f \, d\mu = \int f \circ T \, d\mu$$
for all $f \in C(X)$.

Proof. We have already seen that (4) holds even for all functions in $L^1$ if the measure is $T$-invariant.

For the other direction, we note that by the previous lemma
$$\int f \, dT_* \mu = \int f \circ T \, d\mu = \int f \, d\mu$$
for all $f \in C(X)$, hence $T_* \mu = \mu$ by the corollary of the Riesz representation theorem. Thus $\mu$ is $T$-invariant, indeed. $\square$

The next theorem provides a way to construct invariant measures; in particular it shows that there is always at least one if $T$ is a continuous map on a compact metric space.

Theorem 37. Let $T : X \to X$ be a continuous map on a compact metric space. Let $\nu_j \in \mathcal{M}(X)$ be a sequence of probability measures on $X$ and let $\{N_j\} \subset \mathbb{Z}_{\ge 0}$ be a sequence such that $N_j \to \infty$. Then any weak limit point of the sequence of measures
$$\frac{1}{N_j} \sum_{n=0}^{N_j - 1} T_*^n \nu_j$$
is invariant for $T$.

Proof. Assume that
$$\mu = \operatorname{lim-w}_{j \to \infty} \frac{1}{N_j} \sum_{n=0}^{N_j - 1} T_*^n \nu_j.$$
(Otherwise we pass to a subsequence.) For any $f \in C(X)$ we have
$$\int f \circ T \, d\mu - \int f \, d\mu = \lim_{j \to \infty} \Big[ \frac{1}{N_j} \sum_{n=0}^{N_j - 1} \int f \circ T \, dT_*^n \nu_j - \frac{1}{N_j} \sum_{n=0}^{N_j - 1} \int f \, dT_*^n \nu_j \Big]$$
$$= \lim_{j \to \infty} \Big[ \frac{1}{N_j} \sum_{n=0}^{N_j - 1} \int f \circ T^{n+1} \, d\nu_j - \frac{1}{N_j} \sum_{n=0}^{N_j - 1} \int f \circ T^n \, d\nu_j \Big]$$
$$= \lim_{j \to \infty} \frac{1}{N_j} \Big[ \int f \circ T^{N_j} \, d\nu_j - \int f \, d\nu_j \Big] = 0.$$
By Lemma 36, $\mu$ is $T$-invariant. $\square$

Proof of Theorem 32. (1) $\Rightarrow$ (2): Let $f \in C(X)$ and denote by $\mu$ the unique invariant measure. Suppose to the contrary that (2) fails for $c_f = \int f \, d\mu$. Then there are an $\varepsilon > 0$, a sequence of points $\{x_j\} \subset X$ and a sequence of positive integers $\{N_j\} \subset \mathbb{Z}_{\ge 0}$ with $N_j \to \infty$ such that
$$\Big| \frac{1}{N_j} \sum_{n=0}^{N_j - 1} U_T^n f(x_j) - \int f \, d\mu \Big| \ge \varepsilon.$$
We may assume that
$$\lim_{j \to \infty} \frac{1}{N_j} \sum_{n=0}^{N_j - 1} U_T^n f(x_j) = a$$
for some $a \in \mathbb{C}$; otherwise we pass to a subsequence. Necessarily $a \ne \int f \, d\mu$; indeed $\big|a - \int f \, d\mu\big| \ge \varepsilon$.


Consider now the sequence of measures
$$\frac{1}{N_j} \sum_{n=0}^{N_j - 1} T_*^n \delta_{x_j},$$
and let $\nu$ be a weak limit point. We know that it is $T$-invariant, but $\int f \, d\nu = a \ne \int f \, d\mu$. Hence $\nu \ne \mu$, which is a contradiction.

(3) $\Rightarrow$ (1): Let $\mu$ and $\nu$ be two $T$-invariant probability measures. By dominated convergence,
$$\lim_{N \to \infty} \int \frac{1}{N} \sum_{n=0}^{N-1} f \circ T^n \, d\mu = c_f$$
for all $f \in A$. Since
$$\int \frac{1}{N} \sum_{n=0}^{N-1} f \circ T^n \, d\mu = \int f \, d\mu$$
for all $N$, this implies $\int f \, d\mu = c_f$. Similarly, $\int f \, d\nu = c_f$ for all $f \in A$. Then $\int f \, d\mu = \int f \, d\nu$ holds for all $f \in A$. It also holds for all $f \in C(X)$, because $A$ is dense in $C(X)$. Hence $\mu = \nu$, as required. $\square$

These results motivate studying the structure of measures invariant for a given continuous map. In many cases, the invariant measures are hopeless to classify, but sometimes they have nice structure. The following is a very important question raised by Furstenberg.

Open Problem 38. What are the Borel probability measures on $\mathbb{R}/\mathbb{Z}$ that are invariant under both $T_2 : x \mapsto 2x$ and $T_3 : x \mapsto 3x$? The Lebesgue measure is one example, and it is also easy to give examples that are supported on finitely many rational points. All known examples are convex combinations of these.

Later in the course we will discuss Rudolph's theorem, which claims that a measure that is invariant and ergodic under the semigroup generated by $T_2$ and $T_3$ and that has positive entropy with respect to $T_2$ is necessarily the Lebesgue measure.

Example 39. If $\alpha$ is irrational, then the circle rotation $(\mathbb{R}/\mathbb{Z}, \mathcal{B}, m, R_\alpha)$ is uniquely ergodic. Indeed, let $\mu$ be an invariant measure. Then for all $n \in \mathbb{Z}$
$$\int \exp(2\pi i n x) \, d\mu = \int \exp(2\pi i n (x + \alpha)) \, d\mu = \exp(2\pi i n \alpha) \int \exp(2\pi i n x) \, d\mu.$$
Since $\alpha$ is irrational, we have $\exp(2\pi i n \alpha) \ne 1$ for all $n \ne 0$. Thus
$$\widehat{\mu}(n) = \int \exp(2\pi i n x) \, d\mu = 0$$
for all $n \ne 0$. This implies that $\mu$ is the Lebesgue measure. (If you have not seen this before, look up the Stone-Weierstrass theorem and show that the set of trigonometric polynomials (i.e. finite linear combinations of the functions $\exp(2\pi i n x)$) forms a dense subset of $C(\mathbb{R}/\mathbb{Z})$.)

Definition 40. A sequence $\{x_n\} \subset [0, 1]$ is equidistributed if
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} f(x_n) = \int f \, dm$$
for all $f \in C([0, 1])$, where $m$ is the Lebesgue measure.

Remark 41. If $\{x_n\} \subset [0, 1]$ is equidistributed, then
$$\frac{|\{1 \le n \le N : a \le x_n < b\}|}{N} \to b - a$$
for every pair of numbers $0 \le a < b \le 1$.

The following is an immediate corollary of the unique ergodicity of circle rotations.

Theorem 42. The sequence $\{n\alpha \bmod 1\}$ is equidistributed if $\alpha$ is irrational.
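Theorem 42 is easy to check numerically, in the spirit of Remark 41. A small illustrative sketch (the choices of $\alpha$, the interval and $N$ are arbitrary):

```python
import math

def weyl_frequency(alpha, a, b, N=100_000):
    """Fraction of 1 <= n <= N with n*alpha mod 1 falling in [a, b)."""
    count = sum(1 for n in range(1, N + 1) if a <= (n * alpha) % 1.0 < b)
    return count / N

freq = weyl_frequency(math.sqrt(2), 0.25, 0.6)
# Theorem 42 (via Remark 41) predicts freq close to b - a = 0.35.
```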

7. Equidistribution of polynomials

The goal of this section is to generalize the equidistribution result

for {n mod 1} to polynomial sequences. We begin with a theorem of

Furstenberg, which we will use to construct uniquely ergodic systems

that are suited to that application.

Theorem 43 (Furstenberg). Let X be a compact metric space and let T : X → X be a continuous transformation. Suppose (X, T) is uniquely ergodic and denote by μ the invariant measure. Let c : X → R/Z be a continuous function. Define the map S on X × R/Z by

S(x, y) = (T x, c(x) + y mod 1).

Then μ × m is S-invariant. If the measure μ × m is ergodic, then the system (X × R/Z, S) is uniquely ergodic.

Remark 44. This result holds with an arbitrary compact metrizable topological group in place of R/Z, which need not be Abelian. The name of this construction is skew-product.

One of the key notions used in the proof of Furstenberg's theorem is genericity, which we define now.

TOPICS IN ERGODIC THEORY, MICHAELMAS 2016 21

Definition 45. Let X be a compact metric space, let T : X → X be a continuous transformation and let μ be a T-invariant Borel probability measure. A point x ∈ X is called generic with respect to μ if

w*-lim_{N→∞} (1/N) Σ_{n=0}^{N−1} δ_{T^n x} = μ.

Lemma 46. Let (X, B, μ, T) be an ergodic MPS. Then μ-almost every x ∈ X is generic with respect to μ.

Proof. Let F ⊂ C(X) be a dense countable set. By the pointwise ergodic theorem, for every f ∈ F there is X_f ∈ B with μ(X_f) = 1 such that

lim_{N→∞} (1/N) Σ_{n=0}^{N−1} f(T^n x) = ∫ f dμ for all x ∈ X_f.

Fix x ∈ ∩_{f∈F} X_f. Let g ∈ C(X). Fix ε > 0 and choose f ∈ F such that ‖g − f‖_∞ < ε. Then

|(1/N) Σ_{n=0}^{N−1} f(T^n x) − (1/N) Σ_{n=0}^{N−1} g(T^n x)| < ε,

|∫ f dμ − ∫ g dμ| < ε.

Thus

lim sup_{N→∞} |(1/N) Σ_{n=0}^{N−1} g(T^n x) − ∫ g dμ| < 2ε.

We take ε → 0 and conclude that x is generic with respect to μ, since g was arbitrary.

Proof of Furstenberg's theorem on skew products. To show that S preserves μ × m, we choose a test function f ∈ C(X × R/Z) and write

∫_{R/Z} ∫_X f(S(x, y)) dμ(x) dy = ∫_X ∫_{c(x)}^{1+c(x)} f(T x, y) dy dμ(x)

= ∫_X ∫_0^1 f(T x, y) dy dμ(x)

= ∫_{R/Z} ∫_X f(x, y) dμ(x) dy,

using in the last step that μ is T-invariant.

Suppose now that μ × m is ergodic, and denote by E ⊂ X × R/Z the set of generic points with respect to μ × m. Our strategy is to show that E is very large.

We first observe μ × m(E) = 1. Then we prove the following.

Claim. If (x, y) ∈ E, then (x, y + t) ∈ E, also for every t ∈ R/Z.


To prove the claim, we consider the map U_t : X × R/Z → X × R/Z defined as U_t(x, y) = (x, y + t). The proof relies on the observation that U_t ∘ S = S ∘ U_t. Indeed,

U_t ∘ S(x, y) = U_t(T x, c(x) + y) = (T x, c(x) + y + t) = S(x, y + t) = S ∘ U_t(x, y).

To show that (x, y + t) is generic, we fix a test function f ∈ C(X × R/Z) and write

(1/N) Σ_{n=0}^{N−1} f(S^n(x, y + t)) = (1/N) Σ_{n=0}^{N−1} f(S^n ∘ U_t(x, y))

= (1/N) Σ_{n=0}^{N−1} f ∘ U_t(S^n(x, y)) → ∫∫ f ∘ U_t dμ(x) dy

= ∫∫ f(x, y) dμ(x) dy.

Here we used first that (x, y) is generic, applied to the test function f ∘ U_t, and then that U_t preserves μ × m, which is easy to check.

By the above claim, there is a set A ∈ B(X) such that E = A × R/Z. We also have μ(A) = μ × m(E) = 1.

Let ν be an arbitrary S-invariant Borel probability measure. Denote by P : X × R/Z → X the coordinate projection. Then Pν is T-invariant, hence it is equal to μ. Indeed,

Pν(T^{−1}B) = ν((T^{−1}B) × R/Z) = ν(S^{−1}(B × R/Z)) = ν(B × R/Z) = Pν(B)

for any B ∈ B(X).

Then Pν(A) = μ(A) = 1, hence ν(E) = ν(A × R/Z) = 1. Fix f ∈ C(X × R/Z). For all (x, y) ∈ E, hence for ν-almost every (x, y), we have

(1/N) Σ_{n=0}^{N−1} f(S^n(x, y)) → ∫ f dμ × m.

By dominated convergence, we have

∫ (1/N) Σ_{n=0}^{N−1} f(S^n(x, y)) dν → ∫ f dμ × m.

The left hand side is equal to ∫ f dν for all N, hence

∫ f dν = ∫ f dμ × m.

Since f was arbitrary, ν = μ × m, and the system is indeed uniquely ergodic.


Corollary 47. Let α be irrational and define the map S on X = (R/Z)^d by

S(x_1, . . . , x_d) = (x_1 + α, x_2 + x_1, . . . , x_d + x_{d−1}).

Then the system (X, S) is uniquely ergodic.

Proof. We argue by induction. If d = 1, then the system is an irrational circle rotation, hence it is uniquely ergodic.

We assume that d > 1 and that the claim holds for d − 1. We can apply Furstenberg's theorem above, and we only need to show that the system is ergodic.

To that end, we fix an invariant bounded measurable function f on X and aim to show it is constant. We expand f in a Fourier series and write:

f(x) = Σ_{n∈Z^d} a_n exp(2πi n · x),

f(Sx) = Σ_{n∈Z^d} a_n exp(2πi n · Sx)

= Σ_{n∈Z^d} a_n exp(2πi(n_1(x_1 + α) + n_2(x_2 + x_1) + . . . + n_d(x_d + x_{d−1})))

= Σ_{n∈Z^d} a_n exp(2πi n_1 α) exp(2πi((n_1 + n_2)x_1 + . . . + (n_{d−1} + n_d)x_{d−1} + n_d x_d)).

Writing

Ŝ(n) = (n_1 + n_2, . . . , n_{d−1} + n_d, n_d)

for n ∈ Z^d, we obtain

a_n exp(2πi n_1 α) = a_{Ŝ(n)}.

This means that |a_n| = |a_{Ŝ(n)}|, in particular.

Let m ∈ Z^d be such that a_m ≠ 0. By Parseval's formula,

Σ_{n∈Z^d} |a_n|² = ‖f‖₂² < ∞.

Hence the number of n ∈ Z^d such that |a_n| = |a_m| must be finite. This means that the sequence Ŝ^k(m) must be periodic. Since (Ŝ^k m)_{d−1} = m_{d−1} + k m_d, we must have m_d = 0. By a similar argument, we can prove m_2 = . . . = m_d = 0.

Finally, we consider the case m_2 = . . . = m_d = 0 and a_m ≠ 0. Then Ŝ(m) = m, hence a_m exp(2πi m_1 α) = a_m, which implies exp(2πi m_1 α) = 1. Since α is irrational, this implies m_1 = 0.

We showed that a_n = 0 unless n = (0, . . . , 0), which proves that f is constant, which is precisely what we wanted to show.


Theorem 48 (Weyl). Let p(t) = a_d t^d + . . . + a_1 t + a_0 be a polynomial such that at least one of the coefficients a_1, . . . , a_d is irrational. Then the sequence (p(n) mod 1) is equidistributed.

Proof. We first consider the special case when a_d is irrational. We consider the system (X, S) defined in the corollary given above. It is easily proved by induction that

S^n(x_1, x_2, . . . , x_d) = (x_1 + C(n,1)α, x_2 + C(n,1)x_1 + C(n,2)α, . . . , x_d + C(n,1)x_{d−1} + . . . + C(n,d−1)x_1 + C(n,d)α),

where C(n,i) denotes the binomial coefficient. Write

q_i(t) = t(t − 1) · · · (t − i + 1) / i!,

so that q_i(n) = C(n,i). Since q_i(t) for i = 0, . . . , d form a basis in the vector space of polynomials of degree at most d, there are some real numbers α, x_1, x_2, . . . , x_d such that

p(t) = q_0(t)x_d + q_1(t)x_{d−1} + . . . + q_{d−1}(t)x_1 + q_d(t)α.

Moreover, α = a_d d! is irrational. Hence there are α, x_1, . . . , x_d ∈ R/Z such that

(5) p(n) = x_d + C(n,1)x_{d−1} + . . . + C(n,d−1)x_1 + C(n,d)α mod 1

for each n ∈ Z_{>0}, and α is irrational.

Fix f ∈ C(R/Z) and define g ∈ C((R/Z)^d) by g(x) = f(x_d). Then

(6) (1/N) Σ_{n=0}^{N−1} f(p(n)) = (1/N) Σ_{n=0}^{N−1} g(S^n x),

where α and x = (x_1, . . . , x_d) are chosen in such a way that (5) holds. By unique ergodicity of the system (X, S), we conclude that (6) converges to

∫ g dm^d = ∫ f dm,

as required.

Finally, we consider the general case and reduce it to the special case considered above. Let k be maximal such that a_k is irrational. Let q ∈ Z_{>0} be such that q a_j ∈ Z for all j = d, . . . , k + 1. Then

p(qn + b) ≡ a_d b^d + . . . + a_{k+1} b^{k+1} + a_k(qn + b)^k + . . . + a_1(qn + b) + a_0 mod 1,

where the right hand side is a polynomial in n whose leading coefficient a_k q^k is irrational. By the special case, we know then that (p(qn + b) mod 1) is equidistributed for each b = 0, . . . , q − 1. Then the original sequence is also equidistributed.
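Weyl's theorem can also be checked numerically. The sketch below (the helper `poly_orbit_fraction` is an illustration written for this note, not part of the course) samples p(n) = √2·n² + n/3 mod 1, whose leading coefficient is irrational, so the sequence should equidistribute.

```python
import math

def poly_orbit_fraction(coeffs, a, b, N):
    """Fraction of p(1),...,p(N) mod 1 lying in [a, b), where
    coeffs = (a_0, a_1, ..., a_d) are the polynomial coefficients."""
    def p(n):
        return sum(c * n ** i for i, c in enumerate(coeffs))
    hits = sum(1 for n in range(1, N + 1) if a <= p(n) % 1.0 < b)
    return hits / N

# p(n) = sqrt(2) n^2 + n/3: irrational leading coefficient.
frac = poly_orbit_fraction((0.0, 1 / 3, math.sqrt(2)), 0.25, 0.75, 50_000)
```

The observed frequency should be close to the length 0.5 of the target interval.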


8. Mixing properties

Definition 49. A MPS (X, B, μ, T) is said to be mixing if for every measurable sets A, B ∈ B

lim_{n→∞} μ(T^{−n}(A) ∩ B) = μ(A)μ(B).

We note that μ(T^{−n}(A) ∩ B) = μ({x ∈ X : x ∈ B, T^n x ∈ A}), hence mixing can be interpreted as the property that the state of the system at time n becomes independent of the state at time 0 as n grows. This notion can be generalized to multiple sets and times as follows.

Definition 50. A MPS (X, B, μ, T) is said to be mixing on k sets if the following holds. Let A_0, . . . , A_{k−1} ∈ B and let ε > 0. Then there is a number N ∈ Z_{>0} such that for all n_1, n_2, . . . , n_{k−1} ∈ Z_{>0} that satisfy

n_1 ≥ N, n_2 − n_1 ≥ N, . . . , n_{k−1} − n_{k−2} ≥ N

we have

|μ(A_0 ∩ T^{−n_1}A_1 ∩ . . . ∩ T^{−n_{k−1}}A_{k−1}) − μ(A_0) · · · μ(A_{k−1})| ≤ ε.

It is clear that a MPS is mixing if and only if it is mixing on 2 sets.

The following is a long standing open problem in ergodic theory.

Open Problem 51. Is there a MPS that is mixing on 2 sets but not

mixing on 3 sets?

The fact that this basic problem is still open underlines the fact that

mixing is not an easy notion to work with. We will introduce now

the notion of weak mixing, which may look less natural at first sight

but turns out to be more useful for the theory. This notion replaces

the limit in the definition of mixing with another kind of convergence,

which we define now.

Definition 52. We say that a set S ⊂ Z_{>0} has full density if

lim_{N→∞} (1/N) |S ∩ [1, N]| = 1.

We say that a sequence of complex numbers {a_n} ⊂ C converges in density to a complex number a ∈ C if the set

{n ∈ Z_{>0} : |a − a_n| < ε}

has full density for all ε > 0. We denote this by D-lim a_n = a.

We say that a sequence of complex numbers {a_n} ⊂ C Cesàro converges to a complex number a ∈ C if

lim_{N→∞} (1/N) Σ_{n=1}^{N} a_n = a.

We denote this by C-lim a_n = a.
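A minimal sketch (written for this note, not part of the course) showing both modes of convergence on the 0–1 sequence supported on the perfect squares: the squares have zero density, so the sequence converges to 0 both in density and in the Cesàro sense.

```python
import math

def a(n):
    """a_n = 1 if n is a perfect square, 0 otherwise."""
    r = math.isqrt(n)
    return 1 if r * r == n else 0

N = 200_000
cesaro = sum(a(n) for n in range(1, N + 1)) / N        # Cesaro average
bad = sum(1 for n in range(1, N + 1) if a(n) >= 0.5)   # |a_n - 0| >= 1/2
```

The set where |a_n − 0| ≥ 1/2 is exactly the set of squares, whose proportion in [1, N] is about 1/√N, so both quantities above are small.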


Definition 53. A MPS (X, B, μ, T) is said to be weak mixing if

D-lim_{n→∞} μ(T^{−n}(A) ∩ B) = μ(A)μ(B)

for all A, B ∈ B.

Lemma 54. Let {a_n} ⊂ R be a bounded sequence and a ∈ R. The following are equivalent.

(1) D-lim a_n = a,
(2) C-lim |a_n − a| = 0,
(3) C-lim (a − a_n)² = 0,
(4) C-lim a_n = a and C-lim a_n² = a².

Proof. (1) ⇒ (2): Fix ε > 0 and denote by M the supremum of |a_n|. Let N be sufficiently large so that

(1/N) |{n ∈ {1, . . . , N} : |a − a_n| < ε}| > 1 − ε.

Then

Σ_{n=1}^{N} |a − a_n| ≤ 2MεN + εN = (2M + 1)εN.

We finish the proof by taking ε → 0.

(2) ⇒ (1): Fix ε > 0. Then

ε |{n ∈ {1, . . . , N} : |a − a_n| > ε}| ≤ Σ_{n=1}^{N} |a − a_n|.

Thus

lim_{N→∞} (1/N) |{n ∈ {1, . . . , N} : |a − a_n| > ε}| = 0.

This proves the claim.

(1) ⇒ (3) is very similar to (1) ⇒ (2).

(1), (2) ⇒ (4):

lim_{N→∞} |(1/N) Σ_{n=1}^{N} a_n − a| = lim_{N→∞} |(1/N) Σ_{n=1}^{N} (a_n − a)| ≤ lim_{N→∞} (1/N) Σ_{n=1}^{N} |a_n − a| = 0,

showing the first part of the claim. The second part can be proved similarly by noting that D-lim a_n = a implies D-lim a_n² = a², which can be seen directly from the definition.

(4) ⇒ (3):

C-lim (a − a_n)² = lim_{N→∞} (1/N) Σ_{n=1}^{N} (a² − 2a a_n + a_n²) = a² − 2a C-lim a_n + C-lim a_n² = a² − 2a² + a² = 0.
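A small numerical illustration (written for this note) of why item (4) needs both Cesàro limits: for a_n = (−1)^n we have C-lim a_n = 0, yet C-lim a_n² = 1 ≠ 0², and indeed D-lim a_n = 0 fails since |a_n − 0| = 1 for every n.

```python
# The alternating sequence: Cesaro average of a_n vanishes, but the
# Cesaro average of a_n^2 does not, so a_n does NOT converge in density.
N = 100_000
seq = [(-1) ** n for n in range(1, N + 1)]
c_avg = sum(seq) / N                    # C-lim a_n  -> 0
c_avg_sq = sum(x * x for x in seq) / N  # C-lim a_n^2 -> 1
```
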

Theorem 55. Let (X, B, μ, T) be an MPS. Then the following are equivalent.

(1) (X, B, μ, T) is weak mixing,
(2) (X × Y, B ⊗ C, μ × ν, T × S) is ergodic for all ergodic MPS (Y, C, ν, S),
(3) (X × X, B ⊗ B, μ × μ, T × T) is ergodic,
(4) (X × X, B ⊗ B, μ × μ, T × T) is weak mixing,
(5) the operator U_T has no non-constant eigenfunctions.

The following lemma will be used in the proof of the theorem.

Lemma 56. Let S ⊂ B be a semi-algebra (or π-system) generating B.

The system (X, B, μ, T) is weak mixing if and only if

D-lim μ(T^{−n}(A) ∩ B) = μ(A)μ(B)

for all A, B ∈ S.

The system (X, B, μ, T) is ergodic if and only if

C-lim μ(T^{−n}(A) ∩ B) = μ(A)μ(B)

for all A, B ∈ S.

Proof of Theorem 55. (1) ⇒ (2): We consider the semi-algebra of measurable rectangles

S := {B × C : B ∈ B, C ∈ C},

which generates B ⊗ C, and show that the Cesàro condition characterizing ergodicity in the previous lemma holds. Let B_1 × C_1, B_2 × C_2 ∈ S. Then

|(1/N) Σ_{n=0}^{N−1} μ×ν((T×S)^{−n}(B_1 × C_1) ∩ (B_2 × C_2)) − μ×ν(B_1 × C_1) · μ×ν(B_2 × C_2)|

= |(1/N) Σ_{n=0}^{N−1} [μ(T^{−n}(B_1) ∩ B_2) ν(S^{−n}(C_1) ∩ C_2) − μ(B_1)ν(C_1)μ(B_2)ν(C_2)]|

= |(1/N) Σ_{n=0}^{N−1} [μ(T^{−n}(B_1) ∩ B_2) ν(S^{−n}(C_1) ∩ C_2) − μ(B_1)μ(B_2) ν(S^{−n}(C_1) ∩ C_2)]

+ (1/N) Σ_{n=0}^{N−1} [μ(B_1)μ(B_2) ν(S^{−n}(C_1) ∩ C_2) − μ(B_1)ν(C_1)μ(B_2)ν(C_2)]|

≤ (1/N) Σ_{n=0}^{N−1} |μ(T^{−n}(B_1) ∩ B_2) − μ(B_1)μ(B_2)|

+ μ(B_1)μ(B_2) |(1/N) Σ_{n=0}^{N−1} [ν(S^{−n}(C_1) ∩ C_2) − ν(C_1)ν(C_2)]|.

In the last step we used the triangle inequality in the first sum and then estimated ν(S^{−n}(C_1) ∩ C_2) above by 1 in each term. Thus

lim sup_{N→∞} |(1/N) Σ_{n=0}^{N−1} μ×ν((T×S)^{−n}(B_1 × C_1) ∩ (B_2 × C_2)) − μ×ν(B_1 × C_1) · μ×ν(B_2 × C_2)|

≤ lim_{N→∞} (1/N) Σ_{n=0}^{N−1} |μ(T^{−n}(B_1) ∩ B_2) − μ(B_1)μ(B_2)|

+ lim_{N→∞} |(1/N) Σ_{n=0}^{N−1} [ν(S^{−n}(C_1) ∩ C_2) − ν(C_1)ν(C_2)]| = 0.

The first sum converges to 0 since (X, B, μ, T) is weak mixing (using Lemma 54), and the second sum converges to 0 since (Y, C, ν, S) is ergodic. This proves that the product system is indeed ergodic.

(2) ⇒ (3): First, we apply (2) with Y being the one point space to see that (X, B, μ, T) is ergodic. Then (3) follows as a special case of (2).

(3) ⇒ (1): Let B, C ∈ B. We use the characterization of ergodicity in the lemma for the sets B × X, C × X ⊂ X × X:

C-lim μ×μ((T×T)^{−n}(B × X) ∩ (C × X)) = μ×μ(B × X) · μ×μ(C × X).

Unwinding the definitions, we write

C-lim μ(T^{−n}(B) ∩ C) μ(T^{−n}(X) ∩ X) = μ(B)μ(X)μ(C)μ(X),

and hence

C-lim μ(T^{−n}(B) ∩ C) = μ(B)μ(C).

We do the same with the sets B × B, C × C ⊂ X × X:

C-lim μ×μ((T×T)^{−n}(B × B) ∩ (C × C)) = μ×μ(B × B) · μ×μ(C × C).

Thus

C-lim μ(T^{−n}(B) ∩ C)² = μ(B)²μ(C)².

This together with the previous paragraph and Lemma 54 implies

D-lim μ(T^{−n}(B) ∩ C) = μ(B)μ(C),

which was to be proved.

(1) ⇒ (4): This is very similar to the above arguments.

(3) ⇒ (5): Let f be a non-constant eigenfunction of the operator U_T. We show that the function f̃(x_1, x_2) = f(x_1) \overline{f(x_2)} : X × X → C is T × T invariant. By ergodicity of (X × X, B ⊗ B, μ × μ, T × T), f̃ is constant almost everywhere, hence f must be constant almost everywhere, also, a contradiction.

To show the claim, we write

f̃(T x_1, T x_2) = f(T x_1) \overline{f(T x_2)} = λ f(x_1) \overline{λ} \overline{f(x_2)} = |λ|² f(x_1) \overline{f(x_2)} = f̃(x_1, x_2),

where λ is the eigenvalue of U_T corresponding to f. Since U_T is an isometry, |λ| = 1 necessarily. This completes the proof.

The proof of the implication that (5) implies weak mixing requires

some knowledge of operator theory and it is not examinable. Two

possible approaches are outlined in the second example sheet.

Our next goal is to prove the following result.

Theorem 57. Let (X, B, μ, T) be a weak mixing MPS. Then for any k ∈ Z_{>0} and f_1, . . . , f_k ∈ L^∞(X), we have

(7) (1/N) Σ_{n=0}^{N−1} U_T^n f_1 · U_T^{2n} f_2 · · · U_T^{kn} f_k → ∫ f_1 dμ ∫ f_2 dμ · · · ∫ f_k dμ in L².

Corollary 58. In the setting of the theorem, let f_0 ∈ L¹(X) be an arbitrary function. Then

(8) lim_{N→∞} (1/N) Σ_{n=0}^{N−1} ∫ f_0 · U_T^n f_1 · U_T^{2n} f_2 · · · U_T^{kn} f_k dμ = ∫ f_0 dμ · · · ∫ f_k dμ.

In particular, taking f_0 = f_1 = . . . = f_k = 1_A for a set A ∈ B we obtain

lim_{N→∞} (1/N) Σ_{n=0}^{N−1} μ(A ∩ T^{−n}A ∩ . . . ∩ T^{−kn}A) = μ(A)^{k+1}.

This establishes Furstenberg's multiple recurrence theorem in the special case of weak mixing systems.
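The last display can be illustrated with a system we already know to be mixing (hence weak mixing): the doubling map T x = 2x mod 1 with Lebesgue measure, A = [0, 1/2), k = 2. The sketch below (written for this note) exploits that T^n x ∈ A exactly when the (n+1)-st binary digit of x is 0, so we can sample exact binary digits rather than iterate floats; the Cesàro average at fixed N equals 1/8 + 3/(8N), which tends to μ(A)³ = 1/8.

```python
import random

random.seed(0)

N = 25           # number of Cesaro terms
SAMPLES = 100_000
total = 0.0
for _ in range(SAMPLES):
    # exact binary digits of a Lebesgue-random point x
    digits = [random.randint(0, 1) for _ in range(2 * N + 1)]
    # indicator of A \cap T^{-n}A \cap T^{-2n}A, averaged over n = 0..N-1
    hits = sum(1 for n in range(N)
               if digits[0] == 0 and digits[n] == 0 and digits[2 * n] == 0)
    total += hits / N
avg = total / SAMPLES
```

For n ≥ 1 the three digit positions 0, n, 2n are distinct and independent, giving probability 1/8; the n = 0 term contributes μ(A) = 1/2.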

Observe that (8) follows immediately from (7). Indeed, (7) claims that a certain sequence of functions converges in norm, while (8) claims that the inner products with any fixed function f_0 converge.

The following lemma is very useful in showing that the averages of a sequence of vectors converge to 0 in a Hilbert space. In the proof of Theorem 57 we will use it in the special case when ∫ f_k dμ = 0.

Lemma 59 (van der Corput). Let u_1, u_2, . . . be a bounded sequence of vectors in a Hilbert space. Write

s_h = lim sup_{N→∞} |(1/N) Σ_{n=1}^{N} ⟨u_n, u_{n+h}⟩|.

Suppose that

lim_{H→∞} (1/H) Σ_{h=0}^{H−1} s_h = 0.

Then

lim_{N→∞} (1/N) Σ_{n=1}^{N} u_n = 0.

Note that s_h ≤ sup_n ‖u_n‖² for all h. The following would be a natural approach to prove the lemma. We can write

‖(1/N) Σ_{n=1}^{N} u_n‖² = (1/N²) Σ_{n_1=1}^{N} Σ_{n_2=1}^{N} ⟨u_{n_1}, u_{n_2}⟩

(9) = (1/N²) [ Σ_{n=1}^{N} ‖u_n‖² + 2 Σ_{h=1}^{N−1} Σ_{n=1}^{N−h} Re⟨u_n, u_{n+h}⟩ ].

Unfortunately, the expression Σ_{n=1}^{N−h} Re⟨u_n, u_{n+h}⟩ can be estimated by s_h only for h fixed and N → ∞. Observe how we overcome this problem in the following proof.

Proof. We fix ε > 0 and let H be sufficiently large so that

(1/H) Σ_{h=0}^{H−1} s_h < ε.


If N is sufficiently large, then

‖(1/N) Σ_{n=1}^{N} u_n − (1/(NH)) Σ_{n=1}^{N} Σ_{h=1}^{H} u_{n+h}‖ ≤ (1/N) Σ_{n=1}^{H} ‖u_n‖ + (1/N) Σ_{n=N+1}^{N+H} ‖u_n‖ < ε.

Indeed, each term u_n with H < n ≤ N contributes the same to both averages.

The above inequality allows us to turn our attention to the double average. We first use the triangle inequality:

‖(1/(NH)) Σ_{n=1}^{N} Σ_{h=1}^{H} u_{n+h}‖ ≤ (1/N) Σ_{n=1}^{N} ‖(1/H) Σ_{h=1}^{H} u_{n+h}‖.

By the Cauchy–Schwarz inequality, the right hand side is bounded by the square root of an average of norm squares that can be expanded similarly to (9). Note that this allows us to avoid inner products of vectors whose indices differ by more than H.

((1/N) Σ_{n=1}^{N} ‖(1/H) Σ_{h=1}^{H} u_{n+h}‖)² ≤ (1/N) Σ_{n=1}^{N} ‖(1/H) Σ_{h=1}^{H} u_{n+h}‖²

= (1/N) Σ_{n=1}^{N} (1/H²) Σ_{h_1=1}^{H} Σ_{h_2=1}^{H} ⟨u_{n+h_1}, u_{n+h_2}⟩

= (1/H²) Σ_{h_1=1}^{H} Σ_{h_2=1}^{H} (1/N) Σ_{n=1}^{N} ⟨u_{n+h_1}, u_{n+h_2}⟩.

Hence

lim sup_{N→∞} ‖(1/(NH)) Σ_{n=1}^{N} Σ_{h=1}^{H} u_{n+h}‖² ≤ (1/H²) Σ_{h_1=1}^{H} Σ_{h_2=1}^{H} s_{|h_1−h_2|} ≤ (1/H²) Σ_{h=0}^{H−1} 2H s_h < 2ε.

Combining the above estimates, we obtain

lim sup_{N→∞} ‖(1/N) Σ_{n=1}^{N} u_n‖ ≤ ε + √(2ε).

Letting ε → 0, we can complete the proof.
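A scalar illustration of the lemma's conclusion (written for this note): for u_n = exp(2πi n²α) with α irrational, ⟨u_n, u_{n+h}⟩ = exp(−2πi(2nh + h²)α) averages to 0 in n for every fixed h ≥ 1, so s_h = 0 and the averages of u_n should vanish. This is exactly the trick used to handle n²α.

```python
import cmath
import math

alpha = math.sqrt(2)
N = 50_000
# Average of the unit vectors u_n = exp(2*pi*i*n^2*alpha) in C.
s = sum(cmath.exp(2j * math.pi * (n * n) * alpha) for n in range(1, N + 1))
avg_norm = abs(s) / N
```

For comparison, the constant sequence u_n = 1 has s_h = 1 for all h and its averages do not vanish.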

We recall two auxiliary results from the second example sheet that we need for the proof of the theorem.

Lemma 60. Let (X, B, μ, T) be a weak mixing MPS. Then for any f, g ∈ L²(X) we have

D-lim ⟨U_T^n f, g⟩ = ∫ f dμ · \overline{∫ g dμ}.

Note that when f and g are characteristic functions of sets, then the displayed equation is precisely the definition of weak mixing.

Lemma 61. If (X, B, μ, T) is weak mixing, then (X, B, μ, T^k) is also weak mixing for all k ∈ Z_{>0}.

Proof of Theorem 57. We use induction on k. The case k = 1 is the mean ergodic theorem. We assume that k > 1 and that the theorem and the corollary hold for k − 1.

We first consider the special case when ∫ f_k dμ = 0. We set

u_n = U_T^n f_1 · U_T^{2n} f_2 · · · U_T^{kn} f_k

and show that ‖(1/N) Σ_{n=1}^{N} u_n‖₂ → 0 using the van der Corput lemma.

We compute

⟨u_n, u_{n+h}⟩ = ∫ U_T^n f_1 · U_T^{2n} f_2 · · · U_T^{kn} f_k · U_T^{n+h} f_1 · U_T^{2(n+h)} f_2 · · · U_T^{k(n+h)} f_k dμ

= ∫ U_T^n [ (f_1 U_T^h f_1) · (U_T^n f_2 U_T^{n+2h} f_2) · · · (U_T^{(k−1)n} f_k U_T^{(k−1)n+kh} f_k) ] dμ

= ∫ (f_1 U_T^h f_1) · U_T^n(f_2 U_T^{2h} f_2) · · · U_T^{(k−1)n}(f_k U_T^{kh} f_k) dμ,

using in the last step that T is measure preserving.

We use now the induction hypothesis, more precisely (8) for k − 1 with f_i · U_T^{ih} f_i in place of f_{i−1}:

lim_{N→∞} (1/N) Σ_{n=1}^{N} ⟨u_n, u_{n+h}⟩ = ∫ f_1 U_T^h f_1 dμ · ∫ f_2 U_T^{2h} f_2 dμ · · · ∫ f_k U_T^{kh} f_k dμ.

We note that

|∫ f_i U_T^{ih} f_i dμ| ≤ ‖f_i‖₂²,

hence

s_h ≤ ‖f_1‖₂² · · · ‖f_{k−1}‖₂² · |∫ f_k U_T^{kh} f_k dμ|.

By Lemma 60 applied to the transformation T^k, which is weak mixing by Lemma 61, the last factor converges in density to |∫ f_k dμ|² = 0. We conclude that D-lim s_h = 0, hence also C-lim s_h = 0 as the sequence is bounded. Hence van der Corput's lemma applies and implies (7) in the special case ∫ f_k dμ = 0.


In the general case we write a := ∫ f_k dμ and f_k' := f_k − a. We can write thus

(1/N) Σ_{n=0}^{N−1} U_T^n f_1 · U_T^{2n} f_2 · · · U_T^{kn} f_k

= (1/N) Σ_{n=0}^{N−1} U_T^n f_1 · U_T^{2n} f_2 · · · U_T^{kn}(f_k' + a)

= (1/N) Σ_{n=0}^{N−1} U_T^n f_1 · U_T^{2n} f_2 · · · U_T^{kn} f_k'

+ a · (1/N) Σ_{n=0}^{N−1} U_T^n f_1 · U_T^{2n} f_2 · · · U_T^{(k−1)n} f_{k−1}.

The first sum converges to 0 by the special case, the second converges to

a ∫ f_1 dμ · · · ∫ f_{k−1} dμ = ∫ f_1 dμ · · · ∫ f_k dμ

by the induction hypothesis. This completes the proof of the general case.

Our next goal is to exhibit an example of a measure preserving sys-

tem that is weak mixing but not mixing. We do this using a construc-

tion called cutting and stacking. We begin with a simple example;

our only purpose is to explain how the construction works.

We build a system on the space X = [0, 1], using the Borel σ-algebra B and the Lebesgue measure μ = m. The transformation T is defined using the following procedure. For each n ∈ Z_{≥0}, we specify a suitable collection of intervals

I_1^{(n)} = [a_1^{(n)}, b_1^{(n)}), . . . , I_{2^n}^{(n)} = [a_{2^n}^{(n)}, b_{2^n}^{(n)})

such that each of them is of length 2^{−n} and

[0, 1) = I_1^{(n)} ∪ . . . ∪ I_{2^n}^{(n)}.

Once these intervals are given, we define T such that the following holds for each n. For each j = 1, . . . , 2^n − 1, the map T restricted to the interval I_j^{(n)} is a translation onto the interval I_{j+1}^{(n)}. More formally, we put

T x = x + a_{j+1}^{(n)} − a_j^{(n)} for all x ∈ I_j^{(n)}, for all 1 ≤ j < 2^n.

This rule gives a partial definition of T for each n, and the intervals have to be constructed in a suitable way so that these partial definitions are compatible with each other. For

x ∈ ∩_n I_{2^n}^{(n)} ∪ {1},

the rule does not define T x. However, this is a set of measure 0, hence T can be extended arbitrarily.

Before explaining the construction of the intervals I_j^{(n)}, we visualize the construction by the following figure. In this picture, each interval is represented by a vertical line, and the map T moves each point to the one directly above on the next level of the stack.

The intervals I_j^{(n)} are constructed by iterating the following procedure. We start by taking I_1^{(0)} = [0, 1). Then we construct the intervals I_j^{(n+1)} from the intervals I_j^{(n)} as follows. We cut each interval I_j^{(n)} into two equal subintervals; the one on the left will be I_j^{(n+1)} and the one on the right will be I_{j+2^n}^{(n+1)}. More formally, we put

I_j^{(n+1)} = [a_j^{(n)}, (a_j^{(n)} + b_j^{(n)})/2) and I_{j+2^n}^{(n+1)} = [(a_j^{(n)} + b_j^{(n)})/2, b_j^{(n)}).

In terms of the figure: we take the stack of the intervals at the nth iteration, cut the whole stack vertically into two and put the right half of the stack on top of the left half.


This construction guarantees that the partial definition of T given by the nth stack of intervals is compatible with the (n+1)st stack. Indeed, if a point is directly above another one, then it will be directly above the same point in the next iteration, also.

We prove this now more formally.

Lemma 62. If x ∈ I_j^{(n)} for some 1 ≤ j < 2^n and some n, then there is 1 ≤ j' < 2^{n+1} such that x ∈ I_{j'}^{(n+1)} and

a_{j'+1}^{(n+1)} − a_{j'}^{(n+1)} = a_{j+1}^{(n)} − a_j^{(n)}.

In particular, the map T is well-defined.

Proof. Let x ∈ I_j^{(n)} and suppose first that x < (a_j^{(n)} + b_j^{(n)})/2. Then x ∈ I_j^{(n+1)}, so j' = j works. Furthermore,

I_j^{(n+1)} = [a_j^{(n)}, (a_j^{(n)} + b_j^{(n)})/2) and I_{j+1}^{(n+1)} = [a_{j+1}^{(n)}, (a_{j+1}^{(n)} + b_{j+1}^{(n)})/2).

Thus

a_{j+1}^{(n+1)} − a_j^{(n+1)} = a_{j+1}^{(n)} − a_j^{(n)},

as required.

The other case, when x ≥ (a_j^{(n)} + b_j^{(n)})/2, is very similar.

Lemma 63. The system (X, B, m, T) is measure preserving.

This is to be expected, because T restricted to any I_j^{(n)} with j < 2^n is a translation, and translations preserve the Lebesgue measure.

Proof. Let A ∈ B and fix some n ∈ Z_{≥0}. We can write

m(T^{−1}A) = Σ_{j=1}^{2^n} m(T^{−1}(A) ∩ I_j^{(n)}).

Since T : I_j^{(n)} → I_{j+1}^{(n)} is a measure preserving bijection for 1 ≤ j < 2^n, we have

m(T^{−1}A) = Σ_{j=1}^{2^n−1} m(A ∩ I_{j+1}^{(n)}) + m(T^{−1}(A) ∩ I_{2^n}^{(n)})

= m(A) − m(A ∩ I_1^{(n)}) + m(T^{−1}(A) ∩ I_{2^n}^{(n)}).

Hence

|m(T^{−1}A) − m(A)| ≤ 2 · 2^{−n}.

Since n is arbitrary, m(T^{−1}A) = m(A), and the system is indeed measure preserving.
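The recursion for the left endpoints a_j^{(n)} is simple enough to implement directly; the sketch below (the helper `stack_lefts` is written for this note) builds the stacks and checks the compatibility asserted in Lemma 62: the first 2^n left endpoints at level n+1 coincide with those at level n, so the translations taking I_j to I_{j+1} agree on the left halves.

```python
def stack_lefts(n):
    """Left endpoints a_j^{(n)}, j = 1..2^n, of the dyadic
    cutting-and-stacking construction: each interval is cut in half,
    the left halves keep their index, the right halves go on top."""
    lefts = [0.0]
    for level in range(n):
        lefts = lefts + [a + 2.0 ** (-(level + 1)) for a in lefts]
    return lefts

n = 4
a_n = stack_lefts(n)
a_n1 = stack_lefts(n + 1)
# Translation amounts a_{j+1} - a_j at consecutive levels (Lemma 62).
gaps_n = [a_n[j + 1] - a_n[j] for j in range(2 ** n - 1)]
gaps_n1 = [a_n1[j + 1] - a_n1[j] for j in range(2 ** n - 1)]
```

All endpoints are dyadic rationals, so the floating-point arithmetic here is exact.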

We discuss a similar cutting and stacking construction, called the Chacon map. This is an example of a weak mixing but not mixing system.

Similarly as before, we take X = [0, 1], let B be the Borel σ-algebra and let μ = m be the Lebesgue measure. We will define a collection of disjoint intervals

I_j^{(n)} = [a_j^{(n)}, b_j^{(n)}), j = 1, . . . , N(n)

for each n. However, this time they will not cover the whole interval [0, 1). The common length of the intervals is 2/3^{n+1}.

The intervals are constructed as follows. We take I_1^{(0)} = [0, 2/3). Then the intervals I_j^{(n+1)} are constructed using the intervals I_j^{(n)} as follows. We let N(n + 1) = 3N(n) + 1. We cut up the interval I_j^{(n)} into three equal subintervals, and

I_j^{(n+1)} will be the left-most subinterval,
I_{j+N(n)}^{(n+1)} will be the middle subinterval,
I_{j+2N(n)+1}^{(n+1)} will be the right-most subinterval.

This does not define I_{2N(n)+1}^{(n+1)} yet. We cut off an interval of length 2/3^{n+2} from [0, 1] \ ∪_j I_j^{(n+1)}, and I_{2N(n)+1}^{(n+1)} is defined to be this interval. We visualize this construction by the following figure.


The map T is again defined in such a way that it maps I_j^{(n)} for any 1 ≤ j < N(n) onto I_{j+1}^{(n)} via a translation. The facts that this construction yields a well-defined map and that it is measure preserving can be proved very similarly to the corresponding results in the previous example, and we omit the details.

We will need the following lemma in both the proof that the Chacon map is not mixing and in the proof that it is weak mixing.

Lemma 64. For all n and all 1 ≤ j ≤ N(n) we have

m(T^{N(n)}(I_j^{(n)}) ∩ I_j^{(n)}) ≥ (1/3) m(I_j^{(n)}) and

m(T^{N(n)+1}(I_j^{(n)}) ∩ I_j^{(n)}) ≥ (1/3) m(I_j^{(n)}).

Proof. The proof is illustrated by the figure below. We first note that I_j^{(n)} = I_j^{(n+1)} ∪ I_{j+N(n)}^{(n+1)} ∪ I_{j+2N(n)+1}^{(n+1)}. We also note that T^{N(n)}(I_j^{(n+1)}) = I_{j+N(n)}^{(n+1)} and T^{N(n)+1}(I_{j+N(n)}^{(n+1)}) = I_{j+2N(n)+1}^{(n+1)}. This shows that

m(T^{N(n)}(I_j^{(n)}) ∩ I_j^{(n)}) ≥ m(I_{j+N(n)}^{(n+1)}) = (1/3) m(I_j^{(n)}) and

m(T^{N(n)+1}(I_j^{(n)}) ∩ I_j^{(n)}) ≥ m(I_{j+2N(n)+1}^{(n+1)}) = (1/3) m(I_j^{(n)}),

as required.


Theorem 65. The Chacon map is not mixing.

Proof. We put A = [0, 2/9) and note that for each n ≥ 1, there is a set J_n ⊂ {1, . . . , N(n)} such that A = ∪_{j∈J_n} I_j^{(n)}. This can be proved by induction. Applying the lemma for each I_j^{(n)} with j ∈ J_n we get

m(A ∩ T^{N(n)}(A)) = m(T^{N(n)}(A) ∩ A) ≥ Σ_{j∈J_n} m(T^{N(n)}(I_j^{(n)}) ∩ I_j^{(n)})

≥ (1/3) Σ_{j∈J_n} m(I_j^{(n)}) = (1/3) m(A).

This gives lim inf m(A ∩ T^{N(n)}(A)) ≥ m(A)/3 > m(A)², showing that the system is not mixing. (Note that m(A) = 2/9.)
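The combinatorial bookkeeping behind the construction is easy to tabulate; the sketch below (written for this note) just iterates the recursion N(0) = 1, N(n+1) = 3N(n) + 1 from the text, together with the common interval length 2/3^{n+1}, and records how much of [0, 1] the stage-n intervals cover; the covered measure tends to 1.

```python
def chacon_stages(k):
    """(stage, interval count N(n), total covered measure) for n < k."""
    N = 1
    stages = []
    for n in range(k):
        stages.append((n, N, N * 2 / 3 ** (n + 1)))
        N = 3 * N + 1
    return stages

stages = chacon_stages(6)
counts = [c for _, c, _ in stages]
covered = [m for _, _, m in stages]
```
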

Theorem 66. The Chacon map is weak mixing.

Proof. Let f ∈ L²(X) be an eigenfunction of U_T, that is, U_T f = λf for some complex λ with |λ| = 1. We show below that f is constant almost everywhere.


Fix a number 0 < ε < 1/12 such that

m({x ∈ X : |f(x)| > 2}) > 10ε.

If we replace f by a sufficiently large constant multiple, then such an ε can be found.

By Luzin's theorem, there is a continuous function h such that

m({x ∈ X : h(x) ≠ f(x)}) < ε.

Then h is uniformly continuous, hence there is n ∈ Z_{≥0} such that for any x_1, x_2 ∈ X with |x_1 − x_2| < 2/3^{n+1} we have |h(x_1) − h(x_2)| < ε/10. Thus |h(x_1) − h(x_2)| < ε/10 holds in particular when x_1, x_2 ∈ I_j^{(n)} for some 1 ≤ j ≤ N(n).

Since the intervals I_j^{(n)} cover more than half of X, there is 1 ≤ j ≤ N(n) such that

m({x ∈ I_j^{(n)} : f(x) ≠ h(x)}) < 2ε m(I_j^{(n)}).

We select an arbitrary x_0 ∈ I_j^{(n)} and put z = h(x_0). Then

m({x ∈ I_j^{(n)} : |f(x) − z| > ε}) < 2ε m(I_j^{(n)}).

We note that T^{i−j}(I_j^{(n)}) = I_i^{(n)} for every 1 ≤ i ≤ N(n). Using U_T^{i−j} f = λ^{i−j} f and the measure preserving property we can write

m({x ∈ I_i^{(n)} : |f(x) − λ^{i−j} z| > ε})

= m({y ∈ I_j^{(n)} : |f(T^{i−j} y) − λ^{i−j} z| > ε}) < 2ε m(I_j^{(n)}).

Since ε < 1, this shows in particular that

m({x ∈ X : |f(x)| > |z| + 1}) < Σ_{j=1}^{N(n)} 2ε m(I_j^{(n)}) + m(X \ ∪_{j=1}^{N(n)} I_j^{(n)})

< 2ε + 1/(2 · 3^n).

If n is sufficiently large (which we may assume) then 1/(2 · 3^n) < 8ε and hence m({x ∈ X : |f(x)| > |z| + 1}) < 10ε. Comparing with the choice of ε, we must have 2 < |z| + 1, hence |z| > 1.

Now we use Lemma 64 and obtain that

m({x ∈ I_j^{(n)} : |f(x) − λ^{N(n)} z| < ε})

≥ m({y ∈ I_j^{(n)} : T^{N(n)}(y) ∈ I_j^{(n)} and |f(T^{N(n)} y) − λ^{N(n)} z| < ε})

= m(I_j^{(n)} ∩ T^{−N(n)}(I_j^{(n)})) − m({y ∈ I_j^{(n)} ∩ T^{−N(n)}(I_j^{(n)}) : |λ^{N(n)} f(y) − λ^{N(n)} z| ≥ ε})

≥ m(T^{N(n)}(I_j^{(n)}) ∩ I_j^{(n)}) − m({y ∈ I_j^{(n)} : |f(y) − z| > ε})

≥ (1/3) m(I_j^{(n)}) − 2ε m(I_j^{(n)}).


Since ε < 1/12, (1/3) m(I_j^{(n)}) − 2ε m(I_j^{(n)}) > 2ε m(I_j^{(n)}), hence there is x ∈ I_j^{(n)} such that both |f(x) − z| < ε and |f(x) − λ^{N(n)} z| < ε hold. Thus |z − λ^{N(n)} z| < 2ε and |1 − λ^{N(n)}| < 2ε, as |z| > 1.

By a similar argument using the second part of the lemma we can get |1 − λ^{N(n)+1}| < 2ε, hence |λ^{N(n)} − λ^{N(n)+1}| < 4ε and |1 − λ| < 4ε. Taking ε → 0 we conclude that λ = 1.

Summarizing the above argument, we can prove the following. For every ε > 0 there is n_0 such that for all integers n > n_0 there is a complex number z_{ε,n} such that for all 1 ≤ i ≤ N(n) we have

m({x ∈ I_i^{(n)} : |f(x) − z_{ε,n}| < ε}) > (1 − 2ε) m(I_i^{(n)}).

Summing this up for 1 ≤ i ≤ N(n), we obtain

m({x ∈ X : |f(x) − z_{ε,n}| < ε}) > (1 − 2ε) Σ_{i=1}^{N(n)} m(I_i^{(n)}) = (1 − 2ε)(1 − 1/(2 · 3^n)).

Taking ε → 0 and n → ∞ we conclude that f must be constant almost everywhere.

12. Entropy

Entropy is a number attached to measure preserving systems that quantifies the following related intuitive properties.

What is the information content of a typical orbit x, T x, T² x, . . .?
How chaotic is a typical orbit x, T x, T² x, . . .?
Knowing x, T x, . . . , T^n x, to what extent can we predict T^{n+1} x?

The original motivation for the introduction of entropy to ergodic theory was the following problem.

Definition 67. Let (p_1, . . . , p_n) be a probability vector, i.e. p_i ≥ 0 and p_1 + . . . + p_n = 1. The (p_1, . . . , p_n)-Bernoulli shift is the measure preserving system ({1, . . . , n}^Z, B, μ, σ), where B is the Borel σ-algebra, μ is the infinite-fold product of the measure p_1 δ_1 + . . . + p_n δ_n on {1, . . . , n} and σ is the shift (σx)_n = x_{n+1}.

Problem 68. Are the (1/2, 1/2) and (1/3, 1/3, 1/3) Bernoulli shifts

isomorphic?

This problem may not seem very deep at first sight, but it was open

for a decade or so until it was solved by Kolmogorov, and the following

facts suggest that it is subtle.

The two systems have the same mixing properties. (They are both mixing on k sets for all k ∈ Z_{>0}.)

They are also spectrally isomorphic. That is to say, there is a unitary transformation U : L²({1, 2}^Z) → L²({1, 2, 3}^Z) such that

U U_{σ_2} = U_{σ_3} U,

where U_{σ_2} and U_{σ_3} are the Koopman operators of the two shifts.

Meshalkin proved that the (1/4, 1/4, 1/4, 1/4) and the (1/2, 1/8, 1/8, 1/8, 1/8) Bernoulli shifts are isomorphic.

For more details on these facts, see the third example sheet.

In the next lectures we develop entropy theory and prove that the

(1/2, 1/2) and (1/3, 1/3, 1/3) Bernoulli shifts are not isomorphic.
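The invariant that will separate the two shifts can already be computed now. As we will see, the entropy of a Bernoulli shift turns out to equal H(p_1, . . . , p_n) = −Σ p_i log p_i; the sketch below (written for this note) evaluates it for the two shifts in question.

```python
import math

def shannon_entropy(p):
    """H(p_1,...,p_n) = -sum p_i log p_i, with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

h2 = shannon_entropy([1 / 2, 1 / 2])        # log 2
h3 = shannon_entropy([1 / 3, 1 / 3, 1 / 3]) # log 3
```

Since log 2 ≠ log 3 and entropy is an isomorphism invariant, the two shifts cannot be isomorphic; this is the computation the coming sections will justify.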

12.1. Jensen's inequality. We recall an inequality that will be used frequently to derive elementary properties of entropy.

Definition 69. We say that a continuous function f : [a, b] → R ∪ {∞} is convex if for each a < x < b there is a number λ_x such that

f(y) ≥ f(x) + λ_x(y − x)

for all y ∈ (a, b). The function is called strictly convex if the inequality is strict for all y ≠ x.

We note that if f ∈ C² and f''(x) > 0 for all x ∈ (a, b), then f is strictly convex, as can be seen easily from the Taylor expansion of f.

Theorem 70 (Jensen's inequality). Let f : [a, b] → R ∪ {∞} be a convex function. Let x_1, x_2, . . . , x_n ∈ [a, b] and let p_1, . . . , p_n be a probability vector. Then

(10) f(p_1 x_1 + . . . + p_n x_n) ≤ p_1 f(x_1) + . . . + p_n f(x_n).

If the function is strictly convex, then equality occurs in (10) if and only if the numbers x_i coincide for all i such that p_i > 0.

12.2. Entropy of partitions. Let (X, B, μ) be a probability space. A finite measurable partition is a finite collection of pairwise disjoint measurable sets whose union covers the whole space X. We call the sets in this collection the atoms of the partition. If α, β ⊂ B are two finite partitions, then their join or coarsest common refinement is the finite partition

α ∨ β = {A ∩ B : A ∈ α, B ∈ β}.

We define the function H on the set of probability vectors (of varying length) by

H(p_1, . . . , p_k) = −p_1 log p_1 − . . . − p_k log p_k,

where we use the convention p log p = 0 if p = 0.

We define the entropy of a partition α = {A_1, . . . , A_k} as

H_μ(α) = H(μ(A_1), . . . , μ(A_k)).

We define the conditional entropy of α relative to another partition β = {B_1, . . . , B_l} as

H_μ(α|β) = Σ_{j=1}^{l} μ(B_j) H( μ(A_1 ∩ B_j)/μ(B_j), . . . , μ(A_k ∩ B_j)/μ(B_j) ).
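These definitions are easy to compute with; the sketch below (the helper `joint_to_entropies` is written for this note) computes H_μ(α), H_μ(β) and H_μ(α|β) from the table of joint measures μ(A_i ∩ B_j). In the example, α is the partition of [0, 1) into halves and β into quarters; β refines α, so learning β leaves no uncertainty about α.

```python
import math

def H(ps):
    """H(p_1,...,p_k) = -sum p log p, with 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in ps if p > 0)

def joint_to_entropies(joint):
    """joint[i][j] = mu(A_i \\cap B_j).  Returns (H(alpha), H(beta),
    H(alpha|beta)) following the definitions in the text."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    h_cond = sum(pb[j] * H([joint[i][j] / pb[j] for i in range(len(joint))])
                 for j in range(len(pb)) if pb[j] > 0)
    return H(pa), H(pb), h_cond

# alpha = halves, beta = quarters of [0,1) with Lebesgue measure.
joint = [[0.25, 0.25, 0.0, 0.0],
         [0.0, 0.0, 0.25, 0.25]]
ha, hb, hab = joint_to_entropies(joint)
```
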


The idea behind this definition is that the partition β subdivides the whole space into a union of probability spaces. Indeed, we can endow each atom B_j with the probability measure given by μ(C)/μ(B_j) for each measurable set C ⊂ B_j. The partition α induces a partition on each of these spaces: {A_1 ∩ B_j, . . . , A_k ∩ B_j}. Now the conditional entropy is the average of the entropies of these partitions on each of the probability spaces, weighted by the measure of the corresponding atom B_j of β.

The following intuition for the meaning of entropy is useful. One

may think about the space X as the possible states of a physical system

and about the partition as an experiment; the atoms are the possible

outcomes. Then we may think of entropy as the amount of uncertainty

in the experiment, i.e. how difficult it is to predict its outcome. Or

equivalently, we may think of entropy as the amount of information we

gain when we learn the outcome of the experiment.

With this interpretation, the two partitions are two experiments, and the conditional entropy measures the amount of new information that we gain when we learn the outcome of the second experiment if we already knew the outcome of the first one.

The choice of the function

(11) H(p_1, . . . , p_k) = −p_1 log p_1 − . . . − p_k log p_k

may seem arbitrary at first, but the following properties are reassuring.

Lemma 71. The following holds.

(1) H_μ(α) ≥ 0 for any partition α.
(2) The maximum of H_μ(α) over all partitions with k atoms is attained when each atom has the same probability 1/k. Interpretation: uncertainty is maximal when each outcome is equally likely.
(3) H_μ(A_1, . . . , A_k) = H_μ(A_{π(1)}, . . . , A_{π(k)}) for any permutation π.
(4) H_μ(α ∨ β) = H_μ(α) + H_μ(β|α). Interpretation: the amount of information contained together in the two experiments α and β is the same as the amount of information contained in α plus the amount of new information contained in β when we learn its outcome with the prior knowledge of the outcome of α.

We note that Khinchin proved that the function H given in (11) is unique up to scalar multiples such that the resulting notion of entropy enjoys the properties listed in the lemma.

Proof. (1): This holds because −p log p ≥ 0 for all p ∈ [0, 1].

(2): This follows from Jensen's inequality applied to the strictly convex function x ↦ x log x with p_i = 1/k and x_i = μ(A_i). Indeed, we note

Σ p_i x_i = Σ (1/k) μ(A_i) = 1/k,

hence

(1/k) log(1/k) ≤ Σ_{i=1}^{k} (1/k) μ(A_i) log(μ(A_i)),

that is,

log k ≥ H_μ(α).

Analysing the equality case in Jensen's inequality, we find that it occurs if and only if μ(A_1) = . . . = μ(A_k) = 1/k.

(3): Immediate from the definition.

12.3. The information function. The following notion will be a very useful gadget in our calculations with entropy. Let (X, B, μ) be a probability space and let α ⊂ B be a finite partition. The information function of α is the function X → R ∪ {∞}

I_μ(α)(x) = −log μ([x]_α),

where [x]_α denotes the atom of α that contains x. If β ⊂ B is another finite partition, then the conditional information function of α relative to β is defined as

I_μ(α|β)(x) = −log ( μ([x]_{α∨β}) / μ([x]_β) ).

The link with entropy is given in the next lemma.

Lemma 72. In the above setting, we have

H_μ(α) = ∫ I_μ(α) dμ,

H_μ(α|β) = ∫ I_μ(α|β) dμ.

Proof. The first claim is a special case of the second one when we consider β = {X}; therefore, we only consider the second claim.

If A ∈ α and B ∈ β, then I_μ(α|β) is constant on A ∩ B and its value is −log(μ(A ∩ B)/μ(B)), which is immediate from the definition. Thus

∫ I_μ(α|β) dμ = −Σ_{A∈α, B∈β} μ(A ∩ B) log( μ(A ∩ B)/μ(B) )

= −Σ_{B∈β} μ(B) Σ_{A∈α} ( μ(A ∩ B)/μ(B) ) log( μ(A ∩ B)/μ(B) ),

as required.


Lemma 73 (Chain rule). Let α, β, γ ⊂ B be finite partitions in a probability space (X, B, μ). Then

I_μ(α ∨ β|γ) = I_μ(β|γ) + I_μ(α|β ∨ γ),

H_μ(α ∨ β|γ) = H_μ(β|γ) + H_μ(α|β ∨ γ).

Note that this lemma generalizes item (4) from the first lemma of the section.

Proof. The second line follows from the first one upon integration, thanks to the previous lemma.

Unwinding the definitions, we can write

I_μ(β|γ)(x) = −log ( μ([x]_{β∨γ}) / μ([x]_γ) ),

I_μ(α|β ∨ γ)(x) = −log ( μ([x]_{α∨β∨γ}) / μ([x]_{β∨γ}) ),

I_μ(α ∨ β|γ)(x) = −log ( μ([x]_{α∨β∨γ}) / μ([x]_γ) ).

It is clear that the sum of the first two lines yields the third line, as required.

Lemma 74. Let α, β, γ ⊂ B be finite partitions in a probability space (X, B, μ). Then

  H_μ(α|β ∨ γ) ≤ H_μ(α|γ).

Interpretation: The new information content of the experiment α when we already know the outcomes of β and γ is at most its new information content when we know the outcome of γ alone.

Proof. We note the definitions

(12)  H_μ(α|β ∨ γ) = −Σ_{A∈α, B∈β, C∈γ} μ(A ∩ B ∩ C) log ( μ(A ∩ B ∩ C) / μ(B ∩ C) ),

(13)  H_μ(α|γ) = −Σ_{A∈α, C∈γ} μ(A ∩ C) log ( μ(A ∩ C) / μ(C) ).

We claim that

  Σ_{B∈β} μ(A ∩ B ∩ C) log ( μ(A ∩ B ∩ C) / μ(B ∩ C) ) ≥ μ(A ∩ C) log ( μ(A ∩ C) / μ(C) )

for each A ∈ α and C ∈ γ; summing this over A and C and comparing with (12) and (13) proves the lemma. In order to show this, we fix A ∈ α, C ∈ γ and apply Jensen's inequality for the function x ↦ x log x at the points μ(A ∩ B ∩ C)/μ(B ∩ C) with weights μ(B ∩ C)/μ(C) as B runs through the atoms of β. We note that

  Σ_{B∈β} ( μ(B ∩ C)/μ(C) ) · ( μ(A ∩ B ∩ C)/μ(B ∩ C) ) = μ(A ∩ C)/μ(C).


Hence Jensen's inequality gives

  Σ_{B∈β} ( μ(B ∩ C)/μ(C) ) · ( μ(A ∩ B ∩ C)/μ(B ∩ C) ) log ( μ(A ∩ B ∩ C)/μ(B ∩ C) ) ≥ ( μ(A ∩ C)/μ(C) ) log ( μ(A ∩ C)/μ(C) ),

which in turn yields

  Σ_{B∈β} μ(A ∩ B ∩ C) log ( μ(A ∩ B ∩ C)/μ(B ∩ C) ) ≥ μ(A ∩ C) log ( μ(A ∩ C)/μ(C) ),

as required. □
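On a finite probability space the chain rule and Lemma 74 can be verified by direct computation. The sketch below (our own minimal representation, not from the notes: points 0, …, 11 with a probability vector, atoms as index sets) checks H_μ(α ∨ β|γ) = H_μ(α|γ) + H_μ(β|α ∨ γ) and H_μ(α|β ∨ γ) ≤ H_μ(α|γ):

```python
import math, random

def cond_entropy(mu, alpha, beta):
    """H(alpha | beta) = -sum over A, B of mu(A ∩ B) log( mu(A ∩ B) / mu(B) )."""
    h = 0.0
    for A in alpha:
        for B in beta:
            pab = sum(mu[x] for x in A & B)
            pb = sum(mu[x] for x in B)
            if pab > 0:
                h -= pab * math.log(pab / pb)
    return h

def join(alpha, beta):
    """The common refinement alpha ∨ beta (non-empty intersections of atoms)."""
    return [A & B for A in alpha for B in beta if A & B]

random.seed(0)
w = [random.random() for _ in range(12)]
mu = {x: w[x] / sum(w) for x in range(12)}
alpha = [{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}]
beta = [{0, 4, 8}, {1, 2, 5, 6, 9, 10}, {3, 7, 11}]
gamma = [set(range(6)), set(range(6, 12))]

# chain rule (an exact identity, up to floating point)
lhs = cond_entropy(mu, join(alpha, beta), gamma)
rhs = cond_entropy(mu, alpha, gamma) + cond_entropy(mu, beta, join(alpha, gamma))
assert abs(lhs - rhs) < 1e-9

# Lemma 74: conditioning on a finer partition does not increase entropy
assert cond_entropy(mu, alpha, join(beta, gamma)) <= cond_entropy(mu, alpha, gamma) + 1e-9
```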

The above inequality together with the chain rule can be used to

prove a range of useful inequalities for entropies of partitions. For

instance, we note the following.

Corollary 75. Let α, β ⊂ B be finite partitions in a probability space (X, B, μ). Then

  H_μ(α) ≤ H_μ(α ∨ β) ≤ H_μ(α) + H_μ(β).

Proof. By the chain rule, we have

  H_μ(α ∨ β) = H_μ(α) + H_μ(β|α).

Since −p log p ≥ 0 for all p ∈ [0, 1], we have H_μ(β|α) ≥ 0, similarly as we saw in item (1) of the first lemma of the section. This proves the inequality on the left. By the previous lemma (applied with γ = {X}), we have H_μ(β|α) ≤ H_μ(β), which proves the inequality on the right. □

12.4. Entropy of measure preserving systems. In what follows, (X, B, μ, T) is a measure preserving system and α, β ⊂ B are finite partitions.

Notation. We write T^{−1}α for the partition whose atoms are the sets T^{−1}([x]_α).

Lemma 76 (Invariance). In the above setting, we have

  I_μ(T^{−1}α|T^{−1}β)(x) = I_μ(α|β)(T x),

  H_μ(T^{−1}α|T^{−1}β) = H_μ(α|β).

Proof. By the definition, we have

  I_μ(T^{−1}α|T^{−1}β)(x) = −log ( μ([x]_{T^{−1}α ∨ T^{−1}β}) / μ([x]_{T^{−1}β}) ).

We note that

  T^{−1}α ∨ T^{−1}β = T^{−1}(α ∨ β)

and

  [x]_{T^{−1}(α∨β)} = T^{−1}([T x]_{α∨β}).

Indeed, the set on the right-hand side is an atom of T^{−1}(α ∨ β) and it contains x.


Since T is measure preserving, it follows that

  μ([x]_{T^{−1}(α∨β)}) = μ([T x]_{α∨β}),

and similarly μ([x]_{T^{−1}β}) = μ([T x]_β), which proves the first claim. The second claim follows from the first one by integration and the measure preserving property. □

Before we can give the definition of entropy of a measure preserving

system, we need two more auxiliary results.

Lemma 77. Let a_1, a_2, … be a subadditive sequence, i.e. a_{n+m} ≤ a_n + a_m for all n, m. Then

  lim_{n→∞} a_n/n = inf_n a_n/n;

in particular, the limit exists.

Proof. Fix n_0 ∈ Z_{>0}. For each n ∈ Z_{>0}, write n = i(n)n_0 + j(n), where 0 ≤ j(n) < n_0. By (repeated use of) subadditivity,

  a_n ≤ i(n) a_{n_0} + a_{j(n)},

hence

  limsup_{n→∞} a_n/n ≤ limsup_{n→∞} ( i(n) a_{n_0} + a_{j(n)} )/n

  = limsup_{n→∞} ( i(n)/n ) a_{n_0} + limsup_{n→∞} a_{j(n)}/n

  = limsup_{n→∞} ( (n − j(n)) / (n n_0) ) a_{n_0} + 0 = a_{n_0}/n_0.

Since n_0 is arbitrary, we can write

  inf_n a_n/n ≤ liminf_{n→∞} a_n/n ≤ limsup_{n→∞} a_n/n ≤ inf_n a_n/n,

which proves the claim. □
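A quick numerical illustration of the lemma (our own example, not from the notes): the sequence a_n = ⌈n/2⌉ + 1 is subadditive, and the ratios a_n/n approach inf_n a_n/n = 1/2 from above.

```python
import math

# a_n = ceil(n/2) + 1 is subadditive: a_{n+m} <= a_n + a_m
a = lambda n: math.ceil(n / 2) + 1

# check subadditivity on a sample of index pairs
assert all(a(n + m) <= a(n) + a(m) for n in range(1, 40) for m in range(1, 40))

ratios = [a(n) / n for n in range(1, 2001)]
assert min(ratios) >= 0.5            # inf a_n / n = 1/2 is a lower bound
assert abs(ratios[-1] - 0.5) < 1e-3  # and the ratios converge to it
```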

Notation. We write

  α_m^n = T^{−m}α ∨ T^{−(m+1)}α ∨ … ∨ T^{−n}α

for any integers 0 ≤ m ≤ n. When the system is invertible, we allow m or n to be negative.

Lemma 78. In the above setting, we have

  H_μ(α_0^{n+m−1}) ≤ H_μ(α_0^{n−1}) + H_μ(α_0^{m−1}),

i.e. the sequence n ↦ H_μ(α_0^{n−1}) is subadditive.

Proof. We note that α_0^{n+m−1} = α_0^{n−1} ∨ α_n^{n+m−1} and T^{−n}α_0^{m−1} = α_n^{n+m−1}. By the above corollary and invariance, we can write

  H_μ(α_0^{n+m−1}) ≤ H_μ(α_0^{n−1}) + H_μ(α_n^{n+m−1}) = H_μ(α_0^{n−1}) + H_μ(α_0^{m−1}),

as required. □


Definition 79. Let (X, B, μ, T) be a measure preserving system and let α ⊂ B be a finite partition. The entropy of the system with respect to the partition α is

  h_μ(T, α) = lim_{n→∞} H_μ(α_0^{n−1})/n = inf_n H_μ(α_0^{n−1})/n.

The entropy of the system is

  h_μ(T) = sup { h_μ(T, α) : α ⊂ B is a finite partition }.

Interpretation: If we think of the measure preserving system as a physical system, the transformation T as the progression of time and α as an experiment, then the partition α_0^{n−1} corresponds to the experiment α performed repeatedly at times 0, 1, …, n − 1. Therefore, the entropy of the system with respect to α is the amount of information gained per experiment when we perform it repeatedly at regular time intervals. The entropy of the system is the maximal amount of information we can gain per experiment, optimized over all possible experiments we can perform.
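As a concrete illustration (our own, not from the notes): for the times-2 map of Example 3 with Lebesgue measure and α = {[0, 1/2), [1/2, 1)}, the atoms of α_0^{n−1} are exactly the dyadic intervals of length 2^{−n}, so H_μ(α_0^{n−1})/n = log 2 for every n and h_μ(T, α) = log 2. The sketch below recovers this by computing itineraries with exact rational arithmetic:

```python
import math
from fractions import Fraction

def doubling_cylinder_masses(n):
    """Masses of the atoms of alpha_0^{n-1} for T(x) = 2x mod 1, alpha = {[0,1/2), [1/2,1)}.

    The atom of x is determined by its itinerary under T, and equals a
    dyadic interval [m/2^n, (m+1)/2^n); we sample one midpoint per interval.
    """
    masses = {}
    for m in range(2 ** n):
        x = Fraction(2 * m + 1, 2 ** (n + 1))  # midpoint, exact arithmetic
        itinerary = []
        for _ in range(n):
            itinerary.append(0 if x < Fraction(1, 2) else 1)
            x = (2 * x) % 1
        key = tuple(itinerary)
        masses[key] = masses.get(key, 0) + Fraction(1, 2 ** n)
    return masses

n = 8
masses = doubling_cylinder_masses(n)
H = -sum(float(p) * math.log(p) for p in masses.values())
assert len(masses) == 2 ** n            # every itinerary occurs on one dyadic interval
assert abs(H / n - math.log(2)) < 1e-9  # h(T, alpha) = log 2
```

Each of the 2^n itineraries occurs on exactly one dyadic interval of mass 2^{−n}, so the entropy per step is exactly log 2.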

12.5. The Kolmogorov-Sinai theorem and the entropy of Bernoulli

shifts. One of the difficulties in calculating the entropy of a measure

preserving system is that the definition involves a supremum over all

finite partitions. The Kolmogorov-Sinai theorem addresses this issue.

Definition 80. Let (X, B, μ, T) be an invertible measure preserving system. For a finite partition α ⊂ B, we denote by B(α) the σ-algebra generated by the atoms of α. A finite partition α ⊂ B is called a two-sided generator if for every A ∈ B and for every ε > 0, there are a number n ∈ Z_{>0} and a set A′ ∈ B(α_{−n}^n) such that μ(A △ A′) < ε.

Example 81. Let (X = {1, …, k}^Z, B, μ, T) be the (p_1, …, p_k)-Bernoulli shift. Then the partition

  α = { {x ∈ X : x_0 = i} : i = 1, …, k }

is a two-sided generator. Indeed, it is easy to check that the collection of sets

  { A ∈ B : for every ε > 0, there are n and A′ ∈ B(α_{−n}^n) with μ(A △ A′) < ε }

is a σ-algebra and it contains all cylinder sets, hence it equals B.

Theorem 82 (Kolmogorov–Sinai). Let (X, B, μ, T) be an invertible measure preserving system and let α ⊂ B be a finite partition that is a two-sided generator. Then

  h_μ(T) = h_μ(T, α).

There is a version of this theorem with a similar proof for non-

invertible measure preserving systems, where a two-sided generator is

replaced by the analogous notion of a one-sided generator. However,

we will not need that result, therefore we omit the details.


Example 83. Let (X, B, μ, T) be the (p_1, …, p_k)-Bernoulli shift, and let α be the partition we considered in the previous example. We prove that h_μ(T, α) = H(p_1, …, p_k), which implies h_μ(T) = H(p_1, …, p_k) by the Kolmogorov–Sinai theorem.

We first show that

(14)  H_μ(α|α_1^n) = H(p_1, …, p_k)

for all n ∈ Z_{>0}. To this end we calculate the information function

  I_μ(α|α_1^n)(x) = −log ( μ([x]_{α_0^n}) / μ([x]_{α_1^n}) ).

We note that

  [x]_{α_0^n} = { y : y_0 = x_0, …, y_n = x_n },

which has measure p_{x_0} ⋯ p_{x_n} by the definition of the product measure. Similarly,

  μ([x]_{α_1^n}) = p_{x_1} ⋯ p_{x_n}.

Thus

  I_μ(α|α_1^n)(x) = −log p_{x_0},

and (14) follows by integration.

Using the chain rule, invariance and then (14), we obtain

  H_μ(α_0^{n−1}) = H_μ(α_{n−1}^{n−1}) + H_μ(α_{n−2}^{n−2}|α_{n−1}^{n−1}) + … + H_μ(α_0^0|α_1^{n−1})

  = H_μ(α) + H_μ(α|α_1^1) + … + H_μ(α|α_1^{n−1})

  = n H(p_1, …, p_k).

We divide both sides by n and take the limit to conclude

  h_μ(T, α) = H(p_1, …, p_k),

as required.
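The identity H_μ(α_0^{n−1}) = n H(p_1, …, p_k) can also be checked by brute force: length-n cylinders carry product measures, so summing −μ log μ over all k^n cylinders gives exactly n times the one-step entropy. A sketch in plain Python (helper names are ours; `math.prod` needs Python 3.8+):

```python
import itertools, math

def shannon(p):
    """H(p_1, ..., p_k) = -sum p_i log p_i."""
    return -sum(q * math.log(q) for q in p if q > 0)

p = [0.5, 0.3, 0.2]   # a (p1, p2, p3)-Bernoulli shift
n = 6

# entropy of the partition into length-n cylinders {y : y_0 = w_0, ..., y_{n-1} = w_{n-1}}
h_joint = 0.0
for word in itertools.product(range(len(p)), repeat=n):
    mass = math.prod(p[i] for i in word)  # product measure of the cylinder
    h_joint -= mass * math.log(mass)

assert abs(h_joint - n * shannon(p)) < 1e-9  # H(alpha_0^{n-1}) = n H(p)
```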

Since isomorphic systems have the same entropy, this shows that the

(1/2, 1/2) and (1/3, 1/3, 1/3) Bernoulli shifts are not isomorphic. We

add that a deep theorem of Ornstein shows that two Bernoulli shifts

are isomorphic if and only if they have the same entropy.

The proof of the Kolmogorov-Sinai theorem relies on the following

three Lemmata.

Lemma 84. Let (X, B, μ, T) be an invertible measure preserving system and let α ⊂ B be a finite measurable partition. Then

  h_μ(T, α) = h_μ(T, α_{−n}^n)

for all n ∈ Z_{>0}.

Lemma 85. Let (X, B, μ, T) be a measure preserving system and let α, β ⊂ B be finite measurable partitions. Then

  h_μ(T, α) ≤ h_μ(T, β) + H_μ(α|β).


Lemma 86. For every ε > 0 and for every k ∈ Z_{>0}, there is δ > 0 such that the following holds. Let (X, B, μ) be a probability space, and let α, β ⊂ B be two finite partitions. Suppose that α has k atoms and for each A ∈ α there is A′ ∈ B(β) such that μ(A △ A′) ≤ δ. Then H_μ(α|β) ≤ ε.

Proof of Lemma 84. We can write

  h_μ(T, α) = lim_{m→∞} H_μ(α_0^{m−1})/m

and

  h_μ(T, α_{−n}^n) = lim_{m→∞} H_μ( α_{−n}^n ∨ T^{−1}α_{−n}^n ∨ … ∨ T^{−(m−1)}α_{−n}^n )/m

  = lim_{m→∞} H_μ(α_{−n}^{m+n−1})/m

  = lim_{m→∞} H_μ(α_0^{m+2n−1})/m

  = lim_{m→∞} H_μ(α_0^{m+2n−1})/(m + 2n − 1).

Hence the entropies with respect to the two partitions are indeed equal. □

Proof of Lemma 85. We can write

  H_μ(α_0^{m−1}) ≤ H_μ(α_0^{m−1} ∨ β_0^{m−1})

  = H_μ(β_0^{m−1}) + H_μ(α_0^{m−1}|β_0^{m−1})

  = H_μ(β_0^{m−1}) + Σ_{j=0}^{m−1} H_μ(α_j^j|β_0^{m−1} ∨ α_{j+1}^{m−1})

  ≤ H_μ(β_0^{m−1}) + Σ_{j=0}^{m−1} H_μ(α_j^j|β_j^j)

  = H_μ(β_0^{m−1}) + m H_μ(α|β).

Here, we used the Corollary of Lemma 74, then the chain rule twice, then Lemma 74, and finally invariance. We divide this inequality by m and take the limit to obtain the claim. □

Proof of Lemma 86. Let α = {A_1, …, A_k}, and for each 1 ≤ i ≤ k, let B_i ∈ B(β) be such that μ(A_i △ B_i) < δ. We consider the partition γ that has the following k + 1 atoms:

  C_0 := ⋃_{i=1}^k ( A_i ∩ B_i \ ⋃_{j≠i} B_j )

and C_i := A_i \ C_0 for i = 1, …, k.


The set C_0 has the following two key properties. First, it has large measure, i.e.

  μ(C_0) ≥ Σ_{i=1}^k ( μ(A_i) − kδ ) = 1 − k²δ.

Second, C_0 ∩ B_i = C_0 ∩ A_i for each i, which we will use below. We note that

  H_μ(γ) = −μ(C_0) log μ(C_0) − Σ_{i=1}^k μ(C_i) log μ(C_i)

  ≤ −μ(C_0) log μ(C_0) − Σ_{i=1}^k μ(C_i) log ( ( Σ_{i=1}^k μ(C_i) ) / k ).

Indeed, this follows from Jensen's inequality applied for the function x ↦ x log x at the points μ(C_i) for i = 1, …, k with weights 1/k, similarly to the proof of (2) in Lemma 71. If δ is sufficiently small, then μ(C_0) is arbitrarily close to 1 and Σ_{i=1}^k μ(C_i) is arbitrarily close to 0. Hence H_μ(γ) < ε, provided δ is small enough.

We prove that H_μ(α|γ ∨ β) = 0, and hence

  H_μ(α|β) ≤ H_μ(γ ∨ α|β) = H_μ(γ|β) + H_μ(α|γ ∨ β) ≤ H_μ(γ) < ε,

as required. To that end, we prove I_μ(α|γ ∨ β)(x) = 0 and consider two cases.

If x ∈ C_0, then [x]_{γ∨β} ⊂ C_0, because C_0 ∈ B(γ ∨ β). Moreover, there is a unique 1 ≤ i ≤ k such that x ∈ A_i, hence x ∈ B_i, because C_0 ∩ B_i = C_0 ∩ A_i. Then [x]_{γ∨β} ⊂ B_i, because B_i ∈ B(β) ⊂ B(γ ∨ β). Therefore

  [x]_{γ∨β} ∩ A_i = [x]_{γ∨β}

and I_μ(α|γ ∨ β)(x) = 0 in this case.

The second case is when x ∈ C_i for some i ≥ 1. Then [x]_{γ∨β} ⊂ C_i, because C_i ∈ B(γ ∨ β). Since C_i ⊂ A_i, we have

  [x]_{γ∨β} ∩ A_i = [x]_{γ∨β}.

Again, we find that I_μ(α|γ ∨ β)(x) = 0 in this case. This completes the proof. □

Proof of Theorem 82. It suffices to prove that h_μ(T, β) ≤ h_μ(T, α) for any finite partition β ⊂ B. Fix ε > 0. Since α is a two-sided generator, we have H_μ(β|α_{−n}^n) < ε for n large enough by Lemma 86. By Lemma 85, we have

  h_μ(T, β) ≤ h_μ(T, α_{−n}^n) + H_μ(β|α_{−n}^n) ≤ h_μ(T, α_{−n}^n) + ε.

By Lemma 84, h_μ(T, α_{−n}^n) = h_μ(T, α), hence h_μ(T, β) ≤ h_μ(T, α) + ε. This is sufficient, because ε is arbitrarily small. □


Theorem 87 (Shannon–McMillan–Breiman). Let (X, B, μ, T) be an ergodic measure preserving system and let α ⊂ B be a finite partition. Then

  (1/N) I_μ(α_0^{N−1}) → h_μ(T, α)  as N → ∞,

pointwise μ-almost everywhere and in L¹(X, μ).

Recall the definition of the information function:

  I_μ(α_0^{N−1})(x) = −log μ([x]_{α_0^{N−1}}).

Therefore, the theorem states that the atom of a typical point in the partition α_0^{N−1} has measure approximately exp(−N h_μ(T, α)). The Shannon–McMillan–Breiman theorem is also useful in calculating the entropy of measure preserving systems. See the last example sheet for some examples.

By the definition of entropy, we know that H_μ(α_0^{N−1}) is approximately N h_μ(T, α) for N sufficiently large. In principle, it could happen that in the partition α_0^{N−1} the atoms have all kinds of different sizes, with a weighted logarithmic average of N h_μ(T, α). However, the next corollary of the Shannon–McMillan–Breiman theorem shows that this does not happen in an ergodic measure preserving system; most atoms, in the sense of measure, have size approximately exp(−N h_μ(T, α)).

Corollary 88 (Asymptotic equipartition property). Let (X, B, μ, T) be an ergodic measure preserving system and let α ⊂ B be a finite partition. Then for any ε > 0, there is N ∈ Z_{>0} such that the following holds for each n ≥ N. There is a subset A ∈ B such that μ(A) ≥ 1 − ε and

(15)  exp(−n(h_μ(T, α) + ε)) ≤ μ([x]_{α_0^{n−1}}) ≤ exp(−n(h_μ(T, α) − ε))

for each x ∈ A.

In our usual interpretation, the corollary has the following intuitive meaning. For n sufficiently large, there is a set of typical sequences of outcomes for the experiment α performed at times 0, …, n − 1. The number of these typical sequences is approximately exp(n h_μ(T, α)) and each is approximately equally likely. Here approximation is understood in the loose sense of (15).

Proof of the corollary. Fix ε > 0 and define A(n, ε) as the set of points x such that (15) holds. By the Shannon–McMillan–Breiman theorem, for almost every x ∈ X, there is N ∈ Z_{>0} such that x ∈ A(n, ε) for every n ≥ N. That is to say,

  μ( ⋃_{N∈Z_{>0}} ⋂_{n=N}^∞ A(n, ε) ) = 1.

Thus μ( ⋂_{n=N}^∞ A(n, ε) ) ≥ 1 − ε for N sufficiently large, as claimed. □
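The asymptotic equipartition property is easy to observe experimentally for a Bernoulli shift: sample random trajectories, compute −(1/n) log μ([x]_{α_0^{n−1}}) using the product formula for cylinder measures, and watch the values concentrate near h = H(p). A Monte Carlo sketch (all names are ours; deterministic thanks to the fixed seed):

```python
import math, random

random.seed(1)
p = [0.6, 0.4]
h = -sum(q * math.log(q) for q in p)  # entropy of the (0.6, 0.4)-Bernoulli shift
n, trials = 2000, 300

def empirical_rate():
    # -(1/n) log mu([x]_{alpha_0^{n-1}}) for a random trajectory x
    logmass = sum(math.log(p[0]) if random.random() < p[0] else math.log(p[1])
                  for _ in range(n))
    return -logmass / n

rates = [empirical_rate() for _ in range(trials)]
close = sum(abs(r - h) < 0.05 for r in rates)
assert close / trials > 0.9  # most trajectories satisfy (15) with epsilon = 0.05
```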


The proof of the Shannon–McMillan–Breiman theorem is based on the following idea. Using the chain rule and invariance, we can write

  I_μ(α_0^{N−1})(x) = Σ_{n=0}^{N−1} I_μ(α_n^n|α_{n+1}^{N−1})(x) = Σ_{n=0}^{N−1} I_μ(α|α_1^{N−1−n})(T^n x).

The sum on the right-hand side looks like an ergodic sum and it would be tempting to apply the pointwise ergodic theorem to conclude the proof. However, in this sum a different function is evaluated at each point, and the ergodic theorem applies only if we evaluate the same function. The idea to overcome this issue is to generalize the notion of conditional information functions to σ-algebras in place of partitions and show that the functions I_μ(α|α_1^{N−1−n}) converge to such a general conditional information function. Then we can replace the functions in the sum by this limit and obtain a genuine ergodic sum.

This generalization requires some preparation. We begin by recalling

the definition of conditional expectation and its basic properties.

Theorem 89. Let (X, B, μ) be a probability space and let A ⊂ B be a σ-algebra. Then for every f ∈ L¹(X, μ), there is f̄ ∈ L¹(X, μ) such that the following holds.

(1) f̄ is A-measurable,
(2) ∫_A f̄ dμ = ∫_A f dμ for all A ∈ A.

If f̄_1, f̄_2 ∈ L¹(X, μ) are two functions that satisfy both items (1) and (2) in the role of f̄, then f̄_1 = f̄_2 holds μ-almost everywhere.

Definition 90. The function f̄ in the above theorem, which is defined up to a set of μ-measure 0, is called the conditional expectation of f with respect to A and is denoted by E(f|A).

We note that

  V_A = { f ∈ L²(X, μ) : f is A-measurable }

is a closed subspace of L²(X, μ), and for f ∈ L²(X, μ), E(f|A) is the orthogonal projection of f to V_A.

Example 91. Suppose that A is generated by a finite partition β. Then a function is A-measurable if and only if it is constant on the atoms of β; in particular, E(f|A) is constant on the atoms of β. Let A be an atom of β. Then for μ-almost every x ∈ A, we have

  E(f|A)(x) = (1/μ(A)) ∫_A E(f|A) dμ = (1/μ(A)) ∫_A f dμ.
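Example 91 translates directly into code: the conditional expectation with respect to a finite partition replaces f on each atom by its μ-average over that atom. A minimal sketch with our own finite representation (points, a probability vector, atoms as index sets):

```python
def cond_exp(mu, f, partition):
    """E(f | A) for the sigma-algebra A generated by a finite partition.

    On each atom A the result is constant, equal to (1/mu(A)) * integral of f over A.
    """
    g = {}
    for atom in partition:
        mass = sum(mu[x] for x in atom)
        avg = sum(mu[x] * f[x] for x in atom) / mass
        for x in atom:
            g[x] = avg
    return g

mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
f = {0: 1.0, 1: 3.0, 2: 2.0, 3: 10.0}
beta = [{0, 1}, {2, 3}]
g = cond_exp(mu, f, beta)

# defining property (2): integrals over atoms agree
for atom in beta:
    assert abs(sum(mu[x] * g[x] for x in atom) - sum(mu[x] * f[x] for x in atom)) < 1e-9
# in particular the total integral is preserved (take A = X)
assert abs(sum(mu[x] * g[x] for x in mu) - sum(mu[x] * f[x] for x in mu)) < 1e-9
```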

The next theorem summarizes some basic properties of conditional expectation.

Theorem 92. Let (X, B, μ) be a probability space, let A, A_1, A_2 ⊂ B be σ-algebras, and let f, f_1, f_2 ∈ L¹(X, μ). Then

(1) E(f_1 + f_2|A) = E(f_1|A) + E(f_2|A),
(2) E(f_1 f_2|A) = f_1 E(f_2|A) if f_1 is A-measurable,
(3) E(f|A_2) = E(E(f|A_1)|A_2) if A_2 ⊂ A_1.

We will also need the following result.

Theorem 93 (Increasing and decreasing martingale theorems). Let (X, B, μ) be a probability space and let A_1, A_2, … ⊂ B and A ⊂ B be σ-algebras. Suppose that either

  A_1 ⊂ A_2 ⊂ … and A is the σ-algebra generated by ⋃ A_i, or
  A_1 ⊃ A_2 ⊃ … and A = ⋂ A_i.

Let f ∈ L¹(X, μ). Then

  lim_{n→∞} E(f|A_n) = E(f|A)

in L¹(X, μ) and pointwise μ-almost everywhere.

We omit the proof, which can be found in standard textbooks. It is a useful exercise to prove this theorem for L²-convergence (in the case f ∈ L²); however, understanding the proof of this result is not required in what follows.

As we have seen above, conditional expectation is particularly easy to compute and work with when the σ-algebra is generated by a finite partition. In many situations, a more complicated σ-algebra A can be approximated by ones generated by finite partitions in the sense of the martingale theorem, i.e. there is a sequence of finer and finer finite partitions which together generate A. In that case, the conditional expectation with respect to A can be approximated by conditional expectations with respect to finite partitions thanks to the martingale theorem.

Definition 94. Let (X, B, μ) be a probability space, let α ⊂ B be a finite partition and let A ⊂ B be a σ-algebra. We define

  I_μ(α|A) = −Σ_{A∈α} 1_A log E(1_A|A),

  H_μ(α|A) = ∫ I_μ(α|A) dμ.

This definition may look arbitrary. To illustrate that it is the right one, we first verify that it coincides with the original definition when A = B(β) for a finite partition β ⊂ B.


Example 95. Let β ⊂ B be a finite partition and let A = B(β) be the σ-algebra it generates. Then

  I_μ(α|A)(x) = −Σ_{A∈α} 1_A(x) log E(1_A|A)(x)

  = −log E(1_{[x]_α}|A)(x) = −log ( ( ∫_{[x]_β} 1_{[x]_α} dμ ) / μ([x]_β) )

  = −log ( μ([x]_α ∩ [x]_β) / μ([x]_β) ) = I_μ(α|β)(x).

As a second reassurance, we prove that the conditional information functions with respect to finer and finer finite partitions converge to the conditional information function with respect to the σ-algebra generated by the partitions. This will also be important in the proof of the Shannon–McMillan–Breiman theorem.

Proposition 96. Let (X, B, μ) be a probability space and let α_1, α_2, … ⊂ B be a sequence of finite partitions such that B(α_i) ⊂ B(α_{i+1}) for all i. Let A = B(α_1 ∨ α_2 ∨ …) be the σ-algebra generated by all the α_i. Let β ⊂ B be another finite partition. Then

  I_μ(β|A) = lim_{n→∞} I_μ(β|α_n)

in L¹ and pointwise μ-almost everywhere. Moreover,

(16)  I*(x) := sup_{n∈Z_{>0}} I_μ(β|α_n)(x) ∈ L¹(X, μ).

Proof. The pointwise convergence follows from the martingale theorem and the definition of the conditional information function. Indeed,

  I_μ(β|A)(x) = −log E(1_{[x]_β}|A)(x)

  = lim_{n→∞} ( −log E(1_{[x]_β}|B(α_n))(x) )

  = lim_{n→∞} I_μ(β|α_n)(x)

holds almost everywhere.

Next, we prove (16). This together with the pointwise convergence implies the L¹ convergence by the dominated convergence theorem.

Fix λ ∈ R_{>0} and A ∈ β. For each x ∈ X, we define n(x) as the smallest n such that

  −log E(1_A|B(α_n))(x) ≥ λ.

We put n(x) = ∞ if there is no such n. For each n, the set

  B_n := { x ∈ X : n(x) = n }

  = { x ∈ X : E(1_A|B(α_n))(x) ≤ exp(−λ) and E(1_A|B(α_j))(x) > exp(−λ) for every j < n }


belongs to B(α_n), hence

  exp(−λ) μ(B_n) ≥ ∫_{B_n} E(1_A|B(α_n)) dμ = ∫_{B_n} 1_A dμ = μ(A ∩ B_n).

Note that the sets B_n are pairwise disjoint and they cover the set

  A_λ := { x ∈ A : I*(x) ≥ λ } = { x ∈ A : inf_n E(1_A|B(α_n))(x) ≤ exp(−λ) }.

Thus

  μ(A_λ) = Σ_{n∈Z_{>0}} μ(A ∩ B_n) ≤ Σ_{n∈Z_{>0}} exp(−λ) μ(B_n) ≤ exp(−λ).

Now

  μ({ x ∈ X : I*(x) ≥ λ }) ≤ Σ_{A∈β} μ(A_λ) ≤ |β| exp(−λ).

Thus

  ∫ I* dμ ≤ Σ_{n∈Z_{>0}} n · μ({ x : n − 1 ≤ I*(x) ≤ n }) ≤ Σ_{n∈Z_{>0}} |β| n exp(−(n − 1)) < ∞,

as claimed. □

Lemma 97. Let (X, B, μ, T) be a measure preserving system and let α ⊂ B be a finite partition. Then

  lim_{n→∞} H_μ(α|α_1^n) = h_μ(T, α).

Proof. Similarly to the calculation of the entropy of Bernoulli shifts, we can write, using the chain rule and invariance,

  H_μ(α_0^{n−1}) = H_μ(α|α_1^{n−1}) + … + H_μ(α|α_1^1) + H_μ(α).

Therefore

  h_μ(T, α) = C-lim_{n→∞} H_μ(α|α_1^{n−1}),

where C-lim denotes the Cesàro limit, i.e. the limit of the averages. Since n ↦ H_μ(α|α_1^{n−1}) is a monotone non-increasing sequence, it converges and its limit must equal its Cesàro limit, which was to be proved. □

Proof of Theorem 87. Recall from the beginning of the section that

  (1/N) I_μ(α_0^{N−1})(x) = (1/N) Σ_{n=0}^{N−1} I_μ(α|α_1^{N−1−n})(T^n x).


We rewrite this as

(17)  (1/N) I_μ(α_0^{N−1})(x) = (1/N) Σ_{n=0}^{N−1} I_μ(α|B(α_1^∞))(T^n x)
      + (1/N) Σ_{n=0}^{N−1} ( I_μ(α|α_1^{N−n−1})(T^n x) − I_μ(α|B(α_1^∞))(T^n x) ),

where B(α_1^∞) denotes the smallest σ-algebra that contains α_1^N for all N.

By the pointwise ergodic theorem, we have

  (1/N) Σ_{n=0}^{N−1} I_μ(α|B(α_1^∞))(T^n x) → ∫ I_μ(α|B(α_1^∞)) dμ.

We prove that ∫ I_μ(α|B(α_1^∞)) dμ = h_μ(T, α). We have

  ∫ I_μ(α|B(α_1^∞)) dμ = lim_{N→∞} ∫ I_μ(α|α_1^N) dμ = lim_{N→∞} H_μ(α|α_1^N) = h_μ(T, α).

It remains to prove that the second term on the right-hand side of (17) converges to 0. To that end, we write

  I_K(x) = sup_{k≥K} | I_μ(α|B(α_1^∞))(x) − I_μ(α|α_1^k)(x) |.

By the previous proposition, we have I_K ∈ L¹(X, μ) for all K, and I_K → 0 in L¹(X, μ) and pointwise μ-almost everywhere. We fix an integer K and write

  | (1/N) Σ_{n=0}^{N−1} ( I_μ(α|α_1^{N−n−1})(T^n x) − I_μ(α|B(α_1^∞))(T^n x) ) |

  ≤ (1/N) Σ_{n=0}^{N−K−1} I_K(T^n x) + (1/N) Σ_{n=N−K}^{N−1} I_0(T^n x),

since N − n − 1 ≥ K for n ≤ N − K − 1. By the pointwise ergodic theorem,

  (1/N) Σ_{n=0}^{N−K−1} I_K(T^n x) ≤ (1/N) Σ_{n=0}^{N−1} I_K(T^n x) → ∫ I_K dμ

as N → ∞. For the second sum, we write

  (1/N) Σ_{n=N−K}^{N−1} I_0(T^n x) = (1/N) Σ_{n=0}^{N−1} I_0(T^n x) − ( (N − K)/N ) · (1/(N − K)) Σ_{n=0}^{N−K−1} I_0(T^n x)

  → ∫ I_0 dμ − ∫ I_0 dμ = 0,

as N → ∞.


We obtained that

  limsup_{N→∞} | (1/N) Σ_{n=0}^{N−1} ( I_μ(α|α_1^{N−n−1})(T^n x) − I_μ(α|B(α_1^∞))(T^n x) ) | ≤ ∫ I_K dμ.

Since K was arbitrary and lim_{K→∞} ∫ I_K dμ = 0, the second term in (17) converges to 0 almost everywhere, which completes the proof. □

14. Mixing and entropy

Earlier in the course we discussed weak mixing as a more convenient

alternative for mixing. Now we discuss a similarly convenient notion

that is stronger than mixing.

Definition 98. A measure preserving system (X, B, μ, T) is called K-mixing (K for Kolmogorov) if the following holds. Let A ∈ B and let α ⊂ B be a finite partition. Then for every ε > 0, there is N ∈ Z_{>0} such that for every B ∈ B(α_N^∞), we have

  |μ(A ∩ B) − μ(A)μ(B)| < ε.

Recall that B(α_N^∞) denotes the σ-algebra generated by the partitions α_N^M for M ∈ Z_{≥N}.

We note that we can take α = {C, X \ C} in the definition for an arbitrary fixed set C ∈ B. This allows us to take B = T^{−n}(C) for n ≥ N and obtain

  |μ(A ∩ T^{−n}C) − μ(A)μ(C)| < ε.

This shows that K-mixing indeed implies mixing. Moreover, it also implies mixing on k sets for any k ∈ Z_{≥2}, which is the content of an exercise on the last example sheet.

Some authors refer to invertible K-mixing systems as K-automorphisms.

The notion of K-mixing can be characterized using entropy or tail σ-algebras. We first define the latter.

Definition 99. Let (X, B, μ, T) be a measure preserving system and let α ⊂ B be a finite partition. The tail σ-algebra of α is

  T(α) = ⋂_{n=0}^∞ B(α_n^∞).

We think of B(α_n^∞) as the knowledge of the results of the experiment α at times n, n + 1, …. In this manner, we think about T(α) as the knowledge of the results of the experiment in the distant future.

Theorem 100. Let (X, B, μ, T) be a measure preserving system. The following are equivalent.

(1) The system is K-mixing.
(2) For each finite partition α ⊂ B, T(α) is trivial, that is, it contains only sets of measure 0 and 1.
(3) The system is of totally positive entropy, that is, we have h_μ(T, α) > 0 for any finite partition α ⊂ B with H_μ(α) > 0.

We note that the Kolmogorov 0–1 law states that T(α) is trivial if the system is a Bernoulli shift and α = {{x : x_0 = i} : i = 1, …, k}. In fact, this is true for any finite partition α ⊂ B, hence Bernoulli shifts are examples of K-mixing systems.

We begin with two easier implications in the theorem.

Proof of (1) ⇒ (2). Suppose the system is K-mixing. Fix a finite partition α, and let A ∈ T(α). Then A ∈ B(α_N^∞) for all N, hence

  |μ(A ∩ A) − μ(A)²| < ε

for any ε > 0. Then μ(A) = μ(A)², hence μ(A) = 0 or 1, as required. □

Proof of (2) ⇒ (1). Fix A ∈ B and fix a finite partition α ⊂ B. Suppose T(α) is trivial. We first show that E(1_A|T(α)) = μ(A) almost everywhere. If this fails, then (replacing A by its complement if necessary)

  B := { x : E(1_A|T(α))(x) ≥ μ(A) + ε }

is of positive measure for some ε > 0. Since B ∈ T(α), μ(B) = 1, and we have

  μ(A) = ∫_B 1_A dμ = ∫_B E(1_A|T(α)) dμ ≥ μ(A) + ε,

a contradiction.

Fix ε > 0. By the martingale convergence theorem, we have

  ‖E(1_A|T(α)) − E(1_A|B(α_N^∞))‖_1 < ε

provided N is sufficiently large, which we may and will assume. Let now B ∈ B(α_N^∞). Then

  μ(A ∩ B) = ∫_B 1_A dμ = ∫_B E(1_A|B(α_N^∞)) dμ.

Therefore, using E(1_A|T(α)) = μ(A),

  |μ(A ∩ B) − μ(A)μ(B)| = | ∫_B ( E(1_A|B(α_N^∞)) − E(1_A|T(α)) ) dμ | < ε,

as required. □

The equivalence with item (3) requires some preparation. We also

need the following generalization of the basic properties of entropy that

we have proved for finite partitions. This lemma is included in the last

example sheet.


Lemma 101. Let (X, B, μ, T) be a measure preserving system, let α, β ⊂ B be finite partitions and let A ⊂ B be a σ-algebra. Suppose that there is a sequence of finite partitions γ_1, γ_2, … ⊂ B such that A is the smallest σ-algebra that contains the atoms of γ_i for all i. We write β ∨ A for the smallest σ-algebra that contains A and the atoms of β. Then

  I_μ(T^{−1}α|T^{−1}A)(x) = I_μ(α|A)(T x),

  H_μ(T^{−1}α|T^{−1}A) = H_μ(α|A),

  I_μ(α ∨ β|A)(x) = I_μ(α|A)(x) + I_μ(β|α ∨ A)(x),

  H_μ(α ∨ β|A) = H_μ(α|A) + H_μ(β|α ∨ A),

  H_μ(α|β ∨ A) ≤ H_μ(α|A).

We also recall the following result from the last example sheet, which

is a variant of Proposition 96.

Proposition 102. Let (X, B, μ) be a probability space and let A_1, A_2, … ⊂ B be a sequence of σ-algebras such that A_i ⊃ A_{i+1} for all i. Let A = ⋂ A_i. Let α ⊂ B be a finite partition. Then

  I_μ(α|A) = lim_{n→∞} I_μ(α|A_n)

in L¹ and pointwise μ-almost everywhere.

The next two lemmata will be used in the proof of the implication (2) ⇒ (3).

Lemma 103. Let (X, B, μ, T) be a measure preserving system and let α ⊂ B be a finite partition. Then

  h_μ(T, α) = H_μ(α|B(α_1^∞)).

Proof. In the previous section, we showed that

  h_μ(T, α) = lim_{n→∞} H_μ(α|α_1^n).

Now the claim follows from the convergence of the entropy conditioned on partitions to the entropy conditioned on the σ-algebra generated by the partitions, which was proved in Proposition 96. □

Lemma 104. Let (X, B, μ, T) be a measure preserving system and let α ⊂ B be a finite partition. If h_μ(T, α) = 0, then H_μ(α|T(α)) = 0.

Proof. Suppose h_μ(T, α) = 0. By the previous lemma, we have

  H_μ(α|B(α_1^∞)) = 0.

Using the chain rule and invariance, we can write

  H_μ(α|B(α_n^∞)) ≤ H_μ(α_0^{n−1}|B(α_n^∞)) = Σ_{j=0}^{n−1} H_μ(α_j^j|B(α_{j+1}^∞))

  = Σ_{j=0}^{n−1} H_μ(α|B(α_1^∞)) = 0.

Letting n → ∞ and using the decreasing martingale theorem (in the form of Proposition 102), we conclude H_μ(α|T(α)) = 0. □


Proof of (2) ⇒ (3). Suppose (2) holds and h_μ(T, α) = 0 for some finite partition α ⊂ B. We show that H_μ(α) = 0, which completes the proof. By the previous lemma, we have H_μ(α|T(α)) = 0. By (2), we know that T(α) is trivial, hence

  I_μ(α|T(α))(x) = −log E(1_{[x]_α}|T(α))(x) = −log μ([x]_α),

and H_μ(α) = H_μ(α|T(α)) = 0, as required. □

The proof of the implication (3) ⇒ (2) requires the following proposition.

Proposition 105. Let (X, B, μ, T) be a measure preserving system and let α, β ⊂ B be two finite partitions. Then

  h_μ(T, α) = H_μ(α|B(α_1^∞) ∨ T(β)).

We begin with two lemmata.

Lemma 106. Let (X, B, μ, T) be a measure preserving system and let α ⊂ B be a finite partition. Then

  h_μ(T, α) = (1/n) H_μ(α_0^{n−1}|B(α_n^∞))

for each n.

Proof. By the chain rule and invariance, we have

  H_μ(α_0^{n−1}|B(α_n^∞)) = Σ_{j=0}^{n−1} H_μ(α_j^j|B(α_{j+1}^∞)) = n H_μ(α|B(α_1^∞)),

and the claim follows from Lemma 103. □

Lemma 107. Let (X, B, μ, T) be a measure preserving system and let α, β ⊂ B be two finite partitions. Then

  h_μ(T, α) = lim_{n→∞} (1/n) H_μ(α_0^{n−1}|B(α_n^∞) ∨ B(β_n^∞)).

Proof. We first note that

  H_μ(α_0^{n−1}|B(α_n^∞) ∨ B(β_n^∞)) ≤ H_μ(α_0^{n−1})

for each n. Therefore,

  h_μ(T, α) ≥ limsup_{n→∞} (1/n) H_μ(α_0^{n−1}|B(α_n^∞) ∨ B(β_n^∞)).

In order to prove

(18)  h_μ(T, α) ≤ liminf_{n→∞} (1/n) H_μ(α_0^{n−1}|B(α_n^∞) ∨ B(β_n^∞)),

we suppose that it fails and derive a contradiction.


We can write

  H_μ(α_0^{n−1} ∨ β_0^{n−1}|B(α_n^∞) ∨ B(β_n^∞)) = H_μ(α_0^{n−1}|B(α_n^∞) ∨ B(β_n^∞)) + H_μ(β_0^{n−1}|B(α_0^∞) ∨ B(β_n^∞)),

  H_μ(α_0^{n−1} ∨ β_0^{n−1}|B(α_n^∞)) = H_μ(α_0^{n−1}|B(α_n^∞)) + H_μ(β_0^{n−1}|B(α_0^∞)).

We note that

  H_μ(β_0^{n−1}|B(α_0^∞)) ≥ H_μ(β_0^{n−1}|B(α_0^∞) ∨ B(β_n^∞))

for each n, and the failure of (18) yields (using the previous lemma)

  lim_{n→∞} (1/n) H_μ(α_0^{n−1}|B(α_n^∞)) > liminf_{n→∞} (1/n) H_μ(α_0^{n−1}|B(α_n^∞) ∨ B(β_n^∞)).

Therefore

  liminf_{n→∞} (1/n) H_μ(α_0^{n−1} ∨ β_0^{n−1}|B(α_n^∞) ∨ B(β_n^∞))

  < limsup_{n→∞} (1/n) H_μ(α_0^{n−1} ∨ β_0^{n−1}|B(α_n^∞))

  ≤ limsup_{n→∞} (1/n) H_μ(α_0^{n−1} ∨ β_0^{n−1})

  = h_μ(T, α ∨ β).

This contradicts, however, the previous lemma applied to the partition α ∨ β, since (α ∨ β)_0^{n−1} = α_0^{n−1} ∨ β_0^{n−1} and B((α ∨ β)_n^∞) = B(α_n^∞) ∨ B(β_n^∞). Hence (18) and the lemma hold. □

Proof of the proposition. Using the chain rule and invariance, we can write

  H_μ(α_0^{n−1}|B(α_n^∞) ∨ B(β_n^∞)) = Σ_{j=0}^{n−1} H_μ(α_j^j|B(α_{j+1}^∞) ∨ B(β_n^∞))

  = Σ_{j=0}^{n−1} H_μ(α|B(α_1^∞) ∨ B(β_{n−j}^∞)).

Hence, by Lemma 107,

  h_μ(T, α) = C-lim_{n→∞} H_μ(α|B(α_1^∞) ∨ B(β_n^∞)).

By Proposition 102, we have

  lim_{n→∞} H_μ(α|B(α_1^∞) ∨ B(β_n^∞)) = H_μ(α|B(α_1^∞) ∨ T(β)),

and a convergent sequence has the same Cesàro limit, so the proposition follows. □

Proof of (3) ⇒ (2). Suppose that (3) holds. Suppose to the contrary that there is a finite partition β ⊂ B such that T(β) is not trivial, that is, it contains a set A ∈ T(β) with 0 < μ(A) < 1.

Let α = {A, A^c}, so H_μ(α) > 0. We prove that

(19)  H_μ(α|B(α_1^∞) ∨ T(β)) = 0,

hence h_μ(T, α) = 0 by Proposition 105, contradicting (3) and finishing the proof of the theorem.

We calculate

  I_μ(α|B(α_1^∞) ∨ T(β))(x) = −log E(1_{[x]_α}|B(α_1^∞) ∨ T(β))(x).

Since [x]_α is either A or A^c, which are both contained in T(β), we see that 1_{[x]_α} is B(α_1^∞) ∨ T(β)-measurable, hence

  E(1_{[x]_α}|B(α_1^∞) ∨ T(β)) = 1_{[x]_α}.

This proves that

  I_μ(α|B(α_1^∞) ∨ T(β))(x) = 0

for μ-almost all x, hence (19) holds. □
