
ϕ-Normal Numbers

Jonah Bloch-Johnson
May 11, 2008
Abstract
The sequence of quotients of the simple continued fraction representation of almost every number in [0,1] contains the natural number k with asymptotic frequency:
\[ \frac{1}{\ln 2} \ln \frac{(k+1)^2}{k(k+2)} \]
This result has a number of applications; for instance, it tells us the
distribution of streaks one would get after tossing a coin infinitely many
times.
Introduction
This paper was prepared for Ming-Lun Hsieh for an undergraduate seminar. It is
based on research I have been performing with David Bayer. The seminar was on
number theory; whereas most number theory is concerned with properties of the
natural numbers, this paper is concerned with a property of real numbers found in
the interval [0, 1]. More specifically, the paper is centered around defining a prop-
erty, that of being ϕ-normal, which is named as an analogy with the more familiar
normal real numbers (don’t worry if you don’t know these–they are defined in
the paper.) The subject requires an introduction to continued fractions, Lebesgue
integration, and dynamical systems–specifically a dynamical system known as the
Gauss transformation, as well as an important theorem of ergodic systems, the
Birkhoff Ergodic Theorem. As a result, this paper consists primarily of condensa-
tion of material from three books:
“An Introduction to the Theory of Numbers” by G.H.Hardy and E.M.Wright
(Chapter 10)
“Real and Complex Analysis” by Walter Rudin (Chapter 1)
“Introduction to Dynamical Systems” by Michael Brin and Garrett Stuck
(Chapters 1 and 4)
I also used a paper found online to help prove the corollary for the Birkhoff Ergodic
Theorem. My original material consists of those sections which join together
the books and the fleshing out of some of the proofs, as well as the application at
the end of the paper, which was inspired by ideas from David Bayer. A note on
numbering conventions in this paper: if a theorem is introduced as “6 Theorem:,”
then it is referred to in the text as Theorem 6.
A final note: the material in this subject touches upon two rather beautiful
structures in number theory: periodic continued fractions and the Stern-Brocot
tree. In class, people expressed particular interest in the theorem of Lagrange,
that all continued fractions representing irrational roots of integer-coefficient
quadratic equations are periodic, as well as its converse. Unfortunately, there is
not room in this paper to develop the theory behind either of them, but those who
want to investigate can look in Chapter 10 of the Hardy and Wright book for the
continued fraction material and in Chapters 4 and 6 of Donald Knuth’s “Concrete
Mathematics” to learn more about the Stern-Brocot tree.
1 The Gauss Transformation and Continued Fractions
1 Definition: A discrete-time dynamical system $(X, f)$ consists of a non-empty set $X$ and a map $f : X \to X$. ($f$ need not be onto.) For $n \in \mathbb{N}$, the $n$th iterate of $f$ is the $n$-fold composition $f^n = f \circ \cdots \circ f$; we define $f^0$ to be the identity map (also denoted Id).
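2 Definition: Let $\{x\}$ denote the greatest integer less than or equal to $x$ for $x \in \mathbb{R}$. The map $\phi : [0, 1] \to [0, 1]$ defined by
\[ \phi(x) = \begin{cases} \frac{1}{x} - \left\{\frac{1}{x}\right\} & \text{if } x \in (0, 1] \\ 0 & \text{if } x = 0 \end{cases} \]
is called the Gauss transformation.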
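3 Remarks: a) On each interval $\left(\frac{1}{n+1}, \frac{1}{n}\right]$, $n \in \mathbb{N}$, we have $\phi(x) = \frac{1}{x} - n$. Therefore, on every such interval $\phi(x)$ is monotonically decreasing, continuous, and has range $[0,1)$.
b) $\forall n \in \mathbb{N}$, $\phi\left(\frac{1}{n}\right) = n - \{n\} = n - n = 0$.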
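c) $\forall n \in \mathbb{N}$ with $n \ge 2$, $\lim_{\epsilon \to 0^+} \phi\left(\frac{1}{n} + \epsilon\right) = 1$, because for small $\epsilon$ the point $\frac{1}{n} + \epsilon$ lies in $\left(\frac{1}{n}, \frac{1}{n-1}\right]$, where $\phi(x) = \frac{1}{x} - (n-1)$. So, based on 3 b), $\phi$ is discontinuous at $\frac{1}{n}$ for all $n \ge 2$.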
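The remarks above can be checked numerically. The following is a minimal sketch, assuming exact rational arithmetic is acceptable for illustration; the function name `gauss_map` is hypothetical and not part of the paper.

```python
from fractions import Fraction
from math import floor

def gauss_map(x):
    """Gauss transformation: 1/x - floor(1/x) on (0, 1], and 0 at x = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    inv = 1 / x
    return inv - floor(inv)

# Remark 3 b): the map sends every reciprocal 1/n to 0.
values_at_reciprocals = [gauss_map(Fraction(1, n)) for n in range(1, 6)]

# Remark 3 a): on (1/3, 1/2] the map agrees with 1/x - 2.
x = Fraction(2, 5)
on_branch = gauss_map(x)
```

Using `Fraction` rather than floating point keeps the floor exact, which matters near the branch endpoints $\frac{1}{n}$.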
When we study a dynamical system (X,f), we are mainly interested in what
happens to the elements of X after repeated iterations of f. In the case
of the Gauss transformation, questions of this sort are greatly simplified by
knowledge of continued fractions.
4 Definition: The function of $N+1$ variables
\[ x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_N}}}} \]
is called a finite continued fraction, and is often rewritten $[a_0, a_1, a_2, \ldots, a_N]$. The numbers $a_0, a_1, a_2, \ldots, a_N$ are called the partial quotients or simply quotients of the continued fraction. Here are some facts about continued fractions that follow quickly from their definition:
\[ [a_0, a_1] = a_0 + \frac{1}{a_1} \]
\[ [a_0, a_1, \ldots, a_{n-1}, a_n] = \left[a_0, a_1, \ldots, a_{n-2}, a_{n-1} + \frac{1}{a_n}\right] \tag{1} \]
\[ [a_0, a_1, \ldots, a_n] = a_0 + \frac{1}{[a_1, a_2, \ldots, a_n]} = [a_0, [a_1, a_2, \ldots, a_n]] \]
and more generally,
\[ [a_0, a_1, \ldots, a_n] = [a_0, a_1, \ldots, a_{m-1}, [a_m, a_{m+1}, \ldots, a_n]], \quad \text{for } 1 \le m < n \tag{2} \]
5 Definition: We call $[a_0, a_1, \ldots, a_n]$ $(0 \le n \le N)$ the $n$th convergent to $[a_0, a_1, \ldots, a_N]$.
6 Theorem: If $p_n$ and $q_n$ are defined by
\[ p_0 = a_0, \quad p_1 = a_1 a_0 + 1, \quad p_n = a_n p_{n-1} + p_{n-2} \quad (2 \le n \le N) \tag{3} \]
\[ q_0 = 1, \quad q_1 = a_1, \quad q_n = a_n q_{n-1} + q_{n-2} \quad (2 \le n \le N) \tag{4} \]
then
\[ [a_0, a_1, \ldots, a_n] = \frac{p_n}{q_n}. \]
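Proof. If $n = 0$,
\[ \frac{p_0}{q_0} = \frac{a_0}{1} = [a_0]. \]
If $n = 1$,
\[ \frac{p_1}{q_1} = \frac{a_1 a_0 + 1}{a_1} = a_0 + \frac{1}{a_1} = [a_0, a_1]. \]
Suppose it is true for $n \le m$, where $m < N$. Then
\[ [a_0, a_1, \ldots, a_{m-1}, a_m] = \frac{p_m}{q_m} = \frac{a_m p_{m-1} + p_{m-2}}{a_m q_{m-1} + q_{m-2}}. \]
Note that $p_{m-1}, p_{m-2}, q_{m-1}, q_{m-2}$ depend only on $a_0, \ldots, a_{m-1}$, so this formula remains valid when $a_m$ is replaced by another value. It is (1) that allows the recursion to propagate:
\begin{align*}
[a_0, a_1, \ldots, a_{m-1}, a_m, a_{m+1}] &= \left[a_0, a_1, \ldots, a_{m-1}, a_m + \frac{1}{a_{m+1}}\right] \\
&= \frac{\left(a_m + \frac{1}{a_{m+1}}\right) p_{m-1} + p_{m-2}}{\left(a_m + \frac{1}{a_{m+1}}\right) q_{m-1} + q_{m-2}} \\
&= \frac{a_m p_{m-1} + \frac{p_{m-1}}{a_{m+1}} + p_{m-2}}{a_m q_{m-1} + \frac{q_{m-1}}{a_{m+1}} + q_{m-2}} \\
&= \frac{a_{m+1}(a_m p_{m-1} + p_{m-2}) + p_{m-1}}{a_{m+1}(a_m q_{m-1} + q_{m-2}) + q_{m-1}} \\
&= \frac{a_{m+1} p_m + p_{m-1}}{a_{m+1} q_m + q_{m-1}} = \frac{p_{m+1}}{q_{m+1}}.
\end{align*}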
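As an illustration of Theorem 6, here is a minimal sketch that builds $p_n$ and $q_n$ from recurrences (3) and (4) and compares them with a direct evaluation of the nested fraction. The function names and the seed values $p_{-1} = 1$, $q_{-1} = 0$ are assumptions made for convenience (they reproduce $p_1 = a_1 a_0 + 1$ and $q_1 = a_1$), not notation from the paper.

```python
from fractions import Fraction

def evaluate(quotients):
    """Evaluate [a_0, ..., a_N] directly from the nested-fraction definition."""
    value = Fraction(quotients[-1])
    for a in reversed(quotients[:-1]):
        value = a + 1 / value
    return value

def convergents(quotients):
    """Return [(p_0, q_0), ..., (p_N, q_N)] using recurrences (3) and (4)."""
    p_prev, q_prev = 1, 0          # assumed seeds p_{-1} = 1, q_{-1} = 0
    p, q = quotients[0], 1         # p_0 = a_0, q_0 = 1
    pairs = [(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        pairs.append((p, q))
    return pairs

quotients = [3, 7, 15, 1]
pairs = convergents(quotients)     # [(3, 1), (22, 7), (333, 106), (355, 113)]
```

Each pair $(p_n, q_n)$ should agree with `evaluate(quotients[:n + 1])`, which is exactly the statement of the theorem.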
7 Theorem: $p_n$ and $q_n$, as defined in Theorem 6, satisfy:
\[ p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1} \]
Proof.
\begin{align*}
p_n q_{n-1} - p_{n-1} q_n &= (a_n p_{n-1} + p_{n-2}) q_{n-1} - p_{n-1}(a_n q_{n-1} + q_{n-2}) \\
&= a_n p_{n-1} q_{n-1} + p_{n-2} q_{n-1} - a_n p_{n-1} q_{n-1} - p_{n-1} q_{n-2} \\
&= -(p_{n-1} q_{n-2} - p_{n-2} q_{n-1}).
\end{align*}
By simply repeating this deduction with $n-1, n-2, \ldots, 2$ in place of $n$, we get
\[ p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}(p_1 q_0 - p_0 q_1) = (-1)^{n-1}\left((a_0 a_1 + 1) \cdot 1 - a_0 a_1\right) = (-1)^{n-1}. \]
Further, we have
\[ \frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = \frac{p_n q_{n-1}}{q_n q_{n-1}} - \frac{p_{n-1} q_n}{q_n q_{n-1}} = \frac{p_n q_{n-1} - p_{n-1} q_n}{q_n q_{n-1}} = \frac{(-1)^{n-1}}{q_n q_{n-1}} \tag{5} \]
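8 Theorem:
\[ p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n \]
Proof.
\begin{align*}
p_n q_{n-2} - p_{n-2} q_n &= (a_n p_{n-1} + p_{n-2}) q_{n-2} - p_{n-2}(a_n q_{n-1} + q_{n-2}) \\
&= a_n p_{n-1} q_{n-2} + p_{n-2} q_{n-2} - a_n p_{n-2} q_{n-1} - p_{n-2} q_{n-2} \\
&= a_n (p_{n-1} q_{n-2} - p_{n-2} q_{n-1}) = (-1)^{n-2} a_n = (-1)^n a_n.
\end{align*}
Thus, dividing by $q_n q_{n-2}$,
\[ \frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} = \frac{(-1)^n a_n}{q_n q_{n-2}} \tag{6} \]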
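The identities of Theorems 7 and 8 are easy to spot-check. The sketch below is illustrative only: the helper `convergents` and the sample quotient list are assumptions, chosen arbitrarily.

```python
def convergents(quotients):
    """Return [(p_n, q_n)] from the recurrences of Theorem 6 (assumed seeds 1, 0)."""
    p_prev, q_prev, p, q = 1, 0, quotients[0], 1
    pairs = [(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        pairs.append((p, q))
    return pairs

a = [2, 1, 2, 1, 1, 4, 1, 1, 6]
pq = convergents(a)

# Theorem 7: p_n q_{n-1} - p_{n-1} q_n = (-1)^(n-1) for n >= 1.
thm7_holds = all(
    pq[n][0] * pq[n - 1][1] - pq[n - 1][0] * pq[n][1] == (-1) ** (n - 1)
    for n in range(1, len(a))
)

# Theorem 8: p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n for n >= 2.
thm8_holds = all(
    pq[n][0] * pq[n - 2][1] - pq[n - 2][0] * pq[n][1] == (-1) ** n * a[n]
    for n in range(2, len(a))
)
```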
In general, we write
\[ x_n = \frac{p_n}{q_n} = \text{the } n\text{th convergent} \]
and $x = x_N$. Based on (2),
\[ x = [a_0, a_1, \ldots, a_N] = [a_0, a_1, \ldots, a_{m-1}, [a_m, a_{m+1}, \ldots, a_N]] = \frac{[a_m, a_{m+1}, \ldots, a_N]\, p_{m-1} + p_{m-2}}{[a_m, a_{m+1}, \ldots, a_N]\, q_{m-1} + q_{m-2}} \tag{7} \]
for $2 \le m \le N$.
9 Theorem: Suppose that
\[ a_j > 0 \quad (1 \le j \le N). \tag{8} \]
Then
a) The even convergents $x_{2n}$ increase strictly with $n$, while the odd convergents $x_{2n+1}$ decrease strictly.
Proof. Based on (4) and (8), we have that every $q_n$ is positive, as is every $a_n$ where $1 \le n \le N$. Hence
\[ \frac{(-1)^n a_n}{q_{n-2} q_n} \]
has the sign of $(-1)^n$. By (6),
\[ \frac{(-1)^n a_n}{q_{n-2} q_n} = \frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} = x_n - x_{n-2}. \]
Thus, $x_n - x_{n-2}$ has the sign of $(-1)^n$; so if $n$ is even, $x_n > x_{n-2}$, and if $n$ is odd, $x_n < x_{n-2}$, $\forall n > 1$.
b) Every odd convergent is greater than any even convergent.
Proof. Every $q_n$ is positive, so
\[ \frac{(-1)^{n-1}}{q_n q_{n-1}} \]
has the sign of $(-1)^{n-1}$. So, after (5), $x_n - x_{n-1}$ has the sign of $(-1)^{n-1}$. Specifically, $x_{2m+1} - x_{2m}$ has the sign of $(-1)^{(2m+1)-1} = (-1)^{2m} = 1$, so
\[ x_{2m+1} > x_{2m}. \tag{9} \]
Suppose Theorem 9 b) were false. Then $\exists m, \mu$ s.t.
\[ x_{2m+1} \le x_{2\mu}. \]
If $\mu = m$, this contradicts (9) directly. If $\mu < m$, then after Thm 9 a), $x_{2m+1} \le x_{2\mu} < x_{2m}$, contradicting (9); if $\mu > m$, then $x_{2\mu+1} < x_{2m+1} \le x_{2\mu}$, also contradicting (9).
c) The value of the continued fraction is greater than that of any of its even
convergents and less than that of any of its odd convergents (except
that it is equal to the last convergent, whether this be even or odd.)
Proof. $x = x_N$ is the greatest of the even or least of the odd convergents,
in either of which case 9 a) and b) show 9 c) to be true.
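A small numerical check of Theorem 9 may help fix the picture of even convergents climbing towards $x$ and odd convergents descending towards it. This is only a sketch; the helper name and the sample quotients are assumptions.

```python
from fractions import Fraction

def convergents(quotients):
    """Return the convergents x_0, ..., x_N as exact fractions."""
    p_prev, q_prev, p, q = 1, 0, quotients[0], 1
    values = [Fraction(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        values.append(Fraction(p, q))
    return values

x = convergents([0, 1, 2, 3, 4, 5, 6])
even, odd = x[0::2], x[1::2]

even_increasing = all(s < t for s, t in zip(even, even[1:]))   # 9 a)
odd_decreasing = all(s > t for s, t in zip(odd, odd[1:]))      # 9 a)
odd_above_even = all(o > e for o in odd for e in even)         # 9 b)
value_is_last = x[-1] == max(even)                             # 9 c), N even here
```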
10 Definition: A continued fraction $[a_0, a_1, \ldots, a_N]$ wherein $a_0 \in \mathbb{Z}$ and $a_j \in \mathbb{N}$ for all $j$ such that $1 \le j \le N$ is called a simple continued fraction.
Theorem 9 holds for simple continued fractions; further, from (3) and (4) we have that for a simple continued fraction, $p_n \in \mathbb{Z}$ and $q_n \in \mathbb{N}$.
If $[a_0, a_1, a_2, \ldots, a_N] = \frac{p_N}{q_N} = x$, we say that $x$ (which is in $\mathbb{Q}$) is represented by that continued fraction.
11 Theorem: a) $q_n \ge q_{n-1}$ for $n \ge 1$, with strict inequality when $n > 1$, and
b) $q_n \ge n$, with strict inequality when $n > 3$.
Proof. $q_0 = 1$, $q_1 = a_1 \ge 1 = q_0$.
For $n > 1$, $q_{n-2} \ge 1$; $q_n = a_n q_{n-1} + q_{n-2} \ge q_{n-1} + 1$, so $q_n > q_{n-1}$, and by induction $q_n \ge n$.
Finally, if $n > 3$, then $q_{n-2} \ge n - 2 > 1$, so $q_n \ge q_{n-1} + q_{n-2} > q_{n-1} + 1 \ge n$, so $q_n > n$.
12 Theorem: The convergents to a simple continued fraction are in their lowest terms.
Proof. $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}$. So, $d \mid p_n$ and $d \mid q_n$ imply $d \mid (-1)^{n-1}$, that is, $d \mid 1$; thus, $\gcd(p_n, q_n) = 1$.
13 Theorem: If $x$ is representable by a simple continued fraction with an odd number of convergents, then it is also representable by one with an even number, and vice versa.
Proof. If $a_N \ge 2$, $[a_0, a_1, \ldots, a_N] = [a_0, a_1, \ldots, a_{N-1}, a_N - 1, 1]$;
if $a_N = 1$, $[a_0, a_1, \ldots, a_{N-1}, 1] = [a_0, a_1, \ldots, a_{N-2}, a_{N-1} + 1]$.
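For a concrete instance of Theorem 13, the sketch below evaluates two expansions of the same rational number, one ending in a quotient greater than 1 and one ending in 1. The function name `evaluate` and the sample values are assumptions for illustration.

```python
from fractions import Fraction

def evaluate(quotients):
    """Evaluate the finite continued fraction [a_0, ..., a_N] exactly."""
    value = Fraction(quotients[-1])
    for a in reversed(quotients[:-1]):
        value = a + 1 / value
    return value

short_form = evaluate([0, 2, 4])      # last quotient 4 >= 2
long_form = evaluate([0, 2, 3, 1])    # 4 replaced by 3, 1
```

Both forms evaluate to $\frac{4}{9}$; the first has three convergents and the second has four.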
14 Definition: We call $a'_n = [a_n, a_{n+1}, \ldots, a_N]$ $(0 \le n \le N)$ the $n$th complete quotient of the continued fraction $[a_0, a_1, \ldots, a_n, \ldots, a_N]$. Thus:
\[ x = a'_0, \qquad x = a_0 + \frac{1}{a'_1}. \]
From (7), we have
\[ x = \frac{a'_n p_{n-1} + p_{n-2}}{a'_n q_{n-1} + q_{n-2}} \quad (2 \le n \le N). \tag{10} \]
15 Theorem: $a_n = \{a'_n\}$, except $a_{N-1} = \{a'_{N-1}\} - 1$ when $a_N = 1$.
Proof. If $N = 0$, then $a_0 = a'_0 = \{a'_0\}$.
If $N > 0$, then
\[ a'_n = a_n + \frac{1}{a'_{n+1}} \quad (0 \le n \le N-1). \tag{11} \]
Lemma: $a'_{n+1} > 1$ $(0 \le n \le N-1)$, except that $a'_{n+1} = 1$ when $n = N-1$ and $a_N = 1$.
Proof. We will prove this using backwards induction:
For $n = N-1$, $a'_{(N-1)+1} = a'_N = a_N$; if $a'_N = 1$, then $a_N = 1$. Otherwise, since $a_N \in \mathbb{N}$, $a_N > 1$, so $a'_N > 1$.
Suppose $a'_{n+1} \ge 1$ with $n \ge 1$. Then $0 < \frac{1}{a'_{n+1}}$ and $1 \le a_n$. So, $a'_n = a_n + \frac{1}{a'_{n+1}} > 1 + 0 = 1$.
Since $a'_{n+1} > 1$ for $0 \le n \le N-1$ (except when $n = N-1$ and $a_N = 1$), we have $0 < \frac{1}{a'_{n+1}} < 1$. Thus,
\[ a_n < a_n + \frac{1}{a'_{n+1}} = a'_n < a_n + 1. \tag{12} \]
So, for $n \le N-1$, we have $a_n = \{a'_n\}$ except when $n = N-1$ and $a_N = 1$, in which case $a'_{N-1} = a_{N-1} + \frac{1}{a_N} = a_{N-1} + 1$, so that $a'_{N-1} = \{a'_{N-1}\}$. Further,
\[ a_{N-1} = a'_{N-1} - \frac{1}{a_N} = a'_{N-1} - \frac{1}{1} = a'_{N-1} - 1 = \{a'_{N-1}\} - 1. \]
Finally, $a_N = a'_N = \{a'_N\}$.
16 Definition: Two continued fractions are identical if they are formed by
exactly the same sequence of quotients.
17 Theorem: If two simple continued fractions $[a_0, a_1, \ldots, a_N]$ and $[b_0, b_1, \ldots, b_M]$ have the same value $x$, and $a_N > 1$ and $b_M > 1$, then $M = N$, and the fractions are identical.
Proof. By Theorem 15, $a_0 = \{a'_0\} = \{x\} = \{b'_0\} = b_0$.
Suppose the first $n$ partial quotients in the continued fractions are identical, and that $a'_n$ and $b'_n$ are the $n$th complete quotients. Then $x = [a_0, a_1, \ldots, a_{n-1}, a'_n] = [a_0, a_1, \ldots, a_{n-1}, b'_n]$.
If $n = 1$, $a_0 + \frac{1}{a'_1} = a_0 + \frac{1}{b'_1}$; so $a'_1 = b'_1$; so $a_1 = \{a'_1\} = \{b'_1\} = b_1$.
If $n > 1$, then by (10),
\[ x = \frac{a'_n p_{n-1} + p_{n-2}}{a'_n q_{n-1} + q_{n-2}} = \frac{b'_n p_{n-1} + p_{n-2}}{b'_n q_{n-1} + q_{n-2}}. \]
Cross-multiplying and expanding,
\begin{align*}
(a'_n p_{n-1} + p_{n-2})(b'_n q_{n-1} + q_{n-2}) - (b'_n p_{n-1} + p_{n-2})(a'_n q_{n-1} + q_{n-2}) &= 0, \\
a'_n b'_n p_{n-1} q_{n-1} + p_{n-2} q_{n-2} + a'_n p_{n-1} q_{n-2} + b'_n p_{n-2} q_{n-1} \qquad & \\
- \left(a'_n b'_n p_{n-1} q_{n-1} + p_{n-2} q_{n-2} + b'_n p_{n-1} q_{n-2} + a'_n p_{n-2} q_{n-1}\right) &= 0, \\
(a'_n - b'_n)(p_{n-1} q_{n-2} - p_{n-2} q_{n-1}) &= 0, \\
(a'_n - b'_n)(-1)^n &= 0,
\end{align*}
where the last step uses Theorem 7. Hence $a'_n = b'_n$.
Since $a_N > 1$ and $b_M > 1$, Theorem 15 gives $a_n = \{a'_n\} = \{b'_n\} = b_n$, so $a_n = b_n$.
Suppose, without loss of generality, that $N \le M$. Then our argument shows that $a_n = b_n$ for $n \le N$. If $N < M$, then
\[ \frac{p_N}{q_N} = [a_0, a_1, \ldots, a_N] = [b_0, b_1, \ldots, b_N, b_{N+1}, \ldots, b_M] = \frac{b'_{N+1} p_N + p_{N-1}}{b'_{N+1} q_N + q_{N-1}}. \]
\[ p_N (b'_{N+1} q_N + q_{N-1}) = q_N (b'_{N+1} p_N + p_{N-1}) \]
\[ p_N q_{N-1} - p_{N-1} q_N = 0. \]
This contradicts Theorem 7. So, $M = N$, and the fractions are identical.
Let $x$ be any real number, and let $a_0 = \{x\}$. Then
\[ x = a_0 + \xi_0, \quad 0 \le \xi_0 < 1. \]
If $\xi_0 \ne 0$, then we can write
\[ \frac{1}{\xi_0} = a'_1, \quad \{a'_1\} = a_1, \quad a'_1 = a_1 + \xi_1, \quad 0 \le \xi_1 < 1. \]
If $\xi_1 \ne 0$, then we can write
\[ \frac{1}{\xi_1} = a'_2, \quad \{a'_2\} = a_2, \quad a'_2 = a_2 + \xi_2, \quad 0 \le \xi_2 < 1. \]
In general, $\frac{1}{\xi_{n-1}} = a'_n > 1$, so $a_n \ge 1$ for $n \ge 1$.
Thus, $x = [a_0, a'_1] = \left[a_0, a_1 + \frac{1}{a'_2}\right] = [a_0, a_1, a'_2] = [a_0, a_1, a_2, a'_3] = \ldots$, where $a_0, a_1, \ldots$ are integers, and $\forall j \in \mathbb{N}$, $a_j \in \mathbb{N}$.
18 Definition: The system of equations
\begin{align*}
x &= a_0 + \xi_0 & (0 \le \xi_0 < 1) \\
\frac{1}{\xi_0} &= a'_1 = a_1 + \xi_1 & (0 \le \xi_1 < 1) \\
\frac{1}{\xi_1} &= a'_2 = a_2 + \xi_2 & (0 \le \xi_2 < 1) \\
&\ \ \vdots &
\end{align*}
is known as the continued fraction algorithm. The algorithm continues so long as $\xi_n \ne 0$. If we eventually reach a value of $n$, say $N$, for which $\xi_N = 0$, then the algorithm terminates and we have:
\[ x = [a_0, a_1, \ldots, a_N] \]
In this case, $x$ is represented by a simple continued fraction, and the numbers $a'_n$ are the complete quotients of the continued fraction.
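The algorithm above can be sketched in a few lines for rational input, where it is guaranteed to stop. The function name `cf_expand` and the safety cap `max_terms` are assumptions introduced for this illustration.

```python
from fractions import Fraction
from math import floor

def cf_expand(x, max_terms=50):
    """Run the continued fraction algorithm on an exact rational x.

    Returns the quotients [a_0, a_1, ..., a_N]; stops when the remainder
    xi_n is 0, or after max_terms quotients as a safety cap.
    """
    x = Fraction(x)
    quotients = []
    for _ in range(max_terms):
        a = floor(x)            # a_n = {a'_n}
        quotients.append(a)
        xi = x - a              # 0 <= xi_n < 1
        if xi == 0:
            break
        x = 1 / xi              # a'_{n+1} = 1 / xi_n
    return quotients

expansion = cf_expand(Fraction(355, 113))   # [3, 7, 16]
negative = cf_expand(Fraction(-7, 3))       # [-3, 1, 2]
```

Note that $a_0$ may be negative while every later quotient is a natural number, matching Definition 10.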
19 Theorem: Any rational number $x$ can be represented by a finite simple continued fraction.
Proof. Put $x$ into the continued fraction algorithm:
Case 1: $x$ is an integer.
Then $\xi_0 = 0$, so $x = a_0$.
Case 2: $x$ is not an integer.
Then $x = \frac{h}{k}$, $h \in \mathbb{Z}$, $k \in \mathbb{N}$.
Since $\frac{h}{k} = a_0 + \xi_0$, we have $h = a_0 k + \xi_0 k$.
Let $k_1 = \xi_0 k$. ($k_1$ is the remainder of $h/k$: $k_1 = \left(\frac{h}{k} - \left\{\frac{h}{k}\right\}\right) k$.)
So, $a'_1 = \frac{k}{k_1} = \{a'_1\} + \xi_1 = a_1 + \xi_1$, and $k = a_1 k_1 + \xi_1 k_1$.
Let $k_2 = \xi_1 k_1$; then $k = a_1 k_1 + k_2$, where $k_2$ is the remainder of $\frac{k}{k_1}$.
We can continue this process, where $k_1 = a_2 k_2 + k_3$, $k_2 = a_3 k_3 + k_4$, $\ldots$, $k_{n-2} = a_{n-1} k_{n-1} + k_n$, where $a_{n-1} = \left\{\frac{k_{n-2}}{k_{n-1}}\right\}$ and $k_n$ is the remainder of $\frac{k_{n-2}}{k_{n-1}}$ (one implies the other). The continued fraction algorithm continues as long as $\xi_n \ne 0$; $\frac{1}{\xi_n} = a'_{n+1} = \frac{k_n}{k_{n+1}}$ (writing $k_0 = k$). So, $\xi_n = \frac{k_{n+1}}{k_n}$; thus, $\xi_n = 0 \leftrightarrow k_{n+1} = 0$.
So, the continued fraction algorithm continues as long as $k_{n+1} \ne 0$. Since $k_j$ is the remainder of $k_{j-2}/k_{j-1}$, $k_j \ge 0$ for all $j \in \mathbb{N}$, so $k, k_1, k_2, \ldots$ is a sequence of non-negative integers. Since $k_{n+1} = \xi_n k_n$ with $\xi_n < 1$, we have $k_n > k_{n+1}$, so this sequence is strictly decreasing. Therefore, $k_{N+1} = 0$ for some $N$. Therefore, $\xi_N = 0$ for some $N$, and so the continued fraction algorithm terminates, giving us a finite simple continued fraction that represents $x$.
20 Remark. The system of equations used in the proof of Theorem 19 is Euclid’s
algorithm, used for determining gcd(h, k).
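To make the connection in Remark 20 concrete, the following sketch runs Euclid's algorithm and records the quotient at each division step; those quotients are the partial quotients of $h/k$. The function name `euclid_quotients` is an assumption for illustration.

```python
def euclid_quotients(h, k):
    """Run Euclid's algorithm on (h, k), returning (quotients, gcd).

    Each step writes h = q * k + r with 0 <= r < k, mirroring
    k_{n-2} = a_{n-1} k_{n-1} + k_n in the proof of Theorem 19.
    """
    quotients = []
    while k != 0:
        q, r = divmod(h, k)
        quotients.append(q)
        h, k = k, r
    return quotients, h

quotients, gcd = euclid_quotients(355, 113)   # ([3, 7, 16], 1)
reducible = euclid_quotients(12, 8)           # ([1, 2], 4), since 12/8 = 3/2 = [1, 2]
```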
21 Remark. Since $\xi_N = 0$, $a'_N = a_N$; also, if $N \ge 1$,
\[ 0 < \frac{1}{a_N} = \frac{1}{a'_N} = \xi_{N-1} < 1; \]
so $a_N > 1$, and so $a_N \ge 2$.
Hence the continued fraction algorithm gives us a representation of the type
that was shown to be unique in Theorem 17. For any such representation,
we can always make the variation of Theorem 13. This gives us:
22 Theorem: A rational number can be expressed as a finite simple continued
fraction in just two ways, one with N convergents and one with N + 1
convergents; in the former, the last partial quotient is > 1, and in the latter,
it is = 1. The continued fraction algorithm always gives us the first type of
representation.
What about irrational numbers? If we put one into the continued fraction
algorithm and the algorithm terminated, we would have a finite simple continued
fraction representing an irrational number; but it is clear that finite
simple continued fractions are rational, and so this gives us a contradiction.
So, we must have that the continued fraction algorithm never terminates
when given an irrational number. This suggests that irrational numbers might be
represented by infinite continued fractions, so let’s develop a theory for those:
23 Definition: Suppose that we have a sequence $a_0, a_1, a_2, \ldots$ where $a_0 \in \mathbb{Z}$ and $a_j \in \mathbb{N}$ $\forall j \in \mathbb{N}$. Then $\forall n \in \mathbb{N}$, $[a_0, a_1, \ldots, a_n]$ is a simple continued fraction, which in turn represents a rational number, let us say $x_n$. If $\lim_{n \to \infty} x_n$ exists, and we call it $x$, then we say that the simple continued fraction $[a_0, a_1, a_2, \ldots]$ converges to the value $x$ and write $x = [a_0, a_1, a_2, \ldots]$.
24 Theorem: If $a_0, a_1, a_2, \ldots$ is a sequence of integers as described above ($a_0 \in \mathbb{Z}$, $\forall j \in \mathbb{N}$ $a_j \in \mathbb{N}$), then $x_n = [a_0, a_1, \ldots, a_n]$ tends to a limit $x$ when $n \to \infty$; or, in other words, all infinite simple continued fractions are convergent.
Proof. $x_n$ is called a convergent to $[a_0, a_1, a_2, \ldots]$ if
\[ x_n = \frac{p_n}{q_n} = [a_0, a_1, \ldots, a_n]. \]
If $N \ge n$, then $x_n$ is also a convergent to $[a_0, a_1, \ldots, a_N]$. By Thm 9 a), the even convergents form an increasing sequence and the odd convergents form a decreasing one. Every even convergent is less than $x_1$, by Thm 9 b), so that the increasing sequence of even convergents is bounded above; and every odd convergent is greater than $x_0$, so that the decreasing sequence of odd convergents is bounded below. Hence, the even convergents tend to a limit $\ell_1$, and the odd convergents tend to a limit $\ell_2$, and $\ell_1 \le \ell_2$.
\[ \left| \frac{p_{2n}}{q_{2n}} - \frac{p_{2n-1}}{q_{2n-1}} \right| = \left| \frac{p_{2n} q_{2n-1} - p_{2n-1} q_{2n}}{q_{2n} q_{2n-1}} \right| = \left| \frac{(-1)^{2n-1}}{q_{2n} q_{2n-1}} \right| = \frac{1}{q_{2n} q_{2n-1}}; \]
Since $q_n \ge n$,
\[ \frac{1}{q_{2n} q_{2n-1}} \le \frac{1}{2n(2n-1)}. \]
As $n \to \infty$, $\frac{1}{2n(2n-1)} \to 0$, so
\[ \left| \frac{p_{2n}}{q_{2n}} - \frac{p_{2n-1}}{q_{2n-1}} \right| \to 0. \]
Thus, $\ell_1 = \ell_2$. Let us define $x := \ell_1 = \ell_2$; then $[a_0, a_1, a_2, \ldots]$ converges to this $x$.
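The squeeze used in this proof can be observed numerically. The sketch below uses the all-ones expansion $[1, 1, 1, \ldots]$ as an assumed test case (its denominators are Fibonacci numbers, the slowest-growing possibility) and checks the gap between consecutive convergents against the bound $\frac{1}{2n(2n-1)}$. The helper name is hypothetical.

```python
from fractions import Fraction

def convergents(quotients):
    """Return [(p_n, q_n)] from the recurrences of Theorem 6 (assumed seeds 1, 0)."""
    p_prev, q_prev, p, q = 1, 0, quotients[0], 1
    pairs = [(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        pairs.append((p, q))
    return pairs

pq = convergents([1] * 20)

def gap(n):
    """|x_{2n} - x_{2n-1}| as an exact fraction."""
    return abs(Fraction(*pq[2 * n]) - Fraction(*pq[2 * n - 1]))

# The gap equals 1 / (q_{2n} q_{2n-1}) and is at most 1 / (2n (2n - 1)).
gap_formula_ok = all(
    gap(n) == Fraction(1, pq[2 * n][1] * pq[2 * n - 1][1]) for n in range(1, 10)
)
gap_bound_ok = all(gap(n) <= Fraction(1, 2 * n * (2 * n - 1)) for n in range(1, 10))
```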
This proof also gives us the following theorem:
25 Theorem: An infinite simple continued fraction is less than any of its odd convergents and greater than any of its even convergents.
We call $a'_n = [a_n, a_{n+1}, \ldots]$ the $n$th complete quotient of the continued fraction $x = [a_0, a_1, \ldots]$. Some familiar facts follow:
\[ a'_n = \lim_{N \to \infty} [a_n, a_{n+1}, \ldots, a_N] = a_n + \lim_{N \to \infty} \frac{1}{[a_{n+1}, \ldots, a_N]} = a_n + \frac{1}{a'_{n+1}}. \]
For instance,
\[ x = a'_0 = a_0 + \frac{1}{a'_1}. \]
Also, $a'_{n+1} > a_{n+1} \ge 1$, so $0 < \frac{1}{a'_{n+1}} < 1$; since $a'_n > a_n$, $a_n = a'_n - \frac{1}{a'_{n+1}}$, and $a_n$ is an integer, we have:
\[ a_n = \{a'_n\} \]
This gives us:
26 Theorem: If $[a_0, a_1, a_2, \ldots] = x$, then
\[ a_0 = \{x\}, \quad a_n = \{a'_n\}. \]
Also, through a parallel argument to Thm 17, we have
27 Theorem: Two infinite simple continued fractions which have the same value
are identical.
This tells us that if an irrational number can be expressed as an infinite continued fraction, then its representation is unique. Consider the continued fraction algorithm; if $x$ is irrational, the algorithm doesn’t terminate, and hence it gives us a sequence of integers $a_0, a_1, a_2, \ldots$ and numbers $a'_0, a'_1, a'_2, \ldots$, where
\[ x = [a_0, a'_1] = [a_0, a_1, a'_2] = \ldots = [a_0, a_1, a_2, \ldots, a_n, a'_{n+1}], \]
and where, by (10),
\[ x = \frac{a'_{n+1} p_n + p_{n-1}}{a'_{n+1} q_n + q_{n-1}}, \]
so
\begin{align*}
x - \frac{p_n}{q_n} &= \frac{a'_{n+1} p_n + p_{n-1}}{a'_{n+1} q_n + q_{n-1}} - \frac{p_n}{q_n} \\
&= \frac{a'_{n+1} p_n q_n + p_{n-1} q_n - (a'_{n+1} p_n q_n + p_n q_{n-1})}{q_n (a'_{n+1} q_n + q_{n-1})} \\
&= \frac{(-1)^n}{q_n (a'_{n+1} q_n + q_{n-1})};
\end{align*}
so, since $a'_{n+1} > a_{n+1}$,
\[ \left| x - \frac{p_n}{q_n} \right| < \frac{1}{q_n (a_{n+1} q_n + q_{n-1})} = \frac{1}{q_n q_{n+1}} \le \frac{1}{n(n+1)}, \]
which $\to 0$ as $n \to \infty$.
Thus:
\[ x = \lim_{n \to \infty} \frac{p_n}{q_n} = [a_0, a_1, a_2, \ldots, a_n, \ldots] \]
So, the algorithm gives a continued fraction whose value is $x$. Further, this representation is unique; each irrational number can be represented in at least one way and at most one way as a simple continued fraction.
28 Theorem: Every irrational number can be expressed in exactly one way by
an infinite simple continued fraction.
We have now done a lot of work on continued fractions, and have gotten
some important results: by putting a real number through the continued
fraction algorithm, we get a simple continued fraction equal to that number;
if the real number is rational, we will get a finite simple continued fraction
whose last quotient is greater than 1, and which is a unique expression for
that rational number among finite simple continued fractions with quotients
greater than 1; if the real number is irrational, we will get an infinite simple
continued fraction which is the unique expression of that irrational number
as a simple continued fraction.
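Consider this algorithm. In general, it states that $a'_n = a_n + \xi_n$, or in other words, $a'_n = \{a'_n\} + \frac{1}{a'_{n+1}}$. So,
\[ a'_n - \{a'_n\} = \frac{1}{a'_{n+1}} \tag{13} \]
Recall the Gauss transformation:
\[ \phi(x) = \begin{cases} \frac{1}{x} - \left\{\frac{1}{x}\right\} & \text{if } x \in (0, 1] \\ 0 & \text{if } x = 0 \end{cases} \]
Suppose $n \ge 0$ and $0 < \xi_n < 1$; then $\xi_n = \frac{1}{a'_{n+1}}$.
For $n \ge 1$, $0 < \frac{1}{a'_n} < 1$, so $\phi\left(\frac{1}{a'_n}\right)$ is well defined, and, by (13),
\[ \phi\left(\frac{1}{a'_n}\right) = a'_n - \{a'_n\} = \frac{1}{a'_{n+1}}. \]
Suppose that for $n = N$, $\xi_n = 0$; then the continued fraction algorithm terminates, and we have that for the final complete quotient $a'_N$,
\[ \phi\left(\frac{1}{a'_N}\right) = a'_N - \{a'_N\} = 0. \]
Therefore, we have
\[ \phi^{n-1}\left(\frac{1}{a'_1}\right) = \frac{1}{a'_n} \]
unless the continued fraction is finite and $n > N$, in which case $\phi^{n-1}\left(\frac{1}{a'_1}\right) = 0$.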
Since $a_n = \{a'_n\}$,
\[ a_n = \left\{ \frac{1}{\phi^{n-1}\left(\frac{1}{a'_1}\right)} \right\}. \tag{14} \]
Let us restrict our attention to a real number $x$ in the interval $[0, 1)$. The simple continued fraction expansion of $x$ generated by the continued fraction algorithm will have to have $a_0 = \{x\} = 0$. Therefore, if $x \in (0, 1)$, then
\[ x = a_0 + \frac{1}{a'_1} = \frac{1}{a'_1}. \]
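This and (14) give us
29 Theorem: If $x \in [0, 1]$, its continued fraction expansion as obtained through the continued fraction algorithm is:
$[1]$ if $x = 1$,
\[ \left[0, \left\{\frac{1}{x}\right\}, \left\{\frac{1}{\phi(x)}\right\}, \left\{\frac{1}{\phi^2(x)}\right\}, \ldots, \left\{\frac{1}{\phi^{N-1}(x)}\right\}\right] \text{ if } x \text{ is rational, and} \]
\[ \left[0, \left\{\frac{1}{x}\right\}, \left\{\frac{1}{\phi(x)}\right\}, \left\{\frac{1}{\phi^2(x)}\right\}, \ldots, \left\{\frac{1}{\phi^{n-1}(x)}\right\}, \ldots\right] \text{ if } x \text{ is irrational.} \]
Further, if $x$ is rational, and $a_N$ is the last quotient of its continued fraction expansion as obtained through the algorithm, then $\forall m \ge N$, $\phi^m(x) = 0$.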
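For the next theorem, we will need a lemma:
Lemma: $\phi(x) \in \mathbb{Q}$ iff $x \in \mathbb{Q}$.
Proof. If $x = 0$, then $\phi(x) = 0 \in \mathbb{Q}$. If $x \in \mathbb{Q}$ and $x \ne 0$, then $x = \frac{p}{q}$ with $p, q \in \mathbb{N}$, and
\[ \phi(x) = \frac{q}{p} - \left\{\frac{q}{p}\right\} = \frac{q}{p} - \frac{q - (q \bmod p)}{p} = \frac{q \bmod p}{p} \in \mathbb{Q}. \]
Suppose $x \notin \mathbb{Q}$, but $\phi(x) \in \mathbb{Q}$. Then $\phi(x) = \frac{a}{b}$, $a, b \in \mathbb{Z}$, $b \ne 0$.
\[ \phi(x) = \frac{a}{b} = \frac{1}{x} - \left\{\frac{1}{x}\right\}, \]
\[ \frac{a + b\left\{\frac{1}{x}\right\}}{b} = \frac{1}{x}, \]
\[ x = \frac{b}{a + b\left\{\frac{1}{x}\right\}}, \]
where $b$ and $a + b\left\{\frac{1}{x}\right\}$ are nonzero integers, so $x \in \mathbb{Q}$, giving us a contradiction.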
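We get that if $x$ is rational, $\phi(x)$ is rational, and if $x$ is irrational, then $\phi(x)$ is irrational. Suppose $x \in (0, 1)$ is rational, and suppose that its fraction generated by the continued fraction algorithm terminates after $a_N$. Then $\xi_N = 0$, so $a'_N = a_N = \{a'_N\}$; since $a'_N = \frac{1}{\phi^{N-1}(x)}$, this says
\[ \frac{1}{\phi^{N-1}(x)} = \left\{\frac{1}{\phi^{N-1}(x)}\right\}. \]
Further, $a'_n > a_n$ $\forall n < N$ with $n \ge 1$, and so,
\[ \frac{1}{\phi^{n-1}(x)} > \left\{\frac{1}{\phi^{n-1}(x)}\right\} \quad \forall n \text{ s.t. } 1 \le n < N. \]
Consider then the fraction generated by putting $\phi(x)$ through the algorithm. Since $\phi^{n-1}(x) = \phi^{n-2}(\phi(x))$, for $1 < n < N$ we have
\[ \frac{1}{\phi^{n-2}(\phi(x))} > \left\{\frac{1}{\phi^{n-2}(\phi(x))}\right\}, \]
or in other words,
\[ \frac{1}{\phi^{n-1}(\phi(x))} > \left\{\frac{1}{\phi^{n-1}(\phi(x))}\right\} \quad \forall n \text{ s.t. } 1 \le n < N-1. \]
Thus, if $b_j$ and $b'_j$ are the partial and complete quotients of the fraction generated by $\phi(x)$, then $b'_n > b_n$, so the corresponding remainder $\xi_n > 0$, $\forall n$ s.t. $1 \le n < N-1$.
Therefore, the fraction does not terminate anywhere before reaching $b_{N-1}$. However, we have
\[ \frac{1}{\phi^{N-1}(x)} = \left\{\frac{1}{\phi^{N-1}(x)}\right\}, \quad \frac{1}{\phi^{N-2}(\phi(x))} = \left\{\frac{1}{\phi^{N-2}(\phi(x))}\right\}, \quad b'_{N-1} = b_{N-1}, \quad \xi_{N-1} = 0, \]
so that the fraction terminates after $b_{N-1}$. Therefore, combining this result with Thm 29, we get that if $x$ is rational and $x \in [0, 1)$, and the simple continued fraction generated by putting $x$ through the continued fraction algorithm ends with $a_N$ (the $(N+1)$th term), then the simple continued fraction generated by putting $\phi(x)$ through the algorithm ends with its $N$th term, and looks like this:
\begin{align*}
\phi(x) &= \left[0, \left\{\frac{1}{\phi(x)}\right\}, \left\{\frac{1}{\phi(\phi(x))}\right\}, \left\{\frac{1}{\phi^2(\phi(x))}\right\}, \ldots, \left\{\frac{1}{\phi^{N-2}(\phi(x))}\right\}\right] \\
&= \left[0, \left\{\frac{1}{\phi(x)}\right\}, \left\{\frac{1}{\phi^2(x)}\right\}, \left\{\frac{1}{\phi^3(x)}\right\}, \ldots, \left\{\frac{1}{\phi^{N-1}(x)}\right\}\right]; \\
\text{remember, } x &= \left[0, \left\{\frac{1}{x}\right\}, \left\{\frac{1}{\phi(x)}\right\}, \left\{\frac{1}{\phi^2(x)}\right\}, \ldots, \left\{\frac{1}{\phi^{N-1}(x)}\right\}\right]. \tag{15}
\end{align*}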
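Similarly, if $x$ is irrational and $x \in [0, 1]$, then $\phi(x)$ is irrational, and thus:
\begin{align*}
\phi(x) &= \left[0, \left\{\frac{1}{\phi(x)}\right\}, \left\{\frac{1}{\phi(\phi(x))}\right\}, \left\{\frac{1}{\phi^2(\phi(x))}\right\}, \ldots\right] \\
&= \left[0, \left\{\frac{1}{\phi(x)}\right\}, \left\{\frac{1}{\phi^2(x)}\right\}, \left\{\frac{1}{\phi^3(x)}\right\}, \ldots\right]; \\
\text{remember that } x &= \left[0, \left\{\frac{1}{x}\right\}, \left\{\frac{1}{\phi(x)}\right\}, \left\{\frac{1}{\phi^2(x)}\right\}, \ldots\right]. \tag{16}
\end{align*}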
30 Definition: $a_n(x)$ := the $(n+1)$th quotient, $a_n$, that one would get if one put $x$ through the continued fraction algorithm.
Using this definition, (15) and (16), we get
31 Theorem: For $x \in [0, 1)$,
If $n = 0$, $a_n(\phi(x)) = a_n(x) = 0$.
If $n \ge 1$, $a_n(\phi(x)) = a_{n+1}(x)$.
In other words, to get the continued fraction of $\phi(x)$ from the continued fraction of $x$, simply remove $a_1(x)$ and shift all the entries following it over by one:
\[ x = [0, a_1(x), a_2(x), a_3(x), \ldots] \]
\[ \phi(x) = [0, a_2(x), a_3(x), a_4(x), \ldots] \]
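The shift behaviour in Theorem 31 can be seen on a small rational example. This is a sketch under the same assumptions as before (exact rational arithmetic, hypothetical helper names); the test value $\frac{30}{43} = [0, 1, 2, 3, 4]$ is chosen only for readability.

```python
from fractions import Fraction
from math import floor

def gauss_map(x):
    """Gauss transformation: 1/x - floor(1/x) on (0, 1], and 0 at x = 0."""
    if x == 0:
        return Fraction(0)
    inv = 1 / Fraction(x)
    return inv - floor(inv)

def quotients(x):
    """Quotients [0, a_1, a_2, ...] of a rational x in [0, 1)."""
    x = Fraction(x)
    result = [0]
    while x != 0:
        result.append(floor(1 / x))
        x = gauss_map(x)
    return result

x = Fraction(30, 43)
before = quotients(x)              # [0, 1, 2, 3, 4]
after = quotients(gauss_map(x))    # [0, 2, 3, 4]: a_1(x) removed, rest shifted
```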
This can be repeated, so that
32 Corollary: For $x \in [0, 1)$, $n \ge 1$, $a_n(\phi^m(x)) = a_{n+m}(x)$.
Once more, in other words, to get $\phi^m(x)$ from $x$, remove $a_1$ through $a_m$ of $x$, and shift the remaining terms over:
\[ x = [0, a_1(x), a_2(x), a_3(x), \ldots, a_m(x), a_{m+1}(x), a_{m+2}(x), \ldots] \]
\[ \phi^m(x) = [0, a_{m+1}(x), a_{m+2}(x), a_{m+3}(x), \ldots] \]
(Remember, if $x$ is rational, and its expansion has $N$ terms, $\phi^m(x)$’s expansion has $N - m$.)
33 Theorem: When $x \in [0, 1)$, $\phi^{-1}(x)$, the preimage of $x$, consists of all continued fractions of the form $[0, n, a_1(x), a_2(x), \ldots]$, where $n \in \mathbb{N}$.
Proof. Suppose $\phi(y) = x$. Then, for $n = 0$, $a_0(y) = a_0(x) = 0$, and for $n \ge 1$, $a_n(x) = a_{n+1}(y)$. So $y$ is of the form $[0, z, a_1(x), a_2(x), \ldots]$. Since $z$ is undetermined by $x$, it can be any number in the range of $a_1(y)$, which is $\mathbb{N}$. So, $y$ is of the form $[0, n, a_1(x), a_2(x), \ldots]$, where $n \in \mathbb{N}$.
34 Corollary: When $x \in [0, 1)$, $\phi^{-m}(x)$, the preimage of $x$ under $\phi^m$, consists of all continued fractions of the form $[0, n_1, n_2, \ldots, n_m, a_1(x), a_2(x), \ldots]$, where $n_j \in \mathbb{N}$ for $1 \le j \le m$.
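To see Theorem 33 in action, note that prepending a quotient $n$ to the expansion of $x$ gives the number $\frac{1}{n + x}$. The sketch below checks, for an assumed sample value of $x$, that each such number maps back to $x$ under the Gauss transformation; the helper name is hypothetical.

```python
from fractions import Fraction
from math import floor

def gauss_map(x):
    """Gauss transformation: 1/x - floor(1/x) on (0, 1], and 0 at x = 0."""
    if x == 0:
        return Fraction(0)
    inv = 1 / Fraction(x)
    return inv - floor(inv)

x = Fraction(13, 30)                                # x = [0, 2, 3, 4]
preimages = [1 / (n + x) for n in range(1, 6)]      # y_n = [0, n, 2, 3, 4]
all_map_to_x = all(gauss_map(y) == x for y in preimages)
```

Only the first five preimages are listed here; the theorem says there is one for every natural number $n$.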
35 Remarks: The theory of simple continued fractions and the Gauss transformation
gives us a remarkable way to look at the continuum, a way different
from the standard decimal expansion. We need to develop some terminology
first.
36 Definition: The interval $\left[\frac{1}{n+1}, \frac{1}{n}\right) = [[0, n+1], [0, n])$, $n \in \mathbb{N}$, is called an $n$th platform of level 1.
An interval of the form $\left(\frac{1}{z + \frac{1}{n}}, \frac{1}{z + \frac{1}{n+1}}\right] = ([0, z, n], [0, z, n+1]]$, $z, n \in \mathbb{N}$, is called an $n$th platform of level 2.
The $n$th platforms of the $m$th level refer to those intervals of the form
\[ \left\{ \left[ \cfrac{1}{z_1 + \cfrac{1}{z_2 + \cfrac{1}{\ddots + \cfrac{1}{z_{m-1} + \cfrac{1}{n+1}}}}},\ \cfrac{1}{z_1 + \cfrac{1}{z_2 + \cfrac{1}{\ddots + \cfrac{1}{z_{m-1} + \cfrac{1}{n}}}}} \right) : z_1, z_2, \ldots, z_{m-1}, n \in \mathbb{N} \right\} \]
\[ = \left\{ [[0, z_1, z_2, \ldots, z_{m-1}, n+1], [0, z_1, z_2, \ldots, z_{m-1}, n]) : z_1, z_2, \ldots, z_{m-1}, n \in \mathbb{N} \right\} \]
if $m$ is odd and
\[ \left\{ \left( \cfrac{1}{z_1 + \cfrac{1}{z_2 + \cfrac{1}{\ddots + \cfrac{1}{z_{m-1} + \cfrac{1}{n}}}}},\ \cfrac{1}{z_1 + \cfrac{1}{z_2 + \cfrac{1}{\ddots + \cfrac{1}{z_{m-1} + \cfrac{1}{n+1}}}}} \right] : z_1, z_2, \ldots, z_{m-1}, n \in \mathbb{N} \right\} \]
\[ = \left\{ ([0, z_1, z_2, \ldots, z_{m-1}, n], [0, z_1, z_2, \ldots, z_{m-1}, n+1]] : z_1, z_2, \ldots, z_{m-1}, n \in \mathbb{N} \right\} \]
if $m$ is even.
Each $n$th platform of level $m$ contains exactly one $j$th platform of level $m+1$ for each $j \in \mathbb{N}$, and is in fact the union of this collection of platforms of one higher level with its own endpoint (either the left or right endpoint, depending on whether $m$ is odd or even, respectively). Thus, we could describe each platform uniquely by listing the platforms of lower levels that contain it.
37 Definition: $pl_{n_1, n_2, \ldots, n_{m-1}, n_m}$ is the $n_m$th platform of level $m$ that’s contained in the $n_{m-1}$th platform of level $m-1$ that’s contained in the . . . that’s contained in the $n_2$th platform of level 2 that’s contained in the $n_1$th platform of level 1.
The sequence of numbers $a_1(x), a_2(x), a_3(x), \ldots$ associated with a number $x \in [0, 1)$, being the quotients of the simple continued fraction generated by putting $x$ through the continued fraction algorithm, gives us the address of the number $x$ in terms of which platforms it’s on. For each $a_j(x)$, we know that $x$ is on the $a_j$th platform of level $j$. More specifically, we know that $x \in pl_{a_1(x), a_2(x), \ldots, a_n(x)}$ for all $n \le N$ if $x$ is rational and $a_N$ is the last quotient of $x$, or for all $n \in \mathbb{N}$ if $x$ is irrational.
Let us call the simple continued fraction generated by the continued fraction
algorithm applied to a number x the continued fraction expansion of x. A
number’s decimal expansion tells us its location in the decimal system, in
which each level is divided into ten equal pieces. The continued fraction
expansion of a number x ∈ [0, 1) tells us its location in the coordinate system
of the platforms, in which each level is divided into a countably infinite
number of parts of decreasing length (at level 1, the nth platform has length 1/(n(n + 1))).
Now, it is not hard to see that $\phi(pl_{n_1, n_2, n_3, \ldots})$ is equal to $pl_{n_2, n_3, n_4, \ldots}$. In this way, the Gauss transformation strips away levels of information about the location of a number: if we know only $\phi^m(x)$, we don’t know which platforms of levels 1 through $m$ the number $x$ resides on.
One final note for now about the relationship of ϕ(x) and the platforms.
Suppose we define PL to be the set of all platforms of all levels, and suppose
that we define f(X), where X is a collection of sets and f is a function on
the members of those sets to be the set of images of the members of X under
f. Then we have that
\[ \phi(PL) = PL \]
and so
\[ \phi^m(PL) = PL, \quad m \in \mathbb{N} \]
2 The Lebesgue Integral and Ergodic Theory
I will only have space to give a skeletal version of the theory behind the
Lebesgue integral; I will state many theorems without proving them.
38 Definition: A topology $(X, \tau)$ consists of a collection $\tau$ of subsets of a set $X$ such that:
i) $\emptyset \in \tau$ and $X \in \tau$.
ii) If $V_i \in \tau$ for $i = 1, \ldots, n$, then $V_1 \cap V_2 \cap \ldots \cap V_n \in \tau$.
iii) If $\{V_\alpha\}$ is an arbitrary collection of members of $\tau$ (finite, countable, or uncountable), then $\bigcup_\alpha V_\alpha \in \tau$.
If there exists such a $\tau$ for a set $X$, then $X$ is called a topological space, and the members of $\tau$ are called the open sets in $X$.
39 Definition: A collection $M$ of subsets of a set $X$ is said to be a σ-algebra in $X$ if $M$ has the following properties:
i) $X \in M$
ii) If $A \in M$, then $X - A \in M$
iii) If $A = \bigcup_{n=1}^{\infty} A_n$ and if $A_n \in M$ for $n = 1, 2, 3, \ldots$, then $A \in M$.
If there exists such an $M$ for a set $X$, then $X$ is called a measurable space, and the members of $M$ are called the measurable sets in $X$. Further, if $X$ is a measurable space, $Y$ is a topological space, and $f$ is a mapping of $X$ into $Y$, then $f$ is said to be measurable if $f^{-1}(V)$ is a measurable set in $X$ for every open set $V$ in $Y$.
40 Remarks: Another way of describing a σ-algebra is to say that it contains
the empty set and that it is closed under countable numbers of intersections,
unions, and complements. An example of a commonly used topology is the
topology on the extended real number line [−∞, ∞] generated by taking
finite intersections and infinite unions of open intervals (intervals of the form
[−∞, a), (a, b), (a, ∞].) An example of a commonly used σ-algebra is the
σ-algebra on the extended real number line [−∞, ∞] generated by taking
countable complements, unions, and intersections of any interval (all intervals
of the form [−∞, a), [−∞, a], (a, b), (a, b], [a, b), [a, b], (a, ∞], [a, ∞].)
41 Definition: Let $\{a_n\}$ be a sequence in $[-\infty, \infty]$, and put
\[ b_k = \sup \{a_k, a_{k+1}, a_{k+2}, \ldots\} \quad (k = 1, 2, 3, \ldots) \tag{17} \]
where sup is the least upper bound of the sequence $a_k, a_{k+1}, a_{k+2}, \ldots$, and
\[ \beta = \inf \{b_1, b_2, b_3, \ldots\} \tag{18} \]
where inf is the greatest lower bound of the sequence $b_1, b_2, b_3, \ldots$. We call $\beta$ the upper limit of $\{a_n\}$, and write
\[ \beta = \limsup_{n \to \infty} a_n \]
The following properties are easily verified: first, $b_1 \ge b_2 \ge b_3 \ge \ldots$, so that $b_k \to \beta$ as $k \to \infty$; secondly, there is a subsequence $\{a_{n_i}\}$ of $\{a_n\}$ such that $a_{n_i} \to \beta$ as $i \to \infty$, and $\beta$ is the largest number with this property.
The lower limit is defined analogously: interchange sup and inf in (17) and (18). If $\{a_n\}$ converges, we have:
\[ \limsup_{n \to \infty} a_n = \liminf_{n \to \infty} a_n = \lim_{n \to \infty} a_n \]
and some thought shows that if $\limsup_{n \to \infty} a_n = \liminf_{n \to \infty} a_n$, then $a_n$ converges to that limit.
Suppose $\{f_n\}$ is a sequence of extended-real functions on a set $X$. Then $\sup_n f_n$ and $\limsup_{n \to \infty} f_n$ are the functions defined on $X$ by
\[ \left(\sup_n f_n\right)(x) = \sup_n \left(f_n(x)\right) \]
\[ \left(\limsup_{n \to \infty} f_n\right)(x) = \limsup_{n \to \infty} \left(f_n(x)\right) \]
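A short worked example may make the upper and lower limits concrete. The sequence below is an assumption chosen for illustration; it does not appear elsewhere in the paper.

```latex
% Illustrative sequence (an assumed example): a_n = (-1)^n (1 + 1/n).
% Even-indexed terms are 1 + 1/n > 1 and decrease to 1;
% odd-indexed terms are -(1 + 1/n) < -1 and increase to -1.
\[
  b_k = \sup\{a_k, a_{k+1}, \ldots\} = 1 + \frac{1}{k^*},
  \qquad k^* = \text{the smallest even integer} \ge k,
\]
\[
  \limsup_{n \to \infty} a_n = \inf_k b_k = 1,
  \qquad
  \liminf_{n \to \infty} a_n = -1 .
\]
% Since the upper and lower limits differ, the sequence does not converge.
```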
42 Definition: If $E$ is a subset of $X$, then the function on $X$
\[ \xi_E(x) = \begin{cases} 1 & \text{if } x \in E \\ 0 & \text{if } x \notin E \end{cases} \]
is called the characteristic function of the set $E$.
A function s on a measurable space X whose range consists of only finitely
many points in [0, ∞) is known as a simple function. Clearly, a simple func-
tion is expressible as a finite linear combination of characteristic functions. A
rather beautiful theorem that we have not the time to prove is the following:
43 Theorem: Let $f : X \to [0, \infty]$ be measurable. There exist simple measurable functions $s_n$ on $X$ such that
a) $0 \le s_1 \le s_2 \le \ldots \le f$.
b) $s_n(x) \to f(x)$ as $n \to \infty$, for every $x \in X$.
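Although Theorem 43 is not proved here, the usual construction is easy to sketch: round $f$ down to the nearest multiple of $2^{-n}$ and cap the result at $n$. The Python below is a minimal illustration of that standard dyadic construction, not necessarily the one used in the cited text; the names `simple_approx`, `f`, and `xs` are hypothetical, and the sample function $f(x) = x^2$ is an assumption made for the example.

```python
import math

def simple_approx(f, n):
    """Return the function s_n = min(floor(2^n * f) / 2^n, n).
    s_n takes only finitely many values, so it is a simple function."""
    def s(x):
        y = f(x)
        if y >= n:
            return float(n)          # cap at n so the range stays finite
        return math.floor(y * 2 ** n) / 2 ** n  # round down to a multiple of 2^-n
    return s

# Sample nonnegative function and sample points (assumptions for illustration).
f = lambda x: x * x
xs = [i / 7 for i in range(22)]

for x in xs[:3]:
    print(x, [simple_approx(f, n)(x) for n in range(1, 6)], f(x))
```

For each fixed $x$ the printed values increase with $n$ and never exceed $f(x)$, matching properties a) and b).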
44 Definition: A measure is a function $\mu$, defined on a σ-algebra $M$, whose range is in $[0, \infty]$ and which is σ-additive, meaning that if $\{A_i\}$ is a disjoint countable collection of members of $M$, then
$$\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mu(A_i).$$
We will also assume that $\exists A \in M$ such that $\mu(A) < \infty$. A measurable space with a measure defined on it is called a measure space.
As an example, take the σ-algebra described above on the extended real
line, [−∞, ∞], call it M, and define the following measure on it: if I
is an interval in M, let m(I) = b − a, where a and b are the left and right
endpoints of I, respectively, regardless of whether I is open, semi-open, or
closed. If X ∈ M is not an interval, it is not hard to see that it is expressible
as a countable union of disjoint intervals; as such, define m(X) as the sum
of the measure of the disjoint intervals resulting from a decomposition of X.
It can be shown that m is well-defined in this way.
A consequence of this is that m({2}) = m([2, 2]) = 0. Thus, we have a
non-empty set of measure 0. In fact, since measures are σ-additive, we
can take the union of a countable number of such single-point intervals and
get an infinite set of measure 0. We can therefore come to the conclusion
that the set of rational numbers has measure 0; and we can go further still–if
you extend our definition of measure to cover a larger group of sets, creating
what is known as the Lebesgue measure–which we will not define here–one
finds that one can have an uncountable set of measure 0–namely, the famous
Cantor set. Further:
45 Definition: Suppose that $x \in [0, 1]$, and that the digit $b$ occurs $n_b$ times in the first $n$ places of the decimal expansion of $x$, base $r$. If
$$\frac{n_b}{n} \to \beta$$
when $n \to \infty$, then we say that $b$ appears with asymptotic frequency $\beta$. (This limit need not exist for a given $x$.) We say that $x$ is simply normal base $r$ if
$$\frac{n_b}{n} \to \frac{1}{r}$$
for each of the $r$ possible values of $b$. Further, we say that $x$ is normal base $r$ if all of the numbers
$$x - \{x\},\ rx - \{rx\},\ r^2 x - \{r^2 x\},\ \ldots$$
are simply normal in all of the scales
$$r,\ r^2,\ r^3,\ \ldots$$
This is the same as saying that in the decimal expansion of $x$ in base $r$, every combination
$$b_1 b_2 \ldots b_k$$
of digits occurs with the right frequency: if $n_b$ is the number of times this sequence occurs in the first $n$ digits of $x$, then
$$\frac{n_b}{n} \to \frac{1}{r^k}$$
as n → ∞. There is a theorem, the proof of which is once more beyond the
scope of this paper, that the set of normal numbers in [0, 1] has measure 1; thus, the
set of numbers in [0, 1] that aren't normal has measure 0. When something
is true for all members of a measure space X except for a set of measure 0,
we say that it is true for almost every x ∈ X. Thus, almost every x in [0, 1]
is normal. The purpose of this paper is to prove an analogous result for the
continued fraction expansion of a number in [0,1].
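To make the ratio $n_b / n$ of Definition 45 concrete, here is a small Python sketch that counts digit frequencies in the first digits of Champernowne's constant $0.123456789101112\ldots$, which is known to be normal in base 10. The choice of this number and the function names `champernowne_digits` and `digit_frequencies` are assumptions made for illustration; a finite count only suggests the limiting frequency $1/10$, and the convergence for this particular number is slow.

```python
from collections import Counter

def champernowne_digits(n_digits):
    """First n_digits decimal digits of 0.123456789101112..., as a string."""
    digits = []
    k = 1
    while len(digits) < n_digits:
        digits.extend(str(k))   # append the digits of k one character at a time
        k += 1
    return "".join(digits[:n_digits])

def digit_frequencies(digits):
    """Map each digit b to n_b / n, where n is the number of digits examined."""
    counts = Counter(digits)
    n = len(digits)
    return {d: counts[d] / n for d in "0123456789"}

frequencies = digit_frequencies(champernowne_digits(100_000))
for digit, freq in frequencies.items():
    print(digit, round(freq, 4))
```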
One of the reasons that it is significant for a given set to be of measure 0
is demonstrated in the construction of the Lebesgue integral, which brings
together a number of the ideas presented in this section:
46 Definition: Suppose that $(X, M)$ is a measurable space and that $\mu$ is a measure on $M$. If $s$ is a measurable simple function on $X$, we have pointed out that it is expressible as
$$s = \sum_{i=1}^{n} \alpha_i \xi_{A_i},$$
where $\alpha_1, \ldots, \alpha_n$ are the distinct values of $s$, $A_i = \{x : s(x) = \alpha_i\}$, and $\xi_{A_i}$ is the characteristic function of $A_i$. If $E \in M$, we define
$$\int_E s \, d\mu = \sum_{i=1}^{n} \alpha_i \mu(A_i \cap E).$$
(If this equation requires us to take $0 \cdot \infty$, we follow the convention $0 \cdot \infty = 0$.) If $f : X \to [0, \infty]$ is measurable, and $E \in M$, we define
$$\int_E f \, d\mu = \sup \int_E s \, d\mu \tag{19}$$
the supremum being taken over all simple measurable functions $s$ such that $0 \le s \le f$. The left member of (19) is called the Lebesgue integral of $f$ over $E$, with respect to the measure $\mu$. Further, suppose that $f$ is a real-valued measurable function on $X$ for which
$$\int_X |f| \, d\mu < \infty.$$
We call the collection of such functions $L^1(\mu)$, also known as the Lebesgue integrable functions (with respect to $\mu$). Let $f^+ = \max\{f, 0\}$ and $f^- = -\min\{f, 0\}$; then we define
$$\int_E f \, d\mu = \int_E f^+ \, d\mu - \int_E f^- \, d\mu$$
for every measurable set $E$. One of the consequences of such a definition is that if $E$ is a set of measure 0 and $f \in L^1(\mu)$, then $\int_E f \, d\mu = 0$, and that as such, sets of measure 0 exert no influence on integrals taken over them.
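The formula for the integral of a simple function can be tried out on a toy example. The Python sketch below assumes a finite set $X$ in which every subset is measurable and the measure is given by a mass at each point; this finite setting, the function name `integrate_simple`, and the sample data are all assumptions made for illustration, not part of the definitions above.

```python
from fractions import Fraction

def integrate_simple(s, mu, E):
    """Integral over E of the simple function s, computed as the sum of
    alpha_i * mu(A_i ∩ E) over the distinct values alpha_i of s.
    s:  dict mapping each point of X to the value of s there
    mu: dict mapping each point of X to the measure of that single point
    E:  the set being integrated over"""
    total = Fraction(0)
    for alpha in set(s.values()):
        level_set = {x for x in s if s[x] == alpha}                 # A_i
        mass = sum((mu[x] for x in level_set & E), Fraction(0))     # mu(A_i ∩ E)
        total += alpha * mass
    return total

# Six points of mass 1/6 each; s takes the values 2, 5 and 0 (sample data).
mu = {x: Fraction(1, 6) for x in range(6)}
s = {0: 2, 1: 2, 2: 5, 3: 5, 4: 5, 5: 0}

print(integrate_simple(s, mu, {0, 1, 2}))   # 2 * (2/6) + 5 * (1/6)
print(integrate_simple(s, mu, set()))       # a set of measure 0 contributes nothing
```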
Before getting to the most important theorems of this paper, we will need
two more results from Lebesgue theory, which I will state without proof:
47 Fatou's Lemma: If $f_n : X \to [0, \infty]$ is measurable, for each positive integer $n$, then
$$\int_X (\liminf_{n \to \infty} f_n) \, d\mu \le \liminf_{n \to \infty} \int_X f_n \, d\mu.$$
48 Lebesgue's Dominated Convergence Theorem: Suppose $\{f_n\}$ is a sequence of real measurable functions on $X$ such that
$$f(x) = \lim_{n \to \infty} f_n(x)$$
exists for every $x \in X$. If there is a function $g \in L^1(\mu)$ such that
$$|f_n(x)| \le g(x) \quad (n = 1, 2, 3, \ldots;\ x \in X)$$
then $f \in L^1(\mu)$,
$$\lim_{n \to \infty} \int_X |f_n - f| \, d\mu = 0,$$
and
$$\lim_{n \to \infty} \int_X f_n \, d\mu = \int_X f \, d\mu.$$
49 Definition: Suppose $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ are measure spaces. We can redefine measurable function from $X$ to $Y$ without respect to a topology on $Y$ by saying that a function $T : X \to Y$ is measurable if the preimage of any measurable set is measurable. A measurable function $T : X \to Y$ is measure-preserving if $\mu(T^{-1}(B)) = \nu(B)$, $\forall B \in \mathcal{B}$. If $T$ maps $X$ to itself (so that $(Y, \mathcal{B}, \nu) = (X, \mathcal{A}, \mu)$), we call $T$ a measure-preserving transformation, and we say that $\mu$ is $T$-invariant.
If $\mu(X) = 1$, we say that $(X, \mathcal{A}, \mu)$ is a probability space. If $\mu(X)$ is finite, then we say $(X, \mathcal{A}, \mu)$ is a finite measure space, which can be rescaled to be a probability space by scaling $\mu$ by $\frac{1}{\mu(X)}$.
50 Poincaré Recurrence Theorem: Let $T$ be a measure-preserving transformation of a finite measure space $(X, \mathcal{A}, \mu)$. If $A$ is a measurable set, then for almost every $x \in A$, there are infinitely many $k \in \mathbb{N}$ such that $T^k(x) \in A$.
Proof. Let
$$B = \{x \in A : T^k(x) \notin A \text{ for all } k \in \mathbb{N}\} = A - \bigcup_{k \in \mathbb{N}} T^{-k}(A).$$
Then $B \in \mathcal{A}$, and all the preimages $T^{-k}(B)$ are disjoint, are measurable, and have the same measure as $B$. Since $X$ has finite total measure, and
$$\mu(X) \ge \mu\Big(\bigcup_{k \in \mathbb{N}} T^{-k}(B)\Big) = \sum_{k=1}^{\infty} \mu(T^{-k}(B)) = \sum_{k=1}^{\infty} \mu(B),$$
it follows that $\mu(B) = 0$. Hence, for almost every $x \in A$, $x \in A - B$, so that for almost every $x \in A$, $\exists n \in \mathbb{N} : T^n(x) \in A$. Since $T^n(x) \in A$, $\exists n_1 \in \mathbb{N} : T^{n_1}(T^n(x)) \in A$, that is, $T^{n + n_1}(x) \in A$; by similar arguments, $\exists n_2, n_3, \ldots$ such that $T^{n + n_1}(x), T^{n + n_2}(x), T^{n + n_3}(x), \ldots \in A$.
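A numerical experiment can make the recurrence statement tangible. The Python sketch below uses the rotation $T(x) = x + \alpha \bmod 1$ with $\alpha = \sqrt{2} - 1$, which preserves length on $[0, 1)$, and lists the times at which an orbit returns to the set $A = [0, 0.1)$. The choice of transformation, of $A$, of the starting point, and the name `return_times` are assumptions made for this illustration; floating-point arithmetic only approximates the true orbit.

```python
import math

def return_times(x, alpha, a_lo, a_hi, steps):
    """Times k in 1..steps at which T^k(x) lies in [a_lo, a_hi),
    where T(x) = x + alpha mod 1 is a rotation of the unit interval."""
    times = []
    y = x
    for k in range(1, steps + 1):
        y = (y + alpha) % 1.0
        if a_lo <= y < a_hi:
            times.append(k)
    return times

# Start inside A = [0, 0.1) and follow the orbit for 10,000 steps.
times = return_times(0.05, math.sqrt(2) - 1, 0.0, 0.1, 10_000)
print(len(times), times[:10])
```

The orbit keeps coming back to $A$ for as long as it is followed, and the proportion of return times is close to the length of $A$.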
Before proving the most important theorem of the paper, we need to prove
a combinatorial lemma:
51 Definition: If $a_1, \ldots, a_m$ are real numbers and $1 \le n \le m$, we say that $a_k$ is an $n$-leader if $a_k + \ldots + a_{k+p-1} \ge 0$ for some $p$, $1 \le p \le n$ (for instance, nonnegative numbers are always $n$-leaders).
52 Lemma: For every $n$, $1 \le n \le m$, the sum of all $n$-leaders is nonnegative.
Proof. If there are no $n$-leaders, the lemma is true. Otherwise, let $a_k$ be the first $n$-leader, and let $p \ge 1$ be the smallest integer for which $a_k + \ldots + a_{k+p-1} \ge 0$. If $p > 1$, then $a_k < 0$, and we have that $a_{k+1} + \ldots + a_{k+p-1} \ge 0$, and so $a_{k+1}$ is an $n$-leader. Repeating this argument, we get that if $k \le j \le k + p - 1$, then $a_j + \ldots + a_{k+p-1} \ge 0$, and so $a_j$ is an $n$-leader; the $n$-leaders collected thus far are therefore $a_k, \ldots, a_{k+p-1}$, and their sum is nonnegative. Repeating the argument with the remaining sequence $a_{k+p}, \ldots, a_m$ proves the lemma.
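Lemma 52 is easy to check by brute force on small examples. The following Python sketch finds the $n$-leaders of a finite list directly from Definition 51 and sums them; the function name `n_leaders` and the sample list are assumptions made for illustration, and sums that would run past the end of the list are simply not considered.

```python
def n_leaders(a, n):
    """Indices k (0-based) such that a[k] + ... + a[k+p-1] >= 0
    for some p with 1 <= p <= n, staying inside the list."""
    leaders = []
    for k in range(len(a)):
        partial = 0
        for p in range(1, n + 1):
            if k + p > len(a):
                break                 # the sum would run past the end of the list
            partial += a[k + p - 1]
            if partial >= 0:
                leaders.append(k)     # a[k] is an n-leader
                break
    return leaders

a = [-1, 2, -3, 1, 1, -4]
leaders = n_leaders(a, 2)
print(leaders, sum(a[k] for k in leaders))
```

Here $-1$ is a 2-leader because $-1 + 2 \ge 0$, while $-3$ and $-4$ are not, and the sum of the leaders is $-1 + 2 + 1 + 1 = 3$.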
53 Birkhoff Ergodic Theorem: Let $T$ be a measure-preserving transformation of a finite measure space $(X, \mathcal{A}, \mu)$, and let $f \in L^1(\mu)$ be real-valued. Then the limit
$$\bar{f}(x) = \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k(x))$$
exists for almost every $x \in X$. Further, $\bar{f} \in L^1(\mu)$, is $T$-invariant, and satisfies
$$\int_X \bar{f}(x) \, d\mu = \int_X f(x) \, d\mu.$$
Proof. Let
$$A = \{x \in X : f(x) + f(T(x)) + \ldots + f(T^k(x)) \ge 0 \text{ for some } k \in \mathbb{N} \cup \{0\}\}.$$
Lemma: (Maximal Ergodic Theorem) $\int_A f(x) \, d\mu \ge 0$.
Proof. Let $A_n = \{x \in X : \sum_{i=0}^{k} f(T^i(x)) \ge 0 \text{ for some } k,\ 0 \le k \le n\}$. Then $A_n \subset A_{n+1}$, $A = \bigcup_{n \in \mathbb{N}} A_n$. Define:
$$g_n(x) = \begin{cases} f(x) & \text{if } x \in A_n \\ 0 & \text{if } x \in X - A_n \end{cases}$$
Clearly, $g_n \in L^1(\mu)$, and $f(x) = \lim_{n \to \infty} g_n(x)$ for all $x \in A$. Also, $|g_{n+1}| \ge |g_n|$, and $|f| \ge |g_n|$ $\forall n \in \mathbb{N}$. So, applying the Dominated Convergence Theorem, we get:
$$\int_A f \, d\mu = \lim_{n \to \infty} \int_A g_n \, d\mu = \lim_{n \to \infty} \int_{A_n} f \, d\mu.$$
Therefore, it suffices to show that $\int_{A_n} f \, d\mu \ge 0$ for each $n$.
Fix an arbitrary $m \in \mathbb{N}$. Let $s_n(x)$ be the sum of the $n$-leaders in the sequence $f(x), f(T(x)), \ldots, f(T^{m+n-1}(x))$. For $k \le m + n - 1$, let $B_k \subset X$ be the set of points for which $f(T^k(x))$ is an $n$-leader of this sequence. From Lemma 52, we have $0 \le \int_X s_n(x) \, d\mu$. A little bit of thought shows that:
$$0 \le \int_X s_n(x) \, d\mu = \sum_{k=0}^{m+n-1} \int_{B_k} f(T^k(x)) \, d\mu \tag{20}$$
Note that for $1 \le k \le m$, $x \in B_k$ iff $T(x) \in B_{k-1}$, and further, for these $k$, $B_k = T^{-1}(B_{k-1}) = T^{-k}(B_0)$, and so
$$\int_{B_k} f(T^k(x)) \, d\mu = \int_{T^{-k}(B_0)} f(T^k(x)) \, d\mu = \int_{B_0} f(x) \, d\mu,$$
this last step due to the fact that $\mu$ is $T$-invariant. Thus, the first $m + 1$
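terms of (20) are equal, and since $B_0 = A_{n-1}$, we have
$$0 \le \sum_{k=0}^{m+n-1} \int_{B_k} f(T^k(x)) \, d\mu = (m + 1) \int_{A_{n-1}} f(x) \, d\mu + \sum_{k=m+1}^{m+n-1} \int_{B_k} f(T^k(x)) \, d\mu \le (m + 1) \int_{A_{n-1}} f(x) \, d\mu + (n - 1) \int_X |f(x)| \, d\mu.$$
So,
$$\int_{A_{n-1}} f(x) \, d\mu \ge \frac{-(n - 1) \int_X |f(x)| \, d\mu}{m + 1}.$$
Since $m$ is arbitrary, we can make it as large as we want, showing that $\int_{A_{n-1}} f \, d\mu \ge 0$; since $n$ is also arbitrary, $\int_{A_n} f \, d\mu \ge 0$ for each $n$.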
Continuing the proof of the Birkhoff ergodic theorem, for any $a, b \in \mathbb{R}$, $a < b$, the set
$$X(a, b) = \Big\{x \in X : \liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) < a < b < \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x))\Big\}$$
is measurable and $T$-invariant. Applying the Maximal Ergodic Theorem for the function $f(x) - b$, we have that $\int_B (f(x) - b) \, d\mu \ge 0$ where
$$B = \{x \in X : (f(x) - b) + (f(T(x)) - b) + \ldots + (f(T^k(x)) - b) \ge 0 \text{ for some } k \in \mathbb{N} \cup \{0\}\}$$
$$= \{x \in X : (f(x) + f(T(x)) + \ldots + f(T^{k-1}(x))) - bk \ge 0 \text{ for some } k \in \mathbb{N}\}$$
$$= \Big\{x \in X : \frac{1}{k}(f(x) + f(T(x)) + \ldots + f(T^{k-1}(x))) \ge b \text{ for some } k \in \mathbb{N}\Big\}.$$
Therefore, we see that if $x \in X(a, b)$, then $x \in B$. If $x \in B$, but $x \notin X(a, b)$, then there are two options. Either only a finite number of the inequalities
$$\frac{1}{k} \sum_{i=0}^{k-1} f(T^i(x)) < a \quad (k \in \mathbb{N})$$
hold,
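but by the Poincaré Recurrence Theorem, this can only be true for a set of measure 0; or
$$\frac{1}{k} \sum_{i=0}^{k-1} f(T^i(x)) = b \text{ for all } k \in \mathbb{N}, \text{ including } k = 1,$$
so $f(x) = b$.
In either case, we have
$$\int_{B - X(a,b)} (f(x) - b) \, d\mu = 0.$$
So,
$$0 \le \int_B (f(x) - b) \, d\mu = \int_{X(a,b)} (f(x) - b) \, d\mu + \int_{B - X(a,b)} (f(x) - b) \, d\mu = \int_{X(a,b)} (f(x) - b) \, d\mu.$$
By an analogous argument, we have
$$0 \le \int_{X(a,b)} (a - f(x)) \, d\mu$$
and therefore
$$0 \le \int_{X(a,b)} (a - f(x)) \, d\mu + \int_{X(a,b)} (f(x) - b) \, d\mu = \int_{X(a,b)} (a - b) \, d\mu.$$
But since $b > a$, the right-hand side cannot be positive, so the above inequality must be an equality, and $X(a, b)$ must have measure 0. Since $a$ and $b$ were arbitrary (it suffices to take the countable union over rational $a < b$), we have that the limit represented by $\bar{f}$ exists for almost every $x \in X$.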
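For $n \in \mathbb{N}$, let $f_n(x) = \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x))$. Define $\bar{f} : X \to \mathbb{R}$ by $\bar{f}(x) = \liminf_{n \to \infty} f_n(x)$. Then $\bar{f}$ is measurable, and $f_n$ converges for almost every $x$ to $\bar{f}$. By Fatou's Lemma,
$$\int_X |\bar{f}| \, d\mu = \int_X \liminf_{n \to \infty} |f_n(x)| \, d\mu \le \liminf_{n \to \infty} \int_X |f_n| \, d\mu \le \liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_X |f(T^j(x))| \, d\mu = \liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_{T^{-j}(X)} |f(T^j(x))| \, d\mu.$$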
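By the $T$-invariance of $\mu$, we have
$$\int_X |\bar{f}| \, d\mu \le \liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_{T^{-j}(X)} |f(T^j(x))| \, d\mu = \liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_X |f(x)| \, d\mu = \liminf_{n \to \infty} \frac{1}{n} \cdot n \int_X |f(x)| \, d\mu = \liminf_{n \to \infty} \int_X |f(x)| \, d\mu = \int_X |f(x)| \, d\mu.$$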
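Therefore, $\int_X |\bar{f}| \, d\mu < \infty$, so $\bar{f} \in L^1(\mu)$. Finally, let $N$ be the set of points for which the limit defining $\bar{f}$ does not exist ($\mu(N) = 0$). Let
$$f_n(x) = \begin{cases} \dfrac{1}{n} \displaystyle\sum_{j=0}^{n-1} f(T^j(x)) & \text{if } x \in X - N \\ 0 & \text{if } x \in N \end{cases}$$
and let
$$\bar{f}_0(x) = \begin{cases} \displaystyle\lim_{n \to \infty} \dfrac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) & \text{if } x \in X - N \\ 0 & \text{if } x \in N \end{cases}$$
Then $f_n \to \bar{f}_0$ for every $x \in X$; as has been shown, $|\bar{f}| \in L^1(\mu)$; further, $|\bar{f}| \ge |f_n(x)|$ $\forall n \in \mathbb{N}$. So, using the Dominated Convergence Theorem, we have $\bar{f}_0 \in L^1(\mu)$, and so
$$\int_X \bar{f} \, d\mu = \int_{X - N} \bar{f} \, d\mu = \int_{X - N} \bar{f}_0 \, d\mu = \int_X \bar{f}_0 \, d\mu = \lim_{n \to \infty} \int_X \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) \, d\mu = \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_X f(T^j(x)) \, d\mu$$
$$= \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_{T^{-j}(X)} f(T^j(x)) \, d\mu = \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_X f(x) \, d\mu = \lim_{n \to \infty} \frac{1}{n} \cdot n \int_X f(x) \, d\mu = \lim_{n \to \infty} \int_X f(x) \, d\mu = \int_X f(x) \, d\mu.$$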
54 Definition: Let $T$ be a measure-preserving transformation on a measure space $(X, \mathcal{A}, \mu)$. A measurable function $f : X \to \mathbb{R}$ is essentially $T$-invariant if $\mu(\{x \in X : f(T^n x) \ne f(x)\}) = 0$ for every $n$. A measurable set $A$
is essentially $T$-invariant if its characteristic function $\xi_A$ is essentially $T$-invariant. A measure-preserving transformation $T$ is ergodic if any essentially $T$-invariant measurable set has either measure 0 or full measure (the measure of the entire space).
55 Corollary to the Birkhoff Ergodic Theorem: If a measure-preserving transformation $T$ of a finite measure space $(X, \mathcal{A}, \mu)$ is ergodic, then for each $f \in L^1(\mu)$,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k(x)) = \frac{1}{\mu(X)} \int_X f(x) \, d\mu \quad \text{for almost every } x \in X.$$
The converse is also true, but we shall not need it here.
Proof. If $T$ is ergodic, then for each real number $a$ the set
$$Y_a = \Big\{x \in X : \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) < a\Big\}$$
is fully invariant. Therefore, it is essentially invariant, and we have that $\mu(Y_a)$ is either 0 or $\mu(X)$. Consider the function $a \mapsto \mu(Y_a)$. It is clearly monotonically increasing, and therefore takes the form
$$\mu(Y_a) = \begin{cases} 0 & \text{if } a < a_0 \\ \mu(X) & \text{if } a_0 < a \end{cases} \tag{21}$$
where $a_0 \in [-\infty, \infty]$. Let
$$N = \Big\{x \in X : \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) \ne a_0\Big\}.$$
$N$ is also invariant, but $\mu(N) = \mu(X)$ is incompatible with (21); so we have that $\mu(N) = 0$, so that for almost every $x \in X$,
$$\bar{f}(x) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) = \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) = a_0.$$
So that we have
$$\int_X \bar{f}(x) \, d\mu = \int_X a_0 \, d\mu = \int_X f(x) \, d\mu,$$
$$\mu(X) a_0 = \int_X f(x) \, d\mu,$$
$$a_0 = \frac{1}{\mu(X)} \int_X f(x) \, d\mu,$$
completing the proof.
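The corollary says that, for an ergodic transformation, time averages along almost every orbit agree with the space average. A quick numerical check is sketched below in Python for the rotation $T(x) = x + \alpha \bmod 1$ with $\alpha = \sqrt{2} - 1$, which is ergodic for length on $[0, 1)$, and the function $f(x) = x^2$, whose integral over $[0, 1)$ is $1/3$. The transformation, the function, the starting point, and the name `birkhoff_average` are assumptions chosen for illustration, and the floating-point orbit is only an approximation.

```python
import math

def birkhoff_average(f, T, x, n):
    """Time average (1/n) * sum_{k=0}^{n-1} f(T^k(x))."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n

alpha = math.sqrt(2) - 1
rotation = lambda x: (x + alpha) % 1.0

time_average = birkhoff_average(lambda x: x * x, rotation, 0.2, 100_000)
space_average = 1 / 3   # integral of x^2 over [0, 1), with mu(X) = 1
print(time_average, space_average)
```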
Finally, we return to the Gauss transformation.
56 Definition: The Gauss measure $\mu$ is defined by
$$\mu(A) = \frac{1}{\ln 2} \int_A \frac{dx}{1 + x}.$$
It will not be proved here, but this measure is ϕ-invariant. It is also a probability measure. We will need a few more facts before completing our final pair of theorems. First of all, through induction, we can see that if $\frac{p_n(x)}{q_n(x)}$ is the $n$th convergent of the continued fraction expansion of $x$, then
$$p_n(x) \ge 2^{(n-2)/2} \quad \text{and} \quad q_n(x) \ge 2^{(n-1)/2} \quad \text{for } n \ge 2.$$
Secondly, define $\psi_{b_1, \ldots, b_n}$ to be
$$\psi_{b_1, \ldots, b_n}(t) = [b_1, \ldots, b_{n-1}, b_n + t].$$
In other words, $\psi_{b_1, \ldots, b_n}$ maps $[0, 1)$ onto $\mathrm{pl}_{b_1, \ldots, b_n}$, and is thus decreasing if $n$ is odd and increasing if $n$ is even. For $x \in \mathrm{pl}_{b_1, \ldots, b_n}$,
$$x = \psi_{b_1, \ldots, b_n}(t) = \frac{p_n(x) + t \, p_{n-1}(x)}{q_n(x) + t \, q_{n-1}(x)}. \tag{22}$$
Finally, if $\lambda$ is the default measure on $[0, 1]$, the Lebesgue measure (which is equivalent to $m$ discussed above on objects such as intervals), it is not hard to see that then $\lambda(\mathrm{pl}_{b_1, \ldots, b_n}) = (q_n(q_n + q_{n-1}))^{-1}$.
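The ϕ-invariance of the Gauss measure is not proved in this paper, but it can be checked numerically on intervals. Assuming the usual form of the Gauss transformation, $\phi(x) = 1/x - \lfloor 1/x \rfloor$, the preimage of $[0, t)$ is the union of the intervals $(1/(k + t), 1/k]$ over $k \ge 1$, so invariance predicts that their Gauss measures add up to $\mu([0, t)) = \ln(1 + t) / \ln 2$. The Python sketch below compares the two sides; the function names and the truncation of the infinite sum at 200,000 terms are assumptions made for this illustration.

```python
import math

def gauss_measure(a, b):
    """Gauss measure of the interval (a, b], for 0 <= a <= b <= 1:
    (1 / ln 2) * integral of dx / (1 + x) from a to b."""
    return (math.log1p(b) - math.log1p(a)) / math.log(2)

def gauss_measure_of_preimage(t, terms=200_000):
    """Approximate the Gauss measure of the preimage of [0, t) under the
    Gauss transformation by summing over the intervals (1/(k+t), 1/k]."""
    return sum(gauss_measure(1 / (k + t), 1 / k) for k in range(1, terms + 1))

for t in (0.25, 0.5, 1.0):
    print(t, gauss_measure(0.0, t), gauss_measure_of_preimage(t))
```

The two columns agree up to the error caused by truncating the sum; the case $t = 1$ also confirms that the total measure is 1.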
57 Theorem: The Gauss transformation is ergodic for the Gauss measure $\mu$.
Proof. For a measure $\nu$ and measurable sets $A$ and $B$ with $\nu(B) \ne 0$, let $\nu(A|B) = \nu(A \cap B)/\nu(B)$ denote the conditional measure. Fix $b_1, \ldots, b_n$, and let $\mathrm{pl}_n = \mathrm{pl}_{b_1, \ldots, b_n}$, $\psi_n = \psi_{b_1, \ldots, b_n}$. The length of $\mathrm{pl}_n$ is $\pm(\psi_n(1) - \psi_n(0))$, and for $0 \le x < y \le 1$,
$$\lambda(\{z : x \le \phi^n(z) < y\} \cap \mathrm{pl}_n) = \pm(\psi_n(y) - \psi_n(x)),$$
where the sign depends on whether $n$ is odd or even. Therefore
$$\lambda(\phi^{-n}([x, y)) \,|\, \mathrm{pl}_n) = \frac{\psi_n(y) - \psi_n(x)}{\psi_n(1) - \psi_n(0)},$$
and, by Theorem 7 and (22),
$$\lambda(\phi^{-n}([x, y)) \,|\, \mathrm{pl}_n) = (y - x) \cdot \frac{q_n(q_n + q_{n-1})}{(q_n + x q_{n-1})(q_n + y q_{n-1})}.$$
The second factor in the right-hand side is between 1/2 and 2. Hence
$$\frac{1}{2} \lambda([x, y)) \le \lambda(\phi^{-n}([x, y)) \,|\, \mathrm{pl}_n) \le 2 \lambda([x, y)).$$
Since the intervals $[x, y)$ generate the σ-algebra,
$$\frac{1}{2} \lambda(A) \le \lambda(\phi^{-n}(A) \,|\, \mathrm{pl}_n) \le 2 \lambda(A) \tag{23}$$
for any measurable set $A \subset [0, 1]$. Because the density of the Gauss measure $\mu$ is between $1/(2 \ln 2)$ and $1/\ln 2$,
$$\frac{1}{2 \ln 2} \lambda(A) \le \mu(A) \le \frac{1}{\ln 2} \lambda(A).$$
By (23),
$$\frac{1}{4} \mu(A) \le \mu(\phi^{-n}(A) \,|\, \mathrm{pl}_n) \le 4 \mu(A)$$
for any measurable set $A \subset [0, 1]$.
Let $A$ be a measurable ϕ-invariant set with $\mu(A) > 0$. Then $\frac{1}{4} \mu(A) \le \mu(A \,|\, \mathrm{pl}_n)$, or, equivalently, $\frac{1}{4} \mu(\mathrm{pl}_n) \le \mu(\mathrm{pl}_n \,|\, A)$. Since the platforms (intervals) $\mathrm{pl}_n$ generate the σ-algebra, $\frac{1}{4} \mu(B) \le \mu(B \,|\, A)$ for any measurable set $B$. By choosing $B = [0, 1] - A$ we obtain that $\mu(A) = 1$.
This brings us to our conclusion:
58 Definition: Suppose that $x \in [0, 1]$, and that the natural number $k$ occurs $n_k$ times in the first $n$ partial quotients of the (simple) continued fraction expansion of $x$. If
$$\frac{n_k}{n} \to \kappa$$
when $n \to \infty$, then we say that $k$ appears with asymptotic frequency $\kappa$.
59 Theorem: For almost every $x \in [0, 1]$, every integer $k \in \mathbb{N}$ appears in its (simple) continued fraction expansion (i.e. in the sequence $a_1(x), a_2(x), \ldots$) with asymptotic frequency
$$\frac{1}{\ln 2} \ln\left(\frac{(k + 1)^2}{k(k + 2)}\right).$$
Numbers for which the conclusion of Theorem 59 holds are called ϕ-normal numbers.
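Proof. $a_n(x) = k$ iff $\xi_{\mathrm{pl}_k}(\phi^{n-1}(x)) = 1$, so the number $n_k$ of occurrences of $k$ among the first $n$ partial quotients is $\sum_{i=0}^{n-1} \xi_{\mathrm{pl}_k}(\phi^i(x))$. By the corollary to the Birkhoff Ergodic Theorem, for almost every $x$,
$$\lim_{n \to \infty} \frac{n_k}{n} = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \xi_{\mathrm{pl}_k}(\phi^i(x)) = \int_0^1 \xi_{\mathrm{pl}_k} \, d\mu = \mu\left(\left(\frac{1}{k + 1}, \frac{1}{k}\right]\right) = \frac{1}{\ln 2} \ln\left(\frac{(k + 1)^2}{k(k + 2)}\right).$$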
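The frequencies in Theorem 59 can be compared with an experiment. The Python sketch below computes the partial quotients of randomly chosen rationals with a 400-digit denominator using exact integer arithmetic and tallies how often each $k$ occurs. Using random rationals with a huge denominator as stand-ins for typical real numbers is an assumption of this sketch (every rational has a finite expansion and is not itself ϕ-normal), as are the function names, the seed, and the sample sizes.

```python
import math
import random
from collections import Counter
from fractions import Fraction

def predicted_frequency(k):
    """The frequency from Theorem 59: (1 / ln 2) * ln((k+1)^2 / (k(k+2)))."""
    return math.log((k + 1) ** 2 / (k * (k + 2))) / math.log(2)

def partial_quotients(x):
    """Partial quotients a_1, a_2, ... of a rational x in (0, 1)."""
    quotients = []
    p, q = x.numerator, x.denominator
    while p != 0:
        a, r = divmod(q, p)   # 1/x = q/p = a + r/p, so a is the next quotient
        quotients.append(a)
        p, q = r, p           # continue with the fractional part r/p
    return quotients

def empirical_frequencies(trials=20, digits=400, seed=0):
    """Pooled frequencies of partial quotients over random rationals."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        x = Fraction(rng.randrange(1, 10 ** digits), 10 ** digits)
        counts.update(partial_quotients(x))
    total = sum(counts.values())
    return {k: counts[k] / total for k in counts}

observed = empirical_frequencies()
for k in range(1, 6):
    print(k, round(observed.get(k, 0.0), 4), round(predicted_frequency(k), 4))
```

The observed column should track the predicted one to within sampling error; for $k = 1$ the predicted value is $\log_2(4/3)$, roughly 0.415.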
A quick application of this theorem. Suppose that you construct a tree in the following way. Start with a point; draw two line segments branching off, creating two new points. Out of each of those points, draw two new line segments, creating four new points. Continue this indefinitely. This is called an infinite binary tree. Arrange the tree so that each new row of $2^n$ points is arranged in a horizontal line beneath the previous row. From this perspective, it is possible to describe every point of the tree in terms of a unique finite sequence of lefts and rights taken from the initial point. For instance, left-right-right takes you one place and right-right-left takes you some place else. Further, each finite sequence of lefts and rights takes you somewhere on the tree. (The initial point is equivalent to zero lefts and zero rights.) Therefore, there is a bijection between the set of finite sequences of lefts and rights and the points of the binary tree. Further, it is not hard to see that you can rewrite each sequence of lefts and rights in terms of streaks of lefts and rights; specifically, you can rewrite all sequences in the form
$$R^{\alpha_0} L^{\alpha_1} R^{\alpha_2} L^{\alpha_3} \ldots R^{\alpha_{n-1}} L^{\alpha_n},$$
$$\alpha_j \in \mathbb{Z}\ \forall j; \quad \alpha_j > 0 \text{ for } 1 \le j \le n - 1; \quad \alpha_0 \text{ and } \alpha_n \ge 0.$$
Based on this, there is a bijection between points on the tree and all positive simple continued fractions, where each point on the tree $R^{\alpha_0} L^{\alpha_1} \ldots R^{\alpha_{n-1}} L^{\alpha_n}$ is mapped to
$$\alpha_0 + \cfrac{1}{\alpha_1 + \cfrac{1}{\ddots + \cfrac{1}{\alpha_{n-1} + \cfrac{1}{\alpha_n + 1}}}}$$
If we made a tree at each point of which was written its corresponding positive simple continued fraction, and then found the irreducible rational number representing that fraction, we would get the famous Stern-Brocot
tree. In such a tree, infinite continued fractions would represent strings of
lefts and rights of infinite length.
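Alternatively, suppose that we represent each point on the binary tree with a string of the form
$$L^{\alpha_1} R^{\alpha_2} \ldots L^{\alpha_{n-1}} R^{\alpha_n},$$
$$\alpha_j \in \mathbb{Z}\ \forall j; \quad \alpha_j > 0 \text{ for } 2 \le j \le n - 1; \quad \alpha_1 \text{ and } \alpha_n \ge 0.$$
Then we can establish a bijection between points on the tree and all numbers in the interval $(0, 1]$ by mapping $L^{\alpha_1} R^{\alpha_2} \ldots L^{\alpha_{n-1}} R^{\alpha_n}$ to
$$\cfrac{1}{\alpha_1 + \cfrac{1}{\alpha_2 + \cfrac{1}{\ddots + \cfrac{1}{\alpha_n + 1}}}}$$
In this case, we can directly apply Theorem 59.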
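The correspondence between paths and continued fractions can be sketched in a few lines of Python. The code below splits a path into its streaks and evaluates the continued fraction above with exact rational arithmetic. It is only a sketch under stated assumptions: the path is written as a string of the letters `L` and `R`, it is non-empty and begins with `L` (so that $\alpha_1 > 0$), and the names `streak_lengths` and `path_to_fraction` are hypothetical.

```python
from fractions import Fraction
from itertools import groupby

def streak_lengths(path):
    """Lengths of the maximal runs in path, e.g. 'LRRLLL' -> [1, 2, 3]."""
    return [len(list(group)) for _, group in groupby(path)]

def path_to_fraction(path):
    """Map L^{a1} R^{a2} ... to 1/(a1 + 1/(a2 + ... + 1/(an + 1))).
    This sketch assumes a non-empty path that begins with 'L'."""
    if not path or path[0] != "L":
        raise ValueError("this sketch assumes a non-empty path beginning with 'L'")
    alphas = streak_lengths(path)
    value = Fraction(alphas[-1] + 1)      # innermost term: a_n + 1
    for a in reversed(alphas[:-1]):
        value = a + 1 / value             # work outwards through the fraction
    return 1 / value

print(path_to_fraction("L"))      # 1 / (1 + 1)
print(path_to_fraction("LRR"))    # 1 / (1 + 1 / (2 + 1))
```

For example, the path left-right-right has streaks 1 and 2, and is sent to $1 / (1 + 1/3) = 3/4$.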
60 Definition: Suppose that we have a string of lefts and rights, $L^{\alpha_1} R^{\alpha_2} \ldots$, and that the natural number $k$ occurs $n_k$ times in the first $n$ numbers of the sequence $\{\alpha_1, \alpha_2, \ldots\}$. If
$$\frac{n_k}{n} \to \kappa$$
as $n \to \infty$, then we say that $k$ appears with asymptotic frequency $\kappa$. In that case, if we define a measure on the set of all infinite strings such that a set of strings has the measure that its corresponding set in $(0, 1]$ has, we would see that for almost every string of rights and lefts on a binary tree, every integer $k \in \mathbb{N}$ appears (as an exponent) with asymptotic frequency
$$\frac{1}{\ln 2} \ln\left(\frac{(k + 1)^2}{k(k + 2)}\right).$$
In other words, probabilistically speaking, for 100% of the paths along an
infinite tree, the distribution of streaks (i.e. the number of times one moves
the same direction once in a row, twice in a row, three times in a row,. . .,
k times in a row, etc.) will be defined by the above equation. A binary
tree describes many processes in the world, for instance the flipping of a
coin; tails = left, heads = right, for example. Since the infinite binary tree
represents all scenarios of an infinite number of coin flips, we can see that
100% of the time, the distribution of an infinite amount of coin flips will
have a distribution of streaks as defined above, and that as finite amounts
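of coin flips get larger, they will tend towards this distribution. This means,
for instance, that about 41.5% of streaks (that is, log_2(4/3)) will be of length 1.
Note that "100% of the time" here refers to the measure carried over from (0, 1],
which is not the measure under which the flips are independent and fair; for
independent fair flips, a streak has length k with probability 2^(-k).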
Finally, here are some questions I would like to investigate: what is the
relationship between the set of normal numbers and the set of ϕ-normal
numbers? What is the smallest measure 1 subset of [0, 1]? Are there complex
analogies to simple continued fractions?
That is all–thank you for reading!