Jonah Bloch-Johnson
May 11, 2008
Abstract
The sequence of quotients of the simple continued fraction representation of almost
every number in [0,1] contains the natural number k with asymptotic frequency
$$\frac{1}{\ln 2}\,\ln\frac{(k+1)^2}{k(k+2)}.$$
This result has a number of applications; for instance, it tells us the
distribution of streaks one would get after tossing a coin infinitely many times.
Introduction
This paper was prepared for Ming-Lun Hsieh for an undergraduate seminar. It is
based on research I have been performing with David Bayer. The seminar was on
number theory; whereas most number theory is concerned with properties of the
natural numbers, this paper is concerned with a property of real numbers in
the interval [0, 1]. More specifically, the paper is centered around defining a
property, that of being ϕ-normal, which is named by analogy with the more familiar
normal real numbers (don't worry if you don't know these; they are defined in
the paper). The subject requires an introduction to continued fractions, Lebesgue
integration, and dynamical systems (specifically a dynamical system known as the
Gauss transformation), as well as an important theorem of ergodic theory, the
Birkhoff Ergodic Theorem. As a result, this paper consists primarily of a
condensation of material from three books:
“An Introduction to the Theory of Numbers” by G.H.Hardy and E.M.Wright
(Chapter 10)
“Real and Complex Analysis” by Walter Rudin (Chapter 1)
“Introduction to Dynamical Systems” by Michael Brin and Garrett Stuck
(Chapters 1 and 4)
I also used a paper found online to help prove the corollary to the Birkhoff
Ergodic Theorem. My original material consists of those sections which join together
the books and the fleshing out of some of the proofs, as well as the application at
the end of the paper, which was inspired by ideas from David Bayer. A note on
numbering conventions in this paper: if a theorem is introduced as "6 Theorem:,"
then it is referred to in the text as Theorem 6.
A final note: the material in this subject touches upon two rather beautiful
structures in number theory: periodic continued fractions and the Stern-Brocot
tree. In class, people expressed particular interest in the theorem of Lagrange,
that all continued fractions representing irrational roots of integer-coefficient
quadratic equations are periodic, as well as its converse. Unfortunately, there is
not room in this paper to develop the theory behind either of them, but those who
want to investigate can look in chapter 10 of the Hardy and Wright book for the
continued fraction material and in chapters 4 and 6 of Donald Knuth's "Concrete
Mathematics" to learn more about the Stern-Brocot tree.
1 The Gauss Transformation and Continued Fractions
1 Definition: A discrete-time dynamical system (X, f) consists of a nonempty
set X and a map f : X → X. (f need not be onto.) For n ∈ N, the n-th iterate of
f is the n-fold composition $f^n = f \circ \cdots \circ f$; we define $f^0$ to be
the identity map (also denoted Id).
2 Definition: Let $\{x\}$ denote the greatest integer less than or equal to x for
x ∈ R. The map ϕ : [0, 1] → [0, 1] defined by
$$\varphi(x) = \begin{cases} \dfrac{1}{x} - \left\{\dfrac{1}{x}\right\} & \text{if } x \in (0, 1], \\ 0 & \text{if } x = 0 \end{cases}$$
is called the Gauss transformation.
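The definition can be tried out directly. The following is a minimal Python sketch, assuming exact rational arithmetic through the standard `fractions` module; the function name `gauss` and the sample point 7/24 are illustrative choices, not part of the paper.

```python
from fractions import Fraction
from math import floor

def gauss(x: Fraction) -> Fraction:
    """Gauss transformation on [0, 1]: 1/x minus its integer part, and 0 at x = 0."""
    if x == 0:
        return Fraction(0)
    inverse = 1 / x
    return inverse - floor(inverse)

# Follow one rational point until it reaches the fixed point 0.
orbit = [Fraction(7, 24)]
while orbit[-1] != 0:
    orbit.append(gauss(orbit[-1]))
print(orbit)
```

For a rational starting point the orbit reaches 0 after finitely many steps, which is the behaviour the later sections relate to finite continued fractions.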
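3 Remarks: a) On each interval $\left(\frac{1}{n+1}, \frac{1}{n}\right]$, n ∈ N, we have
$\varphi(x) = \frac{1}{x} - n$. Therefore, on every such interval ϕ is monotonically
decreasing and continuous, and has range [0, 1).
b) For all n ∈ N, $\varphi\left(\frac{1}{n}\right) = n - \{n\} = n - n = 0$.
c) For all n ≥ 2, $\lim_{\varepsilon \to 0^+} \varphi\left(\frac{1}{n} + \varepsilon\right) = 1$, since just to the
right of $\frac{1}{n}$ we have $\varphi(x) = \frac{1}{x} - (n - 1)$. So, based on 3 b), ϕ is
discontinuous at $\frac{1}{n}$ for all n ≥ 2.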
When we study a dynamical system (X,f), we are mainly interested in what
happens to the elements of X after repeated iterations of f. In the case
of the Gauss transformation, questions of this sort are greatly simpliﬁed by
knowledge of continued fractions.
4 Definition: The function of N + 1 variables
$$x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_N}}}}$$
is called a finite continued fraction, and is often rewritten $[a_0, a_1, a_2, \ldots, a_N]$.
$a_0, a_1, a_2, \ldots, a_N$ are called the partial quotients or simply quotients of the
continued fraction. Here are some facts about continued fractions that follow
quickly from their definition:
$$[a_0, a_1] = a_0 + \frac{1}{a_1}$$
$$[a_0, a_1, \ldots, a_{n-1}, a_n] = \left[a_0, a_1, \ldots, a_{n-2}, a_{n-1} + \frac{1}{a_n}\right] \tag{1}$$
$$[a_0, a_1, \ldots, a_n] = a_0 + \frac{1}{[a_1, a_2, \ldots, a_n]} = [a_0, [a_1, a_2, \ldots, a_n]]$$
and more generally,
$$[a_0, a_1, \ldots, a_n] = [a_0, a_1, \ldots, a_{m-1}, [a_m, a_{m+1}, \ldots, a_n]], \quad \text{for } 1 \le m < n. \tag{2}$$
5 Definition: We call $[a_0, a_1, \ldots, a_n]$ (0 ≤ n ≤ N) the n-th convergent to
$[a_0, a_1, \ldots, a_N]$.
6 Theorem: If $p_n$ and $q_n$ are defined by
$$p_0 = a_0, \quad p_1 = a_1 a_0 + 1, \quad p_n = a_n p_{n-1} + p_{n-2} \quad (2 \le n \le N) \tag{3}$$
$$q_0 = 1, \quad q_1 = a_1, \quad q_n = a_n q_{n-1} + q_{n-2} \quad (2 \le n \le N) \tag{4}$$
then
$$[a_0, a_1, \ldots, a_n] = \frac{p_n}{q_n}.$$
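Proof. If n = 0,
$$\frac{p_0}{q_0} = \frac{a_0}{1} = [a_0].$$
If n = 1,
$$\frac{p_1}{q_1} = \frac{a_1 a_0 + 1}{a_1} = a_0 + \frac{1}{a_1} = [a_0, a_1].$$
Suppose it is true for n ≤ m, where m < N. Then
$$[a_0, a_1, \ldots, a_{m-1}, a_m] = \frac{p_m}{q_m} = \frac{a_m p_{m-1} + p_{m-2}}{a_m q_{m-1} + q_{m-2}}.$$
It is (1) that allows the recursion to propagate. Since $p_{m-1}, p_{m-2}, q_{m-1}, q_{m-2}$
depend only on $a_0, \ldots, a_{m-1}$, we may replace $a_m$ by $a_m + \frac{1}{a_{m+1}}$ in the
formula above:
$$\begin{aligned}
[a_0, a_1, \ldots, a_{m-1}, a_m, a_{m+1}] &= \left[a_0, a_1, \ldots, a_{m-1}, a_m + \frac{1}{a_{m+1}}\right] \\
&= \frac{\left(a_m + \frac{1}{a_{m+1}}\right) p_{m-1} + p_{m-2}}{\left(a_m + \frac{1}{a_{m+1}}\right) q_{m-1} + q_{m-2}}
 = \frac{a_m p_{m-1} + \frac{p_{m-1}}{a_{m+1}} + p_{m-2}}{a_m q_{m-1} + \frac{q_{m-1}}{a_{m+1}} + q_{m-2}} \\
&= \frac{a_{m+1}(a_m p_{m-1} + p_{m-2}) + p_{m-1}}{a_{m+1}(a_m q_{m-1} + q_{m-2}) + q_{m-1}}
 = \frac{a_{m+1} p_m + p_{m-1}}{a_{m+1} q_m + q_{m-1}} = \frac{p_{m+1}}{q_{m+1}}.
\end{aligned}$$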
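The recurrences (3) and (4) are easy to check numerically. The sketch below is a hedged illustration in Python: the function names `evaluate` and `convergents` are hypothetical, and the starting values p_{-1} = 1, q_{-1} = 0 are a common convention assumed here so that one loop covers n = 1 as well; they do not appear in the paper.

```python
from fractions import Fraction

def evaluate(quotients):
    """Evaluate [a0, a1, ..., aN] directly, from the innermost term outward."""
    value = Fraction(quotients[-1])
    for a in reversed(quotients[:-1]):
        value = a + 1 / value
    return value

def convergents(quotients):
    """Return [(p0, q0), (p1, q1), ...] using the recurrences (3) and (4)."""
    p_prev, q_prev = 1, 0          # assumed convention for p_{-1}, q_{-1}
    p, q = quotients[0], 1         # p_0 = a_0, q_0 = 1
    pairs = [(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        pairs.append((p, q))
    return pairs

quotients = [3, 7, 15, 1]
for n, (p, q) in enumerate(convergents(quotients)):
    # Theorem 6: p_n / q_n equals the n-th convergent [a0, ..., an].
    assert Fraction(p, q) == evaluate(quotients[: n + 1])
print(convergents(quotients))
```

With the convention above, the first loop step gives p_1 = a_1 a_0 + 1 and q_1 = a_1, matching (3) and (4).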
7 Theorem: $p_n$ and $q_n$, as defined in Theorem 6, satisfy
$$p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}.$$
Proof.
$$\begin{aligned}
p_n q_{n-1} - p_{n-1} q_n &= (a_n p_{n-1} + p_{n-2}) q_{n-1} - p_{n-1}(a_n q_{n-1} + q_{n-2}) \\
&= a_n p_{n-1} q_{n-1} + p_{n-2} q_{n-1} - a_n p_{n-1} q_{n-1} - p_{n-1} q_{n-2} \\
&= -(p_{n-1} q_{n-2} - p_{n-2} q_{n-1}).
\end{aligned}$$
By simply repeating this deduction with n − 1, n − 2, . . . , 2 in place of n, we
get
$$p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}(p_1 q_0 - p_0 q_1) = (-1)^{n-1}\big((a_0 a_1 + 1) \cdot 1 - a_0 a_1\big) = (-1)^{n-1}.$$
Further, we have
$$\frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = \frac{p_n q_{n-1}}{q_n q_{n-1}} - \frac{p_{n-1} q_n}{q_n q_{n-1}} = \frac{p_n q_{n-1} - p_{n-1} q_n}{q_n q_{n-1}} = \frac{(-1)^{n-1}}{q_n q_{n-1}}. \tag{5}$$
8 Theorem:
$$p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n.$$
Proof.
$$\begin{aligned}
p_n q_{n-2} - p_{n-2} q_n &= (a_n p_{n-1} + p_{n-2}) q_{n-2} - p_{n-2}(a_n q_{n-1} + q_{n-2}) \\
&= a_n p_{n-1} q_{n-2} + p_{n-2} q_{n-2} - a_n p_{n-2} q_{n-1} - p_{n-2} q_{n-2} \\
&= a_n (p_{n-1} q_{n-2} - p_{n-2} q_{n-1}) = (-1)^{n-2} a_n = (-1)^n a_n.
\end{aligned}$$
Thus,
$$\frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} = \frac{(-1)^n a_n}{q_n q_{n-2}}. \tag{6}$$
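Both determinant-style identities (Theorems 7 and 8) can be spot-checked with integer arithmetic. This is a small Python sketch under stated assumptions: `convergents` and `check_identities` are illustrative names, and the sample quotient lists are arbitrary.

```python
def convergents(quotients):
    """Return [(p0, q0), (p1, q1), ...] from the recurrences (3) and (4)."""
    p_prev, q_prev = 1, 0          # assumed convention for p_{-1}, q_{-1}
    p, q = quotients[0], 1
    pairs = [(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        pairs.append((p, q))
    return pairs

def check_identities(quotients):
    """True if Theorems 7 and 8 hold for every admissible n."""
    c = convergents(quotients)
    for n in range(1, len(c)):
        (p, q), (p1, q1) = c[n], c[n - 1]
        if p * q1 - p1 * q != (-1) ** (n - 1):               # Theorem 7
            return False
    for n in range(2, len(c)):
        (p, q), (p2, q2) = c[n], c[n - 2]
        if p * q2 - p2 * q != (-1) ** n * quotients[n]:      # Theorem 8
            return False
    return True

print(check_identities([2, 1, 2, 1, 1, 4, 1, 1, 6]))
```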
In general, we write
$$x_n = \frac{p_n}{q_n} = \text{the } n\text{-th convergent}$$
and $x = x_N$. Based on (2),
$$x = [a_0, a_1, \ldots, a_N] = [a_0, a_1, \ldots, a_{m-1}, [a_m, a_{m+1}, \ldots, a_N]] = \frac{[a_m, a_{m+1}, \ldots, a_N]\, p_{m-1} + p_{m-2}}{[a_m, a_{m+1}, \ldots, a_N]\, q_{m-1} + q_{m-2}} \tag{7}$$
for 2 ≤ m ≤ N.
9 Theorem: Suppose that
$$a_j > 0 \quad (1 \le j \le N). \tag{8}$$
Then
a) The even convergents $x_{2n}$ increase strictly with n, while the odd convergents
$x_{2n+1}$ decrease strictly.
Proof. Based on (4) and (8), every $q_n$ is positive, as is every $a_n$ with
0 < n ≤ N. So $\frac{(-1)^n a_n}{q_{n-2} q_n}$ has the sign of $(-1)^n$. By (6),
$$\frac{(-1)^n a_n}{q_{n-2} q_n} = \frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} = x_n - x_{n-2}.$$
Thus, $x_n - x_{n-2}$ has the sign of $(-1)^n$; so if n is even, $x_n > x_{n-2}$, and
if n is odd, $x_n < x_{n-2}$, for all n > 1.
b) Every odd convergent is greater than any even convergent.
Proof. Every $q_n$ is positive, so $\frac{(-1)^{n-1}}{q_n q_{n-1}}$ has the sign of $(-1)^{n-1}$.
So, by (5), $x_n - x_{n-1}$ has the sign of $(-1)^{n-1}$.
Specifically, $x_{2m+1} - x_{2m}$ has the sign of $(-1)^{(2m+1)-1} = (-1)^{2m} = 1$, so
$$x_{2m+1} > x_{2m}. \tag{9}$$
Suppose Theorem 9 b) were false. Then there exist m and µ such that
$$x_{2m+1} \le x_{2\mu}.$$
If µ < m, then by Theorem 9 a), $x_{2m+1} \le x_{2\mu} < x_{2m}$, contradicting (9);
if µ > m, then $x_{2\mu+1} < x_{2m+1} \le x_{2\mu}$, also contradicting (9); and µ = m
contradicts (9) directly.
c) The value of the continued fraction is greater than that of any of its even
convergents and less than that of any of its odd convergents (except
that it is equal to the last convergent, whether this be even or odd.)
Proof. $x = x_N$ is the greatest of the even or the least of the odd convergents,
and in either case 9 a) and b) show 9 c) to be true.
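The interlacing described by Theorem 9 can be observed on a concrete example. The following Python sketch assumes positive quotients and exact rational arithmetic; the helper name `convergent_values` and the sample list [1, 2, 3, 4, 5, 6] are illustrative only.

```python
from fractions import Fraction

def convergent_values(quotients):
    """Values x_0, x_1, ..., x_N of the convergents, via recurrences (3) and (4)."""
    p_prev, q_prev = 1, 0          # assumed convention for p_{-1}, q_{-1}
    p, q = quotients[0], 1
    values = [Fraction(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        values.append(Fraction(p, q))
    return values

xs = convergent_values([1, 2, 3, 4, 5, 6])
evens, odds = xs[0::2], xs[1::2]

even_increasing = all(a < b for a, b in zip(evens, evens[1:]))   # Theorem 9 a)
odd_decreasing = all(a > b for a, b in zip(odds, odds[1:]))      # Theorem 9 a)
odd_above_even = all(o > e for o in odds for e in evens)         # Theorem 9 b)
print(even_increasing, odd_decreasing, odd_above_even)
```

Here N = 5 is odd, so the value of the fraction is the last (and least) odd convergent, in line with 9 c).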
10 Definition: A continued fraction $[a_0, a_1, \ldots, a_N]$ wherein $a_0 \in \mathbb{Z}$ and
$a_j \in \mathbb{N}$ for all j such that 1 ≤ j ≤ N is called a simple continued fraction.
Theorem 9 holds for simple continued fractions; further, from (3) and (4) we
have that for a simple continued fraction, $p_n \in \mathbb{Z}$ and $q_n \in \mathbb{N}$.
If $[a_0, a_1, a_2, \ldots, a_N] = \frac{p_N}{q_N} = x$, we say that x (which is in Q) is
represented by that continued fraction.
11 Theorem: a) $q_n \ge q_{n-1}$ for n ≥ 1, with strict inequality when n > 1, and
b) $q_n \ge n$, with strict inequality when n > 3.
Proof. $q_0 = 1$, $q_1 = a_1 \ge 1 = q_0$.
For n > 1, $q_{n-2} \ge 1$, so $q_n = a_n q_{n-1} + q_{n-2} \ge q_{n-1} + 1$; hence
$q_n > q_{n-1}$, and by induction $q_n \ge n$.
Finally, if n > 3, then $q_{n-2} \ge n - 2 > 1$, so
$q_n \ge q_{n-1} + q_{n-2} > q_{n-1} + 1 \ge n$, and therefore $q_n > n$.
12 Theorem: The convergents to a simple continued fraction are in their lowest
terms.
Proof. By Theorem 7, $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}$. So,
$$d \mid p_n \text{ and } d \mid q_n \implies d \mid (-1)^{n-1} \implies d \mid 1;$$
thus, $\gcd(p_n, q_n) = 1$.
13 Theorem: If x is representable by a simple continued fraction with an odd
number of convergents, then it is also representable by one with an even
number, and vice versa.
Proof. If $a_N \ge 2$, $[a_0, a_1, \ldots, a_N] = [a_0, a_1, \ldots, a_{N-1}, a_N - 1, 1]$;
if $a_N = 1$, $[a_0, a_1, \ldots, a_{N-1}, 1] = [a_0, a_1, \ldots, a_{N-2}, a_{N-1} + 1]$.
14 Definition: We call $a'_n = [a_n, a_{n+1}, \ldots, a_N]$ (0 ≤ n ≤ N) the n-th complete
quotient of the continued fraction $[a_0, a_1, \ldots, a_n, \ldots, a_N]$. Thus:
$$x = a'_0, \qquad x = a_0 + \frac{1}{a'_1}.$$
From (7), we have
$$x = \frac{a'_n p_{n-1} + p_{n-2}}{a'_n q_{n-1} + q_{n-2}} \quad (2 \le n \le N). \tag{10}$$
15 Theorem: $a_n = \{a'_n\}$, except that $a_{N-1} = \{a'_{N-1}\} - 1$ when $a_N = 1$.
Proof. If N = 0, then $a_0 = a'_0 = \{a'_0\}$.
If N > 0, then
$$a'_n = a_n + \frac{1}{a'_{n+1}} \quad (0 \le n \le N - 1). \tag{11}$$
Lemma: $a'_{n+1} > 1$ (0 ≤ n ≤ N − 1), except that $a'_{n+1} = 1$ when n = N − 1
and $a_N = 1$.
Proof. We will prove this using backwards induction.
For n = N − 1, $a'_{(N-1)+1} = a'_N = a_N$; if $a_N = 1$, then $a'_N = 1$. Otherwise, since
$a_N \in \mathbb{N}$, $a_N > 1$, so $a'_{(N-1)+1} > 1$.
Now suppose $a'_{n+1} \ge 1$ for some n with 1 ≤ n ≤ N − 1. Then $0 < \frac{1}{a'_{n+1}}$ and
$1 \le a_n$. So $a'_n = a_n + \frac{1}{a'_{n+1}} > 1 + 0 = 1$.
Since $a'_{n+1} > 1$ for 0 ≤ n ≤ N − 1 (except when n = N − 1 and $a_N = 1$), we have
$0 < \frac{1}{a'_{n+1}} < 1$. Thus,
$$a_n < a_n + \frac{1}{a'_{n+1}} = a'_n < a_n + 1. \tag{12}$$
So, for n ≤ N − 1, we have $a_n = \{a'_n\}$, except when $a_N = 1$, in which
case $a'_{N-1} = a_{N-1} + \frac{1}{a_N} = a_{N-1} + 1$ is an integer, so that
$a'_{N-1} = \{a'_{N-1}\}$, and
$$a_{N-1} = a'_{N-1} - \frac{1}{a_N} = a'_{N-1} - \frac{1}{1} = a'_{N-1} - 1 = \{a'_{N-1}\} - 1.$$
Finally, $a_N = a'_N = \{a'_N\}$.
16 Deﬁnition: Two continued fractions are identical if they are formed by the
same exact series of quotients.
17 Theorem: If two simple continued fractions $[a_0, a_1, \ldots, a_N]$ and
$[b_0, b_1, \ldots, b_M]$ have the same value x, and $a_N > 1$ and $b_M > 1$, then
M = N, and the fractions are identical.
Proof. By Theorem 15, $a_0 = \{a'_0\} = \{x\} = \{b'_0\} = b_0$.
Suppose the first n partial quotients in the continued fractions are identical,
and that $a'_n$ and $b'_n$ are the n-th complete quotients. Then
$x = [a_0, a_1, \ldots, a_{n-1}, a'_n] = [a_0, a_1, \ldots, a_{n-1}, b'_n]$.
If n = 1, $a_0 + \frac{1}{a'_1} = a_0 + \frac{1}{b'_1}$; so $a'_1 = b'_1$; so
$a_1 = \{a'_1\} = \{b'_1\} = b_1$.
If n > 1, then by (10),
$$x = \frac{a'_n p_{n-1} + p_{n-2}}{a'_n q_{n-1} + q_{n-2}} = \frac{b'_n p_{n-1} + p_{n-2}}{b'_n q_{n-1} + q_{n-2}}.$$
Cross-multiplying and expanding,
$$\begin{aligned}
&(a'_n p_{n-1} + p_{n-2})(b'_n q_{n-1} + q_{n-2}) - (b'_n p_{n-1} + p_{n-2})(a'_n q_{n-1} + q_{n-2}) = 0, \\
&a'_n b'_n p_{n-1} q_{n-1} + p_{n-2} q_{n-2} + a'_n p_{n-1} q_{n-2} + b'_n p_{n-2} q_{n-1} \\
&\quad - (a'_n b'_n p_{n-1} q_{n-1} + p_{n-2} q_{n-2} + b'_n p_{n-1} q_{n-2} + a'_n p_{n-2} q_{n-1}) = 0, \\
&(a'_n - b'_n)(p_{n-1} q_{n-2} - p_{n-2} q_{n-1}) = 0, \\
&(a'_n - b'_n)(-1)^n = 0, \\
&a'_n = b'_n.
\end{aligned}$$
Since $a_N > 1$ and $b_M > 1$, Theorem 15 gives $a_n = \{a'_n\} = \{b'_n\} = b_n$, so
$a_n = b_n$.
Suppose, without loss of generality, that N ≤ M. Then our argument shows
that $a_n = b_n$ for n ≤ N. If N < M, then
$$\frac{p_N}{q_N} = [a_0, a_1, \ldots, a_N] = [b_0, b_1, \ldots, b_N, b_{N+1}, \ldots, b_M] = \frac{b'_{N+1} p_N + p_{N-1}}{b'_{N+1} q_N + q_{N-1}},$$
$$p_N (b'_{N+1} q_N + q_{N-1}) = q_N (b'_{N+1} p_N + p_{N-1}),$$
$$p_N q_{N-1} - p_{N-1} q_N = 0.$$
This contradicts Theorem 7. So, M = N, and the fractions are identical.
Let x be any real number, and let $a_0 = \{x\}$. Then
$$x = a_0 + \xi_0, \quad 0 \le \xi_0 < 1.$$
If $\xi_0 \ne 0$, then we can write
$$\frac{1}{\xi_0} = a'_1, \quad \{a'_1\} = a_1, \quad a'_1 = a_1 + \xi_1, \quad 0 \le \xi_1 < 1.$$
If $\xi_1 \ne 0$, then we can write
$$\frac{1}{\xi_1} = a'_2, \quad \{a'_2\} = a_2, \quad a'_2 = a_2 + \xi_2, \quad 0 \le \xi_2 < 1.$$
In general, $\frac{1}{\xi_{n-1}} = a'_n > 1$, so $a_n \ge 1$ for n ≥ 1.
Thus, $x = [a_0, a'_1] = \left[a_0, a_1 + \frac{1}{a'_2}\right] = [a_0, a_1, a'_2] = [a_0, a_1, a_2, a'_3] = \ldots$, where
$a_0, a_1, \ldots$ are integers, and $a_j \in \mathbb{N}$ for all j ∈ N.
18 Definition: The system of equations
$$\begin{aligned}
x &= a_0 + \xi_0 && (0 \le \xi_0 < 1) \\
\frac{1}{\xi_0} = a'_1 &= a_1 + \xi_1 && (0 \le \xi_1 < 1) \\
\frac{1}{\xi_1} = a'_2 &= a_2 + \xi_2 && (0 \le \xi_2 < 1) \\
&\;\;\vdots
\end{aligned}$$
is known as the continued fraction algorithm. The algorithm continues so
long as $\xi_n \ne 0$. If we eventually reach a value of n, say N, for which $\xi_N = 0$,
then the algorithm terminates and we have:
$$x = [a_0, a_1, \ldots, a_N]$$
In this case, x is represented by a simple continued fraction, and the numbers
$a'_n$ are the complete quotients of the continued fraction.
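The algorithm translates almost line by line into code. Below is a minimal Python sketch for rational inputs, assuming exact arithmetic with `fractions.Fraction` so that the test "ξ_n = 0" is reliable; the name `cf_expansion` is illustrative and not from the paper.

```python
from fractions import Fraction
from math import floor

def cf_expansion(x: Fraction):
    """Partial quotients [a0, a1, ..., aN] produced by the continued fraction algorithm."""
    quotients = []
    while True:
        a = floor(x)           # a_n is the integer part of the complete quotient
        quotients.append(a)
        xi = x - a             # xi_n, with 0 <= xi_n < 1
        if xi == 0:            # the algorithm terminates when xi_N = 0
            return quotients
        x = 1 / xi             # next complete quotient a'_{n+1}

print(cf_expansion(Fraction(7, 24)))
print(cf_expansion(Fraction(415, 93)))
```

Floating-point input would not be a safe substitute here, since rounding can keep ξ_n from ever being exactly zero.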
19 Theorem: Any rational number x can be represented by a finite simple
continued fraction.
Proof. Put x into the continued fraction algorithm:
Case 1: x is an integer.
Then $\xi_0 = 0$, so $x = a_0$.
Case 2: x is not an integer.
Then $x = \frac{h}{k}$, h ∈ Z, k ∈ N.
Since $\frac{h}{k} = a_0 + \xi_0$, we have $h = a_0 k + \xi_0 k$.
Let $k_1 = \xi_0 k$. ($k_1$ is the remainder of h divided by k, namely
$\left(\frac{h}{k} - \left\{\frac{h}{k}\right\}\right) k$.)
So, $a'_1 = \frac{k}{k_1} = \{a'_1\} + \xi_1 = a_1 + \xi_1$, and $k = a_1 k_1 + \xi_1 k_1$.
Let $k_2 = \xi_1 k_1$; then $k = a_1 k_1 + k_2$, where $k_2$ is the remainder of k
divided by $k_1$.
We can continue this process, where $k_1 = a_2 k_2 + k_3$, $k_2 = a_3 k_3 + k_4$, . . . ,
$k_{n-2} = a_{n-1} k_{n-1} + k_n$ (writing $k_0 = k$), where
$a_{n-1} = \left\{\frac{k_{n-2}}{k_{n-1}}\right\}$ and $k_n$ is the remainder of $k_{n-2}$ divided
by $k_{n-1}$ (one implies the other). The continued fraction algorithm continues as
long as $\xi_n \ne 0$, and then $\frac{1}{\xi_n} = a'_{n+1} = \frac{k_n}{k_{n+1}}$. So,
$\xi_n = \frac{k_{n+1}}{k_n}$; thus, $\xi_n = 0 \leftrightarrow k_{n+1} = 0$.
So, the continued fraction algorithm continues as long as $k_{n+1} \ne 0$. Since $k_j$
is the remainder of $k_{j-2}$ divided by $k_{j-1}$, $k_j \ge 0$ for all j, so
$k, k_1, k_2, \ldots$ is a sequence of non-negative integers. Since $k_{n+1}$ is a
remainder on division by $k_n$, $k_n > k_{n+1}$, so this sequence is strictly
decreasing. Therefore, $k_{N+1} = 0$ for some N. Therefore, $\xi_N = 0$ for some N,
and so the continued fraction algorithm terminates, giving us a finite simple
continued fraction that represents x.
20 Remark. The system of equations used in the proof of Theorem 19 is Euclid’s
algorithm, used for determining gcd(h, k).
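The connection with Euclid's algorithm can be made concrete: the successive integer quotients of the divisions are the partial quotients of h/k. The Python sketch below is illustrative; the function name `euclid_quotients` is hypothetical, and it relies on Python's floor division, which matches the greatest-integer convention for negative h.

```python
from math import gcd

def euclid_quotients(h: int, k: int):
    """Run Euclid's algorithm on (h, k), k > 0; return (quotients, gcd)."""
    quotients = []
    while k != 0:
        quotients.append(h // k)   # integer part of the current complete quotient
        h, k = k, h % k            # the remainder becomes the next divisor
    return quotients, h            # the last nonzero remainder is gcd(h, k)

print(euclid_quotients(7, 24))
print(euclid_quotients(415, 93))
```

The quotient lists agree with the expansions 7/24 = [0, 3, 2, 3] and 415/93 = [4, 2, 6, 7].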
21 Remark. Since $\xi_N = 0$, $a'_N = a_N$; also, for N ≥ 1,
$$0 < \frac{1}{a_N} = \frac{1}{a'_N} = \xi_{N-1} < 1,$$
so $a_N > 1$, and so $a_N \ge 2$.
Hence the continued fraction algorithm gives us a representation of the type
that was shown to be unique in Theorem 17. For any such representation,
we can always make the variation of Theorem 13. This gives us:
22 Theorem: A rational number can be expressed as a ﬁnite simple continued
fraction in just two ways, one with N convergents and one with N + 1
convergents; in the former, the last partial quotient is > 1, and in the latter,
it is = 1. The continued fraction algorithm always gives us the ﬁrst type of
representation.
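The passage between the two representations of Theorem 22 is mechanical. The sketch below, in Python, uses the hypothetical helper names `evaluate` and `other_form`; it treats the one-term case [a_0] separately, rewriting it as [a_0 − 1, 1], which is an assumption about how to read the theorem when N = 0.

```python
from fractions import Fraction

def evaluate(quotients):
    """Evaluate [a0, a1, ..., aN] directly, from the innermost term outward."""
    value = Fraction(quotients[-1])
    for a in reversed(quotients[:-1]):
        value = a + 1 / value
    return value

def other_form(quotients):
    """Return the other finite simple continued fraction with the same value."""
    *head, last = quotients
    if last > 1 or not head:
        return head + [last - 1, 1]        # [..., a_N] -> [..., a_N - 1, 1]
    return head[:-1] + [head[-1] + 1]      # [..., a_{N-1}, 1] -> [..., a_{N-1} + 1]

short = [0, 3, 2, 3]
long_form = other_form(short)
print(long_form, evaluate(short) == evaluate(long_form))
```

Applying `other_form` twice returns the original list, reflecting that there are exactly two representations.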
What about irrational numbers? If we put one into the continued fraction
algorithm and the algorithm terminated, we would have a finite simple
continued fraction representing an irrational number; but it is clear that finite
simple continued fractions are rational, and so this gives us a contradiction.
So, we must have that the continued fraction algorithm never terminates
when given an irrational number. This suggests that irrational numbers might be
represented by infinite continued fractions, so let's develop a theory for those:
23 Definition: Suppose that we have a sequence $a_0, a_1, a_2, \ldots$ where $a_0 \in \mathbb{Z}$ and
$a_j \in \mathbb{N}$ for all j ∈ N. Then for every n ∈ N, $[a_0, a_1, \ldots, a_n]$ is a simple
continued fraction, which in turn represents a rational number, let us say $x_n$. If
$\lim_{n \to \infty} x_n$ exists, and we call it x, then we say that the simple continued
fraction $[a_0, a_1, a_2, \ldots]$ converges to the value x and write
$x = [a_0, a_1, a_2, \ldots]$.
24 Theorem: If $a_0, a_1, a_2, \ldots$ is a sequence of integers as described above
($a_0 \in \mathbb{Z}$, $a_j \in \mathbb{N}$ for all j ∈ N), then $x_n = [a_0, a_1, \ldots, a_n]$ tends to a
limit x when n → ∞; or, in other words, all infinite simple continued fractions
are convergent.
Proof. $x_n$ is called a convergent to $[a_0, a_1, a_2, \ldots]$ if
$$x_n = \frac{p_n}{q_n} = [a_0, a_1, \ldots, a_n].$$
If N ≥ n, then $x_n$ is also a convergent to $[a_0, a_1, \ldots, a_N]$. By Theorem 9 a),
the even convergents form an increasing sequence and the odd convergents
form a decreasing one. Every even convergent is less than $x_1$, by Theorem 9 b),
so that the increasing sequence of even convergents is bounded above; and
every odd convergent is greater than $x_0$, so that the decreasing sequence of
odd convergents is bounded below. Hence, the even convergents tend to a
limit $\ell_1$, and the odd convergents tend to a limit $\ell_2$, and $\ell_1 \le \ell_2$. By (5),
$$\frac{p_{2n}}{q_{2n}} - \frac{p_{2n-1}}{q_{2n-1}} = \frac{p_{2n} q_{2n-1} - p_{2n-1} q_{2n}}{q_{2n} q_{2n-1}} = \frac{(-1)^{2n-1}}{q_{2n} q_{2n-1}} = -\frac{1}{q_{2n} q_{2n-1}}.$$
Since $q_n \ge n$,
$$\frac{1}{q_{2n} q_{2n-1}} \le \frac{1}{2n(2n-1)}.$$
As n → ∞, $\frac{1}{2n(2n-1)} \to 0$, so $\frac{p_{2n}}{q_{2n}} - \frac{p_{2n-1}}{q_{2n-1}} \to 0$.
Thus, $\ell_1 = \ell_2$. Let us define $x := \ell_1 = \ell_2$; then $[a_0, a_1, a_2, \ldots]$
converges to this x.
This proof also gives us the following theorem:
25 Theorem: An inﬁnite simple continued fraction is less than any of its odd
convergents and greater than any of its even convergents.
We call $a'_n = [a_n, a_{n+1}, \ldots]$ the n-th complete quotient of the continued fraction
$x = [a_0, a_1, \ldots]$. Some familiar facts follow:
$$a'_n = \lim_{N \to \infty} [a_n, a_{n+1}, \ldots, a_N] = a_n + \lim_{N \to \infty} \frac{1}{[a_{n+1}, \ldots, a_N]} = a_n + \frac{1}{a'_{n+1}}.$$
For instance,
$$x = a'_0 = a_0 + \frac{1}{a'_1}.$$
Also,
$$a'_{n+1} > a_{n+1} \ge 1, \quad \text{so } 0 < \frac{1}{a'_{n+1}} < 1;$$
since $a'_n > a_n$, $a_n = a'_n - \frac{1}{a'_{n+1}}$, and $a_n \in \mathbb{Z}$, we have:
$$a_n = \{a'_n\}$$
This gives us:
26 Theorem: If $[a_0, a_1, a_2, \ldots] = x$, then
$$a_0 = \{x\}, \qquad a_n = \{a'_n\} \quad (n \ge 0).$$
Also, through a parallel argument to Thm 17, we have
27 Theorem: Two inﬁnite simple continued fractions which have the same value
are identical.
This tells us that if an irrational number can be expressed as an infinite
continued fraction, then its representation is unique. Consider the continued
fraction algorithm; if x is irrational, the algorithm doesn't terminate, and hence
it gives us a sequence of integers $a_0, a_1, a_2, \ldots$ and numbers
$a'_0, a'_1, a'_2, \ldots$, where
$$x = [a_0, a'_1] = [a_0, a_1, a'_2] = \ldots = [a_0, a_1, a_2, \ldots, a_n, a'_{n+1}],$$
and where, by (10), $x = \frac{a'_{n+1} p_n + p_{n-1}}{a'_{n+1} q_n + q_{n-1}}$, so
$$x - \frac{p_n}{q_n} = \frac{a'_{n+1} p_n + p_{n-1}}{a'_{n+1} q_n + q_{n-1}} - \frac{p_n}{q_n} = \frac{a'_{n+1} p_n q_n + p_{n-1} q_n - (a'_{n+1} p_n q_n + p_n q_{n-1})}{q_n (a'_{n+1} q_n + q_{n-1})} = \frac{(-1)^n}{q_n (a'_{n+1} q_n + q_{n-1})};$$
so, since $a'_{n+1} > a_{n+1}$,
$$\left| x - \frac{p_n}{q_n} \right| < \frac{1}{q_n (a_{n+1} q_n + q_{n-1})} = \frac{1}{q_n q_{n+1}} \le \frac{1}{n(n+1)},$$
which → 0 as n → ∞.
Thus:
$$x = \lim_{n \to \infty} \frac{p_n}{q_n} = [a_0, a_1, a_2, \ldots, a_n, \ldots]$$
So, the algorithm gives a continued fraction whose value is x. Further, this
representation is unique; each irrational number can be represented in at least
one way and at most one way as a simple continued fraction.
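The error bound above can be observed numerically. The Python sketch below assumes the well-known expansion √2 = [1, 2, 2, 2, . . .] and uses floating point for √2 itself, so it is only meaningful for the first handful of convergents, where the bound is far above double-precision rounding error; `convergents` is an illustrative helper name.

```python
import math

def convergents(quotients):
    """Return [(p0, q0), (p1, q1), ...] from the recurrences (3) and (4)."""
    p_prev, q_prev = 1, 0          # assumed convention for p_{-1}, q_{-1}
    p, q = quotients[0], 1
    pairs = [(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        pairs.append((p, q))
    return pairs

c = convergents([1] + [2] * 10)    # first eleven convergents of sqrt(2)
x = math.sqrt(2)

# |x - p_n/q_n| < 1/(q_n q_{n+1}) for each n that has a successor in the list
errors_ok = all(abs(x - p / q) < 1 / (q * c[n + 1][1])
                for n, (p, q) in enumerate(c[:-1]))
# 1/(q_n q_{n+1}) <= 1/(n(n+1)) for n >= 1
bounds_ok = all(c[n][1] * c[n + 1][1] >= n * (n + 1) for n in range(1, len(c) - 1))
print(errors_ok, bounds_ok)
```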
28 Theorem: Every irrational number can be expressed in exactly one way by
an inﬁnite simple continued fraction.
We have now done a lot of work on continued fractions, and have gotten
some important results: by putting a real number through the continued
fraction algorithm, we get a simple continued fraction equal to that number;
if the real number is rational, we will get a ﬁnite simple continued fraction
whose last quotient is greater than 1, and which is a unique expression for
that rational number among ﬁnite simple continued fractions with quotients
greater than 1; if the real number is irrational, we will get an inﬁnite simple
continued fraction which is the unique expression of that irrational number
as a simple continued fraction.
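Consider this algorithm. In general, it states that $a'_n = a_n + \xi_n$, or in other
words (when $\xi_n \ne 0$), $a'_n = \{a'_n\} + \frac{1}{a'_{n+1}}$. So,
$$a'_n - \{a'_n\} = \frac{1}{a'_{n+1}}. \tag{13}$$
Recall the Gauss transformation:
$$\varphi(x) = \begin{cases} \dfrac{1}{x} - \left\{\dfrac{1}{x}\right\} & \text{if } x \in (0, 1], \\ 0 & \text{if } x = 0. \end{cases}$$
Suppose n ≥ 0 and $0 < \xi_n < 1$; then $\xi_n = \frac{1}{a'_{n+1}}$.
For n ≥ 1, $0 < \frac{1}{a'_n} \le 1$, so $\varphi\left(\frac{1}{a'_n}\right)$ is well defined, and, by (13),
$$\varphi\left(\frac{1}{a'_n}\right) = a'_n - \{a'_n\} = \frac{1}{a'_{n+1}}.$$
Suppose that for n = N, $\xi_n = 0$; then the continued fraction algorithm terminates,
and we have that for the final complete quotient $a'_N$,
$$\varphi\left(\frac{1}{a'_N}\right) = a'_N - \{a'_N\} = 0.$$
Therefore, we have
$$\varphi^{n-1}\left(\frac{1}{a'_1}\right) = \frac{1}{a'_n}$$
unless the continued fraction is finite and n > N, in which case
$\varphi^{n-1}\left(\frac{1}{a'_1}\right) = 0$.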
Since $a_n = \{a'_n\}$,
$$a_n = \left\{ \frac{1}{\varphi^{n-1}\left(\frac{1}{a'_1}\right)} \right\}. \tag{14}$$
Let us restrict our attention to a real number x in the interval [0, 1). The
simple continued fraction expansion of x generated by the continued fraction
algorithm will have to have $a_0 = \{x\} = 0$. Therefore, if x ∈ (0, 1), then
$$x = a_0 + \frac{1}{a'_1} = \frac{1}{a'_1}.$$
This and (14) give us
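29 Theorem: If x ∈ [0, 1], its continued fraction expansion as obtained through
the continued fraction algorithm is:
[1] if x = 1,
$$\left[0, \left\{\frac{1}{x}\right\}, \left\{\frac{1}{\varphi(x)}\right\}, \left\{\frac{1}{\varphi^2(x)}\right\}, \ldots, \left\{\frac{1}{\varphi^{N-1}(x)}\right\}\right] \text{ if x is rational, and}$$
$$\left[0, \left\{\frac{1}{x}\right\}, \left\{\frac{1}{\varphi(x)}\right\}, \left\{\frac{1}{\varphi^2(x)}\right\}, \ldots, \left\{\frac{1}{\varphi^{n-1}(x)}\right\}, \ldots\right] \text{ if x is irrational.}$$
Further, if x is rational, and $a_N$ is the last quotient of its continued fraction
expansion as obtained through the algorithm, then $\varphi^m(x) = 0$ for all m ≥ N.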
For the next theorem, we will need a lemma:
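Lemma: ϕ(x) ∈ Q iff x ∈ Q.
Proof. If x ∈ Q and x ≠ 0, then $x = \frac{p}{q}$ with p, q ∈ N, and
$$\varphi(x) = \frac{q}{p} - \left\{\frac{q}{p}\right\} = \frac{q}{p} - \frac{q - (q \bmod p)}{p} = \frac{q \bmod p}{p} \in \mathbb{Q}.$$
(If x = 0, then ϕ(x) = 0 ∈ Q.)
Suppose x ∉ Q, but ϕ(x) ∈ Q. Then $\varphi(x) = \frac{a}{b}$, with a, b ∈ Z and b ≠ 0, and
$$\frac{a}{b} = \frac{1}{x} - \left\{\frac{1}{x}\right\}, \qquad \frac{a + b\left\{\frac{1}{x}\right\}}{b} = \frac{1}{x}, \qquad x = \frac{b}{a + b\left\{\frac{1}{x}\right\}},$$
where b and $a + b\left\{\frac{1}{x}\right\}$ are integers and $a + b\left\{\frac{1}{x}\right\} \ne 0$, so
x ∈ Q, giving us a contradiction.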
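We get that if x is rational, ϕ(x) is rational, and if x is irrational, then ϕ(x)
is irrational. Suppose x ∈ (0, 1) is rational, and suppose that its fraction
generated by the continued fraction algorithm terminates after $a_N$. Then $\xi_N = 0$,
so $a'_N = a_N = \{a'_N\}$; since $a'_n = \frac{1}{\varphi^{n-1}(x)}$ for 1 ≤ n ≤ N, this says
$$\frac{1}{\varphi^{N-1}(x)} = \left\{\frac{1}{\varphi^{N-1}(x)}\right\}.$$
Further, $a'_n > a_n$ for all n < N, and so
$$\frac{1}{\varphi^{n-1}(x)} > \left\{\frac{1}{\varphi^{n-1}(x)}\right\} \quad \text{for all n such that } 1 \le n < N.$$
Consider then the fraction generated by putting ϕ(x) through the algorithm.
Since $\varphi^{n-1}(x) = \varphi^{n-2}(\varphi(x))$, for 1 < n < N we have
$\frac{1}{\varphi^{n-2}(\varphi(x))} > \left\{\frac{1}{\varphi^{n-2}(\varphi(x))}\right\}$, or in other words,
$\frac{1}{\varphi^{n-1}(\varphi(x))} > \left\{\frac{1}{\varphi^{n-1}(\varphi(x))}\right\}$ for all n such that 1 ≤ n < N − 1.
Thus, if $b_j$ are the partial quotients and $b'_j$ the complete quotients of the
fraction generated by ϕ(x), then $b'_n > b_n$, so $\xi_n > 0$, for all n such that
1 ≤ n < N − 1.
Therefore, the fraction does not terminate anywhere before reaching $b_{N-1}$.
However, we have
$$\frac{1}{\varphi^{N-1}(x)} = \left\{\frac{1}{\varphi^{N-1}(x)}\right\}, \quad \text{i.e.} \quad \frac{1}{\varphi^{N-2}(\varphi(x))} = \left\{\frac{1}{\varphi^{N-2}(\varphi(x))}\right\},$$
$$b'_{N-1} = b_{N-1}, \qquad \xi_{N-1} = 0,$$
so that the fraction terminates after $b_{N-1}$. Therefore, combining this result
with Theorem 29, we get that if x is rational and x ∈ [0, 1), and the simple
continued fraction generated by putting x through the continued fraction
algorithm ends with $a_N$ (the (N+1)-th term), then the simple continued fraction
generated by putting ϕ(x) through the algorithm ends with its N-th term, and
looks like this:
$$\begin{aligned}
\varphi(x) &= \left[0, \left\{\frac{1}{\varphi(x)}\right\}, \left\{\frac{1}{\varphi(\varphi(x))}\right\}, \left\{\frac{1}{\varphi^2(\varphi(x))}\right\}, \ldots, \left\{\frac{1}{\varphi^{N-2}(\varphi(x))}\right\}\right] \\
&= \left[0, \left\{\frac{1}{\varphi(x)}\right\}, \left\{\frac{1}{\varphi^2(x)}\right\}, \left\{\frac{1}{\varphi^3(x)}\right\}, \ldots, \left\{\frac{1}{\varphi^{N-1}(x)}\right\}\right]
\end{aligned} \tag{15}$$
(remember, $x = \left[0, \left\{\frac{1}{x}\right\}, \left\{\frac{1}{\varphi(x)}\right\}, \left\{\frac{1}{\varphi^2(x)}\right\}, \ldots, \left\{\frac{1}{\varphi^{N-1}(x)}\right\}\right]$).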
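Similarly, if x is irrational and x ∈ [0, 1], then ϕ(x) is irrational, and thus:
$$\begin{aligned}
\varphi(x) &= \left[0, \left\{\frac{1}{\varphi(x)}\right\}, \left\{\frac{1}{\varphi(\varphi(x))}\right\}, \left\{\frac{1}{\varphi^2(\varphi(x))}\right\}, \ldots\right] \\
&= \left[0, \left\{\frac{1}{\varphi(x)}\right\}, \left\{\frac{1}{\varphi^2(x)}\right\}, \left\{\frac{1}{\varphi^3(x)}\right\}, \ldots\right]
\end{aligned} \tag{16}$$
(remember that $x = \left[0, \left\{\frac{1}{x}\right\}, \left\{\frac{1}{\varphi(x)}\right\}, \left\{\frac{1}{\varphi^2(x)}\right\}, \ldots\right]$).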
30 Definition: $a_n(x)$ := the (n+1)-th quotient, $a_n$, that one would get if one put
x through the continued fraction algorithm.
Using this deﬁnition, (15) and (16), we get
31 Theorem: For x ∈ [0, 1),
if n = 0, $a_n(\varphi(x)) = a_n(x) = 0$;
if n ≥ 1, $a_n(\varphi(x)) = a_{n+1}(x)$.
In other words, to get the continued fraction of ϕ(x) from the continued
fraction of x, simply remove $a_1(x)$ and shift all the entries following it over
by one:
$$x = [0, a_1(x), a_2(x), a_3(x), \ldots]$$
$$\varphi(x) = [0, a_2(x), a_3(x), a_4(x), \ldots]$$
This can be repeated, so that
32 Corollary: For x ∈ [0, 1), n ≥ 1, $a_n(\varphi^m(x)) = a_{n+m}(x)$.
Once more, in other words, to get the expansion of $\varphi^m(x)$ from that of x,
remove $a_1$ through $a_m$ of x, and shift the remaining terms over:
$$x = [0, a_1(x), a_2(x), a_3(x), \ldots, a_m(x), a_{m+1}(x), a_{m+2}(x), \ldots]$$
$$\varphi^m(x) = [0, a_{m+1}(x), a_{m+2}(x), a_{m+3}(x), \ldots]$$
(Remember, if x is rational and its expansion has N terms, the expansion of
$\varphi^m(x)$ has N − m.)
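The shift property can be verified on a rational example. The following Python sketch assumes exact arithmetic via `fractions.Fraction`; the names `gauss`, `cf_expansion`, and `shifted`, as well as the sample point 7/24, are illustrative and not from the paper.

```python
from fractions import Fraction
from math import floor

def gauss(x: Fraction) -> Fraction:
    """Gauss transformation: 1/x minus its integer part, and 0 at x = 0."""
    if x == 0:
        return Fraction(0)
    inverse = 1 / x
    return inverse - floor(inverse)

def cf_expansion(x: Fraction):
    """Partial quotients from the continued fraction algorithm (rational input)."""
    quotients = []
    while True:
        a = floor(x)
        quotients.append(a)
        if x == a:
            return quotients
        x = 1 / (x - a)

def shifted(x: Fraction, m: int):
    """Expansion of gauss^m(x), computed by actually iterating the map m times."""
    for _ in range(m):
        x = gauss(x)
    return cf_expansion(x)

x = Fraction(7, 24)                # expansion [0, 3, 2, 3]
full = cf_expansion(x)
for m in range(5):
    # Corollary 32: applying the map m times deletes a_1, ..., a_m.
    assert shifted(x, m) == [0] + full[m + 1:]
print(full, shifted(x, 1), shifted(x, 2))
```

Once m reaches the number of quotients after a_0, the iterate is 0 and the expansion is just [0].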
33 Theorem: When x ∈ [0, 1), $\varphi^{-1}(x)$, the preimage of x, consists of all continued
fractions of the form $[0, n, a_1(x), a_2(x), \ldots]$, where n ∈ N.
Proof. Suppose ϕ(y) = x. Then, by Theorem 31, $a_0(y) = a_0(x) = 0$, and for n ≥ 1,
$a_n(x) = a_{n+1}(y)$. So y is of the form $[0, z, a_1(x), a_2(x), \ldots]$. Since z is
undetermined by x, it can be any number in the range of $a_1(y)$, which is N.
So, y is of the form $[0, n, a_1(x), a_2(x), \ldots]$, where n ∈ N.
34 Corollary: When x ∈ [0, 1), $\varphi^{-m}(x)$, the preimage of x under $\varphi^m$, consists
of all continued fractions of the form $[0, n_1, n_2, \ldots, n_m, a_1(x), a_2(x), \ldots]$,
where $n_j \in \mathbb{N}$ for 1 ≤ j ≤ m.
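In terms of values, prepending a quotient n to the expansion of x gives the number 1/(n + x). The Python sketch below lists the first few preimages this way and checks them; the helper names and the sample point 3/7 are illustrative, and the expansion check is restricted to a nonzero x to avoid the boundary case y = 1.

```python
from fractions import Fraction
from math import floor

def gauss(x: Fraction) -> Fraction:
    """Gauss transformation: 1/x minus its integer part, and 0 at x = 0."""
    if x == 0:
        return Fraction(0)
    inverse = 1 / x
    return inverse - floor(inverse)

def cf_expansion(x: Fraction):
    """Partial quotients from the continued fraction algorithm (rational input)."""
    quotients = []
    while True:
        a = floor(x)
        quotients.append(a)
        if x == a:
            return quotients
        x = 1 / (x - a)

def preimages(x: Fraction, count: int):
    """The preimages 1/(n + x) of x under the Gauss map, for n = 1, ..., count."""
    return [1 / (n + x) for n in range(1, count + 1)]

x = Fraction(3, 7)                         # expansion [0, 2, 3]
for n, y in enumerate(preimages(x, 5), start=1):
    assert gauss(y) == x                   # y really maps to x
    assert cf_expansion(y) == [0, n] + cf_expansion(x)[1:]
print(preimages(x, 3))
```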
35 Remarks: The theory of simple continued fractions and the Gauss transformation
gives us a remarkable way to look at the continuum, a way different
from the standard decimal expansion. We need to develop some terminology
first.
36 Definition: The interval $\left[\frac{1}{n+1}, \frac{1}{n}\right) = \big[[0, n+1], [0, n]\big)$, n ∈ N, is called
an n-th platform of level 1.
An interval of the form
$$\left(\frac{1}{z + \frac{1}{n}}, \frac{1}{z + \frac{1}{n+1}}\right] = \big([0, z, n], [0, z, n+1]\big], \quad z, n \in \mathbb{N},$$
is called an n-th platform of level 2.
The n-th platforms of the m-th level are the intervals of the form
$$\big[[0, z_1, z_2, \ldots, z_{m-1}, n+1], [0, z_1, z_2, \ldots, z_{m-1}, n]\big), \quad z_1, z_2, \ldots, z_{m-1}, n \in \mathbb{N},$$
if m is odd and
$$\big([0, z_1, z_2, \ldots, z_{m-1}, n], [0, z_1, z_2, \ldots, z_{m-1}, n+1]\big], \quad z_1, z_2, \ldots, z_{m-1}, n \in \mathbb{N},$$
if m is even, where, written out in full,
$$[0, z_1, z_2, \ldots, z_{m-1}, n] = \cfrac{1}{z_1 + \cfrac{1}{z_2 + \cfrac{1}{\ddots + \cfrac{1}{z_{m-1} + \cfrac{1}{n}}}}}.$$
Each n-th platform of level m contains exactly one j-th platform of level m + 1
for each j ∈ N, and is in fact the union of this collection of platforms of
one higher level with its own endpoint (either the left or right endpoint,
depending on whether m is odd or even, respectively). Thus, we could describe
each platform uniquely by listing the platforms of lower levels that it is a
member of.
37 Definition: $pl_{n_1, n_2, \ldots, n_{m-1}, n_m}$ is the $n_m$-th platform of level m that's
contained in the $n_{m-1}$-th platform of level m − 1 that's contained in the . . .
that's contained in the $n_2$-th platform of level 2 that's contained in the
$n_1$-th platform of level 1.
The sequence of numbers $a_1(x), a_2(x), a_3(x), \ldots$ associated with a number
x ∈ [0, 1), being the quotients of the simple continued fraction generated by
putting x through the continued fraction algorithm, gives us the address of
the number x in terms of which platforms it's on. For each $a_j(x)$, we
know that x is on the $a_j(x)$-th platform of level j. More specifically, we know
that $x \in pl_{a_1(x), a_2(x), \ldots, a_n(x)}$ for all n ≤ N if x is rational and $a_N$ is the
last quotient of x, or for all n ∈ N if x is irrational.
Let us call the simple continued fraction generated by the continued fraction
algorithm applied to a number x the continued fraction expansion of x. A
number's decimal expansion tells us its location in the decimal system, in
which each level is divided into ten equal pieces. The continued fraction
expansion of a number x ∈ [0, 1) tells us its location in the coordinate system
of the platforms, in which each level is divided into a countably infinite
number of parts of decreasing length (the n-th platform of level 1, for
instance, has length $\frac{1}{n(n+1)}$).
Now, it is not hard to see that $\varphi(pl_{n_1, n_2, n_3, \ldots})$ is equal to
$pl_{n_2, n_3, n_4, \ldots}$. In this way, the Gauss transformation strips away levels of
information about the location of a number: if we know only $\varphi^m(x)$, we don't
know which platforms of levels 1 through m the number x resides on.
One ﬁnal note for now about the relationship of ϕ(x) and the platforms.
Suppose we deﬁne PL to be the set of all platforms of all levels, and suppose
that we deﬁne f(X), where X is a collection of sets and f is a function on
the members of those sets to be the set of images of the members of X under
f. Then we have that
$$\varphi(PL) = PL$$
and so
$$\varphi^m(PL) = PL, \quad m \in \mathbb{N}.$$
2 The Lebesgue Integral and Ergodic Theory
I will only have space to give a skeletal version of the theory behind the
Lebesgue integral; I will state many theorems without proving them.
38 Deﬁnition: A topology (X, τ) consists of a collection τ of subsets of a set X
such that:
i) ∅ ∈ τ and X ∈ τ.
ii) If $V_i \in \tau$ for i = 1, . . . , n, then $V_1 \cap V_2 \cap \cdots \cap V_n \in \tau$.
iii) If $\{V_\alpha\}$ is an arbitrary collection of members of τ (finite, countable, or
uncountable), then $\bigcup_\alpha V_\alpha \in \tau$.
If there exists such a τ for a set X, then X is called a topological space, and
the members of τ are called the open sets in X.
39 Definition: A collection M of subsets of a set X is said to be a σ-algebra
in X if M has the following properties:
i) X ∈ M.
ii) If A ∈ M, then X − A ∈ M.
iii) If $A = \bigcup_{n=1}^{\infty} A_n$ and if $A_n \in M$ for n = 1, 2, 3, . . . , then A ∈ M.
If there exists such an M for a set X, then X is called a measurable space,
and the members of M are called the measurable sets in X. Further, if X is
a measurable space, Y is a topological space, and f is a mapping of X into
Y, then f is said to be measurable if $f^{-1}(V)$ is a measurable set in X for
every open set V in Y.
40 Remarks: Another way of describing a σ-algebra is to say that it contains
the empty set and that it is closed under countable intersections, countable
unions, and complements. An example of a commonly used topology is the
topology on the extended real number line [−∞, ∞] generated by taking
finite intersections and arbitrary unions of open intervals (intervals of the form
[−∞, a), (a, b), (a, ∞]). An example of a commonly used σ-algebra is the
σ-algebra on the extended real number line [−∞, ∞] generated by taking
complements, countable unions, and countable intersections of intervals (all
intervals of the form [−∞, a), [−∞, a], (a, b), (a, b], [a, b), [a, b], (a, ∞], [a, ∞]).
41 Definition: Let $\{a_n\}$ be a sequence in [−∞, ∞], and put
$$b_k = \sup\{a_k, a_{k+1}, a_{k+2}, \ldots\} \quad (k = 1, 2, 3, \ldots) \tag{17}$$
where sup is the least upper bound of the sequence $a_k, a_{k+1}, a_{k+2}, \ldots$ and
$$\beta = \inf\{b_1, b_2, b_3, \ldots\} \tag{18}$$
where inf is the greatest lower bound of the sequence $b_1, b_2, b_3, \ldots$. We call
β the upper limit of $\{a_n\}$, and write
$$\beta = \limsup_{n \to \infty} a_n.$$
The following properties are easily verified: first, $b_1 \ge b_2 \ge b_3 \ge \ldots$, so that
$b_k \to \beta$ as k → ∞; secondly, there is a subsequence $\{a_{n_i}\}$ of $\{a_n\}$ such that
$a_{n_i} \to \beta$ as i → ∞, and β is the largest number with this property.
The lower limit is defined analogously: interchange sup and inf in (17) and (18).
If $\{a_n\}$ converges, we have:
$$\limsup_{n \to \infty} a_n = \liminf_{n \to \infty} a_n = \lim_{n \to \infty} a_n$$
and some thought shows that, conversely, if
$\limsup_{n \to \infty} a_n = \liminf_{n \to \infty} a_n$, then $a_n$ converges to that common value.
Suppose $\{f_n\}$ is a sequence of extended-real functions on a set X. Then
$\sup_n f_n$ and $\limsup_{n \to \infty} f_n$ are the functions defined on X by
$$\left(\sup_n f_n\right)(x) = \sup_n \big(f_n(x)\big)$$
$$\left(\limsup_{n \to \infty} f_n\right)(x) = \limsup_{n \to \infty} \big(f_n(x)\big)$$
42 Definition: If E is a subset of X, then the function on X
$$\xi_E(x) = \begin{cases} 1 & \text{if } x \in E \\ 0 & \text{if } x \notin E \end{cases}$$
is called the characteristic function of the set E.
A function s on a measurable space X whose range consists of only finitely
many points in [0, ∞) is known as a simple function. Clearly, a simple function
is expressible as a finite linear combination of characteristic functions. A
rather beautiful theorem that we do not have the time to prove is the following:
43 Theorem: Let f : X → [0, ∞] be measurable. There exist simple measurable
functions $s_n$ on X such that
a) $0 \le s_1 \le s_2 \le \ldots \le f$.
b) $s_n(x) \to f(x)$ as n → ∞, for every x ∈ X.
44 Definition: A measure is a function µ, defined on a σ-algebra M, whose
range is in [0, ∞] and which is σ-additive, meaning that if $\{A_i\}$ is a disjoint
countable collection of members of M, then
$$\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i).$$
We will also assume that there exists A ∈ M such that µ(A) < ∞. A measurable
space with a measure defined on it is called a measure space.
As an example, take the $\sigma$-algebra described above on the extended real
line, $[-\infty, \infty]$; let us call it $M$, and define the following measure on it: if $I$
is an interval in $M$, let $m(I) = b - a$, where $a$ and $b$ are the left and right
endpoints of $I$, respectively, regardless of whether $I$ is open, semi-open, or
closed. If $X \in M$ is not an interval, it is not hard to see that it is expressible
as a countable union of disjoint intervals; as such, define $m(X)$ as the sum
of the measures of the disjoint intervals resulting from a decomposition of $X$.
It can be shown that $m$ is well-defined in this way.
A consequence of this is that $m(\{2\}) = m([2, 2]) = 0$. Thus, we have a
nonempty set of measure 0. In fact, since measures are $\sigma$-additive, we
can take the union of a countable number of such single-point intervals and
get an infinite set of measure 0. We can therefore come to the conclusion
that the set of rational numbers has measure 0; and we can go further still: if
you extend our definition of measure to cover a larger group of sets, creating
what is known as the Lebesgue measure (which we will not define here), one
finds that one can have an uncountable set of measure 0, namely the famous
Cantor set. Further:
45 Definition: Suppose that $x \in [0, 1]$, and that the digit $b$ occurs $n_b$ times in
the first $n$ places of the expansion of $x$ in base $r$. If
$$\frac{n_b}{n} \to \beta$$
when $n \to \infty$, then we say that $b$ appears with asymptotic frequency $\beta$. (This
limit need not exist for a given $x$.) We say that $x$ is simply normal base $r$ if
$$\frac{n_b}{n} \to \frac{1}{r}$$
for each of the $r$ possible values of $b$. Further, we say that $x$ is normal base
$r$ if all of the numbers
$$x - \lfloor x \rfloor,\quad rx - \lfloor rx \rfloor,\quad r^2 x - \lfloor r^2 x \rfloor,\quad \ldots$$
are simply normal in all of the scales
$$r, r^2, r^3, \ldots$$
This is the same as saying that in the expansion of $x$ in base $r$, every
combination
$$b_1 b_2 \ldots b_k$$
of digits occurs with the right frequency: if $n_b$ is the number of times this
sequence occurs in the first $n$ digits of $x$, then
$$\frac{n_b}{n} \to \frac{1}{r^k}$$
as $n \to \infty$. There is a theorem, the proof of which is once more beyond the
scope of this paper, that the set of normal numbers in $[0, 1]$ has measure 1; thus, the
set of numbers in $[0, 1]$ that are not normal has measure 0. When something
is true for all members of a measure space $X$ except for a set of measure 0,
we say that it is true for almost every $x \in X$. Thus, almost every $x$ in $[0, 1]$
is normal. The purpose of this paper is to prove an analogous result for the
continued fraction expansion of a number in $[0, 1]$.
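As a small illustration of Definition 45, the following sketch counts digit frequencies in a finite prefix of a base-$r$ expansion. The helper names and the test value $1/7$ are my own choices; $1/7 = 0.142857142857\ldots$ is rational, so it belongs to the measure 0 set of numbers that are not simply normal base 10.

```python
from fractions import Fraction
from collections import Counter

def base_digits(x, r, n):
    """First n digits of x in [0, 1) in base r, using exact rational arithmetic."""
    digits = []
    for _ in range(n):
        x *= r
        d = x.numerator // x.denominator   # integer part is the next digit
        digits.append(d)
        x -= d
    return digits

def digit_frequencies(digits, r):
    """Relative frequency n_b / n of every digit b in 0..r-1."""
    counts = Counter(digits)
    return {b: Fraction(counts[b], len(digits)) for b in range(r)}

freqs = digit_frequencies(base_digits(Fraction(1, 7), 10, 600), 10)
for b in range(10):
    print(b, freqs[b])
```

The digits 1, 4, 2, 8, 5, 7 each appear with frequency 1/6 and the digits 0, 3, 6, 9 never appear, whereas simple normality would require every digit to have frequency 1/10.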
One of the reasons that it is significant for a given set to be of measure 0
is demonstrated in the construction of the Lebesgue integral, which brings
together a number of the ideas presented in this section:
46 Definition: Suppose that $(X, M)$ is a measurable space and that $\mu$ is a measure
on $M$. If $s$ is a measurable simple function on $X$, we have pointed out
that it is expressible as
$$s = \sum_{i=1}^{n} \alpha_i \xi_{A_i},$$
where $\alpha_1, \ldots, \alpha_n$ are the distinct values of $s$, $A_i = \{x : s(x) = \alpha_i\}$, and $\xi_{A_i}$
is the characteristic function of $A_i$. If $E \in M$, we define
$$\int_E s\,d\mu = \sum_{i=1}^{n} \alpha_i\,\mu(A_i \cap E).$$
(If this equation requires us to take $0 \cdot \infty$, we follow the convention $0 \cdot \infty = 0$.)
If $f : X \to [0, \infty]$ is measurable, and $E \in M$, we define
$$\int_E f\,d\mu = \sup \int_E s\,d\mu, \tag{19}$$
the supremum being taken over all simple measurable functions $s$ such that
$0 \le s \le f$. The left member of (19) is called the Lebesgue integral of $f$ over
$E$, with respect to the measure $\mu$. Further, suppose that $f$ is a real-valued
measurable function on $X$ for which
$$\int_X |f|\,d\mu < \infty.$$
We call the collection of such functions $L^1(\mu)$, also known as the Lebesgue
integrable functions (with respect to $\mu$). Let $f^+ = \max\{f, 0\}$ and $f^- = -\min\{f, 0\}$; then we define
$$\int_E f\,d\mu = \int_E f^+\,d\mu - \int_E f^-\,d\mu$$
for every measurable set $E$. One of the consequences of such a definition is
that if $E$ is a set of measure 0 and $f \in L^1(\mu)$, then $\int_E f\,d\mu = 0$, and that as
such, sets of measure 0 exert no influence on integrals taken over them.
Before getting to the most important theorems of this paper, we will need
two more results from Lebesgue theory, which I will state without proof:
47 Fatou's Lemma: If $f_n : X \to [0, \infty]$ is measurable for each positive integer
$n$, then
$$\int_X \Bigl(\liminf_{n\to\infty} f_n\Bigr)\,d\mu \le \liminf_{n\to\infty} \int_X f_n\,d\mu.$$
48 Lebesgue's Dominated Convergence Theorem: Suppose $\{f_n\}$ is a sequence
of real measurable functions on $X$ such that
$$f(x) = \lim_{n\to\infty} f_n(x)$$
exists for every $x \in X$. If there is a function $g \in L^1(\mu)$ such that
$$|f_n(x)| \le g(x) \quad (n = 1, 2, 3, \ldots;\ x \in X),$$
then $f \in L^1(\mu)$,
$$\lim_{n\to\infty} \int_X |f_n - f|\,d\mu = 0,$$
and
$$\lim_{n\to\infty} \int_X f_n\,d\mu = \int_X f\,d\mu.$$
49 Definition: Suppose $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ are measure spaces. We can
redefine measurable function from $X$ to $Y$ without respect to a topology on
$Y$ by saying that a function $T : X \to Y$ is measurable if the preimage
of any measurable set is measurable. A measurable function $T : X \to Y$
is measure-preserving if $\mu(T^{-1}(B)) = \nu(B)$ for all $B \in \mathcal{B}$. If $T$ maps $X$ onto
itself, we call $T$ a measure-preserving transformation, and we say that $\mu$ is
$T$-invariant.
If $\mu(X) = 1$, we say that $(X, \mathcal{A}, \mu)$ is a probability space. If $\mu(X)$ is finite,
then we say $(X, \mathcal{A}, \mu)$ is a finite measure space, which can be rescaled to be
a probability space by scaling $\mu$ by $\frac{1}{\mu(X)}$.
50 Poincaré Recurrence Theorem: Let $T$ be a measure-preserving transformation
of a finite measure space $(X, \mathcal{A}, \mu)$. If $A$ is a measurable set, then for almost
every $x \in A$, there are infinitely many $k \in \mathbb{N}$ such that $T^k(x) \in A$.
Proof. Let
$$B = \{x \in A : T^k(x) \notin A \text{ for all } k \in \mathbb{N}\} = A - \bigcup_{k \in \mathbb{N}} T^{-k}(A).$$
Then $B \in \mathcal{A}$, and all the preimages $T^{-k}(B)$ are disjoint, are measurable,
and have the same measure as $B$. Since $X$ has finite total measure, and
$$\mu(X) \ge \mu\Bigl(\bigcup_{k \in \mathbb{N}} T^{-k}(B)\Bigr) = \sum_{k=1}^{\infty} \mu(T^{-k}(B)) = \sum_{k=1}^{\infty} \mu(B),$$
it follows that $\mu(B) = 0$. Hence almost every $x \in A$ lies in $A - B$, so
that for almost every $x \in A$ there is an $n \in \mathbb{N}$ with $T^n(x) \in A$. Since $T^n(x) \in A$, there is an
$n_1 \in \mathbb{N}$ with $T^{n_1}(T^n(x)) = T^{n+n_1}(x) \in A$; by similar arguments, there are $n_2, n_3, \ldots$ such that
$$T^{n+n_1}(x),\ T^{n+n_1+n_2}(x),\ T^{n+n_1+n_2+n_3}(x),\ \ldots \in A.$$
Before proving the most important theorem of the paper, we need to prove
a combinatorial lemma:
51 Definition: If $a_1, \ldots, a_m$ are real numbers and $1 \le n \le m$, we say that $a_k$ is
an $n$-leader if $a_k + \ldots + a_{k+p-1} \ge 0$ for some $p$, $1 \le p \le n$ (for instance,
nonnegative numbers are always $n$-leaders).
52 Lemma: For every $n$, $1 \le n \le m$, the sum of all $n$-leaders is nonnegative.
Proof. If there are no $n$-leaders, the lemma is true. Otherwise, let $a_k$ be
the first $n$-leader, and let $p \ge 1$ be the smallest integer for which $a_k + \ldots + a_{k+p-1} \ge 0$. If $p > 1$, then $a_k < 0$, and we have that
$a_{k+1} + \ldots + a_{k+p-1} \ge 0$, and so $a_{k+1}$ is an $n$-leader. Repeating this argument,
we get that if $k \le j \le k + p - 1$, then $a_j + \ldots + a_{k+p-1} \ge 0$ (otherwise a shorter sum
starting at $a_k$ would already be nonnegative, contradicting the choice of $p$), and so $a_j$ is
an $n$-leader, and we have that for the total collected $n$-leaders thus far, $a_k, \ldots, a_{k+p-1}$, their
sum is nonnegative. Repeating the argument with the remaining sequence
$a_{k+p}, \ldots, a_m$ proves the lemma.
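Lemma 52 is easy to test by brute force. The sketch below is only a sanity check, not part of the proof: it finds the $n$-leaders of a list directly from Definition 51 (requiring the sum $a_k + \ldots + a_{k+p-1}$ to stay inside the sequence) and confirms that they add up to a nonnegative number. The function names, the example list, and the random integer sequences are my own assumptions.

```python
import random

def n_leaders(a, n):
    """0-based indices k with a[k] + ... + a[k+p-1] >= 0 for some 1 <= p <= n."""
    leaders = []
    for k in range(len(a)):
        partial = 0
        for p in range(1, n + 1):
            if k + p > len(a):       # the sum must stay inside the sequence
                break
            partial += a[k + p - 1]
            if partial >= 0:
                leaders.append(k)
                break
    return leaders

def lemma_holds(trials=200, seed=0):
    """Check the lemma on random integer sequences, for every n between 1 and m."""
    rng = random.Random(seed)
    for _ in range(trials):
        m = rng.randint(1, 12)
        a = [rng.randint(-5, 5) for _ in range(m)]
        for n in range(1, m + 1):
            if sum(a[k] for k in n_leaders(a, n)) < 0:
                return False
    return True

example = [-1, 2, -3, 1, 1, -5]
leaders = n_leaders(example, 2)
print(leaders, sum(example[k] for k in leaders))
print(lemma_holds())
```

In the example, the 2-leaders are the entries $-1, 2, 1, 1$: the leading $-1$ qualifies because $-1 + 2 \ge 0$, while $-3$ and $-5$ do not.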
53 Birkhoff Ergodic Theorem: Let $T$ be a measure-preserving transformation
in a finite measure space $(X, \mathcal{A}, \mu)$, and let $f \in L^1(\mu)$ be real-valued. Then the
limit
$$\bar{f}(x) = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k(x))$$
exists for almost every $x \in X$. Further, $\bar{f} \in L^1(\mu)$, is $T$-invariant, and satisfies
$$\int_X \bar{f}(x)\,d\mu = \int_X f(x)\,d\mu.$$
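Proof. Let
$$A = \{x \in X : f(x) + f(T(x)) + \ldots + f(T^k(x)) \ge 0 \text{ for some } k \in \mathbb{N} \cup \{0\}\}.$$
Lemma: (Maximal Ergodic Theorem)
$$\int_A f(x)\,d\mu \ge 0.$$
Proof. Let $A_n = \{x \in X : \sum_{i=0}^{k} f(T^i(x)) \ge 0 \text{ for some } k,\ 0 \le k \le n\}$. Then
$A_n \subset A_{n+1}$ and $A = \bigcup_{n \in \mathbb{N}} A_n$. Define:
$$g_n(x) = \begin{cases} f(x) & \text{if } x \in A_n \\ 0 & \text{if } x \in X - A_n \end{cases}$$
Clearly, $g_n \in L^1(\mu)$, and $f(x) = \lim_{n\to\infty} g_n(x)$ for all $x \in A$. Also, $|g_{n+1}| \ge |g_n|$
and $|f| \ge |g_n|$ for all $n \in \mathbb{N}$. So, applying the Dominated Convergence
Theorem, we get:
$$\int_A f\,d\mu = \lim_{n\to\infty} \int_A g_n\,d\mu = \lim_{n\to\infty} \int_{A_n} f\,d\mu.$$
Therefore, it suffices to show that $\int_{A_n} f\,d\mu \ge 0$ for each $n$.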
Fix an arbitrary $m \in \mathbb{N}$. Let $s_n(x)$ be the sum of the $n$-leaders in the
sequence $f(x), f(T(x)), \ldots, f(T^{m+n-1}(x))$. For $k \le m + n - 1$, let $B_k \subset X$
be the set of points for which $f(T^k(x))$ is an $n$-leader of this sequence. From
Lemma 52, we have $0 \le \int_X s_n(x)\,d\mu$. A little bit of thought shows that:
$$0 \le \int_X s_n(x)\,d\mu = \sum_{k=0}^{m+n-1} \int_{B_k} f(T^k(x))\,d\mu. \tag{20}$$
Note that for $1 \le k \le m$, $x \in B_k$ iff $T(x) \in B_{k-1}$, and further, for these $k$,
$B_k = T^{-1}(B_{k-1}) = T^{-k}(B_0)$, and so
$$\int_{B_k} f(T^k(x))\,d\mu = \int_{T^{-k}(B_0)} f(T^k(x))\,d\mu = \int_{B_0} f(x)\,d\mu,$$
this last step due to the fact that $\mu$ is $T$-invariant. Thus, the first $m + 1$
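terms of (20) are equal, and since $B_0 = A_{n-1}$, we have
$$0 \le \sum_{k=0}^{m+n-1} \int_{B_k} f(T^k(x))\,d\mu = (m+1) \int_{A_{n-1}} f(x)\,d\mu + \sum_{k=m+1}^{m+n-1} \int_{B_k} f(T^k(x))\,d\mu \le (m+1) \int_{A_{n-1}} f(x)\,d\mu + (n-1) \int_X |f(x)|\,d\mu.$$
So,
$$\int_{A_{n-1}} f(x)\,d\mu \ge \frac{-(n-1) \int_X |f(x)|\,d\mu}{m+1}.$$
Since $m$ is arbitrary, we can make it as large as we want, showing that
$\int_{A_{n-1}} f\,d\mu \ge 0$; since $n$ was also arbitrary, $\int_{A_n} f\,d\mu \ge 0$ for each $n$.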
Continuing the proof of the Birkhoff ergodic theorem, for any $a, b \in \mathbb{R}$, $a < b$,
the set
$$X(a, b) = \Bigl\{x \in X : \liminf_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) < a < b < \limsup_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x))\Bigr\}$$
is measurable and $T$-invariant. Applying the Maximal Ergodic Theorem for
the function $f(x) - b$, we have that $\int_B (f(x) - b)\,d\mu \ge 0$, where
$$\begin{aligned}
B &= \{x \in X : (f(x) - b) + (f(T(x)) - b) + \ldots + (f(T^k(x)) - b) \ge 0 \text{ for some } k \in \mathbb{N} \cup \{0\}\} \\
&= \{x \in X : (f(x) + f(T(x)) + \ldots + f(T^{k-1}(x))) - bk \ge 0 \text{ for some } k \in \mathbb{N}\} \\
&= \Bigl\{x \in X : \frac{1}{k}\bigl(f(x) + f(T(x)) + \ldots + f(T^{k-1}(x))\bigr) \ge b \text{ for some } k \in \mathbb{N}\Bigr\}.
\end{aligned}$$
Therefore, we see that if $x \in X(a, b)$, then $x \in B$. If $x \in B$, but $x \notin X(a, b)$,
then there are two options. Either only a finite number of the averages
$$\frac{1}{k} \sum_{i=0}^{k-1} f(T^i(x)), \quad k \in \mathbb{N},$$
are less than $a$,
but by the Poincaré Recurrence Theorem, this can only be true for a set of
measure 0; or
$$\frac{1}{k} \sum_{i=0}^{k-1} f(T^i(x)) = b \text{ for all } k \in \mathbb{N}, \text{ including } k = 1,$$
so $f(x) = b$.
In either case, we have
$$\int_{B - X(a,b)} (f(x) - b)\,d\mu = 0.$$
So,
$$0 \le \int_B (f(x) - b)\,d\mu = \int_{X(a,b)} (f(x) - b)\,d\mu + \int_{B - X(a,b)} (f(x) - b)\,d\mu = \int_{X(a,b)} (f(x) - b)\,d\mu.$$
By an analogous argument, we have
$$0 \le \int_{X(a,b)} (a - f(x))\,d\mu,$$
and therefore
$$0 \le \int_{X(a,b)} (a - f(x))\,d\mu + \int_{X(a,b)} (f(x) - b)\,d\mu = \int_{X(a,b)} (a - b)\,d\mu = (a - b)\,\mu(X(a, b)).$$
But since $b > a$, the right-hand side is nonpositive, so the above inequality must be an equality, and $X(a, b)$ must have
measure 0. Since $a$ and $b$ were arbitrary, we have that the limit represented
by $\bar{f}$ exists for almost every $x \in X$.
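For $n \in \mathbb{N}$, let $f_n(x) = \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x))$. Define $\bar{f} : X \to \mathbb{R}$ by $\bar{f}(x) = \liminf_{n\to\infty} f_n(x)$. Then $\bar{f}$ is measurable, and $f_n$ converges for almost every $x$ to
$\bar{f}$. By Fatou's Lemma,
$$\int_X |\bar{f}|\,d\mu = \int_X \liminf_{n\to\infty} |f_n(x)|\,d\mu \le \liminf_{n\to\infty} \int_X |f_n|\,d\mu \le \liminf_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_X |f(T^j(x))|\,d\mu = \liminf_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_{T^{-j}(X)} |f(T^j(x))|\,d\mu.$$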
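By the $T$-invariance of $\mu$, we have
$$\int_X |\bar{f}|\,d\mu \le \liminf_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_{T^{-j}(X)} |f(T^j(x))|\,d\mu = \liminf_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_X |f(x)|\,d\mu = \liminf_{n\to\infty} \frac{1}{n} \cdot n \int_X |f(x)|\,d\mu = \liminf_{n\to\infty} \int_X |f(x)|\,d\mu = \int_X |f(x)|\,d\mu.$$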
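Therefore, $\int_X |\bar{f}|\,d\mu < \infty$, so $\bar{f} \in L^1(\mu)$. Finally, let $N$ be the set of points for
which the limit defining $\bar{f}$ does not exist (so $\mu(N) = 0$). Let
$$f_n(x) = \begin{cases} \dfrac{1}{n} \displaystyle\sum_{j=0}^{n-1} f(T^j(x)) & \text{if } x \in X - N \\ 0 & \text{if } x \in N \end{cases}$$
and let
$$\bar{f}_0(x) = \begin{cases} \displaystyle\lim_{n\to\infty} \dfrac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) & \text{if } x \in X - N \\ 0 & \text{if } x \in N \end{cases}$$
Then $f_n \to \bar{f}_0$ for every $x \in X$; as has been shown, $|\bar{f}| \in L^1(\mu)$; further,
$|\bar{f}| \ge f_n(x)$ for all $n \in \mathbb{N}$. So, using the Dominated Convergence Theorem, we
have $\bar{f}_0 \in L^1(\mu)$, and so
$$\begin{aligned}
\int_X \bar{f}\,d\mu &= \int_{X-N} \bar{f}\,d\mu = \int_{X-N} \bar{f}_0\,d\mu = \int_X \bar{f}_0\,d\mu \\
&= \lim_{n\to\infty} \int_X \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x))\,d\mu = \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_X f(T^j(x))\,d\mu \\
&= \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_{T^{-j}(X)} f(T^j(x))\,d\mu = \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int_X f(x)\,d\mu \\
&= \lim_{n\to\infty} \frac{1}{n} \cdot n \int_X f(x)\,d\mu = \lim_{n\to\infty} \int_X f(x)\,d\mu = \int_X f(x)\,d\mu.
\end{aligned}$$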
54 Definition: Let $T$ be a measure-preserving transformation on a measure space
$(X, \mathcal{A}, \mu)$. A measurable function $f : X \to \mathbb{R}$ is essentially $T$-invariant
if $\mu(\{x \in X : f(T^n x) \ne f(x)\}) = 0$ for every $n$. A measurable set $A$
is essentially $T$-invariant if its characteristic function $\xi_A$ is essentially $T$-invariant.
A measure-preserving transformation $T$ is ergodic if any essentially
$T$-invariant measurable set has either measure 0 or full measure (the measure
of the entire space).
55 Corollary to the Birkhoff Ergodic Theorem: If a measure-preserving transformation
$T$ in a finite measure space $(X, \mathcal{A}, \mu)$ is ergodic, then for each
$f \in L^1(\mu)$
$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k(x)) = \frac{1}{\mu(X)} \int_X f(x)\,d\mu \quad \text{for almost every } x \in X.$$
The converse is also true, but we shall not need it here.
Proof. If $T$ is ergodic, then for each real number $a$ the set
$$Y_a = \Bigl\{x \in X : \limsup_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) < a\Bigr\}$$
is fully invariant. Therefore, it is essentially invariant, and we have that
$\mu(Y_a)$ is either 0 or $\mu(X)$. Consider the function $a \mapsto \mu(Y_a)$. It is clearly
monotonically increasing, and therefore takes the form
$$\mu(Y_a) = \begin{cases} 0 & \text{if } a < a_0 \\ \mu(X) & \text{if } a_0 < a \end{cases} \tag{21}$$
where $a_0 \in [-\infty, \infty]$. Let
$$N = \Bigl\{x \in X : \limsup_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) \ne a_0\Bigr\}.$$
$N$ is also invariant, but $\mu(N) = \mu(X)$ is incompatible with (21); so we have
that $\mu(N) = 0$, so that for almost every $x \in X$,
$$\bar{f}(x) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) = \limsup_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) = a_0.$$
So that we have
$$\int_X \bar{f}(x)\,d\mu = \int_X a_0\,d\mu = \mu(X)\,a_0 \quad \text{and} \quad \int_X \bar{f}(x)\,d\mu = \int_X f(x)\,d\mu,$$
and hence
$$a_0 = \frac{1}{\mu(X)} \int_X f(x)\,d\mu,$$
completing the proof.
Finally, we return to the Gauss transformation.
56 Definition: The Gauss measure $\mu$ is defined by
$$\mu(A) = \frac{1}{\ln 2} \int_A \frac{dx}{1 + x}.$$
It will not be proved here, but this measure is $\phi$-invariant. It is also a
probability measure. We will need a few more facts before completing our
final pair of theorems. First of all, through induction, we can see that if
$\frac{p_n(x)}{q_n(x)}$ is the $n$th convergent of the continued fraction expansion of $x$, then
$$p_n(x) \ge 2^{(n-2)/2} \quad \text{and} \quad q_n(x) \ge 2^{(n-1)/2} \quad \text{for } n \ge 2.$$
Secondly, define $\psi_{b_1,\ldots,b_n}$ to be
$$\psi_{b_1,\ldots,b_n}(t) = [b_1, \ldots, b_{n-1}, b_n + t].$$
In other words, $\psi_{b_1,\ldots,b_n}$ maps $[0, 1)$ onto $\mathrm{pl}_{b_1,\ldots,b_n}$, and is thus decreasing if
$n$ is odd and increasing if $n$ is even. For $x \in \mathrm{pl}_{b_1,\ldots,b_n}$,
$$x = \psi_{b_1,\ldots,b_n}(t) = \frac{p_n(x) + t\,p_{n-1}(x)}{q_n(x) + t\,q_{n-1}(x)}. \tag{22}$$
Finally, if $\lambda$ is the default measure on $[0, 1]$, the Lebesgue measure (which
agrees with $m$ discussed above on objects such as intervals), it is not hard
to see that $\lambda(\mathrm{pl}_{b_1,\ldots,b_n}) = (q_n(q_n + q_{n-1}))^{-1}$.
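The invariance of the Gauss measure, which is stated without proof after Definition 56, can at least be checked numerically on intervals $[0, t]$. The sketch below assumes the usual form of the Gauss transformation, $\phi(x) = 1/x - \lfloor 1/x \rfloor$, so that on the branch where $\lfloor 1/x \rfloor = k$ the condition $\phi(x) \le t$ means $x \in [1/(k+t), 1/k]$. The function names, the value $t = 0.3$, and the truncation of the infinite union to finitely many branches are my own choices.

```python
import math

def gauss_measure(a, b):
    """Gauss measure of the interval [a, b] in [0, 1]: (1/ln 2) * integral of dx/(1+x)."""
    return math.log((1 + b) / (1 + a)) / math.log(2)

def preimage_measure(t, terms=100_000):
    """Gauss measure of {x : phi(x) <= t}, truncated to the first `terms` branches.

    On the branch where floor(1/x) = k, phi(x) = 1/x - k, so phi(x) <= t
    exactly when x lies in [1/(k + t), 1/k].
    """
    return sum(gauss_measure(1 / (k + t), 1 / k) for k in range(1, terms + 1))

t = 0.3
lhs = preimage_measure(t)        # measure of the preimage of [0, t]
rhs = gauss_measure(0.0, t)      # measure of [0, t] itself
print(lhs, rhs)
```

The two printed values agree up to the truncation error, which is of order $t / (\text{terms} \cdot \ln 2)$. The same functions confirm that the whole interval has measure 1, i.e. that $\mu$ is a probability measure.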
57 Theorem: The Gauss transformation is ergodic for the Gauss measure $\mu$.
Proof. For a measure $\nu$ and measurable sets $A$ and $B$ with $\nu(B) \ne 0$, let
$\nu(A|B) = \nu(A \cap B)/\nu(B)$ denote the conditional measure. Fix $b_1, \ldots, b_n$,
and let $\mathrm{pl}_n = \mathrm{pl}_{b_1,\ldots,b_n}$, $\psi_n = \psi_{b_1,\ldots,b_n}$. The length of $\mathrm{pl}_n$ is $\pm(\psi_n(1) - \psi_n(0))$,
and for $0 \le x < y \le 1$,
$$\lambda(\{z : x \le \phi^n(z) < y\} \cap \mathrm{pl}_n) = \pm(\psi_n(y) - \psi_n(x)),$$
where the sign depends on whether $n$ is odd or even. Therefore
$$\lambda(\phi^{-n}([x, y))\,|\,\mathrm{pl}_n) = \frac{\psi_n(y) - \psi_n(x)}{\psi_n(1) - \psi_n(0)},$$
and, by Theorem 7 and (22),
$$\lambda(\phi^{-n}([x, y))\,|\,\mathrm{pl}_n) = (y - x) \cdot \frac{q_n(q_n + q_{n-1})}{(q_n + x q_{n-1})(q_n + y q_{n-1})}.$$
The second factor in the right-hand side is between 1/2 and 2. Hence
$$\frac{1}{2} \lambda([x, y)) \le \lambda(\phi^{-n}([x, y))\,|\,\mathrm{pl}_n) \le 2 \lambda([x, y)).$$
Since the intervals $[x, y)$ generate the $\sigma$-algebra,
$$\frac{1}{2} \lambda(A) \le \lambda(\phi^{-n}(A)\,|\,\mathrm{pl}_n) \le 2 \lambda(A) \tag{23}$$
for any measurable set $A \subset [0, 1]$. Because the density of the Gauss measure
$\mu$ is between $1/(2 \ln 2)$ and $1/\ln 2$,
$$\frac{1}{2 \ln 2} \lambda(A) \le \mu(A) \le \frac{1}{\ln 2} \lambda(A).$$
By (23),
$$\frac{1}{4} \mu(A) \le \mu(\phi^{-n}(A)\,|\,\mathrm{pl}_n) \le 4 \mu(A)$$
for any measurable set $A \subset [0, 1]$.
Let $A$ be a measurable $\phi$-invariant set with $\mu(A) > 0$. Then $\frac{1}{4} \mu(A) \le \mu(A\,|\,\mathrm{pl}_n)$, or, equivalently, $\frac{1}{4} \mu(\mathrm{pl}_n) \le \mu(\mathrm{pl}_n\,|\,A)$. Since the platforms (intervals)
$\mathrm{pl}_n$ generate the $\sigma$-algebra, $\frac{1}{4} \mu(B) \le \mu(B\,|\,A)$ for any measurable set $B$.
By choosing $B = [0, 1] - A$ we obtain that $\mu(A) = 1$.
This brings us to our conclusion:
58 Definition: Suppose that $x \in [0, 1]$, and that the natural number $k$ occurs
$n_k$ times in the first $n$ partial quotients of the (simple) continued fraction
expansion of $x$. If
$$\frac{n_k}{n} \to \kappa$$
when $n \to \infty$, then we say that $k$ appears with asymptotic frequency $\kappa$.
59 Theorem: For almost every $x \in [0, 1]$, every integer $k \in \mathbb{N}$ appears in its
(simple) continued fraction expansion (i.e. in the sequence $a_1(x), a_2(x), \ldots$)
with asymptotic frequency
$$\frac{1}{\ln 2} \ln \frac{(k+1)^2}{k(k+2)}.$$
Numbers for which the conclusion of Theorem 59 holds are called $\phi$-normal numbers.
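Proof. $a_{i+1}(x) = k$ iff $\xi_{\mathrm{pl}_k}(\phi^i(x)) = 1$. By the corollary to the Birkhoff Ergodic
Theorem, for almost every $x$,
$$\lim_{n\to\infty} \frac{n_k}{n} = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \xi_{\mathrm{pl}_k}(\phi^i(x)) = \int_0^1 \xi_{\mathrm{pl}_k}\,d\mu = \mu\Bigl(\Bigl[\frac{1}{k+1}, \frac{1}{k}\Bigr]\Bigr) = \frac{1}{\ln 2} \ln \frac{(k+1)^2}{k(k+2)}.$$
Since there are only countably many $k$, the exceptional sets of measure 0 can be combined into a single set of measure 0.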
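Theorem 59 can be explored empirically. The sketch below is an experiment rather than a proof: it takes a pseudo-random rational number with a very large denominator as a stand-in for a "typical" $x \in [0, 1]$, computes its first partial quotients with the Euclidean algorithm, and compares the observed frequencies with the formula of the theorem. The seed, the bit length, the number of partial quotients, and the function names are all arbitrary choices of mine.

```python
import math
import random
from collections import Counter

def gauss_kuzmin(k):
    """Asymptotic frequency of the partial quotient k from Theorem 59."""
    return math.log((k + 1) ** 2 / (k * (k + 2))) / math.log(2)

def partial_quotients(p, q, limit):
    """First `limit` partial quotients a_1, a_2, ... of p/q in (0, 1), via Euclid."""
    out = []
    while p and len(out) < limit:
        a, r = divmod(q, p)      # a = floor(q/p); r/p is the next remainder
        out.append(a)
        p, q = r, p
    return out

rng = random.Random(2008)        # fixed seed so the run is repeatable
bits = 12_000
q = 1 << bits
p = rng.getrandbits(bits) | 1    # odd numerator, so 0 < p/q < 1 in lowest terms
digits = partial_quotients(p, q, 3000)
counts = Counter(digits)
for k in range(1, 6):
    print(k, round(counts[k] / len(digits), 4), round(gauss_kuzmin(k), 4))
```

For $k = 1$ the formula gives $\log_2(4/3) \approx 0.415$, and the frequencies over all $k$ sum to 1 because the product of the terms $(k+1)^2 / (k(k+2))$ telescopes to 2.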
A quick application of this theorem. Suppose that you construct a tree in
the following way. Start with a point; draw two line segments branching
off, creating two new points. Out of each of those points, draw two new
line segments, creating four new points. Continue this indefinitely. This is
called an infinite binary tree. Arrange the tree so that each new row of $2^n$
points is arranged in a horizontal line beneath the previous row. From this
perspective, it is possible to describe every point of the tree in terms of a
unique sequence of lefts and rights taken from the initial point. For instance,
left-right-right takes you one place and right-right-left takes you some place
else. Further, each sequence of lefts and rights takes you somewhere on the
tree. (The initial point is equivalent to zero lefts and zero rights.) Therefore,
there is a bijection between the set of sequences of lefts and rights and the points
of the binary tree. Further, it is not hard to see that you can rewrite each
sequence of lefts and rights in terms of streaks of lefts and rights; specifically,
you can rewrite all sequences in the form
$$R^{\alpha_0} L^{\alpha_1} R^{\alpha_2} L^{\alpha_3} \ldots R^{\alpha_{n-1}} L^{\alpha_n},$$
$$\alpha_j \in \mathbb{Z}\ \forall j; \quad \alpha_j > 0 \text{ for } 1 \le j \le n - 1; \quad \alpha_0, \alpha_n \ge 0.$$
Based on this, there is a bijection between points on the tree and all positive
simple continued fractions, where each point on the tree $R^{\alpha_0} L^{\alpha_1} \ldots R^{\alpha_{n-1}} L^{\alpha_n}$
is mapped to
$$\alpha_0 + \cfrac{1}{\alpha_1 + \cfrac{1}{\ddots + \cfrac{1}{\alpha_{n-1} + \cfrac{1}{\alpha_n + 1}}}}$$
If we made a tree at each point of which was written its corresponding
positive simple continued fraction, and then found the irreducible rational
number representing that fraction, we would get the famous Stern-Brocot
tree. In such a tree, inﬁnite continued fractions would represent strings of
lefts and rights of inﬁnite length.
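The correspondence between paths and continued fractions can be checked directly on small cases. The sketch below walks the Stern-Brocot tree by taking mediants, starting from the root 1/1 (a standard description of that tree which I am assuming here), and compares the result with the continued fraction built from the streak lengths $\alpha_0, \ldots, \alpha_n$ of the path. The function names are hypothetical, and the check covers only paths of length at most 8.

```python
from fractions import Fraction
from itertools import groupby, product

def stern_brocot(path):
    """Walk the Stern-Brocot tree from the root 1/1 following 'L'/'R' moves."""
    lo, hi = (0, 1), (1, 0)                      # boundary fractions 0/1 and 1/0
    node = (1, 1)
    for step in path:
        if step == "L":
            hi = node
        else:
            lo = node
        node = (lo[0] + hi[0], lo[1] + hi[1])    # mediant of the two boundaries
    return Fraction(node[0], node[1])

def streak_exponents(path):
    """Exponents alpha_0, ..., alpha_n of R^a0 L^a1 ... R^a(n-1) L^an."""
    alphas = [len(list(g)) for _, g in groupby(path)]
    if path[:1] != "R":
        alphas.insert(0, 0)      # no leading R-streak: alpha_0 = 0
    if len(alphas) % 2 == 1:
        alphas.append(0)         # no trailing L-streak: alpha_n = 0
    return alphas

def continued_fraction_value(alphas):
    """Value of alpha_0 + 1/(alpha_1 + ... + 1/(alpha_n + 1))."""
    value = Fraction(alphas[-1] + 1)
    for a in reversed(alphas[:-1]):
        value = a + 1 / value
    return value

paths = ["".join(t) for n in range(9) for t in product("LR", repeat=n)]
agree = all(stern_brocot(p) == continued_fraction_value(streak_exponents(p)) for p in paths)
print(agree, stern_brocot("RL"), stern_brocot("LR"))
```

For instance, the path right-left has $\alpha_0 = \alpha_1 = 1$ and lands on $1 + \frac{1}{1 + 1} = \frac{3}{2}$, while left-right lands on $\frac{2}{3}$.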
Alternatively, suppose that we represent each point on the binary tree with
a string of the form
$$L^{\alpha_1} R^{\alpha_2} \ldots L^{\alpha_{n-1}} R^{\alpha_n},$$
$$\alpha_j \in \mathbb{Z}\ \forall j; \quad \alpha_j > 0 \text{ for } 2 \le j \le n - 1; \quad \alpha_1, \alpha_n \ge 0.$$
Then we can establish a bijection between points on the tree and all numbers
in the interval $(0, 1]$ by mapping $L^{\alpha_1} R^{\alpha_2} \ldots L^{\alpha_{n-1}} R^{\alpha_n}$ to
$$\cfrac{1}{\alpha_1 + \cfrac{1}{\alpha_2 + \cfrac{1}{\ddots + \cfrac{1}{\alpha_n + 1}}}}$$
In this case, we can directly apply Theorem 59.
60 Definition: Suppose that we have a string of lefts and rights, $L^{\alpha_1} R^{\alpha_2} \ldots$, and
that the natural number $k$ occurs $n_k$ times in the first $n$ numbers of the
sequence $\{\alpha_1, \alpha_2, \ldots\}$. If
$$\frac{n_k}{n} \to \kappa$$
as $n \to \infty$, then we say that $k$ appears with asymptotic frequency $\kappa$. In that case, if
we define a measure on the set of all infinite strings such that a set of strings
has the measure that its corresponding set in $(0, 1]$ has, we would see that for
almost every string of rights and lefts on a binary tree, every integer $k \in \mathbb{N}$
appears (as an exponent) with asymptotic frequency
$$\frac{1}{\ln 2} \ln \frac{(k+1)^2}{k(k+2)}.$$
In other words, probabilistically speaking, for 100% of the paths along an
infinite tree (with respect to this measure), the distribution of streaks (i.e. the number of times one moves
the same direction once in a row, twice in a row, three times in a row, ...,
$k$ times in a row, etc.) will be defined by the above equation. A binary
tree describes many processes in the world, for instance the flipping of a
coin; tails = left, heads = right, for example. Since the infinite binary tree
represents all scenarios of an infinite number of coin flips, we can see that
100% of the time, an infinite sequence of coin flips will
have a distribution of streaks as defined above, and that as finite numbers
of coin flips get larger, they will tend towards this distribution. This means,
for instance, that about 41.5% of streaks will be of length 1.
Finally, here are some questions I would like to investigate: what is the
relationship between the set of normal numbers and the set of $\phi$-normal
numbers? What is the smallest measure 1 subset of $[0, 1]$? Are there complex
analogues of simple continued fractions?
That is all. Thank you for reading!