
Quadratic Residues and Non-Residues: Selected Topics
arXiv:1408.0235v7 [math.NT] 21 Oct 2016

Steve Wright
Department of Mathematics and Statistics
Oakland University
Rochester, Michigan 48309
U.S.A.
e-mail: wright@oakland.edu
For Linda

Contents

Preface

Chapter 1. Introduction: Solving the General Quadratic Congruence Modulo a Prime

1. Linear and Quadratic Congruences
2. The Disquisitiones Arithmeticae
3. Notation, Terminology, and Some Useful Elementary Number Theory

Chapter 2. Basic Facts

1. The Legendre Symbol, Euler’s Criterion, and other Important Things
2. The Basic Problem and the Fundamental Problem for a Prime
3. Gauss’ Lemma and the Fundamental Problem for the Prime 2

Chapter 3. Gauss’ Theorema Aureum: the Law of Quadratic Reciprocity

1. What is a reciprocity law?
2. The Law of Quadratic Reciprocity
3. Some History
4. Proofs of the Law of Quadratic Reciprocity
5. A Proof of Quadratic Reciprocity via Gauss’ Lemma
6. Another Proof of Quadratic Reciprocity via Gauss’ Lemma
7. A Proof of Quadratic Reciprocity via Gauss Sums: Introduction
8. Algebraic Number Theory
9. Proof of Quadratic Reciprocity via Gauss Sums: Conclusion
10. A Proof of Quadratic Reciprocity via Ideal Theory: Introduction
11. The Structure of Ideals in a Quadratic Number Field
12. Proof of Quadratic Reciprocity via Ideal Theory: Conclusion
13. A Proof of Quadratic Reciprocity via Galois Theory

Chapter 4. Four Interesting Applications of Quadratic Reciprocity

1. Solution of the Fundamental Problem for Odd Primes
2. Solution of the Basic Problem
3. Sets of Integers which are Quadratic Residues of Infinitely Many Primes
4. Intermezzo: Dirichlet’s Theorem on Primes in Arithmetic Progression
5. The Asymptotic Density of Primes
6. The Density of Primes which have a Given Finite Set of Quadratic Residues
7. Zero-Knowledge Proofs and Quadratic Residues
8. Jacobi Symbols
9. An Algorithm for Fast Computation of Legendre Symbols

Chapter 5. The Zeta Function of an Algebraic Number Field and Some Applications

1. Dedekind’s Ideal Distribution Theorem
2. The Zeta Function of an Algebraic Number Field
3. The Zeta Function of a Quadratic Number Field
4. Proof of Theorem 4.12 and Related Results
5. Proof of the Fundamental Theorem of Ideal Theory

Chapter 6. Elementary Proofs

1. Whither Elementary Proofs in Number Theory?
2. An Elementary Proof of Theorem 5.13
3. An Elementary Proof of Theorem 4.12

Chapter 7. Dirichlet L-functions and the Distribution of Quadratic Residues

1. Positivity of Sums of Values of a Legendre Symbol
2. Proof of Theorem 7.1: Outline of the Argument
3. Some Useful Facts About Dirichlet L-functions
4. Calculation of a Gauss Sum
5. Some Useful Facts About Analytic Functions of a Complex Variable
6. The Convergence of Fourier Series
7. Proof of Theorems 7.2, 7.3, and 7.4
8. An Elegant Proof of Lemma 4.8 for Real Dirichlet Characters
9. A Proof of Quadratic Reciprocity via Finite Fourier Series

Chapter 8. Dirichlet’s Class-Number Formula

1. Some Structure Theory for Dirichlet Characters
2. The Structure of Real Primitive Dirichlet Characters
3. Elements of the Theory of Quadratic Forms
4. Representation of Integers by Quadratic Forms and the Class Number
5. The Class-Number Formula
6. The Class-Number Formula and the Class Number of Quadratic Fields
7. A Character-Theoretic Proof of Quadratic Reciprocity

Chapter 9. Quadratic Residues and Non-Residues in Arithmetic Progression

1. Long Sets of Consecutive Residues and Non-Residues
2. Long Sets of Residues and Non-Residues in Arithmetic Progression
3. Weil Sums and their Estimation
4. Solution of Problems 1 and 3
5. Solution of Problems 2 and 4: Introduction
6. Preliminary Estimate of $q_\varepsilon(p)$
7. Calculation of $\Sigma_4(p)$: Preliminaries
8. The (B, S)-signature of a Prime
9. Calculation of $\Sigma_4(p)$: Conclusion
10. Solution of Problems 2 and 4: Conclusion
11. An Interesting Class of Examples
12. The Asymptotic Density of $\Pi_+(a, b)$

Chapter 10. Are quadratic residues randomly distributed?

1. Irregularity of the Distribution of Quadratic Residues
2. Detecting Random Behavior Using the Central Limit Theorem
3. Verifying Random Behavior via a Result of Davenport and Erdös

Bibliography

Index

Preface

Although number theory as a coherent mathematical subject started with the work of
Fermat in the 1630s, modern number theory, i.e., the systematic and mathematically rigorous
development of the subject from fundamental properties of the integers, began in
1801 with the appearance of Gauss’ landmark treatise Disquisitiones Arithmeticae [19]. A
major part of the Disquisitiones deals with quadratic residues and non-residues: if p is an
odd prime, an integer z not divisible by p is a quadratic residue (respectively, a quadratic
non-residue) of p if there is (respectively, is not) an integer x such that $x^2 \equiv z \bmod p$.
As we shall see, quadratic residues arise naturally as soon as one wants to solve the general
quadratic congruence $ax^2 + bx + c \equiv 0 \bmod m$, $a \not\equiv 0 \bmod m$, and this, in fact, motivated some of the interest which
Gauss himself had in them. Beginning with Gauss’ fundamental contributions, the study of
quadratic residues and non-residues has subsequently led directly to many of the key ideas
and techniques that are used everywhere in number theory today, and the primary goal of
these lecture notes is to use this study as a window through which to view the development
of some of those ideas and techniques. In pursuit of that goal, we will employ methods from
elementary, analytic, and combinatorial number theory, as well as methods from the theory
of algebraic numbers.
In order to follow these lectures most profitably, the reader should have some familiarity
with the basic results of elementary number theory. An excellent source for this material (and
much more) is the text [30] of Kenneth Ireland and Michael Rosen, A Classical Introduction
to Modern Number Theory. A feature of this text that is of particular relevance to what we
discuss is Ireland and Rosen’s treatment of quadratic and higher-power residues, which is
noteworthy for its elegance and completeness, as well as for its historical perspicacity. We
will in fact make use of some of their work in Chapters 3 and 7.
Although not absolutely necessary, some knowledge of algebraic number theory will also
be helpful for reading these notes. We will provide complete proofs of some facts about
algebraic numbers and we will quote other facts without proof. Our reference for proof of
the latter results is the classical treatise of Erich Hecke [27], Vorlesungen über die Theorie der
Algebraischen Zahlen, in the very readable English translation by G. Brauer and J. Goldman.
About Hecke’s text André Weil ([58], foreword) had this to say: “To improve upon Hecke,
in a treatment along classical lines of the theory of algebraic numbers, would be a futile and
impossible task.” We concur enthusiastically with Weil’s assessment and highly recommend
Hecke’s book to all those who are interested in number theory.
We next offer a brief overview of what is to follow. The notes are arranged in a series
of ten chapters. Chapter 1, an introduction to the subsequent chapters, provides some
motivation for the study of quadratic residues and non-residues by consideration of what
needs to be done when one wishes to solve the general quadratic congruence mentioned
above. We briefly discuss the contents of the Disquisitiones Arithmeticae, present some
biographical information about Gauss, and also record some basic results from elementary
number theory that will be used frequently in the sequel. Chapter 2 provides some useful
facts about quadratic residues and non-residues upon which the rest of the chapters are
based. Here we also describe a procedure which provides a strategy for solving what we
call the Basic Problem: if d is an integer, find all primes p such that d is a quadratic
residue of p. The Law of Quadratic Reciprocity is the subject of Chapter 3. We present
seven proofs of this fundamentally important result (five in Chapter 3, one in Chapter
7, and one in Chapter 8), which focus primarily (but not exclusively) on the ideas used
in the proofs of quadratic reciprocity which Gauss discovered. Chapter 4 discusses some
interesting and important applications of quadratic reciprocity, having to do with the solution
of the Basic Problem from Chapter 2 and with the structure of the finite subsets S of the
positive integers possessing at least one of the following two properties: for infinitely many
primes p, S is a set of quadratic residues of p, or for infinitely many primes p, S is a set of
quadratic non-residues of p. Here the fundamental contributions of Dirichlet to the theory
of quadratic residues enter our story and begin a major theme that will play throughout
the rest of our work. Chapter 4 concludes with an interesting application of quadratic
residues in modern cryptology, to so-called zero-knowledge or minimal-disclosure proofs.
The use of transcendental methods in the theory of quadratic residues, begun in Chapter 4,
continues in Chapter 5 with the study of the zeta function of an algebraic number field and
its application to the solution of some of the problems taken up in Chapter 4. Chapter 6
gives elementary proofs of some of the results in Chapter 5 which obviate the use made there
of the zeta function. The question of how quadratic residues and non-residues of a prime p
are distributed among the integers 1, 2, . . . , p − 1 is considered in Chapter 7, and there we
highlight additional results and methods due to Dirichlet which employ the basic theory of
L-functions attached to Dirichlet characters determined by certain moduli. Because of the
importance that positivity of the values at s = 1 of Dirichlet L-functions plays in the proof of
the results of Chapter 7, we present in Chapter 8 a discussion and proof of Dirichlet’s class-number
formula as a way to definitively explain why the values at s = 1 of L-functions are
positive. In Chapter 9 the occurrence of quadratic residues and non-residues as arbitrarily
long arithmetic progressions is studied by means of some ideas of Harold Davenport [5] and
some techniques in combinatorial number theory developed in recent work of the author [62],
[63]. A key issue that arises in our approach to this problem is the estimation of certain
character sums over the field of p elements, p a prime, and we address this issue by using some
results of A. Weil [57] and G. I. Perel’muter [44]. Our discussion concludes with Chapter 10,
where the Central Limit Theorem from the theory of probability and a theorem of Davenport
and Paul Erdös [6] are used to provide evidence for the contention that as the prime p tends
to infinity, quadratic residues of p are distributed randomly throughout certain subintervals
of the set {1, 2, . . . , p − 1}.
These notes are an elaboration of the contents of a special-topics-in-mathematics course
that was offered during the Summer semesters of 2014 and 2015 at Oakland University. I
am very grateful to my colleague Meir Shillor for suggesting that I give such a course, and
for thereby providing me with the impetus to think about what such a course would entail.
I am also very grateful to my colleagues Eddie Cheng and Serge Kruk, the former for giving
me very generous and valuable assistance with numerous LaTeX issues which arose during
the preparation of the manuscript, and the latter for formatting all of the figures in the text.
I thank my students Saad Al Najjar, Amelia McIlvenna and Julian Venegas for reading an
early version of the notes and offering several insightful comments which were very helpful
to me. My sincere and heartfelt appreciation is also tendered to the anonymous referees
for many comments and suggestions which resulted in a very substantial improvement in
both the content and exposition of these notes. Finally, and above all others, I am grateful
beyond words to my dear wife Linda for her unstinting love, support, and encouragement;
this humble missive is dedicated to her.

CHAPTER 1

Introduction: Solving the General Quadratic Congruence Modulo a Prime

The purpose of this chapter is to define quadratic residues and non-residues and to use
the solution of the general quadratic congruence modulo a prime to indicate one reason why
the study of quadratic residues and non-residues is interesting and important. This is done
in section 1. The primary source for essential information about quadratic residues is the
Disquisitiones Arithmeticae of Carl Friedrich Gauss, one of the most important books about
number theory ever written. Because of its singular prominence for number theory and also
for what we will do in these lecture notes, the contents of the Disquisitiones are discussed
briefly in section 2, and some biographical facts about Gauss are also presented. Notation
and terminology that will be employed throughout the sequel are recorded in section 3, as
well as a few basic facts from elementary number theory that will be used frequently in
subsequent work.

1. Linear and Quadratic Congruences

One of the central problems of number theory, both ancient and modern, is finding
solutions (in the integers) of polynomial equations with integer coefficients in one or more
variables. In order to motivate our study, consider the equation

ax ≡ b mod m,

a linear equation in the unknown integer x. Elementary number theory provides an algorithm
for determining exactly when this equation has a solution, and for finding all such solutions,
which essentially involves nothing more sophisticated than the Euclidean algorithm (see
Proposition 1.4 below and the comments after it).
When we consider what happens for the general quadratic congruence

(1) $ax^2 + bx + c \equiv 0 \bmod m$, $a \not\equiv 0 \bmod m$,

things get more complicated. In order to see what the issues are, note first that

$(2ax + b)^2 \equiv b^2 - 4ac \bmod 4am$
iff $4a^2x^2 + 4abx + 4ac \equiv 0 \bmod 4am$
iff $4a(ax^2 + bx + c) \equiv 0 \bmod 4am$
iff $ax^2 + bx + c \equiv 0 \bmod m$.

Hence (1) has a solution if and only if

(2) 2ax ≡ s − b mod 4am,

where s is a solution of

(3) $s^2 \equiv b^2 - 4ac \bmod 4am$.

Now (2) has a solution if and only if s − b is divisible by 2a, the greatest common divisor of
2a and 4am, and so it follows that (1) has a solution if and only if (3) has a solution s such
that s − b is divisible by 2a. We have hence reduced the solution of (1) to finding solutions
s of (3) which satisfy an appropriate divisibility condition.
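
The reduction just described can be checked numerically on small cases. The following is a minimal sketch, not part of the original notes: the function names are hypothetical, the search is exhaustive, and the coefficient a is assumed to be positive so that 4am can be used directly as a modulus.

```python
def has_solution_direct(a, b, c, m):
    """True if a*x^2 + b*x + c ≡ 0 mod m has a solution (exhaustive search)."""
    return any((a * x * x + b * x + c) % m == 0 for x in range(m))


def has_solution_via_discriminant(a, b, c, m):
    """True if (3) has a solution s with s - b divisible by 2a.

    Assumes a > 0. The divisibility test is well defined on residues
    mod 4am because 2a divides 4am.
    """
    modulus = 4 * a * m
    disc = b * b - 4 * a * c
    return any(
        (s * s - disc) % modulus == 0 and (s - b) % (2 * a) == 0
        for s in range(modulus)
    )


# Example: 3x^2 + 5x + 1 ≡ 0 mod 9 has the solution x = 1.
print(has_solution_direct(3, 5, 1, 9), has_solution_via_discriminant(3, 5, 1, 9))
```

Both tests agree on every small case tried, which is what the chain of equivalences above predicts.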
Our attention is therefore focused on the following problem: if n and z are integers with
n ≥ 2, find all solutions x of the congruence

(4) $x^2 \equiv z \bmod n$.

Let

$$n = \prod_{i=1}^{k} p_i^{\alpha_i}$$

be the prime factorization of n, and let $\Sigma_i$ denote the set of all solutions of the congruence

$x^2 \equiv z \bmod p_i^{\alpha_i}$, i = 1, . . . , k.

Let $s = (s_1, \dots, s_k) \in \Sigma_1 \times \cdots \times \Sigma_k$, and let σ(s) denote the simultaneous solution, unique
mod n, of the system of congruences

$x \equiv s_i \bmod p_i^{\alpha_i}$, i = 1, . . . , k,

obtained via the Chinese remainder theorem (Theorem 1.3 below). It is then not difficult to
show that the set of all solutions of (4) is given precisely by the set

$\{\sigma(s) : s \in \Sigma_1 \times \cdots \times \Sigma_k\}$.

Consequently (4), and hence also (1), can be solved if we can solve the congruence

(5) $x^2 \equiv z \bmod p^{\alpha}$,

where p is a fixed prime and α is a fixed positive integer.
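
As an illustration of this reduction, the sketch below assembles the solutions of (4) from the solution sets modulo the prime-power factors of n and compares them with a direct search. It is only a small-scale check under stated assumptions: the helper names are hypothetical, factorization is by trial division, and all square roots are found by exhaustive search rather than by the formulae discussed next.

```python
def prime_power_factors(n):
    """Return the prime powers p**alpha in the factorization of n (trial division)."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:
                n //= p
                q *= p
            factors.append(q)
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def square_roots(z, modulus):
    """All x in [0, modulus - 1] with x^2 ≡ z mod modulus (exhaustive search)."""
    return [x for x in range(modulus) if (x * x - z) % modulus == 0]


def roots_via_prime_powers(z, n):
    """Solutions of x^2 ≡ z mod n built from the solution sets mod each prime power."""
    moduli = prime_power_factors(n)
    sigma = [set(square_roots(z, q)) for q in moduli]
    # x solves (4) exactly when its residue mod each prime power q lies in
    # the corresponding solution set.
    return [x for x in range(n) if all(x % q in s for q, s in zip(moduli, sigma))]


# n = 72 = 8 * 9 and z = 1: four roots mod 8 and two mod 9 give eight roots mod 72.
print(roots_via_prime_powers(1, 72))  # [1, 17, 19, 35, 37, 53, 55, 71]
```

The number of solutions of (4) is the product of the sizes of the sets of solutions modulo the prime powers, as the description of the solution set implies.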


In articles 103 and 104 of Disquisitiones Arithmeticae [19], Gauss gave a series of beautiful
formulae which completely solve (5) for all primes p and exponents α. In order to describe
them, let $\sigma \in \{0, 1, \dots, p^{\alpha} - 1\}$ denote a solution of (5).
I. Suppose first that z is not divisible by p. If p = 2 and α = 1 then σ = 1. If p is odd or
p = 2 = α then σ has exactly two values $\pm\sigma_0$. If p = 2 and α > 2 then σ has exactly four
values $\pm\sigma_0$ and $\pm\sigma_0 + 2^{\alpha-1}$.
II. Suppose next that z is divisible by p but not by $p^{\alpha}$. If (5) has a solution it can be
shown that the multiplicity of p as a factor of z must be even, say 2µ, and so let $z = z_1 p^{2\mu}$.
Then σ is given by the formula

$\sigma' p^{\mu} + i p^{\alpha-\mu}$, $i \in \{0, 1, \dots, p^{\mu} - 1\}$,

where σ′ varies over all solutions, determined according to I, of the congruence

$x^2 \equiv z_1 \bmod p^{\alpha-2\mu}$.

III. Finally if z is divisible by $p^{\alpha}$, and if we set α = 2k or α = 2k − 1, depending on
whether α is even or odd, then σ is given by the formula

$i p^{k}$, $i \in \{0, \dots, p^{\alpha-k} - 1\}$.
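
The three cases can be illustrated on small moduli. The sketch below is not taken from the Disquisitiones; it compares the formulae of cases II and III with an exhaustive search for particular, arbitrarily chosen values of z, p, and α, and lists the solutions in case I for one odd prime power and one power of 2.

```python
def square_roots(z, modulus):
    """All x in [0, modulus - 1] with x^2 ≡ z mod modulus (exhaustive search)."""
    return sorted(x for x in range(modulus) if (x * x - z) % modulus == 0)


# Case I, p odd: z = 2, p = 7, alpha = 2 gives two values, sigma_0 and -sigma_0.
case_one_odd = square_roots(2, 7 ** 2)    # [10, 39], and 10 + 39 = 49

# Case I, p = 2, alpha = 4 > 2: four values, ±1 and ±1 + 8 taken mod 16.
case_one_two = square_roots(1, 2 ** 4)    # [1, 7, 9, 15]

# Case II: z = 7 * 3^2, p = 3, alpha = 4, so mu = 1 and z_1 = 7.
p, alpha, mu, z1 = 3, 4, 1, 7
inner = square_roots(z1, p ** (alpha - 2 * mu))          # solutions mod 9
case_two_formula = sorted(
    s * p ** mu + i * p ** (alpha - mu) for s in inner for i in range(p ** mu)
)
case_two_search = square_roots(z1 * p ** (2 * mu), p ** alpha)

# Case III: z = 0, p = 3, alpha = 3 = 2k - 1 with k = 2.
k = 2
case_three_formula = [i * p ** k for i in range(p ** (3 - k))]
case_three_search = square_roots(0, p ** 3)

print(case_two_formula, case_three_formula)  # [12, 15, 39, 42, 66, 69] [0, 9, 18]
```

In each case the values produced by the formula coincide with those found by the search.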

We will focus on the most important special case of (5), namely when p is odd and α = 1,
i.e., the congruence

(6) $x^2 \equiv z \bmod p$

(note that when p is odd, the solutions of (5) in cases I and II are determined by the solutions
of (6) for certain values of z). The first thing to do here is to observe that the ring determined
by the congruence classes of integers mod p is a field, and so (6) has at most two solutions.
We have that x ≡ 0 mod p is the unique solution of (6) if and only if z is divisible by p, and
if $s_0 \not\equiv 0 \bmod p$ is a solution of (6) then so is $-s_0$, and $s_0 \not\equiv -s_0 \bmod p$ because p is an odd
prime. These facts are motivation for the following definition:

Definition. If p is an odd prime and z is an integer not divisible by p, then z is a quadratic
residue (respectively, quadratic non-residue) of p if there is (respectively, is not) an integer
x such that $x^2 \equiv z \bmod p$.

As a consequence of our previous discussion and Gauss’ solution of (5), solutions of (1)
will exist only if (among other things) for each (odd) prime factor p of 4am, the discriminant
$b^2 - 4ac$ of $ax^2 + bx + c$ is either divisible by p or is a quadratic residue of p. This remark
becomes even more emphatic if the modulus m in (1) is a single odd prime p. In that case,

$(2ax + b)^2 \equiv b^2 - 4ac \bmod p$ iff $ax^2 + bx + c \equiv 0 \bmod p$,

whence the next proposition follows immediately:

Proposition 1.1. Let p be an odd prime. The congruence

(7) $ax^2 + bx + c \equiv 0 \bmod p$, $a \not\equiv 0 \bmod p$,

has a solution if and only if

(8) $x^2 \equiv b^2 - 4ac \bmod p$

has a solution, i.e., if and only if either $b^2 - 4ac$ is divisible by p or $b^2 - 4ac$ is a quadratic
residue of p. Moreover, if $(2a)^{-1}$ is the multiplicative inverse of 2a mod p (which exists
because p does not divide 2a; see Proposition 1.2 below) then the solutions of (7) are given
precisely by the formula

$x \equiv (\pm s - b)(2a)^{-1} \bmod p$,

where ±s are the solutions of (8).
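
Proposition 1.1 translates directly into a procedure for small primes. The following sketch is an illustration under stated assumptions rather than an efficient method: the function name is hypothetical, the square roots of the discriminant are found by exhaustive search, and the inverse of 2a is computed with Python's built-in three-argument `pow` (which accepts the exponent -1 from Python 3.8 on).

```python
def solve_quadratic_mod_p(a, b, c, p):
    """Solve a*x^2 + b*x + c ≡ 0 mod p for an odd prime p not dividing a.

    Returns the sorted list of solutions in [0, p - 1].
    """
    if a % p == 0:
        raise ValueError("a must not be divisible by p")
    disc = (b * b - 4 * a * c) % p
    # Solutions s of (8), found here by exhaustive search.
    roots_of_disc = [s for s in range(p) if (s * s - disc) % p == 0]
    inv_2a = pow(2 * a, -1, p)  # inverse of 2a mod p; exists since p is odd and p does not divide a
    return sorted({(s - b) * inv_2a % p for s in roots_of_disc})


print(solve_quadratic_mod_p(1, 1, 1, 7))   # [2, 4]; discriminant -3 ≡ 4 is a residue of 7
print(solve_quadratic_mod_p(3, 5, 1, 11))  # []; discriminant 13 ≡ 2 is a non-residue of 11
print(solve_quadratic_mod_p(1, 2, 1, 7))   # [6]; discriminant 0 gives a single solution
```

The three calls show the three possibilities: the discriminant is a residue, a non-residue, or divisible by p.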

We take it as self-evident that the solution of the general quadratic congruence (1) is one of
the most fundamental and most important problems in the theory of Diophantine equations
in two variables. By virtue of Proposition 1.1 and the discussion which precedes it, quadratic
residues and non-residues play a pivotal role in the determination of the solutions of (1).
We hope that the reader will now agree: the study of quadratic residues and non-residues is
important and interesting!

2. The Disquisitiones Arithmeticae

As we pointed out in the preface, modern number theory, that is, the systematic and
mathematically rigorous development of the subject from fundamental properties of the
integers, began in 1801 with the publication of Gauss’ great treatise, the Disquisitiones
Arithmeticae. Because of its first appearance here in our story, and especially because it
plays a dominant role in that story, we will now briefly discuss some of the most important
aspects of the Disquisitiones. The book consists of seven sections divided into 366 articles.
The first three sections are concerned with establishing basic results in number theory such as
the Fundamental Theorem of Arithmetic (proved rigorously here for the first time), Fermat’s
little theorem, primitive roots and indices, the Chinese remainder theorem, and Wilson’s
theorem. Perhaps the most important innovation in the Disquisitiones is Gauss’ introduction
in Section I, and its systematic use throughout the rest of the book, of the concept of
modular congruence. Gauss shows how modular congruence can be used to give the study
of divisibility of the integers a comprehensive and systematic algebraic formulation, thereby
greatly increasing the power and diversity of the techniques in number theory that were in
use up to that time. Certainly one of the most striking examples of the power of modular
congruence is the use that Gauss made of it in Section IV in his investigation of quadratic
residues, which culminates in the first complete and correct proof of the Law of Quadratic
Reciprocity, the most important result of that subject. We will have much more to say about
quadratic reciprocity in Chapter 3.
By far the longest part of the Disquisitiones, over half of the entire volume, is taken up
by Section V, which contains Gauss’ deep and penetrating analysis of quadratic forms. If
(a, b, c) ∈ Z × Z × Z, the function defined on Z × Z by

$(x, y) \mapsto ax^2 + bxy + cy^2$

is called a (binary) quadratic form. We frequently suppress the functional dependence on (x, y)
and hence also refer to the polynomial $ax^2 + bxy + cy^2$ as a quadratic form. In Section V, Gauss
first develops a way to systematically classify quadratic forms according to their number-theoretic
properties and then investigates in great depth the algebraic and number-theoretic
structure of quadratic forms using the properties of this classification. Next, he defines an
operation, which he called the composition of forms, on the set of forms $ax^2 + bxy + cy^2$ whose
discriminant $b^2 - 4ac$ is a fixed value. The set of all quadratic forms whose discriminants have
a fixed value supports a basic equivalence relation which partitions this set of forms into a
finite number of equivalence classes (section 12, Chapter 3), and composition of forms turns
this set of equivalence classes into what we would today call a group. Of course Gauss did not
call it that, as the group concept was not formulated until much later; however, he did make
essential use of the group structure which the composition of forms possesses. Gauss then
proceeds to use composition of forms together with the methods that he developed previously
to establish additional results concerning the algebraic and number-theoretic structure of
forms. We will see in section 12 of Chapter 3 how some of the elements of Gauss’ theory
of quadratic forms arise in our study of quadratic reciprocity, and we will also make use of
some of the basic theory of quadratic forms in Chapter 8.
Section VI of the Disquisitiones is concerned with applications of primitive roots, quadratic
residues, and quadratic forms to the structure of rational numbers, the solution of
modular square-root problems, and primality testing and prime factorization of the integers.
Finally, in Section VII, Gauss presents his theory of cyclotomy, the study of divisions of the
circle into congruent arcs, which culminates in his famous theorem on the determination of
the regular n-gons which can be constructed using only a straightedge and compass.
It is appropriate at this juncture to say a few words about Gauss himself. Carl Friedrich
Gauss was born in 1777 in Brunswick, a city in the north of present-day Germany, lived for
most of his life in Göttingen, and died there in 1855. Gauss’ exceptional mathematical talent
was clear from a very early age. Because he was such a gifted and promising young student,
Gauss was introduced in 1791 to the Duke of Brunswick, who became a prominent patron
and supporter of Gauss for many years (see, in particular, the dedication in the Disquisitiones
which Gauss addressed to the Duke). In 1795, Gauss matriculated at the University
of Göttingen, left in 1798 without obtaining a degree, and was granted a doctoral degree
from the University of Helmstedt in 1799. After his celebrated computation of the orbit of
the asteroid Ceres in 1801 Gauss was appointed director of the newly opened observatory at
the University of Göttingen in 1807, which position he held for the rest of his life. In addition
to his ground-breaking work in number theory, Gauss made contributions of fundamental
importance to geometry (differential geometry and non-Euclidean geometry), analysis (elliptic
functions, elliptic integrals, and the theory of infinite series), physics (potential theory
and geomagnetism), geodesy and astronomy (celestial mechanics and the computation of the
orbits of celestial bodies), and statistics and probability (the method of least squares and
the normal distribution).

3. Notation, Terminology, and Some Useful Elementary Number Theory

We now fix some notation and terminology that will be used repeatedly throughout the
sequel. The letter p will always denote a generic odd prime, the letter q, unless otherwise
specified, will denote a generic prime (either even or odd), P is the set of all primes, Z is
the set of all integers, Q is the set of all rationals, and R is the set of all real numbers. If
m, n ∈ Z with m ≤ n then [m, n] is the set of all integers at least m and no more than n,
listed in increasing order, [m, ∞) is the set of all integers exceeding m − 1, also listed in
increasing order, and gcd(m, n) is the greatest common divisor of m and n. If n ∈ [2, ∞)
then U(n) will denote the set {m ∈ [1, n − 1] : gcd(m, n) = 1}. If z is an integer then π(z)
will denote the set of all prime factors of z. If A is a set then |A| will denote the cardinality
of A, $2^A$ is the set of all subsets of A, and ∅ denotes the empty set. Finally, we will refer
to a quadratic residue or quadratic non-residue as simply a residue or non-residue; all other
residues of a modulus m ∈ [2, ∞) will always be called ordinary residues. In particular, the
minimal non-negative ordinary residues modulo m are the elements of the set [0, m − 1].

We also recall some facts from elementary number theory that will be useful in what
follows. For more information about them consult any standard text on elementary number
theory, e.g., Ireland and Rosen [30] or K. Rosen [48].
If m is a positive integer and a ∈ Z, recall that an inverse of a modulo m is an integer α
such that aα ≡ 1 mod m.

Proposition 1.2. If m is a positive integer and a ∈ Z then a has an inverse modulo m
if and only if gcd(a, m) = 1. Moreover, the inverse is relatively prime to m and is unique
modulo m.

Theorem 1.3. (Chinese remainder theorem). If $m_1, \dots, m_r$ are pairwise relatively prime
positive integers and $(a_1, \dots, a_r)$ is an r-tuple of integers then the system of congruences

$x \equiv a_i \bmod m_i$, i = 1, . . . , r,

has a simultaneous solution that is unique modulo $\prod_{i=1}^{r} m_i$. Moreover, if

$$M_k = \prod_{i \neq k} m_i,$$

and if $y_k$ is the inverse of $M_k$ mod $m_k$ (which exists because $\gcd(m_k, M_k) = 1$) then the
solution is given by

$$x \equiv \sum_{k=1}^{r} a_k M_k y_k \bmod \prod_{i=1}^{r} m_i.$$
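
The formula in the Chinese remainder theorem can be evaluated term by term. The sketch below is one possible transcription, with a hypothetical function name; it assumes that the moduli are pairwise relatively prime (this is not checked) and uses the built-in `pow(M, -1, m)` of Python 3.8 or later for the inverses.

```python
from math import prod


def crt(residues, moduli):
    """Simultaneous solution of x ≡ a_k mod m_k, reduced mod the product of the moduli.

    Assumes the moduli are pairwise relatively prime positive integers.
    """
    modulus = prod(moduli)
    total = 0
    for a_k, m_k in zip(residues, moduli):
        M_k = modulus // m_k        # product of the other moduli
        y_k = pow(M_k, -1, m_k)     # inverse of M_k mod m_k
        total += a_k * M_k * y_k
    return total % modulus


# x ≡ 2 mod 3, x ≡ 3 mod 5, x ≡ 2 mod 7
print(crt([2, 3, 2], [3, 5, 7]))  # 23
```

Here the three terms of the sum are 140, 63, and 30, and 233 reduces to 23 modulo 105.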

Recall that a linear Diophantine equation is an equation of the form

ax + by = c,

where a, b, and c are given integers and x and y are integer-valued unknowns.

Proposition 1.4. Let a, b, and c be integers and let gcd(a, b) = d. The Diophantine
equation ax + by = c has a solution if and only if d divides c. If d divides c then there are
infinitely many solutions (x, y), and if $(x_0, y_0)$ is a particular solution then all solutions are
given by
$x = x_0 + (b/d)n$, $y = y_0 - (a/d)n$, n ∈ Z.

Given the Diophantine equation ax + by = c with c divisible by d = gcd(a, b), the
Euclidean algorithm can be used to easily find a particular solution $(x_0, y_0)$. Simply let
k = c/d and use the Euclidean algorithm to find integers m and n such that d = am + bn;
then $(x_0, y_0) = (km, kn)$ is a particular solution, and all solutions can then be found by using
Proposition 1.4. The simple first-degree congruence ax ≡ b mod m can thus be easily solved
upon the observation that this congruence has a solution x if and only if the Diophantine
equation ax + my = b has the solution (x, y) for some y ∈ Z.
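
The procedure just described can be sketched as follows. The function names are hypothetical, the extended Euclidean algorithm is written in its usual iterative form, and the modulus is assumed to be a positive integer; the list returned contains the solutions that are distinct modulo the modulus.

```python
def extended_gcd(a, b):
    """Return (d, m, n) with d = gcd(a, b) = a*m + b*n, for non-negative a and b."""
    old_r, r = a, b
    old_m, m = 1, 0
    old_n, n = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_m, m = m, old_m - q * m
        old_n, n = n, old_n - q * n
    return old_r, old_m, old_n


def solve_linear_congruence(a, b, modulus):
    """All x in [0, modulus - 1] with a*x ≡ b mod modulus (modulus assumed positive)."""
    d, m, _ = extended_gcd(a % modulus, modulus)
    if b % d != 0:
        return []                     # Proposition 1.4: no solution unless d divides b
    k = b // d
    x0 = (k * m) % modulus            # particular solution x_0 = k*m
    step = modulus // d               # solutions differ by multiples of modulus/d
    return sorted((x0 + step * t) % modulus for t in range(d))


print(solve_linear_congruence(6, 4, 10))  # [4, 9]
print(solve_linear_congruence(6, 3, 10))  # []
```

In the first call gcd(6, 10) = 2 divides 4, so there are two solutions modulo 10; in the second it does not divide 3, so there are none.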

CHAPTER 2

Basic Facts

In this chapter, we lay the foundations for all of the work that will be done in subsequent
chapters. Section 1 defines the Legendre symbol and verifies its basic properties, proves
Euler’s criterion, and deduces some corollaries which will be very useful in many situations
in which we will find ourselves. Motivated by the solutions of a quadratic congruence modulo
a prime which we discussed in Chapter 1, we formulate what we will call the Basic Problem
and the Fundamental Problem for Primes in section 2. In section 3, we state and prove
Gauss’ Lemma for residues and non-residues and use it to solve the Fundamental Problem
for the prime 2.

1. The Legendre Symbol, Euler’s Criterion, and other Important Things

In this section, we establish some fundamental facts about residues and non-residues that
will be used repeatedly throughout the rest of these notes.

Proposition 2.1. In every complete system of ordinary residues modulo p, there are
exactly (p − 1)/2 quadratic residues.

Proof. It suffices to prove that in [1, p − 1] there are exactly (p − 1)/2 quadratic residues.
Note first that $1^2, 2^2, \dots, (\frac{p-1}{2})^2$ are all incongruent mod p (if 1 ≤ i, j < p/2 and $i^2 \equiv j^2 \bmod p$
then i ≡ j, hence i = j, or i ≡ −j, i.e., i + j ≡ 0. But 2 ≤ i + j < p, and so i + j ≡ 0 is
impossible).
Let S denote the set of minimal non-negative ordinary residues mod p of $1^2, 2^2, \dots, (\frac{p-1}{2})^2$.
The elements of S are quadratic residues of p and |S| = (p − 1)/2. Suppose that n ∈ [1, p − 1] is
a quadratic residue of p. Then there exists r ∈ [1, p − 1] such that $r^2 \equiv n$. Then $(p - r)^2 \equiv r^2 \equiv n$
and {r, p − r} ∩ [1, (p − 1)/2] ≠ ∅. Hence n ∈ S, whence S = the set of quadratic residues
of p inside [1, p − 1]. QED
Remark. The proof of Proposition 2.1 provides a way to easily find, at least in principle,
the residues of any prime p. Simply calculate the integers $1^2, 2^2, \dots, (\frac{p-1}{2})^2$ and then reduce
mod p. The integers that result from this computation are the residues of p inside [1, p − 1].
This procedure also finds the modular square roots x of a residue r of p, i.e., the solutions to
the congruence x2 ≡ r mod p. For example, in just a few minutes on a hand-held calculator,
one finds that the residues of 17 are 1, 2, 4, 8, 9, 13, 15, and 16, with corresponding modular
square roots ±1, ±6, ±2, ±5, ±3, ±8, ±7, and ±4, and the residues of 37 are 1, 3, 4, 7, 9,
10, 11, 12, 16, 21, 25, 26, 27, 28, 30, 33, 34, and 36, with corresponding modular square roots
±1, ±15, ±2, ±9, ±3, ±11, ±14, ±7, ±4, ±13, ±5, ±10, ±8, ±18, ±17, ±12, ±16,
and ±6. Of course, for large p, this method quickly becomes impractical for the calculation
of residues and modular square roots, but see section 9 of Chapter 4 for a practical and
efficient way to perform these calculations for large values of p.
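
The computation described in this remark takes only a few lines. The sketch below, with a hypothetical function name, squares the integers 1, 2, . . . , (p − 1)/2, reduces mod p, and records for each residue the root found in that range; it is intended for small primes only.

```python
def residues_with_roots(p):
    """Map each residue r of the odd prime p in [1, p - 1] to its root x in [1, (p - 1)/2]."""
    return {x * x % p: x for x in range(1, (p - 1) // 2 + 1)}


roots_17 = residues_with_roots(17)
print(sorted(roots_17))  # [1, 2, 4, 8, 9, 13, 15, 16]
print(roots_17[2])       # 6, i.e., the modular square roots of 2 are ±6

roots_37 = residues_with_roots(37)
print(len(roots_37))     # 18 residues, as Proposition 2.1 predicts
```

The output agrees with the lists of residues and modular square roots of 17 and 37 given above.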
N.B. In the next proposition, all residues and non-residues are taken with respect to a
fixed prime p.

Proposition 2.2. (i) The product of two residues is a residue.
(ii) The product of a residue and a non-residue is a non-residue.
(iii) The product of two non-residues is a residue.

Proof. (i) If α, α′ are residues then $x^2 \equiv \alpha$, $y^2 \equiv \alpha'$ imply that $(xy)^2 \equiv \alpha\alpha' \bmod p$.
(ii) Let α be a fixed residue. The integers 0, α, . . . , (p − 1)α are incongruent mod p, hence
are a complete system of ordinary residues mod p. If R denotes the set of all residues in
[1, p − 1] then by Proposition 2.2(i), {αr : r ∈ R} is a set of residues of cardinality (p − 1)/2,
hence Proposition 2.1 implies that there are no other residues among α, 2α, . . . , (p − 1)α, i.e.,
if β ∈ [1, p − 1] \ R then αβ is a non-residue. Statement (ii) is an immediate consequence of
this.
(iii) Suppose that β is a non-residue. Then 0, β, 2β, . . . , (p − 1)β is a complete system
of ordinary residues mod p, and by Proposition 2.2(ii) and Proposition 2.1, {βr : r ∈ R} is
a set of non-residues and there are no other non-residues among β, 2β, . . . , (p − 1)β. Hence
β ′ ∈ [1, p−1]\R implies that ββ ′ is a residue. Statement (iii) is an immediate consequence of
this. QED
The following definition introduces the most important piece of mathematical technology
that we will use to study residues and non-residues.

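Definition. The Legendre symbol χp of p is the function χp : Z → [−1, 1] defined by

$$
\chi_p(n) = \begin{cases}
0, & \text{if } p \text{ divides } n,\\
1, & \text{if } \gcd(p, n) = 1 \text{ and } n \text{ is a residue of } p,\\
-1, & \text{if } \gcd(p, n) = 1 \text{ and } n \text{ is a non-residue of } p.
\end{cases}
$$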

The next proposition asserts that χp is a completely multiplicative arithmetic function
of period p. This fact will play a crucial role in much of our subsequent work.

Proposition 2.3. (i) χp (n) = 0 if and only if p divides n, and if m ≡ n mod p then
χp (m) = χp (n) (χp is of period p).
(ii) For all m, n ∈ Z, χp (mn) = χp (m)χp (n) (χp is completely multiplicative).

Proof. (i) If m ≡ n mod p then p divides m (respectively, m is a residue/non-residue of p)
if and only if p divides n (respectively, n is a residue/non-residue of p). Hence χp (m) = χp (n).
(ii) χp (mn) = 0 if and only if p divides mn if and only if p divides m or n if and only if
χp (m) = 0 or χp (n) = 0 if and only if χp (m)χp (n) = 0.
Because χp(n²) = χp(n)², we may assume that m ≠ n. Then χp(mn) = 1 (respectively,
χp(mn) = −1) if and only if gcd(mn, p) = 1 and mn is a residue (respectively, mn
is a non-residue) of p if and only if gcd(m, p) = 1 = gcd(n, p) and, by Proposition 2.2, m
and n are either both residues or both non-residues of p (respectively, {m, n} contains a
residue and a non-residue of p) if and only if χp (m)χp (n) = 1 (respectively, χp (m)χp (n) =
−1). QED
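Proposition 2.3 is easy to test numerically. The sketch below computes χp straight from the definition and checks periodicity and complete multiplicativity on a finite window of integers; the function names and the window size `bound` are our own choices, and a finite check of this kind is an illustration, not a proof.

```python
def legendre(n, p):
    """chi_p(n) for an odd prime p, computed directly from the definition."""
    n %= p  # Python's % returns a value in [0, p-1], also for negative n
    if n == 0:
        return 0
    squares = {x * x % p for x in range(1, (p - 1) // 2 + 1)}
    return 1 if n in squares else -1


def check_proposition_2_3(p, bound=30):
    """Test period p and complete multiplicativity for |m|, |n| <= bound."""
    values = range(-bound, bound + 1)
    periodic = all(legendre(n + p, p) == legendre(n, p) for n in values)
    multiplicative = all(
        legendre(m * n, p) == legendre(m, p) * legendre(n, p)
        for m in values
        for n in values
    )
    return periodic and multiplicative
```

For instance, `check_proposition_2_3(17)` examines all pairs (m, n) with |m|, |n| ≤ 30.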
Remark on notation. As a consequence of Proposition 2.3, χp defines a homomorphism of
the group of units in the ring Z/pZ into the circle group, i.e., χp is a character of the group
of units. This is the reason why we have chosen the character-theoretic notation χp(n) for
the Legendre symbol, instead of the more traditional notation $\left(\frac{n}{p}\right)$. When p is replaced by
an arbitrary integer m ≥ 2, we will have more to say later (see section 4 of Chapter 4) about
characters on the group of units in the ring Z/mZ and their use in what we will study here.
The next result determines the quadratic character of −1.

Theorem 2.4.
$$
\chi_p(-1) = \begin{cases}
1, & \text{if } p \equiv 1 \bmod 4,\\
-1, & \text{if } p \equiv -1 \bmod 4.
\end{cases}
$$

This theorem is due to Euler [16], who proved it in 1760. It is of considerable importance
in the history of number theory because in 1795, the young Gauss (at the ripe old age of
18!) rediscovered it. Gauss was so struck by the beauty and depth of this result that, as he
testifies in the preface to Disquisitiones Arithmeticae [19], “I concentrated on it all of my
efforts in order to understand the principles on which it depended and to obtain a rigorous
proof. When I succeeded in this I was so attracted by these questions that I could not let
them be.” Thus began Gauss’ work in number theory that was to revolutionize the subject.
Proof of Theorem 2.4. The proof that we give is Euler’s own. It is based on

Theorem 2.5. (Euler’s criterion) If a ∈ Z and gcd(a, p) = 1 then

$$\chi_p(a) \equiv a^{(p-1)/2} \bmod p.$$



If we apply Euler’s criterion with a = −1 then

$$\chi_p(-1) \equiv (-1)^{(p-1)/2} \bmod p.$$

Hence χp(−1) − (−1)^((p−1)/2) is either 0 or ±2 and is divisible by p, hence

$$\chi_p(-1) = (-1)^{(p-1)/2},$$

and so χp (−1) = 1 (respectively, −1) if and only if (p − 1)/2 is even (respectively, odd) if
and only if p ≡ 1 mod 4 (respectively, p ≡ −1 mod 4). This verifies Theorem 2.4.
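Euler's criterion also gives a fast way to evaluate χp(a) in practice, because the power a^((p−1)/2) mod p can be computed by repeated squaring. The following sketch uses Python's built-in three-argument `pow` for this; the function names are ours, and the code assumes that p is an odd prime not dividing a.

```python
def euler_criterion(a, p):
    """chi_p(a) for an odd prime p with gcd(a, p) = 1, via a^((p-1)/2) mod p."""
    value = pow(a, (p - 1) // 2, p)  # by Theorem 2.5 this is 1 or p - 1
    return 1 if value == 1 else -1


def chi_minus_one(p):
    """Value of chi_p(-1) predicted by Theorem 2.4 for an odd prime p."""
    return 1 if p % 4 == 1 else -1


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17):
        print(p, euler_criterion(-1, p), chi_minus_one(p))
```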
Proof of Theorem 2.5. This is an interesting application of Wilson’s theorem, which
asserts that

(*) if q is a prime then (q − 1)! ≡ −1 mod q,

and was in fact first stated by Abu Ali al-Hasan ibn al-Haytham in 1000 AD, over 750 years
before it was attributed to John Wilson, whose name it now bears. We will use Wilson’s
theorem to first prove Theorem 2.5; after that we then verify Wilson’s theorem.
Suppose that χp(a) = 1, and so x² ≡ a mod p for some x ∈ Z. Note now that 1 =
gcd(a, p) implies that 1 = gcd(x², p), and so 1 = gcd(x, p) (p is prime!), hence by Fermat’s
little theorem,
$$a^{(p-1)/2} \equiv (x^2)^{(p-1)/2} = x^{p-1} \equiv 1 \bmod p.$$

Suppose that χp (a) = −1, i.e., a is a non-residue. For each i ∈ [1, p − 1], there exists
j ∈ [1, p − 1] uniquely determined by i, such that

ij ≡ a mod p

(Z/pZ is a field) and i ≠ j because a is a non-residue. Hence we can group the integers
1, . . . , p − 1 into (p − 1)/2 pairs, each pair with a product ≡ a mod p. Multiplying all of
these pairs together yields
$$(p-1)! \equiv a^{(p-1)/2} \bmod p,$$

and so (∗) implies that

$$-1 \equiv a^{(p-1)/2} \bmod p.$$

QED

Proof of Wilson’s theorem. The implication (∗) is clearly valid when q = 2, so assume
that q is odd. Use Proposition 1.2 to find for each integer a ∈ [1, q − 1] an integer ā ∈ [1, q − 1]
such that aā ≡ 1 mod q. The integers 1 and q − 1 are the only integers in [1, q − 1] that
are their own inverses mod q, hence we may group the integers from 2 through q − 2 into
(q − 3)/2 pairs with the product of each pair congruent to 1 mod q. Hence

2 · 3 · · · (q − 3)(q − 2) ≡ 1 mod q.

Multiplication of both sides of this congruence by q − 1 yields

(q − 1)! = 1 · 2 · · · (q − 1) ≡ q − 1 ≡ −1 mod q.

QED
Remark. The converse of Wilson’s theorem is also valid.
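Wilson's theorem and its converse can be checked together for small moduli. The sketch below is illustrative only: the helper names are ours, and computing (n − 1)! in full is far too slow to serve as a practical primality test.

```python
from math import factorial, isqrt


def wilson_holds(n):
    """True when (n - 1)! ≡ -1 (mod n)."""
    return factorial(n - 1) % n == n - 1


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


if __name__ == "__main__":
    # The congruence should hold exactly for the primes.
    print([n for n in range(2, 40) if wilson_holds(n)])
```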

2. The Basic Problem and the Fundamental Problem for a Prime

From our discussion in Chapter 1, if d is the discriminant of ax² + bx + c and if neither
a nor d is divisible by p then
ax² + bx + c ≡ 0 mod p
has a solution if and only if d is a residue of p. This motivates what we will call the

Basic Problem. If d ∈ Z, for what primes p is d a quadratic residue of p?

We now present a strategy for solving this problem which employs Proposition 2.3 as the
basic tool. Things can be stated precisely and concisely if we use the following
Notation: if z ∈ Z, let
X± (z) = {p : χp (z) = ±1},
πodd (z)(resp., πeven (z)) = {q ∈ π(z) : q has odd (resp., even) multiplicity in z}.
We point out here how to read the ± signs. If ± signs occur simultaneously in different
places in an equation, formula, definition, etc., then the + sign is meant to be taken simul-
taneously in all occurrences of ± and then the − sign is also to be taken simultaneously in
all occurrences of ±. For example, the equation X± (z) = {p : χp (z) = ±1} above asserts
that X+ (z) = {p : χp (z) = 1} and X− (z) = {p : χp (z) = −1}. We follow this convention in
the sequel.
Suppose first that d > 0, with gcd(d, p) = 1. If πodd(d) = ∅ then d is a square, so d is
trivially a residue of p. Hence assume that πodd(d) ≠ ∅. Proposition 2.3 implies that
$$\chi_p(d) = \prod_{q \in \pi_{\mathrm{odd}}(d)} \chi_p(q).$$

Hence

(1) χp (d) = 1 iff |{q ∈ πodd (d) : χp (q) = −1}| is even.



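Let
$$\mathcal{E} = \{E \subseteq \pi_{\mathrm{odd}}(d) : |E| \text{ is even}\}.$$

If $E \in \mathcal{E}$, let $R_E$ denote the set of all p such that

$$
\chi_p(q) = \begin{cases}
-1, & \text{if } q \in E,\\
1, & \text{if } q \in \pi_{\mathrm{odd}}(d) \setminus E.
\end{cases}
$$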

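Then (1) implies that

$$X_+(d) = \Big(\bigcup_{E \in \mathcal{E}} R_E\Big) \setminus \pi_{\mathrm{even}}(d), \tag{2}$$

and this union is pairwise disjoint. Moreover

$$R_E = \Big(\bigcap_{q \in E} X_-(q)\Big) \cap \Big(\bigcap_{q \in \pi_{\mathrm{odd}}(d) \setminus E} X_+(q)\Big). \tag{3}$$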

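Suppose next that d < 0. Then d = (−1)(−d), hence

$$\chi_p(d) = \prod_{q \in \{-1\} \cup \pi_{\mathrm{odd}}(d)} \chi_p(q). \tag{4}$$

If we let
$$\mathcal{E}_{-1} = \{E \subseteq \{-1\} \cup \pi_{\mathrm{odd}}(d) : |E| \text{ is even}\},$$

then by applying (4) and an argument similar to the one that yielded (2) and (3) for
X+(d), d > 0, we also deduce that for d < 0,
$$X_+(d) = \Big(\bigcup_{E \in \mathcal{E}_{-1}} R_E\Big) \setminus \pi_{\mathrm{even}}(d), \tag{5}$$

where
$$R_E = \Big(\bigcap_{q \in E} X_-(q)\Big) \cap \Big(\bigcap_{q \in (\{-1\} \cup \pi_{\mathrm{odd}}(d)) \setminus E} X_+(q)\Big), \quad E \in \mathcal{E}_{-1}. \tag{6}$$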

In order to show more concretely how this strategy for the solution of the basic problem
is implemented, suppose as an example that we wish to determine X+ (±126). First, factor
±126 as ±2 · 3² · 7. It follows from this factorization that

πodd (±126) = {2, 7}, πeven (±126) = {3},

hence

$$\mathcal{E} = \{\emptyset, \{2, 7\}\}, \qquad \mathcal{E}_{-1} = \{\emptyset, \{-1, 2\}, \{-1, 7\}, \{2, 7\}\}.$$

It now follows from (2) and (3) that


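$$
\begin{aligned}
X_+(126) &= (R_\emptyset \cup R_{\{2,7\}}) \setminus \{3\} \\
&= \Big( \big(X_+(2) \cap X_+(7)\big) \cup \big(X_-(2) \cap X_-(7)\big) \Big) \setminus \{3\},
\end{aligned}
$$

and from (5) and (6) that

$$
\begin{aligned}
X_+(-126) &= (R_\emptyset \cup R_{\{-1,2\}} \cup R_{\{-1,7\}} \cup R_{\{2,7\}}) \setminus \{3\} \\
&= \Big( \big(X_+(-1) \cap X_+(2) \cap X_+(7)\big) \cup \big(X_-(-1) \cap X_-(2) \cap X_+(7)\big) \\
&\qquad \cup \big(X_-(-1) \cap X_+(2) \cap X_-(7)\big) \cup \big(X_+(-1) \cap X_-(2) \cap X_-(7)\big) \Big) \setminus \{3\}.
\end{aligned}
$$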

In order to finish this calculation of X+ (±126), we must now calculate X+ (2)∩X+ (7), X− (2)∩
X− (7), X+ (−1) ∩ X+ (2) ∩ X+ (7), X− (−1) ∩ X− (2) ∩ X+ (7), X− (−1) ∩ X+ (2) ∩ X− (7), and
X+ (−1) ∩ X− (2) ∩ X− (7), which in turn requires the calculation of X± (−1), X± (2), and
X± (7). Theorem 2.4 and formulae (2),(3),(5), and (6) hence reduce the solution of the Basic
Problem to the solution of the

Fundamental Problem for Primes. If q is prime, calculate X± (q).

The Fundamental Problem for odd primes and the Basic Problem will be completely
solved in sections 1 and 2 of Chapter 4. The Fundamental Problem for the prime 2 will be
completely solved in the next section.
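The reduction of the Basic Problem to the Fundamental Problem rests on the product formulas for χp(d), which can be checked numerically for d = ±126. In the sketch below all names are our own, χp is evaluated by Euler's criterion (Theorem 2.5), and the comparison is meaningful only for odd primes p that do not divide d.

```python
def odd_even_primes(z):
    """Return (pi_odd(z), pi_even(z)): prime factors of z of odd/even multiplicity."""
    n = abs(z)
    odd, even = set(), set()
    q = 2
    while q * q <= n:
        count = 0
        while n % q == 0:
            n //= q
            count += 1
        if count:
            (odd if count % 2 else even).add(q)
        q += 1
    if n > 1:  # a remaining prime factor occurs exactly once
        odd.add(n)
    return odd, even


def chi(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    value = pow(a, (p - 1) // 2, p)
    return 0 if value == 0 else (1 if value == 1 else -1)


def chi_via_factors(d, p):
    """Product of chi_p(q) over q in pi_odd(d), with -1 adjoined when d < 0."""
    odd, _ = odd_even_primes(d)
    factors = sorted(odd) + ([-1] if d < 0 else [])
    result = 1
    for q in factors:
        result *= chi(q, p)
    return result
```

For example, `odd_even_primes(126)` returns the pair ({2, 7}, {3}) used in the calculation of X+(±126).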

3. Gauss’ Lemma and the Fundamental Problem for the Prime 2

The next theorem, along with Theorems 2.4 and 2.5, will be used many times in our
subsequent work.
Theorem 2.6. $\chi_p(2) = (-1)^{(p^2-1)/8}$.

Theorem 2.6 solves the Fundamental Problem for the prime 2. It is easy to see that
(p² − 1)/8 is even (odd) if and only if p ≡ 1 or 7 mod 8 (p ≡ 3 or 5 mod 8). Hence

X+ (2) = {p : p ≡ 1 or 7 mod 8},

X− (2) = {p : p ≡ 3 or 5 mod 8}.
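As a quick sanity check, the formula of Theorem 2.6 and the resulting description of X±(2) can be compared with a direct evaluation of χp(2). The sketch below uses Euler's criterion for the direct evaluation; the function names are ours and p is assumed to be an odd prime.

```python
def chi_two_formula(p):
    """(-1)^((p^2 - 1)/8), the value of chi_p(2) asserted by Theorem 2.6."""
    return (-1) ** ((p * p - 1) // 8)


def chi_two_direct(p):
    """chi_p(2) evaluated by Euler's criterion."""
    return 1 if pow(2, (p - 1) // 2, p) == 1 else -1


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23):
        print(p, p % 8, chi_two_formula(p), chi_two_direct(p))
```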


The proof of Theorem 2.6 will use a basic result in the theory of quadratic residues called
Gauss’ lemma (this lemma was first used by Gauss in his third proof of the Law of Quadratic
Reciprocity [20], which proof we will present in Chapter 3). We will first state Gauss’ lemma,
then use it to prove Theorem 2.6, and then we will prove Gauss’ lemma.

Toward that end, then, let a ∈ Z, gcd(a, p) = 1. Consider the minimal positive ordinary
residues mod p of the integers a, . . . , ½(p − 1)a. None of these ordinary residues is p/2, as p
is odd, and they are all distinct as gcd(a, p) = 1, hence let

u1 , . . . , us be those ordinary residues that are > p/2,

v1 , . . . , vt be those ordinary residues that are < p/2.


N.B. s + t = ½(p − 1). We then have

Theorem 2.7. (Gauss’ lemma)

$$\chi_p(a) = (-1)^s.$$
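The count s in Gauss' lemma is straightforward to compute, which gives a hands-on way to see the lemma at work. The following sketch uses names of our own choosing and assumes that p is an odd prime not dividing a.

```python
def gauss_lemma_count(a, p):
    """Number s of minimal positive residues of a, 2a, ..., ((p-1)/2)a that exceed p/2."""
    half = (p - 1) // 2
    # For odd p, an integer residue exceeds p/2 exactly when it exceeds (p-1)/2.
    return sum(1 for k in range(1, half + 1) if (k * a) % p > half)


def chi_by_gauss_lemma(a, p):
    """chi_p(a) = (-1)^s, as asserted by Theorem 2.7."""
    return (-1) ** gauss_lemma_count(a, p)


if __name__ == "__main__":
    # For a = 2 and p = 17 the multiples 2, 4, ..., 16 are their own residues,
    # and those exceeding 17/2 are 10, 12, 14, 16, so s = 4.
    print(gauss_lemma_count(2, 17), chi_by_gauss_lemma(2, 17))
```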

Proof of Theorem 2.6. Let σ be the number of minimal positive ordinary residues
mod p of the integers in the set
$$1 \cdot 2,\ 2 \cdot 2,\ \ldots,\ \tfrac{1}{2}(p-1) \cdot 2 \tag{7}$$
that exceed p/2. Gauss’ lemma implies that

$$\chi_p(2) = (-1)^{\sigma}.$$

Because each integer in (7) is less than p, σ = the number of integers in the set (7) that
exceed p/2. An integer 2j, j ∈ [1, (p − 1)/2] does not exceed p/2 if and only if 1 ≤ j ≤ p/4,
hence the number of integers in (7) that do not exceed p/2 is [p/4], where [x] denotes the
greatest integer not exceeding x. Hence
$$\sigma = \frac{p-1}{2} - \left[\frac{p}{4}\right].$$
To prove Theorem 2.6, it hence suffices to prove that
$$\text{for all odd integers } n, \quad \frac{n-1}{2} - \left[\frac{n}{4}\right] \equiv \frac{n^2-1}{8} \bmod 2. \tag{8}$$
To see this, note first that the congruence in (8) is true for a particular integer n if and only
if it is true for n + 8, because
$$\frac{(n+8)-1}{2} - \left[\frac{n+8}{4}\right] = \frac{n-1}{2} + 4 - \left(\left[\frac{n}{4}\right] + 2\right) \equiv \frac{n-1}{2} - \left[\frac{n}{4}\right] \bmod 2,$$
$$\frac{(n+8)^2-1}{8} = \frac{n^2-1}{8} + 2n + 8 \equiv \frac{n^2-1}{8} \bmod 2.$$
Thus (8) holds if and only if it holds for n = ±1, ±3, and it is easy to check that (8) holds for
these values of n. QED

Proof of Theorem 2.7. Let ui, vi be as defined before the statement of Gauss’ lemma. We
claim that
$$\{p - u_1, \ldots, p - u_s, v_1, \ldots, v_t\} = [1, \tfrac{1}{2}(p-1)]. \tag{9}$$
To see this, note first that if i ≠ j then vi ≠ vj, ui ≠ uj hence p − ui ≠ p − uj. It is also true
that p − ui ≠ vj for all i, j; otherwise p ≡ a(k + l) mod p, where 2 ≤ k + l ≤ (p−1)/2 + (p−1)/2 = p − 1,
which is impossible because gcd(a, p) = 1. Hence
$$|\{p - u_1, \ldots, p - u_s, v_1, \ldots, v_t\}| = s + t = \frac{p-1}{2}. \tag{10}$$
But 0 < vi < p/2 implies that 0 < vi ≤ (p − 1)/2 and p/2 < ui < p, hence 0 < p − ui ≤
(p − 1)/2, and so
$$\{p - u_1, \ldots, p - u_s, v_1, \ldots, v_t\} \subseteq [1, \tfrac{1}{2}(p-1)]. \tag{11}$$
As |[1, ½(p − 1)]| = ½(p − 1), (9) follows from (10) and (11).
It follows from (9) that
$$\prod_{i=1}^{s} (p - u_i) \prod_{i=1}^{t} v_i = \left(\frac{p-1}{2}\right)!.$$
Because
$$p - u_i \equiv -u_i \bmod p$$
we conclude from the preceding equation that
$$(-1)^s \prod_{i=1}^{s} u_i \prod_{i=1}^{t} v_i \equiv \left(\frac{p-1}{2}\right)! \bmod p. \tag{12}$$

Because u1, . . . , us, v1, . . . , vt are the least positive ordinary residues of a, . . . , ½(p − 1)a, it is
a consequence of (12) that
$$(-1)^s a^{(p-1)/2} \left(\frac{p-1}{2}\right)! \equiv \left(\frac{p-1}{2}\right)! \bmod p. \tag{13}$$
But p and ((p−1)/2)! are relatively prime, and so (13) implies that

$$(-1)^s a^{(p-1)/2} \equiv 1 \bmod p$$

i.e.,
$$a^{(p-1)/2} \equiv (-1)^s \bmod p.$$
By Euler’s criterion (Theorem 2.5),

$$a^{(p-1)/2} \equiv \chi_p(a) \bmod p,$$



hence
$$\chi_p(a) \equiv (-1)^s \bmod p.$$
It follows that χp(a) − (−1)^s is either 0 or ±2 and is also divisible by p and so
$$\chi_p(a) = (-1)^s.$$
QED
We now need to solve the Fundamental Problem for odd primes. This will be done in
Chapter 4 by using a result which Gauss called the theorema aureum, the “golden theorem”,
of number theory. We will discuss that result extensively in the next chapter.

CHAPTER 3

Gauss’ Theorema Aureum: the Law of Quadratic Reciprocity

Proposition 1.1 of Chapter 1 shows that the solution of the general second-degree
congruence ax² + bx + c ≡ 0 mod p for an odd prime p can be reduced to the solution of the
congruence x² ≡ b² − 4ac mod p, and we also saw how the solution of x² ≡ n mod m for a
composite modulus m can be reduced by way of Gauss’ algorithm to the solution of x² ≡ q
mod p for prime numbers p and q. In this chapter, we will discuss a remarkable theorem
known as the Law of Quadratic Reciprocity, which provides a very powerful method for
determining the solvability of congruences of this last type. The theorem states that if p and
q are distinct odd primes then the congruences x² ≡ q mod p and x² ≡ p mod q are either
both solvable or both not solvable, unless p and q are both congruent to 3 mod 4, in which
case one is solvable and the other is not. As a simple but nevertheless striking example of
the power of this theorem, suppose one wants to know if x² ≡ 5 mod 103 has any solutions.
Since 5 is not congruent to 3 mod 4, the quadratic reciprocity law asserts that x² ≡ 5 mod
103 and x² ≡ 103 mod 5 are both solvable or both not. But solution of the latter congruence
reduces to x² ≡ 3 mod 5, which clearly has no solutions. Hence neither does x² ≡ 5 mod
103.
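The conclusion of this example is small enough to confirm by exhaustive search. The sketch below, with a helper name of our own, simply tries every x modulo p; it is meant as a check on the example, not as a substitute for the reciprocity argument.

```python
def square_roots_mod(n, p):
    """All x in [0, p-1] with x*x ≡ n (mod p), found by exhaustive search."""
    return [x for x in range(p) if (x * x - n) % p == 0]


if __name__ == "__main__":
    print(103 % 5)                    # the reduction of 103 modulo 5
    print(square_roots_mod(3, 5))     # solutions of x^2 ≡ 3 mod 5
    print(square_roots_mod(5, 103))   # solutions of x^2 ≡ 5 mod 103
```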
The first rigorous proof of the Law of Quadratic Reciprocity is due to Gauss. He valued
this theorem so much that he referred to it as the theorema aureum, the golden theorem,
of number theory, and in order to acquire a deeper understanding of its content and impli-
cations, he searched for various proofs of the theorem, eventually discovering eight different
ones. After discussing what type of mathematical principle a reciprocity law might seek to
encapsulate in section 1 of this chapter, stating the Law of Quadratic Reciprocity precisely
in section 2, and discussing some of the mathematical history which led up to it in section
3, we follow Gauss’ example by presenting five different proofs of quadratic reciprocity in
the remaining 10 sections. Each of these proofs is chosen to highlight the ideas behind the
techniques which Gauss himself employed and to indicate how some of the more modern
approaches to quadratic reciprocity are inspired by the work of Gauss. For a more detailed
summary of what we do in sections 5-13, consult section 4.


1. What is a reciprocity law?

We will motivate why we would want an answer to the question entitling this section by
first asking this question: what positive integers n are the sum of two squares? This is an
old problem that was solved by Fermat in 1640. We can reduce to the case when n is prime
by first observing, as Fermat did, that if a prime number q divides a sum of two squares,
neither of which is divisible by q, then q is the sum of two squares. Using the identity
(a² + b²)(c² + d²) = (ad − bc)² + (ac + bd)²,
which shows that the property of being the sum of two squares is preserved under
multiplication, it can then be easily shown that n is the sum of two squares if and only if n is either
a square or each odd prime factor of n of odd multiplicity is the sum of two squares. Because
2 = 1² + 1², we hence need only consider odd primes p.
As we mentioned before, p is the sum of two squares if it divides the sum of two squares
and neither of the squares is divisible by p, and so we are looking for integers a and b such
that
a² + b² ≡ 0 mod p
and
a ≢ 0 ≢ b mod p.
After multiplying the first congruence by the square of the inverse of b mod p, it follows that
p is the sum of two squares if and only if the congruence
x² + 1 ≡ 0 mod p
has a solution, i.e., −1 is a residue of p. We now invoke Theorem 2.4 of Chapter 2, which
asserts that −1 is a residue of p if and only if p ≡ 1 mod 4, to conclude that a positive integer
n is the sum of two squares if and only if either n is a square or each odd prime factor of n
of odd multiplicity is congruent to 1 mod 4.
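This criterion can be compared with a direct search for small n. In the sketch below the function names are ours, and we assume that 0 is allowed as one of the two squares (so that, for example, 9 = 3² + 0²), which is the reading under which every perfect square qualifies.

```python
from math import isqrt


def is_sum_of_two_squares(n):
    """Direct search for integers a, b >= 0 with a*a + b*b == n."""
    for a in range(isqrt(n) + 1):
        b = isqrt(n - a * a)
        if a * a + b * b == n:
            return True
    return False


def odd_multiplicity_primes(n):
    """Prime factors of n that occur with odd multiplicity."""
    result, q = set(), 2
    while q * q <= n:
        count = 0
        while n % q == 0:
            n //= q
            count += 1
        if count % 2 == 1:
            result.add(q)
        q += 1
    if n > 1:
        result.add(n)
    return result


def criterion(n):
    """Each odd prime factor of n of odd multiplicity is congruent to 1 mod 4."""
    return all(q == 2 or q % 4 == 1 for q in odd_multiplicity_primes(n))
```

A perfect square has no prime factor of odd multiplicity, so `criterion` accepts it automatically.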
Another way of saying that the congruence x² + 1 ≡ 0 mod p has a solution is to say
that the polynomial x² + 1 factors over Z/pZ as (x + c)(x − c) for some (nonzero) c ∈ Z,
i.e., x² + 1 splits over Z/pZ (in the remainder of this section, we follow the exposition as set
forth in the very nice paper of B. F. Wyman [64]). Our previous discussion hence shows that
the problem of deciding when an integer is the sum of two squares comes down to deciding
when a certain monic polynomial with integer coefficients splits over Z/pZ. It is therefore of
considerable interest to further study this splitting phenomenon. For that purpose, we will
start more generally with a polynomial f (x) with integral coefficients that is irreducible over
Q, and for an odd prime p, we let fp (x) denote the polynomial over Z/pZ obtained from
f (x) by reducing all of its coefficients modulo p. We will say that f (x) splits modulo p if
fp (x) is the product of distinct linear factors over Z/pZ, and if f (x) splits modulo p, we will
call p a splitting modulus of f (x).
Suppose now that f(x) = ax² + bx + c is a quadratic polynomial, the case that is of most
interest to us here. If p is an odd prime then the congruence

f(x) ≡ 0 mod p

has either 0, 1, or 2 solutions, which occur, according to Proposition 1.1 of Chapter 1, if
the discriminant b² − 4ac of f(x) is, respectively, a non-residue of p, is divisible by p, or is a
residue of p. Because this congruence also has exactly 2 solutions if and only if f(x) splits
modulo p, it follows that f(x) splits modulo p if and only if the discriminant of f(x) is a
residue of p. We saw before that x² + 1 splits modulo p if and only if p ≡ 1 mod 4, and using
Theorem 2.6 from Chapter 2 in a similar manner, one can prove that x² − 2 splits modulo
p if and only if p ≡ ±1 mod 8. Another amusing example, which we will let the reader work
out, asserts that x² + x + 1 splits modulo p if and only if p ≡ 1 mod 3.
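The three splitting criteria can be tested by counting roots, since a polynomial of degree n over the field Z/pZ is a product of distinct linear factors exactly when it has n distinct roots there. The sketch below is ours: polynomials are given as coefficient lists [c0, c1, . . . ], and the leading coefficient is assumed not to be divisible by p.

```python
def root_count(coeffs, p):
    """Number of distinct roots in Z/pZ of c0 + c1*x + ... (coeffs = [c0, c1, ...])."""
    def value(x):
        result = 0
        for c in reversed(coeffs):  # Horner's rule, reducing mod p at each step
            result = (result * x + c) % p
        return result
    return sum(1 for x in range(p) if value(x) == 0)


def splits(coeffs, p):
    """True when the polynomial has as many distinct roots mod p as its degree."""
    return root_count(coeffs, p) == len(coeffs) - 1


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23):
        print(p, splits([1, 0, 1], p), splits([-2, 0, 1], p), splits([1, 1, 1], p))
```

Here `[1, 0, 1]`, `[-2, 0, 1]` and `[1, 1, 1]` encode x² + 1, x² − 2 and x² + x + 1 respectively.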
In light of these three examples, we will now, for a fixed prime q, look for the splitting
moduli of x² − q. We wish to determine these moduli by means of congruence conditions
that are similar to the conditions which described the splitting moduli of x² + 1, x² − 2 and
x² + x + 1. If p is a prime then, over Z/pZ, x² − q is the square of a linear polynomial
precisely when p = q or p = 2, hence we may assume that p is an odd prime distinct from q.
It follows that x² − q splits modulo p if and only if q is a quadratic residue of p, i.e., the
Legendre symbol χp(q) is 1. Hence we must find a way to calculate χp(q) as p varies over
the odd primes.
This translation of the splitting modulus problem for x² − q does not really help much.
The Legendre symbol χp is not easy to evaluate directly, and changing the value of p would
require a direct calculation to begin again from scratch. Because there are infinitely many
primes, this approach to the problem quickly becomes unworkable.
A way to possibly overcome this difficulty is to observe that in this problem q is fixed while
the prime p varies, and so if it was possible to somehow use χq (p) in place of χp (q) then only
one Legendre symbol would be required. Moreover, the values of χq (p) are determined only
by the ordinary residue class of p modulo q, and so we would also have open the possibility
of calculating the splitting moduli of x² − q in terms of ordinary residue classes determined
in some way by q, as per the descriptions of the splitting moduli in our three examples.
This suggests looking for a computationally efficient relationship between χp (q) and χq (p),
i.e., is there a useful reciprocal relation between the residues (respectively, non-residues) of
p and the residues (respectively, non-residues) of q? The answer: yes there is, and it is
given by the Law of Quadratic Reciprocity, one of the fundamental principles of elementary
number theory and one of the most powerful tools that we have for analyzing the behavior
of residues and non-residues. As we will see (in Chapter 4), it completely solves the problem
of determining the splitting moduli of any quadratic polynomial by means of congruence
conditions which depend only on the discriminant of the polynomial.
We will begin our study of quadratic reciprocity in the next section, but before we do that,
it is natural to wonder if there is a similar principle which can be used to study the splitting
moduli of polynomials of degree larger than 2. Using what we have discussed for quadratic
polynomials as a guide, we will say that a polynomial f (x) with integer coefficients satisfies
a reciprocity law if its splitting moduli are determined solely by congruence conditions which
depend only on f (x). This way of formulating these higher-degree reciprocity laws is the
main reason that we used the idea of splitting moduli of polynomials in the first place.
As it turns out, higher reciprocity laws exist for many polynomials. A particularly nice
class of examples is provided by the set of cyclotomic polynomials. There is a cyclotomic
polynomial corresponding to each integer n ≥ 2, defined by a primitive n-th root of unity, say
ζn = exp(2πi/n). The number ζn is algebraic over Q, and is the root of a unique irreducible
monic polynomial Φn (x) with integer coefficients of degree ϕ(n), where ϕ denotes Euler’s
totient function. The polynomial Φn (x) is the n-th cyclotomic polynomial. For example, if
n = q is prime, one can show that

$$\Phi_q(x) = 1 + x + \cdots + x^{q-1}$$

(Chapter 3, section 8). The degree of Φn(x) is at least 4 when n ≥ 7, and Φn(x) satisfies
the following very nice reciprocity law (for a proof, consult Wyman [64]):

Theorem 3.1. (a cyclotomic reciprocity law). The prime p is a splitting modulus of
Φn(x) if and only if p ≡ 1 mod n.
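For a prime index n = q, where Φq(x) = 1 + x + · · · + x^(q−1), Theorem 3.1 can be observed numerically by counting roots modulo p. The sketch below is an illustration under our own naming; it treats only prime q and odd primes p, and it uses the fact that a polynomial of degree q − 1 over Z/pZ is a product of distinct linear factors exactly when it has q − 1 distinct roots.

```python
def phi_q_root_count(q, p):
    """Number of x in Z/pZ with 1 + x + ... + x^(q-1) ≡ 0 (mod p)."""
    return sum(
        1
        for x in range(p)
        if sum(pow(x, k, p) for k in range(q)) % p == 0
    )


def phi_q_splits(q, p):
    """True when Phi_q has q - 1 distinct roots mod p, i.e. p is a splitting modulus."""
    return phi_q_root_count(q, p) == q - 1


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 31, 41):
        print(p, p % 5, phi_q_splits(5, p))
```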

It transpires that not every polynomial with integer coefficients satisfies a reciprocity
law, but there is an elegant way to characterize the polynomials with rational coefficients
which do satisfy one. If f (x) is a polynomial of degree n with coefficients in Q then f (x)
has n complex roots, counted according to multiplicity, and these roots, together with Q,
generate a subfield of the complex numbers that we will denote by Kf . The set of all field
automorphisms of Kf forms a group under the operation of composition of automorphisms.
The Galois group of f (x) is defined to be the subgroup of all automorphisms σ of Kf which
fix each rational number, i.e., σ(r) = r for all r ∈ Q. The next theorem neatly characterizes
in terms of their Galois groups the polynomials which satisfy a reciprocity law.

Theorem 3.2. (Existence of Reciprocity Laws). If f (x) is a polynomial with integer
coefficients and is irreducible over Q then f (x) satisfies a reciprocity law if and only if the
Galois group of f (x) is abelian.

The polynomials x⁴ + 4x² + 2, x⁴ − 10x² + 4, x⁴ − 2, and x⁵ − 4x + 2 are all irreducible
over Q, and one can show that their Galois groups are, respectively, the cyclic group of order
4, the Klein 4-group, the dihedral group of order 8, and the symmetric group on 5 symbols
(Hungerford [29], section V.4). We hence conclude from Theorem 3.2 that x⁴ + 4x² + 2 and
x⁴ − 10x² + 4 satisfy a reciprocity law, but x⁴ − 2 and x⁵ − 4x + 2 do not.
Two natural questions now arise: how do you prove Theorem 3.2, and if you have an
irreducible polynomial with integer coefficients and an abelian Galois group, how do you
find the congruence conditions which determine its splitting moduli? The answers to these
questions are far beyond the scope of what we will do in these lecture notes, because they
make use of essentially all of the machinery of class field theory over the rationals. We will
not even attempt an explanation of what class field theory is, except to say that it originated
in a program to find reciprocity laws which are similar in spirit to the reciprocity laws for
polynomials that we have discussed here, but which are valid in much greater generality.
This program, which began with the work of Gauss on quadratic reciprocity, was eventually
completed in the 1920’s and 30’s by Takagi, E. Artin, Furtwängler, Hasse, and Chevalley.
We now turn to the theorem which inspired all of that work.

2. The Law of Quadratic Reciprocity

Theorem 3.3. (Law of Quadratic Reciprocity (LQR)) If p and q are distinct odd primes
then
$$\chi_p(q)\,\chi_q(p) = (-1)^{\frac{1}{2}(p-1) \cdot \frac{1}{2}(q-1)}.$$

We will begin our study of the LQR by unpacking the information that is encoded
in the elegant and efficient way by which Theorem 3.3 states it. Note first that if n ∈ Z is
odd then ½(n − 1) is even (respectively, odd) if and only if n ≡ 1 mod 4 (respectively, n ≡ 3
mod 4). Hence
χp (q)χq (p) = 1 iff p or q ≡ 1 mod 4,
χp (q)χq (p) = −1 iff p ≡ q ≡ 3 mod 4,
i.e.,
χp (q) = χq (p) iff p or q ≡ 1 mod 4,
χp (q) = −χq (p) iff p ≡ q ≡ 3 mod 4.

Thus the LQR states that


if p or q ≡ 1 mod 4 then p is a residue of q if and only if q is a residue of p,
and
if p ≡ q ≡ 3 mod 4 then p is a residue of q if and only if q is a non-residue of p.
This is why the theorem is called the law of quadratic reciprocity. The classical quotient
notation for the Legendre symbol makes the reciprocity typographically explicit: in that
notation, the conclusion of Theorem 3.3 reads as
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{1}{2}(p-1) \cdot \frac{1}{2}(q-1)}.$$
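Before turning to applications, it may be reassuring to see the LQR confirmed on a range of small primes. The sketch below evaluates both Legendre symbols by Euler's criterion and compares their product with the sign in Theorem 3.3; the function names are ours, and such a finite check is of course no substitute for the proofs given later in this chapter.

```python
def chi(a, p):
    """Legendre symbol chi_p(a) for an odd prime p not dividing a (Euler's criterion)."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def lqr_holds(p, q):
    """Check chi_p(q) * chi_q(p) == (-1)^(((p-1)/2) * ((q-1)/2)) for distinct odd primes."""
    sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    return chi(q, p) * chi(p, q) == sign


if __name__ == "__main__":
    print(lqr_holds(5, 103), lqr_holds(7, 11))
```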
We next illustrate the usefulness of the LQR in determining whether or not a specific
integer is a residue of a specific prime. We can do no better than taking the
example which Dirichlet himself used in his landmark text Vorlesungen über Zahlentheorie
[12]. We wish to know whether 365 is a residue of the prime 1847. The first step is to factor
365 = 5 · 73, so that
χ1847 (365) = χ1847 (5) χ1847 (73).
Because 5 ≡ 1 mod 4, the LQR implies that
χ1847 (5) = χ5 (1847)
and as 1847 ≡ 2 mod 5, it follows that
χ1847 (5) = χ5 (2) = −1.
Since 73 ≡ 1 mod 4, it follows in the same manner from the LQR and the fact that 1847 ≡
22 mod 73 that
χ1847 (73) = χ73 (1847) = χ73 (22) = χ73 (2) χ73 (11).
But now 73 ≡ 1 mod 8, hence it follows from Theorem 2.6 that
χ73 (2) = 1
hence
χ1847 (73) = χ73 (11).
Using the LQR once more, we have that
χ73 (11) = χ11 (73) = χ11 (7),
and because 7 and 11 are each congruent to 3 mod 4, it follows from the LQR that
χ11 (7) = −χ7 (11) = −χ7 (4) = −χ7 (2)² = −1.
Consequently,
χ1847 (73) = χ73 (11) = χ11 (7) = −1,
and so finally,
χ1847 (365) = χ1847 (5) χ1847 (73) = (−1)(−1) = 1.
Thus 365 is a residue of 1847; in fact

(±496)² = 246016 = 365 + 133 · 1847.
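Each step of Dirichlet's example can be double-checked by machine. The sketch below, with helper names of our own, evaluates the relevant Legendre symbols by Euler's criterion and finds the square roots of 365 modulo 1847 by exhaustive search.

```python
def chi(a, p):
    """Legendre symbol chi_p(a) for an odd prime p not dividing a (Euler's criterion)."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def square_roots(n, p):
    """All x in [1, p-1] with x*x ≡ n (mod p), found by exhaustive search."""
    return [x for x in range(1, p) if (x * x - n) % p == 0]


if __name__ == "__main__":
    p = 1847
    print(chi(5, p), chi(73, p), chi(365, p))
    print(square_roots(365, p))  # the two roots are 496 and 1847 - 496
```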


Quadratic reciprocity can also be used to calculate the splitting moduli of polynomials
of the form x² − q, q a prime, as we alluded to in section 1 above. For example, let q = 5.
Then the residues of 5 are 1 and 4 and so
χ5 (1) = χ5 (4) = 1
and
χ5 (2) = χ5 (3) = −1.
Hence
χ5 (p) = 1 iff p ≡ 1 or 4 mod 5.
Because 5 ≡ 1 mod 4, it follows from the LQR that
χp (5) = χ5 (p),
hence
5 is a residue of p iff p ≡ 1 or 4 mod 5.
Consequently, x² − 5 splits modulo p if and only if p is congruent to either 1 or 4 mod 5.
For a different example, take q = 11. Then calculation of the residues of 11 shows that

χ11 (p) = 1 iff p ≡ 1, 3, 4, 5, or 9 mod 11.


We have that 11 ≡ 3 mod 4, hence by the LQR,
χp (11) = ±χ11 (p),

with the sign determined by the equivalence class of p mod 4. For example, if p = 23 then
23 ≡ 1 mod 11 and 23 ≡ 11 ≡ 3 mod 4, hence the LQR implies that

χ23 (11) = −χ11 (23) = −χ11 (1) = −1,


while if p = 89 then 89 ≡ 1 mod 11 and 89 ≡ 1 mod 4, and so the LQR implies in this case
that
χ89 (11) = χ11 (89) = χ11 (1) = 1.
Use of the Chinese remainder theorem shows that the value of χp (11) depends on the equiv-
alence class of p modulo 4 · 11 = 44, and after a few more calculations we see that

χp (11) = 1 iff p ≡ 1, 5, 7, 9, 19, 25, 35, 37, 39, or 43 mod 44.
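The list of residue classes modulo 44 can be recovered, and tested against actual primes, with a short computation. In the sketch below the names are ours; `classes_from_lqr` combines the residues of 11 with the sign rule supplied by the LQR, and `chi` evaluates χp(11) independently by Euler's criterion.

```python
from math import isqrt


def chi(a, p):
    """Legendre symbol chi_p(a) for an odd prime p not dividing a (Euler's criterion)."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def classes_from_lqr():
    """Classes r mod 44 (coprime to 44) for which the LQR predicts chi_p(11) = 1."""
    residues_of_11 = {x * x % 11 for x in range(1, 11)}
    classes = []
    for r in range(1, 44):
        if r % 2 == 0 or r % 11 == 0:
            continue
        chi_11_r = 1 if r % 11 in residues_of_11 else -1
        sign = 1 if r % 4 == 1 else -1  # chi_p(11) = -chi_11(p) when p ≡ 3 mod 4
        if sign * chi_11_r == 1:
            classes.append(r)
    return classes


def odd_primes_below(n):
    """Odd primes less than n, by trial division."""
    return [p for p in range(3, n, 2) if all(p % d for d in range(3, isqrt(p) + 1, 2))]
```

One can then compare `chi(11, p) == 1` with membership of `p % 44` in `classes_from_lqr()` for every odd prime p ≠ 11 below a chosen bound.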



Thus x² − 11 splits modulo p if and only if p ≡ 1, 5, 7, 9, 19, 25, 35, 37, 39, or 43 mod
44. We will have much more to say about the utility of quadratic reciprocity in Chapter 4,
but these examples already give a good indication of how the LQR makes computation of
residues and non-residues much easier.

3. Some History

At the end of section 1, we indicated very briefly that many important and far-reaching
developments in number theory can trace their genesis to the Law of Quadratic Reciprocity.
Thus it is instructive to discuss the history of some of the ideas in number theory which
led up to it. In order to do that, we will follow the account of that story as presented in
Lemmermeyer [38]. Lemmermeyer’s book contains a wealth of information about the devel-
opment of reciprocity laws in many of their various manifestations, with a penetrating and
comprehensive analysis, both mathematical and historical, of the circle of ideas, techniques,
and approaches which have been brought to bear on that subject.
The first foreshadowing of quadratic reciprocity appears in the work of Fermat. Fermat’s
results on the representation of integers as the sum of two squares, as we saw in section 1,
lead directly to the problem of determining the quadratic character of −1, which was solved
in Chapter 2 by Theorem 2.4. Fermat also studied the representation of primes by quadratic
forms of the form x² + ny², for n = ±2, ±3, and −5. One can show that when n = 2 or 3
then a prime p which divides x² + ny² but divides neither x nor y is of the form a² + nb² for
a pair of integers a and b. It now follows from this fact, using the same reasoning that we
employed in section 1, that p can be represented by either x² + 2y² or x² + 3y² if and only if
−2 or, respectively, −3, is a residue of p. We hence see that the quadratic character of ±2
and ±3 is also implicit in Fermat’s work on quadratic forms.
Euler apparently began to seriously study Fermat’s work when he started his mathemati-
cal correspondence with Christian Goldbach in 1729. As a result of that study, Euler became
interested in the divisors of integers which are represented by quadratic forms nx² + my²,
which eventually led him after several years to the Law of Quadratic Reciprocity. The LQR
was first conjectured by Euler [15] in an equivalent form in 1744, based on extensive numer-
ical evidence, but he could not prove it. Research done by Lagrange during the years from
1773 to 1775, in particular his work on a general theory of binary quadratic forms, inspired
Euler to return to the study of quadratic residues, and in a paper [17] published in 1783
after his death, Euler gave, still without proof, a formulation of the LQR that is very close
to that which is used most commonly today.

Euler’s original formulation of quadratic reciprocity in 1744 can be stated (using more
modern notation) as follows:

Theorem 3.4. Suppose that p is an odd prime and a is a positive integer not divisible
by p. If q is a prime such that p ≡ ±q mod 4a then χp (a) = χq (a).

This says that the value of the Legendre symbol χp (a) depends only on the ordinary residue
class of p modulo 4a, and that the value of χp (a) is the same for all primes p with a fixed
remainder r or 4a − r when divided by 4a.
In section 6 below, we will deduce the LQR by proving Theorem 3.4 directly by means
of Gauss’ Lemma (Theorem 2.7), and so it makes sense to verify that the LQR is equivalent
to Theorem 3.4, which is what we will do now.
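
Before doing so, we note that Theorem 3.4 is easy to test numerically. The following Python sketch is only an illustration and plays no role in the proofs; the helper names `legendre`, `is_prime`, and `euler_form_holds` are our own, and the Legendre symbol is computed by means of Euler’s criterion from Chapter 2.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)  # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small ranges used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def euler_form_holds(a: int, bound: int) -> bool:
    """Check chi_p(a) == chi_q(a) for all odd primes p, q < bound
    with p not dividing a and p congruent to +q or -q modulo 4a."""
    odd_primes = [n for n in range(3, bound) if is_prime(n)]
    for p in odd_primes:
        if a % p == 0:
            continue
        for q in odd_primes:
            if (p - q) % (4 * a) == 0 or (p + q) % (4 * a) == 0:
                if legendre(a, p) != legendre(a, q):
                    return False
    return True


if __name__ == "__main__":
    for a in range(1, 13):
        print(a, euler_form_holds(a, 200))
```

A finite check of this kind is of course no substitute for a proof, but it is in the spirit of the numerical evidence on which Euler based his conjecture.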

Proposition 3.5. The Law of Quadratic Reciprocity is equivalent to Theorem 3.4.

Proof. Suppose that Theorem 3.4 is true. Assume that p > q are odd primes. We need
to verify that
$$\chi_p(q)\chi_q(p) = (-1)^{\frac{1}{2}(p-1)\cdot\frac{1}{2}(q-1)}. \tag{1}$$
Suppose first that p ≡ q mod 4, and let a = (p − q)/4 > 0. Then we have that
χq (p) = χq (p − q) = χq (4a) = χq (a)
and
χp (q) = χp (−1)χp (p − q) = χp (−1)χp (a).
Because p does not divide a, it follows from Theorem 3.4 that
χp (a) = χq (a),
and so
χq (p)χp (q) = χp (−1).
Equation (1) is now a consequence of this equation, Theorem 2.4, and the assumption that
p ≡ q mod 4.
On the other hand, if p ≢ q mod 4 then p ≡ −q mod 4, hence we set a = (p + q)/4 > 0
and so deduce that
χq (p) = χq (p + q) = χq (a)
and
χp (q) = χp (p + q) = χp (a).
As p does not divide a, we conclude by way of Theorem 3.4 that
χq (p)χp (q) = 1,

and (1) follows from this equation because p ≡ −q mod 4.


In order to verify the converse, we assume that the LQR is valid and then for an odd
prime p, a positive integer a not divisible by p, and a prime q for which p ≡ ±q mod 4a, we
seek to verify that
(2) χp (a) = χq (a).
By virtue of the multiplicativity of the Legendre symbol (Proposition 2.3(ii) of Chapter
2), we immediately reduce to the case that a is prime. Suppose first that a = 2. Then
p ≡ ±q mod 8, hence (2) is true by Theorem 2.6.
Now let a be a fixed odd prime, and assume first that p ≡ q mod 4a. Then a is neither
p nor q, p ≡ q mod a, and p ≡ q mod 4. Hence
χa (q)χa (p) = 1
and
$$\frac{p-1}{2} + \frac{q-1}{2} \equiv 0 \bmod 2.$$
It hence follows from the LQR that
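$$\begin{aligned}
\chi_p(a)\chi_q(a) &= (-1)^{\frac{1}{2}(a-1)\cdot\frac{1}{2}(p-1)}\,(-1)^{\frac{1}{2}(a-1)\cdot\frac{1}{2}(q-1)}\,\chi_a(p)\chi_a(q) \\
&= (-1)^{\frac{1}{2}(a-1)\left(\frac{p-1}{2}+\frac{q-1}{2}\right)} \\
&= 1,
\end{aligned}$$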
i.e., (2) is valid.
Assume next that p ≡ −q mod 4a. Then p ≡ −q mod 4 and p ≡ −q mod a, hence
$$\frac{p-1}{2} + \frac{q-1}{2} \equiv 1 \bmod 2,$$
and, because of Theorem 2.4,
$$\chi_a(p)\chi_a(q) = (-1)^{\frac{1}{2}(a-1)}.$$
As a is again neither p nor q, from the LQR it therefore follows that
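$$\begin{aligned}
\chi_p(a)\chi_q(a) &= (-1)^{\frac{1}{2}(a-1)\cdot\frac{1}{2}(p-1)}\,(-1)^{\frac{1}{2}(a-1)\cdot\frac{1}{2}(q-1)}\,\chi_a(p)\chi_a(q) \\
&= (-1)^{\frac{1}{2}(a-1)\left(1+\frac{p-1}{2}+\frac{q-1}{2}\right)} \\
&= 1
\end{aligned}$$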
in this case as well. QED
The formulation of the LQR which appears in modern texts, including the one before the
reader, was introduced by Legendre [36] in 1785. In this paper and in his influential book
[37], Legendre discussed the LQR at length and in depth. In particular, he gave a proof of the

LQR that depended on the assumption of primes which satisfy certain auxiliary conditions,
but as Legendre was unable to rigorously verify these conditions, as he himself admitted ([36],
p. 520), his argument is not complete. Interestingly enough, one of the auxiliary conditions
posited the existence of infinitely many primes in any arithmetic progression whose initial
term and common difference are relatively prime, a very important result which Dirichlet
would prove in 1837, and which we will have much more to say about later (see section 4 of
Chapter 4). Legendre introduced his symbol in [36] as a particularly elegant way to state
the LQR.
Lagrange’s role in the history of quadratic reciprocity also needs to be mentioned, and
comes from his work [33], [34] in refining and generalizing Euler’s work on the representation
of integers by quadratic forms. For example, Lagrange proved that if p is a prime which is
congruent to either 1 or 9 mod 20 then $p = x^2 + 5y^2$, a result that was conjectured without
proof by Euler, and he also showed that if p and q are primes each of which is congruent
to 3 or 7 mod 20 then $pq = x^2 + 5y^2$. From these two results the quadratic character of −5
can be deduced. Lagrange also used quadratic residues to study the existence of nontrivial
solutions to certain Diophantine equations of the form $ax^2 + by^2 + cz^2 = 0$.
The important role which the theory of quadratic residues played in the work of Euler
and Lagrange on quadratic forms and the extensive discussion of the LQR by Legendre all
contributed to making the proof of the LQR one of the major unsolved problems of number
theory in the eighteenth century. The first rigorous and correct proof was discovered by
Gauss in 1796. He considered this result one of his greatest contributions to mathematics,
returning to it again and again throughout his career. Gauss eventually found eight different
proofs of the LQR and published six of them (although the two unpublished proofs, usually
referred to as proofs 7 and 8, are not complete, according to D. Gröger [26]; I am grateful
to an anonymous referee for this reference). A major goal of Gauss’ later work in number
theory was to generalize quadratic reciprocity to higher powers, in particular to cubic and
biquadratic (fourth-power) residues, and he sought out different ways of verifying quadratic
reciprocity in order to gain as much insight as he could into ways to achieve that goal. In
the next section we will discuss in a bit more detail the six proofs of the LQR which Gauss
published.
The establishment of generalizations of quadratic reciprocity that covered arbitrary power
residues, the so-called higher reciprocity laws, was a major theme of number theory in the
nineteenth century and led to many of the most important advances in the subject during
that time. Further generalizations to number systems extending beyond the integers, in
particular and most importantly, to rings of algebraic integers in algebraic number fields (see


section 8 of this chapter and section 1 of Chapter 5 for the relevant definitions), were a major
theme of twentieth-century number theory and led to many of the most important advances
during that time. For an especially apt example of this latter development which closely
follows the theme of this chapter, we direct the reader’s attention to Hecke’s penetrating
analysis of quadratic reciprocity in an arbitrary algebraic number field ([27], Chapter VIII).

4. Proofs of the Law of Quadratic Reciprocity

The Law of Quadratic Reciprocity has inspired more proofs, by far, than any other theorem
of number theory. Lemmermeyer [38] documents 196 different proofs which have appeared
up to the year 2000 and the web site at http://www.rzuser.uni-heidelberg.de/~hb3/fchrono.html
summarizes 266 proofs, with still more being found (I am grateful to an anonymous
referee for this web-site reference). Many of these arguments have been used as a basis
for developing new methods and opening up new areas of study in number theory. In this
book we will give seven different proofs of the LQR, with our primary emphasis being on
the six proofs which Gauss himself published. As an introduction to what we will do in that
regard, we will briefly discuss next these six proofs of Gauss.
Gauss’ first proof of the LQR, which involved an extremely long and complicated induc-
tion argument, was published in Disquisitiones Arithmeticae ([19], articles 135-145) (for a
very readable account of a simplified version of Gauss’ argument, see Dirichlet [12], Chapter
3, sections 48-51). It is similar to a proof attempted by Legendre, in that it also requires
an “auxiliary” prime. The complexity of Gauss’ argument stems from the necessity of rig-
orously establishing the existence of this prime, and the formidable technical calculations
which Gauss had to use there caused his argument to garner little attention for many years
thereafter. However, those calculations were found to be useful in the development of alge-
braic K-theory in the 1970’s; in fact a proof of quadratic reciprocity can be deduced from
certain results in the K-theory of the rational numbers (Rosenberg [49], Theorem 4.4.9 and
Corollary 4.4.10)!
Gauss’ second proof also appeared in the Disquisitiones (article 262) and uses his genus
theory of quadratic forms, a classification of forms that is closely related to Lagrange’s
classification of quadratic forms by means of unimodular substitutions (see section 12 of
this chapter and also section 3 of chapter 8). The main point of the argument here is the
verification of an inequality for the number of genera. This proof has a very nice modern
formulation using the concept of narrow equivalence of ideals in a quadratic number field,
and we will present this argument in sections 10, 11, and 12 below.

Gauss’ Lemma (Theorem 2.7) is the main idea underlying the third and fifth proofs ([20]
and [22], respectively) which Gauss gave of the LQR. In the next section, we will present
a simplification of Gauss’ third proof due to Eisenstein; it is one of the most elegant and
elementary proofs that we have of quadratic reciprocity and it has become the standard
argument. In section 6, Theorem 3.4 will be deduced directly from Gauss’ Lemma, thereby
proving the LQR after an appeal to Proposition 3.5; this approach is in a spirit similar to
the idea used by Gauss in his fifth proof.
We make our first contact in these notes with the theory of algebraic numbers in our
account in sections 7, 8 and 9 of Gauss’ sixth proof [23] of the LQR. The main idea in this
proof and Gauss’ fourth proof [21] is the employment of quadratic Gauss sums (without
and, respectively, with the determination of the signs). Gauss did not himself make use of
algebraic number theory in any modern sense but we will use it in order to more clearly
arrange the rather ingenious calculations that are a signal feature of these techniques. We
end in section 13 with a proof of quadratic reciprocity by means of the Galois theory of
cyclotomic fields, an approach which foreshadows the way to more general formulations of
reciprocity laws that are the subject of research in number theory today.

5. A Proof of Quadratic Reciprocity via Gauss’ Lemma

The first proof that we will give of Theorem 3.3 is a simplification, due to Eisenstein, of
Gauss’ third proof [20]. It is by now the standard argument and uses an ingenious application
of Theorem 2.7 (Gauss’ lemma). Theorem 2.7 enters the reasoning by way of

Lemma 3.6. If a ∈ Z is odd and gcd(a, p) = 1 then

$$\chi_p(a) = (-1)^{S(a,p)},$$

where

$$S(a,p) = \sum_{k=1}^{\frac{1}{2}(p-1)} \left[\frac{ka}{p}\right].$$

Assume Lemma 3.6 for the time being, with its proof to come shortly.
We begin our first proof of Theorem 3.3 by outlining the strategy of the argument. Let
p and q be distinct odd primes and consider the set L of points (x, y) in the plane, where
x, y ∈ [1, ∞), $1 \le x \le \frac{1}{2}(p-1)$, and $1 \le y \le \frac{1}{2}(q-1)$, i.e., the set of lattice points inside the
rectangle with corners at (0, 0), $(0, \frac{1}{2}(q-1))$, $(\frac{1}{2}(p-1), 0)$, $(\frac{1}{2}(p-1), \frac{1}{2}(q-1))$.
Let l be the line with equation qx = py. To prove Theorem 3.3, one shows first that

(3) no point of L lies on l.



Hence

L = set of all points of L which lie below l ∪ set of all points of L which lie above l
= L1 ∪ L2 ,

consequently
$$\frac{1}{2}(p-1)\cdot\frac{1}{2}(q-1) = |L| = |L_1| + |L_2|. \tag{4}$$
This geometry is illustrated in Figure 1.

[Figure 1. The lattice-point count: the rectangle with sides $\frac{1}{2}(p-1)$ and $\frac{1}{2}(q-1)$, cut by the line py = qx.]

The next step is to

(5) count the number of points in L1 and L2 .

The result is
$$|L_1| = S(q,p), \qquad |L_2| = S(p,q),$$
hence from (4),
$$\frac{1}{2}(p-1)\cdot\frac{1}{2}(q-1) = S(q,p) + S(p,q).$$
It then follows from Lemma 3.6 that
$$(-1)^{\frac{1}{2}(p-1)\cdot\frac{1}{2}(q-1)} = (-1)^{S(q,p)}(-1)^{S(p,q)} = \chi_p(q)\chi_q(p),$$

which is the conclusion of Theorem 3.3. Thus, we need to verify (3), implement (5), and
prove Lemma 3.6.
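
The two facts on which this strategy rests can be tried out on small primes before they are proved. The Python sketch below is only an illustration under our own naming (`legendre`, `S`, and `check_strategy` are not taken from the text): it evaluates the sums S(q, p) and S(p, q) and compares them with the count in (4) and with the Legendre symbols, the latter computed by Euler’s criterion.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def S(a: int, p: int) -> int:
    """S(a, p) = sum of [k*a/p] for k = 1, ..., (p-1)/2, as in Lemma 3.6."""
    return sum((k * a) // p for k in range(1, (p - 1) // 2 + 1))


def check_strategy(p: int, q: int) -> bool:
    """For distinct odd primes p and q, test
    S(q,p) + S(p,q) = ((p-1)/2)*((q-1)/2) together with
    chi_p(q) = (-1)^S(q,p) and chi_q(p) = (-1)^S(p,q)."""
    count_ok = S(q, p) + S(p, q) == ((p - 1) // 2) * ((q - 1) // 2)
    signs_ok = (legendre(q, p) == (-1) ** S(q, p)
                and legendre(p, q) == (-1) ** S(p, q))
    return count_ok and signs_ok


if __name__ == "__main__":
    print(S(7, 11), S(11, 7), check_strategy(11, 7))
```

For p = 11 and q = 7 the two sums are 7 and 8, which add up to 5 · 3 = 15, the number of lattice points in the rectangle.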

Verification of (3). Suppose that (x, y) ∈ L satisfies qx = py. Then q, being prime, must
divide either p or y. Because p is prime and q ≠ p, q must divide y, which is not possible
because $1 \le y \le \frac{1}{2}(q-1) < q$.
Implementation of (5).
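$$\begin{aligned}
L_1 &= \{(x,y) \in L : qx > py\} \\
&= \Big\{(x,y) \in L : 1 \le x \le \tfrac{1}{2}(p-1),\ 1 \le y < \frac{qx}{p}\Big\} \\
&= \bigcup_{1 \le x \le \frac{1}{2}(p-1)} \Big\{(x,y) : 1 \le y \le \Big[\frac{qx}{p}\Big]\Big\},
\end{aligned}$$

and this union is pairwise disjoint. Hence

$$|L_1| = \sum_{x=1}^{\frac{1}{2}(p-1)} \left[\frac{qx}{p}\right] = S(q,p).$$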
In Figure 1, L1 is the set of lattice points which lie below the line py = qx.
On the other hand,
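$$\begin{aligned}
L_2 &= \{(x,y) \in L : qx < py\} \\
&= \Big\{(x,y) \in L : 1 \le y \le \tfrac{1}{2}(q-1),\ 1 \le x < \frac{py}{q}\Big\} \\
&= \bigcup_{1 \le y \le \frac{1}{2}(q-1)} \Big\{(x,y) : 1 \le x \le \Big[\frac{py}{q}\Big]\Big\},
\end{aligned}$$

hence

$$|L_2| = \sum_{y=1}^{\frac{1}{2}(q-1)} \left[\frac{py}{q}\right] = S(p,q).$$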
Here, L2 is the set of lattice points in Figure 1 which lie above the line py = qx.
Note that this part of the proof contains no number theory but is instead a purely
geometric lattice-point count. All of the number theory is concentrated in the proof of
Lemma 3.6, which is still to come. Indeed, that is the main idea in Gauss’ third proof:
divide the argument into two parts, a number-theoretic part (Lemma 3.6) and a geometric
part (the lattice-point count). Coupling geometry to number theory is a very powerful
method for proving things, which Gauss pioneered in much of his work.
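
The purely geometric nature of this count can be seen by carrying it out by brute force. The following Python sketch is our own illustration (the names `lattice_counts` and `S` are hypothetical): it enumerates the lattice points of the rectangle, sorts them according to the side of the line py = qx on which they lie, and compares the two totals with the floor sums obtained above.

```python
def S(a: int, p: int) -> int:
    """S(a, p) = sum of [k*a/p] for k = 1, ..., (p-1)/2."""
    return sum((k * a) // p for k in range(1, (p - 1) // 2 + 1))


def lattice_counts(p: int, q: int) -> tuple[int, int]:
    """Return (|L1|, |L2|): the numbers of lattice points (x, y) with
    1 <= x <= (p-1)/2 and 1 <= y <= (q-1)/2 lying below, respectively
    above, the line py = qx."""
    below = above = 0
    for x in range(1, (p - 1) // 2 + 1):
        for y in range(1, (q - 1) // 2 + 1):
            if q * x > p * y:
                below += 1
            elif q * x < p * y:
                above += 1
            else:
                # Excluded by (3) when p and q are distinct odd primes.
                raise ValueError("a lattice point lies on the line")
    return below, above


if __name__ == "__main__":
    p, q = 11, 7
    print(lattice_counts(p, q), (S(q, p), S(p, q)))
```

The enumeration uses only integer comparisons of qx and py, so no rounding is involved.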
Proof of Lemma 3.6. We set up shop in order to apply Gauss’ lemma: take the minimal
positive ordinary residues mod p of the integers $a, 2a, \ldots, \frac{1}{2}(p-1)a$, observe as before that none
of these ordinary residues is p/2, as p is odd, and they are all distinct as gcd(a, p) = 1, hence
let
u1 , . . . , us be those ordinary residues that are > p/2,

v1 , . . . , vt be those ordinary residues that are < p/2.


By the division algorithm, for each $j \in [1, \frac{1}{2}(p-1)]$,
$$ja = p\left[\frac{ja}{p}\right] + r_j,$$
where $r_j$ is one of the $u_k$ or one of the $v_l$.
Adding these equations together, we obtain
$$a\sum_{j=1}^{\frac{1}{2}(p-1)} j = p\sum_{j=1}^{\frac{1}{2}(p-1)}\left[\frac{ja}{p}\right] + \sum_{j=1}^{s} u_j + \sum_{j=1}^{t} v_j. \tag{6}$$

Next, recall from (9) of Chapter 2 that

$$\{p-u_1, \ldots, p-u_s, v_1, \ldots, v_t\} = \left[1, \tfrac{1}{2}(p-1)\right].$$
Hence
$$\sum_{j=1}^{\frac{1}{2}(p-1)} j = sp - \sum_{j=1}^{s} u_j + \sum_{j=1}^{t} v_j. \tag{7}$$

Subtracting (7) from (6) yields

$$(a-1)\sum_{j=1}^{\frac{1}{2}(p-1)} j = pS(a,p) - sp + 2\sum_{j=1}^{s} u_j.$$

Hence
p(S(a, p) − s) is even (a is odd!),
and so
S(a, p) − s is even (p is odd!),
whence
$$(-1)^{S(a,p)} = (-1)^{s}.$$
Gauss’ lemma now implies that
$$\chi_p(a) = (-1)^{s},$$
and so
$$\chi_p(a) = (-1)^{S(a,p)}.$$

QED
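
The individual steps of this proof can be traced on a concrete example. The Python sketch below is an illustration only, with hypothetical helper names `gauss_lemma_data` and `check_proof_steps`: for an odd integer a prime to p it forms the residues u and v, and then tests the set identity recalled from Chapter 2, the equation obtained by subtracting (7) from (6), and the parity of S(a, p) − s.

```python
def gauss_lemma_data(a: int, p: int):
    """Split the least positive residues of a, 2a, ..., ((p-1)/2)a mod p
    into those > p/2 (the u's) and those < p/2 (the v's); also return S(a, p)."""
    half = (p - 1) // 2
    residues = [(j * a) % p for j in range(1, half + 1)]
    u = [r for r in residues if 2 * r > p]
    v = [r for r in residues if 2 * r < p]
    S = sum((j * a) // p for j in range(1, half + 1))
    return u, v, S


def check_proof_steps(a: int, p: int):
    """Return three booleans for an odd a with gcd(a, p) = 1:
    the set identity, the identity (6) - (7), and the parity of S(a,p) - s."""
    u, v, S = gauss_lemma_data(a, p)
    s = len(u)
    half = (p - 1) // 2
    total = half * (half + 1) // 2  # 1 + 2 + ... + (p-1)/2
    set_ok = sorted([p - r for r in u] + v) == list(range(1, half + 1))
    identity_ok = (a - 1) * total == p * S - s * p + 2 * sum(u)
    parity_ok = (S - s) % 2 == 0
    return set_ok, identity_ok, parity_ok


if __name__ == "__main__":
    print(gauss_lemma_data(7, 11), check_proof_steps(7, 11))
```

For a = 7 and p = 11 one finds u = 7, 10, 6 and v = 3, 2, so that s = 3 and S(7, 11) = 7, and both sides of the displayed identity equal 90.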

6. Another Proof of Quadratic Reciprocity via Gauss’ Lemma

The proof of Theorem 3.3 that we will do here is similar to the first proof that we presented
in section 5, in that it also uses Gauss’ Lemma as a crucial tool. However, the strategy for
the argument in this section is different from the one that was used in section 5; we will
deduce Theorem 3.4, Euler’s version of Theorem 3.3, directly from Gauss’ Lemma, and then
use the equivalence of Theorem 3.4 and Theorem 3.3 that was established in Proposition
3.5 to conclude that Theorem 3.3 is valid (the details of the following argument, as well as
those of the proof of Proposition 3.5, are taken from some lecture notes of F. Lemmermeyer
posted at www.fen.bilkent.edu.tr/~franz/nt/ch6.pdf).
We proceed to implement this strategy. Let p > q be odd primes, let a be a positive
integer not divisible by p, and suppose that p ≡ ±q mod 4a. We wish to show that

χq (a) = χp (a).

Suppose that p ≡ q mod 4a (when p ≡ −q mod 4a, a straightforward modification of the
following argument, which we will leave to the interested reader, can be used to also verify
that χq (a) and χp (a) are the same). In order to apply Gauss’ Lemma to calculate χp (a), we
need to find (the parity of) the number s of all integers $a, 2a, \ldots, \frac{1}{2}(p-1)a$ which have
positive minimal ordinary residue modulo p between p/2 and p. If

$$(x) = x - [x]$$

denotes the fractional part of a real number x, then we observe that s is the cardinality of
the set
$$P(a) = \Big\{ r \in [1, (p-1)/2] : \big(arp^{-1}\big) > \frac{1}{2} \Big\}.$$
This set is the pairwise disjoint union of sets $P_m(a)$, m ∈ [0, a − 1], where $P_m(a)$ consists of
all integers r ∈ [1, (p − 1)/2] such that
$$\frac{m}{2} < \frac{ar}{p} < \frac{m+1}{2} \quad \text{and} \quad \big(arp^{-1}\big) > \frac{1}{2}.$$
From this definition of $P_m(a)$, we conclude that $P_m(a)$ is empty when m is even, and we also
observe that the condition $(arp^{-1}) > \frac{1}{2}$ is automatically satisfied when m is odd. Hence
$$P_m(a) = \Big\{ r \in \mathbb{Z} : \frac{m}{2a}\,p < r < \frac{m+1}{2a}\,p \Big\},$$
when m is odd, and
$$s = \sum_{0 \le m < a,\ m \text{ odd}} |P_m(a)|.$$


Similarly, it follows that


$$\chi_q(a) = (-1)^{t},$$
where
$$t = \sum_{0 \le m < a,\ m \text{ odd}} |Q_m(a)|,$$
$$Q_m(a) = \Big\{ r \in \mathbb{Z} : \frac{m}{2a}\,q < r < \frac{m+1}{2a}\,q \Big\}.$$
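We have that p − q = 4an, for some positive integer n, and so when m is odd,
$$\begin{aligned}
P_m(a) &= \Big\{ r \in \mathbb{Z} : \frac{m}{2a}\,p < r < \frac{m+1}{2a}\,p \Big\} \\
&= \Big\{ r \in \mathbb{Z} : \frac{m}{2a}(q + 4an) < r < \frac{m+1}{2a}(q + 4an) \Big\} \\
&= \Big\{ r \in \mathbb{Z} : \frac{m}{2a}\,q + 2mn < r < \frac{m+1}{2a}\,q + 2mn + 2n \Big\} \\
&= \Big\{ r' \in \mathbb{Z} : \frac{m}{2a}\,q < r' < \frac{m+1}{2a}\,q + 2n \Big\},
\end{aligned}$$
where r′ = r − 2mn. Hence
$$|P_m(a)| = |Q_m(a)| + 2n.$$
We conclude that s ≡ t mod 2, whence

$$\chi_p(a) = (-1)^{s} = (-1)^{t} = \chi_q(a).$$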

QED
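
The counting in this argument can be followed on a small case such as a = 3, q = 7, n = 1, so that p = 19. The Python sketch below is an illustration under our own naming (`gauss_count`, `interval_count`, and `count_via_intervals` are hypothetical helpers); the inequalities defining the sets $P_m(a)$ and $Q_m(a)$ are cleared of denominators so that only integer comparisons are made.

```python
def gauss_count(a: int, p: int) -> int:
    """The number s in Gauss' Lemma: how many of a, 2a, ..., ((p-1)/2)a
    have least positive residue mod p greater than p/2."""
    return sum(1 for r in range(1, (p - 1) // 2 + 1) if 2 * ((a * r) % p) > p)


def interval_count(m: int, a: int, p: int) -> int:
    """Number of integers r with m*p/(2a) < r < (m+1)*p/(2a),
    tested in the equivalent form m*p < 2*a*r < (m+1)*p."""
    return sum(1 for r in range(1, p) if m * p < 2 * a * r < (m + 1) * p)


def count_via_intervals(a: int, p: int) -> int:
    """Sum of the interval counts over the odd m with 0 <= m < a."""
    return sum(interval_count(m, a, p) for m in range(1, a, 2))


if __name__ == "__main__":
    a, q, n = 3, 7, 1
    p = q + 4 * a * n  # p = 19
    print(gauss_count(a, p), count_via_intervals(a, p))
    print(interval_count(1, a, p), interval_count(1, a, q))
```

In this case the only odd m below a is m = 1, the set $P_1(3)$ is {4, 5, 6}, the set $Q_1(3)$ is {2}, and the two cardinalities differ by 2n = 2, in agreement with the proof.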

7. A Proof of Quadratic Reciprocity via Gauss Sums: Introduction

The third proof of Theorem 3.3 that we will give is a version of Gauss’ sixth proof. It
uses ingenious calculations based on some basic facts from algebraic number theory, and
anticipates some important techniques that we will use later to study various properties of
residues and non-residues in greater depth.
Gauss’ sixth proof of quadratic reciprocity [23] appeared in 1818. He mentions in the
introduction to this paper that for years he had searched for a method that would generalize
to the cubic and bi-quadratic cases and that finally his untiring efforts were crowned with
success. The purpose of publishing this sixth proof, he states, was to bring to a close this
part of the higher arithmetic dealing with quadratic residues and to say, in a sense, farewell.
Our third proof of Theorem 3.3 is a reworking of Gauss’ argument from [23] using some basic
facts from the theory of algebraic numbers. We start first with a rather detailed discussion
of the algebraic number theory that will be required; this is the content of the next section.

This information is then used in section 9 to prove Theorem 3.3, following the development
given in Ireland and Rosen [30], sections 6.2 and 6.3.

8. Algebraic Number Theory

In the introduction to his great memoirs [24] and [25] on biquadratic reciprocity, Gauss
asserts that the theory of quadratic residues had reached such a state of perfection that no
more improvement was necessary. However, he said “The theory of biquadratic residues is
by far more difficult”. After struggling with this problem for a long time, he had been able
to derive satisfactory results in only a few special cases, with the proofs being so difficult
that he realized “. . . the previously accepted principles of arithmetic are in no way sufficient
for the foundations of a general theory, that rather such a theory necessarily demands that
to a certain extent the domain of the higher arithmetic needs to be endlessly enlarged. . . ”.
In modern language, Gauss is calling for a theory of algebraic numbers. He began to answer
that call himself in the second paper [25], where he used the subring Z + iZ of the complex
numbers, now called the Gaussian integers, to formulate a precise statement of the Law
of Biquadratic Reciprocity. Subsequently, Gauss’ call has been so resoundingly answered by
the work of Dirichlet, Dedekind, Kummer, Kronecker, Hilbert, and many others, that today
the theory of algebraic numbers is indispensable in virtually all areas of number theory. In
this section we establish some basic facts from the theory of algebraic numbers which will
be required in our third proof of Theorem 3.3. This information will also play an important
role in further developments in subsequent chapters of these notes.
Let C denote the complex numbers.

Definition. A field of complex numbers is a nonzero subfield of C.

N. B. Every field of complex numbers contains the field Q of rational numbers.

Notation: if A is a commutative ring then A[x] will denote the ring of all polynomials in
x with coefficients in A.

Definitions. Let F be a field of complex numbers. A complex number θ is algebraic over
F if there exists f ∈ F [x] such that f ≢ 0 and f (θ) = 0. If θ is algebraic over F , let
M(θ) = {f ∈ F [x] : f is monic and f (θ) = 0}
(N.B. M(θ) ≠ ∅). An element of M(θ) of smallest degree is a minimal polynomial of θ over
F.
Proposition 3.7. The minimal polynomial of a complex number algebraic over a field
of complex numbers F is unique and irreducible over F .

Proof. Let r and s be minimal polynomials of the number θ algebraic over F . Use the
division algorithm in F [x] to find d, f ∈ F [x] such that
r = ds + f, f ≡ 0 or degree of f < degree of s.

Hence
f (θ) = r(θ) − d(θ)s(θ) = 0.
If f ≢ 0 then, upon dividing f by its leading coefficient, we get a monic polynomial over
F of lower degree than s and not identically 0 which has θ as a root, which is not possible
because s is a minimal polynomial of θ over F . Hence f ≡ 0 and so s divides r over F .
Similarly, r divides s over F . Hence r = αs for some α ∈ F , and as r and s are both monic,
α = 1, and so r = s. This proves that the minimal polynomial is unique.
To show that the minimal polynomial m is irreducible over F , suppose that m = rs,
where r and s are non-constant elements of F [x]. Then the degrees of r and s are both less
than the degree of m, and θ is a root of either r or s. Hence a constant multiple of either r
or s is a monic polynomial in F [x] having θ as a root and is of degree less than the degree of
m, contradicting the minimality of the degree of m. QED

Definition. Let θ be algebraic over F . The degree of θ over F is the degree of the minimal
polynomial of θ over F .

Lemma 3.8. If θ ∈ C, F is a field of complex numbers, and f ∈ F [x] is monic, irreducible
over F , and f (θ) = 0 then f is the minimal polynomial of θ over F.

Proof. Let m be the minimal polynomial of θ over F . The division algorithm in F [x]
implies that there exist q, r ∈ F [x] such that

f = qm + r, r ≡ 0 or degree of r < degree of m.

But
r(θ) = f (θ) − q(θ)m(θ) = 0,
and so if r ≢ 0 then we divide r by its leading coefficient to get a monic polynomial over F
that is not identically 0, has θ as a root, and is of degree less than the degree of m, which
is impossible by the minimality of the degree of m. Hence r ≡ 0 and so f = qm. But f is
irreducible over F , and so either q or m is constant. If m is constant then m ≡ 1 (m is monic),
not possible because m(θ) = 0. Hence q is constant, and because f, m are both monic, q ≡ 1.
Hence f = m. QED

We will now discuss two examples that will be of major importance in subsequent work.

(1) Let m ∈ Z \ {1} be square-free, i.e., m does not have a square ≠ 1 as a factor. Then
$\sqrt{m}$ is irrational, hence $x^2 - m$ is irreducible over Q. Lemma 3.8 implies that $x^2 - m$ is the
minimal polynomial of $\sqrt{m}$ over Q and so $\sqrt{m}$ is algebraic over Q of degree 2.
(2) Let q be a prime and let
$$\zeta_q = \exp\Big(\frac{2\pi i}{q}\Big).$$
Then $\zeta_q^q = 1$, $\zeta_q \neq 1$, hence we deduce from the factorization
$$x^q - 1 = (x-1)\Big(\sum_{k=0}^{q-1} x^k\Big)$$
that $\zeta_q$ is a root of $\sum_{k=0}^{q-1} x^k$.
We claim that $\sum_{k=0}^{q-1} x^k$ is irreducible over Q. To see this, note first that a polynomial
f (x) is irreducible if and only if f (x + 1) is irreducible, because f (x + 1) = g(x)h(x) if and
only if f (x) = g(x − 1)h(x − 1). Hence
$$\sum_{k=0}^{q-1} x^k = \frac{x^q - 1}{x - 1} \ \text{ is irreducible if and only if } \ \frac{(x+1)^q - 1}{x} \ \text{ is irreducible.}$$
It follows from the binomial theorem that
$$\frac{(x+1)^q - 1}{x} = \sum_{k=1}^{q} \binom{q}{k} x^{k-1}.$$
We now recall the following fact about binomial coefficients: if q is a prime then q divides
the binomial coefficient $\binom{q}{k}$, k = 1, . . . , q − 1. Hence
$$\frac{(x+1)^q - 1}{x} = x^{q-1} + q(x^{q-2} + \ldots) + q,$$
and this polynomial is irreducible over Q by way of

Lemma 3.9. (Eisenstein’s criterion) If q is prime and $f(x) = \sum_{k=0}^{n} a_k x^k$ is a polynomial
in Z[x] whose coefficients satisfy: q does not divide $a_n$, $q^2$ does not divide $a_0$, and q divides
$a_k$, k = 0, 1, . . . , n − 1, then f (x) is irreducible over Q.

Thus $\zeta_q$ has minimal polynomial $\sum_{k=0}^{q-1} x^k$ and hence is algebraic over Q of degree q − 1.
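
The hypotheses of Eisenstein’s criterion for the shifted polynomial are easily checked by machine for any particular prime q. The Python sketch below is an illustration only; the function names `shifted_coeffs` and `satisfies_eisenstein` are our own, and the check covers just the divisibility conditions on the coefficients, not irreducibility itself.

```python
from math import comb


def shifted_coeffs(q: int) -> list[int]:
    """Coefficients a_0, ..., a_{q-1} (lowest degree first) of
    ((x+1)^q - 1)/x = sum_{k=1}^{q} C(q, k) x^(k-1)."""
    return [comb(q, k) for k in range(1, q + 1)]


def satisfies_eisenstein(coeffs: list[int], q: int) -> bool:
    """Test the hypotheses of Lemma 3.9 for the prime q: q does not divide
    the leading coefficient, q divides every other coefficient, and q^2
    does not divide the constant term."""
    *lower, leading = coeffs
    return (leading % q != 0
            and all(c % q == 0 for c in lower)
            and coeffs[0] % (q * q) != 0)


if __name__ == "__main__":
    for q in (3, 5, 7, 11):
        print(q, shifted_coeffs(q), satisfies_eisenstein(shifted_coeffs(q), q))
```

For q = 5 the coefficients are 5, 10, 10, 5, 1, in agreement with the form $x^{q-1} + q(x^{q-2} + \ldots) + q$ displayed above.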
Proof of Lemma 3.9. We assert first that if a polynomial h ∈ Z[x] does not factor into a
product of polynomials in Z[x] of degree lower than the degree of h then it is irreducible over
Q. In order to see this, suppose that h is not constant (otherwise the assertion is trivial)
and that h = rs, where r and s are polynomials in Q[x], both not constant and of lower
degree than h. By clearing denominators and factoring out the greatest common divisors of

appropriate integer coefficients, we find integers a, b, c, and polynomials g, u, v in Z[x] such
that h = ag, degree of r = degree of u, degree of s = degree of v,

abg = cuv,

and all of the coefficients of g (respectively, u, v) are relatively prime, i.e., the greatest
common divisor of all of the coefficients of g (respectively, u, v) is 1.
We claim that the coefficients of the product uv are also relatively prime. Assume this
for now. Then |ab| = the greatest common divisor of the coefficients of abg = the greatest
common divisor of the coefficients of cuv = |c|, hence ab = ±c. But then h = ±auv and this
is a factorization of h as a product of polynomials in Z[x] of lower degree.
We must now verify our claim. Suppose that the coefficients of uv have a common prime
factor r. Let Fr denote the field of ordinary residue classes mod r. If s ∈ Z[x] and if we let s̄
denote the polynomial in Fr [x] obtained from s by reducing the coefficients of s mod r, then
s → s̄ defines a homomorphism of Z[x] onto Fr [x]. Because r divides all of the coefficients
of uv, it hence follows that
$$0 = \overline{uv} = \bar{u}\,\bar{v} \ \text{ in } F_r[x].$$
Because Fr is a field, Fr [x] is an integral domain, hence we conclude from this equation that
either ū or v̄ is 0 in Fr [x], i.e., either all of the coefficients of u or of v are divisible by r.
This contradicts the fact that the coefficients of u (respectively, v) are relatively prime. The
assertion that the product of two polynomials in Z[x] has all of its coefficients relatively
prime whenever the coefficients of each polynomial are relatively prime is often referred to
as Gauss’ lemma, not to be confused, of course, with the statement in Theorem 2.7.
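
A small computation illustrates this polynomial version of Gauss’ lemma. In the Python sketch below, which is our own illustration, a polynomial in Z[x] is represented by its list of coefficients with the lowest degree first, and the hypothetical helpers `content` and `multiply` compute the greatest common divisor of the coefficients and the product of two polynomials.

```python
from functools import reduce
from math import gcd


def content(poly: list[int]) -> int:
    """Greatest common divisor of the coefficients of poly."""
    return reduce(gcd, poly, 0)


def multiply(u: list[int], v: list[int]) -> list[int]:
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    product = [0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            product[i + j] += a * b
    return product


if __name__ == "__main__":
    u = [2, 3]     # 2 + 3x, coefficients relatively prime
    v = [3, 0, 5]  # 3 + 5x^2, coefficients relatively prime
    uv = multiply(u, v)
    print(uv, content(u), content(v), content(uv))
```

Here the product is $6 + 9x + 10x^2 + 15x^3$, and although every pair of its coefficients has a common factor, the greatest common divisor of all four is 1.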
Next suppose that $f(x) = \sum_{k=0}^{n} a_k x^k \in \mathbb{Z}[x]$ satisfies the hypotheses of Lemma 3.9. By
virtue of what we just showed, we need only prove that f does not factor into polynomials
of lower degree in Z[x]. Suppose, on the contrary, that
$$f(x) = \Big(\sum_{k=0}^{s} b_k x^k\Big)\Big(\sum_{k=0}^{t} c_k x^k\Big)$$
is a factorization of f in Z[x] with $b_s \neq 0 \neq c_t$ and s and t both less than n. Because $a_0 \equiv 0$
mod q, $a_0 \not\equiv 0$ mod $q^2$ and $a_0 = b_0 c_0$, one element of the set $\{b_0, c_0\}$ is ≢ 0 mod q and the
other is ≡ 0 mod q. Assume that $b_0$ is the former element and $c_0$ is the latter. As $a_n \not\equiv 0$
mod q and $a_n = b_s c_t$, it follows that $b_s \not\equiv 0 \not\equiv c_t$ mod q. Let m be the smallest value of k
such that $c_k \not\equiv 0$ mod q. Then m > 0, hence
$$a_m = \sum_{j=0}^{m-i} b_j c_{m-j}$$


for some i ∈ [0, m − 1]. Because $b_0 \not\equiv 0 \not\equiv c_m$ mod q and $c_{m-1}, \ldots, c_i$ are all ≡ 0 mod q, it
follows that $a_m \not\equiv 0$ mod q, and so m = n. Hence t = n, contradicting the assumption on t and
n. QED
The crucial fact about algebraic numbers that we will need in order to prove the LQR
is that the set of all algebraic integers (see the definition after the proof of Theorem 3.10)
forms a subring of the field of complex numbers. The verification of that fact is the goal of
the next two results.
For use in the proof of the next theorem, we recall that if n is a positive integer, then
the elementary symmetric polynomials in n variables are the polynomials in the variables
x1 , . . . , xn defined by
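$$\begin{aligned}
\sigma_1 &= \sum_{i=1}^{n} x_i, \\
&\ \ \vdots \\
\sigma_i &= \text{sum of all products of } i \text{ different } x_j, \\
&\ \ \vdots \\
\sigma_n &= \prod_{i=1}^{n} x_i.
\end{aligned}$$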

The elementary symmetric polynomials have the property that if π is a permutation of the
set [1, n] then σi (xπ(1) , . . . , xπ(n) ) = σi (x1 , . . . , xn ), i.e., σi is unchanged by any permutation
of its variables.
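
For readers who wish to experiment, the definition translates directly into a few lines of Python. The sketch below is an illustration under our own naming (`elementary_symmetric` is a hypothetical helper): it evaluates the $\sigma_i$ at a given point by summing over all i-element selections of the variables, and then confirms for one sample point that the values do not change when the variables are permuted.

```python
from itertools import combinations, permutations
from math import prod


def elementary_symmetric(i: int, xs: tuple) -> int:
    """sigma_i evaluated at xs: the sum of all products of i different entries."""
    return sum(prod(c) for c in combinations(xs, i))


xs = (2, 3, 5)
sigmas = [elementary_symmetric(i, xs) for i in range(1, len(xs) + 1)]

# sigma_i is unchanged by any permutation of its variables.
invariant = all(
    [elementary_symmetric(i, perm) for i in range(1, len(xs) + 1)] == sigmas
    for perm in permutations(xs)
)

if __name__ == "__main__":
    print(sigmas, invariant)
```

At the point (2, 3, 5) the three values are 10, 31, and 30.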

Theorem 3.10. If F is a field of complex numbers then the set of all complex numbers
algebraic over F is a field of complex numbers which contains F.

Proof. Let α and β be algebraic over F . We want to show that α ± β, αβ, and α/β,
provided that β 6= 0, are all algebraic over F . We will do this by the explicit construction
of polynomials over F that have these numbers as roots.
Start with α + β. Let f and g denote the minimal polynomials of, respectively, α and β,
of degree m and n, respectively. Let α1 , . . . , αm and β1 , . . . , βn denote the roots of f and g
in C, with α1 = α and β1 = β. Now consider the polynomial
$$\prod_{i=1}^{m}\prod_{j=1}^{n} (x - \alpha_i - \beta_j) = x^{mn} + \sum_{i=1}^{mn} c_i(\alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_n)\,x^{mn-i}, \tag{8}$$

where each coefficient ci is a polynomial in the αi ’s and βj ’s over F (in fact, over Z). We
claim that

(9) ci (α1 , . . . , αm , β1 , . . . , βn ) ∈ F, i = 1, . . . , mn.



If this is true then the polynomial (8) is in F [x] and has α1 + β1 = α + β as a root, whence
α + β is algebraic over F .
In order to verify (9), we will make use of the following result from the classical theory of
equations (see Weisner [59], Theorem 49.10). Let τ1 , . . . , τm , σ1 , . . . , σn denote, respectively,
the elementary symmetric polynomials in m and n variables. Suppose that the polynomial
h over F in the variables x1 , . . . , xm , y1 , . . . , yn has the property that if π (respectively, ν) is
a permutation of [1, m] (respectively, [1, n]) then

h(x1 , . . . , xm , y1 , . . . , yn ) = h(xπ(1) , . . . , xπ(m) , yν(1) , . . . , yν(n) ),

i.e., h remains unchanged when its variables xi and yj are permuted amongst themselves.
Then there exists a polynomial l over F in the variables x1 , . . . , xm , y1 , . . . , yn such that

h(x1 , . . . , xm , y1 , . . . , yn )
= l(τ1 (x1 , . . . , xm ), . . . , τm (x1 , . . . , xm ), σ1 (y1 , . . . , yn ), . . . σn (y1 , . . . , yn )).

Observe next that the left-hand side of (8) remains unchanged when the αi ’s and the βj ’s
are permuted amongst themselves (this simply rearranges the order of the factors in the
product), and so the same thing is true for each coefficient ci . It thus follows from our
result from the theory of equations that there exists a polynomial li over F in the variables
x1 , . . . , xm , y1 , . . . , yn such that

ci (α1 , . . . , αm , β1 , . . . , βn )
= li (τ1 (α1 , . . . , αm ), . . . , τm (α1 , . . . , αm ), σ1 (β1 , . . . , βn ), . . . , σn (β1 , . . . , βn )).

If we can prove that each of the numbers at which li is evaluated in this equation is in F then
(9) will be verified. Hence it suffices to prove that if θ is a number algebraic over F of degree
n, θ1 , . . . , θn are the roots of the minimal polynomial m of θ over F , and σ is an elementary
symmetric polynomial in n variables, then σ(θ1 , . . . , θn ) ∈ F . But this last statement follows
from the fact that all the coefficients of m are in F and
$$m(x) = \prod_{i=1}^{n} (x - \theta_i) = x^{n} + \sum_{i=1}^{n} (-1)^{i}\sigma_i(\theta_1, \ldots, \theta_n)\,x^{n-i},$$

where σ1 , . . . , σn are the elementary symmetric polynomials in n variables.


A similar argument shows that α − β and αβ are algebraic over F .
Suppose next that β ≠ 0 is algebraic over F and let
\[ x^n + \sum_{i=0}^{n-1} a_i x^i \]
be the minimal polynomial of β over F . Then 1/β is a root of
\[ 1 + \sum_{i=0}^{n-1} a_i x^{n-i} \in F[x], \]
and so 1/β is algebraic over F . Then α/β = α · (1/β) is algebraic over F . QED

Notation: A(F ) denotes the field of all complex numbers algebraic over F .

Definition. An element of A(Q) is an algebraic integer if its minimal polynomial over Q
has all of its coefficients in Z.

By virtue of examples (1) and (2), all square-free integers and exp(2πi/q), q a prime, are
algebraic integers.

Theorem 3.11. The set of all algebraic integers is a subring of A(Q) containing Z.

Proof. Let α and β be algebraic integers. We need to prove that α ± β and αβ are
algebraic integers. This can be done by first observing that the result from the theory of
equations that we used in the proof of Theorem 3.10 holds mutatis mutandis if the field
F there is replaced by the ring Z of integers (Weisner [59], Theorem 49.9). If we then let
α1 , . . . , αm and β1 , . . . , βn denote the roots of the minimal polynomial over Q of α and β,
respectively, then the proof of Theorem 3.10, with F replaced in that proof by Z, verifies
that α ± β and αβ are roots of monic polynomials in Z[x]. We now invoke the following
fact: if a complex number θ is the root of a monic polynomial in Z[x] then it is an algebraic
integer.
In order to prove the last statement about θ, let f ∈ Z[x] be monic with f (θ) = 0. If m
is the minimal polynomial of θ over Q then we must show that m ∈ Z[x]. It follows from
the proof of Proposition 3.7 that there is a q ∈ Q[x] such that f = qm and so we can find a
rational number a/b and u, v ∈ Z[x] such that f = (a/b)uv, u (respectively, v) is a constant
multiple of m (respectively, q), and u (respectively, v) has all of its coefficients relatively
prime.
We have that
bf = auv.
As f is monic and u, v ∈ Z[x], we conclude that a divides b in Z, say b = ak for some k ∈ Z.
Hence
kf = uv.
Because f ∈ Z[x], it follows that k is a common factor of all of the coefficients of uv. Because
of the claim that we verified in the proof of Lemma 3.9, the coefficients of uv are relatively
prime, hence k = ±1, and so


f = ±uv.
As f is monic, the leading coefficient of u is ±1. But u is a constant multiple of m and m is
monic, hence m = ±u ∈ Z[x], and so θ is an algebraic integer. QED

Notation: R will denote the ring of algebraic integers.

The proof of Theorem 3.3 that we will give in the next section requires the following
simple lemma:

Lemma 3.12. R ∩ Q = Z.

Proof. If q ∈ R ∩ Q then x − q is the minimal polynomial of q over Q, hence x − q ∈ Z[x],
hence q ∈ Z. QED

9. Proof of Quadratic Reciprocity via Gauss Sums: Conclusion

As a warm-up for our third proof of Theorem 3.3, and also as a first illustration of how
useful algebraic number theory is to what we will do subsequently, we will reprove Theorem
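2.6, which asserts that
\[ \chi_p(2) = (-1)^{\varepsilon}, \quad \text{where } \varepsilon \equiv \frac{p^2 - 1}{8} \bmod 2, \]
by using algebraic number theory. Let
\[ \zeta = e^{\pi i/4} = \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}\, i. \]
Then
\[ \zeta^{-1} = e^{-\pi i/4} = \frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}\, i, \]
hence
\[ \tau = \zeta + \zeta^{-1} = \sqrt{2} \in R, \]
and so we can work in the ring R of algebraic integers.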
If p is an odd prime then we let

(p) = the ideal in R generated by p = pR = {pα : α ∈ R}.

If α, β ∈ R, then we will write


α ≡ β mod p
if α − β ∈ (p). Euler's criterion (Theorem 2.5) implies that
\[ \tau^{p-1} = (\tau^2)^{(p-1)/2} = 2^{(p-1)/2} \equiv \chi_p(2) \bmod p, \]
hence
\[ \tau^{p} \equiv \chi_p(2)\,\tau \bmod p. \tag{10} \]
We now make use of the following lemma, which follows from the binomial theorem and the
fact that p divides the binomial coefficient $\binom{p}{k}$, k = 1, . . . , p − 1.
Lemma 3.13. If α, β ∈ R then
\[ (\alpha + \beta)^p \equiv \alpha^p + \beta^p \bmod p. \]
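
The divisibility fact behind Lemma 3.13 can be checked numerically. The following Python sketch (the helper name `middle_binomials_divisible` is ours, introduced only for illustration) tests that n divides every binomial coefficient C(n, k) with 1 ≤ k ≤ n − 1 when n is a small prime, and shows that the property can fail for a composite n.

```python
from math import comb

def middle_binomials_divisible(n):
    """Return True if n divides comb(n, k) for every k with 1 <= k <= n - 1."""
    return all(comb(n, k) % n == 0 for k in range(1, n))

# For a prime p the property holds; this is what kills the cross terms
# in the binomial expansion of (alpha + beta)**p modulo p.
for p in [3, 5, 7, 11, 13]:
    assert middle_binomials_divisible(p)

# For a composite modulus it can fail: comb(4, 2) = 6 is not divisible by 4.
assert not middle_binomials_divisible(4)
```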

Hence
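\[ \tau^{p} = (\zeta + \zeta^{-1})^{p} \equiv \zeta^{p} + \zeta^{-p} \bmod p. \tag{11} \]
The next step is to calculate ζ^p + ζ^{−p}. Begin by noting that
\[ \zeta^{8} = 1, \]
hence if p ≡ ±1 mod 8, then
\[ \zeta^{p} + \zeta^{-p} = \zeta + \zeta^{-1} = \tau, \]
and if p ≡ ±3 mod 8, then
\[ \begin{aligned} \zeta^{p} + \zeta^{-p} &= \zeta^{3} + \zeta^{-3} \\ &= -(\zeta^{-1} + \zeta) \\ &= -\tau, \end{aligned} \]
where the second line follows from the first because ζ⁴ = −1 implies that ζ³ = −ζ⁻¹, and so
ζ⁻³ = −ζ. Hence
\[ \zeta^{p} + \zeta^{-p} = (-1)^{\varepsilon}\,\tau, \quad \varepsilon \equiv \frac{p^2 - 1}{8} \bmod 2. \tag{12} \]
As a consequence of the congruences (10), (11), and (12), it follows that
\[ \chi_p(2)\,\tau \equiv \zeta^{p} + \zeta^{-p} \equiv (-1)^{\varepsilon}\,\tau \bmod p. \]
Multiply this congruence by τ and use τ² = 2 to derive
\[ 2\chi_p(2) \equiv 2(-1)^{\varepsilon} \bmod p. \tag{13} \]
Now this congruence is in R, so there exists α ∈ R such that
\[ 2\chi_p(2) = 2(-1)^{\varepsilon} + \alpha p, \]
hence
\[ \alpha = \frac{2\big(\chi_p(2) - (-1)^{\varepsilon}\big)}{p} \in R \cap \mathbb{Q} = \mathbb{Z} \quad \text{(by Lemma 3.12)}. \]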
45
Chapter 3. Gauss’ Theorema Aureum: the Law of Quadratic Reciprocity

Hence (13) is in fact a congruence in Z, and so


χp (2) ≡ (−1)ε mod p in Z,
whence, as before,
χp (2) = (−1)ε .
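
As a sanity check on the formula just derived, the following Python sketch compares χp(2), computed from Euler's criterion, with (−1)^ε for a few small odd primes. The function names `legendre` and `chi2_formula` are ours and are only illustrative.

```python
def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)   # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r

def chi2_formula(p):
    """The value (-1)**eps with eps = (p*p - 1)/8, as in Theorem 2.6."""
    eps = (p * p - 1) // 8            # an integer because p is odd
    return -1 if eps % 2 else 1

for p in [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]:
    assert legendre(2, p) == chi2_formula(p)
```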
This proof of Theorem 2.6 depends on the equation τ 2 = 2. Can one get a similar
equation with an odd prime p replacing the 2 on the right-hand side of this equation? Yes
one can, and a proof of the LQR will follow in a similar way from the ring structure of R.
In order to see how that goes, let ζ = e^{2πi/p} and set
\[ g = \sum_{n=0}^{p-1} \chi_p(n)\,\zeta^{n}, \qquad p^{*} = (-1)^{(p-1)/2}\, p. \]
The sum g is called a Gauss sum; these sums were first used by Gauss in his famous study
of cyclotomy which concluded Disquisitiones Arithmeticae ([19], section VII). The analogue
of the equation τ 2 = 2 is given by

Theorem 3.14. g² = p*.

Assume this for now; we deduce LQR from it like so: let q be an odd prime, q ≠ p. Then
\[ g^{q-1} = (g^2)^{(q-1)/2} = (p^{*})^{(q-1)/2} \equiv \chi_q(p^{*}) \bmod q, \]
where the last equivalence follows from Euler's criterion. Hence
\[ g^{q} \equiv \chi_q(p^{*})\, g \bmod q, \tag{14} \]
where this congruence is now in R, because g ∈ R. If n ∈ Z then χp(n)^q = χp(n) because
χp(n) ∈ [−1, 1] and q is odd; consequently Lemma 3.13 implies that
\[ \begin{aligned} g^{q} &= \Big( \sum_{n} \chi_p(n)\,\zeta^{n} \Big)^{q} \\ &\equiv \sum_{n} \chi_p(n)^{q}\, \zeta^{qn} \bmod q \\ &\equiv \sum_{n} \chi_p(n)\, \zeta^{qn} \bmod q. \end{aligned} \tag{15} \]

We now need

Lemma 3.15. If a ∈ Z then
\[ \sum_{n} \chi_p(n)\,\zeta^{an} = \chi_p(a)\, g. \]

The sum on the left-hand side of this equation is another Gauss sum. Lemma 3.15 records
a very important relation satisfied by Gauss sums; in addition to the use that we make of it
here, it will also play an important role in some calculations that are performed in Chapter
7, where we study certain distributions of residues and non-residues.
Assume Lemma 3.15 for now; this lemma and (15) imply that

\[ g^{q} \equiv \chi_p(q)\, g \bmod q. \tag{16} \]

A consequence of (14) and (16) is that
\[ \chi_q(p^{*})\, g \equiv \chi_p(q)\, g \bmod q. \]
Multiply by g and use g² = p* to derive
\[ \chi_q(p^{*})\, p^{*} \equiv \chi_p(q)\, p^{*} \bmod q, \]
and then apply Lemma 3.12 and the fact that χq(p*), χp(q) are both ±1 as before to get
\[ \chi_q(p^{*}) = \chi_p(q). \tag{17} \]
Theorem 2.4 implies that
\[ \chi_q(-1) = (-1)^{(q-1)/2}, \]
hence, because of (17),
\[ \begin{aligned} \chi_p(q) &= \chi_q(-1)^{\frac{1}{2}(p-1)}\, \chi_q(p) \\ &= (-1)^{\frac{1}{2}(q-1) \cdot \frac{1}{2}(p-1)}\, \chi_q(p), \end{aligned} \]

which is the LQR.
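
The two forms of the law obtained above, equation (17) and the LQR itself, can be tested on small primes. The Python sketch below does this with Euler's criterion; the names `legendre` and `p_star` are ours, and the list of primes is an arbitrary sample.

```python
def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)   # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r

def p_star(p):
    """p* = (-1)**((p - 1)/2) * p, i.e. p if p = 1 mod 4 and -p if p = 3 mod 4."""
    return p if p % 4 == 1 else -p

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29]
for p in odd_primes:
    for q in odd_primes:
        if p == q:
            continue
        # Equation (17): chi_q(p*) = chi_p(q).
        assert legendre(p_star(p), q) == legendre(q, p)
        # The LQR: chi_p(q) = (-1)**(((q - 1)/2) * ((p - 1)/2)) * chi_q(p).
        sign = -1 if (((p - 1) // 2) * ((q - 1) // 2)) % 2 else 1
        assert legendre(q, p) == sign * legendre(p, q)
```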


We must now prove Theorem 3.14 and Lemma 3.15. Since Lemma 3.15 is used in the
proof of Theorem 3.14, we verify Lemma 3.15 first.
Proof of Lemma 3.15. Suppose first that p divides a. Then ζ^{an} = 1 for all n, and
χp(0) = 0, so
\[ \sum_{n} \chi_p(n)\,\zeta^{an} = \sum_{n=1}^{p-1} \chi_p(n). \]
Half of the terms of the sum on the right-hand side are 1 and the other half are −1
(Proposition 2.1), and so this sum is 0. Because χp(a) = 0 (p divides a), the conclusion of
Lemma 3.15 is valid.
Suppose that p does not divide a. Then
\[ \chi_p(a) \sum_{n} \chi_p(n)\,\zeta^{an} = \sum_{n} \chi_p(an)\,\zeta^{an}. \tag{18} \]
Observe next that whenever n runs through a complete system of ordinary residues mod p,
so does an, and also that χp(an) and ζ^{an} depend only on the residue class mod p of an.
Hence the sum on the right-hand side of (18) is
\[ \sum_{n=0}^{p-1} \chi_p(n)\,\zeta^{n} = g. \]
Hence
\[ \chi_p(a) \sum_{n} \chi_p(n)\,\zeta^{an} = g. \]
Now multiply through by χp(a) and use the fact that χp(a)² = 1, since p does not divide a.
QED
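
Lemma 3.15 lends itself to a quick floating-point check. The Python sketch below evaluates the sums with `cmath`; the function names `legendre` and `gauss_sum` and the tolerance 1e-9 are our own choices, and the comparison is only approximate because ζ is represented by a complex float.

```python
import cmath

def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)   # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r

def gauss_sum(a, p):
    """Approximate value of sum_{n=0}^{p-1} chi_p(n) * zeta**(a*n), zeta = exp(2*pi*i/p)."""
    return sum(legendre(n, p) * cmath.exp(2j * cmath.pi * ((a * n) % p) / p)
               for n in range(p))

for p in [3, 5, 7, 11, 13]:
    g = gauss_sum(1, p)
    for a in range(-p, 2 * p):        # includes multiples of p and negative a
        assert abs(gauss_sum(a, p) - legendre(a, p) * g) < 1e-9
```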
Proof of Theorem 3.14. We must prove that g² = p*.
For each integer a, let
\[ g(a) = \sum_{n=0}^{p-1} \chi_p(n)\,\zeta^{an}. \]
The idea of this argument is to calculate
\[ \sum_{a=0}^{p-1} g(a)\,g(-a) \]
in two different ways, equate the expressions resulting from that, and see what happens.
For the first way, use Lemma 3.15 to obtain
\[ \begin{aligned} g(a)\,g(-a) &= \chi_p(a)\,\chi_p(-a)\, g^2 \\ &= \chi_p(-a^2)\, g^2 \\ &= \chi_p(-1)\, g^2, \quad a = 1, \dots, p-1. \end{aligned} \]
This and the fact that g(0) = $\sum_{n=0}^{p-1} \chi_p(n)$ = 0 imply that
\[ \sum_{a=0}^{p-1} g(a)\,g(-a) = (p-1)\,\chi_p(-1)\, g^2. \tag{19} \]

Now for the second way. We have that
\[ g(a)\,g(-a) = \sum_{1 \le x, y \le p-1} \chi_p(x)\,\chi_p(y)\,\zeta^{a(x-y)}. \]
Hence
\[ \sum_{a=0}^{p-1} g(a)\,g(-a) = \sum_{1 \le x, y \le p-1} \chi_p(x)\,\chi_p(y) \sum_{a} \zeta^{a(x-y)}. \tag{20} \]
The next step is to calculate
\[ \sum_{a} \zeta^{a(x-y)} \]
for fixed x and y. If x ≠ y then
\[ 1 \le |x - y| \le p - 1 \]
and so p does not divide x − y, hence ζ^{x−y} ≠ 1, hence
\[ \sum_{a} \zeta^{a(x-y)} = \frac{\zeta^{(x-y)p} - 1}{\zeta^{x-y} - 1} = 0 \quad (\zeta^{p} = 1\,!). \]
Hence
\[ \sum_{a} \zeta^{a(x-y)} = \begin{cases} p, & \text{if } x = y, \\ 0, & \text{if } x \ne y. \end{cases} \tag{21} \]
Because of (20) and (21), it follows that
\[ \sum_{a=0}^{p-1} g(a)\,g(-a) = (p-1)\,p. \tag{22} \]

Equations (19) and (22) imply that
\[ (p-1)\,\chi_p(-1)\, g^2 = (p-1)\,p, \]
hence from Theorem 2.4,
\[ g^2 = \chi_p(-1)\, p = (-1)^{(p-1)/2}\, p. \]
QED
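
Both Theorem 3.14 and the double-counting identity (22) used in its proof can be checked numerically for small p. The following Python sketch is only a floating-point illustration; `legendre`, `gauss_sum`, `check_gauss_square` and the tolerance are our own names and choices.

```python
import cmath

def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)   # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r

def gauss_sum(a, p):
    """Approximate value of g(a) = sum_{n=0}^{p-1} chi_p(n) * zeta**(a*n)."""
    return sum(legendre(n, p) * cmath.exp(2j * cmath.pi * ((a * n) % p) / p)
               for n in range(p))

def check_gauss_square(p, tol=1e-9):
    """Check g**2 = p* and sum_a g(a) g(-a) = (p - 1) p, up to rounding error."""
    g = gauss_sum(1, p)
    p_star = p if p % 4 == 1 else -p
    total = sum(gauss_sum(a, p) * gauss_sum(-a, p) for a in range(p))
    return abs(g * g - p_star) < tol and abs(total - (p - 1) * p) < tol

for p in [3, 5, 7, 11, 13]:
    assert check_gauss_square(p)
```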
As Ireland and Rosen point out, the main idea of Gauss' sixth proof of quadratic
reciprocity is to consider the polynomial
\[ f_k(x) = \sum_{t=0}^{p-1} \chi_p(t)\, x^{kt}, \]
and then to show, as Gauss did without using any roots of unity, that
\[ 1 + x + \cdots + x^{p-1} \]
divides
\[ f_1(x)^2 - (-1)^{\frac{1}{2}(p-1)}\, p \quad \text{and} \quad f_q(x) - \chi_p(q)\, f_1(x). \]
Quadratic reciprocity then follows by noting that
\[ f_q(x) \equiv f_1(x)^{q} \bmod q. \]
The proof which we have presented in this section amounts to setting x = exp(2πi/p) in this
argument and then performing the necessary calculations by means of congruences in the
ring of algebraic integers. This observation was also made by Eisenstein and Jacobi, who
used it as a stepping stone to the proof of higher-degree reciprocity laws via Gauss sums.

10. A Proof of Quadratic Reciprocity via Ideal Theory: Introduction

The second proof which Gauss gave for quadratic reciprocity rests on his genus theory
for quadratic forms, developed (along with his theory of composition of forms) in articles
231-261 of the Disquisitiones. A primary goal of genus theory is the determination of the
number of genera of quadratic forms with a given discriminant, and Gauss deduced the
LQR from a formula which he established for the number of genera. The argument which
Gauss used here to deduce the LQR is quite elegant, but the proof of the formula for the
number of genera on which it is based is much more involved, as it uses virtually all of
the formidable mathematical technology involved in Gauss’ development of his composition
of forms. Fortunately, the ideas and techniques which Gauss employed can be given a
formulation that is much easier to follow using the modern theory of ideals in the ring of
algebraic integers in a quadratic number field. This is the approach that we will take in our
fourth proof of Theorem 3.3; the necessary information about this ideal theory will be given
in the next section, and we will then deduce Theorem 3.3 from that ideal theory in section
12.

11. The Structure of Ideals in a Quadratic Number Field

Let F be a field of complex numbers. With respect to its addition and multiplication, F
is a vector space over Q, and we say that F has degree n (over Q) if n is the dimension of
F over Q. The following definition singles out the subfields of the complex numbers which
will be most important for the number theory which we will be most concerned with in all
of what follows.

Definition. F is an algebraic number field if the degree of F is finite.

We let F denote an algebraic number field of degree n that will remain fixed in the
discussion until indicated otherwise. Because the non-negative integral powers of a nonzero
element of F cannot form a set that is linearly independent over Q, every element of F is
algebraic over Q; this fact is the primary reason why fields of complex numbers which are
finite dimensional over Q are called algebraic number fields.
The structure of the ideals in the ring R = $\mathcal{R}$ ∩ F of all algebraic integers contained
in F (here $\mathcal{R}$ denotes the ring of all algebraic integers) will play a very important role
in many situations in which we will be interested. We begin our discussion of those matters
by recalling that if A is a commutative ring with identity then an ideal of A is a subring I
of A such that ab ∈ I whenever a ∈ A and b ∈ I. If s ∈ A then the set {as : a ∈ A} is an
ideal of A, called the principal ideal generated by s. We will denote this ideal by either sA
or (s). In a slight generalization of the principal-ideal notation which will be useful, for two
elements s and t of A, we will let (s, t) denote the ideal of A generated by s and t, i.e., the
set {as + bt : (a, b) ∈ A × A}. This notation should not be confused with the ordered pair
(s, t); the meaning of the notation will always be clear from the context in which it is
employed. An ideal I of A is prime if {0} ≠ I ≠ A and whenever a, b are elements of A such
that ab ∈ I, then a ∈ I or b ∈ I. An ideal M of A is maximal if {0} ≠ M ≠ A and whenever
I is an ideal of A such that M ⊆ I then M = I or I = A.
A basic fact in the theory of commutative rings with identity asserts that all maximal
ideals in such rings are prime ideals (Hungerford [29], Theorem III.2.19), with the converse
false in general. However, in the ring R of algebraic integers in F this converse is true, i.e.,
(i) an ideal of R is prime if and only if it is maximal.
This fact will be proved in section 1 of Chapter 5. We will also show in section 1 of Chapter
5 that in R,
(ii) if I is a nonzero ideal of R then the quotient ring R/I has finite cardinality,
and also that
(iii) if P is a prime ideal of R then P contains a unique rational prime q and the
cardinality of R/P is q^n, for a positive integer n uniquely determined by P . The integer n
is called the degree of P.
By far the most important feature of the structure of proper, nonzero ideals of R is the
fact that they can be factored in a unique way as the product of prime ideals. We now
explain precisely what this means.

Definition. Let A be a commutative ring with identity, I, J (not necessarily distinct)
ideals of A. The (ideal) product IJ of I and J is the ideal of A generated by the set of
products
{xy : (x, y) ∈ I × J},
i.e., IJ is the smallest ideal of A, relative to subset inclusion, which contains this set of
products.

One can easily show that IJ consists precisely of all sums of the form $\sum_i x_i y_i$, where
xi ∈ I and yi ∈ J, for all i. It is also easy to show that the ideal product is commutative
and associative. We then have

Theorem 3.16. (Fundamental Theorem of Ideal Theory) Every nonzero, proper ideal
I of R is a product of prime ideals and this factorization is unique up to the order of the
factors. Moreover, the set of prime ideal factors of I is precisely the set of prime ideals of R
which contain I, i.e., the set of prime ideals of R containing I is nonempty and finite, and if
{P1 , . . . , Pk } is this set then there exists a k-tuple (m1 , . . . , mk ) of positive integers, uniquely
determined by I, such that $I = P_1^{m_1} \cdots P_k^{m_k}$.

Theorem 3.16, one of the most important theorems in algebraic number theory, was
proved by R. Dedekind in 1871, and appeared as Supplement X in his famous series of
addenda to Dirichlet’s landmark text Vorlesungen über Zahlentheorie [12]. Because the
proof of Theorem 3.16 requires a rather substantial effort, and also because we will not make
use of it at its full strength until section 2 of Chapter 5, we will defer its proof until section
5 of Chapter 5.
Our fourth proof of Theorem 3.3 will require detailed information about how ideals factor
according to Theorem 3.16 for a special class of algebraic number fields. Let m ≠ 1 be a
square-free integer. Then √m is an algebraic integer with minimal polynomial x² − m over
Q. It is not difficult to show that the field of complex numbers generated by √m over Q, i.e.,
the smallest subfield of the complex numbers containing √m and Q, the so-called quadratic
number field determined by m, is
\[ \mathbb{Q}(\sqrt{m}) = \{ u + v\sqrt{m} : (u, v) \in \mathbb{Q} \times \mathbb{Q} \}. \]
It is an immediate consequence of this equation that the set {1, √m} is a vector-space basis
of Q(√m) over Q, hence Q(√m) has degree 2. With a bit more effort, one can also show
that the set of all algebraic integers contained in Q(√m) is
\[ \mathcal{R} \cap \mathbb{Q}(\sqrt{m}) = \{ k + n\omega : (k, n) \in \mathbb{Z} \times \mathbb{Z} \}, \]
where
\[ \omega = \begin{cases} \sqrt{m}, & \text{if } m \equiv 2 \text{ or } 3 \bmod 4, \\[4pt] \dfrac{1 + \sqrt{m}}{2}, & \text{if } m \equiv 1 \bmod 4 \end{cases} \]
(Hecke [27], pp. 95, 96).

Let F = Q(√m), and let R be the ring of all algebraic integers contained in F . The proof of Theorem 3.3 which we will present in the
next section requires the determination of the prime-ideal factorization of each ideal qR of
R, q a rational prime, and the calculation of the degree of each factor, in accordance with
the conclusions of Theorem 3.16. This is done in

Proposition 3.17. (Decomposition law in Q(√m)) Let p be an odd prime.
(i) If χp (m) = 1 then pR factors into the product of two distinct prime ideals, each of
degree 1. Moreover, for any a ∈ Z such that a² ≡ m mod p, we can take the prime-ideal
factors of pR to be
\[ (p,\ a + \sqrt{m}) \quad \text{and} \quad (p,\ a - \sqrt{m}). \]
(ii) If χp (m) = 0 then pR is the square of a prime ideal I, and the degree of I is 1.
Moreover, we can take I to be the ideal (p, √m).
(iii) If χp (m) = −1 then pR is prime in R, of degree 2.
If m ≡ 1 mod 8 then
(iv) 2R factors into the product of two distinct prime ideals, each of degree 1. Moreover,
we can take the prime-ideal factors of 2R to be
\[ \Big( 2,\ \frac{1 + \sqrt{m}}{2} \Big) \quad \text{and} \quad \Big( 2,\ \frac{1 - \sqrt{m}}{2} \Big). \]
If m ≡ 5 mod 8 then
(v) 2R is prime in R of degree 2.
If m ≡ 2 mod 4 then
(vi) 2R is the square of a prime ideal I, and the degree of I is 1. Moreover, we may take
I to be the ideal (2, √m).
If m ≡ 3 mod 4 then
(vii) 2R is the square of a prime ideal I, and the degree of I is 1. Moreover, we may take
I to be the ideal (2, 1 + √m).

Proof. Hecke [27], section 29, Theorem 90. QED
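
The case distinctions of Proposition 3.17 translate directly into a small decision procedure. The Python sketch below returns one of three labels; the labels "split", "ramified" and "inert" and the function names are ours (the text itself speaks of two distinct prime factors, the square of a prime ideal, and a prime of degree 2, respectively). It assumes that m ≠ 1 is square-free and that p is a rational prime, and it does not check either assumption.

```python
def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)   # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r

def decomposition_type(m, p):
    """Factorization type of pR in Q(sqrt(m)) according to Proposition 3.17.

    'split'    : product of two distinct prime ideals of degree 1  (cases i, iv)
    'ramified' : square of a prime ideal of degree 1               (cases ii, vi, vii)
    'inert'    : pR itself is prime, of degree 2                   (cases iii, v)
    """
    if p == 2:
        if m % 8 == 1:
            return "split"
        if m % 8 == 5:
            return "inert"
        return "ramified"             # m = 2 or 3 mod 4
    return {1: "split", 0: "ramified", -1: "inert"}[legendre(m, p)]

# A few sample fields and primes.
for m, p in [(-23, 2), (-23, 3), (-5, 2), (-5, 5), (2, 3), (5, 2)]:
    print(m, p, decomposition_type(m, p))
```

Python's `%` operator returns a non-negative remainder for a positive modulus, so negative values of m are handled without special cases.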


Let us return now to the general situation of an algebraic number field with its subring
R of algebraic integers. The next bit of mathematical technology that we have need of is
the ideal class group of R. In order to define this group, we first declare that the ideals I
and J of R are equivalent, and write I ∼ J, if there exist nonzero elements α and β of R
such that αI = βJ. This defines an equivalence relation on the set of all ideals of R, and we
refer to the corresponding equivalence classes as the ideal classes of R. If we let [I] denote
the ideal class which contains the ideal I then we can define a multiplication on the set of
ideal classes by declaring that the product of [I] and [J] is [IJ]. It can be shown that when
endowed with this product (which is well-defined), the ideal classes of R form an abelian
group, called the ideal-class group of R (Hecke [27], section 33). It is easy to see that the set
of all principal ideals of R, i.e., the set of all ideals of the form αR, α ∈ R, is an ideal class
of R, called the principal class, and one can prove that the principal class is the identity
element of the ideal-class group. It is one of the fundamental theorems of algebraic number
theory that the ideal-class group is always finite (see Hecke [27], section 33, Theorem 96),
and the order of the ideal-class group of R is called the class number of R.
Proposition 3.17, when combined with some additional mathematical technology, can be
used effectively to compute ideal-class groups and class numbers for quadratic number fields.
We illustrate how things go with three examples. But first, the additional technology that
is required.

Let F = Q(√m) be a fixed quadratic number field, with R = A ∩ F . A subset {ω1 , ω2 }
of R is an integral basis of R if every element α in R can be expressed uniquely in the form

α = a1 ω1 + a2 ω2 , where (a1 , a2 ) ∈ Z × Z.

The ring R always has an integral basis (Hecke [27], section 22, Theorem 64), so we select
one, say {ω1 , ω2 }, let α ∈ R, write α = a1 ω1 + a2 ω2 for some (a1 , a2 ) ∈ Z × Z, and then set

N(α) = (a1 ω1 + a2 ω2 )(a1 ω1′ + a2 ω2′ ),

where the superscripted primes denote the algebraic conjugate taken over Q. The number
N(α) defined by this formula is called the norm of α, it does not depend upon the integral
basis of R used to define it, it maps R into Z and it is multiplicative in the sense that

N(αβ) = N(α)N(β) for all (α, β) ∈ R × R.


One can show that either the subset {1, ½(1 + √m)} or the subset {1, √m} of R is an integral
basis of R, if either m is, or, respectively, is not, congruent to 1 mod 4. It follows that if
(a1 , a2 ) ∈ Z × Z then either
\[ N\Big( a_1 + a_2\, \frac{1 + \sqrt{m}}{2} \Big) = a_1^2 + a_1 a_2 + \frac{1 - m}{4}\, a_2^2 \]
or
\[ N(a_1 + a_2 \sqrt{m}) = a_1^2 - m\, a_2^2, \]
if either m is, or, respectively, is not, congruent to 1 mod 4. We also observe that when
m ≡ 1 mod 4, then
\[ N\Big( a_1 + a_2\, \frac{1 + \sqrt{m}}{2} \Big) = N\Big( \frac{x + y\sqrt{m}}{2} \Big) = \frac{x^2 - m y^2}{4}, \]
where
\[ x = 2a_1 + a_2 \quad \text{and} \quad y = a_2. \]
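
The two norm formulas, and the multiplicativity of the norm, can be explored with exact integer arithmetic. In the Python sketch below an element a1 + a2·ω of R is stored as the pair (a1, a2), where ω = √m or ω = ½(1 + √m) according to whether m is not, or is, congruent to 1 mod 4; the function names and the sample values of m are ours.

```python
def norm(a1, a2, m):
    """Norm of a1 + a2*omega, using the two formulas above."""
    if m % 4 == 1:
        return a1 * a1 + a1 * a2 + ((1 - m) // 4) * a2 * a2
    return a1 * a1 - m * a2 * a2

def multiply(u, v, m):
    """Product of u = (a1, a2) and v = (b1, b2), again as a coefficient pair."""
    (a1, a2), (b1, b2) = u, v
    if m % 4 == 1:
        # Here omega**2 = omega + (m - 1)/4.
        c = (m - 1) // 4
        return (a1 * b1 + c * a2 * b2, a1 * b2 + a2 * b1 + a2 * b2)
    # Here omega**2 = m.
    return (a1 * b1 + m * a2 * b2, a1 * b2 + a2 * b1)

# Multiplicativity N(alpha * beta) = N(alpha) * N(beta) on a small grid.
coeffs = range(-3, 4)
for m in [2, 5, -5, -23]:
    for a1 in coeffs:
        for a2 in coeffs:
            for b1 in coeffs:
                for b2 in coeffs:
                    prod = multiply((a1, a2), (b1, b2), m)
                    assert norm(*prod, m) == norm(a1, a2, m) * norm(b1, b2, m)
```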
If I is an ideal of R then we extend the definition of the norm of numbers in R to ideals
by defining the norm N(I) of I to be the cardinality of R/I. N.B. It follows from observation
(ii) above that N(I) < +∞ for all nonzero ideals I of R. We also extend the norm to all
elements of F by first noting that any integral basis {ω1 , ω2 } of R is a vector-space basis of
F over Q, and so if w ∈ F , we choose the uniquely determined element (r1 , r2 ) ∈ Q × Q such
that w = r1 ω1 + r2 ω2 and then set N(w) equal to (r1 ω1 + r2 ω2 )(r1 ω1′ + r2 ω2′ ) as before. This
definition of the norm on F is also independent of the integral basis of R used to define
it, it maps F into Q, and it is multiplicative on F .

The following lemma is what we require for the calculation of ideal-class groups and class
numbers of quadratic fields. It asserts that the norm on ideals of R is multiplicative with
respect to the product of ideals, it explains exactly how the norm on ideals extends the norm
on elements of R, and it concludes with a very useful inequality that will be used to limit
the ideals in R which can determine elements of the ideal-class group. The interested reader
can consult Hecke [27], Theorem 29 for a proof of statement (i) of Lemma 3.18, Hecke [27],
pp. 87-88 for a proof of statement (ii), and Marcus [40], Corollary 2, p. 136 for a proof of

(iii). It will also be convenient here and subsequently to define the discriminant of Q(√m)
as either m or 4m, if m is or, respectively, is not congruent to 1 mod 4.

Lemma 3.18. (i) If I and J are ideals of R, then
\[ N(IJ) = N(I)\,N(J). \]
(ii) If 0 ≠ α ∈ R then the norm of the principal ideal generated by α is |N(α)|.
(iii) In each ideal class of R there is an ideal I such that
\[ N(I) \le \lambda = \frac{1}{2} \Big( \frac{4}{\pi} \Big)^{s} \sqrt{|d|}, \]
where d is the discriminant of F and s is either 0, if m is positive, or 1, if m is negative.

The constant $\frac{1}{2} (4/\pi)^{s} \sqrt{|d|}$ is called Minkowski's constant, and arises in the study of the
geometry of numbers.
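
To get a feel for the size of the bound in Lemma 3.18(iii), the following Python sketch evaluates the discriminant and Minkowski's constant for a few square-free values of m. The function names and the sample values of m are ours, and the constant is computed in floating point.

```python
import math

def discriminant(m):
    """Discriminant of Q(sqrt(m)): m if m = 1 mod 4, and 4m otherwise."""
    return m if m % 4 == 1 else 4 * m

def minkowski_constant(m):
    """lambda = (1/2) * (4/pi)**s * sqrt(|d|), with s = 0 for m > 0 and s = 1 for m < 0."""
    s = 1 if m < 0 else 0
    return 0.5 * (4 / math.pi) ** s * math.sqrt(abs(discriminant(m)))

for m in [2, -5, -23]:
    lam = minkowski_constant(m)
    # Every ideal class contains an ideal of norm at most floor(lambda).
    print(m, discriminant(m), round(lam, 4), math.floor(lam))
```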
In order to gain some insight into the techniques that we will employ to prove Theorem
3.3 in the next section, we will now calculate the ideal-class group and the class number
of three quadratic number fields using Proposition 3.17 and Lemma 3.18.

Example 1.

Let F = Q(√2), R = A ∩ F . Then s = 0 and d = 8, hence the value of Minkowski's
constant λ in Lemma 3.18(iii) is
\[ \tfrac{1}{2}\sqrt{8} < 2, \]
and so by Lemma 3.18(iii), every ideal class of R contains an ideal I with N(I) ≤ 1, hence
|R/I| = N(I) = 1, hence I = (1). Conclusion: R has only one ideal class, the principal
class, and so R has class number 1.

Example 2.

Let F = Q(√−5), R = A ∩ F = Z + √−5 Z. Then s = 1 and d = −20, so λ = 4√5/π < 3,
hence every ideal class of R contains an ideal I such that N(I) is either 1 or 2.
If N(I) = 1 then I = (1). Suppose that N(I) = 2. Then the additive group of R/I has
order 2, and so
2(α + I) = I, for all α ∈ R,
and taking α = 1, we obtain 2 ∈ I. Hence all of the prime factors of I must contain 2, so
we factor the ideal (2) by way of Proposition 3.17(vii) as
\[ (2) = (2,\ 1 + \sqrt{-5})^2. \]
It follows that I must be a power J^k of J = (2, 1 + √−5). The degree of J is 1, hence
N(J) = 2. But then Lemma 3.18(i) implies that 2 = N(I) = N(J)^k = 2^k, hence k = 1 and
so I = J. Conclusion: there are at most two ideal classes of R, namely [(1)] and [J].
We claim that J is not principal. If this claim is true then the ideal-class group of R is
{[(1)], [J]} and R has class number 2.
In order to verify our claim, suppose there exists α ∈ R such that J = (α). Lemma
3.18(ii) implies that
|N(α)| = N(J) = 2,
hence N(α) = 2 (all nonzero elements of R have positive norm). But there exist a, b ∈ Z
such that α = a + b√−5, hence
\[ a^2 + 5b^2 = N(\alpha) = 2, \]
and this is clearly impossible: if b ≠ 0 then a² + 5b² ≥ 5, and a² = 2 has no solution in Z.
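
The impossibility of a² + 5b² = 2 can also be confirmed by exhaustive search, since any solution must satisfy |a|, |b| ≤ √2. The Python sketch below does this for a general form a² + c·b²; the function name `representations` is ours.

```python
from math import isqrt

def representations(n, c):
    """All integer pairs (a, b) with a*a + c*b*b == n, for integers n >= 0 and c >= 1."""
    bound = isqrt(n)                  # |a| and |b| cannot exceed sqrt(n)
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if a * a + c * b * b == n]

# No element of Z + sqrt(-5) Z has norm 2, so J = (2, 1 + sqrt(-5)) is not principal.
assert representations(2, 5) == []

# By contrast, norm 6 is attained, namely by the four elements +-1 +- sqrt(-5).
assert sorted(representations(6, 5)) == [(-1, -1), (-1, 1), (1, -1), (1, 1)]
```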

Example 3.

Let F = Q(√−23), R = A ∩ F = Z + ½(1 + √−23) Z. Then s = 1 and d = −23,
hence λ = 2√23/π < 4, and so every ideal class contains an ideal with norm 1, 2, or 3. As in
example 2, every ideal of norm 2 (respectively, 3) must have all of its prime factors containing
2 (respectively, 3), and so factoring via Proposition 3.17 (i) and (iv), we obtain
\[ (2) = \Big( 2,\ \frac{1 + \sqrt{-23}}{2} \Big) \Big( 2,\ \frac{1 - \sqrt{-23}}{2} \Big) = I_1 I_2, \]
\[ (3) = (3,\ 1 + \sqrt{-23})\,(3,\ 1 - \sqrt{-23}) = I_3 I_4, \]
hence the ideals of norm 2 are I1 , I2 and the ideals of norm 3 are I3 , I4 .
It is easily verified that the elements of R are either of the form a + b√−23, where a, b ∈ Z,
or ½(a + b√−23), where a and b are odd elements of Z. Hence the norm of an element of R is
either a² + 23b² or ¼(a² + 23b²) for a, b ∈ Z, neither of which can be 2 or 3. Hence I1 , I2 , I3 ,
and I4 are all not principal. It follows that in order to calculate the ideal-class group of R,
we must determine the inequivalent ideals among I1 , I2 , I3 , and I4 .

We first look at I1 and I4 . I1 ∼ I4 if and only if [I1 ][I4 ]⁻¹ = [(1)]. But I3 I4 = (3) ∼ (1),
and so [I4 ]⁻¹ = [I3 ], hence we need to see if I1 I3 is principal.
Lemma 3.18(i) implies that
\[ N(I_1 I_3) = N(I_1)\,N(I_3) = 2 \cdot 3 = 6. \tag{23} \]
Claim: an ideal I ≠ {0} of R is principal if and only if there exists α ∈ R such that
N(α) = N(I) and there is a generating set S of I such that s/α ∈ R, for all s ∈ S.
The necessity of this is clear. For the sufficiency, let α ∈ R satisfy the stated conditions.
Then J = (1/α)I is an ideal of R and Lemma 3.18 (i), (ii) imply that
\[ N(I) = N\big( (\alpha) J \big) = N(I)\,N(J), \]
hence N(J) = 1 and so J = (1), whence I = (α).


So in light of (23), we must look for elements of R of norm 6. If a, b ∈ Z then a² + 23b² ≠ 6;
on the other hand,
\[ 6 = \frac{a^2 + 23 b^2}{4} \]
if and only if a² = 1 = b². Hence there are exactly two principal ideals of norm 6:
(½(1 ± √−23)). Let α = ½(1 + √−23). We have that
\[ I_1 I_3 = \Big( 6,\ 2 + 2\sqrt{-23},\ \tfrac{3}{2}\big(1 + \sqrt{-23}\big),\ \tfrac{1}{2}\big(1 + \sqrt{-23}\big)^2 \Big). \]
Now divide each of these generators by α: you always get an element of R. Hence by the
claim, I1 I3 = (α), and so I1 ∼ I4 . Following the same line of reasoning also shows that
I2 ∼ I3 .
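
The division step in this argument can be carried out with exact rational arithmetic. In the Python sketch below an element x + y√−23 of F is stored as a pair (x, y) of `Fraction`s, and membership in R is tested with the description of R given above (2x and 2y are integers of the same parity); all function and variable names are ours.

```python
from fractions import Fraction as Fr

M = -23  # we work in Q(sqrt(-23))

def mul(u, v):
    """Product of u = x + y*sqrt(M) and v = x' + y'*sqrt(M), as coordinate pairs."""
    return (u[0] * v[0] + M * u[1] * v[1], u[0] * v[1] + u[1] * v[0])

def inverse(u):
    """Inverse of a nonzero element: the conjugate divided by the norm."""
    n = u[0] * u[0] - M * u[1] * u[1]
    return (u[0] / n, -u[1] / n)

def in_R(u):
    """x + y*sqrt(-23) lies in R iff 2x and 2y are integers of the same parity."""
    a, b = 2 * u[0], 2 * u[1]
    return a.denominator == 1 and b.denominator == 1 and (a - b) % 2 == 0

alpha = (Fr(1, 2), Fr(1, 2))                      # (1 + sqrt(-23)) / 2
one_plus_root = (Fr(1), Fr(1))                    # 1 + sqrt(-23)
square = mul(one_plus_root, one_plus_root)        # (1 + sqrt(-23))**2
generators = [
    (Fr(6), Fr(0)),                               # 6
    (Fr(2), Fr(2)),                               # 2 + 2 sqrt(-23)
    (Fr(3, 2), Fr(3, 2)),                         # (3/2)(1 + sqrt(-23))
    (square[0] / 2, square[1] / 2),               # (1/2)(1 + sqrt(-23))**2
]
quotients = [mul(g, inverse(alpha)) for g in generators]

# Each generator of I1 I3 is divisible by alpha in R.
assert all(in_R(q) for q in quotients)
```

Under these conventions the four quotients come out as ½(1 − √−23), 4, 3 and 1 + √−23.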
We now claim that I1 is not equivalent to I2 . Otherwise, [(1)] = [(2)] = [I1 I2 ] = [I1²],
hence there exists α ∈ R such that I1² = (α), and so N(α) = N(I1²) = 4, whence α = ±2.
But then (2)I1 = I1² I2 = (2)I2 , hence I1 = I2 , which contradicts the fact that these ideals
are distinct.
It follows that the ideal-class group of R is {[(1)], [I1 ], [I2 ]}, and R has class number 3.
Since the ideal-class group is of prime order, it is cyclic, and since the order is 3, both ideal
classes [I1 ] and [I2 ] are generators of the group.

12. Proof of Quadratic Reciprocity via Ideal Theory: Conclusion



Let F = Q(√m) be a quadratic number field. The proof of Theorem 3.3 which we give in
this section depends on an equivalence relation defined on the ideals of R = A ∩ F which is
similar to, but possibly different from, the equivalence relation ∼ which determines the ideal
classes of R. If I and J are ideals of R then we declare that I is equivalent to J in the narrow
sense, and write I ≈ J, if there exists an element s ∈ F such that N(s) > 0 and I = sJ.
The relation ≈ is clearly an equivalence relation, and we will call the corresponding set of
equivalence classes narrow ideal classes of R. Because each element of F is the quotient
of two elements of R, it follows that I ≈ J implies that I ∼ J, hence each narrow ideal
class is a subset of some ideal class. If the norm function N is always positive on F , which
occurs when m < 0, then there is no difference between ordinary equivalence of ideals and
equivalence in the narrow sense.
On the other hand, when m > 0 the difference between these two equivalence relations is
mediated by the units in R, i.e., the elements of R which have a multiplicative inverse in R.
It is easy to see that an element u in R is a unit if and only if N(u) = ±1. If R has a unit
of norm −1 then there is also no difference between the two equivalence relations because
if I = kJ for some k ∈ F , then upon multiplication of the element k by a suitable unit of
norm −1, one can ensure that the element k in this equation has positive norm. On the other
hand, if there is no unit with a negative norm then it follows easily from this assumption and
the fact that N(√m) = −m < 0 that every nonzero ideal I of R is not narrowly equivalent
to √m I. Hence each ideal class [I] in the ordinary sense is the union of exactly two ideal
classes in the narrow sense, namely the narrow ideal class containing I and the narrow ideal
class containing √m I. If we let h denote the class number of R and h0 denote the number
of narrow ideal classes of R, i.e., h0 is the narrow class number of R, it follows that either
h0 = h or h0 = 2h; in particular the number of narrow ideal classes is finite.
The set of all narrow ideal classes can be given an abelian group structure using mul-
tiplication of ideals in the same way as we did for the ideal-class group, with the set of all
nonzero principal ideals (µ) with N(µ) > 0 acting as the identity element. We call this group
the narrow ideal-class group of R. The algebraic relationship between the ideal-class group
and the narrow ideal-class group can be described by considering the set I of all nonzero
ideals of R as an abelian semigroup under the ideal product and containing the semigroup
H of all nonzero principal ideals of R. The quotient semigroup I/H is then in fact a group
which is isomorphic to the ideal-class group. If we let H0 denote the semigroup of all nonzero
principal ideals (µ) with N(µ) > 0 then the quotient semigroup I/H0 is a group which is
isomorphic to the narrow ideal-class group.
The next theorem is the basis for the proof of Theorem 3.3 that will be presented here.
The proof of the theorem, rather long and technical, employs several results from algebraic
number theory that would take us too far afield to clearly explain, so we will be content to
cite Hecke [27], proof of Theorem 132, for the details. We will use the theorem to deduce
two corollaries from which Theorem 3.3 will follow by an elegant argument.

The statement of the theorem requires a bit of terminology from the theory of finite
abelian groups. If G is such a group whose order exceeds 1 then G is isomorphic to a
uniquely determined finite direct sum of cyclic groups of prime-power order. If q is a prime
number then the basis number of q belonging to G is the number b(q) of cyclic summands of
G whose orders are all divisible by q. It follows that if b(q) = 0 then the order of G is not
divisible by q.

Theorem 3.19. If t denotes the number of distinct prime ideals in R which contain the
discriminant of F then the basis number of 2 belonging to the narrow ideal-class group of R
is t − 1.

Corollary 3.20. If the discriminant of F is divisible by a single prime then the order
of the narrow ideal-class group is odd.

Proof. If the discriminant of F = Q(√m) is divisible by a single prime then either
m = ±2 or m ≡ 1 mod 4 and either m or −m is prime. If m = ±2 then the discriminant
is ±8 and so from Proposition 3.17(vi), it follows that (±8) = (2)^3 has only a single
prime-ideal factor, i.e., t = 1. If m ≡ 1 mod 4 and either m or −m is prime then it follows
from Proposition 3.17(ii) that (m) has only a single prime-ideal factor, hence t = 1 here
as well. Theorem 3.19 then implies that the basis number of 2 belonging to the narrow
ideal-class group is 0. Hence the order of the narrow ideal-class group is not divisible by
2. QED

Corollary 3.21. If the discriminant of F is the product of two positive primes p and
q, each congruent to 3 mod 4, then either p or q is the norm of an element of R.

Proof. We prove first that the norm of each unit of R is 1. If not, i.e., there is an element
α of R with norm −1, then there exist rational integers x and y such that
x^2 − pq y^2 = −4
(pq ≡ 1 mod 4), hence
−4 ≡ x^2 mod pq,
and so −1 is a quadratic residue of p. However, Theorem 2.4 implies that χp(−1) = −1,
which is not possible.
Now use Proposition 3.17(ii) to factor (p) and (q) as
(24) (p) = I^2, (q) = J^2,
where I and J are prime ideals such that
(25) N(I) = p, N(J) = q.
Chapter 3. Gauss’ Theorema Aureum: the Law of Quadratic Reciprocity

The discriminant of Q(√pq) is pq, hence any prime ideal which is a factor of (pq) must
contain either p or q, and hence must be equal to either I or J. If (pq) has only a single
prime factor Q, say, then Q ∩ Z is a prime ideal of Z which contains both p and q, which
is not possible. It hence follows that t = 2 in Theorem 3.19, and consequently, the proof of
Theorem 3.19 (Hecke [27], pp. 160-162) implies that

(26) I^{ε1} J^{ε2} ≈ (1),

where εi ∈ {0, 1}, i = 1, 2, and ε1 and ε2 are not both 0.

If ε1 = ε2 = 1, then (24) and (26) imply that

(√pq) = IJ ≈ (1),

and so the definition of equivalence in the narrow sense implies that there exists α ∈ R with
N(α) > 0 such that

(√pq) = (α).

We conclude the existence of a unit u of R such that u√pq = α, hence
\[ N(u) = \frac{N(\alpha)}{N(\sqrt{pq})} = -\frac{N(\alpha)}{pq} < 0, \]
which cannot happen because every unit of R has norm 1. Hence either ε1 or ε2 is 1 and the
other is 0, and consequently from (26) it follows that either I or J is principal and is generated
by an element γ of R of positive norm. Hence by (25) and Lemma 3.18(ii), either p or q is the
norm of γ. QED
With Corollaries 3.20 and 3.21 in hand, we can now prove Theorem 3.3. Given distinct
odd primes p and q, we wish to prove that
(27) χp(q)χq(p) = (−1)^{(1/2)(p−1)·(1/2)(q−1)}.

It will be most convenient to divide the reasoning into the three cases which determine
the sign on the right-hand side of (27).
Case 1.
Suppose that p ≡ q ≡ 1 mod 4. We will show that χp (q) and χq (p) are simultaneously 1,
hence also simultaneously −1, hence both sides of (27) are 1.
Assume that χp(q) = 1. Then according to Proposition 3.17(i), the ideal (p) in R =
ℛ ∩ Q(√q) factors as IJ with each of the factors having degree 1. If h_0 is the narrow class
number of R then I^{h_0} is in the narrow principal class of R, hence there exists an element α
in R of positive norm such that

(28) I^{h_0} = (α).

Because there are rational integers x and y such that α = (1/2)(x + y√q), we take the norm of
both sides of (28) and use the facts that p ∈ I, the degree of I is 1, and α has positive norm
to conclude that
\[ p^{h_0} = \frac{x^2 - q y^2}{4}, \]
from which it follows that
4p^{h_0} ≡ x^2 mod q.
This means that 4p^{h_0} is a residue of q, hence

χq(p)^{h_0} = χq(4p^{h_0}) = 1.

Because Q(√q) has discriminant q, it follows from Corollary 3.20 that h_0 is odd, and so this
equation implies that χq(p) = 1. An interchange of the roles of p and q in this argument
also shows that χq(p) = 1 implies that χp(q) = 1.
Case 2.
Suppose that q ≡ 1 mod 4 and p ≡ 3 mod 4. The argument in Case 1 shows that if
χp (q) = 1 then χq (p) = 1. Hence by Theorem 2.4,

χq(−p) = χq(−1)χq(p) = 1.

Conversely, if χq(−p) = 1, then we can apply the argument in Case 1, using the field Q(√−p),
to obtain χp(q) = 1. It follows that

χp (q) = χq (−p) = χq (p),

and both sides of (27) are again equal to 1.


Case 3.
Suppose that p ≡ q ≡ 3 mod 4. Applying the same reasoning as we did in Case 1 or
2, it follows that χq(−p) = 1 implies that χp(−q) = −1, but the converse
cannot be proved in that way. Instead, we work in the field Q(√pq), in which, according to
Corollary 3.21, p or q is the norm of an algebraic integer (1/2)(x + y√pq). If p is that norm then

4p = x^2 − pq y^2.

This equation implies that x is divisible by p, say x = ap, and so it follows that 4 = p a^2 − q y^2.
Upon observing that a is not divisible by q and y is not divisible by p, it hence follows from
this equation that
χq(p) = χq(p a^2) = χq(p a^2 − q y^2) = χq(4) = 1,
and similarly,
χp(−q) = 1.
Hence by Theorem 2.4 again, it follows that

χp(q) = χp(−1)χp(−q) = −1.
If q is the prime that is the norm of an element of R, the same argument applies to show that
χp (q) and χq (p) still have opposite signs. Thus (27) is verified for this final case. QED
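Equation (27) is easy to test numerically. The following is a minimal sketch, assuming Python 3; the helper names legendre and reciprocity_holds are ours and do not come from the text. It evaluates each Legendre symbol by Euler's criterion and compares the product with the sign on the right-hand side of (27).

```python
def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, computed by Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def reciprocity_holds(p, q):
    """Check equation (27) for distinct odd primes p and q."""
    sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
    return legendre(q, p) * legendre(p, q) == sign


# A small, hand-picked list of odd primes (an illustrative choice).
odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
checks = [reciprocity_holds(p, q)
          for p in odd_primes for q in odd_primes if p != q]
```

Such a check is of course no substitute for the proof; it only confirms (27) on the finitely many pairs listed.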
Before we move on to the last proof of Theorem 3.3 that will be presented in this chapter,
we will explain how the proof that we just gave is related to Gauss’ second proof of quadratic
reciprocity. In order to do that we need to describe how to get quadratic forms from ideals
of quadratic number fields. In the discussion which follows, we will eschew the proof of the
results mentioned; for those please consult Landau [35], Part Four, Chapters I-IV or Hecke
[27], section 53.
Recall from Chapter 1 that a (binary) quadratic form is a polynomial in the variables x
and y of the form
ax^2 + bxy + cy^2
where (a, b, c) ∈ Z × Z × Z, and we will denote this form by [a, b, c]. The discriminant
of [a, b, c] is the familiar algebraic invariant d = b^2 − 4ac, and in the classical theory, one
presupposes the form is irreducible, i.e., it is not a product of linear factors with integer
coefficients, so that the discriminant is not a perfect square and is congruent to either 0 or
1 mod 4. A discriminant is fundamental if gcd(a, b, c) = 1. It can be shown that the set of
fundamental discriminants consists of precisely the integers, positive and negative, which are
either square-free and congruent to 1 mod 4 or are of the form 4n, where n is square-free and
congruent to 2 or 3 mod 4. Thus the set of fundamental discriminants of quadratic forms
coincides with the set of discriminants of quadratic number fields.
For each fundamental discriminant d, let Q(d) denote the set of all irreducible quadratic
forms with discriminant d, so that Q(d) consists of all irreducible forms [a, b, c] such that
gcd(a, b, c) = 1 and b^2 − 4ac = d. It transpires that there is a way to manufacture certain
elements of Q(d) from the ideals in the ring of algebraic integers in Q(√m(d)), where
m(d) = d if d ≡ 1 mod 4, and m(d) = d/4 if d ≡ 0 mod 4. In order to describe this
procedure, we first single out the forms in Q(d) that will arise from it.
The set of quadratic forms that we need is determined by the manner in which quadratic
forms represent the integers. If n is an integer and q(x, y) is a quadratic form, we will say
that n is represented by q(x, y) if there exist integers x and y such that n = q(x, y). The sign
of the integers which a given quadratic form q = [a, b, c] in Q(d) represents depends on the
sign of the discriminant d. If d > 0 then q represents both positive and negative integers.
If d < 0 and a > 0 then q represents no negative integers, and q(x, y) represents 0 only if
x = y = 0. If d < 0 and a < 0 then q represents no positive integers, and q(x, y) represents
0 only if x = y = 0. Hence forms with positive discriminant are called indefinite and forms
with negative discriminant are called positive or negative definite if a is, respectively, positive
or negative (Hecke [27], section 53, Theorem 153).
Now let R = ℛ ∩ Q(√m(d)) and let I(d) denote the set of all nonzero ideals of R. For
each I ∈ I(d), we choose an integral basis {α, β} of I such that αβ′ − α′β = N(I)√d is
positive or pure imaginary with positive imaginary part (an integral basis with this property
always exists, according to Hecke [27], p.190). If (x, y) ∈ Z × Z then we let
\[ Q_I(x, y) = \frac{(x\alpha + y\beta)(x\alpha' + y\beta')}{N(I)}. \]
One can show that if I ∈ I(d) then QI ∈ Q(d). Moreover, if d > 0, and f ∈ Q(d) then there
is an I ∈ I(d) such that QI = f , and if d < 0 then for each positive definite form f ∈ Q(d),
there is an I ∈ I(d) such that QI = f (Hecke [27], pp.190-192).
The relation of narrow equivalence of ideals in I(d) has an important connection to an
equivalence relation on the set Q(d). We declare that forms q(x, y) = ax^2 + bxy + cy^2 and
q1(X, Y) = a1 X^2 + b1 XY + c1 Y^2 in Q(d) are equivalent if there is a linear transformation
defined by
x = αX + βY, y = γX + δY,
where α, β, γ, and δ are integers satisfying αδ − βγ = 1, such that

q(αX + βY, γX + δY ) = q1 (X, Y ).

These transformations are called modular substitutions, and each modular substitution maps
Q(d) bijectively onto Q(d). It is a classical result of Lagrange that each equivalence class
of forms determined by this equivalence relation contains a form [a, b, c] whose coefficients
satisfy
|b| ≤ |a| ≤ |c|,
and it follows from this fact that the number of equivalence classes is finite (Landau [35],
Theorem 197). There is always at least one form in Q(d), called the principal form, defined
by
x^2 − (1/4)d y^2, if d ≡ 0 mod 4,
or
x^2 + xy − (1/4)(d − 1)y^2, if d ≡ 1 mod 4,
hence the number of equivalence classes is a positive integer. In what follows, when we speak
of a class of quadratic forms, we will mean one of these equivalence classes.
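The two facts just described can be explored by direct computation. The sketch below, assuming Python 3, applies a modular substitution to a form [a, b, c] and lists the primitive forms of discriminant d whose coefficients satisfy Lagrange's inequalities |b| ≤ |a| ≤ |c|. The function names are ours. The search bound uses the elementary observation that these inequalities force 3a^2 ≤ |d|, which is our own remark and is not taken from the text. The list it returns can contain several forms from the same class, so its length is only an upper bound for the number of classes.

```python
from math import gcd, isqrt


def discriminant(form):
    """Discriminant b^2 - 4ac of the form [a, b, c]."""
    a, b, c = form
    return b * b - 4 * a * c


def substitute(form, alpha, beta, gamma, delta):
    """Apply x = alpha*X + beta*Y, y = gamma*X + delta*Y to the form [a, b, c]."""
    if alpha * delta - beta * gamma != 1:
        raise ValueError("not a modular substitution")
    a, b, c = form
    a1 = a * alpha**2 + b * alpha * gamma + c * gamma**2
    b1 = 2 * a * alpha * beta + b * (alpha * delta + beta * gamma) + 2 * c * gamma * delta
    c1 = a * beta**2 + b * beta * delta + c * delta**2
    return (a1, b1, c1)


def lagrange_bounded_forms(d):
    """All [a, b, c] with gcd 1, b^2 - 4ac = d and |b| <= |a| <= |c|."""
    forms = []
    bound = isqrt(abs(d) // 3)  # |b| <= |a| <= |c| forces 3a^2 <= |d|
    for a in range(-bound, bound + 1):
        if a == 0:
            continue
        for b in range(-abs(a), abs(a) + 1):
            numerator = b * b - d  # must equal 4ac
            if numerator % (4 * a) != 0:
                continue
            c = numerator // (4 * a)
            if abs(c) >= abs(a) and gcd(gcd(a, b), c) == 1:
                forms.append((a, b, c))
    return forms
```

For example, substitute((1, 1, 6), 1, 1, 0, 1) produces another form of discriminant −23, which illustrates that modular substitutions preserve the discriminant.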
One can now prove the following very important theorem:

Theorem 3.22. For each I ∈ I(d) the class of the form QI does not depend on the
integral basis of I used to define it, and ideals I and J in I(d) are in the same narrow ideal
class if and only if QI and QJ are in the same class of forms in Q(d).

Proof. Hecke [27], section 53, Theorem 154. QED


As we mentioned in section 10, the second proof which Gauss gave for the LQR uses
his genus theory of quadratic forms. The genus which a quadratic form belongs to is an
equivalence class determined by yet another equivalence relation that Gauss defined on
Q(d). The definition of this equivalence relation is based on a very subtle and detailed
analysis of the manner by which a quadratic form represents odd integers, even integers,
and the residues and non-residues of primes which divide the discriminant of the form. It
is done in such a way that each genus is the union of certain classes of forms determined
by modular substitutions. Because of the rather daunting complexity of Gauss’ definition,
we are instead going to define a notion of genus on the set I(d), which can be done rather
more transparently, and then lift it to Q(d) via Theorem 3.22. What we will end up with is
Gauss’ definition of genera of quadratic forms.
Toward that end, we declare that nonzero ideals I and J of R such that I + (d) = R =
J + (d) have the same genus if there exists a number γ in Q(√m(d)) such that

N(I) ≡ |N(γ)|N(J) mod (d).

On the set of nonzero ideals which are relatively prime to (d), i.e., the nonzero ideals whose
sum with (d) is R, this congruence defines an equivalence relation which partitions the
nonzero ideals relatively prime to (d) into genera. The genera form an abelian group in
the usual way with the identity in this group given by the genus containing (1), thus
called the principal genus, and hence containing the set of all principal ideals which are
generated by the elements of R of positive norm. It is not difficult to see that ideals that are
equivalent in the narrow sense belong to the same genus if they are all relatively prime to
(d); consequently each genus is the union of certain narrow ideal classes. The narrow classes
belonging to the principal genus form a subgroup of the narrow ideal-class group, and so if f
is the order of this subgroup, g is the number of genera, and h0 is the narrow class number
of R, then each genus is the union of exactly f narrow ideal classes and h0 = f g.
Returning to Q(d), we take forms q1 and q2 in Q(d) (both positive definite if d < 0),
choose ideals I1 and I2 in I(d) such that qj = QIj for j = 1, 2, and we define the class
containing q1 and the class containing q2 to be in the same genus if I1 and I2 have the same
genus. By virtue of Theorem 3.22, this relation is well-defined, each genus of forms in Q(d)
is the union of f classes of forms, the number of genera is g, and the total number of classes
of forms (classes of positive definite forms if d < 0) is f g.
The fundamental problem of genus theory is the determination of the number of genera.
The solution to this problem is given in the following theorem:

Theorem 3.23. If t is the number of distinct prime ideals of R which contain d then
the number of genera is 2^{t−1}. Moreover, a narrow ideal class is the square of a narrow ideal
class if and only if it is contained in the principal genus.

Proof. Hecke [27], section 48, Theorem 145. QED
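The count in Theorem 3.23 is simple to evaluate once t is known. The sketch below, assuming Python 3, takes d to be a fundamental discriminant and assumes, as in the proof of Corollary 3.20, that each rational prime dividing d lies in exactly one prime ideal of R, so that t is the number of distinct primes dividing d. The function names are ours, and the code does not check that d is fundamental.

```python
def prime_divisors(n):
    """Distinct primes dividing the nonzero integer n, by trial division."""
    n = abs(n)
    primes = []
    f = 2
    while f * f <= n:
        if n % f == 0:
            primes.append(f)
            while n % f == 0:
                n //= f
        f += 1
    if n > 1:
        primes.append(n)
    return primes


def number_of_genera(d):
    """2^(t-1), where t is the number of distinct primes dividing d.

    Assumes d is a fundamental discriminant (not verified here).
    """
    t = len(prime_divisors(d))
    return 2 ** (t - 1)
```

With d = −23 this gives a single genus, while d = 21 and d = −20 each give two genera.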


This theorem, due to Gauss in an equivalent form ([19], articles 261, 286, 287), is the
basis of his second proof of quadratic reciprocity. Actually Gauss only used the fact that the
number of genera does not exceed 2^{t−1} in his argument, because if d has at most two prime
factors, then this inequality implies that there are only at most two genera in Q(d), one
of which is always the principal genus. Gauss then used his definition of genera to deduce
quadratic reciprocity from this fact by means of an argument whose line of reasoning, when
converted into the language of ideals, is very similar to the one that is used in the proof of
quadratic reciprocity given in this section. Interestingly enough, Gauss then used quadratic
reciprocity and his theory of composition of forms to derive the reverse inequality that the
number of genera can be no less than 2^{t−1}.

13. A Proof of Quadratic Reciprocity via Galois Theory

The proof of Theorem 3.3 which we gave in the last section shows that quadratic reci-
procity results from certain factorization properties of the ideals in a quadratic number field.
In this final section of Chapter 3, we derive quadratic reciprocity from the structure of the
Galois group of certain cyclotomic number fields.
We preface the argument by recalling some relevant facts from Galois theory. Let K ⊆ F
be an inclusion of fields; we say that F is an extension of K. An automorphism σ of F is
Galois over K if each element of K is fixed by σ, i.e., σ(k) = k for all k ∈ K. The set
of all automorphisms of F which are Galois over K forms a group under composition of
automorphisms, called the Galois group of F over K, and denoted by GK (F ). Now let E be
an intermediate field, i.e., a subfield of F which contains K, and also let H be a subgroup
of GK (F ). We let
E ′ = {σ ∈ GK (F ) : σ(ξ) = ξ, for all ξ ∈ E},

H ′ = {ξ ∈ F : σ(ξ) = ξ, for all σ ∈ H}.


It is clear that E′ = GE(F) is a subgroup of GK(F) and H′, the fixed set of H, is an
intermediate subfield of F. The field F is Galois over K if GK(F)′ = K, i.e., the fixed field
of GK(F) is K.
The field F is naturally a vector space over K with respect to the addition and multi-
plication in F ; the dimension of F as a vector space over K is called the degree of F over K
and is denoted by [F : K]. F is a finite extension of K if the degree of F over K is finite.
The following theorem, often called the Fundamental Theorem of Galois Theory, plays, as
the name connotes, a central role in the theory of fields.

Theorem 3.24. If F is a finite Galois extension of K then the mapping E → E′ is a
bijection of the set of all intermediate fields E onto the set of all subgroups of GK(F) whose
inverse mapping is H → H′ and which has the following properties:
(i) F is a Galois extension of E and the order of GE(F) is [F : E]. In particular, the
order of GK(F) is [F : K].
(ii) E is a Galois extension of K if and only if E′ is a normal subgroup of GK(F), and
if E is Galois over K then GK(E) is isomorphic to the quotient group GK(F)/E′.


Proof. Hungerford [29], Theorem V.2.5. QED


In particular, if F is a finite Galois extension of K and GK(F) is abelian, i.e., F is an
abelian extension of K, then all intermediate subfields E are Galois extensions of K,
the order of GK(E) is [E : K] and GK(E) is isomorphic to GK(F)/GE(F).

We now specialize to the case when F is an algebraic number field, the situation that
is of most interest to us. A natural question that arises after one contemplates Theorem
3.24 asks: when is an algebraic number field a Galois extension of Q? The answer involves
the concept of a splitting field of a polynomial, an idea that we have already encountered in
section 1 of this chapter.
If K ⊆ F are fields of complex numbers, then F is a splitting field over K if F is generated
over K by the roots of a polynomial f (x) ∈ K[x], in other words, F is the smallest subfield
of the complex numbers which contains K and all the roots of f (x). In particular, if this
holds we also say that F is the splitting field of f (x) over K. The next theorem is true in
much greater generality, but it will be more than sufficient to meet our needs.

Theorem 3.25. An algebraic number field is Galois over Q if and only if it is the splitting
field of a polynomial in Q[x].

Proof. Hungerford [29], Theorem V.3.11. QED


In order to verify Theorem 3.3, we are going to work in the algebraic number field Q(ζ)
generated over Q by the p-th root of unity ζ = exp(2πi/p), for a fixed odd prime p, the
cyclotomic number field determined by p. As we saw in the examples from section 8, ζ is an
algebraic integer with minimal polynomial over Q given by

1 + x + · · · + x^{p−1}.

It follows from this fact, and it is not too difficult to prove, that
\[ \mathbb{Q}(\zeta) = \Big\{ \sum_{i=0}^{p-1} r_i \zeta^i : (r_0, \dots, r_{p-1}) \in \mathbb{Q}^p \Big\} \tag{29} \]
(Hecke [27], section 30, p. 98). It is also true that
\[ R(\zeta) = \mathcal{R} \cap \mathbb{Q}(\zeta) = \Big\{ \sum_{i=0}^{p-1} z_i \zeta^i : (z_0, \dots, z_{p-1}) \in \mathbb{Z}^p \Big\}, \tag{30} \]
although the proof of this requires quite a bit more work; see, for instance, Marcus [40],
Theorem 10, p. 30.
From the factorization
\[ 1 + x + \cdots + x^{p-1} = \prod_{i=1}^{p-1} (x - \zeta^i), \]
(29), and Theorem 3.25 it follows that Q(ζ) is Galois over Q. We proceed to calculate the
Galois group G of Q(ζ) over Q. Recall that U(p) denotes the group of units in Z/pZ; this
group is cyclic of order p − 1.

Proposition 3.26. There is an isomorphism θ of G onto U(p) such that for σ ∈ G,

σ(ζ) = ζ^{θ(σ)}.

Proof. Because ζ^p = 1, it follows that σ(ζ)^p = 1. Hence

σ(ζ) = ζ^{θ(σ)},

where θ(σ) is an integer determined uniquely by σ modulo p. If τ = σ^{−1} then

ζ = τ(σ(ζ)) = τ(ζ^{θ(σ)}) = ζ^{θ(τ)θ(σ)},

hence θ(τ)θ(σ) is in the coset in Z/pZ containing 1. It follows that θ maps G into U(p). If
τ, σ ∈ G then
ζ^{θ(τσ)} = (τσ)(ζ) = τ(σ(ζ)) = ζ^{θ(τ)θ(σ)},
hence θ(τσ) ≡ θ(τ)θ(σ) mod p and so θ is a homomorphism. If θ(σ) ≡ 1 mod p then
σ(ζ) = ζ, hence by (29), σ(α) = α, for all α ∈ Q(ζ), whence θ is a monomorphism. As Q(ζ)
is Galois over Q, we have that

|U(p)| = p − 1 = [Q(ζ) : Q] = |G|,

hence θ is surjective. QED


It is a consequence of Proposition 3.26 that for every integer a ∈ Z not divisible by p,
there exists σa ∈ G such that
σa(ζ) = ζ^a,
and the map a → σa is the inverse of θ.

Lemma 3.27. If q is a prime distinct from p then for all w ∈ R(ζ), σq(w) ≡ w^q mod
qR(ζ).

Proof. From (30), we have that
\[ w = \sum_i z_i \zeta^i, \]
where zi ∈ Z, for all i. Since σq(ζ) = ζ^q, it follows from Fermat's little theorem that
\[ \sigma_q(w) \equiv \sum_i z_i^q \zeta^{qi} \bmod qR(\zeta). \]
Because the ring R(ζ)/qR(ζ) has characteristic q, the q-th power map in this ring is additive,
hence
\[ \sigma_q(w) \equiv \sum_i z_i^q \zeta^{qi} \equiv \Big( \sum_i z_i \zeta^i \Big)^q \equiv w^q \bmod qR(\zeta). \]

QED
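Lemma 3.27 can be tested on examples by computing with coefficient vectors. The sketch below, assuming Python 3, represents w = Σ zi ζ^i by the list (z0, . . . , z_{p−1}) as in (30) and works in the ring of polynomials modulo x^p − 1 with coefficients reduced mod q; since ζ^p = 1, a congruence that holds there also holds mod qR(ζ) after substituting ζ for x. This choice of model and the function names are ours.

```python
def mul_mod(u, v, p, q):
    """Multiply coefficient lists u, v (length p) modulo x^p - 1 and modulo q."""
    out = [0] * p
    for i, ui in enumerate(u):
        if ui:
            for j, vj in enumerate(v):
                k = (i + j) % p
                out[k] = (out[k] + ui * vj) % q
    return out


def power_mod(w, e, p, q):
    """Compute w^e modulo x^p - 1 and modulo q by repeated squaring."""
    result = [1] + [0] * (p - 1)
    base = [c % q for c in w]
    while e:
        if e & 1:
            result = mul_mod(result, base, p, q)
        base = mul_mod(base, base, p, q)
        e >>= 1
    return result


def sigma(w, a, p, q):
    """Image of sum z_i zeta^i under zeta -> zeta^a, coefficients reduced mod q."""
    out = [0] * p
    for i, z in enumerate(w):
        k = (a * i) % p
        out[k] = (out[k] + z) % q
    return out


# An arbitrary illustrative element of R(zeta) for p = 7, tested with q = 3.
p, q = 7, 3
w = [2, -1, 0, 5, 4, 0, 1]
lhs = sigma(w, q, p, q)       # sigma_q(w) mod q
rhs = power_mod(w, q, p, q)   # w^q mod q
```

The two coefficient lists lhs and rhs agree, as the lemma predicts.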
The way is now clear to the proof of Theorem 3.3. We begin by looking for a square root
of (−1)^{(1/2)(p−1)} p = p∗ in Q(ζ). This can be found by applying Theorem 3.14, but we want to
avoid the use of Gauss sums. Instead, we will find this square root via the equation
\[ p = \prod_{i=1}^{p-1} (1 - \zeta^i). \]
If the terms corresponding to i and p − i are combined, then

(1 − ζ^i)(1 − ζ^{p−i}) = (1 − ζ^i)(1 − ζ^{−i}) = −ζ^{−i}(1 − ζ^i)^2.

Hence
\[ p = (-1)^{\frac{1}{2}(p-1)} \zeta^n \prod_{i=1}^{\frac{1}{2}(p-1)} (1 - \zeta^i)^2, \quad \text{where } n = -\sum_{i=1}^{\frac{1}{2}(p-1)} i. \]
Now choose z ∈ Z such that 2z ≡ 1 mod p. Then ζ^n = (ζ^{nz})^2, and so
\[ p^* = \Big( \zeta^{nz} \prod_{i=1}^{\frac{1}{2}(p-1)} (1 - \zeta^i) \Big)^2, \]
whence p∗ = τ^2 for some τ ∈ Q(ζ).


Let q be an odd prime distinct from p. Then

σq(τ)^2 = σq(τ^2) = σq(p∗) = p∗ = τ^2,

hence σq(τ) = ±τ, with the plus sign holding if and only if σq is in the Galois group of Q(ζ)
over Q(τ). Proposition 3.26 implies that G is cyclic of order p − 1, hence abelian, and so we
conclude from Theorem 3.24 that

G/G_{Q(τ)}(Q(ζ)) is isomorphic to G_Q(Q(τ))

and

|G_Q(Q(τ))| = [Q(τ) : Q].

As the minimal polynomial of τ over Q is x^2 − p∗, it follows that [Q(τ) : Q] = 2, and so
G/G_{Q(τ)}(Q(ζ)) is cyclic of order 2. Because G is cyclic of even order, we conclude that
G_{Q(τ)}(Q(ζ)) consists of all the squares of the elements of G. Hence σq(τ) = τ if and only if
σq is a square in G. Since the map a → σa is an isomorphism of U(p) onto G, it follows that
σq is a square in G if and only if q is a square in Z/pZ. In other words,

σq (τ ) = χp (q)τ.

Let Q be a prime ideal of R(ζ) containing q. Lemma 3.27 implies that

χp(q)τ = σq(τ) ≡ τ^q mod Q, i.e.,

(χp(q) − τ^{q−1})τ ∈ Q.

If τ ∈ Q then p = (−1)^{(1/2)(p−1)} τ^2 ∈ Q, and this is not possible because q is the only rational
prime which Q contains. We conclude that

(31) χp(q) ≡ τ^{q−1} mod Q.

On the other hand, it follows from Euler's criterion that

(32) τ^{q−1} = (p∗)^{(1/2)(q−1)} ≡ χq(p∗) mod q.

Because q ∈ Q, it follows from (31) and (32) that


χp (q) ≡ χq (p∗ ) mod Q,
hence
χp (q) = χq (p∗ )
because 2 ∉ Q. The LQR is now an immediate consequence of this equation and Theorem
2.4. QED
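Two identities from this proof, p∗ = τ^2 and σq(τ) = χp(q)τ, can be checked in floating-point arithmetic. The sketch below, assuming Python 3 and the standard cmath module, builds τ from the explicit product used in the proof and obtains σa(τ) by replacing ζ with ζ^a. The function names are ours, and the comparisons are only accurate up to rounding error.

```python
import cmath


def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, computed by Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def tau(p, a=1):
    """sigma_a(tau), where tau = zeta^(nz) * prod_{i=1}^{(p-1)/2} (1 - zeta^i).

    With a = 1 this is tau itself; sigma_a acts by sending zeta to zeta^a.
    """
    zeta = cmath.exp(2j * cmath.pi * a / p)
    half = (p - 1) // 2
    n = -sum(range(1, half + 1))
    z = (p + 1) // 2              # 2z = p + 1, so 2z is congruent to 1 mod p
    value = zeta ** ((n * z) % p)
    for i in range(1, half + 1):
        value *= 1 - zeta ** i
    return value


def p_star(p):
    """The integer p* = (-1)^((p-1)/2) * p."""
    return (-1) ** ((p - 1) // 2) * p
```

For instance, abs(tau(7) ** 2 - p_star(7)) is zero up to rounding, and tau(7, 3) agrees with legendre(3, 7) * tau(7) to the same accuracy.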
The argument which we have given here (taken from [30], section 13.3) shows that quadratic
reciprocity results from the fact that the cyclotomic number field Q(exp(2πi/p)) is
an abelian extension of Q with a cyclic Galois group. We also know from Theorem 3.2 that
an irreducible polynomial f (x) ∈ Z[x] satisfies a reciprocity law if and only if the splitting
field of f (x) is an abelian extension of Q. This path to higher reciprocity laws by way of the
Galois theory of abelian extensions eventually became one of the principal thoroughfares to
the creation of class field theory.

CHAPTER 4

Four Interesting Applications of Quadratic Reciprocity

Gauss called the Law of Quadratic Reciprocity the golden theorem of number theory
because, when it is in hand, the study of quadratic residues and non-residues can be pursued
to a significantly deeper level. We have already seen some examples of how useful the LQR
can be in answering questions about the calculation of specific residues or non-residues. In
this chapter, we will study four applications of the LQR which illustrate how it can be used
to shed further light on interesting properties of residues and non-residues.
Our first application will use quadratic reciprocity to completely solve the Basic Problem
and the Fundamental Problem for Odd Primes that we introduced in Chapter 2. If z is an
integer, recall that the Basic Problem is to determine all primes p such that z is a quadratic
residue of p and to determine all primes p such that z is a quadratic non-residue of p.
The Basic Problem must be solved in order to determine when the quadratic congruence
ax^2 + bx + c ≡ 0 mod p has a solution, as we saw in Chapter 1, and it must also be solved in
order to determine the splitting moduli of quadratic polynomials, as we explained in section
1 of Chapter 3. Theorems 2.4 and 2.6 solve the Basic Problem for, respectively, z = −1 and
z = 2 and in Chapter 2 we also showed how to reduce the solution of the Basic Problem
to its solution when z is an odd prime, which we call the Fundamental Problem for Odd
Primes. In section 1 of this chapter, the LQR will be used to solve the Fundamental Problem
for Odd Primes and this solution will then be used in section 2 to solve the Basic Problem.
The second application, which we will discuss in section 3, employs quadratic reciprocity
to investigate when finite, nonempty subsets of the positive integers occur as sets of residues
of infinitely many primes. In addition to the LQR, the key lemma which we will use to
answer that question also employs Dirichlet’s theorem on primes in arithmetic progression.
We take the appearance of Dirichlet’s theorem here as an opportunity to discuss Dirichlet’s
proof of that theorem in section 4, because many of the ideas and techniques of his reasoning
will be used extensively in much of the work that we will do in subsequent chapters.
If S is a finite, nonempty subset of the positive integers which is a set of residues for
infinitely many primes, a natural question that immediately occurs asks: how large is the
set of all primes p such that S is a set of residues of p? In order to answer that question, one
must find a way to accurately measure the size of an infinite set of primes. A good way to
make that measurement is provided by the concept of the natural or asymptotic density of a
set of primes, which we will discuss in section 5. In section 6, we apply quadratic reciprocity
a third time in order to deduce a very nice way to calculate the asymptotic density of the
set of all primes p such that S is a set of residues of p.
Number theory, and in particular, quadratic residues, has been applied extensively in
modern cryptology. As one example of those applications, suppose that you receive an
identification number from person A and you want to verify that A validly is in possession
of the identification number, i.e., you want to be sure that A really is who he claims to be,
without knowing anything else about A. Or, for a more mathematical example, A wants to
convince you that he knows the prime factors of a very large number, without telling you
what the prime factors are. This second example is actually used by smart cards to verify
personal identification numbers. In section 7, we will describe methods, known as zero-
knowledge or minimum-disclosure proofs, which use quadratic residues to securely verify the
identity of someone and to convince someone that you are who you say you are. Jacobi
symbols and our fourth application of the LQR are used in sections 8 and 9 to describe and
verify an algorithm for fast and efficient computation of Legendre symbols that is required
for the calculations in the zero-knowledge proof of section 7.

1. Solution of the Fundamental Problem for Odd Primes

We will now use quadratic reciprocity to solve the Fundamental Problem for Odd Primes.
Let q be an odd prime, and recall from Chapter 2 that the sets X± (q) are defined by

X± (q) = {p : χp (q) = ±1}.

The Fundamental Problem for Odd Primes requires that the primes p in these sets be found
in some explicit and concrete manner.
Let r_i^+ (respectively, r_i^−), i = 1, . . . , (1/2)(q − 1), denote the residues (respectively,
non-residues) of q in [1, q − 1]. Note, as we pointed out in Chapter 2, that the residues and
non-residues of q can be found by simply calculating the integers 1^2, 2^2, . . . , ((q−1)/2)^2 and then
reducing mod q. The integers that result from this computation are the residues of q inside
[1, q − 1]. We consider the two cases which are determined by whether q is congruent to 1
or 3 mod 4.
Case 1: q ≡ 1 mod 4.
In this case, the LQR implies immediately that
\[
\begin{aligned}
X_\pm(q) &= \{p : \chi_p(q) = \pm 1\} \\
&= \{p : \chi_q(p) = \pm 1\} \\
&= \bigcup_{i=1}^{\frac{1}{2}(q-1)} \{p : p \equiv r_i^{\pm} \bmod q\}.
\end{aligned}
\]

Example: q = 17.

We find that the residues of 17 are 1, 2, 4, 8, 9, 13, 15, and 16 and the non-residues of
17 are 3, 5, 6, 7, 10, 11, 12, and 14. Hence

X+ (17) = {p : p ≡ 1, 2, 4, 8, 9, 13, 15, or 16 mod 17},

X− (17) = {p : p ≡ 3, 5, 6, 7, 10, 11, 12, or 14 mod 17}.


(Recall that p always denotes an odd prime.)
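The computation in this example is easily mechanized. The following sketch, assuming Python 3, finds the residues of q by squaring 1, . . . , (q − 1)/2 and then decides membership of an odd prime p in X+(q) for q ≡ 1 mod 4 by reducing p mod q. The function names are ours.

```python
def residues(q):
    """Quadratic residues of the odd prime q in [1, q-1], sorted."""
    return sorted({(k * k) % q for k in range(1, (q - 1) // 2 + 1)})


def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, computed by Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def in_X_plus_case1(p, q):
    """For q = 1 mod 4 and an odd prime p != q: is p in X_+(q)?"""
    return p % q in residues(q)


q = 17
res = residues(q)
non = [n for n in range(1, q) if n not in res]
```

For example, in_X_plus_case1(13, 17) is True because 13 is a residue of 17, and this agrees with the direct evaluation legendre(17, 13) == 1.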
Case 2: q ≡ 3 mod 4.
Note first (from Theorem 2.4) that

X± (−1) = {p : p ≡ ±1 mod 4}.

Hence as a consequence of the LQR,


 
\[ X_+(q) = \big( X_+(-1) \cap \{p : \chi_q(p) = 1\} \big) \cup \big( X_-(-1) \cap \{p : \chi_q(p) = -1\} \big). \tag{1} \]

Now for i = 1, . . . , (1/2)(q − 1), let

x ≡ x_i^± mod 4q, 1 ≤ x_i^± ≤ 4q − 1,

be the simultaneous solutions of

x ≡ ±1 mod 4,
x ≡ r_i^± mod q,
obtained from the Chinese remainder theorem (Theorem 1.3). If we set
\[ V(q) = \{ x_1^+, \dots, x_{\frac{1}{2}(q-1)}^+, x_1^-, \dots, x_{\frac{1}{2}(q-1)}^- \} \]
then (1) implies that
\[ X_+(q) = \bigcup_{n \in V(q)} \{p : p \equiv n \bmod 4q\}. \]

In order to calculate X− (q), recall that U(4q) denotes the set {n ∈ [1, 4q−1] : gcd(n, 4q) =
1} and then observe that
V(q) ⊆ U(4q),
\[ \{p : p \neq q\} = \bigcup_{n \in U(4q)} \{p : p \equiv n \bmod 4q\}. \]
Hence
\[ X_-(q) = \{p : p \neq q\} \setminus X_+(q) = \bigcup_{n \in U(4q) \setminus V(q)} \{p : p \equiv n \bmod 4q\}. \]

Example: q = 7.

The residues of 7 are 1, 2, and 4 and the non-residues are 3, 5, and 6. Because of the
Chinese remainder theorem, the simultaneous solutions of the congruence pairs
p ≡ 1 mod 4 and p ≡ 1 mod 7,
p ≡ 1 mod 4 and p ≡ 2 mod 7,
p ≡ 1 mod 4 and p ≡ 4 mod 7,
p ≡ −1 mod 4 and p ≡ 3 mod 7,
p ≡ −1 mod 4 and p ≡ 5 mod 7,
p ≡ −1 mod 4 and p ≡ 6 mod 7,
are, respectively,
p ≡ 1 mod 28,
p ≡ 9 mod 28,
p ≡ 25 mod 28,
p ≡ 3 mod 28,
p ≡ 19 mod 28,
p ≡ 27 mod 28.
Hence
X+ (7) = {p : p ≡ 1, 3, 9, 19, 25, or 27 mod 28}.
We have that
U(28) = {1, 3, 5, 9, 11, 13, 15, 17, 19, 23, 25, 27},
V (7) = {1, 3, 9, 19, 25, 27},
hence,
U(28) \ V (7) = {5, 11, 13, 15, 17, 23},
and so
X− (7) = {p : p ≡ 5, 11, 13, 15, 17, or 23 mod 28}.
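The sets V(q) and U(4q) \ V(q) in Case 2 can be generated in the same spirit. In the sketch below, assuming Python 3, the simultaneous congruences are solved by a brute-force search over [1, 4q − 1], a shortcut of ours in place of the Chinese remainder theorem, which is adequate for small q. The function names are ours.

```python
from math import gcd


def residues(q):
    """Quadratic residues of the odd prime q in [1, q-1], sorted."""
    return sorted({(k * k) % q for k in range(1, (q - 1) // 2 + 1)})


def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, computed by Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def solve_mod_4q(s, r, q):
    """The unique x in [1, 4q-1] with x = s mod 4 and x = r mod q (s = 1 or -1)."""
    return next(x for x in range(1, 4 * q)
                if x % 4 == s % 4 and x % q == r % q)


def V(q):
    """The set V(q) for a prime q = 3 mod 4, as a sorted list."""
    res = residues(q)
    non = [n for n in range(1, q) if n not in res]
    return sorted([solve_mod_4q(1, r, q) for r in res] +
                  [solve_mod_4q(-1, r, q) for r in non])


def U(m):
    """Integers in [1, m-1] relatively prime to m."""
    return [n for n in range(1, m) if gcd(n, m) == 1]


V7 = V(7)
complement = [n for n in U(28) if n not in V7]
```

Here V7 and complement reproduce the two lists of classes mod 28 found in the example, and membership of p mod 28 in V7 agrees with legendre(7, p) == 1 for the small primes we tried.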

2. Solution of the Basic Problem

If d is a fixed but arbitrary integer, we recall formulae (2) and (5) for X+ (d) from Chapter
2. Suppose first that d > 0. Let

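ℰ = {E ⊆ πodd(d) : |E| is even},

where πodd(d) denotes the set of all prime factors of d of odd multiplicity. If E ∈ ℰ, let RE
denote the set of all p such that
\[ \chi_p(q) = \begin{cases} -1, & \text{if } q \in E, \\ 1, & \text{if } q \in \pi_{odd}(d) \setminus E. \end{cases} \]
Then formula (2) of Chapter 2 is
\[ X_+(d) = \Big( \bigcup_{E \in \mathcal{E}} R_E \Big) \setminus \pi_{even}(d), \]
where πeven(d) denotes the set of all prime factors of d of even multiplicity, and this union is
pairwise disjoint. Moreover
\[ R_E = \Big( \bigcap_{q \in E} X_-(q) \Big) \cap \Big( \bigcap_{q \in \pi_{odd}(d) \setminus E} X_+(q) \Big). \]

Suppose next that d < 0, and let

ℰ−1 = {E ⊆ {−1} ∪ πodd(d) : |E| is even}.

Then formula (5) of Chapter 2 is
\[ X_+(d) = \Big( \bigcup_{E \in \mathcal{E}_{-1}} R_E \Big) \setminus \pi_{even}(d), \]
where
\[ R_E = \Big( \bigcap_{q \in E} X_-(q) \Big) \cap \Big( \bigcap_{q \in (\{-1\} \cup \pi_{odd}(d)) \setminus E} X_+(q) \Big), \quad E \in \mathcal{E}_{-1}. \]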

We can now use formula (2) or (5) of Chapter 2 in concert with the solution that we
have of the Fundamental Problem for Odd Primes to calculate X± (d), thereby solving the
Basic Problem. The formulae that we have derived for the calculation of X± (q) where q is
either −1 or a prime show that each of these sets is equal to a union of certain equivalence
classes mod 4, 8, an odd prime, or 4 times an odd prime. It follows that when we employ
formula (2) or (5) of Chapter 2 to calculate X+ (d), each of the sets RE occurring in those
formulae can hence be calculated by the method of successive substitution, a generalization
of the Chinese remainder theorem that can be used to solve simultaneous congruences when
the moduli of the congruences are no longer pairwise relatively prime.

The method of successive substitution works as follows. We have a series of congruences
of the form
(2) x ≡ ai mod mi , i = 1, . . . , k,
where (m1 , . . . , mk ) is a given k-tuple of moduli and (a1 , . . . , ak ) is a given k-tuple of integers,
which we wish to solve simultaneously. Denoting by lcm(a, b) the least common multiple of
the integers a and b, one starts with
Proposition 4.1. The congruences
x ≡ a1 mod m1 , x ≡ a2 mod m2
have a simultaneous solution if and only if gcd(m1 , m2 ) divides a1 − a2 . The solution is
unique modulo lcm(m1 , m2 ) and is given by
x ≡ a1 + x0 m1 mod lcm(m1 , m2 ),
where x0 is a solution of
m1 x0 ≡ a2 − a1 mod m2 .
The congruences (2) are then solved by first using Proposition 4.1 to solve the first two
congruences in (2), then, if necessary, pairing the solution so obtained with the third con-
gruence in (2) and applying Proposition 4.1 to solve that congruence pair, and continuing
in this manner, successively applying Proposition 4.1 to the pair of congruences consisting
of the solution obtained from step i − 1 and the i-th congruence in (2). This procedure
confirms that (2) has a simultaneous solution if and only if gcd(mi , mj ) divides ai − aj for all
i and j, and that the solution is unique modulo the least common multiple of m1 , . . . , mk .
Proposition 4.1 is not difficult to verify, and so we will leave that to the interested reader.
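The procedure just described can be sketched in a few lines of code. The following is an illustrative implementation under our own naming (`merge` for one application of Proposition 4.1, `solve_system` for the successive substitution); it assumes Python 3.8 or later, where the built-in `pow` accepts the exponent −1 to compute a modular inverse.

```python
from math import gcd

def merge(a1, m1, a2, m2):
    """Combine x = a1 (mod m1) and x = a2 (mod m2) as in Proposition 4.1.

    Returns (a, lcm(m1, m2)) with x = a (mod lcm), or None if gcd(m1, m2)
    does not divide a1 - a2.
    """
    g = gcd(m1, m2)
    if (a1 - a2) % g != 0:
        return None
    lcm = m1 // g * m2
    # Solve m1 * x0 = a2 - a1 (mod m2) after dividing through by g.
    n2 = m2 // g
    inverse = pow(m1 // g, -1, n2) if n2 > 1 else 0
    x0 = ((a2 - a1) // g * inverse) % n2
    return (a1 + x0 * m1) % lcm, lcm

def solve_system(congruences):
    """Successive substitution for a list of pairs (a_i, m_i)."""
    a, m = congruences[0]
    a %= m
    for a_i, m_i in congruences[1:]:
        merged = merge(a, m, a_i, m_i)
        if merged is None:
            return None        # some gcd(m_i, m_j) fails to divide a_i - a_j
        a, m = merged
    return a, m

print(solve_system([(7, 8), (3, 28)]))   # moduli are not relatively prime
print(solve_system([(1, 8), (3, 28)]))   # no simultaneous solution
```

The first call pairs x ≡ 7 mod 8 with x ≡ 3 mod 28, where gcd(8, 28) = 4 divides 7 − 3; the second pair is inconsistent because 4 does not divide 1 − 3.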
Consequently, once the residues and non-residues of each integer in πodd (d) are determined,
X+ (d) can be calculated by repeated applications of the method of successive substitutions.
In particular, one finds a positive integer m(d) and a subset V (d) of U(m(d)) such
that
\[ X_+(d) = \Bigl( \bigcup_{n \in V(d)} \{p : p \equiv n \bmod m(d)\} \Bigr) \setminus \pi_{\mathrm{even}}(d). \]

The modulus m(d) is determined like so: if d > 0 and πodd (d) contains neither 2 nor a prime
≡ 3 mod 4, then m(d) is the product of all the elements of πodd (d); otherwise, m(d) is 4
times this product.
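The rule for m(d) is easy to mechanize. The sketch below is only an illustration of the rule as stated: the function names `odd_multiplicity_primes` and `modulus` are our own, and the factorization is done by naive trial division.

```python
def odd_multiplicity_primes(d):
    """Primes dividing d to an odd power (the set pi_odd(d) of the text)."""
    n, q, result = abs(d), 2, []
    while q * q <= n:
        exponent = 0
        while n % q == 0:
            n //= q
            exponent += 1
        if exponent % 2 == 1:
            result.append(q)
        q += 1
    if n > 1:                      # leftover prime factor, multiplicity 1
        result.append(n)
    return result

def modulus(d):
    """m(d) according to the rule stated above."""
    odd = odd_multiplicity_primes(d)
    product = 1
    for q in odd:
        product *= q
    # No factor 4 is needed when d > 0 and pi_odd(d) contains neither 2
    # nor a prime congruent to 3 mod 4.
    if d > 0 and all(q != 2 and q % 4 != 3 for q in odd):
        return product
    return 4 * product

print(modulus(7), modulus(126), modulus(5))
```

For d = 7 and d = 126 this returns the moduli 28 and 56 used in the worked examples of this chapter.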
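The formula for X− (d) can now be obtained from the one for X+ (d) by first observing
that as a consequence of the above determination of m(d),
\[ \pi(m(d)) \cup \{2\} = \pi_{\mathrm{odd}}(d) \cup \{2\}, \]
and so
\[ \pi(d) \cup \{2\} = \pi(m(d)) \cup \{2\} \cup \pi_{\mathrm{even}}(d). \]
Upon recalling that P denotes the set of all primes, it follows that
\begin{align*}
X_-(d) &= P \setminus \bigl( X_+(d) \cup \{2\} \cup \pi(d) \bigr) \\
&= P \setminus \bigl( \pi(m(d)) \cup \{2\} \cup X_+(d) \cup \pi_{\mathrm{even}}(d) \bigr) \\
&= \bigl( P \setminus ( \pi(m(d)) \cup \{2\} ) \bigr) \setminus \bigl( X_+(d) \cup \pi_{\mathrm{even}}(d) \bigr).
\end{align*}
Because
\begin{align*}
P \setminus \bigl( \pi(m(d)) \cup \{2\} \bigr) &= \bigcup_{n \in U(m(d))} \{p : p \equiv n \bmod m(d)\}, \\
X_+(d) \cup \pi_{\mathrm{even}}(d) &= \Bigl( \bigcup_{n \in V(d)} \{p : p \equiv n \bmod m(d)\} \Bigr) \cup \pi_{\mathrm{even}}(d),
\end{align*}
it hence follows that
\[ X_-(d) = \Bigl( \bigcup_{n \in U(m(d)) \setminus V(d)} \{p : p \equiv n \bmod m(d)\} \Bigr) \setminus \pi_{\mathrm{even}}(d). \]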

The set V (d) that appears in the formulae which calculate X± (d) is obtained from applications
of the method of successive substitution to the calculation of each of the sets RE
which appears in equation (2) or (5) of Chapter 2. A natural question which arises asks:
are all of the integers in V (d) and U(m(d)) \ V (d) which arise from these calculations required
for the determination of X± (d)? The answer is yes, if for each pair of relatively prime
positive integers m and n, the set {z ∈ Z : z ≡ n mod m} contains primes. Remarkably
enough, {z ∈ Z : z ≡ n mod m} in fact always contains infinitely many primes. This is a
famous theorem of Dirichlet [10], and the connection of that theorem to the calculation of
X± (d) was Dirichlet’s primary motivation for proving it. Much more is to come (in section
4 below) about Dirichlet’s theorem and its use in the study of residues and non-residues.
We next illustrate the procedure which we have described for the solution of the Basic
Problem by calculating X± (126). From the calculations using this example that we performed
in section 2 of Chapter 2, it follows that
\[ X_+(126) = \Bigl( \bigl( X_+(2) \cap X_+(7) \bigr) \cup \bigl( X_-(2) \cap X_-(7) \bigr) \Bigr) \setminus \{3\}, \]
hence we must calculate X+ (2) ∩ X+ (7) and X− (2) ∩ X− (7).


Calculation of X+ (2) ∩ X+ (7).
Theorem 2.6 implies that
X+ (2) = {p : p ≡ 1 or 7 mod 8},

and we have from the calculation of X+ (7) above that

X+ (7) = {p : p ≡ 1, 3, 9, 19, 25, or 27 mod 28}.

In order to calculate X+ (2) ∩ X+ (7), we need to solve at most 12 (but in fact exactly six)
pairs of simultaneous congruences. We do this by applying Proposition 4.1. We have that
gcd(8, 28) = 4, lcm(8, 28) = 56, and so Proposition 4.1 implies that X+ (2) ∩ X+ (7) consists
of the union of all odd prime simultaneous solutions of the congruence pairs

x ≡ 1 mod 8, x ≡ 1 mod 28,

x ≡ 1 mod 8, x ≡ 9 mod 28,


x ≡ 1 mod 8, x ≡ 25 mod 28,
x ≡ 7 mod 8, x ≡ 3 mod 28,
x ≡ 7 mod 8, x ≡ 19 mod 28,
x ≡ 7 mod 8, x ≡ 27 mod 28,
whose odd prime solutions are, respectively,

p ≡ 1 mod 56,

p ≡ 9 mod 56,
p ≡ 25 mod 56,
p ≡ 31 mod 56,
p ≡ 47 mod 56,
p ≡ 55 mod 56.
Calculation of X− (2) ∩ X− (7).
From Theorem 2.6 and the calculation of X− (7) above, it follows that

X− (2) = {p : p ≡ 3 or 5 mod 8},

X− (7) = {p : p ≡ 5, 11, 13, 15, 17, or 23 mod 28}.


Hence, again according to Proposition 4.1, X− (2) ∩ X− (7) consists of the union of all odd
prime simultaneous solutions of the congruence pairs

x ≡ 3 mod 8, x ≡ 11 mod 28,

x ≡ 3 mod 8, x ≡ 15 mod 28,


x ≡ 3 mod 8, x ≡ 23 mod 28,
x ≡ 5 mod 8, x ≡ 5 mod 28,

x ≡ 5 mod 8, x ≡ 13 mod 28,


x ≡ 5 mod 8, x ≡ 17 mod 28,
whose odd prime solutions are, respectively,

p ≡ 11 mod 56,

p ≡ 43 mod 56,
p ≡ 51 mod 56,
p ≡ 5 mod 56,
p ≡ 13 mod 56,
p ≡ 45 mod 56.
From this calculation of X+ (2) ∩ X+ (7) and X− (2) ∩ X− (7), it hence follows that

X+ (126) = {p : p ≡ 1, 5, 9, 11, 13, 25, 31, 43, 45, 47, 51, or 55 mod 56}.
In order to calculate X− (126), we simply delete from U(56) the minimal positive ordinary
residues mod 56 that determine X+ (126): the integers resulting from that are 3, 15, 17, 19,
23, 27, 29, 33, 37, 39, 41, and 53. Hence
X− (126) = {p ≠ 3 : p ≡ 3, 15, 17, 19, 23, 27, 29, 33, 37, 39, 41, or 53 mod 56}.
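As with the example d = 7, the outcome can be confirmed by direct computation. The following sketch (our own helper names, Euler's criterion for the Legendre symbol) sorts the primes 5 ≤ p < 5000, p ≠ 7, by the sign of χp (126) and records their residue classes mod 56.

```python
def is_prime(n):
    """Trial division; adequate for the small primes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

# Primes not dividing 126 = 2 * 3^2 * 7.
primes = [p for p in range(5, 5000) if is_prime(p) and p != 7]

plus = sorted({p % 56 for p in primes if legendre(126, p) == 1})
minus = sorted({p % 56 for p in primes if legendre(126, p) == -1})
print(plus)
print(minus)
```

The two printed lists should agree with the residue classes mod 56 found above for X+ (126) and X− (126).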

3. Sets of Integers which are Quadratic Residues of Infinitely Many Primes

In this section we will use the LQR to investigate when a finite non-empty subset of
positive integers is the set of residues for infinitely many primes. We start by looking at
singleton sets. Obviously, if a ∈ Z is a square then a is a residue of all primes. Is the converse
true, i.e., if a positive integer is a residue of all primes, must it be a square? The answer is
yes; in fact a slightly stronger statement is valid:

Theorem 4.2. A positive integer is a residue of all but finitely many primes if and only
if it is a square.

This theorem implies that if S is a nonempty finite subset of [1, ∞) then S is a set of
residues for all but finitely many primes if and only if every element of S is a square. What if
we weaken the requirement that S be a set of residues of all but finitely many primes to the
requirement that S be a set of residues for only infinitely many primes? Then the somewhat
surprising answer is asserted by

Theorem 4.3. If S is any nonempty finite subset of [1, ∞) then S is a set of residues of
infinitely many primes.

Theorems 4.2 and 4.3 are simple consequences of

Lemma 4.4. (Basic Lemma) If Π = {p1 , . . . , pk } is a nonempty finite set of primes and
if ε : Π → {−1, 1} is a fixed function then there exist infinitely many primes p such that

χp (pi ) = ε(pi ), i ∈ [1, k].

N.B. This lemma asserts that if all of the integers in the set S of Theorem 4.3 are prime,
then for any pattern of +1’s or −1’s attached to the elements of S, the Legendre symbol χp
reproduces that pattern on S for infinitely many primes p. Thus the conclusion of Theorem
4.3 can be strengthened considerably when S is a set of primes.
Assume Lemma 4.4 for now. We will use it to first prove Theorems 4.2 and 4.3 and
then we will use quadratic reciprocity (and Dirichlet’s theorem on primes in arithmetic
progression) to prove Lemma 4.4.
Proof of Theorem 4.2.
Suppose that n ∈ [1, ∞) is not a square. Then πodd (n) ≠ ∅ and
\[ \chi_p(n) = \prod_{q \in \pi_{\mathrm{odd}}(n)} \chi_p(q), \quad \text{for all } p \notin \pi(n). \tag{3} \]
Now take any fixed q0 ∈ πodd (n) and define ε : πodd (n) → {−1, 1} by
\[ \varepsilon(q) = \begin{cases} -1, & \text{if } q = q_0, \\ 1, & \text{if } q \neq q_0. \end{cases} \]
Lemma 4.4 implies that there exist infinitely many primes p such that

χp (q) = ε(q), for all q ∈ πodd (n),

and so the product in (3), and hence χp (n), is −1 for all such p ∉ π(n). QED
Proof of Theorem 4.3.
Let S be a fixed nonempty finite subset of positive integers and let
\[ X = \bigcup_{z \in S} \pi_{\mathrm{odd}}(z). \]
We may assume that X ≠ ∅; otherwise all elements of S are squares and Theorem 4.3 is
trivially true in that case. Then Lemma 4.4 implies that there exist infinitely many primes
p such that
χp (q) = 1, for all q ∈ X,
hence for all such p which are not factors of an element of S,
\[ \chi_p(z) = \prod_{q \in \pi_{\mathrm{odd}}(z)} \chi_p(q) = 1, \quad \text{for all } z \in S. \]
QED
Proof of Lemma 4.4.
It follows from our solution of the Fundamental Problem for all primes (Theorem 2.6
and the calculation of X± (q), q an odd prime, in section 1 of this chapter) that Lemma
4.4 is valid when Π is a singleton, so assume that k ≥ 2. We will make use of arithmetic
progressions in this argument, and so if a, b ∈ [1, ∞), let

AP (a, b) = {a + nb : n ∈ [0, ∞)}

denote the arithmetic progression with initial term a and common difference b. We will find
the primes that will verify the conclusion of Lemma 4.4 by looking inside certain arithmetic
progressions, hence we will need the following theorem, one of the basic results in the theory
of prime numbers:

Theorem 4.5. (Dirichlet’s theorem on primes in arithmetic progression). If {a, b} ⊆
[1, ∞) and gcd(a, b) = 1 then AP (a, b) contains infinitely many primes.

The key ideas in Dirichlet’s proof of Theorem 4.5 will be discussed in due course. For
now, assume that the elements of the set Π in the hypothesis of Lemma 4.4 are ordered as
p1 < · · · < pk and fix ε : Π → {−1, 1}. We need to verify the conclusion of Lemma 4.4 for
this ε. Suppose first that p1 = 2 and ε(2) = 1. If i ∈ [2, k] and ε(pi ) = 1, let ki = 1, and if
ε(pi ) = −1, let ki be an odd non-residue of pi such that gcd(pi , ki ) = 1 (if ε(pi ) = −1 then
such a ki can always be chosen: simply pick any non-residue x of pi in [1, pi − 1]; if x is odd,
set ki = x, and if x is even, set ki = x + pi ).
Now, suppose that i ∈ [2, k] , p ≡ 1 mod 8, and p ∈ AP (ki , 2pi ), say p = ki + 2pi n, for
some n ∈ [1, ∞). Then LQR implies that

χp (pi ) = χpi (p) = χpi (ki + 2pi n) = χpi (ki ).

It follows from Theorem 2.6 and the choice of ki that

χp (2) = 1 and χp (pi ) = ε(pi ).

Hence
\[ \text{if } p \equiv 1 \bmod 8 \text{ and } p \in \bigcap_{i=2}^{k} AP(k_i, 2p_i), \text{ then } \chi_p(p_i) = \varepsilon(p_i), \text{ for all } i \in [1, k]. \tag{4} \]
We prove next that there are infinitely many primes ≡ 1 mod 8 inside $\bigcap_{i=2}^{k} AP(k_i, 2p_i)$.
To see this, we first use the fact that each ki is odd and an inductive construction obtained
from solving an appropriate sequence of linear Diophantine equations (Proposition 1.4) to
obtain an integer m such that
\[ AP(k_2 + 2m, 8p_2 \cdots p_k) \subseteq AP(1, 8) \cap \Bigl( \bigcap_{i=2}^{k} AP(k_i, 2p_i) \Bigr). \tag{5} \]

We then claim that gcd(k2 + 2m, 8p2 · · · pk ) = 1. If this is true then by virtue of Theorem
4.5, we have that AP (k2 + 2m, 8p2 · · · pk ) contains infinitely many primes p, hence for any
such p, it follows from (4) and (5) that

(6) χp (pi ) = ε(pi ), i ∈ [1, k],

the conclusion of Lemma 4.4. To verify the claim, assume by way of contradiction that q is
a common prime factor of k2 + 2m and 8p2 · · · pk . Then q ≠ 2 because k2 is odd, hence there
is a j ∈ [2, k] such that q = pj . But (5) implies that there exists n ∈ [0, ∞) such that

k2 + 2m + 8p2 · · · pk = kj + 2npj ,

and so pj divides kj , contrary to the choice of kj .


If p1 = 2 and ε(2) = −1, a similar argument shows that $\bigcap_{i=2}^{k} AP(k_i, 2p_i)$ contains
infinitely many primes p ≡ 5 mod 8, hence (6) is true for all such p. If p1 ≠ 2, simply adjoin
2 to Π and repeat this argument. QED
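To make the construction in this proof concrete, here is a small worked sketch for the case p1 = 2, ε(2) = 1. The choices Π = {2, 3, 5} and ε(3) = ε(5) = −1 are our own illustrative assumptions, as are the helper names. The integers ki are chosen as in the proof, and the progressions AP (1, 8) and AP (ki , 2pi ) are intersected by successive substitution rather than through Proposition 1.4.

```python
from math import gcd

def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def is_prime(n):
    return n >= 2 and all(n % i for i in range(2, int(n ** 0.5) + 1))

def odd_nonresidue(q):
    """A non-residue x of q in [1, q-1], replaced by x + q when x is even."""
    x = next(x for x in range(1, q) if legendre(x, q) == -1)
    return x if x % 2 == 1 else x + q

def merge(a1, m1, a2, m2):
    """Intersect x = a1 (mod m1) with x = a2 (mod m2), or return None."""
    g = gcd(m1, m2)
    if (a1 - a2) % g:
        return None
    n2 = m2 // g
    inverse = pow(m1 // g, -1, n2) if n2 > 1 else 0
    x0 = ((a2 - a1) // g * inverse) % n2
    lcm = m1 // g * m2
    return (a1 + x0 * m1) % lcm, lcm

def progression(eps):
    """Intersect AP(1, 8) with AP(k_i, 2 p_i) for the odd primes p_i in eps."""
    a, m = 1, 8
    for q in sorted(eps):
        k = 1 if eps[q] == 1 else odd_nonresidue(q)
        a, m = merge(a, m, k, 2 * q)   # both sides are odd, so this succeeds
    return a, m

eps = {3: -1, 5: -1}          # hypothetical sign pattern on the odd primes
a, m = progression(eps)
found = [p for p in range(a, 3000, m) if is_prime(p)]
print(a, m, found[:5])
```

Every prime in `found` is ≡ 1 mod 8 and reproduces the prescribed signs on 3 and 5, in agreement with (4).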

4. Intermezzo: Dirichlet’s Theorem on Primes in Arithmetic Progression

In addition to the LQR, Theorem 4.5 also played a key role in the proof of the basic
Lemma 4.4, and thus also in the proofs of Theorems 4.2 and 4.3. Because they will play such
an important role in our story, we will now discuss the key ingredients of Dirichlet’s proof
of Theorem 4.5. Dirichlet [10] proved this in 1837, and it would be hard to overemphasize
the importance of this theorem and the methods Dirichlet developed to prove it. As we
shall see, he used analysis, specifically the theory of infinite series and infinite products of
complex-valued functions of a real variable, and in subsequent work [11] also the theory of
Fourier series, to discover properties of the primes (for the reader who may benefit from
it, we briefly discuss analytic functions, Fourier series, and some of their basic properties
in Chapter 7). His use of continuous methods to prove deep results about discrete sets
like the prime numbers was not only a revolutionary insight, but also caused a sensation
in the nineteenth century mathematical community. Dirichlet’s results founded the subject
of analytic number theory, which has become one of the most important areas and a major
industry in number theory today. Later (in Chapters 5 and 7) we will also see how Dirichlet
used analytic methods to study important properties of residues and non-residues.

Dirichlet is a towering figure in the history of number theory not only because of the
many results and methods of fundamental importance which he discovered and developed in
that subject but also because of his role as an expositor of that work and the work of Gauss.
We have already given an indication of how the work of Gauss, especially the Disquisitiones
Arithmeticae, brought about a revolutionary transformation in number theory. However,
the influence of Gauss’ work was rather slow to be realized, due primarily to the difficulty
that many of his mathematical contemporaries had in understanding exactly how Gauss had
done what he had done in the Disquisitiones. Dirichlet is said to have been the first person
to completely master the Disquisitiones, and legend has it that he was never without a copy
of it within easy reach. Many of the results and techniques that Gauss developed in the
Disquisitiones were first explained in a more accessible way in Dirichlet’s great text [12],
the Vorlesungen über Zahlentheorie; John Stillwell, the translator of the Vorlesungen into
English, called it one of the most important mathematics books of the nineteenth century:
the link between Gauss and the number theory of today. If a present-day reader of the
Disquisitiones finds much of it easier to understand than a reader in the early days of the
nineteenth century did, it is because that modern reader learned number theory the way
that Dirichlet first taught it.
Now, back to primes in arithmetic progression. In 1737, Euler proved that the series
$\sum_{q \in P} \frac{1}{q}$ diverges and hence deduced Euclid’s theorem that there are infinitely many primes.
Taking his cue from this result, Dirichlet sought to prove that
\[ \sum_{p \equiv a \bmod b} \frac{1}{p} \]
diverges, where a and b are given positive relatively prime integers, thereby showing that
the arithmetic progression with constant term a and difference b contains infinitely many
primes. To do this, he studied the behavior as s → 1+ of the function of s defined by
\[ \sum_{p \equiv a \bmod b} \frac{1}{p^s}. \]
This function is difficult to get a handle on; it would be easier if we could replace it by a
sum indexed over all of the primes, so consider
\[ \sum_{p} \delta(p) p^{-s}, \quad \text{where } \delta(p) = \begin{cases} 1, & \text{if } p \equiv a \bmod b, \\ 0, & \text{otherwise.} \end{cases} \]

Dirichlet’s profound insight was to replace δ(p) by certain functions which capture the be-
havior of δ(p) closely enough, but which are more amenable to analysis relative to primes in
the ordinary residue classes mod b. We now define these functions.

Begin by recalling that if A is a commutative ring with identity 1 then a unit u of A is an
element of A that has a multiplicative inverse in A, i.e., there exists v ∈ A such that uv = 1.
The set of all units of A forms a group under the multiplication of A, called the group of
units of A. Consider now the ring Z/bZ of ordinary residue classes of Z mod b. Proposition
1.2 implies that the group of units of Z/bZ consists of all ordinary residue classes that are
determined by the integers that are relatively prime to b. If we hence identify Z/bZ in the
usual way with the set of ordinary non-negative minimal residues [0, b−1] on which is defined
the addition and multiplication induced by addition and multiplication of ordinary residue
classes, it follows that
U(b) = {n ∈ [1, b − 1] : gcd(n, b) = 1}
is the group of units of Z/bZ, and we set

ϕ(b) = |U(b)|;

ϕ is called Euler’s totient function.


Let T denote the circle group of all complex numbers of modulus 1, with the group
operation defined by ordinary multiplication of complex numbers. A homomorphism of U(b)
into T is called a Dirichlet character modulo b. We denote by χ0 the principal character
modulo b, i.e., the character which sends every element of U(b) to 1 ∈ T . If χ is a Dirichlet
character modulo b, we extend it to all integers z by setting χ(z) = χ(n) if there exists
n ∈ U(b) such that z ≡ n mod b, and setting χ(z) = 0, otherwise. It is then easy to verify

Proposition 4.6. A Dirichlet character χ modulo b is
(i) of period b, i.e., χ(n) = 0 if and only if gcd(n, b) > 1 and χ(m) = χ(n) whenever
m ≡ n mod b, and is
(ii) completely multiplicative, i.e., χ(mn) = χ(m)χ(n) for all m, n ∈ Z.

We say that a Dirichlet character is real if it is real-valued, i.e., its range is either the set
{0, 1} or [−1, 1]. In particular the Legendre symbol χp is a real Dirichlet character mod p.
For each modulus b, the structure theory of finite abelian groups can be used to explicitly
construct all Dirichlet characters mod b; we will not do this, and instead refer the interested
reader to Hecke [27], section 10 or Davenport [6], pp. 27-30. In particular there are exactly
ϕ(b) Dirichlet characters mod b.
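The two properties in Proposition 4.6 are easy to test for the Legendre symbol. The sketch below extends χ7 to all integers, computing it by Euler's criterion, and checks periodicity, the vanishing condition, and complete multiplicativity on a range of integers; the function name `chi` and the choice p = 7 are ours.

```python
from math import gcd

def chi(n, p=7):
    """Legendre symbol mod the odd prime p, extended to all integers n."""
    r = pow(n % p, (p - 1) // 2, p)   # Euler's criterion; gives 0 when p | n
    return -1 if r == p - 1 else r

values = [chi(n) for n in range(7)]
print(values)

test_range = range(-20, 21)
# (i) period 7, and chi(n) = 0 exactly when gcd(n, 7) > 1
periodic = all(chi(n + 7) == chi(n) for n in test_range)
vanishing = all((chi(n) == 0) == (gcd(n, 7) > 1) for n in test_range)
# (ii) complete multiplicativity
multiplicative = all(chi(m * n) == chi(m) * chi(n)
                     for m in test_range for n in test_range)
print(periodic, vanishing, multiplicative)
```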
The connection between Dirichlet characters and primes in arithmetic progression can
now be made. If gcd(a, b) = 1 then Dirichlet showed that
\[ \frac{1}{\varphi(b)} \sum_{\chi} \overline{\chi(a)}\, \chi(p) = \begin{cases} 1, & \text{if } p \equiv a \bmod b, \\ 0, & \text{otherwise,} \end{cases} \]
where the bar denotes complex conjugation and the sum is taken over all Dirichlet characters χ mod b. These are the so-called
orthogonality relations for the Dirichlet characters. This equation says that the characteristic
function δ(p) of the primes in an ordinary equivalence class mod b can be written as a linear
combination of Dirichlet characters. Hence
\begin{align*}
\sum_{p \equiv a \bmod b} \frac{1}{p^s} &= \sum_{p} \delta(p) p^{-s} \\
&= \sum_{p} \Bigl( \frac{1}{\varphi(b)} \sum_{\chi} \overline{\chi(a)}\, \chi(p) \Bigr) p^{-s} \\
&= \frac{1}{\varphi(b)} \sum_{p} p^{-s} + \frac{1}{\varphi(b)} \sum_{\chi \neq \chi_0} \overline{\chi(a)} \Bigl( \sum_{p} \chi(p) p^{-s} \Bigr).
\end{align*}

After observing that
\[ \lim_{s \to 1^+} \sum_{p} p^{-s} = +\infty, \]
Dirichlet deduced immediately from the above equations the following lemma:

Lemma 4.7. $\lim_{s \to 1^+} \sum_{p \equiv a \bmod b} p^{-s} = +\infty$ if for each non-principal Dirichlet character
χ mod b, $\sum_{p} \chi(p) p^{-s}$ is bounded as s → 1+ .

Hence Theorem 4.5 will follow if one can prove that
\[ \text{for all non-principal Dirichlet characters } \chi \bmod b, \ \sum_{p} \chi(p) p^{-s} \text{ is bounded as } s \to 1^+. \tag{7} \]
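The orthogonality relations that drive this reduction can be verified numerically in a simple case. The sketch below is restricted to a prime modulus b, for which U(b) is cyclic, so that all ϕ(b) = b − 1 characters can be written down from a primitive root g; the function names are our own, and the primitive root (here g = 3 for b = 7) is supplied by hand and assumed correct.

```python
import cmath

def characters(b, g):
    """All Dirichlet characters mod an odd prime b, from a primitive root g.

    Each character is returned as a dict n -> chi(n) for n in U(b).
    """
    order = b - 1
    index = {pow(g, k, b): k for k in range(order)}   # discrete logarithms
    return [{n: cmath.exp(2j * cmath.pi * j * index[n] / order) for n in index}
            for j in range(order)]

def delta(a, n, b, chars):
    """(1/phi(b)) * sum over chi of conj(chi(a)) * chi(n), for n in U(b)."""
    total = sum(chi[a % b].conjugate() * chi[n % b] for chi in chars)
    return total / len(chars)

chars = characters(7, 3)
# The sum is 1 when n = a (mod 7) and 0 otherwise, up to rounding error.
table = [[round(abs(delta(a, n, 7, chars))) for n in range(1, 7)]
         for a in range(1, 7)]
for row in table:
    print(row)
```

The printed table should be the 6 × 6 identity matrix, which is the statement that the averaged sum is the indicator of n ≡ a mod 7.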

Let χ be a given Dirichlet character. In order to verify (7), Dirichlet introduced his next
deep insight into the problem by considering the function
\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}, \quad s \in \mathbb{C}, \]
which has come to be known as the Dirichlet L-function of χ. We will prove in Chapter 7
that L(s, χ) is analytic in the half-plane Re s > 1, satisfies the infinite-product formula
\[ L(s, \chi) = \prod_{q \in P} \frac{1}{1 - \chi(q) q^{-s}}, \quad \mathrm{Re}\, s > 1, \]
the Euler-Dirichlet product formula, and is analytic in Re s > 0 whenever χ is non-principal.
One can take the complex logarithm of both sides of the Euler-Dirichlet product formula to
deduce that
\[ \log L(s, \chi) = \sum_{n=2}^{\infty} \frac{\chi(n) \Lambda(n)}{\log n}\, n^{-s}, \quad \mathrm{Re}\, s > 1, \]
where
\[ \Lambda(n) = \begin{cases} \log q, & \text{if } n \text{ is a power of } q, \ q \in P, \\ 0, & \text{otherwise.} \end{cases} \]
Using algebraic properties of the character χ and the function Λ, Dirichlet proved that (7)
is true if

(8) log L(s, χ) is bounded as s → 1+ whenever χ is non-principal.
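The Euler-Dirichlet product formula can be observed numerically. As a sketch, take χ to be the non-principal character mod 4 and s = 2; both choices, the truncation points, and the function names below are our own assumptions. The partial sums of the series and the partial products over primes should then approach the same value.

```python
def chi4(n):
    """The non-principal Dirichlet character mod 4."""
    return {1: 1, 3: -1}.get(n % 4, 0)

def series(s, terms):
    """Partial sum of the Dirichlet series for L(s, chi4)."""
    return sum(chi4(n) / n ** s for n in range(1, terms + 1))

def euler_product(s, bound):
    """Partial Euler-Dirichlet product over the primes q <= bound."""
    sieve = bytearray([1]) * (bound + 1)
    product = 1.0
    for q in range(2, bound + 1):
        if sieve[q]:                                   # q is prime
            for multiple in range(q * q, bound + 1, q):
                sieve[multiple] = 0
            product *= 1.0 / (1.0 - chi4(q) / q ** s)
    return product

series_value = series(2, 100000)
product_value = euler_product(2, 100000)
print(series_value, product_value)
```

With these truncations the two numbers agree to several decimal places; the common limit L(2, χ) is Catalan's constant, approximately 0.9159656.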

We should point out that Dirichlet did not use functions of a complex variable in his work,
but instead worked only with real values of the variable s (Cauchy’s theory of analytic
functions of a complex variable, although fully developed by 1825, did not become well-known
or commonly employed until the 1840’s). Because L(s, χ) is continuous on Re s > 0,
it follows that
\[ \lim_{s \to 1^+} \log L(s, \chi) = \log L(1, \chi), \]
hence (8) will hold if
\[ L(1, \chi) \neq 0 \text{ whenever } \chi \text{ is non-principal.} \]
We have at last come to the heart of the matter, namely

Lemma 4.8. If χ is a non-principal Dirichlet character then L(1, χ) ≠ 0.

If χ is not real, Lemma 4.8 is fairly easy to prove, but when χ is real, this task is
much more difficult to do. Dirichlet deduced Lemma 4.8 for real characters by using results
from the classical theory of quadratic forms; he established a remarkable formula which
calculates L(1, χ) as the product of a certain parameter and the number of equivalence
classes of quadratic forms (section 12, Chapter 3); because this parameter and the number
of equivalence classes are clearly positive, L(1, χ) must be nonzero. At the conclusion of
Chapter 7, we will give an elegant proof of Lemma 4.8 for real characters due to de la Vallée
Poussin [45], and then in Chapter 8 we will prove Dirichlet’s class-number formula for the
value of L(1, χ).
Finally, we note that if χ0 is the principal character mod b then it is a consequence of
the Euler-Dirichlet product formula that
\[ L(s, \chi_0) = \zeta(s) \prod_{q \mid b} \bigl( 1 - q^{-s} \bigr), \]
where
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \]
is the Riemann zeta function.
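This identity is also easy to test numerically. The sketch below takes b = 6 and s = 2 (our own choices) and uses the classical value ζ(2) = π²/6, so that the right-hand side equals (π²/6)(1 − 1/4)(1 − 1/9) = π²/9; the function names are hypothetical.

```python
from math import gcd, pi

def principal_L(s, b, terms):
    """Partial sum of L(s, chi_0) for the principal character mod b."""
    return sum(1.0 / n ** s for n in range(1, terms + 1) if gcd(n, b) == 1)

def zeta_times_factors(s, b, zeta_value):
    """zeta(s) times the product of (1 - q^(-s)) over the primes q dividing b."""
    result, n, q = zeta_value, b, 2
    while n > 1:
        if n % q == 0:
            result *= 1.0 - q ** (-s)
            while n % q == 0:
                n //= q
        q += 1
    return result

lhs = principal_L(2, 6, 200000)
rhs = zeta_times_factors(2, 6, pi ** 2 / 6)
print(lhs, rhs)
```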

At this first appearance in our story of ζ(s), probably the single most important function
in analytic number theory, we cannot resist briefly discussing the

Riemann Hypothesis: all zeros of ζ(s) in the strip 0 < Re s < 1 have real part 1/2.

Generalized Riemann Hypothesis (GRH): if χ is a Dirichlet character then all zeros of
L(s, χ) in the strip 0 < Re s ≤ 1 have real part 1/2.

Riemann [47] first stated the Riemann Hypothesis (in an equivalent form) in a paper that
he published in 1859, in which he derived an explicit formula for the number of primes not
exceeding a given real number. By general agreement, verification of the Riemann Hypoth-
esis is the most important unsolved problem in mathematics. One of the most immediate
consequences of the truth of the Riemann Hypothesis, and arguably the most significant, is
the essentially optimal error estimate for the asymptotic approximation of the cardinality
of the set {q ∈ P : q ≤ x} given in the Prime Number Theorem (see the statement of
this theorem in the next section). This estimate asserts that there is an absolute, positive
constant C such that for all x sufficiently large,
\[ \left| \frac{|\{q \in P : q \le x\}|}{\int_2^x \frac{1}{\log t}\, dt} - 1 \right| \le \frac{C (\log x)^2}{\sqrt{x}}. \]
The integral $\int_2^x \frac{1}{\log t}\, dt$ appearing in this inequality, the logarithmic integral of x, is generally
a better asymptotic approximation to the cardinality of {q ∈ P : q ≤ x} than the quotient
x/ log x. Hilbert emphasized the importance of the Riemann Hypothesis in Problem 8 on
his famous list of 23 open problems that he presented in 1900 in his address to the second
International Congress of Mathematicians. In 2000, the Clay Mathematics Institute (CMI)
published a series of seven open problems in mathematics that are considered to be of
exceptional importance and have long resisted solution. In order to encourage work on these
problems, which have come to be known as the Clay Millennium Prize Problems, for each
problem CMI will award to the first person(s) to solve it $1,000,000 (US). The proof of the
Riemann Hypothesis is the second Millennium Prize Problem (as currently listed on the CMI
web site).

5. The Asymptotic Density of Primes

Theorem 4.3 gives rise to the following natural and interesting question: if S is a
nonempty, finite subset of [1, ∞), how large is the necessarily infinite set of primes

{p : χp ≡ 1 on S} ?

(The meaning of the symbol ≡ used here is as an identity of functions, not as a modular
congruence; in subsequent uses of this symbol, its meaning will be clear from the context.)
To formulate this question precisely, we need a good way to measure the size of an infinite
set of primes. This is provided by the concept of the asymptotic density of a set of primes,
which we will discuss in this section.
If Π is a set of primes and P denotes the set of all primes then the asymptotic density of
Π in P is
\[ \lim_{x \to +\infty} \frac{|\{p \in \Pi : p \le x\}|}{|\{p \in P : p \le x\}|}, \]

provided that this limit exists. Roughly speaking, the density of Π is the “proportion” of
the set P that is occupied by Π. Since the asymptotic density of any finite set is clearly 0
and the asymptotic density of any set whose complement in P is finite is clearly 1, only sets
of primes which are infinite and have an infinite complement in P are of interest in terms of
their asymptotic densities. We can in fact be a bit more precise: recall that if a(x) and b(x)
denote positive real-valued functions defined on (0, +∞), then a(x) is asymptotic to b(x) as
x → +∞, denoted by a(x) ∼ b(x), if
\[ \lim_{x \to +\infty} \frac{a(x)}{b(x)} = 1. \]

The Prime Number Theorem (LeVeque, [39], chapter 7; Montgomery and Vaughan, [41],
chapter 6) asserts that as x → +∞,
\[ |\{q \in P : q \le x\}| \sim \frac{x}{\log x}, \]
consequently, if d is the density of Π then as x → +∞,
\[ |\{q \in \Pi : q \le x\}| \sim d\, \frac{x}{\log x}. \]
Hence the asymptotic density of Π provides a way to measure precisely the “asymptotic
cardinality” of Π.
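A short computation gives a feeling for these notions. The sketch below (our own function names, a simple sieve of Eratosthenes) counts the primes up to x = 10^6, compares the count with x/ log x, and computes the proportion of those primes that are ≡ 1 mod 4, a set whose asymptotic density is 1/2 by the Prime Number Theorem for primes in arithmetic progressions.

```python
from math import log

def primes_up_to(n):
    """Sieve of Eratosthenes: the list of all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for q in range(2, int(n ** 0.5) + 1):
        if sieve[q]:
            sieve[q * q :: q] = bytearray(len(range(q * q, n + 1, q)))
    return [p for p in range(n + 1) if sieve[p]]

x = 10 ** 6
primes = primes_up_to(x)

# Ratio of the prime count to x / log x; it tends to 1, but slowly.
ratio_pnt = len(primes) / (x / log(x))

# Proportion of the primes <= x lying in the class 1 mod 4.
density_1_mod_4 = sum(1 for p in primes if p % 4 == 1) / len(primes)

print(len(primes), ratio_pnt, density_1_mod_4)
```

The ratio is still noticeably larger than 1 at x = 10^6, which illustrates how slowly the limit in the Prime Number Theorem is approached, whereas the proportion of primes ≡ 1 mod 4 is already close to 1/2.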

6. The Density of Primes which have a Given Finite Set of Quadratic Residues

Theorem 4.3 asserts that if S is a given nonempty finite set of positive integers then the
set of primes {p : χp ≡ 1 on S} is infinite. In this section, we will prove a theorem which
provides a way to calculate the density of the set {p : χp ≡ 1 on S}. This will be given by
a formula which depends on a certain combinatorial parameter that is determined by the
prime factors of the elements of S. In order to formulate this result, let F denote the Galois
field GF (2) of 2 elements, which can be concretely realized as the field Z/2Z of ordinary
residue classes mod 2. Let A ⊆ [1, ∞). If n = |A|, then we let $F^n$ denote the vector space
over F of dimension n, arrange the elements a1 < · · · < an of A in increasing order, and
then define the map $v : 2^A \to F^n$ like so: if B ⊆ A then
\[ \text{the } i\text{-th coordinate of } v(B) = \begin{cases} 1, & \text{if } a_i \in B, \\ 0, & \text{if } a_i \notin B. \end{cases} \]
If we recall that πodd (z) denotes the set of all prime factors of odd multiplicity of the integer
z then we can now state (and eventually prove) the following theorem:

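Theorem 4.9. If S is a nonempty, finite subset of [1, ∞),
\[ \mathcal{S} = \{\pi_{\mathrm{odd}}(z) : z \in S\}, \]
\[ A = \bigcup_{X \in \mathcal{S}} X, \]
\[ n = |A|, \]
and
\[ d = \text{the dimension of the linear span of } v(\mathcal{S}) \text{ in } F^n, \]
then the density of {p : χp ≡ 1 on S} is $2^{-d}$.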

Theorem 4.9 reduces the calculation of the density of {p : χp ≡ 1 on S} to prime
factorization of the integers in S and linear algebra over F . If we enumerate the nonempty
elements of $\mathcal{S} = \{\pi_{\mathrm{odd}}(z) : z \in S\}$ as S1 , . . . , Sm (if there are no such elements then S consists entirely of squares,
hence the density is clearly 1) then d is just the rank over F of the m × n matrix
\[ \begin{pmatrix} v(S_1)(1) & \cdots & v(S_1)(n) \\ \vdots & & \vdots \\ v(S_m)(1) & \cdots & v(S_m)(n) \end{pmatrix}, \]
where v(Si )(j) is the j-th coordinate of v(Si ). This matrix is often referred to as the incidence
matrix of S. Because there are only two elementary row (column) operations over F , namely
row (column) interchange and addition of a row (column) to another row (column), the rank
of this matrix is easily calculated by Gauss-Jordan elimination. However, this procedure
requires that we first find the prime factors of odd multiplicity of each element of S, and
that, in general, is not so easy!
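The recipe of Theorem 4.9 can be sketched as follows. The code is an illustration under our own naming: each row of the incidence matrix is stored as an integer bitmask, the rank over F is found by elimination with exclusive-or, and factorization is done by naive trial division, which is only practical for small integers.

```python
def odd_multiplicity_primes(z):
    """Primes dividing z to an odd power (the set pi_odd(z) of the text)."""
    n, q, result = z, 2, []
    while q * q <= n:
        exponent = 0
        while n % q == 0:
            n //= q
            exponent += 1
        if exponent % 2 == 1:
            result.append(q)
        q += 1
    if n > 1:
        result.append(n)
    return result

def gf2_rank(rows):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    pivots = {}                    # leading bit -> reduced row with that bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]     # addition of rows over GF(2)
    return len(pivots)

def density_exponent(S):
    """The integer d of Theorem 4.9, so that the density is 2**(-d)."""
    supports = [odd_multiplicity_primes(z) for z in S]
    A = sorted({q for support in supports for q in support})
    position = {q: i for i, q in enumerate(A)}
    rows = [sum(1 << position[q] for q in support) for support in supports]
    return gf2_rank(rows)

print(density_exponent([126]), density_exponent([4, 9]), density_exponent([2, 3, 6]))
```

For S = {2, 3, 6} the three rows are linearly dependent over F, so d = 2 and the density is 1/4; for a set of squares all rows vanish and d = 0.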
A few examples will indicate how Theorem 4.9 works in practice. Observe first that if
S is a finite set of primes of cardinality n, say, then the incidence matrix of S is just the
n × n identity matrix over F , hence the dimension of the linear span of the rows in $F^n$ is n, and so the density of
{p : χp ≡ 1 on S} is $2^{-n}$. Now choose four primes p < q < r < s, say, and let

S1 = {p, pq, qr, rs}.



The incidence matrix of S1 is
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}, \]
which is row equivalent to
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]
It follows from Theorem 4.9 that the density of {p : χp ≡ 1 on S1 } is $2^{-4}$. If

S2 = {p, ps, pqr, pqrs},

then the incidence matrix of S2 is
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix}, \]
which is row equivalent to
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \]
hence Theorem 4.9 implies that the density of {p : χp ≡ 1 on S2 } is $2^{-3}$. Because a 2-dimensional
subspace of $F^4$ contains exactly 3 nonzero vectors, it follows that if S consists
of 4 nontrivial square-free integers such that S is supported on 4 primes, then the density of
{p : χp ≡ 1 on S} cannot be $2^{-2}$. However, for example, if

S3 = {ps, qr, pqrs},

then the incidence matrix of S3 is
\[ \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix}, \]
which is row equivalent to
\[ \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \]
and so the density of {p : χp ≡ 1 on S3 } is $2^{-2}$.
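The last example can be compared with an empirical count. In the sketch below we make the hypothetical choice p, q, r, s = 2, 3, 5, 7, so that S3 = {14, 15, 210}, and compute the proportion of primes 7 < p ≤ 200000 for which χp ≡ 1 on S3. Theorem 4.9 predicts that this proportion approaches 1/4; at a finite cutoff one can only expect approximate agreement.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: the list of all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for q in range(2, int(n ** 0.5) + 1):
        if sieve[q]:
            sieve[q * q :: q] = bytearray(len(range(q * q, n + 1, q)))
    return [p for p in range(n + 1) if sieve[p]]

def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

S3 = [14, 15, 210]                       # ps, qr, pqrs with p, q, r, s = 2, 3, 5, 7
primes = [p for p in primes_up_to(200000) if p > 7]
hits = sum(1 for p in primes if all(legendre(z, p) == 1 for z in S3))
fraction = hits / len(primes)
print(fraction)
```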
We turn now to the

Proof of Theorem 4.9. We first establish a strengthened version of Theorem 4.9 in a
special case, and then use it (and another lemma) to prove Theorem 4.9 in general.

Lemma 4.10. (Filaseta and Richman [18], Theorem 2) If Π is a nonempty finite set of primes
and ε : Π → {−1, 1} is a given function then the density of the set {p : χp ≡ ε on Π} is
$2^{-|\Pi|}$.

Proof. Let
X = {p : χp ≡ ε on Π},

K = product of the elements of Π.


If n ∈ Z then we let [n] denote the ordinary residue class mod 4K which contains n. The
proof of Lemma 4.10 can now be outlined in a series of three steps.
Step 1. Use the LQR to show that
\[ X = \bigcup_{n \in U(4K) \,:\, X \cap [n] \neq \emptyset} \{p : p \in [n]\}. \]

Step 2 (and its implementation). Here we will make use of the Prime Number Theorem
for primes in arithmetic progressions, to wit, if a ∈ Z, b ∈ [1, ∞), gcd(a, b) = 1, and
AP (a, b) denotes the arithmetic progression with initial term a and common difference b,
then as x → +∞,
\[ |\{p \in AP(a, b) : p \le x\}| \sim \frac{1}{\varphi(b)} \frac{x}{\log x}. \]
For a proof of this important theorem, see either LeVeque [39], section 7.4, or Montgomery
and Vaughan, [41], section 11.3. In our situation it asserts that if n ∈ U(4K) then as x → +∞,
\[ |\{p \in [n] : p \le x\}| \sim \frac{1}{\varphi(4K)} \frac{x}{\log x}. \]
From this it follows that
\[ \text{the density } d_n \text{ of } \{p : p \in [n]\} \text{ is } \frac{1}{\varphi(4K)}, \text{ for all } n \in U(4K). \tag{9} \]
Because the decomposition of X in Step 1 is pairwise disjoint, (9) implies that
\[ \text{density of } X = \sum_{n \in U(4K) \,:\, X \cap [n] \neq \emptyset} d_n = \frac{|\{n \in U(4K) : X \cap [n] \neq \emptyset\}|}{\varphi(4K)}. \tag{10} \]
Step 3. Use the group structure of U(4K) and the LQR to prove that
\[ |\{n \in U(4K) : X \cap [n] \neq \emptyset\}| = \frac{\varphi(4K)}{2^{|\Pi|}}. \tag{11} \]
From (10) and (11) it follows that the density of X is $2^{-|\Pi|}$, as desired, hence we need only
implement Steps 1 and 3 in order to finish the proof.
Implementation of Step 1. We claim that

(12) if p, p′ are odd primes and p ≡ p′ mod 4K then χp ≡ χp′ on Π.

Because X is disjoint from {2} ∪ Π and
$$P \setminus (\{2\} \cup \Pi) = \bigcup_{n \in U(4K)} \{p : p \in [n]\}, \tag{13}$$
the decomposition of X as asserted in Step 1 follows immediately from (12).


We verify (12) by using the LQR. Assume that p ≡ p′ mod 4K and let q ∈ Π. Suppose
first that p or q is ≡ 1 mod 4. Then p′ or q is ≡ 1 mod 4, and so the LQR implies that

χp (q) = χq (p)
= χq (p′ + 4kK) for some k ∈ Z
= χq (p′ ), since q divides 4kK
= χp′ (q).

Suppose next that p ≡ 3 ≡ q mod 4. Then p′ ≡ 3 mod 4 hence it follows from the LQR that

χp (q) = −χq (p) = −χq (p′ ) = −(−χp′ (q)) = χp′ (q).
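
Claim (12) is easy to test numerically. The following Python sketch is only an illustration and plays no role in the proof: the helper names `primes_up_to`, `legendre` and `signatures_by_class`, the choice Π = {3, 5}, and the search bound are all assumptions made here. It groups the odd primes outside Π by their residue class mod 4K and records which sign patterns of χp on Π occur in each class; (12) predicts exactly one pattern per class.

```python
from collections import defaultdict
from math import isqrt, prod

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, flag in enumerate(sieve) if flag]

def legendre(q, p):
    """Legendre symbol chi_p(q) for an odd prime p, via Euler's criterion."""
    r = pow(q, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is 0, 1 or p - 1

def signatures_by_class(Pi, limit):
    """Map each residue class mod 4K to the set of sign patterns of chi_p on Pi."""
    modulus = 4 * prod(Pi)
    classes = defaultdict(set)
    for p in primes_up_to(limit):
        if p == 2 or p in Pi:
            continue
        classes[p % modulus].add(tuple(legendre(q, p) for q in Pi))
    return classes

classes = signatures_by_class([3, 5], 5000)
# One pattern per class is what (12) asserts.
print(all(len(patterns) == 1 for patterns in classes.values()))
```
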

Implementation of Step 3. Define the equivalence relation ∼ on the set of residue classes
{[n] : n ∈ U(4K)} like so:

[n] ∼ [n′ ] if for all odd primes p ∈ [n], q ∈ [n′ ], χp ≡ χq on Π.

We first count the number of equivalence classes of ∼. It is a consequence of (12) that


the sets
{q ∈ Π : χp (q) = 1}

are the same for all p ∈ [n], and so we let I(n) denote this subset of Π. Now if n ∈ U(4K)
and p ∈ [n] then (13) implies that p ∉ Π. Hence for all p ∈ [n], χp takes only the values ±1
on Π. It follows that
[n] ∼ [n′ ] if and only if I(n) = I(n′ ).
On the other hand, by virtue of Lemma 4.4, if S ⊆ Π then there exist infinitely many primes
p such that
S = {q ∈ Π : χp (q) = 1},
and so we use (13) to find n0 ∈ U(4K) such that [n0 ] contains at least one of these primes
p, hence
S = I(n0 ).
We conclude that
(14) the number of equivalence classes of ∼ is $2^{|\Pi|}$.
Let En denote the equivalence class of ∼ which contains [n]. We claim that
(15) multiplication by n maps E1 bijectively onto En .
If this is true then |En | is constant as a function of n ∈ U(4K), hence (14) implies that

(16) $\varphi(4K) = 2^{|\Pi|}\,|E_n|$, for all n ∈ U(4K).

If we now choose p ∈ X then there is n0 ∈ U(4K) such that p ∈ [n0 ], hence it follows from
(12) that
$$E_{n_0} = \{[n] : X \cap [n] \neq \emptyset\},$$
and so, in light of (16),
$$\varphi(4K) = 2^{|\Pi|}\,|\{n \in U(4K) : X \cap [n] \neq \emptyset\}|,$$
which is (11).
It remains only to verify (15). Because U(4K) is a group under the multiplication induced
by multiplication of ordinary residue classes mod 4K, it is clear that multiplication by n on
E1 is injective, so we need only prove that nE1 = En .
We show first that nE1 ⊆ En . Let [n′ ] ∈ E1 . We must prove: [nn′ ] ∈ En , i.e., [nn′ ] ∼ [n],
i.e.,
(17) if p ∈ [nn′ ], q ∈ [n] are odd primes then χp ≡ χq on Π.
In order to verify (17), let p ∈ [nn′ ], q ∈ [n], p′ ∈ [n′ ], q ′ ∈ [1] be odd primes. Because
[n′ ] ∼ [1],

(18) χp′ ≡ χq′ on Π.



The choice of p, q, p′ , q ′ implies that

pq ′ ≡ p′ q mod 4K.

This congruence and the LQR when used in an argument similar to the one that was used
to prove (12) imply that

(19) χp χq′ ≡ χp′ χq on Π.

Because χq′ and χp′ are both nonzero on Π, we can use (18) to cancel χq′ and χp′ from each
side of (19) to obtain
χp ≡ χq on Π.
We show next that En ⊆ nE1 . Let [n′ ] ∈ En . The group structure of U(4K) implies that
there exists n0 ∈ U(4K) such that

(20) [nn0 ] = [n′ ],

so we need only show that [n0 ] ∈ E1 , i.e.,

(21) χp ≡ χq on Π, for all odd primes p ∈ [n0 ], q ∈ [1].

Toward that end, choose odd primes p′ ∈ [n], q ′ ∈ [n′ ]. Because [n] ∼ [n′ ],

(22) χp′ ≡ χq′ on Π,

and so because of (20), we have that for all p ∈ [n0 ], q ∈ [1],

pp′ ≡ qq ′ mod 4K.

(21) is now a consequence of this congruence, (22), and our previous reasoning. QED
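
Lemma 4.10 can also be observed empirically. The sketch below is an illustration under stated assumptions, not a verification: the function names, the set Π = {3, 5}, the sign function ε and the bound 50000 are all choices made here, and a finite count can only approximate a density. It counts the primes p ≤ x with χp ≡ ε on Π and divides by the number of primes up to x; the lemma predicts a limiting value of $2^{-|\Pi|} = 1/4$.

```python
from math import isqrt

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, flag in enumerate(sieve) if flag]

def legendre(q, p):
    """Legendre symbol chi_p(q) for an odd prime p, via Euler's criterion."""
    r = pow(q, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def empirical_density(eps, limit):
    """Fraction of primes p <= limit with chi_p(q) == eps[q] for every q in eps."""
    primes = primes_up_to(limit)
    hits = sum(
        1
        for p in primes
        if p != 2 and p not in eps and all(legendre(q, p) == e for q, e in eps.items())
    )
    return hits / len(primes)

# Pi = {3, 5} with eps(3) = 1, eps(5) = -1; the predicted density is 1/4.
print(empirical_density({3: 1, 5: -1}, 50000))
```
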
We will prove Theorem 4.9 by combining Lemma 4.10 with the next lemma, a simple
result in enumerative combinatorics.

Lemma 4.11. If A is a nonempty finite subset of [1, ∞), n = |A|, $\mathcal{S} \subseteq 2^A$, F = the Galois
field of order 2, $v : 2^A \to F^n$ is the map defined at the beginning of this section, and

d = the dimension of the linear span of $v(\mathcal{S})$ in $F^n$,

then the cardinality of the set

$$\mathcal{N} = \{N \subseteq A : |N \cap S| \text{ is even, for all } S \in \mathcal{S}\}$$

is $2^{n-d}$.

Proof. Without loss of generality take A = [1, n]. Observe first that if N, T ⊆ A, then
$$|N \cap T| \text{ is even if and only if } \sum_{i=1}^{n} v(N)(i)\,v(T)(i) = 0 \text{ in } F.$$
Hence there is a bijection of the set of all solutions in $F^n$ of the system of linear equations
$$\sum_{i=1}^{n} v(S)(i)\,x_i = 0, \quad S \in \mathcal{S}, \tag{$*$}$$
onto $\mathcal{N}$ given by
$$(x_1, \dots, x_n) \to \{i : x_i = 1\}.$$
If $m = |\mathcal{S}|$ and $\sigma : F^n \to F^m$ is the linear transformation whose representing matrix is the
coefficient matrix of the system (∗) then

the set of all solutions of (∗) in $F^n$ = the kernel of σ.

But d is the rank of σ and so the kernel of σ has dimension n − d. Hence
$$|\mathcal{N}| = |\text{the set of all solutions of } (*) \text{ in } F^n| = |\text{kernel of } \sigma| = 2^{n-d}.$$

QED
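
For small n, Lemma 4.11 can be checked by brute force. In the sketch below, subsets of A = [1, n] are encoded as bitmasks (bit i − 1 set when i belongs to the subset), which plays the role of the map v; the function names and the sample family are assumptions made for this illustration only. The rank d is computed by Gaussian elimination over the field of order 2, and the sets N with all intersections even are counted directly.

```python
def to_mask(subset):
    """Encode a subset of {1, ..., n} as a bitmask (the vector v(subset) over GF(2))."""
    return sum(1 << (i - 1) for i in subset)

def rank_gf2(masks):
    """Rank over GF(2) of the vectors given as bitmasks."""
    pivots = {}  # leading bit position -> reduced vector with that leading bit
    for v in masks:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]  # cancel the leading bit and keep reducing
    return len(pivots)

def count_even_sets(n, family):
    """Number of N in {1..n} with |N ∩ S| even for every S in the family."""
    masks = [to_mask(S) for S in family]
    return sum(
        1
        for N in range(1 << n)
        if all(bin(N & m).count("1") % 2 == 0 for m in masks)
    )

n = 4
family = [{1, 2}, {2, 3}, {1, 3}, {4}]
d = rank_gf2([to_mask(S) for S in family])
print(d, count_even_sets(n, family), 2 ** (n - d))
```

In this sample family the third set is the symmetric difference of the first two, so the rank is 3 and the lemma predicts $2^{4-3} = 2$ sets, namely ∅ and {1, 2, 3}.
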
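We proceed to prove Theorem 4.9. Let S, $\mathcal{S}$, A, n, and d be as in the hypothesis of that
theorem, let
$$X = \{p : \chi_p \equiv 1 \text{ on } S\},$$
$$\mathcal{N} = \{N \subseteq A : |N \cap S| \text{ is even, for all } S \in \mathcal{S}\},$$
and for each prime p, let
$$N(p) = \{q \in A : \chi_p(q) = -1\}.$$
Then since X is disjoint from A,
$$\begin{aligned}
p \in X \ &\text{iff } 1 = \chi_p(z) = \prod_{q \in \pi_{\mathrm{odd}}(z)} \chi_p(q), \text{ for all } z \in S, \\
&\text{iff } |N(p) \cap \pi_{\mathrm{odd}}(z)| \text{ is even, for all } z \in S, \\
&\text{iff } N(p) \in \mathcal{N}.
\end{aligned}$$
Hence
$$X = \bigcup_{N \in \mathcal{N}} \{p : N(p) = N\}$$
and this union is pairwise disjoint. Hence
$$\text{density of } X = \sum_{N \in \mathcal{N}} \text{density of } \{p : N(p) = N\}.$$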

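Lemma 4.10 implies that
$$\text{density of } \{p : N(p) = N\} = 2^{-n} \text{ for all } N \in \mathcal{N},$$
and so
$$\begin{aligned}
\text{density of } X &= 2^{-n}\,|\mathcal{N}| \\
&= 2^{-n}(2^{n-d}), \text{ by Lemma 4.11} \\
&= 2^{-d}.
\end{aligned}$$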
QED
The next question which naturally arises asks: what about a version of Theorem 4.9 for
quadratic non-residues, i.e., for what finite, nonempty subsets S of [1, ∞) is it true that S is
a set of non-residues of infinitely many primes? In contrast to what occurs for residues, this
can fail to be true for certain finite subsets S of [1, ∞), and there is a simple obstruction
that prevents it from being true. Suppose that there is a subset T of S such that |T | is odd
and $\prod_{i \in T} i$ is a square, and suppose that S is a set of non-residues of infinitely many primes.
We can then choose p to exceed all of the prime factors of the elements of T and such that
χp (z) = −1, for all z ∈ T . Hence
$$-1 = (-1)^{|T|} = \prod_{i \in T} \chi_p(i) = \chi_p\Big(\prod_{i \in T} i\Big) = 1,$$

a clear contradiction. It follows that the presence of such subsets T of S prevents S from
being a set of non-residues of infinitely many primes. The next theorem asserts that those
subsets are the only obstructions to S having this property.

Theorem 4.12. If S is a finite, nonempty subset of [1, ∞) then S is a set of non-residues
of infinitely many primes if and only if for all subsets T of S of odd cardinality, $\prod_{i \in T} i$ is
not a square.
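
The criterion in Theorem 4.12 is finite and can be tested directly for a small set S. The following sketch (the function name `odd_square_obstruction` is an assumption made here, and the search is a brute-force enumeration that is exponential in |S|) looks for a subset T of odd cardinality whose product is a perfect square, i.e., for an obstruction of the kind described above.

```python
from itertools import combinations
from math import isqrt, prod

def odd_square_obstruction(S):
    """Return a subset T of S with |T| odd and prod(T) a perfect square, or None."""
    elems = sorted(S)
    for size in range(1, len(elems) + 1, 2):  # odd cardinalities only
        for T in combinations(elems, size):
            P = prod(T)
            if isqrt(P) ** 2 == P:
                return T
    return None

print(odd_square_obstruction({2, 3, 6}))  # 2 * 3 * 6 = 36 is a square
print(odd_square_obstruction({2, 3, 5}))  # no obstruction exists
```

By the theorem, {2, 3, 5} is therefore a set of non-residues of infinitely many primes, while {2, 3, 6} is not.
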

This theorem lies somewhat deeper than Theorem 4.3. We will prove it in Chapter 5, where
we will once again delve into the theory of algebraic numbers. But before we get to that, we
will discuss how to use quadratic residues to design zero-knowledge proofs.

7. Zero-Knowledge Proofs and Quadratic Residues

A major issue in modern electronic communication is the secure verification of identification,
namely, guaranteeing that the person with whom you are communicating is indeed
who you think he is. A typical scenario proceeds as follows: person P sends an electronic
message to person V in the form of an identification number. V wants to securely verify that
P validly possesses the ID number, without knowing anything more about P . Moreover, for

security reasons, P does not want V to be able to find out anything about him during the
verification procedure, i.e., V is to have zero knowledge of P . In addition to all of this, V
wants to make it virtually impossible for any other person C to use the verification procedure
to deceive V into thinking that C is P . An identity-verification algorithm which satisfies all
of these requirements is called a zero-knowledge proof.
Zero-knowledge proofs which employ quadratic residues were devised in the 1980s because
of the need to maintain security when verifying identification numbers using smart
cards, electronic banking and stock transactions, and other similar types of communication.
In a zero-knowledge proof there are two parties, the prover, a person who wants his identity
verified without divulging any other information about himself, and a verifier, a person
who must be convinced that the prover is who he says he is. The identity of the prover is
verified by checking that he has certain secret information that only he possesses. Security
is maintained because the procedures used in the zero-knowledge proof guarantee that the
probability that someone pretending to be the prover can convince the verifier that she is the
prover is extremely small. Moreover, the verifier checks only that the prover is in possession
of the secret information, without being able to discover what the secret information is.
We will describe a zero-knowledge proof discovered by Adi Shamir [53] in 1985 (we follow
Rosen [48], sections 11.3 and 11.5 for the exposition in this section and in sections 8 and 9
below). The prover P starts by choosing two very large primes p and q such that p ≡ q ≡
3 mod 4 (to maintain security, these primes should have hundreds of digits), computing
n = pq, and then sending n to the verifier V . Let I be a positive integer that represents
particular information, e.g., the personal identification number of P . P selects a positive
number c such that the integer w obtained by concatenating I with c (the integer obtained
by writing the digits of I followed by the digits of c) is a quadratic residue modulo n, i.e.,
there is a solution in integers of the congruence $x^2 \equiv w \bmod n$, with gcd(x, n) = 1. P sends
w to V and then finds a solution u of this congruence. Finding u can easily be done by
means of Euler’s criterion. In order to see that, note first that χp (w) = χq (w) = 1 and recall
that p ≡ q ≡ 3 mod 4. Euler’s criterion therefore implies that
$$w^{\frac{1}{2}(p-1)} \equiv \chi_p(w) = 1 \bmod p,$$
$$w^{\frac{1}{2}(q-1)} \equiv \chi_q(w) = 1 \bmod q,$$
hence
$$\left(w^{\frac{1}{4}(p+1)}\right)^2 = w^{\frac{1}{2}(p+1)} = w^{\frac{1}{2}(p-1)} \cdot w \equiv w \bmod p,$$
and similarly,
$$\left(w^{\frac{1}{4}(q+1)}\right)^2 \equiv w \bmod q.$$

The prover then finds a solution u of $x^2 \equiv w \bmod n$ by using the Chinese remainder theorem
to solve the congruences
$$u \equiv w^{\frac{1}{4}(p+1)} \bmod p,$$
$$u \equiv w^{\frac{1}{4}(q+1)} \bmod q.$$

Of course, in order to find u in this way, one must know the primes p and q.
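
The computation of u just described can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the function name `sqrt_mod_pq` and the tiny primes p = 7, q = 11 are chosen here for readability and offer no security whatsoever. The built-in three-argument `pow` performs the modular exponentiation, and `pow(x, -1, m)` (Python 3.8 or later) supplies the modular inverses used in the Chinese remainder step.

```python
def sqrt_mod_pq(w, p, q):
    """A square root of w modulo n = p*q.

    Assumes p and q are distinct primes with p ≡ q ≡ 3 (mod 4) and that
    w is a quadratic residue of both p and q.
    """
    if p % 4 != 3 or q % 4 != 3:
        raise ValueError("both primes must be congruent to 3 mod 4")
    u_p = pow(w, (p + 1) // 4, p)  # square root of w modulo p
    u_q = pow(w, (q + 1) // 4, q)  # square root of w modulo q
    # Chinese remainder theorem: combine u ≡ u_p (mod p) and u ≡ u_q (mod q).
    n = p * q
    return (u_p * q * pow(q, -1, p) + u_q * p * pow(p, -1, q)) % n

p, q, w = 7, 11, 23  # 23 ≡ 2 (mod 7) and 23 ≡ 1 (mod 11) are both residues
u = sqrt_mod_pq(w, p, q)
print(u, (u * u) % (p * q))
```
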
P convinces V that P knows u by using an interactive proof that is composed of iterations
of the following four-step cycle:
(i) P chooses a random number r and sends V a message containing two integers: x, where
$x \equiv r^2 \bmod n$, with gcd(x, n) = 1 and 0 ≤ x < n, and y, where $y \equiv w\bar{x} \bmod n$, 0 ≤ y < n,
and $\bar{x}$ denotes the inverse of x modulo n.
(ii) V checks that xy ≡ w mod n, then chooses a random bit b equal to either 0 or 1, and
sends b to P .
(iii) If b = 0, P sends r to V . If b = 1 then P calculates $s \equiv u\bar{r} \bmod n$, 0 ≤ s < n, where
$\bar{r}$ denotes the inverse of r modulo n, and sends s to V .
(iv) V computes the square modulo n of what P has sent. If V sent 0, she checks that
this square is x, i.e., $r^2 \equiv x \bmod n$. If V sent 1 then she checks that this square is y, i.e.,
$s^2 \equiv y \bmod n$.
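
The four-step cycle can be simulated as follows. This is a minimal sketch with an honest prover, assuming the toy values n = 77, w = 23, u = 67 (so that u² ≡ w mod n); the function name `run_cycle` and the use of Python's `random` module are choices made for this illustration and are not suitable for real cryptographic use. In step (iii) the response for b = 1 is u times the inverse of r modulo n, so that its square is y.

```python
import random
from math import gcd

def run_cycle(n, w, u, rng):
    """Run one cycle with an honest prover who knows u, where u*u ≡ w (mod n).

    Returns True when the verifier's checks in steps (ii) and (iv) succeed.
    """
    # (i) The prover picks r coprime to n and commits to x and y.
    while True:
        r = rng.randrange(1, n)
        if gcd(r, n) == 1:
            break
    x = (r * r) % n
    y = (w * pow(x, -1, n)) % n
    # (ii) The verifier checks x*y ≡ w (mod n) and chooses a random bit.
    if (x * y) % n != w % n:
        return False
    b = rng.randrange(2)
    # (iii) The prover answers with r, or with s = u * (inverse of r) mod n.
    response = r if b == 0 else (u * pow(r, -1, n)) % n
    # (iv) The verifier squares the response and compares with x or y.
    target = x if b == 0 else y
    return (response * response) % n == target

rng = random.Random(0)
print(all(run_cycle(77, 23, 67, rng) for _ in range(30)))
```
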
This cycle can be iterated many times to guarantee security and to convince V that
P knows his private information u, which shows that P validly possesses the identification
number I, i.e., that P is who he says he is. By passing this test over many cycles, P has
shown that he can produce either r or s upon request. Hence P must know u because in each
cycle, he knows both r and s, and u ≡ rs mod n. Moreover, V is unable to discover what
u is because that would require V to be able to solve the square-root problem $x^2 \equiv w \bmod n$
without knowing p and q. This problem is known in cryptology circles as the quadratic
residuosity problem, and is regarded as computationally intractable, hence essentially
impossible to solve in any feasible length of time, when the modulus n is the product of two
very large unknown primes.
Because the bit chosen by V is random, the probability that it is a 0 is 1/2 and the
probability that it is a 1 is 1/2. If someone does not know u, the modular square root of
w, then the probability that they will pass one iteration of the cycle is almost exactly 1/2.
If an impostor is attempting to deceive V into believing that she, the impostor, is P , the
probability of the impostor passing, say, 30 iterations of the cycle is hence approximately
$1/2^{30}$, less than one in a billion. This makes it virtually impossible for V to be deceived in
this manner.

We now ask the following question: what has quadratic reciprocity got to do with all of
this? We begin our answer to this question by recalling that in the initial steps of the Shamir
zero-knowledge proof, the prover needs to find an integer c such that the concatenation w of I
with c is a quadratic residue of n = pq, where p and q are very large primes. This can be done
if and only if χp (w) = χq (w) = 1, hence the prover must be able to compute Legendre symbols
quickly and efficiently. As we have seen, if one can find sufficiently many factors of w then the
LQR can be used to perform this computation in the desired manner. Unfortunately, one of
the outstanding, and very difficult, unsolved problems in computational number theory is the
design of a computationally fast and efficient algorithm for factoring very large integers, and
the ID number I, and also the integer w in Shamir’s algorithm, is often taken to be large for
reasons of security. This difficulty precludes quadratic reciprocity from being used directly
to compute Legendre symbols in Shamir’s algorithm. On the other hand, fortunately, there
is a very fast and efficient algorithm for computing Legendre symbols which avoids factoring,
so much so that when using it, one can, using high-speed computers, of course, very quickly
find the quadratic residues required in Shamir’s zero-knowledge proof. We will now describe
this algorithm for computing Legendre symbols, and it is in the verification of this algorithm
that quadratic reciprocity will find its application.

8. Jacobi Symbols

The device on which our algorithm is based is a generalization of the Legendre symbol,
due to Jacobi. We first define the Jacobi symbol χ1 (m) to be 1 for all integers m. Now let
n > 1 be an odd integer, with prime factorization $n = p_1^{t_1} \cdots p_k^{t_k}$. If m is a positive integer
relatively prime to n, then the Jacobi symbol χn (m) is defined as the product of Legendre
symbols
$$\chi_n(m) = \prod_{i=1}^{k} \chi_{p_i}(m)^{t_i}.$$

We emphasize here that this notation for the Jacobi symbol is not standard; we have chosen
it to align with the character-theoretic notation we have used for Legendre symbols.
The Jacobi symbols satisfy exactly the same algebraic properties as the Legendre symbols,
i.e.,
(a) if a and b are both relatively prime to n and a ≡ b mod n then χn (a) = χn (b);
(b) if a and b are both relatively prime to n then χn (ab) = χn (a)χn (b).
It follows from (a) and (b) that if n > 1 and we define the Jacobi symbol χn (m) to be zero
whenever gcd(m, n) > 1 then χn is a real Dirichlet character of modulus n. Moreover the

Jacobi symbols satisfy an exact analog of the first and second supplementary laws for the
Legendre symbols:
(c) $\chi_n(-1) = (-1)^{\frac{1}{2}(n-1)}$;
(d) $\chi_n(2) = (-1)^{\frac{1}{8}(n^2-1)}$.
But that is not all! The Jacobi symbols also satisfy an exact analog of the Law of Quadratic
Reciprocity, to wit,

Theorem 4.13. (Reciprocity Law for the Jacobi symbol) If m and n are relatively prime
odd positive integers then
$$\chi_m(n)\chi_n(m) = (-1)^{\frac{1}{2}(m-1) \cdot \frac{1}{2}(n-1)}.$$

Because they are necessary for the verification of the algorithm for the computation
of Legendre symbols that we require, we will now prove properties (a), (b) and (d) and
Theorem 4.13. As (a) and (b) are easy consequences of the definition of
the Jacobi symbol and the analogous properties of the Legendre symbol, we can safely leave
those details to the reader.
In order to verify (d), begin by letting $p_1^{t_1} \cdots p_m^{t_m}$ be the prime factorization of n. Then
from Theorem 2.6 it follows that
$$\chi_n(2) = \prod_{i=1}^{m} \chi_{p_i}(2)^{t_i} = (-1)^{\sigma},$$
where
$$\sigma = \sum_{i=1}^{m} \frac{t_i(p_i^2 - 1)}{8}.$$
We have that
$$n^2 = \prod_{i=1}^{m} \big(1 + (p_i^2 - 1)\big)^{t_i}.$$
Because $p_i^2 - 1 \equiv 0 \bmod 8$, for i = 1, . . . , m, it follows that
$$\big(1 + (p_i^2 - 1)\big)^{t_i} \equiv 1 + t_i(p_i^2 - 1) \bmod 64$$
and
$$\big(1 + t_i(p_i^2 - 1)\big)\big(1 + t_j(p_j^2 - 1)\big) \equiv 1 + t_i(p_i^2 - 1) + t_j(p_j^2 - 1) \bmod 64.$$
Hence
$$n^2 \equiv 1 + \sum_{i=1}^{m} t_i(p_i^2 - 1) \bmod 64,$$
which implies that
$$\frac{n^2 - 1}{8} \equiv \sum_{i=1}^{m} \frac{t_i(p_i^2 - 1)}{8} = \sigma \bmod 8.$$
Therefore
$$\chi_n(2) = (-1)^{\sigma} = (-1)^{\frac{1}{8}(n^2-1)}.$$
QED
We begin the proof of Theorem 4.13 by letting $p_1^{a_1} \cdots p_s^{a_s}$ and $q_1^{b_1} \cdots q_r^{b_r}$ be the prime
factorizations of m and n, respectively. Then
$$\chi_n(m) = \prod_{i=1}^{r} \chi_{q_i}(m)^{b_i} = \prod_{i=1}^{r} \prod_{j=1}^{s} \chi_{q_i}(p_j)^{b_i a_j}$$
and
$$\chi_m(n) = \prod_{j=1}^{s} \chi_{p_j}(n)^{a_j} = \prod_{j=1}^{s} \prod_{i=1}^{r} \chi_{p_j}(q_i)^{b_i a_j}.$$
Hence
$$\chi_n(m)\chi_m(n) = \prod_{i=1}^{r} \prod_{j=1}^{s} \big(\chi_{q_i}(p_j)\chi_{p_j}(q_i)\big)^{a_j b_i}.$$
Because m and n are odd and relatively prime, all of the primes in the prime factorizations
of m and n are odd and no prime factor of m is a factor of n. The LQR thus implies that
$$\chi_{q_i}(p_j)\chi_{p_j}(q_i) = (-1)^{\frac{1}{2}(p_j - 1) \cdot \frac{1}{2}(q_i - 1)}.$$
Hence
$$\chi_n(m)\chi_m(n) = \prod_{i=1}^{r} \prod_{j=1}^{s} (-1)^{a_j \frac{1}{2}(p_j - 1) \cdot b_i \frac{1}{2}(q_i - 1)} = (-1)^{\kappa}, \tag{23}$$
where
$$\kappa = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{a_j(p_j - 1)}{2} \cdot \frac{b_i(q_i - 1)}{2}.$$
We have that
$$\sum_{i=1}^{r} \sum_{j=1}^{s} \frac{a_j(p_j - 1)}{2} \cdot \frac{b_i(q_i - 1)}{2} = \left(\sum_{j=1}^{s} \frac{a_j(p_j - 1)}{2}\right)\left(\sum_{i=1}^{r} \frac{b_i(q_i - 1)}{2}\right).$$
Because
$$m = \prod_{i=1}^{s} \big(1 + (p_i - 1)\big)^{a_i}$$
and $p_i - 1$ is even, it follows that
$$\big(1 + (p_i - 1)\big)^{a_i} \equiv 1 + a_i(p_i - 1) \bmod 4,$$
and
$$\big(1 + a_i(p_i - 1)\big)\big(1 + a_j(p_j - 1)\big) \equiv 1 + a_i(p_i - 1) + a_j(p_j - 1) \bmod 4.$$
Hence
$$m \equiv 1 + \sum_{i=1}^{s} a_i(p_i - 1) \bmod 4,$$
and so
$$\sum_{i=1}^{s} \frac{a_i(p_i - 1)}{2} \equiv \frac{m - 1}{2} \bmod 2.$$
Similarly,
$$\sum_{i=1}^{r} \frac{b_i(q_i - 1)}{2} \equiv \frac{n - 1}{2} \bmod 2.$$
Therefore,
$$\kappa = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{a_j(p_j - 1)}{2} \cdot \frac{b_i(q_i - 1)}{2} \equiv \frac{m - 1}{2} \cdot \frac{n - 1}{2} \bmod 2. \tag{24}$$
It now follows from (23) and (24) that
$$\chi_n(m)\chi_m(n) = (-1)^{\kappa} = (-1)^{\frac{1}{2}(m-1) \cdot \frac{1}{2}(n-1)}.$$
QED
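
Properties (c) and (d) and Theorem 4.13 can be spot-checked for small arguments by computing Jacobi symbols straight from the definition. The sketch below is an illustration only: the function names are assumptions made here, the Legendre symbols are obtained from Euler's criterion, and the factorization is done by trial division, which is exactly the step that the fast algorithm described later is designed to avoid.

```python
from math import gcd

def legendre(m, p):
    """Legendre symbol chi_p(m) for an odd prime p, via Euler's criterion."""
    r = pow(m, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is 0, 1 or p - 1

def factor_odd(n):
    """Prime factorization {prime: exponent} of an odd n >= 1 by trial division."""
    factors, d = {}, 3
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 2
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def jacobi(m, n):
    """Jacobi symbol chi_n(m) from the definition; 0 when gcd(m, n) > 1."""
    if gcd(m, n) > 1:
        return 0
    result = 1
    for p, t in factor_odd(n).items():
        result *= legendre(m, p) ** t
    return result

# Reciprocity for one pair of relatively prime odd integers.
m, n = 15, 77
print(jacobi(n, m) * jacobi(m, n), (-1) ** (((m - 1) // 2) * ((n - 1) // 2)))
```
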

9. An Algorithm for Fast Computation of Legendre Symbols

The key ingredient of the algorithm for the computation of Legendre symbols that we
want is a formula for the computation of certain Jacobi symbols. That formula uses data
given in the form of two finite sequences of integers which are generated by a successive
division and factorization procedure. In order to state that formula we start with two
relatively prime positive integers a and b with a > b. We will generate two finite sequences
of integers from a and b by using a modification of the Euclidean algorithm as follows: let
a = R0 and b = R1 . Using the division algorithm and then factoring out the highest power
of 2 from the remainder, we obtain
$$R_0 = R_1 q_1 + 2^{s_1} R_2,$$
where gcd(R1 , R2 ) = 1 and R2 is odd. Now successively apply the division algorithm as
follows, factoring out the highest power of 2 from the remainders as you do so:
$$\begin{aligned}
R_1 &= R_2 q_2 + 2^{s_2} R_3 \\
R_2 &= R_3 q_3 + 2^{s_3} R_4 \\
&\ \ \vdots \\
R_{n-2} &= R_{n-1} q_{n-1} + 2^{s_{n-1}} \cdot 1 \\
R_n &= 1, \quad s_n = 0.
\end{aligned}$$

Note that $R_i$ is an odd positive integer and $s_i$ is a nonnegative integer for i = 1, . . . , n, and
$\gcd(R_i, R_{i+1}) = 1$ for i = 0, . . . , n−1. Because $R_{i+1} < R_i$ for each i, this division process will
always terminate. The formula for the computation of the Jacobi symbols that is required
can now be stated and proved:

Proposition 4.14. If a and b are relatively prime positive integers such that a > b, b
is odd, and $R_i$ and $s_i$, i = 1, . . . , n, are the sequences of integers generated by the preceding
algorithm, then
$$\chi_b(a) = (-1)^{\sigma},$$
where
$$\sigma = \sum_{i=1}^{n-1} \left( s_i\,\frac{R_i^2 - 1}{8} + \frac{(R_i - 1)(R_{i+1} - 1)}{4} \right).$$

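Proof. From properties (a), (b), and (d) of the Jacobi symbol, it follows that
$$\begin{aligned}
\chi_b(a) = \chi_{R_1}(R_0) &= \chi_{R_1}(2^{s_1} R_2) \\
&= \chi_{R_1}(2)^{s_1}\,\chi_{R_1}(R_2) \\
&= (-1)^{s_1 \cdot \frac{R_1^2 - 1}{8}}\,\chi_{R_1}(R_2),
\end{aligned}$$
and it follows from Theorem 4.13 that
$$\chi_{R_1}(R_2) = (-1)^{\frac{R_1 - 1}{2} \cdot \frac{R_2 - 1}{2}}\,\chi_{R_2}(R_1),$$
hence
$$\chi_b(a) = (-1)^{\sigma_1}\,\chi_{R_2}(R_1),$$
where
$$\sigma_1 = s_1\,\frac{R_1^2 - 1}{8} + \frac{(R_1 - 1)(R_2 - 1)}{4}.$$
In the same manner, we obtain for i = 2, . . . , n − 1,
$$\chi_{R_i}(R_{i-1}) = (-1)^{\sigma_i}\,\chi_{R_{i+1}}(R_i),$$
where
$$\sigma_i = s_i\,\frac{R_i^2 - 1}{8} + \frac{(R_i - 1)(R_{i+1} - 1)}{4}.$$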
When all of these equations are combined, the desired expression for χb (a) is produced. QED
The algorithm for the computation of Legendre symbols can now be described in a simple
three-step procedure like so: let p be an odd prime, a a positive integer less than p; we wish
to compute the Legendre symbol χp (a).
Step 1. Factor $a = 2^s b$ where b is odd (in Shamir’s algorithm, this step can always be
avoided by concatenating an odd integer to the integer I).
Theorem 2.6 implies that
$$\chi_p(a) = \chi_p(2)^s \chi_p(b) = (-1)^{s \cdot \frac{p^2 - 1}{8}}\,\chi_p(b). \tag{25}$$
Now use Theorem 4.13 to obtain
$$\chi_p(b) = (-1)^{\frac{1}{2}(p-1) \cdot \frac{1}{2}(b-1)}\,\chi_b(p). \tag{26}$$
Substitution of (26) into (25) yields the formula in the next step.
Step 2. Write
$$\chi_p(a) = (-1)^{\varepsilon}\,\chi_b(p),$$
where
$$\varepsilon = \frac{s(p^2 - 1)}{8} + \frac{(p - 1)(b - 1)}{4}.$$
Step 3. Use the formula from Proposition 4.14 to compute χb (p) and substitute that
value into the formula for χp (a) in Step 2.
As an example, we use this algorithm to calculate χ311 (141) without factoring the argument
141. Because 141 is odd, Step 1 yields s = 0, hence from Step 2 we obtain
$$\chi_{311}(141) = \chi_{141}(311).$$
In Step 3, we need the sequence of divisions
$$\begin{aligned}
311 &= 141 \cdot 2 + 2^0 \cdot 29 \\
141 &= 29 \cdot 4 + 2^0 \cdot 25 \\
29 &= 25 \cdot 1 + 2^2 \cdot 1,
\end{aligned}$$
and so the sequences that are required to apply Proposition 4.14 are $R_1 = 141, R_2 = 29, R_3 = 25, R_4 = 1$
and $s_1 = 0, s_2 = 0, s_3 = 2$. Hence from Step 2, we see that
$$\chi_{311}(141) = (-1)^{\sigma},$$
where
$$\begin{aligned}
\sigma &= 0 \cdot \frac{141^2 - 1}{8} + 0 \cdot \frac{29^2 - 1}{8} + 2 \cdot \frac{25^2 - 1}{8} + \frac{(141 - 1)(29 - 1)}{4} + \frac{(29 - 1)(25 - 1)}{4} \\
&\equiv 0 \bmod 2,
\end{aligned}$$
hence
$$\chi_{311}(141) = 1.$$
Of course in this simple example, we can obviously factor 141 completely and then use
the LQR as before, but the whole point of the example is to calculate χ311 (141) without
any factoring. In practical applications of quadratic residues in cryptology, such as Shamir’s

zero-knowledge proof, the arguments of Legendre symbols are frequently very large, and so
complete factorization of the argument becomes computationally unfeasible.
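
The three-step procedure and the formula of Proposition 4.14 translate directly into code. The sketch below is one possible rendering, not a reference implementation: the function names are assumptions made here, and the exponents σ and ε are computed exactly as integers and then reduced mod 2. The function `modified_euclid` produces the sequences $R_i$ and $s_i$, and `legendre_fast` carries out Steps 1 to 3 for an odd prime p and 1 ≤ a < p.

```python
def modified_euclid(a, b):
    """Return ([R_0, ..., R_n], [s_1, ..., s_{n-1}]) for coprime a > b >= 1, b odd."""
    R, s = [a, b], []
    while R[-1] != 1:
        r = R[-2] % R[-1]
        if r == 0:
            raise ValueError("a and b must be relatively prime")
        e = 0
        while r % 2 == 0:  # factor the highest power of 2 out of the remainder
            r //= 2
            e += 1
        s.append(e)
        R.append(r)
    return R, s

def jacobi_by_proposition(a, b):
    """chi_b(a) = (-1)^sigma with sigma as in Proposition 4.14."""
    R, s = modified_euclid(a, b)
    sigma = sum(
        s[i - 1] * (R[i] ** 2 - 1) // 8 + (R[i] - 1) * (R[i + 1] - 1) // 4
        for i in range(1, len(R) - 1)  # i = 1, ..., n - 1; s[i - 1] holds s_i
    )
    return -1 if sigma % 2 else 1

def legendre_fast(a, p):
    """chi_p(a) for an odd prime p and 1 <= a < p, following Steps 1 to 3."""
    s, b = 0, a
    while b % 2 == 0:  # Step 1: a = 2^s * b with b odd
        b //= 2
        s += 1
    eps = s * (p * p - 1) // 8 + (p - 1) * (b - 1) // 4  # Step 2
    sign = -1 if eps % 2 else 1
    return sign * jacobi_by_proposition(p, b)  # Step 3

print(modified_euclid(311, 141))
print(legendre_fast(141, 311))
```

For the worked example the first call reproduces the sequences R = 311, 141, 29, 25, 1 and s = 0, 0, 2 found above.
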
How can the efficiency of our algorithm for the calculation of Legendre symbols be measured
when it is implemented for computation on modern high-speed computers? Integer
calculations on a computer are done by using base-2 expansions of the integers, which are
called bit strings. A bit operation is the addition, subtraction or multiplication of two bit
strings of length 1, the division of a bit string of length 2 by a bit string of length 1 using the
division algorithm, or the shifting of a bit string by one place. The computational efficiency
of an algorithm is measured by its computational complexity, which is an estimate of the
number of bit operations that are needed to carry out the algorithm when it is programmed
to run on a computer. Because our algorithm for the computation of χp (a) uses a variation
of the Euclidean algorithm in Step 3, which accounts for most of the computational complexity,
one can show that the algorithm requires only $O((\log_2 a)^2)$ bit operations to compute
χp (a), which means that the algorithm is very fast and efficient. Thus one can very quickly
determine the integer w that is needed to implement Shamir’s algorithm.
In addition to finding a quadratic residue w of n, the initial steps in Shamir’s algorithm
also require the determination of the square root of w modulo n. The simple procedure
that we described for computing this square root uses the powers $w^{\frac{1}{4}(p+1)}$ and $w^{\frac{1}{4}(q+1)}$ in an
application of the Chinese remainder theorem, with the exponents of w here being extremely
large. This situation thus calls for a quick and efficient procedure for the computation of
high-powered modular exponentiation, and so we will now present an algorithm which does
that.
The problem is to compute, for given positive integers b, n, and N with b < n, the
power $b^N \bmod n$. We do this by first expressing the exponent N in its base-2 expansion
$(a_k a_{k-1} \dots a_1 a_0)_{\text{base } 2}$. Then compute the nonnegative minimal ordinary residues mod n of
$b, b^2, \dots, b^{2^k}$ by successively squaring and reducing mod n. The final step is to multiply together
the minimal nonnegative ordinary residues of $b^{2^i}$ which correspond to $a_i = 1$, reducing
modulo n after each multiplication. It can be shown that the nonnegative minimal ordinary
residue of $b^N \bmod n$ can be computed by this algorithm using only $O((\log_2 n)^2 \log_2 N)$ bit
operations.


The following example illustrates the calculations which are typically involved. We wish
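to compute $15^{402} \bmod 1607$. The binary expansion of 402 is 110010010. We calculate that
$$\begin{aligned}
15 &\equiv 15 \bmod 1607 \\
15^2 &\equiv 225 \bmod 1607 \\
15^4 &\equiv 808 \bmod 1607 \\
15^8 &\equiv 422 \bmod 1607 \\
15^{16} &\equiv 1314 \bmod 1607 \\
15^{32} &\equiv 678 \bmod 1607 \\
15^{64} &\equiv 82 \bmod 1607 \\
15^{128} &\equiv 296 \bmod 1607 \\
15^{256} &\equiv 838 \bmod 1607.
\end{aligned}$$
It follows that
$$\begin{aligned}
15^{402} &= 15^{256+128+16+2} \\
&\equiv 838 \cdot 296 \cdot 1314 \cdot 225 \bmod 1607 \\
&\equiv 570 \cdot 1314 \cdot 225 \bmod 1607 \\
&\equiv 118 \cdot 225 \bmod 1607 \\
&\equiv 838 \bmod 1607.
\end{aligned}$$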
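
The successive-squaring procedure can be written as a short loop. In the sketch below the function name `power_mod` is an assumption made for illustration; the loop scans the binary digits of N from $a_0$ upward, keeping the current residue of $b^{2^i}$ and multiplying it into the result whenever $a_i = 1$. Python's built-in `pow(b, N, n)` computes the same value and is used here only as a cross-check.

```python
def power_mod(b, N, n):
    """Least nonnegative residue of b**N mod n by successive squaring."""
    result = 1 % n
    square = b % n  # residue of b^(2^0)
    while N > 0:
        if N & 1:  # the current binary digit a_i equals 1
            result = (result * square) % n
        square = (square * square) % n  # residue of b^(2^(i+1))
        N >>= 1
    return result

print(power_mod(15, 402, 1607), pow(15, 402, 1607))
```
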

CHAPTER 5

The Zeta Function of an Algebraic Number Field and Some Applications

At the end of section 6 of chapter 4, we left ourselves with the problem of determining the
finite nonempty subsets S of the positive integers such that for infinitely many primes p, S
is a set of non-residues of p. We observed there that if S has this property then the product
of all the elements in every subset of S of odd cardinality is never a square. The object of
this chapter is to prove the converse of this statement, i.e., we wish to prove Theorem 4.12.
The proof of Theorem 4.12 that we present uses ideas that are closely related to the ones
that Dirichlet used in his proof of Theorem 4.5, together with some technical improvements
due to Hilbert. The key tool that we need is an analytic function attached to algebraic
number fields, called the zeta function of the field. The definition of this function requires a
significant amount of mathematical technology from the theory of algebraic numbers, and so
in section 1 we begin with a discussion of the results from algebraic number theory that will
be required, with Dedekind’s Ideal Distribution Theorem as the final goal of this section. The
zeta function of an algebraic number field is defined and studied in section 2; in particular,
the Euler-Dedekind product formula for the zeta function is derived here. In section 3 a
product formula for the zeta function of a quadratic number field that will be required in
the proof of Theorem 4.12 is derived from the Euler-Dedekind product formula. The proof
of Theorem 4.12, the principal object of this chapter, is carried out in section 4 and some
results which are closely related to that theorem are also established there. In the interest
of completeness, we prove in section 5 the Fundamental Theorem of Ideal Theory, Theorem
3.16 of Chapter 3, since it is used in an essential way in the derivation of the Euler-Dedekind
product formula.

1. Dedekind’s Ideal Distribution Theorem

We have already seen in sections 11 and 12 of Chapter 3 how the factorization of ideals in
a quadratic number field can be used to prove the Law of Quadratic Reciprocity. The crucial
fact on which that proof of quadratic reciprocity relies is the Fundamental Theorem of Ideal
Theory (Theorem 3.16), the result which describes the fundamental algebraic structure of the
ideals in the ring R of algebraic integers in an algebraic number field F . As we mentioned in

Chapter 3, the Fundamental Theorem of Ideal Theory is due to Richard Dedekind. In order
to define and study the zeta function of F , we will need another very important theorem of
Dedekind which provides a precise numerical measure of how the ideals of R are distributed
in R according to the cardinality of the quotient rings of R modulo the ideals. This result
is often called Dedekind’s Ideal Distribution Theorem, and the purpose of this section is
to develop enough of the theory of ideals in R so that we can state the Ideal Distribution
Theorem precisely. All of this information will then be used in the next section to define
the zeta function and establish the properties of the zeta function that we will need to prove
Theorem 4.12.
Let F denote an algebraic number field of degree n that will remain fixed in the discussion
until indicated otherwise, and let R denote the ring of algebraic integers in F . In section 11
of chapter 3, we mentioned that every prime ideal of R is maximal and that the cardinality
of the quotient ring R/I of R is finite for all nonzero ideals I of R. Consequently, the ideals
of R are exceptionally “large” subsets of R. We begin our discussion here by proving these
facts as part of the following proposition.

Proposition 5.1. (i) An ideal of R is prime if and only if it is maximal.


(ii) If I is a non-zero ideal of R then the cardinality of the quotient ring R/I is finite.
(iii) If I is a prime ideal of R then there exists a rational prime q ∈ Z such that I∩Z = qZ.
In particular q is the unique rational prime contained in I.
(iv) If I is a prime ideal of R and q is the rational prime in I then R/I is a finite field
of characteristic q, hence there exists a unique positive integer d such that $|R/I| = q^d$.

Proof. The proofs of statements (i) and (ii) of Proposition 5.1 depend on the existence of
an integral basis of R. A subset {α1 , . . . , αk } of R is an integral basis of R if for each α ∈ R,
there exists a k-tuple (z1 , . . . , zk ) of integers, uniquely determined by α, such that
$$\alpha = \sum_{i=1}^{k} z_i \alpha_i.$$
It is an immediate consequence of the definition that an integral basis {α1 , . . . , αk } is linearly
independent over Z, i.e., if (z1 , . . . , zk ) is a k-tuple of integers such that $\sum_{i=1}^{k} z_i \alpha_i = 0$ then
zi = 0 for i = 1, . . . , k. R always has an integral basis (the interested reader may consult
Hecke [27], section 22, Theorem 64, for a proof of this), and it is not difficult to prove that
every integral basis of R is a basis of F as a vector space over Q; consequently, all integral
bases of R contain exactly n elements.
Now for the proof of (i). Let I be a prime ideal of R: we need to prove that I is a
maximal ideal, i.e., we take an ideal J of R which properly contains I and show that J = R.

Toward that end, let {α1 , . . . , αn } be an integral basis of R, and let 0 ≠ β ∈ I. If
$$x^m + \sum_{i=0}^{m-1} z_i x^i$$
is the minimal polynomial of β over Q then $z_0 \neq 0$ (otherwise, β is the root of a nonzero
polynomial over Q of degree less than m) and
$$z_0 = -\beta^m - \sum_{i=1}^{m-1} z_i \beta^i \in I,$$
hence ±z0 ∈ I, and so I contains a positive integer a. We claim that each element of R can
be expressed in the form
$$a\gamma + \sum_{i=1}^{n} r_i \alpha_i,$$
where γ ∈ R, ri ∈ [0, a − 1], i = 1, . . . , n.
Assume this for now, and let α ∈ J \ I. Then for each k ∈ [1, ∞),
$$\alpha^k = a\gamma_k + \sum_{i=1}^{n} r_{ik}\alpha_i, \quad \gamma_k \in R,\ r_{ik} \in [0, a - 1],\ i = 1, \dots, n,$$
hence the sequence $(\alpha^k - a\gamma_k : k \in [1, \infty))$ has only finitely many values; consequently there
exist positive integers l < k such that
$$\alpha^l - a\gamma_l = \alpha^k - a\gamma_k.$$
Hence
$$\alpha^l(\alpha^{k-l} - 1) = \alpha^k - \alpha^l = a(\gamma_k - \gamma_l) \in I \quad (a \in I\,!).$$
Because I is prime, either $\alpha^l \in I$ or $\alpha^{k-l} - 1 \in I$. However, $\alpha^l \notin I$ because α ∉ I and I is
prime. Hence
$$\alpha^{k-l} - 1 \in I \subseteq J.$$
But k − l > 0 and α ∈ J (by the choice of α), and so −1 ∈ J. As J is an ideal, this implies
that J = R.
Our claim must now be verified. Let α ∈ R, and find zi ∈ Z such that
$$\alpha = \sum_{i=1}^{n} z_i \alpha_i.$$
The division algorithm in Z implies that there exist mi ∈ Z, ri ∈ [0, a − 1], i = 1, . . . , n,
such that zi = mi a + ri , i = 1, . . . , n. Thus
$$\alpha = a \sum_i m_i \alpha_i + \sum_i r_i \alpha_i = a\gamma + \sum_i r_i \alpha_i,$$
with γ ∈ R.
We verify (ii) next. Let L 6= {0} be an ideal of R. We wish to show that |R/L| is finite.
A propos of that, choose a ∈ L ∩ Z with a > 0 (that such an a exists follows from the
previous proof of statement (i)). Then aR ⊆ L, hence there is a surjection of R/aR onto
R/L, whence it suffices to show that |R/aR| is finite.
We will in fact prove that $|R/aR| = a^n$. Consider for this the set
\[ S = \Big\{ \sum_i z_i \alpha_i : z_i \in [0, a-1] \Big\}. \]
We show that $S$ is a set of coset representatives of $R/aR$; if this is true then clearly $|R/aR| = |S| = a^n$. Thus, let $\alpha = \sum_i z_i \alpha_i \in R$. Then there exist $m_i \in \mathbb{Z}$, $r_i \in [0, a-1]$, $i = 1, \dots, n$, such that $z_i = m_i a + r_i$, $i = 1, \dots, n$. Hence
\[ \alpha - \sum_i r_i \alpha_i = \Big(\sum_i m_i \alpha_i\Big) a \in aR \quad\text{and}\quad \sum_i r_i \alpha_i \in S, \]
and so each coset of $R/aR$ contains an element of $S$.


Let $\sum_i a_i \alpha_i$, $\sum_i a_i' \alpha_i$ be elements of $S$ in the same coset. Then
\[ \sum_i (a_i - a_i')\alpha_i = a\alpha, \quad\text{for some } \alpha \in R. \]
Hence there exist $m_i \in \mathbb{Z}$ such that
\[ \sum_i (a_i - a_i')\alpha_i = \sum_i m_i a \alpha_i, \]
and so the linear independence (over $\mathbb{Z}$) of $\{\alpha_1, \dots, \alpha_n\}$ implies that
\[ a_i - a_i' = m_i a, \quad i = 1, \dots, n, \]
i.e., $a$ divides $a_i - a_i'$ in $\mathbb{Z}$. Because $|a_i - a_i'| < a$ for all $i$, it follows that $a_i - a_i' = 0$ for all $i$. Hence each coset of $R/aR$ contains exactly one element of $S$.
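The counting argument above can be tried out on a small case. The sketch below is only an illustration under stated assumptions: it takes $R = \mathbb{Z}[i]$ with integral basis $\{1, i\}$ and $a = 3$ (neither choice comes from the text), identifies each element with its integer coordinate vector, and uses hypothetical helper names.

```python
from itertools import product

def coset_representatives(n, a):
    """All coordinate vectors (z_1, ..., z_n) with 0 <= z_i <= a - 1, i.e. the set S."""
    return list(product(range(a), repeat=n))

def reduce_to_representative(coords, a):
    """Replace each coordinate z_i = m_i * a + r_i by its remainder r_i in [0, a - 1]."""
    return tuple(z % a for z in coords)

# Assumed example: R = Z[i] with integral basis {1, i}, so n = 2, and a = 3.
n, a = 2, 3
S = coset_representatives(n, a)

# Reduce every x + y*i in a finite window; each one lands on a member of S.
window = [(x, y) for x in range(-6, 7) for y in range(-6, 7)]
hit = {reduce_to_representative(v, a) for v in window}

print(len(S))          # a**n = 9
print(hit == set(S))   # True: every representative is reached
```

Two coordinate vectors reduce to the same tuple exactly when they differ by a multiple of $a$ in every coordinate, which is the uniqueness half of the argument.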
In order to verify (iii), note first that the proof of statement (i) implies that $I \cap \mathbb{Z} \neq \{0\}$ and $I \cap \mathbb{Z} \neq \mathbb{Z}$ because $1 \notin I$. Hence $I \cap \mathbb{Z}$ is a prime ideal of $\mathbb{Z}$, and is hence generated in $\mathbb{Z}$ by a unique prime number $q$.
Finally, we prove (iv) by concluding from Proposition 5.1(i) that I is a maximal ideal
of R: a standard result in elementary ring theory asserts that if M is a maximal ideal in a
commutative ring A with identity then the quotient ring A/M is a field (Hungerford [29],
Theorem III.2.20), hence R/I is a field, and is finite by Proposition 5.1(ii).
To see that R/I has characteristic q, note first that I ∩ Z = qZ, and so there is a natural
isomorphism of the field Z/qZ into R/I such that the identity in Z/qZ is mapped onto the
identity of R/I. Because Z/qZ has characteristic q, it follows that if 1̄ is the identity in R/I

then q 1̄ = 0 in R/I, and q is the least positive integer n such that n1̄ = 0 in R/I. Hence R/I
has characteristic q. QED
Remark. It is a consequence of Theorem 3.16 and Proposition 5.1 that $R$ contains infinitely many prime ideals.
It follows from Proposition 5.1(ii) that if $I \neq \{0\}$ is an ideal of $R$ then $|R/I|$ is finite.
We set
N(I) = |R/I|,
and call this the norm of I (we defined the norm of an ideal in this way already in section
11 of Chapter 3 for ideals in a quadratic number field). The norm function N on nonzero
ideals is multiplicative with respect to the ideal product, i.e., we have

Proposition 5.2. If I and J are (not necessarily distinct) nonzero ideals of R then
N(IJ) = N(I)N(J).

Proof. Hecke [27], section 27, Theorem 79. QED


The multiplicativity of the norm function on ideals will play a crucial role in the derivation
of a very important product expansion formula for the zeta function that will be done in the
next section.
Now, let
\[ \mathcal{I} = \text{the set of all nonzero ideals of } R. \]
If $n \in [1, \infty)$, let
\[ Z(n) = |\{I \in \mathcal{I} : N(I) \le n\}|. \]
The following proposition states a very important fact about the parameters $Z(n)$.

Proposition 5.3. $Z(n) < +\infty$, for all $n \in [1, \infty)$.

As a result of Proposition 5.3, (Z(1), Z(2), Z(3) . . . ) is a sequence of positive integers


whose behavior determines how the ideals of R are distributed throughout R in accordance
with the cardinality of the quotient rings of R. Useful information about the behavior of
this sequence can hence be converted into useful information about the distribution of the
ideals in R, and, as we shall see shortly, the Ideal Distribution Theorem gives very useful
information about the behavior of this sequence. We turn now to the
Proof of Proposition 5.3.
Perhaps the most elegant way to verify Proposition 5.3 is to make use of the ideal class
group of R. We defined this group in section 11 of Chapter 3, and for the benefit of the
reader, we will recall how that goes. First declare that the ideals I and J of R are equivalent
if there exist nonzero elements α and β of R such that αI = βJ. This defines an equivalence

relation on the set of all ideals of R, and the corresponding equivalence classes are the ideal
classes of R. If we let [I] denote the ideal class which contains the ideal I then we define
a multiplication on the set of ideal classes by declaring that the product of [I] and [J] is
[IJ]. It can be shown that when endowed with this product (which is well-defined), the ideal
classes of R form an abelian group, called the ideal-class group of R. It is easy to see that
the set of all principal ideals of R is an ideal class, called the principal class, and one can
prove that the principal class is the identity element of the ideal-class group. The ideal-class
group is always finite, and the order of the ideal-class group of R is called the class number
of R.
We begin the proof of Proposition 5.3 by letting $C$ be an ideal class of $R$ and for each $n \in [1, \infty)$, letting $Z_C(n)$ denote the set
\[ \{I \in C \cap \mathcal{I} : N(I) \le n\}. \]
We claim that $|Z_C(n)|$ is finite. In order to verify this, let $J$ be a fixed nonzero ideal in $C^{-1}$ (the inverse of $C$ in the ideal-class group), and let $0 \neq \alpha \in J$. Then there is a unique ideal $I$ such that $\alpha R = IJ$, and since $[I] = C[IJ] = C[\alpha R] = C$, it follows that $I \in C \cap \mathcal{I}$. Moreover, the map $\alpha R \to I$ is a bijection of the set of all nonzero principal ideals contained in $J$ onto $C \cap \mathcal{I}$. Proposition 5.2 implies that
\[ N(\alpha R) = N(I)N(J), \]
hence
\[ N(I) \le n \quad\text{if and only if}\quad N(\alpha R) \le nN(J). \]
Hence there is a bijection of $Z_C(n)$ onto the set
\[ \mathcal{J} = \{\{0\} \neq \alpha R \subseteq J : N(\alpha R) \le nN(J)\}, \]
and so it suffices to show that $\mathcal{J}$ is a finite set.


That $|\mathcal{J}|$ is finite will follow if we prove that there is only a finite number of principal ideals of $R$ whose norms do not exceed a fixed constant. Suppose that this latter statement is false, i.e., there are infinitely many elements $\alpha_1, \alpha_2, \dots$ of $R$ such that the principal ideals $\alpha_i R$, $i = 1, 2, \dots$ are distinct and $(N(\alpha_1 R), N(\alpha_2 R), \dots)$ is a bounded sequence. As all of the numbers $N(\alpha_i R)$ are positive integers, we may suppose with no loss of generality that $N(\alpha_i R)$ all have the same value $z$.
We now wish to locate z in each ideal αi R. Toward that end, use the Primitive Element
Theorem (Hecke [27], section 19, Theorem 52) to find θ ∈ F , of degree n over Q, such that
for each element ν of F , there is a unique polynomial f ∈ Q[x] such that ν = f (θ) and the
degree of f does not exceed n − 1. For each i, we hence find fi ∈ Q[x] of degree no larger

than $n - 1$ and for which $\alpha_i = f_i(\theta)$. If $\theta_1, \dots, \theta_n$, with $\theta_1 = \theta$, are the roots of the minimal polynomial of $\theta$ over $\mathbb{Q}$, then one can show that
\[ N(\alpha_i R) = \Big| \prod_{k=1}^{n} f_i(\theta_k) \Big| \]
(Hecke [27], section 27, Theorem 76). Moreover, the degree $d_i$ of $\alpha_i$ over $\mathbb{Q}$ divides $n$ in $\mathbb{Z}$, and if $\alpha_i^{(1)}, \dots, \alpha_i^{(d_i)}$, with $\alpha_i^{(1)} = \alpha_i$, denote the roots of the minimal polynomial of $\alpha_i$ over $\mathbb{Q}$, then the numbers on the list $f_i(\theta_k)$, $k = 1, \dots, n$, are obtained by repeating each $\alpha_i^{(j)}$ $n/d_i$ times (Hecke [27], section 19, Theorem 54). If $c_0$ denotes the constant term of the minimal polynomial of $\alpha_i$ over $\mathbb{Q}$, it follows that
\[ \prod_{k=1}^{n} f_i(\theta_k) = \Big( \prod_{k=1}^{d_i} \alpha_i^{(k)} \Big)^{n/d_i} = \big((-1)^{d_i} c_0\big)^{n/d_i} \in \mathbb{Z}. \]
Because $f_i(\theta_k)$ is an algebraic integer for all $i$ and $k$, it hence follows that
\[ \frac{z}{\alpha_i} = \pm \prod_{k=2}^{n} f_i(\theta_k) \in \mathcal{R} \cap F = R, \]
whence $z \in \alpha_i R$, for all $i$.


If we now let $\{\beta_1, \dots, \beta_n\}$ be an integral basis of $R$ then the claim in the proof of Proposition 5.1(i) shows that for each $i$ there exist $\gamma_i \in R$ and $z_{ij} \in [0, z-1]$, $j = 1, \dots, n$, such that
\[ \alpha_i = z\gamma_i + \sum_{j=1}^{n} z_{ij}\beta_j. \]
Because $z \in \alpha_i R$, it follows that
\[ \alpha_i R = zR + \Big( \sum_{j=1}^{n} z_{ij}\beta_j \Big) R, \quad\text{for all } i. \]
However, the sum $\sum_{j=1}^{n} z_{ij}\beta_j$ can have only finitely many values; we conclude that the ideals $\alpha_i R$, $i = 1, 2, \dots$ cannot all be distinct, contrary to their choice.


We now have what we need to easily prove that $Z(n)$ is finite. Let $C_1, \dots, C_h$ denote the distinct ideal classes of $R$. The set of all the ideals of $R$ is the (pairwise disjoint) union of the $C_i$'s, hence $\{I \in \mathcal{I} : N(I) \le n\}$ is the union of $Z_{C_1}(n), \dots, Z_{C_h}(n)$. Because each set $Z_{C_i}(n)$ is finite, so is $\{I \in \mathcal{I} : N(I) \le n\}$, and hence $Z(n) < +\infty$. QED
We can now state the main result of this section:

Theorem 5.4. (Dedekind’s Ideal Distribution Theorem). The limit
\[ \lim_{n\to\infty} \frac{Z(n)}{n} = \lambda \]
exists, is positive, and its value is given by the formula
\[ \lambda = \frac{2^{r+1}\pi^{e}\rho}{w\sqrt{|d|}}\, h, \]
where

$d$ = discriminant of $F$,
$e$ = $\frac{1}{2}$ (number of complex embeddings of $F$ over $\mathbb{Q}$),
$h$ = class number of $R$,
$r$ = unital rank of $R$,
$\rho$ = regulator of $F$,
$w$ = order of the group of roots of unity in $R$.

Thus the number of nonzero ideals of R whose norms do not exceed n is asymptotic to
λn as n → +∞.
The establishment of Theorem 5.4 calls for several results from the theory of algebraic
numbers whose exposition would take us too far from what we wish to do here, so we
omit the proof and instead refer the interested reader to Hecke [27], section 42, Theorem
122. Although we will make no further use of them, readers who are also interested in
the definition of the discriminant of F and the regulator of F , should see, respectively, the
definition on p.73 and the definition on p.116 of Hecke [27]. We will define the parameter e
and the unital rank of R in the two paragraphs after the next one. The integers d, e, h, r, w,
and the real number ρ are fundamental parameters associated with F which govern many
aspects of the arithmetic and algebraic structure of F and R; Theorem 5.4 is a remarkable
example of how these parameters work in concert to do that.
Although the parameters which are used in the formula for the value of the limit $\lambda = \lim_{n\to\infty} Z(n)/n$ are rather complicated to define for an arbitrary algebraic number field, they are much simpler to describe for a quadratic number field. So as to gain a better idea of how they determine the asymptotic behavior of the sequence $Z(1), Z(2), \dots$, we will take a closer look at what they are for quadratic fields. Thus, let $\mathbb{Q}(\sqrt{m})$ be the quadratic number field determined by the square-free integer $m \neq 0$ or $1$. As we pointed out in section 11 of Chapter 3, the discriminant of $\mathbb{Q}(\sqrt{m})$ is either $m$ or $4m$, if $m$ is, or respectively, is not, congruent to 1 mod 4.
In order to calculate the parameter $e$ in Theorem 5.4, one needs to consider the embeddings of an algebraic number field, i.e., the ring isomorphisms of the field into the set of complex numbers which fix each element of $\mathbb{Q}$. An embedding is said to be real if its range is a subset of the real numbers; otherwise, the embedding is said to be complex. It can be shown that the number of embeddings is equal to the degree of the field and that the number of complex embeddings is even, and so $e$ is well-defined in Theorem 5.4. It follows that the quadratic field $\mathbb{Q}(\sqrt{m})$ has precisely two embeddings: one is the trivial embedding which maps each element of $\mathbb{Q}(\sqrt{m})$ to itself, and the other is the mapping on $\mathbb{Q}(\sqrt{m})$ induced by the algebraic conjugate of $\sqrt{m}$ which sends the element $q + r\sqrt{m}$ for $(q, r) \in \mathbb{Q} \times \mathbb{Q}$ to the element $q - r\sqrt{m}$. It follows that if $m > 0$ then there are no complex embeddings of $\mathbb{Q}(\sqrt{m})$ and if $m < 0$ then there are exactly 2 complex embeddings. Thus if $m > 0$ then $e = 0$ and if $m < 0$ then $e = 1$.
We have already defined the class number $h$, and so we turn next to the unital rank $r$. This parameter is determined by the structure of the group of units in an algebraic number field. It can be shown that the group of units in the ring of algebraic integers $R$ in the algebraic number field $F$ is isomorphic to the direct sum of the finite cyclic group of roots of unity that are contained in $F$ and a free abelian group of finite rank $r$ (Hecke [27], section 34, Theorem 100). The rank $r$ of this free-abelian summand is by definition the unital rank of $R$. When we now let $F = \mathbb{Q}(\sqrt{m})$, it can be shown that if $m < 0$ then the group of units of $\mathbb{Q}(\sqrt{m})$ has no free-abelian summand, and so $r = 0$ in this case. On the other hand, if $m > 0$ then there is a unit $\varpi$ of $R$ in the group of units $U(R)$ such that $U(R) = \{\pm\varpi^n : n \in \mathbb{Z}\}$. If $\varpi$ is chosen to exceed 1 then it is uniquely determined as a generator of $U(R)$ in this way and is called the fundamental unit of $R$. It follows that when $m > 0$, the group of units of $R$ is isomorphic to the direct sum of the cyclic group of order 2 and the free abelian group $\mathbb{Z}$, hence the unital rank $r$ is 1 in this case.
The regulator $\rho$ of an algebraic number field $F$ is also determined by the group of units of $R$ by means of a rather complicated formula that uses a determinant that is calculated from a basis of the free-abelian summand of the group of units. For a quadratic number field $\mathbb{Q}(\sqrt{m})$ with $m < 0$, whose group of units has no free-abelian summand, the regulator is taken to be 1, and if $m > 0$ then the regulator of $\mathbb{Q}(\sqrt{m})$ turns out to be $\log\varpi$, where $\varpi$ is the fundamental unit of $R$.
If $m > 0$ then the group of roots of unity in $\mathbb{Q}(\sqrt{m})$ is simply $\{-1, 1\}$, and so the order $w$ of the group of roots of unity is 2. If $m < 0$ then it can be shown that $w$ is 2 when $m \neq -1, -3$, it is 6 when $m = -3$, and it is 4 when $m = -1$.
Taking all of this information into account, we see that for the quadratic number field $\mathbb{Q}(\sqrt{m})$, the conclusion of Theorem 5.4 can be stated as follows: if $m > 0$ and $\varpi$ is the fundamental unit in $R = \mathcal{R} \cap \mathbb{Q}(\sqrt{m})$ then
\[ \lim_{n\to\infty} \frac{Z(n)}{n} = \begin{cases} \dfrac{2\log\varpi}{\sqrt{m}}\, h, & \text{if } m \equiv 1 \bmod 4, \\[2ex] \dfrac{\log\varpi}{\sqrt{m}}\, h, & \text{if } m \not\equiv 1 \bmod 4, \end{cases} \]
and if $m < 0$ then
\[ \lim_{n\to\infty} \frac{Z(n)}{n} = \begin{cases} \dfrac{2\pi}{w\sqrt{|m|}}\, h, & \text{if } m \equiv 1 \bmod 4, \\[2ex] \dfrac{\pi}{w\sqrt{|m|}}\, h, & \text{if } m \not\equiv 1 \bmod 4, \end{cases} \]
where $w$ is 2 when $m \neq -1, -3$, 6 when $m = -3$, and 4 when $m = -1$. The Ideal Distribution Theorem for quadratic number fields is in fact due to Dirichlet; after a careful study of Dirichlet’s result, Dedekind generalized it to arbitrary algebraic number fields.
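The case $m = -1$ can be checked numerically. The following sketch is not from the text: it assumes the standard facts that every ideal of $\mathbb{Z}[i]$ is principal (so $h = 1$) and that each nonzero ideal has exactly $w = 4$ generators $x + yi$, each of norm $x^2 + y^2$. Under those assumptions $Z(n)$ is a lattice-point count, and the formula above predicts the limit $\pi/4$. The function names are hypothetical.

```python
import math

def count_ideals_gaussian(n):
    """Z(n) for R = Z[i], assuming h = 1 and w = 4:
    count nonzero (x, y) with x^2 + y^2 <= n and divide by the 4 units."""
    points = 0
    r = math.isqrt(n)
    for x in range(-r, r + 1):
        # number of integers y with x^2 + y^2 <= n
        points += 2 * math.isqrt(n - x * x) + 1
    return (points - 1) // 4   # drop the origin, then identify associates

def limit_imaginary_quadratic(m, h, w):
    """Value of lambda in Theorem 5.4 for m < 0 (r = 0, e = 1, rho = 1)."""
    d = m if m % 4 == 1 else 4 * m   # discriminant of Q(sqrt(m))
    return 2 * math.pi * h / (w * math.sqrt(abs(d)))

n = 10**6
ratio = count_ideals_gaussian(n) / n
lam = limit_imaginary_quadratic(-1, h=1, w=4)
print(ratio, lam)   # the two numbers should be close
```

The same `limit_imaginary_quadratic` helper evaluates the displayed formula for other $m < 0$ once $h$ and $w$ are supplied; it does not compute the class number itself.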

2. The Zeta Function of an Algebraic Number Field

We are now in a position to define and study the zeta function. Let F be an algebraic
number field of degree n and let R denote the ring of algebraic integers in F , as before.
Consider next the set $\mathcal{I}$ of all nonzero ideals of $R$. It is a consequence of Proposition 5.3 that $\mathcal{I}$ is countable, and so if $s \in \mathbb{C}$ then the formal series
\[ \sum_{I \in \mathcal{I}} \frac{1}{N(I)^s} \tag{$\ast$} \]
is defined, relative to some fixed enumeration of $\mathcal{I}$. As we shall see, the zeta function of $F$ will be defined by this series. However, in order to do that precisely and rigorously, a careful examination of the convergence of this series must be done first. That is what we will do next.
If we let
\[ L(n) = |\{I \in \mathcal{I} : N(I) = n\}|, \quad n \in [1, \infty), \]
then by formal rearrangement of its terms, we can write the series ($\ast$) as
\[ \sum_{n=1}^{\infty} \frac{L(n)}{n^s}. \tag{$\ast\ast$} \]
The series ($\ast\ast$) is a Dirichlet series, i.e., a series of the form
\[ \sum_{n=1}^{\infty} \frac{a_n}{n^s}, \]

where (an ) is a given sequence of complex numbers. The L-function of a Dirichlet character
is another very important example of a Dirichlet series.
We will determine the convergence of the series (∗) by studying the convergence of the
Dirichlet series (∗∗). This will be done by way of the following proposition, which describes
how a Dirichlet series converges.

Proposition 5.5. Let $(a_n)$ be a sequence of complex numbers, let
\[ S(n) = \sum_{k=1}^{n} a_k, \]
and suppose that there exist $\sigma \ge 0$, $C > 0$ such that
\[ \left| \frac{S(n)}{n^{\sigma}} \right| \le C, \quad\text{for all } n \text{ sufficiently large.} \]
Then the Dirichlet series
\[ \sum_{n=1}^{\infty} \frac{a_n}{n^s} \]
converges in the half-plane $\mathrm{Re}\, s > \sigma$ and uniformly in each closed and bounded subset of this half-plane. Moreover, if
\[ \lim_{n\to\infty} \frac{S(n)}{n} = d \]
then
\[ \lim_{s\to 1^+} (s-1) \sum_{n=1}^{\infty} \frac{a_n}{n^s} = d. \]

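Proof (according to Hecke [27], section 42, Lemmas (a), (b), (c)). Let $m$ and $h$ be integers, with $m > 0$ and $h \ge 0$, and let $K \subseteq \{s : \mathrm{Re}\, s > \sigma\}$ be a compact (closed and bounded) set. Then
\[ \begin{aligned} \sum_{n=m}^{m+h} \frac{a_n}{n^s} &= \sum_{n=m}^{m+h} \frac{S(n) - S(n-1)}{n^s} \\ &= \frac{S(m+h)}{(m+h)^s} - \frac{S(m-1)}{m^s} + \sum_{n=m}^{m+h-1} S(n)\Big( \frac{1}{n^s} - \frac{1}{(n+1)^s} \Big) \\ &= \frac{S(m+h)}{(m+h)^s} - \frac{S(m-1)}{m^s} + s\sum_{n=m}^{m+h-1} S(n) \int_{n}^{n+1} \frac{dx}{x^{s+1}}. \end{aligned} \]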
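The summation-by-parts step is easy to get wrong at the endpoints, so a numerical spot check may be useful. The sketch below compares the direct block sum with the second line of the display for an arbitrary complex sequence and a complex $s$; the test data, seed, and function names are assumptions made only for this illustration.

```python
import random

def direct_sum(a, m, h, s):
    """Sum of a_n / n^s for n = m, ..., m + h (a is indexed from 1; a[0] is unused)."""
    return sum(a[n] / n**s for n in range(m, m + h + 1))

def by_parts(a, m, h, s):
    """Same block rewritten with the partial sums S(n) = a_1 + ... + a_n."""
    S = [0j] * (m + h + 1)
    for n in range(1, m + h + 1):
        S[n] = S[n - 1] + a[n]
    boundary = S[m + h] / (m + h)**s - S[m - 1] / m**s
    interior = sum(S[n] * (1 / n**s - 1 / (n + 1)**s) for n in range(m, m + h))
    return boundary + interior

random.seed(0)
a = [0] + [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(40)]
s = 1.5 + 2j
print(abs(direct_sum(a, 5, 20, s) - by_parts(a, 5, 20, s)))   # rounding error only
```

The third line of the display then follows from $n^{-s} - (n+1)^{-s} = s\int_n^{n+1} x^{-s-1}\,dx$, which the sketch does not test separately.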


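If we now use the stipulated bound on the quotients $S(n)/n^{\sigma}$, it follows that
\[ \begin{aligned} \left| \sum_{n=m}^{m+h} \frac{a_n}{n^s} \right| &\le \frac{2C}{m^{\mathrm{Re}\, s - \sigma}} + C|s| \int_{m}^{\infty} \frac{dx}{x^{\mathrm{Re}\, s - \sigma + 1}} \\ &= \frac{2C}{m^{\mathrm{Re}\, s - \sigma}} + \frac{C|s|}{\mathrm{Re}\, s - \sigma} \cdot \frac{1}{m^{\mathrm{Re}\, s - \sigma}}. \end{aligned} \]
Because $K$ is a compact subset of $\mathrm{Re}\, s > \sigma$, it is bounded and lies at a positive distance $\delta$ from $\mathrm{Re}\, s = \sigma$, i.e., there is a positive constant $C'$ such that
\[ \mathrm{Re}\, s - \sigma \ge \delta \quad\text{and}\quad |s| \le C', \quad\text{for all } s \in K. \]
Hence there is a positive constant $C''$, independent of $m$ and $h$, such that
\[ \left| \sum_{n=m}^{m+h} \frac{a_n}{n^s} \right| \le C''\Big(1 + \frac{1}{\delta}\Big) \frac{1}{m^{\delta}}, \quad\text{for all } s \in K. \]
As $m$ and $h$ are chosen arbitrarily and $\delta$ depends on neither $m$ nor $h$, this estimate implies that the Dirichlet series converges uniformly on $K$, and as $K$ is also chosen arbitrarily, it follows that the series converges to a function continuous in $\mathrm{Re}\, s > \sigma$.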
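We now assume that
\[ \lim_{n\to\infty} \frac{S(n)}{n} = d; \]
we wish to verify that
\[ \lim_{s\to 1^+} (s-1) \sum_{n=1}^{\infty} \frac{a_n}{n^s} = d. \]
From what we have just shown, it follows that the Dirichlet series now converges for $s > 1$. Let
\[ S(n) = dn + \varepsilon_n n, \quad\text{where } \lim_{n\to\infty} \varepsilon_n = 0, \]
\[ \varphi(s) = \sum_{n=1}^{\infty} \frac{a_n}{n^s}, \quad s > 1. \]
Then for $s > 1$, we have that
\[ \begin{aligned} |\varphi(s) - d\zeta(s)| &= \left| s \sum_{n=1}^{\infty} n\varepsilon_n \int_{n}^{n+1} \frac{dx}{x^{s+1}} \right| \\ &< s \sum_{n=1}^{\infty} |\varepsilon_n| \int_{n}^{n+1} \frac{dx}{x^{s}}. \end{aligned} \]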

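Let $\epsilon > 0$, and choose an integer $N$ and a positive constant $A$ such that $|\varepsilon_n| < \epsilon$, for all $n \ge N$, and $|\varepsilon_n| \le A$, for all $n$. Then
\[ \begin{aligned} |(s-1)\varphi(s) - d(s-1)\zeta(s)| &< As(s-1) \sum_{n=1}^{N-1} \int_{n}^{n+1} \frac{dx}{x} + \epsilon s(s-1) \sum_{n=N}^{\infty} \int_{n}^{n+1} \frac{dx}{x^{s}} \\ &= As(s-1)\log N + \epsilon s(s-1) \int_{N}^{\infty} \frac{dx}{x^{s}}. \end{aligned} \]
Because the last expression has limit $\epsilon$ as $s \to 1$, it follows that
\[ \lim_{s\to 1^+} \big( (s-1)\varphi(s) - d(s-1)\zeta(s) \big) = 0. \]
We now claim that
\[ \lim_{s\to 1^+} (s-1)\zeta(s) = 1; \]
if this is so, then
\[ \lim_{s\to 1^+} (s-1)\varphi(s) = d, \]
as desired. This claim can be verified upon noting that
\[ \int_{n}^{n+1} \frac{dx}{x^{s}} < \frac{1}{n^{s}} < \int_{n-1}^{n} \frac{dx}{x^{s}}, \quad\text{for all } n \in [2, \infty) \text{ and for all } s > 1. \]
Hence
\[ \frac{1}{s-1} = \int_{1}^{\infty} \frac{dx}{x^{s}} < \sum_{n=1}^{\infty} \frac{1}{n^{s}} = \zeta(s) < 1 + \int_{1}^{\infty} \frac{dx}{x^{s}} = \frac{s}{s-1}, \]
and so
\[ 1 < (s-1)\zeta(s) < s, \quad\text{for all } s > 1, \]
from which the claim follows immediately. QED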
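The two-sided bound $1 < (s-1)\zeta(s) < s$ that closes the proof can be observed numerically. The sketch below approximates $\zeta(s)$ for real $s > 1$ by a partial sum plus the first two Euler-Maclaurin tail terms; that approximation scheme and the name `zeta` are assumptions of the illustration, not part of the proof.

```python
def zeta(s, N=1000):
    """Approximate zeta(s) for real s > 1: sum over n < N, plus the tail estimate
    integral_N^infinity x^(-s) dx + N^(-s) / 2 (first Euler-Maclaurin terms)."""
    head = sum(n**(-s) for n in range(1, N))
    tail = N**(1 - s) / (s - 1) + 0.5 * N**(-s)
    return head + tail

for s in (1.01, 1.1, 1.5, 2.0, 3.0):
    value = (s - 1) * zeta(s)
    print(s, value, 1 < value < s)
```

As $s$ decreases toward 1 the printed values are squeezed between 1 and $s$, which is the content of the claim $\lim_{s\to 1^+}(s-1)\zeta(s) = 1$.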
Because each function an /ns is an entire function of s, a Dirichlet series which satisfies
the hypotheses of Proposition 5.5 is a series of functions each term of which is analytic in
Re s > σ and which also converges uniformly on every compact subset of Re s > σ. Hence
the sum of the series is analytic in Re s > σ.
We wish to apply Proposition 5.5 to the series ($\ast\ast$), and so we must study the behavior of the sequence
\[ Z(n) = \sum_{k=1}^{n} L(k). \]
It is here that we make use of Theorem 5.4; it follows from that theorem that there is a positive constant $\lambda$ such that
\[ \lim_{n\to\infty} \frac{Z(n)}{n} = \lambda, \]

whence the sequence $(Z(n)/n)_{n=1}^{\infty}$ is bounded. Therefore the hypotheses of Proposition 5.5 are satisfied for $a_n = L(n)$ with $\sigma = 1$, hence the series ($\ast\ast$) converges to a function analytic in $\mathrm{Re}\, s > 1$.
We now let $s > 1$. Because $L(n) \ge 0$ for all $n$, the convergence of ($\ast\ast$) is absolute for $s > 1$, hence we can rearrange the terms of ($\ast\ast$) in any order without changing its value. It follows that the value of the series
\[ \sum_{I \in \mathcal{I}} \frac{1}{N(I)^s} \]
for $s > 1$ is finite, is independent of the enumeration of $\mathcal{I}$ used to define the series, and is given by the value of the Dirichlet series ($\ast\ast$).

Definition. The (Dedekind-Dirichlet) zeta function of $F$ is the function $\zeta_F(s)$ defined for $s > 1$ by
\[ \zeta_F(s) = \sum_{I \in \mathcal{I}} \frac{1}{N(I)^s}. \]

Remark. One can show without difficulty that if $\sum_n a_n/n^s$ is a Dirichlet series which satisfies the hypotheses of Proposition 5.5 then $\sum_n a_n/n^s$ converges absolutely in $\mathrm{Re}\, s > 1 + \sigma$. If we apply this fact to the series ($\ast\ast$), it follows that ($\ast\ast$) converges absolutely in $\mathrm{Re}\, s > 2$. Hence the value of the series
\[ \sum_{I \in \mathcal{I}} \frac{1}{N(I)^s} \]
for $\mathrm{Re}\, s > 2$ is finite, is independent of the enumeration of $\mathcal{I}$ used to define the series, and is given by the value of the series ($\ast\ast$). Although we will make no use of this fact, it follows that the zeta function of $F$ can be defined by the series ($\ast\ast$) not only for $s > 1$, but also for $\mathrm{Re}\, s > 1$, and when so defined, is analytic in that half-plane.
For emphasis, we record in the following proposition the observation that we made about
the value of the zeta function of F in the paragraph which immediately preceded its definition:

Proposition 5.6. If
\[ L(n) = |\{I \in \mathcal{I} : N(I) = n\}|, \quad n \in [1, \infty), \]
then
\[ \zeta_F(s) = \sum_{n=1}^{\infty} \frac{L(n)}{n^s}. \]

For future reference, we also observe that Proposition 5.5 and Theorem 5.4 imply

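Lemma 5.7. If $\zeta_F(s)$ is the zeta function of $F$ and $\lambda$ is the positive constant in the conclusion of Theorem 5.4 then
\[ \lim_{s\to 1^+} (s-1)\zeta_F(s) = \lambda. \]

If $F = \mathbb{Q}$ then $R = \mathcal{R} \cap \mathbb{Q} = \mathbb{Z}$, hence the nonzero ideals of $R$ in this case are the principal ideals $n\mathbb{Z}$, $n \in [1, \infty)$. Then
\[ N(n\mathbb{Z}) = |\mathbb{Z}/n\mathbb{Z}| = n, \]
and so
\[ \{I \in \mathcal{I} : N(I) = n\} = \{n\mathbb{Z}\}. \]
Hence the zeta function of $\mathbb{Q}$ is
\[ \zeta_{\mathbb{Q}}(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}, \]
the Riemann zeta function.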
The next theorem gives a product formula for ζF (s) that is reminiscent of the product
formula for the Dirichlet L-function of a Dirichlet character that we pointed out in section 4
of Chapter 4. It is a very useful tool for analyzing certain features of the behavior of ζF (s)
and will play a key role in our proof of Theorem 4.12.

Theorem 5.8. (Euler-Dedekind product formula for $\zeta_F$) Let $\mathcal{Q}$ denote the set of all prime ideals of $R$. Then
\[ \zeta_F(s) = \prod_{I \in \mathcal{Q}} \frac{1}{1 - N(I)^{-s}}, \quad s > 1. \tag{1} \]

Proof. Note that because a prime ideal I of R is proper, N(I) > 1, and so each term of
this product is defined for s > 1. In order to prove the theorem we will need some standard
facts about the convergence of infinite products, which we record in the following definitions
and Proposition 5.9.

Definitions. Let $(a_n)$ be a sequence of complex numbers such that $a_n \neq -1$, for all $n$. The infinite product
\[ \prod_{n=1}^{\infty} (1 + a_n) \]
converges if
\[ \lim_{n\to\infty} \prod_{k=1}^{n} (1 + a_k) \]
exists and is finite, and it converges absolutely if
\[ \prod_{n=1}^{\infty} (1 + |a_n|) \]
converges.

Proposition 5.9. (i) $\prod_n (1 + a_n)$ converges absolutely if and only if the series $\sum_n |a_n|$ converges.
(ii) The limit of an absolutely convergent infinite product is not changed by any rearrangement of the factors.

Proof. See Nevanlinna and Paatero [42], Sections 13.1, 13.2. QED
Returning to the proof of Theorem 5.8, we next consider the product on the right-hand side of (1). Because $N(I) \ge 2$ for all $I \in \mathcal{Q}$ it follows that for $s > 1$,
\[ 0 < \frac{1}{1 - N(I)^{-s}} - 1 = \frac{N(I)^{-s}}{1 - N(I)^{-s}} \le 2N(I)^{-s}, \]
hence
\[ \sum_{I \in \mathcal{Q}} \Big( \frac{1}{1 - N(I)^{-s}} - 1 \Big) \le 2 \sum_{I \in \mathcal{Q}} N(I)^{-s} < +\infty \]
and so by Proposition 5.9, the product on the right-hand side of (1) converges absolutely for $s > 1$ and its value is independent of the order of the factors.
The next step is to prove that this product converges to $\zeta_F(s)$ for $s > 1$. Let
\[ \Pi(x) = \prod_{I \in \mathcal{Q} : N(I) \le x} \frac{1}{1 - N(I)^{-s}}; \]
this product has only a finite number of factors by Proposition 5.3 and
\[ \lim_{x\to+\infty} \Pi(x) = \prod_{I \in \mathcal{Q}} \frac{1}{1 - N(I)^{-s}}. \]
We have that
\[ \frac{1}{1 - N(I)^{-s}} = \sum_{n=0}^{\infty} \frac{1}{N(I)^{ns}}, \]
hence $\Pi(x)$ is a finite product of absolutely convergent series, which we can hence multiply together and, in the resulting sum, rearrange terms in any order without altering the value of the sum. Proposition 5.2 implies that each term of this sum is either 1 or of the form
\[ N(I_1^{\alpha_1} \cdots I_r^{\alpha_r})^{-s}, \]



where $(\alpha_1, \dots, \alpha_r)$ is an $r$-tuple of positive integers, $I_i$ is a prime ideal for which $N(I_i) \le x$, $i = 1, \dots, r$, and all products of powers of prime ideals $I$ with $N(I) \le x$ of this form occur exactly once. Hence
\[ \Pi(x) = 1 + \sum \frac{1}{N(I)^s}, \]
where the sum here is taken over all ideals $I$ of $R$ such that all prime ideal factors of $I$ have norm no greater than $x$. Now the Fundamental Theorem of Ideal Theory (Theorem 3.16) implies that all nonzero ideals of $R$ have a unique prime ideal factorization, hence
\[ \zeta_F(s) - \Pi(x) = \sum \frac{1}{N(I)^s}, \]
where the sum here is taken over all ideals $I \neq \{0\}$ of $R$ such that at least one prime ideal factor of $I$ has norm greater than $x$. Hence this sum does not exceed
\[ \sum_{n > x} \frac{L(n)}{n^s}, \]
and so
\[ \lim_{x\to+\infty} \big( \zeta_F(s) - \Pi(x) \big) = \lim_{x\to+\infty} \sum_{n > x} \frac{L(n)}{n^s} = 0. \]
QED
If $F = \mathbb{Q}$ then the prime ideals of $R = \mathbb{Z}$ are the principal ideals generated by the rational primes $q \in \mathbb{Z}$, and so it follows from Theorem 5.8 that
\[ \zeta(s) = \prod_{q} \frac{1}{1 - q^{-s}}, \quad s > 1, \tag{2} \]
the Euler-product expansion of Riemann’s zeta.
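As a quick numerical illustration of (2), the partial products $\Pi(x)$ over the primes $q \le x$ can be compared with the known value $\zeta(2) = \pi^2/6$. The sketch below is only an illustration; the sieve and the function names are choices made here, not part of the text.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes q <= n."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            for k in range(p * p, n + 1, p):
                is_prime[k] = False
    return [i for i, flag in enumerate(is_prime) if flag]

def euler_product(s, x):
    """Partial product over primes q <= x of 1 / (1 - q^(-s))."""
    value = 1.0
    for q in primes_up_to(x):
        value /= 1.0 - q**(-s)
    return value

for x in (10, 100, 1000, 10000):
    print(x, euler_product(2.0, x), math.pi**2 / 6)
```

The partial products increase toward $\zeta(2)$, in line with the estimate of $\zeta_F(s) - \Pi(x)$ in the proof of Theorem 5.8.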


We are now going to use Theorem 5.8 to obtain a factorization of $\zeta_F$ over rational primes that is the analog of the product expansion (2) of the Riemann zeta function. In order to derive it, we first recall from Proposition 5.1(iii) and (iv) that if $I$ is a prime ideal of $R$ then $I$ contains a unique rational prime $q$ and there is a unique positive integer $d$ such that $N(I)$ is $q^d$. The integer $d$ is called the degree of $I$ and we will denote it by $\deg I$. We can now state and prove

Theorem 5.10. If $\mathcal{Q}$ denotes the set of all prime ideals of $R$ then the zeta function $\zeta_F(s)$ of $F$ has a product expansion given by
\[ \zeta_F(s) = \prod_{q \text{ a rational prime}} \Big( \prod_{I \in \mathcal{Q} : q \in I} \frac{1}{1 - q^{-(\deg I)s}} \Big), \quad s > 1. \tag{3} \]


Proof. If $n \in \mathbb{Z}$ then the ideal $nR$ is contained in a prime ideal of $R$ (Theorem 3.16) and so Proposition 5.1(iii) implies that $\mathcal{Q}$ can be expressed as the pairwise disjoint union
\[ \bigcup_{q \text{ a rational prime}} \{I \in \mathcal{Q} : q \in I\}. \]

Hence as a consequence of Theorem 5.8 and Proposition 5.9(ii), we can rearrange the factors
in (1) so as to derive the expansion (3) for ζF (s). QED
The ideal qR of R is contained in only finitely many prime ideals (because of Theorem
3.16) and so each product inside the parentheses in (3) has only a finite number of factors;
these finite products are called the elementary factors of ζF .

3. The Zeta Function of a Quadratic Number Field

As has been the case frequently in much of our previous work, quadratic number fields
provide interesting and important examples of various phenomena of great interest and
importance in algebraic number theory, and zeta functions are no exception to this rule.
In this section we will illustrate how the decomposition law for the rational primes in a
quadratic number field, Proposition 3.17 from section 11 of chapter 3, and Theorem 5.10
can be used to derive a very useful product expansion for the zeta function of a quadratic
number field. It is precisely this result that will be used to prove Theorem 4.12 in the next
section.

For a square-free integer $m \neq 1$, let $F = \mathbb{Q}(\sqrt{m})$, $R = \mathcal{R} \cap F$. We recall for our convenience what the decomposition law for the rational primes in $R$ says. First, let $p$ be an odd prime. Then
(i) If χp (m) = 1 then pR factors into the product of two distinct prime ideals, each of
degree 1.
(ii) If χp (m) = 0 then pR is the square of a prime ideal I, and the degree of I is 1.
(iii) If χp (m) = −1 then pR is prime in R of degree 2.
The decomposition of the prime 2 in R occurs as follows:
(iv) If m ≡ 1 mod 8 then 2R factors into the product of two distinct prime ideals, each
of degree 1.
(v) If m ≡ 2 or 3 mod 4 then 2R is the square of a prime ideal I, and the degree of I is
1.
(vi) If m ≡ 5 mod 8 then 2R is prime in R of degree 2.
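It follows from (i)-(vi) that if $p$ is an odd prime in $\mathbb{Z}$ then the corresponding elementary factor of $\zeta_F$ is
\[ \begin{cases} \dfrac{1}{(1 - p^{-s})^2}, & \text{if } \chi_p(m) = 1, \\[2ex] \dfrac{1}{1 - p^{-s}}, & \text{if } \chi_p(m) = 0, \\[2ex] \dfrac{1}{1 - p^{-2s}}, & \text{if } \chi_p(m) = -1, \end{cases} \]
and the elementary factor corresponding to 2 is
\[ \begin{cases} \dfrac{1}{(1 - 2^{-s})^2}, & \text{if } m \equiv 1 \bmod 8, \\[2ex] \dfrac{1}{1 - 2^{-s}}, & \text{if } m \equiv 2 \text{ or } 3 \bmod 4, \\[2ex] \dfrac{1}{1 - 2^{-2s}}, & \text{if } m \equiv 5 \bmod 8. \end{cases} \]
Observe next that each of the elementary factors corresponding to an odd prime $p$ can be expressed as
\[ \frac{1}{1 - p^{-s}} \cdot \frac{1}{1 - \chi_p(m)p^{-s}}. \]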
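Hence from the product expansion (2) of the Riemann zeta function and the product expansion (3) of $\zeta_F(s)$ we deduce

Proposition 5.11. The zeta function of $\mathbb{Q}(\sqrt{m})$ has the product expansion
\[ \zeta_{\mathbb{Q}(\sqrt{m})}(s) = \theta(s)\zeta(s) \prod_{p} \frac{1}{1 - \chi_p(m)p^{-s}}, \quad s > 1, \tag{4} \]
where
\[ \theta(s) = \begin{cases} \dfrac{1}{1 - 2^{-s}}, & \text{if } m \equiv 1 \bmod 8, \\[2ex] 1, & \text{if } m \equiv 2 \text{ or } 3 \bmod 4, \\[2ex] \dfrac{1}{1 + 2^{-s}}, & \text{if } m \equiv 5 \bmod 8. \end{cases} \]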
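The expansion (4) can be tested numerically in the case $m = -1$ at $s = 2$. The sketch below rests on assumptions that are not in the text: it takes the product in (4) over odd primes, it uses the standard facts that every ideal of $\mathbb{Z}[i]$ is principal with exactly four generators (so the Dirichlet series is a sum over lattice points divided by 4), and it approximates $\zeta(s)$ by a partial sum with an Euler-Maclaurin tail. All function names are hypothetical.

```python
import math

def legendre(a, p):
    """chi_p(a) for an odd prime p, computed with Euler's criterion."""
    t = pow(a % p, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

def theta(m, s):
    """The factor theta(s) of Proposition 5.11 for a square-free m != 1."""
    if m % 8 == 1:          # Python's % is nonnegative, also for m < 0
        return 1 / (1 - 2**(-s))
    if m % 8 == 5:
        return 1 / (1 + 2**(-s))
    return 1.0              # m = 2 or 3 mod 4

def zeta(s, N=1000):
    """Approximate Riemann zeta for real s > 1 (partial sum plus tail estimate)."""
    return sum(n**(-s) for n in range(1, N)) + N**(1 - s) / (s - 1) + 0.5 * N**(-s)

def odd_primes_up_to(n):
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            for k in range(p * p, n + 1, p):
                flags[k] = False
    return [i for i, f in enumerate(flags) if f and i > 2]

def product_side(m, s, x):
    """Right-hand side of (4), with the product truncated at odd primes p <= x."""
    value = theta(m, s) * zeta(s)
    for p in odd_primes_up_to(x):
        value /= 1 - legendre(m, p) * p**(-s)
    return value

def series_side_gaussian(s, N):
    """Sum of N(I)^(-s) over nonzero ideals of Z[i] with norm <= N,
    assuming each ideal is principal with exactly 4 generators x + y*i."""
    r = math.isqrt(N)
    total = 0.0
    for x in range(-r, r + 1):
        for y in range(-r, r + 1):
            n = x * x + y * y
            if 0 < n <= N:
                total += n**(-s)
    return total / 4

print(product_side(-1, 2.0, 2000), series_side_gaussian(2.0, 40000))
```

Both truncations undershoot the true value slightly, so the two printed numbers should agree to about three decimal places rather than exactly.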
We will use this factorization of $\zeta_{\mathbb{Q}(\sqrt{m})}(s)$ to prove, in due course, the following lemma, the crucial fact that we will need to prove Theorem 4.12.

Lemma 5.12. If $a \in \mathbb{Z}$ is not a square then
\[ \sum_{p} \chi_p(a)p^{-s} \]
remains bounded as $s \to 1^+$.

Note that Lemma 5.12 is very similar in form and spirit to the hypothesis of Lemma 4.7,
which was a key step in Dirichlet’s proof of Theorem 4.5. We will eventually see that this is
no accident!

4. Proof of Theorem 4.12 and Related Results

We now have assembled all of the ingredients necessary for a proof of Theorem 4.12.
As we have already verified the “only if” implication in Theorem 4.12, we hence let $S$ be a nonempty finite subset of $[1, \infty)$ and suppose that for each subset $T$ of $S$ such that $|T|$ is odd,
\[ \prod_{i \in T} i \text{ is not a square.} \]
Let
\[ X = \{p : \chi_p \equiv -1 \text{ on } S\}. \]
We must prove that $X$ has infinite cardinality.
Consider the sum
\[ \Sigma(s) = \sum_{(p)} \prod_{i \in S} \big(1 - \chi_p(i)\big) \cdot \frac{1}{p^s}, \quad s > 1, \tag{5} \]
where $(p)$ means that the summation is over all primes $p$ such that $p$ divides no element of $S$. Then
\[ \Sigma(s) = 2^{|S|} \sum_{p \in X} \frac{1}{p^s}, \quad s > 1, \]
hence if we can show that
\[ \lim_{s\to 1^+} \Sigma(s) = +\infty, \tag{6} \]
then the cardinality of $X$ will be infinite.


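In order to get (6), we first calculate that
\[ \prod_{i \in S} \big(1 - \chi_p(i)\big) = 1 + \sum_{\emptyset \neq T \subseteq S} (-1)^{|T|} \chi_p\Big( \prod_{i \in T} i \Big), \]
substitute this into (5) and interchange the order of summation to obtain
\[ \Sigma(s) = \sum_{(p)} \frac{1}{p^s} + \sum_{\emptyset \neq T \subseteq S} (-1)^{|T|} \Big( \sum_{(p)} \chi_p\Big( \prod_{i \in T} i \Big) \cdot \frac{1}{p^s} \Big). \]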
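The expansion of $\prod_{i \in S}(1 - \chi_p(i))$ over nonempty subsets $T$ relies on the multiplicativity of $\chi_p$, and it can be confirmed by brute force on a small set. In the sketch below the set $S = \{2, 3, 5\}$, the list of primes, and the function names are all choices made for illustration.

```python
from itertools import combinations
from math import prod

def legendre(a, p):
    """chi_p(a) for an odd prime p, computed with Euler's criterion."""
    t = pow(a % p, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

def product_form(S, p):
    """Left-hand side: product over i in S of (1 - chi_p(i))."""
    return prod(1 - legendre(i, p) for i in S)

def expanded_form(S, p):
    """Right-hand side: 1 + sum over nonempty T of (-1)^|T| chi_p(prod of T)."""
    total = 1
    for size in range(1, len(S) + 1):
        for T in combinations(S, size):
            total += (-1)**size * legendre(prod(T), p)
    return total

S = [2, 3, 5]
for p in (7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43):
    print(p, product_form(S, p), expanded_form(S, p))
```

For primes dividing no element of $S$ each factor $1 - \chi_p(i)$ is 0 or 2, so the product is $2^{|S|}$ exactly when $\chi_p \equiv -1$ on $S$ and 0 otherwise, which is why $\Sigma(s) = 2^{|S|}\sum_{p \in X} p^{-s}$.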

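Now divide $\{T : \emptyset \neq T \subseteq S\}$ into $U \cup V \cup W$, where
\[ U = \Big\{ \emptyset \neq T \subseteq S : |T| \text{ is even and } \prod_{i \in T} i \text{ is a square} \Big\}, \]
\[ V = \Big\{ \emptyset \neq T \subseteq S : |T| \text{ is even and } \prod_{i \in T} i \text{ is not a square} \Big\}, \]
\[ W = \{T \subseteq S : |T| \text{ is odd}\}. \]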

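Then
\[ \begin{aligned} \Sigma(s) &= (1 + |U|) \sum_{(p)} \frac{1}{p^s} \\ &\quad + \sum_{T \in V} \Big( \sum_{(p)} \chi_p\Big( \prod_{i \in T} i \Big) \cdot \frac{1}{p^s} \Big) \\ &\quad - \sum_{T \in W} \Big( \sum_{(p)} \chi_p\Big( \prod_{i \in T} i \Big) \cdot \frac{1}{p^s} \Big) \\ &= \Sigma_1(s) + \Sigma_2(s) - \Sigma_3(s). \end{aligned} \]
Because the range of the summation here is over all but finitely many primes, Lemma 5.12, the definition of $V$ and the hypothesis on $S$ imply that $\Sigma_2(s)$ and $\Sigma_3(s)$ remain bounded as $s \to 1^+$, and so (6) will follow once we prove Lemma 5.12 and verify that
\[ \lim_{s\to 1^+} \sum_{(p)} \frac{1}{p^s} = +\infty. \tag{7} \]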

We check (7) first. Because the summation range in (7) is over all but finitely many primes, we need only show that
\[ \lim_{s\to 1^+} \sum_{p} \frac{1}{p^s} = +\infty. \tag{8} \]
To see (8), recall from the proof of Proposition 5.5 that
\[ \lim_{s\to 1^+} (s-1)\zeta(s) = 1, \]
hence
\[ \lim_{s\to 1^+} \log\zeta(s) = \lim_{s\to 1^+} \log\frac{1}{s-1} + \lim_{s\to 1^+} \log\big( (s-1)\zeta(s) \big) = +\infty. \tag{9} \]
Now let $s > 1$. The mean value theorem implies that
\[ |\log(1 + x)| \le 2|x| \quad\text{for } |x| \le \frac{1}{2}, \]
and so
\[ |\log(1 - q^{-s})| \le 2q^{-s}, \quad\text{for all } q \in P. \]
Because $\sum_q q^{-s} < \sum_{n=1}^{\infty} n^{-s} < \infty$ it follows that the series
\[ \sum_{q} \log(1 - q^{-s}) \]


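is absolutely convergent. Hence
\[ \begin{aligned} \log\zeta(s) &= \log \prod_{q} \Big( \frac{1}{1 - q^{-s}} \Big) \quad\text{(from (2))} \\ &= -\sum_{q} \log(1 - q^{-s}) \\ &= \sum_{q} \frac{1}{q^s} + \sum_{q} \Big( -\log(1 - q^{-s}) - \frac{1}{q^s} \Big) \\ &= \sum_{q} \frac{1}{q^s} + \sum_{q} \Big( \sum_{n \ge 2} \frac{1}{nq^{ns}} \Big), \end{aligned} \]
where we use the series expansion $\log(1 - x) = -\sum_{n=1}^{\infty} x^n/n$, $|x| < 1$, to obtain the last equation. Then
\[ \begin{aligned} 0 < \sum_{n \ge 2} \frac{1}{nq^{ns}} &= \frac{1}{q^{2s}} \sum_{n=0}^{\infty} \frac{1}{(n+2)q^{ns}} \\ &\le \frac{1}{q^{2s}} \sum_{n=0}^{\infty} q^{-ns} \\ &= \frac{1}{q^{2s}} \cdot \frac{1}{1 - q^{-s}} \\ &\le \frac{2}{q^2}, \quad\text{for all } q \ge 2 \text{ and for all } s \ge 1, \end{aligned} \]
and so
\[ 0 < \sum_{q} \Big( \sum_{n \ge 2} \frac{1}{nq^{ns}} \Big) < 2\sum_{q} \frac{1}{q^2} < +\infty \quad\text{for all } s \ge 1. \]
It follows that
\[ \sum_{q} \frac{1}{q^s} = \log\zeta(s) + H(s), \quad H(s) \text{ bounded on } s > 1, \]
hence this equation and (9) imply (8).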
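The bound on the correction term can be observed numerically. The sketch below evaluates, over the primes up to a cutoff, the sum of $-\log(1 - q^{-s}) - q^{-s} = \sum_{n \ge 2} 1/(nq^{ns})$ and compares it with $2\sum_q q^{-2}$. The cutoff and the function names are assumptions; the truncated sums only approximate the full sums over all primes.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes q <= n."""
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            for k in range(p * p, n + 1, p):
                flags[k] = False
    return [i for i, f in enumerate(flags) if f]

def correction_partial(s, x):
    """Sum over primes q <= x of -log(1 - q^-s) - q^-s = sum_{n>=2} 1/(n q^(ns))."""
    return sum(-math.log1p(-q**(-s)) - q**(-s) for q in primes_up_to(x))

def bound_partial(x):
    """Sum over primes q <= x of 2 / q^2, the majorant used in the text."""
    return 2 * sum(1 / q**2 for q in primes_up_to(x))

for s in (1.0, 1.1, 1.5, 2.0):
    print(s, correction_partial(s, 10**5), bound_partial(10**5))
```

Every term is positive and decreases as $s$ grows, so the value at $s = 1$ dominates the values for all $s > 1$.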


It remains only to prove Lemma 5.12. Let $d \neq 1$ be a square-free integer. Then it is a consequence of the factorization (4) of $\zeta_F$, $F = \mathbb{Q}(\sqrt{d})$, in Proposition 5.11 that
\[ \zeta_F(s) = \theta(s)\zeta(s)L(s), \quad\text{where } L(s) = \prod_{p} \frac{1}{1 - \chi_p(d)p^{-s}}. \]
By virtue of Lemma 5.7,
\[ \lim_{s\to 1^+} (s-1)\zeta_F(s) = \lambda > 0, \]

hence
1 (s − 1)ζF (s)
lim+ L(s) = lim+
s→1 s→1 θ(s) (s − 1)ζ(s)
λ
= > 0,
θ(1)
and so

(10) lim log L(s) is finite.


s→1+

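Now let s > 1. Then
\begin{align*}
\log L(s) &= -\sum_p \log(1 - \chi_p(d)p^{-s}) \tag{11} \\
 &= \sum_p \sum_{n=1}^{\infty} \frac{\chi_p(d)^n}{n p^{ns}} \\
 &= \sum_p \chi_p(d)p^{-s} + \sum_p \sum_{n=2}^{\infty} \frac{\chi_p(d)^n}{n p^{ns}}.
\end{align*}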

Because
\[
\Big| \sum_p \sum_{n=2}^{\infty} \frac{\chi_p(d)^n}{n p^{ns}} \Big| \le \sum_p \sum_{n \ge 2} \frac{1}{n p^{ns}},
\]
the second term on the right-hand side of the last equation in (11) can be estimated as before
to verify that it is bounded on s > 1. Hence (10) and (11) imply that
\[
\sum_p \chi_p(d)p^{-s} \text{ is bounded as } s \to 1^+. \tag{12}
\]

The integer d here can be any square-free integer ≠ 1, but every integer is the product
of a square and a square-free integer, hence (12) remains valid if d is replaced by any integer
which is not a square. QED
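
As a numerical sanity check of the contrast between (8) and (12), and not as part of the proof, the following minimal sketch compares partial sums of $\sum_p 1/p$ and $\sum_p \chi_p(d)/p$ at s = 1 for d = 2. The helper names `primes_up_to`, `legendre`, `partial_sums` and the cutoff 1000 are choices made for this illustration only; they do not come from the text.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            sieve[n * n :: n] = [False] * len(sieve[n * n :: n])
    return [n for n, is_prime in enumerate(sieve) if is_prime]


def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def partial_sums(d, limit):
    """Return (sum of 1/p, sum of chi_p(d)/p) over odd primes p <= limit, p not dividing d."""
    unsigned = signed = 0.0
    for p in primes_up_to(limit):
        if p == 2 or d % p == 0:
            continue
        unsigned += 1.0 / p
        signed += legendre(d, p) / p
    return unsigned, signed


unsigned, signed = partial_sums(2, 1000)
print("sum of 1/p:        ", unsigned)
print("sum of chi_p(2)/p: ", signed)
```

Raising the cutoff lets the first sum grow slowly without bound, while the signed sum settles down, which is the behaviour that (8) and (12) describe in the limit $s \to 1^+$.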
The technique used in the proof of Theorem 4.12 can also be used to obtain an interesting
generalization of Basic Lemma 4.4 which answers the following question: if S is a nonempty,
finite subset of [1, ∞) and ε : S → {−1, 1} is a given function, when do there exist
infinitely many primes p such that $\chi_p \equiv \varepsilon$ on S? There is a natural obstruction to S having
this property very similar to the obstruction that prevents the conclusion of Theorem 4.12
from being true for S. Suppose that there exists a subset $T \ne \emptyset$ of S such that $\prod_{i \in T} i$ is a
square. If we choose $i_0 \in T$ and define
\[
\varepsilon(i) =
\begin{cases}
-1, & \text{if } i = i_0, \\
1, & \text{if } i \in S \setminus \{i_0\},
\end{cases}
\]
then $\chi_p \not\equiv \varepsilon$ on S for all sufficiently large p: otherwise there exists a p exceeding all prime
factors of the elements of T such that
\[
-1 = \prod_{i \in T} \varepsilon(i) = \chi_p\Big(\prod_{i \in T} i\Big) = 1.
\]
By tweaking the proof of Theorem 4.12, we will show that this is the only obstruction to S
having this property.

Theorem 5.13. Let S be a nonempty finite subset of [1, ∞). The following statements
are equivalent:
(i) The product of all the elements in each nonempty subset of S is not a square;
(ii) If ε : S → {−1, 1} is a fixed but arbitrary function, then there exist infinitely many
primes p such that χp ≡ ε on S.

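Proof. We have already observed that (i) follows from (ii), hence suppose that S satisfies
(i) and let ε : S → {−1, 1} be a fixed function. Consider the sum
\[
\Sigma_\varepsilon(s) = \sum_{(p)} \prod_{i \in S} \big(1 + \varepsilon(i)\chi_p(i)\big) \cdot \frac{1}{p^s}, \quad s > 1.
\]
If
\[
X_\varepsilon = \{p : \chi_p \equiv \varepsilon \text{ on } S\}
\]
then
\[
\Sigma_\varepsilon(s) = 2^{|S|} \sum_{p \in X_\varepsilon} \frac{1}{p^s}.
\]
Also,
\[
\Sigma_\varepsilon(s) = \sum_{(p)} \frac{1}{p^s} + \sum_{\emptyset \ne T \subseteq S} \ \prod_{i \in T} \varepsilon(i) \sum_{(p)} \chi_p\Big(\prod_{i \in T} i\Big) \cdot \frac{1}{p^s}.
\]
Lemma 5.12 and the hypotheses on S imply that the second term on the right-hand side of
this equation is bounded as $s \to 1^+$, hence from (7) we conclude that
\[
\lim_{s \to 1^+} \Sigma_\varepsilon(s) = +\infty,
\]
and so $X_\varepsilon$ is infinite. QED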

Definition. Any set S satisfying statement (ii) of Theorem 5.13 will be said to support
all patterns.

Remark. The proofs of Theorems 4.12 and 5.13 follow exactly the same strategy as
Dirichlet’s proof of Theorem 4.5. One wants to show that a set X of primes with a certain
property is infinite. Hence take s > 1, attach a weight of $p^{-s}$ to each prime p in X and then
attempt to prove that the weighted sum
\[
\sum_{p \in X} \frac{1}{p^s}
\]
of the elements of X is unbounded as $s \to 1^+$. In order to achieve this (using ingenious
methods!), one writes this weighted sum as $\sum_p 1/p^s$ plus a term that is bounded as $s \to 1^+$.

The similarity of all of these arguments is no accident; Theorem 5.13 is in fact also due
to Dirichlet, and appeared in his great memoir [11], Recherches sur diverses applications de
l’analyse infinitésimale à la théorie des nombres, of 1839-40, which together with [10] founded
modern analytic number theory. The proof of Theorem 5.13 given here is a variation on
Dirichlet’s original argument due to Hilbert [28], section 80, Theorem 111.
A straightforward modification of the proof of Theorem 4.9 can now be used to establish

Theorem 5.14. If S is a nonempty, finite subset of [1, ∞) such that for all subsets T of
S of odd cardinality, $\prod_{i \in T} i$ is not a square, $\mathcal{S}$ and $v : 2^{S} \to F^n$ are defined by S as in the
statement of Theorem 4.9, and d is the dimension of the linear span of $v(\mathcal{S})$ in $F^n$, then the
density of the set $\{p : \chi_p \equiv -1 \text{ on } S\}$ is $2^{-d}$.

If p < q < r < s are distinct primes and we let, for example, $S_1 = \{p, pq, qr, rs\}$ and
$S_2 = \{p, ps, pqr, pqrs\}$, then it follows from Theorem 5.14 and the row reduction of the
incidence matrices of $S_1$ and $S_2$ that we performed in section 6 of Chapter 4 that the density
of $\{p : \chi_p \equiv -1 \text{ on } S_1\}$ is $2^{-4}$ and the density of $\{p : \chi_p \equiv -1 \text{ on } S_2\}$ is $2^{-3}$. As we pointed
out in section 6 of Chapter 4, a 2-dimensional subspace of $F^4$ contains only 3 nonzero vectors,
and so if S is a set of 4 nontrivial square-free integers such that S is supported on 4 primes
then the density of $\{p : \chi_p \equiv -1 \text{ on } S\}$ cannot be $2^{-2}$. But it is also true that all of the
vectors in a 2-dimensional subspace of $F^4$ must sum to 0 and so if S is a set of 3 nontrivial
square-free integers such that S is supported on 4 primes then $\{p : \chi_p \equiv -1 \text{ on } S\}$ is in fact
empty. In order to get a set S from p, q, r, and s such that the density of $\{p : \chi_p \equiv -1 \text{ on } S\}$
is $2^{-2}$, S has to have 2 elements, and it follows easily from Theorem 5.14 that S = {pq, qrs}
is one of many examples for which the density of $\{p : \chi_p \equiv -1 \text{ on } S\}$ is $2^{-2}$.
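
The dimensions quoted in this example can be checked mechanically. The sketch below is an illustration only: it encodes each element of $S_1$, $S_2$ and {pq, qrs} by the set of primes dividing it and computes the rank over GF(2) of the resulting incidence vectors. The function names `gf2_rank` and `incidence_rows` and the bitmask encoding are assumptions of this sketch, not notation from the text.

```python
def gf2_rank(rows):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low_bit = pivot & -pivot  # lowest set bit of the pivot row
            # Clear that bit from every remaining row.
            rows = [r ^ pivot if r & low_bit else r for r in rows]
    return rank


def incidence_rows(elements, primes):
    """Encode each element (given as a set of prime labels) as a bitmask over `primes`."""
    index = {q: k for k, q in enumerate(primes)}
    return [sum(1 << index[q] for q in element) for element in elements]


primes = ["p", "q", "r", "s"]
S1 = [{"p"}, {"p", "q"}, {"q", "r"}, {"r", "s"}]
S2 = [{"p"}, {"p", "s"}, {"p", "q", "r"}, {"p", "q", "r", "s"}]
S3 = [{"p", "q"}, {"q", "r", "s"}]

for name, S in [("S1", S1), ("S2", S2), ("{pq, qrs}", S3)]:
    d = gf2_rank(incidence_rows(S, primes))
    print(name, "has dimension d =", d, "so Theorem 5.14 gives density 2^-%d" % d)
```

The computed dimensions are 4, 3 and 2, matching the densities $2^{-4}$, $2^{-3}$ and $2^{-2}$ stated above.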
A straightforward modification of the proof of Lemma 4.10 can also be used to establish

Theorem 5.15. (Filaseta and Richman, [18], Theorem 2) If S is a nonempty, finite
subset of [1, ∞) such that the product of all the elements in each nonempty subset of S is
not a square and ε : S → {−1, 1} is a fixed but arbitrary function, then the density of the
set $\{p : \chi_p \equiv \varepsilon \text{ on } S\}$ is $2^{-|S|}$.
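
Theorem 5.15 can be illustrated empirically. The following sketch, which is a finite experiment and not a proof, takes S = {2, 3} and tabulates how often each of the four sign patterns occurs among the primes below a cutoff; the relative frequency among primes up to a bound is used here as a stand-in for the density in the theorem. The names `primes_up_to`, `legendre`, `pattern_frequencies` and the cutoff 10000 are hypothetical choices for this example.

```python
from collections import Counter
from itertools import product


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            sieve[n * n :: n] = [False] * len(sieve[n * n :: n])
    return [n for n, is_prime in enumerate(sieve) if is_prime]


def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def pattern_frequencies(S, limit):
    """Relative frequency of each sign pattern (chi_p(z))_{z in S} among odd primes p <= limit
    that divide no element of S."""
    primes = [p for p in primes_up_to(limit) if p > 2 and all(z % p for z in S)]
    counts = Counter(tuple(legendre(z, p) for z in S) for p in primes)
    return {eps: counts[eps] / len(primes) for eps in product((-1, 1), repeat=len(S))}


freqs = pattern_frequencies([2, 3], 10000)
for eps, f in sorted(freqs.items()):
    print(eps, round(f, 3))
```

Each of the four frequencies should come out close to $2^{-|S|} = 1/4$.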

5. Proof of the Fundamental Theorem of Ideal Theory

Because the Fundamental Theorem of Ideal Theory was used at its full strength in the
proof of the Euler-Dedekind product expansion of the zeta function (Theorem 5.8), and also
because of the important role that it played (although not at full strength) in the results
on the factorization of ideals in a quadratic number field from Chapter 3, we will present a
proof of it in this final section of Chapter 5. Our account follows the outline given by O.
Ore in [43].
Let F be an algebraic number field of degree n and let R be the ring of algebraic integers
in F . We want to prove that every nonzero proper ideal of R is a product of a finite number
of prime ideals and also that this factorization is unique up to the order of the prime-ideal
factors. The strategy of our argument is to prove first that each nonzero proper ideal of R
contains a finite product of prime ideals. We then choose for each nonzero proper ideal I a
product of prime ideals with the smallest number of factors that is contained in I, and then
by use of appropriate mathematical technology that we will develop, proceed by induction
on this smallest number of prime-ideal factors to prove that I is in fact equal to a product
of prime ideals. Uniqueness will then follow by further use of the mathematical technology
that we have at our disposal. We proceed to implement this strategy.
Let I be an ideal of R, {0} ≠ I ≠ R.

Lemma 5.16. There exists a sequence of prime ideals $P_1, \dots, P_s$ of R such that $I \subseteq P_i$
for all i and $P_1 \cdots P_s \subseteq I$.

Proof. If I is prime then we are done, with s = 1, hence suppose that I is not prime.
Then there exists a product βγ of elements of R which is in I and β ∉ I, γ ∉ I. Let
$\{\alpha_1, \dots, \alpha_n\}$ be an integral basis of I, and set
\[
J = (\alpha_1, \dots, \alpha_n, \beta), \quad K = (\alpha_1, \dots, \alpha_n, \gamma).
\]
Then
\[
JK \subseteq I, \quad I \subsetneq J, \quad I \subsetneq K.
\]
If J, K are both prime then we are done, with s = 2. Otherwise apply this procedure to
each nonprime ideal that occurs, and continue in this way as long as the procedure produces
nonprime ideals. Note that after each step of the procedure,
(i) the product of all the ideals obtained in that step is contained in I,
(ii) I is contained in each ideal obtained in that step, and
(iii) each ideal obtained in that step properly contains an ideal from the immediately
preceding step.

Claim: this procedure terminates after finitely many steps.

If this is true then each ideal obtained in the final step is prime; otherwise the procedure
would continue by applying it to a nonprime ideal. If $P_1, \dots, P_s$ are the prime ideals obtained
in the final step then this sequence of ideals satisfies Lemma 5.16 by virtue of (i) and (ii)
above.
Proof of the claim. Suppose this is false. Then (ii) and (iii) above imply that the
procedure produces an infinite sequence of ideals $J_0, J_1, \dots, J_n, \dots$ such that $J_0 = I$ and
$J_i \subsetneq J_{i+1}$, for all i. We will now prove that I is contained in only finitely many ideals, hence
no such sequence of ideals is possible.
The proof of Proposition 5.1(i) implies that I contains a positive rational integer a. We
show that a belongs to only finitely many ideals.
Suppose that J is an ideal, with integral basis $\{\beta_1, \dots, \beta_n\}$, and a ∈ J. Then we also
have that
\[
J = (\beta_1, \dots, \beta_n, a).
\]
By the claim in the proof of Proposition 5.1(i), for each i, there are $\gamma_i, \delta_i \in R$ such that
$\beta_i = a\gamma_i + \delta_i$, and $\delta_i$ can take on at most $a^n$ values. But then
\[
J = (a\gamma_1 + \delta_1, \dots, a\gamma_n + \delta_n, a) = (\delta_1, \dots, \delta_n, a).
\]
Because each $\delta_i$ assumes at most $a^n$ values, it follows that J is one of at most $a^{n^2}$
ideals. QED
The statement of the next lemma requires the following definition:
The statement of the next lemma requires the following definition:

Definition. If J is an ideal of R then
\[
J^{-1} = \{\alpha \in F : \alpha\beta \in R, \text{ for all } \beta \in J\}.
\]

Lemma 5.17. If P is a prime ideal of R then $P^{-1}$ contains an element of F \ R.

Proof. Let x ∈ P with x ≠ 0. Lemma 5.16 implies that (x) contains a product $P_1 \cdots P_s$ of prime
ideals. Choose a product with the smallest number s of factors.
Suppose that s = 1. Then $P_1 \subseteq (x) \subseteq P$. $P_1$ maximal (Proposition 5.1(i)) implies that
$P = P_1 = (x)$. Hence $1/x \in P^{-1}$. Also, $1/x \notin R$; otherwise, $1 = x \cdot 1/x \in P$, contrary to
the fact that P is proper.
Suppose that s > 1. Then $P_1 \cdots P_s \subseteq (x) \subseteq P$, and so the fact that P is prime implies
that P contains a $P_i$, say $P_1$. $P_1$ maximal implies that $P = P_1$. $P_2 \cdots P_s \not\subseteq (x)$ by minimality
of s, hence there exists $\alpha \in P_2 \cdots P_s$ such that $\alpha \notin (x)$, and so $\alpha/x \notin R$.
Claim: $\alpha/x \in P^{-1}$.
Let β ∈ P. We must prove that β(α/x) ∈ R. To do that, observe that
\[
(\alpha)P \subseteq P_2 \cdots P_s P = P_1 \cdots P_s \subseteq (x),
\]
and so there is a γ ∈ R such that αβ = xγ, i.e., β(α/x) = γ. QED


The next lemma is the key technical tool that allows us to prove the Fundamental Theorem
of Ideal Theory; it will be used to factor an ideal into a product of prime ideals and to
show that this factorization is unique up to the order of the factors. In order to state it, we
need to extend the definition of products of ideals to products of arbitrary subsets of F
(so that products such as $P^{-1}P$ make sense) like so:

Definition. If S and T are subsets of F then the product ST of S and T is the set
consisting of all sums of the form $\sum_i s_i t_i$, where $(s_i, t_i) \in S \times T$ for all i.

This product is clearly commutative and associative, and it agrees with the product
defined before when S and T are ideals of R.

Lemma 5.18. If P is a prime ideal of R and I is an ideal of R then $P^{-1}PI = I$.

Proof. It suffices to show that $P^{-1}P = (1)$. It is straightforward to show that $J = P^{-1}P$
is an ideal of R. As $1 \in P^{-1}$, it follows that P ⊆ J and so P maximal implies that P = J
or J = (1).
Suppose that J = P. Let $\{\alpha_1, \dots, \alpha_n\}$ be an integral basis of P, and use Lemma 5.17 to
find $\gamma \in P^{-1}$, $\gamma \notin R$. Then $\gamma\alpha_i \in P$, for all i, and so
\[
\gamma\alpha_i = \sum_j a_{ij}\alpha_j, \quad \text{where } a_{ij} \in \mathbb{Z} \text{ for all } i, j.
\]
As a consequence of these equations, γ is an eigenvalue of the matrix $[a_{ij}]$, hence it is a root
of the characteristic polynomial of $[a_{ij}]$, and this characteristic polynomial is a monic polynomial
in Z[x]. As we showed in the proof of Theorem 3.11, this implies that γ is an algebraic
integer, contrary to its choice. Hence P ≠ J, and so J = (1). QED
The Fundamental Theorem of Ideal Theory is now a consequence of the next two lemmas.

Lemma 5.19. Every nonzero proper ideal of R is a product of prime ideals.

Proof. Lemma 5.16 implies that every nonzero proper ideal of R contains a product
$P_1 \cdots P_r$ of prime ideals, where we choose a product with the smallest number r of factors.
The argument now proceeds by induction on r.
Let {0} ≠ I ≠ R be an ideal with r = 1, i.e., I contains a prime ideal P. P maximal
implies that I = P, and we are done.
Assume now that r > 1 and every nonzero, proper ideal that contains a product of fewer
than r prime ideals is a product of prime ideals.
Let {0} ≠ I ≠ R be an ideal that contains a product $P_1 \cdots P_r$ of prime ideals, with r the
smallest number of prime ideals with this property. Lemma 5.16 implies that I is contained
in a prime ideal Q. Hence $P_1 \cdots P_r \subseteq Q$, and so Q contains a $P_i$, say $P_1$. $P_1$ maximal implies
that $Q = P_1$. Hence $I \subseteq P_1$. Then $IP_1^{-1}$ is an ideal of R; $I \subseteq IP_1^{-1}$ (because $1 \in P_1^{-1}$), and so
$IP_1^{-1} \ne \{0\}$. $IP_1^{-1} \ne R$; otherwise, $P_1 \subseteq I$, hence $I = P_1$, contrary to the fact that r > 1.
Lemma 5.18 implies that
\[
P_2 \cdots P_r = P_1^{-1} P_1 \cdots P_r \subseteq IP_1^{-1},
\]
hence by the induction hypothesis, $IP_1^{-1}$ is a product $P_1' \cdots P_k'$ of prime ideals, and so by
Lemma 5.18 again,
\[
I = (IP_1^{-1})P_1 = P_1' \cdots P_k' P_1
\]
is a product of prime ideals. QED

Lemma 5.20. Factorization as a product of prime ideals is unique up to the order of the
factors.

Proof. Suppose that $P_1 \cdots P_r = Q_1 \cdots Q_s$ are products of prime ideals, with r ≤ s,
say. $Q_1 \cdots Q_s \subseteq Q_1$, hence $P_1 \cdots P_r \subseteq Q_1$ and so the fact that $Q_1$ is a prime ideal and the
maximality of the $P_i$’s imply, after reindexing the $P_i$’s, that $Q_1 = P_1$. Then Lemma
5.18 implies that
\[
P_2 \cdots P_r = P_1^{-1} P_1 \cdots P_r = Q_1^{-1} Q_1 \cdots Q_s = Q_2 \cdots Q_s.
\]
Continuing in this way, we deduce, upon reindexing of the $P_i$’s, that $P_i = Q_i$, i = 1, . . . , r,
and also, if r < s, that
\[
(1) = Q_{r+1} \cdots Q_s.
\]
But this equation implies that $R = (1) \subseteq Q_{r+1}$, which is impossible as $Q_{r+1}$ is a proper
ideal. Hence r = s. QED
Dedekind’s own proof of The Fundamental Theorem of Ideal Theory in [8], Chapter 4,
section 25, is a model of clarity and insight which amply repays careful study. We strongly
encourage the reader to take a look at it.

CHAPTER 6

Elementary Proofs

After providing in section 1 of this chapter some motivation for the use of elementary
methods in number theory, we present proofs of Theorems 4.12 and 5.13 in sections 3 and
2, respectively, which employ only Lemma 4.4 from Chapter 4 and linear algebra over the
Galois field of order 2, thereby avoiding the use of zeta functions.

1. Whither Elementary Proofs in Number Theory?

Dirichlet’s incorporation of transcendental methods into number theory allowed him to
establish many deep and far-reaching results in that subject. The importance of Dirichlet’s
work led to a very strong desire to understand it in as many different ways as possible, and
this desire naturally motivated the search for different ways to prove his results. As the years
passed, particular attention was focused on removing any use of methods which relied on
mathematical analysis, replacing them instead by ideas and techniques which deal with or
stem directly from the fundamental structure of the integers, as this was sometimes viewed
as being more suitable for the development of the most important results of the theory. The
viewpoint that the preferred methods in number theory should be based only on fundamental
properties of the integers was evidently first held by none other than Leonhard Euler
(see the remarks by Gauss in [19], article 50 about Euler’s proof of Fermat’s little theorem),
and many of the fundamental contributions to number theory by Euler, Lagrange, Legendre,
Gauss, Dedekind, and many others can be seen as evidence of the value of that philosophy.
The subject of elementary number theory, i.e., the practice of number theory using methods
which have their basis in the algebra and/or the geometry of the integers, and which, in
particular, avoid the use of any of the infinite processes coming from analysis, has thus
attained major importance. Indeed, among the results of twentieth-century number theory
which generated exceptional excitement and interest is the discovery by Selberg [51], [52]
and Erdős [14] in 1949 of the long-sought elementary proofs of the Prime Number Theorem,
Dirichlet’s theorem on primes in arithmetic progression, and the Prime Number Theorem
for primes in arithmetic progression.
The philosophical spirit of elementary number theory resonates with particular force in
the mind of anyone who compares the way that we proved Theorems 4.2 and 4.3 to the
way that we proved Theorems 4.12 and 5.13. The proofs of the former two results are easy
consequences of Lemma 4.4, which in turn depends on an elegant application of quadratic
reciprocity and Dirichlet’s theorem on primes in arithmetic progression. In contrast to that
line of reasoning, our proof of Theorems 4.12 and 5.13 requires, by comparison, a rather
sophisticated application of transcendental methods based on the Riemann zeta function
and the zeta function of a quadratic number field. Because all of these results are very
similar in content, this raises a natural question: can we give elementary proofs of Theorems
4.12 and 5.13 which, in particular, avoid the use of zeta functions and are more in line with
the ideas used in the proof of Theorems 4.2 and 4.3? The answer: yes we can, and that
will be done in this chapter by proving Theorems 4.12, in section 3, and 5.13, in section
2, using only Lemma 4.4 and linear algebra over GF (2). Taking into account the fact that
Dirichlet’s theorem and the Prime Number Theorem for primes in arithmetic progression
also have elementary proofs, the proofs that we have given of Theorems 4.2, 4.3, 4.9, 5.14,
and 5.15 are already elementary.

2. An Elementary Proof of Theorem 5.13

We begin with Theorem 5.13: let S be a nonempty finite subset of [1, ∞) such that
\[
\text{for all } \emptyset \ne T \subseteq S, \ \prod_{i \in T} i \text{ is not a square.} \tag{1}
\]
We wish to prove that for each function ε : S → {−1, 1}, the cardinality of $\{p : \chi_p \equiv \varepsilon \text{ on } S\}$
is infinite.
The first step in our reasoning is to reduce to the case in which every integer in S is
square-free. Recall that the square-free part σ(z) of z ∈ [1, ∞) is
\[
\sigma(z) = \prod_{q \in \pi_{\mathrm{odd}}(z)} q,
\]
and observe that if $\emptyset \ne T \subseteq [1, \infty)$ is finite then
\[
\prod_{i \in T} i \text{ is not a square if and only if } \prod_{i \in T} \sigma(i) \text{ is not a square.}
\]
(There is an integer n such that $\prod_{i \in T} i = \prod_{i \in T} \sigma(i) \times n^2$, so the multiplicity m of a prime
factor q of $\prod_{i \in T} i$ in $\prod_{i \in T} i$ is congruent mod 2 to the multiplicity m′ of q in $\prod_{i \in T} \sigma(i)$, hence
m is odd if and only if m′ is odd.) Also
\[
\chi_p(z) = \chi_p(\sigma(z)), \quad \text{for all } p \notin \pi(z).
\]

Hence, upon replacing S by the set formed from the integers σ(z) for z ∈ S, we may suppose
with no loss of generality that all elements of S are square-free. Hence
\[
z = \prod_{q \in \pi(z)} q, \quad z \in S,
\]
π(z) ≠ ∅ for all z ∈ S (because 1 ∉ S), and if w and z are distinct elements of S then π(w) ≠ π(z).
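
The reduction to square-free integers is easy to experiment with. The sketch below computes the square-free part σ(z) by trial division and checks, for a few small primes, that $\chi_p(z) = \chi_p(\sigma(z))$ whenever p does not divide z. The function names `squarefree_part` and `legendre` are chosen for this illustration and are not notation from the text.

```python
def squarefree_part(z):
    """Product of the primes dividing z to an odd power (sigma(z) in the text)."""
    sigma, q = 1, 2
    while q * q <= z:
        exponent = 0
        while z % q == 0:
            z //= q
            exponent += 1
        if exponent % 2 == 1:
            sigma *= q
        q += 1
    # Whatever is left (1 or a single prime) occurs to the first power.
    return sigma * z


def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


for z in (12, 18, 72, 75, 210):
    print(z, "has square-free part", squarefree_part(z))

# chi_p(z) agrees with chi_p(sigma(z)) for every odd prime p not dividing z.
agree = all(
    legendre(z, p) == legendre(squarefree_part(z), p)
    for z in range(1, 200)
    for p in (3, 5, 7, 11, 13)
    if z % p
)
print("chi_p(z) == chi_p(sigma(z)) in all tested cases:", agree)
```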
The next step is to look for a purely combinatorial condition on the sets π(z), z ∈ S, that
is equivalent to condition (1). The following notation will be helpful with regard to that: if
T ⊆ S, let
\[
\Pi(T) = \bigcup_{i \in T} \pi(i), \qquad
\mathcal{S}(T) = \{\pi(i) : i \in T\}, \qquad
p(T) = \prod_{i \in T} i,
\]
and let
\[
\Pi = \bigcup_{i \in S} \pi(i), \qquad
\mathcal{S} = \{\pi(i) : i \in S\}.
\]
Now
\[
\Pi(T) = \text{the set of all prime factors of } p(T)
\]
and
\[
\text{the multiplicity in } p(T) \text{ of } q \in \Pi(T) = |\{X \in \mathcal{S}(T) : q \in X\}|.
\]
Hence
\[
p(T) \text{ is not a square iff } \{q \in \Pi(T) : |\{X \in \mathcal{S}(T) : q \in X\}| \text{ is odd}\} \ne \emptyset. \tag{2}
\]

Condition (2) can be elegantly expressed by using the symmetric difference operation on
sets. Recall that if A and B are sets then the symmetric difference A△B of A and B is the
set (A \ B) ∪ (B \ A). The symmetric difference operation is commutative and associative,
hence if $A_1, \dots, A_k$ are distinct sets then the repeated symmetric difference
\[
\triangle_i A_i = A_1 \triangle \cdots \triangle A_k
\]
is unambiguously defined. In fact, one can show that
\[
\triangle_i A_i = \Big\{ a \in \bigcup_i A_i : |\{A_j : a \in A_j\}| \text{ is odd} \Big\}. \tag{3}
\]
Statements (2) and (3) imply that
\[
p(T) \text{ is not a square if and only if } \triangle_{i \in T}\, \pi(i) \ne \emptyset.
\]
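
Identity (3) and the resulting criterion for p(T) to be a square can be checked directly on small examples. In the sketch below, which is only an illustration, Python's set operator `^` plays the role of the symmetric difference; the helper names `prime_factors`, `repeated_symmetric_difference`, `odd_multiplicity_elements` and `is_square` are assumptions of this example, and the sets T are arbitrary square-free test cases.

```python
from functools import reduce
from math import isqrt, prod


def prime_factors(z):
    """The set pi(z) of primes dividing z, by trial division."""
    factors, q = set(), 2
    while q * q <= z:
        while z % q == 0:
            factors.add(q)
            z //= q
        q += 1
    if z > 1:
        factors.add(z)
    return factors


def repeated_symmetric_difference(sets):
    """A_1 ^ A_2 ^ ... ^ A_k (the empty set for an empty family)."""
    return reduce(lambda a, b: a ^ b, sets, set())


def odd_multiplicity_elements(sets):
    """Right-hand side of (3): elements lying in an odd number of the sets."""
    union = set().union(*sets)
    return {a for a in union if sum(a in s for s in sets) % 2 == 1}


def is_square(n):
    r = isqrt(n)
    return r * r == n


for T in ([6, 10, 15], [6, 10], [2, 3, 6], [5, 21, 35]):
    pi_sets = [prime_factors(i) for i in T]
    delta = repeated_symmetric_difference(pi_sets)
    # Identity (3) for this family of sets.
    assert delta == odd_multiplicity_elements(pi_sets)
    # p(T) is a square exactly when the symmetric difference is empty.
    assert is_square(prod(T)) == (len(delta) == 0)
    print(T, "symmetric difference:", sorted(delta), "p(T) square:", is_square(prod(T)))
```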



Hence
\[
\text{condition (1) holds if and only if for all nonempty subsets } T \text{ of } S, \ \triangle_{i \in T}\, \pi(i) \ne \emptyset.
\]
As the map i → π(i) is a bijection of S onto $\mathcal{S}$, it follows that
\[
\text{condition (1) holds if and only if for all nonempty subsets } \mathcal{T} \text{ of } \mathcal{S}, \ \triangle_{T \in \mathcal{T}}\, T \ne \emptyset. \tag{4}
\]
Statement (4) is the combinatorial formulation of condition (1) that we want.


In order to express things more concisely, we recall now from section 4 of Chapter 5 that
S is said to support all patterns if for each function ε : S → {−1, 1}, the set $\{p : \chi_p \equiv \varepsilon \text{ on } S\}$
is infinite. Consequently from (4), in order to prove Theorem 5.13, we must show that
\[
\text{if } \triangle_{T \in \mathcal{T}}\, T \ne \emptyset \text{ for all } \emptyset \ne \mathcal{T} \subseteq \mathcal{S} \text{ then } S \text{ supports all patterns.} \tag{5}
\]
Hence we next look for a combinatorial condition on $\mathcal{S}$ which guarantees that S supports all
patterns. This is provided by

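Lemma 6.1. Suppose that $\mathcal{S}$ satisfies the following condition:
\[
\text{for each nonempty subset } \mathcal{T} \text{ of } \mathcal{S}, \text{ there exists a subset } N \text{ of } \Pi \text{ such that }
\mathcal{T} = \{S \in \mathcal{S} : |N \cap S| \text{ is odd}\}. \tag{6}
\]
Then S supports all patterns.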

Proof. Let ε be a function of S into {−1, 1}. We must prove: $\{p : \chi_p \equiv \varepsilon \text{ on } S\}$ is
infinite.
The map π(i) → ε(i), i ∈ S defines a function ε′ of $\mathcal{S}$ into {−1, 1}. Let
\[
\mathcal{T} = (\varepsilon')^{-1}(-1).
\]
If $\mathcal{T} = \emptyset$ then ε ≡ 1, hence apply Theorem 4.3. Suppose that $\mathcal{T} \ne \emptyset$, and then find N ⊆ Π
such that N satisfies the conclusion of (6) for this $\mathcal{T}$. Basic Lemma 4.4 implies that there
are infinitely many primes p for which
\[
\{q \in \Pi : \chi_p(q) = -1\} = N. \tag{7}
\]
Let p be any one of these primes which divides no element of S.

We claim that $\chi_p \equiv \varepsilon$ on S. To verify this, note first that because of (7),
\[
\chi_p(i) = (-1)^{|N \cap \pi(i)|}, \quad \text{for all } i \in S.
\]
Hence
\[
i \in S \cap \chi_p^{-1}(-1) \text{ if and only if } |N \cap \pi(i)| \text{ is odd.}
\]
Since the conclusion of (6) holds for N and $\mathcal{T}$, it follows that
\[
|N \cap \pi(i)| \text{ is odd if and only if } \pi(i) \in \mathcal{T}, \quad \text{for all } i \in S.
\]
The definition of ε′ implies that
\[
\pi(i) \in \mathcal{T} \text{ if and only if } i \in \varepsilon^{-1}(-1).
\]
Hence
\[
S \cap \chi_p^{-1}(-1) = \varepsilon^{-1}(-1),
\]
and so $\chi_p \equiv \varepsilon$ on S. QED
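
The mechanism of this proof can be traced on a small example. The sketch below is an illustration under stated assumptions: it takes S = {6, 10, 3} and the pattern ε(6) = −1, ε(10) = 1, ε(3) = −1 (both chosen arbitrarily for this example), finds a set N as in (6) by brute force, and then searches for a prime p satisfying (7). The search over primes is a stand-in for Basic Lemma 4.4, which guarantees infinitely many such p; the function names are hypothetical.

```python
from itertools import combinations


def prime_factors(z):
    """The set pi(z) of primes dividing z, by trial division."""
    factors, q = set(), 2
    while q * q <= z:
        while z % q == 0:
            factors.add(q)
            z //= q
        q += 1
    if z > 1:
        factors.add(z)
    return factors


def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def is_prime(n):
    return n > 1 and all(n % q for q in range(2, int(n ** 0.5) + 1))


def find_N(S, eps):
    """Brute-force a set N of primes with |N & pi(i)| odd exactly when eps[i] == -1."""
    pi = {i: prime_factors(i) for i in S}
    Pi = sorted(set().union(*pi.values()))
    for size in range(len(Pi) + 1):
        for candidate in combinations(Pi, size):
            N = set(candidate)
            if all((len(N & pi[i]) % 2 == 1) == (eps[i] == -1) for i in S):
                return N, Pi
    return None, Pi


def first_prime_with_pattern(N, Pi, limit=10_000):
    """Smallest odd prime p < limit, p not in Pi, whose non-residues among Pi are exactly N."""
    for p in range(3, limit):
        if is_prime(p) and p not in Pi:
            if all((legendre(q, p) == -1) == (q in N) for q in Pi):
                return p
    return None


S = [6, 10, 3]
eps = {6: -1, 10: 1, 3: -1}
N, Pi = find_N(S, eps)
p = first_prime_with_pattern(N, Pi)
print("N =", N, " p =", p, " pattern =", [legendre(i, p) for i in S])
```

For this choice the search returns N = {3} and p = 31, and the Legendre symbols of 6, 10 and 3 modulo 31 reproduce ε.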
Remark. The converse of Lemma 6.1 is valid.
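In order to verify statement (5), and hence prove Theorem 5.13, it suffices by virtue of
Lemma 6.1 to prove that if
\[
\triangle_{T \in \mathcal{T}}\, T \ne \emptyset \text{ for all } \emptyset \ne \mathcal{T} \subseteq \mathcal{S} \tag{8}
\]
then
\[
\text{for each } \emptyset \ne \mathcal{T} \subseteq \mathcal{S}, \text{ there exists } N \subseteq \Pi \text{ such that } \mathcal{T} = \{S \in \mathcal{S} : |N \cap S| \text{ is odd}\}. \tag{9}
\]
We have now completely removed residues and non-residues from the scene and have reduced
everything to proving the following purely combinatorial statement about finite sets:
if A is a nonempty finite set, $\emptyset \ne \mathcal{S} \subseteq 2^A \setminus \{\emptyset\}$, and $\mathcal{S}$ satisfies (8), then,
with Π replaced by A, $\mathcal{S}$ satisfies (9).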
This can be done via linear algebra over F = GF(2), by means of the same idea that we
used in the proof of Lemma 4.11. We may suppose with no loss of generality that A = [1, n]
for some n ∈ [1, ∞). Let
\[
v : 2^A \to F^n
\]
be the map defined in section 6 of Chapter 4. If $\mathcal{S} = \{S_1, \dots, S_m\}$, note that if $\emptyset \ne \mathcal{T} \subseteq \mathcal{S}$
then there is a bijection of the set of solutions over F of the m × n system of linear equations
\begin{align*}
\sum_i v(T)(i)x_i &= 1, \quad T \in \mathcal{T}, \\
\sum_i v(S)(i)x_i &= 0, \quad S \in \mathcal{S} \setminus \mathcal{T},
\end{align*}
onto the set
\[
\{N \subseteq [1, n] : N \text{ satisfies the conclusion of (9) (with } \Pi \text{ replaced by } A) \text{ for } \mathcal{T}\}
\]
given by
\[
(x_1, \dots, x_n) \mapsto \{i : x_i = 1\}.
\]

Hence (9) holds with Π replaced by A if and only if the linear transformation of $F^n \to F^m$
with matrix
\[
B = \begin{pmatrix}
v(S_1)(1) & \dots & v(S_1)(n) \\
\vdots & & \vdots \\
v(S_m)(1) & \dots & v(S_m)(n)
\end{pmatrix}
\]
is surjective, i.e., B has rank m, i.e., the row vectors of B are linearly independent over F.
We now show that
\[
\text{the row vectors of } B \text{ are linearly independent over } F \text{ if and only if } \mathcal{S} \text{ satisfies (8)}; \tag{10}
\]
this will prove Theorem 5.13 using only Lemma 4.4 and linear algebra over F!
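If $w = (w_1, \dots, w_n) \in F^n$, recall that the support supp(w) of w is the set
\[
\operatorname{supp}(w) = \{i : w_i = 1\}.
\]
It is easy to see that if $\emptyset \ne U \subseteq F^n$ then
\[
\operatorname{supp}\Big(\sum_{w \in U} w\Big) = \triangle_{w \in U} \operatorname{supp}(w),
\]
and so
\[
\sum_{w \in U} w \ne 0 \text{ if and only if } \triangle_{w \in U} \operatorname{supp}(w) \ne \emptyset. \tag{11}
\]
Observe now that
\[
U \text{ is linearly independent over } F \text{ if and only if for all } \emptyset \ne W \subseteq U, \ \sum_{w \in W} w \ne 0. \tag{12}
\]
Statement (10) is now a consequence of (11), (12), and the fact that
\[
\operatorname{supp} v(T) = T, \quad \text{for all } T \in \mathcal{S}.
\]
QED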
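
Statement (10) lends itself to a brute-force check on small families of sets. The sketch below, offered as an illustration rather than as part of the argument, compares linear independence over GF(2) of the incidence vectors with condition (8) tested directly by enumerating subfamilies. The bitmask encoding of v(S) and the names `gf2_rank`, `to_bitmask`, `satisfies_8`, `rows_independent` are assumptions of this example.

```python
from functools import reduce
from itertools import combinations


def gf2_rank(rows):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low_bit = pivot & -pivot
            rows = [r ^ pivot if r & low_bit else r for r in rows]
    return rank


def to_bitmask(subset):
    """Incidence vector v(S) of a subset S of [1, n], stored as an integer."""
    return sum(1 << (i - 1) for i in subset)


def satisfies_8(family):
    """Condition (8): every nonempty subfamily has a nonempty symmetric difference."""
    for size in range(1, len(family) + 1):
        for sub in combinations(family, size):
            if not reduce(lambda a, b: a ^ b, sub, frozenset()):
                return False
    return True


def rows_independent(family):
    """True iff the rows of the matrix B built from `family` are independent over GF(2)."""
    return gf2_rank(to_bitmask(S) for S in family) == len(family)


families = [
    [frozenset({1, 2}), frozenset({1, 3}), frozenset({2})],
    [frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3})],
]
for family in families:
    print([sorted(S) for S in family], satisfies_8(family), rows_independent(family))
```

The first family satisfies (8) and has independent rows; in the second, the three sets have empty symmetric difference and the rows are dependent.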

3. An Elementary Proof of Theorem 4.12

Now for the proof of Theorem 4.12. Let S be a nonempty finite subset of [1, ∞) such
that
\[
p(T) \text{ is not a square for all } T \subseteq S \text{ with } |T| \text{ odd.} \tag{13}
\]
We need to prove that the set $\{p : \chi_p \equiv -1 \text{ on } S\}$ is infinite.

If we replace S by the set S′ of integers formed by the square-free parts of the elements
of S then (13) is true with S replaced by S′, hence we may again suppose with no loss of
generality that all integers in S are square-free.

The argument now proceeds along the same line of reasoning that we used to prove
Theorem 5.13. It follows as before that, with $\mathcal{S} = \{\pi(i) : i \in S\}$,
\[
\text{condition (13) holds if and only if } \triangle_{T \in \mathcal{T}}\, T \ne \emptyset \text{ for all } \mathcal{T} \subseteq \mathcal{S} \text{ with } |\mathcal{T}| \text{ odd.} \tag{14}
\]
We then look for a combinatorial condition on $\mathcal{S}$ which implies that the set of primes
\[
\{p : \chi_p \equiv -1 \text{ on } S\}
\]
is infinite, in analogy with Lemma 6.1. Such a condition is provided by

Lemma 6.2. If there exists a subset N of $\Pi = \bigcup_{i \in S} \pi(i)$ such that
\[
|N \cap \pi(i)| \text{ is odd for all } i \in S,
\]
then
\[
\{p : \chi_p \equiv -1 \text{ on } S\}
\]
is infinite.

Proof. Let N be a subset of Π which satisfies the hypothesis of Lemma 6.2. As before,
use Lemma 4.4 to find infinitely many primes p such that
\[
\{q \in \Pi : \chi_p(q) = -1\} = N;
\]
then for all such p which divide no element of S,
\[
\chi_p(i) = (-1)^{|N \cap \pi(i)|} = -1, \quad \text{for all } i \in S.
\]
QED
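The final step is to prove that if A is a nonempty finite set, $\emptyset \ne \mathcal{S} \subseteq 2^A \setminus \{\emptyset\}$, and
\[
\triangle_{T \in \mathcal{T}}\, T \ne \emptyset \text{ for all } \mathcal{T} \subseteq \mathcal{S} \text{ with } |\mathcal{T}| \text{ odd,} \tag{15}
\]
then there is a subset N of A such that
\[
|N \cap S| \text{ is odd, for all } S \in \mathcal{S},
\]
which can be done again by linear algebra over F.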


We may take A = [1, n], list the elements of $\mathcal{S}$ as $\mathcal{S} = \{S_1, \dots, S_m\}$ and then observe
that, as in the proof just given of Theorem 5.13, there is a bijection of the set of solutions
in $F^n$ of the system of equations
\[
\sum_i v(S_j)(i)x_i = 1, \quad j = 1, \dots, m,
\]
onto the set
\[
\{N \subseteq [1, n] : |N \cap S| \text{ is odd, for all } S \in \mathcal{S}\}.
\]

This system has a solution if and only if the matrices
\[
B = \begin{pmatrix}
v(S_1)(1) & \dots & v(S_1)(n) \\
\vdots & & \vdots \\
v(S_m)(1) & \dots & v(S_m)(n)
\end{pmatrix}
\]
and
\[
B' = \begin{pmatrix}
v(S_1)(1) & \dots & v(S_1)(n) & 1 \\
\vdots & & \vdots & \vdots \\
v(S_m)(1) & \dots & v(S_m)(n) & 1
\end{pmatrix}
\]
have the same rank (over F), hence we must verify that if (15) holds then B and B′ have
the same rank.
Assuming that (15) is valid, we let $v_1, \dots, v_m$ and $v_1', \dots, v_m'$ denote the row vectors of B and
B′, respectively. We will use (15) to prove that
\[
\text{for all } \emptyset \ne T \subseteq [1, m], \quad \sum_{i \in T} v_i = 0 \text{ iff } \sum_{i \in T} v_i' = 0. \tag{16}
\]
Statement (16) implies that if $\mathcal{L}$ (respectively, $\mathcal{L}'$) is the set of all sets of linearly independent
rows of B (respectively, B′) then the map $v_i \to v_i'$ induces a bijection Λ of $\mathcal{L}$ onto $\mathcal{L}'$ such
that
\[
|\Lambda(L)| = |L|, \quad \text{for all } L \in \mathcal{L},
\]
and so
\[
\text{rank of } B = \max_{L \in \mathcal{L}} |L| = \max_{L \in \mathcal{L}'} |L| = \text{rank of } B'.
\]
In order to verify (16), note first that if ∅ ≠ T ⊆ [1, m] then
\[
i\text{-th coordinate of } \sum_{j \in T} v_j = i\text{-th coordinate of } \sum_{j \in T} v_j', \quad i = 1, \dots, n, \tag{17}
\]
and so if $\sum_{j \in T} v_j' = 0$ then $\sum_{j \in T} v_j = 0$. Conversely, if $\sum_{j \in T} v_j = 0$ then (15) implies that
|T| is even. Consequently,
\[
(n + 1)\text{-th coordinate of } \sum_{j \in T} v_j' = |T| \cdot 1 = 0,
\]
hence this equation and (17) imply that $\sum_{j \in T} v_j' = 0$.
QED
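
The rank criterion used in this argument can be tried out on small families. The sketch below is illustrative only: it compares the ranks of B and of the augmented matrix B′ over GF(2), and cross-checks the answer against an exhaustive search for a set N meeting every member of the family in an odd number of points. The first family is the set of supports of $S_2$ = {p, ps, pqr, pqrs} from Chapter 5 with the primes relabelled 1 to 4; the second violates (15). All function names are hypothetical.

```python
def gf2_rank(rows):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low_bit = pivot & -pivot
            rows = [r ^ pivot if r & low_bit else r for r in rows]
    return rank


def odd_intersection_solvable(family, n):
    """True iff rank B == rank B', i.e. the system sum_i v(S_j)(i) x_i = 1 is solvable."""
    B = [sum(1 << (i - 1) for i in S) for S in family]
    B_aug = [row | (1 << n) for row in B]  # append a final column of 1s
    return gf2_rank(B) == gf2_rank(B_aug)


def brute_force_N(family, n):
    """First N (in bitmask order) with |N & S| odd for every S in the family, else None."""
    for mask in range(1 << n):
        N = {i for i in range(1, n + 1) if (mask >> (i - 1)) & 1}
        if all(len(N & S) % 2 == 1 for S in family):
            return N
    return None


good = [{1}, {1, 4}, {1, 2, 3}, {1, 2, 3, 4}]   # only dependency uses all four sets
bad = [{1, 2}, {2, 3}, {1, 3}]                  # three sets with empty symmetric difference

print(odd_intersection_solvable(good, 4), brute_force_N(good, 4))
print(odd_intersection_solvable(bad, 3), brute_force_N(bad, 3))
```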
We close this chapter by discussing what happens if instead of subsets of [1, ∞) we allow
nonempty, finite subsets of Z \ {0} in the hypotheses of all of the theorems in Chapters 4
and 5. Theorem 4.2 remains valid if the positive integer in its hypothesis is replaced by a
non-zero integer, and Theorems 4.3, 4.12, 5.13, and 5.15 remain valid with no change in their
statements if the set S in the hypotheses there is replaced by an arbitrary nonempty, finite
subset of Z \ {0}. In this more general situation, the integer −1 behaves like an additional
prime, and once that is taken into account, all of our arguments, both elementary and
non-elementary, can be modified without too much additional effort to verify these more general
results. If the subset of [1, ∞) in the hypotheses of Theorems 4.9 and 5.14 is replaced by
a nonempty, finite subset S of Z \ {0} and if the dimension d is determined by S as in the
statements of those theorems, then the density of the sets in their conclusions is now either
$2^{-d}$ or $2^{-(1+d)}$, with the latter value occurring if either −1 ∈ S or the sets $\pi_{\mathrm{odd}}(z)$, z ∈ S,
possess a certain combinatorial structure. However, the proof of this version of Theorems
4.9 and 5.14 proceeds along the same lines as the arguments that we have given, with only
a few additional technical adjustments (see Wright [61], section 3 for the details).

CHAPTER 7

Dirichlet L-functions and the Distribution of Quadratic Residues

In section 4 of Chapter 4, we saw how the non-vanishing at s = 1 of the L-function L(s, χ)
of a non-principal Dirichlet character χ played an essential role in the proof of Dirichlet’s
theorem on prime numbers in arithmetic progression (Theorem 4.5). In this chapter, the
fact that L(1, χ) is not only nonzero, but positive, when χ is real and non-principal, will be
of central importance. The positivity of L(1, χ) comes into play because we are interested in
the following problem concerning the distribution of residues and non-residues of a prime p.
Suppose that I is an interval of the real line contained in the interval from 1 to p. Are there
more residues of p than non-residues in I, or are there more non-residues than residues, or
is the number of residues and non-residues in I the same? We will see that this question can
be answered if we can determine if certain sums of values of the Legendre symbol of p are
positive, and it transpires that the positivity of the sum of these Legendre-symbol values, for
certain primes p, is determined precisely by the positivity of L(1, χ) for certain Dirichlet
characters χ. We make all of this precise in section 1, where the principal theorem of this
chapter, Theorem 7.1, is stated and then used to obtain some very interesting answers to
our question about the distribution of residues and non-residues. In the next section, the
proof of Theorem 7.1 is outlined; in particular we will see how the proof can be reduced to
the verification of formulae, stated in Theorems 7.2, 7.3, and 7.4, which express the relevant
Legendre-symbol sums in terms of the values of L-functions at s = 1. Sections 3-7 are
devoted to the proof of Theorems 7.2-7.4. In section 3, the fact that L(1, χ) > 0 for real,
non-principal Dirichlet characters is established, and sections 4-6 are devoted to discussing
various results concerning Gauss sums, analytic functions of a complex variable, and Fourier
series which are required for the arguments we take up in section 7. Because it plays such an
important role in the results of this chapter, we prove in section 8 Dirichlet’s fundamental
Lemma 4.8 on the non-vanishing of L(1, χ) for real, non-principal characters. Motivated by
the result on the convergence of Fourier series that is proved in section 6, we give yet another
proof of quadratic reciprocity in section 9 that uses finite Fourier series expansions.

1. Positivity of Sums of Values of a Legendre Symbol

Dirichlet [11] proved the following theorem in 1839:

Theorem 7.1. (i) If p ≡ 3 mod 4 then
\[
\sum_{0 < n < p/2} \chi_p(n) > 0.
\]
(ii) If p ≡ 1 mod 4 then
\[
\sum_{0 < n < p/4} \chi_p(n) > 0.
\]
(iii) If p > 3 then
\[
\sum_{0 < n < p/3} \chi_p(n) > 0.
\]

This result initiated a line of intense research on the positivity of various sums of values
of a Dirichlet character, and of characters on more general groups, that continues unabated
to the present day. The importance to us of Theorem 7.1 lies in its connection with the
distribution of residues and non-residues of a prime p throughout the interval [1, p − 1]. In
order to see how that goes, we consider a subinterval I of R contained in {x ∈ R : 0 < x < p},
and, following Berndt [1], we define the quadratic excess of I to be the sum
\[ q(I) = \sum_{n \in I} \chi_p(n). \]

If q(I) > 0 (respectively, q(I) < 0) then the number of residues (respectively, non-residues)
of p inside I exceeds the number of non-residues (respectively, residues) of p there, and if
q(I) = 0 then the numbers of residues and non-residues in I are the same. Hence Theorem 7.1
implies that if p ≡ 3 mod 4 then the number of residues inside the interval (0, p/2) exceeds
the number of non-residues there, that if p ≡ 1 mod 4 then the number of residues inside the
interval (0, p/4) exceeds the number of non-residues there, and that if p > 3 then the number of
residues inside the interval (0, p/3) exceeds the number of non-residues there.
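
The positivity statements of Theorem 7.1 are easy to test numerically for small primes. The following Python sketch does so. The helper names legendre, quadratic_excess, and is_prime are illustrative and do not come from the text; the Legendre symbol is computed by Euler's criterion (Theorem 2.5), and the range of primes tested is an arbitrary choice.

```python
def legendre(n, p):
    """Legendre symbol chi_p(n) for an odd prime p, via Euler's criterion."""
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1


def quadratic_excess(p, num, den):
    """q(0, num*p/den): sum of chi_p(n) over integers n with 0 < n < num*p/den."""
    return sum(legendre(n, p) for n in range(1, p) if den * n < num * p)


def is_prime(n):
    """Trial division; adequate for the small range used here."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


for p in (n for n in range(5, 200) if is_prime(n)):
    if p % 4 == 3:
        assert quadratic_excess(p, 1, 2) > 0  # Theorem 7.1(i)
    else:
        assert quadratic_excess(p, 1, 4) > 0  # Theorem 7.1(ii)
    assert quadratic_excess(p, 1, 3) > 0      # Theorem 7.1(iii)
```

A finite check of this kind illustrates the theorem but of course proves nothing about all primes.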
By taking Proposition 2.1 and Theorem 2.4 of Chapter 2 into account, we can say more.
If {X1 , . . . , Xk } is a set of pairwise disjoint subintervals of {x ∈ R : 0 < x < p} such that
\[ [1, p-1] = \mathbb{Z} \cap \bigcup_i X_i, \]
then because of Proposition 2.1, we have that
\[ \sum_i q(X_i) = 0. \tag{1} \]

Now, using (a, b) to denote the interval {x ∈ R : a < x < b}, let

I1 = (0, p/3), I2 = (p/3, 2p/3), I3 = (2p/3, p),

J1 = (0, p/4), J2 = (p/4, p/2), J3 = (p/2, 3p/4), J4 = (3p/4, p).



Assume first that p ≡ 3 mod 4. Theorem 2.4 implies that χp (−1) = −1, hence
\[
\begin{aligned}
q(I_1) &= \sum_{0<n<p/3} \chi_p(n) \\
&= -\sum_{0<n<p/3} \chi_p(-n) \\
&= -\sum_{0<n<p/3} \chi_p(p-n) \\
&= -\sum_{2p/3<n<p} \chi_p(n) \\
&= -q(I_3),
\end{aligned}
\tag{2}
\]

and so by (1) and Theorem 7.1 (iii),

q(I2 ) = 0 and q(I3 ) < 0.

It follows that (p/3, 2p/3) contains the same number of residues as non-residues of p and the
number of non-residues in (2p/3, p) exceeds the number of residues there.
Assume next that p ≡ 1 mod 4. Theorem 2.4 implies that χp (−1) = 1 hence the minus
signs in (2) can be dropped to conclude that

q(I1 ) = q(I3 ),

and so by (1) and Theorem 7.1(iii) yet again,

q(I3 ) > 0 and q(I2 ) = −q(I1 ) − q(I3 ) < 0.

It follows that the number of non-residues of p in (p/3, 2p/3) exceeds the number of residues
there and the number of residues of p in (2p/3, p) exceeds the number of non-residues there.
Similar arguments show that if p ≡ 1 mod 4 then

(3) q(J1 ) = q(J4 ), q(J2 ) = q(J3 ), q(J1 ) = −q(J3 ),

hence we conclude from (3) by way of Theorem 7.1(ii) that the number of residues of p in
each of the intervals (0, p/4) and (3p/4, p) exceeds the number of non-residues there and
the number of non-residues in each of the intervals (p/4, p/2) and (p/2, 3p/4) exceeds the
number of residues there.
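
The relations (1), (2), and (3) and the resulting sign patterns can be checked directly for a few primes. The sketch below is only an illustration: the function q(p, a, b, den), which computes the quadratic excess of the open interval (ap/den, bp/den), is a hypothetical helper, and the lists of primes are arbitrary.

```python
def legendre(n, p):
    """Legendre symbol chi_p(n) for an odd prime p, via Euler's criterion."""
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1


def q(p, a, b, den):
    """Quadratic excess of the open interval (a*p/den, b*p/den)."""
    return sum(legendre(n, p) for n in range(1, p) if a * p < den * n < b * p)


for p in (7, 11, 19, 23, 31, 43):      # primes congruent to 3 mod 4
    i1, i2, i3 = (q(p, k, k + 1, 3) for k in range(3))
    assert i1 + i2 + i3 == 0           # equation (1)
    assert i1 == -i3 and i2 == 0 and i3 < 0

for p in (5, 13, 17, 29, 37, 41):      # primes congruent to 1 mod 4
    i1, i2, i3 = (q(p, k, k + 1, 3) for k in range(3))
    assert i1 == i3 > 0 and i2 < 0
    j1, j2, j3, j4 = (q(p, k, k + 1, 4) for k in range(4))
    assert j1 == j4 > 0 and j2 == j3 == -j1   # equation (3)
```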
Figures 1-4 below display graphically the distribution of the residues and non-residues in
each of the cases that we have discussed. A + above an interval indicates that the number
of residues in that interval exceeds the number of non-residues there, a − indicates that the
number of non-residues exceeds the number of residues, and a 0 indicates that the numbers
of residues and non-residues are the same. We now turn to the proof of Theorem 7.1.

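Figure 1. p ≡ 3 mod 4. Signs on the intervals (0, p/2), (p/2, p): +, −.

Figure 2. p ≡ 1 mod 4. Signs on the intervals (0, p/3), (p/3, 2p/3), (2p/3, p): +, −, +.

Figure 3. p ≡ 3 mod 4, p > 3. Signs on the intervals (0, p/3), (p/3, 2p/3), (2p/3, p): +, 0, −.

Figure 4. p ≡ 1 mod 4. Signs on the intervals (0, p/4), (p/4, p/2), (p/2, 3p/4), (3p/4, p): +, −, −, +.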

2. Proof of Theorem 7.1: Outline of the Argument

The proof of Theorem 7.1 depends on formulae for the quadratic excesses in the statement
of that theorem which are given in terms of certain Dirichlet L-functions. Recall from section
4 of Chapter 4 that if χ is a Dirichlet character then the L-function of χ is defined by the
Dirichlet series

\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}, \quad s \in \mathbb{C}. \]
The property of these L-functions that will be essential for our proof of Theorem 7.1 is the fact
that when χ is real and non-principal, L(1, χ) > 0. It follows from Dirichlet’s fundamental
Lemma 4.8 in Chapter 4 that L(1, χ) ≠ 0 for any non-principal Dirichlet character χ, and
we will show (among other things) in the next section that when χ is also real, L(1, χ) ≥ 0,
whence L(1, χ) > 0. Consequently, if we can prove that each of the sums in Theorem 7.1
can be expressed as a positive multiple of the value at s = 1 of the L-function of a real
non-principal Dirichlet character, then Theorem 7.1 will follow from the positivity of that

L-function value. That is the line of reasoning which we will follow to the proof of Theorem
7.1.
In order to carry out this argument, we therefore require formulae which express the
quadratic excesses q(0, p/2), q(0, p/4), and q(0, p/3) in terms of the value of L-functions at
s = 1. The formula for q(0, p/2) is given in

Theorem 7.2. If p ≡ 3 mod 4 then
\[ q(0, p/2) = \frac{\sqrt{p}}{\pi} \bigl( 2 - \chi_p(2) \bigr) L(1, \chi_p). \]
This theorem implies that statement (i) of Theorem 7.1 is true, because 2 − χp (2) ≥ 1 and
L(1, χp ) > 0.
In order to state the L-function formulae that will verify Theorem 7.1(ii) and (iii), we
will need to make use of the fact that if χm and χn are Dirichlet characters of modulus m
and n, and if gcd(m, n) = 1, then the point-wise product χm χn is a Dirichlet character of
modulus mn. This follows from the fact that if gcd(m, n) = 1 then the Chinese remainder
theorem implies that U(mn) is isomorphic to the direct product U(m) × U(n), and so the
point-wise product χm χn clearly defines a homomorphism of U(mn) into the circle group.
Our proof of Theorem 7.1 (ii) will make use of the character χ4p of modulus 4p given by
point-wise multiplication of χp and the character χ4 of modulus 4 defined by
\[ \chi_4(n) = \begin{cases} (-1)^{(n-1)/2}, & n \text{ odd}, \\ 0, & n \text{ even}. \end{cases} \]
Also, if p > 3 then we let χ3p denote the point-wise product of χ3 and χp . It is clear that
the characters χ3p and χ4p are real and non-principal.
Theorem 7.1(ii) and (iii) now follow, respectively, from

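Theorem 7.3. If p ≡ 1 mod 4 then
\[ q(0, p/4) = \frac{\sqrt{p}}{\pi} L(1, \chi_{4p}). \]

Theorem 7.4. Let p > 3.
(i) If p ≡ 1 mod 4 then
\[ q(0, p/3) = \frac{\sqrt{3p}}{2\pi} L(1, \chi_{3p}). \]
(ii) If p ≡ 3 mod 4 then
\[ q(0, p/3) = \frac{\sqrt{p}}{2\pi} \bigl( 3 - \chi_p(3) \bigr) L(1, \chi_p). \]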
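
The formulae of Theorems 7.2, 7.3, and 7.4 can be sanity-checked numerically by approximating L(1, χ) with a partial sum of its Dirichlet series. The sketch below makes several assumptions that are not taken from the text: the helper names are illustrative, the partial sum runs over an arbitrary number of full periods of the character (so the approximation error is of order 1/periods), and the constants in Theorem 7.4 are taken to be √(3p)/(2π) and √p/(2π).

```python
import math


def legendre(n, p):
    """Legendre symbol chi_p(n) for an odd prime p, via Euler's criterion."""
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1


def chi4(n):
    """The character of modulus 4: 0 for even n, (-1)^((n-1)/2) for odd n."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1


def l_value(table, periods=5000):
    """Partial sum of sum chi(n)/n over full periods, where table[n % m] = chi(n)."""
    m = len(table)
    return sum(table[n % m] / n for n in range(1, m * periods + 1))


def excess(p, den):
    """The quadratic excess q(0, p/den)."""
    return sum(legendre(n, p) for n in range(1, p) if den * n < p)


def rhs_7_2(p):  # assumes p = 3 mod 4
    chi_p = [legendre(n, p) for n in range(p)]
    return math.sqrt(p) / math.pi * (2 - legendre(2, p)) * l_value(chi_p)


def rhs_7_3(p):  # assumes p = 1 mod 4
    chi_4p = [chi4(n) * legendre(n, p) for n in range(4 * p)]
    return math.sqrt(p) / math.pi * l_value(chi_4p)


def rhs_7_4(p):  # assumes p > 3; the 2*pi normalization is an assumption
    if p % 4 == 1:
        chi_3p = [legendre(n, 3) * legendre(n, p) for n in range(3 * p)]
        return math.sqrt(3 * p) / (2 * math.pi) * l_value(chi_3p)
    chi_p = [legendre(n, p) for n in range(p)]
    return math.sqrt(p) / (2 * math.pi) * (3 - legendre(3, p)) * l_value(chi_p)


for p in (7, 11, 19):
    print(p, excess(p, 2), round(rhs_7_2(p), 3), excess(p, 3), round(rhs_7_4(p), 3))
for p in (5, 13, 17):
    print(p, excess(p, 4), round(rhs_7_3(p), 3), excess(p, 3), round(rhs_7_4(p), 3))
```

In each printed row the integer excess should agree with the floating-point value next to it up to the truncation error of the series.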

We have now reduced the proof of Theorem 7.1 to the proof of Theorems 7.2, 7.3, and
7.4. The proof of these results will require quite a bit of preliminary preparation. The facts
about Dirichlet L-functions that will be needed are set forth in section 3. Our arguments will

also require the calculation of a very useful Gauss sum, which will be carried out in section 4
(we used Gauss sums in section 9 of Chapter 3 to verify the Law of Quadratic Reciprocity).
Our proof of Theorems 7.2 and 7.4(ii) will follow a very nice argument of Bruce Berndt [1]
which employs the theory of analytic functions of a complex variable, and so we will discuss
the relevant facts from that subject in section 5. The convergence of Fourier series is the
subject of section 6, required because we will use Fourier series to prove Theorems 7.3 and
7.4(i), in the same spirit as Dirichlet’s original proof of those theorems. Finally, we will
bring all of this together for the proof of Theorems 7.2, 7.3, and 7.4 in section 7.

3. Some Useful Facts About Dirichlet L-functions

The information concerning L-functions of Dirichlet characters that we will need is
recorded in

Lemma 7.5. Let χ be a Dirichlet character mod m.


(i) If χ is non-principal then L(s, χ) is analytic in the half-plane Re s > 0.
(ii) L(s, χ) has the absolutely convergent Euler-Dirichlet product expansion given by
\[ L(s, \chi) = \prod_{q} \frac{1}{1 - \chi(q) q^{-s}}, \quad \operatorname{Re} s > 1, \]
where the product is taken over all prime numbers q.
(iii) If χ is real-valued and non-principal then L(1, χ) > 0.

Proof. (i) This will follow immediately from Proposition 5.5 after we prove that the sums
\[ \sum_{k=1}^{n} \chi(k) \]
are uniformly bounded as a function of n. To see this, we claim first that
\[ \sum_{k} \chi(k) = 0, \tag{4} \]
whenever this sum is taken over any complete system of ordinary residues mod m.
Assuming this is true, we take n ∈ [1, ∞), write n = r + lm, 0 ≤ r < m, and then calculate
that
\[
\begin{aligned}
\sum_{k=1}^{n} \chi(k) &= \sum_{k=1}^{lm-1} \chi(k) + \sum_{k=0}^{r} \chi(k + lm) \\
&= \sum_{k=0}^{r} \chi(k + lm), \quad \text{by (4)} \\
&= \sum_{k=0}^{r} \chi(k),
\end{aligned}
\]
hence
\[ \Bigl| \sum_{k=1}^{n} \chi(k) \Bigr| \le \sum_{k=0}^{r} |\chi(k)| \le m - 1. \]

In order to verify (4) use the fact that χ is periodic of period m (Proposition 4.6) and
the fact that k in (4) runs through a complete set of ordinary residues mod m to write
\[ \sum_{k} \chi(k) = \sum_{k \in U(m)} \chi(k), \]
so we need only show that this latter sum is 0.
Because χ is non-principal, there is a k0 ∈ U(m) such that χ(k0 ) ≠ 1. The map k → kk0
is a bijection of U(m) onto U(m), hence
\[ \sum_{k \in U(m)} \chi(k) = \sum_{k \in U(m)} \chi(k k_0) = \chi(k_0) \sum_{k \in U(m)} \chi(k), \]
hence
\[ \bigl( 1 - \chi(k_0) \bigr) \sum_{k \in U(m)} \chi(k) = 0. \]
As 1 − χ(k0 ) ≠ 0, it follows that
\[ \sum_{k \in U(m)} \chi(k) = 0. \]
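
The two facts just proved, that a non-principal character sums to 0 over a full period and that its partial sums are bounded by m − 1, can be observed concretely. The sketch below uses the Legendre symbol χp as the non-principal character of modulus m = p; the prime p = 23 and the number of periods are arbitrary illustrative choices, and partial_sums is a hypothetical helper.

```python
from itertools import accumulate


def legendre(n, p):
    """Legendre symbol chi_p(n) for an odd prime p, via Euler's criterion."""
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1


def partial_sums(p, terms):
    """The sums chi_p(1) + ... + chi_p(n) for n = 1, ..., terms."""
    return list(accumulate(legendre(k, p) for k in range(1, terms + 1)))


p = 23
sums = partial_sums(p, 10 * p)

# The sum over each complete residue system 1, ..., p vanishes, as in (4).
assert all(sums[j * p - 1] == 0 for j in range(1, 11))

# Consequently the partial sums stay bounded by m - 1 = p - 1.
assert max(abs(s) for s in sums) <= p - 1
```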

(ii) This product formula can be derived by appropriate modifications of our proof of
Theorem 5.8, which verified the product formula for the zeta function of an algebraic number
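field. Note first that
\[
\left| \frac{1}{1 - \chi(q) q^{-s}} - 1 \right|
= \left| \frac{\chi(q) q^{-s}}{1 - \chi(q) q^{-s}} \right|
\le \frac{q^{-\operatorname{Re} s}}{1 - q^{-\operatorname{Re} s}}
\le 2 q^{-\operatorname{Re} s}, \quad \text{for all } q \ge 2,\ \operatorname{Re} s > 1,
\]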

consequently Proposition 5.9 implies that the product in (ii) is absolutely convergent for Re
s > 1. The proof of Theorem 5.8 can now be easily modified by replacing the set of prime
ideals of R, the set of nonzero ideals of R, Proposition 5.2 and the Fundamental Theorem
of Ideal Theory in that proof by, respectively, the set P of all primes, the set [1, ∞), the
complete multiplicativity of χ, and the Fundamental Theorem of Arithmetic to obtain
\[ \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} = \prod_{q} \frac{1}{1 - \chi(q) q^{-s}}, \quad \text{for } \operatorname{Re} s > 1. \]
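
As a numerical illustration of the product formula in (ii), the sketch below compares a truncated Dirichlet series with a truncated Euler product at s = 2 for the character χ4 of modulus 4. The choice of character, the point s = 2, and the truncation limits are assumptions made only for this example; the helper names are hypothetical.

```python
def chi4(n):
    """The character of modulus 4: 0 for even n, (-1)^((n-1)/2) for odd n."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1


def primes_up_to(limit):
    """Sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for d in range(2, int(limit ** 0.5) + 1):
        if sieve[d]:
            for multiple in range(d * d, limit + 1, d):
                sieve[multiple] = False
    return [n for n, is_p in enumerate(sieve) if is_p]


def dirichlet_series(s, terms):
    """Truncation of sum chi4(n) / n^s."""
    return sum(chi4(n) / n ** s for n in range(1, terms + 1))


def euler_product(s, limit):
    """Truncation of the product over primes q of 1 / (1 - chi4(q) q^(-s))."""
    product = 1.0
    for q in primes_up_to(limit):
        product *= 1.0 / (1.0 - chi4(q) * q ** (-s))
    return product


series = dirichlet_series(2, 10 ** 5)
product = euler_product(2, 10 ** 4)
print(series, product)
```

Both truncations approximate L(2, χ4), so the two printed values should agree to several decimal places.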

(iii) If χ is real then every value of χ is 0 or ±1, hence each factor in the Euler product
expansion of L(s, χ) is positive for s > 1. Consequently L(s, χ) ≥ 0 for s > 1, and so by
the continuity of L(s, χ) on s > 0 it follows that
\[ L(1, \chi) = \lim_{s \to 1^+} L(s, \chi) \ge 0. \]
But L(1, χ) ≠ 0, because of Dirichlet’s fundamental Lemma 4.8, hence L(1, χ) > 0. QED

4. Calculation of a Gauss Sum

In addition to L-functions, our derivation of the formulae in Theorems 7.2, 7.3, and
7.4 will also employ some very useful properties of Gauss sums. Recall from the proof of
quadratic reciprocity in section 9 of Chapter 3 the Gauss sums
\[ G(n, p) = \sum_{j=0}^{p-1} \chi_p(j) \exp\Bigl( \frac{2\pi i n j}{p} \Bigr). \]

In that proof (Lemma 3.15 and Theorem 3.14), we showed that

(5) G(n, p) = χp (n)G(1, p)

and that
\[ G(1, p)^2 = \begin{cases} p, & \text{if } p \equiv 1 \bmod 4, \\ -p, & \text{if } p \equiv 3 \bmod 4. \end{cases} \]
Determining the sign of G(1, p) from this equation turns out to be a very difficult problem,
and was solved by Gauss in 1805 after four long years of intense effort on his part. The plus
sign is the correct one in both cases; we will present a very nice proof of this fact due to L.
Kronecker, according to the account of it given in Ireland and Rosen [30], section 6.4.

Theorem 7.6.
\[ G(1, p) = \begin{cases} \sqrt{p}, & \text{if } p \equiv 1 \bmod 4, \\ i\sqrt{p}, & \text{if } p \equiv 3 \bmod 4. \end{cases} \]

Proof. Let ζ = exp(2πi/p). The argument proceeds through a series of claims and their
verifications.
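Claim 1.
\[ (-1)^{(p-1)/2}\, p = \prod_{k=1}^{(p-1)/2} \bigl( \zeta^{2k-1} - \zeta^{-2k+1} \bigr)^2. \]
Claim 2.
\[ \prod_{k=1}^{(p-1)/2} \bigl( \zeta^{2k-1} - \zeta^{-2k+1} \bigr) = \begin{cases} \sqrt{p}, & \text{if } p \equiv 1 \bmod 4, \\ i\sqrt{p}, & \text{if } p \equiv 3 \bmod 4. \end{cases} \]
Once Claim 1 is verified, we deduce from Theorem 3.14 that
\[ G(1, p) = \varepsilon \prod_{k=1}^{(p-1)/2} \bigl( \zeta^{2k-1} - \zeta^{-2k+1} \bigr), \]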
where ε = ±1. The conclusion of Theorem 7.6 will then be at hand once we verify Claim 2
and prove that ε = 1. Hence we make
Claim 3. ε = 1.
To verify Claim 1, start with the factorization
\[ x^p - 1 = (x - 1) \prod_{j=1}^{p-1} (x - \zeta^j). \]
Divide this equation by x − 1 and set x = 1 to derive that
\[ p = \prod_{r} (1 - \zeta^r), \]
where this product is taken over any complete system of nonzero ordinary residues mod p. It is easy
to see that the integers ±(4k − 2), k = 1, . . . , (p − 1)/2, form such a system of residues, and so
\[
\begin{aligned}
p &= \prod_{k=1}^{(p-1)/2} \bigl( 1 - \zeta^{4k-2} \bigr) \prod_{k=1}^{(p-1)/2} \bigl( 1 - \zeta^{-(4k-2)} \bigr) \\
&= \prod_{k=1}^{(p-1)/2} \bigl( \zeta^{-(2k-1)} - \zeta^{2k-1} \bigr) \prod_{k=1}^{(p-1)/2} \bigl( \zeta^{2k-1} - \zeta^{-(2k-1)} \bigr) \\
&= (-1)^{(p-1)/2} \prod_{k=1}^{(p-1)/2} \bigl( \zeta^{2k-1} - \zeta^{-2k+1} \bigr)^2.
\end{aligned}
\]

Now for Claim 2. Claim 1 implies that
\[ \Bigl( \prod_{k=1}^{(p-1)/2} \bigl( \zeta^{2k-1} - \zeta^{-2k+1} \bigr) \Bigr)^2 = (-1)^{(p-1)/2}\, p, \]
hence Claim 2 will follow from this equation once the sign of the product in Claim 2 is
determined. That product is
\[ i^{(p-1)/2} \prod_{k=1}^{(p-1)/2} 2 \sin \frac{(4k-2)\pi}{p}. \]
Observe now that for k ∈ [1, (p − 1)/2],
\[ \sin \frac{(4k-2)\pi}{p} < 0 \quad \text{iff} \quad \frac{p+2}{4} < k \le \frac{p-1}{2}, \]

hence this product has precisely (p − 1)/2 − [(p + 2)/4] negative factors, and so the number
of negative factors is either (p − 1)/4 or (p − 3)/4 if, respectively, p ≡ 1 or 3 mod 4. It is
now easy to see from this that the product in Claim 2 is a positive number if p ≡ 1 mod 4
or is i×(a positive number) if p ≡ 3 mod 4.
In order to verify Claim 3, consider the polynomial
\[ f(x) = \sum_{j=1}^{p-1} \chi_p(j) x^j - \varepsilon \prod_{k=1}^{(p-1)/2} \bigl( x^{2k-1} - x^{p-2k+1} \bigr). \]
Then
\[ f(\zeta) = G(1, p) - \varepsilon \prod_{k=1}^{(p-1)/2} \bigl( \zeta^{2k-1} - \zeta^{-2k+1} \bigr) = 0 \]
and
\[ f(1) = \sum_{j=1}^{p-1} \chi_p(j) = 0. \]
Now the minimal polynomial of ζ over Q is $\sum_{k=0}^{p-1} x^k$, and so we conclude from the proof of
Proposition 3.7 that $\sum_{k=0}^{p-1} x^k$ divides f (x) in Q[x]. As x − 1 and $\sum_{k=0}^{p-1} x^k$ are both irreducible
over Q, they are relatively prime in Q[x]. Because x − 1 divides f (x) in Q[x], it follows that
$x^p - 1 = (x - 1)\bigl(\sum_{k=0}^{p-1} x^k\bigr)$ must also divide f (x) in Q[x]. Hence there exists h ∈ Q[x] such
that $f(x) = (x^p - 1)h(x)$. Now replace x by $e^z$ to obtain the equation
\[ \sum_{j=1}^{p-1} \chi_p(j) e^{jz} - \varepsilon \prod_{k=1}^{(p-1)/2} \bigl( e^{(2k-1)z} - e^{(p-2k+1)z} \bigr) = (e^{pz} - 1)\, h(e^z). \]
Insert the power series expansion of $e^z$ into this equation and then deduce that the coefficient
of $z^{(p-1)/2}$ on the left-hand side of the equation is
\[ \frac{1}{\bigl( (p-1)/2 \bigr)!} \sum_{j=1}^{p-1} \chi_p(j)\, j^{(p-1)/2} - \varepsilon \prod_{k=1}^{(p-1)/2} (4k - p - 2), \]
while the coefficient of $z^{(p-1)/2}$ on the right-hand side is of the form pA/B, where A and B
are integers and gcd(B, p) = 1. Now equate coefficients, multiply through by $B\bigl((p-1)/2\bigr)!$
and reduce mod p to derive
\[
\begin{aligned}
\sum_{j=1}^{p-1} \chi_p(j)\, j^{(p-1)/2} &\equiv \varepsilon \Bigl( \frac{p-1}{2} \Bigr)! \prod_{k=1}^{(p-1)/2} (4k - 2) \\
&\equiv \varepsilon \prod_{k=1}^{(p-1)/2} 2k \prod_{k=1}^{(p-1)/2} (2k - 1) \\
&\equiv \varepsilon (p-1)! \\
&\equiv -\varepsilon \bmod p,
\end{aligned}
\]
where the last congruence follows from Wilson’s theorem. But then by Euler’s criterion
(Theorem 2.5),
\[ j^{(p-1)/2} \equiv \chi_p(j) \bmod p, \]
hence
\[ p - 1 = \sum_{j=1}^{p-1} \chi_p(j)^2 \equiv -\varepsilon \bmod p, \]
and so
\[ \varepsilon \equiv 1 \bmod p. \]
Because ε = ±1, it follows that ε = 1. QED
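
Theorem 7.6, equation (5), and Claim 2 lend themselves to a quick floating-point check. The following sketch evaluates the Gauss sums and Kronecker's product directly; the function names, the list of primes, and the tolerance 1e-9 are illustrative assumptions, not part of the argument above.

```python
import cmath
import math


def legendre(n, p):
    """Legendre symbol chi_p(n) for an odd prime p, via Euler's criterion."""
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1


def gauss_sum(n, p):
    """G(n, p) = sum over k = 0..p-1 of chi_p(k) exp(2 pi i n k / p)."""
    return sum(legendre(k, p) * cmath.exp(2j * cmath.pi * n * k / p) for k in range(p))


def kronecker_product(p):
    """The product over k = 1..(p-1)/2 of zeta^(2k-1) - zeta^(-(2k-1))."""
    zeta = cmath.exp(2j * cmath.pi / p)
    product = 1
    for k in range(1, (p - 1) // 2 + 1):
        product *= zeta ** (2 * k - 1) - zeta ** (-(2 * k - 1))
    return product


def expected(p):
    """The value asserted by Theorem 7.6."""
    return math.sqrt(p) if p % 4 == 1 else 1j * math.sqrt(p)


for p in (3, 5, 7, 11, 13, 17, 19, 23):
    assert abs(gauss_sum(1, p) - expected(p)) < 1e-9          # Theorem 7.6
    assert abs(kronecker_product(p) - expected(p)) < 1e-9     # Claim 2
    for n in range(1, p):                                     # equation (5)
        assert abs(gauss_sum(n, p) - legendre(n, p) * gauss_sum(1, p)) < 1e-9
```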

5. Some Useful Facts About Analytic Functions of a Complex Variable

The proof of Theorems 7.2 and 7.4(ii) that we will present uses an elegant application
of contour integration from complex analysis due to Bruce Berndt. In this section we will
discuss the requisite facts from that subject.
Let ∅ ≠ U ⊆ C be an open set. A function f : U → C is analytic in U if for each z ∈ U,
\[ \lim_{w \to z} \frac{f(w) - f(z)}{w - z} = f'(z) \]
exists and is finite, i.e., f has a complex derivative at each point of U. A complex-valued
function with domain C is said to be entire if it is analytic in C. We will use the following
fundamental theorem about analytic functions in our proof of Lemma 4.8 for real Dirichlet
characters that we will present in section 8:

Theorem 7.7. (Taylor-series expansion of analytic functions) If f is analytic in U then
the n-th order derivative $f^{(n)}(z)$ exists and is finite for all z ∈ U and for all n ∈ [1, ∞).
Moreover, if a ∈ U and r > 0 is the distance of a to the boundary of U then
\[ f(z) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (z - a)^n, \quad |z - a| < r. \]


Theorem 7.7 highlights the remarkable regularity which all analytic functions possess:
not only is an analytic function always infinitely differentiable, but it even has a convergent
Taylor-series expansion in a neighborhood of each point in its domain. This is far from true
for differentiable functions of a real variable.
Now let I denote the closed unit interval on the real line, and let γ : I → U be a contour
in U, i.e., a continuous, piecewise-smooth function defined on I with range in U. Let {γ}
denote the range of γ. If g : {γ} → C is a function continuous on {γ}, u = Re(g), and v =
Im(g), then the contour integral of g along γ, denoted by
\[ \int_{\gamma} g(z)\, dz, \]
is defined by
\[ \oint_{\gamma} (u\, dx - v\, dy) + i \oint_{\gamma} (v\, dx + u\, dy), \]
where, from multi-variable calculus, $\oint_{\gamma}$ denotes standard line integration in the plane along
γ of real-valued functions continuous on {γ}. Since it would take us too far afield to give
a detailed account of the properties of this integral, we instead refer to J.B. Conway [3],
section IV.1 for that. We will need only the basic estimate
\[ \Bigl| \int_{\gamma} g(z)\, dz \Bigr| \le \max \bigl\{ |g(z)| : z \in \{\gamma\} \bigr\} \, (\text{length of } \gamma). \tag{6} \]


A contour γ is closed if γ(0) = γ(1). The next theorem is one of the most important and
most useful in all of complex analysis.

Theorem 7.8. (Cauchy’s integral theorem) If f is analytic in U and γ is a closed contour
in U which does not wind around any point in C \ U then
\[ \int_{\gamma} f(z)\, dz = 0. \]

The next theorem provides a very useful formula for computing certain contour integrals
of functions which are analytic outside of a finite set of points. In order to state it, some
terminology needs to be defined, and so we will do that first.
A closed contour γ is a Jordan contour if γ is an injective function on the set I \ {1}.
Geometrically, this says that the path {γ} does not cross itself (see figure 5). If γ is a Jordan
contour then γ divides C into a pairwise disjoint union

V ∪ {γ} ∪ W,

where V and W are open sets and

the boundary of V = {γ} = the boundary of W.



Suppose that as t increases from 0 to 1, γ(t) traverses {γ} in the counterclockwise direction:
we then say that γ is positively oriented. If γ is positively oriented then as t increases from
0 to 1, for exactly one of the sets V or W , γ(t) winds around each of the points in that set
exactly once. The set for which this occurs, either all of the points of V or all of the points

of W , is called the interior of γ. The set C \ ({γ} ∪ (interior of γ)) is the exterior of γ. It
can be shown that the interior of γ is a bounded set and the exterior of γ is unbounded. All
of the facts in this paragraph are the contents of the Jordan Curve Theorem: for a proof,
consult Dugundji [13], section XVII.5.

Figure 5. Geometry and topology of a positively oriented Jordan contour γ (the figure labels the curve {γ}, the interior of γ, and the exterior of γ)

A function f has an isolated singularity at a point a if there is an r > 0 such that f is
analytic in 0 < |z − a| < r, but f ′ (a) does not exist. An isolated singularity of f at a is a
pole of order m ∈ [1, ∞) if there exist δ > 0 and a function g analytic in |z − a| < δ such
that g(a) ≠ 0 and
\[ f(z) = \frac{g(z)}{(z - a)^m}, \quad 0 < |z - a| < \delta. \]
The residue of f at this pole, denoted Res(f, a), is the number
\[ \frac{g^{(m-1)}(a)}{(m-1)!}. \]
If the order of the pole at a is 1 then it is called a simple pole, and its residue there is
\[ g(a) = \lim_{z \to a} (z - a) f(z). \]


We can now state the result on the calculation of contour integrals that we need.

Theorem 7.9. (The residue theorem) Let U be an open subset of C, f a function analytic
in U except for poles located in U. If γ is a positively oriented Jordan contour in U which
does not wind around a point in C \ U and which does not pass through any of the poles of
f, and if a1 , . . . , an are the poles of f that are in the interior of γ, then
\[ \frac{1}{2\pi i} \int_{\gamma} f(z)\, dz = \sum_{k=1}^{n} \operatorname{Res}(f, a_k). \]

For proof of Theorems 7.7, 7.8, and 7.9, consult, respectively, Conway [3], sections IV.2,
IV.5, and V.2.
We will apply Theorems 7.8 and 7.9 in the following situation. Let U be an open set, h
and g functions analytic in U, and suppose that a ∈ U is a zero of g, i.e., g(a) = 0. Moreover
suppose that a is a simple zero, i.e., g ′ (a) ≠ 0. Then h/g has a simple pole at a if and only
if h(a) ≠ 0, and if h(a) ≠ 0 then by way of L’Hospital’s rule,
\[ \operatorname{Res}(h/g, a) = \lim_{z \to a} \frac{(z - a) h(z)}{g(z)} = \frac{h(a)}{g'(a)}. \]
Hence Theorems 7.8 and 7.9 imply

Lemma 7.10. Let U be an open subset of C, let h and g be analytic in U, and suppose g
has only simple zeros in U. If γ is a positively oriented Jordan contour in U which does not
wind around a point in C \ U and does not pass through any of the zeros of g, and a1 , . . . , an
are the zeros of g in the interior of γ, then
\[ \frac{1}{2\pi i} \int_{\gamma} \frac{h(z)}{g(z)}\, dz = \sum_{k=1}^{n} \frac{h(a_k)}{g'(a_k)}. \]

In section 7, Theorems 7.2 and 7.4(ii) will be deduced by integrating around rectangles
a cleverly designed function analytic except for poles and then applying Lemma 7.10.
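
Lemma 7.10 can be illustrated numerically on a simple example before it is applied to rectangles. In the sketch below, which is only an illustration under stated assumptions, the contour is the circle |z| = 2 (a positively oriented Jordan contour), h(z) = exp(z), and g(z) = z² + 1, whose simple zeros ±i both lie inside the circle. The integral is approximated by the trapezoid rule in the angle; the function name and the number of sample points are arbitrary choices.

```python
import cmath
import math


def normalized_circle_integral(f, radius, samples=400):
    """Approximate (1 / (2 pi i)) * integral of f(z) dz over |z| = radius,
    traversed counterclockwise, by the trapezoid rule in the angle theta.

    With z = radius * exp(i theta) we have dz = i z dtheta, so the factor i
    cancels and the normalized integral is the average of f(z) * z.
    """
    total = 0
    for k in range(samples):
        z = radius * cmath.exp(2j * cmath.pi * k / samples)
        total += f(z) * z
    return total / samples


def h(z):
    return cmath.exp(z)


def g(z):
    return z * z + 1


def g_prime(z):
    return 2 * z


lhs = normalized_circle_integral(lambda z: h(z) / g(z), 2.0)
rhs = sum(h(a) / g_prime(a) for a in (1j, -1j))  # zeros of g inside the circle
print(lhs, rhs, math.sin(1.0))
```

For this example the sum on the right-hand side of Lemma 7.10 equals sin 1, and the numerical integral reproduces it.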

6. The Convergence of Fourier Series

Theorems 7.3 and 7.4(i) will be deduced by appeals to certain facts concerning the
convergence of Fourier Series. We therefore preface the proof proper with a brief discussion
of Fourier series and their convergence.
If f is a real-valued function defined and integrable over −π ≤ x ≤ π, then the Fourier
series S(f, x) of f is the series defined by
\[ \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx), \]
where
\[
\begin{aligned}
a_0 &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\, dx, \\
a_n &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos nx \, dx, \\
b_n &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx \, dx, \quad n = 1, 2, \dots ;
\end{aligned}
\]
an and bn are called, respectively, the Fourier cosine and sine coefficients of f.
Recall that a real-valued function f defined on a closed and bounded interval J = {x :
c ≤ x ≤ d} of the real line is piecewise differentiable on J if there is a finite partition of
{x : c ≤ x < d} into subintervals such that for each subinterval a ≤ x < b, there exists a
function g differentiable on a ≤ x ≤ b such that f ≡ g on a < x < b. A function f that is
piecewise differentiable on J is clearly piecewise continuous there, hence if c < x < d then
the one-sided limits
\[ f_{\pm}(x) = \lim_{t \to x^{\pm}} f(t), \quad \lim_{t \to c^{+}} f(t), \quad \text{and} \quad \lim_{t \to d^{-}} f(t) \]
exist and are finite. It follows that if f is defined on the entire real line, is periodic of period
2π, and is piecewise differentiable on −π ≤ x ≤ π then both one-sided limits of f at any real
number exist and are finite, and so the functions f± (x) = limt→x± f (t) are both defined and
real-valued on the entire real line. Figure 6 illustrates what the graph of a typical piecewise
differentiable function looks like.

Figure 6. A piecewise differentiable function

Piecewise differentiable functions exist in abundance and examples are very easy to come
by; functions continuous but not piecewise differentiable on an interval are not difficult to
construct either. Probably the simplest such example of the latter is to take a closed and
bounded interval J on the real line, and let a1 , a2 , a3 , . . . be a strictly increasing sequence of
elements of J converging to the right-hand endpoint d of J, with a1 equal to the left-hand

Figure 7. A continuous, non-piecewise differentiable function

endpoint of J, say. On each closed interval with endpoints an and an+1 define the function fn
which is 0 at an and an+1 , is 1/n at (an + an+1 )/2, and is linear and continuous on each of the
closed intervals with left-hand endpoints an and (an + an+1 )/2 and corresponding right-hand
endpoints (an + an+1 )/2 and an+1 , n = 1, 2, . . . . Then define f on J to equal fn on the closed
interval with endpoints an and an+1 for n = 1, 2, . . . , and set f (d) = 0. The function f is
continuous on J, it is not differentiable at each point an , n = 2, 3, . . . , and because an → d, f
is not piecewise differentiable on J. In Figure 7, we indicate what the graph of a continuous,
non-piecewise differentiable function constructed along these lines would look like.
We will use the following basic theorem on the convergence of Fourier series, a variant of
which was first proved by Dirichlet [9] in 1829.

Theorem 7.11. If f is defined on the real line R, is periodic of period 2π, and is piecewise
differentiable on −π ≤ x ≤ π, then the Fourier series S(f, x) of f converges to
f+ (x) + f− (x)
, x ∈ R.
2
In particular, if f is continuous at x then S(f, x) converges to f (x).
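
A concrete instance of Theorem 7.11 may help fix ideas. Take f to be the 2π-periodic function equal to x on −π < x < π (an example chosen here for illustration, not one used in the text). Its Fourier cosine coefficients vanish and its sine coefficients are 2(−1)^(n+1)/n, so the theorem predicts convergence to x at interior points and to (f+ (π) + f− (π))/2 = 0 at the jump x = π. The sketch below evaluates partial sums; the function name and the number of terms are arbitrary.

```python
import math


def sawtooth_partial_sum(x, n):
    """n-th partial sum of the Fourier series of the 2*pi-periodic function
    equal to x on (-pi, pi); the sine coefficients are 2 * (-1)^(k+1) / k."""
    return sum(2 * (-1) ** (k + 1) * math.sin(k * x) / k for k in range(1, n + 1))


# At a point of continuity the partial sums approach f(x) = x.
print(sawtooth_partial_sum(1.0, 20000))

# At the jump x = pi they approach the average of the one-sided limits, which is 0.
print(sawtooth_partial_sum(math.pi, 20000))
```

The convergence at interior points is slow, of order 1/n, which is typical for a function with jump discontinuities.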

Proof. Let
\[ S_n(x) = \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx) \]

denote the n-th partial sum of the Fourier series of f . The key idea of this argument, due to
Dirichlet, and used more or less in all convergence proofs of Fourier series, is to first express
Sn (x) in an integral form that is more amenable to an analysis of the convergence involved.

Using the definition of the Fourier cosine and sine coefficients of f , we thus calculate that
\[
\begin{aligned}
S_n(x) &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \Bigl( \frac{1}{2} + \sum_{k=1}^{n} (\cos kx \cos kt + \sin kx \sin kt) \Bigr) dt \\
&= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \Bigl( \frac{1}{2} + \sum_{k=1}^{n} \cos k(x - t) \Bigr) dt.
\end{aligned}
\]
Using the trigonometric identity
\[ \frac{1}{2} + \sum_{k=1}^{n} \cos k\theta = \frac{\sin \bigl( n + \tfrac{1}{2} \bigr) \theta}{2 \sin \tfrac{\theta}{2}}, \]
it follows that
\[ S_n(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t)\, D_n(x - t)\, dt, \]
where
\[ D_n(\theta) = \frac{\sin \bigl( n + \tfrac{1}{2} \bigr) \theta}{2 \sin \tfrac{\theta}{2}} \]
is the Dirichlet kernel of Sn (x) (at θ = kπ, k an even integer, we define Dn (θ) to be n + 1/2,
so as to make Dn a function continuous on R). Using the facts that f and Dn are of period
2π and Dn is an even function, we can rewrite the integral formula for Sn as
\[ S_n(x) = \frac{1}{\pi} \int_{0}^{\pi} \bigl( f(x + t) + f(x - t) \bigr) D_n(t)\, dt, \quad x \in \mathbb{R}. \]
If we now let f ≡ 1 in this equation and check that for this f , Sn ≡ 1, we find that
\[ 1 = \frac{2}{\pi} \int_{0}^{\pi} D_n(t)\, dt. \]
After multiplying this equation by (1/2)(f+ (x) + f− (x)) and then subtracting the equation
resulting from that from the equation given by the above integral formula for Sn , it follows
that
\[ S_n(x) - \frac{f_+(x) + f_-(x)}{2} = \frac{1}{\pi} \int_{0}^{\pi} \bigl( f(x + t) - f_+(x) + f(x - t) - f_-(x) \bigr) D_n(t)\, dt. \tag{6} \]
Now let
\[ \Xi(x, t) = \frac{f(x + t) - f_+(x) + f(x - t) - f_-(x)}{2 \sin \tfrac{t}{2}}, \quad 0 < t \le \pi. \]
With an eye toward defining Ξ(x, ·) at t = 0 so as to make Ξ(x, ·) right-continuous there, we
study the behavior of Ξ(x, t) as t → 0+ . To that end, first rewrite Ξ(x, t) as
\[ \Xi(x, t) = \Bigl( \frac{f(x + t) - f_+(x)}{t} + \frac{f(x - t) - f_-(x)}{t} \Bigr) \cdot \frac{t}{2 \sin \tfrac{t}{2}}, \quad 0 < t \le \pi. \]


Because f is periodic of period 2π and f is piecewise differentiable on −π ≤ ξ ≤ π, there
exist subintervals a ≤ ξ < b, b ≤ ξ < c of the real line and functions g and h differentiable
on a ≤ ξ ≤ b and b ≤ ξ ≤ c, respectively, such that b ≤ x < c and f (ξ) equals, respectively,
g(ξ) or h(ξ) whenever, respectively, a < ξ < b or b < ξ < c. A moment’s reflection now
confirms that
\[ \lim_{t \to 0^+} \frac{f(x + t) - f_+(x)}{t} = h'(x), \]
\[ \lim_{t \to 0^+} \frac{f(x - t) - f_-(x)}{t} = \begin{cases} -h'(x), & \text{if } x > b, \\ -g'(b), & \text{if } x = b, \end{cases} \]
and so we conclude that limt→0+ Ξ(x, t) exists and is finite. If we take Ξ(x, 0) to be this finite
limit, then Ξ(x, ·) is defined and piecewise continuous on 0 ≤ t ≤ π.
It follows that the functions
\[ \Xi(x, t) \sin \bigl( n + \tfrac{1}{2} \bigr) t, \quad 0 \le t \le \pi, \]
and
\[ \bigl( f(x + t) - f_+(x) + f(x - t) - f_-(x) \bigr) D_n(t), \quad 0 \le t \le \pi, \]
are both piecewise continuous on 0 ≤ t ≤ π and agree on 0 < t ≤ π. The latter function can
hence be replaced by the former function in the integrand of the integral on the right-hand
side of (6) to obtain the equation
\[ S_n(x) - \frac{f_+(x) + f_-(x)}{2} = \frac{1}{\pi} \int_{0}^{\pi} \Xi(x, t) \sin \bigl( n + \tfrac{1}{2} \bigr) t \, dt. \]
The conclusion of Theorem 7.11 will now follow if we prove that
\[ \lim_{n \to +\infty} \frac{1}{\pi} \int_{0}^{\pi} \Xi(x, t) \sin \bigl( n + \tfrac{1}{2} \bigr) t \, dt = 0. \]
In order to do that, use the formula for the sine of a sum to write
\[ \int_{0}^{\pi} \Xi(x, t) \sin \bigl( n + \tfrac{1}{2} \bigr) t \, dt = \int_{-\pi}^{\pi} \alpha(t) \sin nt \, dt + \int_{-\pi}^{\pi} \beta(t) \cos nt \, dt, \]
where
\[ \alpha(t) = \begin{cases} 0, & \text{if } -\pi \le t < 0, \\ \Xi(x, t) \cos \tfrac{t}{2}, & \text{if } 0 \le t \le \pi, \end{cases} \qquad \beta(t) = \begin{cases} 0, & \text{if } -\pi \le t < 0, \\ \Xi(x, t) \sin \tfrac{t}{2}, & \text{if } 0 \le t \le \pi. \end{cases} \]


Because α and β are functions piecewise continuous on −π ≤ t ≤ π, our proof will be done
upon verifying that if a function ψ is piecewise continuous on −π ≤ t ≤ π and if an and bn
are the Fourier cosine and sine coefficients of ψ then
\[ \lim_{n} a_n = 0 = \lim_{n} b_n \]
(this very important fact is known as the Riemann-Lebesgue lemma). In order to see that,
note that the set of functions
\[ \Bigl\{ \frac{1}{\sqrt{2\pi}} \Bigr\} \cup \Bigl\{ \frac{1}{\sqrt{\pi}} \cos nt : n \in [1, \infty) \Bigr\} \cup \Bigl\{ \frac{1}{\sqrt{\pi}} \sin nt : n \in [1, \infty) \Bigr\} \]
is orthonormal with respect to the inner product defined by integration over the interval
−π ≤ t ≤ π, hence a straightforward calculation using this fact shows that if σn denotes the
n-th partial sum of the Fourier series of ψ then
\[ 0 \le \frac{1}{\pi} \int_{-\pi}^{\pi} (\psi - \sigma_n)^2 \, dx = \frac{1}{\pi} \int_{-\pi}^{\pi} \psi^2 \, dx - \Bigl( \frac{a_0^2}{2} + \sum_{k=1}^{n} (a_k^2 + b_k^2) \Bigr), \]
and so
\[ \frac{a_0^2}{2} + \sum_{k=1}^{n} (a_k^2 + b_k^2) \le \frac{1}{\pi} \int_{-\pi}^{\pi} \psi^2 \, dx < +\infty, \quad \text{for all } n \in [1, \infty) \]
(this is Bessel’s inequality). Hence the series
\[ \frac{a_0^2}{2} + \sum_{n=1}^{\infty} (a_n^2 + b_n^2) \]
converges, and so an and bn both tend to 0 as n → +∞. QED


Remarks.
(1) Another very useful class of real-valued functions for which the conclusion of Theorem
7.11 is also valid is the functions f that are defined on the whole real line, periodic of period
2π, and are of bounded variation on −π ≤ x ≤ π. This means that the supremum of the
sums
\[ \sum_{i=1}^{m} |f(x_i) - f(x_{i-1})| \]
as {−π = x0 < x1 < · · · < xm = π} varies over all divisions of the interval −π ≤ x ≤ π by
a finite number of points x0 , . . . , xm is finite. A result from elementary real analysis asserts
that if f is of bounded variation on −π ≤ x ≤ π then f is the difference of two functions both
of which are non-decreasing on −π ≤ x ≤ π, and so if f is also defined on the entire real line
and is periodic of period 2π then the one-sided limits f± (x) exist and are finite for all x. That
Theorem 7.11 is valid for all functions of bounded variation on −π ≤ x ≤ π is in fact what
Dirichlet proved in his landmark paper [9]. This version of Theorem 7.11 also works in our
proof of Theorems 7.3 and 7.4 infra; we have proved Theorem 7.11 for piecewise differentiable
functions because the argument which covers that situation is a bit more elementary than

the one which suffices for functions of bounded variation. For a proof of the latter theorem,
the interested reader should consult Zygmund [65], Theorem II.8.1. However, note well: a
function that is piecewise differentiable need not be of bounded variation and a function of
bounded variation is not necessarily piecewise differentiable.
(2) It transpires that if f is a complex-valued function defined on the integers which is
periodic in the sense that for some integer m > 1, f (a) = f (b) whenever a ≡ b mod m,
then f can be expanded in terms of a finite Fourier series in complete analogy with the
expansion into infinite Fourier series that we have just discussed. When this finite Fourier
series expansion is applied to Gauss sums, another proof of the Law of Quadratic Reciprocity
results, as we will see in section 9 below.

7. Proof of Theorems 7.2, 7.3, and 7.4

We begin this section with the proof of Theorem 7.2. Let p ≡ 3 mod 4: we must prove
that
\[ q(0, p/2) = \frac{\sqrt{p}}{\pi} \bigl( 2 - \chi_p(2) \bigr) L(1, \chi_p). \]
Toward that end, consider the functions F (z) and f (z) defined by
\[ F(z) = \sum_{0<j<p/2} \chi_p(j) \cos \Bigl( \Bigl( 1 - \frac{4j}{p} \Bigr) \pi z \Bigr), \]
\[ f(z) = \frac{\pi F(z)}{z \cos(\pi z)}. \]
We will prove Theorem 7.2 by integrating f (z) around rectangles and then applying Lemma
7.10.
Note first that the numerator and denominator of f are entire functions, then that the
zeros of the denominator of f occur at z = 0, zn = (2n − 1)/2, n ∈ Z, and that they are all
simple. In order to apply Lemma 7.10 to f , we therefore need to calculate
\[ \frac{\pi F(z)}{\dfrac{d}{dz} (z \cos \pi z)} \quad \text{at } z = 0,\ z_n,\ n \in \mathbb{Z}. \]
At z = 0 this is
\[ \pi F(0) = \pi \sum_{0<j<p/2} \chi_p(j) = \pi q(0, p/2), \tag{7} \]
and at z = zn , it is
\[ (-1)^n \frac{F(z_n)}{z_n}. \]


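We claim that
\[ (-1)^n \frac{F(z_n)}{z_n} = -\frac{\sqrt{p}}{2n - 1} \chi_p(2n - 1), \quad n \in \mathbb{Z}. \tag{8} \]
In order to check this, we will first use the elementary identity
\[ \cos z = \frac{e^{iz} + e^{-iz}}{2} \tag{9} \]
to calculate F (zn ) as a Gauss sum. Toward that end, let αj = 1 − (4j/p); then
\[
\begin{aligned}
\exp \Bigl( i\, \frac{2n - 1}{2}\, \alpha_j \pi \Bigr) &= \exp \Bigl( i\, \frac{2n - 1}{2}\, \pi \Bigr) \exp \Bigl( -i\, \frac{2\pi j (2n - 1)}{p} \Bigr) \\
&= (-1)^{n+1}\, i \exp \Bigl( -i\, \frac{2\pi j (2n - 1)}{p} \Bigr),
\end{aligned}
\]
and similarly
\[ \exp \Bigl( -i\, \frac{2n - 1}{2}\, \alpha_j \pi \Bigr) = (-1)^{n}\, i \exp \Bigl( i\, \frac{2\pi j (2n - 1)}{p} \Bigr). \]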
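Hence from (9) we deduce that
\[
\begin{aligned}
F(z_n) = {} & \frac{(-1)^{n+1} i}{2} \sum_{0<j<p/2} \chi_p(j) \exp\Bigl(-\frac{2\pi i j(2n-1)}{p}\Bigr) \\
& + \frac{(-1)^{n} i}{2} \sum_{0<j<p/2} \chi_p(j) \exp\Bigl(\frac{2\pi i j(2n-1)}{p}\Bigr).
\end{aligned}
\]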

Observe now that the exponential factors here are periodic of period p in the variable j and,
as p ≡ 3 mod 4, χp (−1) = −1. We can hence shift the summation in the first term on the
right-hand side of this equation to express that term as
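\[ \frac{(-1)^{n} i}{2} \sum_{p/2<j<p} \chi_p(j) \exp\Bigl(\frac{2\pi i j(2n-1)}{p}\Bigr), \]
hence
\[ F(z_n) = \frac{(-1)^{n} i}{2} \sum_{0<j<p} \chi_p(j) \exp\Bigl(\frac{2\pi i j(2n-1)}{p}\Bigr) = \frac{(-1)^{n} i}{2}\, G(2n-1, p). \tag{10} \]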

Hence (10), (5), and Theorem 7.6 imply
\[
\begin{aligned}
(-1)^n\, \frac{F(z_n)}{z_n} &= \frac{i}{2 z_n}\, G(2n-1, p) \\
&= \frac{i}{2n-1}\, \chi_p(2n-1)\, G(1, p) \\
&= -\frac{\sqrt{p}}{2n-1}\, \chi_p(2n-1).
\end{aligned}
\]
This verifies (8).

Now for the contour around which we will integrate f . Let γN denote the positively
oriented rectangle centered at the origin, with horizontal side length 4pN and vertical side
length 2√N , where N is a fixed positive integer. γN is clearly a Jordan contour, and the
zeros of z cos πz inside γN are 0 and zn , n ∈ [−pN + 1, pN]. Hence (7), (8), and Lemma 7.10
imply that
\[ \frac{1}{2\pi i} \int_{\gamma_N} f(z)\,dz = \pi\, q(0, p/2) - \sqrt{p} \sum_{n=-pN+1}^{pN} \frac{\chi_p(2n-1)}{2n-1}. \tag{11} \]

Because χp (−1) = −1,
\[ \frac{\chi_p(k)}{k} = \frac{\chi_p(-k)}{-k}, \quad \text{for all } k \in \mathbb{Z} \setminus \{0\}, \]
hence
\[ \sum_{n=-pN+1}^{pN} \frac{\chi_p(2n-1)}{2n-1} = 2 \sum_{n=1}^{pN} \frac{\chi_p(2n-1)}{2n-1}. \tag{12} \]

We claim that
\[ \lim_{N \to \infty} \frac{1}{2\pi i} \int_{\gamma_N} f(z)\,dz = 0. \tag{13} \]
Assuming this for a moment, we deduce from (11), (12) and (13) that
\[ q(0, p/2) = \frac{2\sqrt{p}}{\pi} \lim_{N \to \infty} \sum_{n=1}^{pN} \frac{\chi_p(2n-1)}{2n-1}. \tag{14} \]
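In order to evaluate the limit on the right-hand side of (14), note that for each integer M > 1,
\[ \frac{\chi_p(2)}{2} \sum_{k=1}^{M-1} \frac{\chi_p(k)}{k} = \sum_{k=1}^{M-1} \frac{\chi_p(2k)}{2k}, \]
hence
\[ \sum_{k=1}^{2M-1} \frac{\chi_p(k)}{k} - \frac{\chi_p(2)}{2} \sum_{k=1}^{M-1} \frac{\chi_p(k)}{k} = \sum_{n=1}^{M} \frac{\chi_p(2n-1)}{2n-1}. \]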
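Letting M → ∞ in this equation, we obtain
\[
\begin{aligned}
\lim_{M \to \infty} \sum_{n=1}^{M} \frac{\chi_p(2n-1)}{2n-1} &= \Bigl(1 - \frac{\chi_p(2)}{2}\Bigr) \sum_{k=1}^{\infty} \frac{\chi_p(k)}{k} \\
&= \Bigl(1 - \frac{\chi_p(2)}{2}\Bigr) L(1, \chi_p).
\end{aligned}
\]
Hence from (14) it follows that
\[ q(0, p/2) = \frac{\sqrt{p}}{\pi}\,\bigl(2 - \chi_p(2)\bigr)\, L(1, \chi_p), \]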

the conclusion of Theorem 7.2.
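
The identity just obtained can be checked numerically. The following Python sketch compares q(0, p/2) with the right-hand side of Theorem 7.2 for a few primes p ≡ 3 mod 4. The function names are illustrative and do not come from the text, and L(1, χp ) is only approximated by a partial sum taken over full periods of χp , so the two columns should agree only up to a small truncation error.

```python
import math

def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is 0, 1, or p - 1

def q_half(p):
    """Quadratic excess q(0, p/2): sum of chi_p(j) over 0 < j < p/2."""
    return sum(legendre(j, p) for j in range(1, (p + 1) // 2))

def L1_partial(p, periods=20000):
    """Partial sum of L(1, chi_p) over `periods` full periods (an approximation)."""
    chi = [legendre(k, p) for k in range(p)]
    return sum(chi[k % p] / k for k in range(1, periods * p + 1))

def rhs_theorem_7_2(p):
    """Approximate right-hand side (sqrt(p)/pi) * (2 - chi_p(2)) * L(1, chi_p)."""
    return math.sqrt(p) / math.pi * (2 - legendre(2, p)) * L1_partial(p)

for p in (3, 7, 11, 19, 23):  # primes congruent to 3 mod 4
    print(p, q_half(p), round(rhs_theorem_7_2(p), 3))
```

Summing over whole periods keeps the truncation error of the conditionally convergent series of size roughly 1/periods, which is why the comparison is made with a tolerance rather than exactly.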


We now need only to verify (13). This requires appropriate estimates of f along the sides
of γN . Consider first the function
\[ g(z) = \frac{\cos(\alpha \pi z)}{\cos(\pi z)}, \quad \alpha = 1 - \frac{4j}{p}, \]
coming from a term of F (z)/ cos πz. Using (9), we calculate that for z = x + iy,
\[ |g(z)|^2 = h(z)\, e^{2\pi(\alpha - 1)|y|}, \]
where
\[ h(z) = \frac{e^{-4\pi\alpha|y|} + 2e^{-2\pi(\alpha-1)|y|} \cos 2x + 1}{e^{-4\pi|y|} + 2e^{-2\pi|y|} \cos 2x + 1}. \]
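We have
\[ \alpha - 1 \le -4/p, \quad \text{for all } \alpha, \]
\[ h(z) < 4/(1/2) = 8, \quad \text{for all } |y| \ge 1, \]
and so
\[ |g(z)| < 2\sqrt{2}\, e^{-(4\pi/p)|y|}, \quad \text{for all } |y| \ge 1. \]
Hence
\[ \Bigl| \frac{F(z)}{\cos(\pi z)} \Bigr| < p\sqrt{2}\, e^{-(4\pi/p)|y|}, \quad \text{for all } |y| \ge 1. \tag{15} \]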
From (15) it follows that
\[ |f(z)| < \frac{p\sqrt{2}\, e^{-(4\pi/p)\sqrt{N}}}{\sqrt{N}}, \quad \text{for all } z \text{ on the horizontal sides } H_N \text{ of } \gamma_N. \tag{16} \]
By (15), F (z)/ cos(πz) is bounded on the vertical line Re z = 2p. But F (z)/ cos(πz) is
periodic of period 2p, hence there is a constant C, independent of N, such that
\[ \Bigl| \frac{F(z)}{\cos(\pi z)} \Bigr| \le C, \quad \text{for all } z \text{ on the vertical sides } V_N \text{ of } \gamma_N. \]
Hence
\[ |f(z)| \le \frac{C}{2pN}, \quad \text{for all } z \text{ on the vertical sides } V_N \text{ of } \gamma_N. \tag{17} \]
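The estimates (6), (16), and (17) now imply that
\[
\begin{aligned}
\Bigl| \int_{\gamma_N} f(z)\,dz \Bigr| &\le \Bigl| \int_{H_N} f(z)\,dz \Bigr| + \Bigl| \int_{V_N} f(z)\,dz \Bigr| \\
&\le \frac{p\sqrt{2}\, e^{-(4\pi/p)\sqrt{N}}}{\sqrt{N}} \cdot 8pN + \frac{C}{2pN} \cdot 4\sqrt{N} \\
&\to 0, \quad \text{as } N \to \infty.
\end{aligned}
\]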
QED

Now for the proof of Theorem 7.3. Here p ≡ 1 mod 4 and we must show that
\[ q(0, p/4) = \frac{\sqrt{p}}{\pi}\, L(1, \chi_{4p}). \]
The proof we give is based on the convergence of Fourier series and is very much in the
same spirit as Dirichlet’s original argument. Let f be the function defined on R which is
\[
f(x) = \begin{cases}
1, & \text{for } 0 \le x < \pi/2,\ 3\pi/2 < x \le 2\pi, \\
0, & \text{for } x = \pi/2,\ 3\pi/2, \\
-1, & \text{for } \pi/2 < x < 3\pi/2,
\end{cases}
\]
and is periodic of period 2π. Clearly f is piecewise differentiable on −π ≤ x ≤ π, hence
calculation of the Fourier series of f and Theorem 7.11 imply that
\[ f(x) = -\frac{4}{\pi} \sum_{n=1}^{\infty} \frac{(-1)^n}{2n-1} \cos (2n-1)x, \quad -\infty < x < +\infty. \tag{18} \]
Next, let χ = χ4p = χ4 χp . Multiply the equation of Gauss sums
\[ G(2n-1, \chi_p) = \chi_p(2n-1)\, G(1, p), \]
from (5), by
\[ \frac{(-1)^n}{2n-1} \]
to obtain
\[ \frac{(-1)^n \chi_p(2n-1)}{2n-1}\, G(1, p) = \sum_{j=1}^{p-1} \frac{(-1)^n}{2n-1}\, \chi_p(j) \exp\Bigl(\frac{2\pi i (2n-1) j}{p}\Bigr). \tag{19} \]

By virtue of Theorem 7.6,
\[ G(1, p) = \sqrt{p}, \]
and so, upon taking the real part of (19), we arrive at
\[ \sqrt{p}\, \frac{(-1)^n \chi_p(2n-1)}{2n-1} = \sum_{j=1}^{p-1} \frac{(-1)^n}{2n-1}\, \chi_p(j) \cos\Bigl((2n-1) \cdot \frac{2\pi j}{p}\Bigr). \tag{20} \]

The definition of χ4 implies that
\[ \chi(k) = 0, \quad k \text{ even}, \]
\[ \chi(2n-1) = (-1)^{n+1} \chi_p(2n-1), \]
hence
\[ \sum_{n=1}^{\infty} \frac{(-1)^n \chi_p(2n-1)}{2n-1} = -\sum_{k=1}^{\infty} \frac{\chi(k)}{k} = -L(1, \chi). \tag{21} \]

On the other hand, we have from (18) that
\[ -\frac{\pi}{4}\, f\Bigl(\frac{2\pi j}{p}\Bigr) = \sum_{n=1}^{\infty} \frac{(-1)^n}{2n-1} \cos\Bigl((2n-1) \cdot \frac{2\pi j}{p}\Bigr), \quad j = 1, \dots, p-1. \tag{22} \]
Consequently, we can sum (20) from n = 1 to ∞, interchange the order of summation on the
right-hand side of the equation that results from that, and then use (21) and (22) to deduce
that
\[ \sqrt{p}\, L(1, \chi) = \frac{\pi}{4} \sum_{j=1}^{p-1} f\Bigl(\frac{2\pi j}{p}\Bigr) \chi_p(j). \tag{23} \]

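The final step is to evaluate the right-hand side of (23). Note that
\[
\begin{aligned}
0 < j < \frac{p}{4} \quad &\text{iff} \quad 0 < \frac{2\pi j}{p} < \frac{\pi}{2}, \\
\frac{p}{4} < j < \frac{p}{2} \quad &\text{iff} \quad \frac{\pi}{2} < \frac{2\pi j}{p} < \pi, \\
\frac{p}{2} < j < \frac{3p}{4} \quad &\text{iff} \quad \pi < \frac{2\pi j}{p} < \frac{3\pi}{2}, \\
\frac{3p}{4} < j < p \quad &\text{iff} \quad \frac{3\pi}{2} < \frac{2\pi j}{p} < 2\pi.
\end{aligned}
\]
Hence, according to the definition of f ,
\[ \text{right-hand side of (23)} = \frac{\pi}{4} \Bigl( q(0, p/4) - q(p/4, p/2) - q(p/2, 3p/4) + q(3p/4, p) \Bigr). \]
But by way of (3),
\[
\begin{aligned}
q(0, p/4) &= q(3p/4, p), \\
q(p/4, p/2) &= -q(0, p/4), \\
q(p/2, 3p/4) &= -q(0, p/4),
\end{aligned}
\]
and so
\[ \text{right-hand side of (23)} = \pi\, q(0, p/4), \]
whence
\[ q(0, p/4) = \frac{\sqrt{p}}{\pi}\, L(1, \chi). \]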
QED
The proof of Theorem 7.4 naturally divides into the verification of each of the statements
(i) and (ii), and so we will verify each of these in turn.
Begin with statement (i). Here p ≡ 1 mod 4, we want to verify that
\[ q(0, p/3) = \frac{\sqrt{3p}}{2\pi}\, L(1, \chi_{3p}), \]

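and we will use Fourier series once more. Let f be the function that is
\[
f(x) = \begin{cases}
1, & \text{for } 0 \le x < 2\pi/3,\ 4\pi/3 < x \le 2\pi, \\
1/2, & \text{for } x = 2\pi/3,\ 4\pi/3, \\
0, & \text{for } 2\pi/3 < x < 4\pi/3,
\end{cases}
\]
and is periodic of period 2π. Calculation of the Fourier series of f and Theorem 7.11 imply
that
\[ f(x) = \frac{2}{3} + \frac{\sqrt{3}}{\pi} \sum_{n=1}^{\infty} \frac{a_n}{n} \cos nx, \quad -\infty < x < +\infty, \]
where
\[
a_n = \begin{cases}
0, & \text{if 3 divides } n, \\
1, & \text{if } n \equiv 1 \bmod 3, \\
-1, & \text{if } n \equiv 2 \bmod 3.
\end{cases}
\]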

Observe now that
\[ a_n = \chi_3(n), \quad \text{for all } n, \]
and so
\[ f(x) = \frac{2}{3} + \frac{\sqrt{3}}{\pi} \sum_{n=1}^{\infty} \chi_3(n)\, \frac{\cos nx}{n}, \quad -\infty < x < +\infty. \tag{24} \]
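Now multiply both sides of
\[ G(n, \chi_p) = \chi_p(n)\, G(1, p) \]
by
\[ \frac{\sqrt{3}}{\pi n}\, \chi_3(n), \]
equate real parts in the equation which results, and then use Theorem 7.6, (24), and
summation of the resulting terms from n = 1 to ∞ as was done in the proof of Theorem 7.3 to
obtain
\[ \frac{\sqrt{3p}}{\pi}\, L(1, \chi_{3p}) = \frac{\sqrt{3}}{\pi}\, G(1, p) \sum_{n=1}^{\infty} \frac{\chi_3(n) \chi_p(n)}{n} = \sum_{j=1}^{p-1} \Bigl( f\Bigl(\frac{2\pi j}{p}\Bigr) - \frac{2}{3} \Bigr) \chi_p(j). \]
Because $\sum_{j=1}^{p-1} \chi_p(j) = 0$, the sum on the right is
\[
\begin{aligned}
\sum_{j=1}^{p-1} f\Bigl(\frac{2\pi j}{p}\Bigr) \chi_p(j) &= \sum_{0<j<p/3} \chi_p(j) + \sum_{2p/3<j<p} \chi_p(j), \quad \text{by definition of } f, \\
&= 2 \sum_{0<j<p/3} \chi_p(j), \quad \text{because } \chi_p(-1) = 1, \\
&= 2\, q(0, p/3).
\end{aligned}
\]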

Hence
\[ q(0, p/3) = \frac{\sqrt{3p}}{2\pi}\, L(1, \chi_{3p}), \]

which is the conclusion of Theorem 7.4(i).
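
As with Theorem 7.2, the identities of Theorem 7.3 and Theorem 7.4(i) lend themselves to a quick numerical check. The Python sketch below is illustrative only: the helper names are not from the text, χ4p and χ3p are taken to be the products χ4 χp and χ3 χp (with χ3 the Legendre symbol mod 3), and the values L(1, χ) are approximated by partial sums over full periods, so agreement is expected only up to a small truncation error.

```python
import math

def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def chi4(n):
    """The non-principal character mod 4."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)

def excess(p, denom):
    """q(0, p/denom): sum of chi_p(j) over 0 < j < p/denom (denom does not divide p)."""
    return sum(legendre(j, p) for j in range(1, p // denom + 1))

def L1_partial(chi, modulus, periods=5000):
    """Partial sum of L(1, chi) over `periods` full periods (an approximation)."""
    table = [chi(k) for k in range(modulus)]
    return sum(table[k % modulus] / k for k in range(1, periods * modulus + 1))

def rhs_7_3(p):
    """Approximate (sqrt(p)/pi) * L(1, chi_4 chi_p)."""
    return math.sqrt(p) / math.pi * L1_partial(lambda n: chi4(n) * legendre(n, p), 4 * p)

def rhs_7_4_i(p):
    """Approximate (sqrt(3p)/(2 pi)) * L(1, chi_3 chi_p)."""
    chi = lambda n: legendre(n, 3) * legendre(n, p)
    return math.sqrt(3 * p) / (2 * math.pi) * L1_partial(chi, 3 * p)

for p in (5, 13, 17):  # primes congruent to 1 mod 4
    print(p, excess(p, 4), round(rhs_7_3(p), 3), excess(p, 3), round(rhs_7_4_i(p), 3))
```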
The verification of statement (ii) of Theorem 7.4 follows by either contour integration or
the method of Fourier series along the same lines of argument that we have used before. We
will outline the main ideas in the contour-integration proof and leave the rest of the details
(and the proof via Fourier series) as an instructive exercise for the interested reader.
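Let
\[ f(z) = \frac{\pi F(z)}{z \sin \pi\bigl(z + \frac{1}{3}\bigr)}, \]
where
\[ F(z) = 2i \sum_{0<j<p/3} \chi_p(j) \sin\Bigl(\pi z + \frac{\pi}{3} - \frac{6\pi j z}{p}\Bigr) + e^{-3\pi i z} \sum_{p/3<j<2p/3} \chi_p(j)\, e^{6\pi i j z/p}. \]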

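We must calculate
\[ \frac{\pi F(z)}{\dfrac{d}{dz}\Bigl( z \sin \pi\bigl(z + \frac{1}{3}\bigr) \Bigr)} \]
at z = 0, n − 1/3, n ∈ Z. At z = 0, we obtain
\[ \frac{\pi F(0)}{\sin\bigl(\frac{\pi}{3}\bigr)} = 2\pi i\, q(0, p/3), \]
and at z = n − 1/3, we find that the value is
\[ \frac{3(-1)^n}{3n-1}\, F\Bigl(n - \frac{1}{3}\Bigr) = -\frac{3}{3n-1}\, G(3n-1, p) = -\frac{3}{3n-1}\, \chi_p(3n-1)\, G(1, p). \]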
We now integrate f (z) over a suitable rectangle γN as in the proof of Theorem 7.2, apply
Lemma 7.10 using the values of $\pi F(z) \big/ \frac{d}{dz}\bigl( z \sin \pi(z + \frac{1}{3}) \bigr)$ that we have calculated, and then let
N → +∞ as before to deduce that
\[
\begin{aligned}
0 &= 2\pi i\, q(0, p/3) - 3 G(1, p) \sum_{n=-\infty}^{\infty} \frac{\chi_p(3n-1)}{3n-1} \\
&= 2\pi i\, q(0, p/3) - 3 G(1, p) \Bigl( \sum_{n=1}^{\infty} \frac{\chi_p(3n-1)}{3n-1} + \sum_{n=0}^{\infty} \frac{\chi_p(3n+1)}{3n+1} \Bigr) \\
&= 2\pi i\, q(0, p/3) - 3 G(1, p) \Bigl( \sum_{n=1}^{\infty} \frac{\chi_p(n)}{n} - \sum_{n=1}^{\infty} \frac{\chi_p(3n)}{3n} \Bigr),
\end{aligned}
\]
from which Theorem 7.4(ii) follows easily after another application of Theorem 7.6. QED

Remarks
(1) Berndt’s paper [1] is well worth studying; in it, he establishes many other results
on positivity and negativity of the quadratic excess over various intervals: for example if
p ≡ 11, 19 mod 40 then q(0, p/10) > 0 and if p ≡ 5 mod 24 then q(3p/8, 5p/12) < 0. He also
gives a very interesting discussion of the history of this problem with numerous pertinent
references to the literature.
(2) Because the statements in Theorem 7.1 are so important in the theory of quadratic
residues, elementary proofs of them would be of great interest. However, despite numerous
efforts by many people during the intervening 175 years, those proofs continue to remain
elusive.

8. An Elegant Proof of Lemma 4.8 for Real Dirichlet Characters

Because of the crucial role that it has played in the work done in this chapter, we will
now prove Lemma 4.8 for real, non-principal Dirichlet characters χ, i.e., we will show that if
χ(Z) = [−1, 1] then L(1, χ) ≠ 0. The proof that we will present is due to de la Vallée Poussin
[45] and is one of the most elegant arguments available for this. Following Davenport [6], pp.
32-34, we start by recalling some well-known facts about analytic continuation of Riemann’s
zeta function.
Following long tradition in these matters, we let s = σ + it denote a complex variable.
Proposition 5.5 implies that ζ(s) is analytic in σ > 1; we want to show that ζ can be extended
to a function analytic in σ > 0 except for a simple pole at s = 1. In order to do that, let
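σ > 1 and then write
\[
\begin{aligned}
\zeta(s) = \sum_{n=1}^{\infty} n^{-s} &= \sum_{n=1}^{\infty} n \bigl( n^{-s} - (n+1)^{-s} \bigr) \\
&= s \sum_{n=1}^{\infty} n \int_{n}^{n+1} x^{-(s+1)}\,dx \\
&= s \int_{1}^{\infty} [x]\, x^{-(s+1)}\,dx,
\end{aligned}
\]
where [x] denotes the greatest integer which does not exceed x. Now let [x] = x − (x), so
that (x) denotes the fractional part of x. This gives
\[ \zeta(s) = \frac{s}{s-1} - s \int_{1}^{\infty} (x)\, x^{-(s+1)}\,dx, \quad \sigma > 1. \tag{25} \]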

The integral on the right is absolutely convergent for σ > 0, uniformly convergent for σ ≥
ε > 0, and all Riemann sums of the integrand are entire functions of s, hence this integral

defines a function analytic in σ > 0. Consequently the right-hand side of (25) extends ζ(s)
to a function analytic in σ > 0 except for a simple pole at s = 1. It hence follows that
\[ \lim_{s \to 1^+} \zeta(s) = +\infty. \tag{26} \]

Next we observe that the proof of the Euler-Dedekind product expansion of the zeta
function of an algebraic number field F given in Theorem 5.8 can be easily modified to show
that that product expansion is valid for all σ > 1. If we hence take the number field F in
that theorem to be Q, we deduce that ζ has the Euler-product expansion
\[ \zeta(s) = \prod_{q} (1 - q^{-s})^{-1}, \quad \sigma > 1. \]
We also have from the estimate in the proof of (7) in Chapter 5 that the series
\[ \sum_{q} \log(1 + q^{-\sigma}) \]
is absolutely convergent for σ > 1. Hence
\[ |\zeta(s)| \ge \prod_{q} (1 + q^{-\sigma})^{-1} = \exp\Bigl( -\sum_{q} \log(1 + q^{-\sigma}) \Bigr) > 0, \quad \sigma > 1, \]

and so ζ(s) never vanishes in σ > 1.


Now let χ be a real, non-principal Dirichlet character, and suppose by way of contra-
diction that L(1, χ) = 0. Because L(s, χ) is analytic in σ > 0 (Lemma 7.5(i)) and ζ has a
simple pole at s = 1 as its only singularity in σ > 0, it follows that

L(s, χ)ζ(s) is analytic in σ > 0.

Because ζ(2s) ≠ 0 in σ > 1/2, the function
\[ \psi(s) = \frac{L(s, \chi)\, \zeta(s)}{\zeta(2s)} \]
is analytic in σ > 1/2. Equation (26) implies that $\lim_{s \to \frac{1}{2}^+} \zeta(2s) = +\infty$, hence
\[ \lim_{s \to \frac{1}{2}^+} \psi(s) = 0. \tag{27} \]
For σ > 1, ψ has the Euler product expansion
\[ \psi(s) = \prod_{q} \frac{(1 - \chi(q) q^{-s})^{-1} (1 - q^{-s})^{-1}}{(1 - q^{-2s})^{-1}}. \]

Let m = the modulus of χ. χ(q) = 0 if and only if q divides m, and the factor of the Euler
product corresponding to such q is
\[ 1 + q^{-s}. \]

If χ(q) = −1 then the factor corresponding to q is
\[ \frac{(1 + q^{-s})^{-1} (1 - q^{-s})^{-1}}{(1 - q^{-2s})^{-1}} = 1. \]
Hence
\[ \frac{\psi(s)}{\prod_{q \mid m} (1 + q^{-s})} = \prod_{q:\, \chi(q) = 1} \frac{1 + q^{-s}}{1 - q^{-s}}, \quad \sigma > 1. \tag{28} \]

(We note incidentally that X = {q : χ(q) = 1} must be infinite; otherwise
\[ \psi(s) = \prod_{q \mid m} (1 + q^{-s}) \prod_{q \in X} \frac{1 + q^{-s}}{1 - q^{-s}} \]
and this product has only a finite number of factors, hence $\lim_{s \to \frac{1}{2}^+} \psi(s) > 0$, contrary to
(27)).
Next let
\[ \varphi(s) = \frac{\psi(s)}{\prod_{q \mid m} (1 + q^{-s})}. \]

As the denominator here is nonzero in σ > 0, φ(s) is analytic in σ > 1/2, and (27) implies
that
\[ \lim_{s \to \frac{1}{2}^+} \varphi(s) = 0. \tag{29} \]
We will now show that the product expansion (28) of φ implies that
\[ \varphi(s) > 1 \quad \text{for } \frac{1}{2} < s < 2. \tag{30} \]
This contradicts (29) and so Lemma 4.8 follows for real non-principal characters.
In order to verify (30), observe that
\[ \frac{1 + q^{-s}}{1 - q^{-s}} = 1 + 2 \sum_{n=1}^{\infty} q^{-ns}, \quad \sigma > 1, \]
hence we can use (28) to express φ(s) as a Dirichlet series
\[ \varphi(s) = \sum_{n=1}^{\infty} \frac{a_n}{n^s}, \quad \sigma > 1, \]
where the coefficients an are calculated like so: a1 = 1, and if n ≥ 2 then
\[
a_n = \begin{cases}
2^{|\pi(n)|}, & \text{if } \pi(n) \subseteq \{q : \chi(q) = 1\}, \\
0, & \text{otherwise.}
\end{cases}
\]
In particular, an ≥ 0, for all n.
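
The formula for the coefficients an can be checked by expanding the Euler product (28) directly. The Python sketch below does this for the illustrative choice χ = χ4 (an assumption made only for this example), for which {q : χ(q) = 1} consists of the primes congruent to 1 mod 4; it also assumes that π(n) denotes the set of prime factors of n. The function names are hypothetical.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (n >= 1)."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def coeffs_from_product(N, good_primes):
    """Dirichlet coefficients a_1..a_N of prod_q (1 + 2q^-s + 2q^-2s + ...)."""
    a = [0] * (N + 1)
    a[1] = 1
    for q in good_primes:
        new = a[:]  # `a` is supported on integers coprime to q at this point
        qk = q
        while qk <= N:
            for n in range(1, N // qk + 1):
                new[n * qk] += 2 * a[n]
            qk *= q
        a = new
    return a

def coeff_from_formula(n, good_primes):
    """a_n = 2^(number of prime factors) if all of them are 'good', else 0."""
    if n == 1:
        return 1
    good, count, m, d = set(good_primes), 0, n, 2
    while d * d <= m:
        if m % d == 0:
            if d not in good:
                return 0
            count += 1
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        if m not in good:
            return 0
        count += 1
    return 2 ** count

N = 200
good = [q for q in primes_up_to(N) if q % 4 == 1]  # chi_4(q) = 1
a = coeffs_from_product(N, good)
print(all(a[n] == coeff_from_formula(n, good) for n in range(1, N + 1)))
```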

Because φ is analytic in σ > 1/2, it is a consequence of Theorem 7.7 that φ has a Taylor
series expansion centered at 2 with radius of convergence at least 3/2, i.e.,
\[ \varphi(s) = \sum_{m=0}^{\infty} \frac{\varphi^{(m)}(2)}{m!}\, (s - 2)^m, \quad |s - 2| < \frac{3}{2}. \]
We can calculate $\varphi^{(m)}(2)$ by term-by-term differentiation of the Dirichlet series: this series is
locally uniformly convergent in σ > 1 and so we can apply the theorem which asserts that a
series of functions analytic in an open set U and locally uniformly convergent there has a sum
that is analytic in U and the derivative can be calculated by term-by-term differentiation of
the series. The result is
\[ \varphi^{(m)}(2) = (-1)^m \sum_{n=1}^{\infty} \frac{a_n (\log n)^m}{n^2} = (-1)^m b_m, \quad b_m \ge 0. \]
Hence
\[ \varphi(s) = \sum_{m=0}^{\infty} \frac{b_m}{m!}\, (2 - s)^m, \quad |s - 2| < \frac{3}{2}. \]
If 1/2 < s < 2 then all terms of this series are non-negative, hence φ(s) ≥ φ(2) > 1 for
1/2 < s < 2. QED

9. A Proof of Quadratic Reciprocity via Finite Fourier Series

In this section we will apply a finite Fourier series expansion to powers of the Gauss sums
\[ \sum_{n=0}^{p-1} \chi_p(n)\, \zeta^{nm}, \quad \zeta = \exp(2\pi i/p), \]

to derive once more the Law of Quadratic Reciprocity. This proof is due to H. Rademacher,
and we follow his account of it from [46].
The Fourier series expansion to which we are referring is given in the following lemma:

Lemma 7.12. If m > 1 is an integer, F (t) is a (complex-valued) function defined on Z
of period m, and ζm = exp(2πi/m), then
\[ F(t) = \sum_{n=0}^{m-1} a(n)\, \zeta_m^{nt}, \]
where
\[ a(n) = \frac{1}{m} \sum_{t=0}^{m-1} F(t)\, \zeta_m^{-nt}. \]


In order to see why the expansion of F in Lemma 7.12 can be viewed as a discrete
Fourier series, we need to consider the complex version of the real-valued Fourier series that
was defined and studied in section 6. To that end, for a complex-valued function f (x) defined
on the interval −π ≤ x ≤ π of the real line, define the (complex) Fourier series of f as
\[ \sum_{n=-\infty}^{\infty} f_n e^{inx}, \tag{$*$} \]
where
\[ f_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, e^{-inx}\,dx, \quad n = 0, \pm 1, \pm 2, \dots, \]

are the (complex) Fourier coefficients of f. When f is real-valued, the Fourier series S(f )
that we defined in section 6 can be recovered from the series (∗) by grouping the terms with
indices −n and n together and formally rewriting the series as
\[ f_0 + \sum_{n=1}^{\infty} \bigl( f_n e^{inx} + f_{-n} e^{-inx} \bigr). \]
If the Fourier series of f converges to f (x) then we can write
\[ f(x) = \sum_{n=-\infty}^{\infty} f_n e^{inx}. \tag{31} \]

We now proceed to discretize this continuous picture. First, replace the continuous
variable x by the discrete variable t = 0, 1, 2, . . . , m − 1. Second, substitute the exponential
function $\zeta_m^{t} = \exp(2\pi i t/m)$ and the function F (t) in Lemma 7.12 for the exponential function
$e^{ix}$ and the function f (x), respectively. The analog of the Fourier coefficient fn , which is the
mean value of the function $f(x)e^{-inx}$ over the interval −π ≤ x ≤ π, is then the mean a(n)
of the values at t = 0, 1, 2, . . . , m − 1 of the function $F(t)\zeta_m^{-nt}$. It follows that the discrete
version of (31), in other words, a finite Fourier series expansion of F (t), is precisely the
conclusion of Lemma 7.12.
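Proof of Lemma 7.12.
By direct substitution using the stated formula for a(n), we compute that
\[
\begin{aligned}
\sum_{n=0}^{m-1} a(n)\, \zeta_m^{nt} &= \frac{1}{m} \sum_{n=0}^{m-1} \Bigl( \sum_{s=0}^{m-1} F(s)\, \zeta_m^{-ns} \Bigr) \zeta_m^{nt} \\
&= \frac{1}{m} \sum_{s=0}^{m-1} F(s) \sum_{n=0}^{m-1} \zeta_m^{n(t-s)} \\
&= F(t),
\end{aligned}
\]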

where the last line follows by the same calculation that we used to derive equation (21) in
section 9 of Chapter 3. QED
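
Lemma 7.12 is easy to test numerically. The following Python sketch computes the coefficients a(n) of a sample function on Z/mZ and then rebuilds the function from them; the function names and the sample values are illustrative choices, not part of the text.

```python
import cmath

def fourier_coefficients(F, m):
    """a(n) = (1/m) * sum_{t=0}^{m-1} F(t) * zeta_m^(-n t), for n = 0, ..., m-1."""
    zeta = cmath.exp(2j * cmath.pi / m)
    return [sum(F[t] * zeta ** (-n * t) for t in range(m)) / m for n in range(m)]

def reconstruct(a, m):
    """F(t) = sum_{n=0}^{m-1} a(n) * zeta_m^(n t), for t = 0, ..., m-1."""
    zeta = cmath.exp(2j * cmath.pi / m)
    return [sum(a[n] * zeta ** (n * t) for n in range(m)) for t in range(m)]

# One period of an arbitrary complex-valued function of period m = 6.
F = [3, -1, 4 + 2j, 0, 5, 9]
a = fourier_coefficients(F, 6)
rebuilt = reconstruct(a, 6)
print(max(abs(x - y) for x, y in zip(F, rebuilt)))  # close to 0, up to rounding
```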
N.B. We will refer to the coefficients a(n) in the conclusion of Lemma 7.12 as the Fourier
coefficients of F. We also note that it follows from the fact that F (t) and $\zeta_m^{\pm nt}$ are periodic
of period m that the sums in the formulae for F (t) and a(n) in Lemma 7.12 can be taken
over any complete set of ordinary residues modulo m without a change in their values; we
will use this observation without further reference in the rest of this section.
Let p and q be distinct odd primes, with ζ = exp(2πi/p). In order to make the writing
less cumbersome, we change the notation slightly as follows: for each t ∈ Z, let
\[ G(\zeta^t) = \sum_{n=0}^{p-1} \chi_p(n)\, \zeta^{tn}. \]

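Recall from Theorem 3.14 that
\[ G(\zeta)^2 = (-1)^{\frac{1}{2}(p-1)}\, p, \]
and so
\[ G(\zeta)^{q-1} = \bigl( G(\zeta)^2 \bigr)^{\frac{1}{2}(q-1)} = (-1)^{\frac{1}{2}(p-1) \frac{1}{2}(q-1)}\, p^{\frac{1}{2}(q-1)}. \]
We conclude from Euler’s criterion (Theorem 2.5) that
\[ G(\zeta)^{q-1} \equiv (-1)^{\frac{1}{2}(p-1) \frac{1}{2}(q-1)}\, \chi_q(p) \bmod q. \tag{32} \]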

The LQR will follow from (32) if we can prove that
\[ G(\zeta)^{q-1} \equiv \chi_p(q) \bmod q, \tag{33} \]
because then (32) and (33) will imply that
\[ (-1)^{\frac{1}{2}(p-1) \frac{1}{2}(q-1)}\, \chi_q(p) \equiv \chi_p(q) \bmod q, \]
and this congruence must in fact be an equality since the difference of both sides must be
either 0 or ±2 and also divisible by the odd prime q. In order to deduce the LQR, we must
therefore verify (33).
This verification will be done by expanding the function $G(\zeta^t)^q$ as a finite Fourier series
by way of Lemma 7.12 with m = p. Thus, we calculate that
\[ G(\zeta^t)^q = \sum_{u \bmod p} a_q(u)\, \zeta^{ut}, \tag{34} \]


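with Fourier coefficients
\[
\begin{aligned}
a_q(u) &= \frac{1}{p} \sum_{v \bmod p} G(\zeta^v)^q\, \zeta^{-vu} \\
&= \frac{1}{p} \sum_{v \bmod p}\ \sum_{m_1 \bmod p} \chi_p(m_1)\, \zeta^{v m_1} \cdots \sum_{m_q \bmod p} \chi_p(m_q)\, \zeta^{v m_q}\, \zeta^{-vu} \\
&= \frac{1}{p} \sum_{v,\, m_1, \dots, m_q \bmod p} \chi_p(m_1 m_2 \cdots m_q)\, \zeta^{v(m_1 + m_2 + \cdots + m_q - u)} \\
&= \frac{1}{p} \sum_{m_1, \dots, m_q \bmod p} \chi_p(m_1 m_2 \cdots m_q) \sum_{v \bmod p} \zeta^{v(m_1 + m_2 + \cdots + m_q - u)}.
\end{aligned}
\]
Thus
\[ a_q(u) = \sum_{\substack{m_j \bmod p \\ m_1 + m_2 + \cdots + m_q \equiv u \bmod p}} \chi_p(m_1 m_2 \cdots m_q). \tag{35} \]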

We will now use the equation
\[ G(\zeta^t) = \chi_p(t)\, G(\zeta), \quad t \in \mathbb{Z}, \tag{36} \]
from Lemma 3.15 to calculate aq (u) in a different way. Because q is odd, it follows from (36)
that
\[ G(\zeta^v)^q = \chi_p(v)\, G(\zeta)^q. \]
Hence
\[
\begin{aligned}
a_q(u) &= \frac{1}{p}\, G(\zeta)^q \sum_{v \bmod p} \chi_p(v)\, \zeta^{-vu} \\
&= \frac{1}{p}\, G(\zeta)^q\, G(\zeta^{-u}) \\
&= \frac{1}{p}\, G(\zeta)^q\, \chi_p(u)\, G(\zeta^{-1}),
\end{aligned}
\]
where in the last line, we have applied (36) again, this time with ζ replaced by $\zeta^{-1}$.
Comparing the last formula with its own special case u = 1, we find that
\[ a_q(u) = \chi_p(u)\, a_q(1). \tag{37} \]
Next, insert (37) into (34) to deduce that
\[ G(\zeta^t)^q = a_q(1) \sum_{u \bmod p} \chi_p(u)\, \zeta^{ut} = a_q(1)\, G(\zeta^t). \]


Because G(ζ) ≠ 0 it follows from this equation and another application of (37) that
\[ G(\zeta)^{q-1} = a_q(1) = \chi_p(q)\, a_q(q). \tag{38} \]
We will use (35) and (38) to deduce (33); that will complete this proof of the LQR.
Toward that end, we substitute (35) into (38) to obtain
\[ G(\zeta)^{q-1} = \chi_p(q) \sum_{\substack{m_j \bmod p \\ m_1 + m_2 + \cdots + m_q \equiv q \bmod p}} \chi_p(m_1 m_2 \cdots m_q). \tag{39} \]
This equation must be reduced modulo q. In order to do that, we first examine the
possibilities for the set of summation variables m1 , . . . , mq . Suppose that
\[ m_1 \equiv m_2 \equiv \cdots \equiv m_q \bmod p. \]
This requires that
\[ m_1 + m_2 + \cdots + m_q \equiv q m_j \equiv q \bmod p, \]
hence mj ≡ 1 mod p, yielding only the one summand
\[ \chi_p(1) = 1. \tag{40} \]

All other solutions m1 , . . . , mq of

m1 + m2 + · · · + mq ≡ q mod p

must contain non-congruent integers. If such a solution is cyclically permuted, then another
solution of this type will be obtained. Indeed, a cyclic permutation of

m1 , m2 , . . . mq

can be expressed as
m1+s , m2+s , . . . , mq+s ,
for some s, 1 ≤ s < q, with the subscripts here being taken modulo q. If this set of solutions
is of the same kind as the previous one, then

mj ≡ mj+s mod p.

Hence, upon successively setting j = s, 2s, . . . , (q − 1)s, we obtain

ms ≡ m2s ≡ · · · ≡ mqs mod p,

where the subscripts here form a complete set of ordinary residues mod q, and this puts us in
the case that we have already considered. It follows that those solutions of
m1 + m2 + · · · + mq ≡ q mod p in which non-congruent numbers appear produce the term
\[ q\, \chi_p(m_1 m_2 \cdots m_q) \]

and these terms sum to a number congruent to 0 mod q. Consequently, when the sum
\[ \sum_{\substack{m_j \bmod p \\ m_1 + m_2 + \cdots + m_q \equiv q \bmod p}} \chi_p(m_1 m_2 \cdots m_q) \]
is reduced modulo q, the only nonzero term is the single term (40). Therefore by (39),
\[ G(\zeta)^{q-1} \equiv \chi_p(q) \bmod q, \]
which verifies (33), and hence also the LQR. QED
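
For small primes, the two facts that drive this proof can be confirmed by brute force. The Python sketch below evaluates the sum in (35) with u = q by enumerating all q-tuples mod p, and then checks equation (39) together with the congruence aq (q) ≡ 1 mod q. It uses the value of G(ζ)^(q−1) recalled above from Theorem 3.14; the function names are hypothetical, and the enumeration is feasible only for very small p and q.

```python
from itertools import product

def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def a_coeff(p, q, u):
    """Sum of chi_p(m_1 ... m_q) over q-tuples mod p with m_1 + ... + m_q = u mod p."""
    chi = {m: legendre(m, p) for m in range(1, p)}
    total = 0
    # Tuples containing a zero entry contribute chi_p(0) = 0, so they are skipped.
    for ms in product(range(1, p), repeat=q):
        if sum(ms) % p == u % p:
            term = 1
            for m in ms:
                term *= chi[m]
            total += term
    return total

def gauss_power(p, q):
    """G(zeta)^(q-1) = ((-1)^((p-1)/2) * p)^((q-1)/2), an ordinary integer."""
    return ((-1) ** ((p - 1) // 2) * p) ** ((q - 1) // 2)

for p, q in [(3, 5), (5, 3), (3, 7), (7, 3), (5, 7), (7, 5)]:
    a = a_coeff(p, q, q)
    # Columns: p, q, a_q(q), whether (39) holds, and a_q(q) reduced mod q.
    print(p, q, a, gauss_power(p, q) == legendre(q, p) * a, a % q)
```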
In analogy with the Fourier-series expansion of a function f (x) of a real variable x, the
functions $\zeta_m^{nt}$ in Lemma 7.12 can be viewed as the harmonics of the function F (t), with the
amplitude of the harmonics given by the Fourier coefficients a(n). The proof of quadratic
reciprocity that we have presented here can thus be interpreted as showing that the LQR
comes from the fact that, modulo the prime q, the dominant harmonic of $G(\zeta^t)^q$ is $\zeta^{qt}$ with
amplitude congruent to 1 modulo q.

CHAPTER 8

Dirichlet’s Class-Number Formula

Although de la Vallée Poussin’s proof of Lemma 4.8 is elegant and efficient, it fails to
explain exactly why the values of L-functions at s = 1 are positive, and consequently the
actual reason why the sums in Theorem 7.1 turn out to be positive is not yet clear. Often
in number theory, integers which occur in interesting situations are in fact positive because
they count something, and it transpires that this is the case in Theorem 7.1. The sums
there in fact count equivalence classes of binary quadratic forms, or, to say the same thing
in a different way, ideal classes in quadratic number fields. In order to see this, we will
now present a second proof of Lemma 4.8 that uses Dirichlet’s famous class-number formula,
which formula calculates the value of L(1, χ) when χ is a real (primitive) Dirichlet character
in terms of the number of certain equivalence classes of quadratic forms. As we did in our
first proof of Lemma 4.8, we will follow the exposition for this as set forth in Davenport [6].
We begin things in this chapter by using some structure theory of Dirichlet characters
in the first section to reduce the problem of proving that L(1, χ) ≠ 0 for an arbitrary real
non-principal Dirichlet character χ to proving that that is true for an arbitrary real primitive
character. This highlights the importance of real primitive characters in our discussion, and
so the facts concerning the structure of those characters that will be required are recorded
in section 2. In order to precisely state what the class-number formula asserts, we need
to explain what a class number is, and that is the subject of sections 3 and 4. Section 3
recalls the fundamental equivalence relation defined on the set of all primitive and irreducible
quadratic forms with a given discriminant that we first discussed in section 12 of Chapter
3, and section 4 uses the equivalence classes coming from this equivalence relation and some
information on the representation of integers by quadratic forms to define the class number.
Dirichlet’s class-number formula is stated precisely and proved in section 5. Dirichlet’s
formula calculates the value at s = 1 of the L-function of a primitive Dirichlet character as the
product of a canonically determined positive constant and the class number of an appropriate
set of quadratic forms, thereby providing the definitive explanation of the positivity of that
value of the L-function. In section 7, we reformulate the class-number formula in terms of
the class number determined by the ideal classes in quadratic number fields, and we also give
class-number formulae for the sums of the Legendre symbols in Theorem 7.1. As an extra


bonus from the theory discussed in section 1, we give in the last section of this chapter our
seventh and final proof of the Law of Quadratic Reciprocity.

1. Some Structure Theory for Dirichlet Characters

We begin by discussing some useful facts regarding the structure of Dirichlet characters.
Let χ be a Dirichlet character of modulus b, and let d be a positive divisor of b. The number
d is an induced modulus of χ if χ(n) = 1 whenever gcd(n, b) = 1 and n ≡ 1 mod d. In
other words, d is an induced modulus of χ if χ acts like a character with modulus d on the
integers in an ordinary residue class of 1 mod d which are relatively prime to b. One can
show without too much difficulty that a positive divisor d of b is an induced modulus of χ if
and only if there exists a Dirichlet character ξ of modulus d such that

(1) χ(n) = χ1 (n)ξ(n), for all n ∈ Z,

where χ1 denotes the principal character of modulus b. Hence the induced moduli of χ are
precisely the moduli of Dirichlet characters which “induce” the character χ in the sense of
(1). It is a straightforward consequence of (1) that χ is non-principal if and only if ξ is
non-principal. Because d is a factor of b, it also follows from (1) and the Dirichlet product
formula for L-functions (Lemma 7.5(ii)) that the L-function of χ is a factor of the L-function
of ξ.
The modulus b is clearly an induced modulus of χ, and if b is the only positive divisor of
b which is an induced modulus, χ is said to be primitive. If d is the smallest positive divisor
of b that is an induced modulus of χ, i.e., d is the conductor of χ, then one can show that
the factor ξ in (1) is a primitive character modulo d. Taking the character ξ in (1) to be the
character modulo the conductor of χ, it follows from (1) that χ is real if and only if ξ is real,
and so from this fact and our observations in the previous paragraph, we conclude that in
order to show that the value at s = 1 of the L-function of an arbitrary real and non-principal
Dirichlet character is nonzero, and hence positive by the proof of Lemma 7.5(iii), it suffices
to prove that this is true for all real primitive Dirichlet characters. It hence behooves us to
take a closer look at the structure of those characters, and that is what we will do in the
next section.

2. The Structure of Real Primitive Dirichlet Characters

The structure of a real primitive Dirichlet character is of a very particular type; in this
section we will describe that structure precisely. It is here that we will also begin to see
how the classical theory of quadratic forms enters the picture. We begin by noting that
every Legendre symbol is real, and because the modulus is prime, they are also primitive.

However, there are also real primitive characters of modulus 4 and 8, and so we will describe
those next.
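For the modulus 4, there is only one non-principal character, defined by
\[
\chi_4(n) = \begin{cases}
1, & \text{if } n \equiv 1 \bmod 4, \\
-1, & \text{if } n \equiv -1 \bmod 4,
\end{cases}
\]
and χ4 is obviously real and primitive. For the modulus 8, there are only two real primitive
characters, defined by
\[
\chi_8(n) = \begin{cases}
1, & \text{if } n \equiv \pm 1 \bmod 8, \\
-1, & \text{if } n \equiv \pm 3 \bmod 8,
\end{cases}
\]
and
\[
\chi_{-8}(n) = \begin{cases}
1, & \text{if } n \equiv 1 \text{ or } 3 \bmod 8, \\
-1, & \text{if } n \equiv -1 \text{ or } -3 \bmod 8.
\end{cases}
\]
In fact, we have that χ−8 = χ4 χ8 .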
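
These three characters are small enough to tabulate directly. The Python sketch below encodes them (with the usual convention, assumed here, that each vanishes on even integers), checks the relation χ−8 = χ4 χ8 , and evaluates χ(−1) times the modulus for each of them. The function names are illustrative.

```python
def chi4(n):
    """The non-principal character mod 4 (zero on even n)."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)

def chi8(n):
    """Real primitive character mod 8 with value 1 on n = +-1 mod 8."""
    return 0 if n % 2 == 0 else (1 if n % 8 in (1, 7) else -1)

def chi_minus8(n):
    """Real primitive character mod 8 with value 1 on n = 1 or 3 mod 8."""
    return 0 if n % 2 == 0 else (1 if n % 8 in (1, 3) else -1)

# chi_{-8} = chi_4 * chi_8 on a full range of residues, including negatives.
print(all(chi_minus8(n) == chi4(n) * chi8(n) for n in range(-16, 17)))

# Neither character mod 8 is induced from modulus 4: 5 = 1 mod 4, yet both equal -1 at 5.
print(chi8(5), chi_minus8(5))

# chi(-1) * modulus for chi_4, chi_8, chi_{-8}.
print(chi4(-1) * 4, chi8(-1) * 8, chi_minus8(-1) * 8)
```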
It can be shown that the prime-power moduli for which real primitive characters exist
are: any odd prime p, with corresponding character χp (the Legendre symbol of p), 4, with
corresponding character χ4 , and 8, with corresponding characters χ±8 . Following Davenport,
we will call the moduli 4, 8, p, p > 2, the basic moduli and the corresponding real primitive
characters the basic characters. More generally, the following theorem asserts that the moduli
for which real primitive characters exist are determined by the products of basic moduli and
the real primitive characters are determined by the products of the basic characters which
correspond to the basic moduli. For a proof of the theorem, we refer the interested reader
to Davenport [6], pp. 38-40.

Theorem 8.1. Let
\[ M = \{4, 8\} \cup \{p \in P : p > 2\} \]
denote the set of basic moduli. If b > 1 is an integer then b is the modulus of a real primitive
Dirichlet character if and only if b is the product of relatively prime factors from M. If b is
such a modulus and B is the set of basic moduli into which b factors, then the real primitive
Dirichlet characters of modulus b are given precisely by the product (or products)
\[ \prod_{n \in B} \chi_n, \]
where χn is a basic character of modulus n.

It follows from Theorem 8.1 that if b is a modulus as specified in that theorem then there
is exactly one real primitive Dirichlet character of modulus b, unless 8 is a factor of b, in

which case there are exactly two such characters. Real primitive characters can also be used
to give yet another proof of the Law of Quadratic Reciprocity, as we will see in section 7.
If χ is a real primitive Dirichlet character of modulus b then we let dχ denote the parameter
\[ \chi(-1)\, b. \]

It is a consequence of Theorem 8.1 that as χ varies over all real primitive characters, dχ
varies over all integers, positive or negative, that are either
(a) square-free and congruent to 1 mod 4, or
(b) of the form 4n, where n is square-free and n ≡ 2 or 3 mod 4
(Davenport [6], pp. 40-41).
The range of the parameter dχ marks the first occurrence of what we will eventually
see as an intimate connection between real primitive Dirichlet characters and the classical
theory of quadratic forms. If
\[ ax^2 + bxy + cy^2 \]
is a quadratic form with integer coefficients a, b, c, then the discriminant of the form is the
familiar algebraic invariant $b^2 - 4ac$. In this theory, the forms which are not a product of
linear factors are the primary objects of study, and in this case, the discriminant is not a
perfect square, and is hence a non-square integer which is congruent to 0 or 1 mod 4. A
discriminant $b^2 - 4ac$ is said to be a fundamental discriminant if gcd(a, b, c) = 1. It is then not
difficult to see that the set of fundamental discriminants consists precisely of the integers
which satisfy the conditions (a) and (b) above, i.e., the set of fundamental discriminants
coincides with the range of dχ as χ varies over all real primitive Dirichlet characters.
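
Conditions (a) and (b) translate directly into a short test. The Python sketch below lists the integers d with |d| ≤ 20 that satisfy them; the function names are illustrative, and the value d = 1 is excluded by assumption, since it is a perfect square and does not arise from a modulus b > 1.

```python
def is_squarefree(n):
    """True if no square larger than 1 divides the nonzero integer n."""
    n = abs(n)
    if n == 0:
        return False
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True

def is_fundamental_discriminant(d):
    """Conditions (a) and (b) of the text, with d = 1 excluded (an assumption)."""
    if d == 1:
        return False
    if d % 4 == 1:  # (a): square-free and congruent to 1 mod 4
        return is_squarefree(d)
    if d % 4 == 0:  # (b): d = 4n with n square-free and n = 2 or 3 mod 4
        n = d // 4
        return n % 4 in (2, 3) and is_squarefree(n)
    return False

print([d for d in range(-20, 21) if is_fundamental_discriminant(d)])
```

Python's `%` operator returns a non-negative remainder for a positive modulus, so the congruence tests also behave as intended for negative d.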
The range of dχ can also be used to make an important connection between Dirichlet
characters and quadratic number fields. As we saw in section 11 of Chapter 3, a quadratic
number field is generated over Q by the square root of a square-free integer m, and the
discriminant of the quadratic field is defined as m or 4m if m is, respectively, congruent to 1
mod 4 or congruent to 2 or 3 mod 4. Consequently, the range of dχ also coincides with the
discriminants of the set of all quadratic number fields.
The proof of Dirichlet’s class-number formula that we will present makes use of a canon-
ical representation of the real primitive Dirichlet characters that is parametrized by the set
of fundamental discriminants coming from the theory of quadratic forms. In order to explain
what this parameterization is, let d denote a fundamental discriminant, which we understand
at this point as simply an integer that satisfies either of the conditions (a) or (b) described
above. It can be shown (Landau [35], pp. 221-222) that d has a unique factorization into a
186
3. Elements of the Theory of Quadratic Forms

product of relatively prime factors taken from the set of primary discriminants
1
−4, 8, −8, (−1) 2 (p−1) p, p an odd prime;

we call this factorization the primary factorization of d. To each primary discriminant, we
associate one of the basic characters introduced in this section: to the integer −4 we associate
the character $\chi_4$, to 8 and −8 the characters $\chi_8$ and $\chi_{-8}$, respectively, and to each integer
$(-1)^{\frac{1}{2}(p-1)}p$, we associate the Legendre symbol $\chi_p$. If D is the set of primary discriminants
in the primary factorization of d, we let χ(d) denote the real primitive character of modulus
|d| given by the product
$$\chi(d) = \prod_{n \in D} \chi_n,$$

where χn is the character that we have associated with n above. We will call χ(d) the
character determined by d. A proof of the following theorem can be found in Davenport [6],
pp. 38-41:

Theorem 8.2. The map d → χ(d) is a bijection of the set of fundamental discriminants
onto the set of all real primitive Dirichlet characters, and its inverse map is given by χ →
χ(−1)(modulus of χ).

The correspondence given in Theorem 8.2 is a special case of the correspondence from
Z \ {0} into the set of all real Dirichlet characters defined by the Kronecker symbol, a
generalization of the Legendre and Jacobi symbols, which is a very useful device in the
study of real Dirichlet characters. We will make no further use of the Kronecker symbol in
these notes; for its definition and basic properties, the interested reader is referred to either
Landau [35], Definition 20 and Theorems 96-101 or Cohen [2], section 2.2.2.
We now have at our disposal all of the information about real primitive Dirichlet charac-
ters that we need in order to state and prove Dirichlet’s class-number formula. In addition
to that information, we will also require some results from the classical theory of quadratic
forms, to which we turn next.

3. Elements of the Theory of Quadratic Forms

Let d be a fundamental discriminant, i.e., an integer which is not a square and which
is either square-free and congruent to 1 mod 4 or of the form 4n, where n is square-free
and is congruent to either 2 or 3 mod 4. The classes to which Dirichlet’s class-number
formula originally referred are the equivalence classes determined by the basic equivalence
relation on quadratic forms defined by modular substitutions, which we discussed in section
12 of Chapter 3. For the reader’s convenience, we will recall the essential features of that
equivalence relation.
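
The conditions on d recalled at the start of this section are easy to test mechanically. The following Python sketch lists the fundamental discriminants in a small range; the function names are ours and do not come from the text, and the square-free test is a naive trial division that is adequate only for small inputs.

```python
def is_squarefree(n):
    """True if no square k^2 > 1 divides n (naive trial division)."""
    n = abs(n)
    if n == 0:
        return False
    k = 2
    while k * k <= n:
        if n % (k * k) == 0:
            return False
        k += 1
    return True


def is_fundamental_discriminant(d):
    """d is not a square and is either square-free and congruent to 1 mod 4,
    or of the form 4n with n square-free and congruent to 2 or 3 mod 4."""
    if d == 1:  # 1 is a square, so it is excluded
        return False
    if d % 4 == 1:
        return is_squarefree(d)
    if d % 4 == 0:
        n = d // 4
        return n % 4 in (2, 3) and is_squarefree(n)
    return False


fundamental = [d for d in range(-24, 25) if is_fundamental_discriminant(d)]
print(fundamental)
```

Python's `%` operator always returns a nonnegative remainder for a positive modulus, so the same congruence tests work for negative d.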
We begin with some convenient notation. If (a, b, c) is an ordered triple of integers then
[a, b, c] will denote the quadratic form $ax^2 + bxy + cy^2$, and if d is a fundamental discriminant
then Q(d) will denote the set of all irreducible and primitive quadratic forms of discriminant
d, i.e., the set of all quadratic forms [a, b, c] of discriminant d which do not factor into linear
forms with integer coefficients and for which gcd(a, b, c) = 1. In particular, if [a, b, c] ∈ Q(d)
then $acd \neq 0$. On the set Q(d) we will declare that two forms $q(x, y) = ax^2 + bxy + cy^2$ and
$q_1(X, Y) = a_1X^2 + b_1XY + c_1Y^2$ in Q(d) are equivalent if there is a linear transformation
defined by

x = αX + βY, y = γX + δY,

where α, β, γ, and δ are integers satisfying αδ − βγ = 1, such that

q(αX + βY, γX + δY ) = q1 (X, Y ).

These transformations are called modular substitutions, and each modular substitution maps
Q(d) bijectively onto Q(d). As we pointed out in section 12 of Chapter 3, it follows from
classical results of Lagrange that the number of equivalence classes is finite. There is always
at least one form in Q(d), called the principal form, defined by

$$x^2 - \tfrac{1}{4}\,d\,y^2, \quad \text{if } d \equiv 0 \bmod 4,$$

or

$$x^2 + xy - \tfrac{1}{4}(d - 1)\,y^2, \quad \text{if } d \equiv 1 \bmod 4,$$
hence the number of equivalence classes is a positive integer.
An important parameter which enters into Dirichlet’s formula for the class number is
determined by the automorphs of a form in Q(d). These automorphs are the modular
substitutions which leave a given form invariant. There are always two automorphs for every
form: the trivial substitution x = X, y = Y and the negative of the trivial substitution,
x = −X, y = −Y . If d ≤ −5 then there are no other automorphs. If d = −3 or −4 then
there is only one equivalence class of forms, represented by the principal form. If d = −3,
this is the form $x^2 + xy + y^2$, with the additional automorphs x = −Y, y = X + Y and
x = X + Y, y = −X and their negatives. If d = −4 then the principal form is $x^2 + y^2$, and
this form has, in addition to the ones already mentioned, the automorph x = Y, y = −X and
its negative. We denote the number of automorphs by w, so that

$$w = \begin{cases} 2, & \text{if } d < -4,\\ 4, & \text{if } d = -4,\\ 6, & \text{if } d = -3. \end{cases} \tag{2}$$

On the other hand, if d > 0 then each form of discriminant d has infinitely many automorphs,
which are determined by the integral solutions (t, u) of the associated Pell’s equation

$$t^2 - du^2 = 4.$$

This equation has the trivial solutions t = ±2, u = 0, and if $(t_0, u_0)$ is the solution for which
$t_0 > 0$ and $u_0$ is positive and is as small as possible, the so-called minimal positive solution
of Pell’s equation, then all nontrivial solutions (t, u) are generated from the minimal positive
solution via the equation
$$\tfrac{1}{2}\big(t + u\sqrt{d}\,\big) = \pm\Big(\tfrac{1}{2}\big(t_0 + u_0\sqrt{d}\,\big)\Big)^{n},$$
where n varies over all positive and negative integers. For a form [a, b, c] of discriminant d,
it can be shown that all automorphs of the form are given by
$$x = \tfrac{1}{2}(t - bu)X - cuY, \qquad y = auX + \tfrac{1}{2}(t + bu)Y,$$
with the trivial automorphs determined by the trivial solutions of Pell’s equation. For a proof
of the results in this paragraph, the interested reader should consult Landau [35], Theorems
111 and 202.
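
The automorph formula can be checked directly on small examples. The Python sketch below finds the minimal positive solution of $t^2 - du^2 = 4$ by a plain search over u (our choice for illustration, not a method taken from the text; it is practical only for small d), builds the corresponding substitution, and applies it to a form. All function names are ours.

```python
from math import isqrt


def minimal_pell_solution(d):
    """Smallest u >= 1, with t > 0, such that t^2 - d*u^2 = 4 (d > 0 not a square)."""
    u = 1
    while True:
        t_squared = d * u * u + 4
        t = isqrt(t_squared)
        if t * t == t_squared:
            return t, u
        u += 1


def automorph(form, t, u):
    """Coefficients (alpha, beta, gamma, delta) of the substitution
    x = alpha*X + beta*Y, y = gamma*X + delta*Y attached to a solution (t, u)."""
    a, b, c = form
    # t and b*u have the same parity, since t^2 = d*u^2 + 4 and d = b^2 mod 4,
    # so both integer divisions below are exact.
    return ((t - b * u) // 2, -c * u, a * u, (t + b * u) // 2)


def apply_substitution(form, sub):
    """Coefficients of q(alpha*X + beta*Y, gamma*X + delta*Y) for q = [a, b, c]."""
    a, b, c = form
    al, be, ga, de = sub
    A = a * al * al + b * al * ga + c * ga * ga
    B = 2 * a * al * be + b * (al * de + be * ga) + 2 * c * ga * de
    C = a * be * be + b * be * de + c * de * de
    return (A, B, C)


form = (1, 1, -1)                      # discriminant 1 + 4 = 5
t0, u0 = minimal_pell_solution(5)      # minimal positive solution for d = 5
sub = automorph(form, t0, u0)
print(t0, u0, sub, apply_substitution(form, sub))
```

The same functions accept the nontrivial solution (t, u) = (0, 1) of $t^2 + 4u^2 = 4$ for the form [1, 0, 1] of discriminant −4, which gives one of its additional automorphs.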

4. Representation of Integers by Quadratic Forms and the Class Number

Let n be an integer, q(x, y) a quadratic form. We say that n is represented by q(x, y)
if there exist integers x and y such that n = q(x, y), and two representations q(x, y) and
q(X, Y ) of n are distinct if the ordered pairs (x, y) and (X, Y ) are distinct. In this section
we will be interested in the number of distinct ways a positive integer can be represented
by quadratic forms in Q(d). The problem of determining what integers are represented by a
given quadratic form, which is naturally closely related to the number of such representations,
was one of the main motivations for the theory of quadratic forms which Gauss developed
in Section V of the Disquisitiones. The number of representations of a positive integer will
also be a crucial idea in the proof of the class-number formula.
As we mentioned in section 12 of Chapter 3, the manner in which a given quadratic form
q = [a, b, c] in Q(d) represents integers depends on the sign of the discriminant d. If d > 0
then q represents both positive and negative integers. If d < 0 and a > 0 then q represents
no negative integers, and q(x, y) represents 0 only if x = y = 0. If d < 0 and a < 0 then
q represents no positive integers, and q represents 0 only if x = y = 0. Hence forms with
positive discriminant are called indefinite and forms with negative discriminant are called
positive or negative definite if a is, respectively, positive or negative. A straightforward
calculation shows that any unimodular substitution transforms a positive definite form into
a positive definite form, and multiplication of a form by −1 transforms a positive definite
form into a negative definite form. When d < 0, it hence follows that the equivalence
classes of Q(d) which are determined by modular substitutions are divided evenly into a
positive number of classes consisting of positive definite forms and a positive number of
classes consisting of negative definite forms. When d < 0, we define the class number of
d to be the number of equivalence classes of positive definite forms, and we denote it by
h(d). When d > 0, we define the class number of d to be the number of all equivalence
classes in Q(d), and we also denote it by h(d). As we shall see, this is the number to which
Dirichlet’s class-number formula refers and in the context of understanding why the value of
L-functions at s = 1 is positive, it is the most important parameter in that formula.
When d > 0 then each form q(x, y) is indefinite, and so we can choose a positive integer k
such that $k = q(x_0, y_0)$ with $\gcd(x_0, y_0) = 1$. It can then be shown that q(x, y) is equivalent
to a form with k as the coefficient of its $x^2$-term (Landau [35], Theorem 201), and so we can
choose a form [a, b, c] from each equivalence class with a > 0. When d < 0, we can choose such
a form from each equivalence class of positive definite forms. We formalize this procedure by
defining a representative system of forms as a set of representatives [a, b, c], one from each
equivalence class (positive definite if d < 0), such that each representative has a > 0.
When d < 0 then, as per the convention that we will now follow, all forms are positive
definite, and so the number of representations of a positive integer n is finite. We hence
denote by R(n) the total number of (distinct) representations of n by all forms from a
representative system. We observe that R(n) does not depend on the representative system
used to define it.
On the other hand, when d > 0, each representation of the positive integer n by a fixed
form gives rise to infinitely many representations of n by that form, obtained by applying
the automorphs of the form. We are going to circumvent this difficulty by restricting the
representations of n to ones which satisfy a particular condition which we will specify next,
and which is designed to canonically select a finite set of representations of n by each
quadratic form.

Let q = [a, b, c] be a quadratic form of discriminant d > 0. Let (t0 , u0 ) be the minimal
positive solution of the associated Pell’s equation

$$t^2 - du^2 = 4,$$

and let

$$\varepsilon = \frac{t_0 + u_0\sqrt{d}}{2}.$$
If n is a positive integer then the representation n = q(x, y) of n by q is said to be primary
if

$$2ax + (b - \sqrt{d}\,)y > 0,$$

and

$$1 \le \frac{2ax + (b + \sqrt{d}\,)y}{2ax + (b - \sqrt{d}\,)y} < \varepsilon^2.$$
It can then be shown that the number of primary representations of n by q is finite (Landau
[35], Theorem 203 and the remark after this theorem). When d > 0, we hence denote by
R(n) the total number of primary representations of n by all forms from a representative
system, and we also note that, as before, R(n) is independent of the representative system
of forms used to define it.
One of the fundamental results in the classical theory of quadratic forms gives a formula
for the calculation of R(n) by means of Dirichlet characters, and this theorem provides
the link between quadratic forms and Dirichlet characters on which the derivation of the
class-number formula is based. The result is as follows (for a proof, see Landau [35], Theorem 204):

Theorem 8.3. Let d be a fundamental discriminant and let χ(d) be the character
determined by d via Theorem 8.2. If n is a positive integer that is relatively prime to d and
R(n) denotes the total number of representations of n (primary if d > 0) by a representative
system of quadratic forms in Q(d), then the value of R(n) is given by the formula
$$R(n) = w \sum_{m \mid n} \chi(d)(m),$$

where w is given by (2) if d < 0, and w = 1 if d > 0.

As Davenport summarizes, this theorem is deduced by expressing R(n) in terms of the
number of solutions of the congruence $x^2 \equiv d \bmod 4n$ and then evaluating that sum using
appropriate characters defined by the Legendre symbol. With Theorem 8.3 in hand, we now
have all the ingredients required for the statement and proof of the class-number formula.
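
For d = −4, Theorem 8.3 can be tested by direct enumeration. In this case there is a single class, represented by $x^2 + y^2$, w = 4 by (2), and the character determined by d is $\chi_4$. The Python sketch below (with function names of our own choosing) counts representations by brute force and compares them with the divisor sum for odd n, i.e., for n relatively prime to d.

```python
from math import isqrt


def chi4(m):
    """The character of modulus 4: 0 on even m, +1 if m = 1 mod 4, -1 if m = 3 mod 4."""
    if m % 2 == 0:
        return 0
    return 1 if m % 4 == 1 else -1


def representations(n):
    """Number of ordered integer pairs (x, y) with x^2 + y^2 = n."""
    count = 0
    r = isqrt(n)
    for x in range(-r, r + 1):
        y_squared = n - x * x
        y = isqrt(y_squared)
        if y * y == y_squared:
            count += 1 if y == 0 else 2   # y and -y are distinct unless y = 0
    return count


def divisor_character_sum(n):
    """Sum of chi4(m) over the positive divisors m of n."""
    return sum(chi4(m) for m in range(1, n + 1) if n % m == 0)


for n in (1, 3, 5, 9, 13, 15, 25, 65):
    print(n, representations(n), 4 * divisor_character_sum(n))
```

Each printed row should show the same value twice, the left one obtained by counting lattice points and the right one from the character sum.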
5. The Class-Number Formula

If χ is a real primitive Dirichlet character of modulus b and if d = χ(−1)b then it follows
from Theorem 8.2 that d is a fundamental discriminant. Dirichlet’s class-number formula
calculates the value at s = 1 of the L-function of χ as the product of a certain positive
parameter and the class number of d. In this sense, we can hence interpret this result to
say that L(1, χ) counts the equivalence classes of quadratic forms in Q(d), thereby providing
the definitive reason why L(1, χ) is positive. We now state and prove Dirichlet’s remarkable
formula.

Theorem 8.4. (Dirichlet’s Class-Number Formula). Let χ be a real primitive Dirichlet
character of modulus b, let d = χ(−1)b, and let h(d) denote the class number of d.
(i) Suppose that d < 0. Then
$$L(1, \chi) = \frac{2\pi}{w\sqrt{|d|}}\, h(d),$$
where w is determined by (2).
(ii) Suppose that d > 0. Let $(t_0, u_0)$ be the minimal positive solution of $t^2 - du^2 = 4$, and
let $\varepsilon = \tfrac{1}{2}(t_0 + u_0\sqrt{d}\,) > 1$. Then
$$L(1, \chi) = \frac{\log \varepsilon}{\sqrt{d}}\, h(d).$$
Proof. Let χ, d, and h(d) be as given in the statement of Theorem 8.4. Starting with a
positive integer n relatively prime to d and the formula
$$R(n) = w \sum_{m \mid n} \chi(m) \tag{3}$$

for R(n) as given by Theorem 8.3, the key idea of this argument is to calculate the asymptotic
average value of R(n) as n → +∞ in two different ways and compare the results. The first
way is to use (3) and the fact that χ is non-principal to deduce that the asymptotic average
value of R(n) as n → +∞ is wL(1, χ). The second way is to directly use the definition
of R(n) and an integer-lattice count in the plane to express the asymptotic average value
of R(n) as the product of a positive parameter and the class number of d. Setting the two
expressions of the asymptotic average equal to each other produces the class-number formula
that we desire.
In order to implement the first way, we begin by using the formula (3) to write the sum
$$\frac{1}{w} \sum_{\substack{n=1\\ \gcd(n,\,d)=1}}^{N} R(n) = \sum_{\substack{m_1 m_2 \le N\\ \gcd(m_1 m_2,\,d)=1}} \chi(m_1). \tag{4}$$

We then split the sum on the right-hand side of this equation into
$$\sum_{m_1 \le \sqrt{N}} \chi(m_1) \Bigg( \sum_{\substack{m_2 \le N/m_1\\ \gcd(m_2,\,d)=1}} 1 \Bigg) + \sum_{\substack{m_2 < \sqrt{N}\\ \gcd(m_2,\,d)=1}} \Bigg( \sum_{\sqrt{N} < m_1 \le N/m_2} \chi(m_1) \Bigg),$$

because the sum on the left is summed over all pairs $m_1, m_2$ such that $m_1 \le \sqrt{N}$ and the
sum on the right ranges over all pairs for which $m_1 > \sqrt{N}$. The inner sum in the sum on
the left is
$$\frac{N}{m_1}\,\frac{\varphi(|d|)}{|d|} + O\big(\varphi(|d|)\big),$$
where ϕ is Euler’s totient, hence the double sum on the left is estimated by
$$N\,\frac{\varphi(|d|)}{|d|} \sum_{m_1 \le \sqrt{N}} \frac{\chi(m_1)}{m_1} + O(\sqrt{N}\,).$$

We now exploit the fact that χ is non-principal, to wit, the sum of the values χ(m) as m
varies throughout any finite interval is bounded (as was shown in the proof of Proposition
7.5(i)). Hence the double sum on the right is $O(\sqrt{N}\,)$. Inserting these estimates into (4) and
then multiplying the resulting equation by w/N, we obtain the estimate
$$\frac{1}{N} \sum_{\substack{n=1\\ \gcd(n,\,d)=1}}^{N} R(n) = w\,\frac{\varphi(|d|)}{|d|} \sum_{m \le \sqrt{N}} \frac{\chi(m)}{m} + O(N^{-\frac{1}{2}}).$$

The sum on the right-hand side of this equation, when extended from m = 1 to +∞, has a
remainder that, when estimated by a summation-by-parts argument, is
$$\sum_{m > \sqrt{N}} \frac{\chi(m)}{m} = O(N^{-\frac{1}{2}}).$$

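It follows that
$$\lim_{N \to +\infty} \frac{1}{N} \sum_{\substack{n=1\\ \gcd(n,\,d)=1}}^{N} R(n) = w\,\frac{\varphi(|d|)}{|d|} \sum_{m=1}^{\infty} \frac{\chi(m)}{m} = w\,\frac{\varphi(|d|)}{|d|}\, L(1, \chi). \tag{5}$$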

As Davenport points out, the quotient ϕ(|d|)/|d| measures the density of the integers that
are relatively prime to |d|, and so this equation asserts that the asymptotic average of R(n)
with respect to n is wL(1, χ).
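
Equation (5) can be observed numerically in the simplest case d = −4. There w = 4, ϕ(4)/4 = 1/2, and $L(1, \chi_4) = \pi/4$ by Leibniz's series, so the right-hand side of (5) equals π/2. The Python sketch below, a rough illustration with names of our own choosing, computes the left-hand side for one large N by counting the integer pairs (x, y) with $x^2 + y^2$ odd and at most N, and approximates the right-hand side by a partial sum of the L-series.

```python
from math import isqrt, pi


def sum_of_R(N):
    """Sum of R(n) over odd n <= N for d = -4: the number of integer pairs
    (x, y) with x^2 + y^2 odd and at most N."""
    r = isqrt(N)
    count = 0
    for x in range(-r, r + 1):
        for y in range(-r, r + 1):
            n = x * x + y * y
            if n <= N and n % 2 == 1:
                count += 1
    return count


def chi4(m):
    """The character of modulus 4."""
    if m % 2 == 0:
        return 0
    return 1 if m % 4 == 1 else -1


def right_hand_side(terms=200_001):
    """w * phi(4)/4 * (partial sum of L(1, chi_4)), with w = 4 and phi(4)/4 = 1/2."""
    return 4 * 0.5 * sum(chi4(m) / m for m in range(1, terms + 1))


N = 100_000
average = sum_of_R(N) / N
print(average, right_hand_side(), pi / 2)
```

The three printed numbers should agree to a couple of decimal places; the agreement of the first improves slowly as N grows, in line with the $O(N^{-1/2})$ error term in the proof.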
We now want to calculate the limit
$$\lim_{N \to +\infty} \frac{1}{N} \sum_{\substack{n=1\\ \gcd(n,\,d)=1}}^{N} R(n)$$

in the second way, i.e., by means of the definition of R(n) as the total number of repre-
sentations of n by quadratic forms in Q(d) from a representative system. Hence we let
R(n, f ) denote the number of representations of n (primary when d > 0) by a particular
form f ∈ Q(d), so that, by definition,
$$R(n) = \sum_{f} R(n, f),$$

where the sum here is taken over all quadratic forms f from a representative system. Note
that the number of terms in this sum is therefore h(d). We now wish to calculate the limit
$$\lim_{N \to +\infty} \frac{1}{N} \sum_{\substack{n=1\\ \gcd(n,\,d)=1}}^{N} R(n, f), \tag{6}$$

and it will transpire that this limit has a value κ(d) independent of f . Hence
$$\lim_{N \to +\infty} \frac{1}{N} \sum_{\substack{n=1\\ \gcd(n,\,d)=1}}^{N} R(n) = \kappa(d)\,h(d). \tag{7}$$

It follows from (5) and (7) that
$$L(1, \chi) = \frac{|d|}{w\,\varphi(|d|)}\, \kappa(d)\,h(d), \tag{8}$$
and so in order to complete the proof of Theorem 8.4, we must calculate κ(d).
Suppose first that d < 0. Let f = [a, b, c]. The sum
$$\sum_{\substack{n=1\\ \gcd(n,\,d)=1}}^{N} R(n, f)$$
is the number of ordered pairs of integers (x, y) satisfying

$$0 < ax^2 + bxy + cy^2 \le N, \qquad \gcd(ax^2 + bxy + cy^2,\, d) = 1.$$

As x and y each run through a complete set of ordinary residues mod |d|, there are exactly
|d|ϕ(|d|) of the numbers $ax^2 + bxy + cy^2$ that are relatively prime to d (Landau [35], Theorem
206). It therefore suffices to fix a pair of integers $(x_0, y_0)$ for which

$$\gcd(ax_0^2 + bx_0y_0 + cy_0^2,\, d) = 1,$$

and consider the number of pairs of integers (x, y) which satisfy

$$ax^2 + bxy + cy^2 \le N, \qquad x \equiv x_0,\ y \equiv y_0 \bmod |d|.$$


The first inequality asserts that the point (x, y) is in an ellipse centered at the origin and
passing through the points $(\pm\sqrt{N/a}, 0)$ and $(0, \pm\sqrt{N/c})$, with the ellipse expanding
uniformly as N → +∞. Figures 1 and 2 exhibit two typical elliptical regions which arise in this
manner.
[Figure 1. Elliptical region, $b/(a - c) < 0$; the axis intercepts are ±α and ±β, with $\alpha = \sqrt{N/a}$ and $\beta = \sqrt{N/c}$.]

[Figure 2. Elliptical region, $b/(a - c) > 0$; the axis intercepts are ±α and ±β, with $\alpha = \sqrt{N/a}$ and $\beta = \sqrt{N/c}$.]

Using the fact that the area enclosed by an ellipse is π times the product of the lengths
of the semi-axes, we find that the area of the ellipse is
$$\frac{2\pi}{\sqrt{4ac - b^2}}\,N = \frac{2\pi}{\sqrt{|d|}}\,N.$$
By dividing the plane into squares of side length |d| centered at the points of the integer
lattice, it can be shown without difficulty that as N → +∞, the number of points in the
integer lattice and lying inside the ellipse is asymptotic to
$$\frac{1}{|d|^2}\,\frac{2\pi}{\sqrt{|d|}}\,N.$$

This must now be multiplied by |d|ϕ(|d|) in order to account for the number of points (x0 , y0).
It follows that κ(d), the value of the limit (6), is
$$\frac{\varphi(|d|)}{|d|}\,\frac{2\pi}{\sqrt{|d|}}.$$
Substituting this value of κ(d) into (8) yields the conclusion (i) of Theorem 8.4.
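
Conclusion (i) lends itself to a numerical sanity check. The Python sketch below computes h(d) for a few negative fundamental discriminants in two independent ways: by counting reduced forms (those with −a < b ≤ a ≤ c, and b ≥ 0 when a = c), a standard technique that is not developed in this section and is used here only as an assumption for comparison, and by solving the formula in (i) for h(d) with L(1, χ) replaced by a partial sum of its series. The character is evaluated as a Kronecker symbol, which these notes mention but do not use; all function names are ours.

```python
from math import gcd, pi, sqrt


def kronecker(d, m):
    """Kronecker symbol (d/m) for an integer d and a positive integer m."""
    if d % 2 == 0 and m % 2 == 0:
        return 0
    result = 1
    while m % 2 == 0:                 # contribution of the prime 2
        m //= 2
        if d % 8 in (3, 5):
            result = -result
    a = d % m                         # Jacobi symbol (a/m) for odd m
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if m % 8 in (3, 5):
                result = -result
        a, m = m, a
        if a % 4 == 3 and m % 4 == 3:
            result = -result
        a %= m
    return result if m == 1 else 0


def class_number_by_reduced_forms(d):
    """Number of classes of positive definite primitive forms of discriminant d < 0."""
    h = 0
    b = d % 2
    while 3 * b * b <= -d:
        ac = (b * b - d) // 4
        a = max(b, 1)
        while a * a <= ac:
            if ac % a == 0 and gcd(gcd(a, b), ac // a) == 1:
                c = ac // a
                # [a, b, c] and [a, -b, c] are inequivalent unless b = 0, a = b, or a = c
                h += 1 if (b == 0 or a == b or a == c) else 2
            a += 1
        b += 2
    return h


def class_number_by_formula(d, periods=20_000):
    """w * sqrt(|d|) / (2*pi) times a partial sum of L(1, chi) over whole periods."""
    w = 6 if d == -3 else 4 if d == -4 else 2
    D = -d
    chi = [0] + [kronecker(d, m) for m in range(1, D)]
    L = sum(chi[m % D] / m for m in range(1, periods * D + 1))
    return w * sqrt(D) * L / (2 * pi)


for d in (-3, -4, -7, -8, -15, -23):
    print(d, class_number_by_reduced_forms(d), round(class_number_by_formula(d), 4))
```

The second column is an exact integer and the third is a floating-point approximation; the series for L(1, χ) converges only conditionally, so the partial sum is taken over complete periods of χ to keep the truncation error small.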
Now let d > 0. The lattice-point count here is a bit more involved than the one that was
done for d < 0 because we need to count primary representations. If we set
$$\theta = \frac{-b + \sqrt{d}}{2a}, \qquad \theta' = \frac{-b - \sqrt{d}}{2a}, \qquad \varepsilon = \tfrac{1}{2}(t_0 + u_0\sqrt{d}\,) > 1,$$
where $(t_0, u_0)$ is the minimal positive solution of Pell’s equation $t^2 - du^2 = 4$ as before, then
an integer pair (x, y) determines a primary representation of an integer if
$$x - \theta y > 0 \quad\text{and}\quad 1 \le \frac{x - \theta' y}{x - \theta y} < \varepsilon^2.$$
Arguing as we did in the case of negative d, for a fixed pair of integers $(x_0, y_0)$ for which
$$\gcd(ax_0^2 + bx_0y_0 + cy_0^2,\, d) = 1,$$
we need to count the number of integer points (x, y) such that
$$ax^2 + bxy + cy^2 \le N, \qquad x - \theta y > 0, \qquad 1 \le \frac{x - \theta' y}{x - \theta y} < \varepsilon^2,$$
and
$$x \equiv x_0,\quad y \equiv y_0 \bmod d.$$
The set of conditions
$$ax^2 + bxy + cy^2 \le N, \qquad x - \theta y > 0, \qquad 1 \le \frac{x - \theta' y}{x - \theta y} < \varepsilon^2,$$
determines a hyperbolic sector in the upper half-plane of two possible types: one is bounded
by the nonnegative x-axis, the branch of the hyperbola $ax^2 + bxy + cy^2 = N$ passing through
the point $\sqrt{N/a}$ on the x-axis, and the ray emanating from the origin and lying along the
line $\nu x = (\nu\theta + 1)y$, $\nu = a(\varepsilon^2 - 1)/\sqrt{d}$, and the other is bounded by the nonpositive x-axis,
the branch of the hyperbola $ax^2 + bxy + cy^2 = N$ passing through the point $-\sqrt{N/a}$ on
the x-axis, and the ray emanating from the origin and lying along the line $\nu x = (\nu\theta + 1)y$.
Figures 3, 4, and 5 illustrate typical hyperbolic sectors which arise in the first way; we invite
the reader to provide figures for the hyperbolic sectors arising in the second way.

[Figure 3. Hyperbolic sector, c > 0, νθ + 1 > 0; marked points $\alpha = \sqrt{N/a}$ and $\beta = \sqrt{N/c}$.]

[Figure 4. Hyperbolic sector, c > 0, νθ + 1 < 0; marked points $\alpha = \sqrt{N/a}$ and $\beta = \sqrt{N/c}$.]

[Figure 5. Hyperbolic sector, c < 0; marked point $\alpha = \sqrt{N/a}$.]

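We need to calculate the area of this sector, which we will do by using the change of
variables
$$\xi = x - \theta y, \qquad \eta = x - \theta' y.$$
Because of the fact that

$$ax^2 + bxy + cy^2 = a(x - \theta y)(x - \theta' y),$$

it follows that the hyperbolic sector is mapped onto the sector in the η, ξ plane given by
$$\eta\xi \le \frac{N}{a}, \qquad \xi > 0, \qquad \xi \le \eta < \varepsilon^2\xi,$$
or equivalently,
$$0 < \xi \le \sqrt{\frac{N}{a}}, \qquad \xi \le \eta < \min\Big(\varepsilon^2\xi,\ \frac{N}{a\xi}\Big).$$
The Jacobian of the change of variables is
$$\frac{\sqrt{d}}{a},$$
hence, upon setting $\xi_1 = \varepsilon^{-1}\sqrt{N/a}$, the area of the sector in the x, y plane is
$$\frac{a}{\sqrt{d}} \left( \int_0^{\xi_1} (\varepsilon^2\xi - \xi)\,d\xi + \int_{\xi_1}^{\varepsilon\xi_1} \Big(\frac{N}{a\xi} - \xi\Big)\,d\xi \right) = \frac{N\log\varepsilon}{\sqrt{d}}.$$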
Following the argument that we outlined when d < 0, we conclude that the number of integer
points inside the hyperbolic sector is asymptotic to
$$\frac{1}{d^2}\,\frac{\log\varepsilon}{\sqrt{d}}\,N$$
as N → +∞. When this is multiplied by dϕ(d), in order to account for the number of
choices of the pair $(x_0, y_0)$, the value of κ(d) is now
$$\frac{\varphi(d)}{d}\,\frac{\log\varepsilon}{\sqrt{d}},$$
hence we obtain the conclusion (ii) of Theorem 8.4 from (8) using this value of κ(d). QED
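
Conclusion (ii) can also be explored numerically. Solving it for the class number gives $h(d) = \sqrt{d}\,L(1, \chi)/\log\varepsilon$, and the Python sketch below evaluates the right-hand side for d = 5, 8, 12, with L(1, χ) replaced by a partial sum over complete periods of χ and with ε obtained from a plain search for the minimal positive solution of Pell's equation. The search, the Kronecker-symbol evaluation of χ, and all function names are our own illustrative choices and are not taken from the text.

```python
from math import isqrt, log, sqrt


def kronecker(d, m):
    """Kronecker symbol (d/m) for an integer d and a positive integer m."""
    if d % 2 == 0 and m % 2 == 0:
        return 0
    result = 1
    while m % 2 == 0:                 # contribution of the prime 2
        m //= 2
        if d % 8 in (3, 5):
            result = -result
    a = d % m                         # Jacobi symbol (a/m) for odd m
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if m % 8 in (3, 5):
                result = -result
        a, m = m, a
        if a % 4 == 3 and m % 4 == 3:
            result = -result
        a %= m
    return result if m == 1 else 0


def minimal_pell_solution(d):
    """Smallest u >= 1, with t > 0, such that t^2 - d*u^2 = 4."""
    u = 1
    while True:
        t_squared = d * u * u + 4
        t = isqrt(t_squared)
        if t * t == t_squared:
            return t, u
        u += 1


def class_number_estimate(d, periods=20_000):
    """sqrt(d) * L(1, chi) / log(eps), with L(1, chi) truncated after whole periods."""
    chi = [0] + [kronecker(d, m) for m in range(1, d)]
    L = sum(chi[m % d] / m for m in range(1, periods * d + 1))
    t0, u0 = minimal_pell_solution(d)
    eps = (t0 + u0 * sqrt(d)) / 2
    return sqrt(d) * L / log(eps)


for d in (5, 8, 12):
    print(d, minimal_pell_solution(d), round(class_number_estimate(d), 4))
```

The estimates should be close to 1, 1, and 2, respectively. The value 2 for d = 12 reflects the fact that h(d) counts classes of forms under modular substitutions, which corresponds to narrow equivalence of ideals.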

6. The Class-Number Formula and the Class Number of Quadratic Fields

As we alluded to in the introduction to this chapter, the class-number formula can also
be used to count ideal classes in quadratic number fields in exactly the same way that it
counts equivalence classes of quadratic forms, and this provides another good explanation of
the positivity of L-functions at s = 1. In this section, we will discuss in more detail exactly
how this goes.
Recall from section 11 of Chapter 3 that if K is an algebraic number field and R is the
ring of algebraic integers in K, then the ideals I and J of R are said to be equivalent if
there exist nonzero elements α and β of R such that αI = βJ. The number of equivalence
classes of ideals with respect to this equivalence relation is finite, and that number is called
the class number of R.
As we saw in sections 11 and 12 of Chapter 3, there is a very close connection between
the theory of ideals in quadratic number fields and the classical theory of quadratic forms.
For the reader’s convenience, we will recapitulate the principal features of that connection
here.
Let d be a fundamental discriminant. Then d is either odd and square-free, or d is divisible
by 4 and d/4 is square-free. If we let m = d in the former case and m = d/4 in the latter
case, then the quadratic number field F generated over Q by $\sqrt{m}$ has discriminant d. In
order to relate the ideal theory of F to quadratic forms of discriminant d, as we did in section
11 of Chapter 3, we will need to recall the definition of the norm of an element of F . One
starts by taking an integral basis $\{\omega_1, \omega_2\}$ of F , letting k ∈ F , and then expressing k uniquely
as $k = r\omega_1 + s\omega_2$, for some ordered pair (r, s) ∈ Q × Q. We then define the norm of k as
the number defined by $N(k) = (r\omega_1 + s\omega_2)(r\omega_1' + s\omega_2')$, where $\{\omega_1', \omega_2'\}$ denotes the algebraic
conjugates of $\{\omega_1, \omega_2\}$ over Q. This definition of N(k) does not depend on the integral
basis used to define it, hence selecting either the standard integral basis $\{1, \tfrac{1}{2}(1 + \sqrt{m}\,)\}$ or
$\{1, \sqrt{m}\,\}$, depending on whether m is or, respectively, is not congruent to 1 mod 4 (section
11, Chapter 3), a simple calculation shows that if d < 0 then N(k) > 0 whenever $k \neq 0$.
If I is an ideal in $R = \mathcal{R} \cap F$ then the norm mapping N maps I into Z, and if {α, β} is
an integral basis of I and ξ = xα + yβ, (x, y) ∈ Z × Z, then we also have that

$$N(\xi) = (x\alpha + y\beta)(x\alpha' + y\beta').$$

The right-hand side of this equation defines a quadratic form q(x, y) with integer coefficients.
When we recall that the norm N(I) of I is defined as the cardinality of the finite quotient
ring R/I (section 1 of Chapter 5), it can be shown that all coefficients of q(x, y) are divisible
by N(I), and that if we let
$$\frac{N(\xi)}{N(I)} = ax^2 + bxy + cy^2,$$
then this form is in Q(d). The equivalence class of Q(d) which contains this form can be
shown to be independent of the integral basis of I which is used to define the form.
As we discussed in section 12 of Chapter 3, the map which sends an ideal I of R to the
quadratic form $N(\xi)/N(I) = ax^2 + bxy + cy^2$ induces a bijective correspondence between
the equivalence classes of ideals of R in the narrow sense and the forms in a representative
system of discriminant d (Theorem 3.22). This correspondence will now be used to express
the class number of d in terms of the class number of R.
Recall from section 12 of Chapter 3 that if I and J are ideals of R then I is said to be
equivalent to J in the narrow sense if there exists k ∈ F such that N(k) > 0 and I = kJ. If
d < 0 then there is no difference between ordinary equivalence of ideals and equivalence in
the narrow sense because the values of N are all positive on F \ {0}. When d > 0 then the
difference between these two equivalence relations is mediated by the units in R. If R has a
unit of norm −1 then there is also no difference between the two equivalence relations. On
the other hand, if d > 0 and there is no unit with a negative norm then it can be shown that
each ideal class in the ordinary sense is the union of exactly two ideal classes in the narrow
sense (section 12, Chapter 3).
It follows that if we denote the class number of R by $h_1(d)$ and recall that h(d) denotes
the class number of d as determined from the classes of forms in Q(d), then $h(d) = h_1(d)$
whenever either d < 0 or d > 0 and R has a unit of norm −1, and $h(d) = 2h_1(d)$ otherwise.

The parameters w and $\varepsilon = \tfrac{1}{2}(t_0 + u_0\sqrt{d}\,)$ which occur in the class-number formula can
be calculated by means of the units in R. When d < 0, the parameter w is defined by the
values as stipulated for it by (2), and those values are also the number of roots of unity in
R. As for ε, which enters the class-number formula when d > 0, its value can be calculated
by the fundamental unit in the group of units U(R) of R. Recall from section 12 of Chapter
3 that when d > 0, there is a unit ϖ of R such that

$$U(R) = \{\pm\varpi^n : n \in \mathbb{Z}\}.$$

If ϖ is chosen to exceed 1 then it is uniquely determined as a generator of U(R) in this sense
and is called the fundamental unit of R. If ϖ is the fundamental unit, then it can be shown
that ε = ϖ if N(ϖ) = 1 and $\varepsilon = \varpi^2$ if N(ϖ) = −1.
Combining all of these results leads to the conclusion that

$$h(d) = h_1(d), \quad \text{if } d < 0,$$

and
$$h(d)\log\varepsilon = 2h_1(d)\log\varpi, \quad \text{if } d > 0.$$

When these equations are combined with Dirichlet’s class-number formula it follows that if
χ is a real primitive Dirichlet character of modulus b and d = χ(−1)b, then
$$L(1, \chi) = \frac{2\pi}{w\sqrt{|d|}}\, h_1(d), \quad \text{if } d < 0,$$
and
$$L(1, \chi) = \frac{2\log\varpi}{\sqrt{d}}\, h_1(d), \quad \text{if } d > 0.$$
Using the first of these formulae for L(1, χ), we can now calculate the quadratic excesses
in Theorems 7.2, 7.3, and 7.4 in terms of class numbers of quadratic fields. If p ≡ 3 mod 8
then
$$q(0, p/2) = 3h_1(-p),$$
if p ≡ 7 mod 8 then
$$q(0, p/2) = h_1(-p),$$
if p ≡ 1 mod 4 then
$$q(0, p/4) = \tfrac{1}{2}\,h_1(-p),$$
if p > 3 and p ≡ 1 mod 4 then
$$q(0, p/3) = \tfrac{1}{2}\,h_1(-3p),$$
if p > 3 and p ≡ 7 mod 12 then

$$q(0, p/3) = 2h_1(-p),$$

and if p > 3 and p ≡ 11 mod 12 then

$$q(0, p/3) = h_1(-p).$$
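
The first two of these identities can be spot-checked by machine. The Python sketch below assumes that q(0, p/2) denotes the number of quadratic residues minus the number of non-residues of p among the integers in the interval (0, p/2), as in Chapter 7, and it computes $h_1(-p)$ for p ≡ 3 mod 4 by counting reduced positive definite forms of discriminant −p; this uses the equality $h(d) = h_1(d)$ for d < 0 together with a standard counting technique that is not developed in this chapter. The function names are ours, and the check is restricted to primes p > 3.

```python
from math import gcd


def legendre(a, p):
    """Legendre symbol of a modulo the odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def class_number(d):
    """Number of classes of positive definite primitive forms of discriminant d < 0,
    counted by reduced forms: -a < b <= a <= c, with b >= 0 when a = c."""
    h = 0
    b = d % 2
    while 3 * b * b <= -d:
        ac = (b * b - d) // 4
        a = max(b, 1)
        while a * a <= ac:
            if ac % a == 0 and gcd(gcd(a, b), ac // a) == 1:
                c = ac // a
                h += 1 if (b == 0 or a == b or a == c) else 2
            a += 1
        b += 2
    return h


def excess_first_half(p):
    """Residues minus non-residues of p among 1, 2, ..., (p - 1)/2."""
    return sum(legendre(x, p) for x in range(1, (p + 1) // 2))


for p in (7, 11, 19, 23, 31, 43, 47, 59, 67, 71):
    factor = 3 if p % 8 == 3 else 1
    print(p, p % 8, excess_first_half(p), factor * class_number(-p))
```

For each prime in the list, the last two printed columns should coincide.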

A noteworthy consequence of these equations is that $h_1(-p)$ is even if p ≡ 1 mod 4 and
$h_1(-3p)$ is even if p > 3 and p ≡ 1 mod 4. As a matter of fact, it is an interesting
open problem to determine if similar sums of this type which are positive are always linear
combinations of class numbers; for further information on sums and congruences for L-
functions, consult Urbanowicz and Williams [56] (I thank an anonymous referee for calling
my attention to this problem and this reference).

7. A Character-Theoretic Proof of Quadratic Reciprocity

In this section, we will present our final proof of quadratic reciprocity. The main idea
of this approach is to define Gauss sums for a general Dirichlet character and then use
computations with these sums and the structure theory for Dirichlet characters that was
discussed in section 1 to derive a formula which calculates the value of a real primitive
character χ at odd primes p as an appropriate value of the Legendre symbol χp . If we
then set χ equal to χq for an odd prime q distinct from p into that formula, the LQR will
immediately result from that choice of χ. This argument reveals how quadratic reciprocity
is caused by the fact that the Legendre symbols are real primitive characters which can
reproduce the value at the primes of an arbitrary real primitive character. We follow the
exposition as recorded in Cohen [2], sections 2.1 and 2.2.
Let χ be a Dirichlet character of modulus b, let a ∈ Z, set $\zeta_b = \exp(2\pi i/b)$, and define
the Gauss sum of χ at a by
$$G(\chi, a) = \sum_{x \bmod b} \chi(x)\,\zeta_b^{ax},$$
where the sum is taken over all x in a complete set of ordinary residues mod b. It is clear,
because of the periodicity of χ(x) and $\zeta_b^{ax}$ in x, that this sum does not depend on the set of
ordinary residues mod b used to define it. We first encountered Gauss sums in Chapter 3;
in particular, the proof of Lemma 3.15 can be easily modified to establish

Lemma 8.5. If a and b are relatively prime then

$$G(\chi, a) = \overline{\chi(a)}\,G(\chi, 1),$$

where the bar over χ(a) denotes the complex conjugate.

The next lemma records a useful condition on a which implies that G(χ, a) = 0.

Lemma 8.6. Let d = gcd(a, b) and suppose that b/d is not an induced modulus of χ.
Then G(χ, a) = 0.

Proof. Because b/d is not an induced modulus, there is an integer m such that m ≡ 1
mod b/d, gcd(m, b) = 1, and $\chi(m) \neq 1$. Upon letting $m^{-1}$ denote the inverse of m mod b,
we have that
$$\chi(m)\,G(\chi, a) = \sum_{x \bmod b} \chi(mx)\,\zeta_b^{ax} = \sum_{y \bmod b} \chi(y)\,\zeta_b^{aym^{-1}}.$$

Because m ≡ 1 mod b/d, it follows that
$$aym^{-1} = \frac{a}{d}\,dym^{-1} \equiv ay \bmod b,$$
hence
$$\chi(m)\,G(\chi, a) = \sum_{y \bmod b} \chi(y)\,\zeta_b^{ay} = G(\chi, a),$$

so that G(χ, a) = 0, as $\chi(m) \neq 1$. QED


The next result shows that when χ is primitive, Lemma 8.5 holds for all integers a, and
not just the integers that are relatively prime to b.

Lemma 8.7. If χ is primitive then

$$G(\chi, a) = \overline{\chi(a)}\,G(\chi, 1), \quad \text{for all } a \in \mathbb{Z}.$$


Proof. We need only check this equation if d = gcd(a, b) > 1. But then χ(a) = 0, and
because χ is primitive and b/d < b, b/d cannot be an induced modulus of χ, hence G(χ, a) = 0 by Lemma 8.6. QED
If χ is a primitive character then we can use Lemma 8.7 to calculate |G(χ, 1)| as follows:
$$\begin{aligned}
|G(\chi, 1)| &= \Big( G(\chi, 1)\,\overline{G(\chi, 1)} \Big)^{1/2}\\
&= \Big( \sum_{a \bmod b} \overline{\chi(a)}\,G(\chi, 1)\,\zeta_b^{-a} \Big)^{1/2}\\
&= \Big( \sum_{a \bmod b} G(\chi, a)\,\zeta_b^{-a} \Big)^{1/2}, \quad \text{by Lemma 8.7}\\
&= \Big( \sum_{x=1}^{b} \chi(x) \Big( \sum_{a=1}^{b} \zeta_b^{a(x-1)} \Big) \Big)^{1/2}\\
&= \sqrt{b},
\end{aligned}$$

where the last line follows because the inner sum here is a geometric series, which is 0 if b
does not divide x − 1, i.e., if $x \neq 1$, and is b if x = 1. This calculation now has the following
corollary, which is a direct generalization of Theorem 3.14.

Corollary 8.8. If χ is a primitive character of modulus b then

$$G(\chi, 1)\,G(\overline{\chi}, 1) = \chi(-1)b.$$

In particular, if χ is also real then

$$G(\chi, 1)^2 = \chi(-1)b.$$

Proof. From Lemma 8.5 (applied to $\overline{\chi}$), we have that

$$\overline{G(\chi, 1)} = G(\overline{\chi}, -1) = \chi(-1)\,G(\overline{\chi}, 1).$$

Now multiply this equation by χ(−1)G(χ, 1) and then calculate that

$$\chi(-1)b = \chi(-1)\,G(\chi, 1)\,\overline{G(\chi, 1)} = \chi(-1)^2\,G(\chi, 1)\,G(\overline{\chi}, 1) = G(\chi, 1)\,G(\overline{\chi}, 1).$$

QED
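
Lemma 8.7 and Corollary 8.8 are easy to test in floating-point arithmetic when χ is the Legendre symbol $\chi_p$ of an odd prime p, which is a real primitive character of modulus p. The Python sketch below, with function names of our own choosing, evaluates the Gauss sums directly from the definition.

```python
import cmath


def legendre(a, p):
    """Legendre symbol of a modulo the odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def gauss_sum(chi, b, a=1):
    """G(chi, a): the sum of chi(x) * zeta_b^(a*x) over x = 0, 1, ..., b - 1."""
    return sum(chi(x) * cmath.exp(2j * cmath.pi * ((a * x) % b) / b)
               for x in range(b))


for p in (3, 5, 7, 11, 13):
    chi = lambda x, p=p: legendre(x, p)
    G = gauss_sum(chi, p)
    # Corollary 8.8 predicts G^2 = chi(-1) * p, and the preceding calculation |G| = sqrt(p).
    print(p, G * G, legendre(-1, p) * p, abs(G))
```

Up to rounding error, the second and third printed values should agree for each p, and the last value should be $\sqrt{p}$.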
The next result shows that the value of a real primitive character at the odd prime p can
be calculated by the Legendre symbol χp of p; it is a direct generalization of equation (17)
in Chapter 3. Note also that its proof follows the same ideas used in the third proof of the
LQR that we gave in Chapter 3.
Theorem 8.9. If χ is a real primitive character of modulus b then for all odd primes p,

$$\chi(p) = \chi_p\big(\chi(-1)b\big).$$

Proof. Since both sides of the equation to be verified are 0 when p divides b, we may
assume that p is an odd prime not dividing b. Let R denote the ring of algebraic integers.
Since the quotient ring R/pR has characteristic p, the map which sends an element of that
ring to its p-th power is additive, and we also have that $\zeta_b \in R$. Hence
$$G(\chi, 1)^p \equiv \sum_{x \bmod b} \chi(x)^p\,\zeta_b^{px} \bmod pR.$$
As χ is real and p is odd, $\chi(x)^p = \chi(x)$, and so it follows from Lemma 8.5 that
$$G(\chi, 1)^p \equiv \chi(p)\,G(\chi, 1) \bmod pR.$$
On the other hand, χ is real and primitive, hence by Corollary 8.8, $G(\chi, 1)^2 = \chi(-1)b$, and
so multiplication of this congruence by G(χ, 1) yields the congruence
$$\chi(-1)b\,\Big( (\chi(-1)b)^{\frac{1}{2}(p-1)} - \chi(p) \Big) \equiv 0 \bmod pR.$$
If we now multiply this congruence by the inverse of χ(−1)b in Z/pZ to obtain
$$(\chi(-1)b)^{\frac{1}{2}(p-1)} \equiv \chi(p) \bmod pR$$
and then use the congruence
$$(\chi(-1)b)^{\frac{1}{2}(p-1)} \equiv \chi_p(\chi(-1)b) \bmod p\mathbb{Z}$$
that we get from Euler’s criterion, it follows that
$$\chi(p) \equiv \chi_p(\chi(-1)b) \bmod pR.$$
Hence
$$\frac{\chi(p) - \chi_p(\chi(-1)b)}{p} \in R \cap \mathbb{Q} = \mathbb{Z}.$$
Because the numerator of this rational integer is either 0 or ±2 and p is odd, it follows that the
numerator must be 0, whence the conclusion of the theorem. QED
The LQR can now be deduced immediately from Theorem 8.9 and Theorem 2.4 of Chapter 2.
Let p and q be distinct odd primes. We take $\chi = \chi_q$ in Theorem 8.9 and then apply
Theorem 2.4 twice to calculate that

\[
\begin{aligned}
\chi_q(p) &= \chi_p\big(\chi_q(-1)\,q\big) \\
&= \chi_p\big((-1)^{\frac{1}{2}(q-1)}\,q\big) \\
&= \chi_p(-1)^{\frac{1}{2}(q-1)}\,\chi_p(q) \\
&= (-1)^{\frac{1}{2}(p-1)\cdot\frac{1}{2}(q-1)}\,\chi_p(q).
\end{aligned}
\]
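The identity obtained from Theorem 8.9 with $\chi = \chi_q$ can be checked numerically for small primes. The following is only a minimal sketch, assuming Python 3; the function names `legendre` and `reciprocity_via_theorem_8_9` and the list `odd_primes` are ours and are introduced purely for illustration. The Legendre symbol is computed by Euler's criterion.

```python
def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is 0, 1, or p - 1


def reciprocity_via_theorem_8_9(p, q):
    """Right-hand side chi_p(chi_q(-1) * q) of Theorem 8.9 with chi = chi_q."""
    return legendre(legendre(-1, q) * q, p)


# Compare chi_q(p) with chi_p(chi_q(-1) q) for all pairs of distinct primes below.
odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
checks = [
    legendre(p, q) == reciprocity_via_theorem_8_9(p, q)
    for p in odd_primes
    for q in odd_primes
    if p != q
]
print(all(checks))
```

A finite check of this kind proves nothing, of course; it only illustrates how the two sides of the identity are evaluated.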
CHAPTER 9

Quadratic Residues and Non-Residues in Arithmetic Progression

The distribution problem for residues and non-residues has been intensively studied for
175 years using a rich variety of formulations and techniques. The work done in Chapter
7 gave a window through which we viewed one of these formulations and also saw a very
important technique used to study it. Another problem that has been studied almost as long
and just as intensely is concerned with the arithmetic structure of residues and non-residues.
In this chapter, we will sample one aspect of that very important problem by studying when
residues and non-residues form very long sequences in arithmetic progression. The first
major advance in that problem came in 1939 when Harold Davenport proved the existence
of residues and non-residues which form arbitrarily long sets of consecutive integers. As an
introduction to the circle of ideas on which the work of this chapter is based, we briefly
discuss Davenport’s results and the technique that he used to obtain them in section 1.
Davenport’s approach uses another application of the Dirichlet-Hilbert trick, which we used
in the proofs of Theorems 4.12 and 5.13 presented in Chapter 5, together with an ingenious
estimate of the absolute value of certain Legendre-symbol sums with polynomial values in
their arguments. Davenport’s technique is quite flexible, and so we will adapt it in order to
detect long sets of residues and non-residues in arithmetic progression. In section 2, we will
formulate our results precisely as a series of four problems which will eventually be solved in
sections 4 and 10. This will require the estimation of the sums of values of Legendre symbols
with polynomial arguments à la Davenport, which estimates we will derive in section 3 by
making use of a very important result of André Weil concerning the number of rational points
on a nonsingular algebraic curve over Z/pZ. In addition to these estimates, we will also need
to calculate a term which will be shown to determine the asymptotic behavior of the number
of sets of residues or non-residues which form long sequences of arithmetic progressions, and
this calculation will be performed in sections 6-9. Here we will see how techniques from
combinatorial number theory are applied to study residues and non-residues. In section 11,
an interesting class of examples will be presented, and we will use it to illustrate exactly
how the results obtained in section 10, together with some results of section 11, combine to
describe asymptotically how many sets there are of residues or non-residues which form long
arithmetic progressions. Finally, the last section of this chapter discusses a result which, in

certain interesting situations, calculates the asymptotic density of the set of primes which
have residues and non-residues which form long sets of specified arithmetic progressions.

1. Long Sets of Consecutive Residues and Non-Residues

The following question began to attract interest in the early 1900’s: if s is a fixed positive
integer and p is sufficiently large, does there exist an n ∈ [1, ∞) such that {n, n + 1, . . . , n +
s − 1} is a set of residues (respectively, non-residues) of p inside [1, p − 1], i.e., for all
sufficiently large primes p, does [1, p − 1] contain arbitrarily long sets of consecutive residues,
(respectively, non-residues) of p? For s = 2, 3, 4, and 5, various authors showed that the
answer is yes; in fact it was shown that if Rs (p) (respectively, Ns (p)) denotes the number
of sets of s consecutive residues (respectively, non-residues) of p inside [1, p − 1] then as
p → +∞,

\[ R_s(p) \sim 2^{-s} p \sim N_s(p), \quad \text{for } s = 2, 3, 4, \text{ and } 5. \tag{1} \]

This shows in particular that for s = 2, 3, 4, and 5, not only are Rs (p) and Ns (p) both
positive, but as p → +∞, they both tend to +∞. Based on this evidence and extensive
numerical calculations, the speculation was that (1) in fact is valid without any restriction
on s, and in 1939, Harold Davenport [5] proved that this is indeed the case.
Davenport established the validity of (1) in general by yet another application of the
Dirichlet-Hilbert trick that was used in the proof of Theorems 4.12 and 5.13. Let $F_p$ denote
the field Z/pZ of p elements. Then U(p) can be viewed as the group of nonzero elements of
$F_p$, and if ε ∈ {−1, 1} then, à la Dirichlet-Hilbert, the sum

\[ 2^{-s} \sum_{x=1}^{p-s} \prod_{i=0}^{s-1} \big( 1 + \varepsilon\chi_p(x+i) \big) \]

is $R_s(p)$ (respectively, $N_s(p)$) when ε = 1 (respectively, ε = −1). Davenport rewrote this sum
as

\[ 2^{-s}(p-s) + 2^{-s} \sum_{\emptyset \ne T \subseteq [0,s-1]} \varepsilon^{|T|} \left( \sum_{x=1}^{p-s} \chi_p\Big( \prod_{i \in T} (x+i) \Big) \right), \tag{2} \]
and then proceeded to estimate the size of the second term of the expression (2). This term
is a sum of terms of the form

\[ \pm \sum_{x=1}^{p-s} \chi_p\big( f(x) \big), \]

where f is a monic polynomial of degree at most s over $F_p$ with distinct roots in $F_p$. Using
results from the theory of certain L-functions due to Hasse, Davenport found absolute
constants C > 0 and 0 < σ < 1 such that

\[ \left| \sum_{x=1}^{p-s} \chi_p\big( f(x) \big) \right| \le C s p^{\sigma}, \quad \text{for all } p \text{ large enough.} \]

This estimate, the heart of Davenport’s argument, implies that the modulus of the second
term in (2) does not exceed $C s p^{\sigma}$, and so

\[ |R_s(p) - 2^{-s}(p-s)| \le C s p^{\sigma}, \quad \text{for all } p \text{ large enough.} \]

Hence

\[ \left| \frac{R_s(p)}{2^{-s}p} - 1 \right| \le \frac{s}{p} + C s 2^{s} p^{\sigma - 1} \to 0 \quad \text{as } p \to +\infty. \]

The same argument also works for $N_s(p)$.
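To see the Dirichlet-Hilbert sum at work, one can compare it with a direct count of runs of s consecutive residues or non-residues. The sketch below assumes Python 3; the function names and the sample values p = 10007 and s = 3 are our own illustrative choices and do not come from Davenport's paper. Since every factor $1 + \varepsilon\chi_p(x+i)$ is either 0 or 2 when x + i lies in [1, p − 1], each product is either 0 or $2^s$, which is why the division by $2^s$ is exact.

```python
def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def count_runs_brute_force(p, s, eps):
    """Number of x in [1, p - s] with chi_p(x + i) == eps for every i in [0, s - 1]."""
    return sum(
        all(legendre(x + i, p) == eps for i in range(s))
        for x in range(1, p - s + 1)
    )


def count_runs_by_product(p, s, eps):
    """The sum 2^{-s} * sum_x prod_i (1 + eps * chi_p(x + i)) over x in [1, p - s]."""
    total = 0
    for x in range(1, p - s + 1):
        term = 1
        for i in range(s):
            term *= 1 + eps * legendre(x + i, p)
        total += term
    return total // 2**s  # every term is 0 or 2^s, so this division is exact


p, s = 10007, 3  # hypothetical sample values; 10007 is prime
R = count_runs_brute_force(p, s, 1)
N = count_runs_brute_force(p, s, -1)
print(R, N, p / 2**s)
```

For these sample values the two counts printed should both be close to $2^{-s}p$, in keeping with (1).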


It transpires that Davenport’s technique is quite flexible and can be used to investigate
the occurrence of residues and non-residues with specific arithmetical properties. We are
going to use it to detect arbitrarily long arithmetic progressions of residues and non-residues
of a prime.

2. Long Sets of Residues and Non-Residues in Arithmetic Progression

Our point of departure from Davenport’s work is to notice that the sequence {x, x +
1, . . . , x + s − 1} of s consecutive positive integers is an instance of the sequence {x, x +
b, . . . , x + b(s − 1)}, an arithmetic progression of length s and common difference b, with
b = 1. Thus, if (b, s) ∈ [1, ∞) × [1, ∞), and we set

\[ AP(b; s) = \Big\{ \{n + ib : i \in [0, s-1]\} : n \in [1, \infty) \Big\}, \]

the family of all arithmetic progressions of length s and common difference b, it is natural
to inquire about the asymptotics as p → +∞ of the number of elements of AP(b; s) that are
sets of quadratic residues (respectively, non-residues) of p that occur inside [1, p − 1]. We
also consider the following related question: if a ∈ [0, ∞), set

\[ AP(a, b; s) = \Big\{ \{a + b(n+i) : i \in [0, s-1]\} : n \in [1, \infty) \Big\}, \]

the family of all arithmetic progressions of length s taken from a fixed arithmetic progression

\[ AP(a, b) = \{a + bn : n \in [1, \infty)\}. \]

We then ask for the asymptotics of the number of elements of AP(a, b; s) that are sets of
quadratic residues (respectively, non-residues) of p that occur inside [1, p − 1]. Solutions
of these problems will provide interesting insights into how often quadratic residues and
non-residues appear as arbitrarily long arithmetic progressions.
We will in fact consider the following generalization of these questions. For each m ∈
[1, ∞), let

\[ \mathbf{a} = (a_1, \dots, a_m) \quad \text{and} \quad \mathbf{b} = (b_1, \dots, b_m) \]

be m-tuples of nonnegative integers such that $(a_i, b_i) \ne (a_j, b_j)$, for all i ≠ j. Let s ∈ [1, ∞).
When the $b_i$’s are distinct and positive, we set

\[ AP(\mathbf{b}; s) = \Big\{ \bigcup_{j=1}^{m} \{n + ib_j : i \in [0, s-1]\} : n \in [1, \infty) \Big\}, \]

and when the $b_i$’s are all positive (but not necessarily distinct), we set

\[ AP(\mathbf{a}, \mathbf{b}; s) = \Big\{ \bigcup_{j=1}^{m} \{a_j + b_j(n+i) : i \in [0, s-1]\} : n \in [1, \infty) \Big\}. \]

The elements of $AP(\mathbf{b}; s)$ are formed by taking an n ∈ [1, ∞), then selecting an arithmetic
progression of length s with initial term n and common difference $b_i$ for each i = 1, . . . , m,
and then taking the union of the arithmetic progressions so chosen. Elements of $AP(\mathbf{a}, \mathbf{b}; s)$
are obtained by taking an n ∈ [1, ∞), choosing from the arithmetic progression $\{a_i + b_i \ell :
\ell \in [1, \infty)\}$ the arithmetic progression with initial term $a_i + b_i n$ and length s for each
i = 1, . . . , m, and then forming the union of the arithmetic progressions chosen in that way.
If m = 1 then we recover our original sets AP(b; s) and AP(a, b; s). We now pose

Problem 1 (respectively, Problem 2): determine the asymptotics as p → +∞
of the number of elements of $AP(\mathbf{b}; s)$ (respectively, $AP(\mathbf{a}, \mathbf{b}; s)$) that
are sets of quadratic residues of p inside [1, p − 1].

We also pose as Problem 3 and Problem 4 the problems which result when the phrase
“quadratic residues” in the statements of Problems 1 and 2 is replaced by the phrase
“quadratic non-residues”.
As we saw in section 1, the main step of Davenport’s solution to the problem of finding
long sets of consecutive residues and non-residues was finding good estimates of sums of the
form

\[ \sum_{x=0}^{p-1} \chi_p\big( f(x) \big) \]

for certain polynomials $f(x) \in F_p[x]$. In order to solve Problems 1-4, we will use a variant of
Davenport’s reasoning, tailored to detect long sets of residues and non-residues in arithmetic
progression. Our techniques will also require appropriate estimation of these sums. Estimates
very much like Davenport’s will suffice to solve Problems 1 and 3, but, for technical reasons,
they are not sufficient to solve Problems 2 and 4. For those problems, we will need to obtain
good estimates, which do not depend on N, for sums of the form

\[ \sum_{x=0}^{N} \chi_p\big( f(x) \big), \]

where N can be any integer in [0, p − 1]. As preamble to our solution of Problems 1-4, we
show in the following section how some results of A. Weil can be used to efficiently and
elegantly derive the estimates that we require.

3. Weil Sums and their Estimation

In order to solve Problems 1-4, we will require estimates of sums of the form

\[ \sum_{x=1}^{N} \chi_p\big( f(x) \big), \tag{$*$} \]

where f is a polynomial in $F_p[x]$ and N is a fixed integer in [1, p − 1].


Suppose first that N = p − 1. In this case there is an elegant way to calculate the sum
(∗) in terms of the number of rational points on an algebraic curve over Fp .
If F is a field, $\bar{F}$ is an algebraic closure of F, and g(x, y) is a polynomial in two variables
with coefficients in F, then the set of points

\[ C = \{(x, y) \in \bar{F} \times \bar{F} : g(x, y) = 0\} \]

is an algebraic curve over F. A point (x, y) ∈ C is a rational point of C over F if (x, y) ∈ F × F.
If F is finite then the set of rational points on an algebraic curve over F is evidently finite,
and so the determination of the cardinality of the set of rational points is an interesting and
very important problem in combinatorial number theory. In 1948, A. Weil’s great treatise
[57] on the geometry of algebraic curves over finite fields was published, which contained,
among many other results of fundamental importance, an upper estimate of the number
of rational points in terms of $\sqrt{|F|}$ and certain geometric parameters associated with an
algebraic curve. The Weil bound has turned out to be very important for various problems
in number theory; in particular, we will now show how it can be employed to obtain good
estimates of the sums (∗) when N = p − 1.
Let $f \in F_p[x]$ and consider the algebraic curve C over $F_p$ defined by the polynomial

\[ y^2 - f(x). \]

We will calculate the so-called complete Weil sum

\[ \sum_{x=0}^{p-1} \chi_p\big( f(x) \big) \]

in terms of the number of rational points of C over $F_p$.


Let R(p) denote the set of rational points of C, i.e.,

\[ R(p) = \{(x, y) \in F_p \times F_p : y^2 = f(x)\}, \]

and let

\[ S_0 = \{x \in F_p : f(x) = 0\}, \]

\[ S_+ = \{x \in F_p \setminus S_0 : \chi_p(f(x)) = 1\}, \]

\[ S_- = \{x \in F_p \setminus S_0 : \chi_p(f(x)) = -1\}. \]

If $x \in S_+$ then there are exactly two solutions $\pm y_0 \ne 0$ of $y^2 = f(x)$ in $F_p$, hence $(x, \pm y_0) \in R(p)$.
Conversely, if (x, y) ∈ R(p) and y ≠ 0 then $0 \ne y^2 = f(x)$, hence $x \in S_+$ and $y = \pm y_0$.
The remaining points of R(p) are those of the form (x, 0), one for each $x \in S_0$.
We conclude that

\[ |R(p)| = |S_0| + 2|S_+|. \tag{2} \]

Because $F_p$ is the pairwise disjoint union of $S_0$, $S_+$, and $S_-$,

\[ |S_0| + |S_+| + |S_-| = p. \tag{3} \]

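Observe now that

\[ \sum_{x=0}^{p-1} \chi_p\big( f(x) \big) = |S_+| - |S_-|. \tag{4} \]

Equations (2), (3), and (4) imply

\[
\begin{aligned}
|R(p)| &= |S_0| + |S_+| + |S_-| + \sum_{x=0}^{p-1} \chi_p\big( f(x) \big) \\
&= p + \sum_{x=0}^{p-1} \chi_p\big( f(x) \big),
\end{aligned}
\]

i.e.,

\[ \sum_{x=0}^{p-1} \chi_p\big( f(x) \big) = |R(p)| - p. \tag{5} \]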

We are ready to apply Weil’s estimate of |R(p)|. In this case, Weil ([57], Corollaire IV.3)
proved that if $y^2 - f(x)$ is non-singular over $F_p$, which means essentially that f is monic of
degree at least 1 and there does not exist a polynomial $g \in F_p[x]$ such that $f = g^2$, then

\[ |R(p)| = 1 + p - r(p), \quad \text{where } 1 \le r(p) < d\sqrt{p},\ d = \text{degree of } f \tag{6} \]

(for an elementary proof of (6), see Schmidt [50], Theorem 2.2C). If $f \in F_p[x]$ is monic with
distinct roots in $F_p$ then f cannot be the square of a polynomial over $F_p$, and so $y^2 - f(x)$
is non-singular over $F_p$. Hence (5) and (6) imply

Theorem 9.1. (complete Weil-sum estimate) If $f \in F_p[x]$ is monic of degree d ≥ 1 and
f has distinct roots in $F_p$ then

\[ \left| \sum_{x=0}^{p-1} \chi_p\big( f(x) \big) \right| < d\sqrt{p}. \]
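The size of the cancellation asserted by Theorem 9.1 can be observed numerically. The sketch below assumes Python 3; the test polynomial $f(x) = x(x+1)(x+2)$, which is monic of degree 3 with the distinct roots 0, −1, −2 modulo any prime p ≥ 5, and the list of sample primes are our own hypothetical choices.

```python
import math


def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def f(x):
    """Hypothetical test polynomial: monic, degree 3, roots 0, -1, -2."""
    return x * (x + 1) * (x + 2)


DEGREE = 3


def complete_weil_sum(p):
    """Sum of chi_p(f(x)) over x = 0, ..., p - 1."""
    return sum(legendre(f(x), p) for x in range(p))


sample_primes = [101, 211, 307, 401, 503, 1009]
for p in sample_primes:
    # Compare the sum, which has p terms, with the bound d * sqrt(p).
    print(p, complete_weil_sum(p), DEGREE * math.sqrt(p))
```

Each sum has p terms of modulus at most 1, yet its absolute value stays below $d\sqrt{p}$.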

As its definition makes clear, a Weil sum is nothing more than the sum of a certain
sequence of 0’s and ±1’s. The content of Theorem 9.1 (and also Theorem 9.2 to follow)
is that for certain polynomials $f \in F_p[x]$, a remarkable cancellation occurs in the terms of
$\sum_{x=0}^{p-1} \chi_p\big( f(x) \big)$ so that the absolute value of this sum, ostensibly as large as p, is in fact less
than $d\sqrt{p}$, where d is the degree of f.
The work of Weil in [57] is another seminal development in modern number theory.
There Weil used methods from algebraic geometry to study number-theoretic properties
of curves, thereby founding the subject of arithmetic algebraic geometry. This not only
introduced important new techniques in both number theory and geometry, but it also led
to the formulation of innovative strategies for attacking a wide variety of problems which
until then had been intractable. Certainly one of the most spectacular examples of that is
the proof of Fermat’s Last Theorem by Andrew Wiles [60] in 1995 (with an able assist from
Richard Taylor [55]), which employed arithmetic algebraic geometry as one of its crucial
tools.
We now turn to the problem of estimating the sums (∗) when N < p − 1. An incomplete
Weil sum is a sum of the form

\[ \sum_{x=M}^{N} \chi_p\big( f(x) \big), \tag{$**$} \]

where $f \in F_p[x]$, and either 0 ≤ M ≤ N < p − 1 or 0 < M ≤ N ≤ p − 1. Our solution of
Problems 2 and 4 will require an estimate of incomplete Weil sums similar to the estimate
of complete Weil sums provided by Theorem 9.1, and also independent of the parameters M
and N. When f(x) = x, Pólya proved in 1918 that

\[ \left| \sum_{x=M}^{N} \chi_p(x) \right| \le \sqrt{p}\,\log p, \]

and Vinogradov in the same year showed that if χ is a non-principal Dirichlet character mod
m then

\[ \left| \sum_{x=M}^{N} \chi(x) \right| \le 6\sqrt{m}\,\log m. \]

Assuming the Generalized Riemann Hypothesis, in 1977 Montgomery and Vaughan improved
this to

\[ \left| \sum_{x=M}^{N} \chi(x) \right| \le C\sqrt{m}\,\log\log m. \]

By an earlier result of Paley (which holds without assuming GRH), this estimate, except
for the choice of the constant C, is best possible. It follows that an estimate of (∗∗)
that is independent of M and N will most likely behave more or less like (an absolute
constant) × $\sqrt{p}\,\log p$. In fact, we will prove

Theorem 9.2. (incomplete Weil-sum estimate) There exists $p_0 > 0$ such that the following
statement is true: if $p \ge p_0$, if $f \in F_p[x]$ is monic of degree d ≥ 1 with distinct roots in
$F_p$, and N ∈ [0, p − 1], then

\[ \left| \sum_{x=0}^{N} \chi_p\big( f(x) \big) \right| \le d(1 + \log p)\sqrt{p}. \]

Our proof of Theorem 9.2 will make use of certain homomorphisms of the additive group
of $F_p$ into the circle group, defined like so. Let

\[ e_p(\theta) = \exp\left( \frac{2\pi i \theta}{p} \right). \]

If n ∈ Z then we set

\[ \psi(m) = e_p(mn), \quad m \in Z. \]

Because ψ(m) = ψ(m′) whenever m ≡ m′ mod p, ψ defines a homomorphism of the additive
group of $F_p$ into the circle group, hence ψ is called an additive character mod p.
Now for each n ∈ Z, $\zeta = e_p(n)$ is a p-th root of unity, i.e., $\zeta^p = 1$, and from the
factorization

\[ (1 - \zeta)\Big( \sum_{k=0}^{p-1} \zeta^k \Big) = 1 - \zeta^p = 0 \]

we see that

\[ \sum_{k=0}^{p-1} \zeta^k = 0, \]

unless ζ = 1. Applying this with $\zeta = e_p(n - a)$, we obtain

\[ \frac{1}{p} \sum_{x=0}^{p-1} e_p(-ax)\,e_p(nx) = \begin{cases} 1, & \text{if } n \equiv a \bmod p, \\ 0, & \text{otherwise,} \end{cases} \tag{7} \]

the so-called orthogonality relations of the additive characters. These relations are quite
similar to the orthogonality relations satisfied by Dirichlet characters (section 4 of Chapter
4), the latter of which Dirichlet used to prove Lemma 4.7, on his way to the proof of Theorem
4.5.
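The orthogonality relations (7) are easy to confirm in floating-point arithmetic. This is a minimal sketch, assuming Python 3 and its standard `cmath` module; the function names `e_p` and `orthogonality_sum` are ours.

```python
import cmath


def e_p(theta, p):
    """The additive character value e_p(theta) = exp(2*pi*i*theta / p)."""
    return cmath.exp(2j * cmath.pi * theta / p)


def orthogonality_sum(a, n, p):
    """Left-hand side of (7): (1/p) * sum over x in [0, p - 1] of e_p(-a x) e_p(n x)."""
    return sum(e_p(-a * x, p) * e_p(n * x, p) for x in range(p)) / p


p = 7  # hypothetical sample prime
for a in range(3):
    # Row a: approximately 1 where n = a mod p, approximately 0 elsewhere.
    print([round(abs(orthogonality_sum(a, n, p)), 6) for n in range(p)])
```

The values are only approximately 0 or 1 because of rounding error, so any comparison should use a small tolerance.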
Proof of Theorem 9.2.
Let $f \in F_p[x]$ be monic of degree d ≥ 1, with distinct roots in $F_p$, let N ∈ [1, p − 1] and
set

\[ S(N) = \sum_{x=1}^{N} \chi_p\big( f(x) \big). \]

The strategy of this argument is to use the orthogonality relations of the additive characters
to express S(N) as a sum of terms λ(x)S(x), x = 0, 1, . . . , p − 1, where λ(x) is a sum of
additive characters and S(x) is a sum that is a “twisted” or “hybrid” version of a complete
Weil sum. Appropriate estimates of these terms are then made to obtain the conclusion of
Theorem 9.2.
We first decompose S(N) like so:

\[
\begin{aligned}
S(N) &= \sum_{k=1}^{N} \sum_{j=0}^{p-1} \delta_{jk}\,\chi_p\big( f(j) \big), \qquad \delta_{jk} = \begin{cases} 1, & \text{if } j = k, \\ 0, & \text{if } j \ne k, \end{cases} \\
&= \sum_{k=1}^{N} \sum_{j=0}^{p-1} \chi_p\big( f(j) \big) \left( \frac{1}{p} \sum_{x=0}^{p-1} e_p(xk)\,e_p(-xj) \right), \quad \text{by (7)} \\
&= \frac{1}{p} \sum_{x=0}^{p-1} \left( \sum_{k=1}^{N} e_p(xk) \right) \sum_{j=0}^{p-1} \chi_p\big( f(j) \big)\,e_p(-xj) \\
&= \frac{1}{p} \sum_{x=0}^{p-1} \lambda(x) S(x),
\end{aligned}
\]

where

\[ \lambda(x) = \sum_{k=1}^{N} e_p(xk), \qquad S(x) = \sum_{k=0}^{p-1} e_p(-xk)\,\chi_p\big( f(k) \big). \]
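The decomposition of S(N) into the terms λ(x)S(x) is an exact identity, and it can be confirmed numerically up to rounding error. The sketch below assumes Python 3; the names `partial_sum`, `lam`, and `hybrid_sum`, the sample polynomial f(t) = t(t + 1), and the values p = 31, N = 12 are all our own hypothetical choices.

```python
import cmath


def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def e_p(theta, p):
    """e_p(theta) = exp(2*pi*i*theta / p)."""
    return cmath.exp(2j * cmath.pi * theta / p)


def f(t):
    """Hypothetical sample polynomial t(t + 1)."""
    return t * (t + 1)


def partial_sum(N, p):
    """S(N): the sum of chi_p(f(k)) over k = 1, ..., N."""
    return sum(legendre(f(k), p) for k in range(1, N + 1))


def lam(x, N, p):
    """lambda(x): the sum of e_p(x k) over k = 1, ..., N."""
    return sum(e_p(x * k, p) for k in range(1, N + 1))


def hybrid_sum(x, p):
    """S(x): the sum of e_p(-x k) chi_p(f(k)) over k = 0, ..., p - 1."""
    return sum(e_p(-x * k, p) * legendre(f(k), p) for k in range(p))


p, N = 31, 12
lhs = partial_sum(N, p)
rhs = sum(lam(x, N, p) * hybrid_sum(x, p) for x in range(p)) / p
print(lhs, rhs)  # rhs is complex; it agrees with lhs up to rounding error
```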
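The next step is to estimate |λ(x)| and |S(x)|, x = 0, 1, . . . , p − 1. To get a useful estimate
of |λ(x)|, use the trigonometric identities

\[ \sum_{k=1}^{N} \cos k\theta = \frac{\sin\big( (N + \frac{1}{2})\theta \big) - \sin\big( \frac{\theta}{2} \big)}{2 \sin\big( \frac{\theta}{2} \big)}, \]

\[ \sum_{k=1}^{N} \sin k\theta = \frac{\cos\big( \frac{\theta}{2} \big) - \cos\big( (N + \frac{1}{2})\theta \big)}{2 \sin\big( \frac{\theta}{2} \big)}, \]

to calculate that

\[ |\lambda(x)| = \left| \frac{\sin(N\pi x/p)}{\sin(\pi x/p)} \right|. \]

Now use the estimate

\[ \frac{2|\theta|}{\pi} \le |\sin\theta|, \quad |\theta| \le \frac{\pi}{2}, \]

to get

\[ |\lambda(x)| \le \frac{p}{2|x|}, \quad 0 < |x| < \frac{p}{2}. \tag{8} \]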
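Both the closed form for |λ(x)| and the bound (8) can be checked directly. The following is a minimal sketch, assuming Python 3; the function names and the sample values p = 101, N = 37 are hypothetical choices of ours.

```python
import cmath
import math


def lam(x, N, p):
    """lambda(x): the sum of exp(2*pi*i*x*k / p) over k = 1, ..., N."""
    return sum(cmath.exp(2j * cmath.pi * x * k / p) for k in range(1, N + 1))


def lam_modulus_closed_form(x, N, p):
    """|sin(N*pi*x/p)| / |sin(pi*x/p)|, valid when p does not divide x."""
    return abs(math.sin(N * math.pi * x / p)) / abs(math.sin(math.pi * x / p))


p, N = 101, 37
half = (p - 1) // 2
# Largest discrepancy between |lambda(x)| and its closed form for 0 < |x| < p/2.
worst_gap = max(
    abs(abs(lam(x, N, p)) - lam_modulus_closed_form(x, N, p))
    for x in range(-half, half + 1)
    if x != 0
)
# Largest value of |lambda(x)| * 2|x| / p; the bound (8) says this is at most 1.
worst_ratio = max(
    lam_modulus_closed_form(x, N, p) * 2 * abs(x) / p
    for x in range(-half, half + 1)
    if x != 0
)
print(worst_gap, worst_ratio)
```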
The sums λ(x) and S(x) are periodic in x of period p, hence

\[ S(N) = \frac{1}{p} \sum_{|x| < p/2} \lambda(x) S(x). \tag{9} \]

Note that λ(0) = N, hence (8), (9) imply that

\[ \left| S(N) - \frac{N}{p} S(0) \right| \le \frac{1}{2} \sum_{0 < |x| < p/2} |x|^{-1} |S(x)|. \]
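An estimate of each sum S(x) is now required. These are so-called hybrid or mixed
Weil sums, and consist of terms $e_p(-xy)\chi_p\big( f(y) \big)$, y = 0, 1, . . . , p − 1, which are the terms
of the complete Weil sum $\sum_{y=0}^{p-1} \chi_p\big( f(y) \big)$ that are “twisted” by the multiplier $e_p(-xy)$. As
Perel’muter [44] proved in 1963 by means of the arithmetic algebraic geometry of Weil (see
also Schmidt [50], Theorem 2.2G for an elementary proof), this twisting causes no problems,
i.e., we have the estimate

\[ |S(x)| \le d\sqrt{p}, \quad \text{for all } x \in Z. \]

Hence

\[
\begin{aligned}
|S(N)| &\le \frac{N}{p} |S(0)| + \frac{1}{2} \sum_{0 < |x| < p/2} |x|^{-1} |S(x)| \\
&\le d\sqrt{p} \Big( 1 + \sum_{1 \le n < p/2} \frac{1}{n} \Big).
\end{aligned}
\]

Because

\[ \lim_{p \to +\infty} \Big( \gamma + \log\Big[ \frac{p}{2} \Big] - \sum_{1 \le n < p/2} \frac{1}{n} \Big) = 0, \]

where γ = 0.57721. . . is Euler’s constant, and because γ − log 2 < 0, the sum $\sum_{1 \le n < p/2} 1/n$ is
less than log p for all p sufficiently large, and so we are done. QED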
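The two estimates that drive this proof, the bound on the hybrid sums S(x) and the resulting bound of Theorem 9.2 on the incomplete sums, can be observed for particular primes. The sketch below assumes Python 3; the test polynomial $f(x) = x(x+1)(x+2)$ (monic of degree 3 with distinct roots modulo p ≥ 5), the function names, and the sample primes 101 and 1009 are our own hypothetical choices, and a check at two primes is an illustration only.

```python
import cmath
import math


def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def f(x):
    """Hypothetical test polynomial: monic, degree 3, roots 0, -1, -2."""
    return x * (x + 1) * (x + 2)


DEGREE = 3


def max_incomplete_sum(p):
    """Largest |sum of chi_p(f(x)) over x = 0, ..., N| as N runs over [0, p - 1]."""
    running, worst = 0, 0
    for x in range(p):
        running += legendre(f(x), p)
        worst = max(worst, abs(running))
    return worst


def max_hybrid_sum(p):
    """Largest |S(x)| over x in [0, p - 1], S(x) = sum_y e_p(-x y) chi_p(f(y))."""
    values = [legendre(f(y), p) for y in range(p)]
    return max(
        abs(sum(v * cmath.exp(-2j * cmath.pi * x * y / p) for y, v in enumerate(values)))
        for x in range(p)
    )


for p in (101, 1009):
    # Compare the worst incomplete sum with the bound d (1 + log p) sqrt(p).
    print(p, max_incomplete_sum(p), DEGREE * (1 + math.log(p)) * math.sqrt(p))
# Compare the worst hybrid sum with the bound d sqrt(p).
print(max_hybrid_sum(101), DEGREE * math.sqrt(101))
```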



4. Solution of Problems 1 and 3

Now that we have Theorem 9.1 at our disposal, Problems 1 and 3 can be solved, i.e., the
asymptotic behavior, as the prime p → +∞, of the number of elements of

\[ AP(\mathbf{b}; s) = \Big\{ \bigcup_{j=1}^{k} \{n + ib_j : i \in [0, s-1]\} : n \in [1, \infty) \Big\} \]

that are sets of residues (respectively, non-residues) of p inside [1, p − 1] can be determined.
We begin with some terminology and notation that will allow us to state our results precisely
and concisely. Let $W = \{z_1, \dots, z_r\}$ be a nonempty, finite subset of [0, ∞) with its elements
indexed in increasing order $z_i < z_j$ for i < j. We let

\[ \mathcal{S}(W) = \Big\{ \{n + z_i : i \in [1, r]\} : n \in [1, \infty) \Big\}, \]

the set of all shifts of W to the right by a positive integer. Let ε be a choice of signs for
[1, r], i.e., a function from [1, r] into {−1, 1}. If $S = \{n + z_i : i \in [1, r]\}$ is an element of
$\mathcal{S}(W)$, we will say that the pair (S, ε) is a residue pattern of p if

\[ \chi_p(n + z_i) = \varepsilon(i), \quad \text{for all } i \in [1, r]. \]

The set $\mathcal{S}(W)$ has the universal pattern property if there exists $p_0 > 0$ such that for all $p \ge p_0$
and for all choices of signs ε for [1, r], there is a set $S \in \mathcal{S}(W) \cap 2^{[1,p-1]}$ such that (S, ε) is
a residue pattern of p. $\mathcal{S}(W)$ hence has the universal pattern property if and only if for
all p sufficiently large, $\mathcal{S}(W)$ contains a set that exhibits any fixed but arbitrary pattern of
quadratic residues and non-residues of p. This property is inspired directly by Davenport’s
work: using this terminology, we can state the result of [5, Corollary of Theorem 5] for
quadratic residues as asserting that if s ∈ [1, ∞) then $\mathcal{S}([0, s-1])$ has the universal pattern
property, and moreover, for any choice of signs ε for [1, s], the cardinality of the set

\[ \{S \in \mathcal{S}([0, s-1]) \cap 2^{[1,p-1]} : (S, \varepsilon) \text{ is a residue pattern of } p\} \]

is asymptotic to $2^{-s}p$ as p → +∞. Note that if ε is the choice of signs that is either identically
1 or identically −1 on [1, s], then we recover the results that were discussed in section 1 of
this chapter.
Suppose now that there exist nontrivial gaps between elements of W, i.e., $z_{i+1} - z_i \ge 2$
for at least one i ∈ [1, r − 1]. It is then natural to search for elements S of $\mathcal{S}(W)$ such
that the quadratic residues (respectively, non-residues) of p inside [min S, max S] consist
precisely of the elements of S, so that S acts as the “support” of quadratic residues or
non-residues of p inside the minimal interval of consecutive integers containing S. We
formalize this idea by declaring S to be a residue (respectively, non-residue) support set
of p if S = (the set of all residues of p inside [1, p − 1]) ∩ [min S, max S] (respectively, S =
(the set of all non-residues of p inside [1, p − 1]) ∩ [min S, max S]). We then define $\mathcal{S}(W)$ to
have the residue (respectively, non-residue) support property if there exists $p_0 > 0$ such that
for all $p \ge p_0$, there is a set $S \in \mathcal{S}(W) \cap 2^{[1,p-1]}$ such that S is a residue (respectively,
non-residue) support set of p.
We now use Davenport’s method to establish the following proposition, which generalizes
[5, Corollary of Theorem 5] for quadratic residues.

Proposition 9.3. If W is any nonempty, finite subset of [0, ∞), then $\mathcal{S}(W)$ has the
universal pattern property and both the residue and non-residue support properties. Moreover,
if ε is a choice of signs for [1, |W|],

\[ c_\varepsilon(W)(p) = \big| \{S \in \mathcal{S}(W) \cap 2^{[1,p-1]} : (S, \varepsilon) \text{ is a residue pattern of } p\} \big|, \quad \text{and} \]

\[ c_\sigma(W)(p) = \big| \{S \in \mathcal{S}(W) \cap 2^{[1,p-1]} : S \text{ is a residue (respectively, non-residue) support set of } p\} \big|, \]

then as p → +∞,

\[ c_\varepsilon(W)(p) \sim 2^{-|W|} p \quad \text{and} \quad c_\sigma(W)(p) \sim 2^{-(1 + \max W - \min W)} p. \]

Proof. Suppose that the asserted asymptotics of $c_\varepsilon(W)(p)$ has been established for all
nonempty, finite subsets W of [0, ∞). Then the asserted asymptotics for $c_\sigma(W)(p)$ can be
deduced from that by means of the following trick. Let W ⊆ [0, ∞) be nonempty and finite.
Define the choice of signs ε for [min W, max W] to be 1 on W and −1 on [min W, max W] \ W.
Now for each p, let

\[ \mathcal{S}(p) = \{S \in \mathcal{S}(W) \cap 2^{[1,p-1]} : S \text{ is a residue support set of } p\}, \]

\[ \mathcal{R}(p) = \{S \in \mathcal{S}([\min W, \max W]) \cap 2^{[1,p-1]} : (S, \varepsilon) \text{ is a residue pattern of } p\}. \]

If to each $E \in \mathcal{R}(p)$ (respectively, $F \in \mathcal{S}(p)$), we assign the set f(E) = E ∩ (set of all
residues of p inside [1, p − 1]) (respectively, g(F) = [min F, max F]), then f (respectively, g)
maps $\mathcal{R}(p)$ (respectively, $\mathcal{S}(p)$) injectively into $\mathcal{S}(p)$ (respectively, $\mathcal{R}(p)$). Hence $\mathcal{R}(p)$ and
$\mathcal{S}(p)$ have the same cardinality. Because of our assumption concerning the asymptotics of
$c_\varepsilon([\min W, \max W])(p)$, it follows that as p → +∞,

\[ c_\sigma(W)(p) = |\mathcal{S}(p)| = |\mathcal{R}(p)| \sim 2^{-|[\min W, \max W]|} p = 2^{-(1 + \max W - \min W)} p. \]

This establishes the conclusion of the proposition with regard to residue support sets, and the
conclusion with regard to non-residue support sets follows by repeating the same reasoning
after ε is replaced by −ε.
If ε is now an arbitrary choice of signs for [1, |W|], it hence suffices to deduce the asserted
asymptotics of $c_\varepsilon(W)(p)$. Letting r(p) = p − max W − 1, we have for all p sufficiently large
that

\[ c_\varepsilon(W)(p) = 2^{-|W|} \sum_{x=1}^{r(p)} \prod_{i=1}^{|W|} \big( 1 + \varepsilon(i)\chi_p(x + z_i) \big). \]

This sum can hence be rewritten as

\[ 2^{-|W|} r(p) + 2^{-|W|} \sum_{\emptyset \ne T \subseteq [1, |W|]} \prod_{i \in T} \varepsilon(i) \left( \sum_{x=1}^{r(p)} \chi_p\Big( \prod_{i \in T} (x + z_i) \Big) \right). \]

The asserted asymptotics for $c_\varepsilon(W)(p)$ now follows from an application of Theorem 9.1 to the
Weil sums in the second term of this expression. QED
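The counting function $c_\varepsilon(W)(p)$ and the product formula used for it in the proof can be compared by direct enumeration. The sketch below assumes Python 3; the function names, the sample set W = {0, 2, 3}, and the sample prime 10007 are our own hypothetical choices. A choice of signs is represented as a tuple listed in the increasing order of the elements of W.

```python
from itertools import product


def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def count_patterns_brute_force(W, eps, p):
    """Number of n in [1, p - 1 - max W] with chi_p(n + z_i) = eps_i for all i."""
    z = sorted(W)
    return sum(
        all(legendre(n + zi, p) == e for zi, e in zip(z, eps))
        for n in range(1, p - max(z))
    )


def count_patterns_by_product(W, eps, p):
    """2^{-|W|} * sum over x in [1, r(p)] of prod_i (1 + eps_i chi_p(x + z_i))."""
    z = sorted(W)
    total = 0
    for x in range(1, p - max(z)):  # r(p) = p - max W - 1
        term = 1
        for zi, e in zip(z, eps):
            term *= 1 + e * legendre(x + zi, p)
        total += term
    return total // 2 ** len(z)  # every term is 0 or 2^{|W|}


W, p = {0, 2, 3}, 10007  # hypothetical sample data; 10007 is prime
counts = {
    eps: count_patterns_brute_force(W, eps, p)
    for eps in product((1, -1), repeat=len(W))
}
for eps, c in counts.items():
    print(eps, c, p / 2 ** len(W))
```

Each of the eight sign patterns should occur roughly $2^{-3}p$ times, as Proposition 9.3 predicts for large p.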
Now, let (k, s) ∈ [1, ∞) × [1, ∞), $\{b_1, \dots, b_k\} \subseteq [1, \infty)$ and $\mathbf{b} = (b_1, \dots, b_k)$. We will apply
Proposition 9.3 to the family of sets defined by

\[ AP(\mathbf{b}; s) = \Big\{ \bigcup_{j=1}^{k} \{n + ib_j : i \in [0, s-1]\} : n \in [1, \infty) \Big\}; \]

we need only to observe that

\[ AP(\mathbf{b}; s) = \mathcal{S}\Big( \bigcup_{j=1}^{k} \{ib_j : i \in [0, s-1]\} \Big), \]

for then the following theorem is an immediate consequence of Proposition 9.3. In particular,
if the choice of signs ε in the theorem is taken to be either identically 1 or identically −1,
we obtain the solution of Problems 1 and 3.

Theorem 9.4. (Wright [62], Theorem 2.3) If (k, s) ∈ [1, ∞) × [1, ∞), $\{b_1, \dots, b_k\} \subseteq
[1, \infty)$ and $\mathbf{b} = (b_1, \dots, b_k)$, then $AP(\mathbf{b}; s)$ has the universal pattern property and both the
residue and non-residue support properties. Moreover, if $b = \max\{b_1, \dots, b_k\}$,

\[ \gamma = \Big| \bigcup_{j=1}^{k} \{ib_j : i \in [0, s-1]\} \Big|, \]

ε is a choice of signs for [1, γ],

\[ c_\varepsilon(p) = \big| \{S \in AP(\mathbf{b}; s) \cap 2^{[1,p-1]} : (S, \varepsilon) \text{ is a residue pattern of } p\} \big|, \quad \text{and} \]

\[ c_\sigma(p) = \big| \{S \in AP(\mathbf{b}; s) \cap 2^{[1,p-1]} : S \text{ is a residue (respectively, non-residue) support set of } p\} \big|, \]

then as p → +∞,

\[ c_\varepsilon(p) \sim 2^{-\gamma} p \quad \text{and} \quad c_\sigma(p) \sim 2^{-(1 + b(s-1))} p. \]

As an example of Theorem 9.4 in action, take k = 5, s = 6, and $\mathbf{b} = (b_1, b_2, b_3, b_4, b_5) =
(1, 2, 3, 5, 7)$. Then

\[ 1 + b(s-1) = 1 + 7 \cdot 5 = 36 \]

and

\[
\begin{aligned}
\gamma &= \Big| \bigcup_{j=1}^{5} \{ib_j : i \in [0, 5]\} \Big| \\
&= \big| \{0\} \cup \{1, 2, 3, 5, 7\} \cup \{2, 4, 6, 10, 14\} \cup \{3, 6, 9, 15, 21\} \\
&\qquad \cup \{4, 8, 12, 20, 28\} \cup \{5, 10, 15, 25, 35\} \big| \\
&= \big| \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 20, 21, 25, 28, 35\} \big| \\
&= 19.
\end{aligned}
\]

We have that

\[ AP(\mathbf{b}; 6) = \Big\{ \{n + z : z \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 20, 21, 25, 28, 35\}\} : n \in [1, \infty) \Big\}, \]

and so if ε is a choice of signs for [1, 19] then Theorem 9.4 implies that as p → +∞,

\[ c_\varepsilon(p) \sim 2^{-19} p \quad \text{and} \quad c_\sigma(p) \sim 2^{-36} p. \]
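The two exponents in this example are simple to recompute. The following is a minimal sketch, assuming Python 3; the function name `offsets` is ours.

```python
def offsets(b, s):
    """Sorted list of the distinct values i * b_j with i in [0, s - 1]."""
    return sorted({i * bj for bj in b for i in range(s)})


b, s = (1, 2, 3, 5, 7), 6
W = offsets(b, s)
gamma = len(W)                         # exponent in the asymptotics of c_eps(p)
support_length = 1 + max(b) * (s - 1)  # exponent in the asymptotics of c_sigma(p)
print(W)
print(gamma, support_length)  # prints: 19 36
```

Changing `b` or `s` gives the corresponding exponents for other instances of Theorem 9.4.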

5. Solution of Problems 2 and 4: Introduction

Let (m, s) ∈ [1, ∞) × [1, ∞), let $\mathbf{a} = (a_1, \dots, a_m)$ (respectively, $\mathbf{b} = (b_1, \dots, b_m)$) be
an m-tuple of nonnegative (respectively, positive) integers such that $(a_i, b_i) \ne (a_j, b_j)$ for
i ≠ j, let $(\mathbf{a}, \mathbf{b})$ denote the 2m-tuple $(a_1, \dots, a_m, b_1, \dots, b_m)$ (we will call $(\mathbf{a}, \mathbf{b})$ a standard
2m-tuple), and recall from section 2 that

\[ AP(\mathbf{a}, \mathbf{b}; s) = \Big\{ \bigcup_{j=1}^{m} \{a_j + b_j(n+i) : i \in [0, s-1]\} : n \in [1, \infty) \Big\}. \]

Problems 2 and 4 ask for the asymptotic behavior as p → +∞ of the number of elements
of $AP(\mathbf{a}, \mathbf{b}; s) \cap 2^{[1,p-1]}$ which are sets of residues (respectively, non-residues) of p. Because
of certain arithmetical interactions which can take place between the elements of the sets in
$AP(\mathbf{a}, \mathbf{b}; s)$, the asymptotic behavior of this sequence is somewhat more complicated than
what occurs for $AP(\mathbf{b}; s)$ as per Theorem 9.4.
In order to explain the situation, we set

\[ q_\varepsilon(p) = \big| \{A \in AP(\mathbf{a}, \mathbf{b}; s) \cap 2^{[1,p-1]} : \chi_p(a) = \varepsilon, \text{ for all } a \in A\} \big| \]

and note that the value of $q_\varepsilon(p)$ for ε = 1 (respectively, ε = −1) counts the number of
elements of $AP(\mathbf{a}, \mathbf{b}; s)$ that are sets of residues (respectively, non-residues) of p which are
located inside [1, p − 1]. As we mentioned before, it will transpire that the asymptotic
behavior of $q_\varepsilon(p)$ depends on certain arithmetic interactions that can take place between the
elements of $AP(\mathbf{a}, \mathbf{b}; s)$. In order to see how this goes, first consider the set B of distinct
values of the coordinates of $\mathbf{b}$. If we declare the coordinate $a_i$ of $\mathbf{a}$ and the coordinate $b_i$
of $\mathbf{b}$ to correspond to each other, then for each b ∈ B, we let A(b) denote the set of all
coordinates of $\mathbf{a}$ whose corresponding coordinate of $\mathbf{b}$ is b. We then relabel the elements of
B as $b_1, \dots, b_k$, say, and for each i ∈ [1, k], set

\[ S_i = \bigcup_{a \in A(b_i)} \{a + b_i j : j \in [0, s-1]\}, \]

and then let

\[ \alpha = \sum_{i} |S_i|, \qquad b = \max\{b_1, \dots, b_k\}. \]

Next, suppose that

(∗ ∗ ∗) if (i, j) ∈ [1, k] × [1, k] with i ≠ j and $(x, y) \in A(b_i) \times A(b_j)$, then
either $b_i b_j$ does not divide $y b_i - x b_j$ or $b_i b_j$ divides $y b_i - x b_j$ with a quotient
that exceeds s − 1 in modulus.

Then we will show in section 10 that as p → +∞, $q_\varepsilon(p)$ is asymptotic to $(b \cdot 2^{\alpha})^{-1} p$. On the
other hand, if the assumption (∗ ∗ ∗) does not hold then we will also show in section 10 that
the asymptotic behavior of $q_\varepsilon(p)$ falls into two distinct regimes, with each regime determined
in a certain manner by the integral quotients

\[ \frac{y b_i - x b_j}{b_i b_j}, \quad (x, y) \in A(b_i) \times A(b_j), \tag{$\natural$} \]

whose moduli do not exceed s − 1. More precisely, these quotients determine a positive
integer e < α and a collection $\mathcal{S}$ of nonempty subsets of [1, k] such that each element of $\mathcal{S}$
has even cardinality and for which the following two alternatives hold:
(i) if $\prod_{i \in S} b_i$ is a square for all $S \in \mathcal{S}$, then as p → +∞, $q_\varepsilon(p)$ is asymptotic to $(b \cdot 2^{\alpha - e})^{-1} p$,
or
(ii) if there is an $S \in \mathcal{S}$ such that $\prod_{i \in S} b_i$ is not a square, then there exist two disjoint,
infinite sets of primes $\Pi_+$ and $\Pi_-$ whose union contains all but finitely many of the primes
and such that $q_\varepsilon(p) = 0$ for all $p \in \Pi_-$, while as p → +∞ inside $\Pi_+$, $q_\varepsilon(p)$ is asymptotic
to $(b \cdot 2^{\alpha - e})^{-1} p$.
Thus we see that when (∗ ∗ ∗) does not hold and p → +∞, either $q_\varepsilon(p)$ is
asymptotic to $(b \cdot 2^{\alpha - e})^{-1} p$ or $q_\varepsilon(p)$ asymptotically oscillates infinitely often between 0 and
$(b \cdot 2^{\alpha - e})^{-1} p$.
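The parameters α and b and the hypothesis (∗ ∗ ∗) are determined by finitely many integer computations, so they can be evaluated mechanically for any standard 2m-tuple. The sketch below assumes Python 3 and follows the definitions just given; the function names `progression_data` and `condition_star_holds` and the sample tuples are our own hypothetical choices, and the sketch does not attempt to compute the parameter e or the collection of sets that govern the case in which (∗ ∗ ∗) fails.

```python
def progression_data(a, b, s):
    """Return (B, A, S, alpha, b_max) for a standard 2m-tuple (a, b) and length s."""
    B = sorted(set(b))  # distinct values of the coordinates of b
    # A[bv]: coordinates of a whose corresponding coordinate of b equals bv
    A = {bv: sorted({ai for ai, bi in zip(a, b) if bi == bv}) for bv in B}
    # S[bv]: union over x in A[bv] of {x + bv * j : j in [0, s - 1]}
    S = {bv: {x + bv * j for x in A[bv] for j in range(s)} for bv in B}
    alpha = sum(len(S[bv]) for bv in B)
    return B, A, S, alpha, max(B)


def condition_star_holds(a, b, s):
    """True if no integral quotient (y*bi - x*bj) / (bi*bj) has modulus <= s - 1."""
    B, A, _, _, _ = progression_data(a, b, s)
    for bi in B:
        for bj in B:
            if bi == bj:
                continue
            for x in A[bi]:
                for y in A[bj]:
                    numerator = y * bi - x * bj
                    if numerator % (bi * bj) == 0:
                        if abs(numerator // (bi * bj)) <= s - 1:
                            return False
    return True


# Hypothetical examples with s = 3.
print(progression_data((0, 1), (1, 2), 3)[3:])   # (alpha, b)
print(condition_star_holds((0, 1), (1, 2), 3))   # no integral quotients occur
print(condition_star_holds((0, 0), (1, 2), 3))   # the quotient 0 occurs
```

When `condition_star_holds` returns True, the text asserts that $q_\varepsilon(p)$ is asymptotic to $(b \cdot 2^{\alpha})^{-1} p$ with the values of α and b computed by `progression_data`.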
In light of what we have just discussed, it will come as no surprise that the solution of
Problems 2 and 4 for $AP(\mathbf{a}, \mathbf{b}; s)$ involves a bit more effort than the solution of Problems 1
and 3 for $AP(\mathbf{b}; s)$. In order to analyze the asymptotic behavior of $q_\varepsilon(p)$, we follow the same
strategy as before: using an appropriate sum of products involving $\chi_p$, $q_\varepsilon(p)$ is expressed as
a sum of a dominant term and a remainder. If the dominant term is a non-constant linear
function of p and the remainder term does not exceed an absolute constant × $\sqrt{p}\,\log p$, then
the asymptotic behavior of $q_\varepsilon(p)$ will be in hand.
We in fact will implement this strategy when the set $AP(\mathbf{a}, \mathbf{b}; s)$ in the definition of $q_\varepsilon(p)$
is replaced by a slightly more general set; for a precise statement of what we establish, see
Theorem 9.9 in section 10. Also in section 10, we then deduce the solution of Problems 2
and 4 from this more general result, where, in particular, we indicate more precisely the
manner in which the integral quotients (♮) whose moduli do not exceed s − 1 determine the
parameter e and collection of sets $\mathcal{S}$ discussed above.

6. Preliminary Estimate of qε (p)

We begin the analysis of qε (p) by taking a closer look at the structure of AP (a, b; s). Let
J denote the set of all subsets J of [1, m] that are of maximal cardinality with respect to
the property that bj is equal to a fixed integer bJ for all j ∈ J. We note that {J : J ∈ J }
is a partition of [1, m] and that bJ 6= bJ ′ whenever {J, J ′ } ⊆ J . Because (ai , bi ) 6= (aj , bj )
whenever i 6= j, it follows that if J ∈ J then the integers aj for j ∈ J are all distinct. Let
[
SJ = {aj + bJ i : i ∈ [0, s − 1]}, J ∈ J .
j∈J

Then
m
[ [
(11) {aj + bj (n + i) : i ∈ [0, s − 1]} = bJ n + SJ , for all n ∈ [1, ∞).
j=1 J∈J

It follows that AP (a, b; s) is a special case of the following more general situation. Let
k ∈ [1, ∞), let B = {b1 , . . . , bk } be a set of positive integers, and let S = (S1 , . . . , Sk ) be a
k-tuple of finite, nonempty subsets of [0, ∞). By way of analogy with the expression of the
elements of AP (a, b; s) according to (11), we will denote by AP (B, S) the collection of sets
defined by
\[ \Big\{ \bigcup_{i=1}^{k} (b_i n + S_i) : n \in [1, \infty) \Big\}. \]

We are interested in the number of elements of AP (B, S) that are sets of quadratic residues
or, respectively, quadratic non-residues of a prime p, and so if ε ∈ {−1, 1}, we replace
AP (a, b; s) by AP (B, S) in the definition of qε (p) and retain the same notation, so that now

\[ q_\varepsilon(p) = |\{A \in AP(B, S) \cap 2^{[1,p-1]} : \chi_p(a) = \varepsilon, \text{ for all } a \in A\}|. \]



The problem is to find an asymptotic formula for qε (p) as p → +∞.
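For small primes the count qε (p) can be computed directly from its definition, which is convenient for checking the estimates that follow. The sketch below is only an illustration: the function names `legendre` and `q_eps` are hypothetical, B is passed as a list and S as a list of sets in the same order, and p is assumed to be an odd prime.

```python
def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)   # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r


def q_eps(B, S, p, eps):
    """Number of distinct sets in AP(B, S) inside [1, p-1] on which chi_p is identically eps."""
    found = set()
    n = 1
    while True:
        A = frozenset(b * n + j for b, Si in zip(B, S) for j in Si)
        if max(A) > p - 1:            # max(A) increases with n, so we can stop here
            break
        if all(legendre(a, p) == eps for a in A):
            found.add(A)              # a set, so repeated unions are counted once
        n += 1
    return len(found)


# Pairs {n, n + 1} of consecutive residues (eps = 1) or non-residues (eps = -1) mod 7.
print(q_eps([1], [{0, 1}], 7, 1), q_eps([1], [{0, 1}], 7, -1))
```

Collecting the unions in a Python set mirrors the fact that qε (p) counts elements of AP (B, S), not values of n.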


We can now begin to implement the strategy for determining the asymptotic behavior of
qε (p) as set forth in section 5. Our goal here is to find an initial estimate of qε (p) in terms of
an expression that we will eventually denote by Σ4 (p) such that qε (p) − Σ4 (p) = O(√p log p)
as p → +∞. In sections 7, 8, and 9, we will then prove that Σ4 (p) is a non-constant linear
function of p for enough primes p so that the precise asymptotic behavior of qε (p) is captured.
Toward that end, begin by noticing that there is a positive constant C, depending only
on B and S, such that for all n ≥ C,

\[ \text{the sets } b_i n + S_i,\ i \in [1, k], \text{ are pairwise disjoint, and} \tag{12} \]
\[ \bigcup_{i=1}^{k} (b_i n + S_i) \text{ is uniquely determined by } n. \tag{13} \]

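Because of (12) and (13), if
\[ \alpha = \sum_i |S_i| \quad \text{and} \quad r(p) = \min_i \left[ \frac{p - 1 - \max S_i}{b_i} \right], \]
then the sum
\[ 2^{-\alpha} \sum_{x=1}^{r(p)} \prod_{i=1}^{k} \prod_{j \in S_i} \big( 1 + \varepsilon \chi_p(b_i x + j) \big) \]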

differs from qε (p) by at most O(1), hence, as per the strategy as outlined in section 5, this
sum can be used to determine the asymptotics of qε (p).
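The reason this sum counts what we want is that, for x ∈ [1, r(p)], every factor 1 + εχp (bi x + j) is either 2 or 0, so each summand divided by 2^α is the indicator of the event that χp is identically ε on the set determined by x. The following sketch evaluates the sum exactly. It assumes that the brackets in r(p) denote the integer part, and the names `r_of_p` and `product_sum` are hypothetical.

```python
from fractions import Fraction


def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def r_of_p(B, S, p):
    """r(p) = min_i [(p - 1 - max S_i) / b_i], reading [.] as the integer part (an assumption)."""
    return min((p - 1 - max(Si)) // b for b, Si in zip(B, S))


def product_sum(B, S, p, eps):
    """2^(-alpha) * sum_{x=1}^{r(p)} prod_i prod_{j in S_i} (1 + eps * chi_p(b_i x + j))."""
    alpha = sum(len(Si) for Si in S)
    total = 0
    for x in range(1, r_of_p(B, S, p) + 1):
        term = 1
        for b, Si in zip(B, S):
            for j in Si:
                term *= 1 + eps * legendre(b * x + j, p)
        total += term                 # term is 2^alpha or 0, since b*x + j lies in [1, p-1]
    return Fraction(total, 2 ** alpha)


print(product_sum([1, 2], [{0}, {0}], 7, 1))
```

The result is an integer-valued fraction: the number of x ∈ [1, r(p)] for which all of the values bi x + j have character ε.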
Apropos of that strategy, let
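\[ \mathcal{T} = \bigcup_{i=1}^{k} \{(i, j) : j \in S_i\}, \]
and then rewrite the above sum as
\[ 2^{-\alpha} r(p) + 2^{-\alpha} \sum_{\emptyset \neq T \subseteq \mathcal{T}} \varepsilon^{|T|} \prod_{i=1}^{k} \chi_p(b_i)^{|\{j : (i,j) \in T\}|} \sum_{x=1}^{r(p)} \chi_p\Big( \prod_{(i,j) \in T} (x + \bar{b}_i j) \Big), \tag{14} \]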

where b̄i denotes the inverse of bi modulo p, which clearly exists for all p sufficiently large.
Our intent now is to estimate the modulus of certain summands in the second term of (14)
by means of Theorem 9.2.
Let Σ(p) denote the second term of the sum in (14). In order to carry out the intended
estimate, we must first remove from Σ(p) the terms to which Theorem 9.2 cannot be applied.
Toward that end, let
\[ E(p) = \{\emptyset \neq T \subseteq \mathcal{T} : \text{the distinct elements, modulo } p, \text{ in the list } \bar{b}_i j,\ (i, j) \in T, \text{ each occurs an even number of times}\}. \]
We then split Σ(p) into the sum Σ1 (p) of terms taken over the elements of E(p) and the sum
Σ2 (p) = Σ(p) − Σ1 (p). The sum Σ2 (p) has no more than 2^α − 1 terms, each of the form
\[ \pm 2^{-\alpha} \sum_{x=1}^{r(p)} \chi_p\Big( \prod_{(i,j) \in T} (x + \bar{b}_i j) \Big), \quad \emptyset \neq T \in 2^{\mathcal{T}} \setminus E(p). \]

Since ∅ ≠ T ∉ E(p), the polynomial in x in this term at which χp is evaluated can be reduced
to a product of at least one and no more than α distinct monic linear factors in x over Fp ,
and so the sum in each of the above terms of Σ2 (p) is an incomplete Weil sum to which
Theorem 9.2 can be applied. It therefore follows from that theorem that
\[ \Sigma_2(p) = O(\sqrt{p} \log p) \quad \text{as } p \to +\infty. \]

We must now estimate


\[ \Sigma_3(p) = 2^{-\alpha} r(p) + \Sigma_1(p), \]

and, as we shall see, it is precisely this term that will produce the dominant term which
determines the asymptotic behavior of qε (p).
Since each element of E(p) has even cardinality,
\[ \Sigma_1(p) = 2^{-\alpha} \sum_{T \in E(p)} \prod_{i=1}^{k} \chi_p(b_i)^{|\{j : (i,j) \in T\}|} \sum_{x=1}^{r(p)} \chi_p\Big( \prod_{(i,j) \in T} (x + \bar{b}_i j) \Big). \]

We now examine the sum over x ∈ [1, r(p)] on the right-hand side of this equation. Because
T ∈ E(p), each term in this sum is either 0 or 1, and a term is 0 precisely when the value
of x in that term agrees with the minimal nonnegative ordinary residue mod p of −b̄i j, for
some element (i, j) of T . However, there are at most α/2 such values of x
for each T ∈ E(p), and so it follows that Σ3 (p) differs by at most O(1) from
\[ \Sigma_4(p) = 2^{-\alpha} r(p) \Big( 1 + \sum_{T \in E(p)} \prod_{i=1}^{k} \chi_p(b_i)^{|\{j : (i,j) \in T\}|} \Big). \]

Consequently,

\[ \text{for all } p \text{ sufficiently large,} \quad q_\varepsilon(p) - \Sigma_4(p) = O(\sqrt{p} \log p), \tag{15} \]

and so it suffices to calculate Σ4 (p) in order to determine the asymptotics of qε (p).


7. Calculation of Σ4 (p): Preliminaries

The calculation of Σ4 (p) requires a careful study of E(p). In order to pin this set down
a bit more firmly, we make use of the equivalence relation ≈ defined on
\[ \mathcal{T} = \bigcup_{i=1}^{k} \{(i, j) : j \in S_i\} \]
as follows: if ((i, j), (l, m)) ∈ 𝒯 × 𝒯 then (i, j) ≈ (l, m) if bl j = bi m. For all p sufficiently
large, (i, j) ≈ (l, m) if and only if b̄i j ≡ b̄l m mod p, and so if we let E(A) denote the set of
all nonempty subsets of even cardinality of a finite set A, then
for all p sufficiently large, E(p) consists of all subsets T of 𝒯 such that
there exists a nonempty set 𝒮 of equivalence classes of ≈ and elements
ES ∈ E(S) for S ∈ 𝒮 such that
\[ T = \bigcup_{S \in \mathcal{S}} E_S. \tag{16} \]

In particular, it follows that for all p large enough, E(p) does not depend on p, hence from
now on, we delete the “p” from the notation for this set.
The description of E given by (16) mandates that we determine the equivalence classes
of the equivalence relation ≈. In order to do that in a precise and concise manner, it will be
convenient to use the following notation: if b ∈ [1, ∞) and S ⊆ [0, ∞), we let b⁻¹S denote
the set of all rational numbers of the form z/b, where z is an element of S. We next let
\[ \mathcal{K} = \Big\{ \emptyset \neq K \subseteq [1, k] : \bigcap_{i \in K} b_i^{-1} S_i \neq \emptyset \Big\}. \]

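If K ∈ 𝒦 then we set
\[ T(K) = \Big( \bigcap_{i \in K} b_i^{-1} S_i \Big) \cap \Big( \bigcap_{i \in [1,k] \setminus K} (\mathbb{Q} \setminus b_i^{-1} S_i) \Big). \]
Let
\[ \mathcal{K}_{\max} = \{K \in \mathcal{K} : T(K) \neq \emptyset\}. \]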
Using Proposition 1.4, it is then straightforward to verify that the equivalence classes of ≈
consist precisely of all sets of the form

{(i, tbi ) : i ∈ K},

where K ∈ Kmax and t ∈ T (K).
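Since (i, j) ≈ (l, m) exactly when the rational numbers j/bi and m/bl coincide, the equivalence classes, the set Kmax, and the sets T (K) can all be read off by grouping pairs according to that quotient. The sketch below does this with exact rational arithmetic. The function names are hypothetical, indices i are 1-based as in the text, and B and S are passed as parallel lists.

```python
from collections import defaultdict
from fractions import Fraction


def classes_by_quotient(B, S):
    """Equivalence classes of the relation on pairs (i, j), j in S_i, keyed by t = j / b_i."""
    groups = defaultdict(set)
    for i, (b, Si) in enumerate(zip(B, S), start=1):
        for j in Si:
            groups[Fraction(j, b)].add((i, j))
    return dict(groups)


def kmax_and_T(B, S):
    """Map each K in K_max (a frozenset of indices) to its set T(K) of rationals."""
    result = defaultdict(set)
    for t, cls in classes_by_quotient(B, S).items():
        K = frozenset(i for i, _ in cls)   # exactly the indices i with t in b_i^{-1} S_i
        result[K].add(t)
    return dict(result)


# b_1^{-1} S_1 = {0, 1} and b_2^{-1} S_2 = {0, 3/2} meet only in t = 0.
print(kmax_and_T([1, 2], [{0, 1}, {0, 3}]))
```

Each key K of the returned dictionary has a nonempty value by construction, which is the defining property of Kmax.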


Observe next that if the set
\[ \Big\{ \{(i, t b_i) : i \in K\} : K \in \mathcal{K},\ t \in \bigcap_{i \in K} b_i^{-1} S_i \Big\} \]
is ordered by inclusion then the equivalence classes of ≈ are the maximal elements of this
set. Hence T (K) ∩ T (K′) = ∅ whenever K and K′ are distinct elements of Kmax . Consequently, if (K, K′) ∈
Kmax × Kmax , ∅ ≠ σ ⊆ K, ∅ ≠ σ′ ⊆ K′, t ∈ T (K), and t′ ∈ T (K′), then {(i, tbi ) : i ∈ σ}
and {(i, t′bi ) : i ∈ σ′} are contained in distinct equivalence classes of ≈ if and only if
t ≠ t′. The following lemma is now an immediate consequence of (16) and the structure
just obtained for the equivalence classes of ≈.

Lemma 9.5. If T ∈ E then there exists a nonempty subset 𝒮 of Kmax , a nonempty subset
Σ(S) of E(S) for each S ∈ 𝒮, and a nonempty subset T (σ, S) of T (S) for each σ ∈ Σ(S) and
S ∈ 𝒮 such that the family of sets {T (σ, S) : σ ∈ Σ(S), S ∈ 𝒮} is pairwise disjoint, and
\[ T = \bigcup_{S \in \mathcal{S}} \Big[ \bigcup_{\sigma \in \Sigma(S)} \Big( \bigcup_{t \in T(\sigma, S)} \{(i, t b_i) : i \in \sigma\} \Big) \Big]. \]

We have now determined via Lemma 9.5 the structure of the elements of E precisely
enough for effective use in the calculation of Σ4 (p). However, if we already know that
qε (p) = 0, then the value of Σ4 (p) is not needed in our argument. It would hence be very useful to
have a way to distinguish the primes p for which qε (p) = 0 from the primes p for which
qε (p) ≠ 0. We will now define and study a gadget which does that.

8. The (B, S)-signature of a Prime

Denote by Λ(K) the set
\[ \bigcup_{K \in \mathcal{K}_{\max}} E(K). \]

Then Λ(K) is empty if and only if every element of Kmax is a singleton.


Suppose that Λ(K) is not empty. We will say that p is an allowable prime if no element
of B has p as a factor. If p is an allowable prime, then the (B, S)-signature of p is defined
to be the multi-set of ±1’s given by
\[ \Big\{ \chi_p\Big( \prod_{i \in I} b_i \Big) : I \in \Lambda(\mathcal{K}) \Big\}. \]

We declare the signature of p to be positive if all of its elements are 1, and non-positive
otherwise. Let
Π+ (B, S) (respectively, Π− (B, S)) denote the set of all allowable primes p
such that the (B, S)-signature of p is positive (respectively, non-positive).
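The signature is straightforward to compute once Λ(K) is known. The sketch below builds Λ(K) from the index sets of the equivalence classes and then evaluates χp on each product. The names `lambda_sets` and `signature` are hypothetical, indices are 1-based, and the signature is returned as a sorted list so that it can be compared as a multi-set.

```python
from fractions import Fraction
from itertools import combinations


def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def lambda_sets(B, S):
    """Lambda(K): all nonempty even-cardinality subsets of the index sets K in K_max."""
    index_sets = {}
    for i, (b, Si) in enumerate(zip(B, S), start=1):
        for j in Si:
            index_sets.setdefault(Fraction(j, b), set()).add(i)
    lam = set()
    for K in index_sets.values():
        for size in range(2, len(K) + 1, 2):
            lam.update(frozenset(I) for I in combinations(sorted(K), size))
    return lam


def signature(B, S, p):
    """(B, S)-signature of an allowable odd prime p, as a sorted list of +1's and -1's."""
    if any(b % p == 0 for b in B):
        raise ValueError("p is not an allowable prime")
    sig = []
    for I in lambda_sets(B, S):
        prod = 1
        for i in I:
            prod *= B[i - 1]
        sig.append(legendre(prod, p))
    return sorted(sig)


print(signature([1, 4, 2], [{0}, {0}, {0}], 5))   # non-positive: 2 is a non-residue of 5
```

A prime p belongs to Π+ (B, S) precisely when every entry of the returned list is 1.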

We can now prove the following two lemmas: the first records some important information
about the signature, and the second implies that we need only calculate Σ4 (p) for the primes
p in Π+ (B, S).

Lemma 9.6. (i) The set Π+ (B, S) consists precisely of all allowable primes p for which
each of the sets

(♯) {bi : i ∈ I}, I ∈ Λ(K),

is either a set of residues of p or a set of non-residues of p. In particular, Π+ (B, S) is always
an infinite set.
(ii) The set Π− (B, S) consists precisely of all allowable primes p for which at least one of
the sets (♯) contains a residue of p and a non-residue of p, Π− (B, S) is always either empty
or infinite, and Π− (B, S) is empty if and only if the product ∏i∈I bi is a square for all I ∈ Λ(K).

Proof. Suppose that p is an allowable prime such that each of the sets (♯) is either a set
of residues of p or a set of non-residues of p. Then
\[ \chi_p\Big( \prod_{i \in I} b_i \Big) = 1 \]

whenever I ∈ Λ(K) because |I| is even, i.e., p ∈ Π+ (B, S). On the other hand, let p ∈
Π+ (B, S) and let I = {i1 , . . . , in } ∈ Λ(K). Then because p ∈ Π+ (B, S),

\[ \chi_p(b_{i_j} b_{i_{j+1}}) = 1, \quad j \in [1, n-1], \]

and these equations imply that {bi : i ∈ I} is either a set of residues of p or a set of non-
residues of p. This verifies the first statement in (i), and the second statement follows from
the fact (Theorem 4.3) that there are infinitely many primes p such that B is a set of residues
of p.
Statement (ii) of the lemma follows from (i), the definition of Π− (B, S), and the fact
(Theorem 4.2) that a positive integer is a residue of all but finitely many primes if and only if
it is a square. QED
It is a consequence of the following lemma that we need only calculate Σ4 (p) for the
primes p which are in Π+ (B, S). As we will see in the next section, this greatly simplifies
that calculation.

Lemma 9.7. If p ∈ Π− (B, S) then qε (p) = 0.

Proof. If p ∈ Π− (B, S) then there is an I ∈ Λ(K) such that


\[ \chi_p\Big( \prod_{i \in I} b_i \Big) = -1. \]

Because I is nonempty and of even cardinality, there exists {m, n} ⊆ I such that

(17) χp (bm bn ) = −1.

Because {m, n} is contained in an element of Kmax , it follows that bm⁻¹Sm ∩ bn⁻¹Sn ≠ ∅, and
so we find a non-negative rational number r such that

(18) rbm ∈ Sm and rbn ∈ Sn .

By way of contradiction, suppose that qε (p) ≠ 0. Then there exists a z ∈ [1, ∞) such
that bm z + Sm and bn z + Sn are both contained in [1, p − 1] and

(19) χp (bm z + u) = χp (bn z + v), for all u ∈ Sm and for all v ∈ Sn .

If d is the greatest common divisor of bm and bn then there is a non-negative integer t such
that r = t/d. Hence by (18) and (19),

\[
\begin{aligned}
\chi_p(b_m/d)\,\chi_p(dz + t) &= \chi_p(b_m z + r b_m) \\
&= \chi_p(b_n z + r b_n) \\
&= \chi_p(b_n/d)\,\chi_p(dz + t).
\end{aligned}
\]

However, dz + t ∈ [1, p − 1] and so χp (dz + t) ≠ 0. Hence

χp (bm /d) = χp (bn /d),

and this value of χp , as well as χp (d), is nonzero because d, bm /d, and bn /d are all elements
of [1, p − 1]. But then

\[ \chi_p(b_m b_n) = \chi_p(d^2)\,\chi_p(b_m/d)\,\chi_p(b_n/d) = 1, \]

contrary to (17). QED
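Lemma 9.7 can be checked numerically in the simplest non-trivial case. Take B = {1, 2} and S = ({0}, {0}), an illustrative choice that is not taken from the text. Then AP (B, S) consists of the sets {n, 2n}, Λ(K) = {{1, 2}}, and the signature of an odd prime p is {χp (2)}, so the lemma predicts qε (p) = 0 whenever 2 is a non-residue of p. The function name `q_eps_pairs` is hypothetical.

```python
def legendre(a, p):
    """Legendre symbol chi_p(a) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def q_eps_pairs(p, eps):
    """q_eps(p) for B = {1, 2}, S = ({0}, {0}): count n with {n, 2n} in [1, p-1] and chi_p = eps on both."""
    return sum(
        1
        for n in range(1, (p - 1) // 2 + 1)
        if legendre(n, p) == eps and legendre(2 * n, p) == eps
    )


for p in (3, 5, 7, 11, 13, 17):
    print(p, legendre(2, p), q_eps_pairs(p, 1), q_eps_pairs(p, -1))
```

For the primes with χp (2) = −1 both counts vanish, because n and 2n then always have opposite quadratic character.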

9. Calculation of Σ4 (p): Conclusion

With Lemmas 9.5 and 9.7 in hand, we now calculate the sum Σ4 (p) that arose in (15).
By virtue of Lemma 9.7, we need only calculate Σ4 (p) for p ∈ Π+ (B, S), hence let p be an
allowable prime for which
\[ \chi_p\Big( \prod_{i \in I} b_i \Big) = 1, \quad \text{for all } I \in \Lambda(\mathcal{K}). \tag{20} \]

We first recall that


\[ \Sigma_4(p) = 2^{-\alpha} r(p) \Big( 1 + \sum_{T \in E} \prod_{i=1}^{k} \chi_p(b_i)^{|\{j : (i,j) \in T\}|} \Big), \tag{21} \]
where
\[ r(p) = \min_i \left[ \frac{p - 1 - \max S_i}{b_i} \right], \]
and so we must evaluate the products over T ∈ E which determine the summands of the
third factor on the right-hand side of (21). Toward that end, let T ∈ E and use Lemma 9.5
to find a nonempty subset 𝒮 of Kmax , a nonempty subset Σ(S) of E(S) for each S ∈ 𝒮, and
a nonempty subset T (σ, S) of T (S) for each σ ∈ Σ(S) and S ∈ 𝒮 such that
the sets T (σ, S), σ ∈ Σ(S), S ∈ 𝒮, are pairwise disjoint, and
\[ T = \bigcup_{S \in \mathcal{S}} \Big[ \bigcup_{\sigma \in \Sigma(S)} \Big( \bigcup_{t \in T(\sigma, S)} \{(n, t b_n) : n \in \sigma\} \Big) \Big]. \]

Then
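\[ \{j : (i, j) \in T\} = \bigcup_{S \in \mathcal{S}} \Big[ \bigcup_{\sigma \in \Sigma(S) : i \in \sigma} \{t b_i : t \in T(\sigma, S)\} \Big] \]
and this union is pairwise disjoint. Hence
\[ |\{j : (i, j) \in T\}| = \sum_{S \in \mathcal{S}} \ \sum_{\sigma \in \Sigma(S) : i \in \sigma} |T(\sigma, S)|. \]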

Thus from this equation and (20) we find that

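\[
\begin{aligned}
\prod_{i=1}^{k} \chi_p(b_i)^{|\{j : (i,j) \in T\}|}
&= \prod_{i \in \bigcup_{S \in \mathcal{S}} \bigcup_{\sigma \in \Sigma(S)} \sigma} \chi_p(b_i)^{\sum_{S \in \mathcal{S}} \sum_{\sigma \in \Sigma(S) : i \in \sigma} |T(\sigma, S)|} \\
&= \prod_{S \in \mathcal{S}} \ \prod_{\sigma \in \Sigma(S)} \Big( \chi_p\Big( \prod_{i \in \sigma} b_i \Big) \Big)^{|T(\sigma, S)|} \\
&= 1.
\end{aligned}
\]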

Hence
\[ \sum_{T \in E} \prod_{i=1}^{k} \chi_p(b_i)^{|\{j : (i,j) \in T\}|} = |E|, \tag{22} \]

and so we must count the elements of E. In order to do that, note first that the pairwise
disjoint decomposition (16) of an element T of E is uniquely determined by T , and, obviously,
uniquely determines T . Hence if 𝒟 denotes the set of all equivalence classes of ≈ of cardinality
at least 2 then
\[
\begin{aligned}
|E| &= \sum_{\emptyset \neq \mathcal{S} \subseteq \mathcal{D}} \ \prod_{S \in \mathcal{S}} |E(S)| \\
&= -1 + \prod_{D \in \mathcal{D}} (1 + |E(D)|) \\
&= -1 + \prod_{D \in \mathcal{D}} 2^{|D| - 1} \\
&= -1 + 2^{-|\mathcal{D}|} \cdot 2^{\sum_{D \in \mathcal{D}} |D|}.
\end{aligned}
\]
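However, 𝒟 consists of all sets of the form
\[ \{(i, t b_i) : i \in K\} \]
where K ∈ Kmax , |K| ≥ 2, and t ∈ T (K). Hence
\[ |\mathcal{D}| = \sum_{K \in \mathcal{K}_{\max} : |K| \geq 2} |T(K)|, \qquad \sum_{D \in \mathcal{D}} |D| = \sum_{K \in \mathcal{K}_{\max} : |K| \geq 2} |K|\,|T(K)|, \]
and so if we set
\[ e = \sum_{K \in \mathcal{K}_{\max}} |T(K)|(|K| - 1), \]
then
\[ |E| = 2^{e} - 1. \tag{23} \]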
Equations (21), (22), and (23) now imply

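Lemma 9.8. If
\[ \alpha = \sum_i |S_i|, \quad e = \sum_{K \in \mathcal{K}_{\max}} |T(K)|(|K| - 1), \quad \text{and} \quad r(p) = \min_i \left[ \frac{p - 1 - \max S_i}{b_i} \right], \]
then
\[ \Sigma_4(p) = 2^{e - \alpha} r(p), \quad \text{for all } p \in \Pi_+(B, S). \]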
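The count |E| = 2^e − 1 in (23), on which Lemma 9.8 rests, can be confirmed by brute force for small data. For p sufficiently large, a nonempty T belongs to E exactly when each quotient j/bi, (i, j) ∈ T , occurs an even number of times, and e is the sum of (size − 1) over the equivalence classes of ≈. The sketch below enumerates all subsets, so it is only practical when α is small. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def pairs_and_e(B, S):
    """Return the list of pairs (i, j), j in S_i (1-based i), and the parameter e."""
    pairs = [(i, j) for i, Si in enumerate(S, start=1) for j in sorted(Si)]
    class_sizes = {}
    for i, j in pairs:
        t = Fraction(j, B[i - 1])
        class_sizes[t] = class_sizes.get(t, 0) + 1
    e = sum(size - 1 for size in class_sizes.values())
    return pairs, e


def count_E(B, S):
    """Count nonempty T in which every quotient j / b_i occurs an even number of times."""
    pairs, _ = pairs_and_e(B, S)
    count = 0
    for size in range(1, len(pairs) + 1):
        for T in combinations(pairs, size):
            occurrences = {}
            for i, j in T:
                t = Fraction(j, B[i - 1])
                occurrences[t] = occurrences.get(t, 0) + 1
            if all(c % 2 == 0 for c in occurrences.values()):
                count += 1
    return count


B, S = [1, 2, 3], [{0, 1}, {0, 2}, {0, 3}]
_, e = pairs_and_e(B, S)
print(e, count_E(B, S), 2 ** e - 1)
```

In this illustrative example the two classes (quotients 0 and 1) each have three elements, so e = 4 and both counts equal 15.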

If we set b = max{bi : i ∈ [1, k]} then it follows from Lemma 9.8 that as p → +∞ inside Π+ (B, S),
\[ \Sigma_4(p) \sim (b \cdot 2^{\alpha - e})^{-1} p. \]
When we insert this asymptotic approximation of Σ4 (p) into the estimate (15), and then
recall Lemma 9.7, we see that (b · 2^(α−e))⁻¹ p is a linear function of p which should work to
determine the asymptotic behavior of qε (p). We will now show in the next section that it
does work in exactly that way.

10. Solution of Problems 2 and 4: Conclusion

All of the ingredients are now assembled for a proof of the following theorem, which
determines the asymptotic behavior of qε (p).

Theorem 9.9. (Wright [62], Theorem 6.1) Let ε ∈ {−1, 1}, k ∈ [1, ∞), and let B =
{b1 , . . . , bk } be a set of positive integers and S = (S1 , . . . , Sk ) a k-tuple of finite, nonempty
subsets of [0, ∞). If Kmax is the set of subsets of [1, k] defined by B and S as in section 7,
let
\[ \Lambda(\mathcal{K}) = \bigcup_{K \in \mathcal{K}_{\max}} E(K), \]
\[ \alpha = \sum_i |S_i|, \quad b = \max_i \{b_i\}, \quad e = \sum_{K \in \mathcal{K}_{\max}} |T(K)|(|K| - 1), \quad \text{and} \]
\[ q_\varepsilon(p) = |\{A \in AP(B, S) \cap 2^{[1,p-1]} : \chi_p(a) = \varepsilon, \text{ for all } a \in A\}|. \]
(i) If the sets b1⁻¹S1 , . . . , bk⁻¹Sk are pairwise disjoint then
\[ q_\varepsilon(p) \sim (b \cdot 2^{\alpha})^{-1} p \quad \text{as } p \to +\infty. \]
(ii) If the sets b1⁻¹S1 , . . . , bk⁻¹Sk are not pairwise disjoint then
(a) the parameter e is positive and less than α;
(b) if ∏i∈I bi is a square for all I ∈ Λ(K) then
\[ q_\varepsilon(p) \sim (b \cdot 2^{\alpha - e})^{-1} p \quad \text{as } p \to +\infty; \]
(c) if there exists I ∈ Λ(K) such that ∏i∈I bi is not a square then
(α) the set Π+ (B, S) of primes with positive (B, S)-signature and the set Π− (B, S) of
primes with non-positive (B, S)-signature are both infinite,
(β) qε (p) = 0 for all p in Π− (B, S), and
(γ) as p → +∞ inside Π+ (B, S),
\[ q_\varepsilon(p) \sim (b \cdot 2^{\alpha - e})^{-1} p. \]

Proof. If the sets b1⁻¹S1 , . . . , bk⁻¹Sk are pairwise disjoint then every element of Kmax is a
singleton set, hence all of the equivalence classes of the equivalence relation ≈ defined above
on the set of all pairs (i, j) with i ∈ [1, k] and j ∈ Si by the set B are singletons. It follows
that the set E which is summed over in (21) is empty and so
\[ \Sigma_4(p) = 2^{-\alpha} r(p), \quad \text{for all } p \text{ sufficiently large.} \tag{24} \]
Upon recalling that
\[ r(p) = \min_i \left[ \frac{p - 1 - \max S_i}{b_i} \right], \]
and then noting that as p → +∞, r(p) ∼ p/b, the conclusion of (i) is an immediate consequence
of (15) and (24).
Suppose that the sets b1⁻¹S1 , . . . , bk⁻¹Sk are not pairwise disjoint. Then Λ(K) is not empty
and so conclusion (a) is an obvious consequence of the definition of e. If ∏i∈I bi is a square
for all I ∈ Λ(K) then it follows from its definition that Π+ (B, S) contains all but finitely
many primes, and so (b) is an immediate consequence of (15) and Lemma 9.8. On the other
hand, if there exists I ∈ Λ(K) such that ∏i∈I bi is not a square then (α) follows from Lemma
9.6, (β) follows from Lemma 9.7, and (γ) is an immediate consequence of (15) and Lemma
9.8. QED
Theorem 9.9 shows that the elements of Λ(K) contribute to the formation of quadratic
residues and non-residues inside AP (B, S). If no such elements exist then qε (p) has the
expected minimal asymptotic approximation (b · 2^α)⁻¹ p as p → +∞. In the presence of
elements of Λ(K), the parameter e is positive and less than α, the asymptotic size of qε (p)
is increased by a factor of 2^e , and whenever Π− (B, S) is empty, qε (p) is asymptotic to
(b · 2^(α−e))⁻¹ p as p → +∞. However, the most interesting behavior occurs when Π− (B, S) is
not empty; in that case, as p → +∞, qε (p) asymptotically oscillates infinitely often between
0 and (b · 2^(α−e))⁻¹ p.
Remark. If we observe that the cardinality of the set
\[ \bigcup_{i=1}^{k} b_i^{-1} S_i \]
is equal to the number of equivalence classes of the equivalence relation ≈ that was defined
on the set
\[ \mathcal{T} = \bigcup_{i=1}^{k} \{(i, j) : j \in S_i\}, \]
then it follows that
\[ \Big| \bigcup_{i=1}^{k} b_i^{-1} S_i \Big| = \sum_{K \in \mathcal{K}_{\max}} |T(K)|. \]
But we also have that
\[ \alpha = |\mathcal{T}| = \sum_{K \in \mathcal{K}_{\max}} |T(K)|\,|K|. \]
Consequently, the exponents in the power of 1/2 that occur in the asymptotic approximation
to qε (p) in Theorem 9.9 are in fact all equal to the cardinality of the union of the sets bi⁻¹Si , i ∈ [1, k].
Theorem 9.9 will now be applied to the situation of primary interest to us here, namely
to the family of sets AP (a, b; s) determined by a standard 2m-tuple (a, b). In this case, the
decomposition (11) of the sets in AP (a, b; s) shows that there is a set B = {b1 , . . . , bk } of
positive integers (the set of distinct values of the coordinates of b), a k-tuple (m1 , . . . , mk )
of positive integers such that m = Σi mi , and sets
\[ A_i = \{a_{i1}, \dots, a_{i m_i}\} \]
of non-negative integers, all uniquely determined by (a, b), such that if we let
\[ S_i = \bigcup_{j=1}^{m_i} \{a_{ij} + b_i l : l \in [0, s-1]\}, \quad i \in [1, k], \tag{25} \]
and set
\[ S = (S_1, \dots, S_k) \]
then
\[ AP(a, b; s) = AP(B, S). \]
It follows that
\[ b_i^{-1} S_i = \bigcup_{q \in b_i^{-1} A_i} \{q + j : j \in [0, s-1]\}, \quad i \in [1, k]. \]
These sets then determine the subsets of [1, k] that constitute
\[ \mathcal{K} = \Big\{ \emptyset \neq K \subseteq [1, k] : \bigcap_{i \in K} b_i^{-1} S_i \neq \emptyset \Big\} \]

and hence also the elements of Kmax , according to the recipe given in section 7. The sets in
Kmax , together with the parameters
\[ \alpha = \sum_i |S_i|, \quad b = \max_i \{b_i\}, \quad \text{and} \quad e = \sum_{K \in \mathcal{K}_{\max}} |T(K)|(|K| - 1), \]
when used as specified in Theorem 9.9, then determine precisely the asymptotic behavior
of the sequence qε (p) that is defined upon replacement of AP (B, S) by AP (a, b; s) in
the statement of Theorem 9.9, thereby solving Problems 2 and 4. In particular, the sets
b1⁻¹S1 , . . . , bk⁻¹Sk are pairwise disjoint if and only if

(26) if (i, j) ∈ [1, k] × [1, k] with i ≠ j and (x, y) ∈ Ai × Aj , then either
bi bj does not divide ybi − xbj or bi bj divides ybi − xbj with a quotient that
exceeds s − 1 in modulus.
Hence the conclusion of statement (i) of Theorem 9.9 holds for AP (a, b; s) when condition
(26) is satisfied, while the conclusions of statement (ii) of Theorem 9.9 hold for AP (a, b; s)
whenever condition (26) is not satisfied. In the following section we will present several
examples which illustrate how Theorem 9.9 works in practice to determine the asymptotic
behavior of qε (p). We will see there, in particular, that for each integer m ∈ [2, ∞) and for
each of the hypotheses in the statement of Theorem 9.9, there exist infinitely many standard
2m-tuples (a, b) which satisfy that hypothesis.

11. An Interesting Class of Examples

In order to apply Theorem 9.9 to a standard 2m-tuple (a, b), we need to calculate the
parameters α and e, the set Λ(K), and the associated signatures of the allowable primes.
In general, this can be somewhat complicated, but there is a class of standard 2m-tuples
for which these computations can be carried out by means of easily applied algebraic and
geometric formulae, which we will discuss next.
Let k ∈ [2, ∞). We will say that a standard 2k-tuple (a, b) of integers is admissible if it
satisfies the following two conditions:

(27) the coordinates of b are distinct, and,

(28) ai bj − aj bi ≠ 0 for i ≠ j.

If s ∈ [1, ∞) and (a, b) is admissible then it follows trivially from (27) that

Si = {ai + bi j : j ∈ [0, s − 1]}, i ∈ [1, k],

hence
|Si | = s, i ∈ [1, k],
and so the parameter α in the statement of Theorem 9.9 for AP (a, b; s) is ks.
We turn next to the calculation of the parameter e. Let qi = ai /bi , i ∈ [1, k]; (28) implies
that the qi ’s are distinct, and without loss of generality, we suppose that the coordinates of a
and b are indexed so that qi < qi+1 for each i ∈ [1, k − 1]. Let ℛ denote the set of all subsets
R of {q1 , . . . , qk } such that |R| ≥ 2 and R is maximal relative to the property that w − z is
an integer for all (w, z) ∈ R × R. We note that ℛ is just the set of all equivalence classes of
cardinality at least 2 of the equivalence relation ∼ defined on the set {q1 , . . . , qk } by declaring
that qi ∼ qj if qi − qj ∈ Z. After linearly ordering the elements of each R ∈ ℛ, we let D(R)
denote the (|R| − 1)-tuple of positive integers whose coordinates are the distances between
consecutive elements of R. Then if MR (s) denotes the multi-set formed by the coordinates
of D(R) which do not exceed s − 1, it can be shown that
\[ e = \sum_{R \in \mathcal{R}} \ \sum_{r \in M_R(s)} (s - r) \tag{29} \]
(see Wright [62], section 8). We note in particular that e = 0 if and only if the set {R ∈
ℛ : MR (s) ≠ ∅} is empty and that this occurs if and only if the sets bi⁻¹Si , i ∈ [1, k], are
pairwise disjoint. Formula (29) shows that e can be calculated solely by means of information
obtained directly and straightforwardly from the set {q1 , . . . , qk }.
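Formula (29) lends itself to direct computation. The sketch below evaluates it from the quotients qi = ai /bi and, as a cross-check, also computes e as α minus the number of distinct points qi + j, j ∈ [0, s − 1], which is the relation recorded in the Remark following Theorem 9.9. The function names are hypothetical, and the tuple used in the demonstration is an illustrative admissible choice with quotients 1, 3, 5, 8.

```python
import math
from fractions import Fraction


def e_formula_29(a, b, s):
    """Formula (29): sum of s - r over consecutive gaps r <= s - 1 within each class R."""
    q = sorted(Fraction(ai, bi) for ai, bi in zip(a, b))
    groups = {}
    for x in q:
        # q_i ~ q_j iff q_i - q_j is an integer, i.e. iff the fractional parts agree
        groups.setdefault(x - math.floor(x), []).append(x)
    e = 0
    for R in groups.values():          # each R is in increasing order because q is sorted
        for lo, hi in zip(R, R[1:]):
            gap = hi - lo              # a positive integer, stored as a Fraction
            if gap <= s - 1:
                e += s - int(gap)
    return e


def alpha_minus_union(a, b, s):
    """alpha minus the number of distinct points q_i + j, j in [0, s-1] (alpha = k * s)."""
    points = {Fraction(ai, bi) + j for ai, bi in zip(a, b) for j in range(s)}
    return len(a) * s - len(points)


a, b = (1, 12, 80, 512), (1, 4, 16, 64)    # quotients 1, 3, 5, 8: gaps 2, 2, 3
print(e_formula_29(a, b, 5), alpha_minus_union(a, b, 5))
```

Grouping by fractional part is one convenient way to realize the equivalence relation ∼; any other method of forming the classes would serve equally well.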
In order to calculate the signature of allowable primes, the set Λ(K) must be computed.
There is an elegant geometric formula for this computation that is based on the concept of
what we will call an overlap diagram, and so those diagrams will be described first.
Let (n, s) ∈ [1, ∞) × [1, ∞) and let g = (g(1), . . . , g(n)) be an n-tuple of positive integers.
We use g to construct the following array of points. In the plane, place s points horizontally
one unit apart, and label the j-th point as (1, j −1) for each j ∈ [1, s]. This is row 1. Suppose
that row i has been defined. One unit vertically down and g(i) units horizontally to the right
of the first point in row i, place s points horizontally one unit apart, and label the j-th point
as (i + 1, j − 1) for each j ∈ [1, s]. This is row i + 1. The array of points so formed by these
n + 1 rows is called the overlap diagram of g, the sequence g is called the gap sequence of the
overlap diagram, and a nonempty set that is formed by the intersection of the diagram with
a vertical line is called a column of the diagram. N.B. We do not distinguish between the
different possible positions in the plane which the overlap diagram may occupy. A typical
example with n = 3, s = 8, and gap sequence (3, 2, 2) looks like

· · · · · · · ·
      · · · · · · · ·
          · · · · · · · ·
              · · · · · · · ·

An overlap diagram

We need to describe how and where rows overlap in an overlap diagram. Begin by first
noticing that if (g(1), . . . , g(n)) is the gap sequence, then row i overlaps row j for i < j if
and only if
\[ \sum_{r=i}^{j-1} g(r) \leq s - 1; \]
in particular, row i overlaps row i + 1 if and only if g(i) ≤ s − 1. Now let 𝒢 denote the
set of all subsets G of [1, n] such that G is a nonempty set of consecutive integers maximal
with respect to the property that g(i) ≤ s − 1 for all i ∈ G. If 𝒢 is empty then g(i) ≥ s
for all i ∈ [1, n], and so there is no overlap of rows in the diagram. Otherwise there exists
m ∈ [1, 1 + [(n − 1)/2]] and strictly increasing sequences (l1 , . . . , lm ) and (M1 , . . . , Mm ) of
positive integers, uniquely determined by the gap sequence of the diagram, such that li ≤ Mi
for all i ∈ [1, m], 1 + Mi ≤ li+1 if i ∈ [1, m − 1], and
\[ \mathcal{G} = \{[l_i, M_i] : i \in [1, m]\}. \]
In fact, li+1 > 1 + Mi if i ∈ [1, m − 1], lest the maximality of the elements of 𝒢 be violated.
It follows that the intervals of integers [li , 1 + Mi ], i ∈ [1, m], are pairwise disjoint.
The set 𝒢 can now be used to locate the overlap between rows in the overlap diagram
like so: for i ∈ [1, m], let
Bi = [li , 1 + Mi ],
and set

ℬi = the set of all points in the overlap diagram whose labels are in Bi × [0, s − 1].

We refer to ℬi as the i-th block of the overlap diagram; thus the blocks of the diagram are
precisely the regions in the diagram in which rows overlap.
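The maximal runs of gaps not exceeding s − 1, and the row intervals of the resulting blocks, can be found in a single pass over the gap sequence. The following sketch uses 1-based row and gap indices as in the text; the function names are hypothetical.

```python
def overlap_runs(g, s):
    """Maximal runs (l, M) of consecutive indices i, 1-based, with g(i) <= s - 1."""
    runs, start = [], None
    for i, gap in enumerate(g, start=1):
        if gap <= s - 1:
            if start is None:
                start = i              # a new run begins at index i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(g)))
    return runs


def blocks(g, s):
    """Row intervals (l, 1 + M) of the blocks of the overlap diagram of g."""
    return [(l, M + 1) for l, M in overlap_runs(g, s)]


def rows_overlap(g, s, i, j):
    """Row i overlaps row j (i < j) iff g(i) + ... + g(j - 1) <= s - 1."""
    return sum(g[i - 1 : j - 1]) <= s - 1


print(blocks((3, 2, 2), 8))        # the example diagram with n = 3, s = 8
print(blocks((2, 9, 1, 1), 5))     # an illustrative gap sequence with two blocks
```

For the gap sequence (3, 2, 2) with s = 8, every gap is at most 7, so the whole diagram forms a single block of rows 1 through 4.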
We will now use the elements of ℛ to construct a series of overlap diagrams. Let R be
an element of ℛ such that D(R) has at least one coordinate that does not exceed s − 1.
Next, consider the nonempty and pairwise disjoint family of all subsets V of R such that
|V | ≥ 2 and V is maximal with respect to the property that the distance between consecutive
elements of V does not exceed s − 1. List the elements of V in increasing order and then for
each i ∈ [1, |V | − 1] let qV (i) denote the distance between the i-th element and the (i + 1)-th
element on that list. N.B. qV (i) ∈ [1, ∞), for all i ∈ [1, |V | − 1]. Finally, let D(V ) denote
the overlap diagram of the (|V | − 1)-tuple (qV (i) : i ∈ [1, |V | − 1]). Because qV (i) ≤ s − 1
for all i ∈ [1, |V | − 1], D(V ) consists of a single block.
Using a suitable positive integer m, we index all of the sets V that arise from all of
the elements of R in the previous construction as V1 , . . . , Vm and then define the quotient
diagram of (a, b) to be the m-tuple of overlap diagrams (D(Vn ) : n ∈ [1, m]). We will refer
to the diagrams D(Vn ) as the blocks of the quotient diagram.
The quotient diagram D of (a, b) will now be used to calculate the set Λ(K) determined
by (a, b) and hence the associated signature of an allowable prime. In order to see how this
goes, we will need to make use of a certain labeling of the points of D which we describe
next. Let V1 , . . . , Vm be the subsets of {q1 , . . . , qk } that determine the sequence of overlap
diagrams D(V1 ), . . . , D(Vm) which constitute D, and then find the subset Jn of [1, k] such
that Vn = {qj : j ∈ Jn }, with j ∈ Jn listed in increasing order (note that this ordering of Jn
also linearly orders qj , j ∈ Jn ). The overlap diagram D(Vn ) consists of |Jn | rows, with each
row containing s points. If i ∈ [1, |Jn |] is taken in increasing order then there is a unique
element j of Jn such that the i-th element of Vn is qj . Proceeding from left to right in each
row, we now take l ∈ [1, s] and label the l-th point of row i in D(Vn ) as (j, l − 1). N.B. This
labeling of the points of D(Vn ) does not necessarily coincide with the labeling of the points
of an overlap diagram that was used before to define the blocks of the diagram.
Next let C denote a column of one of the diagrams D(Vn ) which constitute D. We identify
C with the subset of [1, k] × [0, s − 1] defined by
(30) {(i, j) ∈ [1, k] × [0, s − 1] : (i, j) is the label of a point in C},
let 𝒞n denote the set of all subsets of [1, k] × [0, s − 1] which arise from all such identifications,
and then set 𝒞 = ⋃n 𝒞n . If θ denotes the projection of [1, k] × [0, s − 1] onto [1, k] then one
can show (Wright [63], Lemma 2.5) that K ∈ Kmax if and only if there exists a T ∈ 𝒞 such
that K = θ(T ), and so
\[ \Lambda(\mathcal{K}) = \bigcup_{T \in \mathcal{C}} E(\theta(T)). \tag{31} \]
T ∈C

When this formula for Λ(K) is now combined with (29), it follows that all of the data required
for an application of Theorem 9.9 can be easily read off directly from the set {q1 , . . . , qk }
and the quotient diagram of (a, b).
At this juncture, some concrete examples which illustrate the mathematical technology
that we have introduced are in order. But before we get to those, recall that if (a, b) is an
admissible 2k-tuple, B is the set formed by the coordinates b1 , . . . , bk of b, Si = {ai + bi j :
j ∈ [0, s − 1]}, where ai is the i-th coordinate of a, i ∈ [1, k], and S is the k-tuple of sets
(S1 , . . . , Sk ), then the pair (B, S) determines by way of Theorem 9.9 the asymptotic behavior
of |{A ∈ AP (a, b; s) ∩ 2^[1,p−1] : χp (a) = ε, for all a ∈ A}|, ε ∈ {−1, 1}. Hence for this pair,
we use the more specific notation Π± (a, b) for the sets Π± (B, S) in the statement of Theorem
9.9.
Now for the examples. We start with a simple example which illustrates how the pa-
rameter e and the set Λ(K) are calculated from (29) and (31). Suppose that s = 5 and
the quotient diagram of the admissible 8-tuple (a, b) consists of the single overlap diagram
located in the plane as follows:

[Figure 1. Location of the quotient diagram (axis labels: q1, q2, q3, q4)]

Then k = 4, q2 = q1 + 2, q3 = q2 + 2, and q4 = q3 + 3, hence the set ℛ consists of the single
set R = {q1 , q2 , q3 , q4 }, whence D(R) = (2, 2, 3) and MR (5) = {2, 2, 3}. It therefore follows
from (29) that
\[ e = (5 - 2) + (5 - 2) + (5 - 3) = 8. \]
Consequently,
\[ \alpha - e = 4 \cdot 5 - 8 = 12. \]
We have that
\[ b_i^{-1} S_i = \{q_i + j : j \in [0, 4]\}, \quad i = 1, 2, 3, 4, \]
and so 12 is also the cardinality of the union of the sets bi⁻¹Si , i ∈ [1, 4].
Turning to the calculation of Λ(K) by means of (31), note first that the quotient diagram
of (a, b) consists of a single block D(V ) with V = {q1 , q2 , q3 , q4 }. The set of indices of the
elements of V is J = {1, 2, 3, 4}, and so the points of D(V ) are labeled as indicated in Figure
2:

(1, 0) (1, 1) (1, 2) (1, 3) (1, 4)
              (2, 0) (2, 1) (2, 2) (2, 3) (2, 4)
                            (3, 0) (3, 1) (3, 2) (3, 3) (3, 4)
                                                 (4, 0) (4, 1) (4, 2) (4, 3) (4, 4)

q1            q2            q3                   q4

Figure 2. Labeled points of the quotient diagram


The columns in 𝒞, identified as subsets of {1, 2, 3, 4} × {0, 1, 2, 3, 4}, are hence

{(1, 0)}, {(1, 1)}, {(1, 2), (2, 0)}, {(1, 3), (2, 1)}, {(1, 4), (2, 2), (3, 0)}, {(2, 3),
(3, 1)}, {(2, 4), (3, 2)}, {(3, 3), (4, 0)}, {(3, 4), (4, 1)}, {(4, 2)}, {(4, 3)}, and {(4, 4)}.
From this, we find that the sets θ(T ), T ∈ C, are

{1}, {1, 2}, {1, 2, 3}, {2, 3}, {3, 4}, and {4},

and so Λ(K), according to (31), consists of the sets

{1, 2}, {1, 3}, {2, 3}, and {3, 4}.

Consequently, the (a, b)-signature of an allowable prime p is

{χp (b1 b2 ), χp (b1 b3 ), χp (b2 b3 ), χp (b3 b4 )}.
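The passage from columns to Λ(K) in this example can be reproduced mechanically. The sketch below places the rows of a single-block overlap diagram at horizontal offsets given by the partial sums of the gap sequence, reads off the set of row labels met by each column, and then forms all nonempty subsets of even cardinality as in (31). The function names and the `labels` argument, which lists the index j attached to each row, are hypothetical.

```python
from itertools import combinations


def column_index_sets(gaps, s, labels):
    """The distinct sets theta(T), T a column, listed from left to right."""
    starts = [0]
    for g in gaps:
        starts.append(starts[-1] + g)      # horizontal offset of each row
    sets = []
    for x in range(starts[-1] + s):
        rows = frozenset(
            lab for lab, st in zip(labels, starts) if st <= x <= st + s - 1
        )
        if rows and rows not in sets:
            sets.append(rows)
    return sets


def lambda_from_columns(index_sets):
    """Union over columns of the nonempty even-cardinality subsets of theta(T)."""
    lam = set()
    for K in index_sets:
        for size in range(2, len(K) + 1, 2):
            lam.update(frozenset(I) for I in combinations(sorted(K), size))
    return lam


cols = column_index_sets((2, 2, 3), 5, [1, 2, 3, 4])
print([sorted(K) for K in cols])
print(sorted(sorted(I) for I in lambda_from_columns(cols)))
```

With gap sequence (2, 2, 3) and s = 5 this returns the six sets θ(T ) and the four elements of Λ(K) listed above.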

Next, let m ∈ [1, +∞) and for each n ∈ [1, m], let D(n) be a fixed but arbitrary overlap
diagram with kn rows, kn ≥ 2, and gap sequence (d(i, n) : i ∈ [1, kn − 1]), with no gap
exceeding s − 1. Let k0 = 0 and k = k0 + k1 + · · · + km . We will now exhibit infinitely many admissible
2k-tuples (a, b) whose quotient diagram is ∆ = (D(n) : n ∈ [1, m]). This is done by taking
the (k − 1)-tuple (d1 , . . . , dk−1 ) in the following lemma to be
\[
d_i =
\begin{cases}
d\Big( i - \sum_{j=0}^{n} k_j,\ n + 1 \Big), & \text{if } n \in [0, m-1] \text{ and } i \in \Big[ 1 + \sum_{j=0}^{n} k_j,\ -1 + \sum_{j=0}^{n+1} k_j \Big], \\
s, & \text{elsewhere,}
\end{cases}
\]
and then letting (a, b) be any 2k-tuple obtained from the construction in the lemma.

Lemma 9.10. For k ∈ [2, ∞), let (d1 , . . . , dk−1 ) be a (k − 1)-tuple of positive integers.
Define k-tuples (a1 , . . . , ak ), (b1 , . . . , bk ) of positive integers inductively as follows: let (a1 , b1 )
be arbitrary, and if i ≥ 1 and (ai , bi ) has been defined, choose ti ∈ [2, ∞) and set
\[ a_{i+1} = t_i (a_i + d_i b_i), \quad b_{i+1} = t_i b_i. \]
Then
\[ \frac{a_i}{b_i} - \frac{a_j}{b_j} = \sum_{r=j}^{i-1} d_r, \quad \text{for all } i > j. \]

Proof. This is a straightforward calculation using the recursive definition of the k-tuples
(a1 , . . . , ak ) and (b1 , . . . , bk ). QED
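The recursion in Lemma 9.10 is easy to run, and doing so makes the telescoping identity visible: each step multiplies numerator and denominator by ti after adding di to the quotient. The sketch below builds (a, b) from a gap tuple d and multipliers t and checks the identity with exact rational arithmetic. The name `build_tuple` and the default values a1 = b1 = 1 are illustrative choices.

```python
from fractions import Fraction


def build_tuple(d, t, a1=1, b1=1):
    """Build (a, b) via a_{i+1} = t_i (a_i + d_i b_i), b_{i+1} = t_i b_i."""
    a, b = [a1], [b1]
    for di, ti in zip(d, t):
        a.append(ti * (a[-1] + di * b[-1]))
        b.append(ti * b[-1])
    return a, b


d = (2, 2, 3, 5, 2, 2, 3)
a, b = build_tuple(d, [4] * 7)             # t_i = n^2 with n = 2
q = [Fraction(x, y) for x, y in zip(a, b)]
print(a)
print(q)

# The identity of Lemma 9.10: q_i - q_j = d_j + ... + d_{i-1} for i > j (0-based here).
for i in range(len(q)):
    for j in range(i):
        assert q[i] - q[j] == sum(d[j:i])
```

Since the quotients depend only on d and on a1 /b1 , the multipliers ti can be varied freely (squares, primes, and so on) without changing the quotient diagram.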
We can use Lemma 9.10 to also find infinitely many admissible 2k-tuples (a, b) with
quotient diagram ∆ and such that the set Π− (a, b) is empty. To do this, simply choose
the integer b1 and all subsequent ti ’s used in the above construction from Lemma 9.10 to
be squares. This shows that there are infinitely many admissible 2k-tuples with a specified
quotient diagram which satisfy the hypotheses of Theorem 9.9(ii)(b). On the other hand,
if b1 and all the subsequent ti ’s are instead chosen to be distinct primes, it follows that
the 2k-tuples determined in this way all have quotient diagram ∆ and each have Π− (a, b)
of infinite cardinality, and so there are infinitely many admissible 2k-tuples with specified
quotient diagram which satisfy the hypotheses of Theorem 9.9(ii)(c). We also note that if
all of the coordinates of (d1 , . . . , dk−1) in Lemma 9.10 are chosen to exceed s − 1 then we
obtain infinitely many admissible 2k-tuples which satisfy the hypothesis of Theorem 9.9(i).
For example, suppose we want to find infinitely many admissible 16-tuples (a, b) whose
quotient diagram consists of two copies of the overlap diagram in Figure 1, and for which
Π− (a, b) is empty. We note first that the gap sequence of the overlap diagram is (2, 2, 3),
then take the 7-tuple in Lemma 9.10 to be (2, 2, 3, 5, 2, 2, 3), select n ∈ [2, ∞), set t_i = n^2
and a_1 = b_1 = 1 in the recursive formulae for a_{i+1} and b_{i+1}, i = 1, 2, 3, 4, 5, 6, 7, to obtain
(a, b) = (1, 3n^2, 5n^4, 8n^6, 13n^8, 15n^{10}, 17n^{12}, 20n^{14}, 1, n^2, n^4, n^6, n^8, n^{10}, n^{12}, n^{14}).
If we also wish to find infinitely many admissible 16-tuples (a, b) with this quotient diagram,
but for which Π− (a, b) is infinite, then in this same recipe, let {p1 , p2 , p3 , p4 , p5 , p6 , p7 } be
any set of 7 primes and take ti = pi for i = 1, 2, 3, 4, 5, 6, 7 to obtain
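(a, b) = (1, 3p_1, 5\prod_{i=1}^{2} p_i, 8\prod_{i=1}^{3} p_i, 13\prod_{i=1}^{4} p_i, 15\prod_{i=1}^{5} p_i, 17\prod_{i=1}^{6} p_i, 20\prod_{i=1}^{7} p_i,
1, p_1, \prod_{i=1}^{2} p_i, \prod_{i=1}^{3} p_i, \prod_{i=1}^{4} p_i, \prod_{i=1}^{5} p_i, \prod_{i=1}^{6} p_i, \prod_{i=1}^{7} p_i).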

Finally, to find infinitely many admissible 16-tuples (a, b) which satisfy the hypothesis of
Theorem 9.9(i), take the 7-tuple in Lemma 9.10 to be (5, 5, 5, 5, 5, 5, 5) and ti = n for
i = 1, 2, 3, 4, 5, 6, 7 for n ∈ [2, ∞) to obtain
(a, b) = (1, 6n, 11n^2, 16n^3, 21n^4, 26n^5, 31n^6, 36n^7, 1, n, n^2, n^3, n^4, n^5, n^6, n^7).
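The recursion of Lemma 9.10 that produces these tuples can be sketched in a few lines of Python. This is only an illustration: the function names build_tuples and check_gap_identity are ours, and the parameters (the gap tuple (2, 2, 3, 5, 2, 2, 3), t_i = n^2 with n = 2, and a_1 = b_1 = 1) are one instance of the first example above.

```python
from fractions import Fraction

def build_tuples(d, t, a1=1, b1=1):
    """Build (a, b) via a_{i+1} = t_i (a_i + d_i b_i), b_{i+1} = t_i b_i."""
    if len(d) != len(t):
        raise ValueError("d and t must both have length k - 1")
    a, b = [a1], [b1]
    for d_i, t_i in zip(d, t):
        a.append(t_i * (a[-1] + d_i * b[-1]))
        b.append(t_i * b[-1])
    return a, b

def check_gap_identity(a, b, d):
    """Check a_i/b_i - a_j/b_j = d_j + ... + d_{i-1} for all i > j (lists are 0-based)."""
    k = len(a)
    return all(
        Fraction(a[i], b[i]) - Fraction(a[j], b[j]) == sum(d[j:i])
        for j in range(k)
        for i in range(j + 1, k)
    )

d = [2, 2, 3, 5, 2, 2, 3]          # gap tuple from the first example
n = 2                              # illustrative choice of n in [2, oo)
a, b = build_tuples(d, [n * n] * len(d))
print(a)
print(b)
print(check_gap_identity(a, b, d))
```

Exact rational arithmetic is used so that the identity of Lemma 9.10 is tested without rounding error.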
With this cornucopia of examples in hand, for ε ∈ {−1, 1}, we let qε (p) denote the
cardinality of the set
{A ∈ AP(a, b; s) ∩ 2^{[1,p−1]} : χp (a) = ε, for all a ∈ A},
where (a, b) is admissible. We will now use the quotient diagram of (a, b), formulae (29),
(31), and Theorem 9.9 to study how (a, b) determines the asymptotic behavior of qε (p) in
specific situations. We will illustrate how things work when k = 2 and 3, and for when
“minimal” or “maximal” overlap is present in the quotient diagram of (a, b).
When k = 2, there is at most a single overlap of rows in the quotient diagram of
(a, b), and if, e.g., a1 b2 − a2 b1 = qb1 b2 with 0 < q ≤ s − 1, then the quotient diagram looks
like


· · · · · · · ·
← q → · · · · · · · ·

Figure 3. A quotient diagram for k = 2

We have that α = 2s and, because of (29), e = s − q. Formula (31) shows that the signature
of p is {χp (b1 b2 )}, and so we conclude from Theorem 9.9 that when b1 b2 is a square,

q_ε(p) ∼ (b · 2^{s+q})^{−1} p, as p → +∞,

and when b1 b2 is not a square, Π+ (a, b) is the set of all allowable primes p such that {b1 , b2 }
is either a set of residues of p or a set of non-residues of p, Π− (a, b) is the set of all allowable
primes p such that {b1 , b2 } contains a residue of p and a non-residue of p,

qε (p) = 0, for all p in Π− (a, b),

and as p → +∞ inside Π+ (a, b),

q_ε(p) ∼ (b · 2^{s+q})^{−1} p.

When k = 3 there are exactly three types of overlap possible in the quotient diagram of
(a, b), determined, e.g., when either
(i) exactly one,
(ii) exactly two, or
(iii) exactly three
of b1 b2 , b2 b3 , and b1 b3 divide, respectively, a2 b1 −a1 b2 , a3 b2 −a2 b3 , and a3 b1 −a1 b3 with positive
quotients not exceeding s − 1.
In case (i), with a2 b1 − a1 b2 = qb1 b2 , say, the block in the quotient diagram of (a, b)
is formed by a single overlap between rows 1 and 2, and this block looks exactly like the
overlap diagram that was displayed for k = 2 above. It follows that the conclusions from
(29), (31), and Theorem 9.9 in case (i) read exactly like the conclusions in the k = 2 case
described before, except that the exponent of the power of 1/2 in the coefficient of p in the
asymptotic approximation is now 2s + q rather than s + q.
In case (ii), with a2 b1 −a1 b2 = qb1 b2 and a3 b2 −a2 b3 = rb2 b3 , say, the block in the quotient
diagram is formed by an overlap between rows 1 and 2 and an overlap between rows 2 and
3, but no overlap between rows 1 and 3. Hence the diagram looks like


· · · · · · · ·
← q → · · · · · · · ·
← r → · · · · · · · ·

Figure 4. A quotient diagram for k = 3

Here α = 3s, and, because of (29) and (31), e = 2s − q − r and the signature of p is
{χp (b1 b2 ), χp (b2 b3 )}. We hence conclude from Theorem 9.9 that if b1 b2 and b2 b3 are both
squares then

(32) q_ε(p) ∼ (b · 2^{s+q+r})^{−1} p as p → +∞.

On the other hand, if either b1 b2 or b2 b3 is not a square then Π+ (a, b) consists of all allowable
primes p such that {b1 , b2 , b3 } is either a set of residues of p or a set of non-residues of p,
Π− (a, b) consists of all allowable primes p such that {b1 , b2 , b3 } contains a residue of p and
a non-residue of p,

(33) qε (p) = 0, for all p ∈ Π− (a, b), and

(34) q_ε(p) ∼ (b · 2^{s+q+r})^{−1} p as p → +∞ inside Π+ (a, b).

In case (iii), with the quotients q and r determined as in case (ii), and, in addition,
a3 b1 − a1 b3 = tb1 b3 , say, the block in the quotient diagram is now formed by an overlap
between each pair of rows, and so the diagram looks like

· · · · · · · ·
← q → · · · · · · · ·
← r → · · · · · · · ·

Figure 5. Another quotient diagram for k = 3

It follows that α = 3s, e = 2s − q − r, and the signature of p is {χp (b1 b2 ), χp (b1 b3 ), χp (b2 b3 )}.
In this case, the asymptotic approximation (32) holds whenever b1 b2 , b1 b3 , and b2 b3 are all
squares, and when at least one of these integers is not a square, Π+ (a, b) and Π− (a, b) are
determined by {b1 , b2 , b3 } as before and (33) and (34) are valid.
Minimal overlap. Here we take the quotient diagram to consist of a single block with gap
sequence (s − 1, s − 1, . . . , s − 1), so that the overlap between rows is as small as possible: a
typical quotient diagram for s = 4 and k = 5 looks like


· · · ·
· · · ·
· · · ·
· · · ·
· · · ·

Figure 6. A quotient diagram with minimal overlap

Here α = ks, e = k − 1, and the signature of p is {χp (bi bi+1 ) : i ∈ [1, k − 1]}. Hence via
Theorem 9.9 , if bi bi+1 , i ∈ [1, k − 1], are all squares then
q_ε(p) ∼ (b · 2^{1+k(s−1)})^{−1} p as p → +∞,
and if at least one of those products is not a square, then Π+ (a, b) consists of all allowable
primes p such that {b1 , . . . , bk } is either a set of residues of p or a set of non-residues of p,
Π− (a, b) consists of all allowable primes p such that {b1 , . . . , bk } contains a residue of p and
a non-residue of p,
(35) qε (p) = 0, for all p ∈ Π− (a, b), and
q_ε(p) ∼ (b · 2^{1+k(s−1)})^{−1} p as p → +∞ inside Π+ (a, b).
Maximal overlap (k ≥ 3). Here we take the quotient diagram to consist of a single block
with gap sequence (1, 1, . . . , 1) and k = s, so that the overlap between each pair of rows is
as large as possible: the diagrams for k = 3, 4, and 5 look like

· · · · · · · · · · · ·
· · · · · · · · · · · ·
· · · · · · · · · · · ·
· · · · · · · · ·
· · · · ·

Figure 7. Quotient diagrams with maximal overlap

We have in this case that α = k^2, e = (k − 1)^2, and the signature of p is

{χ_p(\prod_{i∈I} b_i) : I ∈ E([1, k])}.

Hence if \prod_{i∈I} b_i is a square for all I ∈ E([1, k]) then
q_ε(p) ∼ (b · 2^{2k−1})^{−1} p as p → +∞,

and if one of these products is not a square then Π+ (a, b) and Π− (a, b) are determined by
{b1 , . . . , bk } as before, (35) holds, and

q_ε(p) ∼ (b · 2^{2k−1})^{−1} p as p → +∞ inside Π+ (a, b).

It follows from our discussion after the proof of Theorem 9.9 that an increase in the
number of overlaps between rows in the quotient diagram of (a, b) leads to an increase in
the asymptotic number of elements of AP(a, b; s) ∩ 2^{[1,p−1]} that are sets of residues or
non-residues of p, and these examples now verify that principle quantitatively. In order to make
this explicit, note first that Lemma 9.10 can be used to generate examples in which the
(k − 1)-tuple (d1 , . . . , dk−1) varies arbitrarily, while at the same time b = max{b1 , . . . , bk }
always takes the same value. Hence we may assume in the discussion to follow that the
value of b is constant in each set of examples, and so the only parameter that is relevant
when comparing asymptotic approximations to qε (p) is the exponent of the power of 1/2 in
the coefficient of that approximation. When k = 2, there is either no overlap between rows
or exactly 1 overlap; in the former case, the exponent in the power of 1/2 that occurs in
the asymptotic approximation to qε (p) is 2s and in the latter case this exponent is less than
2s. When k = 3 there are 0, 1, 2, or 3 possible overlaps between rows, with the last three
possibilities occurring, respectively, in cases (i), (ii), and (iii) above. It follows that q < s in
case (i), q + r ≥ s in case (ii) and q + r < s in case (iii). Hence the exponent in the power
of 1/2 that occurs in the asymptotic approximation to qε (p) is 3s when no overlap occurs,
is greater than 2s and less than 3s in case (i), is at least 2s and less than 3s in case (ii),
and is less than 2s in case (iii). If we also take k = s when there is minimal overlap in the
quotient diagram and compare that to what happens when there is maximal overlap there,
we see that the exponent in the power of 1/2 that occurs in the asymptotic approximation
of qε (p) is quadratic in k, i.e., k^2 − k + 1, in the former case, but only linear in k, i.e., 2k − 1,
in the latter case.
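The comparison between minimal and maximal overlap can be tabulated with a short Python sketch. We assume here, as the displayed asymptotic formulae suggest, that the exponent of the power of 1/2 is α − e; the function names are ours and purely illustrative.

```python
def exponent_minimal(k, s):
    """alpha - e with alpha = k*s and e = k - 1 (gap sequence (s-1, ..., s-1))."""
    return k * s - (k - 1)

def exponent_maximal(k):
    """alpha - e with alpha = k^2 and e = (k - 1)^2 (gap sequence (1, ..., 1), k = s)."""
    return k * k - (k - 1) ** 2

# Compare the two regimes when k = s.
for k in range(3, 9):
    print(k, exponent_minimal(k, k), exponent_maximal(k))
```

With k = s the first column of exponents grows like k^2 − k + 1 and the second like 2k − 1, which is the quadratic versus linear behavior described above.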

12. The Asymptotic Density of Π+ (a, b)

Suppose that (a, b) is a standard 2k-tuple and assume that there exists an I ∈ Λ(K)
such that \prod_{i∈I} b_i is not a square. Then, in accordance with Theorem 9.9, the sets Π+ (a, b)
and Π− (a, b) are both infinite, and so it is of interest to calculate their asymptotic density.
Because Π+ (a, b) and Π− (a, b) are disjoint sets with only finitely many primes outside of
their union, it follows that

the density of Π+ (a, b) + the density of Π− (a, b) = 1,

so it suffices to calculate only the density of Π+ (a, b).



In order to keep the technicalities from becoming too complicated, we will describe this
calculation for the following special case: assume that
(36) (a, b) is admissible, the square-free parts σi = σ(bi ) of the coordinates
bi of b are distinct and for each nonempty subset T of [1, k], \prod_{i∈T} σ_i is
not a square.
This condition is satisfied, for example, if

(37) bi is square-free for all i and π(bi ) is a proper subset of π(bi+1 ), for all i ∈ [1, k − 1].

Moreover for each k ∈ [2, ∞), Lemma 9.10 can be used to construct infinitely many admis-
sible 2k-tuples with a fixed but arbitrary quotient diagram which satisfy (37).
Let (D(V1 ), . . . , D(Vm )) be the quotient diagram of (a, b) and let Di be the subset of
[1, k] such that Vi = {qj : j ∈ Di }, i ∈ [1, m]; as the sets V1 , . . . , Vm are pairwise disjoint, so
also are the sets D1 , . . . , Dm .
Now, let Ci denote the set of columns of the overlap diagram D(Vi ), realized as subsets
of [1, k] × [0, s − 1] as per the identification given by (30), and let
Λ_i(K) = \bigcup_{C∈C_i} E(θ(C)).

Then
(38) \bigcup_{I∈Λ_i(K)} I = \bigcup_{C∈C_i} θ(C) = D_i, i ∈ [1, m],

and so it follows from the pairwise disjointness of the Di ’s, these equations, and (31) that
(39) Λ(K) = \bigcup_{i} Λ_i(K), and this union is pairwise disjoint.

Next, for each I ∈ Λ(K) let


S(I) = {σi : i ∈ I},
and then set
M1 = {I ∈ Λ(K) : 1 ∈ S(I)}.
If M1 ≠ ∅ then there is a unique element n_0 of \bigcup_i D_i such that σ_{n_0} = 1, hence it follows
from (38) and (39) that there is a unique element i0 of [1, m] such that

M1 = {I ∈ Λi0 (K) : n0 ∈ I}.

It can then be shown that if

σ = \sum_{i} |D_i|

and
m = the number of blocks in the quotient diagram of (a, b),
then the density of Π+ (a, b) is
(40) 2^{m−σ}, if M1 = ∅ or M1 = Λ_{i_0}(K), or

(41) 2^{1−σ}(2^m − 1), if ∅ ≠ M1 ≠ Λ_{i_0}(K).
It follows that whenever (a, b) is an admissible 2k-tuple for which the square-free parts
of the coordinates of b are distinct and satisfy condition (36), the cardinality of \bigcup_i D_i, the
number of blocks m in the quotient diagram, and the set M1 completely determine the
density of Π+ (a, b) by means of formulae (40) and (41). Those formulae show that each
element of \bigcup_i D_i contributes a factor of 1/2 to the density of Π+ (a, b) and each block of
the quotient diagram of (a, b) contributes essentially a factor of 2 to the density. Because
|Vi | ≥ 2 for all i, it follows that |Di | ≥ 2 for all i and so σ ≥ 2m; in particular, the density of
Π+ (a, b) is at most 2^{−m} whenever M1 = ∅ or M1 = Λ_{i_0}(K) and is at most (2^m − 1)/2^{2m−1}
otherwise. This gives an interesting number-theoretic interpretation to the number of blocks
in the quotient diagram. In fact, if for each k ∈ [2, ∞), we let Ak denote the set of all
admissible 2k-tuples which satisfy condition (36), set A = \bigcup_{k∈[2,∞)} A_k, and take m ∈ [1, ∞),
then Lemma 9.10 can be used to show that there exist infinitely many elements (a, b) of
A such that the quotient diagram of (a, b) has m blocks and the density of Π+ (a, b) is 2^{−m}
(respectively, (2^m − 1)/2^{2m−1}). One can also show that if {l, n} ⊆ [1, ∞), with l ≥ 2n, then
there are infinitely many elements (a, b) of A such that the density of Π+ (a, b) is 2^{1−l}(2^n − 1).
For more details in this situation and for what transpires for arbitrary standard 2m-
tuples, we refer the interested reader to Wright [63].
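Formulae (40) and (41) are easy to evaluate exactly. The following Python sketch does so under the standing assumption (36); the function name density_pi_plus and the string labels for the two cases are our own illustrative choices, and the caller is assumed to know the block sizes |D_i| and which case M1 falls into.

```python
from fractions import Fraction

def density_pi_plus(block_sizes, m1_case):
    """Density of Pi_+(a, b) according to (40) and (41).

    block_sizes: the sizes |D_i|, one entry per block of the quotient diagram.
    m1_case: "empty_or_full" if M_1 is empty or equals Lambda_{i_0}(K),
             "proper" if M_1 is nonempty and different from Lambda_{i_0}(K).
    """
    if any(size < 2 for size in block_sizes):
        raise ValueError("each |D_i| must be at least 2")
    m = len(block_sizes)          # number of blocks
    sigma = sum(block_sizes)      # sigma = sum of the |D_i|
    if m1_case == "empty_or_full":
        return Fraction(2 ** m, 2 ** sigma)                 # (40)
    if m1_case == "proper":
        return Fraction(2 * (2 ** m - 1), 2 ** sigma)       # (41)
    raise ValueError("m1_case must be 'empty_or_full' or 'proper'")

print(density_pi_plus([2, 2], "empty_or_full"))
print(density_pi_plus([2, 2], "proper"))
print(density_pi_plus([3, 2, 4], "proper"))
```

When every block has exactly two rows, σ = 2m and the two cases return the extreme values 2^{−m} and (2^m − 1)/2^{2m−1} mentioned above.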

CHAPTER 10

Are quadratic residues randomly distributed?

The purpose of this chapter is to provide evidence that the answer to the question in the
title is yes. By examining tables of residues and non-residues of certain primes in section
1, we observe that residues can occur in very irregular patterns. In section 2, we will show
how to view sums of the values of Legendre symbols χp as random variables and then we
will employ the Central Limit Theorem from probability theory to determine a condition
under which, at least when p is sufficiently large, the values of χp can be interpreted to
behave randomly and independently. In section 3, a very interesting result of Davenport and
Erdös on the distribution of residues will then be employed to verify that the condition
from section 2 that detects random behavior of residues and non-residues does indeed hold.
Interestingly enough, the Weil-sum estimates from Theorem 9.1, which were so useful in our
work in Chapter 9, will also be very useful in our proof of Davenport and Erdös’ result.

1. Irregularity of the Distribution of Quadratic Residues

Extensive numerical calculations performed over the years indicate that, at least in certain
subintervals of [1, p − 1], residues and non-residues of p occur in very irregular patterns. For
example, we present below four tables which exhibit the residues and non-residues of the
primes 41, 79, 101, and 139. A 0 indicates that the corresponding entry is a residue of the
indicated prime and 1 indicates that the corresponding entry is a non-residue. The tables for
41 and 101 are palindromic, i.e., they read the same from left to right, starting from the first
entry in the table, as from right to left, starting from the last entry (note that 41 and 101
are congruent to 1 mod 4), and the tables for 79 and 139 become palindromic if the 0 and 1
entries in the last half of the tables are switched to 1 and 0, respectively (note that 79 and
139 are congruent to 3 mod 4). However, the entries in various subintervals of consecutive
integers in the first half of the tables are fairly irregular and do not appear to exhibit any
predictable pattern.


1-20: 0 0 1 0 0 1 1 0 0 0 1 1 1 1 1 0 1 0 1 0
21-40: 0 1 0 1 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 0

Table 1: Residues and Non-residues of 41

1-20: 0 0 1 0 0 1 1 0 0 0 0 1 0 1 1 0 1 0 0 0
21-40: 0 0 0 1 0 0 1 1 1 1 0 0 1 1 1 0 1 0 1 0
41-60: 1 0 1 0 0 0 1 1 0 0 0 0 1 1 0 1 1 1 1 1
61-78: 1 0 1 0 0 1 0 1 1 1 1 0 0 1 1 0 1 1

Table 2: Residues and Non-residues of 79

1-20: 0 1 1 0 0 0 1 1 0 1 1 1 0 0 1 0 0 1 0 0
21-40: 0 0 0 0 0 1 1 1 1 0 0 1 0 1 1 0 0 1 1 1
41-60: 1 1 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1
61-80: 1 1 1 0 0 1 1 0 1 0 0 1 1 1 1 0 0 0 0 0
81-100: 0 0 1 0 0 1 0 0 1 1 1 0 1 1 0 0 0 1 1 0

Table 3: Residues and Non-residues of 101

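1-20: 0 1 1 0 0 0 0 1 0 1 0 1 0 1 1 0 1 1 1 0
21-40: 1 1 1 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 1 1
41-60: 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 1 1
61-80: 1 1 0 0 0 0 0 1 0 1 0 1 1 1 1 1 0 0 0 0
81-100: 0 1 0 1 1 0 1 1 0 1 0 1 1 1 1 0 1 1 0 0
101-120: 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 1 0
121-138: 0 0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 0 1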

Table 4: Residues and Non-residues of 139
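Tables of this kind are easily regenerated. The following Python sketch uses Euler's criterion to decide whether n is a residue of p and prints the 0/1 rows in blocks of 20; the function names residue_row and format_table and the row width are illustrative choices of ours.

```python
def residue_row(p):
    """Entry n-1 is 0 if n is a residue of the odd prime p and 1 otherwise, n = 1..p-1."""
    # Euler's criterion: n is a residue of p iff n^((p-1)/2) is congruent to 1 mod p.
    return [0 if pow(n, (p - 1) // 2, p) == 1 else 1 for n in range(1, p)]

def format_table(p, width=20):
    """Render the row for p in blocks of `width` entries, labelled by their range."""
    row = residue_row(p)
    lines = []
    for start in range(0, len(row), width):
        chunk = row[start:start + width]
        label = f"{start + 1}-{start + len(chunk)}:"
        lines.append(f"{label:<9}" + " ".join(map(str, chunk)))
    return "\n".join(lines)

for p in (41, 79, 101, 139):
    print(f"Residues and Non-residues of {p}")
    print(format_table(p))
    print()
```

The palindromic behavior noted above can be checked directly: residue_row(p) equals its reverse when p ≡ 1 mod 4 and equals the complement of its reverse when p ≡ 3 mod 4.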

This has led to speculation about whether residues occur more or less randomly in certain
intervals of consecutive integers. In the following section, we set up a procedure which can
be used to provide positive evidence for the contention that residues are in fact distributed
in this manner.

2. Detecting Random Behavior Using the Central Limit Theorem

The method which we will use to detect random behavior in the distribution of residues
employs the Central Limit Theorem from the mathematical theory of probability. In order
to set the stage for that result, we will briefly review some basic facts and terminology from
probability theory.
One starts with a probability space, i.e., a triple (Ω, M, µ) consisting of a set Ω (the
sample space), a distinguished σ-algebra M of subsets of Ω (the events), and a non-negative,
countably additive measure µ defined on M such that µ(Ω) = 1 (the probability measure).
A random variable X on Ω is an extended real-valued function defined on Ω such that for
each real number r, the set {ω ∈ Ω : X(ω) < r} is in M, i.e., X is measurable with respect to
M. The mean and variance of a random variable X are the values of the integrals

\bar{X} = \int_Ω X dµ and \int_Ω (X − \bar{X})^2 dµ,

respectively. The distribution function of X is the function defined on the real line R by

λ → µ({ω ∈ Ω : X(ω) ≤ λ}), λ ∈ R.
It can be shown that a non-negative function F defined on R is the distribution function of
a random variable if and only if F is non-decreasing, right continuous, limλ→−∞ F (λ) = 0
and limλ→+∞ F (λ) = 1 (Chung [4], Theorem 2.2.4).
A set {X1 , . . . , Xn } of random variables on the probability space (Ω, M, µ) is (stochasti-
cally) independent if for any n-tuple (B1 , . . . , Bn ) of Borel subsets of the real line, we have
that

µ(\bigcap_{i=1}^{n} {ω ∈ Ω : X_i(ω) ∈ B_i}) = \prod_{i=1}^{n} µ({ω ∈ Ω : X_i(ω) ∈ B_i}).

An infinite sequence (Xn ) of random variables is independent if every finite subset of the
Xi ’s is independent.
Stochastic independence is a way of making mathematically precise the intuitive notion
of describing how events are determined by the outcomes of random trials. An unbiased
coin is tossed, the two possible outcomes are recorded as 0 or 1, with probabilities of roughly
1/2 each, and repeated tossings generate a sequence of outcomes. In a similar way, we
may repeatedly cast a fair die, or draw colored beads (with replacement) from an urn, or
take repeated measurements of a certain quantity from a sample population, each process
generating values for a sequence of random variables. As Chung [4] states, “it is very easy
to conceive of undertaking these various trials under conditions such that their respective
outcomes do not appreciably effect each other.... In this circumstance, idealized trials are
carried out “independently of one another” and the corresponding [random variables] are
“independent” according to definition”. In fact, according to Chung, “it may be said that
no one could have learned the subject of probability properly without acquiring some feeling
for the intuitive content of the concept of stochastic independence.”
Sequences of independent random variables exist in abundance. One can in fact prove
that if, for each positive integer n, Yn is a fixed but arbitrary random variable defined
on a fixed but arbitrary probability space (Ωn , Mn , µn ), then there exists a probability
space (Ω, M, µ) and a sequence of independent random variables (Xn ) on (Ω, M, µ) such
that, for all n, Xn and Yn have the same mean, variance, and distribution function. The
construction of (Ω, M, µ) and the sequence of independent random variables (Xn ) uses the
measure-theoretic infinite product of the probability spaces (Ωn , Mn , µn ) and is hence too
involved to go into further here; for complete details of this construction, we refer the
interested reader to Chung [4], Theorem 3.3.4 and its proof.
Now, suppose that X1 , X2 , . . . is a sequence of random variables defined on a probability
space (Ω, M, µ) which is independent, identically distributed, i.e., all Xn ’s have the same
distribution function, and each random variable has mean 0 and variance 1. If we set
S_n = \sum_{k=1}^{n} X_k, n ∈ [1, ∞),

then the Central Limit Theorem (Chung [4], Theorem 6.4.4) asserts that for each real number
λ,
(1) \lim_{n→+∞} µ({ω ∈ Ω : S_n(ω)/\sqrt{n} ≤ λ}) = \frac{1}{\sqrt{2π}} \int_{−∞}^{λ} e^{−t^2/2} dt,

i.e., as n → +∞, S_n/\sqrt{n} tends to become normally distributed with mean 0 and variance 1.
Now let p be a prime. We convert the set [0, p − 1] into a (discrete and finite) probability
space by assigning probability 1/p to each element of [0, p − 1]. This induces the probability
measure µp on [0, p − 1] defined by
(2) µ_p(S) = |S|/p, S ⊆ [0, p − 1].
For each positive integer h < p, consider the sums
S_h(x) = \sum_{n=x+1}^{x+h} χ_p(n), x = 0, . . . , p − 1,

which is just the quadratic excess of the interval (x, x + h + 1) that we studied in Chapter
7. The function Sh is a random variable on ([0, p − 1], µp ), and so by way of analogy with
(1), we consider the distribution function

(3) λ → µ_p({x ∈ [0, p − 1] : S_h(x)/\sqrt{h} ≤ λ}), λ ∈ R,

of S_h/\sqrt{h}.
We next let h = h(p) be a function of p and look for conditions on the growth of h(p)
which guarantee that for each real number λ,
(4) \lim_{p→+∞} \frac{1}{p} |{x ∈ [0, p − 1] : S_{h(p)}(x)/\sqrt{h(p)} ≤ λ}| = \frac{1}{\sqrt{2π}} \int_{−∞}^{λ} e^{−t^2/2} dt.
It is easy to see that a necessary condition for (4) to occur is that limp→+∞ h(p) = +∞. If
(4) is valid then, as we see from (2) and (3), when p → +∞ the sums Sh(p) satisfy a “central
limit theorem” relative to the probability spaces ([0, p − 1], µp ). If (4) can be verified, then
upon comparing it to (1), we conclude that for p sufficiently large, at least with respect
to sampling using χp in the intervals [x + 1, x + h(p)], x = 0, 1, . . . , p − 1, residues and
non-residues of p appear to behave as if they are distributed randomly and independently!
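The left-hand side of (4) can be computed for any particular p and h. The Python sketch below evaluates the distribution function (3) of S_h/\sqrt{h} and prints it next to the normal distribution function; the prime p = 10007 and the window length h = 40 are arbitrary illustrative choices, the function names are ours, and nothing is claimed here about how close the two columns must be for these particular values.

```python
import math

def legendre(n, p):
    """Legendre symbol chi_p(n) via Euler's criterion (p an odd prime)."""
    v = pow(n % p, (p - 1) // 2, p)   # v is 0, 1, or p - 1
    return -1 if v == p - 1 else v

def window_sums(p, h):
    """Return [S_h(0), ..., S_h(p-1)], where S_h(x) = chi_p(x+1) + ... + chi_p(x+h)."""
    chi = [legendre(n, p) for n in range(p)]
    s = sum(chi[n % p] for n in range(1, h + 1))   # S_h(0)
    sums = [s]
    for x in range(1, p):
        # Slide the window one step: drop chi_p(x), add chi_p(x + h).
        s += chi[(x + h) % p] - chi[x % p]
        sums.append(s)
    return sums

def empirical_cdf(sums, h, lam):
    """mu_p({x : S_h(x)/sqrt(h) <= lam}), the distribution function (3)."""
    return sum(1 for s in sums if s <= lam * math.sqrt(h)) / len(sums)

def normal_cdf(lam):
    """Standard normal distribution function, the right-hand side of (4)."""
    return 0.5 * (1.0 + math.erf(lam / math.sqrt(2.0)))

p, h = 10007, 40
sums = window_sums(p, h)
for lam in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(lam, round(empirical_cdf(sums, h, lam), 4), round(normal_cdf(lam), 4))
```

Because χ_p sums to 0 over a complete residue system, the values S_h(0), . . . , S_h(p − 1) always sum to 0, which gives a quick sanity check on the sliding-window computation.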

3. Verifying Random Behavior via a Result of Davenport and Erdös

The following theorem of Davenport and Erdös ([7], Theorem 5) provides conditions on
h(p) which imply that (4) is true:

Theorem 10.1. If h : P → [1, ∞) is any function such that


\lim_{q→+∞} h(q) = +∞, \lim_{q→+∞} h(q)^r/\sqrt{q} = 0, for all r ∈ [1, ∞)

(e.g., h(q) = [log^N q], where N is any fixed positive integer), then for each real number λ,

\lim_{p→+∞} \frac{1}{p} |{x ∈ [0, p − 1] : S_{h(p)}(x)/\sqrt{h(p)} ≤ λ}| = \frac{1}{\sqrt{2π}} \int_{−∞}^{λ} e^{−t^2/2} dt.
As a consequence of this theorem and our discussion in section 2, we conclude that, at
least for sufficiently large primes p, residues and non-residues do appear to be distributed
randomly within intervals whose length does not increase too fast as p → +∞.
The proof of Theorem 10.1 relies on the following lemma: we will first state the lemma,
use it to prove Theorem 10.1, and then prove the lemma.

Lemma 10.2. Let r be a fixed positive integer, and let h be an integer and p a prime such
that r < h < p. Then there exist numbers 0 ≤ θ ≤ 1, 0 ≤ θ′ ≤ 1 such that

(5) |\sum_{x=0}^{p−1} S_h^{2r}(x) − (p − θr)(h − θ′r)^r \prod_{i=1}^{r} (2i − 1)| ≤ 2r h^{2r} \sqrt{p},

(6) |\sum_{x=0}^{p−1} S_h^{2r−1}(x)| ≤ 2r h^{2r} \sqrt{p}.

Proof of Theorem 10.1. Let r be a fixed positive integer. Then by the hypotheses satisfied
by h(p), we have that r < h(p) < p for all p sufficiently large, hence Lemma 10.2 implies
that for all such p,

|\frac{1}{p} \sum_{x=0}^{p−1} (h(p)^{−1/2} S_{h(p)}(x))^{2r} − (1 − θr/p)(1 − θ′r/h(p))^r \prod_{i=1}^{r} (2i − 1)| ≤ 2r h(p)^r/\sqrt{p},

|\frac{1}{p} \sum_{x=0}^{p−1} (h(p)^{−1/2} S_{h(p)}(x))^{2r−1}| ≤ 2r h(p)^r/\sqrt{p}.

Letting p → +∞ in these inequalities, we deduce from the growth conditions on h(p) that if
r is any positive integer and

µ_r = \prod_{i=1}^{r/2} (2i − 1), if r is even,
µ_r = 0, if r is odd,

then

(7) \lim_{p→+∞} \frac{1}{p} \sum_{x=0}^{p−1} (h(p)^{−1/2} S_{h(p)}(x))^r = µ_r.

Now for each real number s, let

N_p(s) = \frac{1}{p} |{x ∈ [0, p − 1] : S_{h(p)}(x) ≤ s}|.

The function Np is nondecreasing in s, constant except for possible discontinuities at certain
integral values of s, and is right-continuous at every value of s. Because

|S_{h(p)}(x)| ≤ h(p), for all x,

it follows that

N_p(s) = 0, if s < −h(p),
N_p(s) = 1, if s ≥ h(p).

We also have that


(8) \frac{1}{p} \sum_{x} (h(p)^{−1/2} S_{h(p)}(x))^r = \frac{1}{p} \sum_{s=−h(p)}^{h(p)} \sum_{x : S_{h(p)}(x)=s} (h(p)^{−1/2} s)^r

= \frac{1}{p} \sum_{s=−h(p)}^{h(p)} (h(p)^{−1/2} s)^r |{x : S_{h(p)}(x) = s}|

= \sum_{s=−h(p)}^{h(p)} (h(p)^{−1/2} s)^r (N_p(s) − N_p(s − 1)),

and so if we let
Φ_p(t) = N_p(t h(p)^{1/2}),
then the last sum in (8) can be written as the Stieltjes integral

\int_{−∞}^{∞} t^r dΦ_p(t).

Putting

Φ(t) = \frac{1}{\sqrt{2π}} \int_{−∞}^{t} e^{−u^2/2} du,

we have

\int_{−∞}^{∞} t^r dΦ(t) = \frac{1}{\sqrt{2π}} \int_{−∞}^{∞} t^r e^{−t^2/2} dt = µ_r,

hence (7), (8) imply that

(9) \lim_{p→+∞} \int_{−∞}^{∞} t^r dΦ_p(t) = \int_{−∞}^{∞} t^r dΦ(t), for all r ∈ [0, ∞).

By virtue of the definition of Φp , the conclusion of Theorem 10.1 can be stated as

(10) \lim_{p→+∞} Φ_p(λ) = Φ(λ), for all real numbers λ.

We will deduce (10) from (9) by an appeal to the classical theory of moments.
Suppose by way of contradiction that (10) is false for some λ; then there exists δ > 0
such that

(11) |Φp (λ) − Φ(λ)| ≥ δ for infinitely many p.

Using the first and second Helly selection theorems (Shohat and Tamarkin, [54], Introduction,
section 3), we find a subsequence of these p, say p′ , and a nondecreasing real-valued function
Φ∗ defined on R such that

(12) \lim_{t→−∞} Φ∗(t) = 0, \lim_{t→+∞} Φ∗(t) = 1,

(13) Φ∗ is right-continuous at all points of R,

(14) \lim_{p′→+∞} Φ_{p′}(t) = Φ∗(t), for all points t at which Φ∗ is continuous,

and

(15) \lim_{p′→+∞} \int_{−∞}^{∞} t^r dΦ_{p′}(t) = \int_{−∞}^{∞} t^r dΦ∗(t), for all r ∈ [0, ∞).

By way of (9) and (15),


(16) \int_{−∞}^{∞} t^r dΦ∗(t) = \int_{−∞}^{∞} t^r dΦ(t), for all r ∈ [0, ∞).

The Weierstrass approximation theorem, which asserts that each function continuous on a
closed and bounded interval of the real line is the uniform limit on that interval of a sequence
of polynomials, and (16) imply
(17) \int_{−∞}^{∞} f dΦ∗(t) = \int_{−∞}^{∞} f dΦ(t),

for all real-valued functions f continuous on R of compact support. Equations (12), (13),
and (17) imply that

(18) Φ∗ (t) = Φ(t), for all t ∈ R.

Hence Φ∗ is continuous everywhere in R, and so by (14) and (18),

\lim_{p′→+∞} Φ_{p′}(λ) = Φ(λ),

and this contradicts (11).
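The moment limits (7) on which this proof rests can also be examined numerically. The Python sketch below computes the normalized moments (1/p) \sum_x (h^{−1/2} S_h(x))^r for one prime and lists them beside µ_r; the values p = 10007 and h = 40 and all function names are illustrative assumptions of ours. Two of the moments are known exactly: the first moment is 0, and, since \sum_x χ_p((x + a)(x + b)) = −1 whenever a and b are incongruent mod p, the sum of S_h(x)^2 over x equals h(p − h), so the second normalized moment is 1 − h/p.

```python
def legendre(n, p):
    """Legendre symbol chi_p(n) via Euler's criterion (p an odd prime)."""
    v = pow(n % p, (p - 1) // 2, p)   # v is 0, 1, or p - 1
    return -1 if v == p - 1 else v

def window_sums(p, h):
    """Return [S_h(0), ..., S_h(p-1)], where S_h(x) = chi_p(x+1) + ... + chi_p(x+h)."""
    chi = [legendre(n, p) for n in range(p)]
    return [sum(chi[(x + n) % p] for n in range(1, h + 1)) for x in range(p)]

def power_sum(sums, r):
    """Exact integer value of sum_x S_h(x)^r."""
    return sum(s ** r for s in sums)

def normalized_moment(sums, p, h, r):
    """(1/p) * sum_x (h^(-1/2) * S_h(x))^r, the quantity appearing in (7)."""
    return power_sum(sums, r) / (p * h ** (r / 2))

def gaussian_moment(r):
    """mu_r: the product of (2i - 1) for i = 1..r/2 if r is even, and 0 if r is odd."""
    if r % 2 == 1:
        return 0
    result = 1
    for i in range(1, r // 2 + 1):
        result *= 2 * i - 1
    return result

p, h = 10007, 40
sums = window_sums(p, h)
for r in range(1, 7):
    print(r, round(normalized_moment(sums, p, h, r), 4), gaussian_moment(r))
```

The power sums are kept as exact integers, and only the final normalization uses floating point.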


It remains to prove Lemma 10.2. The argument here makes use of another interesting
application of the Weil-sum estimates available from Theorem 9.1.
Consider first the case with 2r as the exponent. We have that
(19) \sum_{x=0}^{p−1} (S_h(x))^{2r} = \sum_{(n_1,...,n_{2r})∈[1,h]^{2r}} \sum_{x=0}^{p−1} χ_p(\prod_{i=1}^{2r} (x + n_i)).

In order to estimate the absolute value of this sum, we divide the elements (n1 , . . . , n2r ) of
[1, h]^{2r} into two types: (n1 , . . . , n2r ) is of type 1 if it has at most r distinct coordinates, each
of which occurs an even number of times; all other elements of [1, h]^{2r} are of type 2.
If (n1 , . . . , n2r ) is of type 1 then the polynomial \prod_i (x + n_i) is a perfect square in (Z/pZ)[x].
If s is the number of distinct coordinates of (n1 , . . . , n2r ), then χ_p(\prod_i (x + n_i)) = 0 whenever
there is a distinct coordinate nj of (n1 , . . . , n2r ) such that x ≡ −nj mod p, and χ_p(\prod_i (x + n_i)) = 1
otherwise. It follows that the value of the sum

\sum_{x=0}^{p−1} χ_p(\prod_{i=1}^{2r} (x + n_i))

is p − s, which is at least p − r because s ≤ r, and this value is clearly at most p. Hence there
exists a number 0 ≤ θ ≤ 1 such that the contribution of the elements of type 1 to the sum (19) is

F(h, r)(p − θr),

where F(h, r) denotes the cardinality of the set of all elements of [1, h]^{2r} of type 1.
On the other hand, if (n1 , . . . , n2r ) is of type 2 then the polynomial \prod_i (x + n_i) reduces
modulo p to a product of at least one and at most 2r distinct linear factors over Z/pZ, hence
Theorem 9.1 implies that

|\sum_{x=0}^{p−1} χ_p(\prod_{i=1}^{2r} (x + n_i))| ≤ 2r\sqrt{p}.

Hence the contribution of the elements of type 2 to the sum (19) has an absolute value that
does not exceed 2r h^{2r} \sqrt{p}.
An appropriate estimate of the size of F(h, r) is now required. Following Davenport and
Erdös, we note first that the number of ways of choosing exactly r distinct integers from
[1, h] is h(h − 1) · · · (h − r + 1), and the number of ways of arranging these as r pairs is
\prod_{i=1}^{r} (2i − 1). Hence

F(h, r) ≥ h(h − 1) · · · (h − r + 1) \prod_{i=1}^{r} (2i − 1) > (h − r)^r \prod_{i=1}^{r} (2i − 1).

On the other hand, the number of ways of choosing at most r distinct elements from [1, h]
is at most h^r, and when these have been chosen, the number of different ways of arranging
them in 2r places is at most \prod_{i=1}^{r} (2i − 1). Hence

F(h, r) ≤ h^r \prod_{i=1}^{r} (2i − 1).

Hence there is a number 0 ≤ θ′ ≤ 1 such that

F(h, r) = (h − θ′r)^r \prod_{i=1}^{r} (2i − 1).

The conclusion of Lemma 10.2 for even exponents follows from these estimates, and when the
sum has an odd exponent, the desired conclusion is now obvious, because in this case there
are no elements of type 1. QED
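For small parameters the ingredients of this proof can be checked by brute force. The Python sketch below enumerates [1, h]^{2r}, separates type 1 from type 2, verifies that each type 1 character sum equals p minus the number of distinct coordinates, records the largest absolute value of a type 2 character sum, and counts F(h, r). The values p = 101, h = 4, r = 2 and the function names are illustrative choices of ours; the enumeration is exponential in r and is meant only as a sanity check.

```python
import math
from collections import Counter
from itertools import product

def legendre(n, p):
    """Legendre symbol chi_p(n) via Euler's criterion (p an odd prime)."""
    v = pow(n % p, (p - 1) // 2, p)   # v is 0, 1, or p - 1
    return -1 if v == p - 1 else v

def is_type_1(ns):
    """Type 1: every distinct coordinate occurs an even number of times."""
    return all(count % 2 == 0 for count in Counter(ns).values())

def character_sum(ns, p):
    """Sum over x in [0, p-1] of chi_p(prod_i (x + n_i))."""
    total = 0
    for x in range(p):
        value = 1
        for n in ns:
            value = value * (x + n) % p
        total += legendre(value, p)
    return total

def odd_product(r):
    """prod_{i=1}^{r} (2i - 1)."""
    return math.prod(2 * i - 1 for i in range(1, r + 1))

def examine(p, h, r):
    """Return (F(h, r), all type 1 sums equal p - s?, largest |type 2 sum|)."""
    count_type_1, type_1_ok, worst_type_2 = 0, True, 0
    for ns in product(range(1, h + 1), repeat=2 * r):
        total = character_sum(ns, p)
        if is_type_1(ns):
            count_type_1 += 1
            type_1_ok = type_1_ok and total == p - len(set(ns))
        else:
            worst_type_2 = max(worst_type_2, abs(total))
    return count_type_1, type_1_ok, worst_type_2

p, h, r = 101, 4, 2
F, type_1_ok, worst = examine(p, h, r)
print(F, (h - r) ** r * odd_product(r), h ** r * odd_product(r))
print(type_1_ok, worst, 2 * r * math.sqrt(p))
```

The first printed line shows F(h, r) between the lower and upper estimates of Davenport and Erdös, and the second shows the largest type 2 sum next to the bound 2r\sqrt{p}.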
Remark. More recently, Kurlberg and Rudnick [32] and Kurlberg [31] have provided
further evidence of the random behavior of quadratic residues by computing the limiting
distribution of normalized consecutive spacings between representatives of the squares in
Z/nZ as |π(n)| → +∞. In order to describe their work there, let Sn ⊆ [0, n − 1] denote
the set of representatives of the squares in Z/nZ, i.e., the set of quadratic residues modulo
n inside [0, n − 1] (N.B. It is not assumed here that a quadratic residue mod n is relatively
prime to n). Order the elements of Sn as r1 < · · · < rN and then let xi = (ri+1 − ri )/s,
where s = (rN − r1 )/N is the mean spacing; xi , i = 1, . . . , N − 1, are the distances between
consecutive elements of Sn normalized to have mean distance 1. If t is any fixed positive real
number then it is shown in [31] and [32] that
\lim_{|π(n)|→+∞} |{x_i : x_i ≤ t}| / (|S_n| − 1) = 1 − e^{−t},
i.e., for all n with |π(n)| large enough, the normalized spacings between quadratic residues
of n follow (approximately) a Poisson distribution. Among many other things, the Poisson
distribution governs the number of customers and their arrival times in queueing theory, and
so the results of Kurlberg and Rudnick can be interpreted to say that if the number of prime
factors of n is sufficiently large then quadratic residues of n appear consecutively in the set
[0, n − 1] in the same way as customers arriving randomly to join a queue.
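The normalized spacings described in this Remark are straightforward to compute for a given modulus. The Python sketch below follows the definitions above (S_n, the mean spacing s = (r_N − r_1)/N, and x_i = (r_{i+1} − r_i)/s) and prints the proportion of spacings not exceeding t next to 1 − e^{−t}. The modulus n = 3 · 5 · 7 · 11 · 13 · 17 and the function names are illustrative choices of ours; the limit theorem concerns |π(n)| → +∞, so no particular agreement is asserted for this single n.

```python
import math

def squares_mod(n):
    """Sorted representatives in [0, n-1] of the squares in Z/nZ."""
    return sorted({x * x % n for x in range(n)})

def normalized_spacings(n):
    """Spacings x_i = (r_{i+1} - r_i)/s with mean spacing s = (r_N - r_1)/N."""
    r = squares_mod(n)
    N = len(r)
    mean_spacing = (r[-1] - r[0]) / N
    return [(r[i + 1] - r[i]) / mean_spacing for i in range(N - 1)]

def spacing_cdf(spacings, t):
    """|{x_i : x_i <= t}| / (|S_n| - 1)."""
    return sum(1 for x in spacings if x <= t) / len(spacings)

n = 3 * 5 * 7 * 11 * 13 * 17   # 255255, a modulus with six distinct prime factors
spacings = normalized_spacings(n)
for t in (0.5, 1.0, 2.0, 3.0):
    print(t, round(spacing_cdf(spacings, t), 4), round(1 - math.exp(-t), 4))
```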

Bibliography

[1] B. Berndt, Classical theorems on quadratic residues, Enseignement Math., 22 (1976) 261-304.
[2] H. Cohen, Number Theory, vol. I, Springer-Verlag, New York, 2000.
[3] J. B. Conway, Functions of One Complex Variable, vol. 1, Springer-Verlag, New York, 1978.
[4] K. L. Chung, A Course in Probability Theory, Academic Press, New York, 1974.
[5] H. Davenport, On character sums in finite fields, Acta Math., 71 (1939) 99-121.
[6] H. Davenport, Multiplicative Number Theory, Springer-Verlag, New York, 2000.
[7] H. Davenport and P. Erdös, The distribution of quadratic and higher residues, Publ. Math. Debrecen, 2
(1952) 252-265.
[8] R. Dedekind, Sur la Théorie des Nombres Entiers Algébriques, 1877; English translation by J. Stillwell,
Cambridge University Press, Cambridge, 1996.
[9] P. G. L. Dirichlet, Sur la convergence des séries trigonométriques qui servent à représenter une fonction
arbitraire entre des limites données, J. Reine Angew. Math., 4 (1829) 157-169.
[10] P. G. L. Dirichlet, Beweis eines Satzes daß jede unbegrenzte arithmetische Progression, deren erstes Glied
und Differenz ganze Zahlen ohne gemeinschaftlichen Faktor sind, unendlich viele Primzahlen enthält, Abh.
K. Preuss. Akad. Wiss., (1837) 45-81.
[11] P. G. L. Dirichlet, Recherches sur diverses applications de l’analyse infinitésimale à la théorie des nombres,
J. Reine Angew. Math., 19 (1839) 324-369; 21 (1840) 1-12, 134-155.
[12] P. G. L. Dirichlet, Vorlesungen über Zahlentheorie, 1863; English translation by J. Stillwell, American
Mathematical Society, Providence, 1991.
[13] J. Dugundji, Topology, Allyn and Bacon, Boston, 1966.
[14] P. Erdös, On a new method in elementary number theory which leads to an elementary proof of the
prime number theorem, Proc. Nat. Acad. Sci. U.S.A. 35 (1949) 374-384.
[15] L. Euler, Theoremata circa divisores numerorum in hac forma $pa^2 \pm qb^2$ contentorum, Comm. Acad.
Sci. Petersburg 14 (1744/46) 151-181.
[16] L. Euler, Theoremata circa residua ex divisione potestatum relicta, Novi Comment. Acad. Sci.
Petropolitanae 7 (1761) 49-82.
[17] L. Euler, Observationes circa divisionem quadratorum per numeros primos, Opera Omnia I-3 (1783)
477-512.
[18] M. Filaseta and D. Richman, Sets which contain a quadratic residue modulo p for almost all p, Math.
J. Okayama Univ., 39 (1989) 1-8.
[19] C. F. Gauss, Disquisitiones Arithmeticae, 1801; English translation by A. A. Clarke, Springer-Verlag,
New York, 1986.
[20] C. F. Gauss, Theorematis arithmetici demonstratio nova, Göttingen Comment. Soc. Regiae Sci., XVI
(1808) 8 pp.

[21] C. F. Gauss, Summatio serierum quarundam singularium, Göttingen Comment. Soc. Regiae Sci., (1811)
36 pp.
[22] C. F. Gauss, Theorematis fundamentalis in doctrina de residuis quadraticis demonstrationes et ampliationes
novae, 1818, Werke, vol. II, Georg Olms Verlag, Hildesheim, 1973, 47-64.
[23] C. F. Gauss, Theorematis fundamentalis in doctrina residuis demonstrationes et ampliationes novae,
Göttingen Comment. Soc. Regiae Sci., 4 (1818) 17 pp.
[24] C. F. Gauss, Theoria residuorum biquadraticorum: commentatio prima, Göttingen Comment. Soc. Regiae
Sci., 6 (1828) 28 pp.
[25] C. F. Gauss, Theoria residuorum biquadraticorum: commentatio secunda, Göttingen Comment. Soc.
Regiae Sci., 7 (1832) 56 pp.
[26] D. Gröger, Gauß’ Reziprozitätsgesetze der Zahlentheorie: Eine Gesamtdarstellung der Hinterlassenschaft
in Zeitgemäßer Form, Erwin-Rauner Verlag, Augsburg, 2013.
[27] E. Hecke, Vorlesungen über die Theorie der Algebraischen Zahlen, 1923; English translation by G.
Brauer and J. Goldman, Springer-Verlag, New York, 1981.
[28] D. Hilbert, Die Theorie der Algebraischen Zahlkörper, 1897; English translation by I. Adamson,
Springer-Verlag, Berlin, 1998.
[29] T. Hungerford, Algebra, Springer-Verlag, New York, 1974.
[30] K. Ireland and M. Rosen, A Classical Introduction to Modern Number Theory, Springer-Verlag, New
York, 1990.
[31] P. Kurlberg, The distribution of spacings between quadratic residues II, Israel J. Math., 120 (2000)
205-224.
[32] P. Kurlberg and Z. Rudnick, The distribution of spacings between quadratic residues, Duke Math. J.
100 (1999) 211-242.
[33] J. L. Lagrange, Problèmes indéterminés du second degré, Mém. de l’Acad. Royale Berlin, 23 (1769)
377-535.
[34] J. L. Lagrange, Recherches d’Arithmétique, 2nde partie, Nouv. Mém. de l’Acad. de Berlin (1775) 349-352.
[35] E. Landau, Elementary Number Theory, English translation by J. Goodman, Chelsea, New York, 1958.
[36] A. Legendre, Recherches d’analyse indéterminée, Histoire de l’Académie Royale des Sciences de Paris
(1785), Paris, 1788, 465-559.
[37] A. Legendre, Essai sur la Théorie des Nombres, Paris, 1798.
[38] F. Lemmermeyer, Reciprocity Laws, Springer-Verlag, New York-Berlin-Heidelberg, 2000.
[39] W. J. LeVeque, Topics in Number Theory, vol. II, Addison-Wesley, Reading, 1956.
[40] D. Marcus, Number Fields, Springer, New York, 1977.
[41] H. Montgomery and R. Vaughan, Multiplicative Number Theory I: Classical Theory, Cambridge
University Press, Cambridge, 2007.
[42] R. Nevanlinna and V. Paatero, Introduction to Complex Analysis, Addison-Wesley, Reading, 1969.
[43] O. Ore, Les Corps Algébriques et la Théorie des Idéaux, Paris, 1934.
[44] G. Perel’muter, On certain character sums, Uspekhi Mat. Nauk., 18 (1963) 145-149.
[45] C. de la Vallée Poussin, Recherches analytiques sur la théorie des nombres premiers, Ann. Soc. Sci.
Bruxelles, 20 (1896) 281-362.
[46] H. Rademacher, Lectures on Elementary Number Theory, Krieger, New York, 1977.

[47] G. F. B. Riemann, Über die Anzahl der Primzahlen unter einer gegebenen Größe, Monatsberichte der
Berliner Akademie (1859), 671-680.
[48] K. Rosen, Elementary Number Theory and its Applications, Pearson, Boston, 2005.
[49] J. Rosenberg, Algebraic K-Theory and its Applications, Springer, New York, 1996.
[50] W. Schmidt, Equations over Finite Fields: an Elementary Approach, Springer-Verlag, Berlin, 1976.
[51] A. Selberg, An elementary proof of Dirichlet’s theorem on primes in arithmetic progressions, Ann.
Math., 50 (1949) 297-304.
[52] A. Selberg, An elementary proof of the prime number theorem, Ann. Math., 50 (1949) 305-313.
[53] A. Shamir, Identity-based cryptosystems and signature schemes, in G. R. Blakley and D. Chaum, eds.,
Advances in Cryptology, Springer-Verlag, Berlin, 1985, 47-53.
[54] J. Shohat and J. D. Tamarkin, The Problem of Moments, American Mathematical Society, New York,
1943.
[55] R. Taylor and A. Wiles, Ring-theoretic properties of certain Hecke algebras, Ann. Math., 141 (1995)
553-572.
[56] J. Urbanowicz and K. S. Williams, Congruences for L-Functions, Kluwer, Dordrecht, 2000.
[57] A. Weil, Sur les Courbes Algébriques et les Variétés qui s’en Déduisent, Hermann et Cie, Paris, 1948.
[58] A. Weil, Basic Number Theory, Springer-Verlag, New York, 1973.
[59] L. Weisner, Introduction to the Theory of Equations, MacMillan, New York, 1938.
[60] A. Wiles, Modular elliptic curves and Fermat’s Last Theorem, Ann. Math., 141 (1995) 443-551.
[61] S. Wright, Quadratic non-residues and the combinatorics of sign multiplication, Ars Combin., 112 (2013)
257-278.
[62] S. Wright, Quadratic residues and non-residues in arithmetic progression, J. Number Theory 133 (2013)
2398-2430.
[63] S. Wright, On the density of primes with a set of quadratic residues or non-residues in given arithmetic
progression, J. Combin. Number Theory 6 (2015) 85-111.
[64] B. F. Wyman, What is a reciprocity law? Amer. Math. Monthly, 79 (1972), 571-586.
[65] A. Zygmund, Trigonometric Series, Cambridge University Press, Cambridge, 1968.
