
Math 259: Introduction to Analytic Number Theory


0. Introduction: What is analytic number theory?

1. Distribution of primes before complex analysis: classical techniques (Euclid, Euler); primes in arithmetic progression via Dirichlet characters and L-series; further elementary but trickier methods (Čebyšev; Selberg sieve).
2. Distribution of primes using complex analysis: $\zeta(s)$ and $L(s,\chi)$ as functions of a complex variable, and the proof of the Prime Number Theorem and its extension to Dirichlet; blurb for Čebotarev density; functional equations; the Riemann hypothesis, extensions, generalizations and consequences.
3. Lower bounds on discriminants, conductors, etc. from functional equations; geometric
4. Analytic estimates on exponential sums (van der Corput etc.); prototypical applications: Weyl equidistribution, upper bounds on $|\zeta(s)|$ and $|L(s,\chi)|$ on vertical lines, lattice point sums.

5. Analytic bounds on coefficients of modular forms and functions; applications to counting partitions, representations of integers as sums of squares, etc.

Prerequisites While Math 259 will proceed at a pace appropriate for a graduate-level course, its prerequisites are perhaps surprisingly few: complex analysis at the level of Math 113, and linear algebra and basic number theory (up to, say, arithmetic in the field $\mathbf{Z}/p$ and Quadratic Reciprocity). Some considerably deeper results (e.g. estimates on Kloosterman sums) will be cited but may be regarded as black boxes for our purposes. If you know about algebraic number fields or modular forms or curves over finite fields, you'll get more from the course at specific points, but these points will be in the nature of scenic detours that are not required for the main journey.
Texts There is no textbook for the class; lecture notes will be handed out periodically. This class is an introduction to several different flavors of analytic methods in number theory, and I know of no one work that covers all this material. Supplementary readings such as Serre's A Course in Arithmetic and Titchmarsh's The Theory of the Riemann Zeta-Function will be suggested as we approach their respective territories.
Office Hours 335 Sci Ctr, Thursdays 2:30–4 PM; or e-mail me at elkies@math to ask questions or set up an alternative meeting time.
Grading There will be no required homework, though the lecture notes will contain recommended exercises. If you are taking Math 259 for a grade (i.e. are not a post-Qual math graduate student exercising your EXC option), tell me so we can work out an evaluation and grading procedure. This will most likely be either an expository final paper or an in-class presentation on some aspect of analytic number theory related to, but just beyond, what we cover in class. Which grading method is appropriate will be determined once the class size has stabilized after "Shopping Period". The supplementary references will be a good source for paper or presentation topics.
Math 259: Introduction to Analytic Number Theory
Introduction: What is analytic number theory?

One may reasonably define analytic number theory as the branch of mathematics that uses analytical techniques to address number-theoretical problems. However, this "definition", while correct, is scarcely more informative than the phrase it purports to define. (See [Wilf 1982].) What kind of problems are suited to "analytical techniques"? What kind of mathematical techniques will be used? What style of mathematics is this, and what will its study teach you beyond the statements of theorems and their proofs? The next few sections briefly answer these questions.
The problems of analytic number theory. The typical problem of analytic number theory is an enumerative problem involving primes, Diophantine equations, or similar number-theoretic manifestations, usually asking for large values of some parameter. Examples of problems which we'll address in this course are:

• How many 100-digit primes are there, and how many of them have the last digit 7? More generally, how do the prime-counting functions $\pi(x)$ and $\pi(x; a \bmod q)$ behave for large $x$? [For the 100-digit problems we'd need $x = 10^{99}$ and $10^{100}$, $q = 10$, $a = 7$.]

• Given a prime $p > 0$, a nonzero $c \bmod p$, and integers $a_1, b_1, a_2, b_2$ with $a_i < b_i$, how many pairs $(x_1, x_2)$ of integers are there such that $a_i < x_i < b_i$ ($i = 1, 2$) and $x_1 x_2 \equiv c \bmod p$? For how small an $H$ can we guarantee that if $b_i - a_i > H$ then there is at least one such pair?

• Is there an integer $n$ such that the first eleven digits of $n!$ are 31415926535? Are there infinitely many such $n$? How many such $n$ are there of at most 1000 digits?

• Given integers $n, k$, how many ways are there to represent $n$ as a sum of $k$ squares? For instance, how many integer solutions has the equation $a^2 + b^2 + c^2 + d^2 + e^2 + f^2 + g^2 + h^2 = 10^{100}$?
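At a toy scale the first question can already be explored by machine. The following sketch is my own (the function names, and the 3-digit stand-in for the out-of-reach 100-digit case, are assumptions for illustration); it counts primes in an interval by residue class:

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, is_p in enumerate(sieve) if is_p]

def pi(x, a=None, q=None):
    """pi(x), or pi(x; a mod q) when a and q are given."""
    ps = primes_up_to(x)
    if q is not None:
        ps = [p for p in ps if p % q == a]
    return len(ps)

# A 3-digit stand-in for the 100-digit question:
print(pi(999) - pi(99))                # 3-digit primes: 143
print(pi(999, 7, 10) - pi(99, 7, 10))  # ...of which this many end in 7
```

One finds roughly a quarter of them in each of the four allowed last digits 1, 3, 7, 9, as the theory developed below will predict.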

As often happens in mathematics, working on such down-to-earth questions quickly leads us to problems and objects that appear at first glance to belong to completely different areas of mathematics:

• Analyze the Riemann zeta function $\zeta(s) := \sum_{n=1}^\infty 1/n^s$ and Dirichlet L-functions such as
$$L(s) := 1 - 3^{-s} - 7^{-s} + 9^{-s} + 11^{-s} - 13^{-s} - 17^{-s} + 19^{-s} + - - + \cdots$$
as functions of a complex variable $s$.

• Prove that the "Kloosterman sum"
$$K(p; a, b) := \sum_{x=1}^{p-1} \exp\Bigl(\frac{2\pi i}{p}\,(ax + bx^{-1})\Bigr)$$
(with $x^{-1}$ being the inverse of $x$ mod $p$) has absolute value at most $2\sqrt{p}$.

• Show that if a function $f : \mathbf{R} \to \mathbf{R}$ satisfies reasonable smoothness conditions then for large $N$ the absolute value of the exponential sum
$$\sum_{n=1}^{N} \exp(i f(n))$$
grows no faster than $N^\theta$ for some $\theta < 1$ (with $\theta$ depending on the conditions imposed on $f$).

• Investigate the coefficients of modular forms such as
$$\eta(\tau)^8\,\eta(2\tau)^8 = q \prod_{n=1}^\infty (1 - q^n)^8 (1 - q^{2n})^8 = q - 8q^2 + 12q^3 + 64q^4 - 210q^5 - 96q^6 \cdots.$$

(Fortunately it will turn out that the route from (say) $\pi(x)$ to $\zeta(s)$ is not nearly as long and tortuous as that from $x^n + y^n = z^n$ to deformations of Galois representations...1)
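The Kloosterman-sum bound, at least, is easy to test numerically for small $p$. This sketch is mine, not part of the course notes; it uses Python's three-argument `pow` with exponent $-1$ for modular inverses (available from Python 3.8):

```python
import cmath

def kloosterman(p, a, b):
    """K(p; a, b) = sum_{x=1}^{p-1} exp(2*pi*i*(a*x + b*x^{-1})/p)."""
    total = 0j
    for x in range(1, p):
        xinv = pow(x, -1, p)  # inverse of x mod p (Python 3.8+)
        total += cmath.exp(2j * cmath.pi * (a * x + b * xinv) / p)
    return total

# Weil's bound |K(p; a, b)| <= 2*sqrt(p), checked for a few small cases:
for p in (7, 11, 101):
    for a, b in ((1, 1), (2, 5), (3, 7)):
        assert abs(kloosterman(p, a, b)) <= 2 * p ** 0.5 + 1e-9
```

The sums come out real (each term pairs with its complex conjugate at $x \mapsto -x$ when $a = b$, and more generally the imaginary parts cancel), which is itself a worthwhile exercise.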
The techniques of analytic number theory. A hallmark of analytic number theory is the treatment of number-theoretical problems (usually enumerative, as noted above) by methods often relegated to the domain of "applied mathematics": elementary but clever manipulation of sums and integrals; asymptotic and error analysis; Fourier series and transforms; contour integrals and residues. While there is still good new work to be done along these lines, much contemporary analytic number theory also uses advanced tools from within and outside number theory (e.g. modular forms beyond the upper half-plane, Laplacian spectral theory). Nevertheless, in this introductory course we shall emphasize the classical methods characteristic of analytic number theory, on the grounds that they are rarely treated in this Department's courses, while our program already offers ample exposure to the algebraic/geometric tools. As already noted in the pseudo-syllabus, we shall on a few occasions invoke results that depend on deep (non-analytic) techniques, but we shall treat them as deus ex mathematica, developing only their analytic applications.
The style of analytic number theory. It has often been said that there
are two kinds2 of mathematicians: theory builders and problem solvers. In
1 See for instance [Stevens 1994].
2 Actually there are three kinds of mathematician: those who can count, and those who
cannot. [Attributed to John Conway]

the mathematics of our century these two styles are epitomized respectively by A. Grothendieck and P. Erdős. The Harvard math curriculum leans heavily towards the systematic, theory-building style; analytic number theory as usually practiced falls in the problem-solving camp. This is probably why, despite its illustrious history (Euclid, Euler, Riemann, ...) and present-day vitality, analytic number theory has rarely been taught here: in the past twelve years there have been only a handful of undergraduate seminars and research/Colloquium talks, and no Catalog-listed courses at all. Now we shall see that there is more to analytic number theory than a bag of unrelated ad-hoc tricks, but it is true that fans of contravariant functors, adelic tangent sheaves, and étale cohomology will not find them in the present course. Still I believe that even ardent structuralists will benefit from this course. First, specific results of analytic number theory often enter as necessary ingredients in the only known proofs of important structural results. Consider for example the arithmetic of elliptic
of important structural results. Consider for example the arithmetic of elliptic
curves: the many applications of Dirichlet's theorem on primes in arithmetic
progression, and its generalization to C ebotarev's density theorem,3 include the
recent work of Kolyvagin and Wiles; in [Serre 1981] sieve methods are elegantly
applied to investigate the distribution of traces of an elliptic curve;4 in [Merel
1996] a result (Lemme 5) on the x1 x2  c mod p problem is required to bound
the torsion of elliptic curves over number elds. Second, the ideas and tech-
niques apply widely. Sieve inequalities, for instance, are also used in probability
to analyze nearly independent variables; the \stationary phase" methods that
we'll use to estimate the partition function are also used to estimate oscilla-
tory integrals in quantum physics, special functions, and elsewhere; even the
van der Corput estimates on exponential sums have recently found applica-
tion in enumerative combinatorics [CEP 1996]. Third, the habit of looking for
asymptotic results and error terms is a healthy complement to the usual quest
for exact answers that we can tend to focus on too exclusively. Finally, An
ambitious theory-builder should regard the absence thus far of a Grand Uni ed
Theory of analytic number theory not as an insult but as a challenge. Both
machinery- and problem-motivated mathematicians should note that some of
the more exciting recent work in number theory depends critically on contribu-
tions from both sides of the stylistic fence. This course will introduce the main
analytic techniques needed to appreciate, and ultimately to extend, this work.
3 We shall describe Čebotarev's theorem briefly in the course but not develop it in detail. Given Dirichlet's theorem and the asymptotic formula for $\pi(x; a \bmod q)$, the extra work needed to get Čebotarev is not analytic but algebraic: the development of algebraic number theory and the arithmetic of characters of finite groups. Thus a full treatment of Čebotarev does not, alas, belong in this course.
4 My thesis work on the case of trace zero (see e.g. [Elkies 1987]) also used Dirichlet's theorem.

[CEP 1996] Cohn, H., Elkies, N.D., Propp, J.: Local statistics for random domino tilings of the Aztec diamond, Duke Math. J. 85 #1 (Oct. 1996), 117–166.
[Elkies 1987] Elkies, N.D.: The existence of infinitely many supersingular primes for every elliptic curve over Q. Invent. Math. 89 (1987), 561–567.
[Merel 1996] Merel, L.: Bornes pour la torsion des courbes elliptiques sur les corps de nombres. Invent. Math. 124 (1996), 437–449.
[Serre 1981] Serre, J.-P.: Quelques applications du théorème de densité de Chebotarev. IHES Publ. Math. 54 (1981), 123–201.
[Stevens 1994] Stevens, G.: Fermat's Last Theorem, PROMYS T-shirt, Boston University 1994.
[Wilf 1982] Wilf, H.S.: What is an Answer? Amer. Math. Monthly 89 (1982),
Math 259: Introduction to Analytic Number Theory
Elementary approaches I: Variations on a theme of Euclid1

The first asymptotic question to ask about $\pi(x)$ is whether $\pi(x) \to \infty$ as $x \to \infty$, that is, whether there are infinitely many primes. That the answer is Yes was first shown by the justly famed argument in Euclid. While often presented as a proof by contradiction, the argument can readily be recast as an effective (albeit rather inefficient) construction: given primes $p_1, p_2, \ldots, p_n$, let $P_n = \prod_{k=1}^n p_k$, define $N_n = P_n + 1$, and let $p_{n+1}$ be the smallest prime factor of $N_n$. Then $p_{n+1}$ is a prime no larger than $N_n$ and different from $p_1, \ldots, p_n$. Thus $\{p_k\}_{k \geq 1}$ is an infinite sequence of distinct primes, Q.E.D.
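In code, the construction reads as follows (a sketch of my own; starting from $p_1 = 2$ and always taking the least prime factor yields what is nowadays called the Euclid–Mullin sequence):

```python
def euclid_sequence(n_terms):
    """Euclid's construction made explicit: p_{n+1} is the least prime
    factor of p_1 * p_2 * ... * p_n + 1, starting from p_1 = 2."""
    def least_factor(m):
        d = 2
        while d * d <= m:
            if m % d == 0:
                return d
            d += 1
        return m

    primes = [2]
    while len(primes) < n_terms:
        P = 1
        for p in primes:
            P *= p
        primes.append(least_factor(P + 1))
    return primes

print(euclid_sequence(8))  # → [2, 3, 7, 43, 13, 53, 5, 6221671]
```

Note how inefficient the construction is in practice: the eighth step already requires factoring a seven-digit number, and the numbers $N_n$ grow doubly exponentially.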
Moreover this argument also gives an explicit upper bound on $p_n$, and thus a lower bound on $\pi(x)$. Indeed we may take $p_1 = 2$ and observe that
$$p_{n+1} \leq N_n = 1 + \prod_{k=1}^n p_k \leq 2 \prod_{k=1}^n p_k;$$
if equality were satisfied at each step we would have $p_n = 2^{2^{n-1}}$. Thus by induction we see that2
$$p_n \leq 2^{2^{n-1}}$$
(and of course the inequality is strict once $n > 1$). Therefore if $x \geq 2^{2^{n-1}}$ then $p_k \leq x$ for $k = 1, 2, \ldots, n$ and so $\pi(x) \geq n$, so we conclude3
$$\pi(x) > \log_2 \log_2 x.$$

The $P_n + 1$ trick can even be used to prove other special cases of the result that $(a, q) = 1 \Rightarrow \pi(x; a \bmod q) \to \infty$ as $x \to \infty$. Of course the case $1 \bmod 2$ is trivial given Euclid. For $-1 \bmod q$ with $q = 3, 4, 6$, start with $p_1 = q - 1$ and define $N_n = qP_n - 1$. More generally, for any quadratic character $\chi$ there are infinitely many primes $p$ with $\chi(p) = -1$; e.g. given an odd prime $q_0$, there are infinitely many primes $p$ which are quadratic nonresidues of $q_0$. [I'm particularly fond of this argument because I was able to adapt it as the punchline of my doctoral thesis; see [El].] The case of $\chi(p) = +1$, e.g. the result that $\pi(x; 1 \bmod 4) \to \infty$, is only a bit trickier.4 For that case, let $p_1 = 5$ and $N_n = 4P_n^2 + 1$, and appeal
1 Specifically, [Eu, IX, 20]. For more on the history of work on the distribution of primes up to about 1900, see [Di, XVIII].
2 Curiously the same bound is obtained from a much later elementary proof: between 1 and $2^{2^n}$ there are $n$ pairwise coprime numbers, the Fermat numbers $2^{2^m} + 1$ for $0 \leq m < n$ [why are they pairwise coprime?], so necessarily at least $n$ primes as well.
3 Q: What sound does a drowning analytic number theorist make? A: "log log log log ..." [R. Murty, via B. Mazur]
4 But enough so that a problem from a recent Qualifying Exam for our graduate students asked to prove that there are infinitely many primes congruent to 1 mod 4.

to Fermat's theorem on the prime factors of $x^2 + y^2$. Again this even yields an explicit lower bound on $\pi(x; 1 \bmod 4)$, namely5
$$\pi(x; 1 \bmod 4) > C \log \log x$$
for some positive constant $C$. [Exercises: Exhibit an explicit value of $C$. Use cyclotomic polynomials to show more generally that for any $q_0$, prime or not, there exist infinitely many primes congruent to $1 \bmod q_0$,6 and that indeed the number of such primes $< x$ grows at least as fast as some multiple of $\log \log x$. Modify this trick to show that there are infinitely many primes congruent to $4 \bmod 5$, again with a $\log \log$ lower bound.]
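The $4P_n^2 + 1$ variant can likewise be run by machine; the assertion inside the loop checks Fermat's theorem in action (every prime factor of $4P^2 + 1 = (2P)^2 + 1^2$ is $\equiv 1 \bmod 4$). The sketch is mine; only three terms are computed, since $N_n$ grows doubly exponentially:

```python
def least_factor(m):
    """Smallest prime factor of m > 1, by trial division."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m

def primes_1_mod_4(n_terms):
    """Euclid-style construction of primes ≡ 1 mod 4: each N = 4*P^2 + 1
    is coprime to every prime already found, and all its prime factors
    are ≡ 1 mod 4 by Fermat's theorem on sums of two squares."""
    primes = [5]
    while len(primes) < n_terms:
        P = 1
        for p in primes:
            P *= p
        q = least_factor(4 * P * P + 1)
        assert q % 4 == 1 and q not in primes
        primes.append(q)
    return primes

print(primes_1_mod_4(3))  # starts 5, 101, ...
```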
But Euclid's approach and its variations, however elegant, are not sufficient for our purposes. For one thing, numerical evidence suggests (and we shall soon prove) that $\log_2 \log_2 x$ is a gross underestimate of $\pi(x)$. For another, one cannot7 prove all cases of $(a, q) = 1 \Rightarrow \pi(x; a \bmod q) \to \infty$ using only variations on the Euclid argument. Our next elementary approaches will address at least the first deficiency.
[Di] Dickson, L.E.: History of the Theory of Numbers, Vol. I: Divisibility and Primality. Washington: Carnegie Inst., 1919.
[Eu] Euclid, Elements.
[El] Elkies, N.D.: The existence of infinitely many supersingular primes for every elliptic curve over Q, Invent. Math. 89 (1987), 561–567. See also: Supersingular primes for elliptic curves over real number fields, Compositio Math. 72 (1989), 165–172.

5 Even a drowning analytic number theorist knows that $\log \log$ and $\log_2 \log_2$ are asymptotically within a constant factor of each other. What is that factor?
6 A result attributed to Euler in [Di, XVIII].
7 This is not a theorem, of course; for one thing, how does one precisely define "variation of the Euclid argument"? But I'll be quite impressed if you can even find a Euclid-style argument for the infinitude of primes congruent to 2 mod 5 or mod 7.

Math 259: Introduction to Analytic Number Theory
Elementary approaches II: the Euler product

It was Euler who first went significantly beyond Euclid's proof, by recasting another highlight of ancient Greek number theory, this time unique factorization into primes, as a generating-function identity. That is, from the fact that every positive integer $n$ may be written uniquely as $\prod_{p\ \mathrm{prime}} p^{c_p}$, with $c_p$ a nonnegative integer that vanishes for all but finitely many $p$, Euler obtained:
$$\sum_{n=1}^\infty n^{-s} = \prod_{p\ \mathrm{prime}} \Bigl(\sum_{c_p=0}^\infty p^{-c_p s}\Bigr) = \prod_{p\ \mathrm{prime}} \frac{1}{1 - p^{-s}}. \qquad (E)$$

The sum on the left-hand side of (E) is now called the zeta function
$$\zeta(s) = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \cdots = \sum_{n=1}^\infty n^{-s};$$
the formula (E) is called the Euler product for $\zeta(s)$. So far we have only proved (E) as a formal identity. But, since all the terms and factors in the infinite series and products are positive, (E) actually holds as an identity between convergent series and products provided that $\sum_{n=1}^\infty n^{-s}$ converges. By comparison with $\int_1^\infty x^{-s}\,dx$ (i.e. by the "Integral Test" of elementary calculus) we see that this happens if and only if $s > 1$. Moreover, from
$$\int_x^{x+1} y^{-s}\,dy = \frac{x^{1-s} - (x+1)^{1-s}}{s-1}$$
we obtain the inequality
$$(x+1)^{-s} < \frac{x^{1-s} - (x+1)^{1-s}}{s-1} < x^{-s};$$
summing this over $x = 1, 2, 3, \ldots$ we obtain lower and upper bounds on $\zeta(s)$:1
$$\frac{1}{s-1} < \zeta(s) < \frac{1}{s-1} + 1. \qquad (*)$$
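The bounds (*) are easy to corroborate numerically. In this sketch (mine, with a truncation-plus-integral-tail approximation of my own choosing) the computed $\zeta(s)$ falls inside the claimed interval for each tested $s$:

```python
def zeta(s, N=10**6):
    """Truncated Dirichlet series for zeta(s), s > 1; the tail over n > N
    is approximated by the integral N^{1-s}/(s-1) (error of order N^{-s})."""
    return sum(n ** -s for n in range(1, N + 1)) + N ** (1 - s) / (s - 1)

for s in (1.1, 1.5, 2.0, 3.0):
    assert 1 / (s - 1) < zeta(s) < 1 / (s - 1) + 1   # the bounds (*)
```

At $s = 2$ the routine reproduces $\zeta(2) = \pi^2/6 \approx 1.6449$ to many digits, comfortably inside $(1, 2)$.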

In particular $\zeta(s) \to \infty$ as $s \to 1$ from above. This yields Euler's proof2 of the infinitude of primes: if there were only finitely many then the product (E) would remain finite for all $s > 0$.
1 In fact more accurate estimates are available from the "Euler–Maclaurin formula", as we shall see in due course.
2 Actually Euler simply substituted $s = 1$ in (E) to claim a contradiction with the hypothesis that there are only finitely many primes, but it is easy enough to convert that idea into this legitimate proof. If you happen to know that $\pi$ is transcendental, or only that $\pi^2$ is irrational, then from $\pi^2 = 6\zeta(2)$ (another famous Euler identity) you may also obtain the infinitude of primes, though naturally the resulting bounds on $\pi(x)$ are much worse.

In fact that product is actually infinite for $s$ as large as 1, a fact that yields much tighter estimates on $\pi(x)$ than were available starting from Euclid's proof. For instance we cannot have constants $C, \theta$ with $\theta < 1$ such that $\pi(x) < Cx^\theta$ for all $x$, because then the Euler product would converge for $s > \theta$. To go further along these lines it is convenient to take logarithms in (E), converting the product to a sum:
$$\log \zeta(s) = -\sum_{p\ \mathrm{prime}} \log(1 - p^{-s}),$$
and to estimate both sides as $s \to 1{+}$. By (*) the left-hand side $\log \zeta(s)$ is between $\log 1/(s-1)$ and $\log s/(s-1)$; since $0 < \log s < s - 1$, we conclude that $\log \zeta(s)$ is within $s - 1$ of $\log 1/(s-1)$. In the right-hand side we approximate each summand $-\log(1 - p^{-s})$ by $p^{-s}$; the error is at most $p^{-2s}$, which summed over all primes is less than $\sum_p p^{-2} < \zeta(2)$. The point is not the numerical bound on the error but the fact that it remains bounded as $s \to 1{+}$. We thus have:
$$\sum_p p^{-s} = \log \frac{1}{s-1} + O(1) \qquad (1 < s < 2). \qquad (E')$$
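(E') can also be checked empirically for moderate $s$; truncating the sum over primes at $10^6$ introduces its own bounded error, so this is a plausibility check of my own, not a proof:

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [i for i, b in enumerate(sieve) if b]

ps = primes_up_to(10**6)

for s in (1.1, 1.2, 1.5, 1.8):
    lhs = sum(p ** -s for p in ps)
    rhs = math.log(1 / (s - 1))
    # the O(1) of (E'), plus the truncation error, stays small:
    assert abs(lhs - rhs) < 1
```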

The $O(1)$ here is our first encounter with the "Big Oh" notation; in general if $g$ is nonnegative then "$f = O(g)$" is shorthand for "there exists a constant $C$ such that $|f| \leq Cg$ for each allowed evaluation of $f, g$". Thus $O(1)$ is a bounded function, so (E') means that there exists a constant $C$ such that $\sum_p p^{-s}$ is within $C$ of $\log \frac{1}{s-1}$ for all $s \in (1, 2)$. [The upper bound of 2 on $s$ is not crucial, since we're concerned with $s$ near 1, but we must put some upper bound on $s$ for (E') to hold; do you see why?] An equivalent notation, more convenient in some circumstances, is $f \ll g$ (or $g \gg f$). For instance, a linear map $T$ between Banach spaces is continuous iff $Tv = O(|v|)$ iff $|Tv| \ll |v|$. Each instance of $O(\cdot)$ or $\ll$ or $\gg$ is presumed to carry its own implicit constant $C$. If the constant depends on some parameter, that parameter appears as a subscript; e.g. for any $\epsilon > 0$ we have $\log x = O_\epsilon(x^\epsilon)$ (equivalently $\log x \ll_\epsilon x^\epsilon$) on $x \in [1, \infty)$. For basic properties of $O(\cdot)$ see the Exercises at the end of this section.
Now $\pi(x)$ still does not occur explicitly in the sum in (E'). We thus rewrite this sum as follows. Express the summand $p^{-s}$ as an integral $s \int_p^\infty y^{-1-s}\,dy$. Summing over all $p$ we find that $y$ occurs in the interval of integration $(p, \infty)$ iff $p < y$, i.e. with multiplicity $\pi(y)$. Thus the sum in (E') becomes an integral involving $\pi(y)$, and we find:
$$s \int_1^\infty \pi(y)\,y^{-1-s}\,dy = \log \frac{1}{s-1} + O(1) \qquad (1 < s < 2). \qquad (E'')$$
Two remarks are in order here. First, the transformation from (E') to (E'') is an example of a method we shall use often, known either as partial summation or integration by parts. To explain the latter name, consider that the sum in (E') may be regarded as the Stieltjes integral $\int_1^\infty y^{-s}\,d\pi(y)$, which integrated by parts yields (E''); that is how we shall write this transformation henceforth. Second, the integral in (E'') is just $s$ times the Laplace transform of the function $\pi(e^u)$ on $u \in [0, \infty)$: via the change of variable $y = e^u$ that integral becomes $\int_0^\infty \pi(e^u)e^{-su}\,du$. In general if $f(u)$ is a nonnegative function whose Laplace transform $Lf(s) := \int_0^\infty f(u)e^{-su}\,du$ converges for $s > s_0$ then the behavior of $Lf(s)$ as $s \to s_0{+}$ detects the behavior of $f(u)$ as $u \to \infty$. In our case $s_0 = 1$, so we expect that (E'') will give us information on the behavior of $\pi(x)$ for large $x$.
We note now that (E'') is consistent with $\pi(x) \approx x/\log x$, that is, that3
$$s \int_2^\infty \frac{y^{-s}}{\log y}\,dy = \log \frac{1}{s-1} + O(1) \qquad (1 < s < 2).$$
Let the integral be $I(s)$. Differentiating under the integral sign we find that $I'(s) = -\int_2^\infty y^{-s}\,dy = 2^{1-s}/(1-s) = 1/(1-s) + O(1)$. Thus for $1 < s < 2$ we have
$$I(s) = I(2) - \int_s^2 I'(\sigma)\,d\sigma = I(2) + \int_s^2 \frac{d\sigma}{\sigma - 1} + O(1) = \log \frac{1}{s-1} + O(1)$$
as claimed. This does not prove the Prime Number Theorem, but it does show that, for instance, if $c < 1 < C$ then there are arbitrarily large $x, x'$ such that $\pi(x) > cx/\log x$ and $\pi(x') < Cx'/\log x'$.

Exercises, mostly on the Big Oh (a.k.a. $\ll$) notation:

1. If $f \ll g$ and $g \ll h$ then $f \ll h$. If $f_1 = O(g_1)$ and $f_2 = O(g_2)$ then $f_1 f_2 = O(g_1 g_2)$ and $f_1 + f_2 = O(g_1 + g_2) = O(\max(g_1, g_2))$. Given a positive function $g$, the functions $f$ such that $f = O(g)$ constitute a vector space.

2. If $f \ll g$ on $[a, b]$ then $\int_a^x f(y)\,dy \ll \int_a^x g(y)\,dy$ for $x \in [a, b]$. (We already used this to obtain $I(s) = \log(1/(s-1)) + O(1)$ from $I'(s) = 1/(1-s) + O(1)$.) In general differentiation does not commute with "$\ll$" (why?). Nevertheless, prove that $\zeta'(s)\ [= -\sum_{n=1}^\infty n^{-s} \log n]$ is $-1/(s-1)^2 + O(1)$ on $s \in (1, \infty)$.
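For the last claim in Exercise 2, here is a numerical check (mine, not a solution): with an integral-tail correction of my own choosing, the truncated series for $\zeta'(s)$ plus $1/(s-1)^2$ stays well below 1 in absolute value as $s$ approaches 1:

```python
import math

def zeta_prime(s, N=10**6):
    """zeta'(s) = -sum_{n>=2} n^{-s} log n, truncated at N, plus the
    integral tail: int_N^inf t^{-s} log t dt = (N^{1-s}/(s-1)) * (log N + 1/(s-1))."""
    head = -sum(math.log(n) * n ** -s for n in range(2, N + 1))
    tail = -(N ** (1 - s) / (s - 1)) * (math.log(N) + 1 / (s - 1))
    return head + tail

# zeta'(s) + 1/(s-1)^2 remains bounded as s -> 1+:
for s in (1.1, 1.3, 1.7, 2.0):
    assert abs(zeta_prime(s) + 1 / (s - 1) ** 2) < 1
```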
3. So far all the implicit constants in the $O(\cdot)$ or $\ll$ we have seen are effective: we didn't bother to specify them, but we could if we really had to. Moreover the transformations in exercises 1, 2 preserve effectivity: if the input constants are effective then so are the output ones. However, it can happen that we know that $f = O(g)$ without being able to name a constant $C$ such that $|f| \leq Cg$. Here is a prototypical example: suppose $x_1, x_2, x_3, \ldots$ is a sequence of positive reals which we suspect are all $\leq 1$, but all we can show is that if $i \neq j$ then $x_i x_j < x_i + x_j$. Prove that the $x_i$ are bounded, i.e. $x_i = O(1)$, but that as long as we do not find some $x_i$ greater than 1, we cannot use this to exhibit a specific $C$ such that $x_i < C$ for all $i$; and indeed if our suspicion that every $x_i \leq 1$ is correct then we'll never be able to find such a $C$ this way.

We'll encounter this sort of unpleasant ineffectivity (where it takes at least two outliers to get a contradiction) in Siegel's lower bound on $L(1, \chi)$; it arises elsewhere too, notably in Faltings' proof of the Mordell conjecture, where the number of rational points on a given curve of genus $> 1$ can be effectively bounded but their size cannot.

3 We shift the lower limit of integration to $y = 2$ to avoid the spurious singularity of $1/\log y$ at $y = 1$, and suppress the factor $s$ because only the behavior as $s \to 1{+}$ matters and multiplying by $s$ does not affect it to within $O(1)$.

Math 259: Introduction to Analytic Number Theory
Primes in arithmetic progressions: Dirichlet characters and L-functions

We introduce this with the example of the distribution of primes mod 4, i.e. of $\pi(x; 1 \bmod 4)$ and $\pi(x; 3 \bmod 4)$. The sum of these is of course $\pi(x) - 1$ once $x > 2$, and we've obtained
$$s \int_1^\infty \pi(y)\,y^{-1-s}\,dy = \log \frac{1}{s-1} + O(1) \qquad (1 < s < 2) \qquad (E'')$$
from the Euler product for $\zeta(s)$. If the factor $(1 - 2^{-s})^{-1}$ is omitted from that product we get a product formula for $(1 - 2^{-s})\zeta(s) = 1 + 3^{-s} + 5^{-s} + 7^{-s} + \cdots$. A similar formula involving $\pi(\cdot; 1 \bmod 4)$ (or $\pi(\cdot; 3 \bmod 4)$) would require summing $n^{-s}$ over the integers all of whose prime factors are congruent to 1 (or 3) mod 4, which is hard to deal with. However, we can deal with the difference $\pi(x; 1 \bmod 4) - \pi(x; 3 \bmod 4)$ using an Euler product for the L-series
$$L(s, \chi_4) := 1 - \frac{1}{3^s} + \frac{1}{5^s} - \frac{1}{7^s} + - \cdots = \sum_{n=1}^\infty \chi_4(n)\,n^{-s}.$$
Here $\chi_4$ is the function
$$\chi_4(n) = \begin{cases} +1, & \text{if } n \equiv +1 \bmod 4; \\ -1, & \text{if } n \equiv -1 \bmod 4; \\ 0, & \text{if } 2 \mid n. \end{cases}$$
This function is (strongly1) multiplicative:
$$\chi_4(mn) = \chi_4(m)\chi_4(n) \qquad (m, n \in \mathbf{Z}). \qquad (M)$$
Therefore $L(s, \chi_4)$ factors as did $\zeta(s)$:
$$L(s, \chi_4) = \prod_{p\ \mathrm{prime}} \Bigl(\sum_{c_p=0}^\infty \chi_4(p^{c_p})\,p^{-c_p s}\Bigr) = \prod_{p\ \mathrm{prime}} \frac{1}{1 - \chi_4(p)p^{-s}}. \qquad (E_4)$$
By comparison with the Euler product for $\zeta(s)$ we see that the manipulations in $(E_4)$ are valid for $s > 1$. Unlike the case of $\zeta(s)$, the function $L(s, \chi_4)$ remains bounded as $s \to 1{+}$, because the sum $\sum_{n=1}^\infty \chi_4(n)\,n^{-s}$ may be grouped as
$$\Bigl(1 - \frac{1}{3^s}\Bigr) + \Bigl(\frac{1}{5^s} - \frac{1}{7^s}\Bigr) + \Bigl(\frac{1}{9^s} - \frac{1}{11^s}\Bigr) + \cdots$$
1 Sometimes a function $f$ is called multiplicative when $f(mn) = f(m)f(n)$ only for coprime $m, n$.

in which the $n$-th term is $O(n^{-(s+1)})$ (why?). Indeed this regrouping lets us extend $L(\cdot, \chi_4)$ to a continuous function on $(0, \infty)$. [Exercise: Show that in fact the resulting function is infinitely differentiable on $s > 0$.] Moreover each term $(1 - 3^{-s})$, $(5^{-s} - 7^{-s})$, $(9^{-s} - 11^{-s})$, ... is positive, so $L(s, \chi_4) > 0$ for all $s > 0$, in particular for $s = 1$ (you probably already know that $L(1, \chi_4) = \pi/4$). So,
starting from $(E_4)$ and arguing as we did to get from (E) to (E'') we obtain
$$s \int_1^\infty \pi(y, \chi_4)\,y^{-1-s}\,dy = O(1) \qquad (1 < s < 2), \qquad (E''_4)$$
where
$$\pi(y, \chi_4) := \pi(y; 1 \bmod 4) - \pi(y; 3 \bmod 4) = \sum_{p \leq y,\ p\ \mathrm{prime}} \chi_4(p).$$
Averaging $(E''_4)$ with (E'') we find that
$$s \int_1^\infty \pi(y; 1 \bmod 4)\,y^{-1-s}\,dy = \tfrac12 \log \frac{1}{s-1} + O(1) \qquad (1 < s < 2),$$
$$s \int_1^\infty \pi(y; 3 \bmod 4)\,y^{-1-s}\,dy = \tfrac12 \log \frac{1}{s-1} + O(1) \qquad (1 < s < 2).$$
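The pair-grouping argument for $L(s, \chi_4)$ translates directly into code (a sketch of mine): each grouped term is positive, so partial sums give positive lower bounds even for $0 < s < 1$, and at $s = 1$ they converge to $L(1, \chi_4) = \pi/4$:

```python
import math

def L_chi4(s, pairs=10**6):
    """L(s, chi_4) summed in the grouped pairs (1 - 3^{-s}) + (5^{-s} - 7^{-s}) + ...;
    each pair is positive and O(n^{-(s+1)}), so the series converges for s > 0."""
    total = 0.0
    for k in range(pairs):
        n = 4 * k + 1
        total += n ** -s - (n + 2) ** -s
    return total

assert L_chi4(0.5) > 0                        # positivity even below s = 1
assert abs(L_chi4(1.0) - math.pi / 4) < 1e-6  # Leibniz: L(1, chi_4) = pi/4
```

At $s = 2$ the same routine converges to Catalan's constant $\approx 0.9159656$, the value $L(2, \chi_4)$.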
This is consistent with $\pi(x; 1 \bmod 4) \sim \tfrac12 x/\log x$, and corroborates our expectation that there should be on the average as many primes congruent to $+1 \bmod 4$ as $-1 \bmod 4$. To phrase it another way, the sets of $1 \bmod 4$ and $3 \bmod 4$ primes both have logarithmic density $1/2$, where a set $S$ of primes is said to have logarithmic density $\delta$ if $\sum_{p \in S} p^{-s} \sim \delta \log \frac{1}{s-1}$ as $s \to 1{+}$.
We can similarly treat $\pi(x; 1 \bmod 3)$ and, with a tad more work, $\pi(x; a \bmod 8)$ ($a$ odd) and $\pi(x; a \bmod 12)$ ($a = 1, 5, 7, 11$) in the same way. [To prove that the L-series $1 - 3^{-s} - 5^{-s} + 7^{-s} + - - + \cdots$ and $1 - 5^{-s} - 7^{-s} + 11^{-s} + - - + \cdots$ are positive on $s > 0$, group them in fours rather than pairs.] In each case we find that the two or four congruence classes of primes have equal logarithmic densities. What about $\pi(x; a \bmod 5)$? We have the quadratic $\chi$ taking $n$ to $+1$ or $-1$ according as $n \equiv \pm1$ or $\pm2 \bmod 5$ (and to 0 if $5 \mid n$), but this only lets us separate quadratic from non-quadratic residues mod 5. To get at the individual nonzero residue classes mod 5 we need also two $\chi$'s taking complex values: the multiplicative functions taking $n \equiv 0, 1, 2, 3, 4 \bmod 5$ to $0, 1, \pm i, \mp i, -1$. The resulting L-functions are then complex, but the crucial fact that they do not vanish at $s = 1$ can still be shown by using the pairing trick on either the real or the imaginary part. We thus find again that each of the four nonzero congruence classes of primes mod 5 has the same logarithmic density, and in particular that there are infinitely many primes congruent to $a \bmod 5$ for each $a \in (\mathbf{Z}/5)^*$ (which we have so far been unable to do for $a = 2$).
How to generalize this to treat $\pi(x; a \bmod q)$ for all $q > 1$ and $a$ coprime to $q$? We'll use linear combinations of Dirichlet characters. These are defined as follows: for a positive integer $q$, a Dirichlet character mod $q$ is a function $\chi : \mathbf{Z} \to \mathbf{C}$ which is

• $q$-periodic, i.e. $n \equiv n' \bmod q \Rightarrow \chi(n) = \chi(n')$;
• supported on the integers coprime to $q$ and on no smaller subset of $\mathbf{Z}$, i.e. $(n, q) = 1 \Leftrightarrow \chi(n) \neq 0$; and
• multiplicative: $\chi(m)\chi(n) = \chi(mn)$ for all integers $m, n$.
To such a character is associated the Dirichlet L-series
$$L(s, \chi) := \sum_{n=1}^\infty \chi(n)\,n^{-s} = \prod_{p\ \mathrm{prime}} \frac{1}{1 - \chi(p)p^{-s}} \qquad (s > 1).$$
Examples: The trivial character $\chi_0$ mod $q$ is defined by $\chi_0(n) = 1$ if $(n, q) = 1$ and $\chi_0(n) = 0$ otherwise, associated with the L-series $L(s, \chi_0) = \prod_{p \mid q}(1 - p^{-s}) \cdot \zeta(s)$. If $l$ is prime then the Legendre symbol $(\cdot/l)$, defined by $(n/l) = 0, 1, -1$ according as $n$ is zero, a nonzero square, or not a square mod $l$, is a character mod $l$ or $4l$ by Quadratic Reciprocity. If $\chi$ is a Dirichlet character mod $q$ then so is its complex conjugate $\bar\chi$ (defined of course by $\bar\chi(n) = \overline{\chi(n)}$), with $L(s, \bar\chi) = \overline{L(s, \chi)}$ for $s > 1$. If $\chi_1, \chi_2$ are characters mod $q_1, q_2$ then $\chi_1\chi_2$ is a character mod $\mathrm{lcm}(q_1, q_2)$. In particular the characters mod $q$ constitute a group under multiplication, with identity $\chi_0$ and inverse $\chi^{-1} = \bar\chi$.
What is this group? Since a Dirichlet character mod $q$ is just a homomorphism from $(\mathbf{Z}/q)^*$ to the unit circle (extended by zero to a function on $\mathbf{Z}/q$ and lifted to $\mathbf{Z}$), the group of such characters is just the Pontrjagin dual of $(\mathbf{Z}/q)^*$. Pontrjagin duality for finite abelian groups like $(\mathbf{Z}/q)^*$ is easy: it's basically just the discrete Fourier transform by another name. We recall the basic facts: For any finite abelian group $G$ let $\hat G$ be its dual. Then the dual of $G \times H$ is $\hat G \times \hat H$, and the dual of $\mathbf{Z}/m$ is a cyclic group of order $m$. Since any finite abelian group is a product of cyclic groups, it follows that $\hat G$ is isomorphic (not in general canonically so!) with $G$, and the canonical homomorphism from $G$ to the dual of $\hat G$, taking $g \in G$ to the map $\chi \mapsto \chi(g)$, is an isomorphism. The characters of $G$ are orthogonal:
$$\sum_{g \in G} \chi_1(g)\overline{\chi_2(g)} = \begin{cases} |G|, & \text{if } \chi_1 = \chi_2; \\ 0, & \text{if } \chi_1 \neq \chi_2. \end{cases}$$
In particular, they are linearly independent; since there are $|G|$ of them, they form a basis for the vector space of complex-valued functions on $G$. The decomposition of an arbitrary such function $f : G \to \mathbf{C}$ as a linear combination of characters is achieved by the inverse Fourier transform:
$$f = \sum_\chi \hat f(\chi)\,\chi, \quad \text{where} \quad \hat f(\chi) := \frac{1}{|G|} \sum_{g \in G} \bar\chi(g)\,f(g).$$
In particular the characteristic function of any $g_0 \in G$ is $|G|^{-1} \sum_\chi \bar\chi(g_0)\,\chi(g)$.
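These facts are easy to verify by machine for a small modulus. Here $q = 5$, whose unit group is cyclic of order 4 generated by 2; the discrete-log bookkeeping is my own, not anything from the text:

```python
import cmath

q = 5
# (Z/5)^* is cyclic of order 4, generated by g = 2: its powers are 2, 4, 3, 1.
g, order = 2, 4
log_table = {pow(g, k, q): k for k in range(order)}  # discrete log base g

def chi(j, n):
    """The j-th Dirichlet character mod 5 (j = 0 gives the trivial character)."""
    if n % q == 0:
        return 0
    return cmath.exp(2j * cmath.pi * j * log_table[n % q] / order)

# Orthogonality: sum_a chi_j(a) * conj(chi_k(a)) = phi(q) if j == k, else 0.
for j in range(order):
    for k in range(order):
        total = sum(chi(j, a) * chi(k, a).conjugate() for a in range(1, q))
        assert abs(total - (order if j == k else 0)) < 1e-9

# Characteristic function of a0: (1/phi(q)) * sum_j conj(chi_j(a0)) * chi_j(a).
a0 = 2
for a in range(1, q):
    v = sum(chi(j, a0).conjugate() * chi(j, a) for j in range(order)) / order
    assert abs(v - (1 if a == a0 else 0)) < 1e-9
```

The second loop is exactly the inverse-Fourier decomposition of the indicator of one residue class, the step used below to isolate $\pi(x; a \bmod q)$.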
What does all this tell us about Dirichlet L-functions and the distribution of primes mod $q$? First, that if we define $\pi(\cdot, \chi)$ by
$$\pi(x, \chi) := \sum_{a \bmod q} \chi(a)\,\pi(x; a \bmod q) = \sum_{p \leq x,\ p\ \mathrm{prime}} \chi(p)$$
then, for all $a$ coprime to $q$,
$$\pi(x; a \bmod q) = \frac{1}{\phi(q)} \sum_\chi \bar\chi(a)\,\pi(x, \chi),$$
where as usual $\phi(q) = |(\mathbf{Z}/q)^*|$ is the Euler phi (a.k.a. "totient") function. Second, that
$$s \int_1^\infty \pi(y, \chi)\,y^{-1-s}\,dy = \log L(s, \chi) + O(1) \qquad (1 < s < 2). \qquad (D)$$
Since the Euler product for $L(s, \chi)$ converges for $s > 1$, we know that $L(s, \chi) \neq 0$ for $s > 1$. If $\chi = \chi_0$ then $L(s, \chi)$ is essentially $\zeta(s)$, so $\log L(s, \chi) = \log(1/(s-1)) + O(1)$ as $s \to 1{+}$. Otherwise the sum defining $L(s, \chi)$ actually converges for $s > 0$ (though not absolutely once $s \leq 1$): as a special case of character orthogonality, if $\chi \neq \chi_0$ then $\sum_{a \bmod q} \chi(a) = 0$, so $S(x) := \sum_{0 < n < x} \chi(n)$ is a bounded function and
$$\sum_{n=M}^{N} \chi(n)\,n^{-s} = \int_M^N x^{-s}\,dS(x) = \Bigl[S(x)\,x^{-s}\Bigr]_M^N + s \int_M^N x^{-1-s} S(x)\,dx \ll M^{-s} + N^{-s},$$
which for fixed $s > 0$ tends to zero as $M, N \to \infty$. As with the special case $\chi = \chi_4$, we can in fact show [Exercise: do it.] that $L(s, \chi)$ is infinitely differentiable in
$(0, \infty)$. From (D) we see that the crucial question is whether $L(1, \chi)$ is nonzero or zero: the right-hand side is $O(1)$ if $L(1, \chi) \neq 0$, but is $\leq -\log(1/(s-1)) + O(1)$ if $L(1, \chi) = 0$. Our experience with small $q$, and our expectation that the primes should not favor one congruence class in $(\mathbf{Z}/q)^*$ over another, both suggest that $L(1, \chi)$ will not vanish. Our methods thus far do not let us prove this in general (try doing it for $\chi = (\cdot/67)$ or $(\cdot/163)$!), so let us for the time being assume:
$$\chi \neq \chi_0 \Rightarrow L(1, \chi) \neq 0. \qquad (\neq)$$
Then, by multiplying (D) by $\bar\chi(a)$ and averaging over the characters $\chi$ mod $q$, we obtain:

Let $a, q$ be coprime positive integers. Assume that $(\neq)$ holds for all Dirichlet characters $\chi$ mod $q$. Then
$$s \int_1^\infty \pi(y; a \bmod q)\,y^{-1-s}\,dy = \frac{1}{\phi(q)} \log \frac{1}{s-1} + O(1) \qquad (1 < s < 2). \qquad (D)$$

Thus the primes congruent to $a \bmod q$ have logarithmic density $1/\phi(q)$; in particular the arithmetic progression $\{n : n \equiv a \bmod q\}$ contains infinitely many primes.
In fact (6=) was proved by Dirichlet, from which followed (D), Dirichlet's cele-
brated theorem on primes in arithmetic progressions. At least three proofs are
now known. These three proofs all start with the product of all (q) L-functions
associated to Dirichlet characters mod q:
0 1?1
Y Y @Y
L(s; ) = (1 ? (p)p?s )A :
 mod q p prime  mod q
The inner product can be evaluated with the following cyclotomic identity:

Lemma: Let G be a finite abelian group and g ∈ G an element of order m. Then

  Π_{χ ∈ Ĝ} (1 − χ(g) z) = (1 − z^m)^{|G|/m}

holds identically for all z.

The identity is an easy consequence of the factorization of 1 − z^m together with
the fact that any character of a subgroup H ⊆ G extends in [G : H] ways to a
character of G (in our case H will be the cyclic subgroup generated by g).
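The identity in the Lemma is easy to check numerically; here is a sketch (not from the notes) for a cyclic G of order n, whose characters at g = gen^r are χ_j(g) = e^{2πi·jr/n}, so that g has order m = n/gcd(n, r). The parameters n, r, z are arbitrary test values.

```python
import cmath
from math import gcd

def char_product(n, r, z):
    """Product of (1 - chi(g) z) over all n characters chi of Z/n, g = gen^r."""
    prod = 1.0 + 0j
    for j in range(n):
        prod *= 1 - cmath.exp(2j * cmath.pi * j * r / n) * z
    return prod

n, r, z = 12, 8, 0.3 + 0.4j
m = n // gcd(n, r)              # order of g: here gcd(12, 8) = 4, so m = 3
lhs = char_product(n, r, z)
rhs = (1 - z ** m) ** (n // m)  # (1 - z^m)^(|G|/m)
print(abs(lhs - rhs))           # agreement up to rounding error
```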
Let m_p, then, be the multiplicative order of p mod q (defined for all but the
finitely many primes p dividing q). Then we get

  Π_{χ mod q} L(s,χ) = Π_{p∤q} (1 − p^{-m_p s})^{-φ(q)/m_p}.   (*)

The left-hand side contains the factor L(s,χ_0), which is C/(s−1) + O(1) as
s→1+ for some C > 0 [in fact C = φ(q)/q]. Since the remaining factors are
differentiable at s = 1, if any of them were to vanish there the product would
remain bounded as s→1+. So we must show that this cannot happen.
Dirichlet's original approach was to observe that (*) is, up to a few factors^2
1 − n^{-s} with n|q, the "zeta function of the cyclotomic number field Q(e^{2πi/q})".
He then proved that the zeta function ζ_K(s) of any number field K is ~ C/(s−1)
as s→1+ for some positive constant C (and gave an exact formula for C, which
includes the class number of K and is thus called the "Dirichlet class number

^2 In fact if we replace each χ by its underlying primitive character (see the Exercises) the
product is exactly the zeta function of Q(e^{2πi/q}). This is the prototypical example of the
factorization of a zeta function as a product of Artin L-functions, and the fact that the "Artin
L-functions" for 1-dimensional representations are Dirichlet series is a prototype for class field
theory. Needless to say, Math 259 is not the course where these remarks may be properly
explained.

formula"). That is undoubtedly the best way to go about it, but it requires
more algebraic number theory than I want to assume here. Fortunately there
are at least two ad-hoc simplifications available.
The first is that we need only worry about real characters. If L(1,χ) = 0 then
also L(1,χ̄) = 0. So if χ ≠ χ̄ but L(1,χ) = 0 then there are at least two factors
in the left-hand side of (*) that vanish at s = 1; since they are differentiable
there, the product would be not only bounded as s→1+, but would approach zero
there, which is impossible because the right-hand side is > 1 for all s > 1.
But if χ is a real character then L(s,χ_0)L(s,χ) is (again within a few factors
1 − n^{-s} of) the L-function of a quadratic number field. Developing the algebraic
number theory of quadratic number fields takes considerably less work than is
needed for the full Dirichlet class number formula, and if we only want to get
unboundedness as s→1+ it is even easier; for instance, if χ(−1) = −1 then the
right-hand side of (*) is dominated by the zeta function of a binary quadratic
form, which is easily seen to be ≫ 1/(s−1). However, even this easier proof is
beyond the scope of what I want to assume or develop in this class.
Fortunately there is a way to circumvent any K beyond K = Q, using the fact
that the right-hand side of (*) also dominates the series ζ(φ(q)·s), which blows
up not at s = 1 but at s = 1/φ(q). Since this s is still positive, we can still get
a proof of (≠) from it, but only by appealing to the magic of complex analysis.
We thus defer the proof until we have considered ζ(s) and more generally L(s,χ)
as functions of a complex variable s, which we shall have to do anyway to obtain
the Prime Number Theorem and results on the density (not just logarithmic
density) of primes in arithmetic progressions.
Exercises: Show that the integers q modulo which all the Dirichlet characters
are real (take on only the values 0, ±1) are precisely 24 and its factors. Assuming
Quadratic Reciprocity, show that every real Dirichlet character is either
Π_{l∈L} (·/l) or χ_4 · Π_{l∈L} (·/l) for some possibly empty finite set L of primes.

If q_1 | q and χ_1 is a character mod q_1, then a character χ mod q is obtained from χ_1
by multiplying it by the trivial character mod q. Express L(s,χ) in terms of
L(s,χ_1). Conclude that (≠) holds for χ if and only if it holds for χ_1.
A character mod q that cannot be obtained in this way from any character mod
a proper factor q_1 | q is called primitive. Show that any Dirichlet character χ
comes from a unique primitive character χ_1. [The modulus of this χ_1 is called
the conductor of χ.] Show that the number of primitive characters mod n is
n Π_{p|n} α_p, where α_p = ((p−1)/p)^2 if p^2 | n and α_p = (p−2)/p if p‖n. NB: there
are no primitive characters mod n when 2‖n.

Deduce the fact that for any q at most one character mod q fails (≠), starting
from (D_a) together with the fact that ψ(x; a mod q) ≥ 0 for all x, a, q. [In the final
analysis this is not much different from our proof using the product of L-series.]
Using either this approach or the one based on (*), prove that there is at most
one primitive Dirichlet character, of any modulus, whose L-function vanishes at
s = 1. (Assume there were two, and obtain two different imprimitive characters
to the same modulus for which (≠) fails, which we've already shown impossible.)
We shall encounter this trick again when we come to Siegel's ineffective lower
bound on L(1,χ).
Math 259: Introduction to Analytic Number Theory
Čebyšev (and von Mangoldt and Stirling)
It is well known^1 that for any prime p and positive integer x the exponent of p
in x! is

  c_p(x) := ⌊x/p⌋ + ⌊x/p^2⌋ + ⌊x/p^3⌋ + ··· = Σ_{k=1}^∞ ⌊x/p^k⌋,

the sum being finite because eventually p^k > x. It was Čebyšev's insight that
one could extract information about π(x) from the resulting formula

  x! = Π_p p^{c_p(x)},

or equivalently

  log x! = Σ_p c_p(x) log p = Σ_{n=1}^∞ Λ(n) ⌊x/n⌋,   (C)

where Λ(n) is the von Mangoldt function

  Λ(n) := { log p, if n = p^k for some positive integer k and prime p;
            0,     otherwise.
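The identity (C) can be checked directly; here is a short, unoptimized sketch (my own, not from the notes), using math.lgamma to compute log x!.

```python
import math

def mangoldt(n):
    """Lambda(n) = log p if n is a prime power p^k, else 0."""
    if n < 2:
        return 0.0
    for p in range(2, int(n ** 0.5) + 1):
        if n % p == 0:
            m = n
            while m % p == 0:
                m //= p
            return math.log(p) if m == 1 else 0.0
    return math.log(n)  # n itself is prime

x = 50
lhs = math.lgamma(x + 1)                              # log x!
rhs = sum(mangoldt(n) * (x // n) for n in range(1, x + 1))
print(lhs, rhs)  # the two agree up to rounding error
```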
For instance, Čebyšev was able to come close enough to the Prime Number The-
orem to prove "Bertrand's Postulate": every interval (x, 2x) with x > 1 contains
a prime. (See [H&W 1996, p.343–4] for Erdős's simplification of Čebyšev's proof;
this simplified proof is also on the Web:
To make use of (C) we need to estimate

  log x! = Σ_{n≤x} log n

for large x. We do this by in effect applying the first few steps of symmetrized
Euler-Maclaurin summation. For any C^2 function f we have (by integrating by
parts twice)

  ∫_{-1/2}^{1/2} f(y) dy = f(0) + (1/2) [ ∫_0^{1/2} f''(y)(y − 1/2)^2 dy + ∫_{-1/2}^0 f''(y)(y + 1/2)^2 dy ]

                        = f(0) + (1/2) ∫_{-1/2}^{1/2} f''(y) ⟨y + 1/2⟩^2 dy,

where ⟨z⟩ is the distance from z to the nearest integer. Thus

  Σ_{k=1}^N f(k) = ∫_{1/2}^{N+1/2} f(y) dy − (1/2) ∫_{1/2}^{N+1/2} f''(y) ⟨y + 1/2⟩^2 dy.

^1 If only thanks to the perennial problems along the lines of "how many zeros end 1998!?".
Taking f(y) = log(y) and N = x we thus have

  log x! = (x + 1/2) log(x + 1/2) + (1/2) log 2 − x + (1/2) ∫_{1/2}^{x+1/2} ⟨y + 1/2⟩^2 dy/y^2.

The last term is

  (1/2) ∫_{1/2}^∞ ⟨y + 1/2⟩^2 dy/y^2 + O(1/x),

and the other terms are

  (x + 1/2) log x − x + (1/2)(log 2 + 1) + O(1/x).

Thus we have

  log x! = (x + 1/2) log x − x + C + O(1/x)   (S)

for some absolute constant C. Except for the determination of C (which turns
out to be (1/2) log(2π), as we shall see about a week hence), this formula is of
course Stirling's approximation to x!. Stirling's approximation extends to an
asymptotic series for x!/[(x/e)^x √(2πx)] in inverse powers of x, but for our pur-
poses log x! = (x + 1/2) log x − x + O(1) is more than enough. In fact, since for
the time being we're really dealing with log ⌊x⌋! and not log Γ(x + 1), the best
honest error term we can use is O(log x).
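A numerical check of (S) and of the value of C (my own illustration): the difference log x! − [(x + 1/2) log x − x + (1/2) log 2π] shrinks like 1/(12x).

```python
import math

C = 0.5 * math.log(2 * math.pi)  # the constant in Stirling's formula
for x in (10, 100, 1000):
    approx = (x + 0.5) * math.log(x) - x + C
    # math.lgamma(x + 1) is log x!; the residual should be about 1/(12x)
    print(x, math.lgamma(x + 1) - approx)
```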
Now let

  ψ(x) := Σ_{n≤x} Λ(n).

Then from (C) and (S) we have

  Σ_{k=1}^∞ ψ(x/k) = (x + 1/2) log x − x + O(log x).

This certainly suggests that ψ(x) ~ x, and lets us prove upper and lower bounds
on ψ(x) proportional to x, for instance

  ψ(x) ≤ Σ_{m=0}^∞ [ log ⌊x/2^m⌋! − 2 log ⌊x/2^{m+1}⌋! ] = Σ_{m=0}^∞ [ 2^{-m} (log 2) x + O(log x) ]

       = (2 log 2) x + O(log^2 x)

(since ⌊x⌋ ≥ 1 + Σ_{m=1}^∞ ⌊x/2^m⌋ for x ≥ 1) and

  ψ(x) ≥ Σ_{k=1}^∞ (−1)^{k−1} ψ(x/k) = log( x! / ⌊x/2⌋!^2 ) = (log 2) x + O(log x).
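Numerically (a check of my own, not in the notes), ψ(x)/x sits comfortably between the two bounds log 2 ≈ 0.693 and 2 log 2 ≈ 1.386 just proved, and is in fact close to 1.

```python
import math

def psi(x):
    """Chebyshev psi: sum of log p over prime powers p^k <= x."""
    is_p = [True] * (x + 1)
    is_p[0:2] = [False, False]
    for i in range(2, int(x ** 0.5) + 1):
        if is_p[i]:
            is_p[i*i::i] = [False] * len(is_p[i*i::i])
    total = 0.0
    for p in range(2, x + 1):
        if is_p[p]:
            pk = p
            while pk <= x:
                total += math.log(p)
                pk *= p
    return total

for x in (10**3, 10**4, 10**5):
    print(x, psi(x) / x)  # ratios approach 1
```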
It is true that we're ultimately interested in π(x), not ψ(x). But it is easy to get
from one to the other. For one thing the contribution to ψ(x) of prime powers
p^k with k > 1 is negligible: certainly less than Σ_{k=2}^∞ ⌊x^{1/k}⌋ log x ≪ x^{1/2} log x.
The remaining sum, Σ_{p≤x} log p, can be expressed in terms of π(x) and vice
versa using partial summation, and we find:

  ψ(x) = log(x) π(x) − ∫_2^x π(y) dy/y + O(x^{1/2} log x),

  π(x) = ψ(x)/log x + ∫_2^x ψ(y) dy/(y log^2 y) + O(x^{1/2} log x).
It follows that the Prime Number Theorem π(x) ~ x/log x holds if and only if
ψ(x) ~ x, and good error terms on one side imply good error terms on the other.
It turns out that we can more readily get at ψ(x) than at π(x); for instance
ψ(x) is quite well approximated by x, while the "right" estimate for π(x) is not
x/log x but that plus ∫_2^x dy/log^2 y, i.e. the "logarithmic integral" ∫_2^x dy/log y.
It is in the form ψ(x) ~ x that we'll actually prove the Prime Number Theorem.
Since our lower and upper asymptotic bounds log 2, log 4 on ψ(x)/x are within a
factor of 2 of each other, they do not quite suffice to prove Bertrand's Postulate.
But any improvement would prove that π(2x) > π(x) for sufficiently large x,
from which the proof for all x follows by just exhibiting a few suitably spaced
primes. It turns out that better bounds are available starting from (C). For
instance, show that ψ(x) < ((1/2) log 12) x + O(log^2 x). Can you obtain Čebyšev's
bounds of 0.9 and 1.1? In fact it is known that the upper and lower bounds
can be brought arbitrarily close to 1, but alas the only known proof of that fact
depends on the Prime Number Theorem!
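To see concretely why li(x) = ∫ dy/log y is the "right" estimate, here is a small comparison (my own illustration; the midpoint-rule integrator and the bound 10^5 are arbitrary choices).

```python
import math

def primes_upto(n):
    is_p = [True] * (n + 1)
    is_p[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if is_p[i]:
            is_p[i*i::i] = [False] * len(is_p[i*i::i])
    return [i for i, b in enumerate(is_p) if b]

def li(x, steps=100000):
    """Crude numerical integral of dt/log t from 2 to x (midpoint rule)."""
    h = (x - 2) / steps
    return sum(h / math.log(2 + (i + 0.5) * h) for i in range(steps))

x = 10**5
pi_x = len(primes_upto(x))
print(pi_x, x / math.log(x), li(x))  # li is visibly the better approximation
```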
Here is another elementary approach: let P(u) be any nonzero polynomial with
integer coefficients of degree d; then^2

  ∫_0^1 P(u)^{2n} du ≥ 1/lcm(1, 2, …, 2dn+1) = exp(−ψ(2dn+1)) = Π_{p^k ≤ 2dn+1} 1/p.

^2 This is essentially the same tactic of factoring binom(2n,n) that Čebyšev used to prove ψ(2x) >

Thus

  ψ(2dn+1) ≥ 2n log min_{0<u<1} (1/|P(u)|).

For instance, taking P(u) = u − u^2 we find (at least for 4 | x) that
ψ(x+1) ≥ (x/2) log 4 = x log 2. This is essentially the same (why?) as Čebyšev's
trick of factoring binom(2n,n), but suggests different sources of improvement;
try P(u) = (u − u^2)(1 − 2u) for example. [Unfortunately the bound obtained
this way cannot be brought arbitrarily close to the truth; see [Montgomery 1994,
Chapter 10]; thanks to Madhav Nori for bringing this to my attention.]
Show that

  Σ_{p≤x} log p = ψ(x) − ψ(x^{1/2}) − ψ(x^{1/3}) − ψ(x^{1/5}) + ψ(x^{1/6}) − ··· = Σ_{k=1}^∞ μ(k) ψ(x^{1/k}),

where μ is the Möbius function taking the product of r ≥ 0 distinct primes to
(−1)^r and any non-squarefree integer to 0.
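A numerical check of this last exercise (my own sketch; I write theta(x) here for Σ_{p≤x} log p, a name introduced only for the code):

```python
import math

def sieve_flags(n):
    is_p = [True] * (n + 1)
    is_p[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if is_p[i]:
            is_p[i*i::i] = [False] * len(is_p[i*i::i])
    return is_p

def theta(x):
    f = sieve_flags(x)
    return sum(math.log(p) for p in range(2, x + 1) if f[p])

def psi(x):
    f = sieve_flags(x)
    t = 0.0
    for p in range(2, x + 1):
        if f[p]:
            pk = p
            while pk <= x:
                t += math.log(p)
                pk *= p
    return t

def mobius(n):
    m, r = n, 1
    p = 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0  # square factor
            r = -r
        p += 1
    return -r if m > 1 else r

x = 10**4
rhs = sum(mobius(k) * psi(int(x ** (1 / k) + 1e-9))
          for k in range(1, int(math.log2(x)) + 1))
print(theta(x), rhs)  # the two sides agree up to rounding
```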
[H&W 1996] Hardy, G.H., Wright, E.M.: An Introduction to the Theory of
Numbers, 5th ed. Oxford: Clarendon Press, 1988 [AB 9.88.10 / QA241.H37].
[Montgomery 1994] Montgomery, H.L.: Ten lectures on the interface between
analytic number theory and harmonic analysis. Providence: AMS, 1994 [AB

Math 259: Introduction to Analytic Number Theory
The contour integral formula for ψ(x)

We now have several examples of series^1

  F(s) = Σ_{n=1}^∞ a_n n^{-s}   (F)

from which we want to extract information about the growth of Σ_{n<x} a_n as
x→∞. The key to this is a contour integral. We regard F(s) as a function of a
complex variable s = σ + it. For real y > 0 we have

  y^{-s} = exp(−s log y) = y^{-σ} e^{-it log y}  ⟹  |y^{-s}| = y^{-σ}.

Thus if the sum (F) converges absolutely^2 for σ > σ_0 then it also converges
absolutely to an analytic function on the half-plane Re(s) > σ_0. Now for y > 0
and c > 0 we have

  (1/2πi) ∫_{c-i∞}^{c+i∞} y^s ds/s = { 1,   if y > 1;
                                       1/2, if y = 1;   (I)
                                       0,   if y < 1 }

in the following sense: the contour of integration is the vertical line Re(s) = c,
and since the integral is then not absolutely convergent it is regarded as a
principal value:

  ∫_{c-i∞}^{c+i∞} f(s) ds := lim_{T→∞} ∫_{c-iT}^{c+iT} f(s) ds.
Thus interpreted, (I) is an easy exercise in contour integration for y ≠ 1, and an
elementary manipulation of log s for y = 1. So we expect that if (F) converges
absolutely in Re(s) > σ_0 then

  Σ_{n<x} a_n = (1/2πi) ∫_{c-i∞}^{c+i∞} x^s F(s) ds/s   (1)

for any c > max(σ_0, 0), using the principal value of the integral and adding a_x/2 to
the sum if x happens to be an integer. But getting from (F), (I) to (1) involves
interchanging an infinite sum with a conditionally convergent integral, which is
not in general legitimate. Thus we replace ∫_{c-i∞}^{c+i∞} by ∫_{c-iT}^{c+iT}, which legitimizes
the manipulation but introduces an error term into (I). We estimate this error
term as follows:

^1 As suggested by Serre, everything works just as well with a series Σ_{k=0}^∞ a_k n_k^{-s} where
the n_k are positive reals such that n_k→∞ as k→∞. In that more general setting we would seek
Σ_{n_k<x} a_k.
^2 Clearly if (F) converges absolutely for some given s then it also converges absolutely for
all s of larger real part.
Lemma. For y, c, T > 0 we have

  (1/2πi) ∫_{c-iT}^{c+iT} y^s ds/s = { 1 + O( y^c min(1, 1/(T |log y|)) ), if y ≥ 1;
                                       O( y^c min(1, 1/(T |log y|)) ),     if y ≤ 1,   (I_T)

the implied O-constant being effective and uniform in y, c, T.
(In fact the error's magnitude is less than both y^c and y^c/(πT |log y|). Of course
if y equals 1 then the error term is regarded as O(1), and (I_T) is valid for both
approximations 0, 1 to the integral.)

Proof: Complete the contour of integration to a rectangle extending to real part
−M if y ≥ 1 or +M if y ≤ 1. The resulting contour integral is 1 or 0 respectively
by the residue theorem. We may let M→∞ and bound the horizontal integrals
by (πT)^{-1} ∫_0^∞ y^{c∓r} dr; this gives the estimate y^c/(πT |log y|). Using a circular arc
centered at the origin instead of a rectangle yields the same residue with a
remainder of absolute value < y^c. □

This Lemma will let us approximate Σ_{n<x} a_n by (2πi)^{-1} ∫_{c-iT}^{c+iT} x^s F(s) ds/s.
We shall eventually choose some T and exploit the analytic continuation of F
to shift the contour of integration past the region of absolute convergence to
obtain nontrivial estimates.
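The Lemma is easy to probe numerically; here is a sketch of my own (the midpoint quadrature, step count, and test values c = 2, T = 50 are arbitrary choices).

```python
import math

def truncated_integral(y, c, T, steps=200000):
    """Midpoint-rule value of (1/2*pi*i) * int_{c-iT}^{c+iT} y^s ds/s."""
    h = 2 * T / steps
    total = 0j
    for i in range(steps):
        t = -T + (i + 0.5) * h          # s = c + i t runs up the line Re(s) = c
        s = complex(c, t)
        total += y ** s / s * (1j * h)  # ds = i dt
    return total / (2j * math.pi)

c, T = 2.0, 50.0
for y in (2.0, 0.5):
    val = truncated_integral(y, c, T)
    target = 1.0 if y > 1 else 0.0
    bound = y ** c / (math.pi * T * abs(math.log(y)))
    print(y, abs(val - target), bound)  # observed error stays under the bound
```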
The next question is: which F should we choose? Consider for instance ζ(s). We
have in effect seen already that if we take F(s) = log ζ(s) then the sum of the
resulting a_n over n < x closely approximates π(x). Unfortunately, while ζ(s)
continues meromorphically beyond σ = 1, its logarithm does not: it has essential log-
arithmic singularities at the pole s = 1, and at the zeros of ζ(s) to be described later.
So we use the logarithmic derivative of ζ(s) instead, which at each pole or zero of
ζ has a simple pole with a known residue and thus a predictable effect on our
contour integral.
What are the coefficients a_n for this logarithmic derivative? It is convenient to
use not ζ'/ζ but −ζ'/ζ, which has positive coefficients. Using the Euler product
we find

  −ζ'(s)/ζ(s) = Σ_p (d/ds) log(1 − p^{-s}) = Σ_p (log p) p^{-s}/(1 − p^{-s}) = Σ_p log p Σ_{k=1}^∞ p^{-ks}.

That is,

  −ζ'(s)/ζ(s) = Σ_{n=1}^∞ Λ(n) n^{-s}.

So the n-th coefficient is none other than the von Mangoldt function which arose
in the x! factorization! Thus our contour integral
  (1/2πi) ∫_{c-iT}^{c+iT} (−ζ'(s)/ζ(s)) x^s ds/s   (c > 1)

approximates ψ(x). The error can be estimated by our Lemma (I_T): since
|Λ(n)| ≤ log n the error is of order at most

  Σ_n (x/n)^c log n · min(1, 1/(T |log(x/n)|)),

which is O(T^{-1} x^c log^2 x) provided 1 < T < x. [Exercise: Verify this; explain
why the bound need not hold if T is large compared to x. Use (I_T) to show that
nevertheless

  ψ(x) = (1/2πi) ∫_{c-i∞}^{c+i∞} (−ζ'(s)/ζ(s)) x^s ds/s   (2)

for all x, c > 1 in the principal-value sense of (1).] Taking c = 1 + A/log x, so
x^c ≪_A x, we find:

  ψ(x) = (1/2πi) ∫_{1+A/log x-iT}^{1+A/log x+iT} (−ζ'(s)/ζ(s)) x^s ds/s + O_A( x log^2 x / T ).   (3)

Similarly for any Dirichlet character χ we obtain a formula for

  ψ(x, χ) := Σ_{n<x} Λ(n) χ(n)

by replacing ζ(s) in (3) by L(s,χ).
To make use of this we'll want to shift the line of integration to the left, where
|x^s| is smaller. As we do so we shall encounter poles at s = 1 and at the zeros of
ζ(s), and will have to estimate |ζ'/ζ| over the resulting contour. This is why
we are interested in the analytic continuation of ζ(s) [and likewise of
L(s,χ)] and in its zeros. We investigate these next.
Remark: we can already surmise that ψ(x) will be approximated by x − Σ_ρ x^ρ/ρ,
the sum running over zeros ρ of ζ(s) counted with multiplicity, and thus that the
Prime Number Theorem is tantamount to the nonvanishing of ζ(s) on Re(s) = 1.
That ζ(1 + it) ≠ 0 is also the key step in various "elementary" proofs of the
Prime Number Theorem such as [Newman 1980] (see also [Zagier 1997]).
Exercise: Show that Σ_{n=1}^∞ μ(n) n^{-s} = 1/ζ(s), with μ being the Möbius function
defined in the last exercise of the previous lecture notes. Deduce an integral
formula for Σ_{n<x} μ(n) analogous to the principal-value formula for ψ(x) above,
and an approximate integral formula analogous to the truncated one, but with
error only O(T^{-1} x log x) instead of O(T^{-1} x log^2 x).

[Newman 1980] Newman, D.J.: Simple Analytic Proof of the Prime Number
Theorem, Amer. Math. Monthly 87 (1980), 693–696.
[Zagier 1997] Zagier, D.: Newman's Short Proof of the Prime Number Theorem,
Amer. Math. Monthly 104 (1997), 705–708.
Math 259: Introduction to Analytic Number Theory
The Riemann zeta function and its functional equation
Recall Euler's identity:

  [ζ(s) :=] Σ_{n=1}^∞ n^{-s} = Π_{p prime} ( Σ_{c_p=0}^∞ p^{-c_p s} ) = Π_{p prime} 1/(1 − p^{-s}).   (E)

We showed that this holds as an identity between absolutely convergent sums
and products for real s > 1. Riemann's insight was to consider (E) as an identity
between functions of a complex variable s. We already noted (while obtaining
the integral for ψ(x)) that for

  s = σ + it

and real n > 0 we have

  n^{-s} = exp(−s log n) = n^{-σ} e^{-it log n}  ⟹  |n^{-s}| = n^{-σ}.

Thus both sides of (E) converge absolutely in the half-plane σ > 1, and are
equal there either by analytic continuation from the real ray t = 0 or by the
same proof we used for the real case. But the function ζ(s) extends from that
half-plane to a meromorphic function on all of C, analytic except for a simple
pole at s = 1. The continuation to σ > 0 is readily obtained from our formula

  ζ(s) − 1/(s−1) = Σ_{n=1}^∞ [ n^{-s} − ∫_n^{n+1} x^{-s} dx ] = Σ_{n=1}^∞ ∫_n^{n+1} (n^{-s} − x^{-s}) dx,

since for n ≤ x ≤ n+1 and σ > 0 we have

  |n^{-s} − x^{-s}| = | s ∫_n^x y^{-1-s} dy | ≤ |s| n^{-1-σ},

so the formula for ζ(s) − 1/(s−1) is a sum of analytic functions converging
absolutely in compact subsets of {σ + it : σ > 0} and thus gives an analytic
function there. (See also the first Exercise below.) It is possible to proceed in
this fashion, extending ζ to σ > −1, σ > −2, etc.^1 However, once we have
defined ζ(s) on σ > 0 we can obtain the entire analytic continuation at once
from Riemann's functional equation relating ζ(s) with ζ(1−s). This equation
is most nicely stated by introducing the meromorphic function ξ(s) defined by^2

  ξ(s) := π^{-s/2} Γ(s/2) ζ(s)

^1 This is quite straightforward if you know the Euler-Maclaurin summation formula with
an integral form of the remainder.
^2 Warning: occasionally one still sees ξ(s) defined as what we would call (s^2 − s)ξ(s) or
(s^2 − s)ξ(s)/2, as in [GR 1980, 9.561]. The factor of (s^2 − s) makes the function entire, and
does not affect the functional equation since it is symmetric under s ↔ 1−s. However, for
most uses it turns out to be better to leave it out and tolerate the poles at s = 0, 1.
for σ > 0. Then we have:

Theorem (Riemann): The function ξ extends to a meromorphic function on C,
regular except for simple poles at s = 0, 1, which satisfies the functional equation

  ξ(s) = ξ(1−s).   (R)

It follows that ζ also extends to a meromorphic function on C, which is regular
except for a simple pole at s = 1, and that this analytic continuation of ζ has
simple zeros at the negative even integers −2, −4, −6, …, and no other zeros
outside the closed critical strip 0 ≤ σ ≤ 1.

[The zeros −2, −4, −6, … of ζ outside the critical strip are called its trivial zeros.]
The proof has two ingredients: properties of Γ(s) as a meromorphic function of
s ∈ C, and the Poisson summation formula. We treat Γ first.
The Gamma function was defined for real s > 0 by Euler^3 as the integral

  Γ(s) := ∫_0^∞ x^s e^{-x} dx/x.   (Γ)

We have Γ(1) = ∫_0^∞ e^{-x} dx = 1 and, integrating by parts,

  s Γ(s) = ∫_0^∞ e^{-x} d(x^s) = −∫_0^∞ x^s d(e^{-x}) = Γ(s+1)   (s > 0),

so by induction Γ(n) = (n−1)! for positive integers n. Since |x^s| = x^σ, the
integral (Γ) defines an analytic function on σ > 0, which still satisfies the
recursion s Γ(s) = Γ(s+1) (proved either by repeating the integration by parts
or by analytic continuation from the positive real axis). That recursion then
extends Γ to a meromorphic function on C, analytic except for simple poles at
0, −1, −2, −3, …. (What are the residues at those poles?) For s, s' in the right
half-plane σ > 0 the Beta function^4 B(s, s'), defined by the integral

  B(s, s') := ∫_0^1 x^{s-1} (1−x)^{s'-1} dx,

is related with Γ by

  Γ(s + s') B(s, s') = Γ(s) Γ(s')   (B)

(this is proved by the standard trick of evaluating ∫_0^∞ ∫_0^∞ x^{s-1} y^{s'-1} e^{-x-y} dx dy
in two different ways). Since Γ(s) > 0 for real positive s it readily follows that
Γ has no zeros in σ > 0, and thus none in the complex plane.

^3 Actually Euler used Π(s−1) for what we call Γ(s); thus Π(n) = n! for n = 0, 1, 2, ….
^4 a.k.a. "Euler's first integral", (Γ) being "Euler's second integral".
This is enough to derive the poles and trivial zeros of ζ from the functional
equation (R). [Don't take my word for it; do it!] But where does (R) come
from? There are several known ways to prove it; we shall use Riemann's original
method, which generalizes to L-series associated to modular forms. Riemann
expresses ξ(s) as an integral involving the theta function

  θ(u) := Σ_{n∈Z} e^{-πn²u} = 1 + 2(e^{-πu} + e^{-4πu} + e^{-9πu} + ···),

the sum converging absolutely to an analytic function on the half-plane
Re(u) > 0. Then

  2 ξ(s) = ∫_0^∞ (θ(u) − 1) u^{s/2} du/u   (σ > 1)

(integrate (θ(u) − 1) u^{s/2} du/u = 2 Σ_{n=1}^∞ e^{-πn²u} u^{s/2} du/u termwise). But we
shall see:

Lemma: The function θ(u) satisfies the identity

  θ(1/u) = u^{1/2} θ(u).   (θ)
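Before using the Lemma, it is reassuring to test it numerically; a quick sketch of my own (truncating the sum at 200 terms, far more than needed for these u):

```python
import math

def theta(u, terms=200):
    """Truncated theta(u) = 1 + 2 * sum_{n>=1} exp(-pi n^2 u), for real u > 0."""
    return 1 + 2 * sum(math.exp(-math.pi * n * n * u) for n in range(1, terms))

for u in (0.3, 1.0, 2.7):
    print(u, theta(1 / u) - math.sqrt(u) * theta(u))  # ~0 up to rounding
```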
Assume this for the time being. We then rewrite our integral for 2ξ(s) as

  ∫_0^1 (θ(u) − 1) u^{s/2} du/u + ∫_1^∞ (θ(u) − 1) u^{s/2} du/u

    = −2/s + ∫_0^1 θ(u) u^{s/2} du/u + ∫_1^∞ (θ(u) − 1) u^{s/2} du/u,

and use the change of variable u ↔ 1/u to find

  ∫_0^1 θ(u) u^{s/2} du/u = ∫_1^∞ θ(1/u) u^{-s/2} du/u

    = ∫_1^∞ θ(u) u^{(1-s)/2} du/u = 2/(s−1) + ∫_1^∞ (θ(u) − 1) u^{(1-s)/2} du/u.

Thus

  ξ(s) + 1/s + 1/(1−s) = (1/2) ∫_1^∞ (θ(u) − 1)(u^{s/2} + u^{(1-s)/2}) du/u,

which is manifestly symmetrical under s ↔ 1−s, and analytic since θ(u) − 1 de-
creases exponentially as u→∞. This concludes the proof of the functional equa-
tion and analytic continuation of ξ, assuming our Lemma (θ).
And where does (θ) come from? It is the special case f(x) = e^{-πux²} of the:

Theorem (Poisson Summation Formula): Let f: R→C be a twice-differentiable
function such that (|x|^r + 1)(|f(x)| + |f''(x)|) is bounded for some r > 1, and
let f̂ be its Fourier transform

  f̂(y) = ∫_{-∞}^{+∞} e^{2πixy} f(x) dx.

Then

  Σ_{m=-∞}^{∞} f(m) = Σ_{n=-∞}^{∞} f̂(n),   (P)

the sums converging absolutely.

Proof: Define F: R/Z→C by

  F(x) := Σ_{m∈Z} f(x + m),

the sum converging absolutely to a twice-differentiable function by the assump-
tion on f. Thus the Fourier series of F converges absolutely to F, so in particular

  F(0) = Σ_{n=-∞}^{∞} ∫_0^1 e^{2πinx} F(x) dx.

But F(0) is just the left-hand side of (P), and the integral is f̂(n), so its sum
over n ∈ Z yields the right-hand side of (P). Q.E.D.
Now let f(x) = e^{-πux²}. The hypotheses are handily satisfied for any r, so
(P) holds. The left-hand side is just θ(u). To evaluate the right-hand side, we
need the Fourier transform of f, which is u^{-1/2} e^{-πy²/u}. [Contour integration
reduces this claim to ∫_{-∞}^∞ e^{-πux²} dx = u^{-1/2}, which is the well-known Gauss
integral; see the Exercises.] Thus the right-hand side is u^{-1/2} θ(1/u). Multi-
plying both sides by u^{1/2} we then obtain (θ), and finally complete the proof of
the analytic continuation and functional equation for ζ(s).
Remark: We noted already that to each number field K there corresponds a
zeta function

  ζ_K(s) := Σ_I Nm(I)^{-s} = Π_℘ (1 − Nm(℘)^{-s})^{-1}   (σ > 1),

in which the sum and product extend respectively over ideals and prime ideals
of the ring of integers O_K, and their equality expresses unique factorization. As
in our case of K = Q, this extends to a meromorphic function on C, regular
except for a simple pole at s = 1. Moreover it satisfies a functional equation
ξ_K(s) = ξ_K(1−s), where

  ξ_K(s) := Γ(s/2)^{r_1} Γ(s)^{r_2} (4^{-r_2} π^{-n} |d|)^{s/2} ζ_K(s),

in which n = r_1 + 2r_2 = [K : Q], r_1, r_2 are the numbers of real and complex
embeddings of K, and d is the discriminant of K/Q. The factors Γ(s/2)^{r_1}, Γ(s)^{r_2}
may be regarded as factors corresponding to the "archimedean places" of K, as
the factor (1 − Nm(℘)^{-s})^{-1} corresponds to the finite place ℘. The functional
equation can be obtained from generalized Poisson summation as in [Tate 1950].
Most of our results for ζ = ζ_Q carry over to these ζ_K, and yield a Prime
Number Theorem for primes of K; L-series generalize too, though the proper
generalization requires some thought when the class and unit groups need no
longer be trivial and {±1} as for Q. See for instance H. Heilbronn's "Zeta-
Functions and L-Functions", Chapter VIII of [CF 1967].
Show that if φ: Z→C is a function such that Σ_{m=1}^n φ(m) = O(1) (for
instance, if φ is a nontrivial Dirichlet character) then Σ_{n=1}^∞ φ(n) n^{-s} converges
uniformly (albeit not absolutely) in compact subsets of {σ + it : σ > 0} and
thus defines an analytic function on that half-plane. Apply this to

  (1 − 2^{1-s}) ζ(s) = 1 − 1/2^s + 1/3^s − 1/4^s + − ···

(with φ(n) = (−1)^{n-1}) and to (1 − 3^{1-s}) ζ(s) to obtain a different proof of the
analytic continuation of ζ to σ > 0.
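This continuation is computable: the alternating series converges (slowly) for real s in (0, 1), so one can estimate, say, ζ(1/2) ≈ −1.46035 directly. A brute-force sketch of my own (the truncation point is arbitrary; the error is about half the first omitted term):

```python
def eta(s, terms=400000):
    """Partial sum of the alternating series sum (-1)^(n-1) / n^s."""
    return sum((-1) ** (n - 1) / n ** s for n in range(1, terms + 1))

# (1 - 2^(1-s)) zeta(s) = eta(s), so inside the strip:
zeta_half = eta(0.5) / (1 - 2 ** 0.5)
print(zeta_half)  # close to -1.46035
```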
Show that ζ(−1) = −1/12, and (if you know or are willing to derive the formula
for ζ(2m)) that more generally ζ(1 − 2m) = −B_{2m}/2m, where B_k is the k-th
Bernoulli number defined by the generating function t/(e^t − 1) = Σ_{k=0}^∞ B_k t^k/k!.
What is ζ(0)? [It is known that in general ζ_K(−m) ∈ Q (m = 0, 1, 2, …) for
any number field K. In fact the functional equation for ξ_K indicates that once
[K : Q] > 1 all the ζ_K(−m) vanish unless K is totally real and m is odd, in
which case the rationality of ζ_K(−m) was obtained in [Siegel 1969].]
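The Bernoulli numbers in this exercise are easy to generate exactly; a sketch of my own using the standard recurrence Σ_{j=0}^{m} C(m+1, j) B_j = 0 for m ≥ 1, which follows from the generating function:

```python
from fractions import Fraction
from math import comb

def bernoulli(n):
    """Return [B_0, ..., B_n] as exact fractions (B_1 = -1/2 convention)."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        s = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append(-s / (m + 1))
    return B

B = bernoulli(6)
print(B[1], B[2], -B[2] / 2)  # -1/2, 1/6, and zeta(-1) = -B_2/2 = -1/12
```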
If you've never seen it yet, or have done it once but forgotten, prove (B) by
starting from the integral representation of the right-hand side as

  ∫_0^∞ ∫_0^∞ x^{s-1} y^{s'-1} e^{-x-y} dx dy

and applying the change of variable (x, y) = (uz, (1−u)z). [We will probably
have little use for the Beta function in Math 259, but an analogous transforma-
tion will show up later in the formula relating Gauss and Jacobi sums.]
Now take s = s' = 1/2 to prove that Γ(1/2) = √π, and thus to obtain the
Gauss integral

  ∫_{-∞}^∞ e^{-x²} dx = √π.

Then take s' = s and use the change of variable u = (1 − 2x)^2 in the integral
defining B(s, s) to obtain B(s, s) = 2^{1-2s} B(s, 1/2), and thus the duplication
formula

  Γ(2s) = π^{-1/2} 2^{2s-1} Γ(s) Γ(s + 1/2).
Use Poisson summation to evaluate Σ_{n=1}^∞ 1/(n² + c²) for c > 0. [Evaluating the
Fourier transform of 1/(x² + c²) is a standard exercise in contour integration.]
Verify that your answer approaches ζ(2) = π²/6 as c→0.
Let χ_8 be the Dirichlet character mod 8 defined by χ_8(±1) = 1, χ_8(±3) = −1.
Show that if f is a function satisfying the hypothesis of Poisson summation then

  Σ_{m=-∞}^{∞} χ_8(m) f(m) = 8^{-1/2} Σ_{n=-∞}^{∞} χ_8(n) f̂(n/8).

Letting f(x) = e^{-πux²}, obtain an identity analogous to (θ), and deduce a func-
tional equation for L(s, χ_8).
Now let χ_4 be the character mod 4 defined by χ_4(±1) = ±1. Show that, again
under the Poisson hypotheses,

  Σ_{m=-∞}^{∞} χ_4(m) f(m) = (1/2i) Σ_{n=-∞}^{∞} χ_4(n) f̂(n/4).

But now taking f(x) = e^{-πux²} does not accomplish much! Use f(x) = x e^{-πux²}
instead to find a functional equation for L(s, χ_4).
For light relief after all this hard work, differentiate the identity (θ) with respect
to u, set u = 1, and conclude that

  e^π > 8π − 2.

What is the approximate size of the difference?
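For the curious, the difference in that last inequality is tiny; a one-line check (answering the question numerically, so skip it if you want to work it out yourself):

```python
import math

# e^pi = 23.1406..., 8*pi - 2 = 23.1327...; the gap is under 0.01
print(math.exp(math.pi) - (8 * math.pi - 2))
```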
[CF 1967] Cassels, J.W.S., Fröhlich, A., eds.: Algebraic Number Theory. Lon-
don: Academic Press 1967. [AB 9.67.2 / QA 241.A42]
[GR 1980] Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series, and Prod-
ucts. New York: Academic Press 1980. [D 9.80.1 / basement reference]
[Siegel 1969] Siegel, C.L.: Berechnung von Zetafunktionen an ganzzahligen
Stellen, Gött. Nach. 10 (1969), 87–102.
[Tate 1950] Tate, J.T.: Fourier Analysis in Number Fields and Hecke's Zeta-
Functions. Thesis, 1950; Chapter XV of [CF 1967].
Math 259: Introduction to Analytic Number Theory
More about the Gamma function
The product formula for Γ(s). Recall that Γ(s) has simple poles at s =
0, −1, −2, … and no zeros. We readily concoct a product that has the same
behavior: let

  g(s) := (1/s) Π_{k=1}^∞ e^{s/k} / (1 + s/k),

the product converging uniformly in compact subsets of C − {0, −1, −2, …}
because e^x/(1 + x) = 1 + O(x²). Then Γ/g is an entire function with neither
poles nor zeros. What about g(s+1)/g(s)? That is the limit as N→∞ of

  (s/(s+1)) Π_{k=1}^N e^{1/k} (1 + s/k)/(1 + (s+1)/k) = (s/(s+1)) Π_{k=1}^N e^{1/k} (k+s)/(k+s+1)

    = s · (N/(N+s+1)) · exp( Σ_{k=1}^N 1/k − log N ).

Now the factor N/(N+s+1) approaches 1, while the exponent Σ_{k=1}^N 1/k − log N
tends to Euler's constant γ = 0.57721566490…. Thus g(s+1) = s e^γ g(s), and
if we define

  Γ*(s) := e^{-γs} g(s) = (e^{-γs}/s) Π_{k=1}^∞ e^{s/k}/(1 + s/k) = lim_{N→∞} N^s N! / Π_{k=0}^N (s+k),   (P)

then Γ* satisfies the same functional equation Γ*(s+1) = s Γ*(s) satisfied by Γ.
We claim that in fact Γ* = Γ. Indeed the quotient q := Γ/Γ* is an entire
function of period 1 that equals 1 at s = 0 (why?). Thus it is an analytic
function of e^{2πis} ∈ C^*. We claim that

  |q(σ + it)| ≪ e^{π|t|/2}   (Q)

for all real σ, t; since the coefficient π/2 in the exponent is less than 2π, it will
follow that q is constant and thus that Γ* = Γ as claimed. Since q is periodic
it is enough to prove (Q) for σ ∈ [1, 2]. Then we have |Γ(σ + it)| ≤ Γ(σ) by the
integral formula, and

  | Γ*(σ) / Γ*(σ + it) | = Π_{k=0}^∞ |σ + k + it|/(σ + k) = exp( (1/2) Σ_{k=0}^∞ log(1 + t²/(σ+k)²) ).

The summand is a decreasing function of k, so the sum is

  ≤ ∫_0^∞ log(1 + (t/x)²) dx = |t| ∫_0^∞ log(1 + (1/x)²) dx,

which on integration by parts becomes 2|t| ∫_0^∞ dx/(x² + 1) = π|t|. This proves
(Q), and thus shows that (P) is a product formula for Γ(s).
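The limit form of (P) converges slowly but visibly; here is a quick check of my own against math.gamma, working with logarithms to avoid overflow (N = 10^5 is an arbitrary truncation).

```python
import math

def gamma_limit(s, N=100000):
    """Approximate Gamma(s) via N^s N! / (s (s+1) ... (s+N)), for real s > 0."""
    log_val = s * math.log(N) + math.lgamma(N + 1)
    log_val -= sum(math.log(s + k) for k in range(N + 1))
    return math.exp(log_val)

for s in (0.5, 1.0, 3.7):
    print(s, gamma_limit(s), math.gamma(s))  # agreement to several digits
```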
Consequences of the product formula. First among these is the Stirling
approximation^1 to log Γ(s). Fix ε > 0 and let R_ε be the region

  {s ∈ C : |s| > ε, |Im(log s)| < π − ε}.

Then R_ε is a simply-connected region containing none of the poles of Γ, so there
is an analytic function log Γ on R_ε, real on R_ε ∩ R, and given by the above
product formula:

  log Γ(s) = lim_{N→∞} [ s log N + log N! − Σ_{k=0}^N log(s + k) ].   (L)

We estimate the sum as we did for log x! in obtaining the original form of
Stirling's approximation: the sum differs from

  ∫_{-1/2}^{N+1/2} log(s + x) dx = (N + 1/2 + s) log(N + 1/2 + s) − (s − 1/2) log(s − 1/2) − N − 1

    = (N + 1/2 + s) log N + (N + 1/2 + s) log(1 + (s + 1/2)/N) − (s − 1/2) log(s − 1/2) − N − 1

by

  (1/2) ∫_{-1/2}^{N+1/2} ⟨x⟩² dx/(s + x)² ≪_ε |s|^{-1}.

We already know that log N! = (N + 1/2) log N − N + A + O(N^{-1}) for some
constant A. Taking N→∞ we obtain

  log Γ(s) = (s − 1/2) log s − s + C + O_ε(|s|^{-1})   (S)

for some absolute constant C. This is more than sufficient for our purposes, but
in fact it is known that C = (1/2) log 2π (Exercise: verify this from (S) together
with the duplication formula), and that the O_ε(|s|^{-1}) error can be expanded in
an asymptotic series in inverse powers of s (which come from further terms in
the Euler-Maclaurin expansion of the sum in (L)).
The logarithmic derivative of our product formula for Γ(s) is

  Γ'(s)/Γ(s) = −γ − 1/s + Σ_{k=1}^∞ ( 1/k − 1/(s+k) ) = lim_{N→∞} [ log N − Σ_{k=0}^N 1/(s+k) ].

^1 Originally only for n! = Γ(n + 1), but we need it for complex s as well. As we shall see,
it can also be obtained from the stationary-phase expansion of Euler's integral, the critical
point of x^{s-1} e^{-x} being x = s − 1.

By differentiating^2 (S) or by applying the same Euler-Maclaurin trick to
Σ_{k=0}^∞ 1/(s + k) we find that

  Γ'(s)/Γ(s) = log s − 1/(2s) + O_ε(|s|^{-2}).   (S')
Exercises: 1. (further applications of the product formula) Use (P) to obtain a
product formula for Γ(s)Γ(−s) and deduce that

  Γ(s) Γ(1−s) = π / sin πs.   (*)

(This can also be obtained from Γ(s)Γ(1−s) = B(s, 1−s) by using the change
of variable x = y/(y+1) in the Beta integral and evaluating the resulting ex-
pression by contour integration.) Use this together with the duplication formula
and Riemann's theorem to obtain the equivalent asymmetrical form

  ζ(1−s) = π^{-s} 2^{1-s} Γ(s) cos(πs/2) ζ(s)

of the functional equation for ζ(s). Note that the duplication formula, and its
generalization

  Γ(ns) = (2π)^{(1-n)/2} n^{ns-1/2} Π_{k=0}^{n-1} Γ(s + k/n),

can also be obtained from (P).
2. (complex-analytic proof of the $\zeta$ functional equation) Prove that
$$\zeta(s) = \frac1{\Gamma(s)}\int_0^\infty \frac{u^{s-1}\,du}{e^u-1}$$
for $\sigma>1$, and that when $s$ is not a positive integer an equivalent formula is
$$\zeta(s) = -\frac{\Gamma(1-s)}{2\pi i}\int_C \frac{u^{s-1}\,du}{e^u-1}$$
where $C$ is a contour coming from $+\infty$, going counterclockwise around $u=0$, and returning to $+\infty$:

²While real asymptotic series cannot in general be differentiated (why?), complex ones can, thanks to Cauchy's integral formula for the derivative. The logarithmic derivative of $\Gamma(s)$ is often called $\psi(s)$ in the literature, but alas we cannot use this notation because it conflicts with $\psi(x) = \sum_{n<x}\Lambda(n)$...

Show that this gives the analytic continuation of $\zeta$ to a meromorphic function on $\mathbb C$; shift the line of integration to the left to obtain the functional equation relating $\zeta(s)$ to $\zeta(1-s)$ for $\sigma<0$, and thus by continuation for all $s$.
3. (behavior on vertical lines) Deduce from (S) that for fixed $\sigma\in\mathbb R$
$$\operatorname{Re}\log\Gamma(\sigma+it) = (\sigma-\tfrac12)\log|t| - \frac\pi2|t| + C + O_\sigma(|t|^{-1})$$
as $|t|\to\infty$. Check that for $\sigma = 0,\,1/2$ this agrees with the exact formulas
$$|\Gamma(it)|^2 = \frac\pi{t\sinh\pi t},\qquad |\Gamma(1/2+it)|^2 = \frac\pi{\cosh\pi t}$$
obtained from $(*)$.
[All this material is standard; one basic reference is Ch. XII (pages 235--264) of [WW 1940]. One reason for not just citing Whittaker & Watson is that some of the results concerning Euler's integrals B and $\Gamma$ have close analogues in the Gauss and Jacobi sums associated to Dirichlet characters, and we'll need these analogues shortly.]
[WW 1940] Whittaker, E.T., Watson, G.N.: A Course of Modern Analysis...³ (fourth edition). Cambridge University Press, 1940 (reprinted 1963). [HA 9.40 / QA295.W38]

³The full title is 26 words long, which was not out of line when the book first appeared in 1902. You can find the title in Hollis.

Math 259: Introduction to Analytic Number Theory
Functions of finite order: product formula and logarithmic derivative

[See for instance Chapter 11 of [Davenport 1967], keeping in mind that Davenport uses "integral function" for what we call an "entire function"; Davenport treats only the case of order (at most) 1, which is all that we need, but it is scarcely harder to deal with any finite order as we do here.]
The order of an entire function $f(\cdot)$ is the smallest $\lambda\in[0,+\infty]$ such that $f(z) \ll_\epsilon \exp|z|^{\lambda+\epsilon}$ for all $\epsilon>0$. Entire functions of finite order have nice infinite products. We have seen already the cases of $\sin\pi z$ and $1/\Gamma(z)$, both of order 1. As we shall see, $(s^2-s)\xi(s)$ also has order 1 (as do analogous functions we'll obtain from Dirichlet L-series). From the product formula for $\xi(s)$ we shall obtain a partial-fraction decomposition of $\zeta'(s)/\zeta(s)$ which we shall use to analyze the contour-integral formula for $\psi(x)$.
Suppose first that $f$ has finite order $\lambda$ and no zeros. Then $f = e^g$ for some entire function $g$. We claim that $g$ is a polynomial. Indeed the real part of $g$ is $O(|z|^{\lambda+\epsilon})$; we show the same is then true of $|g(z)|$. Let $h = g - g(0)$, so $h(0)=0$, and let $M = \sup_{|z|\le2R}\operatorname{Re}h(z)$; by assumption $M \ll R^{\lambda+\epsilon}$. Then $h_1 := h/(2M-h)$ is analytic in the disc $D := \{z\in\mathbb C : |z|\le2R\}$, with $h_1(0)=0$ and $|h_1(z)|\le1$ in $D$. Consider now the analytic function $\phi(z) := 2R\,h_1(z)/z$ on $D$. On the boundary of that circle, $|\phi(z)|\le1$. Thus by the maximum principle the same is true for all $z\in D$. In particular if $|z|\le R$ then $|h_1(z)|\le1/2$, whence $|h(z)|\le2M$. Thus $|g(z)| \le 2M + |g(0)| \ll |z|^{\lambda+\epsilon}$, and $g$ is a polynomial in $z$ as claimed. Moreover the degree of that polynomial is just the order of $f$.
We shall reduce the general case to this by dividing a given function $f$ of finite order (not vanishing identically on $\mathbb C$) by a product whose zeros match those of $f$. For this product to converge we'll need to bound the number of zeros of $f$ in a circle. Let $z_1, z_2, \ldots$ be the zeros of $f$, listed with the correct multiplicity in increasing order of $|z_k|$. Consider first $f(z)$ in $|z|<1$, and let $z_n$ be the last zero of $f$ there. Let $g$ be the Blaschke product $\prod_{k=1}^n (z-z_k)/(1-\bar z_kz)$, designed to have the same zeros but with $|g(z)|=1$ on $|z|=1$. Then $f_1 := f/g$ is analytic on $|z|\le1$, and $|f(z)| = |f_1(z)|$ on the boundary $|z|=1$. Therefore by the maximum principle $|f_1(0)| \le \max_{|z|=1}|f(z)|$, so
$$|f(0)| = |g(0)|\,|f_1(0)| = \Bigl(\prod_{k=1}^n|z_k|\Bigr)|f_1(0)| \le \Bigl(\prod_{k=1}^n|z_k|\Bigr)\max_{|z|=1}|f(z)|.$$
It follows by rescaling that if $z_k$ ($1\le k\le n(R)$) are the zeros of $f$ in $|z|<R$ then¹
$$|f(0)|\prod_{k=1}^{n(R)}\frac R{|z_k|} \le \max_{|z|=R}|f(z)|.$$
Now suppose for convenience that $f(0)\ne0$ [otherwise apply the following argument to $f/z^r$ where $r$ is the order of vanishing of $f$ at $z=0$]. Then we may take logarithms to find
$$\log\max_{|z|=R}|f(z)| \ge \log|f(0)| + \sum_{k=1}^{n(R)}\log\frac R{|z_k|} = \log|f(0)| + \int_0^R n(r)\,\frac{dr}r.$$
If $f$ has order at most $\lambda<\infty$ then the LHS of this is $O_\epsilon(R^{\lambda+\epsilon})$, and we conclude that
$$n(R) = \int_R^{eR} n(R)\,\frac{dr}r \le \int_0^{eR} n(r)\,\frac{dr}r \ll_\epsilon R^{\lambda+\epsilon}.$$

It follows that $\sum_{k=1}^\infty |z_k|^{-\beta}$ converges if $\beta>\lambda$, since the sum is
$$\int_0^\infty r^{-\beta}\,dn(r) = \beta\int_{|z_1|}^\infty r^{-\beta-1}n(r)\,dr \ll \int_{|z_1|}^\infty r^{\lambda+\epsilon-\beta-1}\,dr < \infty$$
for any positive $\epsilon<\beta-\lambda$. Therefore the product
$$P(z) := \prod_{k=1}^\infty \Bigl(1-\frac z{z_k}\Bigr)\exp\Bigl(\sum_{m=1}^a\frac1m\Bigl(\frac z{z_k}\Bigr)^m\Bigr) \qquad (a=\lfloor\lambda\rfloor)$$
converges for all $z\in\mathbb C$. Moreover the convergence is uniform on compact subsets of $\mathbb C$, because on $|z|\le R$ we have
$$\log\Bigl(1-\frac z{z_k}\Bigr) + \sum_{m=1}^a\frac1m\Bigl(\frac z{z_k}\Bigr)^m \ll \Bigl|\frac z{z_k}\Bigr|^{a+1} \ll R^{a+1}|z_k|^{-a-1}$$
uniformly once $k>n(2R)$. Thus $P(z)$ is an entire function, with the same zeros and multiplicities as $f$.
It follows that $f/P$ is an entire function without zeros. We claim that it too has order at most $\lambda$, and is thus $\exp g(z)$ for some polynomial $g$ of degree at most $a$. To do this it is enough to prove that for each $\epsilon>0$ there exists $C_\epsilon$ such that for all $R\ge1$ there exists $r\in(R,2R)$ such that $|P(z)| \ge \exp(-C_\epsilon R^{\lambda+\epsilon})$ for all $z\in\mathbb C$ with $|z|=r$. Write $P = P_1P_2$, with $P_1, P_2$ being the products over $k\le n(4R)$ and $k>n(4R)$ respectively. The $k$-th factor of $P_2(z)$ is $\exp O(|z/z_k|^{a+1})$, so
$$\log|P_2(z)| \ll R^{a+1}\sum_{k>n(4R)}|z_k|^{-a-1} \ll R^{a+1}\int_{4R}^\infty r^{-a-1}\,dn(r) \ll_\epsilon R^{\lambda+\epsilon},$$
¹Since the resulting function $f_1$ has no zeros in $|z|<R$, it follows that $\log|f_1(z)|$ is a harmonic function on that disc, whence Jensen's formula: if $f(0)\ne0$ then
$$\frac1{2\pi}\int_0^{2\pi}\log|f(Re^{i\theta})|\,d\theta = \log|f(0)| + \sum_k\log(R/|z_k|).$$
using integration by parts and $n(r) \ll_\epsilon r^{\lambda+\epsilon}$ in the last step (check this!). As to $P_1$, it is a finite product, which is $e^{h(z)}\prod_{k\le n(4R)}(1-z/z_k)$ where
$$h(z) = \sum_{k=1}^{n(4R)}\sum_{m=1}^a\frac1m\Bigl(\frac z{z_k}\Bigr)^m$$
is a polynomial of degree at most $a$. Thus $h(z) \ll n(4R) + R^a\sum_{k\le n(4R)}|z_k|^{-a}$, which readily yields $h(z) \ll_\epsilon R^{\lambda+\epsilon}$ (carry out the required partial integration and estimates).
So far, our lower bounds on the factors of $P(z)$ held for all $z$ in the annulus $R<|z|<2R$, but we cannot expect the same for $P_3(z) := \prod_{k\le n(4R)}(1-z/z_k)$, since it may vanish at some points of the annulus. However, we can prove that some $r$ works by estimating the average
$$-\frac1R\int_R^{2R}\min_{|z|=r}\log|P_3(z)|\,dr \le -\frac1R\sum_{k=1}^{n(2R)}\int_R^{2R}\log\Bigl|1-\frac r{|z_k|}\Bigr|\,dr.$$
The integral is elementary, if not pretty, and at the end we conclude that the average is again $\ll_\epsilon R^{\lambda+\epsilon}$. This shows that for some $r\in(R,2R)$ the desired lower bound holds, and we have finally proved the product formula
$$f(z) = P(z)e^{g(z)} = e^{g(z)}\prod_{k=1}^\infty\Bigl(1-\frac z{z_k}\Bigr)\exp\Bigl(\sum_{m=1}^a\frac1m\Bigl(\frac z{z_k}\Bigr)^m\Bigr).$$
Taking logarithmic derivatives we deduce
$$\frac{f'(z)}{f(z)} = g'(z) + \frac{P'(z)}{P(z)} = g'(z) + \sum_{k=1}^\infty\frac{(z/z_k)^a}{z-z_k}.$$
We note too that if $\beta>0$ and $\sum_k|z_k|^{-\beta}<\infty$ then there exists a constant $C$ such that $f(z) \ll \exp C|z|^\beta$. This follows from the existence of a constant $C$ such that
$$\Bigl|(1-w)\exp\Bigl(\sum_{m=1}^a w^m/m\Bigr)\Bigr| \le \exp C|w|^\beta$$
for all $w\in\mathbb C$. Contrapositively, if $f(z)$ is a function of order $\lambda$ which grows faster than $\exp C|z|^\lambda$ for all $C$ then $\sum_k|z_k|^{-\lambda}$ diverges. For instance this happens with $f(s) = 1/\Gamma(s)$. [This may appear circular because it is proved from the product formula for $\Gamma(s)$, but it need not be; see Exercise 4 below.] As we shall see, the same is true for $f(s) = (s^2-s)\xi(s)$; it will follow that $\xi$, and thus $\zeta$, has infinitely many nontrivial zeros with real part in $[0,1]$, and in fact that the sum of their reciprocals' norms diverges.

Exercises:
1. Show that this last result does not hold for $\lambda = 0$.
2. Find an entire function $f(z)$ of order 1 such that $|f(z)| \le \exp O(|z|)$ but $\sum_{k=1}^\infty|z_k|^{-1} = \infty$. [Hint: you don't have to look very far.]
3. Supply the missing steps in our proof of the product formula.
4. Show that $1/\Gamma(s)$ is an entire function of order 1, using only the following tools available to Euler: the integral formulas for $\Gamma(s)$ and $B(s,s')$, and the identities $B(s,s') = \Gamma(s)\Gamma(s')/\Gamma(s+s')$ and $\Gamma(s)\Gamma(1-s) = \pi/\sin\pi s$. [The hard part is getting an upper bound for $1/|\Gamma(s)|$ on a vertical strip; remember how we showed that $\Gamma(s)\ne0$, and use the formula for $|\Gamma(1/2+it)|^2$ to get a better lower bound on $|\Gamma(s)|$.] Use this to recover the product formula for $\Gamma(s)$, up to a factor $e^{A+Bs}$ which may be determined from the behavior of $\Gamma(s)$ at $s = 0, 1$.
5. Prove that if $f(z)$ is an entire function of order $\lambda>0$ then
$$\iint_{|z|\le r}|f'(z)/f(z)|\,dx\,dy \ll_\epsilon r^{\lambda+1+\epsilon} \qquad (z = x+iy)$$
as $r\to\infty$. [Note that the integral is improper (except in the trivial case that $f$ has no zeros) but still converges: if $\phi$ is a meromorphic function on a region $U\subseteq\mathbb C$ with simple but no higher-order poles then $|\phi|$ is integrable on compact subsets $K\subset U$, even $K$ that contain poles of $\phi$.]
[Davenport 1967] Davenport, H.: Multiplicative Number Theory. Chicago:
Markham, 1967; New York: Springer-Verlag, 1980 (GTM 74). [9.67.6 & 9.80.6
/ QA 241.D32]

Math 259: Introduction to Analytic Number Theory
The product formula for $\xi(s)$ and $\zeta(s)$; vertical distribution of zeros

Behavior on vertical lines. From Stirling it follows that for fixed $\sigma\in\mathbb R$
$$\operatorname{Re}\log\Gamma(\sigma+it) = (\sigma-\tfrac12)\log|t| - \frac\pi2|t| + C + O_\sigma(|t|^{-1}).$$
[Check that for $\sigma = 0, 1/2$ this agrees with the exact formulas
$$|\Gamma(it)|^2 = \frac\pi{t\sinh\pi t},\qquad |\Gamma(1/2+it)|^2 = \frac\pi{\cosh\pi t}$$
obtained from the formula for $\Gamma(s)\Gamma(1-s)$.] For $\sigma>1$, the Euler product for $\zeta(s)$ shows that $\log|\zeta(\sigma+it)| \ll 1$; indeed we have the upper and lower bounds
$$\zeta(\sigma) \ge |\zeta(\sigma+it)| > \prod_p(1+p^{-\sigma})^{-1} = \zeta(2\sigma)/\zeta(\sigma).$$
Thus $|\xi(\sigma+it)|$ is within a constant factor of $|t|^{\sigma/2-1/2}e^{-\pi|t|/4}$ for large $|t|$. The functional equation $\xi(s) = \xi(1-s)$ then gives the same result for $|\xi(1-\sigma+it)|$, and we conclude for each $\sigma<0$ that $|\zeta(\sigma+it)|$ is within a constant factor of $|t|^{1/2-\sigma}$ for large $t$.
What about $|\zeta(\sigma+it)|$ for $\sigma\in[0,1]$, i.e. within the critical strip? Generalizing our formula for analytically continuing $\zeta(s)$ we find for $\sigma>0$
$$\zeta(s) = \sum_{n=1}^{N-1}n^{-s} + \frac{N^{1-s}}{s-1} + \sum_{n=N}^\infty\int_n^{n+1}(n^{-s}-x^{-s})\,dx,$$
which for large $t, N$ is $\ll N^{1-\sigma} + |t|N^{-\sigma}$, uniformly at least for $\sigma\ge1/2$. Taking $N = |t| + O(1)$ we find $\zeta(\sigma+it) \ll |t|^{1-\sigma}$ there, so by the functional equation also $\zeta(\sigma+it) \ll |t|^{1/2}$ for $\sigma>0$. In fact either the "approximate functional equation" for $\zeta(s)$ (usually attributed to Siegel, but now known to have been used by Riemann) or general convexity results (variations on the "Three Lines Theorem") tell us that $\zeta(\sigma+it) \ll_\epsilon |t|^{(1-\sigma)/2+\epsilon}$ for $\sigma\in[0,1]$. For our present purposes any bound $|t|^{O(1)}$ will do, but the Lindelöf conjecture asserts that in fact $\zeta(\sigma+it) \ll_\epsilon |t|^\epsilon$ for all $\sigma\ge1/2$ (excluding a neighborhood of the pole $s=1$), and thus by the functional equation that also $\zeta(\sigma+it) \ll_\epsilon |t|^{1/2-\sigma+\epsilon}$ for all $\sigma\le1/2$. We shall see that this is implied by the Riemann hypothesis.
However, the best upper bound currently proved on
$$\limsup_{t\to\infty}\frac{\log|\zeta(1/2+it)|}{\log t}$$
is only a bit smaller than $1/6$; when we get to exponential sums later this term we shall derive the upper bound of $1/6$.
A remark about our choice of $N \approx |t|$ in the bound $\zeta(\sigma+it) \ll N^{1-\sigma} + |t|N^{-\sigma}$: of course we wanted to choose $N$ to make the bound as good as possible, i.e. to minimize $N^{1-\sigma} + |t|N^{-\sigma}$. In calculus we learned to do this by setting the derivative equal to zero. That would give $N$ proportional to $|t|$, but we arbitrarily set the constant of proportionality to 1 even though another choice would make $N^{1-\sigma} + |t|N^{-\sigma}$ slightly smaller. In general when we bound some quantity by a sum $O(f(N)+g(N))$ of an increasing and a decreasing function of some parameter $N$, we shall simply choose $N$ so that $f(N) = g(N)$ (or, if $N$ is constrained to be an integer, so that $f(N), g(N)$ are nearly equal). This is much simpler and less error-prone than messing with derivatives, and is sure to give the minimum to within a factor of 2, which is good enough when we're dealing with $O(\cdot)$ bounds.
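A quick numeric illustration of this balancing rule, with hypothetical values $\sigma = 2/3$ and $|t| = 10^6$ chosen only for the demonstration:

```python
sigma, t = 2 / 3, 1e6

def bound(N):
    # the quantity N^(1-sigma) + |t| N^(-sigma) we want to minimize
    return N ** (1 - sigma) + t * N ** -sigma

# balancing rule: f(N) = g(N), i.e. N^(1-sigma) = t N^(-sigma), so N = t
N_rule = t
# true minimizer from calculus: N = (sigma/(1-sigma)) * t
N_calc = sigma / (1 - sigma) * t

print(bound(N_rule), bound(N_calc), bound(N_rule) / bound(N_calc))
# the ratio is at most 2, as promised
```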

Product and logarithmic-derivative formulas. At any rate, we know that $(s^2-s)\xi(s)$ is an entire function of order 1, bounded by $\exp O(|s|\log|s|)$ but not by $\exp O(|s|)$. Thus we have a product expansion
$$\xi(s) = \frac{e^{A+Bs}}{s^2-s}\prod_\rho(1-s/\rho)e^{s/\rho} \tag{$\Pi$}$$
for some constants $A, B$, with the product ranging over the zeros $\rho$ of $\xi$ (i.e. the nontrivial zeros of $\zeta$) listed with multiplicity.
Exercise: Show that we may take $A = 0$. Prove that
$$B = \frac{\xi'}\xi(0) = -\frac{\xi'}\xi(1) = \frac12\log4\pi - 1 - \frac\gamma2 = -0.0230957\ldots$$
Show also (starting by pairing the $\rho$ and $\bar\rho$ terms in the infinite product) that
$$B = -\sum_\rho\operatorname{Re}(\rho)/|\rho|^2,$$
and thus that $|\operatorname{Im}(\rho)| > 6$ for every nontrivial zero $\rho$ of $\zeta(s)$. [From [Davenport 1967], Chapter 12. It is known that in fact the smallest zeros have (real part 1/2 and) imaginary part $\pm14.134725\ldots$] Finally prove the alternative infinite product
$$\xi(s) = \frac{\xi(1/2)}{4(s-s^2)}\prod_{\operatorname{Im}\rho>0}\Bigl[1-\Bigl(\frac{s-1/2}{\rho-1/2}\Bigr)^2\Bigr],$$
the product extending over zeros $\rho$ of $\xi$ whose imaginary part is positive.
Moreover $\sum_\rho|\rho|^{-1-\epsilon} < \infty$ for all $\epsilon>0$, but $\sum_\rho|\rho|^{-1} = \infty$. The logarithmic derivative of $(\Pi)$ is
$$\frac{\xi'}\xi(s) = B - \frac1s - \frac1{s-1} + \sum_\rho\Bigl(\frac1{s-\rho}+\frac1\rho\Bigr); \tag{$\Pi'$}$$
since $\xi(s) = \pi^{-s/2}\Gamma(s/2)\zeta(s)$ we also get a product formula for $\zeta(s)$, and a partial-fraction expansion of its logarithmic derivative:
$$\frac{\zeta'}\zeta(s) = B - \frac1{s-1} + \frac12\log\pi - \frac12\frac{\Gamma'}\Gamma\Bigl(\frac s2+1\Bigr) + \sum_\rho\Bigl(\frac1{s-\rho}+\frac1\rho\Bigr). \tag{Z$'$}$$
(We have shifted from $\Gamma(s/2)$ to $\Gamma(s/2+1)$ to absorb the term $-1/s$; note that $\zeta(s)$ does not have a pole or zero at $s=0$.)
Vertical distribution of zeros. Since the zeros $\rho$ of $\xi(s)$ are confined to a strip, we can find much more precise information about the distribution of their sizes than the convergence and divergence of $\sum_\rho|\rho|^{-1-\epsilon}$ and $\sum_\rho|\rho|^{-1}$. Let $N(T)$ be the number of zeros in the rectangle $\sigma\in[0,1]$, $t\in[0,T]$ (this is essentially half of what we called $n(T)$ in the context of the general product formula). We shall prove a theorem of von Mangoldt: as $T\to\infty$,
$$N(T) = \frac T{2\pi}\log\frac T{2\pi} - \frac T{2\pi} + O(\log T). \tag{N}$$
We follow Chapter 15 of [Davenport 1967], keeping track of the fact that Davenport's $\xi$ and ours differ by a factor of $(s^2-s)/2$.
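The main term of von Mangoldt's formula can be compared against actual zero counts. A sketch: the number of zeros with $0<t<100$ is known (from zero tables, not computed here) to be 29, and the main term lands within the promised $O(\log T)$ of it:

```python
import math

def main_term(T):
    # (T/2pi) log(T/2pi) - T/2pi, the main term of (N)
    x = T / (2 * math.pi)
    return x * math.log(x) - x

T = 100.0
print(main_term(T))        # about 28.1
print(29 - main_term(T))   # well within O(log T)
```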
We may assume that $T$ does not equal the imaginary part of any zero of $\zeta(s)$. Then
$$2N(T) - 2 = \frac1{2\pi i}\oint_{C_R}\frac{\xi'}\xi(s)\,ds = \frac1{2\pi i}\oint_{C_R}d(\log\xi(s)) = \frac1{2\pi}\oint_{C_R}d(\operatorname{Im}\log\xi(s)),$$
where $C_R$ is the boundary of the rectangle $\sigma\in[-1,2]$, $t\in[-T,T]$. Since $\xi(s) = \xi(1-s) = \overline{\xi(\bar s)}$, we may by symmetry evaluate the last integral by integrating over a quarter of $C_R$ and multiplying by 4. We use the top right quarter, going from 2 to $2+iT$ to $1/2+iT$. At $s=2$, $\log\xi(s)$ is real, so we find
$$\pi(N(T)-1) = \operatorname{Im}\log\xi(\tfrac12+iT) = \operatorname{Im}\log\Gamma\bigl(\tfrac14+\tfrac{iT}2\bigr) - \frac T2\log\pi + \operatorname{Im}\log\zeta(\tfrac12+iT).$$
By Stirling, the first term is within $O(T^{-1})$ of
$$\operatorname{Im}\Bigl[\Bigl(\frac{iT}2-\frac14\Bigr)\log\Bigl(\frac{iT}2+\frac14\Bigr)-\frac{iT}2\Bigr] = \frac T2\log\Bigl|\frac{iT}2+\frac14\Bigr| - \frac14\operatorname{Im}\log\Bigl(\frac{iT}2+\frac14\Bigr) - \frac T2 = \frac T2\Bigl(\log\frac T2-1\Bigr) + O(1).$$
Thus (N) is equivalent to
$$\operatorname{Im}\log\zeta(\tfrac12+iT) \ll \log T. \tag{N$_1$}$$
We shall show that for $s = \sigma+it$ with $\sigma\in[-1,2]$, $|t|>1$ we have
$$\frac{\zeta'}\zeta(s) = \sum_{|\operatorname{Im}(s-\rho)|<1}\frac1{s-\rho} + O(\log|t|), \tag{N$_1'$}$$
and that the sum comprises at most $O(\log|t|)$ terms, from which our desired estimate will follow by integrating from $s = 2+iT$ to $s = 1/2+iT$. We start by taking $s = 2+it$ in (Z$'$). At that point the LHS is uniformly bounded (use the Euler product) and the RHS is
$$\sum_\rho\Bigl(\frac1{2+it-\rho}+\frac1\rho\Bigr) + O(\log|t|)$$
by Stirling. Thus the sum, and in particular its real part, is $O(\log|t|)$. But each summand has positive real part, which is at least $1/(4+(t-\operatorname{Im}\rho)^2)$. Our second claim, that $|t-\operatorname{Im}\rho|<1$ holds for at most $O(\log|t|)$ zeros $\rho$, follows immediately. It also follows that
$$\sum_{|\operatorname{Im}(s-\rho)|\ge1}\frac1{\operatorname{Im}(s-\rho)^2} \ll \log|t|.$$
Now by (Z$'$) we have
$$\frac{\zeta'}\zeta(s) - \frac{\zeta'}\zeta(2+it) = \sum_\rho\Bigl(\frac1{s-\rho}-\frac1{2+it-\rho}\Bigr) + O(1).$$
The LHS differs from that of (N$_1'$) by $O(1)$, as noted already; the RHS summed over zeros with $|\operatorname{Im}(s-\rho)|<1$ is within $O(\log|t|)$ of the RHS of (N$_1'$); and the remaining terms are
$$\ll \sum_{|\operatorname{Im}(s-\rho)|\ge1}\Bigl|\frac{2-\sigma}{(s-\rho)(2+it-\rho)}\Bigr| \ll \sum_{|\operatorname{Im}(s-\rho)|\ge1}\frac1{\operatorname{Im}(s-\rho)^2} \ll \log|t|.$$
This proves (N$_1'$) and thus also (N$_1$); von Mangoldt's theorem (N) follows.
For much more about the vertical distribution of the nontrivial zeros $\rho$ of $\zeta(s)$ see [Titchmarsh 1951], Chapter 9.
[Titchmarsh 1951] Titchmarsh, E.C.: The Theory of the Riemann Zeta-Function.
Oxford: Clarendon, 1951. [HA 9.51.14 / QA351.T49; 2nd ed. revised by D.R.
Heath-Brown 1986, QA246.T44]

Math 259: Introduction to Analytic Number Theory
A zero-free region for ζ(s)

We first show, as promised, that $\zeta(s)$ does not vanish on $\sigma=1$. As usual nowadays we give Mertens' elegant version of the original arguments of Hadamard and (independently) de la Vallée Poussin. Recall that
$$-\frac{\zeta'}\zeta(s) = \sum_{n=1}^\infty\frac{\Lambda(n)}{n^s}$$
has a simple pole at $s=1$ with residue $+1$. If $\zeta(s)$ were to vanish at some $1+it$ then $-\zeta'/\zeta$ would have a simple pole with residue $-1$ (or $-2, -3, \ldots$) there. The idea is that $\sum_n\Lambda(n)/n^s$ converges for $\sigma>1$, and as $\sigma\to1+$ all the terms contribute towards the positive-residue pole. As $s$ approaches $1+it$ from the right, the corresponding terms have the same magnitude but are multiplied by $n^{-it}$, so to get a pole with residue $-1$ "almost all" the phases $n^{-it}$ would have to be near $-1$. But then near $1+2it$ the phases $n^{-2it}$ would again approximate $(-1)^2 = +1$, yielding a pole of positive residue, which is not possible because $\zeta$ has no pole besides $s=1$.
To make precise the idea that if $n^{-it}\approx-1$ then $n^{-2it}\approx+1$, we use the identity
$$2(1+\cos\theta)^2 = 3 + 4\cos\theta + \cos2\theta,$$
from which it follows that the right-hand side is nonnegative. Thus with $\theta = t\log n$ we have
$$3 + 4\operatorname{Re}(n^{-it}) + \operatorname{Re}(n^{-2it}) \ge 0.$$
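The identity behind this inequality is a one-line trigonometric verification; a numeric sketch:

```python
import math

for k in range(1000):
    theta = 2 * math.pi * k / 1000
    lhs = 2 * (1 + math.cos(theta)) ** 2
    rhs = 3 + 4 * math.cos(theta) + math.cos(2 * theta)
    assert abs(lhs - rhs) < 1e-12   # the identity 2(1+cos t)^2 = 3+4cos t+cos 2t
    assert rhs >= -1e-12            # hence the nonnegativity we use
```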
Multiplying by $\Lambda(n)/n^\sigma$ and summing over $n$ we find
$$3\Bigl(-\frac{\zeta'}\zeta(\sigma)\Bigr) + 4\operatorname{Re}\Bigl(-\frac{\zeta'}\zeta(\sigma+it)\Bigr) + \operatorname{Re}\Bigl(-\frac{\zeta'}\zeta(\sigma+2it)\Bigr) \ge 0 \tag{M}$$
for all $\sigma>1$ and $t\in\mathbb R$. Fix $t\ne0$. As $\sigma\to1+$, the first term in the LHS of this inequality is $3/(\sigma-1)+O(1)$, and the remaining terms are bounded above (a zero of $\zeta$ at $1+it$ or $1+2it$ could only push them towards $-\infty$). If $\zeta$ had a zero of order $r>0$ at $1+it$, the second term would be $-4r/(\sigma-1)+O(1)$. Thus the inequality yields $4r\le3$. Since $r$ is a positive integer this is impossible, and the proof is complete.
Two remarks about this proof are in order. First, the only properties of $\Lambda(n)$ we used are the facts that $\Lambda(n)\ge0$ for all $n$ and that $\sum_n\Lambda(n)/n^s$ has an analytic continuation with a simple pole at $s=1$ and no other poles of real part $\ge1$. Thus the same argument exactly will show that $\prod_{\chi\bmod q}L(s,\chi)$, and thus each of the factors $L(s,\chi)$, has no zero on the line $\sigma=1$. Second, the $3+4\cos\theta+\cos2\theta$ trick is worth remembering, since it has been adapted to other uses. For instance we shall revisit and generalize it when we develop the Drinfeld--Vladut upper bounds on points of a curve over a finite field and the Odlyzko--Stark lower bounds on discriminants of number fields. See also the Exercise below.
Returning to (M), we next use it together with the partial-fraction formula
$$-\frac{\zeta'}\zeta(s) = \frac1{s-1} + B_1 + \frac12\frac{\Gamma'}\Gamma\Bigl(\frac s2+1\Bigr) - \sum_\rho\Bigl(\frac1{s-\rho}+\frac1\rho\Bigr)$$
to show that even the existence of a zero close to $1+it$ is not possible. How close depends on $t$; specifically, we shall show that there is a constant $c>0$ such that if $|t|\ge2$ and $\zeta(\beta+it)=0$ then¹
$$\beta < 1 - \frac c{\log|t|}. \tag{Z}$$
Now let $\sigma\in(1,2]$ and² $|t|\ge2$ in the partial-fraction formula. Then the $B_1$ and $\Gamma'/\Gamma$ terms are $O(\log|t|)$, and each of the terms $1/(s-\rho)$, $1/\rho$ has positive real part, as noted in connection with von Mangoldt's theorem on $N(T)$. Therefore³
$$\operatorname{Re}\Bigl(-\frac{\zeta'}\zeta(\sigma+2it)\Bigr) < O(\log|t|),$$
and if some $\rho = 1-\delta+it$ then
$$\operatorname{Re}\Bigl(-\frac{\zeta'}\zeta(\sigma+it)\Bigr) < O(\log|t|) - \frac1{\sigma-1+\delta}.$$
Thus (M) yields
$$\frac4{\sigma-1+\delta} < \frac3{\sigma-1} + O(\log|t|).$$
In particular, taking⁴ $\sigma = 1+4\delta$ yields $1/20\delta < O(\log|t|)$. Thus $\delta \gg (\log|t|)^{-1}$, and our claim (Z) follows.
Once we obtain the functional equation and partial-fraction decomposition for Dirichlet L-functions $L(s,\chi)$, the same argument will show that (Z) also gives a zero-free region for $L(s,\chi)$, though with the implied constant depending on $\chi$.
Exercise: Show that for each $\alpha>2$ there exists $t\in\mathbb R$ such that
$$\int_{-\infty}^\infty\exp(-|x|^\alpha+itx)\,dx < 0.$$
(Yes, this is related to the present topic; see [EOR 1991, p.633]. The integral is known to be positive for all $t\in\mathbb R$ when $\alpha\in[0,2]$; see for instance [EOR 1991, Lemma 5].)
For the zero-free region see for instance chapter 13 of Davenport's book [Davenport 1967] cited earlier.
[EOR 1991] Elkies, N.D., Odlyzko, A., Rush, J.A.: On the packing densities of super-
balls and other bodies, Invent. Math. 105 (1991), 613{639.
[Montgomery 1971] Montgomery, H.L.: Topics in Multiplicative Number Theory.
Berlin: Springer, 1971. [LNM 227 / QA3.L28 #227]
[Walfisz 1963] Walfisz, A.: Weylsche Exponentialsummen in der neueren Zahlentheorie. Berlin: Deutscher Verlag der Wissenschaften, 1963. [AB 9.63.5 / Sci 885.110(15,16)]
¹This classical bound has been improved; the current record of $1-\beta \gg \log^{-2/3-\epsilon}|t|$, due to Korobov and Vinogradov, has stood for 40 years. See [Walfisz 1963] or [Montgomery 1971, Chapter 11].
²Any lower bound $>1$ would do — and the only reason we cannot go lower is that our bounds are in terms of $\log|t|$, so we do not want to allow $\log|t|=0$.
³Note that we write $< O(\log|t|)$, not $= O(\log|t|)$, to allow the possibility of an arbitrarily large negative multiple of $\log|t|$.
⁴$\sigma = 1+a\delta$ will do for any $a>3$. This requires that $a\delta\le1$, e.g. $\delta\le1/4$ for our choice of $a=4$, else $\sigma>2$; but we're only concerned with $\delta$ near zero anyway.

Math 259: Introduction to Analytic Number Theory
Proof of the Prime Number Theorem

Recall our integral approximation
$$\psi(x) = \frac1{2\pi i}\int_{1+\frac1{\log x}-iT}^{1+\frac1{\log x}+iT}\Bigl(-\frac{\zeta'}\zeta(s)\Bigr)x^s\,\frac{ds}s + O\Bigl(\frac{x\log^2x}T\Bigr) \qquad (T\in[1,x]) \tag{$*$}$$
to $\psi(x)$. Assume that $T$ does not coincide with the imaginary part of any $\rho$.
Shifting the line of integration leftwards, say to real part $-1$, yields
$$\psi(x) - \Bigl(x-\sum_{|\operatorname{Im}\rho|<T}\frac{x^\rho}\rho\Bigr) = I_1 + I_2 - \frac{\zeta'}\zeta(0) + O\Bigl(\frac{x\log^2x}T\Bigr), \tag{$**$}$$
in which $I_1, I_2$ are the integrals of $-(\zeta'(s)/\zeta(s))\,x^s\,ds/s$ over the vertical line $\sigma=-1$, $|t|<T$ and the horizontal lines $\sigma\in[-1,1+1/\log x]$, $t=\pm T$ respectively. We next show that $I_1, I_2$ are small. The vertical integral $I_1$ is clearly
$$\ll \frac{\log T}x\sup_{|t|<T}\Bigl|\frac{\zeta'}\zeta(-1+it)\Bigr| \ll \frac{\log^2T}x.$$
The horizontal integrals in $I_2$ are
$$\ll \sup_{\sigma\in[-1,2]}\Bigl|\frac{\zeta'}\zeta(\sigma+iT)\Bigr|\cdot\frac1T\int_{-1}^{1+\frac1{\log x}}x^\sigma\,d\sigma.$$
The $\sigma$ integral is $\ll x/\log x$. We claim that by varying $T$ by $O(1)$ [any change in the error estimates that this causes is readily absorbed into the implied $O$-constants] we can make the sup $\ll \log^2T$. We have seen already that up to $O(\log T)$ we may replace $\zeta'(s)/\zeta(s)$ by $\sum_{|T-\operatorname{Im}\rho|<1}1/(s-\rho)$, a sum of $O(\log T)$ terms. Since the number of $\operatorname{Im}\rho$ in the interval $[T-1,T+1]$ is $\ll\log T$, some point in the middle half of that interval is at distance $\gg 1/\log T$ from all of them; choosing that as our new value of $T$, we see that each term is $\ll\log T$ and so the sum is $\ll\log^2T$. In conclusion, then, $I_2 \ll x\log^2T/(T\log x)$. Better estimates can be obtained, but are not necessary because this is already less than the error $O(x\log^2x/T)$ in $(*)$.
Thus the RHS of $(**)$ may be absorbed into the $O(x\log^2x/T)$ error. In the LHS, we use our zero-free region $\beta < 1-c/\log|t|$ to find that
$$|x^\rho| = x^{\operatorname{Re}\rho} \le x^{1-c/\log T} = x\exp\Bigl(-c\,\frac{\log x}{\log T}\Bigr).$$
Since
$$\sum_{|\operatorname{Im}\rho|<T}\frac1{|\rho|} < \sum_{0<\operatorname{Im}\rho<T}\frac2{\operatorname{Im}\rho} = 2\int_1^T\frac{dN(t)}t \ll \int_1^T\frac{\log t}t\,dt \ll \log^2T,$$
we thus have
$$\psi(x) - x \ll x\Bigl[\exp\Bigl(-c\,\frac{\log x}{\log T}\Bigr) + \frac1T\Bigr]\log^2x.$$
We choose $T$ so that the logarithms $-\log T$, $-c\log x/\log T$ of the two terms are equal (up to the constant $c$). That is, we take $T = \exp\sqrt{\log x}$. We then absorb the $\log^2x$ factor into the resulting estimate $\exp(-C\log^{1/2}x+O(1))$ by slightly decreasing $C$, and at last obtain the Prime Number Theorem with error estimate: there exists $C>0$ such that
$$\psi(x) = x + O(x\exp(-C\sqrt{\log x})).$$
The equivalent result for $\pi(x)$ follows by partial integration:
$$\pi(x) = \operatorname{li}(x) + O(x\exp(-C\sqrt{\log x})).$$
(Recall that $\operatorname{li}(x) = \int_2^x\frac{dy}{\log y} = \frac x{\log x} + O\bigl(\frac x{\log^2x}\bigr)$.)
Consequences of the Riemann Hypothesis (RH). Suppose now that RH holds. Then we may take $T = x^{1/2}$ in $(**)$ to find $\psi(x) = x + O(x^{1/2}\log^2x)$. More generally if it is known for some $\Theta\in[\frac12,1)$ that $\operatorname{Re}\rho\le\Theta$ for all zeros $\rho$ of $\zeta$ then the same argument yields $\psi(x) = x + O(x^\Theta\log^2x)$. The equivalent result for $\pi(x)$ is $\pi(x) = \operatorname{li}(x) + O(x^\Theta\log x)$. Note that since necessarily $\Theta\ge1/2$, the $O(x^{1/2})$ difference between $\psi(x)$ and $\sum_{p<x}\log p$ is absorbed into the error estimate.
But note that, under RH, that difference, which is $\psi(x) - \sum_{p<x}\log p \approx \psi(x^{1/2}) \asymp x^{1/2}$, is exactly of the same asymptotic order as the terms $x^\rho/\rho$ of $(**)$, and much larger than each single term because each $|\rho|^{-1} < 1/14$. Thus one expects $x$ to exceed $\sum_{p<x}\log p$ more often than not. Since $\sum_\rho 1/|\rho|$ diverges, it is possible for $-\sum_\rho x^\rho/\rho$ to exceed $x^{1/2}$, but not often — indeed it was thought that $\psi(x)$ might always be $<x$, and thus that $\pi(x) < \operatorname{li}(x)$, but Littlewood showed that the difference changes sign infinitely often in both cases. The first such sign change has yet to be found, though. The first explicit bound was the (in)famously astronomical "Skewes' number" [Skewes 1933]; that bound has since fallen, but still stands at several hundred digits, too large to reach directly even with the best algorithms known for computing $\pi(x)$ — algorithms that also depend on analytical formulas such as $(*)$; see [LO 1982].

A converse implication also holds: if for some $\Theta\in[\frac12,1)$ and all $\epsilon>0$ we have $\psi(x) = x + O_\epsilon(x^{\Theta+\epsilon})$, or equivalently $\pi(x) = \operatorname{li}(x) + O_\epsilon(x^{\Theta+\epsilon})$, then $\zeta(s)$ has no zeros of real part $>\Theta$. So, for instance, RH is equivalent to the assertion that $\pi(x) = \operatorname{li}x + O(x^{1/2}\log x)$. To see this, write $-\zeta'(s)/\zeta(s) = \sum_n\Lambda(n)n^{-s}$ as a Stieltjes integral and integrate by parts to find
$$-\frac{\zeta'}\zeta(s) = s\int_1^\infty\psi(x)x^{-s-1}\,dx = \frac s{s-1} + s\int_1^\infty(\psi(x)-x)\,x^{-s-1}\,dx \qquad (\sigma>1).$$
If $\psi(x)-x \ll x^{\Theta+\epsilon}$ then the resulting integral for $s/(s-1)+\zeta'(s)/\zeta(s)$ extends to an analytic function on $\sigma>\Theta$, whence that half-plane contains no zeros of $\zeta(s)$. Note the amusing consequence that an estimate $\psi(x) = x + O_\epsilon(x^{\Theta+\epsilon})$ would automatically improve to $\psi(x) = x + O(x^\Theta\log^2x)$, and similarly for $\pi(x)$.

Exercises:
1. Use the partial-fraction decomposition of $\zeta'/\zeta$ to get the following exact formula:
$$\psi(x) = x - \sum_\rho\frac{x^\rho}\rho - \frac{\zeta'}\zeta(0) - \frac12\log(1-x^{-2}).$$
Here $\psi(x)$ is interpreted as $\frac12\lim_{\epsilon\to0}(\psi(x-\epsilon)+\psi(x+\epsilon))$, and $\sum_\rho$ is taken to mean $\lim_{T\to\infty}\sum_{|\rho|<T}$. Note that the last term is the sum of $-x^r/r$ over the trivial zeros $r = -2, -4, -6, \ldots$ See [Davenport 1967, Chapter 17].
2. Show that the improved zero-free region $1-\beta < c/\log^{2/3+\epsilon}|t|$ yields an estimate $O(x\exp(-C_\epsilon\log^{3/5-\epsilon}x))$ on the error in the Prime Number Theorem.
[LO 1982] Lagarias, J.C., Odlyzko, A.M.: New algorithms for computing $\pi(x)$. Pages 176--193 in Number Theory: New York 1982 (D.V. and G.V. Chudnovsky, H. Cohn, M.B. Nathanson, eds.; Berlin: Springer 1984, LNM 1052).
[Skewes 1933] Skewes, S.: On the difference $\pi(x) - \operatorname{li}(x)$ (I). J. London Math. Soc. (1st ser.) 8 (1933), 277--283.

Math 259: Introduction to Analytic Number Theory
L(s, χ) as an entire function; Gauss sums

We first give, as promised, the analytic proof of the nonvanishing of $L(1,\chi)$ for a Dirichlet character $\chi$ mod $q$; this will complete our proof of Dirichlet's theorem that there are infinitely many primes in the arithmetic progression $\{qm+a : m\in\mathbb Z_{>0}\}$ whenever $(a,q)=1$, and that the logarithmic density of such primes is $1/\phi(q)$.¹
We follow [Serre 1973, Ch. VI §2]. Functions such as $\zeta(s)$, $L(s,\chi)$ and their products are special cases of what Serre calls "Dirichlet series": functions
$$f(s) := \sum_{n=1}^\infty a_ne^{-\lambda_ns} \tag{D}$$
with $a_n\in\mathbb C$ and $0\le\lambda_n<\lambda_{n+1}\to\infty$. [For instance $L(s,\chi)$ has $\lambda_n = \log n$ and $a_n = \chi(n)$.] We assume that the $a_n$ are small enough that the sum in (D) converges for some $s\in\mathbb C$. We are particularly interested in cases such as
$$\zeta_q(s) := \prod_{\chi\bmod q}L(s,\chi)$$
whose coefficients $a_n$ are nonnegative. Then if (D) converges at some real $\sigma_0$, it converges uniformly on $\sigma\ge\sigma_0$, and $f(s)$ is analytic on $\sigma>\sigma_0$. Thus a series (D) has a maximal open half-plane of convergence (if we agree to regard $\mathbb C$ itself as an open half-plane for this purpose), namely $\sigma>\sigma_0$ where $\sigma_0$ is the infimum of the real parts of $s\in\mathbb C$ at which (D) converges. This $\sigma_0$ is then called the "abscissa of convergence" of (D).
We claim that if $\sigma_0$ is finite then it is a singularity of $f$; equivalently, that if for some $\sigma_1\in\mathbb R$ the series (D) converges in $\sigma>\sigma_1$ and extends to an analytic function in a neighborhood of $\sigma_1$, then $\sigma_0<\sigma_1$. Since $f(s+\sigma_1)$ is again of the form (D) with nonnegative coefficients, it is enough to prove this claim for $\sigma_1=0$. Since $f$ is analytic in $\sigma>0$ and also in $|s|<\rho$ for some $\rho>0$, it is analytic in $|s-1|\le1+\delta$ for sufficiently small $\delta$, specifically for any $\delta<\sqrt{1+\rho^2}-1$. Expand $f$ in a Taylor series about $s=1$. Since (D) converges uniformly in a neighborhood of that point, it may be differentiated termwise, and we find that its $p$-th derivative there is
$$f^{(p)}(1) = \sum_{n=1}^\infty(-\lambda_n)^pa_ne^{-\lambda_n}.$$
¹Davenport gives a simpler proof of Dirichlet's theorem, also involving L-functions but not yet obtaining even the logarithmic density, in Chapter 4, attributing the basic idea to Landau 1905.

Thus taking $s = -\delta$ we obtain the convergent sum
$$f(-\delta) = \sum_{p=0}^\infty\frac{(-1-\delta)^p}{p!}f^{(p)}(1) = \sum_{p=0}^\infty\frac{(1+\delta)^p}{p!}\Bigl[\sum_{n=1}^\infty\lambda_n^pa_ne^{-\lambda_n}\Bigr].$$
But since all the terms in the inner sum are nonnegative, the sum converges absolutely and may be summed in the reverse order, yielding
$$f(-\delta) = \sum_{n=1}^\infty a_ne^{-\lambda_n}\Bigl[\sum_{p=0}^\infty\frac{((1+\delta)\lambda_n)^p}{p!}\Bigr].$$
But the new inner sum is just the Taylor series for $e^{(1+\delta)\lambda_n}$. So we have shown that (D) converges at $s=-\delta$, and thus that $\sigma_0\le-\delta<0=\sigma_1$, as claimed.
We can now prove:
Theorem. Let $\chi$ be a nontrivial character mod $q$. Then $L(1,\chi)\ne0$.
Proof: We know already that each $L(s,\chi)$ extends to a function on $\sigma>0$ analytic except for the simple pole of $L(s,\chi_0)$ at $s=1$. If any $L(s,\chi)$ vanished at $s=1$ then
$$\zeta_q(s) := \prod_{\chi\bmod q}L(s,\chi)$$
would extend to an analytic function on $\sigma>0$. But we observed already that $\zeta_q(s)$ is a Dirichlet series $\sum_na_nn^{-s}$ converging at least in $\sigma>1$ with $a_n\ge0$ for all $n$, and thus converging in $\sigma>0$. But we also have $a_n\ge1$ if $n = k^{\phi(q)}$ for some $k$ coprime to $q$. Thus $\sum_na_nn^{-\sigma}$ diverges for $\sigma\le1/\phi(q)$. This is a contradiction, and we are done.
So we have Dirichlet's theorem; but we want more than logarithmic density: we're after asymptotics of $\pi(x; a\bmod q)$, or equivalently of $\psi(x,\chi)$. As with the Prime Number Theorem, it will be enough to estimate
$$\psi(x,\chi) := \sum_{n<x}\Lambda(n)\chi(n),$$
for which we have an integral approximation
$$\psi(x,\chi) = \frac1{2\pi i}\int_{1+\frac1{\log x}-iT}^{1+\frac1{\log x}+iT}\Bigl(-\frac{L'}L(s,\chi)\Bigr)x^s\,\frac{ds}s + O\Bigl(\frac{x\log^2x}T\Bigr) \qquad (T\in[1,x]).$$
So we proceed to develop a partial-fraction decomposition for $L'/L$, which in turn requires us to prove an analytic continuation and functional equation for $L(s,\chi)$.

Our key tool in proving the functional equation for $\zeta(s)$ was the Poisson summation formula, which we got from the Fourier series of
$$F(x) := \sum_{m\in\mathbb Z}f(x+m)$$
by setting $x=0$. We now need this Fourier series
$$F(x) = \sum_{n\in\mathbb Z}\hat f(n)e^{-2\pi inx}$$
for fractional $x$. (As before, $\hat f(y)$ is the normalization
$$\hat f(y) = \int_{-\infty}^{+\infty}e^{2\pi ixy}f(x)\,dx \tag{F}$$
of the Fourier transform of $f$.) Let $\chi$ be a character mod $q$; evaluate $F(x)$ at $x = a/q$, multiply by $\chi(a)$, and sum over $a$ mod $q$ to obtain
$$\sum_{m=-\infty}^\infty\chi(m)f(m/q) = \sum_{n=-\infty}^\infty\Bigl[\sum_{a\bmod q}\chi(a)e^{-2\pi ina/q}\Bigr]\hat f(n). \tag{P$'$}$$
Consider the inner sum, which is $\tau_{-n}(\chi)$ in the notation
$$\tau_n(\chi) := \sum_{a\bmod q}\chi(a)e^{2\pi ina/q}.$$
Assume henceforth that $\chi$ is primitive. We then claim:
$$\tau_n(\chi) = \bar\chi(n)\tau_1(\chi). \tag{$\tau$}$$
Indeed, if $(n,q)=1$ then we may replace $a$ by $n^{-1}a$, from which $\tau_n(\chi) = \bar\chi(n)\tau_1(\chi)$ follows; note that this part did not require primitivity. If $(n,q)>1$ then $\bar\chi(n)=0$, so we want to show $\tau_n(\chi)=0$. Let $d = (n,q)$ and $q' = q/d$, and rearrange the $\tau_n(\chi)$ sum according to $a\bmod q'$:
$$\tau_n(\chi) = \sum_{a\bmod q}\chi(a)e^{2\pi ina/q} = \sum_{a'\bmod q'}e^{2\pi ina'/q}\Bigl[\sum_{\substack{a\bmod q\\ a\equiv a'\bmod q'}}\chi(a)\Bigr].$$
We claim that the inner sum vanishes. This is clear unless $(a',q')=1$. In that case the inner sum is
$$\chi(a_1)\sum_{\substack{b\bmod q\\ b\equiv1\bmod q'}}\chi(b)$$
for any $a_1\equiv a'\bmod q'$ coprime to $q$. But this last sum is the sum of a character on the group of units mod $q$ congruent to $1$ mod $q'$, and so vanishes unless that character is trivial — and if $\chi(b)=1$ whenever $b\equiv1\bmod q'$ then $\chi$ comes from a character mod $q'$ (why?) and is not primitive! This proves $(\tau)$. We generally abbreviate $\tau_1(\chi)$ as $\tau(\chi)$, and call that number
$$\tau(\chi) := \sum_{a\bmod q}\chi(a)e^{2\pi ia/q}$$
the Gauss sum of the character $\chi$. Then (P$'$) becomes
$$\sum_{m=-\infty}^\infty\chi(m)f(m/q) = \tau(\chi)\sum_{n=-\infty}^\infty\bar\chi(-n)\hat f(n). \tag{P$_\chi$}$$
Assuming that f is a function such that both f, f^ satisfy the Poisson hypotheses, we may apply this twice to find that

    (τ(χ) τ(χ̄) − χ(−1) q) Σ_{m∈Z} χ(m) f(m/q) = 0.

The sum does not vanish for every suitable f (e.g. f(x) = exp(−C(x − 1/q)²) for large C), so² τ(χ) τ(χ̄) = χ(−1) q. Moreover

    τ(χ̄) = Σ_{a mod q} χ̄(a) e^{2πia/q} = conj( Σ_{a mod q} χ(a) e^{−2πia/q} ) = χ(−1) conj(τ(χ)),

where we used (*) in the last step. Thus |τ(χ)|² = q. Moreover if χ is a real character then τ(χ) is either ±√q or ±i√q according as χ(−1) = +1 or −1, i.e. according as χ is even or odd. [This yields the remarkable fact that every quadratic number field is contained in a cyclotomic field, and is also the key ingredient in one beautiful proof of Quadratic Reciprocity.]
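These identities are easy to test numerically. The following sketch is our own illustration (not part of the notes; the helper names are ours): it checks (*), |τ(χ)|² = q, and the sign τ(χ) = i√q for the Legendre symbol mod 7, an odd real primitive character.

```python
import cmath

def legendre(a, q):
    """Quadratic-residue character mod an odd prime q (0 on multiples of q)."""
    a %= q
    if a == 0:
        return 0
    return 1 if pow(a, (q - 1) // 2, q) == 1 else -1

def tau_n(chi, q, n):
    """Twisted Gauss sum tau_n(chi) = sum_a chi(a) e^{2 pi i n a / q}."""
    return sum(chi(a) * cmath.exp(2j * cmath.pi * n * a / q) for a in range(q))

q = 7
chi = lambda a: legendre(a, q)
tau = tau_n(chi, q, 1)

# (*): tau_n(chi) = conj(chi)(n) tau(chi); chi is real, so conj(chi) = chi.
for n in range(q):
    assert abs(tau_n(chi, q, n) - chi(n) * tau) < 1e-9

assert abs(abs(tau) ** 2 - q) < 1e-9      # |tau(chi)|^2 = q
# q = 7 = 3 mod 4, so chi is odd; Gauss's sign determination gives tau = +i sqrt(7).
assert abs(tau - 1j * cmath.sqrt(q)) < 1e-9
```
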
At least for even χ we know what to do next: take f(x) = e^{−πu(qx)²} and Mellin-transform the resulting identity. This is actually easier than our proof of the ζ functional equation, because we don't need to split the integral in two. (Essentially this is because, unlike ζ, the L-function of a nontrivial primitive character has no poles.) Let³

    θ_χ(u) := Σ_{n∈Z} χ(n) e^{−πn²u}.

By (Pχ), together with the fact that the Fourier transform of e^{−πu(qx)²} is u^{−1/2} q^{−1} e^{−πu^{−1}(x/q)²}, we obtain

    θ_χ(u) = (τ(χ) / (q u^{1/2})) θ_χ̄(1/(q²u)).      (θχ)

² This is not an admirable way of proving that τ(χ) τ(χ̄) = χ(−1) q! We'll give a nicer proof below.
³ Our θ_χ(u) is called ψ(qu, χ) in [Davenport 1967, Ch. 9].
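The theta identity just stated can be sanity-checked numerically. This sketch is our own illustration (assuming nothing beyond the formula); we use the quadratic character mod 5, which is even, real, and primitive, so χ̄ = χ and both sides use the same theta function.

```python
import cmath, math

q = 5
def chi(a):
    """Quadratic-residue character mod 5."""
    a %= q
    return 0 if a == 0 else (1 if pow(a, (q - 1) // 2, q) == 1 else -1)

# Gauss sum tau(chi), computed directly; it comes out to +sqrt(5).
tau = sum(chi(a) * cmath.exp(2j * cmath.pi * a / q) for a in range(q))

def theta(u, nmax=60):
    """theta_chi(u) = sum over n in Z of chi(n) exp(-pi n^2 u)."""
    return sum(chi(n) * math.exp(-math.pi * n * n * u)
               for n in range(-nmax, nmax + 1))

u = 0.3
lhs = theta(u)
rhs = tau / (q * math.sqrt(u)) * theta(1 / (q * q * u))
assert abs(lhs - rhs) < 1e-10
```
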

Note that since χ(0) = 0 once q > 1, it follows that θ_χ(u) is rapidly decreasing as u → 0+. Now we have as before

    2 π^{−s/2} Γ(s/2) L(s,χ) = ∫_0^∞ θ_χ(u) u^{s/2} du/u.

This gives the analytic continuation of L(s,χ) to an entire function with zeros at the poles s = 0, −2, −4, −6, ... of Γ(s/2). Moreover by (θχ) the integral is

    (τ(χ)/q) ∫_0^∞ θ_χ̄(1/(q²u)) u^{(s−1)/2} du/u = (τ(χ)/q) ∫_0^∞ θ_χ̄(u) (q²u)^{(1−s)/2} du/u
                                                 = τ(χ) q^{−s} ∫_0^∞ θ_χ̄(u) u^{(1−s)/2} du/u.

This last integral is 2 π^{(s−1)/2} Γ((1−s)/2) L(1−s, χ̄) for σ < 0, and thus by analytic continuation for all s ∈ C. We can write the functional equation symmetrically by setting

    ξ(s,χ) := (π/q)^{−s/2} Γ(s/2) L(s,χ),

which is now an entire function: ξ(s,χ) is related with ξ(1−s,χ̄) by

    ξ(s,χ) = (τ(χ)/√q) ξ(1−s, χ̄).      (L+)

What about odd χ? The same definition of θ would yield zero. We already indicated (in the exercises on the functional equation for ζ and θ) the way around this problem: we apply the χ-twisted Poisson summation formula (Pχ) not to the Gaussian e^{−πu(qx)²} but to its derivative, which is proportional to x e^{−πu(qx)²}. Using the general fact that the Fourier transform of f′ is −2πiy f^(y) (integrate by parts in the definition (F) of (f′)^), we see that the Fourier transform of x e^{−πu(qx)²} is (iy/(u^{1/2}q)³) e^{−πu^{−1}(y/q)²}. So, if we define⁴

    θ¹_χ(u) := Σ_{n∈Z} n χ(n) e^{−πn²u},

we obtain

    θ¹_χ(u) = (τ(χ)/(i q² u^{3/2})) θ¹_χ̄(1/(q²u)).      (θ¹χ)

This time we must multiply θ¹_χ by u^{(s+1)/2} du/u to cancel the extra factor of n, and so obtain the integral formula

    2 π^{−(s+1)/2} Γ((s+1)/2) L(s,χ) = ∫_0^∞ θ¹_χ(u) u^{(s+1)/2} du/u

⁴ Our θ¹_χ(u) is Davenport's ψ_1(qu, χ).

for L(s,χ). Again (θ¹χ) together with χ(0) = 0 tells us that θ¹_χ(u) vanishes rapidly as u → 0+, and thus that our integral extends to an entire function of s; note however that the resulting trivial zeros of L(s,χ) are at the negative odd integers. The functional equation (θ¹χ) again gives us a relation between L(s,χ) and L(1−s, χ̄), which this time has the symmetrical form

    ξ(s,χ) = (τ(χ)/(i√q)) ξ(1−s, χ̄),      (L−)

    ξ(s,χ) := (π/q)^{−(s+1)/2} Γ((s+1)/2) L(s,χ).

[Exercise: Complete the missing steps in the proof of (L−).]
We may combine (L+) with (L−) by introducing an integer depending on χ:

    a_χ := { 0, if χ(−1) = +1;
             1, if χ(−1) = −1.

That is, a_χ = 0 or 1 according as χ is even or odd. Then

    ξ(s,χ) := (π/q)^{−(s+a_χ)/2} Γ((s+a_χ)/2) L(s,χ)

satisfies the functional equation

    ξ(s,χ) = (τ(χ)/(i^{a_χ} √q)) ξ(1−s, χ̄).

This even works for the case q = 1 corresponding to ζ(s)... As in that case, we conclude that ξ(s,χ) is an entire function of order 1; we'll develop its product formula and deduce the asymptotics of ψ(x,χ) in the next lecture notes.
Meanwhile, we prove another formula involving Gauss sums, namely their relation with the Jacobi sums

    J(χ, χ′) := Σ_{c mod q} χ(c) χ′(1−c)

under the assumption that q is prime. Clearly J(χ_0, χ_0) = q − 2, so we henceforth assume that χ, χ′ are not both trivial. Note that unlike τ(χ), which may involve both q-th and (q−1)st roots of unity, the Jacobi sum J(χ, χ′) involves only (q−1)st roots. Nevertheless it can be evaluated in terms of Gauss sums, starting from a consideration of the double sum

    τ(χ) τ(χ′) = Σ_{a, a′ mod q} χ(a) χ′(a′) e^{2πi(a+a′)/q},

as suggested by our evaluation of B(s, s′) in terms of Gamma functions.
Let b = a + a′. The terms with b = 0 sum to

    Σ_{a mod q} χ(a) χ′(−a) = χ′(−1) Σ_{a mod q} (χχ′)(a) = { χ(−1)(q−1), if χ′ = χ̄;
                                                             0,           otherwise.

To sum the terms for fixed nonzero b, let a = cb and a′ = (1−c)b to find

    Σ_{c mod q} e^{2πib/q} (χχ′)(b) χ(c) χ′(1−c) = e^{2πib/q} (χχ′)(b) J(χ, χ′).
Summing e^{2πib/q} (χχ′)(b) over nonzero b gives −1 when χχ′ = χ_0, and τ(χχ′) otherwise. Thus if χ′ = χ̄ we have

    τ(χ) τ(χ′) = χ(−1)(q−1) − J(χ, χ′)

and thus J(χ, χ̄) = −χ(−1), a fact that can also be obtained directly from χ(c) χ̄(1−c) = χ̄(c^{−1} − 1). [This in turn yields an alternative proof of |τ(χ)| = √q in the prime case.] Otherwise we find

    J(χ, χ′) = τ(χ) τ(χ′) / τ(χχ′).      (J)

In particular it follows that |J(χ, χ′)| = √q if each of χ, χ′, χχ′ is nontrivial.
The formula (J) is the beginning of a long and intricate chapter of the arithmetic of cyclotomic number fields; it can also be used to count points on Fermat curves mod q, showing for instance that if q ≡ 1 mod 3 then the number of c ≠ 0, 1 in Z/q such that both c and 1−c are cubes is q/9 + O(√q).
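Both facts are easy to test in a small case. The following sketch is our own illustration (the helper names are ours): for q = 13 it builds a cubic character from a primitive root, checks (J) and |J| = √q, and counts the c with c and 1−c both cubes, which for so small a q is consistent with q/9 + O(√q).

```python
import cmath

q = 13          # prime, q = 1 mod 3
g = 2           # a primitive root mod 13
ind = {pow(g, k, q): k for k in range(q - 1)}   # discrete log base g

def make_char(j):
    """Character chi(g^k) = e^{2 pi i j k / (q-1)}; j = 4 gives order 3."""
    def chi(a):
        a %= q
        return 0 if a == 0 else cmath.exp(2j * cmath.pi * j * ind[a] / (q - 1))
    return chi

def tau(chi):
    return sum(chi(a) * cmath.exp(2j * cmath.pi * a / q) for a in range(1, q))

def jacobi(chi1, chi2):
    return sum(chi1(c) * chi2(1 - c) for c in range(q))

chi1, chi2, chi12 = make_char(4), make_char(4), make_char(8)   # chi12 = chi1*chi2
J = jacobi(chi1, chi2)

assert abs(J - tau(chi1) * tau(chi2) / tau(chi12)) < 1e-9   # (J)
assert abs(abs(J) - q ** 0.5) < 1e-9                        # |J| = sqrt(q)

# Count c != 0, 1 with both c and 1-c nonzero cubes mod 13.
cubes = {pow(x, 3, q) for x in range(1, q)}     # = {1, 5, 8, 12}
count = sum(1 for c in range(2, q) if c in cubes and (1 - c) % q in cubes)
```

For q = 13 the count happens to be 0, which is within O(√q) of q/9 ≈ 1.4; larger primes q ≡ 1 mod 3 show the main term emerging.
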
What of our promised Poisson-free proof of |τ(χ)| = √q? Well, our formula for τ_n(χ) states in effect that τ(χ) χ̄ is the discrete Fourier transform of χ. It follows from Parseval that

    Σ_{a mod q} |τ(χ) χ̄(a)|² = q Σ_{a mod q} |χ(a)|².

But the LHS is |τ(χ)|² φ(q), and the RHS is q φ(q), so we're done.
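The Parseval argument can be replayed numerically. This is our own illustration (names ours): we compute all the twisted Gauss sums τ_n(χ) for the quadratic character mod 11, verify the unnormalized-DFT Parseval identity, and read off |τ(χ)|² = q.

```python
import cmath

q = 11
def chi(a):
    """Quadratic-residue character mod 11."""
    a %= q
    return 0 if a == 0 else (1 if pow(a, (q - 1) // 2, q) == 1 else -1)

# tau_n(chi) for all n: the discrete Fourier transform of chi on Z/q.
taus = [sum(chi(a) * cmath.exp(2j * cmath.pi * n * a / q) for a in range(q))
        for n in range(q)]

# Parseval for the unnormalized DFT: sum_n |chi-hat(n)|^2 = q sum_a |chi(a)|^2.
lhs = sum(abs(t) ** 2 for t in taus)
rhs = q * sum(abs(chi(a)) ** 2 for a in range(q))   # = q * phi(q)
assert abs(lhs - rhs) < 1e-6
# Since tau_n = conj(chi)(n) tau_1, the identity forces |tau_1|^2 = q.
assert abs(abs(taus[1]) ** 2 - q) < 1e-9
```
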
Further Exercises:

1. Consider a series (D) in which the a_n need not be positive reals. Of course this series still has an abscissa of absolute convergence. Less obvious, but still true, is that it also has an abscissa of ordinary convergence. Show that if the sum (D) converges in the usual sense of lim_{N→∞} Σ_{n=1}^{N} at some s_0 then it converges also in σ > Re(s_0), the convergence being uniform in |arg(s − s_0)| ≤ α for each α < π/2. Deduce that (D) defines an analytic function on σ > Re(s_0). (Since f(s − s_0) is again of the form (D) it is enough to prove this claim for s_0 = 0. Assume then that Σ_{n=1}^{∞} a_n converges, and let A(x) = −Σ_{n>x} a_n → 0 as x → ∞; for large M, N write

    Σ_{n=M}^{N} a_n n^{−s} = ∫_M^N x^{−s} dA(x),

etc. This is equivalent to the route taken by Serre, but probably more transparent to us.)

2. Suppose χ is a real character. Then (Lχ) relates ξ(s,χ) with ξ(1−s,χ). Deduce that L(s,χ) has a zero of even or odd multiplicity at s = 1/2 according as τ(χ) = +i^{a_χ} √q or τ(χ) = −i^{a_χ} √q. In particular, in the minus case L(1/2,χ) = 0.

[But it is known that in fact the minus case never occurs, a fact first proved by Gauss after much work. (Davenport proves this in the special case of prime q in Chapter 2, using a later method of Dirichlet that relies on Poisson summation.) It follows that the order of vanishing of L(s,χ) at s = 1/2 is even; it is conjectured, but not proved, that in fact L(1/2,χ) > 0 for all Dirichlet characters χ. More complicated number fields are known whose zeta functions do vanish at s = 1/2.]
3. Obtain a formula for the generalized Jacobi sum

    J(χ_1, ..., χ_n) := Σ_{a_1+···+a_n ≡ 1 mod q} χ_1(a_1) ··· χ_n(a_n)

under suitable hypotheses on the χ_i. What is the analogous formula for definite integrals?

4. Let χ be the Legendre symbol modulo an odd prime q. Evaluate τ(χ)^n in two ways to count the number of solutions mod q of x_1² + ··· + x_n² = 1.

5. Can you find a Gauss-sum analog of the duplication formula for the Gamma function?
[Davenport 1967] Davenport, H.: Multiplicative Number Theory. Chicago:
Markham, 1967; New York: Springer-Verlag, 1980 (GTM 74). [9.67.6 & 9.80.6
/ QA 241.D32]
[Serre 1973] Serre, J.-P.: A Course in Arithmetic. New York: Springer, 1973
(GTM 7). [AB 9.70.4 (reserve case) / QA243.S4713]

Math 259: Introduction to Analytic Number Theory
The asymptotic formula for primes in arithmetic progressions

Now that we have the functional equation for L(s,χ), the asymptotics for ψ(x,χ), and thus also for ψ(x; a mod q) and π(x; a mod q), follow just as they did for ψ(x) and π(x), at least if we are not very concerned with how the implied constants depend on q. Again all this is found in [Davenport 1967].
Let χ be a primitive character mod q > 1. We readily adapt our argument showing that (s² − s) ξ(s) is an entire function of order 1 to show that ξ(s,χ) is an entire function of order 1, and thus has a Hadamard product

    ξ(s,χ) = ξ(0,χ) e^{B_χ s} Π_ρ (1 − s/ρ) e^{s/ρ},      (Π)

the product ranging over the zeros ρ of ξ(s,χ) counted with multiplicity, which are just the zeros of L(s,χ) with real part in [0,1]. Thus

    (ξ′/ξ)(s,χ) = B_χ + Σ_ρ [ 1/(s−ρ) + 1/ρ ].      (Π′)

Note that B_χ depends on χ; see Exercise 1 below. Fortunately it will usually cancel out from our formulas. It follows that

    (L′/L)(s,χ) = B_χ − (1/2) log(q/π) − (1/2) (Γ′/Γ)((s+a_χ)/2) + Σ_ρ [ 1/(s−ρ) + 1/ρ ].      (L′)

How are these zeros ρ distributed? We noted already that their real parts lie in [0,1]. If L(ρ,χ) = 0 then by the functional equation 0 = L(1−ρ, χ̄); also 0 = conj(L(ρ,χ)) = L(ρ̄, χ̄). Thus the zeros are symmetrical about the line σ = 1/2, but not (unless χ is real) about the real axis. So the proper analog of N(T) is half of N(T,χ), where N(T,χ) is defined as the number of zeros of L(s,χ) with σ ∈ (0,1), |t| < T, counted with multiplicity. [NB this excludes the trivial zero at s = 0 which occurs for even χ.] Again we evaluate this by integrating ξ′/ξ around a rectangle, finding that for (say) T ≥ 2

    (1/2) N(T,χ) = (T/2π) log(qT/2π) − T/2π + O(log qT).      (N)

[Here the extra term (T/2π) log q enters via the new factor q^{s/2} in ξ(s,χ). That factor is also responsible for the new term −(1/2) log(q/π) in (L′), and thus forces us to subtract O(log q) from our lower bound on the real part of (L′/L)(s,χ), finding

    (L′/L)(s,χ) = Σ_{|Im(s−ρ)|<1} 1/(s−ρ) + O(log|qt|)

(σ ∈ [−1,2]), the sum comprising O(log|qt|) terms, with log|qt| in place of log|t|; that is the source of the error O(log qT) instead of O(log T) in (N).]
To isolate the primes in arithmetic progressions mod q, we need also nonprimitive characters, such as χ_0. Let χ_1 be the primitive character mod q_1 | q underlying a nonprimitive χ mod q. Then

    L(s,χ) = Π_{p|q, p∤q_1} (1 − χ_1(p) p^{−s}) · L(s,χ_1).

The elementary factor Π_{p|q} (···) has, for each p dividing q but not q_1, (T/π) log p + O(1) purely imaginary zeros of absolute value < T. It follows that the RHS of (N) is an upper bound on (1/2) N(T,χ) even when χ is nonprimitive.
The horizontal distribution of ρ is subtler. We noted already that the logarithmic derivative of

    ζ_q(s) := Π_{χ mod q} L(s,χ)

is a Dirichlet series −Σ_n Λ_q(n) n^{−s} with Λ_q(n) ≥ 0 for all n, and thus that the 3 + 4 cos θ + cos 2θ trick shows that ζ_q, and thus each factor L(s,χ), does not vanish at s = 1 + it. We can then adapt the proof of the classical zero-free region for ζ(s); since, however, ζ_q(s) is the product of φ(q) L-series, each of which contributes O(log|qt|) to the bound on (ζ_q′/ζ_q)(σ + it), the resulting zero-free region is not 1 − σ < c/log|t| or even 1 − σ < c/log|qt| but 1 − σ < c/(φ(q) log|qt|). Moreover, the fact that this only holds for say |t| > 2 is newly pertinent: unlike ζ(s), the L-series might have zeros of small imaginary part. [Indeed it is known that there are Dirichlet characters whose L-series vanish arbitrarily close to the real axis.] Still, for every q there are only finitely many zeros with |t| ≤ 2. So our formula

    ψ(x,χ) = (1/2πi) ∫_{1+1/log x − iT}^{1+1/log x + iT} [ −(L′/L)(s,χ) ] x^s ds/s + O( x log²x / T )      (T ∈ [1, x])

yields an estimate as before, with only the difference that when χ ≠ χ_0 there is no "main term" coming from a pole at s = 1. We thus find

    ψ(x,χ) ≪ x exp(−C_q √(log x))      (ψχ)

for some constant C_q > 0. Multiplying by χ̄(a) and averaging over χ (including in the average χ_0, for which ψ(x,χ_0) = x + O(x exp(−C_q √(log x))) instead of (ψχ)), we obtain

    ψ(x; a mod q) = x/φ(q) + O_q( x exp(−C_q √(log x)) ),      (ψq)

and thus

    π(x; a mod q) = li(x)/φ(q) + O_q( x exp(−C_q √(log x)) ).      (πq)
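The resulting equidistribution of primes among the invertible residue classes is easy to see numerically even for modest x. The following quick count (our own illustration, with a crude sieve) splits the primes up to 10^5 among the classes mod 5; the four counts agree to within a fraction of a percent.

```python
def primes_upto(n):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [p for p in range(n + 1) if sieve[p]]

x, q = 10 ** 5, 5
counts = {a: 0 for a in range(1, q)}
for p in primes_upto(x):
    if p % q:                      # skip p = 5 itself
        counts[p % q] += 1

total = sum(counts.values())
# pi(x; a mod 5) should be ~ pi(x)/phi(5) = pi(x)/4 for each a in (Z/5)*.
for c in counts.values():
    assert abs(c - total / 4) < 0.02 * total
```
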

Note however that the dependence of the error terms on q is unpredictable. The zero-free region depends explicitly on q (though as we shall see it need not shrink nearly as fast as 1/(φ(q) log q), a factor which alone would make C_q proportional to (φ(q) log q)^{−1/2}), but it excludes the neighborhood of the real axis; it would then seem that to specify C_q and the implied constant we would have to compute for each χ the largest Re(ρ). There's also the matter of the contribution of the B_χ's from (L′).
Consider, by comparison, the consequences of the Extended Riemann Hypothesis (ERH), i.e. the conjecture that each nontrivial zero ρ of an L-series associated to a primitive Dirichlet character χ has real part 1/2.¹ Our analysis of ψ(x) under RH then carries over almost verbatim to show that ψ(x,χ) ≪ x^{1/2} log²x as long as q < x, with an absolute and effective implied constant, and thus that

    ψ(x; a mod q) = x/φ(q) + O(x^{1/2} log²x),      (?)

again with the O-constant effective and independent of q. It would also follow that

    π(x; a mod q) = li(x)/φ(q) + O(x^{1/2} log x),      (??)

with similar comments as before about the effect of the difference between ψ(x) and Σ_{p<x} log p [RS 1994].

1. Show that the real part of the term B_χ of (Π′) is −Σ_ρ Re(1/ρ). Conclude that Re(B_χ) < 0. [Davenport 1967, page 85.]
2. Complete the missing steps in the proof of (N).
3. Complete the missing steps in the proofs of (ψχ), (ψq), and (πq).
4. Verify that under ERH the O-constant in (?) does not depend on q. Obtain an analogous estimate on the weaker assumption that ζ_q has no zeros of real part > σ_0 for some σ_0 ∈ (1/2, 1). Show that if for some σ we have π(x; a mod q) = li(x)/φ(q) + O(x^σ) for all a ∈ (Z/q)* then all the L(s,χ) for Dirichlet characters χ mod q are nonzero on Re(s) > σ.
[RS 1994] Rubinstein, M., Sarnak, P.: Chebyshev's Bias. Experimental Mathematics 3 (1994) #3, 173-197.

1 Attributed by Davenport to \Piltz in 1884" (page 129).

Math 259: Introduction to Analytic Number Theory
A nearly zero-free region for L(s,χ), and Siegel's theorem

We used positivity of the logarithmic derivative of ζ_q to get a crude zero-free region for L(s,χ). But better zero-free regions can be obtained with some more work by working with the L(s,χ) individually. The situation is most satisfactory for complex χ, i.e. for characters with χ² ≠ χ_0. (NB real χ were also the characters that gave us the most trouble in the proof of L(1,χ) ≠ 0; indeed the trouble for such characters is again in the neighborhood of s = 1.) Recall that for the ζ(s) zero-free region we started with the expansion of the logarithmic derivative

    −(ζ′/ζ)(s) = Σ_{n=1}^{∞} Λ(n) n^{−s}      (σ > 1)

and applied to the phases z = n^{−it} = e^{iθ} of the terms n^{−s} the inequality

    0 ≤ (1/2)(z + 2 + z̄)² = Re(z² + 4z + 3) = 3 + 4 cos θ + cos 2θ.
If we apply the same inequality termwise to

    −(L′/L)(s,χ) = Σ_{n=1}^{∞} Λ(n) χ(n) n^{−s}

then we must use z = χ(n) n^{−it} instead of n^{−it}, and obtain

    0 ≤ 3 + 4 Re(χ(n) n^{−it}) + Re(χ²(n) n^{−2it}).

Multiplying by Λ(n) n^{−σ} and summing over n yields

    0 ≤ 3 [ −(L′/L)(σ, χ_0) ] + 4 Re[ −(L′/L)(σ+it, χ) ] + Re[ −(L′/L)(σ+2it, χ²) ].      (**)
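The pointwise inequality behind this averaging is elementary but worth a sanity check; the following sketch (our own illustration) verifies both the nonnegativity and the underlying identity 3 + 4 cos t + cos 2t = 2(1 + cos t)².

```python
import math

max_gap = 0.0              # largest deviation from the closed-form identity
min_val = float("inf")     # smallest value of 3 + 4 cos t + cos 2t seen
for k in range(2000):
    t = 2 * math.pi * k / 2000
    v = 3 + 4 * math.cos(t) + math.cos(2 * t)
    min_val = min(min_val, v)
    max_gap = max(max_gap, abs(v - 2 * (1 + math.cos(t)) ** 2))

assert min_val >= -1e-12   # nonnegative
assert max_gap < 1e-9      # equals 2 (1 + cos t)^2
```
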

Now we see why the case χ² = χ_0 will give us trouble near s = 1: for such χ the last term in (**) is essentially (ζ′/ζ)(σ + 2it), whose pole at s = 1 will undo us for small t. Let us see how far (**) does take us. (See for instance Chapter 14 of Davenport.) The first term is

    3 [ −(L′/L)(σ, χ_0) ] ≤ 3 [ −(ζ′/ζ)(σ) ] < 3/(σ−1) + O(1).
For the remaining terms, we use the partial-fraction expansion

    −(L′/L)(s,χ) = (1/2) log(q/π) + (1/2) (Γ′/Γ)((s+a_χ)/2) − B_χ − Σ_ρ [ 1/(s−ρ) + 1/ρ ].

To eliminate the contributions of B_χ and Σ_ρ 1/ρ we evaluate this at s = 2 and subtract. Since (L′/L)(2,χ) is bounded, we get

    −(L′/L)(s,χ) = (1/2) (Γ′/Γ)((s+a_χ)/2) − Σ_ρ [ 1/(s−ρ) − 1/(2−ρ) ] + O(1).

Next take real parts. For ρ of real part in [0,1] we have Re(1/(2−ρ)) ≤ 2 |2−ρ|^{−2}. To estimate the sum of this over all ρ, we may apply Jensen's theorem to ξ(2+s, χ), finding that the number (with multiplicity) of ρ at distance at most r from 2 is O(r log qr), and thus that Σ_ρ |2−ρ|^{−2} ≪ log q. We estimate the real part of the Γ′/Γ term by Stirling as usual, and find

    Re[ −(L′/L)(s,χ) ] < O(log q(|t|+2)) − Σ_ρ Re(1/(s−ρ)) = O(ℒ) − Σ_ρ Re(1/(s−ρ)),

where we have introduced the convenient abbreviation¹

    ℒ := log q(|t|+2).
Again each of the Re(1/(s−ρ)) is nonnegative, so the estimate remains true if we only include some or none of the ρ. In particular it follows that

    Re[ −(L′/L)(σ+2it, χ²) ] < O(ℒ),      (L2)

at least when χ² is a primitive character. If χ² is not primitive, but still not = χ_0, then (L2) holds when χ² is replaced by its corresponding primitive character; but the error thus introduced is at most

    Σ_{p|q} (p^{−σ} log p)/(1 − p^{−σ}) < Σ_{p|q} log p ≤ log q < ℒ,

so can be absorbed into the O(ℒ) error. But when χ² = χ_0 the partial-fraction expansion of its −L′/L has a term +1/(s−1) which cannot be discarded, and can only be absorbed into O(ℒ) if s is far enough from 1. We thus conclude that (L2) holds unless χ² = χ_0 and |t| < c/log q, the implied constant in (L2) depending on c (or we can change O(ℒ) to 1/|t| + O(ℒ) when χ² = χ_0).
The endgame is the same as we have seen for the classical zero-free region for ζ(s): if there is a zero ρ = 1 − δ + it with δ small, use its imaginary part t in (**) and find from the partial-fraction expansion that

    Re[ −(L′/L)(σ+it, χ) ] < O(ℒ) − 1/(σ − Re(ρ)) = O(ℒ) − 1/(σ − 1 + δ).

Combining this with our previous estimates yields

    4/(σ − 1 + δ) < 3/(σ − 1) + O(ℒ);

choosing σ = 1 + 4δ as before yields 1/δ ≪ ℒ under the hypotheses of (L2), completing the proof of a zero-free region with the possible exception of real χ and zeros of imaginary part ≪ 1/log q.
1 The usual symbol for this, seen in Davenport and elsewhere, is a calligraphic script L, which alas
we do not seem to have readily available here.

Next suppose that χ is a real character, and fix some ε > 0. We have a zero-free region for |t| > ε/log q. To deal with zeros of small imaginary part, let s = σ in (**), or, more simply, in the analogous inequality resulting from 1 + cos θ ≥ 0 (i.e. positivity of −ζ_K′/ζ_K, with ζ_K the zeta function of the quadratic number field corresponding to χ), to find

    Σ_{|Im(ρ)| < ε/log q} Re(1/(σ−ρ)) < 1/(σ−1) + O(log q),

the implied O-constant not depending on ε. Note that the LHS is real since the ρ's come in complex conjugate pairs. Now Re(1/(σ−ρ)) = Re(σ−ρ)/|σ−ρ|². Choosing σ = 1 + 2ε/log q we find that |Im(ρ)| < (1/2)(σ−1) < (1/2) Re(σ−ρ), and thus that |σ−ρ|² < (5/4) Re(σ−ρ)². Therefore Re(1/(σ−ρ)) > (4/5)/Re(σ−ρ). So,

    (4/5) Σ_{|Im(ρ)| < ε/log q} [ 1 − Re(ρ) + 2ε/log q ]^{−1} < (log q)/(2ε) + O(log q).

Thus if c is small enough we conclude that at most one ρ can have real part > 1 − c/log q. Since ρ's are counted with multiplicity and come in complex conjugate pairs, it follows that this exceptional zero, if it exists, is real and simple. It is usually denoted by β.
In fact² such a β may occur for at most one character mod q. Since χ need not be primitive, it follows that in fact it cannot occur for different q if we set the threshold low enough: there is a constant c > 0 such that for any two distinct real characters χ_1, χ_2 to (not necessarily distinct) moduli q_1, q_2 at most one of their L-functions has an exceptional zero > 1 − c/log(q_1 q_2). The point is that χ_1 χ_2 is also a nontrivial Dirichlet character, so −(L′/L)(σ, χ_1 χ_2) < O(log q_1 q_2) for σ > 1. The sum of the negative logarithmic derivatives of ζ(s), L(s,χ_1), L(s,χ_2), and L(s,χ_1χ_2) is the Dirichlet series

    Σ_{n=1}^{∞} (1 + χ_1(n))(1 + χ_2(n)) Λ(n) n^{−s},

which is thus positive for real s > 1. Arguing as before we find that if β_i are exceptional zeros of L(s,χ_i) then

    1/(σ − β_1) + 1/(σ − β_2) < 1/(σ − 1) + O(log q_1 q_2);

if β_i > 1 − δ then we may take σ = 1 + 2δ to find 1/(6δ) < O(log q_1 q_2), whence 1/δ ≪ log q_1 q_2 as claimed.
In particular, given q at most one real character mod q has an L-series with an exceptional zero β > 1 − c/log q. A typical estimate on primes in arithmetic progressions that we can now deduce by our standard methods [see e.g. Davenport, Chapter 20] is: suppose x > exp(C log² q) for some absolute constant C > 0; then there exists a constant c > 0 depending only on C such that ψ(x; a mod q) is

    ( 1 − χ(a) x^{β−1}/β + O(exp(−c √(log x))) ) · x/φ(q),  if some χ mod q has the exceptional zero β;
    ( 1 + O(exp(−c √(log x))) ) · x/φ(q),                    if no χ mod q has an exceptional zero.
² Davenport attributes the next result to Landau in Göttinger Nachrichten 1918, 285-295.

Just how close can β come to 1? We first observe that very small 1 − β implies small L(1,χ). To see this we need an upper bound on |L′(σ,χ)| for σ near 1, and such a bound (also for complex χ) is

    0 ≤ 1 − σ ≤ 1/log q  ⇒  |L′(σ,χ)| ≪ log² q.

Indeed we have −L′(σ,χ) = Σ_{n=1}^{∞} χ(n)(log n) n^{−σ} = Σ_{n≤q} + Σ_{n>q}; the first sum is bounded termwise by e log n / n and thus by ≪ log² q; the rest can be bounded by partial summation together with the crude estimate |Σ_{n=q}^{N} χ(n)| < q, yielding ≪ log q. It follows that if 1 − β < 1/log q then L(1,χ) = L(1,χ) − L(β,χ) ≪ (1−β) log² q. But the Dirichlet class number formula for the quadratic number field corresponding to χ gives L(1,χ) ≫ q^{−1/2}; thus

    1 − β ≫ 1/(q^{1/2} log² q).

Note for later use that our method of proving |L′(σ,χ)| ≪ log² q in 0 ≤ 1 − σ ≤ 1/log q also yields |L(σ,χ)| ≪ log q in the same interval, and in particular at σ = 1.
Siegel proved³ that in fact

    L(1,χ) ≫_ε q^{−ε}

for all ε > 0. It follows that 1 − β ≫_ε q^{−ε}. However, the implied constant was, and still is sixty years later, ineffective for every ε < 1/2. So, for instance, we know that the class number of an imaginary quadratic field is ≫_ε |D|^{1/2−ε}, but only with much effort was it shown (by Stark and, independently and differently, Heegner) that the class number exceeds 1 for all |D| > 163, and even an effective lower bound of c log|D| for prime D was big news and remains the best effective bound known!
The problem is that again we need more than one counterexample to get a contradiction. We follow Chapter 21 of [Davenport 1967]. Let χ_1, χ_2 be different primitive real characters to moduli q_1, q_2 > 1, and let

    F(s) = ζ(s) L(s,χ_1) L(s,χ_2) L(s,χ_1χ_2)

(the zeta function of a biquadratic number field) and

    λ = L(1,χ_1) L(1,χ_2) L(1,χ_1χ_2) = [(s−1) F(s)]_{s=1}.

We shall use an estimate

    s ∈ (σ_0, 1)  ⇒  F(s) > A − B (λ/(1−s)) (q_1 q_2)^{C(1−s)}      (F)

for some universal constants σ_0 < 1 and positive A, B, C (specifically σ_0 = 9/10 and A = 1/2, C = 8). Assume (F) for the time being. We shall find χ_1 and β ∈ (σ_0, 1) such that F(β) ≤ 0, and conclude that

    λ > (A/B) (1−β) (q_1 q_2)^{−C(1−β)}.      (λ)

If L(β_1, χ_1) = 0 for some real β_1 with 1 − β_1 < ε/2C then we use that χ_1 and β = β_1. Otherwise we choose any χ_1 and any β with 0 < 1 − β < ε/2C, since ζ(s) < 0 for 0 < s < 1 while the other three factors of F(s) are positive for s ≥ β. Then for any primitive χ_2 mod q_2 > q_1 we use (λ) together with L(s,χ) ≪ log q to find

    L(1,χ_2) > c_1 q_2^{−2C(1−β)} / log q_2,

with c_1 depending only (but ineffectively!) on ε via χ_1 and β. Since 2C(1−β) < ε, Siegel's theorem follows.

³ In the 1935 inaugural volume of Acta Arithmetica.
It remains to prove (F), for which we follow Estermann's simplification of Siegel's original proof. Since F(s) has a nonnegative Dirichlet series, its Taylor series about s = 2 is

    F(s) = Σ_{m=0}^{∞} b_m (2−s)^m

with b_0 = F(2) > 1 and all b_m ≥ 0. Thus

    F(s) − λ/(s−1) = Σ_{m=0}^{∞} (b_m − λ)(2−s)^m,

this being valid in the circle |s−2| < 2. Consider this on |s−2| = 3/2. We have there the crude bounds L(s,χ_1) ≪ q_1, L(s,χ_2) ≪ q_2, L(s,χ_1χ_2) ≪ q_1 q_2, and of course ζ(s) is bounded on that circle. So F(s) ≪ (q_1 q_2)² on that circle, and thus the same is true of F(s) − λ/(s−1). Thus

    |b_m − λ| ≪ (2/3)^m (q_1 q_2)².

So for fixed σ_0 ∈ (1/2, 1) we find for all s ∈ (σ_0, 1)

    Σ_{m≥M} |b_m − λ| (2−s)^m ≪ (q_1 q_2)² (2(2−σ_0)/3)^M.

Thus (remember b_0 > 1, b_m ≥ 0)

    F(s) − λ/(s−1) ≥ 1 − λ ((2−s)^M − 1)/(1−s) − O( (q_1 q_2)² (2(2−σ_0)/3)^M ).

Let M be the largest integer such that the error term (q_1 q_2)² (2(2−σ_0)/3)^M is < 1/2. Then

    F(s) > 1/2 − (λ/(1−s)) (2−s)^M.

Moreover

    (2−s)^M = exp(M log(2−s)) < exp(M(1−s)),

and exp M ≪ (q_1 q_2)^{O(1)}, which completes the proof of (F) and thus of Siegel's theorem.

Math 259: Introduction to Analytic Number Theory
Formulas for L(1,χ)

Let χ be a primitive character mod q > 1. We shall obtain a finite closed form for L(1,χ). As with our other formulas, this will assume a different form depending on whether χ is even (χ(−1) = +1) or odd (χ(−1) = −1). Recall our formula

    χ(n) = (1/τ(χ̄)) Σ_{a mod q} χ̄(a) e^{2πina/q}.

This yields

    L(1,χ) = (1/τ(χ̄)) Σ_{a mod q} χ̄(a) Σ_{n=1}^{∞} (1/n) e^{2πian/q},      (L)

the implied interchange of sums being justified if the inner sum converges for each a mod q coprime with q. But this follows by partial summation since for every nonzero a mod q the partial sums Σ_{n=1}^{M} e^{2πian/q} are bounded. In fact we recognize it as the Taylor series for

    −log(1 − e^{2πia/q}) = −log( 2 sin(πa/q) ) + (πi/2)(1 − 2a/q)

(if we choose the representative of a mod q with 0 < a < q). Either the real or the imaginary part will disappear depending on whether χ is odd or even.
Assume first that χ is even. Then the terms (πi/2)(1 − 2a/q) cancel in (a, q−a) pairs. Moreover the terms χ̄(a) log 2 sum to zero, and we have

    L(1,χ) = −(1/τ(χ̄)) Σ_{a mod q} χ̄(a) log sin(πa/q).      (L+)

For example if χ is a real character then

    √q L(1,χ) = 2 log ε,  where  ε := Π_{0<a<q/2} ( sin(πa/q) )^{−χ(a)}

is a cyclotomic unit of Q(√q). The Dirichlet class number formula then asserts in effect that ε = ε_0^h where ε_0 is the fundamental unit of that real quadratic field and h is its class number.
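For q = 5 all of this can be made concrete; the following sketch is our own illustration, taking as known that Q(√5) has class number h = 1 and fundamental unit ε_0 = (1+√5)/2. The closed form is compared against a long partial sum of the Dirichlet series (whose tail is O(q/N)).

```python
import math

q = 5
chi = {0: 0, 1: 1, 2: -1, 3: -1, 4: 1}     # quadratic character mod 5

# L(1, chi) from a long partial sum of sum chi(n)/n.
N = 200000
L1 = sum(chi[n % q] / n for n in range(1, N + 1))

# The closed form (with tau(chi) = sqrt(5) for this even real character).
L1_closed = -sum(chi[a] * math.log(math.sin(math.pi * a / q))
                 for a in range(1, q)) / math.sqrt(q)

# Cyclotomic unit eps = prod_{0<a<q/2} sin(pi a / q)^{-chi(a)}.
eps = math.sin(2 * math.pi / q) / math.sin(math.pi / q)

assert abs(L1 - L1_closed) < 1e-3
assert abs(math.sqrt(q) * L1_closed - 2 * math.log(eps)) < 1e-12
# Dirichlet: eps = eps0^h with h = 1 here, so eps is the golden ratio.
assert abs(eps - (1 + math.sqrt(5)) / 2) < 1e-12
```
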

If on the other hand χ is odd then it is the logarithm terms that cancel in symmetrical pairs. Using again the fact that Σ_{a mod q} χ̄(a) = 0 we simplify (L) to

    L(1,χ) = −(πi/(q τ(χ̄))) Σ_{a=1}^{q−1} a χ̄(a).      (L−)

In particular if χ is real then (again using the sign of τ(χ) for real characters)

    L(1,χ) = −π q^{−3/2} Σ_{a=1}^{q−1} a χ(a).

Thus the sum is negative, and by Dirichlet equals −q times the class number of the imaginary quadratic field Q(√−q), except for q = 3, 4 when that field has extra roots of unity.
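A concrete instance (our own illustration): q = 7, where the class number of Q(√−7) is known to be 1, so the weighted character sum should be exactly −7, and L(1,χ) = π/√7.

```python
import math

q = 7
def chi(a):
    """Quadratic-residue character mod 7."""
    a %= q
    return 0 if a == 0 else (1 if pow(a, (q - 1) // 2, q) == 1 else -1)

S = sum(a * chi(a) for a in range(1, q))     # = 1+2+4-3-5-6 = -7 = -q*h
L1_closed = -math.pi * S / q ** 1.5          # = pi / sqrt(7)

# Compare with a long partial sum of the Dirichlet series sum chi(n)/n.
N = 200000
L1 = sum(chi(n) / n for n in range(1, N + 1))

assert S == -7
assert abs(L1 - L1_closed) < 1e-3
```
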
Exercise: Show directly that the sum is a multiple of q, at least when q is prime.
Let us concentrate on the case of real characters to prime modulus q ≡ −1 mod 4. That Σ_{a=1}^{q−1} a χ(a) < 0 suggests that the quadratic residues mod q tend to be more numerous in the interval [1, q/2] than in [q/2, q]. We can prove this by evaluating the sum

    S(N) := Σ_{n=1}^{N} χ(n)

at N ≈ q/2. We noted already that for any nontrivial character χ mod q we have S(q) = 0 and thus |S(N)| < q for all N. In fact, using the Gauss-sum formula for χ(n) we have

    S(N) = (1/τ(χ̄)) Σ_{a mod q} χ̄(a) Σ_{n=1}^{N} e^{2πina/q} = (1/τ(χ̄)) Σ_{a mod q} χ̄(a) (e^{2πiNa/q} − 1)/(1 − e^{−2πia/q}),

from which Pólya and Vinogradov's estimate

    S(N) ≪ q^{1/2} Σ_{1≤a≤q/2} 1/a ≪ q^{1/2} log q

follows immediately. Now let χ be the quadratic character modulo a prime
q ≡ −1 mod 4 and let N = (q−1)/2. (What would happen for q ≡ +1 mod 4?) Then

    S((q−1)/2) = Σ_{n=1}^{q−1} χ(n) Φ(n/q),

where Φ(x) is the periodic function defined by

    Φ(x) = { 0,    if 2x ∈ Z;
             +1/2, if 0 < x − ⌊x⌋ < 1/2;
             −1/2, otherwise

("square wave"). This has the Fourier series

    Φ(x) = (2/π) [ sin 2πx + (1/3) sin 6πx + (1/5) sin 10πx + (1/7) sin 14πx + ··· ].
We thus have

    S((q−1)/2) = (1/πi) Σ_{m odd} (1/m) Σ_{a mod q} χ(a) ( e^{2πima/q} − e^{−2πima/q} ).

The inner sum is

    τ(χ)( χ̄(m) − χ̄(−m) ) = 2i√q χ(m).

Thus our final formula for S((q−1)/2) is

    (2√q/π) Σ_{m odd} χ(m)/m = ((2 − χ(2)) √q / π) L(1,χ),

since Σ_{m odd} χ(m)/m = (1 − χ(2)/2) L(1,χ).

It follows, as claimed, that there are more quadratic residues than nonresidues in [1, q/2]; in fact for q > 3 the difference between the counts is either h or 3h according as χ(2) = 1 or −1, i.e. according as q is 7 or 3 mod 8. Even the positivity of S((q−1)/2) has yet to be proven without resort to such analytic methods.
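The Pólya-Vinogradov estimate above is easy to confirm in practice; this quick check (our own illustration) runs over all partial sums of the Legendre symbol mod 101 and compares the largest one with √q log q.

```python
import math

q = 101
def chi(a):
    """Legendre symbol mod 101."""
    a %= q
    return 0 if a == 0 else (1 if pow(a, (q - 1) // 2, q) == 1 else -1)

best, S = 0.0, 0
for n in range(1, q):
    S += chi(n)
    best = max(best, abs(S))       # max over N of |S(N)|

assert best <= math.sqrt(q) * math.log(q)   # Polya-Vinogradov
assert S == 0                               # S(q) = 0 for nontrivial chi
```

In fact the observed maximum is far below the bound, consistent with the expectation that the true size is closer to √q.
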
Exercises: What can you say of S(⌊q/4⌋)? What about the sums Σ_{a=1}^{q−1} a^m χ(a) for m = 2, 3, ...?
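The residue-count difference h or 3h can also be checked directly; this sketch (our own illustration) uses q = 23, where the class number of Q(√−23) is known to be 3 and q ≡ 7 mod 8, so the difference should be exactly h = 3.

```python
q = 23
def chi(a):
    """Legendre symbol mod 23."""
    a %= q
    return 0 if a == 0 else (1 if pow(a, (q - 1) // 2, q) == 1 else -1)

half = range(1, (q + 1) // 2)               # 1 .. (q-1)/2
residues = sum(1 for n in half if chi(n) == 1)
nonresidues = sum(1 for n in half if chi(n) == -1)
diff = residues - nonresidues

assert chi(2) == 1                          # q = 7 mod 8
assert diff == 3                            # = (2 - chi(2)) * h with h = 3
```
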

Math 259: Introduction to Analytic Number Theory
Introduction to exponential sums; Weyl equidistribution

The \exponential" in question is the complex exponential, which we shall nor-

malize with a factor of 2 and abbreviate by e():
e(x) := e2ix
(with x 2 R in most cases). We shall on occasion also use the notation
em (x) := e(mx) = e2imx ;
note that e1 (x) = e(x) and e0 (x) = 1 for all x. An \exponential sum" is a sum
of terms e(x) with x ranging over (or indexed by) a nite but large set A; and
the general problem is to nd a nontrivial estimate on such a sum, which usually
means an upper bound on its absolute value signi cantly smaller than #(A).
Such problems are ubiquitous in number theory, analytic and otherwise, and
occasionally arise in other branches of mathematics; we shall concentrate on the
analytic aspects.1 Sometimes these sums arise directly or nearly so; for instance,
the Lindelöf conjecture concerns the size of

    ζ(1/2 + it) = Σ_{n=1}^{N} n^{−1/2−it} + N^{1/2−it}/(it − 1/2) + O(t N^{−1/2}),

so it would follow from a proof of

    Σ_{n=1}^{⌊t²⌋} n^{−1/2−it} ≪_ε |t|^ε,

which in turn would follow by partial summation from good estimates on the exponential sums

    Σ_{n=1}^{M} n^{−it} = Σ_{n=1}^{M} e( −(t/2π) log n ).
Often the translation of a problem to estimating exponential sums takes more
work. Prof. Sternberg's guest lecture introduced one example of such a trans-
lation, the problem of counting lattice points in a circle. Our next example
concerns equidistribution mod 1.
¹ There is a fascinating and deep algebraic theory of exponential sums, in which A is something like the set of k-rational points of a variety over some finite field k; here we'll alas have to ignore most of this theory, and cite without proof the occasional result when an algebraic estimate on an exponential sum is needed. For an introduction to this theory see [Schmidt 1976].

A sequence c_1, c_2, c_3, ... of real numbers is said to be equidistributed mod 1 if the fractional parts ⟨c_n⟩ cover each interval in R/Z in proportion to its length, i.e. if

    lim_{N→∞} (1/N) #{ n ≤ N : a ≤ ⟨c_n⟩ ≤ b } = b − a      (1)

for all a, b such that 0 ≤ a ≤ b ≤ 1. What does this have to do with exponential sums? Consider the following theorem of Weyl: for a sequence {c_n}_{n=1}^{∞} in R (or equivalently in R/Z), the following are equivalent:

(i) Condition (1) holds for all a, b such that 0 ≤ a ≤ b ≤ 1;

(ii) For any continuous function f : R/Z → C,

    lim_{N→∞} (1/N) Σ_{n=1}^{N} f(c_n) = ∫_0^1 f(t) dt;      (2)

(iii) For each m ∈ Z,

    lim_{N→∞} (1/N) Σ_{n=1}^{N} e_m(c_n) = δ_{0m}  [ = ∫_0^1 e_m(t) dt ].      (3)

Note that (iii) is precisely the problem of nontrivially estimating an exponential sum.

Proof: (i)⇒(ii): Condition (i) means that (ii) holds when f is the characteristic function of an interval (NB such a function is not generally continuous, but it is integrable, which is enough for the sequel); also both sides of (2) are linear in f, so (ii) holds for finite linear combinations of such characteristic functions, a.k.a. step functions. If |f(t)| < ε for all t ∈ R/Z then both sides of (2) are bounded by ε for all N. Thus (ii) holds for any function on R/Z uniformly approximable by step functions. But this includes all continuous functions.
(ii)⇒(i): Estimate the characteristic function of [a, b] from below and above by continuous functions whose integrals differ from b − a by at most ε.
(ii)⇒(iii) is clear because (iii) is a special case of (ii).
(iii)⇒(ii) follows from Fejér's theorem: every continuous function on R/Z is uniformly approximated by finite linear combinations of the functions e_m.  □
[NB the approximation is in general not an initial segment of the Fourier series for f. See [Korner 1988], chapters 1-3 (pages 3-13). The existence of uniform approximations is also a special case of the Stone-Weierstrass theorem.]
So, {c_n} is equidistributed mod 1 if and only if N^{−1} Σ_{n=1}^{N} e_m(c_n) → 0 for each nonzero m ∈ Z, i.e. if for each m ≠ 0 we can improve on the trivial bound |Σ_{n=1}^{N} e_m(c_n)| ≤ N by a factor that tends to infinity with N. For instance, we have Weyl's original application of this theorem: For r ∈ R the sequence {nr} is equidistributed mod 1 if and only if r ∉ Q. Indeed if r is rational then ⟨nr⟩ takes on only finitely many values; but if r is irrational then for each m we have e_m(r) ≠ 1 and thus

    Σ_{n=1}^{N} e_m(nr) = ( e_m((N+1)r) − e_m(r) ) / ( e_m(r) − 1 ) = O_m(1).
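The geometric-series bound is sharp enough to watch in action. This sketch (our own illustration) takes r = √2 and checks that every partial sum Σ_{n≤N} e_m(nr), for several m and N up to 10000, stays within the bound 2/|e_m(r) − 1|.

```python
import cmath, math

def e(x):
    return cmath.exp(2j * cmath.pi * x)

r = math.sqrt(2)                       # irrational step size
worst = 0.0                            # largest ratio |S_N| / bound observed
for m in range(1, 6):
    bound = 2 / abs(e(m * r) - 1)      # geometric-series bound, depends on m
    S = 0j
    for n in range(1, 10001):
        S += e(m * n * r)
        worst = max(worst, abs(S) / bound)

assert worst <= 1.0 + 1e-9             # every partial sum obeys the bound
```

In particular N^{−1} Σ_{n≤N} e_m(nr) → 0, which is Weyl's criterion for {n√2}.
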
In general we cannot reasonably hope that Σ_{n=1}^{N} e_m(c_n) is bounded for each m, but we will be able to use our techniques to get a bound² o(N). For instance, we'll see that if P ∈ R[x] is a polynomial at least one of whose nonconstant coefficients is irrational then {P(n)} is equidistributed mod 1 (the example of {nr} being the special case of linear polynomials). We'll also be able to show this for {log_10(n!)} and thus obtain the distribution of the first d digits of n! for each d.
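The n! example can be previewed numerically. This sketch (our own illustration) tallies the leading digit of n! for n ≤ 3000 via the fractional part of log_10(n!); equidistribution of {log_10 n!} predicts the Benford frequencies log_10(1 + 1/d).

```python
import math

N = 3000
counts = [0] * 10
lg = 0.0
for n in range(1, N + 1):
    lg += math.log10(n)              # lg = log10(n!)
    frac = lg - math.floor(lg)
    counts[int(10 ** frac)] += 1     # leading digit of n!

# Compare empirical frequencies with log10(1 + 1/d); tolerance is generous
# since N = 3000 is small.
for d in range(1, 10):
    assert abs(counts[d] / N - math.log10(1 + 1 / d)) < 0.05
```
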

1. (An easy variation on Weyl's theorem.) Let A_n ⊂ R be finite subsets with #(A_n) → ∞, and say that A_n is asymptotically equidistributed mod 1 if

    lim_{n→∞} #{ t ∈ A_n : a ≤ ⟨t⟩ ≤ b } / #(A_n) = b − a

for all a, b such that 0 ≤ a ≤ b ≤ 1. Prove that this is the case if and only if

    lim_{n→∞} (1/#(A_n)) Σ_{t∈A_n} e_m(t) = δ_{0m}

for each m ∈ Z.

2. (Recognizing other distributions mod 1.) In Weyl's theorem suppose condition (iii) holds for all nonzero m ≠ ±1, but

    lim_{N→∞} (1/N) Σ_{n=1}^{N} e_1(c_n) = 1/2.

What can you conclude about the limits in (i) and (ii)? Generalize.
3. (Basic properties of o(·).) If f = o(g) then f = O(g). If f = o(g) and
g = O(h), or f ≪ g and g = o(h), then f = o(h) (assuming that the same
implied limit is taken in both premises). If f_1 = o(g_1) and f_2 = O(g_2) then
f_1 f_2 = o(g_1 g_2); if moreover f_2 = o(g_2) then f_1 + f_2 = o(g_1 + g_2) = o(max(g_1, g_2)).
² We've gotten this far without explicitly using the "little oh" notation. In case it's new to
you, f = o(g) means that (g > 0 and) f/g → 0. This begs the question "approaches zero as
what?", whose answer should usually be clear from context; in the present case we of course
intend "as N → ∞".

Given a positive function g, the functions f such that f = o(g) constitute a
vector space.
4. (Effective and ineffective o(·).) An estimate f = o(g) is said to be effective if
for each ε > 0 we can compute a specific point past which |f| < εg; otherwise it
is ineffective. Show that the transformations in the previous exercise preserve
effectivity. Give an example of an ineffective o(·).
[Körner 1988] Körner, T.W.: Fourier Analysis. Cambridge, England: Cambridge
University Press, 1988. [HA 9.88.14 / QA403.5.K67]
[Schmidt 1976] Schmidt, W.M.: Equations over Finite Fields: An Elementary
Approach. Berlin: Springer, 1976 (LNM 536). [QA3.L28 #536]

Math 259: Introduction to Analytic Number Theory
Exponential sums II: the Kuzmin and Montgomery-Vaughan estimates

While proving that an arithmetic progression with irrational step size is equidis-
tributed mod 1, we encountered the estimate

  |Σ_{n=1}^N e(cn)| ≤ 2/|1 − e(c)| = 1/|sin πc| ≪ {c}^{-1},

where {c} is the distance from c to the nearest integer. Kuzmin showed (1927)
that more generally if for 0 < n ≤ N the sequence {c_n − c_{n−1}} of differences is
monotonic and contained in [k + δ, k + 1 − δ] for some k ∈ Z and δ > 0 then

  |Σ_{n=0}^N e(c_n)| ≤ cot(πδ/2).

Indeed, let θ_n = c_n − c_{n−1} and

  β_n = 1/(1 − e(θ_n)) = e(c_{n−1})/(e(c_{n−1}) − e(c_n)).

Note that the β_n are collinear:

  β_n = (1 + i cot πθ_n)/2;

since {θ_n} is monotonic, the β_n are positioned consecutively on the vertical line
Re(β) = 1/2. Now our exponential sum is

  Σ_{n=0}^N e(c_n) = e(c_N) + Σ_{n=1}^N (e(c_{n−1}) − e(c_n)) β_n
    = β_1 e(c_0) + (1 − β_N) e(c_N) + Σ_{n=1}^{N−1} e(c_n)(β_{n+1} − β_n).

Thus

  |Σ_{n=0}^N e(c_n)| ≤ |β_1| + Σ_{n=1}^{N−1} |β_{n+1} − β_n| + |1 − β_N| = |β_1| + |β_N − β_1| + |β_N|,

where in the last step we used the monotonicity of Im(β_n) and the fact that
Re(β_n) = 1/2 (which also gives |1 − β_N| = |β_N|). The conclusion of the proof,

  |β_1| + |β_N − β_1| + |β_N| ≤ 1/sin(πδ) + 1/tan(πδ) = cot(πδ/2),

is an exercise in trigonometry.
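Kuzmin's bound is easy to check numerically. In the sketch below the differences c_n − c_{n−1} increase from 0.1 to just under 0.9 (a hypothetical test sequence), so the hypothesis holds with k = 0 and δ = 0.1, and the sum of 1001 unit vectors must have absolute value at most cot(πδ/2) ≈ 6.31.

```python
import cmath
import math

N, delta = 1000, 0.1
# monotonic differences contained in [delta, 1 - delta]
diffs = [0.1 + 0.8 * n / N for n in range(N)]

cs = [0.0]                  # c_0, c_1, ..., c_N
for d in diffs:
    cs.append(cs[-1] + d)

total = abs(sum(cmath.exp(2j * math.pi * c) for c in cs))
kuzmin_bound = 1 / math.tan(math.pi * delta / 2)   # cot(pi*delta/2) ~ 6.3138
```

Despite summing 1001 unit vectors, `total` stays below `kuzmin_bound`; square-root growth in N is only possible when the differences approach an integer.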
For instance, it follows that for t/π ≤ N_1 < N_2 we have

  |Σ_{n=N_1}^{N_2} n^{−it}| ≪ N_2/t,

since we are dealing with c_n = −(t log n)/2π and thus θ_n ≈ −t/2πn. By partial
summation it follows that

  |Σ_{n=N_1}^{N_2} n^{−1/2−it}| ≪ (1/t) ∫_{N_1}^{N_2} n^{−3/2} · n dn ≪ t^{−1} N_2^{1/2},

and thus

  ζ(1/2 + it) = Σ_{n≤t/π} n^{−1/2−it} + O(1).
With some more work, we can (and soon will) push the upper limit of the
sum further down, but not (yet?) all the way to t^ε; as n decreases, the phase
e(−(t log n)/2π) varies more erratically, making the sum harder to control. Still,
if we sum random complex numbers of norm c_n, the variance of the sum is
Σ_n |c_n|², so we expect that the sum would grow as the square root of that,
which for ζ(1/2+it) would make it log^{1/2}|t| "on average". We next prove a series
of general mean-square results along these lines, in which the summands are not
independent variables but complex exponentials with different frequencies:

  f(t) = Σ_{α∈A} c_α e(αt)

for some finite set A ⊂ R and coefficients c_α ∈ C. The easiest estimate is that
given A and the c_α we have

  ∫_{T_1}^{T_2} |f(t)|² dt = (T_2 − T_1) Σ_{α∈A} |c_α|² + O(1).
How does the "O(1)" depend on A, c_α, T_1, T_2? We easily find

  ∫_{T_1}^{T_2} |f(t)|² dt − (T_2 − T_1) Σ_{α∈A} |c_α|² = Q_A(c⃗_2) − Q_A(c⃗_1),

where Q_A is the sesquilinear form on C^A defined by

  Q_A(x⃗) = (1/2πi) Σ_{α,β∈A, α≠β} x_α x̄_β/(α − β),

and c⃗_j ∈ C^A (j = 1, 2) are the vectors with α coordinate c_α e(αT_j). The termwise
estimate |Q_A(x⃗)| ≤ π^{−1} Σ_{α>β} |x_α x_β|/(α − β) is already sufficient to prove
∫_0^T |ζ(1/2 + it)|² dt ≪ T log² T. But remarkably a tighter estimate holds in
this general setting: let

  δ(α) = min_β |α − β|,

the minimum taken over all β ∈ A other than α itself. Then
Theorem (Montgomery-Vaughan Hilbert Inequality). For any finite
A ⊂ R and c⃗ ∈ C^A we have

  |Q_A(c⃗)| ≪ Σ_{α∈A} |c_α|²/δ(α),

and thus

  ∫_{T_1}^{T_2} |Σ_{α∈A} c_α e(αt)|² dt = Σ_{α∈A} (T_2 − T_1 + O(δ(α)^{−1})) |c_α|².

Why "Hilbert Inequality" and not simply "Inequality"? Because this is a grand
generalization of the original Hilbert inequality, which is the special case A =
{α ∈ Z : |α| < M}. In that case our function f(t) is Z-periodic, and as
Schur observed the inequality |Q_A(c⃗)| < (1/2) Σ_α |c_α|² follows from the integral
formula Q_A(c⃗) = ∫_0^1 (t − 1/2)|f(t)|² dt (though in that case the resulting estimate
on ∫_{T_1}^{T_2} |f(t)|² dt is even easier than the upper bound on |Q_A(c⃗)|).
To start the proof of Montgomery-Vaughan, consider C^A as a finite-dimensional
complex Hilbert space with inner product

  ⟨c⃗, c⃗′⟩ := Σ_α c_α c̄′_α/δ(α).

Then Q_A(x⃗) = ⟨x⃗, Lx⃗⟩ where L is the Hermitian operator taking x⃗ to the vector
with α coordinate (2πi)^{−1} δ(α) Σ_{β≠α} x_β/(β − α), and we want to show that
|⟨c⃗, Lc⃗⟩| ≪ ⟨c⃗, c⃗⟩ for all c⃗ ∈ C^A. But this is equivalent to the condition that L
have norm O(1) as an operator on that Hilbert space, and since the operator
is Hermitian it is enough to check that |Q_A(c⃗)| = |⟨c⃗, Lc⃗⟩| ≪ 1 holds when c⃗ is a
normalized eigenvector. Thus it is enough to prove that Q_A(c⃗) ≪ 1 for all A, c⃗
such that

  Σ_α |c_α|²/δ(α) = 1

and there exists some λ ∈ R such that

  δ(α) Σ_{β≠α} c_β/(α − β) = iλ c_α

for each α ∈ A, in which case |λ| = 2π|Q_A(c⃗)|.
Now for any such c⃗ we have, by Cauchy-Schwarz,

  |2πQ(c⃗)|² = |Σ_α c̄_α Σ_{β≠α} c_β/(α − β)|²
    ≤ (Σ_α |c_α|²/δ(α)) · (Σ_α δ(α) |Σ_{β≠α} c_β/(α − β)|²).

By assumption Σ_α |c_α|²/δ(α) = 1. For the other factor, we expand

  Σ_α δ(α) |Σ_{β≠α} c_β/(α − β)|²
    = Σ_α δ(α) Σ_{β_1,β_2≠α} c_{β_1} c̄_{β_2} / ((β_1 − α)(β_2 − α)).
The single (diagonal, β_1 = β_2) sum contributes

  Σ_α δ(α) Σ_{β≠α} |c_β|²/(β − α)² = Σ_β |c_β|² Σ_{α≠β} δ(α)/(β − α)²;

let T_β be the inner sum Σ_{α≠β} δ(α)/(β − α)², so the above contribution is
Σ_β |c_β|² T_β. The double sum (β_1 ≠ β_2) contributes

  Σ_{β_1≠β_2} c_{β_1} c̄_{β_2} Σ_{α≠β_1,β_2} δ(α) / ((β_1 − α)(β_2 − α)).
The key trick is now to use the partial fraction decomposition

  1/((β_1 − α)(β_2 − α)) = (1/(β_2 − β_1)) [1/(β_1 − α) − 1/(β_2 − α)]

to rewrite this last triple sum as

  Σ_{β_1≠β_2} (c_{β_1} c̄_{β_2}/(β_2 − β_1)) [Σ_{α≠β_1,β_2} (δ(α)/(β_1 − α) − δ(α)/(β_2 − α))].

The point is that the first part of the inner sum is almost independent of β_2,
while the second half is almost independent of β_1: the other β enters only as a
single excluded α. That is, the triple sum is

  Σ_{β_1≠β_2} (c_{β_1} c̄_{β_2}/(β_2 − β_1)) [(S(β_1) − δ(β_2)/(β_1 − β_2)) − (S(β_2) − δ(β_1)/(β_2 − β_1))],

where

  S(β) := Σ_{α≠β} δ(α)/(β − α).
And now we get to use the eigenvalue hypothesis to show that the S(β_j) terms
cancel each other. Indeed we have

  Σ_{β_1≠β_2} (c_{β_1} c̄_{β_2}/(β_2 − β_1)) S(β_1) = Σ_{β_1} c_{β_1} S(β_1) Σ_{β_2≠β_1} c̄_{β_2}/(β_2 − β_1)

and the inner sum is just −iλ c̄_{β_1}/δ(β_1), so

  Σ_{β_1≠β_2} (c_{β_1} c̄_{β_2}/(β_2 − β_1)) S(β_1) = −iλ Σ_β S(β) |c_β|²/δ(β).

The same computation shows that

  Σ_{β_1≠β_2} (c_{β_1} c̄_{β_2}/(β_2 − β_1)) S(β_2) = −iλ Σ_β S(β) |c_β|²/δ(β),

so the S(β_j) terms indeed drop out! Collecting the surviving terms, we are thus
left with

  |2πQ(c⃗)|² ≪ Σ_β |c_β|² T_β + Σ_{β_1≠β_2} |c_{β_1} c_{β_2}| (δ(β_1) + δ(β_2))/(β_2 − β_1)².  (*)
By now all the coefficients are positive, so we will have no further magic cancel-
lations and will have to just estimate how big things can get. We'll need some
lemmas (which are the only place we actually use the definition of δ(α)!): first,
for each k = 2, 3, …,

  β ∈ A ⇒ Σ_{α≠β} δ(α)/(α − β)^k ≪_k δ(β)^{1−k};  (1)

  β_1, β_2 ∈ A ⇒ Σ_{α≠β_1,β_2} δ(α)/((β_1 − α)²(β_2 − α)²) ≪ [δ(β_1)^{−1} + δ(β_2)^{−1}]/(β_1 − β_2)².  (2)

Now the first sum in (*) is O(1) because

  T_β = Σ_{α≠β} δ(α)/(α − β)² ≪ δ(β)^{−1}

by the case k = 2 of (1), so Σ_β |c_β|² T_β ≪ Σ_β |c_β|²/δ(β) = 1. The second sum
will be Cauchy-Schwarzed. That sum is bounded by twice

  B := Σ_{β_1≠β_2} |c_{β_1} c_{β_2}| δ(β_1)/(β_2 − β_1)² = Σ_β |c_β| Σ_{α≠β} |c_α| δ(α)/(β − α)².

Since Σ_α |c_α|²/δ(α) = 1, we have by Cauchy-Schwarz

  |B|² ≤ Σ_β δ(β) (Σ_{α≠β} |c_α| δ(α)/(β − α)²)².

Expanding and switching the roles of the α's and β's, we rewrite this as

  |B|² ≤ Σ_{β_1,β_2} |c_{β_1} c_{β_2}| δ(β_1)δ(β_2) Σ_{α≠β_1,β_2} δ(α)/((β_1 − α)²(β_2 − α)²).

When β_1 = β_2, the inner sum is ≪ δ(β_1)^{−3} (by (1) with k = 4), so the contribu-
tion of those terms is ≪ Σ_β |c_β|²/δ(β) = 1. When β_1 ≠ β_2 we apply (2), and
the resulting estimate on the sum of the cross-terms is twice the double sum
defining B! So, we've shown (modulo the proofs of (1), (2)) that B² ≪ 1 + B.
Thus B ≪ 1 and we're finally done.
For instance, if A = {(log n)/2π : n = 1, 2, 3, …, N} we find that

  ∫_{T_1}^{T_2} |Σ_{n=1}^N c_n n^{it}|² dt = Σ_{n=1}^N (T_2 − T_1 + O(n)) |c_n|².

Taking T_1 = −T, T_2 = 0, c_n = n^{−1/2} we thus have

  ∫_0^T |Σ_{n=1}^N n^{−1/2−it}|² dt = T log N + O(T + N).

It follows that

  ∫_0^T |ζ(1/2 + it)|² dt = T log T + O(T √(log T)).
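The exact identity ∫_{T_1}^{T_2}|f|² dt = (T_2 − T_1)Σ|c_α|² + Q_A(c⃗_2) − Q_A(c⃗_1) behind these mean-value results can be verified numerically; the frequencies and coefficients below are arbitrary test data, not taken from the notes.

```python
import cmath
import math

A = [0.0, 0.31, 1.07, 2.5]      # hypothetical frequencies
c = [1.0, 0.5, -0.7, 0.3]       # hypothetical real coefficients
T1, T2 = 0.0, 50.0

def f(t):
    return sum(cj * cmath.exp(2j * math.pi * aj * t) for aj, cj in zip(A, c))

def Q(T):
    """Q_A evaluated at the vector with alpha coordinate c_a e(aT)."""
    x = [cj * cmath.exp(2j * math.pi * aj * T) for aj, cj in zip(A, c)]
    s = sum(x[i] * x[j].conjugate() / (A[i] - A[j])
            for i in range(len(A)) for j in range(len(A)) if i != j)
    return s / (2j * math.pi)

main_term = (T2 - T1) * sum(cj * cj for cj in c)     # (T2 - T1) * sum |c|^2
exact = main_term + (Q(T2) - Q(T1)).real

# midpoint-rule quadrature of the left-hand side, for comparison
M = 200_000
h = (T2 - T1) / M
quad = sum(abs(f(T1 + (i + 0.5) * h)) ** 2 for i in range(M)) * h
```

The quadrature agrees with the exact value, and the boundary terms Q(T_j) are bounded independently of T_2 − T_1, which is what the O(δ(α)^{−1}) term packages.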

1. Prove that the constant π in the original Hilbert inequality is best possible,
and show that it holds even if c_α is allowed to be nonzero for every integer α
(this is in fact what Hilbert originally proved).
2. Complete the proof of Montgomery-Vaughan by verifying the inequalities
(1), (2).
3. Let χ be a character (primitive or not) mod q. Obtain an asymptotic formula
for ∫_0^T |L(1/2 + it, χ)|² dt. How does the error term depend on q? (It is conjectured
that L(1/2 + it, χ) ≪_ε (q|t|)^ε; as usual this problem is still wide open.)

Math 259: Introduction to Analytic Number Theory
Exponential sums III: the van der Corput inequalities
Let f(x) be a sufficiently differentiable function, and S = Σ_{n=1}^N e(f(n)). The
Kuzmin inequality tells us in effect that

I. If f′(x) is monotonic and δ_1 < {f′(x)} < 1 − δ_1 for x ∈ [1, N] then S ≪ 1/δ_1.

We shall use this to deduce van der Corput's estimates on S in terms of N and
higher derivatives of f. In each case the inequality is only useful if f has a
derivative f^{(k)} of constant sign which is significantly smaller than 1.

II. If cλ_2 < f″ < Cλ_2 for some constants c, C such that 0 < c < C then

  S ≪_{c,C} Nλ_2^{1/2} + λ_2^{−1/2}.

III. If cλ_3 < f‴ < Cλ_3 for some constants c, C such that 0 < c < C then

  S ≪_{c,C} Nλ_3^{1/6} + N^{1/2}λ_3^{−1/6}.

In general there is a k-th inequality

  S ≪_{c,C} Nλ_k^{1/(2^k−2)} + N^{1−2^{2−k}}λ_k^{−1/(2^k−2)},

but we'll only make use of van der Corput II and III.
Here is a typical application, due to van der Corput: ζ(1/2 + it) ≪ |t|^{1/6} log|t|.
We have seen that

  ζ(1/2 + it) = Σ_{n≤t/π} n^{−1/2−it} + O(1).

We shall break up the sum into segments Σ_{n=N}^{N_1} with N < N_1 ≤ 2N. We shall
use f(x) = −(t log x)/2π, so λ_k ≍ t/N^k. Then II and III give

  Σ_{n=N}^{N′} n^{it} ≪ |t|^{1/2} + N/|t|^{1/2},   Σ_{n=N}^{N′} n^{it} ≪ N^{1/2}|t|^{1/6} + N/|t|^{1/6}

for N < N′ ≤ N_1. By partial summation it follows that

  Σ_{n=N}^{N′} n^{−1/2−it} ≪ (|t|/N)^{1/2} + (N/|t|)^{1/2},   Σ_{n=N}^{N′} n^{−1/2−it} ≪ |t|^{1/6} + N^{1/2}/|t|^{1/6}.

Choosing the first estimate for N ≥ |t|^{2/3} and the second for N ≤ |t|^{2/3} we
find that the sum is ≪ |t|^{1/6} in either case. Since the total number of (N, N′]
segments is O(log|t|), the inequality ζ(1/2 + it) ≪ |t|^{1/6} log|t| follows.

The inequality II is an easy consequence of Kuzmin's I. [NB the following is
not van der Corput's original proof, for which see for instance Lecture 3 of
[Montgomery 1994]. The proof we give is much more elementary, but does not
as readily yield the small further reductions of the exponents that are available
with these methods.] We may assume that f″(x) < 1/4 on [1, N], else λ_2 ≫ 1
and the inequality is trivial. Split [1, N] into O(Nλ_2 + 1) intervals on which ⌊f′⌋
is constant. Let δ_1 be a small positive number to be determined later, and take
out O(Nλ_2 + 1) subintervals of length O(δ_1/λ_2 + 1) on which f′ is within δ_1
of an integer. On each excised interval, estimate the sum trivially by its length;
on the remaining intervals, use Kuzmin. This yields

  S ≪ (Nλ_2 + 1)(δ_1^{−1} + δ_1/λ_2 + 1).

Now take δ_1 = λ_2^{1/2} to get

  S ≪ (Nλ_2 + 1)(λ_2^{−1/2} + 1).

But by assumption λ_2 ≪ 1 so the second factor is ≪ λ_2^{−1/2}. This completes the
proof of II.

For III and higher van der Corput bounds, use Weyl's trick: for H ≤ N,

  |S| ≪ (N/H)^{1/2} { Σ_{h=0}^{H} |Σ_{n=1}^{N−h} e(f(n + h) − f(n))| }^{1/2}.  (W)

If f(x) has small positive k-th derivative then each f(x + h) − f(x) has small
(k − 1)-st derivative, which is positive except for h = 0 when the inner sum is N.
This will let us prove III from II, and so on by induction (see the first Exercise
below). To prove (W) define z_n for n ∈ Z by z_n = e(f(n)) for 1 ≤ n ≤ N and z_n = 0
otherwise. Then

  S = Σ_{n=−∞}^{∞} z_n = (1/H) Σ_{n=−∞}^{∞} Σ_{h=1}^{H} z_{n+h},

in which fewer than N + H of the inner sums are nonzero. Thus by Cauchy-
Schwarz

  |S|² ≤ ((N + H)/H²) Σ_{n∈Z} |Σ_{h=1}^{H} z_{n+h}|²
      = ((N + H)/H²) Σ_{h_1,h_2=1}^{H} Σ_{n∈Z} z_{n+h_1} z̄_{n+h_2}.

But the inner sum depends only on |h_1 − h_2|, and each possible h := h_1 − h_2
occurs at most H times. So,

  |S|² ≪ ((N + H)/H) Σ_{h=0}^{H} |Σ_{n∈Z} z_{n+h} z̄_n|,

from which (W) follows.
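The Cauchy-Schwarz step behind (W) can be checked directly. The sketch below uses the hypothetical phase f(n) = 0.07n² and verifies the rigorous intermediate inequality |S|² ≤ ((N + H)/H)(N + 2 Σ_{h=1}^{H−1} |Σ_n z_{n+h} z̄_n|).

```python
import cmath

def e(x):
    return cmath.exp(2j * cmath.pi * x)

N, H = 500, 40

def f(n):
    return 0.07 * n * n        # hypothetical quadratic test phase

z = [e(f(n)) for n in range(1, N + 1)]
S = abs(sum(z))

def corr(h):
    """|sum_n z_{n+h} conj(z_n)| over the n with both indices in [1, N]."""
    return abs(sum(z[n + h] * z[n].conjugate() for n in range(N - h)))

rhs = (N + H) / H * (N + 2 * sum(corr(h) for h in range(1, H)))
```

Each shifted sum corr(h) is a linear-phase exponential sum, hence bounded; that is what makes the right side far smaller than the trivial N².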
Now to prove III: we may assume N^{−3} < λ_3 < 1, else the inequality is trivial.
Apply (W), and to each of the inner sums with h ≠ 0 apply II with λ_2 = hλ_3.
This yields

  |S|² ≪ (N/H) [N + Σ_{h=1}^{H} (N(hλ_3)^{1/2} + (hλ_3)^{−1/2})]
      ≪ N²(Hλ_3)^{1/2} + N²H^{−1} + N/(Hλ_3)^{1/2}.

Now make the first two terms equal by taking H = ⌊λ_3^{−1/3}⌋:

  |S|² ≪ N²λ_3^{1/3} + Nλ_3^{−1/3}.

Extracting square roots yields III.

1. Prove the van der Corput estimates IV, V, etc. by induction.
2. Prove that {log_b n!} is equidistributed mod 1 for any b > 1.
3. Use Weyl's trick to prove the equidistribution of {P(n)} mod 1 for any poly-
nomial P(x) with an irrational nonconstant coefficient (which was Weyl's original
motivation for introducing this trick).
[Montgomery 1994] Montgomery, H.L.: Ten lectures on the interface between
analytic number theory and harmonic analysis. Providence: AMS, 1994 [AB

Math 259: Introduction to Analytic Number Theory
How many points can a curve of genus g have over F_q?

Let k be a finite field of q elements, and C/k a (smooth projective) curve of
genus g = g(C). Let K = k(C) be its function field. A "prime" (a.k.a. "place",
"valuation") p of K is a Galois orbit of k̄-rational points of C. If that orbit has
size d = d_p (the "degree" of p) then we are dealing with d_p conjugate points
defined over the q^{d_p}-element field (and no smaller field intermediate between it
and k), which is the residue field of p. The zeta function ζ_K(s) = ζ_C(s) of this
field or curve may be defined as the Euler product

  ζ_C(s) = Π_p (1 − (q^{d_p})^{−s})^{−1} = Π_p (1 − Z^{d_p})^{−1}

extending over all primes p, where Z = q^{−s}. Then

  log ζ_C(s) = Σ_p Σ_{m=1}^{∞} Z^{d_p m}/m = Σ_{n=1}^{∞} (Z^n/n) Σ_{d_p | n} d_p.

But the inner sum is just the number N_n = N_n(C) of points of C rational over
the field of q^n elements. Note that¹ N_n ≪_C q^n, so the sum and thus the Euler
product converge for |Z| < 1/q, i.e. for σ > 1.
As in the number-field case, ζ_C satisfies a functional equation relating its values
at s and 1 − s:

  ζ_C(1 − s) = q^{(2−2g)(1/2−s)} ζ_C(s) = (qZ²)^{1−g} ζ_C(s);

equivalently,

  ξ_C(s) := q^{(g−1)(s−1/2)} ζ_C(s)

is invariant under s ↔ 1 − s. Moreover, ζ_C(s) is of the form

  ζ_C(s) = P(Z)/((1 − Z)(1 − qZ))

for some polynomial P of degree 2g with P(0) = 1; it then follows from the
functional equation that P(1/qZ) = P(Z)/(qZ²)^g, which is to say that we can
factor P(Z) as

  P(Z) = Π_{j=1}^{g} (1 − α_j Z)(1 − α_{g+j} Z)
¹ Fix a nonconstant function f: C → P¹. Then

  N_n(C) ≤ (deg f) N_n(P¹) = (deg f)(q^n + 1) ≪_f q^n.

for some complex numbers α_1, …, α_{2g} such that

  α_j α_{g+j} = q

for j = 1, …, g. Comparing this with our formula for N_n we find

  N_n = q^n + 1 − Σ_{j=1}^{2g} α_j^n.

(Fortunately this agrees with our known formula for N_n when g = 0…) The
analogue of the Dirichlet class number formula is the fact that P(1) = Π_j (1 − α_j),
which is essentially the residue of ζ_C(s) at its pole s = 1, is the size of the
"Jacobian" J_C(k) of C over k.
So far all this can be proved by more-or-less elementary means, and even extends
to varieties over k of any dimension [Dwork 1960]. A much harder, but known,
result is that the Riemann hypothesis holds: P(q^{−s}) can vanish only for s such
that σ = 1/2, i.e. |Z| = q^{−1/2}; thus all the α_j have absolute value q^{1/2}, and
α_{g+j} = ᾱ_j. This theorem of Weil, and its generalization by Deligne to varieties
of arbitrary dimension over finite fields, is at least to some tastes the strongest
evidence so far for the truth of the original Riemann hypothesis and its various
generalizations.
It also has numerous applications. For instance, it follows immediately that the
number N_1 = N_1(C) of k-rational points on C is approximated by q + 1:

  |N_1 − (q + 1)| ≤ 2g√q.  (W)

Equality can hold in this Weil bound at least for small g, though already for
g = 1 there are surprises; for instance for q = 128 (W) allows N_1 to be as large as
151 and as small as 107, but in fact the most and least possible are 150 and 108.
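For g = 1 the Weil bound is easy to test by brute force. The sketch below counts points on a few arbitrarily chosen smooth curves y² = x³ + ax + b over small prime fields (the point at infinity included) and checks |N_1 − (p + 1)| ≤ 2√p.

```python
import math

def count_points(p, a, b):
    """Number of F_p-points on y^2 = x^3 + a x + b, including the point at infinity."""
    squares = {(y * y) % p for y in range(p)}
    n = 1                                  # the point at infinity
    for x in range(p):
        v = (x * x * x + a * x + b) % p
        if v == 0:
            n += 1                         # one point with y = 0
        elif v in squares:
            n += 2                         # two points, y and -y
    return n

checked = []
for p in (101, 103, 127):
    for a, b in ((1, 1), (2, 3), (5, 7)):
        assert (4 * a**3 + 27 * b**2) % p != 0   # cubic squarefree, so genus 1
        N1 = count_points(p, a, b)
        checked.append(abs(N1 - (p + 1)) <= 2 * math.sqrt(p))
```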
See [Serre 1982-4] for much more about this. We ask however what happens for
fixed q as g → ∞: how large can N_1(C) grow as a function of g? This is not only
a compelling problem in its own right, but has applications to coding theory
and similar combinatorial problems; see for instance [Goppa 1981,3; Tsfasman
1996]. We shall see that the bound N_1 < 2g√q + O_q(1) coming from (W) cannot
be sharp, and obtain an improved bound, the Drinfeld-Vladut bound

  N_1 < (√q − 1 + o(1)) g,  (DV)

[DV 1983], that turns out to be best possible for square q [Ihara 1981, TVZ
1982]. Moreover we shall adapt Weyl's equidistribution argument to obtain the
asymptotic distribution of the α_j on the circle |α|² = q for curves attaining that
bound.
The key idea is much the same as what we used to prove that ζ(1 + it) ≠ 0. To
start with, note that if the Weil upper bound N_1 ≤ q + 1 + 2g√q is attained then
each α_j = −√q. This can actually happen: for instance, let q = q_0² and let C be
the (q_0+1)-st Fermat curve, i.e. the smooth plane curve x^{q_0+1} + y^{q_0+1} + z^{q_0+1} = 0
of degree q_0+1 and thus of genus (q_0² − q_0)/2. Then C has q_0³ + 1 points over k,
the maximum allowed by (W) [check this!]. But now consider this curve over
the quadratic extension F_{q²} of k: we have

  N_2 = q² + 1 − Σ_{j=1}^{2g} α_j² = q² + 1 − 2gq = q^{3/2} + 1 = N_1,

i.e. every point rational over F_{q²} is already defined over k! [It is an amusing
problem to verify this directly, without invoking the Riemann hypothesis for
C.] It follows that if g were any larger than (q − q_0)/2 and all the α_j were −q_0
then N_2 would actually be smaller than N_1, which is impossible.
So, we have

  0 ≤ N_2 − N_1 = q² − q + Σ_{j=1}^{2g} (α_j − α_j²)

and likewise

  0 ≤ N_n − N_1 = q^n − q + Σ_{j=1}^{2g} (α_j − α_j^n)

for each n = 2, 3, 4, … (We also have inequalities N_{dn} ≥ N_n but these do not
help us asymptotically.) How best to combine them? For given q, g this is not
an easy problem, but if we fix q and only care about asymptotics as g → ∞ then
all we need do is use the inequality

  0 ≤ |Σ_{m=0}^{M−1} (α_j/√q)^m|² = M + Σ_{m=1}^{M−1} (M − m) q^{−m/2} (α_j^m + α_{g+j}^m)

for each M. Summing this over j ≤ g we find

  0 ≤ Mg + Σ_{m=1}^{M−1} (M − m) q^{−m/2} (q^m + 1 − N_m)
    ≤ Mg + Σ_{m=1}^{M−1} (M − m) q^{−m/2} (q^m + 1 − N_1)
    = Mg + O_M(1) − N_1 Σ_{m=1}^{M−1} (M − m) q^{−m/2},

whence

  N_1 < g / Σ_{m=1}^{M−1} (1 − m/M) q^{−m/2} + O_M(1).

For each ε > 0, the sum can be brought within ε of

  Σ_{m=1}^{∞} q^{−m/2} = 1/(√q − 1)

by taking M large enough. We thus have for each ε > 0

  N_1 < (√q − 1 + ε) g + O_ε(1),

from which (DV) follows.
What is required for asymptotic equality as C ranges over a sequence of curves
with g → ∞? Let α_j = √q e(x_j) for x_j ∈ R/Z with x_{j+g} = −x_j. Then

  N_n = q^n + 1 − q^{n/2} Σ_{j=1}^{2g} e(nx_j).

Since N_n ≥ N_1 is used for each n, we must have N_n = N_1 + o_n(g), and thus

  Σ_{j=1}^{2g} e(nx_j) = q^{(1−n)/2} Σ_{j=1}^{2g} e(x_j) + o_n(g).

Moreover

  Σ_{j=1}^{2g} e(x_j) = −(1 − q^{−1/2}) g + o(g).

Adapting the Weyl equidistribution argument (see especially exercise 2 of the
Weyl handout) we conclude that the x_j approach the distribution whose n-th
Fourier moment (n ≠ 0) is −(1 − q^{−1/2}) q^{(1−|n|)/2}/2, i.e. μ_q(x) dx where the
density μ_q is

  μ_q(x) = 1 − (1 − q^{−1/2}) Σ_{n=1}^{∞} q^{(1−n)/2} (e(nx) + e(−nx))/2.

Since

  (1 − q^{−1/2}) Σ_{n=1}^{∞} q^{(1−n)/2} = 1,

this density is nonnegative, so it can be attained and (DV) is asymptotically the
best inequality that can be obtained from N_n ≥ N_1. In fact it is known [Ihara
1981, TVZ 1982] that when q is a square² there are curves with arbitrarily large
g for which N_1 ≥ (√q − 1) g; our proof of (DV) gives the asymptotic distribution
of the α_j on the circle |α|² = q for any such sequence. It also lets us compute the
size #J = Π_{j=1}^{2g} (1 − α_j) of the Jacobian in a logarithmic asymptotic sense:

  g^{−1} log #J → log q + 2 ∫_0^1 log|1 − q^{−1/2} e(x)| μ_q(x) dx.  (J)

Such formulas are needed to determine the asymptotic performance of families
of codes or lattices constructed as in [Tsfasman 1996] from the curves of [Ihara
1981, TVZ 1982].
² When q is not a square, limsup_{g→∞} N_1(C)/g(C) is known to be positive (see e.g. [Serre
1982-1984]), but its value is still a great mystery even for q = 2.


1. Verify that if q_0 is a prime power then the Fermat curve of degree q_0 + 1 has
q_0³ + 1 rational points over the field of q_0² elements.
2. Suppose C is the Fermat curve x^r + y^r + z^r = 0 over F_q (not assuming the
special case q = (r − 1)²). Write N_n in terms of characters on F_q and identify the
eigenvalues of Frobenius with Jacobi sums. (This yields an elementary proof of
|α_j|² = q for Fermat curves.)
3. What is the best upper bound that can be obtained on N_1 using only the
inequality N_1 ≤ N_2?
4. Compute μ_q(x) and the integral (J) in closed form. Generalize to obtain, for
each s ∈ C of real part ≠ 1/2, a closed form for lim_{g→∞} ζ_C(s) as C ranges over
a family of curves over F_{q²} with N_1(C)/g(C) → q − 1.
[DV 1983] Drinfeld, V.G., Vladut, S.: Number of points of an algebraic curve,
Func. Anal. 17 (1983), 53-54.
[Dwork 1960] Dwork, B.M.: On the rationality of the zeta function of an alge-
braic variety. Amer. J. Math. 82 (1960), 631-648.
[Goppa 1981,3] Goppa, V.D.: Codes on algebraic curves, Soviet Math. Dokl. 24
(1981), 170-172; Algebraico-geometric codes, Math. USSR Izvestiya 24 (1983).
[Ihara 1981] Ihara, Y.: Some remarks on the number of rational points of alge-
braic curves over finite fields. J. Fac. Sci. Tokyo 28 (1981), 721-724.
[Serre 1982-4] Serre, J.-P.: Sur le nombre des points rationnels d'une courbe
algébrique sur un corps fini; Nombres de points des courbes algébriques sur F_q;
Résumé des cours de 1983-1984; reprinted as ##128,129,132 in his Collected
Works III [O 9.86.1 (III) / QA3.S47]
[Tsfasman 1996] Tsfasman, M.A.: Algebraic Geometry Lattices and Codes,
pages 385-390 in the proceedings of ANTS-II (second Algorithmic Number
Theory Symposium), ed. H. Cohen, Lecture Notes in Computer Science 1122
[QA75.L4 #1122 in the McKay Applied Science Library].
[TVZ 1982] Tsfasman, M.A., Vladut, S.G., Zink, T.: Modular curves, Shimura
curves and Goppa codes better than the Varshamov-Gilbert bound. Math.
Nachr. 109 (1982), 21-28.

Math 259: Introduction to Analytic Number Theory
Stark's lower bound on |disc(K/Q)|

Let K be a number field of degree n = r_1 + 2r_2, and let D_K be its discriminant.
Minkowski introduced his "geometry of numbers" method to prove that

  |D_K| ≥ (π/4)^{2r_2} (n^n/n!)²;  (M)

see for instance Chapter 5 (page 136 and environs) of [Marcus 1977]. It follows
that |D_K| > 1 once n > 1, i.e. every nontrivial extension of Q has a ramified
prime, which for instance is a key ingredient in the Kronecker-Weber Theorem
(see e.g. [Marcus 1977, p.125 ff.]). Moreover, by Stirling the bound (M) grows
exponentially with n:

  log|D_K| ≥ αn + βr_1 − o(n)  (L)

holds with

  α = 2 − log(4/π) = 1.7584+,   β = log(4/π) = 0.2416−.
It is then natural to ask: how does the actual min_K |D_K| grow as a function of
n or of r_1, r_2? Even the fact that the minimum grows no faster than exp O(n)
is nontrivial; it's hard to write explicit examples of number fields with |D_K|
growing slower than n^{cn}. The first, and apparently still the only, way to prove
that fields with log|D_K| ≪ [K : Q] exist is Hilbert class field towers and varia-
tions thereon, see for instance Roquette's "On Class Field Towers", Chapter IX
of [CF 1967]. Meanwhile the lower bounds, always of the form (L) but with
larger α, β, have been improved using various approaches. Remarkably the best
bounds now known are obtained by using the functional equation for ζ_K:

  ξ_K(s) := Γ(s/2)^{r_1} Γ(s)^{r_2} (4^{−r_2} π^{−n} |D_K|)^{s/2} ζ_K(s) = ξ_K(1 − s),

in which |D_K| occurs; combining this with the Hadamard product for ξ_K(s) yields
a formula for |D_K|. From the resulting partial-fraction decomposition of
(ξ′_K/ξ_K)(s) we find

  log|D_K| = r_1 (log π − ψ(s/2)) + 2r_2 (log 2π − ψ(s))  (D)
    + 2 Re Σ_ρ 1/(s − ρ) − 2/(s − 1) − 2/s − 2(ζ′_K/ζ_K)(s)

for real s, where ψ = Γ′/Γ and ρ runs over the nontrivial zeros of ζ_K(s). The
idea of using this to bound log|D_K| is due to Stark [Stark 1974]. Naturally the
resulting bounds are both better and easier to prove if we assume GRH for ζ_K.
Then Stark obtained α = 3.8014− and β = π/2 = 1.5708− in (L). We present
a simplification of his proof of this result:

Proposition: Let K be a number field of absolute degree n = r_1 + 2r_2 which
satisfies the Generalized Riemann Hypothesis. Then

  log|D_K| > (log 8π + C − o(1)) n + (π/2) r_1  (S)

as n → ∞, where C = −Γ′(1) = 0.577… is Euler's constant.
Proof: Start from (D), and use the fact that −ζ′_K/ζ_K and its derivatives of even
order with respect to s are positive for s > 1, and the derivatives of odd order
negative. Thus by differentiating (D) m times (m = 1, 2, 3, …) we find

  0 > (−1)^m [r_1 (d^m/ds^m)(log π − ψ(s/2)) + 2r_2 (d^m/ds^m)(log 2π − ψ(s))]  (>)
    + m! [2 Re Σ_γ 1/(s − (1/2 + iγ))^{m+1} − 2/(s − 1)^{m+1} − 2/s^{m+1}],

where ρ = 1/2 + iγ runs over the nontrivial zeros. Our idea is now that for fixed
s > 1 and large n the term in (s − 1)^{−(m+1)} is negligible, and so by dividing the
rest of (>) by 2^m m! and summing over m we obtain (D) with s replaced by s − 1/2
(Taylor expansion about s); since Re(1/(s − 1 − iγ)) is still positive, we then find
by bringing s arbitrarily close to 1 that

  log|D_K| > r_1 (log π − ψ(1/4)) + 2r_2 (log 2π − ψ(1/2)) − o(n),

and thus obtain our Proposition from the known¹ special values

  ψ(1/2) = −log 4 − C,   ψ(1/4) = −log 8 − π/2 − C.
To make this rigorous, we argue as follows: for any small ε > 0, take s_0 = 1 + ε,
and pick an integer M so large that (i) the values at s = s_0 − 1/2 of the M-th
partial sums of the Taylor expansions of ψ(s) and ψ(s/2) about s = s_0 are
within ε of ψ(s_0 − 1/2) and ψ(s_0/2 − 1/4) respectively (this is possible because
both functions are analytic in a circle of radius > 1/2 about s_0); (ii) the
value at s = s_0 − 1/2 of the M-th partial sum of the Taylor expansion of
Re(1/(s − 1/2 − iγ)) about s = s_0 is positive for all γ > 0 (note that since
Re(1/(s_0 − 1 − iγ)) = ε/(ε² + γ²), and the value of the M-th partial sum of the
Taylor expansion differs from this by

  Re ([1 + 2(ε − iγ)]^{−M} / (ε + iγ)) ≪ (1 + ε² + γ²)^{−M/2},

it's clear that the positive value ε/(ε² + γ²) dominates the error (1 + ε² + γ²)^{−M/2}
for all γ once M is sufficiently large). Now divide (>) by 2^m m!, sum from m = 0
¹ From the infinite product for Γ(s) we have ψ(s) + C = Σ_{n=0}^{∞} [1/(n + 1) − 1/(n + s)]. Thus
ψ(1/2) + C = −2 log 2, while ψ(1/4) − ψ(3/4) = −π and ψ(1/4) + ψ(3/4) + 2C = −6 log 2
(why?), from which ψ(1/4) = −log 8 − π/2 − C follows.

to M − 1 (using (D) for the m = 0 term), and set s = s_0 to obtain

  log|D_K| > r_1 [log π − ψ(s_0/2 − 1/4) − ε] + 2r_2 [log 2π − ψ(s_0 − 1/2) − ε] + O(1);

since ε was arbitrarily small and s_0 arbitrarily close to 1, we're done. □
See [Odlyzko 1975] for the use of Stark's method to obtain good lower bounds on
|D_K| for specific finite r_1, r_2.
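The digamma special values quoted in the footnote can be checked numerically from the series ψ(s) + C = Σ_{n≥0} (1/(n + 1) − 1/(n + s)); the truncation point below is an arbitrary choice, with tail of size O(1/terms).

```python
import math

def psi_plus_gamma(s, terms=1_000_000):
    """Partial sum of psi(s) + C = sum_{n>=0} (1/(n+1) - 1/(n+s))."""
    return sum(1.0 / (n + 1) - 1.0 / (n + s) for n in range(terms))

val_half = psi_plus_gamma(0.5)      # should approach -log 4
val_quarter = psi_plus_gamma(0.25)  # should approach -log 8 - pi/2
```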

1. Find a constant B such that (S) holds under the weaker hypothesis that
ζ_K has no zeros ρ with Re ρ ≠ 1/2 and 0 < |Im ρ| < B.
2. Assume that the rational prime 2 splits completely in K, i.e. that the Euler
product for ζ_K contains a factor (1 − 2^{−s})^{−n}. Assuming GRH for ζ_K, find a lower
bound (L) with larger α, β. [NB even under this more restrictive condition,
class field theory yields towers of fields K whose root-discriminant |D_K|^{1/n} is
bounded, indeed constant.] Generalize.
[CF 1967] Cassels, J.W.S., Fröhlich, A., eds.: Algebraic Number Theory. Lon-
don: Academic Press 1967. [AB 9.67.2 (Reserve) / QA 241.A42]
[Marcus 1977] Marcus, D.A.: Number Fields. New York: Springer 1977. [AB
9.77.1 (Reserve) / QA247.M346]
[Odlyzko 1975] Odlyzko, A.M.: Some Analytic Estimates of Class Numbers and
Discriminants. Inventiones Math. 29 (1975), 275-286.
[Serre 1975] Serre, J.-P.: Minorations de discriminants. #106 (pages 240-243)
in his Collected Works III [O 9.86.1 (III) / QA3.S47]. See also page 710 for ref-
erences to results concerning specific (r_1, r_2) with or without GRH, and page 660
for the analogy with the Drinfeld-Vladut estimates.
[Stark 1974] Stark, H.M.: Some Effective Cases of the Brauer-Siegel Theorem.
Inventiones Math. 23 (1974), 135-152.

Math 259: Introduction to Analytic Number Theory
An application of Kloosterman sums

As promised, here is the analytic lemma from [Merel 1996]. The algebraic
exponential sum that arises naturally here will also arise in our investigation of
the coefficients of modular forms.
Fix a prime p and a nonzero c mod p. [More generally we might ask the same
question for any integer N and c ∈ (Z/N)*; see Exercise 1 below.] Let I, J ⊂
(Z/p)* be intervals of sizes A, B < p. How many solutions (x, y) ∈ I × J are there
to xy ≡ c mod p?
As usual we cannot reasonably hope for a meaningful exact answer, but on
probabilistic grounds we expect the number to be roughly AB/p, and can ana-
lytically bound the difference between this and the actual number:
Lemma 1. The number M of solutions (x, y) ∈ I × J of xy ≡ c mod p is
AB/p + O(p^{1/2} log² p).
Proof: Let φ, ψ: (Z/p) → C be the characteristic functions of I, J. Then the
number of solutions to our equation is

  M = Σ_{n∈(Z/p)*} φ(n) ψ(cn^{−1}).

As it stands, this "formula" for M is just restating the problem. But we may
expand φ, ψ in discrete Fourier series:

  φ(x) = Σ_{a mod p} φ̂(a) e_p(ax),   ψ(x) = Σ_{b mod p} ψ̂(b) e_p(bx),

where NB for t ∈ Z/p the notation e_p(t) now means e(t/p) = e^{2πit/p}, not
e(pt) as before. So, we have

  M = Σ_{x∈(Z/p)*} Σ_{a,b mod p} φ̂(a) ψ̂(b) e_p(ax + bcx^{−1}) = Σ_{a,b mod p} φ̂(a) ψ̂(b) K_p(a, bc),  (M)

where for a, b mod p the Kloosterman sum K_p(a, b) is defined by

  K_p(a, b) := Σ_{n∈(Z/p)*} e_p(an + bn^{−1}).

Now clearly K_p(0, 0) = p − 1, and almost as clearly K_p(0, b) = K_p(a, 0) = −1 if
a, b ≠ 0. The interesting case is a, b ∈ (Z/p)*. We now encounter yet another
algebraic result that we must cite without proof, again due to Weil [Weil 1948]:
for a, b nonzero,

  |K_p(a, b)| < 2√p.  (K)
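These sums take only a moment to compute for small p, so the degenerate values and the bound (K) can be checked directly; p = 101 and the (a, b) pairs below are arbitrary choices.

```python
import cmath
import math

def kloosterman(p, a, b):
    """K_p(a, b) = sum over n in (Z/p)* of e_p(a n + b n^{-1})."""
    s = sum(cmath.exp(2j * math.pi * ((a * n + b * pow(n, -1, p)) % p) / p)
            for n in range(1, p))
    return s.real   # n -> -n pairs each term with its conjugate, so the sum is real

p = 101
k00 = kloosterman(p, 0, 0)        # equals p - 1
k30 = kloosterman(p, 3, 0)        # equals -1
weil_ok = all(abs(kloosterman(p, a, b)) < 2 * math.sqrt(p)
              for a, b in ((1, 1), (2, 7), (5, 11)))
```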

[This comes from an interpretation of K_p(a, b) as α + ᾱ where α is an eigenvalue
of Frobenius for the "Artin-Schreier curve" Y^p − Y = aX + b/X, though even
the connection with that curve is nontrivial; see [Weil 1948] again, which as
usual generalizes to finite fields which need not have prime order.] Putting this
into (M) we find

  |M − φ̂(0) ψ̂(0)(p − 1)| < 2√p Σ_{a mod p} |φ̂(a)| · Σ_{b mod p} |ψ̂(b)|.

But φ̂(0) = A/p and ψ̂(0) = B/p. For nonzero a, b mod p we obtain φ̂(a), ψ̂(b)
as sums of geometric series and find (as in Polya-Vinogradov)

  p|φ̂(a)| ≪ {a/p}^{−1},   p|ψ̂(b)| ≪ {b/p}^{−1}.

Thus Σ_a |φ̂(a)| and Σ_b |ψ̂(b)| are ≪ log p, and Lemma 1 is proved. □
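A quick count illustrates Lemma 1; the prime p, the target c, and the interval endpoints below are arbitrary test choices. Since for each x there is exactly one y with xy ≡ c, the count M never exceeds min(A, B), and one can compare it with the prediction AB/p ≈ 200.6 directly.

```python
import math

p, c = 997, 5
I = range(100, 500)        # interval of size A = 400 inside (Z/p)*
J = range(300, 800)        # interval of size B = 500
A, B = len(I), len(J)

# for each x there is exactly one y with x*y = c mod p; test whether it lands in J
M = sum(1 for x in I if (c * pow(x, -1, p)) % p in J)

# brute-force double loop as a cross-check of the same count
M_brute = sum(1 for x in I for y in J if (x * y - c) % p == 0)
```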
Corollary. ("Lemme 5" of [Merel 1996]) If AB is a sufficiently high multiple
of p^{3/2} log² p then there are x ∈ I, y ∈ J such that xy ≡ c mod p.
For instance it is enough for A, B to both be sufficiently high multiples of
p^{3/4} log p. Presumably p^{1/2+ε} suffices, but as far as I know even p^θ for any
θ < 3/4 is a difficult problem. We can, however, remove the log factors from
the Corollary:
Lemma 2. Suppose I, J ⊂ (Z/p)* are intervals of sizes A, B with AB ≥
8p^{5/2}/(p − 1). Then there are x ∈ I, y ∈ J such that xy ≡ c mod p.
Proof: The idea is to replace φ, ψ by functions f, g: (Z/p) → [0, 1] supported
on I, J whose discrete Fourier coefficients decay more rapidly than p^{-1}{a/p}^{−1},
and sum to O(1) instead of O(log p). This will yield an estimate on

  M′ := Σ_{n∈(Z/p)*} f(n) g(cn^{−1})

instead of M, but if there are no solutions (x, y) ∈ I × J of xy ≡ c mod p then
M′ vanishes as well as M and a contradiction would arise just the same.
Let φ_0 be the characteristic function of an interval of size A_0 = ⌈A/2⌉, and let f_0
be the convolution φ_0 * φ_0. Then f_0 is a function from Z/p to [0, A_0], supported
on an interval of size ≤ A centered at the origin, and with nonnegative discrete
Fourier coefficients f̂_0(a). Thus

  Σ_{a mod p} |f̂_0(a)| = Σ_{a mod p} f̂_0(a) = f_0(0) = A_0.

Moreover Σ_{x mod p} f_0(x) = A_0². Let f be a translate of f_0/A_0 supported on I.
Then Σ_{a mod p} |f̂(a)| = 1 and Σ_{x mod p} f(x) = A_0. Define ψ_0, g_0, g similarly.
Arguing as before, we find that

  |M′ − ((p − 1)/p²) A_0 B_0| < 2√p.

Thus if M′ = 0 then A_0 B_0 < 2p^{5/2}/(p − 1) and AB ≤ 4A_0 B_0 < 8p^{5/2}/(p − 1),
contrary to hypothesis. □

0. Show that (unless a; b both vanish mod p) the Kloosterman sum Kp (a; b)
depends only on ab mod p. In particular Kp (a; b) 2 R.
1. For any integer N and any a; b 2 Z=N Z, de ne
KN (a; b) :=
X eN (an + bn?1):
2(Z=N )

Prove that if N is squarefree then KN (a; b) = pjN Kp (a; b). Deduce results
analogous to our Lemmas 1,2 for composite moduli. What can you say about
Kpr (a; b) for r > 1, and KN (a; b) for general N ?
2. Show using only the "Riemann hypothesis" for elliptic curves over F_p that
K_p(a,b) ≪ p^{3/4}. [Expand |K_p(a,b)|^2 and collect like terms. The point is that
while the bound is worse than (K), it is still effectively o(p), which suffices for
many purposes (including Merel's), while the proof is more elementary in that
RH for elliptic curves is easier to prove (and was already done by Deuring)
and the resulting bound on K_p(a,b) is obtained more directly than the one in
[Weil 1948].]
3. Let p be an odd prime, and χ = (·/p) the nontrivial real character mod p.
Evaluate the Salié sum

    S_p(a,b) := Σ_{n=1}^{p-1} χ(n) e_p(a n + b n^{-1})

in closed form.
As with Gauss sums, there is an analogy between Kloosterman sums and certain
definite integrals, in this case the integral ∫_0^∞ exp(-ax - b/x) dx/x, which gives
twice the Bessel function K_0(2√(ab)). The Salié sum is analogous to
∫_0^∞ exp(-ax - b/x) dx/√x, which involves a Bessel function K_{1/2} of half-integer
order and so (unlike the K_ν for ν ∈ Z) is known in closed form. See for instance
[GR 1980, 3.471 9. and 8.468] for the relevant formulas.
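For ν = 1/2 the cited formula specializes to ∫_0^∞ exp(-ax - b/x) dx/√x = √(π/a) e^{-2√(ab)} (using K_{1/2}(z) = √(π/2z) e^{-z}). A crude numerical quadrature (a sketch, with ad hoc step size and cutoff) confirms this closed form:

```python
from math import exp, pi, sqrt

def lhs(a, b, steps=200000, upper=60.0):
    # midpoint rule for int_0^upper exp(-a*x - b/x) / sqrt(x) dx;
    # the integrand vanishes rapidly at both endpoints
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += exp(-a * x - b / x) / sqrt(x) * h
    return total

def rhs(a, b):
    return sqrt(pi / a) * exp(-2 * sqrt(a * b))

for a, b in [(1.0, 1.0), (2.0, 0.5), (3.0, 2.0)]:
    assert abs(lhs(a, b) - rhs(a, b)) < 1e-4
```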
[GR 1980] Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series, and Prod-
ucts. New York: Academic Press 1980. [D 9.80.1 / basement reference]
[Merel 1996] Merel, L.: Bornes pour la torsion des courbes elliptiques sur les
corps de nombres. Invent. Math. 124 (1996), 437-449.
[Weil 1948] Weil, A.: On some exponential sums. Item 1948c (pages 386-389)
in his Collected Papers I [O 9.79.1 (I) / QA3.W43].
Math 259: Introduction to Analytic Number Theory
An upper bound on the coefficients of a PSL_2(Z) cusp form

Fix an integer k > 1, and let M_k^0 be the space of cusp forms of weight 2k for G =
PSL_2(Z). We have seen that this is a finite-dimensional vector space. Moreover
it carries a Hermitian (Petersson) pairing [Serre 1973, VII 5.6.1 (p.105)]:

    ⟨f, g⟩ = ∫∫ f(z) \overline{g(z)} y^{2k-2} dx dy,

the integral extending over a fundamental domain for H/G.
Now for each integer n > 0 the map taking a cusp form f = Σ_{m=1}^∞ a_m q^m to a_n
is a linear functional on M_k^0. Thus there is a unique P_n ∈ M_k^0 that represents
this functional:

    ⟨f, P_n⟩ = a_n(f)

for all f ∈ M_k^0. Moreover the P_n for n ≤ dim M_k^0 constitute a basis for M_k^0:
indeed the orthogonal complement of the linear span of these P_n is the subspace
of f ∈ M_k^0 whose first dim M_k^0 coefficients vanish, and we have seen that 0 is the
only such f. So, an upper bound a_r(P_n) ≪_n r^σ for all n ≤ dim M_k^0 will yield
a_r(f) ≪_f r^σ for all f ∈ M_k^0.
Remarkably we can obtain P_n and its q-expansion in an explicit enough form (a
"Poincaré series") to obtain such an inequality for all σ > k - 1/4, and the proof
uses Weil's bound [Weil 1948] on Kloosterman sums! (See for instance [Selberg
1965, §3]; thanks to Peter Sarnak for this reference and for first introducing
me to this approach. It is now known that in fact the correct σ is k - 1/2 + ε,
but Deligne's proof of this is quite deep, and is not as generally applicable: the
Poincaré-series method still yields the sharpest bounds known for many other
kinds of modular forms.)
We begin by observing that for any f(z) = Σ_{n=1}^∞ a_n q^n the coefficient a_n may
be isolated from the absolutely convergent double integral

    ∫_0^∞ ∫_0^1 e^{-2πin z̄} f(z) y^{2k-2} dx dy = a_n ∫_0^∞ e^{-4πny} y^{2k-2} dy = ((2k-2)!/(4πn)^{2k-1}) a_n.
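The y-integral here is a standard Gamma integral; for small k and n it can be checked numerically (a quick sketch, not part of the handout):

```python
from math import exp, factorial, pi

def gamma_integral(k, n, steps=400000, upper=8.0):
    # midpoint rule for int_0^upper exp(-4 pi n y) * y^(2k-2) dy;
    # the tail beyond `upper` is utterly negligible
    h = upper / steps
    return sum(exp(-4 * pi * n * ((i + 0.5) * h)) * ((i + 0.5) * h) ** (2 * k - 2) * h
               for i in range(steps))

def closed_form(k, n):
    return factorial(2 * k - 2) / (4 * pi * n) ** (2 * k - 1)

for k, n in [(2, 1), (3, 1), (2, 2)]:
    assert abs(gamma_integral(k, n) - closed_form(k, n)) < 1e-7
```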
Now the region we're integrating over is a fundamental domain for the action
of ⟨T⟩ on H. We decompose this as the union (with only boundary overlaps) of
G-images of the fundamental domain for the action of G. That is, we split up
the integral as

    Σ_g ∫∫_D \overline{e^{2πin g(z)}} f(g(z)) y^{2k}(g(z)) dx dy / y^2,

where D is a fundamental domain for H/G and g ranges over coset representa-
tives of ⟨T⟩\G. But these cosets amount to coprime pairs (c, d) of integers with
c > 0 or (c, d) = (0, 1). Moreover, we have for g(z) = (az + b)/(cz + d):

    f(g(z)) y^{2k}(g(z)) = (cz + d)^{2k} y^{2k}(g(z)) f(z) = f(z) y^{2k}(z) / (c z̄ + d)^{2k}.
So, we find

    ((2k-2)!/(4πn)^{2k-1}) a_n = ∫∫_D f(z) Σ_{c,d} (c z̄ + d)^{-2k} exp(-2πin (a z̄ + b)/(c z̄ + d)) y^{2k-2} dx dy.

Therefore the double sum is (4πn)^{1-2k} (2k-2)! \overline{P_n}, provided we can show that it
is in fact a cusp form; which, however, is surprisingly easy. [To do away with
the requirement that c > 0 or (c, d) = (0, 1) we may sum over all coprime pairs
(c, d), then divide by 2. The sum converges absolutely because it is dominated
by the sum defining the Eisenstein series E_k: the factors e(n g(z)) all have
absolute value < 1.] We thus have:

    ((4πn)^{2k-1}/(2k-2)!) P_n(z) = Σ_{c,d} (cz + d)^{-2k} exp(2πin (az + b)/(cz + d)).   (∗)

(Note that the exponential factor does not depend on the choice of a, b ∈ Z such
that ad - bc = 1.)
We next determine the q-expansion of the Poincaré series P_n. The term (c, d) =
(0, 1) contributes q^n to the sum. We group the remaining terms according to c
and d mod c. [The existence of a q-expansion is equivalent to T-invariance, so to
obtain the q-expansion we collect the (az + b)/(cz + d) into ⟨T⟩-orbits, which is to
say that we now consider P_n as a sum over the double coset space ⟨T⟩\G/⟨T⟩.]
Fix coprime c, d_0 with c > 0, and a_0, b_0 such that a_0 d_0 - b_0 c = 1. Then the terms
of the sum (∗) with d ≡ d_0 mod c have (a, b, c, d) = (a_0, b_0 + m a_0, c, d_0 + mc)
for m ∈ Z, and thus contribute

    Σ_{m∈Z} (c(z + m) + d_0)^{-2k} exp(2πin (a_0(z + m) + b_0)/(c(z + m) + d_0)).   (∗∗)
By Poisson summation this is Σ_{r∈Z} u_r, where

    u_r := ∫_{-∞}^∞ (c(z + t) + d_0)^{-2k} exp(2πin (a_0(z + t) + b_0)/(c(z + t) + d_0)) e^{-2πirt} dt.   (∗∗∗)

If r ≤ 0 the integrand extends to a holomorphic function on Im t ≥ 0 bounded
by |c(z + t) + d_0|^{-2k} ≪ |t|^{-2k}, and thus the integral vanishes by a standard
contour integration. So we need only consider (∗∗∗) for r > 0. In that case, let
w = z + t + (d_0/c). We then find

    u_r = c^{-2k} q^r e^{2πi(n a_0 + r d_0)/c} ∫_C e^{-2πi(rw + n/(c^2 w))} w^{-2k} dw,   (♦)
with the contour of integration C passing above the essential singularity at
w = 0. Note that the integral depends only on n, r, c but not on d_0; the de-
pendence on d_0 is entirely contained in the factor e^{2πi(n a_0 + r d_0)/c}, in which a_0 is
the multiplicative inverse of d_0 mod c. Summing over c, d_0 we thus find that for
r ≠ n the q^r coefficient of P_n is

    ((2k-2)!/(4πn)^{2k-1}) Σ_{c=1}^∞ c^{-2k} K_c(n, r) ∫_C e^{-2πi(rw + n/(c^2 w))} w^{-2k} dw.
The Kloosterman sum K_c(n, r) is O_n(c^{1/2+ε}) as seen already. The integral is
essentially a Bessel function: let -2πirw = v to get

    (-2πir)^{2k-1} ∫ exp(v - 4π^2 rn/(c^2 v)) v^{-2k} dv = (-1)^k 2π (c √(r/n))^{2k-1} J_{2k-1}(4π√(rn)/c)

([GR 1980, 8.412 2.], taken from [Watson 1944]). So the q^r coefficient of P_n is

    ≪_{k,n,ε} Σ_{c=1}^∞ c^{-1/2+ε} r^{k-1/2} |J_{2k-1}(4π√(rn)/c)|.   (†)

Now it is known ([GR 1980, 8.451], again from [Watson 1944]) that J_{2k-1}(x) ≪
x^{-1/2} for large x > 0 while J_{2k-1}(x) ≤ C_k x^{2k-1} for small x > 0. Splitting the
sum in (†) around c = √r we find that both parts are ≪ r^{1/4+ε}. Thus each P_n
has q^r coefficient O_ε(r^{k-1/4+ε}) as claimed, and we are done. (whew!)
1. Show that the modular cusp form Δ(q) = q - 24q^2 + 252q^3 - ⋯ of weight 12
(called (2π)^{-12}Δ in [Serre 1973]) is given by the formula

    Δ(q) = (Σ_{n>0} χ_4(n) n q^{n^2/8})^8 = (q^{1/8} - 3q^{9/8} + 5q^{25/8} - 7q^{49/8} + - ⋯)^8.

[NB the sum is essentially the modified theta series that we used to prove
the functional equation for L(s, χ_4). Note that it is a "modular cusp form
of weight 3/2" for some arithmetic subgroup of G whose q^{n/8} coefficients are
O(n^{1/4}) in mean square but only O(n^{1/2}), not O(n^{1/4+ε}), individually; so
here the P_n-type bound is essentially best possible.] Using the Jacobi product
for Δ, conclude that

    Π_{n≥1} (1 - q^n)^3 = 1 - 3q + 5q^3 - 7q^6 + 9q^{10} - + ⋯.
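Both identities of this exercise can be checked to high order by truncated power-series arithmetic, granting the Jacobi product Δ = q Π_{n≥1}(1 - q^n)^{24} that the exercise invokes (a sketch; the truncation orders N, M are arbitrary):

```python
# Power series are lists of coefficients; u stands for q^(1/8).
N = 200

def mul(f, g, length):
    h = [0] * length
    for i, a in enumerate(f):
        if a:
            for j, b in enumerate(g):
                if b and i + j < length:
                    h[i + j] += a * b
    return h

# theta = sum over odd n>0 of chi_4(n)*n*u^(n^2) = u - 3u^9 + 5u^25 - ...
theta = [0] * N
n = 1
while n * n < N:
    theta[n * n] = (-1) ** ((n - 1) // 2) * n
    n += 2
theta8 = [1] + [0] * (N - 1)
for _ in range(8):
    theta8 = mul(theta8, theta, N)

# Delta(u^8) = u^8 * prod_{n>=1} (1 - u^(8n))^24, truncated mod u^N
delta = [0] * N
delta[8] = 1
for n in range(1, N // 8 + 1):
    factor = [1] + [0] * (N - 1)
    if 8 * n < N:
        factor[8 * n] = -1
    for _ in range(24):
        delta = mul(delta, factor, N)

assert theta8 == delta                       # Delta = (theta series)^8
assert delta[8] == 1 and delta[16] == -24 and delta[24] == 252

# Jacobi: prod (1-q^n)^3 = 1 - 3q + 5q^3 - 7q^6 + 9q^10 - ...
M = 30
prod3 = [1] + [0] * (M - 1)
for n in range(1, M):
    factor = [1] + [0] * (M - 1)
    factor[n] = -1
    for _ in range(3):
        prod3 = mul(prod3, factor, M)
rhs = [0] * M
m = 0
while m * (m + 1) // 2 < M:
    rhs[m * (m + 1) // 2] = (-1) ** m * (2 * m + 1)
    m += 1
assert prod3 == rhs
```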
2. Let f = Σ_{n>0} a_n q^n be a cusp form of weight 2k. Use the boundedness of
y^{2k}|f(z)|^2 to prove that Σ_{n<N} |a_n|^2 ≪_f N^{2k}. [In other words a_n ≪ n^{k-1/2} in
mean square. Note that Hecke's estimate |a_n| ≪_f n^k follows immediately.]
3. Let f = Σ_{n>0} a_n q^n be a cusp form of weight 2k, and let L_f(s) = Σ_n a_n n^{-s}
be the associated L-function (called Φ_f(s) in [Serre 1973, p.103]). Use the
integral representation of L_f to prove that L_f(σ + it) ≪_{f,σ} |t|^{κ(σ)} for some
κ(σ) < ∞. How small a κ(σ) can you obtain? [As usual, it is conjectured à la
Lindelöf that L_f(σ + it) ≪_{f,σ,ε} |t|^ε for all σ ≥ k.]
4. Verify that in fact P_n ∈ M_k^0.
5. Verify that our final estimate on the q^r coefficient follows from the J_{2k-1}
asymptotics cited. Since |K_c(n, r)|/√c is actually ≤ Π_{p|c} 2, which in turn is
bounded by the number of divisors of c, we can make the r^ε factor more precise;
show that in fact log r suffices, i.e. the q^r coefficient of a cusp form of weight 2k is
O(r^{k-1/4} log r).
6. For each even k = 2, 4, 6, … there is a unique f = Σ_{n=0}^∞ a_n q^n ∈ M_k of the
form 1 + O(q^{⌊k/6⌋+1}), i.e. such that a_0 = 1 and a_1 = a_2 = ⋯ = a_{⌊k/6⌋} = 0.
(Why?) Prove that a_{⌊k/6⌋+1} > 0. [This is a bit tricky, requiring the residue
formula and the fact that 1/Δ = q^{-1} + 24 + 324q + 3200q^2 + ⋯ has positive
coefficients, a fact that can be deduced from the Jacobi product for Δ.]
Conclude that an even unimodular lattice in dimension 4k has a vector of norm
at most 2(⌊k/6⌋ + 1).
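The cited expansion of 1/Δ, and the positivity of its coefficients, are easy to check to any finite order from the Jacobi product, since q/Δ = Π(1 - q^n)^{-24} and each factor (1 - q^n)^{-24} = Σ_j C(j+23, 23) q^{jn} has positive coefficients. A sketch:

```python
from math import comb

# q/Delta = prod_{n>=1} (1 - q^n)^{-24}, truncated mod q^M
M = 40
inv = [1] + [0] * (M - 1)
for n in range(1, M):
    # multiply by (1 - q^n)^{-24} = sum_{j>=0} C(j+23, 23) q^{jn}
    geom = [0] * M
    j = 0
    while j * n < M:
        geom[j * n] = comb(j + 23, 23)
        j += 1
    inv = [sum(inv[i] * geom[m - i] for i in range(m + 1)) for m in range(M)]

# q/Delta = 1 + 24q + 324q^2 + 3200q^3 + ..., i.e.
# 1/Delta = q^{-1} + 24 + 324q + 3200q^2 + ..., all coefficients positive
assert inv[:4] == [1, 24, 324, 3200]
assert all(c > 0 for c in inv)
```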
Can the minimal norm be that large? Such lattices exist for several small k, including
k = 2, 4, 6, …, 16, but it is known that for all but finitely many k the minimal norm
is always strictly smaller, indeed < 2(⌊k/6⌋ - ε) once k > k(ε) for some effectively
computable k(ε). This is shown by proving that there is no suitable modular form all
of whose coefficients are nonnegative. Still many open questions remain; for instance
it is not even known whether there is an even unimodular lattice of dimension 72 and
minimal norm 8. How many minimal vectors would such a lattice have? See [CS 1993]
for more along these lines, especially p.194 and thereabouts.
[CS 1993] Conway, J.H., Sloane, N.J.A.: Sphere Packings, Lattices and Groups.
New York: Springer 1993.
[GR 1980] Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series, and Prod-
ucts. New York: Academic Press 1980. [D 9.80.1 / basement reference]
[Selberg 1965] Selberg, A.: On the estimation of Fourier coefficients of modular
forms, #33 (506-520) in his Collected Papers I [O 9.89.2 (I)].
[Serre 1973] Serre, J.-P.: A Course in Arithmetic. New York: Springer, 1973
(GTM 7). [AB 9.70.4 (reserve case) / QA243.S4713]
[Watson 1944] Watson, G.N.: A treatise on the theory of Bessel functions (2nd
ed.). New York: Macmillan 1944. [HA 9.44.6 / QA408.W2]
[Weil 1948] Weil, A.: On some exponential sums. Item 1948c (pages 386-389)
in his Collected Papers I [O 9.79.1 (I) / QA3.W43].
Math 259: Introduction to Analytic Number Theory
The Selberg (quadratic) sieve and some applications
For our last topic we return to an elementary and indeed naive approach to the
distribution of primes: an integer p is prime if and only if it is not divisible
by any of the primes ≤ √p; but half the integers are odd, 2/3 are not multiples of 3,
4/5 not multiples of 5, etc., and divisibility by any prime is independent of
divisibility by finitely many other primes, so… Moreover if p is restricted
to an arithmetic progression a mod q with (a, q) = 1 then the same factors
(l - 1)/l arise except those for which l | q, whence the appearance of q/φ(q) in
the asymptotic formula for π(qx; a mod q).
The problem with estimating π(x) etc. this way is that the divisibilities aren't
quite independent. This is already implicit in our trial-division test for primality:
if p is known to have no prime factors ≤ √p, the conditional probability that it be
a multiple of some other prime l ∈ (√p, p) is not 1/l but zero. Already for
small l, the number of n < x divisible by l is not quite x/l but x/l + O(1),
and similarly for n divisible by a product of distinct primes; so if we try to use
inclusion-exclusion to recover the number of primes, or even of n not divisible
by the first r primes, we get an estimate of x Π_p (1 - 1/p) as expected, but with
an "error term" O(2^r) that swamps the estimate long before r can get usefully
large.
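The inclusion-exclusion count just described is Legendre's: Σ_{d | p_1⋯p_r} μ(d)⌊x/d⌋ counts the n ≤ x divisible by none of the first r primes exactly, but it has 2^r terms, each ⌊x/d⌋ differing from x/d by O(1). A minimal sketch (not from the handout):

```python
from itertools import combinations
from math import prod

def legendre_count(x, primes):
    """Exact count of 1 <= n <= x divisible by none of `primes`,
    via inclusion-exclusion over the 2^r squarefree products d."""
    total = 0
    for k in range(len(primes) + 1):
        for subset in combinations(primes, k):
            total += (-1) ** k * (x // prod(subset))
    return total

primes = [2, 3, 5, 7, 11, 13]
x = 1000
direct = sum(1 for n in range(1, x + 1) if all(n % p for p in primes))
assert legendre_count(x, primes) == direct   # the identity is exact...
# ...while the "independence" heuristic x * prod(1 - 1/p) is only
# approximate, with 2^r floor-rounding errors hidden in the exact sum:
heuristic = x * prod(1 - 1 / p for p in primes)
```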
Still, in "sieve" situations like this, where we have a set S of A integers such
that #(S ∩ DZ)/A is approximated by a multiplicative function β(D) of the
squarefree integer D (for instance if S is an interval then β(D) = 1/D), there
are various ways of bounding from above the size of the set of n ∈ S not divisible
by any of a given set of primes. These different sieve inequalities use a variety of
methods, but curiously yield similar bounds in many important contexts, often
bounds that asymptotically exceed by a factor of 2 the expected number. We
shall develop one of the most general such inequalities, due to Selberg, and give
some typical examples of its use in analytic number theory. While we state
Selberg's sieve in the context of divisibility, in fact all that we are using is that
each prime p sifts out a subset of S and that the probabilities that a random
n ∈ S survives these tests for different p are approximately independent; thus
Selberg's sieve has a counterpart in the context of probability theory, for which
see the final exercise herein. Selberg's and many other sieves are collected in
[Selberg 1969]; nice applications of sieve inequalities to other kinds of problems
in number theory are interspersed throughout [Serre 1992].
Assume, then, that a_n (n ∈ Z) are nonnegative real numbers with Σ_{n∈Z} a_n =
A < ∞. For each squarefree d > 0 let

    A_d := Σ_m a_{md} = A β(d) + r(d),

where β is a multiplicative function with 0 ≤ β(d) ≤ 1 for each d (equivalently,
for each prime d). We are interested in

    A(D) := Σ_{(n,D)=1} a_n.

We hope that A(D) is approximately A Π_{p|D} (1 - β(p)), with an error that is
usefully small if the r(d) are. What we can show is:

Theorem (Selberg): For each z ≥ 1 we have

    A(D) ≤ A/S(D, z) + R(D, z),   (S)

where S, R are defined by

    S(D, z) := Σ_{d|D, d≤z} Π_{p|d} β(p)/(1 - β(p)),   R(D, z) := Σ_{d|D, d<z^2} 3^{ω(d)} |r(d)|,

and ω(d) := Σ_{p|d} 1.
Remarks: as z grows, R(D, z) increases while 1/S(D, z) decreases, tending as
z → ∞ to

    S(D, D) = Σ_{d|D} Π_{p|d} β(p)/(1 - β(p)) = Π_{p|D} (1 + β(p)/(1 - β(p))) = Π_{p|D} 1/(1 - β(p)),

so 1/S(D, z) → Π_{p|D} (1 - β(p)) as expected. Note, however, that (S) is only an
upper bound: we do not claim that |A(D) - A/S(D, z)| ≤ R(D, z).
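Here is a small numerical illustration of the Theorem (not from the handout): sift the interval 1 ≤ n ≤ N by the primes dividing D, with β(d) = 1/d, and compare A(D) against the bound (S). The parameter choices are arbitrary.

```python
from math import gcd, prod

primes = [2, 3, 5, 7]        # sift by the primes dividing D = 210
D = prod(primes)
N = 10000                    # a_n = indicator of 1 <= n <= N, so A = N
z = 12

def divisors(ps):            # divisors of the squarefree number prod(ps)
    out = [1]
    for p in ps:
        out += [d * p for d in out]
    return out

# beta(d) = 1/d, so beta(p)/(1 - beta(p)) = 1/(p-1);
# r(d) = A_d - N*beta(d) = floor(N/d) - N/d, so |r(d)| <= 1
S = sum(prod(1 / (p - 1) for p in primes if d % p == 0)
        for d in divisors(primes) if d <= z)
R = sum(3 ** sum(1 for p in primes if d % p == 0) * abs(N // d - N / d)
        for d in divisors(primes) if d < z * z)

A_D = sum(1 for n in range(1, N + 1) if gcd(n, D) == 1)
assert A_D <= N / S + R
# the "expected" main term, for comparison:
main = N * prod(1 - 1 / p for p in primes)
```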
Typically we will let D = D(y) = Π_{p≤y} p. For instance, if a_n is the characteristic
function of an arithmetic progression of length A with common difference q then
A(D(y)) is an upper bound on π(x_0 + Aq; a mod q) - π(x_0; a mod q). Of course
we are interested in the case (a, q) = 1. We take β(n) = 1/n_1, where n_1 is
the largest factor of n coprime to q. Then |r(d)| ≤ 1 for each d, and so
R(D, z) is bounded by the sum of the n^{-s} coefficients of ζ^3(s) for n ≤ z^2, so is
≪ (z log z)^2. [An equivalent and more elementary way to handle Σ_{n≤x} 3^{ω(n)} is
to note that 3^{ω(n)} is at most the number of representations n = n_1 n_2 n_3 of n as
a product of three positive integers.] As to S(D, z), we expand β/(1 - β) in a
geometric series to find

    S(D, z) > Σ_{n≤z, (n,q)=1} 1/n = (φ(q)/q) log z + O(1).

Thus Selberg's bound (S) is (q/φ(q)) A/log z + O(z^2 log^2 z), and by choosing
z = A^{1/2}/log^3 A we find the upper bound

    (q/φ(q)) (2 + O(log log A / log A)) A/log A.

The implied constant depends on q, but tractably so, without invoking zeros
of L-functions and the like; if the coefficient 2 were any smaller this would be
enough to banish the Siegel zero!
Proof of (S): Let λ_d (d|D) be arbitrary real parameters with λ_1 = 1 (and
eventually λ_d = 0 once d > z). Then

    A(D) ≤ Σ_n a_n (Σ_{d|(n,D)} λ_d)^2 = Σ_{d_1,d_2|D} λ_{d_1} λ_{d_2} Σ_{n: [d_1,d_2]|n} a_n,

where [d_1, d_2] := lcm(d_1, d_2). The inner sum is just A_{[d_1,d_2]}, so we have

    A(D) ≤ Σ_{d_1,d_2|D} λ_{d_1} λ_{d_2} (A β([d_1,d_2]) + r([d_1,d_2])) ≤ AQ + R,

where Q is the quadratic form

    Q := Σ_{d_1,d_2|D} β([d_1,d_2]) λ_{d_1} λ_{d_2}

in the λ_d, and

    R := Σ_{d_1,d_2|D} |λ_{d_1} λ_{d_2}| |r([d_1,d_2])|.

Now for d|D the number of pairs d_1, d_2 such that d = [d_1,d_2] is 3^{ω(d)} (why?);
thus (S) will follow from the following

Lemma: The minimum of the quadratic form Q subject to the conditions
λ_1 = 1 and (λ_d = 0 once d > z) is 1/S(D, z), and is attained by λ_d with |λ_d| ≤ 1.
Proof of Lemma: by continuity we may assume that 0 < β(d) < 1 for all d|D.
(In fact for our purpose we can exclude from the start the possibilities β(d) = 0
or 1; do you see why?) Since β is multiplicative and [d_1,d_2](d_1,d_2) = d_1 d_2,
we have

    Q = Σ_{d_1,d_2|D} [β(d_1)λ_{d_1}] [β(d_2)λ_{d_2}] / β((d_1, d_2)).

We diagonalize this form by introducing γ(e) for e|D, determined by

    1/β(d) = Σ_{e|d} γ(e);

indeed

    Q = Σ_{e|D} γ(e) [Σ_{d: e|d|D} β(d) λ_d]^2.
Let x(e), then, be defined by

    x(e) := Σ_{d: e|d|D} β(d) λ_d.

By Möbius inversion we find

    γ(e) = Π_{p|e} (1 - β(p))/β(p),   λ_d = (1/β(d)) Σ_{e: d|e|D} μ(e/d) x(e).

Our conditions on the λ_d then become

    Σ_{e|D} μ(e) x(e) = β(1)λ_1 = 1;   x(e) = 0 once e > z.

By Schwarz, the minimum of Q subject to these conditions is

    [Σ_{e|D, e≤z} 1/γ(e)]^{-1} = 1/S(D, z),

and is attained at x(e) = μ(e)/(γ(e) S(D, z)). This yields

    S(D, z) λ_d = (μ(d)/(β(d)γ(d))) Σ_{f|(D/d), f≤z/d} 1/γ(f).

But we have

    1/(β(d)γ(d)) = Σ_{e|d} 1/γ(e),

since β, γ are both multiplicative. Thus we have

    S(D, z) λ_d = μ(d) Σ_{e|d} Σ_{f|(D/d), f≤z/d} 1/γ(ef),

with each ef a divisor of D satisfying ef ≤ z, and no value of ef repeated. Thus
the sum is at most S(D, z), so |λ_d| ≤ 1 as claimed. This concludes the proof of
the Lemma, and thus also of Selberg's inequality (S).
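The closed form for the optimal λ_d just derived can be checked numerically: with β(d) = 1/d (so γ(e) = Π_{p|e}(p-1)) one can verify λ_1 = 1, |λ_d| ≤ 1, λ_d = 0 for d > z, and that Q attains 1/S(D, z). A sketch with arbitrary parameters:

```python
from math import gcd, prod

primes = [2, 3, 5, 7]              # D = 210; beta(d) = 1/d, so
z = 10                             # gamma(e) = prod over p|e of (p-1)
D = prod(primes)

def divisors(ps):
    out = [1]
    for p in ps:
        out += [d * p for d in out]
    return out

def fac(d):                        # prime factors of a squarefree d|D
    return [p for p in primes if d % p == 0]

gamma = {d: prod(p - 1 for p in fac(d)) for d in divisors(primes)}
mu = {d: (-1) ** len(fac(d)) for d in divisors(primes)}
S = sum(1 / gamma[e] for e in divisors(primes) if e <= z)

# S(D,z)*lambda_d = mu(d)/(beta(d)*gamma(d)) * sum_{f | D/d, f <= z/d} 1/gamma(f)
lam = {}
for d in divisors(primes):
    T = sum(1 / gamma[f] for f in divisors(primes)
            if (D // d) % f == 0 and f <= z / d)
    lam[d] = mu[d] * d / gamma[d] * T / S       # 1/beta(d) = d

assert abs(lam[1] - 1) < 1e-12
assert all(abs(v) <= 1 + 1e-12 for v in lam.values())
assert all(v == 0 for d, v in lam.items() if d > z)

# the quadratic form attains the minimum 1/S(D,z)
Q = sum(lam[d1] * lam[d2] / (d1 * d2 // gcd(d1, d2))
        for d1 in divisors(primes) for d2 in divisors(primes))
assert abs(Q - 1 / S) < 1e-9
```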

1. Prove that for each integer n > 0 the number of primes p < x such that p + 2n
is also prime is O_n(x/log^2 x). In particular, conclude that the sum

    1/3 + 1/5 + 1/5 + 1/7 + 1/11 + 1/13 + 1/17 + 1/19 + 1/29 + 1/31 + 1/41 + 1/43 + ⋯

of the reciprocals of twin primes converges.
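For illustration (not a proof), the partial sums of this series can be computed directly; they grow very slowly toward their limit, Brun's constant, which is about 1.902:

```python
def primes_upto(n):
    # simple sieve of Eratosthenes; is_p[m] == 1 iff m is prime
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if is_p[i]:
            is_p[i * i::i] = bytearray(len(is_p[i * i::i]))
    return is_p

N = 10 ** 5
is_p = primes_upto(N)
s = sum(1 / p + 1 / (p + 2)
        for p in range(3, N - 1) if is_p[p] and is_p[p + 2])
# convergence is extremely slow (roughly like 1/log x), but the
# partial sums stay below the limit ~1.902
assert 1.0 < s < 1.902
```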
2. Prove that there are at most ((8/π) + o(1)) x/log x integers n < x such that
n^2 + 1 is a prime. Generalize. [It is of course an outstanding open problem to
find a similar lower bound on the number of such n, or even to prove that it is
unbounded as x → ∞, i.e. to prove that there are infinitely many primes of the
form n^2 + 1; or more generally P(n), where P is an irreducible polynomial
such that P(n) ≢ 0 mod p has a solution mod p for each prime p. Dirichlet's
theorem is the case of linear polynomials; the conjecture has yet to be proven
for a single polynomial P(n) of degree 2 or greater.]
3. Let p_i (i ∈ [m] := {1, 2, …, m}) be probabilities, i.e. real numbers in [0,1],
and let E_1, …, E_m be events approximating independent events with those
probabilities, i.e. such that for each I ⊆ [m] the probability that E_i occurs for
all i ∈ I is Π_{i∈I} p_i + r(I). Obtain upper bounds on the probability that none
of the E_i occurs, bounds which correspond to and/or generalize Selberg's (S).
[Selberg 1969] Selberg, A.: Lectures on Sieves, pages 66-247 of his Collected
Papers II [O 9.89.2 (II)].
[Serre 1992] Serre, J.-P.: Topics in Galois Theory. Boston: Jones and Bartlett
1992. [BB 9.92.12 / QA214.S47]