
A course in

Analytic Number Theory


Taught by Barry Mazur Spring 2012

Last updated: April 24, 2012

Contents

1. January 24
2. January 26
3. January 31
4. February 2
5. February 7
6. February 9
7. February 14
8. February 16
9. February 21
10. February 23
11. March 1
12. March 6
13. March 8
14. March 20
15. March 22
16. March 27
17. March 29
18. April 3
19. April 5
20. April 10
21. April 12
22. April 17

Math 229

Barry Mazur

Lecture 1

1. January 24

1.1. Arithmetic functions and first example. An arithmetic function is a function $\mathbb{N} \to \mathbb{C}$, which we will denote $n \mapsto c(n)$. For example, let $r_{k,d}(n)$ be the number of ways $n$ can be expressed as a sum of $d$ $k$-th powers. Waring's problem asks what this looks like asymptotically. For example, we know $r_{3,2}(1729) = 2$ (Ramanujan). Our aim is to derive statistics for (interesting) arithmetic functions via a study of the analytic properties of generating functions that package them. Here is a generating function:

$$F_c(q) = \sum_{n=1}^{\infty} c(n) q^n$$

(You can think of this as a Fourier series, where $q = e^{2\pi i z}$.) You can retrieve $c(n)$ as a Cauchy residue:

$$c(n) = \frac{1}{2\pi i} \oint \frac{F_c(q)}{q^n} \frac{dq}{q}$$

This is the Hardy-Littlewood method. We can understand Waring's problem by defining

$$f_k(q) = \sum_{n=1}^{\infty} q^{n^k}$$

(the generating function for the sequence that tells you if $n$ is a perfect $k$-th power). Then $f_k(q)^d$ is the generating function of the thing we want (expanding the $d$-th power counts ordered representations):

$$f_k^d = \Big(\sum_{n=1}^{\infty} q^{n^k}\Big)^d = \sum_{n} r_{k,d}(n) q^n$$
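As a small sanity check of the generating-function identity, the following sketch expands $f_3(q)^2$ as a truncated power series and reads off the coefficient of $q^{1729}$. The function name `power_series_power` is ours, not from the lecture. Note that the coefficient counts ordered pairs of positive cubes, so it comes out as 4, twice Ramanujan's unordered count of 2.

```python
def power_series_power(k, d, limit):
    """Coefficients of f_k(q)^d up to q^limit, where f_k(q) = sum_{n>=1} q^(n^k)."""
    base = [0] * (limit + 1)
    n = 1
    while n ** k <= limit:
        base[n ** k] = 1
        n += 1
    result = [1] + [0] * limit  # the constant power series 1
    for _ in range(d):
        nxt = [0] * (limit + 1)
        for i, ci in enumerate(result):
            if ci == 0:
                continue
            for j in range(limit + 1 - i):
                if base[j]:
                    nxt[i + j] += ci
        result = nxt
    return result


coeffs = power_series_power(3, 2, 1729)
ordered = coeffs[1729]  # ordered pairs (a, b) with a^3 + b^3 = 1729
unordered = sum(
    1 for a in range(1, 13) for b in range(a, 13) if a ** 3 + b ** 3 == 1729
)
print(ordered, unordered)
```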

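Take the case when $k = 1$; you get a geometric series. If $k = 2$, then a theorem of Jacobi gives a product formula for the two-sided sum:

$$1 + 2 f_2(q) = \sum_{n \in \mathbb{Z}} q^{n^2} = \prod_{m=1}^{\infty} (1 - q^{2m})(1 + q^{2m-1})^2$$

Another example is the partition function: $p(n)$ is the number of ways that $n$ can be expressed as a sum of positive integers (with $p(0) = 1$):

$$\sum_{n=0}^{\infty} p(n) q^n = \prod_{m=1}^{\infty} \frac{1}{1 - q^m}$$

A third example is

$$q \prod_{m=1}^{\infty} (1 - q^m)^2 (1 - q^{11m})^2 = \sum_{n=1}^{\infty} a(n) q^n$$

If $p \neq 11$ is a prime, then $a(p) = 1 + p - |E(\mathbb{F}_p)|$ where $E$ is the curve $y^2 + y = x^3 - x^2$ (here $|E(\mathbb{F}_p)|$ is the number of points on the curve with values in $\mathbb{F}_p$, counting the point at infinity). When you replace $q = e^{2\pi i z}$ there's a symmetry where you can replace $z \mapsto -\frac{1}{11z}$.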

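1.2. Formal Dirichlet Series.

$$D_c(s) = \sum_{n=1}^{\infty} \frac{c(n)}{n^s}$$

Suppose we have two arithmetic functions $n \mapsto a(n)$ and $n \mapsto b(n)$. Define a product $a \star b = c$ where

$$c(n) = \sum_{d \mid n} a(d)\, b\!\left(\frac{n}{d}\right)$$

(Divisors are assumed always to be positive.) Why do we care? If you have Dirichlet series $D_a(s) = \sum \frac{a(n)}{n^s}$ and $D_b(s) = \sum \frac{b(n)}{n^s}$ then

$$D_{a \star b}(s) = \sum \frac{(a \star b)(n)}{n^s} = D_a(s) D_b(s)$$

The star product is commutative, associative, and has the identity

$$\delta(n) = \begin{cases} 1 & n = 1 \\ 0 & \text{else} \end{cases}$$

This is not the constant function $\mathbf{1} : n \mapsto 1$. The function $\mathbf{1}$ is invertible, and the inverse is the Moebius function

$$\mu(n) = \begin{cases} 0 & \text{if } n \text{ is not square-free} \\ (-1)^{\#\text{ of prime factors of } n} & \text{else} \end{cases}$$

Exercise: check that $\sum_{d \mid n} \mu(d) \cdot 1 = \delta(n)$, so that

$$\Big(\sum \frac{\mu(n)}{n^s}\Big)\, \zeta(s) = 1.$$

Definition 1.1. An arithmetic function is called multiplicative if $c(u \cdot v) = c(u) \cdot c(v)$ whenever $(u, v) = 1$. So

$$c(n) = \prod_{i=1}^{\nu} c(p_i^{k_i}) \quad \text{if } n = \prod_{i=1}^{\nu} p_i^{k_i}.$$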
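The star product and the exercise $\mu \star \mathbf{1} = \delta$ are easy to check by brute force. The sketch below is only an illustration; the helper names (`dirichlet_convolution`, `mobius`, and so on) are ours, and the divisor enumeration is deliberately naive.

```python
def divisors(n):
    """All positive divisors of n (naive enumeration)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def dirichlet_convolution(a, b):
    """Return the arithmetic function n -> sum_{d | n} a(d) b(n/d)."""
    return lambda n: sum(a(d) * b(n // d) for d in divisors(n))


def mobius(n):
    """Moebius function via trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides the original n, so it is not square-free
            result = -result
        p += 1
    if n > 1:  # one remaining prime factor
        result = -result
    return result


one = lambda n: 1                      # the constant function 1
delta = lambda n: 1 if n == 1 else 0   # identity for the star product
mu_star_one = dirichlet_convolution(mobius, one)
print([mu_star_one(n) for n in range(1, 11)])
```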

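Suppose $n \mapsto c(n)$ is multiplicative, and let $p$ be prime. Let

$$D_{c,p}(s) = \sum_{m=0}^{\infty} \frac{c(p^m)}{p^{ms}}, \qquad D_c(s) = \sum_{n=1}^{\infty} \frac{c(n)}{n^s}$$

Formally

$$(1) \qquad D_c(s) = \prod_{p} D_{c,p}(s)$$

BUT, even if $D_c(s)$ and the $D_{c,p}(s)$ converge, you may not have the formula, because the product may not converge. If $\sum_{n=1}^{\infty} |c(n) n^{-s}|$ converges, then everything works. Writing $s = x + iy$,

$$\sum_{n=1}^{\infty} \frac{c(n)}{n^s} = \sum c(n) \frac{n^{-iy}}{n^x} = \sum c(n) \frac{e^{-\log(n) i y}}{n^x}$$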

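Since $|n^{-iy}| = 1$, if the series converges absolutely at a point, then it converges absolutely on the entire vertical line through that point.

1.3. Riemann $\zeta$ function. Using the arithmetic function $n \mapsto 1$, define

$$\zeta(s) = D_{\mathbf{1}}(s) = \sum \frac{1}{n^s}$$

This converges absolutely if $\mathrm{Re}(s) > 1$, by calculus. There is an infinite product expansion

$$\sum \frac{1}{n^s} = \prod_{p \text{ prime}} \Big( \sum_{m=0}^{\infty} p^{-ms} \Big) = \prod_{p \text{ prime}} \frac{1}{1 - p^{-s}}$$

(This is in the HW.)

Theorem 1.2. $\zeta(s)^{-1} = \sum \frac{\mu(n)}{n^s} = D_\mu(s)$

Proof. $D_\mu(s) \zeta(s) = D_\mu D_{\mathbf{1}} = D_{\mu \star \mathbf{1}} = D_\delta = 1$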

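This is analytically true when $\mathrm{Re}(s) > 1$.

Basic reason for interest in $D_c(s)$ for any $c : n \mapsto c(n)$: If we know enough, we will be able to give very good estimates for $\sum_{p \le X} c(p) := \pi_c(X)$. In particular, when $c = 1$, we get estimates for $\pi(X)$, the number of primes $\le X$.

Theorem 1.3 (Euler). $\sum_{p \le X} \frac{1}{p}$ diverges as $X \to \infty$.

Proof. Define $\Pi(X) = \prod_{p \le X} \big(1 - \frac{1}{p}\big)^{-1}$. Write this as $\prod_{p \le X} \sum_{m=0}^{\infty} p^{-m}$. But

$$\sum_{n \le X} \frac{1}{n} \le \Pi(X)$$

because each $\frac{1}{n}$ is a term in the expansion of $\Pi(X)$. But $\sum \frac{1}{n}$ diverges (which immediately proves that there are infinitely many primes). Take the log of the product:

$$\log \Pi(X) = -\sum_{p \le X} \log\Big(1 - \frac{1}{p}\Big)$$

But

$$-\log\Big(1 - \frac{1}{p}\Big) = \frac{1}{p} + \frac{1}{2p^2} + \frac{1}{3p^3} + \cdots \le \frac{1}{p} + \frac{1}{p^2} + \frac{1}{p^3} + \cdots = \frac{1}{p} + \frac{1}{p^2} \cdot \frac{1}{1 - \frac{1}{p}} \le \frac{1}{p} + \frac{c}{p^2}$$

for some constant $c$, and $\sum_{n \in \mathbb{N}} \frac{c}{n^2}$ converges. So we have

$$\log \Pi(X) = \sum_{p \le X} \frac{1}{p} + \text{something convergent}$$

We showed above that $\Pi(X)$ goes to $\infty$, which means its log does too; if the LHS diverges, then so does the RHS.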
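The two inequalities used in Euler's proof can be observed numerically. This is only an illustrative sketch: `primes_up_to` and `euler_comparison` are our names, and the sieve is the ordinary sieve of Eratosthenes, not something from the lecture.

```python
import math


def primes_up_to(x):
    """Sieve of Eratosthenes: all primes <= x."""
    sieve = [True] * (x + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, x + 1, p):
                sieve[m] = False
    return [p for p in range(2, x + 1) if sieve[p]]


def euler_comparison(x):
    """Return (prod_{p<=x} (1 - 1/p)^{-1}, sum_{n<=x} 1/n, sum_{p<=x} 1/p)."""
    primes = primes_up_to(x)
    product = 1.0
    for p in primes:
        product *= 1.0 / (1.0 - 1.0 / p)
    harmonic = sum(1.0 / n for n in range(1, x + 1))
    prime_reciprocals = sum(1.0 / p for p in primes)
    return product, harmonic, prime_reciprocals


for x in (10, 100, 1000):
    product, harmonic, prime_reciprocals = euler_comparison(x)
    # log(product) - prime_reciprocals is the "something convergent" term
    print(x, product, harmonic, math.log(product) - prime_reciprocals)
```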

1.4. Arithmetic functions related to the $\zeta$-function. Let $a$ be an arithmetic function, and let $A(n) = \sum_{d \mid n} a(d)$. Equivalently, $A = a \star \mathbf{1}$, so $a = \mu \star A$, that is, $a(n) = \sum_{d \mid n} \mu(d) A\big(\frac{n}{d}\big)$. In terms of Dirichlet series,

$$D_A(s) = \zeta(s) \cdot D_a(s)$$

Define $A = \sigma_k$ as the function where

$$\sigma_k(n) = \sum_{d \mid n} d^k \qquad \text{and} \qquad a : n \mapsto n^k.$$

Then

$$D_a(s) = \sum \frac{n^k}{n^s} = \sum \frac{1}{n^{s-k}} = \zeta(s - k), \qquad D_A(s) = \sum \frac{\sigma_k(n)}{n^s} = \zeta(s) D_a(s) = \zeta(s) \cdot \zeta(s - k)$$

If $k = 0$ then $\sigma_0(n)$ counts the number of divisors of $n$. Then

$$\sum \frac{\sigma_0(n)}{n^s} = \zeta(s)^2$$

We call $a = \varphi$ the Euler phi function; then $A(n) = \sum_{d \mid n} \varphi(d)$, and you can check that $A(n) = n$. Hence

$$D_\varphi(s) = \frac{\zeta(s - 1)}{\zeta(s)}$$

Recall $r_{k,d}(n)$, the number of ways that $n$ is expressible as a sum of $d$ $k$-th powers, and recall that when $k = 2$ we had a nice formula for this. Let $R_{2,d}(n)$ be the number of ways that $n$ is expressible as the sum of $d$ squares, counting $(+a)^2$ and $(-a)^2$ as different, and including 0. Also, order matters here! For example,

$$R_{2,24}(2) = 1104$$
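The value $R_{2,24}(2) = 1104$ can be checked by expanding $\big(\sum_{n \in \mathbb{Z}} q^{n^2}\big)^{d}$ as a truncated power series. This is a sketch with a function name of our choosing (`sums_of_squares_counts`); it counts signs, zeros and order, exactly as in the definition of $R_{2,d}$.

```python
def sums_of_squares_counts(d, limit):
    """Coefficients R_{2,d}(0..limit) of (sum_{n in Z} q^(n^2))^d."""
    theta = [0] * (limit + 1)
    theta[0] = 1
    n = 1
    while n * n <= limit:
        theta[n * n] = 2  # both +n and -n contribute
        n += 1
    result = [1] + [0] * limit
    for _ in range(d):
        nxt = [0] * (limit + 1)
        for i, ci in enumerate(result):
            if ci:
                for j in range(limit + 1 - i):
                    if theta[j]:
                        nxt[i + j] += ci * theta[j]
        result = nxt
    return result


print(sums_of_squares_counts(24, 2)[2])  # choose 2 of 24 slots, each with a sign
print(sums_of_squares_counts(2, 5)[5])
```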

Slightly more generally,

$$R_{2,24}(n) = \frac{16}{691} \sigma_{11}(n) + e(n)$$

where $e(n)$ is the error term: up to taking logs, it is < other term. We have a complete description of the error term.

2. January 26

Define

$$\Delta = q \prod_{n=1}^{\infty} (1 - q^n)^{24}$$

and recall $f_2 = \sum q^{n^2}$. It turns out that $f_2 \equiv \Delta \pmod 2$ (more precisely, the coefficient of $q^n$ in $\Delta$ is odd exactly when $n$ is an odd square).

RECALL, a Dirichlet series is a formal expression that looks like $D(s) = \sum a(n) n^{-s}$. We also had arithmetic functions $n \mapsto a(n) \in \mathbb{C}$. $L$-functions will be a special case.

2.1. Some properties to learn about.

Domain of definition. For example, $\zeta(s)$ doesn't make sense as a Dirichlet series to the left of the line $\mathrm{Re}(s) = 1$. However, there is a meromorphic continuation, with pole only at $s = 1$. (We have good control to the right of the line, but the most interesting information comes from the other half.) Also, there are symmetries: usually a relationship between $L(s)$ and $L(1 - s)$ (i.e. you can sort of reflect across the line).

Poles and Zeroes. One way (the only way) of understanding $\sum_{p \le X} a(p)$ is to understand the poles and zeroes. In the case of $\zeta(s)$, this gives information about $\sum_{p \le X} 1 = \pi(X)$, the number of primes $\le X$. The Riemann hypothesis says that the zeroes of the $\zeta$-function are at the negative even integers, on the line $\mathrm{Re}(s) = \frac{1}{2}$, and nowhere else. If you're interested in statistics, consider $\sum w(p) a(p)$ (i.e. a weighted sum). A good thing to study is $\sum_{p \le X} \log(p)$.

Special values of $L(s)$. For example, $\zeta(1 - 2k) = -\frac{B_{2k}}{2k}$ (equivalently $(-1)^k \frac{|B_{2k}|}{2k}$).

Estimating density of primes $p$ where $a(p)$ has some property. For example, there are asymptotically equal numbers of primes of the form $4n - 1$ as of the form $4n + 1$. Also consider prime races: it turns out that $4n - 1$ wins almost all the time. We will call this a Chebotarev-type question.

2.2. Wedge uniform convergence. Suppose you have a Dirichlet series $D(s) = \sum_{n=1}^{\infty} a(n) n^{-s}$ which we assume converges at some complex point $s_0$.

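Notation 2.1. We will denote complex numbers as $s = \sigma + it$ for $\sigma, t \in \mathbb{R}$.

Make any wedge in $\mathbb{C}$ with vertex $s_0$, pointing towards positive real infinity, where $c\,|t - t_0| < \sigma - \sigma_0$ for $c > 0$.

Theorem 2.2. $D(s)$ converges uniformly in any such wedge.

Define a remainder term

$$R(X) = \sum_{n > X} a(n) n^{-s_0}$$

Since everything converges, this goes to zero as $X \to \infty$. Check that $a(n) = (R(n-1) - R(n))\, n^{s_0}$, so we can write

$$\begin{aligned}
\sum_{n=M+1}^{N} a(n) n^{-s} &= \sum_{M+1}^{N} (R(n-1) - R(n))\, n^{s_0 - s} \\
&= \sum_{M+1}^{N} R(n-1)\, n^{s_0 - s} - \sum_{M+1}^{N} R(n)\, n^{s_0 - s} \\
&= R(M)(M+1)^{s_0 - s} - R(N) N^{s_0 - s} - \underbrace{\sum_{M+2}^{N} R(n-1)\big((n-1)^{s_0 - s} - n^{s_0 - s}\big)}_{A}
\end{aligned}$$

$$A = -(s_0 - s) \sum_{M+2}^{N} R(n-1) \int_{n-1}^{n} u^{s_0 - s - 1}\, du = -(s_0 - s) \int_{M+1}^{N} R(u)\, u^{s_0 - s - 1}\, du$$

So in total this is

$$(2) \qquad \sum_{M+1}^{N} a(n) n^{-s} = R(M)(M+1)^{s_0 - s} - R(N) N^{s_0 - s} + (s_0 - s) \int_{M+1}^{N} R(u)\, u^{s_0 - s - 1}\, du$$

This gives an estimate (using the fact that $R(u)$ is supposed to be small for sufficiently large $u$, say $|R(u)| \le \varepsilon$ for $u \ge M$):

$$\Big| \sum_{M+1}^{N} a(n) n^{-s} \Big| \le \varepsilon + \varepsilon + \varepsilon\, |s_0 - s| \int_{M+1}^{N} u^{\sigma_0 - \sigma - 1}\, du$$

when $u \ge M$ and $\sigma > \sigma_0$. The integral is at most $\frac{1}{\sigma - \sigma_0}$, so as long as we have $|s_0 - s| < C(\sigma - \sigma_0)$, the integral term is bounded. In the wedge, $|t_0 - t| < \frac{1}{c}(\sigma - \sigma_0)$, so the triangle inequality $|s_0 - s| \le |\sigma_0 - \sigma| + |t_0 - t|$ implies $|s_0 - s| < C|\sigma_0 - \sigma|$ with $C = 1 + \frac{1}{c}$, as desired. This shows that $D(s)$ converges uniformly in any wedge with vertex $s_0$.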

Corollary 2.3. $D(s)$ is analytic in any $s_0$-wedge, since it converges uniformly in a neighborhood of any point in this wedge.

Corollary 2.4. Differentiating term-by-term,

$$D'(s) = -\sum_{n=1}^{\infty} \log(n)\, a(n)\, n^{-s}$$

converges in the wedge.

Making the wedges larger and larger, we get

Corollary 2.5. If $D(s)$ converges at $s = s_0$, then it converges in the open half-plane $\mathrm{Re}(s) > \mathrm{Re}(s_0) = \sigma_0$.

How far to the left does it converge? Either it converges everywhere, it converges nowhere, or it stops somewhere.

Definition 2.6 (Abscissa of convergence of $D(s)$). The abscissa of convergence is a number $\sigma_c \in \mathbb{R} \cup \{\pm\infty\}$ such that $D(s)$ converges or not depending on whether $\mathrm{Re}(s) > \sigma_c$ or $\mathrm{Re}(s) < \sigma_c$. (In general, we don't have control over the line.)

Landau: if the series has nonnegative real coefficients, then if $\sigma_c$ is finite, it is a singularity of some sort.

2.3. Sums. It is much easier to estimate $A(X) := \sum_{n \le X} a(n)$ than $\sum_{p \le X} a(p)$. $A(X)$ is not differentiable, because it is a step function. So the derivative is a sum of Dirac delta functions, weighted by the coefficients $a(n)$. If you want to understand $\int f(X)\, dA(X)$, this is not a classical Riemann integral. The mess above is basically a hands-on integration by parts. Integrals of this form are called Riemann-Stieltjes integrals. (Read appendix A in the book.)
$$\sum_{n=1}^{N} a(n) n^{-s} = \int_{1^-}^{N^+} x^{-s}\, dA(x)$$

(The $1^-$ and $N^+$ refer to starting a little before and ending a little after the point, respectively; this is necessary because we are adding up Dirac deltas.)

$$\begin{aligned}
\sum_{n=1}^{N} a(n) n^{-s} &= \int_{1^-}^{N^+} x^{-s}\, dA(x) \\
&= A(x) x^{-s} \Big|_{1^-}^{N^+} - \int_{1}^{N} A(x)\, d(x^{-s}) \\
&= A(N) N^{-s} + s \int_{1}^{N} A(x)\, x^{-s-1}\, dx
\end{aligned}$$

If $\sigma > 0$ and $\sigma_c < 0$ then $A(N) N^{-s}$ goes to zero: the series converges at $s = 0$, so the $A(N)$ converge.

Lemma 2.7. If $\sigma > \max\{0, \sigma_c\}$ then

$$\sum_{n=1}^{\infty} a(n) n^{-s} = s \int_{1}^{\infty} A(u)\, u^{-s-1}\, du$$

Definition 2.8. $L = \max\Big\{0, \limsup_{x \to \infty} \frac{\log |A(x)|}{\log(x)}\Big\}$

If $\sigma > L$ then the RHS and LHS of Lemma 2.7 converge and are equal.

Corollary 2.9. $\sigma_c \le L$

Proof. Use (2) . . . Actual proof next time.

Notation 2.10. $A(x) \ll x^{\theta}$ means that $|A(x)| < C \cdot x^{\theta}$, for sufficiently large $x$ and for some finite constant $C > 0$.
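The finite integration-by-parts identity $\sum_{n \le N} a(n) n^{-s} = A(N) N^{-s} + s \int_1^N A(x) x^{-s-1}\, dx$ can be verified exactly on a computer, because $A$ is a step function and the integral over each interval $[n, n+1)$ has a closed form. The sketch below uses random coefficients and a complex $s$; the function names are ours.

```python
import random


def dirichlet_partial_sum(a, s, N):
    """sum_{n=1}^{N} a(n) n^{-s}; a is a list with a[0] unused."""
    return sum(a[n] * n ** (-s) for n in range(1, N + 1))


def abel_summation_rhs(a, s, N):
    """A(N) N^{-s} + s * int_1^N A(u) u^{-s-1} du, using that A is a step function."""
    A = 0.0
    integral = 0.0
    for n in range(1, N):
        A += a[n]
        # A(u) is constant on [n, n+1), and
        # int_n^{n+1} u^{-s-1} du = (n^{-s} - (n+1)^{-s}) / s
        integral += A * (n ** (-s) - (n + 1) ** (-s)) / s
    A += a[N]
    return A * N ** (-s) + s * integral


random.seed(0)
N = 50
a = [0.0] + [random.uniform(-1, 1) for _ in range(N)]
s = 0.7 + 1.3j
lhs = dirichlet_partial_sum(a, s, N)
rhs = abel_summation_rhs(a, s, N)
print(abs(lhs - rhs))
```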

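3. January 31

Recall we had a Dirichlet series $D(s) = \sum_{n=1}^{\infty} \frac{a(n)}{n^s}$. We talked about wedge uniform convergence, which guarantees that if it converges at any point $s_0$, then it converges in a wedge to the right of $s_0$. You can compute the derivatives term-by-term. We defined the abscissa of convergence $\sigma_c$ to mark the largest open half-plane where it converges.

Suppose the Dirichlet series converges at $s = s_0$. Define the remainder sum

$$R(x) = \sum_{n > x} a(n) n^{-s_0}$$

which goes to 0 as $x \to \infty$. Via manual integration by parts, we integrated $\int u^{s_0 - s}\, dR$ (the Stieltjes integral) to get

$$(3) \qquad \sum_{M+1}^{N} a(n) n^{-s} = R(M) M^{s_0 - s} - R(N) N^{s_0 - s} + (s_0 - s) \int_{M}^{N} R(u)\, u^{s_0 - s - 1}\, du$$

In particular, send $N \to \infty$ and take $M = 0$, using the equivalent form with $R(M)(M+1)^{s_0 - s}$ and lower limit $M + 1$ (this avoids the term $0^{s_0 - s}$):

$$\sum_{n=1}^{\infty} a(n) n^{-s} = R(0) + (s_0 - s) \int_{1}^{\infty} R(u)\, u^{s_0 - s - 1}\, du, \qquad R(0) = D(s_0)$$

Denote the partial sums by

$$A(x) = \sum_{n \le x} a(n)$$

Play the same Riemann-Stieltjes game:

$$\sum_{1}^{N} a(n) n^{-s} = A(N) N^{-s} + s \int_{1}^{N} A(u)\, u^{-s-1}\, du$$

Case 1: $\sigma_c < 0$ and $\sigma = \mathrm{Re}(s) > 0$. So it converges at zero, and the $A(N)$ have a limit. Letting $N \to \infty$, we get

$$\sum_{1}^{\infty} a(n) n^{-s} = s \int_{1}^{\infty} A(u)\, u^{-s-1}\, du$$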

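Case 2: $\sigma_c \ge 0$. Let

$$L = \limsup_{x \to \infty} \frac{\log |A(x)|}{\log x}$$

We already have $\mathrm{Re}(s) = \sigma > L$. Choose $\theta$ between $\sigma$ and $L$. Then $A(x) \ll x^{\theta}$. Here is the calculation: for large $x$ we have $\frac{\log |A(x)|}{\log x} < \theta$, so

$$|A(x)| = x^{\frac{\log |A(x)|}{\log x}} < x^{\theta}$$

If $\sigma > \theta$ then $A(N) N^{-s} \ll N^{\theta} N^{-\sigma}$ goes to zero. The integral term converges for the same reason: $\int A(u)\, u^{-s-1}\, du$ should be compared to $\int u^{\theta} u^{-\sigma - 1}\, du = \int u^{-1 - (\sigma - \theta)}\, du$; as long as $\sigma > \theta$, this converges.

Corollary 3.1. If $\sigma > \theta > L$ and $\sigma_c \ge 0$, then $D(s)$ converges for $\mathrm{Re}(s) = \sigma$, and

$$\sum_{1}^{\infty} a(n) n^{-s} = s \int_{1}^{\infty} A(u)\, u^{-s-1}\, du$$

In this case, $\sigma_c \le \theta$; but $\theta$ is any number $> L$, so $\sigma_c \le L$. (So this says something about $\sigma_c$ in terms of the growth rate of the coefficients.)

For the reverse inequality, use

$$\sum_{M+1}^{N} a(n) n^{-s} = R(M) M^{s_0 - s} - R(N) N^{s_0 - s} + (s_0 - s) \int_{M}^{N} R(u)\, u^{s_0 - s - 1}\, du$$

Use $M = s = 0$ (with $\sigma_0 > 0$, so that $0^{s_0} = 0$) and this simplifies to

$$\sum_{1}^{N} a(n) = -R(N) N^{s_0} + s_0 \int_{0}^{N} R(u)\, u^{s_0 - 1}\, du$$

Stuff converges at $s_0$ so $R(u)$ is bounded. Since $R(N)$ is bounded, $R(N) N^{s_0}$ is growing no faster than $N^{\sigma_0}$ (using absolute value), and similarly for the integral term:

$$|A(N)| \ll N^{\sigma_0}$$

Hence $L \le \sigma_0$ for every $\sigma_0 > \sigma_c$, so $L \le \sigma_c$.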

But there are points where it converges, but not absolutely. This suggests we should pay attention to the abscissa of absolute convergence $\sigma_a$. We say that a Dirichlet series converges absolutely at $s$ if $\sum |a(n)| n^{-\sigma}$ converges. (Of course, absolute convergence implies convergence, so $\sigma_c \le \sigma_a$.) An absolutely convergent series does not change when you rearrange the terms.

Theorem 3.2. $\sigma_c \le \sigma_a \le \sigma_c + 1$

Proof. Suppose there is a sequence $b(n) \to 0$ as $n \to \infty$. The modified series $\sum b(n) n^{-1-\varepsilon}$ is absolutely convergent, if $b(n)$ is any bounded sequence. Suppose $D(s_0)$ converges. Then let $b(n) = a(n) n^{-s_0}$. If the sum converges, then the terms go to zero; in order to make it converge absolutely, we just have to tuck in an extra $1 + \varepsilon$ in the exponent: $\sum a(n) n^{-s_0 - 1 - \varepsilon}$ converges absolutely. Letting $\varepsilon \to 0$ this proves the theorem.
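To see a gap between $\sigma_c$ and $\sigma_a$ concretely, take the alternating coefficients $a(n) = (-1)^{n+1}$ as an illustration. At $s = \frac{1}{2}$ the series converges (slowly, with oscillating partial sums), while the series of absolute values grows like $2\sqrt{N}$. The sketch below averages two consecutive partial sums to damp the oscillation; that averaging trick and the function names are ours. The reference value $0.6048986434\ldots$ is the known value of $(1 - \sqrt{2})\,\zeta(\tfrac{1}{2})$.

```python
def eta_partial_sum(s, N):
    """Partial sum of sum_{n>=1} (-1)^(n+1) n^(-s)."""
    return sum((-1) ** (n + 1) * n ** (-s) for n in range(1, N + 1))


def eta_estimate(s, N):
    """Average of two consecutive partial sums, which damps the oscillation."""
    return 0.5 * (eta_partial_sum(s, N) + eta_partial_sum(s, N + 1))


def abs_partial_sum(s, N):
    """Partial sum of the absolute values, sum_{n<=N} n^(-s)."""
    return sum(n ** (-s) for n in range(1, N + 1))


print(eta_estimate(0.5, 10_000))     # settles near 0.6049
print(abs_partial_sum(0.5, 10_000))  # keeps growing like 2 * sqrt(N)
```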

For example, the zeta function has a pole at $s = 1$, so it doesn't have a chance of converging beyond that point as a Dirichlet series. But we can multiply by the offending factor $(1 - 2^{1-s})$ to eliminate the pole:

$$(1 - 2^{1-s})\, \zeta(s) = \sum \frac{c(n)}{n^s}$$

where $c(n) = (-1)^{n+1}$. Letting $\sigma_c$ and $\sigma_a$ denote the abscissas for the modified function, we have $\sigma_c = 0$ and $\sigma_a = 1$.

Suppose you have two Dirichlet series $D^{(1)}(s)$ and $D^{(2)}(s)$, where $\sigma_c^{(1)} > \sigma_c^{(2)}$. Then $D^{(1)}(s) \cdot D^{(2)}(s)$ makes sense for $\mathrm{Re}(s) > \max\{\sigma_c^{(1)}, \sigma_c^{(2)}\}$. We also talked about the star-product of Dirichlet series: $D(s) = \sum \frac{(a^{(1)} \star a^{(2)})(n)}{n^s}$, which also makes sense formally in that domain. But the product of functions might make sense while the star-product series does not converge. (For example, the abscissa of convergence for $\big((1 - 2^{1-s})\zeta(s)\big)^4$ is 1 and not zero.)

We had a lot of formal product formulas in the first lecture, for example $\zeta(s) D_\mu(s) = 1$. Do these make sense as functions (instead of just formal Dirichlet series)? Yes, if you have absolute convergence (the star product just rearranges terms).

Theorem 3.3. $D^{(1)}(s) \cdot D^{(2)}(s)$ converges to the formal product of Dirichlet series if $\sigma > \max\{\sigma_a^{(1)}, \sigma_a^{(2)}\}$.

Can you retrieve the coefficients from the functions? Yes.

Proposition 3.4. Suppose we have two Dirichlet series $\sum a(n) n^{-s}$ and $\sum b(n) n^{-s}$ that converge somewhere. Then, if they are equal as functions in some right half-plane of convergence, then they are the same Dirichlet series (i.e. the coefficients are the same).

Proof. By shifting, suppose they both converge at $s = 0$. Let $c(n) = b(n) - a(n)$; we will show that all $c(n) = 0$.

$$-c(1) = \sum_{n=2}^{\infty} c(n) n^{-s}$$

$c(1)$ is constant, but the rest depends on $s$ and tends to zero as $\sigma \to \infty$, so $c(1)$ is zero. Keep going (multiply by $2^s$ to isolate $c(2)$, and so on).

Note that it's not enough, in general, for analytic functions to be equal at a discrete set of points, in order to show that they are equal (example: $\sin(x)$ and zero). But Dirichlet series are indeed determined by their values at certain sequences.

For elliptic curves, $\sigma_c = 1$; in dynamical systems, it represents the entropy of the system. In the case we're dealing with, it will usually be 1 or 0.

Theorem 3.5 (Landau's Theorem). Start with a Dirichlet series $D(s) = \sum a(n) n^{-s}$ where $a(n)$ is real and nonnegative for sufficiently large $n$. Also assume that $-\infty < \sigma_c < \infty$. Then $\sigma_c$ is a singularity (it can't be analytically continued there).

If a power series converges in a disc, you can ask what the largest disc of convergence is; then the function has at least one singular point on that largest circle.

Let $s$ be such that $\sigma > \sigma_c$ (i.e. it converges there). Take the $k$-th derivative:

$$D^{(k)}(s) = \sum_{n} a(n) (-\log n)^k n^{-s}$$

Assume $\sigma_c = 0$ (that is, replace $a_n$ by $a_n n^{-\sigma_c}$, which does not change the fact that the coefficients are nonnegative). Take $s = 1$: $D$ is analytic here, so it converges as a power series in the interior of some maximal circle:

$$D(s) = \sum_{k} c(k) (s - 1)^k$$

Certainly, the radius of this maximal circle is at least 1. But suppose, for the sake of contradiction, that $D$ is analytic at $s = 0$. Then it is analytic in some little disc $D_\varepsilon(0)$ around $s = 0$. There aren't any obstacles to convergence anywhere to the right of the line $\sigma = 0$, so the maximal circle of power series convergence is allowed to extend to the left of $\sigma = 0$, as long as it is still contained in $\{\sigma > 0\} \cup D_\varepsilon(0)$. $c(k)$ can be gotten by taking the appropriate derivative:

$$c(k) = \frac{1}{k!} D^{(k)}(s) \Big|_{s=1} = \frac{1}{k!} \sum_{n} a(n) (-\log n)^k n^{-1}$$

Watching the negative sign, rewrite

$$D(s) = \sum_{k=0}^{\infty} \frac{(1 - s)^k}{k!} \sum_{n} a(n) (\log n)^k n^{-1}$$

For real $s < 1$ every term is nonnegative, so this is absolutely convergent, and you can rearrange terms (i.e. swap the sums):

$$D(s) = \sum_{n} a(n) n^{-1} \sum_{k=0}^{\infty} \frac{((1 - s) \log n)^k}{k!} = \sum_{n} a(n) n^{-1} \exp((1 - s) \log n) = \sum_{n} a(n) n^{-s}$$

We assumed that the power series converged past $s = 0$, and then transformed the power series into the original Dirichlet series, showing that this too converges past $s = 0$, a contradiction.

4. February 2

Standard way of dealing with the Gamma function:

$$\Gamma(s) = \int_{0}^{\infty} e^{-x} x^{s}\, \frac{dx}{x}$$

(Integrate by parts.)

There was a recap of Landau's theorem, which I added to the notes on the previous lecture.

4.1. Euler Products. Let $a(n)$ be a multiplicative function: that is, if $n = \prod_{i=1}^{\nu} p_i^{e_i}$, then $a(n) = \prod_{i=1}^{\nu} a(p_i^{e_i})$. Define

$$D_p(s) = \sum_{m=0}^{\infty} a(p^m) p^{-ms}$$

The Fundamental Theorem of Arithmetic says that

$$\prod_{p \text{ prime}} D_p(s) = D(s)$$

at least formally. But, is this true analytically?

Proposition 4.1. If $D(s)$ converges absolutely then $D_p(s)$ converges absolutely for all $p$, and

$$\prod_{p} D_p(s) = D(s)$$

(i.e. both sides converge, and they converge to the same thing).

Let $N(X)$ be the set of numbers all of whose prime divisors are $\le X$. If $D(s)$ converges absolutely, then $\sum |a(n)| n^{-\sigma}$ converges; for $D_p(s)$ to converge, we only need a subset of these terms to converge, so that is OK. To show that $\prod_p D_p(s)$ converges to $D(s)$, it suffices to show that $D(s) - \prod_{p \le X} D_p(s) \to 0$ (so you're getting the terms where some prime divisor is $> X$). The difference

$$D(s) - \sum_{n \in N(X)} a(n) n^{-s}$$

represents the set of summands of $D(s)$ corresponding to terms where at least one prime divisor is $> X$. But this difference is at most $\sum_{n > X} |a(n) n^{-s}|$, which goes to zero by absolute convergence, so we are done.

4.2. Facts about the Ramanujan $\tau$ function. Define

$$L_\Delta(s) = \sum_{n=1}^{\infty} \tau(n) n^{-s}, \qquad L_{\Delta,p}(s) = \sum_{m} \tau(p^m) p^{-ms} = \frac{1}{1 - \tau(p) p^{-s} + p^{11} p^{-2s}}$$

Theorem 4.2 (Mordell). $\tau(m \cdot n) = \tau(m) \cdot \tau(n)$ if $(m, n) = 1$.

$$\tau(p^{m+1}) = \tau(p)\, \tau(p^m) - p^{11}\, \tau(p^{m-1})$$

Together with the Mordell theorem, this shows that $\tau$ is determined by its values at primes.
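Both Mordell's multiplicativity and the prime-power recursion can be spot-checked from the product $\Delta = q \prod_{n \ge 1} (1 - q^n)^{24}$. The sketch below expands the product as a truncated integer polynomial; the function name `ramanujan_tau` is ours.

```python
def ramanujan_tau(limit):
    """Return {m: tau(m)} for 1 <= m <= limit, from q * prod (1 - q^n)^24."""
    size = limit  # we need prod (1 - q^n)^24 up to degree limit - 1
    poly = [1] + [0] * (size - 1)
    for n in range(1, size):
        for _ in range(24):
            # multiply by (1 - q^n) in place, highest degree first
            for i in range(size - 1, n - 1, -1):
                poly[i] -= poly[i - n]
    # the leading factor q shifts every exponent by one
    return {m: poly[m - 1] for m in range(1, limit + 1)}


tau = ramanujan_tau(12)
print([tau[m] for m in range(1, 8)])
print(tau[6] == tau[2] * tau[3])            # multiplicativity at coprime arguments
print(tau[4] == tau[2] ** 2 - 2 ** 11)      # recursion with p = 2, m = 1
```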

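4.3. Log of the zeta function. Using the definition $\zeta(s) = \prod_p (1 - p^{-s})^{-1}$:

$$\log\big((1 - p^{-s})^{-1}\big) = \sum_{m=1}^{\infty} \frac{p^{-ms}}{m}, \qquad \log \zeta(s) = \sum_{p} \sum_{m=1}^{\infty} \frac{p^{-ms}}{m}$$

Definition 4.3. The Von Mangoldt Lambda function is:

$$\Lambda(n) = \begin{cases} \log p & \text{if } n = p^e \text{ for some } e \ge 1 \\ 0 & \text{else} \end{cases}$$

$$\log \zeta(s) = \sum_{p} \sum_{m=1}^{\infty} \frac{p^{-ms}}{m} = \sum_{n=2}^{\infty} \frac{\Lambda(n)}{\log n}\, n^{-s}$$

(If $n$ is not a power of a prime, then the coefficient is zero. If $n = p^m$, then we get $\frac{\log(p)}{\log(p^m)}\, p^{-ms}$, which produces the right thing.)

$$(\log \zeta(s))' = -\sum_{p,m} \log(p)\, p^{-ms} = -\sum \Lambda(n) n^{-s}$$

$$\frac{\zeta'(s)}{\zeta(s)} = -\sum \Lambda(n) n^{-s}$$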
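Multiplying $-\zeta'/\zeta = \sum \Lambda(n) n^{-s}$ by $\zeta(s)$ and comparing coefficients gives the identity $\log n = \sum_{d \mid n} \Lambda(d)$, which is easy to test. This is a sketch with our own helper names, using trial division.

```python
import math


def von_mangoldt(n):
    """Lambda(n) = log p if n is a power p^e with e >= 1, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            # n was a pure power of p exactly when nothing is left over
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # no divisor up to sqrt(n), so n is prime


def sum_over_divisors(n):
    """sum_{d | n} Lambda(d), which should equal log n."""
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)


print(sum_over_divisors(360), math.log(360))
```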

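4.4. Meromorphic continuation.

Notation 4.4. For $u \in \mathbb{R}$ write $u = [u] + \{u\}$, where $\{u\}$ is the fractional part, and $[u]$ is the greatest integer $\le u$.

Similarly, for differentials we can write $du = d[u] + d\{u\}$. Then $d[u]$ is basically a delta function at each integer. Break up the zeta function:

$$(4) \qquad \zeta(s) = \sum_{n \le x} n^{-s} + \sum_{n > x} n^{-s}$$

The second piece is

$$\sum_{n > x} n^{-s} = \int_{x^+}^{\infty} u^{-s}\, d[u] = \underbrace{\int_{x}^{\infty} u^{-s}\, du}_{A} - \underbrace{\int_{x^+}^{\infty} u^{-s}\, d\{u\}}_{B}$$

$$A = \frac{x^{1-s}}{s - 1}$$

$$\int_{x^+}^{\infty} u^{-s}\, d\{u\} + \int_{x^+}^{\infty} \{u\}\, d(u^{-s}) = u^{-s} \{u\} \Big]_{x^+}^{\infty} = -x^{-s} \{x\}$$

$$B = -x^{-s} \{x\} - \int_{x^+}^{\infty} \{u\}\, d(u^{-s}) = -x^{-s}\{x\} + s \int_{x^+}^{\infty} \{u\}\, u^{-s-1}\, du$$

$$\sum_{n > x} n^{-s} = \frac{x^{1-s}}{s - 1} + x^{-s} \{x\} - s \int_{x^+}^{\infty} \{u\}\, u^{-s-1}\, du$$

Going back to (4), we have

$$\zeta(s) = \sum_{n \le x} n^{-s} + \frac{x^{1-s}}{s - 1} + x^{-s} \{x\} - s \int_{x^+}^{\infty} \{u\}\, u^{-s-1}\, du$$

The integral is bounded by $|s| \int u^{-\sigma - 1}\, du$. This is bounded if $\sigma > 0$. The formula on the right makes sense for $\mathrm{Re}(s) > 0$, $s \ne 1$. The LHS is a Dirichlet series that makes sense for $\mathrm{Re}(s) > 1$. The RHS is analytic in $s$. Take the limit as $x \to 1^+$. The decomposition reduces to

$$\zeta(s) = 1 + \frac{1}{s - 1} - s \int_{1}^{\infty} \{u\}\, u^{-s-1}\, du$$

This shows that the RHS is meromorphic, with a simple pole at $s = 1$, in the region $\mathrm{Re}(s) > 0$. Let's estimate the behavior of $\zeta(s)$ in the strip $0 < \sigma < 1$. Take $s = \sigma \in \mathbb{R}$. The integral is bounded by replacing $\{u\}$ by 1; now take the antiderivative, and we have $0 \le \sigma \int_{1}^{\infty} \{u\}\, u^{-\sigma - 1}\, du \le 1$. This gives

$$\frac{1}{\sigma - 1} \le \zeta(\sigma) \le 1 + \frac{1}{\sigma - 1}$$

for this range.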
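The continued formula can be used to evaluate $\zeta$ numerically inside the strip $0 < \sigma < 1$. The sketch below takes $x = N$ a large integer in the identity $\zeta(s) = \sum_{n \le N} n^{-s} + \frac{N^{1-s}}{s-1} - s \int_N^\infty \{u\} u^{-s-1}\, du$ and, as an extra approximation that is ours and not part of the notes, replaces $\{u\}$ by its mean value $\frac{1}{2}$ in the tail integral, which turns the tail into $-\frac{1}{2} N^{-s}$. The reference values $\zeta(\tfrac{1}{2}) \approx -1.4603545088$ and $\zeta(2) = \pi^2/6$ are standard.

```python
def zeta_continued(s, N=10_000):
    """Approximate zeta(s) for Re(s) > 0, s != 1, via the truncated formula.

    Uses sum_{n<=N} n^{-s} + N^{1-s}/(s-1) - N^{-s}/2, where the last term
    comes from replacing {u} by 1/2 in the tail integral (an approximation).
    """
    head = sum(n ** (-s) for n in range(1, N + 1))
    return head + N ** (1 - s) / (s - 1) - 0.5 * N ** (-s)


print(zeta_continued(0.5))  # close to -1.4603545
print(zeta_continued(2.0))  # close to pi^2 / 6
```

Note that $\zeta(\tfrac{1}{2})$ indeed lies between $\frac{1}{\sigma - 1} = -2$ and $1 + \frac{1}{\sigma - 1} = -1$.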

4.5. Euler's constant. Many equivalent definitions in the notes.

Definition 4.5.

$$C := 1 - \int_{1}^{\infty} \{u\}\, u^{-2}\, du$$

Alternatively, it's the unique constant approximated by

$$\sum_{n \le x} \frac{1}{n} - \log x$$

as $x \to \infty$. More precisely,

$$\sum_{n \le x} \frac{1}{n} - \log x = C + O\Big(\frac{1}{x}\Big)$$

Notation 4.6. $A = O(f(x))$ means that $|A| \le c \cdot f(x)$ for large $x$, and some constant $c$.

To prove the equivalence,

$$\sum_{n \le x} \frac{1}{n} = \int_{1^-}^{x^+} u^{-1}\, d[u] = \int_{1^-}^{x^+} u^{-1}\, du - \int_{1^-}^{x^+} u^{-1}\, d\{u\}$$

where the last integral is $-\int \{u\}\, d(u^{-1}) + u^{-1} \{u\} \big]_{1^-}^{x^+}$ by Riemann-Stieltjes calculations. Finish this later.

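5. February 7

5.1. Euler's constant.

(1) $C = 1 - \int_{1}^{\infty} \{u\}\, u^{-2}\, du$
(2) $\sum_{n \le x} \frac{1}{n} - \log x = C + O\big(\frac{1}{x}\big)$
(3) $\zeta(s) = \frac{1}{s - 1} + C + a_1 (s - 1) + a_2 (s - 1)^2 + \cdots$

Proof. Write

$$\sum_{n \le x} \frac{1}{n} = \int_{1^-}^{x} u^{-1}\, d[u] = \underbrace{\int_{1}^{x} u^{-1}\, du}_{\log x} - \int_{1^-}^{x} u^{-1}\, d\{u\}$$

where integration by parts gives

$$\int_{1^-}^{x} u^{-1}\, d\{u\} + \int_{1}^{x} \{u\}\, d(u^{-1}) = \{u\}\, u^{-1} \Big|_{1^-}^{x} = \frac{\{x\}}{x} - 1$$

Since $-\int_1^x \{u\}\, d(u^{-1}) = \int_1^x \{u\}\, u^{-2}\, du$, this gives (2).

Now use

$$\zeta(s) = \sum_{n \le x} n^{-s} + \frac{x^{1-s}}{s - 1} + \frac{\{x\}}{x^s} - s \int_{x}^{\infty} \{u\}\, u^{-s-1}\, du$$

with $x = 1$:

$$\zeta(s) - \frac{1}{s - 1} = 1 - s \int_{1}^{\infty} \{u\}\, u^{-s-1}\, du$$

and take the limit as $s \to 1$, getting (3).

Everyone thinks that $C$ is transcendental, but we can't even prove it's irrational!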
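Statement (2) is easy to watch numerically. In the sketch below (function name ours), the reference value $0.5772156649\ldots$ is the standard decimal expansion of Euler's constant, and the observed error is compared with the $O(1/x)$ bound.

```python
import math


def euler_constant_estimate(x):
    """sum_{n <= x} 1/n - log x for a positive integer x."""
    return sum(1.0 / n for n in range(1, x + 1)) - math.log(x)


EULER_C = 0.5772156649  # standard reference value, used only for comparison

for x in (10, 1_000, 100_000):
    estimate = euler_constant_estimate(x)
    print(x, estimate, abs(estimate - EULER_C) * x)  # last column stays bounded
```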

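5.2. Fourier transforms. Schwartz functions on the real line are rapidly decreasing $C^\infty$ functions: we require $f \cdot P(x)$ (and likewise each derivative of $f$ times $P$) to be bounded for any polynomial $P$. (One easy way to have this is to consider compactly supported smooth functions.)

Definition 5.1. The Fourier transform of $f$ is

$$\hat{f}(y) = \int_{-\infty}^{\infty} e^{-2\pi i x y} f(x)\, dx$$

If $f$ is Schwartz then $\hat{\hat{f}}(x) = f(-x)$. We have a bilinear map

$$\mathbb{R} \times \mathbb{R} \to \{z \in \mathbb{C} : |z| = 1\}, \qquad (x, y) \mapsto e^{2\pi i x y}$$

which shows that $\mathbb{R}$ is self-dual. One of the beautiful consequences of this symmetry is

Theorem 5.2 (Poisson summation).

$$\sum_{n \in \mathbb{Z}} f(n + x) = \sum_{n \in \mathbb{Z}} \hat{f}(n)\, e^{2\pi i n x}$$

Letting $x \to 0$, we have $\sum f(n) = \sum \hat{f}(n)$.

Definition 5.3. The Gaussian is $f(x) = e^{-\pi x^2}$.

Lemma 5.4. The Gaussian is its own Fourier transform.

Proof. Integrals without limits will mean $\int_{\mathbb{R}}$.

$$\hat{f}(y) = \int e^{-\pi x^2} e^{-2\pi i x y}\, dx = \int e^{-\pi (x^2 + 2ixy - y^2 + y^2)}\, dx = e^{-\pi y^2} \int e^{-\pi (x + iy)^2}\, dx$$

The contour is now $-\infty + 0i \to +\infty + 0i$, and we want to change it to $-\infty + iy \to +\infty + iy$. Using a large $R$ instead of $\infty$, turn this into a rectangle, where the contribution to the integral from the sides $\to 0$ as $R \to \infty$. So we can ignore the sides of the rectangle.

$$\int_{-\infty + iy}^{+\infty + iy} e^{-\pi x^2}\, dx = \int_{-\infty}^{\infty} e^{-\pi x^2}\, dx$$

$$\Big( \int e^{-\pi x^2}\, dx \Big)^2 = \int e^{-\pi x^2}\, dx \int e^{-\pi y^2}\, dy = \iint e^{-\pi (x^2 + y^2)}\, dx\, dy = \int_{0}^{2\pi} \int_{r=0}^{\infty} e^{-\pi r^2}\, r\, dr\, d\theta = 1$$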

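We can renormalize the Gaussian to $e^{-\pi x^2 / t}$, and its Fourier transform is $\sqrt{t}\, e^{-\pi x^2 t}$ (you're replacing $x \mapsto \frac{x}{\sqrt{t}}$). We're thinking of $t > 0$, so we can take $\sqrt{t} > 0$.

Definition 5.5 (Theta function).

$$\theta(z) = \frac{1}{2} \sum_{n \in \mathbb{Z}} e^{\pi i n^2 z} = \frac{1}{2} + \sum_{n \ge 1} e^{\pi i n^2 z}$$

Write $z = x + iy$, which makes

$$\theta(z) = \frac{1}{2} \sum_{n \in \mathbb{Z}} e^{\pi i n^2 x}\, e^{-\pi n^2 y}$$

But $e^{\pi i n^2 x}$ has absolute value 1, so it will not play much of a role in convergence. But in order for the sum of the $e^{-\pi n^2 y}$ to make sense, we need $y > 0$. We will consider it on the imaginary axis $z = iy$, and think of it as a function of $y$. Analytic continuation will show that symmetries in $\theta(iy)$ relate to symmetries on the entire upper half-plane.

$$\theta(iy) = \frac{1}{2} \sum_{n \in \mathbb{Z}} e^{-\pi n^2 y}$$

We have $\theta(z + 2) = \theta(z)$, just from the definition (and noticing $e^{2\pi i n^2} = 1$). Apply Poisson summation (and remember the Gaussian Fourier transform from above):

$$\theta\Big(\frac{-1}{iy}\Big) = \frac{1}{2} \sum e^{-\pi n^2 / y} = \sqrt{y}\, \frac{1}{2} \sum e^{-\pi n^2 y} = \sqrt{y}\, \theta(iy)$$

(Define our square roots so that they are positive on the positive real axis.) So

$$\theta\Big(\frac{-1}{z}\Big) = \sqrt{\frac{z}{i}}\; \theta(z), \qquad \theta^{4k}\Big(\frac{-1}{z}\Big) = \Big(\frac{z}{i}\Big)^{2k} \theta^{4k}(z) = (-1)^k z^{2k}\, \theta^{4k}(z)$$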
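On the imaginary axis the transformation law reads $\theta(i/y) = \sqrt{y}\, \theta(iy)$, and both sides converge so quickly that a short truncated sum checks it to near machine precision. This is a sketch; the function name and the truncation length are ours.

```python
import math


def theta_iy(y, terms=50):
    """theta(iy) = 1/2 + sum_{n>=1} exp(-pi n^2 y), truncated after `terms` terms."""
    return 0.5 + sum(math.exp(-math.pi * n * n * y) for n in range(1, terms + 1))


for y in (0.5, 1.0, 2.0, 3.7):
    print(y, theta_iy(1.0 / y), math.sqrt(y) * theta_iy(y))
```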

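5.3. Gamma function.

Definition 5.6.

$$\Gamma(s) = \int_{0}^{\infty} t^{s-1} e^{-t}\, dt$$

Also define a function

$$F(s) = \pi^{-s}\, \Gamma(s)\, \zeta(2s)$$

which we will write in a bunch of ways. Now work in the domain where the Dirichlet series makes sense, and write

$$F(s) = \sum_{n=1}^{\infty} (\pi n^2)^{-s}\, \Gamma(s) = \sum_{n=1}^{\infty} \int_{0}^{\infty} (\pi n^2)^{-s}\, t^{s-1} e^{-t}\, dt$$

Make the substitution $t = \pi n^2 y$, and $dt = \pi n^2\, dy$:

$$= \sum_{n=1}^{\infty} \int_{0}^{\infty} (\pi n^2)^{-s} (\pi n^2)^{s-1}\, y^{s-1} e^{-\pi n^2 y}\, \pi n^2\, dy = \sum_{n=1}^{\infty} \int_{0}^{\infty} y^{s-1} e^{-\pi n^2 y}\, dy$$

We can interchange the summation and the integral, because the integrand is positive and $F(s)$ is known to be convergent (use Fubini/Tonelli):

$$= \int_{0}^{\infty} y^{s-1} \sum_{n=1}^{\infty} e^{-\pi n^2 y}\, dy = \int_{0}^{\infty} y^{s-1} \Big( \theta(iy) - \frac{1}{2} \Big)\, dy$$

HW: $\theta(iy) - \frac{1}{2} = O(e^{-cy})$ as $y \to \infty$.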

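Split the range of integration: $\int_{0}^{\infty} = \int_{1}^{\infty} + \int_{0}^{1}$. The problematic part is $\int_{0}^{1}$. Split off the $\frac{1}{2}$ from the $\theta$ in this part:

$$\begin{aligned}
F(s) &= \int_{1}^{\infty} y^{s-1} \Big( \theta(iy) - \frac{1}{2} \Big)\, dy + \int_{0}^{1} y^{s-1}\, \theta(iy)\, dy - \frac{1}{2} \frac{y^s}{s} \Big|_{0}^{1} \\
&= \int_{1}^{\infty} y^{s-1} \Big( \theta(iy) - \frac{1}{2} \Big)\, dy + \int_{0}^{1} y^{s-1}\, \theta(iy)\, dy - \frac{1}{2s}
\end{aligned}$$

Now do the transformation $y \mapsto y^{-1}$ in the second integral:

$$\int_{0}^{1} y^{s-1}\, \theta(iy)\, dy = \int_{\infty}^{1} y^{1-s}\, \theta\Big(\frac{i}{y}\Big)\, d(y^{-1}) = \int_{1}^{\infty} y^{1-s}\, \theta\Big(\frac{-1}{iy}\Big)\, y^{-2}\, dy$$

Make the transformation $\theta\big(\frac{-1}{iy}\big) = \sqrt{y}\, \theta(iy)$:

$$\begin{aligned}
&= \int_{1}^{\infty} y^{1-s}\, y^{-2}\, \sqrt{y}\; \theta(iy)\, dy = \int_{1}^{\infty} y^{-\frac{1}{2} - s}\, \theta(iy)\, dy \\
&= \int_{1}^{\infty} y^{-\frac{1}{2} - s} \Big( \theta(iy) - \frac{1}{2} \Big)\, dy + \frac{1}{2} \int_{1}^{\infty} y^{-\frac{1}{2} - s}\, dy \\
&= \int_{1}^{\infty} y^{-\frac{1}{2} - s} \Big( \theta(iy) - \frac{1}{2} \Big)\, dy - \frac{1}{1 - 2s}
\end{aligned}$$

Putting the pieces together,

$$F(s) = \pi^{-s}\, \Gamma(s)\, \zeta(2s) = \underbrace{\int_{1}^{\infty} \big( y^{s-1} + y^{-\frac{1}{2} - s} \big) \Big( \theta(iy) - \frac{1}{2} \Big)\, dy}_{I(s)} - \frac{1}{1 - 2s} - \frac{1}{2s}$$

Note: $I(s) = \int_{1}^{\infty} \big( y^{s-1} + y^{-\frac{1}{2} - s} \big) \big( \theta(iy) - \frac{1}{2} \big)\, dy$ has an integrand that is a power of $y$ times a very quickly decreasing function. It is entire. If we replace $s \mapsto \frac{1}{2} - s$, then $y^{s-1}$ switches with $y^{-\frac{1}{2} - s}$, and $\frac{1}{1 - 2s}$ switches with $\frac{1}{2s}$. Therefore,

Proposition 5.7. $F(s) = F\big( \frac{1}{2} - s \big)$

Usual definition:

$$\xi(s) = \pi^{-\frac{s}{2}}\, \Gamma\Big(\frac{s}{2}\Big)\, \zeta(s)$$

Then we've proven

$$\xi(1 - s) = \xi(s)$$

To get from $\theta$ to $\zeta$, we've passed through $\theta(iy)$ and gotten to $\xi(s)$ by a Mellin transform. Can you go the other way? Yes!

$$\theta(iy) - \frac{1}{2} = \frac{1}{2\pi i} \int_{\mathrm{Re}(s) = \sigma_0} y^{-s}\, F(s)\, ds, \qquad \sigma_0 \gg 0$$

$\theta$ is a modular form, whose transformation law gives the symmetry of the functional equation. $\theta$ is completely determined by its transformation law, and growth rate. This translates to a characterization of $\zeta(s)$ in terms of properties of its functional equation.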

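6. February 9

6.1. One way to prove the functional equation. For $\mathrm{Re}(s) > 0$

$$\zeta(s) = \frac{1}{s - 1} + 1 - s \int_{1}^{\infty} \frac{\{x\}}{x^{s+1}}\, dx$$

You can check that

$$s \int_{1}^{\infty} \frac{\frac{1}{2}}{x^{s+1}}\, dx = \frac{1}{2}$$

Subtracting and adding this to the previous line:

$$\zeta(s) = \frac{1}{s - 1} + \frac{1}{2} - s \int_{1}^{\infty} \frac{\{x\} - \frac{1}{2}}{x^{s+1}}\, dx$$

for $\mathrm{Re}(s) > 0$. But claim that this is true for $\mathrm{Re}(s) > -1$. Write $f(x) = \{x\} - \frac{1}{2}$ and define $F(x) = \int_{1}^{x} f(y)\, dy$, which is bounded because the sawtooth is centered around $y = 0$. Integration by parts:

$$\int_{1}^{b} \frac{f(x)}{x^{s+1}}\, dx = \frac{F(x)}{x^{s+1}} \Big|_{1}^{b} + (s + 1) \int_{1}^{b} \frac{F(x)}{x^{s+2}}\, dx$$

The second integral is convergent for $\sigma > -1$.

Claim 6.1. For $-1 < \sigma < 0$,

$$-s \int_{0}^{1} \frac{\{x\} - \frac{1}{2}}{x^{s+1}}\, dx = \frac{1}{s - 1} + \frac{1}{2}$$

Proof. Since $\{x\} = x$ for $x \in [0, 1)$, we can split up the LHS so that

$$\text{LHS} = -s \int_{0}^{1} x^{-s}\, dx + \frac{s}{2} \int_{0}^{1} x^{-s-1}\, dx = -\frac{s}{1 - s} + \frac{s}{2} \Big[ \frac{x^{-s}}{-s} \Big]_{0}^{1}$$

When $\sigma < 0$ the lower bound of zero is OK. Now it is easy to evaluate this: it equals $\frac{s}{s - 1} - \frac{1}{2} = \frac{1}{s - 1} + \frac{1}{2}$.

Now we can plug this into the previous expression of $\zeta(s)$ and get

$$\zeta(s) = -s \int_{0}^{\infty} \frac{\{x\} - \frac{1}{2}}{x^{s+1}}\, dx$$

for $-1 < \mathrm{Re}(s) < 0$. (We need $\sigma > -1$ so that the integral to $\infty$ will work out, and $\sigma < 0$ for the lower bound.) By Fourier series,

$$\{x\} - \frac{1}{2} = -\sum_{n=1}^{\infty} \frac{\sin(2\pi n x)}{\pi n}$$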

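and the HW implies you can switch the integral and sum, giving

$$\begin{aligned}
\zeta(s) &= \frac{s}{\pi} \sum_{n=1}^{\infty} \frac{1}{n} \int_{0}^{\infty} \frac{\sin(2\pi n x)}{x^{s+1}}\, dx \\
&= \frac{s}{\pi} \sum_{n=1}^{\infty} \frac{(2\pi n)^s}{n} \int_{0}^{\infty} \frac{\sin y}{y^{s+1}}\, dy \\
&= \frac{s}{\pi}\, \zeta(1 - s)\, (2\pi)^s \Big[ -\Gamma(-s) \sin\Big(\frac{\pi}{2} s\Big) \Big] \qquad \text{by the next lemma}
\end{aligned}$$

Lemma 6.2.

$$\int_{0}^{\infty} \frac{\sin(y)}{y^{s+1}}\, dy = -\Gamma(-s) \sin\Big(\frac{\pi}{2} s\Big)$$

Use the contour obtained by going counterclockwise around the quarter circle of radius $X$ in the upper right quadrant, and letting $X$ go to $\infty$. Recall $\sin y = \frac{e^{iy} - e^{-iy}}{2i}$. Here $|e^{iy}| \le 1$ and

$$\Big| \frac{1}{y^{s+1}} \Big| \ll \frac{1}{X^{\mathrm{Re}(s) + 1}}$$

The arc length is $\frac{\pi}{2} X$ and $\frac{\pi}{2} X \cdot \frac{1}{X^{\mathrm{Re}(s) + 1}} \to 0$ as $X \to +\infty$ when $\sigma > 0$ (for $-1 < \sigma \le 0$ one also needs the decay of $e^{iy}$ along the arc, as in Jordan's lemma). So we can concentrate on the straight pieces.

$$\sum \text{Residues} = 0 = \int_{0}^{\infty} \frac{e^{iy}}{y^{s+1}}\, dy + \int_{\text{arc}} + \int_{+i\infty}^{0} \frac{e^{iy}}{y^{s+1}}\, dy$$

Make the substitution $y = iz$:

$$\int_{0}^{+i\infty} \frac{e^{iy}}{y^{s+1}}\, dy = \int_{0}^{\infty} e^{-z} (iz)^{-s-1}\, i\, dz = \big( e^{\frac{\pi}{2} i} \big)^{-s} \underbrace{\int_{0}^{\infty} e^{-z} z^{-s-1}\, dz}_{\Gamma(-s)}$$

Actually, the integral isn't defined at zero, so we need to make a little arc of radius $r$ around zero, and integrate on that instead.

We need to do something similar for $e^{-iy}$ in $\sin y = \frac{e^{iy} - e^{-iy}}{2i}$; this involves flipping the contour over the $x$-axis.

$$\int_{0}^{\infty} \frac{\sin y}{y^{1+s}}\, dy = -\sin\Big(\frac{\pi}{2} s\Big)\, \Gamma(-s)$$

Here we extended the domain of $\zeta(s)$ by a strip to the left. But you can keep doing this, and, adding as many strips as you want, you can get the entire left half-plane.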

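6.2. The filter. Recall we had a normalized version of the functional equation from last time:

$$F(s) = \pi^{-s}\, \Gamma(s)\, \zeta(2s)$$

Using contour integration along a vertical line:

$$\theta(it) - \frac{1}{2} = \frac{1}{2\pi i} \int_{\mathrm{Re}(s) = \sigma_0} t^{-s}\, F(s)\, ds$$

Let $X > 0$, and assume $\sigma_0$ is sufficiently large (at least, $> 0$).

Theorem 6.3.

$$\frac{1}{2\pi i} \int_{\sigma_0 - i\infty}^{\sigma_0 + i\infty} \frac{X^s}{s}\, ds = \begin{cases} 1 & X > 1 \\ \frac{1}{2} & X = 1 \\ 0 & X < 1 \end{cases}$$

We will do the first case; the second is HW. Use a rectangle between the lines $\mathrm{Re}(s) = -A$ and $\mathrm{Re}(s) = \sigma_0$, cut off between $\sigma + iT$ and $\sigma - iT$. The only pole inside is at $s = 0$, where $\frac{X^s}{s}$ has residue 1. So there are four integrals, around the sides of the rectangle, and the goal is to show that the ones other than $\int_{\sigma_0 - iT}^{\sigma_0 + iT}$ go to zero. Let's do the top piece:

$$\Big| \int_{\sigma = -A}^{\sigma = \sigma_0} \frac{X^{\sigma + iT}}{\sigma + iT}\, d\sigma \Big| \le \frac{1}{T} \int_{-A}^{\sigma_0} X^{\sigma}\, d\sigma = \frac{1}{T} \frac{X^{\sigma}}{\log X} \Big|_{\sigma = -A}^{\sigma_0} \le \frac{X^{\sigma_0}}{T \log X}$$

The left-hand line is

$$\Big| \int_{t = -T}^{t = +T} \frac{X^{-A + it}}{-A + it}\, dt \Big| \le 2T\, \frac{X^{-A}}{A}$$

As $A \to \infty$, this goes to zero.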
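The three cases of the filter can be seen numerically by truncating the vertical line at height $\pm T$ and applying the trapezoid rule. This is only a sketch: the parameters `sigma0`, `T`, and `steps` are our choices, and the truncation leaves an error of size roughly $\frac{X^{\sigma_0}}{\pi T |\log X|}$ when $X \ne 1$, so the output only approximates $1$, $\frac{1}{2}$, $0$.

```python
import cmath
import math


def filter_integral(X, sigma0=1.0, T=500.0, steps=100_000):
    """Trapezoid rule for (1/(2 pi i)) * int X^s / s ds over sigma0 - iT .. sigma0 + iT."""
    h = 2 * T / steps
    log_X = math.log(X)
    total = 0j
    for j in range(steps + 1):
        t = -T + j * h
        s = complex(sigma0, t)
        value = cmath.exp(s * log_X) / s
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * value
    # ds = i dt, so (1/(2 pi i)) * i = 1/(2 pi); the imaginary part cancels by symmetry
    return (total * h / (2 * math.pi)).real


for X in (2.0, 1.0, 0.5):
    print(X, filter_integral(X))
```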

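6.3. Unrigorous Mellin transform. Recall for a Dirichlet series $D(s) = \sum a(n) n^{-s}$, we can define a counting function $A(x) = \sum_{n \le x} a(n)$. Actually, modify this:

$$A^*(x) = \begin{cases} A(x) & x \notin \mathbb{Z} \\ \sum_{n < N} a(n) + \frac{1}{2} a(N) & x = N \in \mathbb{Z} \end{cases}$$

(Basically, $A^*(x) = \frac{A(x^+) + A(x^-)}{2}$.) Recall

$$\Lambda(n) = \begin{cases} \log p & n = p^e \\ 0 & \text{else} \end{cases}$$

We also care about the counting function $\psi(x) = \sum_{n \le x} \Lambda(n)$ and the associated function $\psi^*(x) = \frac{\psi(x^+) + \psi(x^-)}{2}$.

Roughly, $D(s)$ and $A(x)$ encode the same information. How do you get from one to the other?

$$D(s) = \int_{1^-}^{\infty} x^{-s}\, dA(x) = -\int_{1}^{\infty} A(x)\, d(x^{-s}) = s \int_{1}^{\infty} A(x)\, x^{-s-1}\, dx$$

This is the Mellin transform: reading right-to-left, you recover $D(s)$ from $A(x)$. The inverse Mellin transform is getting $A(x)$ out of $D(s)$. Consider

$$\frac{1}{2\pi i} \int_{\sigma_0 - i\infty}^{\sigma_0 + i\infty} D(s)\, \frac{X^s}{s}\, ds = \frac{1}{2\pi i} \int_{\sigma_0 - i\infty}^{\sigma_0 + i\infty} \sum_{n=1}^{\infty} a(n)\, \frac{1}{s} \Big( \frac{X}{n} \Big)^s\, ds$$

This is just applying the filter to each term, with $\frac{X}{n}$ playing the role of $X$ in the definition, and you get a factor of 1 exactly when $\frac{X}{n} > 1 \iff X > n$, and these are the terms $a(n)$ that get selected. This just gets you $A^*(X)$. Now do this rigorously.

Theorem 6.4.

$$A^*(x) = \lim_{T \to \infty} \frac{1}{2\pi i} \int_{\sigma_0 - iT}^{\sigma_0 + iT} D(s)\, \frac{x^s}{s}\, ds$$

where the limit is taken with fixed $x$.

Let $D_N(s) = \sum_{n \le N} a(n) n^{-s}$, $D^N(s) = \sum_{n > N} a(n) n^{-s}$. So $D = D_N + D^N$. (And yes, the upper and lower $N$'s are kind of bad notation. . . ) It suffices to prove that

$$\int_{\sigma_0 - iT}^{\sigma_0 + iT} D^N(s)\, \frac{x^s}{s}\, ds \to 0$$

(letting $T \to \infty$, and then $N \to \infty$). Use a rectangle with vertices $\{\sigma_0 \pm iT, T \pm iT\}$, with counterclockwise direction. Everything is regular here, assuming that $\sigma_0 > \max\{0, \sigma_c\}$; the only pole of the integrand is at $s = 0$, outside the rectangle. Labelling the left side $A$ and the other three sides $B, C, D$, we know that

$$\int_A + \int_B + \int_C + \int_D = 0$$

The claim is that $\int_B, \int_C, \int_D \to 0$; this will imply that $\int_A = \int_{\sigma_0 - iT}^{\sigma_0 + iT} \to 0$.

Lemma 6.5.

$$D^N(s) = \int_{N}^{\infty} u^{-s}\, d(A(u) - A(N))$$

(The $-A(N)$ doesn't actually do anything, but we put it there for clarity.) This is

$$= s \int_{N}^{\infty} (A(u) - A(N))\, u^{-s-1}\, du$$

Choose some $\theta$ between $\sigma_c$ and $\sigma_0$. Claim $A(u) - A(N) \ll u^{\theta}$. (This comes from the lim sup discussion from earlier.) Next time, we use the integral and the inequality to produce immediate estimates on all the contours.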

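7. February 14

7.1. Perron's theorem. We have a Dirichlet series $D(s) = \sum a(n) n^{-s}$, have defined $A(x) = \sum_{n \le x} a(n)$ and $A^*$ as before. The Mellin transform is

$$D(s) = s \int_{1}^{\infty} A(x)\, x^{-s-1}\, dx$$

Using the unrigorous method from last time,

$$A^*(x) \overset{?}{=} \frac{1}{2\pi i} \int_{\sigma_0 - i\infty}^{\sigma_0 + i\infty} D(s)\, \frac{x^s}{s}\, ds$$

where $\sigma_0 > \max\{0, \sigma_c\}$. This is certainly true if we're allowed to swap the sum (in $D(s)$) with the integral: one term looks like

$$\frac{1}{2\pi i} \int_{\sigma_0 - i\infty}^{\sigma_0 + i\infty} \frac{a(n)}{n^s}\, \frac{x^s}{s}\, ds = a(n)\, \frac{1}{2\pi i} \int \Big( \frac{x}{n} \Big)^s \frac{ds}{s} = \begin{cases} a(n) & n < x \\ \frac{1}{2} a(n) & n = x \\ 0 & n > x \end{cases}$$

The $\frac{1}{2}$ in the middle case is why we had to use the averaging procedure for $A^*(x)$ instead of $A(x)$. So

$$\sum_{n=1}^{\infty} \frac{1}{2\pi i} \int a(n)\, n^{-s}\, \frac{x^s}{s}\, ds = A^*(x)$$

Now we need to make this rigorous. Fix $x$. Choose $N \gg x$. Write

$$D(s) = \underbrace{\sum_{n \le N} a(n) n^{-s}}_{D_N(s)} + \underbrace{\sum_{n > N} a(n) n^{-s}}_{D^N(s)}$$

Since swapping sums and integrals is fine when the sums are finite, we can say

$$A^*(x) = \frac{1}{2\pi i} \int D_N(s)\, \frac{x^s}{s}\, ds$$

We need to show that

$$\int D^N(s)\, \frac{x^s}{s}\, ds \to 0 \text{ as } N \to \infty$$

Make the rectangle with vertices $\{\sigma_0 \pm iT, T \pm iT\}$. We want to show that $\int_{\sigma_0 - iT}^{\sigma_0 + iT}$ (the left vertical piece) is small; to do that, we show that the other integrals are small. We can write

$$D^N(s) = \int_{N^+}^{\infty} u^{-s}\, d(A(u) - A(N)) = s \int_{N}^{\infty} (A(u) - A(N))\, u^{-s-1}\, du$$

by Riemann-Stieltjes. Assume $A(u) - A(N) \ll u^{\theta}$ for $\theta \in (\max\{0, \sigma_c\}, \sigma_0)$. Then

$$D^N(s) \ll |s| \int_{N}^{\infty} u^{\theta - \sigma - 1}\, du = |s|\, \frac{N^{\theta - \sigma}}{\sigma - \theta}$$

Now the integral we want to evaluate is

$$(5) \qquad \int_{*} \frac{|s|\, N^{\theta - \sigma}}{\sigma - \theta} \cdot \frac{x^{\sigma}}{|s|}\, |ds| = N^{\theta} \int_{*} \Big( \frac{x}{N} \Big)^{\sigma} \frac{|ds|}{\sigma - \theta}$$

When $*$ is one of the horizontal pieces:

$$\le \frac{N^{\theta}}{\sigma_0 - \theta} \int_{\sigma_0}^{T} \Big( \frac{x}{N} \Big)^{\sigma}\, d\sigma = \frac{N^{\theta}}{\sigma_0 - \theta} \cdot \frac{\big( \frac{x}{N} \big)^{\sigma_0} - \big( \frac{x}{N} \big)^{T}}{\log \frac{N}{x}} \le \frac{N^{\theta - \sigma_0}\, x^{\sigma_0}}{(\sigma_0 - \theta) \log \frac{N}{x}}$$

Here $N$ is large so $\big( \frac{x}{N} \big)^{T}$ is small, and $\theta - \sigma_0 < 0$ by assumption, so this goes to zero as $N \to \infty$.

To do the right-hand vertical integral $\int_{T - iT}^{T + iT}$, plug in $\sigma = T$ to (5) (obtaining an integrand that does not depend on $t$):

$$N^{\theta} \int_{-T}^{+T} \Big( \frac{x}{N} \Big)^{T} \frac{dt}{T - \theta} = N^{\theta} \Big( \frac{x}{N} \Big)^{T} \underbrace{\frac{2T}{T - \theta}}_{\text{bounded}}$$

Choose $N$ large such that $\frac{x}{N} \le \frac{1}{2}$; then $N^{\theta} \big( \frac{x}{N} \big)^{T} \le N^{\theta}\, 2^{-T}$. Remember, we were taking $\lim_{T \to \infty}$ before taking $\lim_{N \to \infty}$, because we were trying to find $\lim_{T \to \infty} \int_{\sigma_0 - iT}^{\sigma_0 + iT} D^N(s)\, \frac{x^s}{s}\, ds$ for each $N$, and then hope that those integrals approached zero as $N \to \infty$. So here, $\lim_{T \to \infty} N^{\theta}\, 2^{-T} = 0$ for every $N$.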

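7.2. Gamma facts. We're interested in the behavior of $\Gamma$ on upper vertical strips: $\sigma + it$, where $|t| > 1$ and $\sigma$ is in some strip.

Theorem 7.1 (Stirling's formula). $|\Gamma(s)| \approx e^{-\frac{\pi}{2}|t|}$, up to a factor that is a power of $|t|$.

By definition,
\[ \Gamma(s) = \int_0^\infty e^{-t}\,t^{s}\,\frac{dt}{t}. \]
It is the Mellin transform of $e^{-t}$. So we can do the inverse Mellin transform:
\[ e^{-t} = \frac{1}{2\pi i}\int_{\text{line}} \Gamma(s)\,t^{-s}\,ds. \]
(It's a little like applying the filter to $s\Gamma(s)$...)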
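The exponential decay in Stirling's formula can be seen exactly on the line $\sigma = \tfrac12$, where the reflection formula $\Gamma(s)\Gamma(1-s) = \pi/\sin \pi s$ gives $|\Gamma(\tfrac12 + it)|^2 = \pi/\cosh(\pi t)$. The sketch below (function names are illustrative, not from the lecture) compares this closed form with $\sqrt{2\pi}\,e^{-\pi|t|/2}$; on this particular line the power of $|t|$ in Stirling's formula is $|t|^0$, so no extra factor appears.

```python
import math

def abs_gamma_half_line(t):
    """|Gamma(1/2 + i t)|, from Gamma(s) Gamma(1 - s) = pi / sin(pi s) at s = 1/2 + i t."""
    return math.sqrt(math.pi / math.cosh(math.pi * t))

def stirling_decay(t):
    """Leading Stirling behaviour on sigma = 1/2: sqrt(2 pi) * exp(-pi |t| / 2)."""
    return math.sqrt(2 * math.pi) * math.exp(-math.pi * abs(t) / 2)

for t in (1.0, 5.0, 20.0):
    # the ratio tends to 1 very quickly as t grows
    print(t, abs_gamma_half_line(t) / stirling_decay(t))
```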

7.3. Hecke's theorem. Let $\lambda, k$ be positive reals, and choose sequences with $a(n), b(n) \ll n^{c}$ for some $c > 0$. Define the Dirichlet series
\[ \varphi(s) = \sum_{n=1}^\infty a(n)n^{-s}, \qquad \psi(s) = \sum_{n=1}^\infty b(n)n^{-s}. \]

(Note that $a(0)$ and $b(0)$ are not used here. This will be a constant annoyance.) These act like $\zeta(2s)$, and are defined in some right half-plane. However, we want expressions for $\varphi(k - s)$ (which aren't even defined yet.) To get a better relationship, we introduce $\Gamma$-factors:
\[ \Phi(s) = \left(\frac{\lambda}{2\pi}\right)^{s}\Gamma(s)\varphi(s), \qquad \Psi(s) = \left(\frac{\lambda}{2\pi}\right)^{s}\Gamma(s)\psi(s). \]
Define
\[ f(z) = \sum_{n=0}^\infty a(n)e^{2\pi i n z/\lambda}, \qquad g(z) = \sum_{n=0}^\infty b(n)e^{2\pi i n z/\lambda}. \]

We need to show:
(1) $\varphi(s), \psi(s)$ are analytic in the right half-plane $\sigma > c + 1$. (Use the bounds $a(n), b(n) \ll n^{c}$.)
(2) $f(z), g(z)$ are analytic in the upper half plane. Also, $f(x + iy), g(x + iy) \ll y^{-c-1}$ as $y \to 0^+$.
(3) $\Phi(s) = \int_0^\infty (f(iy) - a(0))\,y^{s-1}\,dy$ and similarly for $\Psi$. (Unrigorously, this is obvious; the exercise is to justify the commutation of $\sum$ and $\int$.)
(4) Crux: the following statements are equivalent:
(a) The function $\Phi(s) + \frac{a(0)}{s} + \frac{b(0)}{k - s}$ is entire on $\mathbb{C}$, and is bounded on every vertical strip, and $\Phi(s) = \Psi(k - s)$.
(b) $g\left(-\frac1z\right) = \left(\frac{z}{i}\right)^{k} f(z)$.

Example 7.2. $\theta$-function: $k = \tfrac12$, $\lambda = 1$, $g = f = \theta$:
\[ \theta^{m}\left(-\frac1z\right) = \left(\frac{z}{i}\right)^{m/2}\theta^{m}(z). \]
This helps you estimate the function that counts the number of ways to write $n$ as a sum of $m$ squares. Equivalence in the forward direction is what we did in the case of the Riemann zeta function.
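The transformation law for $\theta$ is easy to test on the imaginary axis. This is a minimal sketch under an assumed normalization, $\theta(z) = \sum_{n \in \mathbb{Z}} e^{\pi i n^2 z}$, so that $\theta(iy) = \sum_n e^{-\pi n^2 y}$; the lecture's own normalization may differ by a rescaling of $z$. With $z = iy$ the law $\theta(-1/z) = (z/i)^{1/2}\theta(z)$ reads $\theta(i/y) = \sqrt{y}\,\theta(iy)$.

```python
import math

def theta(y, terms=60):
    """theta(iy) = sum over all integers n of exp(-pi n^2 y), for y > 0
    (assumed normalization; the series is truncated at |n| <= terms)."""
    return 1.0 + 2.0 * sum(math.exp(-math.pi * n * n * y) for n in range(1, terms + 1))

for y in (0.5, 1.0, 2.0):
    lhs = theta(1.0 / y)            # theta(-1/z) at z = iy
    rhs = math.sqrt(y) * theta(y)   # (z/i)^(1/2) * theta(z) at z = iy
    print(y, lhs, rhs)
```

Raising both sides to the $m$-th power gives the weight $m/2$ law quoted in the example.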

7.4. Digression: Lattice theory in dimension 24. Let $L \subset \mathbb{R}^N$ be a lattice. You can produce a theta function $\theta_L$ associated to the lattice:
\[ \theta_L(z) = \sum a(n)q^{n} \quad\text{where } a(n) = \#\{\lambda \in L : \langle\lambda, \lambda\rangle = n\}. \]
If $N = 1$, then this is our $\theta$. But any of them satisfies a transformation law. If $L \subset \mathbb{R}^{24}$, then elementary analysis shows that we can write
\[ \theta_L(z) = E(z) + D(z) \]
where $E$ and $D$ are both modular forms of weight 12. $E(z)$ has large coefficients, but is very elementary (it's an Eisenstein series); it gives the dominant term for the arithmetic function $n \mapsto a(n)$. $D(z)$ has smaller coefficients, but is somewhat deep. It turns out that $D(z)$ is a multiple of $\Delta(q) = q\prod_{n=1}^\infty (1 - q^{n})^{24}$. For higher Waring problems, we don't have symmetries as nice.

8. February 16

8.1. Phragmén-Lindelöf Theorem.

Theorem 8.1. Let $f(s)$ be analytic in some upper vertical strip: $\sigma_1 \le \sigma \le \sigma_2$ and $t \ge T_1$. Require $|f(s)| = O(e^{t^{a}})$ on the entire strip, and $|f(s)| = O(t^{M})$ on the vertical boundary. Then $|f(s)| = O(t^{M})$ on the whole strip.

Corollary 8.2. If $|f(s)|$ is bounded on the boundary, and $|f(s)| = O(e^{t^{a}})$ on the strip, then it's bounded on the whole strip.

Clearly, the theorem implies the corollary. But:

Claim 8.3. The corollary implies the theorem.

Proof. Suppose $f(s)$ satisfies the hypotheses of the theorem. Then $g(s) = f(s)/s^{M}$ is still analytic on the strip, and still satisfies the interior bound. But, it's also now bounded on the boundary. By the corollary, $g(s)$ is bounded everywhere on the strip. So $f(s) = g(s)\,s^{M}$ is $O(t^{M})$.

So it suffices to prove the corollary.

Proof of the corollary. Let $|f(s)| \le B$ on the boundary, and $O(e^{t^{a}})$ on the entire strip. Choose $m > a$ such that $m \equiv 2 \pmod 4$. Write $s^{m} = r^{m}e^{im\theta}$. It's OK to move $T_1$ once, so assume that the line from $s$ (in the strip) to the origin makes an angle with the real axis that is almost $90^\circ$. That is, $s$ in the strip looks like $r e^{i\theta}$ where $\theta \approx \frac{\pi}{2}$. Furthermore, assume that $m\theta \approx \pi \pmod{2\pi}$ (this is why we want $m \equiv 2 \pmod 4$). So if $s^{m} = r^{m}(\cos m\theta + i\sin m\theta)$ where $\sin m\theta$ is close to zero, and $\cos m\theta \approx -1$, then $s^{m} \approx -r^{m}$. For the moment choose a constant $c > 0$, and define
\[ g_c(s) = f(s)e^{cs^{m}} \]
(recall $\operatorname{Re} s^{m} < 0$). This is bounded on the boundary, and on the interior it's not so bad:
\[ |g_c(s)| = O\bigl(e^{t^{a} + c r^{m}\cos m\theta}\bigr) \]
(recall $\cos m\theta < 0$). The second term will eventually swamp the first: we can find $T_2$ large enough such that if $t \ge T_2$, then $t^{a} + c\,r^{m}\cos m\theta \le 0$. (Recall that these things are related by $s = re^{i\theta} = \sigma + it$, and here $r^{m}\cos m\theta = \operatorname{Re} s^{m}$.) Let $T_3 \ge T_2$ be arbitrary. Then $g_c$ is bounded on all four sides of the rectangle $T_1 \le t \le T_3$, $\sigma_1 \le \sigma \le \sigma_2$, because:
\[ |f(s)| \le Be^{t^{a}} \implies |g_c(s)| \le Be^{t^{a} + c r^{m}\cos m\theta} \implies |g_c(s)| \le B. \]

Use the maximum modulus principle for $g_c$ on our rectangle. $g_c$ is analytic on the region, and has absolute value bounded by $B$. That is, $|f(s)| \le B\,e^{-c r^{m}\cos m\theta}$ for any $s$ in this strip. This is true for every $c$. Taking the limit gives
\[ |f(s)| \le \lim_{c \to 0} B\,e^{-c r^{m}\cos m\theta} = B \]
as desired.

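8.2. Stuff about the HW. We had
\[ \sum_{n=1}^\infty a(n)e^{-ny} \]
for $a(n) \ll n^{c}$. Break this up as
\[ \sum_{n=1}^\infty a(n)\,e^{-ny/2}\,e^{-ny/2}, \]
where in the estimate one of the factors is replaced by 1.

(New HW.) Assuming the functional equation (i.e. $\Phi(s) = \text{stuff}\cdot\Psi(k - s)$), Stirling's formula, and Phragmén-Lindelöf, but not assuming Hecke's theorem, you can show that $|\varphi(s)| = O(|t|^{M})$ in vertical strips. Important case: assume that $a(0) = b(0) = 0$. Break up
\[ \Phi(s) = \int_0^\infty f(iy)\,y^{s-1}\,dy = \int_1^\infty + \int_0^1. \]
We know that $f(iy)$ is rapidly decreasing, so $\int_1^\infty$ is absolutely convergent. Convert $\int_0^1$ into a similar integral with bounds $\int_1^\infty$ that is absolutely convergent.

Assume that $a(0) = b(0) = 0$. From what we've just said, we know that $\Phi(s)$ is bounded in vertical strips. Also, it satisfies the functional equation $\Phi(s) = \Psi(k - s)$. You can show
\[ g\left(-\frac1z\right) = \left(\frac{z}{i}\right)^{k} f(z). \]
We have:
\[ f(iy) = \frac{1}{2\pi i}\int_{\sigma - i\infty}^{\sigma + i\infty} y^{-s}\,\Phi(s)\,ds, \qquad g(iy) = \frac{1}{2\pi i}\int_{\sigma - i\infty}^{\sigma + i\infty} y^{-s}\,\Psi(s)\,ds \]
for $\sigma \gg 0$. But this isn't so useful if we want to use $\Phi(s) = \Psi(k - s)$. First, think of $\int_{\sigma - i\infty}^{\sigma + i\infty}$ as a limit of $\int_{\sigma - iT}^{\sigma + iT}$. We'll have to move far to the left, so that $k - s$ corresponds to something that is safe for $\Psi$. Let $\sigma > k$. Integrate around the box with vertices $\{\sigma \pm iT,\ -\sigma \pm iT\}$. The claim is that the contribution of the horizontal bits goes to zero as $T \to \infty$, so you end up integrating
\[ \lim_{T \to \infty}\left(\int_{\sigma - iT}^{\sigma + iT} y^{-s}\,\Phi(s)\,ds - \int_{-\sigma - iT}^{-\sigma + iT} y^{-s}\,\Phi(s)\,ds\right) = \text{Residue}. \]

If you integrate along the top edge of the rectangle you get
\[ \int_{-\sigma + iT}^{\sigma + iT} y^{-s}\,\Phi(s)\,ds \ll \int y^{-\operatorname{Re}(s)}\,|\Gamma(s)|\,|\varphi(s)|\,d\sigma. \]
By Phragmén-Lindelöf, and maybe Stirling, we have $\varphi(s) = O(t^{M})$. By Stirling, $\Gamma(s) \approx e^{-\frac{\pi}{2}t}$. The integral of $y^{-\operatorname{Re}(s)}$ over the edge is just a constant. So you get a bound out of the other two parts. This bound goes to zero as $T \to \infty$. Eventually, you've shown that you can move the line of integration from $+\sigma$ to $-\sigma$.

Hint for Carl's problem: you're trying to switch the order of summation and integration in
\[ \int_0^\infty \sum_{n} \frac{\sin(2\pi n x)}{\pi n}\,\frac{dx}{x^{s+1}}. \]
Break this up into
\[ \int_0^{B} + \int_{B}^\infty \]
where you can use the dominated convergence theorem on the first part, and the second part should go to zero. Use lemma (D.1) in the book to help bound the partial sums.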

8.3. Entire holomorphic functions (Levent). References: Elkies' Math 229 in 2009.

Definition 8.4. Let $p_i \in \mathbb{C}$. We say that $\prod_{i=1}^\infty p_i$ converges to $p \neq 0$ if
\[ \lim_{N \to \infty}\prod_{i \le N} p_i = p. \]
If any $p_i = 0$, then say the product is zero. Now, assume that none of the terms are zero.

Proposition 8.5. $\prod_{n \ge 1}(1 + a_n)$ converges iff $\sum_{n \ge 1}\log(1 + a_n)$ converges.

Proof. ($\Leftarrow$) $\exp\left(\sum_{n=1}^{N}\log(1 + a_n)\right) = \prod_{n=1}^{N}(1 + a_n)$.

($\Rightarrow$) Fix a branch of log. Write $P_n = \prod_{i=1}^{n} p_i$, and $P = \lim P_n$. We know that $P_n/P \to 1$. Write $S_n = \sum_{i=1}^{n}\log(1 + a_i)$. Define the integer $h_n$ by
\[ \log(P_n/P) = S_n - \log P + 2\pi i h_n. \]
But also
\[ 2\pi i(h_{n+1} - h_n) = \log(P_{n+1}/P) - \log(P_n/P) - \log(1 + a_{n+1}) \to 0. \]
For $n$ sufficiently large, $|2\pi i(h_{n+1} - h_n)| < 2\pi$, which implies that $h_{n+1} = h_n$. So $h_n$ is eventually constant, and $S_n$ converges.

Definition 8.6. $\prod(1 + a_n)$ converges absolutely if $\sum|\log(1 + a_n)| < \infty$.

Proposition 8.7. $\prod(1 + a_n)$ converges absolutely iff $\sum|a_n| < \infty$.

Lemma 8.8. Let $f$ be entire and holomorphic such that $f \neq 0$ anywhere. Then there is some $g$ that is entire and holomorphic such that $f = \exp(g)$.

Corollary 8.9. Let $f$ be entire and holomorphic, with nonzero zeroes at $a_1, \dots, a_n$. Then there is some $m \in \mathbb{N}$, and $g$ entire such that
\[ f = z^{m}e^{g}\prod_{i=1}^{n}\left(1 - \frac{z}{a_i}\right). \]

Note: the product $\prod\left(1 - \frac{z}{a_n}\right)$ converges absolutely iff $\sum\frac{1}{a_n}$ converges absolutely.

Theorem 8.10. Let $f$ be entire, $\{a_n\}$ its zeroes. Then: there exists $m \in \mathbb{N}$, polynomials $p_n(z)$, and an entire function $g$ such that
\[ f(z) = z^{m}e^{g(z)}\prod_{n=1}^\infty\left(1 - \frac{z}{a_n}\right)e^{p_n(z)}. \]

Moreover, if $\{a_n\}$ is a sequence of complex numbers such that $|a_n| \to \infty$, then there exist $m_n \in \mathbb{N}$ such that
\[ \prod_{n=1}^\infty\left(1 - \frac{z}{a_n}\right)\exp\left(\frac{z}{a_n} + \frac12\left(\frac{z}{a_n}\right)^{2} + \cdots + \frac{1}{m_n}\left(\frac{z}{a_n}\right)^{m_n}\right) \]
converges to an entire function. Note: one can choose $p_n$ to have degree $h$ iff $\sum\frac{1}{|a_n|^{h+1}}$ converges.
9. February 21

9.1. More on Weierstrass' theorem.

Theorem 9.1. Let $a_n$ be nonzero complex numbers, where $|a_n| \to \infty$. Then
(1) There exist $p_n(z)$ for which $\prod\left(1 - \frac{z}{a_n}\right)e^{p_n(z)}$ converges uniformly (absolutely) to an entire function;
(2) If $f \not\equiv 0$ is entire, and $\{a_n\}$ are its (nonzero) zeroes with multiplicity, then there exist $m \in \mathbb{N}$, a sequence $m_n \in \mathbb{N}$, and an entire function $g$ such that
\[ f(z) = e^{g(z)}z^{m}\prod_{n}\left(1 - \frac{z}{a_n}\right)\exp\left(\frac{z}{a_n} + \frac12\left(\frac{z}{a_n}\right)^{2} + \cdots + \frac{1}{m_n}\left(\frac{z}{a_n}\right)^{m_n}\right). \]

General idea: if there are finitely many $a_n \neq 0$, then you can write
\[ f(z) = z^{m}\prod\left(1 - \frac{z}{a_n}\right)\cdot(\text{entire function}) \]
where $m$ is the order of vanishing (if at all) at zero, and the entire function can be written $e^{\text{something}}$. When we introduce infinitely many zeroes $a_n \neq 0$, then we need to worry whether $\prod\left(1 - \frac{z}{a_n}\right)$ converges. In general, it doesn't. But I claim that we can fix this by multiplying by $e^{\text{polynomial}}$. To see this, use the definition from last time that says that you can check convergence of products by first taking the log, and then checking convergence of the resulting sum. After taking logs, we want
\[ \sum_n\left[\log\left(1 - \frac{z}{a_n}\right) + \text{polynomial}\right] \]
to converge for some polynomial. In fact, this polynomial can be solved for by writing out the Taylor expansion for log.

Remark 9.2. An interesting case is when $m_n = h$ (i.e. each fudge polynomial is the same length). In this case, the minimal such $h$ is called the genus of $\{a_n\}$. If $f$ is entire, such an $h$ exists, and $g$ is a polynomial, then the genus of $f$ is $\max\{h, \deg g\}$. The point is that you can measure this, up to an error of 1, thanks to an estimate of Hadamard that measures growth rate. Suppose $|f| \le O(\exp|z|^{a})$. The aim is to recover $a$.

Definition 9.3. Let $f$ be entire. Let $M(r) = \sup_{|z| \le r}|f(z)|$. Then the order of $f$ is
\[ \rho = \limsup_{r \to \infty}\frac{\log\log M(r)}{\log r}. \]
Alternatively, $\rho$ is minimal such that, for all $\varepsilon > 0$,
\[ |f| \le O(\exp|z|^{\rho + \varepsilon}). \]
This is something we can actually measure!

Theorem 9.4 (Hadamard). Let $f$ be entire. Then $\rho < \infty \iff h < \infty$, and in this case, $h \le \rho \le h + 1$.

Example 9.5. Recall that Euler used the factorization
\[ \sin(\pi z) = \pi z\prod_{n=1}^\infty\left(1 - \frac{z^{2}}{n^{2}}\right) \]
to find $\zeta(2)$. We know where the zeroes of $\sin \pi z$ are: they are just the integers, and so the $a_n$ are the integers. Euler knew that $\sum\frac{1}{n^{2}}$ converges. We can take $h = 1$. Since $\sin \pi z$ has a simple zero at each integer,
\[ \sin(\pi z) = ze^{g}\prod_{n \neq 0}\left(1 - \frac{z}{n}\right)e^{z/n} = ze^{g}\prod_{n=1}^\infty\left(1 - \frac{z^{2}}{n^{2}}\right). \]

Let's take logarithmic derivatives, and ignore convergence issues:
\[ \pi\cot(\pi z) = \frac1z + g'(z) + \sum_{n \neq 0}\left(\frac1n + \frac{1}{z - n}\right). \]

By the expansion of cot, or using the Hadamard factorization theorem (which implies that $g$ has degree at most 1...), this is also $\frac1z + \sum_{n \neq 0}\left(\frac1n + \frac{1}{z - n}\right)$. So $g'$ is identically zero, and $g$ is constant. Note that $\frac{\sin(\pi z)}{z} \to \pi$ as $z \to 0$, and so $e^{g} = \pi$.
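Euler's product for the sine can be watched converging numerically. The sketch below (the name `sine_product` and the truncation parameter `terms` are illustrative) multiplies out a finite part of $\pi z\prod(1 - z^2/n^2)$ and compares it with $\sin(\pi z)$; the neglected tail contributes a relative error of roughly $z^2/\text{terms}$, which reflects the fact that the product converges only as fast as $\sum 1/n^2$.

```python
import math

def sine_product(z, terms=100000):
    """Partial product pi*z * prod_{n=1}^{terms} (1 - z^2 / n^2)."""
    product = math.pi * z
    for n in range(1, terms + 1):
        product *= 1.0 - (z * z) / (n * n)
    return product

for z in (0.25, 0.5, 1.5):
    print(z, sine_product(z), math.sin(math.pi * z))
```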

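9.2. The Riemann $\xi$ function. Recall the Weierstrass product expansion of the gamma function:
\[ \Gamma(s) = \frac{e^{-Cs}}{s}\prod_{n=1}^\infty\frac{e^{s/n}}{1 + \frac{s}{n}}. \]

It has a pole at zero, and at every negative integer. Levent's $e^{g}$ is $e^{-Cs}$. We can check the formula
\[ \Gamma(s)\Gamma(1 - s) = \frac{\pi}{\sin \pi s}. \]
If you unfold this, you will get Euler's formula. Plausibility check: there's a simple pole at every integer on the RHS, and on the LHS $\Gamma(s)$ gives the poles at nonpositive integers and $\Gamma(1 - s)$ gives the poles at positive integers. Or, you can write it as
\[ \Gamma\left(\frac s2\right)\Gamma\left(1 - \frac s2\right) = \frac{\pi}{\sin\frac{\pi s}{2}} \tag{6} \]
and
\[ \Gamma(s)\,\Gamma\left(s + \frac12\right) = \pi^{\frac12}\,2^{1-2s}\,\Gamma(2s). \]

Riemann didn't define our completed zeta function; he defined the $\xi(s)$ function:
\[ \xi(s) = \frac12 s(s - 1)\,\underbrace{\pi^{-\frac s2}\,\Gamma\left(\frac s2\right)\zeta(s)}_{\text{our completed zeta function}}. \]

Having killed the poles, $\xi(s)$ is entire, and it satisfies a functional equation:
\[ \xi(s) = \xi(1 - s). \]
We also have $\xi(\bar s) = \overline{\xi(s)}$.

If $\rho$ is a zero of $\xi(s)$, then the usual notation is to write $\rho = \beta + i\gamma$, and $1 - \rho$ and $\bar\rho$ are zeroes. So related zeroes are:
\[ \beta + i\gamma, \qquad 1 - \beta - i\gamma, \qquad \beta - i\gamma, \qquad 1 - \beta + i\gamma. \]
So if you have a zero that's off the line $\operatorname{Re}(s) = \frac12$, then there's another one on the other side.

Theorem 9.6.
\[ \zeta(s) = \underbrace{2^{s}\pi^{s-1}\,\Gamma(1 - s)\sin\frac{\pi s}{2}}_{\text{stuff relating to the functional equations of }\Gamma}\,\zeta(1 - s). \]

We might call this an inhomogeneous functional equation.

Proof. A nuisance. But you just use the original functional equation of $\xi$, plus the prior formulas for $\Gamma(s)$. In the formula for $\xi$, we can ignore $\frac12 s(s - 1)$, because it's symmetric around $\frac12$:
\[ \pi^{-\frac s2}\,\Gamma\left(\frac s2\right)\zeta(s) = \pi^{-\frac{1-s}{2}}\,\Gamma\left(\frac{1-s}{2}\right)\zeta(1 - s), \]
\[ \zeta(s) = \pi^{s - \frac12}\,\frac{\Gamma\left(\frac{1-s}{2}\right)}{\Gamma\left(\frac s2\right)}\,\zeta(1 - s). \]

But we have a formula for the fraction term, gotten by rearranging (6), and multiplying both sides by $\Gamma\left(\frac{1-s}{2}\right)$:
\[ \frac{\Gamma\left(\frac{1-s}{2}\right)}{\Gamma\left(\frac s2\right)} = \frac1\pi\,\sin\frac{\pi s}{2}\;\Gamma\left(1 - \frac s2\right)\Gamma\left(\frac{1-s}{2}\right) \]
which we plug in to get
\[ \zeta(s) = \pi^{s - \frac32}\,\sin\frac{\pi s}{2}\;\Gamma\left(1 - \frac s2\right)\Gamma\left(\frac{1-s}{2}\right)\zeta(1 - s). \]
I claim we're done, by Legendre's formula
\[ \Gamma\left(\frac s2\right)\Gamma\left(\frac{s+1}{2}\right) = \pi^{\frac12}\,2^{1-s}\,\Gamma(s) \]
in which we can replace $s$ by $1 - s$:
\[ \Gamma\left(\frac{1-s}{2}\right)\Gamma\left(1 - \frac s2\right) = \pi^{\frac12}\,2^{s}\,\Gamma(1 - s). \]
Plug in and compare with what we want. All the terms are there; you can do the bookkeeping.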
9.3. Zeroes and poles of $\zeta$. We know that $\zeta$ has a simple pole at $s = 1$. To the right of the line $\sigma = 1$, there are no zeroes or poles. By what we just proved about the symmetry of the zeroes, $\zeta(1 - s)$ does not contribute any zeroes to the left of $\sigma = 0$ (we're following the formula for $\zeta$ in the theorem.) $\sin\frac{\pi s}{2}$ has a zero at every even integer. $\Gamma(1 - s)$ has a simple pole at all positive integers, and no zeroes. These poles cancel out the zeroes, except at the even negative integers. These are called the trivial zeroes of $\zeta$. Other than these, there are no zeroes to the left of $\sigma = 0$. So there is a critical strip $0 \le \sigma \le 1$, where we don't know where the zeroes are. (It has been proven that there are no zeroes on the boundary lines.)

9.4. Bernoulli numbers. Curiously, Jacob Bernoulli introduced the numbers as coefficients of polynomials, whose growth was studied. They appear everywhere: the numerators are related to the number of differentiable structures on spheres; the denominators are related to the image of the J-homomorphism in algebraic topology. Define
\[ S_k(n) = \sum_{1 \le a < n} a^{k}. \]
This is a polynomial
\[ \frac{1}{k+1}n^{k+1} + \cdots + B_k\,n \]
where $B_k$, the Bernoulli number, is defined as above, as the coefficient of $n$ in this polynomial. The Bernoulli numbers also arise in the generating function:
\[ \frac{x}{e^{x} - 1} = 1 - \frac x2 + \sum_{k=1}^\infty B_{2k}\frac{x^{2k}}{(2k)!}. \]

(You can use the generating function form to prove that the coefficients are rational.) All the odd terms (except $B_1 = -\frac12$) are zero, so we often write the Bernoulli numbers as $B_{2k}$. There are other useful formulas:
\[ B_{2k} = -2\,(2k)!\,(2\pi i)^{-2k}\,\zeta(2k). \]
We know that $\zeta(1 - 2k)$ is related to $\zeta(2k)$; this gives rise to a formula
\[ -\frac{B_{2k}}{2k} = \zeta(1 - 2k). \]
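The rationality of the Bernoulli numbers and the link to $\zeta(2k)$ can be made concrete. The following sketch computes $B_0, \dots, B_{12}$ exactly from the standard recurrence $\sum_{j=0}^{m}\binom{m+1}{j}B_j = 0$ (which is equivalent to the generating function above, with $B_1 = -\tfrac12$), and then evaluates $\zeta(2k) = (-1)^{k+1}(2\pi)^{2k}B_{2k}/(2\,(2k)!)$, a rearrangement of the formula for $B_{2k}$. The function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial, pi

def bernoulli_numbers(limit):
    """Return [B_0, ..., B_limit] (convention B_1 = -1/2), using
    sum_{j=0}^{m} C(m+1, j) * B_j = 0 for every m >= 1."""
    B = [Fraction(1)]
    for m in range(1, limit + 1):
        acc = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append(-acc / (m + 1))
    return B

def zeta_even(k, B):
    """zeta(2k) = (-1)^(k+1) * (2 pi)^(2k) * B_{2k} / (2 * (2k)!)."""
    return (-1) ** (k + 1) * (2 * pi) ** (2 * k) * float(B[2 * k]) / (2 * factorial(2 * k))

B = bernoulli_numbers(12)
print([str(B[2 * k]) for k in range(1, 7)])   # exact rational values of B_2, ..., B_12
print(zeta_even(1, B), pi ** 2 / 6)           # Euler's value of zeta(2)
```

The same list gives the values at negative odd integers, for example $\zeta(-1) = -B_2/2 = -\tfrac{1}{12}$.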

9.5. Facts about the logarithmic derivative. Consider the operator
\[ D : f \mapsto \frac{f'}{f} = \frac{d}{dz}\log f(z). \]
Useful properties:
\[ D(fg) = D(f) + D(g), \qquad D\bigl((z - a)^{m}\bigr) = \frac{m}{z - a}. \]
That is, $\frac{1}{2\pi i}\oint_a D\bigl((z - a)^{m}\bigr)\,dz = m$. Since $D$ is logarithmic, this counts multiplicities for any function. If $f(z) = (z - a)^{m}g(z)$, where $g(z)$ is regular and nonzero at $z = a$, then $Df(z)$ has the same property:
\[ Df(z) = \frac{m}{z - a} + \text{regular part}. \]

If $f(z)$ is analytic in the neighborhood of some contour, and has zeroes or poles inside, then
\[ \frac{1}{2\pi i}\oint Df(z)\,dz = \text{algebraic sum of multiplicities}. \]

Example 9.7. Let $\sigma_0 > 1$, and integrate
\[ \int_{\sigma_0 - i\infty}^{\sigma_0 + i\infty} D\zeta(s)\,\frac{x^{s}}{s}\,ds. \]
Once we do this, we will get an explicit formula for sums of logs of primes up to $x$.

10. February 23

Let $f(s)$ be a meromorphic function. Let $D$ be the logarithmic derivative:
\[ Df(s) = \frac{f'}{f}(s) = \frac{d}{ds}\log f. \]
Note that
\[ D\bigl((s - s_0)^{m}\bigr) = \frac{m}{s - s_0}. \]
In particular, if $f(s)$ has a pole or zero at $s_0$, you can write it as $(s - s_0)^{m}g(s)$ where $g(s)$ has no zeroes or poles in a neighborhood of $s_0$. Then
\[ D\bigl((s - s_0)^{m}g(s)\bigr) = \frac{m}{s - s_0} + Dg(s) \]
where $Dg(s)$ is harmless. Now consider a contour around some points $s_i$ that are zeroes or poles of multiplicity $m_i$. Integrating around it is the same as integrating around a little neighborhood of each point. So you get
\[ Df(s) = \sum_i \frac{m_i}{s - s_i} + \text{something harmless}. \]

Now let $h(s)$ be holomorphic in the region in question.
\[ \frac{1}{2\pi i}\oint_C Df(s)\,h(s)\,ds = m_1h(s_1) + \cdots + m_\nu h(s_\nu). \]
If $h(s)$ is meromorphic, then there will be an extra contribution corresponding to the poles of $h$. We had a Dirichlet series for $\sigma > 1$:
\[ -D\zeta(s) = -\frac{\zeta'}{\zeta}(s) = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^{s}} \]
where we had defined
\[ \Lambda(n) = \begin{cases} \log p & \text{if } n = p^{e} \\ 0 & \text{else.} \end{cases} \]
Now use the filter:
\[ -\frac{1}{2\pi i}\lim_{T \to \infty}\int_{\sigma_0 - iT}^{\sigma_0 + iT} D\zeta(s)\,\frac{x^{s}}{s}\,ds = \frac{1}{2\pi i}\lim_{T \to \infty}\int_{\sigma_0 - iT}^{\sigma_0 + iT}\sum_{n=1}^\infty \Lambda(n)\left(\frac xn\right)^{s}\frac{ds}{s}. \]

Commuting sum and integration as in Perron's formula, this is just $\psi_0(x) = \frac12\bigl(\psi(x^+) + \psi(x^-)\bigr)$, where $\psi(x) = \sum_{n \le x}\Lambda(n)$.

We know that $\zeta(s)$ has a pole at $s = 1$, might have zeroes in the critical strip $0 < \sigma < 1$, and has zeroes at the negative even integers. Assume $\sigma_0 > 1$. The idea is to replace integration on the line $\sigma = \sigma_0$ with integration on a line far to the left. To do this, integrate along a rectangular contour with vertices $\{\sigma_0 \pm iT,\ \sigma_1 \pm iT\}$, where we take $\lim_{\sigma_1 \to -\infty}$. You have to be careful that $\sigma_1$ never falls on a negative even integer. Let's ignore this for the moment.

Just read the textbook. To recap, the idea is to evaluate $\psi_0(x) = -\frac{1}{2\pi i}\int_{(\sigma_0)} D\zeta(s)\,\frac{x^{s}}{s}\,ds$ by evaluating the contour over an expanding rectangle in such a way that the horizontal components go to zero. This contour is completely determined by the parameters $\sigma_0, \sigma_1, T$. Choose
\[ \sigma_0 = 1 + \frac{1}{\log x}. \]
You can't just choose any random $T$. Instead, in any interval $[T, T + 1]$, we can find a good $T$, which we will notate $T_1$. Similarly, for $\sigma_1$ you also have to keep away from the zeroes. Just let $\sigma_1$ be the negative odd integers.
\[ \int_{\sigma_0 - iT_1}^{\sigma_0 + iT_1} = \int_{\text{side 1}} + \int_{\text{side 2}} + \int_{\text{side 3}} + \text{residues}. \]

At $s = 0$, $\frac{x^{s}}{s}$ has a pole, and you get a residue $-\frac{\zeta'}{\zeta}(0)$. At the negative even integers $s = -2k$, the residue of $\frac{\zeta'}{\zeta}$ is the multiplicity of the zero, so we get a contribution of $\frac{x^{-2k}}{2k}$. At $s = 1$, $\zeta$ has a pole (so $\frac1\zeta$ has a zero), and we get a contribution of $x$. At $s = \rho$ (any zero in the critical strip, counted with multiplicity), we get a contribution of $-\frac{x^{\rho}}{\rho}$. Unrigorously,
\[ \psi(x) = x - \sum_\rho \frac{x^{\rho}}{\rho} - \frac{\zeta'}{\zeta}(0) + \sum_{k=1}^\infty \frac{x^{-2k}}{2k}. \]

Now clean this up. Just using the Taylor expansion of log, we have
\[ \sum_{k=1}^\infty \frac{x^{-2k}}{2k} = -\frac12\log\left(1 - \frac{1}{x^{2}}\right). \]
Define
\[ s(x) = \frac{\zeta'}{\zeta}(0) + \frac12\log\left(1 - \frac{1}{x^{2}}\right). \]
This is a smooth function. As $x \to \infty$, this goes to $\frac{\zeta'}{\zeta}(0)$; so it's basically a constant for large $x$.

Recall that every zero $\rho = \beta + i\gamma$ has a companion zero $\bar\rho = \beta - i\gamma$. So
\[ \psi(x) = x - \lim_{T \to \infty}\sum_{|\gamma| \le T}\frac{x^{\rho}}{\rho} - s(x). \]

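This illustrates a general principle: whenever your abscissa is too large, you can try to pull the line further to the left by taking a contour integral, and counting the residues inside. Let's talk about $\sum_{|\gamma| \le T}\frac{x^{\rho}}{\rho}$.

Theorem 10.1 (The not-too-many-zeroes theorem). If $\rho$ are the nontrivial zeroes of $\zeta(s)$, counted with multiplicity, then
\[ \sum_\rho \frac{1}{|\rho|^{1+\varepsilon}} < +\infty \]
for any $\varepsilon > 0$. We won't prove this now, but here's a corollary.

Corollary 10.2. If $\gamma$ is the imaginary part of the corresponding $\rho$, then
\[ \sum_{|\gamma| \le T}\frac{1}{|\rho|} \ll T^{\varepsilon} \]
for any $\varepsilon > 0$.

Proof.
\[ (2T)^{\varepsilon}\sum_{|\gamma| \le T}\frac{1}{|\rho|^{1+\varepsilon}} = \sum_{|\gamma| \le T}\frac{(2T)^{\varepsilon}}{|\rho|^{\varepsilon}}\cdot\frac{1}{|\rho|} \ge \sum_{|\gamma| \le T}\frac{1}{|\rho|}, \]
since $|\rho| \le 2T$ when $|\gamma| \le T$, and the left side is $\ll T^{\varepsilon}$ by the theorem.

10.1. Explicit formula. Let
\[ R(x, T) := \psi(x) - x + \sum_{|\gamma| \le T}\frac{x^{\rho}}{\rho} + s(x). \]

Theorem 10.3. Fix $x$. Then $R(x, T) \to 0$ as $T \to \infty$.

But, we can do better. Recall $R(x, T)$ is gotten by integrating three sides of a contour integral. Define
\[ m(x, T) = \min\left\{1, \frac{x}{T\langle x\rangle}\right\} \]
where $\langle x\rangle$ is defined to be the distance between $x$ and the nearest prime power, except if $x$ is a prime power, in which case you go to the closest other prime power. (The issue is that $\psi$ jumps at prime powers.)

Theorem 10.4.
\[ R(x, T) \ll \frac xT(\log x)^{2} + m(x, T)\log x \]
where $\ll$ is to be interpreted as $x, T$ both tending towards $\infty$.

We won't prove this now either.

Let $T = x^{\frac12 + \varepsilon}$. Then $R(x, T) \ll \sqrt x$, because all the log terms are insignificant compared to any $x^{\text{something}}$. This gives an estimate
\[ \psi(x) = x - \sum_{|\gamma| \le x^{\frac12 + \varepsilon}}\frac{x^{\rho}}{\rho} + O(\sqrt x). \]

So this says that you don't have to compute that many zeroes to get a decent approximation.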

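10.2. Zero-free regions.

Hypothesis 10.5. For $\frac12 \le \theta < 1$, there are no zeroes $\rho = \beta + i\gamma$ with $\beta > \theta$.

(Call this hypothesis $\theta$, where the Riemann hypothesis is hypothesis $\frac12$.) Recall
\[ \psi(x) = x - \sum_{|\gamma| \le T}\frac{x^{\rho}}{\rho} + O(x^{\frac12 + \varepsilon}). \]
The aim is to estimate $\sum\frac{x^{\rho}}{\rho}$, where $x^{\rho} = x^{\beta}x^{i\gamma}$, with $\beta \le \theta$. So
\[ \left|\sum_{|\gamma| \le T}\frac{x^{\rho}}{\rho}\right| \le x^{\theta}\sum_{|\gamma| \le T}\frac{1}{|\rho|} \ll x^{\theta}\,T^{\varepsilon} \ll x^{\theta + \varepsilon'} \]
for small $\varepsilon'$.

It has been proven that there are no zeroes on the line $\sigma = 1$. Hypothesis $\theta$ gives
\[ \psi(x) = x + O(x^{\theta + \varepsilon}). \]
Every time you have a zero-free region, you get an estimate for $\sum\frac{x^{\rho}}{\rho}$.

We're interested in counting primes: finding $\pi(x)$. We have some chance of understanding $\psi(x)$, and we will see that this informs you about $\pi(x)$.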

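11. March 1

11.1. Tony. Recall we had defined the completed zeta function $\pi^{-s/2}\,\Gamma\left(\frac s2\right)\zeta(s)$, which is invariant under $s \mapsto 1 - s$. Its growth is dominated by the gamma term. Define
\[ \xi(s) = \frac{s(s - 1)}{2}\,\pi^{-s/2}\,\Gamma\left(\frac s2\right)\zeta(s). \]
Recall that the order of $f$ is
\[ \limsup_{r \to \infty}\frac{\log\log M_r|f|}{\log r}. \]
If $|f| \approx e^{|t|^{a}}$ then this picks out the $a$.

Proposition 11.1. $\operatorname{order}(\xi) = 1$.

We will do this by showing
(1) There is no $c$ such that $|\xi(s)| \ll e^{c|s|}$.
(2) There is some $c'$ such that $|\xi(s)| \ll e^{c'|s|\log|s|}$.

By the symmetry $s \mapsto 1 - s$ it suffices to work in $\sigma \ge \frac12$. Along the positive real axis, $\zeta(s) = 1^{-s} + 2^{-s} + \cdots \ge 1$, so $\zeta$ doesn't really slow the growth rate. Stirling's approximation says that
\[ n! \approx \sqrt{2\pi n}\left(\frac ne\right)^{n}. \]
This dwarfs any exponential, which proves (1). It also means that
\[ \left|\Gamma\left(\frac s2\right)\right| \le \Gamma\left(\frac\sigma2\right) \ll e^{c'|s|\log|s|}. \]
Now we need a bound for the zeta function. It suffices to show, then, that $|\zeta(s)|$ is polynomially bounded sufficiently far to the right. Write
\[ \zeta(s) = \int_1^\infty x^{-s}\,dx + \sum_{n=1}^\infty\int_n^{n+1}\left(\frac{1}{n^{s}} - \frac{1}{x^{s}}\right)dx = \frac{1}{s - 1} + \sum_{n=1}^\infty\int_n^{n+1} s\int_n^{x}\frac{dy}{y^{s+1}}\,dx. \]
We have
\[ \left|\frac{1}{n^{s}} - \frac{1}{x^{s}}\right| \le |s|\int_n^{x}\frac{dy}{|y^{s+1}|} \le |s|\int_n^{n+1}\frac{dy}{y^{\sigma+1}} \quad\text{and}\quad \sum_{n=1}^\infty\int_n^{n+1}\frac{dy}{y^{\sigma+1}} = \int_1^\infty\frac{dy}{y^{\sigma+1}} = C. \]
That is,
\[ \frac{s(s - 1)}{2}\,\zeta(s) \ll \frac{|s|}{2} + \frac{|s|^{2}\,|s - 1|}{2}\,C, \]
which is polynomial in $|s|$ and so does not affect the order.

Remark 11.2. You can get zeroes of zeta in Mathematica as ZetaZero[z].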

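11.2. Chebyshev.

Notation 11.3. We say $f(x) \asymp g(x)$ if there exist $c_1, c_2 > 0$ such that $c_2\,g(x) < f(x) < c_1\,g(x)$ for $x$ sufficiently large.

Definition 11.4.
(1) $\pi(X) = \sum_{p \le X} 1$
(2) $\vartheta(X) = \sum_{p \le X}\log p$
(3) $\psi(X) = \sum_{n \le X}\Lambda(n) = \sum_{p^{e} \le X}\log p$

Obvious 11.5. $\pi(X) \le \vartheta(X) \le \psi(X)$ (the first inequality once $X \ge 5$).

There is a formula for $\pi(X)$ in terms of the zeroes of zeta, but it is complicated. Easier is to use $\psi(X)$ to approximate $\vartheta(X)$, and $\vartheta(X)$ to approximate $\pi(X)$. The goal:

Theorem 11.6.
(1) $\vartheta(X) = \psi(X) + O(X^{\frac12})$
(2) $\pi(X) = \frac{\psi(X)}{\log X} + O\left(\frac{X}{(\log X)^{2}}\right)$

This can be used to show

Theorem 11.7. $\psi(x) \asymp x$.

Recall that $\sum_{d \mid n}\Lambda(d) = \log n$. By Möbius inversion, we can write
\[ \Lambda(n) = \sum_{d \mid n}\mu(d)\log\frac nd \]
which gives an alternate form of
\[ \psi(x) = \sum_{n \le x}\Lambda(n) = \sum_{n \le x}\sum_{d \mid n}\mu(d)\log\frac nd = \sum_{d \le x}\mu(d)\sum_{\substack{n \le x \\ d \mid n}}\log\frac nd. \]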
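To get a feel for the sizes in Theorem 11.6 and Theorem 11.7, the three counting functions can simply be computed. This is a small sketch (a plain sieve; the function names `primes_up_to` and `chebyshev` are illustrative, not from the lecture) that returns $\pi(x)$, $\vartheta(x)$ and $\psi(x)$ for integer $x$.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def chebyshev(x):
    """Return (pi(x), theta(x), psi(x)) for an integer x >= 2."""
    primes = primes_up_to(x)
    pi_x = len(primes)
    theta_x = sum(math.log(p) for p in primes)
    psi_x = 0.0
    for p in primes:
        pk = p
        while pk <= x:              # every prime power p^k <= x contributes log p
            psi_x += math.log(p)
            pk *= p
    return pi_x, theta_x, psi_x

for x in (10 ** 3, 10 ** 4, 10 ** 5):
    pi_x, theta_x, psi_x = chebyshev(x)
    # psi(x)/x should be near 1; psi - theta should be of size about sqrt(x)
    print(x, pi_x, psi_x / x, (psi_x - theta_x) / math.sqrt(x), pi_x - psi_x / math.log(x))
```

The last printed column is the difference $\pi(x) - \psi(x)/\log x$, which Theorem 11.6 says is $O(x/(\log x)^2)$.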

We're going to make approximations to the Möbius function, which we call mock Möbius functions. First, start with any $d \mapsto m(d)$, and denote
\[ \psi_m(x) = \sum_{d \le x} m(d)\sum_{\substack{n \le x \\ d \mid n}}\log\frac nd. \]

Now we restrict $m(d)$ by setting restrictions on the resulting $\psi_m(x)$. Let
\[ T(x) = \sum_{a \le x}\log a \]
and in particular we care about
\[ T\left(\frac xd\right) = \sum_{\substack{n \le x \\ d \mid n}}\log\frac nd. \]

Approximate $T(x)$ by
\[ \int_1^{x}\log t\,dt = x\log x - x + 1 \]
which gives the bound
\[ N\log N - N \le T(N) \le (N + 1)\log(N + 1) - N, \]
that is, $T(N) = N\log N - N$ plus a small error term of about $\log 2N$.

From now on, let $O$ denote $O(\log 2N)$. Now we get to impose a condition on $m$: we want $m(d) = 0$ for $d$ sufficiently large (i.e. $m$ has compact support).
\begin{align*}
\psi_m(x) &= \sum_{d \le x} m(d)\,T\left(\frac xd\right) = \sum_{d \le x} m(d)\left[\frac xd\log\frac xd - \frac xd + O\right] \\
 &= \sum_{d \le x}\frac{m(d)}{d}\,x\log x - \sum_{d \le x}\frac{m(d)}{d}\,x\log d - \sum_{d \le x}\frac{m(d)}{d}\,x + O \qquad\text{(separating } \log\tfrac xd\text{)} \\
 &= (x\log x - x)\,A + x\,B + O
\end{align*}
where $A = \sum_{d \le x}\frac{m(d)}{d}$ and $B(x) = -\sum_{d \le x}\frac{m(d)\log d}{d}$.

We want $B = 1$, and $A = 0$. So require

Hypothesis 11.8. $A(x) = \sum_{d \le x}\frac{m(d)}{d} = 0$ for $x$ large.

Hypothesis 11.9. $B(x) = 1$.

Under these restrictions, we have that $\psi_m(x) = x + O$.

Proposition 11.10. $\psi_m \asymp \psi$.

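We are done if we show this, and show that there is some $m(d)$ meeting our hypotheses. At this point, just assume that $m(d)$ satisfies our hypotheses.
\begin{align*}
\psi_m(x) &= \sum_{d \le x} m(d)\sum_{\substack{n \le x \\ d \mid n}}\log\frac nd = \sum_{(d,n)\,:\,dn \le x} m(d)\log n \\
 &= \sum_{(d,n)\,:\,dn \le x} m(d)\sum_{k \mid n}\Lambda(k) \qquad\text{(collect indices and replace } n = k\ell\text{)} \\
 &= \sum_{(k,d,\ell)\,:\,kd\ell \le x} m(d)\Lambda(k) = \sum_{k=1}^{[x]}\Lambda(k)\sum_{(d,\ell)\,:\,d\ell \le \frac xk} m(d) \\
 &= \sum_{k=1}^{[x]}\Lambda(k)\sum_{d \le \frac xk} m(d)\left\lfloor\frac{x}{kd}\right\rfloor \qquad\text{(the number of } \ell\text{'s is } \lfloor\tfrac{x}{kd}\rfloor\text{)}.
\end{align*}

Set
\[ F(y) = \sum_{d \le y} m(d)\left\lfloor\frac yd\right\rfloor. \]
So our previous expression is $\sum_{k=1}^{[x]}\Lambda(k)\,F\left(\frac xk\right)$. Now use the second hypothesis on the mock-Möbius functions ($\sum m(d)/d = 0$): we can write
\[ F(y) = \sum_{d \le y} m(d)\,\frac yd - \sum_{d \le y} m(d)\left\{\frac yd\right\} = y\sum_{d=1}^\infty\frac{m(d)}{d} - \sum_{d \le y} m(d)\left\{\frac yd\right\} = 0 - \sum_{d \le y} m(d)\left\{\frac yd\right\}. \]

Notice that $F(y)$ is periodic: $F(y + D) = F(y)$, where $D$ is the least common multiple of all the $d$'s such that $m(d) \neq 0$ (i.e. the $d$'s that are actually in the sum).
\[ \psi_m(x) = \sum_k \Lambda(k)\,F\left(\frac xk\right) \quad\text{if } x \text{ is large.} \]
The aim was to compare $\psi_m$ to $\psi = \sum\Lambda(k)\cdot 1$. So we hope we can make $F\left(\frac xk\right)$ close to 1.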

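12. March 6

RECALL: we are interested in
\[ \psi(x) = \sum_{n \le x}\Lambda(n) = \sum_{d \le x}\mu(d)\sum_{\substack{n \le x \\ d \mid n}}\log\frac nd. \]

We imitate this by creating a mock Möbius function $m(d)$, which is chosen to satisfy certain axioms:
(1) $m(d)$ has finite support
(2) $\sum_{d=1}^\infty\frac{m(d)}{d} = 0$

Define
\[ \psi_m(x) = \sum_{d \le x} m(d)\sum_{\substack{n \le x \\ d \mid n}}\log\frac nd. \]

We had
\[ \psi_m(x) = (x\log x - x)\sum_{d \le x}\frac{m(d)}{d} - x\sum_{d \le x}\frac{m(d)\log d}{d} + O(\log 2x). \]

We were trying to show that $\psi(x) \asymp x$ by way of showing $\psi_m(x) \asymp x$. The second axiom guarantees that the first term in $\psi_m(x)$ is eventually zero; ideally, we want $-\sum_d\frac{m(d)\log d}{d} = 1$. We won't get this, but we can make it close to 1. We want to transform $\psi_m$ to make it look like $\psi$.
\begin{align*}
\psi_m(x) &= \sum_{d \le x} m(d)\sum_{\substack{n \le x \\ d \mid n}}\log\frac nd = \sum_{dn \le x} m(d)\log n \\
 &= \sum_{dn \le x} m(d)\sum_{k \mid n}\Lambda(k) \qquad\text{(combine the two indices } n, d \text{ of summation into one sum)} \\
 &= \sum_{kdn \le x} m(d)\Lambda(k) = \sum_{k=1}^{x}\Lambda(k)\sum_{dn \le \frac xk} m(d) \\
 &= \sum_{k=1}^{x}\Lambda(k)\sum_{d} m(d)\left\lfloor\frac{x}{kd}\right\rfloor \qquad\text{(how many } n \text{ are there?)} \\
 &= \sum_{k=1}^{x}\Lambda(k)\,F(x/k) \qquad\text{(define } F\text{)}.
\end{align*}

That is, we've defined
\[ F(y) = \sum_d m(d)\left\lfloor\frac yd\right\rfloor = \underbrace{y\sum_d\frac{m(d)}{d}}_{0} - \sum_d m(d)\left\{\frac yd\right\} = -\sum_d m(d)\left\{\frac yd\right\}. \]
This is a periodic function, with period $D = \operatorname{lcm}\{d : m(d) \neq 0\}$. The simplest choice of $m(d)$ is:
\[ m(1) = 1, \qquad m(2) = -2, \qquad m(d) = 0 \text{ if } d > 2. \]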

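The period is 2. Assume $y \ge 2$, so there is nothing at 0; the function is 1 for $y \in [1, 2) \cup [3, 4) \cup \cdots$ and zero elsewhere.
\[ \psi_m(x) = \sum_{k=1}^{[x]}\Lambda(k)\,F\left(\frac xk\right). \]

If $\frac x2 < k \le x$ then $1 \le \frac xk < 2$, and write
\[ \psi_m(x) = \sum_{k \le \frac x2}\Lambda(k)\,F\left(\frac xk\right) + \underbrace{\sum_{\frac x2 < k \le x}\Lambda(k)}_{\psi(x) - \psi\left(\frac x2\right)} \]
since $F\left(\frac xk\right) = 1$ in the second case. This gives
\[ \psi(x) - \psi\left(\frac x2\right) \le \psi_m(x) \le \psi(x). \]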
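The simplest mock Möbius function is small enough to experiment with directly. The sketch below is an illustration under the choice $m(1) = 1$, $m(2) = -2$ from the text; the helper names `mangoldt`, `psi`, `psi_m`, `F`, `T` are hypothetical. It computes $F(y) = \lfloor y\rfloor - 2\lfloor y/2\rfloor$, then $\psi_m(x) = \sum_k \Lambda(k)F(x/k)$, and compares it with $T(x) - 2T(x/2)$ and with the sandwich $\psi(x) - \psi(x/2) \le \psi_m(x) \le \psi(x)$.

```python
import math

def mangoldt(n):
    """von Mangoldt function: log p if n = p^e with e >= 1, else 0."""
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:              # p is the smallest prime factor of n
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)              # no factor found: n itself is prime

def psi(x):
    return sum(mangoldt(n) for n in range(1, int(x) + 1))

def F(y):
    """F(y) = sum_d m(d) * floor(y/d) for m(1) = 1, m(2) = -2 (period 2, values 0 or 1)."""
    return math.floor(y) - 2 * math.floor(y / 2)

def psi_m(x):
    return sum(mangoldt(k) * F(x / k) for k in range(1, int(x) + 1))

def T(x):
    return sum(math.log(a) for a in range(1, int(x) + 1))

for x in (100, 1000):
    print(x, psi(x) - psi(x / 2), psi_m(x), psi(x), T(x) - 2 * T(x / 2), x * math.log(2))
```

For this $m$ the constant $-\sum_d m(d)\log d/d$ equals $\log 2 \approx 0.69$ rather than the ideal value 1, which is why $\psi_m(x)$ tracks $x\log 2$.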

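There is some constant $c_m$ such that $\psi_m(x) = c_m\,x + O(\log x)$. Then
\[ \psi(x) = \sum_{j=0}^{\log_2 x}\left[\psi\left(\frac{x}{2^{j}}\right) - \psi\left(\frac{x}{2^{j+1}}\right)\right] \le \sum_{j=0}^{\log_2 x}\psi_m\left(\frac{x}{2^{j}}\right) \le c_m\sum_{j=0}^\infty\frac{x}{2^{j}} + O\bigl((\log x)^{2}\bigr). \]

This is gotten by plugging in the expression for $\psi_m(x)$, but noting that the error term compounds. The claim is that any expression $\psi(x) \asymp x$, with whatever explicit constant, can give information about the $\pi(x) \leftrightarrow \vartheta(x) \leftrightarrow \psi(x)$ situation as
(1) $\vartheta(x) = \psi(x) + O(x^{\frac12})$
(2) $\pi(x) = \frac{\psi(x)}{\log x} + O\left(\frac{x}{(\log x)^{2}}\right)$
which is essentially $\pi(x) = \frac{\psi(x)}{\log x} + o(x)$.

\[ \psi(x) = \sum_{p^{k} \le x}\log p = \sum_{k=1}^\infty\vartheta(x^{\frac1k}), \qquad \psi(x) = \vartheta(x) + \sum_{k=2}^\infty\vartheta(x^{\frac1k}). \]

Pull off the term for $k = 2$:
\[ \psi(x) - \vartheta(x) \ll x^{\frac12} + \sum_{k=3}^{\log_2 x}\vartheta(x^{\frac1k}) \ll x^{\frac12} + x^{\frac13}\log x. \]
$\vartheta(x^{\frac1k})$ puts a $\log p$ for every prime $p$ that occurs up to $x^{\frac1k}$, so $\sum_{k \ge 3}\vartheta(x^{\frac1k}) \ll x^{\frac13}\log x$; the term $x^{\frac12}$ subsumes this. This gives statement (1).

So
\[ \pi(x) = \int_{2^-}^{x}(\log u)^{-1}\,d\vartheta(u) = \frac{\vartheta(x)}{\log x} - \int_2^{x}\vartheta(u)\,d(\log u)^{-1} = \frac{\vartheta(x)}{\log x} + \int_2^{x}\frac{\vartheta(u)}{u(\log u)^{2}}\,du, \]
\[ \pi(x) - \frac{\psi(x)}{\log x} = \pi(x) - \frac{\vartheta(x)}{\log x} + O\left(\frac{x^{\frac12}}{\log x}\right). \]

We wanted to study $\pi(x) - \frac{\psi(x)}{\log x}$, but up to the error term, we can just study $\pi(x) - \frac{\vartheta(x)}{\log x}$. We have
\[ \pi(x) - \frac{\vartheta(x)}{\log x} = \int_2^{x}\frac{\vartheta(u)}{u(\log u)^{2}}\,du. \]
Since $\vartheta(u)$ is bounded by $\text{const}\cdot u$, this is
\[ \ll \int_2^{x}\frac{du}{(\log u)^{2}}. \]
Estimate by
\[ \int_2^{x}\frac{du}{(\log u)^{2}} = \int_2^{\sqrt x} + \int_{\sqrt x}^{x}, \]
where the first integrand is bounded, so the first integral is essentially $\sqrt x$, and the second integral can be estimated by
\[ \int_{\sqrt x}^{x}\frac{du}{(\log\sqrt x)^{2}} \ll \frac{x}{(\log x)^{2}}. \]
This gives statement (2).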
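The partial summation identity $\pi(x) = \frac{\vartheta(x)}{\log x} + \int_2^x \frac{\vartheta(u)}{u(\log u)^2}\,du$ is exact, and since $\vartheta$ is a step function the integral can be evaluated in closed form between consecutive primes. The sketch below does this (trial-division primes for simplicity; `pi_via_theta` is an illustrative name) and returns the main term, the integral, and the true prime count.

```python
import math

def primes_up_to(n):
    """Primes <= n by trial division (fine for small n)."""
    return [p for p in range(2, n + 1)
            if all(p % q for q in range(2, math.isqrt(p) + 1))]

def pi_via_theta(x):
    """Return (theta(x)/log x, integral_2^x theta(u)/(u log^2 u) du, pi(x)).
    theta is constant between consecutive primes, and -1/log u is an
    antiderivative of 1/(u log^2 u), so each piece is evaluated exactly."""
    primes = primes_up_to(x)
    theta = 0.0
    integral = 0.0
    for i, p in enumerate(primes):
        theta += math.log(p)
        upper = primes[i + 1] if i + 1 < len(primes) else x
        integral += theta * (1.0 / math.log(p) - 1.0 / math.log(upper))
    return theta / math.log(x), integral, len(primes)

for x in (10 ** 3, 10 ** 4):
    main, integral, count = pi_via_theta(x)
    # main + integral reproduces pi(x); the integral is the x/(log x)^2-sized correction
    print(x, main, integral, main + integral, count, x / math.log(x) ** 2)
```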

12.1. Prime number theorem.
(1) Find a zero-free region within the critical strip.
(2) Bounds for $\frac{\zeta'}{\zeta}$ in the zero-free region.
(3) Basic contour integrals.
(4) Compute error terms.
(5) Combine error terms.

Take a safe vertical line $\sigma_0 > 1$, and a function $c(t)$ with $\frac12 < c(t) = c(-t) < 1$ with $c(|t|)$ monotone increasing, and consider the wobbly strip $R$ between $c(t)$ and the line $\sigma = \sigma_0 > 1$. The idea is that there should be no zeroes in the strip; so $\frac12 < c(t) < 1$. That is,
\[ R = \{s = \sigma + it : c(t) \le \sigma \le \sigma_0\}. \]
Define a contour $C$ that cuts out a wobbly rectangle with height between $t = +T$ and $t = -T$, and width bounded by $\sigma_0$ and $c(t)$. So this consists of two horizontal lines $\{\sigma \pm iT : c(T) \le \sigma \le \sigma_0\}$, and then a part of the line $\sigma = \sigma_0$ and part of the curve $c(t)$.
\[ -\frac{1}{2\pi i}\oint_C\frac{\zeta'}{\zeta}(s)\,\frac{x^{s}}{s}\,ds = x, \qquad \psi(x) = -\frac{1}{2\pi i}\int_{\sigma_0 - iT}^{\sigma_0 + iT}\frac{\zeta'}{\zeta}(s)\,\frac{x^{s}}{s}\,ds + R(x, T) \]
where $R(x, T)$ is the error term from Perron's formula. The issue is to compute the integral over the other three sides; we will pick up error terms from each piece. Actually, we will make $\sigma_0$ and $T$ functions of $x$, and move them to have more control over the error terms. The region $R$ contains the pole $s = 1$. Look for a specific $T_0 > 0$ (the book uses $\frac76$). Call "the patch" the region bounding $s = 1$ by bounding the imaginary part by $-T_0 \le t \le T_0$; "the rest" will refer to, well, the rest. We have some estimates on the patch:
\[ \frac{\zeta'}{\zeta}(s) = -\frac{1}{s - 1} + O(1), \qquad \frac{1}{\zeta(s)} \ll |s - 1|. \]
Outside the patch, $\frac{\zeta'}{\zeta}(s) \ll \log\tau$, where $\tau = |t| + 4$. A zero of $\zeta$ would show up as a pole of $\frac{\zeta'}{\zeta}$; so this guarantees a zero-free region. We will define $c(t) = 1 - \frac{\text{constant}}{\log\tau}$.

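13. March 8

Recall from last time we described a hypothetical zero-free region, bounded on the right by the line $\sigma = \sigma_0$, on the left by some function $c(t)$ with $\frac12 < c(t) = c(-t) < 1$ where $c(|t|)$ is monotonically increasing, and bounded on the top and bottom by $\pm T$. This produces a contour $C$. We're trying to integrate
\[ x = -\frac{1}{2\pi i}\oint_{C(T)}\frac{\zeta'}{\zeta}(s)\,\frac{x^{s}}{s}\,ds. \]
The integral on the right side can be done using Perron's formula, which gives an estimate for $\psi(x)$. The trick is to bound the others, and find the residue of the $s = 1$ pole; this will give an estimate for $\psi$ in terms of other things. To isolate the pole at 1, draw a rectangle around it, bounded vertically by $\pm T_0$, and call this the patch. Estimates on the patch:
\[ \frac{\zeta'}{\zeta}(s) = -\frac{1}{s - 1} + O(1), \qquad \log\bigl(\zeta(s)(s - 1)\bigr) \ll 1, \qquad \frac{1}{\zeta(s)} \ll |s - 1|. \]
Estimate outside the patch:
\[ \frac{\zeta'}{\zeta}(s) \ll \log\tau. \]

Label the sides of the contour $C(T)$ clockwise starting on the left. Perron's formula:
\[ -\frac{1}{2\pi i}\int_{\text{right}} = \psi(x) + R(x, T), \qquad R(x, T) \ll \log x + \frac xT(\log x)^{2}. \]

On the top (assuming $T > T_0$ so we can use the estimate $\frac{\zeta'}{\zeta}(s) \ll \log\tau$, and assuming $x > 1$):
\[ \int_{c(T) + iT}^{\sigma_0 + iT}\frac{\zeta'}{\zeta}(s)\,\frac{x^{s}}{s}\,ds \ll \log T\int_{c(T)}^{\sigma_0}\frac{|x^{s}|}{|s|}\,d\sigma \le \log T\int_{c(T)}^{\sigma_0}\frac{x^{\sigma_0}}{T}\,d\sigma = \frac{\log T}{T}\,x^{\sigma_0}\,(\sigma_0 - c(T)) \ll \frac{\log T}{T}\,x^{\sigma_0}. \]

Break up the integral on the left into a few pieces, depending on whether we're in the patch or not.
\[ \int_{c(T) - iT}^{c(T) + iT} = \int_{c(T) - iT_0}^{c(T) + iT_0} + \int_{c(T) - iT}^{c(T) - iT_0} + \int_{c(T) + iT_0}^{c(T) + iT}. \]

First piece (here $\frac{\zeta'}{\zeta}(s) \approx \frac{1}{s - 1}$ and $\frac1s$ is about constant):
\[ \int_{c(T_0) - iT_0}^{c(T_0) + iT_0}\frac{\zeta'}{\zeta}(s)\,\frac{x^{s}}{s}\,ds \ll \int_{t = -T_0}^{T_0}\frac{x^{c(T_0)}}{|c(T_0) + it - 1|}\,dt \ll \frac{x^{c(T_0)}}{|1 - c(T_0)|} \]
(remove the $it$ term). Top part of this:
\[ \int_{c(T) + iT_0}^{c(T) + iT}\frac{\zeta'}{\zeta}(s)\,\frac{x^{s}}{s}\,ds \ll \int\log\tau\,\frac{x^{c(T)}}{|c(T) + it|}\,dt \ll (\log T)^{2}\,x^{c(T)} \]
(the denominator is about $t$).

Now add up the error terms we've gotten so far:
\[ \psi(x) - x \ll \log x + \frac xT(\log x)^{2} + \frac{\log T}{T}\,x^{\sigma_0} + \frac{x^{c(T)}}{1 - c(T)} + x^{c(T)}(\log T)^{2}. \]

For given $x$, choose $T$ and $\sigma_0$ as functions of $x$. We can ignore the $\log x$ term, because it will always be beaten by the other terms. The problem is $\frac{\log T}{T}\,x^{\sigma_0}$, because $\sigma_0 > 1$. Choose: $\sigma_0 = 1 + a(x)$, where $a(x) = \frac{1}{\log x}$.

Prove: If $T > 2$ then $c(T) = 1 - \delta(T)$ where $\delta(T) = \frac{c}{\log T}$ for $0 < c < 1$. (We will get a way of moving it around for $T < 2$.)

Take $x$ out of the whole thing:
\[ \frac{\psi(x) - x}{x} \ll \frac{\log x}{x} + \frac{(\log x)^{2}}{T} + \text{constant}\cdot\frac{\log T}{T} + \frac{x^{-\delta(T)}}{\delta(T)} + x^{-\delta(T)}(\log T)^{2}. \]

The problematic error term turns into
\[ \frac{x^{-\delta(T)}}{\delta(T)} = x^{-\frac{c}{\log T}}\,\frac{\log T}{c}. \]
Do some bookkeeping, and you get $T = \exp\bigl(\sqrt{c\log x}\bigr)$, which gives an estimate
\[ \psi(x) - x = O\left(\frac{x}{\exp\bigl(c\sqrt{\log x}\bigr)}\right). \]
(For the bound, see theorem 6.4 in the book. Read this!)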

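13.1. Heuristic argument for nonvanishing at $\sigma = 1$. Write
\[ R(s) = \operatorname{Re}\left(-\frac{\zeta'}{\zeta}(s)\right). \]
Assume $s = 1 + \delta + i\gamma$ with $\delta > 0$, where we will be taking the limit $\delta \to 0$.
\[ R(s) = \sum\Lambda(n)\,n^{-1-\delta}\cos(\gamma\log n). \]

We know that $|R(1 + \delta + i\gamma)| \le R(1 + \delta)$. Suppose $\rho_0$ is a zero of the form $1 + i\gamma_0$. Take $R(1 + \delta)$ close to the pole; you pick up a residue so $R(1 + \delta) \approx \frac1\delta$. You have a zero at $1 + i\gamma_0$. So $R(1 + \delta + i\gamma_0) \approx -\frac1\delta$. A lot of the cos terms are $\approx -1$ at $\gamma_0$. But then the cos terms at $2\gamma_0$ are $\approx +1$, so
\[ R(1 + \delta + 2i\gamma_0) \approx \frac1\delta. \]
So $\zeta$ has a pole here, which is a contradiction.

13.2. Actual argument. This starts with a trig formula:
\[ \varphi(\theta) := 3 + 4\cos\theta + \cos 2\theta = 2(1 + \cos\theta)^{2}. \]
(Use the fact that $1 + \cos 2\theta = 2\cos^{2}\theta$.)
\[ 3R(1 + \delta) + 4R(1 + \delta + i\gamma) + R(1 + \delta + 2i\gamma) = \sum\Lambda(n)\,n^{-1-\delta}\,\underbrace{\varphi(\gamma\log n)}_{\ge 0\text{ because it's }2\times\text{a square}} \ge 0. \]

Replace $\gamma = \gamma_0$, where $\rho_0 = \beta_0 + i\gamma_0$ is a zero of $\zeta(s)$, where $\frac56 \le \beta_0 \le 1$ and $|\gamma_0| \ge \frac78$. We won't prove the estimates in the following table:

Term $3R(1 + \delta)$: estimate $\frac3\delta$, error $O(1)$.
Term $4R(1 + \delta + i\gamma_0)$: estimate $-\frac{4}{1 + \delta - \beta_0}$, error $c_1\log(|\gamma_0| + 4)$.
Term $R(1 + \delta + 2i\gamma_0)$: estimate $0$, error $c_1\log(|2\gamma_0| + 4)$.

Set $C = c_1\log(|\gamma_0| + 4)$ (absorbing the error terms) and take $\delta = \frac{1}{2C}$, so that $\frac3\delta = 6C$. Plugging everything in:
\[ 6C - \frac{4}{1 + \frac{1}{2C} - \beta_0} + C \ge 0, \quad\text{so}\quad 1 + \frac{1}{2C} - \beta_0 \ge \frac{4}{7C}. \]
You eventually get
\[ 1 - \beta_0 \ge \frac{4}{7C} - \frac{1}{2C} = \frac{1}{14C}. \]

The table is the application of a result described below. Let $f(z)$ be an analytic function on the disc $|z| \le 1$, where $f(0) \neq 0$. Let $M = \max|f(z)|$. Choose a disc $D(0, R)$ inside. Then

Theorem 13.1. For $z$ in a slightly smaller disc,
\[ \frac{f'}{f}(z) - \sum_{\text{zeroes } z_k \text{ in } D(0,R)}\frac{1}{z - z_k} = O\left(\log\frac{M}{|f(0)|}\right). \]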

14. March 20

14.1. Finite abelian groups. Let $G$ be a finite abelian group; by the fundamental theorem of finite abelian groups, it can be written (non-uniquely) as a sum $C_1 \oplus \cdots \oplus C_\ell$ of finite cyclic groups. We can choose $C_i$ to be the $p_i$-primary component, for $p_i$ a prime factor.

Direct sum and direct product are not the same thing. If $G = G_1 \oplus G_2$, then $G_1$ and $G_2$ are two subgroups. If $G = G_1 \times G_2$, then there are quotients $G \to G_1$ and $G \to G_2$. Of course, if there are only finitely many factors, then $G_1 \oplus \cdots \oplus G_n$ and $G_1 \times \cdots \times G_n$ are isomorphic . . . but they're morally different things and arise in different ways. (The book writes $\otimes$ instead of $\times$ . . . oops.)

Let $\hat{G} = \operatorname{Hom}(G, \mathbb{C}^*)$ be the dual group. We would like a more symmetric definition. So let $A$ and $B$ be abelian groups. A bilinear pairing is a map $A \times B \to \mathbb{C}^*$ where $\langle a_1 a_2, b \rangle = \langle a_1, b \rangle \langle a_2, b \rangle$ (and similarly in the second variable). Given $a \in A$, there is a map $A \to \hat{B}$ that sends $a$ to $\chi_a = \langle a, - \rangle \in \hat{B}$. There is also an analogous map $B \to \hat{A}$. If this is an isomorphism, it is called a perfect pairing.

Proposition 14.1. $\widehat{A \times B} = \hat{A} \times \hat{B}$

Proof. It is easy to show an isomorphism $\operatorname{Hom}(A \times B, \mathbb{C}^*) \cong \hat{A} \times \hat{B}$.

If $C = \mathbb{Z}/n\mathbb{Z}$ is a finite cyclic group, then an element of $\hat{C}$ is a root of unity; that is, $\hat{C} = \{e^{2\pi i a/n}\}$. Clearly, $|\hat{C}| = |C|$ here, so $\hat{C} \cong C$. Using the previous proposition, we have $\hat{G} \cong G$ for any finite abelian group $G$. Fix $g \in G$. Then
$$\sum_{\chi \in \hat{G}} \chi(g) = \begin{cases} |G| & \text{if } g = e \in G \\ 0 & \text{otherwise} \end{cases}$$
(If $g \ne e$, since our pairings are perfect, there is some $\chi_1$ such that $\chi_1(g) \ne 1$. Then $\sum \chi(g) = \sum (\chi_1 \chi)(g) = \chi_1(g) \sum \chi(g)$, so $\sum \chi(g) = 0$.) Temporarily let $\chi_0$ be the trivial character (the identity) in $\hat{G}$. For fixed $\chi$,
$$\sum_{g \in G} \chi(g) = \begin{cases} |G| & \text{if } \chi = \chi_0 \\ 0 & \text{otherwise} \end{cases}$$
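The two orthogonality relations can be checked directly for a cyclic group. This is a minimal sketch for $G = \mathbb{Z}/n\mathbb{Z}$ only, using the characters $x \mapsto e^{2\pi i a x/n}$; the helper names are invented for the illustration.

```python
import cmath

def character(n: int, a: int):
    """The character chi_a of Z/nZ sending x to exp(2 pi i a x / n)."""
    return lambda x: cmath.exp(2j * cmath.pi * a * x / n)

def sum_over_characters(n: int, g: int) -> complex:
    """Sum of chi(g) over all n characters chi of Z/nZ, for a fixed g."""
    return sum(character(n, a)(g) for a in range(n))

def sum_over_group(n: int, a: int) -> complex:
    """Sum of chi_a(x) over all x in Z/nZ, for a fixed character chi_a."""
    chi = character(n, a)
    return sum(chi(x) for x in range(n))

n = 12
# Fixed g: the sum is |G| at the identity and (numerically) 0 elsewhere.
print([round(abs(sum_over_characters(n, g)), 9) for g in range(n)])
# Fixed chi: the sum is |G| for the trivial character and 0 otherwise.
print([round(abs(sum_over_group(n, a)), 9) for a in range(n)])
```

Both printed lists should start with 12 and then be zeros up to rounding, matching the two displayed case distinctions.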

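Recall the Chinese Remainder Theorem: if $(a, b) = 1$ and $n = ab$, then there are $r, s$ such that $1 = ra + sb$, and
$$\mathbb{Z}/n\mathbb{Z} \cong \mathbb{Z}/a\mathbb{Z} \times \mathbb{Z}/b\mathbb{Z}$$
where the backwards map is $(x, y) \mapsto sbx + ray$. There is a similar isomorphism between groups of units:
$$(\mathbb{Z}/n\mathbb{Z})^* \cong (\mathbb{Z}/a\mathbb{Z})^* \times (\mathbb{Z}/b\mathbb{Z})^*$$
For the rest of this lecture, $n$ will have a prime decomposition $\prod_{i=1}^{\ell} p_i^{e_i}$. Apply the Chinese Remainder Theorem to write
$$G := (\mathbb{Z}/n\mathbb{Z})^* = \prod_{i=1}^{\ell} (\mathbb{Z}/p_i^{e_i})^* = \prod_{i=1}^{\ell} G_{p_i}$$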
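The backwards map in the Chinese Remainder Theorem can be made concrete. The sketch below is an illustration rather than part of the lecture: `bezout` is a standard extended Euclidean algorithm producing $r, s$ with $ra + sb = 1$, and `crt_inverse` implements $(x, y) \mapsto sbx + ray$.

```python
from math import gcd

def bezout(a: int, b: int):
    """Return (r, s) with r*a + s*b == gcd(a, b), via extended Euclid."""
    old_rem, rem = a, b
    old_r, r = 1, 0
    old_s, s = 0, 1
    while rem != 0:
        quot = old_rem // rem
        old_rem, rem = rem, old_rem - quot * rem
        old_r, r = r, old_r - quot * r
        old_s, s = s, old_s - quot * s
    return old_r, old_s

def crt_inverse(x: int, y: int, a: int, b: int) -> int:
    """Map (x mod a, y mod b) back to the unique class mod a*b."""
    r, s = bezout(a, b)          # 1 = r*a + s*b because gcd(a, b) = 1
    return (s * b * x + r * a * y) % (a * b)

a, b = 9, 20
z = crt_inverse(4, 13, a, b)
print(z, z % a, z % b)           # the residues should be 4 and 13

# Units correspond to pairs of units, which gives the isomorphism of unit groups.
units_match = all(
    (gcd(z, a * b) == 1) == (gcd(z % a, a) == 1 and gcd(z % b, b) == 1)
    for z in range(a * b)
)
print(units_match)
```

The reason the formula works is that $sb \equiv 1 \pmod a$ and $ra \equiv 1 \pmod b$, so each summand is visible in exactly one coordinate.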

If $m \mid n$ then there is a natural reduction $(\mathbb{Z}/n\mathbb{Z})^* \to (\mathbb{Z}/m\mathbb{Z})^*$. If $p > 2$, then
$$(\mathbb{Z}/p^e\mathbb{Z})^* \cong (\mathbb{Z}/p\mathbb{Z})^* \times \Gamma$$
where $\Gamma$ is the group of units congruent to 1 mod $p$; it is canonically cyclic of order $p^{e-1}$, with generator $1 + p$. For $p = 2$, this is
$$(\mathbb{Z}/2^e\mathbb{Z})^* \cong (\mathbb{Z}/4\mathbb{Z})^* \times \Gamma$$
where $\Gamma$ contains units $u$ with $u \equiv 1 \pmod 4$.

14.2. The Character. Let $q \in \mathbb{N}$, and let $G = (\mathbb{Z}/q\mathbb{Z})^*$. A Dirichlet character is a function $\chi : G = (\mathbb{Z}/q\mathbb{Z})^* \to \mathbb{C}^*$ that is totally multiplicative: that is, $\chi(mn) = \chi(m)\chi(n)$.

Definition 14.2.
(1) We say that $q$ is the modulus of $\chi$; this is implicit in the choice of $\chi$ (note that any character of modulus $q$ gives a canonical character of modulus $nq$ for every $n$: just precompose with $(\mathbb{Z}/nq\mathbb{Z})^* \to (\mathbb{Z}/q\mathbb{Z})^*$).
(2) Let $\chi_0$ be the principal character of modulus $q$: the map $(\mathbb{Z}/q\mathbb{Z})^* \to \{1\}$.
(3) Since $G = \prod_p G_p$, and $\hat{G} = \prod_p \hat{G}_p$, every $\chi : G \to \mathbb{C}^*$ induces homomorphisms $\chi_p : G_p \to \mathbb{C}^*$, so we can write $\chi = \prod_p \chi_p$. This is the local factorization of $\chi$.
(4) The minimal modulus is the smallest $q$ such that $\chi$ comes from $\chi : (\mathbb{Z}/q\mathbb{Z})^* \to \mathbb{C}^*$, in the sense of (1).
(5) If $\chi$ is a real character, then its image is in $\{\pm 1\} \subset \mathbb{C}^*$; either it's a quadratic character, or it's trivial.

Given a character $\chi : (\mathbb{Z}/q\mathbb{Z})^* \to \mathbb{C}^*$, we can extend it to a map $\mathbb{Z} \to \mathbb{C}$, where
$$a \mapsto \begin{cases} \chi(a \pmod q) & \text{if } (a, q) = 1 \\ 0 & \text{if } (a, q) \ne 1 \end{cases}$$

In $(\mathbb{Z}/p^e\mathbb{Z})^*$ (for $p \ne 2$) there is a unique quadratic character. If $a$ is a square (mod $p$), then $\chi(a) = 1$; otherwise $\chi(a) = -1$ (unless it's the trivial character). This is the Legendre character $\chi(a) = \left(\frac{a}{p}\right)$. If $p \mid a$, then it is zero.

Dirichlet characters are totally multiplicative: that is, they are homomorphisms. Take a quadratic extension $K = \mathbb{Q}[\sqrt{d}] \supset \mathbb{Q}$. Let $A$ be the ring of integers of $K$. If $p$ ramifies in $A$, then $(p)$ lifts to a square of prime ideals $P^2 \subset A$ (this is only possible for divisors of the discriminant). If $p$ is split, then there are two (conjugate) prime ideals $P, \bar{P}$ in $A$. If $p$ is non-split, then $(p)$ lifts to a prime ideal in $A$. So we can define a character
$$\chi_K(p) = \begin{cases} 0 & \text{if } p \text{ ramifies} \\ +1 & \text{if } p \text{ splits} \\ -1 & \text{if } p \text{ is inert (does not split)} \end{cases}$$
There is only one way to define $\chi_K(-1)$ in order to make a Dirichlet character. (It turns out to be the sign of $d$.) Exercise: show that this is a (quadratic) Dirichlet character; find the minimal modulus.

14.3. Dirichlet L-functions. For $\sigma > 1$ a Dirichlet L-function looks like
$$L(s, \chi) = \sum_n \frac{\chi(n)}{n^s} = \prod_p \left(1 - \frac{\chi(p)}{p^s}\right)^{-1}$$
The proof of convergence is the same as the proof for $\zeta$.

Case 1: $\chi = \chi_0$. If $p \mid$ modulus of $\chi_0$, then the corresponding factor in the product is just 1. The rest look like the factors of $\zeta$ when written as a product; that is,
$$L = \prod_{p \nmid q} \left(1 - \frac{1}{p^s}\right)^{-1}$$
which is just the product expansion of $\zeta$ deprived of finitely many factors. As earlier in the course, define
$$A(x) = \sum_{n \le x} \chi(n)$$
and we have
$$\sigma_c = \limsup \frac{\log |A(x)|}{\log x}$$
But for $\chi \ne \chi_0$ we have $|A(x)| \le q$, because if you sum $\chi(n)$ over a full set of representatives mod $q$, you get zero. (So in $\sum \chi(n)$ only the last few terms matter.) This shows that $\sigma_c = 0$. There is a cottage industry around showing that
$$\left| \sum_{M \le n \le N} \chi(n) \right| \ll q^{1 - \delta}$$
where $\delta > 0$.

Corollary 14.3. $L(1, \chi) = \sum \frac{\chi(n)}{n}$ is convergent.

Exercise: show that
$$-\frac{L'}{L}(s, \chi) = \sum_{n=1}^{\infty} \chi(n)\Lambda(n) n^{-s}$$

14.4. Deprived L-functions. Let $L(s, \chi)$ be an L-function. In the $\chi_0$ case, we noticed that $L(s, \chi_0)$ is the $\zeta$-function, deprived of some factors. In general, let
$$L_D(s, \chi) = \prod_{p \nmid D} \left(1 - \frac{\chi(p)}{p^s}\right)^{-1} = L(s, \chi) \prod_{p \mid D} \left(1 - \frac{\chi(p)}{p^s}\right)$$
The last factor has no poles or zeroes. The only thing that changes is the nature of the residue. For example, $L(s, \chi_0)$ has a pole at $s = 1$ with residue $\prod_{p \mid D} \left(1 - \frac{1}{p}\right)$.

14.5. $\zeta$-functions of quadratic number fields. Let $K \supset \mathbb{Q}$ be a quadratic extension; we defined $\chi_K$ to be the quadratic character associated to $K$, which encodes splitting information. Define
$$\zeta_K(s) = \zeta(s) L(s, \chi_K) = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1} \left(1 - \frac{\chi_K(p)}{p^s}\right)^{-1}$$
To be continued.
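To see what the product $\zeta(s)L(s,\chi_K)$ encodes, here is a hedged sketch for the single example $K = \mathbb{Q}(i)$, which is not worked out in the lecture. It assumes two standard facts: the character of $\mathbb{Q}(i)$ is the nontrivial character mod 4, and $\mathbb{Z}[i]$ is a principal ideal domain with exactly 4 units, so ideals of norm $n$ correspond to solutions of $x^2 + y^2 = n$ up to those units. The function names are invented for the example.

```python
def chi4(n: int) -> int:
    """Nontrivial character mod 4: 0 on even n, +1 if n = 1 mod 4, -1 if n = 3 mod 4."""
    return (0, 1, 0, -1)[n % 4]

def zeta_times_L_coefficient(n: int) -> int:
    """n-th Dirichlet coefficient of zeta(s) * L(s, chi4): sum of chi4(d) over d | n."""
    return sum(chi4(d) for d in range(1, n + 1) if n % d == 0)

def ideals_of_norm(n: int) -> int:
    """Number of ideals of Z[i] of norm n, counted as generators x + iy modulo the 4 units."""
    bound = int(n ** 0.5) + 1
    generators = sum(
        1
        for x in range(-bound, bound + 1)
        for y in range(-bound, bound + 1)
        if x * x + y * y == n
    )
    return generators // 4

for n in (1, 2, 3, 5, 25, 65):
    print(n, zeta_times_L_coefficient(n), ideals_of_norm(n))
```

The two columns agree, which is the coefficient-by-coefficient form of the statement that $\zeta(s)L(s,\chi_K)$ is a sum over ideals of $(NI)^{-s}$.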

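15. March 22

Recall we had a quadratic extension $K \supset \mathbb{Q}$, and an associated Dirichlet character $\chi_K$. We were computing $\zeta(s)L(s, \chi_K)$, for $\sigma > 1$. (We've already shown that $L(s, \chi_K)$ extends to $\sigma > 0$, so everything is at least defined here.) The factor at $p$ is
$$\left(1 - \frac{1}{p^s}\right)^{-1} \left(1 - \frac{\chi_K(p)}{p^s}\right)^{-1} = \begin{cases} \left(1 - \frac{1}{p^s}\right)^{-1} & \text{ramified} \\ \left(1 - \frac{1}{p^s}\right)^{-2} & \text{split} \\ \left(1 - \frac{1}{(p^2)^s}\right)^{-1} & \text{inert} \end{cases}$$

$$\begin{array}{l|c|c}
 & \text{number of primes over } p & N_{K/\mathbb{Q}} P \\ \hline
p \text{ ramified} & 1 & p \\
p \text{ split} & 2 & p \\
p \text{ inert} & 1 & p^2
\end{array}$$
where $N_{K/\mathbb{Q}} P$ is the norm. So this is
$$\prod_{P \mid p} \left(1 - \frac{1}{(N_{K/\mathbb{Q}} P)^s}\right)^{-1}$$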

We don't necessarily have unique factorization in quadratic number fields, but we do have unique factorization of ideals. By the same reasoning that produced the equivalence of the sum / product forms of the Riemann zeta function:
$$\zeta_K := \sum_{\text{ideals } I \ne 0} \frac{1}{(N I)^s} = \prod_P \left(1 - \frac{1}{(N P)^s}\right)^{-1}$$
where $N I$ is the norm of $I$. This is the Dedekind $\zeta$-function.

Proposition 15.1. $L(1, \chi_K) \ne 0$ (so $s = 1$ is a pole of $\zeta_K$)

Proof. By contradiction. If $L$ vanishes, then by Landau's theorem $\zeta_K$ would be analytic up to $\sigma = 0$. Write
$$\zeta_K\left(\tfrac{1}{2}\right) = \sum_{\text{single-element ideals}} + \sum_{\text{the rest}} = \sum_{0 < n} (n^2)^{-1/2} + (\text{something} > 0) = \sum_{0 < n} \frac{1}{n} + \text{positive}$$
which diverges.

Corollary 15.2 (of HW). If $\chi$ is any quadratic Dirichlet character, then $L(1, \chi) \ne 0$.

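15.1. Arithmetic progressions. Let $\chi$ be a Dirichlet character of modulus $q$, and $a$ an integer with $(a, q) = 1$.

Proposition 15.3.
$$\frac{1}{\varphi(q)} \sum_{\chi} \bar{\chi}(a) \chi(n) = \begin{cases} 1 & n \equiv a \pmod q \\ 0 & \text{else} \end{cases}$$

Applying the proposition:
$$\sum_{n \equiv a} \Lambda(n) n^{-s} = \sum_n \Lambda(n) n^{-s} \frac{1}{\varphi(q)} \sum_{\chi \ (\mathrm{mod}\ q)} \bar{\chi}(a)\chi(n) = \frac{1}{\varphi(q)} \sum_{\chi} \bar{\chi}(a) \underbrace{\sum_{n=1}^{\infty} \chi(n)\Lambda(n) n^{-s}}_{-\frac{L'}{L}(s, \chi)} = -\frac{1}{\varphi(q)} \sum_{\chi} \bar{\chi}(a) \frac{L'}{L}(s, \chi)$$
$$= -\frac{1}{\varphi(q)} \frac{L'}{L}(s, \chi_0) - \frac{1}{\varphi(q)} \sum_{\chi \ne \chi_0} \bar{\chi}(a) \frac{L'}{L}(s, \chi) = \frac{1}{\varphi(q)(s - 1)} + O(1) - \frac{1}{\varphi(q)} \sum_{\chi \ne \chi_0} \bar{\chi}(a) \frac{L'}{L}(s, \chi)$$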

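If $\frac{L'}{L}(s, \chi)$ for $\chi \ne \chi_0$ has a pole, then the L-function either has a zero or a pole. $L$ does not have a pole. So we have to prove that $L(1, \chi) \ne 0$ for all $\chi \ne \chi_0$.

$$\prod_{\chi \text{ modulus } q} L(s, \chi) = \exp\Big(\sum_{\chi} \log L(s, \chi)\Big) = \exp\Big(\sum_{\chi} \sum_n \frac{\Lambda(n)}{\log n} \chi(n) n^{-s}\Big) = \exp\Big(\sum_n \frac{\Lambda(n)}{\log n} n^{-s} \underbrace{\sum_{\chi} \chi(n)}_{\varphi(q) \text{ or } 0}\Big) = \exp\Big(\varphi(q) \sum_{n \equiv 1 \ (\mathrm{mod}\ q)} \frac{\Lambda(n)}{n^s \log n}\Big)$$
so $\prod_{\chi} L(\sigma, \chi) \ge 1$ for real $\sigma > 1$. Split the product as
$$\prod_{\chi \text{ modulus } q} L(s, \chi) = L(s, \chi_0) \prod_{\chi \text{ quadratic}} L(s, \chi) \prod_{\operatorname{ord} \chi > 2} L(s, \chi)$$

If any of these are zero, then we get a finite number; if two factors are zero at $s = 1$, then this is zero, contradicting $\prod_\chi L(\sigma, \chi) \ge 1$ for $\sigma > 1$. So we have at most one factor contributing zero at $s = 1$. We know that factors of quadratic $L(s, \chi)$ do not contribute a zero. So we have to worry about the characters of order $> 2$. Then $\chi \ne \bar{\chi}$. Recall that $\sum \frac{\chi(n)}{n^s}$ is conjugate to $\sum \frac{\bar{\chi}(n)}{n^s}$ (for real $s$). If $\chi$ contributes zero, then $\bar{\chi}$ also contributes zero, which is a contradiction.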

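Corollary 15.4. $\sum_{n \equiv a \ (\mathrm{mod}\ q)} \frac{\Lambda(n)}{n^s} \to \infty$ as $s \to 1^+$.

We just showed that $\frac{1}{\varphi(q)} \sum_{\chi \ne \chi_0} \bar{\chi}(a) \frac{L'}{L}(s, \chi)$ is bounded. So using the earlier expression for $\sum_{n \equiv a \ (\mathrm{mod}\ q)} \Lambda(n) n^{-s}$, we have
$$\sum_{n \equiv a \ (\mathrm{mod}\ q)} \Lambda(n) n^{-s} = \frac{1}{\varphi(q)(s - 1)} + O(1)$$
as $s \to 1^+$.
$$\sum_{n \equiv a \ (\mathrm{mod}\ q)} \Lambda(n) n^{-s} = \sum_{p^k \equiv a \ (\mathrm{mod}\ q)} \frac{\log p}{p^{ks}} = \sum_{p \equiv a \ (\mathrm{mod}\ q)} \frac{\log p}{p^s} + \sum_{\substack{p^k \equiv a \ (\mathrm{mod}\ q) \\ k \ge 2}} \frac{\log p}{p^{ks}}$$
The last sum is bounded by
$$\sum_p \sum_{k=2}^{\infty} \frac{\log p}{p^k} = \sum_p \frac{\log p}{p(p - 1)}$$
which converges.

Corollary 15.5. $\sum_{p \equiv a \ (\mathrm{mod}\ q)} \frac{\log p}{p} = \infty$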

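15.2. Gauss sums. Fix $\chi$ of modulus $q$. Let $\zeta$ (ugh) be a primitive $q$th root of unity $e^{2\pi i/q}$.
$$\tau(\chi) = \sum_{a=1}^{q} \chi(a) e^{2\pi i a/q} = \sum_{a=1}^{q} \chi(a) \zeta^a$$
Treating the sum as a discrete integral, etc., this is sort of like a discrete version of the $\Gamma$-function.

Proposition 15.6.
(1) If $(b, q) = 1$, then
$$\chi(b)\, \tau(\bar{\chi}) = \sum_{a=1}^{q} \bar{\chi}(a) \zeta^{ab}$$
since
$$\tau(\bar{\chi}) = \sum_a \bar{\chi}(ab) \zeta^{ab} = \bar{\chi}(b) \sum_a \bar{\chi}(a) \zeta^{ab}$$
This is also true for all $b$ if $\chi$ is primitive, because both sides are zero.
(2) Let $\chi$ be primitive. Then $|\tau(\chi)| = \sqrt{q}$.

Proof. We will show that $|\tau(\bar{\chi})|^2 = q$. Fix any $b$ such that $(b, q) = 1$. From (1),
$$|\tau(\bar{\chi})|^2 = \Big| \sum_a \bar{\chi}(a) \zeta^{ab} \Big|^2$$
because $\chi(b)$ is nonzero (of absolute value 1) and goes away when you take the absolute value. Sum over $b$.
$$\varphi(q) |\tau(\bar{\chi})|^2 = \sum_{\substack{b \\ (b,q)=1}} \Big| \sum_a \bar{\chi}(a) \zeta^{ab} \Big|^2 = \sum_{b=1}^{q} \Big| \sum_{a=1}^{q} \bar{\chi}(a) \zeta^{ab} \Big|^2$$
(if $(b, q) \ne 1$, everything is zero), so
$$\varphi(q) |\tau(\bar{\chi})|^2 = \sum_{b=1}^{q} \sum_{a=1}^{q} \sum_{c=1}^{q} \bar{\chi}(a) \zeta^{ab}\, \chi(c) \zeta^{-cb} = \sum_{a,c} \bar{\chi}(a)\chi(c) \sum_b \zeta^{b(a - c)}$$
If $a \not\equiv c \pmod q$ then the last sum is zero. Otherwise, the sum is $q$. So this is
$$\sum_{a \equiv c \ (\mathrm{mod}\ q)} \bar{\chi}(a)\chi(c)\, q = \varphi(q)\, q = \varphi(q) |\tau(\bar{\chi})|^2$$

Putting these facts together, if $\chi$ is primitive mod $q$,
$$\chi(n) = \frac{1}{\tau(\bar{\chi})} \sum_{a=1}^{q-1} \bar{\chi}(a) \zeta^{an}$$
(Since $|\tau(\bar{\chi})| \ne 0$, we can divide by it.)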
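The Gauss sum statements are easy to test numerically. The sketch below is illustrative only: it uses two sample primitive characters chosen for the example (the Legendre symbol mod a prime, and an order-4 character mod 5 defined by $\chi(2^k) = i^k$), and checks $|\tau(\chi)| = \sqrt{q}$ together with the inversion formula for $\chi(n)$.

```python
import cmath
import math

def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, computed by Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def chi5(a: int) -> complex:
    """An order-4 character mod 5, defined by chi5(2^k) = i^k (2 generates (Z/5Z)^*)."""
    discrete_log = {1: 0, 2: 1, 4: 2, 3: 3}
    a %= 5
    return 0 if a == 0 else 1j ** discrete_log[a]

def conj(chi):
    """The conjugate character."""
    return lambda a: complex(chi(a)).conjugate()

def gauss_sum(chi, q: int) -> complex:
    """tau(chi) = sum over a = 1..q of chi(a) * zeta^a with zeta = exp(2 pi i / q)."""
    zeta = cmath.exp(2j * cmath.pi / q)
    return sum(chi(a) * zeta ** a for a in range(1, q + 1))

def chi_from_gauss_sum(chi, q: int, n: int) -> complex:
    """Recover chi(n) as (1 / tau(conj chi)) * sum_a conj(chi)(a) * zeta^(a n)."""
    zeta = cmath.exp(2j * cmath.pi / q)
    total = sum(conj(chi)(a) * zeta ** (a * n) for a in range(1, q))
    return total / gauss_sum(conj(chi), q)

for p in (7, 11, 13):
    tau = gauss_sum(lambda a: legendre(a, p), p)
    print(p, abs(tau), math.sqrt(p))

print([chi_from_gauss_sum(chi5, 5, n) for n in range(5)])
```

The recovered values also vanish at $n \equiv 0$, in line with the remark that the identity in (1) holds for all $b$ when the character is primitive.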

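15.3. Evaluating $L(1, \chi)$. We know
$$L(1, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n}$$
converges (but not absolutely). If $1 \le a < q$, then $\sum_n \frac{\zeta^{an}}{n}$ converges. (We had our old theorem about $\sigma_c = \limsup \frac{\log |A(x)|}{\log x}$. If $A$ is controllable, then you control convergence. You can make the same argument for the additive character $\zeta^{an}$.) Substitute the old expression for $\chi(n)$ into the definition of $L(1, \chi)$.
$$L(1, \chi) = \sum_{n=1}^{\infty} \frac{1}{n} \frac{1}{\tau(\bar{\chi})} \sum_{a=1}^{q} \bar{\chi}(a) \zeta^{an} = \frac{1}{\tau(\bar{\chi})} \sum_{a=1}^{q} \bar{\chi}(a) \sum_{n=1}^{\infty} \frac{\zeta^{an}}{n}$$

We have written $L(1, \chi)$ as a finite sum, having switched summations, weighted by character of a function of $a$. Define
$$F(z) = \sum_{n=1}^{\infty} \frac{z^n}{n} = \log \frac{1}{1 - z}$$
if $|z| \le 1$ and $z \ne 1$. So
$$L(1, \chi) = \frac{1}{\tau(\bar{\chi})} \sum_a \bar{\chi}(a) F(\zeta^a)$$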

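16. March 27

Recall we had a primitive Dirichlet character $\chi$ of modulus $q$.
$$L(1, \chi) = \frac{1}{\tau(\bar{\chi})} \sum_{a=1}^{q-1} \bar{\chi}(a) \sum_{n=1}^{\infty} \frac{e^{2\pi i a n/q}}{n}$$
We had defined $\zeta = e^{2\pi i/q}$ and $F(z) = \sum_{n=1}^{\infty} \frac{z^n}{n} = \log \frac{1}{1 - z}$ for $|z| \le 1$, so that $L(1, \chi) = \frac{1}{\tau(\bar{\chi})} \sum_a \bar{\chi}(a) F(\zeta^a)$. Take the branch where $\log(x)$ is real if $x > 0$, and $\log(e^{i\theta}) = i\theta$ for $\theta \in (-\pi, +\pi]$. Take $0 < a < q$. Use trig:
$$1 - e^{2\pi i a/q} = e^{i\pi a/q}\left(e^{-i\pi a/q} - e^{i\pi a/q}\right) = -2i\, e^{i\pi a/q} \sin\frac{\pi a}{q} = e^{i\pi\left(\frac{a}{q} - \frac{1}{2}\right)} \cdot 2\sin\frac{\pi a}{q}$$
$$F(e^{2\pi i a/q}) = \log \frac{1}{1 - e^{2\pi i a/q}} = -\log\left(2\sin\frac{\pi a}{q}\right) - i\pi\left(\frac{a}{q} - \frac{1}{2}\right)$$
Since $\sum_a \bar{\chi}(a) = 0$, the constant $\frac{i\pi}{2}$ drops out, and
$$L(1, \chi) = -\frac{1}{\tau(\bar{\chi})} \underbrace{\sum_a \bar{\chi}(a) \log\left(2\sin\frac{\pi a}{q}\right)}_{S(\bar{\chi})} - \frac{i\pi}{\tau(\bar{\chi})\, q} \underbrace{\sum_a \bar{\chi}(a)\, a}_{T(\bar{\chi})}$$

Note that if $\chi(-1) = +1$ then (by symmetry with replacing $a \to q - a$) we have $T(\bar{\chi}) = 0$. More precisely,
$$\sum \bar{\chi}(a)\, a = \sum \bar{\chi}(q - a)(q - a) = \sum \bar{\chi}(a)(q - a) = q \sum \bar{\chi}(a) - \sum \bar{\chi}(a)\, a = -\sum \bar{\chi}(a)\, a$$
Similarly, if $\chi(-1) = -1$,
$$\sum \bar{\chi}(a) \log\left(2\sin\frac{\pi a}{q}\right) = \sum \bar{\chi}(q - a) \log\left(2\sin\frac{\pi(q - a)}{q}\right) = -\sum \bar{\chi}(a) \log\left(2\sin\frac{\pi a}{q}\right)$$
so $S(\bar{\chi}) = 0$. So, we have two cases:
$$L(1, \chi) = -\frac{1}{\tau(\bar{\chi})} \sum_{a=1}^{q-1} \bar{\chi}(a) \log\left(2\sin\frac{\pi a}{q}\right) \quad (\chi \text{ even})$$
$$L(1, \chi) = -\frac{i\pi}{q\, \tau(\bar{\chi})} \sum_{a=1}^{q-1} \bar{\chi}(a)\, a \quad (\chi \text{ odd})$$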
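The two closed forms can be compared against partial sums of the series. This is a numerical sketch under stated assumptions: the sign conventions are the ones obtained by expanding $F(\zeta^a) = -\log(1 - \zeta^a)$ as above, and the two test characters (the odd character mod 4 and the even Legendre character mod 5) are chosen here for illustration, with known values $\pi/4$ and $\frac{2}{\sqrt5}\log\frac{1+\sqrt5}{2}$.

```python
import cmath
import math

def chi4(n: int) -> int:
    """The odd primitive character mod 4."""
    return (0, 1, 0, -1)[n % 4]

def chi5_real(n: int) -> int:
    """The even primitive (Legendre) character mod 5."""
    return (0, 1, -1, -1, 1)[n % 5]

def gauss_sum(chi, q: int) -> complex:
    """tau(chi) = sum over a = 1..q of chi(a) * exp(2 pi i a / q)."""
    return sum(chi(a) * cmath.exp(2j * cmath.pi * a / q) for a in range(1, q + 1))

def L1_partial(chi, q: int, periods: int = 20000) -> float:
    """Partial sum of chi(n)/n over a whole number of periods mod q."""
    return sum(chi(n) / n for n in range(1, periods * q + 1))

def L1_closed_form(chi, q: int) -> complex:
    """Finite-sum formula for L(1, chi), split by the parity chi(-1)."""
    chi_bar = lambda a: complex(chi(a)).conjugate()
    tau_bar = gauss_sum(chi_bar, q)
    if abs(chi(q - 1) - 1) < 1e-9:   # chi(-1) = +1: even case, log-sine sum
        s = sum(chi_bar(a) * math.log(2 * math.sin(math.pi * a / q)) for a in range(1, q))
        return -s / tau_bar
    t = sum(chi_bar(a) * a for a in range(1, q))   # odd case, linear sum
    return -1j * math.pi * t / (q * tau_bar)

print(L1_closed_form(chi4, 4), L1_partial(chi4, 4), math.pi / 4)
golden = (1 + math.sqrt(5)) / 2
print(L1_closed_form(chi5_real, 5), L1_partial(chi5_real, 5), 2 * math.log(golden) / math.sqrt(5))
```

The series converges only conditionally, so the partial sums agree with the closed form to a few decimal places rather than to machine precision.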

Let $\chi = \chi_K$ be a quadratic character (arising from a quadratic extension $K \supset \mathbb{Q}$). First assume that $K$ is a real quadratic field.
$$L(1, \chi_K) = \frac{1}{\tau(\chi)} \log \prod_a \left(e^{i\pi a/q} - e^{-i\pi a/q}\right)^{-\chi(a)}$$
(There should be an extra factor of $i$ in the denominator, because the 2 in $2\sin$ cancels the 2 in the $2i$ denominator of the exponential version of $\sin$. . . Prof. Mazur claims that it goes away when you examine terms: the terms with $i$ in the denominator cancel the terms with it in the numerator, since $\sum_a \chi(a) = 0$.) Write this as
$$L(1, \chi) = \frac{1}{\tau(\chi)} \log(\eta)$$
Note that $\chi = \bar{\chi}$ since this is a quadratic character. Let $e^{2\pi i/2q} = \xi$, a $2q$th root of unity. So this is
$$\eta = \prod_a \left(e^{i\pi a/q} - e^{-i\pi a/q}\right)^{-\chi(a)} = \prod_a (\xi^a - \xi^{-a})^{-\chi(a)} = \prod_a \left(\xi^{-a}(\xi^{2a} - 1)\right)^{-\chi(a)} \in \mathbb{Q}(\xi)$$
Let $\zeta = \xi^2$ be this $q$th root of unity. The root of unity $\xi^{-a}$ doesn't matter; we really only care about $\xi^{2a} - 1 = \zeta^a - 1$, for $(q, a) = 1$. Suppose we have $(b, q) = 1$, so
$$\zeta^b - 1 = (\zeta - 1)(\zeta^{b-1} + \zeta^{b-2} + \cdots + 1)$$
Find $\bar{b}$ such that $b\bar{b} \equiv 1$. Then we can write $\zeta - 1 = (\zeta^b)^{\bar{b}} - 1$. So the above equation is also
$$\zeta - 1 = (\zeta^b - 1)\left((\zeta^b)^{\bar{b} - 1} + \cdots + 1\right)$$
So the ratio of any $\frac{\zeta^a - 1}{\zeta^b - 1}$ is a unit (if $(a, q) = 1 = (b, q)$), which also means that $\eta$ is a unit in $\mathbb{Z}[\zeta]$, where $\zeta$ is a primitive $q$th root of unity. It turns out that $\eta \in K$.

Let $A$ be the ring of integers of $K$, and let $U = A^* \subset A \subset K$ be the group of units. Then $U = \{\pm 1\} \times \epsilon^{\mathbb{Z}}$, where $\epsilon$ is the fundamental unit: the generator that has the property $\epsilon > 1$ (i.e. it is real and positive) in the usual embedding. So $\eta = \epsilon^h$ (where $h$ is now just some arbitrary power. . . ). (Note that it is not negative, because it was the log of something $> 1$.) It turns out that $|h|$ is the class number. (With this normalization the exponent is actually twice the class number: for $q = 5$ one gets $\eta = \left(\frac{1 + \sqrt{5}}{2}\right)^2$ while the class number is 1.)

Digression 16.1. The class number is the obstruction for $A$ to satisfy unique factorization. If it is 1, then the ring of integers $A$ satisfies unique factorization. If $A$ is a PID, then it has unique factorization. The class number is the obstruction to being a PID. Consider the multiplicative system of nonzero ideals; there is a sub-multiplicative system of principal ideals. There are no inverses; but impose the equivalence relation $I \sim J$ if there exist principal ideals $(n), (m)$ such that $(n)I = (m)J$. There is an induced multiplication, and you can show that this is a finite group. This is called the ideal class group, and the class number is its size. If it is trivial, then every ideal is principal.

So $L(1, \chi) = \frac{1}{\tau(\chi)} \log(\eta) = \frac{1}{\tau(\chi)} \log(\epsilon^h) = \frac{h}{\tau(\chi)} \log \epsilon$, where $\log \epsilon$ is called the regulator of the quadratic field. This is an example of an analytic formula.

16.1. The Dedekind zeta-function of a number field. Let $A \subset K$ be the ring of integers of a number field $K$.
$$\sum_{\text{ideals } I} \frac{1}{(N I)^s} = \prod_{P \text{ prime}} \frac{1}{1 - N P^{-s}} =: \zeta_K(s)$$

These enjoy some basic facts of life:
(1) There is an analytic continuation to $\mathbb{C}$ with a single, simple pole at $s = 1$. Unlike $\zeta(s)$, the residue of the pole has arithmetic information:
$$\operatorname{res} = \frac{2^{r_1} (2\pi)^{r_2} (\text{Class number})(\text{Regulator})}{(\#\text{roots of } 1) \sqrt{|\text{Discriminant}|}}$$
If $K \supset \mathbb{Q}$ is a degree-$n$ extension, then there are $r_1$ embeddings into $\mathbb{R}$, and $r_2$ distinct embeddings into $\mathbb{C}$ (not including the trivial ones into $\mathbb{R}$), up to conjugation. So $n = r_1 + 2r_2$. (This information is easy to come by: take the minimal polynomial of a primitive element of $K$; ask how many real roots and how many pairs of complex roots.)
(2) There is a functional equation as follows. Define
$$\Gamma_{\mathbb{R}}(s) = \pi^{-s/2}\, \Gamma\left(\frac{s}{2}\right)$$
(the Gamma-factor we've seen in the Riemann $\zeta$-function), and
$$\Gamma_{\mathbb{C}}(s) = 2 (2\pi)^{-s}\, \Gamma(s)$$
In general, $\Gamma_K(s) = \Gamma_{\mathbb{R}}(s)^{r_1} \Gamma_{\mathbb{C}}(s)^{r_2}$. If you define
$$\Lambda_K(s) = \Gamma_K(s)\, |\mathrm{Disc}_K|^{s/2}\, \zeta_K(s)$$
then there is a functional equation $\Lambda_K(s) = \Lambda_K(1 - s)$.

Remark 16.2. Read Ireland/Rosen, or Davenport's Higher Arithmetic, for an elementary treatment of class numbers.

16.2. Analytic continuation of $L(s, \chi)$. Let $\chi$ be a primitive, modulus-$q$ Dirichlet character. In the Riemann case, recall the motivation for Hecke's theorem: we introduced $\theta$, used Poisson summation, got the functional equation for $\theta$, which gave analytic continuation and a functional equation for $\zeta(s)$. We will do the same in this case, by constructing $\theta_\chi$, etc.

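17. March 29

Let $z \in \mathbb{C}$ where $\operatorname{Re}(z) > 0$. Let $\chi$ be a primitive character of modulus $q$. Define
$$\kappa = \kappa(\chi) = \begin{cases} 0 & \chi \text{ even} \\ 1 & \chi \text{ odd} \end{cases}$$
$$\theta_{even}(z, \chi) = \sum_{n=-\infty}^{\infty} \chi(n) e^{-\pi n^2 z/q} \quad (\chi \text{ even}), \qquad \theta_{odd}(z, \chi) = \sum_{n=-\infty}^{\infty} n \chi(n) e^{-\pi n^2 z/q} \quad (\chi \text{ odd})$$

Theorem 17.1. There are functional equations
$$\theta_{even}(z, \chi) = \frac{\tau(\chi)}{q^{1/2}}\, z^{-1/2}\, \theta_{even}\left(\frac{1}{z}, \bar{\chi}\right)$$
$$\theta_{odd}(z, \chi) = \frac{\tau(\chi)}{i q^{1/2}}\, z^{-3/2}\, \theta_{odd}\left(\frac{1}{z}, \bar{\chi}\right)$$

The $n$ in the sums above are only interesting as elements of congruence classes mod $q$. That is,
$$\theta_{even}(z, \chi) = \sum_{a=1}^{q-1} \chi(a) \sum_{m=-\infty}^{\infty} e^{-\pi (mq + a)^2 z/q}$$
Change variables $A = \frac{a}{q}$, $Z = qz$.

Theorem 17.2.
$$\sum_{m=-\infty}^{\infty} e^{-\pi (m + A)^2 Z} = Z^{-1/2} \sum_{k=-\infty}^{\infty} e^{2\pi i k A}\, e^{-\pi k^2/Z}$$

Use Poisson summation:
$$\sum_{m=-\infty}^{\infty} f(m) = \sum_{k=-\infty}^{\infty} \hat{f}(k)$$
Now apply this with $f(u) = e^{-\pi (u + A)^2 Z}$ and $\hat{f}(t) = Z^{-1/2}\, e^{2\pi i A t}\, e^{-\pi t^2/Z}$.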
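Theorem 17.2 can be checked numerically for real $Z > 0$. The sketch below simply truncates both sums (the terms decay like Gaussians, so 50 terms on each side is far more than needed); the function names and the truncation length are choices made for this illustration.

```python
import cmath
import math

def shifted_gaussian_sum(A: float, Z: float, terms: int = 50) -> float:
    """Left side: sum over m of exp(-pi (m + A)^2 Z), truncated to |m| <= terms."""
    return sum(math.exp(-math.pi * (m + A) ** 2 * Z) for m in range(-terms, terms + 1))

def poisson_dual_sum(A: float, Z: float, terms: int = 50) -> complex:
    """Right side: Z^(-1/2) * sum over k of exp(2 pi i k A) exp(-pi k^2 / Z)."""
    total = sum(
        cmath.exp(2j * math.pi * k * A) * math.exp(-math.pi * k * k / Z)
        for k in range(-terms, terms + 1)
    )
    return total / math.sqrt(Z)

for A, Z in [(0.3, 0.7), (0.0, 1.0), (0.25, 5.0), (0.9, 0.05)]:
    left = shifted_gaussian_sum(A, Z)
    right = poisson_dual_sum(A, Z)
    print(A, Z, left, right)
```

For small $Z$ the left side needs many terms while the right side is dominated by $k = 0$, and for large $Z$ the roles swap; that trade-off is what makes the functional equation useful.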


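$$\theta_{even}(z, \chi) = \sum_{a=1}^{q-1} \chi(a) \sum_{m=-\infty}^{\infty} e^{-\pi (mq + a)^2 z/q} = \sum_{a=1}^{q-1} \chi(a) \sum_{m=-\infty}^{\infty} e^{-\pi (m + \frac{a}{q})^2 qz}$$
$$= (qz)^{-1/2} \sum_{a=1}^{q-1} \chi(a) \sum_{k=-\infty}^{\infty} e^{2\pi i k a/q}\, e^{-\pi k^2/(qz)} = (qz)^{-1/2} \sum_{k=-\infty}^{\infty} e^{-\pi k^2/(qz)} \underbrace{\sum_{a=1}^{q-1} \chi(a) e^{2\pi i k a/q}}_{\text{Gauss sum}}$$
$$= (qz)^{-1/2} \sum_{k=-\infty}^{\infty} e^{-\pi k^2/(qz)}\, \bar{\chi}(k)\, \tau(\chi) = (qz)^{-1/2}\, \tau(\chi) \sum_{k=-\infty}^{\infty} \bar{\chi}(k)\, e^{-\pi k^2/(qz)} = \frac{\tau(\chi)}{q^{1/2}}\, z^{-1/2}\, \theta_{even}\left(\frac{1}{z}, \bar{\chi}\right)$$

Define the epsilon factor:
$$\epsilon(\chi) = \frac{\tau(\chi)}{i^{\kappa} q^{1/2}}$$
Now we can write the functional equation more concisely as
$$\theta(z, \chi) = \epsilon(\chi)\, z^{-\left(\frac{1}{2} + \kappa\right)}\, \theta\left(\frac{1}{z}, \bar{\chi}\right)$$

17.1. Hecke's Theorem. The input is $\lambda$, $k$, and two functions
$$f(Z) = \sum_{n=1}^{\infty} a(n) e^{2\pi i n Z/\lambda}, \qquad g(Z) = \sum_{n=1}^{\infty} b(n) e^{2\pi i n Z/\lambda}$$
such that
$$g\left(-\frac{1}{Z}\right) = \epsilon \left(\frac{i}{Z}\right)^k f(Z)$$
Officially exclude the case $q = 1$; this is the Riemann zeta function, and we've done it. Take $Z = iz$, $\lambda = 2q$, and $\epsilon = \epsilon(\chi)$.
$$f(iz) = \sum a(m) e^{-\pi m z/q}, \qquad g(iz) = \sum b(m) e^{-\pi m z/q}$$
such that
$$g\left(\frac{i}{z}\right) = \epsilon z^k f(iz)$$
Define
$$a(m) = \begin{cases} 0 & \text{if } m \text{ is non-square} \\ 2\chi(n) & \text{if } m = n^2 \end{cases} \qquad b(m) = \begin{cases} 0 & \text{if } m \text{ is non-square} \\ 2\bar{\chi}(n) & \text{if } m = n^2 \end{cases}$$
Exercise: Finish this for HW; some stuff is probably wrong here. . .

17.2. Cyclotomic Fields. Consider the extension $\mathbb{Q}(e^{2\pi i/N}) \supset \mathbb{Q}$ (that is, adjoining a primitive $N$th root of unity $\zeta$). This is Galois, where $G \cong (\mathbb{Z}/N\mathbb{Z})^*$. The conjugates of $\zeta$ are $\zeta^a$ (for $(a, N) = 1$). Let $\mu_N$ denote the group of all $N$th roots of unity in $\mathbb{Q}(\zeta)$; the generators are the aforementioned $\zeta^a$. An element in the Galois group is a map $\mathbb{Q}(\zeta) \to \mathbb{Q}(\zeta)$ that is the identity on $\mathbb{Q}$; so it is uniquely determined by assigning $\zeta \mapsto \zeta^a$. If we call this automorphism $\sigma_a$, the isomorphism $(\mathbb{Z}/N\mathbb{Z})^* \to G$ is just $a \mapsto \sigma_a$.

Let $L \supset K$ be any Galois extension, with Galois group $G$. There are rings of integers $B \subset L$ and $A \subset K$. By the primitive element theorem, write $L = K(\alpha)$, where $\alpha$ satisfies an irreducible polynomial $f(X) \in K[X]$ of degree equal to $[L : K]$. So $L = K[X]/(f(X))$. We are in a separable extension, so $f$ is separable, i.e. all its roots $\alpha_1, \cdots, \alpha_n$ are distinct. The Galois group acts on the set $\{\alpha_i\}$, and there is a single orbit. We can always assume that $f$ is monic with coefficients in $A$. Then $\alpha$ is an algebraic integer, so it is in $B$; so $A[\alpha] \subset B$. (It is not always the case that a ring of integers is generated by a single element; the inclusion here may be proper, and we don't care.)

The ground field doesn't matter, so assume $K = \mathbb{Q}$. Then $B = \mathbb{Z}[\alpha]$. Reduce mod primes; $B \to \mathbb{Z}[\alpha]/p\mathbb{Z}[\alpha]$. There is a map
$$f(X) \in \mathbb{Z}[X] \mapsto \bar{f}(X) \in \mathbb{F}_p[X]$$
Roots $\alpha_i$ get sent to roots $\bar{\alpha}_i$ of $\bar{f}$. If $p \nmid \mathrm{Disc}$, the $\bar{\alpha}_i$ are distinct. But $\bar{f}$ might split as $\bar{f} = \bar{f}_1 \cdots \bar{f}_\ell$ (irreducibles). So $\mathbb{F}_p[X]/\bar{f}_i(X)$ are fields. So you can write $p\mathbb{Z}[\alpha] = P_1 \cdots P_\ell$ where $P_i$ is the kernel of $\mathbb{Z}[X]/(f(X)) \to \mathbb{F}_p[X]/(\bar{f}_i(X))$.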

18. April 3

Remark 18.1. There was a typo. The functional equation should have been
$$g\left(-\frac{1}{Z}\right) = \epsilon \left(\frac{Z}{i}\right)^k f(Z)$$

Let $\zeta = e^{2\pi i/N}$ be a primitive $N$th root of 1. Form $\mathbb{Z}[\zeta] = \mathbb{Z}[X]/(\Phi_N(X))$ where $\Phi_N$ is the cyclotomic polynomial (equal to $1 + X + \cdots + X^{N-1}$ when $N$ is prime), whose roots are all the primitive $N$th roots of unity. We had an extension $\mathbb{Q}(\zeta) \supset \mathbb{Q}$, with an isomorphism $G \cong (\mathbb{Z}/N\mathbb{Z})^*$ (where $G$ is the Galois group), identifying $a \in (\mathbb{Z}/N\mathbb{Z})^*$ with the automorphism $\sigma_a$.

Now let $L \supset K$ be any Galois extension. By the primitive element theorem, we can write $L = K[\alpha] = K[X]/F(X)$, and suppose $\alpha_1, \cdots, \alpha_d$ are all the roots of $F$ (that is, $\alpha = \alpha_i$ for some $i$ and $\{\alpha_i\}$ is the full set of conjugates). If $A \subset K$ is the ring of integers, $B = A[\alpha]$ is the ring of integers of $L$. [Oops! The Galois group doesn't necessarily fix the ring of integers. . . We should let $B = A[\alpha_1, \cdots, \alpha_n]$, but then this messes other stuff up. Fixed below.]

Let $\mathfrak{p} \subset A$ be a prime ideal, not dividing $\operatorname{disc} F$. Let $k = A/\mathfrak{p}$. We can then factor $\bar{F}(X) = \prod_{j=1}^{\ell} f_j(X) \in k[X]$. The ideal $\mathfrak{p}$ lifts to $\mathfrak{p}B \subset B$, and $B/\mathfrak{p}B = k[X]/(\bar{F}(X))$. Check that $\bar{F}$ is a separable polynomial. By the Chinese Remainder Theorem, $k[X]/(\bar{F}(X)) = \prod_{j=1}^{\ell} k_j$ where $k_j = k[X]/(f_j)$. This gives a nice description of the prime decomposition of $\mathfrak{p}B \subset B$: if $P_j$ is the kernel of $B \to k_j$, then $\mathfrak{p}B = \prod_{j=1}^{\ell} P_j$.

The Galois group $G$ of the extension operates transitively on $\{\alpha_1, \cdots, \alpha_d\}$ and on $\{P_1, \cdots, P_\ell\}$. Furthermore, given a fixed $\alpha$, the map $G \to \{\alpha_1, \cdots, \alpha_d\}$ given by $g \mapsto g\alpha$ is a 1-1 correspondence. This is not the case for the $P_j$: let $G_j = \{g \in G : gP_j = P_j\}$ be the stabilizer. $G_j$ acts on $k_j = B/P_j$, and hence represents an element of the Galois group. So now $\{G_1, \cdots, G_\ell\}$ is in 1-1 correspondence with $\{P_1, \cdots, P_\ell\}$, via the obvious map $G_j \mapsto P_j$ ($G$ acts by conjugation on the left and multiplication on the right). Consider the collection $\{G_j\}_{j=1,\cdots,\ell}$ as a collection of subgroups of $G$. This corresponds to a full orbit under the action of $G$. $G_j$ acts on $B$ and stabilizes $P_j$, so it acts on $k_j = B/P_j$. This induces a mapping $G_j \to \operatorname{Gal}(k_j/k)$, which turns out to be an isomorphism. All the $G_j$ are cyclic, and there is a canonical generator: $\varphi_j : x \mapsto x^q$, where $q = |k|$ (this is a homomorphism on $k_j = B/P_j$). Call this Frobenius at $P_j$. We have a collection $\{\varphi_j\}_{j=1,\cdots,\ell}$, which form a single conjugacy class in $G$. This depends only on the choice of $\mathfrak{p}$, so call it $C_{\mathfrak{p}} \subset G$. It turns out that none of this depends on the choice of $\alpha$ or $B$.

18.1. More general theory. We have an extension $L \supset K$, with $A \subset K$ the ring of integers. Let $\mathfrak{p} \subset A$ be a prime ideal, and let $A_{\mathfrak{p}} = \varprojlim_m A/\mathfrak{p}^m$ be the completion. Then let $B_{\mathfrak{p}} = B \otimes_A A_{\mathfrak{p}}$. Suppose that $\mathfrak{p} \nmid \operatorname{disc}(L/K)$ (where the discriminant is now an ideal). We have $B_{\mathfrak{p}} = \prod_j B_{P_j}$. As before, the $G_j$ act on $B_{P_j}$. Let $k_j = B_{P_j}/P_j B_{P_j}$ be the residue field; we have a Galois group $\operatorname{Gal}(k_j/k)$. Let the inertial group be the kernel of $G_j \to \operatorname{Gal}(k_j/k)$. Assume that $\mathfrak{p} \nmid \operatorname{disc}(L/K)$, so there is a unique generator $\varphi_j : x \mapsto x^q$. For all $\mathfrak{p}$ that are good (i.e. those that don't divide $\operatorname{disc}(L/K)$), we have identified a single conjugacy class $C_{\mathfrak{p}} \subset G$, called Frobenius at $\mathfrak{p}$.

For example, let $\mathbb{Q}(\zeta)/\mathbb{Q}$, and let $G \cong (\mathbb{Z}/N\mathbb{Z})^*$. This is an abelian group, so conjugacy classes are single elements. If $p$ is good (i.e. $p \nmid N$), then $C_p$ is the image of $p$ in $G$ (i.e. it's $p \pmod N$).
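In the cyclotomic example everything about the splitting of a good prime can be read off from $p \bmod N$. The sketch below relies on a standard fact that is not proved in these notes: for $p \nmid N$, the Frobenius at $p$ has order $f$ equal to the multiplicative order of $p$ mod $N$, and $p$ splits into $\varphi(N)/f$ primes of residue degree $f$. The function names are hypothetical.

```python
from math import gcd

def euler_phi(n: int) -> int:
    """Euler's totient, by direct count (fine for small n)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def frobenius_order(p: int, N: int) -> int:
    """Order of p in (Z/NZ)^*, i.e. the order of Frobenius at a good prime p."""
    if gcd(p, N) != 1:
        raise ValueError("p must not divide N (p is not a good prime)")
    order, power = 1, p % N
    while power != 1 % N:
        power = (power * p) % N
        order += 1
    return order

def primes_above(p: int, N: int) -> int:
    """Number of primes of Q(zeta_N) lying over the good prime p."""
    return euler_phi(N) // frobenius_order(p, N)

for p, N in [(2, 7), (29, 7), (3, 8), (5, 12)]:
    print(p, N, frobenius_order(p, N), primes_above(p, N))
```

For instance $2$ has order 3 mod 7, so it splits into two primes of residue degree 3 in $\mathbb{Q}(\zeta_7)$, while $29 \equiv 1 \pmod 7$ splits completely.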
If $L \supset K$ is a Galois extension, and $C \subset G = \operatorname{Gal}(L/K)$ is a conjugacy class, let $P_C = \{\mathfrak{p} \text{ good} : C_{\mathfrak{p}} = C\}$ be the Chebotarev class. In the previous example, the Chebotarev classes are the arithmetic progressions $p = a + Nt$, for given $N$.

18.2. Review of representation theory. A representation is a homomorphism $\rho : G \to GL_n(\mathbb{C}) = \operatorname{Aut}_{\mathbb{C}}(V)$ where $V$ is an $n$-dimensional vector space. We will be considering representations up to conjugation. If $G$ acts on $V_1$ and $V_2$, we say that the representations are equivalent if there is an isomorphism $h : V_1 \to V_2$ such that, for all $v \in V_1$, $g \in G$, we have $h(gv) = gh(v)$. The character of a representation is $\chi_\rho : G \to \mathbb{C}$ that maps $g \mapsto \operatorname{Trace}(\rho(g))$. This is well-defined up to equivalence: if you change a matrix up to conjugation, you don't change its trace.

Given a character $\chi : G \to \mathbb{C}$, this factors through the set $\mathcal{C}$ of all conjugacy classes; so there is an induced function $\mathcal{C} \to \mathbb{C}$, which we also call $\chi$. Let $H_G = \operatorname{Maps}(\mathcal{C}, \mathbb{C})$, which we make into a Hilbert space as follows:
$$(\chi \mid \psi) = \frac{1}{|G|} \sum_{x \in G} \chi(x) \overline{\psi(x)}$$
There is a natural basis. If $c \in \mathcal{C}$, then
$$\delta_c : c' \mapsto \begin{cases} 0 & \text{if } c' \ne c \\ 1 & \text{if } c' = c \end{cases}$$

A representation $\rho$ on $V$ is irreducible if it cannot be decomposed as $\rho_1 + \rho_2$ on $V_1 \oplus V_2$. An irreducible character is one that comes from an irreducible representation.

Theorem 18.2 (Orthogonality relations). $\{\chi\}_{\chi \text{ irreducible}} \subset H_G$ is an orthonormal basis.

So now we have two bases: the natural basis, and the deeper basis of irreducibles. We can relate these:
$$\delta_c = \frac{|c|}{|G|} \sum_{\chi \text{ irreducible}} \overline{\chi(c)}\, \chi$$
Also, the representation is determined uniquely by its character, and it is expressible uniquely as a sum of irreducibles.

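18.3. Artin L-functions. Let $L/K$ be an extension with Galois group $G$, and a representation $\rho : G \to GL_n(\mathbb{C})$. Choose a prime $\mathfrak{p} \nmid \operatorname{disc}(L/K)$. We have a conjugacy class $C_{\mathfrak{p}}$ of elements. Choose $\varphi \in C_{\mathfrak{p}}$ and a positive integer $m$. $\varphi^m$ is uniquely determined up to conjugacy, and so are $\rho(\varphi^m)$ and $\operatorname{Trace}(\rho(\varphi^m)) =: \chi(\varphi^m)$. Call $\varphi_{\mathfrak{p}} = \varphi$ and $\chi(\varphi_{\mathfrak{p}}^m) = \chi(\mathfrak{p}^m)$. We will use this to produce a Dirichlet series, the local at $\mathfrak{p}$ Artin Euler factor attached to $\rho$ and $L/K$:
$$\log L_{\mathfrak{p}}(s; \rho, L/K) = \sum_{m=1}^{\infty} \frac{\chi(\mathfrak{p}^m)}{m (N\mathfrak{p})^{ms}}$$
The numerators are bounded, so this makes sense for $\sigma > 1$.

Alternatively, let $\varphi_{\mathfrak{p}}$ be Frobenius at $\mathfrak{p}$; $\rho(\varphi_{\mathfrak{p}}) \in GL_n(\mathbb{C}) \subset \operatorname{Mat}_n(\mathbb{C})$. Let $1$ denote the identity matrix here. Exercise: show that
$$L_{\mathfrak{p}}(s; \rho, L/K) = \frac{1}{\det(1 - \rho(\varphi_{\mathfrak{p}}) N\mathfrak{p}^{-s})}$$
for sufficiently large $\operatorname{Re}(s)$.

Let $\mathfrak{p}$ be (possibly) ramified. The inertial group $I_j$ is the normal subgroup given by the kernel of $G_j \to \operatorname{Gal}(k_j/k)$. We had a representation $\rho$ acting on a vector space $V$; but $G_j$ and $I_j$ also act on $V$. Let
$$V^{I_j} = \{v \in V : \sigma v = v \ \ \forall \sigma \in I_j\}$$
This is designed to have an action of $G_j/I_j = \operatorname{Gal}(k_j/k)$, which we call $\rho_j$. Frobenius is sitting in this latter Galois group, and it corresponds to some element $\varphi_j \in G_j/I_j$. If $V$ had dimension $n$, let $n_j$ be the dimension of $V^{I_j}$. We have $1, \rho_j(\varphi_j) \in GL_{n_j}(\mathbb{C}) \subset \operatorname{Mat}_{n_j}(\mathbb{C})$. Then we have a function
$$s \mapsto 1 - \rho_j(\varphi_j)(N\mathfrak{p})^{-s} \in \operatorname{Mat}_{n_j}(\mathbb{C})$$
Define
$$L(s, \rho, L/K) = \prod_{\mathfrak{p}} L_{\mathfrak{p}}(s, \rho; L/K)$$
where the product is taken over all $\mathfrak{p}$, ramified or not.

19. April 5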
n 19.1. Zeros of L-functions (Akhil). We cared about (n), and more gennaq) ( 1+i L (s,) xs 1 erally (x, ) = nx (n)(n) = 2i 1i L(s,) s ds. Recall that had no zeros in c the area 1 log t for some c.

Proposition 19.1. Let be nontrivial, with (, 2). Then 1 L(s, ) (1 + 1 ) min{ , log } 1 5 Proposition 19.2. If 6 2, then L (s, ) 1 = + O(log ) L(s, ) s
close to s 5 6

where close means in the circle of radius

around

3 2

+ it.

This comes from a more general fact about analytic functions, and the proof is the same as in the case. 1 Observation 19.3. If Re(s) > 1 then Re L (s,)) Re s C log for any root L(s, close to s. In particular, if = + i, take s = 1 + + i. Then 1 L (s, ) Re C log L(s, ) 1+ Proposition 19.4. L (, 0 ) L ( + it, ) L ( + 2it, 2 ) Re 3 +4 + 0 L(s, 0 ) L( + it, ) L( + 2it, 2 ) This is the analogue of a similar inequality from earlier. L 3 (, 0 ) = +c L 1 67

Math 229

Barry Mazur L ( + 2it, 2 ) c log L 3 L (s, ) + C log Re L 4( 1) 1 3 c log r+ 4 + c log

Lecture 19

Let D = , then r

1 3 + c log (1 + r) 4Dr 1 1 3 c log = r D + 1 4D r (x, q, a) =

(log )1

Theorem 19.5.

1 log x + O(xe log x ) (q)

19.2. Recall we had an extension $L \supset K$ and a Frobenius element $\varphi_{\mathfrak{p}} \in G$, where $\mathfrak{p}$ is a good prime of $K$. For example, in the case of the cyclotomic extension $\mathbb{Q}(\zeta_N) \supset \mathbb{Q}$, a good prime is one such that $p \nmid N$. It determines an element in the Galois group $G = (\mathbb{Z}/N\mathbb{Z})^*$; this is $\varphi_p$. We had Dirichlet L-functions, which involved a Dirichlet character $\chi : (\mathbb{Z}/N\mathbb{Z})^* \to \mathbb{C}^*$, and Artin L-functions, which involved a Galois character $\chi : G \to \mathbb{C}^*$. When $L = \mathbb{Q}(\zeta_N)$ and $K = \mathbb{Q}$, then there is a canonical identification between these two. In general, if you have an abelian extension (i.e. the Galois group is abelian), then there is an analogue that is a Dirichlet character. The relationship between them is basically class-field theory. (What's interesting is that the Dirichlet character has nothing to do with $L$, only $K$; but the whole point of Artin characters is that they depend on $L$.)

Ideals form a monoid. We can make it into an abelian group by adding in fractional ideals; call this $I$. Consider the characters $\chi : I \to \mathbb{C}^*$. Given an extension $L \supset K$ that is Galois and abelian, for all good primes $\mathfrak{p}$ you get an element $\varphi_{\mathfrak{p}} \in G$. You get a map from most of $I$ to $G$ (i.e., sending $\mathfrak{p} \mapsto \varphi_{\mathfrak{p}}$), just by multiplicativity. So a character $\chi : I \to \mathbb{C}^*$ factors through $G$. . . It took quadratic reciprocity to guarantee that the $\chi_K$ we defined before is actually defined on $(\mathbb{Z}/N\mathbb{Z})^*$.

19.3. More algebraic number theory. Given an extension $L \supset K$ with Galois group $G$, and a prime $\mathfrak{p} \subset K$ that lifts to $P$, we can form the decomposition group $D_P \subset G$, the subgroup stabilizing $P$. We defined the inertial group $I_P$ as the kernel of $D_P \to \operatorname{Gal}(k_P/k)$ (recall $k_P = B/P$, where $B$ is the ring of integers). From last time,
$$L_{\mathfrak{p}}(s, \rho, L/K) = \frac{1}{\det(1 - \rho_P(\varphi_{\mathfrak{p}}) N\mathfrak{p}^{-s})}$$
where $\rho_P : D_P/I_P \to \operatorname{Aut}(V^{I_P})$. Then the global Artin L-function is
$$L(s, \rho, L/K) = \prod_{\mathfrak{p}} L_{\mathfrak{p}}(s, \rho, L/K)$$
You can show that this actually converges for $\sigma > 1$; in the cyclotomic case, if $\rho$ is a 1-dimensional character, then this is the Dirichlet L-function.

Let $M \supset L \supset K$ be a Galois tower: there is a Galois group $G(M/K)$, and hence an induced Galois group $G(M/L)$ (we don't care whether $L \supset K$ is Galois). By composition, given $\rho_L : G(L/K) \to GL_n(\mathbb{C})$, we get $\rho_M : G(M/K) \to G(L/K) \to GL_n(\mathbb{C})$. We have $L(s, \rho_L, L/K) = L(s, \rho_M, M/K)$, because if $\{\varphi_{\mathfrak{p}}^L\}$ are the Frobenius elements in the first extension, and $\{\varphi_{\mathfrak{p}}^M\}$ are the Frobenius elements in the second, the $\varphi_{\mathfrak{p}}^M$ map onto $\varphi_{\mathfrak{p}}^L$. Taking $\rho = 1 : G \to \{1\}$, then $L(s, 1, L/K) = \zeta_K(s)$ is an incomplete Dedekind $\zeta$-function.

19.4. Induced representations. Let $H \subset G$ be a subgroup, and suppose you have a representation $\rho : H \to GL_n(\mathbb{C})$. Then I claim you get an induced representation $\operatorname{Ind}_H^G(\rho) : G \to GL_m(\mathbb{C})$, where $m = [G : H] \cdot n$, such that for every tower of fields $M \supset L \supset K$ (where $M \supset K$ has Galois group $G$, and $M \supset L$ has Galois group $H$), then
$$L(s, \rho, M/L) = L(s, \operatorname{Ind}_H^G \rho, M/K)$$
This is basically linear algebra. Let $W$ be a finite-dimensional $\mathbb{C}$ vector space with $H$ acting $\mathbb{C}$-linearly. Define $\mathbb{C}[H] = \bigoplus_{h \in H} \mathbb{C} \cdot h$, an associative $\mathbb{C}$-algebra. By facts about representation theory, this is a direct sum of total matrix algebras, one for every irreducible character of $H$. You get a tensor product $\mathbb{C}[G] \otimes_{\mathbb{C}[H]} W$.

20. April 10

April 24: Second $1\frac{1}{2}$ hour exam

Last time we introduced the group ring $\mathbb{C}[G]$; I mentioned that a $G$-representation $V$ is the same as a left $\mathbb{C}[G]$-module. $G$ acts on $\mathbb{C}[G]$ by left multiplication, so $\mathbb{C}[G]$ is a left (and right) $G$-module; this is called the regular representation, and it has degree $|G|$. When $G$ was abelian, $\hat{G} = \operatorname{Hom}(G, \mathbb{C}^*)$. In general, we let $\hat{G}$ be the set of irreducible $G$-representations. These are completely determined by their characters. The number of these is the number of conjugacy classes in $G$. Suppose $\chi_1, \cdots, \chi_s$ are the irreducible characters, which have degree $d_1, \cdots, d_s$. Recall
$$\mathbb{C}[G] = \bigoplus_{i=1}^{s} \operatorname{End}(V_i)$$
which gives that $|G| = \sum_{i=1}^{s} d_i^2$.

Let $H \subset G$ be finite groups. We have a character $\chi : H \to \mathbb{C}$, which belongs to a representation $W$. We want an induced representation $V = \operatorname{Ind}_H^G W = \mathbb{C}[G] \otimes_{\mathbb{C}[H]} W$ of $G$, with character $\psi = \operatorname{Ind}_H^G \chi$. Let $R = \{\gamma\} \subset G$ (mod $H$) be a set of representatives from each coset $G/H$. For all $g \in G$ and $\gamma \in R$ there exist $\gamma' \in R$ and $h_{g,\gamma} \in H$ such that $g\gamma = \gamma' h_{g,\gamma}$. So $G$ acts on $\bigoplus_{\gamma \in R} \gamma W$, where $\gamma W \to \gamma' W$ sends $\gamma w \mapsto \gamma' h_{g,\gamma} w$. Recall $\chi$ was the character of the $H$-representation, and $\psi = \operatorname{Ind}_H^G \chi$ is the character of the induced $G$-representation. This can be defined as
$$\psi(u) = \sum_{\substack{r \in R \\ r^{-1}ur = h_{u,r} \in H}} \chi(r^{-1}ur) = \frac{1}{|H|} \sum_{\substack{x \in G \\ x^{-1}ux \in H}} \chi(x^{-1}ux)$$
Alternatively, we could just write this as $\sum_{r \in R} \chi(r^{-1}ur)$ where $\chi$ is extended to $G$ by $\chi|_{G \setminus H} = 0$.

Let $M \supset L \supset K$ be a sequence of fields, where $M \supset K$ is Galois with Galois group $G$, and $M \supset L$ is Galois with Galois group $H$. There is some prime $\mathfrak{p}$ of $K$ that splits as a series of $P_i$ in $L$, and a series of $Q_i$ in $M$. We have Artin L-functions $L(s, \chi; M/L)$ and $L(s, \psi, M/K)$.

Theorem 20.1. $L(s, \chi; M/L) = L(s, \psi, M/K)$

This is not that deep. We know that $L(s, \psi, M/K) = \prod_{\mathfrak{p}} L_{\mathfrak{p}}(s, \psi; M/K)$. Define $L_{\mathfrak{p}}(s, \chi, M/L) = \prod_{P \mid \mathfrak{p}} L_P(s, \chi, M/L)$ (where ramified primes are not counted with multiplicity). We will actually avoid ramified primes; they make the notation worse. So it suffices to prove

Proposition 20.2. $L_{\mathfrak{p}}(s, \chi, M/L) = L_{\mathfrak{p}}(s, \psi; M/K)$

Before proving this, we will do a special case.

Example 20.3 ($S_3$-example). $M \supset K$ has Galois group $S_3$, and $M \supset L$ is degree 2. So $L \supset K$ is a non-Galois cubic extension. Write $H = \{1, t\}$ and $G = \{1, t, r, rt, r^2, r^2 t\}$. Let $R = \{1, r, r^2\}$ be a representative system for $G/H$.

Let $\mathfrak{p}$ be a prime. Either it remains prime, it splits into two primes, or it splits into three primes. If it remains prime then the residue field extension is degree 3. In the second case, one degree is 1 and the other is 2; in the third case, all extensions are degree 1. In the first case, $P$ can't stay prime, because the group isn't cyclic. So $P$ splits into $Q$ and $Q'$. In the second case, the combined residue extensions have to be the same. So $P_1$ lifts to $Q_1$ (degree 2) and $P_2$ splits into $Q_2$ and $Q_3$, each extensions of degree 1. In the third case, you can't have each $P_i$ lift through a degree-2 extension to some $Q_i$: the decomposition group has to be contained in $H$; but they are all Galois conjugate. So each $P_i$ splits into $Q_i$ and $Q_i'$. So, determining the splitting of $\mathfrak{p}$ in $L$ determines the splitting in $M$.

Math 229

Barry Mazur

Lecture 20

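Let $\psi : H \to \mathbf{C}^*$ be the nontrivial character. We know what $\psi$ is on $H$: it sends $t \mapsto -1$. The induced character $\chi = \mathrm{Ind}_H^G \psi$ is
\[ \chi(u) = \sum_{i = 0,1,2} \psi(r^i u r^{-i}) \]
(with the convention that $\psi(x) = 0$ if $x \notin H$). $\chi(1) = 3$ because that's the degree of the induced representation ($[G : H] = 3$).
\[ \begin{array}{c|ccc} \text{conj. class} & 1 & t & r \\ \hline \chi & 3 & -1 & 0 \end{array} \]
Let $1$ denote the trivial representation, $\varepsilon$ denote the sign representation, and $\rho$ be the 2-dimensional representation (from the action of $S_3$ on the vertices of a triangle).
\[ \begin{array}{c|ccc} & 1 & t & r \\ \hline 1 & 1 & 1 & 1 \\ \varepsilon & 1 & -1 & 1 \\ \rho & 2 & 0 & -1 \end{array} \]
So $\chi = \varepsilon + \rho$. In this example, $L_{\mathfrak{p}}(s, \psi, M/L) = L_{\mathfrak{p}}(s, \varepsilon; M/K)\, L_{\mathfrak{p}}(s, \rho; M/K)$. Evaluating local factors (with the cases listed in the order of the Frobenius conjugacy classes $1$, $t$, $r$ in the table) gets
\[ \left( \frac{1}{1 - N\mathfrak{p}^{-s}} \right)^{3} \]
for the first case,
\[ \frac{1}{1 + N\mathfrak{p}^{-s}} \cdot \frac{1}{1 - N\mathfrak{p}^{-2s}} \]
for the second case, and
\[ \frac{1}{1 - N\mathfrak{p}^{-3s}} \]
for the third case.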
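The character computation in this example is small enough to check by machine. The following is a minimal sketch, assuming $S_3$ is realized as permutations of $\{0, 1, 2\}$ with $t$ a transposition and $r$ a 3-cycle; these concrete choices and all function names are illustrative and not taken from the lecture. It computes the character induced from the nontrivial character of $H = \{1, t\}$ and compares it with the sum of the sign character and the 2-dimensional character.

```python
from itertools import permutations

def compose(a, b):
    """Composition of permutations: apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))

def inverse(a):
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

# Illustrative realization of S3 (an assumption, not fixed by the notes).
G = list(permutations(range(3)))
e, t, r = (0, 1, 2), (1, 0, 2), (1, 2, 0)
H = (e, t)

def psi(x):
    """Nontrivial character of H, extended by 0 outside H."""
    if x == e:
        return 1
    if x == t:
        return -1
    return 0

def induced(u):
    """Induced character: (1/|H|) * sum over g in G of psi(g u g^-1)."""
    total = sum(psi(compose(compose(g, u), inverse(g))) for g in G)
    return total // len(H)

def sign(u):
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if u[i] > u[j])
    return -1 if inversions % 2 else 1

def standard(u):
    """Character of the 2-dimensional representation: fixed points minus 1."""
    return sum(1 for i in range(3) if u[i] == i) - 1

# Values of the induced character on the classes of 1, t, r.
values = [induced(u) for u in (e, t, r)]
decomposes = all(induced(u) == sign(u) + standard(u) for u in G)
```

Summing over all of $G$ and dividing by $|H|$ is equivalent to summing over the coset representatives $1, r, r^2$; it just avoids having to choose representatives in code.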

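Given the previous computation,
\[ \underbrace{L(s, \psi; M/L)}_{\text{degree } 1} = \underbrace{L(s, \varepsilon, M/K)}_{\text{degree } 1} \cdot \underbrace{L(s, \rho; M/K)}_{\text{degree } 2} \]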

If we have a general theory of analytic continuation and functional equations for degree-1 Artin L-functions, then $L(s, \rho; M/K)$ would be the ratio of two analytic functions, i.e. a meromorphic function. If you have a zero-free region for Artin L-functions (which we do), then you get a zero-free region for $L(s, \rho, M/K)$ (modulo a few poles, maybe). This is actually useful: you don't need entire-ness to get good estimates.

Theorem 20.4 (Brauer). Any character $\chi$ can be written $\chi = \sum_{\text{finite}} n_j \, \mathrm{Ind}_{H_j}^G \psi_j$ such that each $\psi_j : H_j \to \mathbf{C}^*$ has degree 1, and $n_j \in \mathbf{Z}$. Thus $L(s, \chi; M/K) = \prod_j L(s, \psi_j, M/L_j)^{n_j}$, where $L_j$ is the fixed field of $H_j$.

Combining this with the general theory, you have nice control over Artin L-functions: you can write them as the product / quotient of degree-1 L-functions, which are analytic.

Proof of proposition. $\mathfrak{p}$ splits into primes $P_1, \dots, P_\nu$ in $L$. For each of these, choose some $Q_i$ above $P_i$ in $M$. For each $j$ choose $\tau_j \in G$ such that $\tau_j Q_1 = Q_j$. Each $Q_j$ has a decomposition group $G_j$ (this is the Galois group of the residue field extension of $Q_j$ over $\mathfrak{p}$). These conjugate the way the $Q_i$'s do: $\tau_j G_1 \tau_j^{-1} = G_j$. Each $G_j$ contains Frobenius, which we call $\varphi_j$; and $\tau_j \varphi_1 \tau_j^{-1} = \varphi_j$. Let $f_j$ be the degree of the residue field extension of $Q_j$ over $P_j$ (for $M/L$),


and let $f'_j$ be the degree of the other residue field extension, $P_j$ over $\mathfrak{p}$. Let $f = f_j f'_j$ (the same for all $j$; it's the residue field extension degree of any of the $Q_j$ over $\mathfrak{p}$).
(1) $|G_j| = f$
(2) $|G_j \cap H| = f_j$
(3) $|G_j / (G_j \cap H)| = f'_j$
(4) $\{\sigma_{j,1}, \sigma_{j,2}, \dots, \sigma_{j,f'_j}\}$ is a representative system for $G_j$ modulo $G_j \cap H$.

Lemma 20.5. $\{\sigma_{j,k}\tau_j : k = 1, \dots, f'_j;\ j = 1, \dots, \nu\}$ is a representative system for $G/H$.

Proof. Homework.

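\[ \chi(u) = \sum_{j,k} \psi(\sigma_{j,k} \tau_j u \tau_j^{-1} \sigma_{j,k}^{-1}) \]

(Make use of the convention that $\psi(X) = 0$ if $X \notin H$.) Take the Frobenius element $\varphi_1$, and let $u = \varphi_1^m$ be an arbitrary element of $G_1 = \mathrm{Gal}(Q_1/\mathfrak{p})$. Write the Frobenius as the Artin symbol $\varphi_j = (Q_j, M/K)$. So $\varphi_j^{f'_j} = (Q_j; M/L) \in H$, and $\varphi_j^m \in H \iff f'_j \mid m$.

\[ \chi(\varphi_1^m) = \sum_{j=1}^{\nu} \sum_{k=1}^{f'_j} \psi(\sigma_{j,k} \tau_j \varphi_1^m \tau_j^{-1} \sigma_{j,k}^{-1}) = \sum_{j=1}^{\nu} \sum_{k=1}^{f'_j} \psi(\sigma_{j,k} \varphi_j^m \sigma_{j,k}^{-1}) = \sum_{j=1}^{\nu} f'_j \, \psi(\varphi_j^m) \]

where the last step uses that $\sigma_{j,k}$ and $\varphi_j^m$ both lie in $G_j$, an abelian group. Then

\begin{align*}
\log L_{\mathfrak{p}}(s, \chi; M/K) &= \sum_{m=1}^{\infty} \frac{\chi(\varphi_1^m)}{m \, N\mathfrak{p}^{ms}} = \sum_{m=1}^{\infty} \frac{\chi((Q_1, M/K)^m)}{m \, N\mathfrak{p}^{ms}} \\
&= \sum_{m=1}^{\infty} \sum_{j=1}^{\nu} \frac{f'_j \, \psi(\varphi_j^m)}{m \, N\mathfrak{p}^{ms}} \\
&= \sum_{j=1}^{\nu} \sum_{n=1}^{\infty} \frac{f'_j \, \psi((\varphi_j^{f'_j})^n)}{f'_j \, n \, (N\mathfrak{p})^{f'_j n s}} && \text{(only the terms with } m = f'_j n \text{ count, by definition)} \\
&= \sum_{j=1}^{\nu} \sum_{n=1}^{\infty} \frac{\psi((Q_j; M/L)^n)}{n \, (NP_j)^{ns}} = \log L_{\mathfrak{p}}(s, \psi, M/L)
\end{align*}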

21. April 12

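21.1. Artin L-functions and Dedekind $\zeta$-functions. Let $L/K$ be Galois with Galois group $G$, and choose $H \subset G$ as the trivial group. Then $L(s, 1, L/L) = \zeta_L(s)$ by definition; but we just proved this is equal to $L(s, \mathrm{Ind}_{\{1\}}^G 1, L/K)$. Write $\mathbf{C}$ as a $\mathbf{C}[H]$-module, where every element of $H$ acts trivially on $\mathbf{C}$. The induced representation is $\mathbf{C}[G] \otimes_{\mathbf{C}[H]} \mathbf{C} = \mathbf{C}[G]$. So the induced representation is the regular representation of $G$ acting on $\mathbf{C}[G] = \bigoplus_{i \text{ irred}} \mathrm{End}_{\mathbf{C}}(V_i)$; if each representation $V_i$ has dimension $d_i$ and character $\chi_i$, then $\mathrm{End}_{\mathbf{C}}(V_i) \cong V_i^{d_i}$.
\[ L(s, \mathrm{Ind}_{\{1\}}^G 1, L/K) = \prod_{i \text{ irred}} L(s, \chi_i; L/K)^{d_i} = \zeta_L(s) \]

(Clarification: we're thinking of $\mathbf{C}[G]$ acting on $\mathbf{C}[G] = \mathrm{Reg}$; also $|G| = \dim_{\mathbf{C}}(\mathrm{Reg})$. By more representation theory, write $\mathrm{Reg} = \bigoplus \mathrm{End}_{\mathbf{C}}(V_i) = \bigoplus M_{d_i \times d_i}(\mathbf{C})$. This is canonical. But $\mathrm{End}_{\mathbf{C}}(V_i) \cong V_i^{d_i}$ depends on choice of basis.)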
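For a concrete instance of this factorization, take $L = \mathbf{Q}(i)$ over $K = \mathbf{Q}$ (an illustrative choice, not an example from the lecture). Then $G$ has order 2, the regular representation is the trivial character plus the nontrivial character $\chi_4$ mod 4, and the identity $\zeta_L(s) = \zeta_{\mathbf{Q}}(s) L(s, \chi_4)$ says that the number of ideals of norm $n$ in $\mathbf{Z}[i]$ equals $\sum_{d \mid n} \chi_4(d)$. The sketch below checks this coefficient by coefficient, counting each ideal once via its generator $a + bi$ with $a > 0$, $b \ge 0$; the function names are hypothetical.

```python
from math import isqrt

def chi4(d):
    """Nontrivial Dirichlet character mod 4."""
    if d % 2 == 0:
        return 0
    return 1 if d % 4 == 1 else -1

def ideals_of_norm(n):
    """Count ideals of Z[i] of norm n via generators a+bi with a > 0, b >= 0."""
    count = 0
    for a in range(1, isqrt(n) + 1):
        b2 = n - a * a
        b = isqrt(b2)
        if b * b == b2:
            count += 1
    return count

def dirichlet_coefficient(n):
    """n-th coefficient of the Dirichlet series zeta(s) * L(s, chi4)."""
    return sum(chi4(d) for d in range(1, n + 1) if n % d == 0)

# Norms n <= 500 where the two counts disagree (the identity predicts none).
mismatches = [n for n in range(1, 501)
              if ideals_of_norm(n) != dirichlet_coefficient(n)]
```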

21.2. Signed divisors and $D$-ideal class groups. (Notation is made up by Mazur.) We have a number field $K/\mathbf{Q}$, with ring of integers $A$. A signed divisor is a pair $(D, S_\infty)$ (usually referred to as $D$), where $D$ is an ideal, and $S_\infty$ is a set of distinct real embeddings $K \to \mathbf{R}$. Define $I_D$ to be the group of fractional ideals of $A$ relatively prime to $D$: the denominators do not include primes of $D$. Let $\mathrm{Pr}_D$ be the set of principal $D$-ideals: the subgroup of ideals generated by nonzero elements $a \in K^*$ such that
(1) for all $P \mid D$, $a \in A_P$ ($A$ localized at $P$)
(2) $a \equiv 1 \pmod{D_P}$, where $D_P = D \cdot A_P$, for all $P \mid D$; alternatively, $a - 1$ has no copies of $P$ in the denominator
(3) $a$ has positive image under any embedding $K \to \mathbf{R}$ in $S_\infty$

Definition 21.1. $\mathrm{Cl}_D(K) = I_D / \mathrm{Pr}_D$ is the $D$-ideal class group.

If $\mathfrak{a}$ is an ideal, write $[\mathfrak{a}]$ to represent the corresponding element in the ideal class group. Let $\mathcal{J}_D$ be the set of ideals prime to $D$, and let $\mathcal{J}_D(c) = \{\mathfrak{a} \in \mathcal{J}_D : [\mathfrak{a}] = c\}$. Also define $\mathcal{J}_D(c, X) = \{\mathfrak{a} \in \mathcal{J}_D(c) : N_{K/\mathbf{Q}}\mathfrak{a} \le X\}$.

Theorem 21.2. There exists $\kappa = \kappa_D(K) > 0$ depending upon $K$ and $D$, but not $c$, such that
\[ |\mathcal{J}_D(c, X)| = \kappa X + O\left( X^{1 - \frac{1}{[K:\mathbf{Q}]}} \right) \]

It turns out that
\[ \kappa = \frac{2^{r_1} (2\pi)^{r_2} \cdot \mathrm{Regulator}}{|\text{roots of } 1| \cdot \sqrt{|\mathrm{Discriminant}|}} \]


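but it doesn't really matter what $\kappa$ is, and it simplifies the proof if you don't specify $\kappa$. We'll prove this. . . later? maybe not at all. If $K = \mathbf{Q}$, then $\kappa = 1$. If $K = \mathbf{Q}(i)$, ideals are points in the first quadrant of the circle of radius $\sqrt{X}$. Define
\[ \zeta_K^c(s) = \sum_{\mathfrak{a} \in \mathcal{J}_D,\ [\mathfrak{a}] = c} (N\mathfrak{a})^{-s} \]
Also define
\[ \zeta_K^{\{D\}}(s) = \sum_{\mathfrak{a} \in \mathcal{J}_D} (N\mathfrak{a})^{-s} = \sum_{c \in \mathrm{Cl}_D(K)} \zeta_K^c(s) \]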
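Also, write $\zeta_{\mathbf{Q}}$ for the Riemann $\zeta$ function.

The summatory function $A(X)$ of the coefficients of the Dirichlet series $\zeta_K^c(s) - \kappa \, \zeta_{\mathbf{Q}}(s)$ is $O(X^{1 - \frac{1}{[K:\mathbf{Q}]}})$. Going back to the old lim sup discussion, we can show that the abscissa of convergence is at most $1 - \frac{1}{[K:\mathbf{Q}]}$.

Corollary 21.3. $\zeta_K^c$ has meromorphic continuation to at least the half-plane $\sigma > 1 - \frac{1}{[K:\mathbf{Q}]}$, and has a pole at $s = 1$ with residue $= \kappa$.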

Corollary 21.4. The incomplete $\zeta$-function $\zeta_K^{\{D\}}(s)$ has the same meromorphic continuation, with residue $h_D(K) \cdot \kappa$ at $s = 1$, where $h_D(K) = |\mathrm{Cl}_D(K)|$.
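As a sanity check on the ideal-counting asymptotic of Theorem 21.2, here is a minimal sketch for $K = \mathbf{Q}(i)$ with trivial $D$. The assumptions, none of which are worked out in the lecture: $\mathbf{Z}[i]$ has class number 1, so there is a single class $c$; each nonzero ideal has exactly one generator $a + bi$ with $a > 0$, $b \ge 0$; and the constant from the formula above is $2\pi / (4 \cdot 2) = \pi/4$ (no real embeddings, one complex pair, regulator 1, four roots of unity, discriminant $-4$). The error term should then be $O(X^{1/2})$.

```python
from math import isqrt, pi, sqrt

def count_ideals_up_to(X):
    """Number of nonzero ideals of Z[i] with norm <= X.

    Counts lattice points (a, b) with a > 0, b >= 0, a^2 + b^2 <= X,
    i.e. one generator per ideal.
    """
    total = 0
    for a in range(1, isqrt(X) + 1):
        total += isqrt(X - a * a) + 1  # b ranges over 0..floor(sqrt(X - a^2))
    return total

for X in (100, 10_000, 1_000_000):
    n = count_ideals_up_to(X)
    # The ratio n / X should approach pi/4; the scaled error should stay bounded.
    print(X, n, n / X, abs(n - pi * X / 4) / sqrt(X))
```

The last printed column is the error divided by $X^{1/2}$, which is the quantity the theorem asserts is bounded in this case.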

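21.3. Ideal class L-functions. Let $D$ be a signed divisor. A $D$-ideal class character (of degree 1) is a homomorphism $\chi : I_D / \mathrm{Pr}_D \to \mathbf{C}^*$. Also view these as characters (also notated $\chi$) of $I_D$.
\[ L^{\{D\}}(s, \chi) = \sum_{c \in \mathrm{Cl}_D} \chi(c) \, \zeta^c(s) = \sum_{\mathfrak{a} \in \mathcal{J}_D} \chi(\mathfrak{a}) (N\mathfrak{a})^{-s} = \prod_{P \nmid D} \frac{1}{1 - \chi(P) \, NP^{-s}} \]
(the last expression is by unique factorization, just like the Riemann case).

Corollary 21.5. If $\chi$ is any nontrivial character, then $L^{\{D\}}(s, \chi)$ extends as an analytic function to the right-half plane bounded by $\sigma = 1 - \frac{1}{[K:\mathbf{Q}]}$.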

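Why? Each $\zeta^c(s)$ has a pole with exactly the same residue $R$ at $s = 1$. So the residue of $L^{\{D\}}(s, \chi)$ is $\sum_{c \in \mathrm{Cl}_D(K)} \chi(c) \cdot R$, which is $0$ because $\chi$ is nontrivial. You can also prove (but we won't) that $L^{\{D\}}(s, \chi)$ doesn't vanish on $\sigma = 1$. We also have the weaker result that $L^{\{D\}}(1, \chi) \ne 0$.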


21.4. Logs of Ideal class L-functions. The difference between $\zeta$-functions and L-functions is that L-functions have to have Euler products, but $\zeta$-functions may not. When we have a product, we want to take the log. Say that $f \sim g$ if $f - g$ is analytic around $s = 1$.

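\[ \log L^{\{D\}}(s, \chi) = \sum_{m=1}^{\infty} \sum_{P \nmid D} \frac{\chi(P)^m}{m \, (NP)^{ms}} = \sum_{P \nmid D} \frac{\chi(P)}{(NP)^s} + \text{Rest} \sim \sum_{P \nmid D} \frac{\chi(P)}{NP^s} \sim \sum_{\substack{P \nmid D \\ \deg(P) = 1}} \frac{\chi(P)}{NP^s} \]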

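($L^{\{D\}}(s, \chi)$ has an analytic continuation around $s = 1$.) Let $K$ be a field, $\mathcal{P}$ a collection of primes. Take
\[ \frac{\sum_{P \in \mathcal{P}} \frac{1}{NP^s}}{\sum_{P \text{ prime}} \frac{1}{NP^s}}, \]
where the denominator is $\sim \log \frac{1}{s-1}$. Then the limit of this as $s \to 1$ is the Dirichlet density.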

21.5. Relationship between $D$-ideal class L-functions and abelian Artin L-functions.

Theorem 21.6. Let $L/K$ be abelian, with Galois group $G$. There exists a signed divisor $D$ over $K$ and a unique surjective homomorphism $\mathrm{Artin} : \mathrm{Cl}_D(K) \to G$ that is completely described by the following: if $P \in I_D(K)$ is prime, and $[P]$ is its image in the class group, then $\mathrm{Artin}([P]) = \varphi_P \in G$ (the Frobenius element).

Corollary 21.7. If $\chi : G \to \mathbf{C}^*$ is a character, then $L^D(s, \chi \circ \mathrm{Artin}) = L^{\text{incomplete}}(s, \chi, L/K)$, where the incompleteness is because we've ignored ramified primes on the left.

22. April 17

22.1. Density. The Dedekind zeta function has a simple pole at $s = 1$, and is analytic elsewhere.
\[ \log \zeta_K(s) \sim \sum_{P \in \mathcal{P}_K} (NP)^{-s} = 1 \cdot \log \frac{1}{s-1} + h(s) \]
where $h(s)$ is analytic. (This comes from breaking up $\log \zeta_K$ as a sum over primes to various powers, and realizing that powers $> 1$ don't change much.) Then we took $A \subset \mathcal{P}_K$, and asked about $\sum_{P \in A} NP^{-s}$. We say that $A$ has strong analytic density $a$ if
\[ \sum_{P \in A} (NP)^{-s} = a \log \frac{1}{s-1} + h(s) \]
where $h$ is another function that is holomorphic around $s = 1$. Impose the relation $f(s) \sim g(s)$ if $f(s) = g(s) + h(s)$ where $h(s)$ is analytic around $s = 1$.

$A$ has natural density $a$ if
\[ a = \lim_{X \to \infty} \frac{\#\{P \in A : NP \le X\}}{\#\{P \in \mathcal{P}_K : NP \le X\}} \]

$A$ has Dirichlet density (or analytic density) $a$ if
\[ a = \lim_{s \to 1^+} \frac{\sum_{P \in A} NP^{-s}}{\log \frac{1}{s-1}} \]

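Let $K$ be a field, $P$ a prime over $p \in \mathbf{Q}$. Say that $P$ has degree 1 if $N_{K/\mathbf{Q}} P = p$.

Proposition 22.1. Let $A = \mathcal{P}_K^{(1)} \subset \mathcal{P}_K$ be the set of primes of $K$ of degree 1. Then the strong analytic density of $A$ is 1.

\[ \sum_{P \in \mathcal{P}_K} \frac{1}{NP^s} = \underbrace{\sum_p \sum_{P \text{ s.t. } NP = p} \frac{1}{p^s}}_{\text{degree } 1} + (\text{higher degree}) \le \sum_{P \in A} \frac{1}{NP^s} + [K : \mathbf{Q}] \sum_p \sum_{k \ge 2} \frac{1}{p^{ks}} \]
and the last double sum converges for $s > 1/2$, so it is analytic around $s = 1$.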

So, $A$ and $\mathcal{P}_K$ have the same density. Similarly, the primes of degree 1 over some other field $F$ have the same density.

22.2. The Artin Map. We did not prove:

Theorem 22.2. Let $L/K$ be an abelian extension with Galois group $G$. There exists a signed divisor $D$ of $K$ and a surjective homomorphism called the Artin map
\[ I_D(K) \to I_D(K)/\mathrm{Pr}_D(K) = \mathrm{Cl}_D(K) \xrightarrow{\ \mathrm{Artin}\ } G \]
such that the composition
\[ P \mapsto [P] \mapsto (P; L/K) = \mathrm{Frob}_P \]
produces the Frobenius element.

In particular, consider the composition $\chi_D : I_D \xrightarrow{\ \mathrm{Artin}\ } G \xrightarrow{\ \chi\ } \mathbf{C}^*$; clearly
\[ L(s, \chi_D, L/K) = L^{D\text{-incomplete}}(s, \chi, L/K) \]

Proposition 22.3 (Regularity). Let $\chi = \chi_D$ (i.e. a $D$-ideal class character) that is not principal. Then $L(1, \chi_D, K)$ is regular: that is, it is neither zero nor a pole.

This implies the same for abelian Artin L-functions. But we also get it for non-abelian Artin L-functions! We proved that Brauer's theorem implies the following: if $\chi$ is any non-abelian character, then $\chi = \sum_i n_i \, \mathrm{Ind}_{H_i}^G \chi_i$ where $n_i \in \mathbf{Z}$ and each $\chi_i$ is a degree-1 character of $H_i$. So
\[ L(s, \chi, L/K) = \prod_i \left( L(s, \chi_i, L/K_i) \right)^{n_i} \]
where $K_i$ is the fixed field of $H_i$.

Theorem 22.4. Let $L/K$ be an abelian extension. Let $c \in \mathrm{Gal}(L/K)$, and define $A_c = \{P \text{ prime of } K : \mathrm{Frob}_P = c\}$. Then
\[ \text{Strong analytic density of } A_c = \frac{1}{[L : K]} \]

Theorem 22.5 (General Chebotarev). Let $L/K$ be any Galois extension, and $C \subset G$ a conjugacy class. Let $A_C = \{P : \mathrm{Frob}_P = C\}$. Then
\[ \text{Strong analytic density of } A_C = \frac{|C|}{[L : K]} \]
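To see these densities numerically, here is a minimal sketch under assumptions that are not from the lecture: take $K = \mathbf{Q}$ and $L$ the splitting field of $x^3 - 2$, whose Galois group is $S_3$. For a prime $p \ne 2, 3$, the number of roots of $x^3 - 2$ mod $p$ identifies the Frobenius class: 3 roots for the identity (predicted density $1/6$), 1 root for the transpositions ($1/2$), 0 roots for the 3-cycles ($1/3$). The code tallies proportions among primes below a bound, which is a natural-density proxy for the analytic density in the theorem.

```python
from collections import Counter

def primes_below(n):
    """Sieve of Eratosthenes; returns all primes < n (assumes n >= 2)."""
    sieve = bytearray([1]) * n
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n, i)))
    return [i for i in range(n) if sieve[i]]

def root_count_distribution(bound):
    """For primes 5 <= p < bound, tally the number of roots of x^3 - 2 mod p."""
    tally = Counter()
    for p in primes_below(bound):
        if p in (2, 3):  # the ramified primes are skipped
            continue
        roots = sum(1 for x in range(p) if (x * x * x - 2) % p == 0)
        tally[roots] += 1
    return tally

tally = root_count_distribution(5000)
total = sum(tally.values())
for roots, predicted in ((3, 1 / 6), (1, 1 / 2), (0, 1 / 3)):
    print(roots, tally[roots] / total, predicted)
```

Brute-force root counting keeps the sketch dependency-free; it is only practical for small bounds.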

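Lemma 22.6. $G$ is a finite group, $C$ a single conjugacy class. Let $\delta_C$ be the characteristic function of $C$:
\[ \delta_C(x) = \begin{cases} 1 & x \in C \\ 0 & x \notin C \end{cases} \]
Then, for any $c \in C$,
\[ \delta_C = \frac{|C|}{|G|} \sum_{\chi \text{ irred}} \overline{\chi}(c) \, \chi \]

Abelian ideal class L-function theorem $\Rightarrow$ general. Write $\delta_C = \sum_{\chi \text{ irred}} a(\chi) \, \chi$. By orthonormality,
\[ a(\chi) = \frac{1}{|G|} \sum_{x \in G} \delta_C(x) \overline{\chi}(x) = \frac{1}{|G|} \sum_{x \in C} \overline{\chi}(x) = \frac{|C|}{|G|} \overline{\chi}(x) \]
for any $x \in C$.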
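The expansion of the characteristic function of a conjugacy class can be checked directly on a small group. A minimal sketch for $S_3$, using its character table (trivial, sign, and 2-dimensional characters on the classes of $1$, $t$, $r$ of sizes 1, 3, 2); the dictionary layout and names are illustrative. All character values are real here, so complex conjugation is omitted.

```python
from fractions import Fraction

# Character table of S3, indexed by conjugacy class (sizes 1, 3, 2).
class_size = {"1": 1, "t": 3, "r": 2}
characters = {
    "trivial":  {"1": 1, "t": 1,  "r": 1},
    "sign":     {"1": 1, "t": -1, "r": 1},
    "standard": {"1": 2, "t": 0,  "r": -1},
}
group_order = sum(class_size.values())

def expansion(C, x):
    """(|C|/|G|) * sum over irreducible chi of chi(C) * chi(x)."""
    s = sum(chi[C] * chi[x] for chi in characters.values())
    return Fraction(class_size[C], group_order) * s

# table[(C, x)] should be 1 when x lies in C and 0 otherwise.
table = {(C, x): expansion(C, x) for C in class_size for x in class_size}
```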

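Let $\mathrm{Frob}_P$ denote the conjugacy class of Frobenius at $P$ in $G$.
\[ \sum_{P \in A_C} NP^{-s} = \sum_{P \in \mathcal{P}_K} \delta_C(\mathrm{Frob}_P) \, NP^{-s} = \frac{|C|}{|G|} \sum_{P} NP^{-s} + \frac{|C|}{|G|} \sum_{\substack{\chi \text{ irred} \\ \chi \ne 1}} \overline{\chi}(c) \sum_{P} \chi(\mathrm{Frob}_P) \, NP^{-s} \]
because the principal character applied to $\mathrm{Frob}_P$ is just 1.

We want to show that the second term has no effect (is analytic), and in particular, that $\sum_P \chi(\mathrm{Frob}_P) NP^{-s} \sim 0$ for fixed nonprincipal $\chi$. This comes from the regularity theorem for nonabelian characters.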

Artin abelian theorem $\Rightarrow$ Artin general theorem. We had a Galois extension $L/K$ with Galois group $G$. Let $C \subset G$ be a conjugacy class, and let $g \in C$ (choice shouldn't matter!). Let $M$ be the fixed field under $g$; so $L/M$ has cyclic Galois group, generated by $g$. Let $p$ denote a prime in $\mathbf{Q}$, $\mathfrak{p}$ a prime in $K$, $P$ a prime in $M$, and $Q$ a prime in $L$. As before, norm always means norm down to $\mathbf{Q}$. $L/M$ is an abelian extension. Suppose we want $\mathrm{Frob}_P = g$. That implies that $P$ is inert in $L/M$. So there exists a unique $Q$ lying over $P$. Let $G_Q \subset G$ be the stabilizer of $Q$ in $G$. If $\mathrm{Frob}_P = g$ then $G_Q = \langle g \rangle$. So the notation $G_Q = G_P$ makes sense.


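When we compute analytic density, we only care about the degree-1 primes $\mathfrak{p}$. If $P$ lies over $\mathfrak{p}$ then it is also degree 1: its decomposition group is just $G_Q$. We're trying to compute $A_C$ in $\mathcal{P}_K$: the primes of $K$ whose Frobenius live in $C$. We want:
\[ \sum_{\substack{\deg \mathfrak{p} = 1 \\ \exists Q \mid \mathfrak{p},\ \mathrm{Frob}_Q = g}} (N\mathfrak{p})^{-s} \sim \frac{|C|}{|G|} \log \frac{1}{s-1} \]

The LHS is $\sum_p p^{-s} \, \#\{\mathfrak{p} \mid p : \exists Q \mid \mathfrak{p},\ \mathrm{Frob}_Q = g\}$. For every $Q$, there is a corresponding $P$. So this is
\[ \sum_p p^{-s} \, \#\{\mathfrak{p} \mid p : \exists P \mid \mathfrak{p},\ \mathrm{Frob}_P = g\} =: f(s) \]

We want to compare this with
\[ g(s) := \sum_p p^{-s} \, \#\{P \mid p : \deg P = 1,\ \mathrm{Frob}_P = g\} = \sum_{\deg P = 1,\ \mathrm{Frob}_P = g} (NP)^{-s} \]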

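By the abelian theorem (or, even the cyclic theorem), this is
\[ g(s) \sim \frac{1}{|\langle g \rangle|} \log \frac{1}{s-1} \]

Lemma 22.7.
\[ \# := \#\{P \mid \mathfrak{p},\ \deg P = 1,\ \mathrm{Frob}_P = g\} = \frac{|G|}{|\langle g \rangle| \cdot |C|} \]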

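This is the number of primes over $\mathfrak{p}$ that have Frobenius element equal to $g$. (We're assuming that $\# > 0$. . . this should end up in the theorem hypotheses. . . ) Note that if there is some $Q$ over $P$ with $\mathrm{Frob}_Q = g$, then that $Q$ is unique. So $\# = \#\{P \mid \mathfrak{p},\ \deg P = 1,\ \exists Q \mid P,\ \mathrm{Frob}_Q = g\}$. Let $\{Q, Q_1, \dots, Q_?\}$ be the primes over $\mathfrak{p}$. Every $Q_i$ is of the form $xQ$ for a coset $x$ in $G/G_Q$, and $\mathrm{Frob}_{xQ} = x g x^{-1}$. The number we want is the number of $x$ that commute with $g$; this is the centralizer in $G$ of $g$. Anything in $G_Q$ doesn't change anything; so we divide by $|G_Q|$:
\[ \# = \frac{|\mathrm{Centralizer}_G(g)|}{|G_Q|} \]
I claim that
\[ |\mathrm{Cent}_G(g)| = \frac{|G|}{\#\{\text{elts. conjugate to } g\}} = \frac{|G|}{|C|}. \]
Since $G_Q = \langle g \rangle$, this gives the lemma. Each $\mathfrak{p}$ counted in $f(s)$ is counted $\#$ times in $g(s)$, so $g(s) \sim \# \cdot f(s)$. So
\[ f(s) \sim \frac{|C|}{|G|} \log \frac{1}{s-1} \]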
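The group-theoretic counting used in Lemma 22.7 (the centralizer of $g$ has order $|G|/|C|$, and $\langle g \rangle$ sits inside it) can be checked exhaustively on a small group. A minimal sketch, assuming $G = S_4$ realized as permutations of $\{0, 1, 2, 3\}$; the choice of group and the helper names are illustrative only.

```python
from itertools import permutations

def compose(a, b):
    """Composition of permutations: apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))

def inverse(a):
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

def order(g):
    """Order of g, i.e. the size of the cyclic subgroup it generates."""
    identity = tuple(range(len(g)))
    k, x = 1, g
    while x != identity:
        x = compose(x, g)
        k += 1
    return k

G = list(permutations(range(4)))

def counts(g):
    """Return (|Cent_G(g)|, |conjugacy class of g|, order of g)."""
    centralizer = [x for x in G if compose(x, g) == compose(g, x)]
    conj_class = {compose(compose(x, g), inverse(x)) for x in G}
    return len(centralizer), len(conj_class), order(g)

results = {g: counts(g) for g in G}
# Orbit-stabilizer: |Cent(g)| * |C| = |G| for every g.
orbit_stabilizer_holds = all(c * k == len(G) for c, k, _ in results.values())
# The count |Cent(g)| / |<g>| from the lemma is always an integer.
lemma_count_integral = all(c % o == 0 for c, _, o in results.values())
```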