
Problem Solving in Math (Math 43900) Fall 2014

Weekly problems
Instructor: David Galvin

Contents
1 Week one (August 26) — A grab-bag 3
1.1 Week one problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Week one solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Week two (September 2) — Induction 10


2.1 Week two problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Week two solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3 Week three (September 9) — Pigeon-hole principle 20


3.1 Week three problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Week three solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Week four (September 16) — Binomial coefficients 25


4.1 Week four problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Week four solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5 Week five (September 23) — Inequalities 40


5.1 Week five problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2 Week five solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

6 Week six (September 30) — Polynomials 51


6.1 Week six problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.2 Week six solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

7 Week seven (October 7) — Recurrences 58


7.1 Week seven problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.2 Week seven solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

8 Week eight (October 14) — Another grab-bag 69


8.1 Week eight problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
8.2 Week eight solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

9 Week nine (October 28) — Modular arithmetic and number theory 75
9.1 Modular arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
9.2 Week nine problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
9.3 Week nine solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

10 Week ten (November 4) — Games 85


10.1 Week ten problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
10.2 Week ten solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

11 Week eleven (November 11) — Graphs 90


11.1 Week eleven problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
11.2 Week eleven solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

12 Week twelve (November 18) — Probability 98


12.1 Week twelve problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
12.2 Week twelve solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

13 Week thirteen (November 25) — One more grab-bag, and some writing
tips 109
13.1 Week thirteen problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
13.2 Week thirteen solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

1 Week one (August 26) — A grab-bag
Most weeks’ handouts will be a themed collection of problems — all involving inequalities of
one form or another, for example, or all involving linear algebra. The Putnam Competition
itself has no such theme, and so every so often we’ll have a handout that includes the realistic
challenge of figuring out what (possibly different) approach (or approaches) to take for each
problem. This introductory problem set is one such.
Look over the problems, pick out some that you feel good about, and tackle them! You’ll
do best if you engage your conscious brain fully on a single problem, rather than hopping
back-and-forth between problems every few minutes (but it’s also a good idea to read all the
problems before tackling one, to allow your subconscious brain to mull over the whole set).
Remember that problem solving is a full-contact sport: throw everything you know at the
problem you are tackling! Sometimes, the solution can come from an unexpected quarter.
The point of Math 43900 is to introduce you to (or reacquaint you with) a variety of
tricks and tools that frequently prove useful in solving competition puzzles; but
even before that process starts, there are lots of common-sense things that you can do to
make problem-solving fun, productive and rewarding. Whole books are devoted to these
strategies (such as Larson’s Problem solving through problems and Pólya’s How to solve it).
Without writing a book on the subject, here (adapted from a list by Ravi Vakil) are some
slogans to keep in mind when solving problems:
• Try small cases!

• Plug in small numbers!

• Do examples!

• Look for patterns!

• Draw pictures!

• Write lots!

• Talk it out!

• Choose good notation!

• Look for symmetry!

• Break into cases!

• Work backwards!

• Argue by contradiction!

• Consider extreme cases!

• Modify the problem!

• Make a generalization!

• Don’t give up after five minutes!

• Don’t be afraid of a little algebra!

• Take a break!

• Sleep on it!

• Ask questions!

And above all:

• Enjoy!

1.1 Week one problems
1. How many positive integers are there which, in their decimal representation, have
strictly decreasing digits?

2. For positive integers n, let the numbers c(n) be determined by the rules c(1) = 1,
   c(2n) = c(n), and c(2n + 1) = (−1)^n c(n). Find the value of

       \sum_{n=1}^{2013} c(n)c(n + 2).

3. Which number is bigger: e^π or π^e? (Note: of course, a calculator will reveal the
   answer in an instant! The challenge really is to make the determination without any
   numerical calculations.)

4. Recall that a regular icosahedron is a convex polyhedron having 12 vertices and 20 faces;
the faces are congruent equilateral triangles. On each face of a regular icosahedron is
written a nonnegative integer such that the sum of all 20 integers is 39. Show that
there are two faces that share a vertex and have the same integer written on them.

5. Let f be a non-constant polynomial with positive integer coefficients. Prove that if n


is a positive integer, then f (n) divides f (f (n) + 1) if and only if n = 1.

6. Let a_0 < a_1 < a_2 < . . . be an infinite sequence of positive integers. Prove that there exists
   a unique integer n ≥ 1 such that

       a_n < (a_0 + a_1 + a_2 + . . . + a_n)/n ≤ a_{n+1}.
1.2 Week one solutions
1. (Illinois mock Putnam 2005, exam 1 Q3) There are 1023 such numbers. Justification:
   each non-empty subset of {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} corresponds to a unique decimal num-
   ber with decreasing digits (just list the elements of the subset in decreasing order), and
   conversely each decimal number with decreasing digits yields a non-empty subset of
   {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} (just take the elements of the subset to be the digits). So there
   are exactly as many decimal numbers with decreasing digits as non-empty subsets of
   {9, 8, 7, 6, 5, 4, 3, 2, 1, 0}; this number is 2^10 − 1 = 1023 (2^10 counts the number of subsets:
   each of the 10 elements can be either in or out, independently; the −1 removes the
   empty set). (Note: if one doesn't count 0 as a positive number, then the answer
   becomes 1022.)
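A quick brute-force check of this count (a Python sketch, not part of the original solution; it simply enumerates the subsets described above):

    # Enumerate non-empty subsets of {0,...,9}, write each in decreasing digit order,
    # and count the resulting positive integers.
    from itertools import combinations

    numbers = set()
    for r in range(1, 11):
        for c in combinations(range(10), r):
            n = int("".join(str(d) for d in sorted(c, reverse=True)))
            if n > 0:
                numbers.add(n)
    print(len(numbers))  # 1022 positive integers (1023 if 0 is also counted)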
2. (Putnam 2013 B1) Drawing up a table of values of c(n) for 1 ≤ n ≤ 16, it seems that
   for every odd k ≥ 1 we have

       \sum_{n=1}^{k} c(n)c(n + 2) = −1.

   It seems natural to try to prove this by induction on k. The base case k = 1 is easy.
   For odd k ≥ 3, we have (by induction)

       \sum_{n=1}^{k} c(n)c(n + 2) = ( \sum_{n=1}^{k−2} c(n)c(n + 2) ) + c(k − 1)c(k + 1) + c(k)c(k + 2)
                                   = −1 + c(k − 1)c(k + 1) + c(k)c(k + 2).

   So we would be done if we could show that for each even ℓ ≥ 2, we have

       c(ℓ)c(ℓ + 2) + c(ℓ + 1)c(ℓ + 3) = 0.

   But this comes straight out of the recursive definition:

       c(ℓ)c(ℓ + 2) + c(ℓ + 1)c(ℓ + 3) = c(ℓ/2)c(ℓ/2 + 1) + (−1)^{ℓ/2} c(ℓ/2) (−1)^{ℓ/2+1} c(ℓ/2 + 1)
                                       = c(ℓ/2)c(ℓ/2 + 1) (1 + (−1)^{ℓ/2+ℓ/2+1})
                                       = c(ℓ/2)c(ℓ/2 + 1) (1 + (−1)^{ℓ+1})
                                       = c(ℓ/2)c(ℓ/2 + 1) (1 + (−1))
                                       = 0.

   In particular, taking k = 2013, the value of the sum in the problem is −1.
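The pattern (and the final answer) is easy to confirm by computing c(n) directly from the recursive definition; here is a small Python sketch (not part of the original solution):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def c(n):
        if n == 1:
            return 1
        if n % 2 == 0:
            return c(n // 2)
        m = (n - 1) // 2              # n = 2m + 1
        return (-1) ** m * c(m)

    print(sum(c(n) * c(n + 2) for n in range(1, 2014)))  # prints -1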

3. (Illinois mock Putnam 2005, exam 1 Q1) Consider the function f(x) = x^{1/x}, which is
   continuous and differentiable on (0, ∞). We have

       f′(x) = (1 − ln x) x^{1/x} / x^2.

   For 0 < x < e we have f′(x) > 0, and for e < x < ∞ we have f′(x) < 0, so f(x) has a unique
   maximum value at x = e. It follows that for any x ≠ e (in the range (0, ∞)),

       e^{1/e} > x^{1/x},

   or, raising both sides to the power ex,

       e^x > x^e.

   Since π ≠ e, it follows that

       e^π > π^e.

4. (Putnam 2013 A1) It would be good to know some more facts about the regular
icosahedron, specifically: how many edges does it have (let’s say e) and how many
edges meet at each vertex (let’s say x). To calculate e, we use Euler’s formula: for any
polyhedron with v vertices, f faces and e edges, v − e + f = 2. This yields e = 30. To
calculate x, consider the following process: if we sum over each vertex the number of
edges meeting that vertex, we get 12x. But in this sum, each edge gets counted twice,
so 12x = 60, and x = 5.
Now imagine writing non-negative integers on the faces, with at each vertex the five
faces that share that vertex getting different numbers. For each vertex, if we sum the
numbers on the faces sharing that vertex, we get at least 0 + 1 + 2 + 3 + 4 = 10. Thus
if we sum the answer we get at each vertex, we get a total of at least 10 × 12 = 120. In
this grand sum, each face-number is counted exactly three times (once for each vertex
of the face), so that means that the sum of the numbers used is at least 120/3 = 40.
Conclusion: if the sum of the numbers is 39, there must be two faces that share a
vertex and have the same numbers written on them.

5. (Putnam 2007 B1) Write f(x) as a_0 + a_1 x + . . . + a_k x^k with each a_i > 0 and with k ≥ 1.
   We have

       f(f(n) + 1) = a_0 + a_1 (f(n) + 1) + . . . + a_k (f(n) + 1)^k
                   = a_0 + a_1 + . . . + a_k + K f(n)
                   = f(1) + K f(n),

   where K is an integer. Here we are using the fact that for each i ≥ 1,

       (f(n) + 1)^i = 1 + \binom{i}{1} f(n) + . . . + \binom{i}{i} f(n)^i
                    = 1 + f(n) ( \binom{i}{1} + . . . + \binom{i}{i} f(n)^{i−1} ).

   We conclude that f(n) divides f(f(n) + 1) if and only if f(n) divides f(1) + Kf(n),
   which happens if and only if f(n) divides f(1). This certainly happens when n = 1; but
   for n > 1, we have f(n) > f(1) > 0 (here we use both the fact that f is non-constant
   and that it has positive integer coefficients), and so we cannot have f(n) dividing f(1).
   So it only happens at n = 1.

6. (International Mathematical Olympiad 2014 Q1; for an extensive discussion of this
   problem, with lots of different solutions, see the weblog of Fields medalist Tim Gowers,
   http://gowers.wordpress.com/2014/07/19/mini-monomath/) We begin by showing
   that there must be at least one n ≥ 1 such that

       a_n < (a_0 + a_1 + a_2 + . . . + a_n)/n ≤ a_{n+1}

   holds. Indeed, suppose not; then for each n ≥ 1, we have either

       f(n) ≤ a_n    or    a_{n+1} < f(n)

   (but not both, since a_n < a_{n+1}), where for convenience of typography we have set

       f(n) = (a_0 + a_1 + a_2 + . . . + a_n)/n.

   We prove, by induction on n, that it's the second of these that must occur, i.e., that
   a_{n+1} < f(n).
   The base case n = 1 is clear: we cannot have f(1) ≤ a_1, since this is the same as
   a_0 + a_1 ≤ a_1, and we know that a_0 > 0; so we must have a_2 < f(1). [Recall that we
   have made the assumption that there is no n ≥ 1 for which a_n < f(n) ≤ a_{n+1}.] For n > 1,
   the inductive hypothesis tells us that

       a_n < f(n − 1).

   Recalling the definition of f(n − 1), multiplying across by n − 1, adding a_n to both
   sides, and dividing by n, this is the same as

       a_n < f(n);

   so we can't have f(n) ≤ a_n, and so must have a_{n+1} < f(n). This completes the
   induction.
   Summary so far: under the assumption that there is no n ≥ 1 for which a_n < f(n) ≤
   a_{n+1}, we have concluded that a_{n+1} < f(n) for all n ≥ 1.
   We now claim that it follows from this (“this” being a_{n+1} < f(n) for all n ≥ 1) that

       a_{n+1} < a_0 + a_1

   for all n ≥ 1. Again, we proceed by (this time strong) induction on n, with the case n = 1
   immediate from a_2 < f(1) = a_0 + a_1. For n > 1 we have

       a_{n+1} < f(n)
               = (a_0 + a_1 + a_2 + . . . + a_n)/n
               < (a_0 + a_1 + (a_0 + a_1) + . . . + (a_0 + a_1))/n
               = n(a_0 + a_1)/n
               = a_0 + a_1,

   where in the second inequality we have used the induction hypothesis to bound a_k <
   a_0 + a_1 for each of k = 2, . . . , n. This completes the induction.
   Summary so far: under the assumption that there is no n ≥ 1 for which a_n < f(n) ≤
   a_{n+1}, we have concluded that a_{n+1} < a_0 + a_1 for all n ≥ 1. But this is a contradiction:
   since the a_i's are positive integers and increasing, there must be some n ≥ 1 for which
   a_{n+1} > a_0 + a_1.
Interim conclusion: there is at least one n ≥ 1 for which an < f (n) ≤ an+1 .
Now we need to show that there is a unique such n. To do this, suppose that n is
such that an < f (n) ≤ an+1 , and that m > n. We will prove (by induction on m)
that f (m) ≤ am . If we can do this, we will have shown that there can be at most one
n such that an < f (n) ≤ an+1 , and combined with our interim conclusion, we would
have a unique such n.
The base case for the induction is m = n + 1; f (n + 1) ≤ an+1 is just a simple
rearrangement of f (n) ≤ an+1 . We now move onto the induction step. For m > n + 1,
we get to assume f (m−1) ≤ am−1 . Combining this with am−1 < am , we get f (m−1) ≤
am , and a simple rearrangement of this leads to f (m) ≤ am , completing the induction.

2 Week two (September 2) — Induction
Suppose that P (n) is an assertion about the natural number n. Induction is essentially the
following: if there is some a for which P (a) is true, and if for all n ≥ a we have that the truth
of P (n) implies the truth of P (n + 1), then we can conclude that P (n) is true for all n ≥ a.
This should be fairly obvious: knowing P (a) and the implication “P (n) =⇒ P (n + 1)”
at n = a, we immediately deduce P (a + 1). But now knowing P (a + 1), the implication
“P (n) =⇒ P (n + 1)” at n = a + 1 allows us to deduce P (a + 2); and so on. Induction is the
mathematical tool that makes the “and so on” above rigorous. Induction works because
of the fundamental fact that a non-empty subset of the natural numbers must have a least
element. To see why this allows induction to work, suppose that we know that P(a) is true
for some a, and that we can argue that for all n ≥ a, the truth of P(n) implies the truth
of P(n + 1). Suppose now (for a contradiction) that there are some n ≥ a for which P(n)
is not true. Let F = {n : n ≥ a, P(n) not true}. By assumption F is non-empty, so has a
least element, n_0 say. We know n_0 ≠ a, since P(a) is true; so n_0 ≥ a + 1. That means that
n_0 − 1 ≥ a, and since n_0 − 1 ∉ F (if it were, n_0 would not be the least element) we know
P(n_0 − 1) is true. But then, by assumption, P((n_0 − 1) + 1) = P(n_0) is true, a contradiction!
Example: Prove that a set of size n ≥ 1 has 2^n subsets (including the empty set and the
set itself).
Solution: Let P(n) be the statement “a set of size n has 2^n subsets”. We prove that P(n)
is true for all n ≥ 1 by induction. We first establish a base case. When n = 1, the generic
set under consideration is {x}, which has 2 = 2^1 subsets ({x} and ∅); so P(1) is true.
Next we establish the inductive step. Suppose that for some n ≥ 1, P(n) is true. Consider
P(n + 1). The generic set under consideration now is {x_1, . . . , x_n, x_{n+1}}. We can construct
a subset of {x_1, . . . , x_n, x_{n+1}} by first forming a subset of {x_1, . . . , x_n}, and then either
adding the element x_{n+1} to this subset, or not. This tells us that the number of subsets of
{x_1, . . . , x_n, x_{n+1}} is 2 times the number of subsets of {x_1, . . . , x_n}. Since P(n) is assumed
true, we know that {x_1, . . . , x_n} has 2^n subsets (this step is usually referred to as applying
the inductive hypothesis); so {x_1, . . . , x_n, x_{n+1}} has 2 × 2^n = 2^{n+1} subsets. This shows that
the truth of P(n) implies that of P(n + 1), and the proof by induction is complete.

Strong Induction
Induction is a great tool because it gives you somewhere to start from in an argument. And
sometimes, the more you start with, the further you’ll go. That’s why the principle of Strong
Induction is worth keeping in mind: if there is some a for which P (a) is true, and if for each
n > a we have that the truth of P (m) for all m, a ≤ m < n, implies the truth of P (n), then
we can conclude that P (n) is true for all n ≥ a.
The proof that this works is almost the same as the proof that induction works. What’s
good about strong induction is that when you are at the part of the argument where you
have to deduce the truth of P(n + 1) from assumptions about earlier assertions, you
now have a lot more to work with: each of P(a), P(a + 1), . . . , P(n), rather than just
P(n) alone. Sometimes this is helpful, and sometimes it’s absolutely necessary.

Example: Prove that every integer n ≥ 2 can be written as n = p1 . . . p` where the pi ’s are
(not necessarily distinct) prime numbers.
Solution: Let P (n) be the statement: “n can be written as n = p1 . . . p` where the pi ’s are
(not necessarily distinct) prime numbers”. We’ll prove that P (n) is true for all n ≥ 2 by
strong induction.
P (2) is true, since 2 = 2 works.
Now consider P(n) for some n > 2. We want to show how the (simultaneous) truth of
P(2), . . . , P(n − 1) implies the truth of P(n). If n is prime, then n = n works to show that
P(n) holds. If n is not a prime, then it’s composite, so n = ab for some numbers a, b with
2 ≤ a < n and 2 ≤ b < n. We’re allowed to assume that P(a) and P(b) are true, that
is, that a = p_1 . . . p_ℓ where the p_i’s are (not necessarily distinct) prime numbers, and that
b = q_1 . . . q_m where the q_i’s are (not necessarily distinct) prime numbers. It follows that

    n = ab = p_1 . . . p_ℓ q_1 . . . q_m.

This is a product of (not necessarily distinct) prime numbers, and so P (n) is true.
So, by strong induction, we conclude that P (n) is true for all n ≥ 2.
Notice that we would have gotten exactly nowhere with this argument if, in trying to
prove P (n), all we had been allowed to assume was P (n − 1).

Recurrences
Sometimes we are either given a sequence of numbers via a recurrence relation, or we can
argue that there is such a relation governing the evolution of the sequence. A sequence
(bn )n≥a is defined via a recurrence relation if some initial values, ba , ba+1 , . . . , bk say, are
given, and then a rule is given that allows, for each n > k, bn to be computed as long as we
know the values ba , ba+1 , . . . , bn−1 .
Sequences defined by a recurrence relation, and proofs by induction, go hand-in-glove.
We may have more to say about recurrence relations later in the semester, but for now, we’ll
confine ourselves to an illustrative example.
Example: Let an be the number of different ways of covering a 1 by n strip with 1 by 1
and 1 by 3 tiles. Prove that an < (1.5)n .
Solution: We start by figuring out how to calculate an via a recurrence. Some initial values
of an are easy to compute: for example, a1 = 1, a2 = 1 and a3 = 2. For n ≥ 4, we can
tile the 1 by n strip EITHER by first tiling the initial 1 by 1 strip with a 1 by 1 tile, and
then finishing by tiling the remaining 1 by n − 1 strip in any of the an−1 admissible ways;
OR by first tiling the initial 1 by 3 strip with a 1 by 3 tile, and then finishing by tiling the
remaining 1 by n − 3 strip in any of the an−3 admissible ways. It follows that for n ≥ 4 we
have a_n = a_{n−1} + a_{n−3}. So a_n (for n ≥ 1) is determined by the recurrence

    a_n = 1 if n = 1,    1 if n = 2,    2 if n = 3,    and    a_{n−1} + a_{n−3} if n ≥ 4.

Notice that this gives us enough information to calculate a_n for all n ≥ 1: for example,
a_4 = a_3 + a_1 = 3, a_5 = a_4 + a_2 = 4, and a_6 = a_5 + a_3 = 6.
Now we prove, by strong induction, that a_n < (1.5)^n. That a_1 = 1 < (1.5)^1, a_2 = 1 < (1.5)^2
and a_3 = 2 < (1.5)^3 is obvious. For n ≥ 4, we have

    a_n = a_{n−1} + a_{n−3}
        < (1.5)^{n−1} + (1.5)^{n−3}
        = (1.5)^n ((2/3) + (2/3)^3)
        = (1.5)^n (26/27)
        < (1.5)^n

(the second line using the inductive hypothesis) and we are done by induction.
Notice that we really needed strong induction here, and we really needed all three of the
base cases n = 1, 2, 3 (think about what would happen if we tried to use regular induction,
or what would happen if we only verified n = 1 as a base case); notice also that an induction
argument can be written quite concisely, while still being fully correct, without fussing too
much about “P (n)”.
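If you want to experiment with such recurrences, a few lines of code generate as many values as you like; here is a Python sketch (the function name is just for illustration) that computes a_n and checks the bound a_n < (1.5)^n for small n:

    def tiling_counts(n_max):
        # a[1], a[2], a[3] = 1, 1, 2; a[n] = a[n-1] + a[n-3] for n >= 4
        a = [None, 1, 1, 2]
        for n in range(4, n_max + 1):
            a.append(a[n - 1] + a[n - 3])
        return a

    a = tiling_counts(20)
    assert all(a[n] < 1.5 ** n for n in range(1, 21))
    print(a[1:11])  # [1, 1, 2, 3, 4, 6, 9, 13, 19, 28]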

Induction beyond N
Induction can work to prove families of statements indexed other than N. Rather than giving
a general statement, we present an illustrative example. Suppose we have a doubly-infinite
sequence (a_{n,k})_{n,k=0}^{∞} that is defined by the rules

• a0,0 = 1;

• an,0 = 1 for all n > 0;

• a0,k = 0 for all k > 0; and

• an,k = an−1,k + an−1,k−1 for n, k > 0.

(Convince yourself that this does uniquely define a_{n,k} for any n, k ≥ 0.) Checking numerous
small values suggests that for all n, k ≥ 0,

    a_{n,k} = n!/(k!(n − k)!),

where n! is defined to be n × (n − 1) × . . . × 2 × 1 if n ≥ 1, with 0! = 1 and n! = 0 if n < 0
(this is the factorial function, about which we will have a lot to say later). We prove this by
induction; but it can’t be quite the same induction as before, since the family of statements
“a_{n,k} = n!/(k!(n − k)!)” is indexed not by N but by N × N. One way to deal with this is to
introduce a new parameter ℓ = n + k, and give a proof by induction on ℓ of the statement
that a_{n,k} = n!/(k!(n − k)!) whenever n + k = ℓ.

Base case: when ℓ = 0, we only have to verify a_{0,0} = 0!/(0!0!), which follows from the
first rule defining a_{n,k}, and from the definition of 0!.
Inductive step: For ℓ > 0, we have to verify three things:

• a_{ℓ,0} = ℓ!/(0!ℓ!): this follows from the second rule defining a_{n,k}, and from the definition
of 0!;

• a_{0,ℓ} = 0!/(ℓ!(−ℓ)!): this follows from the third rule defining a_{n,k}, and from the definition
of 0!, and n! for negative n; and

• a_{n,k} = n!/(k!(n − k)!) when 0 < n < ℓ, 0 < k < ℓ, and n + k = ℓ. For this, we use the
fourth rule defining a_{n,k}:

    a_{n,k} = a_{n−1,k} + a_{n−1,k−1}.

We can use (strong) induction to express a_{n−1,k} and a_{n−1,k−1} in terms of factorials,
since we have both (n − 1) + k < ℓ and (n − 1) + (k − 1) < ℓ, to conclude

    a_{n,k} = (n − 1)!/(k!((n − 1) − k)!) + (n − 1)!/((k − 1)!((n − 1) − (k − 1))!).

After a little algebra, this simplifies to

    a_{n,k} = n!/(k!(n − k)!).

This completes the proof by induction.
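Checking numerous small values, as suggested above, is itself a mechanical task; here is a short Python sketch (an illustration, not part of the handout's argument) that builds a_{n,k} from the four rules and compares it with n!/(k!(n − k)!):

    from math import comb  # comb(n, k) = n!/(k!(n-k)!), and 0 when k > n

    N = 10
    a = [[0] * (N + 1) for _ in range(N + 1)]   # a[0][k] = 0 for k > 0
    a[0][0] = 1
    for n in range(1, N + 1):
        a[n][0] = 1
    for n in range(1, N + 1):
        for k in range(1, N + 1):
            a[n][k] = a[n - 1][k] + a[n - 1][k - 1]

    assert all(a[n][k] == comb(n, k) for n in range(N + 1) for k in range(N + 1))
    print("a[n][k] agrees with n choose k for all n, k <= 10")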

2.1 Week two problems
1. Prove that the following pair of inequalities is valid for all n ≥ 1:

       2(√(n + 1) − 1) ≤ 1 + 1/√2 + 1/√3 + . . . + 1/√n ≤ 2√n.

2. Write the numbers 1, 1 on the board. Insert the sum of the numbers in between, to
get 1, 2, 1. In between each pair of numbers, again insert their sum, to get 1, 3, 2, 3,
1. Repeat, to get 1, 4, 3, 5, 2, 5, 3, 4, 1. After this operation has been performed 100
times, what is the sum of the numbers written on the board?

3. You are given a 64 by 64 chessboard, and 1365 L-shaped tiles (2 by 2 tiles with one
square removed). One of the squares of the chessboard is painted purple. Is it possible
to tile the chessboard using the given tiles, leaving only the purple square exposed?

4. Prove that for all n ≥ 2, it is possible to write n! − 1 as the sum of n − 1 numbers,


each of which is a divisor of n!.

5. Find (with proof!) all sequences (a_n)_{n≥0} of positive real numbers for which

       (a_1 + 2a_2 + 3a_3 + . . . + k a_k)/(a_1 + a_2 + . . . + a_k) = (k + 1)/2

   for all k ≥ 1.

6. The numbers 1 through 2n are partitioned into two sets A and B of size n, in an
   arbitrary manner. The elements a_1, . . . , a_n of A are sorted in increasing order, that is,
   a_1 < a_2 < . . . < a_n, while the elements b_1, . . . , b_n of B are sorted in decreasing order,
   that is, b_1 > b_2 > . . . > b_n. Find (with proof!) the value of the sum

       \sum_{i=1}^{n} |a_i − b_i|.

2.2 Week two solutions
1. (NYU Putnam preparation class) By induction on n ≥ 1; the base case n = 1 is easy. For
   n ≥ 2, we have

       1 + 1/√2 + 1/√3 + . . . + 1/√n = (1 + 1/√2 + 1/√3 + . . . + 1/√(n−1)) + 1/√n
                                       ≤ 2√(n−1) + 1/√n,

   the inequality using the inductive hypothesis. It follows that

       1 + 1/√2 + 1/√3 + . . . + 1/√n ≤ 2√n

   is implied by

       2√(n−1) + 1/√n ≤ 2√n,

   which can be verified (for n ≥ 2) by simple algebra (easiest route: multiply across
   by √n, subtract 1 from both sides, square both sides and simplify). For the other
   inequality, we have

       1 + 1/√2 + 1/√3 + . . . + 1/√n = (1 + 1/√2 + 1/√3 + . . . + 1/√(n−1)) + 1/√n
                                       ≥ 2(√((n−1) + 1) − 1) + 1/√n,

   the inequality using the inductive hypothesis. It follows that

       2(√(n + 1) − 1) ≤ 1 + 1/√2 + 1/√3 + . . . + 1/√n

   is implied by

       2(√((n−1) + 1) − 1) + 1/√n ≥ 2(√(n + 1) − 1),

   which again can be verified (for n ≥ 2) by simple algebra (easiest route: add 2 to
   both sides, multiply across by √n, square both sides and simplify). This completes
   the proof by induction.
   There's an alternate approach, based on Riemann sums: consider the integral

       \int_0^n dx/√x = 2√n.

   The integral measures the area under y = 1/√x, above the x-axis, and between x = 0
   and x = n. A smaller area is covered by the n rectangles (in R^2)

       [0, 1] × [0, 1/√1],  [1, 2] × [0, 1/√2],  . . . ,  [n − 1, n] × [0, 1/√n]

   (this is because y = 1/√x is decreasing on (0, n]; a picture will show instantly what's
   going on). Comparing the sum of the areas of the rectangles to the value of the integral,
   we immediately get

       1 + 1/√2 + 1/√3 + . . . + 1/√n ≤ 2√n.

   The other inequality follows similarly on consideration of \int_1^{n+1} dx/√x.
2. (UMass Putnam preparation class) Let f(n) be the sum after the nth operation. We
   have f(0) = 2, f(1) = 4, f(2) = 10, f(3) = 28, f(4) = 82. A clear pattern emerges:
   for n ≥ 0, f(n) = 3^n + 1. We prove this statement by induction on n. The small-case
   calculations work as base cases, but in fact we should be able to make the argument
   work verifying only the base case n = 0. So, assume that n > 0 and that we know
   f(n − 1) = 3^{n−1} + 1.
   Here's a sketch of the inductive step: after the nth iteration of the operation, if we
   look at every second term in the string, starting from the first, we get a perfect copy
   of the string we had just before the nth iteration; call this substring A. This gives
   a contribution of f(n − 1) to the sum tallied by f(n). Each of the remaining terms
   is calculated by summing its neighbours to the right and left, and so in the sum
   of the remaining terms, each term from substring A is counted twice, except the first
   and last terms, which are both 1 and are each only counted once. So the sum of
   the remaining terms is two times the sum of the terms of substring A, less 2; i.e., it's
   2f(n−1) − 2. Hence the sum of the terms in the whole string is f(n−1) + 2f(n−1) − 2 =
   3(3^{n−1} + 1) − 2 = 3^n + 1 (the first equality by the inductive hypothesis). This completes
   the proof by induction of the statement that f(n) = 3^n + 1.
   It remains to answer the question as asked: after the operation has been performed
   100 times, the sum of the numbers written on the board is f(100) = 3^{100} + 1.
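The inductive step can also be checked by simulating the operation directly; here is a Python sketch (not part of the original solution) that performs the insertion a few times and compares the sum with 3^n + 1:

    row = [1, 1]
    for n in range(1, 8):
        new_row = [row[0]]
        for left, right in zip(row, row[1:]):
            new_row += [left + right, right]   # insert the sum, then keep the old term
        row = new_row
        assert sum(row) == 3 ** n + 1
    print("after n operations the sum is 3**n + 1, for n = 1,...,7")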
3. (Folklore; I learned this problem from Andrew Thomason) We prove a more general
   statement by induction on n: given a 2^n by 2^n chessboard with one of the squares of
   the chessboard painted purple, and the right number of L-shaped tiles, it is possible
   to tile the chessboard using the given tiles, leaving only the purple square exposed.
   The base case n = 0 is trivial. For n > 0, divide the chessboard into 4 congruent
   2^{n−1} by 2^{n−1} chessboards in the obvious way (with vertical and horizontal dividing lines
   down and across the middle). One of these four sub-boards has the purple square;
   by induction we can tile that sub-board, leaving only the purple square exposed. For
   the remaining three sub-boards, put pink paint on the square of the sub-board that
   touches the dead center of the original board. By induction we can tile each of these
   sub-boards, leaving only the pink squares exposed. But now notice that the three pink
   squares form an L, so one last tile can be used to cover the pink squares. This completes
   the induction; the problem we were asked follows from the n = 6 case.
4. (Northwestern Putnam preparation class) This is a problem that seems impossible at
first; where could one find such precisely regulated numbers? But once one starts an
induction proof, it just falls out. Start with n = 2: 2! − 1 = 1, which can be written
as the sum of 2 − 1 = 1 number, which is a divisor of 2! = 2.

For n > 2 we have
n! − 1 = n[(n − 1)! − 1] + (n − 1)
= n[k1 + k2 + . . . + kn−2 ] + (n − 1),
where each ki is a divisor of (n − 1)!. (Here we are using the inductive hypothesis:
re-expressing n! − 1 in terms of (n − 1)! − 1, and using the fact that we are assuming
the existence of a good decomposition of (n − 1)! − 1). Now we have
n! − 1 = nk1 + nk2 + . . . + nkn−2 + (n − 1),
so we have written n! − 1 as the sum of n − 1 numbers, each of which divides n! (nki
divides n! because ki divides (n − 1)!, and n − 1 clearly divides n!).
Done by induction!

   Alternate solution, due to Justin Skycak: The question doesn't say that the divisors
   have to be positive, so the following scheme works: for n = 2, 1 = 1; for n = 3, 5 = 3 + 2;
   for n = 4, 23 = 12 + 8 + 3; then for odd n ≥ 5 take the n − 1 numbers n!, −1, and (−1)^i
   for i = 1, . . . , n − 3:

       n! − 1 = n! − 1 + \sum_{i=1}^{n−3} (−1)^i,

   and for even n ≥ 6 take the n − 1 numbers n!, −2, 1, and (−1)^i for i = 1, . . . , n − 4:

       n! − 1 = n! − 2 + 1 + \sum_{i=1}^{n−4} (−1)^i.

   I suspect that this was not what the creator of the puzzle had in mind!
5. (Northwestern Putnam preparation class) After a little experimentation, you might
   start believing that the only sequences that work are constant sequences (with all
   terms equal). We'll verify this belief by strong induction. Specifically, let a fixed sequence
   (a_n)_{n≥0} of positive real numbers be given, for which

       (a_1 + 2a_2 + 3a_3 + . . . + k a_k)/(a_1 + a_2 + . . . + a_k) = (k + 1)/2

   for all k ≥ 1, and suppose for convenience that a_1 = a. We will prove by strong
   induction on n that the statement P(n) is true for all n ≥ 1, where P(n) is the
   statement “a_n = a”.
   P(1) is evident. Suppose we know that P(1), P(2), . . . , P(n) are all true, for some
   n ≥ 1. Let's try to prove P(n + 1). One thing we know is that

       (a_1 + 2a_2 + 3a_3 + . . . + (n + 1)a_{n+1})/(a_1 + a_2 + . . . + a_{n+1}) = ((n + 1) + 1)/2 = (n + 2)/2.

   Another thing we know, by strong induction, is that a_1 = . . . = a_n = a. So what we
   wrote above reduces to

       ((1 + 2 + . . . + n)a + (n + 1)a_{n+1})/(na + a_{n+1}) = (n + 2)/2.

   Now a basic fact (which is an easy example of induction!) is that 1 + 2 + . . . + n =
   n(n + 1)/2, so the above becomes

       (n(n + 1)a/2 + (n + 1)a_{n+1})/(na + a_{n+1}) = (n + 2)/2.

   Solving for a_{n+1} yields a_{n+1} = a, and we are done.

6. (UMass Putnam preparation class) The wording of the question strongly suggests that
the answer is independent of the choice of A and B, so we should start with a partic-
ularly nice choice of A and B, see what answer we get, conjecture that this is always
the answer, and then try to prove the conjecture.
   Letting A = {1, 2, 3, . . . , n} and B = {2n, 2n − 1, . . . , n + 1}, we find that

       \sum_{i=1}^{n} |a_i − b_i| = (2n − 1) + (2n − 3) + . . . + 1 = n^2

   (recall from class that it is an easy induction that the sum of the first n odd positive
   integers is n^2).
   So, we try to prove by induction that \sum_{i=1}^{n} |a_i − b_i| = n^2. The base case n = 1 is
   trivial.
   For n ≥ 2, we can consider two cases. Case 1 is when 1 and 2n end up in different
   partition classes. Let's start by considering 1 ∈ A, 2n ∈ B. In this case, |a_1 − b_1| will
   contribute 2n − 1 to the sum. What about the remaining terms? Notice that A \ {1}
   and B \ {2n} form a partition {a_2, . . . , a_n} ∪ {b_2, . . . , b_n} of {2, . . . , 2n − 1}, with the
   a's increasing and the b's decreasing. Setting a′_1 = a_2 − 1, a′_2 = a_3 − 1, etc., up to
   a′_{n−1} = a_n − 1, and also setting b′_1 = b_2 − 1, b′_2 = b_3 − 1, etc., up to b′_{n−1} = b_n − 1, we
   get that A′ and B′ form a partition {a′_1, . . . , a′_{n−1}} ∪ {b′_1, . . . , b′_{n−1}} of {1, . . . , 2n − 2},
   with the a′'s increasing and the b′'s decreasing. By induction,

       \sum_{i=1}^{n−1} |a′_i − b′_i| = (n − 1)^2,

   and so, since |a′_i − b′_i| = |a_{i+1} − b_{i+1}| for each i,

       \sum_{i=1}^{n} |a_i − b_i| = (2n − 1) + \sum_{i=2}^{n} |a_i − b_i| = (2n − 1) + \sum_{i=1}^{n−1} |a′_i − b′_i| = (2n − 1) + (n − 1)^2 = n^2.

   If 2n ∈ A, 1 ∈ B, an almost identical argument gives the same result.


   Case 2 is where 1 and 2n end up in the same partition class. We start with the case
   where 1, 2n ∈ A. Let x be such that 1, 2, . . . , x ∈ A, but x + 1 ∈ B. Then

       \sum_{i=1}^{n} |a_i − b_i| = \sum_{i=1}^{x} (b_i − i) + \sum_{i=x+1}^{n−1} |a_i − b_i| + (2n − (x + 1)).

   (Note that this is valid even if x = n − 1, the largest it can possibly be; the second
   sum in this case is empty and so 0.) If we modify A and B to form A′ = {a′_1, . . . , a′_n},
   B′ = {b′_1, . . . , b′_n} by swapping 1 and x + 1, then

       \sum_{i=1}^{n} |a′_i − b′_i| = \sum_{i=1}^{x} (b_i − (i + 1)) + \sum_{i=x+1}^{n−1} |a_i − b_i| + (2n − 1)
                                    = \sum_{i=1}^{x} (b_i − i) + \sum_{i=x+1}^{n−1} |a_i − b_i| + (2n − (x + 1))
                                    = \sum_{i=1}^{n} |a_i − b_i|.

   But now (with A′ and B′) we are back in case 1, so

       \sum_{i=1}^{n} |a_i − b_i| = \sum_{i=1}^{n} |a′_i − b′_i| = n^2.

   A very similar reduction works if 1, 2n ∈ B.

3 Week three (September 9) — Pigeon-hole principle
“If n + 1 pigeons settle themselves into a roost that has only n pigeonholes, then there must
be at least one pigeonhole that has at least two pigeons.”
This very simple principle, sometimes called the box principle, and sometimes Dirichlet’s
box principle, can be very powerful.
The proof is trivial: number the pigeonholes 1 through n, and consider the case where a_i
pigeons land in hole i. If each a_i ≤ 1, then \sum_{i=1}^{n} a_i ≤ n, contradicting the fact that (since
there are n + 1 pigeons in all) \sum_{i=1}^{n} a_i = n + 1.
Since it’s a simple principle, to get some power out of it it has to be applied cleverly (in the
examples, there will be at least one such clever application). Applying the principle requires
identifying what the pigeons should be, and what the pigeonholes should be; sometimes this
is far from obvious.
The pigeonhole principle has many obvious generalizations. I’ll just state one of them:
“if more than mn pigeons settle themselves into a roost that has no more than n pigeonholes,
then there must be at least one pigeonhole that has at least m + 1 pigeons”.
Example: 10 points are placed randomly in a 1 by 1 square. Show that there must be some
pair of points that are within distance √2/3 of each other.
Solution: Divide the square into 9 smaller squares, each of dimension 1/3 by 1/3. These
are the pigeonholes. The ten randomly chosen points are the pigeons. By the pigeonhole
principle, at least one of the 1/3 by 1/3 squares must have at least two of the ten points in it.
The maximum distance between two points in a 1/3 by 1/3 square is the distance between
two opposite corners. By Pythagoras this is √((1/3)^2 + (1/3)^2) = √2/3, and we are done.
Example: Show that there are two people in New York City who have exactly the same
number of hairs on their heads.
Solution: Trivial, because surely there are at least two baldies in NYC! But even if we
weren’t sure of that: a quick websearch shows that a typical human head has around 150,000
hairs, and it is then certainly reasonable to assume that no one has more than 5,000,000
hairs on their head. Set up 5,000,001 pigeonholes, numbered 0 through 5,000,000, and place a
resident of NYC (a “pigeon”) into bin i if (s)he has i hairs on her head. Another websearch
shows that the population of NYC is around 8,300,000, so there are more pigeons than
pigeonholes, and some pigeonhole must have multiple pigeons in it.
Example: Show that every sequence of nm + 1 real numbers must contain EITHER a
decreasing subsequence of length n + 1 OR an increasing subsequence of length m + 1. (In a
sequence a1 , a2 , . . ., an increasing subsequence is a subsequence ai1 , ai2 , . . . [with i1 < i2 < . . .]
satisfying ai1 ≤ ai2 ≤ . . ., and a decreasing subsequence is defined analogously).
Solution: Let the sequence be a1 , . . . , anm+1 . For each k, 1 ≤ k ≤ nm + 1, let f (k) be the
length of the longest decreasing subsequence that starts with ak , and let g(k) be the length
of the longest increasing subsequence that starts with ak . Notice that f (k), g(k) ≥ 1 always.
If there is a k with either f (k) ≥ n+1 or g(k) ≥ m+1, we are done. If not, then for every
k we have 1 ≤ f (k) ≤ n and 1 ≤ g(k) ≤ m. Set up nm pigeonholes, with each pigeonhole
labeled by a different pair (i, j), 1 ≤ i ≤ n, 1 ≤ j ≤ m (there are exactly nm such pairs).

For each k, 1 ≤ k ≤ nm + 1, put a_k in pigeonhole (i, j) iff f(k) = i and g(k) = j. There are
nm + 1 pigeons but only nm pigeonholes, so one pigeonhole, say hole (r, s), has at least two pigeons in it.
In other words, there are two terms of the sequence, say ap and aq (where without loss
of generality p < q), with f (p) = f (q) = r and g(p) = g(q) = s.
Suppose ap ≥ aq . Then we can find a decreasing subsequence of length r + 1 starting
from ap , by starting ap , aq , and then proceeding with any decreasing subsequence of length
r that starts with aq (one such exists, since f (q) = r). But that says that f (p) ≥ r + 1,
contradicting f (p) = r.
On the other hand, suppose ap ≤ aq . Then we can find an increasing subsequence of
length s + 1 starting from ap , by starting ap , aq , and then proceeding with any increasing
subsequence of length s that starts with aq (one such exists, since g(q) = s). But that says
that g(p) ≥ s + 1, contradicting g(p) = s.
So, whether ap ≥ aq or ap ≤ aq , we get a contradiction, and we CANNOT ever be in the
case where there is NO k with either f (k) ≥ n + 1 or g(k) ≥ m + 1. This completes the
proof.
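The labelling used in this proof is easy to compute for any concrete sequence; the following Python sketch (purely illustrative) computes f(k) and g(k) for each position of a sample sequence, so you can see the pigeonholes (f(k), g(k)) filling up:

    def labels(seq):
        n = len(seq)
        f = [1] * n   # length of longest decreasing subsequence starting at k
        g = [1] * n   # length of longest increasing subsequence starting at k
        for k in range(n - 1, -1, -1):
            for j in range(k + 1, n):
                if seq[j] <= seq[k]:
                    f[k] = max(f[k], 1 + f[j])
                if seq[j] >= seq[k]:
                    g[k] = max(g[k], 1 + g[j])
        return list(zip(f, g))

    print(labels([3, 1, 4, 1, 5, 9, 2, 6]))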
Remark: This beautiful result was discovered by P. Erdös and G. Szekeres in 1935; the
incredibly clever application of pigeonholes was given by A. Seidenberg in 1959.

3.1 Week three problems
1. Show that among any 51 numbers selected from {1, . . . , 100}, there must be two (dis-
tinct) such that one divides the other.

2. Prove that given any five points on a sphere, there exists a closed hemisphere which
contains four of the points.

3. Prove that from a set of ten distinct two-digit numbers, it is possible to select two
non-empty disjoint subsets whose members have the same sum.

4. The Fibonacci numbers are defined by the recurrence f0 = 0, f1 = 1 and fn = fn−1 +


fn−2 for n ≥ 2. Show that the Fibonacci sequence is periodic modulo any positive
integer. (I.e., show that for each k ≥ 1, the sequence whose nth term is the remainder
of fn on division by k is a periodic sequence).

5. Let A and B be 2 by 2 matrices with integer entries such that A, A+B, A+2B, A+3B
and A + 4B are all invertible matrices whose inverses have integer entries. Show that
A + 5B is invertible and that its inverse has integer entries.

6. A lattice point is a point in R2 with integer coordinates. Suppose a disk of radius 1/10
is drawn centered at every nonzero lattice point in R2 . Show that every ray through
the origin eventually meets one of the discs.

3.2 Week three solutions
1. (Paul Erdős) Create 50 pigeonholes labeled by the 50 odd numbers 1 through 99.
   Put each number n among the 51 into the pigeonhole labelled o, where n = o · 2^k is the
   (unique) representation of n as a product of an odd number and a power of 2. By PHP,
   one pigeonhole has at least two numbers in it; clearly the lesser of the two divides the
   greater.

2. (Putnam 2002, A2) Picking an arbitrary equator, PHP just gives a hemisphere with
three of the points. Better: pick two of the five points arbitrarily; there’s a unique
equator that passes through these two points. If two or more of the points lie on this
equator, then either closed hemisphere created by this equator works. If exactly one
more of the points lies on this equator, then one of the two open hemispheres created
by the equator has at least one point (this is PHP); add it to the equator to get a
closed hemisphere that works. If no more of the points lies on this equator, then one of
the two open hemispheres created by the equator has at least two points (again, this
is PHP); add it to the equator to get a closed hemisphere that works.
The key word here was “closed”.

3. (International Mathematical Olympiad 1972, problem 1) The maximum subset-sum is
   99 + 98 + . . . + 90 = 945. The minimum subset-sum is 10. So there are at most 936 possible
   sums, as we run over non-empty subsets of a 10-element set of two-digit numbers. But
   there are 2^10 − 1 = 1023 non-empty subsets of the 10-element set. By PHP, there are
   two different non-empty subsets, A and B, with equal sums. PROBLEM: A, B may
   not be disjoint. SOLUTION: We certainly can't have A ⊂ B or B ⊂ A (otherwise one
   sum would definitely be larger). So A \ (A ∩ B) and B \ (A ∩ B) are both non-empty.
   They are clearly disjoint, and have the same sums, since we obtained them by removing
   the same set from both A and B.

4. (Northwestern Putnam preparation) Consider the pairs (f_0, f_1), (f_2, f_3), up to (f_{2k^2}, f_{2k^2+1})
   (so k^2 + 1 pairs in all). Map each pair to a pair (a, b) by looking at the remainder (in
   {0, . . . , k − 1}) of each term in the pair on division by k. There are only k^2 possible
   pairs (a, b), so by PHP, there must be two pairs (f_{2m}, f_{2m+1}), (f_{2n}, f_{2n+1}), with m < n
   say, both mapped to the same pair (a, b).
   Because each term in the Fibonacci sequence is the sum of the two previous terms, it
   follows that the remainder on division by k of each term is determined by the remainders
   on division by k of the two previous terms. So the pair (a, b), the remainders on division
   by k of the pair (f_{2m}, f_{2m+1}), determines the entire future trajectory of the remainders
   on division by k of f_ℓ, for ℓ ≥ 2m. But the same is true starting from (f_{2n}, f_{2n+1}): the
   pair (a, b) determines the entire trajectory of the remainders on division by k of f_ℓ, for
   ℓ ≥ 2n. It must be that the trajectory of this sequence from f_{2m} to f_{2n−1} is identical to
   its trajectory from f_{2n} to f_{4n−2m−1}. Repeating, we get that the sequence of remainders
   on division by k is periodic from f_{2m} on.
   But we can also run the Fibonacci sequence backwards: f_{ℓ−2} = f_ℓ − f_{ℓ−1}. The same
   argument as above shows that the backwards sequence is periodic from f_{2n} on, going
   backwards. The overlap (f_{2m} to f_{2n−1}) between these two infinite periodic sequences
   allows us to conclude that the remainder on division by k of f_n is a periodic sequence
   for n from −∞ to ∞.
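The period is easy to observe computationally: a pair of consecutive remainders determines everything that follows, so one just iterates until a pair repeats. A Python sketch (the function name pisano is just a conventional label for the period of the Fibonacci sequence mod k):

    def pisano(k):
        seen = {}
        a, b, i = 0, 1 % k, 0
        while (a, b) not in seen:
            seen[(a, b)] = i
            a, b = b, (a + b) % k
            i += 1
        return i - seen[(a, b)]

    print([pisano(k) for k in range(2, 11)])  # [3, 8, 6, 20, 24, 16, 12, 24, 60]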
5. (Putnam competition 1994 A4) When does an integer 2 by 2 matrix have an integer
   inverse? If the entries are c_{ij}, then what's needed is for the determinant D = c_{11}c_{22} −
   c_{12}c_{21} to divide each of c_{11}, c_{12}, c_{21} and c_{22}. In other words, c_{11} = aD, c_{12} = bD, c_{21} =
   cD and c_{22} = dD for integers a, b, c and d. Hence D = c_{11}c_{22} − c_{12}c_{21} = (ad − bc)D^2,
   so 1 = (ad − bc)D. Since ad − bc is an integer, and so is D, this can only happen if
   D = ±1.
   Let the entries of A be a_{ij}, and those of B be b_{ij}. We have the following relation (just
   by doing some algebra):

       det(A + nB) = a + nh + n^2 k,

   where a = det(A), h = a_{11}b_{22} + a_{22}b_{11} − a_{12}b_{21} − a_{21}b_{12} and k = det(B). This is a
   polynomial in n, of degree at most 2.
   We know that each of A + nB, n = 0, 1, 2, 3, 4, has an integer inverse, so determinant
   ±1. So at least three of them must have the same determinant. That means that there
   are three different inputs n for which the quadratic polynomial a + nh + n^2 k has the
   same output. For a quadratic polynomial to have this property, it must be constant,
   so h = 0 and k = 0. That means that for all n,

       det(A + nB) = a = det A = ±1,

   and so the inverse of A + nB exists and has integer entries for all n (and in particular
   for n = 5).
6. (UCLA Putnam preparation) By symmetry it’s enough to consider rays with slope m
satisfying 0 < m < 1, and moreover we may assume that m is irrational, since rays
with rational slopes are easily seen to meet the centers of some (infinitely many) discs.
   Given irrational m, 0 < m < 1, let p/q (p, q ≥ 1, p ≤ q, p and q with no common
   factors) be a rational approximation to m, that is, |m − p/q| ≤ ε for some small ε.
How small does ε have to be, so that the ray y = mx (x ≥ 0) passes through the
disc of radius 1/10 centered at (q, p)? Well, the ray passes through the point (q, mq),
which is at distance |mq − p| from (q, p); so it’s enough to have |mq − p| ≤ 1/10 or
|m − p/q| ≤ 1/(10q). Therefore, it would be enough to have ε ≤ 1/(10q).
So the question becomes: given irrational m, 0 < m < 1, is there p/q (p, q ≥ 1, p ≤ q,
p and q with no common factors) with |m − p/q| ≤ 1/(10q)? We prove that there is
using PHP.
Consider the sequence m, 2m, 3m, . . . , 11m. These 11 irrational numbers each have a
fractional part between 0 and 1. By PHP, two of them, say am and bm, must have
fractional part within 1/10 of each other. So am − bm is within 1/10 of some integer,
say p. Set q = |a − b| to get |qm − p| ≤ 1/10.
Notice that 1/10 here could have been replaced by ε, for any ε > 0.
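The pigeonhole step above can be carried out numerically; here is a Python sketch (function and variable names are just for illustration) that, given m, finds p and q with |qm − p| ≤ 1/10 by comparing fractional parts of m, 2m, . . . , 11m:

    import math

    def close_rational(m, parts=10):
        # fractional parts of m, 2m, ..., (parts+1)m; by PHP two of them are
        # within 1/parts of each other
        fracs = sorted((math.modf(i * m)[0], i) for i in range(1, parts + 2))
        (f1, i1), (f2, i2) = min(zip(fracs, fracs[1:]),
                                 key=lambda pair: pair[1][0] - pair[0][0])
        q = abs(i1 - i2)
        p = round(q * m)
        return p, q, abs(q * m - p)

    print(close_rational(math.sqrt(2) - 1))  # last entry is at most 1/10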

4 Week four (September 16) — Binomial coefficients
Binomial coefficients crop up quite a lot in Putnam problems. This handout presents some
ways of thinking about them.

Introduction
The binomial coefficient \binom{n}{k}, with n ∈ N and k ∈ Z, can be defined in many ways; possi-
bly the most helpful definition from the point of view of problem-solving is the following
combinatorial one:

    \binom{n}{k} is the number of subsets of size k of a set of size n.

In particular, this definition immediately tells us that for all n ≥ 0 we have \binom{n}{k} = 0 if k > n
or if k < 0, and that \binom{n}{0} = \binom{n}{n} = 1 (and so in particular \binom{0}{0} = 1).
  

The binomial coefficients can also be defined by a recurrence relation: for n ≥ 1, and all
k ∈ Z, we have the recurrence

    \binom{n}{k} = \binom{n−1}{k} + \binom{n−1}{k−1},          (Pascal's identity)

with initial conditions \binom{0}{k} = 0 if k ≠ 0, and \binom{0}{0} = 1. To see that this recurrence does indeed
generate the binomial coefficients, think about the combinatorial interpretation: the subsets
of {1, . . . , n} of size k (\binom{n}{k} of them) partition into those that don't include element n (\binom{n−1}{k}
of them) and those that do include element n (\binom{n−1}{k−1} of them). The recurrence allows us
to quickly compute small binomial coefficients via Pascal’s triangle: the zeroth row of the
triangle has length one, and consists just of the number 1. Below that, the first row has two
1’s, one below and to the left of the 1 in the zeroth row, and one below and to the right of
the 1 in the zeroth row. The second row has three entries, a 1 below and to the left of the
leftmost 1 in the first row, a 1 below and to the right of the rightmost 1 in the first row,
and in the center a 2. Each subsequent row contains one more entry than the previous row,
starting with a 1 below and to the left of the leftmost 1 in the previous row, ending with a
1 below and to the right of the rightmost 1 in the previous row, and with all other entries
being the sum of the two entries in the previous row above to the left and to the right of the
entry being considered (see the picture below).

1

1 1

1 2 1

1 3 3 1

1 4 6 4 1

1 5 10 10 5 1

1 6 15 20 15 6 1
etc.

Pascal’s triangle in numbers
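The numerical triangle above can be generated directly from Pascal's identity; here is a short Python sketch (illustrative only):

    def pascal_rows(num_rows):
        rows = [[1]]
        for _ in range(num_rows - 1):
            prev = rows[-1]
            rows.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
        return rows

    for row in pascal_rows(7):
        print(row)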

The kth entry in row n (counting from 0 rather than 1 both down and across) is then \binom{n}{k}
(this is just a restatement of Pascal's identity) (see the picture below).

\binom{0}{0}

\binom{1}{0} \binom{1}{1}

\binom{2}{0} \binom{2}{1} \binom{2}{2}

\binom{3}{0} \binom{3}{1} \binom{3}{2} \binom{3}{3}

\binom{4}{0} \binom{4}{1} \binom{4}{2} \binom{4}{3} \binom{4}{4}

\binom{5}{0} \binom{5}{1} \binom{5}{2} \binom{5}{3} \binom{5}{4} \binom{5}{5}

\binom{6}{0} \binom{6}{1} \binom{6}{2} \binom{6}{3} \binom{6}{4} \binom{6}{5} \binom{6}{6}
etc.

Pascal’s triangle symbolically

Finally, there is an algebraic expression for \binom{n}{k}, that makes sense for all n, k ≥ 0, using
the factorial function (defined combinatorially as the number of ways of arranging n distinct
objects in order, and algebraically by n! = n(n − 1)(n − 2) . . . (3)(2)(1) for n ≥ 1, with
0! = 1):

    \binom{n}{k} = n(n − 1) . . . (n − (k − 1))/k! = n!/(k!(n − k)!).

To see this, note that n(n − 1) . . . (n − (k − 1)) is fairly evidently the number of ordered lists
of k distinct elements from {1, . . . , n} (often referred to in textbooks as “permutations of n
items taken k at a time” — ugh). When the ordered lists are turned into (unordered) subsets,
each subset appears k! times (once for each of the k! ways of putting k distinct objects into
an ordered list), so we need to divide the ordered count by k! to get the unordered count.
When dealing with binomial coefficients, it is very helpful to bear all three definitions in
mind, but in particular the first two.

Identities
The binomial coefficients satisfy a staggering number of identities. The simplest of these
are easily understood using either the combinatorial or algebraic definitions; for the more
involved ones, that include sums, the algebraic definition is usually next to useless, and
often the easiest way to prove the identity is combinatorially, by showing that both sides of
the identity count the same thing in different ways (illustration below), though it is often
possible also to prove these identities by induction, using the recurrence relation. Another
approach that is helpful is that of generating functions.
Here are some of the basic binomial coefficient identities:

1. (Symmetry)

       \binom{n}{k} = \binom{n}{n − k}

   (Proof: trivial from the algebraic definition; combinatorially, the left-hand side counts
   selections of subsets of size k from a set of size n, by naming the selected elements;
   the right-hand side also counts selections of subsets of size k from a set of size n, this time
   by naming the unselected elements).

2. (Lower summation)

       \sum_{k=0}^{n} \binom{n}{k} = 2^n

   (Proof: close to impossible using the algebraic definition; combinatorially, very straight-
   forward: the left-hand side counts the number of subsets of a set of size n, by first deciding
   the size of the subset, and then choosing the subset itself; the right-hand side also counts
   the number of subsets of a set of size n, by going through the n elements one-by-one
   and deciding whether they are in the subset or not).

3. (Upper summation)

       \sum_{m=k}^{n} \binom{m}{k} = \binom{n+1}{k+1}.

4. (Parallel summation)

       \sum_{k=0}^{n} \binom{m+k}{k} = \binom{n+m+1}{n}.

5. (Square summation)

       \sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n}.

6. (Vandermonde identity, or Vandermonde convolution)

       \sum_{k=0}^{r} \binom{m}{k} \binom{n}{r − k} = \binom{n+m}{r}.
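All of the summation identities above are easy to sanity-check numerically for particular values of the parameters; a Python sketch (the chosen values n = 7, m = 5, k = 3, r = 4 are arbitrary):

    from math import comb

    n, m, k, r = 7, 5, 3, 4
    assert sum(comb(n, j) for j in range(n + 1)) == 2 ** n                    # lower summation
    assert sum(comb(j, k) for j in range(k, n + 1)) == comb(n + 1, k + 1)     # upper summation
    assert sum(comb(m + j, j) for j in range(n + 1)) == comb(n + m + 1, n)    # parallel summation
    assert sum(comb(n, j) ** 2 for j in range(n + 1)) == comb(2 * n, n)       # square summation
    assert sum(comb(m, j) * comb(n, r - j) for j in range(r + 1)) == comb(n + m, r)  # Vandermonde
    print("all five summation identities hold for these parameters")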

The binomial theorem


This is the most important identity involving binomial coefficients: for all real x and y, and
n ≥ 0,

    (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n−k} y^k.

This can be proved by induction using Pascal's identity, but the proof is quite awkward.
Here's a nice combinatorial proof. First, note that the identity is trivial if either x = 0 or
y = 0, so we may assume x, y ≠ 0. Dividing through by x^n, the identity is the same as

    (1 + z)^n = \sum_{k=0}^{n} \binom{n}{k} z^k.

We will prove this combinatorially when z is a positive integer. The left-hand side counts
the number of words of length n from alphabet {0, 1, 2, . . . , z}, by deciding on the letters
one after the other. The right-hand side also counts the number of words of length n from
alphabet {0, 1, 2, . . . , z}, as follows: first decide how many of the letters of the word are from
{1, . . . , z} (this is the k of the summation). Next, decide the locations of these k letters (this
is the \binom{n}{k}). Finally, decide what specific letters go into those spots, one after another (this
is the z^k) (note that the remaining n − k letters must all be 0's).
This only shows the identity for positive integer z. But now we use the fact that both
the right-hand and left-hand sides are polynomials of degree n, so if they agree at n + 1
different values of z, they must agree at all values of z (otherwise, their difference is a not-
identically-zero polynomial of degree at most n with n + 1 distinct roots, an impossibility).
And indeed, the two sides agree not just at n + 1 different values of z, but at infinitely many
(all positive integers z). So from the combinatorial argument that shows that the two sides
are equal for positive integers z, we infer that they are equal for all real z. This argument is
often called the polynomial principle.
There is a version of the binomial theorem also for non-positive-integral exponents: for
all real α,

    (1 + z)^α = \sum_{k≥0} \binom{α}{k} z^k,

where \binom{α}{k} is defined in the obvious way:

    \binom{α}{k} = α(α − 1) . . . (α − k + 1)/k!    (for k ≥ 1; \binom{α}{0} = 1),

and the equality is valid for all real |z| < 1. (Check: when α is a positive integer, this reduces
to the standard binomial theorem).
For example, if ℓ > 0 is a positive integer, then

    \binom{−ℓ}{k} = (−ℓ)(−ℓ − 1) . . . (−ℓ − k + 1)/k! = (−1)^k \binom{ℓ+k−1}{k},

and so

    1/(1 − z)^ℓ = \sum_{k≥0} \binom{ℓ+k−1}{k} z^k.

This generalizes the familiar identity

    1/(1 − z) = 1 + z + z^2 + . . . .
Modulo the convergence analysis, the proof of the binomial theorem for general exponents
is fairly easy: the coefficient of z^k in the Taylor series of (1 + z)^α about z = 0 is

    (1/k!) (d^k/dz^k) (1 + z)^α |_{z=0} = \binom{α}{k}.

Compositions and weak compositions


A composition of a positive integer n into k parts is a vector (x_1, x_2, . . . , x_k), with each entry
a strictly positive integer, and with \sum_{i=1}^{k} x_i = n. For example, (2, 1, 1, 3) is a composition
of 7, as is (1, 3, 1, 2); and, because a composition is a vector (ordered list), these two are
considered different compositions.
A weak composition of a positive integer n into k parts is a vector (x_1, x_2, . . . , x_k), with
each entry a non-negative (possibly 0) integer, and with \sum_{i=1}^{k} x_i = n. For example, (2, 0, 1, 3)
is a weak composition of 6, but not a composition.
How many weak compositions of n are there, into k parts? Put down n + k − 1 stars in
a row. Choose k − 1 of them to turn into bars. The resulting arrangement of stars-and-bars
encodes a weak composition of n into k parts — the number of stars before the first bar is
x_1, the number of stars between the first and second bar is x_2, and so on, up to the number
of stars after the last bar, which is x_k (notice that only k − 1 bars are needed to determine
k intervals of stars). Conversely, every weak composition of n into k parts is encoded by
one such selection of k − 1 bars from the initial list of n + k − 1 stars. For example, the
configuration ∗∗||∗|∗∗∗ encodes the weak composition (2, 0, 1, 3) of 6 into 4 parts. So, the
number of weak compositions of n into k parts is a binomial coefficient, \binom{n+k−1}{k−1}.
How many compositions of n are there, into k parts? Each such composition (x_1, x_2, . . . , x_k)
gives rise to a weak composition (x_1 − 1, x_2 − 1, . . . , x_k − 1) of n − k into k parts, and all
weak compositions of n − k into k parts are achieved by this process. So, the number of
compositions of n into k parts is the same as the number of weak compositions of n − k into
k parts, which is \binom{(n−k)+k−1}{k−1} = \binom{n−1}{k−1}.
For example: I like plain cake, chocolate cake, blueberry cake and pumpkin cake donuts
from Dunkin' Donuts. In how many different ways can I buy a dozen donuts that I like? I
must buy x_1 plain, x_2 chocolate, x_3 blueberry and x_4 pumpkin, with x_1 + x_2 + x_3 + x_4 = 12,
and with each x_i a non-negative integer (possibly 0). So the number of different purchases
I can make is the number of weak compositions of 12 into 4 parts, namely \binom{15}{3} = 455.
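The donut count is small enough to verify by brute force; here is a Python sketch (not part of the handout) that enumerates the weak compositions of 12 into 4 parts and compares with the stars-and-bars formula:

    from itertools import product
    from math import comb

    count = sum(1 for x in product(range(13), repeat=4) if sum(x) == 12)
    print(count, comb(15, 3))  # both are 455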

Easy warm-up problems


1. Give a combinatorial proof of the upper summation identity.

2. Give a combinatorial proof of the parallel summation identity.

3. Give a combinatorial proof of the square summation identity.

4. Give a combinatorial proof of the Vandermonde identity.

5. Evaluate

       \sum_{k=0}^{n} (−1)^k \binom{n}{k}

   for n ≥ 1.

Harder warm-up problems


1. The kth falling power of x is x^{\underline{k}} = x(x − 1)(x − 2) . . . (x − (k − 1)). Prove that for all
   real x, y, and all n ≥ 1,

       (x + y)^{\underline{n}} = \sum_{k=0}^{n} \binom{n}{k} x^{\underline{n−k}} y^{\underline{k}}.

2. The kth rising power of x is x^{\overline{k}} = x(x + 1)(x + 2) . . . (x + (k − 1)). Prove that for all
   real x, y, and all n ≥ 1,

       (x + y)^{\overline{n}} = \sum_{k=0}^{n} \binom{n}{k} x^{\overline{n−k}} y^{\overline{k}}.

3. Evaluate

       \sum_{k=0}^{2n} (−1)^k k^n \binom{2n}{k}

   for n ≥ 1.

4. Evaluate

       \sum_{k=0}^{n} \binom{n}{k} F_{k+1}

   for n ≥ 0, where F_1, F_2, F_3, F_4, F_5, . . . are the Fibonacci numbers 1, 1, 2, 3, 5, . . . .

5. (a) Let an be the number of 0-1 strings of length n that do not have two consecutive
1’s. Find a recurrence relation for an (starting with initial conditions a0 = 1,
a1 = 2).

(b) Let an,k be the number of 0-1 strings of length n that have exactly k 1’s and that
do not have two consecutive 1’s. Express an,k as a (single) binomial coefficient.
(c) Use the results of the previous two parts to give a combinatorial proof (showing
that both sides count the same thing) of the identity
       F_n = \sum_{k≥0} \binom{n − k − 1}{k},

where Fn is the nth Fibonacci number (as defined in the last question).

4.1 Week four problems
1. Show that for every n, m ≥ 0,

       \int_0^1 x^n (1 − x)^m dx = 1 / ((n + m + 1) \binom{n+m}{n}).

2. Show that the coefficient of x^k in (1 + x + x^2 + x^3)^n is

       \sum_{j=0}^{k} \binom{n}{j} \binom{n}{k − 2j}.

3. Define a selfish set to be a set which has its own cardinality (number of elements) as
an element. Find the number of subsets of {1, 2, . . . , n} which are minimal selfish sets,
that is, selfish sets none of whose proper subsets is selfish.

4. Three distinct vertices are chosen at random from the vertices of a regular (2n + 1)-
sided polygon. If all such choices are equally likely, what is the probability that the
center of the polygon lies in the interior of the triangle determined by the three chosen
vertices?

5. Prove that the expression
   \[ \frac{\gcd(m,n)}{n} \binom{n}{m} \]
   is an integer for all pairs of integers n ≥ m ≥ 1.

6. For positive integers m and n, let f (m, n) denote the number of n-tuples (x1 , x2 , . . . , xn )
of integers such that |x1 | + |x2 | + . . . + |xn | ≤ m. Show that f (m, n) = f (n, m). (In
other words, the number of integer points in the ℓ1 ball of radius m in Rn is the same as the
number of integer points in the ℓ1 ball of radius n in Rm.)

4.2 Week four solutions
Easy warm-up problems
1. RHS is number of subsets of {1, . . . , n + 1} of size k + 1, counted directly. LHS counts
   same, by first specifying largest element in subset (if largest element is k + 1, remaining
   k must be chosen from {1, . . . , k}, $\binom{k}{k}$ ways; if largest element is k + 2, remaining k
   must be chosen from {1, . . . , k + 1}, $\binom{k+1}{k}$ ways; etc.).

2. RHS is number of subsets of {1, . . . , n + m + 1} of size n, counted directly. LHS counts
   same, by first specifying the smallest element not in subset (if smallest missed element
   is 1, all n elements must be chosen from {2, . . . , n + m + 1}, $\binom{m+n}{n}$ ways, the k = n
   term; if smallest missed element is 2, then 1 is in subset and remaining n − 1 elements
   must be chosen from {3, . . . , n + m + 1}, $\binom{m+n-1}{n-1}$ ways, the k = n − 1 term; etc., down
   to: if smallest missed element is n + 1, then {1, . . . , n} is in subset and remaining 0
   elements must be chosen from {n + 2, . . . , n + m + 1}, $\binom{m+0}{0}$ ways, the k = 0 term).

3. RHS is number of subsets of {±1, . . . , ±n} of size n, counted directly. LHS counts
   same, by first specifying k, the number of positive elements chosen, then selecting the k
   positive elements ($\binom{n}{k}$ ways), then selecting the k negative elements that are not chosen
   (so the n − k that are, for n elements in total) ($\binom{n}{k}$ ways).

4. Let A = {x1, . . . , xm} and B = {y1, . . . , yn} be disjoint sets. RHS is number of subsets
   of A ∪ B of size r, counted directly. LHS counts same, by first specifying k, the number
   of elements chosen from A, then selecting those k elements from A ($\binom{m}{k}$ ways), then
   selecting the remaining r − k elements from B ($\binom{n}{r-k}$ ways).

5. Applying the binomial theorem with x = 1, y = −1, we get
   \[ 0 = (1-1)^n = \sum_{k=0}^{n} \binom{n}{k} 1^{n-k}(-1)^k = \sum_{k=0}^{n} (-1)^k \binom{n}{k}, \]
   so the sum is 0.

Harder warm-up problems


1. Here's a combinatorial proof. Let x and y be positive integers. The number of words in
   alphabet {1, . . . , x} ∪ {x + 1, . . . , x + y} of length n with no repeated letters, counted
   by selecting letter-by-letter, is $(x+y)^{\underline{n}}$. If instead we count by first selecting k, the
   number of letters from {x + 1, . . . , x + y} used, then locating the k positions in which
   those letters appear, then selecting the n − k letters from {1, . . . , x} letter-by-letter
   in the order that they appear in the word, and finally selecting the k letters from
   {x + 1, . . . , x + y} letter-by-letter in the order that they appear in the word, we get a
   count of $\sum_{k=0}^{n} \binom{n}{k} x^{\underline{n-k}}\, y^{\underline{k}}$. So the identity is true for positive integers x, y.
The LHS and RHS are polynomials in x and y of degree n, so the difference is a
polynomial in x and y of degree at most n, which we want to show is identically 0.
Write the difference as P (x, y) = p0 (x) + p1 (x)y + . . . + pn (x)y n where each pi (x) is a

polynomial in x of degree at most n. Setting x = 1 we get a polynomial P (1, y) in y of
degree at most n. This is 0 for all integers y > 0 (by our combinatorial argument), so
by the polynomial principle it is identically 0. So each pi (x) evaluates to 0 at x = 1.
But the same argument shows that each pi (x) evaluates to 0 at any positive integer x.
So again by the polynomial principle, each pi (x) is identically 0 and so P (x, y) is. This
proves the identity for all real x, y.
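(Not part of the original handout.) A quick numerical spot-check of the falling-power binomial theorem, for one arbitrary choice of real x and y; the helper name `falling` is purely illustrative.

```python
from math import comb, prod

def falling(x, k):
    """k-th falling power: x(x-1)(x-2)...(x-(k-1))."""
    return prod(x - i for i in range(k))

n, x, y = 6, 3.5, -2.25   # arbitrary test values
lhs = falling(x + y, n)
rhs = sum(comb(n, k) * falling(x, n - k) * falling(y, k) for k in range(n + 1))
assert abs(lhs - rhs) < 1e-9
```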

2. Set x' = −x and y' = −y; we have
   \[ (x+y)^{\overline{n}} = (-x'-y')^{\overline{n}} = (-1)^n (x'+y')^{\underline{n}} \]
   and
   \[ \sum_{k=0}^{n} \binom{n}{k} x^{\overline{n-k}}\, y^{\overline{k}}
      = \sum_{k=0}^{n} \binom{n}{k} (-x')^{\underline{n-k}} (-y')^{\underline{k}}
      = \sum_{k=0}^{n} \binom{n}{k} (-1)^{n-k}(x')^{\underline{n-k}} (-1)^k (y')^{\underline{k}}
      = (-1)^n \sum_{k=0}^{n} \binom{n}{k} (x')^{\underline{n-k}} (y')^{\underline{k}}, \]
   so the identity follows from the falling power binomial theorem (the previous question).

3. It's not easy to deal with this sum in isolation. But we can generalize: define, for each
   n, r ≥ 1, $a_{r,n} = \sum_{k=0}^{2n} (-1)^k k^r \binom{2n}{k}$. We claim that $a_{r,n} = 0$ for 1 ≤ r < 2n (and so in
   particular $a_{n,n}$, our sum of interest, is 0).
We prove the claim for each n by induction on r. It will be helpful to define
   \[ f_n(x) = \sum_{k=0}^{2n} \binom{2n}{k} x^k = (1+x)^{2n}. \]

Differentiating,
   \[ f_n'(x) = \sum_{k=0}^{2n} \binom{2n}{k} k x^{k-1} = 2n(1+x)^{2n-1}. \]
   Evaluating at x = −1 we get
   \[ \sum_{k=0}^{2n} (-1)^{k-1} k \binom{2n}{k} = 0, \]

and multiplying by −1 gives a1,n = 0. This is the base case of the induction.
For the induction step, assume that $a_{j,n} = 0$ for all 1 ≤ j < r (with r ≥ 2). The rth
   derivative of $f_n(x)$ is
   \[ f_n^{(r)}(x) = \sum_{k=0}^{2n} \binom{2n}{k} k(k-1)\cdots(k-(r-1))\, x^{k-r} = 2n(2n-1)\cdots(2n-(r-1))(1+x)^{2n-r}. \]

Now $k(k-1)\cdots(k-(r-1))$ is a polynomial in k, of degree r, whose leading coefficient
   is 1, and for which all other coefficients are polynomials in r; in other words,
   \[ k(k-1)\cdots(k-(r-1)) = k^r + c_1(r)k^{r-1} + \cdots + c_r(r), \]
   and so
   \[ f_n^{(r)}(x) = \sum_{k=0}^{2n} \binom{2n}{k} x^{k-r} k^r + \sum_{j=1}^{r} c_j(r) \sum_{k=0}^{2n} \binom{2n}{k} x^{k-r} k^{r-j}. \]
Evaluating at x = −1, and recalling that $f_n^{(r)}(x) = 2n(2n-1)\cdots(2n-(r-1))(1+x)^{2n-r}$
   (which vanishes at x = −1, since r < 2n), we get
   \[ \sum_{k=0}^{2n} (-1)^{k-r} k^r \binom{2n}{k} + \sum_{j=1}^{r} c_j(r) \sum_{k=0}^{2n} (-1)^{k-r} k^{r-j} \binom{2n}{k} = 0. \]
The sum corresponding to j = r is
   \[ c_r(r) \sum_{k=0}^{2n} (-1)^{k-r} \binom{2n}{k} = (-1)^{-r} c_r(r) \sum_{k=0}^{2n} (-1)^{k} \binom{2n}{k} = (-1)^{-r} c_r(r)(1-1)^{2n} = 0. \]
   The sum corresponding to each j, 1 ≤ j < r, is $c_j(r)(-1)^{-r} a_{r-j,n}$, so is 0 by induction.


So we conclude
   \[ \sum_{k=0}^{2n} (-1)^{k-r} k^r \binom{2n}{k} = 0, \]
   which gives $a_{r,n} = 0$ on multiplying by $(-1)^r$.

4. When n = 0 we get a sum of 1; when n = 1 we get a sum of 2; when n = 2 we get a
   sum of 5; when n = 3 we get a sum of 13; when n = 4 we get a sum of 34; this strongly
   suggests
   \[ \sum_{k=0}^{n} \binom{n}{k} F_{k+1} = F_{2n+1}. \]
One way to prove this is to iteratively apply the Fibonacci recurrence to $F_{2n+1}$: on the
   zeroth iteration,
   \[ F_{2n+1} = \binom{0}{0} F_{2n+1}. \]
   The first iteration leads to
   \[ F_{2n+1} = F_{2n} + F_{2n-1} = \binom{1}{0} F_{2n} + \binom{1}{1} F_{2n-1}. \]
   The second leads to
   \[ F_{2n+1} = F_{2n} + F_{2n-1} = (F_{2n-1} + F_{2n-2}) + (F_{2n-2} + F_{2n-3}) = \binom{2}{0} F_{2n-1} + \binom{2}{1} F_{2n-2} + \binom{2}{2} F_{2n-3}. \]

This suggests that we prove the more general statement, that for each 0 ≤ s ≤ n,
   \[ F_{2n+1} = \sum_{j=0}^{s} \binom{s}{j} F_{2n+1-s-j}. \qquad (\star) \]

The case s = n yields
   \[ F_{2n+1} = \sum_{j=0}^{n} \binom{n}{j} F_{n+1-j}, \]
   which is the same as what we have to prove (by the symmetry relation $\binom{n}{j} = \binom{n}{n-j}$).
We can prove $(\star)$ by induction on s (for each fixed n), with the case s = 0 trivial. For
   larger s, we begin with the s − 1 case of the induction hypothesis, then use the Fibonacci
   recurrence to break each Fibonacci number into the sum of two earlier ones, then
   use Pascal's identity to gather together terms involving the same Fibonacci number.
   (Details omitted.)
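(Not in the original.) The identity $\sum_{k=0}^{n} \binom{n}{k} F_{k+1} = F_{2n+1}$ is easy to spot-check in Python:

```python
from math import comb

fib = [1, 1]                      # F_1, F_2, ...
while len(fib) < 40:
    fib.append(fib[-1] + fib[-2])
F = lambda i: fib[i - 1]          # F(1) = 1, F(2) = 1, ...

for n in range(15):
    assert sum(comb(n, k) * F(k + 1) for k in range(n + 1)) == F(2 * n + 1)
```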
5. (a) By considering whether the last term is a 0 or a 1, get the Fibonacci recurrence:
an = an−1 + an−2 .
(b) Add a 0 to the beginning and end of such a string. By reading off a1 , the number of
0’s before the first 1, then a2 , the number of 0’s between the first 1 and the second,
and so on up to ak+1 , the number of 0’s after the last 1, we get a composition
(a1 , . . . , ak+1 ) of n + 2 − k into k + 1 parts; and each such composition can be
encoded (uniquely) by such a string. So an,k is the number of compositions of
n + 2 − k into k + 1 parts, and so equals $\binom{n+1-k}{k}$.
(c) From the recurrence in the first part, we get an = Fn+2 , so Fn counts the number
of 0-1 strings of length n − 2 with no two consecutive 1’s. We can count such
strings by first deciding on k, the  number of 1’s, and by the second part, the
number of such strings is $\binom{n-1-k}{k}$. Summing over k we get the result.

Problems
1. (I don't know where I saw this problem first) First solution: The result is trivial
   when one or both of n, m = 0, so we may assume n + m ≥ 2. Using integration by
   parts, we get
   \[ \int_0^1 x^n (1-x)^m \, dx = \frac{m}{n+1} \int_0^1 x^{n+1} (1-x)^{m-1} \, dx. \]
This suggests that for each s ≥ 2 we use induction on m to prove the result for all
pairs (n, m) with n + m = s. The case m = 0 has been observed already. For m > 0
we have
\[ \int_0^1 x^n (1-x)^m \, dx = \frac{m}{n+1} \int_0^1 x^{n+1} (1-x)^{m-1} \, dx
   = \frac{m}{n+1} \cdot \frac{1}{((n+1)+(m-1)+1)\binom{(n+1)+(m-1)}{n+1}}
   = \frac{m}{(n+1)(n+m+1)\binom{n+m}{n+1}}. \]

The result follows if we can show
   \[ (n+1)\binom{n+m}{n+1} = m\binom{n+m}{n}. \]
This is easily verified, either algebraically, or via the “committee-chair” identity: to
   choose a committee-with-chair of size n + 1 from n + m people, we either choose the
   n + 1 people for the committee and elect one of them chair ($(n+1)\binom{n+m}{n+1}$ ways) or select
   the n non-chair members from the n + m people, and choose the chair from among
   those not yet chosen ($m\binom{n+m}{n}$ ways).
Second solution (essentially the same as the first, written differently): The result
is trivial when one or both of n, m = 0, so we may assume both n, m ≥ 1. Consider
n ≥ 1 fixed. Using integration by parts, we get
\[ \int_0^1 x^n (1-x)^m \, dx = \frac{m}{n+1} \int_0^1 x^{n+1} (1-x)^{m-1} \, dx \]
   (recall m ≥ 1). Repeating integration by parts m − 1 more times, we get
   \[ \int_0^1 x^n (1-x)^m \, dx = \frac{m}{n+1} \cdot \frac{m-1}{n+2} \cdots \frac{1}{n+m} \int_0^1 x^{n+m} \, dx
      = \frac{m!}{(n+m)(n+m-1)\cdots(n+1)} \cdot \frac{1}{n+m+1}
      = \frac{1}{(n+m+1)\binom{n+m}{m}}
      = \frac{1}{(n+m+1)\binom{n+m}{n}}. \]
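(Not in the original.) The identity can also be verified with exact rational arithmetic for small n, m, by expanding $(1-x)^m$ and integrating term by term; the helper below is just such a sketch.

```python
from fractions import Fraction
from math import comb

def beta_integral(n, m):
    """Exact value of the integral of x^n (1-x)^m over [0, 1]."""
    return sum(Fraction((-1) ** j * comb(m, j), n + j + 1) for j in range(m + 1))

for n in range(6):
    for m in range(6):
        assert beta_integral(n, m) == Fraction(1, (n + m + 1) * comb(n + m, n))
```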

2. (1992 Putnam, problem B2) We have
   \[ (1+x+x^2+x^3)^n = (1+x)^n (1+x^2)^n = \left( \sum_{i \ge 0} \binom{n}{i} x^i \right) \left( \sum_{i' \ge 0} \binom{n}{i'} x^{2i'} \right). \]
   We get an $x^k$ term in the product by pairing each $\binom{n}{i} x^{2i}$ from the second sum with
   $\binom{n}{k-2i} x^{k-2i}$ from the first; so the coefficient of $x^k$ in the product is
   \[ \sum_{i=0}^{k} \binom{n}{i} \binom{n}{k-2i}, \]

as claimed.
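(Not in the original.) The claimed coefficient formula can be spot-checked by expanding the polynomial directly; the helper below is a naive polynomial power, and terms with k − 2j < 0 are treated as zero.

```python
from math import comb

def poly_power_coeffs(base, n):
    """Coefficient list of (base polynomial)^n, base given as a coefficient list."""
    result = [1]
    for _ in range(n):
        new = [0] * (len(result) + len(base) - 1)
        for i, a in enumerate(result):
            for j, b in enumerate(base):
                new[i + j] += a * b
        result = new
    return result

n = 5
coeffs = poly_power_coeffs([1, 1, 1, 1], n)          # (1 + x + x^2 + x^3)^n
for k, c in enumerate(coeffs):
    assert c == sum(comb(n, j) * comb(n, k - 2 * j) for j in range(k // 2 + 1))
```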

3. (Putnam competition 1996, problem B1) The answer is the nth Fibonacci number. The
solution given here is taken from http://mathforum.org/kb/thread.jspa?forumID=
13&threadID=20920&messageID=50140.

Write #S for the number of elements in S. Note that a set S is a minimal selfish set
if and only if #S is the smallest element of S.
Induct on n. For n = 1, there is one selfish set, namely {1}, so the number of minimal
selfish sets is 1. For n = 2, there are two selfish sets, namely {1} and {1, 2}, so the
number of minimal selfish sets is 1.
Fix n ≥ 2. Assume that there are a minimal selfish subsets of {1, 2, . . . , n} and b
minimal selfish subsets of {1, 2, . . . , n − 1}. I will construct exactly a + b minimal
selfish subsets of {1, 2, . . . , n + 1}.
Construction A: Let S be a minimal selfish subset of {1, 2, . . . , n}. Then S is a minimal
selfish subset of {1, 2, . . . , n + 1}. This construction produces a minimal selfish subsets.
Every minimal selfish subset T of {1, 2, . . . , n + 1} not containing n + 1 is obtained
from Construction A. Proof: T is a subset of {1, 2, . . . , n}.
Construction B: Let S be a minimal selfish subset of {1, 2, . . . , n − 1}. Define T =
{n + 1} ∪ {x + 1 : x ∈ S}. Then T is a minimal selfish set: its smallest element is
#S + 1 = #T . This construction produces exactly b minimal selfish subsets, all
different from the subsets in Construction A since they all contain n + 1.
Every minimal selfish subset T of {1, 2, . . . , n + 1} containing n + 1 is obtained from
Construction B. Proof: 1 ∉ T since otherwise min T = 1 < 2 ≤ #T. Consider the set
S = {x − 1 : x ∈ T, x 6= n + 1}. Then #S = #T − 1 is the smallest element of S, so S
is a minimal selfish set. Finally T = {n + 1} ∪ {x + 1 : x ∈ S}.

4. (UCLA Putnam preparation class) Let the vertices be labelled clockwise $a_1, a_2, \dots, a_{2n+1}$.
   Without loss of generality, let a1 be one of the points. There are $\binom{2n}{2}$ possibilities for
the remaining points. If a2 is the nearest point chosen to a1 , going clockwise around
the polygon, then there is only one point (an+2 ) that can complete a triangle that
encloses the center. If a3 is the nearest point to a1 then two points (an+2 and an+3 )
can complete a triangle. In general, if ak (k ≥ 2) is the nearest point chosen to a1 ,
going clockwise around the polygon, then there are k − 1 points (an+2 through an+k )
that can complete a triangle that encloses the center. The largest value k can take is
n + 1. So the number of pairs of points that can complete a good triangle with a1 is
   $1 + 2 + \cdots + n = \binom{n+1}{2}$, and the required probability is
   \[ \frac{\binom{n+1}{2}}{\binom{2n}{2}} = \frac{n+1}{2(2n-1)}. \]

Notice that this tends to 1/4 as n tends to infinity, suggesting the following: if a circle
of radius 1 is given, a fixed point x on the perimeter is marked, two numbers a, b
are generated uniformly from the interval [0, 2π], and two points are marked on the
perimeter of the circle, one at arc-distance a from x and the other at arc distance b
(measured in a clockwise direction), then the probability that the center of the circle
lies in the triangle formed by the three marked points should be 1/4. Does this make
intuitive sense?
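(Not in the original.) The exact probability can be checked for small n by enumerating all vertex triples; the enumeration below uses the fact, assumed for this sketch only, that the centre lies inside the triangle exactly when no arc between consecutive chosen vertices covers more than half of the vertices.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def centre_probability(n):
    m = 2 * n + 1
    good = 0
    for i, j, k in combinations(range(m), 3):
        gaps = (j - i, k - j, m - (k - i))          # arc lengths, in vertex steps
        if all(g <= n for g in gaps):               # every arc is less than half the polygon
            good += 1
    return Fraction(good, comb(m, 3))

for n in range(1, 8):
    assert centre_probability(n) == Fraction(n + 1, 2 * (2 * n - 1))
```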

5. (Putnam competition, 2000, problem B2) We know that gcd(m, n) = am + bn for some
   integers a, b; but then
   \[ \frac{\gcd(m,n)}{n}\binom{n}{m} = a\,\frac{m}{n}\binom{n}{m} + b\,\binom{n}{m}. \]
   Since
   \[ \frac{m}{n}\binom{n}{m} = \binom{n-1}{m-1} \]
   (the committee-chair identity again, or easy algebra), we get
   \[ \frac{\gcd(m,n)}{n}\binom{n}{m} = a\binom{n-1}{m-1} + b\binom{n}{m}, \]
   and so (since a, b, $\binom{n-1}{m-1}$ and $\binom{n}{m}$ are all integers) we get the desired result.

6. (Putnam competition, 2005, problem B4) We'll produce an explicit formula for f (m, n)
   that is symmetric in m and n. Each tuple counted by f (m, n) has k entries that are
   non-zero, for some k ≥ 0; once k has been decided on, there are $\binom{n}{k}$ ways to decide on
   the locations of the k non-zeros. Each of these k entries has a sign (±); there are $2^k$
   ways to decide on the signs. To complete the specification of the tuple, we have to find
   k numbers a1, . . . , ak, all strictly positive, that add to at most m. Adding a phantom
   (k + 1)st number, this is the same as the number of (k + 1)-tuples (a1, . . . , ak+1), all
   strictly positive integers, that add to exactly m + 1; i.e., the number of compositions
   of m + 1 into k + 1 parts, so $\binom{(m+1)-1}{(k+1)-1} = \binom{m}{k}$. So
   \[ f(m, n) = \sum_{k \ge 0} 2^k \binom{n}{k} \binom{m}{k}. \]

Since this is symmetric in m and n, f (m, n) = f (n, m).

Note: f (m, n) is also the number of ways of moving from (0, 0) to (m, n) on the two-
dimensional lattice by taking steps of the following kind: one unit up parallel to y-axis,
one unit to the right parallel to the x-axis, or one diagonal unit (Euclidean distance
$\sqrt{2}$) up and to the right, parallel to the line x = y; the numbers f (m, n) are called the
Delannoy numbers.
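(Not in the original.) For small m and n, the formula and the symmetry f(m, n) = f(n, m) can be verified against a brute-force count of the lattice points.

```python
from itertools import product
from math import comb

def f_bruteforce(m, n):
    """Number of integer n-tuples with |x_1| + ... + |x_n| <= m."""
    return sum(1 for t in product(range(-m, m + 1), repeat=n)
               if sum(abs(x) for x in t) <= m)

def f_formula(m, n):
    return sum(2 ** k * comb(n, k) * comb(m, k) for k in range(min(m, n) + 1))

for m in range(5):
    for n in range(1, 4):
        assert f_bruteforce(m, n) == f_formula(m, n) == f_formula(n, m)
```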

5 Week five (September 23) — Inequalities
Many Putnam problems involve showing that a particular inequality between two expressions
holds always, or holds under certain circumstances. There are a huge variety of general
inequalities between sets of numbers satisfying certain conditions, that are quite reasonable
for you to quote as “well-known”. I’ve listed some of them here, mostly without proofs. If you
are interested in knowing more about inequalities, consider looking at the book Inequalities
by Hardy, Littlewood and Pólya (QA 303 .H223i at the math library).

Squares are positive


Surprisingly many inequalities reduce to the obvious fact that x2 ≥ 0 for all real x, with
equality iff x = 0. I’ll highlight one example in what follows.

The triangle inequality


For real or complex x and y, |x + y| ≤ |x| + |y| (called the triangle inequality because it
says that the distance travelled along the line in going from x to −y — |x + y| — does not
decrease if we demand that we go through the intermediate point 0).

Arithmetic mean — Geometric mean — Harmonic mean inequality


For positive a1, . . . , an,
\[ \frac{n}{\frac{1}{a_1} + \cdots + \frac{1}{a_n}} \;\le\; \sqrt[n]{a_1 \cdots a_n} \;\le\; \frac{a_1 + \cdots + a_n}{n}, \]

with equalities in both inequalities iff all ai are equal. The three expressions above are the
harmonic mean, the geometric mean and the arithmetic mean of the ai .

For n = 2, here's a proof of the second inequality: $\sqrt{a_1a_2} \le (a_1+a_2)/2$ iff $4a_1a_2 \le (a_1+a_2)^2$ iff $a_1^2 - 2a_1a_2 + a_2^2 \ge 0$ iff $(a_1-a_2)^2 \ge 0$, which is true by the “squares are positive”
inequality; there's equality all along iff $a_1 = a_2$.

For n = 2 the first inequality is equivalent to $\sqrt{a_1a_2} \le (a_1+a_2)/2$.

Power means inequality


For a non-zero real r and positive a1, . . . , an define
\[ M^r(a_1, \dots, a_n) = \left( \frac{a_1^r + \cdots + a_n^r}{n} \right)^{1/r}, \]
and set $M^0(a_1, \dots, a_n) = \sqrt[n]{a_1 \cdots a_n}$. For real numbers r < s,
\[ M^r(a_1, \dots, a_n) \le M^s(a_1, \dots, a_n) \]

with equality iff all ai are equal.

Notice that $M^{-1}(a_1, \dots, a_n)$ is the harmonic mean of the ai's, and $M^1(a_1, \dots, a_n)$ is their
arithmetic mean, so this inequality generalizes the Arithmetic mean — Geometric mean —
Harmonic mean inequality.
There is a weighted power means inequality: let w1, . . . , wn be positive reals that add to
1, and define
\[ M_w^r(a_1, \dots, a_n) = (w_1 a_1^r + \cdots + w_n a_n^r)^{1/r} \]
for non-zero real r, with $M_w^0(a_1, \dots, a_n) = a_1^{w_1} \cdots a_n^{w_n}$. For real numbers r < s,
\[ M_w^r(a_1, \dots, a_n) \le M_w^s(a_1, \dots, a_n). \]


(This reduces to the power means inequality when all wi = 1/n.)

Cauchy-Schwarz-Bunyakovsky inequality
Let x1, . . . , xn and y1, . . . , yn be real numbers. We have
\[ (x_1y_1 + \cdots + x_ny_n)^2 \le (x_1^2 + \cdots + x_n^2)(y_1^2 + \cdots + y_n^2). \]

Equality holds if one of the sequences (x1 , . . . , xn ), (y1 , . . . , yn ) is identically zero. If both are
not identically zero, then there is equality iff there is some real number λ such that xi = λyi
for each i.
This is really a very general inequality: if you are familiar with inner products from linear
algebra, the Cauchy-Schwarz-Bunyakovsky inequality really says that if x, y are vectors in
an inner product space (over either the reals or the complex numbers) then
\[ |\langle x, y\rangle|^2 \le \langle x, x\rangle\, \langle y, y\rangle. \]
Equivalently,
\[ |\langle x, y\rangle| \le \|x\|\, \|y\|. \]
There is equality iff x and y are linearly dependent.

Hölder’s inequality
Fix p > 1 and define q by 1/p + 1/q = 1. Let x1 , . . . , xn and y1 , . . . , yn be real numbers. We
have
|x1 y1 + . . . + xn yn | ≤ (|x1 |p + . . . + |xn |p )1/p (|y1 |q + . . . + |yn |q )1/q .
Notice that Hölder becomes Cauchy-Schwarz-Bunyakovsky in the case p = 2.

Chebyshev’s sum inequality


If a1 ≥ . . . ≥ an and b1 ≥ . . . ≥ bn are sequences of reals, then
\[ \frac{a_1b_1 + \cdots + a_nb_n}{n} \ge \left( \frac{a_1 + \cdots + a_n}{n} \right)\left( \frac{b_1 + \cdots + b_n}{n} \right). \]
The same holds if a1 ≤ . . . ≤ an and b1 ≤ . . . ≤ bn; if either a1 ≥ . . . ≥ an and b1 ≤ . . . ≤ bn
or a1 ≤ . . . ≤ an and b1 ≥ . . . ≥ bn, then
\[ \frac{a_1b_1 + \cdots + a_nb_n}{n} \le \left( \frac{a_1 + \cdots + a_n}{n} \right)\left( \frac{b_1 + \cdots + b_n}{n} \right). \]

The rearrangement inequality
If a1 ≤ . . . ≤ an and b1 ≤ . . . ≤ bn are sequences of reals, and aπ(1) , . . . , aπ(n) is a permutation
(rearrangement) of a1 ≤ . . . ≤ an , then

an b1 + . . . + a1 bn ≤ aπ(1) b1 + . . . + aπ(n) bn ≤ a1 b1 + . . . + an bn .

If a1 < . . . < an and b1 < . . . < bn , then there is equality in the first inequality iff π is the
reverse permutation π(i) = n + 1 − i, and there is equality in the second inequality iff π is
the identity permutation π(i) = i.

Jensen’s inequality
A real function f (x) is convex on the interval [c, d] if for all c ≤ a < b ≤ d, the line segment
joining (a, f (a)) to (b, f (b)) lies entirely above the graph y = f (x) on the interval (a, b), or
equivalently, if for all 0 ≤ t ≤ 1 we have

f ((1 − t)a + tb) ≤ (1 − t)f (a) + tf (b).

If f (x) is convex on the interval [c, d], and c ≤ a1 ≤ . . . ≤ an ≤ d, then
\[ f\!\left( \frac{a_1 + \cdots + a_n}{n} \right) \le \frac{f(a_1) + \cdots + f(a_n)}{n} \]

(note that when n = 2, this is just the definition of convexity).


We say that f (x) is concave on [c, d] if for all c ≤ a < b ≤ d, and for all 0 ≤ t ≤ 1, we
have
f ((1 − t)a + tb) ≥ (1 − t)f (a) + tf (b).
If f (x) is concave on the interval [c, d], and c ≤ a1 ≤ . . . ≤ an ≤ d, then
\[ f\!\left( \frac{a_1 + \cdots + a_n}{n} \right) \ge \frac{f(a_1) + \cdots + f(a_n)}{n}. \]

As an example, consider the convex function $f(x) = x^2$; for this function Jensen says
that
\[ \left( \frac{a_1 + \cdots + a_n}{n} \right)^2 \le \frac{a_1^2 + \cdots + a_n^2}{n}, \]
which is equivalent to the power means inequality $M^1(a_1,\dots,a_n) \le M^2(a_1,\dots,a_n)$; and
when f (x) = − ln x we get
\[ \sqrt[n]{a_1 \cdots a_n} \le \frac{a_1 + \cdots + a_n}{n}, \]
the AM-GM inequality.

Two miscellaneous comments
1. Maximization/minimization problems are often problems about inequalities in disguise.
For example, to find the minimum of f (a, b) as (a, b) ranges over a set R, it is enough
to first guess that the minimum is m, then find an (a, b) ∈ R with f (a, b) = m, and
then use inequalities to show that f (a, b) ≥ m for all (a, b) ∈ R.

2. If an expression is presented as a sum of n squares, it is sometimes helpful to think of


it as the (square of the) distance between two points in n dimensional space, and then
think of the problem geometrically.

Some warm-up problems


You should find that these are all fairly easy to prove by direct applications of an appropriate
inequality from the list above.
1. $n! < \left( \frac{n+1}{2} \right)^n$ for n = 2, 3, 4, . . ..
2. $\sqrt{3(a+b+c)} \ge \sqrt{a} + \sqrt{b} + \sqrt{c}$ for positive a, b, c.

3. Minimize x1 + . . . + xn subject to xi ≥ 0 and x1 . . . xn = 1.

4. Minimize
   \[ \frac{x^2}{y+z} + \frac{y^2}{z+x} + \frac{z^2}{x+y} \]
   subject to x, y, z ≥ 0 and xyz = 1.

5. If a triangle has side lengths a, b, c and opposite angles (measured in radians) A, B, C,
   then
   \[ \frac{aA + bB + cC}{a+b+c} \ge \frac{\pi}{3}. \]
6. Identify which is bigger:

1999!(2000) or 2000!(1999) .

(Here n!(k) indicates iterating the factorial function k times, so for example 4!(2) = 24!.)

7. Identify which is bigger:
   \[ 1999^{1999} \quad \text{or} \quad 2000^{1998}. \]

8. Minimize
   \[ \frac{\sin^3 x}{\cos x} + \frac{\cos^3 x}{\sin x} \]
   on the interval 0 < x < π/2.

5.1 Week five problems
1. Show that for non-negative reals a1 , . . . , an and b1 , . . . , bn ,

(a1 . . . an )1/n + (b1 . . . bn )1/n ≤ ((a1 + b1 ) . . . (an + bn ))1/n .

2. For positive integers m, n, show
   \[ \frac{(m+n)!}{(m+n)^{m+n}} < \frac{m!}{m^m} \cdot \frac{n!}{n^n}. \]

3. Find the minimum value over all real x of

| sin x + cos x + tan x + cot x + sec x + csc x|.

4. Show that for every integer n > 1,
   \[ \frac{1}{2ne} < \frac{1}{e} - \left( 1 - \frac{1}{n} \right)^n < \frac{1}{ne}. \]

5. Suppose that a1, a2, . . . is a sequence of positive real numbers such that
   \[ \sum_{n \ge 1} \frac{1}{a_n} \]
   converges. Show that
   \[ \sum_{n \ge 1} \frac{n^2 a_n}{(a_1 + \cdots + a_n)^2} \]
   also converges.

6. Maximize
   \[ \int_0^y \sqrt{x^4 + (y-y^2)^2}\, dx \]
   on the interval [0, 1].

5.2 Week five solutions
Some warm-up problems
1. Use the geometric mean - arithmetic mean inequality, with (a1 , . . . , an ) = (1, . . . , n).

2. Use the power means inequality, with (a1 , a2 , a3 ) = (a, b, c) and r = 1/2, s = 1.

3. Guess: the minimum is n, achieved when all xi = 1. Then use the geometric mean -
   arithmetic mean inequality to show
   \[ \frac{x_1 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 \cdots x_n} = 1 \]
   for positive xi satisfying x1 . . . xn = 1.


4. Apply Cauchy-Schwarz with the vectors $\left( \sqrt{y+z}, \sqrt{z+x}, \sqrt{x+y} \right)$ and $\left( \frac{x}{\sqrt{y+z}}, \frac{y}{\sqrt{z+x}}, \frac{z}{\sqrt{x+y}} \right)$
   to get
   \[ (x+y+z)^2 \le \left( \frac{x^2}{y+z} + \frac{y^2}{z+x} + \frac{z^2}{x+y} \right) \cdot 2(x+y+z), \]
   leading to
   \[ \frac{x^2}{y+z} + \frac{y^2}{z+x} + \frac{z^2}{x+y} \ge \frac{x+y+z}{2}. \]
   By the AM-GM inequality,
   \[ \frac{x+y+z}{3} \ge \sqrt[3]{xyz} = 1, \]
   so
   \[ \frac{x^2}{y+z} + \frac{y^2}{z+x} + \frac{z^2}{x+y} \ge \frac{3}{2}. \]
   This lower bound can be achieved by taking x = y = z = 1, so the minimum is 3/2.

5. Assume, without loss of generality, that a ≤ b ≤ c. Then also A ≤ B ≤ C, so by
   Chebyshev,
   \[ \frac{aA+bB+cC}{3} \ge \left( \frac{a+b+c}{3} \right)\left( \frac{A+B+C}{3} \right) = \left( \frac{a+b+c}{3} \right) \frac{\pi}{3}, \]
   from which the result follows.

6. For n ≥ 1, n! is increasing in n (1 ≤ n < m implies n! < m!). So, starting from the
easy
1999! > 2000,
apply the factorial function 1999 more times to get

1999!(2000) > 2000!(1999) .

7. Consider $f(x) = (1999-x)\ln(1999+x)$. We have $e^{f(0)} = 1999^{1999}$ and $e^{f(1)} = 2000^{1998}$,
   so we want to see what f does on the interval [0, 1]: increase or decrease? The derivative
   is
   \[ f'(x) = -\ln(1999+x) + \frac{1999-x}{1999+x}, \]
   which is negative on [0, 1] (since, for example,
   \[ \frac{1999-x}{1999+x} \le 1 = \ln e < \ln(1999+x) \]
   on that interval). So $2000^{1998} < 1999^{1999}$.

8. We can use the rearrangement inequality on the pairs $(\sin^3 x, \cos^3 x)$ (which satisfies
   $\sin^3 x \le \cos^3 x$ on [0, π/4], and $\sin^3 x \ge \cos^3 x$ on [π/4, π/2]) and $(1/\cos x, 1/\sin x)$
   (which also satisfies $1/\cos x \le 1/\sin x$ on [0, π/4], and $1/\cos x \ge 1/\sin x$ on [π/4, π/2]),
   to get
   \[ \frac{\sin^3 x}{\cos x} + \frac{\cos^3 x}{\sin x} \ge \frac{\sin^3 x}{\sin x} + \frac{\cos^3 x}{\cos x} = \sin^2 x + \cos^2 x = 1 \]
   on the whole interval. Since 1 can be achieved (at x = π/4) the minimum is 1.

Problems
1. (Putnam competition 2003 problem A2) If any ai is 0, the result is trivial, so we may
   assume all ai > 0. Dividing through by $(a_1 \cdots a_n)^{1/n}$ and writing $c_i = b_i/a_i$, the inequality
   becomes
   \[ 1 + (c_1 \cdots c_n)^{1/n} \le ((1+c_1) \cdots (1+c_n))^{1/n} \]
   for ci ≥ 0. Raising both sides to the power n, this is the same as


\[ \sum_{k=0}^{n} \binom{n}{k} (c_1 \cdots c_n)^{k/n} \le \sum_{k=0}^{n} e_k \]

where ek is the sum of the products of the ci ’s, taken k at a time. So it is enough to
show that for each k,
 
\[ \binom{n}{k} (c_1 \cdots c_n)^{k/n} \le \sum_{A \subseteq \{1,\dots,n\},\, |A|=k} \;\prod_{i \in A} c_i. \]

We apply the AM-GM inequality to the numbers $\prod_{i \in A} c_i$ as A ranges over all subsets
   of size k of {1, . . . , n}. Note that each ci appears in exactly $\binom{n-1}{k-1}$ of these numbers.
   So we get
   \[ (c_1 \cdots c_n)^{\binom{n-1}{k-1}/\binom{n}{k}} \le \frac{\sum_{A \subseteq \{1,\dots,n\},\, |A|=k} \prod_{i \in A} c_i}{\binom{n}{k}}. \]

Since $\binom{n-1}{k-1}/\binom{n}{k} = k/n$, this is the same as
   \[ (c_1 \cdots c_n)^{k/n} \le \frac{\sum_{A \subseteq \{1,\dots,n\},\, |A|=k} \prod_{i \in A} c_i}{\binom{n}{k}}, \]
   which is exactly what we wanted to show.

Much quicker solution, shown in class: If there is any i for which ai + bi = 0,
   then the inequality trivially holds. If not, divide both sides by the right-hand side to
   get the equivalent inequality
   \[ \left( \prod_{i=1}^{n} \frac{a_i}{a_i+b_i} \right)^{\!1/n} + \left( \prod_{i=1}^{n} \frac{b_i}{a_i+b_i} \right)^{\!1/n} \le 1. \]
   Applying the arithmetic mean - geometric mean inequality to both terms on the left-hand
   side, we find that the left-hand side is at most
   \[ \frac{1}{n} \sum_{i=1}^{n} \frac{a_i}{a_i+b_i} + \frac{1}{n} \sum_{i=1}^{n} \frac{b_i}{a_i+b_i}, \]
   which is the same as
   \[ \frac{1}{n} \sum_{i=1}^{n} \frac{a_i+b_i}{a_i+b_i}, \]
   which is indeed at most (in fact exactly) 1.

2. (Putnam competition 2004 problem B2) Rearranging, this is the same as
   \[ \binom{m+n}{m} \left( \frac{m}{m+n} \right)^m \left( \frac{n}{m+n} \right)^n < 1. \]
   This suggests looking at the binomial expansion
   \[ \left( \frac{m}{m+n} + \frac{n}{m+n} \right)^{m+n}. \]
   The whole binomial expansion sums to 1; one term of the expansion is
   \[ \binom{m+n}{m} \left( \frac{m}{m+n} \right)^m \left( \frac{n}{m+n} \right)^n. \]

Since all terms are strictly positive, we get the required inequality.

3. (Putnam competition 2003, problem A3) We can re-express the expression inside the
   absolute value sign as a function of sin x + cos x. Indeed,
   \[
   \tan x + \cot x + \sec x + \csc x = \frac{\sin^2 x + \cos^2 x + \sin x + \cos x}{\sin x \cos x}
   = \frac{\sin x + \cos x + 1}{\sin x \cos x}
   = \frac{(\sin x + \cos x + 1)(\sin x + \cos x - 1)}{\sin x \cos x\,(\sin x + \cos x - 1)}
   = \frac{(\sin x + \cos x)^2 - 1}{\sin x \cos x\,(\sin x + \cos x - 1)}
   = \frac{2}{\sin x + \cos x - 1}.
   \]
So, setting t = sin x + cos x − 1, we seek the minimum of |1 + t + 2/t| over all real x.
   Elementary calculus tells us that the extreme values of sin x + cos x − 1 are achieved
   where sin x = cos x, and that as x ranges over real values, t = sin x + cos x − 1 ranges
   from $-\sqrt{2}-1$ to $\sqrt{2}-1$; so we only consider t in this range.
   More elementary calculus tells us that f(t) = 1 + t + 2/t is positive and decreasing
   for all $t \in (0, \sqrt{2}-1]$, with minimum $\sqrt{2} + 2/(\sqrt{2}-1) = 2 + 3\sqrt{2}$ achieved uniquely
   at $t = \sqrt{2}-1$, and that on $t \in [-\sqrt{2}-1, 0)$ it is negative with maximum $1 - 2\sqrt{2}$
   achieved uniquely at $t = -\sqrt{2}$ (it's undefined at t = 0).
   Since $2\sqrt{2}-1 < 3\sqrt{2}+2$, the global minimum of |f(t)| on $-\sqrt{2}-1 \le t \le \sqrt{2}-1$ is
   achieved at $t = -\sqrt{2}$, which corresponds to x = 5π/4 and
   \[ |\sin x + \cos x + \tan x + \cot x + \sec x + \csc x| = 2\sqrt{2} - 1. \]

4. (Putnam competition 2002 B3) Start with the Taylor series
   \[ \log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots, \]
   valid for |x| < 1. Plugging in x = −1/n (valid since n > 1) and re-arranging, we get
   \[ \frac{1}{e} - \left( 1 - \frac{1}{n} \right)^n = \frac{1}{e}\left( 1 - e^{-\frac{1}{2n} - \frac{1}{3n^2} - \frac{1}{4n^3} - \cdots} \right). \]

For the upper bound, we need
   \[ 1 - e^{-\frac{1}{2n} - \frac{1}{3n^2} - \frac{1}{4n^3} - \cdots} < \frac{1}{n}. \]
Now we use the inequality $e^{-x} > 1 - x$, valid for all x > 0 (it follows from the fact that the
   (convergent) Taylor series for $e^{-x}$ for x > 0 is alternating, and so $e^{-x}$ is bounded below
   by any partial sum that stops immediately after a negative term, and so in particular
   it is bounded below by 1 − x). Applying with x = 1/n we get
   \[ \frac{1}{n} > 1 - e^{-\frac{1}{n}}; \]
so for the upper bound, it is enough to show
   \[ 1 - e^{-\frac{1}{n}} > 1 - e^{-\frac{1}{2n} - \frac{1}{3n^2} - \frac{1}{4n^3} - \cdots} \]
   or
   \[ \frac{1}{2} + \frac{1}{3n} + \frac{1}{4n^2} + \frac{1}{5n^3} + \cdots \le 1. \]
   One way to show this would be to show that $1/3n \le 1/4$, $1/4n^2 \le 1/8$, $1/5n^3 \le 1/16$,
   etc., so the whole sum on the left-hand side is bounded above by the geometric series
   1/2 + 1/4 + 1/8 + . . . = 1. To make this work, we need to show that for all n ≥ 2 and
   k ≥ 2, we have
   \[ \frac{1}{(k+1)n^{k-1}} \le \frac{1}{2^k}, \]
   or
   \[ (k+1)n^{k-1} \ge 2^k. \]
   But since for k ≥ 2 we have k + 1 > 2, this is implied by $n^{k-1} \ge 2^{k-1}$, which is trivially
   true in the range under consideration.
For the lower bound, we need
   \[ \frac{1}{2n} < 1 - e^{-\frac{1}{2n} - \frac{1}{3n^2} - \frac{1}{4n^3} - \cdots}. \]
   Now we use the alternating series trick again, this time in the form
   \[ e^{-x} < 1 - x + \frac{x^2}{2}. \]
   Applying with $x = \frac{1}{2n} + \frac{1}{3n^2} + \cdots$, we reduce the problem to that of showing
   \[ \frac{\left( \frac{1}{2n} + \frac{1}{3n^2} + \cdots \right)^2}{2} \le \frac{1}{3n^2} + \frac{1}{4n^3} + \cdots, \]
   which is implied by
   \[ \left( \frac{1}{2} + \frac{1}{3n} + \frac{1}{4n^2} + \cdots \right)^2 \le \frac{2}{3}. \]
   For n ≥ 3, this is easy; we can bound the left-hand side by
   \[ \left( \frac{1}{2} + \frac{1}{9} + \frac{1}{27} + \cdots \right)^2 = \frac{4}{9} < \frac{2}{3}, \]
   completing the lower bound for n ≥ 3. For n = 2, we just observe directly that the lower
   bound asserts e < 3, which is certainly true.

Remark: This is a hideously long and cumbersome solution! It was the best I could
come up with on the fly, and was obtained by “following my nose”. The official solutions
give a much slicker approach!

5. (University of Massachusetts Putnam preparation class) A natural approach is to
   apply the arithmetic mean - harmonic mean inequality to find that
   \[ \frac{n^2 a_n}{(a_1 + \cdots + a_n)^2} \le \frac{a_n \left( \sum_{k=1}^{n} \frac{1}{a_k} \right)^2}{n^2}. \]

Now use the fact that $\sum_{n \ge 1} \frac{1}{a_n}$ converges, say to some limit S; since the terms are all
   positive we have $\sum_{k=1}^{n} \frac{1}{a_k} \le S$, and so
   \[ \frac{n^2 a_n}{(a_1 + \cdots + a_n)^2} \le \frac{a_n S^2}{n^2}. \]

So the convergence of the series whose convergence we wish to show follows from the
   convergence of
   \[ \sum_{n \ge 1} \frac{a_n}{n^2}. \]

But unfortunately, if $\sum_{n \ge 1} \frac{1}{a_n}$ converges, then it can't be the case that $\sum_{n \ge 1} \frac{a_n}{n^2}$ converges
   — this is a nice problem that may well appear on a later problem sheet!
   We need to give less away in our initial inequality ...
6. (Putnam competition 1991 A5) Let $f(y) = \int_0^y \sqrt{x^4 + (y-y^2)^2}\, dx$. At y = 0, f (y) = 0,
   and at y = 1, $f(y) = \int_0^1 x^2\, dx = 1/3$. Also, f is non-negative on [0, 1]. It looks like
it will be impossible to evaluate f (y) at any value other than y = 0, 1; this strongly
leads to the suspicion that the maximum is at y = 1. To prove this, we need to show
that f 0 (y) ≥ 0 on the interval [0, 1]. A process by which this can be done is outlined
in Kedlaya, Poonen & Vakil, The William Lowell Putnam Mathematical Competition
1985—2000; Problems, Solutions and Commentary, page 138.
Here's a (really) slick solution from the same source: Since $x^2$ and $y - y^2$ are positive
   in the range 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, we have
   \[ x^4 + (y-y^2)^2 \le x^4 + 2x^2(y-y^2) + (y-y^2)^2 = (x^2 + (y-y^2))^2, \]
   and so
   \[ \int_0^y \sqrt{x^4 + (y-y^2)^2}\, dx \le \int_0^y \left( x^2 + (y-y^2) \right) dx = y^2 - \frac{2}{3}y^3. \]
   The derivative of $y^2 - \frac{2}{3}y^3$ is $2y - 2y^2 = 2y(1-y)$, which is non-negative on [0, 1],
   so the maximum is at y = 1, where it is 1/3. So
   \[ \int_0^y \sqrt{x^4 + (y-y^2)^2}\, dx \le 1/3; \]

and 1/3 is achievable, at y = 1.

6 Week six (September 30) — Polynomials
This week's problems are all about polynomials, which come up in virtually every Putnam
competition.

Things to know about polynomials


• Fundamental Theorem of Algebra: Every polynomial p(x) = xn + a1 xn−1 +
a2 xn−2 + . . . + an−1 x + an , with real or complex coefficients, has a root in the complex
numbers, that is, there is c ∈ C such that p(c) = 0.

• Factorization: In fact, every polynomial $p(x) = x^n + a_1x^{n-1} + a_2x^{n-2} + \cdots + a_{n-1}x + a_n$,
with real or complex coefficients, has exactly n roots, in the sense that there is a vector
(c1 , . . . , cn ) (perhaps with some repetitions) such that

p(x) = (x − c1 )(x − c2 ) . . . (x − cn ).

If a c appears in this vector exactly k times, it is called a root or zero of multiplicity k.


The next bullet point gives a very useful consequence of this.

• Two different polynomials of the same degree can’t agree too often: If p(x)
and q(x) (over R or C) both have degree at most n, and there are n+1 distinct numbers
x1 , . . . , xn+1 such that p(xi ) = q(xi ) for i = 1, . . . , n + 1, then p(x) and q(x) are equal
for all x. [Because then p(x) − q(x) is a polynomial of degree at most n with at least
n + 1 roots, so must be identically zero].

• Complex conjugates: If the coefficients of $p(x) = x^n + a_1x^{n-1} + a_2x^{n-2} + \cdots + a_{n-1}x + a_n$
  are all real, then the complex roots occur in complex-conjugate pairs: if s + it (with
  s, t real, and $i = \sqrt{-1}$) is a root, then s − it is also a root.

• Coefficients in terms of roots: If (c1 , . . . , cn ) is the vector of roots of a polynomial


$p(x) = x^n + a_1x^{n-1} + a_2x^{n-2} + \cdots + a_{n-1}x + a_n$ (over R or C), then each of the coefficients
can be expressed simply in terms of the roots: a1 is the negative of the sum of the ci ’s;
a2 is the sum of the products of the ci ’s, taken two at a time, a3 is the negative of the
sum of the products of the ci ’s, taken three at a time, etc. Concisely:
\[ a_k = (-1)^k \sum_{A \subseteq \{1,\dots,n\},\, |A|=k} \;\prod_{i \in A} c_i. \]

• Elementary symmetric polynomials: The kth elementary symmetric polynomial


in variables x1, . . . , xn is
  \[ \sigma_k = \sum_{A \subseteq \{1,\dots,n\},\, |A|=k} \;\prod_{i \in A} x_i \]

(these polynomials have already appeared in the last bullet point). A polynomial
p(x1 , . . . , xn ) in n variables is symmetric if for every permutation π of {1, . . . , n}, we
have
p(x1 , . . . , xn ) ≡ p(xπ(1) , . . . , xπ(n) ).

(For example, $x_1^2 + x_2^2 + x_3^2 + x_4^2$ is symmetric, but $x_1^2 + x_2^2 + x_3^2 + x_1x_4$ is not.) Every
  symmetric polynomial in variables x1, . . . , xn can be expressed as a polynomial in
  the σk's.

• Some special values tell things about the coefficients: (Rather obvious, but
worth keeping in mind) If $p(x) = a_0x^n + a_1x^{n-1} + a_2x^{n-2} + \cdots + a_{n-1}x + a_n$, then

p(0) = an
p(1) = a0 + a1 + a2 + . . . + an
p(−1) = an − an−1 + an−2 − an−3 + . . . + (−1)n a0 .

• Intermediate value theorem: If p(x) is a polynomial with real coefficients (or in


fact any continuous real function) such that for some a < b, p(a) and p(b) have different
signs, then there is some c, a < c < b, with p(c) = 0.

• Lagrange interpolation: Suppose that p(x) is a real polynomial of degree n, whose


graph passes through the points (x0 , y0 ), (x1 , y1 ), . . ., (xn , yn ). Then we can write
\[ p(x) = \sum_{i=0}^{n} y_i \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}. \]

• The Rational Roots theorem: Suppose that p(x) is a polynomial of degree n with
integer coefficients, and that x is a rational root a/b with a and b having no common
factors. Then the leading coefficient of p(x) (the coefficient of xn ) is a multiple of b,
and the constant term is a multiple of a. An immediate corollary of this is that if p(x)
is a monic polynomial (integer coefficients, leading coefficient 1), then any rational
root must in fact be an integer; conversely, if a real number x is a root of a monic
  polynomial but is not an integer, it must be irrational (for example, $\sqrt{2}$ is a root of the
  monic $x^2 - 2$, but is clearly not an integer, so it must be irrational)!

• Gauss’ lemma: Here is a weak form of Gauss’ lemma, but one that is very useful: if
c is an integer root of a monic polynomial p(x) (integer coefficients, leading coefficient
1), then p(x) factors as (x − c)q(x), where q(x) is also a monic polynomial (the surprise
being not that q(x) has leading coefficient 1, but that it has all integer coefficients).

• One more fact about integer polynomials: Let p(x) be a (not necessarily monic)
polynomial of degree n with integer coefficients. For any integers a, b,

(a − b)|(p(a) − p(b)).

(So also,
(p(a) − p(b))|(p(p(a)) − p(p(b))),
etc.)

6.1 Week six problems
1. Let p(x) be a polynomial of degree n that satisfies p(k) = k/(k+1) for k = 0, 1, 2, . . . , n.
Find p(n + 1).

2. The product of two of the four zeros of the quartic

$x^4 - 18x^3 + kx^2 + 200x - 1984 = 0$

is −32. Find k.

3. (a) Determine all polynomials p(x) such that p(0) = 0 and p(x + 1) = p(x) + 1 for all
x.
(b) Determine all polynomials p(x) such that p(0) = 0 and p(x2 + 1) = (p(x))2 + 1
for all x.

4. Determine, with proof, all positive integers n for which there is a polynomial p(x) of
degree n satisfying the following three conditions:

(a) p(k) = k for k = 1, 2, . . . , n,


(b) p(0) is an integer, and
(c) p(−1) = 2013.

5. Let p(x) = xn + an−1 xn−1 + . . . + a1 x + a0 be a polynomial with integer coefficients.


Suppose that there exist four distinct integers a, b, c, d with p(a) = p(b) = p(c) =
p(d) = 5. Prove that there is no integer k with p(k) = 8.

6. Is there an infinite sequence a0 , a1 , a2 , . . . of nonzero real numbers such that for n =


1, 2, 3, . . . the polynomial

pn (x) = a0 + a1 x + a2 x2 + . . . + an xn

has exactly n distinct real roots?

6.2 Week six solutions
1. (Northwestern Putnam preparation class) Knowing n + 1 values, we could uniquely
determine the polynomial via Lagrange interpolation, and then evaluate at n + 1, but
there is an easier way involving changing one’s point of view. Consider the expression
(k + 1)p(k) − k, which we know evaluates to 0 for each of k = 0, 1, . . . , n. This is the
same as saying that the polynomial q(x) = (x + 1)p(x) − x, a degree n + 1 polynomial,
has zeros at x = 0, 1, . . . , n. It follows that
q(x) = (x + 1)p(x) − x = Cx(x − 1)(x − 2)(x − 3) . . . (x − n)
for some constant C ≠ 0. To figure out what C must be, evaluate both sides at x = −1
   to get
   \[ 1 = C(-1)^{n+1}(n+1)!. \]
   Evaluating at x = n + 1, we get
   \[ (n+2)p(n+1) - (n+1) = \frac{(-1)^{n+1}}{(n+1)!}\,(n+1)n(n-1)\cdots(1), \]
   or
   \[ (n+2)p(n+1) = (n+1) + (-1)^{n+1}, \]
   and so
   \[ p(n+1) = \frac{n+1+(-1)^{n+1}}{n+2}. \]
2. (Zeitz, The Art and Craft of Problem Solving) Let the zeroes be a, b, c, d; we have
a+b+c+d = 18
ab + ac + ad + bc + bd + cd = k
abc + abd + acd + bcd = −200
abcd = −1984.
Without loss of generality, ab = −32, so cd = 62, and the system becomes
a + b + c + d = 18
30 + ac + ad + bc + bd = k
−32c + −32d + 62a + 62b = −200.
We can re-write this as
[a + b] + [c + d] = 18
30 + [a + b][c + d] = k
−32[c + d] + 62[a + b] = −200.
The first and third equations form a two-by-two system in unknowns [a + b], [c + d],
with solution
[a + b] = 4, [c + d] = 14,
so
k = 30 + (4)(14) = 86.

3. (Modified from Putnam competition, 1971 problem A2)

(a) By induction, p(x) = x for all positive integers x, so p(x) − x is a polynomial with
infinitely many zeros, so must be identically 0. We conclude that p(x) = x is the
only possible polynomial satisfying the given conditions.
(b) We have p(0) = 0, p(1) = p(0)2 +1 = 1, p(2) = p(1)2 +1 = 2, p(5) = p(2)2 +1 = 5,
p(26) = p(5)2 +1 = 26 and in general, by induction, if the sequence (an ) is defined
recursively by a0 = 0 and an+1 = a2n + 1, then p(an ) = an . Since the sequence
(an ) is strictly increasing, we find that there are infinitely many distinct values
x for which p(x) = x; as in the last part, this tells us that p(x) = x is the only
possible polynomial satisfying the given conditions.

4. (Illinois mock Putnam 2012) Suppose p(x) is such a polynomial; then p(x) − x is a
polynomial of degree at most n with zeros 1, . . . , n, so is of the form C(x − 1)(x −
2) . . . (x − n). Plugging in x = −1 gives

\[ 2014 = C(-1)^n(n+1)!, \]
   so $C = 2014(-1)^n/(n+1)!$. Plugging in x = 0 we get
   \[ p(0) = C(-1)^n n! = \frac{2014}{n+1}. \]
For this to be an integer, a necessary condition is that n + 1 divides 2014. Since
2014 = 2 · 19 · 53, the divisors of 2014 are 1, 2, 19, 38, 53, 106, 1007 and 2014, and the
corresponding values of n are

1, 18, 37, 52, 105, 1006, 2013.

We have shown that if such a p(x) exists, then n must be one of these seven values,
and p(x) must be of the form

\[ p(x) = x + \frac{2014(-1)^n}{(n+1)!}\,(x-1)(x-2)\cdots(x-n). \]

It is easy to check that conversely, each polynomial of this form satisfies all the given
conditions. We conclude that the list of possible n above is the complete list.

5. (Northwestern Putnam Preparation class) Set q(x) = p(x) − 5. We have q(a) = q(b) =
q(c) = q(d) = 0 and so q(x) = r(x)(x − a)(x − b)(x − c)(x − d), where r(x) is some
rational polynomial; but in fact (by Gauss’ Lemma), r(x) is a polynomial over integers.

Aside: Why is r(x) above a polynomial over integers? Suppose xn + an−1 xn−1 + . . . +
a1 x+a0 (call this expression 1), with all ai integers, factors as (x−c)(xn−1 +rn−2 xn−2 +
. . . + r1 x + r0 ) (call this expression 2), where c is an integer. Then necessarily the ri
are rational numbers; but in fact, we can show that they are all integers. This is

obvious when c = 0, so assume c 6= 0. Expanding out the factorization and equating
coefficients, we get
an−1 = rn−2 − c
an−2 = rn−3 − crn−2
an−3 = rn−4 − crn−3
···
a2 = r1 − cr2
a1 = r0 − cr1
a0 = −cr0 .
Now evaluating both expression 1 and expression 2 at x = c, we get
cn + an−1 cn−1 + . . . + a2 c2 + a1 c + a0 = 0.
Plugging in $a_0 = -cr_0$ yields
\[ c\left( c^{n-1} + a_{n-1}c^{n-2} + \cdots + a_2c + a_1 - r_0 \right) = 0. \]


Utilizing c ≠ 0, we conclude that
\[ c^{n-1} + a_{n-1}c^{n-2} + \cdots + a_2c + a_1 = r_0 \]
and so, since the left-hand side is clearly an integer, so is the right-hand side, r0. Now
plugging $a_1 = r_0 - cr_1$ into this last equation, and dividing by c, we get
cn−2 + an−1 cn−3 + . . . + a2 = r1
so r1 is also an integer. Continuing in this manner we get, for general k,
cn−(k+1) + an−1 cn−(k+2) + . . . + ak+1 = rk
for k ≤ n − 2 (this could be formally proved by induction), which allows us to conclude
that all of the ri ’s are integers.

Back to solution: Now suppose there is an integer k with p(k) = 8. Then q(k) = 3,
so r(k)(k − a)(k − b)(k − c)(k − d) = 3. Since r(k), (k − a), (k − b), (k − c) and (k − d)
are all integers, and 3 is prime, one of the five must be ±3 and the remaining four
must be ±1. It follows that at least three of (k − a), (k − b), (k − c) and (k − d) must
be ±1, and so at least two of them must take the same value; this contradicts the fact
that a, b, c and d are distinct.
6. (Putnam competition, 1990 problem B5) We can explicitly construct such a sequence.
Start with a0 = 1 and a1 = −1 (so case n = 1 works fine). We’ll construct the
ai ’s inductively, always alternating in sign. Suppose we have a0 , a1 , . . . , an−1 . The
polynomial pn−1 (x) = a0 + a1 x + a2 x2 + . . . + an−1 xn−1 has real distinct roots x1 <
. . . < xn−1 . Choose y1 , . . . , yn so that
y1 < x1 < y2 < x2 < . . . < yn−1 < xn−1 < yn .

The sequence pn−1 (y1 ), pn−1 (y2 ), . . . , pn−1 (yn ) alternates in sign (think about the graph
of y = pn−1 (x)). As long as we choose an sufficiently close to 0, the sequence pn (y1 ), pn (y2 ), . . . , pn (yn )
alternates in sign (this is by continuity). So, choose such an an . Now choose a yn+1
sufficiently large that pn (yn+1 ) has the opposite sign to pn (yn ) (this is where alternat-
ing the signs of the ai ’s comes in — such a yn+1 exists exactly because an and an−1
have opposite signs). We get that the sequence pn (y1 ), pn (y2 ), . . . , pn (yn+1 ) alternates
in sign. Hence pn (x) has n distinct real roots: one between y1 and y2 , one between y2
and y3 , etc., up to one between yn and yn+1 . This accounts for all its roots, and we are
done.

7 Week seven (October 7) — Recurrences
Definition of a recurrence relation
We met recurrences in the section on induction.
Sometimes we are either given a sequence of numbers via a recurrence relation, or we can
argue that there is such a relation that governs the growth of a sequence. A sequence (bn )n≥a
is defined via a recurrence relation if some initial values, ba , ba+1 , . . . , bk say, are given, and
then a rule is given that allows, for each n > k, bn to be computed as long as we know the
values ba , ba+1 , . . . , bn−1 .
Sequences defined by a recurrence relation, and proofs by induction, go hand-in-glove.
Here’s an illustrative example.
Example: Let an be the number of different ways of covering a 1 by n strip with 1 by 1
and 1 by 3 tiles. Prove that an < (1.5)n .
Solution: We start by figuring out how to calculate an via a recurrence. Some initial values
of an are easy to compute: for example, a1 = 1, a2 = 1 and a3 = 2. For n ≥ 4, we can
tile the 1 by n strip EITHER by first tiling the initial 1 by 1 strip with a 1 by 1 tile, and
then finishing by tiling the remaining 1 by n − 1 strip in any of the an−1 admissible ways;
OR by first tiling the initial 1 by 3 strip with a 1 by 3 tile, and then finishing by tiling the
remaining 1 by n − 3 strip in any of the an−3 admissible ways. It follows that for n ≥ 4 we
have an = an−1 + an−3. So an (for n ≥ 1) is determined by the recurrence
\[ a_n = \begin{cases} 1 & \text{if } n = 1, \\ 1 & \text{if } n = 2, \\ 2 & \text{if } n = 3, \text{ and} \\ a_{n-1} + a_{n-3} & \text{if } n \ge 4. \end{cases} \]

Notice that this gives us enough information to calculate an for all n ≥ 1: for example,
a4 = a3 + a1 = 3, a5 = a4 + a2 = 4, and a6 = a5 + a3 = 6.
Now we prove, by strong induction, that an < 1.5n . That a1 = 1 < 1.51 , a2 = 1 < (1.5)2
and a3 = 2 < (1.5)3 is obvious. For n ≥ 4, we have

\begin{align*}
a_n &= a_{n-1} + a_{n-3} \\
&< (1.5)^{n-1} + (1.5)^{n-3} \\
&= (1.5)^n \left( \frac{2}{3} + \left( \frac{2}{3} \right)^3 \right) \\
&= (1.5)^n \left( \frac{26}{27} \right) \\
&< (1.5)^n,
\end{align*}

(the second line using the inductive hypothesis) and we are done by induction.
Notice that we really needed strong induction here, and we really needed all three of the
base cases n = 1, 2, 3 (think about what would happen if we tried to use regular induction,
or what would happen if we only verified n = 1 as a base case).
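(Not part of the original handout.) The recurrence, an independent direct count of the tilings, and the bound $a_n < 1.5^n$ can all be cross-checked in a few lines of Python; `tilings_direct` counts tilings by the number of 1-by-3 tiles used.

```python
from math import comb

def tilings_recurrence(n):
    a = [1, 1, 1, 2]                      # a_0 (empty strip), a_1, a_2, a_3
    for i in range(4, n + 1):
        a.append(a[i - 1] + a[i - 3])
    return a[n]

def tilings_direct(n):
    # j tiles of length 3 and n - 3j tiles of length 1: choose positions of the 3-tiles
    return sum(comb(n - 2 * j, j) for j in range(n // 3 + 1))

for n in range(1, 25):
    assert tilings_recurrence(n) == tilings_direct(n) < 1.5 ** n
```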

Solving via generating functions
Given a sequence (an )n≥0 , we can form its generating function, the function

\[ F(x) = \sum_{n=0}^{\infty} a_n x^n. \]

Often we can use a recurrence relation to produce a functional equation that F (x) satisfies,
then solve that equation to find a compact (non-infinite-summation) expression for F (x),
then finally use knowledge of calculus power-series to extract an exact expression for an .
There are so many different varieties of this method, that I won’t describe it in general, just
give an example. The Perrin sequence is defined by p0 = 3, p1 = 0, p2 = 2, and

pn = pn−2 + pn−3 for n > 2.

(A quite interesting sequence: see http://en.wikipedia.org/wiki/Perrin_number#Primes_


and_divisibility.) The generating function of the sequence is

P (x) = p0 + p1 x + p2 x2 + p3 x3 + p4 x4 + . . . .

Plugging in the given values for p0 , p1 and p2 , and the recurrence’s right-hand side for all
others, we get

P (x) = 3 + 2x2 + (p0 + p1 )x3 + (p1 + p2 )x4 + . . .


= 3 + 2x2 + x3 (p0 + p1 x + . . .) + x2 (p1 x + p2 x2 + . . .)
= 3 + 2x2 + x3 P (x) + x2 (P (x) − p0 )
= 3 − x2 + (x3 + x2 )P (x).

We can now solve for P (x) as a rational function in x, and expand using partial fractions:

\[ P(x) = \frac{3-x^2}{1-x^2-x^3} = \frac{A}{1-\alpha_1 x} + \frac{B}{1-\alpha_2 x} + \frac{C}{1-\alpha_3 x} \]

where A, B and C are some constants and (1 − α1 x)(1 − α2 x)(1 − α3 x) = 1 − x2 − x3 , or


equivalently (x − α1 )(x − α2 )(x − α3 ) = x3 − x − 1. In other words, α1 , α2 and α3 are the
solutions to x3 − x − 1 = 0 (it happens that one of them, say α1 , is real, and is roughly 1.32
[it’s called the plastic number] and the other two are a complex conjugate pair with absolute
value smaller than α1 ).
Using
\[ \frac{1}{1-kx} = 1 + kx + k^2x^2 + \cdots, \]
we now get that

P (x) = (A+B+C)+(Aα1 +Bα2 +Cα3 )x+(Aα12 +Bα22 +Cα32 )x2 +(Aα13 +Bα23 +Cα33 )x3 +. . . ,

59
and so we can read off a formula for pn (by uniqueness of power-series representations):
pn = Aα1n + Bα2n + Cα3n .
But what are A, B and C? One way to figure them out is to use the initial conditions, to
get a set of simultaneous equations:
A + B + C = 3
Aα1 + Bα2 + Cα3 = 0
Aα12 + Bα22 + Cα32 = 2.
It turns out that the unique solution to this system is A = B = C = 1. This solution satisfies
the first equation above, evidently; it satisfies the second since
(x−α1 )(x−α2 )(x−α3 ) = x3 −x−1 = x3 −(α1 +α2 +α3 )x2 +(α1 α2 +α1 α3 +α2 α3 )x−(α1 α2 α3 )
(1)
implies α1 + α2 + α3 = 0; and it satisfies the last since
(α1 + α2 + α3 )2 = (α12 + α22 + α32 ) + 2(α1 α2 + α1 α3 + α2 α3 ),
and from (1) this reduces to 0 = α12 + α22 + α32 − 2 so α12 + α22 + α32 = 2.
So we have an exact formula for pn :
pn = α1n + α2n + α3n
where α1 , α2 and α3 are the roots of x3 − x − 1.
Notice that without even doing the explicit computation of A, B and C, we have learned
something from the generating function approach about pn , namely the following: since
α1 ≈ 1.32 is real and (α2 , α3 ) is a complex conjugate pair with absolute value smaller than
α1 , we have that for large n,
pn ≈ Aα1n ≈ A(1.32)n .
In other words, with very little work, we have isolated the rough growth rate of pn .
If this business of generating functions interests you, you can find out much more in Herb
Wilf’s beautiful book generatingfunctionology (just google it; it’s freely available online).
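(Not in the original.) A numerical check of the exact formula $p_n = \alpha_1^n + \alpha_2^n + \alpha_3^n$: the roots of $x^3 - x - 1$ are computed with numpy and the sum of their n-th powers is compared against the recurrence.

```python
import numpy as np

def perrin(n):
    p = [3, 0, 2]
    while len(p) <= n:
        p.append(p[-2] + p[-3])
    return p[n]

roots = np.roots([1, 0, -1, -1])          # roots of x^3 - x - 1
for n in range(20):
    closed_form = sum(r ** n for r in roots).real
    assert abs(closed_form - perrin(n)) < 1e-6
```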

Solving via characteristic function


Mimicking what we did with the Perrin sequence, we can easily prove the following theorem:
let (an) be a sequence defined recursively, via the defining relation
\[ a_n = c_1a_{n-1} + c_2a_{n-2} + \cdots + c_ka_{n-k} \quad \text{for } n \ge k, \]
(for some constants ci ) together with initial values a0 , a1 , a2 , . . . , ak−1 . Form the polynomial
C(x) = xk − c1 xk−1 − c2 xk−2 − . . . − ck−1 x − ck
(called the characteristic polynomial of the recurrence). If C(x) = (x−α1 )(x−α2 ) . . . (x−αk )
factors into distinct linear terms, then there are constants A1 , A2 , . . . , Ak such that for all
n ≥ 0,
an = A1 α1n + . . . + Ak αkn

with the Ai ’s explicitly findable by solving the k by k system of linear equations
\begin{align*}
A_1 + \cdots + A_k &= a_0 \\
A_1\alpha_1 + \cdots + A_k\alpha_k &= a_1 \\
A_1\alpha_1^2 + \cdots + A_k\alpha_k^2 &= a_2 \\
&\ \,\vdots \\
A_1\alpha_1^{k-1} + \cdots + A_k\alpha_k^{k-1} &= a_{k-1}.
\end{align*}
Even without solving this system, if C(x) has a unique root (say α1) of greatest absolute
value, then we know the asymptotic growth rate
\[ a_n \sim A_1\alpha_1^n \]
as n → ∞.
Similar statements can be made when C(x) has repeated roots, with the form of the final
answer changing depending on what is the right expression to use in the partial fractions
expansion step of the generating function method. I won’t make a general statement, because
it would be way too cumbersome (but ask me if you want to see more!); instead here’s an
example:
Suppose an = 4an−1 − 4an−2 for all n ≥ 2, with a0 = 0 and a1 = 1. The generating
function method gives that the generating function A(x) satisfies
\[ A(x) = \frac{x}{1-4x+4x^2}. \]
The correct partial fractions expansion now is
\[ A(x) = \frac{A}{1-2x} + \frac{B}{(1-2x)^2}. \]
The coefficient of $x^n$ in $A/(1-2x)$ is $A \cdot 2^n$. For $B/(1-2x)^2$, we use
\[ \frac{1}{1-kx} = 1 + kx + k^2x^2 + \cdots, \]
so, differentiating,
\[ \frac{k}{(1-kx)^2} = k + 2k^2x + 3k^3x^2 + \cdots + (n+1)k^{n+1}x^n + \cdots, \]
so
\[ \frac{1}{(1-kx)^2} = 1 + 2kx + 3k^2x^2 + \cdots + (n+1)k^nx^n + \cdots, \]
so the coefficient of xn in B/(1 − 2x)2 is B(n + 1)(2n ). [This trick of figuring out new power
series from old by differentiation is quite useful!] This gives
an = A2n + (n + 1)B2n .
Using a0 = 0 we get A + B = 0, and using a1 = 1 we get 2A + 4B = 1, so A = −1/2,
B = 1/2 and
an = n2n−1
(a fact that if we had guessed correctly, we could have easily proven by induction).
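(Not in the original.) The closed form $a_n = n2^{n-1}$ can be checked directly against the recurrence:

```python
def a_recurrence(n):
    a = [0, 1]
    for i in range(2, n + 1):
        a.append(4 * a[i - 1] - 4 * a[i - 2])
    return a[n]

for n in range(1, 20):
    assert a_recurrence(n) == n * 2 ** (n - 1)
```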

Solving via matrices
The Perrin recurrence can be encoded in matrix form: start with pn+2 = pn + pn−1 , and add
the trivial identities pn+1 = pn+1 and pn = pn to get
    
\[ \begin{pmatrix} p_n \\ p_{n+1} \\ p_{n+2} \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} p_{n-1} \\ p_n \\ p_{n+1} \end{pmatrix} \tag{2} \]
valid for n ≥ 1. Iteratively applying, we get that for all n ≥ 1,
\[ \begin{pmatrix} p_n \\ p_{n+1} \\ p_{n+2} \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}^{\!n} \begin{pmatrix} 3 \\ 0 \\ 2 \end{pmatrix}. \tag{3} \]
Write this as vn = An v. If we can diagonalize A (that is, find invertible S with SAS −1 = D,
with D a diagonal matrix), then we can write A = S −1 DS, so An = S −1 Dn S, from which we
can easily find vn and so pn explicitly. If you know enough linear algebra, you’ll quickly see
that this approach requires finding eigenvalues, which are roots of a certain cubic, and the
computations quickly reduce to exactly the same ones as those of the previous two methods
described. I mention this method just to bring up the matrix point of view of recurrences,
which can sometimes be quite helpful.
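(Not in the original.) Equation (3) is easy to check numerically; here the matrix power is computed with numpy and the first entry compared against the first few Perrin numbers.

```python
import numpy as np

A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]])
v = np.array([3, 0, 2])

def perrin_matrix(n):
    """p_n as the first entry of A^n v, as in equation (3)."""
    return (np.linalg.matrix_power(A, n) @ v)[0]

assert [int(perrin_matrix(n)) for n in range(11)] == [3, 0, 2, 3, 2, 5, 5, 7, 10, 12, 17]
```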

Solving general recurrences


I’ve only talked about recurrences with constant coefficients, but of course recurrences can
be far more general. While the generating function method is very good to bear in mind for
more general problems, there’s really no general approach that’s sure to work; solving recur-
rences generally involves ad-hoc tools like playing with lots of small examples, and spotting,
conjecturing and proving patterns (often by induction).

A non-Putnam warm-up exercise


Using the trick of repeatedly differentiating the identity
\[ \frac{1}{1-x} = 1 + x + x^2 + \cdots, \]
find a nice expression for the coefficients of the power series (about 0) of $1/(1-x)^k$. Use
this to derive, via generating functions, the identity
\[ \sum_{i=0}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}, \]

and if you are feeling masochistic, go on to find a nice closed-form for


\[ \sum_{i=0}^{n} i^3 \]

using the same idea.

7.1 Week seven problems
1. Define a sequence (pn )n≥1 recursively by p1 = 3, p2 = 7 and, for n ≥ 3,

pn = 4 + pn−1 + 2pn−2 + . . . + 2p1

(so, for example, p3 = 4 + p2 + 2p1 = 17 and p4 = 4 + p3 + 2p2 + 2p1 = 41). Find a


closed-form expression for pn for general n.

2. Let (xn)n≥0 be a sequence of nonzero real numbers such that $x_n^2 - x_{n-1}x_{n+1} = 1$ for
   n = 1, 2, 3, . . .. Prove there exists a real number C such that $x_{n+1} = Cx_n - x_{n-1}$ for
   all n ≥ 1.

3. Define a sequence by ak = k for k = 1, 2, . . . , 2014 and

ak+1 = ak + ak−2013

for k ≥ 2014. Show that the sequence has 2013 consecutive terms each divisible by
2014.

4. An ordinary dice has six faces, each with a different number written on it, from 1 to 6.
If you roll two ordinary dice, say a red one and a blue one, then there is 1 way to get
a sum of 2, 2 ways to get a sum of 3, 3 ways to get a sum of 4, 4 ways to get a sum 5,
5 ways to get a sum of 6, 6 ways to get a sum of 7, 5 ways to get a sum of 8, 4 ways
to get a sum of 9, 3 ways to get a sum of 10, 2 ways to get a sum of 11, and 1 way to
get a sum of 12 (and no way to get any other sum).
Is it possible to create a pair of non-ordinary dice, both with six faces, with positive
whole numbers written on the faces, but with these numbers not being the numbers 1
through 6, such that when you roll the pair of dice, you still have 2 ways to get a sum
of 3, 3 ways to get a sum of 4, etc., up to 1 way to get a sum of 12 (and no way to get
any other sum)? Note that these two ordinary dice don’t have to be the same as each
other, and you are allowed to repeat numbers on a single dice.

5. Define (an )n≥0 by


\[ \frac{1}{1-2x-x^2} = \sum_{n \ge 0} a_n x^n. \]

Show that for each n ≥ 0, there is an m = m(n) such that $a_m = a_n^2 + a_{n+1}^2$.

6. Let $a_{m,n}$ denote the coefficient of $x^n$ in the expansion of $(1+x+x^2)^m$. Prove that for
   all integers k ≥ 0,
   \[ 0 \le \sum_{i=0}^{\lfloor 2k/3 \rfloor} (-1)^i a_{k-i,i} \le 1. \]
   (Here ⌊a⌋ denotes the round-down of a to the nearest integer at or below a; so for
   example ⌊3.4⌋ = 3, ⌊2.999⌋ = 2 and ⌊5⌋ = 5.)

7.2 Week seven solutions
A non-Putnam warm-up exercise
Differentiating the left-hand side k times, we get
\[ \frac{k!}{(1-x)^{k+1}}, \]
and differentiating the right-hand side k times, we get a power series where the coefficient
of $x^{n-k}$ is $n(n-1)\cdots(n-(k-1))$, so the coefficient of $x^n$ is $(n+k)(n+k-1)\cdots(n+1)$.
Dividing through by k!, the coefficient of $x^n$ in $1/(1-x)^{k+1}$ is
\[ \frac{(n+k)(n+k-1)\cdots(n+1)}{k!} = \binom{n+k}{k}. \]

Let $a_n = \sum_{i=0}^{n} i^2$; an satisfies the recurrence a0 = 0 and $a_n = a_{n-1} + n^2$ for n > 0. Letting
$A(x) = a_0 + a_1x + a_2x^2 + \cdots$ be the generating function of the an's, we get

\begin{align*}
A(x) &= 0 + (a_0 + 1^2)x + (a_1 + 2^2)x^2 + \cdots \\
&= xA(x) + (1^2x + 2^2x^2 + 3^2x^3 + \cdots) \\
&= xA(x) + [1\cdot0 + 1]x + [2\cdot1 + 2]x^2 + [3\cdot2 + 3]x^3 + \cdots \\
&= xA(x) + (1\cdot0\,x + 2\cdot1\,x^2 + 3\cdot2\,x^3 + \cdots) + (1x + 2x^2 + 3x^3 + \cdots) \\
&= xA(x) + x^2(2\cdot1 + 3\cdot2\,x + \cdots) + x(1 + 2x + 3x^2 + \cdots) \\
&= xA(x) + x^2\,\frac{d^2}{dx^2}\!\left( \frac{1}{1-x} \right) + x\,\frac{d}{dx}\!\left( \frac{1}{1-x} \right) \\
&= xA(x) + \frac{2x^2}{(1-x)^3} + \frac{x}{(1-x)^2} \\
&= xA(x) + \frac{x^2+x}{(1-x)^3}
\end{align*}
so
\[ A(x) = \frac{x^2+x}{(1-x)^4} = \frac{x^2}{(1-x)^4} + \frac{x}{(1-x)^4}. \]
This means that an consists of two parts — the coefficient of $x^{n-2}$ in $1/(1-x)^4$ and the
coefficient of $x^{n-1}$ in $1/(1-x)^4$. By what we established earlier, this is
\[ \binom{n+1}{3} + \binom{n+2}{3} = \frac{n(n+1)(2n+1)}{6}. \]

If you were feeling masochistic, you might have used the same method to discover
\[ \sum_{i=0}^{n} i^3 = \left( \frac{n(n+1)}{2} \right)^2. \]

The problems
1. (Galvin) The first few values are p1 = 3, p2 = 7, p3 = 17, p4 = 41. A
pattern seems to be emerging: pn = 2pn−1 + pn−2 , with p1 = 3, p2 = 7. We verify this
by induction on n. It’s certainly true for n = 3. For n > 3,
\begin{align*}
p_n &= 4 + p_{n-1} + 2p_{n-2} + 2p_{n-3} + \cdots + 2p_3 + 2p_2 + 2p_1 \\
&= (p_{n-1} + p_{n-2}) + (4 + p_{n-2} + 2p_{n-3} + \cdots + 2p_3 + 2p_2 + 2p_1) \\
&= (p_{n-1} + p_{n-2}) + p_{n-1} \qquad \text{(the given recurrence, applied at } n-1\text{)} \\
&= 2p_{n-1} + p_{n-2},
\end{align*}
as required. With this new recurrence, it is easy to apply the method of generating
functions, as described in the introduction, to get
\[ p_n = \frac{(1+\sqrt{2})^{n+1}}{2} + \frac{(1-\sqrt{2})^{n+1}}{2}. \]
Remark: This problem arose in my research. A graph is a collection of points, some
pairs of which are joined by edges. A Widom-Rowlinson coloring of a graph is a coloring
of the points using 3 colors, red, white and blue, in such a way that no point colored red
is joined by an edge that is colored blue. I was looking at how many Widom-Rowlinson
colorings there are of the graph Pn that consists of n points, numbered 1 up to n, with
edges from 1 to 2, from 2 to 3, etc., up to from n − 1 to n. It turns out that there
are pn such colorings, where pn satisfies the first recurrence. In trying to find a closed
form for pn , I realized that pn satisfies the Fibonacci-like recurrence described in the
solution above, and so was able to solve for pn explicitly using generating functions.
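As a sanity check (not part of the original solution), a few lines of Python confirm both the Fibonacci-like recurrence and the closed form:

from math import sqrt

p = {1: 3, 2: 7}
for n in range(3, 21):
    p[n] = 2 * p[n - 1] + p[n - 2]            # the Fibonacci-like recurrence

for n in range(1, 21):
    closed = ((1 + sqrt(2)) ** (n + 1) + (1 - sqrt(2)) ** (n + 1)) / 2
    assert abs(closed - p[n]) < 1e-6 * p[n]   # floating-point check of the closed form

print(p[1], p[2], p[3], p[4], p[5], p[6])     # 3 7 17 41 99 239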
2. (1993 Putnam problem A2) Set

   f_n = \frac{x_n + x_{n−2}}{x_{n−1}}

for n ≥ 2 (well-defined since all x_i are non-zero), so that

   f_{n+1} = \frac{x_{n+1} + x_{n−1}}{x_n}.

We claim that f_{n+1} = f_n for all n ≥ 2. Indeed, this equality is equivalent to

   \frac{x_n + x_{n−2}}{x_{n−1}} = \frac{x_{n+1} + x_{n−1}}{x_n}

or

   x_n^2 + x_n x_{n−2} = x_{n−1} x_{n+1} + x_{n−1}^2

or

   x_n^2 − x_{n−1} x_{n+1} = x_{n−1}^2 − x_n x_{n−2};
but this last is true since both sides equal 1 for n ≥ 2. It follows that there is some C
such that for all n ≥ 2,
Cxn−1 = xn + xn−2

or
xn = Cxn−1 − xn−2 ,
so for all n ≥ 1
xn+1 = Cxn − xn−1 ,
as was required to show.
3. (Putnam competition 2006, problem A3) The given recurrence can be used to extend
the sequence to a_0 (a_0 is the unique integer satisfying a_{2014} = a_{2013} + a_0, so a_0 = 1),
and indeed in the same way it can be extended to all negative k. So we can consider
that we are working with a doubly-infinite sequence, indexed by Z.
For future reference, it will be useful to know the following: a0 = 1; a−1 is defined
by a2013 = a2012 + a−1 , so also a−1 = 1; a−2 is defined by a2012 = a2011 + a−2 , so
also a−2 = 1; this continues down to a−2012 , which is defined by a2 = a1 + a−2012 , so
also a−2012 = 1. Here things change a little. We have that a−2013 , which is defined by
a1 = a0 +a−2013 , takes value 0. Also, a−2014 , which is defined by a0 = a−1 +a−2014 , takes
value 0. This continues down to a−4025 , which is defined by a−2011 = a−2012 + a−4025 ,
and so also takes value 0. Notice that we have found 2013 consecutive values of the
sequence, albeit with negative indices, that take value 0, namely a−2013 through a−4025 .
In the section on pigeon-hole principle, we showed that the Fibonacci numbers, viewed
similarly as a doubly-infinite sequence, are periodic modulo any modulus; that is, if
we take any integer n and consider the remainder of the terms of the doubly-infinite
Fibonacci sequence modulo n, we get a periodic sequence. The same proof works to
show that the given sequence (ak )k∈Z is periodic modulo 2014.
So, it is enough to show that the doubly-infinite sequence (ak )k∈Z has 2013 consecutive
terms (some perhaps with negative indices), each divisible by 2014; the periodicity will
then give us such a long sequence of terms with positive index, also. But now we are
done, since a−2013 through a−4025 gives such a sequence.
4. (Matthew Conroy, http://www.madandmoonly.com/doctormatt/mathematics/dice1.
pdf) Let ai be the number of ways of rolling i with an ordinary dice (so ai = 1 if
i = 1, 2, 3, 4, 5 or 6, and is zero otherwise). The generating function of the sequence
(a_0, a_1, . . .) is

a1 x + a2 x2 + a3 x3 + a4 x4 + a5 x5 + a6 x6 = x + x2 + x3 + x4 + x5 + x6 .

Consider the expression

(x+x2 +x3 +x4 +x5 +x6 )2 = x2 +2x3 +3x4 +4x5 +5x6 +6x7 +5x8 +4x9 +3x10 +2x11 +x12 .

The coefficient of x^k in this product is \sum_{i=0}^{k} a_i a_{k−i}, which counts exactly the number
of ways of rolling a sum of k with the two dice (either the first dice comes up 0 and the
second k, which can happen a_0 a_k ways, or the first comes up 1 and the second k − 1, which can
happen a_1 a_{k−1} ways, etc.).
More generally, suppose we have dice 1 that has number i on bi of the faces (i =
0, 1, 2, . . .), with all the bi non-negative integers, and the sum of the bi being 6, and

dice 2 that has number i on ci of the faces (i = 0, 1, 2, . . .), with all the ci non-negative
integers, and the sum of the ci being 6. By the same reasoning as above, the number
of ways of rolling a sum of k with the two dice is exactly the coefficient of xk in the
polynomial

(b_1 x + b_2 x^2 + b_3 x^3 + · · ·)(c_1 x + c_2 x^2 + c_3 x^3 + · · ·).

So the question is: can we factor

x2 + 2x3 + 3x4 + 4x5 + 5x6 + 6x7 + 5x8 + 4x9 + 3x10 + 2x11 + x12

as the product of two polynomials, both with integral, non-negative coefficients, and
both with sum of coefficients equal to 6, in some way other than

x2 +2x3 +3x4 +4x5 +5x6 +6x7 +5x8 +4x9 +3x10 +2x11 +x12 = (x+x2 +x3 +x4 +x5 +x6 )2 ?

We have a factorization:

x2 +2x3 +3x4 +4x5 +5x6 +6x7 +5x8 +4x9 +3x10 +2x11 +x12 = x2 (1+x)2 (1+x+x2 )2 (1−x+x2 )2

which we can rearrange as

[x(1 + x)(1 + x + x2 )][x(1 + x)(1 + x + x2 )(1 − x + x2 )2 ]

or
(x + 2x2 + 2x3 + x4 )(x + x3 + x4 + x5 + x6 + x8 ).
This leads us to the following conclusion: if dice 1 has sides with numbers 1, 2, 2, 3, 3, 4,
and dice 2 has sides with numbers 1,3,4,5,6,8, then in terms of sums of numbers rolled,
these two dice behave exactly as ordinary dice!

Remark: These dice were discovered by George Sicherman of Buffalo, New York and
were originally reported by Martin Gardner in a 1978 article in Scientific American.
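As a quick machine check (my addition, not part of the original solution), multiplying the two dice out in Python confirms that this pair produces exactly the same distribution of sums as a pair of ordinary dice:

from collections import Counter

ordinary = [1, 2, 3, 4, 5, 6]
die1 = [1, 2, 2, 3, 3, 4]
die2 = [1, 3, 4, 5, 6, 8]

sums_ordinary = Counter(a + b for a in ordinary for b in ordinary)
sums_sicherman = Counter(a + b for a in die1 for b in die2)
assert sums_ordinary == sums_sicherman   # same number of ways to roll every total from 2 to 12
print(sorted(sums_sicherman.items()))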

5. (1999 Putnam, problem A3) Using partial fractions, we can get the Taylor series of
1/(1 − 2x − x2 ), and discover that
   a_n = \frac{1}{2√2}\left( (1 + √2)^{n+1} − (1 − √2)^{n+1} \right).

After a little (messy) algebra, we find that

   a_n^2 + a_{n+1}^2 = a_{2n+2},

so m = 2n + 2 works.
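For a numerical check (not part of the original solution), note that clearing denominators in (1 − 2x − x^2) \sum_{n≥0} a_n x^n = 1 gives a_0 = 1, a_1 = 2 and a_n = 2a_{n−1} + a_{n−2} for n ≥ 2; a short Python loop built on that recurrence confirms the identity:

N = 30
a = [1, 2]                                   # a_0 = 1, a_1 = 2
for n in range(2, 2 * N + 3):
    a.append(2 * a[n - 1] + a[n - 2])        # a_n = 2 a_{n-1} + a_{n-2}

for n in range(N):
    assert a[n] ** 2 + a[n + 1] ** 2 == a[2 * n + 2]
print("a_n^2 + a_{n+1}^2 = a_{2n+2} holds for n = 0, ...,", N - 1)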

6. (Putnam competition 1997, problem B4) Note that for each k ≥ 0, if i > ⌊2k/3⌋, then
a_{k−i,i} = 0. So we may as well consider

   \sum_{i≥0} (−1)^i a_{k−i,i},

as it takes exactly the same value as \sum_{i=0}^{⌊2k/3⌋} (−1)^i a_{k−i,i}.
Observe that, by definition of a_{m,n},

   (1 + x + x^2)^m = \sum_{n≥0} a_{m,n} x^n,

so that the two-variable generating function

   \sum_{m,n≥0} a_{m,n} x^n y^m

is equal to

   1 + y(1 + x + x^2) + y^2(1 + x + x^2)^2 + · · ·

which simplifies to

   \frac{1}{1 − y(1 + x + x^2)}.

Evaluating both sides at x = −y, we get

   \frac{1}{1 − y + y^2 − y^3} = \sum_{m,n≥0} (−1)^n a_{m,n} y^{m+n}.

Note that the right-hand side above can be rewritten as

   \sum_{k≥0} \left( \sum_{i≥0} (−1)^i a_{k−i,i} \right) y^k;

so the sum we are trying to evaluate is the coefficient of y^k in 1/(1 − y + y^2 − y^3).


But now 1 − y + y^2 − y^3 = (1 − y^4)/(1 + y), so

   \frac{1}{1 − y + y^2 − y^3} = \frac{1}{1 − y^4} + \frac{y}{1 − y^4}.

From this we see that the coefficient of y^k in 1/(1 − y + y^2 − y^3) is 1 if k is a multiple
of 4, or one greater than a multiple of 4, and is 0 otherwise; in any case it is between
0 and 1, as was required to be shown.
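A brute-force check (my addition, not from the original solution) is easy to run in Python: compute a_{m,n} directly as the coefficients of (1 + x + x^2)^m and verify that the alternating sum is 0 or 1, with the value 1 occurring exactly when k ≡ 0 or 1 (mod 4):

def coeffs_of_power(m):
    # Coefficients of (1 + x + x^2)^m, as a list indexed by the power of x.
    c = [1]
    for _ in range(m):
        new = [0] * (len(c) + 2)
        for j, v in enumerate(c):
            new[j] += v
            new[j + 1] += v
            new[j + 2] += v
        c = new
    return c

def a(m, n):
    c = coeffs_of_power(m)
    return c[n] if 0 <= n < len(c) else 0

for k in range(40):
    s = sum((-1) ** i * a(k - i, i) for i in range(2 * k // 3 + 1))
    assert s in (0, 1)
    assert (s == 1) == (k % 4 in (0, 1))
print("checked k = 0, ..., 39")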

8 Week eight (October 14) — Another grab-bag
An important aspect of problem solving is being able to see a problem outside the context
of any particular technique or method, and still reach for the right method
(or methods). This week's problems are designed to challenge you to do just that. Here's a
slight hint: for some of the problems, the techniques that we’ve discussed in the early part
of the semester are appropriate.

8.1 Week eight problems


1. For any positive integer n, let f (n) be the number of ways to write n as a sum of
positive integers, n = a_1 + a_2 + · · · + a_k, with k an arbitrary positive integer and a_1 ≤
a_2 ≤ . . . ≤ a_k ≤ a_1 + 1. For example, with n = 4 there are four such representations,
4 = 4, 4 = 2 + 2, 4 = 1 + 1 + 2, and 4 = 1 + 1 + 1 + 1, so f (4) = 4.
Find, with proof, a general formula for f (n).

2. Let x1 , x2 , . . . , xn be real numbers satisfying 0 ≤ xi ≤ 1 for each i. Prove that

(1 + x1 )(1 + x2 ) . . . (1 + xn ) ≤ 2n−1 (1 + x1 x2 . . . xn ).

3. Take a walk along the number line in the following manner. Start at 0. Toss a coin,
and move to 1 if the coin comes up heads, and to −1 if it comes up tails. Repeat (so
in general, if you are at integer x, at the next step you will either be at x + 1 or x − 1,
each with probability 1/2). Fix an integer n, and let pn (k) be the probability that
after exactly k steps, you are at n. Find the value of k that maximizes pn (k).

4. Let d1 , . . . , d12 be real numbers between 1 and 12 (but not including 1 or 12). Show
that there are three numbers among the 12 that are the side lengths of an acute-angled
triangle.

5. Let n be a natural number. Show that the polynomial

   P_n(x) = 1 + x + \frac{x^2}{2!} + · · · + \frac{x^{2n}}{(2n)!}

has no real roots.

6. Let S be a set of real numbers, all between 0 and 1, that satisfies the following two
properties:

• 0 ∈ S and 1 ∈ S
• S is closed under taking finite means (meaning, if x_1, x_2, . . . , x_n are distinct elements of S, then (\sum_{i=1}^{n} x_i)/n is in S).

Show that S contains every rational number between 0 and 1.

8.2 Week eight solutions
1. (Putnam Competition 2003, problem A1) A little experimentation suggests that for
each k, 1 ≤ k ≤ n, there is a unique representation of n of the form n = a_1 + a_2 + · · · + a_k,
with a_1 ≤ a_2 ≤ . . . ≤ a_k ≤ a_1 + 1 (all a_i's positive integers), and that for k < 1 or
k > n there is no such representation, so that f (n) = n. The awkwardness in this
problem comes from verifying all of these simple parts rigorously, in a readable way.
Any representation of n of the form n = a_1 + a_2 + · · · + a_k, with a_1 ≤ a_2 ≤ . . . ≤ a_k ≤ a_1 + 1
(all a_i's positive integers) [from here on I'll just call this a “valid representation” of
n] must evidently have k at least 1, and must also have k ≤ n, for if k > n then
a_1 + a_2 + · · · + a_k > n. We claim that for each 1 ≤ k ≤ n, there is a unique valid
representation of n. This implies that f (n) = n.
Fix 1 ≤ k ≤ n. To begin proving the claim for this value of k, we exhibit at least one
valid representation. Let a be the greatest integer that does not exceed n/k. Note that

a + a + ... + a ≤ n

(k a’s in the sum), and

(a + 1) + (a + 1) + . . . + (a + 1) > n

(again, k (a + 1)’s in the sum). To see that latter, note that if the inequality were
not true, then a + 1 would be an integer that does not exceed n/k, contradicting the
definition of a. To see the former, note that if the inequality were not true, then a
would exceed n/k, again contradicting the definition of a.
Consider the sum

   a + a + · · · + a + (a + 1) + (a + 1) + · · · + (a + 1)

where there are j (a + 1)'s, for some 0 ≤ j ≤ k − 1. At j = 0 this sum is at most n, and
at j = k − 1 it is at least n (since it is above n at j = k), and the sum increases by exactly 1
each time j increases by 1, so there is a unique j in the range for which it is exactly n.
We have shown not only that there is a valid representation of n with k parts, but also
that there is a unique such representation with a_1 = a.
To complete the proof of the claim, it is now enough to show that there is no valid
representation of n with k parts that has a_1 < a, and none that has a_1 > a.
The latter is easy: if a_1 > a then a_1 + · · · + a_k ≥ (a + 1) + · · · + (a + 1) > n. Similarly
for the former: if a_1 < a then a_1 + · · · + a_k < a + · · · + a ≤ n. So we are done with
both the claim and the proof that f (n) = n.
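A brute-force check of f (n) = n for small n is straightforward in Python (my sketch, not part of the original solution); a valid representation is determined by its smallest part a and the number i of parts equal to a, with the remaining parts equal to a + 1:

def f(n):
    count = 0
    for a in range(1, n + 1):              # smallest part
        for i in range(1, n // a + 1):     # number of parts equal to a
            rest = n - i * a               # what must be covered by parts equal to a + 1
            if rest % (a + 1) == 0:
                count += 1
    return count

for n in range(1, 60):
    assert f(n) == n
print("f(n) = n verified for n = 1, ..., 59")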

2. (University of Illinois at Urbana-Champaign, freshman competition 2014, Question 3)


We proceed by induction on n, with n = 1 trivial. For n = 2 we are required to show
that
(1 + x1 )(1 + x2 ) ≤ 2(1 + x1 x2 ).

After a little algebra this is seen to be equivalent to

(1 − x1 )(1 − x2 ) ≥ 0

which follows from 1 − x1 ≥ 0 (x1 ≤ 1) and 1 − x2 ≥ 0 (x2 ≤ 1).


For n ≥ 2 we have

   (1 + x_1)(1 + x_2) · · · (1 + x_n) ≤ (1 + x_1) 2^{n−2} (1 + x_2 · · · x_n)
                                      = 2^{n−2} [(1 + x_1)(1 + x_2 · · · x_n)]
                                      ≤ 2^{n−2} [2(1 + x_1 x_2 · · · x_n)]
                                      = 2^{n−1} (1 + x_1 x_2 · · · x_n),

where in the first inequality we have used the n − 1 case of the inductive hypothesis
(applied to x_2, . . . , x_n), and in the second inequality we have used the n = 2 case. Note that
the validity of this second step depends on 0 ≤ x_2 · · · x_n ≤ 1, which follows from 0 ≤ x_i ≤ 1 for
each i — this is where we use the hypothesis x_i ≥ 0.

3. (Told to me by Joshua Cooper) There is a parity issue to be aware of here: if k has a


different parity to n (one odd, the other even) then pk (n) = 0. So we implicitly assume
in all our formulae that n and k have the same parity.
We can explicitly compute p_k(n). There are 2^k possible sequences of k coin flips, and
those that have exactly (k + n)/2 heads (and so (k − n)/2 tails) lead to being at n. So

   p_k(n) = \binom{k}{(k+n)/2} \Big/ 2^k.
Note that if k is less than n then (k + n)/2 > k, so the binomial coefficient, and hence the
above probability, is 0. A possible intuition is that p_k(n) starts rising from 0 at k = n, rises up to a maximum,
and then falls back down to 0 again as k → ∞ (this kind of behavior in a sequence
is referred to as unimodality). To verify this, we could look at ratios of the form
p_{k+2}(n)/p_k(n). After a little algebra we get

   \frac{p_{k+2}(n)}{p_k(n)} = \frac{(k+1)(k+2)}{4\left(\frac{k+n}{2} + 1\right)\left(\frac{k−n}{2} + 1\right)}.

Some further algebra shows that, starting from k = n,

   p_{k+2}(n)/p_k(n)  > 1 if k < n^2 − 2,
                      = 1 if k = n^2 − 2,
                      < 1 if k > n^2 − 2.

This shows that (ignoring the zeros that come up every second term, due to parity) the
sequence (p_k(n))_{k≥n} is strictly increasing up to p_{n^2−2}(n), is strictly decreasing from
p_{n^2}(n) on, and satisfies p_{n^2−2}(n) = p_{n^2}(n), and so the two values of k that maximize
p_k(n) are k = n^2 − 2 and k = n^2.
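A quick numerical confirmation (not in the original solution), using exact rational arithmetic so that the tie between the two maximizers is detected exactly:

from fractions import Fraction
from math import comb

def p(k, n):
    # Probability of being at n after k steps; 0 if the parity is wrong or k < n.
    if (k + n) % 2 or k < n:
        return Fraction(0)
    return Fraction(comb(k, (k + n) // 2), 2 ** k)

for n in range(2, 8):
    values = {k: p(k, n) for k in range(n, 3 * n * n, 2)}
    best = max(values.values())
    maximizers = sorted(k for k, v in values.items() if v == best)
    assert maximizers == [n * n - 2, n * n]
print("maximizers k = n^2 - 2 and k = n^2 confirmed for n = 2, ..., 7")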

4. (Putnam competition 2012, question A1) The statement that a, b and c satisfying
0 < a ≤ b ≤ c (and a + b ≥ c, so the three numbers really do form the side lengths
of a triangle) do not form the side lengths of an acute-angled triangle is, by the law of
cosines, equivalent to c^2 > b^2 + a^2.
Consider numbers 0 < a_1 ≤ a_2 ≤ . . . ≤ a_n (n ≥ 3) with the property that all triples
(a_i, a_j, a_k) with i < j < k satisfy a_k^2 > a_j^2 + a_i^2. We claim that for n ≥ ℓ ≥ 3,

   a_ℓ^2 > F_{ℓ−1} a_2^2 + F_{ℓ−2} a_1^2,

where the F's are Fibonacci numbers: F_1 = 1, F_2 = 1 and for k > 2, F_k = F_{k−1} + F_{k−2}.
This finishes the problem: ordering our 12 numbers 1 < d_1 ≤ . . . ≤ d_{12} < 12, and supposing
(for a contradiction) that no three of them form the side lengths of an acute-angled triangle,
we get from the claim that

   d_{12}^2 > F_{11} d_2^2 + F_{10} d_1^2 = 89 d_2^2 + 55 d_1^2 > 89 + 55 = 144,

so d_{12} > 12, a contradiction.


We prove the claim by induction on ℓ, with the case ℓ = 3 being immediate. For ℓ = 4
we have

   a_4^2 > a_3^2 + a_2^2
        > 2a_2^2 + a_1^2,

as required. For ℓ > 4, we have

   a_ℓ^2 > a_{ℓ−1}^2 + a_{ℓ−2}^2
        > (F_{ℓ−2} a_2^2 + F_{ℓ−3} a_1^2) + (F_{ℓ−3} a_2^2 + F_{ℓ−4} a_1^2)
        = (F_{ℓ−2} + F_{ℓ−3}) a_2^2 + (F_{ℓ−3} + F_{ℓ−4}) a_1^2
        = F_{ℓ−1} a_2^2 + F_{ℓ−2} a_1^2,

as required (the second inequality being the inductive hypothesis).


5. (University of Illinois at Urbana-Champaign, freshman competition 2014, Question 6)
P_n(x) has even degree and so lim_{x→+∞} P_n(x) = lim_{x→−∞} P_n(x) = +∞. It follows that
the global minimum of P_n(x) is achieved at some x for which P_n'(x) = 0. But

   P_n'(x) = 1 + x + \frac{x^2}{2!} + · · · + \frac{x^{2n−1}}{(2n−1)!} = P_n(x) − \frac{x^{2n}}{(2n)!},

so at any real x for which P_n'(x) = 0 we have

   P_n(x) = \frac{x^{2n}}{(2n)!} > 0.

[We rule out x^{2n}/(2n)! = 0, which is equivalent to x = 0, by observing that at x = 0
we have P_n'(x) = 1 ≠ 0, so x = 0 is not a critical point.]
It follows that the global minimum of P_n(x) on the reals is positive, and so there is no
real x for which P_n(x) = 0.

6. (The Angry Statistician blog) By taking averages of pairs of numbers, we can easily
obtain all rationals of the form

   \frac{ℓ}{2^n}

for 0 ≤ ℓ ≤ 2^n and all n ≥ 1. [For example, 1/2 = (0 + 1)/2; 1/4 = (0 + (1/2))/2;
3/4 = (1/2 + 1)/2.]
We can use these rationals to get all others. How do we get ℓ/k, with 1 ≤ ℓ ≤ k − 1,
and k ≥ 3? It is sufficient to find some n, depending on k and ℓ, and k distinct integers
a_1, . . . , a_k in the range [0, 2^n], such that \sum_{i=1}^{k} a_i = ℓ·2^n; then the mean of

   \frac{a_1}{2^n}, . . . , \frac{a_k}{2^n}

is ℓ/k. [For example: 0 + 1 + 2 + 3 + 10 = 16, so 1/5 is obtained as the average of
0, 1/16, 2/16, 3/16 and 10/16.]
We deal first with the case ℓ < k/2. For ℓ = 1 we take a_1 = 0, a_2 = 1, . . ., a_{k−1} = k − 2
and a_k = 2^n − (0 + 1 + · · · + (k − 2)). These numbers are all between 0 and 2^n, they sum
to 2^n, and for them to be distinct it is enough to show

   2^n − (0 + 1 + · · · + (k − 2)) > k − 2

or

   2^n > k − 2 + (0 + 1 + · · · + (k − 2)),

which can certainly be arranged by choosing n sufficiently large.
For ℓ ≥ 2 we take

   a_1 = 2^n
   a_2 = 2^n − 1
   ...
   a_{ℓ−1} = 2^n − (ℓ − 2)
   a_ℓ = 0
   a_{ℓ+1} = 1
   ...
   a_{k−1} = k − ℓ − 1
   a_k = 2^n + \binom{ℓ−1}{2} − \binom{k−ℓ}{2}.

Using \sum_{i=0}^{m} i = \binom{m+1}{2} we see that \sum_{i=1}^{k} a_i = ℓ·2^n. Next we need to see that the k
numbers are distinct, and that all are in the correct range. Certainly a_1 through a_{ℓ−1}
are distinct, and the smallest among these is a_{ℓ−1}. Also, a_ℓ through a_{k−1} are distinct,
and the largest of these is a_{k−1}. So it is enough to show that

   0 ≤ k − ℓ − 1 < 2^n + \binom{ℓ−1}{2} − \binom{k−ℓ}{2} < 2^n − (ℓ − 2) ≤ 2^n.

The first and last inequalities are trivial. For the second-from-last, we need

   \binom{k−ℓ}{2} > \binom{ℓ−1}{2} + ℓ − 2.

Since \binom{ℓ−1}{2} + ℓ − 1 = \binom{ℓ}{2} and everything here is an integer, this is the same as

   \binom{k−ℓ}{2} ≥ \binom{ℓ}{2},

which follows from k > 2ℓ. For the second inequality, we require

   2^n > k − ℓ − 1 + \binom{k−ℓ}{2} − \binom{ℓ−1}{2},

which is implied by

   2^n > k + \binom{k}{2},

which can certainly be arranged by choosing n sufficiently large.
which can certainly be arranged by choosing n sufficiently large.
Next we deal with the case ℓ > k/2. In this case k − ℓ < k/2, and since we have found

   a_1 + · · · + a_k = (k − ℓ)2^n

with the a_i's distinct and all between 0 and 2^n, we may simply take a'_i = 2^n − a_i and
note that the a'_i's are distinct and all between 0 and 2^n, and sum to

   k·2^n − (k − ℓ)2^n = ℓ·2^n.

There remains the case ` = k/2; but in this case `/k = 1/2, which has already been
dealt with.
Note that we have shown the following stronger statement: if S is a set of real numbers,
all between 0 and 1, that satisfies the following two properties:

• 0 ∈ S and 1 ∈ S
• S is closed under taking finite means of at most k distinct terms (meaning, if x_1, x_2, . . . , x_n are distinct elements of S, and n ≤ k, then (\sum_{i=1}^{n} x_i)/n is in S).

Then S contains every rational number between 0 and 1 of the form m/n with n ≤ k.

9 Week nine (October 28) — Modular arithmetic and
number theory
9.1 Modular arithmetic
For integers a and b, and positive integer k, say that a is congruent to b (modulo k), written
“a ≡ b (mod k)”, if a and b leave the same remainder on division by k, or equivalently if
a − b is a multiple of k, or equivalently if a − b = mk for some integer m. Congruence
(modulo k) is an equivalence relation on the integers, that partitions Z into k classes, called
residue classes. For example, when k = 3 the three classes are {. . . , −6, −3, 0, 3, 6, . . .},
{. . . , −5, −2, 1, 4, 7, . . .} and {. . . , −4, −1, 2, 5, 8, . . .}.
Many of the standard arithmetic operations go through unchanged to modular arithmetic.
For example, it is easy to establish that if
a ≡ b (mod k) and c ≡ d (mod k)
then each of
a + c ≡ b + d (mod k)
a − c ≡ b − d (mod k)
ac ≡ bd (mod k)
hold. Repeated application of this last relation also quickly gives that for all positive integers
n,

   a^n ≡ b^n (mod k).
Modular arithmetic can be a great time-saver when working with problems concerning
divisibility. We give a quick and useful example.
Claim: The remainder of any integer, on division by 9, is the same as the remainder of the
sum of its digits on division by 9.
Proof: Write the number in decimal form as \sum_{i=0}^{n} a_i 10^i (with each a_i ∈ {0, . . . , 9}). Since
10 ≡ 1 (mod 9), we immediately have 10^i ≡ 1^i ≡ 1 (mod 9), and so a_i 10^i ≡ a_i (mod 9), and so
\sum_{i=0}^{n} a_i 10^i ≡ \sum_{i=0}^{n} a_i (mod 9), which is exactly what we wanted to show.
Here are two more quick examples illustrating how modular arithmetic can make life
easy:
Question: What are the last two digits of 3^72?
Answer: We are being asked: what number x between 0 and 99 is such that 3^72 ≡
x (mod 100)? By repeated squaring we have

   3 ≡ 3 (mod 100)
   3^2 ≡ 9 (mod 100)
   3^4 ≡ 9^2 ≡ 81 (mod 100)
   3^8 ≡ 81^2 ≡ 6561 ≡ 61 (mod 100)
   3^16 ≡ 61^2 ≡ 3721 ≡ 21 (mod 100)
   3^32 ≡ 21^2 ≡ 441 ≡ 41 (mod 100)
   3^64 ≡ 41^2 ≡ 1681 ≡ 81 (mod 100)

and so
   3^72 ≡ 3^64 · 3^8 ≡ 81 · 61 ≡ 4941 ≡ 41 (mod 100),
so the last two digits of 3^72 are 41.
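Python's built-in modular exponentiation gives a quick machine check of this computation:

assert pow(3, 72, 100) == 41   # modular exponentiation by repeated squaring
print(pow(3, 72, 100))         # prints 41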
Problem: Prove that 2^70 + 3^70 is divisible by 13.
Solution: We could use the same technique as above to discover that 2^70 ≡ 10 (mod 13)
and 3^70 ≡ 3 (mod 13), so that 2^70 + 3^70 ≡ 10 + 3 ≡ 0 (mod 13). But there is a much easier
way: 2^2 ≡ −3^2 (mod 13), so 2^70 ≡ (−1)^{35} 3^70 ≡ −3^70 (mod 13), so 2^70 + 3^70 ≡ 0 (mod 13).
Division in modular arithmetic is a little more subtle than addition, subtraction and
multiplication, and will come up somewhat in the discussion on number theory below.

Number theory
Here are some useful principles/definitions from number theory.
1. Divisibility: For integers a, b, a|b (a divides b) if there is an integer k with ak = b.
For positive integers a and b, the greatest common divisor of a, b, gcd(a, b) (sometimes
just written (a, b)) is the largest positive number that is a divisor of both a and b. This
means that if d = gcd(a, b), and e is any positive number with e|a and e|b, then e ≤ d;
but in fact it turns out that moreover e|d. This very useful fact follows easily from
looking at the prime factorizations of a and b, see below.
The least common multiple of a, b, lcm(a, b) is the smallest positive number f such
that a|f and b|f ; for any positive number g with a|g and b|g we have f ≤ g; but
in fact, as with gcd, it turns out that we even have f |g in this case.
If gcd(a, b) = 1 (so no factors in common other than 1) then a and b are said to be
coprime or relatively prime.
2. Primes: If p > 1 only has 1 and p as divisors, it is said to be prime; otherwise it is
composite.
The fundamental fact about prime numbers (other than that there are infinitely many of
them!) is that every number n > 1 has a prime factorization:

   n = p_1^{a_1} · · · p_k^{a_k}

with each p_i a prime, and each a_i > 0. Moreover, the factorization is unique if we
assume that p_1 < . . . < p_k.
The prime factorization gives one way (not the most computationally efficient way) of
accessing gcd(a, b) and lcm(a, b). Indeed, if

   a = p_1^{a_1} · · · p_k^{a_k}   and   b = p_1^{b_1} · · · p_k^{b_k}

(with some of the a_i and b_i possibly 0), then

   gcd(a, b) = p_1^{min(a_1,b_1)} · · · p_k^{min(a_k,b_k)}   and   lcm(a, b) = p_1^{max(a_1,b_1)} · · · p_k^{max(a_k,b_k)}.

Using min(x, y) + max(x, y) = x + y, we get the nice identity

   ab = gcd(a, b) · lcm(a, b).

3. Euclidean algorithm: Euclid described a simple way to compute gcd(a, b). Assume
a > b. Write
a = kb + j
where 0 ≤ j < b. If j = 0, then gcd(a, b) = b. If j > 0, then it is fairly easy to
check that gcd(a, b) = gcd(b, j). Repeat the process with the smaller pair b, j, and
keep repeating as long as necessary. For example, suppose I want gcd(63, 36):

   63 = 1 · 36 + 27
   36 = 1 · 27 + 9
   27 = 3 · 9.

We conclude 9 = gcd(27, 9) = gcd(36, 27) = gcd(63, 36).

4. Bézout’s Theorem: Given a, b, there are integers x, y such that ax + by = gcd(a, b).
Moreover, the set of numbers c that can be expressed in the form c = ax' + by' for
integers x', y' is exactly the set of multiples of gcd(a, b).
The proof comes from working the Euclidean algorithm backwards. I’ll just do an
example, with the pair 63, 36. We have

   9 = 36 − 1 · 27
     = 36 − 1 · (63 − 1 · 36)
     = (−1) · 63 + 2 · 36

so we can take x = −1 and y = 2.


Once we have found an x and y, the rest is easy. Suppose c = k·gcd(a, b) is a multiple
of gcd(a, b); then (kx)a + (ky)b = k·gcd(a, b) = c. On the other hand, if ax' + by' = c then,
since gcd(a, b)|a and gcd(a, b)|b, we have gcd(a, b)|c, so c is a multiple of gcd(a, b).
The most common form of Bézout's Theorem is that if (a, b) = 1 then every integer
k can be written as a linear combination of a and b; in particular there are x, y with
ax + by = 1.
In the language of modular arithmetic, this says that if (a, k) = 1, then there is a
number x such that ax ≡ 1 (mod k). We may think of x as an inverse of a (modulo
k); this is a starting point for thinking about division in modular arithmetic.
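To make the “working the Euclidean algorithm backwards” step concrete, here is a short Python sketch (an illustration of mine, not text from the original notes) of the extended Euclidean algorithm; it returns (g, x, y) with ax + by = g = gcd(a, b), which also yields modular inverses:

def extended_gcd(a, b):
    # Returns (g, x, y) with a*x + b*y = g = gcd(a, b).
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    # Here b*x + (a % b)*y = g, and a % b = a - (a // b) * b.
    return g, y, x - (a // b) * y

g, x, y = extended_gcd(63, 36)
print(g, x, y)             # 9 -1 2, matching 9 = (-1)*63 + 2*36 above
assert 63 * x + 36 * y == g

# Example of a modular inverse: gcd(5, 13) = 1, so 5 is invertible modulo 13.
_, x, _ = extended_gcd(5, 13)
print(x % 13)              # 8, and indeed 5 * 8 = 40 ≡ 1 (mod 13)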

5. Useful facts/theorems concerning modular arithmetic:

(a) Inverses (repeating a previous observation): If p is a prime, and a ≢ 0 (mod p),
then there is a whole number b such that ab ≡ 1 (mod p); more generally if a and
k are coprime then there is a whole number b such that ab ≡ 1 (mod k).
(b) Fermat's theorem: If p is a prime, and a ≢ 0 (mod p), then a^{p−1} ≡ 1 (mod p).
More generally, for arbitrary m (prime or composite) define ϕ(m) to be the number
of numbers in the range 1 through m that are coprime with m. If (a, m) = 1 then
a^{ϕ(m)} ≡ 1 (mod m). (When m = p this reduces to Fermat's Theorem.)

We refer to ϕ as Euler's totient function. A quick way to calculate its value: if m
has prime factorization m = p_1^{a_1} · · · p_k^{a_k}, then

   ϕ(m) = m \prod_{i=1}^{k} \left(1 − \frac{1}{p_i}\right).

(c) Chinese Remainder Theorem: Suppose n_1, n_2, . . . , n_k are pairwise relatively
prime. If a_1, a_2, . . . , a_k are any integers, there is a number x that simultaneously
satisfies x ≡ a_i (mod n_i) for each i = 1, . . . , k. Moreover, modulo n_1 n_2 · · · n_k, this solution is unique.
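As an illustration (a sketch of mine, not part of the notes), here is one standard way to compute the x promised by the Chinese Remainder Theorem, chaining two moduli at a time; it uses Python's built-in modular inverse pow(a, -1, m), available in Python 3.8 and later:

from math import gcd

def crt_pair(a1, n1, a2, n2):
    # x ≡ a1 (mod n1) and x ≡ a2 (mod n2), with n1, n2 coprime; returns x mod n1*n2.
    assert gcd(n1, n2) == 1
    t = ((a2 - a1) * pow(n1, -1, n2)) % n2
    return (a1 + n1 * t) % (n1 * n2)

def crt(residues, moduli):
    x, n = residues[0], moduli[0]
    for a, m in zip(residues[1:], moduli[1:]):
        x, n = crt_pair(x, n, a, m), n * m
    return x

print(crt([2, 3, 2], [3, 5, 7]))   # 23, and 23 ≡ 2 (mod 3), 3 (mod 5), 2 (mod 7)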

9.2 Week nine problems
1. Compute the sum of the digits of the sum of the digits of the sum of the digits of the
number 4444^{4444}.

2. Let n > 1 be an integer and p a prime such that n|(p − 1) and p|(n^3 − 1). Prove that
4p − 3 is a perfect square.

3. Let S be the set of all positive integers that are not perfect squares. For n ∈ S, consider
choices of integers a1 , a2 , . . . , ar such that n < a1 < a2 < . . . < ar and n·a1 ·a2 ·. . .·ar is
a perfect square, and let f (n) be the minimum of ar over all such choices. For example,
2 · 3 · 6 is a perfect square, while 2 · 3, 2 · 4, 2 · 5, 2 · 3 · 4, 2 · 3 · 5, 2 · 4 · 5, and 2 · 3 · 4 · 5 are
not, and so f (2) = 6. Show that the function f from S onto the integers is one-one.

4. Several positive integers are written on a chalk board. One can choose two of them,
erase them, and replace them with their greatest common divisor and least common
multiple. Prove that eventually the numbers on the board do not change.

5. Take a walk on the number line, starting at 0. Begin by taking a step of length one,
either right or left, then take a step of length two, either right or left, and in general
at the kth stage take a step of length k, either right or left.

(a) Prove that for each integer m, there is a walk that visits (has a step ending at)
m.
(b) Let f (m) denote the least number of steps needed to reach m. Show that

   \lim_{m→∞} \frac{f(m)}{\sqrt{m}}

exists, and find the value of the limit.

6. For which positive integers n is there an n × n matrix with integer entries such that
every dot product of a row with itself is even, while every dot product of two different
rows is odd?

9.3 Week nine solutions
1. (International Mathematical Olympiad 1975 problem 4) We start with

   4444^{4444} < 10000^{10000} = 10^{40000}.

Among all numbers below 10^{40000}, none has a larger sum of digits than 10^{40000} − 1 (a
string of 40000 9's). So the sum of the digits of 4444^{4444} is at most 9 × 40000 < 1000000.
Among all numbers below 1000000, none has a larger sum of digits than 999999. So
the sum of the digits of the sum of the digits of 4444^{4444} is at most 54. Among all
numbers at most 54, none has a larger sum of digits than 49. So the sum of the digits
of the sum of the digits of the sum of the digits of 4444^{4444} is at most 13.
Now we use a useful fact: the remainder of a number, on division by 9, is the same as
the remainder of the sum of its digits on division by 9. This fact implies that the
sum of the digits of the sum of the digits of the sum of the digits of 4444^{4444} leaves the
same remainder on division by 9 as 4444^{4444} itself does.
To calculate the remainder of 4444^{4444} on division by 9, we can use a repeated
multiplication trick. It's easy that
   4444 ≡ 7 (mod 9)
and so

   4444^2 ≡ 49 ≡ 4 (mod 9)
   4444^4 ≡ 16 ≡ 7 (mod 9)
   4444^8 ≡ 49 ≡ 4 (mod 9)
   4444^16 ≡ 16 ≡ 7 (mod 9)
   4444^32 ≡ 49 ≡ 4 (mod 9)
   4444^64 ≡ 16 ≡ 7 (mod 9)
   4444^128 ≡ 49 ≡ 4 (mod 9)
   4444^256 ≡ 16 ≡ 7 (mod 9)
   4444^512 ≡ 49 ≡ 4 (mod 9)
   4444^1024 ≡ 16 ≡ 7 (mod 9)
   4444^2048 ≡ 49 ≡ 4 (mod 9)
   4444^4096 ≡ 16 ≡ 7 (mod 9).

It follows that

   4444^{4444} = 4444^{4096} · 4444^{256} · 4444^{64} · 4444^{16} · 4444^{8} · 4444^{4} ≡ 7 · 7 · 7 · 7 · 4 · 7 ≡ 7 (mod 9).

So 4444^{4444} leaves a remainder of 7 on division by 9, and also the sum of the digits of
the sum of the digits of the sum of the digits of 4444^{4444} leaves a remainder of 7 on
division by 9; but we've calculated that this last is at most 13. The only number at
most 13 that leaves a remainder of 7 on division by 9 is 7 itself; so the sum of the
digits of the sum of the digits of the sum of the digits of 4444^{4444} must be 7.

Remark: This was an IMO (International Mathematical Olympiad) problem. Appro-
priately enough, if you had got this question fully correct at the IMO, you would have
scored 7 points!
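Python works with integers of this size directly, so the whole chain can also be checked by brute force (a verification sketch, not part of the original solution):

def digit_sum(n):
    return sum(int(d) for d in str(n))

N = 4444 ** 4444
s1 = digit_sum(N)      # sum of the digits
s2 = digit_sum(s1)     # sum of the digits of that
s3 = digit_sum(s2)     # and once more
print(s1, s2, s3)      # the final value is 7
assert s3 == 7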

2. (Northwestern Putnam Preparation) We have p|(n − 1)(n^2 + n + 1), so either p|(n − 1)
or p|(n^2 + n + 1). The first is impossible (it implies p ≤ n − 1, but n|(p − 1) says
n ≤ p − 1 so p ≥ n + 1). So we have p|(n^2 + n + 1).
By n|(p − 1) we have kn = p − 1 for some k ≥ 1, and by p|(n^2 + n + 1) we have
ℓp = n^2 + n + 1 for some ℓ ≥ 1. From the first, we have ℓp = ℓkn + ℓ, and so
ℓkn + ℓ = n^2 + n + 1, or

   n^2 + (1 − ℓk)n + (1 − ℓ) = 0.

The solutions to this quadratic must be integers, so the discriminant must be a perfect
square, i.e.,

   (1 − ℓk)^2 − 4(1 − ℓ) = x^2

for some integer x.
One possibility is ℓ = 1 (the left-hand side above becomes (1 − k)^2, certainly a square).
But ℓ = 1 says p = n^2 + n + 1, so 4p − 3 = 4n^2 + 4n + 1 = (2n + 1)^2, a perfect square.
If ℓ > 1 then, rewriting the above as (ℓk − 1)^2 + 4(ℓ − 1) = x^2, we see that

   4(ℓ − 1) ≥ 2(ℓk − 1) + 1

(why? (ℓk − 1)^2 is a perfect square, and when we add 4(ℓ − 1) we get a larger perfect
square; the first perfect square after (ℓk − 1)^2 is (ℓk)^2, which differs from (ℓk − 1)^2 by
2(ℓk − 1) + 1, so we need to add at least this much).
But 4(ℓ − 1) ≥ 2(ℓk − 1) + 1 implies k = 1 (if k ≥ 2 then 2(ℓk − 1) ≥ 4ℓ − 2 > 4(ℓ − 1)).
And k = 1 says p = n + 1, so n + 1|(n^2 + n + 1), so n + 1|n^2, impossible for n > 0 (why?
n + 1|n^2 − 1, so if n + 1|n^2 then n + 1|1). So ℓ = 1 is the only possibility, and 4p − 3 is
a perfect square.

3. (Putnam competition 2013, problem A2) Suppose that f is not one-to-one. That means
that there are n, n' in S, with (without loss of generality) n < n', with f(n) = f(n') =
m. That means (by definition of f) that there is A ⊆ {n, . . . , m}, with n ∈ A and
m ∈ A, with

   \prod_{a ∈ A} a

a perfect square, and that for any Ã ⊆ {n, . . . , m} with n ∈ Ã and m ∉ Ã, \prod_{a ∈ Ã} a is
not a perfect square. Similarly, there is A' ⊆ {n', . . . , m}, with n' ∈ A' and m ∈ A',
with

   \prod_{a ∈ A'} a

a perfect square, and for any Ã' ⊆ {n', . . . , m} with n' ∈ Ã' and m ∉ Ã', \prod_{a ∈ Ã'} a
is not a perfect square.
Let B = A ∩ A'. We have

   \left(\prod_{a ∈ A} a\right)\left(\prod_{a ∈ A'} a\right) = \left(\prod_{a ∈ (A∖B)∪(A'∖B)} a\right)\left(\prod_{a ∈ B} a\right)^2.

The left-hand side above is a perfect square, so the right-hand side is, too. Since the
second term on the right-hand side is a perfect square, it follows that

   \prod_{a ∈ (A∖B)∪(A'∖B)} a

must be a perfect square. But now (A ∖ B) ∪ (A' ∖ B) is a subset of {n, . . . , m} that
includes n but not m, contradicting f(n) = m.

4. (Stanford Putnam preparation) Here's my laborious solution: When we take a pair
of numbers (a, b), and replace them with (gcd(a, b), lcm(a, b)), we preserve something,
namely the product of the pair of numbers (that ab = gcd(a, b) · lcm(a, b) is easily seen
from the prime factorizations of a and b: if

   a = \prod_{i=1}^{n} p_i^{a_i},   b = \prod_{i=1}^{n} p_i^{b_i}

(with maybe some of the a_i, b_i zero) then

   ab = \prod_{i=1}^{n} p_i^{a_i + b_i},

   gcd(a, b) = \prod_{i=1}^{n} p_i^{min{a_i, b_i}},   lcm(a, b) = \prod_{i=1}^{n} p_i^{max{a_i, b_i}},

so

   gcd(a, b) · lcm(a, b) = \prod_{i=1}^{n} p_i^{min{a_i, b_i} + max{a_i, b_i}}.

Thus ab = gcd(a, b) · lcm(a, b) follows from x + y = min{x, y} + max{x, y}, valid for any
positive integers x, y.)
For any fixed positive number, there are only finitely many ways to write it as the
product of a fixed number of positive numbers (if the target of the product is N , and
we are using d numbers, then each of the d numbers must be a divisor of N , so the
number of ways of writing N as a product of d terms is at most a(N )d , where a(N ) is
the number of divisors of N ). This shows that there are only finitely many possibilities
for the numbers written on the board.
Consider the sum of the numbers. How does this change with the swap operation? It
depends on how a + b compares to gcd(a, b) + lcm(a, b). Experimentation suggests that
gcd(a, b) + lcm(a, b) ≥ a + b, with equality iff the pair (a, b) coincides (in some order)
with the pair (lcm(a, b), gcd(a, b)). To prove this, first consider a = b, for which the

result is trivial. For all other cases, assume without loss of generality that a > b. We
have
lcm(a, b) ≥ a > b ≥ gcd(a, b).
If any one of a = lcm(a, b), b = gcd(a, b) holds then by the conservation of product the
other must too, and the result we are trying to prove is true. So now we may assume

lcm(a, b) > a > b > gcd(a, b),

and what we want to show is that this implies gcd(a, b) + lcm(a, b) > a + b. Let
n = ab = gcd(a, b) · lcm(a, b), so

   gcd(a, b) + lcm(a, b) = gcd(a, b) + \frac{n}{gcd(a, b)}   and   a + b = b + \frac{n}{b}.

A little calculus shows that the function f(x) = x + n/x is decreasing on the interval
(0, √n]. Since gcd(a, b) < b < √n, this shows that

   gcd(a, b) + \frac{n}{gcd(a, b)} > b + \frac{n}{b},

which is exactly what we want to show.


So, suppose we have the bunch of numbers in front of us, and we perform the swap
operation infinitely often. All swaps preserve the product. Some swaps also preserve
the sum; these swaps are exactly the swaps that don’t change the set of numbers.
All other swaps increase the sum. We can only increase the sum finitely many times
(there are only finitely many different configurations of numbers). Therefore there
must be some point (curiously, not boundable as a function of the original numbers!)
after which we make no more sum-increasing swaps; from that point on, the numbers
remain unchanged.

Here's a very quick, slick solution shown to me by Do Trong Thanh: If you pick two
numbers a, b with a|b or b|a, then since gcd(a, b) = min{a, b} and lcm(a, b) = max{a, b}
in this case, the numbers do not change. In general gcd(a, b)|lcm(a, b), so if it is not
the case that a|b or b|a, then after the swap it is the case for that particular pair.
Initially there are only finitely many pairs (a, b) with a ∤ b and b ∤ a; either eventually
we replace all these pairs with pairs in which one number divides the other (in which case we
are done), or we eventually commit to avoiding all remaining such pairs (in which case
we are done).

5. (From A Mathematical Orchard, Problems and Solutions by Krusemeyer, Gilbert and


Larson)

(a) We can easily get to the number m = n(n + 1)/2, for any integer n: just do
1 + 2 + 3 + · · · + n. What about a number of the form n(n + 1)/2 − k, where
1 ≤ k ≤ n − 1? For k even we can get to this by flipping the + to a − in front of
k/2. For k odd we can get to this by first adding n + 1 to the end, then subtracting
n + 2 (net effect: −1), then flipping the + to a − in front of (k − 1)/2.

Thus we can reach every number in the interval [n(n + 1)/2 − (n − 1), n(n + 1)/2].
Using 1 + 2 + 3 + . . . + n = n(n + 1)/2 we can easily check that the union of these
disjoint intervals, as n increases from 1, covers the positive integers. To get to a
negative integer m, just reverse the signs on any scheme that gets to −m.
Another way to get to n > 0: do n = (−1 + 2) + (−3 + 4) + . . . + (−(2n − 1) + 2n).
(b) Our second solution to part (a) shows f(n) ≤ 2n, but to solve part (b) we need
to do better. Our first solution to part (a) shows that if m ∈ [n(n + 1)/2 −
(n − 1), n(n + 1)/2] then f(m) ≤ n + 2. But it is also easy to see that in that
interval, f(m) ≥ n, since the largest number that can be reached in n − 1 steps is
1 + 2 + · · · + (n − 1) = n(n + 1)/2 − n.
For m ∈ [n(n + 1)/2 − (n − 1), n(n + 1)/2], we have (n − 1)/√2 ≤ √m ≤ (n + 1)/√2,
and so

   √2 · \frac{n}{n+1} ≤ \frac{f(m)}{\sqrt{m}} ≤ √2 · \frac{n+2}{n−1}.

Notice that n here is a function of m, and that as m → ∞ we have n → ∞ also.
So the sequence f(m)/√m is sandwiched between two positive sequences both of
which go to √2 as m → ∞. By the squeeze theorem,

   \lim_{m→∞} \frac{f(m)}{\sqrt{m}}

exists and equals √2.

6. (Putnam 2011, Problem A4) Here is a transcription of a solution given by Kiran Kedlaya
and Lenny Ng at http://personal.plattsburgh.edu/quenelgt/f2013/putnam/pdf/2011s.pdf:
The answer is n odd. Let I denote the n by n identity matrix, and let A denote the
n by n matrix all of whose entries are 1. If n is odd, then the matrix A − I satisfies the
conditions of the problem: the dot product of any row with itself is n − 1, and the dot
product of any two distinct rows is n − 2.
Conversely, suppose n is even, and suppose that the matrix M satisfied the conditions
of the problem. Consider all matrices and vectors mod 2. Since the dot product of a row
with itself is equal mod 2 to the sum of the entries of the row, we have M v = 0 where
v is the vector (1, 1, . . . , 1), and so M is singular. On the other hand, M M^T = A − I;
since (A − I)^2 = A^2 − 2A + I = (n − 2)A + I = I, we have det(M)^2 = det(A − I) = 1
and det(M) = 1, contradicting the fact that M is singular.

10 Week ten (November 4) — Games
These problems are all about games played between two players. Usually when these problems
appear in the Putnam competition, you are asked to determine which player wins when
both players play as well as possible. Once you have decided which player wins (maybe
based on analyzing small examples), you need to prove this in general. Often this entails
demonstrating a winning strategy: for each possible move by the losing player, you can try to
identify a single appropriate response for the winning player, such that if the winning player
always uses these responses as the game goes on, then she will indeed win. It’s important to
remember that you must produce a response for the winning player for every possible move
of the losing player — not just a select few.

10.1 Week ten problems
1. A chocolate bar is made up of a rectangular m by n grid of small squares. Two players
take turns breaking up the bar. On a given turn, a player picks a rectangular piece
of chocolate and breaks it into two smaller rectangular pieces, by snapping along one
whole line of subdivisions between its squares. The player who makes the last break
wins. If both players play optimally, who wins?

2. I shuffle a regular deck of cards (26 red, 26 black), and begin to turn them face-up,
one after another. At some point during this process, you say “STOP!”. You can say
stop as early as before I've even turned over the first card, or as late as when there is
only one card left to be turned over; the only rule is that at some point you must say
it. Once you’ve said stop, I turn over the next card. If it is red, you win the game,
and if it is black, you lose.
If you play the strategy “say stop before even a single card has been turned over”, you
have a 50% chance of winning the game. Is there a more clever strategy that gives you
a better than 50% chance of winning the game?

3. Alan and Barbara play a game in which they take turns filling entries of an initially
empty 1024 by 1024 array. Alan plays first. At each turn, a player chooses a real
number and places it in a vacant entry. The game ends when all the entries are filled.
Alan wins if the determinant of the resulting matrix is nonzero; Barbara wins if it is
zero. Which player has a winning strategy?

4. Two players, A and B, take turns naming positive integers, with A playing first.
No player may name an integer that can be expressed as a linear combination, with
positive integer coefficients, of previously named integers. The player who names “1”
loses. Show that no matter how A and B play, the game will always end.

5. Alice and Bob play a game in which they take turns removing stones from a heap that
initially has n stones. The number of stones removed at each turn must be one less
than a prime number. The winner is the player who takes the last stone. Alice plays
first. Prove that there are infinitely many n such that Bob has a winning strategy.
(For example, if n = 17, then Alice might take 6 leaving 11; Bob might take 1 leaving
10; then Alice can take the remaining stones to win.)

6. A game starts with four heaps of beans, containing 3, 4, 5 and 6 beans. The two players
move alternately. A move consists of taking either one bean from a heap, provided at
least two beans are left behind in that heap, or a complete heap of two or three beans.
The player who takes the last heap wins. Does the first or second player win? Give a
winning strategy.

10.2 Week ten solutions
1. (Told to me by Peter Winkler) There is one piece of chocolate to start, and mn pieces
at the end. Each turn by a player increases the number of pieces by 1. Hence a game
lasts mn − 1 turns, completely independently of the strategies of the two players! The
winner is determined by the parity of m and n: if m and n are both odd, mn − 1 is
even and player 2 wins. Otherwise mn − 1 is odd and player 1 wins.

2. (Asked me by David Wilson, in an interview for a job at Microsoft Research) Here’s the
quick-and-dirty solution: The game fairly easily seen to be equivalent to the following:
exactly as before, except now when you say “STOP”, I turn over the bottom card in
the pile of cards that remains. In this formulation, it is clear that there cannot be a
strategy that gives you better than a 50% chance of winning.
Here's a more prosaic solution. Suppose that instead of being played with a balanced
deck, the game is played with a deck that has a red cards and b black cards. We claim
that there is no strategy better than the naive one of saying stop before a
single card has been turned over; note that with this strategy you win with probability
a/(a + b). We prove this by induction on a + b. If a + b = 1, then (whether a = 1
or a = 0) the result is trivial. Suppose a + b ≥ 2. To get a strategy that potentially
improves on the proposed best strategy, you must at least wait for the first card to be
turned over. Two things can happen:

• The first card turned over is red; this happens with probability a/(a + b). Once
this happens, you are playing a new version of the game, with a − 1 red cards and
b black cards, and by induction your best winning strategy has you winning with
probability (a − 1)/(a − 1 + b).
• The first card turned over is black; this happens with probability b/(a + b). Once
this happens, you are playing a new version of the game, with a red cards and
b − 1 black cards, and by induction your best winning strategy has you winning
with probability a/(a + b − 1).

Your probability of winning the original game is therefore at most


   \frac{a}{a+b} · \frac{a−1}{a+b−1} + \frac{b}{a+b} · \frac{a}{a+b−1} = \frac{a}{a+b}.

3. (Putnam competition 2008, problem A2) Barbara has a winning strategy. For example,
whenever Alan plays x in row i, Barbara can play −x in some other place in row i
(since there are an even number of places in row i, Alan will never place the last entry
in a row if Barbara plays this strategy). So Barbara can ensure that all row-sums of
the final matrix are 0, so that the column vector of all 1’s is in the kernel of the final
matrix, so it has determinant zero.

4. (John Conway) Here’s a quick solution, shown to me by Jonathan Sheperd.


Suppose the first number named is a. This rules out the subsequent naming of any
number congruent to 0 modulo a. So the second number named is some b that is

congruent to j modulo a, for some 1 ≤ j ≤ a − 1. Together, a and b rule out the subsequent
naming of any number greater than b that is congruent to j modulo a, leaving only
finitely many possible numbers congruent to j modulo a. So, after finitely many steps,
a number c must be named, that is congruent to j 0 modulo a, for some 1 ≤ j 0 ≤ a − 1,
j 0 6= j. a and c together rule out the subsequent naming of all but finitely many
numbers congruent to j 0 modulo a. Repeating, after finitely many moves we get to a
point where numbers have been named in every congruence class modulo a, at which
point there are only finitely many numbers remaining that may be named, and the
game ends in finitely many more moves.
Here’s my much less elegant solution:
Suppose the first k moves consist of naming x1 , . . . , xk . Let gk be the greatest common
divisor of the xi ’s. Consider the set of numbers expressible as a linear combination of
the xi ’s over positive integers.
PEach x in this set is an integer multiple of gk (gk divides
the right-hand side of x = i ai xi , so it divides the left-hand side). We claim that
there is some m such that all multiples of gk greater than mgk are in this set.
If we can prove this claim, we are done. The sequence (g1 , g2 , g3 , . . .) is non-increasing.
It stays constant in going from gi to gi+1 exactly when xi+1 is a multiple of gi , and
drops exactly when xi+1 is not a multiple of gi . By our claim, once the sequence has
reached a certain g, it can only stay there for a finite length of time. So eventually
that sequence becomes constantly 1. But once the sequence reaches 1, there are only
finitely many numbers that can be legitimately played, and so eventually 1 must be
played.
Here’s what we’ll prove, which is equivalent to the claim: if x1 , . . . , xk are relatively
prime positive integers (greatest common divisor equals 1) then there exists an m such
that all numbers greater than m can be expressed as a positive linear combination of
the xi ’s. We prove this by induction on k. When k = 1, xk = 1 and the result is
trivial. For k > 1, consider x1 , . . . , xk−1 . These may not be relatively prime; say their
greatest common divisor is d. By induction, there’s an m0 such that all positive integer
multiples of d greater than m0 d can be expressed as a positive linear combination of the
x1 , . . . , xk−1 . Now d and xk must be relatively prime (otherwise the xi ’s would not be
relatively prime), which means that there must be some positive integer e (which we
may assume is between 1 and xk − 1) with ed ≡ 1 (modulo xk ). If we add any multiple
of xk to e to get e0 , we still get e0 d ≡ 1 (modulo xk ). Pick a multiple large enough
that e0 > m0 . By induction, e0 d can be expressed as a positive integer combination of
x1 , . . . , xk−1 . So too can 2e0 d, 3e0 d, . . . , xk e0 d. These xk numbers cover all the residue
classes modulo xk . Let m be one less than the largest of these numbers. For ` > m, we
can express ` as a positive linear combination of x1 , . . . , xk as follows: first, determine
the residue class of ` modulo xk , say it’s p. Then add the appropriate positive integer
multiple of x_k to pe'd (which can be expressed as a positive integer combination
of x1 , . . . , xk−1 ).

Remark: This is the game of Sylver coinage, invented by John H. Conway; see http:
//en.wikipedia.org/wiki/Sylver_coinage. It is named after J. J. Sylvester, who
proved that if a and b are relatively prime positive integers, then the largest positive

integer that cannot be expressed as a non-negative integer combination of a and b is
ab − a − b = (a − 1)(b − 1) − 1.

5. (Putnam competition. Below I give, verbatim, the solution published
in the American Mathematical Monthly. Implicit in this solution is the
following useful fact: in a finite, two-person game with no draws allowed, one of the
players must have a winning strategy.) Suppose there are only finitely many n such
that Bob will win if Alice starts with n stones, say all such n < N . Take K > N
so that K − m + 1 is composite for m = 0, . . . , N . Starting with n = K, Alice must
remove p − 1 stones, for p a prime number, leaving m = K − p + 1 stones. But m > N
since K − m + 1 = p is prime and K − m + 1 is composite for m < N . By assumption,
Alice can win starting from a heap of m stones. But it is Bob’s turn to move, and
so he could use the same strategy Alice would have used to win. This applies for any
first move Alice could have made from a heap of K stones. Hence Bob has a winning
strategy for a number K > N of stones, contrary to hypothesis. Instead there must
be infinitely many n for which Bob has a winning strategy.

6. (Putnam competition 1995, problem B5. I’ve reproduced the solution from The William
Lowell Putnam Mathematical Competition 1985–2000, Problems, Solutions, and Com-
mentary by Kiran S. Kedlaya, Bjorn Poonen and Ravi Vakil, where there is a further
nice discussion.) The first player has a winning strategy. The first player wins by
removing one bean from the pile of 3, leaving heaps of size 2, 4, 5, 6. Regarding heaps
of size 2 and heaps of odd size as “odd”, and heaps of even size other than 2 as “even”,
the total parity is now even. As long as neither player removes a single bean from a
heap of size 3, the parity will change after each move. Thus the first player can always
ensure that after his move, the total parity is even and there are no piles of size 3. (If
the second player removes a heap of size 2, the first player can move in another odd
heap; the resulting heap will be even and so cannot have size 3. If the second player
moves in a heap of size greater than 3, the first player can move in the same heap,
removing it entirely if it was reduced to size 3 by the second player.)

11 Week eleven (November 11) — Graphs
A graph G = (V, E) consists of a set V of vertices and a set E of edges, each of which
is a two-element subset of V . Think of the vertices as points put down on a piece of paper,
and of the edges as arcs joining pairs of points. There is no inherent geometry to a graph —
all that matters is which pairs of points are joined, not the exact position of the points, or
the nature of the arcs joining them.
Thinking about the data of a problem as a graph can sometimes be helpful. Although
some Putnam problems in the past have been non-trivial results from graph theory in dis-
guise, there is no real need to know much graph theory, so in this discussion I’ll just mention
some basic ideas that might be useful.
Problem: n people go to a party, and each one counts up the number of other people she
knows at the party. Show that there are an even number of people who come up with an
answer that is an odd number. (Assuming that “knowing” is a two-way relation; I know you
if and only if you know me.)
Solution: Model the problem as a graph. V is the set of n party goers, and E consists of
all pairs of people who know each other. For person i, denote by di the number of edges
that involve i (di is the degree of vertex i). We have
   \sum_{i=1}^{n} d_i = 2|E|

since as we run over all vertices and count degrees, each edge gets counted exactly twice
(once for each vertex in that edge). So the sum of degrees is even. But if there were an
odd number of vertices with odd degree, the sum would be odd; so there are indeed an even
number of people who know an odd number of people.
The useful fact that is true about all graphs that lies at the heart of the solution is this:
in any graph G = (V, E),

   \sum_{i=1}^{n} d_i = 2|E|.

Problem: Show that two of the people at the party have the same number of friends.
Solution: The possible values for di are 0 through n − 1, n of them, so the pigeon-hole
principle doesn’t immediately apply. But: it’s not possible for there to be one vertex with
degree 0, and another with degree n − 1. So the possible values of di are either 1 through
n − 1, n − 1 of them, or 0 through n − 2, n − 1 of them, and in either case the pigeon-hole
principle gives that there are two people with the same number of friends.
The useful fact that is true about all graphs that lies at the heart of the solution is this:
in any graph G = (V, E), there must be two vertices with the same degree.
A walk in a graph from vertex u to vertex v is list of (not necessarily distinct) edges, with
u in the first, v in the last, and every pair of consecutive edges sharing a vertex in common
— graphically, a walk is a way to trace a path from u to v, always using complete arcs of
the drawing, and never taking pencil off paper.

Problem: How many different walks are there from u to v, that use k edges?
Solution: Form the adjacency matrix of the graph: rows and columns indexed by vertices,
entry (a, b) is 1 if {a, b} is an edge, and 0 otherwise. Then form the matrix Ak . The (u, v)
entry of this matrix is exactly the number of different walks from u to v, that use k edges.
The proof uses induction of k, and the definition of matrix multiplication. The key point is
that the number of walks from u to v that use k edges is the sum, over all neighbours w of
u (i.e., vertices w such that {u, w} is an edge), of the number of walks from w to v that use
k − 1 edges. I’ll skip the details.
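To make this concrete, here is a tiny numerical illustration (mine, not part of the original notes; the three-vertex path used is just an arbitrary example) of counting walks with powers of the adjacency matrix:

import numpy as np

# Adjacency matrix of the path 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

k = 4
print(np.linalg.matrix_power(A, k))
# Entry (0, 2) of A^4 is 2, matching the two walks of length 4 from 0 to 2:
# 0-1-0-1-2 and 0-1-2-1-2.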
The relation “there is a walk between” is an equivalence relation on vertices, so any graph
can be partitioned into equivalence classes, with each class having the property that between
any two vertices in the class, there is a walk, but there is no walk between any two vertices
in different classes. These classes are called components of the graph. If a graph has just
one component, meaning that between any two vertices in the graph, there is a walk, it is
said to be connected.
Problem: Given a graph G, under what circumstances is it possible to take a walk that
uses every edge of the graph exactly once, and ends up at the same vertex that it started
at?
Solution: Such a walk is called an Euler circuit, after the man who first studied them
(google “Bridges of Konigsberg”). Such a circuit is a tracing of the graphical representation
of the graph, with each arc traced out exactly once, the pencil never leaving the paper,
ending where it started. Two fairly obvious necessary conditions for the existence of an
Euler circuit are:

• the graph is connected, and

• every degree is even (because each time an Euler circuit visits a vertex, it eats up two
edges — one going in and one coming out).

Euler proved that these necessary conditions are sufficient: a connected graph has an Euler
circuit if and only if all vertex degrees are even. The details are given in any basic graph
theory textbook.
What if we don’t require the tracing to end at the same vertex it began?
Problem: Given a graph G, and two distinct vertices u and v, under what circumstances
is it possible to take a walk from u to v that uses every edge of the graph exactly once?
Solution: Such a walk is called an Euler trail. Euler proved that a connected graph has an
Euler trail from u to v if and only if all vertex degrees are even except the degrees of u and
v, which must be odd. It follows easily from his result on Euler circuits: just add an edge
from u to v, apply the Euler circuit theorem to the new graph, and delete the added edge.
Problem: Given a graph G with n vertices, under what circumstances is it possible to
list the vertices in some order v1 , . . . , vn , in such a way that each of {v1 , v2 }, {v2 , v3 }, . . .,
{vn−1 , vn }, {vn , v1 } are all edges?

Solution: Such a list is called a Hamiltonian cycle, after the man who first studied them
(google “icosian game”). Unlike with Eulerian trails, there is no simple set of necessary-and-
sufficient conditions known to allow one to determine whether such a thing exists in a given
graph. There is one useful sufficient condition, due to Dirac, that has an elementary but
involved proof that can be found in any graph theory textbook.
Dirac’s theorem: A graph G with n vertices has a Hamiltonian cycle if all vertices have
degree at least n/2.
A connected graph with the fewest possible number of edges is called a tree. It turns out
that all trees on n vertices have the same number of edges, namely n−1. One way to see this
is to imagine building up the tree from a set of n totally disconnected vertices, edge-by-edge.
At each step, you should add an edge that bridges two components, since adding an edge
within a component does not help; in the end such an edge can be removed without hurting
connectivity. Since two components get merged each time an edge is added, exactly n − 1
are needed to get to a single component.
A characterization of trees is that they are connected, but have no cycles — a cycle in
a graph is a list of distinct vertices u1 , u2 , . . . , uk such that each of {u1 , u2 }, {u2 , u3 }, . . .,
{uk−1 , uk }, {uk , u1 } are all edges.
A planar graph is a graph that can be drawn in the plane with no two arcs meeting
except at a vertex (if they have one in common). A planar drawing of a graph partitions
the plane into connected regions, called faces. Euler discovered a remarkable formula that
relates the number of vertices, edges and faces in a planar graph:
Euler’s formula: Let G be a planar graph with V vertices, E edges and F faces. Then
V − E + F = 2.

Proof sketch: By induction on F . If F = 1 then the graph has only one face, so it must
be a tree. A tree on V vertices has V − 1 edges, and so fits the formula.
Now suppose F > 1. Then the graph contains a cycle. If we remove an edge e of
that cycle then F drops by one, V stays the same, and E drops by 1. Now by induction,
V − (E − 1) + (F − 1) = 2 and this gives V − E + F = 2.
Problem: Show that five points can't be pairwise connected up with arcs in the plane in such
a way that no two arcs meet each other except at a vertex (if they have one in common).
Solution: Suppose such a connection was possible. The resulting planar graph would have
5 vertices and \binom{5}{2} = 10 edges, so by Euler's formula would have 7 faces. The sum, over the
faces, of the number of edges bounding the faces, is then at least 21, since each face has at
least three bounding edges. But this sum is at most twice the number of edges, since each
edge can be on the boundary of at most two faces; so it is at most 20, a contradiction.
A bipartite graph is a graph whose vertex set can be partitioned into two classes, X
and Y , such that the graph only has edges that go from X to Y (and so none that are
entirely within X or entirely within Y ). It’s fairly easy to see that any odd-length cycle is
not bipartite, so any graph that has an odd-length cycle sitting inside it is also not bipartite.
This turns out to be a characterization of bipartite graphs; the proof can be found in any
textbook on graph theory.
Theorem: A graph is bipartite if and only if it has no odd cycle.
A matching in a graph is a set of edges, no two of which share a vertex. A perfect matching
is a matching that involves all the vertices. A famous result, whose proof can be found in any
graph theory textbook, is Hall’s marriage theorem. A consequence of it says that if there
are n women and n men, each woman likes exactly d ≥ 1 men, and each man is liked by exactly
d women, then it is possible to pair the men and women off into n pairs, such that each
woman is paired with a man she likes. Here’s the statement in graph-theory language:
Theorem: Let G be a bipartite graph that is regular (all vertices have the same degree d, for
some d ≥ 1). Then G has a perfect matching.
11.1 Week eleven problems
1. In a town there are three newly built houses, and each needs to be connected by a line
to the gas, water, and electricity factories. The lines are only allowed to run along the
ground. Is there a way to make all nine connections without any of the lines crossing
each other?

2. An airline operates flights out of 2n airports. In all, the airline operates n^2 + 1 different
routes (all there-and-back: South Bend to Chicago is considered the same route as
Chicago to South Bend). Prove that there are three airports a, b, c such that it is
possible to fly directly from a to b, then b to c, then c back to a.

3. A set of 1990 people is divided into non-intersecting subsets in such a way that

(a) no one in a subset knows all the others in the subset;


(b) among any three people in a subset there are always at least two who do not know
each other;
(c) for any two people in a subset who do not know each other, there is exactly one
person in the same subset knowing both of them.

Prove that within each subset every person has the same number of acquaintances,
and determine the maximum possible number of subsets.

4. Is there a way to list the 2n subsets of {1, . . . , n} (with each subset appearing on the
list once and only once) in such a way that the first element of the list is the empty set,
and every element on the list is obtained from the previous element either by adding
an element or deleting an element?

5. A round-robin tournament among 2n teams lasted for 2n − 1 days, as follows. On each
day, every team played one game against another team, with one team winning and
one team losing in each of the n games. Over the course of the tournament, each team
played every other team exactly once. Can one necessarily choose one winning team
from each day without choosing any team more than once?

6. Let n be an even positive integer. Write the numbers 1, 2, . . . , n^2 in the squares of
an n by n grid so that the kth row, from left to right, reads (k − 1)n + 1, (k − 1)n +
2, . . . , (k − 1)n + n. Color the squares of the grid so that half of the squares in each
row and in each column are red and the other half are black (a checkerboard coloring
is one possibility, but there are others). Prove that for each coloring, the sum of the
numbers on the red squares is equal to the sum of the numbers on the black squares.
11.2 Week eleven solutions
1. (Folklore) The question is asking whether the graph on six vertices, 1, 2, . . . , 6, with
an edge from i to j if and only if 1 ≤ i ≤ 3 and 4 ≤ j ≤ 6, can be drawn in
the plane without crossing edges. It has 6 vertices and 9 edges, so if it could, any
representation would have 5 faces (Euler’s formula). Each face is bounded by at least
4 edges (note that the graph we are working with clearly has no triangles), so summing
“#(bounding edges)” over all faces, we get at least 20. But this sum counts each edge at
most twice, so we get at most 18, a contradiction that reveals that there is no such
planar representation.
2. (Putnam Competition 1956, problem B5) Suppose there were no such three cities a,
b, c. For each city x, denote by d(x) the number of cities with direct connection to
x. If there is a connection between cities x and y, then there cannot be a third city z
directly connected to both (or we would have a triangle), so

(d(x) − 1) + (d(y) − 1) ≤ 2n − 2, that is, d(x) + d(y) ≤ 2n.
Now in the sum of d(x) + d(y) over all pairs of connected cities, for each x we have that
d(x) appears exactly d(x) times (once for each city directly connected to x), so

∑_{edges {x,y}} (d(x) + d(y)) = ∑_x d(x)^2 ≤ (n^2 + 1) · 2n.

Now use the Cauchy–Schwarz–Bunyakovsky inequality to bound:

2n ∑_x d(x)^2 ≥ (∑_x d(x))^2 = (2(n^2 + 1))^2

(note that in ∑_x d(x), each route gets counted exactly twice, once for each endpoint).
We conclude that

4(n^2 + 1)^2 ≤ 4n^2(n^2 + 1),

which fails to hold for any n ≥ 0 (since n^2 + 1 > n^2).

Here’s an alternate solution taken from https://mks.mff.cuni.cz/kalva/putnam/psoln/psol5612.html:
Model the problem as one about graphs: we are given a graph
on 2n vertices with n^2 + 1 edges, and we want to find a triangle.
Induction. For n = 2, the result is obviously true, because there is only one graph with
4 points and 5 edges and it certainly contains a triangle. Suppose the result is true for
some n ≥ 2. Consider a graph G with 2n + 2 vertices and n^2 + 2n + 2 edges. Take any
two vertices x and y joined by an edge. We consider two cases. If there are fewer than
2n + 1 other edges joined to either x or y (or both), then if we remove x and y we get
a graph with 2n vertices and at least n^2 + 1 edges, which must contain a triangle (by
induction), so G does also. If there are at least 2n + 1 other edges joined to either x
or y (or both), then by the pigeon-hole principle there is at least one vertex joined to both,
and that gives a triangle.

Note: This is Mantel’s theorem from 1907.
3. (Asian Pacific Mathematics Olympiad, 1990) Let the subsets be A1 , A2 , . . . , Ak . For
each i let Gi be the graph whose vertex set is Ai , with two vertices adjacent if the
people in question know each other. We know the following about Gi :

(a) all vertices have degree at most |Ai | − 2;


(b) it has no triangles;
(c) for every pair of vertices x, y that are not adjacent, there is exactly one vertex
adjacent to both x and y.

What can we say about a graph G that has the above three properties? Let a be a
vertex, with degree m and neighbours b1 , . . . , bm . By (b) there are no edges between
the b’s, and by (c), no two distinct b’s can have a common neighbour other than a; so
each bi has a and some vertices cij as neighbours, where j runs from 1 to one less than
the degree of bi , and all the c’s are distinct.
By (c), any vertex not adjacent to a is joined to a by a path of length 2, and so must
be one of the c’s; hence a, the b’s and the c’s make up all the vertices of the graph.
We can’t have an edge from cij to cij′ , by (b); we can’t simultaneously have edges from
cij to ci′j′ and ci′j′′ , by (c), so for each cij and each i′ ≠ i, there’s at most one edge
from cij to something of the form ci′j′ . But actually there’s exactly one such; if there
were none, then cij and bi′ would be non-adjacent and have no point adjacent to both,
violating (c). Hence each cij has degree exactly m: one edge to bi , and one edge to
something of the form ci′j′ for each i′ ≠ i.
Repeating the same argument, starting from a b, we get that all the b’s must also have
degree m, so all vertices of G have degree m, and G has 1 + m + m(m − 1) = m^2 + 1
vertices.
If m = 0 we get that G is the graph on a single vertex, which violates (a). If m = 1 we
get that G consists of a single edge, which also violates (a). If m = 2 we get that G has
five vertices, and must be a pentagon, which we can check satisfies all the conditions.
So every subset has at least m^2 + 1 ≥ 5 members, giving at most 1990/5 = 398 subsets,
and a partition into 398 pentagons achieves this: the maximum number of subsets is 398.

4. (Imre Leader told me this problem) We’ll prove by induction on n that it is possible,
and that moreover it is possible to do so in such a way that the last element listed is a
singleton (so that the list can be considered a cycle). In fact, the question is asking for
a Hamiltonian path (a walk that visits every vertex once) in the graph whose vertex
set is the power set of {1, . . . , n}, with two vertices adjacent if they have symmetric
difference of size exactly 1; this graph is called the n-dimensional hypercube (when
n = 2 it is just a square, when n = 3 it is the usual 3-cube). We’ll prove by induction
on n ≥ 2 that the n-dimensional hypercube has a Hamiltonian cycle, from which we
can clearly construct a Hamiltonian path of the required kind by deleting an edge out
of ∅.
The case n = 2 is trivial: ∅, {1}, {1, 2}, {2}, ∅ works.
For n ≥ 3, start with a Hamiltonian cycle C, ∅, {1}, . . . , {2}, ∅, of the (n − 1)-
dimensional hypercube (we know there’s one by induction), and then also consider the
sequence C′, {n}, {1, n}, . . . , {2, n}, {n}, obtained by unioning every term of C with
{n}. C′ is a cyclic list of exactly those elements of the n-dimensional
hypercube that are not listed in C, and it also has the property that adjacent elements
have symmetric difference of size exactly 1. A Hamiltonian cycle of the n-dimensional
hypercube is now obtained by starting with all the elements of C except the final
∅, then going to the second-from-last element of C′, and then listing the remaining
elements of C′ (except for the duplicated final {n}) in reverse order.
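(Not part of the original argument, but it may help to see the reflect-and-glue idea executed concretely. The following Python sketch builds such a cyclic listing recursively; it is a slightly streamlined version of the construction described above, essentially the standard reflected Gray code.)

    # Build a cyclic listing of all subsets of {1, ..., n} in which consecutive
    # subsets (including last-to-first) differ in exactly one element.
    def hypercube_cycle(n):
        if n == 1:
            return [frozenset(), frozenset({1})]
        C = hypercube_cycle(n - 1)            # cycle on subsets of {1, ..., n-1}
        Cp = [s | {n} for s in C]             # the same cycle with n added to every subset
        return C + Cp[::-1]                   # glue: C forwards, then C' backwards

    cycle = hypercube_cycle(4)
    assert len(set(cycle)) == 2 ** 4          # every subset appears exactly once
    assert all(len(cycle[i] ^ cycle[(i + 1) % len(cycle)]) == 1 for i in range(len(cycle)))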
5. (Putnam Competition, 2012, problem B3) Construct a bipartite graph with one side
of the graph having teams as vertices, and the other side having days. Join a team
and a day if that team won on that day. We want to choose a matching in this graph
that has 2n − 1 edges (so selects a distinct winning team for each day).
Hall’s theorem says that there is such a matching iff for every subset of days, the
number of teams that won on at least one of those days is at least the number of days
in question. Suppose this condition fails for some set D of days. That means that
among those |D| days, the number of teams that won at least one match is smaller
than |D|. That means that there is at least one team that lost on every one of the
days in D. But that team played |D| distinct opponents on those days, and each of those
opponents won at least one game on a day in D, so at least |D| teams won during those
days — contradiction!
6. (Putnam Competition, 2001, problem B1) Suppose we pick out a collection of n
squares, one from each row and one from each column, say the squares in positions
(1, a1 ), (2, a2 ), . . . , (n, an ) where {a1 , . . . , an } = {1, . . . , n}. The sum of the entries in
these n squares is
(0·n + a1 ) + (1·n + a2 ) + . . . + ((n − 1)·n + an ) = n(1 + . . . + (n − 1)) + (a1 + . . . + an )
= n(1 + . . . + (n − 1)) + (1 + . . . + n),
which is independent of the selection of the ai . This means that the sum is 1/n times
the total of all the n^2 entries.
It follows that if we can take the squares that are coloured red, and partition them
into n/2 selections of n squares, one from each row and one from each column, then
the sum of the red squares would be half the total sum of the n^2 integers. Such a
partition does exist: the red squares form an (n/2)-regular bipartite graph between
rows and columns, so by repeatedly applying the regular-bipartite-matching theorem
quoted earlier (removing a perfect matching each time), they decompose into n/2
such selections.

Here’s an alternate solution from http://www.math.hawaii.edu/~dale/putnam/2001.pdf:
Instead of writing the numbers 1 through n2 , write 0 through n2 − 1. Since half the
squares are red and half are black, the assertion to be proven holds iff it holds with
this new numbering. Now rewrite each number in base n in the form d2 d1 , where we
take d2 = 0 in the first row. Note that d2 is the same in each row, while d1 is the same
in each column. Since half of the squares in each row and in each column are red and
the other half are black, the sum of the d2 ’s for the red squares is the same as the sum
of the d2 ’s for the black squares in any row, and hence the sum of all red d2 s and all
black d2 s is the same. Considering columns, the same equality holds for the sums of
d1 ’s for red and d1 ’s for black squares. Thus, summing the numbers (each equal to d2 · n + d1 )
gives the same total for the red squares as for the black squares.
12 Week twelve (November 18) — Probability
Discrete probability may be thought about along the following lines: an experiment is per-
formed, with a set S, the sample space, of possible observable outcomes (e.g., roll a die and
note the uppermost number when the die lands; then S would be {1, 2, 3, 4, 5, 6}). S may
be finite or countable for our purposes. An event is a subset A of S; the event occurs if the
observed outcome is one of the elements of A (e.g., if A = {2, 4, 6}, which we might describe
as the event that an even number is rolled, then we would say that A occurred if we rolled a
4, and that it did not occur if we rolled a 5). The compound event A ∪ B is the event that
at least one of A, B occur; A ∩ B is the event that both A and B occur, and Ac (= S \ A)
is the event that A did not occur.
A probability function is a function P that assigns to each event A a real number, which is
intended to measure how likely A is to occur, that is, the proportion of a very large number
of independent repetitions of the experiment in which A occurs. P should satisfy the following
three rules:

1. P (A) ≥ 0 always (events occur with non-negative probability);

2. P (S) = 1 (something always happens); and

3. if A and B are disjoint events (no outcomes in common) then P (A∪B) = P (A)+P (B);
more generally, if A1 , A2 , . . . is a countable collection of mutually disjoint events, then
P (∪_i Ai ) = ∑_i P (Ai ).

Three consequences of the rules are the following relations that one would expect:

1. If A ⊆ B then P (A) ≤ P (B);

2. P (∅) = 0; and

3. P (Ac ) = 1 − P (A).

Usually one constructs the probability function in the following way: intuition/experiment/some
underlying theory suggests that a particular s ∈ S will occur a proportion ps of the time,
when the experiment is repeated many times; a reality check here is that ps should be
non-negative, and that ∑_{s∈S} ps = 1. One then sets

P (A) = ∑_{s∈A} ps ;

it is readily checked that this function satisfies all the axioms.


In the particular case when S is finite and intuition/experiment/some underlying theory
suggests that all outcomes s ∈ S are equally likely to occur, we get the classical “definition”
of the probability of an event:

P (A) = |A|/|S|,
and calculating probabilities comes down to counting.
Example: I toss a coin 100 times. How likely is it that I get exactly 50 heads?
Solution: All 2^100 lists of outcomes of 100 tosses are equally likely, so each one should occur
with probability 1/2^100. The number of outcomes in which there are exactly 50 heads is
(100 choose 50), so the required probability is

(100 choose 50)/2^100.
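(As a quick numerical aside, not needed for the argument, the value is easy to compute:)

    from math import comb

    print(comb(100, 50) / 2**100)   # about 0.0796, i.e. just under an 8% chance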
A random variable X is a function that assigns to each outcome of an experiment a
(usually real) numerical value. For example, if I toss a coin 100 times, I may not be interested
in the particular list of heads and tails I get, just in the total number of heads, so I could
define X to be the function that takes in a string of 100 heads and tails, and returns as
the numerical value the number of heads in the string. The density function of the random
variable X is the function pX (x) = P (X = x), where “X = x” is shorthand for the
event “the set of all outcomes for which X evaluates to x”. For tossing a coin 100 times and
counting the number of heads, the density function is

pX (x) = (100 choose x) · 2^{−100} if x = 0, 1, 2, . . . , 100, and pX (x) = 0 otherwise.

More generally, we have the following:


Binomial distribution: I toss a coin n times, and each time it comes up heads with some
probability p. Let X be the number of heads that comes up. The random variable X is
called the binomial distribution with parameters n and p, and has density function
pX (x) = (n choose x) p^x (1 − p)^{n−x} if x = 0, 1, 2, . . . , n, and pX (x) = 0 otherwise.

Note that, by the binomial theorem,

∑_{k=0}^{n} (n choose k) p^k (1 − p)^{n−k} = (p + (1 − p))^n = 1.

The expected value of a probability distribution/random variable is a measure of the
average value of a long sequence of readings from that distribution; it is calculated as a
weighted average:

E(X) = ∑_x x · pX (x),

with reading x being given weight pX (x). For example, if X is the binomial distribution
with parameters n and p, then E(X) is
∑_{k=0}^{n} k (n choose k) p^k (1 − p)^{n−k} = np ∑_{k=1}^{n} (n−1 choose k−1) p^{k−1} (1 − p)^{n−k}
= np (p + (1 − p))^{n−1}
= np,
as we would expect.
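(A two-line numerical check of the computation above, using nothing beyond the density function as defined; the particular values n = 12, p = 0.3 are arbitrary.)

    from math import comb

    n, p = 12, 0.3
    E = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
    print(E, n * p)   # both are 3.6, up to floating-point error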
It is worth knowing that expectation is a linear function:
Linearity of expectation: If a probability distribution/random variable X can be written
as the sum X1 + . . . + Xn of n (usually simpler) probability distributions/random variables,
then
E(X) = E(X1 ) + . . . + E(Xn )

Example: n boxes have labels 1 through n. n cards with numbers 1 through n written
on them (one number per card) are distributed among the n boxes (one card per box). On
average how many boxes get the card whose number is the same as the label on the box?
Solution: Let Xi be the random variable that takes the value 1 if card i goes into box
i, and 0 otherwise; pXi (1) = 1/n, pXi (0) = 1 − (1/n) and pXi (x) = 0 for all other x’s, so
E(Xi ) = 1/n. Let X be the random variable that counts the number of boxes that get the
right card; since X = X1 + . . . + Xn we have

E(X) = E(X1 ) + . . . + E(Xn ) = n(1/n) = 1

(independent of n!). [This is the famous problem of derangements.]
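(A small exact check of this answer, averaging the number of fixed points over all permutations of a small n; a sanity check, not a proof.)

    from itertools import permutations

    for n in range(1, 8):
        perms = list(permutations(range(n)))
        avg = sum(sum(1 for i, c in enumerate(p) if i == c) for p in perms) / len(perms)
        print(n, avg)   # the average number of fixed points is 1.0 for every n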


One of the rules of probability is that for disjoint events A, B, we have P (A ∪ B) =
P (A) + P (B). If A and B have overlap, this formula overcounts by including outcomes in
A ∩ B twice, so should be corrected to

P (A ∪ B) = P (A) + P (B) − P (A ∩ B).

For three events A, B, C, a Venn diagram readily shows that

P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B) − P (A ∩ C) − P (B ∩ C) + P (A ∩ B ∩ C).

There is a natural generalization:


Inclusion-exclusion (also called the sieve formula):
P (∪_{i=1}^n Ai ) = ∑_{i=1}^n P (Ai ) − ∑_{i<j} P (Ai ∩ Aj )
                  + ∑_{i<j<k} P (Ai ∩ Aj ∩ Ak ) − . . .
                  + (−1)^{ℓ−1} ∑_{i1<i2<...<iℓ} P (Ai1 ∩ Ai2 ∩ . . . ∩ Aiℓ ) + . . .
                  + (−1)^{n−1} P (A1 ∩ A2 ∩ . . . ∩ An ).

Inclusion-exclusion is often helpful because calculating probabilities of intersections is
easier than calculating probabilities of unions.
Example: In the problem of derangements discussed above, what is the exact probability
that no box gets the correct card?
Solution: Let Ai be the event that box i gets the right card. We have P (Ai ) = 1/n, and
more generally for i1 < i2 < . . . < iℓ we have

P (Ai1 ∩ Ai2 ∩ . . . ∩ Aiℓ ) = (n − ℓ)!/n!
(there are n! distributions of cards, and to make sure that boxes i1 through iℓ get the right
card, we are forced to place these ℓ cards each in a predesignated box; but the remaining
n − ℓ cards can be completely freely distributed among the remaining boxes). By inclusion-
exclusion,
P (∪_{i=1}^n Ai ) = ∑_{ℓ=1}^{n} (−1)^{ℓ−1} (n choose ℓ) (n − ℓ)!/n!,

the binomial term coming from selecting i1 < i2 < . . . < iℓ . We want the probability of none
of the boxes getting the right card, which is the complement of ∪_{i=1}^n Ai :
P ((∪_{i=1}^n Ai )^c ) = 1 − ∑_{ℓ=1}^{n} (−1)^{ℓ−1} (n choose ℓ) (n − ℓ)!/n!
                       = ∑_{ℓ=0}^{n} (−1)^{ℓ} (n choose ℓ) (n − ℓ)!/n!
                       = ∑_{ℓ=0}^{n} (−1)^{ℓ}/ℓ!.

Note that this is the sum of the first n + 1 terms in the power series of ex around 0, evaluated
at x = −1, so as n gets larger the probability of there being no box with the right card
approaches 1/e.
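(The convergence to 1/e is fast, as a quick computation shows:)

    from math import e, factorial

    for n in (2, 5, 10, 20):
        prob = sum((-1)**l / factorial(l) for l in range(n + 1))
        print(n, prob, abs(prob - 1 / e))   # the error is already tiny at n = 10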
There is a counting version of inclusion-exclusion, that is very useful to know:
Inclusion-exclusion (counting version):
|∪_{i=1}^n Ai | = ∑_{i=1}^n |Ai | − ∑_{i<j} |Ai ∩ Aj |
                 + ∑_{i<j<k} |Ai ∩ Aj ∩ Ak | − . . .
                 + (−1)^{ℓ−1} ∑_{i1<i2<...<iℓ} |Ai1 ∩ Ai2 ∩ . . . ∩ Aiℓ | + . . .
                 + (−1)^{n−1} |A1 ∩ A2 ∩ . . . ∩ An |.

Example: How many numbers are there, between 1 and n, that are relatively prime to n
(share no common factor bigger than 1 with n)?
Solution: Let n have prime factorization p1^{a1} p2^{a2} . . . pk^{ak} . Let Ai be the set of numbers
between 1 and n that are multiples of pi . We have |Ai | = n/pi , and more generally for
i1 < i2 < . . . < iℓ we have

|Ai1 ∩ Ai2 ∩ . . . ∩ Aiℓ | = n/(pi1 . . . piℓ ).
We want to know |(∪_{i=1}^k Ai )^c | (complement taken inside of {1, . . . , n}), because (∪_{i=1}^k Ai )^c
is exactly the set of numbers up to n that share no prime factor with n. By inclusion-
exclusion,
|(∪_{i=1}^k Ai )^c | = n (1 − ∑_i 1/pi + ∑_{i<j} 1/(pi pj ) − . . . + (−1)^k/(p1 p2 . . . pk ))
                     = n (1 − 1/p1 ) . . . (1 − 1/pk ).

The function

ϕ(n) = n ∏_{i=1}^{k} (1 − 1/pi ),

counting the number of numbers between 1 and n that are relatively prime to n, is called
the Euler totient function.
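(The product formula is easy to check against the direct count; here is a short sketch. The helper prime_factors below is ad hoc trial division, not anything from the discussion above.)

    from math import gcd

    def prime_factors(n):
        # distinct prime factors of n, by trial division
        factors, d = set(), 2
        while d * d <= n:
            while n % d == 0:
                factors.add(d)
                n //= d
            d += 1
        if n > 1:
            factors.add(n)
        return factors

    def phi_formula(n):
        result = n
        for p in prime_factors(n):
            result = result // p * (p - 1)    # multiply by (1 - 1/p), staying in integers
        return result

    def phi_direct(n):
        return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

    assert all(phi_formula(n) == phi_direct(n) for n in range(1, 500))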
The only mention I’ll make of probability with uncountable underlying sample spaces is
this: if R is a region in the plane, then a natural model for “selecting a point from R, all
points equally likely”, is to say that for each subset R′ of R, the probability that the selected
point will be in R′ is Area(R′)/Area(R), that is, proportional to the area of R′. This idea
naturally extends to more general spaces.
Example: I place a small coin at a random location on a 3 foot by 5 foot table. How likely
is it that the coin is within one foot of some edge of the table?
Solution: There’s a 1 foot by 3 foot region at the center of the table, consisting of exactly
those points that are not within one foot of some edge of the table; assuming that the
coin is equally likely to be placed at any location, the probability of landing in this region is
(1 × 3)/(3 × 5) = .2, so the probability of landing within one foot of some edge is 1 − .2 = .8.
12.1 Week twelve problems
1. Shanille O’Keal shoots free throws on a basketball court. She hits the first and misses
the second, and thereafter the probability that she hits the next shot is equal to the
proportion of shots she has hit so far. What is the probability that she hits exactly 50
of her first 100 shots?

2. A dart, thrown at random, hits a square target. Assuming that any two parts of the
target of equal area are equally likely to be hit, find the probability that the point hit
is nearer to the center than to any edge. Express your answer in the form (a√b + c)/d,
where a, b, c and d are positive integers.

3. A bag contains 1993 red balls and 1993 black balls. We remove two balls at a time
repeatedly and (i) discard both if they are the same color and (ii) discard the black
ball and return the red ball to the bag if their colors differ. What is the probability
that this process will terminate with exactly one red ball in the bag?

4. Let S be the set of 2 by 2 matrices each of whose entries is one of the 15 squares
0, 1, 4, 9, . . . , 196. Prove that if one selects more than 15^4 − 15^2 − 15 + 2 matrices from
S, then two of those selected must commute.

5. Let k be a positive integer. Suppose that the integers 1, 2, 3, . . . , 3k + 1 are written
down in random order. What is the probability that at no time during the process,
the sum of the integers that have been written up to that time is a positive integer
divisible by 3? Your answer should be in closed form, but may include factorials.

6. An unbiased coin (i.e., heads and tails will each occur with probability 1/2) is tossed
n times. Find a formula, in closed form (no summation), for the expected value of
|H − T |, where H is the number of heads and T is the number of tails.
12.2 Week twelve solutions
1. (Putnam Competition 2002, Problem B1) Some doodling with small examples suggests
the following: if Shanille throws a total of n free throws, n ≥ 3, then for each k in the
range [1, n − 1] the probability that she makes exactly k shots is 1/(n − 1) (independent
of k).
We can prove this by induction on n, with n = 3 very easy. For n > 3, we start with
the extreme case k = 1. The probability that she makes exactly one shot in total is
the probability that she misses each of shots 3 through n, which is

(1/2) · (2/3) · (3/4) · . . . · ((n − 3)/(n − 2)) · ((n − 2)/(n − 1)) = 1/(n − 1).
For k > 1, there are two (mutually exclusive) ways that she can make k shots in total:

(a) Make k − 1 of the first n − 1, and make the last; the probability of this happening
is, by induction, (1/(n − 2)) · ((k − 1)/(n − 1)), or
(b) make k of the first n − 1, and miss the last; the probability of this happening is,
by induction, (1/(n − 2)) · ((n − 1 − k)/(n − 1)).
Thus the net probability of making k shots is

(1/(n − 2)) · ((k − 1)/(n − 1)) + (1/(n − 2)) · ((n − 1 − k)/(n − 1)) = 1/(n − 1),
and we are done by induction.
The answer to the given question is 1/99 (n = 100).
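(The uniform answer can also be confirmed by an exact dynamic-programming computation over the rules as stated; a sketch, using exact rational arithmetic:)

    from fractions import Fraction

    def hit_distribution(n):
        # distribution of the number of hits after n shots (first hit, second miss)
        dist = {1: Fraction(1)}                    # after 2 shots she has made exactly 1
        for shot in range(3, n + 1):
            new = {}
            for made, prob in dist.items():
                p_hit = Fraction(made, shot - 1)   # proportion of the shots so far that she hit
                new[made + 1] = new.get(made + 1, 0) + prob * p_hit
                new[made] = new.get(made, 0) + prob * (1 - p_hit)
            dist = new
        return dist

    dist = hit_distribution(100)
    assert all(p == Fraction(1, 99) for p in dist.values())   # each of 1, ..., 99 hits has probability 1/99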

2. (Putnam Competition 1989, Problem B1) Place the dartboard on the x-y plane, with
vertices at (0, 0), (0, 2), (2, 2) and (2, 0), so center at (1, 1). We want to compute the
area of the set of points inside this square which are closer to (1, 1) than any of x = 0, 2,
y = 0, 2. We’ll just consider the triangle T bounded by vertices (0, 0), (1, 0), (1, 1); by
symmetry, this is one eighth of the desired area.
For a point (x, y) in T , the distance to (1, 1) is √((x − 1)^2 + (y − 1)^2), and the distance
to the nearest of x = 0, 2, y = 0, 2 is just y. So the curve √((x − 1)^2 + (y − 1)^2) = y, or

y = (x^2 − 2x + 2)/2,

cuts T into two regions, one (containing (1, 1)) being the points that are closer to (1, 1)
than the nearest of x = 0, 2, y = 0, 2. This curve hits the line x = y at (2 − √2, 2 − √2).
So the desired area inside T is the total area of T (which is 1/2) minus the area bounded
by x = y from (0, 0) to (2 − √2, 2 − √2), then y = (x^2 − 2x + 2)/2 to (1, 1/2), then
x = 1 to (1, 0), then the x-axis back to (0, 0). This area is the area of the triangle
bounded by (0, 0), (2 − √2, 2 − √2), and (2 − √2, 0) (which is (2 − √2)^2/2), plus

∫_{2−√2}^{1} (x^2 − 2x + 2)/2 dx = (1/3)(4√2 − 5);

grand total (1/3)(4 − 2√2). It follows that the desired area inside T is

1/2 − (1/3)(4 − 2√2) = (1/6)(4√2 − 5),

and so the total desired area is eight times this, or (4/3)(4√2 − 5).
Since the total area of the square is 4, the desired probability is thus

(1/3)(4√2 − 5) ≈ .218951.

Note: This is an example of a Putnam question with a typo: the correct (and officially
sanctioned) answer is as given above, but notice that c = −5, which is not a positive
integer.
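(A Monte Carlo estimate agrees with the value (1/3)(4√2 − 5) ≈ 0.2190; again just a sanity check, not part of the solution.)

    import random

    def closer_to_center(x, y):
        # point (x, y) in the square [0,2] x [0,2]: compare distance to (1,1) with distance to nearest edge
        return ((x - 1) ** 2 + (y - 1) ** 2) ** 0.5 < min(x, y, 2 - x, 2 - y)

    trials = 10**6
    hits = sum(closer_to_center(random.uniform(0, 2), random.uniform(0, 2)) for _ in range(trials))
    print(hits / trials)   # about 0.219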
3. (I heard this problem from David Cook) It helps to generalize to r red balls and b
black balls, since as the process goes along the number of balls of the two colors will
not be equal. A little experimentation suggests the following: if the process is started
with an odd number r ≥ 1 of red balls, and b ≥ 0 black balls, then it always ends with
one red ball. We prove this by induction on r + b. Formally: for each n ≥ 1, P (n) is
the proposition “if the process is started with an odd number r ≥ 1 of red balls, and
b ≥ 0 black balls, with r + b = n, then it always ends with one red ball”, and we prove P (n)
by induction on n.
Base case n = 1 is trivial, as is base case n = 2. For base case n = 3, we either start
with three red balls, in which case after one step we are down to one red, or we start
with one red and two blues. In this case, one third of the time we first pick the two
blacks, and we are down to one red, while two thirds of the time we first pick a black
and a red, and we are down to one red and one black, leaving us with one red after
step two.
Now consider n ≥ 4, and start with r red balls and b black balls, r odd and r + b = n.
If on the first step we pick two reds, then we are left with r − 2 red balls and b black
balls. Note that r − 2 is odd and (r − 2) + b = n − 2, so by induction in this case we
always end with one red. If on the first step we pick two blacks, then we are left with
r red balls and b − 2 black balls. Note that r is odd and r + (b − 2) = n − 2, so by
induction in this case we always end with one red. Finally, if on the first step we pick
a black and a red, then we are left with r red balls and b − 1 black balls. Note that r
is odd and r + (b − 1) = n − 1, so by induction in this case we always end with one
red. This completes the induction.
Since 1993 is odd, the probability of ending with one red, starting with 1993 red balls
and 1993 black balls is 1.
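(A simulation of the process, run for smaller odd numbers of red balls, terminates with a single red ball every time, as the induction predicts. This only checks, and of course does not prove, the claim.)

    import random

    def final_bag(red, black):
        while red + black >= 2:
            balls = ['R'] * red + ['B'] * black
            a, b = random.sample(balls, 2)       # remove two balls at random
            if a == b == 'R':
                red -= 2                         # same colour: discard both
            elif a == b == 'B':
                black -= 2
            else:
                black -= 1                       # different colours: discard the black, return the red
        return red, black

    assert all(final_bag(9, 8) == (1, 0) for _ in range(1000))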
4. (Putnam Competition 1990, problem B3) Let A be the set of diagonal matrices from S
(matrices with 0’s off the main diagonal), and B the set of matrices with all four entries
the same. Since |S| = 15^4, |A| = 15^2, |B| = 15 and |A ∩ B| = 1, inclusion-exclusion
tells us that

|(A ∪ B)^c | = 15^4 − 15^2 − 15 + 1.
Let us select at least 15^4 − 15^2 − 15 + 3 elements from S. If the all-zero matrix (the
unique entry in A ∩ B) is among those we select, then there is a pair that commutes,
since the all-zero matrix commutes with everything. So let’s assume that the all-zero
matrix was not selected.
If two diagonal matrices are selected, or two matrices with all four entries the same are
selected, then there is a pair that commutes, since any two diagonal matrices commute,
and any two matrices with all four entries the same commute. So let’s assume that at
most one diagonal matrix was selected, and at most one matrix with all four entries
the same.
This leaves at least 15^4 − 15^2 − 15 + 1 matrices selected from (A ∪ B)^c ; in other words,
all matrices from (A ∪ B)^c . The problem is completed by exhibiting two matrices from
(A ∪ B)^c that commute; the pair [1, 1; 0, 1] and [1, 4; 0, 1] works.
5. (Putnam Competition 2007, problem A3) The official solution, published in the Amer-
ican Mathematical Monthly, is very nicely presented, so I reproduce it here verbatim:
“The number of ways to write down 1, 2, 3, . . . , 3k + 1 in random order is (3k + 1)!, so
we want to count the number of ways in which none of the “partial sums” is divisible
by 3. First, consider the integers modulo 3 : 1, 2, 0, 1, 2, 0, . . . , 1, 2, 0, 1. To write these
with none of the partial sums divisible by 3, we must start with a 1 or a 2. After that,
we can include or omit 0’s at will without affecting whether any of the partial sums
are divisible by 3, so suppose [initially] we omit all 0’s. The remaining sequence of 1’s
and 2’s must then be of the form
1, 1, 2, 1, 2, 1, 2, . . .
or
2, 2, 1, 2, 1, 2, 1, . . .
(once you start, the rest of the sequence is forced by the condition that no partial sum
is divisible by 3). However, a sequence of the form 2, 2, 1, 2, 1, 2, 1, . . . has one more 2
than 1, and we need to have one more 1 than 2. So the only possibility for our sequence
modulo 3, once the 0’s are omitted, is 1, 1, 2, 1, 2, 1, 2, . . .. There are 2k + 1 numbers in
this sequence, and the k 0’s can be returned to the sequence arbitrarily except at the
beginning. So the number of ways to form the complete sequence modulo 3 equals the
number of ways to distribute the k identical 0’s over 2k + 1 boxes (the “slots” after the
1’s and 2’s), which by a standard “stars and bars” argument is (3k choose k). Once this is done,
there are k! ways to replace the k 0’s in the sequence modulo 3 by the actual integers
3, 6, . . . , 3k. Also, there are k! ways to “reconstitute” the 2’s and (k + 1)! ways for the
1’s. So the answer is
(3k choose k) · k! · k! · (k + 1)! / (3k + 1)!.”
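(The formula can be brute-forced for small k, which is a reassuring check that no constant was lost along the way; k = 1 and k = 2 run instantly.)

    from itertools import permutations
    from math import comb, factorial
    from fractions import Fraction

    def brute(k):
        n = 3 * k + 1
        good = 0
        for perm in permutations(range(1, n + 1)):
            total = 0
            for x in perm:
                total += x
                if total % 3 == 0:
                    break
            else:
                good += 1                        # no partial sum was divisible by 3
        return Fraction(good, factorial(n))

    def formula(k):
        return Fraction(comb(3 * k, k) * factorial(k) ** 2 * factorial(k + 1), factorial(3 * k + 1))

    assert all(brute(k) == formula(k) for k in (1, 2))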
6. (Putnam Competition 1974, Problem A4) Here’s a solution from https://mks.mff.
cuni.cz/kalva/putnam/putn74.html:
If i heads are tossed, then n − i tails, so H − T = 2i − n, and

E(|H − T |) = (1/2^n) ∑_{i=0}^{n} |2i − n| (n choose i).

For i = 0, . . . , [n/2], the summand is (n − 2i) (n choose i), which is equal to
(2(n − i) − n) (n choose n − i), the summand for n − i; so
E(|H − T |) = (1/2^{n−1}) ∑_{i=0}^{[n/2]} (n − 2i) (n choose i)

(this clearly works for odd n, since as i runs from 0 to [n/2], n − i runs from n to
[n/2] + 1; it also works for even n, since in this case the one term that contributes in
both ranges, i = [n/2], contributes 0).
We’ll show

∑_{i=0}^{[n/2]} (n − 2i) (n choose i) = n (n−1 choose [n/2]),

which leads to

E(|H − T |) = n (n−1 choose [n/2]) / 2^{n−1}.
We’ll use the committee-chair identity,

i (n choose i) = n (n−1 choose i−1),

and

∑_{i=0}^{[n/2]} (n choose i) = 2^{n−1} if n is odd, and 2^{n−1} + (1/2)(n choose [n/2]) if n is even

(to see this, use (n choose r) = (n choose n−r) and ∑_{i=0}^{n} (n choose i) = 2^n ).

For odd n, we have

∑_{i=0}^{[n/2]} (n − 2i) (n choose i) = n 2^{n−1} − 2 ∑_{i=0}^{[n/2]} i (n choose i)
                                      = n 2^{n−1} − 2n ∑_{i=1}^{[n/2]} (n−1 choose i−1)
                                      = n 2^{n−1} − 2n ∑_{i=0}^{[n/2]−1} (n−1 choose i)
                                      = n 2^{n−1} − n (2^{n−1} − (n−1 choose [n/2]))
                                      = n (n−1 choose [n/2]),
as claimed. The case of even n is dealt with similarly.
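(The closed form is easy to check against the defining sum, for both parities of n:)

    from math import comb

    def direct(n):
        return sum(abs(2 * i - n) * comb(n, i) for i in range(n + 1)) / 2**n

    def closed_form(n):
        return n * comb(n - 1, n // 2) / 2**(n - 1)

    assert all(abs(direct(n) - closed_form(n)) < 1e-9 for n in range(1, 25))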
13 Week thirteen (November 25) — One more grab-
bag, and some writing tips
This final selection of uncategorized problems is mostly drawn from recent Putnam competi-
tions. The odd-problem-out is problem 4, which I first heard a few days ago, and was blown
away by.
This week’s preamble concerns general problem solving tips, and specific writing tips. I
have shamelessly appropriated both of the sections below — the first from Ravi Vakil’s Stan-
ford Putnam Preparation website (http://math.stanford.edu/~vakil/putnam05/05putnam7.
pdf), and the second from Ioana Dumitriu’s UWashington “The Art of Problem Solv-
ing” website (http://www.math.washington.edu/~putnam/index.html). Both sections
are filled with what I think is great advice.

General Putnam tips


Getting ready
1. Try, if possible, to get a good night’s sleep beforehand.

2. Wear comfortable clothes.

3. There’s no need to bring paper, pencils, erasers or pencil sharpeners (unless you have
some “lucky” ones); I will provide all of these necessary materials.

4. Bring snacks and drinks, to keep your energy up during each three-hour session. I’ll
provide pizza, fruit and soda for lunch, so if you are happy with that, there’s no need
to bring any lunch supplies.

5. During the competition, you can’t use outside notes, computers, calculators, etc..

During the competition


1. Spend some time at the start looking over the questions. The earlier questions in each
half tend to be easier, but this isn’t always the case — you may be lucky/inspired and
spot a way in to A4, before A1. Remember that you have three hours for each paper;
that’s a lot of time, so take some time to begin calmly.

2. When (rather than if) your mind gets tired, take a break; go outside, get a snack, use
the bathroom, clear your head.

3. The “fifteen minute rule” can be helpful: if you find that you’ve been thinking about
a problem aimlessly without having any serious new ideas for 15 minutes, then jump
to a different problem, or take a break.

4. Don’t become discouraged. Don’t think of this as a test or exam. Think of it instead
as a stimulating challenge. Often a problem will break open a couple of hours into
the session. And the problem with your name on it might be in the afternoon session.
Remember that the national median score for the entire competition is usually 0 or
1; getting somewhere on one question is significant (it puts you on the top half of the
curve); getting a full question is a big deal.

5. If you solve a problem, write it up very well (rather than starting a new problem);
grading on the Putnam competition is very severe. See the later section on writing for
more on this.

Some specific mathematical tips


Here are some very simple things to remember, that can be very helpful, but in the heat of
the moment people tend to forget to do.

1. Try a few small cases out. Try a lot of cases out. Remember that three hours is a
long time — you have time to spare! If a question asks what happens when you have
n things, or 2014 things, try it out with 1, 2, 3, 4 things, and try to form a conjecture.
This is especially valuable for questions about sequences defined recursively.

2. Don’t be afraid to use lots and lots of paper.

3. Don’t be afraid of diving into some algebra. (Again, three hours is a long time . . ..)
You shouldn’t waste that much time, thanks to the 15-minute rule.

4. If a question asks to determine whether something is true or false, and the direction
you initially guess doesn’t seem to be going anywhere, then try guessing the opposite
possibility.

5. Be willing to try (seemingly) stupid things.

6. Look for symmetries. Try to connect the problem to one you’ve seen before. Ask
yourself “how would [person X] approach this problem”? (It’s quite reasonable here
for [Person X] to be [Chuck Norris]!)

7. Putnam problems always have slick solutions. That leads to a helpful meta-approach:
“The only way this problem could have a nice solution is if this particular approach
worked out, unlikely as it seems, so I’ll try it out, and see what happens.”

8. Show no fear. If you think a problem is probably too hard for you, but you have an
idea, try it anyway. (Three hours is a long time.)

Some specific writing tips


You’re working hard on a problem, focussing all your energy, applying all the good tips
above, and all of a sudden a bulb lights above your head: you have figured out how to solve
one of the problems! Awesome! But here’s the hard, unforgiving truth: that warm tingle
you’re feeling is no guarantee
• that you have actually solved the problem (are you sure you have all the cases covered?
Are you sure that all the little details work out? . . .) and
• that you will get any or all credit for it.
Solving the problem is only half the battle. Now you have to launch into the other half,
convincing the grader that you have actually done so.
Life is tough, and Putnam graders may be even tougher. But here’s a list of things which,
when done properly, will yield a nice write-up which will appease any reasonable grader.
1. All that scrap paper, filled with your musings on the problem to date? Put it aside;
you must write a clean, coherent solution on a fresh piece of paper.
2. Before you start writing, organize your thoughts. Make a list of all steps to the solution.
Figure out what intermediary results you will need to prove. For example, if the
problem involves induction, always start with the base case, and continue with proving
that “true for n implies true for n + 1”. Make sure that the steps follow from each
other logically, with no gaps.
3. After tracing a “road to proof” either in your head or (preferably) on scratch paper,
start writing up the solution on a fresh piece of paper. The best way to start this is
by writing a quick outline of what you propose to do. Sometimes, the grader will just
look at this outline, say “Yes, she knows what she is doing on this problem!”, and give
the credit.
4. Complete each step of the “road” before you continue to the next one.
5. When making statements like “it follows trivially” or “it is easy to see”, listen for
quiet, nagging doubts. If you yourself aren’t 100% convinced, how will you convince
someone else? Even if it seems to follow trivially, check again. Small exceptions may
not be obvious. The strategy “I am sure it’s true, even if I don’t see it; if I state that
it’s obvious, maybe the grader will believe I know how to prove it” has occasionally
led its user to a score of 0 out of 10.
6. Organize your solution on the page; avoid writing in corners or perpendicular to the
normal orientation. Avoid, if possible, post-factum insertion (if you discover you’ve
missed something, rather than making a mess of the paper by trying to write it over,
start anew. You have the time!)
7. Before writing each phrase, formulate it completely in your mind. Make sure it ex-
presses an idea. Starting to write one thing, then changing course in mid-sentence and
saying another thing is a sure way to create confusion.
8. Be as clear as possible. Avoid, if possible, long-winded phrases. Use as many words as
you need – just do it clearly. Also, avoid acronyms.
9. If necessary, state intermediary results as “claims” or “lemmas” which you can prove
right after stating them. If you cannot prove one of these results, but can prove the
problem’s statement from it, state that you will assume it, then show the path from it
to the solution. You may get partial credit for it.
10. When you’re done writing up the solution, go back and re-read it. Put yourself in
the grader’s shoes: can someone else read your write-up and understand the solution?
Must one look for things in the corners? Are there “miraculous” moments? And so on.
13.1 Week thirteen problems
1. Determine, with proof, the number of ordered triples (A1 , A2 , A3 ) of sets which have
the property that

(a) A1 ∪ A2 ∪ A3 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and


(b) A1 ∩ A2 ∩ A3 = ∅.

Express your answer in the form 2a 3b 5c 7d , where a, b, c and d are nonnegative integers.

2. Prove or disprove: if x and y are real numbers with y ≥ 0 and y(y + 1) ≤ (x + 1)2 then
y(y − 1) ≤ x2 .

3. You have coins C1 , C2 , . . . , Cn . For each k, Ck is biased so that, when tossed, it has
probability 1/(2k + 1) of falling heads. If the n coins are tossed, what is the probability
that the number of heads is odd? Express your answer as a rational function of n.

4. Alice thinks of a polynomial with strictly positive integer coefficients (so something
of the form P (x) = ∑_{i=0}^{n} ai x^i with n ≥ 0 and each ai ∈ {1, 2, 3, . . .}). Bob wants to
learn Alice’s polynomial, but initially knows nothing about it except that it has strictly
positive integer coefficients (in particular, Bob doesn’t know the degree n). Show that
if Alice tells Bob just the two values P (1) and P (P (1) + 1), then Bob can figure out
the full polynomial.

5. Let T be an acute triangle. Inscribe a pair R, S of rectangles in T as shown (R sits on one side of T and S sits directly on top of R, both with sides parallel to that side):

Let A(X) denote the area of polygon X. Find the maximum value, or show that no
maximum exists, of
A(R) + A(S)
A(T )
(i.e., the proportion of the area of T taken up by R and S), where T ranges over all
acute triangles and R, S over all rectangles as above.

6. Let d be a real number. For each integer m ≥ 0, define a sequence (am (j))_{j≥0} by the
conditions

am (0) = d/(2m)

and

am (j + 1) = am (j)^2 + 2 am (j), j ≥ 0.

Evaluate lim_{n→∞} an (n).
13.2 Week thirteen solutions
1. (Putnam Competition 1985, problem A1) Idea: each i ∈ {1, . . . , 10} has six possibil-
ities: it can be in A1 alone, A2 alone, A3 alone, or be missing from A1 alone, missing
from A2 alone, or missing from A3 alone. So there should be 6^10 ways to construct such
a triple. Here’s a possible way of presenting this idea formally:
We claim that there are exactly 6^10 = 2^10 3^10 such triples. Indeed, let S be the set of
all triples (A1 , A2 , A3 ) satisfying the conditions of the question, and let T be the set
of all words of length 10 over alphabet {a, b, c, d, e, f }. We will construct a bijection ϕ
from S to T ; since |T | is evidently 6^10, this will prove the claim.
The bijection is the following: given S = (A1 , A2 , A3 ) ∈ S, for each i ∈ {1, . . . , 10} the
ith letter of ϕ(S) is a if i ∈ A1 , i ∉ A2 , i ∉ A3 ; b if i ∉ A1 , i ∈ A2 , i ∉ A3 ; c if i ∉ A1 ,
i ∉ A2 , i ∈ A3 ; d if i ∈ A1 , i ∈ A2 , i ∉ A3 ; e if i ∈ A1 , i ∉ A2 , i ∈ A3 ; and f if i ∉ A1 ,
i ∈ A2 , i ∈ A3 .
Since there is no i with the property that i ∉ A1 , i ∉ A2 , i ∉ A3 (by condition (a)),
and none with the property that i ∈ A1 , i ∈ A2 , i ∈ A3 (by condition (b)), ϕ(S) ∈ T
for all S ∈ S.
To verify that ϕ is injective, consider S1 = (A1 , A2 , A3 ), S2 = (A′1 , A′2 , A′3 ) ∈ S with
S1 ≠ S2 . Since S1 ≠ S2 there must be some i ∈ {1, . . . , 10} and some j ∈ {1, 2, 3} such
that either i ∈ Aj and i ∉ A′j , or i ∉ Aj and i ∈ A′j ; in either case ϕ(S1 ) and ϕ(S2 )
differ on the ith letter and so are distinct.
To verify that ϕ is surjective, consider a word T = w1 . . . w10 ∈ T . Let A = {i : wi = a},
and define B, C, D, E and F analogously. Set A1 = A ∪ D ∪ E, A2 = B ∪ D ∪ F ,
and A3 = C ∪ E ∪ F . We have A1 ∪ A2 ∪ A3 = A ∪ B ∪ C ∪ D ∪ E ∪ F = {1, . . . , 10}.
For each i ∈ {1, . . . , 10}, if wi = a then by construction i ∈ A1 , i ∉ A2 , i ∉ A3 , so
i ∉ A1 ∩ A2 ∩ A3 , and by similar reasoning i ∉ A1 ∩ A2 ∩ A3 if wi = b, c, d, e or f ; so
A1 ∩ A2 ∩ A3 = ∅, and hence S := (A1 , A2 , A3 ) ∈ S. By construction of S, ϕ(S) = T .
This completes the proof that ϕ is a bijection, and so the proof of the initial claim.
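(The count 6^m is small enough to brute-force for a 3-element ground set, which is a useful check on the bijection; the argument itself is of course independent of the size of the ground set.)

    from itertools import combinations

    def all_subsets(ground):
        return [frozenset(c) for r in range(len(ground) + 1) for c in combinations(ground, r)]

    ground = frozenset({1, 2, 3})
    subs = all_subsets(ground)
    count = sum(1 for A1 in subs for A2 in subs for A3 in subs
                if (A1 | A2 | A3) == ground and not (A1 & A2 & A3))
    print(count, 6**3)   # both 216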

2. (Putnam Competition 1988, Problem B2) [Solution due to John Scholes, https://
mks.mff.cuni.cz/kalva/putnam/psoln/psol888.html] We prove that the claimed
implication necessarily follows, by considering three cases.
Case 1: x ≥ 0 and y(y + 1) = (x + 1)^2. In this case, using (x + 1/2)(x + 1/2 + 1) =
(x + 1)^2 − 1/4, we have y > x + 1/2 and so y(y − 1) = y(y + 1) − 2y < (x + 1)^2 − 2(x + 1/2) = x^2.
Case 2: x ≥ 0 and y(y + 1) < (x + 1)^2. If y ≤ 1 then y(y − 1) ≤ 0 ≤ x^2 and we are done.
Otherwise y > 1, and there is z > y with z(z + 1) = (x + 1)^2. We have y − 1 < z − 1 and,
since z > y > 1, y(y − 1) < y(z − 1) < z(z − 1) < x^2 (the last inequality by Case 1).
Case 3: x < 0. In this case −|x| − 1 < x + 1 < |x| + 1, so (x + 1)^2 < (|x| + 1)^2, and
x^2 = |x|^2, and the result follows from the result for y, |x|.

3. (Putnam Competition 2001, A2) Let pn be the required probability. We have p1 = 1/3.
For n ≥ 2 we can express pn in terms of pn−1 as follows: we get an odd number of heads
either by getting an odd number of heads among the first n − 1 (probability pn−1 ) and
a tail on the nth coin (probability 2n/(2n + 1)), or by getting an even number of heads
among the first n − 1 (probability 1 − pn−1 ) and a head on the nth coin (probability
1/(2n + 1)). This leads to the recurrence

pn = (2n · pn−1)/(2n + 1) + (1 − pn−1)/(2n + 1) = (1 + (2n − 1) pn−1)/(2n + 1),
valid for n ≥ 2. We claim that the solution to this recurrence is pn = n/(2n + 1). We
prove this claim by induction on n. The base case n = 1 is clear. For n ≥ 2 we have,
using the inductive hypothesis in the second equality,
pn = (1 + (2n − 1) pn−1)/(2n + 1)
   = (1 + (2n − 1) · (n − 1)/(2(n − 1) + 1))/(2n + 1)
   = n/(2n + 1),
completing the induction.
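(Exact rational arithmetic confirms both the recurrence and the closed form against a direct computation over all 2^n outcomes, for small n:)

    from fractions import Fraction
    from itertools import product

    def p_odd_direct(n):
        probs = [Fraction(1, 2 * k + 1) for k in range(1, n + 1)]
        total = Fraction(0)
        for outcome in product([0, 1], repeat=n):
            if sum(outcome) % 2 == 1:
                pr = Fraction(1)
                for h, p in zip(outcome, probs):
                    pr *= p if h else 1 - p
                total += pr
        return total

    def p_odd_recurrence(n):
        p = Fraction(1, 3)
        for m in range(2, n + 1):
            p = (1 + (2 * m - 1) * p) / (2 * m + 1)
        return p

    for n in range(1, 9):
        assert p_odd_direct(n) == p_odd_recurrence(n) == Fraction(n, 2 * n + 1)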

4. (Told to me by Chris Long) p(1) is the sum of the coefficients; since all coefficients are
strictly positive, it follows that p(1) + 1 is strictly larger than all coefficients. Call this
number p(1) + 1 “N ”. The result of querying p(N ) is a number M of the form

a0 + a1 N + a2 N^2 + . . . + an N^n

where each of the ai ’s satisfies 1 ≤ ai ≤ N − 1 and each is an integer. It follows that
the base-N representation of M is

an an−1 . . . a1 a0 .

So we can recover p(x) from M by calculating the (unique) base-N representation of
M (for instance, by the following process: a0 is the remainder when M is divided by
N ; a1 is the remainder when (M − a0 )/N is divided by N , and so on; the process stops
when M − (a0 + . . . + an N^n ) reaches 0).

Remark: This demonstrates the power of adaptive querying (in which each query is
allowed to depend on the results of previous queries).

Example: Alice thinks of p(x) = 3 + x + 2x2 + x3 + 2x4 . Bob queries p(1) and learns
that it is 9, so N = 10. He queries p(10) and learns that it is 21213; this is M . He
converts this to base 10 (a vacuous process), and reads off the digits to recover p(x).
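(Bob’s procedure is mechanical enough to code directly; here is a sketch that reproduces the example above. The function names are mine, not part of the problem.)

    def recover(P1, PN):
        # given P(1) and P(P(1)+1), return the coefficient list [a0, a1, ..., an]
        N = P1 + 1                 # strictly larger than every coefficient
        coeffs, M = [], PN
        while M > 0:
            coeffs.append(M % N)   # next base-N digit
            M //= N
        return coeffs

    secret = [3, 1, 2, 1, 2]       # Alice's polynomial 3 + x + 2x^2 + x^3 + 2x^4
    P = lambda x: sum(a * x**i for i, a in enumerate(secret))
    print(recover(P(1), P(P(1) + 1)))   # prints [3, 1, 2, 1, 2]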

5. (Putnam Competition 1985, A2) (Solution taken from http://people.stat.sfu.ca/~lockhart/richard/Putnam1985Sol.pdf)
The heart of this problem is coming up with the right notation. Here’s one attempt:
We claim that the maximum value that the ratio can take is 2/3, and that in fact for any
T , 2/3 can be reached.
The problem is evidently scale invariant so we take the side of the triangle on which
R lies to be the line segment on the x-axis of the usual Cartesian plane running from
x = 0 to x = 1. Put the third vertex of T at (x, y) where y > 0, 0 < x < 1, and (x, y)
is outside the disk of radius 1/2 centered at (1/2, 0); this makes T an acute triangle,
and all acute triangles in the plane can be scaled and rotated to the described form
for some appropriate choice of x and y.
We take the combined height of R and S to be λy for some λ with 0 ≤ λ ≤ 1, and the
height of R alone to be λθy for some θ with 0 ≤ θ ≤ 1.
Some simple coordinate geometry gives that the corners of R are at

(λθx, 0); (λθx, λθy); (1 − λθ(1 − x), 0); (1 − λθ(1 − x), λθy),

and so
A(R) = (1 − λθ)λθy.
Similarly the corners of S are at

(λx, λθy); (λx, λy); (1 − λ(1 − x), λθy); (1 − λ(1 − x), λy)

so
A(S) = λ(1 − λ)(1 − θ)y.
Since A(T ) = y/2 we get

(A(S) + A(R))/A(T ) = 2 ((1 − λθ)λθ + λ(1 − λ)(1 − θ)) = 2(−λ^2 θ^2 + λ^2 θ + λ − λ^2).
This is 0 at λ = 0. The derivative with respect to θ is

2λ^2 − 4λ^2 θ,

which for λ > 0 vanishes at, and only at, θ = 1/2. The second derivative with respect
to θ is always non-positive so that θ = 1/2 maximizes (A(S) + A(R))/A(T ) for each
λ > 0. At θ = 1/2 the ratio becomes

2(λ − 3λ^2/4),

which a little calculus shows is uniquely maximized at λ = 2/3, at which point the
ratio is 2/3.
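(A crude numerical maximization of 2(−λ^2 θ^2 + λ^2 θ + λ − λ^2) over the unit square corroborates the value 2/3; this only checks the final optimization, which the algebra above already settles.)

    best = 0.0
    for i in range(1001):
        for j in range(1001):
            lam, theta = i / 1000, j / 1000
            val = 2 * (-lam**2 * theta**2 + lam**2 * theta + lam - lam**2)
            best = max(best, val)
    print(best)   # about 0.6667, attained near lambda = 2/3, theta = 1/2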

6. (Putnam Competition 1985, A3) (Solution taken from http://people.stat.sfu.ca/~lockhart/richard/Putnam1985Sol.pdf)
The key point is that the defining relation
for am (j) almost involves a perfect square; shifting am (j) slightly creates a new se-
quence, that is defined by a “perfect square” recurrence relation, and is easy to work
with.
We claim that lim_{n→∞} an (n) = −1. To see this, define a sequence (bm (j))_{j≥0} by

bm (j) = am (j) + 1,
so that
bm (0) = 1 + d/(2m)
and
bm (j + 1) − 1 = (bm (j) − 1)^2 + 2(bm (j) − 1), j ≥ 0,
or equivalently
bm (j + 1) = bm (j)^2, j ≥ 0.
A simple induction gives that

bm (m) = (1 + d/(2m))^{2^m},

so that

am (m) = (1 + d/(2m))^{2^m} − 1,
and so

lim_{n→∞} an (n) = −1

as claimed.

Remark: I mis-typed the problem; in the 1985 Putnam, the initial condition
was

am (0) = d/2^m.

The same proof leads to the conclusion that

lim_{n→∞} an (n) = lim_{n→∞} (1 + d/2^n)^{2^n} − 1 = e^d − 1,

a more interesting answer depending as it does on d.