
Principle of Inclusion/Exclusion and The Sieve
Halcyon Derks
The Principle of Inclusion-Exclusion:
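Let $A_1, A_2, A_3, \ldots, A_n$ be subsets of a finite set $S$. Define
$$A_I = \bigcap_{i \in I} A_i \quad \text{for } I \subseteq \{1, 2, 3, \ldots, n\},$$
with the convention that the empty intersection is $A_\emptyset = S$. Then we have
$$|S - (A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n)| = \sum_{I \subseteq \{1,2,3,\ldots,n\}} (-1)^{|I|} |A_I|. \qquad (1)$$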
This formula calculates the number of elements that do not belong to any $A_i$.
The complement of this formula is also commonly used, as it calculates the number
of elements that belong to the union of the sets. Notice that $|S - (A_1 \cup \cdots \cup A_n)| = |S| - |A_1 \cup \cdots \cup A_n|$, so multiplying equation (1) by $-1$ and adding $|S|$ to both sides yields $|A_1 \cup \cdots \cup A_n|$; the term for $I = \emptyset$, which equals $-|S|$ after the sign change, cancels. Thus
$$|A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n| = \sum_{\emptyset \ne I \subseteq \{1,2,3,\ldots,n\}} (-1)^{|I|-1} |A_I| \qquad (2)$$
is obtained. Equation (2) can also be written by grouping the terms of the summation
based on $|I|$:
$$|A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n| = \sum_{1 \le i \le n} |A_i| - \sum_{1 \le i < j \le n} |A_i \cap A_j| + \sum_{1 \le i < j < k \le n} |A_i \cap A_j \cap A_k| - \cdots + (-1)^{n+1} |A_1 \cap \cdots \cap A_n| \qquad (3)$$
In different cases any of these formulae can be the appropriate choice.
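As a sanity check on equations (1) and (2), the short Python sketch below compares both sides on randomly chosen subsets of a small set $S$. The function names `pie_complement` and `pie_union`, the 30-element ground set, and the random seed are illustrative assumptions, not part of the original argument; the intersection over the empty index set is taken to be $S$.

```python
from itertools import combinations
import random

def pie_complement(S, sets):
    """Right-hand side of equation (1): sum over all I of (-1)^|I| |A_I|, with A_empty = S."""
    total = 0
    for r in range(len(sets) + 1):
        for I in combinations(range(len(sets)), r):
            A_I = set(S)                      # start from S, the empty intersection
            for i in I:
                A_I &= sets[i]
            total += (-1) ** r * len(A_I)
    return total

def pie_union(sets):
    """Right-hand side of equation (2): sum over nonempty I of (-1)^(|I|-1) |A_I|."""
    total = 0
    for r in range(1, len(sets) + 1):
        for I in combinations(range(len(sets)), r):
            A_I = set.intersection(*(sets[i] for i in I))
            total += (-1) ** (r - 1) * len(A_I)
    return total

rng = random.Random(0)                        # fixed seed, purely illustrative
S = set(range(30))
sets = [{x for x in S if rng.random() < 0.4} for _ in range(4)]
union = set().union(*sets)
print(pie_complement(S, sets) == len(S - union))  # True
print(pie_union(sets) == len(union))              # True
```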
Equation (3) can be proven by showing that an element $a$ in the union $A_1 \cup \cdots \cup A_n$ is counted exactly once on the right-hand side of the equation. Let $a$ be a member of $r \ge 1$ of the sets. Then $a$ is counted $\binom{r}{1}$ times by $\sum |A_i|$, $\binom{r}{2}$ times by $\sum |A_i \cap A_j|$, and, in general, $\binom{r}{m}$ times by the sum involving intersections of $m$ of the sets. Thus, returning to equation (3), $a$ is counted exactly
$$\binom{r}{1} - \binom{r}{2} + \binom{r}{3} - \cdots + (-1)^{r+1} \binom{r}{r}$$
times by the summation. Using the Binomial Theorem to evaluate $(1 - 1)^r$,
$$\binom{r}{0} - \binom{r}{1} + \binom{r}{2} - \cdots + (-1)^r \binom{r}{r} = \sum_{k=0}^{r} (-1)^k \binom{r}{k} = (1 - 1)^r = 0.$$
Solving for $\binom{r}{1} - \binom{r}{2} + \binom{r}{3} - \cdots + (-1)^{r+1} \binom{r}{r}$, the observation can be made that
$$\binom{r}{1} - \binom{r}{2} + \binom{r}{3} - \cdots + (-1)^{r+1} \binom{r}{r} = 0 + \binom{r}{0} = 1.$$
So any element $a$ is counted exactly once on the right-hand side of the equation, just as it is in $|A_1 \cup \cdots \cup A_n|$.
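The counting step can also be checked numerically. This hypothetical helper, `times_counted`, evaluates $\binom{r}{1} - \binom{r}{2} + \cdots + (-1)^{r+1}\binom{r}{r}$ with Python's `math.comb`. It should return 1 for every $r \ge 1$, and 0 for $r = 0$, matching the fact that elements outside the union are never counted.

```python
from math import comb

def times_counted(r):
    """Number of times equation (3) counts an element lying in exactly r of the sets."""
    return sum((-1) ** (m + 1) * comb(r, m) for m in range(1, r + 1))

print([times_counted(r) for r in range(1, 8)])  # [1, 1, 1, 1, 1, 1, 1]
print(times_counted(0))                         # 0
```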
The Principle of Inclusion-Exclusion can also be proven using inductive reasoning.

Assume
$$|A_1 \cup \cdots \cup A_k| = \sum_{\emptyset \ne I \subseteq \{1,\ldots,k\}} (-1)^{|I|-1} |A_I|.$$
Then the goal is to show
$$|A_1 \cup \cdots \cup A_k \cup A_{k+1}| = \sum_{\emptyset \ne I \subseteq \{1,\ldots,k,k+1\}} (-1)^{|I|-1} |A_I|.$$
Define $B = A_1 \cup \cdots \cup A_k$; then $|B| = \sum_{\emptyset \ne I \subseteq \{1,2,3,\ldots,k\}} (-1)^{|I|-1} |A_I|$. The goal then becomes calculating $|B \cup A_{k+1}|$. Applying the Principle of Inclusion-Exclusion for $n = 2$ (this case is proven as the base case later in the proof) yields
$$|B \cup A_{k+1}| = |B| + |A_{k+1}| - |B \cap A_{k+1}|.$$
Now confirm that this result counts all of the elements added by including $A_{k+1}$ in the union. $|B|$ and $|A_{k+1}|$ have already been determined, so focus for a moment on finding a formula for $|B \cap A_{k+1}|$. Using the distributive property of unions and intersections,
$$|(A_1 \cup A_2 \cup \cdots \cup A_k) \cap A_{k+1}| = |(A_1 \cap A_{k+1}) \cup (A_2 \cap A_{k+1}) \cup \cdots \cup (A_k \cap A_{k+1})|,$$
a union of $k$ sets to which the Principle of Inclusion-Exclusion can be applied (the inductive assumption was that the Principle holds for $k$ sets). So,
$$|B \cap A_{k+1}| = \sum_{1 \le i \le k} |A_i \cap A_{k+1}| - \sum_{1 \le i < j \le k} |A_i \cap A_j \cap A_{k+1}| + \cdots + (-1)^{k+1} |A_1 \cap A_2 \cap \cdots \cap A_k \cap A_{k+1}|.$$
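Returning to $|B \cup A_{k+1}| = |B| + |A_{k+1}| - |B \cap A_{k+1}|$, group the summations based on $|I|$. Looking at the components for which $|I| = 1$ yields
$$\sum_{1 \le a \le k} |A_a| + |A_{k+1}| = \sum_{1 \le a \le k+1} |A_a|.$$
The same for $|I| = 2$ yields
$$\sum_{1 \le a < b \le k} |A_a \cap A_b| + \sum_{1 \le a \le k} |A_a \cap A_{k+1}| = \sum_{1 \le a < b \le k+1} |A_a \cap A_b|.$$
The pattern continues for all $|I|$, so considering the components for which $|I| = m$ yields
$$\sum_{1 \le j_1 < \cdots < j_m \le k} |A_{j_1} \cap \cdots \cap A_{j_m}| + \sum_{1 \le j_1 < \cdots < j_{m-1} \le k} |A_{j_1} \cap \cdots \cap A_{j_{m-1}} \cap A_{k+1}| = \sum_{1 \le j_1 < \cdots < j_m \le k+1} |A_{j_1} \cap \cdots \cap A_{j_m}|.$$
Notice also that the signs work out. A term of $|B \cap A_{k+1}|$ that intersects $m - 1$ of the sets $A_1, \ldots, A_k$ with $A_{k+1}$ carries the sign $(-1)^{(m-1)-1}$, and since $|B \cap A_{k+1}|$ is subtracted, it enters with the sign $-(-1)^{m-2} = (-1)^{m-1}$, the same sign as the terms of $|B|$ with $|I| = m$. So,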
$$|A_1 \cup A_2 \cup \cdots \cup A_{k+1}| = \sum_{1 \le a \le k+1} |A_a| - \sum_{1 \le a < b \le k+1} |A_a \cap A_b| + \sum_{1 \le a < b < c \le k+1} |A_a \cap A_b \cap A_c| - \cdots + (-1)^k |A_1 \cap A_2 \cap \cdots \cap A_{k+1}|$$

All that remains is to confirm the base case $n = 2$. To find the number of
elements in the union of two finite sets $A$ and $B$, notice that $|A| + |B|$ counts each
element that is in $A$ but not in $B$ exactly once, and vice versa. However, elements
that are in both $A$ and $B$ are counted twice. Removing this number of elements,
that is $|A \cap B|$, reveals that $|A \cup B| = |A| + |B| - |A \cap B|$, confirming that the Principle
holds for $n = 2$.
Sample Counting Problems

The Principle of Inclusion-Exclusion can be used for many different types of problems.
Consider the outcome of the Principle when working with only two or three sets. The
following are a few examples that apply the Principle directly to counting problems
for these values of n.
Find the number of positive integers not exceeding 100 that are not divisible by 5 or 7. Let $S$ be the set of all positive integers less than or equal to 100. Then define the subsets in the following way: $A$ will be the set of those positive integers that are divisible by 5, and $B$ will be those divisible by 7. Then $A \cap B$ represents the numbers that are divisible by both 5 and 7. Since these are relatively prime, a number that is divisible by both must be divisible by their product, namely 35.
Using these definitions
$$|S| = 100, \qquad |A| = \left\lfloor \frac{100}{5} \right\rfloor = 20, \qquad |B| = \left\lfloor \frac{100}{7} \right\rfloor = 14, \qquad |A \cap B| = \left\lfloor \frac{100}{35} \right\rfloor = 2.$$
Finally, applying the Principle of Inclusion-Exclusion yields
$$|S - (A \cup B)| = |S| - |A| - |B| + |A \cap B| = 100 - 20 - 14 + 2 = 68.$$
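A minimal sketch of this computation in Python, assuming the two divisors are coprime so that $|A \cap B| = \lfloor 100/35 \rfloor$; the helper name `count_not_divisible` is hypothetical. The brute-force count over $1, \ldots, 100$ is included only as a cross-check.

```python
def count_not_divisible(limit, p, q):
    """|S - (A u B)| for S = {1..limit}, A = multiples of p, B = multiples of q.
    Assumes gcd(p, q) = 1, so A n B is the set of multiples of p * q."""
    return limit - limit // p - limit // q + limit // (p * q)

brute = sum(1 for x in range(1, 101) if x % 5 != 0 and x % 7 != 0)
print(count_not_divisible(100, 5, 7), brute)  # 68 68
```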
How many bit strings of length 8 do not contain 6 consecutive 0s? Let $S$ be the set of all bit strings of length 8. Now consider ways that these strings could have six consecutive 0s. The chain of 0s could begin in the first position of the string, the second position, or the third position. Positions 4 through 8 will not suffice, since fewer than six positions remain from there to the end of the string. So, let $A_1$, $A_2$, and $A_3$ be the sets of strings with six consecutive 0s starting in the first, second, or third position, respectively. Then $A_1 \cap A_2$ contains the strings that have six 0s starting in the first position and six starting in the second. Five of these 0s overlap, so the set contains the strings that have 7 consecutive 0s beginning in the first position. Similarly, $A_2 \cap A_3$ is the set of strings that have 7 consecutive 0s beginning in the second position, and $A_1 \cap A_3$ is the set of strings that have eight consecutive 0s (six beginning in the first position and six beginning in the third position). The final set, $A_1 \cap A_2 \cap A_3$, is the set of strings with six consecutive 0s starting at each of the first three positions, which again means eight consecutive 0s.
In each case, the size of the set can be calculated by determining how many “free” positions remain. A free position can be filled with either a 0 or a 1 (2 choices), while the rest of the positions must be 0 (1 choice). So $S$ has eight free positions and $|S| = 2^8$. This pattern holds for all of the sets. So
$$|S| = 2^8 = 256, \qquad |A_1| = |A_2| = |A_3| = 2^2 = 4, \qquad |A_1 \cap A_2| = |A_2 \cap A_3| = 2^1 = 2, \qquad |A_1 \cap A_3| = |A_1 \cap A_2 \cap A_3| = 2^0 = 1.$$
Ultimately, applying the Principle of Inclusion-Exclusion yields
$$\begin{aligned} |S - (A_1 \cup A_2 \cup A_3)| &= |S| - \sum_{k} |A_k| + \sum_{j < k} |A_j \cap A_k| - |A_1 \cap A_2 \cap A_3| \\ &= 256 - 4 - 4 - 4 + 2 + 2 + 1 - 1 \\ &= 248. \end{aligned}$$
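The same free-position reasoning can be sketched in Python, under the assumption that each $A_I$ is described by the union of its forced zero positions; the constants `LENGTH` and `RUN` and the helper `size_of_intersection` are illustrative names. A brute-force scan over all $2^8$ strings confirms the inclusion-exclusion total.

```python
from itertools import combinations, product

LENGTH, RUN = 8, 6
starts = range(1, LENGTH - RUN + 2)  # a run of six 0s can start at positions 1, 2, 3

def size_of_intersection(I):
    """|A_I|: every position covered by some run in I is forced to 0; the rest are free."""
    forced = set()
    for s in I:
        forced.update(range(s, s + RUN))
    return 2 ** (LENGTH - len(forced))

by_pie = sum((-1) ** r * size_of_intersection(I)
             for r in range(len(starts) + 1)
             for I in combinations(starts, r))
brute = sum(1 for bits in product("01", repeat=LENGTH)
            if "0" * RUN not in "".join(bits))
print(by_pie, brute)  # 248 248
```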
How many permutations of the letters of the English alphabet do not contain any of the strings fish, frog, or bird? Let $S$ be the set of all permutations of the letters in the English alphabet. Each of the three words fish, frog, and bird can appear in some of these permutations, so let $A_{\text{fish}}$, $A_{\text{frog}}$, and $A_{\text{bird}}$ be the sets of permutations containing the specified word. By defining the sets in this way, the intersections will all be empty. For instance, a permutation that contains the word fish cannot also contain the word frog, because the f can only appear once in a permutation and cannot be followed by both i and r. Similarly, fish and bird share the letter i (preceded by f in one word and by b in the other), and frog and bird share the letter r (preceded by f in one word and by i in the other).
All that remains is to calculate $|S|$, $|A_{\text{fish}}|$, $|A_{\text{frog}}|$, and $|A_{\text{bird}}|$. Clearly $|S| = 26!$, as there are 26 choices for the first letter in the permutation, 25 choices for the second letter, 24 for the third, etc. $|A_{\text{fish}}|$, $|A_{\text{frog}}|$, and $|A_{\text{bird}}|$ are slightly more difficult to calculate. Each word uses four of the letters in the alphabet in a fixed order, so these four letters become one element in the “alphabet” of the specified set. Then each such set is the set of permutations of 23 elements instead of 26 (22 single letters and 1 word), so $|A_{\text{fish}}| = |A_{\text{frog}}| = |A_{\text{bird}}| = 23!$.
Finally, applying the Principle of Inclusion-Exclusion, the number of permutations of the English alphabet that do not contain any of the words fish, frog, or bird is
$$|S| - |A_{\text{fish}}| - |A_{\text{frog}}| - |A_{\text{bird}}| = 26! - 3 \cdot 23!.$$
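Checking all $26!$ permutations directly is infeasible, so the sketch below tests the same reasoning on a hypothetical scaled-down instance: the alphabet `abcdefg` and the words `abc`, `ade`, `dbg`, chosen so that every pair of words shares a letter in incompatible positions and therefore no two words can appear together. The helper names are assumptions for illustration only.

```python
from itertools import permutations
from math import factorial

def brute_count(alphabet, words):
    """Count permutations of `alphabet` containing none of `words` as a substring."""
    return sum(1 for p in permutations(alphabet)
               if not any(w in "".join(p) for w in words))

def formula_count(alphabet, words):
    # Valid only when no two words can occur together (all intersections empty),
    # as with fish/frog/bird; each word is glued into a single block.
    n = len(alphabet)
    return factorial(n) - sum(factorial(n - len(w) + 1) for w in words)

alphabet = "abcdefg"
words = ["abc", "ade", "dbg"]
print(formula_count(alphabet, words), brute_count(alphabet, words))  # 4680 4680
print(factorial(26) - 3 * factorial(23))  # the count from the text
```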
The methods and logic used to solve these problems can be applied to much more
generalized problems. The rest of this paper is devoted to solving some of these more
challenging problems as well as proving some very interesting results that follow from
(and lead to) the Principle of Inclusion-Exclusion.
Derangements
A derangement is a permutation of objects that leaves no object in its original
position. It is often interesting to calculate $D_n$, the number of derangements of a set of $n$ objects.
Before examining specific problems, it will be helpful to determine a general formula
for $D_n$ using the Principle of Inclusion-Exclusion. Let $S$ be the set of all permutations,
meaning that $|S| = n!$. Then define $A_i$ to be the set of permutations which fix
the $i$th object. Generalizing this, $A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}$ is the set of permutations in
which the objects $i_1, i_2, \ldots, i_k$ are all fixed in their original positions.
Next, determine a formula for the sizes of these sets. Begin with the sets $A_i$, each of which fixes a particular object in the permutation, allowing the remaining objects to be permuted in any way. So $|A_i|$ is the number of permutations of $n - 1$ objects, which is $(n - 1)!$. Then, since there are $n$ ways to choose the one fixed object,
$$\sum_{1 \le i \le n} |A_i| = n \cdot (n - 1)! = n!.$$
Next consider how to find $\sum |A_i \cap A_j|$. This case fixes both objects $i$ and $j$, while the other $n - 2$ objects can be permuted in any way. So $|A_i \cap A_j| = (n - 2)!$. Then, with $\binom{n}{2}$ ways to choose the two fixed objects, the result is
$$\sum_{1 \le i < j \le n} |A_i \cap A_j| = \binom{n}{2} \cdot (n - 2)! = \frac{n!}{(n - 2)! \cdot 2!} \cdot (n - 2)! = \frac{n!}{2!}.$$
Using this logic, consider the general case $|I| = r$. This case fixes $r$ objects from the set, allowing the other $n - r$ objects to be permuted. So $|A_I| = (n - r)!$. There are $\binom{n}{r}$ ways to choose which objects are fixed, so
$$\sum_{|I| = r} |A_I| = \binom{n}{r} \cdot (n - r)! = \frac{n!}{(n - r)! \cdot r!} \cdot (n - r)! = \frac{n!}{r!}.$$
Finally, using these values in the Principle of Inclusion-Exclusion,
$$\begin{aligned} D_n &= |S - (A_1 \cup A_2 \cup \cdots \cup A_n)| \\ &= n! - \frac{n!}{1!} + \frac{n!}{2!} - \cdots + (-1)^r \frac{n!}{r!} + \cdots + (-1)^n \frac{n!}{n!} \\ &= n! \left(1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + (-1)^r \frac{1}{r!} + \cdots + (-1)^n \frac{1}{n!}\right). \end{aligned}$$
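A small Python sketch comparing the inclusion-exclusion formula for $D_n$ with a brute-force count over all permutations of $\{0, \ldots, n-1\}$. The function names are hypothetical, and the brute-force check is only practical for small $n$.

```python
from itertools import permutations
from math import factorial

def derangements_pie(n):
    """D_n = sum_{r=0}^{n} (-1)^r n!/r!, the inclusion-exclusion formula above."""
    return sum((-1) ** r * (factorial(n) // factorial(r)) for r in range(n + 1))

def derangements_brute(n):
    """Count permutations p of range(n) with no fixed point p[i] == i."""
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

print([derangements_pie(n) for n in range(1, 8)])  # [0, 1, 2, 9, 44, 265, 1854]
```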
For practice, apply this formula and logic to a few examples.
What is the probability that none of 10 people receives the correct hat if a hatcheck person hands their hats back randomly? Recognize that the number of ways that no one receives the correct hat is the number of derangements of the 10 hats. Using the formula above, $D_{10} = 10!\left(1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + (-1)^{10} \frac{1}{10!}\right)$. Now, to find the probability of this event, divide the number of ways it can occur by the number of all outcomes, in this case the $10!$ permutations of the hats. So the probability that no one gets the correct hat returned to them is
$$\begin{aligned} \frac{D_{10}}{10!} &= \frac{10!\left(1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + (-1)^{10}\frac{1}{10!}\right)}{10!} \\ &= 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \frac{1}{6!} - \frac{1}{7!} + \frac{1}{8!} - \frac{1}{9!} + \frac{1}{10!} \\ &= \frac{1334961}{3628800} \approx 0.36787946. \end{aligned}$$
This is very close to $\frac{1}{e} \approx 0.36787944$. In fact, the expression in parentheses in the formula for $D_n$ is exactly the partial sum, up to the $x^n/n!$ term, of the series expansion of $e^x$ at $x = -1$. So,
$$\lim_{n \to \infty} \frac{D_n}{n!} = \frac{1}{e}.$$
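The probability can be reproduced exactly with Python's `fractions.Fraction`; `no_correct_hat` is a hypothetical helper that sums $(-1)^r/r!$ for $r = 0, \ldots, n$. Comparing its floating-point value with `math.exp(-1)` illustrates how quickly $D_n/n!$ approaches $1/e$.

```python
from fractions import Fraction
from math import exp, factorial

def no_correct_hat(n):
    """D_n / n! = sum_{r=0}^{n} (-1)^r / r!, as an exact fraction."""
    return sum(Fraction((-1) ** r, factorial(r)) for r in range(n + 1))

p10 = no_correct_hat(10)
print(p10 == Fraction(1334961, 3628800))  # True
print(float(p10), exp(-1))                # agree to about seven decimal places
```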
How many ways can the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 be arranged so that no even digit is in its original position (call this $E_{10}$)? This problem can be solved using the technique used to calculate $D_n$. This case is different because only the even digits cannot remain in their original positions, rather than all digits as in an ordinary derangement.
Let $S$ be the set of all permutations, and for $m = 1, \ldots, 5$ let $A_m$ be the set of permutations that fix the digit $2(m - 1)$; that is, $A_1$ fixes 0, $A_2$ fixes 2, and so on up to $A_5$, which fixes 8. Then $|S| = 10!$ and $|A_I| = (10 - |I|)!$. In this case there are $\binom{5}{1} = 5$ ways to fix one even digit, $\binom{5}{2}$ ways to fix two even digits, etc.
Using the Principle of Inclusion-Exclusion gives
$$\begin{aligned} E_{10} &= \sum_{I \subseteq \{1,\ldots,5\}} (-1)^{|I|} |A_I| = 10! - \binom{5}{1} 9! + \binom{5}{2} 8! - \binom{5}{3} 7! + \binom{5}{4} 6! - \binom{5}{5} 5! \\ &= 3628800 - 5(362880) + 10(40320) - 10(5040) + 5(720) - 1(120) \\ &= 2170680. \end{aligned}$$
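The computation of $E_{10}$ generalizes to $n$ items of which $m$ specified items must avoid their own positions. The sketch below assumes that generalization (the names `restricted_derangements` and `restricted_brute` are hypothetical) and cross-checks it by brute force on smaller cases, since enumerating all $10!$ permutations in pure Python is slow.

```python
from itertools import permutations
from math import comb, factorial

def restricted_derangements(n, m):
    """Permutations of n items in which m specified items all avoid their own positions:
    sum_{j=0}^{m} (-1)^j C(m, j) (n - j)!  (the E_10 computation uses n = 10, m = 5)."""
    return sum((-1) ** j * comb(m, j) * factorial(n - j) for j in range(m + 1))

def restricted_brute(n, protected):
    """Brute force: item i starts in position i; items in `protected` must move."""
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in protected))

print(restricted_derangements(10, 5))  # 2170680
```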
With this understanding of how $D_n$ can be calculated, it is possible to make combinatorial arguments providing some recurrence relations having to do with derangements.
Only two of these appear here, specifically
$$D_n = (n - 1)(D_{n-1} + D_{n-2})$$
and, from further manipulation,
$$D_n = nD_{n-1} + (-1)^n. \qquad (4)$$
The proof of the first will rely on a combinatorial argument. Suppose $n - 1$ elements have already been arranged in some permutation (not necessarily a derangement). Then placing the $n$th object into some position $k$ (not position $n$) will displace the object that was previously in position $k$. Use this displaced object to fill position $n$. The question is then how many arrangements of the original $n - 1$ objects will yield a derangement after this process.
If the $n - 1$ objects were a derangement, the new permutation will also be a derangement (all of the objects remain in the same position except the one in position $k$, which ends up in position $n$). Also, the object that starts in position $k$ could be $k$ itself, since the object in this position is being moved. This case does not fall into the above group, as the original arrangement would not be a derangement. Then none of the other $n - 2$ objects can appear in their original positions, so there are $D_{n-2}$ such arrangements. Conversely, every derangement of $n$ objects arises in exactly one of these ways: if object $n$ sits in position $k$, moving the object in position $n$ back to position $k$ recovers an arrangement of the first $n - 1$ objects of one of the two kinds above.
Overall there are $n - 1$ placements for the $n$th object and $D_{n-1} + D_{n-2}$ ways the previous elements could be arranged. So
$$D_n = (n - 1)(D_{n-1} + D_{n-2}).$$
Manipulating this equation will lead to the second equation above. First, distribute the $(n - 1)$ through the equation, so
$$D_n = nD_{n-1} - D_{n-1} + (n - 1)D_{n-2}.$$
Subtracting $nD_{n-1}$ gives
$$D_n - nD_{n-1} = -\left(D_{n-1} - (n - 1)D_{n-2}\right).$$
From here, an inductive argument yields equation (4). For the base case, confirm that $D_2 - 2D_1 = (-1)^2$. There is only one derangement of 2 elements, namely the reverse of their original placement. There are no derangements of only one object, as it must always end up in its original placement. So,
$$D_2 - 2D_1 = 1 - 2(0) = 1 = (-1)^2.$$
Now, for the inductive step, assume that $D_{n-1} - (n - 1)D_{n-2} = (-1)^{n-1}$. Then
$$D_n - nD_{n-1} = -\left(D_{n-1} - (n - 1)D_{n-2}\right) = -(-1)^{n-1} = (-1)^n.$$
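A quick numerical check of both recurrences, as a sketch: the table is built from $D_n = (n-1)(D_{n-1} + D_{n-2})$ using the assumed starting values $D_0 = 1$ (the empty arrangement) and $D_1 = 0$, and equation (4) is then verified term by term. The helper name `derangements_recursive` is hypothetical.

```python
def derangements_recursive(n_max):
    """D_0..D_{n_max} via D_n = (n-1)(D_{n-1} + D_{n-2}), starting from D_0 = 1, D_1 = 0."""
    D = [1, 0]
    for n in range(2, n_max + 1):
        D.append((n - 1) * (D[n - 1] + D[n - 2]))
    return D[: n_max + 1]

D = derangements_recursive(12)
# Equation (4): D_n - n D_{n-1} = (-1)^n for every n >= 1.
print(all(D[n] - n * D[n - 1] == (-1) ** n for n in range(1, 13)))  # True
print(D[10])  # 1334961
```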
Derangements are a very nice application of the Principle of Inclusion-Exclusion.
Many other applications exist in many different contexts. One of the most interesting
of these is the Sieve Formula, a version of the Principle that relates to probabilities.
The Sieve Formula allows the Principle to be applied to a more diverse range of
problems.
The Sieve Formula:

When working with probabilities, it is helpful to notice the similarities between them
and finding unions and intersections of sets. For example, let $A$ and $B$ be events in
a probability space $(\Omega, P)$. Then
$$P(A + B) = P(A) + P(B) - P(A \cdot B).$$
(Note: from now on “or” will be represented by $+$, and “and” by $\cdot$ or no symbol.) This
should look very familiar: it is the $n = 2$ case of the Principle of Inclusion-Exclusion, the base case of the inductive proof.
Considering this, let $A_1, A_2, A_3, \ldots, A_n$ be arbitrary events of a probability space $(\Omega, P)$. Now, referring back to the proof and explanation of the Principle of Inclusion-Exclusion, the same steps can be followed to determine a formula for $P(A_1 + A_2 + A_3 + \cdots + A_n)$. So,
$$P(A_1 + A_2 + A_3 + \cdots + A_n) = \sum_{\emptyset \ne I \subseteq \{1,2,3,\ldots,n\}} (-1)^{|I|-1} P(A_I),$$
where $A_I = \prod_{i \in I} A_i$. Looking at this in terms of $|I|$ and letting
$$\sigma_k = \sum_{|I| = k} P(A_I), \qquad \sigma_0 = 1,$$
the above formula becomes
$$P(A_1 + A_2 + A_3 + \cdots + A_n) = \sum_{j=1}^{n} (-1)^{j-1} \sigma_j.$$
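The Sieve Formula can be checked exactly on a small, hypothetical finite probability space: twelve outcomes with random rational weights and four random events. All names and the seed below are illustrative assumptions; `Fraction` keeps the comparison exact.

```python
from fractions import Fraction
from itertools import combinations
import random

rng = random.Random(1)
omega = range(12)
weights = [Fraction(rng.randint(1, 5)) for _ in omega]
total = sum(weights)
P = {w: weights[w] / total for w in omega}   # a hypothetical finite probability space

def prob(event):
    return sum(P[w] for w in event)

events = [frozenset(w for w in omega if rng.random() < 0.5) for _ in range(4)]
n = len(events)

def sigma(k):
    """sigma_k = sum over |I| = k of P(A_I); the empty product is Omega, so sigma_0 = 1."""
    result = Fraction(0)
    for I in combinations(range(n), k):
        A_I = frozenset(omega)
        for i in I:
            A_I &= events[i]
        result += prob(A_I)
    return result

lhs = prob(frozenset().union(*events))
rhs = sum((-1) ** (j - 1) * sigma(j) for j in range(1, n + 1))
print(lhs == rhs, sigma(0))  # True 1
```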
The rest of this paper is devoted to solving some interesting problems that use the
Principle of Inclusion-Exclusion or the Sieve Formula in their solutions, or that shed
light on some interesting consequences of these results.
Advanced Counting Problems

There are many counting problems encountered in the world of combinatorics to which
the Principle of Inclusion-Exclusion can be applied. The following are examples that
lend themselves nicely to proof using the Principle.
Determine the number $\varphi(n)$ of integers between 1 and $n$ coprime to $n$, given the prime factorization $p_1^{\alpha_1} \cdot p_2^{\alpha_2} \cdots p_r^{\alpha_r}$ of $n$. Any number $b$ coprime to $n$ will satisfy $\gcd(b, n) = 1$. Also $\gcd(b, p_k) = 1$ for $k \in \{1, 2, 3, \ldots, r\}$ (if this were not the case, then $p_k \mid \gcd(b, n)$ for some $k$). Conversely, any number that is not coprime to $n$ will have a prime factor in common with $n$, that is, it is divisible by some $p_k$. So all numbers that are divisible by some $p_k$ and at most $n$ should not be counted in $\varphi(n)$.
With this understanding, let $S = \{1, 2, 3, \ldots, n\}$ and let $A_k$ be the set of numbers in $S$ divisible by $p_k$. Then $b$ will be a number that falls into $S$ but into none of the $A_k$. So
$$\varphi(n) = |S - (A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_r)|.$$
Then $A_I$ will be the set of numbers divisible by all $p_i$ for $i \in I$. Since all $p_i$ are prime, a number divisible by all of them will be divisible by their product. So the general formula for any $|A_I|$ is $\left\lfloor n / \prod_{k \in I} p_k \right\rfloor$. Since $n$ is divisible by $\prod_{k \in I} p_k$,
$$|A_I| = \frac{n}{\prod_{k \in I} p_k}.$$
Applying the Principle of Inclusion-Exclusion,
$$\varphi(n) = \sum_{I \subseteq \{1,2,3,\ldots,r\}} (-1)^{|I|} \frac{n}{\prod_{k \in I} p_k}.$$
Manipulating the summation will yield the more common form of the equation. Begin by writing out the sum term by term. That is
$$\varphi(n) = n\left(1 - \frac{1}{p_1} - \frac{1}{p_2} - \frac{1}{p_3} - \cdots - \frac{1}{p_r} + \frac{1}{p_1 p_2} + \frac{1}{p_1 p_3} + \cdots\right).$$
This might be recognized as the expansion of a product of binomials, specifically those of the form $\left(1 - \frac{1}{p_k}\right)$. So the formula becomes
$$\varphi(n) = n\left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right)\left(1 - \frac{1}{p_3}\right) \cdots \left(1 - \frac{1}{p_r}\right) = n \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right).$$
Notice that to obtain one of the terms in the summation, go through the product and choose either the 1 or the $\left(-\frac{1}{p_k}\right)$ from each factor.
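A sketch of the inclusion-exclusion form of $\varphi(n)$ in Python: `phi_pie` follows the sum over subsets of the distinct primes, and `phi_brute` counts $\gcd(b, n) = 1$ directly. The trial-division helper `prime_factors` is an illustrative assumption, adequate only for small $n$; `math.prod` requires Python 3.8 or later.

```python
from itertools import combinations
from math import gcd, prod

def prime_factors(n):
    """Distinct primes p_1, ..., p_r dividing n (trial division)."""
    primes, d = [], 2
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.append(n)
    return primes

def phi_pie(n):
    """phi(n) = sum over I of (-1)^|I| n / prod_{k in I} p_k."""
    ps = prime_factors(n)
    return sum((-1) ** r * (n // prod(I))
               for r in range(len(ps) + 1)
               for I in combinations(ps, r))

def phi_brute(n):
    return sum(1 for b in range(1, n + 1) if gcd(b, n) == 1)

print(phi_pie(360), phi_brute(360))  # 96 96
```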
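Prove the identity (with the convention $0^0 = 1$)
$$\sum_{i=0}^{n} (-1)^i \binom{n}{i} i^k = \begin{cases} 0 & \text{if } 0 \le k < n \\ (-1)^n n! & \text{if } k = n \end{cases} \qquad (5)$$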
In order to prove equation (5), start by calculating the number of onto functions
that map a set $K$ of $k$ elements to a set $N$ of $n$ elements. Take $S$ to be the set of
all functions mapping elements of $K$ to elements of $N$. Then take $A_i$ to be the set
of functions that do not map any element in the domain to the $i$th element of the
codomain (hence, no function in $A_i$ is onto). Then the number
of onto functions will be $|S - (A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n)|$.
A function that maps $K$ to $N$ will take every element in the domain and map it to some element in the codomain. This means that there will be $n$ choices for each of the $k$ elements in the domain, yielding $n^k$ different functions. So $|S| = n^k$.
A function that does not map any element of $K$ to the $i$th element of $N$ will take every element in the domain and map it to one of the remaining $n - 1$ elements of the codomain. So there are $n - 1$ options for each element of $K$, and $|A_i| = (n - 1)^k$ for all $i \in \{1, 2, 3, \ldots, n\}$. Also, there are $n$ different ways to choose $i$ from the set $N$, so overall
$$\sum_{1 \le i \le n} |A_i| = n \cdot (n - 1)^k.$$
Now, consider the general case when $|I| = r$. $A_I$ contains the functions that map into only $n - r$ elements of $N$. This means that there are still $k$ elements of the domain, mapping to $n - r$ elements in the codomain, for any given subset of $\{1, 2, 3, \ldots, n\}$ of size $r$. There are $\binom{n}{r}$ subsets of size $r$, and there are $(n - r)^k$ appropriate functions for each of these subsets. So
$$\sum_{|I| = r} |A_I| = \binom{n}{r} (n - r)^k.$$
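Applying the Principle of Inclusion-Exclusion,
$$|S - (A_1 \cup A_2 \cup \cdots \cup A_n)| = n^k - \binom{n}{1}(n - 1)^k + \cdots + (-1)^{n-1}\binom{n}{n-1} 1^k + (-1)^n \binom{n}{n} 0^k,$$
where, with the convention $0^0 = 1$, the last term vanishes unless $k = 0$. Substituting $i = n - r$, this is
$$|S - (A_1 \cup A_2 \cup \cdots \cup A_n)| = \sum_{i=0}^{n} (-1)^{n-i} \binom{n}{n-i} i^k.$$
This is very similar to the identity above. So far, this summation is the number of onto functions mapping $K$ to $N$. If $k < n$ there are no onto functions, so
$$\sum_{i=0}^{n} (-1)^{n-i} \binom{n}{n-i} i^k = 0 \quad \text{if } 0 \le k < n.$$
When $k = n$, onto functions are simply permutations of the elements (bijections from $K$ to $N$), so the number of such functions is $n!$. So,
$$\sum_{i=0}^{n} (-1)^{n-i} \binom{n}{n-i} i^k = \begin{cases} 0 & \text{if } 0 \le k < n \\ n! & \text{if } k = n. \end{cases}$$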
Manipulating this equation will yield equation (5). First, without changing the outcome of the equation, replace $\binom{n}{n-i}$ with $\binom{n}{i}$ (these are equal). Next, consider $(-1)^{n-i}$. Multiply both sides of the equation by $(-1)^n$. This gives a coefficient of $(-1)^{2n-i} = (-1)^{-i} = (-1)^i$. So,
$$\sum_{i=0}^{n} (-1)^i \binom{n}{i} i^k = \begin{cases} 0 & \text{if } 0 \le k < n \\ (-1)^n n! & \text{if } k = n. \end{cases}$$
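The onto-function count and identity (5) are easy to test for small $n$ and $k$. In this sketch (helper names hypothetical), functions from a $k$-set to an $n$-set are enumerated as $k$-tuples with `itertools.product`; note that Python evaluates `0 ** 0` as 1, which matches the convention used above.

```python
from itertools import product
from math import comb, factorial

def onto_count_formula(k, n):
    """Onto functions from a k-set to an n-set: sum_{i=0}^{n} (-1)^(n-i) C(n, i) i^k."""
    return sum((-1) ** (n - i) * comb(n, i) * i ** k for i in range(n + 1))

def onto_count_brute(k, n):
    """Enumerate every function as a k-tuple of images and keep those hitting all n values."""
    return sum(1 for f in product(range(n), repeat=k) if len(set(f)) == n)

def identity_5(n, k):
    """Left-hand side of identity (5)."""
    return sum((-1) ** i * comb(n, i) * i ** k for i in range(n + 1))

print([identity_5(4, k) for k in range(5)])  # [0, 0, 0, 0, 24]
```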
Let $p(x_1, x_2, \ldots, x_n)$ be a polynomial of degree $m$, and denote by $\sigma^k p$ the polynomial obtained by substituting 0's for $k$ of the variables in $p$ in every possible combination and summing the arising $\binom{n}{k}$ polynomials. Prove that, for some constant $c$,
$$\sum_{k=0}^{n} (-1)^k \sigma^k p = \begin{cases} 0 & \text{if } m < n \\ c \cdot x_1 \cdot x_2 \cdots x_n & \text{if } m = n. \end{cases}$$
Begin by looking at the case when an arbitrary $x_i = 0$. Define
$$\tilde{p}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) = p(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n).$$
Then express $\sigma^k p|_{x_i = 0}$ in terms of $\tilde{p}$. Every $k$-tuple of variables either contains or does not contain $x_i$. If $x_i$ is included in the group, then only $k - 1$ of the remaining variables are set to 0 in $\tilde{p}$, yielding $\sigma^{k-1} \tilde{p}$. In the case that $x_i$ is not included in the $k$-tuple, $k$ of the remaining variables are replaced with 0s in $\tilde{p}$, yielding $\sigma^k \tilde{p}$. Thus,
$$\sigma^k p|_{x_i = 0} = \sigma^{k-1} \tilde{p} + \sigma^k \tilde{p}.$$
Also, notice that $\sigma^0 p|_{x_i = 0} = p|_{x_i = 0} = \tilde{p} = \sigma^0 \tilde{p}$ by definition, and that $\sigma^n \tilde{p} = 0$, since $\tilde{p}$ has only $n - 1$ variables.
Now, use the representation of $\sigma^k p|_{x_i = 0}$ in terms of $\tilde{p}$ to rewrite the summation. So $\sigma^0 p - \sigma^1 p + \sigma^2 p - \cdots$ becomes $\sigma^0 \tilde{p} - (\sigma^0 \tilde{p} + \sigma^1 \tilde{p}) + (\sigma^1 \tilde{p} + \sigma^2 \tilde{p}) - \cdots$, in which all terms cancel out, leaving
$$\sum_{k=0}^{n} (-1)^k \sigma^k p|_{x_i = 0} = 0.$$
Since this is true for any $x_i$, the sum vanishes whenever any one of the variables is 0. This means that the summation must be divisible by $x_i$ for all $i \in \{1, 2, 3, \ldots, n\}$, and hence by the product $x_1 x_2 \cdots x_n$. Also, $p$ has degree at most $m$, so the summation also has degree at most $m$. So, when $m < n$, the summation must be 0 (the only polynomial that is divisible by $x_1 x_2 \cdots x_n$ and has degree less than $n$). When $m = n$, the summation is divisible by $x_1 x_2 \cdots x_n$ and has degree at most $n$, so it must be a constant multiple of $x_1 \cdot x_2 \cdot x_3 \cdots x_n$. This is exactly
$$\sum_{k=0}^{n} (-1)^k \sigma^k p = \begin{cases} 0 & \text{if } m < n \\ c \cdot x_1 \cdot x_2 \cdots x_n & \text{if } m = n. \end{cases}$$
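Because $\sum_k (-1)^k \sigma^k p$ evaluated at a point is just a signed sum of values of $p$ with some coordinates set to 0, it can be evaluated numerically. The sketch below does this for two hypothetical polynomials in three variables: one of degree 2, where the result should vanish, and $(x_1 + x_2 + x_3)^3$ of degree 3, where the result should be a multiple of $x_1 x_2 x_3$. In this example the constant $c$ comes out as 6, the coefficient of $x_1 x_2 x_3$ in $p$.

```python
from itertools import combinations

def alternating_sigma_sum(p, x):
    """Evaluate sum_k (-1)^k sigma^k p at the point x: for every set T of k variables,
    substitute 0 for those variables, evaluate p, and add the value with sign (-1)^k."""
    n = len(x)
    total = 0
    for k in range(n + 1):
        for T in combinations(range(n), k):
            y = [0 if i in T else xi for i, xi in enumerate(x)]
            total += (-1) ** k * p(*y)
    return total

def low_degree(a, b, c):        # degree 2 < n = 3
    return 3 * a * b - c ** 2 + 5 * a + 7

def cube(a, b, c):              # degree 3 = n; the coefficient of abc is 6
    return (a + b + c) ** 3

points = [(1, 2, 3), (-2, 5, 4), (7, -1, 2)]
print([alternating_sigma_sum(low_degree, pt) for pt in points])                    # [0, 0, 0]
print([alternating_sigma_sum(cube, pt) // (pt[0] * pt[1] * pt[2]) for pt in points])  # [6, 6, 6]
```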
More Involved Combinatorial Arguments
Let $A_1, A_2, \ldots, A_n$ be any events, $B_i = f_i(A_1, A_2, \ldots, A_n)$ polynomials, and $c_1, c_2, \ldots, c_k$ reals. Then show that
$$\sum_{i=1}^{k} c_i P(B_i) \ge 0 \qquad (6)$$
holds for every $A_1, A_2, \ldots, A_n$ provided it holds in those cases when $P(A_j) = 0$ or $1$ for $j = 1, 2, \ldots, n$. Clearly, if (6) holds for all possible sets of events $A_1, A_2, \ldots, A_n$, then it holds in the particular case when $P(A_j) \in \{0, 1\}$ as well.
In order to prove the implication in the other direction, we will show that the coefficient of $P(B)$ in (6), when the polynomials are expressed as linear combinations of atoms, is nonnegative for every atom $B$ of the Boolean algebra generated by the events $A_1, A_2, \ldots, A_n$. Since each $B_i$ is the disjoint union of the atoms it contains, $P(B_i)$ is the sum of the probabilities of those atoms; and since every $P(B) \ge 0$, nonnegative coefficients imply that the whole sum in (6) is nonnegative.
Let $B = A_1 A_2 \cdots A_s \bar{A}_{s+1} \cdots \bar{A}_n$ be an atom of the Boolean algebra generated by $A_1, A_2, \ldots, A_n$ (after relabeling the events, every atom has this form). Now take events $A'_1, \ldots, A'_n$ with $P(A'_1) = P(A'_2) = \cdots = P(A'_s) = 1$ and $P(A'_{s+1}) = \cdots = P(A'_n) = 0$, and define $B'_i = f_i(A'_1, \ldots, A'_n)$. By the assumption that (6) holds when every probability is 0 or 1,
$$\sum_{i=1}^{k} c_i P(B'_i) \ge 0.$$
The coefficient of $P(B)$ in $\sum_{i=1}^{k} c_i P(B_i)$ is $\sum_{i:\, B \subseteq B_i} c_i$. In order to find this in terms of the $B'_i$, it is necessary to determine which polynomials $B$ appears in. When $B \subseteq B_i$, the atom $B' = A'_1 A'_2 \cdots A'_s \bar{A}'_{s+1} \cdots \bar{A}'_n \subseteq B'_i$. The probability of this atom is equal to 1, meaning that $P(B'_i) \ne 0$. Also, since all probabilities are either 0 or 1, $P(B'_i) = 1$. The reverse is true as well: if $P(B'_i) \ne 0$, then the probability of at least one atom contained in $B'_i$ is nonzero. The only atom with a nonzero probability is $B'$, so $B' \subseteq B'_i$, and therefore $B \subseteq B_i$. This means that the sum of the coefficients of $P(B)$ over the $B_i$ with $B \subseteq B_i$ is
$$\sum_{i:\, B \subseteq B_i} c_i = \sum_{i=1}^{k} c_i P(B'_i) \ge 0.$$
So the coefficient of every atom in the polynomial is nonnegative, and
$$\sum_{i=1}^{k} c_i P(B_i) \ge 0.$$
Note that this result applies not only to inequalities but to identities as well: apply it to both the difference of the two sides of the identity and its negative. Both must be at least 0, so the difference is 0.
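The reduction to the 0/1 case amounts to checking the inequality on each atom, where every indicator is simply true or false. The sketch below applies this check to a hypothetical example, the inequality $P(A_1 + A_2 + A_3) - \sigma_1 + \sigma_2 \ge 0$, with each $B_i$ encoded as a Python Boolean function of the indicator tuple, and then confirms the inequality on a random finite probability space. All names and the example inequality are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product
import random

n = 3
# Hypothetical example: P(A1+A2+A3) - sigma_1 + sigma_2 >= 0, written as a list of
# (c_i, f_i) pairs where each f_i is a Boolean function of the indicator tuple.
terms = [(1, lambda a: any(a))]
terms += [(-1, lambda a, i=i: a[i]) for i in range(n)]
terms += [(1, lambda a, I=I: all(a[i] for i in I)) for I in combinations(range(n), 2)]

def holds_in_01_cases(terms, n):
    """Check (6) when every P(A_j) is 0 or 1, i.e. evaluate sum c_i [f_i] on each atom."""
    return all(sum(c * f(bits) for c, f in terms) >= 0
               for bits in product((False, True), repeat=n))

def value_on_space(terms, P, events):
    """sum c_i P(B_i) on a finite probability space given as {outcome: weight}."""
    total = Fraction(0)
    for c, f in terms:
        total += c * sum(w for omega, w in P.items()
                         if f(tuple(omega in A for A in events)))
    return total

rng = random.Random(2)
P = {omega: Fraction(1, 20) for omega in range(20)}
events = [{o for o in range(20) if rng.random() < 0.5} for _ in range(n)]
print(holds_in_01_cases(terms, n), value_on_space(terms, P, events) >= 0)  # True True
```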
Let $A_1, A_2, \ldots, A_n$ be arbitrary events of a probability space $(\Omega, P)$. For each $I \subseteq \{1, 2, 3, \ldots, n\}$, let $A_I = \prod_{i \in I} A_i$, and let $\sigma_k = \sum_{|I| = k} P(A_I)$. Then the probability that exactly $q$ of them occur is
$$\sum_{j=q}^{n} (-1)^{j+q} \binom{j}{q} \sigma_j.$$
Approach this problem by considering both sides of the equation and applying the previous result. Thus, after relabeling the events if necessary, let $P(A_1) = P(A_2) = \cdots = P(A_k) = 1$ and $P(A_{k+1}) = P(A_{k+2}) = \cdots = P(A_n) = 0$.
Start by considering $P(A_Q)$, where $A_Q$ denotes the event that exactly $q$ of the events occur. Since $A_Q$ is a polynomial in $A_1, A_2, \ldots, A_n$ (the sum, over all $q$-element sets of indices, of the product of those events with the complements of the rest), applying the previous result allows the problem to be reduced to only showing it when all probabilities of individual events are 1 or 0. In this situation exactly the $k$ events $A_1, \ldots, A_k$ occur, with probability 1. From here consider two cases: the first when $k \ne q$, the other when $k = q$. When $k \ne q$ there is no way for exactly $q$ events to occur, so the probability that exactly $q$ events occur is 0. When $k = q$, the events that occur are exactly $A_1, \ldots, A_q$, and they occur together with probability 1, so the probability that exactly $q$ events occur is 1. Thus,
$$P(A_Q) = \begin{cases} 0 & \text{if } k < q \\ 1 & \text{if } k = q \\ 0 & \text{if } k > q. \end{cases}$$
Now consider the summation
$$\sum_{j=q}^{n} (-1)^{j+q} \binom{j}{q} \sigma_j.$$
In this case, $(-1)^{j+q} \binom{j}{q}$ is simply a real coefficient of $\sigma_j$, which is a sum of probabilities of polynomials in $A_1, A_2, \ldots, A_n$. So, again, applying the previous result allows the problem to be reduced to only showing it is true when all probabilities of individual events are 1 or 0. This immediately simplifies the equation a bit: since $P(A_{k+1}), \ldots, P(A_n)$ are all 0, every $P(A_I)$ with $I$ containing an index greater than $k$ is 0, and in particular $\sigma_j = 0$ for $j > k$. So, from this point on, the summation becomes
$$\sum_{j=q}^{k} (-1)^{j+q} \binom{j}{q} \sigma_j.$$
As before, consider cases; this time there are three cases to examine, beginning with $k < q$. In this case the summation is empty, as the upper bound is less than the lower bound. So, when $k < q$, $\sum_{j=q}^{k} (-1)^{j+q} \binom{j}{q} \sigma_j = 0$.
The next case is $k = q$. In this case the summation has only one term, $(-1)^{q+q} \binom{q}{q} \sigma_q$. There are three parts to this term: $(-1)^{2q} = 1$, $\binom{q}{q} = 1$, and $\sigma_q = 1$. The last one requires a bit more explanation: $\sigma_q = \sum_{|I| = q} P(A_I)$, and, in this case, the only $q$-element set $I$ that avoids every index greater than $k$ is $\{1, \ldots, q\}$, so every other term is 0. So $\sigma_q = P(A_1 \cdot A_2 \cdots A_q)$, and since all of these events have probability 1, $\sigma_q = 1$. So, when $k = q$,
$$\sum_{j=q}^{k} (-1)^{j+q} \binom{j}{q} \sigma_j = 1.$$
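The final case to consider is $k > q$. Start by calculating $\sigma_j$ in this case. Each $P(A_I)$ with $I \subseteq \{1, \ldots, k\}$ equals 1 and every other $P(A_I)$ equals 0, so $\sigma_j$ is the number of different ways that $j$ events can be chosen from the $k$ events with probability 1. Thus $\sigma_j = \binom{k}{j}$. Making this substitution in the summation gives
$$\sum_{j=q}^{k} (-1)^{j+q} \binom{j}{q} \binom{k}{j}.$$
Evaluating these combinations yields
$$\begin{aligned} \binom{j}{q}\binom{k}{j} &= \frac{j!}{(j - q)! \cdot q!} \cdot \frac{k!}{(k - j)! \cdot j!} = \frac{k!}{(k - j)! \cdot (j - q)! \cdot q!} \\ &= \frac{k!}{(k - q)! \cdot q!} \cdot \frac{(k - q)!}{((k - q) - (k - j))! \cdot (k - j)!} = \binom{k}{q}\binom{k - q}{j - q}. \end{aligned}$$
Notice that $\binom{k}{q}$ is independent of $j$, so it can be pulled outside of the summation. Substituting this back into the summation gives
$$\binom{k}{q} \sum_{j=q}^{k} (-1)^{j+q} \binom{k - q}{j - q}.$$
Now the expression inside the summation looks much like the binomial theorem. Making the observation that $(-1)^{j+q} = (-1)^j (-1)^q = (-1)^j (-1)^{-q} = (-1)^{j-q}$, and using this, reveals
$$\binom{k}{q} \sum_{j=q}^{k} (-1)^{j-q} \binom{k - q}{j - q} = \binom{k}{q} \cdot (1 - 1)^{k-q} = 0,$$
since $k - q > 0$. So, overall,
$$\sum_{j=q}^{n} (-1)^{j+q} \binom{j}{q} \sigma_j = \begin{cases} 0 & \text{if } k < q \\ 1 & \text{if } k = q \\ 0 & \text{if } k > q. \end{cases}$$
So, indeed,
$$P(A_Q) = \sum_{j=q}^{n} (-1)^{j+q} \binom{j}{q} \sigma_j.$$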
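The formula for the probability that exactly $q$ events occur can be compared with a direct count on a small, hypothetical finite probability space (sixteen outcomes with random rational weights and four random events). The function names and seed are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb
import random

rng = random.Random(3)
omega = range(16)
weights = [Fraction(rng.randint(1, 4)) for _ in omega]
P = {w: weights[w] / sum(weights) for w in omega}   # hypothetical probability space
n = 4
events = [frozenset(w for w in omega if rng.random() < 0.5) for _ in range(n)]

def sigma(j):
    """sigma_j = sum over |I| = j of P(A_I)."""
    total = Fraction(0)
    for I in combinations(range(n), j):
        total += sum(P[w] for w in omega if all(w in events[i] for i in I))
    return total

def exactly_q_direct(q):
    return sum(P[w] for w in omega if sum(w in A for A in events) == q)

def exactly_q_sieve(q):
    return sum((-1) ** (j + q) * comb(j, q) * sigma(j) for j in range(q, n + 1))

print(all(exactly_q_direct(q) == exactly_q_sieve(q) for q in range(n + 1)))  # True
```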
Let $G$ be a simple graph on $V(G) = \{1, 2, 3, \ldots, n\}$ with all degrees at most $d$, and let an event $A_i$ be associated with each point $i$. Suppose that (i) $P(A_i) \le \frac{1}{4d}$, and (ii) every $A_i$ is independent of the set of all $A_j$'s for which $j$ is not adjacent to $i$. Then
$$P(\bar{A}_1 \bar{A}_2 \cdots \bar{A}_n) > 0.$$
Using an inductive argument, first prove that $P(A_1 \mid \bar{A}_2 \bar{A}_3 \cdots \bar{A}_n) \le \frac{1}{2d}$. Let the inductive hypothesis be that all subgraphs $G'$ of $G$ with $V(G') \subset V(G)$ satisfy $P(A_{i_1} \mid \bar{A}_{i_2} \bar{A}_{i_3} \cdots \bar{A}_{i_k}) \le \frac{1}{2d}$, where $i_1, \ldots, i_k$ are the vertices of $G'$. Then the base case is the smallest possible subgraph, one consisting of only a single vertex. In this case $P(A_i) \le \frac{1}{4d} \le \frac{1}{2d}$ by part (i) of the statement of the problem.
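Let vertices 2 through $m$ be those adjacent to vertex 1. Then rewrite
$$P(A_1 \mid \bar{A}_2 \bar{A}_3 \cdots \bar{A}_n) = \frac{P(A_1 \bar{A}_2 \cdots \bar{A}_n)}{P(\bar{A}_2 \cdots \bar{A}_n)} = \frac{P(A_1 \bar{A}_2 \cdots \bar{A}_n) / P(\bar{A}_{m+1} \cdots \bar{A}_n)}{P(\bar{A}_2 \cdots \bar{A}_n) / P(\bar{A}_{m+1} \cdots \bar{A}_n)} = \frac{P(A_1 \bar{A}_2 \cdots \bar{A}_m \mid \bar{A}_{m+1} \cdots \bar{A}_n)}{P(\bar{A}_2 \cdots \bar{A}_m \mid \bar{A}_{m+1} \cdots \bar{A}_n)}.$$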
Now evaluate the numerator and denominator of this expression separately.
Begin with the numerator. $P(A_1 \bar{A}_2 \cdots \bar{A}_m \mid \bar{A}_{m+1} \cdots \bar{A}_n)$ must be less than or equal to $P(A_1 \mid \bar{A}_{m+1} \cdots \bar{A}_n)$, since requiring more events to happen (or in this case, to not happen) cannot increase the probability. Notice that $A_1$ is independent of $A_{m+1}, A_{m+2}, \ldots, A_n$ (vertex 1 does not share an edge with any of them, so, from the statement of the problem, they are independent). This means that
$$P(A_1 \mid \bar{A}_{m+1} \cdots \bar{A}_n) = P(A_1) \le \frac{1}{4d}.$$
Now look at the denominator. By De Morgan's law, the probability that none of the given events occurs is one minus the probability that at least one of them occurs. That is, $P(\bar{A}_2 \cdots \bar{A}_m \mid \bar{A}_{m+1} \cdots \bar{A}_n) = 1 - P(A_2 + \cdots + A_m \mid \bar{A}_{m+1} \cdots \bar{A}_n)$. The probability of this union is less than or equal to the sum of the probabilities of the individual events (with equality when the events are pairwise disjoint). So
$$1 - P(A_2 + \cdots + A_m \mid \bar{A}_{m+1} \cdots \bar{A}_n) \ge 1 - \sum_{i=2}^{m} P(A_i \mid \bar{A}_{m+1} \cdots \bar{A}_n).$$
Now, from the inductive assumption, $P(A_i \mid \bar{A}_{m+1} \cdots \bar{A}_n) \le \frac{1}{2d}$. Also, every vertex has degree at most $d$, and $m - 1$ is the degree of vertex 1. Thus,
$$1 - \sum_{i=2}^{m} P(A_i \mid \bar{A}_{m+1} \cdots \bar{A}_n) \ge 1 - \frac{m - 1}{2d} \ge 1 - \frac{d}{2d} = \frac{1}{2}.$$
So the numerator is less than or equal to $\frac{1}{4d}$, and the denominator is greater than or equal to $\frac{1}{2}$, meaning that, overall,
$$P(A_1 \mid \bar{A}_2 \bar{A}_3 \cdots \bar{A}_n) \le \frac{1/(4d)}{1/2} = \frac{1}{2d}.$$
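Now, rewriting $P(\bar{A}_1 \bar{A}_2 \cdots \bar{A}_n)$ in terms of $P(A_1 \mid \bar{A}_2 \bar{A}_3 \cdots \bar{A}_n)$ yields the final result. Notice that
$$\frac{P(\bar{A}_1 \bar{A}_2 \cdots \bar{A}_n)}{P(\bar{A}_2 \bar{A}_3 \cdots \bar{A}_n)} = P(\bar{A}_1 \mid \bar{A}_2 \bar{A}_3 \cdots \bar{A}_n) = 1 - P(A_1 \mid \bar{A}_2 \bar{A}_3 \cdots \bar{A}_n).$$
However, $P(A_1 \mid \bar{A}_2 \bar{A}_3 \cdots \bar{A}_n) \le \frac{1}{2d}$, so this means
$$\frac{P(\bar{A}_1 \bar{A}_2 \cdots \bar{A}_n)}{P(\bar{A}_2 \bar{A}_3 \cdots \bar{A}_n)} \ge 1 - \frac{1}{2d}, \qquad P(\bar{A}_1 \bar{A}_2 \cdots \bar{A}_n) \ge P(\bar{A}_2 \bar{A}_3 \cdots \bar{A}_n)\left(1 - \frac{1}{2d}\right).$$
Since $P(\bar{A}_2 \bar{A}_3 \cdots \bar{A}_n) > 0$ (by the same argument applied to the events $A_2, \ldots, A_n$ on the subgraph obtained by deleting vertex 1; this is also necessary for the above fraction to have meaning), and $1 - \frac{1}{2d} > 0$,
$$P(\bar{A}_1 \bar{A}_2 \cdots \bar{A}_n) > 0.$$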
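As a hedged illustration (not part of the original proof), consider a hypothetical example: independent $X_0, \ldots, X_{n-1}$ uniform on $\{0, \ldots, 7\}$ placed around a cycle, with $A_i$ the event $X_i + X_{i+1} \equiv 0 \pmod 8$ (indices mod $n$). Each $A_i$ shares variables only with its two cycle neighbours, so $d = 2$ and $P(A_i) = 1/8 = 1/(4d)$. Iterating the last inequality over the shrinking subgraphs gives $P(\bar{A}_1 \cdots \bar{A}_n) \ge (1 - 1/(2d))^n$, which the sketch checks by enumeration for $n = 5$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: A_i = {X_i + X_{i+1} = 0 (mod 8)} around a cycle of length n.
# A_i depends only on X_i and X_{i+1}, so it is independent of the events A_j
# that share no variable with it, i.e. the A_j not adjacent to i on the cycle.
n, q, d = 5, 8, 2

def prob_none_occur():
    """Exact probability that no A_i occurs, by enumerating all q**n outcomes."""
    good = sum(1 for xs in product(range(q), repeat=n)
               if all((xs[i] + xs[(i + 1) % n]) % q != 0 for i in range(n)))
    return Fraction(good, q ** n)

p = prob_none_occur()
bound = Fraction(2 * d - 1, 2 * d) ** n   # (1 - 1/(2d))^n, from iterating the last inequality
print(p > 0, p >= bound)  # True True
```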
Generalizing the Sieve

The problems in this section will ultimately lead to the Sieve formula by first defining
a much more generic function and showing that the Sieve is actually a specific instance
of this.
Let $V = \{x_1, x_2, \ldots, x_n\}$ be a set partially ordered by a relation $\le$. Call an $n \times n$ matrix $(a_{ij})$ compatible if
$$a_{ij} \ne 0 \Rightarrow x_i \le x_j.$$
Show that the sum, the product, and (if it exists) the inverse of compatible matrices are compatible. Let $A = (a_{ij})$ and $B = (b_{ij})$ be compatible matrices. Then show that $C = A + B$, $D = AB$, and $E = A^{-1}$ are compatible as well.
Let $C = (c_{ij})$. Then for all $i, j \le n$, $c_{ij} = a_{ij} + b_{ij}$. If $c_{ij} \ne 0$, then at least one of $a_{ij}$ and $b_{ij}$ must be nonzero. Then, since $A$ and $B$ are both compatible, this implies $x_i \le x_j$, so $C$ is also compatible.
Let $D = (d_{ij})$. Then for all $i, j \le n$,
$$d_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$
If $d_{ij} \ne 0$, there must be some $k$ such that $a_{ik} \ne 0$ and $b_{kj} \ne 0$. This means that $x_i \le x_k$ (from the compatibility of $A$) and $x_k \le x_j$ (from the compatibility of $B$). Then, by transitivity, $x_i \le x_j$, so $D$ is compatible.
Assuming that A is invertible, let E = (eij ). Then prove that eij 6= 0 ⇒ xi ≤ xj .
First, assume that xi ≤ xj ⇒ i ≤ j (this can be done by indexing A correctly). This
guarantees that A is an upper triangular matrix, so
n
Y
det(A) = aii
i=1
Also, since A is invertible,

det(A) 6= 0
16
So $a_{ii} \neq 0$ for all $i$. Now assume that $e_{ij} \neq 0$ but $x_i \not\le x_j$ for some pair $i, j$ (so $i \neq j$). Among all such pairs, choose one with $i$ maximal. Looking at the product $AE$ (the product of a matrix and its inverse is the identity),
$$\sum_{k=1}^{n} a_{ik} e_{kj} = 0 \qquad (i \neq j)$$
However, since $a_{ii} \neq 0$ and $e_{ij} \neq 0$, $a_{ii} e_{ij} \neq 0$, so there must be some $k \neq i$ such that $a_{ik} e_{kj} \neq 0$. Then $a_{ik} \neq 0 \Rightarrow x_i \le x_k$, and thus $i < k$. Since $i < k$, the choice of $i$ as maximal guarantees that $e_{kj} \neq 0 \Rightarrow x_k \le x_j$ ($i$ was the largest row index with an “incompatibility” in $E$). Transitivity then yields $x_i \le x_j$, which contradicts the assumption. So $E$ is compatible.
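The inverse case can be checked the same way. This Python sketch uses the same hypothetical divisors-of-12 poset; `inverse` is an ordinary exact Gauss-Jordan elimination over `fractions.Fraction`, not a method taken from the text. It builds a random compatible matrix with nonzero diagonal, inverts it, and confirms that the inverse has no nonzero entry outside the positions with $x_i \le x_j$.

```python
import random
from fractions import Fraction

V = [1, 2, 3, 4, 6, 12]  # hypothetical poset: divisors of 12, sorted along a linear extension
n = len(V)


def leq(a, b):
    return b % a == 0  # a <= b in the divisibility order


def is_compatible(M):
    return all(M[i][j] == 0 or leq(V[i], V[j]) for i in range(n) for j in range(n))


def inverse(A):
    """Exact Gauss-Jordan inverse over the rationals; raises ValueError if singular."""
    m = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(m)]
           for i, row in enumerate(A)]
    for col in range(m):
        pivot = next((r for r in range(col, m) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]      # scale pivot row so the pivot is 1
        for r in range(m):
            if r != col and aug[r][col] != 0:     # clear the rest of the column
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


rng = random.Random(1)
# Compatible, hence upper triangular; the diagonal is nonzero because x <= x,
# so A is invertible.
A = [[rng.choice([-3, -2, -1, 1, 2, 3]) if leq(V[i], V[j]) else 0 for j in range(n)]
     for i in range(n)]
E = inverse(A)
print(is_compatible(E))  # True
```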
Find a function $\mu$ defined on $V \times V$ such that
$$\mu(x, y) = 0 \quad \text{if } x \not\le y \qquad (7)$$
$$\mu(x, x) = 1 \qquad (8)$$
$$\sum_{x \le y \le z} \mu(x, y) = 0 \quad (x < z) \qquad (9)$$
First observe that these conditions uniquely determine the function $\mu(x, y)$. Indeed, (7) and (8) fix $\mu(x, z)$ when $x \not\le z$ or $x = z$, and if the values of $\mu(x, y)$ are known for all $x \le y < z$, then condition (9) determines
$$\mu(x, z) = -\sum_{x \le y < z} \mu(x, y)$$
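The recursion just described translates directly into code. The following Python sketch computes $\mu$ from conditions (7), (8), and (9) on a hypothetical example poset, the divisors of 12 under divisibility; memoization with `functools.lru_cache` keeps values from being recomputed. For this particular poset the values agree with the number-theoretic Möbius function of $z/x$.

```python
from functools import lru_cache

V = (1, 2, 3, 4, 6, 12)  # hypothetical example: divisors of 12, ordered by divisibility


def leq(a, b):
    return b % a == 0  # a <= b in the divisibility order


@lru_cache(maxsize=None)
def mu(x, z):
    if not leq(x, z):
        return 0                      # condition (7)
    if x == z:
        return 1                      # condition (8)
    # condition (9): the sum of mu(x, y) over x <= y <= z is 0, so solve for mu(x, z)
    return -sum(mu(x, y) for y in V if leq(x, y) and leq(y, z) and y != z)


print([mu(1, z) for z in V])  # [1, -1, -1, 0, 1, 0]
```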
Let the indexing be such that $x_i \le x_j$ implies $i \le j$, and define a matrix $Q = (q_{ij})$ where
$$q_{ij} = \begin{cases} 1 & \text{if } x_i \le x_j \\ 0 & \text{otherwise} \end{cases}$$
Define a matrix $M = (m_{ij})$ where $m_{ij} = \mu(x_i, x_j)$. Then the requirements on $\mu$ can be rewritten as
$$MQ = I \;\Rightarrow\; M = Q^{-1}$$
This uniquely determines $\mu(x, y)$.
Confirm this by checking that the three conditions are met. First, since $Q$ is a compatible matrix, its inverse $M$ is also compatible, confirming equation (7). Second, $\mu(x, x)$ is always an entry on the diagonal of $M$. $Q$ is upper triangular with every $q_{ii} = 1$, and the inverse of an upper triangular matrix is upper triangular with diagonal entries $1/q_{ii}$, so $m_{ii} = 1$, confirming equation (8). Finally, let $i_{xz}$ be an entry of the identity matrix. Then
$$i_{xz} = \sum_{y=1}^{n} m_{xy} q_{yz}$$
Notice, however, that when $y < x$, $m_{xy} = 0$, and when $y > z$, $q_{yz} = 0$. So
$$i_{xz} = \sum_{y=1}^{n} m_{xy} q_{yz} = \sum_{y=x}^{z} m_{xy} q_{yz} = 0 \qquad (x < z)$$
Then, since $q_{yz}$ is 1 whenever $y \le z$, this is
$$\sum_{x \le y \le z} \mu(x, y) = 0$$
This confirms equation (9).
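As a numerical companion to the matrix argument, this Python sketch (same hypothetical divisors-of-12 poset, listed along a linear extension) builds $Q$, fills $M$ row by row using conditions (7), (8), and (9), and checks that $M$ is a two-sided inverse of $Q$.

```python
V = [1, 2, 3, 4, 6, 12]  # hypothetical poset: divisors of 12, sorted along a linear extension
n = len(V)


def leq(a, b):
    return b % a == 0  # a <= b in the divisibility order


# q_ij = 1 exactly when x_i <= x_j
Q = [[1 if leq(V[i], V[j]) else 0 for j in range(n)] for i in range(n)]

# m_ij = mu(x_i, x_j), filled left to right in each row
M = [[0] * n for _ in range(n)]
for i in range(n):
    M[i][i] = 1                                   # condition (8)
    for j in range(i + 1, n):
        if leq(V[i], V[j]):                       # otherwise 0 by condition (7)
            # condition (9): mu(x_i, x_j) = -(sum of mu(x_i, x_k) over x_i <= x_k < x_j)
            M[i][j] = -sum(M[i][k] for k in range(i, j) if leq(V[k], V[j]))


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


IDENTITY = [[int(i == j) for j in range(n)] for i in range(n)]
print(matmul(M, Q) == IDENTITY, matmul(Q, M) == IDENTITY)  # True True
```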
Evaluate the function $\mu(x, y)$ if $V$ is the lattice of all subsets of a set $S$ (ordered by inclusion). Take $\mu(X, Y)$ defined as follows:
$$\mu(X, Y) = \begin{cases} (-1)^{|Y - X|} & \text{when } X \subseteq Y \\ 0 & \text{otherwise} \end{cases}$$
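Then verify the requirements on the function $\mu$ from above. First, $\mu(X, Y) = 0$ when $X \not\subseteq Y$ directly from the definition, confirming equation (7). Next, $\mu(X, X) = (-1)^{|X - X|} = (-1)^0 = 1$, confirming equation (8). Finally, calculate $\sum_{X \subseteq Y \subseteq Z} \mu(X, Y)$ when $X \subsetneq Z$. A set $Y$ with $X \subseteq Y \subseteq Z$ is determined by choosing $Y - X \subseteq Z - X$, and there are $\binom{|Z - X|}{k}$ such choices with $|Y - X| = k$, so this sum is
$$\sum_{X \subseteq Y \subseteq Z} (-1)^{|Y - X|} = \sum_{k=0}^{|Z - X|} \binom{|Z - X|}{k} (-1)^k$$
Notice that this is a binomial expansion, so the sum is $(1 - 1)^{|Z - X|} = 0$ (here $|Z - X| \ge 1$ because $X \subsetneq Z$), fulfilling equation (9).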
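Because conditions (7)-(9) determine $\mu$ uniquely, the closed form can also be cross-checked mechanically. This Python sketch takes a hypothetical four-element ground set, computes $\mu$ on its subset lattice through the recursion $\mu(X, Z) = -\sum_{X \subseteq Y \subsetneq Z} \mu(X, Y)$, and compares every value with $(-1)^{|Y - X|}$.

```python
from functools import lru_cache
from itertools import combinations

S = (0, 1, 2, 3)  # hypothetical 4-element ground set
SUBSETS = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]


@lru_cache(maxsize=None)
def mu(X, Z):
    """mu on the subset lattice, computed only from conditions (7), (8), (9)."""
    if not X <= Z:
        return 0
    if X == Z:
        return 1
    return -sum(mu(X, Y) for Y in SUBSETS if X <= Y < Z)


def closed_form(X, Y):
    """(-1)^{|Y - X|} when X is a subset of Y, and 0 otherwise."""
    return (-1) ** len(Y - X) if X <= Y else 0


assert all(mu(X, Y) == closed_form(X, Y) for X in SUBSETS for Y in SUBSETS)
print("recursion matches (-1)^|Y-X| on all", len(SUBSETS) ** 2, "pairs")
```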
Let $f(x)$ be any function defined on $V$, and set
$$g(x) = \sum_{z \le x} f(z)$$
Then show that
$$f(x) = \sum_{z \le x} g(z) \mu(z, x)$$
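Represent $f$ and $g$ as the row vectors $f = [f(x_1), \ldots, f(x_n)]$ and $g = [g(x_1), \ldots, g(x_n)]$, respectively. Then observe that $g = f \cdot Q$, since the $j$-th entry of $f \cdot Q$ is $\sum_i f(x_i) q_{ij} = \sum_{x_i \le x_j} f(x_i) = g(x_j)$. It follows that
$$f = g \cdot Q^{-1} = g \cdot M$$
Now calculate $g \cdot M$. Its entry $f(x_i)$ can be written as
$$f(x_i) = \sum_{j=1}^{n} g(x_j) m_{ji} = \sum_{z \le x_i} g(z) \mu(z, x_i)$$
So,
$$f(x) = \sum_{z \le x} g(z) \mu(z, x)$$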
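The inversion formula can be exercised directly. This Python sketch picks a random integer-valued $f$ on the same hypothetical divisors-of-12 poset, forms $g(x) = \sum_{z \le x} f(z)$, and recovers $f$ using $\mu$ computed from conditions (7)-(9). The helper names `summatory` and `invert` are illustrative.

```python
import random
from functools import lru_cache

V = (1, 2, 3, 4, 6, 12)  # hypothetical poset: divisors of 12 under divisibility


def leq(a, b):
    return b % a == 0  # a <= b in the divisibility order


@lru_cache(maxsize=None)
def mu(x, z):
    """mu of the poset, built from conditions (7), (8), (9)."""
    if not leq(x, z):
        return 0
    if x == z:
        return 1
    return -sum(mu(x, y) for y in V if leq(x, y) and leq(y, z) and y != z)


def summatory(f):
    """g(x) = sum of f(z) over z <= x."""
    return {x: sum(f[z] for z in V if leq(z, x)) for x in V}


def invert(g):
    """f(x) = sum of g(z) * mu(z, x) over z <= x."""
    return {x: sum(g[z] * mu(z, x) for z in V if leq(z, x)) for x in V}


rng = random.Random(2)
f = {x: rng.randint(-10, 10) for x in V}
print(invert(summatory(f)) == f)  # True
```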
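Finally, show that the Sieve is a special case of this statement. Let $S = \{1, 2, \ldots, n\}$ and, for $K \subseteq S$, write $A_K = \prod_{k \in K} A_k$ (with $A_\emptyset$ the whole sample space). Then the Sieve Formula states that
$$P(A_1 + A_2 + \cdots + A_n) = \sum_{\emptyset \neq K \subseteq S} P(A_K)(-1)^{|K|-1}$$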
Using De Morgan’s Law, rewrite
$$P\Big(\sum_{i \in S} A_i\Big) = 1 - P\Big(\prod_{i \in S} \bar{A}_i\Big) \qquad (10)$$
This will be used later in the proof.

Now let
$$f(L) = P\Big(\prod_{i \notin L} A_i \prod_{j \in L} \bar{A}_j\Big)$$
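Then $g(K) = \sum_{L \subseteq K} f(L)$. Every term in this summation contains the factor $\prod_{l \notin K} A_l$ (since $l \notin K$ implies $l \notin L$), and the terms run over every combination of $A_i$s and $\bar{A}_i$s for $i \in K$. These events are mutually exclusive and together make up $\prod_{i \notin K} A_i$, so
$$g(K) = \sum_{L \subseteq K} f(L) = P\Big(\prod_{i \notin K} A_i\Big) = P(A_{S-K})$$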
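The identity $g(K) = P(A_{S-K})$ is easy to test by brute force. This Python sketch uses a hypothetical uniform 30-point sample space with three random events (indexed from 0 rather than 1), evaluates $f(L)$ straight from its definition, and checks the claimed value of $g(K)$ for every $K \subseteq S$.

```python
import random
from fractions import Fraction
from itertools import combinations

rng = random.Random(4)
OMEGA = frozenset(range(30))   # hypothetical finite sample space, uniform probability
n = 3
S = frozenset(range(n))        # event indices (0-based here)
A = [frozenset(w for w in sorted(OMEGA) if rng.random() < 0.5) for _ in range(n)]


def P(event):
    return Fraction(len(event), len(OMEGA))


def subsets(T):
    T = sorted(T)
    return [frozenset(c) for r in range(len(T) + 1) for c in combinations(T, r)]


def f(L):
    """P(A_i for every i not in L, and not A_j for every j in L)."""
    event = OMEGA
    for i in S - L:
        event &= A[i]
    for j in L:
        event &= OMEGA - A[j]
    return P(event)


def A_of(T):
    """A_T: intersection of A_i over i in T (the whole space when T is empty)."""
    event = OMEGA
    for i in T:
        event &= A[i]
    return event


for K in subsets(S):
    g_K = sum(f(L) for L in subsets(K))   # g(K) = sum of f(L) over L subset of K
    assert g_K == P(A_of(S - K))
print("g(K) = P(A_(S-K)) for all", len(subsets(S)), "subsets K")
```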
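Then, using the formula for $f$ in terms of $g$ (with $\mu(K, S) = (-1)^{|S - K|}$ on the subset lattice),
$$f(S) = \sum_{K \subseteq S} g(K) \mu(K, S) = \sum_{K \subseteq S} P(A_{S-K})(-1)^{|S-K|}$$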
However, since $K$ ranges over all subsets of $S$, $S - K$ also ranges over all subsets of $S$. So
$$\sum_{K \subseteq S} P(A_{S-K})(-1)^{|S-K|} = \sum_{K \subseteq S} P(A_K)(-1)^{|K|}$$
Also, from the definition of $f(L)$,
$$f(S) = P\Big(\prod_{i \notin S} A_i \prod_{j \in S} \bar{A}_j\Big) = P\Big(\prod_{j \in S} \bar{A}_j\Big)$$
Referring back to equation (10),
$$P\Big(\sum_{j \in S} A_j\Big) = 1 - \sum_{K \subseteq S} P(A_K)(-1)^{|K|}$$
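The $K = \emptyset$ term of this sum is $P(A_\emptyset)(-1)^0 = 1$, which cancels the leading 1. Finally,
$$P(A_1 + A_2 + \cdots + A_n) = \sum_{\emptyset \neq K \subseteq S} P(A_K)(-1)^{|K|-1}$$
which is the Sieve Formula.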
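A quick exact check of the final formula on a finite uniform probability space: the Python sketch below draws four random events on a hypothetical 20-point sample space, computes $P(A_1 + \cdots + A_4)$ directly from the union, and compares it with the alternating sum over nonempty $K$ (the empty intersection $A_\emptyset$ is the whole space, which is why $K = \emptyset$ is left out). Probabilities are `Fraction`s, so the comparison is exact.

```python
import random
from fractions import Fraction
from itertools import combinations


def prob(event, omega):
    """Uniform probability of an event (a subset of the finite space omega)."""
    return Fraction(len(event), len(omega))


def intersection(events, K, omega):
    """A_K: intersection of events[k] for k in K; the empty intersection is omega."""
    out = frozenset(omega)
    for k in K:
        out &= events[k]
    return out


def sieve(events, omega):
    """Sum over nonempty K of (-1)^(|K|-1) * P(A_K)."""
    n = len(events)
    return sum(
        (-1) ** (len(K) - 1) * prob(intersection(events, K, omega), omega)
        for r in range(1, n + 1)
        for K in combinations(range(n), r)
    )


rng = random.Random(3)
omega = frozenset(range(20))  # hypothetical sample space, uniform probability
events = [frozenset(w for w in sorted(omega) if rng.random() < 0.3) for _ in range(4)]
union = frozenset().union(*events)  # A_1 + A_2 + A_3 + A_4
print(prob(union, omega) == sieve(events, omega))  # True
```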