Notes on
Discrete Mathematics
Miguel A. Lerma
Contents
Introduction
Chapter 1. Logic, Proofs
1.1. Propositions
1.2. Predicates, Quantifiers
1.3. Proofs
Chapter 2. Sets, Functions, Relations
2.1. Set Theory
2.2. Functions
2.3. Relations
Chapter 3. Algorithms, Integers
3.1. Algorithms
3.2. The Euclidean Algorithm
3.3. Modular Arithmetic, RSA Algorithm
Chapter 4. Induction, Recurrences
4.1. Sequences and Strings
4.2. Mathematical Induction
4.3. Recurrence Relations
Chapter 5. Counting
5.1. Basic Principles
5.2. Combinatorics
5.3. Generalized Permutations and Combinations
5.4. Binomial Coefficients
5.5. The Pigeonhole Principle
Chapter 6. Probability
6.1. Probability
Chapter 7. Graph Theory
7.1. Graphs
7.2. Representations of Graphs
7.3. Paths and Circuits
7.4. Planar Graphs
Chapter 8. Trees
8.1. Trees
8.2. Binary Trees
8.3. Decision Trees, Tree Isomorphisms
8.4. Tree Traversal
8.5. Spanning Trees
Chapter 9. Boolean Algebras
9.1. Combinatorial Circuits
9.2. Boolean Functions, Applications
Chapter 10. Automata, Grammars and Languages
10.1. Finite State Machines
10.2. Languages and Grammars
10.3. Language Recognition
Appendix A.
A.1. Efficient Computation of Powers Modulo m
A.2. Machines and Languages
Introduction
These notes are intended to be a summary of the main ideas in
course CS 310: Mathematical Foundations of Computer Science. I
may keep working on this document as the course goes on, so these
notes will not be completely ﬁnished until the end of the quarter.
The textbook for this course is Kenneth H. Rosen: Discrete Mathematics and Its Applications, Fifth Edition, 2003, McGraw-Hill. With few exceptions I will follow the notation in the book.
These notes contain some questions and “exercises” intended to stimulate the reader who wants to play a somewhat active role while studying the subject. They are not homework, nor do they need to be addressed at all if the reader does not wish to. I will recommend exercises and give homework assignments separately.
Finally, if you ﬁnd any typos or errors, or you have any suggestions,
please, do not hesitate to let me know.
Miguel A. Lerma
mlerma@math.northwestern.edu
Northwestern University
Spring 2005
http://www.math.northwestern.edu/~mlerma/courses/cs31005s/
5
www.jntuworld.com
w
w
w
.
j
n
t
u
w
o
r
l
d
.
c
o
m
CHAPTER 1
Logic, Proofs
1.1. Propositions
A proposition is a declarative sentence that is either true or false
(but not both). For instance, the following are propositions: “Paris
is in France” (true), “London is in Denmark” (false), “2 < 4” (true),
“4 = 7” (false). However the following are not propositions: “what
is your name?” (this is a question), “do your homework” (this is a
command), “this sentence is false” (neither true nor false), “x is an
even number” (it depends on what x represents), “Socrates” (it is not
even a sentence). The truth or falsehood of a proposition is called its
truth value.
1.1.1. Connectives, Truth Tables. Connectives are used for
making compound propositions. The main ones are the following (p
and q represent given propositions):
Name           Represented   Meaning
Negation       ¬p            “not p”
Conjunction    p ∧ q         “p and q”
Disjunction    p ∨ q         “p or q (or both)”
Exclusive Or   p ⊕ q         “either p or q, but not both”
Implication    p → q         “if p then q”
Biconditional  p ↔ q         “p if and only if q”
The truth value of a compound proposition depends only on the
value of its components. Writing F for “false” and T for “true”, we
can summarize the meaning of the connectives in the following way:
p  q  ¬p  p ∧ q  p ∨ q  p ⊕ q  p → q  p ↔ q
T  T  F   T      T      F      T      T
T  F  F   F      T      T      F      F
F  T  T   F      T      T      T      F
F  F  T   F      F      F      T      T
Note that ∨ represents a non-exclusive or, i.e., p ∨ q is true when at least one of p, q is true, including when both are true. On the other hand ⊕ represents an exclusive or, i.e., p ⊕ q is true only when exactly one of p and q is true.
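As a sanity check, the whole table can be generated mechanically. The sketch below is my own, not part of the notes; `implies` and `iff` are hypothetical helper names encoding the truth tables above:

```python
from itertools import product

def implies(p, q):          # p → q: false only when p is true and q is false
    return (not p) or q

def iff(p, q):              # p ↔ q: true when p and q have the same value
    return p == q

def t(b):                   # render a boolean as T/F
    return "T" if b else "F"

# One row per assignment of truth values to p and q.
print("p  q  ¬p  p∧q  p∨q  p⊕q  p→q  p↔q")
for p, q in product([True, False], repeat=2):
    row = [p, q, not p, p and q, p or q, p != q, implies(p, q), iff(p, q)]
    print("  ".join(t(b) for b in row))
```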
1.1.2. Tautology, Contradiction, Contingency.
1. A proposition is said to be a tautology if its truth value is T for any assignment of truth values to its components. Example: The proposition p ∨ ¬p is a tautology.
2. A proposition is said to be a contradiction if its truth value is F for any assignment of truth values to its components. Example: The proposition p ∧ ¬p is a contradiction.
3. A proposition that is neither a tautology nor a contradiction is called a contingency.

p  ¬p  p ∨ ¬p  p ∧ ¬p
T  F   T       F
F  T   T       F
       ↑       ↑
   tautology   contradiction
1.1.3. Conditional Propositions. A proposition of the form “if
p then q” or “p implies q”, represented “p → q” is called a conditional
proposition. For instance: “if John is from Chicago then John is from
Illinois”. The proposition p is called hypothesis or antecedent, and the
proposition q is the conclusion or consequent.
Note that p → q is always true except when p is true and q is false.
So, the following sentences are true: “if 2 < 4 then Paris is in France”
(true → true), “if London is in Denmark then 2 < 4” (false → true),
“if 4 = 7 then London is in Denmark” (false → false). However the
following one is false: “if 2 < 4 then London is in Denmark” (true →
false).
It might seem strange that “p → q” is considered true when p is
false, regardless of the truth value of q. This will become clearer when
we study predicates such as “if x is a multiple of 4 then x is a multiple
of 2”. That implication is obviously true, although for the particular
case x = 3 it becomes “if 3 is a multiple of 4 then 3 is a multiple of 2”.
The proposition p ↔ q, read “p if and only if q”, is called biconditional. It is true precisely when p and q have the same truth value, i.e., they are both true or both false.
1.1.4. Logical Equivalence. Note that the compound propositions p → q and ¬p ∨ q have the same truth values:

p  q  ¬p  ¬p ∨ q  p → q
T  T  F   T       T
T  F  F   F       F
F  T  T   T       T
F  F  T   T       T
When two compound propositions have the same truth values no matter what truth value their constituent propositions have, they are called logically equivalent. For instance p → q and ¬p ∨ q are logically equivalent, and we write it:

p → q ≡ ¬p ∨ q

Note that two propositions A and B are logically equivalent precisely when A ↔ B is a tautology.
Example: De Morgan’s Laws for Logic. The following propositions are logically equivalent:

¬(p ∨ q) ≡ ¬p ∧ ¬q
¬(p ∧ q) ≡ ¬p ∨ ¬q
We can check it by examining their truth tables:
p  q  ¬p  ¬q  p ∨ q  ¬(p ∨ q)  ¬p ∧ ¬q  p ∧ q  ¬(p ∧ q)  ¬p ∨ ¬q
T  T  F   F   T      F         F        T      F         F
T  F  F   T   T      F         F        F      T         T
F  T  T   F   T      F         F        F      T         T
F  F  T   T   F      T         T        F      T         T
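Since each proposition has only four truth assignments, both laws can also be verified exhaustively. A minimal Python check (my own, not from the notes):

```python
from itertools import product

# Exhaustively check both De Morgan laws over all four truth assignments.
for p, q in product([True, False], repeat=2):
    assert (not (p or q)) == ((not p) and (not q))
    assert (not (p and q)) == ((not p) or (not q))
print("De Morgan's laws hold for all four truth assignments.")
```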
Example: The following propositions are logically equivalent:
p ↔ q ≡ (p → q) ∧ (q → p)
Again, this can be checked with the truth tables:
p q p → q q → p (p → q) ∧ (q → p) p ↔ q
T T T T T T
T F F T F F
F T T F F F
F F T T T T
Exercise: Check the following logical equivalences:

¬(p → q) ≡ p ∧ ¬q
p → q ≡ ¬q → ¬p
¬(p ↔ q) ≡ p ⊕ q
1.1.5. Converse, Contrapositive. The converse of a conditional proposition p → q is the proposition q → p. As we have seen, the biconditional proposition is equivalent to the conjunction of a conditional proposition and its converse.

p ↔ q ≡ (p → q) ∧ (q → p)
So, for instance, saying that “John is married if and only if he has a
spouse” is the same as saying “if John is married then he has a spouse”
and “if he has a spouse then he is married”.
Note that the converse is not equivalent to the given conditional
proposition, for instance “if John is from Chicago then John is from
Illinois” is true, but the converse “if John is from Illinois then John is
from Chicago” may be false.
The contrapositive of a conditional proposition p → q is the proposition ¬q → ¬p. They are logically equivalent. For instance the contrapositive of “if John is from Chicago then John is from Illinois” is “if John is not from Illinois then John is not from Chicago”.
1.2. Predicates, Quantiﬁers
1.2.1. Predicates. A predicate or propositional function is a statement containing variables. For instance “x + 2 = 7”, “X is American”, “x < y”, “p is a prime number” are predicates. The truth value of the predicate depends on the value assigned to its variables. For instance if we replace x with 1 in the predicate “x + 2 = 7” we obtain “1 + 2 = 7”, which is false, but if we replace it with 5 we get “5 + 2 = 7”, which is true. We represent a predicate by a letter followed by the variables enclosed in parentheses: P(x), Q(x, y), etc. An example for P(x) is a value of x for which P(x) is true. A counterexample is a value of x for which P(x) is false. So, 5 is an example for “x + 2 = 7”, while 1 is a counterexample.
Each variable in a predicate is assumed to belong to a universe (or domain) of discourse; for instance in the predicate “n is an odd integer”, n represents an integer, so the universe of discourse of n is the set of all integers. In “X is American” we may assume that X is a human being, so in this case the universe of discourse is the set of all human beings.¹
1.2.2. Quantiﬁers. Given a predicate P(x), the statement “for
some x, P(x)” (or “there is some x such that P(x)”), represented
“∃x P(x)”, has a deﬁnite truth value, so it is a proposition in the
usual sense. For instance if P(x) is “x + 2 = 7” with the integers as
universe of discourse, then ∃x P(x) is true, since there is indeed an
integer, namely 5, such that P(5) is a true statement. However, if
Q(x) is “2x = 7” and the universe of discourse is still the integers,
then ∃x Q(x) is false. On the other hand, ∃x Q(x) would be true if we
extend the universe of discourse to the rational numbers. The symbol
∃ is called the existential quantiﬁer.
Analogously, the sentence “for all x, P(x)”—also “for any x, P(x)”,
“for every x, P(x)”, “for each x, P(x)”—, represented “∀x P(x)”, has
a deﬁnite truth value. For instance, if P(x) is “x + 2 = 7” and the
¹Usually all variables occurring in predicates in a given argument are assumed to belong to the same universe of discourse, but in some situations (as in the so-called many-sorted logics) it is possible to use different kinds of variables to represent different types of objects belonging to different universes of discourse. For instance in the predicate “σ is a string of length n” the variable σ represents a string, while n represents a natural number, so the universe of discourse of σ is the set of all strings, while the universe of discourse of n is the set of natural numbers.
universe of discourse is the integers, then ∀x P(x) is false. However if Q(x) represents “(x + 1)² = x² + 2x + 1” then ∀x Q(x) is true. The symbol ∀ is called the universal quantifier.
In predicates with more than one variable it is possible to use several
quantiﬁers at the same time, for instance ∀x∀y∃z P(x, y, z), meaning
“for all x and all y there is some z such that P(x, y, z)”.
Note that in general the existential and universal quantiﬁers cannot
be swapped, i.e., in general ∀x∃y P(x, y) means something diﬀerent
from ∃y∀x P(x, y). For instance if x and y represent human beings and
P(x, y) represents “x is a friend of y”, then ∀x∃y P(x, y) means that
everybody is a friend of someone, but ∃y∀x P(x, y) means that there
is someone such that everybody is his or her friend.
A predicate can be partially quantiﬁed, e.g. ∀x∃y P(x, y, z, t). The
variables quantiﬁed (x and y in the example) are called bound variables,
and the rest (z and t in the example) are called free variables. A
partially quantiﬁed predicate is still a predicate, but depending on
fewer variables.
1.2.3. Generalized De Morgan Laws for Logic. If ∃x P(x) is false then there is no value of x for which P(x) is true, or in other words, P(x) is always false. Hence

¬∃x P(x) ≡ ∀x ¬P(x) .

On the other hand, if ∀x P(x) is false then it is not true that for every x, P(x) holds, hence for some x, P(x) must be false. Thus:

¬∀x P(x) ≡ ∃x ¬P(x) .

These two rules can be applied in successive steps to find the negation of a more complex quantified statement, for instance:

¬∃x∀y P(x, y) ≡ ∀x ¬∀y P(x, y) ≡ ∀x∃y ¬P(x, y) .
Exercise: Write formally the statement “for every real number there
is a greater real number”. Write the negation of that statement.
Answer: The statement is: ∀x ∃y (x < y) (the universe of discourse is the real numbers). Its negation is: ∃x ∀y ¬(x < y), i.e., ∃x ∀y (x ≮ y). (Note that among real numbers x ≮ y is equivalent to x ≥ y, but formally they are different predicates.)
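Over a finite universe of discourse, ∃ corresponds to Python's any() and ∀ to all(), so the generalized De Morgan laws can be tested directly. The finite universe below is my own stand-in for the reals, and P is a sample predicate of my choosing:

```python
# Over a finite universe, ∃ and ∀ correspond to Python's any() and all().
# We use a small finite universe (a stand-in for R) to illustrate
# ¬∀x P(x) ≡ ∃x ¬P(x)  and  ¬∃x P(x) ≡ ∀x ¬P(x).
universe = range(-5, 6)

def P(x):                   # sample predicate: "x is positive"
    return x > 0

assert (not all(P(x) for x in universe)) == any(not P(x) for x in universe)
assert (not any(P(x) for x in universe)) == all(not P(x) for x in universe)
print("Generalized De Morgan laws verified on the finite universe.")
```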
1.3. Proofs
1.3.1. Mathematical Systems, Proofs. A Mathematical System consists of:
1. Axioms: propositions that are assumed true.
2. Definitions: used to create new concepts from old ones.
3. Undefined terms: corresponding to the primitive concepts of the system (for instance in set theory the term “set” is undefined).
A theorem is a proposition that can be proved to be true. An
argument that establishes the truth of a proposition is called a proof.
Example: Prove that if x > 2 and y > 3 then x + y > 5.
Answer: Assuming x > 2 and y > 3 and adding the inequalities term by term we get: x + y > 2 + 3 = 5.
That is an example of direct proof. In a direct proof we assume the
hypothesis together with axioms and other theorems previously proved
and we derive the conclusion from them.
An indirect proof or proof by contrapositive consists of proving the contrapositive of the desired implication, i.e., instead of proving p → q we prove ¬q → ¬p.
Example: Prove that if x + y > 5 then x > 2 or y > 3.
Answer: We must prove that x + y > 5 → (x > 2) ∨ (y > 3). An indirect proof consists of proving ¬((x > 2) ∨ (y > 3)) → ¬(x + y > 5). In fact: ¬((x > 2) ∨ (y > 3)) is the same as (x ≤ 2) ∧ (y ≤ 3), so adding both inequalities we get x + y ≤ 5, which is the same as ¬(x + y > 5).
Proof by Contradiction. In a proof by contradiction (or Reductio ad Absurdum) we assume the hypotheses and the negation of the conclusion, and try to derive a contradiction, i.e., a proposition of the form r ∧ ¬r.
Example: Prove by contradiction that if x+y > 5 then either x > 2
or y > 3.
Answer: We assume the hypothesis x +y > 5. From here we must
conclude that x > 2 or y > 3. Assume to the contrary that “x > 2 or
y > 3” is false, so x ≤ 2 and y ≤ 3. Adding those inequalities we get
x + y ≤ 2 + 3 = 5, which contradicts the hypothesis x + y > 5. From here we conclude that the assumption “x ≤ 2 and y ≤ 3” cannot be right, so “x > 2 or y > 3” must be true.
Remark: Sometimes it is difficult to distinguish between an indirect proof and a proof by contradiction. In an indirect proof we prove an implication of the form p → q by proving the contrapositive ¬q → ¬p. In a proof by contradiction we prove a statement s (which may or may not be an implication) by assuming ¬s and deriving a contradiction. In fact proofs by contradiction are more general than indirect proofs.
Exercise: Prove by contradiction that √2 is not a rational number, i.e., there are no integers a, b such that √2 = a/b.

Answer: Assume that √2 is rational, i.e., √2 = a/b, where a and b are integers and the fraction is written in least terms. Squaring both sides we have 2 = a²/b², hence 2b² = a². Since the left hand side is even, then a² is even, but this implies that a itself is even, so a = 2a′. Hence: 2b² = 4a′², and simplifying: b² = 2a′². This implies that b² is even, so b is even: b = 2b′. Consequently a/b = 2a′/2b′ = a′/b′, contradicting the hypothesis that a/b was in least terms.
1.3.2. Arguments, Rules of Inference. An argument is a sequence of propositions p₁, p₂, . . . , pₙ called hypotheses (or premises) followed by a proposition q called conclusion. An argument is usually written:

p₁
p₂
⋮
pₙ
∴ q

or

p₁, p₂, . . . , pₙ / ∴ q

The argument is called valid if q is true whenever p₁, p₂, . . . , pₙ are true; otherwise it is called invalid.
Rules of inference are certain simple arguments known to be valid
and used to make a proof step by step. For instance the following
argument is called modus ponens or rule of detachment:
p → q
p
∴ q
In order to check whether it is valid we must examine the following
truth table:
p q p → q p q
T T T T T
T F F T F
F T T F T
F F T F F
If we look now at the rows in which both p → q and p are true (just
the ﬁrst row) we see that also q is true, so the argument is valid.
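The truth-table test for validity can be automated: enumerate all assignments and check that the conclusion holds whenever every premise does. This sketch and the function name `valid` are my own, not from the notes:

```python
from itertools import product

def valid(premises, conclusion, n_vars):
    """An argument is valid if the conclusion is true in every row of
    the truth table in which all the premises are true."""
    for values in product([True, False], repeat=n_vars):
        if all(prem(*values) for prem in premises):
            if not conclusion(*values):
                return False
    return True

# Modus ponens: p → q, p ∴ q  is valid.
assert valid([lambda p, q: (not p) or q,    # p → q
              lambda p, q: p],              # p
             lambda p, q: q, 2)

# Affirming the consequent: p → q, q ∴ p  is NOT valid.
assert not valid([lambda p, q: (not p) or q, lambda p, q: q],
                 lambda p, q: p, 2)
print("modus ponens: valid; affirming the consequent: invalid")
```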
Other rules of inference are the following:
1. Modus Ponens or Rule of Detachment:
p → q
p
∴ q
2. Modus Tollens:
p → q
¬q
∴ ¬p
3. Addition:
p
∴ p ∨ q
4. Simpliﬁcation:
p ∧ q
∴ p
5. Conjunction:
p
q
∴ p ∧ q
6. Hypothetical Syllogism:
p → q
q → r
∴ p → r
7. Disjunctive Syllogism:
p ∨ q
¬p
∴ q
8. Resolution:
p ∨ q
¬p ∨ r
∴ q ∨ r
Arguments are usually written using three columns. Each row contains a label, a statement and the reason that justifies the introduction of that statement in the argument. That justification can be one of the following:
1. The statement is a premise.
2. The statement can be derived from statements occurring earlier in the argument by using a rule of inference.

Example: Consider the following statements: “I take the bus or I walk. If I walk I get tired. I do not get tired. Therefore I take the bus.” We can formalize this by calling B = “I take the bus”, W = “I walk” and T = “I get tired”. The premises are B ∨ W, W → T and ¬T, and the conclusion is B. The argument can be described in the following steps:

step  statement  reason
1)    W → T      Premise
2)    ¬T         Premise
3)    ¬W         1,2, Modus Tollens
4)    B ∨ W      Premise
5)    ∴ B        4,3, Disjunctive Syllogism
1.3.3. Rules of Inference for Quantified Statements. We state the rules for predicates with one variable, but they can be generalized to predicates with two or more variables.
1. Universal Instantiation. If ∀x p(x) is true, then p(a) is true for
each speciﬁc element a in the universe of discourse; i.e.:
∀x p(x)
∴ p(a)
For instance, from ∀x (x+1 = 1+x) we can derive 7+1 = 1+7.
2. Existential Instantiation. If ∃x p(x) is true, then p(a) is true for some specific element a in the universe of discourse; i.e.:

∃x p(x)
∴ p(a)

The difference with respect to the previous rule is the restriction in the meaning of a, which now represents some (not any) element of the universe of discourse. So, for instance, from ∃x (x² = 2) (the universe of discourse is the real numbers) we derive the existence of some element, which we may represent ±√2, such that (±√2)² = 2.
3. Universal Generalization. If p(x) is proved to be true for a generic element in the universe of discourse, then ∀x p(x) is true; i.e.:

p(x)
∴ ∀x p(x)

By “generic” we mean an element for which we do not make any assumption other than its belonging to the universe of discourse. So, for instance, we can prove ∀x [(x + 1)² = x² + 2x + 1] (say, for real numbers) by assuming that x is a generic real number and using algebra to prove (x + 1)² = x² + 2x + 1.
4. Existential Generalization. If p(a) is true for some speciﬁc ele
ment a in the universe of discourse, then ∃x p(x) is true; i.e.:
p(a)
∴ ∃x p(x)
For instance: from 7 + 1 = 8 we can derive ∃x (x + 1 = 8).
Example: Show that a counterexample can be used to disprove a
universal statement, i.e., if a is an element in the universe of discourse,
then from ¬p(a) we can derive ¬∀x p(x). Answer: The argument is as follows:

step  statement  reason
1)    ¬p(a)      Premise
2)    ∃x ¬p(x)   Existential Generalization
3)    ¬∀x p(x)   Negation of Universal Statement
CHAPTER 2
Sets, Functions, Relations
2.1. Set Theory
2.1.1. Sets. A set is a collection of objects, called elements of the set. A set can be represented by listing its elements between braces: A = {1, 2, 3, 4, 5}. The symbol ∈ is used to express that an element is (or belongs to) a set, for instance 3 ∈ A. Its negation is represented by ∉, e.g. 7 ∉ A. If the set is finite, its number of elements is represented |A|, e.g. if A = {1, 2, 3, 4, 5} then |A| = 5.
Some important sets are the following:
1. N = {0, 1, 2, 3, . . . } = the set of natural numbers.¹
2. Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . . } = the set of integers.
3. Q = the set of rational numbers.
4. R = the set of real numbers.
5. C = the set of complex numbers.

If S is one of those sets then we also use the following notations:²
1. S⁺ = set of positive elements in S, for instance
Z⁺ = {1, 2, 3, . . . } = the set of positive integers.
2. S⁻ = set of negative elements in S, for instance
Z⁻ = {−1, −2, −3, . . . } = the set of negative integers.
3. S* = set of elements in S excluding zero, for instance R* = the set of non zero real numbers.
Set-builder notation. An alternative way to define a set, called set-builder notation, is by stating a property (predicate) P(x) verified by exactly its elements, for instance A = {x ∈ Z | 1 ≤ x ≤ 5} = “set of

¹Note that N includes zero—for some authors N = {1, 2, 3, . . . }, without zero.
²When working with strings we will use a similar notation with a different meaning—be careful not to confuse it.
integers x such that 1 ≤ x ≤ 5”—i.e.: A = {1, 2, 3, 4, 5}. In general: A = {x ∈ U | P(x)}, where U is the universe of discourse in which the predicate P(x) must be interpreted, or A = {x | P(x)} if the universe of discourse for P(x) is implicitly understood. In set theory the term universal set is often used in place of “universe of discourse” for a given predicate.³
Principle of Extension. Two sets are equal if and only if they have
the same elements, i.e.:
A = B ≡ ∀x (x ∈ A ↔ x ∈ B) .
Subset. We say that A is a subset of set B, or A is contained in B, and we represent it “A ⊆ B”, if all elements of A are in B, e.g., if A = {a, b, c} and B = {a, b, c, d, e} then A ⊆ B.

A is a proper subset of B, represented “A ⊂ B”, if A ⊆ B but A ≠ B, i.e., there is some element in B which is not in A.
Empty Set. A set with no elements is called empty set (or null set, or void set), and is represented by ∅ or {}.
Note that nothing prevents a set from possibly being an element of another set (which is not the same as being a subset!). For instance if A = {1, a, {3, t}, {1, 2, 3}} and B = {3, t}, then obviously B is an element of A, i.e., B ∈ A.
Power Set. The collection of all subsets of a set A is called the power set of A, and is represented P(A). For instance, if A = {1, 2, 3}, then

P(A) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, A} .

Exercise: Prove by induction that if |A| = n then |P(A)| = 2ⁿ.
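Without giving the induction proof, the identity |P(A)| = 2ⁿ can at least be checked computationally for small sets. A sketch using the standard library (the function name is mine):

```python
from itertools import combinations

def power_set(s):
    """All subsets of s, as a list of frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

A = {1, 2, 3}
PA = power_set(A)
print(len(PA))                       # 8 subsets, i.e. 2**3
assert len(PA) == 2 ** len(A)
assert frozenset() in PA and frozenset(A) in PA
```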
Multisets. Two ordinary sets are identical if they have the same elements, so for instance, {a, a, b} and {a, b} are the same set because they have exactly the same elements, namely a and b. However, in some applications it might be useful to allow repeated elements in a set. In that case we use multisets, which are mathematical entities similar to sets, but with possibly repeated elements. So, as multisets, {a, a, b} and {a, b} would be considered different, since in the first one the element a occurs twice and in the second one it occurs only once.
³Properly speaking, the universe of discourse of set theory is the collection of all sets (which is not a set).
2.1.2. Venn Diagrams. Venn diagrams are graphic representations of sets as enclosed areas in the plane. For instance, in figure 2.1, the rectangle represents the universal set (the set of all elements considered in a given problem) and the shaded region represents a set A. The other figures represent various set operations.

Figure 2.1. Venn Diagram.
Figure 2.2. Intersection A ∩ B.
Figure 2.3. Union A ∪ B.
Figure 2.4. Complement A̅.
Figure 2.5. Difference A − B.
Figure 2.6. Symmetric Difference A ⊕ B.
2.1.3. Set Operations.
1. Intersection: The common elements of two sets:
A ∩ B = {x | (x ∈ A) ∧ (x ∈ B)} .
If A ∩ B = ∅, the sets are said to be disjoint.
2. Union: The set of elements that belong to either of two sets:
A ∪ B = {x | (x ∈ A) ∨ (x ∈ B)} .
3. Complement: The set of elements (in the universal set) that do not belong to a given set:
A̅ = {x ∈ U | x ∉ A} .
4. Difference or Relative Complement: The set of elements that belong to a set but not to another:
A − B = {x | (x ∈ A) ∧ (x ∉ B)} = A ∩ B̅ .
5. Symmetric Difference: Given two sets, their symmetric difference is the set of elements that belong to either one or the other set but not both.
A ⊕ B = {x | (x ∈ A) ⊕ (x ∈ B)} .
It can be expressed also in the following way:
A ⊕ B = A ∪ B − A ∩ B = (A − B) ∪ (B − A) .
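Python's built-in set type provides all five operations, which makes the definitions easy to experiment with. The sets A, B and the universal set U below are my own examples:

```python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}
U = set(range(1, 11))                     # universal set for this example

assert A & B == {4, 5}                    # intersection
assert A | B == {1, 2, 3, 4, 5, 6, 7}     # union
assert U - A == {6, 7, 8, 9, 10}          # complement of A relative to U
assert A - B == {1, 2, 3}                 # difference
assert A ^ B == {1, 2, 3, 6, 7}           # symmetric difference
assert A ^ B == (A - B) | (B - A)         # identity from the text
print("all set identities check out")
```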
2.1.4. Counting with Venn Diagrams. A Venn diagram with n sets intersecting in the most general way divides the plane into 2ⁿ regions. If we have information about the number of elements of some portions of the diagram, then we can find the number of elements in each of the regions and use that information for obtaining the number of elements in other portions of the plane.

Example: Let M, P and C be the sets of students taking Mathematics courses, Physics courses and Computer Science courses respectively in a university. Assume |M| = 300, |P| = 350, |C| = 450, |M ∩ P| = 100, |M ∩ C| = 150, |P ∩ C| = 75, |M ∩ P ∩ C| = 10. How many students are taking exactly one of those courses? (fig. 2.7)

Figure 2.7. Counting with Venn diagrams.

We see that |(M ∩ P) − (M ∩ P ∩ C)| = 100 − 10 = 90, |(M ∩ C) − (M ∩ P ∩ C)| = 150 − 10 = 140 and |(P ∩ C) − (M ∩ P ∩ C)| = 75 − 10 = 65.
Then the region corresponding to students taking Mathematics courses
only has cardinality 300−(90+10+140) = 60. Analogously we compute
the number of students taking Physics courses only (185) and taking
Computer Science courses only (235). The sum 60 + 185 + 235 = 480
is the number of students taking exactly one of those courses.
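The same region-by-region arithmetic can be written out in code; the variable names are mine and the numbers come from the example above:

```python
# Cardinalities from the example.
M, P, C = 300, 350, 450
MP, MC, PC, MPC = 100, 150, 75, 10

# Pairwise regions excluding the triple intersection.
only_MP = MP - MPC        # 90
only_MC = MC - MPC        # 140
only_PC = PC - MPC        # 65

# Students taking exactly one of the courses.
only_M = M - (only_MP + only_MC + MPC)   # 60
only_P = P - (only_MP + only_PC + MPC)   # 185
only_C = C - (only_MC + only_PC + MPC)   # 235
print(only_M + only_P + only_C)          # 480
```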
2.1.5. Properties of Sets. The set operations verify the follow
ing properties:
1. Associative Laws:
A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
2. Commutative Laws:
A ∪ B = B ∪ A
A ∩ B = B ∩ A
3. Distributive Laws:
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
4. Identity Laws:
A ∪ ∅ = A
A ∩ U = A
5. Complement Laws:
A ∪ A̅ = U
A ∩ A̅ = ∅
6. Idempotent Laws:
A ∪ A = A
A ∩ A = A
7. Bound Laws:
A ∪ U = U
A ∩ ∅ = ∅
8. Absorption Laws:
A ∪ (A ∩ B) = A
A ∩ (A ∪ B) = A
9. Involution Law:
A̿ = A
10. 0/1 Laws:
∅̅ = U
U̅ = ∅
11. DeMorgan’s Laws:
(A ∪ B)̅ = A̅ ∩ B̅
(A ∩ B)̅ = A̅ ∪ B̅
2.1.6. Generalized Union and Intersection. Given a collection of sets A₁, A₂, . . . , A_N, their union is defined as the set of elements that belong to at least one of the sets (here n represents an integer in the range from 1 to N):

⋃_{n=1}^N Aₙ = A₁ ∪ A₂ ∪ ··· ∪ A_N = {x | ∃n (x ∈ Aₙ)} .

Analogously, their intersection is the set of elements that belong to all the sets simultaneously:

⋂_{n=1}^N Aₙ = A₁ ∩ A₂ ∩ ··· ∩ A_N = {x | ∀n (x ∈ Aₙ)} .
These definitions can be applied to infinite collections of sets as well. For instance assume that Sₙ = {kn | k = 2, 3, 4, . . . } = set of multiples of n greater than n. Then

⋃_{n=2}^∞ Sₙ = S₂ ∪ S₃ ∪ S₄ ∪ ··· = {4, 6, 8, 9, 10, 12, 14, 15, . . . }
= set of composite positive integers .
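A finite truncation of this infinite union can be computed to confirm that it consists of exactly the composite numbers; the cutoff LIMIT and helper names below are arbitrary choices of mine:

```python
# Finite truncation of the sets S_n = {kn : k ≥ 2}.
LIMIT = 30

def S(n):
    return {k * n for k in range(2, LIMIT) if k * n <= LIMIT}

union = set()
for n in range(2, LIMIT):
    union |= S(n)

# The union is exactly the set of composite numbers up to LIMIT.
def is_composite(m):
    return m > 1 and any(m % d == 0 for d in range(2, m))

assert union == {m for m in range(2, LIMIT + 1) if is_composite(m)}
print(sorted(union)[:8])   # [4, 6, 8, 9, 10, 12, 14, 15]
```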
2.1.7. Partitions. A partition of a set X is a collection S of non-overlapping non-empty subsets of X whose union is the whole X. For instance a partition of X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} could be

S = {{1, 2, 4, 8}, {3, 6}, {5, 7, 9, 10}} .

Given a partition S of a set X, every element of X belongs to exactly one member of S.

Example: The division of the integers Z into even and odd numbers is a partition: S = {E, O}, where E = {2n | n ∈ Z}, O = {2n + 1 | n ∈ Z}.
Example: The division of Z into negative integers, positive integers and zero is a partition: S = {Z⁺, Z⁻, {0}}.
2.1.8. Ordered Pairs, Cartesian Product. An ordinary pair {a, b} is a set with two elements. In a set the order of the elements is irrelevant, so {a, b} = {b, a}. If the order of the elements is relevant, then we use a different object called ordered pair, represented (a, b). Now (a, b) ≠ (b, a) (unless a = b). In general (a, b) = (a′, b′) iff a = a′ and b = b′.

Given two sets A, B, their Cartesian product A × B is the set of all ordered pairs (a, b) such that a ∈ A and b ∈ B:

A × B = {(a, b) | (a ∈ A) ∧ (b ∈ B)} .

Analogously we can define triples or 3-tuples (a, b, c), 4-tuples (a, b, c, d), . . . , n-tuples (a₁, a₂, . . . , aₙ), and the corresponding 3-fold, 4-fold, . . . , n-fold Cartesian products:

A₁ × A₂ × ··· × Aₙ = {(a₁, a₂, . . . , aₙ) | (a₁ ∈ A₁) ∧ (a₂ ∈ A₂) ∧ ··· ∧ (aₙ ∈ Aₙ)} .

If all the sets in a Cartesian product are the same, then we can use an exponent: A² = A × A, A³ = A × A × A, etc. In general:

Aⁿ = A × A × ··· × A  (n times).

An example of Cartesian product is the real plane R², where R is the set of real numbers (R is sometimes called real line).
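itertools.product implements the Cartesian product directly; a small sketch with example sets of my own choosing:

```python
from itertools import product

A = {1, 2}
B = {'a', 'b', 'c'}

AxB = set(product(A, B))
assert len(AxB) == len(A) * len(B)               # |A × B| = |A|·|B|
assert (1, 'a') in AxB and ('a', 1) not in AxB   # order matters

# A² = A × A, via the repeat argument.
A2 = set(product(A, repeat=2))
assert A2 == {(1, 1), (1, 2), (2, 1), (2, 2)}
print(sorted(AxB))
```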
2.2. Functions
2.2.1. Correspondences. Suppose that to each element of a set A we assign some elements of another set B. For instance, A = N, B = Z, and to each element x ∈ N we assign all elements y ∈ Z such that y² = x (fig. 2.8).

Figure 2.8. Correspondence x → ±√x.

This operation is called a correspondence.
2.2.2. Functions. A function or mapping f from a set A to a set B, denoted f : A → B, is a correspondence in which to each element x of A corresponds exactly one element y = f(x) of B (fig. 2.9).

Figure 2.9. Function.

Sometimes we represent the function with a diagram like this:

f : A → B
    x → y

or

A –f→ B
x → y
For instance, the following represents the function from Z to Z defined by f(x) = 2x + 1:

f : Z → Z
    x → 2x + 1

The element y = f(x) is called the image of x, and x is a preimage of y. For instance, if f(x) = 2x + 1 then f(7) = 2·7 + 1 = 15. The set A is the domain of f, and B is its codomain. If A′ ⊆ A, the image of A′ by f is f(A′) = {f(x) | x ∈ A′}, i.e., the subset of B consisting of all images of elements of A′. The subset f(A) of B consisting of all images of elements of A is called the range of f. For instance, the range of f(x) = 2x + 1 is the set of all integers of the form 2x + 1 for some integer x, i.e., all odd numbers.
Example: Two useful functions from R to Z are the following:
1. The floor function:
⌊x⌋ = greatest integer less than or equal to x .
For instance: ⌊2⌋ = 2, ⌊2.3⌋ = 2, ⌊π⌋ = 3, ⌊−2.5⌋ = −3.
2. The ceiling function:
⌈x⌉ = least integer greater than or equal to x .
For instance: ⌈2⌉ = 2, ⌈2.3⌉ = 3, ⌈π⌉ = 4, ⌈−2.5⌉ = −2.
Example: The modulus operator is the function mod : Z × Z⁺ → Z defined:
x mod y = remainder when x is divided by y.
For instance 23 mod 7 = 2 because 23 = 3·7 + 2, 59 mod 9 = 5 because 59 = 6·9 + 5, etc.
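Python's math.floor, math.ceil and the % operator (for positive divisors) behave exactly as the definitions above; a quick check of the worked examples:

```python
import math

# Floor and ceiling.
assert math.floor(2.3) == 2 and math.ceil(2.3) == 3
assert math.floor(math.pi) == 3 and math.ceil(math.pi) == 4
assert math.floor(-2.5) == -3 and math.ceil(-2.5) == -2

# Python's % with a positive divisor matches the mod of the text.
assert 23 % 7 == 2        # 23 = 3·7 + 2
assert 59 % 9 == 5        # 59 = 6·9 + 5
print("floor, ceiling and mod examples verified")
```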
Graph: The graph of a function f : A → B is the subset of A × B defined by G(f) = {(x, f(x)) | x ∈ A} (fig. 2.10).
2.2.3. Types of Functions.
1. One-to-One or Injective: A function f : A → B is called one-
to-one or injective if each element of B is the image of at most
one element of A (fig. 2.11):
∀x, x′ ∈ A, f(x) = f(x′) ⇒ x = x′ .
Figure 2.10. Graph of f(x) = x².
For instance, f(x) = 2x from Z to Z is injective.
Figure 2.11. One-to-one function.
2. Onto or Surjective: A function f : A → B is called onto or
surjective if every element of B is the image of some element of
A (ﬁg. 2.12):
∀y ∈ B, ∃x ∈ A such that y = f(x) .
For instance, f(x) = x² from R to R⁺ ∪ {0} is onto.
Figure 2.12. Onto function.
3. One-to-One Correspondence or Bijective: A function f : A →
B is said to be a one-to-one correspondence, or bijective, or a
bijection, if it is one-to-one and onto (fig. 2.13). For instance,
f(x) = x + 3 from Z to Z is a bijection.
Figure 2.13. Bijection.
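For finite sets these three properties can be checked by brute force. A minimal Python sketch, with a function represented as a dict from its domain to the codomain (the helper names are ours, not from the notes):

```python
def is_injective(f):
    # each element of the codomain is hit at most once
    images = list(f.values())
    return len(images) == len(set(images))

def is_surjective(f, codomain):
    # every element of the codomain is an image
    return set(f.values()) == set(codomain)

def is_bijective(f, codomain):
    return is_injective(f) and is_surjective(f, codomain)

# f(x) = x + 3 restricted to {0, 1, 2} -> {3, 4, 5} is a bijection
A = [0, 1, 2]
f = {x: x + 3 for x in A}
print(is_bijective(f, [3, 4, 5]))   # True
```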
2.2.4. Identity Function. Given a set A, the function 1_A : A →
A defined by 1_A(x) = x for every x in A is called the identity function
for A.
2.2.5. Function Composition. Given two functions f : A → B
and g : B → C, the composite function of f and g is the function
g ◦ f : A → C deﬁned by (g ◦ f)(x) = g(f(x)) for every x in A:
x ∈ A  −f→  y = f(x) ∈ B  −g→  z = g(y) = g(f(x)) ∈ C,
and the composite arrow from A to C is g ◦ f.
For instance, if A = B = C = Z, f(x) = x + 1, g(x) = x², then
(g ◦ f)(x) = f(x)² = (x + 1)². Also (f ◦ g)(x) = g(x) + 1 = x² + 1 (the
composition of functions is not commutative in general).
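Function composition can be sketched in Python as follows (the `compose` helper is ours, added for illustration):

```python
def compose(g, f):
    # (g o f)(x) = g(f(x))
    return lambda x: g(f(x))

f = lambda x: x + 1      # f(x) = x + 1
g = lambda x: x * x      # g(x) = x^2

print(compose(g, f)(3))  # (3 + 1)^2 = 16
print(compose(f, g)(3))  # 3^2 + 1 = 10  (not commutative)
```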
Some properties of function composition are the following:
1. If f : A → B is a function from A to B, we have that f ◦ 1_A =
1_B ◦ f = f.
2. Function composition is associative, i.e., given three functions
A −f→ B −g→ C −h→ D,
we have that h ◦ (g ◦ f) = (h ◦ g) ◦ f.
Function iteration. If f : A → A is a function from A to A, then
it makes sense to compose it with itself: f² = f ◦ f. For instance, if
f : Z → Z is f(x) = 2x + 1, then f²(x) = 2(2x + 1) + 1 = 4x + 3.
Analogously we can define f³ = f ◦ f ◦ f, and so on, fⁿ = f ◦ · · · ◦ f
(n times).
2.2.6. Inverse Function. If f : A → B is a bijective function, its
inverse is the function f⁻¹ : B → A such that f⁻¹(y) = x if and only
if f(x) = y.
For instance, if f : Z → Z is defined by f(x) = x + 3, then its
inverse is f⁻¹(x) = x − 3.
The arrow diagram of f⁻¹ is the same as the arrow diagram of f
but with all arrows reversed.
A characteristic property of the inverse function is that f⁻¹ ◦ f = 1_A
and f ◦ f⁻¹ = 1_B.
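A bijective function on a finite set can be inverted by reversing its table of values; a Python sketch (the `inverse` helper is an illustration, not from the notes):

```python
def inverse(f):
    # f given as a dict; valid only when f is bijective
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):
        raise ValueError("f is not injective, no inverse exists")
    return inv

f = {x: x + 3 for x in range(5)}   # f(x) = x + 3 on {0,...,4}
f_inv = inverse(f)
print(f_inv[7])                              # 4, since f(4) = 7
# f_inv o f is the identity on the domain of f:
print(all(f_inv[f[x]] == x for x in f))      # True
```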
2.2.7. Operators. A function from A × A to A is called a binary
operator on A. For instance the addition of integers is a binary operator
+ : Z × Z → Z. In the usual notation for functions the sum of
two integers x and y would be represented +(x, y). This is called preﬁx
notation. The infix notation consists of writing the symbol of the
binary operator between its arguments: x + y (this is the most common).
There is also a postﬁx notation consisting of writing the symbol after
the arguments: x y +.
Another example of a binary operator on Z is (x, y) → x · y.
A monary or unary operator on A is a function from A to A. For
instance the change of sign x → −x on Z is a unary operator on Z. An
example of a unary operator on R∗ (nonzero real numbers) is x → 1/x.
2.3. RELATIONS 32
2.3. Relations
2.3.1. Relations. Assume that we have a set of men M and a set
of women W, some of whom are married. We want to express which
men in M are married to which women in W. One way to do that is by
listing the set of pairs (m, w) such that m is a man, w is a woman, and
m is married to w. So, the relation “married to” can be represented
by a subset of the Cartesian product M × W. In general, a relation R
from a set A to a set B will be understood as a subset of the Cartesian
product A × B, i.e., R ⊆ A × B. If an element a ∈ A is related to an
element b ∈ B, we often write a R b instead of (a, b) ∈ R.
The set
{a ∈ A | a R b for some b ∈ B}
is called the domain of R. The set
{b ∈ B | a R b for some a ∈ A}
is called the range of R. For instance, in the relation “married to”
above, the domain is the set of married men, and the range is the set
of married women.
If A and B are the same set, then any subset of A × A will be a
binary relation in A. For instance, assume A = {1, 2, 3, 4}. Then the
binary relation “less than” in A will be:
<_A = {(x, y) ∈ A × A | x < y}
    = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)} .
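The same relation can be built in Python as a set of pairs:

```python
from itertools import product

A = {1, 2, 3, 4}
# the relation "less than" on A, as a subset of A x A
less_than = {(x, y) for x, y in product(A, A) if x < y}
print(sorted(less_than))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```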
Notation: A set A with a binary relation R is sometimes represented
by the pair (A, R). So, for instance, (Z, ≤) means the set of integers
together with the relation of nonstrict inequality.
2.3.2. Representations of Relations.
Arrow diagrams. Venn diagrams and arrows can be used for
representing relations between given sets. As an example, figure 2.14
represents the relation from A = {a, b, c, d} to B = {1, 2, 3, 4} given by
R = {(a, 1), (b, 1), (c, 2), (c, 3)}. In the diagram an arrow from x to y
means that x is related to y. This kind of graph is called a directed
graph or digraph.
Figure 2.14. Relation.
Another example is given in diagram 2.15, which represents the
divisibility relation on the set {1, 2, 3, 4, 5, 6, 7, 8, 9}.
Figure 2.15. Binary relation of divisibility.
Matrix of a Relation. Another way of representing a relation R
from A to B is with a matrix. Its rows are labeled with the elements
of A, and its columns are labeled with the elements of B. If a ∈ A
and b ∈ B then we write 1 in row a column b if a R b, otherwise we
write 0. For instance the relation R = {(a, 1), (b, 1), (c, 2), (c, 3)} from
A = {a, b, c, d} to B = {1, 2, 3, 4} has the following matrix:
      1 2 3 4
  a ( 1 0 0 0 )
  b ( 1 0 0 0 )
  c ( 0 1 1 0 )
  d ( 0 0 0 0 )
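The matrix can be generated mechanically; a Python sketch:

```python
A = ['a', 'b', 'c', 'd']
B = [1, 2, 3, 4]
R = {('a', 1), ('b', 1), ('c', 2), ('c', 3)}

# row a, column b holds 1 iff a R b
M = [[1 if (a, b) in R else 0 for b in B] for a in A]
for row in M:
    print(row)
# [1, 0, 0, 0]
# [1, 0, 0, 0]
# [0, 1, 1, 0]
# [0, 0, 0, 0]
```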
2.3.3. Inverse Relation. Given a relation R from A to B, the
inverse of R, denoted R⁻¹, is the relation from B to A defined as
b R⁻¹ a ⇔ a R b .
For instance, if R is the relation “being a son or daughter of”, then
R⁻¹ is the relation “being a parent of”.
2.3.4. Composition of Relations. Let A, B and C be three sets.
Given a relation R from A to B and a relation S from B to C, then
the composition S ◦ R of relations R and S is a relation from A to C
deﬁned by:
a (S ◦ R) c ⇔ there exists some b ∈ B such that a Rb and b S c .
For instance, if R is the relation “to be the father of”, and S is the
relation “to be married to”, then S ◦ R is the relation “to be the
father-in-law of”.
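A Python sketch of relation composition on small finite sets (the example relations are ours):

```python
def compose_relations(R, S):
    # S o R = {(a, c) : there is b with (a, b) in R and (b, c) in S}
    return {(a, c) for a, b1 in R for b2, c in S if b1 == b2}

R = {(1, 'x'), (2, 'y')}                   # relation from A to B
S = {('x', 'p'), ('x', 'q'), ('y', 'p')}   # relation from B to C
print(sorted(compose_relations(R, S)))
# [(1, 'p'), (1, 'q'), (2, 'p')]
```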
2.3.5. Properties of Binary Relations. A binary relation R on
A is called:
1. Reﬂexive if for all x ∈ A, x Rx. For instance on Z the relation
“equal to” (=) is reﬂexive.
2. Transitive if for all x, y, z ∈ A, x Ry and y Rz implies x Rz.
For instance equality (=) and inequality (<) on Z are transitive
relations.
3. Symmetric if for all x, y ∈ A, x Ry ⇒ y Rx. For instance on Z,
equality (=) is symmetric, but strict inequality (<) is not.
4. Antisymmetric if for all x, y ∈ A, x Ry and y Rx implies x = y.
For instance, nonstrict inequality (≤) on Z is antisymmetric.
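For a finite set these four properties can be tested exhaustively; a Python sketch (helper names are ours):

```python
from itertools import product

def is_reflexive(R, A):
    return all((x, x) in R for x in A)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    return all((x, z) in R
               for (x, y) in R for (y2, z) in R if y == y2)

def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

A = range(1, 5)
le = {(x, y) for x, y in product(A, A) if x <= y}   # nonstrict inequality
print(is_reflexive(le, A), is_transitive(le),
      is_symmetric(le), is_antisymmetric(le))
# True True False True
```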
2.3.6. Partial Orders. A partial order, or simply, an order on a
set A is a binary relation “≼” on A with the following properties:
1. Reflexive: for all x ∈ A, x ≼ x.
2. Antisymmetric: (x ≼ y) ∧ (y ≼ x) ⇒ x = y.
3. Transitive: (x ≼ y) ∧ (y ≼ z) ⇒ x ≼ z.
Examples:
1. The nonstrict inequality (≤) in Z.
2. Relation of divisibility on Z⁺: a|b ⇔ ∃t, b = at.
3. Set inclusion (⊆) on P(A) (the collection of subsets of a given
set A).
Exercise: prove that the aforementioned relations are in fact partial
orders. As an example we prove that integer divisibility is a partial
order:
1. Reflexive: a = a · 1 ⇒ a|a.
2. Antisymmetric: a|b ⇒ b = at for some t, and b|a ⇒ a = bt′ for
some t′. Hence a = att′, which implies tt′ = 1 ⇒ t′ = t⁻¹. The
only invertible positive integer is 1, so t = t′ = 1 ⇒ a = b.
3. Transitive: a|b and b|c implies b = at for some t and c = bt′ for
some t′, hence c = att′, i.e., a|c.
Question: is the strict inequality (<) a partial order on Z?
Two elements a, b ∈ A are said to be comparable if either a ≼ b
or b ≼ a; otherwise they are said to be non-comparable. The order
is called total or linear when every pair of elements x, y ∈ A are
comparable. For instance (Z, ≤) is totally ordered, but (Z⁺, |), where
“|” represents integer divisibility, is not. A totally ordered subset of a
partially ordered set is called a chain; for instance the set
{1, 2, 4, 8, 16, . . . } is a chain in (Z⁺, |).
2.3.7. Hasse diagrams. A Hasse diagram is a graphical
representation of a partially ordered set in which each element is represented
by a dot (node or vertex of the diagram). Its immediate successors are
placed above the node and connected to it by straight line segments. As
an example, ﬁgure 2.16 represents the Hasse diagram for the relation
of divisibility on ¦1, 2, 3, 4, 5, 6, 7, 8, 9¦.
Question: How does the Hasse diagram look for a totally ordered
set?
2.3.8. Equivalence Relations. An equivalence relation on a set
A is a binary relation “∼” on A with the following properties:
1. Reﬂexive: for all x ∈ A, x ∼ x.
2. Symmetric: x ∼ y ⇒ y ∼ x.
3. Transitive: (x ∼ y) ∧ (y ∼ z) ⇒ x ∼ z.
Figure 2.16. Hasse diagram for divisibility.
For instance, on Z, the equality (=) is an equivalence relation.
Another example, also on Z, is the following: x ≡ y (mod 2) (“x is
congruent to y modulo 2”) iff x − y is even. For instance, 6 ≡ 2 (mod 2)
because 6 − 2 = 4 is even, but 7 ≢ 4 (mod 2), because 7 − 4 = 3 is not
even. Congruence modulo 2 is in fact an equivalence relation:
1. Reﬂexive: for every integer x, x−x = 0 is indeed even, so x ≡ x
(mod 2).
2. Symmetric: if x ≡ y (mod 2) then x − y = t is even, but
y −x = −t is also even, hence y ≡ x (mod 2).
3. Transitive: assume x ≡ y (mod 2) and y ≡ z (mod 2). Then
x −y = t and y −z = u are even. From here, x −z = (x −y) +
(y −z) = t +u is also even, hence x ≡ z (mod 2).
2.3.9. Equivalence Classes, Quotient Set, Partitions. Given
an equivalence relation ∼ on a set A, and an element x ∈ A, the
set of elements of A related to x is called the equivalence class of
x, represented [x] = {y ∈ A | y ∼ x}. Element x is said to be a
representative of the class [x]. The collection of equivalence classes,
represented A/∼ = {[x] | x ∈ A}, is called the quotient set of A by ∼.
Exercise: Find the equivalence classes on Z with the relation of
congruence modulo 2.
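A Python sketch that groups a finite slice of Z into its classes under congruence modulo 2 (the helper is ours; Z itself is infinite, so we work on a finite piece):

```python
def equivalence_classes(A, related):
    # group the elements of A into classes of the relation `related`
    classes = []
    for x in A:
        for cls in classes:
            if related(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes

# congruence modulo 2 on a finite slice of Z
cong2 = lambda x, y: (x - y) % 2 == 0
print(equivalence_classes(range(-3, 4), cong2))
# [[-3, -1, 1, 3], [-2, 0, 2]]
```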
One of the main properties of an equivalence relation on a set A
is that the quotient set, i.e. the collection of equivalence classes, is
a partition of A. Recall that a partition of a set A is a collection of
nonempty subsets A₁, A₂, A₃, . . . of A which are pairwise disjoint and
whose union equals A:
1. Aᵢ ∩ Aⱼ = ∅ for i ≠ j,
2. ⋃ₙ Aₙ = A.
Example: in Z with the relation of congruence modulo 2 (call it
“∼₂”), there are two equivalence classes: the set E of even integers and
the set O of odd integers. The quotient set of Z by the relation “∼₂”
of congruence modulo 2 is Z/∼₂ = {E, O}. We see that it is in fact a
partition of Z, because E ∩ O = ∅, and Z = E ∪ O.
Exercise: Let m be an integer greater than or equal to 2. On Z
we define the relation x ≡ y (mod m) ⇔ m|(y − x) (i.e., m divides
y − x exactly). Prove that it is an equivalence relation. What are the
equivalence classes? How many are there?
Exercise: On the Cartesian product Z × Z∗ we define the relation
(a, b) R (c, d) ⇔ ad = bc. Prove that R is an equivalence relation.
Would it still be an equivalence relation if we extend it to Z × Z?
CHAPTER 3
Algorithms, Integers
3.1. Algorithms
Consider the following list of instructions to ﬁnd the maximum of
three numbers a, b, c:
1. Assign variable x the value of a.
2. If b > x then assign x the value of b.
3. If c > x then assign x the value of c.
4. Output the value of x.
After executing those steps the output will be the maximum of a, b, c.
In general an algorithm is a finite list of instructions with the
following characteristics:
1. Precision. The steps are precisely stated.
2. Uniqueness. The result of executing each step is uniquely
determined by the inputs and the result of preceding steps.
3. Finiteness. The algorithm stops after ﬁnitely many instructions
have been executed.
4. Input. The algorithm receives input.
5. Output. The algorithm produces output.
6. Generality. The algorithm applies to a set of inputs.
Basically an algorithm is the idea behind a program. Conversely,
programs are implementations of algorithms.
3.1.1. Pseudocode. Pseudocode is a language similar to a
programming language used to represent algorithms. The main difference
with respect to actual programming languages is that pseudocode is not
required to follow strict syntactic rules, since it is intended to be
read by humans, not actually executed by a machine.
3.1. ALGORITHMS 39
Usually pseudocode will look like this:
procedure ProcedureName(Input)
Instructions...
end ProcedureName
For instance the following is an algorithm to ﬁnd the maximum of
three numbers a, b, c:
procedure max(a,b,c)
  x := a
  if b>x then
    x := b
  if c>x then
    x := c
  return x
end max
Next we show a few common operations in pseudocode.
The following statement means “assign variable x the value of
variable y”:
x := y
The following code executes “action” if condition “p” is true:
if p then
action
The following code executes “action1” if condition “p” is true,
otherwise it executes “action2”:
if p then
action1
else
action2
The following code executes “action” while condition “p” is true:
while p
  action
The following is the structure of a for loop:
for var := init to limit
  action
If an action contains more than one statement then we must enclose
them in a block:
begin
Instruction1
Instruction2
Instruction3
...
end
Comments are enclosed in braces:
{This is a comment}
The output of a procedure is returned with a return statement:
return output
Procedures that do not return anything are invoked with a call
statement:
call Procedure(arguments...)
As an example, the following procedure returns the largest number
in a sequence s₁, s₂, . . . , sₙ represented as an array with n elements:
s[1], s[2], . . . , s[n]:
procedure largest element(s,n)
  largest := s[1]
  for k := 2 to n
    if s[k] > largest then
      largest := s[k]
  return largest
end largest element
3.1.2. Recursiveness.
Recursive Deﬁnitions. A deﬁnition such that the object deﬁned oc
curs in the deﬁnition is called a recursive deﬁnition. For instance,
consider the Fibonacci sequence
0, 1, 1, 2, 3, 5, 8, 13, . . .
It can be defined as a sequence whose first two terms are F₀ = 0,
F₁ = 1 and each subsequent term is the sum of the two previous ones:
Fₙ = Fₙ₋₁ + Fₙ₋₂ (for n ≥ 2).
Other examples:
• Factorial:
1. 0! = 1
2. n! = n · (n − 1)!  (n ≥ 1)
• Power:
1. a⁰ = 1
2. aⁿ = aⁿ⁻¹ · a  (n ≥ 1)
In all these examples we have:
1. A Basis, where the function is explicitly evaluated for one or
more values of its argument.
2. A Recursive Step, stating how to compute the function from its
previous values.
Recursive Procedures. A recursive procedure is a procedure that
invokes itself. Also a set of procedures is called recursive if they invoke
themselves in a circle, e.g., procedure p₁ invokes procedure p₂,
procedure p₂ invokes procedure p₃ and procedure p₃ invokes procedure p₁.
A recursive algorithm is an algorithm that contains recursive
procedures or recursive sets of procedures. Recursive algorithms have the
advantage that often they are easy to design and are closer to natural
mathematical definitions.
As an example we show two alternative algorithms for computing
the factorial of a natural number, the first one iterative (non-recursive),
the second one recursive.
procedure factorial iterative(n)
  fact := 1
  for k := 2 to n
    fact := k * fact
  return fact
end factorial iterative
procedure factorial recursive(n)
  if n = 0 then
    return 1
  else
    return n * factorial recursive(n-1)
end factorial recursive
While the iterative version computes n! = 1 · 2 · · · n directly, the
recursive version resembles more closely the formula n! = n · (n − 1)!
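The two pseudocode versions translate directly into Python:

```python
def factorial_iterative(n):
    fact = 1
    for k in range(2, n + 1):
        fact *= k
    return fact

def factorial_recursive(n):
    if n == 0:
        return 1            # basic case, no recursive call
    return n * factorial_recursive(n - 1)

print(factorial_iterative(5), factorial_recursive(5))   # 120 120
```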
A recursive algorithm must contain at least one basic case without
a recursive call (the case n = 0 in our example), and any legitimate
input should lead to a finite sequence of recursive calls ending up at
the basic case. In our example n is a legitimate input if it is a natural
number, i.e., an integer greater than or equal to 0. If n = 0 then
factorial recursive(0) returns 1 immediately without performing
any recursive call. If n > 0 then the execution of
factorial recursive(n)
leads to a recursive call
factorial recursive(n-1)
which will perform a recursive call
factorial recursive(n-2)
and so on until eventually reaching the basic case
factorial recursive(0)
After reaching the basic case the procedure returns a value to the last
call, which returns a value to the previous call, and so on up to the
ﬁrst invocation of the procedure.
Another example is the following algorithm for computing the nth
element of the Fibonacci sequence:
procedure fibonacci(n)
  if n=0 then
    return 0
  if n=1 then
    return 1
  return fibonacci(n-1) + fibonacci(n-2)
end fibonacci
In this example we have two basic cases, namely n = 0 and n = 1.
In this particular case the algorithm is ineﬃcient in the sense that
it performs more computations than actually needed. For instance a
call to fibonacci(5) contains two recursive calls, one to fibonacci(4)
and another one to fibonacci(3). Then fibonacci(4) performs a call
to fibonacci(3) and another call to fibonacci(2), so at this point we
see that fibonacci(3) is being called twice, once inside fibonacci(5)
and again in fibonacci(4). Hence sometimes the price to pay for a
simpler algorithmic structure is a loss of eﬃciency.
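The recursive definition translates directly into Python; caching computed values (memoization, here via the standard library's `lru_cache`) removes the repeated calls just described:

```python
from functools import lru_cache

def fibonacci(n):
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

# caching results avoids recomputing fibonacci(3), fibonacci(2), ...
@lru_cache(maxsize=None)
def fibonacci_fast(n):
    return n if n < 2 else fibonacci_fast(n - 1) + fibonacci_fast(n - 2)

print([fibonacci(n) for n in range(8)])   # [0, 1, 1, 2, 3, 5, 8, 13]
print(fibonacci_fast(50))                 # 12586269025
```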
However careful design may yield efficient recursive algorithms. An
example is merge sort, an algorithm intended to sort a list of elements.
First let’s look at a simple non-recursive sorting algorithm
called bubble sort. The idea is to go several times through the list,
swapping adjacent elements if necessary. It applies to a list of numbers
sᵢ, sᵢ₊₁, . . . , sⱼ represented as an array s[i], s[i+1], . . . , s[j]:
procedure bubble sort(s,i,j)
  for p := 1 to j-i
    for q := i to j-p
      if s[q] > s[q+1] then
        swap(s[q],s[q+1])
end bubble sort
We can see that bubble sort requires n(n−1)/2 comparisons and
possible swapping operations.
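A Python sketch of bubble sort (0-indexed, sorting a whole list in place rather than a slice s[i..j]):

```python
def bubble_sort(s):
    # repeatedly sweep the list, swapping adjacent out-of-order elements
    n = len(s)
    for p in range(n - 1):
        for q in range(n - 1 - p):
            if s[q] > s[q + 1]:
                s[q], s[q + 1] = s[q + 1], s[q]

s = [5, 2, 4, 1, 3]
bubble_sort(s)
print(s)   # [1, 2, 3, 4, 5]
```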
On the other hand, the idea of merge sort is to split the list into
two approximately equal parts, sort them separately and then merge
them into a single list:
procedure merge sort(s,i,j)
  if i=j then
    return
  m := floor((i+j)/2)
  call merge sort(s,i,m)
  call merge sort(s,m+1,j)
  call merge(s,i,m,j,c)
  for k := i to j
    s[k] := c[k]
end merge sort
The procedure merge(s,i,m,j,c) merges the two increasing sequences
sᵢ, sᵢ₊₁, . . . , sₘ and sₘ₊₁, sₘ₊₂, . . . , sⱼ into a single increasing sequence
cᵢ, cᵢ₊₁, . . . , cⱼ. This algorithm is more efficient than bubble sort
because it requires only about n log₂ n operations (we will make this more
precise soon).
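A Python sketch of merge sort; unlike the pseudocode it returns a new sorted list instead of sorting a slice in place, but the split/sort/merge structure is the same:

```python
def merge_sort(s):
    if len(s) <= 1:
        return s
    m = len(s) // 2
    left = merge_sort(s[:m])    # sort the two halves separately...
    right = merge_sort(s[m:])
    # ...then merge the two increasing sequences into one
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]
```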
The strategy of dividing a task into several smaller tasks is called
divide and conquer.
3.1.3. Complexity. In general the complexity of an algorithm is
the amount of time and space (memory use) required to execute it.
Here we deal with time complexity only.
Since the actual time required to execute an algorithm depends on
the details of the program implementing the algorithm and the speed
and other characteristics of the machine executing it, it is in general
impossible to make an estimation in actual physical time, however it
is possible to measure the length of the computation in other ways,
say by the number of operations performed. For instance the following
loop performs the statement x := x + 1 exactly n times,
for i := 1 to n
  x := x + 1
The following double loop performs it n² times:
for i := 1 to n
  for j := 1 to n
    x := x + 1
The following one performs it 1 + 2 + 3 + · · · + n = n(n + 1)/2 times:
for i := 1 to n
  for j := 1 to i
    x := x + 1
Since the time that takes to execute an algorithm usually depends
on the input, its complexity must be expressed as a function of the
input, or more generally as a function of the size of the input. Since
the execution time may be diﬀerent for inputs of the same size, we
deﬁne the following kinds of times:
1. Best-case time: minimum time needed to execute the algorithm
among all inputs of a given size n.
2. Worst-case time: maximum time needed to execute the
algorithm among all inputs of a given size n.
3. Average-case time: average time needed to execute the
algorithm among all inputs of a given size n.
For instance, assume that we have a list of n objects one of which is
colored red and the others are colored blue, and we want to ﬁnd the one
that is colored red by examining the objects one by one. We measure
time by the number of objects examined. In this problem the minimum
time needed to ﬁnd the red object would be 1 (in the lucky event that
the ﬁrst object examined turned out to be the red one). The maximum
time would be n (if the red object turns out to be the last one). The
average time is the average of all possible times: 1, 2, 3, . . . , n, which is
(1 + 2 + 3 + · · · + n)/n = (n + 1)/2. So in this example the best-case time
is 1, the worst-case time is n and the average-case time is (n + 1)/2.
Often the exact time is too hard to compute or we are interested
just in how it grows compared to the size of the input. For instance
an algorithm that requires exactly 7n² + 3n + 10 steps to be executed
on an input of size n is said to be of order n², represented Θ(n²). This
justifies the following notations:
Big Oh Notation. A function f(n) is said to be of order at most
g(n), written f(n) = O(g(n)), if there is a constant C₁ such that
|f(n)| ≤ C₁ |g(n)|
for all but finitely many positive integers n.
Omega Notation. A function f(n) is said to be of order at least
g(n), written f(n) = Ω(g(n)), if there is a constant C₂ such that
|f(n)| ≥ C₂ |g(n)|
for all but finitely many positive integers n.
Theta Notation. A function f(n) is said to be of order g(n), written
f(n) = Θ(g(n)), if f(n) = O(g(n)) and f(n) = Ω(g(n)).
Remark: All logarithmic functions are of the same order: log_a n =
Θ(log_b n) for any a, b > 1, because log_a n = log_b n / log_b a, so they always
differ by a multiplicative constant. As a consequence, if the execution
time of an algorithm is of the order of a logarithmic function, we can just
say that its time is “logarithmic”; we do not need to specify the base of
the logarithm.
The following are several common growth functions:
Order          Name
Θ(1)           Constant
Θ(log log n)   Log log
Θ(log n)       Logarithmic
Θ(n log n)     n log n
Θ(n)           Linear
Θ(n²)          Quadratic
Θ(n³)          Cubic
Θ(nᵏ)          Polynomial
Θ(aⁿ)          Exponential
Let’s see now how we ﬁnd the complexity of algorithms like bubble sort
and merge sort.
Since bubble sort is just a double loop its complexity is easy to
find; the inner loop is executed
(n − 1) + (n − 2) + · · · + 1 = n(n − 1)/2
times, so it requires n(n − 1)/2 comparisons and possible swap
operations. Hence its execution time is Θ(n²).
The estimation of the complexity of merge sort is more involved.
First, the number of operations required by the merge procedure is
Θ(n). Next, if we call T(n) (the order of) the number of operations
required by merge sort working on a list of size n, we see that roughly:
T(n) = 2T(n/2) + n.
Replacing n with n/2 we have T(n/2) = 2T(n/4) + n/2, hence
T(n) = 2T(n/2) + n = 2(2T(n/4) + n/2) + n = 4T(n/4) + 2n.
Repeating k times we get:
T(n) = 2ᵏ T(n/2ᵏ) + kn.
So for k = log₂ n we have
T(n) = n T(1) + n log₂ n = Θ(n log n) .
3.2. THE EUCLIDEAN ALGORITHM 48
3.2. The Euclidean Algorithm
3.2.1. The Division Algorithm. The following result is known
as The Division Algorithm:¹ If a, b ∈ Z, b > 0, then there exist unique
q, r ∈ Z such that a = qb + r, 0 ≤ r < b. Here q is called the quotient of
the integer division of a by b, and r is called the remainder.
3.2.2. Divisibility. Given two integers a, b, b ≠ 0, we say that b
divides a, written b|a, if there is some integer q such that a = bq:
b|a ⇔ ∃q, a = bq .
We also say that b divides a, or is a divisor of a, or that a is a
multiple of b.
3.2.3. Prime Numbers. A prime number is an integer p ≥ 2
whose only positive divisors are 1 and p. Any integer n ≥ 2 that is not
prime is called composite. A nontrivial divisor of n ≥ 2 is a divisor d
of n such that 1 < d < n, so n ≥ 2 is composite iﬀ it has nontrivial
divisors. Warning: 1 is not considered either prime or composite.
Some results about prime numbers:
1. For all n ≥ 2 there is some prime p such that p[n.
2. (Euclid) There are inﬁnitely many prime numbers.
3. If p|ab then p|a or p|b. More generally, if p|a₁a₂ . . . aₙ then
p|aₖ for some k = 1, 2, . . . , n.
3.2.4. The Fundamental Theorem of Arithmetic. Every
integer n ≥ 2 can be written as a product of primes uniquely, up to the
order of the primes.
It is customary to write the factorization in the following way:
n = p₁^s₁ · p₂^s₂ · · · pₖ^sₖ ,
where all the exponents are positive and the primes are written so that
p₁ < p₂ < · · · < pₖ. For instance:
13104 = 2⁴ · 3² · 7 · 13 .
¹ The result is not really an “algorithm”, it is just a mathematical theorem.
There are, however, algorithms that allow us to compute the quotient and the
remainder in an integer division.
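The factorization in the example above can be computed by trial division; a Python sketch (fine for small numbers, not for cryptographic sizes):

```python
def factorize(n):
    # trial division; returns the prime factorization as a dict p -> exponent
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                       # what remains is prime
        factors[n] = factors.get(n, 0) + 1
    return factors

print(factorize(13104))   # {2: 4, 3: 2, 7: 1, 13: 1}
```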
3.2.5. Greatest Common Divisor. A positive integer d is called
a common divisor of the integers a and b, if d divides a and b. The
greatest possible such d is called the greatest common divisor of a and b,
denoted gcd(a, b). If gcd(a, b) = 1 then a, b are called relatively prime.
Example: The set of common positive divisors of 12 and 30 is
{1, 2, 3, 6}. The greatest common divisor of 12 and 30 is gcd(12, 30) = 6.
A few properties of divisors are the following. Let m, n, d be
integers. Then:
1. If d|m and d|n then d|(m + n).
2. If d|m and d|n then d|(m − n).
3. If d|m then d|mn.
Another important result is the following: Given integers a, b, c, the
equation
ax +by = c
has integer solutions if and only if gcd(a, b) divides c. That is an
example of a Diophantine equation. In general a Diophantine equation
is an equation whose solutions must be integers.
Example: We have gcd(12, 30) = 6, and in fact we can write 6 =
1 · 30 − 2 · 12. The solution is not unique; for instance 6 = 3 · 30 − 7 · 12.
3.2.6. Finding the gcd by Prime Factorization. We have that
gcd(a, b) = product of the primes that occur in the prime factorizations
of both a and b, raised to their lowest exponent. For instance 1440 =
2⁵ · 3² · 5 and 1512 = 2³ · 3³ · 7, hence gcd(1440, 1512) = 2³ · 3² = 72.
Factoring numbers is not always a simple task, so finding the gcd
by prime factorization might not be the most convenient way to do it,
but there are other ways.
3.2.7. The Euclidean Algorithm. Now we examine an
alternative method to compute the gcd of two given positive integers a, b.
The method provides at the same time a solution to the Diophantine
equation:
ax +by = gcd(a, b) .
It is based on the following fact: given two integers a ≥ 0 and
b > 0, and r = a mod b, then gcd(a, b) = gcd(b, r). Proof: Divide a by
b obtaining a quotient q and a remainder r, then
a = bq +r , 0 ≤ r < b .
If d is a common divisor of a and b then it must be a divisor of r = a−bq.
Conversely, if d is a common divisor of b and r then it must divide
a = bq + r. So the set of common divisors of a and b and the set of
common divisors of b and r are equal, and the greatest common divisor
will be the same.
The Euclidean algorithm is as follows. First we divide a by b,
obtaining a quotient q and a remainder r. Then we divide b by r, obtaining
a new quotient q′ and a remainder r′. Next we divide r by r′, which
gives a quotient q″ and another remainder r″. We continue dividing
each remainder by the next one until obtaining a zero remainder, at
which point we stop. The last nonzero remainder is the gcd.
Example: Assume that we wish to compute gcd(500, 222). Then we
arrange the computations in the following way:
500 = 2 · 222 + 56  → r = 56
222 = 3 · 56 + 54   → r′ = 54
56 = 1 · 54 + 2     → r″ = 2
54 = 27 · 2 + 0     → r‴ = 0
The last nonzero remainder is r″ = 2, hence gcd(500, 222) = 2.
Furthermore, if we want to express 2 as a linear combination of 500 and
222, we can do it by working backward:
2 = 56 − 1 · 54 = 56 − 1 · (222 − 3 · 56) = 4 · 56 − 1 · 222
  = 4 · (500 − 2 · 222) − 1 · 222 = 4 · 500 − 9 · 222 .
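The backward substitution can be automated; a Python sketch of the extended Euclidean algorithm, returning gcd(a, b) together with coefficients x, y such that ax + by = gcd(a, b) (this helper is ours, not part of the notes' pseudocode):

```python
def extended_gcd(a, b):
    # returns (g, x, y) with a*x + b*y = g = gcd(a, b)
    if b == 0:
        return (a, 1, 0)
    g, x, y = extended_gcd(b, a % b)
    # b*x + (a mod b)*y = g  and  a mod b = a - (a//b)*b
    return (g, y, x - (a // b) * y)

print(extended_gcd(500, 222))   # (2, 4, -9), i.e. 2 = 4*500 - 9*222
```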
The algorithm to compute the gcd can be written as follows:
procedure gcd(a,b)
  if a<b then {make a the largest}
    swap(a,b)
  while b ≠ 0
    begin
      r := a mod b
      a := b
      b := r
    end
  return a
end gcd
The next one is a recursive version of the Euclidean algorithm:
procedure gcd recurs(a,b)
  if b=0 then
    return a
  else
    return gcd recurs(b, a mod b)
end gcd recurs
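The recursive version translates directly into Python:

```python
def gcd_recurs(a, b):
    # gcd(a, b) = gcd(b, a mod b); the last nonzero remainder is the gcd
    if b == 0:
        return a
    return gcd_recurs(b, a % b)

print(gcd_recurs(500, 222))    # 2
print(gcd_recurs(1440, 1512))  # 72
```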
3.3. MODULAR ARITHMETIC, RSA ALGORITHM 52
3.3. Modular Arithmetic, RSA Algorithm
3.3.1. Congruences Modulo m. Given an integer m ≥ 2, we
say that a is congruent to b modulo m, written a ≡ b (mod m), if
m|(a − b). Note that the following conditions are equivalent:
1. a ≡ b (mod m).
2. a = b +km for some integer k.
3. a and b have the same remainder when divided by m.
The relation of congruence modulo m is an equivalence relation. It
partitions Z into m equivalence classes of the form
[x] = [x]ₘ = {x + km | k ∈ Z} .
For instance, for m = 5, each one of the following rows is an equivalence
class:
. . . −10 −5 0 5 10 15 20 . . .
. . . −9 −4 1 6 11 16 21 . . .
. . . −8 −3 2 7 12 17 22 . . .
. . . −7 −2 3 8 13 18 23 . . .
. . . −6 −1 4 9 14 19 24 . . .
Each equivalence class has exactly one representative r such that 0 ≤ r <
m, namely the common remainder of all elements in that class when
divided by m. Hence an equivalence class may be denoted [r] or r + mZ,
where 0 ≤ r < m. Often we will omit the brackets, so that the
equivalence class [r] will be represented just by r. The set of equivalence classes
(i.e., the quotient set of Z by the relation of congruence modulo m) is
denoted Zₘ = {0, 1, 2, . . . , m − 1}. For instance, Z₅ = {0, 1, 2, 3, 4}.
Remark: When writing “r” as a notation for the class of r we may
stress the fact that r represents the class of r rather than the integer r
by including “(mod m)” at some point. For instance 8 = 3 (mod 5).
Note that in “a ≡ b (mod m)”, a and b represent integers, while in
“a = b (mod m)” they represent elements of Zₘ.
Reduction Modulo m: Once a set of representatives has been chosen for the elements of Z_m, we will call “r reduced modulo m”, written “r mod m”, the chosen representative for the class of r. For instance, if we choose the representatives for the elements of Z_5 in the interval from 0 to 4 (Z_5 = {0, 1, 2, 3, 4}), then 9 mod 5 = 4. Another possibility is to choose the representatives in the interval from −2 to 2 (Z_5 = {−2, −1, 0, 1, 2}), so that 9 mod 5 = −1.
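Both choices of representatives can be sketched in Python; the helper names mod_standard and mod_balanced below are ours, not part of the text:

```python
def mod_standard(r: int, m: int) -> int:
    """Representative in {0, 1, ..., m-1} (Python's % already does this)."""
    return r % m

def mod_balanced(r: int, m: int) -> int:
    """Representative in the interval around 0, e.g. {-2, ..., 2} for m = 5."""
    s = r % m
    return s - m if s > m // 2 else s

print(mod_standard(9, 5))  # 4
print(mod_balanced(9, 5))  # -1
```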
In Z_m it is possible to define an addition and a multiplication in the following way:

[x] + [y] = [x + y] ;   [x] · [y] = [x · y] .

As an example, tables 3.3.1 and 3.3.2 show the addition and multiplication tables for Z_5 and Z_6 respectively.
+ | 0 1 2 3 4
0 | 0 1 2 3 4
1 | 1 2 3 4 0
2 | 2 3 4 0 1
3 | 3 4 0 1 2
4 | 4 0 1 2 3

· | 0 1 2 3 4
0 | 0 0 0 0 0
1 | 0 1 2 3 4
2 | 0 2 4 1 3
3 | 0 3 1 4 2
4 | 0 4 3 2 1

Table 3.3.1. Operational tables for Z_5
+ | 0 1 2 3 4 5
0 | 0 1 2 3 4 5
1 | 1 2 3 4 5 0
2 | 2 3 4 5 0 1
3 | 3 4 5 0 1 2
4 | 4 5 0 1 2 3
5 | 5 0 1 2 3 4

· | 0 1 2 3 4 5
0 | 0 0 0 0 0 0
1 | 0 1 2 3 4 5
2 | 0 2 4 0 2 4
3 | 0 3 0 3 0 3
4 | 0 4 2 0 4 2
5 | 0 5 4 3 2 1

Table 3.3.2. Operational tables for Z_6
A difference between these two tables is that in Z_5 every nonzero element has a multiplicative inverse, i.e., for every x ∈ Z_5 such that x ≠ 0 there is an x^(−1) such that x · x^(−1) = x^(−1) · x = 1; e.g. 2^(−1) = 4 (mod 5). However in Z_6 that is not true: some nonzero elements, like 2, have no multiplicative inverse. Furthermore, each element without a multiplicative inverse can be multiplied by some other nonzero element to give a product equal to zero, e.g. 2 · 3 = 0 (mod 6). These elements are called divisors of zero. Of course with this definition zero itself is a divisor of zero. Divisors of zero different from zero are called proper divisors of zero. For instance, in Z_6, 2 is a proper divisor of zero. In Z_5 there are no proper divisors of zero.
In general:
1. The elements of Z_m can be classified into two classes:
(a) Units: elements with multiplicative inverse.
(b) Divisors of zero: elements that multiplied by some other
nonzero element give product zero.
2. An element [a] ∈ Z_m is a unit (has a multiplicative inverse) if and only if gcd(a, m) = 1.
3. All nonzero elements of Z_m are units if and only if m is a prime number.
The set of units in Z_m is denoted Z*_m. For instance:
Z*_2 = {1}
Z*_3 = {1, 2}
Z*_4 = {1, 3}
Z*_5 = {1, 2, 3, 4}
Z*_6 = {1, 5}
Z*_7 = {1, 2, 3, 4, 5, 6}
Z*_8 = {1, 3, 5, 7}
Z*_9 = {1, 2, 4, 5, 7, 8}
Given an element [a] in Z*_m, its inverse can be computed by using the Euclidean algorithm to find gcd(a, m), since that algorithm also provides a solution to the equation ax + my = gcd(a, m) = 1, which is equivalent to ax ≡ 1 (mod m).
Example: Find the multiplicative inverse of 17 in Z*_64. Answer: We use the Euclidean algorithm:

64 = 3 · 17 + 13 → r = 13
17 = 1 · 13 + 4 → r = 4
13 = 3 · 4 + 1 → r = 1
4 = 4 · 1 + 0 → r = 0

Now we compute backward:

1 = 13 − 3 · 4 = 13 − 3 · (17 − 1 · 13) = 4 · 13 − 3 · 17
  = 4 · (64 − 3 · 17) − 3 · 17 = 4 · 64 − 15 · 17 .

Hence (−15) · 17 ≡ 1 (mod 64), but −15 ≡ 49 (mod 64), so the inverse of 17 in (Z*_64, ·) is 49. We will denote this by writing 17^(−1) = 49 (mod 64), or 17^(−1) mod 64 = 49.
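The back-substitution above is exactly what the extended Euclidean algorithm automates. A minimal Python sketch (the function names egcd and inverse_mod are illustrative):

```python
def egcd(a: int, b: int):
    """Extended Euclidean algorithm: returns (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    # b*x + (a mod b)*y = g, and a mod b = a - (a//b)*b
    return g, y, x - (a // b) * y

def inverse_mod(a: int, m: int) -> int:
    """Inverse of a in Z*_m, or an error if a is not a unit."""
    g, x, _ = egcd(a, m)
    if g != 1:
        raise ValueError(f"{a} is not a unit modulo {m}")
    return x % m

print(inverse_mod(17, 64))  # 49, matching the worked example
```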
3.3.2. Euler’s Phi Function. The number of units in Z_m is equal
to the number of positive integers not greater than and relatively
prime to m, i.e., the number of integers a such that 1 ≤ a ≤ m and
gcd(a, m) = 1. That number is given by the so called Euler’s phi
function:
φ(m) = number of positive integers not greater than m
and relatively prime to m.
For instance, the positive integers not greater than and relatively prime
to 15 are: 1, 2, 4, 7, 8, 11, 13, 14, hence φ(15) = 8.
We have the following results:
1. If p is a prime number and s ≥ 1, then φ(p^s) = p^s − p^(s−1) = p^s (1 − 1/p). In particular φ(p) = p − 1.
2. If m_1, m_2 are two relatively prime positive integers, then φ(m_1 · m_2) = φ(m_1) · φ(m_2).¹
3. If m = p_1^(s_1) p_2^(s_2) · · · p_k^(s_k), where the p_i are prime and the s_i are positive, then

φ(m) = m (1 − 1/p_1) (1 − 1/p_2) · · · (1 − 1/p_k) .
For instance:

φ(15) = φ(3 · 5) = φ(3) · φ(5) = (3 − 1) · (5 − 1) = 2 · 4 = 8 .
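The product formula can be turned into a short trial-division sketch (the function phi below is an illustrative implementation, not an optimized one):

```python
def phi(m: int) -> int:
    """Euler's phi via the product formula: phi(m) = m * prod(1 - 1/p) over primes p | m."""
    result, n, p = m, m, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:   # strip the prime factor p completely
                n //= p
            result -= result // p   # multiply result by (1 - 1/p) using integers
        p += 1
    if n > 1:                       # leftover prime factor
        result -= result // n
    return result

print(phi(15))  # 8
```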
3.3.3. Euler’s Theorem. If a and m are two relatively prime positive integers, m ≥ 2, then

a^φ(m) ≡ 1 (mod m) .

In the particular case in which m is a prime number p, Euler’s theorem is called Fermat’s Little Theorem:

a^(p−1) ≡ 1 (mod p) .

For instance, if a = 2 and p = 7, then we have, in fact, 2^(7−1) = 2^6 = 64 = 1 + 9 · 7 ≡ 1 (mod 7).
A consequence of Euler’s Theorem is the following. If gcd(a, m) = 1 then

x ≡ y (mod φ(m)) ⇒ a^x ≡ a^y (mod m) .

¹ A function f(x) of positive integers such that gcd(a, b) = 1 ⇒ f(ab) = f(a) f(b) is called multiplicative.
Consequently, the following function is well defined:

Z*_m × Z_φ(m) → Z*_m
([a]_m, [x]_φ(m)) → [a^x]_m

Hence, we can compute powers modulo m in the following way:

a^n = a^(n mod φ(m)) (mod m) ,
if gcd(a, m) = 1. For instance:
3^9734888 mod 100 = 3^(9734888 mod φ(100)) mod 100
= 3^(9734888 mod 40) mod 100 = 3^8 mod 100 = 6561 mod 100 = 61 .
An even more eﬃcient way to compute powers modulo m is given
in Appendix A, paragraph A.1.
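The computation can be checked with Python's built-in pow, which performs fast modular exponentiation; reducing the exponent modulo φ(100) = 40 first gives the same result:

```python
# phi(100) = 40, so for gcd(a, 100) = 1 the exponent can be reduced mod 40
e = 9734888 % 40              # = 8
print(pow(3, e, 100))          # 61
print(pow(3, 9734888, 100))    # same result, computed directly by fast modular exponentiation
```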
3.3.4. Application to Cryptography: RSA Algorithm. The
RSA algorithm is an encryption scheme designed in 1977 by Ronald
Rivest, Adi Shamir and Leonard Adleman. It allows encrypting a message with a key (the encryption key) and decrypting it with a different
key (the decryption key). The encryption key is public and can be
given to everybody. The decryption key is private and is known only
by the recipient of the encrypted message.
The RSA algorithm is based on the following facts. Given two
prime numbers p and q, and a positive number m relatively prime to p
and q, Euler’s theorem tells us that:
m^φ(pq) = m^((p−1)(q−1)) = 1 (mod pq) .
Assume now that we have two integers e and d such that e · d = 1 (mod φ(pq)). Then we have that

(m^e)^d = m^(e·d) = m (mod pq) .

So, given m^e we can recover m modulo pq by raising to the dth power.
The RSA algorithm consists of the following:
1. Generate two large primes p and q. Find their product n = pq.
2. Find two numbers e and d (in the range from 2 to φ(n)) such
that e · d = 1 (mod φ(n)). This requires some trial and error. First e is chosen at random, and the Euclidean algorithm is used to find gcd(e, φ(n)), solving at the same time the equation ex + φ(n) y = gcd(e, φ(n)). If gcd(e, φ(n)) = 1 then the value obtained
for x is d. Otherwise, e is not relatively prime to φ(n) and we must try a different value for e.
3. The public encryption key will be the pair (n, e). The private
decryption key will be the pair (n, d). The encryption key is
given to everybody, while the decryption key is kept secret by
the future recipient of the message.
4. The message to be encrypted is divided into small pieces, and
each piece is encoded numerically as a positive integer m smaller
than n.
5. The number m^e is reduced modulo n: m′ = m^e mod n.
6. The recipient computes m′′ = (m′)^d mod n, with 0 ≤ m′′ < n.
It remains to prove that m′′ = m. If m is relatively prime to p and q, then from Euler’s theorem we get that m′′ ≡ m (mod n), and since both are in the range from 0 to n − 1 they must be equal. The case in which p or q divides m is left as an exercise.
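The whole scheme can be sketched at toy scale. The primes 61 and 53, the exponent 17 and the message 65 below are illustrative choices only (real RSA uses primes hundreds of digits long and random e); pow(e, -1, m) for modular inverses needs Python 3.8+:

```python
# Toy RSA sketch with illustrative small primes -- NOT secure, for demonstration only.
p, q = 61, 53
n = p * q                     # 3233, the public modulus
phi_n = (p - 1) * (q - 1)     # phi(pq) = (p-1)(q-1) = 3120

e = 17                        # public exponent; gcd(17, 3120) = 1
d = pow(e, -1, phi_n)         # private exponent: e*d = 1 (mod phi(n))

m = 65                        # message, encoded as an integer smaller than n
c = pow(m, e, n)              # encryption: c = m^e mod n
m2 = pow(c, d, n)             # decryption: m = c^d mod n
print(m2)  # 65
```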
3.3.5. The Chinese Remainder Theorem. Let m_1, m_2, . . . , m_k be pairwise relatively prime integers greater than or equal to 2. The following system of congruences

x ≡ r_1 (mod m_1)
x ≡ r_2 (mod m_2)
. . .
x ≡ r_k (mod m_k)

has a unique solution modulo M = m_1 · m_2 · · · m_k.
We can find a solution to that system in the following way. Let M_i = M/m_i, and s_i = the inverse of M_i in Z_{m_i}. Then

x = M_1 s_1 r_1 + M_2 s_2 r_2 + · · · + M_k s_k r_k

is a solution to the system.
Example: A group of objects can be arranged in 3 rows leaving 2
left, in 5 rows leaving 4 left, and in 7 rows leaving 6 left. How many
objects are there? Answer: We must solve the following system of
congruences:
x ≡ 2 (mod 3)
x ≡ 4 (mod 5)
x ≡ 6 (mod 7)
We have: M = 3 · 5 · 7 = 105, M_1 = 105/3 = 35 ≡ 2 (mod 3), M_2 = 105/5 = 21 ≡ 1 (mod 5), M_3 = 105/7 = 15 ≡ 1 (mod 7); s_1 = “inverse of 2 in Z_3” = 2, s_2 = “inverse of 1 in Z_5” = 1, s_3 = “inverse of 1 in Z_7” = 1. Hence the solution is

x = 35 · 2 · 2 + 21 · 1 · 4 + 15 · 1 · 6 = 314 ≡ 104 (mod 105) .

Hence, any group of 104 + 105·k objects is a possible solution to the problem.
CHAPTER 4
Induction, Recurrences
4.1. Sequences and Strings
4.1.1. Sequences. A sequence is an (usually inﬁnite) ordered list
of elements. Examples:
1. The sequence of positive integers:
1, 2, 3, 4, . . . , n, . . .
2. The sequence of positive even integers:
2, 4, 6, 8, . . . , 2n, . . .
3. The sequence of powers of 2:
1, 2, 4, 8, 16, . . . , 2^n, . . .
4. The sequence of Fibonacci numbers (each one is the sum of the
two previous ones):
0, 1, 1, 2, 3, 5, 8, 13, . . .
5. The reciprocals of the positive integers:
1, 1/2, 1/3, 1/4, . . . , 1/n, . . .
In general the elements of a sequence are represented with an indexed letter, say s_1, s_2, s_3, . . . , s_n, . . . . The sequence itself can be defined by giving a rule, e.g.: s_n = 2n + 1 is the sequence:

3, 5, 7, 9, . . .

Here we are assuming that the first element is s_1, but we can start at any value of the index that we want; for instance, if we declare s_0 to be the first term, the previous sequence would become:

1, 3, 5, 7, 9, . . .

The sequence is symbolically represented {s_n} or {s_n}_{n=1}^∞.
If s_n ≤ s_{n+1} for every n the sequence is called increasing. If s_n ≥ s_{n+1} then it is called decreasing. For instance s_n = 2n + 1 is increasing: 3, 5, 7, 9, . . . , while s_n = 1/n is decreasing: 1, 1/2, 1/3, 1/4, . . . .
If we remove elements from a sequence we obtain a subsequence.
E.g., if we remove all odd numbers from the sequence of positive integers:
1, 2, 3, 4, 5 . . . ,
we get the subsequence consisting of the even positive integers:
2, 4, 6, 8, . . .
4.1.2. Sum (Sigma) and Product Notation. In order to abbreviate sums and products the following notations are used:

1. Sum (or sigma) notation:

∑_{i=m}^{n} a_i = a_m + a_{m+1} + a_{m+2} + · · · + a_n

2. Product notation:

∏_{i=m}^{n} a_i = a_m · a_{m+1} · a_{m+2} · · · a_n

For instance, assume a_n = 2n + 1, then

∑_{n=3}^{6} a_n = a_3 + a_4 + a_5 + a_6 = 7 + 9 + 11 + 13 = 40 .

∏_{n=3}^{6} a_n = a_3 · a_4 · a_5 · a_6 = 7 · 9 · 11 · 13 = 9009 .
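Both computations can be checked in a few lines (math.prod needs Python 3.8+):

```python
import math

def a(n: int) -> int:
    return 2 * n + 1

# sigma and product notation over n = 3, ..., 6
print(sum(a(n) for n in range(3, 7)))        # 40
print(math.prod(a(n) for n in range(3, 7)))  # 9009
```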
4.1.3. Strings. Given a set X, a string over X is a ﬁnite ordered
list of elements of X.
Example: If X is the set X = {a, b, c}, then the following are examples of strings over X: aba, aaaa, bba, etc.

Repetitions can be specified with superscripts, for instance: a^2 b^3 a c^2 a^3 = aabbbaccaaa, (ab)^3 = ababab, etc.

The length of a string is its number of elements, e.g., |abaccbab| = 8, |a^2 b^7 a^3 c^6| = 18.
The string with no elements is called the null string, represented λ. Its length is, of course, zero: |λ| = 0.

The set of all strings over X is represented X^∗. The set of nonnull strings over X (i.e., all strings over X except the null string) is represented X^+.
Given two strings α and β over X, the string consisting of α followed
by β is called the concatenation of α and β. For instance if α = abac
and β = baaab then αβ = abacbaaab.
4.2. Mathematical Induction
Many properties of positive integers can be proved by mathematical
induction.
4.2.1. Principle of Mathematical Induction. Let P be a property of positive integers such that:
1. Basis Step: P(1) is true, and
2. Inductive Step: if P(n) is true, then P(n + 1) is true.
Then P(n) is true for all positive integers.
Remark: The premise P(n) in the inductive step is called the Induction Hypothesis.
The validity of the Principle of Mathematical Induction is obvious.
The basis step states that P(1) is true. Then the inductive step implies
that P(2) is also true. By the inductive step again we see that P(3)
is true, and so on. Consequently the property is true for all positive
integers.
Remark: In the basis step we may replace 1 with some other integer
m. Then the conclusion is that the property is true for every integer n
greater than or equal to m.
Example: Prove that the sum of the first n odd positive integers is n^2, i.e., 1 + 3 + 5 + · · · + (2n − 1) = n^2.

Answer: Let S(n) = 1 + 3 + 5 + · · · + (2n − 1). We want to prove by induction that for every positive integer n, S(n) = n^2.

1. Basis Step: If n = 1 we have S(1) = 1 = 1^2, so the property is true for 1.
2. Inductive Step: Assume (Induction Hypothesis) that the property is true for some positive integer n, i.e.: S(n) = n^2. We must prove that it is also true for n + 1, i.e., S(n + 1) = (n + 1)^2. In fact:

S(n + 1) = 1 + 3 + 5 + · · · + (2n + 1) = S(n) + 2n + 1 .
But by induction hypothesis, S(n) = n^2, hence:

S(n + 1) = n^2 + 2n + 1 = (n + 1)^2 .
This completes the induction, and shows that the property is true for
all positive integers.
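Induction proves the identity for all n at once; still, a quick numeric check for small n is reassuring:

```python
# check S(n) = 1 + 3 + ... + (2n - 1) = n^2 for n = 1, ..., 100
for n in range(1, 101):
    S = sum(2 * k - 1 for k in range(1, n + 1))
    assert S == n ** 2
print("S(n) = n^2 verified for n = 1..100")
```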
Example: Prove that 2n + 1 ≤ 2^n for n ≥ 3.

Answer: This is an example in which the property is not true for all positive integers but only for integers greater than or equal to 3.

1. Basis Step: If n = 3 we have 2n + 1 = 2 · 3 + 1 = 7 and 2^n = 2^3 = 8, so the property is true in this case.
2. Inductive Step: Assume (Induction Hypothesis) that the property is true for some integer n ≥ 3, i.e.: 2n + 1 ≤ 2^n. We must prove that it is also true for n + 1, i.e., 2(n + 1) + 1 ≤ 2^(n+1). By the induction hypothesis we know that 2n ≤ 2^n, and we also have that 3 ≤ 2^n if n ≥ 3, hence

2(n + 1) + 1 = 2n + 3 ≤ 2^n + 2^n = 2^(n+1) .
This completes the induction, and shows that the property is true for
all n ≥ 3.
Exercise: Prove the following identities by induction:
• 1 + 2 + 3 + · · · + n = n(n + 1)/2 .
• 1^2 + 2^2 + 3^2 + · · · + n^2 = n(n + 1)(2n + 1)/6 .
• 1^3 + 2^3 + 3^3 + · · · + n^3 = (1 + 2 + 3 + · · · + n)^2 .
4.2.2. Strong Form of Mathematical Induction. Let P be a
property of positive integers such that:
1. Basis Step: P(1) is true, and
2. Inductive Step: if P(k) is true for all 1 ≤ k ≤ n then P(n + 1)
is true.
Then P(n) is true for all positive integers.
Example: Prove that every integer n ≥ 2 is prime or a product of
primes. Answer:
1. Basis Step: 2 is a prime number, so the property holds for
n = 2.
2. Inductive Step: Assume that if 2 ≤ k ≤ n, then k is a prime number or a product of primes. Now, either n + 1 is a prime number or it is not. If it is a prime number then it satisfies the property. If it is not a prime number, then it can be written as the product of two positive integers, n + 1 = k_1 k_2, such that 1 < k_1, k_2 < n + 1. By the induction hypothesis each of k_1 and k_2 must be a prime or a product of primes, hence n + 1 is a product of primes.
This completes the proof.
4.2.3. The Well-Ordering Principle. Every nonempty set of positive integers has a smallest element.

Example: Prove that √2 is irrational (i.e., √2 cannot be written as a quotient of two positive integers) using the well-ordering principle.

Answer: Assume that √2 is rational, i.e., √2 = a/b, where a and b are positive integers. Note that since √2 > 1 we have a > b. Now 2 = a^2/b^2, hence 2 b^2 = a^2. Since the left hand side is even, a^2 is even, but this implies that a itself is even, so a = 2 a′. Hence: 2 b^2 = 4 a′^2, and simplifying: b^2 = 2 a′^2. From here we see that √2 = b/a′. Hence starting with a fractional representation √2 = a/b we end up with another fractional representation √2 = b/a′ with a smaller numerator b < a. Repeating the same argument with the fraction b/a′ we get another fraction with an even smaller numerator, and so on. So the set of possible numerators of a fraction representing √2 cannot have a smallest element, contradicting the well-ordering principle. Consequently, our assumption that √2 is rational has to be false.
4.3. Recurrence Relations
Here we look at recursive definitions from a different point of view. Rather than definitions, they will be considered as equations that we must solve. The point is that a recursive definition is actually a definition when there is one and only one object satisfying it, i.e., when the equations involved in that definition have a unique solution. Also, the solution to those equations may provide a closed-form (explicit) formula for the object defined.
The recursive step in a recursive definition is also called a recurrence relation. We will focus on kth-order linear recurrence relations, which are of the form

C_0 x_n + C_1 x_{n−1} + C_2 x_{n−2} + · · · + C_k x_{n−k} = b_n ,

where C_0 ≠ 0. If b_n = 0 the recurrence relation is called homogeneous.
Otherwise it is called nonhomogeneous.
The basis of the recursive definition is also called the initial conditions of the recurrence. So, for instance, in the recursive definition of the Fibonacci sequence, the recurrence is

F_n = F_{n−1} + F_{n−2}

or

F_n − F_{n−1} − F_{n−2} = 0 ,

and the initial conditions are

F_0 = 0, F_1 = 1 .
One way to solve some recurrence relations is by iteration, i.e., by using the recurrence repeatedly until obtaining an explicit closed-form formula. For instance, consider the following recurrence relation:

x_n = r x_{n−1} (n > 0) ; x_0 = A .

By using the recurrence repeatedly we get:

x_n = r x_{n−1} = r^2 x_{n−2} = r^3 x_{n−3} = · · · = r^n x_0 = A r^n ,

hence the solution is x_n = A r^n .
In the following we assume that the coefficients C_0, C_1, . . . , C_k are constant.
4.3.1. First Order Recurrence Relations. The homogeneous case can be written in the following way:

x_n = r x_{n−1} (n > 0) ; x_0 = A .

Its general solution is

x_n = A r^n ,

which is a geometric sequence with ratio r.

The nonhomogeneous case can be written in the following way:

x_n = r x_{n−1} + c_n (n > 0) ; x_0 = A .

Using the summation notation, its solution can be expressed like this:

x_n = A r^n + ∑_{k=1}^{n} c_k r^{n−k} .

We examine two particular cases. The first one is

x_n = r x_{n−1} + c (n > 0) ; x_0 = A ,

where c is a constant. The solution is

x_n = A r^n + c ∑_{k=1}^{n} r^{n−k} = A r^n + c (r^n − 1)/(r − 1)   if r ≠ 1 ,

and

x_n = A + c n   if r = 1 .
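The closed form can be checked against direct iteration; the values r = 3, c = 5, A = 2, n = 6 below are arbitrary test data:

```python
def iterate(r: int, c: int, A: int, n: int) -> int:
    """Apply x_k = r*x_{k-1} + c starting from x_0 = A, n times."""
    x = A
    for _ in range(n):
        x = r * x + c
    return x

# closed form for r != 1: x_n = A r^n + c (r^n - 1)/(r - 1)
r, c, A, n = 3, 5, 2, 6
closed = A * r**n + c * (r**n - 1) // (r - 1)
print(iterate(r, c, A, n), closed)  # both 3278
```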
Example: Assume that a country with currently 100 million people
has a population growth rate (birth rate minus death rate) of 1% per
year, and it also receives 100 thousand immigrants per year (which
are quickly assimilated and reproduce at the same rate as the native
population). Find its population 10 years from now. (Assume that
all the immigrants arrive in a single batch at the end of the year.)
Answer: If we call x_n = population in year n from now, we have:

x_n = 1.01 x_{n−1} + 100,000 (n > 0); x_0 = 100,000,000 .

This is the equation above with r = 1.01, c = 100,000 and A = 100,000,000, hence:

x_n = 100,000,000 · 1.01^n + 100,000 · (1.01^n − 1)/(1.01 − 1)
    = 100,000,000 · 1.01^n + 10,000,000 · (1.01^n − 1) .

So:

x_10 ≈ 111,508,434 .
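Both the recurrence and the closed form can be evaluated numerically to confirm the figure:

```python
# iterate x_n = 1.01*x_{n-1} + 100,000 for ten years
x = 100_000_000
for _ in range(10):
    x = 1.01 * x + 100_000

# closed form: A*r^n + c*(r^n - 1)/(r - 1) with r = 1.01, c = 100,000
closed = 100_000_000 * 1.01**10 + 10_000_000 * (1.01**10 - 1)
print(round(x), round(closed))  # both 111508434
```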
The second particular case is for r = 1 and c_n = c + d·n, where c and d are constant (so c_n is an arithmetic sequence):

x_n = x_{n−1} + c + d·n (n > 0) ; x_0 = A .

The solution is now

x_n = A + ∑_{k=1}^{n} (c + d·k) = A + c·n + d·n(n + 1)/2 .
4.3.2. Second Order Recurrence Relations. Now we look at the recurrence relation

C_0 x_n + C_1 x_{n−1} + C_2 x_{n−2} = 0 .

First we will look for solutions of the form x_n = c r^n. By plugging into the equation we get:

C_0 c r^n + C_1 c r^{n−1} + C_2 c r^{n−2} = 0 ,

hence r must be a solution of the following equation, called the characteristic equation of the recurrence:

C_0 r^2 + C_1 r + C_2 = 0 .

Let r_1, r_2 be the two (in general complex) roots of the above equation.
They are called characteristic roots. We distinguish three cases:
1. Distinct Real Roots. In this case the general solution of the recurrence relation is

x_n = c_1 r_1^n + c_2 r_2^n ,

where c_1, c_2 are arbitrary constants.

2. Double Real Root. If r_1 = r_2 = r, the general solution of the recurrence relation is

x_n = c_1 r^n + c_2 n r^n ,

where c_1, c_2 are arbitrary constants.
3. Complex Roots. In this case the solution could be expressed
in the same way as in the case of distinct real roots, but in
order to avoid the use of complex numbers we write r_1 = r e^(αi), r_2 = r e^(−αi), k_1 = c_1 + c_2, k_2 = (c_1 − c_2) i, which yields:¹

x_n = k_1 r^n cos nα + k_2 r^n sin nα .
Example: Find a closed-form formula for the Fibonacci sequence defined by:

F_{n+1} = F_n + F_{n−1} (n > 0) ; F_0 = 0, F_1 = 1 .

Answer: The recurrence relation can be written

F_n − F_{n−1} − F_{n−2} = 0 .

The characteristic equation is

r^2 − r − 1 = 0 .

Its roots are:²

r_1 = φ = (1 + √5)/2 ; r_2 = −φ^(−1) = (1 − √5)/2 .

They are distinct real roots, so the general solution for the recurrence is:

F_n = c_1 φ^n + c_2 (−φ^(−1))^n .

Using the initial conditions we get the value of the constants:

(n = 0)  c_1 + c_2 = 0
(n = 1)  c_1 φ + c_2 (−φ^(−1)) = 1
⇒ c_1 = 1/√5 , c_2 = −1/√5

Hence:

F_n = (1/√5) · (φ^n − (−φ)^(−n)) .
¹ Reminder: e^(αi) = cos α + i sin α.
² φ = (1 + √5)/2 is the Golden Ratio.
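The closed form (Binet's formula) can be checked against the iterative definition; rounding compensates for floating-point error at small n:

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2   # the golden ratio

def fib_binet(n: int) -> int:
    """Closed form: F_n = (phi^n - (-phi)^(-n)) / sqrt(5), rounded to the nearest integer."""
    return round((PHI**n - (-PHI)**(-n)) / sqrt(5))

def fib_iter(n: int) -> int:
    """Iterative definition: F_0 = 0, F_1 = 1, F_n = F_{n-1} + F_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert all(fib_binet(n) == fib_iter(n) for n in range(30))
print([fib_binet(n) for n in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```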
CHAPTER 5
Counting
5.1. Basic Principles
5.1.1. The Rule of Sum. If a task can be performed in m ways,
while another task can be performed in n ways, and the two tasks
cannot be performed simultaneously, then performing either task can
be accomplished in m+n ways.
Set theoretical version of the rule of sum: If A and B are disjoint
sets (A ∩ B = ∅) then

|A ∪ B| = |A| + |B| .
More generally, if the sets A_1, A_2, . . . , A_n are pairwise disjoint, then:

|A_1 ∪ A_2 ∪ · · · ∪ A_n| = |A_1| + |A_2| + · · · + |A_n| .
For instance, if a class has 30 male students and 25 female students,
then the class has 30 + 25 = 45 students.
5.1.2. The Rule of Product. If a task can be performed in m
ways and another independent task can be performed in n ways, then
the combination of both tasks can be performed in mn ways.
Set theoretical version of the rule of product: Let A × B be the Cartesian product of sets A and B. Then:

|A × B| = |A| · |B| .
More generally:

|A_1 × A_2 × · · · × A_n| = |A_1| · |A_2| · · · |A_n| .
For instance, assume that a license plate contains two letters followed by three digits. How many different license plates can be printed? Answer: each letter can be printed in 26 ways, and each digit can be printed in 10 ways, so 26 · 26 · 10 · 10 · 10 = 676,000 different plates can be printed.
Exercise: Given a set A with m elements and a set B with n elements, find the number of functions from A to B.
5.1.3. The Inclusion-Exclusion Principle. The inclusion-exclusion principle generalizes the rule of sum to non-disjoint sets.

In general, for arbitrary (but finite) sets A, B:

|A ∪ B| = |A| + |B| − |A ∩ B| .
Example: Assume that in a university with 1000 students, 200 students are taking a course in mathematics, 300 are taking a course in physics, and 50 students are taking both. How many students are taking at least one of those courses?

Answer: If U = total set of students in the university, M = set of students taking Mathematics, P = set of students taking Physics, then:

|M ∪ P| = |M| + |P| − |M ∩ P| = 200 + 300 − 50 = 450

students are taking Mathematics or Physics.
For three sets the following formula applies:

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C| ,

and for an arbitrary union of sets:

|A_1 ∪ A_2 ∪ · · · ∪ A_n| = s_1 − s_2 + s_3 − s_4 + · · · ± s_n ,

where s_k = sum of the cardinalities of all possible k-fold intersections of the given sets.
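The three-set formula can be verified on small concrete sets (the sets A, B, C below are arbitrary examples, not from the text):

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {1, 4, 6, 7}

# left side: size of the union, computed directly
lhs = len(A | B | C)
# right side: inclusion-exclusion for three sets
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))
print(lhs, rhs)  # both 7
```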
5.2. Combinatorics
5.2.1. Permutations. Assume that we have n objects. Any arrangement of any k of these objects in a given order is called a permutation of size k. If k = n then we call it just a permutation of the n objects. For instance, the permutations of the letters a, b, c are the following: abc, acb, bac, bca, cab, cba. The permutations of size 2 of the letters a, b, c, d are: ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, dc.
Note that the order is important. Given two permutations, they
are considered equal if they have the same elements arranged in the
same order.
We find the number P(n, k) of permutations of size k of n given objects in the following way: The first object in an arrangement can be chosen in n ways, the second one in n − 1 ways, the third one in n − 2 ways, and so on, hence:

P(n, k) = n · (n − 1) · · · (n − k + 1)   (k factors)   = n!/(n − k)! ,

where n! = 1 · 2 · 3 · · · n   (n factors)   is called “n factorial”.

The number P(n, n) of permutations of n objects is

P(n, n) = n! .

By convention 0! = 1.

For instance, there are 3! = 6 permutations of the 3 letters a, b, c. The number of permutations of size 2 of the 4 letters a, b, c, d is P(4, 2) = 4 · 3 = 12.
Exercise: Given a set A with m elements and a set B with n elements, find the number of one-to-one functions from A to B.
5.2.2. Combinations. Assume that we have a set A with n objects. Any subset of A of size r is called a combination of n elements taken r at a time. For instance, the combinations of the letters a, b, c, d, e taken 3 at a time are: abc, abd, abe, acd, ace, ade, bcd, bce, bde, cde, where two combinations are considered identical if they have the same elements regardless of their order.
The number of subsets of size r in a set A with n elements is:

C(n, r) = n!/(r! (n − r)!) .

The symbol (n over r), read “n choose r”, is often used instead of C(n, r).

One way to derive the formula for C(n, r) is the following. Let A be a set with n objects. In order to generate all possible permutations of size r of the elements of A we 1) take all possible subsets of size r in the set A, and 2) permute the r elements in each subset in all possible ways. Task 1) can be performed in C(n, r) ways, and task 2) can be performed in P(r, r) ways. By the product rule we have P(n, r) = C(n, r) · P(r, r), hence

C(n, r) = P(n, r)/P(r, r) = n!/(r! (n − r)!) .
5.3. Generalized Permutations and Combinations
5.3.1. Permutations with Repeated Elements. Assume that we have an alphabet with k letters and we want to write all possible words containing n_1 times the first letter of the alphabet, n_2 times the second letter, . . . , n_k times the kth letter. How many words can we write? We call this number P(n; n_1, n_2, . . . , n_k), where n = n_1 + n_2 + · · · + n_k.
Example: With 3 a’s and 2 b’s we can write the following 5-letter words: aaabb, aabab, abaab, baaab, aabba, ababa, baaba, abbaa, babaa, bbaaa.
We may solve this problem in the following way, as illustrated with the example above. Let us distinguish the different copies of a letter with subscripts: a_1 a_2 a_3 b_1 b_2. Next, generate each permutation of these five elements by choosing 1) the position of each kind of letter, then 2) the subscripts to place on the 3 a’s, then 3) the subscripts to place on the 2 b’s. Task 1) can be performed in P(5; 3, 2) ways, task 2) can be performed in 3! ways, and task 3) can be performed in 2! ways. By the product rule we have 5! = P(5; 3, 2) · 3! · 2!, hence P(5; 3, 2) = 5!/(3! · 2!).
In general the formula is:

P(n; n_1, n_2, . . . , n_k) = n!/(n_1! n_2! · · · n_k!) .
5.3.2. Combinations with Repetition. Assume that we have a set A with n elements. Any selection of r objects from A, where each object can be selected more than once, is called a combination of n objects taken r at a time with repetition. For instance, the combinations of the letters a, b, c, d taken 3 at a time with repetition are: aaa, aab, aac, aad, abb, abc, abd, acc, acd, add, bbb, bbc, bbd, bcc, bcd, bdd, ccc, ccd, cdd, ddd. Two combinations with repetition are considered identical if they have the same elements repeated the same number of times, regardless of their order.
Note that the following are equivalent:
1. The number of combinations of n objects taken r at a time with
repetition.
2. The number of ways r identical objects can be distributed among
n distinct containers.
3. The number of nonnegative integer solutions of the equation:
x_1 + x_2 + · · · + x_n = r .
Example: Assume that we have 3 different (empty) milk containers and 7 quarts of milk that we can measure with a one-quart measuring cup. In how many ways can we distribute the milk among the three containers? We solve the problem in the following way. Let x_1, x_2, x_3 be the quarts of milk to put in containers number 1, 2 and 3 respectively. The number of possible distributions of milk equals the number of nonnegative integer solutions of the equation x_1 + x_2 + x_3 = 7. Instead of using numbers for writing the solutions, we will use strokes, so for instance we represent the solution x_1 = 2, x_2 = 1, x_3 = 4, or 2 + 1 + 4, like this: || + | + ||||. Now, each possible solution is an arrangement of 7 strokes and 2 plus signs, so the number of arrangements is P(9; 7, 2) = 9!/(7! · 2!) = C(9, 7) = 36.
The general solution is:

P(n + r − 1; r, n − 1) = (n + r − 1)!/(r! (n − 1)!) = C(n + r − 1, r) .
5.4. Binomial Coeﬃcients
5.4.1. Binomial Theorem. The following identities can be easily checked:

(x + y)^0 = 1
(x + y)^1 = x + y
(x + y)^2 = x^2 + 2xy + y^2
(x + y)^3 = x^3 + 3x^2 y + 3x y^2 + y^3
They can be generalized by the following formula, called the Binomial Theorem:

(x + y)^n = ∑_{k=0}^{n} C(n, k) x^{n−k} y^k
 = C(n, 0) x^n + C(n, 1) x^{n−1} y + C(n, 2) x^{n−2} y^2 + · · · + C(n, n−1) x y^{n−1} + C(n, n) y^n .
We can find this formula by writing

(x + y)^n = (x + y)(x + y) · · · (x + y)   (n factors),

expanding, and grouping terms of the form x^a y^b. Since there are n factors of the form (x + y), we have a + b = n, hence the terms must be of the form x^{n−k} y^k. The coefficient of x^{n−k} y^k will be equal to the number of ways in which we can select the y from any k of the factors (and the x from the remaining n − k factors), which is C(n, k).

The expression C(n, k) is often called a binomial coefficient.
Exercise: Prove

∑_{k=0}^{n} C(n, k) = 2^n   and   ∑_{k=0}^{n} (−1)^k C(n, k) = 0 .

Hint: Apply the binomial theorem to (1 + 1)^n and (1 − 1)^n.
5.4.2. Properties of Binomial Coefficients. The binomial coefficients have the following properties:

1. C(n, k) = C(n, n − k)
2. C(n + 1, k + 1) = C(n, k) + C(n, k + 1)

The first property follows easily from

C(n, k) = n!/(k! (n − k)!) .
The second property can be proved by choosing a distinguished element a in a set A of n + 1 elements. The set A has C(n + 1, k + 1) subsets of size k + 1. Those subsets can be partitioned into two classes: that of the subsets containing a, and that of the subsets not containing a. The number of subsets containing a equals the number of subsets of A − {a} of size k, i.e., C(n, k). The number of subsets not containing a is the number of subsets of A − {a} of size k + 1, i.e., C(n, k + 1). Using the sum principle we find that in fact C(n + 1, k + 1) = C(n, k) + C(n, k + 1).
5.4.3. Pascal’s Triangle. The properties shown in the previous section allow us to compute binomial coefficients in a simple way. Look at the following triangular arrangement of binomial coefficients:

C(0,0)
C(1,0) C(1,1)
C(2,0) C(2,1) C(2,2)
C(3,0) C(3,1) C(3,2) C(3,3)
C(4,0) C(4,1) C(4,2) C(4,3) C(4,4)
We notice that each binomial coeﬃcient on this arrangement must
be the sum of the two closest binomial coeﬃcients on the line above it.
This, together with C(n, 0) = C(n, n) = 1, allows us to compute very quickly
the values of the binomial coefficients on the arrangement:
        1
       1 1
      1 2 1
     1 3 3 1
    1 4 6 4 1
This arrangement of binomial coefficients is called Pascal's Triangle.¹

¹Although it was already known by the Chinese in the XIV century.
5.5. The Pigeonhole Principle
5.5.1. The Pigeonhole Principle. The pigeonhole principle is
used for proving that a certain situation must actually occur. It says
the following: If n pigeonholes are occupied by m pigeons and m > n,
then at least one pigeonhole is occupied by more than one pigeon.¹
Example: In any given set of 13 people at least two of them have
their birthday during the same month.
Example: Let S be a set of eleven 2-digit numbers. Prove that
S must have two elements whose digits have the same difference (for
instance in S = {10, 14, 19, 22, 26, 28, 49, 53, 70, 90, 93}, the digits of
the numbers 28 and 93 have the same difference: 8 − 2 = 6, 9 − 3 =
6). Answer: The digits of a two-digit number can have 10 possible
differences (from 0 to 9). So, in a list of 11 numbers there must be two
with the same difference.
Example: Assume that we choose three different digits from 1 to
9 and write all permutations of those digits. Prove that among the
3-digit numbers written that way there are two whose difference is a
multiple of 500. Answer: There are 9 · 8 · 7 = 504 permutations of
three digits. On the other hand, if we divide the 504 numbers by 500
we can get only 500 possible remainders, so at least two numbers give
the same remainder, and their difference must be a multiple of 500.
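The counting argument in this example can be verified by brute force. The following Python sketch (illustrative code with names chosen here, not from the notes) generates all 504 numbers and exhibits a pair whose difference is a multiple of 500:

```python
from itertools import permutations
from collections import defaultdict

# all 3-digit numbers whose digits are three distinct digits chosen from 1..9
nums = [100*a + 10*b + c for a, b, c in permutations(range(1, 10), 3)]
assert len(nums) == 9 * 8 * 7      # 504 permutations

by_rem = defaultdict(list)
for m in nums:
    by_rem[m % 500].append(m)

# 504 numbers but only 500 possible remainders: a collision must exist
collisions = [v for v in by_rem.values() if len(v) > 1]
assert collisions
x, y = collisions[0][:2]
assert (x - y) % 500 == 0          # their difference is a multiple of 500
```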
Exercise: Prove that if we select n + 1 numbers from the set S =
{1, 2, 3, . . . , 2n}, among the numbers selected there are two such that
one is a multiple of the other one.
¹The Pigeonhole Principle (Schubfachprinzip) was first used by Dirichlet in
Number Theory. The term pigeonhole actually refers to one of those old-fashioned
writing desks with thin vertical wooden partitions in which to file letters.
CHAPTER 6
Probability
6.1. Probability
6.1.1. Introduction. Assume that we perform an experiment such
as tossing a coin or rolling a die. The set of possible outcomes is called
the sample space of the experiment. An event is a subset of the sample
space. For instance, if we toss a coin three times, the sample space is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} .
The event “at least two heads in a row” would be the subset
E = {HHH, HHT, THH} .
If all possible outcomes of an experiment have the same likelihood of
occurrence, then the probability of an event E ⊆ S is given by Laplace's
rule:

P(E) = |E| / |S| .
For instance, the probability of getting at least two heads in a row in
the above experiment is 3/8.
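Laplace's rule can be checked by enumerating the sample space. The Python sketch below (an added illustration, not part of the notes) reproduces the 3/8 computation:

```python
from itertools import product

# sample space of three coin tosses
S = [''.join(t) for t in product('HT', repeat=3)]
# event: at least two heads in a row
E = [s for s in S if 'HH' in s]
assert sorted(E) == ['HHH', 'HHT', 'THH']
assert len(E) / len(S) == 3 / 8     # Laplace's rule: |E| / |S|
```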
6.1.2. Probability Function. In general the likelihood of different
outcomes of an experiment may not be the same. In that case
the probability of each possible outcome x is a function P(x). This
function verifies:

0 ≤ P(x) ≤ 1 for all x ∈ S

and

∑_{x∈S} P(x) = 1 .

The probability of an event E ⊆ S will be

P(E) = ∑_{x∈E} P(x) .
Example: Assume that a die is loaded so that the probability of
obtaining n points is proportional to n. Find the probability of getting
an even number when rolling that die.

Answer: First we must find the probability function P(n) (n =
1, 2, . . . , 6). We are told that P(n) is proportional to n, hence P(n) =
kn. Since P(S) = 1 we have P(1) + P(2) + ··· + P(6) = 1, i.e., k·1 + k·2 +
··· + k·6 = 21k = 1, so k = 1/21 and P(n) = n/21. Next we want to
find the probability of E = {2, 4, 6}, i.e., P(E) = P(2) + P(4) + P(6) =
2/21 + 4/21 + 6/21 = 12/21 = 4/7.
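The same computation can be done with exact fractions. This sketch (illustrative, added here) builds the probability function P(n) = n/21 and checks it sums to 1:

```python
from fractions import Fraction

# loaded die: P(n) proportional to n, so P(n) = n/21
P = {n: Fraction(n, 21) for n in range(1, 7)}
assert sum(P.values()) == 1          # P is a valid probability function

P_even = P[2] + P[4] + P[6]
assert P_even == Fraction(12, 21) == Fraction(4, 7)
```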
6.1.3. Properties of Probability. Let P be a probability function on a sample space S. Then:

1. For every event E ⊆ S,

0 ≤ P(E) ≤ 1 .

2. P(∅) = 0, P(S) = 1.

3. For every event E ⊆ S, if E' = S − E is the complement of E ("not E") then

P(E') = 1 − P(E) .

4. If E_1, E_2 ⊆ S are two events, then

P(E_1 ∪ E_2) = P(E_1) + P(E_2) − P(E_1 ∩ E_2) .

In particular, if E_1 ∩ E_2 = ∅ (E_1 and E_2 are mutually exclusive,
i.e., they cannot happen at the same time) then

P(E_1 ∪ E_2) = P(E_1) + P(E_2) .
Example: Find the probability of getting a sum diﬀerent from 10 or
12 after rolling two dice. Answer: We can get 10 in 3 diﬀerent ways:
4+6, 5+5, 6+4, so P(10) = 3/36. Similarly we get that P(12) = 1/36.
Since they are mutually exclusive events, the probability of getting 10
or 12 is P(10) +P(12) = 3/36+1/36 = 4/36 = 1/9. So the probability
of not getting 10 or 12 is 1 −1/9 = 8/9.
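The answer 8/9 can be confirmed by enumerating all 36 equally likely outcomes; the sketch below (an added illustration) uses exact fractions:

```python
from itertools import product
from fractions import Fraction

rolls = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
p10 = Fraction(sum(1 for a, b in rolls if a + b == 10), 36)
p12 = Fraction(sum(1 for a, b in rolls if a + b == 12), 36)
assert p10 == Fraction(3, 36) and p12 == Fraction(1, 36)

# sums 10 and 12 are mutually exclusive, so probabilities add; then complement
assert 1 - (p10 + p12) == Fraction(8, 9)
```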
6.1.4. Conditional Probability. The conditional probability of
an event E given F, represented P(E | F), is the probability of E
assuming that F has occurred. It is like restricting the sample space
to F. Its value is

P(E | F) = P(E ∩ F) / P(F) .
Example: Find the probability of obtaining a sum of 10 after rolling
two fair dice. Find the probability of that event if we know that at least
one of the dice shows 5 points.
Answer: We call E = "obtaining sum 10" and F = "at least one of
the dice shows 5 points". The number of possible outcomes is 6 · 6 =
36. The event "obtaining a sum 10" is E = {(4, 6), (5, 5), (6, 4)}, so
|E| = 3. Hence the probability is P(E) = |E|/|S| = 3/36 = 1/12.
Now, if we know that at least one of the dice shows 5 points then the
sample space shrinks to

F = {(1, 5), (2, 5), (3, 5), (4, 5), (5, 5), (6, 5), (5, 1), (5, 2), (5, 3), (5, 4), (5, 6)} ,

so |F| = 11, and the ways to obtain a sum 10 are E ∩ F = {(5, 5)},
|E ∩ F| = 1, so the probability is P(E | F) = P(E ∩ F)/P(F) = 1/11.
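The shrinking of the sample space can be made explicit in code; this sketch (illustrative, added here) reproduces both probabilities:

```python
from itertools import product
from fractions import Fraction

S = list(product(range(1, 7), repeat=2))   # two fair dice, 36 outcomes
E = [r for r in S if sum(r) == 10]         # sum equals 10
F = [r for r in S if 5 in r]               # at least one die shows 5
assert len(E) == 3 and len(F) == 11

EF = [r for r in E if r in F]
# P(E | F) = P(E ∩ F) / P(F) = |E ∩ F| / |F| under equally likely outcomes
assert Fraction(len(EF), len(F)) == Fraction(1, 11)
```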
6.1.5. Independent Events. Two events E and F are said to be
independent if the probability of one of them does not depend on the
other, i.e.:

P(E | F) = P(E) .

In these circumstances:

P(E ∩ F) = P(E) · P(F) .

Note that if E is independent of F then also F is independent of E,
i.e., P(F | E) = P(F).
Example: Assume that the probability that a shooter hits a target
is p = 0.7, and that hitting the target in diﬀerent shots are independent
events. Find:
1. The probability that the shooter does not hit the target in one
shot.
2. The probability that the shooter does not hit the target three
times in a row.
3. The probability that the shooter hits the target at least once
after shooting three times.
Answer:
1. P(not hitting the target in one shot) = 1 − 0.7 = 0.3.
2. P(not hitting the target three times in a row) = 0.3^3 = 0.027.
3. P(hitting the target at least once in three shots) = 1 − 0.027 = 0.973.
6.1.6. Bayes' Theorem. Suppose that a sample space S is partitioned
into n classes C_1, C_2, . . . , C_n which are pairwise mutually exclusive
and whose union fills the whole sample space. Then for any event
F we have

P(F) = ∑_{i=1}^{n} P(F | C_i) P(C_i)

and

P(C_j | F) = P(F | C_j) P(C_j) / P(F) .
Example: In a country with 100 million people 100 thousand of
them have disease X. A test designed to detect the disease has a 99%
probability of detecting it when administered to a person who has it,
but it also has a 5% probability of giving a false positive when given to
a person who does not have it. A person is given the test and it comes
out positive. What is the probability that that person has the disease?
Answer: The classes are C_1 = "has the disease" and C_2 = "does
not have the disease", and the event is F = "the test gives a positive".
We have: |S| = 100,000,000, |C_1| = 100,000, |C_2| = 99,900,000,
hence P(C_1) = |C_1|/|S| = 0.001, P(C_2) = |C_2|/|S| = 0.999. Also
P(F | C_1) = 0.99, P(F | C_2) = 0.05. Hence:

P(F) = P(F | C_1) P(C_1) + P(F | C_2) P(C_2)
     = 0.99 · 0.001 + 0.05 · 0.999 = 0.05094 ,

and by Bayes' theorem:

P(C_1 | F) = P(F | C_1) P(C_1) / P(F) = (0.99 · 0.001) / 0.05094 = 0.019434628 ≈ 2%.
(So the test is really of little use when given to a random person;
however, it might be useful in combination with other tests or other
evidence that the person might have the disease.)
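The arithmetic in this example is easy to reproduce; the sketch below (an added illustration using the numbers from the example) applies the total probability formula and then Bayes' theorem:

```python
# disease-test example: P(C1) = 0.001, P(C2) = 0.999,
# P(F|C1) = 0.99 (true positive rate), P(F|C2) = 0.05 (false positive rate)
p_c1, p_c2 = 0.001, 0.999
p_f_c1, p_f_c2 = 0.99, 0.05

p_f = p_f_c1 * p_c1 + p_f_c2 * p_c2    # total probability: P(F)
p_c1_f = p_f_c1 * p_c1 / p_f           # Bayes' theorem: P(C1 | F)

assert abs(p_f - 0.05094) < 1e-9
assert abs(p_c1_f - 0.019434628) < 1e-6   # about 2%
```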
CHAPTER 7
Graph Theory
7.1. Graphs
7.1.1. Graphs. Consider the following examples:
1. A road map, consisting of a number of towns connected with
roads.
2. The representation of a binary relation deﬁned on a given set.
The relation of a given element x to another element y is represented
with an arrow connecting x to y.

The former is an example of an (undirected) graph. The latter is an
example of a directed graph or digraph.
Figure 7.1. Undirected Graph.
In general a graph G consists of two things:
1. The vertex set V , whose elements are called vertices, nodes or
points.
2. The edge set E or set of edges connecting pairs of vertices. If
the edges are directed then they are also called directed edges
or arcs. Each edge e ∈ E is associated with a pair of vertices.
Figure 7.2. Directed Graph.
A graph is sometimes represented by the pair (V, E) (we assume V
and E ﬁnite).
If the graph is undirected and there is a unique edge e connecting x
and y we may write e = {x, y}, so E can be regarded as a set of unordered
pairs. In this context we may also write e = (x, y), understanding that
here (x, y) is not an ordered pair, but the name of an edge.
If the graph is directed and there is a unique edge e pointing from
x to y, then we may write e = (x, y), so E may be regarded as a set
of ordered pairs. If e = (x, y), the vertex x is called origin, source or
initial point of the edge e, and y is called the terminus, terminating
vertex or terminal point.
Figure 7.3. Graph with parallel edges.
Two vertices connected by an edge are called adjacent. They are
also the endpoints of the edge, and the edge is said to be incident to
each of its endpoints. If the graph is directed, an edge pointing from
vertex x to vertex y is said to be incident from x and incident to y. An
edge connecting a vertex to itself is called a loop. Two edges connecting
the same pair of points (and pointing in the same direction if the graph
is directed) are called parallel or multiple.
A graph with neither loops nor multiple edges is called a simple
graph. If a graph has multiple edges but no loops then it is called a
multigraph. If it has loops (and possibly also multiple edges) then it is
called a pseudograph.
The following table summarizes the graph terminology:

Table 7.1.1. Graph Terminology

Type                | Edges      | Multiple Edges Allowed? | Loops Allowed?
Simple graph        | undirected | no                      | no
Multigraph          | undirected | yes                     | no
Pseudograph         | undirected | yes                     | yes
Directed graph      | directed   | no                      | yes
Directed multigraph | directed   | yes                     | yes
The degree of a vertex v, represented deg(v), is the number of edges
that contain it (loops are counted twice). A vertex of degree zero (not
connected to any other vertex) is called isolated. A vertex of degree 1
is called pendant.
The Handshaking Theorem. Let G = (V, E) be an undirected graph
with e edges. Then

2e = ∑_{v∈V} deg(v) .

(This applies even if multiple edges and loops are present.)
In a graph with directed edges, the indegree of a vertex v, denoted
deg⁻(v), is the number of edges with v as their terminal vertex. The
outdegree of a vertex v, denoted deg⁺(v), is the number of edges with
v as their initial vertex. (Note that a loop at a vertex contributes 1 to
both the indegree and the outdegree of this vertex.)
Number of vertices of odd degree. An undirected graph has an even
number of vertices of odd degree. Proof: Let V_e and V_o be, respectively,
the set of vertices of even degree and the set of vertices of odd degree
in an undirected graph G = (V, E). Then

2e = ∑_{v∈V} deg(v) = ∑_{v∈V_e} deg(v) + ∑_{v∈V_o} deg(v) .

Since deg(v) is even for v ∈ V_e, the first sum in the right hand side of
the equality is even. The total sum must be 2e, which is even, so the
second sum must be even too. But its terms are all odd, so there must
be an even number of them.
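Both facts can be checked on a small example. The sketch below (an illustration using a multigraph chosen here, with one pair of parallel edges and one loop) computes the degrees from an edge list:

```python
# a small multigraph chosen for illustration:
# edges ab, ad, two parallel edges bc, and a loop at d
edges = [('a', 'b'), ('b', 'c'), ('b', 'c'), ('a', 'd'), ('d', 'd')]

deg = {}
for u, v in edges:              # a loop (u == v) contributes 2 to deg(u)
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1

assert sum(deg.values()) == 2 * len(edges)   # Handshaking Theorem: sum = 2e
odd = [v for v in deg if deg[v] % 2 == 1]
assert len(odd) % 2 == 0                     # even number of odd vertices
```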
Sum of degrees in a directed graph. Let G = (V, E) be a directed
graph. Then

∑_{v∈V} deg⁻(v) = ∑_{v∈V} deg⁺(v) = |E| .
A weighted graph is a graph whose edges have been labeled with
numbers. The length of a path in a weighted graph is the sum of the
weights of the edges in the path.
Figure 7.4. Weighted Graph.
7.1.2. Special Graphs. Here we examine a few special graphs.

The n-cube: A graph with 2^n vertices labeled 0, 1, . . . , 2^n − 1
so that two of them are connected with an edge if their binary
representations differ in exactly one bit.
Figure 7.5. 3-cube.
Complete Graph: a simple undirected graph G such that every pair
of distinct vertices in G is connected by an edge. The complete graph
on n vertices is represented K_n (fig. 7.6). A complete directed graph is
a simple directed graph G = (V, E) such that every pair of distinct
vertices in G is connected by exactly one edge: for each pair of
distinct vertices, either (x, y) or (y, x) (but not both) is in E.
Figure 7.6. Complete graph K_5.
Bipartite Graph: a graph G = (V, E) in which V can be partitioned
into two subsets V_1 and V_2 so that each edge in G connects some vertex
in V_1 to some vertex in V_2. A bipartite simple graph is called complete
if each vertex in V_1 is connected to each vertex in V_2. If |V_1| = m and
|V_2| = n, the corresponding complete bipartite graph is represented
K_{m,n} (fig. 7.7).
A graph is bipartite iﬀ its vertices can be colored with two colors
so that every edge connects vertices of diﬀerent color.
Question: Is the n-cube bipartite? Hint: color in red all vertices
whose binary representation has an even number of 1's, and color in blue
the ones with an odd number of 1's.
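The hint's coloring can be verified computationally. This sketch (an added illustration) builds the 4-cube and checks that coloring by parity of the number of 1's is a proper 2-coloring:

```python
from itertools import combinations

n = 4
vertices = range(2 ** n)
edges = [(u, v) for u, v in combinations(vertices, 2)
         if bin(u ^ v).count('1') == 1]        # differ in exactly one bit

# color by the parity of the number of 1's in the binary representation
color = {v: bin(v).count('1') % 2 for v in vertices}
assert all(color[u] != color[v] for u, v in edges)   # so the n-cube is bipartite
```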
Figure 7.7. Complete bipartite graph K_{3,4}.
Regular Graph: a simple graph whose vertices all have the same
degree. For instance, the n-cube is regular.
7.1.3. Subgraph. Given a graph G = (V, E), a subgraph G' =
(V', E') of G is another graph such that V' ⊆ V and E' ⊆ E. If
V' = V then G' is called a spanning subgraph of G.
Given a subset of vertices U ⊆ V, the subgraph of G induced by
U, denoted ⟨U⟩, is the graph whose vertex set is U and whose edge set
contains all edges from G connecting vertices in U.
7.2. Representations of Graphs
7.2.1. Adjacency matrix. The adjacency matrix of a graph is a
matrix with rows and columns labeled by the vertices and such that
its entry in row i, column j, i ≠ j, is the number of edges incident on i
and j.¹ For instance the following is the adjacency matrix of the graph
of figure 7.8:

        a  b  c  d
    a ( 0  1  0  1 )
    b ( 1  0  2  0 )
    c ( 0  2  0  0 )
    d ( 1  0  0  1 )
Figure 7.8.
One of the uses of the adjacency matrix A of a simple graph G is
to compute the number of paths between two vertices, namely entry
(i, j) of A^n is the number of paths of length n from i to j.
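This path-counting fact is easy to check with a small matrix multiplication. The sketch below (an added illustration using a 4-cycle a-b-c-d-a, not the graph of figure 7.8) squares the adjacency matrix and reads off path counts:

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# adjacency matrix of the 4-cycle a-b-c-d-a (a simple graph)
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]

A2 = mat_mul(A, A)
# paths of length 2 from a to c: a-b-c and a-d-c
assert A2[0][2] == 2
# paths of length 2 from a back to a: a-b-a and a-d-a
assert A2[0][0] == 2
```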
¹For some authors, if i = j then the entry is twice the number of loops incident
on i; so in the example of figure 7.8 entry (d, d) would be 2 instead of 1.

7.2.2. Incidence matrix. The incidence matrix of a graph G is a
matrix with rows labeled by vertices and columns labeled by edges, so
that the entry for row v, column e is 1 if e is incident on v, and 0 otherwise.
As an example, the following is the incidence matrix of the graph of figure
7.8:
        e1 e2 e3 e4 e5
    a (  1  0  0  1  0 )
    b (  1  1  1  0  0 )
    c (  0  1  1  0  0 )
    d (  0  0  0  1  1 )
7.2.3. Graph Isomorphism. Two graphs G_1 = (V_1, E_1), G_2 =
(V_2, E_2) are called isomorphic if there is a bijection f : V_1 → V_2 and a
bijection g : E_1 → E_2 such that an edge e is adjacent to vertices v and
w if and only if g(e) is adjacent to f(v) and f(w) (fig. 7.9).
Figure 7.9. Two isomorphic graphs.
Two graphs are isomorphic if and only if for some ordering of their
vertices their adjacency matrices are equal.
An invariant is a property such that if a graph has it then all graphs
isomorphic to it also have it. Examples of invariants are their number
of vertices, their number of edges, “has a vertex of degree k”, “has a
simple cycle of length l”, etc. It is possible to prove that two graphs are
not isomorphic by showing an invariant property that one has and the
other one does not have. For instance the graphs in ﬁgure 7.10 cannot
be isomorphic because one has a vertex of degree 2 and the other one
doesn’t.
Figure 7.10. Non-isomorphic graphs.
7.3. Paths and Circuits
7.3.1. Paths. A path from v_0 to v_n of length n is a sequence of
n + 1 vertices (v_k) and n edges (e_k) of the form v_0, e_1, v_1, e_2, v_2, . . . , e_n, v_n,
where each edge e_k connects v_{k−1} with v_k (and points from v_{k−1} to v_k
if the edge is directed). The path may be specified by giving only the
sequence of edges e_1, . . . , e_n. If there are no multiple edges we can
specify the path by giving only the vertices: v_0, v_1, . . . , v_n. The path
is a circuit (or cycle) if it begins and ends at the same vertex, i.e.,
v_0 = v_n, and has length greater than zero. A path or circuit is simple
if it does not contain the same edge twice.
7.3.2. Connected Graphs. A graph G is called connected if there
is a path between any two distinct vertices of G. Otherwise the graph
is called disconnected. A directed graph is connected if its associated
undirected graph (obtained by ignoring the directions of the edges) is
connected. A directed graph is strongly connected if for every pair of
distinct vertices u, v, there is a path from u to v and there is a path
from v to u. A connected component of G is any connected subgraph
G' = (V', E') of G = (V, E) such that there is no edge (in G) from a
vertex in V' to a vertex in V − V'. Given a vertex v in G, the component
of G containing v is the subgraph G' of G consisting of all edges and
vertices of G contained in some path beginning at v.
7.3.3. The Seven Bridges of Königsberg. This is a classical
problem that started the discipline today called graph theory.

During the eighteenth century the city of Königsberg (in East Prussia)¹
was divided into four sections, including the island of Kneiphof,
by the Pregel river. Seven bridges connected the regions, as shown in
figure 7.11. It was said that residents spent their Sunday walks trying
to find a way to walk about the city so as to cross each bridge exactly
once and then return to the starting point. The first person to solve
the problem (in the negative) was the Swiss mathematician Leonhard
Euler in 1736. He represented the sections of the city and the seven
bridges by the graph of figure 7.12, and proved that it is impossible to
find a path in it that traverses every edge of the graph exactly once.
In the next section we study why this is so.
¹The city is currently called Kaliningrad and is part of the Russian republic.
Figure 7.11. The Seven Bridges of Königsberg.

Figure 7.12. Graph for the Seven Bridges of Königsberg.
7.3.4. Euler paths and circuits. Let G = (V, E) be a graph
with no isolated vertices. An Euler path in G is a simple path that
traverses every edge of the graph exactly once. Analogously, an
Euler circuit in G is a simple circuit that traverses every edge of the
graph exactly once.

Existence of Euler Paths and Circuits. The graphs that have an
Euler path can be characterized by looking at the degree of their
vertices. Recall that the degree of a vertex v, represented deg(v), is the
number of edges that contain v (loops are counted twice). An even
vertex is a vertex with even degree; an odd vertex is a vertex with odd
degree. The sum of the degrees of all vertices in a graph equals twice
its number of edges, so it is an even number. As a consequence, the
number of odd vertices in a graph is always even.

Let G be a connected multigraph. Then G contains an Euler circuit
if and only if all its vertices have even degree. Also, G contains an Euler
path from vertex a to vertex b (≠ a) if and only if a and b have odd
degree, and all its other vertices have even degree.
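This degree criterion is easy to apply mechanically. The sketch below (an added illustration; the function name is chosen here) classifies a connected multigraph and applies the test to the standard seven-bridge multigraph of Königsberg:

```python
from collections import Counter

def euler_status(edges):
    """Classify a connected multigraph using the degree criterion above."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    odd = sorted(v for v, d in deg.items() if d % 2 == 1)
    if not odd:
        return 'Euler circuit'
    if len(odd) == 2:
        return 'Euler path between {} and {}'.format(*odd)
    return 'neither'

# the Seven Bridges of Koenigsberg: island A, banks B and C, east section D
bridges = [('A', 'B'), ('A', 'B'), ('A', 'C'), ('A', 'C'),
           ('A', 'D'), ('B', 'D'), ('C', 'D')]
assert euler_status(bridges) == 'neither'   # all four vertices have odd degree
```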
7.3.5. Hamilton Circuits. A Hamilton circuit in a graph G is
a circuit that contains each vertex of G once (except for the starting
and ending vertex, which occurs twice). A Hamilton path in G is a
path (not a circuit) that contains each vertex of G once. Note that by
deleting an edge in a Hamilton circuit we get a Hamilton path, so if a
graph has a Hamilton circuit, then it also has a Hamilton path. The
converse is not true, i.e., a graph may have a Hamilton path but not
a Hamilton circuit. Exercise: Find a graph with a Hamilton path but
no Hamilton circuit.
Figure 7.13. Hamilton's puzzle.
In general it is not easy to determine if a given graph has a Hamilton
path or circuit, although often it is possible to argue that a graph has no
Hamilton circuit. For instance if G = (V, E) is a bipartite graph with
vertex partition {V_1, V_2} (so that each edge in G connects some vertex
in V_1 to some vertex in V_2), then G cannot have a Hamilton circuit if
|V_1| ≠ |V_2|, because any path must contain alternately vertices from
V_1 and V_2, so any circuit in G must have the same number of vertices
from each of both sets.
Edge removal argument. Another kind of argument consists of
removing edges while trying to make the degree of every vertex equal to
two. For instance in the graph of figure 7.14 we cannot remove any edge,
because that would make the degree of b, e or d less than 2, so it is
impossible to reduce the degree of a and c. Consequently that graph has no
Hamilton circuit.
Dirac’s Theorem. If G is a simple graph with n vertices with n ≥ 3
such that the degree of every vertex in G is at least n/2, then G has a
Hamilton circuit.
Figure 7.14. Graph without Hamilton circuit.
Ore’s Theorem. If G is a simple graph with n vertices with n ≥ 3
such that deg(u) +deg(v) ≥ n for every pair of nonadjacent vertices u
and v in G, then G has a Hamilton circuit.
The Traveling Salesperson Problem. Given a weighted graph, the
traveling salesperson problem (TSP) consists of finding a Hamilton
circuit of minimum length in this graph. The name comes from a classical
problem in which a salesperson must visit a number of cities and go
back home traveling the minimum distance possible. One way to solve
the problem consists of searching all possible Hamilton circuits and
computing their length, but this is very inefficient. Unfortunately no
efficient algorithm is known for solving this problem (and chances are
that none exists).
Remark: (Digression on P/NP problems.) Given a weighted graph
with n vertices, the problem of determining whether it contains a Hamilton
circuit of length not greater than a given L is known to be NP-complete.
This means the following. First, it is a decision problem, i.e.,
a problem whose solution is "yes" or "no". A decision problem is said
to be polynomial, or to belong to the class P, if it can be solved with an
algorithm of complexity O(n^k) for some integer k. It is said to be
nondeterministic polynomial, or to belong to the class NP, if in all cases when
the answer is "yes" this can be determined with a nondeterministic
algorithm of complexity O(n^k). A nondeterministic algorithm is an
algorithm that works with an extra hint; for instance in the TSP, if
G has a Hamilton circuit of length not greater than L the hint could
consist of a Hamilton circuit with length not greater than L, so the
task of the algorithm would be just to check that in fact that length
is not greater than L.² Currently it is not known whether the class
NP is strictly larger than the class P, although it is strongly suspected
that it is. The class NP contains a subclass called NP-complete
containing the "hardest" problems of the class, so that their complexity
must be higher than polynomial unless P=NP. The TSP is one of these
problems.
Gray Codes. A Gray code is a sequence s_1, s_2, . . . , s_{2^n} of n-bit binary
strings verifying the following conditions:

1. Every n-bit binary string appears somewhere in the sequence.
2. Two consecutive strings s_i and s_{i+1} differ in exactly one bit.
3. s_{2^n} and s_1 differ in exactly one bit.

For instance: 000, 001, 011, 010, 110, 111, 101, 100.

The problem of finding a Gray code is equivalent to finding a Hamilton
circuit in the n-cube.
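One standard way to build a Gray code (the "reflected" construction, shown here as an illustration; the notes do not prescribe a particular method) is to prefix a Gray code of length n−1 with 0, then append its reversal prefixed with 1:

```python
def gray_code(n):
    """Reflected binary Gray code: 2^n strings, consecutive ones differ in one bit."""
    if n == 1:
        return ['0', '1']
    prev = gray_code(n - 1)
    return ['0' + s for s in prev] + ['1' + s for s in reversed(prev)]

g = gray_code(3)
assert g == ['000', '001', '011', '010', '110', '111', '101', '100']

def diff_bits(a, b):
    return sum(x != y for x, y in zip(a, b))

# consecutive strings (cyclically, so including last-to-first) differ in one bit
assert all(diff_bits(g[i], g[(i + 1) % len(g)]) == 1 for i in range(len(g)))
```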
7.3.6. Dijkstra's Shortest-Path Algorithm. This is an algorithm
to find the shortest path from a vertex a to another vertex z in
a connected weighted graph. Edge (i, j) has weight w(i, j) > 0, and
vertex x is labeled L(x) (minimum distance from a if known, otherwise
∞). The output is L(z) = length of a minimum path from a to z.
1: procedure dijkstra(w,a,z,L)
2:   L(a) := 0
3:   for all vertices x ≠ a
4:     L(x) := ∞
5:   T := set of all vertices
6:   {T is the set of all vertices whose shortest}
7:   {distance from a has not been found yet}
8:   while z in T
9:   begin
10:    choose v in T with minimum L(v)
11:    T := T − {v}
12:    for each x in T adjacent to v
13:      L(x) := min{L(x), L(v) + w(v,x)}
²Informally, P problems are "easy to solve", and NP problems are problems
whose answer is "easy to check". In a sense the P=NP problem consists of
determining whether every problem whose solution is easy to check is also easy to
solve.
14: end
15: return L(z)
16: end dijkstra
For instance consider the graph in ﬁgure 7.15.
Figure 7.15. Shortest path from a to z.
The algorithm would label the vertices in the following way in each
iteration (the boxed vertices are the ones removed from T):
iteration a b c d e f z
0 0 ∞ ∞ ∞ ∞ ∞ ∞
1 0 2 ∞ 3 4 ∞ ∞
2 0 2 3 3 4 ∞ ∞
3 0 2 3 3 4 ∞ 6
4 0 2 3 3 4 ∞ 4
5 0 2 3 3 4 6 4
6 0 2 3 3 4 6 4
At this point the algorithm returns the value 4.
Complexity of Dijkstra's algorithm. For an n-vertex, simple,
connected weighted graph, Dijkstra's algorithm has a worst-case run time
of Θ(n²).
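The pseudocode above translates directly into Python. The sketch below is an illustration added here; the example graph and its weights are chosen for the demonstration and are not the graph of figure 7.15:

```python
import math

def dijkstra(w, a, z):
    """Dijkstra's algorithm following the pseudocode above.
    w maps frozenset edges {u, v} to positive weights."""
    vertices = {v for e in w for v in e}
    L = {x: math.inf for x in vertices}
    L[a] = 0
    T = set(vertices)          # vertices whose shortest distance is not final yet
    while z in T:
        v = min(T, key=L.get)  # choose v in T with minimum L(v)
        T.remove(v)
        for x in T:            # relax edges from v to vertices still in T
            e = frozenset((v, x))
            if e in w:
                L[x] = min(L[x], L[v] + w[e])
    return L[z]

# a small example graph (chosen here, not the graph of figure 7.15)
w = {frozenset(e): c for e, c in [
    (('a', 'b'), 2), (('a', 'd'), 3), (('b', 'c'), 1),
    (('c', 'z'), 3), (('d', 'z'), 2)]}
assert dijkstra(w, 'a', 'z') == 5   # a-d-z beats a-b-c-z (length 6)
```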
7.4. Planar Graphs
7.4.1. Planar Graphs. A graph G is planar if it can be drawn
in the plane with its edges intersecting at their vertices only. One such
drawing is called an embedding of the graph in the plane.
A particular planar representation of a planar graph is called a map.
A map divides the plane into a number of regions or faces (one of them
inﬁnite).
7.4.2. Graph Homeomorphism. If a graph G has a vertex v of
degree 2 and edges (v, v_1), (v, v_2) with v_1 ≠ v_2, we say that the edges
(v, v_1) and (v, v_2) are in series. Deleting such a vertex v and replacing
(v, v_1) and (v, v_2) with (v_1, v_2) is called a series reduction. For instance,
in the third graph of figure 7.16, the edges (h, b) and (h, d) are in series.
By removing vertex h we get the first graph on the left.
The opposite of a series reduction is an elementary subdivision. It
consists of replacing an edge (u, v) with two edges (u, w) and (w, v),
where w is a new vertex.

Two graphs are said to be homeomorphic if they are isomorphic or
can be reduced to isomorphic graphs by a sequence of series reductions
(fig. 7.16). Equivalently, two graphs are homeomorphic if they can be
obtained from the same graph by a sequence of elementary subdivisions.
Figure 7.16. Three homeomorphic graphs.
Note that if a graph G is planar, then all graphs homeomorphic to
G are also planar.
7.4.3. Some Results About Planar Graphs.
1. Euler's Formula: Let G = (V, E) be a connected planar graph,
and let v = |V|, e = |E|, and r = number of regions in which
some given embedding of G divides the plane. Then:

v − e + r = 2 .

Note that this implies that all plane embeddings of a given
graph define the same number of regions.

2. Let G = (V, E) be a simple connected planar graph with v
vertices, e ≥ 3 edges and r regions. Then 3r ≤ 2e and e ≤ 3v − 6.

3. The graph K_5 is non-planar. Proof: in K_5 we have v = 5 and
e = 10, hence 3v − 6 = 9 < e = 10, which contradicts the
previous result.

4. The graph K_{3,3} is non-planar. Proof: in K_{3,3} we have v = 6 and
e = 9. If K_{3,3} were planar, from Euler's formula we would have
r = 5. On the other hand, each region is bounded by at least
four edges, so 4r ≤ 2e, i.e., 20 ≤ 18, which is a contradiction.

5. Kuratowski's Theorem: A graph is non-planar if and only if it
contains a subgraph that is homeomorphic to either K_5 or K_{3,3}.
7.4.4. Dual Graph of a Map. A map is defined by some planar
graph G = (V, E) embedded in the plane. Assume that the map divides
the plane into a set of regions R = {r_1, r_2, . . . , r_k}. For each region r_i,
select a point p_i in the interior of r_i. The dual graph of that map is the
graph G_d = (V_d, E_d), where V_d = {p_1, p_2, . . . , p_k}, and for each edge in
E separating the regions r_i and r_j, there is an edge in E_d connecting p_i
and p_j. Warning: Note that a different embedding of the same graph
G may give different (and non-isomorphic) dual graphs. Exercise: Find
the duals of the maps shown in figure 7.9, and prove that they are not
isomorphic.
7.4.5. Graph Coloring. Consider the problem of coloring a map
M in such a way that no adjacent regions (sharing a border) have the
same color. This is equivalent to coloring the vertices of the dual graph
of M in such a way that no adjacent vertices have the same color.

Figure 7.17. Dual graph of a map.

In general, a coloring of a graph is an assignment of a color to
each vertex of the graph. The coloring is called proper if there are
no adjacent vertices with the same color. If a graph can be properly
colored with n colors we say that it is n-colorable. The minimum
number of colors needed to properly color a given graph G = (V, E) is
called the chromatic number of G, and is represented χ(G). Obviously
χ(G) ≤ |V|.
7.4.6. Some Results About Graph Coloring.

1. χ(K_n) = n.

2. Let G be a simple graph. The following statements are equivalent:
(a) χ(G) = 2.
(b) G is bipartite.
(c) Every circuit in G has even length.

3. Five Color Theorem (Kempe, Heawood) (not hard to prove):
Every simple, planar graph is 5-colorable.

4. Four Color Theorem (Appel and Haken, 1976), proved with
an intricate computer analysis of configurations: Every simple,
planar graph is 4-colorable.

Exercise: Find a planar graph G such that χ(G) = 4.
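A proper coloring can always be produced greedily (this yields an upper bound on χ(G), not the chromatic number itself). The sketch below, added as an illustration with names chosen here, colors K_4, which is planar and has χ(K_4) = 4, so it answers the exercise:

```python
def greedy_coloring(adj, order):
    """Greedy proper coloring: give each vertex, in the given order, the
    smallest color not used by an already-colored neighbor."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        color[v] = next(c for c in range(len(adj)) if c not in used)
    return color

# K4: every pair of the four vertices is adjacent
adj = {v: [u for u in 'abcd' if u != v] for v in 'abcd'}
color = greedy_coloring(adj, 'abcd')

assert all(color[u] != color[v] for v in adj for u in adj[v])  # proper coloring
assert len(set(color.values())) == 4                           # K4 needs 4 colors
```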
CHAPTER 8
Trees
8.1. Trees
8.1.1. Terminology. A tree is a connected undirected graph with
no simple circuits.
A rooted tree is a tree in which a particular vertex is designated as
the root and every edge is directed away from the root.
We draw rooted trees with the root at the top. The arrows indicating
the directions of the edges can be omitted.
Figure 8.1. A rooted tree.
The level of a vertex v is the length of the simple path from the root
to v. The height of a rooted tree is the maximum level of its vertices.
Let T be a tree with root v_0. Suppose that x, y and z are vertices in T and that (v_0, v_1, ..., v_n) is a simple path in T. Then:

1. v_{n-1} is the parent of v_n.
2. v_0, v_1, ..., v_{n-1} are ancestors of v_n.
3. v_n is a child of v_{n-1}.
4. If x is an ancestor of y, y is a descendant of x.
5. If x and y are children of z, x and y are siblings.
6. If x has no children, it is called a terminal vertex or leaf.
7. If x is not a terminal vertex, it is an internal or branch vertex.
8. The subtree of T rooted at x is the graph (V, E), where V consists of x together with its descendants, and E consists of the edges of the simple paths from x to the vertices in V.
8.2. Binary Trees
8.2.1. Binary Trees. A binary tree is a rooted tree in which each
vertex has at most two children, designated as left child and right child.
If a vertex has one child, that child is designated as either a left child
or a right child, but not both. A full binary tree is a binary tree in
which each vertex has exactly two children or none. The following are
a few results about binary trees:
1. If T is a full binary tree with i internal vertices, then T has i + 1 terminal vertices and 2i + 1 total vertices.
2. If a binary tree of height h has t terminal vertices, then t ≤ 2^h.

More generally we can define an m-ary tree as a rooted tree in which every internal vertex has no more than m children. The tree is called a full m-ary tree if every internal vertex has exactly m children. An ordered rooted tree is a rooted tree where the children of each internal vertex are ordered. A binary tree is just a particular case of an ordered m-ary tree (with m = 2).
8.2.2. Binary Search Trees. Assume S is a set in which elements
(which we will call “data”) are ordered; e.g., the elements of S can be
numbers in their natural order, or strings of alphabetic characters in
lexicographic order. A binary search tree associated to S is a binary tree T in which data from S are associated with the vertices of T so that, for each vertex v in T, each data item in the left subtree of v is less than the data item in v, and each data item in the right subtree of v is greater than the data item in v.

Example: Figure 8.2 contains a binary search tree for the set S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. In order to find an element we start at the root and compare it to the data in the current vertex (initially the root). If the element is greater we continue through the right child; if it is smaller we continue through the left child; if it is equal we have found it. If we reach a terminal vertex without finding the element, then that element is not present in S.
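The search procedure just described can be sketched as follows, with each vertex stored as a (data, left, right) tuple (an illustrative encoding, not notation from the text):

```python
def bst_search(node, target):
    """Walk down from the root: go right if target is greater,
    left if smaller, stop when equal or when we fall off a leaf."""
    while node is not None:
        data, left, right = node
        if target == data:
            return True
        node = right if target > data else left
    return False

# The tree of figure 8.2, written as nested (data, left, right) tuples:
leaf = lambda x: (x, None, None)
tree = (5,
        (2, leaf(1), (3, None, leaf(4))),
        (7, leaf(6), (9, leaf(8), leaf(10))))

assert all(bst_search(tree, x) for x in range(1, 11))
assert not bst_search(tree, 11)
```

Each comparison descends one level, so a search takes at most (height + 1) comparisons.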
8.2.3. Making a Binary Search Tree. We can store data in a
binary search tree by randomly choosing data from S and placing it in
the tree in the following way: The ﬁrst data chosen will be the root of
the tree. Then for each subsequent data item, starting at the root we
Figure 8.2. Binary Search Tree.
compare it to the data in the current vertex v. If the new data item is greater than the data in the current vertex then we move to the right child; if it is less we move to the left child. If there is no such child then we create one and put the new data in it. For instance, the tree in figure 8.3 has been made from the following list of words, choosing them in the order they occur: “IN A PLACE OF LA MANCHA WHOSE NAME I DO NOT WANT TO REMEMBER”.
Figure 8.3. Another binary search tree.
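The insertion procedure can be sketched with a short recursive routine; feeding it the words of the example in order reproduces the root and the top-level children of figure 8.3 (nodes here are mutable [data, left, right] lists, an illustrative choice):

```python
def bst_insert(node, data):
    """Insert data, returning the (possibly new) subtree root.
    Nodes are mutable lists [data, left, right]."""
    if node is None:
        return [data, None, None]
    if data > node[0]:
        node[2] = bst_insert(node[2], data)
    elif data < node[0]:
        node[1] = bst_insert(node[1], data)
    return node                        # duplicates are ignored

words = ("IN A PLACE OF LA MANCHA WHOSE NAME I DO NOT "
         "WANT TO REMEMBER").split()
root = None
for w in words:
    root = bst_insert(root, w)

assert root[0] == "IN"                 # first word becomes the root
assert root[1][0] == "A"               # "A" < "IN", so left child
assert root[2][0] == "PLACE"           # "PLACE" > "IN", right child
```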
8.3. Decision Trees, Tree Isomorphisms
8.3.1. Decision Trees. A decision tree is a tree in which each
vertex represents a question and each descending edge from that vertex
represents a possible answer to that question.
Example: The Five-Coins Puzzle. In this puzzle we have five coins C_1, C_2, C_3, C_4, C_5 that are identical in appearance, but one is either heavier or lighter than the others. The problem is to identify the bad coin and determine whether it is lighter or heavier using only a pan balance and comparing the weights of two piles of coins. The problem can be solved in the following way. First we compare the weights of C_1 and C_2. If C_1 is heavier than C_2 then we know that either C_1 is the bad coin and is heavier, or C_2 is the bad coin and is lighter. Then by comparing, say, C_1 with any of the other coins, say C_5, we can determine whether the bad coin is C_1 and is heavier (if C_1 is heavier than C_5) or it is C_2 and is lighter (if C_1 has the same weight as C_5). If C_1 is lighter than C_2 we proceed as before with “heavier” and “lighter” reversed. If C_1 and C_2 have the same weight we can try comparing C_3 and C_4 in a similar manner. If their weights are the same then we know that the bad coin is C_5, and we can determine whether it is heavier or lighter by comparing it to, say, C_1. The corresponding decision tree is the following:
Figure 8.4. Decision tree for the 5-coins puzzle.
In each vertex “C_i : C_j” means that we compare coins C_i and C_j by placing C_i on the left pan and C_j on the right pan of the balance, and each edge is labeled depending on which side of the balance is heavier. The terminal vertices are labeled with the bad coin and whether it is heavier (H) or lighter (L). The decision tree is optimal in the sense that in the worst case it uses three weighings, and there is no way to solve the problem with fewer than that: with two weighings we can get
at most nine possible outcomes, which are insuﬃcient to distinguish
among ten combinations of 5 possible bad coins and the bad coin being
heavier or lighter.
8.3.2. Complexity of Sorting. Sorting algorithms work by comparing elements and rearranging them as needed. For instance we can sort three elements a_1, a_2, a_3 with the decision tree shown in figure 8.5.
Figure 8.5. Sorting three elements.
Since there are 3! = 6 possible arrangements of 3 elements, we need a decision tree with at least 6 possible outcomes or terminal vertices. Recall that in a binary tree of height h with t terminal vertices the following inequality holds: t ≤ 2^h. Hence in our case 6 ≤ 2^h, which implies h ≥ 3, so the algorithm represented by the decision tree in figure 8.5 is optimal in the sense that it uses the minimum possible number of comparisons in the worst case.

More generally, in order to sort n elements we need a decision tree with n! outcomes, so its height h(n) will verify n! ≤ 2^{h(n)}. Since log_2(n!) = Θ(n log_2 n),^1 we have h(n) = Ω(n log_2 n). So the worst-case complexity of a sorting algorithm is Ω(n log_2 n). Since the merge sort algorithm uses precisely Θ(n log_2 n) comparisons, we know that it is optimal.
8.3.3. Isomorphisms of Trees. Assume that T_1 is a tree with vertex set V_1 and T_2 is another tree with vertex set V_2. If they are rooted trees then we call their roots r_1 and r_2 respectively. We will study three different kinds of tree isomorphisms between T_1 and T_2.
^1 According to Stirling’s formula, n! ≈ n^n e^{−n} √(2πn), so taking logarithms, log_2 n! ≈ n log_2 n − n log_2 e + (1/2) log_2(2πn) = Θ(n log_2 n).
1. Usual graph-isomorphism between trees: T_1 and T_2 are isomorphic if there is a bijection f : V_1 → V_2 that preserves adjacency, i.e., f(v) is adjacent to f(w) if and only if v is adjacent to w.
2. Rooted-tree-isomorphism: T_1 and T_2 are isomorphic if there is a bijection f : V_1 → V_2 that preserves adjacency and the root vertex, i.e.:
   (a) f(v) is adjacent to f(w) if and only if v is adjacent to w.
   (b) f(r_1) = r_2.
3. Binary-tree-isomorphism: Two binary trees T_1 and T_2 are isomorphic if there is a bijection f : V_1 → V_2 that preserves adjacency, the root vertex, and left/right children, i.e.:
   (a) f(v) is adjacent to f(w) if and only if v is adjacent to w.
   (b) f(r_1) = r_2.
   (c) f(v) is a left child of f(w) if and only if v is a left child of w.
   (d) f(v) is a right child of f(w) if and only if v is a right child of w.
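Of the three kinds, binary-tree-isomorphism is the most mechanical to test, since the left and right subtrees must match recursively; a sketch with trees encoded as (left, right) tuples (None for a missing child):

```python
def binary_iso(t1, t2):
    """Two binary trees (as (left, right) tuples, None for no child)
    are isomorphic iff both are empty, or their left subtrees match
    and their right subtrees match."""
    if t1 is None or t2 is None:
        return t1 is None and t2 is None
    return binary_iso(t1[0], t2[0]) and binary_iso(t1[1], t2[1])

leaf = (None, None)
a = (leaf, None)     # root with only a left child
b = (None, leaf)     # root with only a right child
assert binary_iso(a, a)
assert not binary_iso(a, b)  # isomorphic as rooted trees, not as binary trees
```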
Example: Figure 8.6 shows three trees which are graph-isomorphic. On the other hand, as rooted trees T_2 and T_3 are isomorphic, but they are not isomorphic to T_1 because the root of T_1 has degree 3, while the roots of T_2 and T_3 have degree 2. Finally T_2 and T_3 are not isomorphic as binary trees because the left child of the root in T_2 is a terminal vertex while the left child of the root of T_3 has two children.
Figure 8.6. Trees with different kinds of isomorphisms.
Exercise: Find all non-isomorphic 3-vertex free trees, 3-vertex rooted trees and 3-vertex binary trees. Answer: Figure 8.7 shows all 5 non-isomorphic 3-vertex binary trees. As rooted trees T_2–T_5 are isomorphic, but T_1 is not isomorphic to the others, so there are 2 non-isomorphic 3-vertex rooted trees, represented for instance by T_1 and T_2. All of them
are isomorphic as free trees, so there is only 1 non-isomorphic 3-vertex free tree.
Figure 8.7. Non-isomorphic binary trees.
8.3.4. Huffman Codes. Usually characters are represented in a computer with fixed-length bit strings. Huffman codes provide an alternative representation with variable-length bit strings, so that shorter strings are used for the most frequently used characters. As an example assume that we have an alphabet with four symbols: A = {a, b, c, d}. Two bits are enough to represent them; for instance a = 11, b = 10, c = 01, d = 00 would be one such representation. With this encoding, n-character words will have 2n bits. However, assume that the symbols do not appear with the same frequency; instead some are more frequent than others, say a appears with a frequency of 50%, b 30%, c 15% and d 5%. Then the following encoding would be more efficient than the fixed-length encoding: a = 1, b = 01, c = 001, d = 000. Now on average an n-character word will have 0.5n a’s, 0.3n b’s, 0.15n c’s and 0.05n d’s, hence its length will be 0.5n·1 + 0.3n·2 + 0.15n·3 + 0.05n·3 = 1.7n, which
is shorter than 2n. In general the average length per character of a given encoding with characters a_1, a_2, ..., a_n whose frequencies are f_1, f_2, ..., f_n is

(1/F) Σ_{k=1}^{n} f_k · l(a_k),

where l(a_k) = length of a_k and F = Σ_{k=1}^{n} f_k. The problem now is, given an alphabet and the frequencies of its characters, find an optimal encoding that provides minimum average length for words.
Fixed-length and Huffman codes can be represented by trees as in figure 8.8. The code of each symbol consists of the sequence of labels of the edges in the path from the root to the leaf with the desired symbol.
Figure 8.8. Fixed-length code and Huffman code.
8.3.5. Constructing an Optimal Huffman Code. An optimal Huffman code is a Huffman code in which the average length of the symbols is minimum. In general an optimal Huffman code can be made as follows. First we list the frequencies of all the symbols and represent the symbols as vertices (which at the end will be leaves of a tree). Then we replace the two smallest frequencies f_1 and f_2 with their sum f_1 + f_2, and join the corresponding two symbols to a common vertex above them by two edges, one labeled 0 and the other one labeled 1. That common vertex plays the role of a new symbol with a frequency equal to f_1 + f_2. Then we repeat the same operation with the resulting shorter list of frequencies until the list is reduced to one element and the graph obtained becomes a tree.
Example: Find the optimal Huffman code for the following table of symbols:

character  frequency
a          2
b          3
c          7
d          8
e          12

Answer: The successive reductions of the list of frequencies are as follows:

2, 3, 7, 8, 12 → 5, 7, 8, 12 → 12, 8, 12

(first combining 2 and 3 into 5, then 5 and 7 into 12).
Here we have a choice: we can add the first 12 and 8, or 8 and the second 12. Let’s choose the former:
12, 8, 12 → 20, 12 → 32

(combining the first 12 and 8 into 20, then 20 and 12 into 32).
The tree obtained is the following:
Figure 8.9. Optimal Huffman code 1.
The resulting code is as follows:
character code
a 1111
b 1110
c 110
d 10
e 0
The other choice yields the following:
12, 8, 12 → 12, 20 → 32

(combining 8 and the second 12 into 20, then 12 and 20 into 32).
Figure 8.10. Optimal Huffman code 2.
character code
a 111
b 110
c 10
d 01
e 00
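The reduction procedure above maps naturally onto a priority queue. This sketch uses Python's heapq with an insertion counter to break frequency ties, so at ties its choices may differ from the ones made in the text, but the resulting code is still optimal (both codes above give total weighted length 69 for the example):

```python
import heapq

def huffman(freqs):
    """Build an optimal Huffman code: repeatedly replace the two
    smallest frequencies by their sum, joining the two subtrees."""
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)                    # tie-breaker for equal frequencies
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):      # internal vertex: recurse
            walk(node[0], code + "1")
            walk(node[1], code + "0")
        else:
            codes[node] = code or "0"
    walk(heap[0][2], "")
    return codes

freqs = {'a': 2, 'b': 3, 'c': 7, 'd': 8, 'e': 12}
codes = huffman(freqs)
# Total weighted length matches the text's optimal codes (both give 69):
assert sum(len(codes[s]) * f for s, f in freqs.items()) == 69
```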
8.3.6. Game Trees. Trees are used in the analysis of some games. As an example we study the following game using a tree: Initially there are two piles with 3 coins and 1 coin respectively. Taking turns, two players remove any number of coins from one of the piles. The player that removes the last coin loses. The following tree represents all possible sequences of choices. Each node shows the number of coins in each pile, and each edge represents a possible “move” (choice) from one of the players. The first player is represented with a box and the second player is represented with a circle.
Figure 8.11. Tree of a game.
The analysis of the game starts by labeling each terminal vertex with “1” if it represents a victory for the first player and “0” if it represents a victory for the second player. These numbers represent the “value” of each position of the game, so that the first player is interested in making it “maximum” and the second player wants to make it “minimum”. Then we continue labeling the rest of the vertices in the following way. After all the children of a given vertex have been labeled, we label the vertex depending on whether it is a “first player” position (box) or a “second player” position (circle). First player positions are labeled with the maximum value of the labels of its children; second player positions are labeled with the minimum
value of the labels of its children. This process is called the minimax procedure. Every vertex labeled “1” will represent a position in which the first player has the advantage and can win if he/she plays without making mistakes; on the other hand, vertices labeled “0” represent positions in which the second player has the advantage. Now the strategy is for the first player to select at each position a child with maximum value, while the second player will be interested in selecting children with minimum value. If the starting position has been labeled “1” that means that the first player has a winning strategy; otherwise the second player has the advantage. For instance in the present game the first player has the advantage at the initial position, and only one favorable move at that point: (3, 1) → (0, 1), i.e., he/she must remove all 3 coins from the first pile. If for any reason the first player makes a mistake and removes, say, one coin from the first pile, going to position (2, 1), then the second player has one favorable move, to position (0, 1), which is the one with minimum “value”.
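The minimax labeling can be computed directly for this game. In the sketch below, wins(piles) is true when the player to move can force a win; this is the vertex label of the description above viewed from the mover's side:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def wins(piles):
    """True if the player to move wins (the one who takes the LAST
    coin loses).  A position is a tuple of pile sizes."""
    if sum(piles) == 0:
        return True        # opponent just took the last coin and lost
    for i, p in enumerate(piles):
        for take in range(1, p + 1):
            rest = list(piles)
            rest[i] -= take
            if not wins(tuple(sorted(rest))):
                return True           # some move leaves the opponent losing
    return False                      # every move lets the opponent win

assert wins((3, 1))          # first player wins, as the text concludes
assert not wins((0, 1))      # ... precisely by moving to this position
assert wins((2, 1))          # after the mistake (2,1), the mover wins
```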
Alpha-beta pruning. In some games the game tree is so complicated that it cannot be fully analyzed, so it is built up to a given depth only. The vertices reached at that depth are not terminal, but they can be “evaluated” using heuristic methods (for instance in chess usually losing a knight is a better choice than losing the queen, so a position with one queen and no knights will have a higher value than one with no queen and one knight). Even so, the evaluation and labeling of the vertices can be time consuming, but we can bypass the evaluation of many vertices using the technique of alpha-beta pruning. The idea is to skip a vertex as soon as it becomes obvious that its value will not affect the value of its parent. In order to do that with a first player (boxed) vertex v, we assign it an alpha value equal to the maximum value of its children evaluated so far. Assume that we are evaluating one of its children w, which will be a second player (circled) position. If at any point a child of w gets a value less than or equal to the alpha value of v, then it becomes obvious that the value of w is going to be less than or equal to the current alpha value of v, so it will not affect the value of v and we can stop the process of evaluating w (prune the subtree at w). That is called an alpha cutoff. Similarly, at a second player (circled) vertex v, we assign a beta value equal to the minimum value of its children evaluated so far, and perform a beta cutoff when one of its grandchildren gets a value greater than or equal to the current beta value of v, i.e., we prune the subtree at w, where w is the parent of that grandchild.
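The cutoffs can be sketched for an abstract game tree where internal vertices are lists of children and leaves are heuristic values (a generic illustration, not the tree of figure 8.12):

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax value of a game tree with alpha-beta pruning.
    Leaves are numbers; internal nodes are lists of children."""
    if not isinstance(node, list):
        return node                      # heuristic value of a leaf
    if maximizing:                       # first-player (boxed) vertex
        for child in node:
            alpha = max(alpha, alphabeta(child, False, alpha, beta))
            if alpha >= beta:
                break                    # cutoff: prune remaining children
        return alpha
    for child in node:                   # second-player (circled) vertex
        beta = min(beta, alphabeta(child, True, alpha, beta))
        if beta <= alpha:
            break                        # alpha cutoff
    return beta

# Root maximizes over two min-vertices.  The first subtree gives 3;
# while scanning the second, the leaf 2 <= alpha = 3 triggers a cutoff,
# so the leaf 9 is never examined.
tree = [[3, 5], [2, 9]]
assert alphabeta(tree, True) == 3
```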
Figure 8.12. Alpha cutoff.
8.4. Tree Traversal

8.4.1. Traversal Algorithms. In order to motivate this subject, we introduce the concept of Polish notation. Given a (not necessarily commutative) binary operation ◦, it is customary to represent the result of applying the operation to two elements a, b by placing the operation symbol in the middle:

a ◦ b .
This is called inﬁx notation. The Polish notation consists of placing
the symbol to the left:
◦ a b .
The reverse Polish notation consists of placing the symbol to the right:
a b ◦ .
The advantage of Polish notation is that it allows us to write expressions without the need for parentheses. For instance, the expression a ∗ (b + c) in Polish notation would be ∗ a + b c, while a ∗ b + c is + ∗ a b c. Also, Polish notation is easier to evaluate in a computer.
In order to evaluate an expression in Polish notation, we scan the expression from right to left, placing the elements in a stack.^1 Each time we find an operator, we replace the two top symbols of the stack with the result of applying the operator to those elements. For instance, the expression ∗ + 2 3 4 (which in infix notation is “(2 + 3) ∗ 4”) would be evaluated like this:

expression   stack
∗ + 2 3 4
∗ + 2 3      4
∗ + 2        3 4
∗ +          2 3 4
∗            5 4
             20
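The right-to-left stack evaluation can be sketched as follows (supporting +, − and ∗ on integer tokens; a minimal illustration of the procedure above):

```python
def eval_polish(tokens):
    """Evaluate a Polish (prefix) expression by scanning right to
    left and keeping operands on a stack."""
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    stack = []
    for tok in reversed(tokens):
        if tok in ops:
            a, b = stack.pop(), stack.pop()   # two top elements of the stack
            stack.append(ops[tok](a, b))      # a is the left operand
        else:
            stack.append(int(tok))
    return stack[0]

assert eval_polish("* + 2 3 4".split()) == 20    # (2 + 3) * 4
assert eval_polish("+ * 2 3 4".split()) == 10    # 2 * 3 + 4
assert eval_polish("- 5 2".split()) == 3
```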
An algebraic expression can be represented by a binary rooted tree
obtained recursively in the following way. The tree for a constant or
variable a has a as its only vertex. If the algebraic expression S is of
^1 A stack, or last-in first-out (LIFO) system, is a linear list of elements in which insertions and deletions take place only at one end, called the top of the list. A queue, or first-in first-out (FIFO) system, is a linear list of elements in which deletions take place only at one end, called the front of the list, and insertions take place only at the other end, called the rear of the list.
the form S_L ◦ S_R, where S_L and S_R are subexpressions with trees T_L and T_R respectively, and ◦ is an operator, then the tree T for S consists of ◦ as root and the subtrees T_L and T_R (fig. 8.13).

Figure 8.13. Tree of S_L ◦ S_R.
For instance, consider the following algebraic expression:

a + b ∗ c + d ↑ e ∗ (f + h) ,

where + denotes addition, ∗ denotes multiplication and ↑ denotes exponentiation. The binary tree for this expression is given in figure 8.14.

Figure 8.14. Tree for a + b ∗ c + d ↑ e ∗ (f + h).
Given the binary tree of an algebraic expression, its Polish, reverse Polish and infix representations are different ways of ordering the vertices of the tree, namely in preorder, postorder and inorder respectively.

The following are recursive definitions of several orderings of the vertices of a rooted tree T = (V, E) with root r. If T has only one vertex r, then r by itself constitutes the preorder, postorder and inorder traversal of T. Otherwise, let T_1, ..., T_k be the subtrees of T from left to right (fig. 8.15). Then:

1. Preorder Traversal: Pre(T) = r, Pre(T_1), ..., Pre(T_k).
Figure 8.15. Ordering of trees.
2. Postorder Traversal: Post(T) = Post(T_1), ..., Post(T_k), r.
3. Inorder Traversal: If T is a binary tree with root r, left subtree T_L and right subtree T_R, then: In(T) = In(T_L), r, In(T_R).
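The correspondence between the three orderings and the three notations can be checked on the tree of (2 + 3) ∗ 4, with vertices encoded as (value, left, right) tuples (an illustrative encoding; note that inorder alone drops the parentheses):

```python
def preorder(t):
    """Vertices as (value, left, right) tuples; None = empty tree."""
    if t is None:
        return []
    v, l, r = t
    return [v] + preorder(l) + preorder(r)

def inorder(t):
    if t is None:
        return []
    v, l, r = t
    return inorder(l) + [v] + inorder(r)

def postorder(t):
    if t is None:
        return []
    v, l, r = t
    return postorder(l) + postorder(r) + [v]

# The tree of (2 + 3) * 4:
expr = ("*", ("+", ("2", None, None), ("3", None, None)), ("4", None, None))
assert " ".join(preorder(expr)) == "* + 2 3 4"    # Polish
assert " ".join(inorder(expr)) == "2 + 3 * 4"     # infix (parentheses lost!)
assert " ".join(postorder(expr)) == "2 3 + 4 *"   # reverse Polish
```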
8.5. Spanning Trees
8.5.1. Spanning Trees. A tree T is a spanning tree of a graph G
if T is a subgraph of G that contains all the vertices of G. For instance
the graph of ﬁgure 8.16 has a spanning tree represented by the thicker
edges.
Figure 8.16. Spanning tree.
Every connected graph has a spanning tree, which can be obtained by removing edges until the resulting graph becomes acyclic. In practice, however, removing edges is not efficient because finding cycles is time consuming.

Next, we give two algorithms to find a spanning tree T of a loop-free connected undirected graph G = (V, E). We assume that the vertices of G are given in a certain order v_1, v_2, ..., v_n. The resulting spanning tree will be T = (V′, E′).
8.5.2. Breadth-First Search Algorithm. The idea is to start with vertex v_1 as root, add the vertices that are adjacent to v_1, then the ones that are adjacent to the latter and have not been visited yet, and so on. This algorithm uses a queue (initially empty) to store vertices of the graph. It consists of the following steps:

1. Add v_1 to T, insert it in the queue and mark it as “visited”.
2. If the queue is empty, then we are done. Otherwise let v be the vertex at the front of the queue.
3. For each vertex v′ of G that has not been visited yet and is adjacent to v (there might be none), taken in order of increasing subscripts, add vertex v′ and edge (v, v′) to T, insert v′ in the queue and mark it as “visited”.
4. Delete v from the queue.
5. Go to step 2.
A pseudocode version of the algorithm is as follows:
procedure bfs(V,E)
  S := (v1)  {ordered list of vertices of a fixed level}
  V' := {v1} {v1 is the root of the spanning tree}
  E' := {}   {no edges in the spanning tree yet}
  while true
  begin
    for each x in S, in order,
      for each y in V - V'
        if (x,y) is an edge then
          add edge (x,y) to E' and vertex y to V'
    if no edges were added then
      return T
    S := children of S
  end
end bfs
Figure 8.17 shows the spanning tree obtained using the breadth-first search algorithm on the graph with its vertices ordered lexicographically: a, b, c, d, e, f, g, h, i.
Figure 8.17. Breadth-First Search.
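A Python sketch of the breadth-first procedure, using a deque as the queue and an adjacency dictionary (sorting the neighbors stands in for "order of increasing subscripts"; the encoding is an illustrative choice):

```python
from collections import deque

def bfs_spanning_tree(vertices, adj):
    """Breadth-first search spanning tree, rooted at the first vertex."""
    root = vertices[0]
    visited = {root}
    tree_edges = []
    queue = deque([root])
    while queue:
        v = queue.popleft()              # front of the queue
        for w in sorted(adj[v]):         # increasing "subscript" order
            if w not in visited:
                visited.add(w)
                tree_edges.append((v, w))
                queue.append(w)
    return tree_edges

# A 4-cycle a-b-c-d-a:
adj = {'a': ['b', 'd'], 'b': ['a', 'c'], 'c': ['b', 'd'], 'd': ['a', 'c']}
edges = bfs_spanning_tree(['a', 'b', 'c', 'd'], adj)
assert edges == [('a', 'b'), ('a', 'd'), ('b', 'c')]   # n - 1 = 3 edges
```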
8.5.3. Depth-First Search Algorithm. The idea of this algorithm is to make a path as long as possible, and then go back (backtrack) to add branches that are also as long as possible.

This algorithm uses a stack (initially empty) to store vertices of the graph. It consists of the following steps:

1. Add v_1 to T, insert it in the stack and mark it as “visited”.
2. If the stack is empty, then we are done. Otherwise let v be the vertex on the top of the stack.
3. If there is no vertex v′ that is adjacent to v and has not been visited yet, then delete v and go to step 2 (backtrack). Otherwise, let v′ be the first non-visited vertex that is adjacent to v.
4. Add vertex v′ and edge (v, v′) to T, insert v′ in the stack and mark it as “visited”.
5. Go to step 2.
An alternative recursive definition is as follows. We define recursively a process P applied to a given vertex v in the following way:

1. Add vertex v to T and mark it as “visited”.
2. If there is no vertex v′ that is adjacent to v and has not been visited yet, then return. Otherwise, let v′ be the first non-visited vertex that is adjacent to v.
3. Add the edge (v, v′) to T.
4. Apply P to v′.
5. Go to step 2 (backtrack).

The Depth-First Search Algorithm consists of applying the process just defined to v_1.
A pseudocode version of the algorithm is as follows:
procedure dfs(V,E)
  V' := {v1} {v1 is the root of the spanning tree}
  E' := {}   {no edges in the spanning tree yet}
  w := v1
  while true
  begin
    while there is an edge (w,v) that when added
          to T does not create a cycle in T
    begin
      choose the first v such that (w,v)
        does not create a cycle in T
      add (w,v) to E'
      add v to V'
      w := v
    end
    if w = v1 then
      return T
    w := parent of w in T {backtrack}
  end
end dfs
Figure 8.18 shows the spanning tree obtained using the depth-first search algorithm on the graph with its vertices ordered lexicographically: a, b, c, d, e, f, g, h, i.
Figure 8.18. Depth-First Search.
8.5.4. Minimal Spanning Trees. Given a connected weighted graph G, a minimal spanning tree of G is a spanning tree such that the sum of the weights of its edges is minimum. For instance, for the graph of figure 8.19, the spanning tree shown is the one of minimum weight.
Figure 8.19. Minimum Spanning Tree.
Prim’s Algorithm. An algorithm to ﬁnd a minimal spanning tree is
Prim’s Algorithm. It starts with a single vertex and at each iteration
adds to the current tree a minimum weight edge that does not complete
a cycle.
The following is a pseudocode version of Prim’s algorithm. If (x, y)
is an edge in G = (V, E) then w(x, y) is its weight, otherwise w(x, y) =
∞. The starting vertex is s.
procedure prim(V,w,s)
  V' := {s} {vertex set starts with s}
  E' := {}  {edge set initially empty}
  for i := 1 to n-1 {put n-1 edges in spanning tree}
  begin
    find x in V' and y in V - V' with minimum w(x,y)
    add y to V'
    add (x,y) to E'
  end
  return E'
end prim
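A direct (quadratic, unoptimized) Python sketch of Prim's algorithm, with weights stored in a dictionary keyed by unordered vertex pairs (an illustrative encoding of w; missing pairs count as infinite):

```python
def prim(vertices, weight, s):
    """Grow a tree from s, at each step adding the cheapest edge from
    a tree vertex to a non-tree vertex."""
    in_tree = {s}
    edges = []
    total = 0
    while len(in_tree) < len(vertices):
        # cheapest edge leaving the current tree:
        w, x, y = min((weight[frozenset((x, y))], x, y)
                      for x in in_tree for y in vertices
                      if y not in in_tree and frozenset((x, y)) in weight)
        in_tree.add(y)
        edges.append((x, y))
        total += w
    return edges, total

# The weighted 4-cycle ab(1), bc(2), cd(3), da(4):
wt = {frozenset(p): w for p, w in
      [(('a', 'b'), 1), (('b', 'c'), 2), (('c', 'd'), 3), (('d', 'a'), 4)]}
edges, total = prim(['a', 'b', 'c', 'd'], wt, 'a')
assert total == 6            # edges ab, bc, cd; the weight-4 edge is dropped
```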
Prim’s algorithm is an example of a greedy algorithm. A greedy algorithm is an algorithm that optimizes the choice at each iteration without regard to previous choices (“doing the best locally”). Prim’s algorithm produces a minimum spanning tree, but in general a greedy algorithm does not always find an optimal solution to a given problem. For instance, in figure 8.20 a greedy algorithm to find the shortest path from a to z, working by adding the shortest available edge to the most recently added vertex, would return acz, which is not the shortest path.
Figure 8.20.
Kruskal’s Algorithm. Another algorithm to find a minimal spanning tree in a connected weighted graph G = (V, E) is Kruskal’s Algorithm. It starts with all n vertices of G and no edges. At each iteration we add an edge having minimum weight that does not complete a cycle. We stop after adding n − 1 edges.
procedure kruskal(E,w,n)
  V' := V
  E' := {}
  while |E'| < n-1
  begin
    among all edges not completing a cycle in T
      choose e of minimum weight and add it to E'
  end
  T' := (V',E')
  return T'
end kruskal
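Detecting whether an edge would complete a cycle is the only delicate point; the usual way (an implementation choice, not prescribed by the text) is a union-find structure over the connected components:

```python
def kruskal(vertices, weighted_edges, n):
    """Add edges in order of increasing weight, skipping any edge that
    would close a cycle; union-find tracks the components."""
    parent = {v: v for v in vertices}
    def find(v):                         # root of v's component
        while parent[v] != v:
            v = parent[v]
        return v
    chosen = []
    for w, x, y in sorted(weighted_edges):
        rx, ry = find(x), find(y)
        if rx != ry:                     # does not complete a cycle
            parent[rx] = ry              # merge the two components
            chosen.append((w, x, y))
            if len(chosen) == n - 1:
                break
    return chosen

# Same weighted 4-cycle as in the Prim example:
edges = [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'c', 'd'), (4, 'd', 'a')]
tree = kruskal(['a', 'b', 'c', 'd'], edges, 4)
assert sum(w for w, _, _ in tree) == 6
```

Both algorithms return a tree of total weight 6 here; in general the trees may differ, but their total weights agree.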
CHAPTER 9
Boolean Algebras
9.1. Combinatorial Circuits
9.1.1. Introduction. At their lowest level digital computers handle only binary signals, represented with the symbols 0 and 1. The most elementary circuits that combine those signals are called gates. Figure 9.1 shows three gates: OR, AND and NOT.
Figure 9.1. Gates.
Their outputs can be expressed as a function of their inputs by the following logic tables:

x_1  x_2  x_1 + x_2
 1    1       1
 1    0       1
 0    1       1
 0    0       0

OR GATE
x_1  x_2  x_1 · x_2
 1    1       1
 1    0       0
 0    1       0
 0    0       0

AND GATE

x   x′
1   0
0   1

NOT GATE
These are examples of combinatorial circuits. A combinatorial circuit is a circuit whose output is uniquely defined by its inputs. They do not have memory; previous inputs do not affect their outputs. Some combinations of gates can be used to make more complicated combinatorial circuits. For instance figure 9.2 is a combinatorial circuit with the logic table shown below, representing the values of the Boolean expression y = ((x_1 + x_2) · x_3)′.
Figure 9.2. A combinatorial circuit.
x_1  x_2  x_3  y = ((x_1 + x_2) · x_3)′
 1    1    1    0
 1    1    0    1
 1    0    1    0
 1    0    0    1
 0    1    1    0
 0    1    0    1
 0    0    1    1
 0    0    0    1
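The table can be verified by brute force, evaluating the expression for all eight input combinations (a sketch using Python's boolean operators in place of the gate symbols):

```python
from itertools import product

def y(x1, x2, x3):
    """y = NOT((x1 OR x2) AND x3), the circuit of figure 9.2."""
    return int(not ((x1 or x2) and x3))

table = {(x1, x2, x3): y(x1, x2, x3)
         for x1, x2, x3 in product((1, 0), repeat=3)}
assert table[(1, 1, 1)] == 0
assert table[(0, 0, 1)] == 1
assert sum(table.values()) == 5   # the expression is 1 in 5 of the 8 rows
```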
However, the circuit in figure 9.3 is not a combinatorial circuit. If x_1 = 1 and x_2 = 0 then y can be 0 or 1. Assume that at a given time y = 0. If we input a signal x_2 = 1, the output becomes y = 1, and
stays so even after x_2 goes back to its original value 0. That way we can store a bit. We can “delete” it by switching input x_1 to 0.
Figure 9.3. Not a combinatorial circuit.
9.1.2. Properties of Combinatorial Circuits. Here Z_2 = {0, 1} represents the set of signals handled by combinatorial circuits, and the operations performed on those signals by AND, OR and NOT gates are represented by the symbols ·, + and ′ respectively. Then their properties are the following (a, b, c are elements of Z_2, i.e., each represents either 0 or 1):

1. Associative
   (a + b) + c = a + (b + c)
   (a · b) · c = a · (b · c)
2. Commutative
   a + b = b + a
   a · b = b · a
3. Distributive
   a · (b + c) = (a · b) + (a · c)
   a + (b · c) = (a + b) · (a + c)
4. Identity
   a + 0 = a
   a · 1 = a
5. Complement
   a + a′ = 1
   a · a′ = 0
A system satisfying those properties is called a Boolean algebra.

Two Boolean expressions are defined to be equal if they have the same values for all possible assignments of values to their literals. Example: (x + y)′ = x′ · y′, as shown in the following table:
x   y   (x + y)′   x′ · y′
1   1      0          0
1   0      0          0
0   1      0          0
0   0      1          1
9.1.3. Abstract Boolean Algebras. Here we deal with general Boolean algebras; combinatorial circuits are an example, but there are others.

A Boolean algebra B = (S, ∨, ∧, ′, 0, 1) is a set S containing two distinguished elements 0 and 1, two binary operators ∨ and ∧ on S, and a unary operator ′ on S, satisfying the following properties (x, y, z are elements of S):

1. Associative
   (x ∨ y) ∨ z = x ∨ (y ∨ z)
   (x ∧ y) ∧ z = x ∧ (y ∧ z)
2. Commutative
   x ∨ y = y ∨ x
   x ∧ y = y ∧ x
3. Distributive
   x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z)
   x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z)
4. Identity
   x ∨ 0 = x
   x ∧ 1 = x
5. Complement
   x ∨ x′ = 1
   x ∧ x′ = 0

Example: (Z_2, +, ·, ′, 0, 1) is a Boolean algebra.

Example: If U is a universal set and P(U) = the power set of U (the collection of subsets of U), then (P(U), ∪, ∩, ′, ∅, U) is a Boolean algebra, where X′ denotes the complement U − X.
9.1.4. Other Properties of Boolean Algebras. The properties mentioned above define a Boolean algebra, but Boolean algebras also have other properties:

1. Idempotent
   x ∨ x = x
   x ∧ x = x
2. Bound
   x ∨ 1 = 1
   x ∧ 0 = 0
3. Absorption
   x ∨ (x ∧ y) = x
   x ∧ (x ∨ y) = x
4. Involution
   (x′)′ = x
5. 0 and 1
   0′ = 1
   1′ = 0
6. De Morgan’s
   (x ∨ y)′ = x′ ∧ y′
   (x ∧ y)′ = x′ ∨ y′

For instance the first idempotent law can be proved like this: x = x ∨ 0 = x ∨ (x ∧ x′) = (x ∨ x) ∧ (x ∨ x′) = (x ∨ x) ∧ 1 = x ∨ x.
9.2. Boolean Functions, Applications
9.2.1. Introduction. A Boolean function is a function from Z₂ⁿ
to Z₂. For instance, consider the exclusive-or function, defined by the
following table:
x₁ x₂   x₁ ⊕ x₂
1  1       0
1  0       1
0  1       1
0  0       0
The exclusive-or function can be interpreted as a function Z₂² → Z₂
that assigns (1, 1) → 0, (1, 0) → 1, (0, 1) → 1, (0, 0) → 0. It can also
be written as a Boolean expression in the following way:
x₁ ⊕ x₂ = (x₁ · x₂′) + (x₁′ · x₂)
Every Boolean function can be written as a Boolean expression as
we are going to see next.
9.2.2. Disjunctive Normal Form. We start with a definition.
A minterm in the symbols x₁, x₂, ..., xₙ is a Boolean expression of the
form y₁ · y₂ ⋯ yₙ, where each yᵢ is either xᵢ or xᵢ′.
Given any Boolean function f : Z₂ⁿ → Z₂ that is not identically
zero, it can be represented
f(x₁, ..., xₙ) = m₁ + m₂ + ⋯ + mₖ,
where m₁, m₂, ..., mₖ are all the minterms mᵢ = y₁ · y₂ ⋯ yₙ such that
f(a₁, a₂, ..., aₙ) = 1, where yⱼ = xⱼ if aⱼ = 1 and yⱼ = xⱼ′ if aⱼ = 0.
That representation is called the disjunctive normal form of the Boolean
function f.
Example: We have seen that the exclusive-or can be represented
x₁ ⊕ x₂ = (x₁ · x₂′) + (x₁′ · x₂). This provides a way to implement the
exclusive-or with a combinatorial circuit, as shown in figure 9.4.
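The construction of the disjunctive normal form can be carried out mechanically from the truth table. A Python sketch (the helper name dnf is ours, not from the text); it lists the minterms of the exclusive-or:

```python
from itertools import product

def dnf(f, n):
    """Return the minterms of f: Z_2^n -> Z_2 as a list of strings.

    For each input a with f(a) = 1, take y_j = x_j when a_j = 1
    and y_j = x_j' when a_j = 0, exactly as in the text.
    """
    terms = []
    for a in product([1, 0], repeat=n):
        if f(*a):
            terms.append("".join(f"x{j+1}" if bit else f"x{j+1}'"
                                 for j, bit in enumerate(a)))
    return terms

xor = lambda x1, x2: x1 ^ x2
print(dnf(xor, 2))   # the two minterms x1 x2' and x1' x2
```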
9.2.3. Conjunctive Normal Form. A maxterm in the symbols
x₁, x₂, ..., xₙ is a Boolean expression of the form y₁ + y₂ + ⋯ + yₙ,
where each yᵢ is either xᵢ or xᵢ′.
Figure 9.4. Exclusive-Or.
Given any Boolean function f : Z₂ⁿ → Z₂ that is not identically
one, it can be represented
f(x₁, ..., xₙ) = M₁ · M₂ ⋯ Mₖ,
where M₁, M₂, ..., Mₖ are all the maxterms Mᵢ = y₁ + y₂ + ⋯ + yₙ
such that f(a₁, a₂, ..., aₙ) = 0, where yⱼ = xⱼ if aⱼ = 0 and yⱼ = xⱼ′ if
aⱼ = 1. That representation is called the conjunctive normal form of the
Boolean function f.
Example: The conjunctive normal form of the exclusive-or is
x₁ ⊕ x₂ = (x₁ + x₂) · (x₁′ + x₂′).
9.2.4. Functionally Complete Sets of Gates. We have seen
how to design combinatorial circuits using AND, OR and NOT gates.
Here we will see how to do the same with other kinds of gates. In the
following, gates will be considered as functions from Z₂ⁿ into Z₂ intended
to serve as building blocks of arbitrary Boolean functions.
A set of gates {g₁, g₂, ..., gₖ} is said to be functionally complete
if for any integer n and any function f : Z₂ⁿ → Z₂ it is possible to
construct a combinatorial circuit that computes f using only the gates
g₁, g₂, ..., gₖ. Example: The result about the existence of a disjunctive
normal form for any Boolean function proves that the set of gates
{AND, OR, NOT} is functionally complete. Next we show other sets
of gates that are also functionally complete.
1. The set of gates {AND, NOT} is functionally complete. Proof:
Since we already know that {AND, OR, NOT} is functionally
complete, all we need to do is to show that we can compute
x + y using only AND and NOT gates. In fact:
x + y = (x′ · y′)′,
hence the combinatorial circuit of figure 9.5 computes x + y.
Figure 9.5. OR with AND and NOT.
2. The set of gates {OR, NOT} is functionally complete. The
proof is similar:
x · y = (x′ + y′)′,
hence the combinatorial circuit of figure 9.6 computes x · y.
Figure 9.6. AND with OR and NOT.
3. The gate NAND, denoted ↑ and defined as
x₁ ↑ x₂ = 0 if x₁ = 1 and x₂ = 1,   and x₁ ↑ x₂ = 1 otherwise,
is functionally complete.
Figure 9.7. NAND gate.
Proof: Note that x ↑ y = (x · y)′. Hence x′ = (x · x)′ = x ↑ x, so
the NOT gate can be implemented with a NAND gate. Also the
OR gate can be implemented with NAND gates: x + y = (x′ · y′)′ =
(x ↑ x) ↑ (y ↑ y). Since the set {OR, NOT} is functionally
complete and each of its elements can be implemented with
NAND gates, the NAND gate is functionally complete.
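The NAND constructions above can be verified exhaustively. A Python sketch (the AND line is an extra check beyond the proof in the text, using x · y = (x ↑ y) ↑ (x ↑ y); the names not_, or_, and_ are ours):

```python
from itertools import product

nand = lambda x, y: 0 if (x == 1 and y == 1) else 1

not_ = lambda x: nand(x, x)                       # x' = x up x
or_  = lambda x, y: nand(nand(x, x), nand(y, y))  # x + y = (x up x) up (y up y)
and_ = lambda x, y: nand(nand(x, y), nand(x, y))  # x.y = (x up y) up (x up y)

for x, y in product([0, 1], repeat=2):
    assert not_(x) == 1 - x
    assert or_(x, y) == (x | y)
    assert and_(x, y) == (x & y)
print("NOT, OR and AND all realized with NAND gates only")
```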
9.2.5. Minimization of Combinatorial Circuits. Here we address
the problem of finding a combinatorial circuit that computes a
given Boolean function with the minimum number of gates. The idea
Figure 9.8. NOT and OR functions implemented with NAND gates.
is to simplify the corresponding Boolean expression by using algebraic
properties such as (E · a) + (E · a′) = E and E + (E · a) = E, where
E is any Boolean expression. For simplicity, in the following we will
represent a · b as ab, so for instance the expressions above will look like
this: Ea + Ea′ = E and E + Ea = E.
Example: Let f(x, y, z) be the Boolean function defined by the following
table:
x y z f(x,y,z)
1 1 1 1
1 1 0 1
1 0 1 0
1 0 0 1
0 1 1 0
0 1 0 0
0 0 1 0
0 0 0 0
Its disjunctive normal form is f(x, y, z) = xyz + xyz′ + xy′z′. This
function can be implemented with the combinatorial circuit of figure
9.9.
Figure 9.9. A circuit that computes f(x, y, z) = xyz + xyz′ + xy′z′.
But we can do better if we simplify the expression in the following
way:
f(x, y, z) = xyz + xyz′ + xy′z′
           = xy + xy′z′          (since xyz + xyz′ = xy)
           = x(y + y′z′)
           = x(y + y′)(y + z′)
           = x(y + z′),
which corresponds to the circuit of figure 9.10.
Figure 9.10. A simpler circuit that computes
f(x, y, z) = xyz + xyz′ + xy′z′.
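The simplification can be double-checked by evaluating both expressions on all eight inputs. A Python sketch (assuming the simplified form x(y + z′), which matches the truth table above):

```python
from itertools import product

# disjunctive normal form read off the table: xyz + xyz' + xy'z'
f_dnf = lambda x, y, z: (x & y & z) | (x & y & (1 - z)) | (x & (1 - y) & (1 - z))
# the simplified expression x(y + z')
f_min = lambda x, y, z: x & (y | (1 - z))

assert all(f_dnf(x, y, z) == f_min(x, y, z)
           for x, y, z in product([0, 1], repeat=3))
print("x(y + z') computes the same function with fewer gates")
```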
9.2.6. Multi-Output Combinatorial Circuits. Example: Half-Adder.
A half-adder is a combinatorial circuit with two inputs x and
y and two outputs s and c, where s represents the sum of x and y and
c is the carry bit. Its table is as follows:
x y s c
1 1 0 1
1 0 1 0
0 1 1 0
0 0 0 0
So the sum is s = x ⊕ y (exclusive-or) and the carry bit is c = x · y.
Figure 9.11 shows a half-adder circuit.
Figure 9.11. Half-adder circuit.
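The half-adder's two outputs map directly onto bit operations. A minimal Python sketch:

```python
def half_adder(x, y):
    """Return (s, c): s = x XOR y (sum bit), c = x AND y (carry bit)."""
    return x ^ y, x & y

# reproduce the table above
for x, y in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(x, y, half_adder(x, y))
```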
CHAPTER 10
Automata, Grammars and Languages
10.1. Finite State Machines
10.1.1. Finite-State Machines. Combinatorial circuits have no
memory or internal states; their output depends only on the current
values of their inputs. Finite-state machines, on the other hand, have
internal states, so their output may depend not only on their current
inputs but also on the past history of those inputs.
A finite-state machine consists of the following:
1. A finite set of states S.
2. A finite set of input symbols I.
3. A finite set of output symbols O.
4. A next-state or transition function f : S × I → S.
5. An output function g : S × I → O.
6. An initial state σ ∈ S.
We represent the machine M = (S, I, O, f, g, σ).
Example: We describe a finite-state machine with two input symbols
I = {a, b} and two output symbols O = {0, 1} that accepts any string
from I∗ and outputs as many 1's as there are a's at the beginning of the
string; after that it outputs only 0's. The internal states are S = {σ₀, σ₁},
where σ₀ is the initial state (we interpret it as not having seen any
"b" yet); the machine will switch to σ₁ as soon as the first "b"
arrives. The next-state and output functions are as follows:
arrives. The nextstate and output functions are as follows:
f g
I a b a b
S
σ
0
σ
0
σ
1
1 0
σ
1
σ
1
σ
1
0 0
This finite-state machine can also be represented with the following
transition diagram:
[Transition diagram: start → σ₀; σ₀ loops to itself on a/1 and goes to
σ₁ on b/0; σ₁ loops to itself on a/0 and on b/0.]
The vertices of the diagram are the states. If in state σ an input
i causes the machine to output o and go to state σ′, then we draw an
arrow from σ to σ′ labeled i/o or i, o.
Example: The following example is similar to the previous one, but
the machine outputs 1 only after a change of input symbol; otherwise
it outputs 0:
[Transition diagram: start → σ₀; σ₀ goes to σ₁ on a/0 and to σ₂ on b/0;
σ₁ loops on a/0 and goes to σ₂ on b/1; σ₂ loops on b/0 and goes to σ₁
on a/1.]
Example: A Serial Adder. A serial adder accepts two bits and outputs
their sum. So the input set is I = {00, 01, 10, 11}. The output
set is O = {0, 1}. The set of states is S = {NC, C}, which stand
for "no carry" and "carry" respectively. The transition diagram is the
following:
[Transition diagram: start → NC; NC loops on 00/0, 01/1 and 10/1,
and goes to C on 11/0; C loops on 01/0, 10/0 and 11/1, and goes back
to NC on 00/1.]
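The serial adder can be simulated the same way, feeding in bit pairs least-significant first. A Python sketch of the machine above (the name delta, combining the next-state and output functions, is ours):

```python
# Serial adder as a finite-state machine; states: 'NC' (no carry), 'C' (carry).
# delta maps (state, bit pair) to (next state, output bit).
delta = {('NC', '00'): ('NC', '0'), ('NC', '01'): ('NC', '1'),
         ('NC', '10'): ('NC', '1'), ('NC', '11'): ('C', '0'),
         ('C', '00'): ('NC', '1'),  ('C', '01'): ('C', '0'),
         ('C', '10'): ('C', '0'),   ('C', '11'): ('C', '1')}

def serial_add(pairs, state='NC'):
    out = []
    for p in pairs:
        state, o = delta[(state, p)]
        out.append(o)
    return ''.join(out)

# add 3 (binary 11) and 1 (binary 01), least significant bits first:
# the bit pairs are 11, 10, 00
print(serial_add(['11', '10', '00']))   # '001' read LSB-first is binary 100 = 4
```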
10.1.2. Finite-State Automata. A finite-state automaton is similar
to a finite-state machine but with no output, and with a set of states
called accepting or final states. More specifically, a finite-state automaton
consists of:
1. A finite set of states S.
2. A finite set of input symbols I.
3. A next-state or transition function f : S × I → S.
4. An initial state σ ∈ S.
5. A subset F ⊆ S of accepting or final states.
We represent the automaton A = (S, I, f, σ, F). We say that an automaton
accepts or recognizes a given string of input symbols if that
string takes the automaton from the start state to a final state.
Example: The following transition diagrams represent an automaton
accepting any string of a's and b's ending with an a. The first
diagram uses the same scheme as with finite-state machines, with 1
representing "accept" or "recognize", and 0 representing "not accept":
[Transition diagram: start → σ₀; σ₀ loops on b/0 and goes to σ₁ on
a/1; σ₁ loops on a/1 and goes back to σ₀ on b/0.]
The second kind of diagram omits the outputs and represents the
accepting states with double circles:
[Transition diagram: start → σ₀; σ₀ loops on b and goes to the
accepting state σ₁ on a; σ₁ loops on a and goes back to σ₀ on b.]
Two finite-state automata that accept exactly the same set of strings
are said to be equivalent. For instance, the following automaton also
accepts precisely the strings of a's and b's that end with an a, so it is
equivalent to the automaton shown above:
[Transition diagram: a three-state automaton with start state σ₀ and
accepting states σ₁ and σ₂ that also accepts precisely the strings
ending with an a.]
Example: The following automaton accepts strings of a's and b's
with an even number of a's:
[Transition diagram: start → E (accepting); E loops on b and goes to
O on a; O loops on b and goes back to E on a.]
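Acceptance by a deterministic automaton is just a fold of the transition function over the input string. A Python sketch for the even-number-of-a's automaton (the helper name accepts is ours):

```python
def accepts(word, delta, start, final):
    """Run the automaton and test whether it ends in a final state."""
    state = start
    for c in word:
        state = delta[(state, c)]
    return state in final

# automaton above: E = even number of a's (accepting), O = odd
delta = {('E', 'a'): 'O', ('E', 'b'): 'E',
         ('O', 'a'): 'E', ('O', 'b'): 'O'}

print(accepts('abba', delta, 'E', {'E'}))   # True: two a's
print(accepts('ab',   delta, 'E', {'E'}))   # False: one a
```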
Example: The following automaton accepts strings starting with
one a followed by any number of b's:
[Transition diagram: start → σ₀; σ₀ goes to the accepting state σ₁ on
a and to the dead state σ₂ on b; σ₁ loops on b and goes to σ₂ on a;
σ₂ loops on a and b.]
Example: The following automaton accepts strings ending with aba:
[Transition diagram: start → σ₀; σ₀ loops on b and goes to σ₁ on a;
σ₁ loops on a and goes to σ₂ on b; σ₂ goes to the accepting state σ₃
on a and back to σ₀ on b; σ₃ goes to σ₂ on b and to σ₁ on a.]
10.2. Languages and Grammars
10.2.1. Formal Languages. Consider algebraic expressions written
with the symbols A = {x, y, z, +, ∗, (, )}. The following are some
of them: "x + y ∗ y", "y + (x ∗ y + y) ∗ x", "(x + y) ∗ x + z", etc.
There are however some strings of symbols that are not legitimate
algebraic expressions, because they have some sort of syntax error, e.g.:
"(x + y", "z + +y ∗ x", "x(∗y) + z", etc. So syntactically correct
algebraic expressions are a subset of the whole set A∗ of possible strings
over A.
In general, given a finite set A (the alphabet), a (formal) language
over A is a subset of A∗ (the set of strings over A).
Although in principle any subset of A∗ is a formal language, we are
interested only in languages with a certain structure. For instance: let
A = {a, b}. The set of strings over A with an even number of a's is a
language over A.
10.2.2. Grammars. A way to determine the structure of a language
is with a grammar. In order to define a grammar we need two
kinds of symbols: nonterminal, used to represent given subsets of the
language, and terminal, the final symbols that occur in the strings
of the language. For instance, in the example about algebraic expressions
mentioned above, the final symbols are the elements of the set
A = {x, y, z, +, ∗, (, )}. The nonterminal symbols can be chosen to
represent a complete algebraic expression (E), or terms (T) consisting
of products of factors (F). Then we can say that an algebraic expression
E consists of a single term
E → T,
or the sum of an algebraic expression and a term
E → E + T.
A term may consist of a factor or a product of a term and a factor:
T → F
T → T ∗ F
A factor may consist of an algebraic expression between parentheses
F → (E),
or an isolated terminal symbol
F → x,
F → y,
F → z.
Those expressions are called productions, and tell us how we can
generate syntactically correct algebraic expressions by successively
replacing the symbols on the left by the expressions on the right. For
instance, the algebraic expression "y + (x ∗ y + y) ∗ x" can be generated
like this:
E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ y + T ⇒ y + T ∗ F ⇒ y + F ∗ F ⇒
y + (E) ∗ F ⇒ y + (E + T) ∗ F ⇒ y + (T + T) ∗ F ⇒ y + (T ∗ F + T) ∗ F ⇒
y + (F ∗ F + T) ∗ F ⇒ y + (x ∗ F + T) ∗ F ⇒ y + (x ∗ y + T) ∗ F ⇒
y + (x ∗ y + F) ∗ F ⇒ y + (x ∗ y + y) ∗ F ⇒ y + (x ∗ y + y) ∗ x.
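Membership in the language generated by this grammar can also be tested mechanically. A short recursive-descent sketch in Python (one function per nonterminal; replacing the left-recursive productions E → E + T and T → T ∗ F by iteration is our implementation choice, not something the text prescribes):

```python
# Recognizer for the grammar
#   E -> T | E + T,   T -> F | T * F,   F -> (E) | x | y | z
def recognize(s):
    pos = 0
    def peek():
        return s[pos] if pos < len(s) else None
    def eat(c):
        nonlocal pos
        if peek() != c:
            raise SyntaxError(f"expected {c!r} at {pos}")
        pos += 1
    def E():                      # E = T { + T }
        T()
        while peek() == '+':
            eat('+'); T()
    def T():                      # T = F { * F }
        F()
        while peek() == '*':
            eat('*'); F()
    def F():                      # F = (E) | x | y | z
        if peek() == '(':
            eat('('); E(); eat(')')
        elif peek() in ('x', 'y', 'z'):
            eat(peek())
        else:
            raise SyntaxError(f"unexpected {peek()!r} at {pos}")
    try:
        E()
        return pos == len(s)      # the whole string must be consumed
    except SyntaxError:
        return False

print(recognize('y+(x*y+y)*x'))   # True
print(recognize('(x+y'))          # False: syntax error
```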
In general a phrase-structure grammar (or simply, grammar) G
consists of:
1. A finite set V of symbols called the vocabulary or alphabet.
2. A subset T ⊆ V of terminal symbols. The elements of N =
V − T are called nonterminal symbols or nonterminals.
3. A start symbol σ ∈ N.
4. A finite subset P of (V∗ − T∗) × V∗ called the set of productions.
We write G = (V, T, σ, P).
A production (A, B) ∈ P is written:
A → B.
The right hand side of a production can be any combination of
terminal and nonterminal symbols. The left hand side must contain at
least one nonterminal symbol.
If α → β is a production and xαy ∈ V∗, we say that xβy is directly
derivable from xαy, and we write
xαy ⇒ xβy.
If we have α₁ ⇒ α₂ ⇒ ⋯ ⇒ αₙ (n ≥ 0), we say that αₙ is derivable
from α₁, and we write α₁ ⇒∗ αₙ (by convention also α₁ ⇒∗ α₁).
Given a grammar G, the language L(G) associated to this grammar
is the subset of T∗ consisting of all strings derivable from σ.
10.2.3. Backus Normal Form. The Backus Normal Form or
BNF is an alternative way to represent productions. The production
S → T is written S ::= T. Productions of the form S ::= T₁, S ::= T₂,
..., S ::= Tₙ can be combined as
S ::= T₁ | T₂ | ⋯ | Tₙ.
So, for instance, the grammar of algebraic expressions defined above
can be written in BNF as follows:
E ::= T | E + T
T ::= F | T ∗ F
F ::= (E) | x | y | z
10.2.4. Combining Grammars. Let G₁ = (V₁, T₁, σ₁, P₁) and
G₂ = (V₂, T₂, σ₂, P₂) be two grammars, where N₁ = V₁ − T₁ and N₂ =
V₂ − T₂ are disjoint (rename nonterminal symbols if necessary). Let
L₁ = L(G₁) and L₂ = L(G₂) be the languages associated respectively
to G₁ and G₂. Also assume that σ is a new symbol not in V₁ ∪ V₂.
Then:
1. Union Rule: the language union of L₁ and L₂,
L₁ ∪ L₂ = {α | α ∈ L₁ or α ∈ L₂},
starts with the two productions
σ → σ₁,    σ → σ₂.
2. Product Rule: the language product of L₁ and L₂,
L₁L₂ = {αβ | α ∈ L₁, β ∈ L₂},
where αβ = the string concatenation of α and β, starts with the
production
σ → σ₁σ₂.
3. Closure Rule: the language closure of L₁,
L₁∗ = L₁⁰ ∪ L₁¹ ∪ L₁² ∪ ⋯,
where L₁⁰ = {λ} and L₁ⁿ = {α₁α₂...αₙ | αₖ ∈ L₁, k = 1, 2, ..., n}
(n = 1, 2, ...), starts with the two productions
σ → σ₁σ,    σ → λ.
10.2.5. Types of Grammars (Chomsky's Classification). Let
G be a grammar and let λ denote the null string.
0. G is a phrase-structure (or type 0) grammar if every production
is of the form:
α → δ,
where α ∈ V∗ − T∗, δ ∈ V∗.
1. G is a context-sensitive (or type 1) grammar if every production
is of the form:
αAβ → αδβ
(i.e.: we may replace A with δ in the context of α and β), where
α, β ∈ V∗, A ∈ N, δ ∈ V∗ − {λ}.
2. G is a context-free (or type 2) grammar if every production is
of the form:
A → δ,
where A ∈ N, δ ∈ V∗.
3. G is a regular (or type 3) grammar if every production is of the
form:
A → a  or  A → aB  or  A → λ,
where A, B ∈ N, a ∈ T.
A language L is context-sensitive (respectively context-free, regular)
if there is a context-sensitive (respectively context-free, regular)
grammar G such that L = L(G).
The following examples show that these grammars define different
kinds of languages.
Example: The following language is type 3 (regular):
L = {aⁿbᵐ | n = 1, 2, 3, ...; m = 1, 2, 3, ...}.
A type 3 grammar for that language is the following: T = {a, b},
N = {σ, S}, with start symbol σ, and productions:
σ → aσ,    σ → aS,    S → bS,    S → b.
Example: The following language is type 2 (context-free) but not
type 3:
L = {aⁿbⁿ | n = 1, 2, 3, ...}.
A type 2 grammar for that language is the following:
T = {a, b}, N = {σ}, with start symbol σ, and productions
σ → aσb,    σ → ab.
Example: The following language is type 1 (context-sensitive) but
not type 2:
L = {aⁿbⁿcⁿ | n = 1, 2, 3, ...}.
A type 1 grammar for that language is the following:
T = {a, b, c}, N = {σ, A, C}, with start symbol σ, and productions
σ → abc,    σ → aAbc,
A → abC,    A → aAbC,
Cb → bC,    Cc → cc.
There are also type 0 languages that are not type 1, but they are
harder to describe.
10.2.6. Equivalent Grammars. Two grammars G and G′ are
equivalent if L(G) = L(G′).
Example: The grammar of algebraic expressions defined at the
beginning of the section is equivalent to the following one:
Terminal symbols = {x, y, z, +, ∗, (, )}, nonterminal symbols =
{E, T, F, L}, with start symbol E, and productions
E → T,    E → E + T,
T → F,    T → T ∗ F,
F → (E), F → L,
L → x, L → y, L → z.
10.2.7. Context-Free Interactive Lindenmayer Grammar.
A context-free interactive Lindenmayer grammar is similar to a usual
context-free grammar, with the difference that it allows productions of
the form A → B where A ∈ N ∪ T (in a context-free grammar A must
be nonterminal). Its rules for deriving strings are also different: in a
context-free interactive Lindenmayer grammar, to derive string β from
string α, all symbols in α must be replaced simultaneously.
Example: The von Koch Snowflake. The von Koch Snowflake is a
fractal curve obtained by starting with a line segment and then, at each
stage, transforming all segments of the figure into a four-segment
polygonal line, as shown below. The von Koch Snowflake fractal is the limit
of the sequence of curves defined by that process.
Figure 10.1. Von Koch Snowflake, stages 1–3.
Figure 10.2. Von Koch Snowflake, stages 4–5.
A way to represent an intermediate stage of the making of the
fractal is as a sequence of movements of three kinds: 'd' = draw a
straight line (of a fixed length) in the current direction, 'r' = turn right
by 60°, 'l' = turn left by 60°. For instance we start with a single
horizontal line d, which we then transform into the polygonal line
dldrrdld; then each segment is transformed into a polygonal line
according to the rule d → dldrrdld, so we get
dldrrdldldldrrdldrrdldrrdldldldrrdld
If we represent by D a segment that may not be final yet, then the
sequences of commands used to build any intermediate stage of the
curve can be defined with the following grammar:
N = {D}, T = {d, r, l}, with start symbol D, and productions:
D → DlDrrDlD,    D → d,    r → r,    l → l.
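The simultaneous-replacement rule is easy to simulate: rewrite every symbol of the current string in one pass. A Python sketch (the helper name step is ours; applying D → d at the end yields the drawing commands of the stage shown above):

```python
# Parallel rewriting: in a Lindenmayer grammar every symbol of the
# current string is replaced simultaneously at each derivation step.
rules = {'D': 'DlDrrDlD', 'd': 'd', 'r': 'r', 'l': 'l'}

def step(s):
    return ''.join(rules.get(c, c) for c in s)

curve = 'D'
for _ in range(2):          # two applications of D -> DlDrrDlD
    curve = step(curve)
print(curve.replace('D', 'd'))   # finish with D -> d: the stage-2 commands
```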
Example: The Peano curve. The Peano curve is a space-filling curve,
i.e., a function f : [0, 1] → [0, 1]² such that the range of f is the whole
square [0, 1]², defined as the limit of the sequence of curves shown in
the figures below.
Figure 10.3. Peano curve, stages 1–4.
Each element of that sequence of curves can be described as a
sequence of 90° arcs drawn either anticlockwise ('l') or clockwise ('r').
The corresponding grammar is as follows:
T = {l, r}, N = {C, L, R}, with start symbol C, and productions
C → LLLL,
L → RLLLR,    R → RLR,
L → l,    R → r,    l → l,    r → r.
10.3. Language Recognition
10.3.1. Regular Languages. Recall that a regular language is
the language associated to a regular grammar, i.e., a grammar G =
(V, T, σ, P) in which every production is of the form:
A → a  or  A → aB  or  A → λ,
where A, B ∈ N = V − T, a ∈ T.
Regular languages over an alphabet T have the following properties
(recall that λ = 'empty string', αβ = 'concatenation of α and β',
αⁿ = 'α concatenated with itself n times'):
1. ∅, {λ}, and {a} are regular languages for all a ∈ T.
2. If L₁ and L₂ are regular languages over T, the following
languages are also regular:
L₁ ∪ L₂ = {α | α ∈ L₁ or α ∈ L₂},
L₁L₂ = {αβ | α ∈ L₁, β ∈ L₂},
L₁∗ = {α₁ ... αₙ | αₖ ∈ L₁, n ∈ N},
T∗ − L₁ = {α ∈ T∗ | α ∉ L₁},
L₁ ∩ L₂ = {α | α ∈ L₁ and α ∈ L₂}.
We justify the above claims about L₁ ∪ L₂, L₁L₂ and L₁∗ as follows.
We already know how to combine two grammars G₁ and G₂ (see 10.2.4)
to obtain L₁ ∪ L₂, L₁L₂ and L₁∗; the only problem is that the rules
given in section 10.2.4 do not have the form of a regular grammar, so we
need to modify them slightly (we use the same notation as in section
10.2.4):
1. Union Rule: Instead of adding σ → σ₁ and σ → σ₂, add all
productions of the form σ → RHS, where RHS is the right
hand side of some production (σ₁ → RHS) ∈ P₁ or (σ₂ →
RHS) ∈ P₂.
2. Product Rule: Instead of adding σ → σ₁σ₂, use σ₁ as start
symbol and replace each production (A → a) ∈ P₁ with A →
aσ₂ and (A → λ) ∈ P₁ with A → σ₂.
3. Closure Rule: Instead of adding σ → σ₁σ and σ → λ, use
σ₁ as start symbol, add σ₁ → λ, and replace each production
(A → a) ∈ P₁ with A → aσ₁ and (A → λ) ∈ P₁ with A → σ₁.
10.3.2. Regular Expressions. Regular languages can be characterized
as languages defined by regular expressions. Given an alphabet
T, a regular expression over T is defined recursively as follows:
1. ∅, λ, and a are regular expressions for all a ∈ T.
2. If R and S are regular expressions over T, the following
expressions are also regular: (R), R + S, R · S, R∗.
In order to use fewer parentheses we assign those operations the
following precedence (applied first to last): ∗, ·, +. We may omit the
dot: α · β = αβ.
Next we define recursively the language associated to a given regular
expression:
L(∅) = ∅,
L(λ) = {λ},
L(a) = {a} for each a ∈ T,
L(R + S) = L(R) ∪ L(S),
L(R · S) = L(R)L(S) (language product),
L(R∗) = L(R)∗ (language closure).
So, for instance, the expression a∗bb∗ represents all strings of the
form aⁿbᵐ with n ≥ 0, m > 0; a∗(b + c) is the set of strings consisting
of any number of a's followed by a b or a c; a(a + b)∗b is the set of
strings over {a, b} that start with a and end with b; etc.
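These examples can be checked against Python's re module, whose syntax writes union as | rather than + (the pattern names are ours):

```python
import re

# The text's a*bb* and a(a+b)*b, translated to Python's re syntax;
# \Z anchors the match so the whole string must belong to the language.
astar_b = re.compile(r'a*bb*\Z')
a_any_b = re.compile(r'a(a|b)*b\Z')

assert astar_b.match('aaab')          # a^3 b is in L(a*bb*)
assert astar_b.match('bbb')           # n = 0 is allowed
assert not astar_b.match('aaa')       # needs at least one b
assert a_any_b.match('ab') and a_any_b.match('abab')
assert not a_any_b.match('ba')        # must start with a and end with b
print("the regular expressions behave as described")
```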
Another way of characterizing regular languages is as sets of strings
recognized by finite-state automata, as we will see next. But first we
need a generalization of the concept of finite-state automaton.
10.3.3. Nondeterministic Finite-State Automata. A nondeterministic
finite-state automaton is a generalization of a finite-state
automaton in which at each state there might be several possible choices
for the "next state" instead of just one. Formally, a nondeterministic
finite-state automaton consists of:
1. A finite set of states S.
2. A finite set of input symbols I.
3. A next-state or transition function f : S × I → P(S).
4. An initial state σ ∈ S.
5. A subset F of S of accepting or final states.
We represent the automaton A = (S, I, f, σ, F). We say that a
nondeterministic finite-state automaton accepts or recognizes a given string
of input symbols if in its transition diagram there is a path from the
starting state to a final state with its edges labeled by the symbols of
the given string. A path (which we can express as a sequence of states)
whose edges are labeled with the symbols of a string is said to represent
the given string.
Example: Consider the nondeterministic finite-state automaton
defined by the following transition diagram:
[Transition diagram: start → σ; σ loops on b and goes to C on a; C
loops on b and also goes to the accepting state F on b.]
This automaton recognizes precisely the strings of the form bⁿabᵐ,
n ≥ 0, m > 0. For instance the string bbabb is represented by the path
(σ, σ, σ, C, C, F). Since that path ends in a final state, the string is
recognized by the automaton.
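A nondeterministic automaton can be simulated deterministically by tracking the set of states reachable after each symbol. A Python sketch of the automaton above (s stands for σ; missing entries of f are treated as ∅; the helper name accepts is ours):

```python
# f maps (state, symbol) to the SET of possible next states.
f = {('s', 'a'): {'C'}, ('s', 'b'): {'s'},
     ('C', 'b'): {'C', 'F'}}
final = {'F'}

def accepts(word, start='s'):
    states = {start}
    for c in word:
        # union of f(q, c) over all currently possible states q
        states = set().union(*(f.get((q, c), set()) for q in states))
    return bool(states & final)

print(accepts('bbabb'))   # True: of the form b^n a b^m with m > 0
print(accepts('ba'))      # False: no trailing b after the a
```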
Next we will see that there is a precise relation between regular
grammars and nondeterministic finite-state automata.
Regular grammar associated to a nondeterministic finite-state
automaton. Let A be a nondeterministic finite-state automaton given as
a transition diagram. Let σ be the initial state. Let T be the set of
input symbols, let N be the set of states, and V = N ∪ T. Let P be
the set of productions
S → xS′
if there is an edge labeled x from S to S′, and
S → λ
if S is a final state. Let G be the regular grammar
G = (V, T, σ, P).
Then the set of strings recognized by A is precisely L(G).
Example: For the nondeterministic automaton deﬁned above the
corresponding grammar will be:
T = {a, b}, N = {σ, C, F}, with productions
σ → bσ,    σ → aC,    C → bC,    C → bF,    F → λ.
The string bbabb can be produced like this:
σ ⇒ bσ ⇒ bbσ ⇒ bbaC ⇒ bbabC ⇒ bbabbF ⇒ bbabb .
Nondeterministic finite-state automaton associated to a given regular
grammar. Let G = (V, T, σ, P) be a regular grammar. Let
I = T,
S = N ∪ {F}, where N = V − T and F ∉ V,
f(S, x) = {S′ | S → xS′ ∈ P} ∪ {F | S → x ∈ P},
F = {F} ∪ {S | S → λ ∈ P}.
Then the nondeterministic finite-state automaton A = (S, I, f, σ, F)
recognizes precisely the strings in L(G).
10.3.4. Relationships Between Regular Languages and
Automata. In the previous section we saw that regular languages
coincide with the languages recognized by nondeterministic finite-state
automata. Here we will see that the term "nondeterministic" can be
dropped, so that regular languages are precisely those recognized by
(deterministic) finite-state automata. The idea is to show that given
any nondeterministic finite-state automaton it is possible to construct
an equivalent deterministic finite-state automaton recognizing exactly
the same set of strings. The main result is the following:
Let A = (S, I, f, σ, F) be a nondeterministic finite-state automaton.
Then A is equivalent to the finite-state automaton A′ = (S′, I′, f′, σ′, F′),
where:
1. S′ = P(S).
2. I′ = I.
3. σ′ = {σ}.
4. F′ = {X ⊆ S | X ∩ F ≠ ∅}.
5. f′(X, x) = the union of f(S, x) over S ∈ X, and f′(∅, x) = ∅.
Example: Find a (deterministic) finite-state automaton A′ equivalent
to the following nondeterministic finite-state automaton A:
[Transition diagram: start → σ; σ loops on b and goes to C on a; C
loops on b and also goes to the accepting state F on b.]
Answer: The set of input symbols is the same as that of the given
automaton: I′ = I = {a, b}. The set of states is the set of subsets of
S = {σ, C, F}, i.e.:
S′ = {∅, {σ}, {C}, {F}, {σ, C}, {σ, F}, {C, F}, {σ, C, F}}.
The starting state is {σ}. The final states of A′ are the elements of S′
containing some final state of A:
F′ = {{F}, {σ, F}, {C, F}, {σ, C, F}}.
Then for each element X of S′ we draw an edge labeled x from X
to the union of f(S, x) over S ∈ X (and from ∅ to ∅):
[Transition diagram of A′; its transitions are:
{σ}: a → {C}, b → {σ}
{C}: a → ∅, b → {C, F}
{F}: a → ∅, b → ∅
{σ, C}: a → {C}, b → {σ, C, F}
{σ, F}: a → {C}, b → {σ}
{C, F}: a → ∅, b → {C, F}
{σ, C, F}: a → {C}, b → {σ, C, F}
∅: a → ∅, b → ∅]
We notice that some states are unreachable from the starting state.
After removing the unreachable states we get the following simplified
version of the finite-state automaton:
[Transition diagram: start → {σ}; {σ} loops on b and goes to {C} on
a; {C} goes to the accepting state {C, F} on b and to ∅ on a; {C, F}
loops on b and goes to ∅ on a; ∅ loops on a and b.]
So, once it is proved that every nondeterministic finite-state automaton
is equivalent to some deterministic finite-state automaton, we obtain
the main result of this section: a language L is regular if and only
if there exists a finite-state automaton that recognizes precisely the
strings in L.
APPENDIX A
A.1. Efficient Computation of Powers Modulo m
We illustrate an efficient method of computing powers modulo m
with an example. Assume that we want to compute 3⁵⁴⁷ mod 10.
First write 547 in base 2: 1000100011, hence 547 = 2⁹ + 2⁵ + 2 + 1 =
((2⁴ + 1) · 2⁴ + 1) · 2 + 1, so:
3⁵⁴⁷ = ((3^(2⁴) · 3)^(2⁴) · 3)² · 3.
Next we compute the expression beginning with the inner parenthesis,
and reducing modulo 10 at each step: 3² = 9 (mod 10), 3^(2²) = 9² =
81 = 1 (mod 10), 3^(2³) = 1² = 1 (mod 10), 3^(2⁴) = 1² = 1 (mod 10),
3^(2⁴) · 3 = 1 · 3 = 3 (mod 10), etc. At the end we find 3⁵⁴⁷ = 7
(mod 10).
The algorithm in pseudocode would be like this:
1: procedure pow mod(a,x,m) ¦computes a^x mod m¦
2: p := 1
3: bx := binary array(x) ¦x as a binary array¦
4: t := a mod m
5: for k := 1 to length(bx)
6: begin
7: p := (p * p) mod m
8: if bx[k] = 1 then
¦if kth binary digit of x is 1¦
9: p := (p * t) mod m
10: end
11: return p
12: end pow mod
150
www.jntuworld.com
w
w
w
.
j
n
t
u
w
o
r
l
d
.
c
o
m
A.1. EFFICIENT COMPUTATION OF POWERS MODULO M 151
The following is a program in C implementing the algorithm (it
assumes x > 0):

int pow(int a, int x, int m) {
    int p = 1;
    int y = 1 << (8 * sizeof(int) - 2);  /* highest usable bit mask */
    a %= m;
    while (!(y & x)) y >>= 1;            /* find the top 1 bit of x */
    while (y) {
        p *= p;
        p %= m;
        if (x & y) {
            p *= a;
            p %= m;
        }
        y >>= 1;
    }
    return p;
}
The following is an alternative algorithm, equivalent to running
through the binary representation of the exponent from right to left
instead of left to right:
1: procedure pow_mod(a,x,m)  {computes a^x mod m}
2:   p := 1
3:   t := a mod m
4:   while x > 0
5:   begin
6:     if x is odd then
7:       p := (p * t) mod m
8:     t := (t * t) mod m
9:     x := floor(x/2)
10:  end
11:  return p
12: end pow_mod
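The right-to-left variant translates directly to Python; a sketch (pow_mod is the procedure name used above), checked on the worked example:

```python
def pow_mod(a, x, m):
    """Compute a^x mod m by scanning the bits of x right to left."""
    p, t = 1, a % m
    while x > 0:
        if x % 2 == 1:        # current bit of x is 1
            p = (p * t) % m
        t = (t * t) % m       # square for the next bit
        x //= 2
    return p

print(pow_mod(3, 547, 10))    # 7, as computed above
assert pow_mod(3, 547, 10) == pow(3, 547, 10)  # agrees with the built-in
```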
A.2. Machines and Languages
A.2.1. Turing Machines. A Turing machine is a theoretical
device intended to define rigorously the concept of algorithm. It consists
of:
1. An infinite tape made of a sequence of cells. Each cell may be
empty or may contain a symbol from a given alphabet.
2. A control unit containing a finite set of instructions.
3. A tape head able to read and write (or delete) symbols from the
tape.
Figure A.1. Turing Machine.
Each machine instruction contains the following ﬁve parts:
1. The current machine state.
2. A tape symbol read from the current tape cell.
3. A tape symbol to write into the current tape cell.
4. A direction for the tape head to move: L = ’move one cell to
the left’, R = ’move one cell to the right’, S = ’stay in the
current cell’.
5. The next machine state.
Turing machines are generalizations of ﬁnitestate automata. A
ﬁnitestate automaton is just a Turing machine whose tape head moves
always from left to right and never writes to the tape. The input of
the ﬁnitestate automaton is presented as symbols written in the tape.
In general we make the following assumptions:
1. An input is represented on the tape by placing the letters of
the strings in contiguous tape cells. All other cells contain the
blank symbol, which we may denote λ.
2. The tape head is initially positioned at the leftmost cell of the
input string unless specified otherwise.
3. There is one starting state.
4. There is one halt state, which we denote by “Halt”.
The execution of a Turing machine stops when it enters the Halt state
or when it enters a state for which there is no valid move. The output
of the Turing machine is the contents of the tape when the machine
stops.
We say that an input string is accepted by a Turing machine if
the machine enters the Halt state. Otherwise the string is rejected.
This can happen in two ways: by entering a state other than the Halt
state from which there is no move, or by running forever (for instance
executing an inﬁnite loop).
If a Turing machine has at least two instructions with the same state
and input letter, then the machine is nondeterministic. Otherwise it is
deterministic.
Finite-State Automata. A finite-state automaton can be interpreted as a Turing machine whose tape head moves only from left to right and never writes to the tape.
Pushdown Automata. A pushdown automaton is a finite-state automaton with a stack, i.e., a storage structure in which symbols can be inserted and extracted by two operations: push (place on the top of the stack) and pop (take from the top of the stack); consequently the last symbol put into the stack is the first symbol taken out. Additionally there is a third operation, nop, that leaves the stack intact. The next-state function takes into account not only the current state and the symbol read from the input, but also the symbol at the top of the stack. After reading the next input symbol and the symbol at the top of the stack, the automaton executes a stack operation and goes to the next state. Initially there is a single symbol in the stack.
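A minimal sketch of this idea (our own encoding, not from the notes): the language {a^n b^n | n ≥ 1} can be recognized by pushing a stack symbol for each a and popping one for each b:

```python
def accepts_anbn(s):
    """Pushdown-style recognizer for a^n b^n (n >= 1), with an explicit stack."""
    stack = ["Z"]              # initial stack symbol
    state = "reading_as"
    for c in s:
        if state == "reading_as" and c == "a":
            stack.append("A")              # push: count one a
        elif c == "b" and stack[-1] == "A":
            stack.pop()                    # pop: match one b against one a
            state = "reading_bs"
        else:
            return False                   # no valid move: reject
    # accept iff at least one pair was matched and the stack is back to Z
    return state == "reading_bs" and stack == ["Z"]
```

The last symbol pushed is always the first one popped, which is exactly the matching discipline a^n b^n requires.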
Linearly Bounded Automata. A linearly bounded automaton is a
Turing machine whose tape is limited to the size of its input string
plus two boundary cells that may not be changed.
Computable Functions. Consider a Turing machine T working on symbols from an alphabet with only one symbol, A = {|} ("stroke"). Let f : N → N be the function defined so that f(n) = m means that if the
initial input of T consists of a string of n + 1 strokes, the output of T is a string of m + 1 strokes. We say that f is computed by the Turing
machine T. A computable function is a function computed by some
Turing machine. A computable function f(n) halts for a given value
of its argument n if T with input n + 1 strokes halts. A computable
function f is total if f(n) halts for every n.
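The stroke convention can be illustrated as follows (the helper names are ours, purely illustrative): a number n is written as n + 1 strokes, so that 0 still has a nonempty representation, and a machine computes f when it turns the representation of n into the representation of f(n):

```python
def encode(n):
    """Represent the natural number n as a string of n + 1 strokes."""
    return "|" * (n + 1)

def decode(s):
    """Recover the number represented by a string of strokes."""
    return s.count("|") - 1

# A machine that appends one stroke to its input computes the successor
# function f(n) = n + 1 (modeled here as a plain string transformation):
def successor_machine(tape):
    return tape + "|"
```

For instance, decode(successor_machine(encode(4))) recovers 5.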
An eﬀective enumeration of a set is a listing of its elements by an
algorithm.
A.2.2. Hierarchy of Languages. Here we mention a hierarchy
of languages that includes (and extends) Chomsky’s classiﬁcation, in
increasing order of inclusion.
1. Regular languages. They are recognized by finite-state automata. Example: {a^m b^n | m, n = 1, 2, 3, ...}.
2. Deterministic context-free languages, recognized by deterministic pushdown automata. Example: {a^n b^n | n = 1, 2, 3, ...}.
3. Context-free languages, recognized by nondeterministic pushdown automata. Example: palindromes over {a, b}.
4. Context-sensitive languages, languages without λ recognized by linearly bounded automata. Example: {a^n b^n c^n | n = 1, 2, 3, ...}.
5. Unrestricted or phrase-structure languages, recognized by Turing machines.
6. Recursively enumerable languages. A language is recursively enumerable if there is a Turing machine that outputs all the strings of the language. Example: {a^n | f_n(n) halts}, where f_0, f_1, f_2, ... is an effective enumeration of all computable functions.
7. Nongrammatical languages, languages that are not definable by any grammar and cannot be recognized by Turing machines. Example: {a^n | f_n is total}.
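As an illustrative contrast between the first two levels (our own sketch, not from the notes): membership in a^m b^n can be decided by a finite-state device, here simulated with a regular expression, while a^n b^n requires comparing two unbounded counts, which no finite amount of state can do:

```python
import re

def regular_ambn(s):
    """a^m b^n (m, n >= 1): a finite-state check, via a regular expression."""
    return re.fullmatch(r"a+b+", s) is not None

def contextfree_anbn(s):
    """a^n b^n (n >= 1): the counts of a's and b's must be equal."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n
```

Every string in a^n b^n also passes the regular check, but the regular check cannot tell a^2 b^3 from a^2 b^2; equality of the exponents is what pushes the language out of the regular class.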
www.jntuworld.com
www.jntuworld.com
w
w w
jn .
w tu
r o
.c ld
m o
www.jntuworld.com
Contents
Introduction Chapter 1. Logic, Proofs 1.1. Propositions 1.2. Predicates, Quantiﬁers 1.3. Proofs Chapter 2. Sets, Functions, Relations 2.1. Set Theory 2.2. Functions 2.3. Relations 5
Chapter 3. Algorithms, Integers 3.1. Algorithms 3.2. The Euclidean Algorithm 3.3. Modular Arithmetic, RSA Algorithm Chapter 4. Induction, Recurences 4.1. Sequences and Strings 4.2. Mathematical Induction 4.3. Recurrence Relations
w
Chapter 5. Counting 5.1. Basic Principles 5.2. Combinatorics 5.3. Generalized Permutations and Combinations 5.4. Binomial Coeﬃcients 5.5. The Pigeonhole Principle Chapter 6. Probability 6.1. Probability Chapter 7. Graph Theory 7.1. Graphs 7.2. Representations of Graphs 7.3. Paths and Circuits
3
w w
jn .
w tu
r o
.c ld
m o
19 19 27 32 38 38 48 52 59 59 62 65 69 69 71 73 75 77 78 78 82 82 88 91
6 6 11 13
3. Planar Graphs Chapter 8. Automata.4. w tu r o .2.2.www. 10.2. Applications Chapter 10. Boolean Algebras 9. 10. Eﬃcient Computation of Powers Modulo m A. Tree Transversal 8. Machines and Languages w w w jn .c ld m o 133 133 137 144 150 150 152 . 10. Binary Trees 8. Boolean Functions.com CONTENTS 4 7.4. Trees 8.2. Spanning Trees Chapter 9. Trees 8.3.1. Decision Trees.1. Combinatorial Circuits 9. Grammars and Languages Finite State Machines Languages and Grammars Language Recognition 97 100 100 102 104 113 116 122 122 127 Appendix A. Tree Isomorphisms 8.jntuworld.1. A.1.5.
northwestern. w Miguel A. The textbook for this course is Keneth H. Rosen: Discrete Mathematics and Its Applications.edu/~mlerma/courses/cs31005s/ 5 .www.math. if you ﬁnd any typos or errors. so these notes will not be completely ﬁnished until the end of the quarter.c ld m o http://www.jntuworld. With few exceptions I will follow the notation in the book.northwestern. Lerma mlerma@math. or you have any suggestions.edu Northwestern University Spring 2005 w w jn .com Introduction These notes are intended to be a summary of the main ideas in course CS 310: Mathematical Foundations of Computer Science. Fifth Edition. I will recommend exercises and give homework assignments separately. They are not homework nor need to be addressed at all if the reader does not wish to. I may keep working on this document as the course goes on. do not hesitate to let me know. These notes contain some questions and “exercises” intended to stimulate the reader who wants to play a somehow active role while studying the subject. Finally. w tu r o . 2003. please. McGrawHill.
“x is an even number” (it depends on what x represents).com CHAPTER 1 Logic.1. Proofs 1. but not both” Implication p→q “if p then q” Biconditional p↔q “p if and only if q” The truth value of a compound proposition depends only on the value of its components. Truth Tables.www. 1.c ld m o Name Represented Meaning Negation ¬p “not p” Conjunction p∧q “p and q” Disjunction p∨q “p or q (or both)” Exclusive Or p⊕q “either p or q. “do your homework” (this is a command). “2 < 4” (true). “London is in Denmark” (false). The truth or falsehood of a proposition is called its truth value. Writing F for “false” and T for “true”. The main ones are the following (p and q represent given propositions): w w w jn . Connectives are used for making compound propositions.jntuworld. “this sentence is false” (neither true nor false).1. Propositions A proposition is a declarative sentence that is either true or false (but not both). Connectives. “Socrates” (it is not even a sentence). w tu r o . However the following are not propositions: “what is your name?” (this is a question). For instance. we can summarize the meaning of the connectives in the following way: 6 . the following are propositions: “Paris is in France” (true). “4 = 7 (false)”.1.
p ⊕ q is true only when exactly one of p and q is true. and the proposition q is the conclusion or consequent. q is true and also when both are true. Contingency. So.1. p ¬p p ∨ ¬p p ∧ ¬p T F T F T F T F F T T F F T T F 6 6 w tu tautology r o .e. A proposition that is neither a tautology nor a contradiction is called a contingency. p ∨ q is true when any of p. . 3. i..www. w w w jn . Contradiction. For instance: “if John is from Chicago then John is from Illinois”.1. “if London is in Denmark then 2 < 4” (false → true). Conditional Propositions.. 1. The proposition p is called hypothesis or antecedent. A proposition is said to be a tautology if its truth value is T for any assignment of truth values to its components.c ld m o contradiction 1. Example: The proposition p ∨ ¬p is a tautology. 1. Note that p → q is true always except when p is true and q is false.e. i.com 1. A proposition of the form “if p then q” or “p implies q”. On the other hand ⊕ represents an exclusive or. represented “p → q” is called a conditional proposition. Example: The proposition p ∧ ¬p is a contradiction.1.jntuworld. the following sentences are true: “if 2 < 4 then Paris is in France” (true → true).3. A proposition is said to be a contradiction if its truth value is F for any assignment of truth values to its components. 2.2. Tautology. PROPOSITIONS 7 p T T F F q ¬p p ∧ q p ∨ q p ⊕ q p → q p ↔ q T F T T F T T F F F T T F F T T F T T T F F T F F F T T Note that ∨ represents a nonexclusive or.
regardless of the truth value of q.www.com 1. Logical Equivalence. That implication is obviously true.jntuworld. The proposition p ↔ q. 1. The following propositions are logically equivalent: ¬(p ∨ q) ≡ ¬p ∧ ¬q ¬(p ∧ q) ≡ ¬p ∨ ¬q We can check it by examining their truth tables: . For instance p → q and ¬p ∨ q are logically equivalent. Example: De Morgan’s Laws for Logic.e. This will become clearer when we study predicates such as “if x is a multiple of 4 then x is a multiple of 2”. However the following one is false: “if 2 < 4 then London is in Denmark” (true → false). i.1.c ld m o p → q ≡ ¬p ∨ q Note that that two propositions A and B are logically equivalent precisely when A ↔ B is a tautology. they are called logically equivalent. and we write it: w w w jn . PROPOSITIONS 8 “if 4 = 7 then London is in Denmark” (false → false). w tu r o . It is true precisely when p and q have the same truth value. In might seem strange that “p → q” is considered true when p is false.1. they are both true or both false.. Note that the compound propositions p → q and ¬p ∨ q have the same truth values: p T T F F q ¬p ¬p ∨ q p → q T F T T F F F F T T T T F T T T When two compound propositions have the same truth values no matter what truth value their constituent propositions have. although for the particular case x = 3 it becomes “if 3 is a multiple of 4 then 3 is a multiple of 2”.4. is called biconditional. read “p if and only if q”.
saying that “John is married if and only if he has a spouse” is the same as saying “if John is married then he has a spouse” and “if he has a spouse then he is married”.c ld m o . the biconditional proposition is equivalent to the conjunction of a conditional proposition an its converse.1. for instance. As we have seen. for instance “if John is from Chicago then John is from Illinois” is true.com 1. but the converse “if John is from Illinois then John is from Chicago” may be false.5.1.www. Contrapositive.jntuworld. PROPOSITIONS 9 p T T F F q ¬p ¬q p ∨ q ¬(p ∨ q) ¬p ∧ ¬q p ∧ q ¬(p ∧ q) ¬p ∨ ¬q T F F T F F T F F F F T T F F F T T T T F T F F F T T F T T F T T F T T Example: The following propositions are logically equivalent: p ↔ q ≡ (p → q) ∧ (q → p) Again. p ↔ q ≡ (p → q) ∧ (q → p) So. Converse. w w jn . Note that the converse is not equivalent to the given conditional proposition. this can be checked with the truth tables: p T T F F q p → q q → p (p → q) ∧ (q → p) p ↔ q T T T T T F F T F F T T F F F F T T T T Exercise: Check the following logical equivalences: w 1. ¬(p → q) ≡ p ∧ ¬q p → q ≡ ¬q → ¬p ¬(p ↔ q) ≡ p ⊕ q w tu r o . The converse of a conditional proposition p → q is the proposition q → p.
They are logically equivalent. PROPOSITIONS 10 The contrapositive of a conditional proposition p → q is the proposition ¬q → ¬p. w tu r o .www.c ld m o .jntuworld. For instance the contrapositive of “if John is from Chicago then John is from Illinois” is “if John is not from Illinois then John is not from Chicago”.com 1.1. w w w jn .
represented “∀x P (x)”. the sentence “for all x. P (x)”.c ld m o Analogously. while 1 is a counterexample. “x < y”. w w w jn . On the other hand.2.2. Q(x. 1Usually . ∃x Q(x) would be true if we extend the universe of discourse to the rational numbers. so the universe of discourse of n is the set of all integers. For instance if we replace x with 1 in the predicate “x + 2 = 7” we obtain “1 + 2 = 7”.jntuworld. such that P (5) is a true statement.1. namely 5. then ∃x P (x) is true.2.www. has a deﬁnite truth value. but in some situations (as in the so called manysorted logics) it is possible to use diﬀerent kinds of variables to represent diﬀerent types of objects belonging to diﬀerent universes of discourse. “X is American”. since there is indeed an integer. A counterexample is a value of x for which P (x) is false. For instance if P (x) is “x + 2 = 7” with the integers as universe of discourse. which is false. For instance in the predicate “σ is a string of length n” the variable σ represents a string. which is true. P (x)”.2. if P (x) is “x + 2 = 7” and the all variables occurring in predicates along a reasoning are supposed to belong to the same universe of discourse.2. PREDICATES. so the universe of discourse of σ is the set of all strings. for instance in the predicate “n is an odd integer” ’n’ represents an integer. “for every x. w tu r o . if Q(x) is “2x = 7” and the universe of discourse is still the integers.com 1. However. but if we replace it with 5 we get “5 + 2 = 7”. represented “∃x P (x)”. P (x)” (or “there is some x such that p(x)”). The symbol ∃ is called the existential quantiﬁer. Quantiﬁers. The truth value of the predicate depends on the value assigned to its variables. Predicates. QUANTIFIERS 11 1. P (x)”—also “for any x. Each variable in a predicate is assumed to belong to a universe (or domain) of discourse. Predicates. so it is a proposition in the usual sense. Quantiﬁers 1. the statement “for some x. 
while the universe of discourse of n is the set of natural numbers. y). For instance “x + 2 = 7”. In “X is American” we may assume that X is a human being. while n represents a natural number. We represent a predicate by a letter followed by the variables enclosed between parenthesis: P (x). P (x)”—. etc. so in this case the universe of discourse is the set of all human beings.1 1. “p is a prime number” are predicates. Given a predicate P (x). So. then ∃x Q(x) is false. “for each x. For instance. has a deﬁnite truth value. 5 is an example for “x + 2 = 7”. A predicate or propositional function is a statement containing variables. An example for P (x) is a value of x for which P (x) is true.
y) .) .e. P (x) is always false. If ∃x P (x) is false then there is no value of x for which P (x) is true. but ∃y∀x P (x.. if ∀x P (x) is false then it is not true that for every x. In predicates with more than one variable it is possible to use several quantiﬁers at the same time. ¬∃x P (x) ≡ ∀x ¬P (x) . Answer : The statement is: ∀x ∃y (x < y) (the universe of discourse is the real numbers). ∀x∃y P (x. y.www. Thus: w w w jn . or in other words. y) ≡ ∀x∃y ¬P (x. y) means something diﬀerent from ∃y∀x P (x. y) ≡ ∀x¬∀y P (x. This two rules can be applied in successive steps to ﬁnd the negation of a more complex quantiﬁed statement. y. A partially quantiﬁed predicate is still a predicate. Hence On the other hand.com 1. hence for some x. PREDICATES. (Note that among real numbers x < y is equivalent to x ≥ y. P (x) holds. QUANTIFIERS 12 universe of discourse is the integers. A predicate can be partially quantiﬁed. y). The symbol ∀ is called the universal quantiﬁer. Its negation is: ∃x ∀y ¬(x < y). For instance if x and y represent human beings and P (x.e. i. but depending on fewer variables. then ∀x∃y P (x. for instance ∀x∀y∃z P (x.3. t)..c ld m o ¬∀x P (x) ≡ ∃x ¬P (x) . However if Q(x) represents “(x + 1)2 = x2 + 2x + 1” then ∀x Q(x) is true.2. 1. y) means that there is someone such that everybody is his or her friend. P (x) must be false. z. Note that in general the existential and universal quantiﬁers cannot be swapped. z)”. z). and the rest (z and t in the example) are called free variables. y) represents “x is a friend of y”. meaning “for all x and all y there is some z such that P (x. Write the negation of that statement. i. for instance: ¬∃x∀y p(x.2.jntuworld. w tu r o . Exercise: Write formally the statement “for every real number there is a greater real number”. Generalized De Morgan Laws for Logic. y) means that everybody is a friend of someone. ∃x ∀y (x < y). The variables quantiﬁed (x and y in the example) are called bound variables. then ∀x P (x) is false. 
e. but formally they are diﬀerent predicates.g. y. in general ∀x∃y P (x.
Deﬁnitions: used to create new concepts from old ones. so x ≤ 2 and y ≤ 3. In a direct proof we assume the hypothesis together with axioms and other theorems previously proved and we derive the conclusion from them. Proof by Contradiction. An indirect proof or proof by contrapositive consists of proving the contrapositive of the desired implication.3..e. Adding those inequalities we get . Proofs 1.jntuworld. 3. w tu r o .. i.e. An argument that establishes the truth of a proposition is called a proof. Proofs. instead of proving p → q we prove ¬q → ¬p. A theorem is a proposition that can be proved to be true. so adding both inequalities we get x + y ≤ 5.3. i. and try to derive a contradiction. Undeﬁned terms: corresponding to the primitive concepts of the system (for instance in set theory the term “set” is undeﬁned).www. Assume to the contrary that “x > 2 or y > 3” is false. Axioms: propositions that are assumed true. w Answer : We must prove that x + y > 5 → (x > 2) ∨ (y > 3). w w jn .1. From here we must conclude that x > 2 or y > 3.com 1. In fact: ¬((x > 2) ∨ (y > 3)) is the same as (x ≤ 2)∧(y ≤ 3). Example: Prove that if x > 2 and y > 3 then x + y > 5. PROOFS 13 1. In a proof by contradiction or (Reductio ad Absurdum) we assume the hypotheses and the negation of the conclusion. Answer : We assume the hypothesis x + y > 5. Answer : Assuming x > 2 and y > 3 and adding the inequalities term by term we get: x + y > 2 + 3 = 5. which is the same as ¬(x + y > 5).3. a proposition of the form r ∧ ¬r. That is an example of direct proof. A Mathematical System consists of: 1. Mathematical Systems. 2. Example: Prove that if x + y > 5 then x > 2 or y > 3. An indirect proof consists of proving ¬((x > 2) ∨ (y > 3)) → ¬(x + y > 5).c ld m o Example: Prove by contradiction that if x + y > 5 then either x > 2 or y > 3.
but this implies that a itself is even. . In an indirect proof we prove an implication of the form p → q by proving the contrapositive ¬q → ¬p. √ i. . . . Remark : Sometimes it is diﬃcult to distinguish between an indirect proof and a proof by contradiction. Hence: 2 b2 = 4 a 2 .www. Squaring both sides we have 2 = a2 /b2 . . 1. In an proof by contradiction we prove an statement s (which may or may not be an implication) by assuming ¬s and deriving a contradiction. otherwise it is called invalid. w tu p1 p2 . so b is even: b = 2b .jntuworld.c ld m o p 1 . Rules of Inference. so a = 2 a . pn are true. there are no integers a. An argument is usually written: w or w w jn . b such that 2 = a/b. . . Arguments. . and simplifying: b2 = 2 a 2 . √ √ Answer : Assume that 2 is rational. . p2 . pn ∴ q r o . .3. p2 . .. . so “x > 2 or y > 3” must be true. i. . PROOFS 14 x ≤ 2 + 3 = 5. . which contradicts the hypothesis x + y > 5. In fact proofs by contradiction are more general than indirect proofs.e. hence 2 b2 = a2 .com 1. p n / ∴ q The argument is called valid if q is true whenever p1 . . Since the left hand side is even. √ Exercise: Prove by contradiction that 2 is not a rational number. contradicting the hypothesis that a/b was in least terms. This implies that b2 is even.. An argument is a sequence of propositions p1 . Consequently a/b = 2a /2b = a /b . then a2 is even. p2 . 2 = a/b.2.e.3. From here we conclude that the assumption “x ≤ 2 and y ≤ 3” cannot be right. pn called hypotheses (or premises) followed by a proposition q called conclusion. where a and b are integers and the fraction is written in least terms.
Conjunction: w w jn .3.www. w tu r o .c ld m o p→q p ∴ q 3. Modus Tollens: p→q ¬q ∴ ¬p p ∴ p∨q 4. so the argument is valid. Simpliﬁcation: p∧q ∴ p 5. For instance the following argument is called modus ponens or rule of detachment: p→q p ∴ q In order to check whether it is valid we must examine the following truth table: p T T F F q p→q T T F F T T F T p T T F F q T F T F If we look now at the rows in which both p → q and p are true (just the ﬁrst row) we see that also q is true. Other rules of inference are the following: 1. Modus Ponens or Rule of Detachment: w 2. Addition: .jntuworld.com 1. PROOFS 15 Rules of inference are certain simple arguments known to be valid and used to make a proof step by step.
That justiﬁcation can be one of the following: 1. The statement is a premise. Each row contains a label. If I walk I get tired.com 1. W → T and ¬T . W = “I walk” and T = “I get tired”.” We can formalize this by calling B = “I take the bus”.jntuworld. w tu r o . PROOFS 16 p q ∴ p∧q 6. Disjunctive Syllogism: p∨q ¬p ∴ q 8. and the conclusion is B. Resolution: p∨q ¬p ∨ r ∴ q∨r Arguments are usually written using three columns. The premises are B ∨ W .2.www. Modus Tollens Premise 4. The argument can be described in the following steps: step statement reason 1) 2) 3) 4) 5) W →T ¬T ¬W B∨W ∴B Premise Premise 1. Hypothetical Syllogism: p→q q→r ∴ p→r 7. I do not get tired. The statement can be derived from statements occurring earlier in the argument by using a rule of inference. Therefore I take the bus. Disjunctive Syllogism w w jn .3. a statement and the reason that justiﬁes the introduction of that statement in the argument.c ld m o . w Example: Consider the following statements: “I take the bus or I walk. 2.3.
e. then ∀x p(x) is true. Rules of Inference for Quantiﬁed Statements.e. w tu r o . but they can be generalized to predicates with two or more variables. from ∀x (x+1 = 1+x) we can derive 7+1 = 1+7. . We state the rules for predicates with one variable. then p(a) is true for some speciﬁc element a in the universe of discourse. 1. i. for instance. such √ that (± 2)2 = 2. Universal Generalization. If p(a) is true for some speciﬁc element a in the universe of discourse.e. 3.com 1. i. If p(x) is proved to be true for a generic element in the universe of discourse. 2. If ∀x p(x) is true.3. we can prove ∀x [(x + 1)2 = x2 + 2x + 1] (say.jntuworld. PROOFS 17 1. Existential Generalization. for instance.: ∃x p(x) ∴ p(a) The diﬀerence respect to the previous rule is the restriction in the meaning of a. So..c ld m o By “generic” we mean an element for which we do not make any assumption other than its belonging to the universe of discourse. then ∃x p(x) is true. which now represents some (not any) element of the universe of discourse.: ∀x p(x) ∴ p(a) For instance.: p(a) ∴ ∃x p(x) For instance: from 7 + 1 = 8 we can derive ∃x (x + 1 = 8). Universal Instantiation. if a is an element in the universe of discourse. Example: Show that a counterexample can be used to disprove a universal statement. for real numbers) by assuming that x is a generic real number and using algebra to prove (x + 1)2 = x2 + 2x + 1. So. If ∃x p(x) is true.www. which we may represent ± 2.: p(x) ∴ ∀x p(x) w w w jn . i. i. from ∃x (x2 = 2) (the universe of discourse is the real numbers) we derive the √ existence of some element. 4.3. then p(a) is true for each speciﬁc element a in the universe of discourse. i. Existential Instantiation.e.3.e.
3.c ld m o .jntuworld.com 1. PROOFS 18 then from ¬p(a) we can derive ¬∀x p(x). Answer : The argument is as follows: step statement reason 1) ¬p(a) 2) ∃x ¬p(x) 3) ¬∀x p(x) Premise Existential Generalization Negation of Universal Statement w w w jn . w tu r o .www.
com CHAPTER 2 Sets. 3. The symbol ∈ is used to express that an element is (or belongs to) a set. 2. −1. 4. −3. 5}. S − = set of negative elements in S. w w jn . S ∗ = set of elements in S excluding zero. is by stating a property (predicate) P (x) veriﬁed by exactly its elements.1. its number of elements is represented A. 3. · · · } = the set of positive integers. for instance 3 ∈ A. e. 0. 2. 4. for instance Z + = {1. Q = the set of rational numbers.www. If the set is ﬁnite. 2. 4. 5. for instance A = {x ∈ Z  1 ≤ x ≤ 5} = “set of that N includes zero—for some authors N = {1.g. called elements of the set.1. N = {0. · · · } = the set of negative integers. Sets. −2. if A = {1. A set is a collection of objects. without zero. S + = set of positive elements in S. for instance Z− = {−1. w tu r o . 7 ∈ A. R = the set of real numbers. called setbuilder notation. · · · } = the set of natural numbers. Relations 2.g. A set can be represented by listing its elements between braces: A = {1. 3. · · · }. e. 2. 3. Is S is one of those sets then we also use the following notations:2 w 1. 2. working with strings we will use a similar notation with a diﬀerent meaning—be careful not to confuse it. 2. −3. 3.1.c ld m o Setbuilder notation. An alternative way to deﬁne a set. Set Theory 2. 1. 2. for instance R∗ = the set of non zero real numbers. 5} then A = 5. · · · } = the set of integers. Functions. 2When 1Note 19 . C = the set of complex numbers. 3.1 Z = {· · · . −2. Some important sets are the following: 1.jntuworld. 3. 1. Its negation is represented by ∈. 2. 3.
B ∈ A. 3Properly speaking. b} would be considered diﬀerent. However. and is represented by ∅ or {}. We say that A is a subset of set B. 3}. So. 2.www.g. i. In that case we use multisets. and is represented P(A). so for instance. c. For instance if A = {1. t}. {a. Two ordinary sets are identical if they have the same elements. since in the ﬁrst one the element a occurs twice and in the second one it occurs only once. then P(A) = {∅. Subset. but with possibly repeated elements.com 2. Power Set. In general: A = {x ∈ U  p(x)}.: A = {1. then obviously B is an element of A.e. where U is the universe of discourse in which the predicate P (x) must be interpreted. 2. or A is contained in B. as multisets.. or A = {x  P (x)} if the universe of discourse for P (x) is implicitly understood. SET THEORY 20 integers x such that 1 ≤ x ≤ 5”—i. b} and {a. in some applications it might be useful to allow repeated elements in a set. 5}. A} .e. {2}.e. b. {3.1.e. if A = {a. Note that nothing prevents a set from possibly being an element of another set (which is not the same as being a subset!). b. if A = {1. 2. For instance. 3}. {1}. 3}. w tu r o . if A ⊆ B but A = B. if all elements of A are in B. there is some element in B which is not in A. Exercise: Prove by induction that if A = n then P(A) = 2n . 3. e} then A ⊆ B. a. The collection of all subsets of a set A is called the power set of A. and we represent it “A ⊆ B”. A set with no elements is called empty set (or null set. 2}.jntuworld. e. {1. or void set). 3}} and B = {3. {2. t}. {1. c} and B = {a. i.c ld m o Multisets.. represented “A ⊂ B”.. w w w jn . d. Two sets are equal if and only if they have the same elements. {1. a. b} and {a.: A = B ≡ ∀x (x ∈ A ↔ x ∈ B) . i. a.3 Principle of Extension. Empty Set. {3}. the universe of discourse of set theory is the collection of all sets (which is not a set). b} are the same set because they have exactly the same elements. 4. namely a and b. {a. 
which are mathematical entities similar to sets. . A is a proper subset of B. In set theory the term universal set is often used in place of “universe of discourse” for a given predicate.
1.1.com 2.www. Union A ∪ B. . For instance. The other ﬁgures represent various set operations.1. w w w jn . Venn Diagrams. Figure 2.3.2.c ld m o Figure 2. A Figure 2. in ﬁgure 2.2. A w tu A B r o B .jntuworld. Intersection A ∩ B. SET THEORY 21 2. Venn Diagram.1. the rectangle represents the universal set (the set of all elements considered in a given problem) and the shaded region represents a set A. Venn diagrams are graphic representations of sets as enclosed areas in the plane.
A B Figure 2. Symmetric Diﬀerence A ⊕ B. Complement A.3.6. Set Operations. Union: The set of elements that belong to either of two sets: A ∪ B = {x  (x ∈ A) ∨ (x ∈ B)} . the sets are said to be disjoint.com 2. SET THEORY 22 A Figure 2.jntuworld.1.www. . A w tu r o B . Intersection: The common elements of two sets: A ∩ B = {x  (x ∈ A) ∧ (x ∈ B)} . 2.4.c ld m o Figure 2. w w w jn . 1. 2.1. If A ∩ B = ∅.5. Diﬀerence A − B.
We see that (M ∩P )−(M ∩P ∩C) = 100−10 = 90. their symmetric diﬀerence is the set of elements that belong to either one or the other set but not both. 2. P and C be the sets of students taking Mathematics courses.1.4. P  = 350.7) w w w jn . (M ∩C)−(M ∩ P ∩ C) = 150 − 10 = 140 and (P ∩ C) − (M ∩ P ∩ C) = 75 − 10 = 65.7. A Venn diagram with n sets intersecting in the most general way divides the plane into 2n regions. 5. Assume M  = 300. Example: Let M . 2. M ∩ P ∩ C = 10. Symmetric Diﬀerence: Given two sets. M ∩ C = 150. .jntuworld. 4. Physics courses and Computer Science courses respectively in a university. How many students are taking exactly one of those courses? (ﬁg. M w tu 60 90 10 140 65 235 C r o P .1. If we have information about the number of elements of some portions of the diagram. M ∩ P  = 100.com 2. C = 450. SET THEORY 23 3. then we can ﬁnd the number of elements in each of the regions and use that information for obtaining the number of elements in other portions of the plane.www.c ld m o 185 Figure 2. P ∩ C = 75. Counting with Venn diagrams. Diﬀerence or Relative Complement: The set of elements that belong to a set but not to another: A − B = {x  (x ∈ A) ∧ (x ∈ B)} = A ∩ B . A ⊕ B = {x  (x ∈ A) ⊕ (x ∈ B)} . It can be expressed also in the following way: A ⊕ B = A ∪ B − A ∩ B = (A − B) ∪ (B − A) . Complement: The set of elements (in the universal set) that do not belong to a given set: A = {x ∈ U  x ∈ A} . Counting with Venn Diagrams.
2.jntuworld.www.1. w tu A∪∅=A A∩U=A A∪A=U A∩A=∅ r o . Properties of Sets. Involution Law: A=A w w jn .c ld m o 7.1. Identity Laws: 5. The set operations verify the following properties: 1. Bound Laws: . Distributive Laws: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) 4. The sum 60 + 185 + 235 = 480 is the number of students taking exactly one of those courses. Absorption Laws: A ∪ (A ∩ B) = A A ∩ (A ∪ B) = A 9. Associative Laws: A ∪ (B ∪ C) = (A ∪ B) ∪ C A ∩ (B ∩ C) = (A ∩ B) ∩ C 2. Idempotent Laws: A∪A=A A∩A=A A∪U=U A∩∅=∅ 8. Complement Laws: w 6.5. SET THEORY 24 Then the region corresponding to students taking Mathematics courses only has cardinality 300−(90+10+140) = 60.com 2. Commutative Laws: A∪B =B∪A A∩B =B∩A 3. Analogously we compute the number of students taking Physics courses only (185) and taking Computer Science courses only (235).
www.jntuworld.com
2.1. SET THEORY
25
10. 0/1 Laws: ∅=U U=∅ 11. DeMorgan’s Laws: A∪B =A∩B A∩B =A∪B 2.1.6. Generalized Union and Intersection. Given a collection of sets A1 , A2 , . . . , AN , their union is deﬁned as the set of elements that belong to at least one of the sets (here n represents an integer in the range from 1 to N ):
N
An = A1 ∪ A2 ∪ · · · ∪ AN = {x  ∃n (x ∈ An )} .
n=1
Analogously, their intersection is the set of elements that belong to all the sets simultaneously:
N
An = A1 ∩ A2 ∩ · · · ∩ AN = {x  ∀n (x ∈ An )} .
n=1
These deﬁnitions can be applied to inﬁnite collections of sets as well. For instance assume that Sn = {kn  k = 2, 3, 4, . . . } = set of multiples of n greater than n. Then
w
w w
∞ n=2
Sn = S2 ∪ S3 ∪ S4 ∪ · · · = {4, 6, 8, 9, 10, 12, 14, 15, . . . } = set of composite positive integers .
jn .
w tu
r o
.c ld
m o
2.1.7. Partitions. A partition of a set X is a collection S of non overlapping non empty subsets of X whose union is the whole X. For instance a partition of X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} could be S = {{1, 2, 4, 8}, {3, 6}, {5, 7, 9, 10}} . Given a partition S of a set X, every element of X belongs to exactly one member of S.
Example: The division of the integers Z into even and odd numbers is a partition: S = {E, O}, where E = {2n | n ∈ Z}, O = {2n + 1 | n ∈ Z}.
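The three defining conditions of a partition (nonempty parts, pairwise disjoint, union equal to X) can be tested mechanically. A sketch; the helper name `is_partition` is ours:

```python
# Check that a collection of subsets is a partition of X.
from itertools import combinations

def is_partition(parts, X):
    if any(len(p) == 0 for p in parts):          # no empty parts
        return False
    if any(p & q for p, q in combinations(parts, 2)):  # pairwise disjoint
        return False
    return set().union(*parts) == set(X)         # union is all of X

X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
S = [{1, 2, 4, 8}, {3, 6}, {5, 7, 9, 10}]        # the example above
print(is_partition(S, X))
```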
Example: The division of Z into negative integers, positive integers and zero is a partition: S = {Z+, Z−, {0}}.

2.1.8. Ordered Pairs, Cartesian Product. An ordinary pair {a, b} is a set with two elements. In a set the order of the elements is irrelevant, so {a, b} = {b, a}. If the order of the elements is relevant, then we use a different object called an ordered pair, represented (a, b). Now (a, b) ≠ (b, a) (unless a = b). In general (a, b) = (a′, b′) iff a = a′ and b = b′.

Given two sets A, B, their Cartesian product A × B is the set of all ordered pairs (a, b) such that a ∈ A and b ∈ B:
A × B = {(a, b) | (a ∈ A) ∧ (b ∈ B)}.
Analogously we can define triples or 3-tuples (a, b, c), 4-tuples (a, b, c, d), ..., n-tuples (a1, a2, ..., an), and the corresponding 3-fold, 4-fold, ..., n-fold Cartesian products:
A1 × A2 × ··· × An = {(a1, a2, ..., an) | (a1 ∈ A1) ∧ (a2 ∈ A2) ∧ ··· ∧ (an ∈ An)}.
If all the sets in a Cartesian product are the same, then we can use an exponent: A² = A × A, A³ = A × A × A, etc. In general:
Aⁿ = A × A × ··· × A   (n times).

An example of Cartesian product is the real plane R², where R is the set of real numbers (R is sometimes called the real line).
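Finite Cartesian products and n-fold products can be enumerated with `itertools.product`. A quick sketch with small example sets of our choosing:

```python
# Cartesian products via itertools.product.
from itertools import product

A = {1, 2}
B = {'a', 'b', 'c'}

AxB = set(product(A, B))        # A × B: all ordered pairs (a, b)
assert len(AxB) == len(A) * len(B)

A3 = set(product(A, repeat=3))  # A^3 = A × A × A
assert len(A3) == len(A) ** 3

print(sorted(AxB))
```

Note that `product` yields ordered tuples, so (1, 'a') and a hypothetical ('a', 1) are distinct, exactly as for ordered pairs.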
2.2. Functions

2.2.1. Correspondences. Suppose that to each element of a set A we assign some elements of another set B. For instance, A = N, B = Z, and to each element x ∈ N we assign all elements y ∈ Z such that y² = x (fig. 2.8).
[Figure 2.8. Correspondence x ↦ ±√x.]

This operation is called a correspondence.
2.2.2. Functions. A function or mapping f from a set A to a set B, denoted f : A → B, is a correspondence in which to each element x of A corresponds exactly one element y = f (x) of B (ﬁg. 2.9).
[Figure 2.9. Function.]

Sometimes we represent the function with a diagram like this:
f : A → B, x ↦ y.
For instance, the following represents the function from Z to Z defined by f(x) = 2x + 1:
f : Z → Z, x ↦ 2x + 1.
The element y = f(x) is called the image of x, and x is a preimage of y. For instance, if f(x) = 2x + 1 then f(7) = 2 · 7 + 1 = 15. The set A is the domain of f, and B is its codomain. If A′ ⊆ A, the image of A′ by f is f(A′) = {f(x) | x ∈ A′}, i.e., the subset of B consisting of all images of elements of A′. The subset f(A) of B consisting of all images of elements of A is called the range of f. For instance, the range of f(x) = 2x + 1 is the set of all integers of the form 2x + 1 for some integer x, i.e., all odd numbers.

Example: Two useful functions from R to Z are the following:
1. The floor function: ⌊x⌋ = greatest integer less than or equal to x. For instance: ⌊2⌋ = 2, ⌊2.3⌋ = 2, ⌊π⌋ = 3, ⌊−2.5⌋ = −3.
2. The ceiling function: ⌈x⌉ = least integer greater than or equal to x. For instance: ⌈2⌉ = 2, ⌈2.3⌉ = 3, ⌈π⌉ = 4, ⌈−2.5⌉ = −2.

Example: The modulus operator is the function mod : Z × Z+ → Z defined:
x mod y = remainder when x is divided by y.
For instance 23 mod 7 = 2 because 23 = 3 · 7 + 2, and 59 mod 9 = 5 because 59 = 6 · 9 + 5.

Graph: The graph of a function f : A → B is the subset of A × B defined by G(f) = {(x, f(x)) | x ∈ A} (fig. 2.10).

[Figure 2.10. Graph of f(x) = x².]

2.2.3. Types of Functions.

1. One-to-One or Injective: A function f : A → B is called one-to-one or injective if each element of B is the image of at most one element of A (fig. 2.11):
∀x, x′ ∈ A, f(x) = f(x′) ⇒ x = x′.
For instance, f(x) = 2x from Z to Z is injective.

[Figure 2.11. One-to-one function.]

2. Onto or Surjective: A function f : A → B is called onto or surjective if every element of B is the image of some element of A (fig. 2.12):
∀y ∈ B, ∃x ∈ A such that y = f(x).
For instance, f(x) = x² from R to R+ ∪ {0} is onto.

[Figure 2.12. Onto function.]

3. One-To-One Correspondence or Bijective: A function f : A → B is said to be a one-to-one correspondence, or bijective, or a bijection, if it is one-to-one and onto (fig. 2.13). For instance, f(x) = x + 3 from Z to Z is a bijection.

[Figure 2.13. Bijection.]

2.2.4. Identity Function. Given a set A, the function 1A : A → A defined by 1A(x) = x for every x in A is called the identity function for A.

2.2.5. Function Composition. Given two functions f : A → B and g : B → C, the composite function of f and g is the function g ∘ f : A → C defined by (g ∘ f)(x) = g(f(x)) for every x in A. For instance, if A = B = C = Z, f(x) = x + 1, g(x) = x², then (g ∘ f)(x) = f(x)² = (x + 1)². Also (f ∘ g)(x) = g(x) + 1 = x² + 1 (the composition of functions is not commutative in general).

Some properties of function composition are the following:
1. If f : A → B is a function from A to B, we have that f ∘ 1A = 1B ∘ f = f.
2. Function composition is associative, i.e., given three functions A → B → C → D (by f, g and h respectively), we have that h ∘ (g ∘ f) = (h ∘ g) ∘ f.

Function iteration. If f : A → A is a function from A to A, then it makes sense to compose it with itself: f² = f ∘ f. Analogously we can define f³ = f ∘ f ∘ f, and so on; in general fⁿ = f ∘ ··· ∘ f (n times). For instance, if f : Z → Z is f(x) = 2x + 1, then f²(x) = 2(2x + 1) + 1 = 4x + 3.

2.2.6. Inverse Function. If f : A → B is a bijective function, its inverse is the function f⁻¹ : B → A such that f⁻¹(y) = x if and only if f(x) = y. For instance, if f : Z → Z is defined by f(x) = x + 3, then its inverse is f⁻¹(x) = x − 3. A characteristic property of the inverse function is that f⁻¹ ∘ f = 1A and f ∘ f⁻¹ = 1B. The arrow diagram of f⁻¹ is the same as the arrow diagram of f but with all arrows reversed.

2.2.7. Operators. A function from A × A to A is called a binary operator on A. For instance the addition of integers is a binary operator + : Z × Z → Z. Another example of binary operator on Z is (x, y) ↦ x · y. In the usual notation for functions the sum of two integers x and y would be represented +(x, y). This is called prefix notation. The infix notation consists of writing the symbol of the binary operator between its arguments: x + y (this is the most common). There is also a postfix notation consisting of writing the symbol after the arguments: x y +.

A monary or unary operator on A is a function from A to A. For instance the change of sign x ↦ −x on Z is a unary operator on Z. An example of unary operator on R* (nonzero real numbers) is x ↦ 1/x.
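Several of the examples above (floor, ceiling, mod, composition, inverse) can be checked directly in Python. A sketch; the helper `compose` is our name:

```python
# Floor, ceiling, mod, composition and inverse, mirroring the examples above.
import math

assert math.floor(2.3) == 2 and math.floor(math.pi) == 3 and math.floor(-2.5) == -3
assert math.ceil(2.3) == 3 and math.ceil(math.pi) == 4 and math.ceil(-2.5) == -2
assert 23 % 7 == 2 and 59 % 9 == 5          # the mod examples from the text

f = lambda x: x + 1
g = lambda x: x * x
compose = lambda g, f: (lambda x: g(f(x)))  # (g ∘ f)(x) = g(f(x))

assert compose(g, f)(3) == (3 + 1) ** 2     # (x + 1)^2
assert compose(f, g)(3) == 3 * 3 + 1        # x^2 + 1, so ∘ is not commutative

# Inverse of the bijection f(x) = x + 3 on Z
f3 = lambda x: x + 3
f3_inv = lambda y: y - 3
assert f3_inv(f3(42)) == 42                 # f⁻¹ ∘ f is the identity
print("ok")
```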
2.3. Relations

2.3.1. Relations. Assume that we have a set of men M and a set of women W, some of whom are married. We want to express which men in M are married to which women in W. One way to do that is by listing the set of pairs (m, w) such that m is a man, w is a woman, and m is married to w. So, the relation "married to" can be represented by a subset of the Cartesian product M × W. In general, a relation R from a set A to a set B will be understood as a subset of the Cartesian product A × B, i.e., R ⊆ A × B. If an element a ∈ A is related to an element b ∈ B, we often write a R b instead of (a, b) ∈ R.

The set {a ∈ A | a R b for some b ∈ B} is called the domain of R. The set {b ∈ B | a R b for some a ∈ A} is called the range of R. For instance, in the relation "married to" above, the domain is the set of married men, and the range is the set of married women.

If A and B are the same set, then any subset of A × A will be a binary relation in A. As an example, assume A = {1, 2, 3, 4}. Then the binary relation "less than" in A will be:
<A = {(x, y) ∈ A × A | x < y} = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}.

Notation: A set A with a binary relation R is sometimes represented by the pair (A, R). So, for instance, (Z, ≤) means the set of integers together with the relation of nonstrict inequality.

2.3.2. Representations of Relations.

Arrow diagrams. Venn diagrams and arrows can be used for representing relations between given sets. For instance, figure 2.14 represents the relation from A = {a, b, c, d} to B = {1, 2, 3, 4} given by R = {(a, 1), (b, 1), (c, 2), (c, 3)}. In the diagram an arrow from x to y means that x is related to y. This kind of graph is called directed graph or digraph.
[Figure 2.14. Relation.]

Another example is given in diagram 2.15, which represents the divisibility relation on the set {1, 2, 3, 4, 5, 6, 7, 8, 9}.

[Figure 2.15. Binary relation of divisibility.]

Matrix of a Relation. Another way of representing a relation R from A to B is with a matrix. Its rows are labeled with the elements of A, and its columns are labeled with the elements of B. If a ∈ A and b ∈ B then we write 1 in row a column b if a R b, otherwise we write 0. For instance the relation R = {(a, 1), (b, 1), (c, 2), (c, 3)} from A = {a, b, c, d} to B = {1, 2, 3, 4} has the following matrix:

      1  2  3  4
  a   1  0  0  0
  b   1  0  0  0
  c   0  1  1  0
  d   0  0  0  0

2.3.3. Inverse Relation. Given a relation R from A to B, the inverse of R, denoted R⁻¹, is the relation from B to A defined as
b R⁻¹ a ⇔ a R b.
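The matrix representation above can be reproduced in a few lines of Python. A sketch (the variable names are ours):

```python
# Building the 0/1 matrix of the relation R = {(a,1),(b,1),(c,2),(c,3)}.
A = ['a', 'b', 'c', 'd']
B = [1, 2, 3, 4]
R = {('a', 1), ('b', 1), ('c', 2), ('c', 3)}

# Row for x has a 1 in column y exactly when x R y.
M = [[1 if (x, y) in R else 0 for y in B] for x in A]
for x, row in zip(A, M):
    print(x, row)
```

The printed rows match the matrix in the text. The inverse relation corresponds to transposing this matrix.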
For instance, if R is the relation "being a son or daughter of", then R⁻¹ is the relation "being a parent of".

2.3.4. Properties of Binary Relations. A binary relation R on A is called:
1. Reflexive if for all x ∈ A, x R x. For instance on Z the relation "equal to" (=) is reflexive, but strict inequality (<) is not.
2. Symmetric if for all x, y ∈ A, x R y ⇒ y R x. For instance, equality (=) is symmetric.
3. Transitive if for all x, y, z ∈ A, x R y and y R z implies x R z. For instance equality (=) and inequality (<) on Z are transitive relations.
4. Antisymmetric if for all x, y ∈ A, x R y and y R x implies x = y. For instance, nonstrict inequality (≤) on Z is antisymmetric.

2.3.5. Composition of Relations. Let A, B and C be three sets. Given a relation R from A to B and a relation S from B to C, the composition S ∘ R of relations R and S is a relation from A to C defined by:
a (S ∘ R) c ⇔ there exists some b ∈ B such that a R b and b S c.
For instance, if R is the relation "to be the father of", and S is the relation "to be married to", then S ∘ R is the relation "to be the father in law of".

2.3.6. Partial Orders. A partial order, or simply, an order on a set A is a binary relation "≼" on A with the following properties:
1. Reflexive: for all x ∈ A, x ≼ x.
2. Antisymmetric: (x ≼ y) ∧ (y ≼ x) ⇒ x = y.
3. Transitive: (x ≼ y) ∧ (y ≼ z) ⇒ x ≼ z.

Examples:
1. The nonstrict inequality (≤) in Z.
2. Relation of divisibility on Z+: a|b ⇔ ∃t, b = at.
3. Set inclusion (⊆) on P(A) (the collection of subsets of a given set A).

Question: is the strict inequality (<) a partial order on Z?

Exercise: prove that the aforementioned relations are in fact partial orders. As an example we prove that integer divisibility is a partial order:
1. Reflexive: a = a · 1 ⇒ a|a.
2. Antisymmetric: a|b ⇒ b = at for some t, and b|a ⇒ a = bt′ for some t′. Hence a = att′, which implies tt′ = 1 ⇒ t′ = t⁻¹. The only invertible positive integer is 1, so t = t′ = 1 ⇒ a = b.
3. Transitive: a|b and b|c implies b = at for some t and c = bt′ for some t′, hence c = att′, so a|c.

Two elements a, b ∈ A are said to be comparable if either a ≼ b or b ≼ a, otherwise they are said to be non-comparable. The order is called total or linear when every pair of elements x, y ∈ A is comparable. For instance (Z, ≤) is totally ordered, but (Z+, |), where "|" represents integer divisibility, is not. A totally ordered subset of a partially ordered set is called a chain; for instance the set {1, 2, 4, 8, 16, ...} is a chain in (Z+, |).

Hasse diagrams. A Hasse diagram is a graphical representation of a partially ordered set in which each element is represented by a dot (node or vertex of the diagram). Its immediate successors are placed above the node and connected to it by straight line segments. As an example, figure 2.16 represents the Hasse diagram for the relation of divisibility on {1, 2, 3, 4, 5, 6, 7, 8, 9}.

Question: How does the Hasse diagram look for a totally ordered set?

2.3.7. Equivalence Relations. An equivalence relation on a set A is a binary relation "∼" on A with the following properties:
1. Reflexive: for all x ∈ A, x ∼ x.
2. Symmetric: x ∼ y ⇒ y ∼ x.
3. Transitive: (x ∼ y) ∧ (y ∼ z) ⇒ x ∼ z.
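For a relation on a small finite set, the defining properties above (reflexive, symmetric, antisymmetric, transitive) can be checked mechanically by brute force. A sketch, using nonstrict ≤ on {1, 2, 3, 4} as the test relation (variable names are ours):

```python
# Checking the defining properties of a binary relation on a finite set.
from itertools import product

A = {1, 2, 3, 4}
R = {(x, y) for x, y in product(A, A) if x <= y}   # nonstrict ≤ on A

reflexive     = all((x, x) in R for x in A)
symmetric     = all((y, x) in R for (x, y) in R)
antisymmetric = all(x == y for (x, y) in R if (y, x) in R)
transitive    = all((x, z) in R
                    for (x, y) in R for (y2, z) in R if y == y2)

print(reflexive, symmetric, antisymmetric, transitive)
# ≤ is reflexive, antisymmetric and transitive but not symmetric:
# a partial order, not an equivalence relation.
```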
[Figure 2.16. Hasse diagram for divisibility.]

For instance, the equality (=) is an equivalence relation on Z. Another example, also on Z, is the following: x ≡ y (mod 2) ("x is congruent to y modulo 2") iff x − y is even. For instance, 6 ≡ 2 (mod 2) because 6 − 2 = 4 is even, but 7 ≢ 4 (mod 2), because 7 − 4 = 3 is not even. Congruence modulo 2 is in fact an equivalence relation:
1. Reflexive: for every integer x, x − x = 0 is indeed even, so x ≡ x (mod 2).
2. Symmetric: if x ≡ y (mod 2) then x − y = t is even, but y − x = −t is also even, hence y ≡ x (mod 2).
3. Transitive: assume x ≡ y (mod 2) and y ≡ z (mod 2). Then x − y = t and y − z = u are even. From here, x − z = (x − y) + (y − z) = t + u is also even, hence x ≡ z (mod 2).

2.3.8. Equivalence Classes, Quotient Set, Partitions. Given an equivalence relation ∼ on a set A, and an element x ∈ A, the set of elements of A related to x is called the equivalence class of x, represented [x] = {y ∈ A | y ∼ x}. Element x is said to be a representative of class [x]. The collection of equivalence classes, represented A/∼ = {[x] | x ∈ A}, is called the quotient set of A by ∼.

Exercise: Find the equivalence classes on Z with the relation of congruence modulo 2.

One of the main properties of an equivalence relation on a set A is that the quotient set, i.e., the collection of equivalence classes, is a partition of A. Recall that a partition of a set A is a collection of
nonempty subsets A1, A2, A3, ... of A which are pairwise disjoint and whose union equals A:
1. Ai ∩ Aj = ∅ for i ≠ j,
2. ⋃n An = A.

Example: in Z with the relation of congruence modulo 2 (call it "∼2"), there are two equivalence classes: the set E of even integers and the set O of odd integers. The quotient set of Z by the relation "∼2" of congruence modulo 2 is Z/∼2 = {E, O}. We see that it is in fact a partition of Z, because E ∩ O = ∅, and Z = E ∪ O.

Exercise: Let m be an integer greater than or equal to 2. On Z we define the relation x ≡ y (mod m) ⇔ m|(y − x) (i.e., m divides exactly y − x). Prove that it is an equivalence relation. What are the equivalence classes? How many are there?

Exercise: On the Cartesian product Z × Z* we define the relation (a, b) R (c, d) ⇔ ad = bc. Prove that R is an equivalence relation. Would it still be an equivalence relation if we extend it to Z × Z?
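On a finite slice of Z the equivalence classes of congruence modulo m can be computed directly, and the partition property checked. A sketch (the helper `classes_mod` and the chosen range are ours):

```python
# Equivalence classes of congruence modulo m on the integers -10..10,
# and a check that they partition that range.
def classes_mod(m, lo=-10, hi=10):
    xs = range(lo, hi + 1)
    return {r: {x for x in xs if x % m == r} for r in range(m)}

cls = classes_mod(2)
E, O = cls[0], cls[1]          # even and odd integers in the range
assert E & O == set()          # pairwise disjoint
assert E | O == set(range(-10, 11))  # union is the whole range
print(len(cls))                # two classes, as in the example above
```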
CHAPTER 3

Algorithms, Integers

3.1. Algorithms

3.1.1. Algorithms. Consider the following list of instructions to find the maximum of three numbers a, b, c:
1. Assign variable x the value of a.
2. If b > x then assign x the value of b.
3. If c > x then assign x the value of c.
4. Output the value of x.

After executing those steps the output will be the maximum of a, b, c.

In general an algorithm is a finite list of instructions with the following characteristics:
1. Precision. The steps are precisely stated.
2. Uniqueness. The result of executing each step is uniquely determined by the inputs and the result of preceding steps.
3. Finiteness. The algorithm stops after finitely many instructions have been executed.
4. Input. The algorithm receives input.
5. Output. The algorithm produces output.
6. Generality. The algorithm applies to a set of inputs.

Basically an algorithm is the idea behind a program. Conversely, programs are implementations of algorithms.

Pseudocode. Pseudocode is a language similar to a programming language used to represent algorithms. The main difference with respect to actual programming languages is that pseudocode is not required to follow strict syntactic rules, since it is intended to be just read by humans, not actually executed by a machine. Usually pseudocode will look like this:

procedure ProcedureName(Input)
   Instructions...
end ProcedureName

For instance the following is an algorithm to find the maximum of three numbers a, b, c:

1: procedure max(a,b,c)
2:    x := a
3:    if b>x then
4:       x := b
5:    if c>x then
6:       x := c
7:    return x
8: end max

Next we show a few common operations in pseudocode. The following statement means "assign variable x the value of variable y":

x := y

The following code executes "action" if condition "p" is true:

if p then
   action

The following code executes "action1" if condition "p" is true, otherwise it executes "action2":

if p then
   action1
else
   action2

The following code executes "action" while condition "p" is true:

1: while p
2:    action

The following is the structure of a for loop:

for var := init to limit
   action

If an action contains more than one statement then we must enclose the statements in a block:

begin
   Instruction1
   Instruction2
   Instruction3
   ...
end

Comments are enclosed between brackets:

{This is a comment}

The output of a procedure is returned with a return statement:

return output

Procedures that do not return anything are invoked with a call statement:

call Procedure(arguments...)

As an example, the following procedure returns the largest number in a sequence s1, s2, ..., sn represented as an array with n elements: s[1], s[2], ..., s[n]:

1: procedure largest element(s,n)
2:    largest := s[1]
3:    for k := 2 to n
4:       if s[k] > largest then
5:          largest := s[k]
6:    return largest
7: end largest element

3.1.2. Recursiveness.

Recursive Definitions. A definition such that the object defined occurs in the definition is called a recursive definition. For instance,
consider the Fibonacci sequence 0, 1, 1, 2, 3, 5, 8, 13, ... It can be defined as a sequence whose two first terms are F0 = 0, F1 = 1 and each subsequent term is the sum of the two previous ones: Fn = Fn−1 + Fn−2 (for n ≥ 2).

Other examples:
• Factorial: 1. 0! = 1; 2. n! = n · (n − 1)! (n ≥ 1).
• Power: 1. a⁰ = 1; 2. aⁿ = aⁿ⁻¹ · a (n ≥ 1).

In all these examples we have:
1. A Basis, where the function is explicitly evaluated for one or more values of its argument.
2. A Recursive Step, stating how to compute the function from its previous values.

Recursive Procedures. A recursive procedure is a procedure that invokes itself. Also a set of procedures is called recursive if they invoke themselves in a circle, e.g., procedure p1 invokes procedure p2, procedure p2 invokes procedure p3 and procedure p3 invokes procedure p1. A recursive algorithm is an algorithm that contains recursive procedures or recursive sets of procedures. Recursive algorithms have the advantage that often they are easy to design and are closer to natural mathematical definitions.

As an example we show two alternative algorithms for computing the factorial of a natural number, the first one iterative (non recursive), the second one recursive.

1: procedure factorial iterative(n)
2:    fact := 1
3:    for k := 2 to n
4:       fact := k * fact
5:    return fact
6: end factorial iterative
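The iterative procedure above, and the recursive definition 0! = 1, n! = n · (n − 1)!, transcribe directly into Python. A sketch (function names are ours):

```python
# Iterative factorial, following the pseudocode above.
def factorial_iterative(n):
    fact = 1
    for k in range(2, n + 1):
        fact = k * fact
    return fact

# Recursive factorial, following the recursive definition directly.
def factorial_recursive(n):
    return 1 if n == 0 else n * factorial_recursive(n - 1)

assert factorial_iterative(0) == factorial_recursive(0) == 1
assert factorial_iterative(5) == factorial_recursive(5) == 120
print(factorial_iterative(6))  # 720
```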
1: procedure factorial recursive(n)
2:    if n = 0 then
3:       return 1
4:    else
5:       return n * factorial recursive(n-1)
6: end factorial recursive

While the iterative version computes n! = 1 · 2 · ... · n directly, the recursive version resembles more closely the formula n! = n · (n − 1)!.

A recursive algorithm must contain at least a basic case without recursive call (the case n = 0 in our example), and any legitimate input should lead to a finite sequence of recursive calls ending up at the basic case. In our example n is a legitimate input if it is a natural number, i.e., an integer greater than or equal to 0. If n = 0 then factorial recursive(0) returns 1 immediately without performing any recursive call. If n > 0 then the execution of factorial recursive(n) leads to a recursive call factorial recursive(n-1), which will perform a recursive call factorial recursive(n-2), and so on until eventually reaching the basic case factorial recursive(0). After reaching the basic case the procedure returns a value to the last call, which returns a value to the previous call, and so on up to the first invocation of the procedure.

Another example is the following algorithm for computing the nth element of the Fibonacci sequence:

1: procedure fibonacci(n)
2:    if n=0 then
3:       return 0
4:    if n=1 then
5:       return 1
6:    return fibonacci(n-1) + fibonacci(n-2)
7: end fibonacci

In this example we have two basic cases, namely n = 0 and n = 1. In this particular case the algorithm is inefficient in the sense that it performs more computations than actually needed. For instance a call to fibonacci(5) contains two recursive calls, one to fibonacci(4) and another one to fibonacci(3). Then fibonacci(4) performs a call to fibonacci(3) and another call to fibonacci(2), so at this point we see that fibonacci(3) is being called twice, once inside fibonacci(5) and again in fibonacci(4). Hence sometimes the price to pay for a simpler algorithmic structure is a loss of efficiency. However careful design may yield efficient recursive algorithms. An example is merge sort, an algorithm intended to sort a list of elements.

First let's look at a simple non recursive sorting algorithm called bubble sort. The idea is to go several times through the list swapping adjacent elements if necessary. It applies to a list of numbers si, si+1, ..., sj represented as an array s[i], s[i+1], ..., s[j]:

1: procedure bubble sort(s,i,j)
2:    for p:=1 to j-i
3:       for q:=i to j-p
4:          if s[q] > s[q+1] then
5:             swap(s[q],s[q+1])
6: end bubble sort

We can see that bubble sort requires n(n−1)/2 comparisons and possible swapping operations. On the other hand, the idea of merge sort is to split the list into two approximately equal parts, sort them separately and then merge them into a single list:

1: procedure merge sort(s,i,j)
2:    if i=j then
3:       return
4:    m := floor((i+j)/2)
5:    call merge sort(s,i,m)
6:    call merge sort(s,m+1,j)
7:    call merge(s,i,m,j,c)
8:    for k:=i to j
9:       s[k] := c[k]
10: end merge sort

The procedure merge(s,i,m,j,c) merges the two increasing sequences si, si+1, ..., sm and sm+1, sm+2, ..., sj into a single increasing sequence ci, ci+1, ..., cj. This algorithm is more efficient than bubble sort because it requires only about n log2 n operations (we will make this more precise soon). The strategy of dividing a task into several smaller tasks is called divide and conquer.

3.1.3. Complexity. In general the complexity of an algorithm is the amount of time and space (memory use) required to execute it. Here we deal with time complexity only. Since the actual time required to execute an algorithm depends on the details of the program implementing the algorithm and on the speed and other characteristics of the machine executing it, it is in general impossible to make an estimation in actual physical time. However it is possible to measure the length of the computation in other ways, say by the number of operations performed. For instance the following loop performs the statement x := x + 1 exactly n times:

1: for i := 1 to n
2:    x := x + 1

The following double loop performs it n² times:

1: for i := 1 to n
2:    for j := 1 to n
3:       x := x + 1

The following one performs it 1 + 2 + 3 + ··· + n = n(n+1)/2 times:
1: for i := 1 to n
2:    for j := 1 to i
3:       x := x + 1

Since the time it takes to execute an algorithm usually depends on the input, its complexity must be expressed as a function of the input, or more generally as a function of the size of the input. Since the execution time may be different for inputs of the same size, we define the following kinds of times:
1. Worst-case time: maximum time needed to execute the algorithm among all inputs of a given size n.
2. Best-case time: minimum time needed to execute the algorithm among all inputs of a given size n.
3. Average-case time: average time needed to execute the algorithm among all inputs of a given size n.

For instance, assume that we have a list of n objects one of which is colored red and the others are colored blue, and we want to find the one that is colored red by examining the objects one by one. We measure time by the number of objects examined. In this problem the minimum time needed to find the red object would be 1 (in the lucky event that the first object examined turned out to be the red one). The maximum time would be n (if the red object turns out to be the last one). The average time is the average of all possible times: (1 + 2 + 3 + ··· + n)/n = (n+1)/2. So in this example the best-case time is 1, the worst-case time is n and the average-case time is (n+1)/2.

Often the exact time is too hard to compute, or we are interested just in how it grows compared to the size of the input. For instance an algorithm that requires exactly 7n² + 3n + 10 steps to be executed on an input of size n is said to be of order n², represented Θ(n²). This justifies the following notations:

Big Oh Notation. A function f(n) is said to be of order at most g(n), written f(n) = O(g(n)), if there is a constant C1 such that f(n) ≤ C1 g(n) for all but finitely many positive integers n.

Omega Notation. A function f(n) is said to be of order at least g(n), written f(n) = Ω(g(n)), if there is a constant C2 such that f(n) ≥ C2 g(n) for all but finitely many positive integers n.

Theta Notation. A function f(n) is said to be of order g(n), written f(n) = Θ(g(n)), if f(n) = O(g(n)) and f(n) = Ω(g(n)).

Remark: All logarithmic functions are of the same order: log_a n = Θ(log_b n) for any a, b > 1, because log_a n = log_b n / log_b a, so they always differ in a multiplicative constant. As a consequence, if the execution time of an algorithm is of order a logarithmic function, we can just say that its time is "logarithmic"; we do not need to specify the base of the logarithm.

The following are several common growth functions:

Order          Name
Θ(1)           Constant
Θ(log log n)   Log log
Θ(log n)       Logarithmic
Θ(n)           Linear
Θ(n log n)     n log n
Θ(n²)          Quadratic
Θ(n³)          Cubic
Θ(n^k)         Polynomial
Θ(a^n)         Exponential

Let's see now how we find the complexity of algorithms like bubble sort and merge sort. Since bubble sort is just a double loop, its complexity is easy to find: the inner loop is executed (n−1) + (n−2) + ··· + 1 = n(n−1)/2 times, so it requires n(n−1)/2 comparisons and possible swap operations. Hence its execution time is Θ(n²).

The estimation of the complexity of merge sort is more involved. First, the number of operations required by the merge procedure is Θ(n). Next, if we call T(n) (the order of) the number of operations required by merge sort working on a list of size n, we see that roughly:
T(n) = 2 T(n/2) + n.
Replacing n with n/2 we have T(n/2) = 2 T(n/4) + n/2, hence
T(n) = 2 T(n/2) + n = 2 (2 T(n/4) + n/2) + n = 4 T(n/4) + 2n.
Repeating k times we get:
T(n) = 2^k T(n/2^k) + kn.
So for k = log2 n we have:
T(n) = n T(1) + n log2 n = Θ(n log n).
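The merge sort pseudocode above can be transcribed into Python and instrumented to count comparisons, illustrating the Θ(n log n) bound. A sketch; unlike the pseudocode, this version returns a new sorted list rather than sorting in place, and the names are ours:

```python
# Merge sort with a comparison counter; the count is at most about n log2 n.
import math
import random

def merge_sort(s, count=None):
    if len(s) <= 1:
        return s
    m = len(s) // 2
    left = merge_sort(s[:m], count)
    right = merge_sort(s[m:], count)
    # merge the two sorted halves into a single sorted list
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if count is not None:
            count[0] += 1
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

n = 1024
data = random.sample(range(10 * n), n)
count = [0]
assert merge_sort(data, count) == sorted(data)
print(count[0] <= n * math.ceil(math.log2(n)))  # within the n log2 n bound
```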
3.2. The Euclidean Algorithm

3.2.1. The Division Algorithm. The following result is known as The Division Algorithm:¹ If a, b ∈ Z, b > 0, then there exist unique q, r ∈ Z such that a = qb + r, 0 ≤ r < b. Here q is called the quotient of the integer division of a by b, and r is called the remainder.

¹The result is not really an "algorithm", it is just a mathematical theorem. There are, however, algorithms that allow us to compute the quotient and the remainder in an integer division.

3.2.2. Divisibility. Given two integers a, b, with b ≠ 0, we say that b divides a, written b|a, if there is some integer q such that a = bq:
b|a ⇔ ∃q, a = bq.
We also say that b divides or is a divisor of a, or that a is a multiple of b.

3.2.3. Prime Numbers. A prime number is an integer p ≥ 2 whose only positive divisors are 1 and p. Any integer n ≥ 2 that is not prime is called composite. A nontrivial divisor of n ≥ 2 is a divisor d of n such that 1 < d < n, so n ≥ 2 is composite iff it has nontrivial divisors. Warning: 1 is not considered either prime or composite.

Some results about prime numbers:
1. For all n ≥ 2 there is some prime p such that p|n.
2. (Euclid) There are infinitely many prime numbers.
3. If p|ab then p|a or p|b. More generally, if p|a1 a2 ··· an then p|ak for some k = 1, 2, ..., n.

3.2.4. The Fundamental Theorem of Arithmetic. Every integer n ≥ 2 can be written as a product of primes uniquely, up to the order of the primes. It is customary to write the factorization in the following way:
n = p1^s1 p2^s2 ··· pk^sk,
where all the exponents are positive and the primes are written so that p1 < p2 < ··· < pk. For instance:
13104 = 2⁴ · 3² · 7 · 13.
3.2.5. Greatest Common Divisor. A positive integer d is called a common divisor of the integers a and b if d divides a and b. The greatest possible such d is called the greatest common divisor of a and b, denoted gcd(a, b). If gcd(a, b) = 1 then a, b are called relatively prime.

Example: The set of positive common divisors of 12 and 30 is {1, 2, 3, 6}. The greatest common divisor of 12 and 30 is gcd(12, 30) = 6.

A few properties of divisors are the following. Let m, n, d be integers. Then:
1. If d|m and d|n then d|(m + n).
2. If d|m and d|n then d|(m − n).
3. If d|m then d|mn.

Another important result is the following: Given integers a, b, c, the equation ax + by = c has integer solutions if and only if gcd(a, b) divides c. That is an example of a Diophantine equation. In general a Diophantine equation is an equation whose solutions must be integers.

Example: We have gcd(12, 30) = 6, and in fact we can write 6 = 1 · 30 − 2 · 12. The solution is not unique; for instance 6 = 3 · 30 − 7 · 12.

3.2.6. Finding the gcd by Prime Factorization. We have that gcd(a, b) = product of the primes that occur in the prime factorizations of both a and b, raised to their lowest exponent. For instance 1440 = 2⁵ · 3² · 5 and 1512 = 2³ · 3³ · 7, hence gcd(1440, 1512) = 2³ · 3² = 72. Factoring numbers is not always a simple task, so finding the gcd by prime factorization might not be a most convenient way to do it, but there are other ways.

3.2.7. The Euclidean Algorithm. Now we examine an alternative method to compute the gcd of two given positive integers a, b. The method provides at the same time a solution to the Diophantine equation:
ax + by = gcd(a, b).
It is based on the following fact: given two integers a ≥ 0 and b > 0, and r = a mod b, then gcd(a, b) = gcd(b, r). Proof: Divide a by
Conversely. we can do it by working backward: 2 = 56 − 1 · 54 = 56 − 1 · (222 − 3 · 56) = 4 · 56 − 1 · 222 = 4 · (500 − 2 · 222) − 1 · 222 = 4 · 500 − 9 · 222 . if we want to express 2 as a linear combination of 500 and 222. We continue dividing each remainder by the next one until obtaining a zero remainder. and which point we stop. then a = bq + r . Then we divide b by r.2. hence gcd(500. 222) = 2. If d is a common divisor of a and b then it must be a divisor of r = a−bq. w The algorithm to compute the gcd can be written as follows: w w jn . obtaining a quotient q and a remainder r. THE EUCLIDEAN ALGORITHM 50 b obtaining a quotient q and a remainder r.jntuworld. Furthermore. 222). Example: Assume that we wish to compute gcd(500. 0 ≤ r < b.b) 2: if a<b then {make a the largest} 3: swap(a.www. obtaining a new quotient q and a remainder r . and the greatest common divisor will be the same. if d is a common divisor of b and r then it must divide a = bq + r.com 3. So the set of common divisors of a and b and the set of common divisors of b and r are equal. which gives a quotient q and another remainder r .b) 4: while b = 0 5: begin 6: r := a mod b 7: a := b 8: b := r 9: end 10: return a 11: end gcd . w tu → → → → r o r r r r = 56 = 54 =2 =0 . Next we divide r by r . First we divide a by b.c ld m o 1: procedure gcd(a. The Euclidean algorithm is a follows. Then we arrange the computations in the following way: 500 222 56 54 = = = = 2 · 222 + 56 3 · 56 + 54 1 · 54 + 2 27 · 2 + 0 The last nonzero remainder is r = 2. The last nonzero remainder is the gcd.
The next one is a recursive version of the Euclidean algorithm:

1: procedure gcd_recurs(a,b)
2:   if b=0 then
3:     return a
4:   else
5:     return gcd_recurs(b, a mod b)
6: end gcd_recurs
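As a sanity check, the two procedures above translate almost line for line into Python. This sketch is an illustration added here, not part of the original pseudocode:

```python
def gcd(a, b):
    """Iterative Euclidean algorithm (mirrors the pseudocode above)."""
    if a < b:
        a, b = b, a        # make a the largest
    while b != 0:
        a, b = b, a % b    # r := a mod b; a := b; b := r
    return a

def gcd_recurs(a, b):
    """Recursive version: gcd(a, b) = gcd(b, a mod b)."""
    return a if b == 0 else gcd_recurs(b, a % b)
```

For the worked example, gcd(500, 222) and gcd_recurs(500, 222) both return 2, and gcd(1440, 1512) returns 72, the value found earlier by prime factorization.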
3.3. Modular Arithmetic, RSA Algorithm

3.3.1. Congruences Modulo m. Given an integer m ≥ 2, we say that a is congruent to b modulo m, written a ≡ b (mod m), if m|(a − b). Note that the following conditions are equivalent:
1. a ≡ b (mod m).
2. a = b + km for some integer k.
3. a and b have the same remainder when divided by m.

The relation of congruence modulo m is an equivalence relation. It partitions Z into m equivalence classes of the form

[x] = [x]_m = {x + km | k ∈ Z} .

For instance, for m = 5, each one of the following rows is an equivalence class:

. . . −10 −5 0 5 10 15 20 . . .
. . .  −9 −4 1 6 11 16 21 . . .
. . .  −8 −3 2 7 12 17 22 . . .
. . .  −7 −2 3 8 13 18 23 . . .
. . .  −6 −1 4 9 14 19 24 . . .

Each equivalence class has exactly one representative r such that 0 ≤ r < m, namely the common remainder of all elements in that class when divided by m. Hence an equivalence class may be denoted [r] or x + mZ. Often we will omit the brackets, so that the equivalence class [r] will be represented just r.

The set of equivalence classes (i.e., the quotient set of Z by the relation of congruence modulo m) is denoted Z_m = {0, 1, 2, . . . , m − 1}. For instance, Z_5 = {0, 1, 2, 3, 4}.

Remark: When writing "r" as a notation for the class of r, we may stress the fact that r represents the class of r rather than the integer r by including "(mod m)" at some point; for instance 8 = 3 (mod 5). Note that in "a ≡ b (mod m)" a and b represent integers, while in "a = b (mod m)" they represent elements of Z_m.

Reduction Modulo m: Once a set of representatives has been chosen for the elements of Z_m, we will call "r reduced modulo m", written "r mod m", the chosen representative for the class of r. For instance, if we choose the representatives for the elements of Z_5 in the interval from 0 to 4 (Z_5 = {0, 1, 2, 3, 4}), then 9 mod 5 = 4. Another possibility is to choose the representatives in the interval from −2 to 2 (Z_5 = {−2, −1, 0, 1, 2}), so that 9 mod 5 = −1.
In Z_m it is possible to define an addition and a multiplication in the following way:

[x] + [y] = [x + y] ,    [x] · [y] = [x · y] .

As an example, tables 3.3.1 and 3.3.2 show the addition and multiplication tables for Z_5 and Z_6 respectively.

+ | 0 1 2 3 4        · | 0 1 2 3 4
--+----------        --+----------
0 | 0 1 2 3 4        0 | 0 0 0 0 0
1 | 1 2 3 4 0        1 | 0 1 2 3 4
2 | 2 3 4 0 1        2 | 0 2 4 1 3
3 | 3 4 0 1 2        3 | 0 3 1 4 2
4 | 4 0 1 2 3        4 | 0 4 3 2 1

Table 3.3.1. Operational tables for Z_5

+ | 0 1 2 3 4 5      · | 0 1 2 3 4 5
--+------------      --+------------
0 | 0 1 2 3 4 5      0 | 0 0 0 0 0 0
1 | 1 2 3 4 5 0      1 | 0 1 2 3 4 5
2 | 2 3 4 5 0 1      2 | 0 2 4 0 2 4
3 | 3 4 5 0 1 2      3 | 0 3 0 3 0 3
4 | 4 5 0 1 2 3      4 | 0 4 2 0 4 2
5 | 5 0 1 2 3 4      5 | 0 5 4 3 2 1

Table 3.3.2. Operational tables for Z_6

A difference between these two tables is that in Z_5 every nonzero element has a multiplicative inverse, i.e., for every x ∈ Z_5 such that x ≠ 0 there is an x^(−1) such that x · x^(−1) = x^(−1) · x = 1; e.g., 2^(−1) = 3 (mod 5). However, in Z_6 that is not true: some nonzero elements, like 2, have no multiplicative inverse. Furthermore, each element without a multiplicative inverse can be multiplied by some other nonzero element to give a product equal to zero, e.g., 2 · 3 = 0 (mod 6). These elements are called divisors of zero. Of course, with this definition zero itself is a divisor of zero. Divisors of zero different from zero are called proper divisors of zero; for instance, in Z_6, 2 is a proper divisor of zero. In Z_5 there are no proper divisors of zero. In general:

1. The elements of Z_m can be classified into two classes:
   (a) Units: elements with a multiplicative inverse.
   (b) Divisors of zero: elements that multiplied by some other nonzero element give product zero.

2. An element [a] ∈ Z_m is a unit (has a multiplicative inverse) if and only if gcd(a, m) = 1.

3. All nonzero elements of Z_m are units if and only if m is a prime number.

The set of units in Z_m is denoted Z_m^*. For instance:

Z_2^* = {1}
Z_3^* = {1, 2}
Z_4^* = {1, 3}
Z_5^* = {1, 2, 3, 4}
Z_6^* = {1, 5}
Z_7^* = {1, 2, 3, 4, 5, 6}
Z_8^* = {1, 3, 5, 7}
Z_9^* = {1, 2, 4, 5, 7, 8}

Given an element [a] in Z_m^*, its inverse can be computed by using the Euclidean algorithm to find gcd(a, m) = 1, since that algorithm also provides a solution to the equation ax + my = gcd(a, m) = 1, which is equivalent to ax ≡ 1 (mod m).

Example: Find the multiplicative inverse of 17 in Z_64^*. Answer: We use the Euclidean algorithm:

64 = 3 · 17 + 13   →  r = 13
17 = 1 · 13 + 4    →  r = 4
13 = 3 · 4 + 1     →  r = 1
 4 = 4 · 1 + 0     →  r = 0

Now we compute backward:

1 = 13 − 3 · 4 = 13 − 3 · (17 − 1 · 13) = 4 · 13 − 3 · 17
  = 4 · (64 − 3 · 17) − 3 · 17 = 4 · 64 − 15 · 17 .

Hence (−15) · 17 ≡ 1 (mod 64), but −15 ≡ 49 (mod 64), so the inverse of 17 in (Z_64^*, ·) is 49. We will denote this by writing 17^(−1) = 49 (mod 64), or 17^(−1) mod 64 = 49.
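Both the unit/zero-divisor classification and the backward computation above are easy to check in code. The sketch below is an illustration added to these notes (the function names are ours); the backward substitution is packaged as the usual extended Euclidean algorithm:

```python
def units(m):
    """Elements of Z_m that have a multiplicative inverse."""
    return [x for x in range(m) if any(x * y % m == 1 for y in range(m))]

def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def mod_inverse(a, m):
    """Inverse of a in Z_m*, computed as in the worked example."""
    g, x, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("a is not a unit modulo m")
    return x % m
```

Here units(6) gives [1, 5] and units(9) gives [1, 2, 4, 5, 7, 8], matching the lists above, and mod_inverse(17, 64) returns 49.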
3.3.2. Euler's Phi Function. The number of units in Z_m is equal to the number of positive integers not greater than and relatively prime to m, i.e., the number of integers a such that 1 ≤ a ≤ m and gcd(a, m) = 1. That number is given by the so-called Euler's phi function:

φ(m) = number of positive integers not greater than m and relatively prime to m.

For instance, the positive integers not greater than and relatively prime to 15 are: 1, 2, 4, 7, 8, 11, 13, 14, hence φ(15) = 8.

We have the following results:
1. If m1, m2 are two relatively prime positive integers, then φ(m1 m2) = φ(m1) φ(m2).¹
2. If p is a prime number and s ≥ 1, then φ(p^s) = p^s − p^(s−1) = p^s (1 − 1/p). In particular φ(p) = p − 1.
3. If m = p1^s1 p2^s2 · · · pk^sk, where the pk are prime and the sk are positive, then

φ(m) = m (1 − 1/p1) (1 − 1/p2) · · · (1 − 1/pk) .

For instance φ(15) = φ(3 · 5) = φ(3) · φ(5) = (3 − 1) · (5 − 1) = 2 · 4 = 8.

3.3.3. Euler's Theorem. If a and m are two relatively prime positive integers, m ≥ 2, then

a^φ(m) ≡ 1 (mod m) .

The particular case in which m is a prime number p is called Fermat's Little Theorem:

a^(p−1) ≡ 1 (mod p) .

For instance, if a = 2 and p = 7, then we have, in fact, 2^(7−1) = 2^6 = 64 = 1 + 9 · 7 ≡ 1 (mod 7).

A consequence of Euler's Theorem is the following. If gcd(a, m) = 1 then

x ≡ y (mod φ(m))  ⇒  a^x ≡ a^y (mod m) .

¹A function f(x) of positive integers such that gcd(a, b) = 1 ⇒ f(ab) = f(a)f(b) is called multiplicative.
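These facts are easy to test numerically. The following sketch, added here as an illustration, computes φ by direct count (fine for small m, though the product formula above is far faster for large m):

```python
from math import gcd

def phi(m):
    """Euler's phi function, by direct count of the units of Z_m."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)
```

With this, phi(15) returns 8, and pow(a, phi(m), m) equals 1 for every a relatively prime to m, exactly as Euler's theorem asserts.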
Consequently, we can compute powers modulo m in the following way:

a^n = a^(n mod φ(m)) (mod m)   if gcd(a, m) = 1 .

For instance:

3^9734888 mod 100 = 3^(9734888 mod φ(100)) mod 100 = 3^(9734888 mod 40) mod 100
                  = 3^8 mod 100 = 6561 mod 100 = 61 .

Hence, the following function is well defined:

Z_m^* × Z_φ(m) → Z_m^* ,   ([a]_m, [x]_φ(m)) → [a^x]_m .

An even more efficient way to compute powers modulo m is given in Appendix A, paragraph A.1.

3.3.4. Application to Cryptography: RSA Algorithm. The RSA algorithm is an encryption scheme designed in 1977 by Ronald Rivest, Adi Shamir and Leonard Adleman. It allows encrypting a message with a key (the encryption key) and decrypting it with a different key (the decryption key). The encryption key is public and can be given to everybody. The decryption key is private and is known only by the recipient of the encrypted message.

The RSA algorithm is based on the following facts. Given two prime numbers p and q, and a positive number m relatively prime to p and q, Euler's theorem tells us that

m^φ(pq) = m^((p−1)(q−1)) = 1 (mod pq) .

Assume now that we have two integers e and d such that e · d = 1 (mod φ(pq)). Then we have that

(m^e)^d = m^(e·d) = m (mod pq) ,

so, given m^e, we can recover m modulo pq by raising to the dth power.

The RSA algorithm consists of the following:
1. Generate two large primes p and q. Find their product n = pq.
2. Find two numbers e and d (in the range from 2 to φ(n)) such that e · d = 1 (mod φ(n)). This requires some trial and error. First e is chosen at random, and the Euclidean algorithm is used to find gcd(e, φ(n)), solving at the same time the equation ex + φ(n)y = gcd(e, φ(n)).
If gcd(e, φ(n)) = 1, then the value obtained for x is d. Otherwise, e is not relatively prime to φ(n) and we must try a different value for e.
3. The public encryption key will be the pair (n, e). The private decryption key will be the pair (n, d). The encryption key is given to everybody, while the decryption key is kept secret by the future recipient of the message.
4. The message to be encrypted is divided into small pieces, and each piece is encoded numerically as a positive integer m smaller than n.
5. The number m^e is reduced modulo n: m' = m^e mod n, with 0 ≤ m' < n. This is the value sent to the recipient.
6. The recipient computes m'' = m'^d mod n, with 0 ≤ m'' < n. It remains to prove that m'' = m. If m is relatively prime to p and q, then from Euler's theorem we get that m'' = m (mod n), and since both are in the range from 0 to n − 1 they must be equal. The case in which p or q divides m is left as an exercise.

3.3.5. The Chinese Remainder Theorem. Let m1, m2, . . . , mk be pairwise relatively prime integers greater than or equal to 2. The following system of congruences

x ≡ r1 (mod m1)
x ≡ r2 (mod m2)
. . .
x ≡ rk (mod mk)

has a unique solution modulo M = m1 m2 · · · mk.

We can find a solution to that system in the following way. Let Mi = M/mi, and si = the inverse of Mi in Z_mi. Then

x = M1 s1 r1 + M2 s2 r2 + · · · + Mk sk rk

is a solution to the system.

Example: A group of objects can be arranged in 3 rows leaving 2 left, in 5 rows leaving 4 left, and in 7 rows leaving 6 left. How many objects are there? Answer: We must solve the following system of congruences:

x ≡ 2 (mod 3)
x ≡ 4 (mod 5)
x ≡ 6 (mod 7)
We have: M = 3 · 5 · 7 = 105, and

M1 = 105/3 = 35 ≡ 2 (mod 3) ,   s1 = inverse of 2 in Z_3 = 2 ,
M2 = 105/5 = 21 ≡ 1 (mod 5) ,   s2 = inverse of 1 in Z_5 = 1 ,
M3 = 105/7 = 15 ≡ 1 (mod 7) ,   s3 = inverse of 1 in Z_7 = 1 .

Hence the solution is

x = 35 · 2 · 2 + 21 · 1 · 4 + 15 · 1 · 6 = 314 ≡ 104 (mod 105) .

Hence, any group of 104 + 105k objects is a possible solution to the problem.
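The construction in the theorem is mechanical enough to code directly. Here is an illustrative sketch (not part of the original notes) that uses Python's built-in modular inverse pow(x, -1, m), available since Python 3.8:

```python
from math import prod

def crt(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise relatively prime moduli."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m               # M_i = M / m_i
        si = pow(Mi, -1, m)       # s_i = inverse of M_i in Z_{m_i}
        x += Mi * si * r
    return x % M
```

For the worked example, crt([2, 4, 6], [3, 5, 7]) returns 104.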
CHAPTER 4

Induction, Recurrences

4.1. Sequences and Strings

4.1.1. Sequences. A sequence is a (usually infinite) ordered list of elements. Examples:

1. The sequence of positive integers: 1, 2, 3, 4, . . . , n, . . .
2. The sequence of positive even integers: 2, 4, 6, 8, . . . , 2n, . . .
3. The sequence of powers of 2: 1, 2, 4, 8, 16, . . .
4. The reciprocals of the positive integers: 1, 1/2, 1/3, 1/4, . . . , 1/n, . . .
5. The sequence of Fibonacci numbers (each one is the sum of the two previous ones): 0, 1, 1, 2, 3, 5, 8, 13, . . .

In general the elements of a sequence are represented with an indexed letter, say s1, s2, s3, . . . , sn, . . . The sequence itself can be defined by giving a rule, e.g.: sn = 2n + 1 is the sequence 3, 5, 7, 9, . . . Here we are assuming that the first element is s1, but we can start at any value of the index that we want; for instance, if we declare s0 to be the first term, the previous sequence would become 1, 3, 5, 7, . . . The sequence is symbolically represented {sn} or {sn} with n = 1 to ∞.
If sn ≤ sn+1 for every n the sequence is called increasing. If sn ≥ sn+1 then it is called decreasing. For instance sn = 2n + 1 is increasing: 3, 5, 7, 9, . . . ; while sn = 1/n is decreasing: 1, 1/2, 1/3, 1/4, . . .

If we remove elements from a sequence we obtain a subsequence. E.g., if we remove all odd numbers from the sequence of positive integers 1, 2, 3, 4, 5, 6, 7, 8, . . . , we get the subsequence consisting of the even positive integers: 2, 4, 6, 8, . . .

4.1.2. Sum (Sigma) and Product Notation. In order to abbreviate sums and products the following notations are used:

1. Sum (or sigma) notation:

   Σ (i=m to n) a_i = a_m + a_(m+1) + a_(m+2) + · · · + a_n

2. Product notation:

   Π (i=m to n) a_i = a_m · a_(m+1) · a_(m+2) · · · · · a_n

For instance, assume a_n = 2n + 1. Then

Σ (n=3 to 6) a_n = a_3 + a_4 + a_5 + a_6 = 7 + 9 + 11 + 13 = 40 ,

Π (n=3 to 6) a_n = a_3 · a_4 · a_5 · a_6 = 7 · 9 · 11 · 13 = 9009 .

4.1.3. Strings. Given a set X, a string over X is a finite ordered list of elements of X.

Example: If X is the set X = {a, b, c}, then the following are examples of strings over X: aba, bba, aaaa, etc.

Repetitions can be specified with superscripts, for instance: a²b³ac²a³ = aabbbaccaaa, (ab)³ = ababab, etc.

The length of a string is its number of elements, e.g., |abaccbab| = 8.
The string with no elements is called the null string, represented λ. Its length is, of course, zero: |λ| = 0.

Given two strings α and β over X, the string consisting of α followed by β is called the concatenation of α and β. For instance, if α = abac and β = baaab then αβ = abacbaaab.

The set of all strings over X is represented X*. The set of non-null strings over X (i.e., all strings over X except the null string) is represented X⁺.
4.2. Mathematical Induction

Many properties of positive integers can be proved by mathematical induction.

4.2.1. Principle of Mathematical Induction. Let P be a property of positive integers such that:
1. Basis Step: P(1) is true, and
2. Inductive Step: if P(n) is true, then P(n + 1) is true.
Then P(n) is true for all positive integers.

Remark: The premise P(n) in the inductive step is called the Induction Hypothesis.

The validity of the Principle of Mathematical Induction is obvious. The basis step states that P(1) is true. Then the inductive step implies that P(2) is also true. By the inductive step again we see that P(3) is true, and so on. Consequently the property is true for all positive integers.

Remark: In the basis step we may replace 1 with some other integer m. Then the conclusion is that the property is true for every integer n greater than or equal to m.

Example: Prove that the sum of the first n odd positive integers is n², i.e., 1 + 3 + 5 + · · · + (2n − 1) = n².

Answer: Let S(n) = 1 + 3 + 5 + · · · + (2n − 1). We want to prove by induction that for every positive integer n, S(n) = n².
1. Basis Step: If n = 1 we have S(1) = 1 = 1², so the property is true for 1.
2. Inductive Step: Assume (Induction Hypothesis) that the property is true for some positive integer n, i.e., S(n) = n². We must prove that it is also true for n + 1, i.e., S(n + 1) = (n + 1)². In fact:

S(n + 1) = 1 + 3 + 5 + · · · + (2n + 1) = S(n) + 2n + 1 .
But by induction hypothesis, S(n) = n², hence:

S(n + 1) = n² + 2n + 1 = (n + 1)² .

This completes the induction, and shows that the property is true for all positive integers.

Example: Prove that 2n + 1 ≤ 2^n for n ≥ 3.

Answer: This is an example in which the property is not true for all positive integers but only for integers greater than or equal to 3.
1. Basis Step: If n = 3 we have 2n + 1 = 2 · 3 + 1 = 7 and 2^n = 2³ = 8, so the property is true in this case.
2. Inductive Step: Assume (Induction Hypothesis) that the property is true for some positive integer n ≥ 3, i.e., 2n + 1 ≤ 2^n. We must prove that it is also true for n + 1, i.e., 2(n + 1) + 1 ≤ 2^(n+1). By the induction hypothesis we know that 2n ≤ 2^n, and we also have that 3 ≤ 2^n if n ≥ 3, hence

2(n + 1) + 1 = 2n + 3 ≤ 2^n + 2^n = 2^(n+1) .

This completes the induction, and shows that the property is true for all n ≥ 3.

Exercise: Prove the following identities by induction:

• 1 + 2 + 3 + · · · + n = n (n + 1) / 2 ,
• 1² + 2² + 3² + · · · + n² = n (n + 1) (2n + 1) / 6 ,
• 1³ + 2³ + 3³ + · · · + n³ = (1 + 2 + 3 + · · · + n)² .

4.2.2. Strong Form of Mathematical Induction. Let P be a property of positive integers such that:
1. Basis Step: P(1) is true, and
2. Inductive Step: if P(k) is true for all 1 ≤ k ≤ n, then P(n + 1) is true.
Then P(n) is true for all positive integers.
Example: Prove that every integer n ≥ 2 is prime or a product of primes.

Answer:
1. Basis Step: 2 is a prime number, so the property holds for n = 2.
2. Inductive Step: Assume that if 2 ≤ k ≤ n, then k is a prime number or a product of primes. Now, either n + 1 is a prime number or it is not. If it is a prime number then it verifies the property. If it is not a prime number, then it can be written as the product of two positive integers, n + 1 = k1 k2, such that 1 < k1, k2 < n + 1. By induction hypothesis each of k1 and k2 must be a prime or a product of primes, hence n + 1 is a product of primes. This completes the proof.

4.2.3. The Well-Ordering Principle. Every nonempty set of positive integers has a smallest element.

Example: Prove that √2 is irrational (i.e., √2 cannot be written as a quotient of two positive integers) using the well-ordering principle.

Answer: Assume that √2 is rational, i.e., √2 = a/b, where a and b are positive integers. Note that since √2 > 1 we have a > b. Now we have 2 = a²/b², hence 2b² = a². Since the left hand side is even, a² is even, but this implies that a itself is even, so a = 2a'. Hence 2b² = 4a'², and simplifying: b² = 2a'². From here we see that √2 = b/a'. Hence, starting with a fractional representation √2 = a/b, we end up with another fractional representation √2 = b/a' with a smaller numerator b < a. Repeating the same argument with the fraction b/a' we get another fraction with an even smaller numerator, and so on. So the set of possible numerators of a fraction representing √2 cannot have a smallest element, contradicting the well-ordering principle. Consequently, our assumption that √2 is rational has to be false.
4.3. Recurrence Relations

Here we look at recursive definitions under a different point of view: rather than definitions, they will be considered as equations that we must solve. The point is that a recursive definition is actually a definition when there is one and only one object satisfying it, i.e., when the equations involved in that definition have a unique solution. Also, the solution to those equations may provide a closed-form (explicit) formula for the object defined.

The recursive step in a recursive definition is also called a recurrence relation. The basis of the recursive definition is also called the initial conditions of the recurrence. So, for instance, in the recursive definition of the Fibonacci sequence, the recurrence is

Fn = Fn−1 + Fn−2   or   Fn − Fn−1 − Fn−2 = 0 ,

and the initial conditions are

F0 = 0 ,  F1 = 1 .

One way to solve some recurrence relations is by iteration, i.e., by using the recurrence repeatedly until obtaining an explicit closed-form formula. For instance, consider the following recurrence relation:

xn = r xn−1  (n > 0) ;   x0 = A .

By using the recurrence repeatedly we get:

xn = r xn−1 = r² xn−2 = r³ xn−3 = · · · = r^n x0 = A r^n ,

hence the solution is xn = A r^n.

We will focus on kth-order linear recurrence relations, which are of the form

C0 xn + C1 xn−1 + C2 xn−2 + · · · + Ck xn−k = bn ,

where C0 ≠ 0. If bn = 0 the recurrence relation is called homogeneous; otherwise it is called nonhomogeneous. In the following we assume that the coefficients C0, C1, . . . , Ck are constant.
.www. The ﬁrst one is xn = r xn−1 + c (n > 0). The solution is n xn = A r + c k=1 n rn−k = A rn + c and w Example: Assume that a country with currently 100 million people has a population growth rate (birth rate minus death rate) of 1% per year. and it also receives 100 thousand immigrants per year (which are quickly assimilated and reproduce at the same rate as the native population).01n − 1) . 000.01n + 100. . w w xn = A + c n jn .01. we have: xn = 1. c = 100. where c is a constant.com 4.01n − 1 xn = 100.jntuworld. 000. We examine two particular cases. m o This is the equation above with r = 1. First Order Recurrence Relations. The homogeneous case can be written in the following way: xn = r xn−1 Its general solution is xn = A r n .c ld if r = 1 . x0 = A . Using the summation notation. 000 · 1.01 + 1000 (1. 000 1.01 xn−1 + 100. 00. RECURRENCE RELATIONS 66 4.3. 000 (n > 0). 000. Find its population in 10 years from now.3. 000 . which is a geometric sequence with ratio r.1.01 − 1 n = 100. (Assume that all the immigrants arrive in a single batch at the end of the year. hence: 1. The nonhomogeneous case can be written in the following way: xn = r xn−1 + cn (n > 0) . 000. n (n > 0) . 000 and A = 100.) Answer : If we call xn = population in year n from now. x0 = A . its solution can be expressed like this: xn = A r + k=1 n ck rn−k . w tu r o rn − 1 r−1 x0 = A . if r = 1 . 000 · 1. x0 = 100.
So:

x10 = 100,000,000 · 1.01^10 + 10,000,000 (1.01^10 − 1) ≈ 111,508,434 .

2. The second particular case is for r = 1 and cn = c + d n, where c and d are constant (so cn is an arithmetic sequence):

xn = xn−1 + c + d n  (n > 0) ;   x0 = A .

The solution is now

xn = A + Σ (k=1 to n) (c + d k) = A + c n + d n (n + 1)/2 .

4.3.2. Second Order Recurrence Relations. Now we look at the recurrence relation

C0 xn + C1 xn−1 + C2 xn−2 = 0 .

First we will look for solutions of the form xn = c r^n. By plugging into the equation we get:

C0 c r^n + C1 c r^(n−1) + C2 c r^(n−2) = 0 ,

hence r must be a solution of the following equation, called the characteristic equation of the recurrence:

C0 r² + C1 r + C2 = 0 .

Let r1, r2 be the two (in general complex) roots of the above equation. They are called characteristic roots. We distinguish three cases:

1. Distinct Real Roots. In this case the general solution of the recurrence relation is

xn = c1 r1^n + c2 r2^n ,

where c1, c2 are arbitrary constants.

2. Double Real Root. If r1 = r2 = r, the general solution of the recurrence relation is

xn = c1 r^n + c2 n r^n ,

where c1, c2 are arbitrary constants.

3. Complex Roots. In this case the solution could be expressed in the same way as in the case of distinct real roots.
However, in order to avoid the use of complex numbers we write r1 = r e^(αi), r2 = r e^(−αi), k1 = c1 + c2, k2 = (c1 − c2) i, which yields:¹

xn = k1 r^n cos nα + k2 r^n sin nα .

Example: Find a closed-form formula for the Fibonacci sequence defined by:

Fn+1 = Fn + Fn−1  (n > 0) ;   F0 = 0, F1 = 1 .

Answer: The recurrence relation can be written

Fn − Fn−1 − Fn−2 = 0 .

The characteristic equation is

r² − r − 1 = 0 .

Its roots are:²

r1 = φ = (1 + √5)/2 ,   r2 = −φ^(−1) = (1 − √5)/2 .

They are distinct real roots, so the general solution for the recurrence is:

Fn = c1 φ^n + c2 (−φ^(−1))^n .

Using the initial conditions we get the value of the constants:

(n = 0)  c1 + c2 = 0
(n = 1)  c1 φ + c2 (−φ^(−1)) = 1
   ⇒  c1 = 1/√5 ,  c2 = −1/√5 .

Hence:

Fn = (1/√5) (φ^n − (−φ)^(−n)) .

¹Reminder: e^(αi) = cos α + i sin α.
²φ = (1 + √5)/2 is the Golden Ratio.
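The closed form can be verified numerically against the recurrence itself. An illustrative sketch (rounding compensates for floating-point error in the irrational constants):

```python
from math import sqrt

def fib(n):
    """Iterate F_n = F_{n-1} + F_{n-2} with F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed(n):
    """Closed form: F_n = (phi^n - (-phi)^{-n}) / sqrt(5)."""
    phi = (1 + sqrt(5)) / 2
    return round((phi**n - (-phi)**(-n)) / sqrt(5))
```

The two functions agree for every n small enough that double precision can represent phi^n accurately.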
CHAPTER 5

Counting

5.1. Basic Principles

5.1.1. The Rule of Sum. If a task can be performed in m ways, while another task can be performed in n ways, and the two tasks cannot be performed simultaneously, then performing either task can be accomplished in m + n ways.

For instance, if a class has 30 male students and 25 female students, then the class has 30 + 25 = 55 students.

Set theoretical version of the rule of sum: If A and B are disjoint sets (A ∩ B = ∅) then

|A ∪ B| = |A| + |B| .

More generally, if the sets A1, A2, . . . , An are pairwise disjoint, then:

|A1 ∪ A2 ∪ · · · ∪ An| = |A1| + |A2| + · · · + |An| .

5.1.2. The Rule of Product. If a task can be performed in m ways and another independent task can be performed in n ways, then the combination of both tasks can be performed in mn ways.

For instance, assume that a license plate contains two letters followed by three digits. How many different license plates can be printed? Answer: each letter can be printed in 26 ways, and each digit can be printed in 10 ways, so 26 · 26 · 10 · 10 · 10 = 676000 different plates can be printed.

Set theoretical version of the rule of product: Let A × B be the Cartesian product of sets A and B. Then:

|A × B| = |A| · |B| .

More generally:

|A1 × A2 × · · · × An| = |A1| · |A2| · · · |An| .
Exercise: Given a set A with m elements and a set B with n elements, find the number of functions from A to B.

5.1.3. The Inclusion-Exclusion Principle. The inclusion-exclusion principle generalizes the rule of sum to non-disjoint sets. In general, for arbitrary (but finite) sets A, B:

|A ∪ B| = |A| + |B| − |A ∩ B| .

Example: Assume that in a university with 1000 students, 200 students are taking a course in mathematics, 300 are taking a course in physics, and 50 students are taking both. How many students are taking at least one of those courses?

Answer: If U = total set of students in the university, M = set of students taking Mathematics, P = set of students taking Physics, then:

|M ∪ P| = |M| + |P| − |M ∩ P| = 300 + 200 − 50 = 450

students are taking Mathematics or Physics.

For three sets the following formula applies:

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C| ,

and for an arbitrary union of sets:

|A1 ∪ A2 ∪ · · · ∪ An| = s1 − s2 + s3 − s4 + · · · ± sn ,

where sk = sum of the cardinalities of all possible k-fold intersections of the given sets.
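The two-set identity is easy to check with Python sets. The student ids below are made up purely to match the counts in the example:

```python
# Hypothetical ids chosen so that |M| = 200, |P| = 300, |M ∩ P| = 50.
M = set(range(1, 201))       # students taking Mathematics
P = set(range(151, 451))     # students taking Physics

union_by_formula = len(M) + len(P) - len(M & P)
```

Both len(M | P) and union_by_formula evaluate to 450, the answer computed above.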
5.2. Combinatorics

5.2.1. Permutations. Assume that we have n objects. Any arrangement of any k of these objects in a given order is called a permutation of size k. If k = n then we call it just a permutation of the n objects. For instance, the permutations of the letters a, b, c are the following: abc, acb, bac, bca, cab, cba. The permutations of size 2 of the letters a, b, c, d are: ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, dc.

Note that the order is important. Given two permutations, they are considered equal if they have the same elements arranged in the same order.

We find the number P(n, k) of permutations of size k of n given objects in the following way: The first object in an arrangement can be chosen in n ways, the second one in n − 1 ways, the third one in n − 2 ways, and so on, hence:

P(n, k) = n × (n − 1) × · · · × (n − k + 1)   (k factors)   = n! / (n − k)! ,

where

n! = 1 × 2 × 3 × · · · × n   (n factors)

is called "n factorial". By convention 0! = 1. The number P(n, n) of permutations of n objects is

P(n, n) = n! .

For instance, there are 3! = 6 permutations of the 3 letters a, b, c. The number of permutations of size 2 of the 4 letters a, b, c, d is P(4, 2) = 4 × 3 = 12.

Exercise: Given a set A with m elements and a set B with n elements, find the number of one-to-one functions from A to B.

5.2.2. Combinations. Assume that we have a set A with n objects. Any subset of A of size r is called a combination of n elements taken r at a time. For instance, the combinations of the letters a, b, c, d, e taken 3 at a time are: abc, abd, abe, acd, ace, ade, bcd, bce, bde, cde, where two combinations are considered identical if they have the same elements regardless of their order.
The number of subsets of size r in a set A with n elements is:

C(n, r) = n! / (r! (n − r)!) .

The symbol (n over r), read "n choose r", is often used instead of C(n, r).

One way to derive the formula for C(n, r) is the following. Let A be a set with n objects. In order to generate all possible permutations of size r of the elements of A we 1) take all possible subsets of size r in the set A, and 2) permute the r elements in each subset in all possible ways. Task 1) can be performed in C(n, r) ways, and task 2) can be performed in P(r, r) ways. By the product rule we have P(n, r) = C(n, r) × P(r, r), hence

C(n, r) = P(n, r) / P(r, r) = n! / (r! (n − r)!) .
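Both counting functions are short in Python. This is an illustrative sketch mirroring the formulas above:

```python
from math import factorial

def P(n, k):
    """Number of permutations of size k of n objects: n!/(n-k)!."""
    return factorial(n) // factorial(n - k)

def C(n, r):
    """Number of combinations: P(n, r)/P(r, r) = n!/(r!(n-r)!)."""
    return P(n, r) // P(r, r)
```

Here P(4, 2) returns 12 and C(5, 3) returns 10, matching the twelve size-2 permutations and ten size-3 combinations listed above.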
5.3. Generalized Permutations and Combinations

5.3.1. Permutations with Repeated Elements. Assume that we have an alphabet with k letters and we want to write all possible words containing n1 times the first letter of the alphabet, n2 times the second letter, . . . , nk times the kth letter. How many words can we write? We call this number P(n; n1, n2, . . . , nk), where n = n1 + n2 + · · · + nk.

Example: With 3 a's and 2 b's we can write the following 5-letter words: aaabb, aabab, aabba, abaab, ababa, abbaa, baaab, baaba, babaa, bbaaa.

We may solve this problem in the following way, as illustrated with the example above. Let us distinguish the different copies of a letter with subscripts: a1 a2 a3 b1 b2. Next, generate each permutation of these five elements by choosing 1) the position of each kind of letter, then 2) the subscripts to place on the 3 a's, then 3) the subscripts to place on the 2 b's. Task 1) can be performed in P(5; 3, 2) ways, task 2) can be performed in 3! ways, and task 3) can be performed in 2! ways. By the product rule we have 5! = P(5; 3, 2) × 3! × 2!, hence P(5; 3, 2) = 5!/(3! 2!).

In general the formula is:

P(n; n1, n2, . . . , nk) = n! / (n1! n2! · · · nk!) .

5.3.2. Combinations with Repetition. Assume that we have a set A with n elements. Any selection of r objects from A, where each object can be selected more than once, is called a combination of n objects taken r at a time with repetition. For instance, the combinations of the letters a, b, c, d taken 3 at a time with repetition are: aaa, aab, aac, aad, abb, abc, abd, acc, acd, add, bbb, bbc, bbd, bcc, bcd, bdd, ccc, ccd, cdd, ddd. Two combinations with repetition are considered identical if they have the same elements repeated the same number of times, regardless of their order.

Note that the following are equivalent:
1. The number of combinations of n objects taken r at a time with repetition.
2. The number of nonnegative integer solutions of the equation:

x1 + x2 + · · · + xn = r .

3. The number of ways r identical objects can be distributed among n distinct containers.

Example: Assume that we have 3 different (empty) milk containers and 7 quarts of milk that we can measure with a one-quart measuring cup. In how many ways can we distribute the milk among the three containers?

We solve the problem in the following way. Let x1, x2, x3 be the quarts of milk to put in containers number 1, 2 and 3 respectively. The number of possible distributions of milk equals the number of nonnegative integer solutions for the equation x1 + x2 + x3 = 7. Instead of using numbers for writing the solutions, we will use strokes, so for instance we represent the solution x1 = 2, x2 = 1, x3 = 4, or 2 + 1 + 4, like this: || + | + ||||. Now, each possible solution is an arrangement of 7 strokes and 2 plus signs, so the number of arrangements is

P(9; 7, 2) = 9!/(7! 2!) = C(9, 7) = 36 .

The general solution is:

P(n + r − 1; r, n − 1) = (n + r − 1)! / (r! (n − 1)!) = C(n + r − 1, r) .
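The equivalence between the three counts can be confirmed by brute force for small cases. An illustrative sketch:

```python
from math import comb
from itertools import product

def count_solutions(n, r):
    """Count nonnegative integer solutions of x1 + ... + xn = r."""
    return sum(1 for xs in product(range(r + 1), repeat=n) if sum(xs) == r)
```

For the milk example, count_solutions(3, 7) returns 36, which equals comb(9, 7), the strokes-and-plus-signs count C(3 + 7 − 1, 7).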
5.4. Binomial Coefficients

5.4.1. Binomial Theorem. The following identities can be easily checked:

(x + y)⁰ = 1
(x + y)¹ = x + y
(x + y)² = x² + 2xy + y²
(x + y)³ = x³ + 3x²y + 3xy² + y³

They can be generalized by the following formula, called the Binomial Theorem:

(x + y)^n = Σ (k=0 to n) C(n, k) x^(n−k) y^k
          = C(n, 0) x^n + C(n, 1) x^(n−1) y + C(n, 2) x^(n−2) y² + · · ·
            + C(n, n−1) x y^(n−1) + C(n, n) y^n .

We can find this formula by writing

(x + y)^n = (x + y) × (x + y) × · · · × (x + y)   (n factors) ,

expanding, and grouping terms of the form x^a y^b. Since there are n factors of the form (x + y), we have a + b = n, hence the terms must be of the form x^(n−k) y^k. The coefficient of x^(n−k) y^k will be equal to the number of ways in which we can select the y from any k of the factors (and the x from the remaining n − k factors), which is C(n, k). The expression C(n, k), also written (n over k), is often called a binomial coefficient.

Exercise: Prove

Σ (k=0 to n) C(n, k) = 2^n   and   Σ (k=0 to n) (−1)^k C(n, k) = 0 .

Hint: Apply the binomial theorem to (1 + 1)^n and (1 − 1)^n.

5.4.2. Properties of Binomial Coefficients. The binomial coefficients have the following properties:

1. C(n, k) = C(n, n − k) ,
2. C(n + 1, k + 1) = C(n, k) + C(n, k + 1)

The first property follows easily from C(n, k) = n! / (k! (n − k)!). The second property can be proved by choosing a distinguished element a in a set A of n + 1 elements. The set A has C(n + 1, k + 1) subsets of size k + 1. Those subsets can be partitioned into two classes: that of the subsets containing a, and that of the subsets not containing a. The number of subsets containing a equals the number of subsets of A − {a} of size k, i.e., C(n, k). The number of subsets not containing a is the number of subsets of A − {a} of size k + 1, i.e., C(n, k + 1). Using the sum principle we find that in fact C(n + 1, k + 1) = C(n, k) + C(n, k + 1).

5.4.3. Pascal's Triangle. The properties shown in the previous section allow us to compute binomial coefficients in a simple way. Look at the following triangular arrangement of binomial coefficients:

    C(0,0)
    C(1,0) C(1,1)
    C(2,0) C(2,1) C(2,2)
    C(3,0) C(3,1) C(3,2) C(3,3)
    C(4,0) C(4,1) C(4,2) C(4,3) C(4,4)

We notice that each binomial coefficient on this arrangement must be the sum of the two closest binomial coefficients on the line above it. This together with C(n, 0) = C(n, n) = 1 allows us to compute very quickly the values of the binomial coefficients on the arrangement:

    1
    1 1
    1 2 1
    1 3 3 1
    1 4 6 4 1

This arrangement of binomial coefficients is called Pascal's Triangle.¹

¹Although it was already known by the Chinese in the XIV century.
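The recurrence behind Pascal's Triangle translates directly into a short routine:

```python
# Build Pascal's Triangle using only the recurrence
# C(n+1, k+1) = C(n, k) + C(n, k+1) and the boundary C(n, 0) = C(n, n) = 1.
def pascal(rows):
    tri = [[1]]
    while len(tri) < rows:
        p = tri[-1]
        tri.append([1] + [p[i] + p[i + 1] for i in range(len(p) - 1)] + [1])
    return tri

for row in pascal(5):
    print(row)
# [1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1]
```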
5.5. The Pigeonhole Principle

5.5.1. The Pigeonhole Principle. The pigeonhole principle is used for proving that a certain situation must actually occur. It says the following: If n pigeonholes are occupied by m pigeons and m > n, then at least one pigeonhole is occupied by more than one pigeon.¹

Example: In any given set of 13 people at least two of them have their birthday during the same month.

Example: Let S be a set of eleven 2-digit numbers. Prove that S must have two elements whose digits have the same difference (for instance in S = {10, 14, 19, 22, 26, 28, 49, 53, 70, 90, 93}, the digits of the numbers 28 and 93 have the same difference: 8 − 2 = 6, 9 − 3 = 6).

Answer: The digits of a two-digit number can have 10 possible differences (from 0 to 9). So, in a list of 11 numbers there must be two with the same difference.
Example: Assume that we choose three different digits from 1 to 9 and write all permutations of those digits. Prove that among the 3-digit numbers written that way there are two whose difference is a multiple of 500.

Answer: There are 9 · 8 · 7 = 504 permutations of three digits. On the other hand, if we divide the 504 numbers by 500 we can get only 500 possible remainders, so at least two numbers give the same remainder, and their difference must be a multiple of 500.

Exercise: Prove that if we select n + 1 numbers from the set S = {1, 2, 3, . . . , 2n}, among the numbers selected there are two such that one is a multiple of the other one.
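The argument in the last example can be checked by enumeration: there really are 504 such numbers, and a pair sharing a remainder mod 500 (hence differing by a multiple of 500) is quickly found.

```python
from itertools import permutations

# All 3-digit numbers formed by permutations of three distinct digits 1..9.
nums = [100*a + 10*b + c for a, b, c in permutations(range(1, 10), 3)]
print(len(nums))  # 504

# With only 500 possible remainders mod 500, a collision must occur.
by_rem, pair = {}, None
for m in nums:
    if m % 500 in by_rem:
        pair = (by_rem[m % 500], m)
        break
    by_rem[m % 500] = m
print(pair is not None and (pair[1] - pair[0]) % 500 == 0)  # True
```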
¹The Pigeonhole Principle (Schubfachprinzip) was first used by Dirichlet in Number Theory. The term pigeonhole actually refers to one of those old-fashioned writing desks with thin vertical wooden partitions in which to file letters.
CHAPTER 6
Probability
6.1. Probability

6.1.1. Introduction. Assume that we perform an experiment such as tossing a coin or rolling a die. The set of possible outcomes is called the sample space of the experiment. An event is a subset of the sample space. For instance, if we toss a coin three times, the sample space is

    S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} .

The event "at least two heads in a row" would be the subset

    E = {HHH, HHT, THH} .
If all possible outcomes of an experiment have the same likelihood of occurrence, then the probability of an event E ⊆ S is given by Laplace's rule:

    P(E) = |E| / |S| .

For instance, the probability of getting at least two heads in a row in the above experiment is 3/8.
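A quick enumeration of the sample space above confirms Laplace's rule here:

```python
from fractions import Fraction
from itertools import product

# Laplace's rule P(E) = |E| / |S| on the three-coin-toss sample space.
S = [''.join(t) for t in product('HT', repeat=3)]
E = [s for s in S if 'HH' in s]  # at least two heads in a row
print(sorted(E))                 # ['HHH', 'HHT', 'THH']
print(Fraction(len(E), len(S)))  # 3/8
```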
6.1.2. Probability Function. In general the likelihood of different outcomes of an experiment may not be the same. In that case the probability of each possible outcome x is a function P(x). This function verifies:

    0 ≤ P(x) ≤ 1 for all x ∈ S

and

    Σ_{x∈S} P(x) = 1 .

The probability of an event E ⊆ S will be

    P(E) = Σ_{x∈E} P(x) .
Example: Assume that a die is loaded so that the probability of obtaining n points is proportional to n. Find the probability of getting an even number when rolling that die.

Answer: First we must find the probability function P(n) (n = 1, 2, . . . , 6). We are told that P(n) is proportional to n, hence P(n) = kn. Since P(S) = 1 we have P(1) + P(2) + · · · + P(6) = 1, i.e., k · 1 + k · 2 + · · · + k · 6 = 21k = 1, so k = 1/21 and P(n) = n/21. Next we want to find the probability of E = {2, 4, 6}, i.e.,

    P(E) = P(2) + P(4) + P(6) = 2/21 + 4/21 + 6/21 = 12/21 = 4/7 .

6.1.3. Properties of probability. Let P be a probability function on a sample space S. Then:

1. For every event E ⊆ S,
0 ≤ P (E) ≤ 1 .
2. P(∅) = 0, P(S) = 1.

3. For every event E ⊆ S, if Ē is the complement of E ("not E") then

    P(Ē) = 1 − P(E) .

4. If E1, E2 ⊆ S are two events, then

    P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2) .
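The loaded-die computation above can be replayed with exact fractions:

```python
from fractions import Fraction

# Loaded die: P(n) proportional to n, so P(n) = n/21.
P = {n: Fraction(n, 21) for n in range(1, 7)}
print(sum(P.values()))     # 1   (a valid probability function)
print(P[2] + P[4] + P[6])  # 4/7 (= 12/21)
```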
In particular, if E1 ∩ E2 = ∅ (E1 and E2 are mutually exclusive, i.e., they cannot happen at the same time) then

    P(E1 ∪ E2) = P(E1) + P(E2) .

Example: Find the probability of getting a sum different from 10 or 12 after rolling two dice.

Answer: We can get 10 in 3 different ways: 4 + 6, 5 + 5, 6 + 4, so P(10) = 3/36. Similarly we get that P(12) = 1/36. Since they are mutually exclusive events, the probability of getting 10 or 12 is P(10) + P(12) = 3/36 + 1/36 = 4/36 = 1/9. So the probability of not getting 10 or 12 is 1 − 1/9 = 8/9.

6.1.4. Conditional Probability. The conditional probability of an event E given F, represented P(E | F), is the probability of E assuming that F has occurred. It is like restricting the sample space to F. Its value is

    P(E | F) = P(E ∩ F) / P(F) .
Example: Find the probability of obtaining a sum of 10 after rolling two fair dice. Find the probability of that event if we know that at least one of the dice shows 5 points.

Answer: We call E = "obtaining sum 10" and F = "at least one of the dice shows 5 points". The number of possible outcomes is 6 × 6 = 36. The event "obtaining a sum 10" is E = {(4, 6), (5, 5), (6, 4)}, so |E| = 3. Hence the probability is P(E) = |E|/|S| = 3/36 = 1/12.

Now, if we know that at least one of the dice shows 5 points then the sample space shrinks to

    F = {(1, 5), (2, 5), (3, 5), (4, 5), (5, 5), (6, 5), (5, 1), (5, 2), (5, 3), (5, 4), (5, 6)} ,

so |F| = 11, and the ways to obtain a sum 10 are E ∩ F = {(5, 5)}, |E ∩ F| = 1, so the probability is P(E | F) = P(E ∩ F)/P(F) = 1/11.

6.1.5. Independent Events. Two events E and F are said to be independent if the probability of one of them does not depend on the other, e.g.: P(E | F) = P(E). In these circumstances:

    P(E ∩ F) = P(E) · P(F) .

Note that if E is independent of F then also F is independent of E, e.g., P(F | E) = P(F).

Example: Assume that the probability that a shooter hits a target is p = 0.7, and that hitting the target in different shots are independent events. Find:

1. The probability that the shooter does not hit the target in one shot.
2. The probability that the shooter does not hit the target three times in a row.
3. The probability that the shooter hits the target at least once after shooting three times.

Answer:
1. P(not hitting the target in one shot) = 1 − 0.7 = 0.3.
2. P(not hitting the target three times in a row) = 0.3³ = 0.027.
3. P(hitting the target at least once in three shots) = 1 − 0.027 = 0.973.
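Both dice answers can be confirmed by enumerating the 36 outcomes:

```python
from fractions import Fraction
from itertools import product

# E = "sum is 10", F = "at least one die shows 5".
S = list(product(range(1, 7), repeat=2))
E = [p for p in S if sum(p) == 10]
F = [p for p in S if 5 in p]
EF = [p for p in E if p in F]
print(Fraction(len(E), len(S)))   # 1/12  (unconditional)
print(Fraction(len(EF), len(F)))  # 1/11  (conditional on F)
```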
6.1.6. Bayes' Theorem. Suppose that a sample space S is partitioned into n classes C1, C2, . . . , Cn which are pairwise mutually exclusive and whose union fills the whole sample space. Then for any event F we have

    P(F) = Σ_{i=1}^{n} P(F | Ci) P(Ci)

and

    P(Cj | F) = P(F | Cj) P(Cj) / P(F) .

Example: In a country with 100 million people, 100 thousand of them have disease X. A test designed to detect the disease has a 99% probability of detecting it when administered to a person who has it, but it also has a 5% probability of giving a false positive when given to a person who does not have it. A person is given the test and it comes out positive. What is the probability that that person has the disease?

Answer: The classes are C1 = "has the disease" and C2 = "does not have the disease", and the event is F = "the test gives a positive". We have |S| = 100,000,000, |C1| = 100,000 and |C2| = 99,900,000, hence P(C1) = |C1|/|S| = 0.001 and P(C2) = |C2|/|S| = 0.999. Also P(F | C1) = 0.99 and P(F | C2) = 0.05. Hence:

    P(F) = P(F | C1) · P(C1) + P(F | C2) · P(C2) = 0.99 · 0.001 + 0.05 · 0.999 = 0.05094 ,

and by Bayes' theorem:

    P(C1 | F) = P(F | C1) · P(C1) / P(F) = 0.99 · 0.001 / 0.05094 = 0.019434628 · · · ≈ 2% .

(So the test is really of little use when given to a random person; however it might be useful in combination with other tests or other evidence that the person might have the disease.)
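The computation above, replayed in a few lines:

```python
# Disease-test example: law of total probability, then Bayes' theorem.
p_c1, p_c2 = 0.001, 0.999   # P(disease), P(no disease)
p_f_c1, p_f_c2 = 0.99, 0.05 # P(positive | disease), P(positive | no disease)

p_f = p_f_c1 * p_c1 + p_f_c2 * p_c2   # P(positive)
print(round(p_f, 5))                   # 0.05094
print(round(p_f_c1 * p_c1 / p_f, 4))   # 0.0194, i.e. about 2%
```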
CHAPTER 7

Graph Theory

7.1. Graphs

7.1.1. Graphs. Consider the following examples:

1. A road map, consisting of a number of towns connected with roads.
2. The representation of a binary relation defined on a given set. The relation of a given element x to another element y is represented with an arrow connecting x to y.

The former is an example of an (undirected) graph; the latter is an example of a directed graph or digraph.

In general a graph G consists of two things:

1. The vertex set V, whose elements are called vertices, nodes or points.
2. The edge set E, or set of edges connecting pairs of vertices. Each edge e ∈ E is associated with a pair of vertices. If the edges are directed then they are also called directed edges or arcs.

Figure 7.1. Undirected Graph.
A graph is sometimes represented by the pair (V, E) (we assume V and E finite).

If the graph is undirected and there is a unique edge e connecting x and y we may write e = {x, y}, so E can be regarded as a set of unordered pairs. In this context we may also write e = (x, y), understanding that here (x, y) is not an ordered pair, but the name of an edge.

Figure 7.2. Directed Graph.

If the graph is directed and there is a unique edge e pointing from x to y, then we may write e = (x, y), so E may be regarded as a set of ordered pairs. If e = (x, y), the vertex x is called origin, source or initial point of the edge e, and y is called the terminus, terminating vertex or terminal point.

Two vertices connected by an edge are called adjacent. They are also the endpoints of the edge, and the edge is said to be incident to each of its endpoints. If the graph is directed, an edge pointing from vertex x to vertex y is said to be incident from x and incident to y.

Two edges connecting the same pair of vertices (and pointing in the same direction if the graph is directed) are called parallel or multiple. An edge connecting a vertex to itself is called a loop.

Figure 7.3. Graph with parallel edges.
A graph with neither loops nor multiple edges is called a simple graph. If a graph has multiple edges but no loops then it is called a multigraph. If it has loops (and possibly also multiple edges) then it is called a pseudograph. The following table summarizes the graph terminology:

Table 7.1. Graph Terminology

    Type                 Edges       Multiple Edges Allowed?  Loops Allowed?
    Simple graph         undirected  no                       no
    Multigraph           undirected  yes                      no
    Pseudograph          undirected  yes                      yes
    Directed graph       directed    no                       yes
    Directed multigraph  directed    yes                      yes

The degree of a vertex v, represented deg(v), is the number of edges that contain it (loops are counted twice). A vertex of degree zero (not connected to any other vertex) is called isolated. A vertex of degree 1 is called pendant.

The Handshaking Theorem. Let G = (V, E) be an undirected graph with e edges. Then

    2e = Σ_{v∈V} deg(v) .

(This applies even if multiple edges and loops are present.)

Number of vertices of odd degree. An undirected graph has an even number of vertices of odd degree.

Proof: Let Ve and Vo be respectively the set of vertices of even degree and the set of vertices of odd degree in an undirected graph G = (V, E). Then

    2e = Σ_{v∈V} deg(v) = Σ_{v∈Ve} deg(v) + Σ_{v∈Vo} deg(v) .

The total sum must be 2e, which is even. Since deg(v) is even for v ∈ Ve, the first sum in the right-hand side of the equality is even, so the second sum must be even too. But its terms are all odd, so there must be an even number of them.

In a graph with directed edges, the in-degree of a vertex v, denoted deg⁻(v), is the number of edges with v as their terminal vertex. The out-degree of a vertex v, denoted deg⁺(v), is the number of edges with v as their initial vertex. (Note that a loop at a vertex contributes 1 to both the in-degree and the out-degree of this vertex.)
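As a quick check, the handshaking identity can be verified on a small pseudograph (assumed here: edges ab, bc, bc, ad and a loop at d, matching the graph used later in figure 7.8):

```python
# Degree sum equals twice the number of edges; a loop contributes 2.
edges = [('a', 'b'), ('b', 'c'), ('b', 'c'), ('a', 'd'), ('d', 'd')]
deg = {}
for u, v in edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1
print(deg)                                  # {'a': 2, 'b': 3, 'c': 2, 'd': 3}
print(sum(deg.values()) == 2 * len(edges))  # True
# Vertices of odd degree (b and d): an even number of them, as proved above.
print(sum(1 for v in deg if deg[v] % 2 == 1))  # 2
```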
Sum of degrees in a directed graph. Let G = (V, E) be a directed graph. Then

    Σ_{v∈V} deg⁻(v) = Σ_{v∈V} deg⁺(v) = |E| .

7.1.2. Weighted Graph. A weighted graph is a graph whose edges have been labeled with numbers. The length of a path in a weighted graph is the sum of the weights of the edges in the path.

Figure 7.4. Weighted Graph.

7.1.3. Special Graphs. Here we examine a few special graphs.

1. Complete Graph: a simple undirected graph G such that every pair of distinct vertices in G is connected by an edge. The complete graph of n vertices is represented Kn (fig. 7.5). A complete directed graph is a simple directed graph G = (V, E) such that every pair of distinct vertices in G is connected by exactly one edge, so, for each pair of distinct vertices, either (x, y) or (y, x) (but not both) is in E.

2. The n-cube: a graph with 2ⁿ vertices labeled 0, 1, . . . , 2ⁿ − 1 so that two of them are connected with an edge if their binary representations differ in exactly one bit.

Figure 7.5. 3-cube.
Figure 7.6. Complete graph K5.

3. Bipartite Graph: a graph G = (V, E) in which V can be partitioned into two subsets V1 and V2 so that each edge in G connects some vertex in V1 to some vertex in V2. A bipartite simple graph is called complete if each vertex in V1 is connected to each vertex in V2. If |V1| = m and |V2| = n, the corresponding complete bipartite graph is represented Km,n (fig. 7.7).

A graph is bipartite iff its vertices can be colored with two colors so that every edge connects vertices of different color.

Question: Is the n-cube bipartite? Hint: color in red all vertices whose binary representation has an even number of 1's, and color in blue the ones with an odd number of 1's.

Figure 7.7. Complete bipartite graph K3,4.

4. Regular Graph: a simple graph whose vertices all have the same degree. For instance, the n-cube is regular.

7.1.4. Subgraph. Given a graph G = (V, E), a subgraph G′ = (V′, E′) of G is another graph such that V′ ⊆ V and E′ ⊆ E. If V′ = V then G′ is called a spanning subgraph of G.
jntuworld.1.c ld m o . and its edge set contains all edges from G connecting vertices in U . denoted U .www. the subgraph of G induced by U . GRAPHS 87 Given a subset of vertices U ⊆ V . is the graph whose vertex set is U .com 7. w tu r o . w w w jn .
7.2. Representations of Graphs

7.2.1. Adjacency matrix. The adjacency matrix of a graph is a matrix with rows and columns labeled by the vertices and such that its entry in row i, column j, i ≠ j, is the number of edges incident on i and j.¹ For instance the following is the adjacency matrix of the graph of figure 7.8:

        a b c d
    a   0 1 0 1
    b   1 0 2 0
    c   0 2 0 0
    d   1 0 0 1

¹For some authors, if i = j then the entry is twice the number of loops incident on i, so in the example of figure 7.8 entry (d, d) would be 2 instead of 1.

Figure 7.8.

One of the uses of the adjacency matrix A of a simple graph G is to compute the number of paths between two vertices, namely entry (i, j) of Aⁿ is the number of paths of length n from i to j.

7.2.2. Incidence matrix. The incidence matrix of a graph G is a matrix with rows labeled by vertices and columns labeled by edges, so that the entry for row v, column e is 1 if e is incident on v, and 0 otherwise. As an example, the following is the incidence matrix of the graph of figure 7.8:
        e1 e2 e3 e4 e5
    a    1  0  0  1  0
    b    1  1  1  0  0
    c    0  1  1  0  0
    d    0  0  0  1  1
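The path-counting use of the adjacency matrix can be checked on a small simple graph, here the 4-cycle a-b-c-d-a (chosen for illustration):

```python
# Entry (i, j) of A^n counts the walks of length n from vertex i to vertex j.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Adjacency matrix of the cycle a-b-c-d-a (vertices indexed 0..3).
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
A2 = matmul(A, A)
A3 = matmul(A2, A)
print(A2[0][0])  # 2 -- two length-2 paths from a back to a (via b or via d)
print(A3[0][1])  # 4 -- four length-3 paths from a to b
```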
7.2.3. Graph Isomorphism. Two graphs G1 = (V1 , E1 ), G2 = (V2 , E2 ), are called isomorphic if there is a bijection f : V1 → V2 and a bijection g : E1 → E2 such that an edge e is adjacent to vertices v and w if and only if g(e) is adjacent to f (v) and f (w) (ﬁg. 7.9).
Figure 7.9. Two isomorphic graphs.

Two graphs are isomorphic if and only if for some ordering of their vertices their adjacency matrices are equal.

An invariant is a property such that if a graph has it then all graphs isomorphic to it also have it. Examples of invariants are their number of vertices, their number of edges, "has a vertex of degree k", "has a simple cycle of length l", etc. It is possible to prove that two graphs are not isomorphic by showing an invariant property that one has and the other one does not have. For instance the graphs in figure 7.10 cannot be isomorphic because one has a vertex of degree 2 and the other one doesn't.
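One such invariant is the sorted degree sequence: isomorphic graphs must have the same one, so differing sequences certify non-isomorphism. A sketch on two illustrative graphs (not the ones of figure 7.10):

```python
from collections import Counter

# The multiset of vertex degrees is an isomorphism invariant.
def degree_sequence(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg.values())

g1 = [(1, 2), (2, 3), (3, 4), (4, 1)]  # a 4-cycle
g2 = [(1, 2), (1, 3), (1, 4), (2, 3)]  # same counts of vertices and edges
print(degree_sequence(g1))  # [2, 2, 2, 2]
print(degree_sequence(g2))  # [1, 2, 2, 3] -> not isomorphic to g1
```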
Figure 7.10. Non-isomorphic graphs.
7.3. Paths and Circuits

7.3.1. Paths. A path from v0 to vn of length n is a sequence of n + 1 vertices (vk) and n edges (ek) of the form v0, e1, v1, e2, v2, . . . , en, vn, where each edge ek connects vk−1 with vk (and points from vk−1 to vk if the edge is directed). The path may be specified by giving only the sequence of edges e1, . . . , en. If there are no multiple edges we can specify the path by giving only the vertices: v0, v1, . . . , vn. The path is a circuit (or cycle) if it begins and ends at the same vertex, i.e., v0 = vn, and has length greater than zero. A path or circuit is simple if it does not contain the same edge twice.
7.3.2. Connected Graphs. A graph G is called connected if there is a path between any two distinct vertices of G. Otherwise the graph is called disconnected. A directed graph is connected if its associated undirected graph (obtained by ignoring the directions of the edges) is connected. A directed graph is strongly connected if for every pair of distinct vertices u, v, there is a path from u to v and there is a path from v to u.

A connected component of G is any connected subgraph G′ = (V′, E′) of G = (V, E) such that there is no edge (in G) from a vertex in V′ to a vertex in V − V′. Given a vertex v in G, the component of G containing v is the subgraph G′ of G consisting of all edges and vertices of G contained in some path beginning at v.

7.3.3. The Seven Bridges of Königsberg. This is a classical problem that started the discipline today called graph theory.
During the eighteenth century the city of Königsberg (in East Prussia)¹ was divided into four sections, including the island of Kneiphof, by the Pregel river. Seven bridges connected the regions, as shown in figure 7.11. It was said that residents spent their Sunday walks trying to find a way to walk about the city so as to cross each bridge exactly once and then return to the starting point. The first person to solve the problem (in the negative) was the Swiss mathematician Leonhard Euler in 1736. He represented the sections of the city and the seven bridges by the graph of figure 7.12, and proved that it is impossible to find a path in it that traverses every edge of the graph exactly once. In the next section we study why this is so.
¹The city is currently called Kaliningrad and is part of the Russian republic.
Figure 7.11. The Seven Bridges of Königsberg.

Figure 7.12. Graph for the Seven Bridges of Königsberg.

7.3.4. Euler paths and circuits. Let G = (V, E) be a graph with no isolated vertices. An Euler path in G is a simple path that traverses every edge of the graph exactly once. Analogously, an Euler circuit in G is a simple circuit that traverses every edge of the graph exactly once.

Existence of Euler Paths and Circuits. The graphs that have an Euler path can be characterized by looking at the degree of their vertices. Recall that the degree of a vertex v, represented deg(v), is the number of edges that contain v (loops are counted twice). An even vertex is a vertex with even degree; an odd vertex is a vertex with odd degree. The sum of the degrees of all vertices in a graph equals twice its number of edges, so it is an even number. As a consequence, the number of odd vertices in a graph is always even.

Let G be a connected multigraph. Then G contains an Euler circuit if and only if all its vertices have even degree. Also, G contains an Euler path from vertex a to vertex b (≠ a) if and only if a and b have odd degree, and all its other vertices have even degree.
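This criterion settles the bridges problem. Taking the four land regions as vertices A, B, C, D (labels assumed here) and the seven bridges as edges of the multigraph of figure 7.12:

```python
from collections import Counter

# The seven bridges as edges of a multigraph on the four land regions.
bridges = [('A', 'B'), ('A', 'B'), ('A', 'C'), ('A', 'C'),
           ('A', 'D'), ('B', 'D'), ('C', 'D')]
deg = Counter()
for u, v in bridges:
    deg[u] += 1
    deg[v] += 1
print(dict(deg))  # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
odd = [v for v in sorted(deg) if deg[v] % 2 == 1]
print(odd)        # all four vertices odd -> no Euler circuit, no Euler path
```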
7.3.5. Hamilton Circuits. A Hamilton circuit in a graph G is a circuit that contains each vertex of G once (except for the starting and ending vertex, which occurs twice). A Hamilton path in G is a path (not a circuit) that contains each vertex of G once. Note that by deleting an edge in a Hamilton circuit we get a Hamilton path, so if a graph has a Hamilton circuit, then it also has a Hamilton path. The converse is not true: a graph may have a Hamilton path but not a Hamilton circuit.

Exercise: Find a graph with a Hamilton path but no Hamilton circuit.

Figure 7.13. Hamilton's puzzle.

In general it is not easy to determine whether a given graph has a Hamilton path or circuit, although often it is possible to argue that a graph has no Hamilton circuit. For instance, if G = (V, E) is a bipartite graph with vertex partition {V1, V2} (so that each edge in G connects some vertex in V1 to some vertex in V2), then G cannot have a Hamilton circuit if |V1| ≠ |V2|, because any path must contain alternately vertices from V1 and V2, so any circuit in G must have the same number of vertices from each of both sets.

Edge removal argument. Another kind of argument consists of removing edges trying to make the degree of every vertex equal to two. For instance in the graph of figure 7.14 we cannot remove any edge because that would make the degree of b, e or d less than 2, so it is impossible to reduce the degree of a and c. Consequently that graph has no Hamilton circuit.

Dirac's Theorem. If G is a simple graph with n vertices, n ≥ 3, such that the degree of every vertex in G is at least n/2, then G has a Hamilton circuit.
Figure 7.14. Graph without Hamilton circuit.

Ore's Theorem. If G is a simple graph with n vertices, n ≥ 3, such that deg(u) + deg(v) ≥ n for every pair of non-adjacent vertices u and v in G, then G has a Hamilton circuit.

The Traveling Salesperson Problem. Given a weighted graph, the traveling salesperson problem (TSP) consists of finding a Hamilton circuit of minimum length in this graph. The name comes from a classical problem in which a salesperson must visit a number of cities and go back home traveling the minimum distance possible. One way to solve the problem consists of searching all possible Hamilton circuits and computing their length, but this is very inefficient. Unfortunately no efficient algorithm is known for solving this problem (and chances are that none exists).

Remark (Digression on P/NP problems): Given a weighted graph with n vertices, the problem of determining whether it contains a Hamilton circuit of length not greater than a given L is known to be NP-complete. This means the following. First, it is a decision problem, i.e., a problem whose solution is "yes" or "no". A decision problem is said to be polynomial, or to belong to the class P, if it can be solved with an algorithm of complexity O(n^k) for some integer k. It is said to be non-deterministic polynomial, or to belong to the class NP, if in all cases when the answer is "yes" this can be determined with a non-deterministic algorithm of complexity O(n^k). A non-deterministic algorithm is an algorithm that works with an extra hint; for instance in the TSP, if G has a Hamilton circuit of length not greater than L the hint could consist of a Hamilton circuit with length not greater than L, so the task of the algorithm would be just to check that in fact that length
is not greater than L.²

Currently it is not known whether the class NP is strictly larger than the class P, although it is strongly suspected that it is. In a sense the P = NP problem consists of determining whether every problem whose solution is easy to check is also easy to solve. The class NP contains a subclass called NP-complete containing the "hardest" problems of the class, so that their complexity must be higher than polynomial unless P = NP. The TSP is one of these problems.

7.3.6. Gray Codes. A Gray code is a sequence s1, s2, . . . , s2ⁿ of n-binary strings verifying the following conditions:

1. Every n-binary string appears somewhere in the sequence.
2. Two consecutive strings si and si+1 differ in exactly one bit.
3. s2ⁿ and s1 differ in exactly one bit.

For instance: 000, 001, 011, 010, 110, 111, 101, 100.

The problem of finding a Gray code is equivalent to finding a Hamilton circuit in the n-cube.

7.3.7. Dijkstra's Shortest-Path Algorithm. This is an algorithm to find the shortest path from a vertex a to another vertex z in a connected weighted graph. Edge (i, j) has weight w(i, j) > 0, and vertex x is labeled L(x) (minimum distance from a if known, otherwise ∞). The output is L(z) = length of a minimum path from a to z.

 1: procedure dijkstra(w,a,z,L)
 2:   L(a) := 0
 3:   for all vertices x ≠ a
 4:     L(x) := ∞
 5:   T := set of all vertices
 6:   {T is the set of all vertices whose shortest}
 7:   {distance from a has not been found yet}
 8:   while z in T
 9:   begin
10:     choose v in T with minimum L(v)
11:     T := T - {v}
12:     for each x in T adjacent to v
13:       L(x) := min{L(x), L(v)+w(v,x)}

²Informally, P problems are "easy to solve", and NP problems are problems whose answer is "easy to check".
14:   end
15:   return L(z)
16: end dijkstra

For instance consider the graph in figure 7.15.

Figure 7.15. Shortest path from a to z.

The algorithm would label the vertices in the following way in each iteration (the boxed vertices are the ones removed from T):

    iteration   a  b  c  d  e  f  z
        0       0  ∞  ∞  ∞  ∞  ∞  ∞
        1       0  2  ∞  3  4  ∞  ∞
        2       0  2  3  3  4  ∞  ∞
        3       0  2  3  3  4  ∞  6
        4       0  2  3  3  4  ∞  4
        5       0  2  3  3  4  6  4
        6       0  2  3  3  4  6  4

At this point the algorithm returns the value 4.

Complexity of Dijkstra's algorithm. For an n-vertex, simple, connected weighted graph, Dijkstra's algorithm has a worst-case run time of Θ(n²).
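A runnable version of the procedure, using a heap instead of scanning T for the minimum label. The adjacency list below is an attempt to reconstruct the graph of figure 7.15 from the worked trace above, so treat the edge weights as an assumption:

```python
import heapq

def dijkstra(adj, a, z):
    # L(x): best known distance from a; stale heap entries are skipped.
    L = {x: float('inf') for x in adj}
    L[a] = 0
    heap, done = [(0, a)], set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        if v == z:
            return d
        for x, w in adj[v]:
            if d + w < L[x]:
                L[x] = d + w
                heapq.heappush(heap, (L[x], x))
    return L[z]

adj = {
    'a': [('b', 2), ('d', 3), ('e', 4)],
    'b': [('a', 2), ('c', 1)],
    'c': [('b', 1), ('z', 3)],
    'd': [('a', 3), ('z', 1)],
    'e': [('a', 4), ('f', 2)],
    'f': [('e', 2), ('z', 3)],
    'z': [('c', 3), ('d', 1), ('f', 3)],
}
print(dijkstra(adj, 'a', 'z'))  # 4, matching the trace above
```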
7.4. Planar Graphs

7.4.1. Planar Graphs. A graph G is planar if it can be drawn in the plane with its edges intersecting at their vertices only. One such drawing is called an embedding of the graph in the plane. A particular planar representation of a planar graph is called a map. A map divides the plane into a number of regions or faces (one of them infinite).

7.4.2. Graph Homeomorphism. If a graph G has a vertex v of degree 2 and edges (v, v1), (v, v2) with v1 ≠ v2, we say that the edges (v, v1) and (v, v2) are in series. Deleting such a vertex v and replacing (v, v1) and (v, v2) with (v1, v2) is called a series reduction. The opposite of a series reduction is an elementary subdivision: it consists of replacing an edge (u, v) with two edges (u, w) and (w, v), where w is a new vertex.

Two graphs are said to be homeomorphic if they are isomorphic or can be reduced to isomorphic graphs by a sequence of series reductions (fig. 7.16). Equivalently, two graphs are homeomorphic if they can be obtained from the same graph by a sequence of elementary subdivisions. For instance, in the third graph of figure 7.16 the edges (h, b) and (h, d) are in series; by removing vertex h we get the first graph on the left.

Figure 7.16. Three homeomorphic graphs.

Note that if a graph G is planar, then all graphs homeomorphic to G are also planar.
7.4.3. Some Results About Planar Graphs.

1. Euler's Formula: Let G = (V, E) be a connected planar graph, and let v = |V|, e = |E|, and r = number of regions in which some given embedding of G divides the plane. Then:

    v − e + r = 2 .

Note that this implies that all plane embeddings of a given graph define the same number of regions.

2. Let G = (V, E) be a simple connected planar graph with v vertices, e ≥ 3 edges and r regions. Then 3r ≤ 2e and e ≤ 3v − 6.

3. The graph K5 is non-planar. Proof: in K5 we have v = 5 and e = 10, hence 3v − 6 = 9 < e = 10, which contradicts the previous result.

4. The graph K3,3 is non-planar. Proof: in K3,3 we have v = 6 and e = 9. If K3,3 were planar, from Euler's formula we would have r = 5. On the other hand, each region is bounded by at least four edges, so 4r ≤ 2e, i.e., 20 ≤ 18, which is a contradiction.

5. Kuratowski's Theorem: A graph is non-planar if and only if it contains a subgraph that is homeomorphic to either K5 or K3,3.

7.4.4. Dual Graph of a Map. A map is defined by some planar graph G = (V, E) embedded in the plane. Assume that the map divides the plane into a set of regions R = {r1, r2, . . . , rk}. For each region ri, select a point pi in the interior of ri. The dual graph of that map is the graph Gd = (Vd, Ed), where Vd = {p1, p2, . . . , pk}, and for each edge in E separating the regions ri and rj, there is an edge in Ed connecting pi and pj.

Warning: Note that a different embedding of the same graph G may give different (and non-isomorphic) dual graphs.

Exercise: Find the duals of the maps shown in figure 7.9, and prove that they are not isomorphic.

Figure 7.17. Dual graph of a map.

7.4.5. Graph Coloring. Consider the problem of coloring a map M in such a way that no adjacent regions (sharing a border) have the same color. This is equivalent to coloring the vertices of the dual graph of M in such a way that no adjacent vertices have the same color. In general, a coloring of a graph is an assignment of a color to each vertex of the graph. The coloring is called proper if there are no adjacent vertices with the same color. If a graph can be properly colored with n colors we say that it is n-colorable. The minimum number of colors needed to properly color a given graph G = (V, E) is called the chromatic number of G, and is represented χ(G). Obviously χ(G) ≤ |V|.

Some Results About Graph Coloring.

1. χ(Kn) = n.

2. Let G be a simple graph. The following statements are equivalent:
   (a) χ(G) = 2.
   (b) G is bipartite.
   (c) Every circuit in G has even length.

3. Five Color Theorem (Kempe, Heawood) (not hard to prove): Every simple, planar graph is 5-colorable.

4. Four Color Theorem (Appel and Haken, 1976), proved with an intricate computer analysis of configurations: Every simple, planar graph is 4-colorable.

Exercise: Find a planar graph G such that χ(G) = 4.
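The equivalence χ(G) = 2 ⟺ G is bipartite can be tested mechanically: a breadth-first 2-coloring succeeds exactly on bipartite graphs. A sketch:

```python
from collections import deque

# Attempt a proper 2-coloring by BFS; failure means an odd circuit exists.
def two_colorable(adj):
    color = {}
    for s in adj:
        if s in color:
            continue
        color[s] = 0
        q = deque([s])
        while q:
            v = q.popleft()
            for u in adj[v]:
                if u not in color:
                    color[u] = 1 - color[v]
                    q.append(u)
                elif color[u] == color[v]:
                    return False
    return True

square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}  # even circuit
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}           # odd circuit
print(two_colorable(square), two_colorable(triangle))  # True False
```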
CHAPTER 8

Trees

8.1. Trees

8.1.1. Terminology. A tree is a connected undirected graph with no simple circuits. A rooted tree is a tree in which a particular vertex is designated as the root and every edge is directed away from the root. We draw rooted trees with the root at the top; the arrows indicating the directions of the edges can be omitted.

Figure 8.1. A rooted tree.

The level of a vertex v is the length of the simple path from the root to v. The height of a rooted tree is the maximum level of its vertices.

Let T be a tree with root v0. Suppose that x, y and z are vertices in T and that (v0, v1, ..., vn) is a simple path in T. Then:
1. vn−1 is the parent of vn.
2. v0, v1, ..., vn−1 are ancestors of vn.
3. vn is a child of vn−1.
4. If x is an ancestor of y, y is a descendant of x.
5. If x and y are children of z, x and y are siblings.
6. If x has no children, it is called a terminal vertex or leaf.
7. If x is not a terminal vertex, it is an internal or branch vertex.
8. The subtree of T rooted at x is the graph (V', E'), where V' consists of x together with its descendants, and E' consists of the edges of the simple paths from x to each of its descendants.
8.2. Binary Trees

8.2.1. Binary Trees. A binary tree is a rooted tree in which each vertex has at most two children, designated as left child and right child. If a vertex has one child, that child is designated as either a left child or a right child, but not both. A full binary tree is a binary tree in which each vertex has exactly two children or none. The following are a few results about binary trees:

1. If T is a full binary tree with i internal vertices, then T has i + 1 terminal vertices and 2i + 1 total vertices.
2. If a binary tree of height h has t terminal vertices, then t ≤ 2^h.

More generally, we can define an m-ary tree as a rooted tree in which every internal vertex has no more than m children. The tree is called a full m-ary tree if every internal vertex has exactly m children. An ordered rooted tree is a rooted tree where the children of each internal vertex are ordered. A binary tree is just a particular case of m-ary ordered tree (with m = 2).

8.2.2. Binary Search Trees. Assume S is a set in which elements (which we will call "data") are ordered; e.g., the elements of S can be numbers in their natural order, or strings of alphabetic characters in lexicographic order. A binary search tree associated to S is a binary tree T in which data from S are associated with the vertices of T so that, for each vertex v in T, each data item in the left subtree of v is less than the data item in v, and each data item in the right subtree of v is greater than the data item in v.

Example: Figure 8.2 contains a binary search tree for the set S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. In order to find an element we start at the root and compare it to the data in the current vertex (initially the root). If the element is equal we have found it, if it is smaller we continue through the left child, if it is greater we continue through the right child. If we reach a terminal vertex without finding the element, then that element is not present in S.

8.2.3. Making a Binary Search Tree. We can store data in a binary search tree by randomly choosing data from S and placing it in the tree in the following way: The first data item chosen will be the root of the tree. Then for each subsequent data item, starting at the root, we compare it to the data in the current vertex v. If the new data item is greater than the data in the current vertex then we move to the right child, if it is less we move to the left child. If there is no such child then we create one and put the new data item in it.
Figure 8.2. Binary Search Tree.

Example: The tree in figure 8.3 has been made from the following list of words, choosing them in the order they occur: "IN A PLACE OF LA MANCHA WHOSE NAME I DO NOT WANT TO REMEMBER".

Figure 8.3. Another Binary Search Tree.
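The insertion and search procedures above can be sketched in a few lines of Python (the class and function names are ours; duplicates are simply ignored):

```python
class Node:
    def __init__(self, data):
        self.data, self.left, self.right = data, None, None

def insert(root, data):
    """Place data in the binary search tree as described above."""
    if root is None:
        return Node(data)        # no such child: create one
    if data > root.data:
        root.right = insert(root.right, data)
    elif data < root.data:
        root.left = insert(root.left, data)
    return root

def search(root, data):
    """Walk from the root, going left or right by comparison."""
    while root is not None and root.data != data:
        root = root.right if data > root.data else root.left
    return root is not None

root = None
for word in "IN A PLACE OF LA MANCHA".split():
    root = insert(root, word)
print(search(root, "MANCHA"), search(root, "QUIXOTE"))   # True False
```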
8.3. Decision Trees, Tree Isomorphisms

8.3.1. Decision Trees. A decision tree is a tree in which each vertex represents a question and each descending edge from that vertex represents a possible answer to that question.

Example: The Five-Coins Puzzle. In this puzzle we have five coins C1, C2, C3, C4, C5 that are identical in appearance, but one is either heavier or lighter than the others. The problem is to identify the bad coin and determine whether it is lighter or heavier using only a pan balance and comparing the weights of two piles of coins.

The problem can be solved in the following way. First we compare the weights of C1 and C2. If C1 is heavier than C2, then we know that either C1 is the bad coin and is heavier, or C2 is the bad coin and is lighter. Then by comparing say C1 with any of the other coins, say C5, we can determine whether the bad coin is C1 and is heavier (if C1 is heavier than C5) or it is C2 and is lighter (if C1 has the same weight as C5). If C1 is lighter than C2 we proceed as before with "heavier" and "lighter" reversed. If C1 and C2 have the same weight we can try comparing C3 and C4 in a similar manner. If their weights are the same then we know that the bad coin is C5, and we can determine whether it is heavier or lighter by comparing it to say C1.

The corresponding decision tree is shown in figure 8.4. In each vertex "Ci:Cj" means that we compare coins Ci and Cj by placing Ci on the left pan and Cj on the right pan of the balance, and each edge is labeled depending on what side of the balance is heavier. The terminal vertices are labeled with the bad coin and whether it is heavier (H) or lighter (L).

Figure 8.4. Decision tree for the 5 coins puzzle.

The decision tree is optimal in the sense that in the worst case it uses three weighings, and there is no way to solve the problem with fewer than that: with two weighings we can get at most nine possible outcomes, which are insufficient to distinguish among the ten combinations of 5 possible bad coins and the bad coin being heavier or lighter.
8.3.2. Complexity of Sorting. Sorting algorithms work by comparing elements and rearranging them as needed. For instance, we can sort three elements a1, a2, a3 with the decision tree shown in figure 8.5.

Figure 8.5. Sorting three elements.

Recall that in a binary tree of height h with t terminal vertices the following inequality holds: t ≤ 2^h. Since there are 3! = 6 possible arrangements of 3 elements, we need a decision tree with at least 6 possible outcomes or terminal vertices. Hence in our case 6 ≤ 2^h, which implies h ≥ 3, so the algorithm represented by the decision tree in figure 8.5 is optimal in the sense that it uses the minimum possible number of comparisons in the worst case.

More generally, in order to sort n elements we need a decision tree with n! outcomes, so its height h(n) will verify n! ≤ 2^h(n). According to Stirling's formula, n! ≈ n^n e^(−n) √(2πn), so taking logarithms

    log2 n! ≈ n log2 n − n log2 e + (1/2) log2(2πn) = Θ(n log2 n).

Since log2(n!) = Θ(n log2 n), we have h(n) = Ω(n log2 n), so the worst case complexity of a sorting algorithm is Ω(n log2 n). Since the mergesort algorithm uses precisely Θ(n log2 n) comparisons, we know that it is optimal.
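The counting bound above can be checked numerically: a binary decision tree distinguishing a given number of outcomes needs height at least ⌈log2(outcomes)⌉ (the helper name is ours):

```python
import math

def min_decision_tree_height(outcomes):
    """Least h with outcomes <= 2**h, from the bound t <= 2^h above."""
    return math.ceil(math.log2(outcomes))

# Sorting 3 elements: 3! = 6 outcomes -> at least 3 comparisons.
print(min_decision_tree_height(math.factorial(3)))    # 3
# Sorting 10 elements needs at least ceil(log2(10!)) = 22 comparisons.
print(min_decision_tree_height(math.factorial(10)))   # 22
```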
8.3.3. Isomorphisms of Trees. Assume that T1 is a tree with vertex set V1 and T2 is another tree with vertex set V2. If they are rooted trees then we call their roots r1 and r2 respectively. We will study three different kinds of tree isomorphisms between T1 and T2.

1. Usual graph-isomorphism between trees: T1 and T2 are isomorphic if there is a bijection f : V1 → V2 that preserves adjacency, i.e., f(v) is adjacent to f(w) if and only if v is adjacent to w.

2. Rooted-tree-isomorphism: T1 and T2 are isomorphic if there is a bijection f : V1 → V2 that preserves adjacency and the root vertex, i.e.:
   (a) f(v) is adjacent to f(w) if and only if v is adjacent to w.
   (b) f(r1) = r2.

3. Binary-tree-isomorphism: Two binary trees T1 and T2 are isomorphic if there is a bijection f : V1 → V2 that preserves adjacency, the root vertex, and left/right children, i.e.:
   (a) f(v) is adjacent to f(w) if and only if v is adjacent to w.
   (b) f(r1) = r2.
   (c) f(v) is a left child of f(w) if and only if v is a left child of w.
   (d) f(v) is a right child of f(w) if and only if v is a right child of w.

Example: Figure 8.6 shows three trees which are graph-isomorphic. On the other hand, as rooted trees T2 and T3 are isomorphic, but they are not isomorphic to T1, because the root of T1 has degree 3, while the roots of T2 and T3 have degree 2. Finally, T2 and T3 are not isomorphic as binary trees, because the left child of the root in T2 is a terminal vertex while the left child of the root of T3 has two children.

Figure 8.6. Trees with different kinds of isomorphisms.

Exercise: Find all non-isomorphic 3-vertex free trees, 3-vertex rooted trees and 3-vertex binary trees.

Answer: Figure 8.7 shows all 5 non-isomorphic 3-vertex binary trees. All of them are isomorphic as free trees, so there is only 1 non-isomorphic 3-vertex free tree. As rooted trees T2–T5 are isomorphic, but T1 is not isomorphic to the others, so there are 2 non-isomorphic 3-vertex rooted trees, represented for instance by T1 and T2.

Figure 8.7. Nonisomorphic binary trees.
05n d’s. c 15% and d 5%.15n·3+0.05n·3 = 1.15n c’s and 0.3.3n·2+0. With this encoding ncharacter words will have 2n bits. Huﬀman codes provide an alternative representation with variable length bit strings. c = 01. instead some are more frequent that others. In general the length per character of a given encoding with characters a1 . hence its length will be 0.7. 0. . . TREE ISOMORPHISMS 107 are isomorphic as free trees. say a appears with a frequency of 50%. .www.8. 0. . Huﬀman Codes. d = 00 would be one such representation. c.3n b’s. b = 10. Then the following enconding would be more eﬃcient than the ﬁx length encoding: a = 1. .c ld m o where l(ak ) = length of ak and F = n fk . F k=1 w w jn . d}.3. However assume that they do not appear with the same frequency. k=1 given an alphabet and the frequencies of its characters. . f2 . which is shorter than 2n. Nonisomorphic binary trees. so there is only 1 nonisomorphic 3vertex free tree. Two bits are enough for representing them. b 30%. T1 T2 T3 T4 T5 Figure 8. b = 01. Fix length and Huﬀman codes can be represented by trees like in ﬁgure 8. DECISION TREES. d = 000. a2 . . c = 001. The code of each symbol consists of the sequence of labels of the edges in the path from the root to the leaf with the desired symbol. Now in average an ncharacter word will have 0. .com 8. w 8.5n·1+0. for instance a = 11.5n a’s. an whose frequencies are f1 . ﬁnd an optimal encoding that provides minimum average length for words. b. .4. The problem now is. so that shorter strings are used for the most frequently used characters.jntuworld. As an example assume that we have an alphabet with four symbols: A = {a. Usually characters are represented in a computer with ﬁx length bit strings.7n. fn is n 1 fk l(ak ) . w tu r o .
Figure 8.8. Fixed length code and Huffman code.

8.3.5. Constructing an Optimal Huffman Code. An optimal Huffman code is a Huffman code in which the average length of the symbols is minimum. In general, an optimal Huffman code can be made as follows. First we list the frequencies of all the codes and represent the symbols as vertices (which at the end will be leaves of a tree). Then we replace the two smallest frequencies f1 and f2 with their sum f1 + f2, and join the corresponding two symbols to a common vertex above them by two edges, one labeled 0 and the other one labeled 1. That common vertex plays the role of a new symbol with a frequency equal to f1 + f2. Then we repeat the same operation with the resulting shorter list of frequencies until the list is reduced to one element and the graph obtained becomes a tree.

Example: Find the optimal Huffman code for the following table of symbols:

    character  frequency
    a          2
    b          3
    c          7
    d          8
    e          12

Answer: The successive reductions of the list of frequencies are as follows:

    2, 3, 7, 8, 12 → 5, 7, 8, 12 → 12, 8, 12

Here we have a choice: we can choose to add the first 12 and 8, or 8 and the second 12. Let's choose the former:
    12, 8, 12 → 20, 12 → 32

The tree obtained is shown in figure 8.9, and the resulting code is as follows:

Figure 8.9. Optimal Huffman code 1.

    character  code
    a          1111
    b          1110
    c          110
    d          10
    e          0

The other choice, 12, 8, 12 → 12, 20 → 32, yields the tree of figure 8.10 and the following code:

Figure 8.10. Optimal Huffman code 2.

    character  code
    a          111
    b          110
    c          10
    d          01
    e          00
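The reduction procedure above is easy to mechanize with a priority queue; a sketch using Python's heapq (the function name and tie-breaking are ours; with this particular tie-breaking it happens to reproduce the second code above, and either choice gives the same total weighted length, 69):

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Repeatedly merge the two smallest frequencies, as in the
    reduction 2,3,7,8,12 -> 5,7,8,12 -> ... above, then read the
    codes off the resulting tree."""
    tick = count()                      # tie-breaker for equal frequencies
    heap = [(f, next(tick), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two smallest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))
    tree = heap[0][2]

    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):
            walk(node[0], code + "1")   # one edge labeled 1...
            walk(node[1], code + "0")   # ...the other labeled 0
        else:
            codes[node] = code
    walk(tree, "")
    return codes

freqs = {"a": 2, "b": 3, "c": 7, "d": 8, "e": 12}
codes = huffman_code(freqs)
total = sum(freqs[s] * len(c) for s, c in codes.items())
print(codes, total)   # total weighted length 69, as in both codes above
```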
8.3.6. Game Trees. Trees are used in the analysis of some games. As an example we study the following game using a tree: Initially there are two piles with 3 coins and 1 coin respectively. Taking turns, two players remove any number of coins from one of the piles. The player that removes the last coin loses.

The following tree represents all possible sequences of choices. Each node shows the number of coins in each pile, and each edge represents a possible "move" (choice) by one of the players. The first player is represented with a box and the second player is represented with a circle.

Figure 8.11. Tree of a game.

The analysis of the game starts by labeling each terminal vertex with "1" if it represents a victory for the first player and "0" if it represents a victory for the second player. These numbers represent the "value" of each position of the game, so the first player is interested in making it "maximum" and the second player wants to make it "minimum". Then we continue labeling the rest of the vertices in the following way. After all the children of a given vertex have been labeled, we label the vertex depending on whether it is a "first player" position (box) or a "second player" position (circle): first player positions are labeled with the maximum value of the labels of their children, second player positions with the minimum value of the labels of their children.
This process is called the minimax procedure. Every vertex labeled "1" will represent a position in which the first player has advantage and can win if he/she plays without making mistakes; vertices labeled "0", on the other hand, represent positions for which the second player has advantage. If the starting position has been labeled "1", that means that the first player has a winning strategy; otherwise the second player has advantage. The strategy is then for the first player to select at each position a child with maximum value, while the second player will be interested in selecting children with minimum value.

For instance, in the present game the first player has advantage at the initial position, and only one favorable move at that point: removing all 3 coins from the first pile, leaving piles of 0 and 1 coins. If for any reason the first player makes a mistake and removes, say, one coin from the first pile, leaving piles of 2 and 1 coins, then the second player has one favorable move, to the vertex with piles of 0 and 1 coins, which is the one with minimum "value".

In some games the game tree is so complicated that it cannot be fully analyzed, so it is built up to a given depth only. The vertices reached at that depth are not terminal, but they can be "evaluated" using heuristic methods (for instance in chess, losing a knight is usually a better choice than losing the queen, so a position with one queen and no knights will have a higher value than one with no queen and one knight).

Alpha-beta pruning. Even so, the evaluation and labeling of the vertices can be time consuming, but we can bypass the evaluation of many vertices using the technique of alpha-beta pruning. The idea is to skip a vertex as soon as it becomes obvious that its value will not affect the value of its parent. In order to do that, to a first player (boxed) vertex v we assign an alpha value equal to the maximum value of its children evaluated so far. Assume that we are evaluating one of its children w, which will be a second player (circled) position. If at any point a child of w gets a value less than or equal to the alpha value of v, then it becomes obvious that the value of w is going to be less than or equal to the current alpha value of v, so it will not affect the value of v and we can stop the evaluation of w (prune the subtree at w). That is called an alpha cutoff. Similarly, to a second player (circled) vertex v we assign a beta value equal to the minimum value of its children evaluated so far, and practice a beta cutoff when one of its grandchildren gets a value greater than or equal to the current beta value of v; in that case we prune the subtree at w, where w is the parent of that grandchild.
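The minimax labeling above can be sketched as a short recursion (plain minimax, without the alpha-beta refinement; the function name and the tuple encoding of positions are ours):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(piles, first_to_move):
    """Minimax value of a position in the two-pile game above:
    1 if the first player wins with perfect play, 0 otherwise.
    The player who removes the last coin loses."""
    if sum(piles) == 0:
        # The previous player took the last coin and lost,
        # so the player to move here is the winner.
        return 1 if first_to_move else 0
    children = []
    for i, n in enumerate(piles):
        for take in range(1, n + 1):
            nxt = list(piles)
            nxt[i] -= take
            children.append(value(tuple(nxt), not first_to_move))
    # Maximize at first-player vertices, minimize at second-player ones.
    return max(children) if first_to_move else min(children)

print(value((3, 1), True))   # 1: first player has a winning strategy
print(value((1, 0), True))   # 0: first player must take the last coin
```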
jntuworld.12.www. w tu r o . Alpha cutoﬀ. TREE ISOMORPHISMS (4) v 112 1 3 4 w 4 3 6 7 5 4 5 1 ? Figure 8. DECISION TREES.c ld m o . w w w jn .3.com 8.
jntuworld. w tu r o . In order to evaluate an expression in Polish notation. The advantage of Polish notation is that it allows us to write expressions without need for parenthesis. The reverse Polish notation consists of placing the symbol to the right: ab ◦ . The Polish notation consists of placing the symbol to the left: ◦ab. it is customary to represent the result of applying the operation to two elements a. the expression a ∗ (b + c) in Polish notation would be ∗ a + b c. we replace the two top symbols of the stack by the result of applying the operator to those elements. called rear of the list. while a ∗ b + c is + ∗ a b c.4. b by placing the operation symbol in the middle: a ◦ b. This is called inﬁx notation. Transversal Algorithms. placing the elements in a stack. called front of the list.com 8. For instance.1 Each time we ﬁnd an operator. A queue or ﬁrstin ﬁrstout (FIFO) system.4. 1A . we introduce the concept of Polish notation. Tree Transversal 8. TREE TRANSVERSAL 113 8. For instance. Polish notation is easier to evaluate in a computer.1. The tree for a constant or variable a has a as its only vertex. If the algebraic expression S is of stack or lastin ﬁrstout (LIFO) system. the expression ∗ + 2 3 4 (which in inﬁx notation is “(2 + 3) ∗ 4”) would be evaluated like this: expression stack w w w jn . Also.www. In order to motivate this subject.4. called top of the list. we scan the expression from right to left. Given a (not necessarily commutative) binary operation ◦. is a linear list of elements in which insertions and deletions take place only at one end. and insertions take place only at the other end.c ld m o ∗ + 234 ∗ + 23 ∗+2 ∗+ ∗ 4 3 4 2 3 4 5 4 20 An algebraic expression can be represented by a binary rooted tree obtained recursively in the following way. is a linear list of elements in which deletions take place only at one end.
An algebraic expression can be represented by a binary rooted tree obtained recursively in the following way. The tree for a constant or variable a has a as its only vertex. If the algebraic expression S is of the form SL ◦ SR, where SL and SR are subexpressions with trees TL and TR respectively, and ◦ is an operator, then the tree T for S consists of ◦ as root and the subtrees TL and TR (figure 8.13).

Figure 8.13. Tree of SL ◦ SR.

For instance, consider the following algebraic expression:

    a + b ∗ c + d ↑ e ∗ (f + h),

where + denotes addition, ∗ denotes multiplication and ↑ denotes exponentiation. The binary tree for this expression is given in figure 8.14.

Figure 8.14. Tree for a + b ∗ c + d ↑ e ∗ (f + h).

Given the binary tree of an algebraic expression, its Polish, reverse Polish and infix representations are different ways of ordering the vertices of the tree, namely in preorder, postorder and inorder respectively.

The following are recursive definitions of several orderings of the vertices of a rooted tree T = (V, E) with root r. If T has only one vertex r, then r by itself constitutes the preorder, postorder and inorder traversal of T. Otherwise, let T1, ..., Tk be the subtrees of T from left to right (figure 8.15). Then:

1. Preorder Traversal: Pre(T) = r, Pre(T1), ..., Pre(Tk).
Figure 8.15. Ordering of trees.

2. Postorder Traversal: Post(T) = Post(T1), ..., Post(Tk), r.

3. Inorder Traversal. If T is a binary tree with root r, left subtree TL and right subtree TR, then: In(T) = In(TL), r, In(TR).
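For binary trees, the three orderings above can be sketched as direct recursions (the class and function names are ours), and they recover the Polish, reverse Polish and infix forms of an expression tree:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def preorder(t):                       # root first: Polish notation
    if t is None:
        return []
    return [t.value] + preorder(t.left) + preorder(t.right)

def postorder(t):                      # root last: reverse Polish notation
    if t is None:
        return []
    return postorder(t.left) + postorder(t.right) + [t.value]

def inorder(t):                        # root in the middle: infix
    if t is None:
        return []
    return inorder(t.left) + [t.value] + inorder(t.right)

# Expression tree for (2 + 3) * 4
expr = Node("*", Node("+", Node("2"), Node("3")), Node("4"))
print(" ".join(preorder(expr)))    # * + 2 3 4
print(" ".join(postorder(expr)))   # 2 3 + 4 *
print(" ".join(inorder(expr)))     # 2 + 3 * 4  (parentheses are lost)
```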
8.5. Spanning Trees

8.5.1. Spanning Trees. A tree T is a spanning tree of a graph G if T is a subgraph of G that contains all the vertices of G. For instance, the graph of figure 8.16 has a spanning tree represented by the thicker edges.

Figure 8.16. Spanning tree.

Every connected graph has a spanning tree, which can be obtained by removing edges until the resulting graph becomes acyclic. In practice, however, removing edges is not efficient because finding cycles is time consuming. Next, we give two algorithms to find a spanning tree T of a loop-free connected undirected graph G = (V, E). We assume that the vertices of G are given in a certain order v1, v2, ..., vn. The resulting spanning tree will be T = (V', E').

8.5.2. Breadth-First Search Algorithm. The idea is to start with vertex v1 as root, add the vertices that are adjacent to v1, then the ones that are adjacent to the latter and have not been visited yet, and so on. This algorithm uses a queue (initially empty) to store vertices of the graph. It consists of the following:

1. Add v1 to T, insert it in the queue and mark it as "visited".
2. If the queue is empty, then we are done. Otherwise let v be the vertex in the front of the queue.
3. For each vertex v' of G that has not been visited yet and is adjacent to v (there might be none), taken in order of increasing subscripts, add vertex v' and edge (v, v') to T, insert v' in the queue and mark it as "visited".
4. Delete v from the queue.
5. Go to step 2.

A pseudocode version of the algorithm is as follows:

1:  procedure bfs(V,E)
2:    S := (v1)      {ordered list of vertices of a fixed level}
3:    V' := {v1}     {v1 is the root of the spanning tree}
4:    E' := {}       {no edges in the spanning tree yet}
5:    while true
6:    begin
7:      for each x in S, in order,
8:        for each y in V - V'
9:          if (x,y) is an edge then
10:           add edge (x,y) to E' and vertex y to V'
11:     if no edges were added then
12:       return T = (V',E')
13:     S := children of S
14:   end
15: end bfs

Figure 8.17 shows the spanning tree obtained using the breadth-first search algorithm on the graph of figure 8.16, with its vertices ordered lexicographically: a, b, c, d, e, f, g, h, i.

Figure 8.17. Breadth-First Search.
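A runnable sketch of the breadth-first search spanning tree (the adjacency-list representation and function name are ours; each vertex's neighbors are assumed listed in the chosen order):

```python
from collections import deque

def bfs_spanning_tree(adj, root):
    """Grow the spanning tree level by level from root, using a
    FIFO queue as in the steps above."""
    visited = {root}
    tree_edges = []
    queue = deque([root])
    while queue:
        v = queue.popleft()            # front of the queue
        for w in adj[v]:
            if w not in visited:
                visited.add(w)
                tree_edges.append((v, w))
                queue.append(w)
    return tree_edges

adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
print(bfs_spanning_tree(adj, "a"))   # [('a', 'b'), ('a', 'c'), ('b', 'd')]
```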
v) to E’ 13: add v to V’ 14: w := v 15: end 16: if w = v1 then w w w jn . Apply P to v . then delete v and go to step 2 (backtrack ). If there is no vertex v that is adjacent to v and has not been visited yet.E) 2: V’ := {v1} {v1 is the root of the spanning tree} 3: E’ := {} {no edges in the spanning tree yet} 4: w := v1 5: while true 6: begin 7: while there is an edge (w. then we are done. An alternative recursive deﬁnition is as follows. w tu r o . v ) to T . Otherwise let v be the vertex on the top of the stack. 4.v) 11: does not create a cycle in T 12: add (w. Add vertex v to T and mark it as “visited”. Add the edge (v. If the stack is empty. 5. 3. 5. The DepthFirst Search Algorithm consists of applying the process just deﬁned to v1 . Go to step 2. Otherwise. We deﬁne recursively a process P applied to a given vertex v in the following way: 1.v) that when added 8: to T does not create a cycle in T 9: begin 10: Choose first v such that (w. Go to step 2 (backtrack ). SPANNING TREES 118 2. v ) to T .c ld m o .www. let v be the ﬁrst nonvisited vertex that is adjacent to v. insert v in the stack and mark it as “visited”. A pseudocode version of the algorithm is as follows: 1: procedure dfs(V. If there is no vertex v that is adjacent to v and has not been visited yet. Otherwise. 4. let v be the ﬁrst nonvisited vertex that is adjacent to v. 3.jntuworld.5.com 8. Add vertex v and edge (v. 2. then return.
A pseudocode version of the algorithm is as follows:

1:  procedure dfs(V,E)
2:    V' := {v1}     {v1 is the root of the spanning tree}
3:    E' := {}       {no edges in the spanning tree yet}
4:    w := v1
5:    while true
6:    begin
7:      while there is an edge (w,v) that when added
8:        to T does not create a cycle in T
9:      begin
10:       choose the first v such that (w,v)
11:         does not create a cycle in T
12:       add (w,v) to E'
13:       add v to V'
14:       w := v
15:     end
16:     if w = v1 then
17:       return T
18:     w := parent of w in T   {backtrack}
19:   end
20: end dfs

Figure 8.18 shows the spanning tree obtained using the depth-first search algorithm on the graph of figure 8.16, with its vertices ordered lexicographically: a, b, c, d, e, f, g, h, i.

Figure 8.18. Depth-First Search.

8.5.4. Minimal Spanning Trees. Given a connected weighted graph G, a minimal spanning tree of G is a spanning tree of G such that the sum of the weights of its edges is minimum. For instance, for the graph of figure 8.19, the spanning tree shown is the one of minimum weight.

Figure 8.19. Minimum Spanning Tree.

Prim's Algorithm. An algorithm to find a minimal spanning tree is Prim's Algorithm. It starts with a single vertex and at each iteration adds to the current tree a minimum weight edge that does not complete a cycle.
The following is a pseudocode version of Prim's algorithm. If (x,y) is an edge in G = (V,E) then w(x,y) is its weight, otherwise w(x,y) = ∞. The starting vertex is s.

1:  procedure prim(V,w,s)
2:    V' := {s}     {vertex set starts with s}
3:    E' := {}      {edge set initially empty}
4:    for i := 1 to n-1   {put n-1 edges in spanning tree}
5:    begin
6:      find x in V' and y in V - V' with minimum w(x,y)
7:      add y to V'
8:      add (x,y) to E'
9:    end
10:   return E'
11: end prim
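A direct (non-optimized) Python sketch of the pseudocode above, scanning all crossing edges at each step; the weight-dictionary representation is ours:

```python
import math

def prim(vertices, weight, s):
    """Prim's algorithm: grow the tree from s, always adding the
    cheapest edge from the tree to a vertex outside it.
    weight is a dict {(x, y): w} containing both orientations."""
    in_tree = {s}
    tree_edges = []
    while len(in_tree) < len(vertices):
        x, y = min(((x, y) for x in in_tree for y in vertices - in_tree),
                   key=lambda e: weight.get(e, math.inf))
        in_tree.add(y)
        tree_edges.append((x, y))
    return tree_edges

V = {"a", "b", "c", "d"}
w = {}
for (x, y, c) in [("a","b",1), ("b","c",2), ("a","c",4), ("c","d",3)]:
    w[(x, y)] = w[(y, x)] = c
print(prim(V, w, "a"))   # [('a', 'b'), ('b', 'c'), ('c', 'd')], weight 6
```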
Prim’s algorithm is an example of a greedy algorithm. A greedy algorithm is an algorithm that optimized the choice at each iteration without regard to previous choices (“doing the best locally”). Prim’s algorithm makes a minimum spanning tree, but in general a greedy algorithm does not always ﬁnds an optimal solution to a given problem. For instance in ﬁgure 8.20 a greedy algorithm to ﬁnd the shortest path from a to z, working by adding the shortest available edge to the most recently added vertex, would return acz, which is not the shortest path.
Figure 8.20.

Kruskal's Algorithm. Another algorithm to find a minimal spanning tree in a connected weighted graph G = (V, E) is Kruskal's Algorithm. It starts with all n vertices of G and no edges. At each iteration we add an edge having minimum weight that does not complete a cycle. We stop after adding n − 1 edges.
1:  procedure kruskal(E,w,n)
2:    V' := V
3:    E' := {}
4:    while |E'| < n-1
5:    begin
6:      among all edges not completing a cycle in T' = (V',E')
7:      choose e of minimum weight and add it to E'
8:    end
9:    T' := (V',E')
10:   return T'
11: end kruskal
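A sketch of Kruskal's algorithm in Python; the cycle test is done with a union-find structure, an implementation device not discussed in these notes (names and edge representation are ours):

```python
def kruskal(n, edges):
    """Kruskal's algorithm: scan edges by increasing weight, keeping
    an edge iff its endpoints lie in different components."""
    parent = list(range(n))

    def find(x):                       # root of x's component
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree = []
    for wgt, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                   # edge does not complete a cycle
            parent[ru] = rv
            tree.append((u, v, wgt))
            if len(tree) == n - 1:     # stop after n-1 edges
                break
    return tree

edges = [(1, 0, 1), (2, 1, 2), (4, 0, 2), (3, 2, 3)]   # (weight, u, v)
print(kruskal(4, edges))   # [(0, 1, 1), (1, 2, 2), (2, 3, 3)]
```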
CHAPTER 9
Boolean Algebras
9.1. Combinatorial Circuits

9.1.1. Introduction. At their lowest level digital computers handle only binary signals, represented with the symbols 0 and 1. The most elementary circuits that combine those signals are called gates. Figure 9.1 shows three gates: OR, AND and NOT.
Figure 9.1. Gates.
Their outputs can be expressed as a function of their inputs by the following logic tables:

    x1  x2  x1 + x2
    1   1   1
    1   0   1
    0   1   1
    0   0   0
       OR GATE
    x1  x2  x1 · x2
    1   1   1
    1   0   0
    0   1   0
    0   0   0
       AND GATE

    x   x̄
    1   0
    0   1
    NOT GATE

These are examples of combinatorial circuits. A combinatorial circuit is a circuit whose output is uniquely defined by its inputs. Combinatorial circuits do not have memory: previous inputs do not affect their outputs. Some combinations of gates can be used to make more complicated combinatorial circuits. For instance, figure 9.2 is a combinatorial circuit with the logic table shown below, representing the values of the Boolean expression y = ((x1 + x2) · x3)¯ (the bar denotes complement).

Figure 9.2. A combinatorial circuit.

    x1  x2  x3  y
    1   1   1   0
    1   1   0   1
    1   0   1   0
    1   0   0   1
    0   1   1   0
    0   1   0   1
    0   0   1   1
    0   0   0   1

However, the circuit in figure 9.3 is not a combinatorial circuit.
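Logic tables like the one above are easy to regenerate mechanically; a small sketch (the helper name is ours; rows list the inputs from 1 down to 0, matching the tables in these notes):

```python
from itertools import product

def logic_table(f, n):
    """Rows (inputs..., output) of an n-input combinatorial circuit."""
    return [args + (f(*args),) for args in product((1, 0), repeat=n)]

# Circuit of figure 9.2: OR, then AND, then NOT.
y = lambda x1, x2, x3: 1 - ((x1 | x2) & x3)
for row in logic_table(y, 3):
    print(*row)
```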
If x1 = 1 and x2 = 0 then y can be 0 or 1. Assume that at a given time y = 0. If we input a signal x2 = 1, the output becomes y = 1, and stays so even after x2 goes back to its original value 0. That way we can store a bit. We can "delete" it by switching input x1 to 0.

Figure 9.3. Not a combinatorial circuit.

9.1.2. Properties of Combinatorial Circuits. Here Z2 = {0, 1} represents the set of signals handled by combinatorial circuits, and the operations performed on those signals by AND, OR and NOT gates are represented by the symbols ·, + and ¯ respectively. Their properties are the following (a, b, c are elements of Z2, i.e., each represents either 0 or 1):

1. Associative
    (a + b) + c = a + (b + c)
    (a · b) · c = a · (b · c)
2. Commutative
    a + b = b + a
    a · b = b · a
3. Distributive
    a · (b + c) = (a · b) + (a · c)
    a + (b · c) = (a + b) · (a + c)
4. Identity
    a + 0 = a
    a · 1 = a
5. Complement
    a + ā = 1
    a · ā = 0

A system satisfying those properties is called a Boolean algebra. Two Boolean expressions are defined to be equal if they have the same values for all possible assignments of values to their literals. Example: (x + y)¯ = x̄ · ȳ, as shown in the following table:
    x   y   (x + y)¯   x̄ · ȳ
    1   1   0          0
    1   0   0          0
    0   1   0          0
    0   0   1          1

9.1.3. Abstract Boolean Algebras. Here we deal with general Boolean algebras; combinatorial circuits are an example, but there are others. A Boolean algebra B = (S, ∨, ∧, ¯, 0, 1) is a set S containing two distinguished elements 0 and 1, two binary operators ∨ and ∧ on S, and a unary operator ¯ on S, satisfying the following properties (x, y, z are elements of S):

1. Associative
    (x ∨ y) ∨ z = x ∨ (y ∨ z)
    (x ∧ y) ∧ z = x ∧ (y ∧ z)
2. Commutative
    x ∨ y = y ∨ x
    x ∧ y = y ∧ x
3. Distributive
    x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z)
    x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z)
4. Identity
    x ∨ 0 = x
    x ∧ 1 = x
5. Complement
    x ∨ x̄ = 1
    x ∧ x̄ = 0

Example: (Z2, +, ·, ¯, 0, 1) is a Boolean algebra.

Example: If U is a universal set and P(U) = the power set of U (the collection of subsets of U), then (P(U), ∪, ∩, ¯, ∅, U) is a Boolean algebra.
De Morgan’s 3. w tu 1=0 x∨y =x∧y r o . Other Properties of Boolean Algebras.4. Absorption For instance the ﬁrst idempotent law can be proved like this: x = x ∨ 0 = x ∨ x ∧ x = (x ∨ x) ∧ (x ∨ x) = (x ∨ x) ∧ 1 = x ∨ x. The properties mentioned above deﬁne a Boolean algebra.jntuworld. Bound x∨1=1 x∧0=0 x ∨ xy = x x ∧ (x ∨ y) = x 4. Idempotent x∨x=x x∧x=x 2.www.1. but Boolean algebras also have other properties: 1.1. COMBINATORIAL CIRCUITS 126 9. Involution x=x 5. w w w jn . 0 and 1 0=1 6.c ld m o x∧y =x∨y .com 9.
9.2. Boolean Functions, Applications

9.2.1. Introduction. A Boolean function is a function from Z_2^n to Z_2. For instance, consider the exclusive-or function, defined by the following table:

x1 | x2 | x1 ⊕ x2
 1 |  1 |    0
 1 |  0 |    1
 0 |  1 |    1
 0 |  0 |    0

The exclusive-or function can be interpreted as a function Z_2^2 → Z_2 that assigns (1, 1) → 0, (1, 0) → 1, (0, 1) → 1, (0, 0) → 0. It can also be written as a Boolean expression in the following way:

x1 ⊕ x2 = (x1 · x̄2) + (x̄1 · x2)

Every Boolean function can be written as a Boolean expression, as we are going to see next.

9.2.2. Disjunctive Normal Form. We start with a definition. A minterm in the symbols x1, x2, ..., xn is a Boolean expression of the form y1 · y2 · ... · yn, where each yi is either xi or x̄i.

Given any Boolean function f : Z_2^n → Z_2 that is not identically zero, it can be represented

f(x1, x2, ..., xn) = m1 + m2 + ... + mk,

where m1, m2, ..., mk are all the minterms mi = y1 · y2 · ... · yn such that f(a1, a2, ..., an) = 1, where yj = xj if aj = 1 and yj = x̄j if aj = 0. That representation is called the disjunctive normal form of the Boolean function f.

Example: We have seen that the exclusive-or can be represented x1 ⊕ x2 = (x1 · x̄2) + (x̄1 · x2). This provides a way to implement the exclusive-or with a combinatorial circuit, as shown in figure 9.4.

9.2.3. Conjunctive Normal Form. A maxterm in the symbols x1, x2, ..., xn is a Boolean expression of the form y1 + y2 + ... + yn, where each yi is either xi or x̄i.
Given any Boolean function f : Z_2^n → Z_2 that is not identically one, it can be represented

f(x1, ..., xn) = M1 · M2 · ... · Mk,

where M1, M2, ..., Mk are all the maxterms Mi = y1 + y2 + ... + yn such that f(a1, a2, ..., an) = 0, where yj = xj if aj = 0 and yj = x̄j if aj = 1. That representation is called the conjunctive normal form of the Boolean function f.

Example: The conjunctive normal form of the exclusive-or is

x1 ⊕ x2 = (x1 + x2) · (x̄1 + x̄2).

Figure 9.4. Exclusive-Or.

9.2.4. Functionally Complete Sets of Gates. We have seen how to design combinatorial circuits using AND, OR and NOT gates. Here we will see how to do the same with other kinds of gates. In the following, gates will be considered as functions from Z_2^n into Z_2 intended to serve as building blocks of arbitrary Boolean functions.

A set of gates {g1, g2, ..., gk} is said to be functionally complete if for any integer n and any function f : Z_2^n → Z_2 it is possible to construct a combinatorial circuit that computes f using only the gates g1, g2, ..., gk.

Example: The result about the existence of a disjunctive normal form for any Boolean function proves that the set of gates {AND, OR, NOT} is functionally complete. Next we show other sets of gates that are also functionally complete.

1. The set of gates {AND, NOT} is functionally complete. Proof: Since we already know that {AND, OR, NOT} is functionally complete, all we need to do is to show that we can compute x + y using only AND and NOT gates. In fact:

x + y = \overline{\overline{x + y}} = \overline{x̄ · ȳ},

hence the combinatorial circuit of figure 9.5 computes x + y.
Figure 9.5. OR with AND and NOT.

2. The set of gates {OR, NOT} is functionally complete. The proof is similar:

x · y = \overline{\overline{x · y}} = \overline{x̄ + ȳ},

hence the combinatorial circuit of figure 9.6 computes x · y.

Figure 9.6. AND with OR and NOT.

3. The gate NAND, denoted ↑ and defined as

x1 ↑ x2 = 0 if x1 = 1 and x2 = 1, and 1 otherwise,

is functionally complete.

Figure 9.7. NAND gate.

Proof: Note that x ↑ y = \overline{x · y}. Hence x̄ = \overline{x · x} = x ↑ x, so the NOT gate can be implemented with a NAND gate. Also the OR gate can be implemented with NAND gates: x + y = \overline{x̄ · ȳ} = (x ↑ x) ↑ (y ↑ y). Since the set {OR, NOT} is functionally complete and each of its elements can be implemented with NAND gates, the NAND gate is functionally complete.

9.2.5. Minimization of Combinatorial Circuits. Here we address the problem of finding a combinatorial circuit that computes a given Boolean function with the minimum number of gates. The idea
is to simplify the corresponding Boolean expression by using algebraic properties such as (E · a) + (E · ā) = E and E + (E · a) = E, where E is any Boolean expression. For simplicity, in the following we will represent a · b as ab, so for instance the expressions above will look like this: Ea + Eā = E and E + Ea = E.

Figure 9.8. NOT and OR functions implemented with NAND gates.

Example: Let f(x, y, z) be the Boolean function defined by the following table:

x | y | z | f(x,y,z)
1 | 1 | 1 |    1
1 | 1 | 0 |    1
1 | 0 | 1 |    1
1 | 0 | 0 |    0
0 | 1 | 1 |    0
0 | 1 | 0 |    0
0 | 0 | 1 |    0
0 | 0 | 0 |    0

Its disjunctive normal form is f(x, y, z) = xyz + xyz̄ + xȳz.
This function can be implemented with the combinatorial circuit of figure 9.9.

Figure 9.9. A circuit that computes f(x, y, z) = xyz + xyz̄ + xȳz.

But we can do better if we simplify the expression in the following way:

f(x, y, z) = xyz + xyz̄ + xȳz
           = xy(z + z̄) + xȳz
           = xy + xȳz
           = x(y + ȳz)
           = x(y + ȳ)(y + z)
           = x(y + z),

which corresponds to the circuit of figure 9.10.

Figure 9.10. A simpler circuit that computes f(x, y, z).

9.2.6. Multi-Output Combinatorial Circuits. Example: Half-Adder. A half-adder is a combinatorial circuit with two inputs x and y and two outputs s and c, where s represents the sum of x and y and c is the carry bit. Its table is as follows:

x | y | s | c
1 | 1 | 0 | 1
1 | 0 | 1 | 0
0 | 1 | 1 | 0
0 | 0 | 0 | 0

So the sum is s = x ⊕ y (exclusive-or) and the carry bit is c = x · y. Figure 9.11 shows a half-adder circuit.
Figure 9.11. Half-adder circuit.
CHAPTER 10

Automata, Grammars and Languages

10.1. Finite State Machines

10.1.1. Finite-State Machines. Combinatorial circuits have no memory or internal states; their output depends only on the current values of their inputs. Finite state machines, on the other hand, have internal states, so their output may depend not only on their current inputs but also on the past history of those inputs.

A finite-state machine consists of the following:

1. A finite set of states S.
2. A finite set of input symbols I.
3. A finite set of output symbols O.
4. A next-state or transition function f : S × I → S.
5. An output function g : S × I → O.
6. An initial state σ ∈ S.

We represent the machine M = (S, I, O, f, g, σ).

Example: We describe a finite-state machine with two input symbols I = {a, b} and two output symbols O = {0, 1} that accepts any string from I* and outputs as many 1's as there are a's at the beginning of the string; after that it outputs only 0's. The internal states are S = {σ0, σ1}, where σ0 is the initial state (we interpret it as not having seen any "b" yet); the machine switches to σ1 as soon as the first "b" arrives. The next-state and output functions are as follows:

          f           g
S    |  a     b   |  a    b
σ0   |  σ0    σ1  |  1    0
σ1   |  σ1    σ1  |  0    0
This finite-state machine can also be represented with the following transition diagram:

[Transition diagram: a loop labeled a/1 on σ0; an edge labeled b/0 from σ0 to σ1; loops labeled a/0 and b/0 on σ1.]

The vertices of the diagram are the states. If in state σ an input i causes the machine to output o and go to state σ', then we draw an arrow from σ to σ' labeled i/o.

Example: The following example is similar to the previous one, but the machine outputs 1 only after a change of input symbol; otherwise it outputs 0:

[Transition diagram with states σ0 (start), σ1 and σ2: from σ0, a/0 to σ1 and b/0 to σ2; from σ1, a/0 loops and b/1 goes to σ2; from σ2, b/0 loops and a/1 goes back to σ1.]

Example: A Serial Adder. A serial adder accepts two bits and outputs their sum. So the input set is I = {00, 01, 10, 11}. The output set is O = {0, 1}. The set of states is S = {NC, C}, which stand for "no carry" and "carry" respectively. The transition diagram is the following:

[Transition diagram: at NC (start), loops 00/0, 01/1, 10/1, and an edge 11/0 to C; at C, loops 01/0, 10/0, 11/1, and an edge 00/1 back to NC.]

10.1.2. Finite-State Automata. A finite-state automaton is similar to a finite-state machine but with no output, and with a set of states called accepting or final states. More specifically, a finite-state automaton consists of:

1. A finite set of states S.
2. A finite set of input symbols I.
3. A next-state or transition function f : S × I → S.
4. An initial state σ ∈ S.
5. A subset F ⊆ S of accepting or final states.

We represent the automaton A = (S, I, f, σ, F).

We say that an automaton accepts or recognizes a given string of input symbols if that string takes the automaton from the start state to a final state.

Example: The following transition diagrams represent an automaton accepting any string of a's and b's ending with an a. The first diagram uses the same scheme as with finite-state machines, with output 1 representing "accept" or "recognize" and output 0 representing "not accept":

[Transition diagram: a single state σ0 with loops labeled a/1 and b/0.]

The second kind of diagram omits the outputs and represents the accepting states with double circles:

[Transition diagram: σ0 (start) and σ1 (final, double circle); a from σ0 to σ1, b from σ1 back to σ0; loop b on σ0 and loop a on σ1.]

Two finite-state automata that accept exactly the same set of strings are said to be equivalent. For instance, the following automaton also accepts precisely the strings of a's and b's that end with an a, so it is equivalent to the automaton shown above:

[Transition diagram: an equivalent automaton with three states σ0 (start), σ1 (final) and σ2.]

Example: The following automaton accepts strings of a's and b's with an even number of a's:
[Transition diagram: two states swapping on a and staying put on b, with the start state final.]

Example: The following automaton accepts strings starting with one a followed by any number of b's:

[Transition diagram: σ0 (start) with a to the final state σ1, which has a loop on b; all other moves lead to a dead state σ2.]

Example: The following automaton accepts strings ending with aba:

[Transition diagram: four states σ0, σ1, σ2, σ3 tracking progress toward the suffix aba, with σ3 final.]
10.2. Languages and Grammars

10.2.1. Formal Languages. Consider algebraic expressions written with the symbols A = {x, y, z, +, ∗, (, )}. The following are some of them: "x + y ∗ y", "y + (x ∗ y + y) ∗ x", "(x + y) ∗ x + z", etc. There are however some strings of symbols that are not legitimate algebraic expressions because they have some sort of syntax error, e.g.: "(x + y", "x(∗y) + z", "z + +y ∗ x", etc. So syntactically correct algebraic expressions are a subset of the whole set A* of possible strings over A.

In general, given a finite set A (the alphabet), a (formal) language over A is a subset of A* (the set of strings over A). For instance, let A = {a, b}. The set of strings over A with an even number of a's is a language over A. Although in principle any subset of A* is a formal language, we are interested only in languages with a certain structure. A way to determine the structure of a language is with a grammar.

10.2.2. Grammars. In order to define a grammar we need two kinds of symbols: non-terminal, used to represent given subsets of the language, and terminal, the final symbols that occur in the strings of the language. For instance, in the example about algebraic expressions mentioned above, the terminal symbols are the elements of the set A = {x, y, z, +, ∗, (, )}. The non-terminal symbols can be chosen to represent a complete algebraic expression (E), or terms (T) consisting of products of factors (F). Then we can say that an algebraic expression E consists of a single term

E → T,

or the sum of an algebraic expression and a term

E → E + T.

A term may consist of a factor or a product of a term and a factor:

T → F
T → T ∗ F

A factor may consist of an algebraic expression between parentheses
F → (E),

or an isolated terminal symbol:

F → x,  F → y,  F → z.

Those expressions are called productions, and they tell us how we can generate syntactically correct algebraic expressions by successively replacing the symbols on the left with the expressions on the right. For instance, the algebraic expression "y + (x ∗ y + y) ∗ x" can be generated like this:

E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ y + T ⇒ y + T ∗ F ⇒ y + F ∗ F ⇒ y + (E) ∗ F ⇒ y + (E + T) ∗ F ⇒ y + (T + T) ∗ F ⇒ y + (T ∗ F + T) ∗ F ⇒ y + (F ∗ F + T) ∗ F ⇒ y + (x ∗ F + T) ∗ F ⇒ y + (x ∗ y + T) ∗ F ⇒ y + (x ∗ y + F) ∗ F ⇒ y + (x ∗ y + y) ∗ F ⇒ y + (x ∗ y + y) ∗ x.

In general a phrase-structure grammar (or simply grammar) G consists of:

1. A finite set V of symbols called the vocabulary or alphabet.
2. A subset T ⊆ V of terminal symbols. The elements of N = V − T are called non-terminal symbols or non-terminals.
3. A start symbol σ ∈ N.
4. A finite subset P of (V* − T*) × V* called the set of productions. A production (A, B) ∈ P is written A → B. The left-hand side of a production must contain at least one non-terminal symbol; the right-hand side can be any combination of terminal and non-terminal symbols.

We write G = (V, T, σ, P).
If α → β is a production and xαy ∈ V*, we say that xβy is directly derivable from xαy, and we write

xαy ⇒ xβy.

If we have α1 ⇒ α2 ⇒ ··· ⇒ αn (n ≥ 0), we say that αn is derivable from α1, and we write α1 ⇒* αn (by convention also α1 ⇒* α1). Given a grammar G, the language L(G) associated to this grammar is the subset of T* consisting of all strings derivable from σ.

10.2.3. Backus Normal Form. The Backus Normal Form or BNF is an alternative way to represent productions. The production S → T is written S ::= T. Productions of the form S ::= T1, S ::= T2, ..., S ::= Tn can be combined as S ::= T1 | T2 | ··· | Tn. So, for instance, the grammar of algebraic expressions defined above can be written in BNF as follows:

E ::= T | E + T
T ::= F | T ∗ F
F ::= (E) | x | y | z

10.2.4. Combining Grammars. Let G1 = (V1, T1, σ1, P1) and G2 = (V2, T2, σ2, P2) be two grammars, where N1 = V1 − T1 and N2 = V2 − T2 are disjoint (rename non-terminal symbols if necessary). Let L1 = L(G1) and L2 = L(G2) be the languages associated respectively to G1 and G2. Also assume that σ is a new symbol not in V1 ∪ V2. Then:

1. Union Rule: a grammar for the language union of L1 and L2,

   L1 ∪ L2 = {α | α ∈ L1 or α ∈ L2},

   starts with the two productions σ → σ1, σ → σ2.

2. Product Rule: a grammar for the language product of L1 and L2,

   L1 L2 = {αβ | α ∈ L1, β ∈ L2},

   starts with the production σ → σ1 σ2.

3. Closure Rule: a grammar for the language closure of L1,

   L1* = L1^0 ∪ L1^1 ∪ L1^2 ∪ ...,

   where L1^0 = {λ} and L1^n = {α1 α2 ... αn | αk ∈ L1, k = 1, ..., n} (n = 1, 2, 3, ...), starts with the two productions σ → σ1 σ, σ → λ.

10.2.5. Types of Grammars (Chomsky's Classification). Let G be a grammar and let λ denote the null string.

1. G is a phrase-structure (or type 0) grammar if every production is of the form

   α → δ,

   where α ∈ V* − T*, δ ∈ V*.

2. G is a context-sensitive (or type 1) grammar if every production is of the form

   αAβ → αδβ

   (i.e., we may replace A with δ in the context of α and β), where A ∈ N, α, β ∈ V*, δ ∈ V* − {λ}.

3. G is a context-free (or type 2) grammar if every production is of the form

   A → δ,

   where A ∈ N, δ ∈ V*.

4. G is a regular (or type 3) grammar if every production is of the form

   A → a,  A → aB,  or  A → λ,

   where A, B ∈ N, a ∈ T.

A language L is context-sensitive (respectively context-free, regular) if there is a context-sensitive (respectively context-free, regular) grammar G such that L = L(G). The following examples show that these grammars define different kinds of languages.
Example: The following language is type 3 (regular):

L = {a^n b^m | n = 1, 2, 3, ...; m = 1, 2, 3, ...}.

A type 3 grammar for that language is the following: T = {a, b}, N = {σ, S}, with start symbol σ, and productions:

σ → aσ,  σ → aS,  S → bS,  S → b.

Example: The following language is type 2 (context-free) but not type 3:

L = {a^n b^n | n = 1, 2, 3, ...}.

A type 2 grammar for that language is the following: T = {a, b}, N = {σ}, with start symbol σ, and productions:

σ → aσb,  σ → ab.

Example: The following language is type 1 (context-sensitive) but not type 2:

L = {a^n b^n c^n | n = 1, 2, 3, ...}.

A type 1 grammar for that language is the following: T = {a, b, c}, N = {σ, A, C}, with start symbol σ, and productions:

σ → abc,  σ → aAbc,  A → abC,  A → aAbC,  Cb → bC,  Cc → cc.

There are also type 0 languages that are not type 1, but they are harder to describe.

10.2.6. Equivalent Grammars. Two grammars G and G' are equivalent if L(G) = L(G'). Example: The grammar of algebraic expressions defined at the beginning of the section is equivalent to the following one: terminal symbols = {x, y, z, +, ∗, (, )}, non-terminal symbols = {E, T, F, L}, with start symbol E, and productions

E → T,  E → E + T,  T → F,  T → T ∗ F,
F → (E),  F → L,  L → x,  L → y,  L → z.

10.2.7. Context-Free Interactive Lindenmayer Grammars. A context-free interactive Lindenmayer grammar is similar to a usual context-free grammar, with the difference that it allows productions of the form A → B where A ∈ N ∪ T (in a context-free grammar A must be non-terminal). Its rules for deriving strings are also different: in a context-free interactive Lindenmayer grammar, to derive string β from string α, all symbols in α must be replaced simultaneously.

Example: The von Koch Snowflake. The von Koch Snowflake is a fractal curve obtained by starting with a line segment and then, at each stage, transforming all segments of the figure into a four-segment polygonal line, as shown below. The von Koch Snowflake fractal is the limit of the sequence of curves defined by that process.

Figure 10.1. Von Koch Snowflake, stages 1-3.

Figure 10.2. Von Koch Snowflake, stages 4-5.

A way to represent an intermediate stage of the making of the fractal is as a sequence of movements of three kinds: 'd' = draw a straight line (of a fixed length) in the current direction, 'l' = turn left by 60°, 'r' = turn right by 60°. For instance, we start with a single horizontal line d, which we then transform into the polygonal dldrrdld; then each segment is transformed into a polygonal line according to the rule d → dldrrdld, so we get

dldrrdldldldrrdldrrdldrrdldldldrrdld.

If we represent by D a segment that may not be final yet, then the sequences of commands used to build any intermediate stage of the curve can be defined with the following grammar:
N = {D}, T = {d, r, l}, with start symbol D, and productions:

D → DlDrrDlD,  D → d,  r → r,  l → l.

Example: The Peano curve. The Peano curve is a space-filling curve, i.e., a function f : [0, 1] → [0, 1]^2 such that the range of f is the whole square [0, 1]^2, defined as the limit of the sequence of curves shown in the figures below.

Figure 10.3. Peano curve, stages 1-4.

Each element of that sequence of curves can be described as a sequence of 90° arcs drawn either anticlockwise ('l') or clockwise ('r'). The corresponding grammar is as follows: T = {l, r}, N = {C, L, R}, with start symbol C, and productions:

C → LLLL,  L → RLLLR,  R → RLR,  L → l,  R → r,  l → l,  r → r.
10.3. Language Recognition

10.3.1. Regular Languages. Recall that a regular language is the language associated to a regular grammar, i.e., a grammar G = (V, T, σ, P) in which every production is of the form

A → a,  A → aB,  or  A → λ,

where A, B ∈ N = V − T, a ∈ T.

Regular languages over an alphabet T have the following properties (recall that λ = 'empty string', αβ = 'concatenation of α and β', α^n = 'α concatenated with itself n times'):

1. ∅, {λ}, and {a} are regular languages for all a ∈ T.
2. If L1 and L2 are regular languages over T, the following languages are also regular:

   L1 ∪ L2 = {α | α ∈ L1 or α ∈ L2},
   L1 L2 = {αβ | α ∈ L1, β ∈ L2},
   L1* = {α1 ... αn | αk ∈ L1, n ∈ N},
   L1 ∩ L2 = {α | α ∈ L1 and α ∈ L2},
   T* − L1 = {α ∈ T* | α ∉ L1}.

We justify the above claims about L1 ∪ L2, L1 L2 and L1* as follows. We already know how to combine two grammars for L1 and L2 (see 10.2.4) to obtain L1 ∪ L2, L1 L2 and L1*; the only problem is that the rules given in section 10.2.4 do not have the form of a regular grammar, so we need to modify them slightly (we use the same notation as in section 10.2.4):

1. Union Rule: Instead of adding σ → σ1 and σ → σ2, add all productions of the form σ → RHS, where RHS is the right-hand side of some production (σ1 → RHS) ∈ P1 or (σ2 → RHS) ∈ P2.

2. Product Rule: Instead of adding σ → σ1 σ2, use σ1 as start symbol and replace each production (A → a) ∈ P1 with A → aσ2 and (A → λ) ∈ P1 with A → σ2.

3. Closure Rule: Instead of adding σ → σ1 σ and σ → λ, use σ1 as start symbol, add σ1 → λ, and replace each production (A → a) ∈ P1 with A → aσ1 and (A → λ) ∈ P1 with A → σ1.

10.3.2. Regular Expressions. Given an alphabet T, a regular expression over T is defined recursively as follows:

1. ∅, λ, and a are regular expressions for all a ∈ T.
2. If R and S are regular expressions over T, the following expressions are also regular: (R), R + S, R · S, R*.

In order to use fewer parentheses we assign those operations the following hierarchy (from do first to do last): ∗, ·, +. We may omit the dot: α · β = αβ.

Next we define recursively the language associated to a given regular expression:

L(∅) = ∅,
L(λ) = {λ},
L(a) = {a} for each a ∈ T,
L(R + S) = L(R) ∪ L(S)  (language union),
L(R · S) = L(R) L(S)  (language product),
L(R*) = L(R)*  (language closure).

So, for instance, the expression a*bb* represents all strings of the form a^n b^m with n ≥ 0, m > 0; a(a + b)*b is the set of strings over {a, b} that start with a and end with b; a*(b + c) is the set of strings consisting of any number of a's followed by a b or a c; etc.

Regular languages can be characterized as the languages defined by regular expressions, as we will see next. Another way of characterizing regular languages is as the sets of strings recognized by finite-state automata. But first we need a generalization of the concept of finite-state automaton.

10.3.3. Nondeterministic Finite-State Automata. A nondeterministic finite-state automaton is a generalization of a finite-state automaton in which at each state there might be several possible choices for the "next state" instead of just one. Formally, a nondeterministic finite-state automaton consists of:

1. A finite set of states S.
2. A finite set of input symbols I.
3. A next-state or transition function f : S × I → P(S).
4. An initial state σ ∈ S.
5. A subset F of S of accepting or final states.

We represent the automaton A = (S, I, f, σ, F).

We say that a nondeterministic finite-state automaton accepts or recognizes a given string of input symbols if in its transition diagram there is a path from the starting state to a final state with its edges labeled by the symbols of the given string. A path (which we can express as a sequence of states) whose edges are labeled with the symbols of a string is said to represent the given string.

Example: Consider the nondeterministic finite-state automaton defined by the following transition diagram:

[Transition diagram: σ (start) with a loop labeled b and an edge labeled a to C; C with a loop labeled b and also an edge labeled b to the final state F.]

This automaton recognizes precisely the strings of the form b^n a b^m, n ≥ 0, m > 0. For instance, the string bbabb is represented by the path (σ, σ, σ, C, C, F). Since that path ends in a final state, the string is recognized by the automaton.

Next we will see that there is a precise relation between regular grammars and nondeterministic finite-state automata.

Regular grammar associated to a nondeterministic finite-state automaton. Let A be a nondeterministic finite-state automaton given as a transition diagram. Let σ be the initial state. Let T be the set of input symbols, let N be the set of states, and let V = N ∪ T. Let P be the set of productions

S → xS'  if there is an edge labeled x from S to S', and
S → λ  if S is a final state.

Let G be the regular grammar G = (V, T, σ, P). Then the set of strings recognized by A is precisely L(G).

Example: For the nondeterministic automaton defined above the corresponding grammar will be:
T = {a, b}, N = {σ, C, F}, with productions

σ → bσ,  σ → aC,  C → bC,  C → bF,  F → λ.

The string bbabb can be produced like this:

σ ⇒ bσ ⇒ bbσ ⇒ bbaC ⇒ bbabC ⇒ bbabbF ⇒ bbabb.

Nondeterministic finite-state automaton associated to a given regular grammar. Let G = (V, T, σ, P) be a regular grammar, where N = V − T. Let I = T and S = N ∪ {F}, where F is a new symbol not in V. Then the nondeterministic finite-state automaton A = (S, I, f, σ, F) with transition function

f(S, x) = {S' | S → xS' ∈ P} ∪ {F | S → x ∈ P}

and set of final states

F = {F} ∪ {S | S → λ ∈ P}

recognizes precisely the strings in L(G).

10.3.4. Relationships Between Regular Languages and Automata. In the previous section we saw that regular languages coincide with the languages recognized by nondeterministic finite-state automata. Here we will see that the term "nondeterministic" can be dropped, so that regular languages are precisely those recognized by (deterministic) finite-state automata. The idea is to show that given any nondeterministic finite-state automaton it is possible to construct an equivalent deterministic finite-state automaton recognizing exactly the same set of strings.

The main result is the following: Let A = (S, I, f, σ, F) be a nondeterministic finite-state automaton. Then A is equivalent to the finite-state automaton A' = (S', I', f', σ', F'), where:

1. S' = P(S).
2. I' = I.
3. σ' = {σ}.
4. f'(X, x) = ∪_{S ∈ X} f(S, x);  f'(∅, x) = ∅.
5. F' = {X ⊆ S | X ∩ F ≠ ∅}.
Example: Find a (deterministic) finite-state automaton A' equivalent to the following nondeterministic finite-state automaton A:

[The transition diagram of the automaton for b^n a b^m, m > 0, shown above.]

Answer: The set of input symbols is the same as that of the given automaton: I' = I = {a, b}. The set of states is the set of subsets of S = {σ, C, F}, i.e.:

S' = {∅, {σ}, {C}, {F}, {σ, C}, {σ, F}, {C, F}, {σ, C, F}}.

The starting state is {σ}. The final states of A' are the elements of S' containing some final state of A:

F' = {{F}, {σ, F}, {C, F}, {σ, C, F}}.

Then for each element X of S' we draw an edge labeled x from X to ∪_{S ∈ X} f(S, x) (and from ∅ to ∅):

[Transition diagram on all eight subsets.]

We notice that some states are unreachable from the starting state. After removing the unreachable states we get the following simplified version of the finite-state automaton:

[Transition diagram: {σ} (start) with a loop on b and an edge a to {C}; {C} with an edge b to {C, F} (final) and an edge a to ∅; {C, F} with a loop on b and an edge a to ∅; ∅ with loops on a and b.]
So, once it is proved that every nondeterministic finite-state automaton is equivalent to some deterministic finite-state automaton, we obtain the main result of this section: a language L is regular if and only if there exists a finite-state automaton that recognizes precisely the strings in L.
APPENDIX A

A.1. Efficient Computation of Powers Modulo m

We illustrate an efficient method of computing powers modulo m with an example. Assume that we want to compute 3^547 mod 10. First write 547 in base 2: 1000100011, hence

547 = 2^9 + 2^5 + 2 + 1 = ((2^4 + 1) · 2^4 + 1) · 2 + 1,

so:

3^547 = ((3^(2^4) · 3)^(2^4) · 3)^2 · 3.

Next we compute the expression beginning with the inner parenthesis, reducing modulo 10 at each step:

3^2 = 9 (mod 10),
3^(2^2) = 9^2 = 81 = 1 (mod 10),
3^(2^3) = 1^2 = 1 (mod 10),
3^(2^4) = 1^2 = 1 (mod 10),
3^(2^4) · 3 = 1 · 3 = 3 (mod 10), etc.

At the end we find 3^547 = 7 (mod 10).

The algorithm in pseudocode would be like this:

1: procedure pow_mod(a,x,m)    {computes a^x mod m}
2:   p := 1
3:   bx := binary_array(x)     {x as a binary array}
4:   t := a mod m
5:   for k := 1 to length(bx)
6:   begin
7:     p := (p * p) mod m
8:     if bx[k] = 1 then       {if kth binary digit of x is 1}
9:       p := (p * t) mod m
10:  end
11:  return p
12: end pow_mod
The following is a program in C implementing the algorithm (renamed pow_mod to avoid clashing with the standard library's pow):

int pow_mod(int a, int x, int m) {
    int p = 1;
    int y = 1 << (8 * sizeof(int) - 2);
    if (x == 0)
        return 1 % m;           /* avoid an infinite loop when x == 0 */
    while (!(y & x))
        y >>= 1;                /* find the highest bit set in x */
    a %= m;
    while (y) {
        p *= p;
        p %= m;
        if (x & y) {
            p *= a;
            p %= m;
        }
        y >>= 1;
    }
    return p;
}

The following is an alternative algorithm, equivalent to running through the binary representation of the exponent from right to left instead of left to right:

1: procedure pow_mod(a,x,m)    {computes a^x mod m}
2:   p := 1
3:   t := a mod m
4:   while x > 0
5:   begin
6:     if x is odd then
7:       p := (p * t) mod m
8:     t := (t * t) mod m
9:     x := floor(x/2)
10:  end
11:  return p
12: end pow_mod
A.2. Machines and Languages

A.2.1. Turing Machines. Turing machines are generalizations of finite-state automata: a finite-state automaton is just a Turing machine whose tape head always moves from left to right and never writes to the tape, with its input presented as symbols written on the tape. A Turing machine is a theoretical device intended to define rigorously the concept of algorithm. It consists of:

1. A control unit containing a finite set of instructions.
2. A tape head able to read and write (or delete) symbols from the tape.
3. An infinite tape made of a sequence of cells. Each cell may be empty or may contain a symbol from a given alphabet.

Figure A.1. Turing Machine.

Each machine instruction contains the following five parts:

1. The current machine state.
2. A tape symbol read from the current tape cell.
3. A tape symbol to write into the current tape cell.
4. A direction for the tape head to move: L = 'move one cell to the left', R = 'move one cell to the right', S = 'stay in the current cell'.
5. The next machine state.

In general we make the following assumptions:

1. An input is represented on the tape by placing the letters of the string in contiguous tape cells. All other cells contain the blank symbol, which we may denote λ.
nop. FiniteState Automata. The output of the Turing machine is the contents of the tape when the machine stops. Otherwise the string is rejected. w tu r o . The execution of a Turing machine stops when it enters the Halt state or when it enters a state for which there is no valid move. Let f : N → N the function deﬁned so that f (n) = m means that if the . then the machine is nondeterministic. There is one starting state. 3. a storage structure in which symbols can be put and extracted from it by two operations: push (place on the top of the stack) and pop (take from the top of the stack)—consequently the last symbol put into the stack is the ﬁrst symbol taken out.c ld m o Linearly Bounded Automata. A pushdown automaton is ﬁnitestate automaton with a stack. Additionally there is a third operation. 4. w w w jn . but also the symbol at the top of the stack. A linearly bounded automaton is a Turing machine whose tape is limited to the size of its input string plus two boundary cells that may not be changed. Computable Functions. This can happen in two ways: by entering a state other than the Halt state from which there is no move. the automaton executes a stack operation and goes to the next state.2. Initially there is a single symbol in the stack. i. If a Turing machine has at least two instructions with the same state and input letter. We say that an input string is accepted by a Turing machine if the machine enters the Halt state. or by running forever (for instance executing an inﬁnite loop). Pushdown Automata.e. MACHINES AND LANGUAGES 153 2.. The next state function takes into account not only the current state and the symbol read from the input. that leaves the stack intact. The tape is initially positioned at the leftmost cell of the input string unless speciﬁed otherwise. A ﬁnitestate automata can be interpreted as a Turing machine whose tape head moves only from left to right and never writes to the tape. which we denote by “Halt”. 
Consider a Turing machine T working on symbols from an alphabet with only one symbol, A = {|} ("stroke").
Let f : N → N be the function defined so that f(n) = m means that if the initial input of T consists of a string of n + 1 strokes, then the output of T is a string of m + 1 strokes. We say that f is computed by the Turing machine T. A computable function is a function computed by some Turing machine. A computable function f(n) halts for a given value of its argument n if T with input n + 1 strokes halts. A computable function f is total if f(n) halts for every n.

An effective enumeration of a set is a listing of its elements by an algorithm. In the examples below, f0, f1, f2, ... is an effective enumeration of all computable functions.

A.2.6. Hierarchy of Languages. Here we mention a hierarchy of languages that includes (and extends) Chomsky's classification, in increasing order of inclusion:

1. Regular languages, recognized by finite-state automata. Example: {a^m b^n | m, n = 1, 2, 3, ...}.
2. Deterministic context-free languages, recognized by deterministic pushdown automata. Example: {a^n b^n | n = 1, 2, 3, ...}.
3. Context-free languages, recognized by nondeterministic pushdown automata. Example: palindromes over {a, b}.
4. Context-sensitive languages, i.e., languages without λ recognized by linearly bounded automata. Example: {a^n b^n c^n | n = 1, 2, 3, ...}.
5. Recursively enumerable languages, generated by unrestricted or phrase-structure grammars and recognized by Turing machines. A language is recursively enumerable if there is a Turing machine that outputs all the strings of the language. Example: {a^n | fn(n) halts}.
6. Non-grammatical languages, i.e., languages that are not definable by any grammar and cannot be recognized by Turing machines. Example: {a^n | fn is total}.
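The first four example languages in the hierarchy can be made concrete with direct membership tests. The sketch below is illustrative only and uses plain Python predicates rather than the recognizing automata; the function names are hypothetical.

```python
import re

# Direct membership tests (not automata) for the example languages above.
def in_regular(s):            # {a^m b^n | m, n = 1, 2, 3, ...}
    return re.fullmatch(r"a+b+", s) is not None

def in_det_context_free(s):   # {a^n b^n | n = 1, 2, 3, ...}
    n = len(s) // 2
    return len(s) >= 2 and s == "a" * n + "b" * n

def in_context_free(s):       # palindromes over {a, b}
    return set(s) <= {"a", "b"} and s == s[::-1]

def in_context_sensitive(s):  # {a^n b^n c^n | n = 1, 2, 3, ...}
    n = len(s) // 3
    return n >= 1 and s == "a" * n + "b" * n + "c" * n
```

Note the strict inclusions: "aab" belongs to the regular example but not to {a^n b^n}, and "aabb" belongs to {a^n b^n} but not to {a^n b^n c^n}. The last two examples in the hierarchy have no such test: membership in {a^n | fn(n) halts} is only semi-decidable, and {a^n | fn is total} cannot be recognized by any Turing machine at all.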