
Legend

Language, Automata:

α, \alpha : usually denotes a string in this course.

β, \beta : usually denotes a string in this course.


δ, \delta : usually denotes a transition function in this course.

σ, \sigma : usually denotes a symbol in an alphabet in this course.

Δ, \Delta : usually denotes a blank space in this course.

Γ, \Gamma : usually denotes a set of stack symbols in this course.

Λ, \Lambda : usually denotes an empty string in this course.

Π, \Pi : usually denotes a partition in this course.

Σ, \Sigma : usually denotes an alphabet in this course.

⊢, \goto : usually denotes a (one step) transition in this course.

Logic:

¬, ~ : logical not

∧, ^ : logical and

∨, V : logical or

→, -> : logical imply

↔, <-> : logical if and only if (equivalent)

⇒, => : logical tautologically imply

⇔, <=> : logical tautologically equivalent

∀, \A : logical for all

∃, \E : logical for some (there exists)


Sets:

∈, \in : belongs to

∉, \not\in : does not belong to

∅, @ : empty set
U : universal set

⊂, \subset : proper subset

⊄, \not\subset : not a proper subset

⊆, \subseteq : subset

⊈, \not\subseteq : not a subset


∪, \cup : set union

⋃(i=1 to n) Ai , \cup(i=1 to n) A_i : union of n sets


∩, \cap : set intersection

⋂(i=1 to n) Ai , \cap(i=1 to n) A_i : intersection of n sets


Ā, \bar A : complement of set A

P(A) : power set of set A

×, X : Cartesian product

×(i=1 to n) Ai , X(i=1 to n) A_i : Cartesian product of n sets

Relation:

< a, b > : ordered pair


< a1, a2, ..., an > : ordered n-tuple

≼, <= : precedes (partial order)

Functions:

∑(i=1 to n) x_i , Sum(i=1 to n) x_i : sum of n x_i's


O(f) : of order smaller than or equal to f
o(f) : of order smaller than f

Ω(f) , Omega(f) : of order greater than or equal to f

ω(f) , omega(f) : of order greater than f

Θ(f) , Theta(f) : of the same order as f

lim(x -> ∞) f(x) : limit of f as x goes to infinity

Introduction to Theoretical Computer Science

Today computers are used everywhere: banks, hospitals, schools, airline companies, gas
stations, grocery stores, in our cars, in home appliances, PCs, etc. Some are used to
crunch numbers, some to process images, some to process other non-numeric data, and
some to control the operations of various devices. They can reason, they can prove many
mathematical theorems, they can beat chess experts at their game, they can solve some
very complex problems, they can understand our languages, they can answer our
questions, and of course they can crunch numbers much, much faster than us.
Let us for a moment call what computers do "computation" for convenience, though some
of the things computers do, such as controlling appliances and answering our questions,
don't fall into our traditional sense of computation. These computers, then, seem to be
able to compute an awful lot of things, if not everything. But are they capable of
computing everything? Are there things computers cannot do? If there are things
computers cannot do, what are they? And why? If there aren't things computers cannot
do, then how can we tell? What exactly do we mean by computation?
Unfortunately there are many things computers cannot do. Computers cannot solve
certain types of problems. For example, no computer can tell in general whether or not a
given computer program stops after a finite amount of time on a given input. Computers
also cannot solve some other types of problems fast enough, even though they can solve
them in some finite amount of time. Take, for example, the traveling salesman problem: a
salesman is given a road map with distances between cities and wants to find a shortest
round trip route that visits all the cities on the map exactly once. At the moment the
traveling salesman problem requires an extremely large amount of time to solve. No one
has been able to find a reasonably fast algorithm for it, and the consensus is that no one
is likely to find such an algorithm.

I have just given you an example of a problem that computers cannot solve. How do
we know that that is the case? Are there other problems like that? How can we tell
whether or not a given problem can be solved, and solved fast enough?
The main objective of this course is to answer those questions, that is, to study the
limitations of computers and computation. We are going to investigate these limitations
by studying the essence of computers and computation rather than all the variations of
computers and computation. This essence is a device called the Turing machine. It was
first conceived by Alan Turing in the early 20th century. It is a very simple device,
but remarkably, every task modern computers perform can also be accomplished by
Turing machines. Though it has not been proven, it is generally believed (Church's thesis)
that any "computation" humans do can be done by Turing machines, and that
"computation" is the computation performed by Turing machines.
Thus by studying Turing machines we can learn the capabilities, hence the limitations, of
computers.

Before proceeding to the study of Turing machines and their computations in this course,
we study a simpler type of computing device called finite automata. Finite automata are
very similar to Turing machines but a few restrictions are imposed on them.
Consequently they are less capable than Turing machines but then their operations are
simpler. So they provide a good introduction to our study of Turing machines. In addition
finite automata can model a large number of systems used in practice. Thus they are a
powerful tool to design and study those systems with.

We call a set of strings (of symbols) a language. Finite automata process strings. More
specifically they answer the question whether or not a given string belongs to a language.
We say finite automata recognize languages. It turns out that solving a problem can be
viewed as recognizing a language. Thus when a finite automaton is processing strings, it
can actually be solving a problem.
The languages that are recognized by finite automata are called regular languages. Since
so many systems in practice can be described by regular languages, we are also going to
study regular languages in detail as well as finite automata. We are going to learn their
properties, ways to describe them and how to use them to model many of the real life
systems.
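As a preview, the string-processing behavior of a finite automaton can be sketched in a few lines of code. This is an informal sketch, not the formal definition given later in the course; the state names and the dictionary encoding of the transition function are illustrative assumptions.

```python
# A minimal sketch of a deterministic finite automaton (DFA). The transition
# function is encoded as a dictionary mapping (state, symbol) -> state.

def accepts(delta, start, accepting, string):
    """Run the DFA from `start` over `string`; accept iff the last state is accepting."""
    state = start
    for symbol in string:
        state = delta[(state, symbol)]
    return state in accepting

# Example: a two-state DFA over {a, b} that accepts strings with an even number of a's.
delta = {
    ("even", "a"): "odd",  ("even", "b"): "even",
    ("odd",  "a"): "even", ("odd",  "b"): "odd",
}

print(accepts(delta, "even", {"even"}, "abba"))  # two a's -> True
print(accepts(delta, "even", {"even"}, "ab"))    # one a  -> False
```

The language this machine recognizes, the set of all strings over {a, b} with an even number of a's, is one example of a regular language.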

Turing machines also recognize languages. The languages Turing machines recognize are
called Type 0 (or phrase structure) languages (regular languages are Type 3) and they are
more complex than regular languages. These two types of languages belong to a hierarchy
of four classes of languages called the Chomsky hierarchy. The other two are context-free
languages and context-sensitive languages. These four classes together are called the
formal languages.

After briefly studying context-free languages, which are also heavily used in practice, we
go to Turing machines. There we learn how computers can be simulated by Turing
machines and what it means that a Turing machine recognizes (decides) a language,
which is the key to the unsolvability of problems by computers.
Then with Turing machines we investigate the limitations of computers and computations.
In particular we are going to see a few problems that cannot be solved by Turing
machines, hence by computers, and how we can tell that they are unsolvable.
Our last topic is the time complexity of various problems. Among the solvable problems
there are problems that can be solved within a reasonable amount of time, and there are
problems that are known to require a finite but very large amount of time to solve. We are
going to see some of those which take a large amount of time. Unfortunately there is
nothing we can do to speed them up. The time needed to solve such a problem increases
at least exponentially with the size of the problem as long as we use Turing machines
(hence computers). Increasing the processor speed does not help much for such problems.
If the computation time is 2^n, where n is the size of the problem, then even if the
processor speed increased 1,000 times it could handle problem sizes only about ten
larger. For example, for the traveling salesman problem, if 100 cities were too many to
solve fast enough, then even with a 1,000-fold increase in processor speed 110 cities
would already be too many.
The time complexity issues are investigated using Turing machines so that the results
apply to all computers.
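The arithmetic behind the "only about ten larger" claim is simple: if solving size n takes 2^n steps, a speedup factor of s buys only log2(s) extra units of problem size. A quick sketch:

```python
import math

# If solving size n takes 2**n steps, a machine that is `speedup` times faster
# solves, in the same wall-clock time, sizes up to n + log2(speedup).
def new_size(n, speedup):
    return n + math.log2(speedup)

print(new_size(100, 1000))  # about 110, since log2(1000) is just under 10
```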

Let us start with a review of mathematics.


Basic Mathematical Objects

The following are the contents of this introductory chapter.

• Logic
• Sets
• Relations
• Functions

Logic

Proposition and Logical Connectives

A "proposition" can be defined as a declarative statement having a specific truth value,
true or false.
Example:
The following statements are propositions, as they have precise truth values.
Their truth values are false and true, respectively.

• 2 is an odd number.
• 4 is a perfect square.

"Connective": Two or more propositions can be combined together to make
compound propositions with the help of logical connectives.
Example:
The above two propositions can be used to make a compound proposition using any
of the logical connectives.

• 2 is an odd number AND 4 is a perfect square.


• 2 is an odd number OR 4 is a perfect square.

Their truth values are false and true, respectively. For the first compound
proposition to be true, both of the propositions have to be true, since the
connective is AND; since OR is the connective for the second one, the compound
proposition is true if either of the propositions is true.

The following are the logical connectives used commonly:

a. Conjunction
The logical conjunction is understood in the same way as the commonly used "and".
The compound proposition's truth value is true iff all the constituent propositions
are true. It is represented as " ^ ".
The truth table for two individual propositions p and q with conjunction is given
below

p q p^q
T T T
T F F
F T F
F F F

b. Disjunction
This is the logical "or"; the compound proposition is true if either of the
individual propositions is true.
The truth table is given below

p q pVq
T T T
T F T
F T T
F F F

c. Negation
This is the logical "negation" and it is expressed as ~p for "not p".
The truth table is given below
p ~p
T F
F T

d. Conditional
This is used to express "a proposition holds true if another
proposition is true", i.e. p -> q is read as "if p, then q".
The truth table is given below

p q p -> q
T T T
T F F
F T T
F F T

p -> q is also expressed in a number of different (but equivalent) ways in English.
For example, "p only if q", "if not q then not p", "p is sufficient for q",
"q is necessary for p", "q is a necessity/consequence of p" and "q whenever p"
are all different ways of saying "if p then q".

e. Biconditional
A proposition (p -> q) ^ (q -> p) can be abbreviated using the biconditional
connective as p <-> q and is read as "if p then q, and if q then p".

f. Tautology
A compound proposition which is true in every case.
E.g.: p V ~p

g. Contradiction
This is the opposite of a tautology, which is false in every case.
E.g.: p ^ ~p
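The truth tables above can be generated mechanically by enumerating all assignments. A small sketch, assuming Python; the function names are illustrative:

```python
from itertools import product

# Print the truth table of any binary connective given as a Python function.
def truth_table(name, f):
    print(f"p     q     {name}")
    for p, q in product([True, False], repeat=2):
        print(f"{p!s:<5} {q!s:<5} {f(p, q)}")

conj = lambda p, q: p and q          # p ^ q
disj = lambda p, q: p or q           # p V q
cond = lambda p, q: (not p) or q     # p -> q, false only when p is true and q false

truth_table("p ^ q", conj)
truth_table("p -> q", cond)
```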

Logical implication and equivalence


If the value of p -> q is true in every case, then p is said to logically imply q.
It is represented as p => q. If p and q have the same truth-value in every case
then they are said to be logically equivalent and it is represented as p <=> q.

Following are some of the useful identities and implications from propositional
logic:

Identities
1. ~(P ^ Q) <=> (~P V ~Q) ----- DeMorgan's Law
2. ~(P V Q) <=> (~P ^ ~Q) ----- DeMorgan's Law
3. (P -> Q) <=> (~P V Q) ----- implication
4. [(P ^ Q) -> R] <=> [P -> (Q -> R)] ----- exportation
5. (P -> Q) <=> (~Q -> ~P) ----- contrapositive


Implications

1. [(P -> Q) ^ ~Q] => ~P ----- modus tollens


2. [(P -> Q) ^ (R -> S)] => [(P ^ R) -> (Q ^ S)]
3. [(P -> Q) ^ (Q -> R)] => (P -> R)

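Because each identity and implication involves only finitely many truth assignments, it can be verified exhaustively. A small sketch; the helper name `equivalent` is illustrative:

```python
from itertools import product

# Check that two propositional forms agree on every truth assignment.
def equivalent(f, g, nvars=2):
    return all(f(*vals) == g(*vals) for vals in product([True, False], repeat=nvars))

# De Morgan: ~(P ^ Q) <=> (~P V ~Q)
demorgan = equivalent(lambda p, q: not (p and q),
                      lambda p, q: (not p) or (not q))

# Contrapositive: (P -> Q) <=> (~Q -> ~P)
contrapositive = equivalent(lambda p, q: (not p) or q,
                            lambda p, q: q or (not p))

print(demorgan, contrapositive)  # True True
```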

Predicate and Predicate Logic

Propositional logic is not powerful enough to represent all types of assertions
that are used in computer science and mathematics, or to express certain types of
relationships between propositions such as equivalence.
For more complex reasoning we need a more powerful logic capable of expressing
complicated propositions and reasoning. Predicate logic is one of the extensions of
propositional logic and it is fundamental to most other types of logic. Central to
predicate logic are the concepts of predicate and quantifier.

A predicate is a template involving a verb that describes a property of objects, or
a relationship among objects represented by the variables.

For example, the sentences "The car Tom is driving is blue", "The sky is blue",
and "The cover
of this book is blue" come from the template "is blue" by placing an appropriate
noun/noun phrase
in front of it. The phrase "is blue" is a predicate and it describes the property of
being blue.
Predicates are often given a name. For example any of "is_blue", "Blue" or "B"
can be used
to represent the predicate "is blue" among others. If we adopt B as the name for
the predicate
"is_blue", sentences that assert an object is blue can be represented as "B(x)",
where x represents
an arbitrary object. B(x) reads as "x is blue".

A predicate with variables, called an atomic formula, can be made a proposition by
applying one of the following two operations to each of its variables:

1. assign a value to the variable
2. quantify the variable using a quantifier (see below).

For example, x > 1 becomes 3 > 1 if 3 is assigned to x, and it becomes a true
statement, hence a proposition.

In general, quantification is performed on formulas of predicate logic (called
wffs), such as x > 1 or P(x), by using quantifiers on variables. There are two
types of quantifiers: the universal quantifier and the existential quantifier.

The universal quantifier turns, for example, the statement x > 1 into "for every
object x in the universe, x > 1", which is expressed as "∀x x > 1". This new
statement is true or false in the universe of discourse. Hence it is a proposition
once the universe is specified.

Similarly the existential quantifier turns, for example, the statement x > 1 into
"for some object x in the universe, x > 1", which is expressed as "∃x x > 1."
Again, it is true or false in the universe of discourse, and hence it is a
proposition once the universe is specified.
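Once the universe is specified and finite, quantified statements become checkable propositions: ∀ corresponds to Python's all() and ∃ to any(). A sketch with an illustrative universe:

```python
# Over a finite universe, the quantifiers reduce to all() and any().
universe = [0, 1, 2, 3, 4, 5]

P = lambda x: x > 1

forall = all(P(x) for x in universe)   # "for all x, x > 1": False, since 0 > 1 fails
exists = any(P(x) for x in universe)   # "for some x, x > 1": True, e.g. x = 2

print(forall, exists)
```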

Universe of Discourse The universe of discourse, also called universe , is the set of
objects
of interest. The propositions in the predicate logic are statements on objects of a
universe.
The universe is thus the domain of the (individual) variables. It can be the set of
real numbers,
the set of integers, the set of all cars on a parking lot, the set of all students in a
classroom etc.
The universe is often left implicit in practice. But it should be obvious from the
context.

Predicate logic is more powerful than propositional logic. It allows one to reason
about properties
and relationships of individual objects. In predicate logic, one can use some
additional inference
rules, some of which are given below, as well as those for propositional logic such
as
the equivalences, implications and inference rules.

Important Inference Rules of Predicate Logic:

First there is the following rule concerning the negation of a quantified statement,
which is very useful:
~∀x P(x) <=> ∃x ~P(x)

Next there is the following set of rules on quantifiers and connectives:

1. ∀x [ P(x) ^ Q(x) ] <=> [ ∀x P(x) ^ ∀x Q(x) ]


2. [ ∀x P(x) V ∀x Q(x) ] => ∀x [ P(x) V Q(x) ]
3. ∃x [ P(x) V Q(x) ] <=> [ ∃x P(x) V ∃x Q(x) ]
4. ∃x [ P(x) ^ Q(x) ] => [ ∃x P(x) ^ ∃x Q(x) ]

For more discussions and examples on these rules and others, see Reasoning (with
predicate logic) and Quantifiers and Connectives in the Discrete Structures course.
Also for proofs and proof techniques see Mathematical Reasoning.

Sets
What is a set?
A set is a collection of elements, having a property that characterizes those elements.

How to specify a Set?


One way is to enumerate the elements completely. All the elements belonging to the
set are explicitly given.

Example: A = {1,2,3,4,5}

Alternate way is to give the properties that characterize the elements of the set.

Example: B = {x | x is a positive integer less than or equal to 5}

Some sets can also be defined recursively.

Set terminology

Belongs To
x ∈ B means that x is an element of set B. Using this notation we can specify the
set {0,1,2,3,4,5}, call it Z,
by writing

Z = {x | x ∈ N ^ x <= 5}
where N represents the set of natural numbers.
It is read as "the set of natural numbers that are less than or equal to 5".

Subset
Let A and B be two sets.

A is a subset of B if every element of A is an element of B.

"A is a subset of B" is represented as A ⊆ B.

Note: If A is a subset of B and B is a subset of A, then A = B. Also, if A is a
subset of, but not equal to, B,

it is represented as A ⊂ B.

Universal Set
The set U of all the elements we might ever consider in the discourse is called the
universal set.
Complement
If A is a set, then the complement of A is the set consisting of all elements of the
universal set that are not in A. It is denoted by A' or Ā. Thus
A' = { x | x ∈ U ^ x ∉ A } , where ∉ means "is not an element of".
Example:
If U is the set of natural numbers and A = { 1,2,3 } , then A' = { x | x ∈ U ^ x >
3 }.

Set Operations

The operations that can be performed on sets are:

1. Union
If A and B are two sets, then the union of A and B is the set that contains all
the elements that are in A or in B, including the ones in both A and B. It is
denoted by A ∪ B.

Example: If A = {1,2,3} and B = {3,4,5}


then A ∪ B = {1,2,3,4,5}

2. Difference
If A and B are two sets, then the difference of A from B is the set that
consists of the elements of A that are not in B. It is denoted by A - B.

Example: If A = {1,2,3} and B = {3,4,5}


then A - B = {1,2}

Note that in general A - B ≠ B - A.


For the A and B of the above example, B - A = {4,5}.

3. Intersection
If A and B are two sets, then the intersection of A and B is the set that
consists of the elements in both A and B. It is denoted by A ∩ B.
Example: If A = {1,2,3,8} and B = {3,4,5,8}
then A ∩ B = {3,8}.

Disjoint sets
A and B are said to be disjoint if they contain no elements in common,
i.e. A ∩ B = ø, where ø is the empty set.

Example: A = { 1,2,3,4,5 } and B = { 6,8,9 } are disjoint.
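These operations map directly onto Python's built-in set type; the sets below are the ones from the examples above:

```python
# Union, difference, intersection, and disjointness with Python's set type.
A = {1, 2, 3}
B = {3, 4, 5}

print(A | B)   # union: {1, 2, 3, 4, 5}
print(A - B)   # difference: {1, 2}
print(A & B)   # intersection: {3}

# Disjoint sets share no elements.
print({1, 2, 3, 4, 5}.isdisjoint({6, 8, 9}))  # True
```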


Following is a list of some standard set identities.
A, B, C represent arbitrary sets, ø is the empty set and U is the universal set.

The Commutative laws:


A ∪ B = B ∪ A
A ∩ B = B ∩ A

The Associative laws:


A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C

The Distributive laws:


A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

The Idempotent laws:


A ∪ A = A
A ∩ A = A

The Absorptive laws:


A ∪ (A ∩ B) = A
A ∩ (A ∪ B) = A

The De Morgan laws:


(A ∪ B)' = A' ∩ B'
(A ∩ B)' = A' ∪ B'

Other laws involving Complements:


( A' )' = A
A ∩ A' = ø
A ∪ A' = U

Other laws involving the empty set:


A ∪ ø = A
A ∩ ø = ø
Other laws involving the Universal Set:
A ∪ U = U
A ∩ U = A
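The identities can be spot-checked on concrete finite sets. This verifies particular instances rather than proving the laws; the universe `U` and the helper `comp` below are illustrative choices:

```python
# Spot-check some of the identities above, with complement taken relative to
# an explicit universal set U.
U = set(range(10))
A = {1, 2, 3, 4}
B = {2, 4, 6, 8}

comp = lambda S: U - S

assert A | (A & B) == A                      # absorption
assert comp(A | B) == comp(A) & comp(B)      # De Morgan
assert comp(A & B) == comp(A) | comp(B)      # De Morgan
assert A & comp(A) == set() and A | comp(A) == U
print("all checked identities hold on this example")
```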

Venn Diagrams
A common technique in working with Set Operations is to illustrate them by drawing
Venn Diagrams.
It is a very good tool to get a general idea.
Note, however, that Venn Diagrams must NOT be used for rigorous discussions,
because they can represent
only very limited situations and miss many other possibilities.

The idea of a Venn diagram is to draw a region representing the universe, and within
that to draw the regions representing the component sets we are starting with, so
that the resulting diagram describes their interrelationships.

For example sets A = { 1,2,3,4 } and B = { 6,8,2,4 } can be represented as shown


below using Venn Diagrams:

Set A

U represents the universal set, in which A is one of the sets.


Set B

The following Venn diagram is used to illustrate A ∩ B


A ∩ B
The following Venn diagram is used to illustrate A ∪ B

A ∪ B

A ∪ B is the set consisting of all the different elements in A and B.


(A ∪ B)' is the yellow region in the Venn diagram given below.

For example: U = { 1,2,3,4,5,6,7,8 }, A = { 1,2,3,4 }, B = { 2,4,6,8 }


A ∪ B = { 1,2,3,4,6,8 }
(A ∪ B)' = U - (A ∪ B)
= { 5, 7 }
A - B is the yellow shaded region and
B - A is the blue shaded region in the Venn diagram shown below

Generalized Set Operations

Union, intersection and Cartesian product of sets are associative.

For example, (A ∪ B) ∪ C = A ∪ (B ∪ C) holds. To denote either of these
expressions
we often use A ∪ B ∪ C.

This can be generalized for the union of any finite number of sets as
A1 ∪ A2 ∪ .... ∪ An,

which we write as

⋃(i=1 to n) Ai

This generalized union of sets can be rigorously defined as follows:

Definition ( ⋃(i=1 to n) Ai ) :
Basis Clause: For n = 1, ⋃(i=1 to 1) Ai = A1.

Inductive Clause: ⋃(i=1 to n+1) Ai = ( ⋃(i=1 to n) Ai ) ∪ An+1

Similarly the generalized intersection ⋂(i=1 to n) Ai and generalized Cartesian
product ×(i=1 to n) Ai can be defined.

Based on these definitions, De Morgan's law on set union and intersection can also
be generalized as follows:

Theorem (Generalized De Morgan)

( ⋃(i=1 to n) Ai )' = ⋂(i=1 to n) Ai' , and
( ⋂(i=1 to n) Ai )' = ⋃(i=1 to n) Ai'
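Under the recursive definition above, the generalized union is just a repeated application of the binary operation, i.e. a left fold. A sketch in Python using `functools.reduce`; the function names are illustrative:

```python
from functools import reduce

# Generalized union: basis is A1, and each inductive step unions in the next set.
def big_union(sets):
    return reduce(lambda acc, s: acc | s, sets)

def big_intersection(sets):
    return reduce(lambda acc, s: acc & s, sets)

sets = [{1, 2}, {2, 3}, {3, 4}]
print(big_union(sets))         # {1, 2, 3, 4}
print(big_intersection(sets))  # set(); no element is in all three

# Generalized De Morgan, checked on a concrete universe:
U = set(range(5))
assert U - big_union(sets) == big_intersection([U - s for s in sets])
```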

Recursive Definition


Subjects to be Learned

• recursive/inductive definition
• basis clause
• basis
• inductive clause
• extremal clause

A recursive definition of a set always consists of three distinct clauses:

1. The basis clause (or simply basis) of the definition establishes that certain objects
are in the set. This part of the definition specifies the "seeds" of the set from
which the elements of the set are generated using the methods given in the
inductive clause. The set of elements specified here is called basis of the set being
defined.
2. The inductive clause (or simply induction) of the definition establishes the ways
in which elements of the set can be combined to produce new elements of the set.
The inductive clause always asserts that if objects are elements of the set, then
they can be combined in certain specified ways to create other objects. Let us call
the objects used to create a new object the parents of the new object, and the new
object is their child .
3. The extremal clause asserts that unless an object can be shown to be a member of
the set by applying the basis and inductive clauses a finite number of times, the
object is not a member of the set.

Examples of Recursive Definition of Set

Example 1. The Set of Natural Numbers N

Basis Clause: 0 ∈ N
Inductive Clause: For any element x in N, x + 1 is in N.
Extremal Clause: Nothing is in N unless it is obtained from the Basis and Inductive
Clauses.

The basis for this set N is { 0 }. The x in the Inductive Clause is the parent of
x + 1, and x + 1 is the child of x.
Following this definition, the set of natural numbers N can be obtained as follows:
First by (1), 0 is put into N.
Then by (2), since 0 is in N, 0 + 1 (= 1) is in N. 0 is the parent of 1, and 1 is the
child of 0.
Then by (2) again, 1 + 1 (= 2) is in N. 1 is the parent of 2, and 2 is the child of 1.
Proceeding in this manner all the natural numbers are put into N.
Note that if we didn't have (3), 0.5, 1.5, 2.5, ... could be included in N, which is
not what we want as the set of natural numbers.

Example 2. The Set of Nonnegative Even Numbers E

Basis Clause: 0 ∈ E
Inductive Clause: For any element x in E, x + 2 is in E.
Extremal Clause: Nothing is in E unless it is obtained from the Basis and Inductive
Clauses.

Example 3. The Set of Even Integers EI

Basis Clause: 0 ∈ EI
Inductive Clause: For any element x in EI, x + 2 and x - 2 are in EI.
Extremal Clause: Nothing is in EI unless it is obtained from the Basis and Inductive
Clauses.

Example 4. The Set of Strings S over the alphabet {a, b} excluding the empty string
This is the set of strings consisting of a's and b's such as abbab, bbabaa, etc.

Basis Clause: a ∈ S and b ∈ S.


Inductive Clause: For any element x in S, ax ∈ S and bx ∈ S.
Here ax means the concatenation of a with x.
Extremal Clause: Nothing is in S unless it is obtained from the Basis and Inductive
Clauses.
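A recursive definition like Example 4 can be turned into a generator: start from the basis and keep applying the inductive clause. Since the full set is infinite, the sketch below (with an illustrative length bound) produces only the strings up to a given length:

```python
# Generate the strings of Example 4 up to a length bound by repeatedly
# applying the inductive clause to the basis {a, b}.
def strings_up_to(max_len):
    S = {"a", "b"}                               # basis clause
    for _ in range(max_len - 1):                 # inductive clause, applied repeatedly
        S |= {"a" + x for x in S} | {"b" + x for x in S}
    return {x for x in S if len(x) <= max_len}

S3 = strings_up_to(3)
print(sorted(x for x in S3 if len(x) == 2))  # ['aa', 'ab', 'ba', 'bb']
```

The extremal clause corresponds to the fact that nothing enters `S` except through the basis and the prefixing step; in particular the empty string never appears.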

Tips for recursively defining a set:


For the "Basis Clause", try simplest elements in the set such as smallest numbers (0, or
1), simplest expressions, or shortest strings. Then see how other elements can be obtained
from them, and generalize that generation process for the "Inductive Clause".

The set of propositions (propositional forms) can also be defined recursively.


Recursive Definition of Function

Some functions can also be defined recursively.

Condition: The domain of the function you wish to define recursively must be a
recursively defined set.
How to define a function recursively: First the values of the function for the basis
elements of the domain are specified. Then the value of the function at an element, say x,
of the domain is defined using its value at the parent(s) of the element x.

A few examples are given below.


They are all functions from the integers to the integers, except the last one.
Example 5: The function f(n) = n! for natural numbers n can be defined recursively as
follows:

Basis Clause: f(0) = 0! = 1


Inductive Clause: For all natural numbers n, f(n+1) = (n+1) f(n).

Note that here an Extremal Clause is not necessary, because the set of natural numbers
can be defined recursively and that definition has the extremal clause in it. So there is no
chance for other elements to come into the function being defined.

Using this definition, 3! can be found as follows:


Since 0! = 1, 1! = 1 * 0! = 1 * 1 = 1.
Hence 2! = 2 * 1! = 2 * 1 = 2.
Hence 3! = 3 * 2! = 3 * 2 = 6.
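Example 5 transcribes directly into code, with the basis and inductive clauses as the two branches of the function:

```python
# The recursive definition of the factorial function from Example 5.
def fact(n):
    if n == 0:
        return 1                 # basis clause: f(0) = 0! = 1
    return n * fact(n - 1)       # inductive clause: f(n) = n * f(n-1)

print(fact(3))  # 6
```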

Example 6: The function f(n) = 2n + 1 for natural numbers n can be defined recursively
as follows:

Basis Clause: f(0) = 1


Inductive Clause: For all natural numbers n, f(n+1) = f(n) + 2.
See above for the extremal clause.

Example 7: The function f(n) = 2^n for natural numbers n can be defined recursively as
follows:

Basis Clause: f(0) = 1


Inductive Clause: For all natural numbers n, f(n+1) = 2 f(n).
See Example 5 for the extremal clause.

Example 8: The function L from the set S of strings over {a, b} to the set of natural
numbers that gives the length of a string can be defined recursively as follows:

Basis Clause: For symbols a and b of the alphabet, L(a) = 1 and L(b) = 1.
Inductive Clause: For any strings x and y of S, L(xy) = L(x) + L(y), where xy is the
concatenation of strings x and y.
See Example 5 for the extremal clause.

This function L gives the number of a's and b's in a string, that is, its length.


L(x) for a string x is also often denoted by | x |.

Induction

Mathematical Induction
Subjects to be Learned

• first principle of mathematical induction


• basis step
• induction hypothesis
• induction
• second principle of mathematical induction

Contents

First Principle of Mathematical Induction

As we have seen in recursion , the set of natural numbers can be defined recursively, and
its elements can be generated one by one starting with 0 by adding 1. Thus the set of
natural numbers can be described completely by specifying the basis element (0), and the
process of generating an element from a known element in the set.

Taking advantage of this, natural numbers can be proven to have certain properties as
follows:
First it is proven that the basis element, that is 0, has the property in question (basis step).
Then it is proven that if an arbitrary natural number, denote it by n, has the property in
question, then the next element, that is n + 1, has that property (inductive step).

When these two are proven, it follows that all the natural numbers have that
property. For since 0 has the property by the basis step, the element next to it, which is 1,
has the same property by the inductive step. Then since 1 has the property, the element
next to it, which is 2, has the same property, again by the inductive step. Proceeding
likewise, any natural number can be shown to have the property. This process is
somewhat analogous to knocking over a row of dominoes, with knocking over the first
domino corresponding to the basis step.

More generally mathematical statements involving a natural number n such as 1 + 2 + ...


+ n = n( n + 1 )/2 can be proven by mathematical induction by the same token.

To prove that a statement P(n) is true for all natural numbers n >= n0, where n0 is a
natural number, we proceed as follows:
Basis Step: Prove that P(n0) is true.
Induction: Prove that for any integer k >= n0, if P(k) is true (called the induction
hypothesis), then P(k+1) is true.
The first principle of mathematical induction states that if the basis step and the
inductive step are proven, then P(n) is true for all natural numbers n >= n0.

As a first step for proof by induction, it is often a good idea to restate P(k+1) in terms of
P(k) so that P(k), which is assumed to be true, can be used.

Example:

Prove that for any natural number n, 0 + 1 + ... + n = n( n + 1 )/2 .

Proof:
Basis Step: If n = 0, then LHS = 0, and RHS = 0 * (0 + 1)/2 = 0 .
Hence LHS = RHS.
Induction: Assume that for an arbitrary natural number n, 0 + 1 + ... + n = n( n + 1 )/2 .
-------- Induction Hypothesis
To prove this for n+1, first try to express LHS for n+1 in terms of LHS for n, and
somehow use the induction hypothesis.
Here let us try
LHS for n + 1 = 0 + 1 + ... + n + (n + 1) = (0 + 1 + ... + n) + (n + 1) .
Using the induction hypothesis, the last expression can be rewritten as
n( n + 1 )/2 + (n + 1) .
Factoring (n + 1) out, we get
(n + 1)(n + 2) / 2 ,
which is equal to the RHS for n+1.

Thus LHS = RHS for n+1.

End of Proof.
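The identity just proved can also be spot-checked numerically. This checks finitely many values as a sanity check; it does not replace the induction proof, which covers all n:

```python
# Compare both sides of 0 + 1 + ... + n = n(n+1)/2 for many values of n.
def lhs(n):
    return sum(range(n + 1))

def rhs(n):
    return n * (n + 1) // 2

assert all(lhs(n) == rhs(n) for n in range(100))
print("0 + 1 + ... + n == n(n+1)/2 for n = 0..99")
```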

Second Principle of Mathematical Induction

There is another form of induction over the natural numbers, based on the second
principle of induction, to prove assertions of the form ∀x P(x). This form of induction
does not require a basis step; in the inductive step P(n) is proved assuming P(k)
holds for all k < n. Certain problems can be proven more easily by using the second
principle than the first principle, because P(k) for all k < n can be used rather than just
P(n - 1) to prove P(n).

Formally, the second principle of induction states that

if ∀n [ ∀k [ k < n -> P(k) ] -> P(n) ] , then ∀n P(n) can be concluded.

Here ∀k [ k < n -> P(k) ] is the induction hypothesis.

The reason that this principle holds is going to be explained later after a few examples of
proof.

Example 1: Let us prove the following equality using the second principle:
For any natural number n, 1 + 3 + ... + ( 2n + 1 ) = ( n + 1 )^2.
Proof: Assume that 1 + 3 + ... + ( 2k + 1 ) = ( k + 1 )^2 holds for all k, k < n.
Then 1 + 3 + ... + ( 2n + 1 ) = ( 1 + 3 + ... + ( 2n - 1 ) ) + ( 2n + 1 )
= n^2 + ( 2n + 1 ) = ( n + 1 )^2 by the induction hypothesis.
Hence by the second principle of induction 1 + 3 + ... + ( 2n + 1 ) = ( n + 1 )^2 holds for
all natural numbers.

Example 2: Prove that for all positive integers n, ∑(i=1 to n) i ( i! ) = ( n + 1 )! - 1.


Proof: Assume that
1 * 1! + 2 * 2! + ... + k * k! = ( k + 1 )! - 1 for all k, k < n.
Then 1 * 1! + 2 * 2! + ... + ( n - 1 ) * ( n - 1 )! + n * n!
= n! - 1 + n * n! by the induction hypothesis
= ( n + 1 )n! - 1
= ( n + 1 )! - 1

Hence by the second principle of induction ∑(i=1 to n) i ( i! ) = ( n + 1 )! - 1 holds for all


positive integers.

Example 3: Prove that any positive integer n > 1, can be written as the product of prime
numbers.

Proof: Assume that for all positive integers k, n > k > 1, k can be written as the product
of prime numbers.
We are going to prove that n can be written as the product of prime numbers.

Since n is an integer, it is either a prime number or not a prime number. If n is a prime


number, then n itself is such a product: a product consisting of the single prime n.
Therefore the statement holds true.
If n is not a prime number, then it is a product of two positive integers, say p and q, each
greater than 1. Since both p and q are smaller than n, by the induction hypothesis they can
be written as products of prime numbers (note that this is not possible if the First
Principle is being used). Hence n can also be written as a product of prime numbers.

Relations

Definition Relation
Let A and B be sets. A binary relation from A into B is any subset of the
Cartesian product A x B.

Example 1:
Let's assume that a person owns three shirts and two pairs of slacks.
More precisely, let A = {blue shirt, black shirt, mint green shirt} and B = {gray slacks,
tan slacks}.
Then certainly A x B is the set of all possible combinations (six) of shirts
and slacks that
the individual can wear. However, the individual may wish to restrict
himself to
combinations which are color coordinated, or "related". This may not be
all possible
pairs in A x B but will certainly be a subset of A x B. For example, one
such subset
may be { (blue shirt, gray slacks), (black shirt, tan slacks), (mint green
shirt, tan slacks) }.

Example 2:
Let A = {2, 3, 5, 6} and define a relation R from A into A by (a, b) ∈ R if
and only if
a divides evenly into b.
So, R = {(2, 2), (3, 3), (5, 5), (6, 6), (2, 6), (3, 6)}.
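Since this relation is defined by a condition on pairs, it can be computed mechanically; a quick Python check:

```python
A = {2, 3, 5, 6}
# (a, b) is in R if and only if a divides evenly into b
R = {(a, b) for a in A for b in A if b % a == 0}

assert R == {(2, 2), (3, 3), (5, 5), (6, 6), (2, 6), (3, 6)}
```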

A typical element in R is an ordered pair (x, y). In some cases R can be


described by
actually listing the pairs which are in R, as in the previous example. This
may not be
convenient if R is relatively large. Other notations are used depending on
past practice.
Consider the following relations on real numbers:

R = { (x, y) | y is the square of x } and S = { (x, y) | x <= y }.

R could be more naturally expressed as R(x) = x^2 , or R(x) = y where y = x^2 .

Relation on a Set
A relation from a set A into itself is called a relation on A.

The relation R of Example 2 above is a relation on A = {2, 3, 5, 6}, and R and S on the real numbers above are relations on the set of real numbers.


Let A be a set of people and let P = {(a, b) | a ∈ A ^ b ∈ A ^ a is a child of
b } . Then P is a relation on A which we might call a parent-child relation.
Composition
Let R be a relation from a set A into set B, and S be a relation from set B
into set C. The
composition of R and S, written as RS, is the set of pairs of the form (a, c)
∈ A x C, where
(a, c) ∈ RS if and only if there exists b ∈ B such that (a, b) ∈ R and (b, c) ∈ S.
For example PP, where P is the parent-child relation given above, is the
composition of P with itself and it is a relation which we know as the grandparent-
grandchild relation.
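Representing relations as sets of pairs, composition is a short set comprehension; a sketch with hypothetical people:

```python
def compose_relations(R, S):
    """Composition RS: pairs (a, c) such that (a, b) in R and (b, c) in S
    for some b."""
    return {(a, c) for (a, b1) in R for (b2, c) in S if b1 == b2}

# parent-child relation P = {(a, b) | a is a child of b} on made-up names
P = {("Ann", "Carl"), ("Carl", "Eve")}

# PP is the grandchild-grandparent relation: Ann is a grandchild of Eve
assert compose_relations(P, P) == {("Ann", "Eve")}
```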

Properties of Relations
Assume R is a relation on set A; in other words, R ⊆ A x A. Let us write a
R b to denote (a, b) ∈ R .

1. Reflexive: R is reflexive if for every a ∈ A, a R a.

2. Symmetric: R is symmetric if for every a and b in A, if aRb, then bRa.

3. Transitive: R is transitive if for every a, b and c in A, if aRb and bRc,
then aRc.

4. Equivalence: R is an equivalence relation on A if R is reflexive,
symmetric and transitive.
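For finite relations each of these properties is directly checkable by exhaustive search; a sketch (the same-parity relation used below is our example, not from the text):

```python
def is_reflexive(R, A):
    """R is reflexive on A if a R a for every a in A."""
    return all((a, a) in R for a in A)

def is_symmetric(R):
    """R is symmetric if a R b implies b R a."""
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    """R is transitive if a R b and b R c imply a R c."""
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

def is_equivalence(R, A):
    return is_reflexive(R, A) and is_symmetric(R) and is_transitive(R)

A = {1, 2, 3, 4}
same_parity = {(a, b) for a in A for b in A if (a - b) % 2 == 0}
assert is_equivalence(same_parity, A)

less_than = {(a, b) for a in A for b in A if a < b}
assert is_transitive(less_than) and not is_reflexive(less_than, A)
```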

Functions

Subjects to be Reviewed

• function
• domain, codomain
• image
• image of set
• range
• sum of functions
• product of functions
• one-to-one function (injection)
• onto function (surjection)
• one-to-one onto function (bijection)
• inverse function
• composite function

Definition (function): A function, denoted by f, from a set A to a set B is a relation
from A to B that satisfies

1. for each element a in A, there is an element b in B such that <a, b> is in the
relation, and
2. if <a, b> and <a, c> are in the relation, then b = c .

The set A in the above definition is called the domain of the function and B its
codomain.
Thus, f is a function if it covers the domain (maps every element of the domain) and it is
single valued.
The relation given by f between a and b represented by the ordered pair <a, b> is
denoted as f(a) = b , and b is called the image of a under f .
The set of images of the elements of a set S under a function f is called the image of the
set S under f, and is denoted by f(S) , that is,
f(S) = { f(a) | a ∈ S }, where S is a subset of the domain A of f .
The image of the domain under f is called the range of f .

Example: Let f be the function from the set of natural numbers N to N that maps each
natural number x to x^2 . Then the domain and codomain of this f are N, the image of, say,
3, under this function is 9, and its range is the set of squares, i.e. { 0, 1, 4, 9, 16, .... } .

Definition (sum and product): Let f and g be functions from a set A to the set of real
numbers R.
Then the sum and the product of f and g are defined as follows:

For all x, ( f + g )(x) = f(x) + g(x) , and


for all x, ( f*g )(x) = f(x)*g(x) ,
where f(x)*g(x) is the product of two real numbers f(x) and g(x).

Example: Let f(x) = 3x + 1 and g(x) = x^2 . Then ( f + g )(x) = x^2 + 3x + 1 , and ( f*g )(x)
= 3x^3 + x^2 .
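The definitions translate directly into higher-order functions; a sketch (the helper names fsum and fprod are ours):

```python
def fsum(f, g):
    """(f + g)(x) = f(x) + g(x)"""
    return lambda x: f(x) + g(x)

def fprod(f, g):
    """(f * g)(x) = f(x) * g(x)"""
    return lambda x: f(x) * g(x)

f = lambda x: 3 * x + 1   # f(x) = 3x + 1
g = lambda x: x ** 2      # g(x) = x^2

assert fsum(f, g)(2) == 2 ** 2 + 3 * 2 + 1    # x^2 + 3x + 1 at x = 2
assert fprod(f, g)(2) == 3 * 2 ** 3 + 2 ** 2  # 3x^3 + x^2 at x = 2
```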

Definition (one-to-one): A function f is said to be one-to-one (injective) if and only if
whenever f(x) = f(y) , x = y .

Example: The function f(x) = x^2 from the set of natural numbers N to N is a one-to-one
function. Note that f(x) = x^2 is not one-to-one if it is from the set of integers (negative as
well as non-negative) to N , because for example f(1) = f(-1) = 1 .

Definition (onto): A function f from a set A to a set B is said to be onto (surjective) , if
and only if for every element y of B , there is an element x in A such that f(x) = y , that
is, f is onto if and only if f( A ) = B .

Example: The function f(x) = 2x from the set of natural numbers N to the set of non-
negative even numbers E is an onto function. However, f(x) = 2x from the set of natural
numbers N to N is not onto, because, for example, nothing in N can be mapped to 3 by
this function.

Definition (bijection): A function is called a bijection , if it is onto and one-to-one.

Example: The function f(x) = 2x from the set of natural numbers N to the set of non-
negative even numbers E is one-to-one and onto. Thus it is a bijection.

Every bijection has a function called the inverse function.


These concepts are illustrated in the figure below. In each figure below, the points on the
left are in the domain and the ones on the right are in the codomain, and arrows show <
x, f(x) > relation.

Definition (inverse): Let f be a bijection from a set A to a set B. Then the function g is
called the inverse function of f, and it is denoted by f^-1 , if for every element y of B,
g(y) = x , where f(x) = y . Note that such an x is unique for each y because f is a bijection.

For example, the rightmost function in the above figure is a bijection and its inverse is
obtained by reversing the direction of each arrow.

Example: The inverse function of f(x) = 2x from the set of natural numbers N to the set
of non-negative even numbers E is f^-1(x) = x/2 from E to N . It is also a bijection.

A function is a relation. Therefore one can also talk about composition of functions.

Definition (composite function): Let g be a function from a set A to a set B , and let f be
a function from B to a set C . Then the composition of functions f and g , denoted by fg ,
is the function from A to C that satisfies fg(x) = f( g(x) ) for all x in A .

Example: Let f(x) = x^2 , and g(x) = x + 1 . Then f( g(x) ) = ( x + 1 )^2 .
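Composition is also a one-liner as a higher-order function; a sketch:

```python
def compose(f, g):
    """Return the composite fg, where fg(x) = f(g(x))."""
    return lambda x: f(g(x))

f = lambda x: x ** 2
g = lambda x: x + 1

fg = compose(f, g)
assert fg(3) == (3 + 1) ** 2   # f(g(3)) = 16
assert compose(g, f)(3) == 10  # order matters: g(f(3)) = 3^2 + 1
```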

Introduction to Language
A language is, in this course, a set of strings of symbols. The programming languages we use are
languages in that sense. Others such as the languages of logic, the languages of mathematics,
natural languages etc. are all languages in that sense.
What we are going to study about languages in this course are four classes of languages
called (Chomsky) formal languages and their properties. The four classes are regular (or
type 3) languages, context-free (or type 2) languages, context-sensitive (or type 1)
languages and phrase structure (or type 0) languages. Type 3 is a subset of type 2, which
is a subset of type 1, and type 0 is the most general, including the other three as subsets.

These formal languages are characterized by grammars which are essentially a set of
rewrite rules for generating strings belonging to a language as we see later. Also there are
various kinds of computing devices called automata which process these types of
languages. Thus formal languages can also be characterized by the computing devices
which process them.

These formal languages and automata capture the essence of various computing devices
and computation in a very simple way. Also for some important classes of problems,
solving them can be seen as recognizing languages, i.e. checking whether or not a string is
in a language. Using automata and formal languages we can study the limitations of
computers and computation. It can be rigorously shown that some problems cannot be
solved by computers in any finite amount of time and that some others are practically
unsolvable because of the time it takes to solve them. In addition two of the formal
languages, regular and context-free languages, are quite useful for modeling systems used
in practice such as computer network communication protocols, lexical analyzers and
parsers for compilers for programming languages.

In the following chapters we first learn about languages. Then we study regular
languages, the simplest of the four formal languages, together with regular expressions
which are a method of representing regular languages.
Then we investigate various kinds of finite automata: deterministic finite automata
(DFA), nondeterministic finite automata (NFA) and nondeterministic finite automata
with Λ transitions (NFA-Λ). They are devices that recognize regular languages. NFA
and NFA-Λ are conceptually simpler and easier to use when modeling a system because
there are no restrictions on transitions for them unlike for DFA. On the other hand DFAs
are suited for writing a simulator program because there is no nondeterminism such as
going to two or more states from a state upon reading one input symbol. We are going to
see an algorithm for converting an NFA-Λ to an NFA which recognizes the same language
and another for NFA to DFA conversion.
As we are going to learn next, in general there is more than one NFA or DFA that
recognizes a given language. However, if the number of states of a DFA is minimized, then the
resulting DFA is unique up to the state names for a given regular language.
Then, after seeing yet another way of representing regular languages, regular grammars,
we are going to learn modeling of systems with finite automata.
Our last topic on regular languages is testing of languages for non-regularity.

Definitions on Language

Subjects to be Learned
• alphabet
• string (word)
• language
• operations on languages: concatenation of strings, union, intersection, Kleene star

Contents

Here we are going to learn the concept of language in very abstract and general sense,
operations on languages and some of their properties.

1. Basic concepts

First, an alphabet is a finite set of symbols. For example {0, 1} is an alphabet with two
symbols, {a, b} is another alphabet with two symbols and English alphabet is also an
alphabet. A string (also called a word) is a finite sequence of symbols of an alphabet. b,
a and aabab are examples of strings over alphabet {a, b} and 0, 10 and 001 are examples
of strings over alphabet {0, 1}. A language is a set of strings over an alphabet. Thus {a,
ab, baa} is a language (over alphabet {a, b}) and {0, 111} is a language (over alphabet
{0, 1}). The number of symbols in a string is called the length of the string. For a string
w its length is represented by |w|. It can be defined more formally by a recursive definition.
The empty string (also called the null string) is the string with length 0. That is, it has no
symbols. The empty string is denoted by Λ (capital lambda). Thus | Λ | = 0.
Let u and v be strings. Then uv denotes the string obtained by concatenating u with v,
that is, uv is the string obtained by appending the sequence of symbols of v to that of u.

For example if u = aab and v = bbab, then uv = aabbbab. Note that vu = bbabaab ≠ uv.
We are going to use the first few symbols of the English alphabet such as a and b to denote
symbols of an alphabet and those toward the end such as u and v for strings.

A string x is called a substring of another string y if there are strings u and v such that y =
uxv. Note that u and v may be an empty string. So a string is a substring of itself. A string
x is a prefix of another string y if there is a string v such that y = xv. v is called a suffix of
y.
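These string operations correspond to familiar Python operations; a sketch (the helper names are ours):

```python
u, v = "aab", "bbab"
assert u + v == "aabbbab"   # concatenation uv
assert v + u == "bbabaab"   # vu differs from uv
assert u + v != v + u

def is_substring(x, y):
    """x is a substring of y if y = uxv for some strings u, v."""
    return x in y

def is_prefix(x, y):
    """x is a prefix of y if y = xv for some string v."""
    return y.startswith(x)

def is_suffix(v, y):
    """v is a suffix of y if y = xv for some string x."""
    return y.endswith(v)

assert is_substring("aabab", "aabab")   # every string is a substring of itself
assert is_prefix("aa", "aabab") and is_suffix("ab", "aabab")
assert is_prefix("", "aabab")           # the empty string is a prefix of any string
```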
2. Some special languages

The empty set ∅ is a language which has no strings. The set { Λ } is a language which has
one string, namely Λ . Though Λ has no symbols, this set has an object in it. So it is not
empty. For any alphabet Σ , the set of all strings over Σ (including the empty string) is
denoted by Σ* . Thus a language over alphabet Σ is a subset of Σ* .

3. Operations on languages

Since languages are sets, all the set operations can be applied to languages. Thus the
union, intersection and difference of two languages over an alphabet Σ are languages over
Σ . The complement of a language L over an alphabet Σ is Σ* - L and it is also a
language.

Another operation on languages is concatenation. Let L1 and L2 be languages. Then the
concatenation of L1 with L2 is denoted as L1L2 and it is defined as L1L2 = { uv | u ∈ L1
and v ∈ L2 }. That is, L1L2 is the set of strings obtained by concatenating strings of L1
with those of L2. For example {ab, b} {aaa, abb, aaba} = {abaaa, ababb, abaaba, baaa,
babb, baaba}.

Powers : For a symbol a and a natural number k, a^k represents the concatenation of
k a's. For a string u and a natural number k, u^k denotes the concatenation of k u's.
Similarly for a language L, L^k means the concatenation of k L's. Hence L^k is the set of
strings that can be obtained by concatenating k strings of L. These powers can be
formally defined recursively. For example L^k can be defined recursively as follows.

Recursive definition of L^k:

Basis Clause: L^0 = { Λ }
Inductive Clause: L^(k+1) = L^k L.
Since L^k is defined for natural numbers k, the extremal clause is not necessary.

a^k and u^k can be defined similarly. Here a^0 = Λ and u^0 = Λ .
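The recursive definition of L^k turns directly into a small Python sketch, using sets of strings for languages (the names concat and power are ours):

```python
def concat(L1, L2):
    """Concatenation of languages: { uv | u in L1 and v in L2 }."""
    return {u + v for u in L1 for v in L2}

def power(L, k):
    """L^0 = {""} (the empty string plays the role of Lambda);
    L^(k+1) = L^k L."""
    result = {""}
    for _ in range(k):
        result = concat(result, L)
    return result

L = {"ab", "b"}
assert power(L, 0) == {""}
assert power(L, 1) == L
assert power(L, 2) == {"abab", "abb", "bab", "bb"}
```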

The following two types of languages are generalizations of L^k and we are going to see
them quite often in this course.

Recursive definition of L*:

Basis Clause: Λ ∈ L*

Inductive Clause: For any x ∈ L* and any w ∈ L, xw ∈ L*.

Extremal Clause: Nothing is in L* unless it is obtained from the above two clauses.

L* is the set of strings obtained by concatenating zero or more strings of L as we are
going to see in Theorem 1. This * is called the Kleene star.
For example if L = { aba, bb }, then L* = { Λ , aba, bb, ababb, abaaba, bbbb, bbaba, ... }

The * in Σ* is also the same Kleene star defined above.

Recursive definition of L+:

Basis Clause: L ⊆ L+
Inductive Clause: For any x ∈ L+ and any w ∈ L, xw ∈ L+.
Extremal Clause: Nothing is in L+ unless it is obtained from the above two clauses.

Thus L+ is the set of strings obtained by concatenating one or more strings of L.

For example if L = { aba, bb }, then L+ = { aba, bb, ababb, abaaba, bbbb, bbaba, ... }
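Since L* and L+ are infinite whenever L contains a nonempty string, a program can only enumerate them up to a length bound. Here is one hedged sketch (names star_up_to and plus_up_to are ours; plus_up_to uses L+ = L L*, i.e. Theorem 4 below):

```python
def concat(L1, L2):
    return {u + v for u in L1 for v in L2}

def star_up_to(L, max_len):
    """All strings of L* whose length is at most max_len."""
    result = {""}          # the empty string (Lambda) is always in L*
    frontier = {""}
    while frontier:
        frontier = {x for x in concat(frontier, L)
                    if len(x) <= max_len and x not in result}
        result |= frontier
    return result

def plus_up_to(L, max_len):
    """All strings of L+ with length at most max_len, via L+ = L L*."""
    return {x for x in concat(set(L), star_up_to(L, max_len))
            if len(x) <= max_len}

L = {"aba", "bb"}
assert "" in star_up_to(L, 5) and "" not in plus_up_to(L, 5)
assert {"aba", "bb", "ababb", "bbaba", "bbbb"} <= star_up_to(L, 5)
```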

Let us also define ∪(k=0 to ∞) L^k ( i.e. L^0 ∪ L^1 ∪ L^2 ∪ ... ) as ∪(k=0 to ∞) L^k = { x | x ∈ L^k for some
natural number k } .

Then the following relationships hold on L* and L+. Theorems 1 and 2 are proven in
"General Induction" which you study in the next unit. Other proofs are omitted.

Theorem 1: L^n ⊆ L* for any natural number n.

Theorem 2: L* = ∪(k=0 to ∞) L^k

Theorem 3: L+ = ∪(k=1 to ∞) L^k

Theorem 4: L+ = L L* = L*L

Note: According to Theorems 2 and 3, any nonempty string in L* or L+ can be expressed
as the concatenation of strings of L, i.e. w1w2...wk for some k, where the wi's are strings of L.

L* and L+ have a number of interesting properties. Let us list one of them as a theorem
and prove it.

Theorem 5: L* = (L*)*.

Proof: Since L ⊆ L* holds for any language L (by Theorems 1 and 2), by applying it to the language L*
we can see that

L* ⊆ (L*)*.

Conversely ( L* )* ⊆ L* can be proven as follows:

Let x be an arbitrary nonempty string of ( L* )*. Then there are nonempty strings w1,
w2, ..., wk in L* such that x = w1w2...wk . Since w1, w2, ..., wk are strings of L*, for each wi
there are strings wi1, wi2, ..., wimi in L such that wi = wi1wi2...wimi .
Hence x = w11...w1m1 w21...w2m2 ... wk1...wkmk . Hence x is in L* .
If x is an empty string, then it is obviously in L* .

Hence ( L* )* ⊆ L* .

Since L* ⊆ (L*)* and ( L* )* ⊆ L* , L* = ( L* )* .

Problem Solving as Language Recognition

Subjects to be Learned

• problem instance
• problem as language
• problem solving as language recognition

Contents

In the previous section the concept of language was introduced and its properties have
been briefly studied. You might be wondering why we study language. The main reason
for studying language is that solving problems can be viewed as a language recognition
problem as explained below, that is, the problem of checking whether or not a string
belongs to a language. Thus instead of studying what kind of problems can be solved by
what kind of computational devices and how, we can study languages and devices to
recognize them, which are simpler to deal with, uncluttered by variations in actual
devices, programming languages etc.

Below an example is given to illustrate how solving a problem can be viewed as


recognizing a language.

Consider the following problem:


Is the longest of the distances between two nodes (i.e. the diameter) of a given graph less
than a given integer k ? Here the distance is the smallest number of edges (or hops)
between the nodes.

Some of the instances of this problem are as shown below:


Instance 1 asks whether or not the diameter of the given graph with one edge and two
nodes is less than 1. Instance 2 asks whether or not the diameter of the given graph with
four edges and four nodes is less than 2. Similarly for Instance 3.

These problem instances can be represented by a string as follows:

Instance 1: 1,2;(1,2);1
Instance 2: 1,2,3,4;(1,2)(1,3)(1,4)(3,4);2
Instance 3: 1,2,3,4;(1,2)(1,3)(1,4)(2,3)(2,4)(3,4);3

Here the set of nodes, the set of edges and k are separated by ; in that order in the strings.

The solutions to these instances are:

Instance 1: No
Instance 2: No
Instance 3: Yes
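To make the correspondence concrete, here is a sketch of a recognizer for the 'Yes' instances using the string encoding above (our own implementation, assuming the graphs are connected as in the instances shown; breadth-first search computes distances):

```python
from collections import deque

def parse_instance(s):
    """Parse a string of the form 'nodes;edges;k', e.g. '1,2;(1,2);1'."""
    nodes_part, edges_part, k_part = s.split(";")
    nodes = nodes_part.split(",")
    edges = [tuple(e.split(",")) for e in edges_part.strip("()").split(")(")]
    return nodes, edges, int(k_part)

def is_yes_instance(s):
    """Decide membership of s in the language of 'Yes' instances."""
    nodes, edges, k = parse_instance(s)
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    diameter = 0
    for start in nodes:               # BFS from every node
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        diameter = max(diameter, max(dist.values()))
    return diameter < k

assert not is_yes_instance("1,2;(1,2);1")                           # Instance 1: No
assert not is_yes_instance("1,2,3,4;(1,2)(1,3)(1,4)(3,4);2")        # Instance 2: No
assert is_yes_instance("1,2,3,4;(1,2)(1,3)(1,4)(2,3)(2,4)(3,4);3")  # Instance 3: Yes
```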

There are infinitely many 'Yes' instances and 'No' instances for this problem. The set of
'Yes' instances is a language and so is the set of 'No' instances as well as the set of all
instances and many others for this problem.
We can thus see that solving the problem for a given instance is equivalent to checking
whether or not the string representing the given instance belongs to the language of
'Yes' instances of the problem. That is, problem solving is the same as language recognition. A
problem can be solved if and only if the language of its 'Yes' instances is recognizable or
decidable by a Turing machine. It is not solvable if the language is merely acceptable
but not recognizable, or even worse if it is not even acceptable.

General Induction (a.k.a. Structural Induction)

Mathematical statements involving an element of a recursively defined set can be proven


by induction.
To prove by induction that a statement P(x) is true for all the elements x of a recursively
defined set S, proceed as follows:

Basis Step: Prove that P(x) is true for all the elements x in the basis of S.

Induction: Prove that for any element(s) x of S if P(x) is true, then P(y) is true for any
element y obtained from x by the induction step of the recursive definition of S.

Note 1 : In the Induction we try to prove that if a parent has the property then all of its
children also have that property. In the process we need the relationship between the
parent and the children. That relationship is found in the Inductive Clause of the recursive
definition of the set in question.
Note 2 : As a first step for general induction proof, it is often a good idea to express y in
terms of x so that P(x) can be used.

Example 1 (Theorem 1 in "Language") : Prove that L^n ⊆ L* for any natural number n
and any language L.
Let us first review the definitions.
Recursive definition of L^k:
Basis Clause: L^0 = { Λ }
Inductive Clause: L^(k+1) = L^k L.
Since L^k is defined for natural numbers k, the extremal clause is not necessary.

Recursive definition of L*:

Basis Clause: Λ ∈ L*

Inductive Clause: For any string x ∈ L* and any string w ∈ L, xw ∈ L*.

Extremal Clause: Nothing is in L* unless it is obtained from the above two clauses.

Now let us prove that L^n ⊆ L* by induction on L^n.

Note in the proof below that the Basis and Inductive Steps mirror the Basis and Inductive
Clauses of the definition of L^n .

Basis Step: Since by the definitions L^0 = { Λ }, and Λ ∈ L* , L^0 ⊆ L* .

Inductive Step: Assume that L^k ⊆ L* for an arbitrary natural number k. --- Induction
Hypothesis

We are going to show that L^(k+1) ⊆ L* .

Let w be an arbitrary string in L^(k+1) . Then there exist strings x and y that satisfy x ∈ L^k ,
y ∈ L and w = xy by the definition of L^(k+1). Since L^k ⊆ L* by the Induction Hypothesis,
x ∈ L* .

Then by the definition of L*, xy ∈ L* since y ∈ L.

Hence w ∈ L* .

Thus L^(k+1) ⊆ L* .

Example 2 (Theorem 2 in "Language") Let us prove L* = ∪(k=0 to ∞) L^k .

Note that ∪(k=0 to ∞) L^k = { x | x ∈ L^k for some natural number k } .

Proof: Let us first prove ∪(k=0 to ∞) L^k ⊆ L* .

Suppose that x ∈ ∪(k=0 to ∞) L^k . Then by the definition of ∪(k=0 to ∞) L^k , x ∈ L^k for some natural
number k. By Example 1 above, L^k ⊆ L* . Hence x ∈ L* . Hence ∪(k=0 to ∞) L^k ⊆ L* .

Next let us prove L* ⊆ ∪(k=0 to ∞) L^k .

Note that L* is defined recursively and that below we are trying to prove that the
elements of L* have the property that they also belong to ∪(k=0 to ∞) L^k . So we first prove that
the element of the basis of L* has the property. Then we show that if any element, say x,
of L* has the property, then its children xy, where y is an arbitrary element of L, also
have the property.

Basis Step: Λ ∈ L^0 since L^0 = { Λ }.

Hence by the definition of ∪(k=0 to ∞) L^k , Λ ∈ ∪(k=0 to ∞) L^k .

Inductive Step: Assume that for an arbitrary x in L*, x ∈ ∪(k=0 to ∞) L^k holds.

We are going to show that for an arbitrary element y ∈ L , xy ∈ ∪(k=0 to ∞) L^k holds.

Note here that x is a parent and by applying an operation (i.e. by concatenating y) a child
xy of x in L* is obtained.
So we show that the property for x is inherited by its children xy. Let us prove the
inheritance.

If x ∈ ∪(k=0 to ∞) L^k , then for some natural number k , x ∈ L^k .

Hence xy ∈ L^(k+1) by the definition of L^n . Hence xy ∈ ∪(k=0 to ∞) L^k .

End of Inductive Step and Proof

Hence we have proven L* = ∪(k=0 to ∞) L^k .

Example 3

Prove that for arbitrary strings x and y of Σ* , REV(xy) = REV(y) REV(x) holds.
The function REV(x) on strings x over the alphabet Σ is defined as follows. It
produces the reversal of a given string x (i.e. x spelled backward).

Basis Clause: REV( Λ ) = Λ .

Inductive Clause: For any string x ∈ Σ* and any symbol a ∈ Σ , REV(xa) =
aREV(x).

Note that each step of the proof below mirrors the recursive definition of Σ* .


Proof

First let us note that Σ* can be defined recursively as follows:

Basis Clause: Λ ∈ Σ* , where Λ is the empty string.
Inductive Clause: For arbitrary strings x of Σ* and a of Σ , xa is also in Σ* .
Extremal Clause: As usual. Omitted.

The proof of the equality in question is going to be proven for an arbitrary fixed x by
induction on y. Thus the statement to be proven is: for an arbitrary fixed string x, and an
arbitrary string y of Σ* , REV(xy) = REV(y) REV(x) holds. The proof mirrors the
recursive definition of Σ* .

Basis Step: REV(x Λ ) = REV( x ) = Λ REV( x ) = REV( Λ )REV( x ) .


Induction: Assume that for an arbitrary string y of Σ* , REV(xy) = REV(y) REV(x) holds.
-- Induction Hypothesis
Then for an arbitrary symbol a of Σ , REV(xya) = REV((xy)a) = a REV(xy).
But by the induction hypothesis a REV(xy) = a REV(y)REV(x).
Since a REV(y) = REV(ya), REV(xya) = REV(ya)REV(x), which is what we needed.
End of Proof.
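The recursive definition of REV translates line by line into Python, and the theorem can be spot-checked on small strings (the function name rev is ours):

```python
def rev(x):
    """Reversal following the recursive definition:
    REV(Lambda) = Lambda and REV(xa) = a REV(x)."""
    if x == "":
        return ""
    return x[-1] + rev(x[:-1])

assert rev("abc") == "cba"

# spot-check the theorem REV(xy) = REV(y) REV(x)
for x in ["", "a", "ab", "aba"]:
    for y in ["", "b", "bb", "bab"]:
        assert rev(x + y) == rev(y) + rev(x)
```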

Definitions of Regular Language and Regular Expression

Subjects to be Learned

• regular language
• regular expression

1. Regular language

The set of regular languages over an alphabet Σ is defined recursively as below. Any
language belonging to this set is a regular language over Σ .

Definition of Set of Regular Languages :

Basis Clause: ∅ , { Λ } and {a} for any symbol a ∈ Σ are regular languages.
Inductive Clause: If Lr and Ls are regular languages, then Lr ∪ Ls , LrLs and Lr* are
regular languages.

Extremal Clause: Nothing is a regular language unless it is obtained from the above two
clauses.
For example, let Σ = {a, b}. Then since {a} and {b} are regular languages, {a, b} ( = {a}
∪ {b} ) and {ab} ( = {a}{b} ) are regular languages. Also since {a} is regular, {a}* is a
regular language which is the set of strings consisting of a's such as Λ , a, aa, aaa, aaaa
etc. Note also that Σ* , which is the set of strings consisting of a's and b's, is a regular
language because {a, b} is regular.

2. Regular expression

Regular expressions are used to denote regular languages. They can represent regular
languages and operations on them succinctly.
The set of regular expressions over an alphabet Σ is defined recursively as below. Any
element of that set is a regular expression.

Basis Clause: ∅ , Λ and a are regular expressions corresponding to languages ∅ , { Λ }
and {a}, respectively, where a is an element of Σ .
Inductive Clause: If r and s are regular expressions corresponding to languages Lr and
Ls , then ( r + s ) , ( rs ) and ( r* ) are regular expressions corresponding to languages Lr
∪ Ls , LrLs and Lr* , respectively.

Extremal Clause: Nothing is a regular expression unless it is obtained from the above
two clauses.

Conventions on regular expressions

(1) When there is no danger of confusion, bold face may not be used for regular
expressions.
So for example, ( r + s ) is used instead of ( r + s ).
(2) The operation * has precedence over concatenation, which has precedence over union
( + ).
Thus the regular expression ( a + ( b( c* ) ) ) is written as a + bc*.
(3) The concatenation of k r's , where r is a regular expression, is written as r^k. Thus for
example rr = r^2 .
The language corresponding to r^k is Lr^k, where Lr is the language corresponding to the
regular expression r. Lr^k can be defined recursively just as L^k was above.
(4) We use ( r+ ) as a regular expression to represent Lr+ .
Examples of regular expression and regular languages corresponding to them

• ( a + b )^2 corresponds to the language {aa, ab, ba, bb}, that is, the set of strings of
length 2 over the alphabet {a, b}.
In general ( a + b )^k corresponds to the set of strings of length k over the alphabet
{a, b}. ( a + b )* corresponds to the set of all strings over the alphabet {a, b}.
• a*b* corresponds to the set of strings consisting of zero or more a's followed by
zero or more b's.
• a*b+a* corresponds to the set of strings consisting of zero or more a's followed by
one or more b's followed by zero or more a's.
• ( ab )+ corresponds to the language {ab, abab, ababab, ... }, that is, the set of
strings of one or more repeated ab's.
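These correspondences can be spot-checked with Python's re module. Note that re writes union as | rather than +, while + in re means "one or more", matching the ( r+ ) convention above:

```python
import re

def matches(pattern, s):
    """True if s is in the language of the regular expression pattern."""
    return re.fullmatch(pattern, s) is not None

assert matches("(a|b)*", "") and matches("(a|b)*", "abba")  # all strings over {a, b}
assert matches("a*b*", "aabbb") and not matches("a*b*", "ba")
assert matches("a*b+a*", "aabba") and not matches("a*b+a*", "aa")  # needs at least one b
assert matches("(ab)+", "ababab") and not matches("(ab)+", "")
```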

Note: A regular expression is not unique for a language. That is, a regular language, in
general, corresponds to more than one regular expression. For example ( a + b )* and
( a*b* )* correspond to the set of all strings over the alphabet {a, b}.

Definition of Equality of Regular Expressions

Regular expressions are equal if and only if they correspond to the same language.

Thus for example ( a + b )* = ( a*b* )* , because they both represent the language of all
strings over the alphabet {a, b}.

In general, it is not easy to see by inspection whether or not two regular expressions
are equal.
Exercise Questions on Regular Language and Regular
Expression

Ex. 1: Find the shortest string that is not in the language represented by the regular
expression a*(ab)*b*.

Solution: It can easily be seen that Λ , a and b, which are the strings in the language with length
1 or less, are in the language. Of the strings with length 2, aa, bb and ab are in the language. However, ba is
not in it. Thus the answer is ba.

Ex. 2: For the two regular expressions given below,


(a) find a string corresponding to r2 but not to r1 and
(b) find a string corresponding to both r1 and r2.

r1 = a* + b* r2 = ab* + ba* + b*a + (a*b)*

Solution: (a) Any string consisting of only a's or only b's and the empty string are in r1.
So we need to find strings of r2 which contain at least one a and at least one b. For
example ab and ba are such strings.
(b) A string corresponding to r1 consists of only a's or only b's, or is the empty string. The
only strings corresponding to r2 which consist of only a's or only b's are a, b and the strings
consisting of only b's (from (a*b)*). Hence, for example, a, b and bb correspond to both r1
and r2, as does the empty string Λ (from (a*b)* with zero repetitions).

Ex. 3: Let r1 and r2 be arbitrary regular expressions over some alphabet. Find a simple
(the shortest and with the smallest nesting of * and +) regular expression which is equal
to each of the following regular expressions.

(a) (r1 + r2 + r1r2 + r2r1)*


(b) (r1(r1 + r2)*)+

Solution: One general strategy to approach this type of question is to try to see whether
or not they are equal to simple regular expressions that are familiar to us such as a, a*, a+,
(a + b)*, (a + b)+ etc.
(a) Since (r1 + r2)* represents all strings consisting of strings of r1 and/or r2 , r1r2 + r2r1 in
the given regular expression is redundant, that is, they do not produce any strings that are
not represented by (r1 + r2)*. Thus (r1 + r2 + r1r2 + r2r1)* is reduced to (r1 + r2)*.
(b) (r1(r1 + r2)*)+ means that all the strings represented by it must consist of one or more
strings of (r1(r1 + r2)*). However, the strings of (r1(r1 + r2)*) start with a string of r1
followed by any number of strings taken arbitrarily from r1 and/or r2. Thus anything that
comes after the first r1 in (r1(r1 + r2)*)+ is represented by (r1 + r2)*. Hence (r1(r1 + r2)*)
also represents the strings of (r1(r1 + r2)*)+, and conversely (r1(r1 + r2)*)+ represents the
strings represented by (r1(r1 + r2)*). Hence (r1(r1 + r2)*)+ is reduced to (r1(r1 + r2)*).

Ex. 4: Find a regular expression corresponding to the language L over the alphabet { a , b
} defined recursively as follows:

Basis Clause: Λ ∈ L

Inductive Clause: If x ∈ L , then aabx ∈ L and xbb ∈ L .

Extremal Clause: Nothing is in L unless it can be obtained from the above two clauses.

Solution: Let us see what kind of strings are in L. First of all Λ ∈ L . Then starting with
Λ , strings of L are generated one by one by prepending aab or appending bb to any of
the already generated strings. Hence a string of L consists of zero or more aab's in front
and zero or more bb's following them. Thus (aab)*(bb)* is a regular expression for L.

Ex. 5: Find a regular expression corresponding to the language L defined recursively as


follows:

Basis Clause: Λ ∈ L and a ∈ L.

Inductive Clause: If x ∈ L , then aabx ∈ L and bbx ∈ L .

Extremal Clause: Nothing is in L unless it can be obtained from the above two clauses.

Solution: Let us see what kind of strings are in L. First of all Λ and a are in L . Then
starting with Λ or a, strings of L are generated one by one by prepending aab or bb to any
of the already generated strings. Hence a string of L has zero or more of aab's and bb's in
front possibly followed by a at the end. Thus (aab + bb)*(a + Λ ) is a regular expression
for L.

Ex. 6: Find a regular expression corresponding to the language of all strings over the
alphabet { a, b } that contain exactly two a's.

Solution: A string in this language must have exactly two a's. Since any string of b's can
be placed in front of the first a, behind the second a and between the two a's, and since an
arbitrary string of b's can be represented by the regular expression b*, b*a b*a b* is a
regular expression for this language.

Ex. 7: Find a regular expression corresponding to the language of all strings over the
alphabet { a, b } that do not end with ab.

Solution: Any nonempty string over { a , b } must end in a or b. Hence if a string does
not end with ab, then it ends with a, or it ends with b and the last b is preceded by
the symbol b. Since it can have any string in front of the last a or bb, ( a + b )*( a + bb ) is a
regular expression for such strings. Note that the strings Λ and b do not end with ab either;
if they are to be included, the expression becomes ( a + b )*( a + bb ) + b + Λ .

Ex. 8: Find a regular expression corresponding to the language of all strings over the
alphabet { a, b } that contain no more than one occurence of the string aa.

Solution: If there is one substring aa in a string of the language, then that aa can be
followed by any number of b. If an a comes after that aa, then that a must be preceded by
b because otherwise there are two occurences of aa. Hence any string that follows aa is
represented by ( b + ba )*. On the other hand if an a precedes the aa, then it must be
followed by b. Hence a string preceding the aa can be represented by ( b + ab )*. Hence if
a string of the language contains aa then it corresponds to the regular expression ( b +
ab )*aa( b + ba )* .
If there is no aa but at least one a exists in a string of the language, then applying the
same argument as for aa to a, ( b + ab )*a( b + ba )* is obtained as a regular expression
corresponding to such strings.
If there need not be any a in a string of the language, then applying the same argument as
for aa to Λ, ( b + ab )*( b + ba )* is obtained as a regular expression corresponding to
such strings.
Altogether ( b + ab )*( Λ + a + aa )( b + ba )* is a regular expression for the language.

Ex. 9: Find a regular expression corresponding to the language of strings of even lengths
over the alphabet of { a, b }.

Solution: Since any string of even length can be expressed as the concatenation of strings
of length 2 and since the strings of length 2 are aa, ab, ba, bb, a regular expression
corresponding to the language is ( aa + ab + ba + bb )*. Note that 0 is an even number.
Hence the empty string Λ is in this language.

Ex. 10: Describe as simply as possible in English the language corresponding to the
regular expression a*b(a*ba*b)*a* .

Solution: A string in the language can start and end with a or b, it has at least one b, and
after the first b all the b's in the string appear in pairs. Any number of a's can appear
anywhere in the string. Thus, simply put, it is the set of strings over the alphabet { a, b }
that contain an odd number of b's.

Ex. 11: Describe as simply as possible in English the language corresponding to the
regular expression (( a + b )^3)*( Λ + a + b ).

Solution: ( a + b )^3 represents the strings of length 3. Hence (( a + b )^3)* represents the
strings whose length is a multiple of 3. Since (( a + b )^3)*( a + b ) represents the strings of
length 3n + 1, where n is a natural number, the given regular expression represents the
strings of length 3n and 3n + 1, where n is a natural number.

Ex. 12: Describe as simply as possible in English the language corresponding to the
regular expression ( b + ab )*( a + ab )*.

Solution: ( b + ab )* represents the strings which do not contain the substring aa and
which end in b, and ( a + ab )* represents the strings which do not contain the substring
bb. Hence altogether the expression represents any string consisting of a substring with no
aa, followed by one b, followed by a substring with no bb.

Properties of Regular Languages

Subjects to be Learned

• Closure of the set of regular languages under union, concatenation and Kleene
star operations.
• Regularity of finite languages

Theorem 1: The set of regular languages over an alphabet Σ is closed under the
operations union, concatenation and Kleene star.
Proof: Let Lr and Ls be regular languages over an alphabet Σ. Then by the definition of
the set of regular languages, Lr ∪ Ls , LrLs and Lr* are regular languages, and they are
obviously over the alphabet Σ. Thus the set of regular languages is closed under those
operations.

Note 1: Later we shall see that the complement of a regular language and the intersection
of regular languages are also regular.
Note 2: The union of infinitely many regular languages is not necessarily regular. For
example, while { a^k b^k } is regular for any natural number k , { a^n b^n | n is a natural
number } , which is the union of all the languages { a^k b^k } , is not regular as we shall
see later.

The following theorem shows that any finite language is regular. We say a language is
finite if it consists of a finite number of strings, that is, a finite language is a set of n
strings for some natural number n.

Theorem 2: A finite language is regular.

Proof: Let us first assume that a language consisting of a single string is regular and
prove the theorem by induction. We then prove that a language consisting of a single
string is regular.

Claim 1: A language consisting of n strings is regular for any natural number n (that is, a
finite language is regular) if { w } is regular for any string w.

Proof of the Claim 1: Proof by induction on the number of strings.

Basis Step: The empty set ∅ (corresponding to n = 0) is a regular language by the Basis
Clause of the definition of regular language.
Inductive Step: Assume that a language L consisting of n strings is a regular language
(induction hypothesis). Then since { w } is a regular language as proven below, L ∪
{ w } is a regular language by the definition of regular language.

End of proof of Claim 1

Thus if we can show that { w } is a regular language for any string w, then we have
proven the theorem.

Claim 2: Let w be a string over an alphabet Σ. Then { w } is a regular language.

Proof of Claim 2: Proof by induction on strings.

Basis Step: By the Basis Clause of the definition of regular language, { Λ } and { a } are
regular languages for any arbitrary symbol a of Σ.

Inductive Step: Assume that { w } is a regular language for an arbitrary string w over
Σ. Then for any symbol a of Σ, { a } is a regular language from the Basis Step. Hence by
the Inductive Clause of the definition of regular language { a }{ w } is regular. Hence
{ aw } is regular.

End of proof for Claim 2

Note that Claim 2 can also be proven by induction on the length of string.

End of proof of Theorem 2.


Introduction to Finite Automata

In this chapter we are going to study a class of machines called finite automata. Finite
automata are computing devices that accept/recognize regular languages and are used to
model operations of many systems we find in practice. Their operations can be simulated
by a very simple computer program. The kinds of systems finite automata can model and a
computer program to simulate their operations are discussed later.

Let us consider the operation of a soft drink vending machine which charges 15 cents for
a can. Pretend that you are the machine. Initially you are waiting for a customer to come
and put some coins, that is, you are in the waiting-for-customer state. Let us assume that
only nickels and dimes are used for simplicity. When a customer comes and puts in the
first coin, say a dime, you are no longer in the waiting-for-customer state. You have
received 10 cents and are waiting for more coins to come. So we might say you are in the
10-cents state. If the customer puts in a nickel, then you have now received 15 cents and
you wait for the customer to select a soft drink. So you are in another state, say 15-cents
state. When the customer selects a soft drink, you must give the customer a can of soft
drink. After that you stay in that state until another coin is put in to start the process
anew or you may terminate the operation and start all over from the initial state. The
states and the transitions between them of this vending machine can be represented with
the diagram below. In the figure, circles represent states and arrows state transitions. Ds
on arrows represent a dime and Ns a nickel. It is assumed that the machine terminates its
operation when it receives 15 cents or more.

In this example you as a vending machine have gone through (transitions between) a
number of states responding to the inputs from the customer (coins in this case). A
vending machine looked at this way is an example of a finite automaton.

We have learned that regular languages are represented by regular expressions and
conversely. In the next few chapters first we are going to learn different kinds of finite
automata, and equivalence and conversions between them. Then we are going to see that
for every regular language a finite automaton can be constructed which can
recognize the language (i.e. tell whether or not a given string belongs to the regular
language). We are then going to study how finite automata can be used to simulate
operations of systems we see in practice.
Unfortunately not all languages and systems are as simple as regular languages or finite
automata. There are languages which are not regular and which, therefore, cannot be
recognized by finite automata. We are going to learn languages which are not regular
and ways to test languages for non-regularity.
Later we are going to learn an extension of finite automata called Turing machines.
Though Turing machines are a simple modification of finite automata, they are much more
powerful computing devices than finite automata. In fact Turing machines are as
powerful as computers and it is generally believed, though not proven, that any
computation human beings do (with or without computers) can be performed by Turing
machines.

Definition of Deterministic Finite Automata

Subjects to be Learned

• Finite automata
• State transition diagram
• State transition table

Definition of deterministic finite automaton

Let Q be a finite set and let Σ be a finite set of symbols. Also let δ be a function from Q
× Σ to Q , let q0 be a state in Q and let A be a subset of Q. We call the elements of Q
states, δ the transition function, q0 the initial state and A the set of accepting states.
Then a deterministic finite automaton is a 5-tuple < Q , Σ , q0 , δ , A > .

Notes on the definition

1. The set Q in the above definition is simply a set with a finite number of elements.
Its elements can, however, be interpreted as a state that the system (automaton) is
in. Thus in the example of vending machine, for example, the states of the
machine such as "waiting for a customer to put a coin in", "have received 5 cents"
etc. are the elements of Q. "Waiting for a customer to put a coin in" can be
considered the initial state of this automaton and the state in which the machine
gives out a soda can can be considered the accepting state.
2. The transition function δ is also called a next state function, meaning that the
automaton moves into the state δ(q, a) if it receives the input symbol a while in
state q.
Thus in the example of the vending machine, if q is the initial state and a nickel is put
in, then δ(q, a) is equal to "have received 5 cents".
3. Note that δ is a function. Thus for each state q of Q and for each symbol a of Σ ,
δ(q, a) must be specified.
4. The accepting states are used to distinguish sequences of inputs given to the finite
automaton. If the finite automaton is in an accepting state when the input ceases
to come, the sequence of input symbols given to the finite automaton is
"accepted". Otherwise it is not accepted. For example, in the Example 1 below,
the string a is accepted by the finite automaton. But any other strings such as aa,
aaa, etc. are not accepted.
5. A deterministic finite automaton is also called simply a "finite automaton".
Abbreviations such as FA and DFA are used to denote deterministic finite
automaton.

DFAs are often represented by digraphs called (state) transition diagrams. The vertices
(denoted by single circles) of a transition diagram represent the states of the DFA and the
arcs labeled with an input symbol correspond to the transitions. An arc ( p , q ) from
vertex p to vertex q with label σ represents the transition δ(p, σ) = q . The accepting
states are indicated by double circles.
Transition functions can also be represented by tables as seen below. They are called
transition tables.

Examples of finite automaton

Example 1: Q = { 0, 1, 2 }, Σ = { a }, A = { 1 }, the initial state is 0 and δ is as shown in
the following table.

State (q) Input (a) Next State ( δ(q, a) )

0 a 1
1 a 2
2 a 2

A state transition diagram for this DFA is given below.

If the alphabet of Example 1 is changed to { a, b } instead of { a }, then we need a
DFA such as shown in the following example to accept the same string a. It is a little more
complex DFA.
Example 2: Q = { 0, 1, 2 }, Σ = { a, b }, A = { 1 }, the initial state is 0 and δ is as shown
in the following table.

State (q) Input (a) Next State ( δ(q, a) )

0 a 1
0 b 2
1 a 2
1 b 2
2 a 2
2 b 2

Note that for each state there are two rows in the table, corresponding to the symbols
a and b, while in Example 1 there is only one row for each state.

A state transition diagram for this DFA is given below.

A DFA that accepts all strings consisting of only symbol a over the alphabet { a, b } is
the next example.

Example 3: Q = { 0, 1 }, Σ = { a, b }, A = { 0 }, the initial state is 0 and δ is as shown in
the following table.

State (q) Input (a) Next State ( δ(q, a) )

0 a 0
0 b 1
1 a 1
1 b 1
A state transition diagram for this DFA is given below.

Example 4: For the example of the vending machine of the previous section, Q = { 0, 5, 10,
15, 20 }, Σ = { D, N }, A = { 15, 20 }, and the initial state q0 = 0. If we make it a DFA, its
transition function δ is as shown in the following table.

State (q) Input (a) Next State ( δ(q, a) )

0 N 5
0 D 10
5 N 10
5 D 15
10 N 15
10 D 20
15 N 5
15 D 10
20 N 5
20 D 10
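The vending machine table above can be run directly as a small simulation. The following is a minimal sketch in Python (the variable and function names are ours, not part of the formal definition):

```python
# Transition function of Example 4 as a Python dictionary:
# (current state, input coin) -> next state.
delta = {
    (0, 'N'): 5,   (0, 'D'): 10,
    (5, 'N'): 10,  (5, 'D'): 15,
    (10, 'N'): 15, (10, 'D'): 20,
    (15, 'N'): 5,  (15, 'D'): 10,
    (20, 'N'): 5,  (20, 'D'): 10,
}
accepting = {15, 20}  # A = { 15, 20 }

def run(coins, start=0):
    """Feed a string of coins (N = nickel, D = dime) to the DFA, return the final state."""
    state = start
    for c in coins:
        state = delta[(state, c)]
    return state

print(run("DN"))                # 15: a dime then a nickel reaches an accepting state
print(run("NNN") in accepting)  # True
```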
A finite automaton as a machine

A finite automaton can also be thought of as the device shown below consisting of a tape
and a control circuit which satisfy the following conditions:

1. The tape has a left end and extends to the right without an end.
2. The tape is divided into squares, in each of which a symbol can be written prior to
the start of the operation of the automaton.
3. The tape has a read-only head.
4. The head is always at the leftmost square at the beginning of the operation.
5. The head moves to the right one square every time it reads a symbol.
It never moves to the left. When it sees no symbol, it stops and the automaton
terminates its operation.
6. There is a finite control which determines the state of the automaton and also
controls the movement of the head.
Operation of finite automata

Let us see how an automaton operates when it is given some inputs. As an example let us
consider the DFA of Example 3 above.
Initially it is in state 0. When zero or more a's are given as an input to it, it stays in state 0
while it reads all the a's (without breaks) on the tape. Since the state 0 is also the
accepting state, when all the a's on the tape are read, the DFA is in the accepting state.
Thus this automaton accepts any string of a's. If b is read while it is in state 0 (initially or
after reading some a's), it moves to state 1. Once it gets to state 1, then no matter what
symbol is read, this DFA never leaves state 1. Hence when b appears anywhere in the
input, it goes into state 1 and the input string is not accepted by the DFA. For example
strings aaa, aaaaaa etc. are accepted but strings such as aaba, b etc. are not accepted by
this automaton.
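The acceptance test just described takes only a few lines of Python. This sketch hard-codes the Example 3 table (the function name accepts is ours):

```python
# DFA of Example 3: accepts exactly the strings consisting only of a's.
delta = {(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 1, (1, 'b'): 1}
accepting = {0}

def accepts(w):
    state = 0                      # start at the initial state
    for sym in w:                  # read the tape one symbol at a time
        state = delta[(state, sym)]
    return state in accepting      # accept iff we end in an accepting state

print(accepts("aaa"))   # True
print(accepts("aaba"))  # False
```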
δ* of DFA and its Properties

Subjects to be Learned

• δ* of DFA and its properties
• Language accepted by DFA

Contents

Here we are going to formally describe what is meant by applying a transition repeatedly,
that is, the concept of δ*.

For a state q and string w, δ*( q , w ) is the state the DFA goes into when it reads the
string w starting at the state q. In general a DFA goes through a number of states from the
state q responding to the symbols in the string w. Thus for a DFA < Q , Σ , q0 , δ , A > ,
the function

δ* : Q × Σ* -> Q

is defined recursively as follows:

Definition of δ* :

Basis Clause: For any state q of Q , δ*( q , Λ ) = q , where Λ denotes the empty string.
Inductive Clause: For any state q of Q, any string y ∈ Σ* and any symbol a ∈ Σ ,
δ*( q , ya ) = δ( δ*( q , y ) , a ) .

In the definition, the Basis Clause says that a DFA stays in state q when it reads an empty
string at state q, and the Inductive Clause says that the state the DFA reaches after reading
string ya starting at state q is the state it reaches by reading symbol a after reading string
y from state q.
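The recursive definition can be coded almost verbatim. In this sketch (names ours) the string w is split as ya exactly as in the Inductive Clause:

```python
# delta* implemented directly from its recursive definition.
def delta_star(delta, q, w):
    if w == "":                        # Basis Clause: delta*(q, Lambda) = q
        return q
    y, a = w[:-1], w[-1]               # Inductive Clause: w = ya
    return delta[(delta_star(delta, q, y), a)]

# Trying it on the DFA of Example 3:
delta = {(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 1, (1, 'b'): 1}
print(delta_star(delta, 0, "aab"))  # 1
print(delta_star(delta, 0, "aaa"))  # 0
```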

Example

For example suppose that a DFA contains the transitions shown below.
Then δ*( q , DNR ) can be calculated as follows:

δ*( q , DNR ) = δ( δ*( q , DN ) , R ) by the Inductive Clause.
= δ( δ( δ*( q , D ) , N ) , R ) by applying the Inductive Clause to δ*( q , DN ).
= δ( δ( δ*( q , ΛD ) , N ) , R ) , since D = ΛD .
= δ( δ( δ( δ*( q , Λ ) , D ) , N ) , R ) by applying the Inductive Clause to δ*( q ,
ΛD ).
= δ( δ( δ( q , D ) , N ) , R ) , since δ*( q , Λ ) = q .
= δ( δ( q1 , N ) , R ) , since δ( q , D ) = q1 as seen from the diagram.
= δ( q2 , R ) , since δ( q1 , N ) = q2 as seen from the diagram.
= q3 , since δ( q2 , R ) = q3 as seen from the diagram.

Properties of δ*

We can see the following two properties of δ*.

Theorem 1: For any state q of Q and any symbol a of Σ for a DFA < Q , Σ , q0 , δ , A
> ,

δ*( q , a ) = δ( q , a )

Proof : Since a = Λa ,
δ*( q , a ) = δ*( q , Λa ) .
By the definition of δ* ,
δ*( q , Λa ) = δ( δ*( q , Λ ) , a ) .
But δ*( q , Λ ) = q by the definition of δ* .
Hence δ( δ*( q , Λ ) , a ) = δ( q , a ) .

The next theorem states that the state reached from any state, say q , by reading a string,
say w , is the same as the state reached by first reading a prefix of w, call it x, and then
reading the rest of w, call it y.

Theorem 2: For any state q of Q and any strings x and y over Σ for a DFA < Q , Σ ,
q0 , δ , A > ,

δ*( q , xy ) = δ*( δ*( q , x ) , y ) .

Proof : This is going to be proven by induction on the string y. That is, the statement to be
proven is the following:
For an arbitrary fixed string x, δ*( q , xy ) = δ*( δ*( q , x ) , y ) holds for any arbitrary
string y.
First let us review the recursive definition of Σ*.

Recursive definition of Σ* :

Basis Clause: Λ ∈ Σ* .
Inductive Clause: If x ∈ Σ* and a ∈ Σ , then xa ∈ Σ* .
Extremal Clause: Nothing is in Σ* unless it is obtained from the above two clauses.

Now the proof of the theorem.

Basis Step: If y = Λ , then δ*( q , xy ) = δ*( q , xΛ ) = δ*( q , x ) .
Also δ*( δ*( q , x ) , y ) = δ*( δ*( q , x ) , Λ ) = δ*( q , x ) by the definition of δ* .
Hence the theorem holds for y = Λ .
Inductive Step: Assume that δ*( q , xy ) = δ*( δ*( q , x ) , y ) holds for an arbitrary
string y. This is the induction hypothesis.
We are going to prove that δ*( q , xya ) = δ*( δ*( q , x ) , ya ) for any arbitrary symbol
a of Σ .

δ*( q , xya ) = δ( δ*( q , xy ) , a ) by the definition of δ*
= δ( δ*( δ*( q , x ) , y ) , a ) by the induction hypothesis.
= δ*( δ*( q , x ) , ya ) by the definition of δ* .

Thus the theorem has been proven.
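Theorem 2 can also be sanity-checked by brute force on a concrete DFA, here the one of Example 3, over all short strings x and y (a check on small cases, not a proof; names ours):

```python
from itertools import product

delta = {(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 1, (1, 'b'): 1}

def delta_star(q, w):
    for a in w:
        q = delta[(q, a)]
    return q

# delta*(q, xy) == delta*(delta*(q, x), y) for all states and all strings
# of length at most 3 over {a, b}.
ok = all(
    delta_star(q, x + y) == delta_star(delta_star(q, x), y)
    for q in (0, 1)
    for n in range(4) for m in range(4)
    for x in map("".join, product("ab", repeat=n))
    for y in map("".join, product("ab", repeat=m))
)
print(ok)  # True
```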


Language Accepted by DFA

Subjects to be Learned

• Language accepted by DFA


A string w is accepted by a DFA < Q , Σ , q0 , δ , A > if and only if
δ*( q0 , w ) ∈ A . That is, a string is accepted by a DFA if and only if the DFA,
starting at the initial state, ends in an accepting state after reading the string.
A language L is accepted by a DFA < Q , Σ , q0 , δ , A > if and only if L
= { w | δ*( q0 , w ) ∈ A } . That is, the language accepted by a DFA is the set of
strings accepted by the DFA.

Example 1 :
This DFA accepts { Λ } because it can go from the initial state to the accepting
state (also the initial state) without reading any symbol of the alphabet, i.e. by
reading the empty string Λ. It accepts nothing else because any non-empty
string would take it to state 1, which is not an accepting state, and it stays there.

Example 2 :

This DFA does not accept any string because it has no accepting state. Thus the
language it accepts is the empty set ∅.

Example 3 : DFA with one cycle

This DFA has a cycle: 1 - 2 - 1 and it can go through this cycle any number of
times by reading substring ab repeatedly.
To find the language it accepts, first from the initial state go to state 1 by reading
one a. Then from state 1 go through the cycle 1 - 2 - 1 any number of times by
reading substring ab any number of times to come back to state 1. This is
represented by (ab)*. Then from state 1 go to state 2 and then to state 3 by reading
aa. Thus a string that is accepted by this DFA can be represented by a(ab)*aa .
Example 4 : DFA with two independent cycles

This DFA has two independent cycles: 0 - 1 - 0 and 0 - 2 - 0 and it can move
through these cycles any number of times in any order to reach the accepting state
from the initial state such as 0 - 1 - 0 - 2 - 0 - 2 - 0. Thus a string that is accepted
by this DFA can be represented by ( ab + bb )*.

Example 5 : DFA with two interleaved cycles

This DFA has two cycles: 1 - 2 - 0 - 1 and 1 - 2 - 3 - 1.


To find the language accepted by this DFA, first from state 0 go to state 1 by
reading a ( any other state which is common to these cycles such as state 2 can
also be used instead of state 1 ). Then from state 1 go through the two cycles 1 - 2
- 0 - 1 and 1 - 2 - 3 - 1 any number of times in any order by reading substrings baa
and bba, respectively. At this point a substring a( baa + bba )* will have been read.
Then go from state 1 to state 2 and then to state 3 by reading bb. Thus altogether
a( baa + bba )*bb will have been read when state 3 is reached from state 0.

Example 6 :

This DFA has two accepting states: 0 and 1. Thus the language that is accepted by
this DFA is the union of the language accepted at state 0 and the one accepted at
state 1. The language accepted at state 0 is b* . To find the language accepted at
state 1, first at state 0 read any number of b's. Then go to state 1 by reading one a.
At this point (b*a) will have been read. At state 1 go through the cycle 1 - 2 - 1
any number of times by reading substring ba repeatedly. Thus the language
accepted at state 1 is b*a(ba)* .

There is a systematic way of finding the language accepted by a DFA and we are
going to learn it later. So we are not going to go any further on this problem here.

Definition of Nondeterministic Finite Automata


Subjects to be Learned

• Nondeterministic finite automata


• State transition diagram
• State transition table

Definition of nondeterministic finite automaton

Let Q be a finite set and let Σ be a finite set of symbols. Also let δ be a function from Q
× Σ to 2^Q , let q0 be a state in Q and let A be a subset of Q. We call the elements of Q
states, δ the transition function, q0 the initial state and A the set of accepting states.
Then a nondeterministic finite automaton is a 5-tuple < Q , Σ , q0 , δ , A > .

Notes on the definition

1. As in the case of DFA the set Q in the above definition is simply a set with a
finite number of elements. Its elements can be interpreted as a state that the
system (automaton) is in.
2. The transition function δ is also called a next state function. Unlike a DFA, an NFA
moves into one of the states given by δ(q, a) if it receives the input symbol a
while in state q. Which one of the states in δ(q, a) to select is determined
nondeterministically.
3. Note that δ is a function. Thus for each state q of Q and for each symbol a of Σ ,
δ(q, a) must be specified. But it can be the empty set, in which case the NFA
aborts its operation.
4. As in the case of DFA the accepting states are used to distinguish sequences of
inputs given to the finite automaton. If the finite automaton is in an accepting
state when the input ends i.e. ceases to come, the sequence of input symbols given
to the finite automaton is "accepted". Otherwise it is not accepted.
5. Note that any DFA is also an NFA.

Examples of NFA

Example 1: Q = { 0, 1 }, Σ = { a }, A = { 1 }, the initial state is 0 and δ is as shown in
the following table.

State (q) Input (a) Next State ( δ(q, a) )

0 a { 1 }
1 a ∅
A state transition diagram for this finite automaton is given below.

If the alphabet is changed to { a, b } instead of { a }, this is still an NFA that accepts
{ a }.

Example 2: Q = { 0, 1, 2 }, Σ = { a, b }, A = { 2 }, the initial state is 0 and δ is as shown
in the following table.

State (q) Input (a) Next State ( δ(q, a) )

0 a { 1, 2 }
0 b ∅
1 a ∅
1 b { 2 }
2 a ∅
2 b ∅

Note that for each state there are two rows in the table, corresponding to the symbols
a and b, while in Example 1 there is only one row for each state.

A state transition diagram for this finite automaton is given below.


Operation of NFA

Let us see how an automaton operates when some inputs are applied to it. As an example
let us consider the automaton of Example 2 above.
Initially it is in state 0. When it reads the symbol a, it moves to either state 1 or state 2.
Since the state 2 is the accepting state, if it moves to state 2 and no more inputs are given,
then it stays in the accepting state. We say that this automaton accepts the string a. If on
the other hand it moves to state 1 after reading a, if the next input is b and if no more
inputs are given, then it goes to state 2 and remains there. Thus the string ab is also
accepted by this NFA. If any other strings are given to this NFA, it does not accept any of
them.

Let us now define the function δ* and then formalize the concepts of acceptance of
strings and languages by NFA.
Language Accepted by NFA

Subjects to be Learned

• δ* for NFA
• Language accepted by NFA
• Properties of δ*

Definition of δ*

For a state q and string w, δ*( q , w ) is the set of states that the NFA can reach
when it reads the string w starting at the state q. In general an NFA
nondeterministically goes through a number of states from the state q as it reads the
symbols in the string w. Thus for an NFA < Q , Σ , q0 , δ , A > , the function

δ* : Q × Σ* -> 2^Q

is defined recursively as follows:

Definition of δ* :

Basis Clause: For any state q of Q, δ*( q , Λ ) = { q }, where Λ denotes the empty
string.
Inductive Clause: For any state q of Q, any string y ∈ Σ* and any symbol a ∈ Σ ,

δ*( q , ya ) = ∪{ δ( p , a ) | p ∈ δ*( q , y ) }

In the definition, the Basis Clause says that an NFA stays in state q when it reads an
empty string at state q, and the Inductive Clause says that the set of states the NFA can
reach after reading string ya starting at state q is the set of states it can reach by
reading symbol a after reading string y starting at state q.

Example

For example consider the NFA with the following transition table:

State (q) Input (a) Next State ( δ(q, a) )

0 a { 0, 1, 3 }
0 b { 2 }
1 a ∅
1 b { 3 }
2 a { 3 }
2 b ∅
3 a ∅
3 b { 1 }
The transition diagram for this NFA is as given below.

Suppose that the state 3 is an accepting state of this NFA.

Then δ*( 0 , ab ) can be calculated as follows:

δ*( 0 , ab ) is the union of δ( p, b ) for all p ∈ δ*( 0 , a ) by the Inductive Clause of
the definition of δ* .
Now δ*( 0 , a ) is the union of δ( p, a ) for all p ∈ δ*( 0 , Λ ) , again by the Inductive
Clause of the definition of δ* .
By the Basis Clause of the definition of δ* , δ*( 0 , Λ ) = { 0 } .
Hence δ*( 0 , a ) = δ( 0 , a ) = { 0 , 1 , 3 } .
Hence δ*( 0 , ab ) = δ( 0 , b ) ∪ δ( 1 , b ) ∪ δ( 3 , b ) = { 2 } ∪ { 3 } ∪ { 1 } = { 1 , 2 ,
3 } .
We say that a string x is accepted by an NFA < Q , Σ , q0 , δ , A > if and only if
δ*( q0 , x ) ∩ A is not empty, that is, if and only if it can reach an accepting state by
reading x starting at the initial state. The language accepted by an NFA < Q , Σ , q0 , δ ,
A > is the set of strings that are accepted by the NFA.

Some of the strings accepted by the NFA given above are a, ab, aaa, abbbb etc., and
the language it accepts is a*( ab + a + ba )(bb)* .

δ* for an NFA has properties similar to those of δ* for a DFA.

Theorem 1: For any state q of Q and any symbol a of Σ for an NFA < Q , Σ , q0 , δ , A
> ,

δ*( q , a ) = δ( q , a )

Theorem 2: For any state q of Q and any strings x and y over Σ for an NFA < Q , Σ ,
q0 , δ , A > ,

δ*( q , xy ) = ∪{ δ*( p , y ) | p ∈ δ*( q , x ) }

These theorems can be proven in a manner similar to those for Theorems 1 and 2 for
DFA.
Definition of Nondeterministic Finite Automata with Λ-Transitions

Subjects to be Learned

• Nondeterministic finite automata with Λ-Transitions


• State transition diagram
• State transition table

Contents

One of the objectives of this chapter is to show that there is a one-to-one correspondence
between regular languages and finite automata. We are going to do that by showing that a
finite automaton can be constructed from a given regular expression by combining
simpler FAs using union, concatenation and Kleene star operations. These operations on
FAs can be described conveniently if Λ-transitions are used. Basically an NFA with
Λ-transitions is an NFA that can also respond to an empty string and move to the next state.
Here we are going to formally define NFAs with Λ-transitions (abbreviated as NFA-Λ)
and see some examples. As we are going to see later, for any NFA-Λ there is an NFA
(hence a DFA) which accepts the same language, and vice versa.

Definition of nondeterministic finite automaton with Λ-Transitions

Let Q be a finite set and let Σ be a finite set of symbols. Also let δ be a function from Q
× ( Σ ∪ { Λ } ) to 2^Q , let q0 be a state in Q and let A be a subset of Q. We call the
elements of Q states, δ the transition function, q0 the initial state and A the set of
accepting states.
Then a nondeterministic finite automaton with Λ-Transitions is a 5-tuple < Q , Σ , q0
, δ , A > .

Notes on the definition

1. A transition on reading Λ means that the NFA-Λ makes the transition without
reading any symbol in the input. Thus the tape head does not move when Λ is
read.
2. Note that any NFA is also an NFA-Λ.

Example of NFA-Λ

Q = { 0, 1, 2, 3, 4, 5 }, Σ = { a, b }, the initial state is 0, A is as indicated in the figure
below, and δ is as shown in the following table.

State (q) Input (a) Next State ( δ(q, a) )

0 a { 1 }
0 Λ { 4 }
1 Λ { 2 }
2 Λ { 3, 4 }
3 Λ { 5 }
3 b { 4 }
4 a { 5 }

Here the transitions to ∅ are omitted from the table. A state transition diagram for this
finite automaton is given below.

When a symbol a is read at the initial state 0, for example, it can move to any of the states
other than 0, for once you are in state 1, for example, you can go to states 2, 3, 4 and 5
without reading any symbol on the tape. If you read the string ab, then you come to state 4,
for though you go to states 1, 2, 3, 4 and 5 by reading a, there are no transitions on
reading b except from state 3. Thus 4 is the only state you can go to from the initial state
by reading ab.

Language Accepted by NFA-Λ

Subjects to be Learned

• Λ-closure
• δ* for NFA-Λ
• Language accepted by NFA-Λ
• Properties of δ*

Contents

To formally define δ* for NFA-Λ, we start with the concept of the Λ-closure of a state,
which is the set of states reachable from the state without reading any symbol. Using that
concept we define δ* and then strings and languages accepted by NFA-Λ.

Definition of Λ-closure

Let < Q , Σ , q0 , δ , A > be an NFA-Λ. Let us denote the Λ-closure of a set S of states
of Q by Λ-closure( S ). Then Λ-closure( S ) is defined recursively as follows:

Basis Clause: S ⊆ Λ-closure( S )

Inductive Clause: For any state q of Q, if q ∈ Λ-closure( S ) , then δ( q , Λ ) ⊆
Λ-closure( S ).

Extremal Clause: Nothing is in Λ-closure( S ) unless it is obtained by the above two clauses.
For the NFA-Λ of the above figure, Λ-closure( { 2 } ) is obtained as follows:

First { 2 } ⊆ Λ-closure( { 2 } ) , that is, 2 ∈ Λ-closure( { 2 } ) . Then since 2 ∈
Λ-closure( { 2 } ) , by the Inductive Clause, δ( 2, Λ ) ⊆ Λ-closure( { 2 } ).
Since δ( 2, Λ ) = { 3 , 4 }, we now have { 2 , 3 , 4 } ⊆ Λ-closure( { 2 } ).
Since 3 and 4 have been added to Λ-closure( { 2 } ), δ( 3, Λ ) = { 5 } and δ( 4, Λ ) = ∅
must be included in Λ-closure( { 2 } ) . Thus now { 2 , 3 , 4 , 5 } ⊆ Λ-closure( { 2 } ).
Though 5 has become a member of the closure, since δ( 5 , Λ ) is empty, no new
members are added to Λ-closure( { 2 } ) . Since δ( q , Λ ) has been examined for all the
states currently in Λ-closure( { 2 } ) and no more elements are added to it, this process
of generating the closure terminates, and Λ-closure( { 2 } ) = { 2 , 3 , 4 , 5 } is obtained.

As we can see from the example, Λ-closure( S ) is the set of states that can be reached
from the states of S by traversing any number of Λ arcs. That is, it is the set of states that
can be reached from the states of S without reading any symbols in Σ.
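Computed as a worklist fixpoint, the closure calculation above looks like this in Python (a sketch; eps_delta holds only the Λ-transitions of the example, and the names are ours):

```python
# Lambda-transitions of the example: delta(2, Lambda) = {3, 4}, delta(3, Lambda) = {5}.
eps_delta = {2: {3, 4}, 3: {5}}

def eps_closure(S):
    """States reachable from S by traversing any number of Lambda arcs."""
    closure = set(S)          # Basis Clause: S is contained in the closure
    stack = list(S)
    while stack:              # Inductive Clause, applied until nothing new is added
        q = stack.pop()
        for p in eps_delta.get(q, set()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure

print(eps_closure({2}))  # {2, 3, 4, 5}, matching the hand computation
```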

Now with this Λ-closure, we can define δ* recursively as follows:

As in the cases of DFA and NFA, δ* gives the result of applying the transition function
repeatedly as dictated by the given string.

Definition of δ*

δ* is going to be defined recursively.
Let < Q , Σ , q0 , δ , A > be an NFA-Λ.
Basis Clause: For any state q of Q,

δ*( q , Λ ) = Λ-closure( { q } ) .

Inductive Clause: For any state q, a string y in Σ* and a symbol a in Σ ,

δ*( q , ya ) = Λ-closure( ∪{ δ( p , a ) | p ∈ δ*( q , y ) } ) .

What the Inductive Clause means is that δ*( q , ya ) is obtained by first finding the states
that can be reached from q by reading y ( δ*( q , y ) ), then from each of those states p by
reading a (i.e. by finding δ( p , a ) ), and then by reading Λ's ( i.e. by taking the Λ-
closure of the δ( p , a )'s ) .

Example: For the NFA-Λ of the following figure, δ*(0, ab) can be obtained as
below:

First let us compute δ*(0, a).

For that we need Λ({0}).
Since it is the set of states reached by traversing the Λ-arcs from state 0, Λ({0}) =
{ 0, 3, 4 }.
Next from each of the states in Λ({0}) we read symbol a and move to another state
(i.e. apply δ). They are δ(0, a) = { 1 }, δ(3, a) = ∅ and δ(4, a) = { 5 }.

Hence ∪_{p ∈ δ*(0, Λ)} δ(p, a) = { 1, 5 } for q = 0.
We then traverse the Λ-arcs from { 1, 5 } to get to the states in δ*(0, a). Since Λ({1})
= { 1, 2, 3 } and Λ({5}) = { 5 }, δ*(0, a) = { 1, 2, 3, 5 }.
Then to find δ*(0, ab), read b from each of the states in δ*(0, a) and then take the Λ-
arcs from there.
Now δ(1, b), δ(3, b) and δ(5, b) are empty sets, and δ(2, b) = { 4 }.
Since Λ({4}) = { 3, 4 }, δ*(0, ab) = { 3, 4 }.
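The recursive definition of δ* translates directly into code. The sketch below re-implements the Λ-closure and then folds the input string through the transition function. Since the figure itself is not reproduced here, the Λ-arc layout used (0→3, 0→4, 1→2, 2→3, 4→3) is one reconstruction consistent with the closures stated in the text.

```python
def lambda_closure(states, delta_lam):
    closure, frontier = set(states), list(states)
    while frontier:
        q = frontier.pop()
        for r in delta_lam.get(q, set()):
            if r not in closure:
                closure.add(r)
                frontier.append(r)
    return closure

def delta_star(q, w, delta, delta_lam):
    # Basis clause: delta*(q, Lambda) = Lambda({q}).
    current = lambda_closure({q}, delta_lam)
    # Inductive clause: delta*(q, ya) = Lambda-closure of the union
    # of delta(p, a) over all p in delta*(q, y).
    for a in w:
        step = set()
        for p in current:
            step |= delta.get((p, a), set())
        current = lambda_closure(step, delta_lam)
    return current

# Transitions of the worked example; the Lambda-arc placement is an
# assumption consistent with the closures given in the text.
delta_lam = {0: {3, 4}, 1: {2}, 2: {3}, 4: {3}}
delta = {(0, 'a'): {1}, (4, 'a'): {5}, (2, 'b'): {4}}
print(delta_star(0, 'a', delta, delta_lam))   # {1, 2, 3, 5}
print(delta_star(0, 'ab', delta, delta_lam))  # {3, 4}
```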

A string x is accepted by an NFA-Λ < Q , Σ , q0 , δ , A > if and only if δ*(q0, x)
contains at least one accepting state.
The language accepted by an NFA-Λ < Q , Σ , q0 , δ , A > is the set of strings
accepted by the NFA-Λ.

For example the NFA-Λ of the figure given above accepts the language { Λ , a , ab }.

Equivalence of DFAs, NFAs and NFA-Λs

Subjects to be Learned

• Conversion of NFA-Λ to (equivalent) NFA
• Conversion of NFA to (equivalent) DFA
• Equivalence of DFAs, NFAs and NFA-Λs

Conversion of NFA-Λ to NFA

Let M1 = < Q1 , Σ , q1,0 , δ1 , A1 > be an NFA-Λ that recognizes a language L. Then the
NFA M2 = < Q2 , Σ , q2,0 , δ2 , A2 > that satisfies the following conditions recognizes L:

Q2 = Q1 ,
q2,0 = q1,0 ,
δ2(q, a) = δ1*(q, a) ( = Λ( ∪_{p ∈ Λ({q})} δ1(p, a) ) ),
A2 = A1 ∪ { q1,0 } if Λ({q1,0}) ∩ A1 ≠ ∅,
A2 = A1 otherwise.

Thus to obtain an NFA M2 = < Q2 , Σ , q2,0 , δ2 , A2 > which accepts the same language
as the given NFA-Λ M1 = < Q1 , Σ , q1,0 , δ1 , A1 > does,

first copy the states of Q1 into Q2.

Then for each state q of Q2 and each symbol a of Σ find δ2(q, a) as follows:

Find Λ({q}), that is, all the states that can be reached from q by traversing Λ-arcs. Then
collect all the states that can be reached from each state of Λ({q}) by traversing one arc
labeled with the symbol a. The Λ-closure of the set of those states is δ2(q, a).
The set of accepting states A2 is the same as A1 if no accepting states can be reached from
the initial state q1,0 through Λ-arcs in M1. Otherwise, that is, if an accepting state can be
reached from the initial state q1,0 through Λ-arcs in M1, then all the accepting states of M1
plus state q1,0 are the accepting states of M2.
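The conversion can be sketched as follows. The test data is read off from the table of Example 1 below: Λ-arcs 0→1 and 3→1, δ1(1, a) = {1, 2}, δ1(2, b) = {3}, and A1 = {1}.

```python
def lambda_closure(states, delta_lam):
    closure, frontier = set(states), list(states)
    while frontier:
        q = frontier.pop()
        for r in delta_lam.get(q, set()):
            if r not in closure:
                closure.add(r)
                frontier.append(r)
    return closure

def remove_lambda(Q, sigma, q0, delta, delta_lam, A):
    # delta2(q, a) = Lambda-closure of the union of delta1(p, a)
    # over all p in Lambda-closure({q}).
    delta2 = {}
    for q in Q:
        cl = lambda_closure({q}, delta_lam)
        for a in sigma:
            step = set()
            for p in cl:
                step |= delta.get((p, a), set())
            delta2[(q, a)] = lambda_closure(step, delta_lam)
    # q1,0 becomes accepting if its closure contains an accepting state.
    A2 = set(A) | {q0} if lambda_closure({q0}, delta_lam) & set(A) else set(A)
    return delta2, A2

# Example 1 from the text, with transitions read off from its table.
Q = {0, 1, 2, 3}
delta_lam = {0: {1}, 3: {1}}
delta = {(1, 'a'): {1, 2}, (2, 'b'): {3}}
delta2, A2 = remove_lambda(Q, 'ab', 0, delta, delta_lam, A={1})
print(A2)                # {0, 1}
print(delta2[(0, 'a')])  # {1, 2}
print(delta2[(2, 'b')])  # {1, 3}
```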

Example 1: Let us convert the following NFA-Λ to an NFA.

The set of states Q2 of the NFA is { 0, 1, 2, 3 }, the initial state is 0 and the accepting states
are 1 and 0, since 1 is in Λ({0}). The transition function δ2 is obtained as follows:

δ2(0, a):
First Λ({0}) = { 0, 1 }. Then from the transition function δ1 of the NFA-Λ,
δ1(0, a) = ∅, and
δ1(1, a) = { 1, 2 }.
Hence δ2(0, a) = Λ({1, 2}) = { 1, 2 }.

For δ2(0, b), since Λ({0}) = { 0, 1 } and δ1(0, b) = δ1(1, b) = ∅, δ2(0, b) = ∅.

Similarly δ2 can be obtained for other states and symbols. They are given in the table
below together with Λ({q}) and ∪_{p ∈ Λ({q})} δ1(p, Input).

State q   Input   Λ({q})   ∪_{p ∈ Λ({q})} δ1(p, Input)   δ2(q, Input) ( = Λ( ∪ δ1(p, Input) ) )
0         a       {0,1}    {1,2}                          {1,2}
0         b       {0,1}    ∅                              ∅
1         a       {1}      {1,2}                          {1,2}
1         b       {1}      ∅                              ∅
2         a       {2}      ∅                              ∅
2         b       {2}      {3}                            {1,3}
3         a       {1,3}    {1,2}                          {1,2}
3         b       {1,3}    ∅                              ∅

The NFA thus obtained is shown below.


Example 2: Let us convert the following NFA-Λ to an NFA.

The set of states Q2 of the NFA is { 0, 1, 2, 3, 4 }, the initial state is 0 and the accepting states
are 1 and 0, since 1 is in Λ({0}). The transition function δ2 is obtained as for
Example 1. δ2 is given in the table below together with Λ({q}) and ∪_{p ∈ Λ({q})} δ1(p, Input).

State q   Input   Λ({q})   ∪_{p ∈ Λ({q})} δ1(p, Input)   δ2(q, Input) ( = Λ( ∪ δ1(p, Input) ) )
0         a       {0,1}    {1,2}                          {1,2,3}
0         b       {0,1}    ∅                              ∅
1         a       {1}      {1,2}                          {1,2,3}
1         b       {1}      ∅                              ∅
2         a       {2,3}    {4}                            {1,4}
2         b       {2,3}    {4}                            {1,4}
3         a       {3}      {4}                            {1,4}
3         b       {3}      ∅                              ∅
4         a       {1,4}    {1,2}                          {1,2,3}
4         b       {1,4}    ∅                              ∅

The NFA thus obtained is shown below.

Proof of Equivalence of NFA-Λ and NFA

We are going to prove that the NFA obtained from an NFA-Λ by the conversion algorithm
accepts the same language as the NFA-Λ.
The NFA-Λ that recognizes a language L is denoted by M1 = < Q1 , Σ , q1,0 , δ1 , A1 > and the
NFA obtained by the conversion is denoted by M2 = < Q2 , Σ , q2,0 , δ2 , A2 >.

First we are going to prove that δ1*(q, w) = δ2*(q, w) for any non-empty string w.

To review the definition of δ* for NFA and NFA-Λ, and the conversion of NFA-Λ to
NFA, click here.

When it is proven, it implies that the NFA-Λ M1 and the NFA M2 accept the same non-empty
strings. The case when w is an empty string is going to be proven separately.

Claim 1: For any non-empty string w and for any state q,
δ1*(q, w) = δ2*(q, w).

Proof: This is going to be proven by induction on w. Recall that the set of strings is
defined recursively (click here for a quick review). Thus we first prove that it is true for
any arbitrary symbol, then, assuming it holds for an arbitrary string w, we prove it for
the children of w, that is, wa for any symbol a in the alphabet.

Basis Step: We need to show that for any symbol a in Σ,
δ1*(q, a) = δ2*(q, a).
We are going to prove it by showing that both of them are equal to δ2(q, a).

Firstly, δ2(q, a) = δ1*(q, a) --- (1)
because of the way the NFA is constructed from the NFA-Λ (click here for a review).

Secondly, δ2*(q, a) = δ2*(q, Λa) = ∪_{p ∈ δ2*(q, Λ)} δ2(p, a)
by the definition of δ2* for NFA (click here for a review).

Since δ2*(q, Λ) = { q }, ∪_{p ∈ δ2*(q, Λ)} δ2(p, a) = δ2(q, a).
Hence δ2*(q, a) = δ2(q, a). --- (2)

Hence from (1) and (2),
δ1*(q, a) = δ2*(q, a).

Inductive Step: We need to show that if δ1*(q, w) = δ2*(q, w) for an arbitrary string w
(Induction Hypothesis),
then δ1*(q, wa) = δ2*(q, wa) holds for any arbitrary symbol a in Σ.

First we show that δ2*(q, wa) = ∪_{p ∈ δ1*(q, w)} δ1*(p, a) --- (1)
using the definition of δ2*, the induction hypothesis and the construction of the NFA from
the NFA-Λ.

Then we show that ∪_{p ∈ δ1*(q, w)} δ1*(p, a) = δ1*(q, wa) --- (2)
basically using the definition of δ1*.

Then from (1) and (2) we can see that δ1*(q, wa) = δ2*(q, wa).

Let us first prove (1).

By the definition of δ2*,
δ2*(q, wa) = ∪_{p ∈ δ2*(q, w)} δ2(p, a).
Since δ2*(q, w) = δ1*(q, w) by the induction hypothesis,
∪_{p ∈ δ2*(q, w)} δ2(p, a) = ∪_{p ∈ δ1*(q, w)} δ2(p, a).
Since δ2(p, a) = δ1*(p, a) by the way the NFA is constructed from the NFA-Λ,
∪_{p ∈ δ1*(q, w)} δ2(p, a) = ∪_{p ∈ δ1*(q, w)} δ1*(p, a).

Hence δ2*(q, wa) = ∪_{p ∈ δ1*(q, w)} δ1*(p, a), that is, (1) has been proven.

Let us next prove (2), that is, ∪_{p ∈ δ1*(q, w)} δ1*(p, a) = δ1*(q, wa).
By the definition of δ1* for NFA-Λ,
δ1*(p, a) = Λ( ∪_{r ∈ Λ({p})} δ1(r, a) ).
Substituting this into the left hand side of (2) produces
∪_{p ∈ δ1*(q, w)} δ1*(p, a) = ∪_{p ∈ δ1*(q, w)} Λ( ∪_{r ∈ Λ({p})} δ1(r, a) ).
The right hand side of this equality is equal to
Λ( ∪_{p ∈ δ1*(q, w)} ∪_{r ∈ Λ({p})} δ1(r, a) ) (the first Λ and ∪ have been
swapped to get this),
as proven below in Claim 3.

This can be shown to be equal to Λ( ∪_{p ∈ δ1*(q, w)} δ1(p, a) ), because
∪_{p ∈ δ1*(q, w)} Λ({p}) = δ1*(q, w).
To see an explanation for this click here.

Hence ∪_{p ∈ δ1*(q, w)} δ1*(p, a) = Λ( ∪_{p ∈ δ1*(q, w)} δ1(p, a) ).

On the other hand Λ( ∪_{p ∈ δ1*(q, w)} δ1(p, a) ) = δ1*(q, wa), by the definition of δ1*.

Hence ∪_{p ∈ δ1*(q, w)} δ1*(p, a) = δ1*(q, wa).
Hence we have proven (2).
Thus from (1) and (2), δ1*(q, wa) = δ2*(q, wa).
End of Induction

With this Claim 1 we can see that any non-empty string w is accepted by the NFA if and only if it
is accepted by the corresponding NFA-Λ.

As for the empty string Λ, if it is accepted by an NFA-Λ, then Λ({q1,0}) ∩ A1 ≠ ∅.

Hence by the way A2 is constructed, q2,0 ∈ A2. Hence Λ is accepted by the NFA.

Conversely, if Λ is accepted by the NFA, then q2,0 ∈ A2. By the way the NFA is constructed from
the NFA-Λ this means that Λ({q1,0}) ∩ A1 ≠ ∅. Hence Λ is accepted by the NFA-Λ.

Thus the NFA-Λ and the corresponding NFA accept the same language.

As a preparation for the proof of commutativity of the union and Λ-closure operations, let
us prove the following claim.

Claim 2: Λ(S ∪ T) = Λ(S) ∪ Λ(T).

We are going to prove this in two parts:

Λ(S ∪ T) ⊆ Λ(S) ∪ Λ(T), and
Λ(S) ∪ Λ(T) ⊆ Λ(S ∪ T).

Part 1: Λ(S ∪ T) ⊆ Λ(S) ∪ Λ(T)

This is going to be proven by induction on Λ(S ∪ T).
For that let us restate the statement so that the induction becomes clearer.
What Part 1 states is that all the elements of Λ(S ∪ T) have the property of being in
the set Λ(S) ∪ Λ(T).
Since Λ(S ∪ T) is defined recursively, in the Basis Step of our proof we prove the
property for the elements of the basis of Λ(S ∪ T), and in the Inductive Step we prove
that if an arbitrary element of Λ(S ∪ T) has that property, then its children also have
it.
Let us review the definition of the Λ-closure of a set of states of an NFA-Λ. Let X be
a set of states of an NFA-Λ. Then the Λ-closure of X is defined recursively as follows:

Basis Clause: X ⊆ Λ(X).

Inductive Clause: If q ∈ Λ(X), then δ(q, Λ) ⊆ Λ(X).

Extremal Clause: Nothing is in Λ(X) unless it is obtained by the Basis and Inductive
Clauses.
Proof of Part 1:

Basis Step: We need to prove that S ∪ T ⊆ Λ(S) ∪ Λ(T).

Since S ⊆ Λ(S) and T ⊆ Λ(T), S and T are subsets of Λ(S) ∪ Λ(T). Hence
S ∪ T ⊆ Λ(S) ∪ Λ(T).

Inductive Step: We need to prove that if q is an arbitrary element of Λ(S ∪ T) with
the property of being in Λ(S) ∪ Λ(T), then δ(q, Λ) ⊆ Λ(S) ∪ Λ(T).

Let q be an arbitrary element of Λ(S ∪ T) with the property of being in Λ(S) ∪ Λ(T).

Since q ∈ Λ(S) ∪ Λ(T), q ∈ Λ(S) or q ∈ Λ(T).

If q ∈ Λ(S), then δ(q, Λ) ⊆ Λ(S) by the definition of Λ(S). Hence
δ(q, Λ) ⊆ Λ(S) ∪ Λ(T).

Similarly if q ∈ Λ(T), then δ(q, Λ) ⊆ Λ(S) ∪ Λ(T).

Hence if q is an arbitrary element of Λ(S ∪ T) with the property of being in Λ(S)
∪ Λ(T), then δ(q, Λ) ⊆ Λ(S) ∪ Λ(T).

Thus all the elements of Λ(S ∪ T) have the property of being in Λ(S) ∪ Λ(T),
which is to say that Λ(S ∪ T) ⊆ Λ(S) ∪ Λ(T).

End of Proof for Part 1

Part 2: Λ(S) ∪ Λ(T) ⊆ Λ(S ∪ T).

Proof of Part 2: We are going to prove Λ(S) ⊆ Λ(S ∪ T) and Λ(T) ⊆ Λ(S ∪ T).
That would imply that Λ(S) ∪ Λ(T) ⊆ Λ(S ∪ T).

Proof of Λ(S) ⊆ Λ(S ∪ T):

By induction on Λ(S).

Basis Step: We need to show that S ⊆ Λ(S ∪ T).

Since S ⊆ S ∪ T, and S ∪ T ⊆ Λ(S ∪ T), S ⊆ Λ(S ∪ T).

Inductive Step: We need to prove that for an arbitrary element q in Λ(S), if q is in
Λ(S ∪ T), then δ(q, Λ) ⊆ Λ(S ∪ T).

Since q is in Λ(S ∪ T) and since Λ(S ∪ T) is a Λ-closure, by the definition of Λ-
closure δ(q, Λ) ⊆ Λ(S ∪ T).

Thus Λ(S) ⊆ Λ(S ∪ T) has been proven.

Similarly Λ(T) ⊆ Λ(S ∪ T) holds.

Hence Λ(S) ∪ Λ(T) ⊆ Λ(S ∪ T) holds.

End of Proof of Part 2
End of Proof of Claim 2

Claim 3: Λ( ∪_{i=1}^{n} S_i ) = ∪_{i=1}^{n} Λ( S_i ).

Proof: By induction on n.

Basis Step: n = 1. If n = 1, then Λ( ∪_{i=1}^{1} S_i ) = Λ( S_1 ) and
∪_{i=1}^{1} Λ( S_i ) = Λ( S_1 ).

Hence Λ( ∪_{i=1}^{n} S_i ) = ∪_{i=1}^{n} Λ( S_i ) holds for n = 1.

Inductive Step: Assume that Λ( ∪_{i=1}^{n} S_i ) = ∪_{i=1}^{n} Λ( S_i ) holds for n. --- Induction
Hypothesis

∪_{i=1}^{n+1} Λ( S_i ) = ( ∪_{i=1}^{n} Λ( S_i ) ) ∪ Λ( S_{n+1} ) by the definition of union.

= Λ( ∪_{i=1}^{n} S_i ) ∪ Λ( S_{n+1} ) by the induction hypothesis.

= Λ( ( ∪_{i=1}^{n} S_i ) ∪ S_{n+1} ) by Claim 2 above, since ∪_{i=1}^{n} S_i is a set as well as S_{n+1}.

= Λ( ∪_{i=1}^{n+1} S_i ) by the definition of union.

End of Proof for Claim 3

Equivalence of NFA and DFA

We are going to prove that the DFA obtained from an NFA by the conversion algorithm
accepts the same language as the NFA.
The NFA that recognizes a language L is denoted by M1 = < Q1 , Σ , q1,0 , δ1 , A1 > and the DFA
obtained by the conversion is denoted by M2 = < Q2 , Σ , q2,0 , δ2 , A2 >.

First we are going to prove by induction on strings that δ1*(q1,0, w) = δ2*(q2,0, w) for
any string w. When it is proven, it obviously implies that the NFA M1 and the DFA M2 accept the
same strings.

Theorem: For any string w, δ1*(q1,0, w) = δ2*(q2,0, w).

Proof: This is going to be proven by induction on w.

Basis Step: For w = Λ,

δ2*(q2,0, Λ) = q2,0 by the definition of δ2*
= { q1,0 } by the construction of the DFA M2
= δ1*(q1,0, Λ) by the definition of δ1*.

Inductive Step: Assume that δ1*(q1,0, w) = δ2*(q2,0, w) for an arbitrary string w. ---
Induction Hypothesis

For the string w and an arbitrary symbol a in Σ,

δ1*(q1,0, wa) = ∪_{p ∈ δ1*(q1,0, w)} δ1(p, a) by the definition of δ1*
= δ2( δ1*(q1,0, w), a ) by the construction of the DFA M2
= δ2( δ2*(q2,0, w), a ) by the induction hypothesis
= δ2*(q2,0, wa) by the definition of δ2*.

Thus for any string w, δ1*(q1,0, w) = δ2*(q2,0, w) holds.
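The NFA-to-DFA conversion referred to above is the subset construction: each DFA state is a set of NFA states. A sketch in Python, restricted to the subset-states reachable from the start state; the small NFA at the end is a standard textbook example (strings over {a, b} ending in ab), not one of the figures from this text.

```python
def nfa_to_dfa(sigma, q0, delta, A):
    # DFA states are frozensets of NFA states; the start state is {q0}.
    start = frozenset({q0})
    states, worklist, delta2 = {start}, [start], {}
    while worklist:
        S = worklist.pop()
        for a in sigma:
            # delta2(S, a) = union of delta(p, a) over all p in S.
            T = frozenset(t for p in S for t in delta.get((p, a), set()))
            delta2[(S, a)] = T
            if T not in states:
                states.add(T)
                worklist.append(T)
    # A subset-state is accepting if it contains an accepting NFA state.
    accepting = {S for S in states if S & A}
    return states, start, delta2, accepting

# Hypothetical NFA for strings over {a, b} ending in ab.
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}
states, start, d2, acc = nfa_to_dfa('ab', 0, delta, A={2})
print(len(states))  # 3 reachable subset-states: {0}, {0,1}, {0,2}
```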

Kleene's Theorem --- Part 1

Subjects to be Learned

• Union of FAs
• Concatenation of FAs
• Kleene Star of FAs
• Acceptance of regular languages by FAs

Contents

Kleene's theorem states that any regular language is accepted by an FA and, conversely, that any
language accepted by an FA is regular.

Theorem 1 (Part 1 of Kleene's theorem): Any regular language is accepted by a finite
automaton.

Proof: This is going to be proven by (general) induction following the recursive definition
of regular languages.

Basis Step: As shown below, the languages ∅ , { Λ } and { a } for any symbol a in Σ are
accepted by an FA.

Inductive Step: We are going to show that for any languages L1 and L2, if they are accepted
by FAs, then L1 ∪ L2 , L1L2 and L1* are accepted by FAs. Since any regular language is
obtained from ∅ , { Λ } and { a } for any symbol a in Σ by using the union, concatenation and
Kleene star operations, this together with the Basis Step proves the theorem.

Suppose that L1 and L2 are accepted by FAs M1 = < Q1 , Σ , q1,0 , δ1 , A1 > and M2 = < Q2 , Σ , q2,0 , δ2 , A2 > , respectively. We assume that Q1 ∩ Q2 = ∅ without loss of generality,
since states can be renamed if necessary.
Then L1 ∪ L2 , L1L2 and L1* are accepted by the FAs Mu = < Qu , Σ , qu,0 , δu , Au > , Mc
= < Qc , Σ , qc,0 , δc , Ac > and Mk = < Qk , Σ , qk,0 , δk , Ak > , respectively, which are
given below.

Mu = < Qu , Σ , qu,0 , δu , Au > :

Qu = Q1 ∪ Q2 ∪ { qu,0 } , where qu,0 is a state which is neither in Q1 nor in Q2.
δu = δ1 ∪ δ2 ∪ { (qu,0, Λ, { q1,0 , q2,0 }) } , that is, δu(qu,0, Λ) = { q1,0 , q2,0 }. Note that
δu(qu,0, a) = ∅ for all a in Σ.
Au = A1 ∪ A2

Mc = < Qc , Σ , qc,0 , δc , Ac > :

Qc = Q1 ∪ Q2
qc,0 = q1,0
δc = δ1 ∪ δ2 ∪ { (q, Λ, { q2,0 }) | q ∈ A1 }
Ac = A2

Mk = < Qk , Σ , qk,0 , δk , Ak > :

Qk = Q1 ∪ { qk,0 } , where qk,0 is a state which is not in Q1.
δk = δ1 ∪ { (qk,0, Λ, { q1,0 }) } ∪ { (q, Λ, { qk,0 }) | q ∈ A1 }
Ak = { qk,0 }

These NFA-Λs are illustrated below.

It can be proven, though we omit the proofs, that these NFA-Λs, Mu, Mc and Mk, in fact
accept L1 ∪ L2 , L1L2 and L1*, respectively.

End of Proof
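The three constructions Mu, Mc and Mk can be sketched in code. Each machine is a tuple (Q, q0, δ, Λ-arcs, A); the fresh state names 'u0' and 'k0' and the `atom` helper (an FA for a single symbol) are illustrative additions, and the state sets of the two machines are assumed disjoint.

```python
def atom(a, name):
    # Two fresh states accepting exactly the one-symbol string a.
    s, f = name + '0', name + '1'
    return {s, f}, s, {(s, a): {f}}, {}, {f}

def union(m1, m2):
    Q1, q1, d1, l1, A1 = m1
    Q2, q2, d2, l2, A2 = m2
    q_u = 'u0'  # fresh start state, assumed not in Q1 or Q2
    lam = {**l1, **l2, q_u: {q1, q2}}  # Lambda-arcs to both old starts
    return Q1 | Q2 | {q_u}, q_u, {**d1, **d2}, lam, A1 | A2

def concat(m1, m2):
    Q1, q1, d1, l1, A1 = m1
    Q2, q2, d2, l2, A2 = m2
    lam = {**l1, **l2}
    for q in A1:  # Lambda-arc from each accepting state of M1 to q2,0
        lam[q] = lam.get(q, set()) | {q2}
    return Q1 | Q2, q1, {**d1, **d2}, lam, A2

def star(m1):
    Q1, q1, d1, l1, A1 = m1
    q_k = 'k0'  # fresh start state, also the only accepting state
    lam = dict(l1)
    lam[q_k] = {q1}
    for q in A1:
        lam[q] = lam.get(q, set()) | {q_k}
    return Q1 | {q_k}, q_k, d1, lam, {q_k}

ma, mb = atom('a', 'x'), atom('b', 'y')
Qu, qu, du, lu, Au = union(ma, mb)
print(qu, sorted(Au))  # u0 ['x1', 'y1']
```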

Examples of Mu , Mc and Mk:

Example 1: An NFA-Λ that accepts the language represented by the regular expression
(aa + b)* can be constructed as follows using the operations given above.
Example 2: An NFA-Λ that accepts the language represented by the regular expression
((a + b)a*)* can be constructed as follows using the operations given above.
Kleene's Theorem -- Part 2
Subjects to be Learned

• Languages accepted by FAs are regular

Contents

The converse of Part 1 of Kleene's Theorem also holds true. It states that any language
accepted by a finite automaton is regular.
Before proceeding to a proof outline for the converse, let us study a method to compute
the set of strings accepted by a finite automaton.
Given a finite automaton, first relabel its states with the integers 1 through n, where n is
the number of states of the finite automaton. Next denote by L(p, q, k) the set of strings
representing paths from state p to state q that go through only states numbered no higher
than k. Note that paths may go through arcs and vertices any number of times.
Then the following lemmas hold.

Lemma 1: L(p, q, k+1) = L(p, q, k) ∪ L(p, k+1, k)L(k+1, k+1, k)*L(k+1, q, k).

What this lemma says is that the set of strings representing paths from p to q passing
through states labeled with k+1 or lower numbers consists of the following two sets:
1. L(p, q, k): The set of strings representing paths from p to q passing through states
labeled with k or lower numbers.
2. L(p, k+1, k)L(k+1, k+1, k)*L(k+1, q, k): The set of strings going first from p to k+1,
then from k+1 to k+1 any number of times, then from k+1 to q, all without passing
through states labeled higher than k.
See the figure below for the illustration.

Lemma 2: L(p, q, 0) is regular.
Proof: L(p, q, 0) is the set of strings representing paths from p to q without passing through any
states in between. Hence if p and q are different, then it consists of the single symbols
representing arcs from p to q. If p = q, then Λ is in it as well as the strings representing
any loops at p (they are all single symbols). Since the number of symbols is finite and
since any finite language is regular, L(p, q, 0) is regular.

From Lemmas 1 and 2, by induction, the following lemma holds.

Lemma 3: L(p, q, k) is regular for any states p and q and any natural number k.

Since the language accepted by a finite automaton is the union of L(q0, q, n) over all
accepting states q, where n is the number of states of the finite automaton, we have the
following converse of the part 1 of Kleene Theorem.

Theorem 2 (Part 2 of Kleene's Theorem): Any language accepted by a finite automaton


is regular.

Example : Let us find the language accepted by the following finite automaton using the
lemmas.

Let us denote by r(p, q, k) the regular expression for the set of strings L(p, q, k).
Then the language accepted by this NFA is r(1, 1, 3).
By Lemma 1, r(1, 1, 3) = r(1, 1, 2) + r(1, 3, 2)r(3, 3, 2)*r(3, 1, 2) .

r(1, 1, 2):
r(1, 1, 2) = r(1, 1, 1) + r(1, 2, 1)r(2, 2, 1)*r(2, 1, 1)
r(1, 1, 1) = r(1,1,0) + r(1,1,0)r(1,1,0)*r(1,1,0) = a* , since r(1,1,0) = a + Λ .
r(1, 2, 1) = r(1,2,0) + r(1,1,0)r(1,1,0)*r(1,2,0) = a+ , since r(1,2,0) = a .
r(2, 2, 1) = r(2,2,0) + r(2,1,0)r(1,1,0)*r(1,2,0) = ba+ + Λ , since r(2,2,0) = Λ and r(2,1,0) =
b .
r(2, 1, 1) = r(2,1,0) + r(2,1,0)r(1,1,0)*r(1,1,0) = ba* .

Hence r(1, 1, 2) = a* + a+( ba+ )*ba* .

r(1, 3, 2):
r(1, 3, 2) = r(1, 3, 1) + r(1, 2, 1)r(2, 2, 1)*r(2, 3, 1)

r(1, 3, 1) = ∅
r(2, 3, 1) = a

Hence r(1, 3, 2) = a+( ba+ + Λ )*a
= a+( ba+ )*a .

r(3, 3, 2):
r(3, 3, 2) = r(3, 3, 1) + r(3, 2, 1)r(2, 2, 1)*r(2, 3, 1)
r(3, 3, 1) = Λ

r(3, 2, 1) = r(3,2,0) + r(3,1,0)r(1,1,0)*r(1,2,0) = ba+ , since r(3,2,0) = ∅ and r(3,1,0) = b .

Hence r(3, 3, 2) = Λ + ba+( ba+ + Λ )*a
= Λ + ( ba+ )+a .

r(3, 1, 2):
r(3, 1, 2) = r(3, 1, 1) + r(3, 2, 1)r(2, 2, 1)*r(2, 1, 1)
r(3, 1, 1) = r(3,1,0) + r(3,1,0)r(1,1,0)*r(1,1,0) = ba*

Hence r(3, 1, 2) = ba* + ba+( ba+ + Λ )*ba*
= ( ba+ )*ba* .

Hence r(1, 1, 3) = a* + a+( ba+ )*ba* + ( a+( ba+ )*a )( Λ + ( ba+ )+a )*( ba+ )*ba* .

This can be further simplified to (a + ab + aab)*, then to (a + ab)*. The details are left as an
exercise, though it would be quite challenging.
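The final simplification can be checked by brute force. The NFA below is reconstructed from the r(p, q, 0) values given above (a self-loop a on state 1, and arcs 1 -a-> 2, 2 -b-> 1, 2 -a-> 3, 3 -b-> 1; start and accepting state 1); if that reconstruction is right, the automaton agrees with (a + ab)* on every short string.

```python
import re
from itertools import product

# Transitions reconstructed from the r(p, q, 0) values in the text.
delta = {(1, 'a'): {1, 2}, (2, 'b'): {1}, (2, 'a'): {3}, (3, 'b'): {1}}

def accepts(w):
    current = {1}
    for c in w:
        current = {t for p in current for t in delta.get((p, c), set())}
    return 1 in current

# Compare against the simplified expression (a + ab)* on all strings
# over {a, b} up to length 8.
for n in range(9):
    for w in map(''.join, product('ab', repeat=n)):
        assert accepts(w) == bool(re.fullmatch(r'(a|ab)*', w))
print('NFA agrees with (a + ab)* on all strings up to length 8')
```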

In this example there is only one accepting state. If there are more accepting states, then
r(p, q, n) must be found for each accepting state q, where p is the initial state and n is the
number of states in the given finite automaton, and all the r(p, q, n)'s must be added
together to get the regular expression for the language accepted by the automaton.

Complement and Intersection of Regular Languages

Subjects to be Learned

• Complement of a Regular Language
• Complement of a DFA
• Intersection of Regular Languages

Contents

Complement

Let M = < Q , Σ , q0 , δ , A > be a DFA that accepts a language L. Then a DFA that
accepts the complement of L, i.e. Σ* - L, can be obtained by swapping its accepting
states with its non-accepting states, that is, Mc = < Q , Σ , q0 , δ , Q - A > is a DFA that
accepts Σ* - L .
For example the following DFA accepts the language a+ over Σ = { a , b }.

A DFA that accepts its complement is obtained from the above DFA by changing all
single circles to double circles and vice versa as shown below.

Remark 1: If we have an NFA rather than a DFA, we must first convert it to a DFA before
swapping states to get its complement.

Remark 2: Since a language is regular if and only if it is accepted by some NFA, the
complement of a regular language is also regular.

Intersection of Regular Languages

Languages are sets. Therefore all the properties of sets are inherited by languages. In
particular De Morgan's law also applies to languages. By Remark 2 above, if L1 and L2
are regular languages, then their complements are regular languages. Since L1 ∩ L2 is,
by De Morgan's law, the complement of ( the complement of L1 ∪ the complement of L2 ),
L1 ∩ L2 is regular.

Thus, summing all this up, we can say that the set of regular languages over an alphabet is
closed with respect to the union, intersection, difference, concatenation and Kleene star
operations.
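Both operations are short to sketch in code. `complement` swaps the accepting and non-accepting states of a total DFA; `intersection` uses the standard product construction, which yields the same language as the De Morgan argument above. The a+ DFA below mirrors the example from the text, with an explicit dead state 'd' added so that the transition function is total.

```python
def run(dfa, w):
    Q, sigma, q, delta, A = dfa
    for c in w:
        q = delta[(q, c)]
    return q in A

def complement(dfa):
    # Swap accepting and non-accepting states of a total DFA.
    Q, sigma, q0, delta, A = dfa
    return Q, sigma, q0, delta, Q - A

def intersection(d1, d2):
    # Product construction: run both DFAs in parallel and accept
    # exactly when both components accept.
    Q1, sigma, q1, t1, A1 = d1
    Q2, _, q2, t2, A2 = d2
    Q = {(p, q) for p in Q1 for q in Q2}
    delta = {((p, q), a): (t1[(p, a)], t2[(q, a)])
             for p in Q1 for q in Q2 for a in sigma}
    return Q, sigma, (q1, q2), delta, {(p, q) for p in A1 for q in A2}

# DFA for a+ over {a, b}, with an explicit dead state 'd'.
M = ({0, 1, 'd'}, 'ab', 0,
     {(0, 'a'): 1, (0, 'b'): 'd', (1, 'a'): 1, (1, 'b'): 'd',
      ('d', 'a'): 'd', ('d', 'b'): 'd'},
     {1})
Mc = complement(M)
print(run(M, 'aa'), run(Mc, 'aa'), run(Mc, 'ab'))  # True False True
```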

Test Your Understanding of Complement and Intersection of FAs

Indicate which of the following statements are correct and which are not.
Click True or False, then Submit.

Next -- Regular Grammar

Back to Study Schedule

Back to Table of Contents

Finite Automata

Regular Grammar

Subjects to be Learned

• Production and Grammar


• Regular Grammar
• Context-Free, Context-Sensitive and Phrase Structure Grammars
Contents

We have learned three ways of characterizing regular languages: regular expressions,
finite automata and construction from simple languages using simple operations. There is
yet another way of characterizing them, that is, by something called a grammar. A grammar
is a set of rewrite rules which are used to generate strings by successively rewriting
symbols. For example consider the language represented by a+, which is { a, aa, aaa, . . . }.
One can generate the strings of this language by the following procedure: Let S be a
symbol to start the process with. Rewrite S using one of the following two rules: S -> a ,
and S -> aS . These rules mean that S is rewritten as a or as aS. To generate the string aa,
for example, start with S and apply the second rule to replace S with the right hand side
of the rule, i.e. aS, to obtain aS. Then apply the first rule to aS to rewrite S as a. That
gives us aa. We write S => aS to express that aS is obtained from S by applying a single
production. Thus the process of obtaining aa from S is written as S => aS => aa . If we
are not interested in the intermediate steps, the fact that aa is obtained from S is written as
S =>* aa . In general if a string β is obtained from a string α by applying productions of
a grammar G, we write α =>*G β and say that β is derived from α. If there is no
ambiguity about the grammar G that is referred to, then we simply write α =>* β .

Formally a grammar consists of a set of nonterminals (or variables) V, a set of terminals Σ
(the alphabet of the language), a start symbol S, which is a nonterminal, and a set of
rewrite rules (productions) P. A production in general has the form α -> β , where α is a
string of terminals and nonterminals with at least one nonterminal in it and β is a string
of terminals and nonterminals. A grammar is regular if and only if α is a single
nonterminal and β is a single terminal or a single terminal followed by a single
nonterminal, that is, every production is of the form X -> a or X -> aY, where X and Y are
nonterminals and a is a terminal.

For example, Σ = {a, b}, V = { S } and P = { S -> aS, S -> bS, S -> Λ } form a regular
grammar and it generates all the strings consisting of a's and b's including the empty
string.

The following theorem holds for regular grammars.

Theorem 3: A language L is accepted by an FA, i.e. regular, if L - { Λ } can be generated
by a regular grammar.

This can be proven by constructing an FA for the given grammar as follows: For each
nonterminal create a state. S corresponds to the initial state. Add another state as the
accepting state Z. Then for every production X -> aY, add the transition δ(X, a) = Y,
and for every production X -> a, add the transition δ(X, a) = Z.
For example, Σ = {a, b}, V = { S } and P = { S -> aS, S -> bS, S -> a, S -> b } form a
regular grammar which generates the language ( a + b )+. An NFA that recognizes this
language can be obtained by creating two states S and Z, and adding transitions δ(S, a)
= { S, Z } and δ(S, b) = { S, Z } , where S is the initial state and Z is the accepting state
of the NFA.

The NFA thus obtained is shown below.

Thus L - { Λ } is regular. If L contains Λ as its member, then since { Λ } is regular, L =
( L - { Λ } ) ∪ { Λ } is also regular.
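The construction from the proof of Theorem 3 can be sketched as follows, using the ( a + b )+ grammar from the text. The state name 'Z' for the extra accepting state follows the text; single-character nonterminals are assumed for simplicity.

```python
def grammar_to_nfa(productions, start='S'):
    # X -> aY  gives Y in delta(X, a);
    # X -> a   gives a transition to the extra accepting state 'Z'.
    delta = {}
    for lhs, rhs in productions:
        if len(rhs) == 2:        # X -> aY
            a, y = rhs[0], rhs[1]
            delta.setdefault((lhs, a), set()).add(y)
        else:                    # X -> a
            delta.setdefault((lhs, rhs), set()).add('Z')
    return start, delta, {'Z'}

def accepts(nfa, w):
    start, delta, A = nfa
    current = {start}
    for c in w:
        current = {t for p in current for t in delta.get((p, c), set())}
    return bool(current & A)

# The grammar from the text: P = { S -> aS, S -> bS, S -> a, S -> b },
# which generates ( a + b )+.
nfa = grammar_to_nfa([('S', 'aS'), ('S', 'bS'), ('S', 'a'), ('S', 'b')])
print(accepts(nfa, 'ab'))  # True
print(accepts(nfa, ''))    # False
```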

Conversely, from any NFA < Q, Σ, δ, q0, A > a regular grammar < Q, Σ, P, q0 > is
obtained as follows:
for any a in Σ and any nonterminals X and Y, X -> aY is in P if and only if Y ∈ δ(X, a);
and for any a in Σ and any nonterminal X, X -> a is in P if and only if Y ∈ δ(X, a) for
some accepting state Y.

Thus the following converse of Theorem 3 is obtained.

Theorem 4: If L is regular, i.e. accepted by an NFA, then L - { Λ } is generated by a
regular grammar.

For example, a regular grammar corresponding to the NFA given below is < Q, { a, b },
P, S > , where Q = { S, X, Y } , P = { S -> aS, S -> aX, X -> bS, X -> aY, Y -> bS, S -> a
}.
In addition to regular languages there are three other types of languages in Chomsky
hierarchy : context-free languages, context-sensitive languages and phrase structure
languages. They are characterized by context-free grammars, context-sensitive grammars
and phrase structure grammars, respectively.
These grammars are distinguished by the kind of productions they have but they also
form a hierarchy, that is the set of regular languages is a subset of the set of context-free
languages which is in turn a subset of the set of context-sensitive languages and the set of
context-sensitive languages is a subset of the set of phrase structure languages.
A grammar is a context-free grammar if and only if every production is of the form X ->
β , where β is a string of terminals and nonterminals, possibly the empty string.
For example P = { S -> aSb, S -> ab } with Σ = { a, b } and V = { S } is a context-free
grammar and it generates the language { a^n b^n | n is a positive integer } . As we shall see
later, this is an example of a context-free language which is not regular.
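One can check mechanically that this grammar generates only strings of the form a^n b^n, by expanding the leftmost S in all possible ways up to a bounded derivation depth:

```python
# Generate all strings derivable from S with P = { S -> aSb, S -> ab }
# within a given number of rewriting steps.
def derive(depth):
    strings = set()
    def expand(s, d):
        if 'S' not in s:
            strings.add(s)
            return
        if d == 0:
            return
        for rhs in ('aSb', 'ab'):
            expand(s.replace('S', rhs, 1), d - 1)
    expand('S', depth)
    return strings

generated = derive(5)
print(sorted(generated, key=len))
# ['ab', 'aabb', 'aaabbb', 'aaaabbbb', 'aaaaabbbbb']
assert all(s == 'a' * (len(s) // 2) + 'b' * (len(s) // 2) for s in generated)
```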

A grammar is a context-sensitive grammar if and only if every production is of the form
α1Xα2 -> α1βα2 , where X is a nonterminal and α1 , α2 and β are strings of terminals
and nonterminals, possibly empty except β .

Thus the nonterminal X can be rewritten as β only in the context of α1Xα2 .

For example P = { S -> XYZS1, S -> XYZ, S1 -> XYZS1, S1 -> XYZ, YX -> XY, ZX ->
XZ, ZY -> YZ, X -> a, aX -> aa, aY -> ab, bY -> bb, bZ -> bc, cZ -> cc } with Σ = { a,
b, c } and V = { X, Y, Z, S, S1 } is a context-sensitive grammar and it generates the
language { a^n b^n c^n | n is a positive integer } . It is an example of a context-sensitive language
which is not context-free.

Context-sensitive grammars are also characterized by productions whose left hand side is
not longer than the right hand side, that is, for every production α -> β , | α | ≤ | β | .

For a phrase structure grammar, there is no restriction on the form of productions, that
is, a production of a phrase structure grammar can take the form α -> β , where α and β
can be any strings.

Test Your Understanding of Regular Grammar


Indicate which of the following statements are correct and which are not.
Click True or False, then Submit.
There are two sets of questions.

Next -- Minimization of DFA

Back to Study Schedule

Back to Table of Contents

Finite Automata

Minimization of DFA

One important result on finite automata, both theoretically and practically, is that for any
regular language there is a unique DFA having the smallest number of states that accepts
it. Let M = < Q , Σ , q0 , δ , A > be a DFA that accepts a language L. Then the following
algorithm produces the DFA, denote it by M1, that has the smallest number of states
among the DFAs that accept L.

Minimization Algorithm for DFA

Construct a partition Π = { A, Q - A } of the set of states Q ;
Π_new := new_partition( Π ) ;
while ( Π ≠ Π_new )
    Π := Π_new ;
    Π_new := new_partition( Π )
Π_final := Π ;

function new_partition( Π )
for each set S of Π do
    partition S into subsets such that two states p and q of S are in the same subset of S
    if and only if for each input symbol, p and q make a transition to (states of) the same
    set of Π .
    The subsets thus formed are sets of the output partition in place of S.
    If S is not partitioned in this process, S remains in the output partition.
end

The minimum DFA M1 is constructed from Π_final as follows:

• Select one state in each set of the partition Π_final as the representative for the set.
These representatives are states of the minimum DFA M1.
• Let p and q be representatives, i.e. states of the minimum DFA M1. Let us also denote
by p and q the sets of states of the original DFA M represented by p and q,
respectively. Let s be a state in p and t a state in q. If a transition from s to t on
symbol a exists in M, then the minimum DFA M1 has a transition from p to q on
symbol a.
• The start state of M1 is the representative which contains the start state of M.
• The accepting states of M1 are the representatives that are in A.
Note that the sets of Π_final are either a subset of A or disjoint from A.

Remove from M1 the dead states and the states not reachable from the start state, if there
are any. Any transitions to a dead state become undefined.
A state is a dead state if it is not an accepting state and has no out-going transitions
except to itself.
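The partition-refinement loop can be sketched as follows. The example DFA at the bottom is hypothetical (the figures from the text are not reproduced here); in it, states 2 and 3 are equivalent and get merged into one block.

```python
def minimize(Q, sigma, delta, A):
    # Start with the partition { A, Q - A } and refine it until stable.
    part = [s for s in (set(A), set(Q) - set(A)) if s]
    while True:
        def block_of(q):
            return next(i for i, s in enumerate(part) if q in s)
        new = []
        for s in part:
            # Group states of s by which block each symbol leads to.
            groups = {}
            for q in s:
                sig = tuple(block_of(delta[(q, a)]) for a in sigma)
                groups.setdefault(sig, set()).add(q)
            new.extend(groups.values())
        if len(new) == len(part):  # no set was split: fixed point
            return new
        part = new

# A hypothetical 4-state DFA over {a, b}; accepting state 1.
Q = {0, 1, 2, 3}
delta = {(0, 'a'): 0, (0, 'b'): 2, (1, 'a'): 1, (1, 'b'): 3,
         (2, 'a'): 1, (2, 'b'): 2, (3, 'a'): 1, (3, 'b'): 3}
A = {1}
blocks = minimize(Q, 'ab', delta, A)
print(sorted(sorted(b) for b in blocks))  # [[0], [1], [2, 3]]
```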

Example 1:
Let us try to minimize the number of states of the following DFA.
Initially Π = { { 1 , 5 } , { 2 , 3 , 4 } }.

new_partition is applied to Π .
Since on b state 2 goes to state 1, state 3 goes to state 4, and 1 and 4 are in different sets in
Π , states 2 and 3 are going to be separated from each other in Π_new .
Also since on a state 4 goes to state 4, state 3 goes to state 5, and 4 and 5 are in different
sets in Π , states 3 and 4 are going to be separated from each other in Π_new .
Further, since on b 2 goes to 1, 4 goes to 4, and 1 and 4 are in different sets in Π , 2 and 4
are separated from each other in Π_new .
On the other hand 1 and 5 make the same transitions. So they are not going to be split.

Thus the new partition is { { 1 , 5 } , { 2 } , { 3 } , { 4 } }. This becomes the Π in the
second iteration.

When new_partition is applied to this new Π , since 1 and 5 make the same transitions, Π
remains unchanged.
Thus Π_final = { { 1 , 5 } , { 2 } , { 3 } , { 4 } }.

Select 1 as the representative for { 1 , 5 }. Since the rest are singletons, they have the
obvious representatives.
Note here that state 4 is a dead state because the only transition out of it is to itself.
Thus the set of states for the minimized DFA is { 1 , 2 , 3 }.
For the transitions, since 1 goes to 3 on a, and to 2 on b in the original DFA, in the
minimized DFA transitions are added from 1 to 3 on a, and from 1 to 2 on b. Also since 2 goes
to 1 on b, and 3 goes to 1 (the representative of 5) on a, in the minimized DFA transitions are
added from 2 to 1 on b, and from 3 to 1 on a.
Since the rest of the states are singletons, all transitions between them are inherited for
the minimized DFA.
Thus the minimized DFA is as given in the following figure:

Example 2:
Let us try to minimize the number of states of the following DFA.
Initially Π = { { 3 } , { 1 , 2 , 4 , 5 , 6 } }.
By applying new_partition to this Π , Π_new = { { 3 } , { 1 , 4 , 5 } , { 2 , 6 } } is
obtained.
Applying new_partition to this Π , Π_new = { { 3 } , { 1 , 4 } , { 5 } , { 2 } , { 6 } } is
obtained.
Applying new_partition again, Π_new = { { 1 } , { 2 } , { 3 } , { 4 } , { 5 } , { 6 } } is
obtained.
Thus the number of states of the given DFA is already minimum and it cannot be
reduced any further.

Test Your Understanding of Minimization of DFA


Indicate which of the following statements are correct and which are not.
Click True or False, then Submit.

Next -- Application of FA

Back to Study Schedule

Back to Table of Contents

Finite Automata

Application of Finite Automata

Subjects to be Learned

• Reactive system
• Modeling reactive systems with FA

Contents
We have seen an example of the use of finite automata in describing the operation of a simplified version
of a vending machine. Many other systems operating in practice can also be modeled by
finite automata, such as control circuits of computers, computer network communication
protocols, lexical analysers for compilers, etc. Many of those systems fall into the class of
systems called reactive systems. A reactive system is a system that changes its actions,
outputs and conditions/status in response to stimuli from within or outside it. It is an
event-driven or control-driven system continuously having to react to external and/or
internal stimuli. The inputs for a reactive system are never ready, unlike, for example,
when two numbers are added together by an adder (here we are considering an adder at
a higher level of abstraction than the physical device level, ignoring for example the
transient states of the electronic circuit that realizes an adder). An adder does not
respond unless the input, i.e. the two numbers to be added, is ready. A system such as an
adder is called a transformational system. In the case of a vending machine or a
communication protocol, on the other hand, the system must respond to each stimulus,
even to a fragment of input such as each coin tossed in for a can of soda or every message
received.

It is generally agreed that finite automata are a natural medium for describing the dynamic
behavior of reactive systems. Finite automata are formal and rigorous, and computer
programs can easily be written to simulate their behavior.
To model a reactive system with a finite automaton, first the states the system goes through,
or the modes of its operation, are identified. These become the states of the finite automaton
that models it. Then the transitions between the states, triggered by events and conditions
external or internal to the system, are identified, and they become arcs in the transition
diagram of the finite automaton. In addition, actions that may take place in those states
can also be added to the model.
For example, consider the following very simplified version of the login process to a computer,
from the computer's point of view. Let us assume for simplicity that this computer accepts
a single user at a time.
Initially the computer waits for a user name to be typed in. This is one state of the
system. When a name is typed in, it checks whether or not the name is valid. If it is valid,
then it asks for and then waits for the password, which is another state. If the user name
typed in is not valid, it goes back to the initial state. We could make it go to a different
state and count the number of login attempts for security purposes, but let us keep it
simple. When a password is typed in and it is correct, then it accepts the user and starts a
session. That is another state, though it could be broken down further into a number of
states. When the session terminates, it gets a signal, goes back to the initial state
and waits for another login. If the password typed in is incorrect, then it informs the user
of that and waits for the next try. That is a fourth state. If the second password also fails, it
goes back to the initial state and starts all over again. Again, what we have seen is a model at
one level of abstraction. Depending on how much detail we are interested in, different
states would be identified and transitions would have to be selected accordingly.

The next example is a protocol for a computer to follow in communicating with another
computer. Again, it is a very simplified version.

Initially the computer is in a wait state, waiting for a "Request for Next Message" (RFNM) to
come from another computer. When an RFNM starts coming, it goes into the state of
receiving it (our interpretation is that the computer is in the state of receiving an RFNM
and it is taking the action of receiving the RFNM). Upon completion of the RFNM, it
sends an "Acknowledgement" (ACK) to the other computer, which is another state. After
sending the ACK, it starts sending the requested message to the other party. When that is
complete, it goes into another wait state, waiting for an ACK to come from the other
computer. If a negative ACK is received, it resends the message. If a positive ACK is
received, it goes back to the initial state and waits for another RFNM to come. Thus a
finite automaton that models this protocol has the following five states: the initial state (wait
for RFNM), receiving RFNM, sending ACK, sending message, and waiting for ACK.
Again, depending on the level of abstraction, different states and transitions would have
to be chosen.
Our third example is a system that recognizes numbers with or without a sign such as
5.378, -15, +213.8 etc. One such system initially waits for the first symbol to come in. If
the first symbol is a sign, then it goes into a state, denote it by G, that indicates that a sign
has been received.
If the first digit is received before a decimal point, regardless of whether a sign has been
read or not, it goes into a state, denote it by D, that indicates a digit has been read before
a decimal point.
If a decimal point is received before a digit, then it goes into a state, denote it by P, that
indicates that a decimal point has been read.
If a decimal point has been read (i.e. in state P), then it must receive at least one digit
after that. After one digit it can continue receiving digits. Therefore from state P it goes
to another state, denote it by Q, after reading a digit and stays there as long as digits are
read. This Q is an accepting state.
On the other hand if a digit has been read before a decimal point, i.e. it is in state D, then
it can continue receiving digits and stay in D. D is another accepting state. If a decimal
point is read while in D, then it goes to state P indicating that a decimal point has been
read.

This system can also be described by a regular expression.

Since these numbers are represented by strings consisting of an optional sign, followed by
zero or more digits, followed by an optional decimal point, followed by one or more digits,
they can be represented by the following regular expression:

( s+ + s- + Λ ) ( d+.d+ + d+ + .d+ ) ,

where s+ and s- represent the positive and negative signs, respectively, Λ denotes the
empty string, d ∈ { 0, 1, 2, . . . , 9 } and d+ denotes one or more digits. This system can
be modeled by the following finite automaton:
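A direct way to check the description above is to code the recognizer. The sketch below follows the states named in the text (S for the initial state, plus G, D, P and Q), with D and Q accepting; treating any unexpected symbol as an immediate reject is an assumption standing in for the omitted dead state.

```python
# DFA for numbers with an optional sign, using the states from the text:
# S = start, G = sign read, D = digit read before the point,
# P = decimal point read, Q = digit read after the point.
# D and Q are the accepting states.

def accepts_number(s):
    state = "S"
    for c in s:
        if state in ("S", "G"):
            if c.isdigit():
                state = "D"
            elif c == ".":
                state = "P"
            elif c in "+-" and state == "S":   # a sign is allowed only first
                state = "G"
            else:
                return False                   # assumed dead-state behavior
        elif state == "D":
            if c.isdigit():
                pass                           # stay in D
            elif c == ".":
                state = "P"
            else:
                return False
        elif state in ("P", "Q"):
            if c.isdigit():
                state = "Q"
            else:
                return False
    return state in ("D", "Q")
```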
Next -- Simulation of FA

Back to Study Schedule

Back to Table of Contents

Finite Automata

Simulation of Finite Automata

Subjects to be Learned

• Simulation of FA

Contents

Once a finite automaton is constructed, we can use a general-purpose program to
simulate its operation. One such simulation algorithm is given below. It uses four arrays.
One array, called TOKEN, stores for each state the input symbols that trigger transitions
from the state. Another array, called NEXT_STATE, holds the next state for each input
symbol for each state. A third array, called ACTION, indicates the actions taken at each
state, and a fourth, called STATEX, keeps the index of the first symbol in the TOKEN
array for each state. Those indices are used to access the contents of the other arrays.

Algorithm FA Simulator

state := INITIAL_STATE;
input := read_input( ) ;

while ( state ≤ NO_OF_STATES and not End_of_Input )

    index := STATEX [state] ;

    while ( TOKEN [index] ≠ 0 and TOKEN [index] ≠ input )
        index := index + 1;

    if ( TOKEN [index] ≠ 0 )
        perform the action specified by ACTION [index];
        state := NEXT_STATE [index];
    else
        error;

    input := read_input( ) ;

end

Here 0 in the TOKEN array is a marker between states.
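As a sketch of how the four arrays drive the simulation, here is a small Python version for an invented two-state DFA over { a, b } that accepts strings with an even number of a's (state 1 is both initial and accepting). The arrays are padded with an unused 0th entry so that indices match the 1-based pseudocode, and the ACTION array is left out since it is empty in the example that follows.

```python
# Table-driven simulation following the algorithm above.
# TOKEN holds, for each state, its input symbols followed by the marker 0;
# STATEX gives the first TOKEN index for each state.
# The particular DFA (even number of a's) is an illustrative assumption.

TOKEN      = [None, "a", "b", 0, "a", "b", 0]
NEXT_STATE = [None,  2,   1,  0,  1,   2,  0]
STATEX     = [None, 1, 4]          # first TOKEN index for states 1 and 2

def simulate(string, state=1, accepting={1}):
    for symbol in string:
        index = STATEX[state]
        while TOKEN[index] != 0 and TOKEN[index] != symbol:
            index += 1
        if TOKEN[index] == 0:      # hit the marker: no transition ("error")
            return False
        state = NEXT_STATE[index]
    return state in accepting
```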


In the DFA below all the transitions to the empty (dead) state are omitted. The ACTION
array would contain pointers to actions to be taken as arcs are traversed, such as
converting a digit from BCD form to the corresponding binary number. At the moment it
is empty, so no action is taken as a number is processed.
S is the initial state and B and H are the accepting states.
The numbers below the NEXT_STATE array show the correspondence between the indices
of the STATEX array and the states A, B, C and H; S corresponds to 1.
Next -- Non-Regular Languages


Regular Languages

Non-Regular Languages

Subjects to be Learned

• Existence of non-regular languages


• Myhill - Nerode Theorem for non-regularity test
• Pumping Lemma

Contents

We have learned regular languages, their properties and their usefulness for describing
various systems. There are, however, languages that are not regular and that therefore require
devices other than finite automata to recognize them. In this section we are going to study
some methods for testing given languages for regularity, and see some languages
that are not regular. The main idea behind these test methods is that finite
automata have only a finite amount of memory, in the form of states, and therefore cannot
distinguish infinitely many strings. For example, to recognize the language { a^n b^n | n is a
natural number }, a finite automaton must remember how many a's it has read when it
starts reading b's. Thus it must be in different states when it has read different numbers of
a's and starts reading the first b. But any finite automaton has only a finite number of states.
Thus there is no way for a finite automaton to remember how many a's it has read for all
possible strings a^n b^n. That is the main limitation of finite automata. Since a regular
language must be recognized by a finite automaton, we can conclude that { a^n b^n | n is a
natural number } is not regular. This is the basis of the two regularity test methods we
are going to study below: the Myhill-Nerode theorem and the Pumping Lemma.

Non-regularity test based on Myhill-Nerode's theorem

Indistinguishability of strings: Strings x and y in Σ* are indistinguishable with
respect to a language L if and only if for every string z in Σ*, either xz and yz are both
in L or they are both not in L.
For example, a and aa are indistinguishable with respect to the language { a^n | n is a
positive integer } over the alphabet { a }, because for any positive integer k, a a^k and
aa a^k are both in the language. However, with respect to the language { a^n b^n }, a and
aa are not indistinguishable (hence distinguishable), because ab is in the language while
aab is not.

Using this concept of indistinguishability, the following theorem by Myhill and Nerode
gives a criterion for the (non)regularity of a language. It is stated without a proof. For more
on the Myhill-Nerode theorem, see the section on it below.

Theorem : A language L over an alphabet Σ is nonregular if and only if there is an infinite
subset of Σ* whose strings are pairwise distinguishable with respect to L.

Example 1: L1 = { a^n b^n | n is a positive integer } over the alphabet { a , b } can be shown to
be nonregular using Myhill-Nerode as follows:
Consider the set of strings S1 = { a^n | n is a positive integer } . S1 is over the alphabet { a , b }
and it is infinite. We are going to show that its strings are pairwise distinguishable with
respect to L1. Let a^k and a^m be two arbitrary different members of the set S1, where k and
m are positive integers and k ≠ m . Select b^m as a string to be appended to a^k and a^m .
Then a^k b^m is not in L1 while a^m b^m is in L1 . Hence a^k and a^m are distinguishable with
respect to L1 . Since a^k and a^m are arbitrary strings of S1, S1 satisfies the conditions of the
Myhill-Nerode theorem. Hence L1 is nonregular.
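The argument of Example 1 can be checked mechanically for particular k and m: the extension z = b^m puts a^m b^m in L1 but a^k b^m outside it. The helper names below are illustrative.

```python
# Checking distinguishability with respect to L1 = { a^n b^n | n >= 1 }.

def in_L1(s):
    n = len(s) // 2
    return n > 0 and s == "a" * n + "b" * n

def distinguishes(z, x, y, membership):
    """True if z tells x and y apart: exactly one of xz, yz is in the language."""
    return membership(x + z) != membership(y + z)
```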

Example 2: L2 = { ww | w ∈ { a , b }* } is nonregular.

Consider the set of strings S2, which is the same as S1 of Example 1 above. It can be
shown to be pairwise distinguishable with respect to L2 as follows.
Let a^k and a^m be two arbitrary different members of the set, where k and m are positive
integers and k ≠ m . Select b a^k b as a string to be appended to a^k and a^m . Then a^k b a^k b is in
L2 while a^m b a^k b is not in L2 . Hence a^k and a^m are distinguishable with respect to L2 .
Since a^k and a^m are arbitrary strings of S2, S2 satisfies the conditions of the Myhill-Nerode
theorem. Hence L2 is nonregular.

Example 3: Let L3 be the set of algebraic expressions involving the identifiers x and y,
the operations + and * and left and right parentheses.
L3 can be defined recursively as follows:
Basis Clause: x and y are in L3 .

Inductive Clause: If α and β are in L3 , then ( α + β ) and ( α * β ) are in L3 .

Extremal Clause: Nothing is in L3 unless it is obtained from the above two clauses.

For example, x , (x*y) , ( ( x + y ) * x ) and (( (x*y) + x ) + (y*y) ) are algebraic
expressions.

Consider the set of strings S3 = { (^k x | k is a positive integer } , that is, the set of strings
consisting of one or more left parentheses followed by the identifier x. This set is infinite,
and it can be shown to be pairwise distinguishable with respect to L3 as follows:
Let (^k x and (^m x be two arbitrary strings of S3 , where k and m are positive integers and
k ≠ m . Select [ + x ) ]^k as the string to be appended to (^k x and (^m x . For example [ + x ) ]^3 is
+x)+x)+x) .
Then (^k x [ + x ) ]^k is in L3 but (^m x [ + x ) ]^k is not in L3, because the number of ('s is
not equal to the number of )'s in the latter string. Hence S3 is pairwise distinguishable
with respect to L3 . Hence L3 is not regular.

Pumping Lemma

Let us consider the NFA given below.

This NFA accepts among others some strings of length greater than 5 such as abbabbb,
abbabbabbb etc. Those strings which are accepted by this NFA and whose length is
greater than 5 have a substring which can be repeated any number of times without being
rejected by the NFA. For example the string abbabbb is accepted by the NFA and if one
of its substrings bba is repeated any number of times in abbabbb, the resultant strings
such as abbb (bba repeated 0 times), abbabbabbb, abbabbabbabbb etc. are also accepted
by the NFA.
In general if a string w (such as abbabbb in the example above) is accepted by an NFA
with n states and if its length is longer than n, then there must be a cycle in the NFA
along some path from the initial state to some accepting state (such as the cycle 2-3-4-2
in the above example). Then the substring representing that cycle (bba in the example)
can be repeated any number of times within the string w without being rejected by the
NFA.
The following theorem which is called Pumping Lemma is based on this observation. It
states that if a language is regular, then any long enough string of the language has a
substring which can be repeated any number of times with the resultant strings still in the
language. It is stated without a proof here.

Pumping Lemma : Suppose that a language L is regular. Then there is an FA that
accepts L. Let n be the number of states of that FA. Then for any string x in L with |x| ≥
n, there are strings u, v and w which satisfy the following relationships:

x = uvw ,
|uv| ≤ n ,
|v| > 0 , and

for every integer m ≥ 0, u v^m w ∈ L.

Note that the Pumping Lemma gives a necessary condition for regular languages and not a
sufficient one; that is, even if there is an integer n that satisfies the conditions of the Pumping
Lemma, the language is not necessarily regular. Thus the Pumping Lemma cannot be used to
prove the regularity of a language. It can only show that a language is nonregular.

Example 4: As an example to illustrate how the Pumping Lemma might be used to prove
that a language is nonregular, let us prove that the language L = { a^k b^k | k is a natural
number } is nonregular.

Suppose that L is regular, and let n be the number of states of an FA that accepts L.
Consider the string x = a^n b^n for that n. Then there must be strings u, v, and w such that

x = uvw ,
|uv| ≤ n ,
|v| > 0 , and

for every m ≥ 0, u v^m w ∈ L.

Since |v| > 0 , v has at least one symbol. Also, since |uv| ≤ n, v = a^p for some p > 0.
Let us now consider the string u v^m w for m = 2.

Then u v^2 w = a^(n-p) a^(2p) b^n = a^(n+p) b^n . Since p > 0 , n + p ≠ n . Hence a^(n+p) b^n cannot be in the
language L represented by { a^k b^k } .

This violates the condition that for every m ≥ 0, u v^m w ∈ L. Hence L is not a regular
language.
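For modest n this argument can also be confirmed by brute force: every decomposition x = uvw of a^n b^n with |uv| ≤ n and |v| > 0 already fails the pumping condition at m = 2. A sketch:

```python
# Brute-force companion to Example 4. Since |uv| <= n forces v to consist
# of a's only, pumping v once more yields a^(n+p) b^n, which is not in L.

def in_L(s):
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

def some_decomposition_pumps(n, m=2):
    x = "a" * n + "b" * n
    for i in range(n + 1):              # |u| = i
        for j in range(i + 1, n + 1):   # |uv| = j <= n, so |v| = j - i > 0
            u, v, w = x[:i], x[i:j], x[j:]
            if in_L(u + v * m + w):
                return True
    return False                        # no decomposition survives pumping
```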


Next -- Context-Free Grammar


Myhill-Nerode Theorem

The non-regularity test for languages by Myhill-Nerode is based on the following
theorem, which is the contrapositive form of the theorem used for the nonregularity test.
It is also a corollary to the Myhill-Nerode theorem.

Let ≡_L be the following relation on Σ*: for strings x and y of Σ*, x ≡_L y if and only if

{ z | xz ∈ L } = { z | yz ∈ L } , that is,

x ≡_L y if and only if x and y are indistinguishable with respect to L. Then the theorem is
stated as follows:
Theorem: A language L over an alphabet Σ is regular if and only if the set of equivalence
classes of ≡_L is finite.
Proof of Theorem

Necessity

Suppose that a language L is regular and two strings, say x and y, are distinguishable with
respect to L. Then there is a string z such that xz is in L and yz is not in L (or xz is not in
L and yz is in L). This means that if x and y are read by a DFA that recognizes L, the
DFA reaches different states. If there are three strings that are pairwise distinguishable with
respect to L, then the DFA reaches three different states after reading those three strings,
and so on. Hence if there are infinitely many strings to be distinguished with respect to L,
then the DFA must have infinitely many states, which it cannot, because a DFA must have
a finite number of states.

Hence if there is an infinite set of strings which are pairwise distinguishable with respect
to a language, then the language is not regular.

Sufficiency

Conversely, if the number of classes of strings that are pairwise indistinguishable with
respect to a language L is finite, then the language L is regular.

To prove this, let [x] denote the class of strings that are indistinguishable from a string x
with respect to L. Note that "indistinguishable with respect to L" is an equivalence
relation over the set of strings (denote it by ≡_L) and the [x]'s are its equivalence classes. We
will show that a DFA that accepts L can be constructed using these equivalence classes.

Let n be the number of distinct equivalence classes (i.e. the index) of ≡_L and let x1 ,
x2 , ..., xn be representatives of those distinct equivalence classes. Then we construct a
DFA ( Q , Σ , δ , q0 , A ) as follows:

Q = { [x1] , [x2] , ..., [xn] }

q0 = [xi] , where [xi] = [Λ] .

A = { [xi] | xi ∈ L }

δ( [xi] , a ) = [xi a] for all a ∈ Σ and all i.

Let us now show that this machine is in fact a DFA and that it accepts the language L.

First, note that for every string y in [xi] , ya is in exactly one equivalence class, namely
[xi a] . For, if ya and xi a were in different classes, then y and xi would be distinguishable
with respect to L (a string z that distinguishes ya and xi a gives a string az that
distinguishes y and xi), contradicting y ∈ [xi] .
Hence δ is a function.
Next, let us show that this DFA accepts L. For that, first note that if xi ∈ L , then every
string in [xi] is also in L (take z = Λ in the definition of indistinguishability).

We then show that for every string w, δ*( q0 , w ) = [xi] , where [xi] = [w] is the
equivalence class that w belongs to. Our proof is by structural induction on the string w.

Basis Step: w = Λ .

δ*( q0 , Λ ) = q0 by the definition of δ* for a DFA. Then by the definition of q0 ,
q0 = [Λ] . Hence δ*( q0 , Λ ) = [Λ] .

Inductive Step: Assume δ*( q0 , w ) = [xi] , where [xi] = [w] .

Then for every a ∈ Σ , δ*( q0 , wa ) = δ( δ*( q0 , w ) , a ) by the definition of δ*.

But δ*( q0 , w ) = [w] by the induction hypothesis. Hence δ*( q0 , wa ) =
δ( [w] , a ) = [wa] .

Hence δ*( q0 , wa ) = [wa] .

Hence we have shown that for every string w in Σ* , δ*( q0 , w ) = [xi] , where
[xi] = [w] . Since a string in [xi] is in L if and only if xi ∈ L , this means that the DFA accepts
L .

Myhill-Nerode Theorem

Let us here state the Myhill-Nerode Theorem. First, some terminology.

An equivalence relation R on Σ* is said to be right invariant if for every x , y ∈ Σ* , x R y
implies that for every z ∈ Σ* , xz R yz .

Also, an equivalence relation is said to be of finite index if the set of its equivalence
classes is finite.
With this terminology, the Myhill-Nerode Theorem can now be stated as follows:

The following three statements are equivalent:

(1) A language L is regular.
(2) L is the union of some of the equivalence classes of a right invariant equivalence
relation of finite index.

(3) ≡_L is of finite index.

Proofs are omitted.

Context-Free Languages

Context-Free Grammar

Subjects to be Learned

• Context-Free Grammar
• Context-Free Languages
• Push Down Automata

Contents

Earlier in the discussion of grammars we saw context-free grammars. They are grammars
whose productions have the form X -> α , where X is a nonterminal and α is a nonempty
string of terminals and nonterminals. The set of strings generated by a context-free
grammar is called a context-free language, and context-free languages can describe many
practically important systems. Most programming languages can be approximated by
context-free grammars, and compilers for them have been developed based on properties of
context-free languages. Let us define context-free grammars and context-free languages
here.

Definition (Context-Free Grammar) : A 4-tuple G = < V , Σ , S , P > is a context-free
grammar (CFG) if V and Σ are finite sets sharing no elements between them, S ∈ V is
the start symbol, and P is a finite set of productions of the form X -> α , where X ∈ V
and α ∈ ( V ∪ Σ )* .
A language is a context-free language (CFL) if it is generated by some
context-free grammar.

Example 1: L1 = { a^n b^n | n is a positive integer } is a context-free language, for the
following context-free grammar G1 = < V1 , Σ , S , P1 > generates L1 :
V1 = { S } , Σ = { a , b } and P1 = { S -> aSb , S -> ab }.

Example 2: L2 = { w w^r | w ∈ { a , b }+ } is a context-free language, where w is a non-
empty string and w^r denotes the reversal of the string w, that is, w spelled backward to
obtain w^r . For the following context-free grammar G2 = < V2 , Σ , S , P2 > generates L2 :
V2 = { S } , Σ = { a , b } and P2 = { S -> aSa , S -> bSb , S -> aa , S -> bb }.

Example 3: Let L3 be the set of algebraic expressions involving the identifiers x and y,
the operations + and * and left and right parentheses. Then L3 is a context-free language, for
the following context-free grammar G3 = < V3 , Σ3 , S , P3 > generates L3 :
V3 = { S } , Σ3 = { x , y , ( , ) , + , * } and P3 = { S -> ( S + S ) , S -> ( S * S ) , S -> x ,
S -> y }.

Example 4: Portions of the syntaxes of programming languages can be described by


context-free grammars. For example
{ < statement > -> < if-statement > , < statement > -> < for-statement > , < statement > ->
< assignment > , . . . , < if-statement > -> if ( < expression > ) < statement > , < for-
statement > -> for ( < expression > ; < expression > ; < expression > ) < statement
> , . . . , < expression > -> < algebraic-expression > , < expression > -> < logical-
expression > , . . . } .
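To see grammars G1 and G2 of Examples 1 and 2 in action, one can enumerate every terminal string they derive up to a length bound. The generator below is a naive sketch; it assumes a single nonterminal S, as in those examples.

```python
# Bounded enumeration of the language of a CFG with one nonterminal S.
# 'productions' lists the right-hand sides of the S-productions.

def generate(productions, max_len):
    """All terminal strings derivable from S with length <= max_len."""
    results, frontier = set(), {"S"}
    while frontier:
        new = set()
        for form in frontier:
            if "S" not in form:
                results.add(form)            # fully terminal: collect it
                continue
            for rhs in productions:
                expanded = form.replace("S", rhs, 1)   # leftmost derivation
                if len(expanded.replace("S", "")) <= max_len:
                    new.add(expanded)        # prune forms already too long
        frontier = new
    return results

G1 = ["aSb", "ab"]                 # Example 1: generates a^n b^n
G2 = ["aSa", "bSb", "aa", "bb"]    # Example 2: generates w w^r
```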

Properties of Context-Free Language

Theorem 1: Let L1 and L2 be context-free languages. Then L1 L2 , L1L2 , and L1* are
context-free languages.

Outline of Proof

This theorem can be verified by constructing context-free grammars for union,


concatenation and Kleene star of context-free grammars as follows:
Let G1 = < V1 , , S1 , P1 > and G2 = < V2 , , S2 , P2 > be context-free grammars
generating L1 and L2 , respectively.
Then for L1 L2 , first relabel symbols of V2 , if necessary, so that V1 and V2 don't share
any symbols. Then let Su be a symbol which is not in V1 V2 . Next define Vu = V1 V2
{ Su } and Pu = P1 P2 { Su -> S1 , Su -> S2 } .
Then it can be easily seen that Gu = < Vu , , Su , Pu > is a context-free grammar that
generates the language L1 L2 .

Similarly for L1L2 , first relabel symbols of V2 , if necessary, so that V1 and V2 don't share
any symbols. Then let Sc be a symbol which is not in V1 V2 . Next define Vc = V1 V2
{ Sc } and Pc = P1 P2 { Sc -> S1S2 } .
Then it can be easily seen that Gc = < Vc , , Sc , Pc > is a context-free grammar that
generates the language L1L2 .

For L1* , let Ss be a symbol which is not in V1 . Then let Ps = P1 { Ss -> SsS1 , Ss ->
} . It can be seen that the grammar Gs = < Vs , , Ss , Ps > is a context-free grammar that
generates the language L1* .

Pushdown Automata

Like regular languages which are accepted by finite automata, context-free languages are
also accepted by automata but not finite automata. They need a little more complex
automata called pushdown automata.
Let us consider a context-free language anbn . Any string of this language can be tested for
the membership for the language by a finite automaton if there is a memory such as a
pushdown stack that can store a's of a given input string. For example, as a's are read by
the finite automaton, push them into the stack. As soon as the symbol b appears stop
storing a's and start popping a's one by one every time a b is read. If another a (or
anything other than b) is read after the first b, reject the string. When all the symbols of
the input string are read, check the stack. If it is empty, accept the string. Otherwise reject
it.
This automaton behaves like a finite automaton except the following two points: First, its
next state is determined not only by the input symbol being read, but also by the symbol
at the top of the stack. Second, the contents of the stack can also be changed every time
an input symbol is read. Thus its transition function specifies the new top of the stack
contents as well as the next state.

Let us define this new type of automaton formally.

A pushdown automaton ( or PDA for short ) is a 7-tuple M = < Q , Σ , Γ , q0 , Z0 , A ,
δ > , where
Q is a finite set of states,
Σ and Γ are finite sets ( the input and stack alphabet, respectively ),
q0 is the initial state,
Z0 is the initial stack symbol and it is a member of Γ ,
A is the set of accepting states, and
δ is the transition function:

δ : Q × ( Σ ∪ { Λ } ) × Γ -> 2^( Q × Γ* ) .

Thus ( q , α ) ∈ δ( p , a , X ) means the following:

The automaton moves from the current state p to the next state q when it sees the input
symbol a at the input and X at the top of the stack, and it replaces X with the string α at
the top of the stack.

Example 1 :

Let us consider the pushdown automaton < Q , Σ , Γ , q0 , Z0 , A , δ > , where Q = { q0 ,
q1 , q2 } , Σ = { a , b } , Γ = { a , b , Z0 } , A = { q2 } and let δ be as given in the
following table:

State   Input   Top of Stack   Move

q0      a       Z0             ( q0 , aZ0 )
q0      a       a              ( q0 , aa )
q0      b       a              ( q1 , Λ )
q1      b       a              ( q1 , Λ )
q1      Λ       Z0             ( q2 , Z0 )

This pushdown automaton accepts the language { a^n b^n } . To describe the operation of a PDA
we are going to use configurations. A configuration of a PDA M = < Q , Σ , Γ ,
q0 , Z0 , A , δ > is a triple ( q , x , α ) , where q is the state the PDA is currently in, x is
the unread portion of the input string and α is the current stack contents, where the input
is read from left to right and the top of the stack corresponds to the leftmost symbol of
α . To express that the PDA moves from configuration ( p , x , α ) to configuration ( q , y ,
β ) in a single move (a single application of the transition function) we write

( p , x , α ) ⊢ ( q , y , β ) .

If ( q , y , β ) is reached from ( p , x , α ) by a sequence of zero or more moves, we write

( p , x , α ) ⊢* ( q , y , β ) .

Let us now see how the PDA of Example 1 operates when it is given the string aabb , for
example.
Initially its configuration is ( q0 , aabb , Z0 ). After reading the first a, its configuration is
( q0 , abb , aZ0 ). After reading the second a, it is ( q0 , bb , aaZ0 ). Then when the first b is
read, it moves to state q1 and pops a from the top of the stack. Thus the configuration is
( q1 , b , aZ0 ). When the second b is read, another a is popped from the top of the stack
and the PDA stays in state q1 . Thus the configuration is ( q1 , Λ , Z0 ). Next it moves to
the state q2 , which is the accepting state. Thus aabb is accepted by this PDA. This entire
process can be expressed using configurations as

( q0 , aabb , Z0 ) ⊢ ( q0 , abb , aZ0 ) ⊢ ( q0 , bb , aaZ0 ) ⊢ ( q1 , b , aZ0 ) ⊢ ( q1 , Λ , Z0 ) ⊢
( q2 , Λ , Z0 ).

If we are not interested in the intermediate steps, we can also write

( q0 , aabb , Z0 ) ⊢* ( q2 , Λ , Z0 ) .

A string x is accepted by a PDA (a.k.a. acceptance by final state) if ( q0 , x , Z0 ) ⊢* ( q ,
Λ , γ ) for some γ in Γ* and some accepting state q.

Like FAs, PDAs can also be represented by transition diagrams. For PDAs, however, arcs
are labeled differently than for FAs. If ( p , α ) ∈ δ( q , a , X ) , then an arc from state q to
state p is added to the diagram and it is labeled with ( a , X/α ) , indicating that X at the
top of the stack is replaced by α upon reading a from the input. For example the
transition diagram of the PDA of Example 1 is as shown below.
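The PDA of Example 1 can be simulated directly from its transition table. In the sketch below the stack is a string whose leftmost character is the top, "Z" stands for Z0, and the empty string "" in the input position encodes a Λ-move; since this particular PDA never has to choose between reading a symbol and a Λ-move, a deterministic loop suffices (a general PDA simulator would need to explore alternatives).

```python
# Simulation of the PDA of Example 1 (acceptance by final state).
# (state, input symbol or "", stack top) -> (next state, string pushed).

DELTA = {
    ("q0", "a", "Z"): ("q0", "aZ"),
    ("q0", "a", "a"): ("q0", "aa"),
    ("q0", "b", "a"): ("q1", ""),     # pop one a per b
    ("q1", "b", "a"): ("q1", ""),
    ("q1", "",  "Z"): ("q2", "Z"),    # Lambda-move into the accepting state
}

def accepts(x):
    state, stack = "q0", "Z"
    for c in x:
        if not stack or (state, c, stack[0]) not in DELTA:
            return False
        state, push = DELTA[(state, c, stack[0])]
        stack = push + stack[1:]      # replace the top with the pushed string
    # allow trailing Lambda-moves, e.g. the last row of the table
    while stack and (state, "", stack[0]) in DELTA:
        state, push = DELTA[(state, "", stack[0])]
        stack = push + stack[1:]
    return state == "q2"
```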

Example 2 :

Let us consider the pushdown automaton < Q , Σ , Γ , q0 , Z0 , A , δ > , where Q = { q0 ,
q1 , q2 } , Σ = { a , b , c } , Γ = { a , b , Z0 } , A = { q2 } and let δ be as given in the
following table:

State   Input   Top of Stack   Move

q0      a       Z0             ( q0 , aZ0 )
q0      b       Z0             ( q0 , bZ0 )
q0      a       σ              ( q0 , aσ )
q0      b       σ              ( q0 , bσ )
q0      c       σ              ( q1 , σ )
q1      a       a              ( q1 , Λ )
q1      b       b              ( q1 , Λ )
q1      Λ       Z0             ( q2 , Z0 )

In this table σ represents either a or b.

This pushdown automaton accepts the language { w c w^r | w ∈ { a , b }* } , which is the
set of palindromes with c in the middle.
For example for the input abbcbba, it goes through the following configurations and
accepts it:

( q0 , abbcbba , Z0 ) ⊢ ( q0 , bbcbba , aZ0 ) ⊢ ( q0 , bcbba , baZ0 ) ⊢ ( q0 , cbba , bbaZ0 )
⊢ ( q1 , bba , bbaZ0 ) ⊢ ( q1 , ba , baZ0 ) ⊢ ( q1 , a , aZ0 ) ⊢ ( q1 , Λ , Z0 ) ⊢ ( q2 , Λ , Z0 ) .

This PDA pushes all the a's and b's in the input onto the stack until c is encountered. When c
is detected, it ignores c, and from that point on, if the top of the stack matches the input
symbol, it pops the stack. When there are no more unread input symbols and Z0 is at the
top of the stack, it accepts the input string. Otherwise it rejects the input string.

The transition diagram of the PDA of Example 2 is as shown below. In the figure σ, σ1
and σ2 represent a or b.

Further topics on CFL

• PDA and Context-Free Language

There is a procedure to construct a PDA that accepts the language generated by a
given context-free grammar, and conversely. That means that a language is
context-free if and only if there is a PDA that accepts it. Those procedures are
omitted here.

• Pumping Lemma for Context-Free Language

Let L be a CFL. Then there is a positive integer n such that for any string u in L
with |u| ≥ n , there are strings v, w, x, y and z which satisfy

u = vwxyz ,
|wy| > 0 ,
|wxy| ≤ n , and

for every integer m ≥ 0 , v w^m x y^m z ∈ L .

• Parsing and Parsers for CFL

Consider the algebraic expression x + yz. Though we are accustomed to
interpreting this as x + (yz), i.e. compute yz first, then add the result to x, it could
also be interpreted as ( x + y )z, meaning first compute x + y, then multiply
the result by z. Thus if a computer is given the string x + yz, it does not know
which interpretation to use unless it is explicitly instructed to follow one or the
other. Similar things happen when English sentences are processed by computers
(or by people, for that matter). For example, in the sentence "A dog bites a
man", native English speakers know that it is the dog that bites and not the other
way round. "A dog" is the subject, "bites" is the verb and "a man" is the object of
the verb. However, a computer, like non-English-speaking people, must be told
how to interpret sentences, for example that the first noun phrase ("A dog") is usually the
subject of a sentence, that a verb phrase usually follows the noun phrase, and that the first
word in the verb phrase is the verb, followed by noun phrases
representing the object(s) of the verb.
Parsing is the process of interpreting given input strings according to
predetermined rules, i.e. the productions of grammars. By parsing sentences we
identify the parts of the sentences and determine their structures so
that their meanings can be understood correctly.
Context-free grammars are powerful grammars. They can describe much of
programming languages and the basic structures of natural languages. Thus they are
widely used in compilers for high-level programming languages and in natural
language processing systems. Parsing for context-free languages and regular
languages has been studied extensively. However, we are not going to study
parsing here. Interested readers are referred to the textbook and other sources.


Next -- Turing Machines


Turing Machines

Turing Machines

Subjects to be Learned

• Definition of Turing Machine


• Configuration
• Operation of Turing Machine

Contents

Introduction

We have studied two types of languages from the Chomsky hierarchy: regular languages
and context-free languages. These languages can describe many practically important
systems and so they are heavily used in practice. They are, however, of limited capability
and there are many languages that they can not process. In this chapter we are going to
study the most general of the languages in Chomsky hierarchy, the phrase structure
languages (also called Type 0 languages), and the machines that can process them:
Turing machines. Turing machines were conceived of by the English mathematician Alan
Turing as a model of human "computation". Later Alonzo Church conjectured that any
computation done by humans or computers can be carried out by some Turing machine.
This conjecture is known as Church's thesis and today it is generally accepted as true.
Computers we use today are as powerful as Turing machines except that computers have
finite memory while Turing machines have infinite memory.
We are going to study Turing machines here and, through them, the limitations of
computers and of computation as we know them today.

Definition

Conceptually a Turing machine, like a finite automaton, consists of a finite control and a
tape. At any time it is in one of a finite number of states. The tape has a left end but
extends infinitely to the right. It is divided into squares, and a symbol can be written
in each square. However, unlike a finite automaton, its head is a read-write head, and it can
move left, move right or stay at the same square after a read or write.

Given a string of symbols on the tape, a Turing machine starts at the initial state. At any
state it reads the symbol under the head and either erases it or replaces it with a symbol
(possibly the same symbol). It then moves the head to the left or right or leaves it where it
is, and goes to the next state, which may be the same as the current state. One of its states
is the halt state; when the Turing machine goes into the halt state, it stops its operation.
Formally a Turing machine is a 5-tuple T = < Q, Σ, Γ, q0, δ > , where

Q is a finite set of states, which is assumed not to contain the symbol h. The symbol h is
used to denote the halt state.
Σ is a finite set of symbols and it is the input alphabet.
Γ is a finite set of symbols containing Σ as its subset and it is the set of tape symbols.
q0 is the initial state.
δ is the transition function, but its value may not be defined for certain points.

It is a mapping from Q × ( Γ ∪ { Δ } ) to ( Q ∪ { h } ) × ( Γ ∪ { Δ } ) × { R, L, S } .

Here Δ denotes the blank, and R, L and S denote "move the head right", "move the head
left" and "do not move it", respectively. A transition diagram can also be drawn for a
Turing machine. The states are represented by vertices, and for a transition δ( q, X ) =
( r, Y, D ) , where D represents R, L or S , an arc from q to r is drawn with label ( X/Y , D ),
indicating that the state is changed from q to r, the symbol X currently being read is
changed to Y, and the tape head is moved as directed by D.

Example 1 : The following Turing machine < Q1 , Σ , Γ , q0 , δ > accepts the language
aba* , where Q1 = { q0, q1, q2, q3 } , Σ = { a , b } , Γ = { a , b } and δ is as given by the
table below.

State (q)   Input (X)   Move ( δ(q, X) )
q0          Δ           ( q1 , Δ , R )
q1          a           ( q2 , a , R )
q2          b           ( q3 , b , R )
q3          a           ( q3 , a , R )
q3          Δ           ( h , Δ , S )

A transition diagram of this Turing machine is given below. It is assumed that the tape
has Δ at the left end and the head is initially at the left end of the tape.
Turing Machine that accepts aba*

To describe the operation of a Turing machine we use configurations. A configuration for
a Turing machine is an ordered pair of the current state and the tape contents, with the
symbol currently under the head marked (here by enclosing it in square brackets). For
example ( q , aab[a]bb ) shows that the Turing machine is currently in state q, the tape
contents are the string aababb and the head is reading the last a of the string. We write
( p , xay ) ⊢ ( q , zbw ) if the Turing machine goes from the first configuration to the
second in one move, and ( p , xay ) ⊢* ( q , zbw ) if the Turing machine goes from the
first configuration to the second in zero or more moves. If the Turing machine T needs to
be explicitly indicated, ⊢T or ⊢T* is used.
A string x is said to be accepted by a Turing machine* T = < Q , Σ , Γ , q0 , δ > if

( q0 , [Δ]x ) ⊢* ( h , y[a]z ) for some symbol a ∈ Γ ∪ { Δ } and some strings y and z in
( Γ ∪ { Δ } )* . In this case we also say that the Turing machine halts on input x. The set
of strings accepted by a Turing machine is the language accepted by the Turing
machine. Note that the Turing machine does not stop if a string is not in the language. A
Turing machine T is said to decide a language L if and only if T writes "yes" and halts if
a string is in L, and T writes "no" and halts if a string is not in L.

For example the Turing machine of Example 1 above goes through the following
sequence of configurations to accept the string aba (the square under the head is
bracketed):

( q0 , [Δ]aba )
⊢ ( q1 , Δ[a]ba )
⊢ ( q2 , Δa[b]a )
⊢ ( q3 , Δab[a] )
⊢ ( q3 , Δaba[Δ] )
⊢ ( h , Δaba[Δ] )
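The run above can be reproduced with a small table-driven simulator (a sketch in Python; BLANK plays the role of Δ, and the dictionary is the transition table of Example 1):

```python
BLANK = ' '   # stands for the blank symbol Delta

# delta of Example 1: (state, symbol) -> (next state, symbol written, head move)
DELTA = {
    ('q0', BLANK): ('q1', BLANK, 'R'),
    ('q1', 'a'):   ('q2', 'a',   'R'),
    ('q2', 'b'):   ('q3', 'b',   'R'),
    ('q3', 'a'):   ('q3', 'a',   'R'),
    ('q3', BLANK): ('h',  BLANK, 'S'),
}

def accepts(x, max_steps=1000):
    """Run the machine on the tape 'Delta x' with the head at the left end.
    True if the halt state h is reached; False if the machine aborts
    (no transition defined); give up after max_steps moves."""
    tape = [BLANK] + list(x)
    state, head = 'q0', 0
    for _ in range(max_steps):
        if state == 'h':
            return True
        if head == len(tape):
            tape.append(BLANK)      # the tape is blank beyond the input
        symbol = tape[head]
        if (state, symbol) not in DELTA:
            return False            # operation aborted
        state, tape[head], move = DELTA[(state, symbol)]
        if move == 'R':
            head += 1
        elif move == 'L':
            if head == 0:
                return False        # told to move off the left end: aborted
            head -= 1
    return False

print([w for w in ('ab', 'aba', 'abaa', 'ba', 'abab') if accepts(w)])
# -> ['ab', 'aba', 'abaa']
```

Note that ab is accepted too, since aba* contains ab, aba, abaa, ... with zero or more trailing a's.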
The first of the following figures shows a Turing machine that accepts but does not
decide the language { a }, the second is a Turing machine that accepts { a } but goes into
a loop if a string is not in the language (hence it accepts but does not decide { a }), and
the third decides { a }, where Σ = { a }.
Example 2 : The following Turing machine moves the head to the first Δ to the right of
the current position. It is denoted by TR .

Example 3 : The following Turing machine erases the string on the tape and moves the
head to the left end. It is assumed that initially the tape has Δ at the left end. This Turing
machine is denoted by TE .
Strings not Accepted by Turing Machines

When a string is not accepted by a Turing machine, that is, when a Turing machine does
not halt on a string, one of the following three things happens:
(1) the Turing machine goes into an infinite loop,
(2) no transition is specified for the current configuration, or
(3) the head is at the left end and it is instructed to move left.

In cases (2) and (3), the operation of the Turing machine is aborted. For example the
following Turing machine accepts the language a+, but it goes into an infinite loop for
any string that is not in the language.
Turing machine accepting a+

Computable Function

Let S ⊆ Σ* and let f be a function f : S -> Σ* . Then we say a Turing machine T computes
f, or f is computable, if for every x ∈ S,

( q0 , [Δ]x ) ⊢* ( h , [Δ]f(x) )

and for every x that is not in S, T does not halt on x.

* Note on "Turing-acceptable": Some books define "acceptance by Turing machine"
slightly differently. That is, in the Turing machines those books define, there are two halt
states: an "accept halt" and a "reject halt". A Turing machine thus may accept a string and
halt, reject a string and halt, or loop. With this definition, a string is accepted by a Turing
machine if, given the string, the Turing machine eventually goes into the accept halt state.
As far as the material discussed in these class notes is concerned, there is no difference
between these two definitions of "accept": a language is a phrase structure (type 0)
language if and only if it is Turing-acceptable in either sense, and the choice has no
effect on decidability.


Turing Machines

Combination of Turing Machines

Subjects to be Learned

• Combination of Turing Machines

Contents

Combination of Turing Machines

We have seen the definition of Turing machine and a few simple examples. One can
construct many more Turing machines that perform various functions. In fact Turing
machines that simulate computers and Turing machines that perform computations done
by any algorithm can be constructed. Furthermore, according to Church's thesis, any
"computation" done by human beings or machines can be done by some Turing machine.
Here we are going to study how complex Turing machines can be constructed using
simple Turing machines and how computers can be simulated by Turing machines.

Let us start with some basic Turing machines.


We have already seen TR . It moves the head to the first Δ (blank) to the right of the
current position. Similarly by TL we denote a Turing machine that moves the head to the
first Δ to the left of the current position. Then by Tσ we denote a Turing machine that
writes the symbol σ at the current position and does not move the head (stays). In
addition, Turing machines that move the head one position to the right and one position
to the left are used as basic machines.

To combine Turing machines we use the following conventions:

Let T1 and T2 represent arbitrary Turing machines.


T1T2 and T1 -> T2 denote the Turing machine that behaves initially like T1, and when T1
halts, T2 takes over, inheriting the head position and the tape contents of T1 . The halt
state of T1 becomes the initial state of T2 .

T1 ->σ T2 denotes the Turing machine that first executes T1. Then, if T1 halts and if the
symbol currently under the head is σ, T2 is started as in the case of T1T2 . Otherwise
it crashes.

Using these basic machines and the conventions, let us construct slightly more complex
Turing machines. Below, Δ is assumed to be at the left end of the tape initially.

Example 4: The following machine shifts the tape contents to the left one position, takes
the head to the right end of the string and halts.
For example with the initial tape contents Δab , it goes through the following
sequence of tape contents (the square under the head is bracketed) and ends with abΔ :

[Δ]ab
-> Δ[a]b
-> [Δ]ab
-> a[a]b
-> aΔ[b]
-> a[Δ]b
-> ab[b]
-> a[b]Δ

Example 5: The left-shift machine of Example 4 can be used to construct an adder for
natural numbers. First, natural numbers are represented on a Turing machine using the
symbol I. For example the number 3 is represented by three consecutive I's on the tape
and 5 by five I's. In general, to represent a natural number k, k consecutive I's are put on
the tape. To add two numbers m and n, m I's and n I's with a blank between them are
placed on the tape. So the initial configuration for adding 2 and 3 is ( q0 , [Δ]IIΔIII ) .
After the addition the configuration becomes ( h , [Δ]IIIII ) .

An adder can be constructed, for example, as TR ->Δ TSL TL , where TSL denotes the
left-shift machine of Example 4. After adding the two numbers placed on the tape it
moves the head to the left end and halts.
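The net effect of this adder can be checked on a simple model of the tape as a Python list. The three phases below mirror TR, the left shift and TL, carried out wholesale rather than square by square (a sketch, not a square-level simulation):

```python
def add_unary(m, n):
    """Model the adder's effect: tape 'Δ' + m I's + 'Δ' + n I's
    becomes 'Δ' + (m+n) I's."""
    tape = ['Δ'] + ['I'] * m + ['Δ'] + ['I'] * n
    # TR: move the head right to the first blank, i.e. the separator.
    head = 1
    while head < len(tape) and tape[head] != 'Δ':
        head += 1
    # Left shift: move everything to the right of the head one square
    # left, overwriting the separator.
    for i in range(head, len(tape) - 1):
        tape[i] = tape[i + 1]
    tape.pop()    # the vacated last square becomes blank; drop it here
    # TL: the head returns to the blank at the left end; return the tape.
    return ''.join(tape)

print(add_unary(2, 3))   # -> ΔIIIII
```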

Example 6:
The following Turing machine copies the tape contents at the left end to their right,
separated by a blank, that is, ( q0 , [Δ]x ) ⊢* ( h , [Δ]xΔx ) .
Today's computers are very complex machines and their instruction sets contain
complicated operations. However, all of those instructions can be realized using
combinations of a small number of basic instructions. In fact many of the earlier
computers had a much smaller instruction set but still could do everything today's
computers can do albeit much more slowly. A bare minimum instruction set would
contain addition, branching, store and load operations. All the other operations can be
realized by using those basic operations. On the other hand as we have seen above, there
is a Turing machine that performs addition; the branch operation is already in Turing
machines because next configurations are determined based on the current state and tape
symbol being looked at; and store and load operations can be taken care of by a Turing
machine that copies tape contents. Furthermore if the subtraction operation is necessary,
it is not difficult to construct a Turing machine that performs subtraction using the same
representation of numbers as for the addition. Thus by combining appropriate Turing
machines a computer with a minimal instruction set can be constructed. Since any
complex computer instructions can be realized using those basic instructions, one can say
that computers can be simulated by Turing machines.



Turing Machines

Types of Turing Machines

Subjects to be Learned

• Variation of Turing Machine

Contents

There are a number of other types of Turing machines in addition to the one we have
seen, such as Turing machines with multiple tapes, Turing machines having one tape but
multiple heads, Turing machines with two dimensional tapes, nondeterministic Turing
machines, etc. It turns out that computationally all these Turing machines are equally
powerful. That is, what one type can compute, any other can also compute. However, the
efficiency of computation, that is, how fast they can compute, may vary.

Turing Machines with Two Dimensional Tapes

This is a kind of Turing machine that has one finite control, one read-write head and
one two dimensional tape. The tape has a top end and a left end but extends
indefinitely to the right and down. It is divided into rows of small squares. For any Turing
machine of this type there is a Turing machine with a one dimensional tape that is equally
powerful, that is, the former can be simulated by the latter.
To simulate a two dimensional tape with a one dimensional tape, first we map the squares
of the two dimensional tape to those of the one dimensional tape diagonally as shown in
the following tables:

Two Dimensional Tape

v v v v v v v ... ...
h 1 2 6 7 15 16 ... ...
h 3 5 8 14 17 26 ... ...
h 4 9 13 18 25 ... ... ...
h 10 12 19 24 ... ... ... ...
h 11 20 23 ... ... ... ... ...
h 21 22 ... ... ... ... ... ...
...... ... ... ... ... ... ... ...

Here the numbers indicate the correspondence of squares in the two tapes: square i of the
two dimensional tape is mapped to square i of the one dimensional tape. h and v are
symbols which are not in the tape alphabet and they are used to mark the left and the top
end of the tape, respectively.

One Dimensional Tape

v 1 v 2 3 h 4 5 6 v 7 8 9 10 h 11 . . . . . .

The head of a two dimensional tape moves one square up, down, left or right.
Let us simulate this head move with a one dimensional tape.
Let i be the head position of the two dimensional tape. If the head moves down from i,
then move the head of the one dimensional tape to right until it hits h or v counting the
number of squares it has visited after i. Let k be the number of squares visited by the head
of the one dimensional tape. If h was hit first, then from h move the head of the one
dimensional tape further right to the k-th square from h. That is the square corresponding
to the square below i in the two dimensional tape. If v was hit first, then (k+1)-th square
to the right from v is the new head position.
For example, suppose that the head position is at 8 for the two dimensional tape in the
above table, that is i = 8. If the head moves down to position 13, then for the one
dimensional tape, the head moves from position 8 to right. Then it meets h first, which is
the third square from 8. Thus from h, move 3 positions to the right. That is the head
position of the one dimensional tape corresponding to 13 on the two dimensional tape.
If i = 5 and the head moves down on the other hand, then on the one dimensional tape the
head moves to the right and it hits v first, which is the second square from i = 5. Thus this
time the third square is the head position of the one dimensional tape corresponding to 9
on the two dimensional tape.
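The diagonal numbering in the table also follows a simple closed form. The sketch below (the formula is inferred from the table above, not stated in the original) maps a square of the two dimensional tape to its number on the one dimensional tape:

```python
def diag_index(r, c):
    """Square number on the one dimensional tape of square (row r,
    column c) of the two dimensional tape (both 1-based), following
    the diagonal numbering shown in the table."""
    d = r + c - 1                 # which anti-diagonal the square lies on
    base = d * (d - 1) // 2       # squares on all earlier diagonals
    # even diagonals are numbered top to bottom, odd ones bottom to top
    return base + (r if d % 2 == 0 else d + 1 - r)

# The down-move example from the text: square 8 is at row 2, column 3,
# and the square below it is square 13.
print(diag_index(2, 3), diag_index(3, 3))   # -> 8 13
```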
Similarly, formulas can be found for the head position on the one dimensional tape
corresponding to a move up, right or left on the two dimensional tape. Details are
omitted. Thus a Turing machine with a one dimensional tape can simulate every move of
a Turing machine with a two dimensional tape. Hence Turing machines with a one
dimensional tape are at least as powerful as Turing machines with a two dimensional
tape. Since Turing machines with a two dimensional tape obviously can simulate Turing
machines with a one dimensional tape, it can be said that they are equally powerful.

Turing Machines with Multiple Tapes :

This is a kind of Turing machine that has one finite control and more than one tape,
each with its own read-write head. It is denoted by a 5-tuple < Q , Σ , Γ , q0, δ > . Its
transition function δ is a partial function

δ : Q × ( Γ ∪ { Δ } )^n -> ( Q ∪ { h } ) × ( Γ ∪ { Δ } )^n × { R , L , S }^n .

A configuration for this kind of Turing machine must show the current state the machine
is in and the contents and head position of each tape.
It can be proven that any language accepted by an n-tape Turing machine can be accepted
by a one tape Turing machine and that any function computed by an n-tape Turing
machine can be computed by a one tape Turing machine. Since the converses are
obviously true, one can say that one tape Turing machines are as powerful as n-tape
Turing machines.

Turing Machines with Multiple Heads :

This is a kind of Turing machine that has one finite control and one tape but more than
one read-write head. In each state only one of the heads is allowed to read and write. It is
denoted by a 5-tuple < Q , Σ , Γ , q0, δ > . The transition function δ is a partial function

δ : Q × { H1 , H2 , ... , Hn } × ( Γ ∪ { Δ } ) -> ( Q ∪ { h } ) × ( Γ ∪ { Δ } ) × { R , L , S } ,

where H1 , H2 , ... , Hn denote the tape heads.

It can easily be seen that this type of Turing machine is as powerful as the one tape
Turing machine.

Turing Machines with Infinite Tape :

This is a kind of Turing machine that has one finite control and one tape, which extends
infinitely in both directions.
It turns out that this type of Turing machine is only as powerful as a one tape Turing
machine whose tape has a left end.

Nondeterministic Turing Machines


A nondeterministic Turing machine is a Turing machine which, like a nondeterministic
finite automaton, at any state it is in and for the tape symbol it is reading, can take any
action selected from a set of specified actions, rather than taking one definite
predetermined action. Even in the same situation it may take different actions at different
times. Here an action means the combination of writing a symbol on the tape, moving the
tape head and going to a next state. For example let us consider the language L = { ww :
w ∈ { a , b }* } . Given a string x, a nondeterministic Turing machine that accepts this
language L would first guess the midpoint of x, that is, the place where the second half of
x starts. Then it would compare the first half of x with the second half by comparing the
i-th symbol of the first half with the i-th symbol of the second half for i = 1, 2, ... . A
deterministic Turing machine, on the other hand, cannot guess the midpoint of the string
x. It must find the midpoint, for example by pairing off symbols from either end of x.
Formally a nondeterministic Turing machine is a Turing machine whose transition
function takes values that are subsets of ( Q ∪ { h } ) × ( Γ ∪ { Δ } ) × { R , L , S } .

As in the case of NFA, it is understood that a nondeterministic Turing machine at any


configuration selects one combination of next state, tape symbol and head movement out
of the set of triples without following any specific predetermined rule.

It can be shown that a nondeterministic Turing machine is only as powerful as a


deterministic Turing machine.

Theorem Any language accepted by a nondeterministic Turing machine is also accepted


by some deterministic Turing machine.
Proof : Let TN denote a nondeterministic Turing machine. Given a string x , TN starts at
the initial configuration and goes through a sequence of configurations until it reaches a
halt configuration, goes into an infinite loop, or aborts. At any point in the process TN is
in some configuration and has a finite set of configurations to choose from for its next
configuration. The set of all possible computations that TN can perform for a given string
x can be represented by a rooted tree as follows. The root of the tree is the initial
configuration and it is the only vertex of level 0. All possible configurations that are
reachable by applying the transition function of TN once form the children of the initial
configuration. They form level 1. In general for each vertex of level i all possible
configurations that are reachable by applying the transition function of TN are its children.
The children of all the vertices of level i form level i+1. Note that the number of children
for a vertex in this tree is finite because the number of states is finite and there are a finite
number of tape symbols.
For example consider the following nondeterministic Turing machine that accepts a+ .
Turing machine accepting a+

Given the string aa, it would proceed as follows to accept it (the square under the head is
bracketed):

( q0 , [Δ]aa ) ⊢ ( q1 , Δ[a]a ) ⊢ ( q1 , Δa[a] ) ⊢ ( q2 , Δaa[Δ] ) ⊢ ( h , Δaa[Δ] ) .

At the second and third configurations in the above sequence, it has two candidates for
the next configuration: ( q1 , Δa[a] ) and ( q2 , Δa[a] ) for the second, and ( q1 , Δaa[Δ] )
and ( q2 , Δaa[Δ] ) for the third.


The tree for this case would be as follows:
One way to simulate a nondeterministic Turing machine, call it T1, with a deterministic
one, call it T2, is to traverse this tree breadth-first from the root until the halt state is
reached. At each level of the tree, T2 applies the transition function of T1 to each
configuration at that level and computes its children. These children are the
configurations of the next level and they are stored on the tape (if necessary a second
tape may be used). If the halting state appears among these children, then T2 accepts the
string and halts. It can easily be seen that T2 accepts a string if and only if T1 accepts it.
Thus any language accepted by a nondeterministic Turing machine is also accepted by a
deterministic Turing machine, though a deterministic Turing machine might take much
more time than a nondeterministic Turing machine to accept a string.
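The breadth-first traversal can be sketched as follows. The nondeterministic table is an assumed reconstruction of the a+ accepter (in state q1 on an a, the machine may either stay in q1 or guess that this is the last a and move to q2); the queue holds the configurations of the frontier of the tree:

```python
from collections import deque

BLANK = ' '

# Nondeterministic transition table (an assumed reconstruction of the
# a+ accepter in the figure): each entry is a SET of possible moves.
DELTA = {
    ('q0', BLANK): {('q1', BLANK, 'R')},
    ('q1', 'a'):   {('q1', 'a', 'R'), ('q2', 'a', 'R')},   # the choice point
    ('q2', BLANK): {('h', BLANK, 'S')},
}

def nd_accepts(x, max_steps=500):
    """Breadth-first search over configurations (state, tape, head)."""
    frontier = deque([('q0', BLANK + x, 0)])
    for _ in range(max_steps):
        if not frontier:
            return False                       # every branch aborted
        state, tape, head = frontier.popleft()
        if state == 'h':
            return True                        # some branch reached halt
        if head == len(tape):
            tape += BLANK                      # blank beyond the input
        for nstate, written, move in DELTA.get((state, tape[head]), ()):
            ntape = tape[:head] + written + tape[head + 1:]
            nhead = head + {'R': 1, 'L': -1, 'S': 0}[move]
            if nhead >= 0:                     # moving off the left end aborts
                frontier.append((nstate, ntape, nhead))
    return False
```

A real simulating Turing machine would store each level of the tree on its tape; the queue plays that role here.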

Many other variations of Turing machine are possible. However, it has been shown that
none of them exceeds the capability of the basic deterministic Turing machine as far as
accepting languages is concerned. In fact Church's thesis conjectures that any so
called computation done by humans or computers can be performed by a basic
deterministic Turing machine.


Turing Machines

Unsolvable Problems
Subjects to be Learned

• Halting Problem
• Languages not Accepted by Turing Machines
• Other Unsolvable Problems

Contents

We have learned that deterministic Turing machines are capable of doing any
computation that computers can do, that is, computationally they are equally powerful,
and that none of their variations exceeds the computational power of deterministic
Turing machines. It is also conjectured that any "computation" human beings perform
can be done by Turing machines (Church's thesis).
In this chapter we are going to learn that there are problems that can not be solved by
Turing machines, hence by computers. Here "unsolvability" is in the following sense.
First recall that solving a problem can be viewed as recognizing a language (see Problem
Solving as Language Recognition). So we are going to look at unsolvability in terms
of language recognition. Suppose that a language is acceptable but not decidable. Then,
given a string, a Turing machine that accepts the language starts the computation. At any
point in time, if the Turing machine is running, there is no way of telling whether it is in
an infinite loop or on the way to a solution and simply needs more time. Thus if a
language is not decidable, the question of whether or not a string is in the language may
not be answered in any finite amount of time. Since we can not wait forever for an
answer, the question is unanswerable, that is, the problem is unsolvable. Below we are
going to see some well known unsolvable problems and see why we can say they are
unsolvable.

Halting Problem

One of the well known unsolvable problems is the halting problem. It asks the following
question: given an arbitrary Turing machine M over alphabet Σ = { a , b } and an
arbitrary string w over Σ, does M halt when it is given w as an input?
It can be shown that the halting problem is not decidable, hence unsolvable.

Theorem 1 : The halting problem is undecidable.

Proof (by M. L. Minsky): This is going to be proven by contradiction.

Suppose that the halting problem is decidable. Then there is a Turing machine T that
solves the halting problem. That is, given a description of a Turing machine M (over the
alphabet Σ) and a string w, T writes "yes" if M halts on w and "no" if M does not halt on
w, and then T halts.
We are now going to construct a new Turing machine Tc as follows.
First we construct a Turing machine Tm by modifying T so that if T accepts a string and
halts, then Tm goes into an infinite loop (Tm halts if the original T rejects a string and
halts).

Next using Tm we are going to construct another Turing machine Tc as follows:


Tc takes as input a description of a Turing machine M, denoted by d(M), copies it to
obtain the string d(M)*d(M), where * is a symbol that separates the two copies of d(M)
and then supplies d(M)*d(M) to the Turing machine Tm .
Let us now see what Tc does when a string describing Tc itself is given to it.
When Tc gets the input d(Tc) , it makes a copy, constructs the string d(Tc)*d(Tc) and gives
it to the modified T. Thus the modified T is given a description of Turing machine Tc and
the string d(Tc).

The way T was modified, the modified T is going to go into an infinite loop if Tc halts on
d(Tc) and to halt if Tc does not halt on d(Tc).
Thus Tc goes into an infinite loop if Tc halts on d(Tc), and it halts if Tc does not halt on
d(Tc). This is a contradiction. The contradiction has been deduced from our assumption
that there is a Turing machine that solves the halting problem. Hence that assumption
must be wrong, and there is no Turing machine that solves the halting problem.
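The diagonal argument can be mirrored in a few lines of Python (a sketch: programs are modeled as Python functions, a candidate halting decider as a Boolean predicate on them, and the names are illustrative, not from the original):

```python
def make_tc(halts):
    """Given any candidate halting decider, build the machine Tc of the
    proof: Tc does the opposite of whatever the decider predicts Tc
    does when run on itself."""
    def tc():
        if halts(tc):          # decider claims tc halts on itself...
            while True:        # ...so loop forever instead
                pass
        # decider claims tc loops, so halt immediately
    return tc

# Whatever a candidate decider answers about the machine built from it,
# the answer is wrong:
for candidate in (lambda prog: True, lambda prog: False):
    tc = make_tc(candidate)
    prediction = candidate(tc)       # what the decider says tc does
    actually_halts = not prediction  # what tc actually does, by construction
    assert prediction != actually_halts
```

No matter how clever the candidate decider is, the machine built from it defeats it, which is exactly the contradiction in the proof.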

Program correctness and Halting Problem

Note that for any computer program a Turing machine can be constructed that performs
the task of the program. Thus the question of whether or not a program halts for a given
input is nothing but the halting problem. One implication of the halting problem is
therefore that there can be no computer program (Turing machine) that checks whether
or not an arbitrary computer program stops for a given input.

Turing Machines

More Unsolvable Problems

Subjects to be Learned

• Languages not Accepted by Turing Machines
• Other Unsolvable Problems

Contents

The next unsolvable problem is in a sense more difficult than the halting problem. It is
presented as a language and it can be shown that there are no Turing machines that accept
the language.

Language NonSelfAccepting

Let us first define two languages NSA1 and NSA2 as follows:

NSA1 = { w | w ∈ { a, b }*, w = d(T) for a Turing machine T and T does not accept
w }

NSA2 = { w | w ∈ { a, b }*, w ≠ d(T) for any Turing machine T } ,

where d(T) is a description of the Turing machine T.

NSA1 is the set of strings that describe a Turing machine but that are not accepted by the
Turing machine they describe. NSA2 is the set of strings that do not describe any Turing
machine.
Neither NSA1 nor NSA2 is empty. For let T be a Turing machine that accepts { a } and let
w = d(T). Then this w is a description of a Turing machine, but it must be longer than one
symbol. Hence it is not accepted by T. Hence w is in NSA1 . For NSA2, let w = a. Then
there is no Turing machine that can be described by the string a: certainly more symbols
than a single a are needed to describe even the simplest Turing machine. Hence a is in
NSA2 . Thus neither NSA1 nor NSA2 is empty.

Let us define the language NonSelfAccepting as NonSelfAccepting = NSA1 ∪ NSA2 .


Then we can prove the following theorem:

Theorem 2 There are no Turing machines that accept the language NonSelfAccepting.

Proof: This is going to be proven by contradiction.


Suppose there is a Turing machine, call it T0, that accepts NonSelfAccepting. Let w0 =
d( T0 ), that is w0 is a description of the Turing machine T0 . We are going to see that T0
neither accepts w0 nor rejects it, which is absurd. This means that there can not be any
Turing machine that accepts the language NonSelfAccepting.

Since NonSelfAccepting is a language, either w0 is in NonSelfAccepting or it isn't. Hence


either T0 accepts w0 or rejects it.

(1) If T0 accepts w0, then w0 ∈ NonSelfAccepting, because T0 accepts NonSelfAccepting.

However, by the definitions of NSA1 and NSA2, w0 is in neither NSA1 nor NSA2 . Hence
w0 is not in NonSelfAccepting . This is a contradiction. Hence T0 can not accept w0 .
(2) If T0 does not accept w0 , then w0 is not in NonSelfAccepting, because T0 accepts
NonSelfAccepting. But w0 = d( T0 ), because that is how we selected w0 , and T0 does not
accept w0 . Hence by the definition of NSA1 , w0 is in NSA1 . Hence it is in
NonSelfAccepting . This is again a contradiction.
Thus there can not be a Turing machine T0 that accepts the language NonSelfAccepting .

Knowing the unsolvability of the halting problem some other problems can be shown to
be unsolvable.

Problem Accepts( Λ )

The problem Accepts( Λ ) asks whether or not a given Turing machine T accepts the
empty string Λ . It can be shown to be unsolvable.
Suppose that Accepts( Λ ) is solvable. Then there is a Turing machine that solves it. Let
A be a Turing machine that solves Accepts( Λ ). We are going to show that the halting
problem becomes solvable using this A.
Let a Turing machine T' and a string w be an instance of the halting problem. Consider a
Turing machine T = TwT', where the machine Tw is a Turing machine that writes w. This
T halts on Λ if and only if T' halts on w. Using this T, a Turing machine, call it M, that
solves the halting problem can be constructed as follows:
Given a description d(T') of a Turing machine T' and a string w as inputs, which is an
instance of the halting problem, M writes the string d( T ) on the tape and lets A take
over.

Then M halts on d(T') and w if and only if T' halts on w. That is, M solves the halting
problem. Thus if Accepts( Λ ) were solvable, the halting problem could be solved. Since
the halting problem is unsolvable, this means that Accepts( Λ ) is unsolvable.

Using a similar idea the following problem can also be shown to be unsolvable.

Problem AcceptsEverything
The problem AcceptsEverything asks whether or not a given Turing machine T halts on
every string over a given alphabet Σ .
Suppose that AcceptsEverything is solvable. Let A be a Turing machine that solves
AcceptsEverything. We are going to show that Accepts( Λ ) can be solved using the
solution to it.
Let T' be an instance of Accepts( Λ ). Then consider the Turing machine T = TeraseT' ,
where Terase is a Turing machine that erases the input on the tape and halts. This T halts
on every string over Σ if and only if T' halts on Λ .
Using this T, a Turing machine, call it M, that solves Accepts( Λ ) can be constructed as
shown below.

Since Accepts( Λ ) is unsolvable, this means that AcceptsEverything is unsolvable.

By similar arguments the following problems can be shown to be unsolvable.

AcceptsNothing
This problem asks whether or not a Turing machine accepts nothing.
It can be shown to be unsolvable using Accepts( Λ ) .

Equivalence
This problem asks whether or not two Turing machines accept the same language.
It can be shown to be unsolvable using AcceptsEverything.

Other Unsolvable Problems

Let G1 and G2 be context-free grammars and let L(G) denote the language generated by
grammar G. Then the following problems are all unsolvable.

Is L( G1 ) ⊆ L( G2 ) ?

Is L( G1 ) ∩ L( G2 ) = ∅ ? finite ? infinite ? context-free ?

Is L( G1 ) = L( G2 ) ?
Is L( G1 ) = Σ* ?
Is the complement of L( G1 ) context-free ?

Complexity

Time Complexity

Subjects to be Learned

• Time Complexity of Problems


• Decision Tree
• Class NP
• Polynomial Time Transformation
• NP-Complete Problems

Contents

In the previous sections we have learned that some problems are unsolvable by Turing
machines hence by computers. No one can write computer programs that solve those
problems and halt after a finite amount of time. A problem is solvable if some Turing
machine can solve it in finite time. Even if it takes a million years to solve a problem, it is
still solvable. However, in practice if it takes a million years, it is as good (or bad) as
unsolvable. For many problems a day or even an hour would be too long. In this section
we are going to study solvable problems and learn a hierarchy of solvable problems based
on the computation time required to solve them. The measure for computation time we
use is the worst case time. It is estimated by counting the largest possible number of key
operations to be performed in terms of the input size. For more detailed review of this,
O(f(x)) (big-oh) and other related subjects click here.

Among the solvable problems there are problems that can be solved by algorithms with
the worst case time which is a polynomial in the problem size (polynomial time
algorithms). For example a binary search takes O(lg n) time, a quick sort needs O(n^2)
time, and a heap sort needs O(n lg n) time. They are all polynomial time algorithms. There
are also problems that must be solved at best by exponential time algorithms in the worst
case. For example, as we are going to see below, the satisfiability problem for the
propositional dynamic logic is proven to take exponential time to solve in the worst case.
They take much more time to execute than polynomial time algorithms. Then there are
problems that require double exponential ( e.g. 2^(2^n) ) time algorithms, problems that need
k-exponential time algorithms, where k is a natural number, etc. and there are problems
that require algorithms with the worst case time worse than k-exponential time for any
natural number k. The problems that can not be solved with any polynomial time
algorithm are called intractable problems .
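For a concrete look at one of these polynomial time algorithms, here is a standard binary search sketch; each probe halves the remaining range, giving the O(lg n) worst case mentioned above:

```python
def binary_search(a, x):
    """Return an index of x in the sorted list a, or -1 if x is absent.
    Each iteration halves the search range, so the worst case number of
    comparisons is O(lg n)."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == x:
            return mid
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

For example, `binary_search([1, 3, 5, 7, 9], 7)` returns 3, and searching for a missing key returns -1 after at most about lg n probes.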

Let us see some of those intractable problems. In logic there is a well known problem of
"satisfiability". This is the problem of asking whether or not a given formula can take the
value true for some values of its variables. For example the formula ( P V ~P ) is always
true, where P is a propositional variable. So it is certainly satisfiable. Similarly ( P V Q )
is also satisfiable. But (P ^ ~P ) is always false. So it is not satisfiable. One can ask the
same question for formulas of first order predicate logic, second order logic, etc.
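For propositional logic, satisfiability can always be decided by trying every truth assignment. The sketch below, with formulas modeled as Python boolean functions (a modeling choice of this example, not part of the original problem statement), makes the exponential cost visible, since n variables give 2^n assignments:

```python
from itertools import product

def satisfiable(formula, variables):
    """Brute-force satisfiability test: try all 2^n truth assignments
    for the n variables and ask whether any makes the formula true."""
    n = len(variables)
    return any(formula(*vals) for vals in product([False, True], repeat=n))
```

On the examples from the text, `( P V ~P )` and `( P V Q )` are reported satisfiable, while `( P ^ ~P )` is not.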
Before proceeding to predicate logic let us consider the following logic called
propositional dynamic logic (PDL for short). This is a propositional logic with an extra
construct (proposition) after(A, S), where A is an algorithm and S is a statement.
after(A, S) says that S is true after executing A. For example "after( if P then Q else ~Q,
~Q )" and "if P then after( if P then Q else ~Q, Q )" , where P and Q are propositions, are
propositions of PDL. They are both satisfiable.
The satisfiability problem for PDL is known to take at least exponential time to solve in
the worst case.

The satisfiability problem becomes even harder when logic becomes more complex. For
example the satisfiability problem for Presburger arithmetic is double-exponential (2-fold
exponential), that is, it requires at least O( a^(a^n) ) time, for some constant a > 1, to solve in the worst case.
Presburger arithmetic is a logic that allows statements involving positive integers,
variables taking positive integers as their values, the addition operation +, the equality
symbol = and the quantifiers ∀ and ∃, as well as all the connectives such as and, or etc.
For example, ∀X [ if ~( X = 1 ), then ∃Y ∃Z [ X = Y + Z ] ] is a proposition of
Presburger arithmetic.
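The example proposition can be checked for small values of X with a short program. This only verifies finitely many cases, of course; deciding the full quantified statement for all X is what a decision procedure for Presburger arithmetic does:

```python
def example_holds_up_to(bound):
    """Check, for every positive integer X with 2 <= X <= bound, that
    there exist positive integers Y and Z with X = Y + Z."""
    return all(
        any(y + z == x for y in range(1, x) for z in range(1, x))
        for x in range(2, bound + 1)
    )
```

Running `example_holds_up_to(50)` returns True, as expected: every X other than 1 splits as, e.g., 1 + (X - 1).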
In Presburger arithmetic (minus addition operation), if, in addition, sets of integers and
the predicate "belongs to" (an element X belongs to a set S) are allowed, the logic is
called WS1S (Weak Second-order theory of 1 Successor). For the satisfiability problem
of WS1S, there are no K-fold exponential time algorithms to solve it for any number K.
Such a problem (having no K-fold exponential time algorithms) is called nonelementary.

Now let us go back to the satisfiability problem of propositional logic. This problem
belongs to a peculiar class of problems called NP-Complete problems. For the problems
of this class there are no known polynomial time algorithms for solving them nor are they
known to be unsolvable with polynomial time algorithms. At the moment, however, the
consensus is that they cannot be solved with polynomial time algorithms. Below we are
going to characterize this class of problems.

First, there are problems that are solved by answering with yes or no. For example, "Is a
string w in the language a*b ? ", "Is it possible to schedule committee meetings without
conflicts into a given number of time slots ? " , " Is it possible to assign colors to vertices
of a given graph using a given number of colors or less so that no two vertices connected
directly by an edge have the same color assigned ? " etc. These problems are called
decision problems. Some of these decision problems are NP-complete problems.
Let us here review nondeterministic Turing machines. Consider the problem of coloring
vertices of a graph with a given number of colors or less so that no two vertices
connected directly by an edge have the same color assigned. This problem is called the
"Graph Coloring" problem or more precisely the "Vertex Coloring" problem. Let us try to solve
the following instances of this graph coloring problem:

Given the following graph, is it possible to color its vertices with three or less colors ?
For the graphs of (a) and (b), you could find a solution very easily by inspection. You
could see a right coloring as soon as you saw the graphs. However, you most likely
cannot tell how you arrived at your solutions. You probably don't have any algorithms you
could use to solve them. You could somehow see the solutions. This is basically the idea
of nondeterministic (Turing) machine. There is no fixed procedure which you can use
repeatedly to solve instance after instance of this problem. But you can somehow solve
them.
Let us move on to a slightly more complex example of (c). For this graph to find a right
coloring you could start with vertex 1 and assign color a. Then move on to vertex 2 and
assign color b(it has to be something other than a ). Then go to vertex 3 and assign a third
color, say c. Then at vertex 4 select color b and for vertex 5 use color a.
In this process we make a decision as to what color to use for each vertex and when a
decision is made for all the vertices we have a solution to the problem. This process
applies to any decision problem. That is to solve a decision problem a number of smaller
decisions are made one after another and as a result a solution to the problem is obtained.
This process can be represented by a tree called decision tree. For example, for the graph
coloring problem let us first decide on the order of vertices we color in, say 1, 2, 3, 4 and
5 for the graph of (c) above. Then the root of its decision tree corresponds to the vertex
we assign a color to first (vertex 1 in this example). Then for each possible color for the
first vertex, a child of the root is created. So the second level of the
decision tree corresponds to the second vertex to be colored. Then in general, for each
possible color for each vertex of level i of the decision tree, a child is created. Those
children form level i+1 of the decision tree. The decision tree for the graph of (c) is given
below. Since any color can be assigned to vertex 1 without loss of generality, it has just
one child in the actual decision tree. Also since in this case the i-th and (i+1)-th vertices
are connected by an edge for i = 1, 2, 3, 4, they can not have the same color. So each
vertex after vertex 1 has two colors to choose from. So they each have two children in the
decision tree.

Thus during the process of solving the problem a decision is made at each level and when
all levels are covered, the problem is solved. A path from the root to a leaf corresponds to
a coloring of the vertices of the given graph. A decision tree, however, does not tell us
how to make decisions. Also a decision tree does not tell how to order the vertices for
coloring, that is which vertex to color first, second etc.
A deterministic machine (or algorithm) has a specific fixed set of rules for making a
decision at each level of the decision tree. Although it knows what to do at every stage of
problem solving, the decisions it makes are not necessarily the right ones. When it makes
wrong decisions, it must retract earlier decisions and try different paths, which is called
backtracking.
For the graph coloring problem a deterministic algorithm might first order the vertices of
the graph in decreasing order of their degree and also order colors. Then, following the
order of the vertices, assign to each vertex the highest order color available for the vertex.
Since that kind of algorithm is not guaranteed to use the minimum number of colors, it
may produce a wrong answer unless there is some provision for backtracking.
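Such a deterministic procedure with backtracking might look like the following sketch. For simplicity it colors vertices in index order rather than by decreasing degree, and it retracts a choice whenever a vertex runs out of colors:

```python
def color_graph(n, edges, k):
    """Try to color vertices 0..n-1 with colors 0..k-1 so that no edge
    joins two vertices of the same color. Returns a list of colors, or
    None if no such coloring exists. Worst case exponential in n."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    colors = [None] * n

    def extend(v):
        if v == n:
            return True                      # every vertex colored
        for c in range(k):
            if all(colors[u] != c for u in adj[v]):
                colors[v] = c                # tentative decision
                if extend(v + 1):
                    return True
                colors[v] = None             # wrong decision: backtrack
        return False                         # no color works for v

    return colors if extend(0) else None
```

On a triangle, for instance, `color_graph(3, [(0, 1), (1, 2), (0, 2)], 3)` finds a 3-coloring, while asking for 2 colors correctly fails and returns None.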
A nondeterministic (Turing) machine, on the other hand, is a fictitious machine and
somehow knows which branch (child) to select at each step. It always makes a right
selection.

A decision problem is said to belong to class NP if each vertex in its decision tree has a
finite number of children and if it can be solved by a nondeterministic (Turing) machine
in polynomial time.
The graph coloring problem is in class NP, so are the satisfiability problem for
propositional logic and most of the scheduling problems just to name a few. Also there
are other characterizations of class NP. Interested readers click here.
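One such characterization: a problem is in NP exactly when a claimed solution (a certificate, e.g. a proposed coloring) can be checked deterministically in polynomial time, which mirrors the nondeterministic machine's lucky guesses. A sketch of such a polynomial time check for graph coloring, with the coloring modeled as a Python dict:

```python
def verify_coloring(edges, k, coloring):
    """Polynomial-time check of a certificate for Graph Coloring:
    the coloring uses at most k colors and no edge joins two vertices
    with the same color. One pass over the edges suffices."""
    return (len(set(coloring.values())) <= k
            and all(coloring[u] != coloring[v] for u, v in edges))
```

Verifying a certificate is easy; the apparent hardness of the problem lies entirely in finding one.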
At this moment it is not known whether or not problems in class NP can be solved with a
polynomial time algorithm in the worst case. The consensus is that there is no polynomial
time algorithm to solve them. It would take at least exponential time. Among the
problems in class NP, there are problems which all problems of class NP can be
transformed to in polynomial time. Those problems are called NP-complete problems. If
a polynomial time algorithm is found for any one of the NP-complete problems, all the
problems in NP can be solved in polynomial time. Below we are going to study NP-
complete problems. We start our discussion with the concept of polynomial time
transformation (reduction). Basically we say a decision problem Q1 is polynomially
reducible to a decision problem Q2 if and only if there is a transformation that transforms
any arbitrary instance of Q1 into an instance of Q2 in polynomial time such that the
answer to Q1 is yes if and only if the answer to Q2 is yes.
A little more formally we define this in terms of languages. Note that a decision problem
can be viewed as a language of its instances and that solving it can be considered as
recognizing the language as we have seen earlier.

Let L1 and L2 be languages over alphabets Σ1 and Σ2, respectively. We say that L1 is
polynomial-time reducible to L2 if and only if there is a function f from Σ1* to Σ2*
such that for any string x ∈ Σ1*, x ∈ L1 if and only if f(x) ∈ L2, and f can be computed
in polynomial time.

For example let us consider the following two problems: graph coloring and scheduling
of committee meetings.

The graph coloring problem is as given above. In the scheduling of committee meetings
problem, committees with their members and a positive integer k are given. The problem
is whether or not the meetings of the committees can be scheduled in k or less time slots
so that everyone can attend one's meetings. Note that some people may be in more than
one committee.

Let us try to show that this scheduling problem is polynomial time reducible to the graph
coloring problem.
What we need to do is given an instance of the scheduling problem construct an instance
of the graph coloring problem, that is construct a graph and give the number of colors to
be used to color its vertices so that the meetings can be scheduled if and only if graph can
be colored.
Let us consider the following transformation:

For each committee add a vertex to the graph, and if and only if two committees have
some members in common, connect with an edge the vertices corresponding to the
committees. Then the meetings can be scheduled in k or less time slots if and only if the
graph can be colored with k or less colors.
For example suppose that we are given the committees 1, 2, 3 and 4 with the
memberships { a, b }, {a, c, d }, { b, d } and { a, c }, respectively. Suppose also that k =
3. Thus the scheduling problem asks whether or not the meetings of the given committees
can be scheduled in 3 time slots without any conflicts.
The corresponding graph for the graph coloring problem can be constructed as follows:
Corresponding to the committees 1, 2, 3 and 4, add vertices 1, 2, 3 and 4 to the graph.
Then since committees 1 and 2 share a, an edge is inserted between vertices 1 and 2.
Similarly since committees 1 and 3, and 1 and 4 share members, edges are added between
1 and 3, and 1 and 4. Proceeding similarly the following graph is obtained corresponding
to the committee memberships.

Suppose that the meetings can be scheduled in p time slots, where p ≤ k. Then the
committees can be grouped into p groups so that the committees in the same group can
meet at the same time. Corresponding to this grouping assign colors to the vertices of the
graph so that the vertices in the same group are given the same color and those in
different groups are given different colors. This coloring uses p colors which does not
exceed k, and vertices connected with an edge have different colors. For if any two
vertices are connected with an edge, then that means that the corresponding committees
share some members and that they are scheduled to meet in different time slots. Thus
these two vertices must get different colors.
Conversely if the graph can be colored with k or less colors, then it can be easily seen
that the committees can meet in k or less time slots.
It is also easily seen that the transformation, that is the construction of graph for a given
set of committees, can be done in time polynomial in the size of the problem, which in
this case can be taken as the number of committees.
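As a sketch, the transformation itself is only a few lines. The committees and memberships below are the example's, modeled as a Python dict from committee names to member sets:

```python
from itertools import combinations

def committees_to_graph(committees):
    """Polynomial-time transformation from committee scheduling to graph
    coloring: one vertex per committee, and an edge between two vertices
    whenever the corresponding committees share a member. The meetings
    fit in k slots iff the resulting graph is k-colorable."""
    vertices = list(committees)
    edges = [(c1, c2) for c1, c2 in combinations(vertices, 2)
             if committees[c1] & committees[c2]]
    return vertices, edges

committees = {1: {"a", "b"}, 2: {"a", "c", "d"}, 3: {"b", "d"}, 4: {"a", "c"}}
```

Running this on the example produces edges 1-2, 1-3, 1-4, 2-3 and 2-4, but none between 3 and 4, whose committees share no member, in agreement with the construction described above. The pairwise intersection test makes the whole transformation clearly polynomial in the number of committees.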

We are now ready to discuss NP-completeness.

It was first proven by S. Cook that every problem of class NP is polynomial time
reducible to the satisfiability problem of propositional logic. Subsequently the
satisfiability problem was found to be polynomial time reducible to many other problems.
As a consequence if a polynomial time algorithm is found for any one of those problems,
all the problems can be solved with polynomial time algorithms. This group of problems
are called NP-complete problems. Formally a problem is NP-hard if every problem in
class NP is polynomial time reducible to it. A problem is NP-complete if it is in
class NP and NP-hard.
It can be easily seen that if a problem P at hand is in class NP and if a problem known to
be NP-complete is polynomial time reducible to P, then P is also NP-complete. For all
the problems in class NP can be reduced to P through the known NP-complete problem in
polynomial time.
If a problem is NP-complete, then the consensus today is that most likely no
polynomial time algorithms, i.e. fast algorithms, exist to solve it.

Today hundreds of problems are known to be NP-complete. Some of them are listed
below.

NP-complete Problems

1. Satisfiability Problem for Propositional Logic
2. Graph Color Problem
3. Committee Meeting Schedule Problem: In fact most scheduling problems are NP-
complete.
4. Traveling Salesman Problem: Given cities and traveling times between cities, a
traveling salesman wants to know a shortest route to visit all cities exactly once
and come back to where he/she started.
5. Bin Packing Problem: Given a set of objects, their sizes and a number of bins of
the same size, find out whether or not the objects can be put into the bins.
6. Partition Problem: Given a set of integers, group them into two groups so that the
sum of the numbers of one group is equal to that of the other group.
7. Subgraph Isomorphism Problem: Given two graphs, find out whether or not one is
a subgraph of the other.
8. Set Cover Problem: Given a set S, a collection of subsets of S and an integer k,
find out whether or not there are k or less subsets in the collection whose union is
S.
9. Knapsack Problem: Given a knapsack of size S, a set of objects, their sizes, their
values and an integer V, is it possible to select objects so that the sum of their
sizes does not exceed S and the sum of their values is V or larger ?
10. 3-Dimensional Matching: Given three sets A, B and C of the same size, and a
subset S of the Cartesian product A X B X C, is there a subset T, called a
matching, of S such that every element of A, B, and C appears exactly once in T ?

For example, let A = {1,2}, B = {a,b}, and C = {x,y}, and S = {(1,b,x), (1,b,y),
(2,a,x), (2,b,y)}.
Then T = {(1,b,y), (2,a,x)} is a desired set satisfying all the requirements. Note
that {(1,b,x), (2,a,x)} is not a matching.
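A brute-force check for small instances of 3-Dimensional Matching is easy to write, though it is exponential in |S| in general, in line with the problem being NP-complete. On the example above it recovers exactly the matching T:

```python
from itertools import combinations

def find_matching(a, b, c, s):
    """Brute force over subsets of S of size |A|: a matching must cover
    every element of A, B and C exactly once. Returns a matching as a
    tuple of triples, or None if no matching exists."""
    n = len(a)
    for t in combinations(s, n):
        if (set(x for x, _, _ in t) == set(a)
                and set(y for _, y, _ in t) == set(b)
                and set(z for _, _, z in t) == set(c)):
            return t
    return None

S = [(1, "b", "x"), (1, "b", "y"), (2, "a", "x"), (2, "b", "y")]
find_matching({1, 2}, {"a", "b"}, {"x", "y"}, S)
```

With n triples required and |S| candidates, there are C(|S|, n) subsets to try, which grows exponentially; no substantially better algorithm is known for the general problem.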
