
Theory of Automata

CS09305
Before the start of the formal lecture
Look at the statement given below
The Course

• Course Code: CS09305
• Course Title: Theory of Automata
• Instructor: Shahid Yousaf
– Email Address: shahid.yousaf@cs.uol.edu.pk
– Office: Room 109 (1st Floor)
• Web Address: https://Slate.uol.edu.pk
• Term (Semester): Spring 2021
• Duration: 16 Weeks
Text and Reference Material
1. Introduction to Computer Theory, by Daniel I. Cohen, John Wiley and Sons, Inc., 1991, Second Edition
2. Introduction to Languages and Theory of Computation, by J. C. Martin, McGraw Hill Book Co., 1997, Second Edition
Grading
• Following is the division of marks:

• Mid-Term Exam 25
• Assignments 10
• Quizzes 15
• Final Exam 50

– Marks division might change during the semester
Introduction

• Psychologists, mathematicians, engineers and some of the first computer scientists shared a common interest:
– To model the human thought process,
– whether in the brain or in a computer.

• Warren McCulloch and Walter Pitts, two neurophysiologists, were the first to present a description of finite automata in 1943.
Objective of the Course
• This course is about the fundamental
capabilities and limitations of computers.
• This theory is highly relevant to practice, for example in the design of new programming languages, compilers, and string searching.
• This course helps you learn problem-solving skills.
• Theory teaches you how to think about, express, and solve problems.
Cont….
• Every time we introduce a new machine, we will
learn its language; and every time we develop a
new language, we will try to find a machine
that corresponds to it.
• In particular, the way we shall be studying
about computers is to build mathematical
models, called machines, and then to study
their limitations by analyzing the types of
inputs on which they can operate successfully.
• The collection of these successful inputs is
called the language of the machine
Automata Theory
• Automata theory is the study of:
– Abstract machines (or, more appropriately, abstract 'mathematical' machines or systems)
– The computational problems that can be solved using these machines.
• These abstract machines are called automata.
• The word automaton comes from Greek and means “something that works automatically”; automata is its plural. It is related to the word automation, which refers to designing machines that replace human effort.
• An automaton can be a finite representation of a formal language that may be an infinite set.
Mathematical Preliminaries

SETS
• Definition: A set is a well-defined collection of objects, which are
• unordered
• distinct
• same type
• with common properties
• Sets are written with curly braces {}, and the elements
in the set are written within the curly braces.
• The set {a, b, c} has elements a, b, and c.
• The sets {a, b, c} and {b, c, b, a, a} are the same since
order does not matter in a set and since redundancy
does not count.
• The set {a} has element a. Note that {a} and a are
different things; {a} is a set with one element a.
• The set {x^n : n = 1, 2, 3, . . .} consists of x, xx, xxx, . . ..
• The set of positive even numbers is {2, 4, 6, 8, 10, 12, . . .} = {2n : n = 1, 2, 3, . . .}.
• The set of positive odd numbers is {1, 3, 5, 7, 9, 11, 13, . . .} = {2n + 1 : n = 0, 1, 2, . . .}.
Set Representations

C = { a, b, c, d, e, f, g, h, i, j, k }

C = { a, b, …, k } finite set

S = { 2, 4, 6, … } infinite set

S = { j : j > 0, and j = 2k for some k>0 }

S = { j : j is positive and even }

Courtesy Costas Busch - RPI

A = { 1, 2, 3, 4, 5 }

Universal Set: all possible elements

U = { 1 , … , 10 }

[Venn diagram: A = { 1, 2, 3, 4, 5 } drawn inside U; the elements 6, 7, 8, 9, 10 lie outside A]
Set Operations
A = { 1, 2, 3 } B = { 2, 3, 4, 5 }
• Union
A ∪ B = { 1, 2, 3, 4, 5 }
• Intersection
A ∩ B = { 2, 3 }
• Difference
A − B = { 1 }
B − A = { 4, 5 }
[Venn diagrams of A and B illustrating each operation]
• Complement
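Universal set = {1, …, 7}
A = { 1, 2, 3 }   A′ = { 4, 5, 6, 7 }   (A′ denotes the complement of A)

[Venn diagram: A = { 1, 2, 3 } inside the universal set, with 4, 5, 6, 7 outside A]

(A′)′ = A

{ even integers }′ = { odd integers }   (universal set: the integers)

[Venn diagram: the integers split into even (0, 2, 4, 6) and odd (1, 3, 5, 7)]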
DeMorgan’s Laws

(A ∪ B)′ = A′ ∩ B′

(A ∩ B)′ = A′ ∪ B′

(′ denotes complement with respect to the universal set)
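The set operations and DeMorgan's laws above can be tried out with Python's built-in set type. This is only a small sketch: the `complement` helper and the choice of the universal set {1, …, 7} (borrowed from the complement slide) are assumptions made for illustration, and checking the laws on one example is not a proof.

```python
# Sets from the slides; U is the universal set used for complements.
U = set(range(1, 8))          # {1, ..., 7}
A = {1, 2, 3}
B = {2, 3, 4, 5}

def complement(s, universe=U):
    """Complement of s relative to the universal set (hypothetical helper)."""
    return universe - s

union = A | B                 # A ∪ B
intersection = A & B          # A ∩ B
a_minus_b = A - B             # A − B
b_minus_a = B - A             # B − A

# DeMorgan's laws, checked on this one example only.
demorgan_1 = complement(A | B) == complement(A) & complement(B)
demorgan_2 = complement(A & B) == complement(A) | complement(B)
```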
Empty, Null Set: ∅
∅ = { }

S ∪ ∅ = S

S ∩ ∅ = ∅

∅′ = Universal Set   (′ denotes complement)

S − ∅ = S

∅ − S = ∅
Subset
A = { 1, 2, 3, 5 } B = { 1, 2, 3, 4, 5 }
Subset: A ⊆ B

Proper Subset: A ⊂ B

[Venn diagram: A drawn inside B]
Disjoint Sets
A = { 1, 2, 3 } B = { 5, 6 }

A ∩ B = ∅

[Venn diagram: A and B drawn as non-overlapping regions]
Set Cardinality
• For finite sets
A = { 2, 5, 7, 1, 10 }

|A| = 5

(set size)

Powersets
A powerset is a set of sets

S = { a, b, c }

Powerset of S = the set of all the subsets of S

2^S = { ∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c} }

Observation: | 2^S | = 2^|S| ( 8 = 2^3 )
Cartesian Product
A = { 2, 4 } B = { 2, 3, 5 } C={ 7,8}

A × B = { (2, 2), (2, 3), (2, 5), (4, 2), (4, 3), (4, 5) }

|A × B| = |A| |B|

Generalizes to more than two sets

A × B × … × Z
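As a rough illustration of powersets and Cartesian products, the sketch below builds both with the standard `itertools` module. The function name `powerset` is a hypothetical helper, and subsets are stored as `frozenset` values only because Python sets cannot contain ordinary (mutable) sets.

```python
from itertools import chain, combinations, product

def powerset(s):
    """Return the set of all subsets of s, each stored as a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(sub) for sub in subsets}

S = {"a", "b", "c"}
P = powerset(S)            # 2^S, so it should contain 2^|S| subsets

A = {2, 4}
B = {2, 3, 5}
AxB = set(product(A, B))   # ordered pairs (x, y) with x in A and y in B
```

The same `product` call accepts more than two sets, which matches the generalization A × B × … × Z.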
Theory of Automata
Handout no 2
Languages

Language

• In English, there are at least three different types of entities: letters, words, sentences.
• Letters are from a finite alphabet { a, b, c, . . . , z }
• Words are made up of certain combinations of letters from the alphabet.
Not all combinations of letters lead to a valid English word.
• Sentences are made up of certain combinations of words.
Not all combinations of words lead to a valid English sentence.

• So we see that some basic units are combined to make bigger units.
Languages
• How can you tell whether a given sentence belongs to a particular language?
– Black is cat the
– The tea is hot
– I like chocolates two much
• Rules give a clue to forming as well as
validating sentences.
Formal vs. Informal

• Informal language
• Incoherent strings are also understandable
• For example, idioms
• Raises ambiguity
– Interpretation varies with region
– The same word can have multiple meanings.
• For example: like, light, base, etc.
Informal languages

• Natural languages are generally defined informally
• Human brain
– Capable of understanding incoherent, even invalid, sentences.
• You mangoes like
– Rectifies grammatical errors etc.
– Resolves ambiguity
• Interprets according to context
• Uses supporting aids such as facial expressions and body language
How to Communicate with machines?
• Need a language: but of what sort?
• Machines don’t have a human mind, though they may partially imitate it
• They would fail on incorrect or ambiguous input
– Some recovery or input correction may be offered, but it is very limited.
• Thus we need a precise, explicit and universal definition of the communication language
Summary of Languages
• Three aspects/specifications
– Lexical
• Defines valid words/units of a language
– Syntactic
• Defines rules for combining the units to form
valid sentences (computer programs in context
of machines)
– Semantic
• Concerned with the interpretation or meaning of
a sentence (what output to produce in context
of machines)
• Affected by ambiguity the most.
Formal languages

• Rules defined explicitly and clearly


• No ambiguities
• Universally uniform understanding
• Lets the machine
– Interpret an input uniformly every time. i.e.
always produces same output for a
particular input
– Explicitly reject invalid input
Formal Languages

• Need uniformly understandable notation
• Representations
– Alphabet
• Represents a finite set of fundamental units of a language, e.g. for English ∑ = {a, b, …, z, A, …, Z}
∑ = {0,1}
∑ = {0,1,2,3,4,5,6,7,8,9}
Formal Languages

– List of words
• Set of all valid words of a given language, e.g., a language English_Words that contains all valid words of English would have Γ = {all entries of the dictionary + punctuation marks and blank space, …}
• Denoted by Γ
• Strings: A string is a finite sequence of symbols chosen from an alphabet. For example
0111100, 123045, abbbcdeg, etc.
• String Variable: A letter used for
denoting a string. The author uses w, x,
y and z as string variable. For example
w = 0111100 , x = 123045, z = abbbcdeg
• Length of String: The number of
positions for symbols in the string. For
simplicity we can say that it is the
number of symbols in the string. For
example
|w| = 7 , |x| = ? , |z| = ?
Finite vs. Infinite Languages

• Finite Languages
– Finite set of words
– Can be defined by exhaustively listing all of its words
– E.g. English_Words
• Infinite Languages
– Infinite set of valid words
– Can’t be listed completely
– E.g. English_Sentences
Infinite Languages
• Most languages of interest are infinite
• How can you check whether a word belongs to a language?
– If the language is finite
• Check for its entry in the list of words
– If the language is infinite
• Validate it against the rules
Defining Languages
• Define the alphabet set ∑
• Define rules for forming valid words and sequences of words from ∑
– Called grammar
– Can be descriptive
– Can be mathematical
• Can also define supporting functions, e.g., length(x), reverse(x)
Defining languages
• Example ∑ = {a, b, …, z}
– L = {all words formed only of an odd number of x’s}
– L = {x^n | n is odd}
– L = {all words of length less than or equal to 4}
– PALINDROME = {Λ, and all strings x such that reverse(x) = x}
Kleene Closure
• Set closure
• Kleene Closure (applied to ∑)
– The set of all finite strings that can be formed from the elements of ∑, where elements may be repeated any number of times.
– Denoted by ∑*
– Also called Kleene star.
• ∑* : The set of all strings over an alphabet ∑, called the Kleene star closure of the alphabet. So we have
∑* = ∑^0 ∪ ∑^1 ∪ ∑^2 ∪ ∑^3 ∪ …
• ∑+ : The set of all strings over an alphabet ∑ excluding the empty string ε, called the plus operation. So we have
∑+ = ∑^1 ∪ ∑^2 ∪ ∑^3 ∪ …
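Since ∑* is infinite, a program can only list a finite slice of it. The following sketch enumerates all strings up to a chosen length, level by level (∑^0, ∑^1, ∑^2, …). The function names and the `max_len` cut-off are assumptions introduced for illustration; the empty Python string `""` stands in for the empty string.

```python
from itertools import product

def strings_of_length(sigma, n):
    """Sigma^n: all strings of exactly n symbols (Sigma^0 is [''])."""
    return ["".join(p) for p in product(sorted(sigma), repeat=n)]

def kleene_star(sigma, max_len):
    """Finite slice of Sigma*: all strings of length 0..max_len."""
    result = []
    for n in range(max_len + 1):
        result.extend(strings_of_length(sigma, n))
    return result

def kleene_plus(sigma, max_len):
    """Finite slice of Sigma+: the same list without the empty string."""
    return [w for w in kleene_star(sigma, max_len) if w != ""]
```

For example, `kleene_star({"a", "b"}, 2)` lists the empty string first, then a, b, aa, ab, ba, bb.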
Some observations
• Λ represents the empty string (it is not a letter of the alphabet, and thus not a part of ∑)
• ε also represents the empty string
• ε is not equivalent to ∅ (the empty set)
• If ∑ = ∅ then
– ∑* = {Λ}
• Is S* = (S*)*, and so on?
Alphabets and Strings
We will use small alphabets: ∑ = {a, b}

Strings: a, ab, abba, baba, aaabbbaabab

String variables: u = ab, v = bbbaaa, w = abba
String Operations

w = a1 a2 … an, e.g. abba
v = b1 b2 … bm, e.g. bbbaaa

Concatenation

wv = a1 a2 … an b1 b2 … bm, e.g. abbabbbaaa

Reverse

w = a1 a2 … an, e.g. ababaaabbb
w^R = an … a2 a1, e.g. bbbaaababa
String Length
w = a1 a2 … an
Length: |w| = n

Examples: |abba| = 4
|aa| = 2
|a| = 1
Length of Concatenation

|uv| = |u| + |v|

Example: u = aab, |u| = 3
v = abaab, |v| = 5

|uv| = |aababaab| = 8
|uv| = |u| + |v| = 3 + 5 = 8
Empty String
A string with no letters: Λ (also written ε)

Observations: |Λ| = 0

Λw = wΛ = w

Λabba = abbaΛ = abba
Substring
Substring of string:
a subsequence of consecutive characters

String   Substring
abbab    ab
abbab    abba
abbab    b
abbab    bbab
Prefix and Suffix
w = uv, where u is a prefix and v is a suffix of w

Example: w = abbab

Prefix   Suffix
Λ        abbab
a        bbab
ab       bab
abb      ab
abba     b
abbab    Λ
Another Operation
w^n = ww…w (w repeated n times)

Example: (abba)^2 = abbaabba

Definition: w^0 = Λ

(abba)^0 = Λ
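The string operations from the preceding slides map directly onto ordinary Python strings. This is a small sketch with hypothetical helper names (`reverse`, `prefixes`, `suffixes`, `power`); the empty Python string `""` plays the role of the empty string.

```python
def reverse(w):
    """w^R: the letters of w in the opposite order."""
    return w[::-1]

def prefixes(w):
    """All prefixes of w, from the empty string up to w itself."""
    return [w[:i] for i in range(len(w) + 1)]

def suffixes(w):
    """All suffixes of w, from w itself down to the empty string."""
    return [w[i:] for i in range(len(w) + 1)]

def power(w, n):
    """w^n: w concatenated with itself n times (w^0 is the empty string)."""
    return w * n

w, v = "abba", "bbbaaa"
wv = w + v   # concatenation
```

Each pair `prefixes(w)[i]`, `suffixes(w)[i]` is one way of splitting w as uv.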
The * Operation
∑* : the set of all possible strings from alphabet ∑

∑ = {a, b}
∑* = {Λ, a, b, aa, ab, ba, bb, aaa, aab, …}

The + Operation
∑+ : the set of all possible strings from alphabet ∑ except Λ

∑ = {a, b}
∑* = {Λ, a, b, aa, ab, ba, bb, aaa, aab, …}

∑+ = ∑* − {Λ}

∑+ = {a, b, aa, ab, ba, bb, aaa, aab, …}
Languages
A language is any subset of ∑*

Example: ∑ = {a, b}
∑* = {Λ, a, b, aa, ab, ba, bb, aaa, …}

Languages:
∅
{a, aa, aab}
{Λ, abba, baba, aa, ab, aaaaaa}
Note that:

Sets: ∅ = { } ≠ {Λ}

Set size: |{ }| = |∅| = 0

Set size: |{Λ}| = 1

String length: |Λ| = 0
Another Example

An infinite language L = {a^n b^n : n ≥ 0}

Λ, ab, aabb, aaaaabbbbb ∈ L
abb ∉ L
Operations on Languages
The usual set operations

{a, ab, aaaa} ∪ {bb, ab} = {a, ab, bb, aaaa}
{a, ab, aaaa} ∩ {bb, ab} = {ab}
{a, ab, aaaa} − {bb, ab} = {a, aaaa}

Complement: L′ = ∑* − L

{a, ba}′ = {Λ, b, aa, ab, bb, aaa, …}
Reverse

Definition: L^R = {w^R : w ∈ L}

Examples: {ab, aab, baba}^R = {ba, baa, abab}

L = {a^n b^n : n ≥ 0}

L^R = {b^n a^n : n ≥ 0}
Concatenation

Definition: L1 L2 = {xy : x ∈ L1, y ∈ L2}

Example: {a, ab, ba}{b, aa} = {ab, aaa, abb, abaa, bab, baaa}
Another Operation
Definition: L^n = LL…L (L concatenated with itself n times)

{a, b}^3 = {a, b}{a, b}{a, b} = {aaa, aab, aba, abb, baa, bab, bba, bbb}

Special case: L^0 = {Λ}

{a, bba, aaa}^0 = {Λ}
Star-Closure (Kleene *)

Definition: L* = L^0 ∪ L^1 ∪ L^2 ∪ …

Example:
{a, bb}* = {Λ, a, bb, aa, abb, bba, bbbb, aaa, aabb, abba, abbbb, …}
Positive Closure

Definition: L+ = L^1 ∪ L^2 ∪ …
= L* − {Λ} (when Λ is not itself a word of L)

{a, bb}+ = {a, bb, aa, abb, bba, bbbb, aaa, aabb, abba, abbbb, …}
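The language operations above (concatenation, powers, star, reverse) can be sketched with Python sets of strings. The helper names are hypothetical, and because L* is usually infinite, `star` only collects the powers L^0 through L^max_power, which is an assumption made to keep the result finite.

```python
def concat(l1, l2):
    """L1 L2 = {xy : x in L1, y in L2}."""
    return {x + y for x in l1 for y in l2}

def power(lang, n):
    """L^n, with L^0 = {''} (the language holding only the empty string)."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result

def star(lang, max_power):
    """Finite slice of L*: the union of L^0 .. L^max_power."""
    result = set()
    for n in range(max_power + 1):
        result |= power(lang, n)
    return result

def reverse(lang):
    """L^R = {w^R : w in L}."""
    return {w[::-1] for w in lang}
```

Dropping `power(lang, 0)` from the union gives the corresponding slice of the positive closure L+.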


Theory of Automata

Recursive Definitions
Recursive Language Definition
• A recursive definition is characteristically a three-step process:
• 1. First, we specify some basic objects in the set. The number of basic objects specified must be finite.
• 2. Second, we give a finite number of rules for constructing more objects in the set from the ones we already know.
• 3. Third, we declare that no objects except those constructed in this way are allowed in the set.
Example:

• P-EVEN (the set of positive even integers) is defined by these three rules:
• Step 1: 2 is in P-EVEN.
• Step 2: If x is in P-EVEN, then so is x + 2.
• Step 3: The only elements in the set P-EVEN are those that can be produced from the two steps above.
Example:
• Let PALINDROME be the set of all strings over the alphabet ∑ = {a, b} that are spelled the same forward and backward; i.e., PALINDROME = {w : w = reverse(w)} = {Λ, a, b, aa, bb, aaa, aba, bab, bbb, aaaa, abba, . . .}.
Recursive Definition of PALINDROME
A recursive definition for PALINDROME is as follows:
• Step 1: Λ, a, and b are in PALINDROME.
• Step 2: If w is in PALINDROME, then so are awa and bwb.
• Step 3: No other string is in PALINDROME unless it can be produced by Steps 1 and 2.
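One way to convince yourself that the recursive rules generate exactly the palindromes is to run them. The sketch below applies Step 1 and then Step 2 repeatedly, stopping at a length bound; the function name `palindromes` and the `max_len` parameter are assumptions introduced for illustration.

```python
def palindromes(max_len, alphabet="ab"):
    """Strings of length <= max_len produced by the recursive rules."""
    # Step 1: the basic objects are the empty string and the single letters.
    current = {""} | set(alphabet)
    result = {w for w in current if len(w) <= max_len}
    # Step 2: wrap known palindromes as x + w + x until they get too long.
    while current:
        current = {x + w + x for w in current for x in alphabet
                   if len(w) + 2 <= max_len}
        result |= current
    # Step 3: nothing else is added.
    return result
```

Comparing the result with a brute-force check of `w == w[::-1]` over all short strings is a quick sanity test of the definition, though of course not a proof.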
Arithmetic Expressions (AE)
• We recursively define AE using the following rules:
• What are the rules?
Recursive Definition of AE

• Rule 1: Any number (positive, negative, or zero) is in AE.

• Rule 2: If x is in AE, then so are
(i) (x)
(ii) -x (provided that x does not already start with a minus sign)

• Rule 3: If x and y are in AE, then so are
(i) x + y (if the first symbol in y is not + or -)
(ii) x - y (if the first symbol in y is not + or -)
(iii) x * y
(iv) x / y
(v) x ** y (our notation for exponentiation)
• The above definition is the most natural, because it is the method we use to recognize valid arithmetic expressions in real life.
• For instance, suppose we wish to determine whether the following expression is valid:
(2 + 4) * (7 * (9 - 3)/4)/4 * (2 + 8) - 1
• We do not really scan over the string, looking for forbidden substrings or counting the parentheses.
• We actually imagine the expression in our mind broken down into components:
Is (2 + 4) OK? Yes
Is (9 - 3) OK? Yes
Arithmetic Expression AE

• Obviously, the following expressions are not valid:
(3 + 5) + 6)    2(/8 + 9)    (3 + (4-)8)
• The first contains unbalanced parentheses; the second contains the forbidden substring (/; the third contains the forbidden substring -).
• Are there more rules? The substrings // and */ are also forbidden.
• Are there still more?
• The most natural way of defining a valid AE is by using a recursive definition, rather than a long list of forbidden substrings.
Regular Expressions
Defining Languages Using Regular Expressions
• Previously, we defined the languages:
• L1 = {x^n for n = 1, 2, 3, . . .}
• L2 = {x, xxx, xxxxx, . . .}
• But these are not very precise ways of defining languages.
• So, now we want a very precise way of defining a language, and we will do this using regular expressions.
Regular Expressions
• Regular expressions are written in boldface letters and are a way of specifying a language.
• A formal way to define the lexical specification of a language
• Removes ambiguity altogether
• Called expressions on account of their similarity with arithmetic expressions
• Use *, + and ()
• * shows repetition
• + represents choice or disjunction (some authors use ∪ for this purpose)
• () is used for grouping
Language-Defining Symbols: Star Sign
• We now introduce the use of the Kleene star, applied not to a set, but directly to the letter x and written as a superscript: x*.
• This simple expression indicates some sequence of x’s (maybe none at all):
x* = Λ or x or x^2 or x^3 …
= x^n for some n = 0, 1, 2, 3, …

a* = Λ, a, aa, aaa, aaaa, aaaaa, …
(ab)* = Λ, ab, abab, ababab, …

• We can think of the star as an unknown power.
• That is, x* stands for a string of x’s, but we do not specify how many, and it may be the null string Λ.
• The notation x* can be used to define languages by writing, say, L = language(x*)
• Since x* is any string of x’s, L is then the language of all possible strings of x’s of any length (including Λ).

• Given the alphabet ∑ = {a, b}, suppose we wish to define the language L that contains all words of the form one a followed by some number of b’s (maybe no b’s at all); that is
L = {a, ab, abb, abbb, abbbb, …}
• Using the language-defining symbol, we may write
L = language(ab*)
• This equation means that L is the language in which the words are the concatenation of an initial a with some or no b’s.

• We can apply the Kleene star to the whole string ab if we want:
(ab)* = Λ or ab or abab or ababab …
• Observe that
(ab)* ≠ a*b*
because the language defined by the expression on the left contains the word abab, whereas the language defined by the expression on the right does not.
• If we want to define the language L1 = {x, xx, xxx, …} using the language-defining symbol, we can write
L1 = language(xx*)
which means that each word of L1 must start with an x followed by some (or no) x’s.
• Note that we can also define L1 using the notation + (as an exponent) introduced earlier:
L1 = language(x+)
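The star expressions above can be experimented with using Python's standard `re` module, as a rough sketch. Two notational caveats: `re.fullmatch` is used so that the whole word must match (not just a part of it), and Python's `+` after a symbol means "one or more", matching the exponent notation x+ from the slides. The helper `in_language` is a hypothetical name.

```python
import re

def in_language(pattern, word):
    """True if the whole word matches the regular expression."""
    return re.fullmatch(pattern, word) is not None

AB_STAR = r"(ab)*"        # Λ, ab, abab, ababab, ...
A_STAR_B_STAR = r"a*b*"   # some a's followed by some b's

# abab separates the two languages, as the slide observes.
abab_in_left = in_language(AB_STAR, "abab")
abab_in_right = in_language(A_STAR_B_STAR, "abab")

# xx* and x+ describe the same words x, xx, xxx, ...
same_on_small_words = all(
    in_language(r"xx*", "x" * n) == in_language(r"x+", "x" * n)
    for n in range(6)
)
```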
Plus Sign

• Let us introduce another use of the plus sign. By the expression
x + y
where x and y are strings of characters from an alphabet, we mean either x or y.
• Care should be taken so as not to confuse this notation with the notation + (as an exponent).
Example

• Consider the language T over the alphabet Σ = {a, b, c}:
• T = {a, c, ab, cb, abb, cbb, abbb, cbbb, abbbb, cbbbb, …}
• In other words, all the words in T begin with either an a or a c and then are followed by some number of b’s.
• Using the above plus sign notation, we may write this as
T = language((a + c)b*)
Example

• Consider a finite language L that contains all the strings of a’s and b’s of length exactly three:
L = {aaa, aab, aba, abb, baa, bab, bba, bbb}
• Thus, we may write
L = language((a + b)(a + b)(a + b))
Example

• In general, if we want to refer to the set of all possible strings of a’s and b’s of any length whatsoever, we could write
language((a + b)*)
• This is the set of all possible strings of letters from the alphabet Σ = {a, b}, including the null string.
Regular Expressions
• Given ∑ = {a, b}
• a* = {Λ, a, aa, aaa, aaaa, aaaaa, …}
• ab* = {a, ab, abb, abbb, abbbb, …}
• a + b = {a, b}
• (ab)* = {Λ, ab, abab, ababab, …}
• (a + b)* = {Λ and every string of a’s and b’s}
Formal Definition of Regular Expressions
• The set of regular expressions is defined by the following rules:
1. Every letter of ∑ is a regular expression, and so is Λ
2. If r1 and r2 are regular expressions, then so are
• (r1)
• r1r2
• r1 + r2
• r1*
3. Nothing else is a regular expression
Regular Expressions
• Are the following regular expressions? If so, what languages do they generate?
• a(b + a)*
• bb(a + b)
• (a + b)(a + b)(a + b)
• (a + b)*ba
• (a + b)*a(a + b)*
• (a + b)*aa(a + b)*
Regular Expressions
• Write REs for the following languages over ∑ = {a, b}.
• All words ending with b
• All words that start with a
• All words that start with a double letter
• All words that contain at least one double letter
• All words that start and end with a double letter
• All words of length >= 3
• All words that contain exactly one a or exactly one b
• All words that do not end with b
Regular Expressions
• Example: Give a regular expression for each of the following over the alphabet { 0, 1 }:
• { w | w begins with a ‘1’ and ends with a ‘0’ }
• { w | w contains exactly three 1’s }
• { w | w contains at least three 1’s }
• { w | w is a string that begins with a ‘1’ and contains exactly two 0’s }

• The regular expression that defines a language is not unique.
[Section 1.3]
Regular expressions

Examples: give regular expressions for the following languages:
- { w ∈ {0,1}* | w contains the substring 01 }
or
- { w ∈ ∑* | w contains the substring 01 }
- { w ∈ {0,1}* | the second symbol of w is a 1 }
- { w ∈ {0,1}* | |w| < 4 }
Some exercises on regular expressions

• Example: What is L((a + b)*a(a + b)*)?
• Ans: {w in {a, b}* | w contains at least one a}
• Write regular expressions for:
1. {w in {a,b}* | |w| is odd}.
2. {w in {a,b}* | w does not have ab as a substring}.
Regular Expression Identities
1. Λu = uΛ = u
2. Λ* = Λ
3. u + v = v + u
4. u + u = u
5. u* = (u*)*
6. u(v + w) = uv + uw
7. (u + v)w = uw + vw
Languages Associated with Regular Expressions
Definition

• The following rules define the language associated with any regular expression:

• Rule 1: The language associated with the regular expression that is just a single letter is that one-letter word alone, and the language associated with Λ is just {Λ}, a one-word language.

• Rule 2: If r1 is a regular expression associated with the language L1 and r2 is a regular expression associated with the language L2, then:
(i) The regular expression (r1)(r2) is associated with the product L1L2, that is, the language L1 times the language L2:
language(r1r2) = L1L2
(ii) The regular expression r1 + r2 is associated with the language formed by the union of L1 and L2:
language(r1 + r2) = L1 + L2
(iii) The language associated with the regular expression (r1)* is L1*, the Kleene closure of the set L1 as a set of words:
language(r1*) = L1*
Languages associated with REs
• r1 = a, r2 = b, r3 = Λ
• If L1 is associated with r1 and L2 is associated with r2, then
• language(r1r2) = L1L2
• language(r1 + r2) = L1 + L2 = L1 ∪ L2
• language(r1*) = L1* (Kleene closure of L1)
Regular Languages
• How can we tell whether a language is regular?
• Try to define an RE for it: if one exists, the language is regular; otherwise it is non-regular
• Definition
• Any language that can be represented by a regular expression is called a regular language
• Note that if r1 and r2 are regular expressions corresponding to the languages L1 and L2, then the languages generated by r1 + r2, r1r2 (or r2r1) and r1* (or r2*) are also regular languages.
Regular Languages
• All finite languages are regular
• Example
• Consider the language L, defined over Σ = {a, b}, of strings of length 2 starting with a. Then L = {aa, ab} may be expressed by the regular expression aa + ab. Hence L, by definition, is a regular language.
Theorem

• If L is a finite language (a language with only finitely many words), then L can be defined by a regular expression. In other words, all finite languages are regular.

Proof

• Let L be a finite language. To make one regular expression that defines L, we turn all the words in L into boldface type and insert plus signs between them.
• For example, the regular expression that defines the language
L = {baa, abbba, bababa} is (baa + abbba + bababa)
• This algorithm only works for finite languages because an infinite language would become a regular expression that is infinitely long, which is forbidden.
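The construction in the proof is mechanical, so it can be sketched in a few lines. The function name `finite_language_regex` is hypothetical; the sketch assumes a non-empty finite set of words, writes the slides' `+` as Python's `|`, and uses `re.escape` only as a precaution in case a word contains characters that are special in Python's regular expression syntax.

```python
import re

def finite_language_regex(words):
    """Join the words with '|' (the slides' '+'); assumes words is non-empty."""
    # Sorting only makes the resulting pattern deterministic.
    return "|".join(re.escape(w) for w in sorted(words))

L = {"baa", "abbba", "bababa"}
pattern = finite_language_regex(L)

def in_L(word):
    """Membership test: the whole word must match one alternative."""
    return re.fullmatch(pattern, word) is not None
```

Here `in_L("abbba")` should hold while `in_L("ba")` should not, since `fullmatch` requires the entire word to equal one of the listed alternatives.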
Equivalent Regular Expressions
• Definition
• Two regular expressions are said to be equivalent if they generate the same language.
• Example
• Consider the following regular expressions
• r1 = (a + b)*(aa + bb)
• r2 = (a + b)*aa + (a + b)*bb
Both regular expressions define the language of strings ending in aa or bb.
Example

• The language of all words that have at least two a’s can be defined by the expression:
(a + b)*a(a + b)*a(a + b)*
• Another expression that defines all the words with at least two a’s is
b*ab*a(a + b)*
• Hence, we can write
(a + b)*a(a + b)*a(a + b)* = b*ab*a(a + b)*
where by the equal sign we mean that these two expressions are equivalent in the sense that they describe the same language.
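A quick, informal way to gain confidence that two expressions are equivalent is to compare them on every word up to some length. The sketch below does this with Python's `re` module, writing the slides' `+` as `|`. The helper `agree_up_to` and the length bound are assumptions for illustration; agreement on short words is evidence, not a proof of equivalence.

```python
import re
from itertools import product

R1 = r"(a|b)*a(a|b)*a(a|b)*"   # (a + b)*a(a + b)*a(a + b)*
R2 = r"b*ab*a(a|b)*"           # b*ab*a(a + b)*

def agree_up_to(p1, p2, max_len, alphabet="ab"):
    """True if p1 and p2 accept exactly the same words of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(re.fullmatch(p1, w)) != bool(re.fullmatch(p2, w)):
                return False
    return True
```

By contrast, comparing `(ab)*` with `a*b*` in the same way reports a disagreement, because a word such as abab is matched by only one of them.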
Example

• Let V be the language of all strings of a’s and b’s in which either the strings are all b’s, or else an a followed by some b’s. Let V also contain the word Λ. Hence,
V = {Λ, a, b, ab, bb, abb, bbb, abbb, bbbb, …}
• We can define V by the expression
b* + ab*
where Λ is included in b*.
• Alternatively, we could define V by
(Λ + a)b*
which means that in front of the string of some b’s, we have either an a or nothing.
Chapter 5:

Finite Automata (FA) / Deterministic Finite Automata (DFA)
Formal Language Definitions
• Why do we need formal definitions of languages?
– To define a precise, unambiguous and uniform interpretation
– To communicate with machines
• Formal language notation/definition
– Regular Expression
• Tells how to generate the words of the language
• Tells which words belong to the language
• Can this be automated?
Finite Automata
• Language Recognizers
• Machines embedded with grammatical
rules that recognize a language
• Automated language recognition
• REs define a language; FAs accept (or reject) its words
Finite Automata

• Visual notations (abstract machines)
• A kind of graph consisting of nodes called states and edges called transitions
• States serve as the machine’s memory: the current state records what the machine needs to remember about the input read so far
• Transitions define where to go on reading a particular character
Finite Automaton

• A finite automaton is a collection of the following:
– A finite set of states
• Exactly one initial state (start state)
• Zero or more final states that mark the acceptance of a word
• Intermediate states that are neither start nor final states
– An alphabet ∑ of possible input letters
– A finite set of transitions that tell, for each state and for each letter of the input alphabet, which state to go to next
Finite Automaton
• The start state marks the beginning of reading
every input
• Reading a character triggers a transition from
that state which may transfer control to some
other state and the reading mechanism
advances to next character and the process
continues
• When the input terminates, if the control is left
with a final or accepting state, the input string is
accepted otherwise it is rejected and the FA
resets control to the initial state for next input
Finite Automaton
• The state to go to next on reading a letter of the input string is determined automatically and deterministically by the transitions and is fixed (for that particular state and input character)
• The transitions are fired automatically
• A single character is consumed for each transition
Finite Automaton

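• Visual representations
– States represented by circles, each labeled to identify it distinctly
• Initial (- sign) and Final states (+ sign)
– Transitions
• Directed edges labeled with the characters of ∑
Finite Automata
• Example
– Language of all words that end with b

[Transition diagram: state 1 is the start state (-1) and state 2 is the final state (+2). From either state, reading a leads to state 1 and reading b leads to state 2.]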
Characteristics of FA

• Every FA must have exactly one start state
• There may be several final states, or none
– In the latter case the FA accepts no words at all, i.e. its language is empty
• Only a single character is read at a state at a time
Characteristics of FA

• Every state defines a transition for every character in the alphabet; equivalently, every state has exactly as many outgoing transitions as there are characters in ∑, each labeled with a distinct letter from ∑
– No duplicate edges
– No missing edges
Characteristics of FAs

• An FA is built for a particular language and recognizes only that language
• It should accept all valid words of the language
• It should reject all words that are not in the language
Examples ∑ = {a, b}
• All words that contain even number of
letters
• All words ending with ab
• All words that start with a
• {}
• All words of length >=3
• All words that don’t end at ba
• All words that contain a triple letter either
aaa or bbb
Mathematical Representations of FA

• FA = (Q, ∑, q0, F, δ)
– Q = {q0, q1, q2, …, qn}, a finite set of states
– ∑ = the input alphabet (a finite set of letters)
– q0 ∈ Q is the start state
– F ⊆ Q is the set of final states; F may be ∅
– δ is the transition function, δ : Q × ∑ → Q
• δ(qi, xj) = qk

• Mathematical representation of the FA for all words ending with b
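The five-tuple can be written down almost literally in code. The sketch below encodes the FA for words ending with b, using the state names 1 (start) and 2 (final) from the transition diagram rather than q0 and q1; the dictionary layout and the function name `run_dfa` are assumptions chosen for illustration.

```python
def run_dfa(delta, start, finals, word):
    """Follow exactly one transition per input letter; accept if we end in a final state."""
    state = start
    for ch in word:
        state = delta[(state, ch)]   # delta(q, x) is defined for every pair
    return state in finals

# FA for "all words over {a, b} that end with b".
Q = {1, 2}
SIGMA = {"a", "b"}
START = 1
FINALS = {2}
DELTA = {
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 2,
}
```

For example, `run_dfa(DELTA, START, FINALS, "aab")` follows 1 → 1 → 1 → 2 and accepts, while the empty word stays in state 1 and is rejected.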
Transition Tables

• Tabular representation of an FA
– Table representing states and transitions

FA for all words ending with b:

State   a   b
-1      1   2
+2      1   2
Languages of FA

• FAs define regular languages

• Any language that can be defined by a regular expression can be recognized by an FA
Language of FA
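• What language does the following FA define?

[Transition diagram with four states over {a, b}: start state 1 (-1), states 2 and 3, and final state 4 (+4); the edge layout is not recoverable from the extracted text]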
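Language of FA
[Two transition diagrams over {a, b}: the first has three states and the second has two; in both, state 1 is at once the start state and the final state (-+1). The edge layout is not recoverable from the extracted text.]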
Examples

• All words that start with a double letter
• All words that start and end with a double letter
• All words that do not start with a double letter
• All words in which the second letter is b
FA
• EVEN-EVEN
– Language of all words having an even number of a’s and an even number of b’s

[Transition diagram: four states, with state 1 both start and final (-+1)]

State   a   b
-+1     3   2
2       4   1
3       1   4
4       2   3
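As a cross-check of the EVEN-EVEN machine, the sketch below encodes a four-state transition table and compares it with direct counting. The table is an assumption reconstructed from the diagram layout (b moves between states 1 and 2 and between 3 and 4; a moves between 1 and 3 and between 2 and 4), and the names `EVEN_EVEN` and `accepts_even_even` are hypothetical.

```python
# State 1: even a's, even b's (start and final)   State 2: even a's, odd b's
# State 3: odd a's, even b's                      State 4: odd a's, odd b's
EVEN_EVEN = {
    1: {"a": 3, "b": 2},
    2: {"a": 4, "b": 1},
    3: {"a": 1, "b": 4},
    4: {"a": 2, "b": 3},
}

def accepts_even_even(word):
    """Run the FA from state 1; accept if it finishes in state 1."""
    state = 1
    for ch in word:
        state = EVEN_EVEN[state][ch]
    return state == 1

def counts_are_even(word):
    """Direct definition of EVEN-EVEN, used only for comparison."""
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0
```

The machine never counts letters; it only remembers the two parities, which is why four states are enough.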
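Applications of FA
• Lexical analyzer of a compiler
• Search mechanism in a word processor

[Transition diagram: an FA with states 1 (start, -), 2, 3 and 4 (final, +) that searches text for the word cat. The edges from 1 to 2, 2 to 3 and 3 to 4 are labeled c, a and t. The remaining edge labels are: all except c; c; all except c and a; all except c and t; and any letter (a loop on state 4).]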


NFAs and Transition Graphs

Chapter 6
Deterministic FA (DFA)

• The FAs that we have studied so far are DFAs, in that
– At every state there is exactly one outgoing transition for each character, so the machine can follow the transitions deterministically
• No duplicate edges
• No missing edges
Nondeterministic Finite Automata (NFA)
• An FA in which a state can have more than one transition for the same character. The machine then has a choice of which transition to follow
– Can have duplicate transitions
– Can miss transitions for some characters
– Can have null transitions
NFA
• Often needs fewer states and transitions than an equivalent DFA
• Costly execution
– Needs concurrent processing to find a successful path
• An NFA can have both a successful and an unsuccessful path for the same input
• If an NFA has at least one successful path for an input, the input is accepted
• The machine crashes on an undefined transition, which causes an implicit reject on that path
Example

• All words that start with b over ∑ = {a, b}
• {b, bb, ba, bbb, baba, bbaaabba, …}
NFA Language recognition

• Acceptance
– If at least one successful path exists

• Rejection
– Either the machine crashes on the input, or
– No successful path exists
NFA

• Examples

– An NFA that accepts the language {bb, bbb}

– All words that contain bb in them

– All words that contain a double letter

– {aa, bb, abb, baa, abba, baab, bababaa, …}


Epsilon Transitions

• ε-transitions

• A null transition that changes state but doesn't consume any character

• Possible with NFAs and Transition Graphs (discussed next)
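Because a null transition consumes no input, a machine that sits in one state is effectively also in every state reachable through null transitions alone. A minimal Python sketch of that computation follows; the machine `DELTA` and its state names p, q, r, s are hypothetical and chosen only for illustration, and the empty string stands in for the ε label.

```python
EPSILON = ""  # label used in this sketch for a null transition

# Hypothetical machine: p --eps--> q --eps--> r --a--> s
DELTA = {
    ("p", EPSILON): {"q"},
    ("q", EPSILON): {"r"},
    ("r", "a"): {"s"},
}


def epsilon_closure(states, delta):
    """Return all states reachable from `states` using only null transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in delta.get((state, EPSILON), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


if __name__ == "__main__":
    print(sorted(epsilon_closure({"p"}, DELTA)))
```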
NFA
Alphabet = {a}

[Figure: NFA with start state q0 and states q1, q2 and q3; edges q0 → q1, q1 → q2 and q0 → q3, each labeled a]
Example: Accepting

Input: a a

• Reading the first a from q0, the NFA has two choices: q1 (first choice) and q3 (second choice). Both are followed in parallel.
• Reading the second a, q1 moves to q2. q3 has no transition, so that path is dropped.
• There is no more symbol to read and q2 is an accepting state. Therefore, the NFA will "Accept".
Example: Rejecting

Input: a a a

• Reading the first a from q0, the NFA moves to q1 and q3.
• Reading the second a, q1 moves to q2. q3 has no transition, so that path is dropped.
• Reading the third a, q2 has no transition, so that path is dropped as well.
• No current state is left, so the input cannot end in an accepting state. Therefore, the NFA will "Reject" the input string.
Language of the NFA

L = {aa}
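The parallel processing shown in the accepting and rejecting examples can be sketched in Python by tracking the set of states the NFA could currently be in. The names `NFA`, `START`, `FINALS` and `nfa_accepts` are illustrative; the transitions follow the figure, with q2 as the only accepting state.

```python
# NFA over the alphabet {a}: q0 -a-> q1, q0 -a-> q3, q1 -a-> q2
NFA = {
    ("q0", "a"): {"q1", "q3"},
    ("q1", "a"): {"q2"},
}
START = "q0"
FINALS = {"q2"}


def nfa_accepts(word: str) -> bool:
    """Follow every possible path at once by keeping a set of current states."""
    current = {START}
    for letter in word:
        following = set()
        for state in current:
            # A missing entry means "no transition": that path is dropped.
            following |= NFA.get((state, letter), set())
        current = following
    # Accept if at least one surviving path ends in an accepting state.
    return bool(current & FINALS)


if __name__ == "__main__":
    for w in ["", "a", "aa", "aaa"]:
        print(repr(w), nfa_accepts(w))
```

On input aaa the set of current states becomes empty after the third letter, which corresponds to the "Reject" slide.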
     DFA                             NFA
1    Exactly one start state         Exactly one start state
2    Zero or more final states       Zero or more final states
3    Reads a single character        Reads a single character
     per transition                  per transition
4    No duplicate edges              Can have duplicate edges
5    No missing edges                Can have missing edges
6    No null transitions             Can have null transitions


Transition Graphs (TG)

• Can read multiple characters before making a transition

• Thus every edge can be labeled with a substring instead of a single character

• Can have multiple start states

Transition Graphs

• Examples
– All words that start and end with a double letter

[Figure: TG with states -1, 2 and +3; edge labels: aa,bb (twice) and a,b]
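A TG can be simulated by searching over pairs of (state, position in the word), since an edge may consume several letters at once. The Python sketch below assumes one reading of the figure: an edge labeled aa,bb from state 1 to state 2, a loop labeled a,b on state 2, and an edge labeled aa,bb from state 2 to state 3. The names `EDGES`, `STARTS`, `FINALS` and `tg_accepts` are illustrative.

```python
# Assumed reading of the figure: (source state, label, target state)
EDGES = [
    (1, "aa", 2), (1, "bb", 2),
    (2, "a", 2), (2, "b", 2),
    (2, "aa", 3), (2, "bb", 3),
]
STARTS = {1}   # a TG may have several start states
FINALS = {3}


def tg_accepts(word: str) -> bool:
    """Search all ways of cutting `word` into edge labels along a path."""
    seen = set()
    stack = [(state, 0) for state in STARTS]
    while stack:
        state, pos = stack.pop()
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if pos == len(word) and state in FINALS:
            return True
        for source, label, target in EDGES:
            # An edge can be taken if its whole label appears at position pos.
            if source == state and word.startswith(label, pos):
                stack.append((target, pos + len(label)))
    return False


if __name__ == "__main__":
    for w in ["aabb", "bbabaa", "aab", "abaa"]:
        print(repr(w), tg_accepts(w))
```

The `seen` set keeps the search finite even if a TG contains edges with empty labels.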
     DFA                             NFA                             TG
1    Exactly one start state         Exactly one start state         Can have more than one start state
2    Zero or more final states       Zero or more final states       Zero or more final states
3    Reads a single character        Reads a single character        Can read multiple characters
     per transition                  per transition                  per transition
4    No duplicate edges              Can have duplicate edges        Can have duplicate edges
5    No missing edges                Can have missing edges          Can have missing edges
6    No null transitions             Can have null transitions       Can have null transitions
Examples
• The arc from state 1 to state 2 is labeled with the
string aa, which is not a single letter.
• There are two arcs leaving state 2 labeled with b.
• There is no arc leaving state 2 labeled with a.
• There is an arc from state 1 to state 3 labeled with Λ, which is not a letter from Σ.
• There is no arc leaving state 3 labeled with b.
Transition Graphs

• Examples
– All words that have at least one double letter in them
– All words that begin and end with different letters
– All words in which a occurs only in even clumps and that end in three or more b's
– All words that have an even number of letters
Generalized Transition Graphs
• A variation of TG
• A generalized transition graph is a
collection of three things
– A finite set of states, of which at least one is a
start state and some (may be none) are final
states
– An alphabet Σ of input letters
– Directed edges connecting some pairs of
states each labeled with a regular expression
Generalized Transition Graphs

[Figure: GTG with states -1, 2 and 3; edge labels: a*, a*, (ba + a)* and (b + Λ)]

This machine accepts all strings without a double b
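The claim on this slide can be checked mechanically for short words. The Python sketch below concatenates the two edge labels (ba + a)* and (b + Λ) into one regular expression and compares it with the property "contains no bb" for every word up to a chosen length. Leaving out the a* labels is an assumption of this sketch; they only add words that (ba + a)* already covers. The function names are illustrative.

```python
import re
from itertools import product

# (ba + a)* followed by (b + Lambda), written in Python regex syntax.
PATTERN = re.compile(r"(?:ba|a)*b?")


def accepted_by_labels(word: str) -> bool:
    """True if the whole word matches (ba + a)*(b + Lambda)."""
    return PATTERN.fullmatch(word) is not None


def has_no_double_b(word: str) -> bool:
    """Direct statement of the language described on the slide."""
    return "bb" not in word


def words_up_to(max_length: int):
    """Yield every word over {a, b} of length 0..max_length."""
    for length in range(max_length + 1):
        for letters in product("ab", repeat=length):
            yield "".join(letters)


if __name__ == "__main__":
    agree = all(accepted_by_labels(w) == has_no_double_b(w) for w in words_up_to(8))
    print("labels agree with 'no double b' up to length 8:", agree)
```

A bounded check like this is evidence rather than a proof, but it is a quick way to catch a mislabeled edge.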


GTGs

• Examples
– All words having an even number of a's and b's

– All words that start with ab

– All words having a's in clumps of even numbers and ending in one or more b's
NFA TO DFA
Nondeterministic Features of NFA
There are three main cases of nondeterminism in NFAs:
1. Transition to a state without consuming any input.
2. Multiple transitions on the same input symbol.
3. No transition on an input symbol.

To convert NFAs to DFAs we need to get rid of nondeterminism from NFAs.
Subset Construction Method

Using the subset construction method to convert an NFA to a DFA involves the following steps:
• For every state in the NFA, determine all reachable states for every input symbol.
• The set of reachable states constitutes a single state in the converted DFA (each state in the DFA corresponds to a subset of states in the NFA).
• Find reachable states for each new DFA state, until no more new states can be found.
Subset Construction Method
Fig1. NFA without λ-transitions

[Figure: NFA with states 1 to 5 over the alphabet {a, b}; its transitions are listed in Fig2]
Step 1
Construct a transition table showing all reachable states for every state and every input symbol.
Fig2. Transition table

q    δ(q,a)         δ(q,b)
1    {1,2,3,4,5}    {4,5}
2    {3}            {5}
3    ∅              {2}
4    {5}            {4}
5    ∅              ∅
In Fig2, the column δ(q,a) gives the transition from state q with input a, and the column δ(q,b) gives the transition from state q with input b. The construction starts at state 1.
Step 2
The set of states resulting from every transition function constitutes a new state. Calculate all reachable states for every such state and every input symbol.
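The calculation in Step 2 is a union over rows of the transition table (Fig2): the states reachable from a set are the states reachable from any of its members. A minimal Python sketch, with `NFA_TABLE` and `move` as illustrative names:

```python
# Fig2 transition table: NFA_TABLE[state][letter] is the set of next states.
NFA_TABLE = {
    1: {"a": {1, 2, 3, 4, 5}, "b": {4, 5}},
    2: {"a": {3}, "b": {5}},
    3: {"a": set(), "b": {2}},
    4: {"a": {5}, "b": {4}},
    5: {"a": set(), "b": set()},
}


def move(states, letter):
    """Union of the Fig2 entries for every NFA state in `states`."""
    reachable = set()
    for q in states:
        reachable |= NFA_TABLE[q][letter]
    return frozenset(reachable)


if __name__ == "__main__":
    # Row for the new state {1,2,3,4,5}
    print(sorted(move({1, 2, 3, 4, 5}, "a")), sorted(move({1, 2, 3, 4, 5}, "b")))
```

For the new state {1,2,3,4,5} this gives {1,2,3,4,5} on a and {2,4,5} on b.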
Fig3. Subset Construction table

The table starts with the initial state 1, whose row is copied from Fig2:

q    δ(q,a)         δ(q,b)
1    {1,2,3,4,5}    {4,5}

The sets {1,2,3,4,5} and {4,5} that appear in this row become new states of the table.

Step 3
Repeat this process (Step 2) until no more new states are reachable.
Fig3. Subset Construction table, built row by row

• {1,2,3,4,5}: δ(q,a) = {1,2,3,4,5}, δ(q,b) = {2,4,5}. New state: {2,4,5}.
• {4,5}: δ(q,a) = 5, δ(q,b) = 4. New states: 5 and 4.
• {2,4,5}: δ(q,a) = {3,5}, δ(q,b) = {4,5}. New state: {3,5}.
• 5: δ(q,a) = ∅, δ(q,b) = ∅. New state: ∅.
• 4: δ(q,a) = 5, δ(q,b) = 4. We already got 4 and 5, so we don't add them again.
• {3,5}: δ(q,a) = ∅, δ(q,b) = 2. New state: 2.
• ∅: δ(q,a) = ∅, δ(q,b) = ∅.
• 2: δ(q,a) = 3, δ(q,b) = 5. New state: 3.
• 3: δ(q,a) = ∅, δ(q,b) = 2.

The construction stops here as there are no more reachable states.
Fig3. Subset Construction table

q              δ(q,a)         δ(q,b)
1              {1,2,3,4,5}    {4,5}
{1,2,3,4,5}    {1,2,3,4,5}    {2,4,5}
{4,5}          5              4
{2,4,5}        {3,5}          {4,5}
5              ∅              ∅
4              5              4
{3,5}          ∅              2
∅              ∅              ∅
2              3              5
3              ∅              2

Fig4. Resulting FA after applying Subset Construction to Fig1

[Figure: FA with the ten states 1, 12345, 45, 245, 5, 4, 35, ∅, 2 and 3, one for each row of Fig3]
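The whole construction can be sketched in Python as a breadth-first search over sets of NFA states, starting from the Fig2 transition table. The names `NFA_TABLE` and `subset_construction` are illustrative. Final states are left out because the extracted slides do not show which states of Fig1 are final.

```python
from collections import deque

# Fig2 transition table: NFA_TABLE[state][letter] is the set of next states.
NFA_TABLE = {
    1: {"a": {1, 2, 3, 4, 5}, "b": {4, 5}},
    2: {"a": {3}, "b": {5}},
    3: {"a": set(), "b": {2}},
    4: {"a": {5}, "b": {4}},
    5: {"a": set(), "b": set()},
}


def subset_construction(table, start, alphabet="ab"):
    """Return the DFA table; each DFA state is a frozenset of NFA states."""
    dfa = {}
    queue = deque([frozenset({start})])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue  # this set already has a row
        dfa[current] = {}
        for letter in alphabet:
            # Union of the NFA rows of every member (empty set stays empty).
            target = frozenset().union(*(table[q][letter] for q in current))
            dfa[current][letter] = target
            if target not in dfa:
                queue.append(target)
    return dfa


if __name__ == "__main__":
    for state, row in subset_construction(NFA_TABLE, 1).items():
        print(sorted(state), sorted(row["a"]), sorted(row["b"]))
```

Run on Fig2, the sketch produces the ten rows of Fig3, including the row for ∅, which acts as a dead state that loops to itself on every input.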
