21CS51 ATCD MODULE 2 - 1 Regular Expressions

MODULE 2 PPT
VENKATESH BHAT
Senior Associate Professor
Department of CSE
AIET, Moodbidri
MODULE 2 SYLLABUS
• Regular Expressions and Languages: Regular Expressions,
Finite Automata and Regular Expressions, Proving Languages
Not to Be Regular
• Lexical Analysis Phase of compiler Design: Role of Lexical
Analyzer, Input Buffering, Specification of Token, Recognition
of Token.
• Textbook 1: Chapter3 – 3.1, 3.2, Chapter4- 4.1

• Textbook 2: Chapter3- 3.1 to 3.4
Regular Expressions
• The languages accepted by finite automata can be described compactly by
simple expressions called Regular Expressions. They are an effective
way to represent regular languages.
• The languages described by regular expressions are referred to as
Regular languages.
• A regular expression can also be described as a pattern that
defines a set of strings.
• Regular expressions are used to match character combinations in
strings. String searching algorithms use such patterns for find and
find-and-replace operations on strings.
The Operators of Regular Expressions
• UNION
• CONCATENATION
• CLOSURE
UNION Operation
• Union of two Languages L and M is denoted by L U M
• It is the set of strings that are in either L or M or Both.
• Example 1: if L = {0, 1, 2} and M = {1, 4}
then L U M = {0, 1, 2, 4}
• Example 2: if L = {ε, a, x, b} and M = {x, 1, 2}
then L U M = {ε, a, x, b, 1, 2}
CONCATENATION Operation
• Concatenation of Languages L and M is the set of strings
that can be formed by taking any string in L and
concatenating with any string in M (One string is followed
by another string) to form the result of concatenation.
• Concatenation can be denoted by a dot, or no operator.
L.M or LM
• Example 1: if L = {0, 1, 2} and M = {1, 4}
then LM = {01, 04, 11, 14, 21, 24}
• Example 2: if L = {a, x, b} and M = {ε, x}
then LM = {a, x, b, ax, xx, bx}
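The two operations above can be tried out directly on small finite languages. The following is a minimal Python sketch, not part of the original slides; it assumes a language is modelled as a set of strings and ε as the empty string "", and the helper name concat is hypothetical.

```python
def concat(L, M):
    """Concatenation LM: every string of L followed by every string of M."""
    return {x + y for x in L for y in M}

L = {"0", "1", "2"}
M = {"1", "4"}
print(sorted(L | M))          # union of L and M
print(sorted(concat(L, M)))   # concatenation LM

# Second concatenation example; ε is written as the empty string "".
L2 = {"a", "x", "b"}
M2 = {"", "x"}
print(sorted(concat(L2, M2)))
```

Because M2 contains ε, every string of L2 also appears unchanged in the concatenation.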
CLOSURE Operation
• Closure of a language L is denoted by L* and represents the set of
those strings that can be formed by taking any number of strings from
L (zero or more, with repetition allowed) and concatenating all of
them.
• Example:
If L = {0, 1} then L* is all strings of 0’s and 1’s.
Strings like 1011, 0101101, etc… are acceptable
If L = {0, 11}, then L* consists of those strings of 0’s and 11’s
Strings like 1101100, 011011, 111111, 0000 etc… are acceptable
But, Strings like 1011, 11101, 01110 etc… are not acceptable.
Formal Definition of L*
• L* is the infinite union ⋃ (i ≥ 0) L^i, where L^0 = {ε}, L^1 = L, and
L^i, for all i > 1, is LLL…L (concatenation of i copies of L)
• Example:
Let L = {0, 11}; L^0 = {ε}, L^1 = L. So, these first two terms
in the expansion of L* give us {ε, 0, 11}.
Consider Other Examples of L*
• Consider L^2. It is the occurrence of two strings from L with
repetitions allowed. So, there are four choices. L^2 = {00, 011, 110, 1111}
• Consider L^3. It is the occurrence of three strings from L.
L^3 = {000, 0011, 0110, 01111, 1100, 11011, 11110, 111111}
To compute L*, we must compute L^i for each i, and take the
union of all these languages. For this L, L^i has 2^i members.
L* = L^0 U L^1 U L^2 U L^3 U …
• ɸ* = {ε}: ɸ^0 = {ε}, while ɸ^i = ɸ for every i ≥ 1, because no string
can be chosen from ɸ. So, ɸ is one of only two languages whose closure
is not infinite; the other is {ε}.
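The powers of L and a finite slice of L* can be computed mechanically. This is a minimal Python sketch under the assumption that languages are finite sets of strings; the names power and closure_up_to are hypothetical, and since L* is usually infinite only strings up to a chosen length are collected.

```python
def concat(L, M):
    """Concatenation of two finite languages."""
    return {x + y for x in L for y in M}

def power(L, i):
    """L^i: concatenation of i copies of L, with L^0 = {ε}."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def closure_up_to(L, max_len):
    """All strings of L* whose length is at most max_len."""
    seen = {""}
    frontier = {""}
    while frontier:
        longer = {w for w in concat(frontier, L) if len(w) <= max_len}
        frontier = longer - seen      # keep only strings not found before
        seen |= frontier
    return seen

L = {"0", "11"}
print(sorted(power(L, 2)))
print(len(power(L, 3)))
print(sorted(closure_up_to(L, 4), key=lambda w: (len(w), w)))
print(closure_up_to(set(), 4))        # closure of the empty language
```

The last call illustrates that the closure of the empty language contains only ε.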
Precedence of Regular Expression Operators
• The star operator * has the highest precedence. It applies only to
the smallest well-formed expression immediately to its left.
• Concatenation (the “dot” operator) comes next. After grouping all
stars with their operands, we group concatenation operators with their
operands.
• Union (+) has the lowest precedence. Finally, all unions are grouped
with their operands.
Examples
• Consider the expression 01*+1
This is grouped as (0(1*))+1
The star operator is grouped first; it applies only to the symbol 1
immediately to its left.
Next we group the concatenation between 0 and (1*). The
resultant regular expression becomes (0(1*)). Finally, the union
operator connects this expression and the expression to its
right, which is 1.
Exercises to the Students
• Write the Regular Expressions for the following languages over {0, 1}
(a) The set of all strings that begin with 110
Solution:  RE = 110(0+1)*
(b) The set of all strings that begin and end with 110
Solution:  RE = 110 + 110(0+1)*110
(the first term covers the string 110 itself, which both begins and ends with 110)
(c) The set of all strings that contain 1011
Solution:  RE = (0+1)*1011(0+1)*
(d) The set of all strings that contain exactly three 1’s
Solution:  RE = 0*10*10*10*
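Answers to exercises like these can be sanity-checked by brute force: compare the regular expression with a direct description of the language on every short string. The sketch below is an illustration only; it uses Python's re module, where union is written | instead of +, and the helper names strings and agrees are hypothetical. The pattern used for (b) includes the string 110 itself, which both begins and ends with 110.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def agrees(pattern, predicate, max_len=8):
    """True if pattern and predicate accept exactly the same short strings."""
    rx = re.compile(pattern)
    return all(bool(rx.fullmatch(w)) == predicate(w)
               for w in strings("01", max_len))

checks = {
    "begins with 110": ("110(0|1)*", lambda w: w.startswith("110")),
    "begins and ends with 110": ("110|110(0|1)*110",
                                 lambda w: w.startswith("110") and w.endswith("110")),
    "contains 1011": ("(0|1)*1011(0|1)*", lambda w: "1011" in w),
    "exactly three 1's": ("0*10*10*10*", lambda w: w.count("1") == 3),
}

for name, (pattern, predicate) in checks.items():
    print(name, agrees(pattern, predicate))
```

A check like this only covers strings up to the chosen length, so it can expose a wrong expression but does not prove a correct one.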
What is the language accepted by the following
Regular Expression? 1*(01*0)*1*01*
• Every string in this language has an odd number of 0’s: (01*0)*
contributes an even number of 0’s and the final 0 adds one more.
• The converse does not hold. 1’s cannot appear between consecutive
(01*0) blocks, so a string such as 0010100, which has five 0’s, is not
generated by this expression.
Give the Language Descriptions of the following Regular Expressions:
Regular Expression      Language
0(0+1)*1                All strings of 0’s and 1’s starting with 0 and ending with 1
1*(01*01*)*             All strings of 0’s and 1’s with even number of 0’s
(0+1)*00(0+1)*          All strings of 0’s and 1’s with at least two consecutive 0’s
(1+01)*(ε+0)            All strings of 0’s and 1’s without two consecutive 0’s
(1+ε)(00*1)*0*          All strings over {0, 1} not containing two consecutive 1’s
(0*1*)*000(0+1)*        All strings over {0, 1} containing 000 as a (consecutive) substring
(0+10)*1*               Left as Exercise to the Students
Regular Languages
Regular languages are a subset of all languages. (Not all
languages are regular)
The set of regular languages over alphabet Σ is recursively
defined as follows
• ∅, the empty set, is a regular language
• {ε}, the language consisting of only the empty string, is a
regular language
• For any symbol a ∈ Σ, {a} is a regular language.
• If L, M are regular languages, then so are L ⋃ M, LM, and L*.
Prove that language L = {a, aa} is a regular language.
• Proof:
• {a} is regular by definition
• So {aa} = {a}{a} is regular (concatenation rule)
• So {a, aa} = {a} U {aa} is regular (union rule)
• A regular expression is a string representation of a
regular language.
• A regular expression “matches” a set of strings (the
represented regular language).
Recursive Definition of Regular Expression
For a regular expression r, L(r) is the language matched by r
• ɸ is a regular expression, with L(ɸ) = ɸ (matches no strings)
• ε is a regular expression, with L(ε) = {ε} (matches only the
empty string)
• For every symbol a ∈ Σ, a is a regular expression, with L(a) =
{a}
Recursive Definition of Regular Expression – Contd…
Let r, r1, and r2 be the regular expressions

• r1 + r2 is a regular expression, with L(r1 + r2) = L(r1) U L(r2)
• r1 r2 is a regular expression, with L(r1 r2) = L(r1) L(r2)
• r* is a regular expression, with L(r*) = (L(r))*
Find Language from Regular Expression
Describe the language represented by the following
regular expression.
• (01) + 1(0+1)*
• Solution: It describes the language consisting of the
binary string 01 together with all binary strings that start with 1.
Give a regular expression that represents the following language
OR
Develop the Regular Expression from the following Language.
L = {w ∈ {a, b}* | |w| ≤ 2}
• Enumerate All Possible Strings:
ε + a + b + aa + ab + ba + bb
• Try to combine some of them:
ε + a(ε + a) + b(ε + b) + ab + ba
That is, (ε + a + b)(ε + a + b)
• Check: expanding this product with the rules εε = ε, εa = aε = a and
εb = bε = b gives back exactly the seven strings listed above.
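The factored expression can be checked by expanding it mechanically. This is a small Python sketch for illustration; it assumes ε is modelled as the empty string and the helper name concat is hypothetical.

```python
from itertools import product

def concat(L, M):
    """Concatenation of two finite languages."""
    return {x + y for x in L for y in M}

factor = {"", "a", "b"}                  # the language of (ε + a + b)
language = concat(factor, factor)        # (ε + a + b)(ε + a + b)

# Every string over {a, b} of length 0, 1 or 2, built directly.
expected = {"".join(t) for n in range(3) for t in product("ab", repeat=n)}

print(sorted(language, key=lambda w: (len(w), w)))
print(language == expected)
```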
Finite Automata and Regular Expressions
• There are two kinds of NFA:
• NFA with Epsilon Transition
• NFA without Epsilon Transition
• Every Language defined by one of these automata is
also defined by a regular expression.
• Every Language defined by a regular expression, is
also defined by one of these automata.
From DFA’s to Regular Expression
Theorem
• If L = L(A) for some DFA A, then there is a regular
expression R such that L = L(R).
• For Proof, we assume that A is a DFA that has the states
(1, 2, 3, …, n) for some integer value n.
• Now we have to construct the collection of regular
expressions that describes progressively broader sets of
paths in the transition diagram of A.
Proof of the Theorem
• Assume that Rij(k) is the name of the regular expression
whose language is the set of strings w such that w is the
label of a path from state i to state j in A, and that path
has no intermediate node whose number is greater than k.
• To construct the expressions Rij(k), we use the following
inductive definition, starting at k = 0 and finally reaching
k = n. When k = n, there is no restriction at all on the paths
represented, since there are no states greater than n.
Induction Basis
• The Basis is k = 0
• Since all states are numbered from 1 or above, the restriction on paths is
that the path must have no intermediate states at all. There are only two
kinds of paths that meets this condition:
1) An Arc from state i to state j
2) A path of length 0 that consists of only some node i.
• If i ≠ j, then only case (1) is possible. We must examine the DFA A and find
those input symbols a such that there is a transition from state i to state j on
the symbol a.
• If there is no such symbol a, then Rij(0) = ɸ
• If there is exactly one such symbol a, then Rij(0) = a.
• If there are more symbols a1, a2, … ak, that label arc from state i to
state j, then Rij(0) = a1 + a2 + a3 + … + ak.
Induction Basis --- Contd…
• If i = j, then the legal paths are the path of length 0 and all
loops from i to itself. The path of length 0 is represented
by the regular expression ε.
• So, if there is no such symbol a, the expression becomes ε.
• If there is only one symbol a, then the expression
becomes ε + a.
• If there are multiple symbols, the expression becomes
ε + a1 + a2 + … ak.
Induction Proof:
• Suppose there is a path from state i to state j that goes through
no state higher than k. There are two possible cases to
consider:
• The path does not go through state k at all. In this case,
the label of the path is in the language of Rij(k–1) ---> (1)
• The path goes through state k at least once. Then we can
break the path into several pieces. The first piece goes from
state i to state k without passing through k, the last piece goes
from k to j without passing through k, and all the pieces in
the middle go from k to itself without passing through k.
Induction Proof:
• If the path goes through state k only once, then there are no
middle pieces, just a path from i to k and a path from k to j.
The set of labels for all paths of this type is represented by
the regular expression: Rik(k–1)(Rkk(k–1))* Rkj(k–1)---> (2)
• By combining (1) and (2), we get,
Rij(k) = Rij(k–1) + Rik(k–1) (Rkk(k–1))* Rkj(k–1)
Example
• Convert the following DFA to regular Expression:
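[Figure: DFA with states q1, q2, q3; q1 is the start state and q3 is the accepting state.
Transitions: q1 --1--> q1, q1 --0--> q2, q2 --1--> q3, q3 --0--> q3, q3 --1--> q2]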
Formula:
Rij(k) = Rij(k–1) + Rik(k–1) (Rkk(k–1))* Rkj(k–1)
Regular Expressions for Paths in various states:
Show the Entries for Rij(0) using the Theorem

R11(0) 1+ε R11(1) 1* R11(2) 1*
R12(0) 0 R12(1) 1*0 R12(2) 1*0
R13(0) ɸ R13(1) ɸ R13(2) 1*01
R21(0) ɸ R21(1) ɸ R21(2) ɸ
R22(0) ε R22(1) ε R22(2) ε
R23(0) 1 R23(1) 1 R23(2) 1
R31(0) ɸ R31(1) ɸ R31(2) ɸ
R32(0) 1 R32(1) 1 R32(2) 1
R33(0) 0+ε R33(1) 0+ε R33(2) 0 + ε + 11
Basic Rules to Simplify the Regular Expression
ε1 = 1ε = 1
ɸ + A = A + ɸ = A
ɸA = Aɸ = ɸ
ε* = ε
(1 + ε)* = 1*
(1 + ε)1* = 1*(1 + ε) = 1*
ε + 11* = ε + 1*1 = 1*
1 + 1* = 1* + 1 = 1*
0 + 1*0 = 1*0 + 0 = 1*0
When k = 1,
Formula:
Rij(1) = Rij(0)+Ri1(0)(R11(0))*R1j(0)
• R11(1) = R11(0)+R11(0)(R11(0))*R11(0)
= (1 + ε) + (1 + ε)(1 + ε)*(1 + ε) = (1 + ε) + (1 + ε)1*(1 + ε)
= (1 + ε) + 1* = 1*
• R12(1) = R12(0)+R11(0)(R11(0))*R12(0)
= (0) + (1 + ε)(1 + ε)*(0) = 0 + (1 + ε)1*0
= 0 + 1*0 = 1*0
• R13(1) = R13(0)+R11(0)(R11(0))*R13(0)
= (ɸ) + (1 + ε)(1 + ε)*(ɸ) = ɸ + ɸ
= ɸ
When k = 1,
• R21(1) = R21(0)+R21(0)(R11(0))*R11(0)
= (ɸ) + (ɸ)(1 + ε)*(1 + ε) = ɸ + ɸ
= ɸ
• R22(1) = R22(0)+R21(0)(R11(0))*R12(0)
= (ε) + (ɸ)(1 + ε)*(0) = ε + ɸ
= ε
• R23(1) = R23(0)+R21(0)(R11(0))*R13(0)
= (1) + (ɸ)(1 + ε)*(ɸ) = 1 + ɸ
= 1
When k = 1,
• R31(1) = R31(0)+R31(0)(R11(0))*R11(0)
= (ɸ) + (ɸ)(1 + ε)*(1 + ε) = ɸ + ɸ
= ɸ
• R32(1) = R32(0)+R31(0)(R11(0))*R12(0)
= (1) + (ɸ)(1 + ε)*(0) = 1 + ɸ
= 1
• R33(1) = R33(0)+R31(0)(R11(0))*R13(0)
= (0 + ε) + (ɸ)(1 + ε)*(ɸ) = (0 + ε) + ɸ
= 0 + ε
When k = 2,
Formula: Rij(2) = Rij(1)+Ri2(1)(R22(1))*R2j(1)
• R11(2) = R11(1)+R12(1)(R22(1))*R21(1)
= (1*) + (1*0) (ε)* (ɸ) = (1*) + (1*0)(ε)(ɸ) = (1*) + (ɸ)
= 1*
• R12(2) = R12(1)+R12(1)(R22(1))*R22(1)
= (1*0)+(1*0)(ε)*(ε) = (1*0)+(1*0) (εε) = (1*0)+(1*0)(ε)
= (1*0) + (1*0) = 1*0
• R13(2) = R13(1)+R12(1)(R22(1))*R23(1)
= (ɸ) + (1*0) (ε)* (1) = (ɸ) + (1*0) (ε*)(1) = (ɸ) + (1*0) 1
= (ɸ) + (1*01)= 1*01
When k = 2,
• R21(2) = R21(1)+R22(1)(R22(1))*R21(1)
= (ɸ) + (ε)(ε)*(ɸ) = ɸ + ɸ
= ɸ
• R22(2) = R22(1)+R22(1)(R22(1))*R22(1)
= (ε) + (ε)(ε)*(ε) = ε + ε
= ε
• R23(2) = R23(1)+R22(1)(R22(1))*R23(1)
= (1) + (ε)(ε)*(1) = 1 + 1
= 1
When k = 2,
• R31(2) = R31(1)+R32(1)(R22(1))*R21(1)
= (ɸ) + (1)(ε)*(ɸ) = ɸ + ɸ
= ɸ
• R32(2) = R32(1)+R32(1)(R22(1))*R22(1)
= (1) + (1)(ε)*(ε) = 1 + 1
= 1
• R33(2) = R33(1)+R32(1)(R22(1))*R23(1)
= (0+ε)+1ε*1 = (0+ε)+1ε1 = (0+ε)+11
= 0+ε+11
Construction of Regular Expressions
Starting State of DFA is State 1, Accepting State of DFA is State 3.
Calculate R13(3) using the formula
R13(3) = R13(2)+R13(2)(R33(2))*R33(2)
= (1*01) + (1*01)(0 + ε + 11)*(0 + ε + 11)
= (1*01) + (1*01)(0+11)*
= (1*01)(ε + (0+11)*)
= (1*01)(0+11)*
So Final Regular Expression for our DFA is (1*01)(0+11)*
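The recurrence Rij(k) = Rij(k–1) + Rik(k–1)(Rkk(k–1))*Rkj(k–1) can also be run on languages instead of expressions, which gives a way to cross-check the final answer. The following Python sketch is an illustration under stated assumptions: each Rij(k) is represented as the set of its strings of length at most MAX_LEN, the transition table delta is read off the Rij(0) entries of this example, and Python's re syntax writes union as |.

```python
import re
from itertools import product

MAX_LEN = 7   # only strings up to this length are tracked

def concat(L, M):
    return {x + y for x in L for y in M if len(x + y) <= MAX_LEN}

def star(L):
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, L) - result
        result |= frontier
    return result

# delta[state][symbol] -> next state (state 2 has no move on 0)
delta = {1: {"1": 1, "0": 2}, 2: {"1": 3}, 3: {"0": 3, "1": 2}}
states = [1, 2, 3]

# Basis k = 0: direct arcs, plus ε when i = j.
R = {i: {j: {a for a, t in delta[i].items() if t == j} | ({""} if i == j else set())
         for j in states} for i in states}

# Induction: allow state k as an intermediate state, for k = 1, 2, 3.
for k in states:
    R = {i: {j: R[i][j] | concat(concat(R[i][k], star(R[k][k])), R[k][j])
             for j in states} for i in states}

language = R[1][3]   # start state 1, accepting state 3
expected = {"".join(t) for n in range(MAX_LEN + 1)
            for t in product("01", repeat=n)
            if re.fullmatch("1*01(0|11)*", "".join(t))}
print(language == expected)
```

Truncating by length does not lose anything here, since every piece of a short path label is itself short.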
DFA To Regular Expression using
State Elimination Method
Step-01: The initial state of the DFA must not have any incoming edge.
• If there exists any incoming edge to the initial state, then create a
new initial state qi having no incoming edge, with an ε-edge from qi to
the old initial state.
Step-02: There must exist only one final state in the DFA.
• If there exist multiple final states in the DFA, then convert all the
final states into non-final states and create a new single final state
qf, with an ε-edge from each old final state to qf.
Step-03: The final state of the DFA must not have any outgoing edge.
• If there exists any outgoing edge from the final state, then create a
new final state having no outgoing edge from it.
Step-04: Eliminate all the intermediate states one by one.
• These states may be eliminated in any order.
• In the end, only a single transition from the initial state to the
final state will be left.
• The cost (label) of this transition is the required regular expression.
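The four steps can be sketched in code. The following Python sketch is one possible way to mechanise state elimination and is not taken from the slides; it assumes edge labels are stored as Python regular-expression strings (union written |, ε written as the empty string, a missing edge meaning ɸ), that each input label is a single symbol, and that the names qi and qf are not already used as state names. All function names are hypothetical.

```python
import re
from itertools import product

def union(r, s):
    """r + s, where None stands for the empty language ɸ."""
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"

def star(r):
    """r*, using ɸ* = ε* = ε."""
    return "" if not r else f"(?:{r})*"

def eliminate(edges, states, start, finals):
    """edges maps (p, q) to a label; returns the label left on qi -> qf."""
    g = dict(edges)
    g[("qi", start)] = ""                 # Step-01: new initial state
    for f in finals:
        g[(f, "qf")] = ""                 # Step-02/03: single new final state
    for k in states:                      # Step-04: eliminate states one by one
        loop = star(g.get((k, k)))
        ins = [(p, r) for (p, q), r in g.items() if q == k and p != k]
        outs = [(q, r) for (p, q), r in g.items() if p == k and q != k]
        for p, r_in in ins:
            for q, r_out in outs:
                g[(p, q)] = union(g.get((p, q)), r_in + loop + r_out)
        g = {e: r for e, r in g.items() if k not in e}
    return g.get(("qi", "qf"))

def same_language(p1, p2, alphabet, max_len):
    """Compare two patterns on every string up to max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if bool(re.fullmatch(p1, w)) != bool(re.fullmatch(p2, w)):
                return False
    return True

# Two-state example: A --0--> B, B --1--> A, start A, final B.
pattern = eliminate({("A", "B"): "0", ("B", "A"): "1"}, ["A", "B"], "A", ["B"])
print(pattern)
print(same_language(pattern, "0(10)*", "01", 7))
```

The expression produced this way is usually longer than a hand-simplified one, so the sketch compares languages on short strings rather than comparing the expressions as text.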
Problem-01:
• Find regular expression for the following DFA-
[Figure: A is the initial state and B is the final state. Transitions: A --0--> B, B --1--> A]
Solution
• Step-01: Initial state A has an incoming edge. So, we create a
new initial state qi. The resulting automaton is-
qi --ε--> A, A --0--> B, B --1--> A
• Step-02: Final state B has an outgoing edge. So, we create a new
final state qf. The resulting automaton is-
qi --ε--> A, A --0--> B, B --1--> A, B --ε--> qf
• Step-03: Now, we start eliminating the intermediate states.
• First, let us eliminate state A. There is a path going from state qi to state B via
state A. So, after eliminating state A, we put a direct path from state qi to
state B having cost ε.0 = 0. There is a loop on state B using state A. So, after
eliminating state A, we put a direct loop on state B having cost 1.0 = 10.
• The resultant automaton is:
qi --0--> B, B --10--> B, B --ε--> qf
• Step-04: Now, let us eliminate state B. There is a path going from state qi to
state qf via state B. So, after eliminating state B, we put a direct path from
state qi to state qf having cost 0.(10)*.ε = 0(10)*. Eliminating state B, we get
qi --0(10)*--> qf
So, Regular Expression is 0(10)*

Problem-02:
Find regular expression for the following DFA-
[Figure: q1 is the initial state; q3, q4 and q5 are final states.
Transitions: q1 --a--> q2, q2 --b--> q4, q2 --c--> q3, q2 --d--> q5]
Solution
Step-01: There exist multiple final states. So, we convert them into a
single final state qf. The resulting automaton is:
q1 --a--> q2, q2 --b--> q4, q2 --c--> q3, q2 --d--> q5,
q4 --ε--> qf, q3 --ε--> qf, q5 --ε--> qf
Step-02: We start eliminating the intermediate states. First, let us
eliminate state q4. There is a path going from state q2 to state qf via
state q4. So, after eliminating state q4, we put a direct path from
state q2 to state qf having cost b.ε = b.
q1 --a--> q2, q2 --b--> qf, q2 --c--> q3, q2 --d--> q5, q3 --ε--> qf, q5 --ε--> qf
Step-03: Now, let us eliminate state q3. There is a path
going from state q2 to state qf via state q3. So, after
eliminating state q3, we put a direct path from state q2 to
state qf having cost c.ε = c.
q1 --a--> q2, q2 --b--> qf, q2 --c--> qf, q2 --d--> q5, q5 --ε--> qf
Step-04: Now, let us eliminate state q5. There is a path
going from state q2 to state qf via state q5. So, after
eliminating state q5, we put a direct path from state q2 to
state qf having cost d.ε = d.
q1 --a--> q2, q2 --b + c + d--> qf
Step-05: Now, we eliminate state q2. There is a path going
from state q1 to state qf via state q2. So, after eliminating
state q2, we put a direct path from state q1 to state
qf having cost a.(b+c+d). We get:
q1 --a(b + c + d)--> qf
Regular Expression is : a(b + c + d)

Problem-03: Find regular expression for the
following DFA
[Figure: q1 is the initial state and q2 is the final state.
Transitions: q1 --c--> q1, q1 --a--> q2, q2 --d--> q2, q2 --b--> q1]
Solution
• Step-01: Initial state q1 has an incoming edge. So, we create
a new initial state qi. The resulting automaton is
qi --ε--> q1, q1 --c--> q1, q1 --a--> q2, q2 --d--> q2, q2 --b--> q1
• Step-02: Final state q2 has an outgoing edge. So, we create
a new final state qf. The resulting automaton is
qi --ε--> q1, q1 --c--> q1, q1 --a--> q2, q2 --d--> q2, q2 --b--> q1, q2 --ε--> qf
• Step-03: Now, we start eliminating the intermediate states. First, let
us eliminate state q1. There is a path going from state qi to state q2 via
state q1. So, after eliminating state q1, we put a direct path from state
qi to state q2 having cost ε.c*.a = c*a
• There is a loop on state q2 using state q1. So, after eliminating state q1,
we put a direct loop on state q2 having cost b.c*.a = bc*a. Eliminating
state q1, we get
qi --c*a--> q2, q2 --d--> q2, q2 --bc*a--> q2, q2 --ε--> qf
• Step-04: Now, let us eliminate state q2.
• There is a path going from state qi to state qf via state q2. So,
after eliminating state q2, we put a direct path from state qi to
state qf having cost c*a(d+bc*a)*ε = c*a(d+bc*a)*
• Eliminating state q2, we get
qi --c*a(d+bc*a)*--> qf
Regular Expression = c*a(d+bc*a)*

Problem-04:
• Find regular expression for the following DFA
[Figure: q1 is the initial state; q1, q2 and q3 are final states.
Transitions: q1 --b--> q1, q1 --a--> q2, q2 --a--> q2, q2 --b--> q3,
q3 --b--> q3, q3 --a--> q4, q4 --a, b--> q4]
Solution
• Step-01: State q4 is a dead state as it cannot reach any final
state. So, we eliminate state q4 and its associated edges.
• The resulting DFA is
q1 --b--> q1, q1 --a--> q2, q2 --a--> q2, q2 --b--> q3, q3 --b--> q3
• Step-02: Initial state q1 has an incoming edge (self loop). So, we
create a new initial state qi with an edge qi --ε--> q1.
• Step-03: There exist multiple final states. So, we convert
them into a single final state qf, adding the edges
q1 --ε--> qf, q2 --ε--> qf, q3 --ε--> qf
• Step-04: Now, we start eliminating the intermediate states. First, let
us eliminate state q3. There is a path going from state q2 to state
qf via state q3. So, after eliminating state q3, we put a direct path
from state q2 to state qf having cost b.b*.ε = bb*.
• Step-05: Now, we convert the parallel paths from q2 to qf (costs bb*
and ε) into a single path having cost bb* + ε. We get
qi --ε--> q1, q1 --b--> q1, q1 --a--> q2, q2 --a--> q2, q2 --bb* + ε--> qf, q1 --ε--> qf
• Step-06: Now, we eliminate state q2. There is a path going from state
q1 to state qf via state q2. So, after eliminating state q2, we put a
direct path from state q1 to state qf having cost a.a*.(bb* + ε) = aa*(bb* + ε).
• Step-07: Now, we convert the parallel paths from q1 to qf (costs
aa*(bb* + ε) and ε) into a single path having cost aa*(bb* + ε) + ε. We get
qi --ε--> q1, q1 --b--> q1, q1 --aa*(bb* + ε) + ε--> qf
• Step-08: Now, we eliminate state q1. There is a path going from state
qi to state qf via state q1. So, after eliminating state q1, we put a
direct path from state qi to state qf having cost
ε.b*.(aa*(bb* + ε) + ε) = b*(aa*(bb* + ε) + ε). We get
qi --b*(aa*(bb* + ε) + ε)--> qf
So, Final Regular Expression is: b*(aa*(bb* + ε) + ε), which simplifies to b*a*b*

Find the Regular Expression for the DFA given below
Converting Regular Expression into Automata
To convert the RE to FA, the method that is popularly used is known as
the Subset method. This method is used to get FA from the
given regular expression.
• The steps in this method are given below:-
• Step 1: Make a transition diagram for a given regular expression,
using NFA with ε moves.
• Step 2: Then, Convert this NFA with ε to NFA without ε.
• Step 3: Finally, Convert the obtained NFA to equivalent DFA.
Some standard rules that help in the conversion of RE to NFA are:-
1.) If RE is in the form a+b, it can be represented as:
q1 --a, b--> q2
2.) If RE is in the form ab, it can be represented as:
q1 --a--> q2 --b--> q3
3.) If RE is in the form a*, it can be represented as:
q2 --a--> q2 (a self-loop on q2)
Example 1: Design a Finite Automata from the
given RE [ ab + (b + aa)b* a ].
• Solution. At first, we will design the Transition diagram for the given
expression.
• Step 1: Start with a single edge:
q1 --ab + (b + aa)b*a--> qf
• Step 2: Split the union into two parallel edges:
q1 --ab--> qf
q1 --(b + aa)b*a--> qf
• Step 3: Split ab through a new state q2:
q1 --a--> q2 --b--> qf
q1 --(b + aa)b*a--> qf
• Step 4: Split (b + aa)b*a through a new state q3:
q1 --b + aa--> q3 --b*a--> qf
• Step 5: Replace b*a by a loop on q3 and an edge to qf:
q3 --b--> q3, q3 --a--> qf
• Step 6: Split b + aa into two parallel edges:
q1 --b--> q3
q1 --aa--> q3
• Step 7: Split aa through a new state q4:
q1 --a--> q4 --a--> q3
After step 7, we have got an NFA without ε:
q1 --a--> q2, q2 --b--> qf, q1 --b--> q3, q1 --a--> q4, q4 --a--> q3, q3 --b--> q3, q3 --a--> qf
Now we will convert it into the required DFA; we will first write a
transition table for this NFA.
• Step 8: Write the Transition Table
NFA       a          b
→q1       {q2, q4}   q3
q2        ɸ          qf
q3        qf         q3
q4        q3         ɸ
*qf       ɸ          ɸ
• Step 9: Write the Corresponding Transition Table for DFA
DFA       a          b
→q1       [q2, q4]   q3
q2        ɸ          qf
q3        qf         q3
q4        q3         ɸ
[q2, q4]  q3         qf
*qf       ɸ          ɸ
Now Draw the DFA Transition Diagram
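The conversion from the NFA table to a DFA can be sketched as the subset construction. This Python sketch is an illustration only; it encodes the NFA of Example 1 as a dictionary, represents each DFA state as a frozenset of NFA states, and generates only the subsets reachable from the start state (so unreachable rows such as q2 or q4 on their own do not appear, while the empty set plays the role of ɸ). The function name subset_construction is hypothetical.

```python
from collections import deque

# NFA transition table of Example 1; missing entries mean ɸ.
nfa = {
    "q1": {"a": {"q2", "q4"}, "b": {"q3"}},
    "q2": {"b": {"qf"}},
    "q3": {"a": {"qf"}, "b": {"q3"}},
    "q4": {"a": {"q3"}},
    "qf": {},
}

def subset_construction(nfa, start, alphabet):
    """Return the reachable DFA as {subset: {symbol: subset}}."""
    dfa = {}
    queue = deque([frozenset([start])])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue
        dfa[current] = {}
        for symbol in alphabet:
            target = frozenset(t for s in current
                               for t in nfa[s].get(symbol, ()))
            dfa[current][symbol] = target
            if target not in dfa:
                queue.append(target)
    return dfa

dfa = subset_construction(nfa, "q1", "ab")
for subset, row in dfa.items():
    print(sorted(subset), {a: sorted(t) for a, t in row.items()})
```

A DFA state is accepting whenever its subset contains qf.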
Example 2: Write the DFA from the
Regular Expression: a (a* ba* ba*)*
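• Left as Exercise to the Students
• Hint: start from the single edge q1 --a(a*ba*ba*)*--> qf and split it
step by step as in Example 1.
• After the leading a, the string must be a sequence of blocks of the
form a*ba*ba*, each containing exactly two b’s. So a alone is accepted,
but aa is not, because the second a would start a block that is never
completed.
Now Write Transition Table for the resulting automaton
State     a     b
→q1       qf    ɸ
*qf       q2    q3
q2        q2    q3
q3        q3    q4
*q4       q4    q3
Here qf is reached after the leading a, q2 and q3 are the states inside a
block before and after its first b, and q4 is reached when a block is
complete. Since every entry is a single state or ɸ, this table is also
a DFA Transition Table.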
Convert the Regular Expression into NFA with ε Transition
• The process of converting a regular expression into an ∈-NFA is as
follows:
1.Create a single start state for the automaton, and mark it as the
initial state.
2.For each character in the regular expression, create a new state and
add an edge between the previous state and the new state, with the
character as the label.
3.For each operator in the regular expression (such as “*” for closure
and “+” for union, as used in these notes), create new
states and add the appropriate ε-edges to represent the operator.
4.Mark the final state as the accepting state, which is the state that is
reached when the regular expression is fully matched.
Common regular expressions used in making an ε-NFA:
ab:    1 --a--> 2 --b--> 3
       OR
       1 --a--> 2 --ε--> 3 --b--> 4
a+b:   1 --ε--> 2 --a--> 3 --ε--> 6
       1 --ε--> 4 --b--> 5 --ε--> 6
a*:    1 --ε--> 2 --a--> 3 --ε--> 4
       3 --ε--> 2 (repeat a)
       1 --ε--> 4 (skip a)
Example: Create an ε-NFA for the regular expression:
(a/b)*a
• Writing union as +, (a/b)*a is (a+b)*a
• Construct NFA for a + b, we get
1 --ε--> 2 --a--> 3 --ε--> 6
1 --ε--> 4 --b--> 5 --ε--> 6
• Construct NFA for (a + b)*: add a new start state and a new final
state, with ε-edges from the new start state to state 1, from state 6 to
the new final state, from state 6 back to state 1, and from the new
start state directly to the new final state.
• Construct NFA for (a + b)*a: add an edge labelled a from the final
state of (a + b)* to a new accepting state.
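The building blocks for concatenation, union and closure can be combined in code to build and run an ε-NFA. The following Python sketch is a simplified illustration of this style of construction, not the exact diagrams of the slides: state numbers are assigned in creation order and will differ from the figures, and the class and method names are hypothetical. Each fragment is a pair (start state, final state).

```python
from itertools import count

class NFA:
    def __init__(self):
        self.eps = {}      # state -> set of states reachable by one ε-move
        self.moves = {}    # (state, symbol) -> set of states
        self._ids = count(1)

    def new_state(self):
        return next(self._ids)

    def add(self, src, dst, symbol=None):
        if symbol is None:
            self.eps.setdefault(src, set()).add(dst)
        else:
            self.moves.setdefault((src, symbol), set()).add(dst)

    def symbol(self, a):
        s, f = self.new_state(), self.new_state()
        self.add(s, f, a)
        return s, f

    def union(self, x, y):
        s, f = self.new_state(), self.new_state()
        for frag in (x, y):
            self.add(s, frag[0])
            self.add(frag[1], f)
        return s, f

    def concat(self, x, y):
        self.add(x[1], y[0])          # ε-edge joining the two fragments
        return x[0], y[1]

    def star(self, x):
        s, f = self.new_state(), self.new_state()
        self.add(s, x[0])
        self.add(x[1], f)
        self.add(x[1], x[0])          # loop back: repeat the fragment
        self.add(s, f)                # skip: zero repetitions
        return s, f

    def closure(self, states):
        stack, seen = list(states), set(states)
        while stack:
            for t in self.eps.get(stack.pop(), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    def accepts(self, frag, w):
        current = self.closure({frag[0]})
        for ch in w:
            step = set()
            for s in current:
                step |= self.moves.get((s, ch), set())
            current = self.closure(step)
        return frag[1] in current

n = NFA()
a_or_b_star = n.star(n.union(n.symbol("a"), n.symbol("b")))
machine = n.concat(a_or_b_star, n.symbol("a"))     # (a+b)*a
print([w for w in ["", "a", "b", "ba", "ab", "abba"] if n.accepts(machine, w)])
```

The machine for (a+b)*a should accept exactly the strings over {a, b} that end in a.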
Construct a Finite Automata for the regular expression
((a+b)(a+b))*
[Figure: ε-NFA with states 1 to 14, built from two copies of the a+b
construction (states 1 to 6 and states 8 to 13) joined by ε-edges, with
additional ε-edges for the closure]
Construct the NFA with ε Transitions for the
given Regular Expression:
• 01 + 101
• (01 + 1)*
• 011(0+1)*
Proving Language Not to be Regular
• Not Every Language is a Regular Language.
• Pumping Lemma is for showing certain Languages are
not to be regular.
Pumping Lemma for Regular Languages
• Let L be a Regular Language. Then there exists a constant n
(which depends on L) such that for every string w in L with
|w| ≥ n, we can break w into three strings, w = xyz, such that
• y ≠ ε
• |xy| ≤ n
• For all k ≥ 0, the string xy^kz is also in L.
• That is, we can always find a nonempty string y not too far from the
beginning of w that can be pumped: repeating y any
number of times, or deleting it (the case k = 0), keeps the
resulting string in the language L.
Proof
• Suppose L is regular. Then L = L(A) for some DFA A. Suppose A has n
states. Consider any string w of length n or more, say w = a1 a2 a3 … am
where m ≥ n and each ai is an input symbol. For i = 0, 1, 2, … n define
state pi to be δ̂(q0, a1 a2 a3 … ai),
where δ̂ is the extended transition function of A and q0 is the start state of A.
That is, pi is the state A is in after reading the first i symbols of w.
Note that p0 = q0.
• By the pigeonhole principle, it is not possible for the n+1 different pi’s
for i = 0, 1, 2, … n to be distinct, since there are only n different
states. Thus we can find two different integers i and j, with 0 ≤ i < j ≤ n,
such that pi = pj.
Proof Contd…
• Now we can break the string w = xyz as follows:
x = a1 a2 a3 … ai
y = ai+1 ai+2 … aj
z = aj+1 aj+2 … am
• That is, x takes us to pi once; y takes us from pi back to pi (since
pi is also pj) and z is the balance of w.
• x may be empty, in the case that i = 0.
• z may be empty if j = n = m.
• But y cannot be empty, since i is strictly less than j.
[Figure: Every string longer than the number of states must cause a state
to repeat. The path goes from p0 to pi on x = a1 … ai, loops from pi back
to pi on y = ai+1 … aj, and continues from pi on z = aj+1 … am]
• Now consider what happens if the automaton A
receives the input xy^kz for any k ≥ 0.
• If k = 0, then the automaton goes from the start state q0
(which is also p0) to pi on input x. Since pi is also pj, it
must be that A goes from pi to the accepting state on
the input z. Thus A accepts xz.
• If k > 0, then A goes from q0 to pi on input x, circles from
pi to pi k times on the input y^k, and then goes to the
accepting state on the input z. Thus for any k ≥ 0, xy^kz is
also accepted by A; that is, xy^kz is in L.
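The argument of the proof can be traced on a concrete machine. The Python sketch below is an illustration only: the two-state DFA (strings with an even number of 0's) is a hypothetical example that does not come from the slides, and the function names are assumptions. It records the states p0, p1, … visited while reading w, splits w at the first repeated state, and then pumps y.

```python
def run_states(delta, start, w):
    """Return the list p0, p1, ..., p|w| of states visited while reading w."""
    states = [start]
    for ch in w:
        states.append(delta[states[-1]][ch])
    return states

def pump_split(delta, start, w):
    """Split w = xyz at the first repeated state, so y != ε and |xy| <= n."""
    seen = {}
    for pos, state in enumerate(run_states(delta, start, w)):
        if state in seen:
            i, j = seen[state], pos
            return w[:i], w[i:j], w[j:]
        seen[state] = pos
    raise ValueError("w is shorter than the number of states")

def accepts(delta, start, accepting, w):
    return run_states(delta, start, w)[-1] in accepting

# Hypothetical DFA: accepts strings over {0, 1} with an even number of 0's.
delta = {"even": {"0": "odd", "1": "even"},
         "odd": {"0": "even", "1": "odd"}}

w = "0110"
x, y, z = pump_split(delta, "even", w)
print(repr(x), repr(y), repr(z))
for k in range(4):
    print(k, x + y * k + z, accepts(delta, "even", {"even"}, x + y * k + z))
```

Because y labels a loop from pi back to pi, every pumped string ends in the same state as w.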
Applications of Pumping Lemma
• Example 1:
• Prove that the language L consisting of all palindromes
over {0, 1} is not regular.
• Solution:
• Suppose that it is regular. Then there exists a constant n
satisfying the conditions of the pumping lemma. Let w = 0^n 1 0^n,
which is a palindrome. Using the Pumping Lemma, we can break w = xyz
so that y ≠ ε and |xy| ≤ n. Both x and y consist of only 0’s. Suppose
x = 0^i and y = 0^j, with j ≥ 1. Then we know by the Pumping Lemma
that xy^0z = xz is in L. Thus 0^(n–j) 1 0^n ∈ L. This is not possible,
because this string has fewer 0’s before the 1 than after it and so is
not a palindrome. Hence the Language is not Regular.
Example 2: Show that the language Lpr consisting of all strings
of 1’s whose length is prime is not a Regular Language.
• Proof:
Suppose it is a Regular Language. Then there would be a
constant n satisfying the conditions of the pumping lemma.
Consider some prime p ≥ n + 2; there must be such a p, since
there are infinitely many primes. Let w = 1^p.
By the Pumping Lemma, we can break the string w = xyz
such that y ≠ ε and |xy| ≤ n. Let |y| = m. We know that
|xyz| = p. Hence |xz| = p–m.
Example 2 Proof Contd…
Now consider the string xy^(p–m)z, which must be in Lpr by the Pumping
Lemma, if Lpr really is regular.
|xy^(p–m)z| = |xz| + (p–m)|y| = (p–m) + (p–m)m = (m+1)(p–m)
It looks as if |xy^(p–m)z| is not a prime, since it has two factors m+1
and p–m. However, we must check that neither of these factors is 1,
since then (m+1)(p–m) might be a prime after all.
Since y ≠ ε, m ≥ 1 and so (m+1) > 1. Also m = |y| ≤ |xy| ≤ n, and we
chose p ≥ n+2. Hence (p–m) ≥ 2.
We assumed the language in question was regular, and we derived a
contradiction by showing that some string not in the language was
required by the pumping lemma to be in the language. Thus we
conclude that Lpr is not a regular Language.
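The length calculation in this proof can be checked numerically for sample values. This short Python sketch is an illustration only; the values n = 5 and p = 7 are arbitrary choices satisfying p ≥ n + 2, and the function names are hypothetical.

```python
def is_prime(k):
    """Trial division; adequate for the small numbers used here."""
    return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))

def pumped_length(p, m):
    """Length of x y^(p-m) z when |xyz| = p and |y| = m."""
    return (p - m) + (p - m) * m

n = 5          # assumed pumping-lemma constant
p = 7          # a prime with p >= n + 2
for m in range(1, n + 1):          # every possible length of y
    length = pumped_length(p, m)
    print(m, length, (m + 1) * (p - m), is_prime(length))
```

For every allowed m the pumped length factors as (m+1)(p–m) with both factors at least 2, so it is never prime.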
Exercises to the students
Prove that the following are not regular languages.
• {0^n 1^n | n ≥ 1}. This language, consisting of a string of 0’s followed
by an equal-length string of 1’s, is the language L01.
• The set of strings of balanced parentheses. These are the strings of
characters “(” and “)” that can appear in a well formed arithmetic
expression.
• {0^n 1 0^n | n ≥ 1}
• {0^n 1^(2n) | n ≥ 1}
• {0^n 1^m | n ≤ m}
• {0^n 1^m 2^n | n and m are arbitrary integers}
• The set of strings of 0’s and 1’s whose length is a perfect square.
• The set of strings of 0’s and 1’s that are of the form ww (a string
followed by the same string).
• The set of strings of 0’s and 1’s that are of the form ww^R (a string
followed by the reverse of the string).
• The set of strings of 0’s and 1’s of the form ww’. If w = 011
then w’ = 100, so the resultant string is 011100.
END OF THE
MODULE 2
First PART.
