Understanding Regular Languages & Expressions

Uploaded by

gadisakarorsa
Chapter two

Regular Language & Regular Expression

12/04/24 1
Regular Language
• Regular languages form a simple but important family of formal
languages.
• They can be specified in several equivalent ways:
– using deterministic or non-deterministic finite automata,
– using regular grammars,
– using regular expressions.

Regular Language
• New regular languages can be constructed from existing ones
by using three simple operations: union (denoted by + or ∪),
concatenation, and the closure operation (denoted by *).
• The concept of a regular language (or regular set) over an
alphabet ∑ is defined recursively as follows:
1. The empty set ∅ is a regular language.
2. For every symbol a ∈ ∑, {a} is a regular language.
3. If A and B are regular languages, then A ∪ B, AB, and A*
are all regular languages.
4. Nothing else is a regular language.
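
The three operations in this definition can be tried out on finite sets of strings. The following is a minimal Python sketch, not part of the original slides; the helper names `union`, `concat` and `star` are illustrative, and because A* is usually infinite, `star` only returns the strings of A* up to an assumed length bound `max_len`.

```python
from itertools import product

def union(A, B):
    """A ∪ B."""
    return A | B

def concat(A, B):
    """AB = {xy | x in A, y in B}."""
    return {x + y for x, y in product(A, B)}

def star(A, max_len):
    """Finite approximation of A*: the strings of A* with length <= max_len."""
    result = {""}            # ε always belongs to A*
    frontier = {""}
    while frontier:
        # extend the newest strings by one more factor from A
        frontier = {w for w in concat(frontier, A) if len(w) <= max_len} - result
        result |= frontier
    return result

print(star(set(), 3))        # {''}: the closure of ∅ is {ε}
lang = union(concat(concat({"0"}, {"0"}), {"1"}),
             concat(concat({"1"}, {"1"}), {"0"}))
print(sorted(lang))          # ['001', '110']
```

Only rules 1 to 3 are used to build `lang`, which is the sense in which {001, 110} is regular.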

Regular Language
• Examples of Regular Language:
a. The set {ε} is a regular language, because {ε} = ∅*
b. The set {001, 110} is a regular language over the binary
alphabet: {001, 110} = ({0}{0}{1}) ∪({1}{1}{0})
c. {1}*{10}
d. {10, 11, 1100}*

Regular Expression
A Regular Expression can be defined recursively as follows:
1. ε is a Regular Expression denoting the language that contains only the empty string.
(L(ε) = {ε})
2. φ is a Regular Expression denoting the empty language. (L(φ) = { })
3. x is a Regular Expression for every symbol x of the alphabet, where L(x) = {x}
4. If X is a Regular Expression denoting the language L(X) and Y is a Regular Expression
denoting the language L(Y), then
– X + Y is a Regular Expression with L(X + Y) = L(X) ∪ L(Y).
– X . Y is a Regular Expression with L(X . Y) = L(X) . L(Y).
5. If R is a Regular Expression denoting the language L(R), then R* is a Regular
Expression with L(R*) = (L(R))*.
• Anything obtained by applying rules 1 to 5 a finite number of times is a Regular Expression.

RE Examples
Regular Expression: Regular Set
• (0 + 10*): L = {0, 1, 10, 100, 1000, 10000, …}
• (0*10*): L = {1, 01, 10, 010, 0010, …}
• (0 + ε)(1 + ε): L = {ε, 0, 1, 01}
• (a+b)*: Set of strings of a’s and b’s of any length, including the
null string. So L = {ε, a, b, aa, ab, bb, ba, aaa, …}
• (a+b)*abb: Set of strings of a’s and b’s ending with the string abb. So
L = {abb, aabb, babb, aaabb, ababb, …}
• (11)*: Set of strings consisting of an even number of 1’s, including the empty
string. So L = {ε, 11, 1111, 111111, …}
• (aa)*(bb)*b: Set of strings consisting of an even number of a’s followed
by an odd number of b’s. So L = {b, aab, aabbb, aabbbbb,
aaaab, aaaabbb, …}
• (aa + ab + ba + bb)*: Strings of a’s and b’s of even length, obtained by
concatenating any combination of the strings aa, ab, ba
and bb, including the null string. So L = {ε, aa, ab, ba, bb, aaab, aaba, …}
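
The regular sets listed here can be reproduced mechanically. The sketch below is an illustration using Python's standard `re` module rather than anything from the slides. It assumes a small translation step (the hypothetical helper `to_python`), because in Python syntax union is written `|` while `+` means "one or more".

```python
import re
from itertools import product

def to_python(regex):
    """Translate textbook notation (+ for union, ε for the empty string)
    into Python `re` syntax. Assumes + is never meant as 'one or more'."""
    return regex.replace(" ", "").replace("+", "|").replace("ε", "")

def language(regex, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that the regex matches."""
    pattern = re.compile(to_python(regex))
    return [
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if pattern.fullmatch("".join(chars))
    ]

print(language("(0 + ε)(1 + ε)", "01", 3))   # ['', '0', '1', '01']
print(language("(aa)*(bb)*b", "ab", 5))
```

`fullmatch` is used instead of `search` so that the whole string must belong to the language, which is the convention of the examples above.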

Regular Expression (RE)
• An algebraic-expression notation for describing (some)
languages, rather than a machine representation.
• Languages described by regular expressions are exactly the
FA-recognizable languages.
– That’s why FA-recognizable languages are called
“regular”.
• Regular Expressions are closely related to nondeterministic
finite automata and can be thought of as a “user-friendly”
alternative to the NFA notations for describing software
components.

Regular Expression
• When a regular language is obtained through a long sequence
of operations of union, concatenation and Kleene closure, its
representation becomes cumbersome. For example, it may
look like this:
({0}* ∪ ({1}{0}{0}*)){1}{0}*(({0}{1}*) ∪{1}*)
• To simplify the representations for regular languages, we
define the notion of regular expressions over alphabet ∑ as
follows:
Regular Expression
• Definition:
R is a regular expression over alphabet Σ exactly if R is one of
the following:
1. a, for some a in Σ,
2. ε,
3. ∅ (represents the empty set)
4. ( R1 ∪ R2 ), where R1 and R2 are smaller regular expressions,
• (+ is sometimes used in place of ∪)
5. ( R1 ° R2 ), where R1 and R2 are smaller regular expressions, or
6. ( R1* ), where R1 is a smaller regular expression.

Regular Expression
• Intuitive Reading of Regular Expressions
• Concatenation == “is followed by”
• + == “or”
• * == “zero or more occurrences”
• Example:
• ab*: a single a followed by zero or more b’s: {a, ab, abb, abbb, …}
• (a+b)*: Set of all strings of a's and b's, in any order:
{ε, a, b, aa, ab, ba, bb, aaa, aab, …}
• (a*b*): Set of strings with zero or more a's followed by zero or more b's, so that all a's
occur before any b: {ε, a, b, aa, ab, bb, aaa, aab, …}
• (a*b*)*: Set of all strings of a's and b's, in any order (the same language as (a+b)*):
{ε, a, b, aa, ab, ba, bb, aaa, aab, …}
• Regular expression for the set of all binary strings whose first symbol from
the right end is a 0. Ans.: (0+1)*0
• Regular expression for the set of all binary strings whose second symbol
from the right end is a 0. Ans.: (0+1)*.0.(0+1)
• Regular expression for the set of all binary strings whose 3rd symbol from
the right end is a 0. Ans.: (0+1)*.0.(0+1).(0+1)
• Describe the strings that are represented by the regular expression
(0+1)*.0.(0+1).(0+1). Ans.: all binary strings of length at least 3 whose
third symbol from the right is 0, e.g. 0000, 1000, 1010.
• Regular expression for the strings over {a,b} that do not contain
an a. Ans.: b*
• Regular expression for the strings over {a,b} in which a's occur only in
consecutive pairs (every run of a's has even length). Ans.: (aa+b)*
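
The "k-th symbol from the right" pattern generalises directly. As a rough illustration (the function name `kth_from_right_is_zero` is made up for this sketch), the expression (0+1)*.0.(0+1)…(0+1) with k−1 trailing factors can be built and tested with Python's `re` module, writing union as `|`:

```python
import re

def kth_from_right_is_zero(k):
    """Python pattern for (0+1)* 0 (0+1)^(k-1), union written as |."""
    return re.compile("(0|1)*0" + "(0|1)" * (k - 1))

third = kth_from_right_is_zero(3)
for w in ["0000", "1000", "1010", "0111", "100", "11"]:
    # True exactly when w has length >= 3 and its third symbol from the right is 0
    print(w, bool(third.fullmatch(w)))
```

The first three strings are the ones listed in the answer above. The last three are rejected because their third symbol from the right is 1 or does not exist.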

Regular Expression
• For example,
• language A = {0}* has a regular expression rA = (0)* and
• language B = {00}* ∪ {0} has a regular expression rB = (((0)(0))*) ∪ (0).
• RE Precedence: we apply the following precedence rules to a
regular expression that is not fully parenthesized:
(1) Kleene closure (*) has higher precedence than union
(+) and concatenation.
(2) Concatenation has higher precedence than union.
Example: how should a+bb*a be parenthesized?

Regular Expression (RE)
• a+bb*a is parenthesized as a + ((b(b*))a)

• The operations + and . in a regular expression satisfy the
distributive law: for any regular expressions r, s and t,
r(s + t) = rs + rt,
(r + s)t = rt + st.
• Two regular expressions r and s are equivalent, denoted by r = s,
if L(r) = L(s).
• Example: r = (a+b)*, L(r) = {ε, a, b, aa, ab, ba, bb, aaa, aab, …}
s = (a*b*)*, L(s) = {ε, a, b, aa, ab, ba, bb, aaa, aab, …}
Both represent the same language, hence r = s.
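
Equivalence of two regular expressions can be sanity-checked by brute force on all short strings. This is only evidence, not a proof, since agreement up to a length bound does not guarantee L(r) = L(s). The sketch below uses Python's `re` module with union written as `|`; the function `agree_up_to` and the bound of 6 are assumptions made for illustration.

```python
import re
from itertools import product

def agree_up_to(p1, p2, alphabet, max_len):
    """True if the two Python patterns accept exactly the same strings
    among all strings over `alphabet` of length <= max_len."""
    r1, r2 = re.compile(p1), re.compile(p2)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(r1.fullmatch(w)) != bool(r2.fullmatch(w)):
                return False
    return True

print(agree_up_to("(a|b)*", "(a*b*)*", "ab", 6))   # True: r = s in the example
print(agree_up_to("(a|b)*", "a*b*", "ab", 6))      # False: 'ba' separates them
print(agree_up_to("a(b|c)", "ab|ac", "abc", 4))    # True: r(s + t) = rs + rt
```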
RE: Examples
1. Find a regular expression for the language of all
strings beginning with a. (The given alphabet is ∑={a,b}.)
2. Find a regular expression for the language of all
strings that begin and end with a and contain only b’s in
between. (The given alphabet is ∑={a,b}.)
3. Find a regular expression for the set of binary strings which
have at least one occurrence of the substring 001. (∑={0, 1})
4. Find a regular expression for the language:
L = {x ∈ {0,1}* | x does not end in 01 }

Solution
1. a(a + b)*
2. a + ab*a
3. (0 + 1)*001(0 + 1)*
Such a string can be written as x001y, where x and y could be
any binary strings.
4. ε + 0 + 1 + (0 + 1)*(00 + 10 + 11)
If x does not end in 01, then either
• |x| < 2 or
• x ends in 00, 10, or 11
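
Solutions 3 and 4 can be checked against their set-builder descriptions by testing every binary string up to some length. The following is a small sketch with Python's `re` module; the bound of 8 and the helper `strings` are arbitrary choices for the illustration, and union is written `|` with the leading empty alternative standing for ε.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield every string over `alphabet` of length <= max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

sol3 = re.compile("(0|1)*001(0|1)*")
sol4 = re.compile("|0|1|(0|1)*(00|10|11)")   # the leading empty branch is ε

for w in strings("01", 8):
    assert bool(sol3.fullmatch(w)) == ("001" in w)
    assert bool(sol4.fullmatch(w)) == (not w.endswith("01"))
print("solutions 3 and 4 agree with their specifications up to length 8")
```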

Equivalence of different notations of
Regular languages

• An arc from class X to class Y means that we prove
every language defined by class X is also defined by
class Y.
DFA → RE by Eliminating States
• It is easier to go through an intermediate step:
DFA → GNFA → regular expression.

• In a GNFA (Generalized NFA), the labels of transitions can be regular
expressions.
• We need a special GNFA that satisfies:
(1) start state has no incoming transition;
(2) only one final state;
(3) final state has no outgoing transition.

DFA → RE by Eliminating States
• Idea:
Convert DFA → Special GNFA;
• Eliminate all states, except start and final state, one state at a
time;
• Output the label on the single transition left at the end.

DFA → RE by Eliminating States
Follow these to construct RE from DFA

• loop elimination

• transition elimination

• state elimination

DFA → RE by Eliminating States
• Example: Construct an RE for the given DFA.
• The DFA accepts L = {w in {0, 1}* | w has an odd number of 1's}

DFA → RE by Eliminating States

• Added: a new start state and a new final state, connected by
ε-transitions, because the
GNFA needs to satisfy:
(1) start state has no incoming transition;
(2) only one final state;
(3) final state has no outgoing transition.
DFA → RE by Eliminating States
Eliminate states one by one

Before we take out q0, we have to consider all the paths into the q0 state
and all possible transitions out of it.

We take out the q0 state, which leaves us with a 3-state finite automaton.

DFA → RE by Eliminating States
Eliminate states one-by-one

We take out the q1 state and we are left with the regular
expression
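
Since the diagrams for this example did not survive, here is a hedged Python sketch of the state-elimination idea. It assumes the usual two-state DFA for "odd number of 1's" over {0, 1} (q0 start, q1 final, 0-loops on both states, 1 switching between them) and hypothetical added states `s` and `f`; the function and variable names are illustrative. When a state is removed, every path p → rip → q is replaced by the label (p→rip)(rip→rip)*(rip→q), added by union to any existing p → q label.

```python
def union(r, s):
    if r is None:
        return s
    if s is None:
        return r
    return f"({r}+{s})"

def concat(*parts):
    if any(p is None for p in parts):
        return None                       # a missing edge kills the whole path
    return "".join(p for p in parts if p != "ε") or "ε"

def star(r):
    return "ε" if r is None or r == "ε" else f"({r})*"

def eliminate(order, edges, start, final):
    """edges: dict (p, q) -> regex label of a special GNFA.
    Removes the states in `order` one at a time, returns the final label."""
    edges = dict(edges)
    for rip in order:
        loop = star(edges.get((rip, rip)))
        others = {s for pair in edges for s in pair} - {rip}
        new_edges = {k: v for k, v in edges.items() if rip not in k}
        for p in others:
            for q in others:
                via = concat(edges.get((p, rip)), loop, edges.get((rip, q)))
                label = union(new_edges.get((p, q)), via)
                if label is not None:
                    new_edges[(p, q)] = label
        edges = new_edges
    return edges.get((start, final))

# Assumed DFA for "odd number of 1's", plus new start s and new final f.
gnfa = {("s", "q0"): "ε", ("q0", "q0"): "0", ("q0", "q1"): "1",
        ("q1", "q1"): "0", ("q1", "q0"): "1", ("q1", "f"): "ε"}
regex = eliminate(["q0", "q1"], gnfa, "s", "f")
print(regex)   # (0)*1((0+1(0)*1))*
```

The printed expression carries redundant parentheses. Simplified by hand it reads 0*1(0 + 10*1)*, and eliminating the states in a different order may give a different but equivalent expression.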

Converting Regular Expression to
Automata
• Q: Draw an FA that accepts the language of the RE (a + b)*a(a + b)
• Solution: L((a + b)*a(a + b)) = {aa, ab, aaa, aab, baa,
bab, aaaa, aaab, abaa, abab, baaa, …}, the set of strings that end
in either aa or ab.

RE to ε-NFA (Thompson Construction)
1. To recognize the empty string ε:

2. To recognize a symbol a in the alphabet ∑:

3. For the regular expression r1 + r2 (N(r1) and N(r2) are the NFAs
for the regular expressions r1 and r2):

RE to ε-NFA (Thompson Construction)
4. For regular expression r1r2:

5. For regular expression r*:

RE to ε-NFA: Example
• For a RE (a+b)* a, the NFA construction is shown below.
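
The five construction rules can be sketched in Python as small combinators that glue ε-NFA fragments together. This is one common textbook variant and an assumption of this sketch, since the slide diagrams are not available; the class `NFA` and all function names are illustrative. Each fragment has one start and one accepting state, and a `None` label stands for an ε-transition.

```python
from itertools import count

_ids = count()   # source of fresh state numbers

class NFA:
    """ε-NFA fragment with a single start state and a single accepting state."""
    def __init__(self):
        self.start, self.accept = next(_ids), next(_ids)
        self.edges = []                       # (source, label, target)

def symbol(a):                                # rule 2 (label None gives rule 1)
    n = NFA()
    n.edges = [(n.start, a, n.accept)]
    return n

def union(n1, n2):                            # rule 3: r1 + r2
    n = NFA()
    n.edges = n1.edges + n2.edges + [
        (n.start, None, n1.start), (n.start, None, n2.start),
        (n1.accept, None, n.accept), (n2.accept, None, n.accept)]
    return n

def concat(n1, n2):                           # rule 4: r1 r2
    n = NFA()
    n.start, n.accept = n1.start, n2.accept
    n.edges = n1.edges + n2.edges + [(n1.accept, None, n2.start)]
    return n

def star(n1):                                 # rule 5: r*
    n = NFA()
    n.edges = n1.edges + [
        (n.start, None, n1.start), (n.start, None, n.accept),
        (n1.accept, None, n1.start), (n1.accept, None, n.accept)]
    return n

def closure(states, edges):
    """All states reachable from `states` using ε-transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for p, label, q in edges:
            if p == s and label is None and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def accepts(nfa, word):
    current = closure({nfa.start}, nfa.edges)
    for ch in word:
        moved = {q for p, label, q in nfa.edges if p in current and label == ch}
        current = closure(moved, nfa.edges)
    return nfa.accept in current

# (a+b)* a
nfa = concat(star(union(symbol("a"), symbol("b"))), symbol("a"))
print([w for w in ["a", "ba", "abba", "", "ab"] if accepts(nfa, w)])
# ['a', 'ba', 'abba']
```

Each fragment should be used only once, because reusing one fragment in two places would share its states. Building a fresh fragment per subexpression, as above, avoids that.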

Regular Grammar
• A third way of describing regular languages is by means of
certain simple grammars.
• Grammars are often an alternative way of specifying
languages.
• Intuitively, a grammar is a set of rules which manipulate
symbols.
• We distinguish two kinds of symbols:
• Terminals are elements of the alphabet of the target language.
• Non-terminals are auxiliary symbols that facilitate the
specification.

Regular Grammars
• A grammar is a list of rules which can be used to produce or
generate all the strings of a language, and which does not
generate any strings which are not in the language.
• Grammar: generative description of a language
• Automaton: analytical description.
• A grammar is a quadruple
G = (V, Σ, P, S) where
– V is a finite set of non-terminal symbols.
– Σ is an alphabet of terminal symbols.
– P is a set of productions, which are rules.
– S is the start symbol, a distinguished member of V.

Regular Grammars
• Notation:
• Terminals (lower-case letters, operator symbols, digits,
keywords, Punctuation symbols, etc…)
• Non-Terminals (Upper-case letters, special symbols such as
statement, expression, A, B, C and etc…)
• In a regular grammar, all productions have one of three forms:
1. A → aB (right side single terminal symbol followed
by single non-terminal symbol)
2. A → a (right side single terminal symbol)
3. A → ε (empty)
where A and B are any non-terminals and a is any terminal symbol.
Formal Grammar to Generate a
Formal Language
• Grammar generates a set of sequences of symbols
(formal language):
1. Start with the sequence consisting of just the start
symbol.
2. Repeatedly replace subsequence t by u for
production rule t → u.
3. Terminate when the sequence contains terminal
symbols only.

Right- and Left-Linear Grammars
• A grammar G = (V, Σ, P, S) is said to be right-linear if all
productions are of the form A → xB or A → x.
• A grammar G = (V, Σ, P, S) is said to be left-linear if all
productions are of the form A → Bx or A → x.
Here A, B ∈ V and x ∈ Σ*.
• A Regular Grammar is one that is either right-linear or left-
linear.
• Note: In any regular grammar, at most one variable appears
on the right side of any production.
• Furthermore, that variable must consistently be either the
rightmost or leftmost symbol of the right side of any
production.

Regular Grammar: Example 1
• The grammar G = ({S}, {a, b}, P, S), with P given as:
S → abS | a
• Q: What language is generated by
this grammar?
• Solution: first derive some terminal strings from the grammar:
• S ⇒ a
• S ⇒ abS ⇒ aba
• S ⇒ abS ⇒ ababS ⇒ ababa
• S ⇒ abS ⇒ ababS ⇒ abababS ⇒ abababa
• From the above derivations it is easy to conjecture that L(G) is the
language denoted by the regular expression r = (ab)*a
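
The derivation process (start from S, repeatedly rewrite a non-terminal, stop when only terminals remain) can be automated for small grammars. The sketch below is an illustration under stated assumptions: non-terminals are single upper-case letters, productions are given as a Python dict, `""` stands for ε, and the grammar is regular so each sentential form contains at most one non-terminal. The function name `generate` is made up here.

```python
from collections import deque

def generate(productions, start, max_len):
    """Terminal strings with at most max_len symbols derivable from `start`.
    productions: dict non-terminal -> list of right-hand sides (strings)."""
    results, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # position of the first non-terminal, or None if the form is terminal
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None:
            results.add(form)
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for c in new if not c.isupper())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=lambda w: (len(w), w))

P = {"S": ["abS", "a"]}
print(generate(P, "S", 7))   # ['a', 'aba', 'ababa', 'abababa']
```

The output lists the same four strings as the derivations above, which supports (but does not prove) the conjecture L(G) = L((ab)*a).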

Regular Grammar: Example 2
• Q: Consider the grammar G1 = (V, Σ, P, S) where V = {S, A,
B}, Σ = {0,1} and P consists of the rules given below.
S → 0A|0 A → 1B B → 0A|0
• What is the language generated by G1,(L(G1))?
Solution:
• Some of the strings that can be generated by grammar G1 are 0,
010, 01010, 0101010, . . .
• So L(G1) is the language denoted by (01)*0,
• i.e. the set of strings consisting of zero or more copies of
01 followed by a final 0.

Regular Grammar: Example 3
• Q: Write a grammar that generates a language having one or
more a’s followed by one or more b’s.
{ab, aab, aaabb, aaaabb…}
• Solution:
Σ = {a, b}
P: S → aA
A → aA | bB
B → bB | ε

Example 4:
• Let G = (V, T, P, S) be a regular grammar, where
– V = {A0, A1, A2}
– T = {0, 1}
– S = A0 is the start symbol of the grammar.
P is the set of production rules defined as:
– A0 -> 0A1
– A0 -> 1A2
– A1 -> 0A2
– A2 -> 0

Construct a finite automaton that accepts the language generated by the given grammar G.


Solution:
• Let M = (Q, Σ, δ, q0, F) be a finite automaton that accepts L(G), where
• Q = {q0, q1, q2, qf}
• Σ = {0, 1}
• q0 is the initial state
• F = {qf}

• The states q0, q1, q2 correspond to A0, A1, A2, and qf is the new
final state of M.
• Initially we have the 4 states of the finite automaton M.

The production rule A0 -> 0A1 gives a transition from q0 to q1 with label 0. After
this production rule, we have the following partial diagram of the finite automaton.

The production rule A0 -> 1A2 gives a transition from q0 to q2 with label 1. After
this production rule we have the following partial diagram of the finite automaton.

The production rule A1 -> 0A2 gives a transition from q1 to q2 with label 0. After
this production rule we have the following partial diagram of the finite automaton.

• Similarly, the production rule A2 -> 0 gives a transition from
q2 to qf with label 0. After this production rule we have the
final diagram of the finite automaton accepting L(G).
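
The rule-by-rule construction can be written down compactly. In this hedged sketch (function names and the triple encoding are illustrative assumptions), a production A → aB becomes a transition from A to B on a, and a production A → a becomes a transition from A to the new final state `qf`. The non-terminal names are used directly as state names, so A0, A1, A2 play the roles of q0, q1, q2.

```python
def grammar_to_nfa(productions, start):
    """productions: list of (A, a, B) for A -> aB and (A, a, None) for A -> a.
    Returns (delta, start_state, final_states) of an NFA."""
    FINAL = "qf"                              # the extra final state
    delta = {}
    for head, terminal, tail in productions:
        target = tail if tail is not None else FINAL
        delta.setdefault((head, terminal), set()).add(target)
    return delta, start, {FINAL}

def accepts(nfa, word):
    delta, start, finals = nfa
    current = {start}
    for ch in word:
        # follow every possible transition on ch from the current state set
        current = set().union(*(delta.get((q, ch), set()) for q in current))
    return bool(current & finals)

G = [("A0", "0", "A1"), ("A0", "1", "A2"), ("A1", "0", "A2"), ("A2", "0", None)]
M = grammar_to_nfa(G, "A0")
print([w for w in ["000", "10", "00", "100", ""] if accepts(M, w)])
# ['000', '10']
```

For this grammar the automaton accepts exactly the two strings 000 and 10, matching the derivations A0 ⇒ 0A1 ⇒ 00A2 ⇒ 000 and A0 ⇒ 1A2 ⇒ 10.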

Construction of a Regular
Grammar from Finite Automata

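From DFA to Regular Grammar (RG)
• We can determine an RG directly from a DFA.
• Rules:
– a transition from q0 to q1 labelled a gives the production q0 → aq1
– a transition from q0 labelled a into a final state gives the production q0 → a

• Example (DFA with states q0, q1, q2, where every state has a loop labelled 1):
q0 → 1q0 | 0q1
q1 → 1q1 | 0q2
q2 → 1q2 | 0q1 | ε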
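
The conversion is mechanical enough to sketch in a few lines of Python. The version below follows the convention of the example (an ε-production for each final state) rather than the "q0 → a" rule, and both the function name and the transition table are reconstructed from the productions shown, since the original diagram is not available.

```python
def dfa_to_grammar(delta, finals):
    """delta: dict (state, symbol) -> state.
    Returns dict state -> list of right-hand sides, with 'ε' for final states."""
    productions = {}
    for (p, a), q in sorted(delta.items()):
        productions.setdefault(p, []).append(a + q)     # p -> a q
    for q in sorted(finals):
        productions.setdefault(q, []).append("ε")       # q -> ε
    return productions

# Transition table assumed from the example productions; q2 is the final state.
delta = {("q0", "1"): "q0", ("q0", "0"): "q1",
         ("q1", "1"): "q1", ("q1", "0"): "q2",
         ("q2", "1"): "q2", ("q2", "0"): "q1"}

for state, rhs in dfa_to_grammar(delta, {"q2"}).items():
    print(state, "→", " | ".join(rhs))
```

The productions come out in sorted order, so each line lists the 0-transition before the 1-transition. Apart from that ordering they are the same rules as in the example.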

Regular Grammar to NDFA

FA Vs RE Vs RL Vs RG
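FSA: states q0 (start) and q1 (final); q0 has a loop labelled a and a
transition labelled b to q1; q1 has a loop labelled b.

Regular language: {b, ab, bb, aab, abb, …}

Regular expression: a*bb* (also written a*b+, where + here means “one or more”, not union)

Regular grammar: q0 → aq0 | bq1
q1 → bq1 | ε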

Proving non-regularity of a language
• To understand the power of finite automata you must also
understand their limitations.
• There are certain languages that cannot be recognized by any
finite automaton.
• The pumping lemma is used to prove that languages are not
regular. You cannot use it to prove that languages are regular.

The Pumping Lemma
• Our strategy for proving languages to be non-regular:
– The Pumping Lemma states that all regular languages
have a special property
• If we can show that a language does not have this
property, then the language cannot be regular.
• The property states that all strings in a regular language
can be “pumped” if they are at least as long as a special
value, the pumping length.
• This means that each such string contains a section that
can be repeated any number of times with the resulting
string remaining in the language.

The Pumping Lemma
• Let L be a regular language. Then there is a number
p (the pumping length) where, if s is any string in L
of length at least p, then s may be divided into three
parts, s=xyz, satisfying the following conditions:
1. y ≠ ε (but x and z may be ε)
2. |xy| ≤ p
3. for each i ≥ 0, x y^i z ∈ L (y^i denotes i copies of y)

Pumping Lemma Explanation
• This theorem says:
• when s is divided into xyz, either x or z may be empty, but
y may not be empty.
• x and y together must have length at most p.
• we can always find a nonempty string y not too far from the
beginning of s that can be “pumped”, i.e. it can repeat any
number of times.

• Note that although y ≠ ε, in the third condition i may equal
zero, giving us y^0 = ε, and then xz must be in L.

Using the Pumping Lemma
• To show that L is not regular, first assume that L is regular
in order to obtain a contradiction.
• Then the pumping lemma guarantees the existence of
a pumping length p such that all strings in L of length at least p can
be pumped.
• Find a string s in L that has length p or greater but cannot
be pumped.
– This is demonstrated by considering all ways of dividing s
into x, y, and z that satisfy conditions 1 and 2, and showing that
condition 3 is violated for each of them.
• Since the existence of s contradicts the pumping lemma, the
assumption fails, which means that L is not regular.
Example 1
• Let B be the language {0^n 1^n | n ≥ 0}. Use the pumping lemma to
show that B is not regular.
– Intuitively, B is not regular because a DFA would have to count an
unbounded number of 0’s using only finitely many states.
• Assume that B is regular.
– Let p be the pumping length.
– Choose s to be the string 0^p 1^p. This string is clearly a member
of B.
– s has length 2p, which is at least p, so it is a valid choice.

Example 1, Continued, 0^n 1^n
• We consider the possible ways to split s into xyz, where y may not be
empty:
– The string y consists only of 0’s. Then we pick i=2. By condition 3, xy^2z = xyyz
should also be in B. But xyyz has more 0’s than 1’s, because we added
more 0’s with the extra copy of y, so it is not in B.
– The string y consists only of 1’s. Then we pick i=2 and by the same logic
as above, the string xyyz would have more 1’s than 0’s and this is also
not in B.
– The string y consists of both 0’s and 1’s. Then we pick i=2 and xyyz may
have the same number of 1’s and 0’s but now the 0’s and 1’s are not in
the desired order (we needed to have all the 0’s come before the 1’s).
Therefore, this string is not in B either.
• In fact, condition 2 (|xy| ≤ p) forces y to lie within the first p symbols
of s, which are all 0’s, so only the first case can actually occur.
• Contradiction no matter how s is split! B cannot be regular.
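
The case analysis can be replayed by a short exhaustive check. This Python sketch is an illustration, not a proof: it tries every split s = xyz allowed by conditions 1 and 2 for a few small values of p, and it tests condition 3 only for i = 0..3 as a finite stand-in for "all i ≥ 0". The names `in_B` and `pumpable` are made up for the sketch.

```python
def in_B(w):
    """Membership test for B = {0^n 1^n | n >= 0}."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def pumpable(s, p, member):
    """True if some split s = xyz with y != ε and |xy| <= p keeps
    x y^i z in the language for i = 0, 1, 2, 3."""
    for end in range(1, min(p, len(s)) + 1):      # end = |xy|
        for begin in range(end):                  # begin = |x|, so |y| >= 1
            x, y, z = s[:begin], s[begin:end], s[end:]
            if all(member(x + y * i + z) for i in range(4)):
                return True
    return False

for p in range(1, 8):
    s = "0" * p + "1" * p
    # s is in B, yet no admissible split survives pumping
    assert in_B(s) and not pumpable(s, p, in_B)
print("0^p 1^p cannot be pumped for p = 1..7")
```

For contrast, `pumpable("0000", 2, lambda w: set(w) <= {"0"})` returns True, as expected for the regular language 0*.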
RE Applications
• Regular expressions are used in, e.g.:
1. UNIX grep and awk commands.
2. UNIX Lex (lexical analyzer generator) and Flex (Fast
Lex) tools.
3. Programming languages with regexp facilities:
• Python, Java, Perl, Ruby, most scripting languages, …
• If not officially supported, a library may still exist
4. Text editors.
5. Other: MySQL, Microsoft Office, OpenOffice, ...

Properties of Regular Languages
• We say that a class of languages is closed under an
operation if applying that operation to a language (or
languages) in the class produces another language in the class.
For example, the union of two regular languages is another
regular language.
• The class of regular languages is closed under the following
operations:
a) Concatenation
b) Union
c) Kleene closure
d) Complementation
e) Intersection

Properties of Regular Languages
• Intersection: if L1 and L2 are regular languages, then so is L1
∩L2, the language consisting of the set of strings that are in
both L1 and L2.
• Difference: if L1 and L2 are regular languages, then so is L1
−L2, the language consisting of the set of strings that are in L1
but not L2.
• Complementation: if L1 is a regular language, then so is Σ* − L1,
the set of all strings over Σ that aren’t in L1.
• Reversal: if L1 is a regular language, then so is L1^R, the
language consisting of the reversals of all the strings in
L1.
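
Closure under intersection, complementation and difference can be made concrete with DFAs. The sketch below uses the standard product construction, which the slides do not spell out, so the encoding (a DFA as a triple of transition dict, start state and final-state set) and the two example automata are assumptions of this illustration. Complementation flips the final states of a complete DFA, and difference is L1 ∩ complement(L2).

```python
from itertools import product

def run(dfa, word):
    """Run a complete DFA given as (delta, start, finals) on `word`."""
    delta, start, finals = dfa
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in finals

def intersect(d1, d2, alphabet):
    """Product construction: states are pairs, final iff both parts are final."""
    (t1, s1, f1), (t2, s2, f2) = d1, d2
    start = (s1, s2)
    delta, todo, seen = {}, [start], {start}
    while todo:
        p, q = todo.pop()
        for a in alphabet:
            nxt = (t1[(p, a)], t2[(q, a)])
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return delta, start, {(p, q) for p, q in seen if p in f1 and q in f2}

def complement(dfa, states):
    """Same automaton with final and non-final states swapped."""
    delta, start, finals = dfa
    return delta, start, set(states) - set(finals)

# L1: odd number of 1's.  L2: strings ending in 0.
odd = ({("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}, "e", {"o"})
end0 = ({("n", "0"): "z", ("n", "1"): "n", ("z", "0"): "z", ("z", "1"): "n"}, "n", {"z"})

both = intersect(odd, end0, "01")                              # L1 ∩ L2
diff = intersect(odd, complement(end0, {"n", "z"}), "01")      # L1 − L2

for n in range(7):
    for chars in product("01", repeat=n):
        w = "".join(chars)
        assert run(both, w) == (run(odd, w) and run(end0, w))
        assert run(diff, w) == (run(odd, w) and not run(end0, w))
print("product automata behave as L1 ∩ L2 and L1 − L2 up to length 6")
```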
