Regular expressions

Regular expressions are built up using the regular operations for describing languages.
• The value of a regular expression is a language.
• Regular expressions provide a method for
describing patterns.

Priority of operations in regular
expressions
1. Star,
2. Concatenation,
3. Union.
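Python's `re` module uses the same precedence, which gives a quick way to see how an unparenthesized expression is grouped. This is a minimal sketch in which `|` stands for U; the helper name `matches` is hypothetical.

```python
import re

# Python's re writes union as "|"; concatenation is juxtaposition and "*" is star.
# As on the slide, star binds tightest, then concatenation, then union, so
# "ab*|c" is read as (a(b*)) U c, not as a(b* U c) or ((ab)*) U c.
PATTERN = "ab*|c"

def matches(s: str) -> bool:
    return re.fullmatch(PATTERN, s) is not None

for s in ["a", "abbb", "c", "abab", "ac"]:
    print(repr(s), matches(s))
```

"ac" would match under the grouping a(b* U c), and "abab" would match under ((ab)*) U c; both are rejected here.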

Example 1:
Regular expression (0 U 1)0* is shorthand for
({0} U {1}) • {0}*
• The value of the part (0 U 1) is the language {0, 1}.
• The value of the part 0* is the language consisting of all strings containing any number of 0s.
• The value of the whole expression is the language consisting of all strings that start with a 0 or a 1 followed by any number of 0s.
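Example 1 can be checked mechanically. A minimal sketch, assuming Python's `re` syntax (`|` for U) and a length bound of 3 chosen only for illustration; `in_language` is a hypothetical helper name.

```python
import re
from itertools import product

# (0 U 1)0* in Python re syntax, where "|" stands for U
PATTERN = re.compile(r"(0|1)0*")

def in_language(s: str) -> bool:
    return PATTERN.fullmatch(s) is not None

# All strings over {0, 1} of length at most 3 that belong to the language
members = [
    "".join(t)
    for n in range(4)
    for t in product("01", repeat=n)
    if in_language("".join(t))
]
print(members)  # ['0', '1', '00', '10', '000', '100']
```

Swapping the pattern for `(0|1)*` lists all 15 strings of length at most 3, matching Example 2.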

Example 2:
Regular expression (0 U 1)*

• The value of this expression is the language consisting of all possible strings of 0s and 1s.

Formal definition of a regular expression
R is a regular expression if R is
1. a for some a in the alphabet Σ,
2. ε,
3. ∅,
4. (R1 U R2), where R1 and R2 are regular expressions,
5. (R1 • R2), where R1 and R2 are regular expressions, or
6. (R1*), where R1 is a regular expression.
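Because the definition is inductive, L(R) can be computed by recursion on the six cases. A minimal sketch, assuming a hypothetical tuple encoding of regular expressions; since L(R) may be infinite, the function `lang` (also hypothetical) returns only the strings of length at most n.

```python
from typing import FrozenSet, Tuple

# Hypothetical encoding of the six cases of the definition as nested tuples:
#   ("sym", a)          case 1: a single symbol a
#   ("eps",)            case 2: epsilon
#   ("empty",)          case 3: the empty language
#   ("union", R1, R2)   case 4
#   ("concat", R1, R2)  case 5
#   ("star", R1)        case 6
Regex = Tuple

def lang(r: Regex, n: int) -> FrozenSet[str]:
    """Return the strings of L(r) whose length is at most n."""
    kind = r[0]
    if kind == "sym":
        return frozenset({r[1]}) if len(r[1]) <= n else frozenset()
    if kind == "eps":
        return frozenset({""})
    if kind == "empty":
        return frozenset()
    if kind == "union":
        return lang(r[1], n) | lang(r[2], n)
    if kind == "concat":
        left, right = lang(r[1], n), lang(r[2], n)
        return frozenset(u + v for u in left for v in right if len(u + v) <= n)
    if kind == "star":
        base = lang(r[1], n)
        result = {""}
        frontier = {""}
        # Append one more string from L(R1) until no new short strings appear.
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return frozenset(result)
    raise ValueError(f"unknown regex node: {kind}")

# (0 U 1)0* from Example 1, restricted to strings of length <= 3
example1 = ("concat", ("union", ("sym", "0"), ("sym", "1")), ("star", ("sym", "0")))
print(sorted(lang(example1, 3), key=lambda s: (len(s), s)))
# ['0', '1', '00', '10', '000', '100']
```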

Explanation to the definition of regular
expression
a. In items 1 and 2, the regular expressions a and ε represent the languages {a} and {ε}, respectively.
b. In item 3, the regular expression ∅ represents the empty language.
c. In items 4, 5, and 6, the expressions represent the languages obtained by taking the union or concatenation of the languages R1 and R2, or the star of the language R1, respectively.

Considerations to regular expression
• If Σ = {0, 1}, we can write Σ as shorthand for the regular expression (0 U 1).
Thus if Σ is any alphabet, the regular expression Σ describes the language consisting of all strings of length 1 over this alphabet, and Σ* describes the language consisting of all strings over that alphabet.
• R* has all strings that are 0 or more concatenations of strings from R, where R is a regular expression.
• The language R+ = RR* has all strings that are 1 or more concatenations of strings from R. So R+ U ε = R*.
• In addition, R^k is shorthand for the concatenation of k R's with each other.
• The language described by a regular expression R is L(R).
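Python's `re` has direct counterparts of these shorthands: `+` for R+ and `{k}` for R^k. The sketch below, assuming R = ab purely for illustration, checks R+ U ε = R* on all short strings over {a, b}.

```python
import re
from itertools import product

R = "ab"                               # stands in for an arbitrary regular expression R
star = re.compile(f"(?:{R})*")         # R*
plus = re.compile(f"(?:{R})+")         # R+ = RR*
power3 = re.compile(f"(?:{R}){{3}}")   # R^3 = RRR

# Check that R+ U epsilon and R* agree on every string over {a, b} of length <= 8.
for n in range(9):
    for t in product("ab", repeat=n):
        s = "".join(t)
        in_star = star.fullmatch(s) is not None
        in_plus_or_eps = s == "" or plus.fullmatch(s) is not None
        assert in_star == in_plus_or_eps, s

print(power3.fullmatch("ababab") is not None)  # True
print(power3.fullmatch("abab") is not None)    # False
```

A finite check like this is only a sanity test; the identity itself follows from the definitions of R+ and R*.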
Considerations to regular expression
(cont.)
Let R be any regular expression. The following identities are valid:
1. R U ∅ = R. Adding the empty language to any other language will not change it.
2. R • ε = R. Joining the empty string to any string will not change it.

Example
• R U ε may not equal R. Let R = 0; then L(R) = {0} but L(R U ε) = {0, ε}.
• R • ∅ may not equal R. Let R = 0; then L(R) = {0} but L(R • ∅) = ∅.

Equivalence of regular expression with
finite automata
Any regular expression can be converted into a
finite automaton that recognizes the language it
describes, and vice versa.

Theorem: Regular language

A language is regular if and only if some regular expression describes it.

Lemma 1:
If a language is described by a regular expression, then
it is regular.

Proof idea:
If an NFA recognizes A, then A is regular.
So, given a regular expression R describing some language A, it suffices to convert R into an NFA recognizing A.

Proof: Conversion of regular expression R into
NFA N
1. R = a for some a in Σ. Then L(R) = {a}, and the following NFA recognizes L(R):
N = ({q1, q2}, Σ, δ, q1, {q2}), where δ(q1, a) = {q2}
and δ(r, b) = ∅ for r ≠ q1 or b ≠ a.

2. R = ε. Then L(R) = {ε}, and the following NFA recognizes L(R):
N = ({q1}, Σ, δ, q1, {q1}), where δ(r, b) = ∅ for any r and b.

Proof: Conversion of regular expression R into
NFA N (cont.)
3. R = . Then L(R) = , and the following NFA
recognizes L(R). N = ({q}, , , q, ), where
(r, b) =  for any r and b.
4. R =R1 U R2.
5. R = R1 • R2
6. R=R1*.
• For pt. 4, 5 and 6 construct the NFA for R from
the NFAs for R1 and R2 (or just R1 in case 6) and
the appropriate closure construction
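The three base NFAs and the three closure constructions can be sketched in Python. This is a minimal sketch assuming the standard closure constructions: for union, a new start state with ε-arrows to both start states; for concatenation, ε-arrows from N1's accept states to N2's start state; for star, a new accepting start state plus ε-arrows from the accept states back to the old start state. The NFA representation and all function names are hypothetical. It builds (ab U a)*, the expression used in the example.

```python
from itertools import count
from typing import Dict, Set, Tuple

EPS = ""  # label used for epsilon transitions
Delta = Dict[Tuple[int, str], Set[int]]

class NFA:
    def __init__(self, start: int, accept_states: Set[int], delta: Delta):
        self.start = start
        self.accept_states = accept_states
        self.delta = delta

_fresh = count()  # source of new state names

def _merge(*deltas: Delta) -> Delta:
    out: Delta = {}
    for d in deltas:
        for key, targets in d.items():
            out.setdefault(key, set()).update(targets)
    return out

def sym(a: str) -> NFA:                     # case 1: R = a
    q1, q2 = next(_fresh), next(_fresh)
    return NFA(q1, {q2}, {(q1, a): {q2}})

def eps() -> NFA:                           # case 2: R = epsilon
    q1 = next(_fresh)
    return NFA(q1, {q1}, {})

def empty() -> NFA:                         # case 3: R = empty set
    q = next(_fresh)
    return NFA(q, set(), {})

def union(n1: NFA, n2: NFA) -> NFA:         # case 4: new start, epsilon to both
    q0 = next(_fresh)
    return NFA(q0, n1.accept_states | n2.accept_states,
               _merge(n1.delta, n2.delta, {(q0, EPS): {n1.start, n2.start}}))

def concat(n1: NFA, n2: NFA) -> NFA:        # case 5: epsilon from N1's accepts to N2
    links = {(f, EPS): {n2.start} for f in n1.accept_states}
    return NFA(n1.start, set(n2.accept_states), _merge(n1.delta, n2.delta, links))

def star(n1: NFA) -> NFA:                   # case 6: new accepting start, loop back
    q0 = next(_fresh)
    links = {(f, EPS): {n1.start} for f in n1.accept_states}
    links[(q0, EPS)] = {n1.start}
    return NFA(q0, n1.accept_states | {q0}, _merge(n1.delta, links))

def eclose(nfa: NFA, states: Set[int]) -> Set[int]:
    """All states reachable from `states` using only epsilon arrows."""
    seen, stack = set(states), list(states)
    while stack:
        for r in nfa.delta.get((stack.pop(), EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfa: NFA, w: str) -> bool:
    current = eclose(nfa, {nfa.start})
    for c in w:
        step = set()
        for q in current:
            step |= nfa.delta.get((q, c), set())
        current = eclose(nfa, step)
    return bool(current & nfa.accept_states)

# (ab U a)*, built bottom-up from its subexpressions
n = star(union(concat(sym("a"), sym("b")), sym("a")))
for w in ["", "a", "ab", "aab", "aba", "b", "abb"]:
    print(repr(w), accepts(n, w))
```

Each construction only adds states and ε-arrows around the smaller NFAs, which mirrors the structure of the inductive proof.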

Example:
Convert the regular expression (ab U a)* to an NFA in a sequence of stages.
Steps:

Example: (cont.)

Lemma 2: If a language is regular, then it can
be described by a regular expression
Proof Idea
• If a language A is regular, it is accepted by a DFA.
• Thus it suffices to convert the DFA into an equivalent regular expression.
To convert DFAs into equivalent regular expressions, we use generalized nondeterministic finite automata [GNFA].
Generalized nondeterministic finite
automata
GNFA are simply nondeterministic finite automata
wherein the transition arrows may have any
regular expressions as labels.

Characteristic of Generalized Non-deterministic
Finite Automata [GNFA]
1. The start state of the GNFA has no incoming edge from any state,
2. There is a single accept state, which has no outgoing edge to any state,
3. The labels on the edges are regular expressions over Σ.

Formal definition of Generalized
Nondeterministic Finite Automaton
A generalized nondeterministic finite automaton is a 5-tuple,
(Q, Σ, δ, qstart, qaccept), where
1. Q is the finite set of states,
2. Σ is the input alphabet,
3. δ : (Q − {qaccept})× (Q − {qstart})→R is the transition function,
(The symbol R denotes the collection of all regular expressions over the alphabet Σ.)
4. qstart is the start state, and
5. qaccept is the accept state.
Steps of conversion of DFAs into
equivalent regular expressions.
1. Create a generalized nondeterministic finite automaton [GNFA] from the deterministic finite automaton [DFA].
2. Reduce the GNFA to a two-state GNFA by removing the intermediate states one at a time.
3. Read the label on the single transition edge of the two-state GNFA; it is the regular expression R.

Obtaining GNFA from DFA

1. Add a new start state to the DFA that has no incoming edge from any state. Connect the new start state to the old start state of the DFA through an ε transition.
2. Add a new accept state to the DFA that has no outgoing edge to any state. Connect all old accept states of the DFA to the new accept state through ε transitions.
State removal (ripping)
Select any state qrip other than qstart and qaccept.
• Remove qrip.
• Compensate for the removal of qrip by relabeling the arrow from each qi to each qj with a regular expression that describes all strings that would take the GNFA from qi to qj either directly or via qrip.

Removal of intermediate states and joining of edge
labels by using regular operations
1. Union operation: If there are two or more edges between the same two states, replace them with a single edge whose label is the union of the labels on the previous edges.

Removal of intermediate states (cont.)

2. Concatenation operation: If there are three consecutive states, remove the middle one. The label on the new edge from the first state to the third is the concatenation of the labels on the two edges that passed through the removed state.

Removal (ripping) of intermediate states and joining
of edge labels by using regular operations (cont.)
3. Star operation: If the state to be removed has an edge looping to itself, insert that loop's label, with a star, at the position of the ripped state in the concatenated label.

Obtaining the regular expression from
two-state GNFA
• States are removed and the labels on the edges are joined until a two-state GNFA consisting of only the start and accept states is obtained.
• The required regular expression is the label on the edge from the start state to the accept state.

Creation of regular expression for ripped
state
In the old machine,
if
1. qi goes to qrip with an arrow labeled R1,
2. qrip goes to itself with an arrow labeled R2,
3. qrip goes to qj with an arrow labeled R3, and
4. qi goes to qj with an arrow labeled R4,
Then
in the new machine the arrow from qi to qj
gets the label (R1)(R2)*(R3) ∪ (R4).
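The whole DFA-to-regular-expression procedure, including the new start and accept states and the ripping rule (R1)(R2)*(R3) ∪ (R4), can be sketched as follows. Assumptions: labels are Python `re` strings, `None` stands for ∅ and `""` for ε, the alphabet is alphanumeric (so labels need no escaping), and states are ripped in list order. All function names, and the example DFA (strings over {a, b} containing at least one b), are hypothetical.

```python
import re
from itertools import product

def _top_level_union(r):
    """True if r contains a "|" outside every parenthesized group."""
    depth = 0
    for c in r:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        elif c == "|" and depth == 0:
            return True
    return False

def _group(r):
    return f"(?:{r})" if _top_level_union(r) else r

def union(r, s):
    if r is None:
        return s
    if s is None or r == s:
        return r
    return f"{r}|{s}"

def concat(r, s):
    if r is None or s is None:
        return None                      # anything . empty set = empty set
    if r == "" or s == "":
        return r + s                     # epsilon is the identity
    return _group(r) + _group(s)

def star(r):
    if r is None or r == "":
        return ""                        # (empty set)* = epsilon* = epsilon
    return f"{r}*" if len(r) == 1 else f"(?:{r})*"

def dfa_to_regex(states, alphabet, delta, start, accept_states):
    q_start, q_accept = object(), object()   # the two new GNFA states
    R = {}                                    # (qi, qj) -> label; missing = empty set
    def add(i, j, label):
        R[(i, j)] = union(R.get((i, j)), label)
    add(q_start, start, "")                  # epsilon arrow to the old start state
    for f in accept_states:
        add(f, q_accept, "")                 # epsilon arrows from old accept states
    for q in states:
        for a in alphabet:
            add(q, delta[(q, a)], a)         # parallel arrows merge by union
    for rip in states:
        r2 = R.get((rip, rip))
        sources = [i for (i, j) in R if j == rip and i != rip]
        targets = [j for (i, j) in R if i == rip and j != rip]
        for i in sources:
            for j in targets:
                # new label: (R1)(R2)*(R3) U (R4)
                new = concat(concat(R[(i, rip)], star(r2)), R[(rip, j)])
                R[(i, j)] = union(new, R.get((i, j)))
        R = {k: v for k, v in R.items() if rip not in k}
    return R.get((q_start, q_accept))

def dfa_accepts(delta, start, accept_states, w):
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in accept_states

def agrees(regex, delta, start, accept_states, alphabet, max_len=8):
    """Compare the regex with the DFA on every string of length <= max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if (re.fullmatch(regex, w) is not None) != dfa_accepts(delta, start, accept_states, w):
                return False
    return True

# Hypothetical two-state DFA over {a, b} accepting strings with at least one b.
delta = {(1, "a"): 1, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
regex = dfa_to_regex([1, 2], "ab", delta, 1, {2})
print(regex)                                  # a*b(?:a|b)*
print(agrees(regex, delta, 1, {2}, "ab"))     # True
```

Ripping the states in a different order generally yields a different, but equivalent, expression.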
Example: Converting a two-state DFA to an
equivalent regular expression

Example: (cont.)

The Pigeonhole Principle

If n pigeons are placed in m pigeonholes and n > m, then some pigeonhole contains at least 2 pigeons.

Walk of string

The walk of a string is the sequence of states that a finite automaton goes through as it reads each symbol of the string.

Pigeonhole principle for any DFA:
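If the walk of a string has at least as many transitions as the DFA has states, then some state is repeated.
Equivalently:
If a string w has length |w| ≥ p, where p is the number of states, then some state q must be repeated in the walk of w.

(A walk with |w| transitions visits |w| + 1 states: visited states = pigeons, DFA states = pigeonholes.)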
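The pigeonhole argument can be observed directly by computing a walk. A minimal sketch with a hypothetical 3-state DFA over {0, 1} whose state is the number of 1s read so far, modulo 3; `walk` and `first_repeat` are illustrative names.

```python
def walk(delta, start, w):
    """Return the walk of w: the |w| + 1 states visited while reading w."""
    states = [start]
    for c in w:
        states.append(delta[(states[-1], c)])
    return states

def first_repeat(states):
    """Return (j, k), j < k, the positions of the first state seen twice."""
    seen = {}
    for k, q in enumerate(states):
        if q in seen:
            return seen[q], k
        seen[q] = k
    return None

# Hypothetical 3-state DFA over {0, 1}: the state is the number of 1s mod 3.
delta = {(q, c): (q + (c == "1")) % 3 for q in range(3) for c in "01"}

r = walk(delta, 0, "111")   # |w| = 3 transitions, p = 3 states
print(r)                    # [0, 1, 2, 0]
print(first_repeat(r))      # (0, 3)
```

With only 2 transitions (for example w = 11) the walk [0, 1, 2] has no repeat, so the bound |w| ≥ p cannot be lowered for this DFA.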
Example: Walk of string w

Let w = xyz, where
|xy| ≤ p (the number of states) and
|y| ≥ 1.
Strings accepted:
xz, xyz, xyyz, ..., that is,
x y^i z for i = 0, 1, 2, ...

The pumping lemma for regular
languages:
The lemma states that every string in a regular language that is at least as long as a certain special value, called the pumping length, can be enlarged [pumped].

That means each such string contains a section that can be repeated any number of times (including zero times) with the resulting string remaining in the language.

Formal definition of pumping lemma:
Given an infinite regular language L, there exists an integer p (the pumping length) such that any string w ∈ L with length |w| ≥ p can be written as w = xyz satisfying the following conditions:
1. |y| ≥ 1;
2. |xy| ≤ p;
3. for all i ≥ 0, the string x y^i z is also in L.
Pumping lemma: (explanation)
For any regular language L, any sufficiently long word w in L can be split into 3 parts, w = xyz, such that all the strings x y^k z for k ≥ 0 are also in L.
• y is the substring that can be pumped (removed or repeated any number of times, and the resulting string is always in L). y^i means that i copies of y are concatenated together, and y^0 equals ε.
• Cond. 1 means the loop y to be pumped must have length at least one; y ≠ ε.
• Cond. 2 means the loop must occur within the first p characters.
• Either x or z may be ε.
Proof Idea
• For every regular language there is a finite automaton (FA) that accepts the language.
• The number of states in such an FA is used as the pumping length p.
• For a string of length at least p, let s0 be the start state and let s1, ..., sp be the sequence of the next p states visited as the first p symbols of the string are read.
• Because the FA has only p states, within this sequence of p + 1 visited states there must be at least one state that is repeated. Write S for such a state.
• The transitions that take the machine from the first encounter of state S to the second encounter of state S match some string. This string is called y.
• Since the machine will accept the string with the y portion removed, or with y repeated any number of times, the conditions of the lemma are satisfied.

Proof:
Let M = (Q, Σ, δ, q1, F) be a DFA recognizing a language L, and let p be the number of states of M.
• Let w = w1 w2 … wn be a string in L of length n, where n ≥ p.
• Let r1, ..., rn+1 be the sequence of states that M enters while processing w, so ri+1 = δ(ri, wi) for 1 ≤ i ≤ n.
• This sequence has length n + 1, which is at least p + 1.
• Among the first p + 1 elements in the sequence, two must be the same state, by the pigeonhole principle.
• Call the first of these rj and the second rk.
• Because rk occurs among the first p + 1 places in a sequence starting at r1, we have k ≤ p + 1.
Proof: (cont.)
• Now let x = w1 ... wj-1, y = wj ... wk-1, and z = wk ... wn.
• As x takes M from r1 to rj, y takes M from rj to rj, and z takes M from rj to rn+1, which is an accept state, M must accept x y^i z for i ≥ 0.
• We know that j ≠ k, so |y| > 0; and k ≤ p + 1, so |xy| = k − 1 ≤ p.
 Thus all conditions of the pumping lemma have been
satisfied.
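The proof's construction of x, y, and z is itself an algorithm. A minimal sketch, assuming a hypothetical 3-state DFA that accepts strings over {0, 1} whose number of 1s is divisible by 3 (so p = 3); `pump_split` and `accepts` are illustrative names, and indices are 0-based, so `states[0]` plays the role of r1.

```python
def pump_split(delta, start, p, w):
    """Split w = xyz as in the proof, using the first repeated state among
    the first p + 1 states of the walk of w."""
    if len(w) < p:
        raise ValueError("the lemma only applies when |w| >= p")
    states = [start]
    for c in w:
        states.append(delta[(states[-1], c)])
    seen = {}
    for idx, q in enumerate(states[: p + 1]):
        if q in seen:
            j, k = seen[q], idx
            break
        seen[q] = idx
    else:
        raise ValueError("no repeated state; is p the number of states?")
    return w[:j], w[j:k], w[k:]   # same split as the proof, with 0-based indices

def accepts(delta, start, finals, w):
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in finals

# Hypothetical DFA: strings over {0, 1} whose number of 1s is divisible by 3.
delta = {(q, c): (q + (c == "1")) % 3 for q in range(3) for c in "01"}
start, finals, p = 0, {0}, 3

x, y, z = pump_split(delta, start, p, "1110")
print((x, y, z))                                   # ('', '111', '0')
assert len(y) >= 1 and len(x + y) <= p             # conditions 1 and 2
assert all(accepts(delta, start, finals, x + y * i + z) for i in range(6))  # condition 3
```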

Theorem: Language L = {a^n b^n : n ≥ 0} is not regular
Proof: Use the pumping lemma.
• Assume for contradiction that L is a regular language.
• Since L is infinite, apply the pumping lemma.
• Let p be the pumping length.
• Pick a string w such that w ∈ L and |w| ≥ p, namely w = a^p b^p = xyz.
• From the pumping lemma, |xy| ≤ p and |y| ≥ 1.

• Since the first p symbols of w are all a's, y consists only of a's. Therefore let y = a^k, k ≥ 1.

Proof: (cont.)
• From the pumping lemma: x y^i z ∈ L, i = 0, 1, 2, …
• Thus x y^2 z = xyyz = a^(p+k) b^p ∈ L.
• But L = {a^n b^n : n ≥ 0} and p + k ≠ p, so a^(p+k) b^p ∉ L, a contradiction. Hence L is not regular.
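For small values of p, the case analysis can be confirmed by brute force: every split allowed by conditions 1 and 2 puts y inside the block of a's, and pumping it up leaves the language. This is only a finite sanity check for the tested values of p, not a substitute for the proof; the helpers `in_L` and `every_split_fails` are hypothetical names.

```python
def in_L(s):
    """Membership in L = {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def every_split_fails(p):
    """True if no split w = xyz of a^p b^p with |xy| <= p and |y| >= 1
    keeps x y^2 z in L."""
    w = "a" * p + "b" * p
    for xy_len in range(1, p + 1):          # condition 2: |xy| <= p
        for x_len in range(xy_len):          # condition 1: |y| = xy_len - x_len >= 1
            x, y, z = w[:x_len], w[x_len:xy_len], w[xy_len:]
            assert set(y) == {"a"}           # y lies in the a-block: y = a^k
            if in_L(x + y * 2 + z):          # x y^2 z = a^(p+k) b^p
                return False
    return True

print(all(every_split_fails(p) for p in range(1, 30)))  # True
```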
