
3. Regular Expressions and Languages
CIS 5513 - Automata and Formal Languages – Pei Wang

1
Regular languages
L is a regular language if and only if it is
accepted by a DFA or NFA (or ε-NFA)
Regular languages can be specified without
automata, but with regular expressions
Regular expressions are notations that specify
a language declaratively, focusing on the
symbolic patterns in the sentences
Expression E denotes language L(E)
2
Operators of regular languages
A language is a set of strings, therefore set operations apply:
• Union: The union of two languages L and M,
L ∪ M, is the set union of the two
• Dot: The concatenation of two languages L and M,
L•M or LM, is the set of strings xy with x in L and y in M
(similar to the Cartesian product L×M)
• Star: The closure of a language, L*, is defined as
L^0 ∪ L^1 ∪ L^2 ∪ L^3 ∪ …, where L^0 = {ε}, L^1 = L, and L^k = L^(k-1)L
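The three operators can be tried out directly on finite sets of strings. The following is a minimal sketch, not part of the original slides: the function names are illustrative, and because L* is usually infinite, `star` only enumerates the strings of L* up to an assumed length bound `max_len`.

```python
def union(L, M):
    """Set union of two languages."""
    return L | M

def concat(L, M):
    """All strings xy with x in L and y in M."""
    return {x + y for x in L for y in M}

def power(L, k):
    """L^k: L concatenated with itself k times; L^0 = {ε}."""
    result = {""}
    for _ in range(k):
        result = concat(result, L)
    return result

def star(L, max_len):
    """The strings of L* whose length is at most max_len."""
    result, frontier = {""}, {""}
    while frontier:
        # Extend the newest strings by one more factor from L.
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result

L, M = {"a", "bc"}, {"d"}
print(union(L, M), concat(L, M), power(L, 2), star({"ab"}, 4))
```

Note that `concat` behaves like a Cartesian product followed by joining each pair, which is why different pairs can collapse into the same string.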

3
Operators of regular expressions
Regular expressions are formed recursively
• Constant or symbol:
L(Ø) = Ø, L(ε) = {ε}, L(a) = {a}
• Union, concatenation, and star:
L(E+F) = L(E) ∪ L(F)
L(EF) = L(E)L(F)
L(E*) = (L(E))*
• Parenthesized expression: L((E)) = L(E)
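The recursive definition translates almost line by line into a recursive evaluator. The sketch below is an illustration under stated assumptions: expressions are encoded as nested tuples (a hypothetical encoding, not from the slides), and `lang(e, n)` returns only the strings of L(e) of length at most `n`, since L(E*) is generally infinite.

```python
def lang(e, n):
    """Return the strings of L(e) whose length is at most n."""
    kind = e[0]
    if kind == "empty":            # L(Ø) = Ø
        return set()
    if kind == "eps":              # L(ε) = {ε}
        return {""}
    if kind == "sym":              # L(a) = {a}
        return {e[1]} if n >= 1 else set()
    if kind == "union":            # L(E+F) = L(E) ∪ L(F)
        return lang(e[1], n) | lang(e[2], n)
    if kind == "cat":              # L(EF) = L(E)L(F)
        left, right = lang(e[1], n), lang(e[2], n)
        return {x + y for x in left for y in right if len(x + y) <= n}
    if kind == "star":             # L(E*) = (L(E))*
        base = lang(e[1], n)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind}")

# (0+1)*1 : binary strings ending in 1
E = ("cat", ("star", ("union", ("sym", "0"), ("sym", "1"))), ("sym", "1"))
print(sorted(lang(E, 2)))
```

Parentheses need no case of their own here, because the tuple nesting already records the grouping.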

4
Order of precedence
Order of precedence of regular-expression
operators, from high to low:
star > dot > union

Example: Regular expression for the strings that
consist of alternating 0’s and 1’s
• (01)*+(10)*+0(10)*+1(01)*
• (ε+1)(01)*(ε+0)
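One way to gain confidence that the two expressions describe the same language is to compare them on all short strings. The sketch below uses Python's `re` module, so the union operator `+` is written `|` and `(ε+1)` is written `1?`; the helper names and the length bound are assumptions for illustration, and agreement up to a bound is evidence, not a proof.

```python
import re
from itertools import product

E1 = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")
E2 = re.compile(r"1?(01)*0?")

def alternating(w):
    """True if no two adjacent symbols of w are equal."""
    return all(a != b for a, b in zip(w, w[1:]))

def binary_strings(max_len):
    """All strings over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            yield "".join(t)

def agree(max_len):
    """Do both expressions and the direct definition agree on all short strings?"""
    return all(
        bool(E1.fullmatch(w)) == bool(E2.fullmatch(w)) == alternating(w)
        for w in binary_strings(max_len)
    )

print(agree(8))
```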

5
Exercises
Write regular expressions for the following languages:
• 3.1.1(a) The set of strings over alphabet {a, b, c}
containing at least one a and at least one b
• 3.1.2(a) Binary strings where every pair of adjacent
0’s appears before any pair of adjacent 1’s
Solution:
http://infolab.stanford.edu/~ullman/ialcsols/sol3.html#sol31

6
DFA to R.E. by inclusion (1)
A regular expression can be built from a DFA for the
same language
Proof: by mathematical induction on the number of states
allowed as intermediate nodes on a path in the DFA D
(1) Name the states of D from 1 to n, with the start
state as state 1
(2) Use R_ij^(k) for the regular expression where L(R_ij^(k))
= {w | w is the label of a path in D from state i to
state j with no intermediate node numbered greater than k}
7
DFA to R.E. by inclusion (2)
(3) The basis is k = 0, where R_ij^(0) is the union of
the symbols on the direct edges from i to j
(plus ε when i = j, for the path of length 0)
(4) Given R_ij^(k-1) for all pairs of states,
R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
(5) When k = n, the regular expression for L(D)
is the union of all R_1j^(n) where j is a final state
The above procedure works for an NFA or ε-NFA,
too
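The recurrence in steps (3) to (5) can be sketched as a short dynamic program. This is an illustrative sketch, not the slides' own code: the function names are hypothetical, expressions are built as Python `re` patterns (so union is `|` instead of `+`), `None` stands for Ø and the empty string for ε. The basis includes ε when i = j, as in the textbook construction. The two-state DFA used at the end (strings containing at least one 0) is an assumed example.

```python
import re
from itertools import product

def union(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"

def concat(*parts):
    # Concatenating with Ø gives Ø.
    return None if any(p is None for p in parts) else "".join(parts)

def star(a):
    # Ø* = ε and ε* = ε.
    return "" if a in (None, "") else f"(?:{a})*"

def dfa_to_regex(n, delta, finals):
    """States are 1..n with start state 1; delta maps (state, symbol) -> state."""
    # Basis k = 0: direct edges only, plus ε when i == j.
    R = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            r = "" if i == j else None
            for (p, a), q in delta.items():
                if p == i and q == j:
                    r = union(r, re.escape(a))
            R[i, j] = r
    # Induction: additionally allow state k as an intermediate node.
    for k in range(1, n + 1):
        R = {(i, j): union(R[i, j], concat(R[i, k], star(R[k, k]), R[k, j]))
             for i in range(1, n + 1) for j in range(1, n + 1)}
    # Union of R_1j^(n) over all final states j (None means Ø).
    result = None
    for j in finals:
        result = union(result, R[1, j])
    return result

def binary_strings(max_len):
    for m in range(max_len + 1):
        for t in product("01", repeat=m):
            yield "".join(t)

# Assumed example DFA: state 2 is reached once a 0 has been read.
delta = {(1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 2}
pattern = dfa_to_regex(2, delta, {2})
print(pattern)
```

The generated expression is correct but far from minimal; the construction can square the expression size at every step, which is the usual motivation for the state-elimination method.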
8
DFA to R.E. by inclusion: example

9
DFA to R.E. by elimination (1)
Eliminating the state s

10
DFA to R.E. by elimination (2)
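Overall process:
(1) Label the edges with regular expressions
(2) For each final state q, eliminate all states except
the start state and q; the remaining automaton gives
the r.e. (R+SU*T)*SU*, or R* when q is the start state
(R: loop on the start state, S: start to q, U: loop on q, T: q to start)
(3) Take the union of the resulting r.e.’s over all final states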
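The elimination process can be sketched in a few functions. This is a hedged illustration rather than the slides' own procedure: names such as `eliminate` and `to_regex` are hypothetical, edge labels are Python `re` patterns (union written `|`), `None` stands for Ø, and the four-state NFA at the end is an assumed example chosen to accept strings whose second or third symbol from the end is 1.

```python
import re
from itertools import product

def union(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"

def concat(*parts):
    return None if any(p is None for p in parts) else "".join(parts)

def star(a):
    return "" if a in (None, "") else f"(?:{a})*"

def eliminate(edges, s):
    """Remove state s, replacing each path q -> s -> p by a direct edge."""
    loop = star(edges.get((s, s)))
    preds = [(q, r) for (q, t), r in edges.items() if t == s and q != s]
    succs = [(p, r) for (t, p), r in edges.items() if t == s and p != s]
    new = {e: r for e, r in edges.items() if s not in e}
    for q, r_in in preds:
        for p, r_out in succs:
            new[q, p] = union(new.get((q, p)), concat(r_in, loop, r_out))
    return new

def to_regex(states, edges, start, finals):
    result = None
    for f in finals:
        g = dict(edges)
        for s in states:
            if s != start and s != f:
                g = eliminate(g, s)
        if f == start:
            r = star(g.get((start, start)))                  # R*
        else:
            R, S = g.get((start, start)), g.get((start, f))
            U, T = g.get((f, f)), g.get((f, start))
            # (R + S U* T)* S U*
            r = concat(star(union(R, concat(S, star(U), T))), S, star(U))
        result = union(result, r)
    return result

def binary_strings(max_len):
    for m in range(max_len + 1):
        for t in product("01", repeat=m):
            yield "".join(t)

# Assumed example NFA: A loops on 0 and 1, then A -1-> B -0,1-> C -0,1-> D.
edges = {("A", "A"): "(?:0|1)", ("A", "B"): "1",
         ("B", "C"): "(?:0|1)", ("C", "D"): "(?:0|1)"}
pattern = to_regex(["A", "B", "C", "D"], edges, "A", ["C", "D"])
print(pattern)
```

Because the edge labels are already expressions, the same code applies to DFAs, NFAs, and automata produced by earlier elimination steps.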


11
DFA to R.E. by elimination: example

Resulting R.E.:
(0+1)*1(0+1)+
(0+1)*1(0+1)(0+1) or
(0+1)*1(0+1)(ε+0+1)
12
Regular expression to ε-NFA

13
Regular expression to ε-NFA (2)
Regular expression:
(0+1)*1(0+1)

(a) 0+1
(b) (0+1)*
(c) (0+1)*1(0+1)

How about a simpler one?
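The step-by-step construction (a) to (c) can be sketched in code. The following is an illustrative sketch, assuming a tuple encoding of expressions and hypothetical function names: `build` creates one ε-NFA fragment per sub-expression, with a single start and a single accepting state, and `accepts` simulates the result using ε-closures.

```python
from itertools import count, product

def build(e, trans, fresh):
    """Add an ε-NFA fragment for e to trans; return its (start, accept) states.
    trans is a list of (state, label, state) triples; label None means ε."""
    kind = e[0]
    if kind == "cat":
        a1, b1 = build(e[1], trans, fresh)
        a2, b2 = build(e[2], trans, fresh)
        trans.append((b1, None, a2))        # glue the two fragments together
        return a1, b2
    s, f = next(fresh), next(fresh)
    if kind == "eps":
        trans.append((s, None, f))
    elif kind == "sym":
        trans.append((s, e[1], f))
    elif kind == "union":
        for sub in e[1:]:
            a, b = build(sub, trans, fresh)
            trans += [(s, None, a), (b, None, f)]
    elif kind == "star":
        a, b = build(e[1], trans, fresh)
        # skip the body, enter it, repeat it, or leave it
        trans += [(s, None, f), (s, None, a), (b, None, a), (b, None, f)]
    elif kind != "empty":                   # "empty" (Ø) adds no transitions
        raise ValueError(f"unknown expression kind: {kind}")
    return s, f

def eclose(states, trans):
    """All states reachable from the given states by ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, a, r in trans:
            if p == q and a is None and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(e, w):
    trans = []
    start, accept = build(e, trans, count())
    current = eclose({start}, trans)
    for c in w:
        current = eclose({r for p, a, r in trans if p in current and a == c}, trans)
    return accept in current

ZERO_OR_ONE = ("union", ("sym", "0"), ("sym", "1"))                 # (a) 0+1
ANY = ("star", ZERO_OR_ONE)                                         # (b) (0+1)*
E = ("cat", ("cat", ANY, ("sym", "1")), ZERO_OR_ONE)                # (c) (0+1)*1(0+1)
print(accepts(E, "0110"), accepts(E, "0101"))
```

The automaton built this way has many ε-transitions; a simpler automaton for the same language exists, but this construction has the advantage of being fully mechanical.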


14
Regular expressions in UNIX
In UNIX, regular expressions are widely used to
represent patterns in text
• “.” for any character
• “[abc]” for a+b+c
• “[a-z]” for any character between a and z
• “[:digit:]” for any digit, as [0-9] (used inside brackets, e.g. [[:digit:]])
• “[:alpha:]” for any letter, as [A-Za-z]
• “[:alnum:]” for any digit or letter, as [A-Za-z0-9]
15
Regular expressions in UNIX (cont.)
• Infix “|” for union
• Suffix “?” for “zero or one of”
• Suffix “+” for “one or more of”
• Suffix “{n}” for “n copies of”

Please note that the above usage is not
identical to how “regular expression” is defined
in our context
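Most of these operators can be tried in Python's `re` module, which follows similar conventions. This is a small sketch with made-up sample strings; note that Python's `re` does not support the POSIX classes such as `[:digit:]`, so the explicit range `[0-9]` is used instead.

```python
import re

def full(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text, expected result)
examples = [
    (r"c.t",      "cat",   True),    # "." matches any character
    (r"[abc]",    "b",     True),    # one of a, b, c
    (r"[a-z]",    "Q",     False),   # Q is not in the range a..z
    (r"[0-9]+",   "2024",  True),    # one or more digits
    (r"cat|dog",  "dog",   True),    # union
    (r"colou?r",  "color", True),    # zero or one "u"
    (r"a+",       "",      False),   # "+" needs at least one copy
    (r"[0-9]{3}", "123",   True),    # exactly three copies
]

for pattern, text, expected in examples:
    print(pattern, repr(text), full(pattern, text), expected)
```

Here `+` is a suffix meaning "one or more", whereas in the notation of this course `+` is the infix union operator; that clash is one of the differences the note above refers to.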

16
Laws for regular expressions
Two regular expressions are equivalent if and
only if they specify the same language
• L + M = M + L
• (L + M) + N = L + (M + N)
• (LM)N = L(MN)
• Ø + L = L + Ø = L
• εL = Lε = L
• ØL = LØ = Ø
17
Laws for regular expressions (cont.)
• L(M + N) = LM + LN
• L + L = L
• (L*)* = L*
• Ø* = ε
• ε* = ε
• L+ = LL* = L*L
• L* = L+ + ε
• L? = ε + L
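The laws can be sanity-checked on concrete languages. The sketch below is only an illustration on one assumed choice of finite languages L, M, N, with starred languages truncated at an assumed length bound `n`; passing these checks does not prove the laws, it only shows one instance of each.

```python
def cat(L, M):
    return {x + y for x in L for y in M}

def star(L, n):
    """Strings of L* of length at most n."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {w for w in cat(frontier, L) if len(w) <= n} - result
        result |= frontier
    return result

def plus(L, n):
    """Strings of L+ = LL* of length at most n."""
    return {w for w in cat(L, star(L, n)) if len(w) <= n}

EMPTY, EPS = set(), {""}
L, M, N = {"a", "ab"}, {"b", ""}, {"c", "ca"}
n = 6

checks = {
    "L + M = M + L":      (L | M) == (M | L),
    "(LM)N = L(MN)":      cat(cat(L, M), N) == cat(L, cat(M, N)),
    "Ø + L = L":          (EMPTY | L) == L,
    "εL = Lε = L":        cat(EPS, L) == cat(L, EPS) == L,
    "ØL = LØ = Ø":        cat(EMPTY, L) == cat(L, EMPTY) == EMPTY,
    "L(M + N) = LM + LN": cat(L, M | N) == (cat(L, M) | cat(L, N)),
    "(L*)* = L*":         star(star(L, n), n) == star(L, n),
    "Ø* = ε":             star(EMPTY, n) == EPS,
    "ε* = ε":             star(EPS, n) == EPS,
    "L+ = LL* = L*L":     plus(L, n) == {w for w in cat(star(L, n), L) if len(w) <= n},
    "L* = L+ + ε":        star(L, n) == (plus(L, n) | EPS),
}

for law, holds in checks.items():
    print(law, holds)
```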
18
Proving a law of regular expression
To prove a law of regular expressions:
1. Convert both sides into concrete regular expressions by
replacing each variable with a distinct symbol
2. Check whether the two concrete regular expressions
denote the same language
Theorem 3.14 proves this procedure is correct
using the substitutability of symbols by strings
To disprove a “law”, a single counterexample is enough
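The two steps can be imitated with a brute-force search. In this sketch the variables R and S are replaced by the symbols a and b, the concrete expressions are written as Python `re` patterns (union is `|`), and the hypothetical helper `counterexample` looks for a short string on which the two sides disagree. A returned string disproves the law; a result of `None` only means no disagreement was found up to the assumed length bound, so a real proof still needs an exact equivalence test on the two languages.

```python
import re
from itertools import product

def counterexample(e1, e2, alphabet, max_len):
    """Shortest string on which the two patterns disagree, or None."""
    p1, p2 = re.compile(e1), re.compile(e2)
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if bool(p1.fullmatch(w)) != bool(p2.fullmatch(w)):
                return w
    return None

# R + S = S + R   becomes   a|b  versus  b|a
print(counterexample(r"a|b", r"b|a", "ab", 4))
# (R + S)* = R* + S*   becomes   (a|b)*  versus  a*|b*
print(counterexample(r"(a|b)*", r"a*|b*", "ab", 4))
```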
19
Proving the laws: examples
Check the following identities:
• Exercise 3.4.1(a): R + S = S + R
• Exercise 3.4.1(f): (R*)* = R*
• Exercise 3.4.2(a): (R + S)* = R* + S*
• Exercise 3.4.2(c): (RS + R)*RS = (RR*S)*
Solution:
http://infolab.stanford.edu/~ullman/ialcsols/sol3.html#sol34

20
