Chapters 1-7
Introduction to
Theoretical Computer Science
Section G
Instructor: G. Grahne
Lectures: Tuesdays and Thursdays,
11:45 - 13:00, H 521
Office hours: Tuesdays,
14:00 - 15:00, LB 903-11
All slides shown here are on the web.
www.cs.concordia.ca/~teaching/comp335/2003F
Course organization
Textbook: J. E. Hopcroft, R. Motwani, and
J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, Second Edition, Addison-Wesley, New York, 2001.
Sections: There are two parallel sections. The
material covered by each instructor is roughly
the same. There are four common assignments and a common final exam. Each section
will have different midterm tests.
Assignments: There will be four assignments.
Each student is expected to solve the assignments independently, and submit a solution for
every assigned problem.
Motivation
Automata = abstract computing devices
Turing studied Turing Machines (= computers) before there were any real computers
We will also look at simpler devices than
Turing machines (Finite State Automata, Pushdown Automata, . . . ), and specification means,
such as grammars and regular expressions.
NP-hardness = what cannot be efficiently
computed
Finite Automata
Finite Automata are used as a model for
Software for designing digital circuits
Lexical analyzer of a compiler
Searching for keywords in a file or on the
web.
Software for verifying finite state systems,
such as communication protocols.
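[Figures: a finite automaton for an on/off switch with transitions labeled Push; a finite automaton recognizing the keyword "then" with states t, th, the, then]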
Structural Representations
These are alternative ways of specifying a machine
Grammars: A rule like $E \to E + E$ specifies an
arithmetic expression
$\mathit{Lineup} \to \mathit{Person}.\mathit{Lineup}$
says that a lineup is a person in front of a
lineup.
Regular Expressions: Denote structure of data,
e.g.
[A-Z][a-z]*[ ][A-Z][A-Z]
matches Ithaca NY
does not match Palo Alto CA
Question: What expression would match
Palo Alto CA
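One possible answer to the question, sketched with Python's re module. The pattern and the helper name matches_city_state are assumptions made for illustration; other expressions work as well.

```python
import re

# One or more capitalized words, each followed by a single space,
# and then a two-letter upper-case state code.
CITY_STATE = re.compile(r"([A-Z][a-z]*[ ])+[A-Z][A-Z]")

def matches_city_state(text: str) -> bool:
    """Return True if the whole string has the form 'City Name ST'."""
    return CITY_STATE.fullmatch(text) is not None

print(matches_city_state("Ithaca NY"))
print(matches_city_state("Palo Alto CA"))
```

The only change to the original expression is that the "capitalized word plus space" group may repeat.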
9
Central Concepts
Alphabet: Finite, nonempty set of symbols
Example: $\Sigma = \{0, 1\}$, the binary alphabet
Example: $\Sigma = \{a, b, c, \ldots, z\}$, the set of all lower
case letters
Example: The set of all ASCII characters
Strings: Finite sequence of symbols from an
alphabet $\Sigma$, e.g. 0011001
Empty String: The string with zero occurrences of symbols from $\Sigma$
The empty string is denoted $\epsilon$
10
Languages:
If $\Sigma$ is an alphabet, and $L \subseteq \Sigma^*$
then L is a language
Examples of languages:
The set of legal English words
The set of legal C programs
The set of strings consisting of n 0's followed
by n 1's
13
e-commerce
The protocol for each participant:
[Figure: three finite automata, (a) Store, (b) Customer, (c) Bank, with transitions labeled pay, ship, redeem, transfer, cancel]
17
Completed protocols:
[Figure: the automata (a) Store, (b) Customer, (c) Bank completed with loops on the actions each participant ignores]
18
[Figure: product automaton of the store and bank automata, with transitions labeled P, S, C, R, T]
19
$A = (Q, \Sigma, \delta, q_0, F)$
$Q$ is a finite set of states
$\Sigma$ is a finite alphabet (=input symbols)
$\delta$ is a transition function $(q, a) \mapsto p$
$q_0 \in Q$ is the start state
$F \subseteq Q$ is a set of final states
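A minimal sketch of how such a five-tuple can be run on an input string. The function name dfa_accepts and the example automaton (binary strings containing the substring 01) are assumptions chosen for illustration.

```python
def dfa_accepts(delta, start, finals, w):
    """Run a DFA on the string w; accept if the last state is final."""
    state = start
    for symbol in w:
        # delta maps (state, symbol) to exactly one next state
        state = delta[(state, symbol)]
    return state in finals

# Hypothetical example DFA: accepts strings over {0, 1} that contain 01.
DELTA = {
    ("q0", "0"): "q2", ("q0", "1"): "q0",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q1",
}
START, FINALS = "q0", {"q1"}

print(dfa_accepts(DELTA, START, FINALS, "1101"))
print(dfa_accepts(DELTA, START, FINALS, "110"))
```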
20
[Transition table and transition diagram of an example DFA with states q0, q1 (accepting), q2 over the input symbols 0 and 1]
21
Example: The FA
0, 1
Start
q0
q1
q2
Basis: $\hat{\delta}(q, \epsilon) = q$
Induction:
23
[Figure: transition diagram of a DFA with states q0, q1, q2, q3 and transitions on 0 and 1]

           0     1
→ * q0    q2    q1
    q1    q3    q0
    q2    q0    q3
    q3    q1    q2
24
Example
Marble-rolling toy from p. 53 of textbook
x1
x3
x2
25
            A       B
  000r    100r    011r
* 000a    100r    011r
* 001a    101r    000a
  010r    110r    001a
* 010a    110r    001a
  011r    111r    010a
  100r    010r    111r
* 100a    010r    111r
  101r    011r    100a
* 101a    011r    100a
  110r    000a    101a
* 110a    000a    101a
  111r    001a    110a
26
[Figure: the sets of states an NFA with states q0, q1, q2 is in while reading an input over {0, 1}; two branches get stuck]
27
$A = (Q, \Sigma, \delta, q_0, F)$
$Q$ is a finite set of states
$\Sigma$ is a finite alphabet
$\delta$ is a transition function from $Q \times \Sigma$ to the
powerset of $Q$
$q_0 \in Q$ is the start state
$F \subseteq Q$ is a set of final states
28
          0            1
→ q0     {q0, q1}     {q0}
  q1     ∅            {q2}
* q2     ∅            ∅
29
Basis:
Induction:
$\hat{\delta}(q, xa) = \bigcup_{p \in \hat{\delta}(q, x)} \delta(p, a)$
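The inductive definition can be sketched directly as a loop over the input. The names nfa_states_after and nfa_accepts are hypothetical, and the example NFA (strings ending in 01) is an assumption chosen for illustration.

```python
def nfa_states_after(delta, start, w):
    """Return the set of states the NFA can be in after reading w."""
    current = {start}  # no input consumed yet
    for symbol in w:
        # induction step: union of delta(p, a) over all p in the current set
        current = {r for p in current for r in delta.get((p, symbol), set())}
    return current

def nfa_accepts(delta, start, finals, w):
    """Accept if at least one reachable state is final."""
    return bool(nfa_states_after(delta, start, w) & finals)

# Hypothetical example NFA: accepts binary strings that end in 01.
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}

print(nfa_states_after(NFA_DELTA, "q0", "00101"))
print(nfa_accepts(NFA_DELTA, "q0", {"q2"}, "00101"))
```

Missing entries in the dictionary stand for the empty set, which is how a branch of the computation gets stuck.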
30
q0
q1
q2
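0. $w \in \Sigma^* \Rightarrow q_0 \in \hat{\delta}(q_0, w)$
1. $q_1 \in \hat{\delta}(q_0, w) \Leftrightarrow w = x0$
2. $q_2 \in \hat{\delta}(q_0, w) \Leftrightarrow w = x01$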
31
32
$\bigcup_{p \in S} \delta_N(p, a)$
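A sketch of the subset construction that builds only the DFA states reachable from the start set, using the union over $p \in S$ for each transition. The function name subset_construction and the example NFA (strings ending in 01) are assumptions for illustration.

```python
from collections import deque

def subset_construction(delta_n, start, finals_n, alphabet):
    """Build the reachable part of the DFA equivalent to an NFA."""
    start_d = frozenset({start})
    delta_d = {}
    seen = {start_d}
    queue = deque([start_d])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            # DFA transition: union of the NFA moves of every p in S
            T = frozenset(r for p in S for r in delta_n.get((p, a), ()))
            delta_d[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    # a subset is accepting if it contains an accepting NFA state
    finals_d = {S for S in seen if S & finals_n}
    return seen, delta_d, start_d, finals_d

# Hypothetical example NFA: accepts binary strings that end in 01.
NFA = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
states, delta_d, start_d, finals_d = subset_construction(NFA, "q0", {"q2"}, "01")
print(len(states))
```

Of the eight possible subsets only three are reachable here, which is why the lazy construction is usually preferred over tabulating the full powerset.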
34
                    0            1
  ∅                ∅            ∅
→ {q0}             {q0, q1}     {q0}
  {q1}             ∅            {q2}
* {q2}             ∅            ∅
  {q0, q1}         {q0, q1}     {q0, q2}
* {q0, q2}         {q0, q1}     {q0}
* {q1, q2}         ∅            {q2}
* {q0, q1, q2}     {q0, q1}     {q0, q2}
35
        0    1
  A     A    A
  B     E    B
  C     A    D
* D     A    A
  E     E    F
* F     E    B
* G     A    D
* H     E    F
36
0
0
{q0, q1}
{q0, q2}
0
1
37
38
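Induction:
$\hat{\delta}_D(\{q_0\}, xa) \overset{\text{def}}{=} \delta_D(\hat{\delta}_D(\{q_0\}, x), a)$
$\overset{\text{i.h.}}{=} \delta_D(\hat{\delta}_N(q_0, x), a)$
$\overset{\text{cst}}{=} \bigcup_{p \in \hat{\delta}_N(q_0, x)} \delta_N(p, a)$
$\overset{\text{def}}{=} \hat{\delta}_N(q_0, xa)$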
39
40
Exponential Blow-Up
There is an NFA N with n + 1 states that has
no equivalent DFA with fewer than $2^n$ states
0, 1
Start
q0
q1
0, 1
q2
0, 1
0, 1
0, 1
qn
Case 1:
1a2 an
0b2 bn
Then q has to be both an accepting and a
nonaccepting state.
Case 2:
a1 ai11ai+1 an
b1 bi10bi+1 bn
Now N (q0, a1 ai11ai+1 an0i1) =
N (q0, b1 bi10bi+1 bn0i1)
and N (q0, a1 ai11ai+1 an0i1) FD
N (q0, b1 bi10bi+1 bn0i1)
/ FD
42
1. An optional + or - sign
2. A string of digits
3. a decimal point
4. another string of digits
q0 ,+,-
q1
0,1,...,9
q2
0,1,...,9
0,1,...,9
q3
q5
.
q4
43
Example:
$\epsilon$-NFA accepting the set of keywords {ebay, web}
1
e
Start
44
          ε        +,-      .        0, . . . , 9
→ q0     {q1}     {q1}     ∅        ∅
  q1     ∅        ∅        {q2}     {q1, q4}
  q2     ∅        ∅        ∅        {q3}
  q3     {q5}     ∅        ∅        {q3}
  q4     ∅        ∅        {q3}     ∅
* q5     ∅        ∅        ∅        ∅
45
ECLOSE
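We close a state by adding all states reachable
by a sequence $\epsilon\epsilon \cdots \epsilon$
Inductive definition of ECLOSE(q)
Basis:
$q \in \mathrm{ECLOSE}(q)$
Induction:
$p \in \mathrm{ECLOSE}(q)$ and $r \in \delta(p, \epsilon) \Rightarrow$
$r \in \mathrm{ECLOSE}(q)$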
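The inductive definition amounts to a graph search along ε-edges. A minimal sketch follows; the function name eclose and the ε-edges in EPS are assumptions made up for illustration, not a transcription of any figure.

```python
def eclose(q, eps):
    """Return all states reachable from q using only epsilon-transitions."""
    closure = {q}          # basis: q is in ECLOSE(q)
    stack = [q]
    while stack:
        p = stack.pop()
        # induction: if p is in the closure, so is every r in delta(p, epsilon)
        for r in eps.get(p, ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

# Hypothetical epsilon-edges: state -> set of epsilon-successors.
EPS = {1: {2, 4}, 2: {3}, 3: {6}, 5: {7}}

print(sorted(eclose(1, EPS)))
```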
46
Example of $\epsilon$-closure
For instance,
ECLOSE(1) = {1, 2, 3, 4, 6}
47
$\hat{\delta}(q, \epsilon) = \mathrm{ECLOSE}(q)$
Induction:
$\hat{\delta}(q, xa) = \bigcup_{p \in \delta(\hat{\delta}(q, x), a)} \mathrm{ECLOSE}(p)$
48
Given an $\epsilon$-NFA
$E = (Q_E, \Sigma, \delta_E, q_0, F_E)$
we will construct a DFA
$D = (Q_D, \Sigma, \delta_D, q_D, F_D)$
such that
$L(D) = L(E)$
Example: -NFA E
0,1,...,9
Start
q0 ,+,-
q1
0,1,...,9
q2
0,1,...,9
0,1,...,9
q3
q5
.
q4
DFA D corresponding to E
0,1,...,9
{q0, q }
1
+,-
{q }
1
0,1,...,9
0,1,...,9
{q , q }
1
.
Start
{q2}
0,1,...,9
{q3, q5}
0,1,...,9
50
51
Induction:
E (q0, xa) =
ECLOSE(p)
ECLOSE(p)
ECLOSE(p)
pE (E (q0 ,x),a)
pD (D (qD ,x),a)
pD (qD ,xa)
= D (qD , xa)
52
Regular expressions
An FA (NFA or DFA) is a blueprint for constructing a machine recognizing a regular language.
A regular expression is a user-friendly, declarative way of describing a regular language.
Example: $01^* + 10^*$
Regular expressions are used in e.g.
53
Operations on languages
Union:
$L \cup M = \{w : w \in L \text{ or } w \in M\}$
Concatenation:
$L.M = \{w : w = xy, x \in L, y \in M\}$
Powers:
$L^0 = \{\epsilon\}$, $L^1 = L$, $L^{k+1} = L.L^k$
Kleene Closure:
$L^* = \bigcup_{i=0}^{\infty} L^i$
Building regexs
Inductive definition of regexs:
Basis: $\epsilon$ is a regex and $\emptyset$ is a regex.
$L(\epsilon) = \{\epsilon\}$, and $L(\emptyset) = \emptyset$.
If $a \in \Sigma$, then a is a regex.
$L(a) = \{a\}$.
Induction:
If E is a regex, then (E) is a regex.
$L((E)) = L(E)$.
If E and F are regexs, then E + F is a regex.
$L(E + F) = L(E) \cup L(F)$.
If E and F are regexs, then E.F is a regex.
$L(E.F) = L(E).L(F)$.
If E is a regex, then $E^*$ is a regex.
$L(E^*) = (L(E))^*$.
55
or, equivalently,
$(\epsilon + 1)(01)^*(\epsilon + 0)$
NFA
RE
DFA
58
(k)
Rij
jF
(0)
Rij =
{a:(i,a)=j}
Case 2: i = j
M
(0)
Rii =
a
+
{a:(i,a)=i}
59
Induction:
$R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)} \left(R_{kk}^{(k-1)}\right)^* R_{kj}^{(k-1)}$
k
In R (k-1)
ik
R (k-1)
kk
j
In R (k-1)
kj
60
0,1
2
$R_{11}^{(0)} = \epsilon + 1$
$R_{12}^{(0)} = 0$
$R_{21}^{(0)} = \emptyset$
$R_{22}^{(0)} = \epsilon + 0 + 1$
61
62
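$R_{11}^{(0)} = \epsilon + 1$, $R_{12}^{(0)} = 0$, $R_{21}^{(0)} = \emptyset$, $R_{22}^{(0)} = \epsilon + 0 + 1$
$R_{ij}^{(1)} = R_{ij}^{(0)} + R_{i1}^{(0)} \left(R_{11}^{(0)}\right)^* R_{1j}^{(0)}$
$R_{11}^{(1)}$: by direct substitution $\epsilon + 1 + (\epsilon + 1)(\epsilon + 1)^*(\epsilon + 1)$; simplified $1^*$
$R_{12}^{(1)}$: by direct substitution $0 + (\epsilon + 1)(\epsilon + 1)^*0$; simplified $1^*0$
$R_{21}^{(1)}$: by direct substitution $\emptyset + \emptyset(\epsilon + 1)^*(\epsilon + 1)$; simplified $\emptyset$
$R_{22}^{(1)}$: by direct substitution $\epsilon + 0 + 1 + \emptyset(\epsilon + 1)^*0$; simplified $\epsilon + 0 + 1$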
63
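Simplified: $R_{11}^{(1)} = 1^*$, $R_{12}^{(1)} = 1^*0$, $R_{21}^{(1)} = \emptyset$, $R_{22}^{(1)} = \epsilon + 0 + 1$
$R_{ij}^{(2)} = R_{ij}^{(1)} + R_{i2}^{(1)} \left(R_{22}^{(1)}\right)^* R_{2j}^{(1)}$
By direct substitution
$R_{11}^{(2)} = 1^* + 1^*0(\epsilon + 0 + 1)^*\emptyset$
$R_{12}^{(2)} = 1^*0 + 1^*0(\epsilon + 0 + 1)^*(\epsilon + 0 + 1)$
$R_{21}^{(2)} = \emptyset + (\epsilon + 0 + 1)(\epsilon + 0 + 1)^*\emptyset$
$R_{22}^{(2)} = \epsilon + 0 + 1 + (\epsilon + 0 + 1)(\epsilon + 0 + 1)^*(\epsilon + 0 + 1)$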
64
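$R_{11}^{(2)}$: by direct substitution $1^* + 1^*0(\epsilon + 0 + 1)^*\emptyset$; simplified $1^*$
$R_{12}^{(2)}$: by direct substitution $1^*0 + 1^*0(\epsilon + 0 + 1)^*(\epsilon + 0 + 1)$; simplified $1^*0(0 + 1)^*$
$R_{21}^{(2)}$: by direct substitution $\emptyset + (\epsilon + 0 + 1)(\epsilon + 0 + 1)^*\emptyset$; simplified $\emptyset$
$R_{22}^{(2)}$: by direct substitution $\epsilon + 0 + 1 + (\epsilon + 0 + 1)(\epsilon + 0 + 1)^*(\epsilon + 0 + 1)$; simplified $(0 + 1)^*$
$R_{12}^{(2)} = 1^*0(0 + 1)^*$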
65
Observations
(k)
Rij
(k1)
66
p1
Q1
S
P1
Pm
Qk
qk
pm
R km
R k1
67
q1
R 11 + Q 1 S* P1
p1
R 1m + Q 1 S* Pm
R k1 + Q k S* P1
qk
pm
R km + Q k S* Pm
68
R
S
Start
T
Eq
qF
69
1
A
0,1
B
0,1
C
1
A
0+1
B
0+1
C
70
0+1
Start
0+1
0+1
C
Start
0+1
C
Start
A
71
From
0+1
1( 0 + 1)
Start
0+1
C
Start
A
72
(a)
(b)
a
(c)
73
S
(a)
(b)
(c)
74
(a)
(b)
Start
(c)
75
76
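$L \cup \emptyset = \emptyset \cup L = L$.
$\emptyset$ is identity for union.
$\{\epsilon\}L = L\{\epsilon\} = L$.
$\{\epsilon\}$ is left and right identity for concatenation.
$\emptyset L = L\emptyset = \emptyset$.
$\emptyset$ is left and right annihilator for concatenation.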
77
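$L(M \cup N) = LM \cup LN$.
Concatenation is left distributive over union.
$(M \cup N)L = ML \cup NL$.
Concatenation is right distributive over union.
$L \cup L = L$.
Union is idempotent.
$\emptyset^* = \{\epsilon\}$, $\{\epsilon\}^* = \{\epsilon\}$.
$L^+ = LL^* = L^*L$, $L^* = L^+ \cup \{\epsilon\}$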
78
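$w \in \bigcup_{i=0}^{\infty} \left( \bigcup_{j=0}^{\infty} L^j \right)^i$
$\Leftrightarrow \exists k, m \in \mathbb{N} : w \in (L^m)^k$
$\Leftrightarrow \exists p \in \mathbb{N} : w \in L^p$
$\Leftrightarrow w \in \bigcup_{i=0}^{\infty} L^i$
$\Leftrightarrow w \in L^*$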
79
80
In Chapter 4 we will learn how to test automatically if E = F , for any concrete regexs
E and F .
We want to test general identities, such as
E + F = F + E, for any regexs E and F .
Method:
81
E(E, F )
Now we can for instance make the substitution
S = {E/0, F /11} to obtain
S (E(E, F )) = (0 + 11)0
82
83
84
85
Induction:
Case 1: E = F + G.
Then (E) = (F) + (G), and
L((E)) = L((F)) L((G))
Let E and and F be regexs. Then w L(E + F )
if and only if w L(E) or w L(F ), if and only
if a1 L((F)) or a2 L((G)), if and only if
a1 (E), or a2 (E).
Case 2: E = F.G.
Then (E) = (F).(G), and
L((E)) = L((F)).L((G))
Let E and and F be regexs. Then w L(E.F )
if and only if w = w1w2, w1 L(E) and w2 L(F ),
and a1a2 L((F)).L((G)) = (E)
Case 3: E = F.
Prove this case at home.
86
Examples:
To prove $(L + M)^* = (L^*M^*)^*$ it is enough to
determine if $(a_1 + a_2)^*$ is equivalent to $(a_1^*a_2^*)^*$
To verify $L^* = L^*L^*$ test if $a_1^*$ is equivalent to
$a_1^*a_1^*$.
Question: Does L + ML = (L + M)L hold?
87
89
p0
p1
p2
...
pk
90
If $\hat{\delta}(q, 1^i) \notin F$ the machine will foolishly reject $0^i1^i$.
91
Theorem 4.1.
The Pumping Lemma for Regular Languages.
Let L be regular.
Then $\exists n, \forall w \in L : |w| \geq n \Rightarrow w = xyz$ such that
1. $y \neq \epsilon$
2. $|xy| \leq n$
3. $\forall k \geq 0, xy^kz \in L$
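The lemma is typically used in the contrapositive: pick a long string and show that no admissible split survives pumping. The sketch below tries every split with $y \neq \epsilon$ and $|xy| \leq n$; the names in_L and pumpable are hypothetical, and checking only k = 0, ..., 3 is an assumption that suffices to refute a split but not to prove regularity.

```python
def in_L(s):
    """Membership in {0^m 1^m : m >= 0}."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * half + "1" * half

def pumpable(w, n, in_lang, max_k=3):
    """True if some split w = xyz with y nonempty and |xy| <= n keeps
    x y^k z in the language for every k in 0..max_k."""
    for i in range(0, n):                              # i = |x|
        for j in range(i + 1, min(n, len(w)) + 1):     # j = |xy|
            x, y, z = w[:i], w[i:j], w[j:]
            if all(in_lang(x + y * k + z) for k in range(max_k + 1)):
                return True
    return False

n = 4
print(pumpable("0" * n + "1" * n, n, in_L))
```

For $w = 0^n1^n$ every admissible y consists of 0's only, so already k = 0 leaves the language; this holds for any choice of n, which is the argument that the language is not regular.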
92
$\exists i < j : p_i = p_j$
93
1. $x = a_1a_2 \cdots a_i$
2. $y = a_{i+1}a_{i+2} \cdots a_j$
3. $z = a_{j+1}a_{j+2} \cdots a_m$
y=
ai+1 . . . aj
Start
x=
p0
a1 . . . ai
z=
pi
aj+1 . . . am
94
0} 0111
w = 000
|
{z 11}
| {z } | {z
x
95
p
}|
w = 111
1} 1111
| {z } | {z
|
{z 11}
x
y
|y|=m
98
Example:
Let L be recognized by the DFA below
1
Start
{q0}
0
0
{q0, q1}
{q0, q2}
0
1
Then $\overline{L}$ is recognized by
1
Start
{q0}
0
0
{q0, q1}
{q0, q2}
0
1
100
101
AL
Start
AND
Accept
AM
102
Formally
$A_{L \cap M} = (Q_L \times Q_M, \Sigma, \delta_{L \cap M}, (q_L, q_M), F_L \times F_M)$,
where
$\delta_{L \cap M}((p, q), a) = (\delta_L(p, a), \delta_M(q, a))$
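A sketch of the product construction that builds only the reachable pairs. The helper names product_dfa and run, and the two example DFAs ("contains a 0" and "contains a 1"), are assumptions for illustration.

```python
def product_dfa(delta_l, start_l, finals_l, delta_m, start_m, finals_m, alphabet):
    """Build the reachable part of the product DFA for the intersection."""
    start = (start_l, start_m)
    delta, finals = {}, set()
    seen, todo = {start}, [start]
    while todo:
        p, q = todo.pop()
        if p in finals_l and q in finals_m:   # final iff both components are final
            finals.add((p, q))
        for a in alphabet:
            nxt = (delta_l[(p, a)], delta_m[(q, a)])   # move both DFAs in parallel
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return delta, start, finals

def run(delta, start, finals, w):
    state = start
    for a in w:
        state = delta[(state, a)]
    return state in finals

# Hypothetical DFAs: L = strings containing a 0, M = strings containing a 1.
DL = {("p", "0"): "q", ("p", "1"): "p", ("q", "0"): "q", ("q", "1"): "q"}
DM = {("r", "0"): "r", ("r", "1"): "s", ("s", "0"): "s", ("s", "1"): "s"}
delta, start, finals = product_dfa(DL, "p", {"q"}, DM, "r", {"s"}, "01")
print(run(delta, start, finals, "01"), run(delta, start, finals, "00"))
```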
It will be shown in the tutorial by an induction
on |w| that
103
0,1
q
(a)
0
Start
0,1
s
(b)
1
Start
pr
ps
0
0,1
qr
qs
0
(c)
104
105
Theorem 4.11. If L is a regular language,
then so is $L^R$.
106
Homomorphisms
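A homomorphism on $\Sigma$ is a function $h : \Sigma^* \to \Theta^*$,
where $\Sigma$ and $\Theta$ are alphabets.
Let $w = a_1a_2 \cdots a_n \in \Sigma^*$. Then
$h(w) = h(a_1)h(a_2) \cdots h(a_n)$
and
$h(L) = \{h(w) : w \in L\}$
Example: Let $h : \{0, 1\}^* \to \{a, b\}^*$ be defined by
$h(0) = ab$, and $h(1) = \epsilon$. Now $h(0011) = abab$.
Example: $h(L(10^*1)) = L((ab)^*)$.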
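Applying a homomorphism is a symbol-by-symbol substitution. A minimal sketch using the h from the example above; the function name apply_hom is hypothetical.

```python
def apply_hom(h, w):
    """Apply the homomorphism h (a dict from symbols to strings) to w."""
    return "".join(h[a] for a in w)

# h(0) = ab, h(1) = empty string
H = {"0": "ab", "1": ""}

print(apply_hom(H, "0011"))
print(apply_hom(H, "1001"))
```

Both inputs map to abab: the 1's vanish and each 0 contributes one ab, which is why strings of the form 1 0...0 1 are mapped into $(ab)^*$.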
108
Inverse Homomorphism
Let $h : \Sigma^* \to \Theta^*$ be a homom. Let $L \subseteq \Theta^*$,
and define
$h^{-1}(L) = \{w \in \Sigma^* : h(w) \in L\}$
h(L)
(a)
h-1 (L)
(b)
110
/ L((00 + 1)).
3. w = xaay. Then h(w) = z0101v and
$h(w) \notin L((00 + 1)^*)$.
4. w = xbby. Then h(w) = z1010v and
$h(w) \notin L((00 + 1)^*)$.
111
h
Input
h(a) to A
Start
Accept/reject
A
112
Decision Properties
We consider the following:
2. Is $L = \emptyset$?
3. Is $w \in L$?
113
114
Testing emptiness
$L(A) \neq \emptyset$ for FA A if and only if a final state
is reachable from the start state in A. Total
$O(n^2)$ steps.
Alternatively, we can inspect a regex E and tell
if $L(E) = \emptyset$. We use the following method:
$E = F + G$. Now L(E) is empty if and only if
both L(F) and L(G) are empty.
$E = F.G$. Now L(E) is empty if and only if
either L(F) or L(G) is empty.
$E = F^*$. Now L(E) is never empty, since
$\epsilon \in L(E)$.
$E = \epsilon$. Now L(E) is not empty.
$E = a$. Now L(E) is not empty.
$E = \emptyset$. Now L(E) is empty.
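The case analysis translates into a short recursive function. The tuple encoding of regexes used here ("empty", "eps", "sym", "union", "concat", "star") is an assumption made for this sketch.

```python
def is_empty(e):
    """Decide whether the language of the regex e is empty."""
    tag = e[0]
    if tag == "empty":                 # E = emptyset
        return True
    if tag in ("eps", "sym", "star"):  # epsilon, a single symbol, F*: never empty
        return False
    if tag == "union":                 # E = F + G: empty iff both are empty
        return is_empty(e[1]) and is_empty(e[2])
    if tag == "concat":                # E = F.G: empty iff either is empty
        return is_empty(e[1]) or is_empty(e[2])
    raise ValueError(f"unknown regex node: {tag!r}")

EMPTY, EPS = ("empty",), ("eps",)
a = ("sym", "a")

print(is_empty(("concat", a, EMPTY)))
print(is_empty(("union", a, EMPTY)))
print(is_empty(("star", EMPTY)))
```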
116
Testing membership
To test $w \in L(A)$ for DFA A, simulate A on w.
If |w| = n, this takes O(n) steps.
If A is an NFA and has s states, simulating A
on w takes $O(ns^2)$ steps.
If A is an $\epsilon$-NFA and has s states, simulating
A on w takes $O(ns^3)$ steps.
If L = L(E), for regex E of length s, we first
convert E to an $\epsilon$-NFA with 2s states. Then we
simulate w on this machine, in $O(ns^3)$ steps.
117
118
Example:
1
0
Start
0
0
1
H
0
0
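$\hat{\delta}(C, \epsilon) \in F$, $\hat{\delta}(G, \epsilon) \notin F \Rightarrow C \not\equiv G$
$\hat{\delta}(A, 01) = C \in F$, $\hat{\delta}(G, 01) = E \notin F \Rightarrow A \not\equiv G$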
119
0
Start
A
1
0
0
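$\hat{\delta}(A, \epsilon) = A \notin F$, $\hat{\delta}(E, \epsilon) = E \notin F$
$\delta(A, 1) = F = \delta(E, 1)$
Therefore $\hat{\delta}(A, 1x) = \hat{\delta}(E, 1x) = \hat{\delta}(F, x)$
$\hat{\delta}(A, 00) = G = \hat{\delta}(E, 00)$
$\hat{\delta}(A, 01) = C = \hat{\delta}(E, 01)$
Conclusion: $A \equiv E$.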
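Equivalence of states can be computed mechanically by marking distinguishable pairs until nothing changes. The sketch below does this; the function name and the transition table DELTA are assumptions, with the table chosen to agree with the computations shown on these slides (C is the only accepting state).

```python
from itertools import combinations

def distinguishable_pairs(states, alphabet, delta, finals):
    """Mark pairs of states that some input string tells apart."""
    # basis: a final and a non-final state are distinguishable
    marked = {frozenset(pq) for pq in combinations(states, 2)
              if (pq[0] in finals) != (pq[1] in finals)}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                # induction: successors distinguishable => the pair is too
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked

# Assumed transition table: state -> (successor on 0, successor on 1).
TABLE = {"A": "BF", "B": "GC", "C": "AC", "D": "CG",
         "E": "HF", "F": "CG", "G": "GE", "H": "GC"}
DELTA = {(s, a): TABLE[s][i] for s in TABLE for i, a in enumerate("01")}
STATES = sorted(TABLE)

marked = distinguishable_pairs(STATES, "01", DELTA, {"C"})
equivalent = {frozenset(pq) for pq in combinations(STATES, 2)} - marked
print(sorted("".join(sorted(pair)) for pair in equivalent))
```

The pairs left unmarked are exactly the equivalent ones, and merging them gives the minimized DFA.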
120
C D E
121
122
123
124
Example:
0
Start
0
Start
E
1
x
A
x
B
C D
Minimization of DFAs
126
127
128
0
Start
D
1
0
0
0
0
to obtain
1
0
D,F
1
Start
A,E
0
0
B,H
129
0
A
1
B
0
130
131
132
135
where
V is a finite set of variables.
T is a finite set of terminals.
P is a finite set of productions of the form
$A \to \alpha$, where A is a variable and $\alpha \in (V \cup T)^*$
S is a designated variable called the start symbol.
136
137
1. $E \to I$
2. $E \to E + E$
3. $E \to E * E$
4. $E \to (E)$
5. $I \to a$
6. $I \to b$
7. $I \to Ia$
8. $I \to Ib$
9. $I \to I0$
10. $I \to I1$
138
         String            Lang    Prod    String(s) used
(i)      a                 I       5       -
(ii)     b                 I       6       -
(iii)    b0                I       9       (ii)
(iv)     b00               I       9       (iii)
(v)      a                 E       1       (i)
(vi)     b00               E       1       (iv)
(vii)    a + b00           E       2       (v), (vi)
(viii)   (a + b00)         E       4       (vii)
(ix)     a * (a + b00)     E       3       (v), (viii)
139
or, if G is understood
A
and say that A derives .
140
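$E \Rightarrow E * E \Rightarrow I * E \Rightarrow a * E \Rightarrow a * (E) \Rightarrow$
$a * (E + E) \Rightarrow a * (I + E) \Rightarrow a * (a + E) \Rightarrow a * (a + I) \Rightarrow$
$a * (a + I0) \Rightarrow a * (a + I00) \Rightarrow a * (a + b00)$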
rm
rm
rm
rm
rm
rm
rm
142
$L(G) = \{w \in T^* : S \Rightarrow^*_G w\}$
$P \Rightarrow 0P0 \Rightarrow^* 0x0 = w$
Thus $w \in L(G_{pal})$.
The case for w = 1x1 is similar.
144
$w = 0x0 \Leftarrow^* 0P0 \Leftarrow P$
or
$w = 1x1 \Leftarrow^* 1P1 \Leftarrow P$
where the second derivation is done in n steps.
By the IH x is a palindrome, and the inductive
proof is complete.
145
Sentential Forms
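Let G = (V, T, P, S) be a CFG, and $\alpha \in (V \cup T)^*$.
If
$S \Rightarrow^* \alpha$
we say that $\alpha$ is a sentential form.
If $S \Rightarrow^*_{lm} \alpha$ we say that $\alpha$ is a left-sentential form,
and if $S \Rightarrow^*_{rm} \alpha$ we say that $\alpha$ is a right-sentential
form
Note: L(G) is those sentential forms that are
in $T^*$.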
146
lm
lm
rm
rm
147
Parse Trees
If w L(G), for some CFG, then w has a
parse tree, which tells us the (syntactic) structure of w
w could be a program, a SQL-query, an XML-document, etc.
Parse trees are an alternative representation
to derivations and recursive inferences.
There can be several parse trees for the same
string
Ideally there should be only one parse tree
(the true structure) for each string, i.e. the
language should be unambiguous.
Unfortunately, we cannot always remove the
ambiguity.
148
149
152
I
I
0
0
2. $A \Rightarrow^* w$
3. $A \Rightarrow^*_{lm} w$, and $A \Rightarrow^*_{rm} w$
4. There is a parse tree of G with root A and
yield w.
To prove the equivalences, we use the following
plan.
Leftmost
derivation
Derivation
Rightmost
derivation
Parse
tree
Recursive
inference
154
155
Induction: w is inferred in n + 1 steps. Suppose the last step was based on a production
$A \to X_1X_2 \cdots X_k$,
where $X_i \in V \cup T$. We break w up as
$w_1w_2 \cdots w_k$,
where $w_i = X_i$, when $X_i \in T$, and when $X_i \in V$,
then $w_i$ was previously inferred being in $X_i$, in
at most n steps.
By the IH there are parse trees i with root Xi
and yield wi. Then the following is a parse tree
for G with root A and yield w:
A
X1
X2
...
Xk
w1
w2
...
wk
156
Consequently A w P , and A w.
lm
158
X2
...
Xk
w1
w2
...
wk
159
i : A w1w2 wiXi+1Xi+2 Xk .
lm
A X1Xi+2 Xk .
lm
A w1w2 wi1XiXi+1 Xk .
lm
A w1w2 wiXi+1 Xk .
lm
160
A
lm
w1w2 wi1XiXi+1 Xk
lm
w1w2 wi11Xi+1 Xk
lm
w1w2 wi12Xi+1 Xk
lm
w1w2 wi1wiXi+1 Xk
161
I
I
0
0
lm
lm
lm
lm
lm
lm
lm
EE
lm
I E
lm
aE
lm
a (E)
lm
a (E + E)
lm
a (I + E)
lm
a (a + E)
lm
a (a + I)
lm
a (a + I0)
lm
a (a + I00)
lm
a (a + b00)
163
We have
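$E \Rightarrow E * E \Rightarrow E * E + E \Rightarrow I * E + E \Rightarrow I * I + E \Rightarrow$
$I * I + I \Rightarrow a * I + I \Rightarrow a * b + I \Rightarrow a * b + a$
By looking at the expansion of $X_3 = E$ only,
we can extract
$E \Rightarrow I \Rightarrow b$.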
164
the derivation A w.
G
165
A X1X2 Xk w
G
166
E
E
E
E
I
E+E
EE
(E)
E EE E+EE
*
(a)
(b)
167
I
a
(a)
(b)
169
$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$F \to I \mid (E)$
$T \to F \mid T * F$
$E \to T \mid E + T$
172
.
.
.
T
T
174
I
a
(a)
(b)
lm
lm
lm
lm
lm
lm
and
E E E E +E E I +E E a+E E
lm
lm
lm
lm
lm
lm
lm
175
In General:
One parse tree, but many derivations
Many leftmost derivations imply many parse
trees.
Many rightmost derivations imply many parse
trees.
Theorem 5.29: For any CFG G, a terminal
string w has two distinct parse trees if and only
if w has two distinct leftmost derivations from
the start symbol.
176
177
Inherent Ambiguity
A CFL L is inherently ambiguous if all grammars for L are ambiguous.
Example: Consider L =
$\{a^nb^nc^md^m : n \geq 1, m \geq 1\} \cup \{a^nb^mc^md^n : n \geq 1, m \geq 1\}$.
A grammar for L is
$S \to AB \mid C$
$A \to aAb \mid ab$
$B \to cBd \mid cd$
$C \to aCd \mid aDd$
$D \to bDc \mid bc$
178
A
a
A
a
B
c
d
d
b
(a)
c
(b)
179
lm
lm
lm
lm
and
$S \Rightarrow_{lm} C \Rightarrow_{lm} aCd \Rightarrow_{lm} aaDdd \Rightarrow_{lm} aabDcdd \Rightarrow_{lm} aabbccdd$
It can be shown that every grammar for L behaves like the one above. The language L is
inherently ambiguous.
180
Pushdown Automata
A pushdown automaton (PDA) is essentially an
$\epsilon$-NFA with a stack.
On a transition the PDA:
1. Consumes an input symbol.
2. Goes to a new state (or stays in the old).
3. Replaces the top of the stack by any string
(does nothing, pops the stack, or pushes a
string onto the stack)
Input
Finite
state
control
Accept/reject
Stack
181
182
0
1
0
0
1
1
Start
,
,
,
,
,
,
Z 0 /0 Z 0
Z 0 /1 Z 0
0 /0 0
1 /0 1
0 /1 0
1 /1 1
q0
0, 0/
1, 1/
q
, Z 0 / Z 0
, 0 / 0
, 1 / 1
, Z 0 /Z 0
q2
183
PDA formally
A PDA is a seven-tuple:
$P = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$,
where
$Q$ is a finite set of states,
$\Sigma$ is a finite input alphabet,
$\Gamma$ is a finite stack alphabet,
184
,
,
,
,
,
,
Start
Z 0 /0 Z 0
Z 0 /1 Z 0
0 /0 0
1 /0 1
0 /1 0
1 /1 1
q0
0, 0/
1, 1/
q
, Z 0 / Z 0
, 0 / 0
, 1 / 1
, Z 0 /Z 0
q2
q0
q1
?q2
0, Z0
1, Z0
0,0
0,1
1,0
1,1
, Z0
, 0
, 1
q0 , 0Z0
q0 , 1Z0
q0 , 00
q1 ,
q0 , 01
q0 , 10
q0 , 11
q1 ,
q 1 , Z0
q 2 , Z0
q1 , 0
q1 , 1
185
Instantaneous Descriptions
A PDA goes from configuration to configuration when consuming input.
To reason about PDA computation, we use
instantaneous descriptions of the PDA. An ID
is a triple
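$(q, w, \gamma)$
where q is the state, w the remaining input,
and $\gamma$ the stack contents.
Let $P = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$ be a PDA. Then
$\forall w \in \Sigma^*, \beta \in \Gamma^*$:
$(p, \alpha) \in \delta(q, a, X) \Rightarrow (q, aw, X\beta) \vdash (p, w, \alpha\beta)$.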
,
,
,
,
,
,
Z 0 /0 Z 0
Z 0 /1 Z 0
0 /0 0
1 /0 1
0 /1 0
1 /1 1
q0
0, 0/
1, 1/
q
, Z 0 / Z 0
, 0 / 0
, 1 / 1
, Z 0 /Z 0
q2
187
( q0 , 1111, Z 0 )
( q0 , 111, 1Z 0 )
( q , 1111, Z 0 )
1
( q2 , 1111, Z 0 )
( q0 , 11, 11Z 0 )
( q , 111, 1Z 0 )
1
( q , 11, Z 0 )
1
( q0 , 1, 111Z 0 )
( q , 11, 11Z 0 )
1
( q2 , 11, Z 0 )
( q0 , , 1111Z 0 )
( q , 1, 111Z 0 )
1
( q , 1, 1 Z 0 )
1
( q , , 1111Z 0 )
1
( q , , 11 Z 0 )
1
( q , , Z0 )
1
( q2 , , Z 0 )
188
189
Theorem 6.5: w , :
190
(-direction.)
Observe that the only way the PDA can enter
q2 is if it is in state q1 with an empty stack.
194
, X 0 /
Start
p0
, X 0 / Z 0X 0
q0
PN
pf
, X 0 /
, X 0 /
195
(q0, w, Z0) `
(q, , ),
N
for some q. From Theorem 6.5 we get
(q0, w, Z0X0) `
(q, , X0).
N
Since N F we have
(q0, w, Z0X0) `
(q, , X0).
F
We conclude that
(p0, w, X0) `
(q0, w, Z0X0) `
(q, , X0) `
(pf , , ).
F
F
F
(direction.) By inspecting the diagram.
196
Formally,
$P_N = (\{q\}, \{i, e\}, \{Z\}, \delta_N, q, Z)$,
where $\delta_N(q, i, Z) = \{(q, ZZ)\}$,
and $\delta_N(q, e, Z) = \{(q, \epsilon)\}$.
197
, X 0/ZX 0
, X 0 /
198
Start
p0
, X 0 / Z 0X 0
q0
, any/
, any/
PF
p
, any/
199
(q0, w, Z0) `
(q, , ),
F
for some q F, . Since F N , and
Theorem 6.5 says that X0 can be slid under
the stack, we get
(q0, w, Z0X0) `
(q, , X0).
N
The PN can compute:
(p0, w, X0) `
(q0, w, Z0X0) `
(q, , X0) `
(p, , ).
N
N
N
200
A language is
generated by a CFG
if and only if it is
accepted by a PDA by empty stack
if and only if it is
accepted by a PDA by final state
Grammar
PDA by
empty stack
PDA by
final state
| A {z }
tail
At (q, y, ) the PDA behaves as before, unless there are terminals in the prefix of . In
that case, the PDA pops them, provided it can
consume matching input.
If all guesses are right, the PDA ends up with
empty stack and input.
Formally, let G = (V, T, Q, S) be a CFG. Define
$P_G$ as
$(\{q\}, T, V \cup T, \delta, q, S)$,
where
$\delta(q, \epsilon, A) = \{(q, \beta) : A \to \beta \in Q\}$,
for $A \in V$, and
$\delta(q, a, a) = \{(q, \epsilon)\}$,
for $a \in T$.
Example: On blackboard in class.
203
S i ,
lm
then
204
lm
{z
i+1
A .
Induction: Length is n > 1, and the IH holds
for lengths < n.
Since A is a variable, we must have
(q, x, A) ` (q, x, Y1Y2 Yk ) ` ` (q, , )
where A Y1Y2 Yk is in G.
206
B
a
C
x
1
x
2
x
3
207
Yi xi
If Yi is a terminal, we have |xi| = 1, and Yi = xi.
S w, meaning w L(G).
208
Y1
p0
p1
Y2
.
pk- 1
Yk
pk
x1
x2
xk
210
211
to a CFG.
212
214
215
[ri1, Y, ri] wi
216
aw1[r1Y2r2][r2Y3r3] [rk1Yk rk ]
aw1w2[r2Y3r3] [rk1Yk rk ]
aw1w2 wk = w
217
(r1, w2 wk , Y2 Yk ) `
(r2, w3 wk , Y3 Yk ) `
(p, , ).
219
Deterministic PDAs
,
,
,
,
,
,
Z 0 /0 Z 0
Z 0 /1 Z 0
0 /0 0
1 /0 1
0 /1 0
1 /1 1
q0
0, 0/
1, 1/
q
c , Z 0 /Z 0
c, 0/ 0
c, 1/ 1
, Z 0 /Z 0
q2
220
to the end.
224
Properties of CFLs
Simplification of CFGs. This makes life easier, since we can claim that if a language is CF,
then it has a grammar of a special form.
Pumping Lemma for CFLs. Similar to the
regular case. Not covered in this course.
Closure properties. Some, but not all, of the
closure properties of regular languages carry
over to CFLs.
Decision properties. We can test for membership and emptiness, but for instance, equivalence of CFLs is undecidable.
225
$S \Rightarrow^* \alpha X \beta \Rightarrow^* w$
for a terminal string w. Symbols that are not
useful are called useless.
wT
227
Example: Let G be
$S \to AB \mid a$, $A \to b$
S and A are generating, B is not. If we eliminate B we have to eliminate $S \to AB$, leaving
the grammar
$S \to a$, $A \to b$
Now only S is reachable. Eliminating A and b
leaves us with
$S \to a$
with language {a}.
On the other hand, if we eliminate non-reachable symbols
first, we find that all symbols are reachable.
From
$S \to AB \mid a$, $A \to b$
we then eliminate B as non-generating, and
are left with
$S \to a$, $A \to b$
which still contains useless symbols.
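The order matters, as the example shows: remove non-generating symbols first, then non-reachable ones. A sketch under the assumption that productions are (head, body) pairs with single-character symbols; all function names are hypothetical.

```python
def generating(prods, terminals):
    """Symbols that derive some terminal string."""
    gen = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            if head not in gen and all(s in gen for s in body):
                gen.add(head)
                changed = True
    return gen

def reachable(prods, start):
    """Symbols that occur in some sentential form derived from start."""
    reach, todo = {start}, [start]
    while todo:
        A = todo.pop()
        for head, body in prods:
            if head == A:
                for s in body:
                    if s not in reach:
                        reach.add(s)
                        todo.append(s)
    return reach

def remove_useless(prods, terminals, start):
    gen = generating(prods, terminals)
    # step 1: keep only productions built entirely from generating symbols
    prods = [(h, b) for h, b in prods if h in gen and all(s in gen for s in b)]
    # step 2: keep only productions whose head is still reachable
    reach = reachable(prods, start)
    return [(h, b) for h, b in prods if h in reach]

G = [("S", "AB"), ("S", "a"), ("A", "b")]
print(remove_useless(G, {"a", "b"}, "S"))
```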
228
2. Eliminate from G2 all nonreachable symbols and the productions they occur in.
229
and , such that S X in G2. Furthermore, every symbol used in this derivation is
The terminal derivation X xwy in G2 involves only symbols that are reachable from S,
because they are reached by symbols in X.
Thus the terminal derivation is also a derivation of G1, i.e.,
S X xwy
in G1.
230
231
232
Xw
Eliminating -Productions
236
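Example: Let G be
$S \to AB$, $A \to aAA \mid \epsilon$, $B \to bBB \mid \epsilon$
Now n(G) = {A, B, S}. The first rule will become
$S \to AB \mid A \mid B$
the second
$A \to aAA \mid aA \mid aA \mid a$
the third
$B \to bBB \mid bB \mid bB \mid b$
We then delete rules with $\epsilon$-bodies, and end up
with grammar $G_1$:
$S \to AB \mid A \mid B$, $A \to aAA \mid aA \mid a$, $B \to bBB \mid bB \mid b$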
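The construction can be sketched in two steps: compute the nullable variables, then emit every variant of each body with nullable symbols kept or dropped. Productions are assumed to be (head, body) pairs with single-character symbols and the empty string for an ε-body; the function names are hypothetical.

```python
from itertools import product

def nullable(prods):
    """Variables that derive the empty string."""
    null = set()
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            # an empty body, or a body of nullable symbols only
            if head not in null and all(s in null for s in body):
                null.add(head)
                changed = True
    return null

def eliminate_epsilon(prods):
    """Return the productions of the grammar without epsilon-bodies."""
    null = nullable(prods)
    new = set()
    for head, body in prods:
        # each nullable symbol may be kept or dropped
        options = [(s, "") if s in null else (s,) for s in body]
        for choice in product(*options):
            new_body = "".join(choice)
            if new_body:               # never emit an epsilon-body
                new.add((head, new_body))
    return new

G = [("S", "AB"), ("A", "aAA"), ("A", ""), ("B", "bBB"), ("B", "")]
print(sorted(nullable(G)))
print(sorted(eliminate_epsilon(G)))
```

Using a set removes duplicates such as the two copies of aA that arise from dropping either occurrence of A.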
237
A w in G also.
Basis: One step. Then there exists $A \to w$
in $G_1$. From the construction of $G_1$ it follows
that there exists $A \to \alpha$ in G, where $\alpha$ is w plus
some nullable variables interspersed. Then
$A \Rightarrow^* w$
in G.
238
A X1X2 Xk w in G1
and the first derivation is based on a production
A Y1Y2 Ym
where m k, some Yis are Xj s and the other
are nullable symbols of G.
Xi wi in G. Now we get
239
that A w in G1.
Basis: Length is one. Then A w is in G,
and since w 6= the rule is in G1 also.
Induction: Derivation takes n > 1 steps. Then
it looks like
A Y1Y2 Ym w
G
240
A X1X2 Xk w in G1
The claim of the theorem now follows from
statement (]) on slide 238 by choosing A = S.
241
$A \to B$
is a unit production, whenever A and B are
variables.
Unit productions can be eliminated.
Let's look at grammar
$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$F \to I \mid (E)$
$T \to F \mid T * F$
$E \to T \mid E + T$
243
Productions
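$E \to E + T$
$E \to T * F$
$E \to (E)$
$E \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$T \to T * F$
$T \to (E)$
$T \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$F \to (E)$
$F \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$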
Summary
1. Eliminate -productions
in this order.
246
We shall show that every nonempty CFL without $\epsilon$ has a grammar G without useless symbols, and such that every production is of the
form
$A \to BC$, where $\{A, B, C\} \subseteq V$, or
$A \to \alpha$, where $A \in V$, and $\alpha \in T$.
$C_{k-3} \to B_{k-2}C_{k-2}$
$C_{k-2} \to B_{k-1}B_k$
248
A
B1
C1
B2
C2
.
.
C k -2
B k-1
Bk
(a)
A
B1
B2
. . .
Bk
(b)
249
E EP T by E EC1, C1 P T
E T M F, T T M F by
E T C2 , T T C2 , C2 M F
E LER, T LER, F LER by
E LC3, T LC3, F LC3, C3 ER
Consider a mapping
s : 2
s(w)
wL
252
253
S
0
T = a Ta
254
Sa
Sa
x1
x2
Sa
xn
Sa
Sa
x1
x2
Sa
xn
256
258
259
FA
state
Input
AND
Accept/
reject
PDA
state
Stack
260
Formally, define
$P' = (Q_P \times Q_A, \Sigma, \Gamma, \delta, (q_P, q_A), Z_0, F_P \times F_A)$
where
$\delta((q, p), a, X) = \{((r, \hat{\delta}_A(p, a)), \gamma) : (r, \gamma) \in \delta_P(q, a, X)\}$
in P 0
261
Proof:
is regular, L R
is regular, and L R
=
1. R
L \ R.
always was CF, it would follow that
2. If L
L1 L2 = L1 L2
always would be CF.
3. Note that is CF, so if L1 \L2 was always
.
CF, then so would \ L = L
262
Inverse homomorphism
h(a)
PDA
state
Accept/
reject
Stack
263
265
Input size is n.
n is the total size of the input CFG or PDA.
The following work in time O(n)
(slide 203)
266
267
268
Good news:
1. Computing r(G) and g(G) and eliminating
useless symbols takes time O(n). This will
be shown shortly
(slides 229,232,234)
2. Size of u(G) and the resulting grammar
with productions $P_1$ is $O(n^2)$
(slides 244,245)
3. Arranging that bodies consist of only variables is O(n)
(slide 248)
4. Breaking of bodies is O(n)
(slide 248)
269
Bad news:
Eliminating the nullable symbols can make
the new grammar have size $O(2^n)$
(slide 236)
The bad news is avoidable:
Break bodies first before eliminating nullable
symbols
Conversion into CNF is $O(n^2)$
270
yes
Count
A
D B
271
272
$w \in L(G)$?
Inefficient way:
Suppose G is CNF, test string is w, with |w| =
n. Since the parse tree is binary, there are
2n - 1 internal nodes.
Generate all binary parse trees of G with 2n - 1
internal nodes.
Check if any parse tree generates w
273
$A \Rightarrow^*_G a_ia_{i+1} \cdots a_j$
X 15
X 14 X 25
X 13 X 24 X 35
X 12 X 23 X 34 X 45
X 11 X 22 X 33 X 44 X 55
a1
a2
a3
a4
a5
274
A aiai + 1 aj , if
for some k < j, and A BC, we have
275
Example:
G has productions
$S \to AB \mid BC$
$A \to BA \mid a$
$B \to CC \mid b$
$C \to AB \mid a$

{S,A,C}
-          {S,A,C}
-          {B}        {B}
{S,A}      {B}        {S,C}      {S,A}
{B}        {A,C}      {A,C}      {B}        {A,C}
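The table can be filled bottom-up by dynamic programming. The sketch below uses the productions of G; the function name cyk, the split of the grammar into UNIT and BINARY lists, and the input string baaba are assumptions made for illustration.

```python
def cyk(word, unit_prods, binary_prods):
    """Return X, where X[i][j] is the set of variables deriving word[i..j]
    (0-based, inclusive). The grammar must be in Chomsky normal form."""
    n = len(word)
    X = [[set() for _ in range(n)] for _ in range(n)]
    # bottom row: variables A with a production A -> a_i
    for i, a in enumerate(word):
        X[i][i] = {A for A, t in unit_prods if t == a}
    # longer substrings: try every split point k and every A -> BC
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for A, (B, C) in binary_prods:
                    if B in X[i][k] and C in X[k + 1][j]:
                        X[i][j].add(A)
    return X

UNIT = [("A", "a"), ("B", "b"), ("C", "a")]
BINARY = [("S", "AB"), ("S", "BC"), ("A", "BA"), ("B", "CC"), ("C", "AB")]

X = cyk("baaba", UNIT, BINARY)
print(sorted(X[0][4]))   # variables deriving the whole string
print("S" in X[0][4])    # membership test: is the start symbol among them?
```

The three nested loops over length, start position, and split point give the $O(n^3)$ running time for a fixed grammar.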
276
278
279