Professional Documents
Culture Documents
参考资料05:Deterministic Top Down Grammars-LL - k - -1970
参考资料05:Deterministic Top Down Grammars-LL - k - -1970
-z6s-
A = w (p)
2T*/k to represent the set of all subsets
if and only if w can be derived from A of T*/k. We use the term structurally
after first applying production p. equivalent as in [P&U] to mean that two
grammars generate the same strings and
Definition i: A grarmnar G = (T,N,P,S) is the same trees (with the intermediate
said to be an LL(k) grammar for some posi- nodes unlabeled) for these strings.
tive integer k if and only if given
Construction i: Given a g r a m a r G = (T,
i) a word w in T* such that lwl
k; N,P,S) we begin the construction of a
grammar G' = (T,N',P',S') by letting
2) a non-terminal A in N; T" = T x 2T*/k
3) a word w I in T*;
N" = N x 2 T*/k
there is at most one production p in P
such that for some w 2 and w 3 in T*, s" = (s,[~})
-166-
Given a derivation from S in G, a elements, the corollary indicates that the
corresponding derivation from S" in G" set N' may be much smaller. If, for
(where G" is defined in the construction) example, G contained no ( productions, N'
is obtained if, instead of applying could not have more than INI " IN u T U
production p to an instance of A, one [¢}I k elements since the first k elements
applies (p,R) to the corresponding (A,R). of string ? would determine R. Thus, a
Since all nonterminals used must, by much more practical approach to deriving
their very use, be in N' and all produc- G' is to generate it directly from G
tions used must be in P' (after replacing without constructing all of G".
each (t,R) for t in T by t) we obtain
a corresponding derivation in G'. We are now in a position to state the
LL(k) test.
Corollary i: L(p,R~((A,R)). = Lp(A) for
Test: Given a grammar G = (T,N,P,S) and
(A,R) in N'. given an integer k, construct the
grammar G' of Construction I. Then for
Proof: As with S' and S, derivations each w in T*/k and (A,R) in N', test to
from (A,R) and A can be made to correspond. see if there is at most one p in P such
that
Lemma 2: Given G and G' as defined in w is in (Lp (A)R)/k.
Construction I, then for all (A,R) in N'
and %0 and ql in (N' U T)*, This last expression is equal to ((Lp(A)
S' =L ~(A,R)~ implies R = L(q~)/k. /k)R)/k which is certainly computable.
If all w and (A,R) pass the test, then
Proo__~: We will prove the result by the grammar is LL(k) ; otherwise it is
induction on the length of a leftmost not.
derivation. It certainly holds for the
zero length derivation since the initial To prove that this test works, we need
string is (S,{~}) and [¢} = L(()/k. Now a couple of lemmas relating the LL(k)
suppose that it is true for some property with left derivations.
S = w(A,R)? 1 for w in T* and let (p,R)
be La production which applies to (A,R) Lenmm 3: If G = (T,N,P,S) is an LL(k)
from which we obtain grammar, then for all w I in T*, A in N,
S =n w(A'R)~Yi "*L W(An'Rn) "'" (Ai'RI)~Yi" and w in T*/k, there exists a ~ in (N U T)*
such that for all w 2 and w 3 in T*, the
The lenmm certainly holds for occurrences
of nonterminals in ~YI since the string three relations
-167-
S =L WlAT and ~ = w 3 (which are conditions satisfy conditions 4, 5, and 6 of the
! definition. Therefore, we conclude that
4 and 5). Now consider another pair w 2
= ?' and then 4 and 5 hold for all
and w~ satisfying i, 2, and 3. There must choices of w 2 and w 3.
also Be a ?' in (N U T)* such that
'
S =L WlAq~' and q~' = w 3. If ~ ~ ~' then
The last statement of the lemma
the two left derivation sequences must follows as a special case of the above by
differ. Let ~IB~ for Wl in T*, B in N, taking w 2' = w 2 and w 3' = w 3.
and ~ in (N U T)* be the last word for
which the two strings are the same. Word Le~mna 4: A grammar (T,N,P,S) is LL(k) if
Wl must of course be a prefix of w I since and only if for each w I and w in T*, A in
WlA is the beginning of the words being N and T in (N U T)* such that
derived. Symbolically, we now have
S =L WlAW'
6) S =L Wl B£° there exist at most one production p in P
and such that
7) w I = ~i v for some v in T*. w is in (Lp(A) L(~))/k.
Since ~iB~ is the departure point in the Proof: Suppose there are two such
two derivations, there are two productions productions. Then each production has a
B - ~ and B - ~' such that w 2 and w 3 in T* such that A = w2(P) and
8) ~ = vw2w 3 = w 3 (hence S = WlAW 3) which is a
and violation of the L L ~ ) definition.
9) ~*'~ = vw2w
' 3' Conversely, assume the LL(k) defini-
!
tion is violated by some Wl, w2, w3, w2,
We will now show a violation of the LL(k)
definition between the two productions wl, w in T* and A in N and distinct p and
for B. Because of 8, there are ~2 and w3
p' in P. l~tting w~B T be the first point
in T* such that where the left derigatlons differ,
WlW = w{u for some u. Thus, there are
i0) B-~=~ 2
q and q' such that u/k is in
ii) ~ = w3'
and w2w 3 = vw2w 3 . (Lq(B)L(?))/k n (nq, (B)L(?))/k.
-168-
Theorem i: Given a context free grammar 5. (w2w3)/k = w.
G = (T,N,P,S) and given an integer k, one
can decide if the grammar is LL(k). The only difference between this
definition and that of an LL(k) grammar
Proof: We show that the test given earlier is the quantifier "for all Wl" has been
in this section works. moved within the scope of the "there exist
at most one p". Thus, strong LL(k)
By Lenmna 2, the nonterminals (A,R) of grammars are a special case of LL(k).
N' represent all the possible A in N and R Intuitively, they are grammars where one
in T*/k such that R = L(?)/k and can parse correctly knowing only that one
S =L WlA7 for some w I in T* and ~ in is looking for a given nonterminal and
(N U T)*. The test is therefore a test of knowing the next k input. The power of
whether the condition of Lemma 4 holds these grammars is expressed by the follow-
and is therefore an LL(k) test. ing:
-169-
nullable if L(A) ~ {c} and call A non- The starting symbol S I is taken to be S
nullable otherwise. In particular,
if S is non-nullable and to be S' if S is
this means that terminals are non-nullable.
nullable.
The first step is to rewrite G so that the
first symbol on the righthand side of a
Each production in Pi corresponds to
non-E-rule is non-nullable. This will be
done in such a way as to preserve the a derivation in G. For example, A - A 2
LL(k) property. The new grammar w i l l be
obtained from the old by the advance • .. AnB I ... B m corresponds to the produc-
substitution of E-derivations into the tion A - A I ... AnB I ... B m followed by
various strings of leading nullable
the derivation of A I = E. Thus, a
symbols that occur on the righthand side
of productions in P. This preserves the derivation in G I certainly corresponds to
LL(k) property because the look-ahead of
a derivation in G. Conversely, given a
k determines precisely w h i c h initial E-
derivation tree in G, a derivation for G I
derivations should be applied• Readers
is obtained by successively deleting all
who are not interested in the details of
leftmost branches w h i c h result in E and
this step may skip ahead to the descrip-
replacing leftmost occurrences of other
tion of the second step.
nullable nonterminals by their non-nullable
counterparts.
For each nullable symbol A of G, w e
will add a new nonterminal symbol A' to
To verify that G I is LL(k), suppose
the nonterminal set; A' w i l l have the
property that L(A') = L(A) - {E]. Let- that w e are given w I in T*, A ° in Ni, and
ting G I = (T, Ni,Pi,S I) be the grammar, w in T*/k satisfying conditions I, 2, and
w e are trying to construct, our new non- 3 of Definition i. The corresponding
terminal set is described symbolically symbols in G determine a production
as A ~ A I ... AnB I ... B m as described above
-170-
V + and we let ~(y) represent this word. A = w where w is a non-null element of
For example, letting A represent symbols n o o
of V¢ and B represent symbols of Vi, L(Ai)). We let T be the terminal set of
G^ and let N^ = V' - T be the nonterminal
L z
set. The s t a r t i n g symbol w i l l be S 1
~(BiB2A3A4B5A6A7) = [BI ] [B2A3A4] [BsA6 ] [B7]
(sometimes written [Sl]). Finally, let
where the square brackets limit the words
of V. Thus, the sequence of nullable non- the p r o d u c t i o n s e t P2 f o r G2 be the s e t
terminals that can be generated in a left- of productions determined by the following
most derivation by G I are combined with three rules :
the preceding non-nullable symbol. Ele-
ments of V that are strings of length one Rule i: If B in N I and y in V* are such
E
are considered to be elements of V I i.e. that [By] is in V' and if B - YI is a
[A] = A for A in V. production of Pi' then P2 has the produc-
-171-
determine which of the leading A i must a nonterminal cannot be distinguished by
the next k input symbols, then there would
be eliminated with c-derivations and (if
be two corresponding productions of the
all A. are not so eliminated) which non
l original grammar which could not be dis-
c-rule to apply to the next A.. Thus, tinguished by the next k input symbols.
l
the grammar is LL(k+i) and the theorem is Corollary 4: Given an LL(k) grammar G
proven. with c-rules, a strong LL(k+I) grammar in
Greibach normal form can be obtained for
A nonterminal symbol, A, is said to L(G)- [¢].
be left recursive if and only if A = Aw
for some w in T* and L(A) ~ @. Proof: From Theorem 3, an LL(k+i) grammar
without c-rules can be obtained for L(G)
Lemma 5: An LL(k) grammar G can have no - [¢], and from Theorem 4, an LL(k+i) gram-
left recursive nonterminals. mar in Grelbach normal form can then be
obtained. Construction i preserves this
Proof: Assume that an L L ~ ) grammar has form and the result is strong LL(k) by
a left recursive symbol. Then for some Theorem 2.
nonterminal A, A = Ay (p) and A ~ x (p) Theorem 5: Given an LL(k+i) grammar with-
where x and y are in T*, and p and p' out c-rules for k ~ i, there exists an
are different productions. Because G is LL(k) granmmr with c-rules for the same
unambiguous, y ~ ¢. Furthermore, S = uAv language.
for some u and v. Now consider the
derivations Proof : From Theorem 4, the grammar can be
rewritten so that it is in Greibach normal
S = uAv = uAykv = uxykv form, and is still LL(k+i). This granmuar
will now be rewritten so that it is LL(k)
and S = uAv = uAykv = uAyk+lv = uxyk+lv with c-rules. If there is more than one
production for nonterminal A with terminal
Thus S = uAykv, A = xy (p), A = x (p'), symbol a as the first symbol on the right
hand side of the production, then a new
and (xyk+Iv)/k = (xykv)/k. nonterminal, (A,a) will be introduced.
Let the set of productions for A with a as
Therefore, since the granmuar is LL(k), the first symbol on the righthand side be
p=p', and there cannot be a left recursive
nonterminal. A-aw 1
A - aw 2
A granmmr is said to be in Greibach
normal form [GR] if the righthand side of
every production begins with a terminal
A-.aw
symbol. n
Then in the new grammar these productions
Theorem 4: Given an LL(k) grammar without
will be replaced by:
c-rules, another LL(k) grammar in Greibach
normal form can be obtained for the same A ~ a(A,a)
language.
(A,a) - w I
Proof: For nonterminals A and B let > be (A,a) - w 2
the transitive relation defined by A > B
if A = B~ for some ~0. From Lemma 5 it
I
-172-
k+l input symbols distinguisn between the control unit. The stack manipulation con-
original productions for ~, th8 next k sym- sists of two cases.
bols after the a distinguish between the
productions for (A,a) in the new grarmnar. Case i: If the top tape symbol is a non-
Thus the new granmmr is LL(k). terminal A, then there is at most one
The class of LL(1) granmnars in Grei- production p such that w is in (Lp(A)R~L.
bach normal form are the simple grammars of (A) ~k)/k. If there is no such production,
Korenjak and Hopcroft [K&H].
the machine stops and rejects the sequence.
Canonical Pushdown Machines If it has such a production, it is of the
form A - aT where ~ is in (N U T)*, and
We will assume throughout this
the machine replaces A by T.
section that G = (T,N,P,S) is a strong
LL(k) grammar in Greibach normal form.
Case 2: If the top stack symbol is a
We know from Corollary 4 that any LL(k')
terminal, then it is popped off if it
grammar can be put into this form for
matches the first symbol of w; otherwise
some k satisfying k ~ k' + i. We will
the machine stops and the sequence is
describe a (deterministic) pushdown
rejected.
machine which recognizes L(G).
When the machine reaches a configura-
The input set for the machine is
tion with no tape symbols, the machine
T U [ ~] where ~ is an end of tape marker
stops and accepts the sequence if and only
not in T. The machine is designed to
accept sequences from the language follow- if the control word (i.e. inputs r+l to
r+k-l) is ~_k-l.
ed by k-I end markers.
The machine operates in close corres-
It is important for later proofs to
observe that the machine has no c-moves pondence to a leftmost derivation in the
and stops after reading in k-i end markers. granmmr. This relationship is described
The pushdown control could, of course, be by the following:
redesigned so as not to read beyond the
first end marker, and the machine would Lemma 6: Let M be the canonical pushdown
terminate its recognition with k-2 c- machine for a strong LL(k) grammar G = (T,
moves. When k=l, there is no need for N,P,S) in Greibach normal form. If M
reaches a configuration with tape ? in
the end marker.
(N U T)* and look-ahead word w of length
The finite state control has enough k-i after input sequence WlW has been
memory to store an input string of length processed, then
k-i and to perform such obvious tasks as
reading in the first k-i inputs. After i) S = L wit
the (r+k-l)-th input symbol has been
processed, the word stored in the finite 2) For all w 3 in T* such that
state control is the string consisting of
the (r+l)-th input through the (r+k-l)-th S = WlW 3 and (w3 ~k)/k-i = w, w3
input. satisfies the relation T = w 3.
Initially, the machine reads the Proof: The proof is similar in spirit to
first k-i input symbols and stores them the proof of Lemma 3 so we present the
as the word in the finite control. The logic more briefly. It is evident from
pushdown tape is initialized with the the construction that S =L w i t and all that
starting symbol S. The tape alphabet
needs to be shown is that T generates all
will be N U T. the w 3 satisfying the conditions of 2).
If the machine has not stopped after Assume that w 3 satisfies S = WlW 3 and
r+k-i inputs have been processed, the w 3 ~ k / k - I = w but not T @ w 3. There must
machine reads in the (r+k)-th input, uses
the top tape symbol and the word w formed be a w I' in T*, A in N, and T' in (N U T)*
by the (r+l)-th through the (r+k)-th input such that S =L w~AT' is the last configura-
symbols to manipulate the stack as
described below, and stores the (r+k+2)- tion before the left derivations of
th through the (r+k)-th symbols in the S =L w i t and S =L WlW3 diverge. Word
-173-
w I' cannot have the form WlX because the (S,ac) - A
only such configuration in the left (A,bb) -" (
derivation of Wl? is Wlq~ itself and we
(A,cc) - (
are assuming w 3 cannot be derived from ~.
(A,b~-) -~ (
Therefore, there is an input word x of
(A,c~) - (
length k such that WlX is a prefix of both
WlW and W l w 3 ~ k . But since x has k sym- Assume now that [anbn} U [anc n} can be
bols, the choice of production to apply at generated by an LL(k) grarmuar. First
WlAY' consistent with x is unique by the rewrite the grarmnar in Greibach normal
n-k
form. Now note that for each k, S =, a
LL(k) property, contrary to the assump- Tn ' ~n = akbn' and ~n = akcn" FurtHermore
tion that the derivations differ. Thus
the lemma is proved by contradiction. for n I ~ n2, ~n I ~ Wn2, or else the
When one applies Construction I to an
LL(k) grammar to make it strong and then
applies the construction of the machine
grammar could have a derivation of the
just given, one obtains a construction h
It should be pointed out that push- In this section it will be shown that
down machines of the type described for every k ~ I, the class of languages
above can recognize languages that cannot generated by LL(k) grammars is properly
be generated by an LL(k) grammar for any contained within the class generated by
k. For instance, the language [anbn] U LL (k+l) grammars.
[anon}, which it will be shown has no
First, for each k ~ i, consider the
LL(k) granmuar, can be recognized by the
language [an(bkd+b+cc) n I n ~ i}. Here
pushdown machine described as follows.
"+" denotes the "or" operation. This
This machine operates on the basis of a
language can be generated by the following
word of length 2 and its operation is
LL(k) grammar with c-rules.
described by a set of rules of the form
(A,w) - ~, which mean that if A is the S - aSA
top stack symbol and w is in the stored
S -aA
input string, then A is replaced by q(
and another input symbol is read in. A ~ cc
Combinations of stack symbol and w not A -.bB
shown below result in rejection of the
input string. The initial stack symbol B-¢
is S.
B - bk'id
(S,aa) - SA
However, this language cannot be
(S,ab) - A
generated by an LL(k) grammar with (-rules.
-174-
Lemma 7: There exists no LL(k) grammar
bn3/k-i = bk-l. Since furthermore bk'Idb n3
without c-rules for the language [an(bkd+
no+k'l nl+n2 k i n3
b+cc) n I n • i} where k • i. /k-i = b k'l and a b b " db
Proof: Assume that there exists such a is in the language, it follows by Lemma 6
grammar. We may assume that it is a strong that Y2 = bk'idbn3"
LL(k) grammar G = (N,T,P,S) in Greibach
normal form and we let M be the canonical
pushdown machine associated with this
Having obtained the relations
granmmr. There must be an integer n such
o
n +k-i ?I = ak'ibnl' Z = cm, and
that the input sequence a o causes the
machine to have a pushdown tape with at n
least 2k-I symbols. (This is because M qt2 = b k-I db 3 we can conclude
must distinguish all pairs
nI n2 n ° +k-I b nlc mb k_idbn 3
a and a for n I ~ n2). Thus, the S=a
resulting tape has the form yiZY2 where but this string is clearly not in the
~i and ~2 in (N U T)* satisfy l~iI • k-i language since it contains a d not pre-
fixed by bk. Therefore, the language
and I~21 • k-i and Z is an element of
cannot be obtained by an LL(k) grammar
NUT. without c-rules.
n
By Lemma 6, S =L a °yiZy 2. Also by Theorem 7: For every k • I the class of
languages generated by LL(k) grammars
Lemma 6,
w i t h o u t c-rules is properly contained
within the class of languages generated
+k-I k-i by LL(k+I) grammars without c-rules.
?iZq,2 = ak'ibn° . and ylZql2 = a
2 (n o +k +i )
c Proof: For each k • i, the language
-175-
• ) The canonical pushdown machine for a
super strong LL(k) grammar will stop and
For w in T*~* and qt in (N. U T)*, reject if and only if it discovers a tape
let S(?,w) = [x I ? = x and x ~ = wy for ?, look-ahead word w, and input a such
some i m 0 and y in T*}. If S(?,w) is not that S(wa,v) = ~.
empty, let
To prove the theorem, we need only
•w(?) = min {n ] n = Ixl for x in $(?,w)}. give an equivalence test for two super
strong LL(k) granmmrs G I = (TI,Ni,Pi,S I)
Lemma 8: If grammar G is in Greibach
normal form, t is the maximum thickness and G 2 = (T2,N2,P2,S2). We let M I and M 2
of the righthand sides of productions in be the corresponding canonical determinis-
P, w in T*~* is of length m, ? is in tic pushdown machines. We will describe
(N U T)*, and S ( % w ) is non-empty, then a third deterministic pushdown machine,
M 3 with the property that L(Gi) = L(G 2)
T(?) ~ tw(?) ~ ~(?) + m(t-l)
if and only if M 3 accepts the set of all
Proof: Clearly Tw(q~) m ~(?). Since G is strings.
-176-
that 17(v)- ~(~) I ~ 2(k-l)(t-l). Properties of LL(k) Grammars
M 3 will now be described in greater In this section, various closure and
undecidability properties of LL(k) grammars
detail. It has a two track tape as will be given. These results are primarily
described above, but the top 2(k-l)(t-l)+l of a negative nature.
cells of the tape are kept in the finite
state control unit. Thus, if the differ- Theorem 9: If the finite union of dis-
ence in thickness of the two simulated joint LL(k) languages is regular, then all
tapes is less than this amount, M 3 has the languages are regular.
access to the top of both tracks of the
simulating tape. M 3 simulates both M I Proof: Let T be the combined terminal
vocabulary of the languages. It is suffi-
and M 2 as long as neither one has entered cient to prove the results for the regular
the rejecting state and the difference in set T* since if the union is a regular set
thickness between the two tapes being R, one can add the LL(1) language T*-R to
simulated is ~ 2(k-l)(t-l). the given set of LL(k) languages in order
to get a disjoint union which gives T*.
Machine M 3 is designed to accept Letting n be the number of languages in
all input sequences until one of the the union, we let M. for 1 ~ i ~ n repre-
following three things occur: sent the canonical ~ushdown machines for
these languages. We will call a machine
i) the difference in thickness configuration viable if there exists a
between the two tapes become sequence of inputs which causes the machine
greater than 2(k-l)(t-l); to enter an accepting configuration. To
prove the theorem, we will derive an upper
2) an input sequence causes exactly bound on the tape langths of the viable
one of the machines M I or M 2 to configurations that can occur when input
sequences are applied to the starting
stop in a rejecting state;
configurations of the M i. Thus the
languages will be shown to be recognized
3) one of the machines accepts a by a finite set of configurations and
sequence and the other does not. hence be regular.
If Mq rejects because of reason i, Suppose that input sequence w I in T*
we know that the machine with the is applied to each machine M s causing it
shorter tape can accept a short sequence to enter configuration C~ and let V be the
which the other machine cannot and hence subset of these configurations which are
L(G I) ~ L(G2). If M 3 rejects because viable. Since a sequence of k end markers
of the second reason, we know that the must be accepted by one of the C. and
machine which stopped in a rejecting since the canonical M. do not made (-moves
state cannot accept any continuation of and accept only with elmpty stacks, this C.
the input sequence whereas the other will be in V and will have tape length ~i I
machine can. The languages are obviously which is less than or equal to k. Now
different if rejection is for the last suppose that we have found a j element set
reason. Conversely, if there is an input V. which is a proper subset of V and which
sequence in L(G I) which is not in L(G2) , has configurations with tape lengths of
maximum size ~'3" Let ~j+l be the shortest
then the application of that sequence to
M 3 will obviously cause M 3 to reject for sequence in T * ~ k which does not cause a
configuration in Vj to enter an accepting
one of these reasons.
configuration. (Such a sequence exists
The problem has now been reduced because V. is a proper subset of V and the
to deciding if deterministic pushdown 3
languages are disjoint). Some configura-
machine M 3 rejects any sequences (or if
tion C i in V-Vj must accept this shortest
the complement machine accepts any). This
sequence and must therefore have tape of
is a well-known decidable question [CG&G].
length less than or equal to P]+i and
therefore Vj+ I = V.j U [C.}i is a j+l
-177-
element subset of V with tapes of length I n,m m i} are two LL(1) languages, whose
~j+l or less. We have defined by induction intersection is L3, which is not LL(k).
bounds ~i and sets V i for all i such that L 8 = [bnan + c n a n i n m i } is an LL(1)
I ~ i ~ IV I. Thus if the ~. can be bound-
l language whose reversal is L3, and is not
ed independent of Wl, the tape lengths of EL(k).
all reachable viabl~ configurations will
be bounded and each machine will accept a From Theorem 9, the language L 9 =
regular set. [ambn I i ~ n ~ m} is not an eL(k) language
since its union with the LL(1) language L I
We have already observed that ~i
is the finite state language [ambn I m,n
has a bound independent of Wl; namely k. ml }. However, L 9 is the concatenation
Suppose that a bound b. has Been found
3 of a* and [anbn I n m i}, each of which
for all the possible ~. resulting from all is an LL(1) language. Therefore the set
3
choices of w I. The possible bj+ 1 are of LL(k) languages is not closed under
concatenation.
determined from the possible V. which can
3
arise. But the V. which arise have tape The language Li0 = [dambm+l + e a m ¢ m+l
lengths of bj or 3 less and there are only I m m 0} is an LL(1) language, but its
a finite number of such Vj. Thus we may 1~n~ge under the homomorphism that converts
choose b~+
I ~ to be the max~num Zj+i over d and e to a while leaving a,b, and c un-
changed is the non-LL(k) language L 3.
the range of possible V.. Thus bounds b.
j z
are established by induction and the Some undecidability properties will
theorem proved. now be given.
Corollary 5: The complement of a non- Theorem Ii: Given a context free grammar,
regular LL(k) language is never LL(k). it is undecidable whether or not there
exists a k such that the g r a m a r is LL(k).
Qorollary 4: Theorem 9 generalizes to
languages recognized by deterministic Pro0f: Consider a Turing machine M with
pushdown machines which accept only with some initial finite string on its tape.
an empty stack and do not make c-moves. An instantaneous description [Davis] for
the Turing machine is a string that
Theorem I0: The LL(k) languages are not indicates the current tape contents, in-
closed under complementation, union, inter- ternal state, and position of the head on
section, reversal, concatenation, or c-free the tape.
homomorphisms.
Let c be a symbol that cannot appear
Proof: Korenjak and Hopcroft [K&H] give in an instantaneous description and for a
examples of simple languages whose closure string x let x r denote the reversal of x.
under most of these operations produces An LL(1) grammar G I with sentence symbol
non-LL(k) languages. S I can be obtained whose sentences are of
r r
the form Po c Pl c P2 "'" c P2i-i c P2i c
Since the language L 1 = [ a m b n I i <
m~n} is an LL(1) language which is not ... c P2n where Po is the initial instant-
finite state, its complement, Lg, is not aneous description and for each i between
LL(k) by Corollary 5. The language L o = i and n, p^. is the instantaneous descrip-
[anbn + anon I n e i} was shown to be = tion that ~ s u l t s from instantaneous des-
a non-LL(k) language in an earlier section. cription P2i-i by one move of the machine.
However, L^ is the union of two LL(1)
languages, ~ L4 = [anbn I n m i} and Another LL(1) grammar G 2 with sen-
tence symbol S 2 can be written such that
L 5 = anon I n ~ I}. The languages L 6 =
its sentences are of the form qo c q~ c
[an(b + b) n I n ~ i} and L 7 = [anbm U ant m r
q2 "'" c q2i c q2i+l c ... c q2n where
-178-
for each i between 0 and n-l~ q2i+l is the Prqof: The problem of computing the look-
ahead required to determine which production
instantaneous description that results from to apply is very similar to the problem in
instantaneous description q2i by one move [L&S] of computing the "distinction index"
of the machine. of two occurrences of a nonterminal and it
is a fairly straightforward exercise to
Now let G 3 be the grammar with sen- reduce the first problem to the second.
tence symbol S R whose productions include As there is little insight to be gained
all the productions of G I and G 2 plus the in repeating the relevant definitions and
two new ones S 3 - S I and S 3 - S 2. techniques from [L&S], we omit further
detail.
A string of the form
r Theorem 13: It is undecldable whether or
r ° c r I c r 2 c ... c r2n not an arbitrary context free grammar
generates an LL(k) language, even for a
is a prefix of strings in both L(Gi) and fixed k.
L(G 2) if and only if ro, rl, r2, ..., rn
Proof: The proof of the corresponding
is the sequence of instantaneous des- theorem [K&H] for simple grammars is
criptions produced by the Turlng machine valid for LL(k) grammars.
when it starts with Pc" Therefore, if
the machine does not halt, then there However, given an arbitrary context
are arbitrarily long sequences that are free granmmr, it is decidable [P&U] if
prefixes of sentences in L(G I) and L(G2) , there exists an LL(1) grammar without
(-rules that is structurally equivalent
no choice can be made between productions to the original one.
S 3 - S I and S 3 - S 2 by looking ahead a
finite amount, and G 3 is not an LL(k) AcknowlgdKment
grammar for any k.
The authors are grateful to P. M.
On the other hand, if the machine l~wis for valuable discussion and en-
does halt, then there is a bound on the couragement.
length of any prefix of strings in both
L(Gi) and L(G2) , namely the length of the References
series of instantaneous descriptions that
lead to the halting condition. Therefore, [Davis ] - Davis, M., Cgmputabi!ity and
by looking ahead this amount it is always Unsolyability, McGraw-Hill,
possible to choose between productions New York, 1958.
S 3 ~ S I and S 3 - S 2, Any subsequent
E~G] - Ginsburg, S. and Greibach, S.,
choice between productions can be made on "Deterministic Context-Free
the basis of the next input symbol. Languages," Informatlon & Control,
Therefore, if the machine halts, G 3 is an 9,6 (December 1966) 620-648.
LL(k) grammar for some k.
[GR] - Greibach, S., "A New Normal-Form
Since G 3 is L L ~ ) if and only if the Theorem for Context-Free Phrase
Structure Grammars," J. ACM, 12,
machine halts, which is undecidable, it is 1 (Jan. 1965)42-52.
undecidable if there exists a k such that
G 3 is LL(k). [K&H] - Korenjak, A.J., and Hopcroft, J.E.
"Simple Deterministic Languages,"
Although given a general context IEEE Conference Record of Seventh
free grammar, it is decidable if there Annual Symposium on Switchin K and
exists a k such that it is LL(k), given Automata Theory, IEEE Pub. No.
an LR(k) grammar, the problem is un- 16-C-40, Oct. 1966, pp. 36-46.
decidable.
Theorem 12: Given an LR(k) grammar of [KN] - Knuth, D.E., "On the Translation
known k, it is decidable if there exists Of Languages from Left to Right,"
a k' such that the grammar is LL(k'). Information & Control, 8,6
-179-
CDecember 1965) 607-639.
-180-