An Efficient Yet Presentable Version of the CYK Algorithm

Martin Lange
Dept. of Computer Science, University of Munich, Germany

Hans Leiß
Centrum für Informations- und Sprachverarbeitung, University of Munich, Germany

June 3, 2009
Abstract

The most familiar algorithm to decide the membership problem for context-free grammars is the one by Cocke, Younger and Kasami (CYK), using grammars in Chomsky normal form (CNF). We propose to teach a simple modification of the CYK algorithm that uses grammars in a less restrictive binary normal form (2NF) and two precomputations: the set of nullable nonterminals and the inverse of the unit relation between symbols.

The modified algorithm is as simple as the original one, but highlights that the at most binary branching rules alone are responsible for the O(n³) time complexity. Moreover, the simple transformation to 2NF comes with a linear increase in grammar size, whereas some transformations to CNF found in most prominent textbooks on formal languages may lead to an exponential increase.
1 Introduction
The membership and parsing problems for context-free languages (CFL) are of major importance in compiler design, bioinformatics, and computational linguistics. The algorithm due to Cocke [4], Younger [24] and Kasami [9], often called the CYK algorithm, is the most well-known and therefore most commonly taught algorithm that solves the word problem for context-free grammars (CFG) and, with a minor extension, the parsing problem as well. It is also a very prominent example of an algorithm using dynamic programming, a design principle for recursive algorithms that become efficient by storing the values of intermediate calculations. It is therefore fair to say that the CYK algorithm is one of the most important algorithms in undergraduate syllabi for students in subjects like computer science, (bio)informatics, and computational linguistics.
One would expect that textbooks and course notes reflect this importance by good presentations of efficient versions of CYK. However, this is not the case. In particular, many textbooks present the necessary transformation of a context-free grammar into Chomsky normal form (CNF) in a suboptimal way, leading to an avoidable exponential blowup of the grammar. They neither explain nor discuss whether this can be avoided. Complexity analyses often concern the actual CYK procedure only and neglect the pre-transformation. Some of the textbooks give vague reasons for choosing CNF, indicating ease of presentation and proof. A few state that CNF is chosen to achieve a cubic running time, without mentioning that only the restriction to binary rules is responsible for that.
We find that this leaves students with wrong impressions. The complexity of solving the word problem for arbitrary CFLs appears to coincide with the complexity of the CYK procedure (ignoring the effects of the grammar transformation). Also, CNF is unduly emphasised.
We believe that this situation calls for correction. We want to show that efficiency and presentability are not contradictory, so that one can teach a less restrictive version of CYK without compromising efficiency of the algorithm, clarity of its presentation, or simplicity of proofs. We propose to use a grammar normal form which is more liberal and easier to achieve than CNF and comes with only a linear increase of grammar size, leading to a faster membership test for arbitrary CFGs.
It would be unnecessary to argue that we should emphasize efficiency concerns as well as ease of presentation in teaching CYK to students if the goal was just to show that the membership problem for CFLs is decidable. But since CYK is often the only algorithm taught for that problem, students are led to believe that it is also the one which should be used.
Older books on parsing, e.g. Aho and Ullman [1], doubt that CYK “will find practical use” at all, arguing that time and space bounds of O(|w|³) and O(|w|²) are not good enough and that Earley’s algorithm does better for many grammars. At least the first part of this may no longer be true. For example, Tomita’s [21, 19] generalization of LR parsing to arbitrary CFGs uses a complicated “graph-structured” stack to cope with the nondeterminism of the underlying pushdown automaton; but Nederhof and Satta [15] show how to obtain an efficient generalized LR parser by using CYK with a grammar describing a binary version of the pushdown automaton. Furthermore, there are non-standard applications of CYK. For instance, Axelsson et al. [3] use a symbolic encoding of CYK for an ambiguity checking test. The claim of Aho and Ullman [1] that CYK is not of practical importance therefore seems too sceptical.
The paper aims at scholars who do know about CYK or, even better, have
already taught it. We want to convince them that the standard version can
easily be improved and taught so that students learn an eﬃcient algorithm.
Note that we only suggest that a particular variant of the algorithm is the best
one to teach, not that it should be presented in a particular style.
The paper is organised as follows. Sect. 2 discusses how the CYK algorithm is presented in some prominent textbooks on formal languages, parsing and the theory of computation. Sect. 3 recalls definitions and notations used for languages and CFGs. Sect. 4 presents an efficient test for membership in context-free languages, consisting of the grammar transformation into 2NF, two precomputation steps and the actual CYK procedure. Correctness proofs and complexity estimations are given for each of the steps. The section finishes with an example.¹ Sect. 5 contains some concluding remarks about teaching the variant proposed here and the possibility of avoiding grammar transformations altogether.
2 The CYK Algorithm in Textbooks
What is understood as the CYK algorithm? For textbooks on formal languages, the input of the CYK algorithm is a context-free grammar G in Chomsky normal form (CNF) and a word w over the terminal alphabet of G; its output is a recognition table T containing, for each subword v of w, the set of nonterminals that derive v, i.e. its syntactic properties. In particular, T tells us whether w is a sentence of G. Thus, the membership problem for G is solved, and by transforming an arbitrary context-free grammar into an equivalent one in CNF, the membership problem for context-free grammars is solved as well.

Using some standard notation explained below, Figure 1 shows a typical version of CYK, where the nonterminals deriving the subword aᵢ … aⱼ of w are stored in T_{i,j}.
input: a CFG G = (N, Σ, S, →) in CNF, a word w = a₁ … aₙ ∈ Σ⁺

CYK(G, w) =
1   for i = 1, …, n do
2     T_{i,i} := {A ∈ N | A → aᵢ}
3   for j = 2, …, n do
4     for i = j − 1, …, 1 do
5       T_{i,j} := ∅
6       for h = i, …, j − 1 do
7         for all A → BC do
8           if B ∈ T_{i,h} and C ∈ T_{h+1,j} then
9             T_{i,j} := T_{i,j} ∪ {A}
10  if S ∈ T_{1,n} then return yes else return no

Figure 1: Algorithm CYK for the word problem of CFGs in CNF.
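For concreteness, the procedure of Figure 1 can be sketched in Python. The grammar representation (sets of binary and terminal rules) and the example grammar for {(ⁿ)ⁿ | n ≥ 1} are our own choices for illustration, not taken from any of the textbooks discussed here:

```python
def cyk_cnf(binary_rules, terminal_rules, start, w):
    # binary_rules: set of (A, B, C) for rules A -> BC
    # terminal_rules: set of (A, a) for rules A -> a
    n = len(w)
    # T[i][j] holds the nonterminals deriving w[i..j], 1-indexed as in Fig. 1
    T = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        T[i][i] = {A for (A, a) in terminal_rules if a == w[i - 1]}
    for j in range(2, n + 1):
        for i in range(j - 1, 0, -1):
            for h in range(i, j):
                for (A, B, C) in binary_rules:
                    if B in T[i][h] and C in T[h + 1][j]:
                        T[i][j].add(A)
    return start in T[1][n]

# CNF grammar for {(^n )^n | n >= 1}:
# S -> LA | LR, A -> SR, L -> (, R -> )
bins = {('S', 'L', 'A'), ('S', 'L', 'R'), ('A', 'S', 'R')}
terms = {('L', '('), ('R', ')')}
print(cyk_cnf(bins, terms, 'S', '(())'))   # True
print(cyk_cnf(bins, terms, 'S', '(()'))    # False
```

The triple loop over j, i, h is what gives the O(|w|³) behaviour discussed below; the grammar only enters linearly, through the iteration over the binary rules.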
Concerning the restriction to CNF, a little reflection shows that the basic idea of CYK can be extended to arbitrary context-free grammars (and beyond [11, 16]). The syntactic properties of w can be computed from the syntactic properties of all strict subwords v of w using a dynamic programming approach: store the nonterminals deriving v in table fields T_v and compute T_w by combining entries in stored T_v’s according to rules of G and all possible splittings of w into at most m strict subwords v, where m is the maximum number of symbols in the right-hand sides of grammar rules. This gives an algorithm solving the word problem for context-free grammars in time O(|w|^{m+1}).

¹ The expert reader may have a look at the example first to spot the modifications in comparison to “their textbook version” of CYK.
Such a remark is generally absent in textbooks on formal languages. Among the computer science books on parsing, only Aho and Ullman [1] remark that “a simple generalization works for non-CNF grammars as well”; among those in computational linguistics one can find a generalization of the CYK algorithm to acyclic CFGs without deletion rules in Winograd’s book [23], cf. Naumann and Langer [13]. The latter then motivates the restriction to CNF by a “simplification to compute T”, without going into any detail. Aho and Ullman [1] say that CNF “is useful in simplifying the notation needed to represent a context-free language”. Textbooks on formal languages motivate the restriction to CNF by the fact that “many proofs can be simplified” if right-hand sides of rules have length at most two [7], or claim that the advantage of CNF “is that it enables a simple polynomial algorithm for deciding whether a string can be generated by the grammar” [12]; this is only correct if the emphasis lies on “simple”. In fact, there is a mixture of two reasons for using grammars in CNF in CYK: cutting down the time complexity to O(|w|³) and keeping the correctness and completeness proofs simple. The textbooks on formal languages do not clearly say which of the restrictions of the CNF format serves which of these goals.
In the literature on parsing, one can find a number of variations of CYK that differ in the restrictions on the input grammar. The most liberal restriction that suffices to make CYK run in time O(|w|³) is that the grammar is bilinear, i.e. in each rule A → α there are at most two nonterminal occurrences in α; see the so-called C-parser [10]. Others relax the CNF restriction by admitting unit rules, i.e. rules A → α where α is a nonterminal [20]. Almost all textbooks on formal languages stick to the very restrictive CNF, where in each rule A → α either α = BC for nonterminals B and C, or α is a terminal, or α is the empty word ε and A is the start symbol and must not be used recursively. Then any context-free language has a grammar in CNF. For some variations of the definition, obviously motivated by simplicity of presentation, this only holds for languages without words of length 0 or 1. Given this variety, we draw the conclusion that CYK is to be understood as the above sketched algorithm that uses dynamic programming and context-free grammars in some normal form to solve the membership problem in time O(|w|³).
Table 1 defines the normal forms that are being used in textbooks (apart from our proposal 2NF) and shows how they relate to each other. In the column “rule format”, A, B, C are arbitrary nonterminals, S is the start symbol that may not occur on a right-hand side, a is a terminal symbol, u, v, w are strings of terminal symbols, and α is a sentential form.
Name   Rule format
CNF⁻   A → BC | a
CNF    A → BC | a,  S → ε
CNFε   A → BC | a | ε
S2F    A → α where |α| = 2
C2F    A → BC | B | a,  S → ε
2NF    A → α where |α| ≤ 2
2LF    A → uBvCw | uBv | v

(A diagram here orders these normal forms by inclusion, with CNF⁻ the most restrictive and the bilinear form 2LF the most liberal.)

Chomsky normal form (CNF), strict 2-form (S2F), canonical 2-form (C2F), 2-normal form (2NF), bilinear form (2LF)

Table 1: Grammar normal forms and their hierarchy w.r.t. inclusion

Table 2 lists, for several books that present a version of CYK, the format of the normal form they require and the asymptotic time and space complexity they state for their CYK procedure on grammars in the respective normal form. One can see that the most prominent textbooks on formal language theory ignore the dependency on the grammar. Furthermore, they demand a very restrictive normal form compared to other books which obtain reasonable complexity bounds.
Transforming CFGs into CNF Most textbook solutions to the word problem for context-free grammars present the CYK algorithm for grammars in CNF and just state that this is no restriction of the general case. A complexity analysis is often done only for the CYK procedure. Where a complexity analysis for the transformation into CNF is missing, one gets the impression that the membership problem for CFGs can be solved within the same bounds as for grammars in CNF. This is true, but almost never achieved with the procedure presented in these textbooks. Consequently, most students will not be able to tell whether an algorithm like CYK is possible for arbitrary CFGs, and almost none will know the time complexity of the membership problem for arbitrary CFGs in terms of grammar size. This is problematic not only for computational linguistics, where the grammar size is huge compared to sentence length.

The transformation to CNF combines several steps: the elimination of terminal symbols except in right-hand sides of size 1 (Term); the reduction to rules with right-hand sides of size ≤ 2 (Bin); the elimination of deletion rules (Del); and the elimination of unit rules (Unit). It is often neglected that the complexity of the transformation to CNF depends on the order in which these steps are performed.
1. The blowup in grammar size depends on the order between Del and Bin. It may be exponential when Del is done first, but is linear otherwise.
Book                          Subject  Format  Time      Space
Aho/Ullman ’72                Pa       CNF     |w|³      –
Hopcroft/Motwani/Ullman ’01   CT       CNF⁻    |w|³      –
Harrison ’78                  FL       CNF     |w|³      |w|²
Naumann/Langer ’94            Pa       CNF⁻    |w|³      |w|²
Schöning ’00                  CT       CNF⁻    |w|³      –
Sippu/Soisalon-Soininen ’88   Pa       C2F     |G| |w|³  |N| |w|²
Wegener ’93                   CT       CNF⁻    |G| |w|³  –
Lewis/Papadimitriou ’98       CT       S2F     |G| |w|³  |N| |w|²
Rich ’07                      CT       CNF⁻    |G| |w|³  –
Autebert/Berstel/Boasson ’97  FL       CNFε    –         –

Subject: parsing (Pa), formal languages (FL), theory of computation (CT)
G is the input grammar, N its nonterminal set, and w is the input word.

Table 2: Overview over variants of CYK, with complexity mentioned
2. Unit can incur a quadratic blowup in the size of the grammar.

Hence, except for Term, which gives a linear increase in grammar size, the order in which the single transformation steps are carried out should be: Bin → Del → Unit. This will yield a grammar of size O(|G|²). Nevertheless, many textbooks choose a different order, namely Del → Unit → Bin, which yields a grammar of size O(2^{2|G|}). Even worse, only few textbooks contain a rigorous complexity analysis of this transformation or a hint at its suboptimality.
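The exponential effect of running Del before Bin can be observed directly on a small family of grammars. The following sketch (our own illustration; the grammar family and helper names are not from the paper) applies a naive Del step to S → B₁⋯Bₘ where every Bᵢ is nullable:

```python
from itertools import product

def make_grammar(m):
    # S -> B1 ... Bm, and Bi -> b | ε for each i (rules as (lhs, rhs) pairs)
    rules = [('S', tuple(f'B{i}' for i in range(1, m + 1)))]
    for i in range(1, m + 1):
        rules.append((f'B{i}', ('b',)))
        rules.append((f'B{i}', ()))          # Bi -> ε
    return rules

def eliminate_deletions(rules):
    # directly nullable symbols suffice here; in general a fixpoint is needed
    nullable = {a for a, rhs in rules if rhs == ()}
    new_rules = set()
    for a, rhs in rules:
        positions = [i for i, x in enumerate(rhs) if x in nullable]
        # one rule variant per subset of nullable occurrences that is dropped
        for choice in product([True, False], repeat=len(positions)):
            drop = {p for p, d in zip(positions, choice) if d}
            variant = tuple(x for i, x in enumerate(rhs) if i not in drop)
            if variant:                       # ε-rules are eliminated
                new_rules.add((a, variant))
    return new_rules

g = make_grammar(12)
print(len(eliminate_deletions(g)))   # 4107 = (2**12 - 1) + 12
```

The rule for S alone explodes into 2¹² − 1 variants. With Bin performed first, S’s rule becomes m − 1 binary rules, and Del then expands each binary rule into at most three variants, so the result stays linear in m.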
Table 3 shows how the textbooks mentioned in Table 2 handle the transformation into the normal form that their CYK variant demands, i.e. the one listed in column “rule format” of Table 2. The last two columns give the asymptotic size of the resulting grammar and state whether or not this estimation is given in the textbook. Table 4 shows the asymptotic time complexity for the word problem that one obtains by combining their transformation to normal form with their corresponding version of CYK.
Which normal form to choose? All normal forms mentioned in Table 1 allow for a time complexity of CYK that is cubic in the size of the input word and linear in the size of the grammar. Therefore, the question which of the possible normal forms to choose ought to be determined by two factors: (i) the complexity of the grammar transformation, in particular the size of the transformed grammar, and (ii) ease of presentation and proofs.

To simplify the discussion, we only consider grammars for ε-free languages, on which CNF and CNF⁻ collapse. Simplicity of the grammar transformation
Book           Target  Method                   |G′|      Stated
Aho e.a.       CNF     Del → Unit → Bin → Term  2^{2|G|}  −
Hopcroft e.a.  CNF⁻    Del → Unit → Term → Bin  2^{2|G|}  −
Harrison       CNF     Del → Unit → Term → Bin  2^{2|G|}  −
Naumann e.a.   (refers to Aho e.a.)
Schöning       CNF⁻    Del → Unit → Term → Bin  2^{2|G|}  −
Sippu e.a.     C2F     Bin → Del                |G|       +
Wegener        CNF⁻    Term → Bin → Del → Unit  |G|²      +
Lewis e.a.     S2F     Bin → Del → Unit         |G|³      +
Rich           CNF⁻    Bin → Del → Unit → Term  |G|²      +
Autebert e.a.  CNFε    Term → Bin → Unit        |G|²      −
This paper     2NF     Bin                      |G|       +

Wegener [22] gives O(|V| |G|) as |G′|, but ignores that Bin increased V to size |G|.

Table 3: Transformation of CFG (with only useful symbols) into NF
and the size of the resulting grammar would suggest favouring 2NF over C2F or CNF. The transformation to 2NF only needs step Bin, with a linear increase in grammar size. The transformation to C2F needs the additional step Del, but is also linear, provided that Bin is executed before Del. Transformation to CNF requires the additional steps Unit and Term. Step Unit causes a quadratic increase in grammar size and therefore ought to be omitted. Term can be achieved with a linear blowup of the grammar, but has no real advantage for the complexity or presentation of CYK, so we prefer to omit it. Although 2LF is more liberal than 2NF, the latter is preferable since the transformation to 2NF is simpler to explain.
But what is the price to pay in terms of modifications of CYK or the proofs involved? Unit rules and deletion rules in the grammar seem to complicate the filling of the recognition table: in order to compute T_v, one now has to consider splittings v = v₁v₂ not only into strict subwords v₁, v₂, but also non-strict ones, say v₁ = ε, v₂ = v. Actually, a small modification of the original version of CYK suffices to accommodate deletion and unit rules: just close the fields T_v under the rule “if B ∈ T_v and A ⇒* B, add A to T_v”. This can be done efficiently, as shown by Sippu and Soisalon-Soininen [20] or below.
However, this is part of what is done in transformations to CNF as well:
in order to eliminate deletion rules from a grammar, one computes the set
of nullable nonterminals, and in order to eliminate unit rules, one computes
the set of nonterminals that derive a given one [1, 7, 8]. But, rather than
using these relations to transform the grammar to CNF, one can use them in
Book           via   Time         Stated
Aho e.a.       CNF   2^{2|G|} n³  −
Hopcroft e.a.  CNF⁻  2^{2|G|} n³  −
Harrison       CNF   2^{2|G|} n³  −
Naumann e.a.   CNF   2^{2|G|} n³  −
Schöning       CNF⁻  2^{2|G|} n³  −
Sippu e.a.     C2F   |G| n³       +
Wegener        CNF⁻  |G|² n³      +
Lewis e.a.     S2F   |G|³ n³      +
Rich           CNF⁻  |G|² n³      +
Autebert e.a.  CNFε  |G|² n³      −
This paper     2NF   |G| n³       +

The last column states whether or not the overall time complexity is given.

Table 4: Upper complexity bounds for the word problem on arbitrary CFGs
the CYK algorithm directly and leave the (2NF) grammar unchanged. We think the explicit use of auxiliary information is better than an unnecessary transformation of the input data.

In our opinion, the advantage of using 2NF or C2F over CNF in CYK is not reflected well in the literature. We are only aware of one textbook which presents the CYK algorithm for grammars in C2F, [20] – a specialised book on parsing – and a PhD thesis [14]. It seems to have gone unnoticed so far that one can use even 2NF instead of C2F. We consider 2NF an advantage over C2F, since not only unit rules but also deletion rules are convenient in grammar writing; the latter are often used as a means to admit optional constituents. Furthermore, as we will show below, using 2NF compromises neither the ease of presentation nor the proofs.
3 Preliminaries
Let Σ be a finite set, called the alphabet. A letter is an element of Σ, a word is a finite sequence of letters, and a language is a set of words. Σ* is the set of all words over Σ, and Σ⁺ the set of all nonempty words. As usual, ε is used for the empty word, wv for the concatenation of the two words w and v, and |w| for the length of the word w, with |ε| = 0. For a word w = a₁ … aₙ and positions 1 ≤ i ≤ j ≤ n we write w[i..j] for the subword aᵢ … aⱼ of w.
A context-free grammar (CFG) is a tuple G = (N, Σ, S, →) s.t.

• Σ is the finite alphabet or set of terminals;
• N is the finite nonempty set of nonterminals, s.t. N ∩ Σ = ∅;
• S ∈ N is the start symbol;
• → ⊆ N × (N ∪ Σ)* is a finite set of rules.

The set of all symbols of G is N ∪ Σ, which will always be denoted by the letter V in the following. We will use uppercase letters like A, B, C to denote nonterminals, lowercase letters like a, b, c to denote terminal symbols, letters like x, y, z to denote arbitrary symbols, lowercase letters like u, v, w to denote words over Σ, and Greek lowercase letters α, β, γ, … for sentential forms, i.e. words over V. Henceforth we simply use grammar for context-free grammar.

A grammar can be represented by listing, for each nonterminal A, the right-hand sides α of rules A → α. Thus, one measures the size of G by

  |G| := Σ_{A ∈ N} Σ_{A → α} |Aα|.

We assume that each symbol occurs in some rule, so that |V| ∈ O(|G|).
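For a concrete reading of this size measure, here is a small sketch; the list-of-pairs rule representation is our own choice:

```python
# A grammar as a list of rules (lhs, rhs), with rhs a tuple of symbols.
rules = [('S', ('A', 'B')), ('A', ('a',)), ('A', ()), ('B', ('b', 'S', 'b'))]

def grammar_size(rules):
    # |G| = sum over all rules A -> alpha of |A alpha| = 1 + |alpha|
    return sum(1 + len(rhs) for _, rhs in rules)

print(grammar_size(rules))  # 3 + 2 + 1 + 4 = 10
```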
The derivation relation ⇒ ⊆ V⁺ × V* is defined such that for all α, β ∈ V*, αAβ ⇒ αγβ iff there is a rule A → γ. The relations ⇒ⁿ, ⇒⁺ and ⇒* are the n-fold iteration, the transitive closure, and the reflexive-transitive closure of ⇒, respectively.

The language generated by G is L(G) := {w ∈ Σ* | S ⇒⁺ w}. Two context-free grammars G₁, G₂ with the same terminal alphabet are equivalent if L(G₁) = L(G₂).
Definition 1. A grammar G = (N, Σ, S, →) is in binary normal form (2NF) if for all A → α we have |α| ≤ 2.

In our exposition of the CYK algorithm, two relations between symbols of a grammar play an important role.

Definition 2. The set of nullable symbols of the grammar G = (N, Σ, S, →) is

  c_G := {A | A ∈ N, A ⇒⁺ ε}

and the unit relation of G is

  U_G := {(A, y) | ∃α, β ∈ c_G* with A → αyβ}.
4 The Word Problem for Context-Free Languages

We suggest the following algorithm to test, for a given grammar G and a given word w, whether or not w ∈ L(G).

1. Preprocessing of the grammar:
   (a) transformation of G into an equivalent G′ in 2NF;
   (b) computation of the set c_{G′} of nullable symbols of G′;
   (c) construction of the inverse unit relation Ŭ_{G′} of G′.

2. Building the CYK table for w and (G′, Ŭ_{G′}).

Phase 2 applies a recognizer which is universal for grammars in 2NF with precomputed unit relation. We now present the details and complexity analysis of this algorithm, and then apply it to an example in Section 4.5. Some readers may want to look at the example before checking the complexity bounds.
4.1 Transformation into 2NF
It is well-known that every grammar can be transformed into an equivalent one in CNF by applying the four transformations Term, Bin, Del, Unit mentioned in the introduction. Every grammar can be transformed into one in 2NF by performing just the step Bin.

Lemma 1. For every grammar G = (N, Σ, S, →) there is an equivalent grammar G′ = (N′, Σ, S, →′) in 2NF computable in time O(|G|), such that |G′| = O(|G|) and |N′| = O(|G|).
Proof. The transformation Bin replaces each rule A → x₁x₂⋯xₙ with n > 2 by the rules A →′ x₁⟨x₂, …, xₙ⟩, ⟨x₂, …, xₙ⟩ →′ x₂⟨x₃, …, xₙ⟩, …, ⟨xₙ₋₁, xₙ⟩ →′ xₙ₋₁xₙ, abbreviating suffixes of length ≥ 2 by new symbols ⟨xᵢ, …, xₙ⟩ ∈ N′. This can be carried out in a single pass through the grammar, and the number of new nonterminals is bounded by the size of G. The equivalence of G and G′ can easily be shown by transforming derivations with G′ into derivations with G and vice versa.

In fact, G′ is a left-cover of G: any leftmost derivation with respect to G′ is mapped to a leftmost derivation with respect to G by replacing applications of A →′ x₁⟨x₂, …, xₙ⟩ with applications of A → x₁x₂⋯xₙ and omitting applications of ⟨xᵢ, …, xₙ⟩ →′ xᵢ⟨xᵢ₊₁, …, xₙ⟩ and ⟨xₙ₋₁, xₙ⟩ →′ xₙ₋₁xₙ.

(Figure: the right-branching derivation tree with internal nodes A, ⟨x₂, …, xₙ⟩, …, ⟨xₙ₋₁, xₙ⟩ and leaves x₁, …, xₙ is mapped to the flat tree in which A directly dominates x₁ x₂ … xₙ.)

Conversely, any leftmost derivation with G is obtained from a leftmost derivation of G′ by this mapping (cf. Exercise 3.29, [1]). Note that transformations to C2F or CNF do not admit parses to be recovered by such a simple homomorphism between strings of rules.
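The Bin step of this proof is easy to implement. In the following sketch (our own encoding: the suffix nonterminals ⟨xᵢ, …, xₙ⟩ are represented as Python tuples), each overlong rule is unfolded in a single left-to-right pass:

```python
def bin_transform(rules):
    """Replace each rule A -> x1...xn with n > 2 by binary rules using
    fresh suffix nonterminals <xi,...,xn>, as in the Bin step; encoding
    the suffix symbols as tuples is an implementation choice."""
    new_rules = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            suffix = tuple(rhs[1:])           # plays the role of <x2,...,xn>
            new_rules.append((lhs, (rhs[0], suffix)))
            lhs, rhs = suffix, rhs[1:]
        new_rules.append((lhs, tuple(rhs)))
    return new_rules

rules = [('A', ('a', 'B', 'c', 'D')), ('B', ('b',)), ('D', ())]
for r in bin_transform(rules):
    print(r)
# A -> a <B,c,D>;  <B,c,D> -> B <c,D>;  <c,D> -> c D;  B -> b;  D -> ε
```

Each original rule of length k contributes at most k − 1 binary rules, which makes the linear bound of Lemma 1 visible.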
4.2 Precomputation of the Nullable Symbols
The following lemma gives an inductive characterisation of c_G.

Lemma 2. Let G = (N, Σ, S, →) be a grammar. Then c_G = c_{|N|}, where

  c₀ := {A | A → ε}  and  c_{i+1} := c_i ∪ {A | ∃α ∈ c_i⁺ with A → α}.

Proof. (“⊇”) By induction on i, one shows c_i ⊆ c_G for all i.

(“⊆”) By definition, c_i ⊆ c_{i+1} ⊆ N for any i. As N is finite, c_{|N|+1} = c_{|N|}. To show c_G ⊆ c_{|N|}, prove by induction on n that A ⇒ⁿ ε implies A ∈ c_{|N|}.
Lemma 2 suggests constructing c_G by iteratively computing the c_i until c_{i+1} = c_i. A naïve implementation of this idea can easily result in quadratic running time. However, it is possible to compute c_G in time linear in the grammar, as mentioned by Harrison [7], Exercise 2 in Section 4.3, and Sippu e.a. [20], Theorem 4.14. Since this seems not to be well-known, we present in Fig. 2 an algorithm Nullable which does so for G in 2NF. It maintains a set todo, which can be implemented as a list with insertion of an element and removal of the first one in constant time, and a set nullable, which should be stored as a boolean array with constant access time. Both are assumed to be empty initially.

Algorithm Nullable successively finds those nonterminals that can derive ε, starting with those that derive it in one step and proceeding backwards through the rules. For this, the predecessors of a nonterminal B, i.e. all the nonterminals A such that B occurs on the right-hand side of a rule A → α, need to be accessible without searching through the entire grammar. The algorithm therefore starts by storing, in an initially empty array occurs, the set of all such A for each such B. This information is used to infer from the fact that B is nullable that A is also nullable, if A and B are linked through a rule A → B. If they are linked through a rule A → BC or A → CB, then this depends on whether or not C is nullable too. Hence, the array occurs actually holds, for a rule of the form A → B, the nonterminal A, and for a rule of the form A → BC or A → CB, the pair ⟨A, C⟩. This is used in order to avoid quadratic running time.
Lemma 3. For any grammar G = (N, Σ, S, →) in 2NF, algorithm Nullable computes c_G in time and space O(|G|).

Proof. Clearly, each of the for-loops in lines 1–2, 3–5, and 6–8 can be executed in time O(|G|). For the while-loop, note that no nonterminal B can be inserted into todo once it has been removed from it, because it is then in nullable. Hence, the while-loop can be executed at most |N| times. For each nonterminal B, the inner for-loop is executed a number of times that is given by the number of occurrences of B in the right-hand side of a rule. Since G is assumed to be in 2NF, the combined number of iterations of the outer while- and the inner for-loop is bounded by 2|G|. Furthermore, all the other instructions can be executed in constant time, yielding an overall running time of O(|G|).

The space needed is O(|N|) for the sets todo and nullable and O(|G|) for the array occurs. Hence, it is bounded by O(|G|).
input: a CFG G = (N, Σ, S, →) in 2NF

Nullable(G) =
1   for all A → B do
2     occurs(B) := occurs(B) ∪ {A}
3   for all A → BC do
4     occurs(B) := occurs(B) ∪ {⟨A, C⟩}
5     occurs(C) := occurs(C) ∪ {⟨A, B⟩}
6   for all A → ε do
7     nullable := nullable ∪ {A}
8     todo := todo ∪ {A}
9   while todo ≠ ∅ do
10    remove some B from todo
11    for all A, ⟨A, C⟩ ∈ occurs(B) with C ∈ nullable do
12      if A ∉ nullable then
13        nullable := nullable ∪ {A}
14        todo := todo ∪ {A}
15  return nullable

Figure 2: Algorithm Nullable for the linear-time computation of c_G.
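Algorithm Nullable of Fig. 2 translates almost line by line into Python. In this sketch, the grammar is a list of (lhs, rhs) pairs with |rhs| ≤ 2, and the occurs entries ⟨A, C⟩ become pairs (A, C), with None marking a unit rule; these representation details are our own:

```python
from collections import defaultdict, deque

def nullable_set(rules):
    """Linear-time computation of the nullable nonterminals for a 2NF
    grammar given as (lhs, rhs) pairs with len(rhs) <= 2."""
    nonterminals = {lhs for lhs, _ in rules}
    occurs = defaultdict(list)            # B -> [(A, partner-or-None)]
    nullable, todo = set(), deque()
    for a, rhs in rules:
        if len(rhs) == 1 and rhs[0] in nonterminals:     # A -> B
            occurs[rhs[0]].append((a, None))
        elif len(rhs) == 2:                              # A -> BC
            b, c = rhs
            occurs[b].append((a, c))
            occurs[c].append((a, b))
        elif rhs == () and a not in nullable:            # A -> ε
            nullable.add(a)
            todo.append(a)
    while todo:
        b = todo.popleft()
        for a, partner in occurs[b]:
            # for A -> BC, A becomes nullable only once both B and C are
            if partner is not None and partner not in nullable:
                continue
            if a not in nullable:
                nullable.add(a)
                todo.append(a)
    return nullable

g = [('S', ('A', 'B')), ('A', ()), ('B', ('A',)), ('B', ('b',)), ('C', ('c',))]
print(sorted(nullable_set(g)))   # ['A', 'B', 'S']
```

Since every rule registers at most two occurs entries and every nonterminal enters todo at most once, the running time is linear in |G|, as in Lemma 3.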
4.3 Constructing the Unit Relation
As a consequence of Lemma 3, we can also compute the unit relation efficiently:

Lemma 4. Let G be a grammar in 2NF. The unit relation U_G and its inverse,

  Ŭ_G := {(y, A) | (A, y) ∈ U_G},

can be computed in time and space O(|G|).

We view Ŭ_G as a relation on V and call I_G = (V, Ŭ_G) the inverse unit graph of G.

Proof. According to Lemma 3, c_G can be computed in time and space O(|G|). To construct I_G in linear time and space, first add a node for every symbol in V. Then, for every rule A → y, A → By and A → yB with B ∈ c_G, add the edge (y, A) to Ŭ_G. This gives us Ŭ_G as a list of length O(|G|), generally with multiple entries. We can similarly output a list of length |V| of nodes y associated with a list of their U_G-predecessors A₁, …, Aₙ, i.e. a representation of the graph I_G as needed in Lemma 6 below. (We could, but need not, remove duplicates in the given time and space bound.)

Algorithm CYK below will need, for a given set M of symbols, the set of all x such that there is a y ∈ M with x ⇒* y. We show that the restriction of ⇒* to V × V is the reflexive-transitive closure U*_G of the unit relation of G.
Lemma 5. Let G = (N, Σ, S, →) be a grammar. Then for all x, y ∈ V we have: x ⇒* y iff (x, y) ∈ U*_G.

Proof. (“only if”) Suppose x ⇒* y, i.e. there is an n ∈ ℕ with x ⇒ⁿ y. If n = 0, then (x, y) ∈ U*_G because U*_G is reflexive. If n > 0, there is a rule x → γ with γ ⇒^{n−1} y. Since y is a single symbol, γ = αzβ for suitable α, β, z such that z ∈ V, z ⇒^{n₁} y and αβ ⇒^{n₂} ε for some n₁, n₂ with n₁ + n₂ < n. By the induction hypothesis we have (z, y) ∈ U*_G, and by Lemma 2, αβ ∈ c_G*. But then (x, z) ∈ U_G ⊆ U*_G, and since U*_G is transitive, (x, y) ∈ U*_G.

(“if”) Since U*_G is the (w.r.t. ⊆) least reflexive and transitive relation that subsumes U_G, we only need to show that ⇒* subsumes U_G. But when (x, y) ∈ U_G, there are α, β ∈ c_G* such that x → αyβ, and then x ⇒* y since αβ ⇒* ε.
It follows that for any M ⊆ V, the set of all x ∈ V which derive some element of M is just {x | ∃y ∈ M, (x, y) ∈ U*_G} and hence equals

  Ŭ*_G(M) := {x | ∃y ∈ M, (y, x) ∈ Ŭ*_G}.

To compute such sets efficiently, we do not build the relation Ŭ*_G, which can be quadratic in the size of Ŭ_G and G, but use the inverse unit graph:
Lemma 6. Let G be a grammar in 2NF with symbol set V. Given its inverse unit graph I_G, the set Ŭ*_G(M) can be computed in time and space O(|G|), for any M ⊆ V.

Proof. Ŭ*_G(M) consists of all nodes x ∈ V that are reachable in I_G from some node y ∈ M. The set of all nodes in a graph reachable from a given set of nodes can be computed in time and space O(n + m), where n is the number of nodes and m the number of edges; this simply uses depth- or breadth-first search [5]. Hence, Ŭ*_G(M) can be computed in time and space O(|V| + |Ŭ_G|) = O(|G|).

Clearly, one could drop the assumption that I_G be given, as it can be constructed from G in time and space O(|G|). However, since we will need to compute Ŭ*_G(M) for several sets M, it is advisable to construct I_G once and for all at the beginning.
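Lemmas 4 and 6 can be sketched together: build the inverse unit graph as adjacency lists, then compute Ŭ*_G(M) by breadth-first search. The function names and the (lhs, rhs) grammar encoding are our own:

```python
from collections import defaultdict, deque

def inverse_unit_graph(rules, nullable):
    """Adjacency lists of the inverse unit graph: an edge y -> A for every
    rule A -> y, A -> By or A -> yB with B nullable (cf. Lemma 4).
    `rules` are (lhs, rhs) pairs of a 2NF grammar."""
    succ = defaultdict(list)
    for a, rhs in rules:
        if len(rhs) == 1:
            succ[rhs[0]].append(a)
        elif len(rhs) == 2:
            y, z = rhs
            if z in nullable:
                succ[y].append(a)
            if y in nullable:
                succ[z].append(a)
    return succ

def unit_closure(succ, m):
    """Ŭ*_G(M): all symbols reachable from M in the inverse unit graph,
    by breadth-first search in time linear in nodes plus edges."""
    seen, queue = set(m), deque(m)
    while queue:
        y = queue.popleft()
        for a in succ[y]:
            if a not in seen:
                seen.add(a)
                queue.append(a)
    return seen

g = [('S', ('A', 'B')), ('A', ('a',)), ('B', ('b',)), ('B', ())]
succ = inverse_unit_graph(g, nullable={'B'})
print(sorted(unit_closure(succ, {'a'})))   # ['A', 'S', 'a']
```

In the example, a ⇐ A (rule A → a) and A ⇐ S (rule S → AB with B nullable), so every symbol deriving the terminal a is found without ever materialising the quadratic relation Ŭ*_G.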
4.4 Building the CYK Table
The membership problem for a context-free grammar G = (N, Σ, S, →), i.e. to determine for words w ∈ Σ* whether S ⇒⁺ w or not, generalizes naturally to the computation of the sets {x | x ∈ V, x ⇒* v} of all symbols of G that derive v, for all subwords v of w. These sets can be constructed inductively.

Theorem 7. Let G = (N, Σ, S, →) be a grammar in 2NF, (V, Ŭ_G) its inverse unit graph, and w ∈ Σ⁺. Then for any x ∈ V and any nonempty subword v of w we have x ⇒* v iff x ∈ T_v, where the sets T_v and the auxiliary sets T′_v are defined by

  T′_v := {v}, if |v| = 1,
  T′_v := {A | ∃v₁, v₂ ∈ Σ⁺, ∃z₁ ∈ T_{v₁}, ∃z₂ ∈ T_{v₂} with v = v₁v₂ and A → z₁z₂}, otherwise,

  T_v := Ŭ*_G(T′_v).
Proof. (“if”) We prove this by induction on v. Suppose x ∈ T_v. Then there is a y ∈ T′_v s.t. (x, y) ∈ U*_G, and by Lemma 5, x ⇒* y.

Suppose |v| = 1. By the definition of T′_v we have y = v, which immediately yields x ⇒* v.

Now suppose |v| ≥ 2. Then y ∈ T′_v is only possible if there is a splitting v = v₁v₂ of v into two nonempty subwords and a rule y → z₁z₂ such that z₁ ∈ T_{v₁} and z₂ ∈ T_{v₂}. Hence, the induction hypothesis yields z₁ ⇒* v₁ and z₂ ⇒* v₂. But then we have x ⇒* y ⇒ z₁z₂ ⇒* v₁z₂ ⇒* v₁v₂ = v.

(“only if”) Suppose x ⇒* v. There is an n ≥ 0 such that x ⇒ⁿ v. By induction on n, we show that x ∈ T_v. The base case n = 0 means x = v, and in particular |v| = 1 since x ∈ V. But then x ∈ T′_v and, hence, x ∈ T_v.

Now suppose that n > 0. Since G is in 2NF and v ≠ ε, the first rule applied in the derivation is of the form (i) x → y with y ∈ V, or (ii) x → y₁y₂ with y₁, y₂ ∈ V. In case (i), we have y ⇒^{n−1} v, and hence y ∈ T_v by the hypothesis. Since T_v is closed under the inverse of U_G, we also have x ∈ T_v.

In case (ii), the derivation is of the form x ⇒ y₁y₂ ⇒^{n−1} v. Then there is a splitting v = v₁v₂ of v such that y₁ ⇒^{m₁} v₁ and y₂ ⇒^{m₂} v₂ with m₁, m₂ < n. If v₁ = ε then v₂ ≠ ε, and we have y₂ ∈ T_{v₂} by the hypothesis and x ⇒ y₁y₂ ⇒* y₂. Hence, by Lemma 5, (x, y₂) ∈ U*_G and, since T_{v₂} is closed under the inverse of U*_G, also x ∈ T_{v₂} = T_v. Equally, if v₂ = ε then v₁ ≠ ε and y₁ ∈ T_{v₁} by the hypothesis, which also yields x ∈ T_{v₁} = T_v. If v₁, v₂ both differ from ε, we get y₁ ∈ T_{v₁} and y₂ ∈ T_{v₂}, both from the hypothesis, and therefore x ∈ T′_v, which entails x ∈ T_v.
Algorithm CYK in Fig. 3 contains an implementation of the procedure that is implicitly given in Thm. 7. It computes, on a grammar G in 2NF and a word w = a_1 · · · a_n of length n, for each 1 ≤ i ≤ j ≤ n the set T_{a_i···a_j} of Theorem 7, which is here called T_{i,j}.² These are stored in an array of size n², with a boolean array of size |N| as entries, plus a terminal in the fields on the main diagonal. Initially, all fields in the boolean arrays are set to false. Algorithm CYK assumes the inverse unit graph Û_G to have been computed already.

Note that algorithm CYK only differs minimally from most textbook versions of CYK, e.g. [8, 7]: these do not close the table entries under Û*_G as done here

² Note that if a subword v occurs several times in w, computing and storing T_v for each occurrence (i, j) separately wastes some time and space. There may be applications where it pays off to implement T as an array indexed by subwords.
input: a CFG G = (N, Σ, S, →) in 2NF, its inverse unit graph (V, Û_G),
       a word w = a_1 . . . a_n ∈ Σ+

CYK(G, Û_G, w) =
 1   for i = 1, . . . , n do
 2       T_{i,i} := Û*_G({a_i})
 3   for j = 2, . . . , n do
 4       for i = j − 1, . . . , 1 do
 5           T_{i,j} := ∅
 6           for h = i, . . . , j − 1 do
 7               for all A → yz do
 8                   if y ∈ T_{i,h} and z ∈ T_{h+1,j} then
 9                       T_{i,j} := T_{i,j} ∪ {A}
10           T_{i,j} := Û*_G(T_{i,j})
11   if S ∈ T_{1,n} then return yes else return no

Figure 3: Algorithm CYK for the word problem of CFGs in 2NF.
in lines 2 and 10. Also note that CYK presupposes the input word w to be nonempty. The case of the input ε can – as with all other CYK versions – easily be dealt with separately, since E_G needs to be computed anyway and ε ∈ L(G) iff S ∈ E_G.
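For concreteness, Algorithm CYK from Fig. 3 can be rendered as a short, runnable Python sketch (the function and variable names, and the encoding of symbols as strings, are our own choices, not part of the paper). It takes the binary rules, the unit relation U_G (from which the inverse unit graph is built internally), the start symbol, and a nonempty word:

```python
from collections import defaultdict, deque

def cyk_2nf(binary_rules, unit_pairs, start, w):
    """Decide membership of w for a 2NF grammar, following Fig. 3.
    binary_rules: set of (A, y, z) for rules A -> y z
    unit_pairs:   the unit relation U_G as pairs (A, y) with A =>+ y"""
    assert w, "CYK expects a nonempty input word"
    pred = defaultdict(set)                    # inverse unit graph
    for a, y in unit_pairs:
        pred[y].add(a)

    def close(symbols):
        # Close a table entry under inverse-unit predecessors (lines 2 and 10).
        closure, queue = set(symbols), deque(symbols)
        while queue:
            y = queue.popleft()
            for x in pred[y] - closure:
                closure.add(x)
                queue.append(x)
        return closure

    n = len(w)
    T = {}
    for i in range(n):                         # lines 1-2 (0-based indices)
        T[i, i] = close({w[i]})
    for j in range(1, n):                      # lines 3-10
        for i in range(j - 1, -1, -1):
            entry = set()
            for h in range(i, j):
                for a, y, z in binary_rules:
                    if y in T[i, h] and z in T[h + 1, j]:
                        entry.add(a)
            T[i, j] = close(entry)
    return start in T[0, n - 1]                # line 11
```

On the grammar G′ of Sect. 4.5 and w = (a0 + b) ∗ a, the call returns yes, in accordance with the run depicted in Fig. 4.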
Lemma 8. Given a grammar G = (N, Σ, S, →) in 2NF, its inverse unit graph Û_G, and a word w ∈ Σ+, Algorithm CYK decides in time O(|G| · |w|³) and space O(|G| · |w|²) whether or not w ∈ L(G).
Proof. The for-loop in lines 1–2 takes time O(|w| · |V|). The inner loop in lines 7–9 is executed O(|w|³) times, each iteration taking O(|G|) steps, and each step being executed in constant time. Line 10 is executed O(|w|²) times, taking O(|G|) steps for each execution according to Lemma 6. This amounts to O(|w| · |V| + |G| · |w|³ + |G| · |w|²) = O(|G| · |w|³) altogether.

The space needed to store T is O(|w|² · |N|), and the space needed to compute the Û*_G(T_{i,j}) is O(|G|). Hence, we need space O(max{|w|² · |N|, |G|}) ≤ O(|w|² · |G|) altogether.
Lemma 8 expects G to be in 2NF and Û_G to be given already. Solving the word problem for arbitrary context-free grammars requires the transformation and precomputation to be done beforehand.
Corollary 9. The word problem for context-free grammars G and words of length n can be solved in time O(|G| · n³) and space O(|G| · n²).

Proof. First, transform G into an equivalent G′ in 2NF with |G′| = O(|G|). By Lemma 1, this can be done in time and space O(|G|). Then, compute the set of nullable nonterminals and the inverse unit graph for G′ in time and space O(|G′|) according to Lemma 2 and Lemma 4. Finally, execute CYK in time O(|G′| · n³) and space O(|G′| · n²) according to Lemma 8. Since |G′| is O(|G|), this yields an asymptotic running time of O(|G| · n³) and a space consumption of O(|G| · n²).
4.5 An Example

We finish this section by executing the entire procedure on the following example grammar G over the alphabet Σ = {a, b, 0, 1, (, ), +, ∗}.

    E → T | E + T
    T → F | T ∗ F
    F → aI | bI | (E)
    I → 0I | 1I | ε

This is a standard example, with L(G) being the set of arithmetic expressions over the regular set of identifiers (a ∪ b)(0 ∪ 1)*. This example (in slightly modified form) can also be found in the standard textbook by Hopcroft et al. [8], where it is used to illustrate the transformation into CNF. Note that G has 4 nonterminals, 10 rules, and is of size 29.

First we need to transform G into 2NF by introducing new nonterminals X, Y, Z for the right-hand sides that are larger than 2 symbols. The result is the following grammar G′ with symbol set V.

    E → T | EX          X → +T
    T → F | TY          Y → ∗F
    F → aI | bI | (Z    Z → E)
    I → 0I | 1I | ε

It has 7 nonterminals, 13 rules, and is of size 35. Since most textbook versions of CYK would require G to have been transformed into CNF, we state the corresponding figures for the resulting CNF grammar without presenting it explicitly. If it is obtained in the order Del → Unit → Term → Bin, then the result has 15 nonterminals, 33 rules, and is of size 83.

It is easy to see that only I is nullable, i.e. E_{G′} = {I}. In order to be able to compute Û*_{G′}(M) for any M ⊆ V, we first determine the unit relation:

    U_{G′} = {(E, T), (T, F), (F, a), (F, b), (I, 0), (I, 1)}

Hence we have, for instance,

    Û*_{G′}({a}) = {a, F, T, E}      Û*_{G′}({F}) = {F, T, E}
    Û*_{G′}({b}) = {b, F, T, E}      Û*_{G′}({T}) = {T, E}
    Û*_{G′}({0}) = {0, I}            Û*_{G′}({I}) = {I}

etc.
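These precomputations are easy to mechanise. The following Python sketch (our naming; rules are pairs of a left-hand side and a tuple of right-hand-side symbols) computes the set of nullable symbols by a fixpoint iteration and the unit relation directly from its definition:

```python
def nullable_symbols(rules):
    """Fixpoint computation of E_G = {A | A =>+ epsilon}.
    rules: list of (A, rhs) with rhs a tuple of symbols (() for epsilon)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, rhs in rules:
            # A becomes nullable once every symbol of some rhs is nullable.
            if a not in nullable and all(s in nullable for s in rhs):
                nullable.add(a)
                changed = True
    return nullable

def unit_relation(rules, nullable):
    """U_G = {(A, y) | some rule A -> alpha y beta with alpha, beta nullable}."""
    u = set()
    for a, rhs in rules:
        for k, y in enumerate(rhs):
            if (all(s in nullable for s in rhs[:k])
                    and all(s in nullable for s in rhs[k + 1:])):
                u.add((a, y))
    return u
```

On the grammar G′ above, this yields E_{G′} = {I} and exactly the six unit pairs listed.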
(Each nonempty entry T_{i,j} is written as "top half | bottom half"; entries with an empty top half are shown by their bottom half alone, and empty entries are left blank.)

    j =     1    2          3        4    5          6        7    8
    i = 1   (                                        E,T | F       E | T
        2        E,T,F | a  E,T | F      E          Z
        3                   I | 0
        4                            +   X
        5                                E,T,F | b
        6                                            )
        7                                                     *    Y
        8                                                          E,T,F | a

Figure 4: An example run of algorithm CYK.
To finish the example, consider the word w = (a0 + b) ∗ a. Executing CYK on w creates the table depicted in Fig. 4. It is to be understood as follows. A symbol in the bottom half of an entry at row i and column j belongs to T′_v for v = w[i..j]: it is in there because of a rule of the form A → yz such that y and z occur at certain positions in the same row to the left and in the same column below (or, on the main diagonal, because it is the input letter). A symbol in the top half of the entry belongs to T_v \ T′_v, i.e. it is in there because it is a predecessor in Û*_G of some symbol from the bottom half. Hence, the entire entry represents T_v. The bottom halves of the entries on the main diagonal always form the input word when read from top-left to bottom-right.
4.6 Parsing with CYK

The CYK recognition algorithm can be turned into a parser either by constructing parse trees from the recognition table or by inserting trees rather than nonterminals into the table. We only discuss how to construct parse trees from the table, leaving aside the question of space-efficient structure sharing for a table holding trees.

In the case of CYK for grammars in CNF, define a function extract(A, i, j) that returns for A ∈ T_{i,j} the trees with root labelled A and yield a_i · · · a_j by

    extract(A, i, j) = {A(a_i) | A → a_i}                    if i = j,
    extract(A, i, j) = {A(s, t) | A → BC, i ≤ h < j,
                                  s ∈ extract(B, i, h),
                                  t ∈ extract(C, h + 1, j)}  if i < j,

where A(s, t) is the tree with root labelled A and immediate subtrees s and t. Since grammars in CNF are acyclic (i.e. A ⇒+ A does not occur), the number of parse trees of a given w is finite, and all possible parse trees can be obtained from the table in finitely many steps.
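For the CNF case, extract can be sketched in Python as follows (representing trees as nested tuples and the recognition table as a mapping from index pairs to nonterminal sets is our choice of encoding):

```python
def extract(A, i, j, T, rules, w):
    """All parse trees with root A and yield w[i..j], read off a CNF
    recognition table T. Assumes A is in T[i, j]."""
    if i == j:
        # Leaf case: trees A(a_i) for each terminal rule A -> a_i.
        return [(A, w[i]) for (lhs, rhs) in rules if lhs == A and rhs == (w[i],)]
    trees = []
    for lhs, rhs in rules:
        if lhs != A or len(rhs) != 2:
            continue
        B, C = rhs
        for h in range(i, j):                  # all splittings of w[i..j]
            if B in T[i, h] and C in T[h + 1, j]:
                for s in extract(B, i, h, T, rules, w):
                    for t in extract(C, h + 1, j, T, rules, w):
                        trees.append((A, s, t))
    return trees
```

Since the grammar is acyclic, the recursion always descends to strictly smaller spans and terminates.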
Our CYK uses grammars in 2NF, which may be cyclic and hence may have inputs with infinitely many parses. We adapt the tree extraction algorithm so that it returns a finite number of "essentially different" analyses. First, for x ∈ T_{i,j}, an auxiliary function extract′(x, i, j) returns trees with yield a_i · · · a_j and root labelled x, which are leaves in case i = j or branch at the root into two subtrees, each having a nonempty yield. Then, for A ∈ T_{i,j}, extract(A, i, j) extends such trees by adding a root labelled A on top, if A ⇒+ x:

    extract(A, i, j) = {A(t) | x ∈ T_{i,j}, t ∈ extract′(x, i, j), A ∈ Û+_G({x})}
                       ∪ extract′(A, i, j)

    extract′(x, i, j) = {x}                                   if i = j,
    extract′(x, i, j) = {x(s, t) | x → yz, i ≤ h < j,
                                   s ∈ extract(y, i, h),
                                   t ∈ extract(z, h + 1, j)}  if i < j.
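The mutually recursive pair can be sketched in Python as follows (names and the nested-tuple tree encoding are ours; the relation Û+_G is passed in as a function unit_preds, and the implicit side condition that the root of an extract′ tree must stem from the pre-closure entries is made explicit by iterating over a table T′ of those entries):

```python
def extract_trees(A, i, j, Tprime, binary_rules, unit_preds, w):
    """Trees for A over span (i, j), following the extract/extract' pair.
    Tprime[i, j]: pre-closure table entries; unit_preds(x): all B with B =>+ x.
    Unary branches (A, t) stand for whole derivations A =>+ x, not single rules."""
    def extract_prime(x, i, j):
        if x not in Tprime[i, j]:
            return []                          # root of a primed tree is in T'
        if i == j:
            return [x]                         # leaf: the terminal w[i] itself
        trees = []
        for lhs, (y, z) in binary_rules:
            if lhs != x:
                continue
            for h in range(i, j):              # nonempty yields on both sides
                for s in extract_trees(y, i, h, Tprime, binary_rules, unit_preds, w):
                    for t in extract_trees(z, h + 1, j, Tprime, binary_rules, unit_preds, w):
                        trees.append((x, s, t))
        return trees

    trees = list(extract_prime(A, i, j))
    for x in Tprime[i, j]:                     # add a unary branch if A =>+ x
        if A in unit_preds(x):
            trees += [(A, t) for t in extract_prime(x, i, j)]
    return trees
```

Termination is guaranteed because extract′ only recurses into strictly smaller spans, even for cyclic grammars.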
Using extract, we consider all derivations A ⇒+ x as inessential variants of each other and represent them as a unary branch. Since these branches generally do not correspond to grammar rules A → x, the extracted trees are not parse trees in the strict sense. If the grammar is acyclic, we can turn them into parse trees: in this case, for each (A, x) ∈ U+_G, there are only finitely many derivations A ⇒+ x, so their parse trees can be precomputed, and extract(A, i, j) would put them on top of the trees returned by extract′(x, i, j). If the grammar is cyclic, we can still precompute the finite number of derivations A ⇒+ x which have no subderivations B ⇒+ B, and by using them in extract(A, i, j), we would obtain a finite set of "canonical" parse trees. In this way, we get a correct but incomplete parser. (Note that the missing parse trees with subderivations B ⇒+ B also get lost if we turn the grammar into CNF.)

Suppose CYK is used with a 2NF grammar G′ for a given grammar G and a word w. Since G′ is a left-cover of G by Lemma 1, it is easy to recover w's parse trees with respect to G from those for G′ obtained by CYK. One just has to undo the transformation of k-ary branching rules into binary branching ones on the trees, i.e. apply the covering homomorphism on left parses. In this way, the finitely many "canonical" parse trees with respect to G′ can be transformed into parse trees with respect to G. If G′ is acyclic, this gives a complete parser for G.
5 Conclusion

Teaching CYK with pre-transformation into 2NF    The variant of CYK we propose here is not more difficult to present than those in the standard textbooks. The precomputations that this CYK version requires need to be done in the textbook versions as well, where they are part of the grammar transformation: the computation of the nullable symbols is necessary for step Del, and the construction of the inverse unit graph could also be used to implement step Unit. We believe that using these relations explicitly in the CYK algorithm is better from a learning point of view than using them to transform the grammar. By using them in the algorithm, one cannot do the Bin and Del transformations in the wrong order, and one automatically avoids an exponential blowup of the grammar. Moreover, it emphasises the fact that the computation of such relations is a necessary means for solving the membership problem, which will make it easier for students to remember.
On the other hand, one may argue that the essence of CYK is the dynamic programming technique, and that the actual algorithm should therefore not contain computation steps which obfuscate the view of that technique. However, our algorithm CYK only differs in two lines (the closure steps in lines 2 and 10) from those that work on CNF input.
As shown in the introduction, many textbooks ignore the complexity of the normal form transformation, or ignore the dependency of CYK's complexity on the size of the input grammar. We believe that this is wrong in cases where the blowup in the grammar is suboptimal. It is obviously possible to present our variant of CYK without the complexity analyses. If this is done, then using the 2NF variant bears two distinct advantages over the CNF variant: (1) It does not contain hidden obstacles like the dependency of the blowup on the order in which the pre-transformation steps are done. (2) Since the CNF transformation comes with an at least quadratic blowup in grammar size, one may be inclined to think that the complexity of the word problem for arbitrary CFGs is at least quadratic in the grammar size. The 2NF variant does not give this impression.
Finally, note that the previous two sections have presented one way of presenting this variant of CYK. The essentials are easily seen to be the 2NF transformation, the computation of the nullable symbols, the linear time and space algorithm for computing the predecessors w.r.t. ⇒*, and algorithm CYK. The correctness proofs are of course not essential in order to obtain an efficient procedure; they can be left out, as is done in many textbooks.

A tool is available that is meant to support teaching this variant of CYK (http://www.tcs.ifi.lmu.de/SeeYK). It visualises the entire procedure presented here on arbitrary input context-free grammars. In particular, it shows simulations of the computation of the nullable symbols, the inverse unit relation, and the filling of the CYK table.
Avoiding normal forms altogether    We have argued for omitting the grammar transformation steps Term, Unit and Del in favour of precomputed auxiliary relations. One may push this a step further and teach a version of CYK which omits the transformation Bin as well and only implicitly works with binary branching rules. To do so, one would not only insert symbols x ∈ V into the fields T_v of the recognition table, but also "dotted rules" A → α·β, where A → αβ is a grammar rule and α ⇒* v. The algorithm would have to close the table under rules such as

    A → ·α ∈ T_ε          for every rule A → α,
    A → αa·β ∈ T_{va}     if A → α·aβ ∈ T_v,
    A ∈ T_v               if A ⇒* B and B → β· ∈ T_v,
    A → αB·β ∈ T_{vu}     if A → α·Bβ ∈ T_v and B ∈ T_u,

where the field identifiers are subwords of the input sentence and εv = v. However, this would be a step away from the simplicity of CYK towards Earley's algorithm [6], and it seems less memorizable than the explicit transformation to 2NF. We therefore conclude that the pre-transformation into 2NF and the corresponding variant of CYK form the best balance between didactic needs and the aim for efficiency.
References

[1] A. V. Aho and J. D. Ullman. The Theory of Parsing, Translation, and Compiling, volume I: Parsing. Prentice-Hall, 1972.

[2] J. Autebert, J. Berstel, and L. Boasson. Context-free languages and pushdown automata. In A. Salomaa and G. Rozenberg, editors, Handbook of Formal Languages, volume 1: Word, Language, Grammar, pages 111–174. Springer-Verlag, Berlin, 1997.

[3] R. Axelsson, K. Heljanko, and M. Lange. Analyzing context-free grammars using an incremental SAT solver. In Proc. 35th Int. Coll. on Automata, Languages and Programming, ICALP'08, Part II, volume 5126 of LNCS, pages 410–422. Springer, 2008.

[4] J. Cocke and J. T. Schwartz. Programming Languages and Their Compilers. Courant Institute of Mathematical Sciences, New York, 1970.

[5] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press / McGraw-Hill, Cambridge, London, 1990.

[6] J. Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102, 1970.

[7] M. A. Harrison. Introduction to Formal Language Theory. Addison-Wesley Series in Computer Science. Addison-Wesley, Reading, MA, 1978.

[8] J. E. Hopcroft, R. Motwani, and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, New York, 3rd edition, 2001.

[9] T. Kasami. An efficient recognition and syntax analysis algorithm for context-free languages. Technical Report AFCRL-65-758, Air Force Cambridge Research Laboratory, Bedford, Massachusetts, 1965.

[10] R. Leermakers. How to cover a grammar. In 27th Annual Meeting of the ACL, Proceedings of the Conference, pages 135–142, Vancouver, Canada, June 1989. Association for Computational Linguistics.

[11] H. Leiß. Bounded fixed-point definability and tabular recognition of languages. In Computer Science Logic, 9th International Workshop, CSL'95, volume 1092 of LNCS, pages 388–402. Springer, 1996.

[12] H. R. Lewis and C. H. Papadimitriou. Elements of the Theory of Computation. Prentice Hall, 2nd edition, 1998.

[13] S. Naumann and H. Langer. Parsing. Teubner, Stuttgart, 1994.

[14] M.-J. Nederhof. Linguistic Parsing and Program Transformation. PhD thesis, Proefschrift Nijmegen, Wiskunde en Informatica, 1994.

[15] M.-J. Nederhof and G. Satta. Efficient tabular LR parsing. In Proceedings of the 34th Annual Meeting of the ACL, pages 239–246, Morristown, NJ, USA, 1996. Association for Computational Linguistics.

[16] A. Okhotin. Boolean grammars. Information and Computation, 194(1):19–48, 2004.

[17] E. Rich. Automata, Computability, and Complexity: Theory and Applications. Prentice-Hall, 2007.

[18] U. Schöning. Theoretische Informatik — kurzgefaßt. Hochschultaschenbuch. Spektrum Akademischer Verlag, Heidelberg-Berlin, 2000.

[19] E. Scott, A. Johnstone, and R. Economopoulos. BRNGLR: a cubic Tomita-style GLR parsing algorithm. Acta Informatica, 44(6):427–461, 2007.

[20] S. Sippu and E. Soisalon-Soininen. Parsing Theory, Vol. I: Languages and Parsing. EATCS Monographs on Theoretical Computer Science. Springer-Verlag, Berlin, 1988.

[21] M. Tomita. Efficient Parsing for Natural Language: A Fast Algorithm for Practical Systems. Kluwer Academic Publishers, Norwell, MA, USA, 1985.

[22] I. Wegener. Theoretische Informatik. Teubner, 1993.

[23] T. Winograd. Language as a Cognitive Process, Volume 1: Syntax. Addison-Wesley, New York, 1983.

[24] D. H. Younger. Recognition and parsing of context-free languages in time n³. Information and Control, 10(2):372–375, 1967.
good presentations of efficient versions of CYK. However, this is not the case. In particular, many textbooks present the necessary transformation of a context-free grammar into Chomsky normal form (CNF) in a suboptimal way, leading to an avoidable exponential blowup of the grammar. They neither explain nor discuss whether this can be avoided or not. Complexity analyses often concern the actual CYK procedure only and neglect the pre-transformation. Some of the textbooks give vague reasons for choosing CNF, indicating ease of presentation and proof. A few state that CNF is chosen to achieve a cubic running time, not mentioning that only the restriction to binary rules is responsible for that.

We find that this leaves students with wrong impressions. The complexity of solving the word problem for arbitrary CFLs appears to coincide with the complexity of the CYK procedure (ignoring the effects of the grammar transformation). Also, CNF is unduly emphasised.

We believe that this situation calls for correction. We want to show that efficiency and presentability are not contradictory, so that one can teach a less restrictive version of CYK without compromising efficiency of the algorithm, clarity of its presentation, or simplicity of proofs. We propose to use a grammar normal form which is more liberal and easier to achieve than CNF and comes with only a linear increase of grammar size, leading to a faster membership test for arbitrary CFGs.

It would be unnecessary to argue that we should emphasise efficiency concerns as well as ease of presentation in teaching CYK to students if the goal were just to show that the membership problem for CFLs is decidable. But since CYK is often the only algorithm taught for that problem, students are led to believe that it is also the one which should be used. Older books on parsing, e.g.
Aho and Ullman [1], doubt that CYK "will find practical use" at all, arguing that time and space bounds of O(|w|³) and O(|w|²) are not good enough and that Earley's algorithm does better for many grammars. At least the first part of this may no longer be true. For example, Tomita's [21, 19] generalization of LR parsing to arbitrary CFGs uses a complicated "graph-structured" stack to cope with the nondeterminism of the underlying pushdown automaton; but Nederhof and Satta [15] show how to obtain an efficient generalized LR parser by using CYK with a grammar describing a binary version of the pushdown automaton. Furthermore, there are nonstandard applications of CYK. For instance, Axelsson et al. [3] use a symbolic encoding of CYK for an ambiguity checking test. The claim of Aho and Ullman [1] that CYK is not of practical importance therefore seems too skeptical.

The paper aims at scholars who know about CYK or, even better, have already taught it. We want to convince them that the standard version can easily be improved and taught so that students learn an efficient algorithm. Note that we only suggest that a particular variant of the algorithm is the best one to teach, not that it should be presented in a particular style.

The paper is organised as follows. Sect. 2 discusses how the CYK algorithm is presented in some prominent textbooks on formal languages, parsing,
and the theory of computation.¹ Sect. 3 recalls definitions and notations used for languages and CFGs. Sect. 4 presents an efficient test for membership in context-free languages, consisting of the grammar transformation into 2NF, two precomputation steps, and the actual CYK procedure. Correctness proofs and complexity estimations are given for each of the steps. The section finishes with an example. Sect. 5 contains some concluding remarks about teaching the variant proposed here and the possibility of avoiding grammar transformations at all.

¹ The expert reader may have a look at the example first to spot the modifications in comparison to "their textbook version" of CYK.

2 The CYK Algorithm in Textbooks

What is understood as the CYK algorithm? For textbooks on formal languages, the input of the CYK algorithm is a context-free grammar G in Chomsky normal form (CNF) and a word w over the terminal alphabet of G; its output is a recognition table T containing, for each subword v of w, the set of nonterminals that derive v. Thus, T tells us whether w is a sentence of G, i.e. the membership problem for G is solved. By transforming an arbitrary context-free grammar into an equivalent one in CNF, the membership problem for context-free grammars is solved as well. Using some standard notation explained below, Figure 1 shows a typical version of CYK, where the nonterminals deriving the subword a_i . . . a_j of w are stored in T_{i,j}.

    input: a CFG G = (N, Σ, S, →) in CNF, a word w = a_1 . . . a_n ∈ Σ+

    CYK(G, w) =
     1   for i = 1, . . . , n do
     2       T_{i,i} := {A ∈ N | A → a_i}
     3   for j = 2, . . . , n do
     4       for i = j − 1, . . . , 1 do
     5           T_{i,j} := ∅
     6           for h = i, . . . , j − 1 do
     7               for all A → BC do
     8                   if B ∈ T_{i,h} and C ∈ T_{h+1,j} then
     9                       T_{i,j} := T_{i,j} ∪ {A}
    10   if S ∈ T_{1,n} then return yes else return no

    Figure 1: Algorithm CYK for the word problem of CFGs in CNF.
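The textbook procedure of Figure 1 can be rendered as a short Python sketch (names are our own; a grammar is a set of pairs (A, rhs) with rhs a one-symbol terminal tuple or a two-nonterminal tuple):

```python
def cyk_cnf(rules, start, w):
    """Textbook CYK for a grammar in CNF: fill T bottom-up over all spans."""
    n = len(w)
    assert n > 0
    T = {}
    for i in range(n):                         # diagonal: terminal rules
        T[i, i] = {A for (A, rhs) in rules if rhs == (w[i],)}
    for j in range(1, n):                      # longer spans, 0-based indices
        for i in range(j - 1, -1, -1):
            T[i, j] = set()
            for h in range(i, j):              # all splittings of w[i..j]
                for A, rhs in rules:
                    if (len(rhs) == 2 and rhs[0] in T[i, h]
                            and rhs[1] in T[h + 1, j]):
                        T[i, j].add(A)
    return start in T[0, n - 1]
```

A small usage example: with the CNF grammar S → AX | AB, X → SB, A → a, B → b (a hypothetical grammar for {aⁿbⁿ | n ≥ 1}, chosen here only for illustration), cyk_cnf accepts "aabb" and rejects "abb".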
Concerning the restriction to CNF, a little reflection shows that the basic idea of CYK can be extended to arbitrary context-free grammars (and beyond [11, 16]). The syntactic properties of w can be computed from the syntactic properties of all strict subwords v of w using a dynamic programming approach: store the nonterminals deriving v in table fields T_v, and compute T_w by combining entries in stored T_v's according to rules of G and all possible splittings of w into at most m strict subwords v, where m is the maximum number of symbols in the right-hand sides of grammar rules. This gives an algorithm solving the word problem for context-free grammars in time O(|w|^{m+1}).

In the literature on parsing, one can find a number of variations of CYK that differ in the restrictions on the input grammar. The most liberal restriction that suffices to make CYK run in time O(|w|³) is that the grammar is bilinear, i.e. that in each rule A → α there are at most two nonterminal occurrences in α. Among the computer science books on parsing, only Aho and Ullman [1] remark that "a simple generalization works for non-CNF grammars as well", cf. the so-called C-parser [10]. Others relax the CNF restriction by admitting unit rules, i.e. rules A → α where α is a nonterminal [20]. Among the books in computational linguistics, one can find a generalization of the CYK algorithm to acyclic CFGs without deletion rules in Winograd's book [23]. Such a remark is generally absent in textbooks on formal languages.

Almost all textbooks on formal languages stick to the very restrictive CNF, obviously motivated by simplicity of presentation. They motivate the restriction to CNF by the fact that "many proofs can be simplified" if right-hand sides of rules have length at most two [7], or claim that the advantage of CNF "is that it enables a simple polynomial algorithm for deciding whether a string can be generated by the grammar" [12]. Aho and Ullman [1] say that CNF "is useful in simplifying the notation needed to represent a context-free language"; Naumann and Langer [13] motivate it by a "simplification to compute T". In fact, there is a mixture of two reasons for using grammars in CNF in CYK: cutting down the time complexity to O(|w|³), and keeping the correctness and completeness proofs simple. The textbooks on formal languages do not clearly say which of the restrictions of the CNF format serves which of these goals.

Table 1 defines the normal forms that are being used in textbooks (apart from our proposal 2NF) and shows how they relate to each other. In the column "rule format", A, B, C are arbitrary nonterminals, a is a terminal symbol, S is the start symbol that may not occur on a right-hand side, u, v, w are strings of terminal symbols, and α is a sentential form. With a definition where in each rule A → α either α = BC for nonterminals B and C, or α is a terminal, or α is the empty word ε and A is the start symbol and must not be used recursively, any context-free language has a grammar in CNF. For some variations of the definition, this only holds for languages without words of length 0 or 1.

    Name   Rule format
    CNF−   A → BC | a
    CNF    A → BC | a,  S → ε
    CNFε   A → BC | a | ε
    S2F    A → α where |α| = 2
    C2F    A → BC | B | a,  S → ε
    2NF    A → α where |α| ≤ 2
    2LF    A → uBvCw | uBv | v

    [Diagram: the inclusion hierarchy of these formats, with 2LF the most liberal, above 2NF, which is above CNFε and C2F, which are above CNF, which is above CNF− and S2F.]

    Chomsky normal form (CNF), strict 2-form (S2F), canonical 2-form (C2F), 2-normal form (2NF), bilinear form (2LF)

    Table 1: Grammar normal forms and their hierarchy w.r.t. inclusion.

Given this variety, we draw the conclusion that CYK is to be understood as the above sketched algorithm that uses dynamic programming and context-free grammars in some normal form to solve the membership problem in time O(|w|³). Table 2 lists, for several books that present a version of CYK, the format they require (the one listed in column "rule format" of Table 1) and the asymptotic time and space complexity they state for their CYK procedure on grammars in the respective normal form, without going into any detail.

    Book                          Subject  Format  Time        Space
    Aho/Ullman '72                Pa       CNF     |w|³        |w|²
    Hopcroft/Motwani/Ullman '01   CT       CNF−    |w|³        |w|²
    Harrison '78                  FL       CNF     |w|³        |N| · |w|²
    Naumann/Langer '94            Pa       CNF−    |w|³        |N| · |w|²
    Schöning '00                  CT       CNF−    |w|³        –
    Sippu/Soisalon-Soininen '88   Pa       C2F     |G| · |w|³  –
    Wegener '93                   CT       CNF−    |G| · |w|³  –
    Lewis/Papadimitriou '98       CT       S2F     |G| · |w|³  –
    Rich '07                      CT       CNF−    |G| · |w|³  –
    Autebert/Berstel/Boasson '97  FL       CNFε    –           –

    Subject: parsing (Pa), formal languages (FL), theory of computation (CT). G is the input grammar, N its nonterminal set, and w is the input word.

    Table 2: Overview of variants of CYK.

Transforming CFGs into CNF    Most textbook solutions to the word problem for context-free grammars present the CYK algorithm for grammars in CNF and just state that this is no restriction of the general case. This is true, but almost never achieved with the procedure presented in these textbooks. The transformation to CNF combines several steps: the elimination of terminal symbols except in right-hand sides of size 1 (Term), the reduction to rules with right-hand sides of size ≤ 2 (Bin), the elimination of deletion rules (Del), and the elimination of unit rules (Unit). It is often neglected that the complexity of the transformation to CNF depends on the order in which these steps are performed. The blowup in grammar size depends in particular on the order between Del and Bin: it may be exponential when Del is done first, but is linear otherwise.

A complexity analysis is often done only for the CYK procedure. Where a complexity analysis for the transformation into CNF is missing, one obtains the impression that the membership problem for CFGs can be solved within the same bounds as for grammars in CNF. One can see that the most prominent textbooks on formal language theory ignore the dependency on the grammar. Even worse, they demand a very restrictive normal form compared to other books which obtain reasonable complexity bounds. Consequently, most students will not be able to tell whether an algorithm like CYK is possible for arbitrary CFGs, and almost none will know the time complexity of the membership problem for arbitrary CFGs in terms of grammar size. This is problematic not only for computational linguistics, where the grammar size is huge compared to sentence length.

Table 3 shows how the textbooks mentioned in Table 2 handle the transformation into the normal form that their CYK variant demands, in particular the size of the transformed grammar. Only few textbooks contain a rigorous complexity analysis of this transformation or a hint at its suboptimality. The last two columns give the asymptotic size of the resulting grammar and whether or not this estimation is given in the textbook. To simplify the discussion, we only consider grammars for ε-free languages, on which CNF and CNF− collapse.

    Book           Target  Method                    |G′|       Stated
    Aho e.a.       CNF     Del → Unit → Bin → Term   2^{2|G|}   −
    Hopcroft e.a.  CNF−    Del → Unit → Term → Bin   2^{2|G|}   −
    Harrison       CNF     Del → Unit → Term → Bin   2^{2|G|}   −
    Naumann e.a.   CNF−    (refers to Aho e.a.)      2^{2|G|}   −
    Schöning       CNF−    Del → Unit → Term → Bin   2^{2|G|}   −
    Sippu e.a.     C2F     Bin → Del                 |G|        +
    Wegener        CNF−    Term → Bin → Del → Unit   |G|²       +
    Lewis e.a.     S2F     Bin → Del → Unit          |G|³       +
    Rich           CNF−    Bin → Del → Unit → Term   |G|²       +
    Autebert e.a.  CNFε    Term → Bin → Unit         |G|²       −
    This paper     2NF     Bin                       |G|        +

    Wegener [22] gives O(|V| · |G|) as |G′|, but ignores that Bin increases |V| to size |G|.

    Table 3: Transformation of CFGs (with only useful symbols) into normal form, and size of the resulting grammar.

Which normal form to choose?    All normal forms mentioned in Table 1 allow for a time complexity of CYK that is cubic in the size of the input word and linear in the size of the grammar. Hence, the question which of the possible normal forms to choose ought to be determined by two factors: (i) the complexity of the grammar transformation, in particular the size of the transformed grammar, and (ii) ease of presentation and proofs.

Concerning the first factor, the order in which the single transformation steps are carried out should be Bin → Del → Unit, which gives a linear increase in grammar size except for Unit: Unit can incur a quadratic blowup in the size of the grammar, and therefore ought to be omitted. Term can be achieved by a linear blowup of the grammar, but has no real advantage for the complexity or presentation of CYK, so we prefer to omit it as well. Nevertheless, many textbooks choose a different order, namely Del → Unit → Bin, which yields a grammar of size O(2^{2|G|}).

The transformation to 2NF only needs step Bin, with a linear increase in grammar size. The transformation to C2F needs the additional step Del, which, provided that Bin is executed before Del, is also linear. Transformation to CNF affords the additional steps Unit and Term; step Unit causes a quadratic increase in grammar size and will yield a grammar of size O(|G|²). Although 2LF is more liberal than 2NF, the latter is preferable since the transformation to 2NF is simpler to explain. We consider 2NF an advantage over C2F as well; it seems to have been unnoticed so far that one can use even 2NF instead of C2F. Simplicity of the grammar transformation would thus suggest to favour 2NF over C2F or CNF.

But what is the price to pay in terms of modifications of CYK or the proofs involved? Unit rules and deletion rules in the grammar seem to complicate the filling of the recognition table: in order to compute T_v, one now has to consider splittings v = v_1 v_2 not only into strict subwords v_1, v_2, but also non-strict ones, say v_1 = ε, v_2 = v. Actually, a small modification of the original version of CYK suffices to accommodate deletion and unit rules: just close the fields T_v under the rule "if B ∈ T_v and A ⇒* B, add A to T_v". This can be done efficiently, as shown by Sippu and Soisalon-Soininen [20] or below. In fact, this is part of what is done in transformations to CNF as well: in order to eliminate deletion rules from a grammar, one computes the set of nullable nonterminals, and in order to eliminate unit rules, one computes the set of nonterminals that derive a given one [1, 7, 8]. But rather than using these relations to transform the grammar to CNF, one can use them in the CYK algorithm directly and leave the (2NF) grammar unchanged. We think the explicit use of auxiliary information is better than an unnecessary transformation of the input data. Furthermore, not only unit rules but also deletion rules are convenient in grammar writing; the latter are often used as a means to admit optional constituents. Moreover, using 2NF does not compromise the ease of presentation nor proof, as we will show below.

The advantage of using 2NF or C2F over CNF in CYK is not reflected well in the literature. We are only aware of one textbook which presents the CYK algorithm for grammars in C2F, [20] – a specialised book on parsing – and a PhD thesis [14]. Table 4 shows the asymptotic time complexity for the word problem that one obtains by combining the books' transformations to normal form with their corresponding versions of CYK.

    Book           via   Time            Stated
    Aho e.a.       CNF   2^{2|G|} · n³   −
    Hopcroft e.a.  CNF−  2^{2|G|} · n³   −
    Harrison       CNF   2^{2|G|} · n³   −
    Naumann e.a.   CNF−  2^{2|G|} · n³   −
    Schöning       CNF−  2^{2|G|} · n³   −
    Sippu e.a.     C2F   |G| · n³        +
    Wegener        CNF−  |G|² · n³       +
    Lewis e.a.     S2F   |G|³ · n³       +
    Rich           CNF−  |G|² · n³       +
    Autebert e.a.  CNFε  |G|² · n³       −
    This paper     2NF   |G| · n³        +

    The last column states whether or not the overall time complexity is given.

    Table 4: Upper complexity bounds for the word problem on arbitrary CFGs.

3 Preliminaries

Let Σ be a finite set, called the alphabet. A letter is an element of Σ, a word is a finite sequence of letters, and a language is a set of words. As usual, ε is used for the empty word, wv for the concatenation of the two words w and v, and |w| for the length of the word w, with |ε| = 0. Σ* is the set of all words over Σ, and Σ+ the set of all nonempty words. For a word w = a_1 . . . a_n and positions 1 ≤ i ≤ j ≤ n, we write w[i..j] for the subword a_i . . . a_j of w.

A context-free grammar (CFG) is a tuple G = (N, Σ, S, →) such that

• Σ is the finite alphabet or set of terminals;
lowercase letters like u. Two contextfree grammars G1 . the righthand sides α of rules A → α. Henceforth we simply use grammar for contextfree grammar. A ⇒+ ε} and the unit relation of G is ∗ UG := {(A. 4 The Word Problem for ContextFree Languages We suggest the following algorithm to test. β ∈ EG with A → αyβ}. B. The set of all symbols of G is N ∪ Σ. • S ∈ N is the start symbol . w to denote words over Σ. whether or not w ∈ L(G). Σ. ⇒+ and ⇒∗ are the nfold iteration. S. lowercase letters like a. words over V . →) is in binary normal form (2NF) if for all A → α we have α ≤ 2. N ∩ Σ = ∅. C to denote nonterminals. A grammar can be represented by enlisting. and Greek lowercase letters α. The derivation relation ⇒ ⊆ V + × V ∗ is deﬁned such that for all α. β ∈ V ∗ . v. the transitive closure. In our exposition of the CYK algorithm. Thus. αAβ ⇒ αγβ iﬀ there is a rule A → γ. We assume that each symbol occurs in some rule. which will always be denoted by the letter V in the following. 9 . Σ. respectively.• N is the ﬁnite nonempty set of nonterminals. We will use uppercase letters like A. so that V  ∈ O(G). two relations between symbols of a grammar play an important role. S. small letters like x. Preprocessing of the grammar. i. γ. and the reﬂexivetransitive closure of ⇒. for each nonterminal A. c to denote terminal symbols. . (a) Transformation of G into an equivalent G in 2NF. • → ⊆ N × (N ∪ Σ)∗ is a ﬁnite set of rules. y)  ∃α. The set of nullable symbols of the grammar G = (N. b. A grammar G = (N. for sentential forms. . Deﬁnition 1. for a given grammar G and a given word w. β. The language generated by G is L(G) := {w ∈ Σ∗  S ⇒+ w}. 1. G2 with the same terminal alphabet are equivalent if L(G1 ) = L(G2 ). →) is EG := {A  A ∈ N.t.e. . one measures the size of G by G := A∈N A→α Aα. The relations ⇒n . y. Deﬁnition 2. s. z to denote symbols.
(b) Computation of the set EG′ of nullable symbols in G′.
(c) Construction of the inverse unit relation ŬG′ of G′.
2. Building the CYK table for w and (G′, ŬG′).

Phase 2 applies a recognizer which is universal for grammars in 2NF with precomputed unit relation. We now present the details and complexity analysis of this algorithm, and then apply it to an example. Some readers may want to look at the example before checking the complexity bounds.

4.1 Transformation into 2NF

It is wellknown that every grammar can be transformed into an equivalent one in CNF, applying the four transformations Term, Bin, Del, Unit mentioned in the introduction. Every grammar can be transformed into one in 2NF by just performing the step Bin.

Lemma 1. For every grammar G = (N, Σ, S, →) there is an equivalent grammar G′ = (N′, Σ, S, →′) in 2NF computable in time O(|G|), such that |G′| = O(|G|) and |N′| = O(|G|).

Proof. The transformation Bin replaces each rule A → x1 x2 · · · xn with n > 2 by the rules

A → x1 ⟨x2 · · · xn⟩, ⟨x2 · · · xn⟩ → x2 ⟨x3 · · · xn⟩, . . . , ⟨xn−1 xn⟩ → xn−1 xn,

abbreviating suffixes of length ≥ 2 by new symbols ⟨xi · · · xn⟩ ∈ N′. This can be carried out in a single pass through the grammar, and the number of new nonterminals is bounded by the size of G.

The equivalence of G and G′ can easily be shown by transforming derivations with G into derivations with G′ and vice versa. In fact, G′ is a leftcover of G: any leftmost derivation with respect to G′ is mapped to a leftmost derivation with respect to G by replacing applications of A → x1 ⟨x2 · · · xn⟩ with applications of A → x1 x2 · · · xn and omitting applications of ⟨xi · · · xn⟩ → xi ⟨xi+1 · · · xn⟩. Conversely, any leftmost derivation with G is obtained from a leftmost derivation of G′ by this mapping (cf. [1]). Note that transformations to C2F or CNF do not admit parses to be recovered by such a simple homomorphism between strings of rules.
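As an illustration, the step Bin from Lemma 1 can be sketched in executable form. The rule representation (pairs of a left-hand side and a tuple of right-hand-side symbols) and the bracketed names for the fresh suffix nonterminals are our own choices, not notation fixed by the paper.

```python
def bin_transform(rules):
    """Replace each rule A -> x1 x2 ... xn with n > 2 by binary rules,
    abbreviating every suffix of length >= 2 by a fresh nonterminal.
    Rules are (lhs, rhs) pairs with rhs a tuple of symbols; identical
    suffixes share their fresh symbol, which only shrinks the result."""
    new_rules = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            # fresh symbol standing for the suffix rhs[1:]
            suffix = "<" + " ".join(rhs[1:]) + ">"
            new_rules.append((lhs, (rhs[0], suffix)))
            lhs, rhs = suffix, rhs[1:]
        new_rules.append((lhs, rhs))
    return new_rules
```

Each original rule of length n contributes n − 1 binary rules, so the output grammar has size linear in |G|, in line with Lemma 1.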
that A is also nullable. S. all the nonterminals A such that B occurs on the righthand side of a rule A → α. it is possible to compute EG in time linear in the grammar. C . For any grammar G = (N. Hence. Σ. and Sippu e.e. i. Lemma 2 suggests to construct EG by iteratively computing the Ei until Ei+1 = Ei .4. However. For this. →) be a grammar. the pair A. one shows Ei ⊆ EG for all i. all the other instructions can be executed in constant time yielding an overall running time of O(G). 11 . EN +1 = EN  . 3–5. Algorithm Nullable successively ﬁnds those nonterminals that can derive ε starting with those that derive it in one step and proceeding backwards through the rules. The space needed is O(N ) for the sets todo and nullable and O(G) for the array occurs. Σ.a.and the inner forloop is bounded by 2 · G. To show EG ⊆ EN  . Theorem 4. the inner forloop is executed a number of times that is given by the number of occurrences of B in the righthand side of a rule. prove by induction on n that A ⇒n ε implies A ∈ EN  . S. Exercise 2 in Section 4. and for a rule of the form A → BC or A → CB. The algorithm therefore starts by storing in an initially empty array occurs the set of all such A for each such B. (“⊆”) By deﬁnition. if A and B are linked through a rule A → B. Both are assumed to be empty initially. Since G is assumed to be in 2NF. we present in Fig. it is bounded by O(G). each of the forloop in lines 1–2. 2 an algorithm Nullable to do so for G in 2NF. [20]. algorithm Nullable computes EG in time and space O(G). If they are linked through a rule A → BC or A → CB then this depends on whether or not C is nullable too. need to be accessible without searching through the entire grammar. Furthermore. the combined number of iterations between the outer while. It maintains a set todo which can be implemented as a list with insertion of an element and removal of the ﬁrst one in constant time. Lemma 2. Proof. Lemma 3. Hence.3. →) in 2NF. as mentioned by Harrison [7]. 
For the whileloop note that no nonterminal B can be inserted into todo once it has been removed from it.14. because then it is on nullable. (“⊇”) By induction on i. and a set nullable which should be stored as a boolean array with constant access time. A na¨ implementation of this idea can easily result in quadratic ıve running time. This information is used to infer from the information that B is nullable. Let G = (N. This is then used in order to avoid quadratic running time. Since this seems not to be wellknown. As N is ﬁnite. For each nonterminal B. the array occurs actually holds for a rule of the form A → B the nonterminal A. Hence. Proof. Ei ⊆ Ei+1 ⊆ N for any i. Then EG = EN  where E0 := {A  A → ε} and Ei+1 := Ei ∪ {A  ∃α ∈ Ei+ with A → α}. and 6–8 can be executed in time O(G). Clearly. the predecessors of a nonterminal B. the whileloop can be executed at most N  times.2 Precomputation of the Nullable Symbols The following lemma gives an inductive characterisation of EG .
input: a CFG G = (N, Σ, S, →) in 2NF
Nullable(G) =
 1  for all A → B do
 2      occurs(B) := occurs(B) ∪ {A}
 3  for all A → BC do
 4      occurs(B) := occurs(B) ∪ {⟨A, C⟩}
 5      occurs(C) := occurs(C) ∪ {⟨A, B⟩}
 6  for all A → ε do
 7      nullable := nullable ∪ {A}
 8      todo := todo ∪ {A}
 9  while todo ≠ ∅ do
10      remove some B from todo
11      for all A, ⟨A, C⟩ ∈ occurs(B) with C ∈ nullable do
12          if A ∉ nullable then
13              nullable := nullable ∪ {A}
14              todo := todo ∪ {A}
15  return nullable

Figure 2: Algorithm Nullable for the lineartime computation of EG.

4.3 Constructing the Unit Relation

As a consequence of Lemma 3, we can also compute the unit relation efficiently:

Lemma 4. Let G be a grammar in 2NF. The unit relation UG and its inverse, ŬG := {(y, A) | (A, y) ∈ UG}, can be computed in time and space O(|G|).

Proof. According to Lemma 3, EG can be computed in time and space O(|G|). Then, for every rule A → y, A → By and A → yB with B ∈ EG, add the edge (y, A) to ŬG. This gives us ŬG as a list of length O(|G|), generally with multiple entries. (We could, but need not, remove duplicates in the given time and space bound.) We can similarly output a list of length |V| of nodes y associated with a list of their ŬG-predecessors A1, . . . , An, i.e. a representation of the graph IG as needed in Lemma 6 below.

We view ŬG as a relation on V and call IG = (V, ŬG) the inverse unit graph of G. To construct IG in linear time and space, first add a node for every symbol in V. Algorithm CYK below will need, for a given set M of symbols, the set of all x such that there is a y ∈ M and x ⇒* y.
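A minimal executable sketch of algorithm Nullable from Figure 2, using the same occurs/todo scheme; the concrete Python data layout (a dict of lists for occurs, a deque for todo, None marking the unit-rule entries) is our own assumption.

```python
from collections import defaultdict, deque

def nullable(rules):
    """Compute the set E_G of nullable nonterminals of a 2NF grammar
    in time O(|G|). Rules are (lhs, rhs) pairs, rhs a tuple of at most
    two symbols; () encodes an epsilon rule."""
    occurs = defaultdict(list)  # symbol -> list of (A, partner-or-None)
    null = set()
    todo = deque()
    for lhs, rhs in rules:
        if len(rhs) == 1:                    # A -> B (or A -> a, harmless)
            occurs[rhs[0]].append((lhs, None))
        elif len(rhs) == 2:                  # A -> BC: remember the partner
            occurs[rhs[0]].append((lhs, rhs[1]))
            occurs[rhs[1]].append((lhs, rhs[0]))
        else:                                # A -> eps
            if lhs not in null:
                null.add(lhs)
                todo.append(lhs)
    while todo:
        b = todo.popleft()
        for a, other in occurs[b]:
            # A becomes nullable if the partner (if any) is nullable too
            if a not in null and (other is None or other in null):
                null.add(a)
                todo.append(a)
    return null
```

Each occurs entry is inspected at most twice over the whole run, which is where the linear bound comes from.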
˘∗ Proof. and since UG is transitive. which can ˘ be quadratic in the size of UG and G. Let G = (N. y) ∈ UG . we do not build the relation UG . we only need to show that ⇒∗ subsumes UG . but use the inverse unit graph: Lemma 6. it is advisable to construct IG once and for all at the beginning only.or breadthﬁrst ˘∗ search [5]. and by Lemma 2. x) ∈ UG }. there are α. β. ˘∗ To compute such sets eﬃciently. n2 with 1 ≤ n1 + n2 < n. to determine for words w ∈ Σ∗ whether S ⇒+ w or not. If n > 0. one could drop the assumption that IG be given. y) ∈ ∗ UG . (“only if”) Suppose x ⇒∗ y. But when (x. This simply uses depth. (x. i. ˘ Theorem 7. Given its inverse ˘∗ unit graph IG . is just {x  ∃y ∈ M. Σ. y) ∈ UG } and hence equals ˘∗ ˘∗ UG (M ) := {x  ∃y ∈ M. then (x. x ⇒∗ v} of all symbols of G that derive v.r. Proof. S. y) ∈ UG . Since y is a single symbol. (y. y ∈ V we have: ∗ x ⇒∗ y iﬀ (x.t. It follows that for any M ⊆ V . β ∈ EG such that x → αyβ. Σ. αβ ∈ EG . since we will need to ˘∗ compute UG (M ) for several M . for any M ⊆ V . S.e. for all subwords v of w. S.Lemma 5. there is an n ∈ N with x ⇒n y. Then for any x ∈ V and any nonempty subword v of 13 . Hence. Let G be a grammar in 2NF with symbol set V . as it can be constructed from G in time and space O(G). Σ. If ∗ ∗ n = 0. generalizes naturally to the computation of the sets {x  x ∈ V. However. (x. there is x → γ with n−1 γ ⇒ y. z) ∈ UG ⊆ UG . i. γ = αzβ for suitable α. UG (M ) consists of all nodes x ∈ V that are reachable in IG from some node y ∈ M . ⊆) least reﬂexive and transitive relation that subsumes UG . →) be a grammar in 2NF. z such that z ∈ V . z ⇒n1 y and αβ ⇒n2 ε for some n1 . Let G = (N. Clearly. y) ∈ UG . By ∗ ∗ induction hypothesis we have (z. 4. y) ∈ UG because UG is reﬂexive. But then ∗ ∗ ∗ (x. (V. →). the set of all x ∈ V which derive some ∗ element of M .4 Building the CYK Table The membership problem for a contextfree grammar G = (N.e. ∗ (“if”) Since UG is the (w. 
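The reachability computation of Lemma 6 can be sketched as a breadth-first search over the inverse unit graph; the adjacency-dict representation of IG is our own convention.

```python
from collections import deque

def unit_closure(inv_graph, targets):
    """U*_G(M): all symbols x with x =>* y for some y in M via unit steps,
    computed as the nodes reachable from M in the inverse unit graph
    (time and space O(|V| + |edges|), as in Lemma 6).
    inv_graph maps y to the list of A with (A, y) in the unit relation."""
    seen = set(targets)
    todo = deque(targets)
    while todo:
        y = todo.popleft()
        for a in inv_graph.get(y, ()):
            if a not in seen:
                seen.add(a)
                todo.append(a)
    return seen
```

Since CYK needs this closure for many different sets M, the graph is built once and only the search is repeated.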
the set UG (M ) can be computed in time and space O(G). UG ) its inverse + unit graph. →) be a grammar. UG (M ) can be computed in time and space O(V  + UG ) = O(G). and w ∈ Σ . The set of all nodes in a graph reachable from a given set of nodes can be computed in time and space O(n + m) where n is the number of all nodes and m the number of all edges. These sets can be constructed inductively. Then for all x. and then x ⇒∗ y since αβ ⇒∗ ε.
14 . Hence. with a boolean array of size N  as entries. hence. Then there is ∗ a y ∈ Tv s. also x ∈ Tv2 = Tv . where the sets Tv and the auxiliary sets Tv are deﬁned by {v}. [8.g. on a G in 2NF and a word w = a1 · · · an of length n. computing and storing T for each v occurrence (i. for each 1 ≤ i ≤ j ≤ n the set Tai ···aj of Theorem 7. 7. the derivation is of the form x ⇒ y1 y2 ⇒n−1 v. y) ∈ UG . 3 contains an implementation of the procedure that is implicitly given in Thm. if v2 = ε then v1 = ε and y1 ∈ Tv1 by hypothesis. Equally. ∃z2 ∈ Tv2 with v = v1 v2 . In case (ii). It computes. (“if”) We prove this by induction on v. Algorithm CYK in Fig. and by Lemma 5.w we have x ⇒∗ v iﬀ x ∈ Tv . m2 < n. y2 ) ∈ UG and. A → z1 z2 }. or (ii) x → y1 y2 with y1 . plus a terminal in the ﬁelds on the main diagonal. If v1 . ∃z1 ∈ Tv1 . which is here called Ti.t. we get y1 ∈ Tv1 and y2 ∈ Tv2 both from the hypothesis and therefore x ∈ Tv which entails x ∈ Tv . and hence y ∈ Tv by hypothesis. if v = 1 + Tv := {A  ∃v1 . Suppose v = 1. ∗ Hence. Since Tv is closed under the inverse of UG . The base case of n = 0 means x = v. But then x ∈ Tv and. Then there is a splitting v = v1 v2 of v such that y1 ⇒m1 v1 and y2 ⇒m2 v2 with m1 . Suppose x ∈ Tv . But then we have x ⇒∗ y ⇒ z1 z2 ⇒∗ v1 z2 ⇒∗ v1 v2 = v. since Tv2 is closed under the inverse of ∗ UG . and x ⇒ y1 y2 ⇒∗ y2 . (“only if”) Suppose x ⇒∗ v. we also have x ∈ Tv . (x. else ˘∗ Tv := UG (Tv ). (x. If v1 = ε then v2 = ε and we have y2 ∈ Tv2 by the hypothesis. e. and in particular v = 1 since x ∈ V .2 These are stored in an array of size n2 . By the deﬁnition of Tv we have y = v which immediately yields x ⇒∗ v. Then y ∈ Tv is only possible if there is a splitting v = v1 v2 of v into two nonempty subwords and a rule x → z1 z2 such that z1 ∈ Tv1 and z2 ∈ Tv2 . the hypothesis yields z1 ⇒∗ v1 and z2 ⇒∗ v2 . y2 ∈ V . By induction on n. Algorithm CYK assumes the inverse unit graph IG to have been computed already. In case (i). x ∈ Tv . 
j) separately wastes some time and space. v2 ∈ Σ . 7]: these do not close the table entries under UG as done here 2 Note that if a subword v occurs several times in w. which also yields x ∈ Tv1 = Tv . There is n ≥ 0 such that x ⇒n v. Proof. v2 both diﬀer from ε. x ⇒∗ y. There may be applications where it pays oﬀ to implement T as an array indexed by subwords. by Lemma 5. all ﬁelds in the boolean arrays are set to false.j . Now suppose v ≥ 2. we show that x ∈ Tv . we have y ⇒n−1 v. the ﬁrst rule applied in the derivation is of the form (i) x → y with y ∈ V . Now suppose that n > 0. Since G is 2NF and v = ε. Note that algorithm CYK only diﬀers minimally from most textbook versions ˘∗ of CYK. Initially.
input: a CFG G = (N, Σ, S, →) in 2NF, its inverse unit graph IG = (V, ŬG), and a word w = a1 . . . an ∈ Σ+
CYK(G, w) =
 1  for i = 1, . . . , n do
 2      Ti,i := Ŭ*G({ai})
 3  for j = 2, . . . , n do
 4      for i = j − 1, . . . , 1 do
 5          Ti,j := ∅
 6          for h = i, . . . , j − 1 do
 7              for all A → yz do
 8                  if y ∈ Ti,h and z ∈ Th+1,j then
 9                      Ti,j := Ti,j ∪ {A}
10          Ti,j := Ŭ*G(Ti,j)
11  if S ∈ T1,n then return yes else return no

Figure 3: Algorithm CYK for the word problem of CFG in 2NF.

Also note that CYK presupposes the input word w to be nonempty. The case of the input ε can – as with all other CYK versions – easily be dealt with separately, since EG needs to be computed anyway and ε ∈ L(G) iff S ∈ EG. Lemma 8 expects G to be in 2NF and IG given already.

Lemma 8. Given a grammar G = (N, Σ, S, →) in 2NF, its graph IG and a word w ∈ Σ+, Algorithm CYK decides in time O(|G| · |w|³) and space O(|G| · |w|²) whether or not w ∈ L(G).

Proof. The forloop in lines 1–2 takes time O(|w| · |V|). The inner loop in lines 7–9 is executed O(|w|³) times, each iteration taking O(|G|) steps, and each step being executed in constant time. Line 10 is executed O(|w|²) times, taking O(|G|) steps for each execution according to Lemma 6. This amounts to O(|w| · |V| + |G| · |w|³ + |G| · |w|²) = O(|G| · |w|³) altogether. The space needed to store T is O(|w|² · |N|) and the one to compute the Ŭ*G(Ti,j) is O(|G|). Hence, we need space O(max{|w|² · |N|, |G|}) ≤ O(|w|² · |G|) altogether.

Corollary 9. The word problem for contextfree grammars G and words of length n can be solved in time O(|G| · n³) and space O(|G| · n²).

Proof. Solving the word problem for arbitrary contextfree grammars requires the transformation and precomputation to be done beforehand. First, transform G into an equivalent G′ in 2NF with |G′| = O(|G|). By Lemma 1, this can be done in time and space O(|G|). Then, compute the set
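Putting the pieces together, algorithm CYK of Figure 3 can be sketched as follows. For self-containment this sketch computes the nullable symbols by a naive fixpoint instead of the linear-time algorithm Nullable, and it rebuilds the unit-predecessor map inline, so the constant factors of Lemma 8 are not preserved; only the table-filling logic is meant to be faithful.

```python
from collections import defaultdict

def cyk(rules, start, word):
    """Word problem for a 2NF grammar (unit and epsilon rules allowed).
    rules: list of (lhs, rhs) with rhs a tuple of at most 2 symbols."""
    # nullable symbols E_G (naive fixpoint for brevity)
    null = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in null and all(s in null for s in rhs):
                null.add(lhs)
                changed = True
    if word == "":
        return start in null
    # inverse unit relation: A => y via A -> y, or A -> By / A -> yB
    # with B nullable (Definition 2)
    pred = defaultdict(set)
    for lhs, rhs in rules:
        if len(rhs) == 1:
            pred[rhs[0]].add(lhs)
        elif len(rhs) == 2:
            if rhs[0] in null:
                pred[rhs[1]].add(lhs)
            if rhs[1] in null:
                pred[rhs[0]].add(lhs)

    def closure(symbols):
        # reflexive-transitive closure under inverse unit steps (line 2/10)
        todo, result = list(symbols), set(symbols)
        while todo:
            y = todo.pop()
            for a in pred[y]:
                if a not in result:
                    result.add(a)
                    todo.append(a)
        return result

    n = len(word)
    T = {}
    for i in range(n):
        T[i, i] = closure({word[i]})
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            cell = set()
            for h in range(i, j):          # split into two nonempty parts
                for lhs, rhs in rules:
                    if len(rhs) == 2 and rhs[0] in T[i, h] and rhs[1] in T[h + 1, j]:
                        cell.add(lhs)
            T[i, j] = closure(cell)
    return start in T[0, n - 1]
```

Splittings with an empty part never have to be considered explicitly: they are absorbed by the unit-closure step, exactly as in Theorem 7.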
of nullable nonterminals and the inverse unit graph for G′ in time and space O(|G′|) according to Lemma 2 and Lemma 4. Finally, execute CYK in time O(|G′| · n³) and space O(|G′| · n²) according to Lemma 8. Since |G′| is O(|G|), this yields an asymptotic running time of O(|G| · n³) and a space consumption of O(|G| · n²).

4.5 An Example

We finish this section by executing the entire procedure on the following example grammar G over the alphabet Σ = {a, b, 0, 1, (, ), +, ∗}.

E → T | E+T
T → F | T∗F
F → aI | bI | (E)
I → 0I | 1I | ε

This is a standard example with L(G) being the set of arithmetic expressions over the regular set of identifiers (a ∪ b)(0 ∪ 1)*. This example (in slightly modified form) can also be found in the standard textbook by Hopcroft e.a. [8], where it is used to illustrate the transformation into CNF. Note that G has 4 nonterminals, 10 rules, and is of size 29.

First we need to transform G into 2NF by introducing new nonterminals X, Y, Z for the righthand sides that are larger than 2 symbols. The result is the following grammar G′ with symbol set V.

E → T | EX        X → +T
T → F | TY        Y → ∗F
F → aI | bI | (Z  Z → E)
I → 0I | 1I | ε

It has 7 nonterminals, 13 rules, and is of size 35. Since most textbook versions of CYK would require G to have been transformed into CNF, we state the corresponding figures for the resulting CNF grammar without presenting it explicitly. If it is obtained in the order Del → Unit → Term → Bin, then the result has 15 nonterminals, 33 rules, and is of size 83.

It is easy to see that only I is nullable, i.e. EG′ = {I}. In order to be able to compute Ŭ*G′(M) for any M ⊆ V we first determine the unit relation: UG′ = {(E, T), (T, F), (F, a), (F, b), (I, 0), (I, 1)}. Hence, we have, for instance,

Ŭ*G′({a}) = {a, F, T, E}
Ŭ*G′({b}) = {b, F, T, E}
Ŭ*G′({0}) = {0, I}
Ŭ*G′({T}) = {T, E}
Ŭ*G′({F}) = {F, T, E}
Ŭ*G′({I}) = {I}

etc.
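The grammar sizes claimed in this example can be checked mechanically against the size measure of Definition 1, where a rule A → α contributes |Aα| = 1 + |α|; the tuple encoding of the rules is our own.

```python
def grammar_size(rules):
    """||G|| = sum over rules A -> alpha of |A alpha| = 1 + |alpha|
    (the size measure of Definition 1)."""
    return sum(1 + len(rhs) for _, rhs in rules)

# the example grammar G ...
g = [("E", ("T",)), ("E", ("E", "+", "T")),
     ("T", ("F",)), ("T", ("T", "*", "F")),
     ("F", ("a", "I")), ("F", ("b", "I")), ("F", ("(", "E", ")")),
     ("I", ("0", "I")), ("I", ("1", "I")), ("I", ())]

# ... and its 2NF version G' with the new nonterminals X, Y, Z
g2 = [("E", ("T",)), ("E", ("E", "X")), ("X", ("+", "T")),
      ("T", ("F",)), ("T", ("T", "Y")), ("Y", ("*", "F")),
      ("F", ("a", "I")), ("F", ("b", "I")), ("F", ("(", "Z")),
      ("Z", ("E", ")")), ("I", ("0", "I")), ("I", ("1", "I")), ("I", ())]
```

Evaluating these reproduces the figures in the text: G has 4 nonterminals, 10 rules and size 29, while G′ has 7 nonterminals, 13 rules and size 35.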
To finish the example, consider the word w = (a0+b)∗a. Executing CYK on w creates the table depicted in Fig. 4. It is to be understood as follows. A nonterminal A in the bottom half of an entry at row i and column j belongs to the auxiliary set T′v for v = w[i, j], i.e. it is in there because of a rule of the form A → yz such that y and z occur at certain positions in the same row to the left and in the same column below. A nonterminal in the top half of the entry belongs to Tv \ T′v, i.e. it is in there because it is a predecessor in ŬG of some nonterminal from the bottom half. Hence, the entire entry represents Tv. The bottom halves of the entries on the main diagonal always form the input word when read from topleft to bottomright.

[The triangular table of entries Ti,j for w = (a0+b)∗a is not reproduced here.]

Figure 4: An example run of algorithm CYK.

4.6 Parsing with CYK

The CYK recognition algorithm can be turned into a parser either by constructing parse trees from the recognition table or by inserting trees rather than nonterminals into the table. We only discuss how to construct parse trees from the table. In the case of CYK for grammars in CNF, define a function extract(A, i, j)
and all possible parse trees can be obtained from the table in ﬁnitely many steps. i. t ∈ extract (x. i. extract(A. Then. it is easy to recover w’s parse trees with respect to G from those for G obtained by CYK. = s ∈ extract(y. j). where A(s. (Note that the missing parse trees with subderivations B ⇒+ B also get lost if we turn the grammar into CNF. Our CYK uses grammars in 2NF. j) extends such trees by adding a root labelled A on top. In this way. First. there are only ﬁnitely many derivations + A ⇒ x. Using extract. if A ⇒+ x: ˘ extract(A. A ⇒+ A does not occur). i. If the grammar is cyclic. j)} if i < j. and by using them in extract(A. j) returns trees with yield ai · · · aj and root labelled x. the number of parse trees of a given w is ﬁnite. we can turn them into parse + trees: in this case. t ∈ extract(C. j) = s ∈ extract(B.j an auxiliary function extract (x. i. for each (A. i. so their parse trees can be precomputed and extract(A. h + 1. i. i. we get a correct but incomplete parser. j) = {A(t)  x ∈ Ti. Since these branches generally do not correspond to grammar rules A → x. we consider all derivations A ⇒+ x as inessential variants of each other and represent them as a unary branch. this gives a complete parser for G. One just has to undo the transformation of kary branching rules to binary branching ones on the trees. Since grammars in CNF are acyclic (i. the ﬁnitely many “canonical” parse trees with respect to G can be transformed into parse trees with respect to G. h). t ∈ extract(z. for x ∈ Ti. which may be cyclic and hence may have inputs with inﬁnitely many parses. We adapt the tree extraction algorithm so that it returns a ﬁnite number of “essentially diﬀerent” analyses.j . h). j) would put them on top of the trees returned by extract (x. t)  A → BC.j the trees with root labelled A and yield ai · · · aj by if i = j {A(ai )  A → ai } {A(s. If G is acyclic. i ≤ h < j. 
we can still precompute the ﬁnite number of derivations A ⇒+ x which have no subderivations B ⇒+ B.e. apply the covering homomorphism on left parses. j) ∪ extract (A.that returns for A ∈ Ti. t) is the tree with root labelled A and immediate subtrees s and t. for A ∈ Ti. j)} if i = j if i < j. each having a nonempty yield. j) {x} {x(s. i. i. i ≤ h < j. If the grammar is acyclic. Since G is a leftcover of G by Lemma 1. i. i. which are leaves in case i = j or branch at the root to two subtrees. In this way. t)  x → yz. A ∈ U + ({x})} G extract (x. i. j). we would obtain a ﬁnite set of “canonical” parse trees. j). extract(A. 18 .) Suppose CYK is used with a 2NF grammar G for a given grammar G and a word w.j . x) ∈ UG .e. h + 1. i. the extracted trees are not parse trees in the strict sense.
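The basic extract function for CNF tables described in this section can be sketched as follows; the nested-tuple encoding of trees and the explicit table argument are our own conventions, and the 2NF refinement with extract′ and the unit closure follows the same recursive pattern.

```python
def extract(table, rules, word, A, i, j):
    """All parse trees with root A and yield word[i..j] for a grammar in
    CNF, read off a filled CYK table (table[i, j] = set of nonterminals
    deriving word[i..j]). Trees are nested tuples (label, subtree...)."""
    if A not in table[i, j]:
        return []
    trees = []
    if i == j:
        for lhs, rhs in rules:
            if lhs == A and rhs == (word[i],):
                trees.append((A, word[i]))
    for lhs, rhs in rules:
        if lhs == A and len(rhs) == 2:
            b, c = rhs
            for h in range(i, j):  # every split position yields its trees
                if b in table[i, h] and c in table[h + 1, j]:
                    for s in extract(table, rules, word, b, i, h):
                        for t in extract(table, rules, word, c, h + 1, j):
                            trees.append((A, s, t))
    return trees
```

On an unambiguous grammar the call for the start symbol and the full span returns exactly one tree; ambiguity shows up as several trees in the result list.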
The precomputations that this CYK version requires need to be done in the textbook versions as well where they are part of the grammar transformation: the computation of the nullable symbols is necessary for step Del. ⇒∗ . The 2NF variant does not give this impression. They can be left out. and the ﬁlling of the CYK table.lmu. (2) Since the CNF transformation comes with an at least quadratic blowup in grammar size. and it will therefore make it easier for students to remember it. as is done in many textbooks. the construction of the inverse unit graph could also be used to implement step Unit. it emphasises the fact that the computation of such relations is a necessary means to solving the membership problem. Unit and Del in favour of precomputed auxiliary 3 http://www. one may be inclined to think that the complexity of the word problem for arbitrary CFGs is at least quadratic in the grammar size. our algorithm CYK only diﬀers in two lines (the closure steps in lines 2 and 9) from those that work on CNF input.5 Conclusion Teaching CYK with pretransformation into 2NF The variant of CYK we propose here is not more diﬃcult to present than those in the standard textbooks. the computation of the nullable symbols. By using them in the algorithm one cannot do the Bin. Moreover.and Deltransformations in the wrong order and automatically avoids an exponential blowup of the grammar. the linear time and space algorithm for computing the predecessors w. then using the 2NF variant bears two distinct advantages over the CNF variant: (1) It does not contain hidden obstacles like the dependency of the blowup on the order in which the pretransformation steps are done. and algorithm CYK. We believe that this is wrong in cases where the blowup in the grammar is suboptimal. It visualises the entire procedure presented here on arbitrary input contextfree grammars. 
many textbooks ignore the complexity of the normal form transformation or ignore the dependency of CYK’s complexity on the size of the input grammar.t.r.de/SeeYK 19 . The essentials are easily seen to be the 2NF transformation. Finally. Avoiding normal forms at all We have argued for omitting the grammar transformation steps Term. the inverse chain relation. note that in the previous two sections we have presented one way of presenting this variant of CYK. The correctness proofs are of course not essential in order to obtain an eﬃcient procedure. On the other hand. As shown in the introduction. However. It is obviously possible to present our variant of CYK without the complexity analyses. it shows simulations of the computation of the nullable symbols. one may argue that the essence of CYK is the dynamic programming technique and that therefore the actual algorithm should not contain computation steps which obfuscate the view onto that technique. We believe that using these relations explicitly in the CYK algorithm is better from a learning point of view than to use them in order to transform the grammar. A tool is available that is meant to support teaching this variant of CYK3 . If this is done. In particular.tcs.ifi.
Introduction to Formal Language Theory. T. Communications of the ACM. Springer. [2] J. B ∈ Tu A → αB · β ∈ Tvu where the ﬁeld identiﬁers are subwords of the input sentence and ·ε = ε·. A. editors. 2007. and J. 35th Int. SpringerVerlag. [8] J. on Automata. PrenticeHall. London. volume 5126 of LNCS. Berstel. Languages and Programming. McGrawHill. Part II. MIT Press. Languages. 1997. and Compiling. Programming Languages and Their Compilers. ICALP’08. Axelsson. Translation. Lange. and Computation. 2001. Rozenberg. Schwartz. E. Leiserson. Introduction to Automata Theory. Reading (MA): AddisonWesley. Cormen. pages 111–174. Analyzing contextfree grammars using an incremental SAT solver. [7] M. 1970. Harrison. [4] J.relations. Boasson. Berlin. 3 edition. E. Word Language Grammar. R. 1978. J. Ullman. In Proc. C. Heljanko. H. D. We therefore conclude that the pretransformation into 2NF and the according variant of CYK forms the best balance between didactic needs and the aim for eﬃciency. and L. and M. [3] R. References [1] A. New York. Hopcroft. but also “dotted rules” A → α · β where A → αβ is a grammar rule and α ⇒∗ v. and R. To do so. 1990. [6] Jay Earley. D. K. The algorithm would have to close the table under the rules A→α A → ·α ∈ Tε A ⇒∗ B. Motwani. volume 1. Contextfree languages and pushdown automata. Aho and J. AddisonWesley. pages 410–422. In A. However. 1972. One may push this a step further and teach a version of CYK which omits the transformation Bin as well and only implicitly works with binary branching rules. An eﬃcient contextfree parsing algorithm. The Theory of Parsing. Handbook of Formal Languages. Cambridge. and seems less memorizable than the explicit transformation to 2NF. V. Coll. volume Volume I: Parsing. Rivest. 1970. Courant Institute of Mathematical Sciences. AddisonWesley series in computer science. Salomaa and G. [5] T. Cocke and J. Ullman. this would be a step away form the simplicity of CYK towards Earley’s algorithm [6]. 13(2):94–102. 
Autebert. 20 . B → β· ∈ Tv A ∈ Tv A → α · aβ ∈ Tv A → αa · β ∈ Tva A → α · Bβ ∈ Tv . L. New York. Introduction to Algorithms. one would not only insert symbols x ∈ V into the ﬁelds Tv of the recognition table.
[23] Terry Winograd. Eﬃcient Parsing for Natural Language: A Fast Algorithm for Practical Systems. [14] M. Eﬃcient tabular LR parsing. Lewis and C. 1988. PrenticeHall. Canada. Automata. Wegener. 1998. Language as a Cognitive Process. Acta Inf. Proefschrift Nijmegen. Leiß. Springer. Addison Wesley. BRNGLR: a cubic Tomitastyle GLR parsing algorithm. Massachusetts. 1993.[9] T. USA. Teubner. SpringerVerlag. Rich. MA. Okhotin. [12] H. Linguistic Parsing and Program Transformation. 1967. Leermakers.R. An eﬃcient recognition and syntax analysis algorithm for contextfree languages. 21 . pages 135–142. Elements of the theory of computation. In Proceedings of the 34th Annual Meeting of the ACL. [21] M. Prentice Hall. snd edition. Nederhof. 1994.H. Spektrum Akademischer Verlag GmbH. H. 1996. PhD thesis. 44(6):427–461. Sippu and E. Economopoulos. Proceedings of the Conference.. Theoretische Informatik. A. 9th International Workshop. volume 1092 of LNCS. Johnstone. Computability. Boolean grammars. [15] M. Air Force Cambridge Research Laboratory. NJ. Norwell. [24] D. 2007. Tomita. Information and Control. 1983. 1994. EATCS Monographs on Theoretical Computer Science. Information and Computation. Younger.J. pages 239–246.. pages 388–402.J. Bedford. and Complexity: Theory and Applications. 10(2):372–375. Parsing. HeidelbergBerlin. Theoretische Informatik — kurzgefaßt. SoisalonSoininen. Association for Computational Linguistics. Recognition and parsing of contextfree languages in time n3 . In 27th Annual Meeting of the ACL.I: Languages and Parsing. CSL’95. Naumann and H. Scott. [20] S. Parsing Theory. [19] E. In Computer Science Logic. [17] E. Kasami. 1996. Morristown. Vancouver. Teubner. 2004. Technical Report AFCRL65758. Bounded ﬁxedpoint deﬁnability and tabular recognition of languages. 1985. Satta. June 1989. How to cover a grammar. [11] H. [13] S. Hochschultascheno buch. [22] I. Berlin. 2007. 2000. Nederhof and G. USA. Sch¨ning. and R. Wiskunde en Informatica. Stuttgart. 
Papadimitriou. New York. [18] U. [16] A. Association for Computational Linguistics. Vol. 1965. Langer. [10] R. 194(1):19– 48. Volume 1: Syntax. Kluwer Academic Publishers.