You are on page 1of 5

chomsky normal form

in computer science, a formal grammar is in chomsky normal form if and only if all
production rules are of the form:

a ? bc or
a ? a or
s ? e

where a, b and c are nonterminal symbols, a is a terminal symbol (a symbol that


represents a constant value), s is the start symbol, and e is the empty string.
also, neither b nor c may be the start symbol.

every grammar in chomsky normal form is context-free, and conversely, every


context-free grammar can be efficiently transformed into an equivalent one which
is in chomsky normal form.

with the exception of the optional rule s ? e (included when the grammar may
generate the empty string), all rules of a grammar in chomsky normal form are
expansive; thus, throughout the derivation of a string, each string of terminals
and nonterminals is always either the same length or one element longer than the
previous such string. the derivation of a string of length n is always exactly 2n-
1 steps long. furthermore, since all rules deriving nonterminals transform one
nonterminal to exactly two nonterminals, a parse tree based on a grammar in
chomsky normal form is a binary tree, and the height of this tree is limited to at
most the length of the string.

because of these properties, many proofs in the field of languages and


computability make use of the chomsky normal form. these properties also yield
various efficient algorithms based on grammars in chomsky normal form; for
example, the cyk algorithm that decides whether a given string can be generated by
a given grammar uses the chomsky normal form.

the chomsky normal form is named after noam chomsky, the us linguist who invented
the chomsky hierarchy.

alternative definition

some sources define chomsky normal form in the following slightly different way:

a formal grammar is in chomsky normal form if and only if all production rules are
of the form:

a ? bc or
a ? a

where a, b and c are nonterminal symbols, and a is a terminal symbol. when using
this definition, b or c may be the start symbol.

this definition differs from the previous one in that it precludes the possibility
that the grammar will generate the empty string, e. it remains true that any
context-free accepting a language l can be efficiently transformed into a grammar
in chomsky normal form that accepts l - {e}. the principle advantage of this later
definition is that proofs are generally marginally simpler, due to the fact that
each step in a derivation never decreases the length of the resulting string. its
disadvantage, of course, is that special consideration is needed if the original
grammar generated e.
--------------------------------

backus normal form (bnf)

the syntax definitions on this site use a variant of backus normal form (bnf) that
includes the following:

[ ] brackets enclose optional items.


{ } items of which only one is required.
| a vertical pipe separates alternatives (or)
underlined the default option
... the preceding item can be repeated (as many times as needed)
upper case command that should be entered as shown
lowercase&italic variable that should be replaced with an appropriate value

where a command is too long to fit on a single line it is wrapped


with each subsequent line indented:

command option option option option option option option option


option option option option option option option option
option option option option

where there are several optional items and you can choose any,several or all
these are shown in a list, all indented to the same degree:

command option option


option
option

if further command options follow they will be indented:

command option option


option
option
option option option

if the options have sub options they are individually indented:

command option option


option option option
option
option option
option
option option option

where there are several optional items and you can choose only one from
the list, they are indented and separated by a blank line

command option option

option

option

this is the same as command option {option | option | option}

where the layout makes it possibe, braces and the pipe symbol are the preferred
method to show alternative options.
this syntax is not always followed rigourously e.g. on a page where almost
everything is a variable, to improve readability some variables may be left as
lower case but not italic.
similarly option [,option] [,...] may be abbreviated to just option ,... this is
to avoid having a surfeit of brackets around the more complex commands.
bold text is used to highlight some key items but does not have any syntactic
importance.

in some older texts bnf is also referred to as backus-naur form.

backus�naur form
from wikipedia, the free encyclopedia
jump to: navigation, search

the backus�naur form (also known as bnf, the backus�naur formalism, backus normal
form, or panini�backus form) is a metasyntax used to express context-free
grammars: that is, a formal way to describe formal languages.

bnf is widely used as a notation for the grammars of computer programming


languages, instruction sets and communication protocols, as well as a notation for
representing parts of natural language grammars (for example, meter in sanskrit
poetry.) most textbooks for programming language theory and/or semantics document
the programming language in bnf.

there are many extensions of and variants on bnf.

history

john backus created the notation in order to express the grammar of algol. at the
first world computer congress, which took place in paris in 1959, backus presented
"the syntax and semantics of the proposed international algebraic language of the
zurich acm-gamm conference", a formal description of the ial which was later
called algol 58. the formal language he presented was based on emil post's
production system. generative grammars were an active subject of mathematical
study, e.g. by noam chomsky, who was applying them to the grammar of natural
language.[1] [2]

peter naur later simplified backus's notation to minimize the character set used,
and, at the suggestion of donald knuth[3], his name was added in recognition of
his contribution.

the backus�naur form or bnf grammars have significant similarities to pa?ini's


grammar rules, and the notation is sometimes also referred to as panini�backus
form.

introduction

a bnf specification is a set of derivation rules, written as

<symbol> ::= <expression with symbols>

where <symbol> is a nonterminal, and the expression consists of sequences of


symbols and/or sequences separated by the vertical bar, '|', indicating a choice,
the whole being a possible substitution for the symbol on the left. symbols that
never appear on a left side are terminals.

as an example, consider this possible bnf for a u.s. postal address:


<postal-address> ::= <name-part> <street-address> <zip-part>

<name-part> ::= <personal-part> <last-name> <opt-jr-part> <eol>


| <personal-part> <name-part> <eol>

<personal-part> ::= <first-name> | <initial> "."

<street-address> ::= <opt-apt-num> <house-num> <street-name> <eol>

<zip-part> ::= <town-name> "," <state-code> <zip-code> <eol>

this translates into english as:

* a postal address consists of a name-part, followed by a street-address


part, followed by a zip-code part.
* a name-part consists of either: a personal-part followed by a last name
followed by an optional "jr-part" (jr., sr., or dynastic number) and end-of-line,
or a personal part followed by a name part (this rule illustrates the use of
recursion in bnfs, covering the case of people who use multiple first and middle
names and/or initials).
* a personal-part consists of either a first name or an initial followed
by a dot.
* a street address consists of an optional apartment specifier, followed
by a house number, followed by a street name, followed by an end-of-line.
* a zip-part consists of a town-name, followed by a comma, followed by a
state code, followed by a zip-code followed by an end-of-line.

note that many things (such as the format of a first-name, apartment specifier, or
zip-code) are left unspecified here. if necessary, they may be described using
additional bnf rules.

bnf's syntax may be represented with a bnf like the following:

<syntax> ::= <rule> | <rule> <syntax>


<rule> ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace> "::="
<opt-whitespace> <expression> <line-end>
<opt-whitespace> ::= " " <opt-whitespace> | ""
<expression> ::= <list> | <list> "|" <expression>
<line-end> ::= <opt-whitespace> <eol> | <line-end> <line-end>
<list> ::= <term> | <term> <opt-whitespace> <list>
<term> ::= <literal> | "<" <rule-name> ">"
<literal> ::= '"' <text> '"' | "'" <text> "'"

this assumes that no whitespace is necessary for proper interpretation of the


rule. <eol> to be the appropriate line-end specifier (in ascii, carriage-return
and/or line-feed, depending on the operating system). <rule-name> and <text> are
to be substituted with a declared rule's name/label or literal text, respectively.

variants

there are many variants and extensions of bnf, generally either for the sake of
simplicity and succinctness, or to adapt it to a specific application. one common
feature of many variants is the use of regexp repetition operators such as * and
+. the extended backus-naur form (ebnf) is a common one. in fact the example above
is not the pure form invented for the algol 60 report. the bracket notation "[]"
was introduced a few years later in ibm's pl/i definition but is now universally
recognised. abnf is another extension commonly used to describe ietf protocols.
parsing expression grammars build on the bnf and regular expression notations to
form an alternative class of formal grammar, which is essentially analytic rather
than generative in character.

many bnf specifications found online today are intended to be human readable and
are non-formal. these often include many of the following syntax rules and
extensions:

* optional items enclosed in square brackets. e.g. [<item-x>]


* items repeating 0 or more times are enclosed in curly brackets. e.g. <word>
::= <letter> { <letter> }
* items repeating 1 or more times are followed by a '+'
* terminals may appear in bold and nonterminals in plain text rather than
using italics and angle brackets
* repetition of an item is signified by an asterisk placed after that item:
�*�
* alternative choices in a production are separated by the �|� symbol. e.g.,
<alternative-a> | <alternative-b>
* where items need to be grouped they are enclosed in simple parenthesis

---------------------------------------------

greibach normal form

from wikipedia, the free encyclopedia


jump to: navigation, search

in computer science, to say that a context-free grammar is in greibach normal form


(gnf) means that all production rules are of the form:

a \to \alpha x

or

s \to \epsilon

where a is a nonterminal symbol, a is a terminal symbol, x is a (possibly empty)


sequence of nonterminal symbols not including the start symbol, s is the start
symbol, and e is the null string.

observe that the grammar must be without left recursions.

every context-free grammar can be transformed into an equivalent grammar in


greibach normal form. (some definitions do not consider the second form of rule to
be permitted, in which case a context-free grammar that can generate the null
string cannot be so transformed.) this can be used to prove that every context-
free language can be accepted by a non-deterministic pushdown automaton.

given a grammar in gnf and a derivable string in the grammar with length n, any
top-down parser will halt at depth n.

greibach normal form is named after sheila greibach.

---------------------------------

You might also like