Course Notes for
CS112 Structure of Programming Languages
LANGUAGE GENERATORS
by
LAARNI C. DESENGAŇO - PANCHO
Department of Computer Science and IT
Bicol University

Language Generators
• Grammar
A mechanism (or set of rules) by which a language is generated
Defined by the following:
• A set of non-terminal symbols, N
  – Do not actually appear in strings
• A set of terminal symbols, T
  – Appear in strings
• A set of productions, P
  – Rules used in string generation
• A starting symbol, S
Example: Regular grammar to generate a binary string containing an odd number of 1s
N = {A,B} T = {0,1} S = A P =
A → 0A | 1B | 1
B → 0B | 1A | 0

Example: Regular grammars CANNOT generate strings of the form a^n b^n
• Grammar needs some way to count the number of a’s and b’s to make sure they are the same
• Any regular grammar (or FSA) has a finite number, say k, of different states
• If n > k, not possible

If we could add a “memory” of some sort we could get this to work
• Context-free Grammars
Can be modeled by a Push-Down Automaton (PDA)
• FSA with added push-down stack
Productions are of the form:
• <non> → α, where <non> is a nonterminal and α is any sequence of terminals and nonterminals
  – note rhs is more flexible now
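The contrast between finite states and added "memory" can be sketched in Python. This is a minimal illustration and not part of the original notes; the function names `has_odd_ones` and `is_anbn` are hypothetical. The first function needs only two states, mirroring nonterminals A and B of the regular grammar, while the second uses a list as the push-down stack that a PDA adds to an FSA.

```python
def has_odd_ones(s):
    """Two-state machine: A = even number of 1s so far, B = odd number so far."""
    state = "A"
    for ch in s:
        if ch not in "01":
            return False
        if ch == "1":
            # Reading a 1 flips between the two states; a 0 keeps the state.
            state = "B" if state == "A" else "A"
    return state == "B"


def is_anbn(s):
    """Stack-based check for a^n b^n with n >= 1 (assumed alphabet: a, b)."""
    stack = []
    i = 0
    # Push one marker for every leading a.
    while i < len(s) and s[i] == "a":
        stack.append("a")
        i += 1
    # Pop one marker for every following b.
    while i < len(s) and s[i] == "b":
        if not stack:
            return False  # more b's than a's
        stack.pop()
        i += 1
    # Accept only if all input was consumed and every a was matched.
    return len(s) > 0 and i == len(s) and not stack
```

For example, `has_odd_ones("0111")` and `is_anbn("aabb")` both return `True`, while `is_anbn("aab")` returns `False`. The stack can grow without bound, which is exactly what a fixed number k of states cannot imitate.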
So how to generate a^n b^n? Let a=0, b=1
N = {A} T = {0,1} S = A P =
A → 0A1 | 01
• Note that now we can have a terminal after the nonterminal as well as before
• Can also have multiple nonterminals in a single production

Example: Grammar to generate sets of balanced parentheses
N = {A} T = {(,)} S = A P =
A → AA | (A) | ()

Context-free grammars are also equivalent to BNF grammars
• Developed by Backus and modified by Naur
• Used initially to describe Algol 60

Given a (BNF) grammar, we can derive any string in the language from the start symbol and the productions

A common way to derive strings is using a leftmost derivation
• Always replace leftmost nonterminal first
• Complete when no nonterminals remain
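A leftmost derivation can be sketched as a short Python function. This is an illustrative sketch under assumptions that are not in the notes: each grammar symbol is a single character, the productions are stored in a dict, and the names `leftmost_derivation`, `PARENS`, and `ZEROS_ONES` are hypothetical.

```python
def leftmost_derivation(productions, start, choices):
    """Apply each chosen right-hand side to the leftmost nonterminal.

    Returns the list of sentential forms, beginning with the start symbol.
    """
    form = start
    steps = [form]
    for rhs in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        idx = next((i for i, ch in enumerate(form) if ch in productions), None)
        if idx is None:
            raise ValueError("derivation is already complete: " + form)
        if rhs not in productions[form[idx]]:
            raise ValueError(rhs + " is not a production of " + form[idx])
        form = form[:idx] + rhs + form[idx + 1:]
        steps.append(form)
    return steps


PARENS = {"A": ["AA", "(A)", "()"]}   # balanced parentheses
ZEROS_ONES = {"A": ["0A1", "01"]}     # 0^n 1^n, n >= 1

print(" => ".join(leftmost_derivation(PARENS, "A", ["(A)", "AA", "()", "(A)", "()"])))
print(" => ".join(leftmost_derivation(ZEROS_ONES, "A", ["0A1", "0A1", "01"])))
```

The first call ends in `(()(()))` and the second in `000111`. The derivation is complete once the last form contains no nonterminal.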
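Example: leftmost derivation of (()(())) with A → AA | (A) | ()
A
(A)
(AA)
(()A)
(()(A))
(()(()))

[Parse tree for (()(())): the root A expands to ( A ), and the inner A expands to A A]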
If, for a given grammar, a string can be derived by two or more different parse trees, the grammar is ambiguous

Some languages are inherently ambiguous
• All grammars that generate that language are ambiguous

Many other languages are not themselves ambiguous, but can be generated by ambiguous grammars
• It is generally better for use with compilers if a grammar is unambiguous
  – Semantics are often based on syntactic form

Ambiguous grammar example: Generate strings of the form 0^n 1^m, where n,m >= 1
N = {A,B,C} T = {0,1} S = A P =
A → BC | 0A1
B → 0B | 0
C → 1C | 1

Consider the string: 00011

Parse tree 1:
        A
      /   \
     B     C
    / \   / \
   0   B 1   C
      / \    |
     0   B   1
         |
         0

Parse tree 2:
        A
      / | \
     0  A  1
       / \
      B   C
     / \  |
    0   B 1
        |
        0
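The ambiguity claim for 00011 can be checked mechanically by counting parse trees. The sketch below is not from the notes; `count_parse_trees`, `AMBIGUOUS`, and `UNAMBIGUOUS` are hypothetical names. It assumes single-character symbols and a grammar with no empty productions and no unit cycles, which holds for the grammar above.

```python
from functools import lru_cache

AMBIGUOUS = {
    "A": ("BC", "0A1"),
    "B": ("0B", "0"),
    "C": ("1C", "1"),
}

# Same grammar with the production A -> 0A1 removed.
UNAMBIGUOUS = {
    "A": ("BC",),
    "B": ("0B", "0"),
    "C": ("1C", "1"),
}


def count_parse_trees(s, start="A", grammar=AMBIGUOUS):
    """Count the distinct parse trees that derive s from the start symbol."""

    @lru_cache(maxsize=None)
    def from_symbol(symbol, i, j):
        # Number of trees deriving s[i:j] from a single symbol.
        if symbol not in grammar:          # terminal: must match exactly
            return 1 if s[i:j] == symbol else 0
        return sum(from_sequence(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def from_sequence(symbols, i, j):
        # Number of ways a sequence of symbols derives s[i:j].
        if not symbols:
            return 1 if i == j else 0
        first, rest = symbols[0], symbols[1:]
        # Every symbol covers at least one character, so leave room for the rest.
        return sum(
            from_symbol(first, i, k) * from_sequence(rest, k, j)
            for k in range(i + 1, j - len(rest) + 1)
        )

    return from_symbol(start, 0, len(s))


print(count_parse_trees("00011"))                       # 2
print(count_parse_trees("00011", grammar=UNAMBIGUOUS))  # 1
```

A count above 1 for any string is enough to show that a grammar is ambiguous. A count of 1 for one string does not by itself prove that the grammar is unambiguous.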
We can easily make this grammar unambiguous:
• Remove production: A → 0A1
• Note that nonterminal B can generate an arbitrary number (one or more) of 0s and nonterminal C can generate an arbitrary number (one or more) of 1s
• Now only one parse tree

        A
      /   \
     B     C
    / \   / \
   0   B 1   C
      / \    |
     0   B   1
         |
         0

Let’s look at a few more examples

Grammar to generate: { WW^R | W ∈ {0,1}+ }
N = {A} T = {0,1} S = A P = ?
A → 0A0 | 1A1 | 00 | 11

Grammar to generate: strings over {0,1} of the form WX such that |W| = |X| but W != X
• This one is a little trickier
• How to approach this problem?
• We need to guarantee two things
  – Overall string length is even
  – At least one bit differs in the two “halves”
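Both languages can be explored with a short Python sketch. It is illustrative only and makes two assumptions: the WW^R productions are read with A on the left-hand side (since N = {A}), and the names `derive_wwr` and `is_wx_with_w_ne_x` are hypothetical. The second function is a membership test that checks the two guarantees directly; it is not a grammar for that language.

```python
def derive_wwr(half_len):
    """All strings W W^R with |W| == half_len, built the way the grammar does."""
    if half_len < 1:
        raise ValueError("half_len must be at least 1")
    if half_len == 1:
        return ["00", "11"]                      # A -> 00 | 11
    inner = derive_wwr(half_len - 1)
    # A -> 0A0 | 1A1: wrap each shorter string in a matching pair of bits.
    return [c + mid + c for c in "01" for mid in inner]


def is_wx_with_w_ne_x(s):
    """True if s = WX over {0,1} with |W| == |X| and W != X."""
    if not set(s) <= {"0", "1"}:
        return False
    if len(s) % 2 != 0:
        return False                             # overall length must be even
    half = len(s) // 2
    return s[:half] != s[half:]                  # at least one bit differs


print(derive_wwr(2))
print(is_wx_with_w_ne_x("0110"), is_wx_with_w_ne_x("0101"))
```

Each string from `derive_wwr` reads the same forwards and backwards because the grammar grows it outward from the middle, one matching pair at a time.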
• Parse tree for: X = (A[2]+Y) * 20

<assig stmt>
  <var>
    <id>  (X)
  =
  <arith expr>
    <term>
      <term>
        <primary>
          (
          <arith expr>
            <arith expr>
              <term>
                <primary>
                  <var>
                    <id>  (A)
                    [
                    <subscript list>
                      <arith expr>
                        <term>
                          <primary>
                            <num>  (2)
                    ]
            +
            <term>
              <primary>
                <var>
                  <id>  (Y)
          )
      *
      <primary>
        <num>  (20)

Wow – that seems like a very complicated parse tree to generate such a short statement
• Extra non-terminals are often necessary to remove ambiguity
• Extra non-terminals are often necessary to create precedence
  – Precedence in previous grammar has * and / higher than + and -
  – Higher-precedence operators would be “lower” (deeper) in the parse tree
    » “LOWER” ABOVE IS CORRECT
• What about associativity?
  – Left recursive productions == left associativity
  – Right recursive productions == right associativity
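The effect of recursion direction on associativity can be seen with subtraction. The sketch below is an illustration only and does not use the grammar from the notes. It assumes two hypothetical productions, <expr> → <expr> - <num> (left recursive) and <expr> → <num> - <expr> (right recursive), and evaluates a list of numbers the way each parse tree would group them.

```python
def eval_left(nums):
    """Grouping implied by <expr> -> <expr> - <num>: ((a - b) - c)."""
    result = nums[0]
    for n in nums[1:]:
        # The left operand is the whole expression built so far.
        result = result - n
    return result


def eval_right(nums):
    """Grouping implied by <expr> -> <num> - <expr>: (a - (b - c))."""
    if len(nums) == 1:
        return nums[0]
    # The right operand is the rest of the expression.
    return nums[0] - eval_right(nums[1:])


print(eval_left([8, 3, 2]))   # (8 - 3) - 2 = 3
print(eval_right([8, 3, 2]))  # 8 - (3 - 2) = 7
```

The same token sequence 8 - 3 - 2 yields different values, so the choice between left and right recursion in the productions changes the meaning of the expression.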
But Context-free grammars cannot generate everything
Ex: Strings of the form WW over {0,1}
• Cannot guarantee that arbitrary string is the same on both sides
• Compare to WW^R
  – These we can generate from the “middle” and build out in each direction
  – For WW we would need separate productions for each side, and we cannot coordinate the two with a context-free grammar
    » Need Context-Sensitive in this case

• Let’s look at one more grammar example
Grammar to generate all postfix expressions involving binary operators * and -. Assume <id> is predefined and corresponds to any variable name
Ex: v w x y - * z * -
How do we approach this problem?
Terminals – easy
Nonterminals/Start – require some thought
Productions – require a lot of thought
T = { <id>, *, - }
N = {A}
S = A
P =
A → AA* | AA- | <id>
Show parse tree for previous example
Is this grammar LL(1)?
• We will discuss what this means soon
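A parse tree for a postfix expression under A → AA* | AA- | <id> can be built with a stack. The sketch below is an assumption-laden illustration rather than the method from the notes. It treats every token other than * and - as an <id>, represents tree nodes as tuples, and uses the hypothetical name `parse_postfix`. It does not address the LL(1) question.

```python
def parse_postfix(tokens):
    """Build a parse tree for A -> A A * | A A - | <id> from a token list."""
    stack = []
    for tok in tokens:
        if tok in ("*", "-"):
            if len(stack) < 2:
                raise ValueError("operator " + tok + " is missing an operand")
            right = stack.pop()
            left = stack.pop()
            stack.append(("A", left, right, tok))   # A -> A A op
        else:
            stack.append(("A", tok))                # A -> <id>
    if len(stack) != 1:
        raise ValueError("expression does not reduce to a single A")
    return stack[0]


tree = parse_postfix("v w x y - * z * -".split())
print(tree)
```

For the example above, the root node uses the production A → AA-, its left child is the identifier v, and its right child is the subtree for w x y - * z *.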