
John C. Ramirez

Course Notes for
CS112 Structure of Programming Languages
LANGUAGE GENERATORS
by
LAARNI C. DESENGAŇO - PANCHO
Department of Computer Science and IT
Bicol University

Language Generators

• Grammar
  A mechanism (or set of rules) by which a language is generated
  Defined by the following:
  • A set of non-terminal symbols, N
    – Do not actually appear in strings
  • A set of terminal symbols, T
    – Appear in strings
  • A set of productions, P
    – Rules used in string generation
  • A starting symbol, S
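The four parts of a grammar listed above can be written down directly as a data structure. The following is only a minimal sketch: the class name `Grammar`, the method `is_well_formed`, and the toy grammar `TOY` (S → aS | a) are illustrative assumptions, not part of the course notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # N: symbols that never appear in generated strings
    terminals: frozenset     # T: symbols that do appear in generated strings
    productions: dict        # P: nonterminal -> list of right-hand sides (tuples of symbols)
    start: str               # S: the starting symbol

    def is_well_formed(self):
        """Check that N and T are disjoint, S is in N, and P only uses known symbols."""
        if self.nonterminals & self.terminals:
            return False
        if self.start not in self.nonterminals:
            return False
        symbols = self.nonterminals | self.terminals
        for lhs, alternatives in self.productions.items():
            if lhs not in self.nonterminals:
                return False
            for rhs in alternatives:
                if any(symbol not in symbols for symbol in rhs):
                    return False
        return True


# Hypothetical toy grammar: S -> aS | a (one or more a's).
TOY = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a"}),
    productions={"S": [("a", "S"), ("a",)]},
    start="S",
)
```

Calling `TOY.is_well_formed()` checks only the bookkeeping; it says nothing about which language the grammar generates.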

Noam Chomsky described four classes of grammars (used to generate four classes of languages): the Chomsky Hierarchy
0) Unrestricted
1) Context-sensitive
2) Context-free
3) Regular
• More info on unrestricted and context-sensitive grammars in a theory course
• The last two will be useful to us

• Regular Grammars
Productions must be of the form:
<non> → <ter><non> | <ter>
where <non> is a nonterminal, <ter> is a terminal, and | represents either or
Can be modeled by a Finite-State Automaton (FSA)
Also equivalent to Regular Expressions
Provide a model for building lexical analyzers
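The restriction on regular productions is purely syntactic, so it can be checked mechanically. The sketch below assumes productions are stored as a dict from a nonterminal to a list of tuples of symbols; the function name `is_regular` and both sample grammars are hypothetical.

```python
def is_regular(productions, nonterminals, terminals):
    """Return True if every right-hand side is <ter><non> or <ter>."""
    for lhs, alternatives in productions.items():
        if lhs not in nonterminals:
            return False
        for rhs in alternatives:
            if len(rhs) == 1 and rhs[0] in terminals:
                continue  # form: <ter>
            if len(rhs) == 2 and rhs[0] in terminals and rhs[1] in nonterminals:
                continue  # form: <ter><non>
            return False
    return True


# Hypothetical examples: the first fits the regular form, the second does not
# because a terminal follows the nonterminal.
RIGHT_LINEAR = {"A": [("0", "A"), ("1",)]}
NOT_REGULAR = {"A": [("0", "A", "1"), ("0", "1")]}
```

With N = {A} and T = {0, 1}, `is_regular(RIGHT_LINEAR, {"A"}, {"0", "1"})` holds while the same call on `NOT_REGULAR` does not.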

Regular grammars have the following properties (among others)
• Can generate strings of the form α^n, where α is a finite sequence and n is an integer
  – Pattern recognition
• Can count to a finite number
  – Ex. { a^n | n = 85 }
    » But we need at least 86 states to do this
    » Cannot count to arbitrary number
    » Note that { a^n } for any n (i.e. 0 or more occurrences) is easy: do not have to count
• Important to realize that the number of states is finite: cannot recognize patterns with an arbitrary number of possibilities

Example: Regular grammar to recognize Pascal identifiers (assume no caps)
N = {Id, X}   T = {a..z, 0..9}   S = Id
P =
Id → aX | bX | … | a | b | … | z
X → aX | bX | … | 0X | … | 9X | a | … | z | 0 | … | 9

Consider equiv. FSA:
[Figure: FSA with start state Id; transitions labeled a … z and 0 … 9]
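The identifier grammar, its FSA, and a regular expression all describe the same language. The sketch below shows one possible two-state simulation next to the equivalent regular expression; naming the states after the nonterminals Id and X is an assumption made for illustration, as are the function names.

```python
import re
import string

LETTERS = set(string.ascii_lowercase)
DIGITS = set(string.digits)


def is_identifier_fsa(text):
    """Simulate a two-state FSA: 'Id' is the start state, 'X' is accepting."""
    state = "Id"
    for ch in text:
        if state == "Id":
            if ch not in LETTERS:
                return False  # first character must be a letter
            state = "X"
        elif ch not in LETTERS and ch not in DIGITS:
            return False      # later characters must be letters or digits
    return state == "X"


# The same language as a regular expression.
IDENTIFIER_RE = re.compile(r"[a-z][a-z0-9]*")


def is_identifier_re(text):
    return IDENTIFIER_RE.fullmatch(text) is not None
```

Both functions need only a fixed amount of state no matter how long the input is, which is what makes this language regular.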

CS 1520 Lecture Notes 1



Example: Regular grammar to generate a binary string containing an odd number of 1s
N = {A,B}   T = {0,1}   S = A   P =
A → 0A | 1B | 1
B → 0B | 1A | 0

Example: Regular grammars CANNOT generate strings of the form a^n b^n
• Grammar needs some way to count number of a's and b's to make sure they are the same
• Any regular grammar (or FSA) has a finite number, say k, of different states
• If n > k, not possible: some state must repeat while reading the a's, so the count is lost

If we could add a "memory" of some sort we could get this to work

• Context-free Grammars
Can be modeled by a Push-Down Automaton (PDA)
• FSA with added push-down stack
Productions are of the form:
• <non> → α, where <non> is a nonterminal and α is any sequence of terminals and nonterminals
  – note rhs is more flexible now
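The contrast between the two examples above can be made concrete. In the sketch below, `has_odd_ones` needs only the two states A and B from the regular grammar, while `is_anbn` needs a stack, standing in for the "memory" a PDA adds. Both function names are hypothetical, and requiring n >= 1 is an assumption.

```python
def has_odd_ones(bits):
    """Two-state FSA: state A means an even number of 1s so far, B means odd."""
    state = "A"
    for bit in bits:
        if bit == "1":
            state = "B" if state == "A" else "A"
        elif bit != "0":
            return False  # not a binary string
    return state == "B"


def is_anbn(text, a="a", b="b"):
    """Stack-based check for a^n b^n with n >= 1 (push on a, pop on b)."""
    stack = []
    seen_b = False
    for ch in text:
        if ch == a:
            if seen_b:
                return False  # an a after a b breaks the pattern
            stack.append(ch)
        elif ch == b:
            seen_b = True
            if not stack:
                return False  # more b's than a's
            stack.pop()
        else:
            return False
    return seen_b and not stack  # every a must be matched by a b
```

The stack in `is_anbn` can grow without bound, which is exactly what a machine with a fixed number k of states cannot imitate once n > k.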

So how to generate a^n b^n? Let a=0, b=1
N = {A}   T = {0,1}   S = A   P =
A → 0A1 | 01
• Note that now we can have a terminal after the nonterminal as well as before
• Can also have multiple nonterminals in a single production

Example: Grammar to generate sets of balanced parentheses
N = {A}   T = {(,)}   S = A   P =
A → AA | (A) | ()

Context-free grammars are also equivalent to BNF grammars
• Developed by Backus and modified by Naur
• Used initially to describe Algol 60

Given a (BNF) grammar, we can derive any string in the language from the start symbol and the productions

A common way to derive strings is using a leftmost derivation
• Always replace leftmost nonterminal first
• Complete when no nonterminals remain
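The two rules of a leftmost derivation translate into a short loop. This is a sketch under the assumption that every symbol is a single character and that the caller supplies which alternative to use at each step; the function name `leftmost_derivation` and the `choices` parameter are illustrative.

```python
def leftmost_derivation(start, productions, choices):
    """Return the sentential forms obtained by always rewriting the leftmost nonterminal.

    productions maps a nonterminal to a list of right-hand-side strings;
    choices[i] is the index of the alternative applied at step i.
    """
    form = [start]
    steps = ["".join(form)]
    for choice in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        index = next((i for i, sym in enumerate(form) if sym in productions), None)
        if index is None:
            raise ValueError("derivation is already complete")
        rhs = productions[form[index]][choice]
        form[index:index + 1] = list(rhs)
        steps.append("".join(form))
    return steps


# The grammar A -> 0A1 | 01 from the notes.
ANBN = {"A": ["0A1", "01"]}
steps = leftmost_derivation("A", ANBN, [0, 0, 1])
# steps == ["A", "0A1", "00A11", "000111"]
```

The derivation is complete when the last form contains no nonterminals, as in `000111` here.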

Example: Leftmost derivation of nested parens: (()(()))
A → (A)
  → (AA)
  → (()A)
  → (()(A))
  → (()(()))

We can view this derivation as a tree, called a parse tree for the string

• Parse tree for (()(()))

          A
        / | \
       (  A  )
        /   \
      A       A
     / \     /|\
    (   )   ( A )
             / \
            (   )
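A parse tree like the one above can be stored as nested data, and reading its leaves from left to right gives back the derived string. The representation below (a node is a `(symbol, children)` pair, a terminal leaf is a plain string) and the function names are assumptions chosen for this sketch.

```python
# Parse tree for (()(())) under A -> AA | (A) | ().
TREE = ("A", ["(",
              ("A", [("A", ["(", ")"]),
                     ("A", ["(", ("A", ["(", ")"]), ")"])]),
              ")"])

PRODUCTIONS = {"A": {"AA", "(A)", "()"}}


def frontier(node):
    """Concatenate the leaves of the tree from left to right."""
    if isinstance(node, str):
        return node
    _, children = node
    return "".join(frontier(child) for child in children)


def uses_valid_productions(node, productions):
    """Check that every internal node expands by one of the grammar's productions."""
    if isinstance(node, str):
        return True
    symbol, children = node
    rhs = "".join(c if isinstance(c, str) else c[0] for c in children)
    if rhs not in productions.get(symbol, set()):
        return False
    return all(uses_valid_productions(c, productions) for c in children)
```

Here `frontier(TREE)` yields the original string, and each internal node corresponds to one step of the derivation.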


If, for a given grammar, a string can be derived by two or more different parse trees, the grammar is ambiguous

Some languages are inherently ambiguous
• All grammars that generate that language are ambiguous

Many other languages are not themselves ambiguous, but can be generated by ambiguous grammars
• It is generally better for use with compilers if a grammar is unambiguous
  – Semantics are often based on syntactic form

Ambiguous grammar example: Generate strings of the form 0^n 1^m, where n,m >= 1
N = {A,B,C}   T = {0,1}   S = A   P =
A → BC | 0A1
B → 0B | 0
C → 1C | 1

Consider the string: 00011

First parse tree (A → BC at the root):

      A
    /   \
   B     C
  / \   / \
 0   B 1   C
    / \    |
   0   B   1
       |
       0

Second parse tree (A → 0A1 at the root):

      A
    / | \
   0  A  1
     / \
    B   C
   / \  |
  0   B 1
      |
      0
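Ambiguity can be checked for a particular string by counting its parse trees. The following is a brute-force sketch, not a general ambiguity test: it assumes single-character symbols, no empty productions, and no productions of the form nonterminal → nonterminal. The function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache

AMBIGUOUS = {"A": ("BC", "0A1"), "B": ("0B", "0"), "C": ("1C", "1")}


def count_parse_trees(productions, start, text):
    """Count the distinct parse trees that derive text from start."""

    @lru_cache(maxsize=None)
    def count_symbol(symbol, lo, hi):
        # Number of trees rooted at symbol whose leaves spell text[lo:hi].
        if symbol not in productions:  # terminal
            return 1 if text[lo:hi] == symbol else 0
        return sum(count_sequence(rhs, lo, hi) for rhs in productions[symbol])

    @lru_cache(maxsize=None)
    def count_sequence(rhs, lo, hi):
        # Number of ways the symbols of rhs can jointly derive text[lo:hi].
        if not rhs:
            return 1 if lo == hi else 0
        first, rest = rhs[0], rhs[1:]
        total = 0
        # Each remaining symbol must cover at least one character.
        for mid in range(lo + 1, hi - len(rest) + 1):
            left = count_symbol(first, lo, mid)
            if left:
                total += left * count_sequence(rest, mid, hi)
        return total

    return count_symbol(start, 0, len(text))
```

For the grammar above, `count_parse_trees(AMBIGUOUS, "A", "00011")` finds the two trees shown, one starting with A → BC and one starting with A → 0A1.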

We can easily make this grammar unambiguous:
• Remove production: A → 0A1
• Note that nonterminal B can generate an arbitrary number of 0s and nonterminal C can generate an arbitrary number of 1s
• Now only one parse tree

      A
    /   \
   B     C
  / \   / \
 0   B 1   C
    / \    |
   0   B   1
       |
       0

Let's look at a few more examples
Grammar to generate: { W W^R | W ∈ {0,1}+ }, where W^R is W reversed
N = {A}   T = {0,1}   S = A   P = ?
A → 0A0 | 1A1 | 00 | 11

Grammar to generate: strings in {0,1} of the form WX such that |W| = |X| but W != X
• This one is a little trickier
• How to approach this problem?
• We need to guarantee two things
  – Overall string length is even
  – At least one bit differs in the two "halves"
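For the W W^R grammar, one way to gain confidence in the productions is to enumerate what they generate and compare against the definition. The sketch below does this for short strings only; the function names `derive` and `is_w_wr` are assumptions, and W is taken to be nonempty because the shortest right-hand sides are 00 and 11.

```python
from itertools import product


def derive(length):
    """All strings of the given length generated by A -> 0A0 | 1A1 | 00 | 11."""
    if length < 2 or length % 2:
        return set()
    if length == 2:
        return {"00", "11"}
    inner = derive(length - 2)
    return {"0" + s + "0" for s in inner} | {"1" + s + "1" for s in inner}


def is_w_wr(text):
    """True if text is W followed by W reversed, with W nonempty."""
    half = len(text) // 2
    return len(text) >= 2 and len(text) % 2 == 0 and text[half:] == text[:half][::-1]


def matches_definition(length):
    """Compare the grammar's output with the set definition for one length."""
    expected = {"".join(bits) for bits in product("01", repeat=length)
                if is_w_wr("".join(bits))}
    return derive(length) == expected
```

The recursion in `derive` mirrors how the grammar builds each string outward from the middle pair.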

• See board
• Ok, now how do we make a grammar to do this?
  – Make every string (even length) the result of two odd-length strings appended to each other
  – Assume odd-length strings are Ol (left) and Or (right)
  – Make sure that either
    » Ol has a 1 in the middle and Or has a 0 in the middle or
    » Ol has a 0 in the middle and Or has a 1 in the middle
• Productions:
In → AB | BA
A → 0A0 | 1A1 | 1A0 | 0A1 | 1
B → 0B0 | 1B1 | 1B0 | 0B1 | 0

Let's look at an example more relevant to programming languages:
• Grammar to generate simple assignment statements in a C-like language (diff. from one in text):
<assig stmt> ::= <var> = <arith expr>
<arith expr> ::= <term> | <arith expr> + <term> | <arith expr> - <term>
<term> ::= <primary> | <term> * <primary> | <term> / <primary>
<primary> ::= <var> | <num> | (<arith expr>)
<var> ::= <id> | <id>[<subscript list>]
<subscript list> ::= <arith expr> | <subscript list>, <arith expr>
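The assignment grammar can be turned into a small recognizer with one function per nonterminal. This is a sketch under several assumptions: the left-recursive productions are rewritten as loops (which accepts the same strings), and the token patterns for <id> and <num> are guesses, since the notes do not define them. The names `tokenize`, `Parser`, and `is_assignment` are hypothetical.

```python
import re

# Assumed token shapes: <num> is a digit string, <id> is a C-like identifier.
TOKEN_RE = re.compile(r"\s*(?:(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[-+*/=()\[\],]))")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def accept(self, kind, value=None):
        if self.pos < len(self.tokens):
            tok_kind, tok_value = self.tokens[self.pos]
            if tok_kind == kind and (value is None or tok_value == value):
                self.pos += 1
                return True
        return False

    def expect(self, kind, value=None):
        if not self.accept(kind, value):
            raise ValueError(f"expected {value or kind} at token {self.pos}")

    def assig_stmt(self):       # <assig stmt> ::= <var> = <arith expr>
        self.var()
        self.expect("op", "=")
        self.arith_expr()

    def arith_expr(self):       # <term> followed by any number of (+|-) <term>
        self.term()
        while self.accept("op", "+") or self.accept("op", "-"):
            self.term()

    def term(self):             # <primary> followed by any number of (*|/) <primary>
        self.primary()
        while self.accept("op", "*") or self.accept("op", "/"):
            self.primary()

    def primary(self):          # <var> | <num> | (<arith expr>)
        if self.accept("num"):
            return
        if self.accept("op", "("):
            self.arith_expr()
            self.expect("op", ")")
            return
        self.var()

    def var(self):              # <id> | <id>[<subscript list>]
        self.expect("id")
        if self.accept("op", "["):
            self.arith_expr()
            while self.accept("op", ","):
                self.arith_expr()
            self.expect("op", "]")


def is_assignment(text):
    try:
        parser = Parser(tokenize(text))
        parser.assig_stmt()
        return parser.pos == len(parser.tokens)
    except ValueError:
        return False
```

This only answers yes or no; it does not build the parse tree, and it hides the left recursion that the grammar uses to express associativity.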


• Parse tree for: X = (A[2]+Y) * 20

<assig stmt>
  <var>
    <id>  (X)
  =
  <arith expr>
    <term>
      <term>
        <primary>
          (
          <arith expr>
            <arith expr>
              <term>
                <primary>
                  <var>
                    <id>  (A)
                    [
                    <subscript list>
                      <arith expr>
                        <term>
                          <primary>
                            <num>  (2)
                    ]
            +
            <term>
              <primary>
                <var>
                  <id>  (Y)
          )
      *
      <primary>
        <num>  (20)

Wow, that seems like a very complicated parse tree to generate such a short statement
• Extra non-terminals are often necessary to remove ambiguity
• Extra non-terminals are often necessary to create precedence
  – Precedence in previous grammar has * and / higher than + and -
  – They would be "lower" in the parse tree
    » "Lower" is correct: higher-precedence operators sit deeper in the tree, so they are evaluated first
• What about associativity
  – Left recursive productions == left associativity
  – Right recursive productions == right associativity
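The effect of left versus right recursion shows up as soon as an operator such as subtraction is evaluated. The sketch below evaluates a chain of subtractions two ways, following the shape of the parse tree each kind of production would give. The right-recursive production is a hypothetical alternative, not the grammar in the notes, and both function names are assumptions.

```python
def eval_left(numbers):
    """Tree shape of the left-recursive <expr> ::= <expr> - <num> | <num>.

    The last operator is at the root, so operands group to the left.
    """
    *rest, last = numbers
    if not rest:
        return last
    return eval_left(rest) - last


def eval_right(numbers):
    """Tree shape of a hypothetical right-recursive <expr> ::= <num> - <expr> | <num>.

    The first operator is at the root, so operands group to the right.
    """
    first, *rest = numbers
    if not rest:
        return first
    return first - eval_right(rest)
```

For 8 - 3 - 2, `eval_left([8, 3, 2])` computes (8 - 3) - 2 = 3 while `eval_right([8, 3, 2])` computes 8 - (3 - 2) = 7, so the choice of recursion changes the meaning of the same string.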

But Context-free grammars cannot generate everything
Ex: Strings of the form WW in {0,1}
• Cannot guarantee that arbitrary string is the same on both sides
• Compare to W W^R
  – These we can generate from the "middle" and build out in each direction
  – For WW we would need separate productions for each side, and we cannot coordinate the two with a context-free grammar
    » Need Context-Sensitive in this case

• Let's look at one more grammar example
Grammar to generate all postfix expressions involving binary operators * and -. Assume <id> is predefined and corresponds to any variable name
Ex: v w x y - * z * -

How do we approach this problem?
Terminals: easy
Nonterminals/Start: require some thought
Productions: require a lot of thought

T = { <id>, *, - }
N = {A}
S = A
P =
A → AA* | AA- | <id>
Show parse tree for previous example
Is this grammar LL(1)?
• We will discuss what this means soon
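One way to obtain the parse tree for the postfix example is to build it bottom-up with a stack, applying A → <id> for each operand and A → AA* or A → AA- for each operator. This is only a sketch: the stack method is a shortcut chosen for illustration and says nothing about whether the grammar is LL(1). The function names and the tree representation (a node is `("A", children)`) are assumptions.

```python
def parse_postfix(tokens, operators=("*", "-")):
    """Build a parse tree for a postfix expression under A -> AA* | AA- | <id>."""
    stack = []
    for token in tokens:
        if token in operators:
            if len(stack) < 2:
                raise ValueError(f"operator {token!r} is missing an operand")
            right = stack.pop()
            left = stack.pop()
            stack.append(("A", [left, right, token]))  # A -> A A op
        else:
            stack.append(("A", [token]))               # A -> <id>
    if len(stack) != 1:
        raise ValueError("expression does not reduce to a single A")
    return stack[0]


def frontier(node):
    """List the leaves of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    leaves = []
    for child in children:
        leaves.extend(frontier(child))
    return leaves


tree = parse_postfix("v w x y - * z * -".split())
```

The root of `tree` uses A → AA-, with the left child deriving `v` and the right child deriving `w x y - * z *`.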
