
Chapter 3: Syntax Analysis (Principles of Compiler Design)

Context-free grammars

• Like regular expressions, context-free grammars describe sets of strings, i.e., languages.
• In addition, a context-free grammar also defines structure on the strings in the language it defines.
• A language is defined over some alphabet, for example the set of tokens produced by a lexer or the set of alphanumeric characters.
• The symbols in the alphabet are called terminals.
• A context-free grammar recursively defines several sets of strings.
• Each set is denoted by a name, which is called a nonterminal. The set of nonterminals is disjoint from the set of terminals.
• One of the nonterminals is chosen to denote the language described by the grammar. This is called the start symbol of the grammar.

The sets are described by a number of productions. Each production describes some of the possible strings that are contained in the set denoted by a nonterminal. A production has the form

N → X1 X2 ... Xn

where N is a nonterminal and X1, ..., Xn are zero or more symbols, each of which is either a terminal or a nonterminal.

In formal language theory, a context-free grammar (CFG) is a formal grammar in which every production
rule is of the form

V → w

where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty).

A formal grammar is considered "context free" when its production rules can be applied regardless of the context of a nonterminal: it does not matter which symbols the nonterminal is surrounded by; the single nonterminal on the left-hand side can always be replaced by its right-hand side.
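
To make this concrete, a context-free grammar can be stored directly as data. The following is a minimal Python sketch (the dictionary layout and helper name are illustrative assumptions, not something the text prescribes); the productions are those of the expression grammar used later in this chapter.

# A sketch, not from the text: a CFG as a dict mapping each nonterminal to its
# list of right-hand sides (each a list of symbols); an empty list stands for
# the empty string.
EXPR_GRAMMAR = {
    "exp": [["exp", "op", "exp"], ["(", "exp", ")"], ["number"]],
    "op": [["+"], ["-"], ["*"]],
}
START_SYMBOL = "exp"

def is_nonterminal(symbol, grammar):
    # A symbol is a nonterminal exactly when the grammar has productions for it;
    # every other symbol appearing in a right-hand side is a terminal.
    return symbol in grammar

print(is_nonterminal("exp", EXPR_GRAMMAR))     # True
print(is_nonterminal("number", EXPR_GRAMMAR))  # False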

Derivation:
The basic idea of derivation is to consider productions as rewrite rules:

1) Whenever we have a nonterminal, we can replace it by the right-hand side of any production in which the nonterminal appears on the left-hand side.
2) We can do this anywhere in a sequence of symbols (terminals and nonterminals) and repeat doing so until only terminals are left, as in the sketch below.
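
A single rewrite step can be sketched directly from these two rules. The helper below is an illustrative assumption (not code from the text); it replaces the leftmost nonterminal in a sequence of symbols by a chosen right-hand side, using the grammar of Example 3 later in this chapter.

# A sketch, not from the text: one derivation step as a list rewrite.
def rewrite_leftmost(sentential, grammar, rhs):
    for i, symbol in enumerate(sentential):
        if symbol in grammar:                          # leftmost nonterminal found
            return sentential[:i] + list(rhs) + sentential[i + 1:]
    return sentential                                  # only terminals left: done

# Grammar of Example 3 below: S -> aB, B -> bB | empty
g = {"S": [["a", "B"]], "B": [["b", "B"], []]}
form = ["S"]
form = rewrite_leftmost(form, g, g["S"][0])   # ['a', 'B']
form = rewrite_leftmost(form, g, g["B"][0])   # ['a', 'b', 'B']
form = rewrite_leftmost(form, g, g["B"][1])   # ['a', 'b'], i.e. the string ab
print(form)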

Derivations and the language defined by a grammar

Grammar rules determine a "language", that is, a set of legal strings of tokens.

For example, with the expression grammar given below, (34-3)*42 corresponds to the legal string of seven tokens

( number - number ) * number

whereas (34-3*42 is not a legal expression, because it contains a left parenthesis that is not matched by a right parenthesis, and the second choice in the grammar rule for exp requires that parentheses be generated in pairs.


Derivation:
Grammar rules determine the legal strings of token symbols by means of derivations.

A derivation is a sequence of replacements of structure names by choices on the right-hand sides of grammar rules.

A derivation begins with a single structure name and ends with a string of token symbols.

At each step in a derivation, a single replacement is made using one choice from a grammar rule.

exp → exp op exp | ( exp ) | number
op → + | - | *

Figure 3.1: A derivation

(1) exp => exp op exp                    [exp → exp op exp]
(2)     => exp op number                 [exp → number]
(3)     => exp * number                  [op → *]
(4)     => ( exp ) * number              [exp → ( exp )]
(5)     => ( exp op exp ) * number       [exp → exp op exp]
(6)     => ( exp op number ) * number    [exp → number]
(7)     => ( exp - number ) * number     [op → -]
(8)     => ( number - number ) * number  [exp → number]

Derivation steps use a different arrow (=>) from the arrow meta-symbol (→) used in the grammar rules, because grammar rules define, while derivation steps construct by replacement.

The set of all strings of token symbols obtained by derivations from the exp symbol is the language defined by the grammar of expressions:

L(G) = { s | exp =>* s }

(1) G represents the expression grammar.
(2) s represents an arbitrary string of token symbols (sometimes called a sentence).
(3) The symbol =>* stands for a derivation consisting of a sequence of replacements as described earlier. (The asterisk is used to indicate a sequence of steps, much as it indicates repetition in regular expressions.)
(4) Grammar rules are sometimes called productions because they "produce" the strings in L(G) via derivations.
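
The definition L(G) = { s | exp =>* s } can be explored mechanically: repeatedly rewrite nonterminals until only terminals remain and collect the resulting sentences. The sketch below is an illustrative assumption (function and variable names are not from the text); it enumerates a small sample of L(G) for the expression grammar by breadth-first search over sentential forms.

from collections import deque

# A sketch, not from the text: sample L(G) by expanding the leftmost
# nonterminal of each sentential form in breadth-first order.
GRAMMAR = {
    "exp": [["exp", "op", "exp"], ["(", "exp", ")"], ["number"]],
    "op": [["+"], ["-"], ["*"]],
}

def sample_language(grammar, start, max_len=5, limit=10):
    seen, results = set(), []
    queue = deque([(start,)])
    while queue and len(results) < limit:
        form = queue.popleft()
        if len(form) > max_len or form in seen:
            continue
        seen.add(form)
        nonterminals = [i for i, s in enumerate(form) if s in grammar]
        if not nonterminals:                 # only terminals: a sentence of L(G)
            results.append(" ".join(form))
            continue
        i = nonterminals[0]                  # expand the leftmost nonterminal
        for rhs in grammar[form[i]]:
            queue.append(form[:i] + tuple(rhs) + form[i + 1:])
    return results

print(sample_language(GRAMMAR, "exp"))
# e.g. ['number', '( number )', 'number + number', ...]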
Example 3: A CFG for ab* = { a, ab, abb, abbb, abbbb, . . . . }
1. Terminals: Σ = {a, b}
2. Nonterminals: N = {S, B}
3. Productions: P = {
S -> aB
B -> bB | ^
}

DERIVATION of abbb using the CFG of example 3:

S => aB => abB => abbB => abbbB => abbb


Most of the time only the set of productions is given explicitly for a CFG, and the terminals and nonterminals are understood from the context, as shown in Examples 4-6:

Example 4: A CFG for aba* = { ab, aba, abaa, abaaa, . . . . }


S -> abA
A -> aA | ^

DERIVATION of abaaa using the CFG of example 4:


S => abA => abaA => abaaA => abaaaA => abaaa

Example 5: A CFG for ab*a = { aa, aba, abba, abbba, . . . . }

S -> aBa
B -> bB | ^

DERIVATION of abbbba using the CFG of example 5:


S => aBa => abBa => abbBa => abbbBa => abbbbBa => abbbba

Example 6: A CFG for { a^n c b^n : n > 0 } = { acb, aacbb, aaacbbb, . . . }

S -> aSb | acb

DERIVATION of aaacbbb using the CFG of example 6:


S => aSb => aaSbb => aaacbbb
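
Because the only recursive production is S -> aSb, a recognizer for this language can mirror the grammar directly. The following sketch is an illustrative assumption (the function name is not from the text):

# A sketch, not from the text: a recursive recognizer that mirrors
# the productions S -> aSb | acb of Example 6.
def matches_S(s):
    if s == "acb":                                   # S -> acb
        return True
    if len(s) >= 3 and s[0] == "a" and s[-1] == "b":
        return matches_S(s[1:-1])                    # S -> aSb
    return False

print(matches_S("aaacbbb"))   # True
print(matches_S("aacb"))      # False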

Definition 3.1: Given a context-free grammar G with start symbol S, terminal symbols T and productions P, the language L(G) that G generates is defined to be the set of strings of terminal symbols that can be obtained by derivation from S using the productions P.
T → R
T → aTc
R → ε
R → RbR

Grammar 3.4: Example grammar

T
=> aTc
=> aaTcc
=> aaRcc
=> aaRbRcc
=> aaRbcc
=> aaRbRbcc
=> aaRbRbRbcc
=> aaRbbRbcc
=> aabbRbcc
=> aabbbcc

Derivation of the string aabbbcc using Grammar 3.4

T
=> aTc
=> aaTcc
=> aaRcc
=> aaRbRcc
=> aaRbRbRcc
=> aabRbRcc
=> aabRbRbRcc
=> aabbRbRcc
=> aabbbRcc
=> aabbbcc

Leftmost derivation of the string aabbbcc using Grammar 3.4
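
The leftmost derivation above can be replayed mechanically by recording which production is applied at each step. The sketch below is an illustrative assumption (the data layout and names are not from the text); it reproduces the sentential forms shown above for Grammar 3.4.

# A sketch, not from the text: replay the leftmost derivation of aabbbcc.
GRAMMAR_34 = {
    "T": [["R"], ["a", "T", "c"]],    # T -> R | aTc
    "R": [[], ["R", "b", "R"]],       # R -> (empty) | RbR
}

# (nonterminal, index of the chosen production) for each step above
STEPS = [("T", 1), ("T", 1), ("T", 0), ("R", 1), ("R", 1),
         ("R", 0), ("R", 1), ("R", 0), ("R", 0), ("R", 0)]

form = ["T"]
print("".join(form))
for nonterminal, choice in STEPS:
    i = next(k for k, s in enumerate(form) if s in GRAMMAR_34)  # leftmost nonterminal
    assert form[i] == nonterminal
    form = form[:i] + GRAMMAR_34[nonterminal][choice] + form[i + 1:]
    print("=>", "".join(form))
# last line printed: => aabbbcc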

[Figures: additional example context-free grammars with sample derivations, and the formal definition of context-free grammars]


Derivation Order

There are two types:

1) A leftmost derivation: a derivation in which the leftmost nonterminal is replaced at each step in
the derivation. Corresponds to the preorder numbering of the internal nodes of its associated
parse tree.

2) A rightmost derivation: a derivation in which the rightmost nonterminal is replaced at each step
in the derivation. Corresponds to the postorder numbering of the internal nodes of its associated
parse tree.
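
The only difference between the two orders is which occurrence of a nonterminal is rewritten. A minimal Python sketch (an illustrative assumption, using the expression grammar from earlier in the chapter):

# A sketch, not from the text: one leftmost versus one rightmost rewrite step.
GRAMMAR = {
    "exp": [["exp", "op", "exp"], ["(", "exp", ")"], ["number"]],
    "op": [["+"], ["-"], ["*"]],
}

def rewrite(form, grammar, rhs, leftmost=True):
    # Replace the leftmost (or rightmost) nonterminal in `form` by `rhs`.
    positions = [i for i, s in enumerate(form) if s in grammar]
    i = positions[0] if leftmost else positions[-1]
    return form[:i] + rhs + form[i + 1:]

form = ["exp", "op", "exp"]
print(rewrite(form, GRAMMAR, ["number"], leftmost=True))    # ['number', 'op', 'exp']
print(rewrite(form, GRAMMAR, ["number"], leftmost=False))   # ['exp', 'op', 'number']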

Example 1:

Given grammar (productions numbered 1 to 5):

1) S → AB
2) A → aaA
3) A → ε
4) B → Bb
5) B → ε

Leftmost derivation (the number on each step indicates the production applied):

S =>(1) AB =>(2) aaAB =>(3) aaB =>(4) aaBb =>(5) aab

Rightmost derivation:

S =>(1) AB =>(4) ABb =>(5) Ab =>(2) aaAb =>(3) aab
Example 2:

Given grammar:

S → aAB
A → bBb
B → A | ε

Leftmost derivation:

S => aAB => abBbB => abAbB => abbBbbB => abbbbB => abbbb

Rightmost derivation:

S => aAB => aA => abBb => abAb => abbBbb => abbbb
Derivation Trees
We can draw a derivation as a tree:

• The root of the tree is the start symbol of the grammar, and whenever we rewrite a nonterminal we add as its children the symbols on the right-hand side of the production that was used.
• The leaves of the tree are terminals which, when read from left to right, form the derived string.
• If a nonterminal is rewritten using an empty production, an ε is shown as its child.
• This ε is also a leaf node, but it is ignored when reading the string from the leaves of the tree.
• A tree can thus be used to illustrate how a string is derived from a CFG, as in the sketch below.
• Definition: These trees are called syntax trees, parse trees, generation trees, production trees, or derivation trees.
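
As a small illustration of reading a string off the leaves, the sketch below is an illustrative assumption (the nested-tuple representation is not from the text); the tree encodes a parse of ( number - number ) * number with the expression grammar.

# A sketch, not from the text: a derivation tree as nested (nonterminal, children)
# tuples; leaves are terminal strings.
tree = ("exp",
        [("exp", ["(",
                  ("exp", [("exp", ["number"]),
                           ("op", ["-"]),
                           ("exp", ["number"])]),
                  ")"]),
         ("op", ["*"]),
         ("exp", ["number"])])

def leaves(node):
    # Read the derived string off the leaves, left to right.
    if isinstance(node, str):
        return [node]
    _, children = node
    return [token for child in children for token in leaves(child)]

print(" ".join(leaves(tree)))   # ( number - number ) * number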

Steps to create a derivation tree from a given CFG

[Figures: worked examples showing the step-by-step construction of derivation trees]

Ambiguity:

The syntax tree adds structure to the string that it derives. It is this structure that we exploit in the later phases of the compiler.

For compilation, we do the derivation backwards:

• We start with a string and want to produce a syntax tree. This process is called syntax analysis or parsing.
• Even though the order of derivation does not matter when constructing a syntax tree, the choice of production for each nonterminal does.
• Obviously, different choices can lead to different strings being derived, but it may also happen that several different syntax trees can be built for the same string, as in the sketch below.
• When a grammar permits several different syntax trees for some strings, we call the grammar ambiguous.
• If our only use of a grammar is to describe sets of strings, ambiguity is not a problem. However, when we want to use the grammar to impose structure on strings, the structure had better be the same every time.
• Hence, it is a desirable feature for a grammar to be unambiguous. In most (but not all) cases, an ambiguous grammar can be rewritten to an unambiguous grammar that generates the same set of strings, or external rules can be applied to decide which of the many possible syntax trees is the "right one".
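
As a small illustration (an assumed example, not taken from the text): with the ambiguous rule exp -> exp op exp, the token string number - number * number has two distinct syntax trees whose leaves read the same.

# A sketch, not from the text: two different parse trees for the same string.
def leaves(node):
    if isinstance(node, str):
        return [node]
    _, children = node
    return [token for child in children for token in leaves(child)]

num = ("exp", ["number"])
# Groups as number - (number * number)
tree1 = ("exp", [num, ("op", ["-"]),
                 ("exp", [num, ("op", ["*"]), num])])
# Groups as (number - number) * number
tree2 = ("exp", [("exp", [num, ("op", ["-"]), num]),
                 ("op", ["*"]), num])

assert leaves(tree1) == leaves(tree2)   # same token string, two different trees
print(" ".join(leaves(tree1)))          # number - number * number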

Prepared and compiled by Dr. Anusuya