Chapter - Three: Syntax Analysis

Outline
• Introduction
• Context free grammar (CFG)
• Derivation
• Parse tree
• Ambiguity
• Predictive parser
• Operator precedence parsing
• LR parsers
Introduction
• Syntax: the way in which tokens are put together to form expressions, statements, or blocks of statements; that is, the rules governing the formation of statements in a programming language.
• Syntax analysis: the task of fitting a sequence of tokens into a specified syntax.
• Parsing: breaking a sentence down into its component parts, with an explanation of the form, function, and syntactic relationship of each part.
• The syntax of a programming language is usually given by the grammar rules of a context-free grammar (CFG).
Parser
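[Figure: position of the parser in the compiler model. The lexical analyzer reads the source program ("get next char") and hands the next token to the syntax analyzer on request ("get next token"); the syntax analyzer outputs the parse tree. Both consult the symbol table, which contains a record for each identifier, and report lexical errors and syntax errors respectively.]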
Introduction…
• The syntax analyzer (parser) checks whether or not a given source program satisfies the rules implied by a CFG.
• If it does, the parser creates the parse tree of that program.
• Otherwise, the parser reports error messages.
• A CFG gives a precise syntactic specification of a programming language.
Introduction…
• Parsers can be categorized into two groups:
  - Top-down parsers: the parse tree is created from top to bottom, starting at the root and working toward the leaves.
  - Bottom-up parsers: the parse tree is created from bottom to top, starting at the leaves and working toward the root.
• Both top-down and bottom-up parsers scan the input from left to right, one symbol at a time.
• Efficient top-down and bottom-up parsers can be implemented only for suitable subclasses of context-free grammars (for example, LL grammars for top-down parsing and LR grammars for bottom-up parsing).
Context free grammar (CFG)
• A context-free grammar is a specification of the syntactic structure of a programming language.
• A context-free grammar is a 4-tuple G = (T, N, P, S), where:
  - T is a finite set of terminals (a set of tokens),
  - N is a finite set of non-terminals (syntactic variables),
  - P is a finite set of productions of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string),
  - S ∈ N is a designated start symbol (one of the non-terminal symbols).
Example: grammar for simple arithmetic expressions
expression → expression + term
expression → expression - term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id

Terminal symbols: id + - * / ( )
Non-terminals: expression, term, factor
Start symbol: expression
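To make the 4-tuple concrete, here is a minimal Python sketch that stores G = (T, N, P, S) for the expression grammar above and checks the conditions stated in the definition. The `CFG` class and its field names are assumptions made for illustration; each production body is a tuple of symbols, with `()` standing for the empty string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar G = (T, N, P, S); class and field names are illustrative."""
    terminals: frozenset     # T
    nonterminals: frozenset  # N
    productions: tuple       # P: (head, body) pairs; body is a tuple of symbols, () is ε
    start: str               # S

    def __post_init__(self):
        # Enforce the conditions stated in the definition.
        if self.terminals & self.nonterminals:
            raise ValueError("T and N must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        for head, body in self.productions:
            if head not in self.nonterminals:
                raise ValueError(f"head {head!r} is not a non-terminal")
            for symbol in body:
                if symbol not in self.terminals and symbol not in self.nonterminals:
                    raise ValueError(f"unknown symbol {symbol!r} in a body of {head!r}")


expression_grammar = CFG(
    terminals=frozenset({"id", "+", "-", "*", "/", "(", ")"}),
    nonterminals=frozenset({"expression", "term", "factor"}),
    productions=(
        ("expression", ("expression", "+", "term")),
        ("expression", ("expression", "-", "term")),
        ("expression", ("term",)),
        ("term", ("term", "*", "factor")),
        ("term", ("term", "/", "factor")),
        ("term", ("factor",)),
        ("factor", ("(", "expression", ")")),
        ("factor", ("id",)),
    ),
    start="expression",
)
```

Constructing a `CFG` whose start symbol is a terminal, or whose bodies mention an undeclared symbol, raises `ValueError`.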
Notational Conventions Used
• Terminals:
  - Lowercase letters early in the alphabet, such as a, b, c.
  - Operator symbols such as +, *, and so on.
  - Punctuation symbols such as parentheses, comma, and so on.
  - The digits 0, 1, ..., 9.
  - Boldface strings such as id or if, each of which represents a single terminal symbol.
• Non-terminals:
  - Uppercase letters early in the alphabet, such as A, B, C.
  - The letter S, which is usually the start symbol.
  - Lowercase, italic names such as expr or stmt.
  - Uppercase letters may also be used to represent non-terminals for the constructs; for example, expr, term, and factor are represented by E, T, and F.
Notational Conventions Used…
• Grammar symbols: uppercase letters late in the alphabet, such as X, Y, Z, each representing either a non-terminal or a terminal.
• Strings of terminals: lowercase letters late in the alphabet, mainly u, v, x, y ∈ T*.
• Strings of grammar symbols: lowercase Greek letters, such as α, β, γ ∈ (N ∪ T)*.
• A set of productions A → α1, A → α2, ..., A → αk with a common head A (called A-productions) may be written
  A → α1 | α2 | ... | αk
  where α1, α2, ..., αk are called the alternatives for A.
• Unless stated otherwise, the head of the first production is the start symbol.
For example, the expression grammar is written:
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
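The | shorthand is purely notational. As an illustration, this Python sketch expands lines written as `A -> α1 | α2` into separate A-productions, treats the first head as the start symbol, and recovers N and T from the result. The ASCII `->` arrow, space-separated symbols, and the function name `parse_grammar` are assumptions of the sketch.

```python
def parse_grammar(text):
    """Expand lines of the form 'A -> α1 | α2 | ... | αk' into A-productions.

    Symbols are separated by spaces and 'ε' denotes the empty string.
    Returns (productions, start): productions maps each head to its list of
    alternatives (tuples of symbols), and start is the head of the first line.
    """
    productions = {}
    start = None
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        head, rhs = (part.strip() for part in line.split("->", 1))
        if start is None:
            start = head  # convention: the head of the first production is the start symbol
        for alternative in rhs.split("|"):
            body = tuple(s for s in alternative.split() if s != "ε")
            productions.setdefault(head, []).append(body)
    return productions, start


grammar, start = parse_grammar("""
E -> E + T | E - T | T
T -> T * F | T / F | F
F -> ( E ) | id
""")
nonterminals = set(grammar)  # every symbol that appears as a head
terminals = {s for bodies in grammar.values() for body in bodies for s in body} - nonterminals
print(start, sorted(nonterminals), sorted(terminals))
# E ['E', 'F', 'T'] ['(', ')', '*', '+', '-', '/', 'id']
```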
Derivation
• A derivation is a sequence of replacements of non-terminals (structure names) by choices from the right-hand sides of grammar rules.
• Example: E → E + E | E - E | E * E | E / E | -E
  E → ( E )
  E → id
• E ⇒ E + E means that E + E is derived from E:
  - we can replace E by E + E,
  - which requires the production rule E → E + E in our grammar.
• E ⇒ E + E ⇒ id + E ⇒ id + id: such a sequence of replacements of non-terminal symbols is called a derivation of id + id from E.
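Whether a proposed sequence is a derivation can be checked mechanically. The following Python sketch (function names are illustrative) represents sentential forms as tuples of grammar symbols and tests whether each step replaces exactly one non-terminal by the body of one of its productions.

```python
# E → E + E | E - E | E * E | E / E | -E | ( E ) | id, as a dict: head -> list of bodies.
GRAMMAR = {
    "E": [("E", "+", "E"), ("E", "-", "E"), ("E", "*", "E"), ("E", "/", "E"),
          ("-", "E"), ("(", "E", ")"), ("id",)],
}


def derives_in_one_step(grammar, alpha, beta):
    """True if alpha ⇒ beta: beta is alpha with one non-terminal A replaced by
    the body γ of some production A → γ."""
    for i, symbol in enumerate(alpha):
        for gamma in grammar.get(symbol, []):  # terminals have no productions
            if alpha[:i] + gamma + alpha[i + 1:] == beta:
                return True
    return False


def is_derivation(grammar, forms):
    """True if forms[0] ⇒ forms[1] ⇒ ... ⇒ forms[-1], one step at a time."""
    return all(derives_in_one_step(grammar, a, b) for a, b in zip(forms, forms[1:]))


def symbols(text):
    return tuple(text.split())  # sentential forms are written with spaces between symbols


steps = [symbols("E"), symbols("E + E"), symbols("id + E"), symbols("id + id")]
print(is_derivation(GRAMMAR, steps))                               # True
print(is_derivation(GRAMMAR, [symbols("E"), symbols("id + id")]))  # False: needs more than one step
```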
Derivation…
• In general, a one-step derivation is defined by
  α A β ⇒ α γ β   if there is a production rule A → γ in our grammar,
  where α and β are arbitrary strings of terminal and non-terminal symbols.
• α1 ⇒ α2 ⇒ ... ⇒ αn means that αn is derived from α1 (or α1 derives αn).
• At each derivation step, we can choose any of the non-terminals in the sentential form for replacement (a sentential form of G is any string α of grammar symbols with S ⇒* α).
• ⇒* denotes derivation in zero or more steps (the reflexive-transitive closure of ⇒).
• ⇒+ denotes derivation in one or more steps (the positive, or transitive, closure of ⇒).
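The closures can be decided by search when the grammar has no ε-productions, an assumption that holds for the grammar used here: since no step shortens a sentential form, only finitely many forms up to the target's length need to be explored. The functions `derives_star` and `derives_plus` are hypothetical names for this Python sketch; note that E ⇒* E holds in zero steps, while E ⇒+ E does not.

```python
from collections import deque

# E → E + E | E * E | ( E ) | - E | id   (no ε-productions)
GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("-", "E"), ("id",)]}


def derives_star(grammar, alpha, target):
    """Decide alpha ⇒* target by breadth-first search over sentential forms.

    Assumes no ε-productions: a form never gets shorter, so forms longer than
    target are discarded and the search is finite.
    """
    def still_possible(form):
        # Terminals left of the first non-terminal can never change again,
        # so they must already agree with target.
        for k, symbol in enumerate(form):
            if symbol in grammar:
                return True
            if k >= len(target) or symbol != target[k]:
                return False
        return True

    seen = {alpha}
    queue = deque([alpha])
    while queue:
        form = queue.popleft()
        if form == target:  # zero steps are allowed, so alpha itself counts
            return True
        for i, symbol in enumerate(form):
            for gamma in grammar.get(symbol, []):
                new = form[:i] + gamma + form[i + 1:]
                if len(new) <= len(target) and new not in seen and still_possible(new):
                    seen.add(new)
                    queue.append(new)
    return False


def derives_plus(grammar, alpha, target):
    """Decide alpha ⇒+ target: take exactly one step first, then zero or more."""
    return any(
        derives_star(grammar, alpha[:i] + gamma + alpha[i + 1:], target)
        for i, symbol in enumerate(alpha)
        for gamma in grammar.get(symbol, [])
    )


print(derives_star(GRAMMAR, ("E",), tuple("- ( id + id )".split())))  # True
print(derives_plus(GRAMMAR, ("E",), ("E",)))  # False: E ⇒+ E would need a cycle
```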
Derivation…
• If we always choose the leftmost non-terminal in each derivation step, the derivation is called a leftmost derivation.
  Example: E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
• If we always choose the rightmost non-terminal in each derivation step, the derivation is called a rightmost derivation.
  Example: E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(E + id) ⇒ -(id + id)
• We will see that top-down parsers try to find the leftmost derivation of the given source program.
• We will see that bottom-up parsers try to find the rightmost derivation of the given source program, in reverse order.
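The two strategies differ only in which non-terminal is expanded. In this Python sketch (the helper `derive` and its arguments are illustrative), the same production bodies are applied in the same order, and only the choice of the leftmost or rightmost non-terminal changes, reproducing both example derivations.

```python
NONTERMINALS = {"E"}


def derive(start, bodies, leftmost=True):
    """Apply the production bodies in order, each time expanding the leftmost
    (or rightmost) non-terminal; return every sentential form as a string.
    Assumes each step still has a non-terminal left to expand."""
    form = list(start)
    forms = ["".join(form)]
    for body in bodies:
        positions = [i for i, s in enumerate(form) if s in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = body  # replace that one non-terminal by the body
        forms.append("".join(form))
    return forms


# Bodies of E → -E, E → (E), E → E+E, E → id, E → id, in the order used above.
bodies = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
print(" => ".join(derive(["E"], bodies, leftmost=True)))
# E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
print(" => ".join(derive(["E"], bodies, leftmost=False)))
# E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
```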
Parse tree
• A parse tree is a graphical representation of a derivation.
• It filters out the order in which productions are applied to replace non-terminals.
• A parse tree corresponding to a derivation is a labeled tree in which:
  - the interior nodes are labeled by non-terminals,
  - the leaf nodes are labeled by terminals, and
  - the children of each interior node represent the replacement of the associated non-terminal in one step of the derivation.
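A parse tree maps naturally onto a recursive node type. This Python sketch uses a hypothetical `Node` class whose leaves, read left to right, give the derived string (the yield). It builds the tree for -(id + id); both the leftmost and the rightmost derivation of that string given earlier lead to this same tree, which is what "filters out the order" means. Leaves for ε are not modelled in this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                     # non-terminal (interior) or terminal (leaf)
    children: list = field(default_factory=list)  # one child per symbol of the body used

    def leaves(self):
        """The yield of the tree: leaf labels read from left to right."""
        if not self.children:
            return [self.label]
        return [leaf for child in self.children for leaf in child.leaves()]


def show(node, depth=0):
    """Print the tree sideways, one node per line, children indented under their parent."""
    print("  " * depth + node.label)
    for child in node.children:
        show(child, depth + 1)


# Parse tree for -(id + id) under E → E + E | E * E | ( E ) | - E | id
tree = Node("E", [
    Node("-"),
    Node("E", [
        Node("("),
        Node("E", [Node("E", [Node("id")]), Node("+"), Node("E", [Node("id")])]),
        Node(")"),
    ]),
])
print("".join(tree.leaves()))  # -(id+id)
show(tree)
```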
Parse tree and Derivation
Grammar: E → E + E | E * E | ( E ) | - E | id
Let's examine this derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
[Figure: the parse tree after each step of the derivation, growing from the single node E to the complete tree whose leaves read -(id + id).]
This is a top-down derivation, because we start building the parse tree at the top.
Exercise
a) Using the grammar below, draw a parse tree for the following string:
( ( id . id ) id ( id ) ( ( ) ) )
S → E
E → id
  | ( E . E )
  | ( L )
  | ( )
L → L E
  | E
b) Give a rightmost derivation for the string given in (a).
Ambiguity
• A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar. Equivalently, an ambiguous grammar:
  - produces more than one leftmost derivation, or
  - produces more than one rightmost derivation for the same sentence.
• We should eliminate the ambiguity in the grammar during the design phase of the compiler.
• An unambiguous grammar should be written to eliminate the ambiguity.
• Grammars that are ambiguous because of ambiguous operators can be disambiguated according to the precedence and associativity rules.
Ambiguity: Example
• Example: the arithmetic expression grammar
  E → E + E | E * E | ( E ) | id
  permits two distinct leftmost derivations for the sentence id + id * id:
  (a) E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
  (b) E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
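One way to see the ambiguity is to enumerate every parse tree of a sentence. This Python sketch does so for E → E + E | E * E | ( E ) | id by trying each operator as the root and recursing on both sides, memoized by span. Trees are printed with square brackets so they are not confused with the ( and ) tokens; the function name is illustrative. The first tree printed corresponds to derivation (a), the second to (b).

```python
from functools import lru_cache


def parse_trees(tokens):
    """All parse trees of tokens under E → E + E | E * E | ( E ) | id,
    each written as a fully bracketed string."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):  # parse trees for E deriving tokens[i:j]
        result = []
        if j - i == 1 and tokens[i] == "id":
            result.append("id")                                    # E → id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            result += ["(" + t + ")" for t in trees(i + 1, j - 1)]  # E → ( E )
        for k in range(i + 1, j - 1):                              # E → E op E, op at k
            if tokens[k] in ("+", "*"):
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        result.append(f"[{left} {tokens[k]} {right}]")
        return result

    return trees(0, len(tokens))


print(parse_trees("id + id * id".split()))
# ['[id + [id * id]]', '[[id + id] * id]']
```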
Ambiguity: example
E → E + E | E * E | ( E ) | - E | id
Construct a parse tree for the expression: id + id * id
[Figure: two sequences of partial parse trees for id + id * id; one starts by expanding E → E + E, the other by expanding E → E * E.]
Which parse tree is correct?
Ambiguity: example…
E → E + E | E * E | ( E ) | - E | id
Find a derivation for the expression: id + id * id
[Figure: two parse trees for id + id * id, one with + at the root (right operand id * id) and one with * at the root (left operand id + id).]
• According to the grammar, both are correct.
• A grammar that produces more than one parse tree for some input sentence is said to be an ambiguous grammar.
Elimination of ambiguity
Precedence/Association
• These two derivations point out a problem with the grammar: it has no notion of precedence, or implied order of evaluation.
To add precedence:
• Create a non-terminal for each level of precedence.
• Isolate the corresponding part of the grammar.
• Force the parser to recognize high-precedence subexpressions first.
For algebraic expressions:
• Multiplication and division first (level one).
• Subtraction and addition next (level two).
To add associativity:
• Left-associative: the next-level (higher-precedence) non-terminal is placed last in the production, as T is in E → E + T.
Elimination of ambiguity
• To disambiguate the grammar
  E → E + E | E * E | ( E ) | id
  we can use the precedence of operators as follows:
  * higher precedence (left associative)
  + lower precedence (left associative)
• We get the following unambiguous grammar:
  E → E + T | T
  T → T * F | F
  F → ( E ) | id
[Figure: parse tree for id + id * id under this grammar]
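To check that the rewritten grammar removes the ambiguity for this sentence, the following Python sketch counts parse trees for an arbitrary grammar by splitting the input among the symbols of each body. It assumes no ε-productions and no cycles of unit productions, which holds for both grammars compared here; the names are illustrative. Counting the trees of a few sentences does not prove a grammar unambiguous in general; it only confirms the behaviour on those inputs.

```python
from functools import lru_cache

AMBIGUOUS = {"E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("id",)]}
UNAMBIGUOUS = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def count_parse_trees(grammar, start, tokens):
    """Count the parse trees of tokens rooted at start.

    Assumes no ε-productions and no cycles of unit productions (A ⇒+ A),
    so every symbol covers at least one token and the recursion terminates.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        if symbol not in grammar:  # terminal: must match exactly one token
            return int(j - i == 1 and tokens[i] == symbol)
        return sum(count_body(body, i, j) for body in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_body(body, i, j):
        if not body:
            return int(i == j)
        first, rest = body[0], body[1:]
        total = 0
        # first covers tokens[i:k]; leave at least one token for each remaining symbol
        for k in range(i + 1, j - len(rest) + 1):
            left = count_symbol(first, i, k)
            if left:
                total += left * count_body(rest, k, j)
        return total

    return count_symbol(start, 0, len(tokens))


sentence = "id + id * id".split()
print(count_parse_trees(AMBIGUOUS, "E", sentence))    # 2
print(count_parse_trees(UNAMBIGUOUS, "E", sentence))  # 1
```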
Elimination of Left recursion
• A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α.
• Top-down parsing methods cannot handle left-recursive grammars, so a transformation that eliminates left recursion is needed.
• To eliminate immediate left recursion, the pair of productions
  A → Aα | β
  can be replaced by the non-left-recursive productions
  A → βA'
  A' → αA' | ε
Elimination of Left recursion…
This left-recursive grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
can be rewritten to eliminate the immediate left recursion:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → ( E ) | id
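Because the rewritten grammar is no longer left recursive, it can be parsed top-down by recursive descent: one procedure per non-terminal, with the next input token deciding which alternative to use. This Python sketch is an illustration under stated assumptions, not a complete parser: the class and method names are hypothetical, any identifier token stands in for id, and `E_prime`/`T_prime` receive the operand parsed so far, so the expression tree they build keeps + and * left-associative even though E' and T' are right recursive.

```python
class ParseError(Exception):
    """Raised when the input is not generated by the grammar."""


class Parser:
    """Recursive-descent parser for
        E  → T E'      E' → + T E' | ε
        T  → F T'      T' → * F T' | ε
        F  → ( E ) | id
    Returns an expression tree built from nested tuples (operator, left, right)."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of the input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.E()
        self.match("$")  # the whole input must be consumed
        return tree

    def E(self):  # E → T E'
        return self.E_prime(self.T())

    def E_prime(self, left):  # E' → + T E' | ε
        if self.peek() == "+":
            self.match("+")
            return self.E_prime(("+", left, self.T()))  # fold to the left
        return left  # E' → ε

    def T(self):  # T → F T'
        return self.T_prime(self.F())

    def T_prime(self, left):  # T' → * F T' | ε
        if self.peek() == "*":
            self.match("*")
            return self.T_prime(("*", left, self.F()))
        return left  # T' → ε

    def F(self):  # F → ( E ) | id
        token = self.peek()
        if token == "(":
            self.match("(")
            tree = self.E()
            self.match(")")
            return tree
        if token.isidentifier():  # assumption: any identifier token counts as id
            self.match(token)
            return token
        raise ParseError(f"unexpected token {token!r}")


print(Parser("id + id * id".split()).parse())  # ('+', 'id', ('*', 'id', 'id'))
```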
Elimination of Left recursion…
• In general, we can eliminate immediate left recursion with the following technique.
• First, we group the A-productions as
  A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
  where no βi begins with A. Then we replace the A-productions by:
  A → β1A' | β2A' | ... | βnA'
  A' → α1A' | α2A' | ... | αmA' | ε
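The grouping-and-replacing step translates directly into code. In this Python sketch (the function name is illustrative), bodies are tuples of symbols and `()` is ε; it assumes there is no production A → A and that the primed name A' is not already used in the grammar.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite  A → Aα1 | ... | Aαm | β1 | ... | βn  (no βi starts with A) as
         A  → β1A' | ... | βnA'
         A' → α1A' | ... | αmA' | ε
    Bodies are tuples of symbols and () is ε. Returns the new productions as a dict.
    Assumes there is no production A → A and that the name A' is unused."""
    primed = head + "'"
    alphas = [body[1:] for body in bodies if body[:1] == (head,)]
    betas = [body for body in bodies if body[:1] != (head,)]
    if not alphas:  # no immediate left recursion: leave the A-productions alone
        return {head: list(bodies)}
    return {
        head: [beta + (primed,) for beta in betas],
        primed: [alpha + (primed,) for alpha in alphas] + [()],
    }


print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
# {'E': [('T', "E'")], "E'": [('+', 'T', "E'"), ()]}
```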
Eliminating left-recursion (more)
• Example: Given
  S → Aa | b
  A → Ac | Sd | ε
  S is left recursive because S ⇒ Aa ⇒ Sda, even though no S-production is immediately left recursive.
• Substituting the S-productions in A → Sd gives the following productions:
  A → Ac | Aad | bd | ε
• Eliminating the immediate left recursion among the A-productions yields the following grammar:
  S → Aa | b
  A → bdA' | A'
  A' → cA' | adA' | ε
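The substitution in this example generalizes to the classic ordering method: arrange the non-terminals in some order, substitute earlier non-terminals into the leading positions of later ones, and then remove immediate left recursion. The following Python sketch (names illustrative) is only guaranteed for grammars without cycles (A ⇒+ A) or ε-productions; on this example, which has A → ε, it still reproduces the result above.

```python
def eliminate_left_recursion(grammar, order):
    """Remove immediate and indirect left recursion by substitution.

    grammar maps each non-terminal to a list of bodies (tuples of symbols, () is ε);
    order is the chosen ordering A1, ..., An of the non-terminals.
    """
    g = {head: list(bodies) for head, bodies in grammar.items()}
    for i, ai in enumerate(order):
        # Replace every Ai → Aj γ (j < i) by Ai → δ1 γ | ... | δk γ, where Aj → δ1 | ... | δk.
        for aj in order[:i]:
            expanded = []
            for body in g[ai]:
                if body[:1] == (aj,):
                    expanded.extend(delta + body[1:] for delta in g[aj])
                else:
                    expanded.append(body)
            g[ai] = expanded
        # Then remove the immediate left recursion among the Ai-productions.
        alphas = [body[1:] for body in g[ai] if body[:1] == (ai,)]
        if alphas:
            betas = [body for body in g[ai] if body[:1] != (ai,)]
            primed = ai + "'"  # assumption: this name is not already in use
            g[ai] = [beta + (primed,) for beta in betas]
            g[primed] = [alpha + (primed,) for alpha in alphas] + [()]
    return g


grammar = {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ()]}
for head, bodies in eliminate_left_recursion(grammar, ["S", "A"]).items():
    print(head, "→", " | ".join(" ".join(body) or "ε" for body in bodies))
# S → A a | b
# A → b d A' | A'
# A' → c A' | a d A' | ε
```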
Top down parsing
• Top-down parsing constructs a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder.
• Equivalently, it can be viewed as finding a leftmost derivation for an input string.
Top down parser
• Example: the top-down parse trees for the input id + id * id, according to the following grammar:
[Figure: the grammar and the sequence of top-down parse trees for id + id * id]
• This sequence of trees corresponds to a leftmost derivation of the input.
Top down parser
• The key problem is determining the production to be applied for a non-terminal, say A.
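One common answer to this question is a non-recursive predictive parser (listed in the outline), which chooses the production by looking up the pair (non-terminal on top of the stack, current input token) in a parsing table. In this Python sketch the table entries are filled in by hand for the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → ( E ) | id, and the names `TABLE`, `predictive_parse` and `ParseError` are illustrative assumptions. Each time a production is applied, the function records the sentential form (matched input followed by the stack contents), so the returned list is the leftmost derivation of the input.

```python
class ParseError(Exception):
    """Raised when the parsing table has no entry or a terminal does not match."""


# Hand-filled table M[non-terminal, lookahead] -> production body for
#   E → T E'   E' → + T E' | ε   T → F T'   T' → * F T' | ε   F → ( E ) | id
# () is ε; missing entries are syntax errors.
TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"), ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens, start="E"):
    """Table-driven top-down parse; returns the leftmost derivation as a list of sentential forms."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # the top of the stack is the end of the list
    pos = 0
    matched = []          # input terminals matched so far
    forms = [start]
    while stack[-1] != "$":
        top, lookahead = stack[-1], tokens[pos]
        if top not in NONTERMINALS:  # terminal on top: it must match the input
            if top != lookahead:
                raise ParseError(f"expected {top!r}, found {lookahead!r}")
            stack.pop()
            matched.append(top)
            pos += 1
        else:                        # non-terminal: the table chooses the production
            body = TABLE.get((top, lookahead))
            if body is None:
                raise ParseError(f"no production for {top} on {lookahead!r}")
            stack.pop()
            stack.extend(reversed(body))  # the body's first symbol ends up on top
            # Sentential form = matched input + stack read from top to bottom (without "$").
            forms.append(" ".join(matched + stack[:0:-1]))
    if tokens[pos] != "$":
        raise ParseError(f"unexpected {tokens[pos]!r} after the end of the expression")
    return forms


for form in predictive_parse("id + id * id".split()):
    print(form)
```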
