Syntax analysis
Outline
Introduction
Context free grammar (CFG)
Derivation
Parse tree
Ambiguity
Predictive parser
Operator precedence parsing
LR parsers
Introduction
Syntax: the way in which tokens are put together to
form expressions, statements, or blocks of statements.
The rules governing the formation of statements in a
programming language.
Syntax analysis: the task concerned with fitting a
sequence of tokens into a specified syntax.
Parsing: To break a sentence down into its component
parts with an explanation of the form, function, and
syntactical relationship of each part.
The syntax of a programming language is usually given
by the grammar rules of a context free grammar (CFG).
Parser
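[Figure: the source program is read character by character by the lexical analyzer, which supplies the next token to the syntax analyzer on request (get next token). The syntax analyzer produces the parse tree. Both phases use the symbol table (which contains a record for each identifier) and report lexical and syntax errors.]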
Introduction…
The syntax analyzer (parser) checks whether a given
source program satisfies the rules implied by a CFG.
If it does, the parser creates the parse tree of that
program.
Otherwise, the parser reports error messages.
A CFG:
gives a precise syntactic specification of a
programming language.
Introduction…
Parsers can be categorized into two groups:
Top-down parser
The parse tree is created top to bottom, from
the root down to the leaves.
Bottom-up parser
The parse tree is created bottom to top, from
the leaves up to the root.
Both top-down and bottom-up parsers scan the input
from left to right (one symbol at a time).
Efficient top-down and bottom-up parsers can be
implemented by making use of context-free grammars.
Context free grammar (CFG)
A context-free grammar is a specification for the
syntactic structure of a programming language.
A context-free grammar is a 4-tuple:
G = (T, N, P, S) where
T is a finite set of terminals (a set of tokens)
N is a finite set of non-terminals (syntactic variables)
P is a finite set of productions of the form A → α, where A ∈ N and α ∈ (N∪T)*
S ∈ N is the start symbol
Example: grammar for simple arithmetic
expressions
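A grammar G = (T, N, P, S) can be written down directly as data. The sketch below is only an illustration: the `Grammar` type and the `is_well_formed` helper are hypothetical names, and the productions are assumed to be the expression grammar E, T, F that appears under the notational conventions in these notes.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    """Hypothetical container for a CFG G = (T, N, P, S)."""
    terminals: frozenset      # T: the tokens
    nonterminals: frozenset   # N: the syntactic variables
    productions: dict         # P: head -> list of bodies (each body is a list of symbols)
    start: str                # S: the start symbol

# Assumed example: the expression grammar with E, T, F.
G = Grammar(
    terminals=frozenset({"+", "-", "*", "/", "(", ")", "id"}),
    nonterminals=frozenset({"E", "T", "F"}),
    productions={
        "E": [["E", "+", "T"], ["E", "-", "T"], ["T"]],
        "T": [["T", "*", "F"], ["T", "/", "F"], ["F"]],
        "F": [["(", "E", ")"], ["id"]],
    },
    start="E",
)

def is_well_formed(g):
    """Check that S is in N, T and N are disjoint, and every production is A -> alpha
    with A in N and alpha a string over N union T."""
    symbols = g.terminals | g.nonterminals
    return (
        g.start in g.nonterminals
        and not (g.terminals & g.nonterminals)
        and all(
            head in g.nonterminals and all(s in symbols for s in body)
            for head, bodies in g.productions.items()
            for body in bodies
        )
    )

print(is_well_formed(G))
```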
Notational Conventions Used
Terminals:
Lowercase letters early in the alphabet, such as a, b, c.
Operator symbols such as +, *, and so on.
Punctuation symbols such as parentheses, comma, and so
on.
The digits 0,1,. . . ,9.
Boldface strings such as id or if, each of which represents
a single terminal symbol.
Non-terminals:
Uppercase letters early in the alphabet, such as A, B, C.
The letter S is usually the start symbol.
Lowercase, italic names such as expr or stmt.
Uppercase letters may be used to represent non-terminals
for the constructs.
• expr, term, and factor are represented by E, T, F
Notational Conventions Used…
Grammar symbols
Uppercase letters late in the alphabet, such as X, Y, Z, that is, either non-
terminals or terminals.
Strings of terminals.
Lowercase letters late in the alphabet, mainly u,v,x,y ∈ T*
Strings of grammar symbols.
Lowercase Greek letters, α, β, γ ∈ (N∪T)*
A set of productions A → α1, A → α2, . . . , A → αk with a common head A (call them
A-productions) may be written
A → α1 | α2 | … | αk
α1, α2, . . . , αk are the alternatives for A.
The head of the first production is the start symbol.
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
Derivation
A derivation is a sequence of replacements of structure names
by choices on the right hand sides of grammar rules.
Example: E → E + E | E - E | E * E | E / E | - E
E → ( E )
E → id
Derivation…
In general, the one-step derivation is defined by
αAβ ⇒ αγβ if there is a production rule A → γ in our
grammar,
where α and β are arbitrary strings of terminal and non-
terminal symbols.
α1 ⇒ α2 ⇒ … ⇒ αn (αn is derived from α1, or α1 derives αn)
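The one-step rule αAβ ⇒ αγβ can be sketched as a list operation on a sentential form. This is a minimal illustration, assuming the example grammar E → E + E | E - E | E * E | E / E | - E | ( E ) | id; the names `GRAMMAR` and `derive_step` are hypothetical.

```python
# Assumed grammar: head -> list of bodies, each body a list of symbols.
GRAMMAR = {
    "E": [
        ["E", "+", "E"], ["E", "-", "E"], ["E", "*", "E"], ["E", "/", "E"],
        ["-", "E"], ["(", "E", ")"], ["id"],
    ]
}

def derive_step(form, index, body):
    """Replace the non-terminal A at form[index] by gamma = body (alpha A beta => alpha gamma beta)."""
    head = form[index]
    if body not in GRAMMAR.get(head, []):
        raise ValueError(f"no production {head} -> {' '.join(body)}")
    return form[:index] + body + form[index + 1:]

# E => - E => - ( E ) => - ( id )
form = ["E"]
form = derive_step(form, 0, ["-", "E"])
form = derive_step(form, 1, ["(", "E", ")"])
form = derive_step(form, 2, ["id"])
print(" ".join(form))
```

Here α is everything before `index` and β is everything after it; both are left unchanged by the step.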
Derivation…
We will see that a top-down parser tries to find the leftmost
derivation of the given source program.
We will see that a bottom-up parser tries to find the rightmost
derivation of the given source program in reverse order.
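The difference between the two derivation orders can be sketched as follows. This is an illustration only: it assumes the small grammar E → E + E | id, the helper name `derive` is hypothetical, and the code does not check that each chosen body is a valid production.

```python
NONTERMINALS = {"E"}  # assumed: E is the only non-terminal

def derive(start, choices, leftmost=True):
    """Apply the chosen production bodies in order, always expanding the
    leftmost (or rightmost) non-terminal of the current sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for body in choices:
        positions = [i for i, s in enumerate(form) if s in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = body
        steps.append(" ".join(form))
    return steps

choices = [["E", "+", "E"], ["id"], ["id"]]
left = derive("E", choices, leftmost=True)
right = derive("E", choices, leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
# Reading the rightmost derivation backwards gives the order in which a
# bottom-up parser would reduce the input.
print(" <= ".join(reversed(right)))
```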
Parse tree
A parse tree is a graphical representation of a
derivation
It filters out the order in which productions are applied
to replace non-terminals.
[Figure: parse trees built step by step for the derivation of - ( id + id ): E, then - E, then - ( E ), then - ( E + E ), then - ( id + id ).]
This is a top-down derivation because we start building the
parse tree at the top.
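A parse tree can also be represented as a nested data structure. The sketch below is an assumption about representation, not something taken from the slides: an interior node is a tuple whose first element is the non-terminal label, and a leaf is a plain string. The hypothetical `frontier` function reads the leaves from left to right, which recovers the derived sentence whatever order the productions were applied in.

```python
# Parse tree for - ( id + id ) under E -> E + E | - E | ( E ) | id.
tree = (
    "E", "-",
    ("E", "(",
        ("E", ("E", "id"), "+", ("E", "id")),
     ")"),
)

def frontier(node):
    """Return the leaves of the tree from left to right."""
    if isinstance(node, str):      # a leaf: a terminal symbol
        return [node]
    _label, *children = node       # an interior node: label followed by children
    leaves = []
    for child in children:
        leaves.extend(frontier(child))
    return leaves

print(" ".join(frontier(tree)))
```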
Exercise
a) Using the grammar below, draw a parse tree for the
following string:
( ( id . id ) id ( id ) ( ( ) ) )
S → E
E → id
  | ( E . E )
  | ( L )
  | ( )
L → L E
  | E
b) Give a rightmost derivation for the string given in (a).
Ambiguity
A grammar that produces more than one parse tree for some
sentence is called an ambiguous grammar.
Equivalently, an ambiguous grammar:
• produces more than one leftmost derivation or
• more than one rightmost derivation for the same
sentence.
Ambiguity: Example
Example: The arithmetic expression grammar
E → E + E | E * E | ( E ) | id
permits two distinct leftmost derivations for the
sentence id + id * id:
(a)                      (b)
E => E + E               E => E * E
  => id + E                => E + E * E
  => id + E * E            => id + E * E
  => id + id * E           => id + id * E
  => id + id * id          => id + id * id
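The ambiguity can be checked mechanically by enumerating every parse tree of a sentence. The sketch below is a brute-force illustration written specifically for the grammar E → E + E | E * E | ( E ) | id; the function name `parse_trees` and the tuple representation of trees are assumptions.

```python
def parse_trees(tokens):
    """Return all parse trees of `tokens` for E -> E + E | E * E | ( E ) | id."""
    tokens = tuple(tokens)
    trees = []
    if tokens == ("id",):                                   # E -> id
        trees.append("id")
    if len(tokens) >= 3 and tokens[0] == "(" and tokens[-1] == ")":
        # E -> ( E ): the inside must itself be derivable from E
        trees += [("(", t, ")") for t in parse_trees(tokens[1:-1])]
    for k in range(1, len(tokens) - 1):                     # E -> E op E
        if tokens[k] in ("+", "*"):
            for left in parse_trees(tokens[:k]):
                for right in parse_trees(tokens[k + 1:]):
                    trees.append((left, tokens[k], right))
    return trees

for tree in parse_trees("id + id * id".split()):
    print(tree)
```

Each printed tuple is one parse tree: the first has + at the root, the second has * at the root.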
Ambiguity: example
E → E + E | E * E | ( E ) | - E | id
Construct a parse tree for the expression: id + id * id
[Figure: two parse trees for id + id * id built step by step, one with + at the root and one with * at the root.]
Which parse tree is correct?
Ambiguity: example…
E → E + E | E * E | ( E ) | - E | id
[Figure: the two parse trees for id + id * id.]
A grammar that produces more than one
parse tree for some input sentence is said
to be an ambiguous grammar.
Elimination of ambiguity
Precedence/Associativity
These two derivations point out a problem with the grammar:
The grammar does not have a notion of precedence, or implied order of
evaluation.
To add precedence
Create a non-terminal for each level of precedence
Isolate the corresponding part of the grammar
Force the parser to recognize high-precedence subexpressions first
To add associativity
Left-associative: the next-level (higher-precedence) non-terminal is placed at the
end of the production
Elimination of ambiguity
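To disambiguate the grammar:
E → E + E | E * E | ( E ) | id
introduce one non-terminal per precedence level, so that id + id * id has a single parse tree:
E → E + T | T
T → T * F | F
F → ( E ) | id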
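One way to check the effect of the layered grammar is to count parse trees. The sketch below is an illustration, not part of the original slides: the name `count_trees` and the span-counting approach are assumptions. It counts how many parse trees the grammar E → E + T | T, T → T * F | F, F → ( E ) | id assigns to a token list.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count parse trees for E -> E + T | T, T -> T * F | F, F -> ( E ) | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def E(i, j):                       # trees deriving toks[i:j] from E
        total = T(i, j)                # E -> T
        for k in range(i + 1, j - 1):
            if toks[k] == "+":         # E -> E + T, split at this +
                total += E(i, k) * T(k + 1, j)
        return total

    @lru_cache(maxsize=None)
    def T(i, j):                       # trees deriving toks[i:j] from T
        total = F(i, j)                # T -> F
        for k in range(i + 1, j - 1):
            if toks[k] == "*":         # T -> T * F, split at this *
                total += T(i, k) * F(k + 1, j)
        return total

    @lru_cache(maxsize=None)
    def F(i, j):                       # trees deriving toks[i:j] from F
        if j - i == 1:
            return 1 if toks[i] == "id" else 0          # F -> id
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            return E(i + 1, j - 1)                      # F -> ( E )
        return 0

    return E(0, len(toks))

print(count_trees("id + id * id".split()))
print(count_trees("id + id + id".split()))
```

Because the right operand of + must be a T and the right operand of * must be an F, * binds tighter than + and both operators group to the left, which leaves only one tree per sentence.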
Elimination of Left recursion
A grammar is left recursive if it has a non-terminal A
such that there is a derivation
A ⇒+ Aα for some string α.
Top-down parsing methods cannot handle left-recursive
grammars,
so a transformation that eliminates left recursion is
needed.
To eliminate immediate left recursion, the production pair
A → Aα | β can be replaced by the non-left-recursive
productions
A → β A'
A' → α A' | ε
Elimination of Left recursion…
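Example: for the expression grammar E → E + T | T, T → T * F | F, F → ( E ) | id, this gives
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id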
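The replacement of A → Aα | β by A → β A', A' → α A' | ε can be sketched as a small function. This is an illustration under stated assumptions: the name `eliminate_immediate` is hypothetical, a body is a tuple of symbols, the empty tuple stands for ε, and the new non-terminal is named by appending a prime to the head.

```python
EPSILON = ()  # assumed encoding: the empty body stands for epsilon

def eliminate_immediate(head, bodies):
    """Remove immediate left recursion from the productions of `head`."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]    # A -> A alpha
    betas = [b for b in bodies if not b or b[0] != head]      # A -> beta
    if not alphas:
        return {head: list(bodies)}                           # nothing to do
    new = head + "'"
    return {
        head: [beta + (new,) for beta in betas],              # A  -> beta A'
        new: [alpha + (new,) for alpha in alphas] + [EPSILON],  # A' -> alpha A' | eps
    }

grammar = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

result = {}
for head, bodies in grammar.items():
    result.update(eliminate_immediate(head, bodies))

for head, bodies in result.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```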
Elimination of Left recursion…
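Generally, we can eliminate immediate left
recursion from the A-productions by the following technique.
First we group the A-productions as:
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no βi begins with A. Then we replace the A-productions by:
A → β1 A' | β2 A' | … | βn A'
A' → α1 A' | α2 A' | … | αm A' | ε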
Eliminating left-recursion (more)
Example: Given: S → Aa | b
A → Ac | Sd | ε
Substitute the S-productions in A → Sd to obtain the
following productions:
A → Ac | Aad | bd | ε
Eliminating the immediate left recursion among the A-productions
yields the following grammar:
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε
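The two steps of this example (substitute earlier non-terminals, then remove immediate left recursion) can be sketched as a loop over the non-terminals in a fixed order. This is a minimal illustration: the function names and the tuple encoding (empty tuple for ε) are assumptions, and the sketch does not guard against grammars with cycles such as A → A.

```python
def eliminate_immediate(head, bodies):
    """A -> A alpha | beta  becomes  A -> beta A',  A' -> alpha A' | epsilon."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: list(bodies)}
    new = head + "'"
    return {
        head: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],   # () encodes epsilon
    }

def eliminate_left_recursion(grammar, order):
    """Process non-terminals in `order`; for each Ai, substitute every earlier Aj
    that starts a body of Ai, then remove immediate left recursion from Ai."""
    g = {head: list(bodies) for head, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for body in g[ai]:
                if body and body[0] == aj:
                    # Ai -> Aj gamma  becomes  Ai -> delta gamma for each Aj -> delta
                    expanded.extend(delta + body[1:] for delta in g[aj])
                else:
                    expanded.append(body)
            g[ai] = expanded
        g.update(eliminate_immediate(ai, g[ai]))
    return g

grammar = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ()],
}
result = eliminate_left_recursion(grammar, ["S", "A"])
for head, bodies in result.items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```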
Top down parsing
Constructing a parse tree for the input string.
Top down parser
Example: for the input id + id * id
The top-down parse trees according to the
following grammar:
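A top-down parse of id + id * id can be sketched as a recursive-descent parser with one function per non-terminal. The sketch assumes the non-left-recursive expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id; the function names, the tuple representation of the tree, and the end marker `$` are illustrative choices, not taken from the slides.

```python
def parse(tokens):
    """Build a parse tree top-down, from the root E to the leaves."""
    toks = list(tokens) + ["$"]        # "$" is an assumed end-of-input marker
    pos = 0

    def peek():
        return toks[pos]

    def match(expected):
        nonlocal pos
        if toks[pos] != expected:
            raise SyntaxError(f"expected {expected!r}, found {toks[pos]!r}")
        pos += 1

    def E():                           # E -> T E'
        return ("E", T(), E_rest())

    def E_rest():                      # E' -> + T E' | epsilon
        if peek() == "+":
            match("+")
            return ("E'", "+", T(), E_rest())
        return ("E'", "ε")

    def T():                           # T -> F T'
        return ("T", F(), T_rest())

    def T_rest():                      # T' -> * F T' | epsilon
        if peek() == "*":
            match("*")
            return ("T'", "*", F(), T_rest())
        return ("T'", "ε")

    def F():                           # F -> ( E ) | id
        if peek() == "(":
            match("(")
            inner = E()
            match(")")
            return ("F", "(", inner, ")")
        match("id")
        return ("F", "id")

    tree = E()
    match("$")                         # the whole input must be consumed
    return tree

print(parse("id + id * id".split()))
```

Each function decides which production to use by looking only at the next token, which is possible here because the left recursion has been removed.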