Derivations (Part 1)
E → E + E | E * E | (E) | -E | id
E => -E (read "E derives -E")
E => -E => -(E) => -(id)
Derivations (Part 2)
• αAβ => αγβ if A → γ is a production
and α and β are arbitrary strings of
grammar symbols
• If a1 => a2 => … => an, we say a1
derives an
• => means derives in one step
• *=> means derives in zero or more steps
• +=> means derives in one or more steps
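The one-step relation above can be sketched in a few lines of Python (an illustrative sketch, not from the slides; it assumes nonterminals are single uppercase letters and everything else is terminal):

```python
# A minimal sketch of one derivation step: alpha A beta => alpha gamma beta.
# Nonterminals are single uppercase letters; everything else is terminal.
def derive_step(sentential_form, head, body):
    """Replace the leftmost occurrence of nonterminal `head` with `body`."""
    i = sentential_form.find(head)
    if i == -1:
        raise ValueError(f"{head} does not occur in {sentential_form!r}")
    return sentential_form[:i] + body + sentential_form[i + 1:]

# E => -E => -(E) => -(id), using E -> -E, E -> (E), E -> id
form = "E"
for body in ("-E", "(E)", "id"):
    form = derive_step(form, "E", body)
print(form)  # -(id)
```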
Sentences and Languages
• Let L(G) be the language generated by
the grammar G with start symbol S:
– Strings in L(G) may contain only tokens of G
– A string w is in L(G) if and only if S +=> w
– Such a string w is a sentence of G
• Any language that can be generated by a
CFG is said to be a context-free language
• If two grammars generate the same
language, they are said to be equivalent
Sentential Forms
• If S *=> α, where α may contain
nonterminals, we say that α is a sentential
form of G
• A sentence is a sentential form with no
nonterminals
Leftmost Derivations
• Only the leftmost nonterminal in any sentential
form is replaced at each step
• A leftmost step can be written as wAγ lm=> wδγ
– w consists of only terminals
– A → δ is a production
– γ is a string of grammar symbols
• If α derives β by a leftmost derivation, then we
write α lm*=> β
• If S lm*=> α then we say that α is a left-
sentential form of the grammar
• Analogous terms exist for rightmost derivations
Parse Trees
• A parse tree can be viewed as a graphical
representation of a derivation
• Every parse tree has a unique leftmost
derivation (not true of every sentence)
• An ambiguous grammar has:
– more than one parse tree for at least one
sentence
– more than one leftmost derivation for at least
one sentence
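For example, with the ambiguous grammar E → E + E | E * E | (E) | -E | id from earlier, the sentence id + id * id has two distinct leftmost derivations, one for each parse tree:

E => E + E => id + E => id + E * E => id + id * E => id + id * id
E => E * E => E + E * E => id + E * E => id + id * E => id + id * id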
Capability of Grammars
• Can describe most programming language
constructs
• An exception: requiring that variables are
declared before they are used
– Therefore, the grammar accepts a superset of
the actual language
– Later phase (semantic analysis) does type
checking
Regular Expressions vs. CFGs
• Every construct that can be described by
an RE can also be described by a CFG
• Why use REs at all?
– Lexical rules are simpler to describe this way
– REs are often easier to read
– More efficient lexical analyzers can be
constructed
Verifying Grammars
• A proof that a grammar generates a
language has two parts:
– Must show that every string generated by the
grammar is part of the language
– Must show that every string that is part of the
language can be generated by the grammar
• Rarely done for complete programming
languages!
Eliminating Ambiguity (1)
stmt → matched
     | unmatched
matched → if expr then matched else matched
        | other
unmatched → if expr then stmt
          | if expr then matched else unmatched
Left Recursion
• A grammar is left recursive if there exists a
nonterminal A with a derivation A +=> Aα for
some string of grammar symbols α
• Most top-down parsing methods cannot
handle left-recursive grammars
Eliminating Left Recursion (1)
Harder case:
S → Aa | b
A → Ac | Sd | ε
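Working this case through: ordering the nonterminals S, A and substituting S's bodies into A → Sd gives A → Ac | Aad | bd | ε; eliminating the immediate left recursion then yields:

S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε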
Eliminating Left Recursion (2)
• First arrange the nonterminals in some
order A1, A2, … An
• Apply the following algorithm:
for i = 1 to n {
  for j = 1 to i-1 {
    replace each production of the form Ai → Ajγ
    by the productions Ai → δ1γ | δ2γ | … | δkγ,
    where Aj → δ1 | δ2 | … | δk are the current Aj productions
  }
  eliminate the immediate left recursion among the Ai productions
}
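The "eliminate" step for a single nonterminal can be sketched in Python (a sketch under the assumption that production bodies are lists of symbols and [] denotes ε; names are illustrative):

```python
# Sketch of eliminating immediate left recursion for one nonterminal:
#   A -> A a1 | ... | A am | b1 | ... | bn
# becomes
#   A  -> b1 A' | ... | bn A'
#   A' -> a1 A' | ... | am A' | eps
# Bodies are lists of symbols; the empty list [] denotes epsilon.
def eliminate_immediate_left_recursion(head, bodies):
    new_head = head + "'"
    recursive = [b[1:] for b in bodies if b and b[0] == head]
    others = [b for b in bodies if not (b and b[0] == head)]
    if not recursive:
        return {head: bodies}          # nothing to eliminate
    return {
        head: [b + [new_head] for b in others],
        new_head: [a + [new_head] for a in recursive] + [[]],
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | eps
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```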
Left Factoring
• Rewriting productions to delay decisions
• Helpful for predictive parsing
• Not guaranteed to remove ambiguity
A → αβ1 | αβ2
becomes
A → αA'
A' → β1 | β2
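One round of this rewrite can be sketched as follows (illustrative; this sketch factors only the longest prefix common to all bodies of A, whereas full left factoring groups bodies by shared prefix):

```python
# Sketch of one round of left factoring:
#   A -> a b1 | a b2   becomes   A -> a A',  A' -> b1 | b2
# Bodies are lists of symbols; [] denotes epsilon.
def left_factor(head, bodies):
    prefix = []
    for symbols in zip(*bodies):       # walk all bodies in lockstep
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    if not prefix:
        return {head: bodies}          # nothing in common
    new_head = head + "'"
    return {head: [prefix + [new_head]],
            new_head: [b[len(prefix):] for b in bodies]}

# S -> if E then S else S | if E then S
# becomes S -> if E then S S',  S' -> else S | eps
print(left_factor("S", [["if", "E", "then", "S", "else", "S"],
                        ["if", "E", "then", "S"]]))
```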
Limitations of CFGs
• Cannot verify repeated strings
– Example: L1 = {wcw | w is in (a|b)*}
– Abstracts checking that variables are declared
• Cannot verify repeated counts
– Example: L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
– Abstracts checking that the numbers of formal
and actual parameters are equal
• Therefore, some checks are put off until
semantic analysis
Top Down Parsing
• Can be viewed two ways:
– Attempt to find leftmost derivation for input
string
– Attempt to create a parse tree, starting at the
root, creating nodes in preorder
• General form is recursive descent parsing
– May require backtracking
– Backtracking parsers are rarely used
because they are rarely needed
Predictive Parsing
• A special case of recursive-descent
parsing that does not require backtracking
• Must always know which production to use
based on current input symbol
• Can often create an appropriate grammar by:
– removing left recursion
– left factoring the resulting grammar
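As a sketch of what such a parser looks like, here is a recursive-descent predictive parser for the left-factored expression grammar used later in these slides (token handling is simplified; input is assumed to be a pre-tokenized list ending with "$"):

```python
# A minimal recursive-descent predictive parser (a sketch) for:
#   E -> T E'      E' -> + T E' | eps
#   T -> F T'      T' -> * F T' | eps
#   F -> ( E ) | id
class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok}, got {self.peek()}")
        self.pos += 1

    def E(self):
        self.T(); self.E_()

    def E_(self):                      # E' -> + T E' | eps
        if self.peek() == "+":
            self.match("+"); self.T(); self.E_()

    def T(self):
        self.F(); self.T_()

    def T_(self):                      # T' -> * F T' | eps
        if self.peek() == "*":
            self.match("*"); self.F(); self.T_()

    def F(self):                       # F -> ( E ) | id
        if self.peek() == "(":
            self.match("("); self.E(); self.match(")")
        else:
            self.match("id")

    def parse(self):
        self.E(); self.match("$")      # raises SyntaxError on bad input

Parser(["id", "+", "id", "*", "id", "$"]).parse()   # accepts silently
```

Note that each nonterminal becomes one procedure, and the lookahead token alone decides which production to apply, so no backtracking is needed.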
Transition Diagrams
• For parser:
– One diagram for each nonterminal
– Edge labels can be tokens or nonterminals
• A transition on a token means we should take that
transition if token is next input symbol
• A transition on a nonterminal can be thought of as
a call to a procedure for that nonterminal
• In contrast, for lexical analyzers:
– One (or more) diagrams for each token
– Labels are symbols of input alphabet
Creating Transition Diagrams
• First eliminate left recursion from grammar
• Then left factor grammar
• For each nonterminal A:
– Create an initial and final state
– For every production A → X1X2…Xn, create a
path from the initial to the final state with
edges labeled X1, X2, …, Xn
Using Transition Diagrams
• Predictive parsers:
– Start at start symbol of grammar
– From state s with edge to state t labeled with token
a, if next input token is a:
• State changes to t
• Input cursor moves one position right
– If edge labeled by nonterminal A:
• State changes to start state for A
• Input cursor is not moved
• If final state of A reached, then state changes to t
– If edge labeled by ε, state changes to t
• Can be recursive or non-recursive using stack
Transition Diagram Example
Original grammar:          After eliminating left recursion:
E → E + T | T              E → TE'
T → T * F | F              E' → +TE' | ε
F → (E) | id               T → FT'
                           T' → *FT' | ε
                           F → (E) | id
(Transition diagrams for E, E', T, T', and F not shown)
Simplifying Transition Diagrams
(Simplified diagrams for E' and E not shown)
Nonrecursive Predictive Parsing (1)
(Diagram of the parser model not shown: input buffer and stack)
Nonrecursive Predictive Parsing (2)
            id           (
E      E → TE'       E → TE'
T      T → FT'       T → FT'
F      F → id        F → (E)
(Parsing-table entries for the expression grammar; columns for the remaining tokens not shown)
Using a Predictive Parsing Table
Stack    Input         Output
$E       id+id*id$     …
…        …             …
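The parsing loop that such a trace follows can be sketched as below (a sketch, not the full algorithm with error recovery; the table entries are those for the expression grammar):

```python
# A sketch of the table-driven (nonrecursive) predictive parsing loop.
# Each entry maps (nonterminal, lookahead) to a production body;
# [] denotes epsilon.
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [],               ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [],               ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    stack = ["$", "E"]                 # start symbol on top of $
    tokens = tokens + ["$"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                raise SyntaxError(f"no table entry for ({top}, {look})")
            stack.extend(reversed(body))   # push body right to left
        else:                              # terminal (or $): must match
            if top != look:
                raise SyntaxError(f"expected {top}, got {look}")
            pos += 1
    return True

print(parse(["id", "+", "id", "*", "id"]))  # True
```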
        a         b         e          i            t        $
S     S → a                        S → iEtSS'
S'                      S' → ε                            S' → ε
                        S' → eS
E              E → b
LL(1) Grammars (1)
• The algorithm covered in class can be
applied to any grammar to produce a parsing table
• If parsing table has no multiply-defined
entries, grammar is said to be “LL(1)”
– First “L”, left-to-right scanning of input
– Second “L”, produces leftmost derivation
– “1” refers to the number of lookahead symbols
needed to make decisions
LL(1) Grammars (2)
• No ambiguous or left-recursive grammar
can be LL(1)
• Eliminating left recursion and left factoring
does not always lead to an LL(1) grammar
• Some grammars cannot be transformed
into an LL(1) grammar at all
• Although the example of a non-LL(1)
grammar we covered has a fix, there are
no universal rules to handle cases like this