Professional Documents
Culture Documents
• Derivations
• Ambiguity
• Syntax errors
• Formal languages are very important in CS Intuition: A finite automaton that runs long
– Especially in programming languages enough must repeat states
• A finite automaton cannot remember # of
• Regular languages times it has visited a particular state
– The weakest formal languages widely used • because a finite automaton has finite memory
– Many applications – Only enough to store in which state it is
– Cannot count, except up to a finite limit
• We will also study context-free languages • Many languages are not regular
• E.g., language of balanced parentheses is not
regular: { (i )i | i ≥ 0}
IF-THEN-ELSE
== = =
ID ID ID INT ID INT
Compiler Design 1 (2011) 5 Compiler Design 1 (2011) 6
E→ E*E X → Y1 ... Yn
Means X can be replaced by Y1 ... Yn
⏐ E+E X→ε
⏐ (E) Means X can be erased (replaced with empty string)
⏐ id
(1) Begin with a string consisting of the start More formally, we write
symbol “S”
X1 L X i L X n → X1 L X i−1Y1 LYm X i+1 L X n
(2) Replace any non-terminal X in the string by
a right-hand side of some production if there is a production
X → Y1 LYn X i → Y1 LYm
(3) Repeat (2) until there are no non-terminals in
the string
Terminals Examples
• Terminals are called so because there are no L(G) is the language of the CFG G
rules for replacing them
Two grammars:
• Terminals ought to be tokens of the language
S → (S ) S → (S )
OR
S → ε | ε
A fragment of our example language (simplified): Some elements of the our language
• Grammar
E
E → E+E | E ∗ E | (E) | id E
• String → E+E
E + E
id ∗ id + id → E ∗ E+E
→ id ∗ E + E E * E id
→ id ∗ id + E
id id
→ id ∗ id + id
Compiler Design 1 (2011) 27 Compiler Design 1 (2011) 28
Derivation in Detail (1) Derivation in Detail (2)
E E
E + E
E
E
→ E+E
E E
E
E E + E E + E
→ E+E
→ E+E
E * E → E ∗ E+E E * E
→ E ∗ E+E
→ id ∗ E + E
id
E E
E
E
→ E+E
→ E+E E + E E + E
→ E ∗ E+E
→ E ∗ E+E
E * E → id ∗ E + E E * E id
→ id ∗ E + E
→ id ∗ id + E
→ id ∗ id + E id id id id
→ id ∗ id + id
Compiler Design 1 (2011) 33 Compiler Design 1 (2011) 34
E E
E + E
E
E
→ E+E
E E
E
E E + E E + E
→ E+E
→ E+E
id → E+id E * E id
→ E+id
→ E ∗ E + id
E E
E
E
→ E+E
→ E+E E + E E + E
→ E+id
→ E+id
E * E id → E ∗ E + id E * E id
→ E ∗ E + id
→ E ∗ id + id
→ E ∗ id + id id id id
→ id ∗ id + id
Compiler Design 1 (2011) 41 Compiler Design 1 (2011) 42
• Note that right-most and left-most • We are not just interested in whether
derivations have the same parse tree s ∈ L(G)
– We need a parse tree for s
• The difference is just in the order in which
branches are added • A derivation defines a parse tree
– But one parse tree may have many derivations
• A grammar is ambiguous if it has more than • There are several ways to handle ambiguity
one parse tree for some string
– Equivalently, there is more than one right-most or • Most direct method is to rewrite grammar
left-most derivation for some string
unambiguously
• Ambiguity is bad
– Leaves meaning of some programs ill-defined
E→T+E|T
• Ambiguity is common in programming languages
T → int * T | int | ( E )
– Arithmetic expressions
– IF-THEN-ELSE
• Enforces precedence of * over +
• else matches the closest unmatched then • The expression if C1 then if C2 then S3 else S4
• We can describe this in the grammar
if if
S → MIF /* all then are matched */
| UIF /* some then are unmatched */
C1 if C1 if S4
MIF → if C then MIF else MIF
| OTHER
C2 S3 S4 C2 S3
UIF → if C then S
| if C then MIF else UIF • A valid parse tree • Not valid because the
(for a UIF) then expression is not
• Describes the same set of strings a MIF
E E
E E
E * E E + E
E + E E + E
E E + E int int E * E
+ E int int E + E
• Not all are supported by all parser generators • Such tokens are called synchronizing tokens
– Typically the statement or expression terminators
• Consider the erroneous expression • Idea: specify in the grammar known common
(1 + + 2) + 3 mistakes
• Panic-mode recovery: • Essentially promotes common errors to
– Skip ahead to next integer and then continue alternative syntax
• Example:
• (ML)-Yacc: use the special terminal error to – Write 5 x instead of 5 * x
describe how much input to skip – Add the production E → … | E E
E → int | E + E | ( E ) | error int | ( error ) • Disadvantage
– Complicates the grammar
• Past
– Slow recompilation cycle (even once a day)
– Find as many errors in one cycle as possible
– Researchers could not let go of the topic
• Present
– Quick recompilation cycle
– Users tend to correct one error/cycle
– Complex error recovery is needed less
– Panic-mode seems enough