Lecture 3
CS2210
Lecture 4
Parser
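[Figure: position of the parser in the frontend. source -> lexical analyzer -> (token) -> parser -> (parse tree) -> rest of frontend -> IR. The parser requests tokens from the lexical analyzer ("get next token"); the symbol table is shared.]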
Grammars
Precise, easy-to-understand description of syntax
Context-free grammars -> efficient parsers (automatically!)
Help in translation and error detection
Syntax Errors
E.g., unbalanced ()
Report errors quickly & accurately
Recover quickly (continue parsing after error)
Little overhead on parse time
CS2210 Compiler Design 2004/05
Error Recovery
Panic mode
Phrase level
  Local correction: replace a token by another and continue
Error productions
  Encode commonly expected errors in grammar
Global correction
  Find closest input string that is in L(G)
Context-free Grammars
Precise and easy way to specify the syntactical structure of a programming language
Efficient recognition methods exist
Natural specification of many recursive constructs
Terminals T
  Symbols which form strings of L(G), G a CFG (= tokens in the scanner), e.g. if, else, id
Nonterminals N
  Syntactic variables denoting sets of strings of L(G)
  Impose hierarchical structure (e.g., precedence rules)
Start symbol S (S ∈ N)
  Denotes the set of strings of L(G)
Productions P
  Rules that determine how strings are formed
  N -> (N|T)*
Terminals:
Nonterminals
Start symbol
Notational Conventions
Terminals
Nonterminals
Terminal strings
Strings of grammar symbols: α, β, as in A -> α
Productions
More Definitions
L(G) language generated by G = set of strings derived from S
S =>+ w : w sentence of G (w string of terminals)
S =>+ α : α sentential form of G (string can contain nonterminals)
G and G' are equivalent: L(G) = L(G')
A language generated by a grammar (of the form shown) is called a context-free language
Example
G = ({+, -, *, (, ), <id>}, {E}, E, {E -> E + E, E -> E * E, E -> (E), E -> - E, E -> <id>})
Sentence: -(<id> + <id>)
Derivation: E => -E => -(E) => -(E+E) => -(<id>+E) => -(<id>+<id>)
Leftmost derivation: always replace the leftmost nonterminal
Rightmost derivation: analogously
Left/right sentential form
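The leftmost derivation of -(<id> + <id>) can be replayed mechanically. The following is a minimal sketch, assuming sentential forms are stored as Python lists of symbols; the names PRODUCTIONS, leftmost_step and leftmost_derivation are illustrative and not part of the lecture.

```python
# Productions of the example grammar G (its only nonterminal is E).
PRODUCTIONS = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["-", "E"], ["<id>"]],
}

def leftmost_step(form, rhs):
    """Replace the leftmost nonterminal of a sentential form by rhs."""
    for i, symbol in enumerate(form):
        if symbol in PRODUCTIONS:
            if rhs not in PRODUCTIONS[symbol]:
                raise ValueError(f"{symbol} -> {' '.join(rhs)} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to replace")

def leftmost_derivation(start, choices):
    """Return every sentential form obtained by applying choices in order."""
    forms = [[start]]
    for rhs in choices:
        forms.append(leftmost_step(forms[-1], rhs))
    return forms

# The right-hand sides used in E => -E => -(E) => -(E+E) => -(<id>+E) => -(<id>+<id>)
forms = leftmost_derivation("E", [
    ["-", "E"],
    ["(", "E", ")"],
    ["E", "+", "E"],
    ["<id>"],
    ["<id>"],
])
for form in forms:
    print(" ".join(form))
```

Every printed line except the last is a left sentential form; the last line is the sentence itself.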
Parse Trees
E => -E => -(E) => -(E+E) => -(<id>+E) => -(<id>+<id>)
Parse tree = graphical representation of a derivation ignoring replacement order

        E
       / \
      -   E
        / | \
       (  E  )
        / | \
       E  +  E
       |     |
     <id>   <id>
Ambiguous Grammars
>= 2 different parse trees for some sentence
Equivalently, >= 2 leftmost (or rightmost) derivations
Usually want to have unambiguous grammars
E.g., want just one evaluation order: <id> + <id> * <id> should be parsed as <id> + (<id> * <id>), not (<id> + <id>) * <id>
To keep grammars simple, accept ambiguity and resolve it separately (outside of the grammar)
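Ambiguity can be made concrete by counting parse trees. The sketch below assumes a reduced version of the example grammar, E -> E + E | E * E | <id> (parentheses and unary minus are left out to keep it short); count_parse_trees is an illustrative name.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E + E | E * E | <id>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "<id>" else 0
        total = 0
        # Try every operator position k as the root production E -> E op E.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["<id>", "+", "<id>", "*", "<id>"]))
```

For <id> + <id> * <id> the count is 2, one tree per grouping, which is exactly the ambiguity described above.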
Expressive Power
Can express matching () with CFGs
Can express most properties desired for programming languages
Cannot express:
  Identifiers declared before used: L = {wcw | w is in (a|b)*}
  Parameter checking (#formals = #actuals): L = {a^n b^m c^n d^m | n >= 1, m >= 1}
Two derivations of the same sentence, if E1 then if E2 then S1 else S2:

stmt => if expr then stmt
     => if E1 then stmt
     => if E1 then if expr then stmt else stmt
     => if E1 then if E2 then stmt else stmt
     => if E1 then if E2 then S1 else stmt
     => if E1 then if E2 then S1 else S2

stmt => if expr then stmt else stmt
     => if E1 then stmt else stmt
     => if E1 then if expr then stmt else stmt
     => if E1 then if E2 then stmt else stmt
     => if E1 then if E2 then S1 else stmt
     => if E1 then if E2 then S1 else S2
Left Recursion
If for grammar G there is a derivation A =>+ Aα for some string α, then G is left recursive
Example:
S -> Aa | b
A -> Ac | Sd | ε
Parsing
= determining whether a string of tokens can be generated by a grammar Two classes based on order in which parse tree is constructed:
Top-down parsing
  Start construction at root of parse tree
Bottom-up parsing
  Start at leaves and proceed to root
Recursive descent parsing: a top-down method based on recursive procedures (typically one for each nonterminal)
Predictive Parser
Decides what production to use (based on lookahead in the input)
Uses a production by mimicking the right side
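A predictive parser built from recursive procedures might look as follows. This is a sketch under assumptions: it uses the expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id from the example later in these notes, tokens are plain strings, and the class and method names are illustrative.

```python
class ParseError(Exception):
    pass

class Parser:
    """One procedure per nonterminal; one token of lookahead picks the production."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # $ marks the end of the input
        self.pos = 0
        self.output = []                    # productions used, in order

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, got {self.lookahead!r}")
        self.pos += 1

    def E(self):
        self.output.append("E -> T E'")
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.lookahead == "+":
            self.output.append("E' -> + T E'")
            self.match("+")
            self.T()
            self.E_prime()
        else:
            self.output.append("E' -> ε")

    def T(self):
        self.output.append("T -> F T'")
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.lookahead == "*":
            self.output.append("T' -> * F T'")
            self.match("*")
            self.F()
            self.T_prime()
        else:
            self.output.append("T' -> ε")

    def F(self):
        if self.lookahead == "(":
            self.output.append("F -> ( E )")
            self.match("(")
            self.E()
            self.match(")")
        elif self.lookahead == "id":
            self.output.append("F -> id")
            self.match("id")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

def parse(tokens):
    parser = Parser(tokens)
    parser.E()
    parser.match("$")  # the whole input must be consumed
    return parser.output

for production in parse(["id", "+", "id", "*", "id"]):
    print(production)
```

Each procedure body mirrors the right side of the production it applies: a call for every nonterminal, a match for every terminal.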
Solution (eliminating left recursion)
Simple case: immediate left recursion: replace A -> Aα | β with
A -> βA'
A' -> αA' | ε
Left Factoring
Common prefix:
stmt -> if expr then stmt else stmt | if expr then stmt
Solution: factor out the common prefix
stmt -> if expr then stmt stmt'
stmt' -> else stmt | ε
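Both grammar transformations above are mechanical. The sketch below is a simplified illustration: right-hand sides are lists of symbols with [] standing for ε, new nonterminals are named by appending a prime, and left_factor handles only two alternatives. The function names and the sample production E -> E + T | T are assumptions made for the example.

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite A -> Aα | β as A -> βA', A' -> αA' | ε."""
    new_nt = nt + "'"
    alphas = [alt[1:] for alt in alternatives if alt[:1] == [nt]]  # the α parts
    betas = [alt for alt in alternatives if alt[:1] != [nt]]       # the β parts
    if not alphas:
        return {nt: alternatives}  # nothing to do
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }

def left_factor(nt, first_alt, second_alt):
    """Factor the longest common prefix out of two alternatives of nt."""
    n = 0
    while n < min(len(first_alt), len(second_alt)) and first_alt[n] == second_alt[n]:
        n += 1
    if n == 0:
        return {nt: [first_alt, second_alt]}  # no common prefix
    new_nt = nt + "'"
    return {
        nt: [first_alt[:n] + [new_nt]],
        new_nt: [first_alt[n:], second_alt[n:]],
    }

print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
print(left_factor(
    "stmt",
    ["if", "expr", "then", "stmt", "else", "stmt"],
    ["if", "expr", "then", "stmt"],
))
```

The second call reproduces the left-factored stmt grammar: stmt -> if expr then stmt stmt' and stmt' -> else stmt | ε.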
Transition Diagrams
Create initial and final state
For each production A -> X1 X2 ... Xn create a path from the initial to the final state, with edges labeled X1, X2, ..., Xn
[Figure: transition diagram for E, with edges labeled T and +]
[Figure: table-driven predictive parser with its Stack and Parsing Table M]
Parsing Algorithm
Stack contents and remaining input are called the parser configuration (initially $S on the stack and the complete input string)
Let X be the symbol on top of the stack and a the current input symbol:
1. If X = a = $: halt and announce success
2. If X = a ≠ $: pop X off the stack and advance the input to the next symbol
3. If X is a nonterminal: use M[X,a], which contains a production X -> rhs or error
   Replace X on the stack with rhs, or call the error routine, respectively
   E.g., for X -> UVW replace X with WVU (U on top)
   Output the production (or augment the parse tree)
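The three cases of the parsing algorithm translate almost directly into a loop. The sketch below hardcodes a parsing table M for the expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id that serves as the example in these notes; the dictionary layout, ParseError and ll1_parse are illustrative choices, and error handling is reduced to raising an exception.

```python
class ParseError(Exception):
    pass

NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[X, a] = right-hand side to use; [] stands for ε; missing entries mean error.
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}

def ll1_parse(tokens, start="E"):
    stack = ["$", start]            # top of stack is the end of the list
    tokens = list(tokens) + ["$"]
    pos = 0
    output = []                     # productions used, in order
    while True:
        X, a = stack[-1], tokens[pos]
        if X == "$" and a == "$":   # case 1: success
            return output
        if X not in NONTERMINALS:   # case 2: X is a terminal (or $)
            if X != a:
                raise ParseError(f"expected {X!r}, got {a!r}")
            stack.pop()
            pos += 1
        else:                       # case 3: X is a nonterminal
            rhs = M.get((X, a))
            if rhs is None:
                raise ParseError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))  # push rhs so its first symbol is on top
            output.append((X, rhs))

for lhs, rhs in ll1_parse(["id", "+", "id", "*", "id"]):
    print(lhs, "->", " ".join(rhs) or "ε")
```

The printed productions form a leftmost derivation of the input.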
First(X) = {X} for terminal X
If X -> ε is a production, add ε to First(X)
For X -> Y1...Yk place a in First(X) if a is in First(Yi) and ε is in First(Yj) for j = 1...i-1; if ε is in First(Yj) for all j = 1...k, add ε to First(X)
Follow(A) := set of terminals a that can appear immediately to the right of A in some sentential form, i.e., S =>* αAaβ for some α, β (a can be $)
Place $ in Follow(S), S start symbol, $ right end marker
If there is a production A -> αBβ, put everything in First(β) except ε in Follow(B)
If there is a production A -> αB, or A -> αBβ where ε is in First(β), then everything in Follow(A) is in Follow(B)
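The First and Follow rules are usually applied repeatedly until no set changes. The following sketch does this by fixed-point iteration for the expression grammar used in the example of these notes; the grammar encoding (a dict of right-hand-side lists, [] for ε) and all function names are assumptions made for illustration.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """First of a string Y1...Yk, given the First sets of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # terminal: {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can derive ε (or the string is empty)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue  # Follow is only defined for nonterminals
                    beta_first = first_of_string(rhs[i + 1:], first)
                    new = beta_first - {EPS}
                    if EPS in beta_first:
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

The sets start empty and only grow, so the loops terminate once a full pass adds nothing new.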
Construction Algorithm
Input: Grammar G
Output: Parsing table M
For each production A -> α do
  For each terminal a in FIRST(α) add A -> α to M[A,a]
  If ε is in FIRST(α) add A -> α to M[A,b] for each terminal b in FOLLOW(A) ($ counts as a terminal in this step)
Make each undefined entry in M error
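The construction algorithm can be sketched in a few lines once FIRST and FOLLOW are known. Here both are hardcoded for the expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id; the table maps (A, a) to a list of right-hand sides so that multiply defined entries stay visible, and missing keys play the role of error entries. All names are illustrative.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}

def first_of_string(symbols):
    """FIRST(α) for a right-hand side α, using the FIRST sets above."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # terminal: FIRST(a) = {a}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def build_table(grammar):
    table = {}  # (A, a) -> list of right-hand sides; absent key = error
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            first_alpha = first_of_string(rhs)
            lookaheads = first_alpha - {EPS}
            if EPS in first_alpha:
                lookaheads |= FOLLOW[lhs]  # includes $ where applicable
            for a in lookaheads:
                table.setdefault((lhs, a), []).append(rhs)
    return table

M = build_table(GRAMMAR)
for (lhs, a), productions in sorted(M.items()):
    for rhs in productions:
        print(f"M[{lhs}, {a}] = {lhs} -> {' '.join(rhs) or EPS}")
```

For this grammar every entry holds exactly one production.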
Example
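E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id

FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = FOLLOW(E') = {), $}
FOLLOW(T) = FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}

Parsing table M (blank = error):

      id          +            *            (           )          $
E     E -> TE'                              E -> TE'
E'                E' -> +TE'                            E' -> ε    E' -> ε
T     T -> FT'                              T -> FT'
T'                T' -> ε      T' -> *FT'               T' -> ε    T' -> ε
F     F -> id                               F -> (E)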
LL(1)
A grammar whose parsing table has no multiply defined entries is said to be LL(1)
First L = left-to-right input scanning
Second L = leftmost derivation
(1) = 1 token lookahead
Not all grammars can be brought to LL(1) form, i.e., there are languages that do not fall into the LL(1) class
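The LL(1) condition is easy to test once a table exists: look for entries holding more than one production. The sketch below uses a hand-built table for the left-factored statement grammar stmt -> if expr then stmt stmt' | other, stmt' -> else stmt | ε. The alternative "other" and the treatment of expr as a terminal are assumptions added so that the example is complete; the table layout and function names are illustrative.

```python
# (nonterminal, lookahead) -> list of productions placed in that entry.
# Built by hand for: stmt -> if expr then stmt stmt' | other
#                    stmt' -> else stmt | ε
# FOLLOW(stmt') contains both "else" and "$", so stmt' -> ε lands in both columns.
TABLE = {
    ("stmt", "if"): ["stmt -> if expr then stmt stmt'"],
    ("stmt", "other"): ["stmt -> other"],
    ("stmt'", "else"): ["stmt' -> else stmt", "stmt' -> ε"],
    ("stmt'", "$"): ["stmt' -> ε"],
}

def conflicts(table):
    """Return the multiply defined entries of a parsing table."""
    return {key: prods for key, prods in table.items() if len(prods) > 1}

def is_ll1(table):
    return not conflicts(table)

print(is_ll1(TABLE))
for (nt, a), prods in conflicts(TABLE).items():
    print(f"M[{nt}, {a}] holds:", " and ".join(prods))
```

The single conflict sits at M[stmt', else]: left factoring removed the common prefix, but the grammar itself is still ambiguous, so the table cannot be conflict-free.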