
INF 524 Compiler Construction Spring 2011

**Lexical Analyzer and Parser**

[Diagram: the parser requests tokens from the lexical analyzer; the lexical analyzer reads the source program and passes tokens back to the parser; both components consult the symbol table.]

Parser

- Accepts a string of tokens from the lexical analyzer (usually one token at a time)
- Verifies whether or not the string can be generated by the grammar
- Reports syntax errors (and recovers if possible)

Errors

- Lexical errors (e.g. misspelled word)
- Syntax errors (e.g. unbalanced parentheses, missing semicolon)
- Semantic errors (e.g. type errors)
- Logical errors (e.g. infinite recursion)

Error Handling
- Report errors clearly and accurately
- Recover quickly if possible
- Poor error recovery may lead to an avalanche of errors

Error Recovery
- Panic mode: discard tokens one at a time until a synchronizing token is found
- Phrase-level recovery: perform a local correction that allows parsing to continue
- Error productions: augment the grammar to handle common, predictable errors
- Global correction: use a complex algorithm to compute the least-cost sequence of changes leading to parseable code
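Panic mode is the simplest of these strategies to make concrete. The sketch below is a minimal Python illustration; the token list, index convention, and the synchronizing set {";", "}"} are illustrative assumptions, not from the lecture.

```python
def panic_mode_sync(tokens, i, sync_tokens=frozenset({";", "}"})):
    """Discard tokens one at a time until a synchronizing token is found.

    Returns the index of the first synchronizing token at or after
    position i, or len(tokens) if none remains.
    """
    while i < len(tokens) and tokens[i] not in sync_tokens:
        i += 1
    return i

# After detecting an error at the stray "2" in "x = 1 2 ; y = 3",
# the parser skips ahead to the ";" and resumes parsing there.
tokens = ["x", "=", "1", "2", ";", "y", "=", "3"]
resume_at = panic_mode_sync(tokens, 3)
```

The trade-off is exactly the one named above: recovery is fast and simple, but every token between the error and the synchronizing token is discarded unexamined.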

Context-Free Grammars
CFGs can represent recursive constructs that regular expressions cannot. A CFG consists of:
- Tokens (terminal symbols)
- Nonterminals (syntactic variables denoting sets of strings)
- Productions (rules specifying how terminals and nonterminals can combine to form strings)
- A start symbol (the set of strings it denotes is the language of the grammar)

Derivations (Part 1)
One definition of a language: the set of strings that have valid parse trees. Another definition: the set of strings that can be derived from the start symbol.
E → E + E | E * E | (E) | -E | id
E => -E (read: E derives -E)
E => -E => -(E) => -(id)

Derivations (Part 2)
αAβ => αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols. If α1 => α2 => … => αn, we say α1 derives αn.
- => means derives in one step
- *=> means derives in zero or more steps
- +=> means derives in one or more steps

Sentences and Languages
Let L(G) be the language generated by the grammar G with start symbol S:
- Strings in L(G) may contain only tokens of G
- A string w is in L(G) if and only if S +=> w
- Such a string w is a sentence of G
Any language that can be generated by a CFG is said to be a context-free language. If two grammars generate the same language, they are said to be equivalent.

Sentential Forms
If S *=> α, where α may contain nonterminals, we say that α is a sentential form of G. A sentence is a sentential form with no nonterminals.

Leftmost Derivations
Only the leftmost nonterminal in any sentential form is replaced at each step. A leftmost step can be written as wAγ lm=> wδγ, where:
- w consists of only terminals
- A → δ is a production
- γ is a string of grammar symbols
If α derives β by a leftmost derivation, then we write α lm*=> β. If S lm*=> α, then we say that α is a left-sentential form of the grammar. Analogous terms exist for rightmost derivations.

Parse Trees
A parse tree can be viewed as a graphical representation of a derivation. Every parse tree has a unique leftmost derivation (which is not true of every sentence). An ambiguous grammar has:
- more than one parse tree for at least one sentence
- more than one leftmost derivation for at least one sentence

Capability of Grammars
Grammars can describe most programming language constructs. An exception: requiring that variables are declared before they are used.
- Therefore, the grammar accepts a superset of the actual language
- A later phase (semantic analysis) does the type checking

Regular Expressions vs. CFGs
Every construct that can be described by an RE can also be described by a CFG. Why use REs at all?
- Lexical rules are simpler to describe this way
- REs are often easier to read
- More efficient lexical analyzers can be constructed

Verifying Grammars
A proof that a grammar generates exactly a given language has two parts:
- Must show that every string generated by the grammar is part of the language
- Must show that every string that is part of the language can be generated by the grammar
This is rarely done for complete programming languages!

Eliminating Ambiguity (1)
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
The sentence "if E1 then if E2 then S1 else S2" is ambiguous under this grammar.

Eliminating Ambiguity (2)
[Diagram: the two parse trees for "if E1 then if E2 then S1 else S2", one attaching the else to the inner if, the other to the outer if.]

Eliminating Ambiguity (3)
stmt → matched | unmatched
matched → if expr then matched else matched | other
unmatched → if expr then stmt | if expr then matched else unmatched

Left Recursion
A grammar is left recursive if there exists a nonterminal A such that A +=> Aα for some string α. Most top-down parsing methods cannot handle left-recursive grammars.

Eliminating Left Recursion (1)
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
becomes
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
Harder case:
S → Aa | b
A → Ac | Sd | ε
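The immediate-left-recursion rule above can be sketched in Python. This is a minimal illustration of that single rule (it does not handle the harder indirect case); the tuple-based grammar representation and the function name are assumptions for illustration.

```python
def eliminate_immediate_left_recursion(A, productions):
    """Apply the rule above to nonterminal A.

    productions: list of right-hand sides, each a tuple of symbols,
    with () standing for epsilon. Returns (new A productions, name of
    the fresh nonterminal A', productions of A').
    """
    A_prime = A + "'"
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == A]  # A -> A alpha
    betas = [rhs for rhs in productions if not rhs or rhs[0] != A]    # A -> beta
    if not alphas:
        return productions, None, []        # no immediate left recursion
    new_A = [beta + (A_prime,) for beta in betas]                     # A  -> beta A'
    new_A_prime = [alpha + (A_prime,) for alpha in alphas] + [()]     # A' -> alpha A' | epsilon
    return new_A, A_prime, new_A_prime

# E -> E + T | T  becomes  E -> T E'  and  E' -> + T E' | epsilon
new_E, E_prime, new_E_prime = eliminate_immediate_left_recursion(
    "E", [("E", "+", "T"), ("T",)])
```

Here `new_E` is `[("T", "E'")]` and `new_E_prime` is `[("+", "T", "E'"), ()]`, matching the transformation on the slide.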

Eliminating Left Recursion (2)
First arrange the nonterminals in some order A1, A2, …, An. Then apply the following algorithm:
for i = 1 to n {
    for j = 1 to i-1 {
        replace each production of the form Ai → Ajγ by
        Ai → δ1γ | δ2γ | … | δkγ, where Aj → δ1 | δ2 | … | δk are the current Aj productions
    }
    eliminate the immediate left recursion among the Ai productions
}

Left Factoring
Rewriting productions to delay decisions. Helpful for predictive parsing, but not guaranteed to remove ambiguity.
A → αβ1 | αβ2
becomes
A → αA'
A' → β1 | β2
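One step of this transformation can be sketched by factoring out the longest common prefix of a nonterminal's productions. A minimal Python sketch follows; the helper names and the tuple-based representation are illustrative assumptions.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of symbol tuples."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) == 1:
            prefix.append(symbols[0])
        else:
            break
    return tuple(prefix)

def left_factor(A, productions):
    """One left-factoring step on A's productions, as in the rule above.

    productions: list of tuples of symbols; () denotes epsilon.
    Returns the rewritten rules as {nonterminal: [rhs, ...]}.
    """
    prefix = common_prefix(productions)
    if not prefix:
        return {A: productions}            # nothing shared: unchanged
    A_prime = A + "'"
    return {
        A: [prefix + (A_prime,)],          # A  -> alpha A'
        A_prime: [p[len(prefix):] for p in productions],  # may contain () = epsilon
    }

# The dangling-else productions share the prefix "if E then S":
rules = left_factor("stmt", [("if", "E", "then", "S"),
                             ("if", "E", "then", "S", "else", "S")])
```

After factoring, `rules["stmt'"]` contains `()` and `("else", "S")`: the decision between the two alternatives is delayed until the parser sees whether an `else` follows.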

Limitations of CFGs
CFGs cannot verify repeated strings:
- Example: L1 = {wcw | w is in (a|b)*}
- Abstracts checking that variables are declared before use
CFGs cannot verify repeated counts:
- Example: L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
- Abstracts checking that the numbers of formal and actual parameters are equal
Therefore, some checks are put off until semantic analysis.

Top-Down Parsing
Can be viewed two ways:
- An attempt to find a leftmost derivation for the input string
- An attempt to create a parse tree, starting from the root, creating nodes in preorder
The general form is recursive-descent parsing:
- May require backtracking
- Backtracking parsers are not used frequently because they are not needed

Predictive Parsing
A special case of recursive-descent parsing that does not require backtracking. The parser must always know which production to use based on the current input symbol. An appropriate grammar can often be created by:
- removing left recursion
- left factoring the resulting grammar
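A predictive recursive-descent parser for the expression grammar used later in these notes (E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id) might look like the following Python sketch. The class and method names are illustrative assumptions; input is a list of tokens ending in "$".

```python
class PredictiveParser:
    """One procedure per nonterminal; the next token alone selects the
    production, so no backtracking is ever needed."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def parse(self):
        self.E()
        self.match("$")            # the whole input must be consumed
        return True

    def E(self):                   # E -> T E'
        self.T(); self.Eprime()

    def Eprime(self):
        if self.peek() == "+":     # E' -> + T E'
            self.match("+"); self.T(); self.Eprime()
        # otherwise E' -> epsilon: consume nothing

    def T(self):                   # T -> F T'
        self.F(); self.Tprime()

    def Tprime(self):
        if self.peek() == "*":     # T' -> * F T'
            self.match("*"); self.F(); self.Tprime()

    def F(self):
        if self.peek() == "(":     # F -> ( E )
            self.match("("); self.E(); self.match(")")
        else:                      # F -> id
            self.match("id")

ok = PredictiveParser(["id", "+", "id", "*", "id", "$"]).parse()
```

Note how each epsilon production simply falls through without consuming input: the lookahead token decides everything.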

Transition Diagrams
For a parser:
- One diagram for each nonterminal
- Edge labels can be tokens or nonterminals
- A transition on a token means we should take that transition if the token is the next input symbol
- A transition on a nonterminal can be thought of as a call to a procedure for that nonterminal
As opposed to lexical analyzers:
- One (or more) diagrams for each token
- Labels are symbols of the input alphabet

Creating Transition Diagrams
First eliminate left recursion from the grammar, then left factor it. For each nonterminal A:
- Create an initial and a final state
- For every production A → X1X2…Xn, create a path from the initial to the final state with edges labeled X1, X2, …, Xn

Using Transition Diagrams
Predictive parsers:
- Start at the start symbol of the grammar
- From state s with an edge to state t labeled with token a, if the next input token is a: the state changes to t and the input cursor moves one position right
- If the edge is labeled by nonterminal A: the state changes to the start state for A and the input cursor is not moved; when the final state of A is reached, the state changes to t
- If the edge is labeled by ε, the state changes to t
The parser can be recursive, or non-recursive using a stack.

Transition Diagram Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
[Diagram: one transition diagram per nonterminal; e.g. E: 0 —T→ 1 —E'→ 2, and E': 3 —+→ 4 —T→ 5 —E'→ 6 with an ε-edge from 3 to 6.]

Simplifying Transition Diagrams
[Diagram: the diagram for E' is simplified by turning its tail reference to E' into a loop, and the simplified E' diagram is then substituted into the diagram for E, yielding a single smaller diagram.]

Nonrecursive Predictive Parsing (1)
[Diagram: model of a table-driven predictive parser: an input buffer (e.g. "a + b $"), a stack (e.g. "X Y Z $"), the predictive parsing program, a parsing table M, and an output stream.]

Nonrecursive Predictive Parsing (2)
The program considers X, the symbol on top of the stack, and a, the next input symbol:
- If X = a = $, the parser halts successfully
- If X = a ≠ $, the parser pops X off the stack and advances to the next input symbol
- If X is a nonterminal, the program consults M[X, a] (a production or an error entry)

Nonrecursive Predictive Parsing (3)
Initialize the stack with the start symbol of the grammar, and initialize the input pointer to the first symbol of the input. After consulting the parsing table:
- If the entry is a production, the parser replaces the top entry of the stack with the right side of the production (leftmost symbol on top)
- Otherwise, an error recovery routine is called
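The stack machine described in the last three slides can be sketched in Python. The hard-coded table below is the standard LL(1) table for the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id; the function and variable names are illustrative assumptions.

```python
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    """Run the stack machine; return the list of productions used."""
    stack = ["$", "E"]                 # start symbol above the end marker
    i, output = 0, []
    while stack:
        X, a = stack.pop(), tokens[i]
        if X == a == "$":              # stack and input both exhausted
            return output
        if X not in NONTERMINALS:      # terminal on top: must match input
            if X != a:
                raise SyntaxError(f"expected {X!r}, got {a!r}")
            i += 1
        else:                          # nonterminal: consult M[X, a]
            if (X, a) not in TABLE:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            rhs = TABLE[(X, a)]
            output.append((X, rhs))
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
    raise SyntaxError("unexpected end of parse")

productions = ll1_parse(["id", "+", "id", "*", "id", "$"])  # 11 productions
```

For the input id+id*id, this emits the same eleven productions, in the same order, as the worked trace table later in these notes.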

Predictive Parsing Table

Nonterminal | id       | +          | *          | (        | )       | $
E           | E → TE'  |            |            | E → TE'  |         |
E'          |          | E' → +TE'  |            |          | E' → ε  | E' → ε
T           | T → FT'  |            |            | T → FT'  |         |
T'          |          | T' → ε     | T' → *FT'  |          | T' → ε  | T' → ε
F           | F → id   |            |            | F → (E)  |         |

Using a Predictive Parsing Table

Stack    | Input      | Output
$E       | id+id*id$  |
$E'T     | id+id*id$  | E → TE'
$E'T'F   | id+id*id$  | T → FT'
$E'T'id  | id+id*id$  | F → id
$E'T'    | +id*id$    |
$E'      | +id*id$    | T' → ε
$E'T+    | +id*id$    | E' → +TE'
$E'T     | id*id$     |
$E'T'F   | id*id$     | T → FT'
$E'T'id  | id*id$     | F → id
$E'T'    | *id$       |
$E'T'F*  | *id$       | T' → *FT'
$E'T'F   | id$        |
$E'T'id  | id$        | F → id
$E'T'    | $          |
$E'      | $          | T' → ε
$        | $          | E' → ε

FIRST
FIRST(α) is the set of all terminals that begin any string derived from α. Computing FIRST:
- If X is a terminal, FIRST(X) = {X}
- If X → ε is a production, add ε to FIRST(X)
- If X is a nonterminal and X → Y1Y2…Yn is a production:
  - For all terminals a, add a to FIRST(X) if a is a member of FIRST(Yi) and ε is a member of FIRST(Y1), FIRST(Y2), …, FIRST(Yi-1)
  - If ε is a member of FIRST(Y1), FIRST(Y2), …, FIRST(Yn), add ε to FIRST(X)
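These rules can be iterated to a fixed point. A minimal Python sketch follows; the dictionary representation (tuples for right-hand sides, () for an epsilon production, "" standing for ε in the result sets) is an illustrative assumption.

```python
def compute_first(grammar, terminals):
    """Iterate the FIRST rules above until no set changes.

    grammar: {nonterminal: [rhs, ...]}, each rhs a tuple of symbols.
    Returns {symbol: set of terminals, possibly including "" for epsilon}.
    """
    first = {t: {t} for t in terminals}       # rule 1: FIRST(a) = {a}
    for A in grammar:
        first[A] = set()
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                before = len(first[A])
                all_nullable = True
                for Y in rhs:
                    first[A] |= first[Y] - {""}   # add FIRST(Yi) minus epsilon
                    if "" not in first[Y]:
                        all_nullable = False      # Yi blocks later symbols
                        break
                if all_nullable:                  # every Yi derives epsilon
                    first[A].add("")              # covers rhs == () as well
                if len(first[A]) != before:
                    changed = True
    return first

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
FIRST = compute_first(GRAMMAR, {"+", "*", "(", ")", "id"})
# FIRST["E"] == {"(", "id"}; FIRST["E'"] == {"+", ""}; FIRST["T'"] == {"*", ""}
```

Run on the expression grammar, this reproduces the FIRST sets worked out in the example slide below.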

FOLLOW
FOLLOW(A), for any nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form. More formally:
- a is in FOLLOW(A) if and only if there exists a derivation of the form S *=> αAaβ
- $ is in FOLLOW(A) if and only if there exists a derivation of the form S *=> αA

Computing FOLLOW
- Place $ in FOLLOW(S)
- If there is a production A → αBβ, then everything in FIRST(β) (except for ε) is in FOLLOW(B)
- If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is also in FOLLOW(B)
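These three rules also iterate to a fixed point. The Python sketch below assumes FIRST sets have already been computed (here they are written out by hand for the expression grammar); the representation, with "" for ε and "$" for the end marker, is an illustrative assumption.

```python
def first_of_string(symbols, first):
    """FIRST of a symbol string; contains "" only if the string is nullable."""
    result = set()
    for X in symbols:
        result |= first[X] - {""}
        if "" not in first[X]:
            return result
    return result | {""}

def compute_follow(grammar, start, first):
    """Apply the three FOLLOW rules above until no set changes."""
    follow = {A: set() for A in grammar}
    follow[start].add("$")                          # rule 1
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                for i, B in enumerate(rhs):
                    if B not in grammar:            # FOLLOW only for nonterminals
                        continue
                    before = len(follow[B])
                    beta_first = first_of_string(rhs[i + 1:], first)
                    follow[B] |= beta_first - {""}  # rule 2
                    if "" in beta_first:            # rule 3: beta nullable (or empty)
                        follow[B] |= follow[A]
                    if len(follow[B]) != before:
                        changed = True
    return follow

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
FIRST = {"+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"},
         "E": {"(", "id"}, "E'": {"+", ""}, "T": {"(", "id"},
         "T'": {"*", ""}, "F": {"(", "id"}}
FOLLOW = compute_follow(GRAMMAR, "E", FIRST)
# FOLLOW["E"] == {")", "$"}; FOLLOW["F"] == {"+", "*", ")", "$"}
```

The results match the FOLLOW sets in the worked example slide that follows.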

FIRST and FOLLOW Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = FOLLOW(E') = {), $}
FOLLOW(T) = FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}

Creating a Predictive Parsing Table
For each production A → α:
- For each terminal a in FIRST(α), add A → α to M[A, a]
- If ε is in FIRST(α), add A → α to M[A, b] for every terminal b in FOLLOW(A)
- If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $]
Mark each undefined entry of M as an error entry (use some recovery strategy).
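The construction above can be sketched in Python, given precomputed FIRST and FOLLOW sets (written out by hand below for the expression grammar). Table entries are kept as lists so that a multiply-defined entry, the symptom of a non-LL(1) grammar, stays visible; the names and representation are illustrative assumptions.

```python
def first_of_string(symbols, first):
    """FIRST of a symbol string; contains "" only if the string is nullable."""
    result = set()
    for X in symbols:
        result |= first[X] - {""}
        if "" not in first[X]:
            return result
    return result | {""}

def build_table(grammar, first, follow):
    """Fill M[A, a] by the rules above."""
    table = {}
    for A, rhss in grammar.items():
        for rhs in rhss:
            fs = first_of_string(rhs, first)
            targets = fs - {""}            # rule 1: terminals in FIRST(alpha)
            if "" in fs:                   # rules 2 and 3: FOLLOW(A) already
                targets |= follow[A]       # includes "$" when applicable
            for a in targets:
                table.setdefault((A, a), []).append(rhs)
    return table

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
FIRST = {"+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"},
         "E": {"(", "id"}, "E'": {"+", ""}, "T": {"(", "id"},
         "T'": {"*", ""}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}
M = build_table(GRAMMAR, FIRST, FOLLOW)
is_ll1 = all(len(entries) == 1 for entries in M.values())   # no multiply-defined entries
```

For this grammar every entry holds exactly one production, which is precisely the LL(1) condition discussed below; run on the dangling-else grammar instead, `M[("S'", "e")]` would hold two productions.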

Multiply-Defined Entries Example
S → iEtSS' | a
S' → eS | ε
E → b

Nonterminal | a      | b      | e                | i            | t | $
S           | S → a  |        |                  | S → iEtSS'   |   |
S'          |        |        | S' → eS, S' → ε  |              |   | S' → ε
E           |        | E → b  |                  |              |   |

LL(1) Grammars (1)
The algorithm covered in class can be applied to any grammar to produce a parsing table. If the parsing table has no multiply-defined entries, the grammar is said to be "LL(1)":
- The first "L": left-to-right scanning of the input
- The second "L": produces a leftmost derivation
- The "1" refers to the number of lookahead symbols needed to make decisions

LL(1) Grammars (2)
No ambiguous or left-recursive grammar can be LL(1). Eliminating left recursion and left factoring does not always lead to an LL(1) grammar, and some grammars cannot be transformed into an LL(1) grammar at all. Although the example of a non-LL(1) grammar we covered has a fix, there are no universal rules to handle cases like this.
