
INF 524 Compiler Construction Spring 2011

**Lexical Analyzer and Parser**

[Diagram: the lexical analyzer reads the source program and passes tokens to the parser on request; both components consult the symbol table.]

Parser

- Accepts a string of tokens from the lexical analyzer (usually one token at a time)
- Verifies whether or not the string can be generated by the grammar
- Reports syntax errors (and recovers where possible)

Errors

- Lexical errors (e.g. a misspelled word)
- Syntax errors (e.g. unbalanced parentheses, a missing semicolon)
- Semantic errors (e.g. type errors)
- Logical errors (e.g. infinite recursion)

Error Handling

- Report errors clearly and accurately
- Recover quickly where possible
- Poor error recovery may lead to an avalanche of spurious errors

Error Recovery

- Panic mode: discard tokens one at a time until a synchronizing token is found
- Phrase-level recovery: perform a local correction that allows parsing to continue
- Error productions: augment the grammar to handle predicted, common errors
- Global correction: use a complex algorithm to compute a least-cost sequence of changes leading to parseable code
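Panic mode, the simplest of these strategies, can be sketched in a few lines of Python; the token values and the choice of synchronizing set below are illustrative assumptions, not part of the slides:

```python
def panic_mode_sync(tokens, pos, sync=(';', '}')):
    """Discard tokens starting at pos until a synchronizing token is
    found; return the position at which parsing should resume (or
    len(tokens) if no synchronizing token remains)."""
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1
    return pos
```

For example, after an error at position 1 in `['id', '+', '+', ';', 'id']`, the parser would skip ahead to the `;` and resume there.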

Context-Free Grammars

- CFGs can represent recursive constructs that regular expressions cannot
- A CFG consists of:
  - Tokens (terminals, symbols)
  - Nonterminals (syntactic variables denoting sets of strings)
  - Productions (rules specifying how terminals and nonterminals can combine to form strings)
  - A start symbol (the set of strings it denotes is the language of the grammar)

Derivations (Part 1)

- One definition of a language: the set of strings that have valid parse trees
- Another definition: the set of strings that can be derived from the start symbol
- Example grammar: E -> E + E | E * E | ( E ) | - E | id
- E => -E (read: E derives -E)
- E => -E => -(E) => -(id)

Derivations (Part 2)

- αAβ => αγβ if A -> γ is a production and α and β are arbitrary strings of grammar symbols
- If a1 => a2 => ... => an, we say a1 derives an
- => means derives in one step
- *=> means derives in zero or more steps
- +=> means derives in one or more steps

Sentences and Languages

- Let L(G) be the language generated by the grammar G with start symbol S:
  - Strings in L(G) may contain only tokens of G
  - A string w is in L(G) if and only if S +=> w
  - Such a string w is a sentence of G
- Any language that can be generated by a CFG is said to be a context-free language
- If two grammars generate the same language, they are said to be equivalent

Sentential Forms

- If S *=> α, where α may contain nonterminals, we say that α is a sentential form of G
- A sentence is a sentential form with no nonterminals

Leftmost Derivations

- Only the leftmost nonterminal in any sentential form is replaced at each step
- A leftmost step can be written as wAγ lm=> wδγ, where A -> δ is a production
  - w consists of only terminals
  - γ is a string of grammar symbols
- If α derives β by a leftmost derivation, then we write α lm*=> β
- If S lm*=> α, then we say that α is a left-sentential form of the grammar
- Analogous terms exist for rightmost derivations

Parse Trees

- A parse tree can be viewed as a graphical representation of a derivation
- Every parse tree has a unique leftmost derivation (not true of every sentence)
- An ambiguous grammar has:
  - more than one parse tree for at least one sentence
  - more than one leftmost derivation for at least one sentence

Capability of Grammars

- Grammars can describe most programming language constructs
- An exception: requiring that variables are declared before they are used
  - Therefore, the grammar accepts a superset of the actual language
  - A later phase (semantic analysis) does type checking

Regular Expressions vs. CFGs

- Every construct that can be described by an RE can also be described by a CFG
- Why use REs at all?
  - Lexical rules are simpler to describe this way
  - REs are often easier to read
  - More efficient lexical analyzers can be constructed from REs

Verifying Grammars

- A proof that a grammar verifies a language has two parts:
  - Show that every string generated by the grammar is part of the language
  - Show that every string that is part of the language can be generated by the grammar
- Rarely done for complete programming languages!

Eliminating Ambiguity (1)

  stmt -> if expr then stmt
        | if expr then stmt else stmt
        | other

- The sentence "if E1 then if E2 then S1 else S2" is ambiguous under this grammar (the dangling-else problem)

Eliminating Ambiguity (2)

[Diagram: the two parse trees for "if E1 then if E2 then S1 else S2", one attaching the else to the inner if and one to the outer if.]

Eliminating Ambiguity (3)

  stmt      -> matched | unmatched
  matched   -> if expr then matched else matched | other
  unmatched -> if expr then stmt
             | if expr then matched else unmatched

Left Recursion

- A grammar is left recursive if there exists a nonterminal A such that A +=> Aα for some string α
- Most top-down parsing methods cannot handle left-recursive grammars

Eliminating Left Recursion (1)

- The productions

    A -> Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn

  become

    A  -> β1A' | β2A' | ... | βnA'
    A' -> α1A' | α2A' | ... | αmA' | ε

- Harder case:

    S -> Aa | b
    A -> Ac | Sd | ε

Eliminating Left Recursion (2)

- First arrange the nonterminals in some order A1, A2, ..., An
- Apply the following algorithm:

    for i = 1 to n {
        for j = 1 to i-1 {
            replace each production of the form Ai -> Aj γ
            by the productions Ai -> δ1 γ | δ2 γ | ... | δk γ,
            where Aj -> δ1 | δ2 | ... | δk are the current Aj productions
        }
        eliminate the immediate left recursion among the Ai productions
    }
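The algorithm above can be sketched in Python. The grammar encoding (a dict mapping each nonterminal to a list of productions, each a list of symbols, with [] standing for ε) and the primed-name convention for new nonterminals are assumptions made for illustration:

```python
def eliminate_left_recursion(grammar, order):
    """Remove left recursion following the slide's algorithm.
    grammar: dict of nonterminal -> list of productions (lists of
    symbols, [] meaning an epsilon production); order: the chosen
    A1..An ordering. Assumes appending a prime yields a fresh name."""
    g = {a: [list(p) for p in ps] for a, ps in grammar.items()}
    for i, ai in enumerate(order):
        # Substitute earlier nonterminals: Ai -> Aj gamma becomes
        # Ai -> delta gamma for each current production Aj -> delta
        for aj in order[:i]:
            new = []
            for p in g[ai]:
                if p and p[0] == aj:
                    new.extend(d + p[1:] for d in g[aj])
                else:
                    new.append(p)
            g[ai] = new
        # Eliminate immediate left recursion among the Ai productions
        rec = [p[1:] for p in g[ai] if p and p[0] == ai]
        if rec:
            ai2 = ai + "'"
            g[ai] = [p + [ai2] for p in g[ai] if not (p and p[0] == ai)]
            g[ai2] = [alpha + [ai2] for alpha in rec] + [[]]
    return g
```

Running it on the slide's harder case (S -> Aa | b, A -> Ac | Sd | ε) leaves S unchanged and produces A -> bdA' | A' and A' -> cA' | adA' | ε.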

Left Factoring

- Rewriting productions to delay decisions
- Helpful for predictive parsing
- Not guaranteed to remove ambiguity
- The productions

    A -> αβ1 | αβ2

  become

    A  -> αA'
    A' -> β1 | β2
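As a sketch of this rewrite, left factoring can be implemented by repeatedly extracting the longest prefix shared by two or more productions; the dict-based grammar encoding and the primed helper names are illustrative assumptions:

```python
def common_prefix(p, q):
    """Longest common prefix of two symbol lists."""
    out = []
    for x, y in zip(p, q):
        if x != y:
            break
        out.append(x)
    return out

def left_factor(grammar):
    """Repeatedly pull out the longest prefix shared by two or more
    productions of the same nonterminal, introducing a primed helper
    nonterminal (A') that defers the decision."""
    g = {a: [list(p) for p in ps] for a, ps in grammar.items()}
    changed = True
    while changed:
        changed = False
        for a in list(g):
            prods = g[a]
            best = []
            for i in range(len(prods)):
                for j in range(i + 1, len(prods)):
                    pre = common_prefix(prods[i], prods[j])
                    if len(pre) > len(best):
                        best = pre
            if best:
                a2 = a + "'"
                while a2 in g:          # avoid clashing names
                    a2 += "'"
                tails = [p[len(best):] for p in prods
                         if p[:len(best)] == best]
                rest = [p for p in prods if p[:len(best)] != best]
                g[a] = rest + [best + [a2]]
                g[a2] = tails
                changed = True
    return g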

Limitations of CFGs

- Cannot verify repeated strings
  - Example: L1 = {wcw | w is in (a|b)*}
  - Abstracts checking that variables are declared before use
- Cannot verify repeated counts
  - Example: L2 = {a^n b^m c^n d^m | n >= 1 and m >= 1}
  - Abstracts checking that the numbers of formal and actual parameters are equal
- Therefore, some checks are put off until semantic analysis

Top-Down Parsing

- Can be viewed two ways:
  - An attempt to find a leftmost derivation for the input string
  - An attempt to create a parse tree, starting from the root, creating nodes in preorder
- The general form is recursive-descent parsing
  - May require backtracking
  - Backtracking parsers are not used frequently because backtracking is rarely needed

Predictive Parsing

- A special case of recursive-descent parsing that does not require backtracking
- Must always know which production to use based on the current input symbol
- An appropriate grammar can often be created by:
  - removing left recursion
  - left factoring the resulting grammar
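A minimal recursive-descent predictive parser for the expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id (used in the later examples) might look like the following sketch; the 'id' token spelling and the helper names are assumptions:

```python
def parse(tokens):
    """Predictive recursive-descent parser: one procedure per
    nonterminal, production chosen by the next input symbol alone."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else '$'

    def match(t):
        nonlocal pos
        if peek() != t:
            raise SyntaxError(f"expected {t}, got {peek()}")
        pos += 1

    def E():            # E -> T E'
        T(); Eprime()

    def Eprime():       # E' -> + T E' | epsilon
        if peek() == '+':
            match('+'); T(); Eprime()

    def T():            # T -> F T'
        F(); Tprime()

    def Tprime():       # T' -> * F T' | epsilon
        if peek() == '*':
            match('*'); F(); Tprime()

    def F():            # F -> ( E ) | id
        if peek() == '(':
            match('('); E(); match(')')
        else:
            match('id')

    E()
    return pos == len(tokens)   # True iff all input was consumed
```

Note that no procedure ever backtracks: each chooses its production by looking at a single token.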

Transition Diagrams

- For a parser:
  - One diagram for each nonterminal
  - Edge labels can be tokens or nonterminals
    - A transition on a token means we should take that transition if the token is the next input symbol
    - A transition on a nonterminal can be thought of as a call to a procedure for that nonterminal
- As opposed to lexical analyzers:
  - One (or more) diagrams for each token
  - Labels are symbols of the input alphabet

Creating Transition Diagrams

- First eliminate left recursion from the grammar
- Then left factor the grammar
- For each nonterminal A:
  - Create an initial and a final state
  - For every production A -> X1 X2 ... Xn, create a path from the initial to the final state with edges labeled X1, X2, ..., Xn

Using Transition Diagrams

- Predictive parsers:
  - Start at the start symbol of the grammar
  - From state s with an edge to state t labeled with token a, if the next input token is a:
    - The state changes to t
    - The input cursor moves one position right
  - If the edge is labeled by nonterminal A:
    - The state changes to the start state for A
    - The input cursor is not moved
    - When the final state of A is reached, the state changes to t
  - If the edge is labeled by ε, the state changes to t
- Can be recursive, or non-recursive using a stack

Transition Diagram Example

Grammar:

  E  -> T E'
  E' -> + T E' | ε
  T  -> F T'
  T' -> * F T' | ε
  F  -> ( E ) | id

[Diagrams: one transition diagram per nonterminal. For example, E: state 0 --T--> 1 --E'--> 2; E': state 3 --+--> 4 --T--> 5 --E'--> 6, plus an ε edge from 3 to 6; T, T', and F are analogous.]

Simplifying Transition Diagrams

[Diagrams: the tail call on E' in the E' diagram is replaced by an edge back to its own start state, turning the recursion into a loop; the simplified E' diagram is then substituted into the E diagram, yielding a single smaller diagram for E.]

Nonrecursive Predictive Parsing (1)

[Diagram: the model of a table-driven predictive parser: an input buffer (e.g. a + b $), a stack (e.g. X Y Z $), the predictive parsing program, a parsing table M, and the output.]

Nonrecursive Predictive Parsing (2)

- The program considers X, the symbol on top of the stack, and a, the next input symbol
- If X = a = $, the parser halts successfully
- If X = a ≠ $, the parser pops X off the stack and advances to the next input symbol
- If X is a nonterminal, the program consults M[X, a] (a production or an error entry)

Nonrecursive Predictive Parsing (3)

- Initialize the stack with the start symbol of the grammar
- Initialize the input pointer to the first symbol of the input
- After consulting the parsing table:
  - If the entry is a production, the parser replaces the top entry of the stack with the right side of the production (leftmost symbol on top)
  - Otherwise, an error recovery routine is called
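The driver described on these slides can be sketched as a short loop; the table encoding (a dict keyed by (nonterminal, input symbol) whose values are productions, with [] for ε) is an assumption for illustration:

```python
def predictive_parse(table, start, tokens):
    """Table-driven predictive parser. Returns the list of productions
    applied (a leftmost derivation) or raises SyntaxError."""
    stack = ['$', start]
    inp = list(tokens) + ['$']
    i = 0
    output = []
    while True:
        X = stack.pop()
        a = inp[i]
        if X == '$' and a == '$':
            return output               # success
        if X == a:                      # terminal on top matches input
            i += 1
        elif (X, a) in table:           # expand the nonterminal
            prod = table[(X, a)]
            output.append((X, prod))
            stack.extend(reversed(prod))  # leftmost symbol ends on top
        else:
            raise SyntaxError(f"error at M[{X}, {a}]")
```

Pushing the production's right side reversed keeps its leftmost symbol on top of the stack, exactly as the slide specifies.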

Predictive Parsing Table

                                  Input Symbol
  Nonterminal  id          +            *            (           )          $
  E            E -> TE'                              E -> TE'
  E'                       E' -> +TE'                            E' -> ε    E' -> ε
  T            T -> FT'                              T -> FT'
  T'                       T' -> ε      T' -> *FT'               T' -> ε    T' -> ε
  F            F -> id                               F -> (E)

Using a Predictive Parsing Table

  Stack        Input          Output
  $E           id+id*id$
  $E'T         id+id*id$      E -> TE'
  $E'T'F       id+id*id$      T -> FT'
  $E'T'id      id+id*id$      F -> id
  $E'T'        +id*id$
  $E'          +id*id$        T' -> ε
  $E'T+        +id*id$        E' -> +TE'
  $E'T         id*id$
  $E'T'F       id*id$         T -> FT'
  $E'T'id      id*id$         F -> id
  $E'T'        *id$
  $E'T'F*      *id$           T' -> *FT'
  $E'T'F       id$
  $E'T'id      id$            F -> id
  $E'T'        $
  $E'          $              T' -> ε
  $            $              E' -> ε

FIRST

- FIRST(α) is the set of all terminals that begin any string derived from α
- Computing FIRST:
  - If X is a terminal, FIRST(X) = {X}
  - If X -> ε is a production, add ε to FIRST(X)
  - If X is a nonterminal and X -> Y1 Y2 ... Yn is a production:
    - For all terminals a, add a to FIRST(X) if a is a member of any FIRST(Yi) and ε is a member of FIRST(Y1), FIRST(Y2), ..., FIRST(Yi-1)
    - If ε is a member of FIRST(Y1), FIRST(Y2), ..., FIRST(Yn), add ε to FIRST(X)
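These rules can be turned into a fixed-point computation in Python; the grammar encoding and the 'eps' marker for ε are assumptions for illustration:

```python
EPS = 'eps'   # marker standing for the empty string (epsilon)

def first_sets(grammar, terminals):
    """Compute FIRST(X) for every terminal and nonterminal.
    grammar: dict of nonterminal -> list of productions (lists of
    symbols, [] meaning an epsilon production)."""
    first = {t: {t} for t in terminals}
    first.update({A: set() for A in grammar})
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for A, prods in grammar.items():
            for p in prods:
                n = len(first[A])
                all_eps = True
                for Y in p:             # scan Y1 Y2 ... Yn
                    first[A] |= first[Y] - {EPS}
                    if EPS not in first[Y]:
                        all_eps = False
                        break
                if all_eps:             # every Yi can derive epsilon
                    first[A].add(EPS)
                if len(first[A]) != n:
                    changed = True
    return first
```

An empty production (p == []) makes all_eps trivially true, which implements the second rule.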

FOLLOW

- FOLLOW(A), for any nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form
- More formally, a is in FOLLOW(A) if and only if there exists a derivation of the form S *=> αAaβ
- $ is in FOLLOW(A) if and only if there exists a derivation of the form S *=> αA

Computing FOLLOW

- Place $ in FOLLOW(S)
- If there is a production A -> αBβ, then everything in FIRST(β) (except ε) is in FOLLOW(B)
- If there is a production A -> αB, or a production A -> αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is also in FOLLOW(B)
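These rules also admit a fixed-point sketch; FIRST sets are assumed to have been computed already (for example by the rules on the FIRST slide), and the encoding conventions are the same illustrative assumptions as before:

```python
EPS = 'eps'   # marker standing for epsilon

def follow_sets(grammar, start, first):
    """Compute FOLLOW(A) for every nonterminal, given precomputed
    FIRST sets covering every grammar symbol."""
    follow = {A: set() for A in grammar}
    follow[start].add('$')              # rule 1
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for p in prods:
                for i, B in enumerate(p):
                    if B not in grammar:
                        continue        # FOLLOW is for nonterminals only
                    n = len(follow[B])
                    eps_tail = True
                    for Y in p[i + 1:]: # rule 2: FIRST(beta) minus eps
                        follow[B] |= first[Y] - {EPS}
                        if EPS not in first[Y]:
                            eps_tail = False
                            break
                    if eps_tail:        # rule 3: beta can vanish
                        follow[B] |= follow[A]
                    if len(follow[B]) != n:
                        changed = True
    return follow
```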

FIRST and FOLLOW Example

  E  -> T E'
  E' -> + T E' | ε
  T  -> F T'
  T' -> * F T' | ε
  F  -> ( E ) | id

- FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
- FIRST(E') = {+, ε}
- FIRST(T') = {*, ε}
- FOLLOW(E) = FOLLOW(E') = {), $}
- FOLLOW(T) = FOLLOW(T') = {+, ), $}
- FOLLOW(F) = {+, *, ), $}

Creating a Predictive Parsing Table

- For each production A -> α:
  - For each terminal a in FIRST(α), add A -> α to M[A, a]
  - If ε is in FIRST(α), add A -> α to M[A, b] for every terminal b in FOLLOW(A)
  - If ε is in FIRST(α) and $ is in FOLLOW(A), add A -> α to M[A, $]
- Mark each undefined entry of M as an error entry (use some recovery strategy)
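The construction can be sketched directly from these rules; the third rule falls out automatically here because $ is stored in the FOLLOW sets. The dict encodings are the same illustrative assumptions as before, and a multiply-defined entry is reported by raising an exception:

```python
EPS = 'eps'   # marker standing for epsilon

def build_table(grammar, first, follow):
    """Build the LL(1) table M as a dict keyed by (nonterminal,
    terminal). Raises ValueError on a multiply-defined entry."""
    def first_of(alpha):
        out = set()
        for Y in alpha:
            out |= first[Y] - {EPS}
            if EPS not in first[Y]:
                return out
        out.add(EPS)                    # alpha can derive epsilon
        return out

    table = {}
    for A, prods in grammar.items():
        for p in prods:
            fa = first_of(p)
            targets = fa - {EPS}
            if EPS in fa:               # add FOLLOW(A), including $
                targets |= follow[A]
            for a in targets:
                if (A, a) in table and table[(A, a)] != p:
                    raise ValueError(f"not LL(1): M[{A}, {a}]")
                table[(A, a)] = p
    return table
```

Undefined entries are simply absent from the dict; a driver treats a missing key as an error entry.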

Multiply-Defined Entries Example

  S  -> i E t S S' | a
  S' -> e S | ε
  E  -> b

The table entry M[S', e] is multiply defined: it contains both S' -> e S and S' -> ε.

LL(1) Grammars (1)

- The algorithm covered in class can be applied to any grammar to produce a parsing table
- If the parsing table has no multiply-defined entries, the grammar is said to be "LL(1)"
  - First "L": left-to-right scanning of the input
  - Second "L": produces a leftmost derivation
  - "1": the number of lookahead symbols needed to make decisions

LL(1) Grammars (2)

- No ambiguous or left-recursive grammar can be LL(1)
- Eliminating left recursion and left factoring does not always lead to an LL(1) grammar
- Some grammars cannot be transformed into an LL(1) grammar at all
- Although the example of a non-LL(1) grammar we covered has a fix, there are no universal rules for handling such cases
