CH2 1

A simple One Pass Compiler
Introduction
➢ In computer programming, a one-pass compiler is a compiler
that passes through each part of a compilation unit only once,
immediately translating each part into its final machine code.
➢ A one-pass compiler reads the source code only once and translates it as it reads.
➢ A one-pass compiler is fast because all of the compiler code is loaded into
memory at once.
➢ It can process the source text without the overhead of the operating
system having to shut down one process and start another.
Introduction
➢ Building a one-pass compiler involves:
• Defining the syntax of a programming language (CFG/BNF)
• Developing a source code parser (top-down parser)
• Implementing syntax-directed translation to generate intermediate code
• Generating
• Optimize
Structure of Compiler
[Figure: character stream → lexical analyzer → token stream → syntax-directed translator → intermediate representation. The syntax definition (CFG) is used to develop the parser and code generator for the translator.]
Syntax Definition
➢ To specify the syntax of a language, we use a context-free grammar (CFG),
commonly written in Backus-Naur Form (BNF).
Example: an if-else statement in C has the form
statement → if ( expression ) statement else statement
➢ An alphabet of a language is a set of symbols.
Examples:
{0,1} for a binary number system, language = {0, 1, 100, 101, ...}
{a,b,c} for language = {a, b, c, ac, abcc, ...}
{if, (, ), else, ...} for if statements, language = {if(a==1)goto10, if ...}
Syntax Definition
➢ A context-free grammar (CFG) is used to describe the syntactic
structure of a language.
➢ A CFG is a set of recursive rules used to generate patterns of strings.
➢ In a CFG, derivation starts from the start symbol. A string is derived
by repeatedly replacing a nonterminal by the right-hand side of one of its
productions, until all nonterminals have been replaced by terminal symbols.
➢ CFGs can describe most of the syntax of programming languages.
➢ If the grammar is properly designed, an efficient parser can be
constructed automatically.
CFG
➢ A CFG recursively defines several sets of strings.
➢ Each set is denoted by a name, which is called a nonterminal.
➢ One of the nonterminals is chosen to denote the language described by the
grammar. This is called the start symbol of the grammar.
➢ Each production describes some of the possible strings that are contained in the set
denoted by a nonterminal.
➢ A production has the form N → X1 … Xn
where N is a nonterminal and X1 … Xn are zero or more symbols, each of which is
either a terminal or a nonterminal.
CFG
➢ Some examples:
A → a
• says that the set denoted by the nonterminal A contains the one-character string a.
A → aA
• says that the set denoted by A contains all strings formed by
putting an a in front of a string taken from the set denoted by A.
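To make the two productions concrete, here is a minimal sketch that enumerates the strings in the set denoted by A when both A → a and A → aA belong to the grammar. The function name `strings_of_A` and the length bound are illustrative assumptions, not part of the slides; the bound exists only so that the enumeration stops.

```python
def strings_of_A(max_len):
    """Return the strings derivable from A using A -> a and A -> aA,
    limited to length max_len (the bound is only there so the loop ends)."""
    result = []
    current = "a"                 # A -> a gives the shortest string
    while len(current) <= max_len:
        result.append(current)
        current = "a" + current   # A -> aA puts an 'a' in front of a string from A
    return result


print(strings_of_A(4))   # ['a', 'aa', 'aaa', 'aaaa']
```

Without the bound the set is infinite: every nonempty sequence of a's belongs to it.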
CFG
➢ From regular expressions to context-free grammars
CFG
➢ Common syntactic categories in programming languages
are:
• Expressions: used to express the calculation of values.
• Statements: express actions that occur in a particular
sequence.
• Declarations: express properties of names used in other parts
of the program.
CFG
➢ A CFG is characterized by a 4-tuple:
1. A set of tokens (terminal symbols)
2. A set of nonterminals
3. A set of production rules
Each rule has the form NT → {T, NT}*
4. A designated start symbol
Example CFG
➢ Context-free grammar for simple expressions:
G = <{list, digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list> with productions P =
list → list + digit
list → list - digit
list → digit
digit → 0|1|2|3|4|5|6|7|8|9
(the “|” means OR)
(so we could have written list → list + digit | list - digit | digit)
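As an illustration, the 4-tuple can be written down directly as a data structure. The sketch below is one possible encoding, not a standard one: the class name `Grammar`, its field names, and the `is_well_formed` check are all assumptions made for this example. It stores the list/digit grammar and checks that every rule has the shape NT → {T, NT}*.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # tokens
    nonterminals: frozenset
    productions: tuple        # pairs (head, body); body is a tuple of symbols
    start: str

    def is_well_formed(self):
        """Check NT -> {T, NT}* for every rule and that start is a nonterminal."""
        symbols = self.terminals | self.nonterminals
        heads_ok = all(head in self.nonterminals for head, _ in self.productions)
        bodies_ok = all(sym in symbols for _, body in self.productions for sym in body)
        disjoint = not (self.terminals & self.nonterminals)
        return heads_ok and bodies_ok and disjoint and self.start in self.nonterminals


digits = "0123456789"
G = Grammar(
    terminals=frozenset("+-" + digits),
    nonterminals=frozenset({"list", "digit"}),
    productions=(
        ("list", ("list", "+", "digit")),
        ("list", ("list", "-", "digit")),
        ("list", ("digit",)),
    ) + tuple(("digit", (d,)) for d in digits),
    start="list",
)

print(G.is_well_formed())   # True
```

Writing digit → 0|1|…|9 as ten separate pairs reflects that "|" is only shorthand for several productions with the same left-hand side.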
Derivation
➢ Given a CFG, we can determine the set of all strings (token sequences) generated by the
grammar using derivation.
➢ The basic idea of derivation is to consider productions as rewrite rules:
whenever we have a nonterminal, we can replace it by the right-hand side
of any production in which the nonterminal appears on the left-hand side.
➢ During parsing we have to make two decisions:
• which nonterminal is to be replaced, and
• which production rule is used to replace it.
Derivation
➢ We begin with the start symbol.
➢ In each step, we replace one nonterminal in the current
sentential form with one of the right-hand sides of a production
for that nonterminal.
➢ Formally, we define the derivation relation => by the three rules
1: αNβ => αγβ if there is a production N → γ
2: α => α
3: α => γ if there is a β such that α => β and β => γ
where α, β and γ are (possibly empty) sequences of grammar symbols.
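The derivation relation can be read as a search problem: rule 1 is a single rewriting step, rule 2 says every form derives itself, and rule 3 chains steps together. The sketch below illustrates this for the list/digit expression grammar. The names `PRODUCTIONS`, `step` and `derives` are assumptions for this example, and the length-based pruning relies on this particular grammar having no empty right-hand sides.

```python
from collections import deque

# Productions of the list/digit expression grammar: head -> list of bodies.
PRODUCTIONS = {
    "list": [("list", "+", "digit"), ("list", "-", "digit"), ("digit",)],
    "digit": [(d,) for d in "0123456789"],
}


def step(form):
    """Rule 1: every form reachable by rewriting one nonterminal occurrence."""
    results = set()
    for i, symbol in enumerate(form):
        for body in PRODUCTIONS.get(symbol, []):
            results.add(form[:i] + body + form[i + 1:])
    return results


def derives(source, target):
    """Rules 2 and 3: zero or more steps. Forms never shrink in this grammar
    (no empty right-hand sides), so forms longer than the target are pruned."""
    seen, queue = {source}, deque([source])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for nxt in step(form):
            if len(nxt) <= len(target) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(derives(("list",), tuple("9-5+2")))   # True
print(derives(("list",), tuple("9-+2")))    # False
```

A breadth-first search like this is only practical for tiny examples; a real parser finds the derivation directly instead of enumerating sentential forms.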

Derivation
[Figure: a grammar that generates the string aabbbcc, shown with its derivation]
Left-most Derivation
➢ In a leftmost derivation, the leftmost nonterminal of the current sentential
form is replaced at each step, so the input string is produced from left to
right. Example:
• Production rules:
S → S + S
S → S - S
S → a | b | c
• Input: a - b + c
Right-most Derivation
➢ In a rightmost derivation, the rightmost nonterminal of the current sentential
form is replaced at each step, so the input string is produced from right
to left. Example:
• Production rules:
S → S + S
S → S - S
S → a | b | c
• Input: a - b + c
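The difference between the two derivation orders can be sketched for the input a - b + c. The helper `derive` and the particular sequences of productions below are assumptions chosen for illustration; because this grammar is ambiguous, other sequences would also reach the same string.

```python
BODIES = {"S + S", "S - S", "a", "b", "c"}   # right-hand sides for S


def derive(choices, leftmost=True):
    """Apply the chosen right-hand sides in order, always rewriting the
    leftmost (or rightmost) S, and return every sentential form produced."""
    form = ["S"]
    steps = [" ".join(form)]
    for body in choices:
        assert body in BODIES, f"not a production body: {body}"
        positions = [i for i, sym in enumerate(form) if sym == "S"]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = body.split()
        steps.append(" ".join(form))
    return steps


left = derive(["S + S", "S - S", "a", "b", "c"], leftmost=True)
right = derive(["S + S", "c", "S - S", "b", "a"], leftmost=False)
print(" => ".join(left))
# S => S + S => S - S + S => a - S + S => a - b + S => a - b + c
print(" => ".join(right))
# S => S + S => S + c => S - S + c => S - b + c => a - b + c
```

Both derivations use the same productions and end in the same string; only the order in which the nonterminals are replaced differs.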
Grammars are Used to Derive Strings:
➢ We can derive the string 9 - 5 + 2 as follows:
list => list + digit             P1: list → list + digit
     => list - digit + digit     P2: list → list - digit
     => digit - digit + digit    P3: list → digit
     => 9 - digit + digit        P4: digit → 9
     => 9 - 5 + digit            P4: digit → 5
     => 9 - 5 + 2                P4: digit → 2
This is an example of a leftmost derivation, because we replaced the
leftmost nonterminal in each step.
Defining Parse tree
➢ More formally, a parse tree for a CFG has the
following properties:
➢ The root of the tree is labeled by the start symbol.
➢ Each leaf of the tree is labeled by a terminal (token) or ε.
➢ Each interior node (non-leaf) is labeled by a nonterminal.
➢ If A → x1 x2 … xn is a production, then A may label an interior node whose
children, from left to right, are x1, x2, …, xn; they may be nonterminals or
tokens.
Parse Tree for the Example Grammar
➢ Parse tree of the string 9-5+2 using grammar G
              list
            /  |  \
        list   +   digit
       /  |  \       |
   list   -  digit   2
     |         |
   digit       5
     |
     9
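The parse tree for 9-5+2 can be written as nested nodes and checked against the parse tree properties. In the sketch below, the node encoding `(label, children)` and the helpers `node`, `tree_yield` and `is_parse_tree` are assumptions for illustration. The tree is the one implied by the grammar list → list + digit | list - digit | digit.

```python
NONTERMINALS = {"list", "digit"}
PRODUCTIONS = {
    ("list", ("list", "+", "digit")),
    ("list", ("list", "-", "digit")),
    ("list", ("digit",)),
} | {("digit", (d,)) for d in "0123456789"}


def node(label, *children):
    """A node is (label, [children]); a leaf has an empty child list."""
    return (label, list(children))


tree = node("list",
            node("list",
                 node("list", node("digit", node("9"))),
                 node("-"),
                 node("digit", node("5"))),
            node("+"),
            node("digit", node("2")))


def tree_yield(t):
    """Read the leaves from left to right."""
    label, children = t
    if not children:
        return [label]
    return [leaf for child in children for leaf in tree_yield(child)]


def is_parse_tree(t):
    """Leaves must be terminals; each interior node with its children
    must match some production."""
    label, children = t
    if not children:
        return label not in NONTERMINALS
    body = tuple(child[0] for child in children)
    return (label, body) in PRODUCTIONS and all(is_parse_tree(c) for c in children)


print("".join(tree_yield(tree)))            # 9-5+2
print(tree[0] == "list", is_parse_tree(tree))   # True True
```

Reading the leaves from left to right gives back the input string, which is why the leaves are called the yield of the tree.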
Ambiguity
➢ A grammar is said to be ambiguous if, for some input string, there exists
more than one leftmost derivation, more than one rightmost derivation, or
more than one parse tree.
➢ Consider the following context-free grammar:
G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>
with productions P =
string → string + string | string - string | 0 | 1 | … | 9
➢ This grammar is ambiguous, because more than one parse tree
generates the string 9-5+2.
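One way to see the ambiguity is to enumerate every parse tree of 9-5+2 under this grammar. The sketch below tries each operator as the root of the tree and writes each tree as a fully parenthesized expression together with its value. The function name `parses` and the output format are assumptions for illustration.

```python
def parses(tokens):
    """All parse trees of tokens under
    string -> string + string | string - string | 0 | ... | 9,
    returned as (parenthesized text, value) pairs."""
    if len(tokens) == 1:
        return [(tokens[0], int(tokens[0]))] if tokens[0].isdigit() else []
    trees = []
    for i, tok in enumerate(tokens):
        if tok in ("+", "-"):                       # try each operator as the root
            for ltext, lval in parses(tokens[:i]):
                for rtext, rval in parses(tokens[i + 1:]):
                    value = lval + rval if tok == "+" else lval - rval
                    trees.append((f"({ltext}{tok}{rtext})", value))
    return trees


for text, value in parses(list("9-5+2")):
    print(text, "=", value)
# (9-(5+2)) = 2
# ((9-5)+2) = 6
```

The two trees give different values, which is why an ambiguous grammar is a problem for a translator and not only a theoretical concern.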
Ambiguity
➢ Two derivations (parse trees) exist for the same token string 9-5+2:
one groups it as (9-5)+2, the other as 9-(5+2).
Associativity of Operators
➢ An operator ⊕ is left-associative if the expression a⊕b⊕c must be evaluated
from left to right, i.e., as (a⊕b)⊕c.
➢ An operator ⊕ is right-associative if the expression a⊕b⊕c must be evaluated
from right to left, i.e., as a⊕(b⊕c).
➢ An operator ⊕ is non-associative if expressions of the form a⊕b⊕c are illegal.
➢ Left-associative operators have left-recursive productions; right-associative
operators have right-recursive productions.
e.g. 9+5+2≡(9+5)+2, a=b=c≡a=(b=c)
• Left-associative grammar
list → list + digit | list - digit | digit
digit → 0|1|…|9
• Right-associative grammar
right → letter = right | letter
letter → a|b|…|z
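The link between the direction of recursion and the grouping of operands can be sketched as follows. `group_left` mirrors the left-recursive production (the left operand is itself a list), and `group_right` mirrors the right-recursive one. Both function names are assumptions for this example, and the input is assumed to be a well-formed token list that alternates operands and operators.

```python
def group_left(tokens):
    """Mirror list -> list op digit: the left operand is itself a list."""
    text = tokens[0]
    for i in range(1, len(tokens), 2):
        text = f"({text}{tokens[i]}{tokens[i + 1]})"
    return text


def group_right(tokens):
    """Mirror right -> letter = right: the right operand is itself a right."""
    if len(tokens) == 1:
        return tokens[0]
    return f"({tokens[0]}{tokens[1]}{group_right(tokens[2:])})"


print(group_left(list("9+5+2")))    # ((9+5)+2)
print(group_right(list("a=b=c")))   # (a=(b=c))
```

The left-associative case is written as a loop because a recursive-descent parser cannot call a left-recursive rule directly without looping forever.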
Precedence of Operator
➢ A possible way of resolving the ambiguity is to use
precedence rules during syntax analysis to select among the
possible syntax trees.
➢ We say that an operator (*) has higher precedence than another
operator (+) if * takes its operands before + does.
• ex. 9+5*2≡9+(5*2), 9*5+2≡(9*5)+2
• left-associative operators: +, -, *, /
• right-associative operators: =, **
Precedence of Operator
expr → expr + term | term
term → term * factor | factor
factor → number | ( expr )
String 2+3*5 has the same meaning as 2+(3*5)
            expr
          /  |  \
      expr   +   term
        |       /  |  \
      term   term  *  factor
        |      |         |
     factor  factor   number
        |      |         |
     number  number      5
        |      |
        2      3
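A minimal sketch of how this grammar enforces precedence is a recursive-descent evaluator with one function per nonterminal. It is an illustration under stated assumptions, not the slides' implementation: the name `evaluate`, the regular-expression tokenizer, and the replacement of the left-recursive rules by loops are choices made for this example. Characters the tokenizer does not recognize are silently skipped.

```python
import re


def evaluate(text):
    """Evaluate text using expr/term/factor; left recursion is replaced by loops."""
    tokens = re.findall(r"\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():                      # expr -> expr + term | term
        value = term()
        while peek() == "+":
            advance()
            value += term()
        return value

    def term():                      # term -> term * factor | factor
        value = factor()
        while peek() == "*":
            advance()
            value *= factor()
        return value

    def factor():                    # factor -> number | ( expr )
        tok = advance()
        if tok == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected )")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("2+3*5"))     # 17
print(evaluate("(2+3)*5"))   # 25
```

Because `term` is called from inside `expr`, a multiplication is always completed before the surrounding addition sees its operand, which is what "higher precedence" means in the grammar.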
