
A simple One Pass Compiler

Introduction
• In computer programming, a one-pass compiler is a compiler that passes through the parts of each compilation unit only once, immediately translating each part into its final machine code.

• A one-pass compiler reads the source code only once and translates it as it reads.

• A one-pass compiler is fast since all of the compiler code is loaded into memory at once.

• It can process the source text without the overhead of the operating system having to shut down one process and start another.

Introduction
• Building a one-pass compiler involves:
  • Defining the syntax of a programming language (CFG/BNF)
  • Developing a source code parser (top-down parser)
  • Implementing syntax-directed translation to generate intermediate code
  • Generating
  • Optimizing

Structure of Compiler
Character stream → Lexical analyzer → Token stream → Syntax-directed translator → Intermediate representation

[Figure: the syntax definition (CFG) is used to develop the parser and code generator for the translator]
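A minimal sketch of this pipeline, assuming single-digit operands, only the + and - operators, and postfix notation standing in for the intermediate representation. The functions `tokenize` and `translate` are hypothetical names. The translator consumes each token exactly once and emits output as soon as it can, which is the one-pass behaviour described in the introduction.

```python
from typing import Iterator, List


def tokenize(source: str) -> Iterator[str]:
    """Lexical analyzer (sketch): turn the character stream into a token stream."""
    for ch in source:
        if ch.isspace():
            continue  # blanks separate tokens but are not tokens themselves
        if ch.isdigit() or ch in "+-":
            yield ch
        else:
            raise SyntaxError(f"unexpected character {ch!r}")


def translate(source: str) -> str:
    """Syntax-directed translator (sketch): read tokens once, emit postfix at once.

    Recognizes digit (('+' | '-') digit)*, the same strings generated by
    list -> list + digit | list - digit | digit.
    """
    tokens = tokenize(source)
    output: List[str] = []

    def expect_digit() -> str:
        tok = next(tokens, None)
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a digit, got {tok!r}")
        return tok

    output.append(expect_digit())
    for op in tokens:
        if op not in "+-":
            raise SyntaxError(f"expected + or -, got {op!r}")
        output.append(expect_digit())  # emit the right operand first...
        output.append(op)              # ...then the operator (postfix order)
    return "".join(output)


print(translate("9 - 5 + 2"))  # 95-2+
```

Because the left operand has already been emitted when an operator is seen, postfix output needs no lookahead or second pass here.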

Syntax Definition
• To specify the syntax of a language, we use a context-free grammar (CFG), often written in BNF (Backus-Naur Form).

Example: an if-else statement in C has the form
statement → if ( expression ) statement else statement

• An alphabet of a language is a set of symbols.

Examples: {0,1} is the alphabet of the binary number system, a language = {0, 1, 100, 101, ...}

{a,b,c} is the alphabet of a language such as {a, b, c, ac, abcc, ...}

{if, (, ), else, ...} is the alphabet of the language of if statements = {if (a==1) goto 10, ...}


Syntax Definition
• A context-free grammar (CFG) is used to describe the syntactic structure of a language.

• A CFG is a set of recursive rules used to generate patterns of strings.

• In a CFG, the start symbol is used to derive strings. A string is derived by repeatedly replacing a nonterminal with the right-hand side of one of its productions, until all nonterminals have been replaced by terminal symbols.

• CFGs can describe the syntax of most programming languages.

• If the grammar is properly designed, an efficient parser can be constructed automatically.

CFG
• A CFG recursively defines several sets of strings.

• Each set is denoted by a name, which is called a nonterminal.

• One of the nonterminals is chosen to denote the language described by the grammar. This is called the start symbol of the grammar.

• Each production describes some of the possible strings that are contained in the set denoted by a nonterminal.

• A production has the form N → X1…Xn, where N is a nonterminal and X1…Xn are zero or more symbols, each of which is either a terminal or a nonterminal.

CFG
• Some examples:

A → a
• says that the set denoted by the nonterminal A contains the one-character string a.

A → aA
• says that the set denoted by A contains all strings formed by putting an a in front of a string taken from the set denoted by A.
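The recursive reading of these two productions can be checked mechanically. A minimal sketch, assuming we only want the strings of A up to a given length (the function name `strings_of_A` is hypothetical): start from the empty set and keep applying both productions until no new string appears, which is a fixed-point computation of the set denoted by A.

```python
from typing import List, Set


def strings_of_A(max_len: int) -> List[str]:
    """Strings in the set denoted by A (A -> a, A -> aA), up to max_len characters."""
    A: Set[str] = set()
    changed = True
    while changed:  # iterate until the set stops growing (a fixed point)
        changed = False
        # A -> a adds "a"; A -> aA puts an "a" in front of any string already in A
        candidates = {"a"} | {"a" + s for s in A}
        for s in candidates:
            if len(s) <= max_len and s not in A:
                A.add(s)
                changed = True
    return sorted(A, key=len)


print(strings_of_A(4))  # ['a', 'aa', 'aaa', 'aaaa']
```

The length bound is needed only because the set itself is infinite.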

CFG

• From regular expressions to context-free grammars

CFG
• Common syntactic categories in programming languages are:
  • Expressions: express the calculation of values.
  • Statements: express actions that occur in a particular sequence.
  • Declarations: express properties of names used in other parts of the program.

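CFG
• A CFG is characterized by a 4-tuple:

1. A set of tokens (terminal symbols)

2. A set of nonterminals

3. A set of production rules
Each rule has the form NT → (T | NT)*, i.e., a nonterminal on the left and a sequence of zero or more terminals and nonterminals on the right

4. A designated start symbol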
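The 4-tuple maps directly onto a small data structure. A minimal sketch, assuming productions are stored as (left-hand nonterminal, tuple of right-hand symbols) pairs with an empty tuple meaning ε; the class `Grammar` and its method `problems` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# A production N -> X1...Xn stored as (N, (X1, ..., Xn)); () would encode N -> ε.
Production = Tuple[str, Tuple[str, ...]]


@dataclass(frozen=True)
class Grammar:
    terminals: FrozenSet[str]            # 1. tokens (terminal symbols)
    nonterminals: FrozenSet[str]         # 2. nonterminals
    productions: Tuple[Production, ...]  # 3. production rules
    start: str                           # 4. designated start symbol

    def problems(self) -> List[str]:
        """Reasons why this 4-tuple is not a well-formed CFG (empty if it is)."""
        issues: List[str] = []
        if self.terminals & self.nonterminals:
            issues.append("terminals and nonterminals overlap")
        if self.start not in self.nonterminals:
            issues.append(f"start symbol {self.start!r} is not a nonterminal")
        symbols = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                issues.append(f"left-hand side {lhs!r} is not a nonterminal")
            for sym in rhs:  # each right-hand symbol must be a T or an NT
                if sym not in symbols:
                    issues.append(f"unknown symbol {sym!r} in a production for {lhs!r}")
        return issues


# The grammar A -> a | aA written as a 4-tuple.
g = Grammar(
    terminals=frozenset({"a"}),
    nonterminals=frozenset({"A"}),
    productions=(("A", ("a",)), ("A", ("a", "A"))),
    start="A",
)
print(g.problems())  # []
```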

Example CFG
• Context-free grammar for simple expressions

G = <{list, digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list> with productions P:

list → list + digit

list → list - digit

list → digit

digit → 0|1|2|3|4|5|6|7|8|9

(the "|" means OR)

(so we could have written list → list + digit | list - digit | digit)
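Every string generated by this grammar is a digit followed by zero or more (operator, digit) pairs, so membership can be checked in one left-to-right scan. A minimal sketch, assuming blanks are ignored and each token is a single character; `is_list` is a hypothetical name. This checks membership only; it does not build the left-recursive structure.

```python
DIGITS = set("0123456789")
OPERATORS = {"+", "-"}


def is_list(s: str) -> bool:
    """True if s (blanks ignored) is derivable from `list` in grammar G."""
    tokens = [c for c in s if not c.isspace()]
    # A list always has an odd number of tokens: digit, op, digit, ..., digit.
    if not tokens or len(tokens) % 2 == 0:
        return False
    for i, tok in enumerate(tokens):
        expected = DIGITS if i % 2 == 0 else OPERATORS
        if tok not in expected:
            return False
    return True


print(is_list("9-5+2"), is_list("9+"))  # True False
```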

Derivation
• Given a CFG, we can determine the set of all strings (of tokens) generated by the grammar using derivations.

• The basic idea of derivation is to treat productions as rewrite rules: whenever we have a nonterminal, we can replace it by the right-hand side of any production in which that nonterminal appears on the left-hand side.

• During parsing we have to make two decisions:
  • Which nonterminal is to be replaced.
  • Which production rule will be used to replace it.

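Derivation
• We begin with the start symbol.

• In each step, we replace one nonterminal in the current sentential form with one of the right-hand sides of a production for that nonterminal.

• Formally, we define the derivation relation ⇒ by the following three rules, where α, β and γ are (possibly empty) sequences of grammar symbols:

1: $\alpha N \beta \Rightarrow \alpha \gamma \beta$ if there is a production $N \to \gamma$

2: $\alpha \Rightarrow \alpha$

3: $\alpha \Rightarrow \gamma$ if there is a $\beta$ such that $\alpha \Rightarrow \beta$ and $\beta \Rightarrow \gamma$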
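Rules 2 and 3 make ⇒ reflexive and transitive, so α ⇒ γ holds exactly when γ can be reached from α by zero or more applications of rule 1. A minimal sketch of that reading as a breadth-first search, assuming the expression grammar G with digits limited to 9, 5 and 2 to keep the search small; `one_step`, `derives` and the `max_steps` bound are hypothetical, and the bound is needed because the set of sentential forms is infinite in general.

```python
from collections import deque
from typing import Dict, List, Tuple

Form = Tuple[str, ...]  # a sentential form: a sequence of grammar symbols

# Grammar G (digits restricted to 9, 5, 2 for this sketch).
PRODUCTIONS: Dict[str, List[Form]] = {
    "list": [("list", "+", "digit"), ("list", "-", "digit"), ("digit",)],
    "digit": [("9",), ("5",), ("2",)],
}


def one_step(form: Form) -> List[Form]:
    """Rule 1: rewrite alpha N beta to alpha gamma beta for every production N -> gamma."""
    results: List[Form] = []
    for i, sym in enumerate(form):
        for gamma in PRODUCTIONS.get(sym, []):
            results.append(form[:i] + gamma + form[i + 1:])
    return results


def derives(alpha: Form, target: Form, max_steps: int = 10) -> bool:
    """Rules 2 and 3: alpha => target if target is reachable in zero or more steps."""
    seen = {alpha}
    queue = deque([(alpha, 0)])
    while queue:
        form, steps = queue.popleft()
        if form == target:
            return True  # with steps == 0 this is rule 2
        # No production of G shortens a form, so longer forms can never reach target.
        if steps == max_steps or len(form) > len(target):
            continue
        for nxt in one_step(form):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return False


print(derives(("list",), ("9", "-", "5", "+", "2")))  # True
```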


Derivation

[Figure: an example grammar that generates the string aabbbcc, and the derivation of aabbbcc]

Left-most Derivation
• In a leftmost derivation, the leftmost nonterminal of the sentential form is replaced at each step, so the input string is produced from left to right. Example:

• Production rules:
S → S + S
S → S - S
S → a | b | c

• Input: a - b + c
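A minimal sketch of a leftmost derivation for this input, assuming the production used at each step is supplied as a list of right-hand sides (a hypothetical encoding; `leftmost_derivation` is also a hypothetical name). At every step the code rewrites the leftmost S, which is the first of the two parsing decisions; the supplied list makes the second.

```python
from typing import List, Optional, Tuple

# Right-hand sides of S in: S -> S + S | S - S | a | b | c
S_PRODUCTIONS = {("S", "+", "S"), ("S", "-", "S"), ("a",), ("b",), ("c",)}


def leftmost_derivation(choices: List[Tuple[str, ...]]) -> List[str]:
    """Start from S and, at each step, rewrite the leftmost S with the next choice."""
    form: List[str] = ["S"]
    steps = [" ".join(form)]
    for rhs in choices:
        if rhs not in S_PRODUCTIONS:
            raise ValueError(f"S -> {' '.join(rhs)} is not a production")
        i: Optional[int] = next((k for k, sym in enumerate(form) if sym == "S"), None)
        if i is None:
            raise ValueError("no nonterminal left to replace")
        form[i:i + 1] = list(rhs)  # replace the leftmost nonterminal in place
        steps.append(" ".join(form))
    return steps


steps = leftmost_derivation([("S", "+", "S"), ("S", "-", "S"), ("a",), ("b",), ("c",)])
print("\n=> ".join(steps))
```

This prints the sentential forms S, S + S, S - S + S, a - S + S, a - b + S and a - b + c, each later one prefixed with =>.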

Right-most Derivation
• In a rightmost derivation, the rightmost nonterminal of the sentential form is replaced at each step, so the input string is produced from right to left. Example:

• Production rules:
S → S + S
S → S - S
S → a | b | c

• Input: a - b + c
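A minimal sketch of the same bookkeeping with a switch between leftmost and rightmost replacement, assuming the production used at each step is supplied as a list of right-hand sides; `derive` and its `leftmost` flag are hypothetical names.

```python
from typing import List, Tuple

# Right-hand sides of S in: S -> S + S | S - S | a | b | c
S_PRODUCTIONS = {("S", "+", "S"), ("S", "-", "S"), ("a",), ("b",), ("c",)}


def derive(choices: List[Tuple[str, ...]], leftmost: bool = True) -> List[str]:
    """Apply each choice to the leftmost (or rightmost) S, starting from S."""
    form: List[str] = ["S"]
    steps = [" ".join(form)]
    for rhs in choices:
        if rhs not in S_PRODUCTIONS:
            raise ValueError(f"S -> {' '.join(rhs)} is not a production")
        positions = [k for k, sym in enumerate(form) if sym == "S"]
        if not positions:
            raise ValueError("no nonterminal left to replace")
        i = positions[0] if leftmost else positions[-1]  # which nonterminal to rewrite
        form[i:i + 1] = list(rhs)
        steps.append(" ".join(form))
    return steps


rm = derive([("S", "-", "S"), ("S", "+", "S"), ("c",), ("b",), ("a",)], leftmost=False)
print("\n=> ".join(rm))
```

This derivation starts with S → S - S, so its parse tree groups the input as a - (b + c); a derivation starting with S → S + S would group it as (a - b) + c instead. Since both exist for the same input, this grammar is ambiguous.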

Grammars are Used to Derive Strings:
• We can derive the string 9 - 5 + 2 as follows:

list ⇒ list + digit            P1: list → list + digit
     ⇒ list - digit + digit    P2: list → list - digit
     ⇒ digit - digit + digit   P3: list → digit
     ⇒ 9 - digit + digit       P4: digit → 9
     ⇒ 9 - 5 + digit           P4: digit → 5
     ⇒ 9 - 5 + 2               P4: digit → 2

This is an example of a leftmost derivation, because we replaced the leftmost nonterminal in each step.
Defining Parse tree

➢ More formally, a parse tree for a CFG has the following properties:
➢ The root of the tree is labeled by the start symbol.
➢ Each leaf of the tree is labeled by a terminal (token) or ε.
➢ Each interior node (non-leaf) is labeled by a nonterminal.
➢ If A labels an interior node whose children, from left to right, are labeled x1, x2, …, xn, then A → x1x2…xn must be a production; each xi may be a nonterminal or a token.

Parse Tree for the Example Grammar

➢ Parse tree of the string 9-5+2 using grammar G
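A minimal sketch that builds this parse tree, assuming single-character tokens; `Node`, `parse_list` and `show` are hypothetical names. The left recursion in list → list + digit is handled with a loop: each operator makes the tree built so far the leftmost child of a new `list` node, so interior nodes are nonterminals and leaves are tokens, as the properties above require.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                                   # nonterminal or token
    children: List["Node"] = field(default_factory=list)


def parse_list(s: str) -> Node:
    """Parse tree of s for list -> list + digit | list - digit | digit."""
    tokens = [c for c in s if not c.isspace()]

    def digit(tok: str) -> Node:                 # digit -> 0 | 1 | ... | 9
        if not tok.isdigit():
            raise SyntaxError(f"expected a digit, got {tok!r}")
        return Node("digit", [Node(tok)])

    if not tokens:
        raise SyntaxError("empty input")
    tree = Node("list", [digit(tokens[0])])      # list -> digit
    i = 1
    while i < len(tokens):
        op = tokens[i]
        if op not in ("+", "-") or i + 1 >= len(tokens):
            raise SyntaxError(f"expected + or - followed by a digit at token {i}")
        tree = Node("list", [tree, Node(op), digit(tokens[i + 1])])  # list -> list op digit
        i += 2
    return tree


def show(node: Node, depth: int = 0) -> List[str]:
    """Indented outline of the tree, one node per line."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(show(child, depth + 1))
    return lines


print("\n".join(show(parse_list("9-5+2"))))
```

The root is labeled `list`, its leftmost child is the subtree for 9-5, and the leaves read 9, -, 5, +, 2 from left to right.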

Ambiguity
• A grammar is said to be ambiguous if there exists more than one leftmost derivation, more than one rightmost derivation, or more than one parse tree for a given input string.

• Consider the following context-free grammar:

G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>

with productions P:

string → string + string | string - string | 0 | 1 | … | 9

• This grammar is ambiguous, because more than one parse tree generates the string 9-5+2.

Ambiguity
• Two derivations (parse trees) for the same token string.
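The ambiguity can be measured by counting parse trees. A minimal sketch, assuming single-character tokens; `count_parse_trees` is a hypothetical name. It uses dynamic programming over token ranges: a range is one tree if it is a single digit, and otherwise each + or - inside it can serve as the top operator of string → string op string.

```python
from functools import lru_cache


def count_parse_trees(s: str) -> int:
    """Number of parse trees of s for string -> string + string | string - string | 0 | ... | 9."""
    tokens = tuple(c for c in s if not c.isspace())
    if not tokens:
        return 0

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        """Parse trees deriving tokens[i:j] from `string`."""
        if j - i == 1:
            return 1 if tokens[i].isdigit() else 0   # string -> 0 | 1 | ... | 9
        total = 0
        for k in range(i + 1, j - 1):               # tokens[k] as the top operator
            if tokens[k] in ("+", "-"):
                total += count(i, k) * count(k + 1, j)  # string -> string op string
        return total

    return count(0, len(tokens))


print(count_parse_trees("9-5+2"))  # 2
```

The two trees correspond to (9-5)+2 and 9-(5+2); any count above 1 shows the grammar is ambiguous for that input.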

Associativity of Operators
➢ An operator $\oplus$ is left-associative if the expression $a \oplus b \oplus c$ must be evaluated from left to right, i.e., as $(a \oplus b) \oplus c$.

➢ An operator $\oplus$ is right-associative if the expression $a \oplus b \oplus c$ must be evaluated from right to left, i.e., as $a \oplus (b \oplus c)$.

➢ An operator $\oplus$ is non-associative if expressions of the form $a \oplus b \oplus c$ are illegal.

➢ Left-associative operators have left-recursive productions; right-associative operators have right-recursive productions.

e.g., 9+5+2 ≡ (9+5)+2, a=b=c ≡ a=(b=c)

• Left-associative grammar:
list → list + digit | list - digit
digit → 0|1|…|9

• Right-associative grammar:
right → letter = right | letter
letter → a|b|…|z
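A minimal sketch of how the direction of recursion fixes the grouping, assuming single-character operands that are not themselves validated; `group_left` and `group_right` are hypothetical names. Each function fully parenthesizes its input the way the corresponding grammar's parse tree groups it.

```python
from typing import List


def _tokens(s: str) -> List[str]:
    return [c for c in s if not c.isspace()]


def group_left(s: str) -> str:
    """Parenthesize s as list -> list + digit | list - digit | digit groups it."""
    tokens = _tokens(s)
    if len(tokens) % 2 == 0:
        raise SyntaxError("expected operand (op operand)*")
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        # Left recursion: everything parsed so far becomes the LEFT operand.
        result = f"({result}{tokens[i]}{tokens[i + 1]})"
    return result


def group_right(s: str) -> str:
    """Parenthesize s as right -> letter = right | letter groups it."""
    tokens = _tokens(s)
    if len(tokens) == 1:
        return tokens[0]                                   # right -> letter
    if len(tokens) < 3 or tokens[1] != "=":
        raise SyntaxError("expected letter = right")
    # Right recursion: everything after the first '=' becomes the RIGHT operand.
    return f"({tokens[0]}={group_right(''.join(tokens[2:]))})"


print(group_left("9-5-2"))   # ((9-5)-2)
print(group_right("a=b=c"))  # (a=(b=c))
```

The grouping matters for the result: (9-5)-2 = 2, whereas 9-(5-2) = 6.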


Precedence of Operator
➢ One way of resolving the ambiguity is to use precedence rules during syntax analysis to select among the possible syntax trees.

➢ We say that an operator (*) has higher precedence than another operator (+) if * takes its operands before + does.
• e.g., 9+5*2 ≡ 9+(5*2), 9*5+2 ≡ (9*5)+2
• left-associative operators: +, -, *, /
• right-associative operators: =, **

Precedence of Operator

expr → expr + term | term
term → term * factor | factor
factor → number | ( expr )

The string 2+3*5 has the same meaning as 2+(3*5).

[Figure: parse tree of 2 + 3 * 5 under the grammar above; the root applies expr → expr + term, and 3 * 5 forms the term on the right of +]
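This grammar can drive a top-down (recursive-descent) parser, with one function per nonterminal. A minimal sketch, assuming non-negative integer literals and only +, * and parentheses; `Parser` and `evaluate` are hypothetical names. Recursive descent cannot call expr() first inside expr() without looping forever, so the sketch replaces each left-recursive production with a loop. This is a common equivalent rewriting, not the grammar applied literally. Because term handles * before expr sees +, multiplication binds tighter, and the loops keep both operators left-associative.

```python
import re
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    # Integers, the symbols + * ( ), or any other single non-blank char (reported later).
    return re.findall(r"\d+|[+*()]|\S", text)


class Parser:
    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self) -> int:    # expr -> expr + term | term
        value = self.term()
        while self.peek() == "+":
            self.eat("+")
            value = value + self.term()
        return value

    def term(self) -> int:    # term -> term * factor | factor
        value = self.factor()
        while self.peek() == "*":
            self.eat("*")
            value = value * self.factor()
        return value

    def factor(self) -> int:  # factor -> number | ( expr )
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        tok = self.eat()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)


def evaluate(text: str) -> int:
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2+3*5"))  # 17
```

The result 17 matches the reading 2+(3*5); parentheses, as in (2+3)*5, override precedence and give 25.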
