CD Unit 1

CHAPTER 1
INTRODUCTION TO COMPILERS
1.1 Compilers & Translators
A translator is a program that translates a program written in one language into another language.
A compiler is a translator that converts a high-level language program into a low-level
language (machine language).
Eg. A compiler translates a program written in FORTRAN or COBOL into machine language.
Executing a program
Executing a compiled program needs 2 steps:
 Compile the source program, translating it into an object program
 Load the object program into memory and execute it
Source program → COMPILER → Object program
Object program → Load in memory & execute → Output
An interpreter is a program that translates the high-level
language program into an intermediate code that it can execute directly.
Difference :
Compiler
 A compiler produces object code, which is saved in memory
 Since the object code is saved, more memory space is needed
 It compiles the entire program and then lists all the errors
 Execution is fast (the saved object code is executed directly)
Interpreter
 An interpreter produces intermediate code, which is not saved in memory
 Less memory space is needed (no object code is saved)
 It interprets line by line and lists a single error at a time; the next error is listed
only after the current one is corrected
 Execution is slow (interpretation is repeated every time the program is executed)
1.2 Need for translators
Machine language program:
A program written using 0s and 1s is called a machine language program.
Eg: 0110 001110 010101
Assembly language program:
A program written using mnemonics is called an assembly language program.
Mnemonic names are used for specifying operation codes and data addresses.
Eg: ADD X, Y
ADD is the opcode; X and Y are the addresses of data.
An assembler is a program that translates an assembly language program into a low-level
language (machine language).
Macro: A macro is translated into a sequence of assembly statements.

Macro facility: text replacement capability.
A macro is defined using a macro name along with formal parameters. The definition ends with an
ENDMACRO statement. A macro is called by its name along with actual parameters.
Defn:
Macro macroname formalparameters
-----
-----
Endmacro
Call:
Macroname actual parameters
Eg: MACRO ADD2 X,Y
LOAD Y
ADD X
STORE Y
ENDMACRO
ADD2 A,B
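When a macro is called, the statements in the macro body are substituted at the point of call, with the actual parameters replacing the formal parameters. Here ADD2 A,B expands to LOAD B, ADD A, STORE B.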
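The text replacement performed by a macro facility can be sketched in a few lines of Python. This is only an illustration under simple assumptions: one statement per line, parameters separated by commas, and no nested macro definitions. The function name expand_macros is hypothetical and not part of any real assembler.

```python
def expand_macros(lines):
    """Expand MACRO ... ENDMACRO definitions by plain text substitution."""
    macros = {}      # macro name -> (formal parameters, body statements)
    output = []
    current = None   # name of the macro currently being defined, if any
    for line in lines:
        words = line.split(None, 2)
        if not words:
            continue
        head = words[0].upper()
        if head == "MACRO":
            current = words[1]
            formals = [p.strip() for p in words[2].split(",")] if len(words) > 2 else []
            macros[current] = (formals, [])
        elif head == "ENDMACRO":
            current = None
        elif current is not None:
            # Inside a definition: remember the statement, emit nothing.
            macros[current][1].append(line.strip())
        elif words[0] in macros:
            # A call: bind actual parameters to formals and copy the body.
            formals, body = macros[words[0]]
            rest = line.split(None, 1)
            actuals = [a.strip() for a in rest[1].split(",")] if len(rest) > 1 else []
            binding = dict(zip(formals, actuals))
            for stmt in body:
                output.append(" ".join(binding.get(tok, tok) for tok in stmt.split()))
        else:
            output.append(line.strip())
    return output


program = ["MACRO ADD2 X,Y", "LOAD Y", "ADD X", "STORE Y", "ENDMACRO", "ADD2 A,B"]
print(expand_macros(program))
```

For the ADD2 example, the call ADD2 A,B is replaced by the three body statements with X bound to A and Y bound to B.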
Phase:
 The compilation process is divided into a series of subprocesses called phases.
 A phase is an operation that takes as input one representation of the source program and produces
as output another representation of the program.
 The structure of a compiler contains several phases.
 One or more phases can be combined to form a pass.
 A pass reads the source program or the output of the previous pass, makes the transformations
specified by its phases and writes its output to an intermediate file.
2 types of compiler
 Single-pass compiler (compilation is done in one pass)
 Multipass compiler (compilation is done in several passes)
 A multipass compiler is slower than a single-pass compiler because each pass reads and writes
an intermediate file
 A multipass compiler occupies less space because the space occupied by the compiler for one
pass can be reused by the next pass
1.3 Structure of a compiler

Compiler has 5 phases.
 Lexical phase
 Syntax phase
 Intermediate code generation phase
 Optimization phase
 Code generation phase
Table management
Error handling
Lexical Analysis:
First phase of the compiler
Has the lexical analyzer / scanner
Scans the source program and separates it into tokens
Reads the input character by character
Eg: IF (5 .EQ. MAX) GOTO 100 has 8 tokens: IF, (, 5, .EQ., MAX, ), GOTO, 100
Keywords, identifiers, constants, operators and punctuation symbols are called tokens
2 kinds of tokens
Specific strings (predefined tokens): keywords, punctuation marks, operators. Eg: IF ( .EQ. GOTO )
Classes of strings (user-defined tokens): identifiers, labels, constants. Eg: 5 MAX 100
Input → Source program
Output → Stream of tokens
To find a token, the lexical analyser examines successive characters in the source program.
It may have to read characters beyond the end of a token to determine where the token ends.
A token is treated as a pair consisting of 2 parts: token type and token value.
MAX is a token , type = identifier, value = 75
Specific string tokens have only a type, no value
Once tokens are found, they are entered into the symbol table
Symbol table entry                 Address
Const, integer, value = 5          1000
...
Label, value = 100                 1004
...
Variable, integer, value = MAX     1008
...
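The splitting of the sample statement into tokens can be sketched with Python's re module. This is a minimal illustration, not a full FORTRAN scanner: the token class names and the KEYWORDS set are assumptions made for the example, and the statement label 100 is simply reported as a constant.

```python
import re

KEYWORDS = {"IF", "GOTO"}   # assumed keyword set, enough for the example

TOKEN_RE = re.compile(r"""
    (?P<op>\.[A-Z]+\.)         # dotted operators such as .EQ.
  | (?P<const>\d+)             # integer constants (and labels)
  | (?P<word>[A-Z][A-Z0-9]*)   # keywords and identifiers
  | (?P<punct>[()])            # punctuation
  | (?P<space>\s+)             # blanks, skipped
""", re.VERBOSE)


def tokenize(source):
    """Return a list of (token type, token text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"illegal character {source[pos]!r} at position {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind == "space":
            continue
        if kind == "word":
            kind = "keyword" if text in KEYWORDS else "identifier"
        tokens.append((kind, text))
    return tokens


for token in tokenize("IF (5 .EQ. MAX) GOTO 100"):
    print(token)
```

The loop prints eight pairs, one for each token of the statement. A real scanner would also enter identifiers and constants into the symbol table and return a pointer to the entry as the token value.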
Syntax analysis:
Second phase of the compiler
Has the syntax analyzer / parser
Groups the tokens into syntactic structures
A parse tree is generated
A syntactic structure is represented as a tree whose leaves are tokens
Eg. A + B -- A+B is an expression.
Input → Stream of tokens
Output → Parse tree
A parse tree represents the syntactic structure. The parser has 2 functions:
It checks that the tokens in the input appear in patterns permitted by the language
It imposes a tree-like structure on the tokens
Parse tree for A+B
      +
     / \
   id   id
   (A)  (B)
Parse tree for A/B*C
            expn
          /  |  \
      expn   *   expn
     /  |  \       |
  expn  /  expn    id
   |        |      |
   id       id     C
   |        |
   A        B
Intermediate code generation :

Third phase
Has the intermediate code generator
Generates a stream of simple instructions
Each instruction has one operator and a small number of operands
Transforms the parse tree into an intermediate language representation
One type of intermediate code is three address code
A := B op C
A, B, C are operands and op is a binary operator
The parse tree for A/B*C is converted into the three address sequence
T1 := A/B
T2 := T1*C
Input → Parse tree
Output → Intermediate code
Code optimization phase:

Optional phase
Improves the intermediate code and produces optimized code
The optimized code runs faster and takes less space
Input intermediate code
Output. Optimized code
2 types of optimization
 Local optimization
1. Local transformations can be applied to the program for improvement
If A>B goto L2
Goto L3
L2:
Can be changed as
If A<=B goto L3
L2:
2. Common sub expressions can be eliminated
A=B+C+D
E=B+C+F
Can be changed as
T1=B+C
A=T1+D
E=T1+F
 loop optimization
Loop invariants can be placed outside the loop (a calculation that produces the same
result each time around a loop is called loop invariant and can be moved outside the loop)
For i= 1 to 50
{
j=5 // loop invariant, so it can be moved outside the loop
i=i*j
}
Can be changed as
j=5
for i= 1 to 50
{
i=i*j
}
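The elimination of common subexpressions described under local optimization can be sketched over a straight-line list of three address instructions. This is a simplified illustration, not a production optimizer: the tuple format (dest, left, op, right) and the function name are assumptions, operand order is not treated as commutative, and a repeated computation is replaced by a copy rather than removed.

```python
def eliminate_common_subexpressions(code):
    """code: list of (dest, left, op, right) meaning dest = left op right."""
    available = {}   # (left, op, right) -> name that currently holds the value
    result = []
    for dest, left, op, right in code:
        key = (left, op, right)
        if key in available:
            # Same expression computed earlier: copy the saved result.
            result.append((dest, available[key], None, None))
        else:
            result.append((dest, left, op, right))
        # Assigning to dest invalidates expressions that read dest
        # and expressions whose saved value lived in dest.
        available = {k: v for k, v in available.items()
                     if dest not in (k[0], k[2]) and v != dest}
        if dest not in (left, right):
            available.setdefault(key, dest)
    return result


code = [
    ("T1", "B", "+", "C"),
    ("A", "T1", "+", "D"),
    ("T2", "B", "+", "C"),   # B+C is recomputed here
    ("E", "T2", "+", "F"),
]
for instr in eliminate_common_subexpressions(code):
    print(instr)
```

In the output the third instruction becomes a copy of T1, so B+C is evaluated only once, which is the same saving as in the A=B+C+D, E=B+C+F example.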
Code generation phase
Final phase
The actual object code is generated
Memory locations for data are decided
Registers are selected here
Simple code generation for A:=B+C
LOAD B
ADD C
STORE A
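The simple code generation scheme above can be sketched as a function that maps one three address instruction to three machine instructions. It assumes a single-accumulator machine as in the LOAD/ADD/STORE example; the mnemonics SUB, MUL and DIV and the function name generate are assumptions, since the notes only show ADD.

```python
# Assumed opcode table; only ADD appears in the notes.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(dest, left, op, right):
    """Translate dest := left op right for a one-accumulator machine."""
    return [f"LOAD {left}", f"{OPCODES[op]} {right}", f"STORE {dest}"]


print(generate("A", "B", "+", "C"))
```

Such a generator produces correct but poor code: applied to T1:=A/B followed by T2:=T1*C it stores T1 and immediately reloads it, which is the kind of redundancy a better code generator or optimizer removes.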
Table management /book keeping

Keep track of names used in the program and maintain essential information about them.
The information about a data object is collected in the earlier phases and is stored in the symbol
table.
Error handler
It is called whenever there is an error in the source program
A phase of the compiler that detects an error reports it to the error handler, which issues an
appropriate error message.
1.11 Compiler writing tools:
Many tools have been developed to help construct compilers.
They are called compiler-compilers, compiler-generators or translator writing systems.
There are 3 inputs to these programs:
Description of the lexical and syntactic structure of the source language
Description of the output to be generated for each source language construct
Description of the target machine
Some tools have been created for the automatic design of specific compiler components. These
tools use specialized languages for specifying and implementing the component.
Tools available in existing compiler-compilers
 Scanner generator
 Parser generator
 Syntax directed translation engine
 Automatic code generator
 Dataflow analysis engine
Scanner generator: automatically generates a lexical analyzer from a specification based on
regular expressions.
Parser generator: automatically produces a syntax analyzer from input based on a
context free grammar.
Syntax directed translation engine: generates intermediate code in three address format
from input that consists of a parse tree. These engines have routines that traverse the parse tree and
produce the intermediate code.
Automatic code generator (facilities for code generation): generates the machine language
for a target machine. The code generator takes as input a collection of rules that specify how each
operation of the intermediate language is translated.
Dataflow analysis engine: used in code optimization. Data flow analysis is a key part of code
optimization.
BOOTSTRAPPING
Any compiler is characterized by 3 languages:
Source language
Object language
The language in which the compiler is written
A compiler that runs on one machine and produces object code for the same machine is called a
pure compiler.
A compiler that runs on one machine and produces object code for another machine is called a
cross compiler.
Notation: C_Z^{XY} denotes a compiler for source language X, written in language Z, that produces object language Y.
L is a new language.
This language L is to be implemented on 2 machines, A and B.
First, design a compiler for language L on machine A
1. Take a subset S of language L
For this subset, write a small compiler for machine A. This compiler should be written in a
language that is already available on A
C_A^{SA}
2. Then write a compiler for the full language L in the simple language S
C_S^{LA}
This compiler, C_S^{LA}, when it is run through C_A^{SA}, produces a complete compiler C_A^{LA}
C_A^{LA} : compiler for language L, in the language of machine A, that runs on machine A and produces object

code for A
This is called bootstrapping

3. We want to produce another compiler for L, which runs on B
Cross compiler for L runs on machine A, produces code for machine B
CHAPTER 3-LEXICAL ANALYSIS

3.1 Role of Lexical analyser
The function of the lexical analyser is to read the source program, one character at a time, and to
translate it into units called tokens. Keywords, identifiers, constants and operators are examples of tokens.
The lexical analyzer can be a separate pass. In that case its output is stored in an
intermediate file from which the parser takes its input. Generally, however, the lexical analyser and the
parser are in the same pass. In this case the lexical analyser acts as a subroutine that is called by
the parser whenever a token is needed. This method eliminates the need for an intermediate file.
The lexical analyser returns a representation of the token that it has found. If the token is simple
(eg. ( ) , . ; :), it returns an integer code.
If the token is complex (identifier, constant), it returns a pair: an integer code and a pointer to the table.
The integer code indicates the token type and the pointer points to the value of the token.
Need for lexical analysis:
The purpose of separating lexical analysis from syntax analysis is to simplify the design of the
compiler. Specifying the structure of tokens is easier than specifying syntactic structures.
Other functions performed by the lexical analyser:

Keeping track of line numbers, producing an output listing, stripping out whitespace and deleting
comments.
Input buffering:
To find the tokens, the lexical analyser scans the characters of the source program one at a time. Often,
many characters beyond the token have to be examined before the token can be determined. So it is desirable for
the lexical analyser to read its input from a buffer. Two pointers are kept into the buffer. One
pointer (the beginning pointer) marks the beginning of the token. The other pointer (the lookahead pointer) scans
ahead of the beginning pointer until the token is discovered.
[Figure: input buffer]
The buffer is divided into 2 halves. If the lookahead pointer travels beyond the buffer half in which
it began, the other half is loaded with the next characters from the source file.
Eg
DECLARE(ARG1, ARG2, ………ARGn)
Whether DECLARE is a keyword or an array name cannot be determined until the character that follows the
right parenthesis is read.
[Figure: source, buffer and actual buffer]
When the characters are read from the source file into the buffer, ie. at the time of preliminary scanning,
the following things are done:
Delete the comments
Ignore unneeded blanks
Combine runs of blanks
Count lines
This preprocessing of the characters avoids the trouble of moving the lookahead pointer back and forth
over comments and blanks.
3.3 Regular expressions
A regular expression is a notation used to describe tokens.
It specifies the set of rules that the strings of a token must follow.
Regular expression for an identifier
id = letter (letter | digit)*
| represents or (union)
* indicates zero or more occurrences
Eg:
a, ab, abc, abcd, a1, a12, a123, ab12cd, a1b1c1, adcfre234 etc.
Regular expression for a constant
const = digit+ or digit (digit)*
+ indicates one or more occurrences
Eg:
4, 456, 4321, 67890 etc.
Regular expression for a relational operator
relop = < | <= | > | >= | <> | =
Regular expression for a keyword
keyword = begin | end | if | then | else
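The four token definitions above can be tried out with Python's re module. This is a sketch for experimentation only: Python's pattern syntax writes letter as [A-Za-z] and digit as [0-9], and the function name classify is an assumption. Keywords are tested before identifiers because every keyword also matches the identifier pattern.

```python
import re

IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")     # letter (letter | digit)*
CONST = re.compile(r"[0-9]+")                    # digit+
RELOP = re.compile(r"<=|<>|>=|<|>|=")            # relational operators
KEYWORD = re.compile(r"begin|end|if|then|else")  # keywords


def classify(lexeme):
    """Return the name of the first token class that matches the whole lexeme."""
    for name, pattern in (("keyword", KEYWORD), ("id", IDENT),
                          ("const", CONST), ("relop", RELOP)):
        if pattern.fullmatch(lexeme):
            return name
    return None


for lexeme in ["ab12cd", "67890", "<=", "then", "1a"]:
    print(lexeme, classify(lexeme))
```

The last lexeme, 1a, matches none of the patterns because an identifier must start with a letter and a constant may contain only digits.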
Construction rules:
1. ℇ is a regular expression denoting {ℇ}, the set containing only the empty string.
2. a is a regular expression denoting {a}, for each symbol a of the alphabet.
3. If R and S are two regular expressions, then
(R)|(S) is a regular expression (union)
(R).(S) is a regular expression (concatenation)
(R)* is a regular expression (closure)
Precedence: * has the highest precedence, then comes . , and | has the lowest precedence
Rules 1 and 2 define the primitive regular expressions; rule 3 builds complex regular expressions from them.
Properties of regular expressions
If R, S and T are regular expressions then
1. R|S=S|R (| is commutative)
2. R|(S|T)=(R|S)|T (| is associative)
3. R.(S.T)=(R.S).T (. is associative)
4. R.(S|T)=(R.S)| (R.T),
(S|T).R=(S.R)| (T.R) (. is distributive over |)
5. ℇ.R=R.ℇ=R (ℇ is the identity)
Example
Regular expressions over the alphabet {a,b}
1. R=ℇ denotes {ℇ}, the set containing only the empty string
2. R=a denotes {a}, the set containing the single character a
3. R=a|b denotes {a,b}, the set containing a or b
4. R=a* zero or more occurrences of a: ℇ,a,aa,aaa,aaaa,….
5. R=a+ one or more occurrences of a: a,aa,aaa,aaaa…. (same as a.a*)
6. R=(a|b)* zero or more occurrences of a or b:
ℇ,a,aa,aaa,aaaa,b,bb,bbb,ababab,bababa,baaa….
7. R=a|(ba*) a single a, or b followed by zero or more occurrences of a:
a,b,ba,baa,baaa,baaaa…..
8. R=aa|ab|ba|bb denotes all strings of length 2
9. R=ℇ|a|b denotes strings of length 0 or 1: ℇ,a,b
10. R=(a|b)(a|b)(a|b) denotes strings of length 3:
aaa,abb,aba,bbb,baa…
11. R=(a|b)(a|b)(a|b)(a|b)* denotes strings of length 3 or more
Transition diagram
Valuable tool for designing a lexical analyser
Also called a state diagram
It is a flowchart for recognizing tokens
Circles represent states.
Arrows represent edges.
Labels on edges indicate the input characters that can appear after that state
3.4 Finite Automata
The transition diagram for a regular expression is called a finite automaton.
A recognizer for a language L takes an input string x and checks whether x is a sentence of L. If
so, it returns yes; otherwise it returns no.
To convert a regular expression into a recognizer, a transition diagram is constructed from the
expression. This transition diagram is called a finite automaton.
Finite automata types
Non deterministic Finite Automata (NFA)
Deterministic Finite automata (DFA)
NFA
1. Edges may be labelled by ℇ
Eg: 0 --ℇ--> 1
2. The same character can be used as the label for 2 or more transitions out of 1 state
Eg: 0 --a--> 1 and 0 --a--> 2
DFA
1. Edges cannot be labelled by ℇ (no transition on ℇ)
2. The same character cannot be used as the label for 2 or more transitions out of 1 state
(for each state s and input symbol a, there is at most 1 edge labelled a leaving s)
NFA
An NFA is a labelled directed graph. Nodes are called states. Labelled edges are called transitions.
One state is the start state and one or more states are accepting (final) states.
Transition table: The transitions of an NFA can be conveniently represented in a table called the transition
table. There is a row for each state and a column for each admissible input symbol. The entry
for state i and symbol a is the set of possible next states for state i on input symbol a.
An NFA accepts an input string x if and only if there is a path from the start state to some accepting
state such that the labels along the path spell out x.
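Draw NFA for (a|b)*abb
Transition table
State    Input symbol
         a        b
0        {0,1}    {0}
1        -        {2}
2        -        {3}
(State 0 is the start state; state 3 is the accepting state and has no outgoing transitions.)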
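The transition table above can be used directly to simulate the NFA by tracking the set of states it could be in after each input character. The dictionary encoding and the name nfa_accepts are assumptions made for this sketch; state 3 is taken as the accepting state, as implied by the last row of the table.

```python
# Transition table of the NFA for (a|b)*abb: (state, symbol) -> set of next states
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, ACCEPTING = 0, {3}


def nfa_accepts(text):
    """Return True if some path labelled text leads from START to an accepting state."""
    current = {START}
    for ch in text:
        current = {t for s in current for t in NFA.get((s, ch), set())}
    return bool(current & ACCEPTING)


for word in ["abb", "aabb", "babb", "ab", "abba"]:
    print(word, nfa_accepts(word))
```

The strings abb, aabb and babb end in abb and are accepted; ab and abba are rejected.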
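As per the rules, (a|b)*abb is decomposed as follows and an NFA is built for each part:
R1=a
R2=b
R3=R1|R2
R4=(R3)
R5=R4*
R6=R5.R1
R7=R6.R2
R8=R7.R2
[Figures: NFA for R1=a (1 --a--> 2); NFA for R2=b (1 --b--> 2); NFA for R3=R1|R2 (0 --ℇ--> 1 --a--> 2 --ℇ--> 5 and 0 --ℇ--> 3 --b--> 4 --ℇ--> 5)]
Exercise: obtain the NFA for aa*|bb*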
3.5 From regular expression to finite automata

BASIC STRUCTURE OF NFA
(constructing an NFA from a regular expression)
Input : Regular expression R
Output : NFA
Method:
Decompose the regular expression into its primitive components.
For each component, construct an NFA
R=ℇ: 0 --ℇ--> 1 (1 is the final state)
R=a: 0 --a--> 1 (1 is the final state)
R=b: 0 --b--> 1 (1 is the final state)
After constructing the NFAs for the basic regular expressions, proceed to combine them in the same way that
compound regular expressions are formed from smaller regular expressions.
For the regular expression R1|R2, construct the NFA
Given N1 is the NFA for R1 and N2 is the NFA for R2
i --ℇ--> N1 --ℇ--> f
i --ℇ--> N2 --ℇ--> f
i is the new initial state and f is the new final state

There is a transition on ℇ from the new initial state i to the initial states of N1 and N2.
Similarly there is an ℇ transition from the final states of N1 and N2 to the new final state f
For the expression R1.R2, construct the NFA
The final state of N1 is identified with the initial state of N2.
i --> N1 --> N2 --> f
For the expression R*, construct the NFA
Add a new initial state i and a new final state f, with ℇ transitions from i to the initial state of N and to f,
and from the final state of N back to its initial state and to f.
Construct DFA from NFA
(a|b)*abb
Transitions of the NFA on input symbols:
Input symbol   Transitions
a              2→3, 7→8
b              4→5, 8→9, 9→10
ℇ closure(0)
Add 0 to the ℇ closure
Add all states that are reachable from 0 along edges labelled ℇ.
ℇ closure(0)={0,1,2,4,7}----------A
1. From the members of A, find the states having transitions on a. Among the given states, 2
and 7 have transitions, to 3 and 8
A-a: ℇ closure(3,8)={3,6,7,1,2,4,8}----B
2. From the members of A, find the states having transitions on b. Among the given states, 4
has a transition, to 5
A-b: ℇ closure(5)={5,6,7,1,2,4}----C
3. From the members of B, find the states having transitions on a. Among the given states, 2
and 7 have transitions, to 3 and 8
B-a: ℇ closure(3,8)={3,6,7,1,2,4,8}----B
4. From the members of B, find the states having transitions on b. Among the given states, 4
and 8 have transitions, to 5 and 9
B-b: ℇ closure(5,9)={5,6,7,1,2,4,9}----D
5. Among the given states in C, 2 and 7 have transitions on a
C-a: ℇ closure(3,8)={3,6,7,1,2,4,8}----B
6. Among the given states in C, 4 has a transition on b
C-b: ℇ closure(5)={5,6,7,1,2,4}----C
7. Among the given states in D, 2 and 7 have transitions on a
D-a: ℇ closure(3,8)={3,6,7,1,2,4,8}----B
8. Among the given states in D, 4 and 9 have transitions on b
D-b: ℇ closure(5,10)={5,6,7,1,2,4,10}----E
9. Among the given states in E, 2 and 7 have transitions on a
E-a: ℇ closure(3,8)={3,6,7,1,2,4,8}----B
10. Among the given states in E, 4 has a transition on b
E-b: ℇ closure(5)={5,6,7,1,2,4}----C
E contains state 10, the final state of the NFA, so E is the final state of the DFA.
Transition table
State Input symbols
s a b
A B C
B B D
C B C
D B E
E B C
Algorithm:
Alg 3.1 Construct DFA from NFA
Input : NFA
Output : DFA
Method :
Define a function ℇ-closure(s)
1. s is added to ℇ-closure(s)
2. If t is in ℇ-closure(s), and if there is an edge labelled ℇ from t to u,
then add u to ℇ-closure(s), if u is not already there.
3. Repeat rule2 , until no more states can be added .
(ℇ-closure(s) is just the set of states that can be reached with ℇ transitions alone)
Computation of ℇ-closure(T) for a set of states T
begin
push all states in T onto STACK;
ℇ-closure(T):=T;
while STACK not empty do
begin
pop s, the top element of STACK, off the stack;
for each state t with an edge from s to t labelled ℇ do
if t is not in ℇ-closure(T) then
begin
add t to ℇ-closure(T);
push t onto STACK;
end
end
end
Algorithm for subset construction:

Initially, let ℇ-closure(s0) be the only state of D, where s0 is the start state of the NFA
This is the start state of D
Assume that each state of D is initially unmarked
while there is an unmarked state x={s1,s2, ….sn} of D do
begin
mark x;
for each input symbol a do
begin
let T be the set of states to which there is a transition on a from some state si in x;
y:= ℇ-closure(T);
if y has not yet been added to the set of states of D then
make y an unmarked state of D;
add a transition from x to y labelled a, if not already present
end
end
A state of D is a final state if it contains at least one final state of the NFA.
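The ℇ-closure computation and the subset construction can be sketched together in Python and checked against the worked example for (a|b)*abb. The a and b edges below are the ones listed in the notes; the ℇ edges are an assumption, taken from the standard construction and chosen so that the closures agree with the sets A to E computed earlier. The names EDGES, eps_closure and subset_construction are hypothetical.

```python
EPS = ""   # label used here for ℇ edges

# NFA for (a|b)*abb with states 0..10 (ℇ edges assumed, see lead-in)
EDGES = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, EPS): {6}, (5, EPS): {6}, (6, EPS): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}


def eps_closure(states):
    """All states reachable from `states` using ℇ edges alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in EDGES.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(start, alphabet):
    """Return (start state of D, transition table of D)."""
    start_state = eps_closure({start})
    dfa = {}                     # marked states: frozenset -> {symbol: frozenset}
    unmarked = [start_state]
    while unmarked:
        x = unmarked.pop()
        if x in dfa:             # already marked
            continue
        dfa[x] = {}
        for a in alphabet:
            moved = {t for s in x for t in EDGES.get((s, a), ())}
            y = eps_closure(moved)
            dfa[x][a] = y
            if y not in dfa:
                unmarked.append(y)
    return start_state, dfa


start, dfa = subset_construction(0, "ab")
for state, row in dfa.items():
    print(sorted(state), {a: sorted(t) for a, t in row.items()})
```

Running it yields five DFA states whose member sets are the sets A, B, C, D and E of the worked example. In this sketch an empty set of NFA states would simply become one more DFA state (a dead state); that case does not arise for this NFA.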
3.6 Minimize the number of states in a DFA
Construct the reduced DFA for the DFA obtained from the NFA above
Reduced DFA: a DFA with the minimum number of states
The initial partition consists of 2 groups:
non-final states and final states
(also called non-accepting states and accepting states)
Π=(ABCD)(E)
Construct Πnew
Group (E) cannot be split
Consider (ABCD)
When the input is a, all these states go to state B, which is in the same group.
So as far as the input a is concerned, these states cannot be split.
When the input is b, A, B and C go to members of the group (ABCD), but D goes to E
(which is a member of another group).
Hence (ABCD) can be split into (ABC) and (D)
Πnew=(ABC)(D)(E)
Π ≠ Πnew
Now Π=(ABC)(D)(E); find Πnew
(D) can’t be split. Similarly (E) can’t
Consider (ABC)
When the input is a, all these states go to B, which is in the same group
When the input is b, A and C go to C, but B goes to D, which is a member of another group.
So, Πnew=(AC)(B)(D)(E)
Again Π ≠ Πnew
Now Π=(AC)(B)(D)(E); again find Πnew
(B), (D), (E) can't be split
Consider (AC)
When the input is a, both A and C go to B, in the same group
When the input is b, both A and C go to C, in the same group
Hence they can’t be split
Πnew=(AC)(B)(D)(E)
Π = Πnew
A can be used to represent the group (AC)
B, D and E represent themselves
Transition table (C is replaced by A)
State Input symbols
s a b
A B A
B B D
D B E
E B A
Dead state: a non-final state that has transitions to itself on all input symbols.
Non-reachable state: a state that cannot be reached from the initial state
Algorithm for Minimizing the number of states in DFA

Input : A DFA M with set of states S, transitions defined for all states and input symbols, initial state s0
and set of final states F
Output: A DFA M’ accepting the same language as M and having as few states as possible
Method:
1. Construct a partition Π of the set of states.
2. Initially Π consists of 2 groups: the final states and the non-final states
3. Construct a new partition Πnew
4. If Π ≠ Πnew, replace Π by Πnew and repeat step 3.
5. If Π = Πnew, then terminate.
6. After constructing the final partition Π, pick a representative for each group. The representatives are the states of M’.
7. If M’ has a dead state d, then remove d from M’. (A non-final state that has transitions
to itself on all input symbols is a dead state)
8. Also remove any states that are not reachable from the initial state.
Construction of Πnew
For each group G of Π do
begin
partition G into subgroups
such that 2 states s and t of G are in the same subgroup if and only if,
for all input symbols a, the states s and t have transitions on a to states in the
same group of Π
place all subgroups formed in Πnew
end
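The partition refinement above can be sketched in Python and checked against the worked example. The DFA is the five-state table for (a|b)*abb; the dictionary layout and the function name minimize are assumptions made for this illustration. Removing dead and unreachable states (steps 7 and 8) is left out of the sketch.

```python
def minimize(states, alphabet, delta, finals):
    """Return the final partition of `states` as a list of groups (sets)."""
    finals = set(finals)
    partition = [g for g in (set(states) - finals, finals) if g]

    def group_of(state):
        # Index of the group of the current partition that contains state.
        return next(i for i, g in enumerate(partition) if state in g)

    while True:
        new_partition = []
        for group in partition:
            # States stay together only if, for every input symbol,
            # their transitions lead into the same group.
            buckets = {}
            for s in sorted(group):
                signature = tuple(group_of(delta[s][a]) for a in alphabet)
                buckets.setdefault(signature, set()).add(s)
            new_partition.extend(buckets.values())
        if len(new_partition) == len(partition):   # nothing was split
            return new_partition
        partition = new_partition


DELTA = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
print(minimize("ABCDE", "ab", DELTA, {"E"}))
```

Because refinement only ever splits groups, an unchanged group count means Π = Πnew. For the example the result is the partition (AC)(B)(D)(E), reached after the same two splits as in the hand computation.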
3.8 Implementation of Lexical analyser

Construct a nondeterministic finite automaton N for each token pattern p in the translation rules.
Link these NFAs together with a new start state
Convert the combined NFA to a DFA
In the combined NFA, there are many accepting states, one for each token
When this NFA is converted to a DFA, the subsets may contain several final states.
The final state indicates the token we have found.
If the last state includes more than one final state, then the pattern listed first has priority.
Translation rules
a
abb
a*b+
NFA for the tokens
a: 1 --a--> 2 (2 is the final state)
abb: 3 --a--> 4 --b--> 5 --b--> 6 (6 is the final state)
a*b+: 7 --a--> 7, 7 --b--> 8, 8 --b--> 8 (8 is the final state)
Convert to a single NFA: a new start state 0 with ℇ transitions to 1, 3 and 7
Now convert to DFA

Input symbol   Transitions
a              1→2, 3→4, 7→7
b              4→5, 5→6, 7→8, 8→8
ℇclosure(0)={0,1,3,7}-------A
Aa: ℇclosure(2,4,7)={2,4,7}-------B
Ab: ℇclosure(8)={8}-------C
Ba: ℇclosure(7)={7}-------D
Bb: ℇclosure(5,8)={5,8}-------E
Ca: ℇclosure(ɸ)=null
Cb: ℇclosure(8)={8}-------C
Da: ℇclosure(7)={7}-------D
Db: ℇclosure(8)={8}-------C
Ea: ℇclosure(ɸ)=null
Eb: ℇclosure(6,8)={6,8}-------F
Fa: ℇclosure(ɸ)=null
Fb: ℇclosure(8)={8}-------C
DFA state        a    b    Token found
A = {0,1,3,7}    B    C    none (no final state in {0,1,3,7})
B = {2,4,7}      D    E    a (2 is a final state)
C = {8}          -    C    a*b+ (8 is a final state)
D = {7}          D    C    none (no final state in {7})
E = {5,8}        -    F    a*b+ (8 is a final state)
F = {6,8}        -    C    abb (6 and 8 are final states; since abb is listed before a*b+ in the rules, the token is abb)
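The priority rule in the table (when two patterns match, the one listed first wins) can be illustrated without building the DFA by trying each pattern with Python's re module and keeping the longest match. This is a sketch under assumptions: it relies on the regex engine rather than on the combined automaton, the names RULES and next_token are hypothetical, and for these three simple patterns the engine's greedy match coincides with the longest match.

```python
import re

# Translation rules in the order given in the notes
RULES = [
    ("a", re.compile(r"a")),
    ("abb", re.compile(r"abb")),
    ("a*b+", re.compile(r"a*b+")),
]


def next_token(text, pos=0):
    """Return (rule name, end position) of the longest match starting at pos.

    When two rules match the same length, the rule listed first wins,
    because a later rule replaces the best one only if it is strictly longer.
    """
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        if m and (best is None or m.end() > best[1]):
            best = (name, m.end())
    return best


for text in ["a", "abb", "aabbb"]:
    print(text, next_token(text))
```

For the input abb both abb and a*b+ match three characters, and abb is reported because it is listed first, matching the last row of the table.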
CHAPTER 4
SYNTACTIC SPECIFICATION OF PROGRAMMING LANGUAGES
4.1 Context Free Grammar

 A notation used to represent syntactic specification of programming language
 Also called as Backus Naur form (BNF)
 Used to represent expression/ Statement
 CFG involves 4 quantities
1. Terminals
2. Non terminals
3. Start symbol
4. Production
Terminals
 Basic symbols of strings in a language are called terminals
 Keywords, Punctuation symbols, and operators are called terminals
 Token is a synonym for terminal
 examples
 Single lower case letters a, b,c …
 Operators + - *…
 Punctuations ; . , etc
 Digits 0,1,2…
 Bold face strings id,if…
Nonterminals
 Special symbols that denote set of strings
 Syntactic variable/ Syntactic category is a synonym for non terminal
 Examples
 Lower case names (expn, stmt, operator….)
 Italic capital letters ( E A..)
Productions
 Rewriting rules
 Each production consists of a nonterminal, followed by an arrow, followed by a string of
nonterminals and terminals
 Examples
 stmt → begin stmt-list end
 expn → expn opr expn
 expn → ( expn )
 expn → id
 opr → + | - | *
Start symbol
 Symbol on the left side of the first production
 Eg:
 E → E+d In this eg, E is the start symbol
4.2 Derivations and Parse trees

 Derivations are used to check whether the given string is valid or not
Types
 Left most derivation (left most nonterminal is replaced at each step)
 Right most derivation (right most nonterminal is replaced at each step)
Example
Consider the grammar
E → id | E+E | E*E
id → a | b | c
Use of LMD: replace the leftmost nonterminal at each step

E ⇒ E*E
⇒ E+E*E
⇒ id+E*E
⇒ a+E*E
⇒ a+id*E
⇒ a+b*E
⇒ a+b*id
⇒ a+b*c
Use of RMD: replace the rightmost nonterminal at each step
E ⇒ E*E
⇒ E*id
⇒ E*c
⇒ E+E*c
⇒ E+id*c
⇒ E+b*c
⇒ id+b*c
⇒ a+b*c
The sentence id+id*id has 2 distinct LMDs
1) E ⇒ E+E
⇒ id+E
⇒ id+E*E
⇒ id+id*E
⇒ id+id*id
2) E ⇒ E*E
⇒ E+E*E
⇒ id+E*E
⇒ id+id*E
⇒ id+id*id
Parse tree for LMD 1)

      E
    / | \
   E  +  E
   |   / | \
  id  E  *  E
      |     |
      id    id
Parse tree for LMD 2)

        E
      / | \
     E  *  E
   / | \   |
  E  +  E  id
  |     |
  id    id
2 parse trees have been generated for id+id*id

Parse tree 1) is the intended one: it gives * higher precedence than +.
Parse tree 2) groups id+id first, which contradicts the usual precedence.
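The claim that id+id*id has exactly two parse trees under E → E+E | E*E | id can be checked by counting. The sketch below counts the ways a token sequence can be split at an operator into two smaller expressions; the function name count_parse_trees is hypothetical and the code handles only this one grammar.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of tokens under E -> E+E | E*E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways to derive tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root of the tree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A result greater than 1 for any sentence shows that the grammar is ambiguous; here the two trees correspond to choosing + or * as the root operator.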

EE+E | E*E | (E) | -E | id
The string –(id+id) is a sentence of the above grammar

Because,
E -E -(E) -(E+E) -(id+E) -(id+id) …………(1)
Parse trees
A graphical representation of a derivation can be created.
This representation is called a parse tree.
Each interior node of the parse tree is labelled by some nonterminal
The children of the node are labelled by the symbols on the right side of the production
Eg:
A → XYZ is a production
     A
   / | \
  X  Y  Z
The leaves of the parse tree are labelled by terminals or nonterminals

Read from left to right, the leaves constitute a sentential form called the yield or frontier of the
tree.
The parse tree for -(id+id) is given.
      E
     / \
    -   E
      / | \
     (  E  )
      / | \
     E  +  E
     |     |
     id    id
Construction of the parse tree for eqn (1): the tree is built one derivation step at a time
[Figure: sequence of partial parse trees for E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)]
Ambiguity
A grammar that produces more than one parse tree for some sentence is ambiguous.

An ambiguous grammar produces more than 1 LMD or more than 1 RMD for some sentence
Disambiguating rules throw away undesirable parse trees, leaving us with only one parse tree.
E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id
This grammar is ambiguous
By using the associativity and precedence of the arithmetic operators, we can disambiguate the
grammar
Precedence of operators (highest first)
unary - , ^ , * and / , + and -
We consider ^ as right associative.

ie. a^b^c = a^(b^c)
All other operators are left associative.
ie. a-b-c = (a-b)-c
Now, using the associativity and precedence rules, rewrite the grammar
1. element → (expn) | id // an element is either a parenthesised expn or an id

2. primary → - primary | element // primaries are elements preceded by 0 or more unary minus operators (highest precedence)
3. factor → primary ^ factor | primary // a factor is a sequence of one or more primaries connected by ^
4. term → term * factor | term / factor | factor // a sequence of 1 or more factors connected by * or /
5. expn → expn + term | expn - term | term // a sequence of 1 or more terms connected by + or -
The unambiguous grammar is
expn → expn + term | expn - term | term
term → term * factor | term / factor | factor
factor → primary ^ factor | primary
primary → - primary | element
element → (expn) | id
