Module-2 COMPILER DESIGN
Intermediate code generation is the last phase of the front end of a compiler. The front end translates a
source program into an intermediate representation from which the back end generates target code.
Although a source program can be translated directly into the target language, some benefits of using a
machine-independent intermediate form are:
1. Intermediate code is closer to the target machine than the source language is, and hence it is easier to
generate target code from it.
2. Unlike machine language, intermediate code is (more or less) machine independent. This makes it
easier to retarget the compiler.
3. It allows a variety of optimizations to be performed in a machine-independent way.
4. Intermediate code generation can be implemented via syntax-directed translation and thus can be
folded into parsing by augmenting the code for the parser.
Intermediate code can take a number of forms. Some common intermediate codes are:
1. Three-Address Code(TAC)
2. P-Code
3. Byte code
P-Code: A language-specific intermediate code on which the majority of Pascal implementations
are based.
Byte Code: A machine-independent code, such as the one produced by Java compilers, that is executed by the Java Virtual Machine.
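As an illustration of the first form, the following is a minimal Python sketch of syntax-directed generation of three-address code for an expression tree. The tuple encoding of the tree, the function name gen_tac, and the temporary names t1, t2, ... are assumptions made for this example and do not come from any particular compiler.

```python
from itertools import count

def gen_tac(tree):
    """Return (code, place) for an expression tree.

    A tree is either a variable name (str) or a tuple (op, left, right).
    `code` is a list of three-address instructions; `place` is the name
    that holds the value of the whole tree.
    """
    code = []
    temps = count(1)  # source of hypothetical temporary names t1, t2, ...

    def walk(node):
        if isinstance(node, str):   # leaf: a variable is its own place
            return node
        op, left, right = node
        l = walk(left)              # code for the operands is emitted first
        r = walk(right)
        t = f"t{next(temps)}"
        code.append(f"{t} := {l} {op} {r}")
        return t

    place = walk(tree)
    return code, place

# a := b + c * d
code, place = gen_tac(("+", "b", ("*", "c", "d")))
code.append(f"a := {place}")
print("\n".join(code))
```

Each instruction has at most one operator on its right-hand side, which is the defining property of three-address code.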
3. Copy statement: the general form is x = y
Here x is assigned the value of y.
4. Unconditional jump: goto L
The three-address instruction with label L is the next to be executed.
Ex:
100 : a:=b
101 : goto 104
……………….
103 : e:=f
104 : c:=d
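The effect of the jump can be checked with a small interpreter. This is only a sketch: the dictionary encoding of the program, the function name run, and the initial values of b, d and f are assumptions, and the elided instruction is simply left out.

```python
def run(program, env, start):
    """Execute a tiny three-address program and return the labels visited.

    program maps a label (int) to ("copy", x, y) for x := y or to
    ("goto", L). After a copy, control falls through to the next higher
    label; execution stops when there is none.
    """
    labels = sorted(program)
    pc = start
    trace = []
    while pc is not None:
        trace.append(pc)
        instr = program[pc]
        if instr[0] == "goto":
            pc = instr[1]                 # jump: the labelled instruction runs next
        else:
            _, x, y = instr
            env[x] = env[y]               # copy statement x := y
            later = [l for l in labels if l > pc]
            pc = later[0] if later else None
    return trace

program = {
    100: ("copy", "a", "b"),
    101: ("goto", 104),
    103: ("copy", "e", "f"),
    104: ("copy", "c", "d"),
}
env = {"b": 1, "d": 2, "f": 3}            # assumed initial values
trace = run(program, env, 100)
print(trace, env)
```

Instruction 103 is never visited, so e is never assigned.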
Op: +   Arg-1: y   Arg-2: z   Result: x
Indirect Triples: If we use pointers to triples, rather than listing the triples themselves, the implementation is
called indirect triples implementation.
Op    Arg-1    Arg-2

New index    Index
10           (0)
20           (1)
30           (2)
Example: Write quadruples, triples & indirect triples for the following expression.
x[i]=y
Quadruples
Indirect triples
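The three representations for x[i]=y can be sketched as Python data structures. This is one possible encoding, not a definitive one: the operator names "[]=" and "assign", the field order, and the statement indices 10 and 20 are assumptions chosen for illustration.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")
Triple = namedtuple("Triple", "op arg1 arg2")

# Quadruples: the result field names the array being assigned to.
quads = [Quad("[]=", "y", "i", "x")]

# Triples: there is no result field, so the indexed assignment takes two
# entries. An int argument refers to the triple at that position.
triples = [
    Triple("[]=", "x", "i"),     # (0) the location x[i]
    Triple("assign", 0, "y"),    # (1) store y into the location from (0)
]

# Indirect triples: a separate statement list holds pointers to triples,
# so statements can be reordered without renumbering the triples.
statements = {10: 0, 20: 1}

ordered = [triples[statements[k]] for k in sorted(statements)]
for k in sorted(statements):
    print(k, f"({statements[k]})", triples[statements[k]])
```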
Type checking:
A source program should follow both the syntactic and the semantic rules of the source language.
Some rules can be checked statically at compile time, and other rules can only be checked
dynamically at run time.
Static checking includes the syntax checks performed by the parser and semantic checks such
as type checks, flow of control checks, uniqueness checks and name related checks.
Type checking therefore involves adding synthesized attributes to those parts of the language
grammar that involve expressions and values.
The type checker sits between syntax analysis and intermediate code generation; the intermediate
code generator then produces the intermediate representation.
Static type checking:
Static checking refers to the checks a compiler performs on a program to ensure that the syntactic and
semantic conventions of the source language are being followed. These checks include:
i) Type checks: Operators & operands must have compatible types.
ii) Flow-of-control checks: Statements that cause flow of control to leave a construct must have some place
to which control can be transferred.
Ex: A break statement in C must appear inside an enclosing loop or switch statement.
iii) Uniqueness checks: A language may dictate that in some contexts an entity can be defined exactly once,
for example identifiers in declarations and labels in a case statement.
iv) Name-related checks: Sometimes the same name must appear two or more times. For example, in Ada a loop
or block can have a name that must appear at both the beginning and the end.
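Two of these checks can be sketched in a few lines of Python. The statement encoding ("decl", "break", "while") and the function name check_block are hypothetical, and the sketch covers only the flow-of-control and uniqueness checks.

```python
def check_block(stmts, in_loop=False, errors=None):
    """Collect static-check errors for a list of statements.

    Statement forms (hypothetical encoding):
      ("decl", name)    declare a name in the current block
      ("break",)        leave the enclosing loop
      ("while", body)   loop whose body is a list of statements
    """
    if errors is None:
        errors = []
    declared = set()                      # names declared in this block
    for stmt in stmts:
        kind = stmt[0]
        if kind == "decl":
            # Uniqueness check: a name may be declared only once per block.
            if stmt[1] in declared:
                errors.append(f"duplicate declaration of {stmt[1]}")
            declared.add(stmt[1])
        elif kind == "break":
            # Flow-of-control check: break needs an enclosing loop.
            if not in_loop:
                errors.append("break outside loop")
        elif kind == "while":
            check_block(stmt[1], in_loop=True, errors=errors)
    return errors

program = [
    ("decl", "x"),
    ("decl", "x"),                # second declaration of x
    ("while", [("break",)]),      # legal: inside a loop
    ("break",),                   # illegal: no enclosing loop
]
errors = check_block(program)
print(errors)
```

Both errors are reported without running the program, which is what makes the checks static.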
Dynamic type checking:
Checking of rules that can only be verified at run time is called dynamic type checking.
Static type checking vs. dynamic type checking
Static type checking: If the compiler can verify at compile time that a
program is free from type errors, the checking is called static type checking.
Dynamic type checking: If freedom from type errors can only be verified
while the program runs, the checking is called dynamic type checking.
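Python itself checks operand types at run time, so it can serve as a small illustration of dynamic type checking. The function name add is chosen only for this example.

```python
def add(a, b):
    # No types are declared; operand compatibility is checked only when
    # this line actually executes.
    return a + b

ok = add(4, 5)            # both operands are ints: accepted at run time

caught = False
try:
    add(4, "x")           # int + str: rejected, but only at run time
except TypeError:
    caught = True

print(ok, caught)
```

A statically checked language would reject the second call before the program ever runs.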
EXAMPLE
Fig. (a): 7 < (5 * 3), types not yet assigned
Fig. (b): 7 < (5 * 3), every node labelled with its type
Fig. (c): 4 + (5 < 3)
In fig. (a) the types of the nodes are not yet known, so the type checker must determine them.
Fig. (b) shows the tree with every node labelled with a correct type.
Fig. (c) contains a type error, because an integer cannot be combined with a boolean
expression.
Such an error must be eliminated.
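The checks described for the three figures can be sketched as a bottom-up computation of a synthesized type attribute. The sketch reads figs. (a) and (b) as the expression 7 < 5 * 3 and fig. (c) as 4 + (5 < 3); that reading, the tuple encoding of the trees, and the name type_of are assumptions.

```python
def type_of(node):
    """Synthesize the type of an expression tree bottom-up.

    Leaves are integer literals; interior nodes are (op, left, right).
    Returns "int", "bool", or "type_error".
    """
    if isinstance(node, int):
        return "int"
    op, left, right = node
    lt, rt = type_of(left), type_of(right)
    if lt != "int" or rt != "int":
        return "type_error"       # every operator here needs int operands
    if op in ("+", "*"):
        return "int"
    if op == "<":
        return "bool"
    return "type_error"           # unknown operator

fig_b = ("<", 7, ("*", 5, 3))     # 7 < 5 * 3
fig_c = ("+", 4, ("<", 5, 3))     # 4 + (5 < 3)
print(type_of(fig_b), type_of(fig_c))
```

The type of each node depends only on the types of its children, which is why a synthesized attribute is sufficient.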
There are two kinds of type conversion that can eliminate such an error:
1. Explicit type conversion
2. Implicit type conversion
1. Explicit type conversion: If the programmer is required to write extra code to have the conversion
performed, we call this an explicit type cast or conversion.
2. Implicit type conversion: If the compiler performs the transformation without direction from the
programmer, we call this an implicit type conversion.
Typically, implicit type conversion is done only on built-in types.
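An implicit conversion can be pictured as the compiler inserting a conversion node into the expression tree. The sketch below does this for mixed int/float arithmetic; the node name "inttofloat", the tuple encoding, and the function name coerce are assumptions made for illustration.

```python
def coerce(node):
    """Return (typed_tree, type), inserting "inttofloat" nodes where needed.

    Leaves are int or float literals; interior nodes are (op, left, right)
    with op in {"+", "*"}.
    """
    if isinstance(node, float):
        return node, "float"
    if isinstance(node, int):
        return node, "int"
    op, left, right = node
    l, lt = coerce(left)
    r, rt = coerce(right)
    if lt == rt:
        return (op, l, r), lt
    # Mixed operands: widen the int side without being asked to.
    if lt == "int":
        l = ("inttofloat", l)
    else:
        r = ("inttofloat", r)
    return (op, l, r), "float"

tree, ty = coerce(("*", ("+", 1, 2), 2.0))
print(tree, ty)
```

With an explicit conversion, the programmer would have written the conversion in the source text instead, and the compiler would have nothing to insert.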
Examples:
[Figure: expression trees before and after type conversion]
Note: indicates that a location is being dereferenced and turned into a value.
Runtime Environments
The compiler creates and manages a run-time environment in which it assumes its target programs are
being executed.
This environment deals with a variety of issues, such as the layout and allocation of storage locations
for the objects named in the source program,
the mechanisms used by the target program to access variables, the linkage between procedures, the
mechanisms for passing parameters, and the interfaces to the operating system.
Storage organization
The executing target program runs in its own logical address space, in which each program value has
a location.
The management and organization of this logical address space is shared between the compiler, the
operating system, and the target machine.
The operating system maps logical addresses into physical addresses, which are usually spread
throughout memory.
The run-time representation of an object program in the logical address space consists of data and
program areas:
Code
Static
Heap
Free Memory
Stack
2. Static area: This contains data whose absolute addresses can be determined at compile time.
For example, in FORTRAN the addresses of all variables can be determined statically and can therefore be kept
in the static area.
Similarly, in C the global and static variables are kept in this area.
3. Stack area & heap area: To maximize the utilization of space at run time, the stack and the heap
are placed at opposite ends of the remainder of the address space.
These areas are dynamic; their sizes can change as the program executes. They grow towards
each other as needed.
The stack is used to store data structures called activation records, which are generated during
procedure calls.
The heap area is created at run time and holds dynamically allocated objects, such as those reached through pointers.
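The way the two areas share the free memory between them can be modelled with two moving boundaries. This is a toy sketch only: the class name AddressSpace, the sizes used, and the choice of the heap at the low end and the stack at the high end are assumptions.

```python
class AddressSpace:
    """Toy model: the heap grows upward from heap_top, the stack grows
    downward from stack_top; the gap between them is free memory."""

    def __init__(self, size, reserved):
        self.heap_top = reserved     # first address above code + static data
        self.stack_top = size        # one past the highest address

    def free(self):
        return self.stack_top - self.heap_top

    def heap_alloc(self, n):
        if n > self.free():
            raise MemoryError("heap would collide with stack")
        addr = self.heap_top
        self.heap_top += n
        return addr

    def stack_push(self, n):
        if n > self.free():
            raise MemoryError("stack would collide with heap")
        self.stack_top -= n
        return self.stack_top

    def stack_pop(self, n):
        self.stack_top += n

space = AddressSpace(size=100, reserved=20)
a = space.heap_alloc(30)     # heap now occupies addresses 20..49
s = space.stack_push(40)     # stack now occupies addresses 60..99
print(a, s, space.free())
```

Neither area has a fixed limit of its own; either one can use the free memory until the two boundaries meet.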
The two adjectives static and dynamic distinguish between compile time and
run time, respectively.
A storage-allocation decision is static if it can be made by the compiler looking only at the text of the
program, not at what the program does when it executes. Dynamic storage allocation typically uses a combination of two strategies:
1) Stack storage: Names local to a procedure are allocated space on a stack. The stack supports the normal
call/return policy for procedures.
2) Heap storage: Data that may outlive the call to the procedure that created it is usually allocated on a
heap of reusable storage.
The heap is an area of virtual memory that allows objects or other data elements to obtain storage
when they are created and to return that storage when they are invalidated.
To support heap management, garbage collection enables the run-time system to detect useless data
elements and reuse their storage.
Activation Record
Defn: An activation record is a conceptual aggregate of data that contains all the information required for a
single activation of a procedure.
Activation records are pushed onto the stack when a procedure is called and popped when the procedure
returns.
Activation records are held in the static area for languages like FORTRAN and in the stack area for
languages like Pascal.
Actual parameters
Return value
Access link
Temporaries
[Activation record table]
2. Local data: These variables are part of the local environment of the currently activated procedure.
3. Saved machine status: If the called procedure wants to use registers that are in use by the calling procedure,
their contents have to be saved before and restored after the execution of the called procedure.
4. Access link: An access link may be needed to locate data that is needed by the called procedure but found
elsewhere, for example in another activation record; that is, the access link gives access to non-local names.
6. Returned value: This is the space for the return value of the called function, if any. Not all called
procedures return a value, and if one does, we may prefer to place that value in a register for efficiency.
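The push-on-call, pop-on-return behaviour can be sketched with an explicit stack of records. The class below models only some of the fields named above; the class name ActivationRecord, the procedure call_fact, and the temporary name t are hypothetical, and the saved machine status is omitted because the sketch has no registers.

```python
from dataclasses import dataclass, field

@dataclass
class ActivationRecord:
    procedure: str
    actual_params: dict
    access_link: "ActivationRecord | None" = None   # unused: no non-local names here
    local_data: dict = field(default_factory=dict)
    temporaries: dict = field(default_factory=dict)
    returned_value: object = None

stack = []        # the run-time stack of activation records
max_depth = 0     # deepest nesting of activations seen so far

def call_fact(n):
    """Factorial, with every activation recorded explicitly."""
    global max_depth
    rec = ActivationRecord("fact", {"n": n})
    stack.append(rec)                          # pushed when the procedure is called
    max_depth = max(max_depth, len(stack))
    if n <= 1:
        rec.returned_value = 1
    else:
        rec.temporaries["t"] = call_fact(n - 1)
        rec.returned_value = n * rec.temporaries["t"]
    stack.pop()                                # popped when the procedure returns
    return rec.returned_value

result = call_fact(4)
print(result, max_depth, len(stack))
```

Each recursive activation gets its own record, so the four values of n live on the stack at the same time and disappear in reverse order of creation.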