
UNIT-IV

Storage organization: storage organization - storage allocation strategies - parameter passing - symbol tables - dynamic storage allocation - intermediate languages - representation of declarations - assignment statements - Boolean expressions - back patching - procedure calls.
4.1 STORAGE ORGANIZATION

 The executing target program runs in its own logical address space in which
each program value has a location.
 The management and organization of this logical address space is shared
between the compiler, the operating system and the target machine. The operating
system maps logical addresses into physical addresses, which are usually
spread throughout memory.

4.1.1 Typical subdivision of run-time memory:

 Run-time storage comes in blocks, where a byte is the smallest unit of
addressable memory. Four bytes form a machine word. Multibyte objects
are stored in consecutive bytes and given the address of the first byte.
 The storage layout for data objects is strongly influenced by the addressing
constraints of the target machine.

 A character array of length 10 needs only enough bytes to hold 10
characters, but a compiler may allocate 12 bytes to satisfy alignment
constraints, leaving 2 bytes unused (see the sketch after this list).
 The unused space due to alignment considerations is referred to as padding.
 Program objects whose size is known at compile time may be placed in an
area called static.
 The dynamic areas used to maximize the utilization of space at run time are
stack and heap.
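To make the idea of padding concrete, here is a minimal C sketch (the struct and field names are hypothetical): a 10-character array followed by a 4-byte integer is normally padded so that the integer starts on a word boundary.

#include <stdio.h>

/* Hypothetical record: a 10-byte name followed by an int. On a machine
   that aligns int to 4 bytes, 2 bytes of padding are inserted after
   name so that id starts at offset 12.                                 */
struct employee {
    char name[10];   /* occupies offsets 0..9                     */
    int  id;         /* starts at offset 12 after 2 padding bytes */
};

int main(void)
{
    /* Typically prints 16: 10 (name) + 2 (padding) + 4 (id). */
    printf("sizeof(struct employee) = %zu\n", sizeof(struct employee));
    return 0;
}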
4.1.2 Activation Records:
 Procedure Calls and returns are usually managed by a run time stack
called the control stack.
 Each live activation has an activation record on the control stack, with
the root of the activation tree at the bottom; the latest activation has its
record at the top of the stack.
 The contents of the activation record vary with the language being
implemented. The general contents of an activation record are listed
below.

 Temporary values such as those arising from the evaluation of expressions.
 Local data belonging to the procedure whose activation record this is.
 A saved machine status, with information about the state of the machine just
before the call to the procedure.
 An access link may be needed to locate data needed by the called procedure
but found elsewhere.
 A control link pointing to the activation record of the caller.
 Space for the return value of the called function, if any. Not all called
procedures return a value, and if one does, we may prefer to place that value
in a register for efficiency.
 The actual parameters used by the calling procedure. These are often placed in
registers rather than in the activation record, when possible, for greater efficiency.
 We assume that program control flows sequentially and that, when a
procedure is called, control is transferred to the called procedure.
When the called procedure finishes executing, it returns control to the caller.
This type of control flow makes it easy to represent a series of activations
in the form of a tree, known as the activation tree.
 To understand this concept, we take a piece of code as an example:
printf("Enter Your Name: ");
scanf("%s", username);
show_data(username);
printf("Press any key to continue...");
...
int show_data(char *user)
{
printf("Your name is %s", user);
return 0;
}

Below is the activation tree of the code given:

4.2 STORAGE ALLOCATION STRATEGIES


The different storage allocation strategies are:
1. Static allocation – lays out storage for all data objects at compile time.
2. Stack allocation – manages the run-time storage as a stack.
3. Heap allocation – allocates and deallocates storage as needed at run time
from a data area known as the heap.
4.2.1 Static Allocation
 In static allocation, the position of an activation record in memory is fixed at
compile time.
4.2.1.1 Implementation of call statement:
The code needed to implement static allocation is as follows:
MOV #here + 20, callee.static_area
GOTO callee.code_area

where,
callee.static_area – Address of the activation record of the callee
callee.code_area – Address of the first instruction of the called procedure
#here + 20 – Literal return address, i.e., the address of the instruction
following the GOTO.
4.2.1.2 Implementation of return statement:
A return from procedure callee is implemented by:
GOTO *callee.static_area
This transfers control to the address saved at the beginning of the activation record.
4.2.1.3 Implementation of action statement:
The instruction ACTION is used to implement an action statement.
4.2.1.4 Implementation of halt statement:
The statement HALT is the final instruction that returns control to the operating
system.
4.2.2 Stack Allocation
Static allocation can become stack allocation by using relative addresses for
storage in activation records. In stack allocation, the position of an activation
record is not known until run time; this position is stored in a register, so words
in the activation record can be accessed as offsets from the value in this register.
The code needed to implement stack allocation is as follows:
Initialization of stack:
MOV #stackstart , SP /* initializes stack */
Code for the first procedure
HALT /* terminate execution */

Implementation of Call statement:


ADD #caller.recordsize, SP /* increment stack pointer */
MOV #here + 16, *SP /*Save return address */

GOTO callee.code_area
where,
caller.recordsize – size of the activation record
#here + 16 – address of the instruction following the GOTO
Implementation of Return statement:
GOTO *0 ( SP )
SUB #caller.recordsize, SP
4.2.2.1 Stack Allocation Space
Calling Sequence
 Procedure calls are implemented by what is known as a calling sequence,
which consists of code that allocates an activation record on the stack
and enters information into its fields.
 A return sequence is similar code that restores the state of the machine so
the calling procedure can continue its execution after the call.
 The code in a calling sequence is often divided between the calling
procedure (caller) and the procedure it calls (callee).
 When designing calling sequences and the layout of activation records,
the following principles are helpful: values communicated between the caller
and the callee are placed at the beginning of the callee's activation record;
fixed-length items (control link, access link, machine-status fields) are placed
in the middle; and items whose size may not be known early enough are placed
at the end of the record.

 The calling sequence and its division between caller and callee are as
follows:
 The caller evaluates the actual parameters.
 The caller stores a return address and the old value of top_sp into the
callee's activation record. The caller then increments top_sp to its new
position.
 The callee saves the register values and other status information.
 The callee initializes its local data and begins execution.
4.2.2.2 Variable Length data on Stack
 The run-time memory management system must deal frequently with the
allocation of space for objects the sizes of which are not known at
compile time, but which are local to a procedure and thus may be allocated
on the stack.
 The same scheme works for objects of any type if they are local to the
called procedure and have a size that depends on the parameters of the call,
as sketched after the figure below.

FIGURE: ACCESS DYNAMICALLY ALLOCATED AREA
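As a concrete illustration of such variable-length local data, the following C99 sketch (hypothetical names) declares an array whose size depends on a parameter of the call; it is nevertheless local to the procedure and can be allocated on the stack.

#include <stdio.h>

/* n is a parameter of the call; the array a is local to the procedure
   and its size is not known until run time, yet it lives on the stack. */
void fill_and_print(int n)
{
    int a[n];                 /* C99 variable-length array on the stack */
    for (int i = 0; i < n; i++)
        a[i] = i * i;
    printf("last element = %d\n", a[n - 1]);
}   /* the array disappears automatically when this activation ends */

int main(void)
{
    fill_and_print(5);
    fill_and_print(10);
    return 0;
}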

4.2.3 HEAP ALLOCATION
Stack allocation cannot be used if either of the following is possible:
 The values of local names must be retained when an activation ends.
 A called activation outlives the caller.
In such cases, heap allocation is used.
 Heap allocation parcels out pieces of contiguous storage, as needed for
activation records or other objects.
 Pieces may be deallocated in any order, so over time the heap will
consist of alternating areas that are free and in use.

 For example, suppose the record for an activation of procedure r is retained
when the activation ends.
 Then the record for a new activation q(1,9) cannot follow the record for s
physically.
 If the retained activation record for r is deallocated, there will be free space
in the heap between the activation records for s and q.

Memory for variables local to a procedure can also be allocated and de-allocated
only at run time. Heap allocation is used to dynamically allocate memory to such
variables and claim it back when they are no longer required.
Unlike the statically allocated memory area, both the stack and the heap can
grow and shrink dynamically and unpredictably. Therefore, they cannot be
given a fixed amount of memory in advance.
In a typical run-time layout, the text part of the program is allocated a fixed
amount of memory, while the stack and the heap are arranged at opposite extremes
of the total memory allocated to the program and grow toward each other.
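The following C sketch (hypothetical names) shows the kind of object that forces heap allocation: the node created inside make_node must outlive the activation of make_node, so it cannot be kept in that procedure's activation record on the stack.

#include <stdio.h>
#include <stdlib.h>

struct node { int value; struct node *next; };

/* The node created here outlives this activation, so it is allocated
   on the heap rather than in the activation record.                  */
struct node *make_node(int value)
{
    struct node *n = malloc(sizeof *n);
    n->value = value;
    n->next  = NULL;
    return n;                 /* still valid after make_node returns */
}

int main(void)
{
    struct node *p = make_node(42);
    printf("%d\n", p->value);
    free(p);                  /* heap storage is freed explicitly, in any order */
    return 0;
}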
4.3 PARAMETER PASSING
The communication medium among procedures is known as parameter
passing. The values of the variables from a calling procedure are transferred to the
called procedure by some mechanism. Before moving ahead, first go through
some basic terminologies pertaining to the values in a program.
4.3.1 r-value
The value of an expression is called its r-value. The value contained in a
single variable also becomes an r-value if it appears on the right-hand side of the
assignment operator. r-values can always be assigned to some other variable.

4.3.2 l-value
The location of memory (address) where an expression is stored is known as
the l-value of that expression. An l-value appears on the left-hand side of an
assignment operator.
For example:
day = 1;
week = day * 7;
month = 1;
year = month * 12;
From this example, we understand that constant values like 1, 7, 12, and
variables like day, week, month and year, all have r-values. Only variables have l-
values as they also represent the memory location assigned to them.
For example:

7 = x + y;
is an l-value error, as the constant 7 does not represent any memory location.
4.3.3 Formal Parameters
Variables that take the information passed by the caller procedure are called
formal parameters. These variables are declared in the definition of the called
function.
4.3.4 Actual Parameters
Variables whose values or addresses are being passed to the called
procedure are called actual parameters. These variables are specified in the
function call as arguments.
Example:
void fun_two(int formal_parameter);   /* forward declaration */

void fun_one()
{
int actual_parameter = 10;
fun_two(actual_parameter);
}

void fun_two(int formal_parameter)
{
printf("%d", formal_parameter);
}
Formal parameters hold the information of the actual parameter, depending
upon the parameter passing technique used. It may be a value or an address.
4.3.5 Pass by Value
In pass by value mechanism, the calling procedure passes the r-value of
actual parameters and the compiler puts that into the called procedure’s activation
record. Formal parameters then hold the values passed by the calling procedure. If
the values held by the formal parameters are changed, it should have no impact on
the actual parameters.
4.3.6 Pass by Reference
In the pass by reference mechanism, the l-value (address) of the actual parameter
is copied into the activation record of the called procedure. The called
procedure thus has the address (memory location) of the actual parameter, and the
formal parameter refers to the same memory location. Therefore, if the value
pointed to by the formal parameter is changed, the change is visible through the
actual parameter, since both refer to the same location.
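A small C sketch (hypothetical names) contrasts the two mechanisms: reset_by_value receives only the r-value of the argument and cannot affect the caller's variable, while reset_by_reference receives the l-value (address) and can.

#include <stdio.h>

void reset_by_value(int x)      { x = 0; }   /* changes only the local copy */
void reset_by_reference(int *x) { *x = 0; }  /* changes the caller's object */

int main(void)
{
    int a = 5;
    reset_by_value(a);        /* r-value of a is passed           */
    printf("%d\n", a);        /* prints 5: a is unchanged         */
    reset_by_reference(&a);   /* l-value (address) of a is passed */
    printf("%d\n", a);        /* prints 0: a was changed          */
    return 0;
}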
4.3.7 Pass by Copy-restore
This parameter passing mechanism works similarly to pass-by-reference,
except that the changes to the actual parameters are made only when the called
procedure ends. Upon the function call, the values of the actual parameters are
copied into the activation record of the called procedure. Changes made to the
formal parameters have no immediate effect on the actual parameters, but when
the called procedure ends, the final values of the formal parameters are copied
back into the locations (l-values) of the actual parameters.
Example:
int y;

calling_procedure()
{
y = 10;
copy_restore(y); /* the location of y is passed */
print y; /* prints 99 */
}

copy_restore(int x)
{
x = 99; /* y still has value 10 (unaffected) */
y = 0; /* y is now 0 */
}

When copy_restore ends, the final value of the formal parameter x is copied back
into the location of the actual parameter y. Even though the value of y is changed
before the procedure ends, the value of x is copied into the l-value of y, making
the mechanism behave much like call by reference.
4.3.8 Pass by Name
Languages like ALGOL provide a parameter passing mechanism that works much
like the preprocessor in the C language. In the pass-by-name mechanism, a call is
conceptually replaced by the body of the called procedure: pass-by-name textually
substitutes the argument expressions of the call for the corresponding formal
parameters in the body, so the procedure then works on the actual
parameters, much like pass-by-reference.
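Because the text compares pass-by-name with the C preprocessor, the following sketch (a hypothetical macro) imitates the textual substitution: the argument expression a[i] is substituted into the macro body and re-evaluated each time the parameter is used.

#include <stdio.h>

/* The macro body is expanded textually, so the argument expression is
   substituted for x and re-evaluated on every use – the same effect
   pass-by-name has in ALGOL-like languages.                           */
#define DOUBLE_IN_PLACE(x) ((x) = (x) + (x))

int main(void)
{
    int a[3] = {1, 2, 3};
    int i = 1;
    DOUBLE_IN_PLACE(a[i]);    /* expands to a[i] = a[i] + a[i] */
    printf("%d\n", a[1]);     /* prints 4 */
    return 0;
}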

4.4 SYMBOL TABLE


A symbol table is an important data structure created and maintained by
compilers in order to store information about the occurrence of various entities
such as variable names, function names, objects, classes, interfaces, etc. The
symbol table is used by both the analysis and the synthesis parts of a compiler.
A symbol table may serve the following purposes depending upon the
language in hand:
 To store the names of all entities in a structured form at one place.
 To verify if a variable has been declared.
 To implement type checking, by verifying that assignments and expressions in
the source code are semantically correct.
 To determine the scope of a name (scope resolution).
A symbol table is simply a table which can be either linear or a hash table. It
maintains an entry for each name in the following format:

<symbol name, type, attribute>

For example, if a symbol table has to store information about the following
variable declaration:

static int interest;

then it should store an entry such as:

<interest, int, static>

The attribute field contains information related to the name.


4.4.1 Implementation
If a compiler is to handle a small amount of data, then the symbol table can
be implemented as an unordered list, which is easy to code but suitable only
for small tables. A symbol table can be implemented in one of the following
ways:
 Linear (sorted or unsorted) list
 Binary Search Tree
 Hash table
Among all, symbol tables are mostly implemented as hash tables, where the
source code symbol itself is treated as a key for the hash function and the return
value is the information about the symbol.
4.4.2 Operations
A symbol table, either linear or hash, should provide the following operations.
insert()
This operation is used more frequently by the analysis phase, i.e., the first half of
the compiler, where tokens are identified and names are stored in the table. This
operation is used to add information about unique names occurring in the source
code to the symbol table. The format or structure in which the names are
stored depends upon the compiler in hand.
An attribute of a symbol in the source code is the information associated with
that symbol. This information includes the value, state, scope, and type of the
symbol. The insert() function takes the symbol and its attributes as arguments and
stores the information in the symbol table.
For example:

int a;

should be processed by the compiler as:

insert(a, int);

4.4.3 lookup()
lookup() operation is used to search a name in the symbol table to determine:
 if the symbol exists in the table.
 if it is declared before it is being used.
 if the name is used in the scope.
 if the symbol is initialized.
 if the symbol is declared multiple times.
The format of lookup() function varies according to the programming language.
The basic format should match the following:

lookup(symbol)

This method returns 0 (zero) if the symbol does not exist in the symbol table. If
the symbol exists in the symbol table, it returns its attributes stored in the table.
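A minimal C sketch of such a hash-based symbol table is shown below (the bucket count, hash function, and names are illustrative assumptions, not a prescribed design): insert() adds a <name, type> pair and lookup() returns the entry, or NULL if the name is not in the table.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211                     /* illustrative bucket count */

struct entry {
    const char *name;                      /* symbol name (the hash key)    */
    const char *type;                      /* attribute: type of the symbol */
    struct entry *next;                    /* chaining for collisions       */
};

static struct entry *table[TABLE_SIZE];

static unsigned hash(const char *s)        /* simple illustrative hash */
{
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

void insert(const char *name, const char *type)
{
    unsigned h = hash(name);
    struct entry *e = malloc(sizeof *e);
    e->name = name;
    e->type = type;
    e->next = table[h];                    /* add at the head of the chain */
    table[h] = e;
}

struct entry *lookup(const char *name)
{
    for (struct entry *e = table[hash(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;                      /* found: return its attributes */
    return NULL;                           /* not declared                 */
}

int main(void)
{
    insert("a", "int");
    struct entry *e = lookup("a");
    printf("%s\n", e ? e->type : "undeclared");   /* prints int */
    return 0;
}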

4.4.4 Scope Management


A compiler maintains two types of symbol tables: a global symbol
table which can be accessed by all the procedures and scope symbol tables that
are created for each scope in the program.
To determine the scope of a name, symbol tables are arranged in a hierarchical
structure, as shown in the example below:
int value=10;
void pro_one()
{
int one_1;
int one_2;

{ \
int one_3; |_ inner scope 1
int one_4; |
} /

int one_5;

{ \
int one_6; |_ inner scope 2
int one_7; |
} /
}

void pro_two()
{
int two_1;
int two_2;

{ \
int two_3; |_ inner scope 3
int two_4; |
} /

int two_5;
}

The above program can be represented in a hierarchical structure of symbol tables:
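One way to realize this hierarchy (a minimal C sketch with hypothetical names) is to give each scope's table a pointer to the table of its enclosing scope; a scoped lookup then searches the current scope first and walks outward until it reaches the global symbol table.

#include <string.h>
#include <stddef.h>

struct symbol { const char *name; const char *type; struct symbol *next; };

struct scope {
    struct symbol *symbols;   /* entries declared in this scope            */
    struct scope  *parent;    /* table of the enclosing scope (NULL at the
                                 global symbol table)                      */
};

/* Search the current scope first, then each enclosing scope in turn. */
struct symbol *lookup_scoped(struct scope *s, const char *name)
{
    for (; s != NULL; s = s->parent)
        for (struct symbol *sym = s->symbols; sym; sym = sym->next)
            if (strcmp(sym->name, name) == 0)
                return sym;
    return NULL;              /* not visible in any enclosing scope */
}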

4.5 DYNAMIC STORAGE ALLOCATION

 The technique needed to implement dynamic storage allocation depends on
how the space is deallocated: implicitly or explicitly.
 There are three types of allocation:
o Explicit allocation of fixed-size blocks
o Explicit allocation of variable-size blocks
o Implicit deallocation
4.5.1 Explicit Allocation of Fixed-Size Blocks
 This is the simplest form of dynamic storage allocation.
 The blocks are linked together in a list, and allocation and deallocation can
be done quickly with little or no storage overhead.
 A pointer called available points to the first block in the list of available
blocks (a C sketch follows the figure below).

FIGURE: EXPLICIT ALLOCATION OF FIXED SIZE BLOCK
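A minimal C sketch of this scheme is given below (block size, block count, and names are assumptions): the free blocks are threaded through a single available list, so allocation removes the head block and deallocation pushes a block back, both in constant time.

#include <stddef.h>

#define BLOCK_SIZE 32
#define NUM_BLOCKS 100

/* Each free block stores the pointer to the next free block in its own
   first word, so the list costs no extra storage.                      */
union block { union block *next; char data[BLOCK_SIZE]; };

static union block heap[NUM_BLOCKS];
static union block *available;           /* head of the list of free blocks */

void init_blocks(void)
{
    for (int i = 0; i < NUM_BLOCKS - 1; i++)
        heap[i].next = &heap[i + 1];
    heap[NUM_BLOCKS - 1].next = NULL;
    available = &heap[0];
}

void *alloc_block(void)                  /* unlink the head of the list */
{
    union block *b = available;
    if (b) available = b->next;
    return b;
}

void free_block(void *p)                 /* push the block back on the list */
{
    union block *b = p;
    b->next = available;
    available = b;
}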


4.5.2 Explicit Allocation of Variable-Size Blocks
 When blocks are allocated and deallocated, storage can become fragmented,
i.e., the heap may consist of alternating blocks that are free and in use.
 With variable-size allocation this becomes a problem, because we may be
unable to allocate a block larger than any single free block, even though the
total free space is sufficient.
 The first-fit method can be used to allocate variable-sized blocks (a sketch
follows this list).
 When a block of size s is to be allocated, we search for the first free block of
size f >= s. This block is then subdivided into a used block of size s and a
free block of size f - s. The search can be time consuming.
 When a block is deallocated, we check whether it is adjacent to a free block.
If possible, the deallocated block is combined with the free block next to it
to create a larger free block. This helps to avoid fragmentation.
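A first-fit search over a free list might look like the following C sketch (a simplification under the stated assumptions; splitting the chosen block into a used part of size s and a free remainder of size f - s, and coalescing on deallocation, are omitted for brevity).

#include <stddef.h>

struct free_block {
    size_t size;               /* size f of this free block   */
    struct free_block *next;   /* next block on the free list */
};

/* Return the first free block whose size f satisfies f >= s,
   unlinking it from the free list.                            */
struct free_block *first_fit(struct free_block **list, size_t s)
{
    for (struct free_block **p = list; *p != NULL; p = &(*p)->next) {
        if ((*p)->size >= s) {
            struct free_block *found = *p;
            *p = found->next;  /* unlink the chosen block */
            return found;
        }
    }
    return NULL;               /* no single free block is large enough */
}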

4.5.3 Implicit Deallocation
 Implicit deallocation requires cooperation between the user program and the
run-time package; this is implemented by fixing the format of storage
blocks.

FIGURE: IMPLICIT DEALLOCATION


 The first problem is recognizing block boundaries; for fixed-size blocks this
is easy.
 For variable-size blocks, the size of a block is kept in an inaccessible storage
area attached to the block.
 The second problem is recognizing whether a block is in use. A used block
can be referenced by the user program through pointers; the pointers are kept
at fixed positions in the block to make checking references easier.
 Two approaches can be used for implicit deallocation:
o Reference counts
o Marking techniques
1. Reference Counts:
 We keep track of the number of references to the present block. If the count
ever drops to 0, the block is deallocated (a C sketch of this scheme follows
this subsection).
 Maintaining reference counts can be costly in time (the pointer assignment
p := q leads to changes in the reference counts of the blocks pointed to by
both p and q).
 Reference counts work best when no cyclic references occur.
2. Marking Techniques
 Here the user program is suspended temporarily, and the frozen pointers are
used to determine the used blocks. This approach requires all pointers into
the heap to be known. (Conceptually, it is like pouring paint into the heap
through the pointers.)
 First we go through the heap and mark all blocks unused. Then we follow
the pointers and mark all reachable blocks as used. A sequential scan of the
heap then collects all blocks still marked unused.
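A minimal C sketch of reference counting (hypothetical object layout): each block carries a count, the pointer-assignment helper adjusts the counts of the blocks pointed to by both sides, and a block is freed as soon as its count drops to zero.

#include <stdlib.h>

struct obj {
    int refcount;              /* number of references to this block */
    /* ... user data ... */
};

static void retain(struct obj *o)  { if (o) o->refcount++; }

static void release(struct obj *o)
{
    if (o && --o->refcount == 0)
        free(o);               /* count dropped to 0: deallocate */
}

/* The pointer assignment p := q must adjust the reference counts of the
   blocks pointed to by both p and q – this is where the run-time cost lies. */
void assign(struct obj **p, struct obj *q)
{
    retain(q);
    release(*p);
    *p = q;
}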

4.6 INTERMEDIATE CODE GENERATION

If a source program can be translated directly into its target machine code, why
do we need to translate it first into an intermediate code that is then translated
into the target code? Let us see the reasons why we need an intermediate code.

 If a compiler translates the source language to its target machine language
without having the option of generating intermediate code, then for each new
machine a full native compiler is required.
 Intermediate code eliminates the need for a new full compiler for every
unique machine by keeping the analysis portion the same for all compilers.
 The second part of the compiler, synthesis, is changed according to the target
machine.
 It becomes easier to improve code performance by applying code
optimization techniques to the intermediate code.
4.7 INTERMEDIATE REPRESENTATION OF DECLARATIONS
Intermediate codes can be represented in a variety of ways and they have their
own benefits.
 High Level IR - High-level intermediate code representation is very close
to the source language itself. They can be easily generated from the source
code and we can easily apply code modifications to enhance performance.
But for target machine optimization, it is less preferred.
 Low Level IR - This one is close to the target machine, which makes it
suitable for register and memory allocation, instruction set selection, etc. It
is good for machine-dependent optimizations.

Intermediate code can be either language specific (e.g., Byte Code for Java) or
language independent (three-address code).
4.7.1 Declarations
A variable or procedure has to be declared before it can be used. Declaration
involves allocation of space in memory and entry of type and name in the symbol
table. A program may be coded and designed keeping the target machine structure
in mind, but it may not always be possible to accurately convert a source code to
its target language.
Taking the whole program as a collection of procedures and sub-procedures,
it becomes possible to declare all the names local to a procedure. Memory
allocation is done in a consecutive manner, and names are allocated memory in
the sequence they are declared in the program. We use an offset variable,
initialized to zero {offset = 0}, to denote the position relative to the base address.
The source programming language and the target machine architecture may
vary in the way names are stored, so relative addressing is used. While the first
name is allocated memory starting from memory location 0 {offset = 0}, the
next name declared is allocated memory next to the first one.
Example:
We take the example of C programming language where an integer variable
is assigned 2 bytes of memory and a float variable is assigned 4 bytes of memory.
int a;
float b;

Allocation process:
{offset = 0}

int a;
id.type = int
id.width = 2

offset = offset + id.width

{offset = 2}

float b;
id.type = float
id.width = 4

offset = offset + id.width


{offset = 6}

To enter this detail in a symbol table, a procedure enter can be used. This
method may have the following structure:
enter(name, type, offset)
This procedure should create an entry in the symbol table, for variable name,
having its type set to type and relative address offset in its data area.
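A small C sketch of this allocation scheme (it uses the 2-byte int / 4-byte float widths assumed above; enter() is the hypothetical symbol-table routine named in the text, reduced here to printing the entry):

#include <stdio.h>

static int offset = 0;                 /* relative address within the data area */

/* Create the symbol-table entry for name with its type and relative address. */
void enter(const char *name, const char *type, int off)
{
    printf("<%s, %s, offset %d>\n", name, type, off);
}

void declare(const char *name, const char *type, int width)
{
    enter(name, type, offset);
    offset = offset + width;           /* the next name is placed after this one */
}

int main(void)
{
    declare("a", "int", 2);            /* {offset = 0} -> offset becomes 2 */
    declare("b", "float", 4);          /* {offset = 2} -> offset becomes 6 */
    return 0;
}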
4.8 ASSIGNMENT STATEMENT
Suppose that the context in which an assignment appears is given by the
following grammar.
Example
P -> M D
M -> ɛ
D -> D ; D | id : T | proc id ; N D ; S
N -> ɛ
The nonterminal P becomes the new start symbol when these productions are added to
those in the translation scheme.

4.8.1 Reusing Temporary Names

 The temporaries used to hold intermediate values in expression calculations
tend to clutter up the symbol table, and space has to be allocated to hold their
values.
 Temporaries can be reused by changing newtemp. The code generated by
the rules for E -> E1 + E2 has the general form:
 evaluate E1 into t1
 evaluate E2 into t2
 t := t1 + t2
 The lifetimes of these temporaries are nested like matching pairs of
balanced parentheses.
 Keep a count c, initialized to zero. Whenever a temporary name is used as
an operand, decrement c by 1. Whenever a new temporary name is
generated, use $c and increase c by 1. For example, consider the
assignment x := a * b + c * d – e * f.
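The counter-based reuse can be sketched in C as follows (function names are illustrative): newtemp() hands out the name $c and bumps the counter, while freetemp() is called once for each temporary consumed as an operand, so its name becomes available again.

#include <stdio.h>

static int c = 0;                  /* count of temporaries currently in use */

/* Generate a new temporary name $c and increase c by 1. */
void newtemp(char *buf, size_t len)
{
    snprintf(buf, len, "$%d", c);
    c++;
}

/* A temporary has just been used as an operand: decrement c so that
   the same name is handed out again by the next call to newtemp().  */
void freetemp(void) { if (c > 0) c--; }

int main(void)
{
    char t1[8], t2[8], t3[8];
    newtemp(t1, sizeof t1);        /* $0 holds a * b          */
    newtemp(t2, sizeof t2);        /* $1 holds c * d          */
    freetemp(); freetemp();        /* both operands consumed  */
    newtemp(t3, sizeof t3);        /* $0 is reused for the sum */
    printf("%s %s %s\n", t1, t2, t3);   /* prints $0 $1 $0 */
    return 0;
}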
4.8.2 Three-Address Code
The intermediate code generator receives input from its predecessor phase, the
semantic analyzer, in the form of an annotated syntax tree. That syntax tree can
then be converted into a linear representation, e.g., postfix notation. Intermediate
code tends to be machine-independent, so the code generator assumes an
unlimited number of memory locations (registers) when generating code.
For example:
a = b + c * d;
The intermediate code generator will try to divide this expression into sub-
expressions and then generate the corresponding code.

r1 = c * d;
r2 = b + r1;
a = r2;
where r1 and r2 are used as registers in the target program.
A three-address instruction has at most three address locations with which to
compute the expression. Three-address code can be represented in two forms:
quadruples and triples.
(i) Quadruples
Each instruction in the quadruples representation is divided into four fields:
op, arg1, arg2, and result. The above example is represented below in
quadruples format:

Op arg1 arg2 result

* c d r1

+ b r1 r2

= r2 a

(ii) Triples
Each instruction in the triples representation has three fields: op, arg1, and
arg2. The results of the respective sub-expressions are denoted by the position
(index) of the instruction that computes them. Triples are closely related to the
DAG and the syntax tree; they are equivalent to a DAG when representing
expressions.

Op arg1 arg2
* c d
+ b (0)
= a (1)

Triples face the problem of code immovability during optimization: results are
referred to by position, so changing the order or position of an instruction
invalidates those references.
(iii) Indirect Triples
This representation is an enhancement over the triples representation. It uses a
separate list of pointers to triples rather than positions to reference results. This
enables the optimizers to freely reorder the sub-expressions to produce
optimized code.
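The two representations can be sketched as C structures (field widths and names are illustrative): a quadruple names its result explicitly, while a triple has no result field and is referred to by its index in the instruction array.

/* Quadruple: the result field names an explicit temporary or variable. */
struct quad {
    char op[4];       /* operator, e.g. "*", "+", "="      */
    char arg1[8];     /* first operand                      */
    char arg2[8];     /* second operand (may be empty)      */
    char result[8];   /* where the value is stored, e.g. r1 */
};

/* Triple: no result field; other instructions refer to this value by
   its position (index) in the array of triples, e.g. (0), (1).        */
struct triple {
    char op[4];
    char arg1[8];
    char arg2[8];     /* an operand may itself be a position such as (0) */
};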
4.8.3 Addressing Array Elements
Elements of an array can be accessed quickly if the elements are stored in a
block of consecutive locations.
Address calculation for multi-dimensional arrays (an example formula follows this list):
A two-dimensional array is stored in one of two forms:
(i) Row-major (row-by-row)
(ii) Column-major (column-by-column)
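For example, with the row-major form, if a two-dimensional array A has n1 rows and n2 columns, each element has width w, and base is the address of the first element, then (assuming subscripts that start at 0) the address of A[i1, i2] can be calculated as

address of A[i1, i2] = base + ((i1 * n2) + i2) * w

With column-major storage the roles of the two subscripts are interchanged, giving base + ((i2 * n1) + i1) * w.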

4.8.4 Type Conversion within Assignments

Consider the grammar for assignment statements as above, but suppose there
are two types – real and integer, with integers converted to reals when necessary.

Example:

4.9 BOOLEAN EXPRESSIONS


Boolean expressions have two primary purposes. They are used to compute
logical values, but more often they are used as conditional expressions in
statements that alter the flow of control, such as if-then-else, or while-do
statements.

Boolean expressions are composed of the Boolean operators ( and, or, and not )
applied to elements that are Boolean variables or relational expressions. Relational
expressions are of the form E1 relop E2, where E1 and E2 are arithmetic
expressions.
Here we consider Boolean expressions generated by the following grammar:
E -> E or E | E and E | not E | ( E ) | id relop id | true | false
4.9.1 Methods of Translating Boolean Expressions


There are two principal methods of representing the value of a Boolean
expression. They are:
 To encode true and false numerically and to evaluate a Boolean
expression analogously to an arithmetic expression. Often, 1 is used to
denote true and 0 to denote false.
 To implement Boolean expressions by flow of control, that is,
representing the value of a Boolean expression by a position reached in a
program. This method is particularly convenient in implementing the
Boolean expressions in flow-of-control statements, such as the if-then
and while-do statements.
4.9.2 Numerical Representation
Here, 1 denotes true and 0 denotes false. Expressions will be evaluated
completely from left to right, in a manner similar to arithmetic expressions.
For example:
The translation for
a or b and not c
is the three-address sequence
t1 : = not c
t2 : = b and t1
t3 : = a or t2

A relational expression such as a < b is equivalent to the conditional statement
if a < b then 1 else 0
which can be translated into the three-address code sequence (again, we arbitrarily
start statement numbers at 100) :
100 : if a < b goto 103
101 : t : = 0
102 : goto 104
103 : t : = 1
104 :
4.9.3 Short-Circuit Code:
We can also translate a boolean expression into three-address code without
generating code for any of the boolean operators and without having the code
necessarily evaluate the entire expression. This style of evaluation is sometimes
called “short-circuit” or “jumping” code. It is possible to evaluate boolean
expressions without generating code for the boolean operators and,
or, and not if we represent the value of an expression by a position in the code
sequence.
Translation of a < b or c < d and e < f:
100 : if a < b goto 103
101 : t1 : = 0
102 : goto 104
103 : t1 : = 1
104 : if c < d goto 107
105 : t2 : = 0
106 : goto 108
107 : t2 : = 1
108 : if e < f goto 111
109 : t3 : = 0
110 : goto 112
111 : t3 : = 1
112 : t4 : = t2 and t3
113 : t5 : = t1 or t4

4.9.4 Flow-of-Control Statements
We now consider the translation of boolean expressions into three-address
code in the context of if-then, if-then-else, and while-do statements such as those
generated by the following grammar:
S -> if E then S1
| if E then S1 else S2
| while E do S1
In each of these productions, E is the Boolean expression to be translated. In the
translation, we assume that a three-address statement can be symbolically labeled,
and that the function newlabel returns a new symbolic label each time it is called.
 E.true is the label to which control flows if E is true, and E.false is the label
to which control flows if E is false.
 The semantic rules for translating a flow-of-control statement S allow
control to flow from the translation S.code to the three-address instruction
immediately following S.code.
 S.next is a label that is attached to the first three-address instruction to be
executed after the code for S.
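For instance, the usual textbook rule for the production S -> if E then S1, written with the attributes just described (this is the standard formulation, not part of the backpatching scheme given later), is:

E.true : = newlabel;
E.false : = S.next;
S1.next : = S.next;
S.code : = E.code || gen(E.true ':') || S1.code

Here the true exit of E falls through to the code for S1, while the false exit of E and the end of S1 both continue at the label S.next.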

4.9.5 CASE STATEMENTS
The “switch” or “case” statement is available in a variety of languages. The
switch-statement syntax is as shown below :
Switch-statement syntax
switch expression
begin
case value : statement
case value : statement
...
case value : statement
default : statement
end
There is a selector expression, which is to be evaluated, followed by n
constant values that the expression might take, including a default “value” which
always matches the expression if no other value does. The intended translation of a
switch is code to:
1. Evaluate the expression.
2. Find which value in the list of cases is the same as the value of the expression.
3. Execute the statement associated with the value found.
Step (2) can be implemented in one of several ways :
 By a sequence of conditional goto statements, if the number of cases is
small.
 By creating a table of pairs, with each pair consisting of a value and a label
for the code of the corresponding statement. Compiler generates a loop to
compare the value of the expression with each value in the table. If no match
is found, the default (last) entry is sure to match.
 If the number of cases is large, it is efficient to construct a hash table.
 There is a common special case in which an efficient implementation of the
n-way branch exists. If the values all lie in some small range, say imin to
imax, and the number of different values is a reasonable fraction of imax -
imin, then we can construct an array of labels, with the label of the statement
for value j in the entry of the table with offset j – imin, and the label for the
default in entries not filled otherwise. To perform the switch, evaluate the
expression to obtain the value j, check that the value is within range, and
transfer to the table entry at offset j – imin.
Syntax-Directed Translation of Case Statements:
Consider the following switch statement:
switch E
begin
case V1 : S1
case V2 : S2
...
case Vn-1 : Sn-1
default : Sn
end
This case statement is translated into intermediate code that has the following
form:
Translation of a case statement
code to evaluate E into t
goto test
L1 : code for S1
goto next
L2 : code for S2
goto next
...
Ln-1 : code for Sn-1
goto next
Ln : code for Sn
goto next
test : if t = V1 goto L1
if t = V2 goto L2
...
if t = Vn-1 goto Ln-1
goto Ln
next :
To translate into the above form:
 When the keyword switch is seen, two new labels test and next, and a new
temporary t are generated.
 As expression E is parsed, the code to evaluate E into t is generated. After
processing E, the jump goto test is generated.
 As each case keyword occurs, a new label Li is created and entered into the
symbol table. A pointer to this symbol-table entry and the value Vi of case
constant are placed on a stack (used only to store cases).
 Each statement case Vi : Si is processed by emitting the newly created label
Li, followed by the code for Si , followed by the jump goto next.
 Then when the keyword end terminating the body of the switch is found, the
code can be generated for the n-way branch. Reading the pointer-value pairs
on the case stack from the bottom to the top, we can generate a sequence of
three-address statements of the form

case V1 L1
case V2 L2
...
case Vn-1 Ln-1
case t Ln
label next
where t is the name holding the value of the selector expression E, and Ln is the
label for the default statement.
4.10 BACKPATCHING
 The easiest way to implement the syntax-directed definitions is to use
two passes.
 First, construct a syntax tree for the input.
 Then walk the tree in depth-first order, computing the translations
given in the definition.
 The main problem with generating code for Boolean expressions and
flow-of-control statements in a single pass is that, during that one
pass, we may not yet know the labels that control must go to when the
jump statements are generated.
 Each such statement is put on a list of goto statements whose labels will be
filled in when the proper label can be determined. We call this subsequent
filling in of labels backpatching.
 We use three functions (a C sketch follows this list):
 Makelist(i) creates a new list containing only i, an index into the array of
quadruples; makelist returns a pointer to the list it has made.
 Merge(p1, p2) concatenates the lists pointed to by p1 and p2, and returns
a pointer to the concatenated list.
 Backpatch(p, i) inserts i as the target label for each of the statements
on the list pointed to by p.
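A minimal C sketch of these three list-handling functions over a global quadruple array (the names and the representation of jump targets are illustrative assumptions):

#include <stdlib.h>

struct quadnode { int quad_index; struct quadnode *next; };  /* one list cell */

static int target[1000];   /* assumed: the goto target field of quadruple i */

/* makelist(i): a new list containing only quadruple index i. */
struct quadnode *makelist(int i)
{
    struct quadnode *p = malloc(sizeof *p);
    p->quad_index = i;
    p->next = NULL;
    return p;
}

/* merge(p1, p2): concatenate the two lists and return the result. */
struct quadnode *merge(struct quadnode *p1, struct quadnode *p2)
{
    if (p1 == NULL) return p2;
    struct quadnode *t = p1;
    while (t->next != NULL) t = t->next;
    t->next = p2;
    return p1;
}

/* backpatch(p, i): make quadruple i the target of every jump on list p. */
void backpatch(struct quadnode *p, int i)
{
    for (; p != NULL; p = p->next)
        target[p->quad_index] = i;
}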

4.10.1 Boolean Expressions:
We now construct a translation scheme suitable for producing quadruples for
Boolean expressions during bottom-up parsing. The grammar we use is the
following:
(1) E -> E1 or M E2
(2) | E1 and M E2
(3) | not E1
(4) | ( E1)
(5) | id1 relop id2
(6) | true
(7) | false
(8) M -> ɛ
Synthesized attributes truelist and falselist of nonterminal E are used to
generate jumping code for boolean expressions. Incomplete jumps with unfilled
labels are placed on lists pointed to by E.truelist and E.falselist.
Consider production E -> E1 and M E2. If E1 is false, then E is also false,
so the statements on E1.falselist become part of E.falselist. If E1 is true, then we
must next test E2, so the target for the statements E1.truelist must be the beginning
of the code generated for E2. This target is obtained using marker nonterminal M.
Attribute M.quad records the number of the first statement of E2.code. With
the production M -> ɛ we associate the semantic action.
{ M.quad : = nextquad }
The variable nextquad holds the index of the next quadruple to follow. This
value will be backpatched onto the E1.truelist when we have seen the remainder of
the production E -> E1 and M E2. The translation scheme is as follows:

(1). E -> E1 or M E2 { backpatch ( E1.falselist, M.quad);
E.truelist : = merge( E1.truelist, E2.truelist);
E.falselist : = E2.falselist }

(2). E -> E1 and M E2 { backpatch ( E1.truelist, M.quad);


E.truelist : = E2.truelist;
E.falselist : = merge(E1.falselist, E2.falselist) }

(3) .E -> not E1 { E.truelist : = E1.falselist;


E.falselist : = E1.truelist; }

(4) E -> ( E1 ) { E.truelist : = E1.truelist;


E.falselist : = E1.falselist; }

(5) E -> id1 relop id2 { E.truelist : = makelist (nextquad);


E.falselist : = makelist(nextquad + 1);
emit(‘if’ id1.place relop.op id2.place ‘goto_’)
emit(‘goto_’) }

(6) E -> true { E.truelist : = makelist(nextquad);


emit(‘goto_’) }

(7) E -> false { E.falselist : = makelist(nextquad);


emit(‘goto_’) }

(8) M -> ɛ { M.quad : = nextquad }


4.10.2 Flow-of-Control Statements:
A translation scheme is developed for statements generated by the following
grammar :
(1) S -> if E then S
(2) | if E then S else S
(3) | while E do S
(4) | begin L end
(5) |A
(6) L -> L ; S
(7) | S
Here S denotes a statement, L a statement list, A an assignment statement,
and E a Boolean expression. We make the tacit assumption that the code that

follows a given statement in execution also follows it physically in the quadruple
array. Else, an explicit jump must be provided.
4.10.3 Scheme to Implement the Translation:
The nonterminal E has two attributes E.truelist and E.falselist. L and S also
need a list of unfilled quadruples that must eventually be completed by
backpatching. These lists are pointed to by the attributes L.nextlist and S.nextlist.
S.nextlist is a pointer to a list of all conditional and unconditional jumps to the
quadruple following the statement S in execution order, and L.nextlist is defined
similarly.
The semantic rules for the revised grammar are as follows:
(1). S -> if E then M1 S1N else M2 S2
{ backpatch (E.truelist, M1.quad);
backpatch (E.falselist, M2.quad);
S.nextlist : = merge (S1.nextlist, merge (N.nextlist, S2.nextlist)) }
We backpatch the jumps when E is true to the quadruple M1.quad, which is
the beginning of the code for S1. Similarly, we backpatch jumps when E is false to
go to the beginning of the code for S2. The list S.nextlist includes all jumps out of
S1 and S2, as well as the jump generated by N.
(2). N -> ɛ { N.nextlist : = makelist( nextquad );
emit(‘goto _’) }
(3).M -> ɛ { M.quad : = nextquad }
(4). S -> if E then M S1 { backpatch( E.truelist, M.quad);
S.nextlist : = merge( E.falselist, S1.nextlist) }

(5). S -> while M1 E do M2 S1 { backpatch( S1.nextlist, M1.quad);


backpatch( E.truelist, M2.quad);
S.nextlist : = E.falselist;
emit( ‘goto’ M1.quad ) }
(6) .S -> begin L end { S.nextlist : = L.nextlist }
(7). S ->A { S.nextlist : = nil }
The assignment S.nextlist : = nil initializes S.nextlist to an empty list.
(8). L -> L1 ; M S { backpatch( L1.nextlist, M.quad);
L.nextlist : = S.nextlist }
The statement following L1 in order of execution is the beginning of S. Thus
the L1.nextlist list is backpatched to the beginning of the code for S, which is given
by M.quad.
(9). L -> S { L.nextlist : = S.nextlist }
4.11 PROCEDURE CALLS
The procedure is such an important and frequently used programming
construct that it is imperative for a compiler to generate good code for procedure
calls and returns. The run-time routines that handle procedure argument passing,
calls and returns are part of the run-time support package.
Let us consider a grammar for a simple procedure call statement
(1). S -> call id ( Elist )
(2). Elist -> Elist , E
(3). Elist -> E
Calling Sequences:
The translation for a call includes a calling sequence, a sequence of actions
taken on entry to and exit from each procedure. The following are the actions
that take place in a calling sequence:
 When a procedure call occurs, space must be allocated for the activation
record of the called procedure.
 The arguments of the called procedure must be evaluated and made available
to the called procedure in a known place.

 Environment pointers must be established to enable the called procedure to
access data in enclosing blocks.
 The state of the calling procedure must be saved so it can resume execution
after the call.
 Also saved in a known place is the return address, the location to which the
called routine must transfer after it is finished.
 Finally a jump to the beginning of the code for the called procedure must be
generated.
For example, consider the following syntax-directed translation
(1). S -> call id ( Elist )
{ for each item p on queue do
emit (‘ param’ p );
emit (‘call’ id.place) }
(2). Elist -> Elist , E
{ append E.place to the end of queue }
(3). Elist -> E
{ initialize queue to contain only E.place }
 Here, the code for S is the code for Elist, which evaluates the arguments,
followed by a param p statement for each argument, followed by a call
statement.
 queue is emptied and then gets a single pointer to the symbol table location
for the name that denotes the value of E.
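For example (using the same three-address notation as elsewhere in this unit), this scheme would translate the call p(x, y + z) into code of the form:

t1 := y + z
param x
param t1
call p

Many three-address representations also record the number of arguments, e.g. call p, 2, but the scheme above emits only the procedure name.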

