The executing target program runs in its own logical address space in which
each program value has a location.
The management and organization of this logical address space is shared
between the compiler, operating system and target machine. The operating
system maps logical addresses into physical addresses, which are usually
spread throughout memory.
Although a character array of length 10 needs only enough bytes to hold 10
characters, a compiler may allocate 12 bytes to satisfy alignment, leaving 2
bytes unused.
The unused space due to alignment considerations is referred to as padding.
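The 10-to-12-byte example can be checked with a small calculation. The following is a minimal sketch, assuming a 4-byte alignment boundary; the helper names align_up and padding_for are hypothetical and do not come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of alignment.
   Hypothetical helper; alignment is assumed to be a power of two. */
size_t align_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Number of padding bytes added to reach the aligned size. */
size_t padding_for(size_t size, size_t alignment)
{
    return align_up(size, alignment) - size;
}
```

Calling align_up(10, 4) yields the 12 bytes mentioned above, and padding_for(10, 4) yields the 2 unused bytes.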
The size of some program objects may be known at compile time, and these
objects may be placed in an area called static.
The dynamic areas used to maximize the utilization of space at run time are
stack and heap.
4.1.2 Activation Records:
Procedure Calls and returns are usually managed by a run time stack
called the control stack.
Each live activation has an activation record on the control stack, with
the root of the activation tree at the bottom; the activation in which
control currently resides has its record at the top of the stack.
The contents of the activation record vary with the language being
implemented. The diagram below shows the contents of activation
record.
Temporary values such as those arising from the evaluation of expressions.
Local data belonging to the procedure whose activation record this is.
A saved machine status, with information about the state of the machine just
before the call to procedures.
An access link may be needed to locate data needed by the called procedure
but found elsewhere.
A control link pointing to the activation record of the caller.
Space for the return value of the called function, if any. Not all called
procedures return a value, and if one does, we may prefer to place that value
in a register for efficiency.
The actual parameters used by the calling procedure. These are often placed
in registers rather than in the activation record, when possible, for greater
efficiency.
We assume that the program control flows in a sequential manner and that,
when a procedure is called, control is transferred to the called procedure.
When the called procedure finishes executing, it returns control to the caller.
This type of control flow makes it easier to represent a series of activations
in the form of a tree, known as the activation tree.
To understand this concept, we take a piece of code as an example:
printf("Enter Your Name: ");
scanf("%s", username);
show_data(username);
printf("Press any key to continue…");
...
int show_data(char *user)
{
    printf("Your name is %s", user);
    return 0;
}
Below is the activation tree of the code given:
where,
callee.static_area – Address of the activation record
callee.code_area – Address of the first instruction for called procedure
#here + 20 – Literal return address which is the address of the instruction
following GOTO.
4.2.1.2 Implementation of return statement:
A return from procedure callee is implemented by :
GOTO *callee.static_area
This transfers control to the address saved at the beginning of the activation record.
4.2.1.3 Implementation of action statement:
The instruction ACTION is used to implement action statement.
4.2.1.4 Implementation of halt statement:
The statement HALT is the final instruction that returns control to the operating
system.
4.2.2. Stack Allocation
Static allocation can become stack allocation by using relative addresses for
storage in activation records. In stack allocation, the position of the
activation record is stored in a register, so words in the activation record
can be accessed as offsets from the value in this register.
The codes needed to implement stack allocation are as follows:
Initialization of stack:
MOV #stackstart , SP /* initializes stack */
Code for the first procedure
HALT /* terminate execution */
GOTO callee.code_area
where,
caller.recordsize – size of the activation record
#here + 16 – address of the instruction following the GOTO
Implementation of Return statement:
GOTO *0 ( SP )
SUB #caller.recordsize, SP
4.2.2.1 Stack Allocation Space
Calling Sequence
Procedure calls are implemented by what are known as calling sequences,
which consist of code that allocates an activation record on the stack
and enters information into its fields.
A return sequence is similar code that restores the state of the machine so
the calling procedure can continue its execution after the call.
The code in calling sequence is often divided between the calling
procedure (caller) and the procedure it calls (callee).
When designing calling sequences and the layout of activation records,
the following principles are helpful:
The calling sequence and its division between caller and callee are as
follows:
The caller evaluates the actual parameters.
The caller stores a return address and the old value of top_sp into the
callee’s activation record. The caller then increments top_sp to its
new position.
The callee saves the register values and other status information.
The callee initializes its local data and begins execution.
4.2.2.2 Variable Length data on Stack
The run-time memory management system must frequently deal with the
allocation of space for objects whose sizes are not known at compile time,
but which are local to a procedure and thus may be allocated on the stack.
The same scheme works for objects of any type if they are local to the
procedure called and have a size that depends on the parameters of the call.
4.2.3. HEAP ALLOCATION
Stack allocation strategy cannot be used if either of the following is possible:
Variables local to a procedure are allocated and de-allocated only at
runtime. Heap allocation is used to dynamically allocate memory to variables
and reclaim it when the variables are no longer required.
Except for the statically allocated memory area, both stack and heap memory
can grow and shrink dynamically and unpredictably. Therefore, they cannot be
given a fixed amount of memory in the system.
As shown in the image above, the text part of the code is allocated a fixed
amount of memory. Stack and heap memory are arranged at the extremes of the
total memory allocated to the program. They grow toward each other and shrink
away from each other.
4.3 PARAMETER PASSING
The communication medium among procedures is known as parameter
passing. The values of the variables from a calling procedure are transferred to the
called procedure by some mechanism. Before moving ahead, first go through
some basic terminologies pertaining to the values in a program.
4.3.1 r-value
The value of an expression is called its r-value. The value contained in a
single variable also becomes an r-value if it appears on the right-hand side of the
assignment operator. r-values can always be assigned to some other variable.
4.3.2 l-value
The location of memory (address) where an expression is stored is known as
the l-value of that expression. It always appears at the left hand side of an
assignment operator.
For example:
day = 1;
week = day * 7;
month = 1;
year = month * 12;
From this example, we understand that constant values like 1, 7, 12, and
variables like day, week, month and year, all have r-values. Only variables have l-
values as they also represent the memory location assigned to them.
For example:
7 = x + y;
is an l-value error, as the constant 7 does not represent any memory location.
4.3.3 Formal Parameters
Variables that take the information passed by the caller procedure are called
formal parameters. These variables are declared in the definition of the called
function.
4.3.4 Actual Parameters
Variables whose values or addresses are being passed to the called
procedure are called actual parameters. These variables are specified in the
function call as arguments.
Example:
fun_one()
{
int actual_parameter = 10;
call fun_two(actual_parameter);
}
fun_two(int formal_parameter)
{
print formal_parameter;
}
Formal parameters hold the information of the actual parameter, depending
upon the parameter passing technique used. It may be a value or an address.
4.3.5 Pass by Value
In pass by value mechanism, the calling procedure passes the r-value of
actual parameters and the compiler puts that into the called procedure’s activation
record. Formal parameters then hold the values passed by the calling procedure. If
the values held by the formal parameters are changed, it should have no impact on
the actual parameters.
4.3.6 Pass by Reference
In pass by reference mechanism, the l-value of the actual parameter is
copied to the activation record of the called procedure. This way, the called
procedure now has the address (memory location) of the actual parameter and the
formal parameter refers to the same memory location. Therefore, if the value
pointed by the formal parameter is changed, the impact should be seen on the
actual parameter as they should also point to the same value.
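The difference between the two mechanisms can be seen in a short C sketch. C itself passes everything by value, so pass by reference is emulated here by passing the address (l-value) of the actual parameter; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pass by value: the callee receives a copy of the r-value. */
void set_by_value(int formal)
{
    formal = 99;   /* changes only the callee's own copy */
    (void)formal;  /* silence "set but not used" warnings */
}

/* Pass by reference, emulated by passing the l-value (address). */
void set_by_reference(int *formal)
{
    assert(formal != NULL);
    *formal = 99;  /* changes the caller's variable */
}
```

After set_by_value(a) the caller's a is unchanged, whereas after set_by_reference(&a) it holds 99.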
4.3.7 Pass by Copy-restore
This parameter passing mechanism works similarly to ‘pass-by-reference’
except that the changes to actual parameters are made only when the called
procedure ends. Upon function call, the values of the actual parameters are
copied into the activation record of the called procedure. Manipulating the
formal parameters has no immediate effect on the actual parameters (their
l-values are also passed so the results can be copied back), but when the
called procedure ends, the final values of the formal parameters are copied
to the l-values of the actual parameters.
Example:
int y;
calling_procedure()
{
    y = 10;
    copy_restore(y); // l-value of y is passed
    printf y; // prints 99
}
copy_restore(int x)
{
    x = 99; // y still has the value 10
    y = 0; // y is now 0
}
When this function ends, the value of the formal parameter x is copied back
to the actual parameter y. Even though the value of y is changed before the
procedure ends, the value of x overwrites it, making the mechanism behave
like call by reference.
4.3.8 Pass by Name
Languages like Algol provide a different kind of parameter passing mechanism
that works like the preprocessor in the C language. In the pass by name
mechanism, the name of the procedure being called is replaced by its actual
body. Pass-by-name textually substitutes the argument expressions in a
procedure call for the corresponding parameters in the body of the procedure
so that it can work on the actual parameters, much like pass-by-reference.
For example, if a symbol table has to store information about the following
variable declaration:
static int interest;
int a;
insert(a, int);
4.4.3 lookup()
lookup() operation is used to search a name in the symbol table to determine:
if the symbol exists in the table.
if it is declared before it is being used.
if the name is used in the scope.
if the symbol is initialized.
if the symbol is declared multiple times.
The format of lookup() function varies according to the programming language.
The basic format should match the following:
lookup(symbol)
This method returns 0 (zero) if the symbol does not exist in the symbol table. If
the symbol exists in the symbol table, it returns its attributes stored in the table.
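As a minimal sketch of these operations, the following C fragment keeps the symbol table in a fixed-size array and searches it linearly. The struct layout, the capacity MAX_SYMBOLS and the return conventions of insert are assumptions made for illustration; a real compiler would more likely use a hash table and handle scopes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 64   /* assumed fixed capacity for the sketch */

struct symbol {
    const char *name;    /* identifier */
    const char *type;    /* attribute stored for the symbol, e.g. "int" */
};

struct symbol table[MAX_SYMBOLS];
size_t symbol_count = 0;

/* Return the entry for name, or NULL (zero) if it is not in the table. */
struct symbol *lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < symbol_count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL;
}

/* Add name with its type; returns 0 on success, -1 if the name is
   already declared or the table is full. */
int insert(const char *name, const char *type)
{
    if (lookup(name) != NULL || symbol_count == MAX_SYMBOLS)
        return -1;
    table[symbol_count].name = name;
    table[symbol_count].type = type;
    symbol_count++;
    return 0;
}
```

Here a second insert of the same name fails, which is one simple way to detect a symbol that is declared multiple times.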
{ \
int one_3; |_ inner scope 1
int one_4; |
} /
int one_5;
{ \
int one_6; |_ inner scope 2
int one_7; |
} /
}
void pro_two()
{
int two_1;
int two_2;
{ \
int two_3; |_ inner scope 3
int two_4; |
} /
int two_5;
}
I.e., implicitly or explicitly.
Three types of allocation
o Explicit allocation of fixed size blocks
o Explicit allocation of variable size blocks
o Implicit deallocation
4.4.1. Explicit Allocation of Fixed Size Block
The simplest form of dynamic storage allocation.
The blocks are linked together in a list, and allocation and deallocation can
be done quickly with little or no storage overhead.
A pointer, available, points to the first block in the list of available blocks.
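The free list can be sketched in C as follows. The pool size, block size and function names are assumptions for illustration. Each free block stores the link to the next free block in its own storage, which is why the scheme needs no extra space.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_COUNT 4    /* assumed pool size for the sketch */
#define BLOCK_SIZE  32   /* assumed payload size in bytes */

/* A free block holds the link to the next free block inside itself. */
union block {
    union block *next;
    unsigned char payload[BLOCK_SIZE];
};

union block pool[BLOCK_COUNT];
union block *available = NULL;   /* first block in the available list */

/* Link every block of the pool into the available list. */
void pool_init(void)
{
    available = NULL;
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        pool[i].next = available;
        available = &pool[i];
    }
}

/* Allocation: unlink and return the first available block, or NULL. */
void *block_alloc(void)
{
    union block *b = available;
    if (b != NULL)
        available = b->next;
    return b;
}

/* Deallocation: push the block back on the front of the list. */
void block_free(void *p)
{
    assert(p != NULL);
    union block *b = p;
    b->next = available;
    available = b;
}
```

Both allocation and deallocation touch only the head of the list, so each takes constant time.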
The first fit method can be used to allocate variable sized blocks.
When a block of size s is requested, the allocator searches for the first free
block of size f >= s. This block is then subdivided into a used block of size s
and a free block of size f - s. The search is time consuming.
When a block is deallocated, the allocator checks whether it is next to a free
block. If possible, the deallocated block is combined with the free block next
to it to create a larger free block. This helps to avoid fragmentation.
4.4.3. Implicit Deallocation
Implicit deallocation requires cooperation between the user program and the
run-time package. This is implemented by fixing the format of storage
blocks.
In a variable size block, the size of the block is kept in inaccessible
storage attached to the block.
The second problem is recognizing whether a block is in use. A used block can
be referred to by the user program through pointers. The pointers are kept in
a fixed position in the block to make references easy to check.
Two approaches can be used for implicit deallocation:
o Reference counts
o Marking techniques
1. Reference Counts:
We keep track of the number of references to each block. If the count ever
drops to 0, the block is deallocated.
Maintaining reference counts can be costly in time (the pointer assignment
p := q leads to changes in the reference counts of the blocks pointed to by
both p and q).
Reference counts work best when no cyclic references occur.
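The cost of a pointer assignment can be made concrete with a small sketch. The block layout, the live_blocks counter and the function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* A heap block with a reference count (layout assumed for the sketch). */
struct block {
    int refcount;
    int value;
};

int live_blocks = 0;   /* bookkeeping so the effect is observable */

/* Create a block that starts with one reference. */
struct block *block_new(int value)
{
    struct block *b = malloc(sizeof *b);
    assert(b != NULL);
    b->refcount = 1;
    b->value = value;
    live_blocks++;
    return b;
}

/* Drop one reference; deallocate the block when the count reaches 0. */
void release(struct block *b)
{
    if (b != NULL && --b->refcount == 0) {
        free(b);
        live_blocks--;
    }
}

/* The assignment p := q changes both counts: the block q points to gains
   a reference and the block p pointed to loses one. */
void assign(struct block **p, struct block *q)
{
    assert(p != NULL);
    if (q != NULL)
        q->refcount++;   /* increment first so p := p stays safe */
    release(*p);
    *p = q;
}
```

Two blocks that point to each other would keep each other's count above zero forever, which is the cyclic-reference weakness of this scheme.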
2. Marking Techniques
Here the user program is suspended temporarily and the frozen pointers are
used to determine the used blocks. This approach requires all the pointers
into the heap to be known. (Conceptually, it is like pouring paint into the
heap through the pointers.)
First we go through the heap and mark all the blocks unused. Then we follow
the pointers and mark all the reachable blocks as used. A sequential scan
of the heap then collects all the blocks still marked unused.
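The three steps can be sketched in C over a small fixed heap. The heap size, the number of pointer slots per block and the function names are assumptions for illustration. The sweep here only counts the unused blocks, where a real collector would return them to the free list.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_BLOCKS 5    /* assumed heap size for the sketch */
#define MAX_PTRS    2    /* assumed number of pointer slots per block */

struct block {
    int used;                       /* mark bit */
    struct block *ptrs[MAX_PTRS];   /* pointers kept at a fixed position */
};

struct block heap[HEAP_BLOCKS];

/* Follow pointers from b and mark every reachable block as used. */
void mark(struct block *b)
{
    if (b == NULL || b->used)
        return;
    b->used = 1;
    for (size_t i = 0; i < MAX_PTRS; i++)
        mark(b->ptrs[i]);
}

/* Mark from the known root pointers, then scan the heap sequentially
   and count the blocks still marked unused. */
size_t collect(struct block **roots, size_t root_count)
{
    assert(roots != NULL || root_count == 0);
    size_t freed = 0;
    for (size_t i = 0; i < HEAP_BLOCKS; i++)
        heap[i].used = 0;               /* step 1: mark all unused */
    for (size_t i = 0; i < root_count; i++)
        mark(roots[i]);                 /* step 2: mark reachable */
    for (size_t i = 0; i < HEAP_BLOCKS; i++)
        if (!heap[i].used)
            freed++;                    /* step 3: sequential scan */
    return freed;
}
```

Unlike reference counting, this approach also reclaims unreachable blocks that point to each other in a cycle.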
then translated to its target code? Let us see the reasons why we need an
intermediate code.
Intermediate code can be either language specific (e.g., Byte Code for Java) or
language independent (three-address code).
4.6.1 Declarations
A variable or procedure has to be declared before it can be used. Declaration
involves allocation of space in memory and entry of type and name in the symbol
table. A program may be coded and designed keeping the target machine structure
in mind, but it may not always be possible to accurately convert a source code to
its target language.
Taking the whole program as a collection of procedures and sub-procedures,
it becomes possible to declare all the names local to the procedure. Memory
allocation is done in a consecutive manner and names are allocated to memory in
the sequence they are declared in the program. We use an offset variable,
initially set to zero {offset = 0}, to denote the address relative to the base.
The source programming language and the target machine architecture may
vary in the way names are stored, so relative addressing is used. While the first
name is allocated memory starting from the memory location 0 {offset=0}, the
next name declared later, should be allocated memory next to the first one.
Example:
We take the example of the C programming language, assuming a target where an
integer variable is assigned 2 bytes of memory and a float variable is
assigned 4 bytes of memory.
int a;
float b;
Allocation process:
{offset = 0}
int a;
id.type = int
id.width = 2
{offset = 2}
float b;
id.type = float
id.width = 4
To enter this detail in a symbol table, a procedure enter can be used. This
method may have the following structure:
enter(name, type, offset)
This procedure should create an entry in the symbol table, for variable name,
having its type set to type and relative address offset in its data area.
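The allocation process can be sketched in C as follows. The enter procedure follows the signature given above, while the array-based table, its capacity and the declare helper are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NAMES 16   /* assumed capacity for the sketch */

struct entry {
    const char *name;
    const char *type;
    int offset;        /* relative address in the data area */
};

struct entry symtab[MAX_NAMES];
size_t entry_count = 0;
int offset = 0;        /* next free relative address */

/* Record name with its type and relative address in the symbol table. */
void enter(const char *name, const char *type, int at_offset)
{
    assert(entry_count < MAX_NAMES);
    symtab[entry_count].name = name;
    symtab[entry_count].type = type;
    symtab[entry_count].offset = at_offset;
    entry_count++;
}

/* Process one declaration: enter it at the current offset, then advance
   the offset by the width of the type (hypothetical helper). */
void declare(const char *name, const char *type, int width)
{
    enter(name, type, offset);
    offset += width;
}
```

With the widths used in the example, declare("a", "int", 2) places a at offset 0 and declare("b", "float", 4) places b at offset 2, leaving offset at 6.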
4.7 ASSIGNMENT STATEMENT
Suppose that the context in which an assignment appears is given by the
following grammar.
Example
P->M D
M ->ɛ
D-> D ; D | id : T | proc id ; N D ; S;D
N -> ɛ
Non terminal P becomes the new start symbol when these productions are added to
those in the translation scheme shown below.
r1 = c * d;
r2 = b + r1;
a = r2
r being used as registers in the target program.
A three-address code has at most three address locations to calculate the
expression. A three-address code can be represented in two forms : quadruples
and triples.
(i).Quadruples
Each instruction in the quadruples representation is divided into four fields:
operator, arg1, arg2, and result. The above example is represented below in
quadruples format:
Op    arg1    arg2    result
*     c       d       r1
+     b       r1      r2
=     r2              a
(ii).Triples
Each instruction in the triples representation has three fields: op, arg1, and
arg2. The results of the respective sub-expressions are denoted by the position
of the expression. Triples resemble DAGs and syntax trees. They are
equivalent to a DAG when representing expressions.
      Op    arg1    arg2
(0)   *     c       d
(1)   +     b       (0)
(2)   =     a       (1)
Triples face the problem of code immovability during optimization, as the
results are positional and changing the order or position of an expression may
cause problems.
(iii).Indirect Triples
This representation is an enhancement over triples representation. It uses
pointers instead of position to store results. This enables the optimizers to freely
re-position the sub-expression to produce an optimized code.
4.7.3.Addressing Array Elements
Elements of an array can be accessed quickly if the elements are stored in a
block of consecutive locations.
Address calculation of multi-dimensional arrays:
A two-dimensional array is stored in one of two forms:
(i). Row-major (row-by-row)
(ii). Column-major (column-by-column)
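The two layouts lead to different address calculations for the same element. The following is a minimal sketch, assuming an n1 x n2 array whose indices start at 0, whose elements are w bytes wide and whose first element is at address base; the function names are hypothetical.

```c
#include <assert.h>

/* Address of A[i][j] when the array is stored row by row:
   i full rows of n2 elements come first, then j elements. */
long row_major_address(long base, long i, long j, long n1, long n2, long w)
{
    assert(i >= 0 && i < n1 && j >= 0 && j < n2);
    return base + (i * n2 + j) * w;
}

/* Address of A[i][j] when the array is stored column by column:
   j full columns of n1 elements come first, then i elements. */
long column_major_address(long base, long i, long j, long n1, long n2, long w)
{
    assert(i >= 0 && i < n1 && j >= 0 && j < n2);
    return base + (j * n1 + i) * w;
}
```

For a 2 x 3 array of 4-byte elements at base 1000, A[1][0] lies at 1012 in row-major order and at 1004 in column-major order.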
Boolean expressions are composed of the Boolean operators ( and, or, and not )
applied to elements that are Boolean variables or relational expressions. Relational
expressions are of the form E1 relop E2, where E1 and E2 are arithmetic
expressions.
Here we consider Boolean expressions generated by the following grammar :
A relational expression such as a < b is equivalent to the conditional statement
if a < b then 1 else 0
which can be translated into the three-address code sequence (again, we arbitrarily
start statement numbers at 100) :
100 : if a < b goto 103
101 : t := 0
102 : goto 104
103 : t := 1
104 :
4.8.3 Short-Circuit Code:
We can also translate a boolean expression into three-address code without
generating code for any of the boolean operators and without having the code
necessarily evaluate the entire expression. This style of evaluation is sometimes
called “short-circuit” or “jumping” code. It is possible to evaluate boolean
expressions without generating code for the boolean operators and,
or, and not if we represent the value of an expression by a position in the code
sequence.
Translation of a < b or c < d and e < f
100 : if a < b goto 103
101 : t1 := 0
102 : goto 104
103 : t1 := 1
104 : if c < d goto 107
105 : t2 := 0
106 : goto 108
107 : t2 := 1
108 : if e < f goto 111
109 : t3 := 0
110 : goto 112
111 : t3 := 1
112 : t4 := t2 and t3
113 : t5 := t1 or t4
4.8.4 Flow-of-Control Statements
We now consider the translation of boolean expressions into three-address
code in the context of if-then, if-then-else, and while-do statements such as those
generated by the following grammar:
S -> if E then S1
| if E then S1 else S2
| while E do S1
In each of these productions, E is the Boolean expression to be translated. In the
translation, we assume that a three-address statement can be symbolically labeled,
and that the function newlabel returns a new symbolic label each time it is called.
E.true is the label to which control flows if E is true, and E.false is the label
to which control flows if E is false.
The semantic rules for translating a flow-of-control statement S allow
control to flow from the translation S.code to the three-address instruction
immediately following S.code.
S.next is a label that is attached to the first three-address instruction to be
executed after the code for S.
4.8.5 CASE STATEMENTS
The “switch” or “case” statement is available in a variety of languages. The
switch-statement syntax is as shown below :
Switch-statement syntax
switch expression
begin
case value : statement
case value : statement
...
case value : statement
default : statement
end
There is a selector expression, which is to be evaluated, followed by n
constant values that the expression might take, including a default “value” which
always matches the expression if no other value does. The intended translation of a
switch is code to:
1. Evaluate the expression.
2. Find which value in the list of cases is the same as the value of the expression.
3. Execute the statement associated with the value found.
Step (2) can be implemented in one of several ways :
By a sequence of conditional goto statements, if the number of cases is
small.
By creating a table of pairs, with each pair consisting of a value and a label
for the code of the corresponding statement. Compiler generates a loop to
compare the value of the expression with each value in the table. If no match
is found, the default (last) entry is sure to match.
If the number of cases is large, it is more efficient to construct a hash table.
There is a common special case in which an efficient implementation of the
n-way branch exists. If the values all lie in some small range, say imin to
imax, and the number of different values is a reasonable fraction of imax -
imin, then we can construct an array of labels, with the label of the statement
for value j in the entry of the table with offset j – imin and the label for the
default in entries not filled otherwise. To perform switch, evaluate the
expression to obtain the value of j , check the value is within range and
transfer to the table entry at offset j-imin .
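The array-of-labels scheme can be sketched in C. Standard C cannot store labels in an array, so the sketch uses small integers to stand for labels. The range 3 to 7, the cases 3, 5 and 7 and all names are assumptions for illustration.

```c
#include <assert.h>

#define IMIN 3            /* assumed smallest case value */
#define IMAX 7            /* assumed largest case value */
#define DEFAULT_LABEL 0   /* stands for the label of the default statement */

/* The entry at offset j - IMIN holds the label for case value j; entries
   for values without a case hold the default label. Cases 3, 5 and 7 are
   assumed, with labels 1, 2 and 3. */
const int label_table[IMAX - IMIN + 1] = {
    1,              /* case 3 */
    DEFAULT_LABEL,  /* 4 has no case */
    2,              /* case 5 */
    DEFAULT_LABEL,  /* 6 has no case */
    3               /* case 7 */
};

/* Range check, then a single indexed load instead of a chain of tests. */
int select_label(int j)
{
    assert(IMIN <= IMAX);
    if (j < IMIN || j > IMAX)
        return DEFAULT_LABEL;
    return label_table[j - IMIN];
}
```

The lookup cost does not depend on the number of cases, which is why this form is preferred when the case values are dense in a small range.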
Syntax-Directed Translation of Case Statements:
Consider the following switch statement:
switch E
begin
case V1 : S1
case V2 : S2
...
case Vn-1 : Sn-1
default : Sn
end
This case statement is translated into intermediate code that has the following
form:
Translation of a case statement
code to evaluate E into t
goto test
L1 : code for S1
goto next
L2 : code for S2
goto next
...
Ln-1 : code for Sn-1
goto next
Ln : code for Sn
goto next
test : if t = V1 goto L1
if t = V2 goto L2
...
if t = Vn-1 goto Ln-1
goto Ln
next :
To translate into above form :
When keyword switch is seen, two new labels test and next, and a new
temporary t are generated.
As expression E is parsed, the code to evaluate E into t is generated. After
processing E, the jump goto test is generated.
As each case keyword occurs, a new label Li is created and entered into the
symbol table. A pointer to this symbol-table entry and the value Vi of case
constant are placed on a stack (used only to store cases).
Each statement case Vi : Si is processed by emitting the newly created label
Li, followed by the code for Si , followed by the jump goto next.
Then when the keyword end terminating the body of the switch is found, the
code can be generated for the n-way branch. Reading the pointer-value pairs
on the case stack from the bottom to the top, we can generate a sequence of
three-address statements of the form
case V1 L1
case V2 L2
...
case Vn-1 Ln-1
case t Ln
label next
where t is the name holding the value of the selector expression E, and Ln is the
label for the default statement.
4.9 BACKPATCHING
The easiest way to implement the syntax-directed definitions is to use two
passes.
First, construct a syntax tree for the input.
Then walk the tree in depth-first order, computing the translations
given in the definition.
The main problem with generating code for Boolean expressions and
flow-of-control statements in a single pass is that during one single
pass we may not know the labels that control must go to at the time the
jump statements are generated.
Each such statement is put on a list of goto statements whose labels will be
filled in when the proper label can be determined. We call this subsequent
filling in of labels backpatching.
Use three functions:
makelist(i) creates a new list containing only i, an index into the array of
quadruples; makelist returns a pointer to the list it has made.
merge(p1, p2) concatenates the lists pointed to by p1 and p2, and returns
a pointer to the concatenated list.
backpatch(p, i) inserts i as the target label for each of the statements
on the list pointed to by p.
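The three list functions can be sketched in C with a singly linked list of quadruple indices. The target array stands in for the jump-target field of the quadruple array; its size and the node layout are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_QUADS 100   /* assumed size of the quadruple array */

int target[MAX_QUADS];  /* jump target of each quadruple (other fields omitted) */

/* A list of quadruple indices whose jump targets are still unfilled. */
struct node {
    int quad;
    struct node *next;
};

/* Create a new list containing only the quadruple index i. */
struct node *makelist(int i)
{
    assert(i >= 0 && i < MAX_QUADS);
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->quad = i;
    n->next = NULL;
    return n;
}

/* Concatenate the lists p1 and p2 and return the combined list. */
struct node *merge(struct node *p1, struct node *p2)
{
    if (p1 == NULL)
        return p2;
    struct node *tail = p1;
    while (tail->next != NULL)
        tail = tail->next;
    tail->next = p2;
    return p1;
}

/* Insert i as the target of every jump on the list p. */
void backpatch(struct node *p, int i)
{
    for (; p != NULL; p = p->next)
        target[p->quad] = i;
}
```

For instance, backpatch(merge(makelist(3), makelist(7)), 42) fills 42 into the targets of quadruples 3 and 7 in one step. The sketch never frees the list nodes, which a real implementation would do after backpatching.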
4.8.1 Boolean Expressions:
We now construct a translation scheme suitable for producing quadruples for
Boolean expressions during bottom-up parsing. The grammar we use is the
following:
(1) E -> E1 or M E2
(2) | E1 and M E2
(3) | not E1
(4) | ( E1)
(5) | id1 relop id2
(6) | true
(7) | false
(8) M -> ɛ
Synthesized attributes truelist and falselist of nonterminal E are used to
generate jumping code for boolean expressions. Incomplete jumps with unfilled
labels are placed on lists pointed to by E.truelist and E.falselist.
Consider production E -> E1 and M E2. If E1 is false, then E is also false,
so the statements on E1.falselist become part of E.falselist. If E1 is true, then we
must next test E2, so the target for the statements E1.truelist must be the beginning
of the code generated for E2. This target is obtained using marker nonterminal M.
Attribute M.quad records the number of the first statement of E2.code. With
the production M -> ɛ we associate the semantic action.
{ M.quad := nextquad }
The variable nextquad holds the index of the next quadruple to follow. This
value will be backpatched onto the E1.truelist when we have seen the remainder of
the production E -> E1 and M E2. The translation scheme is as follows:
E.truelist := merge( E1.truelist, E2.truelist);
E.falselist := E2.falselist }
follows a given statement in execution also follows it physically in the quadruple
array. Else, an explicit jump must be provided.
4.8.3 Scheme to implement the Translation:
The nonterminal E has two attributes E.truelist and E.falselist. L and S also
need a list of unfilled quadruples that must eventually be completed by
backpatching. These lists are pointed to by the attributes L.nextlist and S.nextlist.
S.nextlist is a pointer to a list of all conditional and unconditional jumps to the
quadruple following the statement S in execution order, and L.nextlist is defined
similarly.
The semantic rules for the revised grammar are as follows:
(1). S -> if E then M1 S1 N else M2 S2
{ backpatch (E.truelist, M1.quad);
backpatch (E.falselist, M2.quad);
S.nextlist := merge (S1.nextlist, merge (N.nextlist, S2.nextlist)) }
We backpatch the jumps when E is true to the quadruple M1.quad, which is
the beginning of the code for S1. Similarly, we backpatch jumps when E is false to
go to the beginning of the code for S2. The list S.nextlist includes all jumps out of
S1 and S2, as well as the jump generated by N.
(2). N -> ɛ { N.nextlist := makelist( nextquad );
emit(‘goto _’) }
(3). M -> ɛ { M.quad := nextquad }
(4). S -> if E then M S1 { backpatch( E.truelist, M.quad);
S.nextlist := merge( E.falselist, S1.nextlist) }
Environment pointers must be established to enable the called procedure to
access data in enclosing blocks.
The state of the calling procedure must be saved so it can resume execution
after the call.
Also saved in a known place is the return address, the location to which the
called routine must transfer after it is finished.
Finally a jump to the beginning of the code for the called procedure must be
generated.
For example, consider the following syntax-directed translation
(1). S -> call id ( Elist )
{ for each item p on queue do
emit (‘ param’ p );
emit (‘call’ id.place) }
(2). Elist -> Elist , E
{ append E.place to the end of queue }
(3). Elist -> E
{ initialize queue to contain only E.place }
Here, the code for S is the code for Elist, which evaluates the arguments,
followed by a param p statement for each argument, followed by a call
statement.
queue is emptied and then gets a single pointer to the symbol table location
for the name that denotes the value of E.