
Compiler Design

Lecture-1

Arup Kr. Chattopadhyay, Department of IT, IEM, Kolkata 1



Run-Time Environments
• The abstractions embodied in the source language definition include
names, scopes, bindings, data types, operators, procedures,
parameters, and flow-of-control constructs.

• A compiler must accurately implement these abstractions and must also
cooperate with the operating system and other systems software
to support these abstractions on the target machine.

• To do so, the compiler creates and manages a run-time environment
in which it assumes its target programs are being executed.

• This environment deals with a variety of issues such as the layout
and allocation of storage locations for the objects named in the source
program, the mechanisms used by the target program to access
variables, the linkages between procedures, the mechanisms for
passing parameters, and the interfaces to the operating system,
input/output devices, and other programs.


Source Language Issues

Language features that affect the organization of memory:

• Does the source language allow recursion?
When recursive calls are handled, several instances of a recursive
procedure may be active simultaneously.

• How are parameters passed to a procedure?
Call by value
Call by address
Call by reference

• Does a procedure refer to nonlocal names? If so, how?

• Does the language support dynamic allocation and deallocation of
memory?


Storage Organization

• From the perspective of the compiler writer, the executing target
program runs in its own logical address space in which each program
value has a location.

• The management and organization of this logical address space is
shared between the compiler, operating system, and target machine.

• The operating system maps the logical addresses into physical
addresses.

We assume-
• The run-time storage comes in blocks of contiguous bytes, where a
byte is the smallest unit of addressable memory.
• An elementary data type, such as a character, integer, or float, can
be stored in an integral number of bytes.
• Storage for an aggregate type, such as an array or structure, must
be large enough to hold all its components.


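The storage layout for data objects is strongly influenced by the
addressing constraints of the target machine.
• Aligning: placing a data object at an address the target machine
can access efficiently, for example a multiple of 4 for a 4-byte integer.
• Padding: the space left unused because of alignment.
• Packing: storing data with no padding to save space, at the cost of
extra instructions at run time to position the packed data.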
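
The effect of alignment and padding can be observed directly in C. The following is a minimal sketch, not taken from the lecture: the struct names Padded and Reordered and both helper functions are hypothetical, and the exact sizes depend on the target machine and compiler.

```c
#include <stddef.h>   /* offsetof, size_t */

/* A char followed by an int: the compiler usually inserts padding after
   'c' so that 'i' starts at an address suitable for an int. */
struct Padded {
    char c;
    int  i;
    char d;
};

/* The same members with the small ones grouped together; on most
   targets this needs less padding (this is not guaranteed). */
struct Reordered {
    int  i;
    char c;
    char d;
};

/* Number of unused bytes the compiler added to struct Padded. */
size_t padding_in_padded(void) {
    return sizeof(struct Padded) - (sizeof(char) + sizeof(int) + sizeof(char));
}

/* Offset of the int member inside struct Padded. */
size_t int_offset_in_padded(void) {
    return offsetof(struct Padded, i);
}
```

Packing is not shown because standard C has no portable way to request it; compilers offer it only through their own extensions.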


Code area
- The size of the generated target code is fixed at compile time.
- The compiler can place the executable target code in a statically
determined area, usually at the low end of memory.

Static area
- The size of some program data objects, such as global constants, and
data generated by the compiler, such as information to support garbage
collection, may be known at compile time.
- One reason for statically allocating as many data objects as possible is
that the addresses of these objects can be compiled into the target
code. In early versions of Fortran, all data objects could be allocated
statically.

To maximize the utilization of space at run time, the other two areas,
Stack and Heap, are at the opposite ends of the remainder of the
address space.

Stack
- The stack is used to store data structures called activation records,
which are generated during procedure calls.
- In practice, the stack grows towards lower addresses.

Heap
- In practice, the heap grows towards higher addresses.

(We shall assume that the stack grows towards higher addresses so that
we can use positive offsets for notational convenience in all our
examples.)

Many programming languages allow the programmer to allocate and
deallocate data under program control. For example, C has the functions
malloc and free that can be used to obtain and give back arbitrary
chunks of storage.
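
As a small illustration of malloc and free, the sketch below obtains a chunk whose size is known only at run time. The function name make_sequence is hypothetical and chosen only for this example.

```c
#include <stdlib.h>   /* malloc, free, size_t */

/* Allocates an array of n ints on the heap and fills it with 0..n-1.
   Returns NULL if n is 0 or if the allocation fails.
   The caller owns the storage and must give it back with free(). */
int *make_sequence(size_t n) {
    if (n == 0) return NULL;
    int *a = malloc(n * sizeof *a);
    if (a == NULL) return NULL;
    for (size_t i = 0; i < n; i++) {
        a[i] = (int)i;
    }
    return a;
}
```

A caller would write `int *s = make_sequence(5);`, use `s[0]` to `s[4]`, and then call `free(s);`. The array outlives the call that created it, which is what stack storage cannot provide.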


Storage Allocation Strategies

Run-time storage is divided into four areas:

• Code Area
• Static Data Area
• Stack Area
• Heap Area

Three different storage allocation strategies are based on this division
of run-time storage:

1. Static allocation: all data objects are allocated at compile time.

2. Stack allocation: a stack is used to manage the run-time storage.

3. Heap allocation: a heap is used to manage dynamic memory
allocation.


1. Static Allocation

• The size of each data object is known at compile time, and the names
of these objects are bound to storage at compile time.

• The binding of a name to its allocated storage does not change
at run time.

• The compiler can easily determine the amount of storage required by
the data objects.

• The compiler can fill in the addresses at which the target code can
find the data it operates on.

• FORTRAN uses the static allocation strategy.


Limitations of static allocation

• It can be used only if the size of the data object is known at
compile time.

• Data structures cannot be created dynamically, because memory
cannot be managed at run time.

• Recursive procedures are not supported.
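
The recursion limitation can be seen in a small C sketch. This is illustrative only: the names fact_static, fact_stack, and n_static are hypothetical, and the static variable stands in for a parameter that a purely static strategy would bind to one fixed location.

```c
/* One statically allocated cell shared by every activation,
   as it would be under a purely static allocation strategy. */
static int n_static;

int fact_static(int n) {
    n_static = n;
    if (n_static <= 1) return 1;
    int rest = fact_static(n_static - 1);  /* the inner call overwrites n_static */
    return n_static * rest;                /* n_static is now 1, not n */
}

/* Each activation gets its own copy of n on the stack. */
int fact_stack(int n) {
    if (n <= 1) return 1;
    int rest = fact_stack(n - 1);
    return n * rest;
}
```

Here fact_stack(3) returns 6, while fact_static(3) returns 1, because every activation shares the single cell n_static and the innermost call leaves it set to 1.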


2. Stack Allocation

The storage is organized as a stack, called the control stack.

When a procedure is activated, its activation record is pushed onto the
stack; when the activation completes, the record is popped.

The locals are stored in each activation record, so they are bound to
fresh storage in each activation.

Data structures can be created dynamically under stack allocation.


3. Heap Allocation

• If the values of local variables must be retained after the
activation ends, stack allocation cannot be used, because the
activation record is popped when the activation ends. This limitation
follows from the LIFO nature of the stack. The heap allocation strategy
is used to retain such values.

• Heap allocation allocates a contiguous block of memory when it is
required and deallocates it when it is no longer needed. The
deallocated memory can be reused by the heap manager.

• Efficient heap management can be done by:

  - Keeping a linked list of free blocks; when memory is deallocated,
    that block is appended to the linked list.

  - Allocating the most suitable block from the linked list, i.e.
    using the best-fit technique to choose a block.
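
The free list and best-fit selection described above can be sketched as follows. This is a simplified model under stated assumptions: blocks are never split or merged, and the names FreeBlock, release, and best_fit are hypothetical.

```c
#include <stddef.h>   /* size_t, NULL */

/* Header of a free block; the blocks are chained into a linked list. */
typedef struct FreeBlock {
    size_t size;               /* usable bytes in this block */
    struct FreeBlock *next;
} FreeBlock;

/* Deallocation: append the block to the end of the free list. */
void release(FreeBlock **head, FreeBlock *b) {
    FreeBlock **p = head;
    b->next = NULL;
    while (*p != NULL) {
        p = &(*p)->next;
    }
    *p = b;
}

/* Allocation: unlink and return the smallest free block whose size is
   at least 'request'; returns NULL if no block is large enough. */
FreeBlock *best_fit(FreeBlock **head, size_t request) {
    FreeBlock **best = NULL;
    for (FreeBlock **p = head; *p != NULL; p = &(*p)->next) {
        if ((*p)->size >= request &&
            (best == NULL || (*p)->size < (*best)->size)) {
            best = p;
        }
    }
    if (best == NULL) return NULL;
    FreeBlock *chosen = *best;
    *best = chosen->next;      /* remove the chosen block from the list */
    chosen->next = NULL;
    return chosen;
}
```

With free blocks of 64, 16, and 32 bytes, a request for 20 bytes selects the 32-byte block. The 12 bytes left over inside it are the kind of hole that best fit can introduce.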


Comparison between Static, Stack and Heap allocation


| Static allocation | Stack allocation | Heap allocation |
|---|---|---|
| Done for all data objects at compile time. | The stack is used to manage the run-time memory. | The heap is used to manage dynamic memory allocation. |
| Data structures cannot be created dynamically. | Data structures and data objects can be created dynamically. | Data structures and data objects can be created dynamically. |
| Memory allocation: the names of data objects are bound to storage at compile time. | Memory allocation: activation records and data objects are pushed onto the stack in LIFO order. Memory can be addressed using an index and registers. | Memory allocation: a contiguous block of memory is allocated from the heap. |
| Merits and limitations: simple to implement, but supports static allocation only. Recursive procedures are not supported. | Merits and limitations: supports dynamic memory allocation, but is slower than static allocation. Supports recursive procedures, but the values of local variables cannot be retained after the activation ends. | Merits and limitations: efficient memory management is done using a linked list. Deallocated space can be reused. But since memory blocks are allocated using best fit, holes may be introduced in memory. |

Static Versus Dynamic Storage Allocation

The layout and allocation of data to memory locations in the run-time
environment are key issues in storage management.

• We say that a storage-allocation decision is static if it can be made
by the compiler looking only at the text of the program, not at what the
program does when it executes.

• Conversely, a decision is dynamic if it can be decided only while the
program is running.


Many compilers use some combination of the following two strategies
for dynamic storage allocation:

1. Stack storage. Names local to a procedure are allocated space on a
stack. The stack supports the normal call/return policy for procedures.

2. Heap storage. Data that may outlive the call to the procedure that
created it is usually allocated on a "heap" of reusable storage. The heap
is an area of virtual memory that allows objects or other data elements
to obtain storage when they are created and to return that storage
when they are invalidated.

To support heap management, "garbage collection" enables the run-time
system to detect useless data elements and reuse their storage,
even if the programmer does not return their space explicitly.
Automatic garbage collection is an essential feature of many modern
languages, despite it being a difficult operation to do efficiently.


Stack Allocation of Space

Each time a procedure is called, space for its local variables is
pushed onto a stack, and when the procedure terminates, that space is
popped off the stack.

This arrangement not only allows space to be shared by procedure
calls whose durations do not overlap in time, but it also allows us to
compile code for a procedure in such a way that the relative addresses
of its nonlocal variables are always the same, regardless of the
sequence of procedure calls.


Activation Trees
Stack allocation would not be feasible if procedure calls, or activations
of procedures, did not nest in time.
If an activation of procedure p calls procedure q, then that activation of
q must end before the activation of p can end. There are three common
cases:

1. The activation of q terminates normally. Then in essentially any
language, control resumes just after the point of p at which the call to
q was made.

2. The activation of q, or some procedure q called, either directly or
indirectly, aborts; i.e., it becomes impossible for execution to continue.
In that case, p ends simultaneously with q.

3. The activation of q terminates because of an exception that q cannot
handle. Procedure p may handle the exception, in which case the
activation of q has terminated while the activation of p continues,
although not necessarily from the point at which the call to q was made.
If p cannot handle the exception, then this activation of p terminates at
the same time as the activation of q, and presumably the exception will
be handled by some other open activation of a procedure.

Example: Consider the program that reads nine integers into an array a
and sorts them using the recursive quicksort algorithm.

    int a[11];

    void readArray() { /* Reads 9 integers into a[1], ..., a[9]. */
        int i;
        /* body omitted in the source */
    }

    int partition(int m, int n) {
        /* Picks a separator value v, and partitions a[m .. n] so that
           a[m .. p-1] are less than v, a[p] = v, and a[p+1 .. n] are
           equal to or greater than v. Returns p. */
        /* body omitted in the source */
    }

    void quicksort(int m, int n) {
        int i;
        if (n > m) {
            i = partition(m, n);
            quicksort(m, i - 1);
            quicksort(i + 1, n);
        }
    }

    int main(void) {
        readArray();
        a[0] = -9999;    /* sentinel smaller than any input value */
        a[10] = 9999;    /* sentinel larger than any input value */
        quicksort(1, 9);
    }

Possible activations for the program

enter main()
enter readArray()
leave readArray()
enter quicksort( 1 , 9)
enter partition( 1 , 9)
leave partition( 1 , 9)
enter quicksort( 1 , 3)
leave quicksort( 1 , 3)
enter quicksort( 5 , 9)
leave quicksort( 5 , 9)
leave quicksort( 1 , 9)
leave main()


[Figure: Activation tree representing calls during an execution of
quicksort]


[Figure: Downward-growing stack of activation records]


Activation Records

- Procedure calls and returns are usually managed by a run-time stack
called the control stack.

- Each live activation has an activation record (sometimes called a
frame) on the control stack, with the root of the activation tree at the
bottom, and the entire sequence of activation records on the stack
corresponding to the path in the activation tree to the activation where
control currently resides.


An activation record may contain the following:

- Temporary values, such as those arising from the evaluation of
expressions, in cases where those temporaries cannot be held in
registers.

- Local data belonging to the procedure whose activation record this is.

- A saved machine status, with information about the state of the
machine just before the call to the procedure. This information typically
includes the return address and the contents of registers that were
used by the calling procedure and that must be restored when the
return occurs.

- An "access link" may be needed to locate data needed by the called
procedure but found elsewhere, e.g., in another activation record.

- A control link, pointing to the activation record of the caller.


- Space for the return value of the called function, if any. Not all
called procedures return a value, and if one does, we may prefer to
place that value in a register for efficiency.

- The actual parameters used by the calling procedure. Commonly, these
values are not placed in the activation record but rather in registers,
when possible, for greater efficiency. However, we show a space for
them to be completely general.
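
The fields listed above could be laid out as a C struct. This is only a sketch: the fixed array sizes, the field names, and the helper stack_depth are assumptions made for illustration, and a real compiler computes the layout of each procedure's record individually.

```c
#include <stddef.h>   /* size_t, NULL */

#define MAX_PARAMS 4   /* hypothetical fixed sizes, for illustration only */
#define MAX_LOCALS 4
#define MAX_TEMPS  4

struct ActivationRecord {
    int   actual_params[MAX_PARAMS];        /* actual parameters */
    int   return_value;                     /* space for the returned value */
    struct ActivationRecord *control_link;  /* activation record of the caller */
    struct ActivationRecord *access_link;   /* record holding nonlocal data */
    void *saved_return_address;             /* part of the saved machine status */
    int   locals[MAX_LOCALS];               /* local data */
    int   temporaries[MAX_TEMPS];           /* temporaries not kept in registers */
};

/* Follows the control links from the record on top of the stack back to
   the root and counts the live activations. */
size_t stack_depth(const struct ActivationRecord *top) {
    size_t depth = 0;
    for (; top != NULL; top = top->control_link) {
        depth++;
    }
    return depth;
}
```

Following the control links from the top record visits exactly the path in the activation tree from the current activation back to the root.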


Example: Using a factorial program as an example, explain what the
activation records look like for each recursive call in the case of
factorial(3).
Solution:

    int factorial(int n) {
        if (n == 1)
            return 1;
        else
            return (n * factorial(n - 1));
    }

    int main(void) {
        int f;
        f = factorial(3);
    }

Step 1:
Act. record for main()
Act. record for factorial(3)

Step 2:
Act. record for main()
Act. record for factorial(3)
Act. record for factorial(2)

Step 3:
Act. record for main()
Act. record for factorial(3)
Act. record for factorial(2)
Act. record for factorial(1)
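
The growth and shrinkage of the stack in these steps can be mimicked with an explicit stack of simplified records. The sketch below is illustrative only: struct Frame, factorial_explicit, and max_depth_seen are hypothetical names, each record keeps just the parameter and the return value, and the record for main() is not modelled.

```c
#define MAX_DEPTH 12   /* 12! still fits in an int of at least 32 bits */

struct Frame {
    int n;        /* actual parameter of this activation */
    int result;   /* space for the returned value */
};

static int max_depth_seen;   /* number of records live at the deepest point */

int factorial_explicit(int n) {
    struct Frame stack[MAX_DEPTH];
    int top = -1;

    max_depth_seen = 0;
    if (n < 1 || n > MAX_DEPTH) return -1;   /* outside the sketch's range */

    /* Call phase: factorial(n) calls factorial(n-1), ..., factorial(1);
       each call pushes one record. */
    for (int k = n; k >= 1; k--) {
        top++;
        stack[top].n = k;
        stack[top].result = 0;
    }
    max_depth_seen = top + 1;

    /* Return phase: pop each record and hand its result to the caller. */
    int returned = 1;
    while (top >= 0) {
        stack[top].result = stack[top].n * returned;
        returned = stack[top].result;
        top--;
    }
    return returned;
}
```

For factorial_explicit(3) the stack holds three records at its deepest point, matching Step 3 without the record for main(), and the value returned is 6.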


Parameter Passing
There are two types of parameters:
i) Formal parameters: the names used in the procedure definition.
ii) Actual parameters: the arguments supplied in the procedure call.

Based on these parameters there are various parameter-passing
methods. The most common methods are:


1. Call by value:

• The actual parameters are evaluated and their r-values are passed to
the called procedure.

• Operations on the formal parameters do not change the values of the
actual parameters.

• Example: Languages such as C and C++ use this parameter-passing
method. In PASCAL, non-var parameters are passed by value.
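
A minimal C sketch of call by value; the function name increment_copy is hypothetical.

```c
/* The formal parameter x holds a copy of the actual parameter's r-value. */
int increment_copy(int x) {
    x = x + 1;      /* changes only the callee's copy */
    return x;
}
```

After `int a = 5; int r = increment_copy(a);`, r is 6 and a is still 5.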


2. Call by reference: This method is also called call by address or
call by location.

• The l-value, i.e. the address of the actual parameter, is passed to
the called routine's activation record.

• The values of the actual parameters can be changed by the called
routine.

• The actual parameter must have an l-value.

• Example: Reference parameters in C++, PASCAL's var parameters.
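
C itself has no reference parameters, but passing an address explicitly gives the same effect. The sketch below assumes that emulation; the function names are hypothetical.

```c
/* The caller passes the address (l-value) of the actual parameter. */
void increment_in_place(int *x) {
    *x = *x + 1;    /* updates the caller's variable */
}

/* Exchanges the two variables whose addresses are passed. */
void swap_ints(int *a, int *b) {
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

After `int a = 5; increment_in_place(&a);`, a is 6. A call such as `increment_in_place(&(a + 1))` does not compile, because the expression `a + 1` has no l-value.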


3. Copy restore: This method is a hybrid between call by value and
call by reference. It is also known as copy-in copy-out or
value-result.

• The calling procedure calculates the value of the actual parameter,
which is then copied into the activation record of the called procedure.

• During execution of the called procedure, the actual parameter's
value is not affected.

• If the actual parameter has an l-value, then at return the value of
the formal parameter is copied back to the actual parameter.

• Example: Ada uses this kind of parameter passing.
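
Copy restore differs from call by reference when two formals are bound to the same variable. The C sketch below simulates both by hand. All names are hypothetical, and the left-to-right order of the copy-out is an assumption; a language definition may specify otherwise or leave it open.

```c
/* The called procedure's work, expressed on its two formals x and y. */
static void add_one_and_ten(int *x, int *y) {
    *x = *x + 1;
    *y = *y + 10;
}

/* Call by reference: both formals alias the same variable a. */
int by_reference(int a) {
    add_one_and_ten(&a, &a);
    return a;                 /* both updates hit a: a + 11 */
}

/* Copy restore: copy in, run on private copies, copy out at return. */
int by_copy_restore(int a) {
    int x = a, y = a;         /* copy-in */
    add_one_and_ten(&x, &y);  /* a itself is untouched during the call */
    a = x;                    /* copy-out of the first formal */
    a = y;                    /* copy-out of the second formal wins */
    return a;                 /* a + 10 under the assumed order */
}
```

Starting from 0, by_reference yields 11 while by_copy_restore yields 10.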


4. Call by name:

• The procedure is treated like a macro: the procedure body is
substituted for the call in the caller, with the actual parameters
textually substituted for the formals.

• The actual parameters can be surrounded by parentheses to preserve
their integrity.

• The local names of the called procedure are kept distinct from the
names of the calling procedure.

• Example: ALGOL uses the call-by-name method.
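
C has no call by name, but a preprocessor macro gives the same textual-substitution behaviour and serves as a rough approximation. In the sketch below all names are hypothetical. It counts how often the actual parameter is evaluated under each method.

```c
static int eval_count;   /* how many times the actual parameter was evaluated */

/* Returns v and records that the actual parameter was evaluated. */
int traced(int v) {
    eval_count++;
    return v;
}

/* Call by value: the actual is evaluated once, before the call. */
int twice_by_value(int x) {
    return x + x;
}

/* Macro: the actual's text replaces each use of x, so it is evaluated
   at every use. The parentheses preserve the integrity of the actual. */
#define TWICE_BY_NAME(x) ((x) + (x))

int evaluations_by_value(void) {
    eval_count = 0;
    (void)twice_by_value(traced(7));
    return eval_count;
}

int evaluations_by_name(void) {
    eval_count = 0;
    (void)TWICE_BY_NAME(traced(7));
    return eval_count;
}
```

The actual parameter is evaluated once under call by value and twice under the macro, once for each use of the formal. Unlike true call by name, a C macro does not keep the callee's local names distinct from the caller's.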


Symbol Tables

• A compiler uses a symbol table to keep track of scope and binding
information about names.

• The table is searched every time a name is encountered in the source
code.

• A symbol-table mechanism must allow us to add new entries and find
existing entries efficiently.

• We evaluate each scheme on the basis of the time required to add n
entries and make e enquiries.


Symbol-Table Entries

• The items to be stored in the symbol table are:
  - Variable names
  - Constants
  - Procedure names
  - Function names
  - Literal constants and strings
  - Compiler-generated temporaries
  - Labels in the source language

• The compiler uses the following types of information from the symbol
table:
  - Data type
  - Name
  - Declaring procedure
  - Offset in storage
  - For a structure or record, a pointer to the structure table
  - For parameters, whether parameter passing is by value or by reference
  - Number and types of the arguments passed to a function
  - Base address

How to store the names in symbol tables

• A symbol-table entry stores the lexeme, i.e. the character string
forming the name, together with the attributes of the name.

• The lexeme is needed when a symbol-table entry is set up for the
first time, and when we look up a lexeme found in the input to determine
whether it is a name that has already appeared.

• There are two types of name representation:

1. Fixed-length name
• A fixed amount of space is allocated for each name in the symbol table.
• If a name is short, the remaining space is wasted.


2. Variable-length name
• Rather than allocating in each symbol-table entry the maximum
possible amount of space to hold a lexeme, we can use space
more efficiently if each symbol-table entry has space for only one
pointer.
• In the record for a name, we place a pointer into a separate array of
characters (the string table), giving the position of the first character
of the lexeme.
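
A string table of this kind might be sketched as follows. The table size, the use of an index in place of a raw pointer, and the names NameEntry, store_name, and name_of are assumptions made for this illustration.

```c
#include <stddef.h>   /* size_t */
#include <string.h>   /* strlen, memcpy */

#define STRING_TABLE_SIZE 256   /* hypothetical capacity */

static char   string_table[STRING_TABLE_SIZE];  /* all lexemes, back to back */
static size_t string_table_used;                /* next free position */

/* The part of a symbol-table record that identifies the name. */
struct NameEntry {
    size_t start;    /* position of the first character of the lexeme */
    size_t length;   /* number of characters in the lexeme */
};

/* Appends the lexeme (with its terminating '\0') to the string table and
   records its position in *e. Returns 0 on success, -1 if the table is full. */
int store_name(struct NameEntry *e, const char *lexeme) {
    size_t len = strlen(lexeme);
    if (string_table_used + len + 1 > STRING_TABLE_SIZE) return -1;
    memcpy(string_table + string_table_used, lexeme, len + 1);
    e->start = string_table_used;
    e->length = len;
    string_table_used += len + 1;
    return 0;
}

/* Returns the lexeme that the entry refers to. */
const char *name_of(const struct NameEntry *e) {
    return string_table + e->start;
}
```

A one-character name such as i occupies two bytes of the string table, and a long name occupies only as much as it needs, so no space is reserved for a maximum length.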


Symbol Table Management

Requirement for symbol table management:

i) For quick insertion of identifier and related information

ii) For quick searching of identifier


1. List data structure for symbol table
• A linear list is the simplest mechanism for implementing a symbol
table.
• An array is used to store names and their associated information.
• New names are added in the order in which they arrive.
• The pointer 'available' is maintained at the end of all stored records.
• To retrieve information about a name, we search from the beginning
of the array up to the 'available' pointer. If we reach 'available'
without finding the name, we report the error "use of undeclared name".
• Before inserting a new name, we must check that it is not already
present. If it is, we report the error "multiply defined name".
• The advantage of the list organization is that it takes the minimum
amount of space.
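
The list organization can be sketched in C as follows. The capacity, the single type_code attribute, and the return conventions are assumptions made for this example; the table stores pointers to the names, so the caller must keep the strings alive.

```c
#include <string.h>   /* strcmp */

#define TABLE_CAPACITY 64   /* hypothetical capacity */

struct Symbol {
    const char *name;       /* the lexeme */
    int         type_code;  /* stands in for the associated information */
};

static struct Symbol table[TABLE_CAPACITY];
static int available;       /* index of the first unused record */

/* Linear search from the beginning up to 'available'.
   Returns the index of the name, or -1 for "use of undeclared name". */
int list_lookup(const char *name) {
    for (int i = 0; i < available; i++) {
        if (strcmp(table[i].name, name) == 0) return i;
    }
    return -1;
}

/* Adds a name at 'available'. Returns its index, -1 for
   "multiply defined name", or -2 if the table is full. */
int list_insert(const char *name, int type_code) {
    if (list_lookup(name) >= 0) return -1;
    if (available == TABLE_CAPACITY) return -2;
    table[available].name = name;
    table[available].type_code = type_code;
    return available++;
}
```

Every lookup and every insertion scans the records stored so far, which is the price paid for the small space requirement.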

2. Self-organizing list
• The linear list is implemented as a linked list. A link field is added
to each record.
• We search the records in the order given by the link fields.
• A pointer "First" is maintained to point to the first record of the
symbol table. For example, following the links from First may visit the
names in the order Name 3, Name 1, Name 4, Name 2.
• When a name is referenced or created, it is moved to the front of
the list.
• The most frequently referenced names will tend to be at the front of
the list. Hence the access time for the most frequently referenced names
will be the least.
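
The move-to-front rule can be sketched in C as follows. The names Node, mtf_insert, and mtf_lookup are hypothetical, and each record carries only its name and link field.

```c
#include <stddef.h>   /* NULL */
#include <string.h>   /* strcmp */

struct Node {
    const char  *name;
    struct Node *link;   /* the link field added to each record */
};

/* A newly created name is placed at the front of the list. */
void mtf_insert(struct Node **first, struct Node *n) {
    n->link = *first;
    *first = n;
}

/* Searches in link order. If the name is found, its record is moved to
   the front of the list. Returns the record, or NULL if it is absent. */
struct Node *mtf_lookup(struct Node **first, const char *name) {
    struct Node **p = first;
    while (*p != NULL && strcmp((*p)->name, name) != 0) {
        p = &(*p)->link;
    }
    struct Node *found = *p;
    if (found != NULL && found != *first) {
        *p = found->link;       /* unlink the record */
        found->link = *first;   /* relink it at the front */
        *first = found;
    }
    return found;
}
```

After a lookup of the last name in the list, that name becomes the first record and the relative order of the others is unchanged.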


3. Hash tables
• Hashing is an important technique for searching the records of a
symbol table. It is superior to the list organization.
• A hash table consists of a fixed array of m pointers to table
entries.
• The table entries are organized into m separate linked lists, called
buckets. Each record in the symbol table appears on exactly one of
these lists.
• The dynamic storage allocation facilities of the implementation
language can be used to obtain space for the records, often at some
loss of efficiency.

[Figure: A hash table of size 211]

• To determine whether there is an entry for string s in the symbol
table, we apply a hash function h to s, such that h(s) returns an
integer between 0 and m - 1.
• If s is in the symbol table, then it is on the list numbered h(s). If s
is not yet in the symbol table, it is entered by creating a record for s
that is linked at the front of the list numbered h(s).
• The hash function should distribute names uniformly over the symbol
table.
• The hash function should produce as few collisions as possible. A
collision occurs when the hash function gives the same location for
two different names.
• Collision-resolution techniques include open hashing, chaining, and
rehashing.
• The advantage of hashing is quick search. The disadvantages are that
hashing is complicated to implement, some extra space is required, and
obtaining the scope of variables is very difficult.
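
A chained hash table of this kind might look as follows in C. The hash function shown, a multiply-by-31 string hash, is an assumed choice and is not taken from the lecture. The names Entry, hash_name, hash_lookup, and hash_insert are hypothetical, and each record stores only a pointer to its name.

```c
#include <stdlib.h>   /* malloc */
#include <string.h>   /* strcmp */

#define M 211   /* number of buckets, as in the figure */

struct Entry {
    const char   *name;
    struct Entry *next;   /* next record in the same bucket */
};

static struct Entry *buckets[M];   /* fixed array of m list heads */

/* h(s): maps a string to an integer between 0 and M - 1. */
unsigned hash_name(const char *s) {
    unsigned h = 0;
    for (; *s != '\0'; s++) {
        h = h * 31u + (unsigned char)*s;
    }
    return h % M;
}

/* Searches only the list numbered h(s). Returns the record or NULL. */
struct Entry *hash_lookup(const char *s) {
    for (struct Entry *e = buckets[hash_name(s)]; e != NULL; e = e->next) {
        if (strcmp(e->name, s) == 0) return e;
    }
    return NULL;
}

/* Returns the existing record for s, or creates one linked at the front
   of the list numbered h(s). Returns NULL if allocation fails. */
struct Entry *hash_insert(const char *s) {
    struct Entry *e = hash_lookup(s);
    if (e != NULL) return e;
    e = malloc(sizeof *e);
    if (e == NULL) return NULL;
    unsigned h = hash_name(s);
    e->name = s;
    e->next = buckets[h];
    buckets[h] = e;
    return e;
}
```

Names that collide simply share a bucket's list, so a lookup compares the string only against the records in one list instead of the whole table.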
