Type Checking Run-Time Environments Intermediate Code Generation
Type Checking
Run-time Environments
Intermediate Code Generation
Prepared By:
Dabbal Singh Mahara
2016
Type Checking
• A type is a set of values together with a set of operations that can be
performed on them
• Type checking is checking that each operation in a program receives the
appropriate number of arguments, of the appropriate types, in the appropriate order.
• The purpose of type checking is to verify that operations performed on a
value are in fact permissible.
• Certain operations are legal for values of each type
– It doesn’t make sense to add a function pointer and an integer in C.
– It does make sense to add two integers.
• The type of an identifier is typically available from declarations, but we may
have to keep track of the type of intermediate expressions.
• Type errors arise when operations are performed on values that do not
support that operation.
Dabbal Mahara 2
Type Systems
• A language’s type system specifies which operations are valid for which types.
• Type systems provide a concise formalization of the semantic checking rules.
• A type system defines a set of types and rules for assigning types to programming
language constructs. An informal type system rule, for example: “if both
operands of addition are of type integer, then the result is of type integer”.
• Type checking is the process of checking that the program obeys the type
system.
• A type checker implements a type system.
• A sound type system eliminates the need for run-time checking for type errors, such as:
– Memory errors: Reading from an invalid pointer, etc.
– Violation of abstraction boundaries.
Type Checking Overview
Static Checking
• Refers to the compile-time checking of programs in order to ensure that
the semantic conditions of the language are being followed
• Examples of static checks include:
– Type checks
– Flow-of-control checks
– Uniqueness checks
– Name-related checks
• Flow-of-control checks: statements that cause flow of control to leave a construct
must have some place to which control can be transferred; e.g., a break statement in
C must appear inside an enclosing loop or switch statement
• Uniqueness checks: a language may dictate that in some contexts, an entity can
be defined exactly once; e.g., identifier declarations, labels, values in case
expressions
• Name-related checks: Sometimes the same name must appear two or more
times; e.g., in Ada a loop or block can have a name that must then appear both at
the beginning and at the end
Type Expression
• A language usually provides a set of base types that it supports together with ways to construct
other types using type constructors
• Through type expressions we are able to represent types that are defined in a program
• A base type is a type expression:
– a primitive data type such as integer, real, char, boolean, …
– type_error: signals an error during type checking
– void: no type
• A type name (e.g., a record name) is a type expression
• A type constructor applied to type expressions is a type expression. E.g.,
– arrays: If T is a type expression and I is a range of integers, then array(I,T) is a
type expression
– records: If T1, …, Tn are type expressions and f1, …, fn are field names, then
record((f1,T1),…,(fn,Tn)) is a type expression
– pointers: If T is a type expression, then pointer(T) is a type expression. Ex: pointer(int)
– functions: If T1, …, Tn, and T are type expressions, then so is (T1,…,Tn) → T.
Ex: int → int represents the type of a function that takes an int value as its parameter
and whose return type is also int.
A Simple Type Checking System
Specification of Simple Type checker
Type checking for expression
• The synthesized attribute type for E gives the type of the expression
assigned by the type system for the expression generated by E.
• The function lookup returns the type of id.
• The following figure shows the type checking for the expressions.
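The bottom-up synthesis of the type attribute can be sketched in Python. This is only an illustrative sketch: the tuple encoding of expressions and type expressions, the dictionary used as a symbol table, and the function name check are assumptions made for the example; lookup mirrors the function named above.

```python
# Hypothetical mini type checker; all names and encodings are illustrative.
INTEGER, BOOLEAN, TYPE_ERROR = "integer", "boolean", "type_error"

def lookup(symtab, name):
    """Return the declared type of an identifier, or type_error if undeclared."""
    return symtab.get(name, TYPE_ERROR)

def check(expr, symtab):
    """Synthesize E.type bottom-up.

    expr is an int literal, an identifier (str), or a tuple (op, left, right).
    Type expressions are base type names or tuples such as ("array", 10, "integer").
    """
    if isinstance(expr, int):
        return INTEGER
    if isinstance(expr, str):
        return lookup(symtab, expr)
    op, left, right = expr
    lt, rt = check(left, symtab), check(right, symtab)
    if op in ("+", "-", "*", "mod"):
        # both operands integer => result integer
        return INTEGER if lt == INTEGER and rt == INTEGER else TYPE_ERROR
    if op in ("<", "=="):
        return BOOLEAN if lt == INTEGER and rt == INTEGER else TYPE_ERROR
    if op == "index":
        # left : array(I, T) and right : integer  =>  T
        if isinstance(lt, tuple) and lt[0] == "array" and rt == INTEGER:
            return lt[2]
        return TYPE_ERROR
    return TYPE_ERROR

symtab = {"x": INTEGER, "a": ("array", 10, INTEGER), "f": "real"}
print(check(("+", "x", ("index", "a", 1)), symtab))
```

Any subexpression that fails a rule yields type_error, which then propagates upward because no rule accepts it as an operand type.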
Type checking for statements
• In some languages statements have a type associated with them, while some
other languages don’t assign types to statements.
• In the latter case, statements are given the type void to distinguish a type-safe
statement from one that has a type error.
• If an error occurs within a statement, then the type assigned to this statement is
type_error.
Type checking for functions
Type Conversion and Coercion
• Since integers and reals are represented differently within a computer, different
machine instructions are used for operations on integers and reals. When different
parts of an expression have different types, type conversion is often required.
• For example, in the expression z = x + y, what is the type of z if x is integer and y is
real?
• The compiler has to convert one of them to ensure that both operands have the same type.
• In many languages type conversion is explicit, for example using type casts; i.e., it must
be specified as inttoreal(x).
• Type conversions that happen implicitly are called coercions. Implicit type conversions
are carried out by the compiler, which recognizes a type incompatibility and inserts a call to a type
conversion routine (for example, something like inttoreal(int)) that takes a value of the
original type and returns a value of the required type.
• The coercion of expressions is given in the following figure.
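As a rough illustration of coercion, the sketch below widens an integer operand to real before emitting an addition. The Emitter class, the widen helper and the operator spellings "int+" and "real+" are assumptions introduced for this example; only inttoreal comes from the text above.

```python
INTEGER, REAL = "integer", "real"

class Emitter:
    """Collects three-address instructions and hands out temporaries."""
    def __init__(self):
        self.code = []
        self.count = 0

    def new_temp(self):
        self.count += 1
        return f"t{self.count}"

    def emit(self, line):
        self.code.append(line)

def widen(em, addr, typ, target):
    """Return an address holding the value of addr converted to target."""
    if typ == target:
        return addr
    if typ == INTEGER and target == REAL:
        t = em.new_temp()
        em.emit(f"{t} = inttoreal({addr})")   # implicit conversion inserted here
        return t
    raise TypeError(f"cannot convert {typ} to {target}")

def gen_add(em, left, right):
    """left and right are (address, type) pairs; returns (address, type)."""
    (la, lt), (ra, rt) = left, right
    result_type = REAL if REAL in (lt, rt) else INTEGER
    la = widen(em, la, lt, result_type)
    ra = widen(em, ra, rt, result_type)
    t = em.new_temp()
    op = "real+" if result_type == REAL else "int+"
    em.emit(f"{t} = {la} {op} {ra}")
    return t, result_type

em = Emitter()
addr, typ = gen_add(em, ("x", INTEGER), ("y", REAL))
em.emit(f"z = {addr}")
print("\n".join(em.code))
```

For z = x + y with x integer and y real, the sketch converts x first and then uses the real addition, so z receives a real value.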
Structural Equivalence of Type Expressions
• The basic question is "when are two type expressions equivalent?"
• Two type expressions are structurally equivalent if they are the same basic type,
or are formed by applying the same constructor to structurally equivalent types.
Example: int a, b;
Here the types of a and b are structurally equivalent.
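A structural equivalence test is naturally recursive. The following is a minimal sketch, assuming type expressions are encoded as base type names (strings) or tuples whose first element is the constructor; the function name sequiv and this encoding are illustrative choices, not taken from the slides.

```python
def sequiv(s, t):
    """Return True if type expressions s and t are structurally equivalent."""
    if isinstance(s, tuple) and isinstance(t, tuple):
        # same constructor, same number of components,
        # and pairwise equivalent components
        return (len(s) == len(t) and s[0] == t[0]
                and all(sequiv(a, b) for a, b in zip(s[1:], t[1:])))
    # base types (and non-type components such as array sizes) compare by equality
    return s == t

print(sequiv(("pointer", ("array", 10, "int")), ("pointer", ("array", 10, "int"))))
print(sequiv(("pointer", "int"), ("pointer", "real")))
```

In this sketch the array size takes part in the comparison; a language could equally decide to ignore array bounds when testing equivalence.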
Run-time Environments
Run-time Environment...
• The runtime support system is a package, mostly generated with the executable
program itself, that facilitates communication between the running process
and the runtime environment. It takes care of memory allocation and
de-allocation while the program is being executed.
• This environment deals with a number of issues such as the layout and allocation of
storage locations for the objects named in the source program, the mechanisms
used by the target program to access variables, the linkages between
procedures, the mechanisms for passing parameters, and the interfaces to the
operating system, input/output devices and other programs.
• That is,
‣ Management of run-time resources
‣ Correspondence between static (compile-time) and dynamic (run-time) structures
‣ Storage organization
Run-time Resources
• Execution of a program is initially under the control of the operating
system (OS)
• When a program is invoked:
‣ The OS allocates space for the program
‣ The code is loaded into part of this space
‣ The OS jumps to the entry point of the program
(i.e., to the beginning of the “main” function)
Memory Layout: Storage Organization
(Figure: storage layout of a program in memory, drawn from low address to high address.)
Correspondence between Static and Dynamic Structures
• The compiler must do the storage allocation and provide access to variables and data.
• At run time, we need a system to map NAMES (in the source program) to STORAGE
on the machine.
• Allocation and de-allocation of memory is handled by a RUN-TIME SUPPORT
SYSTEM typically linked and loaded along with the compiled target code.
• One of the primary responsibilities of the run-time system is to manage ACTIVATIONS
of procedures.
• Procedure execution begins at the first statement of the procedure body.
• When a procedure returns, execution returns to the instruction immediately following
the procedure call.
Activation and Activation Tree
• Every execution of a procedure is called an ACTIVATION.
• The LIFETIME of an activation of procedure P is the sequence of steps between
the first and last steps of P’s body, including any procedures called while P is
running.
• Normally, when control flows from one activation to another, it must (eventually)
return to the same activation.
• If a procedure is recursive, a new activation can begin before an earlier
activation of the same procedure has ended.
• We can represent the activations of procedures during the running of an entire
program by a tree, called an activation tree.
• The activation tree shows the way control enters and leaves activations. In an
activation tree:
– Each node represents an activation of a procedure.
– The root represents the activation of the main program.
– The node a is a parent of the node b if control flows from a to b.
– The node a is to the left of the node b if the lifetime of a occurs before the lifetime
of b.
Procedure Activations: Example
Procedure Activation : Example (contd...)
• The example is a sketch of a program that reads nine integers into an array a
and sorts them using the recursive quicksort algorithm.
• The main function has three tasks: it calls readarray, sets sentinels, and then calls
quicksort on the entire data array.
• The figure on the right shows the sequence of calls that might result from an
execution of the program. In this execution, the call to partition(1,9) returns 4, so
a[1] to a[3] hold elements less than the chosen separator value v, while the larger
elements are in a[5] through a[9].
• In this example, procedure activations are nested in time.
Activation Tree: During an Execution of quicksort
• The figure shows one possible activation tree that completes the
sequence of calls and returns in the program above.
• The functions are represented by the first letters of their names.
• Remember that this tree is only one possibility, since the arguments of
subsequent calls, and also the number of calls along any branch, are
influenced by the values returned by partition.
Control Stack
• Procedure calls and returns are managed by a run time stack called the control stack.
• Each live activation has a frame, known as an activation record, on the control stack. The
root of the activation tree is at the bottom, and the records on the stack correspond
to the path in the activation tree from the root to the activation where control
currently resides. That activation has its record at the top of the stack.
• The stack keeps track of currently-active procedure activations.
– An activation record is pushed onto the control stack as the activation starts.
– That activation record is popped when that activation ends.
• At any point in time, the control stack represents a path from the root of the activation
tree to one of the nodes.
• The flow of the control in a program corresponds to a depth first traversal of the
activation tree that:
– starts at the root,
– visits a node before its children, and
– recursively visits children at each node in a left-to-right order.
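The push and pop behaviour of the control stack can be observed with a small simulation. This is a sketch under simplifying assumptions: an activation record is reduced to a string naming the call, and the decorator activation, the list control_stack and the list snapshots are names invented for the example.

```python
control_stack = []   # index 0 is the bottom of the stack
snapshots = []       # copy of the stack taken each time a record is pushed

def activation(func):
    """Wrap func so that each call pushes and pops a (simplified) activation record."""
    def wrapper(*args):
        record = f"{func.__name__}({', '.join(map(str, args))})"
        control_stack.append(record)          # pushed as the activation starts
        snapshots.append(list(control_stack))
        try:
            return func(*args)
        finally:
            control_stack.pop()               # popped when the activation ends
    return wrapper

@activation
def fact(n):
    return 1 if n == 0 else n * fact(n - 1)

@activation
def main():
    return fact(2)

result = main()
for stack in snapshots:
    print(" -> ".join(stack))
```

Each printed line is a path from the root of the activation tree (main) to the activation that has just started, which is the property stated above.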
Activation Records
• Information needed by a single execution of a procedure is managed using a
contiguous block of storage called activation record.
• An activation record is allocated when a procedure is entered, and it is
de-allocated when that procedure is exited.
• The size of each field can usually be determined at compile time (although the actual location of
the activation record is determined at run time).
• The exception is a local variable whose size depends on a
parameter: its size is determined only at run time.
A General Activation Record
• Temporary values, such as those arising from the evaluation of
expressions, in cases where those temporaries cannot be held in
registers.
• Local data belonging to the procedure whose activation record this is.
• Saved machine status, with information about the state of the machine
just before the call to the procedure. This information typically includes
the return address (the value of the program counter, to which the called
procedure must return) and the contents of registers that were used by
the calling procedure and that must be restored when the return occurs.
• An access link, which may be needed to locate data needed by the called
procedure but found elsewhere, e.g. in another activation record.
• A control link, pointing to the activation record of the caller.
• Space for the return value of the called function, if any.
• The actual parameters used by the calling procedure.
(Figure: fields of a general activation record, in order: actual parameters, returned values,
control link, access link, saved machine status, local data, temporaries.)
Creation of An Activation Record
(Figure: division of tasks between caller and callee. Labels: caller’s activation record; control link; links and saved status; temporaries and local data; callee’s responsibility; stack_top.)
Sample calling sequence
• Caller evaluates the actual parameters and places them into the activation record of the
callee.
• Caller stores the return address and the old value of stack_top in the callee’s activation record.
• Caller increments stack_top to the beginning of the temporaries and locals for the callee.
• Caller branches to the code for the callee.
• Callee saves all needed register values and status.
• Callee initializes its locals and begins execution.
Sample return sequence
• Callee places the return value at the correct location in the activation record (next to caller’s
activation record)
• Callee uses status information previously saved to restore stack_top and the other registers.
• Callee branches to the return address previously stored by the caller.
• [Optional] Caller copies the return value into its own activation record and uses it to evaluate
an expression.
Who deallocates?
• Callee de‐allocates the part allocated by Callee.
• Caller de‐allocates the part allocated by Caller.
Variable-length data
• In some languages, array size can depend on a value
passed to the procedure as a parameter.
• This and any other variable-sized data can still be
allocated on the stack, but BELOW the callee’s
activation record.
• In the activation record itself, we simply store
POINTERS to the to-be-allocated data.
• All variable-length data is pointed to from the local
data area.
Intermediate Code Generation
• In the analysis-synthesis model of a compiler, the front end analyzes a source program and
creates an intermediate representation, from which the back end generates target code.
• The details of the source language are confined to the front end, and the details of the target machine to
the back end.
• With a suitably defined intermediate representation, a compiler for language i and machine j
can then be built by combining the front end for language i with the back end for machine j.
• Intermediate code is often the link between the compiler’s front end and back end.
• Intermediate code is machine independent, but it is close to machine
instructions.
Types of Intermediate Languages
There are three kinds of intermediate representations:
1. High-level intermediate representations:
– closer to the source language; e.g., syntax trees or directed acyclic
graphs (DAGs)
– easy to generate from the input program
– code optimizations may not be straightforward
2. Low-level intermediate representations:
– closer to the target machine; e.g., P-Code, U-Code (used in PA-RISC and
MIPS), GCC’s RTL, 3-address code
– easy to generate code from
– generation from input program may require effort
3. “Mid”-level intermediate representations:
– Java bytecode, Microsoft CIL, LLVM IR, ...
1. Syntax Tree
SDD for creating syntax tree
Example 2: Syntax tree
2. DAG
• Like the syntax tree for an expression, a DAG has leaves corresponding to atomic operands and
interior nodes corresponding to operators.
• The difference is that a node N in a DAG has more than one parent if N represents a common
subexpression.
• All that is needed is that the functions that create nodes, such as Node and Leaf, check whether an identical node already
exists. If such a node exists, a pointer to that node is returned instead of creating a new one.
• More compact representation
• Gives clues regarding generation of efficient code
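The "check whether a node already exists" idea can be sketched with a dictionary keyed by node signature. The class name DagBuilder, its methods and the use of integer node ids in place of pointers are assumptions made for this illustration.

```python
class DagBuilder:
    """Creates DAG nodes, reusing an existing node when the signature matches."""
    def __init__(self):
        self.nodes = []   # node id -> signature
        self.ids = {}     # signature -> node id

    def _find_or_create(self, signature):
        if signature not in self.ids:          # only create when no such node exists
            self.ids[signature] = len(self.nodes)
            self.nodes.append(signature)
        return self.ids[signature]

    def leaf(self, name):
        return self._find_or_create(("leaf", name))

    def node(self, op, left, right=None):
        return self._find_or_create((op, left, right))

# Build the DAG for b * -c + b * -c
dag = DagBuilder()
b, c = dag.leaf("b"), dag.leaf("c")
first = dag.node("*", b, dag.node("uminus", c))
second = dag.node("*", b, dag.node("uminus", c))   # returns the existing node
root = dag.node("+", first, second)
print(len(dag.nodes), first == second)
```

Because the second request for b * -c returns the node built for the first, the DAG for this expression has five nodes, and the + node has the same child on both sides.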
3. Three-Address Code
• Three-address code (3AC) is an intermediate representation with at most one operator on the
right side of an instruction.
• That is, no built-up arithmetic expressions are permitted.
• Thus, x+y*z might be translated into the sequence of three-address instructions:
t1 = y * z
t2 = x + t1
where t1 and t2 are compiler generated temporary names.
• 3AC is close to assembly language, making machine code generation easier.
• 3AC is easy to generate from syntax trees or DAG. We associate a temporary with each
interior tree node.
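Associating one temporary with each interior node amounts to a post-order walk of the tree. The sketch below assumes expressions are encoded as nested tuples, (op, operand) for unary and (op, left, right) for binary operators, with identifiers as strings; the function name three_address is illustrative.

```python
import itertools

def three_address(expr):
    """Return (address holding the result, list of three-address instructions)."""
    code = []
    temps = itertools.count(1)

    def walk(e):
        if isinstance(e, str):            # leaf: the name is its own address
            return e
        if len(e) == 2:                   # unary operator
            op, operand = e
            a = walk(operand)
            t = f"t{next(temps)}"
            code.append(f"{t} = {op}{a}")
            return t
        op, left, right = e               # binary operator
        a, b = walk(left), walk(right)
        t = f"t{next(temps)}"             # one temporary per interior node
        code.append(f"{t} = {a} {op} {b}")
        return t

    return walk(expr), code

addr, code = three_address(("+", "x", ("*", "y", "z")))
print("\n".join(code))
```

For x + y * z this yields t1 = y * z followed by t2 = x + t1, the sequence shown above.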
Forms of 3AC
• Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation.
• Assignment statements of the form x := op y, where op is a unary operator, such as unary minus,
logical negation
• Copy statements of the form x := y, which assigns the value of y to x.
• Unconditional jumps of the form goto L, which means the statement with label L is the next to be executed.
• Conditional jumps, such as if x relop y goto L, where relop is a relational operator (<, =, >=, etc.) and
L is a label. (If the condition x relop y is true, the statement with label L will be executed next; otherwise execution continues with the following statement.)
• Statements param x and call p, n for procedure calls, and return y, where y represents the (optional)
returned value. The typical usage: p(x1, …, xn)
param x1
param x2
…
param xn
call p, n
• Indexed assignments of the form x := y[i] and x[i] := y. The first sets x to the value in the location i memory
units beyond location y. The second sets the contents of the location i memory units beyond x to the value of y.
• Address and pointer assignments:
x := &y
x := *y
*x := y
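The param/call convention listed above is mechanical enough to sketch in a few lines. The helper name gen_call and its optional result argument are assumptions for this example; the operands are taken to be already evaluated into names or temporaries.

```python
def gen_call(proc, args, result=None):
    """Emit 'param' instructions followed by 'call proc, n'."""
    code = [f"param {arg}" for arg in args]
    call = f"call {proc}, {len(args)}"
    # when the returned value is used, assign it to a name or temporary
    code.append(f"{result} = {call}" if result else call)
    return code

print("\n".join(gen_call("p", ["x1", "x2", "x3"])))
print("\n".join(gen_call("fact", ["t1"], result="t2")))
```

The count n in call p, n tells the callee how many of the preceding param instructions belong to this call.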
Representation of 3AC in data structure
• How are these instructions represented in a data structure?
In a compiler, 3AC instructions can be implemented as objects or records with fields for
the operator and operands. Three such representations are:
– Quadruples
– Triples
– Indirect triples
1. Quadruples
• Has four fields: op, arg1, arg2, result
• Exceptions:
– Unary operators: no arg2
– Operators like param: no arg2, no result
– (Un)conditional jumps: target label is the result
2. Triples
Fig. Representation of a = b * - c + b * - c
3. Indirect Triples
• When instructions are moved around during optimization, quadruples are
better than triples: a triple refers to a result by its position, so moving an
instruction requires updating every reference to it.
• Indirect triples solve this problem.
• Indirect triples consist of a list of pointers to triples rather than a list of
the triples themselves. With this, an optimizing compiler can move an instruction
by reordering the instruction list without affecting the triples themselves.
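The three representations can be compared on a = b * -c + b * -c. The sketch below is illustrative only: the tuple layouts, the use of integer positions as references inside triples, and the function name to_triples are assumptions for the example.

```python
# Quadruples: (op, arg1, arg2, result)
quads = [
    ("minus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("minus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]

def to_triples(quads):
    """Drop the result field; refer to earlier results by triple position (an int)."""
    position = {}    # temporary name -> index of the triple that computes it
    triples = []

    def ref(x):
        return position.get(x, x)

    for i, (op, arg1, arg2, result) in enumerate(quads):
        if op == "=":
            triples.append((op, result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = i
    return triples

triples = to_triples(quads)

# Indirect triples: a separate list of positions gives the execution order.
instruction_list = [0, 1, 2, 3, 4, 5]
# Reordering touches only this list; the triples and their references stay valid.
instruction_list = [2, 3, 0, 1, 4, 5]

for index in instruction_list:
    print(index, triples[index])
```

With plain triples the same reordering would change positions 0 to 3 and force the references in the + triple to be rewritten.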
3AC for program constructs
• A program consists of assignment statements such as a = b op c and control statements such as if-then-else,
while loops, or for statements.
• This section deals with generation of three address code for assignment statement and control
statements.
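Example: Generate three-address code for the following arithmetic expression: a = -b * c
Attributes in the annotated parse tree, from the leaves up:
E → id (b): E.addr = b, E.code = ‘ ’
E → -E1: E.addr = t1, E.code = t1 = -b
E → id (c): E.addr = c, E.code = ‘ ’
E → E1 * E2: E.addr = t2, E.code = t1 = -b
                                    t2 = t1 * c
S → a = E: S.code = t1 = -b
                     t2 = t1 * c
                     a = t2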
(Figure: array elements A[0] through A[9].)
3AC generation for Array references
• More generally, array elements need not start at 0. In a one-dimensional array,
the array elements are numbered low, low+1, low+2, ..., high, and base is the
relative address of A[low].
• The relative address of A[i] is then base + (i - low) * w, where w is the width of each
element. This can be rewritten as i * w + c, where c = base - low * w.
• All the components of c are known at compile time, hence c can be pre-computed
and stored. This reduces the time taken to compute the address of the ith element.
• We assume that c is saved in the symbol table entry for A, so the relative address of
A[i] is obtained by simply adding i * w to c.
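A small numeric check makes the role of the pre-computed constant concrete, taking c = base - low * w so that the address of A[i] is i * w + c. The function names and the value chosen for base are assumptions for illustration; low = 10 and w = 4 follow the example on the next slide.

```python
def precompute_c(base, low, w):
    """Compile-time constant c = base - low * w."""
    return base - low * w

def address(i, w, c):
    """Run-time address of A[i], computed as i * w + c."""
    return i * w + c

base, low, w = 1000, 10, 4        # base is an assumed value for illustration
c = precompute_c(base, low, w)    # 1000 - 10 * 4 = 960
print(c, address(10, w, c), address(13, w, c))
```

At run time only one multiplication and one addition remain; the subtraction of low has been folded into c.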
One Dimensional Array Reference: Example
x := A[i]
address of A[i] = base + (i - low) * w = i * w + c,
where c = base - low * w, with low = 10 and w = 4.
• In the case of a multi-dimensional array such as a matrix, elements are stored in either row-major or column-major
order. C and Pascal use row-major storage, whereas Fortran uses column-major
storage.
• Example: Consider array A[3,3] with elements:
(0,0) (0,1) (0,2)
(1,0) (1,1) (1,2)
(2,0) (2,1) (2,2)
Row major: (0,0) (0,1) (0,2) (1,0) (1,1) (1,2) (2,0) (2,1) (2,2)
Column major: (0,0) (1,0) (2,0) (0,1) (1,1) (2,1) (0,2) (1,2) (2,2)
The address of element A[i,j] in row-major storage is given by:
address of A[i,j] = base + ((i - low1) * n2 + (j - low2)) * w ................. (3)
where low1 and low2 are the lower bounds of i and j, n2 is the number of columns, and w is the size of
each element.
The expression can be rewritten as:
address of A[i,j] = ((i * n2) + j) * w + (base - ((low1 * n2) + low2) * w) .......... (4)
The second part of expression (4) can be pre-computed from the values of base, low1, low2, n2 and
w. This makes computing the address of A[i,j] faster.
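The row-major address formula and its rearranged form with a pre-computed constant can be checked against each other numerically. The function names and the sample values of base, the bounds and w are assumptions for illustration.

```python
def address_direct(i, j, base, low1, low2, n2, w):
    """Row-major address: base + ((i - low1) * n2 + (j - low2)) * w."""
    return base + ((i - low1) * n2 + (j - low2)) * w

def precompute_constant(base, low1, low2, n2, w):
    """Compile-time part: base - ((low1 * n2) + low2) * w."""
    return base - (low1 * n2 + low2) * w

def address_fast(i, j, n2, w, constant):
    """Run-time part: ((i * n2) + j) * w plus the pre-computed constant."""
    return (i * n2 + j) * w + constant

base, low1, low2, n1, n2, w = 500, 1, 1, 2, 3, 4   # assumed sample values
constant = precompute_constant(base, low1, low2, n2, w)
for i in range(low1, low1 + n1):
    for j in range(low2, low2 + n2):
        print((i, j), address_direct(i, j, base, low1, low2, n2, w),
              address_fast(i, j, n2, w, constant))
```

With lower bounds of 0, n2 = 3 and w = 4, the run-time part reduces to i * 12 + j * 4.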
Example: 2D array referencing
Translation of Array references
Example: Compute 3AC for the expression c + a[i][j], where c, i and j are all integers and a is a 2x3 integer
array.
Fig. Annotated parse tree for c + a[i][j], node by node:
a.type = array(2, array(3, integer))
L → a [ E ], with E.addr = i: L.array = a, L.type = array(3, integer), L.addr = t1
L → L1 [ E ], with E.addr = j: L.array = a, L.type = integer, L.addr = t3
E → L: E.addr = t4
E → c: E.addr = c
E → E1 + E2: E.addr = t5
Three Address Code
t1 = i * 12
t2 = j * 4
t3 = t1 + t2
t4 = a[t3]
t5 = c + t4
Flow-of- control statements
• Control statements are used to alter the sequential flow of execution.
• Some of the control statements are the if-then-else statement and the while statement.
• S -> if (E ) S1
• S -> if ( E ) S1 else S2
• S -> while ( E ) S1
• S -> do S1 while ( E )
• Three-address code for if-then, if-then-else, and while-do statements can be generated using
the translation rules given in the following slides.
• In the translation rules, both S and E have a synthesized attribute code, which gives the
translation into three-address instructions.
• For simplicity, the translations S.code and E.code are built up as strings using an SDD.
• The translation of S -> if (E) S1 consists of E.code followed by S1.code, as shown in the figure.
Three Address Code generation for if then statement
Example: Generate 3 address code for the statement: if a>b then x = y + z.
Ans:
3AC for the given statement is:
if a > b goto L1
goto L2
L1: t1 = y + z
x = t1
L2: ....
Three Address Code generation for if then else statement
Production:
S -> if E then S1 else S2
Semantic Rules:
E.true = newlabel();
E.false = newlabel();
S1.next = S.next;
S2.next = S.next;
S.code = E.code || label(E.true, ‘:’) ||
S1.code || gen(‘GOTO’, S.next) ||
label(E.false, ‘:’) || S2.code
Example: Generate 3 address code for the statement: if a>b then x =y +z else x = y-z
The three address code is given below:
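One way to see what the semantic rules produce is to follow them in a short Python sketch. The assumptions here are loud ones: the condition is a single relational test, the branch bodies are passed in as already-generated 3AC, S.next is created locally and its label is emitted at the end, and labels are written on their own lines. The helper names are invented for the example.

```python
import itertools

def make_label_source():
    counter = itertools.count(1)
    return lambda: f"L{next(counter)}"

def gen_if_else(cond, then_code, else_code, new_label):
    """Follow S.code = E.code || E.true: || S1.code || goto S.next || E.false: || S2.code."""
    e_true, e_false, s_next = new_label(), new_label(), new_label()
    code = [f"if {cond} goto {e_true}", f"goto {e_false}"]   # E.code
    code.append(f"{e_true}:")
    code += then_code                                        # S1.code
    code.append(f"goto {s_next}")
    code.append(f"{e_false}:")
    code += else_code                                        # S2.code
    code.append(f"{s_next}:")   # assumption: emitted here so the sketch is self-contained
    return code

code = gen_if_else("a > b",
                   ["t1 = y + z", "x = t1"],
                   ["t2 = y - z", "x = t2"],
                   make_label_source())
print("\n".join(code))
```

The jump to S.next after S1.code is what keeps control from falling through into the else branch.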
Example 1: Generate 3 address code for the statement: while a>b do x = y +z.
The three address code is given below:
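A while statement can be translated with the usual scheme of a label at the start of the test and a jump back to it after the body. The sketch below assumes that scheme, since the translation rules for while are not reproduced here; gen_while and its parameters are hypothetical names, and labels are written on their own lines.

```python
import itertools

def gen_while(cond, body_code, new_label):
    """begin: E.code, E.true: body, goto begin, S.next:"""
    begin, e_true, s_next = new_label(), new_label(), new_label()
    return ([f"{begin}:", f"if {cond} goto {e_true}", f"goto {s_next}", f"{e_true}:"]
            + body_code
            + [f"goto {begin}", f"{s_next}:"])

counter = itertools.count(1)
code = gen_while("a > b", ["t1 = y + z", "x = t1"], lambda: f"L{next(counter)}")
print("\n".join(code))
```

Under these assumptions the loop test sits at L1, the body at L2, and L3 marks the statement following the loop.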
Intermediate Code:
dot_product = 0
i = 0
L1: if (i >= 10) GOTO L2
t1 = addr(a) // c = base - low * w = base, since low = 0
t2 = i * 4
t3 = t1[t2]
t4 = addr(b)
t5 = i * 4
t6 = t4[t5]
t7 = t3 * t6
t8 = dot_product + t7
dot_product = t8
t9 = i + 1
i = t9
GOTO L1
L2: -------------
Example 6: Generate three-address code for the following C program
int a[10], b[10], dot_product, i;
int *a1, *b1;
dot_product = 0;
a1 = a; b1 = b;
for (i = 0; i < 10; i++) dot_product += *a1++ * *b1++;
Intermediate Code:
dot_product = 0
a1 = &a
b1 = &b
i = 0
L1: if (i >= 10) GOTO L2
t3 = *a1
t4 = a1 + 1
a1 = t4
t5 = *b1
t6 = b1 + 1
b1 = t6
t7 = t3 * t5
t8 = dot_product + t7
dot_product = t8
t9 = i + 1
i = t9
GOTO L1
L2: -------------
Logical Expression
• Logical operators are mainly used in flow-of-control statements such as if-then-else, while-do and repeat-until.
• The not operation has the highest precedence, followed by and; or has the lowest precedence.
• Logical expressions always result in one of the values true or false.
• True can be represented by 1 or by any nonzero (or non-negative) value, whereas false can be represented by 0 (or a negative value), depending on the convention chosen.
SDD for translation of Boolean Expression to 3AC
Production: E → not E1
Semantic rules:
E1.true = E.false
E1.false = E.true
E.code = E1.code
Production: E → ( E1 )
Semantic rules:
E.value = E1.value
Example: Consider the following statement and translate it into three-address
code: if (x < 100 || x > 200 && x != y) x = 0;
Three address-code:
if x < 100 goto L2
goto L3
L3: if x > 200 goto L4
goto L1
L4: if x != y goto L2
goto L1
L2: x=0
L1: ......
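The jumping code above follows from passing true and false target labels down the Boolean expression. The following Python sketch reproduces it under stated assumptions: expressions are nested tuples, labels are written on their own lines, and the order in which labels are allocated was chosen so the numbering matches the example. The function names are invented for the illustration.

```python
import itertools

def translate_if(cond, body):
    """Translate 'if (cond) body' into jumping code with short-circuit evaluation."""
    counter = itertools.count(1)
    code = []

    def new_label():
        return f"L{next(counter)}"

    def gen_bool(e, true, false):
        op = e[0]
        if op == "or":                       # E1 false => try E2
            mid = new_label()
            gen_bool(e[1], true, mid)
            code.append(f"{mid}:")
            gen_bool(e[2], true, false)
        elif op == "and":                    # E1 true => must also test E2
            mid = new_label()
            gen_bool(e[1], mid, false)
            code.append(f"{mid}:")
            gen_bool(e[2], true, false)
        elif op == "not":                    # swap the targets, emit no code
            gen_bool(e[1], false, true)
        else:                                # relational test: (relop, a, b)
            relop, a, b = e
            code.append(f"if {a} {relop} {b} goto {true}")
            code.append(f"goto {false}")

    s_next = new_label()
    e_true = new_label()
    gen_bool(cond, e_true, s_next)
    code.append(f"{e_true}:")
    code += body
    code.append(f"{s_next}:")
    return code

cond = ("or", ("<", "x", "100"), ("and", (">", "x", "200"), ("!=", "x", "y")))
code = translate_if(cond, ["x = 0"])
print("\n".join(code))
```

Note that not generates no instructions of its own; it only exchanges the true and false targets of its operand.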
Three address code for procedure call
Example: 1
Example 2
Consider the statement: n=f(a[i])
where a is an array of integers and f is a function from integers to integers.
Example 3:
int dot_product(int x[], int y[])
{
    int d, i;
    d = 0;
    for (i = 0; i < 10; i++)
        d += x[i] * y[i];
    return d;
}
int main()
{
    int p; int a[10]; int b[10];
    p = dot_product(a, b);
}
Intermediate code for main:
func begin main
param a
param b
p = call dot_product, 2
func end
Intermediate code for dot_product:
func begin dot_product
d = 0
i = 0
L1: if (i >= 10) goto L2
t1 = addr(x)
t2 = i * 4
t3 = t1[t2]
t4 = addr(y)
t5 = i * 4
t6 = t4[t5]
t7 = t3 * t6
t8 = d + t7
d = t8
t9 = i + 1
i = t9
goto L1
L2: return d
func end
Example 4: Write 3AC for the following code:
int fact ( int n)
{
if ( n== 0 ) return 1;
else return ( n* fact(n-1));
}
Intermediate Code:
func begin fact
if (n==0) goto L1
t1 = n-1
param t1
t2 = call fact, 1
t3 = n * t2
return t3
L1: return 1
func end
Thank You !