
UNIT IV

SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT


Syntax Directed Definitions - Construction of Syntax Tree - Bottom-up Evaluation of
S-Attribute Definitions - Design of Predictive Translator - Type Systems - Specification of a
Simple Type Checker - Equivalence of Type Expressions - Type Conversions. RUN-TIME
ENVIRONMENT: Source Language Issues - Storage Organization - Storage Allocation -
Parameter Passing - Symbol Tables - Dynamic Storage Allocation - Storage Allocation in
FORTRAN.

SEMANTIC ANALYSIS

• Semantic Analysis computes additional information related to the meaning of the
program once its syntactic structure is known.
• In typed languages such as C, semantic analysis involves adding information to the
symbol table and performing type checking.
• The information to be computed is beyond the capabilities of standard parsing
techniques; therefore, it is not regarded as syntax.
• As with lexical and syntax analysis, semantic analysis needs both a
Representation Formalism and an Implementation Mechanism.
• As the representation formalism, this lecture uses what are called Syntax
Directed Translations.

SYNTAX DIRECTED TRANSLATION

• The Principle of Syntax Directed Translation states that the meaning of an input
sentence is related to its syntactic structure, i.e., to its Parse-Tree.
• By Syntax Directed Translations we mean those formalisms for specifying
translations of programming language constructs guided by context-free
grammars.
o We associate Attributes with the grammar symbols representing the language
constructs.
o Values for attributes are computed by Semantic Rules associated with
grammar productions.

CS-6660 COMPILER DESIGN 1 VI SEM CSE


• Evaluation of Semantic Rules may:
o Generate code;
o Insert information into the Symbol Table;
o Perform semantic checks;
o Issue error messages;
o etc.
There are two notations for attaching semantic rules:
1. Syntax Directed Definitions. A high-level specification that hides many
implementation details (also called Attribute Grammars).
2. Translation Schemes. More implementation oriented: they indicate the order in which
semantic rules are to be evaluated.
Syntax Directed Definitions:
Syntax Directed Definitions are a generalization of context-free grammars in
which:
1. Grammar symbols have an associated set of Attributes;
2. Productions are associated with Semantic Rules for computing the
values of attributes.
• Such a formalism generates Annotated Parse-Trees, where each
node of the tree is a record with a field for each attribute (e.g., X.a
indicates the attribute a of the grammar symbol X).
• The value of an attribute of a grammar symbol at a given parse-tree
node is defined by a semantic rule associated with the production
used at that node.
We distinguish between two kinds of attributes:
1. Synthesized Attributes. They are computed from the values of the attributes of the
child nodes.
2. Inherited Attributes. They are computed from the values of the attributes of both the
siblings and the parent nodes.
Syntax Directed Definitions: An Example
Example. Let us consider the grammar for arithmetic expressions. The Syntax Directed
Definition associates with each nonterminal a synthesized attribute called val.



Synthesized Attributes: An Example
Grammar:
E -> E * E
E -> E + E
E -> int

PRODUCTION        SEMANTIC RULE
E -> E1 * E2      E.val := E1.val * E2.val
E -> E1 + E2      E.val := E1.val + E2.val
E -> int          E.val := int.val

Fig: Annotated Parse Tree
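To make the bottom-up flow of a synthesized attribute concrete, here is a minimal C sketch, assuming a hand-built tree in which every node stores its own val field (the names Node, Kind, make and eval are hypothetical). Each call to eval first computes the val of the children and only then applies the semantic rule of the production used at that node.

```c
#include <stdlib.h>

/* Hypothetical node layout for the grammar E -> E * E | E + E | int. */
typedef enum { INT_LEAF, PLUS, TIMES } Kind;

typedef struct Node {
    Kind kind;
    int lexval;                /* int.val, used only by INT_LEAF nodes */
    struct Node *left, *right; /* E1 and E2, used by PLUS and TIMES nodes */
    int val;                   /* the synthesized attribute E.val */
} Node;

static Node *make(Kind kind, int lexval, Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    n->kind = kind;
    n->lexval = lexval;
    n->left = left;
    n->right = right;
    n->val = 0;
    return n;
}

/* Computes E.val for every node: children first, then the rule of the node's production. */
static int eval(Node *n) {
    switch (n->kind) {
    case INT_LEAF:
        n->val = n->lexval;                        /* E.val := int.val */
        break;
    case PLUS:
        n->val = eval(n->left) + eval(n->right);   /* E.val := E1.val + E2.val */
        break;
    case TIMES:
        n->val = eval(n->left) * eval(n->right);   /* E.val := E1.val * E2.val */
        break;
    }
    return n->val;
}
```

For the tree of 3*5+4 (with the * node below the + node), eval returns 19 and leaves 15 in the val field of the * node, exactly the values an annotated parse tree would show.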


Inherited Attributes
Inherited Attributes are computed from the values of the attributes of both the siblings
and the parent nodes.
Example for Inherited Attributes



Let us consider the following syntax directed definition, which has both inherited and
synthesized attributes, for a grammar of terms built with the operator *:

Example:
PRODUCTION        SEMANTIC RULES
T -> F T'         T'.inh = F.val
                  T.val = T'.syn
T' -> * F T1'     T1'.inh = T'.inh * F.val
                  T'.syn = T1'.syn
T' -> ε           T'.syn = T'.inh
F -> id           F.val = id.lexval
Let us use this SDD to compute 4*5.

Fig: Annotated Parse Tree
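A predictive (recursive-descent) translator makes this flow of attributes explicit. The C sketch below is one way to realize the SDD above, assuming single-digit operands in place of id (F, Tprime and T are hypothetical function names): the inherited attribute T'.inh becomes a parameter and the synthesized attribute T'.syn becomes the return value.

```c
#include <assert.h>
#include <ctype.h>

/* Input cursor. Assumption: F -> id is simplified to a single decimal digit,
   whose lexval is the digit's value. */
static const char *cur;

/* F -> digit        { F.val = digit.lexval } */
static int F(void) {
    assert(isdigit((unsigned char)*cur));
    return *cur++ - '0';
}

/* T' -> * F T1'     { T1'.inh = T'.inh * F.val;  T'.syn = T1'.syn }
   T' -> epsilon     { T'.syn = T'.inh }
   The inherited attribute is passed in; the synthesized attribute is returned. */
static int Tprime(int inh) {
    if (*cur == '*') {
        cur++;                       /* match '*' */
        int fval = F();
        return Tprime(inh * fval);   /* T1'.inh = T'.inh * F.val */
    }
    return inh;                      /* epsilon: T'.syn = T'.inh */
}

/* T -> F T'          { T'.inh = F.val;  T.val = T'.syn } */
static int T(const char *input) {
    cur = input;
    int fval = F();
    return Tprime(fval);
}
```

T("4*5") returns 20: the first F yields 4, which becomes T'.inh; the second F yields 5, so T1'.inh = 20, and the empty T1' passes 20 back up as T'.syn and finally as T.val.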

DEPENDENCY GRAPHS



A dependency graph depicts the flow of information among the attribute instances of a parse tree:
1) For each node in the parse tree, the dependency graph has a node for each
attribute of that parse-tree node.
2) For each semantic rule of the form X.a = f(Y.a, Z.b, ...), create an edge from the Y.a node
to the X.a node, from Z.b to X.a, and so on. This means Y.a and Z.b must be evaluated
before X.a.
Example: Dependency graph for the annotated parse tree for 3*5

Dotted lines represent the parse tree.

Solid lines represent dependencies.
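Any topological order of the dependency graph is a valid evaluation order. As a minimal sketch, the C code below builds the graph for 3*5 under the T/T'/F definition above (leaving out the lexval leaves for brevity; the enum names and build_graph are hypothetical) and orders it with Kahn's algorithm, one standard topological-sort method.

```c
/* Attribute instances of the annotated parse tree for 3*5:
   T -> F T', T' -> * F T1', T1' -> epsilon. */
enum { F1_VAL, TP_INH, F2_VAL, TP1_INH, TP1_SYN, TP_SYN, T_VAL, N_ATTRS };

/* edge[a][b] = 1 means attribute a must be evaluated before attribute b. */
static int edge[N_ATTRS][N_ATTRS];

static void build_graph(void) {
    edge[F1_VAL][TP_INH] = 1;    /* T'.inh = F.val */
    edge[TP_INH][TP1_INH] = 1;   /* T1'.inh = T'.inh * F.val */
    edge[F2_VAL][TP1_INH] = 1;
    edge[TP1_INH][TP1_SYN] = 1;  /* T1'.syn = T1'.inh (T1' -> epsilon) */
    edge[TP1_SYN][TP_SYN] = 1;   /* T'.syn = T1'.syn */
    edge[TP_SYN][T_VAL] = 1;     /* T.val = T'.syn */
}

/* Kahn's algorithm: stores an evaluation order in order[] and returns the number
   of attributes placed; a result below N_ATTRS means the graph has a cycle. */
static int topo_order(int order[N_ATTRS]) {
    int indeg[N_ATTRS] = {0};
    int queue[N_ATTRS], head = 0, tail = 0, count = 0;

    for (int a = 0; a < N_ATTRS; a++)
        for (int b = 0; b < N_ATTRS; b++)
            indeg[b] += edge[a][b];
    for (int v = 0; v < N_ATTRS; v++)
        if (indeg[v] == 0)
            queue[tail++] = v;       /* attributes that depend on nothing */
    while (head < tail) {
        int a = queue[head++];
        order[count++] = a;
        for (int b = 0; b < N_ATTRS; b++)
            if (edge[a][b] && --indeg[b] == 0)
                queue[tail++] = b;   /* all of b's inputs are now evaluated */
    }
    return count;
}
```

Here the order found is F.val(3), F.val(5), T'.inh, T1'.inh, T1'.syn, T'.syn, T.val; other topological orders would be equally acceptable.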

S-ATTRIBUTED DEFINITIONS

Definition. An S-Attributed Definition is a Syntax Directed Definition that uses only
synthesized attributes.

Evaluation Order. Semantic rules in an S-Attributed Definition can be evaluated by a
bottom-up, or postorder, traversal of the parse tree.
Example. The above arithmetic grammar is an example of an S-Attributed Definition.



The annotated parse tree for the input 3*5+4n (where n marks the end of the input) is:
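During a bottom-up parse, an S-attributed definition can be evaluated with a value stack kept parallel to the parser stack: each reduction pops the attribute values of the right side and pushes the value for the left side. This C sketch assumes the parser has already decided the reductions and encodes them, hypothetically, in postfix form (a digit means "reduce E -> int", and + or * means "reduce E -> E1 op E2"); it is an illustration, not a full LR parser.

```c
#include <assert.h>
#include <ctype.h>

#define STACK_MAX 64

/* val[] runs parallel to the parser's state stack: val[top] holds the
   synthesized attribute of the grammar symbol on top of the stack. */
static int val[STACK_MAX];
static int top = -1;

/* Replays the reductions of a bottom-up parse and returns E.val of the final E. */
static int eval_reductions(const char *reductions) {
    top = -1;
    for (const char *p = reductions; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p)) {
            assert(top + 1 < STACK_MAX);
            val[++top] = *p - '0';                /* E.val := int.val */
        } else {
            assert(top >= 1);
            int e2 = val[top--];                  /* E2.val, top of stack */
            int e1 = val[top];                    /* E1.val, just below it */
            val[top] = (*p == '+') ? e1 + e2      /* E.val := E1.val + E2.val */
                                   : e1 * e2;     /* E.val := E1.val * E2.val */
        }
    }
    assert(top == 0);
    return val[top];
}
```

For 3*5+4 the reductions occur in the order "35*4+", and eval_reductions("35*4+") returns 19, the value at the root of the annotated parse tree.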

L-attributed definition:
Definition: An SDD is L-attributed if each inherited attribute of Xi in the right side of a
production A -> X1 X2 ... Xn depends only on
1. the attributes of X1, X2, ..., Xi-1 (the symbols to the left of Xi in the right side);
2. the inherited attributes of A.

Restrictions for translation schemes:

1. An inherited attribute of Xi must be computed by an action before Xi.

2. An action must not refer to a synthesized attribute of any symbol to the right of
that action.
3. A synthesized attribute of A can only be computed after all attributes it
references have been computed (usually at the end of the right side).

Applications of Syntax-Directed Translations:

1: Construction of Syntax Trees

One useful application of SDDs is the construction of syntax trees. A syntax tree is a
condensed form of a parse tree.



• Syntax trees are useful for representing programming language constructs like
expressions and statements.
• They help compiler design by decoupling parsing from translation.
• Each node of a syntax tree represents a construct; the children of the node represent the
meaningful components of the construct.
• For example, a syntax-tree node representing an expression E1 + E2 has label + and two
children representing the subexpressions E1 and E2.
• Each node is implemented as an object with a suitable number of fields. Each object
has an op field that is the label of the node, with additional fields as follows:

• mkLeaf(num, val) creates a leaf for a number; an additional field holds the lexical
value val of the leaf.
• mkLeaf(id, entry) creates an identifier leaf labeled id; an additional field, entry,
points to the symbol-table entry for the identifier.
• mkNode(op, left, right) creates an interior node labeled op; an interior node has as
many fields as it has children in the syntax tree (here, left and right).

Example: The S-attributed definition in the figure below constructs syntax trees for a simple
expression grammar involving only the binary operators + and -. As usual, these
operators are at the same precedence level and are jointly left associative. All
nonterminals have one synthesized attribute node, which represents a node of the syntax
tree.



Syntax tree for a-4+c using the above SDD is shown below.

Steps in the construction of the syntax tree for a-4+c


If the rules are evaluated during a post order traversal of the parse tree, or with reductions
during a bottom-up parse, then the sequence of steps shown below ends with p5 pointing
to the root of the constructed syntax tree.
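The construction can be replayed in C. This is a minimal sketch: since C has no overloading, the single mkLeaf of the notes is split into two hypothetical functions, mkLeafNum and mkLeafId, and Entry is a placeholder for a real symbol-table entry. The order of calls assumes the usual sequence for a-4+c: leaves for a and 4, the - node, the leaf for c, then the + node (p1 to p5).

```c
#include <stdlib.h>

/* Placeholder symbol-table entry; a real compiler stores more attributes. */
typedef struct { const char *name; } Entry;

typedef struct SNode {
    int op;                      /* label: '+', '-', 'n' (num leaf) or 'i' (id leaf) */
    int numval;                  /* lexical value of a num leaf */
    Entry *entry;                /* symbol-table entry of an id leaf */
    struct SNode *left, *right;  /* children of an interior node */
} SNode;

static SNode *alloc_node(int op) {
    SNode *n = calloc(1, sizeof *n);   /* zero-fills the fields this node does not use */
    n->op = op;
    return n;
}

/* mkLeaf(num, val) */
static SNode *mkLeafNum(int numval) {
    SNode *n = alloc_node('n');
    n->numval = numval;
    return n;
}

/* mkLeaf(id, entry) */
static SNode *mkLeafId(Entry *entry) {
    SNode *n = alloc_node('i');
    n->entry = entry;
    return n;
}

/* mkNode(op, left, right) */
static SNode *mkNode(int op, SNode *left, SNode *right) {
    SNode *n = alloc_node(op);
    n->left = left;
    n->right = right;
    return n;
}

/* The steps for a-4+c, in the order a postorder walk or bottom-up parse performs them. */
static SNode *build_a_minus_4_plus_c(Entry *a, Entry *c) {
    SNode *p1 = mkLeafId(a);
    SNode *p2 = mkLeafNum(4);
    SNode *p3 = mkNode('-', p1, p2);
    SNode *p4 = mkLeafId(c);
    SNode *p5 = mkNode('+', p3, p4);
    return p5;   /* root of the syntax tree */
}
```

The returned root is the + node whose left child is the - node, reflecting the left associativity of the two operators.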



Constructing Syntax Trees during Top-Down Parsing
With a grammar designed for top-down parsing, the same syntax trees are constructed,
using the same sequence of steps, even though the structure of the parse trees differs
significantly from that of syntax trees. The L-attributed definition below performs the
same translation as the S-attributed definition shown before.
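With a top-down parser, the partially built tree travels down as an inherited attribute. The C sketch below assumes the usual L-attributed rules (E -> T E' with E'.inh = T.node; E' -> + T E1' with E1'.inh = new Node('+', E'.inh, T.node); E' -> ε with E'.syn = E'.inh), single-character ids and nums, and no parentheses; TNode, node, T, Eprime and E are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

typedef struct TNode {
    char label;                  /* '+', '-', or the character of an id/num leaf */
    struct TNode *left, *right;
} TNode;

static TNode *node(char label, TNode *left, TNode *right) {
    TNode *n = malloc(sizeof *n);
    n->label = label;
    n->left = left;
    n->right = right;
    return n;
}

static const char *in;   /* input cursor */

/* T -> id | num   { T.node = new Leaf(...) } */
static TNode *T(void) {
    assert(isalnum((unsigned char)*in));
    char c = *in++;
    return node(c, NULL, NULL);
}

/* E' -> + T E1'  { E1'.inh = new Node('+', E'.inh, T.node); E'.syn = E1'.syn }
   E' -> - T E1'  { the same with '-' }
   E' -> epsilon  { E'.syn = E'.inh } */
static TNode *Eprime(TNode *inh) {
    if (*in == '+' || *in == '-') {
        char op = *in++;
        TNode *t = T();
        return Eprime(node(op, inh, t));
    }
    return inh;
}

/* E -> T E'   { E'.inh = T.node; E.node = E'.syn } */
static TNode *E(const char *input) {
    in = input;
    return Eprime(T());
}
```

E("a-4+c") builds the same tree as the bottom-up version: the - node is created first and then passed down as the left operand of the + node, so left associativity is preserved even though the grammar has no left recursion.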



Type Checking:

A compiler must check that the source program follows both the syntactic and semantic
conventions of the source language.
This checking is called static checking.
Examples of static checks are:
1. Type checks.
2. Flow-of-control checks
3. Uniqueness checks
4. Name-related checks
A type checker verifies that the type of a construct matches that expected by its context.



Type information gathered by a type checker may be needed when code is generated.
Example: Arithmetic operators like + usually apply to either integers or reals.
TYPE SYSTEMS:
TYPE EXPRESSIONS:
The type of a language construct is denoted by a “type expression”.
A type expression is either a basic type or is formed by applying an operator called a type
constructor to other type expressions.
Type Expressions Definition:
1. A basic type is a type expression. Example: char, integer.
2. A type name is a type expression (for example, a name given to a type by a type
declaration).
3. A type constructor applied to type expressions is a type expression. Constructors
include:
a. Arrays
b. Products
c. Records
d. Pointers
e. Functions
4. Type expressions may contain variables whose values are type expressions.
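A type expression is naturally a tree whose interior nodes are constructors and whose leaves are basic types. The C sketch below is one possible representation (records are omitted for brevity; TypeTag, Type and the constructor functions are hypothetical names). It also includes same_type, a structural-equivalence test, since the unit outline lists equivalence of type expressions: two expressions are treated as equivalent when they apply the same constructor to equivalent components.

```c
#include <stdlib.h>

/* Basic types and type constructors (records are left out of this sketch). */
typedef enum { TY_CHAR, TY_INTEGER, TY_BOOLEAN, TY_ERROR,
               TY_ARRAY, TY_PRODUCT, TY_POINTER, TY_FUNCTION } TypeTag;

typedef struct Type {
    TypeTag tag;
    int lo, hi;          /* index range of an array, e.g. 1..256 */
    struct Type *t1;     /* array element, pointee, left of s x t, or function domain */
    struct Type *t2;     /* right of s x t, or function range */
} Type;

static Type *mk(TypeTag tag, int lo, int hi, Type *t1, Type *t2) {
    Type *t = malloc(sizeof *t);
    t->tag = tag;
    t->lo = lo;
    t->hi = hi;
    t->t1 = t1;
    t->t2 = t2;
    return t;
}

static Type *basic(TypeTag tag)                { return mk(tag, 0, 0, NULL, NULL); }
static Type *array(int lo, int hi, Type *elem) { return mk(TY_ARRAY, lo, hi, elem, NULL); }
static Type *product(Type *s, Type *t)         { return mk(TY_PRODUCT, 0, 0, s, t); }
static Type *pointer(Type *t)                  { return mk(TY_POINTER, 0, 0, t, NULL); }
static Type *function(Type *dom, Type *range)  { return mk(TY_FUNCTION, 0, 0, dom, range); }

/* Structural equivalence: same constructor applied to equivalent components. */
static int same_type(const Type *s, const Type *t) {
    if (s->tag != t->tag)
        return 0;
    switch (s->tag) {
    case TY_ARRAY:
        return s->lo == t->lo && s->hi == t->hi && same_type(s->t1, t->t1);
    case TY_POINTER:
        return same_type(s->t1, t->t1);
    case TY_PRODUCT:
    case TY_FUNCTION:
        return same_type(s->t1, t->t1) && same_type(s->t2, t->t2);
    default:
        return 1;        /* basic types: equal tags suffice */
    }
}
```

For example, array(1, 256, basic(TY_CHAR)) builds array(1..256, char), and function(product(basic(TY_INTEGER), basic(TY_INTEGER)), basic(TY_INTEGER)) builds integer x integer -> integer.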
Type Systems:
• A type system is a collection of rules for assigning type expressions to the various
parts of the program.
• A type checker implements a type system.
Static and Dynamic checking of types:
• Checking done by the compiler is said to be static, while checking done when the
target program runs is termed dynamic.



• A sound type system eliminates the need for dynamic checking for type errors,
because it allows us to determine statically that these errors cannot occur when
the target program runs.
Error Recovery:
• It is desirable for the type checker to recover from errors, so that it can check the
rest of the input.
• The rules for the type checker must cope with errors.
Specification of a simple type checker:
• We specify a type checker for a simple language in which the type of each
identifier must be declared before the identifier is used.
• The type checker is a translation scheme that synthesizes the type of each
expression from the types of its subexpressions.
• The type checker can handle arrays, pointers, statements and functions.
A Simple Language:
• Consider the grammar given below:

• The starting symbol is P. P generates a sequence of declarations followed by a
single expression E.
• With the help of the grammar we can generate the following code:

• The language has two basic data types, char and integer; a third basic type,
type_error, is used to signal errors.
• For example, array [256] of char leads to the type expression array(1..256, char).
• The translation scheme for type checking is given below:

• In the translation scheme, the action associated with the production
D -> id : T saves a type in the symbol-table entry for an identifier.
• The action addtype(id.entry, T.type) is applied to the synthesized attribute entry,
which points to the symbol-table entry for id, and to the type expression represented
by the synthesized attribute type of nonterminal T.
• If T generates char or integer, then T.type is defined to be char or integer, respectively.
Type Checking of Expressions:
• The semantic rules given below say that constants represented by the tokens literal
and num have type char and integer, respectively:

• When an identifier appears in an expression, its declared type is fetched and
assigned to the attribute type.

• Similarly, the semantic rules for arrays and pointers are given below:
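Since the rule figures are not reproduced here, the C sketch below assumes they take the usual textbook form: literal gives char, num gives integer, E1[E2] gives t when E2 is integer and E1 is array(s, t), and a pointer dereference of E1 gives t when E1 is pointer(t); anything else gives type_error. Array bounds are dropped from the representation, and all names (Type, t_char, check_index and so on) are hypothetical.

```c
#include <stddef.h>

typedef enum { TY_CHAR, TY_INTEGER, TY_ERROR, TY_ARRAY, TY_POINTER } TypeTag;

typedef struct Type {
    TypeTag tag;
    const struct Type *elem;   /* t in array(s, t) or pointer(t); NULL for basic types */
} Type;

/* Shared objects for the basic types. */
static const Type t_char = {TY_CHAR, NULL};
static const Type t_integer = {TY_INTEGER, NULL};
static const Type t_error = {TY_ERROR, NULL};

/* E -> literal   { E.type := char } */
static const Type *check_literal(void) { return &t_char; }

/* E -> num       { E.type := integer } */
static const Type *check_num(void) { return &t_integer; }

/* E -> E1 [ E2 ] { E.type := if E2.type = integer and E1.type = array(s, t) then t else type_error } */
static const Type *check_index(const Type *e1, const Type *e2) {
    return (e2->tag == TY_INTEGER && e1->tag == TY_ARRAY) ? e1->elem : &t_error;
}

/* Dereference of E1 { E.type := if E1.type = pointer(t) then t else type_error } */
static const Type *check_deref(const Type *e1) {
    return (e1->tag == TY_POINTER) ? e1->elem : &t_error;
}
```

Each function receives the already synthesized types of the subexpressions and returns the synthesized type of the whole expression, which is how the translation scheme works bottom-up.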



Type Checking of Statements:
The translation scheme for checking the type of statements is given below:

• The statements we consider are assignment, conditional and while statements.
• Sequences of statements are separated by semicolons.
• The first rule checks that the left and right sides of an assignment statement have
the same type.
• The second and third rules specify that the expressions in conditional and while
statements must have type boolean.
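The statement rules can be sketched in C as well. This assumes the common textbook convention, not shown in the figure here, that a well-typed statement gets the special type void and a sequence S1 ; S2 is void only when both parts are; the Ty enum and the check_ functions are hypothetical names.

```c
/* Tags only: these statement rules never look inside a constructed type. */
typedef enum { TY_VOID, TY_BOOLEAN, TY_INTEGER, TY_CHAR, TY_ERROR } Ty;

/* S -> id := E   { S.type := if id.type = E.type then void else type_error }
   The extra TY_ERROR test (an added assumption) keeps an earlier error from
   turning into void when both sides are already type_error. */
static Ty check_assign(Ty id_type, Ty e_type) {
    return (id_type == e_type && id_type != TY_ERROR) ? TY_VOID : TY_ERROR;
}

/* S -> if E then S1  and  S -> while E do S1
   { S.type := if E.type = boolean then S1.type else type_error } */
static Ty check_cond(Ty e_type, Ty s1_type) {
    return (e_type == TY_BOOLEAN) ? s1_type : TY_ERROR;
}

/* S -> S1 ; S2   { S.type := if S1.type = void and S2.type = void then void else type_error } */
static Ty check_seq(Ty s1_type, Ty s2_type) {
    return (s1_type == TY_VOID && s2_type == TY_VOID) ? TY_VOID : TY_ERROR;
}
```

For instance, a while loop whose condition has type integer is reported as type_error, and that error then propagates through any sequence containing it.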

RUN TIME ENVIRONMENTS:


• The allocation and deallocation of data objects is managed by the run-time
support package, consisting of routines loaded with the generated target code.
• The design of the run-time support package is influenced by the semantics of
procedures.
• Each execution of a procedure is referred to as an activation of the procedure.
• If the procedure is recursive, several of its activations may be alive at the same
time.
• Each call of a procedure leads to an activation that may manipulate data objects
allocated for its use.



Source Language Issues:
Procedures:
• A procedure definition is a declaration that, in its simplest form, associates an
identifier with a statement.
• The identifier is the procedure name and the statement is the procedure body. A
complete program is also treated as a procedure.
• When a procedure name appears within an executable statement, we say that the
procedure is called at that point. The procedure call executes the procedure body.
• Arguments, called actual parameters, may be passed to a called procedure.
Activation Trees:
• Each execution of a procedure body is referred to as an activation of the
procedure.
• The lifetime of an activation of a procedure p is the sequence of steps between the
first and last steps in the execution of the procedure body.
• A recursive procedure p need not call itself directly; p may call another procedure
q, which may then call p through some sequence of procedure calls. We can use a
tree, called an activation tree, to depict the way control enters and leaves
activations. In an activation tree:
1. each node represents an activation of a procedure;
2. the root represents the activation of the main program;
3. the node for a is the parent of the node for b if and only if control flows from
activation a to activation b;
4. the node for a is to the left of the node for b if and only if the lifetime of a occurs
before the lifetime of b.

Example:

Control Stacks:
• The flow of control in a program corresponds to a depth-first traversal of the
activation tree that starts at the root, visits a node before its children, and
recursively visits the children of each node in left-to-right order.
• We can use a stack, called a control stack, to keep track of live procedure
activations. The idea is to push the node for an activation onto the control stack as
the activation begins and to pop the node when the activation ends. When node n
is at the top of the control stack, the stack contains the nodes along the path from
n to the root.
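The push/pop discipline can be simulated directly. In this minimal C sketch (fact, push_activation, pop_activation and the fixed MAX_DEPTH are hypothetical), each call of a recursive factorial is one activation: its node is pushed when the activation begins and popped when it ends, so the deepest stack equals the longest root-to-leaf path of the activation tree.

```c
#include <assert.h>

#define MAX_DEPTH 64

/* Control stack of live activations; each entry records the argument of an activation of fact. */
static int control_stack[MAX_DEPTH];
static int depth = 0;       /* number of live activations */
static int max_depth = 0;   /* deepest point reached */

static void push_activation(int arg) {
    assert(depth < MAX_DEPTH);
    control_stack[depth++] = arg;
    if (depth > max_depth)
        max_depth = depth;
}

static void pop_activation(void) {
    depth--;
}

/* Each call is one activation: pushed when it begins, popped when it ends. */
static int fact(int n) {
    push_activation(n);
    int result = (n <= 1) ? 1 : n * fact(n - 1);
    pop_activation();
    return result;
}
```

After fact(5) returns 120, depth is back to 0 and max_depth is 5: while fact(1) was active, the stack held the activations fact(5) down to fact(1), i.e. the path from that node back to the root of this part of the activation tree.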

The Scope of a Declaration:


• The scope rules of a language determine which declaration of a name applies
when the name appears in the text of a program.
• The portion of the program to which a declaration applies is called the scope of
that declaration.



• An occurrence of a name in a procedure is said to be local to the procedure if it is
in the scope of a declaration within the procedure; otherwise the occurrence is
said to be nonlocal.
Bindings of Names:
• Even if each name is declared once in a program, the same name may denote
different data objects at run time.
• The term environment refers to a function that maps a name to a storage location.
• When an environment associates storage location s with a name x, we say that x is
bound to s; the association itself is referred to as a binding of x.
STORAGE ORGANISATION:
Subdivision of run-time memory:
• The compiler obtains a block of storage from the operating system for the
compiled program to run in.
• The run-time storage is subdivided to hold:
1. The generated target code,
2. Data objects, and
3. A counterpart of the control stack to keep track of procedure activations.
• The size of the generated target code is fixed at compile time, so the compiler can
place it in a statically determined area, in the lower end of memory.
• The sizes of some data objects may also be known at compile time, so these objects
can also be placed in a statically determined area, as shown in the figure below:



Typical subdivision of run-time memory into code and data area
• Data objects whose lifetimes are contained in that of an activation can be allocated on
the stack, along with other information associated with the activation.
• A separate area of run-time memory, called a heap, holds all other information.
• The sizes of the stack and the heap can change as the program executes.
Activation Records:
• Procedure calls and returns are usually managed by a run-time stack called the
control stack. Each live activation has an activation record (sometimes called a
frame) on the control stack.
• An activation record is a block of memory used for managing the information needed
by a single execution of a procedure.
• Fortran uses the static data area to store activation records. In Pascal and C,
activation records are kept in the stack area.
• The contents of the activation record are as shown in the figure below:



• Return value - stores the value that the called function returns to the calling
function after its execution.
• Actual parameters - the values supplied by the calling function as input to the
called function.
• Optional control link - points to the activation record of the caller.
This is very useful in case of recursion.
• Optional access link - used to access nonlocal data held in other activation records;
with static scoping it points to the activation record of the enclosing procedure.
• Machine status - the state of the machine just before the call, such as the program
counter (return address) and machine registers, restored when the call returns.
• Local data - stores the local data of the called function.
• Temporary values, such as those arising from the evaluation of
expressions, in cases where those temporaries cannot be held in registers.
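The fields listed above can be modeled as a C struct, as a minimal sketch only: a real compiler lays out activation records in raw stack memory and computes the sizes per procedure, whereas the fixed array sizes, ActivationRecord, MachineStatus and enter here are hypothetical.

```c
#include <stddef.h>

/* Hypothetical fixed sizes; a compiler computes the real ones per procedure. */
#define MAX_PARAMS   4
#define MAX_LOCALS   8
#define MAX_TEMPS    4
#define N_SAVED_REGS 4

typedef struct {
    const void *return_address;      /* saved program counter of the caller */
    long saved_regs[N_SAVED_REGS];   /* machine registers to restore on return */
} MachineStatus;

/* One field group per entry in the list above, in the same order. */
typedef struct ActivationRecord {
    int return_value;
    int actual_params[MAX_PARAMS];
    struct ActivationRecord *control_link;   /* caller's activation record */
    struct ActivationRecord *access_link;    /* enclosing procedure's record, for nonlocals */
    MachineStatus machine_status;
    int local_data[MAX_LOCALS];
    int temporaries[MAX_TEMPS];
} ActivationRecord;

/* Fills in the parts of the callee's record that depend on the call:
   the actual parameters and the control link back to the caller. */
static void enter(ActivationRecord *callee, ActivationRecord *caller,
                  const int *args, int nargs) {
    for (int i = 0; i < nargs && i < MAX_PARAMS; i++)
        callee->actual_params[i] = args[i];
    callee->control_link = caller;
    callee->access_link = NULL;   /* assumption: no nested procedures in this example */
}
```

Following control_link from any record walks back through the callers, which is exactly the chain of live activations on the control stack.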



Storage Allocation Strategies:

1. Static Allocation:
• The size of data objects is known at compile time, and the names of these objects
are bound to storage at compile time.
• The binding of names to the storage allocated does not change at run time;
hence the name static allocation.
• In static allocation, the compiler can determine the amount of storage required
by each data object.
o At compile time, the compiler can fill in the addresses at which the target code
can find the data it operates on.
Limitations of Static Allocation:
o Static allocation can be done only if the size of each data object is
known at compile time.
o Data structures cannot be created dynamically.
o Recursive procedures are not supported by this type of allocation.
2. Stack Allocation:
• In the stack allocation strategy, storage is organized as a stack, also called the
control stack.
• As an activation begins, its activation record is pushed onto the stack, and on
completion of the activation the corresponding activation record is popped.
• The locals are stored in each activation record.
• Data structures can be created dynamically with stack allocation.
Limitations of stack allocation:
• Memory addressing must be done through pointers and index registers;
hence this type of allocation is slower than static allocation.
3. Heap allocation:
• Heap allocation allocates a contiguous block of memory when required
for storage of activation records or other data objects. This allocated memory
can be deallocated when the activation ends, and the deallocated space can be
reused by the heap manager.
• Efficient heap management can be done by:
i. Creating a linked list of the free blocks; when any memory is
deallocated, that block of memory is appended to the linked list.
ii. Allocating the most suitable block of memory from the linked list, i.e., using
the best-fit technique for allocation of blocks.
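Steps (i) and (ii) can be sketched in C. This is a minimal illustration, assuming free blocks are described by caller-supplied list nodes holding an offset and a size (Block, best_fit_alloc and heap_free are hypothetical names), with no coalescing of neighbouring free blocks.

```c
#include <stddef.h>

/* Hypothetical free-list node describing one free block of the heap. */
typedef struct Block {
    size_t start, size;
    struct Block *next;
} Block;

/* Best fit: choose the smallest free block whose size is at least `request`.
   Carves the request off the front of that block and returns its start offset,
   or (size_t)-1 when no free block is large enough. */
static size_t best_fit_alloc(Block **free_list, size_t request) {
    Block **best = NULL;
    for (Block **pp = free_list; *pp != NULL; pp = &(*pp)->next)
        if ((*pp)->size >= request && (best == NULL || (*pp)->size < (*best)->size))
            best = pp;
    if (best == NULL)
        return (size_t)-1;
    Block *b = *best;
    size_t addr = b->start;
    b->start += request;
    b->size -= request;
    if (b->size == 0)
        *best = b->next;   /* exact fit: unlink the exhausted block */
    return addr;
}

/* Deallocation as in step (i): the freed block is appended to the free list. */
static void heap_free(Block **free_list, Block *freed) {
    freed->next = NULL;
    while (*free_list != NULL)
        free_list = &(*free_list)->next;
    *free_list = freed;
}
```

With free blocks of sizes 80, 20 and 50, a request for 30 is taken from the 50-byte block, the tightest one that fits, leaving the larger 80-byte block intact for bigger requests.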



PARAMETER PASSING

A language has first-class functions if functions can be declared within any scope, passed
as arguments to other functions, and returned as results of functions. In a language with
first-class functions and static scope, a function value is generally represented by a closure:
a pair consisting of a pointer to the function code and a pointer to an activation record.
Passing functions as arguments is very useful in structuring systems using upcalls.

An example:
main()
{ int x = 4;
  int f(int y) {
    return x*y;
  }
  int g(int→int h) {
    int x = 7;
    return h(3) + x;
  }
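Standard C has no nested functions, but the closure representation described above can be spelled out by hand. The example stops before main calls g, so this sketch assumes main ends with a call g(f); MainFrame, Closure, f_code, call_closure and run_main are hypothetical names, and MainFrame stands in for main's activation record.

```c
/* Stand-in for main's activation record: only the variable x that f uses. */
typedef struct { int x; } MainFrame;

/* A closure: a pointer to the function code plus a pointer to an activation record. */
typedef struct {
    int (*code)(const MainFrame *env, int y);
    const MainFrame *env;
} Closure;

/* Code of f: the nonlocal x is reached through the environment pointer. */
static int f_code(const MainFrame *env, int y) { return env->x * y; }

static int call_closure(Closure h, int arg) { return h.code(h.env, arg); }

/* g's own x = 7 is a different variable from the x that h sees. */
static int g(Closure h) {
    int x = 7;
    return call_closure(h, 3) + x;
}

/* Assumed completion of main: pass f, as a closure, to g. */
static int run_main(void) {
    MainFrame frame = { 4 };            /* int x = 4; */
    Closure f = { f_code, &frame };
    return g(f);                        /* static scope: 4*3 + 7 */
}
```

Under static scope the x inside f is main's x (4), not g's x (7), so run_main returns 4*3 + 7 = 19; the environment pointer in the closure is what makes this possible.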



Call-by-Value
The actual parameters are evaluated and their r-values are passed to the called procedure.
A procedure called by value can affect its caller either through nonlocal names or through
pointers. Parameters in C are always passed by value. Arrays are the unusual case: what is
passed by value is a pointer to the first element. Pascal uses pass by value by default, but var
parameters are passed by reference.

Call-by-Reference
Also known as call-by-address or call-by-location. The caller passes to the called
procedure the l-value of the parameter.
If the parameter is an expression, the expression is evaluated and stored in a new
location, and the address of the new location is passed.
Parameters in Fortran are passed by reference. Some old Fortran implementations also
passed constants this way, which led to a well-known bug:
func(a,b) { a = b; }
call func(3,4); print(3);
Because the call overwrites the location holding the constant 3, print(3) can print 4.
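The difference between the two mechanisms shows up in a swap routine. C has no reference parameters, so this sketch simulates call-by-reference by passing l-values (addresses) explicitly; it also shows the array case mentioned under call-by-value. The function names are illustrative.

```c
/* Call-by-value: a and b are copies of the caller's r-values. */
static void swap_by_value(int a, int b) {
    int t = a;
    a = b;
    b = t;          /* only the local copies are exchanged */
}

/* Call-by-reference, simulated in C by passing l-values (addresses). */
static void swap_by_reference(int *a, int *b) {
    int t = *a;
    *a = *b;
    *b = t;
}

/* An array argument becomes a pointer that is itself passed by value,
   so the callee can still change the caller's elements. */
static void clear_first(int arr[]) {
    arr[0] = 0;
}
```

After swap_by_value(x, y) the caller's x and y are unchanged, while swap_by_reference(&x, &y) really exchanges them.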

Copy-Restore
A hybrid between call-by-value and call-by-reference. The actual parameters are
evaluated and their r-values are passed, as in call-by-value. In addition, the l-values of the
actual parameters are determined before the call. When control returns, the current r-values
of the formal parameters are copied back into the l-values of the actual parameters.
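Copy-restore and call-by-reference differ only when the callee can reach the actual parameter through another name. In this C sketch the callee body is "x = 2; a = 0;" with a global a passed as the actual parameter; the copy-restore version simulates the copy-in and copy-out steps by hand, and the procedure names are hypothetical.

```c
static int a;   /* global that is also passed as the actual parameter */

/* Call-by-reference: x is the address of a, so x and a are aliases. */
static void p_by_reference(int *x) {
    *x = 2;
    a = 0;      /* overwrites the 2 just stored through x */
}

/* Copy-restore: the r-value is copied in, and the final value of the
   formal is copied back to the saved l-value when the body finishes. */
static void p_copy_restore(int *actual) {
    int x = *actual;   /* copy in */
    x = 2;
    a = 0;
    *actual = x;       /* copy out: restores 2 into a */
}
```

Starting from a = 1, the call p_by_reference(&a) leaves a equal to 0, whereas p_copy_restore(&a) leaves a equal to 2.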

Call-by-Name
The actual parameters are literally substituted for the formals. This is like a macro
expansion or in-line expansion. Call-by-name is not used in practice; however, the
conceptually related technique of in-line expansion is commonly used. In-lining may be one
of the most effective optimization transformations if it is guided by
execution profiles.
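Call-by-name is usually implemented with thunks: small procedures that re-evaluate the actual-parameter expression in the caller's environment each time the formal parameter is used. The sketch below is the classic "Jensen's device", passing arr[i] by name to a summing routine; Thunk, Env, arr_i and sum_by_name are hypothetical names.

```c
/* A thunk packages the actual-parameter expression with the caller's environment. */
typedef struct {
    int (*eval)(void *env);
    void *env;
} Thunk;

/* Hypothetical caller environment for the actual parameter arr[i]. */
typedef struct {
    const int *arr;
    const int *i;
} Env;

static int arr_i(void *p) {
    const Env *e = p;
    return e->arr[*e->i];      /* evaluates arr[i] with the current value of i */
}

/* sum(i, n, x) with x passed by name: x is re-evaluated on every iteration. */
static int sum_by_name(int *i, int n, Thunk x) {
    int total = 0;
    for (*i = 0; *i < n; (*i)++)
        total += x.eval(x.env);
    return total;
}
```

With arr = {1, 2, 3} the call returns 6, because each use of x sees the updated i. Had x been passed by value with i initially 0, arr[0] would have been evaluated once and the loop would add 1 three times.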



SYMBOL TABLES
A symbol table is a major data structure used in a compiler. It associates attributes
with the identifiers used in a program; for instance, a type attribute is usually associated with
each identifier.
• A symbol table is a necessary component because the definition (declaration) of an
identifier appears once in a program, while uses of the identifier may appear in many
places of the program text. Identifiers and their attributes are entered by the analysis
phases when processing a definition (declaration) of an identifier.
• In simple languages with only global variables and implicit declarations, the
scanner can enter an identifier into the symbol table if it is not already there. In
block-structured languages with scopes and explicit declarations, the parser and/or
semantic analyzer enter identifiers and their attributes.
• Symbol table information is used by the analysis and synthesis phases:
o to verify that used identifiers have been defined (declared);
o to verify that expressions and assignments are semantically correct (type
checking);
o to generate intermediate or target code.

Symbol Table Interface


The basic operations defined on a symbol table include:
• allocate – to allocate a new empty symbol table
• free – to remove all entries and free the storage of a symbol table
• insert – to insert a name in a symbol table and return a pointer to its entry
• lookup – to search for a name and return a pointer to its entry
• set_attribute – to associate an attribute with a given entry
• get_attribute – to get an attribute associated with a given entry
Other operations can be added depending on requirements.
For example, a delete operation removes a name previously inserted, since some
identifiers become invisible (out of scope) after exiting a block.
• This interface provides an abstract view of a symbol table:
o it supports the simultaneous existence of multiple tables;
o the implementation can vary without modifying the interface.
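One way to realize this interface in C is sketched below, assuming an unordered linked list as the implementation and a fixed set of integer-valued attributes; the st_ prefix, Entry, SymTab and the attribute kinds are hypothetical. Because callers only use the six functions, the list could later be replaced by a hash table without changing them.

```c
#include <stdlib.h>
#include <string.h>

enum { ATTR_TYPE, ATTR_OFFSET, N_ATTRS };   /* hypothetical attribute kinds */

typedef struct Entry {
    char *name;
    long attr[N_ATTRS];
    struct Entry *next;
} Entry;

typedef struct { Entry *head; } SymTab;

/* allocate: a new empty symbol table */
static SymTab *st_allocate(void) { return calloc(1, sizeof(SymTab)); }

/* free: remove all entries and the table itself */
static void st_free(SymTab *t) {
    Entry *e = t->head;
    while (e != NULL) {
        Entry *next = e->next;
        free(e->name);
        free(e);
        e = next;
    }
    free(t);
}

/* lookup: pointer to the entry for name, or NULL if absent */
static Entry *st_lookup(SymTab *t, const char *name) {
    for (Entry *e = t->head; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* insert: add name at the front of the list (O(1)) and return its entry */
static Entry *st_insert(SymTab *t, const char *name) {
    Entry *e = calloc(1, sizeof *e);
    e->name = malloc(strlen(name) + 1);
    strcpy(e->name, name);
    e->next = t->head;
    t->head = e;
    return e;
}

static void st_set_attribute(Entry *e, int attr, long value) { e->attr[attr] = value; }
static long st_get_attribute(const Entry *e, int attr) { return e->attr[attr]; }
```

A declaration such as int count; would typically trigger st_insert followed by st_set_attribute for its type, and each later use of count a st_lookup.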
Basic Implementation Techniques
• The first consideration is how to insert and look up names.
• There is a variety of implementation techniques:
o Unordered list
- Simplest to implement; implemented as an array or a linked list.
- A linked list can grow dynamically, which alleviates the problem of a fixed-size array.
- Insertion is fast, O(1), but lookup is slow for large tables, O(n) on average.
o Ordered list
- If an array is sorted, it can be searched using binary search in O(log2 n).
- Insertion into a sorted array is expensive, O(n) on average.
- Useful when the set of names is known in advance, e.g. a table of reserved words.
o Binary search tree
- Can grow dynamically.
- Insertion and lookup are O(log2 n) on average.
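The ordered-list case is easy to show with the C standard library's bsearch over a table of reserved words known in advance. The keyword list here is a hypothetical small subset, kept sorted because binary search requires it.

```c
#include <stdlib.h>
#include <string.h>

/* Reserved words known in advance, kept in sorted order for binary search. */
static const char *const keywords[] = {
    "char", "else", "for", "if", "int", "return", "while"
};

static int cmp_keyword(const void *key, const void *elem) {
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns 1 if word is reserved; uses O(log2 n) comparisons. */
static int is_keyword(const char *word) {
    return bsearch(word, keywords, sizeof keywords / sizeof keywords[0],
                   sizeof keywords[0], cmp_keyword) != NULL;
}
```

A scanner could call is_keyword on each identifier-shaped lexeme to decide whether it is a reserved word or a user identifier.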

DATA STRUCTURE FOR SYMBOL TABLE


Symbol Table Management:
1. List:

• The simplest and easiest data structure to implement for a symbol table is a linear
list of records.
• A single array, or a collection of several arrays, is used to store names and their
associated information. Names are added to the end of the array; the end of the array
is always marked by a pointer known as 'space'.
• When we insert a name into this list, the whole array is first searched from
'space' back to the beginning of the array. If the name is not found, we create an entry
at 'space' and increment 'space' by one (or by the size of the entry).
• In this organization, insert() and lookup() are the major operations, while
begin_scope() and end_scope() are minor operations in a simple table; each entry also
holds fields such as a 'token type' attribute.
• In this implementation, the first entry of the symbol table is always left empty,
because lookup returns 0 to indicate that a string is not in the symbol table.
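The list organization just described can be sketched in C. This assumes a fixed-size array of name pointers with slot 0 left empty so that index 0 can mean "not found"; names, space, lookup and insert are hypothetical names, and only the name field of each record is modeled.

```c
#include <string.h>

#define TABLE_SIZE 100

/* Linear-list symbol table; slot 0 is never used, so index 0 means "not found". */
static const char *names[TABLE_SIZE];
static int space = 1;    /* next free slot, i.e. the end of the list */

/* Searches from the most recent entry (space - 1) back to the beginning. */
static int lookup(const char *name) {
    for (int i = space - 1; i >= 1; i--)
        if (strcmp(names[i], name) == 0)
            return i;
    return 0;
}

/* Inserts name at 'space' unless it is already present; returns its index. */
static int insert(const char *name) {
    int i = lookup(name);
    if (i != 0)
        return i;
    if (space >= TABLE_SIZE)
        return 0;                 /* table full */
    names[space] = name;          /* the caller keeps the string alive */
    return space++;
}
```

Both operations scan the list linearly, so with n names they take O(n) time, which is what motivates the hash table and search tree organizations.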

2. Self Organizing List


To reduce the search time, we can add an additional field, a 'linker', to each
record (or each array index). When a name is inserted, it is placed at 'space' and the
linkers to the other existing names are updated.

In the figure, (a) represents the simple list and (b) represents the self-organizing list, in
which Id1 is linked to Id2 and Id3 is linked to Id1.

3. Hash Table

A hash table, or hash map, is a data structure that associates keys with values.
Open hashing is the scheme applied to the hash table here; with open hashing, there is no
limit on the number of entries that can be made in the table.

• The main advantage of a hash table over a linear list is search time: if n names are
stored in n memory locations and searched linearly, each search takes O(n) time.

• Using a hash function, any name can be searched for in O(1) time on average. However,
the rare worst-case lookup time can be as bad as O(n). A good hash function is essential
for good hash-table performance.

• A poor choice of hash function is likely to lead to clustering, in which keys map to the
same hash bucket (i.e., a collision) with high probability. One organization of a hash
table that resolves collisions is chaining.

• The structure of the hash table looks as follows:
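A minimal C sketch of open hashing with chaining follows. The bucket count (a prime, 211) and the multiplicative string hash are illustrative choices rather than the only good ones, and Sym, hash, hlookup and hinsert are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211   /* a prime bucket count, a common choice */

typedef struct Sym {
    char *name;
    struct Sym *next;  /* next symbol in the same bucket (the chain) */
} Sym;

static Sym *bucket[NBUCKETS];

/* A simple multiplicative string hash, reduced modulo the bucket count. */
static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

static Sym *hlookup(const char *name) {
    for (Sym *p = bucket[hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

static Sym *hinsert(const char *name) {
    Sym *p = hlookup(name);
    if (p != NULL)
        return p;              /* already present */
    unsigned h = hash(name);
    p = malloc(sizeof *p);
    p->name = malloc(strlen(name) + 1);
    strcpy(p->name, name);
    p->next = bucket[h];       /* colliding names share one chain */
    bucket[h] = p;
    return p;
}
```

Since chains can grow without bound, the table never fills up; performance degrades gracefully as chains get longer, which is why the choice of hash function matters.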

4. Search Tree
Another approach to organizing the symbol table is to add two link fields, left child
and right child, to each record and use these fields to form a binary search tree. All names
are created as descendants of the root node, and the tree always maintains the binary search
tree property. Inserting a name always follows the binary search tree insertion algorithm.
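The search-tree organization can be sketched in C with the standard binary-search-tree insertion and lookup, using the two link fields described above. TreeSym, tinsert and tlookup are hypothetical names, and names are stored by pointer, so the caller must keep the strings alive.

```c
#include <stdlib.h>
#include <string.h>

typedef struct TreeSym {
    const char *name;
    struct TreeSym *left, *right;   /* the two added link fields */
} TreeSym;

/* Standard BST insertion: smaller names go left, larger go right;
   a name already in the tree is not inserted again. */
static TreeSym *tinsert(TreeSym *root, const char *name) {
    if (root == NULL) {
        TreeSym *n = malloc(sizeof *n);
        n->name = name;
        n->left = n->right = NULL;
        return n;
    }
    int c = strcmp(name, root->name);
    if (c < 0)
        root->left = tinsert(root->left, name);
    else if (c > 0)
        root->right = tinsert(root->right, name);
    return root;
}

static TreeSym *tlookup(TreeSym *root, const char *name) {
    while (root != NULL) {
        int c = strcmp(name, root->name);
        if (c == 0)
            return root;
        root = (c < 0) ? root->left : root->right;
    }
    return NULL;
}
```

Inserting the names of the example program in declaration order (a, b, c, sum, x, y, main, u) produces a lopsided tree, because the names arrive almost sorted; this is why the O(log2 n) cost holds only on average.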

Example: Create the list, search tree and hash table for the given program:
int a,b,c;
int sum (int x, int y)
{
a = x+y;
return (a);
}
main ()
{
int u;
u=sum (5,6);
}
