Syntax-Directed Definitions
A syntax-directed definition (SDD) is a context-free grammar together with attributes and rules.
Attributes are associated with grammar symbols and rules are associated with productions.
An attribute has a name and an associated value: a string, a number, a type, a memory location,
an assigned register, and so on. The value may even be a long sequence of code, say code in the
intermediate language used by a compiler. If X is a symbol and a is one of its attributes, then we
write X.a to denote the value of a at a particular parse-tree node labeled X. If we implement the
nodes of the parse tree by records or objects, then the attributes of X can be implemented by data
fields in the records that represent the nodes for X. The attributes are evaluated by the semantic
rules attached to the productions.
Example:
PRODUCTION          SEMANTIC RULE
E → E1 + T          E.code = E1.code || T.code || '+'
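As a rough illustration, this rule says that the translation E.code is formed by concatenating the
translations of the children and appending the operator. A minimal C sketch, assuming the code
attributes are heap-allocated strings (the function name concat_code is illustrative):

#include <stdlib.h>
#include <string.h>

/* Value of E.code for the production E -> E1 + T:
   concatenate E1.code and T.code, then append the operator '+'. */
char *concat_code(const char *e1_code, const char *t_code) {
    char *code = malloc(strlen(e1_code) + strlen(t_code) + 2);  /* '+' and '\0' */
    strcpy(code, e1_code);
    strcat(code, t_code);
    strcat(code, "+");
    return code;
}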
SDDs are highly readable and give high-level specifications for translations. But they hide many
implementation details. For example, they do not specify order of evaluation of semantic actions.
Construction of Syntax Trees:
SDDs are useful for the construction of syntax trees. A syntax tree is a condensed form of parse
tree.
Syntax trees are useful for representing programming language constructs like expressions and
statements.
• They help compiler design by decoupling parsing from translation.
• Each node of a syntax tree represents a construct; the children of the node represent the
meaningful components of the construct.
• e.g. a syntax-tree node representing an expression E1 + E2, has label + and two children
representing the sub expressions E1 and E2
• Each node is implemented by an object with a suitable number of fields; each object has an
op field that is the label of the node, with additional fields as follows (a C sketch of the two
constructor functions appears after this list):
1. If the node is a leaf, an additional field holds the lexical value for the leaf. Such a node
is created by the function Leaf(op, val).
2. If the node is an interior node, there are as many additional fields as the node has children
in the syntax tree. Such a node is created by the function Node(op, c1, c2, ..., ck).
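For concreteness, the two constructor functions might look like the following C sketch, assuming
nodes with at most two children (enough for the binary operators used here); the field names are
illustrative:

#include <stdlib.h>

struct node {
    int op;                 /* label of the node */
    int val;                /* lexical value, used only by leaves */
    struct node *child[2];  /* children, used only by interior nodes */
};

/* Leaf(op, val): a leaf holding a lexical value */
struct node *Leaf(int op, int val) {
    struct node *n = malloc(sizeof *n);
    n->op = op;
    n->val = val;
    n->child[0] = n->child[1] = NULL;
    return n;
}

/* Node(op, c1, c2): an interior node with two children */
struct node *Node(int op, struct node *c1, struct node *c2) {
    struct node *n = malloc(sizeof *n);
    n->op = op;
    n->val = 0;
    n->child[0] = c1;
    n->child[1] = c2;
    return n;
}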
Example: The S-attributed definition in figure below constructs syntax trees for a simple
expression grammar involving only the binary operators + and -. As usual, these operators are at
the same precedence level and are jointly left associative. All nonterminals have one synthesized
attribute node, which represents a node of the syntax
tree.
Syntax tree for a-4+c using the above SDD is shown below.
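Using the Leaf and Node functions sketched above, the syntax tree for a - 4 + c could be built
bottom-up with calls like the following; the token codes ID and NUM and the use of character codes
as stand-in lexical values are assumptions for illustration:

enum { ID = 256, NUM = 257 };

struct node *tree_for_a_minus_4_plus_c(void) {
    /* (a - 4) + c : the root is labeled '+', its left child is the '-' node */
    return Node('+',
                Node('-', Leaf(ID, 'a'), Leaf(NUM, 4)),
                Leaf(ID, 'c'));
}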
CHAPTER-5 (Type Checking)
Types and Declarations
We begin with some basic definitions to set the stage for performing semantic analysis. A type is
a set of values and a set of operations on those values. There are three categories of
types in most programming languages:
Base types
int, float, double, char, bool, etc. These are the primitive types provided directly by the
underlying hardware. There may be a facility for user-defined variants on the base types (such as
C enums).
Compound types
arrays, pointers, records, structs, unions, classes, and so on. These types are constructed as
aggregations of the base types and simple compound types.
Complex types
lists, stacks, queues, trees, heaps, tables, etc. You may recognize these as abstract data types. A
language may or may not have support for these sorts of higher-level abstractions.
In many languages, a programmer must first establish the name and type of any data object (e.g.,
variable, function, type, etc). In addition, the programmer usually defines the lifetime. A
declaration is a statement in a program that communicates this information to the compiler. The
basic declaration is just a name and type, but in many languages it may include modifiers that
control visibility and lifetime (i.e., static in C, private in Java). Some languages also allow
declarations to initialize variables, such as in C, where you can declare and initialize in one
statement. The following C statements show some example declarations:
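For instance (the particular names and initial values below are arbitrary):

int count;                     /* name and type only */
double total = 0.0;            /* declaration combined with initialization */
static char grade = 'A';       /* static modifies lifetime/visibility */
char *msg = "hello";           /* a compound (pointer) type */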
Type System
In programming languages, a type system is a set of rules that assign a property called type to
various constructs a computer program consists of, such
as variables, expressions, functions or modules. The main purpose of a type system is to reduce
possibilities for bugs in computer programs, by defining interfaces between different parts of a
computer program, and then checking that the parts have been connected in a consistent way.
This checking can happen statically (at compile time), dynamically (at run time), or as a
combination of static and dynamic checking.
A type system associates a type with each computed value and, by examining the flow of these
values, attempts to ensure or prove that no type errors can occur.
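For example, the situation described in the next sentence might look like the following C sketch,
where complex_test() is a hypothetical function standing in for <complex test>:

int complex_test(void);   /* hypothetical; assumed to return nonzero at run time */

void example(void) {
    int x;
    if (complex_test())
        x = 5;
    else
        x = "hello";      /* ill-typed: a string assigned to an int */
}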
Even if the expression <complex test> always evaluates to true at run-time, most type checkers
will reject the program as ill-typed, because it is difficult (if not impossible) for a static analyzer
to determine that the else branch will not be taken.
Type Checking
Type checking is the process of verifying that each operation executed in a program respects the
type system of the language. This generally means that all operands in any expression are of
appropriate types and number. Much of what we do in the semantic analysis phase is type
checking. Sometimes the rules regarding operations are defined by other parts of the code (as in
function prototypes), and sometimes such rules are a part of the definition of the language itself
(as in "both operands of a binary arithmetic operation must be of the same type").
If a problem is found, e.g., one tries to add a char pointer to a double in C, we encounter a type
error. A language is considered strongly typed if each and every type error is detected during
compilation. Type checking can be done during compilation, during execution, or divided across both.
Static type checking is done at compile-time. The information the type checker needs is
obtained via declarations and stored in a master symbol table. After this information is collected,
the types involved in each operation are checked. It is very difficult for a language that only does
static type checking to meet the full definition of strongly typed.
Dynamic type checking is implemented by including type information for each data location at
runtime. For example, a variable of type double would contain both the actual double value and
some kind of tag indicating "double type". The execution of any operation begins by first
checking these type tags. The operation is performed only if everything checks out. Otherwise, a
type error occurs and usually halts execution.
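A minimal C sketch of such tagged values; the names are illustrative:

#include <stdio.h>
#include <stdlib.h>

enum tag { TAG_INT, TAG_DOUBLE };

struct value {
    enum tag tag;                      /* tag indicating the run-time type */
    union { int i; double d; } u;      /* the actual value */
};

/* Addition is performed only if both type tags check out. */
struct value add(struct value a, struct value b) {
    struct value r;
    if (a.tag == TAG_INT && b.tag == TAG_INT) {
        r.tag = TAG_INT;     r.u.i = a.u.i + b.u.i;
    } else if (a.tag == TAG_DOUBLE && b.tag == TAG_DOUBLE) {
        r.tag = TAG_DOUBLE;  r.u.d = a.u.d + b.u.d;
    } else {
        fprintf(stderr, "type error\n");   /* a type error usually halts execution */
        exit(1);
    }
    return r;
}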
Types of conversion:
Conversion from one type to another may be implicit, where the compiler automatically inserts a
coercion (for example, an int operand converted to float in a mixed expression), or explicit, where
the programmer writes a cast to request the conversion.
CHAPTER-6
INTRODUCTION
The front end translates a source program into an intermediate representation from
which the back end generates target code.
Benefits of using a machine-independent intermediate form are:
1. Retargeting is facilitated. That is, a compiler for a different machine can be created
by attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate representation.
INTERMEDIATE LANGUAGES
The commonly used intermediate representations are:
1. Syntax trees
2. Postfix notation
3. Three-address code
The semantic rules for generating three-address code from common programming language
constructs are similar to those for constructing syntax trees or for generating postfix notation.
Graphical Representations:
Syntax tree:
A syntax tree depicts the natural hierarchical structure of a source program. A dag
(Directed Acyclic Graph) gives the same information but in a more compact way because
common subexpressions are identified. A syntax tree and dag for the assignment statement
a := b * - c + b * - c are as follows:
(Figure: the syntax tree on the left and the dag on the right; in the dag the common
subexpression b * - c appears only once and is shared by both operands of +.)
Postfix notation:
Postfix notation is a linearized representation of a syntax tree: it is a list of the nodes of the
tree in which a node appears immediately after its children. The postfix notation for the syntax
tree above is
a b c uminus * b c uminus * + assign
Syntax-directed definition:
Three-Address Code:
x := y op z
where x, y and z are names, constants, or compiler-generated temporaries; op stands for any
operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on boolean-
valued data. Thus a source language expression like x + y*z might be translated into a sequence
t1 := y * z
t2 := x + t1
where t1 and t2 are compiler-generated temporary names.
The use of names for the intermediate values computed by a program allows three-
address code to be easily rearranged – unlike postfix notation.
Three-address code is a linearized representation of a syntax tree or a dag in which
explicit names correspond to the interior nodes of the graph. The syntax tree and dag are
represented by the three-address code sequences. Variable names can appear directly in three-
address statements.
Three-address code corresponding to the syntax tree and dag given above:
Code for the syntax tree:
t1 := - c
t2 := b * t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
Code for the dag:
t1 := - c
t2 := b * t1
t5 := t2 + t2
a := t5
The reason for the term “three-address code” is that each statement usually contains three
addresses, two for the operands and one for the result.
Three-address statements can be implemented as records with fields for the operator and the
operands. Three such representations are quadruples, triples, and indirect triples.
Quadruples:
A quadruple is a record structure with four fields, which are, op, arg1, arg2 and result.
The op field contains an internal code for the operator. The three-address statement x := y op z
is represented by placing y in arg1, z in arg2 and x in result.
The contents of fields arg1, arg2 and result are normally pointers to the symbol-table
entries for the names represented by these fields. If so, temporary names must be entered
into the symbol table as they are created.
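As a sketch, a quadruple record might be declared as follows in C; the symbol-table entry type is
only forward-declared, since its layout is not specified here:

struct symtab_entry;                   /* entries for names and temporaries */

struct quad {
    int op;                            /* internal code for the operator */
    struct symtab_entry *arg1;
    struct symtab_entry *arg2;
    struct symtab_entry *result;
};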
Triples:
To avoid entering temporary names into the symbol table, we might refer to a temporary
value by the position of the statement that computes it.
If we do so, three-address statements can be represented by records with only three
fields: op, arg1 and arg2.
The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table
or pointers into the triple structure ( for temporary values ).
Since three fields are used, this intermediate code format is known as triples.
        op       arg1    arg2    result
(0)     uminus   c               t1
(1)     *        b       t1      t2
(2)     uminus   c               t3
(3)     *        b       t3      t4
(4)     +        t2      t4      t5
(5)     :=       t5              a
(a) Quadruples

        op       arg1    arg2
(0)     uminus   c
(1)     *        b       (0)
(2)     uminus   c
(3)     *        b       (2)
(4)     +        (1)     (3)
(5)     assign   a       (4)
(b) Triples
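A corresponding sketch of a triple record: an argument is either a pointer to a symbol-table entry
or the index of the earlier triple that computes the value (the layout below is illustrative):

struct symtab_entry;

struct triple_arg {
    int is_index;                                   /* nonzero: the argument is u.index */
    union { struct symtab_entry *entry; int index; } u;
};

struct triple {
    int op;
    struct triple_arg arg1, arg2;
};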
Indirect Triples:
Indirect triples consist of a listing of pointers to triples, rather than a listing of the triples
themselves. This lets an optimizing compiler reorder the statement list by rearranging the
pointers, without changing the triples themselves.
Declarations
As the sequence of declarations in a procedure or block is examined, we can lay out
storage for names local to the procedure. For each local name, we create a symbol-table entry
with information like the type and the relative address of the storage for the name. The relative
address consists of an offset from the base of the static data area or the field for local data in an
activation record.
Declarations in a Procedure:
The syntax of languages such as C, Pascal and Fortran allows all the declarations in a
single procedure to be processed as a group. In this case, a global variable, say offset, can
keep track of the next available relative address.
Before the first declaration is considered, offset is set to 0. As each new name is seen,
that name is entered in the symbol table with offset equal to the current value of offset,
and offset is incremented by the width of the data object denoted by that name.
The procedure enter( name, type, offset ) creates a symbol-table entry for name, gives its
type type and relative address offset in its data area.
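A C sketch of this scheme; enter is the procedure just described, and width_of is a hypothetical
helper returning the width of a type (a possible version is sketched below):

struct type;
void enter(const char *name, struct type *t, int offset);
int width_of(struct type *t);

int offset = 0;                       /* set to 0 before the first declaration */

void declare(const char *name, struct type *t) {
    enter(name, t, offset);           /* record name, type and relative address */
    offset += width_of(t);            /* advance by the width of the data object */
}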
Attribute type represents a type expression constructed from the basic types integer and
real by applying the type constructors pointer and array. If type expressions are
represented by graphs, then attribute type might be a pointer to the node representing a
type expression.
The width of an array is obtained by multiplying the width of each element by the
number of elements in the array. The width of each pointer is assumed to be 4.
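One possible width_of under these conventions; the width of 4 for each pointer follows the text,
while the widths chosen for integer (4) and real (8) and the field layout of struct type are
assumptions for illustration:

struct type {
    enum { T_INTEGER, T_REAL, T_POINTER, T_ARRAY } kind;
    int size;                /* number of elements, for arrays */
    struct type *elem;       /* element type or referenced type */
};

int width_of(struct type *t) {
    switch (t->kind) {
    case T_INTEGER: return 4;
    case T_REAL:    return 8;
    case T_POINTER: return 4;                            /* each pointer has width 4 */
    case T_ARRAY:   return t->size * width_of(t->elem);  /* elements * element width */
    }
    return 0;
}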
Back patching
Backpatching is the technique used to solve the problem of replacing the symbolic names in goto
statements with the actual target addresses.
Back patching usually refers to the process of resolving forward branches that have been planted
in the code, e.g. at 'if' statements, when the value of the target becomes known, e.g. when the
closing brace or matching 'else' is encountered.
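A minimal sketch of the mechanism: each unresolved jump is kept on a list of instruction indices,
and the target field is filled in once it is known. The names makelist, merge, and backpatch are
conventional, and the flat jump_target array is an assumption for illustration:

#include <stdlib.h>

struct patchlist { int instr; struct patchlist *next; };

/* a list containing just the jump at index instr */
struct patchlist *makelist(int instr) {
    struct patchlist *p = malloc(sizeof *p);
    p->instr = instr;
    p->next = NULL;
    return p;
}

/* concatenate two lists of unresolved jumps */
struct patchlist *merge(struct patchlist *a, struct patchlist *b) {
    struct patchlist *p = a;
    if (!a) return b;
    while (p->next) p = p->next;
    p->next = b;
    return a;
}

/* once the target address becomes known, fill it into every listed jump */
void backpatch(struct patchlist *l, int target, int jump_target[]) {
    for (; l; l = l->next)
        jump_target[l->instr] = target;
}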
{call procedure-name[([parameter][,[parameter]]...)]}
Examples
ERASE
This is a procedure call to a subroutine to erase the current window. There are no explicit inputs
or outputs. Other procedures have one or more parameters. For example:
PLOT, Circle, Square
Symbol Table:
A new symbol table is created when a procedure declaration D → proc id ; D1 ; S is seen,
and entries for the declarations in D1 are created in the new table. The new table points back to
the symbol table of the enclosing procedure; the name represented by id itself is local to the
enclosing procedure. The only change from the treatment of variable declarations is that the
procedure enter is told which symbol table to make an entry in.
For example, consider the symbol tables for procedures readarray, exchange, and quicksort
pointing back to that for the containing procedure sort, consisting of the entire program. Since
partition is declared within quicksort, its table points to that of quicksort. The symbol table is
accessed by most phases of a compiler, beginning with lexical analysis, and continuing through
optimization. A compiler may use one large symbol table for all symbols or use separate,
hierarchical symbol tables for different scopes.
Symbol tables for nested procedures
(Figure: nested symbol tables. The header of the table for sort points to nil; its entries include
a, x, readarray, exchange, and quicksort, with the readarray, exchange, and quicksort entries
pointing to the tables for those procedures. The entry for partition appears in quicksort's table,
and the table containing i and j has a header pointing back to the table of its enclosing
procedure.)
1. mktable(previous) creates a new symbol table and returns a pointer to the new table. The
argument previous points to a previously created symbol table, presumably that for the
enclosing procedure.
2. enter(table, name, type, offset) creates a new entry for name name in the symbol table pointed
to by table. Again, enter places type type and relative address offset in fields within the entry.
3. addwidth(table, width) records the cumulative width of all the entries in table in the header
associated with this symbol table.
4. enterproc(table, name, newtable) creates a new entry for procedure name in the symbol table
pointed to by table. The argument newtable points to the symbol table for this procedure
name.
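A C sketch of these four operations; the entry layout and the fixed-size entry array are
simplifying assumptions (a real table would normally hash its entries, as discussed below):

#include <stdlib.h>
#include <string.h>

struct type;
struct symtab;

struct entry {
    char name[32];
    struct type *type;            /* type of a variable entry */
    int offset;                   /* relative address of a variable entry */
    struct symtab *proctable;     /* nested table, for a procedure entry */
};

struct symtab {
    struct symtab *previous;      /* table of the enclosing procedure */
    int width;                    /* cumulative width, recorded by addwidth */
    int nentries;
    struct entry entries[64];
};

struct symtab *mktable(struct symtab *previous) {
    struct symtab *t = calloc(1, sizeof *t);
    t->previous = previous;
    return t;
}

void enter(struct symtab *table, const char *name, struct type *type, int offset) {
    struct entry *e = &table->entries[table->nentries++];
    strncpy(e->name, name, sizeof e->name - 1);
    e->type = type;
    e->offset = offset;
    e->proctable = NULL;
}

void addwidth(struct symtab *table, int width) {
    table->width = width;
}

void enterproc(struct symtab *table, const char *name, struct symtab *newtable) {
    struct entry *e = &table->entries[table->nentries++];
    strncpy(e->name, name, sizeof e->name - 1);
    e->proctable = newtable;
}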
Hash Table
A common data structure used to implement symbol tables is the hash table. Hash tables are used
to organise a symbol table, where the keyword or identifier is 'hashed' to produce an array
subscript. Collisions are inevitable in a hash table, and a common way of handling them is to
store the synonym in the next available free space in the table.
Hashing is the process of mapping a large number of data items into a smaller table with the help
of a hashing function. The essence of hashing is to make searching faster than linear or binary
search. The advantage of this searching method is its efficiency in handling a vast number of data
items in a given collection (i.e., the collection size).
Example: Here, we construct a hash table for storing and retrieving data related to the citizens of
a country, and the social-security number of each citizen is used as the key into the array
implementation. Let us assume that the table size is 12, so the hash function is the key value
modulo 12.
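A small C sketch of this example: a key is reduced modulo 12, and a collision is resolved by taking
the next available free slot, as described above; EMPTY marks an unused slot and the table is
assumed not to be full:

#define TABLE_SIZE 12
#define EMPTY (-1)

int table[TABLE_SIZE];

void init_table(void) {
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

int hash(int key) {
    return key % TABLE_SIZE;      /* "value modulo 12" */
}

void insert(int key) {
    int i = hash(key);
    while (table[i] != EMPTY)     /* collision: take the next available free space */
        i = (i + 1) % TABLE_SIZE;
    table[i] = key;
}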
Each symbol table contains the symbols declared in its lexical scope. This solves the problem of
resolving name collisions, that is, the same name used in overlapping scopes.