
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


15CS314J - Compiler Design
ANSWER KEY
Prepared by : 102454-Dr.R.I.Minu Date of Exam: 19-11-2019
PART B (5 X 4 = 20 MARKS)
21. Write short notes on LEX
Ans:
• Definition – 2 marks
• Explanation – 2 marks
• Lex is a tool used in the lexical analysis phase to recognize tokens described by regular expressions.
• The Lex tool itself is a Lex compiler.
• Lex is a "lexical analyzer generator". It was written primarily for Unix-based systems.

• lex.l is an input file, written in the Lex language, that describes the lexical analyzer to be generated. The Lex compiler transforms lex.l into a C program known as lex.yy.c.
• lex.yy.c is compiled by the C compiler into a file called a.out.
• The output of the C compiler (a.out) is the working lexical analyzer, which takes a stream of input characters and produces a stream of tokens.
• yylval is a global variable shared by the lexical analyzer and the parser: the lexical analyzer returns the token name as the value of yylex() and passes the token's attribute value through yylval.
• The attribute value can be a numeric code, a pointer into the symbol table, or nothing.
• Another tool for lexical analyzer generation is Flex.
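To make the yylex()/yylval interface concrete, here is a hand-written C stand-in for the kind of scanner that lex.yy.c contains. It is only a sketch: the token codes END, NUM and ID, the string input lex_input (a real Lex scanner reads from yyin) and the tiny symbol table are all hypothetical. As in a Lex specification, each pattern's action returns a token name, and the attribute value travels through yylval: the numeric value for numbers, a symbol-table index for identifiers, and nothing for single-character operators.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; with Lex and Yacc they would come from y.tab.h. */
enum { END = 0, NUM = 256, ID = 257 };

int yylval;              /* attribute value shared with the parser */
const char *lex_input;   /* hypothetical input string (a real scanner reads yyin) */

#define MAXSYMS 32
#define MAXLEN 16
static char symtab[MAXSYMS][MAXLEN];  /* tiny hypothetical symbol table */
static int nsyms;

/* Return the symbol-table index of the lexeme start[0..len-1], installing it if new. */
static int install_id(const char *start, int len) {
    for (int i = 0; i < nsyms; i++)
        if ((int)strlen(symtab[i]) == len && strncmp(symtab[i], start, (size_t)len) == 0)
            return i;
    assert(nsyms < MAXSYMS && len < MAXLEN);
    memcpy(symtab[nsyms], start, (size_t)len);
    symtab[nsyms][len] = '\0';
    return nsyms++;
}

/* Return the next token name; put its attribute value (if any) in yylval. */
int yylex(void) {
    while (*lex_input == ' ' || *lex_input == '\t' || *lex_input == '\n')
        lex_input++;                                 /* white space: no token */
    if (*lex_input == '\0')
        return END;
    if (isdigit((unsigned char)*lex_input)) {        /* pattern [0-9]+ */
        yylval = 0;
        while (isdigit((unsigned char)*lex_input))
            yylval = yylval * 10 + (*lex_input++ - '0');
        return NUM;                                  /* attribute: numeric value */
    }
    if (isalpha((unsigned char)*lex_input)) {        /* pattern [A-Za-z][A-Za-z0-9]* */
        const char *start = lex_input;
        while (isalnum((unsigned char)*lex_input))
            lex_input++;
        yylval = install_id(start, (int)(lex_input - start));
        return ID;                                   /* attribute: symbol-table index */
    }
    return (unsigned char)*lex_input++;              /* operator: no attribute */
}
```

For the input `count = count + 42`, successive calls return ID, '=', ID (with the same yylval as the first ID), '+', NUM with yylval 42, and finally END.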

22. Check whether the given grammar is ambiguous

Ans:
The grammar is not given in the question paper, so award marks if the student has written the general definition of an ambiguous grammar.
23. Construct the leading and trailing for the following grammar
Ans:
• Leading – 2 marks
• Trailing – 2 marks
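The question's grammar is not reproduced in the key, so the sketch below assumes the usual expression grammar E → E+T | T, T → T*F | F, F → (E) | id (with i standing for id). It computes the sets by applying the standard rules until nothing changes: a ∈ LEADING(A) if A → γaδ where γ is empty or a single nonterminal, and LEADING(B) ⊆ LEADING(A) if A → Bα; TRAILING is the mirror image, working from the right end of each production. The production encoding and function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Assumed grammar; "A=rhs", uppercase letters are nonterminals, 'i' stands for id. */
static const char *prods[] = {"E=E+T", "E=T", "T=T*F", "T=F", "F=(E)", "F=i"};
#define NPRODS (sizeof prods / sizeof prods[0])

static bool leading[26][128], trailing[26][128];

static bool is_nt(char c) { return isupper((unsigned char)c); }

/* Add terminal t to a set; report whether the set grew. */
static bool add(bool set[128], char t) {
    if (set[(unsigned char)t]) return false;
    set[(unsigned char)t] = true;
    return true;
}

/* dst = dst ∪ src; report whether dst grew. */
static bool add_all(bool dst[128], const bool src[128]) {
    bool changed = false;
    for (int c = 0; c < 128; c++)
        if (src[c] && !dst[c]) { dst[c] = true; changed = true; }
    return changed;
}

static void compute_leading_trailing(void) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t p = 0; p < NPRODS; p++) {
            int A = prods[p][0] - 'A';
            const char *rhs = prods[p] + 2;
            size_t n = strlen(rhs);
            /* LEADING: first terminal, possibly after one leading nonterminal */
            if (!is_nt(rhs[0])) {
                changed |= add(leading[A], rhs[0]);
            } else {
                changed |= add_all(leading[A], leading[rhs[0] - 'A']);
                if (n > 1 && !is_nt(rhs[1])) changed |= add(leading[A], rhs[1]);
            }
            /* TRAILING: last terminal, possibly before one trailing nonterminal */
            char last = rhs[n - 1];
            if (!is_nt(last)) {
                changed |= add(trailing[A], last);
            } else {
                changed |= add_all(trailing[A], trailing[last - 'A']);
                if (n > 1 && !is_nt(rhs[n - 2])) changed |= add(trailing[A], rhs[n - 2]);
            }
        }
    }
}

/* Write the members of a set, in ASCII order, into buf (needs 129 bytes). */
static void set_to_string(const bool set[128], char *buf) {
    for (int c = 0; c < 128; c++)
        if (set[c]) *buf++ = (char)c;
    *buf = '\0';
}
```

For this grammar it yields LEADING(E) = {+, *, (, id}, LEADING(F) = {(, id}, TRAILING(T) = {*, ), id} and TRAILING(E) = {+, *, ), id}.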

24. Construct the three address code for a(a<b+c)a = a-c; c = b*c
The question was mistyped; it should read:
“if (a < b + c)
a = a - c;
c = b * c; ”
Award marks if the student's answer is relevant to the answer given below.
Ans:

t1 = b + c;
t2 = a < t1;
ifFalse t2 goto L0;
t3 = a - c;
a = t3;
L0: t4 = b * c;
c = t4;
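Because C has goto, the three-address code can be transcribed almost line for line into C and compared with the source statements over a range of inputs. This is only a sanity-check sketch; the function names source_stmt and tac_stmt are hypothetical.

```c
#include <assert.h>

/* The source statements: if (a < b + c) a = a - c; c = b * c; */
static void source_stmt(int *a, int b, int *c) {
    if (*a < b + *c)
        *a = *a - *c;
    *c = b * *c;
}

/* The three-address code above, transcribed into C with goto. */
static void tac_stmt(int *a, int b, int *c) {
    int t1, t2, t3, t4;
    t1 = b + *c;
    t2 = *a < t1;
    if (!t2) goto L0;      /* ifFalse t2 goto L0 */
    t3 = *a - *c;
    *a = t3;
L0:
    t4 = b * *c;
    *c = t4;
}
```

Running both functions on the same small inputs and comparing a and c afterwards confirms that the conditional jump must be taken when the test is false.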

25. Give the applications of DAG

Ans:
• Any four applications; each carries one mark.

• Scheduling: Directed acyclic graph representations of partial orderings have many applications in scheduling systems of tasks with ordering constraints.
• Data processing networks: A directed acyclic graph may be used to represent a network of processing elements. In this representation, data enters a processing element through its incoming edges and leaves the element through its outgoing edges.
• Genealogy and version history: Family trees may be seen as directed acyclic graphs, with a vertex for each family member and an edge for each parent-child relationship.
• Citation graphs: In a citation graph the vertices are documents, each with a single publication date. The edges represent the citations from the bibliography of one document to other, necessarily earlier, documents.
• Data compression: Directed acyclic graphs may also be used as a compact representation of a collection of sequences. In this type of application, one finds a DAG in which the paths form the given sequences.
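For the scheduling application, a valid schedule is any topological order of the task DAG. A minimal sketch using Kahn's algorithm, assuming tasks are numbered 0..n-1 and the ordering constraints are given as an adjacency matrix (both choices, and the name schedule, are for illustration only):

```c
#include <assert.h>

#define MAXT 8   /* hypothetical upper bound on the number of tasks */

/* Kahn's algorithm. adj[u][v] != 0 means task u must finish before task v starts.
   Writes a valid schedule into order[] and returns how many tasks it placed;
   a result smaller than n means the graph has a cycle (no valid schedule). */
static int schedule(int n, int adj[][MAXT], int order[]) {
    int indeg[MAXT] = {0}, queue[MAXT];
    int head = 0, tail = 0, count = 0;
    assert(n <= MAXT);
    for (int u = 0; u < n; u++)
        for (int v = 0; v < n; v++)
            if (adj[u][v]) indeg[v]++;           /* count predecessors of v */
    for (int v = 0; v < n; v++)
        if (indeg[v] == 0) queue[tail++] = v;    /* unconstrained tasks can start */
    while (head < tail) {
        int u = queue[head++];
        order[count++] = u;
        for (int v = 0; v < n; v++)
            if (adj[u][v] && --indeg[v] == 0)
                queue[tail++] = v;               /* all predecessors of v are scheduled */
    }
    return count;
}
```

With constraints 0→1, 0→2, 1→3 and 2→3, the schedule starts with task 0 and ends with task 3; a two-task cycle yields fewer than n scheduled tasks.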
26. Write the SDT for type declaration statement
Ans:
• Identification of rule – 2 marks
• Explanation – 2 marks

• The syntax-directed definition, with both inherited and synthesized attributes, for the grammar of “type declarations”:
• The nonterminal T has a synthesized attribute, type, determined by the keyword in the declaration.
• The production D → T L is associated with the semantic rule L.in := T.type, which sets the inherited attribute L.in.
• The productions for L pass L.in down the list of identifiers (L1.in := L.in for L → L1 , id), and each identifier's type is entered in the symbol table by addtype(id.entry, L.in).
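The flow of the inherited attribute can be sketched with one C function per nonterminal, passing L.in down as a parameter. The sketch assumes the usual productions T → int | real and L → L1 , id | id; the symbol table and the names list_L, decl_D and lookup_type are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAXSYM 16
static const char *sym_name[MAXSYM];   /* hypothetical symbol table: identifier ... */
static const char *sym_type[MAXSYM];   /* ... and the type recorded for it */
static int nsym;

/* addtype(id.entry, type): record the type of an identifier in the symbol table */
static void addtype(const char *id, const char *type) {
    assert(nsym < MAXSYM);
    sym_name[nsym] = id;
    sym_type[nsym] = type;
    nsym++;
}

/* L -> L1 , id | id, where ids[0..n-1] are the identifiers from left to right.
   The inherited attribute L.in arrives as the parameter `in`. */
static void list_L(const char *ids[], int n, const char *in) {
    if (n > 1)
        list_L(ids, n - 1, in);     /* L -> L1 , id : L1.in := L.in */
    addtype(ids[n - 1], in);        /* addtype(id.entry, L.in) */
}

/* D -> T L : L.in := T.type, where T.type is synthesized from the keyword. */
static void decl_D(const char *keyword, const char *ids[], int n) {
    const char *T_type = strcmp(keyword, "int") == 0 ? "integer" : "real"; /* T -> int | real */
    list_L(ids, n, T_type);
}

static const char *lookup_type(const char *id) {
    for (int i = 0; i < nsym; i++)
        if (strcmp(sym_name[i], id) == 0) return sym_type[i];
    return NULL;
}
```

For `real id1, id2, id3`, decl_D enters all three identifiers with type real, in left-to-right order.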

27. Write short notes on dead code elimination

Ans:
• Definition – 2 marks
• Example – 2 marks
The operation on DAGs that corresponds to dead-code elimination can be implemented as follows. We delete from a DAG any root (a node with no ancestors) that has no live variables attached. Repeated application of this transformation removes all nodes from the DAG that correspond to dead code.

Example: In Fig., if a and b are live but c and e are not, we can immediately remove the root labeled e. Then the node labeled c becomes a root and can be removed. The roots labeled a and b remain, since they each have live variables attached.
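The repeated root-deletion rule can be sketched in C. Since the figure is not reproduced in this key, the sketch assumes a DAG for the basic block a = b + c; b = b - d; c = c + d; e = b + c, which has the roots and labels described in the example (leaves b0, c0 and d0 hold initial values). The node layout and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical DAG node: up to two children (-1 = none), the variable attached
   (0 for none, e.g. leaves holding initial values), liveness, and a deleted flag. */
struct dnode { int kid[2]; char var; bool live; bool deleted; };

/* Count the not-yet-deleted parents of node v. */
static int parents(const struct dnode g[], int n, int v) {
    int p = 0;
    for (int u = 0; u < n; u++)
        if (!g[u].deleted && (g[u].kid[0] == v || g[u].kid[1] == v)) p++;
    return p;
}

/* Repeatedly delete any root (no remaining parents) with no live variable attached. */
static void eliminate_dead(struct dnode g[], int n) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int v = 0; v < n; v++)
            if (!g[v].deleted && !g[v].live && parents(g, n, v) == 0) {
                g[v].deleted = true;   /* deleting it may turn its children into roots */
                changed = true;
            }
    }
}
```

With nodes 0 to 2 as the leaves, node 3 for a, node 4 for b, node 5 for c and node 6 for e, the function deletes e and then c, and keeps a, b and the leaves.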

PART C (5 X 12 = 60 MARKS)
28.
a. Explain how input buffering helps the lexical analyzer in the compilation process, with an example.
Ans
• General explanation – 3 marks
• Buffer pair – 3 marks
• Scheme – 3 marks
• Sentinels – 3 marks

To ensure that the right lexeme is found, the lexical analyzer often has to look one or more characters beyond the end of the lexeme.

• Hence a two-buffer scheme is introduced to handle large lookaheads safely.

• Techniques for speeding up the lexical analyzer, such as using sentinels to mark the buffer end, have also been adopted.

There are three general approaches for the implementation of a lexical analyzer:

(i) By using a lexical-analyzer generator, such as the Lex compiler, to produce the lexical analyzer from a specification based on regular expressions. In this approach, the generator provides routines for reading and buffering the input.

(ii) By writing the lexical analyzer in a conventional systems-programming language, using the I/O facilities of that language to read the input.

(iii) By writing the lexical analyzer in assembly language and explicitly managing the reading of input.

Buffer Pairs
Because moving characters takes a large amount of time, specialized buffering techniques have been developed to reduce the overhead required to process each input character. Fig. shows the buffer pair that is used to hold the input data.

Scheme
• It consists of two buffers, each of N characters, which are reloaded alternately.
• N is the number of characters in one disk block, e.g., 4096.
• N characters are read from the input file into a buffer using one system read command.
• If fewer than N characters remain in the input, a special eof character is placed after the last character read.

Pointers
Two pointers, lexemeBegin and forward, are maintained.
• lexemeBegin points to the beginning of the current lexeme, whose extent is yet to be determined.
• forward scans ahead until a match for a pattern is found.
• Once a lexeme is found, forward is set to the character at its right end; after the lexeme is recorded, lexemeBegin is set to the character immediately after it.
• The current lexeme is the string of characters between the two pointers.

Disadvantages of this scheme

• This scheme works well most of the time, but the amount of lookahead is limited.
• This limited lookahead may make it impossible to recognize tokens in situations where the distance that the forward pointer must travel is more than the length of the buffer.
(e.g.) DECLARE (ARG1, ARG2, ..., ARGn) in a PL/I program:
• The lexical analyzer cannot determine whether DECLARE is a keyword or an array name until it sees the character that follows the right parenthesis.

Sentinels
• In the previous scheme, each time the forward pointer is advanced, a check must be made that it has not moved off one half of the buffer; if it has, the other half must be reloaded.
• Therefore, two tests are needed for each advance of the forward pointer:

Test 1: For the end of the buffer.

Test 2: To determine what character is read.

• Using a sentinel reduces the two tests to one by extending each buffer half to hold a sentinel character at its end.
• The sentinel is a special character that cannot be part of the source program (the eof character is used as the sentinel).

Advantages
• Most of the time, only one test is performed: whether the forward pointer points to an eof.
• Only when it reaches the end of a buffer half or the end of input are more tests performed.
• Since N input characters are encountered between eofs, the average number of tests per input character is very close to 1.

b. Explain the process of constructing an NFA from the regular expression. Find the NFA for the expression (a|b)*abb

Ans
• Procedure Explanation – 6 marks
• NFA creation – 6 marks
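Assuming the expected procedure is the McNaughton-Yamada-Thompson construction (NFAs for the basic symbols combined by the rules for union, concatenation and closure), (a|b)* abb yields the well-known 11-state NFA with states 0 to 10, start state 0 and accepting state 10. As a sketch, the C code below encodes that NFA as an edge list and checks it by simulating it with ε-closures; the edge list is transcribed from that construction, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NSTATES 11
#define EPS '\0'   /* label used for epsilon edges */

struct edge { int from; char label; int to; };

/* Thompson NFA for (a|b)*abb: start state 0, accepting state 10 */
static const struct edge edges[] = {
    {0, EPS, 1}, {0, EPS, 7},      /* enter or skip the closure */
    {1, EPS, 2}, {1, EPS, 4},      /* the union a|b */
    {2, 'a', 3}, {4, 'b', 5},
    {3, EPS, 6}, {5, EPS, 6},
    {6, EPS, 1}, {6, EPS, 7},      /* repeat or leave the closure */
    {7, 'a', 8}, {8, 'b', 9}, {9, 'b', 10},   /* the suffix abb */
};
#define NEDGES (sizeof edges / sizeof edges[0])

/* Extend set to its epsilon-closure (iterate until no state is added). */
static void eps_closure(bool set[NSTATES]) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < NEDGES; i++)
            if (edges[i].label == EPS && set[edges[i].from] && !set[edges[i].to]) {
                set[edges[i].to] = true;
                changed = true;
            }
    }
}

/* Simulate the NFA on w: track the set of states reachable after each symbol. */
static bool nfa_accepts(const char *w) {
    assert(w != NULL);
    bool cur[NSTATES] = {false};
    cur[0] = true;
    eps_closure(cur);
    for (; *w; w++) {
        bool next[NSTATES] = {false};
        for (size_t i = 0; i < NEDGES; i++)
            if (edges[i].label == *w && cur[edges[i].from])
                next[edges[i].to] = true;
        eps_closure(next);
        memcpy(cur, next, sizeof cur);
    }
    return cur[10];
}
```

The simulation accepts abb, aabb and babb, and rejects the empty string, ab and abba, as expected for strings over {a, b} ending in abb.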

29.
a. Check whether the given grammar is LL(1) or not
S-> iEtS | iEtSeS |a,
E ->b.
b. Design a LALR parser for the following grammar
S ->L=R|R,
L->*R|id,
R->L and parse the string "id=id"
30.
a. Write down the SDD to produce three address code for assignment
statement
• SDD procedure for assignment statement – 6 marks
• Translation scheme – 6 marks

ANS:

A syntax-directed definition (SDD) is a generalization of a context-free grammar in which each grammar symbol has an associated set of attributes. An SDD partitions the attributes of the grammar symbols into two subsets:
1. Synthesized attributes (an SDD with only synthesized attributes is called S-attributed)
2. Inherited attributes
A synthesized attribute for a nonterminal A at a parse-tree node N is defined by a semantic rule associated with the production at N. The value of a synthesized attribute at a node is computed from the values of attributes at the children of that node in the parse tree and at the node itself. A syntax-directed definition that uses synthesized attributes exclusively is said to be an S-attributed definition.
Syntax-directed definition to produce three-address code for assignment statements:

Note: In the syntax-directed translation to three-address code, the nonterminal E has two attributes:
1. E.place, the name that will hold the value of E;
2. E.code, the sequence of three-address statements evaluating E.

Example: A = -B * (C + D)
Three-Address code is as follows:
T1 = - B
T2 = C + D
T3 = T1 * T2
A = T3
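The E.place / E.code scheme can be sketched as a recursive C function over an expression tree: each call emits the code of its children first and then one instruction into a fresh temporary from newtemp(). The tree representation and function names are hypothetical, and parentheses (E → ( E1 ), with E.place = E1.place) are represented only by the shape of the tree.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression-tree node: op is '+', '*', 'u' (unary minus),
   or 0 for an identifier, whose name is then in `name`. */
struct node { char op; const char *name; const struct node *l, *r; };

static char code[512];      /* the generated three-address code */
static char temps[16][8];   /* storage for temporary names T1, T2, ... */
static int ntemps;

/* newtemp(): return a fresh temporary name */
static const char *newtemp(void) {
    assert(ntemps < 16);
    snprintf(temps[ntemps], sizeof temps[ntemps], "T%d", ntemps + 1);
    return temps[ntemps++];
}

/* gen(): append one three-address instruction to the code */
static void gen(const char *instr) {
    assert(strlen(code) + strlen(instr) + 2 <= sizeof code);
    strcat(code, instr);
    strcat(code, "\n");
}

/* Returns E.place; E.code is emitted into `code`, children's code first. */
static const char *translate(const struct node *e) {
    char instr[64];
    if (e->op == 0)                       /* E -> id : E.place = id, E.code = "" */
        return e->name;
    if (e->op == 'u') {                   /* E -> - E1 */
        const char *p1 = translate(e->l);
        const char *t = newtemp();
        snprintf(instr, sizeof instr, "%s = - %s", t, p1);
        gen(instr);
        return t;
    }
    const char *p1 = translate(e->l);     /* E -> E1 + E2 | E1 * E2 */
    const char *p2 = translate(e->r);
    const char *t = newtemp();
    snprintf(instr, sizeof instr, "%s = %s %c %s", t, p1, e->op, p2);
    gen(instr);
    return t;
}

/* S -> id = E : S.code = E.code followed by gen(id '=' E.place) */
static void assign(const char *id, const struct node *e) {
    char instr[64];
    const char *p = translate(e);
    snprintf(instr, sizeof instr, "%s = %s", id, p);
    gen(instr);
}
```

Building the tree for A = -B * (C + D) and calling assign("A", ...) produces exactly the four instructions listed above.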

b. Explain backpatching with an example and SDD

• Backpatching general explanation – 6 marks
• Backpatching for Boolean expressions – 6 marks

ANS:
• A key problem when generating code for Boolean expressions and flow-of-control
statements is that of matching a jump instruction with the target of the jump.
• For example, the translation of the Boolean expression B in if ( B ) S contains a
jump, for when B is false, to the instruction following the code for S.
• In a one-pass translation, B must be translated before S is examined.
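• What then is the target of the goto that jumps over the code for S?
• Backpatching answers this: the jump is generated with its target left unspecified, and it is placed on a list of jumps that share the same target (such as B.falselist). When the target becomes known (here, after the code for S has been generated), the list is backpatched by filling that target into every instruction on it.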
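The three list operations used by backpatching, makelist, merge and backpatch, can be sketched in C. The code array is reduced to just the jump targets (-1 meaning "not yet known"), and emit_jump is a hypothetical helper that generates a jump with an unfilled target.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXCODE 64
/* Hypothetical code array: target[i] is the jump target of instruction i, -1 = unfilled */
static int target[MAXCODE];
static int nextinstr;   /* index of the next instruction to be generated */

/* A list of instructions whose jump targets are still unknown */
struct list { int instr; struct list *next; };

/* Generate a jump with an unfilled target and return its index. */
static int emit_jump(void) {
    assert(nextinstr < MAXCODE);
    target[nextinstr] = -1;
    return nextinstr++;
}

/* makelist(i): a new list containing only instruction i */
static struct list *makelist(int i) {
    struct list *p = malloc(sizeof *p);
    assert(p != NULL);
    p->instr = i;
    p->next = NULL;
    return p;
}

/* merge(p1, p2): the concatenation of the two lists */
static struct list *merge(struct list *p1, struct list *p2) {
    if (p1 == NULL) return p2;
    struct list *q = p1;
    while (q->next != NULL) q = q->next;
    q->next = p2;
    return p1;
}

/* backpatch(p, i): make i the target of every instruction on p, then free p */
static void backpatch(struct list *p, int i) {
    while (p != NULL) {
        struct list *next = p->next;
        target[p->instr] = i;
        free(p);
        p = next;
    }
}
```

For if ( x < y ) S, instruction 0 (if x < y goto _) goes on the true list and instruction 1 (goto _) on the false list. The true list is backpatched with the start of S (instruction 2), and once the code for S (assumed here to be two instructions) is generated, the false list is backpatched with instruction 4.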
31.
a. Design a Simple code generator
• Machine Instructions for Operations – 3 marks
• Machine Instructions for Copy Statements – 3 marks
• Ending the Basic Block – 3 marks
• Managing Register and Address Descriptors – 3 marks
ANS
Machine Instructions for Operations
• For a three-address instruction such as x = y + z, do the following:
• Use getReg(x = y + z) to select registers for x, y, and z. Call these Rx, Ry, and Rz.
• If y is not in Ry (according to the register descriptor for Ry), then issue an instruction LD Ry, y', where y' is one of the memory locations for y (according to the address descriptor for y).
• Similarly, if z is not in Rz, issue an instruction LD Rz, z', where z' is a location for z.
• Issue the instruction ADD Rx, Ry, Rz.
Machine Instructions for Copy Statements
• Consider a copy statement x = y.
• We assume that getReg will always choose the same register for both x and y.
• If y is not already in that register Ry, then generate the machine instruction LD Ry, y.
• If y was already in Ry, we do nothing.
• It is only necessary that we adjust the register descriptor for Ry so that it includes x as one of the values found there.
Ending the Basic Block
• Variables used by the block may wind up with their only location being a register.
• If the variable is a temporary used only within the block, that is fine: when the block ends, we can forget about the value of the temporary and assume its register is empty.
• However, if the variable is live on exit from the block, or if we don't know which variables are live on exit, then we need to assume that the value of the variable is needed later.
• In that case, for each variable x whose address descriptor does not say that its value is located in the memory location for x, we must generate the instruction ST x, R, where R is a register in which x's value exists at the end of the block.
Managing Register and Address Descriptors
The rules to update the register and address descriptors are:
1. For the instruction LD R, x:
   Change the register descriptor for register R so it holds only x.
   Change the address descriptor for x by adding register R as an additional location.
2. For the instruction ST x, R, change the address descriptor for x to include its own memory location.
3. For an operation such as ADD Rx, Ry, Rz implementing a three-address instruction x = y + z:
   Change the register descriptor for Rx so that it holds only x.
   Change the address descriptor for x so that its only location is Rx. Note that the memory location for x is now not in the address descriptor for x.
   Remove Rx from the address descriptor of any variable other than x.
4. When we process a copy statement x = y, after generating the load for y into register Ry, if needed, and after managing descriptors as for all load statements (per rule 1):
   Add x to the register descriptor for Ry.
   Change the address descriptor for x so that its only location is Ry.
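The descriptor rules above can be exercised with a small C sketch of the bookkeeping. Everything here is assumed for illustration: three registers, four variables a to d, bit masks as descriptors, register choices passed in by the caller instead of a real getReg, and only ADD and copy statements.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NREGS 3
#define NVARS 4   /* hypothetical variables a, b, c, d, identified by index */
static const char *var_name[NVARS] = {"a", "b", "c", "d"};

static unsigned reg_desc[NREGS];  /* register descriptor: bit v set => register holds v */
static unsigned addr_reg[NVARS];  /* address descriptor (registers): bit r set => v is in Rr */
static int in_mem[NVARS];         /* address descriptor (memory): 1 => own location is current */
static char out[512];             /* generated target code */

static void emit(const char *line) {
    assert(strlen(out) + strlen(line) + 2 <= sizeof out);
    strcat(out, line);
    strcat(out, "\n");
}

static void init_descriptors(void) {
    memset(reg_desc, 0, sizeof reg_desc);
    memset(addr_reg, 0, sizeof addr_reg);
    for (int v = 0; v < NVARS; v++) in_mem[v] = 1;   /* every value starts in memory */
    out[0] = '\0';
}

/* Make register r hold only variable v. */
static void set_reg(int r, int v) {
    for (int w = 0; w < NVARS; w++) addr_reg[w] &= ~(1u << r);  /* r holds nothing else now */
    reg_desc[r] = 1u << v;
    addr_reg[v] |= 1u << r;
}

/* LD r, v and rule 1 */
static void load(int r, int v) {
    char line[32];
    snprintf(line, sizeof line, "LD R%d, %s", r, var_name[v]);
    emit(line);
    set_reg(r, v);
}

/* x = y + z with registers rx, ry, rz (assumed chosen by getReg); rule 3 */
static void gen_add(int x, int y, int z, int rx, int ry, int rz) {
    char line[32];
    if (!(reg_desc[ry] & (1u << y))) load(ry, y);
    if (!(reg_desc[rz] & (1u << z))) load(rz, z);
    snprintf(line, sizeof line, "ADD R%d, R%d, R%d", rx, ry, rz);
    emit(line);
    for (int r = 0; r < NREGS; r++) reg_desc[r] &= ~(1u << x);  /* old copies of x are stale */
    addr_reg[x] = 0;
    set_reg(rx, x);   /* Rx holds only x; x's only location is Rx */
    in_mem[x] = 0;    /* the memory location for x is not current */
}

/* x = y using register ry; rule 4 */
static void gen_copy(int x, int y, int ry) {
    if (!(reg_desc[ry] & (1u << y))) load(ry, y);
    for (int r = 0; r < NREGS; r++) reg_desc[r] &= ~(1u << x);
    reg_desc[ry] |= 1u << x;   /* add x to the register descriptor for Ry */
    addr_reg[x] = 1u << ry;    /* x's only location is Ry */
    in_mem[x] = 0;
}

/* End of block: ST x, R for every live x whose memory copy is not current; rule 2 */
static void end_block(unsigned live) {
    char line[32];
    for (int v = 0; v < NVARS; v++) {
        if (!(live & (1u << v)) || in_mem[v]) continue;
        assert(addr_reg[v] != 0);
        int r = 0;
        while (!(addr_reg[v] & (1u << r))) r++;
        snprintf(line, sizeof line, "ST %s, R%d", var_name[v], r);
        emit(line);
        in_mem[v] = 1;
    }
}
```

For the block a = b + c; d = a with a and d live on exit (using R1 and R2), the sketch generates LD R1, b; LD R2, c; ADD R1, R1, R2; ST a, R1; ST d, R1. The copy needs no instruction of its own, only a descriptor update.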

b. Construct DAG and target code for the expression x=((a+b)|(b-c))-(a+b)*(b-c)+f and explain the same
32.
a. Explain about global data flow analysis
• General explanation of data-flow analysis and abstraction – 3 marks
• Possible execution paths – 3 marks
• Data-flow analysis schema – 3 marks
• Control-flow constraints – 3 marks

The Data-Flow Abstraction

• Many global optimizations depend on data-flow analysis.
• "Data-flow analysis" refers to a body of techniques that derive information about the flow of data along program execution paths.
• For example, one way to implement global common subexpression elimination requires us to determine whether two identical expressions evaluate to the same value along any possible execution path of the program.
• As another example, if the result of an assignment is not used along any subsequent execution path, then we can eliminate the assignment as dead code.
• Each execution of an intermediate-code statement transforms an input state to a new output state.
• The input state is associated with the program point before the statement.
• The output state is associated with the program point after the statement.
Possible execution paths
• Within one basic block, the program point after a statement is the same as the
program point before the next statement.
• If there is an edge from block B1 to block B2, then the program point after the
last statement of B1 may be followed immediately by the program point before
the first statement of B2.
• Thus, we may define an execution path (or just path) from point p1 to point pn to be a sequence of points p1, p2, ..., pn such that for each i = 1, 2, ..., n - 1, either
• pi is the point immediately preceding a statement and pi+1 is the point immediately following that same statement, or
• pi is the end of some block and pi+1 is the beginning of a successor block.
• "In general, there is an infinite number of possible execution paths through a program, and there is no finite upper bound on the length of an execution path."

The Data-Flow Analysis Schema

• In each application of data-flow analysis, we associate with every program point a data-flow value that represents an abstraction of the set of all possible program states that can be observed for that point.
• We denote the data-flow values before and after each statement s by IN[s] and OUT[s], respectively.
• The data-flow problem is to find a solution to a set of constraints on the IN[s]'s and OUT[s]'s, for all statements s.
• There are two sets of constraints:
• Those based on the semantics of the statements ("transfer functions")
• Those based on the flow of control.
• Transfer functions come in two flavors: information may propagate forward along execution paths, or it may flow backwards up the execution paths.
• In a forward-flow problem, the transfer function of a statement s, which we shall usually denote f_s, takes the data-flow value before the statement and produces a new data-flow value after the statement. That is,
• OUT[s] = f_s(IN[s])
• Conversely, in a backward-flow problem, the transfer function f_s for statement s converts a data-flow value after the statement to a new data-flow value before the statement. That is,
• IN[s] = f_s(OUT[s])
Control-Flow Constraints
• Within a basic block, control flow is simple.
• If a block B consists of statements s1, s2, ..., sn in that order, then the data-flow value out of si is the same as the data-flow value into si+1. That is,
• IN[si+1] = OUT[si], for all i = 1, 2, ..., n - 1
• Control-flow edges between basic blocks create more complex constraints
between the last statement of one basic block and the first statement of the
following block.
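One concrete instance of this schema (chosen here for illustration; the key does not pick one) is live-variable analysis, a backward-flow problem whose transfer function is IN[B] = use[B] ∪ (OUT[B] - def[B]) and whose control-flow constraint is OUT[B] = ∪ IN[S] over the successors S of B. The sketch works per basic block rather than per statement and assumes a hypothetical four-block flow graph: B0: a = 1; b = 2, B1: c = a + b, B2: a = c * 2 (then back to B1 or on to B3), B3: print a. Sets of variables are bit masks, and the equations are iterated until no IN or OUT changes.

```c
#include <assert.h>
#include <stdbool.h>

#define NB 4                           /* blocks B0..B3 of the hypothetical flow graph */
enum { VA = 1, VB = 2, VC = 4 };       /* variables a, b, c as bits */

/* use[B]: used before any definition in B; def[B]: defined before any use in B */
static const unsigned use[NB] = {0, VA | VB, VC, VA};
static const unsigned def[NB] = {VA | VB, VC, VA, 0};
/* successors of each block, -1 = none; B2 loops back to B1 */
static const int succ[NB][2] = {{1, -1}, {2, -1}, {1, 3}, {-1, -1}};
static unsigned in[NB], out[NB];

static void live_variables(void) {
    for (int b = 0; b < NB; b++) in[b] = out[b] = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = NB - 1; b >= 0; b--) {       /* backward problem: visit in reverse */
            unsigned o = 0;
            for (int k = 0; k < 2; k++)
                if (succ[b][k] >= 0) o |= in[succ[b][k]];   /* OUT[B] = union of IN[S] */
            unsigned i = use[b] | (o & ~def[b]);            /* IN[B] = use ∪ (OUT - def) */
            if (i != in[b] || o != out[b]) changed = true;
            in[b] = i;
            out[b] = o;
        }
    }
}
```

It finds b live throughout the loop (IN[B1] = {a, b}, IN[B2] = {b, c}) and nothing live on entry to B0.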
b.
i. Explain loop optimization techniques (6 marks)
ANS
Loop Optimization:
The optimization performed on inner loops is called loop optimization.
Generally, the inner loop is where a program spends a large amount of time. Hence, if the number of instructions in the inner loop is reduced, the running time of the program decreases.
The following techniques can be performed on inner loops:
Code Motion/Loop Invariant
Induction Variable
Reduction in Strength
Loop Fusion/Loop Jamming
Loop Unrolling

Code Motion/Loop Invariant:
Code motion is the optimization performed on an inner loop in which code is moved outside the loop.
If a computation inside the loop gives the same result every time the loop is executed, it should be placed outside the loop, i.e., just before the loop.
Example:
int i, max = 10;
for (i = 0; i <= max - 1; i++)
{
printf("%d", i);
}
In the example code, the expression max-1 is evaluated 11 times (once for each test of the loop condition) and gives the same result each time. Hence, this code can be optimized by moving the computation of max-1 outside the loop, i.e., by placing it before the loop, thereby avoiding repeated computation.
The optimized code is
int i, max = 10, r;
r = max - 1;
for (i = 0; i <= r; i++)
{
printf("%d", i);
}

Induction Variable:
A variable x is called an induction variable of loop L if, every time x changes value, it is incremented or decremented by some constant.
Example 1:
int i, max = 10, r;
r = max - 1;
for (i = 0; i <= r; i++)
{
printf("%d", i);
}
In the above code, the variable i is an induction variable, as its value is incremented by 1 on every iteration, i.e., it takes the values 0, 1, 2, ..., 9 in the loop body (and 10 when the loop exits).

Reduction in Strength:
The strength of certain operators is higher than that of others.
For example, the strength of * is higher than that of +. Usually, higher-strength operators take more time to execute.
Replacing a higher-strength operator by a lower-strength operator is called strength reduction.
Optimization can be done by applying strength reduction, where higher-strength operators are replaced by lower-strength operators.
Example:
for (i = 1; i <= 10; i++)
{
sum = i * 7;
printf("%d", sum);
}
In the above code, replacing * by + will speed up the object code. The optimization is done without changing the meaning of the code.
The optimized code is
temp = 0;
for (i = 1; i <= 10; i++)
{
temp = temp + 7;   /* temp now equals i * 7 */
sum = temp;
printf("%d", sum);
}
Note: This technique is not applied to floating-point expressions because such a use may yield different results.

Loop Fusion/Loop Jamming:

This technique combines the bodies of two loops that share the same index variable and number of iterations, provided the merged loop computes the same results.
Example:
for (i = 0; i <= 10; i++)
{
a[i] = 0;
}
for (i = 0; i <= 10; i++)
{
b[i] = 1;
}
The above code can be merged into one loop, and the optimized code can be rewritten as
for (i = 0; i <= 10; i++)
{
a[i] = 0;
b[i] = 1;
}
Loop Unrolling:
In this technique, the number of jumps and tests is reduced by writing the loop body two (or more) times inside the loop, without changing the meaning of the code.
Example:
int i = 0;
while (i < 100)
{
a[i] = b[i];
i++;
}
The example code can be optimized as
int i = 0;
while (i < 100)
{
a[i] = b[i];
i++;
a[i] = b[i];
i++;
}
The first loop repeats 100 times, whereas the second loop repeats only 50 times, so half of the tests and jumps are saved. Because 100 is even, each pass of the unrolled loop performs exactly two of the original iterations.
32 b.
ii. Write short notes on parameter passing (6 marks)
ANS:
Parameter Passing
The communication medium among procedures is known as parameter passing. The values of the variables from a calling procedure are transferred to the called procedure by some mechanism. First, some basic terminology pertaining to values in a program is introduced.

r-value
The value of an expression is called its r-value. The value contained in a single variable also
becomes an r-value if it appears on the right-hand side of the assignment operator. r-values can
always be assigned to some other variable.

l-value
The location of memory (address) where an expression is stored is known as the l-value of that expression. Only an expression with an l-value may appear on the left-hand side of an assignment operator.

For example:

day = 1;
week = day * 7;
month = 1;
year = month * 12;
From this example, we understand that constant values like 1, 7, 12, and variables like day,
week, month and year, all have r-values. Only variables have l-values as they also represent
the memory location assigned to them.

For example:

7 = x + y;
is an l-value error, as the constant 7 does not represent any memory location.

Formal Parameters
Variables that take the information passed by the caller procedure are called formal parameters.
These variables are declared in the definition of the called function.

Actual Parameters
Variables whose values or addresses are being passed to the called procedure are called actual
parameters. These variables are specified in the function call as arguments.

Example:

#include <stdio.h>

void fun_two(int formal_parameter);

void fun_one(void)
{
int actual_parameter = 10;
fun_two(actual_parameter);
}
void fun_two(int formal_parameter)
{
printf("%d\n", formal_parameter);
}
Formal parameters hold the information of the actual parameter, depending upon the parameter
passing technique used. It may be a value or an address.

Pass by Value
In the pass-by-value mechanism, the calling procedure passes the r-values of the actual parameters, and the compiler puts them into the called procedure’s activation record. Formal parameters then hold the values passed by the calling procedure. If the values held by the formal parameters are changed, this has no impact on the actual parameters.

Pass by Reference
In the pass-by-reference mechanism, the l-value of the actual parameter is copied to the activation record of the called procedure. This way, the called procedure has the address (memory location) of the actual parameter, and the formal parameter refers to the same memory location. Therefore, if the value pointed to by the formal parameter is changed, the change is seen through the actual parameter, since both refer to the same location.

Pass by Copy-restore
This parameter passing mechanism works similarly to ‘pass-by-reference’, except that the changes to the actual parameters are made when the called procedure ends. Upon a function call, the values of the actual parameters are copied into the activation record of the called procedure. Manipulating the formal parameters has no immediate effect on the actual parameters (the formals hold copies of the values), but when the called procedure ends, the values of the formal parameters are copied back into the locations (l-values) of the actual parameters.

Example:

int y;
calling_procedure()
{
y = 10;
copy_restore(y); //l-value of y is passed
printf y; //prints 99
}
copy_restore(int x)
{
x = 99; // y still has value 10 (unaffected)
y = 0; // y is now 0
}
When this function ends, the value of the formal parameter x is copied into the location (l-value) of the actual parameter y. So even though y was set to 0 inside the procedure, it is overwritten with 99, which is why the caller prints 99. Under pass by reference, x and y would denote the same location, so the same code would leave y equal to 0.
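C itself passes arguments only by value, but the three behaviours can be emulated to check the copy-restore example above. In this sketch (the function names are hypothetical), pass by reference is modelled with a pointer, and copy-restore with an explicit copy-in at entry and copy-out at return; each body does the same work as copy_restore's body.

```c
#include <assert.h>

int y;   /* the global y from the example */

/* Pass by value: x is a private copy of the actual parameter's r-value. */
static void by_value(int x) {
    x = 99;     /* changes only the copy */
    y = 0;
    (void)x;
}

/* Pass by reference: x holds the l-value of the actual, so *x and y are the same location. */
static void by_reference(int *x) {
    *x = 99;    /* y becomes 99 ... */
    y = 0;      /* ... and then 0 */
}

/* Copy-restore: copy the value in, work on a local copy, copy it back on return. */
static void by_copy_restore(int *actual) {
    int x = *actual;   /* copy in: x = 10 */
    x = 99;            /* y is unaffected */
    y = 0;             /* y is now 0 */
    *actual = x;       /* restore: y = 99 */
}
```

Starting each call with y = 10, y ends as 0 after by_value(y) and after by_reference(&y), but as 99 after by_copy_restore(&y), matching the value printed in the example.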

Pass by Name
Languages like Algol provide a different kind of parameter passing mechanism that works like the preprocessor in the C language. In the pass-by-name mechanism, a call to a procedure is treated as if it were replaced by the procedure's body. Pass-by-name textually substitutes the argument expressions in a procedure call for the corresponding parameters in the body of the procedure, so that the body now works directly on the actual parameters, much like pass-by-reference.
