Compiler Manual

Jaihind College of Engineering,
Kuran, Pune
Department of Computer Engineering
LABORATORY MANUAL
2021-2022
LABORATORY PRACTICE-IV
[Compiler]
BE-COMPUTER ENGINEERING
Subject Code: 410252(B)
SUBMITTED BY:
Name of Student: Kiran Satish Kardile

Exam seat no: B150844235
TEACHING SCHEME EXAMINATION SCHEME

Practical: 4Hrs/Week Practical Assessment: 50 Marks
Term Work: 50 Marks
JAIHIND COLLEGE OF ENGINEERING, KURAN
CERTIFICATE
DEPARTMENT OF COMPUTER ENGINEERING
This is to certify that
Kum. Kiran Satish Kardile Roll no. 35 Course BE Computer Exam Seat No
B150844235 of fourth year degree in Computer Engineering has completed the term of the
year 2021-22 as prescribed by University of Pune.
Subject Faculty: Head of Department

Prof :Dumbre A.S Prof:Khatri A.A
Department of Computer Engg Department of computer engg
Dr.Garkal D.J
Principal
JCOE,Kuran
Page
Srno. Title Sign
no
Implement a Lexical Analyzer using LEX for a
1 subset of C. Cross check your output with Stanford
LEX.
Implement a parser for an expression grammar using
YACC andLEX for the subset of C. Cross check your
2 output with StanfordLEX and YACC.
Generate and populate appropriate symbol

3
Implementation of Semantic Analysis Operations

(like type checking, verification of function
parameters,variable declarations and coercions)
4
possibly using an Attributed Translation Grammar.
Generate and populate appropriate symbol
Implement the front end of a compiler that generates

5
the three address code for a simple language
A Register Allocation algorithm that translates the

6 given code into one with a fixed number of
registers
Implement Local and Global Code Optimizations
such as Common Sub-expression Elimination,
7
Copy Propagation, Dead-Code Elimination, Loop
and Basic-Block Optimizations. (Optional
Assignment No. 1
Implement a Lexical Analyzer using LEX for a subset of C. Cross

Title check your output with Stanford LEX.
Roll No. 35
Class B.E. (Computer)
Date
Subject Laboratory Practice-IV
Signatureof subject
teacher.
Marks - /10
Lexical analyzer for HLL using
TITLE LEX
Implement a Lexical Analyzer using LEX for a subset of C. Cross check your
PROBLEM output with Stanford LEX.
STATEMENT
/DEFINITION
Appreciate the role of lexical analysis phase in compilation

OBJECTIVE  Understand the theory behind design of lexical analyzers and lexical analyzer
generator
 Be able to use LEX to generate lexical analyzers
64 bit latest Computer/ min.PC with the configuration as Pentium IV 1.7 GHz.
S/W PACKAGES AND 128M.B RAM, 40 G.B HDD, 15’’Color Monitor, Keyboard, Mouse
HARDWARE 64 bit Linux with a support of LEX utility, GCC
APPARATUS USED
1. A V Aho, R. Sethi, .J D Ullman, "Compilers: Principles, Techniques, and

REFERENCES Tools", Pearson Education, ISBN 81 - 7758 - 590
2. J. R. Levine, T. Mason, D. Brown, "Lex&Yacc", O'Reilly, 2000, ISBN 81-
7366 -061-X.– 8
3. K. Louden, "Compiler Construction: Principles and Practice", Thomson
Brookes/Cole (ISE), 2003, ISBN 981 - 243 - 694-4:page no.31-90
Refer theory, algorithm, test input, test output

STEPS
Prepare LATEX document including following

INSTRUCTIONS FOR  Title
JOURNAL  Problem Definition
 Theory, mathematical model ,test cases
 Algorithm
 Source Code
 Output
 Conclusion
Assignment No: 1
Title: Implement a Lexical Analyzer using LEX for a subset of C. Cross check your
outputwithStanford LEX.
Aim:
Assignment to understand the syntax of LEX specifications, built-in functions and variables.
Objectives:
 To understand first phase of compiler: LexicalAnalysis.

 To learn and use compiler writingtools.
 Understand the importance and usage of LEX automatedtool.
Theory:
Introduction:
LEX stands for Lexical Analyzer.LEX is a UNIX utility which generates the lexical analyzer. LEX is a tool for
generating scanners. Scanners are programs that recognize lexical patterns in text. These lexical patterns (or
regular expressions) are defined in a particular syntax. A matched regular expression may have an associated
action. This action may also include returning a token. When Lex receives input in the form of a file or text, it
attempts to match the text with the regular expression. It takes input one character at a time and continues until a
pattern is matched. If a pattern can be matched, then Lex performs the associated action (which may include
returning a token). If, on the other hand, no regular expression can be matched, further processing stops and Lex
displays an error message. Lex and C are tightly coupled. A lex file (files in Lex have the .l extension eg: first.l )
is passed through the lex utility, and produces output files in C (lex.yy.c). The program lex.yy.c basically consists
of a transition diagram constructed from the regular expressions of first.lThese file is then compiled object
program a.out, and lexical analyzer transforms an input streams into a sequence of tokens as show in fig1.1.
To generate a lexical analyzer two important things are needed. Firstly it will need a precise specification of the
tokens of the language. Secondly it will need a specification of the action to be performed on identifying each
token.
1. LEX Specifications:
The Structure of lex programs consists of three parts:
Definition Section :
The Definition Section includes declarations of variables, start conditions regular definitions, and
manifest constants (A manifest constant is an identifier that is declared to represent a constante.g. #
define PIE3.14).
 C code: Any indented code between %{ and %} is copied to the C file. This is typically used for
defining file variables, and for prototypes of routines that are defined in the codesegment.
 Definitions: A definition is very much like # define cpp directive. Forexample
letter [a-zA-Z]+ digit
[0-9]+
These definitions can be used in the rules section: one could start a rule
{letter}{printf("n Wordis = %s",yytext);}
State definitions: If a rule depends on context, it‟s possible to introduce states and
incorporatethoseintherules.Astatedefinitionlookslike%sSTATE,andbydefaulta state
INITIAL is alreadygiven.
Rule Section:
Second section is for translation rules which consist of regular expression and action with respect to it.
The translation rules of a Lex program are statements of the form:
p1 {action 1}
p2 {action 2}
p3 {action 3}
... ...
... ...
pn {action n}
Where, each p is a regular expression and each action is a program fragment describing what actionthe lexical
analyzer should take when a pattern p matches a lexeme. In Lex the actions are written inC.
Auxiliary Function(User Subroutines):

Third section holds whatever auxiliary procedures are needed by the actions. If the lex program is to be
used on its own, this section will contain a main program. If you leave this section empty
you will get the default main as follow:
int main()
{
Yylex();
Return 0;
}
In this section we can write a user subroutines its option to user e.g. yylex() is a unction automatically get called
by compiler at compilation and execution of lex program or we can call that function from the subroutinesection.
2. Built - inFunctions:
No. Function Meaning

1 yylex() The function that starts the analysis. It is automatically generated
byLex.
2 yywrap() This function is called when end of file (or input) is encountered.
If yywrap() returns 0, the scanner continues scanning, while if it
returns 1 the scanner returns a zero token to report the end of file.
3 yyless(int n) This function can be used to push back all but first „n‟ characters
of the read Token.
4 yymore() This function tells the lexer to append the next token to the current
token.
5 yyerror() This function is used for displaying any error message.
3. Built - inVariables:
No. Variables Meaning

1 Yyin Of the type FILE*. This point to the current file being parsed by the lexer.
It is standard input file that stores input source program.
2 Yyout Of the type FILE*. This point to the location where the output of the lexer
will be written. By default, both yyin and yyout point to standard input
and output.
3 Yytext The text of the matched pattern is stored in this variable (char*) i.e. When
lexer matches or recognizes the token from input token the lexeme stored
in null terminated string called yytext.
OR
This is global variable which stores current token
4 Yyleng Gives the length of the matched pattern. (yyleng stores the length or
number of character in the input string)The value in yyleng is same as
strlen() functions.
5 yylineno Provides current line number information. (May or may not be supported
by the lexer.)
6 yylval This is a global variable used to store the value of any token.
No. RE Meaning
1. Regular 1 A Matches a
2 Abc Matches abc
3 [abc] Matches a or b or c
4 [a-f] Matches a,b,c,d,e or f
5 [0-9] Matches any digit
6 + Matches one or more of x
X
7 X* Matches zero or more of x
8 [0-9]+ Matches any integer
9 (…) Grouping an expression into a single unit
10 | Alteration ( or)
11 (b|c) Is euivalent to [a-c]*
12 X? X is optional (0 or 1 occurrence)
13 If(def)? Matches if or ifdef
14 [A-Za-z] Matches any alphabetical character
15 . Matches any character except new line
16 \. Matches the . character
17 \n Matches the new character
18 \t Matches the tab character
19 \\ Matches the \ character
20 [ \t] Matches either a space or tab character
21 [^a-d] Matches any character other than a,b,c and d
22 $ End of the line
Expression:
1. Steps to Execute the program:

$ lexfilename.l(eg: first.l)
$cc lex.yy.c–ll or gcclex.yy.c–ll
$./a .out
Algorithm:
1. Start the program.
2. Lex program consists of threeparts.
a. Declaration%%
b.Translation rules %%
c. Auxilaryprocedure.
3. The declaration section includes declaration of variables, maintest, constantsand

regular definitions.
4. Translation rule of lex program are statements of theform
a. P1 {action}
b. P2 {action}
c. …
d. …
e. Pn {action}
5. Write a program in the vi editor and save it with .lextension.
6.Compile the lex program with lex compiler to produce output file as lex.yy.c. eg$lexfilename.l $ cc
lex.yy.c-ll
7.Compile that file with C compiler and verify theoutput.
Conclusion:
LEX is a tool which accepts regular expressions as an input & generates a C code to recognize that token. If that
token is identified, then the LEX allows us to write user defined routines that are to be executed. When we give
input specification file to LEX, LEX generates lex.yy.c file as an output which contains function yylex() which is
generated by the LEX tool & contains a C code to recognize the token & action to be carried out if we find
thetoken
Program:
Program.l
%{
#include<stdio.h>
%}
%%
float|int|char|double|long|void|include|typedef
printf("%s IS A KEYWORD\n",yytext);
["|(|)|{|}|#|;|,|%|&] printf("%s IS A SPECIAL SYMBOL\n", yytext);
["]+[a-zA-Z0-9]+["] printf("%s IS A MESSAGE\n", yytext);
%[a-zA-Z] printf("%s IS A STRING CONSTANT\n", yytext);
[+|-|*|/] printf("%s IS A ARITHMATIC OPERATOR\n", yytext);
[getch|main|printf|scanf|return|clrscr] printf("%s IS A PREDEFINED FUNCTION\n",
yytext);
[a-zA-Z]*.h printf("%s IS A HEADER FILE\n", yytext);
[a-zA-Z]+[a-zA-Z0-9]* printf("%s IS AN IDENTIFIER\n", yytext);
[0-9]+ printf("%s IS A CONSANT\n", yytext);
= printf("%s IS AN ASSIGNMENT OPERATOR \n", yytext);
[<|>|<=|>=|<>] printf("%s IS A RELATIONAL OPERATOR\n", yytext);
[&&|!=|==] printf("%s IS A LOGICAL OPERATOR\n", yytext);
%%
int yywrap(void)
{
return 1;
}
int main()
{
FILE *fp=fopen("code.c","r");
yyin = fp;
yylex();
yywrap();
return 0;
}
Code.c
#include<stdio.h>
#include<conio.h>
void main()
{
int a,b;
clrscr();
printf("\n\t the value of a,b\n");
scanf("%d,%d",&a,&b);
a=a+b;
b=a-b;
a=a-b;
printf("\n the no after swapping a=%d and b=%d ");
getch();
}
Output:
root123@root123-ThinkCentre-M60e:~$ vi code.c
root123@root123-ThinkCentre-M60e:~$ vi ab1.l
root123@root123-ThinkCentre-M60e:~$ lex ab1.l
root123@root123-ThinkCentre-M60e:~$ gcc lex.yy.c -ly -ll
root123@root123-ThinkCentre-M60e:~$ ./a.out
# IS A SPECIAL SYMBOL
< IS A RELATIONAL OPERATOR
stdio.h IS A HEADER FILE
> IS A RELATIONAL OPERATOR
# IS A SPECIAL SYMBOL
< IS A RELATIONAL OPERATOR
conio.h IS A HEADER FILE
> IS A RELATIONAL OPERATOR
main IS AN IDENTIFIER
( IS A SPECIAL SYMBOL
) IS A SPECIAL SYMBOL
{ IS A SPECIAL SYMBOL
a IS A PREDEFINED FUNCTION
, IS A SPECIAL SYMBOL
b IS AN IDENTIFIER
; IS A SPECIAL SYMBOL
clrscr IS AN IDENTIFIER
printf IS AN IDENTIFIER
" IS A SPECIAL SYMBOL
\n IS A PREDEFINED FUNCTION
\t IS A PREDEFINED FUNCTION
the IS AN IDENTIFIER
value IS AN IDENTIFIER
of IS AN IDENTIFIER
b IS AN IDENTIFIER
scanf IS AN IDENTIFIER
%d IS A STRING CONSTANT
%d IS A STRING CONSTANT
& IS A SPECIAL SYMBOL
& IS A SPECIAL SYMBOL
b IS AN IDENTIFIER
= IS AN ASSIGNMENT OPERATOR
+ IS A ARITHMATIC OPERATOR
b IS AN IDENTIFIER
b IS AN IDENTIFIER
-b IS AN IDENTIFIER
-b IS AN IDENTIFIER
printf IS AN IDENTIFIER
the IS AN IDENTIFIER
Assignment No. 2
Implement a parser for an expression grammar using YACC and
LEX for the subset of C. Cross check your output with Stanford
Title LEX and YACC.
35
Roll No.
Date
Signatureof
subject teacher
Marks- /10
AssignmentNo:2
Title: Implement a parser for an expression grammar using YACC and LEX for the subset of C. Cross b
check your output with Stanford LEX and YACC.
Aim: Assignment to understand basic syntax of YACC specifications built-in functions and variables
Objective:
 To understand Second phase of compiler: Syntax Analysis.

 To learn and use compiler writing tools.

 Understand the importance and usage of YACC automated tool
Theory:
Parser generator facilitates the construction of the front end of a compiler. YACC is LALR parser generator. It is
used to implement hundreds of compilers. YACC is command (utility) of the UNIXsystem. YACC stands for
“Yet Another Compiler Complier”.File in which parser generated is with .y extension. e.g. parser.y, which is
containing YACC specification of the translator. After complete specification UNIX command. YACC
transforms parser.y into a C program called y.tab.c using LR parser. The program y.tab.c is automatically
generated. We can use command with –d option asyacc –d parser.yBy using –d option two files will get
generated namely y.tab.c and y.tab.h. The header file y.tab.h willstore all the token information and so you need
not have to create y.tab.h explicitly.The program y.tab.c is a representation of an LALR parser written in C, along
with other C routines
that the user may have prepared. By compiling y.tab.c with the ly library that contains the LR parsing program
using the command.cc y tab c – lywe obtain the desired object program a out that perform the translation
specified by the original program.If procedure is needed, they can be compiled or loaded with y.tab.c, just as with
any C program.
LEX recognizes regular expressions, whereas YACC recognizes entire grammar. LEX divides theinput stream
into tokens, while YACC uses these tokens and groups them together logically. LEXand YACC work together to
analyze the program syntactically. The YACC can report conflicts orambiguities (if at all) in the form of error
messages.
1. YACC Specifications:
The Structure of YACC programs consists of three parts
1)Definition section:
2)Declartion part:
3)Translation rule section:
Definition Section:
The definitions and programs section are optional. Definition section handles control information for the YACC-
generated parser and generally set up the execution environment in which the parser will operate.
Declaration part:In declaration section, %{ and %} symbol used for C declaration. This section is used for
definition of token, union, type, start, associativity and precedence of operator. Token declared
in this section can then be used in second and third parts of Yacc specification. Translation Rule Section:
In the part of the Yacc specification after the first %% pair, we put the translation rules. Each rule consists of a
grammar production and the associated semantic action. A set of productions that we have been writing:
<left side> <alt 1> | <alt 2> | … <alt n>

Would be written in YACC as
<left side> : <alt 1> {action 1}
| <alt 2> {action 2}
……………
| <alt n> {action n}
;
In a Yacc production, unquoted strings of letters and digits not declared to be tokens are taken to be
nonterminals. A quoted single character, e.g. 'c', is taken to be the terminal symbol c, as well as the integer code
for the token represented by that character (i.e., Lex would return the character code for ' c' to the parser, as an
integer). Alternative bodies can be separated by a
vertical bar, and a semicolon follows each head with its alternatives and their semantic actions. The first head is
taken to be the start symbol.
A Yacc semantic action is a sequence of C statements. In a semantic action, the symbol $$ refers to the attribute
value associated with the nonterminal of the head, while $i refers to the value associated with the ith grammar
symbol (terminal or nonterminal) of the body. The semantic action is performed whenever we reduce by the
associated production, so normally the semantic action computes a value for $$ in terms of the $i's. In the Yacc
specification, we have
written the two E-productions. E E + T/T
and their associated semantic action as:exp : exp „+‟ term {$$ = $1 + $3;}
| term;In above production exp is $1, „+‟ is $2 and term is $3. The semantic action associated with first
production adds values of exp and term and result of addition copying in $$ (exp) left hand side. For above
second number production, we have omitted the semantic action since it is just copying the value. In general {$$
= $1;} is the default semantic action.
Supporting C-Routines Section:
The third part of a Yacc specification consists of supporting C-routines. YACC generates a single function called
yyparse(). This function requires no parameters and returns either a 0 on success, or 1 on failure. If syntax error
over its return 1.The special function yyerror() is called when YACC encountersan invalid syntax. The yyerror()
is passed a single string (char )
argument. This function just prints user defined message like:yyerror (char err)
{
printf (“Divide by zero”);
}
When LEX and YACC work together lexical analyzer using yylex () produce pairs consisting of a token and its
associated attribute value. If a token such as DIGIT is returned, the token value associated with a token is
communicated to the parser through a YACC defined variable yylval.We have to return tokens from LEX to
YACC, where its declaration is in YACC. To link this LEX program include a y.tab.h file, which is generated
after YACC compiler the program using – d option.
. Built-in Functions:
Function Meaning
yyparser() This is a standard parse routine used for calling syntax analyzer for given translation
yyerror() This function is used for displaying any error message when a yacc detects a syntax
3. Built-in Types:
Mea
Type
ning
Eg.:- %token NUMBER
Eg.:- %start S Where S is start symbol
Eg.: %left „+‟ „-„ -Assign left associatively to + & – with lowest precedence.
%left „*‟ „/„ -Assign left associatively to * & / with highest precedence.
Eg.: %right „+‟ „-„ -Assign right associatively to + & – with lowest precedence
%right „*‟ „/„ -Assign right associatively to * & / with highest precedence.
Eg.:- %nonassoc UMINUS
Eg.:- %prec UMINUS
Eg.:- %type <name of any variable> exp
% union
{ char str ;
int num ; }
rules. When yyparse() is call, the parser attempts to parse an input stream.
error
%token Used to declare the tokens used in the grammar.

%start Used to declare the start symbol of the grammar.
%left Used to assign the associatively to operators.
%right Used to assign the associatively to operators.
%nonassoc Used to unary associate.

%prec Used to tell parser use the precedence of given code.
%type Used to create the type of a variable.
%union Token data types are declared in YACC using the YACC declaration % union, like this :
Conclusion:
The yacc command accepts a language that is used to define a grammar for a target language to
be parsed by the tables and code generated by yacc. The language accepted by yacc as a grammar for
the target language is described below using the yacc input language itself.
The input grammar includes rules describing the input structure of the target language and code
to be invoked when these rules are recognized to provide the associated semantic action. The code to be
executed will appear as bodies of text that are intended to be C-language code. The C-language
inclusions are presumed to form a correct function when processed by yacc into its output files.
;CALC program
%{
#include<stdio.h>
#include<stdlib.h>
void yyerror(char *);
#include "y.tab.h"
int yylval;
%}
%%
[a-z] {yylval=*yytext='&'; return VARIABLE;}
[0-9]+ {yylval=atoi(yytext); return INTEGER;}
CALC.Y
[\t] ;
%%
int yywrap(void)
{
return 1;
}
%token INTEGER VARIABLE
%left '+' '-'
%left '*' '/'
%{
int yylex(void);
void yyerror(char *);
int sym[26];
%}
%%
PROG:
PROG STMT '\n'
;
STMT: EXPR {printf("\n %d",$1);}
| VARIABLE '=' EXPR {sym[$1] = $3;}
;
EXPR: INTEGER
| VARIABLE {$$ = sym[$1];}
| EXPR '+' EXPR {$$ = $1 + $3;}
| '(' EXPR ')' {$$ = $2;}
%%
void yyerror(char *s)
{
printf("\n %s",s);
return;
}
int main(void)
{
printf("\n Enter the Expression Grammar:");
yyparse();
return 0;
}
***OUTPUT***
Output:
$ lex calc.l
$ yacc -d calc.y
$ cc y.tab.c lex.yy.c -ll -ly -lm
$ . / a . out
Enter the Expression: ( 5 + 4 ) * 3
Answer: 27
Assignment No. 3
Generate and populate appropriate Symbol Table..
Title
35
Roll No.
Date
Signatureof
subject teacher
Marks- /10
TITLE Generate and Populate appropriateSymbolTable.
PROBLEM
STATEMENT Design suitable data structures to generate and Populate appropriate
/DEFINITION Symbol Table.
OBJECTIVE Understand the Symbol table is an important data structure created and
maintained by compilers
S/W PACKAGES AND 64-bit open source Linux (Fedora

HARDWARE 20)
APPARATUS USED Eclipse IDE, JAVA
I3 and I5 machines
1. Compiler Construction Using Java, JavaCC and Yacc, Anthony J. Dos

REFERENCES Reis, Wiley ISBN 978-0-470-94959-7
2. K Muneeswaran, “Compiler Design”, Oxford University press, ISBN 0-19-
806664-3
3. J R Levin, T Mason, D Brown, “Lex and Yacc”, O’Reilly, 2000 ISBN 81-
7366-061-X
4. A V Aho, R Sethi, J D Ullman, “Compilers: Principles, Techniques, and
Tools”, Pearson Edition, ISBN 81-7758-590-8
5. Dick Grune, Bal, Jacobs, Langendoen, Modern Compiler Design, Wiley,
ISBN 81-265-0418-8
Refer to
STEPS details
Theory-Related concept,Architecture,Syntaxetc
INSTRUCTIONS FOR  Class Diagram
JOURNAL  Test cases
Theory:
Figure 1 depicts the schematic of a two pass language processor. The first pass performsanalysis
of the source program and reflects its results in the intermediate representation (IR). The second pass
reads and analyses the IR to perform synthesis of the target program. This avoids repeated process sing of
the source program. The first pass is concerned exclusively with source language issues. Hence it is
called the FRONT END of the language processor. The second pass is concerned withthe language
processor.
The Front End
The front end performs
• Lexical analysis,
• Syntax analysis and
• Semantic analysis of the source program.
Each kind of analysis involves the following functions:
Determine validity of a source statement from the viewpoint of the analysis.
1. Determine the ‘content’ of a source statement.
2. Construct a suitable representation of the source statement for use by subsequent analysis functions, or by the
synthesis phase of the language processor. The word ‘content’ has different connotations in lexical, syntax and
semantic analysis. • In lexical analysis, the content is the lexical class to which each lexical
unit belongs
In syntax analysis it is the syntactic structure of a source statement.
• In semantic analysis the content is the meaning of a statement—for a declaration
statement, it is the set of attributes of a declared variable (e.g. type, length and
dimensionality), while for an imperative statement, it is the sequence of actions implied
by the statement.
Each analysis represents the ‘content’ of a source statement in the form of
(1) Tables of information, and (2) description of the source statement. Subsequent analysis
uses this information for its own purposes and either adds information to these tables and
descriptions, or constructs its own tables and descriptions.
Output of the front end

The IR produced by the front end consists of two components:
1. Tables of information
2. An INTERMEDIATE CODE (IC) which is a description of the source program.
TABLES
Tables contain the information obtained during different analyses of SP. The most important table is the
symbol table which contains information concerning all identifiers used in the SP. The symbol table is built
during lexical analysis. Semantic analysis adds information concerning symbol attributes while processing
declaration statements. It may also add new names designating temporary results.
INTERMEDIATE CODE ( I C )
The IC is a sequence of IC units; each IC unit representing the meaning of one action in SP. IC units may
contain references to the information in various tables.
Example
Figure 2 shows the IR produced by the analysis phase for the program i:
integer;
a,b: real;
a := b+i;Symbol Table:
Intermediate code
#1) to real, giving (id, #4)
2. Add (id, #4) to (id, #3) giving (id, #5)
3. Store (id, #5) in (Id, #2)
Figure 2: Intermediate Representation
The symbol table contains information concerning the identifiers and their types. This information is
determined during lexical and semantic analysis, respectively. In IC, the specification (Id, #1) refers to the id
occupying the first entry in the table. i* and temp are temporary names added during semantic analysis of the
assignment statement.
The Back End

The back end performs memory allocation and code generation.
Memory allocation
Memory allocation is a simple task given the presence of the symbol table. The memory requirement
of an identifier is computed from its type, length and dimen-sionality, and memory is allocated to it. The
address of the memory area is entered in the symbol table.
Example
After memory allocation, the symbol table looks as shown in Figure 3. The entries for i* and
temp are not shown because memory allocation is not needed for these id’s.
Figure 3: Symbol table after memory allocation
Certain decisions have to precede memory allocation, for example, whether i* and temp should
be allocated memory. These decisions are taken in the preparatory steps of code generation.
Code generation
Code generation uses knowledge of the target architecture, viz. knowledge of instructions and addressing modes
in the target computer, to select the appropriate instructions.
Example
For the sequence of actions for the assignment statement a :* b+i;

(i) Convert i to real, giving i*.
(ii) Add i* to b, giving temp,
(iii) Store temp in a.
the synthesis phase may decide to hold the values of i* and temp in machine registers and may generate the
assembly code
CONV_R AREG, I
ADD_R AREG, B
MOVEM AREG, A
where CONV_R converts the value of I into the real representation and leaves the result in AREG. ADD_R
performs the addition in real mode and MOVEM puts the result into the memory area allocated to A.
Input:
#include<stdio.h>
void main()
{double a=10;
int b=5; float
c=2.3f; char
d='A';
}
Output:
Generating the symbol table
ID Datatype Variable Value
1 double a 10
2 int b 5
3 float c 2.3f
4 char d ‘A’
Conclusion:
The yacccommand accepts a language that is used to define a grammar for a target language to be
parsed by the tables and code generated by yacc. The language accepted by yaccas a grammar for the target
language is described below using the yaccinput language itself
Program.java
import java.io.*;
class P1
{
public static void main(String ar[])throws IOException
{
BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
int i;
String a[][]={{"","START","101",""},
{"","MOVER","BREG","ONE"},
{"AGAIN","MULT","BREG","TERM"},
{"","MOVER","CREG","TERM"},
{"","ADD","CREG","N"},
{"","MOVEM","CREG","TERM"},
{"N","DS","2",""},
{"RESULT","DS","2",""},
{"ONE","DC","1",""},
{"TERM","DS","1",""},
{"","END","",""}};
int lc=Integer.parseInt(a[0][2]);
String st[][]=new String[5][2];
int cnt=0,l;
for (i=1;i<11;i++)
{
if (a[i][0]!="")
{
st [cnt][0]=a[i][0];
st[cnt][1]=Integer.toString(lc);
cnt++;
if(a[i][1]=="DS")
{
int d=Integer.parseInt(a[i][2]);
lc=lc+d;
}
else
{
lc++;
}
}
Else
{
lc++;
}
}
System.out.print("***SYMBOL TABLE****\n");
System.out.println("_____________________");
for(i=0;i<5;i++)
{
for(cnt=0;cnt<2;cnt++)
{
System.out.print(st[i][cnt]+"\t");
}
System.out.println();}
String
inst[]={"STOP","ADD","SUB","MULT","MOVER","MOVEM","COMP","BC","DIV","RE
AD
","PRINT"};
String reg[]={"NULL","AREG","BREG","CREG","DREG"};
int op[][]=new int[12][3];
int j,k,p=1,cnt1=0;
for(i=1;i<11;i++)
{
for(j=0;j<11;j++)
{
if(a[i][1].equalsIgnoreCase(inst[j]))
{
op[cnt1][0]=j;
}
else
if(a[i][1].equalsIgnoreCase("DS"))
{
p=Integer.parseInt(a[i][2]);
}
else if(a[i][1].equalsIgnoreCase("DC"))
{
op[cnt1][2]=Integer.parseInt(a[i][2]);
}
}
for(k=0;k<5;k++)
{
if(a[i][2].equalsIgnoreCase(reg[k]))
{
op[cnt1][1]=k;
}
}
for(l=0;l<5;l++)
{
if(a[i][3].equalsIgnoreCase(st[l][0]))
{
int mn=Integer.parseInt(st[l][1]);
op[cnt1][2]=mn;
}
}
cnt1=cnt1+p;
}
System.out.println("\n *****OUTPUT*****\n");
System.out.println("**********MOT TABLE**********");
int dlc=Integer.parseInt(a[0][2]);
for(i=0;i<12;i++)
{
System.out.print(dlc+++"\t");
for(j=0;j<3;j++)
{
System.out.print(" "+op[i][j]+" ");
}
System.out.println();
}
System.out.println("");}
}
/*OUTPUT:********
pl2-pc2@pl2pc2-ThinkCentre-M700:~/Desktop$ javac P1.java
pl2-pc2@pl2pc2-ThinkCentre-M700:~/Desktop$ java P1
***SYMBOL TABLE****
_____________________
AGAIN 102
N 106
RESULT 108
ONE 110
TERM 111
*****OUTPUT*****
**********MOT TABLE**********
101 4 2 110
102 3 2 111
103 4 3 111
104 1 3 106
105 5 3 111
106 0 0 0
107 0 0 0
108 0 0 0
109 0 0 0
110 0 0 1
111 0 0 0
112 0 0 0
pl2-pc2@pl2pc2-ThinkCentre-M700:~/Desktop$
Assignment No. 4
Title Implement Semantic Analysis Operations (like type checking,

verification of function parameters, variable declarations and
coercions) possibly using an Attributed Translation Grammar.
age.
Roll No. 35
Date
Signatureof
subject teacher
Marks- /10
Implementation of Semantic Analysis Operations
TITLE
Implement Semantic Analysis Operations (like type checking, verification of

PROBLEM function parameters, variable declarations and coercions) possibly using an
STATEMENT Attributed Translation Grammar
/DEFINITION
Semantic analysis is the task of ensuring that the declarations and statements of
OBJECTIVE a program are semantically correct, i.e, that their meaning is clear and consistent
with the way in which control structures and data types are supposed to be used.
S/W PACKAGES AND 64-bit open source Linux (Fedora

HARDWARE 20)
APPARATUS USED Eclipse IDE, JAVA
I3 and I5 machines

806664-3
7366-061-X
ISBN 81-265-0418-8
Refer todetails
STEPS
INSTRUCTIONS FOR Learning Objective

JOURNAL  Learning Outcome
 Theory-Related
concept,Architecture,Syntaxetc
 Class Diagram
 Test cases
 Program Listing
 Output
 Conclusion
Theory:
Aim: Implementation of Semantic Analysis Operations
Pre-requisite: familiarity with the phases of compiler like lexical analysis i.e. parser
Learning Objectives:
- Study of Semantics analysis phase of compiler how it helps interpret symbols, their types, and their
relations with each other.
- Study of Grammar used by Semantics analysis to interpret meaning of programming and natural
language sentences.
Learning Outcomes:
The students will be able to

 Construct meaningful sentence in programming and natural language
Able to understand how semantic analysis phase of compiler behaves.

Theory:
Semantics
Semantics of a language provide meaning to its constructs, like tokens and syntax structure. Semantics
help interpret symbols, their types, and their relations with each other. Semantic analysis judges whether the
syntax structure constructed in the source program derives any meaning or not.
CFG + semantic rules = Syntax Directed Definitions
For example:
int a = “value”;
should not issue an error in lexical and syntax analysis phase, as it is lexically and structurally correct, but it
should generate a semantic error as the type of the assignment differs. These rules are set by the grammar of the
language and evaluated in semantic analysis. The following tasks should be performed in semantic analysis:
 Scope resolution
 Type checking
 Array-bound checking

Semantic Errors
We have mentioned some of the semantics errors that the semantic analyzer is expected to recognize:
 Type mismatch
 Undeclared variable
 Reserved identifier misuse.
 Multiple declaration of variable in a scope.
 Accessing an out of scope variable.
 Actual and formal parameter mismatch.
Attribute Grammar
Attribute grammar is a special form of context-free grammar where some additional information
(attributes) are appended to one or more of its non-terminals in order to provide context-sensitive information.
Each attribute has well-defined domain of values, such as integer, float, character, string, and expressions.
Attribute grammar is a medium to provide semantics to the context-free grammar and it can help specify the
syntax and semantics of a programming language. Attribute grammar (when viewed as a parse-tree) can pass
values or information among the nodes of a tree.
Example:
E → E + T { E.value = E.value + T.value }
The right part of the CFG contains the semantic rules that specify how the grammar should be interpreted. Here,
the values of non-terminals E and T are added together and the result is copied to the non-terminal E.
Semantic attributes may be assigned to their values from their domain at the time of parsing and evaluated at the
time of assignment or conditions. Based on the way the attributes get their values, they can be broadly divided
into two categories : synthesized attributes and inherited attributes.
Synthesized attributes
These attributes get values from the attribute values of their child nodes. To illustrate, assume the
following production:
S → ABC
If S is taking values from its child nodes (A,B,C), then it is said to be a synthesized attribute, as the values of
ABC are synthesized to S.
As in our previous example (E → E + T), the parent node E gets its value from its child node. Synthesized
attributes never take values from their parent nodes or any sibling nodes.
Inherited attributes
In contrast to synthesized attributes, inherited attributes can take values from parent and/or siblings. As in the
following production,
S → ABC
A can get values from S, B and C. B can take values from S, A, and C. Likewise, C can take values from S, A,
and B.
Expansion : When a non-terminal is expanded to terminals as per a grammatical rule
Reduction : When a terminal is reduced to its corresponding non-terminal according to grammar rules. Syntax
trees are parsed top-down and left to right. Whenever reduction occurs, we apply its corresponding semantic rules
(actions).
Semantic analysis uses Syntax Directed Translations to perform the above tasks.
Semantic analyzer receives AST (Abstract Syntax Tree) from its previous stage (syntax analysis).
Semantic analyzer attaches attribute information with AST, which are called Attributed AST.
Attributes are two tuple value, <attribute name, attribute value>
For example:
int value = 5;
<type, “integer”>
<presentvalue, “5”>
We may conclude that if a definition is S-attributed, then it is also L-attributed as L-attributed definition
encloses S-attributed definitions.
Conclusion:
age.The input grammar includes rules describing the input structure of the target language and code to be
invoked when these rules are recognized to provide the associated semantic action. The code to be
executed will appear as bodies of text that are intended to be C-language code may conclude that if a
definition is S-attributed, then it is also L-attributed as L-attributed definition encloses S-attributed
definitions. We implement Semantic analysis operation (like type checking,verifications of function
parameter,variable declarations)using an attributed translation grammar.
Program:
**********ip.l***********
%{
#include "y.tab.h"
%}
ALPHA [A-Z a-z]
DIGIT [0-9]
%%
{ALPHA}({ALPHA}|{DIGIT})* return ID;
{DIGIT}+ {yylval=atoi(yytext); return ID;}
[\n \t] yyterminate();
. return yytext[0];
%%
***********ip.y**********
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token ID
%left '+' '-'
%left '*' '/'
%left UMINUS
%%
S:E
E : E'+'{A1();}T{A2();}
| E'-'{A1();}T{A2();}
| T;
T : T'*'{A1();}F{A2();}
| T'/'{A1();}F{A2();}
| F;
F : '('E{A2();}')'
| '-'{A1();}F{A2();}
| ID{A3();};
%%
#include "lex.yy.c"
char st[100];
int top=0;
main()
{
printf("Enter infix expression: ");
yyparse();
printf("\n");
}
A1()
{
st[top++]=yytext[0];
}
A2()
{printf("%c",st[--top]);
}
A3()
{
printf("%c",yytext[0]);
}
Output1:
outputsemantic.txt
root1@root1-ThinkCentre-M60e:~$lex semantic.l
root1@root1-ThinkCentr-M60e:~$yacc semantic.y
root1@root1-ThinkCentre-M60e:~$gcc y.tab.c –ll –ly
root1@root1-ThinkCentre-M60e:~$./a.out .
Enter Infix Expression:(5-9)*(4+6)
59-46+*
Assignment No. 5
Implement the front end of a compiler that generates the
Title three address code for a simple language.
Roll No. 35
Date
Signatureof
subject teacher-
Marks- /10
TITLE Implement the front end of a compiler that generates the three address code
for a simple language
PROBLEM
STATEMENT 1. Write a LEX and YACC program to generate Intermediate Code for
/DEFINITION arithmetic expression
2. Write a LEX and YACC program to generate Intermediate Code for subset
of C (If,else,while)
OBJECTIVE
To understand the fourth phase of a compiler: Intermediate code generation

(ICG)
 To learn and use compiler writing tools.
 Understand and learn how to write three address code for given statement
S/W PACKAGES AND Linux OS (Fedora 20), PC with the configuration as 64-bit Fedora or
HARDWARE equivalent OS with 64-bit Intel-i5/ i7 or latest higher processor computers,
APPARATUS USED FOSS tools, LEX, YACC,

806664-3
7366-061-X
ISBN 81-265-0418-8
Refer todetails
STEPS
INSTRUCTIONS FOR Learning Objective

JOURNAL  Learning Outcome
 Theory-Related
 Class Diagram
 Test cases
 Program Listing
 Output
 Conclusion
Assignment No: 5
Title: Implement the front end of a compiler that generates the three address code for a simple language.
Aim: Write an attributed translation grammar to recognize declarations of simple variables, "for", assignment,
if, if-else statements as per syntax of C or Pascal and generate equivalent three address code for the given
input made up of constructs mentioned above using LEX and YACC. Write a code to store the identifiers from
the input in a symbol table and also to record other relevant information about the identifiers. Display all
records stored in the symboltable.
Objective: To learn the function of compiler by:
 To understand fourth phase of compiler: Intermediate codegeneration.

 To learn and use compiler writingtools.
 To learn how to write three address code for givenstatement.
Theory:
Introduction:
In the analysis - synthesis model of a compiler, the front end analyzes a source program and creates an
intermediate representation, from which the back end generates target code. Ideally, details of the source
language are confined to the front end, and details of the target machine to the back end. The front end
translates a source program into an intermediate representation from which the back end generates target code.
With a suitably defined intermediate representation, a compiler for language i and machine j can then be built
by combining the front end for language i with the back end for machine j. This approach to creating suite of
compilers can save a considerable amount of effort: m x n compilers can be built by writing just m front ends
and n backends.
Benefits of using a machine-independent intermediate formare:

1. Retargeting is facilitated. That is, a compiler for a different machine can be created byattaching a back
end for the new machine to an existing frontend.
2. A machine-independent code optimizer can be applied to the intermediaterepresentation
Intermediate Languages:
Three ways of intermediate representation:
 Syntax tree
 Postfixnotation
 Three address code
The semantic rules for generating three-address code from common programming language
constructs are similar to those for constructing syntax trees or for generating postfix notation.
GraphicalRepresentations:
1. Syntax tree:
A syntax tree depicts the natural hierarchical structure of a source program. A dag(Directed
Acyclic Graph) gives the same information but in a more compact way because common
sub expressions are identified. A syntax tree and dag for the assignment statement a : =b *
-c + b * - c are as follows:
2. Postfixnotation:
Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the
tree in which a node appears immediately after its children. The postfix notation for the
syntax tree given above is
a b c uminus * b c uminus * + assign
3. Three-AddressCode:
Three-address code is a sequence of statements of the general
form x : = y op z
Where x, y and z are names, constants, or compiler-generated temporaries; op stands for any
operator, such as a fixed-or floating-point arithmetic operator, or a logical operator on Boolean
valued data.
Thus a source language expression like x+ y*z might be translated into a sequence
t1 : =
y* z
t2 : =
x+t1
Where t1 and t2 are compiler-generated
temporarynames.
The reason for the term “three-address code” is that each statement usually contains three
addresses, two for the operands and one for the result.
Advantages of three-address code:
 The unraveling of complicated arithmetic expressions and of nested flow-of-

controlstatements makes three-address code desirable for target code generation
andoptimization.
 The use of names for the intermediate values computed by a program allows three
addresscode to be easily rearranged – unlike postfixnotation.
Three-address code is a liberalized representation of a syntax tree or a dag in which explicit names
correspond to the interior nodes of the graph. The syntax tree and dag are represented by the three-
address code sequences. Variable names can appear directly in three address statements.
Types of Three-Address Statements:
The common three-address statements are:
 Assignment statements of the form x : = y op z, where op is a binary arithmetic

orlogical operation.
 Assignment instructions of the form x : = op y, where op is a unary operation. Essential

unary operations include unary minus, logical negation, shift operators, and conversion
operators that, for example, convert a fixed-point number to a floating-pointnumber.
 Copy statements of the form x : = y where the value of y is assigned tox.

 The unconditional jump goto L. The three-address statement with label L is the next
tobe executed.
 ConditionaljumpssuchasifxrelopygotoL.Thisinstructionappliesarelationaloperator(<,=,
>=, etc. )to x and y, and executes the statement with label L next if x stands in relation
relop toy. If not, the three-address statement following if x relop y goto L is executed next,
as in the usual sequence.
paramxandcallp,nforprocedurecallsandreturny,whereyrepresentingareturnedvalueis
optional. Forexample,
para
m x1
para
m x2
.......
para
mxn
callp
,n
generated as part of a call of the procedure p(x1, x2, …. ,xn ).
 Indexed assignments of the form x : = y[i] and x[i] : = y.
 Address and pointer assignments of the form x : = &y , x : = *y, and *x : =y.
Implementation of Three-Address Statements:
A three-address statement is an abstract form of intermediate code. In a compiler, these statements

can be implemented as records with fields for the operator and the operands. Three such
representations are: Quadruples, Triples, Indirecttriples.
A. Quadruples:
 A quadruple is a record structure with four fields, which are, op, arg1, arg2
andresult.
 The op field contains an internal code for the operator. The 3 address statement x
= yop z is represented by placing y in arg1, z in arg2 and x inresult.
 The contents of fields arg1, arg2 and result are normally pointers to the symbol-
table entries for the names represented by these fields. If so, temporary names must
be entered into the symbol table as they arecreated.
Fig a) shows quadruples for the assignment a : b * c + b *c

B. Triples:
 To avoid entering temporary names into the symbol table, we might refer to
atemporary value by the position of the statement that computesit.
 If we do so, three-address statements can be represented by records with

onlythree fields: op, arg1 andarg2.
 The fields arg1 and arg2, for the arguments of op, are either pointers to the
symboltable or pointers into the triple structure ( for temporary values).
 Since three fields are used, this intermediate code format is known astriples.
 Fig b) shows the triples for the assignment statement a: = b * c + b *c.
C.
D.Indirecttriples:
 Indirect triple representation is the listing pointers to triples rather-than listing

thetriples themselves.
 Let us use an array statement to list pointers to triples in the desiredorder.
Fig c) shows the indirect triplerepresentation

Steps to execute the program
$lexfilename.l (eg:comp.l)
$ yacc -dfilename.y (eg:comp.y)
$cc lex.yy.cy.tab.c –ll –ly -lm
$./a .out
Algorithm:
Write a LEX and YACC program to generate Intermediate Code for arithmetic expression
LEX program:
1. Declaration of header files specially y.tab.h which contains declaration for Letter,
Digit,expr.
2. End declaration section by%%
3. Match regular expression.
4. If match found then convert it into char and store it in yylval.p where p is pointer
declaredin YACC
5. Returntoken
6. If input contains new line character (\n) then return0
7. If input contains „.‟ then returnyytext[0]
8. End rule-action section by%%
9. Declare mainfunction
a. open file given at command line

b. if any error occurs then print error andexit
c. assign file pointer fptoyyin

d. call function yylex until file ends
10. End
1. Declaration of headerfiles
2. Declare structure for three address code representation having fields of

argument1,argument2, operator, result.
3. Declare pointer of char type inunion.
4. Declare token expr of type pointerp.
5. Give precedence to„*‟,‟/‟.
6. Give precedence to„+‟,‟-‟.
7. End of declaration section by%%.
8. If final expression evaluates then add it to the table of three addresscode.
9. If input type is expression of theform.
a. exp‟+‟exp then add to table the argument1, argument2,operator.
b. exp‟-‟exp then add to table the argument1, argument2,operator.
c. exp‟*‟exp then add to table the argument1, argument2,operator.
d. exp‟/‟exp then add to table the argument1, argument2,operator.
e. (exp) then assign $2 to$$.
f. Digit OR Letter then assigns $1 to$$.
10. End the section y%%.
11. Declare file *yyinexternally.
12. Declare main function and call yyparse function untillyyinends
13. Declare yyerror for if any error occurs.
14. Declare char pointer s to printerror.

b. Print errormessage.
16. End of the program.
Addtotable function will add the argument1, argument2, operator and temporary variable to the
structure array of three addresscode.
Three address code function will print the values from the structure in the form first temporary
variable, argument1, operator,argument2
Quadruple Function will print the values from the structure in the form first operator, argument1,
argument2, result field
Triple Function will print the values from the structure in the form first argument1, argument2, and
operator. The temporary variables in this form are integer / index instead of variables.
Conclusion:
Thus we have studied how to generate intermediate code. A three-address statement is an

abstract form of intermediate code. In a compiler, these statements can be implemented as records
with fields for the operator and the operands. Three such representations are: Quadruples, Triples,
Indirect triples that we have studied.
FAQ’s
1. What are the different forms ofICG?
2. What are the difference between syntax tree andDAG?
3. What are advantages of 3-addresscode?
4. Which representation of 3-address code is better than other and why?Justify.
c. What is role of Intermediate code incompiler?
Program:
*******************tac.l****************
%{
#include<stdio.h>
#include"y.tab.h"
char ch='a';
%}
%%
[0-9]+ {yylval.dval=yytext[0]; return NUM;} /*this would store value of integer
in yyval;nd return token integer to yacc*/
\n {return 0;}
. {return yytext[0];}
%%
char intgencode(char ch1,char first,char op,char second)

{
printf("\n%c = %c %c %c\n",ch,first,op,second);
return ch++;
}
void yyerror(char* str)
{
printf("\n%s",str);
}
************************tac.y****************
%{
#include<stdio.h>
%}
%union
{
char dval; /*to define possible symbol types,is to allow
storing different kind of objects into node emitted by lex*/
}
%token <dval> NUM
%type <dval> E
%left '+' '-'
%left '*' '/'
%%
statement : E {printf("\n t= %c\n This is Three Address Code ...\n",$1);}
E : E '+' E {$$=intgencode($$,$1,'+',$3);}
| E '-' E {$$=intgencode($$,$1,'-',$3);}
| E '*' E {$$=intgencode($$,$1,'*',$3);}
| E '/' E {$$=intgencode($$,$1,'/',$3);}
| '(' E ')' {$$=$2;}
| NUM {$$=$1;}
;
%%
main()
{
yyparse();
return 0;
}
OUTPUT
root123@root123-ThinkCentre-M60e:~$ lex tac.l
root123@root123-ThinkCentre-M60e:~$yacc-d tac.y
root123@root123-ThinkCentre-M60e:~$gcc y.tab.c lex .yy.c –ll-ly-lm
root123@root123-ThinkCentre-M60e:~$./a.out /
3∗5+10/10-2
a=3∗5
b=10/10
c=a+b
d=c-2
t=d
This i s Three Address Code
Assignment No. 6
Title Write a program for Register Allocation algorithm that translates the given code into
one with a fixed number of registers.
.
Roll No. 35
Date
Signature of subject
teacher-
Marks- /10
TITLE Code generation using Register Allocation
PROBLEM Write a program for Register Allocation algorithm that translates the
STATEMENT given code into one with a fixed number of registers
/DEFINITION
OBJECTIVE
To understand code generation phase of
compilation.
To understand register allocation algorithm
S/W PACKAGES AND

HARDWARE Linux OS (Fedora 20), PC with the configuration as Core 2 Duo 2.2
APPARATUS USED GHz processor. 1 GB RAM, 160 G.B HDD, 17’’Color Monitor,
Keyboard, Mouse

806664-3
7366-061-X
ISBN 81-265-0418-8
Refer todetails
STEPS
INSTRUCTIONS FOR
JOURNAL Learning Objective
 Learning Outcome
 Theory-Related
 Class Diagram
 Test cases
 Program Listing
 Output
 Conclusion
Code Generation
Goal: Takes Intermediate code representation of source program (optimized three address code
statements) and produce equivalent target program as output.
Requirement of code genrator:
 Correct target code
 High quality code
 Effective use of resources of target machines
 Quick
Code generation is the last phase of compilation.
Simple Code Generator
 Simple code generation involves generating code for each three-address statements which are
divided into different basic blocks. For that it makes use of registers to store operands and leaves
result of computation in the register as long as possible.
 At the end of a basic block the results of the computation are stored only if the register is needed
for another computation or just before procedure call, jump or labeled statements. Because after
leaving this basic block flow enters into another different basic block.
 When generating code by above mentioned strategy we need :
1. Register Descriptor
2. Address Descriptor
1. Register descripto
oTo keep track of what is currently in each register, register descriptor is maintained.
oRegister descriptor is pointer to a list that contains information about what is currently in each of the
register. Initially all registers are empty.
2. Address descriptor
oTo keep track of the locations where current value of the name can be found at run time, address
descriptor is maintained. The locations can be a register, stack location or memory addresses.
oThis particular information is stored in the symbol table.
Code Generation Algorithm
 Input to the code generation algorithm is a sequence of 3-address statement divided into blocks.
 For each three address statement of the form a := b op c in the basic block it performs following
steps :
1. Call getreg( ) to obtain the location L in which the computation a := b op c is performed. For that
we need to pass three-address statement a := b op c as parameter to the getreg( ) function. Typically
location L can be a register or it can be a memory location.
2. Obtain the current location of the operand b from its address descriptor. If the value of y is in
memory location and also in register then prefer register rather than memory location. If the value of
b is not in location L then generate the instruction MOV b, L to store copy of b in L.
3. The instruction OP x L is generated and the address descriptor of x is updated to indicate that x is
now available in L. If L is register then update its descriptor to indicate that it will contain the run
time value of x.
4. If the current value of b and c are in the register then we can say that b and c are not further used.
That means value of b and c are not live at the end of the basic block. So alter the register descriptor
to indicate that after the execution of the three-address statement a : = b op c, those register will no
longer contain b and / or c.
 To perform computation specified by each of the three-address statement, we need a location. For
that function getreg( ) is used.
 When a function getreg( ) is called, it returns a location for the computation performed by a three-
address statement. For example if a := b op c is to be performed then getreg( ) returns a location L
where the computation b op c should be performed ; and if possible it returns a register.
 It performs the following steps to return location L.
1. First it searches for a register that already contains a value of b. If such register is available
and if value of b has no further use after execution of a := b op c and if b is
2.
3. not live at the end of basic block and hold the value of no other name then it returns that
register for L.
4. 2. Otherwise getreg() function searches for the empty register. If empty register is available
then it returns that for L.
5. 3. If empty register is not available and if a has further use in the block or op is an indexing
operator that requires register then getreg ( ) finds a suitable occupied register. This occupied
register is emptied by storing its value in proper memory location M and by updating address
descriptor that register is returned to L. The least recently used strategy can be used to find
suitable occupied register to be emptied.
6. 4. If a has no further use in the block or no suitable occupied register can be found then
getreg( ) selects a memory location and returns it for L.
7. Example
8. Consider the following expression x = (a + b) – ((c + d) – e)
9. The three-address code for this is generated as
10. t1 = a + b
11. t2 = c + d
12. t3 = t2 – e
13. x = t1 – t3
14. By applying above algorithm following code will be generated.
15. Table 1 shows the instruction generated and the contents of register descriptor and address
descriptor :
16.
17. S= { St , E , I , O , F, DD , NDD}
Where,
St is an initial state such that St={Fl | Fl is IC}. At initial state, system has IC.
E is a End state.
E={Sc, Fc | Sc = success case which gives assembly code and Fc = failure case which gives
the lexical errors}.
I= set of inputs
I={Fi | Fi is an input file which consists of IC}.
O= set of outputs
O={Op | Op is the assembly code generated}.
F=A set of Functions.
F={f | f1:I → O, function f implements simple code generator and generates the target
Program:
Cogen.l
%{
#include"stdio.h"
#include"y.tab.h"
%}
%%
[a-z][a-zA-Z0-9]*|[0-9]+ { strcpy(yylval.vname,yytext); return NAME; }
[ |\t] ;
.|\n { return yytext[0]; }
%%
Cogen.y
%{
#include<stdio.h>
#include<ctype.h>
#include<string.h>
FILE *fpOut;
%}
%union
{
char vname[10];
int val;
}
%left '+' '-'
%left '*' '/'
%token <vname> NAME
%type <vname> expression
%%
input : line '\n' input
| '\n' input
|;
line : NAME '=' expression { fprintf(fpOut,"MOV %s, AX\n",$1); }
;
expression: NAME '+' NAME
{
fprintf(fpOut,"MOV AX, %s\n",$1);
fprintf(fpOut,"ADD AX, %s\n",$3);
}
| NAME '-' NAME
{
fprintf(fpOut,"SUB AX, %s\n",$3);
}
| NAME '*' NAME
{
fprintf(fpOut,"MUL AX, %s\n",$3);
}
| NAME '/' NAME
{
fprintf(fpOut,"DIV AX, %s\n",$3);
}
| NAME
{
strcpy($$, $1);
}
;
%%
FILE *yyin;
main()
{
FILE *fpInput;
fpInput = fopen("input1.c","r");
if(fpInput=='\0')
{
printf("file read error");
//exit(0);
}
fpOut = fopen("output.c","w");
if(fpOut=='\0')
{
printf("file creation error");
//exit(0);
}
yyin = fpInput;
yyparse();
fcloseall();
}
yyerror(char *msg)
{
printf("%s",msg);
}
Input.c
t1 = c * d
t2 = b +t1
t3 = e /f
t4 = t2 -t3
a = t4
output cmd:
root123@root123-ThinkCentre-M60e:~$ lex cogen.l
root123@root123-ThinkCentre-M60e:~$yacc-d cogen.y
root123@root123-ThinkCentre-M60e:~$gcc y.tab.c lex .yy.c –ll-ly-lm
root123@root123-ThinkCentre-M60e:~$./a.out
Assignment No. 7
Title
Implement local and global code optimizations such as common sub
expression elimination,copypropagation,dead code elimination,loop and
basic block optimization(optional)
.
Roll No. 35
Date
Signature of subject
teacher
Marks- /10
TITLE Implement local and global code optimizations such as common sub
expression elimination,copypropagation,dead code elimination,loop
and basic block optimization(optional)
PROBLEM Write a Java program (using OOP features) to implement different

STATEMENT code optimization techniques such as common sub expression
/DEFINITION elimination copy propagation,dead code elimination,loop and basic
block optimization
OBJECTIVE
To learn and understand
 Different techniques of code optimization
 Implementation of code optimization
techniques
S/W PACKAGES AND Linux OS (Fedora 20), PC with the configuration as Core 2 Duo 2.2
HARDWARE GHz processor. 1 GB RAM, 160 G.B HDD, 17’’Color Monitor,
APPARATUS USED Keyboard, Mouse

806664-3
7366-061-X
ISBN 81-265-0418-8
Refer todetails
STEPS
INSTRUCTIONS FOR
JOURNAL Learning Objective
 Learning Outcome
 Theory-Related
 Class Diagram
 Test cases
 Program Listing
 Output
 Conclusion
Theory:
To learn and understand

 Different techniques of code optimization
 Implementation of code optimization techniques
Learning Outcomes:
The student will be able to
 Understand code optimization in detail
 Implement local and global code optimization techniques
THEORY:
Code Optimization: Optimization is a program transformation technique, which tries to improve the
code by making it consume less resources (i.e. CPU, Memory) and deliver high speed.
In optimization, high-level general programming constructs are replaced by very efficient low-level
programming codes. A code optimizing process must follow the three rules given below:
4. The output code must not, in any way, change the meaning of the program.
5. Optimization should increase the speed of the program and if possible, the program should demand
less number of resources.
6. Optimization should itself be fast and should not delay the overall compiling process.
Machine-independent Optimization:
In this optimization, the compiler takes in the intermediate code and transforms a part of the code
that does not involve any CPU registers and/or absolute memory locations.
Machine-dependent Optimization:
Machine-dependent optimization is done after the target code has been generated and when the code
is transformed according to the target machine architecture. It involves CPU registers and may have
absolute memory references rather than relative references. Machine-dependent optimizers put
efforts to take maximum advantage of memory hierarchy.
Code optimization techniques
1.Common sub expression elimination
2.Copy propagation
3.Dead code elimination
4.Loop and basic block optimization
S= { St , E , I , O , F, DD , NDD}
Where,
St is an initial state such that St={Fl | Fl is IC}. At initial state, system has IC.
E is a End state.
E={Sc, Fc | Sc = success case which gives assembly code and Fc = failure case which gives the
lexical errors}.
I= set of inputs
I={Fi | Fi is an input file which consists of IC}.
O= set of outputs
O={Op | Op is the assembly code generated}.
F=A set of Functions.
F={f | f1:I → O, function f implements simple code generator and generates the target
Common sub expression elimination

Common sub expression elimination is a compiler optimization technique of finding redundant
expression evaluations, and replacing them with a single computation. This saves the time overhead
resulted by evaluating the expression for more than once .
Example:
In the following code:
a = b * c + g;
d = b * c * e;
it may be worth transforming the code to:

tmp = b * c;
a = tmp + g;
d = tmp * e;
if the cost of storing and retrieving tmpis less than the cost of calculating b * c an extra time.
Copy propagation
copy propagation is the process of replacing the occurrences of targets of direct assignments with
their values.A direct assignment is an instruction of the form x = y, which simply assigns the value of
y to x.
From the following code:
y=x
z=3+y
Copy propagation would yield:
z=3+x
Algorithm:
1)Generate the flow graph from list of instructions
do{
2)Perform reaching copy analysis
3)for each node in flow graph
for each use
if use is reached by the copy where it is the target
change the use to the source in the move statement tuple
While(changes)
4)Generate the list of instruction from modified flow graph
Dead code elimination

Code that is unreachable or that does not affect the program (e.g. dead stores) can be eliminated
example
In the example below, the value assigned to i is never used, and the dead store can be eliminated. The
first assignment to global is dead, and the third assignment to global is unreachable; both can be
eliminated.
int global;
void f ()
{
inti;
i = 1; /* dead store */
global = 1; /* dead store */
global = 2;
return;
global = 3; /* unreachable */
}
Below is the code fragment after dead code elimination.
int global;
void f ()
{
global = 2;
return;
}
Algorithm:
1)Generate the flow graph from list of instructions
do{
2)performliveness on graph
3)for each node in flow graph
ifthe define s set contains only one memory
if the temporary being defined is not in the live out set
Remove the node from the flow graph
While(changes)
4)Generate the list of instruction from modified flow graph
Loop and basic block optimization
loop optimization is the process of increasing execution speed and reducing the overheads associated
with loops. It plays an important role in improving cache performance and making effective use of
parallel processing capabilities. Most execution time of a scientific program is spent on loops; as
such, many compiler optimization techniques have been developed to make them faster.
Program:
#include<stdio.h>
//#include<conio.h>
#include<stdlib.h>
#include<string.h>
//using namespace std;
struct mm
{
char op[6],op1[6],op2[6],op3[6];
int flag;
}m[20];
void rep(int i,int j,int n);
void coms();
void dead(int n);
void dis(int n);
int checkc(int i,int j);
int main()
{
int ch,i;
int n;
//clrscr();
printf("\nENTER THE NO. OF TUPLES : ");
scanf("%d",&n);
printf("\nENTER THE TUPLES NOW : \n");
for(i=0;i<n;i++)
{
scanf("%s%s%s%s",&m[i].op,&m[i].op1,&m[i].op2,&m[i].op3);
}
printf("\n\n1:COMMON SUBEXPRESSION ELIMINATION");
printf("\n2:DEAD CODE ELIMINATION");
printf("\n3:EXIT");
printf("\nENTER UR CHOICE : ");
scanf("%d",&ch);
switch(ch)
{
case 1: coms(n);
break;
case 2: dead(n);
break;
case 3:exit(0);
}return 0;
}
void coms(int n)
{
int i,j;
for(i=0;i<n;i++)
{
for(j=i;j<n;j++)
{
if(m[j].flag==1||i==j)
continue;
if((strcmp(m[i].op,m[j].op)==0)&&(strcmp(m[i].op1,m[j].op1)==0)&&(strcmp(
m[i].op2,m[j]
.op2)==0))
{
if(checkc(i,j)==0)
{
rep(i,j,n);
m[j].flag=1;
}
}
}
}
dis(n);
}
int checkc(int i,int j)
{
int k,f=0;
for(k=i+1;k<j;k++)
{
if((strcmp(m[k].op,"=")==0)&&((strcmp(m[k].op3,m[i].op1)==0)||(strcmp(m[k
].op3,m[i].op2
)==0)))
{
f=1;
}
}
return f;
}void rep(int i,int j,int n)
{
int k;
for(k=i;k<n;k++)
{
if(strcmp(m[k].op1,m[j].op3)==0)
strcpy(m[k].op1,m[i].op3);
if(strcmp(m[k].op2,m[j].op3)==0)
strcpy(m[k].op2,m[i].op3);
}
}
void dis(int n)
{
int i;
for(i=0;i<n;i++)
{
if(m[i].flag==0)
printf("\n%s\t%s\t%s\t%s",m[i].op,m[i].op1,m[i].op2,m[i].op3);
}
}
void dead(int n)
{
int i,j;
for(i=0;i<n;i++)
{
m[i].flag=1;
}
for(i=0;i<n;i++)
{
for(j=i;j<n;j++)
{
if(strcmp(m[i].op3,m[j].op1)==0||strcmp(m[i].op3,m[j].op2)==0)
m[i].flag=0;
}
}
dis(n);
}
Output:
root 1@root 1-Think Centre -M60e:~ $ gcc as s1 7 . c

root 1@root 1-Think Centre -M60e:~ $ . / a . out
ENTER THE NO. OF TUPLES : 2
ENTER THE TUPLES NOW :
33
34
43
65
87
65
43
34
1 :COMMONSUBEXPRESSION ELIMINATION
2 :DEAD CODE ELIMINATION
3 : EXIT
ENTER UR CHOICE : 1
33 34 43 65
87 65 43 34

Compiler Manual

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Compiler Manual

Uploaded by

Copyright:

Available Formats

Jaihind College of Engineering,

Department of Computer Engineering

Name of Student: Kiran Satish Kardile

TEACHING SCHEME EXAMINATION SCHEME

Subject Faculty: Head of Department

Generate and populate appropriate symbol

Implementation of Semantic Analysis Operations

Implement the front end of a compiler that generates

A Register Allocation algorithm that translates the

Implement a Lexical Analyzer using LEX for a subset of C. Cross

Appreciate the role of lexical analysis phase in compilation

1. A V Aho, R. Sethi, .J D Ullman, "Compilers: Principles, Techniques, and

Refer theory, algorithm, test input, test output

Prepare LATEX document including following

 To understand first phase of compiler: LexicalAnalysis.

Auxiliary Function(User Subroutines):

No. Function Meaning

No. Variables Meaning

1. Steps to Execute the program:

1. Start the program.

2. Lex program consists of threeparts.

3. The declaration section includes declaration of variables, maintest, constantsand

4. Translation rule of lex program are statements of theform

5. Write a program in the vi editor and save it with .lextension.

<left side> <alt 1> | <alt 2> | … <alt n>

| <alt 2> {action 2}

%token Used to declare the tokens used in the grammar.

%right Used to assign the associatively to operators.

%nonassoc Used to unary associate.

S/W PACKAGES AND 64-bit open source Linux (Fedora

1. Compiler Construction Using Java, JavaCC and Yacc, Anthony J. Dos

Output of the front end

Figure 2: Intermediate Representation

The Back End

For the sequence of actions for the assignment statement a :* b+i;

Title Implement Semantic Analysis Operations (like type checking,

Implement Semantic Analysis Operations (like type checking, verification of

S/W PACKAGES AND 64-bit open source Linux (Fedora

1. Compiler Construction Using Java, JavaCC and Yacc, Anthony J. Dos

INSTRUCTIONS FOR Learning Objective

Aim: Implementation of Semantic Analysis Operations

The students will be able to

Able to understand how semantic analysis phase of compiler behaves.

To understand the fourth phase of a compiler: Intermediate code generation

1. Compiler Construction Using Java, JavaCC and Yacc, Anthony J. Dos

INSTRUCTIONS FOR Learning Objective

Objective: To learn the function of compiler by:

 To understand fourth phase of compiler: Intermediate codegeneration.

Benefits of using a machine-independent intermediate formare:

Advantages of three-address code:

 The unraveling of complicated arithmetic expressions and of nested flow-of-

Types of Three-Address Statements:

The common three-address statements are:

 Assignment statements of the form x : = y op z, where op is a binary arithmetic

 Assignment instructions of the form x : = op y, where op is a unary operation. Essential

 Copy statements of the form x : = y where the value of y is assigned tox.

generated as part of a call of the procedure p(x1, x2, …. ,xn ).

 Indexed assignments of the form x : = y[i] and x[i] : = y.

Implementation of Three-Address Statements:

A three-address statement is an abstract form of intermediate code. In a compiler, these statements

Fig a) shows quadruples for the assignment a : b * c + b *c

 If we do so, three-address statements can be represented by records with

 Fig b) shows the triples for the assignment statement a: = b * c + b *c.

 Indirect triple representation is the listing pointers to triples rather-than listing