Professional Documents
Culture Documents
Kuran, Pune
LABORATORY MANUAL
2021-2022
LABORATORY PRACTICE-IV
[Compiler]
BE-COMPUTER ENGINEERING
Subject Code: 410252(B)
SUBMITTED BY:
Kum. Kiran Satish Kardile Roll no. 35 Course BE Computer Exam Seat No
B150844235 of fourth year degree in Computer Engineering has completed the term of the
year 2021-22 as prescribed by University of Pune.
Dr.Garkal D.J
Principal
JCOE,Kuran
Page
Srno. Title Sign
no
Implement a Lexical Analyzer using LEX for a
1 subset of C. Cross check your output with Stanford
LEX.
Implement a parser for an expression grammar using
YACC andLEX for the subset of C. Cross check your
2 output with StanfordLEX and YACC.
Roll No. 35
Class B.E. (Computer)
Date
Subject Laboratory Practice-IV
Signatureof subject
teacher.
Marks - /10
Lexical analyzer for HLL using
TITLE LEX
Implement a Lexical Analyzer using LEX for a subset of C. Cross check your
PROBLEM output with Stanford LEX.
STATEMENT
/DEFINITION
64 bit latest Computer/ min.PC with the configuration as Pentium IV 1.7 GHz.
S/W PACKAGES AND 128M.B RAM, 40 G.B HDD, 15’’Color Monitor, Keyboard, Mouse
HARDWARE 64 bit Linux with a support of LEX utility, GCC
APPARATUS USED
Title: Implement a Lexical Analyzer using LEX for a subset of C. Cross check your
outputwithStanford LEX.
Aim:
Assignment to understand the syntax of LEX specifications, built-in functions and variables.
Objectives:
Introduction:
LEX stands for Lexical Analyzer.LEX is a UNIX utility which generates the lexical analyzer. LEX is a tool for
generating scanners. Scanners are programs that recognize lexical patterns in text. These lexical patterns (or
regular expressions) are defined in a particular syntax. A matched regular expression may have an associated
action. This action may also include returning a token. When Lex receives input in the form of a file or text, it
attempts to match the text with the regular expression. It takes input one character at a time and continues until a
pattern is matched. If a pattern can be matched, then Lex performs the associated action (which may include
returning a token). If, on the other hand, no regular expression can be matched, further processing stops and Lex
displays an error message. Lex and C are tightly coupled. A lex file (files in Lex have the .l extension eg: first.l )
is passed through the lex utility, and produces output files in C (lex.yy.c). The program lex.yy.c basically consists
of a transition diagram constructed from the regular expressions of first.lThese file is then compiled object
program a.out, and lexical analyzer transforms an input streams into a sequence of tokens as show in fig1.1.
To generate a lexical analyzer two important things are needed. Firstly it will need a precise specification of the
tokens of the language. Secondly it will need a specification of the action to be performed on identifying each
token.
1. LEX Specifications:
The Structure of lex programs consists of three parts:
Definition Section :
The Definition Section includes declarations of variables, start conditions regular definitions, and
manifest constants (A manifest constant is an identifier that is declared to represent a constante.g. #
define PIE3.14).
C code: Any indented code between %{ and %} is copied to the C file. This is typically used for
defining file variables, and for prototypes of routines that are defined in the codesegment.
Definitions: A definition is very much like # define cpp directive. Forexample
letter [a-zA-Z]+ digit
[0-9]+
These definitions can be used in the rules section: one could start a rule
{letter}{printf("n Wordis = %s",yytext);}
State definitions: If a rule depends on context, it‟s possible to introduce states and
incorporatethoseintherules.Astatedefinitionlookslike%sSTATE,andbydefaulta state
INITIAL is alreadygiven.
Rule Section:
Second section is for translation rules which consist of regular expression and action with respect to it.
The translation rules of a Lex program are statements of the form:
p1 {action 1}
p2 {action 2}
p3 {action 3}
... ...
... ...
pn {action n}
Where, each p is a regular expression and each action is a program fragment describing what actionthe lexical
analyzer should take when a pattern p matches a lexeme. In Lex the actions are written inC.
3 yyless(int n) This function can be used to push back all but first „n‟ characters
of the read Token.
4 yymore() This function tells the lexer to append the next token to the current
token.
5 yyerror() This function is used for displaying any error message.
3. Built - inVariables:
Algorithm:
a. Declaration%%
b.Translation rules %%
c. Auxilaryprocedure.
a. P1 {action}
b. P2 {action}
c. …
d. …
e. Pn {action}
6.Compile the lex program with lex compiler to produce output file as lex.yy.c. eg$lexfilename.l $ cc
lex.yy.c-ll
7.Compile that file with C compiler and verify theoutput.
Conclusion:
LEX is a tool which accepts regular expressions as an input & generates a C code to recognize that token. If that
token is identified, then the LEX allows us to write user defined routines that are to be executed. When we give
input specification file to LEX, LEX generates lex.yy.c file as an output which contains function yylex() which is
generated by the LEX tool & contains a C code to recognize the token & action to be carried out if we find
thetoken
Program:
Program.l
%{
#include<stdio.h>
%}
%%
float|int|char|double|long|void|include|typedef
printf("%s IS A KEYWORD\n",yytext);
["|(|)|{|}|#|;|,|%|&] printf("%s IS A SPECIAL SYMBOL\n", yytext);
["]+[a-zA-Z0-9]+["] printf("%s IS A MESSAGE\n", yytext);
%[a-zA-Z] printf("%s IS A STRING CONSTANT\n", yytext);
[+|-|*|/] printf("%s IS A ARITHMATIC OPERATOR\n", yytext);
[getch|main|printf|scanf|return|clrscr] printf("%s IS A PREDEFINED FUNCTION\n",
yytext);
[a-zA-Z]*.h printf("%s IS A HEADER FILE\n", yytext);
[a-zA-Z]+[a-zA-Z0-9]* printf("%s IS AN IDENTIFIER\n", yytext);
[0-9]+ printf("%s IS A CONSANT\n", yytext);
= printf("%s IS AN ASSIGNMENT OPERATOR \n", yytext);
[<|>|<=|>=|<>] printf("%s IS A RELATIONAL OPERATOR\n", yytext);
[&&|!=|==] printf("%s IS A LOGICAL OPERATOR\n", yytext);
%%
int yywrap(void)
{
return 1;
}
int main()
{
FILE *fp=fopen("code.c","r");
yyin = fp;
yylex();
yywrap();
return 0;
}
Code.c
#include<stdio.h>
#include<conio.h>
void main()
{
int a,b;
clrscr();
printf("\n\t the value of a,b\n");
scanf("%d,%d",&a,&b);
a=a+b;
b=a-b;
a=a-b;
printf("\n the no after swapping a=%d and b=%d ");
getch();
}
Output:
root123@root123-ThinkCentre-M60e:~$ vi code.c
root123@root123-ThinkCentre-M60e:~$ vi ab1.l
root123@root123-ThinkCentre-M60e:~$ lex ab1.l
root123@root123-ThinkCentre-M60e:~$ gcc lex.yy.c -ly -ll
root123@root123-ThinkCentre-M60e:~$ ./a.out
# IS A SPECIAL SYMBOL
< IS A RELATIONAL OPERATOR
stdio.h IS A HEADER FILE
> IS A RELATIONAL OPERATOR
# IS A SPECIAL SYMBOL
< IS A RELATIONAL OPERATOR
conio.h IS A HEADER FILE
> IS A RELATIONAL OPERATOR
main IS AN IDENTIFIER
( IS A SPECIAL SYMBOL
) IS A SPECIAL SYMBOL
{ IS A SPECIAL SYMBOL
a IS A PREDEFINED FUNCTION
, IS A SPECIAL SYMBOL
b IS AN IDENTIFIER
; IS A SPECIAL SYMBOL
clrscr IS AN IDENTIFIER
( IS A SPECIAL SYMBOL
) IS A SPECIAL SYMBOL
; IS A SPECIAL SYMBOL
printf IS AN IDENTIFIER
( IS A SPECIAL SYMBOL
" IS A SPECIAL SYMBOL
\n IS A PREDEFINED FUNCTION
\t IS A PREDEFINED FUNCTION
the IS AN IDENTIFIER
value IS AN IDENTIFIER
of IS AN IDENTIFIER
a IS A PREDEFINED FUNCTION
, IS A SPECIAL SYMBOL
b IS AN IDENTIFIER
\n IS A PREDEFINED FUNCTION
" IS A SPECIAL SYMBOL
) IS A SPECIAL SYMBOL
; IS A SPECIAL SYMBOL
scanf IS AN IDENTIFIER
( IS A SPECIAL SYMBOL
" IS A SPECIAL SYMBOL
%d IS A STRING CONSTANT
, IS A SPECIAL SYMBOL
%d IS A STRING CONSTANT
" IS A SPECIAL SYMBOL
, IS A SPECIAL SYMBOL
& IS A SPECIAL SYMBOL
a IS A PREDEFINED FUNCTION
, IS A SPECIAL SYMBOL
& IS A SPECIAL SYMBOL
b IS AN IDENTIFIER
) IS A SPECIAL SYMBOL
; IS A SPECIAL SYMBOL
a IS A PREDEFINED FUNCTION
= IS AN ASSIGNMENT OPERATOR
a IS A PREDEFINED FUNCTION
+ IS A ARITHMATIC OPERATOR
b IS AN IDENTIFIER
; IS A SPECIAL SYMBOL
b IS AN IDENTIFIER
= IS AN ASSIGNMENT OPERATOR
a IS A PREDEFINED FUNCTION
-b IS AN IDENTIFIER
; IS A SPECIAL SYMBOL
a IS A PREDEFINED FUNCTION
= IS AN ASSIGNMENT OPERATOR
a IS A PREDEFINED FUNCTION
-b IS AN IDENTIFIER
; IS A SPECIAL SYMBOL
printf IS AN IDENTIFIER
( IS A SPECIAL SYMBOL
" IS A SPECIAL SYMBOL
\n IS A PREDEFINED FUNCTION
the IS AN IDENTIFIER
Assignment No. 2
Implement a parser for an expression grammar using YACC and
LEX for the subset of C. Cross check your output with Stanford
Title LEX and YACC.
35
Roll No.
Class B.E. (Computer)
Date
Subject Laboratory Practice-IV
Signatureof
subject teacher
Marks- /10
AssignmentNo:2
Title: Implement a parser for an expression grammar using YACC and LEX for the subset of C. Cross b
check your output with Stanford LEX and YACC.
Aim: Assignment to understand basic syntax of YACC specifications built-in functions and variables
Objective:
To understand Second phase of compiler: Syntax Analysis.
To learn and use compiler writing tools.
Understand the importance and usage of YACC automated tool
Theory:
Parser generator facilitates the construction of the front end of a compiler. YACC is LALR parser generator. It is
used to implement hundreds of compilers. YACC is command (utility) of the UNIXsystem. YACC stands for
“Yet Another Compiler Complier”.File in which parser generated is with .y extension. e.g. parser.y, which is
containing YACC specification of the translator. After complete specification UNIX command. YACC
transforms parser.y into a C program called y.tab.c using LR parser. The program y.tab.c is automatically
generated. We can use command with –d option asyacc –d parser.yBy using –d option two files will get
generated namely y.tab.c and y.tab.h. The header file y.tab.h willstore all the token information and so you need
not have to create y.tab.h explicitly.The program y.tab.c is a representation of an LALR parser written in C, along
with other C routines
that the user may have prepared. By compiling y.tab.c with the ly library that contains the LR parsing program
using the command.cc y tab c – lywe obtain the desired object program a out that perform the translation
specified by the original program.If procedure is needed, they can be compiled or loaded with y.tab.c, just as with
any C program.
LEX recognizes regular expressions, whereas YACC recognizes entire grammar. LEX divides theinput stream
into tokens, while YACC uses these tokens and groups them together logically. LEXand YACC work together to
analyze the program syntactically. The YACC can report conflicts orambiguities (if at all) in the form of error
messages.
1. YACC Specifications:
The Structure of YACC programs consists of three parts
1)Definition section:
2)Declartion part:
3)Translation rule section:
Definition Section:
The definitions and programs section are optional. Definition section handles control information for the YACC-
generated parser and generally set up the execution environment in which the parser will operate.
Declaration part:In declaration section, %{ and %} symbol used for C declaration. This section is used for
definition of token, union, type, start, associativity and precedence of operator. Token declared
in this section can then be used in second and third parts of Yacc specification. Translation Rule Section:
In the part of the Yacc specification after the first %% pair, we put the translation rules. Each rule consists of a
grammar production and the associated semantic action. A set of productions that we have been writing:
……………
| <alt n> {action n}
;
In a Yacc production, unquoted strings of letters and digits not declared to be tokens are taken to be
nonterminals. A quoted single character, e.g. 'c', is taken to be the terminal symbol c, as well as the integer code
for the token represented by that character (i.e., Lex would return the character code for ' c' to the parser, as an
integer). Alternative bodies can be separated by a
vertical bar, and a semicolon follows each head with its alternatives and their semantic actions. The first head is
taken to be the start symbol.
A Yacc semantic action is a sequence of C statements. In a semantic action, the symbol $$ refers to the attribute
value associated with the nonterminal of the head, while $i refers to the value associated with the ith grammar
symbol (terminal or nonterminal) of the body. The semantic action is performed whenever we reduce by the
associated production, so normally the semantic action computes a value for $$ in terms of the $i's. In the Yacc
specification, we have
written the two E-productions. E E + T/T
and their associated semantic action as:exp : exp „+‟ term {$$ = $1 + $3;}
| term;In above production exp is $1, „+‟ is $2 and term is $3. The semantic action associated with first
production adds values of exp and term and result of addition copying in $$ (exp) left hand side. For above
second number production, we have omitted the semantic action since it is just copying the value. In general {$$
= $1;} is the default semantic action.
Supporting C-Routines Section:
The third part of a Yacc specification consists of supporting C-routines. YACC generates a single function called
yyparse(). This function requires no parameters and returns either a 0 on success, or 1 on failure. If syntax error
over its return 1.The special function yyerror() is called when YACC encountersan invalid syntax. The yyerror()
is passed a single string (char )
argument. This function just prints user defined message like:yyerror (char err)
{
printf (“Divide by zero”);
}
When LEX and YACC work together lexical analyzer using yylex () produce pairs consisting of a token and its
associated attribute value. If a token such as DIGIT is returned, the token value associated with a token is
communicated to the parser through a YACC defined variable yylval.We have to return tokens from LEX to
YACC, where its declaration is in YACC. To link this LEX program include a y.tab.h file, which is generated
after YACC compiler the program using – d option.
. Built-in Functions:
Function Meaning
yyparser() This is a standard parse routine used for calling syntax analyzer for given translation
yyerror() This function is used for displaying any error message when a yacc detects a syntax
3. Built-in Types:
Mea
Type
ning
Eg.:- %token NUMBER
Eg.:- %start S Where S is start symbol
Eg.: %left „+‟ „-„ -Assign left associatively to + & – with lowest precedence.
%left „*‟ „/„ -Assign left associatively to * & / with highest precedence.
Eg.: %right „+‟ „-„ -Assign right associatively to + & – with lowest precedence
%right „*‟ „/„ -Assign right associatively to * & / with highest precedence.
Eg.:- %nonassoc UMINUS
Eg.:- %prec UMINUS
Eg.:- %type <name of any variable> exp
% union
{ char str ;
int num ; }
rules. When yyparse() is call, the parser attempts to parse an input stream.
error
Conclusion:
The yacc command accepts a language that is used to define a grammar for a target language to
be parsed by the tables and code generated by yacc. The language accepted by yacc as a grammar for
the target language is described below using the yacc input language itself.
The input grammar includes rules describing the input structure of the target language and code
to be invoked when these rules are recognized to provide the associated semantic action. The code to be
executed will appear as bodies of text that are intended to be C-language code. The C-language
inclusions are presumed to form a correct function when processed by yacc into its output files.
;CALC program
%{
#include<stdio.h>
#include<stdlib.h>
void yyerror(char *);
#include "y.tab.h"
int yylval;
%}
%%
[a-z] {yylval=*yytext='&'; return VARIABLE;}
[0-9]+ {yylval=atoi(yytext); return INTEGER;}
CALC.Y
[\t] ;
%%
int yywrap(void)
{
return 1;
}
%token INTEGER VARIABLE
%left '+' '-'
%left '*' '/'
%{
int yylex(void);
void yyerror(char *);
int sym[26];
%}
%%
PROG:
PROG STMT '\n'
;
STMT: EXPR {printf("\n %d",$1);}
| VARIABLE '=' EXPR {sym[$1] = $3;}
;
EXPR: INTEGER
| VARIABLE {$$ = sym[$1];}
| EXPR '+' EXPR {$$ = $1 + $3;}
| '(' EXPR ')' {$$ = $2;}
%%
void yyerror(char *s)
{
printf("\n %s",s);
return;
}
int main(void)
{
printf("\n Enter the Expression Grammar:");
yyparse();
return 0;
}
***OUTPUT***
Output:
$ lex calc.l
$ yacc -d calc.y
$ cc y.tab.c lex.yy.c -ll -ly -lm
$ . / a . out
Enter the Expression: ( 5 + 4 ) * 3
Answer: 27
Assignment No. 3
Generate and populate appropriate Symbol Table..
Title
35
Roll No.
Class B.E. (Computer)
Date
Subject Laboratory Practice-IV
Signatureof
subject teacher
Marks- /10
TITLE Generate and Populate appropriateSymbolTable.
PROBLEM
STATEMENT Design suitable data structures to generate and Populate appropriate
/DEFINITION Symbol Table.
OBJECTIVE Understand the Symbol table is an important data structure created and
maintained by compilers
Refer to
STEPS details
Theory-Related concept,Architecture,Syntaxetc
INSTRUCTIONS FOR Class Diagram
JOURNAL Test cases
Theory:
Figure 1 depicts the schematic of a two pass language processor. The first pass performsanalysis
of the source program and reflects its results in the intermediate representation (IR). The second pass
reads and analyses the IR to perform synthesis of the target program. This avoids repeated process sing of
the source program. The first pass is concerned exclusively with source language issues. Hence it is
called the FRONT END of the language processor. The second pass is concerned withthe language
processor.
The Front End
The front end performs
• Lexical analysis,
• Syntax analysis and
• Semantic analysis of the source program.
Each kind of analysis involves the following functions:
Determine validity of a source statement from the viewpoint of the analysis.
1. Determine the ‘content’ of a source statement.
2. Construct a suitable representation of the source statement for use by subsequent analysis functions, or by the
synthesis phase of the language processor. The word ‘content’ has different connotations in lexical, syntax and
semantic analysis. • In lexical analysis, the content is the lexical class to which each lexical
unit belongs
In syntax analysis it is the syntactic structure of a source statement.
• In semantic analysis the content is the meaning of a statement—for a declaration
statement, it is the set of attributes of a declared variable (e.g. type, length and
dimensionality), while for an imperative statement, it is the sequence of actions implied
by the statement.
Each analysis represents the ‘content’ of a source statement in the form of
(1) Tables of information, and (2) description of the source statement. Subsequent analysis
uses this information for its own purposes and either adds information to these tables and
descriptions, or constructs its own tables and descriptions.
TABLES
Tables contain the information obtained during different analyses of SP. The most important table is the
symbol table which contains information concerning all identifiers used in the SP. The symbol table is built
during lexical analysis. Semantic analysis adds information concerning symbol attributes while processing
declaration statements. It may also add new names designating temporary results.
INTERMEDIATE CODE ( I C )
The IC is a sequence of IC units; each IC unit representing the meaning of one action in SP. IC units may
contain references to the information in various tables.
Example
Figure 2 shows the IR produced by the analysis phase for the program i:
integer;
a,b: real;
a := b+i;Symbol Table:
Intermediate code
#1) to real, giving (id, #4)
2. Add (id, #4) to (id, #3) giving (id, #5)
3. Store (id, #5) in (Id, #2)
The symbol table contains information concerning the identifiers and their types. This information is
determined during lexical and semantic analysis, respectively. In IC, the specification (Id, #1) refers to the id
occupying the first entry in the table. i* and temp are temporary names added during semantic analysis of the
assignment statement.
Example
After memory allocation, the symbol table looks as shown in Figure 3. The entries for i* and
temp are not shown because memory allocation is not needed for these id’s.
Figure 3: Symbol table after memory allocation
Certain decisions have to precede memory allocation, for example, whether i* and temp should
be allocated memory. These decisions are taken in the preparatory steps of code generation.
Code generation
Code generation uses knowledge of the target architecture, viz. knowledge of instructions and addressing modes
in the target computer, to select the appropriate instructions.
Example
import java.io.*;
class P1
{
public static void main(String ar[])throws IOException
{
BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
int i;
String a[][]={{"","START","101",""},
{"","MOVER","BREG","ONE"},
{"AGAIN","MULT","BREG","TERM"},
{"","MOVER","CREG","TERM"},
{"","ADD","CREG","N"},
{"","MOVEM","CREG","TERM"},
{"N","DS","2",""},
{"RESULT","DS","2",""},
{"ONE","DC","1",""},
{"TERM","DS","1",""},
{"","END","",""}};
int lc=Integer.parseInt(a[0][2]);
String st[][]=new String[5][2];
int cnt=0,l;
for (i=1;i<11;i++)
{
if (a[i][0]!="")
{
st [cnt][0]=a[i][0];
st[cnt][1]=Integer.toString(lc);
cnt++;
if(a[i][1]=="DS")
{
int d=Integer.parseInt(a[i][2]);
lc=lc+d;
}
else
{
lc++;
}
}
Else
{
lc++;
}
}
System.out.print("***SYMBOL TABLE****\n");
System.out.println("_____________________");
for(i=0;i<5;i++)
{
for(cnt=0;cnt<2;cnt++)
{
System.out.print(st[i][cnt]+"\t");
}
System.out.println();}
String
inst[]={"STOP","ADD","SUB","MULT","MOVER","MOVEM","COMP","BC","DIV","RE
AD
","PRINT"};
String reg[]={"NULL","AREG","BREG","CREG","DREG"};
int op[][]=new int[12][3];
int j,k,p=1,cnt1=0;
for(i=1;i<11;i++)
{
for(j=0;j<11;j++)
{
if(a[i][1].equalsIgnoreCase(inst[j]))
{
op[cnt1][0]=j;
}
else
if(a[i][1].equalsIgnoreCase("DS"))
{
p=Integer.parseInt(a[i][2]);
}
else if(a[i][1].equalsIgnoreCase("DC"))
{
op[cnt1][2]=Integer.parseInt(a[i][2]);
}
}
for(k=0;k<5;k++)
{
if(a[i][2].equalsIgnoreCase(reg[k]))
{
op[cnt1][1]=k;
}
}
for(l=0;l<5;l++)
{
if(a[i][3].equalsIgnoreCase(st[l][0]))
{
int mn=Integer.parseInt(st[l][1]);
op[cnt1][2]=mn;
}
}
cnt1=cnt1+p;
}
System.out.println("\n *****OUTPUT*****\n");
System.out.println("**********MOT TABLE**********");
int dlc=Integer.parseInt(a[0][2]);
for(i=0;i<12;i++)
{
System.out.print(dlc+++"\t");
for(j=0;j<3;j++)
{
System.out.print(" "+op[i][j]+" ");
}
System.out.println();
}
System.out.println("");}
}
/*OUTPUT:********
pl2-pc2@pl2pc2-ThinkCentre-M700:~/Desktop$ javac P1.java
pl2-pc2@pl2pc2-ThinkCentre-M700:~/Desktop$ java P1
***SYMBOL TABLE****
_____________________
AGAIN 102
N 106
RESULT 108
ONE 110
TERM 111
*****OUTPUT*****
**********MOT TABLE**********
101 4 2 110
102 3 2 111
103 4 3 111
104 1 3 106
105 5 3 111
106 0 0 0
107 0 0 0
108 0 0 0
109 0 0 0
110 0 0 1
111 0 0 0
112 0 0 0
pl2-pc2@pl2pc2-ThinkCentre-M700:~/Desktop$
Assignment No. 4
Signatureof
subject teacher
Marks- /10
Implementation of Semantic Analysis Operations
TITLE
Semantic analysis is the task of ensuring that the declarations and statements of
OBJECTIVE a program are semantically correct, i.e, that their meaning is clear and consistent
with the way in which control structures and data types are supposed to be used.
Refer todetails
STEPS
Pre-requisite: familiarity with the phases of compiler like lexical analysis i.e. parser
Learning Objectives:
- Study of Semantics analysis phase of compiler how it helps interpret symbols, their types, and their
relations with each other.
- Study of Grammar used by Semantics analysis to interpret meaning of programming and natural
language sentences.
Learning Outcomes:
Semantic Errors
We have mentioned some of the semantics errors that the semantic analyzer is expected to recognize:
Type mismatch
Undeclared variable
Reserved identifier misuse.
Multiple declaration of variable in a scope.
Accessing an out of scope variable.
Actual and formal parameter mismatch.
Attribute Grammar
Attribute grammar is a special form of context-free grammar where some additional information
(attributes) are appended to one or more of its non-terminals in order to provide context-sensitive information.
Each attribute has well-defined domain of values, such as integer, float, character, string, and expressions.
Attribute grammar is a medium to provide semantics to the context-free grammar and it can help specify the
syntax and semantics of a programming language. Attribute grammar (when viewed as a parse-tree) can pass
values or information among the nodes of a tree.
Example:
E → E + T { E.value = E.value + T.value }
The right part of the CFG contains the semantic rules that specify how the grammar should be interpreted. Here,
the values of non-terminals E and T are added together and the result is copied to the non-terminal E.
Semantic attributes may be assigned to their values from their domain at the time of parsing and evaluated at the
time of assignment or conditions. Based on the way the attributes get their values, they can be broadly divided
into two categories : synthesized attributes and inherited attributes.
Synthesized attributes
These attributes get values from the attribute values of their child nodes. To illustrate, assume the
following production:
S → ABC
If S is taking values from its child nodes (A,B,C), then it is said to be a synthesized attribute, as the values of
ABC are synthesized to S.
As in our previous example (E → E + T), the parent node E gets its value from its child node. Synthesized
attributes never take values from their parent nodes or any sibling nodes.
Inherited attributes
In contrast to synthesized attributes, inherited attributes can take values from parent and/or siblings. As in the
following production,
S → ABC
A can get values from S, B and C. B can take values from S, A, and C. Likewise, C can take values from S, A,
and B.
Expansion : When a non-terminal is expanded to terminals as per a grammatical rule
Reduction : When a terminal is reduced to its corresponding non-terminal according to grammar rules. Syntax
trees are parsed top-down and left to right. Whenever reduction occurs, we apply its corresponding semantic rules
(actions).
Semantic analysis uses Syntax Directed Translations to perform the above tasks.
Semantic analyzer receives AST (Abstract Syntax Tree) from its previous stage (syntax analysis).
Semantic analyzer attaches attribute information with AST, which are called Attributed AST.
Attributes are two tuple value, <attribute name, attribute value>
For example:
int value = 5;
<type, “integer”>
<presentvalue, “5”>
We may conclude that if a definition is S-attributed, then it is also L-attributed as L-attributed definition
encloses S-attributed definitions.
Conclusion:
age.The input grammar includes rules describing the input structure of the target language and code to be
invoked when these rules are recognized to provide the associated semantic action. The code to be
executed will appear as bodies of text that are intended to be C-language code may conclude that if a
definition is S-attributed, then it is also L-attributed as L-attributed definition encloses S-attributed
definitions. We implement Semantic analysis operation (like type checking,verifications of function
parameter,variable declarations)using an attributed translation grammar.
Program:
**********ip.l***********
%{
#include "y.tab.h"
%}
ALPHA [A-Z a-z]
DIGIT [0-9]
%%
{ALPHA}({ALPHA}|{DIGIT})* return ID;
{DIGIT}+ {yylval=atoi(yytext); return ID;}
[\n \t] yyterminate();
. return yytext[0];
%%
***********ip.y**********
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token ID
%left '+' '-'
%left '*' '/'
%left UMINUS
%%
S:E
E : E'+'{A1();}T{A2();}
| E'-'{A1();}T{A2();}
| T;
T : T'*'{A1();}F{A2();}
| T'/'{A1();}F{A2();}
| F;
F : '('E{A2();}')'
| '-'{A1();}F{A2();}
| ID{A3();};
%%
#include "lex.yy.c"
char st[100];
int top=0;
main()
{
printf("Enter infix expression: ");
yyparse();
printf("\n");
}
A1()
{
st[top++]=yytext[0];
}
A2()
{printf("%c",st[--top]);
}
A3()
{
printf("%c",yytext[0]);
}
Output1:
outputsemantic.txt
root1@root1-ThinkCentre-M60e:~$lex semantic.l
root1@root1-ThinkCentr-M60e:~$yacc semantic.y
root1@root1-ThinkCentre-M60e:~$gcc y.tab.c –ll –ly
root1@root1-ThinkCentre-M60e:~$./a.out .
Enter Infix Expression:(5-9)*(4+6)
59-46+*
Assignment No. 5
Implement the front end of a compiler that generates the
Title three address code for a simple language.
Roll No. 35
Class B.E. (Computer)
Date
Subject Laboratory Practice-IV
Signatureof
subject teacher-
Marks- /10
TITLE Implement the front end of a compiler that generates the three address code
for a simple language
PROBLEM
STATEMENT 1. Write a LEX and YACC program to generate Intermediate Code for
/DEFINITION arithmetic expression
2. Write a LEX and YACC program to generate Intermediate Code for subset
of C (If,else,while)
OBJECTIVE
S/W PACKAGES AND Linux OS (Fedora 20), PC with the configuration as 64-bit Fedora or
HARDWARE equivalent OS with 64-bit Intel-i5/ i7 or latest higher processor computers,
APPARATUS USED FOSS tools, LEX, YACC,
Refer todetails
STEPS
Title: Implement the front end of a compiler that generates the three address code for a simple language.
Aim: Write an attributed translation grammar to recognize declarations of simple variables, "for", assignment,
if, if-else statements as per syntax of C or Pascal and generate equivalent three address code for the given
input made up of constructs mentioned above using LEX and YACC. Write a code to store the identifiers from
the input in a symbol table and also to record other relevant information about the identifiers. Display all
records stored in the symboltable.
Theory:
Introduction:
In the analysis - synthesis model of a compiler, the front end analyzes a source program and creates an
intermediate representation, from which the back end generates target code. Ideally, details of the source
language are confined to the front end, and details of the target machine to the back end. The front end
translates a source program into an intermediate representation from which the back end generates target code.
With a suitably defined intermediate representation, a compiler for language i and machine j can then be built
by combining the front end for language i with the back end for machine j. This approach to creating suite of
compilers can save a considerable amount of effort: m x n compilers can be built by writing just m front ends
and n backends.
2. Postfixnotation:
Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the
tree in which a node appears immediately after its children. The postfix notation for the
syntax tree given above is
a b c uminus * b c uminus * + assign
3. Three-AddressCode:
Three-address code is a sequence of statements of the general
form x : = y op z
Where x, y and z are names, constants, or compiler-generated temporaries; op stands for any
operator, such as a fixed-or floating-point arithmetic operator, or a logical operator on Boolean
valued data.
Thus a source language expression like x+ y*z might be translated into a sequence
t1 : =
y* z
t2 : =
x+t1
Where t1 and t2 are compiler-generated
temporarynames.
The reason for the term “three-address code” is that each statement usually contains three
addresses, two for the operands and one for the result.
The use of names for the intermediate values computed by a program allows three
addresscode to be easily rearranged – unlike postfixnotation.
Three-address code is a liberalized representation of a syntax tree or a dag in which explicit names
correspond to the interior nodes of the graph. The syntax tree and dag are represented by the three-
address code sequences. Variable names can appear directly in three address statements.
ConditionaljumpssuchasifxrelopygotoL.Thisinstructionappliesarelationaloperator(<,=,
>=, etc. )to x and y, and executes the statement with label L next if x stands in relation
relop toy. If not, the three-address statement following if x relop y goto L is executed next,
as in the usual sequence.
paramxandcallp,nforprocedurecallsandreturny,whereyrepresentingareturnedvalueis
optional. Forexample,
para
m x1
para
m x2
.......
para
mxn
callp
,n
Address and pointer assignments of the form x : = &y , x : = *y, and *x : =y.
A. Quadruples:
A quadruple is a record structure with four fields, which are, op, arg1, arg2
andresult.
The op field contains an internal code for the operator. The 3 address statement x
= yop z is represented by placing y in arg1, z in arg2 and x inresult.
The contents of fields arg1, arg2 and result are normally pointers to the symbol-
table entries for the names represented by these fields. If so, temporary names must
be entered into the symbol table as they arecreated.
To avoid entering temporary names into the symbol table, we might refer to
atemporary value by the position of the statement that computesit.
The fields arg1 and arg2, for the arguments of op, are either pointers to the
symboltable or pointers into the triple structure ( for temporary values).
Since three fields are used, this intermediate code format is known astriples.
C.
D.Indirecttriples:
$lexfilename.l (eg:comp.l)
$./a .out
Algorithm:
Write a LEX and YACC program to generate Intermediate Code for arithmetic expression
LEX program:
1. Declaration of header files specially y.tab.h which contains declaration for Letter,
Digit,expr.
4. If match found then convert it into char and store it in yylval.p where p is pointer
declaredin YACC
5. Returntoken
9. Declare mainfunction
10. End
1. Declaration of headerfiles
Addtotable function will add the argument1, argument2, operator and temporary variable to the
structure array of three addresscode.
Three address code function will print the values from the structure in the form first temporary
variable, argument1, operator,argument2
Quadruple Function will print the values from the structure in the form first operator, argument1,
argument2, result field
Triple Function will print the values from the structure in the form first argument1, argument2, and
operator. The temporary variables in this form are integer / index instead of variables.
Conclusion:
FAQ’s
Program:
*******************tac.l****************
%{
#include<stdio.h>
#include"y.tab.h"
char ch='a';
%}
%%
[0-9]+ {yylval.dval=yytext[0]; return NUM;} /*this would store value of integer
in yyval;nd return token integer to yacc*/
\n {return 0;}
. {return yytext[0];}
%%
************************tac.y****************
%{
#include<stdio.h>
%}
%union
{
char dval; /*to define possible symbol types,is to allow
storing different kind of objects into node emitted by lex*/
}
%token <dval> NUM
%type <dval> E
%left '+' '-'
%left '*' '/'
%%
statement : E {printf("\n t= %c\n This is Three Address Code ...\n",$1);}
E : E '+' E {$$=intgencode($$,$1,'+',$3);}
| E '-' E {$$=intgencode($$,$1,'-',$3);}
| E '*' E {$$=intgencode($$,$1,'*',$3);}
| E '/' E {$$=intgencode($$,$1,'/',$3);}
| '(' E ')' {$$=$2;}
| NUM {$$=$1;}
;
%%
main()
{
yyparse();
return 0;
}
OUTPUT
root123@root123-ThinkCentre-M60e:~$ lex tac.l
root123@root123-ThinkCentre-M60e:~$yacc-d tac.y
root123@root123-ThinkCentre-M60e:~$./a.out /
3∗5+10/10-2
a=3∗5
b=10/10
c=a+b
d=c-2
t=d
This i s Three Address Code
Assignment No. 6
Title Write a program for Register Allocation algorithm that translates the given code into
one with a fixed number of registers.
.
Roll No. 35
Class B.E. (Computer)
Date
Subject Laboratory Practice-IV
Signature of subject
teacher-
Marks- /10
TITLE Code generation using Register Allocation
PROBLEM Write a program for Register Allocation algorithm that translates the
STATEMENT given code into one with a fixed number of registers
/DEFINITION
OBJECTIVE
To understand code generation phase of
compilation.
To understand register allocation algorithm
Refer todetails
STEPS
INSTRUCTIONS FOR
JOURNAL Learning Objective
Learning Outcome
Theory-Related
concept,Architecture,Syntaxetc
Class Diagram
Test cases
Program Listing
Output
Conclusion
Code Generation
Goal: Takes Intermediate code representation of source program (optimized three address code
statements) and produce equivalent target program as output.
Requirement of code genrator:
Correct target code
High quality code
Effective use of resources of target machines
Quick
Simple code generation involves generating code for each three-address statements which are
divided into different basic blocks. For that it makes use of registers to store operands and leaves
result of computation in the register as long as possible.
At the end of a basic block the results of the computation are stored only if the register is needed
for another computation or just before procedure call, jump or labeled statements. Because after
leaving this basic block flow enters into another different basic block.
When generating code by above mentioned strategy we need :
1. Register Descriptor
2. Address Descriptor
1. Register descripto
oTo keep track of what is currently in each register, register descriptor is maintained.
oRegister descriptor is pointer to a list that contains information about what is currently in each of the
register. Initially all registers are empty.
2. Address descriptor
oTo keep track of the locations where current value of the name can be found at run time, address
descriptor is maintained. The locations can be a register, stack location or memory addresses.
oThis particular information is stored in the symbol table.
Input to the code generation algorithm is a sequence of 3-address statement divided into blocks.
For each three address statement of the form a := b op c in the basic block it performs following
steps :
1. Call getreg( ) to obtain the location L in which the computation a := b op c is performed. For that
we need to pass three-address statement a := b op c as parameter to the getreg( ) function. Typically
location L can be a register or it can be a memory location.
2. Obtain the current location of the operand b from its address descriptor. If the value of y is in
memory location and also in register then prefer register rather than memory location. If the value of
b is not in location L then generate the instruction MOV b, L to store copy of b in L.
3. The instruction OP x L is generated and the address descriptor of x is updated to indicate that x is
now available in L. If L is register then update its descriptor to indicate that it will contain the run
time value of x.
4. If the current value of b and c are in the register then we can say that b and c are not further used.
That means value of b and c are not live at the end of the basic block. So alter the register descriptor
to indicate that after the execution of the three-address statement a : = b op c, those register will no
longer contain b and / or c.
To perform computation specified by each of the three-address statement, we need a location. For
that function getreg( ) is used.
When a function getreg( ) is called, it returns a location for the computation performed by a three-
address statement. For example if a := b op c is to be performed then getreg( ) returns a location L
where the computation b op c should be performed ; and if possible it returns a register.
It performs the following steps to return location L.
1. First it searches for a register that already contains a value of b. If such register is available
and if value of b has no further use after execution of a := b op c and if b is
2.
3. not live at the end of basic block and hold the value of no other name then it returns that
register for L.
4. 2. Otherwise getreg() function searches for the empty register. If empty register is available
then it returns that for L.
5. 3. If empty register is not available and if a has further use in the block or op is an indexing
operator that requires register then getreg ( ) finds a suitable occupied register. This occupied
register is emptied by storing its value in proper memory location M and by updating address
descriptor that register is returned to L. The least recently used strategy can be used to find
suitable occupied register to be emptied.
6. 4. If a has no further use in the block or no suitable occupied register can be found then
getreg( ) selects a memory location and returns it for L.
7. Example
8. Consider the following expression x = (a + b) – ((c + d) – e)
9. The three-address code for this is generated as
10. t1 = a + b
11. t2 = c + d
12. t3 = t2 – e
13. x = t1 – t3
14. By applying above algorithm following code will be generated.
15. Table 1 shows the instruction generated and the contents of register descriptor and address
descriptor :
16.
17. S= { St , E , I , O , F, DD , NDD}
Where,
St is an initial state such that St={Fl | Fl is IC}. At initial state, system has IC.
E is a End state.
E={Sc, Fc | Sc = success case which gives assembly code and Fc = failure case which gives
the lexical errors}.
I= set of inputs
I={Fi | Fi is an input file which consists of IC}.
O= set of outputs
O={Op | Op is the assembly code generated}.
F=A set of Functions.
F={f | f1:I → O, function f implements simple code generator and generates the target
Program:
Cogen.l
%{
#include"stdio.h"
#include"y.tab.h"
%}
%%
[a-z][a-zA-Z0-9]*|[0-9]+ { strcpy(yylval.vname,yytext); return NAME; }
[ |\t] ;
.|\n { return yytext[0]; }
%%
Cogen.y
%{
#include<stdio.h>
#include<ctype.h>
#include<string.h>
FILE *fpOut;
%}
%union
{
char vname[10];
int val;
}
%left '+' '-'
%left '*' '/'
%token <vname> NAME
%type <vname> expression
%%
input : line '\n' input
| '\n' input
|;
line : NAME '=' expression { fprintf(fpOut,"MOV %s, AX\n",$1); }
;
expression: NAME '+' NAME
{
fprintf(fpOut,"MOV AX, %s\n",$1);
fprintf(fpOut,"ADD AX, %s\n",$3);
}
| NAME '-' NAME
{
fprintf(fpOut,"MOV AX, %s\n",$1);
fprintf(fpOut,"SUB AX, %s\n",$3);
}
| NAME '*' NAME
{
fprintf(fpOut,"MOV AX, %s\n",$1);
fprintf(fpOut,"MUL AX, %s\n",$3);
}
| NAME '/' NAME
{
fprintf(fpOut,"MOV AX, %s\n",$1);
fprintf(fpOut,"DIV AX, %s\n",$3);
}
| NAME
{
fprintf(fpOut,"MOV AX, %s\n",$1);
strcpy($$, $1);
}
;
%%
FILE *yyin;
main()
{
FILE *fpInput;
fpInput = fopen("input1.c","r");
if(fpInput=='\0')
{
printf("file read error");
//exit(0);
}
fpOut = fopen("output.c","w");
if(fpOut=='\0')
{
printf("file creation error");
//exit(0);
}
yyin = fpInput;
yyparse();
fcloseall();
}
yyerror(char *msg)
{
printf("%s",msg);
}
Input.c
t1 = c * d
t2 = b +t1
t3 = e /f
t4 = t2 -t3
a = t4
output cmd:
root123@root123-ThinkCentre-M60e:~$yacc-d cogen.y
root123@root123-ThinkCentre-M60e:~$./a.out
Assignment No. 7
Title
Implement local and global code optimizations such as common sub
expression elimination,copypropagation,dead code elimination,loop and
basic block optimization(optional)
.
Roll No. 35
Class B.E. (Computer)
Date
Subject Laboratory Practice-IV
Signature of subject
teacher
Marks- /10
TITLE Implement local and global code optimizations such as common sub
expression elimination,copypropagation,dead code elimination,loop
and basic block optimization(optional)
OBJECTIVE
To learn and understand
Different techniques of code optimization
Implementation of code optimization
techniques
S/W PACKAGES AND Linux OS (Fedora 20), PC with the configuration as Core 2 Duo 2.2
HARDWARE GHz processor. 1 GB RAM, 160 G.B HDD, 17’’Color Monitor,
APPARATUS USED Keyboard, Mouse
Refer todetails
STEPS
INSTRUCTIONS FOR
JOURNAL Learning Objective
Learning Outcome
Theory-Related
concept,Architecture,Syntaxetc
Class Diagram
Test cases
Program Listing
Output
Conclusion
Theory:
Learning Outcomes:
The student will be able to
Understand code optimization in detail
Implement local and global code optimization techniques
THEORY:
Code Optimization: Optimization is a program transformation technique, which tries to improve the
code by making it consume less resources (i.e. CPU, Memory) and deliver high speed.
In optimization, high-level general programming constructs are replaced by very efficient low-level
programming codes. A code optimizing process must follow the three rules given below:
4. The output code must not, in any way, change the meaning of the program.
5. Optimization should increase the speed of the program and if possible, the program should demand
less number of resources.
6. Optimization should itself be fast and should not delay the overall compiling process.
Machine-independent Optimization:
In this optimization, the compiler takes in the intermediate code and transforms a part of the code
that does not involve any CPU registers and/or absolute memory locations.
Machine-dependent Optimization:
Machine-dependent optimization is done after the target code has been generated and when the code
is transformed according to the target machine architecture. It involves CPU registers and may have
absolute memory references rather than relative references. Machine-dependent optimizers put
efforts to take maximum advantage of memory hierarchy.
Code optimization techniques
1.Common sub expression elimination
2.Copy propagation
3.Dead code elimination
4.Loop and basic block optimization
S= { St , E , I , O , F, DD , NDD}
Where,
St is an initial state such that St={Fl | Fl is IC}. At initial state, system has IC.
E is a End state.
E={Sc, Fc | Sc = success case which gives assembly code and Fc = failure case which gives the
lexical errors}.
I= set of inputs
I={Fi | Fi is an input file which consists of IC}.
O= set of outputs
O={Op | Op is the assembly code generated}.
F=A set of Functions.
F={f | f1:I → O, function f implements simple code generator and generates the target
Copy propagation
copy propagation is the process of replacing the occurrences of targets of direct assignments with
their values.A direct assignment is an instruction of the form x = y, which simply assigns the value of
y to x.
From the following code:
y=x
z=3+y
Copy propagation would yield:
z=3+x
Algorithm:
1)Generate the flow graph from list of instructions
do{
2)Perform reaching copy analysis
3)for each node in flow graph
for each use
if use is reached by the copy where it is the target
change the use to the source in the move statement tuple
While(changes)
4)Generate the list of instruction from modified flow graph
Program:
#include<stdio.h>
//#include<conio.h>
#include<stdlib.h>
#include<string.h>
//using namespace std;
struct mm
{
char op[6],op1[6],op2[6],op3[6];
int flag;
}m[20];
void rep(int i,int j,int n);
void coms();
void dead(int n);
void dis(int n);
int checkc(int i,int j);
int main()
{
int ch,i;
int n;
//clrscr();
printf("\nENTER THE NO. OF TUPLES : ");
scanf("%d",&n);
printf("\nENTER THE TUPLES NOW : \n");
for(i=0;i<n;i++)
{
scanf("%s%s%s%s",&m[i].op,&m[i].op1,&m[i].op2,&m[i].op3);
}
printf("\n\n1:COMMON SUBEXPRESSION ELIMINATION");
printf("\n2:DEAD CODE ELIMINATION");
printf("\n3:EXIT");
printf("\nENTER UR CHOICE : ");
scanf("%d",&ch);
switch(ch)
{
case 1: coms(n);
break;
case 2: dead(n);
break;
case 3:exit(0);
}return 0;
}
void coms(int n)
{
int i,j;
for(i=0;i<n;i++)
{
for(j=i;j<n;j++)
{
if(m[j].flag==1||i==j)
continue;
if((strcmp(m[i].op,m[j].op)==0)&&(strcmp(m[i].op1,m[j].op1)==0)&&(strcmp(
m[i].op2,m[j]
.op2)==0))
{
if(checkc(i,j)==0)
{
rep(i,j,n);
m[j].flag=1;
}
}
}
}
dis(n);
}
int checkc(int i,int j)
{
int k,f=0;
for(k=i+1;k<j;k++)
{
if((strcmp(m[k].op,"=")==0)&&((strcmp(m[k].op3,m[i].op1)==0)||(strcmp(m[k
].op3,m[i].op2
)==0)))
{
f=1;
}
}
return f;
}void rep(int i,int j,int n)
{
int k;
for(k=i;k<n;k++)
{
if(strcmp(m[k].op1,m[j].op3)==0)
strcpy(m[k].op1,m[i].op3);
if(strcmp(m[k].op2,m[j].op3)==0)
strcpy(m[k].op2,m[i].op3);
}
}
void dis(int n)
{
int i;
for(i=0;i<n;i++)
{
if(m[i].flag==0)
printf("\n%s\t%s\t%s\t%s",m[i].op,m[i].op1,m[i].op2,m[i].op3);
}
}
void dead(int n)
{
int i,j;
for(i=0;i<n;i++)
{
m[i].flag=1;
}
for(i=0;i<n;i++)
{
for(j=i;j<n;j++)
{
if(strcmp(m[i].op3,m[j].op1)==0||strcmp(m[i].op3,m[j].op2)==0)
m[i].flag=0;
}
}
dis(n);
}
Output:
ENTER UR CHOICE : 1
33 34 43 65
87 65 43 34