
PRACTICAL FILE

COMPILER DESIGN LAB


(CSEN4102)

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING

Submitted To: Submitted By:


Mrs. Sushma Shival Domesh Kaushik
AP, CSE Deptt. CSE 7th Semester
PDMU, Bahadurgarh A40318071

INDEX

S.No Name of Programs Date Signature

1 To study about Lex and Yacc compiler.
2 WAP to find whether the input is a constant string or not.
3 WAP to count the number of white spaces and new line characters in a string entered by the user.
4 WAP to check whether a given string belongs to the grammar or not.
5 WAP to check whether the input string contains keywords or not.
6 WAP to remove left recursion from a grammar.
7 WAP to make left factoring for a given grammar.
8 WAP to show all the operations on a stack.
9 WAP to find out the first of the non-terminals in a grammar.
10 WAP to compute follow of the non-terminals.

PROGRAM NO.: 1

AIM: To study about Lex and Yacc compiler.


Compiler

A compiler is a computer program that translates source code written in a high-level language
into low-level machine language. It translates code written in one programming language into
another language without changing the meaning of the code. The compiler also tries to make the
generated code efficient, optimizing it for execution time and memory space.

The compiling process includes basic translation mechanisms and error detection. Compilation
goes through lexical, syntax, and semantic analysis at the front end, and code generation and
optimization at the back end.

The phases of a compiler are explained below. In addition to these phases, a compiler has two
more components: the symbol table and the error handler.
Lexical Analyzer
It is also called the scanner. It takes the output of the preprocessor (which performs file inclusion
and macro expansion) as its input, which is in pure high-level language. It reads the characters of
the source program and groups them into lexemes (sequences of characters that "go together").
Each lexeme corresponds to a token. Tokens are defined by regular expressions which are
understood by the lexical analyzer. The lexical analyzer also detects lexical errors (e.g., erroneous
characters) and removes comments and white space.
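For example, for a C declaration such as int count = total + 10 ; a scanner would group the characters into lexemes and emit one token per lexeme, roughly as follows (the token names used here are only illustrative, not fixed by any standard):

int      -> KEYWORD
count    -> IDENTIFIER
=        -> ASSIGN_OP
total    -> IDENTIFIER
+        -> ADD_OP
10       -> NUMBER
;        -> SEMICOLON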

Syntax Analyzer

The next phase is called syntax analysis or parsing. It takes the tokens produced by lexical
analysis as input and generates a parse tree (or syntax tree). In this phase, token arrangements
are checked against the source code grammar, i.e. the parser checks if the expression made by
the tokens is syntactically correct.

Semantic Analyzer

Semantic analysis checks whether the parse tree constructed follows the rules of the language. For
example, it checks that values are assigned between compatible data types and flags errors such
as adding a string to an integer. The semantic analyzer also keeps track of identifiers, their types
and expressions, and checks whether identifiers are declared before they are used. The semantic
analyzer produces an annotated syntax tree as its output.

Intermediate Code Generation

After semantic analysis, the compiler generates an intermediate code of the source code for the
target machine. It represents a program for some abstract machine and lies in between the high-
level language and the machine language. This intermediate code should be generated in such a
way that it is easy to translate into the target machine code.
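As an illustration, a common form of intermediate code is three-address code. For a source statement such as a = b + c * d, the intermediate code generator might emit the following sequence, where t1 and t2 are compiler-generated temporaries (the exact names are arbitrary):

t1 = c * d
t2 = b + t1
a  = t2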

Code Optimization

The next phase performs code optimization on the intermediate code. Optimization can be viewed
as removing unnecessary code lines and arranging the sequence of statements so as to speed up
program execution without wasting resources (CPU, memory).

Code Generation

In this phase, the code generator takes the optimized representation of the intermediate code and
maps it to the target machine language. The code generator translates the intermediate code into
a sequence of (generally) relocatable machine code. This sequence of machine instructions
performs the same task as the intermediate code would.

Symbol Table

It is a data structure maintained throughout all the phases of a compiler. All the identifiers' names
along with their types are stored here. The symbol table makes it easier for the compiler to
quickly search for an identifier's record and retrieve it. The symbol table is also used for scope
management.
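A minimal sketch of one symbol-table entry, written in C, is shown below. The field names are only an assumption chosen for illustration; a real compiler records considerably more information (scope links, line numbers, parameter lists, and so on).

/* hypothetical symbol-table entry (illustrative sketch only) */
struct symbol {
    char name[32];     /* identifier name                     */
    char type[16];     /* data type, e.g. "int" or "float"    */
    int  scope_level;  /* block nesting depth of declaration  */
    int  offset;       /* relative address assigned to it     */
};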

Lex Compiler
Lex is a computer program that generates lexical analyzers ("scanners" or "lexers"). Lex is
commonly used with the yacc parser generator. Lex, originally written by Mike Lesk and Eric
Schmidt and described in 1975, is the standard lexical analyzer generator on many Unix systems,
and an equivalent tool is specified as part of the POSIX standard. Lex reads an
input stream specifying the lexical analyzer and outputs source code implementing the lexer in
the C programming language. In addition to C, some old versions of Lex could also generate a
lexer in Ratfor.
 Lex is a program that generates a lexical analyzer. It is used with the YACC parser generator.

 The lexical analyzer is a program that transforms an input stream into a sequence of
tokens.
 It reads the input stream and produces, as output, C source code that implements the
lexical analyzer.
LEX is a tool used to generate a lexical analyzer. Technically, LEX translates a set of regular
expression specifications (given as input in input_file.l) into a C implementation of a
corresponding finite state machine (lex.yy.c). This C program, when compiled, yields an
executable lexical analyzer.

The source ExpL program is fed as input to the lexical analyzer, which produces a
sequence of tokens as output. Conceptually, a lexical analyzer scans a given source ExpL
program and produces an output of tokens.
Functions of Lex compiler:
The functions of the Lex compiler are as follows:
 First, the lexical-analyzer specification is written as a program lex.l in the Lex language.
The Lex compiler then runs on the lex.l program and produces a C program lex.yy.c.
 Next, the C compiler compiles the lex.yy.c program and produces an object program a.out.
 a.out is the lexical analyzer that transforms an input stream into a sequence of tokens.
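On a Unix system, the pipeline described above is typically driven by commands similar to the following (the file name count.l is only an example; -ll links the Lex library, which supplies a default main() when the specification does not define one):

lex count.l           # translates count.l into lex.yy.c
cc lex.yy.c -ll       # compiles the scanner into a.out
./a.out < input.txt   # runs the generated lexical analyzer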

Structure of Lex compilers or programs:


A Lex source program is separated into three sections by %% delimiters. The three sections are
the declarations, the translation rules, and the auxiliary functions (user subroutines).

1. Declarations:
The declarations section consists of two parts, auxiliary declarations and regular
definitions.
The auxiliary declarations are copied as such by LEX to the output lex.yy.c file. This C
code consists of instructions to the C compiler and is not processed by the LEX tool. The
auxiliary declarations (which are optional) are written in C and are enclosed
within ' %{ ' and ' %} '. This part is generally used to declare functions, include header files, or
define global variables and constants.
LEX allows the use of short-hands and extensions to regular expressions in the regular
definitions. A regular definition in LEX is of the form D R, where D is the name given to
the regular expression R.
2. Translation Rules:
Each rule in a LEX program consists of two parts:
 The pattern to be matched.
 The corresponding action to be executed.
The pattern is specified as a regular expression. LEX checks the input stream for the first
match with one of the patterns and executes the code in the action part corresponding to
that pattern.
3. Auxiliary functions / user subroutines:
LEX generates C code for the rules specified in the Rules section and places this code
into a single function called yylex(). In addition to this LEX generated code, the
programmer may wish to add his own code to the lex.yy.c file. The auxiliary functions
section allows the programmer to achieve this.
You can use these routines in the same way as routines in other programming languages
(to create functions, identifiers, etc.). This is the section where main() is placed.
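All three sections can be seen together in the small Lex specification sketched below, which simply counts the lines and words in its input. It is a minimal textbook-style sketch, not one of the lab programs; unlisted characters fall through to the catch-all rule.

%{
/* auxiliary declarations: copied verbatim into lex.yy.c */
#include <stdio.h>
int lines = 0, words = 0;
%}
word    [a-zA-Z]+
%%
\n          { lines++; }
{word}      { words++; }
.           { /* ignore any other character */ }
%%
int main()
{
    yylex();
    printf("lines = %d, words = %d\n", lines, words);
    return 0;
}

When built as lex file.l followed by cc lex.yy.c -ll, the -ll library also supplies the yywrap() routine that Lex expects at end of input.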

Working of Lex compiler:


The lexical analyzer created by Lex behaves in concert with a parser in the following manner:

1. When activated by the parser, the lexical analyzer begins reading its remaining input, one
character at a time, until it has found the longest prefix of the input that is matched by
one of the regular-expression patterns.
2. Then it executes the corresponding action. Typically, the action will return control to the
parser.
3. However, if it does not, then the lexical analyzer proceeds to find more lexemes, until an
action causes control to return to the parser.
4. The repeated search for lexemes until an explicit return allows the lexical analyzer to
process white space and comments conveniently.
5. The lexical analyzer returns a single quantity, the token, to the parser. To pass an attribute
value with information about the lexeme, we can set the global variable yylval.
6. For example, suppose the lexical analyzer returns a single token for all the relational operators,
in which case the parser won't be able to distinguish between "<=", ">=", "<", ">", "==" etc.
We can set yylval appropriately to specify the nature of the operator.
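A rule of the kind mentioned in point 6 might look like the fragment below, where RELOP, LT and LE are assumed to be token codes defined elsewhere (for example in the y.tab.h header generated by Yacc):

"<"     { yylval = LT; return RELOP; }
"<="    { yylval = LE; return RELOP; }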

Fig-1.4 Architecture of lexical analyser generated by Lex

Features of Lex compilers:


 Helps to write programs that transform structured input, ranging from a simple text
program to a C compiler.
 Control flow is directed by instances of regular expressions in the input.
 Programs with structured input involve two tasks:
-dividing the input into meaningful units
-discovering the relationships among the units

Yacc Compiler
Yacc (Yet Another Compiler-Compiler) is a computer program for the UNIX operating system
developed by Stephen C. Johnson. It is a Look-Ahead Left-to-Right (LALR) parser generator: it
generates a parser (the part of a compiler that tries to make syntactic sense of the source code),
specifically a LALR parser, from an analytic grammar written in a notation similar
to Backus–Naur Form (BNF).
Functions of Yacc compiler:
 YACC provides a tool to produce a parser for a given grammar.
 YACC is a program designed to compile a LALR (1) grammar.
 It is used to produce the source code of the syntactic analyzer of the language produced
by LALR (1) grammar.
 The input to YACC is a grammar and the output is a C program.

Structure of Yacc compiler:


A YACC source program is structurally similar to a LEX one.
{declarations}
%%
{rules}
%%
{programs or routines}
1. Declaration section: Declare here the tokens used in the grammar and the types of values
used on the parser stack. Tokens that are single-quoted characters like '=' or '+' need not be
declared. Literal C code can be included in a block in this section using %{ ... %}.
The declaration section may contain the following items:
 Declarations of tokens; Yacc requires token names to be declared as such
using the keyword %token.
 Declaration of the start symbol using the keyword %start.
 C declarations: included files, global variables, types.
 C code between %{ and %}.
2. Rules section:
The rules of the grammar are placed here. Actions may be associated with rules and are
executed when the associated sentential form is matched.
 The rules part contains the grammar definition in a modified BNF form.
 Actions are C code in { } and can be embedded inside the rules (translation schemes).
3. Routines section:
This section contains the supporting C code needed by the grammar rules.
 The auxiliary routines part is only C code.
 It includes function definitions for every function needed in rules part.
 It can also contain the main() function definition if the parser is going to be
run as a program.
 The main() function must call the function yyparse().
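The layout above is illustrated by the small Yacc specification sketched below, which accepts one line containing numbers added together. It is only a sketch under the assumption that a companion Lex scanner supplies the NUMBER token and places its value in yylval; it is not one of the lab programs.

%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token NUMBER
%start line
%%
line : expr '\n'         { printf("result = %d\n", $1); }
     ;
expr : expr '+' NUMBER   { $$ = $1 + $3; }
     | NUMBER            { $$ = $1; }
     ;
%%
int main(void)
{
    return yyparse();
}

Since no %union is declared, the default semantic value type is int, so the $$ and $1 values above are plain integers.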

Working of yacc compiler:

1. YACC is designed for use with C code and generates a parser written in C.

2. The parser is configured for use in conjunction with a lex-generated scanner and relies on
standard shared features (token types, yylval, etc.) and calls the function yylex as a
scanner coroutine.

3. You provide a grammar specification file, which is traditionally named using a .y extension.

4. You invoke yacc on the .y file (typically with the -d option) and it creates the y.tab.h and
y.tab.c files containing a thousand or so lines of C code that implement an efficient LALR(1)
parser for your grammar, including the code for the actions you specified.

5. The file provides an extern function yyparse() that will attempt to successfully parse a
valid sentence.

6. You compile that C file normally, link with the rest of your code, and you have a parser!
By default, the parser reads from stdin and writes to stdout, just like a lex-generated
scanner does.
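Putting Lex and Yacc together, a typical build on Unix resembles the following command sequence (calc.y and calc.l are example file names; -d tells yacc to also emit y.tab.h, which the scanner includes for the token definitions):

yacc -d calc.y               # produces y.tab.c and y.tab.h
lex calc.l                   # produces lex.yy.c
cc y.tab.c lex.yy.c -ly -ll  # links the parser and scanner into a.out
./a.out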

Key differences between lex and yacc compiler:

1. The main difference between Lex and Yacc is that Lex is a lexical analyzer which
converts the source program into meaningful tokens while Yacc is a parser that generates
a parse tree from the tokens generated by Lex.
2. The lexical analyzer performs the lexical analysis while syntax analyzer performs
syntax analysis.
3. Lex is a lexical analyzer whereas Yacc is a parser.
4. Lex is a computer program that operates as a lexical analyzer while Yacc is a parser that is
used in the Unix Operating System.

5. Mike Lesk and Eric Schmidt developed Lex whereas Stephen C. Johnson developed Yacc.

6. While Lex reads the source program one character at a time and converts it into
meaningful tokens, Yacc takes the tokens as input and generates a parse tree as output.

PROGRAM NO.: 2
AIM: Write a program to find whether the input string is constant or not.

#include<stdio.h>
#include<conio.h>
#include<string.h>
#include<ctype.h>
int main()
{
int i,flag=0;
char a[20];
printf("Domesh Kaushik, A40318071");
puts("\nEnter the value :: ");
gets(a);
for(i=0;i<strlen(a);i++)
{
if(isdigit(a[i]))
flag=1;
else
{
flag=0;
break;
}
}
if(flag==1)
puts("String is constant");
else
puts("String is not a constant");
return 0;
}

OUTPUT:

PROGRAM NO.: 3

AIM: WAP to count the number of white spaces and newline characters in a string entered by the user.

ALGORITHM:

1. Start

2. Declare a character array str[] and initialize integer variables len, a=0.

3. Input the string from the User.

4. Find the length of the string.

5. Repeat steps 6 to 7 till a<len.

6. If str[a] is the carriage-return character (ASCII 13), count it as a new line.

7. If str[a] is a blank space, count it as a space character; otherwise do nothing.

8. Stop.

SOURCE CODE:

#include<stdio.h>

#include<conio.h>

#include<string.h>

int main()
{
char str[200],ch;
int a=0,space=0,newline=0;
printf("\n enter a string (press escape to quit entering): ");
ch=getche();
while((ch!=27) && (a<199))
{
str[a]=ch;
if(str[a]==' ')
space++;
if(str[a]==13)
newline++;
a++;
ch=getche();
}
printf("\n the number of lines used : %d",newline+1);
printf("\n the number of spaces used is : %d",space);
return 0;
}

OUTPUT:

PROGRAM NO.: 4

AIM: WAP to check whether a given string belongs to the grammar or not.

#include<stdio.h>
#include<conio.h>
#include<string.h>
#include<process.h>
void main()
{
clrscr();
printf("The grammar is:\n");
printf("S->AB\n");
printf("A->a/aA\n");
printf("B->bB/b\n");
char str[100];
int c,valid=1;
printf("Enter the string: ");
scanf("%s",str);
c=strlen(str);
for(int i=0;i<c;i++)
{
if(str[i]!='a'&& str[i]!='b')
{
valid=0;
break;
}
}
if(valid==1 && str[0]=='a'&& str[c-1]=='b')
{
printf("The string belongs to the grammar");
}
else
{
printf("The string does not belong to the grammar");
}
getch();
}

OUTPUT:

PROGRAM NO.: 5

AIM: WAP to check whether the input string contains keywords or not.

#include<stdio.h>
#include<conio.h>
#include<string.h>
int main()
{
int i,flag=0,m;
char s[5][10]={"if","else","goto","continue","return"},st[10];
printf("\n enter the string :");
gets(st);
for(i=0;i<5;i++)
{
m=strcmp(st,s[i]);
if(m==0)
flag=1;
}
if(flag==0)
printf("\n it is not a keyword");
else

printf("\n it is a keyword");

return 0;
}

OUTPUT:

PROGRAM NO.: 6

AIM: Write a program to remove Left Recursion from a Grammar.

#include<stdio.h>
#include<string.h>
#define SIZE 10
int main ()
{
char non_terminal;
char beta,alpha;
int num;
char production[10][SIZE];
int index=3;
printf("Enter Number of Production : ");
scanf("%d",&num);
printf("Enter the grammar as E->E-A :\n");
for(int i=0;i<num;i++)
{
scanf("%s",production[i]);
}
for(int i=0;i<num;i++)
{
printf("\nGRAMMAR : : : %s",production[i]);
non_terminal=production[i][0];
if(non_terminal==production[i][index])
{
alpha=production[i][index+1];
printf(" is left recursive.\n");
while(production[i][index]!=0 && production[i][index]!='|')
index++;

if(production[i][index]!=0)
{
beta=production[i][index+1];
printf("Grammar without left recursion:\n");
printf("%c->%c%c\'",non_terminal,beta,non_terminal);
printf("\n%c\'->%c%c\'|E\n",non_terminal,alpha,non_terminal);
}
else
printf(" can't be reduced\n");
}
else
printf(" is not left recursive.\n");
index=3;
}
}

OUTPUT:

PROGRAM NO.: 7

AIM: WAP to make left factoring for a given grammar.

#include<iostream>
#include<string>
using namespace std;
int main()
{ string ip,op1,op2,temp;
int sizes[10] = {};
char c;
int n,j,l;
cout<<"Enter the Parent Non-Terminal : ";
cin>>c;
ip.push_back(c);
op1 += ip + "\'->";
op2 += ip + "\'\'->";
ip += "->";
cout<<"Enter the number of productions : ";
cin>>n;
for(int i=0;i<n;i++)
{
cout<<"Enter Production "<<i+1<<" : ";
cin>>temp;
sizes[i] = temp.size();
ip+=temp;
if(i!=n-1)
ip += "|";
}
cout<<"Production Rule : "<<ip<<endl;
char x = ip[3];
for(int i=0,k=3;i<n;i++)
{
if(x == ip[k])
{
if(ip[k+1] == '|')
{
op1 += "#";
ip.insert(k+1,1,ip[0]);
ip.insert(k+2,1,'\'');
k+=4;
}
else
{
op1 += "|" + ip.substr(k+1,sizes[i]-1);
ip.erase(k-1,sizes[i]+1);
}
}
else
{
while(ip[k++]!='|');
}
}
char y = op1[6];
for(int i=0,k=6;i<n-1;i++)
{
if(y == op1[k])
{
if(op1[k+1] == '|')
{
op2 += "#";
op1.insert(k+1,1,op1[0]);

op1.insert(k+2,2,'\'');
k+=5;
}
else
{
temp.clear();
for(int s=k+1;s<op1.length();s++)
temp.push_back(op1[s]);
op2 += "|" + temp;
op1.erase(k-1,temp.length()+2);
} }}
op2.erase(op2.size()-1);
cout<<"After Left Factoring : "<<endl;
cout<<ip<<endl;
cout<<op1<<endl;
cout<<op2<<endl;
return 0;
}

OUTPUT:

PROGRAM NO.: 9

AIM: WAP to find out the first of the non-terminals in a grammar.

#include<stdio.h>
#include<ctype.h>
void FIRST(char[],char );
void addToResultSet(char[],char);
int numOfProductions;
char productionSet[10][10];
int main()
{
int i;
char choice;
char c;
char result[20];
printf("How many number of productions ? :");
scanf(" %d",&numOfProductions);
for(i=0;i<numOfProductions;i++)
{
printf("Enter productions Number %d : ",i+1);
scanf(" %s",productionSet[i]);
}
do
{
printf("\n Find the FIRST of :");
scanf(" %c",&c);
FIRST(result,c);
printf("\n FIRST(%c)= { ",c);
for(i=0;result[i]!='\0';i++)
printf(" %c ",result[i]);
printf("}\n");
printf("press 'y' to continue : ");
scanf(" %c",&choice);
}
while(choice=='y'||choice =='Y');
}
void FIRST(char* Result,char c)
{
int i,j,k;
char subResult[20];
int foundEpsilon;
subResult[0]='\0';
Result[0]='\0';
if(!(isupper(c)))
{
addToResultSet(Result,c);
return ;
}
for(i=0;i<numOfProductions;i++)
{
if(productionSet[i][0]==c)
{
if(productionSet[i][2]=='$') addToResultSet(Result,'$');
else
{
j=2;
while(productionSet[i][j]!='\0')
{
foundEpsilon=0;
FIRST(subResult,productionSet[i][j]);

for(k=0;subResult[k]!='\0';k++)
addToResultSet(Result,subResult[k]);
for(k=0;subResult[k]!='\0';k++)
if(subResult[k]=='$')
{
foundEpsilon=1;
break;
}
if(!foundEpsilon)
break;
j++;
}
}
}
}
return ;
}
void addToResultSet(char Result[],char val)
{
int k;
for(k=0 ;Result[k]!='\0';k++)
if(Result[k]==val)
return;
Result[k]=val;
Result[k+1]='\0';
}

OUTPUT:

PROGRAM NO.: 10

AIM: WAP to compute follow of the non-terminals.

#include<stdio.h>
#include<string.h>
#include<ctype.h>
int n,m=0,p,i=0,j=0;
char a[10][10],followResult[10];
void follow(char c);
void first(char c);
void addToResult(char);
int main()
{
int i;
int choice;
char c,ch;
printf("Enter the no.of productions: ");
scanf("%d", &n);
printf(" Enter %d productions\nProduction with multiple terms should be give as separate
productions \n", n);
for(i=0;i<n;i++)
scanf("%s%c",a[i],&ch);
do
{
m=0;
printf("Find FOLLOW of -->");
scanf(" %c",&c);
follow(c);
printf("FOLLOW(%c) = { ",c);
for(i=0;i<m;i++)
printf("%c ",followResult[i]);
printf(" }\n");
printf("Do you want to continue(Press 1 to continue....)?");
scanf("%d%c",&choice,&ch);
}
while(choice==1);
}
void follow(char c)
{
if(a[0][0]==c)addToResult('$');
for(i=0;i<n;i++)
{
for(j=2;j<strlen(a[i]);j++)
{
if(a[i][j]==c)
{
if(a[i][j+1]!='\0')first(a[i][j+1]);
if(a[i][j+1]=='\0'&&c!=a[i][0])
follow(a[i][0]);
}
}
}
}
void first(char c)
{
int k;
if(!(isupper(c)))
addToResult(c);
for(k=0;k<n;k++)
{
if(a[k][0]==c)

{
if(a[k][2]=='$') follow(a[i][0]);
else if(islower(a[k][2]))
//f[m++]=a[k][2];
addToResult(a[k][2]);
else first(a[k][2]);
}
}
}
void addToResult(char c)
{
int i;
for(i=0;i<m;i++)
if(followResult[i]==c)
return;
followResult[m++]=c;
}

OUTPUT:

PROGRAM NO.: 8

AIM: Write a program to show all the operations on a stack.

#include<iostream.h>
#include<conio.h>
#define MAXSIZE 50
int a[MAXSIZE];
int top=-1;
void main()
{
clrscr();
int choice;
int push();
int pop();
void display();
do
{
cout<<"\nyou can do following operations on stack:\n";
cout<<"\n1.PUSH";
cout<<"\n2.POP";
cout<<"\n3.DISPLAY";
cout<<"\n4.EXIT";
cout<<"\nenter your choice:";
cin>>choice;
switch(choice)
{
case 1:push();
break;
case 2:pop();
break;
case 3:display();
break;
case 4:break;
default:cout<<"\nwrong choice";
}
}
while(choice!=4);
getch();
}
int push()
{
int value;
if(top==MAXSIZE-1)
{
cout<<"\nSTACK OVERFLOW";
}
else
{
cout<<"\nenter the value you want to insert:";
cin>>value;
top++;
a[top]=value;
}
return value;
}

int pop()
{
int value;
if(top==-1)
{

cout<<"\nstack underflow";
}
else
{
value=a[top];
top--;
cout<<"\npopped value : "<<value;
}
return value;
}
void display()
{
int i;
if(top==-1)
{
cout<<"\nstack underflow";
}
else
{
for(i=top;i>=0;i--)
{
cout<<"\n"<<a[i];
}
}
}

OUTPUT:

