Professional Documents
Culture Documents
PRACTICAL FILE
OF
COMPILER DESIGN
LAB
Submitted By:-
CERTIFICATE
This is to certify that Mr. Sambhav Singh Chauhan Roll No. 10097 student of
VII Semester (Computer Science & Engineering) has completed his practical
file under the guidance and supervision of undersigned.
INDEX
Program No. 1
CASE STUDY ON LEX & YACC TOOLS
The program lex is a lexical analyzer. Lexical analysis is the recognition of words in a language.
As used in this particular application, lex, or more specifically flex, is used to recognize
characters forming the names of log curves, arithmetic operators and algebraic groupings. flex is
a particular example of the lexical analysis programs available for UNIX systems and is the
implementation delivered with Linux distributions. The original implementation was done by
Mike Lesk and Eric Schmidt at Bell Laboratories.
yacc is a language parser. It accepts word items and, given a list of rules describing how these
items form larger entities, deduces which items or combinations of items satisfy a rule. This can
also be thought of as grammatical analysis. Once a rule is satisfied, a programmer's code is
applied to the result.
The combination of lex, yacc and some programmer's C code provides a complete means to
interpret and act upon a user's wishes. The lex program uses its regular expression interpretation
capability to recognize strings of characters as words or tokens. Once a token is identified, it is
passed to yacc where the various rules are applied until some combination of tokens form a
recognizable structure to which yacc applies some pre-programmed C code.
Using Tools :-
o Similarly, the characters [ and ] (square brackets) are used to indicate acceptance of
any of the characters enclosed within them or within the range of characters described
between them. For example, the expression [abc] says to accept any of the characters
a, b or c; the expression [a-c] says the same thing.
• Once a regular expression matches the text stream being interpreted by lex, code created
by the programmer is executed. When used with yacc, this code generally amounts to
passing an indication. The indication is a token that yacc knows about, and in fact, these
are defined in the yacc portion of the analyzer/parser program so that they are common to
both lex and yacc.
o Yacc uses a grammar description to decode the meaning of the tokens that lex passes
to it. As tokens are passed to yacc, it applies its rules until a single token, or some
sequence of tokens, becomes a recognizable structure.
10097
• Before a programmer's C code is executed, though, yacc may require several structures or
token-structure combinations to be recognized. For example,
o The first rule says that a sentence is made up of two parts: a subject followed by a
predicate. If that rule is satisfied, then the C code between the curly brackets will be
executed. To satisfy the first rule, yacc has to recognize both a subject and a
predicate.
o The second rule says that a subject is recognized when either a noun or a noun phrase
is recognized. A noun is the smallest unit that yacc deals with, and in the yacc
grammar, a noun is a token that yacc will want to have lex recognize. Here, noun
phrase is also used to create the predicate. If a verb is recognized and it is followed by
a noun phrase, the predicate is identified. If only the noun phrase is identified, then
the subject has been identified.
• When the sentence is finally recognized, yacc accepts it and returns to the calling
program.
There are three sections in each program: declarations, rules and user routines. Each is
separated by a line containing only the two characters %%.
• Declarations :-
o For yacc, the declarations section contains the tokens it can use while parsing the
input. Each has a unique value greater than 256, and the set of tokens is introduced
via %token at the beginning of the line.
o lex can use the declarations section to define aliases for items that it must recognize
while looking for tokens to pass to yacc.
• Rules :-
o Within the rules section, yacc holds its parsing intelligence. This is the section that
contains the grammar rules referred to in the English sentence example earlier. In
fact, the coding used earlier is typical of a yacc grammar: items to be recognized are
separated from the means to recognize them by a colon (:), and alternative means of
recognition are separated from each other via a pipe (|) symbol.
10097
o lex uses the rules section to contain the regular expressions that allow it to identify
tokens to pass to yacc. These expressions may be the aliases from the declaration
section, regular expressions, or some combination.
• User Routines :-
o The last section contains C code which may be invoked as each of the tools processes
its input.One requirement is that the yacc tokens be known to the lex program.
o To successfully interpret the user's desired process, the program needs to know which
well logs were available for processing. This information is available in the ASCII
file selected by the user.
10097
Program No. 2
ALGORITHM TO CONVERT REGULAR EXPRESSION TO NFA
Method: We first decompose R into the primitive components. For each component we
construct a finite automaton as follows.
i ε f
a
i f
N1 ε
ε
i f
ε N2 ε
i N1 N2 f
N
ε ε
i f
ε
Program No. 3
CONSTRUCTION OF DFA FROM AN NFA
Input: An NFA N.
Method: Our algorithm constructs a transition table Dtran for D. Each DFA state is a set of
NFA states and we construct Dtran so that D will simulate “in parallel” all possible moves N can
make on a given input string
Before it sees the first input symbol, N can be in any of the states in the set €-closure(s0),
where s0 is the start state of N. Suppose that exactly the states in set T are reachable from s0 on a
given sequence of input symbols, and let a be the next input symbol. On seeing a, N can move to
any of the states in the set move(T,a). When we allow for €-transitions, N can be in any of the
states in €-closure(move(T,a)), after seeing the a.
We construct Dstates, the set of states of D, and Dtran, the transition table for D, in the following
manner. Each state of D corresponds to a set of NFA states that N could be in after reading some
sequence of input symbols including all possible €-transitions before or after symbols are read.
The start state of D is €-closure(s0). States and transitions are added to D using algorithm of
Subset Constructions. A state of D is an accepting state if it is a set of NFA states containing at
least one accepting state of N.
10097
Computation of €-closure
Program No. 4
MINIMIZATION THE NUMBER OF STATES OF A DFA
Input: A DFA M with set of states S, set of inputs ∑, transitions defined for all states and
inputs, start state s0, and set of accepting states F.
Output: A DFA M’ accepting the same language as M and having as few states as
possible.
Method:
1. Construct as initial partition ∏ of set of states with two groups: the accepting states F and
the non accepting states S-F
2. Apply the procedure to ∏ to construct a new partition ∏new.
3. If ∏new = ∏, let ∏final = ∏ and continue with step (4). Otherwise, repeat step (2) with
∏ = ∏new.
4. Choose one state in each group of the partition ∏final as the representative for that group.
The representatives will be the states of the reduced DFA M’. Let s be a representative
state, and suppose on input a there is a transition of M from s to t. let r be the
representative of t’s group. Then M’ has a transition from s to r on a. Let the start state of
M’ be the representative of the group containing the start state s0 of M, and let the
accepting states f M’ be the representatives that are in F. Note that each group of ∏final
either consists only of states in F or has no states in F.
5. If M’ has a dead state, that is, a state d that is not accepting and that has transitions to
itself on all input symbols, then remove d from M’. Also remove any states not reachable
from the start state. Any transitions to d from other states become undefined.
Construction of ∏new
10097
1. Start
2. Declare a character array str and a character variable ch and initialize integer variables
a=0, space=0, newline=0.
3. Read a string using ch = getchar(), getchar() function that reads a character from console
and echoes it on the screen.
4. a=0, Repeat step 5 to 8 steps till ch!=27 and a<199.
5. Put ch in array str[] and check whether str[0]== ‘’. If true then space++.
6. If str[a]==13 then newline++.
7. a++.
8. Again ch=getch() to read the string character by character from the console and store it in
the array and count the no. of whitespaces and newline character in the string.
9. Print the no. of whitespaces and newline characters in the string.
10. Stop
10097
Program No. 5
PROGRAM TO COUNT NO. OF BLANK SPACES AND LINES
#include<stdio.h>
#include<conio.h>
#include<string.h>
void main()
{
clrscr();
if(str[a]==13)
{
newline++;
printf(“\n”);
}
a++;
ch=getch();
getch();
}
10097
Output
1. Start
2. Declare i,j,flag=1,len as int and str[10] as char
3. Allow the user to enter the string
4. Initialize the loop i=0,j=1 to i<len
a. if str[j] is digit, assign flag=0
b. else if str[i] is alphabet, increment flag
1) if str[i] is alphanum, increment flag
2) else if str[i] is not digit, assign flag=0
3) else increment flag
c. else if str[i] is not alphanum, assign flag=0
5. if flag==0, print Identifier else print Not Identifier
6. Stop
10097
Program No. 6
PROGRAM TO FIND WHETHER GIVEN STRING IS
IDENTIFIER OR NOT
#include<stdio.h>
#include<conio.h>
#include<string.h>
#include<ctype.h>
#include<stdlib.h>
void main()
{
int i,j,flag=1,len;
char str[10];
clrscr();
len=strlen(str);
for(i=0,j=1;i<len;i++)
{
if(isdigit(str[j]))
{
flag=0;
break;
}
else if(isalpha(str[i]))
{
{
flag++;
continue;
}
if(isalnum(str[i]))
{
flag++;
continue;
}
else if(!isdigit(str[i]))
{
10097
flag=0;
break;
}
else
flag++;
}
else if(!isalnum(str[i]))
{
flag=0;
break;
}
if(flag==0)
puts("Not Identifier");
else
puts("Identifier");
getch();
}
10097
Output
1. Start
2. Declare i, flag as int and a[5] as char
3. Allow user to enter the value
4. Initialize the loop from i=0 to i<strlen(a)
a. if a[i] is digit, assign flag=1
b. else assign flag=0
5. if flag==1, print Value is Constant else print Value is Variable
6. Stop
10097
Program No. 7
PROGRAM TO FIND WHETHER GIVEN STRING IS
CONSTANT OR NOT
#include<stdio.h>
#include<conio.h>
#include<string.h>
#include<ctype.h>
void main()
{
int i,flag;
char a[5];
clrscr();
for(i=0;i<strlen(a);i++)
{
if(isdigit(a[i]))
flag=1;
else
{
flag=0;
break;
}
}
if(flag==1)
puts("Value is constant");
else
puts("Value is a variable");
getch();
}
10097
Output
Start
Declare i, flag as int and str[10] as char
Declare array a[5][10]={“printf”,”scanf”,”if”,”else”,”break”}
Allow user to enter the string
Initialize the loop i=0 to i<strlen(str)
a. Compare str and a[i]
b. If same, assign flag=1 else flag=0
If flag=1, print Keyword else print String
Stop
10097
Program No. 8
PROGRAM TO FIND WHETHER GIVEN STRING IS
KEYWORD OR NOT
#include<stdio.h>
#include<conio.h>
#include<string.h>
void main()
{
char a[5][10]={"printf","scanf","if","else","break"};
char str[10];
int i,flag;
clrscr();
for(i=0;i<strlen(str);i++)
{
if(strcmp(str,a[i])==0)
{
flag=1;
break;
}
else
flag=0;
}
if(flag==1)
puts("Keyword");
else
puts("String");
getch();
}
10097
Output
1. Start
2. Declare i as int, str[20] as char
3. Allow user to enter the string
4. Initialize the loop until str[i] is not equal to NULL
a. if str[i]==’(‘ || str[i]==’{‘, print 4
b. if str[i]==’)’ || str[i]==’}‘, print 5
c. if str[i] is digit, increment i and print 1
d. if str[i]==’+’, print 2
e. if str[i]==’*’, print 3
5. Stop
10097
Program No. 9
PROGRAM TO GENERATE TOKENS CORRESPONDING TO
GIVEN STRING
#include<stdio.h>
#include<conio.h>
#include<ctype.h>
#include<string.h>
void main()
{
int i=0;
char str[20];
clrscr();
printf(" \n Input the string ::");
gets(str);
printf("Corresponding Tokens are :: ");
while(str[i]!='\0')
{
if((str[i]=='(')||(str[i]=='{'))
{
printf("4");
}
if((str[i]==')')||(str[i]=='}'))
{
printf("5");
}
if(isdigit(str[i]))
{
while(isdigit(str[i]))
{
i++;
}
i--;
printf("1");
}
if(str[i]=='+')
{
printf("2");
}
10097
if(str[i]=='*')
{
printf("3");
}
i++;
}
getch();
}
10097
Output
1. Start .
2. Get the string from the user.
3. Generate the tokens of string.
4. Print the tokens of string.
5. Pass this string of token to the reduction function.
6. If any of the following series is encountred in the token string,then
“646 ” or “626” of “636” or null.
Replace the series with “6” (using the productions )
7. Compare the string in token array with constant “6” and store the result in d.
8. If d=0 then Print the message “string is in grammer ”
otherwise print the message “string is not in grammer.”
9. Stop.
10097
Program No.10
PROGRAM TO FIND WHETHER GIVEN EXPRESSION IS IN
GRAMMAR OR NOT
#include<stdio.h>
#include<conio.h>
#include<ctype>
void main()
{
int a=0,b=0,c,d;
char str[20],token[10];
clrscr();
printf(“\n ENTER STRING::”);
gets(str);
while(str[a]!=’\0’)
{
if((str[a]=’(‘)||(str[a]=’{‘))
{
token[b]=’4’;
b++;
}
if((str[a]=’)‘)||(str[a]=’}‘))
{
token[b]=’5’;
b++;
}
if(isdigit(str[a]))
{
while(isdigit(str[a]))
{
a++;
}
a--;
token[b]=’1’;
b++;
}
if(str[a]=’+‘)
{
token[b]=’2’;
b++;
}
if(str[a]=’*‘)
{
token[b]=’3’;
b++;
}
10097
a++;
}
token[b]=’\0’;
puts(token);
b=0;
while(token[b]!=’\0’)
{
if(token[b]==’1’)
{
token[b]=’6’;
}
b++;
}
puts(token);
b=0;
while(token[b]!=’\0’)
{
c=0;
while(((token[b]==’6’)&&(token[b+1]==’2’)&&(token
[b+2]==’6’))||((token[b]==’6’)&&(token[
b+1]==’3’)&&(token[b+2]==’6’))||((token
[b]==’4’)&&(token[b+1]==’6’)&&(token[b+
2]==’5’))||(token[c]!=’\0’))
{
token[c]=’6’;
c++;
while(token[c]!=’\0’)\
{
token[c]=token[c+2];
c++;
}
token[c-2]=’\0’;
puts(token);
}
b++;
}
d=strcmp(token,”6”);
if(d==0)
{
printf(“IT IS IN GRAMMAR”);
}
else
{
printf(“\n IT IS NOT IN GRAMMAR”);
}
getch();
10097
}
10097
Output
procedure INSTALL(A,a);
if not L(A,a) then
begin
L[A,a]:= true;
push (A,a) onto STACK
end
begin
for each nonterminal A and terminal a do L(A,a):= false;
for each production of the form A->a or A->B do
INSTALL(A,a);
while STACK not empty do
begin
pop top pair (B,a) from STACK;
for each production of the form A->B do
INSTALL(A,a)
end
end
10097
Program No. 11
PROGRAM TO IMPLEMENT LEADING
#include<conio.h>
#include<stdio.h>
char arr[18][3] =
{
{'E','+','F'},{'E','*','F'},{'E','(','F'},{'E',')','F'},{'E','i','F'},{'E','$','F'},
{'F','+','F'},{'F','*','F'},{'F','(','F'},{'F',')','F'},{'F','i','F'},{'F','$','F'},
{'T','+','F'},{'T','*','F'},{'T','(','F'},{'T',')','F'},{'T','i','F'},{'T','$','F'},
};
char prod[6] = "EETTFF";
char res[6][3]=
{
{'E','+','T'},{'T','\0'},
{'T','*','F'},{'F','\0'},
{'(','E',')'},{'i','\0'},
};
char stack [5][2];
int top = -1;
void main()
{
int i=0,j;
char pro,re,pri=' ';
clrscr();
for(i=0;i<6;++i)
{
10097
while(top>=0)
{
pro = stack[top][0];
re = stack[top][1];
--top;
for(i=0;i<6;++i)
{
if(res[i][0]==pro && res[i][0]!=prod[i])
{
install(prod[i],re);
}
}
}
for(i=0;i<18;++i)
{
printf("\n\t");
for(j=0;j<3;++j)
printf("%c\t",arr[i][j]);
}
getch();
clrscr();
printf("\n\n");
for(i=0;i<18;++i)
{
if(pri!=arr[i][0])
{
pri=arr[i][0];
printf("\n\t%c -> ",pri);
}
if(arr[i][2] =='T')
printf("%c ",arr[i][1]);
}
getch();
}
10097
Output
E + T
E * T
E ( T
E ) F
E i T
E $ F
F + F
F * F
F ( T
F ) F
F i T
F $ F
T + F
T * T
T ( T
T ) F
T i T
T $ F
E -> + * ( i
F -> ( i
T -> * ( i
10097
begin
for each non terminal A and terminal a do L(A,a):= false;
for each production of the form A->αa or A->αaB do
INSTALL(A,a);
while STACK not empty do
begin
pop top pair (B,a) from STACK;
for each production of the form A->αB do
INSTALL(A,a)
end
end
INSTALL
procedure INSTALL(A,a);
if not L(A,a) then
begin
L[A,a]:= true;
push (A,a) onto STACK
end
10097
Program No. 12
PROGRAM TO IMPLEMENT TRAILING
#include<conio.h>
#include<stdio.h>
char arr[18][3] =
{
{'E','+','F'},{'E','*','F'},{'E','(','F'},{'E',')','F'},{'E','i','F'},{'E','$','F'},
{'F','+','F'},{'F','*','F'},{'F','(','F'},{'F',')','F'},{'F','i','F'},{'F','$','F'},
{'T','+','F'},{'T','*','F'},{'T','(','F'},{'T',')','F'},{'T','i','F'},{'T','$','F'},
};
char prod[6] = "EETTFF";
char res[6][3]=
{
{'E','+','T'},{'T','\0','\0'},
{'T','*','F'},{'F','\0','\0'},
{'(','E',')'},{'i','\0','\0'},
};
char stack [5][2];
int top = -1;
void main()
{
int i=0,j;
char pro,re,pri=' ';
clrscr();
10097
for(i=0;i<6;++i)
{
for(j=2;j>=0;--j)
{
if(res[i][j]=='+'||res[i][j]=='*'||res[i][j]=='('||res[i][j]==')'||res[i][j]=='i'||res[i]
[j]=='$')
{
install(prod[i],res[i][j]);
break;
}
else if(res[i][j]=='E' || res[i][j]=='F' || res[i][j]=='T')
{
if(res[i][j-1]=='+'||res[i][j-1]=='*'||res[i][j-1]=='('||res[i][j-1]==')'||
res[i][j-1]=='i'||res[i][j-1]=='$')
{
install(prod[i],res[i][j-1]);
break;
}
}
}
}
while(top>=0)
{
pro = stack[top][0];
re = stack[top][1];
--top;
for(i=0;i<6;++i)
{
for(j=2;j>=0;--j)
{
if(res[i][0]==pro && res[i][0]!=prod[i])
{
install(prod[i],re);
break;
}
else if(res[i][0]!='\0')
break;
}
}
}
for(i=0;i<18;++i)
{
printf("\n\t");
for(j=0;j<3;++j)
printf("%c\t",arr[i][j]);
10097
}
getch();
clrscr();
printf("\n\n");
for(i=0;i<18;++i)
{
if(pri!=arr[i][0])
{
pri=arr[i][0];
printf("\n\t%c -> ",pri);
}
if(arr[i][2] =='T')
printf("%c ",arr[i][1]);
}
getch();
}
10097
Output
E + T
E * T
E ( F
E ) T
E i T
E $ F
F + F
F * F
F ( F
F ) T
F i T
F $ F
T + F
T * T
T ( F
T ) T
T i T
T $ F
E -> + * ) i
F -> ) i
T -> * ) i
10097
Program No. 13
PROGRAM TO IMPLEMENT FIRST
#include<stdio.h>
#include<conio.h>
#include<string.h>
void main()
{
char t[5],nt[10],p[5][5],first[5][5],temp;
int i,j,not,nont,k=0,f=0;
clrscr();
printf("\nEnter the no. of Terminals in the grammer: ( Enter { for absiline ) ");
scanf("%d",¬);
printf("\nEnter the Terminals in the grammer:\n");
for(i=0;i<not||t[i]=='$';i++)
{
scanf("\n%c",&t[i]);
}
for(i=0;i<nont;i++)
{
p[i][0]=nt[i];
first[i][0]=nt[i];
}
}
}
for(i=0;i<nont;i++)
{
printf("\nThe production for %c -> ",p[i][0]);
for(j=1;p[i][j]!='$';j++)
{
printf("%c",p[i][j]);
}
}
for(i=0;i<nont;i++)
{
f=0;
for(j=1;p[i][j]!='$';j++)
{
for(k=0;k<not;k++)
{
if(f==1)
break;
if(p[i][j]==t[k])
{
first[i][j]=t[k];
first[i][j+1]='$';
f=1;
break;
}
else if(p[i][j]==nt[k])
{
first[i][j]=first[k][j];
if(first[i][j]=='{')
continue;
first[i][j+1]='$';
f=1;
break;
}
}
}
}
for(i=0;i<nont;i++)
{
printf("\n\nThe first of %c -> ",first[i][0]);
10097
for(j=1;first[i][j]!='$';j++)
{
printf("%c\t",first[i][j]);
}
}
getch();
}
10097
Output
Enter the production for E ( End the production with '$' sign ) :(i)$
Enter the production for T ( End the production with '$' sign ) :i*E$
Enter the production for V ( End the production with '$' sign ) :E+i$