System Requirements: Hardware Requirements

Ex.No:1
Implementation of a symbol table with features like insert, modify, search, and display
Aim
To write a C program to implement symbol table.
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS
• Processors - 2.0 GHz or Higher
• RAM - 256 MB or Higher
• Hard Disk - 20 GB or Higher
• Operating System - Linux and Windows 2000/XP/NT
SOFTWARE REQUIREMENTS
• TURBO C
Definition:
Symbol Table:
A symbol table is a data structure containing a record for each identifier, with
fields for the attributes of the identifier. The data structure allows us to find the record
for each identifier quickly and to store or retrieve data from that record quickly.
Algorithm
Step 1: Start the program for performing the insert, display, search and modify operations
on the symbol table.
Step 2: Define the structure of the symbol table.
Step 3: Enter the choice for the operation to be performed on the symbol table.
Step 4: If the entered choice is 1, search the symbol table for the symbol to be
inserted. If the symbol is already present, display "duplicate symbol". Otherwise,
insert the symbol and the corresponding address into the symbol table.
Step 5: If the entered choice is 2, display the symbols present in the symbol table.
Step 6: If the entered choice is 3, search the symbol table for the symbol. If it is
not found, display "label is not found".
Step 7: If the entered choice is 4, search the symbol table for the symbol to be
modified. If it is found, the address of the label can be modified.
Step 8: Enter choice 5 to exit the program.
Program
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 50

/* One symbol table record: a label and its address. */
struct symtab
{
    char label[20];
    int addr;
} sy[MAX_SYMBOLS];

int cnt = 0; /* number of entries currently stored */

void insert(void);
int search(const char *);
void display(void);
void modify(void);

int main(void)
{
    int ch = 0;
    char lab[20];
    do
    {
        printf("\n1.insert\n2.display\n3.search\n4.modify\n5.exit\n");
        if (scanf("%d", &ch) != 1)
            break; /* stop on end of input or non-numeric input */
        switch (ch)
        {
        case 1:
            insert();
            break;
        case 2:
            display();
            break;
        case 3:
            printf("enter the label ");
            scanf("%19s", lab);
            if (search(lab) == 1)
                printf("label is found\n");
            else
                printf("label is not found\n");
            break;
        case 4:
            modify();
            break;
        case 5:
            break;
        default:
            printf("invalid choice\n");
            break;
        }
    } while (ch != 5);
    return 0;
}

/* Read a label and, if it is new, store it with its address. */
void insert(void)
{
    char lab[20];
    printf("enter the label ");
    scanf("%19s", lab);
    if (search(lab) == 1)
        printf("duplicate symbol\n");
    else if (cnt >= MAX_SYMBOLS)
        printf("symbol table is full\n");
    else
    {
        strcpy(sy[cnt].label, lab);
        printf("enter the address ");
        scanf("%d", &sy[cnt].addr);
        cnt++;
    }
}

/* Return 1 if label s is in the table, 0 otherwise. */
int search(const char *s)
{
    int i;
    for (i = 0; i < cnt; i++)
    {
        if (strcmp(sy[i].label, s) == 0)
            return 1;
    }
    return 0;
}

/* Read a label and, if it exists, replace its address. */
void modify(void)
{
    int ad, i;
    char lab[20];
    printf("enter the label ");
    scanf("%19s", lab);
    if (search(lab) == 0)
        printf("no such symbol\n");
    else
    {
        printf("label is found\n");
        printf("enter the address ");
        scanf("%d", &ad);
        for (i = 0; i < cnt; i++)
        {
            if (strcmp(sy[i].label, lab) == 0)
                sy[i].addr = ad;
        }
    }
}

/* Print every label with its address. */
void display(void)
{
    int i;
    for (i = 0; i < cnt; i++)
        printf("%s\t%d\n", sy[i].label, sy[i].addr);
}
Input and output
1.insert
2.display
3.search
4.modify
5.exit
1
enter the label A
enter the address 2000
1.insert
2.display
3.search
4.modify
5.exit
1
enter the label SUB
enter the address 3000
1.insert
2.display
3.search
4.modify
5.exit
1
enter the label MUL
enter the address 4000
1.insert
2.display
3.search
4.modify
5.exit
2
A 2000
SUB 3000
MUL 4000
1.insert
2.display
3.search
4.modify
5.exit
3
enter the label A
label is found
1.insert
2.display
3.search
4.modify
5.exit
4
enter the label A
label is found
1.insert
2.display
3.search
4.modify
5.exit
5
ADVANTAGES AND LIMITATIONS:
Advantages:
• Does not waste space.
• Little overhead in opening a scope.
Limitations:
• It is difficult to close a scope.
• A list of the entries in the same scope must be maintained.
• This list is used to close a scope and to reactivate it for a second pass if needed.
REFERENCES:
1. A. V. Aho, R. Sethi, J. D. Ullman, "Compilers: Principles, Techniques, and Tools",
Pearson Education, ISBN 81-7758-5902.
2. Reference book: John R. Levine, lex & yacc.
3. Reference ppt: Lecture 2: Lexical Analysis, CS 440/540, George Mason University.
4. Reference URL: http://dinosaur.compilertools.net/
5. Online manual: http://dinosaur.compilertools.net/flex/index.html
Result:
Thus the program for implementation of symbol table with features has been
executed and verified successfully.
Ex.No:2
Develop a lexical analyzer to recognize a few patterns in C
Aim:
To develop a lexical analyzer to recognize a few patterns in C (e.g., identifiers,
constants, comments, operators).
System Requirements:
Hardware Requirements

Software Requirements
• TURBO C
Description:
• Break the input string into “words” called tokens.
• The main functions of a lexical analyzer are:
   - Stripping out comments and white space
   - Correlating error messages with the source program
Objective:
The main objective is to write a program that implements a lexical analyzer
which can read input characters and group them into “tokens.”
How is it achieved?
• It is achieved by extracting characters from the expression using built-in
functions in C.
Algorithm:
Step 1: Start the program.
Step 2: Extract the first character from the expression using the getchar() function.
Step 3: Check the character:
i) If it is a digit, print the token as NUMBER.
ii) If it is '+', '-', '*', or '/', print the token as OPERATOR.
iii) If it is '<', '>', '<=', '>=', or '!=', print the token as
RELATIONAL OPERATOR.
iv) If it is '(' or ')', print the token as PARENTHESIS.
v) If it is 'int', 'float', 'if', 'while', etc., print the token as
KEYWORD.
vi) If it is a single letter, or a letter followed by letters or digits,
print the token as IDENTIFIER.
Step 4: The token is obtained using Step 3.
Step 5: If more input remains, go to Step 2; otherwise stop.
Program:
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>

#define SIZE 128   /* maximum length of one lexeme */
#define NONE -1
#define EOS '\0'
#define NUM 256
#define KEYWORD 257
#define PAREN 258
#define ID 259
#define ASSIGN 260
#define REL_OP 261
#define DONE 262
#define MAX 999    /* size of the lexeme storage array */
#define SYMMAX 100 /* number of symbol table slots */

char lexemes[MAX];
char buffer[SIZE];
int lastchar = -1; /* last used position in lexemes */
int lastentry = 0; /* last used slot in symtable (slot 0 is unused) */
int tokenval = NONE;
int lineno = 1;

struct entry
{
    char *lexptr;
    int token;
} symtable[SYMMAX];

struct entry keywords[] = {
    {"if", KEYWORD}, {"else", KEYWORD}, {"for", KEYWORD},
    {"int", KEYWORD}, {"float", KEYWORD}, {"double", KEYWORD},
    {"char", KEYWORD}, {"struct", KEYWORD}, {"return", KEYWORD},
    {0, 0}};

void Error_Message(char *m)
{
    fprintf(stderr, "line %d: %s\n", lineno, m);
    exit(1);
}

/* Return the symbol table index of s, or 0 if it is not present. */
int look_up(char s[])
{
    int k;
    for (k = lastentry; k > 0; k--)
        if (strcmp(symtable[k].lexptr, s) == 0)
            return k;
    return 0;
}

/* Store s with token code tok and return its symbol table index. */
int insert(char s[], int tok)
{
    int len;
    len = strlen(s);
    if (lastentry + 1 >= SYMMAX)
        Error_Message("Symbol Table is Full");
    if (lastchar + len + 1 >= MAX)
        Error_Message("Lexemes Array is Full");
    lastentry++;
    symtable[lastentry].token = tok;
    symtable[lastentry].lexptr = &lexemes[lastchar + 1];
    lastchar = lastchar + len + 1;
    strcpy(symtable[lastentry].lexptr, s);
    return lastentry;
}

/* Preload the symbol table with the keywords. */
void Initialize(void)
{
    struct entry *ptr;
    for (ptr = keywords; ptr->token; ptr++)
        insert(ptr->lexptr, ptr->token);
}

/* Read the next token from standard input and return its code. */
int lexer(void)
{
    int t, next;
    int val, i;
    while (1)
    {
        t = getchar();
        if (t == ' ' || t == '\t')
            ; /* skip blanks and tabs */
        else if (t == '\n')
            lineno++;
        else if (t == '(' || t == ')')
            return PAREN;
        else if (t == '<' || t == '>' || t == '!')
        {
            /* Peek one character to recognize <=, >= and !=. */
            next = getchar();
            if (next != '=')
            {
                if (next != EOF)
                    ungetc(next, stdin);
                if (t == '!')
                {
                    tokenval = NONE;
                    return t; /* a lone ! is not a relational operator */
                }
            }
            return REL_OP;
        }
        else if (t == '=')
            return ASSIGN;
        else if (isdigit(t))
        {
            ungetc(t, stdin);
            scanf("%d", &tokenval);
            return NUM;
        }
        else if (isalpha(t))
        {
            i = 0;
            while (isalnum(t))
            {
                buffer[i] = t;
                t = getchar();
                i++;
                if (i >= SIZE)
                    Error_Message("compiler error");
            }
            buffer[i] = EOS;
            if (t != EOF)
                ungetc(t, stdin);
            val = look_up(buffer);
            if (val == 0)
                val = insert(buffer, ID);
            tokenval = val;
            return symtable[val].token;
        }
        else if (t == EOF)
            return DONE;
        else
        {
            tokenval = NONE;
            return t; /* any other character is its own token */
        }
    }
}

int main(void)
{
    int lookahead;
    printf("\n\t\t Program for Lexical Analysis \n");
    Initialize();
    printf("\n Enter the expression and put ; at the end");
    printf("\n Press Ctrl + Z to terminate... \n");
    lookahead = lexer();
    while (lookahead != DONE)
    {
        if (lookahead == NUM)
            printf("\n Number: %d", tokenval);
        if (lookahead == '+' || lookahead == '-' || lookahead == '*' || lookahead == '/')
            printf("\n Operator");
        if (lookahead == PAREN)
            printf("\n Parenthesis");
        if (lookahead == ID)
            printf("\n Identifier: %s", symtable[tokenval].lexptr);
        if (lookahead == KEYWORD)
            printf("\n Keyword");
        if (lookahead == ASSIGN)
            printf("\n Assignment Operator");
        if (lookahead == REL_OP)
            printf("\n Relational Operator");
        lookahead = lexer();
    }
    return 0;
}
OUTPUT:
Program for Lexical Analysis

Enter the expression and put ; at the end
Press Ctrl + Z to terminate ...
2+3
Number: 2
Operator
Number: 3
if(a<b) a=a+b;
Keyword
Parenthesis
Identifier: a
Relational Operator
Identifier: b
Parenthesis
Identifier: a
Assignment Operator
Identifier: a
Operator
Identifier: b
^Z
Limitations:
• The lexical analyzer reads source text and produces tokens, which are the basic
lexical units of the language.
• Some trailing context patterns cannot be properly matched and generate
warning messages ("dangerous trailing context").
Applications:
• Lex, a programming tool for the Unix system, is a successful solution to the
general problem of lexical analysis.
• Lex targets only C. It also places artificial limits on the size of strings that
can be recognized.
• This feature is typically used to handle quoted strings with escapes to denote
special characters.
Viva Questions:
1. List the various phases of a compiler. (Nov/Dec 2008)
The following are the various phases of a compiler:
• Lexical Analyzer
• Syntax Analyzer
• Semantic Analyzer
• Intermediate code generator
• Code optimizer
• Code generator
2. Describe Lexical analyzer.
It converts a text representation of the program (a sequence of characters)
into a sequence of lexical units (tokens) for a particular language. A program which
performs lexical analysis is called a lexical analyzer, lexer or scanner.
3. What is a symbol table?
A symbol table is a data structure containing a record for each identifier,
with fields for the attributes of the identifier. The data structure allows us to find the
record for each identifier quickly and to store or retrieve data from that record quickly.
Whenever an identifier is detected by a lexical analyzer, it is entered into the
symbol table. The attributes of an identifier cannot be determined by the lexical
analyzer.
4. What is a compiler?
A compiler is a program that reads a program written in one language,
the source language, and translates it into an equivalent program in another
language, the target language. The compiler reports to its user the presence of
errors in the source program.
REFERENCES:
1. A. V. Aho, R. Sethi, J. D. Ullman, "Compilers: Principles, Techniques, and Tools",
Pearson Education, ISBN 81-7758-5902.
2. Reference book: John R. Levine, lex & yacc.
3. Reference ppt: Lecture 2: Lexical Analysis, CS 440/540, George Mason University.
4. Reference URL: http://dinosaur.compilertools.net/
Result:
Thus the program to develop a lexical analyzer that recognizes a few patterns in
C has been executed and verified successfully.
Study of Lex & Yacc Tools

Lex - A Lexical Analyzer Generator
ABSTRACT
Lex helps write programs whose control flow is directed by instances of regular
expressions in the input stream. It is well suited for editor-script type transformations and
for segmenting input in preparation for a parsing routine.
Lex source is a table of regular expressions and corresponding program fragments.
The table is translated to a program which reads an input stream, copying it to an output
stream and partitioning the input into strings which match the given expressions. As each
such string is recognized the corresponding program fragment is executed. The recognition of
the expressions is performed by a deterministic finite automaton generated by Lex. The
program fragments written by the user are executed in the order in which the
corresponding regular expressions occur in the input stream.
The lexical analysis programs written with Lex accept ambiguous specifications and
choose the longest match possible at each input point. If necessary, substantial lookahead is
performed on the input, but the input stream will be backed up to the end of the current partition,
so that the user has general freedom to manipulate it.
Lex can generate analyzers in either C or Ratfor, a language which can be translated
automatically to portable Fortran. It is available on the PDP-11 UNIX, Honeywell GCOS,
and IBM OS systems.
1. Introduction.
Lex is a program generator designed for lexical processing of character input
streams. It accepts a high-level, problem oriented specification for character string
matching, and produces a program in a general purpose language which recognizes regular
expressions. The regular expressions are specified by the user in the source specifications
given to Lex. The Lex written code recognizes these expressions in an input stream and
partitions the input stream into strings matching the expressions. At the boundaries between
strings program sections provided by the user are executed. The Lex source file associates
the regular expressions and the program fragments. As each expression appears in the input
to the program written by Lex, the corresponding fragment is executed.
The user supplies the additional code beyond expression matching needed to
complete his tasks, possibly including code written by other generators. The program that
recognizes the expressions is generated in the general purpose programming language
employed for the user's program fragments. Thus, a high level expression language is
provided to write the string expressions to be matched while the user's freedom to write actions
is unimpaired. This avoids forcing the user who wishes to use a string manipulation language
for input analysis to write processing programs in the same and often inappropriate string
handling language.
Lex is not a complete language, but rather a generator representing a new language
feature which can be added to different programming languages, called ``host languages.'' Just
as general purpose languages can produce code to run on different computer hardware, Lex
can write code in different host languages. The host language is used for the output code
generated by Lex and also for the program fragments added by the user. Compatible run-time
libraries for the different host languages are also provided. This makes Lex adaptable to
different environments and different users. Each application may be directed to the
combination of hardware and host language appropriate to the task, the user's background, and
the properties of local implementations. At present, the only supported host language is C,
although Fortran (in the form of Ratfor [2]) has been available in the past. Lex itself exists on
UNIX, GCOS, and OS/370; but the code generated by Lex may be taken anywhere
appropriate compilers exist.
Lex turns the user's expressions and actions (called source in the diagram below) into the host
general-purpose language; the generated program is named yylex. The yylex program will
recognize expressions in a stream (called input in the diagram below) and perform the specified
actions for each expression as it is detected.
          +-------+
Source -> |  Lex  | -> yylex
          +-------+

          +-------+
Input  -> | yylex | -> Output
          +-------+
An overview of Lex
For a trivial example, consider a program to delete from the input all blanks or tabs at the ends of
lines.
%%
[ \t]+$ ;
is all that is required. The program contains a %% delimiter to mark the beginning of the
rules, and one rule. This rule contains a regular expression which matches one or more
instances of the characters blank or tab (written \t for visibility, in accordance with the C
language convention) just prior to the end of a line. The brackets indicate the character class
made of blank and tab; the + indicates ``one or more ...''; and the $ indicates ``end of line,'' as in
QED. No action is specified, so the program generated by Lex (yylex) will ignore these
characters. Everything else will be copied. To change any remaining string of blanks or tabs
to a single blank, add another rule:
%%
[ \t]+$ ;
[ \t]+ printf(" ");
The finite automaton generated for this source will scan for both rules at once,
observing at the termination of the string of blanks or tabs whether or not there is a newline
character, and executing the desired rule action. The first rule matches all strings of blanks or
tabs at the end of lines, and the second rule all remaining strings of blanks or tabs.
Lex can be used alone for simple transformations, or for analysis and statistics
gathering on a lexical level. Lex can also be used with a parser generator to perform the lexical
analysis phase; it is particularly easy to interface Lex and Yacc [3]. Lex programs recognize
only regular expressions; Yacc writes parsers that accept a large class of context free
grammars, but require a lower level analyzer to recognize input tokens. Thus, a
combination of Lex and Yacc is often appropriate. When used as a preprocessor for a later
parser generator, Lex is used to partition the input stream, and the parser generator assigns
structure to the resulting pieces. The flow of control in such a case (which might be the first
half of a compiler, for example) is shown in Figure 2.
Additional programs, written by other generators or by hand, can be added easily to programs
written by Lex.
           lexical          grammar
            rules            rules
              |                |
              v                v
         +---------+      +---------+
         |   Lex   |      |  Yacc   |
         +---------+      +---------+
              |                |
              v                v
         +---------+      +---------+
Input -> |  yylex  |  ->  | yyparse | -> Parsed input
         +---------+      +---------+
Lex with Yacc


Yacc users will realize that the name yylex is what Yacc expects its lexical analyzer to
be named, so that the use of this name by Lex simplifies interfacing.
Lex generates a deterministic finite automaton from the regular expressions in the
source. The automaton is interpreted, rather than compiled, in order to save space. The result is
still a fast analyzer. In particular, the time taken by a Lex program to recognize and partition an
input stream is proportional to the length of the input. The number of Lex rules or the
complexity of the rules is not important in determining speed, unless rules which include
forward context require a significant amount of rescanning. What does increase with the
number and complexity of rules is the size of the finite automaton, and therefore the size of the
program generated by Lex.
In the program written by Lex, the user's fragments (representing the actions to be
performed as each regular expression is found) are gathered as cases of a switch. The
automaton interpreter directs the control flow. Opportunity is provided for the user to
insert either declarations or additional statements in the routine containing the actions, or
to add subroutines outside this action routine.
Lex is not limited to source which can be interpreted on the basis of one character
lookahead. For example, if there are two rules, one looking for ab and another for
abcdefg, and the input stream is abcdefh, Lex will recognize ab and leave the input
pointer just before cd. . . Such backup is more costly than the processing of simpler
languages.
2. Lex Source.
The general format of Lex source is:
{definitions}
%%
{rules}
%%
{user subroutines}
where the definitions and the user subroutines are often omitted. The second %% is optional,
but the first is required to mark the beginning of the rules. The absolute minimum Lex program
is thus
%%
(no definitions, no rules) which translates into a program which copies the input to the
output unchanged.
In the outline of Lex programs shown above, the rules represent the user's control
decisions; they are a table, in which the left column contains regular expressions and the right
column contains actions, program fragments to be executed when the expressions are
recognized. Thus an individual rule might appear
integer printf("found keyword INT");
to look for the string integer in the input stream and print the message ``found keyword
INT'' whenever it appears. In this example the host procedural language is C and the C
library function printf is used to print the string. The end of the expression is indicated by the
first blank or tab character. If the action is merely a single C expression, it can just be given
on the right side of the line; if it is compound, or takes more than a line, it should be
enclosed in braces. As a slightly more useful example, suppose it is desired to change a
number of words from British to American spelling. Lex rules such as
colour printf("color");
mechanise printf("mechanize");
petrol printf("gas");
would be a start.
3. Lex Regular Expressions.
The definitions of regular expressions are very similar to those in QED [5]. A regular
expression specifies a set of strings to be matched. It contains text characters (which match
the corresponding characters in the strings being compared) and operator characters (which
specify repetitions, choices, and other features). The letters of the alphabet and the digits are
always text characters; thus the regular expression integer matches the string integer wherever
it appears and the expression
a57D
looks for the string a57D.
Operators:
The operator characters are
"\[]^-?.*+|()$/{}%<>
and if they are to be used as text characters, an escape should be used. The quotation mark
operator (") indicates that whatever is contained between a pair of quotes is to be taken as
text characters. Thus
xyz"++"
matches the string xyz++ when it appears. Note that a part of a string may be quoted.
It is harmless but unnecessary to quote an ordinary text character; the expression
"xyz++"
is the same as the one above. Thus by quoting every non-alphanumeric character being used as a
text character, the user can avoid remembering the list above of current operator
characters, and is safe should further extensions to Lex lengthen the list.
An operator character may also be turned into a text character by preceding it with \ as in
xyz\+\+
which is another, less readable, equivalent of the above expressions. Another use of the quoting
mechanism is to get a blank into an expression; normally, as explained above, blanks or tabs
end a rule. Any blank character not contained within [] must be quoted. Several normal C
escapes with \ are recognized: \n is new line, \t is tab, and \b is backspace. To enter \ itself,
use \\. Since new line is illegal in an expression, \n must be used; it is not required to escape
tab and backspace. Every character but blank, tab, new line and the list above is always a text
character.
Character classes: Classes of characters can be specified using the operator pair [].
The construction [abc] matches a single character, which may be a, b, or c. Within
square brackets, most operator meanings are ignored. Only three characters are special:
these are \ - and ^. The - character indicates ranges. For example,
[a-z0-9<>_] indicates the character class containing all the lower case letters, the digits, the
angle brackets, and underline. Ranges may be given in either order. Using - between any
pair of characters which are not both upper case letters, both lower case letters, or both
digits is implementation dependent and will get a warning message. (E.g., [0-z] in ASCII is
many more characters than it is in EBCDIC). If it is desired to include the character - in a
character class, it should be first or last; thus
[-+0-9]
matches all the digits and the two signs.
In character classes, the ^ operator must appear as the first character after the left bracket; it
indicates that the resulting string is to be complemented with respect to the computer
character set. Thus
[^abc]
matches all characters except a, b, or c, including all special or control characters; or
[^a-zA-Z]
is any character which is not a letter. The \ character provides the usual escapes within character
class brackets.
Arbitrary character. To match almost any character, the operator character . is the class of all
characters except newline. Escaping into octal is possible although non-portable:
[\40-\176]
matches all printable characters in the ASCII character set, from octal 40 (blank) to octal 176
(tilde).
Optional expressions. The operator ? indicates an optional element of an expression. Thus
ab?c
matches either ac or abc.
Repeated expressions. Repetitions of classes are indicated by the operators * and +.
a*
is any number of consecutive a characters, including zero; while
a+
is one or more instances of a. For example,
[a-z]+
is all strings of lower case letters. And
[A-Za-z][A-Za-z0-9]*
indicates all alphanumeric strings with a leading alphabetic character. This is a typical
expression for recognizing identifiers in computer languages.
Alternation and Grouping. The operator | indicates alternation:
(ab|cd)
matches either ab or cd. Note that parentheses are used for grouping, although they are not
necessary on the outside level;
ab|cd
would have sufficed. Parentheses can be used for more complex expressions:
(ab|cd+)?(ef)*
matches such strings as abefef, efefef, cdef, or cddd; but not abc, abcd, or abcdef.
Context sensitivity. Lex will recognize a small amount of surrounding context. The two
simplest operators for this are ^ and $. If the first character of an expression is ^, the
expression will only be matched at the beginning of a line (after a newline character, or at the
beginning of the input stream). This can never conflict with the other meaning of ^,
complementation of character classes, since that only applies within the [] operators. If the
very last character is $, the expression will only be matched at the end of a line (when
immediately followed by newline). The latter operator is a special case of the / operator
character, which indicates trailing context. The expression
ab/cd
matches the string ab, but only if followed by cd. Thus
ab$
is the same as
ab/\n
Left context is handled in Lex by start conditions as explained in section 10. If a rule is
only to be executed when the Lex automaton interpreter is in start condition x, the rule
should be prefixed by
<x>
using the angle bracket operator characters. If we considered ``being at the beginning of a
line'' to be start condition ONE, then the ^ operator would be equivalent to
<ONE>
Start conditions are explained more fully later.
Repetitions and Definitions. The operators {} specify either repetitions (if they enclose
numbers) or definition expansion (if they enclose a name). For example
{digit}
looks for a predefined string named digit and inserts it at that point in the expression. The
definitions are given in the first part of the Lex input, before the rules. In contrast,
a{1,5}
looks for 1 to 5 occurrences of a.
Finally, initial % is special, being the separator for Lex source segments.
4. Lex Actions.
When an expression written as above is matched, Lex executes the corresponding
action. This section describes some features of Lex which aid in writing actions. Note that
there is a default action, which consists of copying the input to the output. This is performed on
all strings not otherwise matched. Thus the Lex user who wishes to absorb the entire input,
without producing any output, must provide rules to match everything. When Lex is being
used with Yacc, this is the normal situation. One may consider that actions are what is done
instead of copying the input to the output; thus, in general, a rule which merely copies can be
omitted. Also, a character combination which is omitted from the rules and which appears as
input is likely to be printed on the output, thus calling attention to the gap in the rules.
One of the simplest things that can be done is to ignore the input. Specifying a C null
statement, ; as an action causes this result. A frequent rule is
[ \t\n] ;
which causes the three spacing characters (blank, tab, and new line) to be ignored.
Another easy way to avoid writing actions is the action character |, which indicates that the
action for this rule is the action for the next rule. The previous example could also have been
written
" " |
"\t" |
"\n" ;
with the same result, although in different style. The quotes around \n and \t are not required.
In more complex actions, the user will often want to know the actual text that matched
some expression like [a-z]+. Lex leaves this text in an external character array named yytext.
Thus, to print the name found, a rule like
[a-z]+ printf("%s", yytext);
will print the string in yytext. The C function printf accepts a format argument and data to
be printed; in this case, the format is ``print string'' (% indicating data conversion, and s
indicating string type), and the data are the characters in yytext. So this just places the
matched string on the output. This action is so common that it may be written as ECHO:
[a-z]+ ECHO;
is the same as the above. Since the default action is just to print the characters found, one might
ask why give a rule, like this one, which merely specifies the default action? Such rules are
often required to avoid matching some other rule which is not desired. For example, if there
is a rule which matches read it will normally match the instances of read contained in bread
or readjust; to avoid this, a rule of the form [a-z]+ is needed. This is
explained further below.
Sometimes it is more convenient to know the end of what has been found; hence
Lex also provides a count yyleng of the number of characters matched. To count both the
number of words and the number of characters in words in the input, the user might write
[a-zA-Z]+ {words++; chars += yyleng;}
which accumulates in chars the number of characters in the words recognized. The last
character in the string matched can be accessed by
yytext[yyleng-1]
Occasionally, a Lex action may decide that a rule has not recognized the correct
span of characters. Two routines are provided to aid with this situation. First, yymore()
can be called to indicate that the next input expression recognized is to be tacked on to
the end of this input. Normally, the next input string would overwrite the current entry in
yytext. Second, yyless (n) may be called to indicate that not all the characters matched by
the currently successful expression are wanted right now. The argument n indicates the
number of characters in yytext to be retained. Further characters previously matched are
returned to the input. This provides the same sort of lookahead offered by the / operator, but in
a different form.
Example: Consider a language which defines a string as a set of characters between quotation
(") marks, and provides that to include a " in a string it must be preceded by a \. The regular
expression which matches that is somewhat confusing, so that it might be preferable to write
\"[^"]* {
    if (yytext[yyleng-1] == '\\')
        yymore();
    else
        ... normal user processing
}
which will, when faced with a string such as "abc\"def" first match the five characters
"abc\; then the call to yymore() will cause the next part of the string, "def, to be tacked on the
end. Note that the final quote terminating the string should be picked up in the code labeled
``normal processing''.
The function yyless() might be used to reprocess text in various circumstances.
Consider the C problem of distinguishing the ambiguity of ``=-a''. Suppose it is desired to
treat this as ``=- a'' but print a message. A rule might be
=-[a-zA-Z] {
    printf("Op (=-) ambiguous\n");
    yyless(yyleng-1);
    ... action for =- ...
}
which prints a message, returns the letter after the operator to the input stream, and treats the
operator as ``=-''. Alternatively it might be desired to treat this as ``= -a''. To do this, just return
the minus sign as well as the letter to the input:
=-[a-zA-Z] {
    printf("Op (=-) ambiguous\n");
    yyless(yyleng-2);
    ... action for = ...
}
will perform the other interpretation. Note that the expressions for the two cases might more
easily be written
=-/[A-Za-z]
in the first case and
=/-[A-Za-z]
in the second; no backup would be required in the rule action. It is not necessary to
recognize the whole identifier to observe the ambiguity. The possibility of ``=-3'', however,
makes
=-/[^ \t\n]
a still better rule.
In addition to these routines, Lex also permits access to the I/O routines it uses. They are:
1) input() which returns the next input character;
2) output(c) which writes the character c on the output; and
3) unput(c) pushes the character c back onto the input stream to be read later by input().
By default these routines are provided as macro definitions, but the user can override
them and supply private versions. These routines define the relationship between external
files and internal characters, and must all be retained or modified consistently. They may be
redefined, to cause input or output to be transmitted to or from strange places, including other
programs or internal memory; but the character set used must be consistent in all routines; a
value of zero returned by input must mean end of file; and the relationship between unput and
input must be retained or the Lex lookahead will not work. Lex does not look ahead at all if it
does not have to, but every rule ending in + * ? or $ or containing / implies lookahead.
Lookahead is also necessary to match an expression that is a prefix of another expression.
See below for a discussion of the character set used by Lex.
The standard Lex library imposes a 100 character limit on backup.
Another Lex library routine that the user will sometimes want to redefine is yywrap()
which is called whenever Lex reaches an end-of-file. If yywrap returns a 1, Lex continues
with the normal wrapup on end of input. Sometimes, however, it is convenient to arrange
for more input to arrive from a new source. In this case, the user should provide a yywrap
which arranges for new input and returns 0. This instructs Lex to continue processing.
The default yywrap always returns 1.
This routine is also a convenient place to print tables, summaries, etc. at the end of a
program. Note that it is not possible to write a normal rule which recognizes end-of-file;
the only access to this condition is through yywrap. In fact, unless a private version of
input() is supplied a file containing nulls cannot be handled, since a value of 0 returned by
input is taken to be end-of-file.
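The yywrap contract described above (return 0 after arranging more input, return 1 when nothing is left) can be illustrated without Lex. In this minimal sketch, my_yywrap, my_input and count_chars are hypothetical stand-ins working on in-memory strings; they are not the real Lex routines.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Lex's input machinery. */
static const char *const *next_source;  /* next unread entry of a NULL-terminated list */
static const char *cursor = "";         /* read position in the current source */

/* Follows the yywrap convention: 0 = more input was arranged, 1 = wrap up. */
static int my_yywrap(void)
{
    if (next_source == NULL || *next_source == NULL)
        return 1;
    cursor = *next_source++;
    return 0;
}

/* Returns the next character, or 0 at the true end of all input.
   In this sketch even the first source is loaded through my_yywrap. */
static int my_input(void)
{
    while (*cursor == '\0')
        if (my_yywrap())
            return 0;
    return (unsigned char)*cursor++;
}

/* Counts the characters read across every source in the list. */
size_t count_chars(const char *const *list)
{
    size_t n = 0;

    next_source = list;
    cursor = "";
    while (my_input() != 0)
        n++;
    return n;
}
```

As in Lex, a value of 0 from the input routine means end of file, so this sketch cannot handle sources containing null characters.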
5. Ambiguous Source Rules.
Lex can handle ambiguous specifications. When more than one expression can
match the current input, Lex chooses as follows:
1) The longest match is preferred.
2) Among rules which matched the same number of characters, the rule given first is preferred.
Thus, suppose the rules
integer keyword action ...;

[a-z]+ identifier action ...;
to be given in that order. If the input is integers, it is taken as an identifier, because [a-z]+
matches 8 characters while integer matches only 7. If the input is integer, both rules match 7
characters, and the keyword rule is selected because it was given first. Anything shorter (e.g.
int) will not match the expression integer and so the
identifier interpretation is used.
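The two disambiguation rules can be sketched in C for this particular pair of rules. The function classify and the enum names are assumptions made for illustration; Lex itself resolves the choice inside its generated automaton.

```c
#include <stddef.h>
#include <string.h>

enum token { TOK_NONE, TOK_KEYWORD, TOK_IDENTIFIER };

/* Hypothetical helper: applies "longest match first, then earliest rule"
   to the rules  integer  and  [a-z]+  at the start of s. */
enum token classify(const char *s, size_t *len)
{
    /* Rule 1: the literal keyword "integer". */
    size_t kw = (strncmp(s, "integer", 7) == 0) ? 7 : 0;
    /* Rule 2: [a-z]+ (assumes an ASCII-like contiguous alphabet). */
    size_t id = 0;

    while (s[id] >= 'a' && s[id] <= 'z')
        id++;
    if (kw == 0 && id == 0) {
        *len = 0;
        return TOK_NONE;
    }
    if (kw >= id) {            /* equal lengths: the rule given first wins */
        *len = kw;
        return TOK_KEYWORD;
    }
    *len = id;                 /* otherwise the longer match wins */
    return TOK_IDENTIFIER;
}
```

For the inputs discussed above, integers is classified as an identifier of length 8, integer as a keyword of length 7, and int as an identifier of length 3.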
The principle of preferring the longest match makes rules containing expressions like .*
dangerous. For
example, '.*' might seem a good way of recognizing a string in single quotes. But it is
an invitation for the program to read far ahead, looking for a distant single quote. Presented
with the input
'first' quoted string here, 'second' here

the above expression will match
'first' quoted string here, 'second'
which is probably not what was wanted. A better rule is of the form
'[^'\n]*'
which, on the above input, will stop after 'first'. The consequences of errors like this are
mitigated by the fact that the . operator will not match newline. Thus expressions like .* stop
on the current line. Don't try to defeat this with expressions like (.|\n)+ or equivalents; the Lex
generated program will try to read the entire input file, causing internal buffer overflows.
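The difference between the two expressions can be checked with a small C sketch. The helpers greedy_len and bounded_len are hypothetical names; they imitate the match lengths of '.*' and '[^'\n]*' at the start of a string and are not generated by Lex.

```c
#include <stddef.h>
#include <string.h>

/* Imitates '.*' : from the opening quote to the LAST quote on the same line. */
size_t greedy_len(const char *s)
{
    size_t eol, i;

    if (s[0] != '\'')
        return 0;
    eol = strcspn(s, "\n");            /* . never matches newline */
    for (i = eol; i-- > 1; )           /* visits eol-1 down to 1 */
        if (s[i] == '\'')
            return i + 1;
    return 0;
}

/* Imitates '[^'\n]*' : from the opening quote to the NEXT quote. */
size_t bounded_len(const char *s)
{
    size_t i;

    if (s[0] != '\'')
        return 0;
    i = 1 + strcspn(s + 1, "'\n");
    return (s[i] == '\'') ? i + 1 : 0;
}
```

On the sample input above, greedy_len covers everything through 'second', while bounded_len stops after 'first'.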
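Lex normally partitions the input stream; it does not search for every possible match of each
expression. Consider rules that count occurrences of she and he and end with two rules, for \n
and ., that ignore everything besides he and she. Remember that . does not include newline.
Since she includes he, Lex will normally not recognize the instances of he included in she,
since once it has passed a she those characters are gone.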
Sometimes the user would like to override this choice. The action REJECT means ``go
do the next alternative.'' It causes whatever rule was second choice after the current rule to be
executed. The position of the input pointer is adjusted accordingly. Suppose the user really
wants to count the included instances of he:
she {s++; REJECT;}

he {h++; REJECT;}
\n |
. ;
These rules are one way of changing the previous example to do just that. After counting each
expression, it is rejected; whenever appropriate, the other expression will then be counted. In
this example, of course, the user could note that she includes he but not vice versa, and omit
the REJECT action on he; in other cases, however, it would not be possible a priori to tell
which input characters were in both classes.
Consider the two rules
a[bc]+ { ... ; REJECT;}

a[cd]+ { ... ; REJECT;}
If the input is 'ab', only the first rule matches, and on 'ad' only the second matches. The input
string 'accb'
matches the first rule for four characters and then the second rule for three characters. In
contrast, the input
'accd' agrees with the second rule for four characters and then the first rule for three.
In general, REJECT is useful whenever the purpose of Lex is not to partition the input stream
but to detect all
examples of some items in the input, and the instances of these items may overlap or
include each other. Suppose a digram table of the input is desired; normally the digrams
overlap, that is the word the is considered to contain both th and he. Assuming a two-
dimensional array named digram to be incremented, the appropriate
source is
%%
[a-z][a-z] {
digram[yytext[0]][yytext[1]]++;
REJECT;
}
. ;
\n ;
where the REJECT is necessary to pick up a letter pair beginning at every character, rather
than at every other
character.
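The overlapping count that REJECT produces can be sketched directly in C by starting a pair at every character. This is an illustration only: count_digrams is a hypothetical helper, and it uses a 26 x 26 table indexed by letter offset (an assumption) rather than the raw character codes used in the Lex source.

```c
#include <stddef.h>

/* Hypothetical helper: counts every overlapping pair of lower-case letters.
   Assumes a character set where 'a'..'z' are contiguous, such as ASCII. */
void count_digrams(const char *text, int digram[26][26])
{
    size_t i;

    for (i = 0; text[i] != '\0' && text[i + 1] != '\0'; i++) {
        char a = text[i], b = text[i + 1];

        /* Advance by one character, not two, so that pairs overlap. */
        if (a >= 'a' && a <= 'z' && b >= 'a' && b <= 'z')
            digram[a - 'a'][b - 'a']++;
    }
}
```

For the word the, both the th entry and the he entry are incremented once, which is the behaviour the REJECT action is there to achieve.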
6. Lex Source Definitions.
Remember the format of the Lex source:

{definitions}
%%
{rules}
%%
{user routines}
So far only the rules have been described. The user needs additional options, though, to define
variables for use
in his program and for use by Lex. These can go either in the definitions section or in the rules
section.
Remember that Lex is turning the rules into a program. Any source not intercepted by Lex
is copied into the
generated program. There are three classes of such things.
1) Any line which is not part of a Lex rule or action which begins with a blank or tab
is copied into the Lex generated program. Such source input prior to the first %%
delimiter will be external to any function in the code; if it appears immediately after the
first %%, it appears in an appropriate place for declarations in the function written by Lex
which contains the actions. This material must look like program fragments, and should
precede the first Lex rule. As a side effect of the above, lines which begin with a blank or
tab, and which contain a comment, are passed through to the generated program. This can
be used to include comments in either the Lex source or the generated code. The comments
should follow the host language convention.
2) Anything included between lines containing only %{ and %} is copied out as

above. The delimiters are discarded. This format permits entering text like
preprocessor statements that must begin in column 1, or copying lines that do not look
like programs.
3) Anything after the third %% delimiter, regardless of formats, etc., is copied out after the Lex
output.
Definitions intended for Lex are given before the first %% delimiter. Any line in this
section not contained between %{ and %}, and beginning in column 1, is assumed to
define Lex substitution strings. The format of such lines is name translation and it causes the
string given as a translation to be associated with the name. The name and translation must be
separated by at least one blank or tab, and the name must begin with a letter. The translation can
then be called out by the {name} syntax in a rule. Using {D} for the digits and {E} for
an exponent field, for example, might abbreviate rules to recognize numbers:
D [0-9]
E [DEde][-+]?{D}+
%%
{D}+ printf("integer");
{D}+"."{D}*({E})? |
{D}*"."{D}+({E})? |
{D}+{E}
Note the first two rules for real numbers; both require a decimal point and contain an optional
exponent field, but the first requires at least one digit before the decimal point and the second
requires at least one digit after the decimal point. To correctly handle the problem posed by a
Fortran expression such as 35.EQ.I, which does not
contain a real number, a context-sensitive rule such as
[0-9]+/"."EQ printf("integer");
could be used in addition to the normal rule for integers. The definitions section may also
contain other commands, including the selection of a host language, a character set
table, a list of start conditions, or adjustments to the default size of arrays within Lex itself
for larger source programs.
7.Usage.
There are two steps in compiling a Lex source program. First, the Lex source must be turned
into a generated
program in the host general purpose language. Then this program must be compiled and
loaded, usually with a library of Lex subroutines. The generated program is on a file
named lex.yy.c. The I/O library is defined in
terms of the C standard library [6].
The C programs generated by Lex are slightly different on OS/370, because the OS
compiler is less powerful
than the UNIX or GCOS compilers, and does less at compile time. C programs generated on
GCOS and UNIX
are the same.
UNIX. The library is accessed by the loader flag -ll. So an appropriate set of commands is
lex source
cc lex.yy.c -ll
The resulting program is placed on the usual file a.out for later execution. To use Lex
with Yacc see below. Although the default Lex I/O routines use the C standard library, the
Lex automata themselves do not do so; if private versions of input, output and unput are
given, the library can be avoided.
8. Lex and Yacc.
If you want to use Lex with Yacc, note that what Lex writes is a program named yylex(), the
name required by Yacc for its analyzer. Normally, the default main program on the Lex library
calls this routine, but if Yacc is loaded, and its main program is used, Yacc will call yylex().
In this case each Lex rule should end with
return(token);
where the appropriate token value is returned. An easy way to get access to Yacc's
names for tokens is to compile the Lex output file as part of the Yacc output file by
placing the line
#include "lex.yy.c"
in the last section of Yacc input. Supposing the grammar to be named ``good'' and the lexical
rules to be named ``better'' the UNIX command sequence can just be:
yacc good
lex better
cc y.tab.c -ly -ll
The Yacc library (-ly) should be loaded before the Lex library, to obtain a main program
which invokes the
Yacc parser. The generations of Lex and Yacc programs can be done in either order.
9. Examples.
As a trivial problem, consider copying an input file while adding 3 to every positive
number divisible by 7.
Here is a suitable Lex source program
%%
        int k;
[0-9]+ {
k = atoi(yytext);
if (k%7 == 0)
printf("%d", k+3);
else
printf("%d", k);
}
to do just that. The rule [0-9]+ recognizes strings of digits; atoi converts the digits to binary and
stores the result in k. The operator % (remainder) is used to check whether k is divisible by 7;
if it is, it is incremented by 3 as it is written out. It may be objected that this program will
alter such input items as 49.63 or X7. Furthermore, it
increments the absolute value of all negative numbers divisible by 7. To avoid this, just add
a few more rules
after the active one, as here:
%%
        int k;
-?[0-9]+ {
k = atoi(yytext);
printf("%d", k%7 == 0 ? k+3 : k);
}
-?[0-9.]+ ECHO;
[A-Za-z][A-Za-z0-9]+ ECHO;
Numerical strings containing a ``.'' or preceded by a letter will be picked up by one of the last
two rules, and not changed. The if-else has been replaced by a C conditional expression to save
space; the form a?b:c means ``if a then b else c''.
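For comparison, the behaviour of these three rules can be imitated by a hand-written C filter. This is a sketch under stated assumptions, not what Lex generates: the match_* helpers, append and add3_filter are hypothetical names, and the longest-match choice is made by comparing prefix lengths directly.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Length of the longest prefix of s matching -?[0-9]+ (0 if none). */
static size_t match_int(const char *s)
{
    size_t i = (s[0] == '-') ? 1 : 0, start = i;
    while (isdigit((unsigned char)s[i])) i++;
    return i > start ? i : 0;
}

/* Length of the longest prefix of s matching -?[0-9.]+ (0 if none). */
static size_t match_dotted(const char *s)
{
    size_t i = (s[0] == '-') ? 1 : 0, start = i;
    while (isdigit((unsigned char)s[i]) || s[i] == '.') i++;
    return i > start ? i : 0;
}

/* Length of the longest prefix of s matching [A-Za-z][A-Za-z0-9]+ (0 if none). */
static size_t match_word(const char *s)
{
    size_t i = 1;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[i])) i++;
    return i >= 2 ? i : 0;
}

/* Appends n bytes of s to out (capacity cap >= 1), truncating if needed. */
static size_t append(char *out, size_t o, size_t cap, const char *s, size_t n)
{
    if (n > cap - 1 - o) n = cap - 1 - o;
    memcpy(out + o, s, n);
    return o + n;
}

/* Copies in to out, adding 3 to every integer divisible by 7. */
void add3_filter(const char *in, char *out, size_t cap)
{
    size_t o = 0;

    while (*in != '\0') {
        size_t a = match_int(in), b = match_dotted(in), c = match_word(in);

        if (a > 0 && a >= b && a >= c) {   /* first rule wins ties */
            char tmp[32];
            int k = atoi(in);              /* reads exactly the matched digits */
            int n = snprintf(tmp, sizeof tmp, "%d", k % 7 == 0 ? k + 3 : k);
            o = append(out, o, cap, tmp, (size_t)n);
            in += a;
        } else {                           /* ECHO rules, or default one-character copy */
            size_t n = b > c ? b : c;
            if (n == 0) n = 1;
            o = append(out, o, cap, in, n);
            in += n;
        }
    }
    out[o] = '\0';
}
```

With the input 7 14 49.63 X7 -21 5 the filter produces 10 17 49.63 X7 -18 5: the items 49.63 and X7 pass through unchanged, as the text describes.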
For an example of statistics gathering, here is a program which histograms the lengths of
words, where a word is defined as a string of letters.
        int lengs[100];
%%
[a-z]+ lengs[yyleng]++;
. |
\n ;
%%
yywrap()
{
int i;
printf("Length No. words\n");
for(i=0; i<100; i++)
if (lengs[i] > 0)
printf("%5d%10d\n",i,lengs[i]);
return(1);
}
This program accumulates the histogram, while producing no output. At the end of the input it
prints the table. The final statement return(1); indicates that Lex is to perform wrapup. If
yywrap returns zero (false) it implies that further input is available and the program is to
continue reading and processing. To provide a yywrap that
never returns true causes an infinite loop.
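The same accumulation can be sketched as an ordinary C function, which may help when checking the Lex version by hand. The name word_lengths and the clamping of very long words into the last bucket are assumptions of this sketch; the Lex program above does no such bounds check.

```c
#include <stddef.h>

#define MAX_LEN 100

/* Hypothetical helper: adds the length of every run of lower-case letters
   (the pattern [a-z]+) in text to the histogram lengs. */
void word_lengths(const char *text, int lengs[MAX_LEN])
{
    size_t run = 0;

    for (;; text++) {
        if (*text >= 'a' && *text <= 'z') {
            run++;                       /* still inside a word */
            continue;
        }
        if (run > 0) {                   /* a word has just ended */
            lengs[run < MAX_LEN ? run : MAX_LEN - 1]++;
            run = 0;
        }
        if (*text == '\0')
            break;
    }
}
```

For the text a bb cc ddd the histogram records one word of length 1, two of length 2 and one of length 3.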
As a larger example, here are some parts of a program written by N. L. Schryer to convert
double precision Fortran to single precision Fortran. Because Fortran does not
distinguish upper and lower case letters, this
routine begins by defining a set of classes including both cases of each letter:
a [aA]
b [bB]
c [cC]
...
z [zZ]
An additional class recognizes white space:
W [ \t]*
In a regular expression for Fortran continuation cards such as ^"     "[^ 0], the quotes
surround the blanks. It is interpreted as ``beginning of line, then five blanks, then anything
but blank or zero.'' Note the two different meanings of ^. There follow some rules to
change double precision constants to ordinary floating constants.
[0-9]+{W}{d}{W}[+-]?{W}[0-9]+ |
[0-9]+{W}"."{W}{d}{W}[+-]?{W}[0-9]+ |
"."{W}[0-9]+{W}{d}{W}[+-]?{W}[0-9]+ {
/* convert constants */
for(p=yytext; *p != 0; p++)
{
if (*p == 'd' || *p == 'D')
*p += 'e' - 'd';
}
ECHO;
}
After the floating point constant is recognized, it is scanned by the for loop to find the
letter d or D. The program then adds 'e'-'d', which converts it to the next letter of the alphabet.
The modified constant, now single-precision, is written out again. There follow a series of
names which must be respelled to remove their initial d. By using the array yytext the same
action suffices for all the names (only a sample of a rather long list is given here).
{d}{s}{i}{n} |
{d}{c}{o}{s} |
{d}{s}{q}{r}{t} |
{d}{a}{t}{a}{n} |
...
{d}{f}{l}{o}{a}{t} printf("%s",yytext+1);

Another list of names must have initial d changed to initial a:
{d}{l}{o}{g} |
{d}{l}{o}{g}10 |
{d}{m}{i}{n}1 |
{d}{m}{a}{x}1 {
yytext[0] += 'a' - 'd';
ECHO;
}
And one routine must have initial d changed to initial r:
{d}1{m}{a}{c}{h} {
yytext[0] += 'r' - 'd';
ECHO;
}
To avoid such names as dsinx being detected as instances of dsin, some final rules pick up
longer words as
identifiers and copy some surviving characters:
[A-Za-z][A-Za-z0-9]* |
[0-9]+ |
\n |
. ECHO;
Note that this program is not complete; it does not deal with the spacing problems in Fortran
or with the use of
keywords as identifiers.
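The character arithmetic used in these actions can be tried in isolation. The sketch below assumes a character set in which the letters of each case are contiguous (as in ASCII); to_single_precision and replace_initial are hypothetical helper names, not part of the Fortran converter.

```c
/* Turns the exponent letter d or D of a constant into e or E, in place.
   Adding 'e' - 'd' keeps the case of the original letter. */
void to_single_precision(char *constant)
{
    char *p;

    for (p = constant; *p != '\0'; p++)
        if (*p == 'd' || *p == 'D')
            *p += 'e' - 'd';
}

/* Replaces an initial d or D by the letter `to` (given in lower case),
   again preserving case, as in  yytext[0] += 'a' - 'd'. */
void replace_initial(char *name, char to)
{
    if (name[0] == 'd' || name[0] == 'D')
        name[0] += to - 'd';
}
```

For example, 1.5d+3 becomes 1.5e+3, dlog becomes alog, and D1MACH becomes R1MACH.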
10. Left Context Sensitivity.
Sometimes it is desirable to have several sets of lexical rules to be applied at different times
in the input. For
example, a compiler preprocessor might distinguish preprocessor statements and analyze them
differently from ordinary statements. This requires sensitivity to prior context, and there are
several ways of handling such
problems. The ^ operator, for example, is a prior context operator, recognizing
immediately preceding left
context just as $ recognizes immediately following right context. Adjacent left context
could be extended, to produce a facility similar to that for adjacent right context, but it
is unlikely to be as useful, since often the relevant left context appeared some time earlier,
such as at the beginning of a line.
This section describes three means of dealing with different environments: a simple use of
flags, when only a
few rules change from one environment to another, the use of start conditions on rules, and
the possibility of making multiple lexical analyzers all run together. In each case, there are
rules which recognize the need to change the environment in which the following input text
is analyzed, and set some parameter to reflect the change. This may be a flag explicitly tested
by the user's action code; such a flag is the simplest way of dealing with the problem, since
Lex is not involved at all. It may be more convenient, however, to have Lex remember the
flags as initial conditions on the rules. Any rule may be associated with a start condition. It
will only be recognized when Lex is in that start condition. The current start condition may be
changed at any time. Finally, if the sets of rules for the different environments are very
dissimilar, clarity may be best achieved by writing
several distinct lexical analyzers, and switching from one to another as desired.
Consider the following problem: copy the input to the output, changing the word magic to
first on every line
which began with the letter a, changing magic to second on every line which began
with the letter b, and changing magic to third on every line which began with the letter c.
All other words and all other lines are left
unchanged.
These rules are so simple that the easiest way to do this job is with a flag:
        int flag;
%%
^a {flag = 'a'; ECHO;}
^b {flag = 'b'; ECHO;}
^c {flag = 'c'; ECHO;}
\n {flag = 0 ; ECHO;}
magic {
switch (flag)
{
case 'a': printf("first"); break;
case 'b': printf("second"); break;
case 'c': printf("third"); break;
default: ECHO; break;
}
}
should be adequate.
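The flag technique does not depend on Lex at all, so it can be sketched as a plain C function. The names translate and append are hypothetical, and the function writes into a caller-supplied buffer instead of standard output so that the result is easy to inspect.

```c
#include <stddef.h>
#include <string.h>

/* Appends n bytes of s to out (capacity cap >= 1), truncating if needed. */
static size_t append(char *out, size_t o, size_t cap, const char *s, size_t n)
{
    if (n > cap - 1 - o) n = cap - 1 - o;
    memcpy(out + o, s, n);
    return o + n;
}

/* Copies in to out, replacing "magic" according to the first letter of the line. */
void translate(const char *in, char *out, size_t cap)
{
    int flag = 0;          /* 'a', 'b', 'c', or 0 for any other line */
    int line_start = 1;
    size_t o = 0;

    while (*in != '\0') {
        if (line_start && (*in == 'a' || *in == 'b' || *in == 'c')) {
            flag = *in;                              /* rules ^a, ^b, ^c */
            o = append(out, o, cap, in++, 1);
            line_start = 0;
        } else if (*in == '\n') {
            flag = 0;                                /* rule \n resets the flag */
            o = append(out, o, cap, in++, 1);
            line_start = 1;
        } else if (strncmp(in, "magic", 5) == 0) {
            const char *rep = flag == 'a' ? "first"
                            : flag == 'b' ? "second"
                            : flag == 'c' ? "third" : "magic";
            o = append(out, o, cap, rep, strlen(rep));
            in += 5;
            line_start = 0;
        } else {
            o = append(out, o, cap, in++, 1);        /* default: copy one character */
            line_start = 0;
        }
    }
    out[o] = '\0';
}
```

A line such as a magic becomes a first, while a line that starts with any other letter keeps the word magic.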
To handle the same problem with start conditions, each start condition must be
introduced to Lex in the
definitions section with a line reading
%Start name1 name2 ...

where the conditions may be named in any order. The word Start may be abbreviated to s or
S. The conditions
may be referenced at the head of a rule with the <> brackets:
<name1>expression
is a rule which is only recognized when Lex is in the start condition name1. To enter a start
condition, execute
the action statement
BEGIN name1;
which changes the start condition to name1. To resume the normal state,
BEGIN 0;
resets the initial condition of the Lex automaton interpreter. A rule may be active in several
start conditions:
<name1,name2,name3> is a legal prefix. Any rule not beginning with the <> prefix operator is
always active.
The same example as before can be written:
%START AA BB CC
%%
^a {ECHO; BEGIN AA;}
^b {ECHO; BEGIN BB;}
^c {ECHO; BEGIN CC;}
\n {ECHO; BEGIN 0;}
<AA>magic printf("first");
<BB>magic printf("second");
<CC>magic printf("third");
The general form of a Lex source file is:
{definitions}
%%
{rules}
%%
{user subroutines}
The definitions section contains a combination of
1) Definitions, in the form ``name space translation''.
2) Included code, in the form ``space code''.
3) Included code, in the form

%{
code
%}
4) Start conditions, given in the form
%S name1 name2 ...
5) Character set tables, in the form
%T
number space character-string
...
%T
6) Changes to internal array sizes, in the form
%x nnn
where nnn is a decimal integer representing an array size and x selects the parameter as
follows:
Letter    Parameter
p         positions
n         states
e         tree nodes
a         transitions
k         packed character classes
o         output array size
Lines in the rules section have the form ``expression action'' where the action may be continued
on succeeding lines by using braces to delimit it.
Regular expressions in Lex use the following operators:
x         the character "x"
"x"       an "x", even if x is an operator.
\x        an "x", even if x is an operator.
[xy]      the character x or y.
[x-z]     the characters x, y or z.
[^x]      any character but x.
.         any character but newline.
^x        an x at the beginning of a line.
<y>x      an x when Lex is in start condition y.
x$        an x at the end of a line.
x?        an optional x.
x*        0,1,2, ... instances of x.
x+        1,2,3, ... instances of x.
x|y       an x or a y.
(x)       an x.
x/y       an x but only if followed by y.
{xx}      the translation of xx from the definitions section.
x{m,n}    m through n occurrences of x
13. Caveats and Bugs.
There are pathological expressions which produce exponential growth of the tables
when converted to
deterministic machines; fortunately, they are rare.
REJECT does not rescan the input; instead it remembers the results of the previous scan. This
means that if a
rule with trailing context is found, and REJECT executed, the user must not have used
unput to change the characters forthcoming from the input stream. This is the only restriction
on the user's ability to manipulate the
not-yet-processed input.
REFERENCES:
[4] A. V. Aho and M. J. Corasick, Efficient String Matching: An Aid to Bibliographic
Search, Comm. ACM 18, 333-340 (1975).
[5] B. W. Kernighan, D. M. Ritchie and K. L. Thompson, QED Text Editor,
Computing Science Technical Report No. 5, 1972, Bell Laboratories, Murray Hill,
NJ 07974.
[6] D. M. Ritchie, private communication. See also M. E. Lesk, The Portable C Library,
Computing Science Technical Report No. 31, Bell Laboratories, Murray Hill, NJ
07974.
Ex. No:3
Implementation Of Lexical Analysis Using LEX Tool
AIM:
To write a C Program to implement a Lexical analyser using LEX Tool.
DESCRIPTION :
Lex is a tool for generating scanners. Scanners are programs that recognize lexical
patterns in text. These lexical patterns (or regular expressions) are defined in a particular
syntax.
 Operating System - Linux and Windows 2000/XP/NT
 TURBO C
 Flex windows(LEX TOOL)
OBJECTIVE :
The main objective is to use Lex, a tool that converts input text into a series of
tokens.
SYNTAXES & KEYWORDS:

 identifier [a-zA-Z][a-zA-Z0-9]*
The shorthand character range construction '[x-y]' matches any of the characters
between (and including) x and y. For example, [a-c] means the same as a|b|c, and [a-cA-C]
means the same as a|b|c|A|B|C.
 A line beginning with # is assigned as “PREPROCESSOR DIRECTIVE”.
 If the statement begins with '/*' and ends with '*/', then it is assigned as “COMMENT”.
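The identifier pattern [a-zA-Z][a-zA-Z0-9]* can also be checked by hand in C, which is a useful way to confirm what the Lex rule will accept. The function match_identifier is a hypothetical helper; it assumes the default "C" locale, where isalpha and isalnum accept only the ASCII letters and digits.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the longest prefix of s that
   matches [a-zA-Z][a-zA-Z0-9]*, or 0 if s does not start with a letter. */
size_t match_identifier(const char *s)
{
    size_t n = 1;

    if (!isalpha((unsigned char)s[0]))
        return 0;                       /* first character must be a letter */
    while (isalnum((unsigned char)s[n]))
        n++;                            /* then any number of letters or digits */
    return n;
}
```

For the text count1=5 the helper reports 6 (the lexeme count1), and for 9lives it reports 0 because the first character is a digit.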
ALGORITHM:
Step 1: Start the program.
Step 2: Declare all the variables and file pointers.
Step 3: Display the input program.
Step 4: Separate the keywords in the program and display them.
Step 5: Display the header files of the input program.
Step 6: Separate the operators of the input program and display them.
Step 7: Print the punctuation marks.
Step 8: Print the constants that are present in the input program.
Step 9: Print the identifiers of the input program.
Step 10: Stop the program.
PROGRAM:
/* program name is lexp.l */

%{
/* program to recognize a c program */
#include <stdio.h>
#include <stdlib.h>
%}
identifier [a-zA-Z][a-zA-Z0-9]*
%%
#.* { printf("\n%s is a PREPROCESSOR DIRECTIVE",yytext);}
int |
float |
char |
double |
while |
for |
do |
if |
break |
continue |
void |
switch |
case |
long |
struct |
const |
typedef |
return |
else |
goto {printf("\n\t%s is a KEYWORD",yytext);}
{identifier}\( { printf("\n\nFUNCTION\n\t%s",yytext);}
\{ { printf("\n BLOCK BEGINS");}
\} { printf("\n BLOCK ENDS");}
{identifier}(\[[0-9]*\])? { printf("\n %s IDENTIFIER",yytext);}
\".*\" { printf("\n\t%s is a STRING",yytext);}
[0-9]+ { printf("\n\t%s is a NUMBER",yytext);}
\)(\;)? { printf("\n\t");ECHO;printf("\n");}
\( ECHO;
= { printf("\n\t%s is an ASSIGNMENT OPERATOR",yytext);}
\<= |
\>= |
\< |
== |
\> { printf("\n\t%s is a RELATIONAL OPERATOR",yytext);}
%%
int main(int argc,char **argv)
{
if (argc > 1)
{
FILE *file;
file = fopen(argv[1],"r");
if(!file)
{
printf("could not open %s \n",argv[1]);
exit(0);
}
yyin = file;
}
yylex();
printf("\n\n");
return 0;
}
/* Return 1 at end of input so that yylex() stops instead of waiting for more input. */
int yywrap()
{
return 1;
}
INPUT FILE:(var.c)
#include<stdio.h>
#include<conio.h>
void main()
{
int a,b,c;
a=10; b=5;
c=a+b;
printf("The sum is %d",c);
getch();
}
OUTPUT:
ADVANTAGES :
 It quickly generates solutions to problems that involve lexical analysis, that is, the
recognition of strings of characters that satisfy certain characteristics.
 This enables to solve a wide class of problems drawn from text processing, code
enciphering, compiler writing, and other areas.
LIMITATIONS :
 We can easily understand some of lex's limitations. For example, lex cannot be used
to recognize nested structures such as parentheses. Nested structures are handled by
incorporating a stack.
 Whenever we encounter a “(” we push it on the stack. When a “)” is encountered we
match it with the top of the stack and pop the stack. However, lex only has states and
transitions between states, so it has no stack with which to do this.
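The stack technique described in these points can be sketched in a few lines of C. The function brackets_balanced and the fixed MAX_DEPTH limit are assumptions of this sketch; it also handles square brackets and braces, which goes slightly beyond the parentheses mentioned above.

```c
#define MAX_DEPTH 256

/* Hypothetical helper: returns 1 if every bracket in s is properly nested
   and closed, 0 otherwise (including nesting deeper than MAX_DEPTH). */
int brackets_balanced(const char *s)
{
    char stack[MAX_DEPTH];
    int top = 0;

    for (; *s != '\0'; s++) {
        char c = *s;

        if (c == '(' || c == '[' || c == '{') {
            if (top == MAX_DEPTH)
                return 0;
            stack[top++] = c;                     /* push the opener */
        } else if (c == ')' || c == ']' || c == '}') {
            char want = (c == ')') ? '(' : (c == ']') ? '[' : '{';

            if (top == 0 || stack[--top] != want) /* pop and compare */
                return 0;
        }
    }
    return top == 0;                              /* nothing left open */
}
```

A parser generated by Yacc keeps a comparable stack automatically, which is why nested constructs are usually left to the grammar rather than to the scanner.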
APPLICATIONS
 Lex is used to create a sample lexical analyzer for c programming language;

 It can recognize the valid symbols in c programming language including valid
programming constructs.
VIVA QUESTIONS
1. List the various error recovery strategies for a lexical analysis.
Possible error recovery actions are:
 Panic mode recovery
 Deleting an extraneous character
 Inserting a missing character
 Replacing an incorrect character by a correct character
 Transposing two adjacent characters
2.Define patterns/lexeme/tokens?
A token is a sequence of characters that can be treated as a single logical entity. A
pattern is a rule describing the set of input strings for which the same token is produced as
output. A lexeme is a sequence of characters in the source program that is matched by the
pattern for a token.
3.What are the implementations of lexical analyzer?

a)Use a lexical analyzer generator, such as Lex compiler, to produce the lexical
analyzer from a regular expression based specification
b)Write the lexical analyzer in a conventional systems-programming language using
the I/O facilities of that language to read the input.
c)Write the lexical analyzer in assembly language and explicitly manage the reading
of input.
4.Define regular expression?

A regular expression is built up out of simpler regular expressions using a set of defining
rules. Each regular expression 'r' denotes a language L(r). The defining rules specify how L(r)
is formed by combining in various ways the languages denoted by the subexpressions of r.
RESULT:
Thus the program to implement a lexical analyser using the LEX tool has been
executed and the required output is obtained.
Generate YACC specification for a few syntactic categories

Ex.No:4(a) Program to recognize a valid arithmetic expression that uses operator +, -, * and /.
AIM:
To create a program to recognize a valid arithmetic expression that uses the operators +, -, *
and /.
 TURBO C
OBJECTIVE
 Be proficient in writing grammars to specify syntax

 Understand the theories behind different parsing strategies-their strengths &
limitations
 Understand how the generation of parser can be automated
 Be able to use YACC to generate parsers
SYNTAXES & KEYWORDS
 The symbols have higher precedence than symbols declared before in a %left, %right
or %nonassoc line.
 They have lower precedence than symbols declared after in a %left, %right or
%nonassoc line. The symbols are declared to associate to the left (%left), to the right
(%right), or to be non-associative (%nonassoc).
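To see what the precedence and associativity declarations achieve, the same behaviour can be written by hand with precedence climbing. This sketch is a swapped-in technique for illustration, not what Yacc generates (Yacc builds an LALR table); evaluate, parse_expr and the other names are hypothetical. The two levels correspond to %left '+' '-' followed by %left '*' '/'.

```c
#include <ctype.h>

static const char *p;                    /* read position in the expression */

/* Precedence levels: a later %left line means a higher number here. */
static int prec(char op)
{
    if (op == '+' || op == '-') return 1;
    if (op == '*' || op == '/') return 2;
    return 0;                            /* not a binary operator */
}

static void skip_blanks(void) { while (*p == ' ' || *p == '\t') p++; }

static int parse_expr(int min_prec, int *ok);

/* primary: NUM | '(' expr ')' */
static int parse_primary(int *ok)
{
    int v = 0;

    skip_blanks();
    if (*p == '(') {
        p++;
        v = parse_expr(1, ok);
        skip_blanks();
        if (*p == ')') p++; else *ok = 0;
        return v;
    }
    if (!isdigit((unsigned char)*p)) { *ok = 0; return 0; }
    while (isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    return v;
}

static int parse_expr(int min_prec, int *ok)
{
    int lhs = parse_primary(ok);

    for (;;) {
        char op;
        int pr, rhs;

        skip_blanks();
        op = *p;
        pr = prec(op);
        if (!*ok || pr == 0 || pr < min_prec)
            return lhs;
        p++;
        rhs = parse_expr(pr + 1, ok);    /* pr + 1 gives left associativity */
        if (!*ok) return 0;
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': if (rhs == 0) { *ok = 0; return 0; } lhs /= rhs; break;
        }
    }
}

/* Returns 1 and stores the value if text is a valid expression, else 0. */
int evaluate(const char *text, int *result)
{
    int ok = 1, v;

    p = text;
    v = parse_expr(1, &ok);
    skip_blanks();
    if (!ok || *p != '\0') return 0;
    *result = v;
    return 1;
}
```

Because * and / sit on the higher level, 2+3*4 evaluates to 14, and because both levels are left-associative, 8-3-2 evaluates to 3.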
ALGORITHM
Step 1:Include the necessary header files.
Step 2:Declare the semantic rule for the identifier and number.
Step 3:If the statement begins with main(), if, else or while, then return MAIN, IF, ELSE and
WHILE respectively. If the variables are declared as int, float or char, then return VAR and
NUM.
Step 4:Include the necessary header files. Initialize the error number to 0 and declare the line
number as an integer.
Step 5:Declare the necessary tokens for the grammar.
PROGRAM:
Lex Part:
%{
#include<stdio.h>
#include"y.tab.h"
extern int yylval;
%}
%%
[0-9]+ {
yylval=atoi(yytext);
return NUM;
}
[ \t] ;
\n return 0;
. return yytext[0];
%%
Yacc Part:
%{
#include<stdio.h>
int yylex();
int yyerror();
%}
%token NUM
%left '+' '-'
%left '*' '/'
%left '(' ')'
%%
expr: e {
printf("result:%d\n",$1);
return 0;
}
;
e:e'+'e {$$=$1+$3;}
|e'-'e {$$=$1-$3;}
|e'*'e {$$=$1*$3;}
|e'/'e {$$=$1/$3;}
|'('e')' {$$=$2;}
| NUM {$$=$1;}
%%
#include<stdlib.h>
int main()
{
printf("\n enter the arithematic expression:\n");
yyparse();
printf("\nvalid expression\n");
return 0;
}
int yyerror()
{
printf("\n invalid expression\n");
exit(0);
}
Sample Input / Output:

$ flex prog5.l
$ bison -dy prog5.y
$ gcc lex.yy.c y.tab.c
$ a.exe
enter the arithematic expression: 5+6
result:11
valid expression
ADVANTAGES
Parser using YACC AND LEX tool can be used to check arithmetic expression for its
correctness.
VIVA QUESTIONS
1. Explain yacc and lex tool
Lex and Yacc generate program fragments that discover the structure of source text.
This task is decomposed into subtasks:
1. Split the source file into tokens (Lex).
2. Find the hierarchical structure of the program (Yacc)
2. Define flex, a fast scanner generator

Flex is a tool for generating scanners: programs which recognize lexical patterns in
text. flex reads the given input files, or its standard input if no file names are given, for a
description of a scanner to generate.
3. Define Bison, The YACC-compatible Parser Generator
Bison is a general-purpose parser generator that converts a grammar description for an
LALR(1) context-free grammar into a C program to parse that grammar
Result:
Thus the program to recognize a valid arithmetic expression using YACC has been
executed and verified successfully.
EX.No:4b Program to recognize a valid variable which starts with a letter followed by any
number of letters or digits
AIM:
To create a program to recognize a valid variable which starts with a letter followed by
any number of letters or digits.
 TURBO C
OBJECTIVE
 Be proficient in writing grammars to specify syntax

 Understand the theories behind different parsing strategies-their strengths &
limitations
 Understand how the generation of parser can be automated
 Be able to use YACC to generate parsers
SYNTAXES & KEYWORDS
 The symbols have higher precedence than symbols declared before in a %left, %right
or %nonassoc line.
 They have lower precedence than symbols declared after in a %left, %right or
%nonassoc line. The symbols are declared to associate to the left (%left), to the right
(%right), or to be non-associative (%nonassoc).
ALGORITHM
Step 1.Include the necessary header files.
Step 2.Declare the semantic rule for the identifier and number.
Step 3.If the statement begins with main(), if, else or while, then return MAIN, IF, ELSE and
WHILE respectively. If the variables are declared as int, float or char, then return VAR and NUM.
Step 4.Include the necessary header files. Initialize the error number to 0 and declare the line
number as an integer.
Step 5.Declare the necessary tokens for the grammar.
PROGRAM
Lex file
%option noyywrap
%{
#include "y.tab.h"
%}
%%
[a-zA-Z] return letter;
[0-9] return digit;
. return yytext[0];
\n return 0;
%%
YACC file
%{
#include<stdio.h>
int valid=1;
%}
%token digit letter
%%
start : letter s
s : letter s
| digit s
|
;
%%
int yyerror()
{
printf("\nIts not a identifier!\n");
valid=0;
return 0;
}
int main()
{
printf("\nEnter a name to tested for identifier ");
yyparse();
if(valid)
{
printf("\nIt is a identifier!\n");
}
}
ADVANTAGES
 In this program, single back end is developed for single source language.
 It also has the advantage of allowing the use of a single back end for multiple source
languages, and similarly allows the use of different back ends for different targets.
APPLICATIONS
 This program can be used to develop lexical analyzer and parser for a compiler using
C programming language.
VIVA QUESTIONS
1.What is LEX?
Lex is a computer program that generates lexical analyzers ("scanners" or "lexers").
2. Write the Structure of a Lex file.

The structure of a Lex file is divided into three sections, separated by lines that
contain only two percent signs, as follows:
Definition section
%%
Rules section
%%
C code section
3. What is YACC?
Yacc is a computer program used to generate parsers.
Result:
Thus the program to recognize a valid variable using YACC has been executed and
verified successfully.
Ex.No:4(c ) Implementation of Calculator using LEX and YACC

AIM:
To create a program to implement a calculator using LEX and YACC.
DESCRIPTION
In this program, two classical tools for compilers, Lex and Yacc, are used to create a
simple desk-calculator program that performs addition, subtraction, multiplication, and
division operations.
OBJECTIVE
To write semantic rules to the YACC program and implement a calculator that takes
an expression with digits + and * and computes and prints its values.
SYNTAXES & KEYWORDS

LEX tool
Input to Lex is divided into three sections with %% dividing the sections. This is best
illustrated by example.
YACC tool
%token symbol...symbol
Declare the given symbols as tokens (terminal symbols). These symbols are added as
constant constructors for the token concrete type.
%token <type>symbol...symbol
Declare the given symbols as tokens with an attached attribute of the given type.
%start symbol...symbol
Declare the given symbols as entry points for the grammar.
%type <type>symbol...symbol
%left symbol...symbol
%right symbol...symbol
%nonassoc symbol...symbol
Associate precedences and associativities to the given symbols. All symbols on the same line
are given the same precedence. They have higher precedence than symbols declared before in a
%left, %right or %nonassoc line. They have lower precedence than symbols declared after in a
%left, %right or %nonassoc line. The symbols are declared to associate to the left (%left), to
the right (%right), or to be non-associative (%nonassoc).
%% ……………….
……………………%%
 In this program two classical tools for compilers are used:
o Lex: A Lexical Analyzer Generator
o Yacc: “Yet Another Compiler Compiler” (Parser Generator)
 Lex creates programs that scan tokens one by one.
 Yacc takes a grammar (sentence structure) and generates a parser.
 The first part of the program contains the source code for the Lex tool and the second part
contains the source code for the YACC tool, which groups the tokens logically.
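The %left, %right and %nonassoc declarations described above decide how an expression such as 5*8+3 is grouped. As a rough illustration only, the sketch below evaluates integer expressions by precedence climbing in hand-written C. It is not the code that YACC generates, and every name in it (evaluate, parse_expr, prec_of and so on) is made up for this example.

```c
#include <ctype.h>

static const char *cursor;          /* next unread character of the input */

static int parse_expr(int min_prec);

static void skip_blanks(void)
{
    while (*cursor == ' ' || *cursor == '\t')
        cursor++;
}

/* Precedence level of a binary operator, or 0 if c is not an operator. */
static int prec_of(char c)
{
    if (c == '+' || c == '-') return 1;   /* declared first: lower precedence */
    if (c == '*' || c == '/') return 2;   /* declared later: higher precedence */
    return 0;
}

/* primary : NUM | '(' expr ')' */
static int parse_primary(void)
{
    int value = 0;
    skip_blanks();
    if (*cursor == '(') {
        cursor++;
        value = parse_expr(1);
        skip_blanks();
        if (*cursor == ')')
            cursor++;
        return value;
    }
    while (isdigit((unsigned char)*cursor))
        value = value * 10 + (*cursor++ - '0');
    return value;
}

/* Parses operators whose precedence is at least min_prec. */
static int parse_expr(int min_prec)
{
    int left = parse_primary();
    for (;;) {
        char op;
        int prec, right;
        skip_blanks();
        op = *cursor;
        prec = prec_of(op);
        if (prec == 0 || prec < min_prec)
            return left;
        cursor++;
        /* prec + 1 makes the operator left-associative, like %left */
        right = parse_expr(prec + 1);
        switch (op) {
        case '+': left += right; break;
        case '-': left -= right; break;
        case '*': left *= right; break;
        case '/': if (right != 0) left /= right; break; /* division by zero is skipped in this sketch */
        }
    }
}

/* Evaluates an expression made of integers, + - * / and parentheses. */
int evaluate(const char *text)
{
    cursor = text;
    return parse_expr(1);
}
```

Calling evaluate("5*8+3") groups the multiplication first, and evaluate("10-4-3") groups from the left, which is the behaviour the precedence lines ask YACC to provide.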
ALGORITHM
Step 1. Include the necessary header files.
Step 2. Declare the semantic rule for the identifier and number.
Step 3. If the statement begins with main(), if, else or while, return MAIN, IF, ELSE or
WHILE respectively. If the variables are declared as int, float or char, return VAR and NUM.
Step 4. Include the necessary header files. Initialize the error number to 0 and declare the line number as an integer.
Step 5. Declare the necessary tokens for the grammar.
PROGRAM:
LEX FILE
*********
%{
#include<stdlib.h>
#include"y.tab.h"
#include<math.h>
extern int yylval;
%}
%%
[0-9]+ {yylval=atoi(yytext);return NUM;}
[+] {return '+';}
[-] {return '-';}
[*] {return '*';}
[/] {return '/';}
[()] {return yytext[0];}
[ \t]+ ;
[\n] {return 0;}
%%
YACC FILE
*********
%{
#include<stdio.h>
int yylex(void);
int yyerror(const char *msg);
%}
%token NUM
%left '-' '+'
%left '*' '/'
%%
start: exp {printf("%d\n",$1);}
;
exp:exp'+'exp {$$=$1+$3;}
|exp'-'exp {$$=$1-$3;}
|exp'*'exp {$$=$1*$3;}
|exp'/'exp
{
if($3==0)
{
yyerror("division by zero");
YYABORT; /* stop parsing instead of dividing by zero */
}
else
{
$$=$1/$3;
}
}
|'('exp')' {$$=$2;}
|NUM {$$=$1;}
;
%%
int main()
{
printf("Enter the Expr. in terms of integers\n");
if(yyparse()==0)
printf("Success\n");
return 0;
}
int yywrap(){return 1;}
int yyerror(const char *msg)
{
printf("Error: %s\n",msg);
return 0;
}
TO COMPILE & RUN-

write the following linux command & press enter:
lex calci.l
yacc -d calci.y
cc lex.yy.c y.tab.c
./a.out
Now it will print:

Enter the Expr. in terms of integers
5*8+3
43
Success
Result:
Thus the program for implementation of calculator using YACC has been executed
and verified successfully.
Ex.No:5 Convert the BNF rules into YACC form and write code to generate
Abstract Syntax tree
AIM :
To write a program to convert the BNF rules into YACC form and write code to
generate abstract syntax tree.
 TURBO C
DEFINITION:
ABSTRACT SYNTAX TREE:
An abstract syntax tree is often the output of a parser (or the "parse stage" of a
compiler), and forms the input to semantic analysis and code generation (this assumes
a phased compiler; many compilers interleave the phases in order to conserve
memory).
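As a small illustration of the definition above, the following C sketch builds an abstract syntax tree for an arithmetic expression and then walks it. The node layout and the names AstNode, make_num, make_bin, eval_ast and free_ast are assumptions made for this example; they do not appear in the program of this exercise.

```c
#include <stdlib.h>

/* A leaf holds a number; an inner node holds an operator and two subtrees. */
typedef struct AstNode {
    char op;                  /* '+', '-', '*', '/' or 0 for a number leaf */
    int value;                /* used only when op == 0 */
    struct AstNode *left;
    struct AstNode *right;
} AstNode;

AstNode *make_num(int value)
{
    AstNode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->op = 0;
    n->value = value;
    n->left = n->right = NULL;
    return n;
}

AstNode *make_bin(char op, AstNode *left, AstNode *right)
{
    AstNode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->op = op;
    n->value = 0;
    n->left = left;
    n->right = right;
    return n;
}

/* A later phase (here: evaluation) works on the tree, not on the text. */
int eval_ast(const AstNode *n)
{
    int l, r;
    if (n->op == 0)
        return n->value;
    l = eval_ast(n->left);
    r = eval_ast(n->right);
    switch (n->op) {
    case '+': return l + r;
    case '-': return l - r;
    case '*': return l * r;
    case '/': return r != 0 ? l / r : 0;   /* division by zero yields 0 in this sketch */
    }
    return 0;
}

void free_ast(AstNode *n)
{
    if (n == NULL)
        return;
    free_ast(n->left);
    free_ast(n->right);
    free(n);
}
```

In a YACC grammar, an action along the lines of $$ = make_bin('+', $1, $3) could build such a tree bottom-up, provided the semantic value type is declared as a node pointer.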
BNF:
 BNF stands for either Backus-Naur Form or Backus Normal Form
 BNF is a meta language used to describe the grammar of a programming
language
ALGORITHM:
Step1: Start
Step2: Declare the declarations as a header file
{#include<ctype.h>}
Step3: Declare the token digit
Step4: Define the translation rules like line, expr, term, factor
line: expr '\n' {printf("\n %d \n",$1);}
expr: expr '+' term {$$=$1+$3;}
term: term '*' factor {$$=$1*$3;}
factor: '(' expr ')' {$$=$2;}
%%
Step5: Define the supporting C routines
Step6: Stop
PROGRAM:
<int.l>
%{
#include"y.tab.h"
#include<stdio.h>
#include<string.h>
int LineNo=1;
%}
identifier [a-zA-Z][_a-zA-Z0-9]*
number [0-9]+|([0-9]*\.[0-9]+)
%%
main return MAIN;
if return IF;
else return ELSE;
while return WHILE;
int |
char |
float return TYPE;
{identifier} {strcpy(yylval.var,yytext);
return VAR;}
{number} {strcpy(yylval.var,yytext);
return NUM;}
\< |
\> |
\>= |
\<= |
== {strcpy(yylval.var,yytext);
return RELOP;}
[ \t] ;
\n LineNo++;
. return yytext[0];
%%
<int.y>
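%{
#include<string.h>
#include<stdio.h>
#include<stdlib.h> /* exit() */
struct quad
{
char op[5];
char arg1[10];
char arg2[10];
char result[10];
}QUAD[30];
struct stack
{
int items[100];
int top;
}stk={{0},-1}; /* top=-1 marks an empty stack, matching the test in pop() */
int Index=0,tIndex=0,StNo,Ind,tInd;
extern int LineNo;
int yylex(void);
void push(int data);
int pop(void);
void AddQuadruple(char op[5],char arg1[10],char arg2[10],char result[10]);
%}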
%union
{
char var[10];
}
%token <var> NUM VAR RELOP
%token MAIN IF ELSE WHILE TYPE
%type <var> EXPR ASSIGNMENT CONDITION IFST ELSEST WHILELOOP
%left '-' '+'
%left '*' '/'
%%
PROGRAM : MAIN BLOCK
;
BLOCK: '{' CODE '}'
;
CODE: BLOCK
| STATEMENT CODE
| STATEMENT
;
STATEMENT: DESCT ';'
| ASSIGNMENT ';'
| CONDST
| WHILEST
;
DESCT: TYPE VARLIST
;
VARLIST: VAR ',' VARLIST
| VAR
;
ASSIGNMENT: VAR '=' EXPR{
strcpy(QUAD[Index].op,"=");
strcpy(QUAD[Index].arg1,$3);
strcpy(QUAD[Index].arg2,"");
strcpy(QUAD[Index].result,$1);
strcpy($$,QUAD[Index++].result);
}
;
EXPR: EXPR '+' EXPR {AddQuadruple("+",$1,$3,$$);}
| EXPR '-' EXPR {AddQuadruple("-",$1,$3,$$);}
| EXPR '*' EXPR {AddQuadruple("*",$1,$3,$$);}
| EXPR '/' EXPR {AddQuadruple("/",$1,$3,$$);}
| '-' EXPR {AddQuadruple("UMIN",$2,"",$$);}
| '(' EXPR ')' {strcpy($$,$2);}
| VAR
| NUM
;
CONDST: IFST{
Ind=pop();
sprintf(QUAD[Ind].result,"%d",Index);
Ind=pop();
}
| IFST ELSEST
;
IFST: IF '(' CONDITION ')' {
strcpy(QUAD[Index].op,"==");
strcpy(QUAD[Index].arg2,"FALSE");
strcpy(QUAD[Index].result,"-1");
push(Index);
Index++;
}
BLOCK {
strcpy(QUAD[Index].op,"GOTO");
push(Index);
Index++;
};
ELSEST: ELSE{
tInd=pop();
Ind=pop();
push(tInd);
}
BLOCK{
Ind=pop();
};
CONDITION: VAR RELOP VAR {AddQuadruple($2,$1,$3,$$);
StNo=Index-1;
}
| VAR
| NUM
;
WHILEST: WHILELOOP{
Ind=pop();
sprintf(QUAD[Ind].result,"%d",StNo);
Ind=pop();
}
;
WHILELOOP: WHILE '(' CONDITION ')' {
strcpy(QUAD[Index].op,"==");
strcpy(QUAD[Index].arg2,"FALSE");
push(Index);
Index++;
}
BLOCK {
strcpy(QUAD[Index].op,"GOTO");
push(Index);
Index++;
}
;
%%
extern FILE *yyin;
int main(int argc,char *argv[])
{
FILE *fp;
int i;
if(argc>1)
{
fp=fopen(argv[1],"r");
if(!fp)
{
printf("\n File not found");
exit(0);
}
yyin=fp;}
yyparse();
printf("\n\n\t\t ----------------------------"
"\n\t\t Pos Operator Arg1 Arg2 Result"
"\n\t\t --------------------");
for(i=0;i<Index;i++)
{
printf("\n\t\t %d\t %s\t %s\t %s\t %s",i,QUAD[i].op,QUAD[i].arg1,QUAD[i].arg2,QUAD[i].result);
}
printf("\n\t\t -----------------------");
printf("\n\n");
return 0;
}
void push(int data)
{
stk.top++;
if(stk.top==100)
{printf("\n Stack overflow\n");
exit(0);
}
stk.items[stk.top]=data;}
int pop()
{
int data;
if(stk.top==-1)
{
printf("\n Stack underflow\n");
exit(0);}
data=stk.items[stk.top--];
return data;}
void AddQuadruple(char op[5],char arg1[10],char arg2[10],char result[10])
{
strcpy(QUAD[Index].op,op);
strcpy(QUAD[Index].arg1,arg1);
strcpy(QUAD[Index].arg2,arg2);
sprintf(QUAD[Index].result,"t%d",tIndex++);
strcpy(result,QUAD[Index++].result);
}
int yyerror(const char *msg)
{
printf("\n Error on line no:%d",LineNo);
return 0;
}
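The quadruple bookkeeping done by AddQuadruple can also be tried outside of YACC. The sketch below is a standalone variant written for illustration: the names add_quadruple, get_quad, quads, quad_count and temp_count are assumptions, and bounded copies replace the strcpy calls of the program above.

```c
#include <stdio.h>
#include <string.h>

#define MAX_QUADS 30

struct quad {
    char op[5];
    char arg1[10];
    char arg2[10];
    char result[10];
};

static struct quad quads[MAX_QUADS];
static int quad_count = 0;   /* next free slot in quads[] */
static int temp_count = 0;   /* number of temporaries t0, t1, ... created so far */

/* Appends (op, arg1, arg2, tN) and copies the name of the new temporary
 * into result. Returns the index of the new quadruple, or -1 if the table is full. */
int add_quadruple(const char *op, const char *arg1, const char *arg2, char result[10])
{
    struct quad *q;
    if (quad_count >= MAX_QUADS)
        return -1;
    q = &quads[quad_count];
    snprintf(q->op, sizeof q->op, "%s", op);
    snprintf(q->arg1, sizeof q->arg1, "%s", arg1);
    snprintf(q->arg2, sizeof q->arg2, "%s", arg2);
    snprintf(q->result, sizeof q->result, "t%d", temp_count++);
    strcpy(result, q->result);
    return quad_count++;
}

/* Read-only access to a stored quadruple, or NULL for an invalid index. */
const struct quad *get_quad(int index)
{
    return (index >= 0 && index < quad_count) ? &quads[index] : NULL;
}
```

For the expression b + c * d, the multiplication is added first and yields t0, and the addition then uses t0 as its second argument and yields t1.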
Input:
$vi test.c
main()
{
int a,b,c;
if(a<b)
{a=a+b;}
while(a<b)
{a=a+b;}
if(a<=b)
{c=a-b;}
else
{c=a+b;}}
Output:
$lex int.l
$yacc -d int.y
$gcc lex.yy.c y.tab.c -ll -lm
$./a.out test.c
OUTPUT:
LIMITATIONS OF BNF:
 No easy way to impose length limitations, such as maximum length of
variable names
 No easy way to describe ranges, such as 1 to 31
 No way at all to impose distributed requirements, such as, a variable
must be declared before it is used
 Describes only syntax, not semantics
 Nothing clearly better has been devised
VIVA QUESTIONS:
1. What is an Abstract Syntax Tree?
An abstract syntax tree is often the output of a parser (or the "parse stage" of a
compiler), and forms the input to semantic analysis and code generation (this assumes
a phased compiler; many compilers interleave the phases in order to conserve
memory).
2. What are BNF Rules?
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more
sequences of symbols; more sequences are separated by the vertical bar, '|', indicating
a choice, the whole being a possible substitution for the symbol on the left. Symbols
that never appear on a left side are terminals. On the other hand, symbols that appear
on a left side are non-terminals and are always enclosed between the pair <>.
3. What is DAG representation?
A directed acyclic graph (DAG) is a directed graph that contains no cycles. A
rooted tree is a special kind of DAG and a DAG is a special kind of directed graph.
RESULT:
Thus the program to convert the BNF rules into YACC form and write code to
generate abstract syntax tree has been verified and executed successfully.
EX NO:10 Implementation Of Back End Of Compiler

AIM:
To Write a C program for implementation of back end of compiler.
 TURBO C
DESCRIPTION:
The back end is responsible for translating the intermediate representation of
the source code from the middle end into assembly code.
OBJECTIVE:
To implement the back end of the compiler, which takes the three-address code
and produces 8086 assembly language instructions that can be assembled and run
using an 8086 assembler.
SYNTAXES & KEYWORDS

fopen(kk,"r");
This statement opens a file (kk) in read mode.
printf("\t\tMOV %c,R%d\n\t",ip[i+k],j);
if(ip[i+1]=='+')
printf("\t\tADD");
// If the operator is an addition (+), display the assembly code "MOV" "ADD" and
// store the result in the corresponding register R
else
printf("\t\tSUB");
// else display the assembly code "MOV" "SUB" and store the result in the
// corresponding register R
The first part of the program opens a file in read mode, reads its content line by
line and gets the first three-address code. It then checks the arithmetic operator. If
the operator is an addition (+), it displays the assembly code "ADD" and stores the
result in the corresponding register R; if the operator is a subtraction (-), it displays the
assembly code "SUB" and stores the result in the corresponding register.
HOW TO EXECUTE THE PROGRAM

The program can be executed by using the Turbo C compiler
(Alt+F9 … for compilation, Ctrl+F9 … to Run)
ALGORITHM:
Step 1:The input for the back end of the compiler is the intermediate
code generated by the front end of the compiler.
Step 2:The input file (IN.TXT) is provided in read mode.
Step 3:The output file (TARGET.TXT) is created by the program in write mode.
Step 4:Each and every intermediate code in the input file is converted to
its equivalent target code by the back end of the compiler.
Step 5:The output is stored in the TARGET.TXT file in the form of
assembly language.
Step 6:Stop the program.
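Step 4 of the algorithm can be sketched for a single binary three-address statement. The function below follows the four-instruction pattern shown in the sample TARGET.TXT of this exercise (load both operands, apply the operator, store the result). The name emit_binary and its parameters are assumptions made for this illustration, not part of back.c.

```c
#include <stdio.h>

/* Translates the three-address statement "op x y result" into
 *   MOV R0,x / MOV R1,y / <OP> R0 R1 / MOV result,R0
 * and writes the text into out. Returns 0 on success and -1 if the
 * operator is not one of + - * / (hypothetical helper). */
int emit_binary(char op, const char *x, const char *y, const char *result,
                char *out, size_t out_size)
{
    const char *mnemonic;

    switch (op) {
    case '+': mnemonic = "ADD"; break;
    case '-': mnemonic = "SUB"; break;
    case '*': mnemonic = "MUL"; break;
    case '/': mnemonic = "DIV"; break;
    default:  return -1;
    }
    snprintf(out, out_size,
             "MOV R0,%s\nMOV R1,%s\n%s R0 R1\nMOV %s,R0\n",
             x, y, mnemonic, result);
    return 0;
}
```

For the input line "* x y t1" this produces the same four instructions that open the sample output.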

PROGRAM:

IMPLEMENTATION OF BACK END OF COMPILER//file name is back.c
#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
#include<string.h>
int label[20];
int no=0;
int main()
{
FILE *fp1,*fp2;
int check_label(int n);
char fname[10],op[10],ch;
char operand1[8],operand2[8],result[8];int i=0;
clrscr();
printf("\n\nEnter filename of the intermediate code:");
scanf("%9s",fname); /* fname is already a pointer to the buffer */
fp1=fopen(fname,"r");
fp2=fopen("target.txt","w");
if(fp1==NULL||fp2==NULL)
{
printf("\nError Opening the File.");
getch();
exit(0);
}
while(fscanf(fp1,"%9s",op)==1) /* stop as soon as no further opcode can be read */
{
fprintf(fp2,"\n");
i++;
if(check_label(i))
{
fprintf(fp2,"\nlabel#%d:",i);
}
if(strcmp(op,"print")==0)
{
fscanf(fp1,"%s",result);
fprintf(fp2,"\n\tOUT%s",result);
}
if(strcmp(op,"goto")==0)
{
fscanf(fp1,"%s",operand2);
fprintf(fp2,"\n\t JMP label#%s",operand2);
label[no++]=atoi(operand2);
}
if(strcmp(op,"[]=")==0)
{
fscanf(fp1,"%s%s%s",operand1,operand2,result);
fprintf(fp2,"\n\tSTORE%s[%s],%s",operand1,operand2,result);
}
if(strcmp(op,"uminus")==0){
fscanf(fp1,"%s%s",operand1,result);
fprintf(fp2,"\n\tMOV R1,-%s",operand1);
fprintf(fp2,"\n\tMOV %s,R1",result);
}
switch(op[0])
{
case'*':
case'+':
case'-':
case'/':
case'%':
/* binary operators: read both operands and the result name first */
fscanf(fp1,"%s%s%s",operand1,operand2,result);
fprintf(fp2,"\n\t MOV R0,%s",operand1);
fprintf(fp2,"\n\t MOV R1,%s",operand2);
if(op[0]=='*') fprintf(fp2,"\n\t MUL R0 R1");
else if(op[0]=='+') fprintf(fp2,"\n\t ADD R0 R1");
else if(op[0]=='-') fprintf(fp2,"\n\t SUB R0 R1");
else fprintf(fp2,"\n\t DIV R0 R1");
fprintf(fp2,"\n\t MOV %s,R0",result);
break;
case'=':
fscanf(fp1,"%s%s",operand1,result);
fprintf(fp2,"\n\t MOV %s,%s",result,operand1);
break;
case'>':
fscanf(fp1,"%s%s%s",operand1,operand2,result);
fprintf(fp2,"\n\t JGT %s,%s label#%s",operand1,operand2,result);
label[no++]=atoi(result);
break;
case'<':
fscanf(fp1,"%s%s%s",operand1,operand2,result);
fprintf(fp2,"\n\t JLT %s,%s label#%s",operand1,operand2,result);
label[no++]=atoi(result);
break;
}
}
fclose(fp2);fclose(fp1);
fp2=fopen("target.txt","r");
if(fp2==NULL){
printf("\nError Opening the File");
getch();
exit(0);}
{
int c; /* int, so that EOF can be told apart from a valid character */
while((c=fgetc(fp2))!=EOF)
putchar(c);
}
fclose(fp2);
getch();
return 0;
}
int check_label(int k)
{
int i;for(i=0;i<no;i++)
{
if(k==label[i])
return 1;
}
return 0;
}
INPUT: (IN.TXT)
[]=a i 1
* x y t1
+ t1 z t2
> t2 num 6
goto 8
+ x x x
+ y y y
print x
= y z
print z
OUTPUT: (TARGET.TXT)
MOV R0,x
MOV R1,y
MUL R0 R1
MOV t1,R0
MOV R0,t1
MOV R1,z
ADD R0 R1
MOV t2,R0
JGT t2,num label#6
JMP label#8
label#8:
MOV R0,x
MOV R1,x
ADD R0 R1
MOV x,R0
MOV R0,y
MOV R1,y
ADD R0 R1
MOV y,R0
OUTx
MOV z,y
OUTz
ADVANTAGES AND LIMITATIONS;

In this program, single back end is developed for single source language. It also
has the advantage of allowing the use of a single back end for multiple source
languages, and similarly allows the use of different back ends for different targets.
APPLICATIONS:
This program can be used to develop a back end of a compiler using C
programming language.
VIVA QUESTIONS:
1. Define three address code.
Three address code is a sequence of statements of the general form x := y op z
where x, y, z are operands and op is an operator. The back end of a compiler includes those
portions that depend on the target machine and generally those portions do not depend
on the source language, just the intermediate language. These include
 Code optimization
 Code generation,
along with error handling and symbol- table operations.
2. Write short notes on YACC.
YACC is an automatic tool for generating the parser program. YACC stands for
Yet Another Compiler Compiler and is a utility available on UNIX. YACC is
basically an LALR parser generator. It can report conflicts or
ambiguities in the form of error messages.
3.What are the issues in the design of code generator? (AU MAY/JUN 2009)
Input to the generator
Target programs
Memory management
Instruction selection
Register allocation
Choice of evaluation order
Approaches to code generation.
4. Define three-address code.
Three-address code is a sequence of statements of the form x := y op z, where
x, y, z are names, constants, or compiler-generated temporaries, and op stands for any type
of operator. Since a statement involves not more than three references it is called a
three-address statement, and hence a sequence of such statements is called
three-address code.
RESULT:
Thus the program has been executed and implemented the back end of the
compiler.
EX.NO:11 Implementation Of Code Optimization Techniques

AIM:

To write a C program to implement code optimization techniques.
 TURBO C
DESCRIPTION:
OPTIMIZATION:
Optimization is a program transformation technique, which tries to improve the
code by making it consume less resources (i.e. CPU, Memory) and deliver high speed.
BASIC BLOCK:
A basic block (BB) is a sequence of consecutive statements in which the flow of control enters at
the beginning and leaves at the end without halting or branching except at the end.
ALGORITHM:
The code generation algorithm takes as input a sequence of three – address
statements constituting a basic block. For each three – address statement of the form x
:= y op z we perform the following actions:

1. Invoke a function getreg to determine the location L where the result of the
computation y op z should be stored. L will usually be a register, but it could also be a
memory location. We shall describe getreg shortly.
2. Consult the address descriptor for y to determine y', (one of) the current
location(s) of y. Prefer the register for y' if the value of y is currently both in memory
and a register. If the value of y is not already in L, generate the instruction MOV y', L
to place a copy of y in L.
3. Generate the instruction OP z', L where z' is a current location of z. Again, prefer
a register to a memory location if z is in both. Update the address descriptor of x to
indicate that x is in location L. If L is a register, update its descriptor to indicate that it
contains the value of x, and remove x from all other register descriptors.
4. If the current values of y and/or z have no next uses, are not live on exit from the
block, and are in registers, alter the register descriptor to indicate that, after execution
of x := y op z, those registers no longer will contain y and/or z, respectively.
PROGRAM:
#include<stdio.h>
#include<conio.h>
#include<string.h>
struct op
{
char l;
char r[20];
}op[10],pr[10];
void main()
{
int a,i,k,j,n,z=0,m,q;
char *p,*l;
char temp,t;
char *tem;
clrscr();
printf("enter no of values");
scanf("%d",&n);
for(i=0;i<n;i++)
{
printf("left\t");
op[i].l=getche();
printf("right:\t");
scanf("%s",op[i].r);
}
printf("intermediate Code\n") ;
for(i=0;i<n;i++)
{
printf("%c=",op[i].l);
printf("%s\n",op[i].r);
}
for(i=0;i<n-1;i++)
{
temp=op[i].l;
for(j=0;j<n;j++)
{
p=strchr(op[j].r,temp);
if(p)
{
pr[z].l=op[i].l;
strcpy(pr[z].r,op[i].r);
z++ ;
}} }
pr[z].l=op[n-1].l;
strcpy(pr[z].r,op[n-1].r);
z++;
printf("\nafter dead code elimination\n");
for(k=0;k<z;k++)
{
printf("%c\t=",pr[k].l);
printf("%s\n",pr[k].r);
}
//sub expression elimination

for(m=0;m<z;m++)
{
tem=pr[m].r;
for(j=m+1;j<z;j++)
{
p=strstr(tem,pr[j].r);
if(p)
{
t=pr[j].l;
pr[j].l=pr[m].l ;
for(i=0;i<z;i++)
{
l=strchr(pr[i].r,t) ;
if(l)
{
a=l-pr[i].r;
//printf("pos: %d",a);
pr[i].r[a]=pr[m].l;
}}}}}
printf("eliminate common expression\n");
for(i=0;i<z;i++)
{
printf("%c\t=",pr[i].l);
printf("%s\n",pr[i].r);
}
// duplicate production elimination
for(i=0;i<z;i++)
{
for(j=i+1;j<z;j++)
{
q=strcmp(pr[i].r,pr[j].r);
if((pr[i].l==pr[j].l)&&!q)
{
pr[i].l='\0';
pr[i].r[0]='\0'; /* empty the string; strcpy needs a string, not a character */
}}
}
printf("optimized code");
for(i=0;i<z;i++)
{
if(pr[i].l!='\0')
{
printf("%c=",pr[i].l);
printf("%s\n",pr[i].r);
}
}
getch();
}
OUTPUT:
enter no of values 5
left aright: 9
left bright: c+d
left eright: c+d
left fright: b+e
left rright: f
intermediate Code
a=9
b=c+d
e=c+d
f=b+e
r=f
after dead code elimination

b =c+d
e =c+d
f =b+e
r =f
eliminate common expression
b =c+d
b =c+d
f =b+b
r =f
optimized codeb=c+d
f=b+b
r=f
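The program above keeps a statement when its left-hand side appears on some right-hand side. A somewhat more precise variant, sketched here only as an illustration, scans the block backward with a set of live names. It assumes single-letter variable names as in the program, and that one named variable (live_out) is the only value needed after the block. The names eliminate_dead, stmt and keep are made up for this sketch.

```c
#include <ctype.h>

struct stmt {
    char l;        /* left-hand side: a single-letter variable */
    char r[20];    /* right-hand side expression, e.g. "c+d" */
};

/* Sets keep[i] to 1 for every statement whose result is needed to compute
 * live_out and to 0 for dead statements. Returns the number kept. */
int eliminate_dead(const struct stmt *code, int n, char live_out, int *keep)
{
    int live[256] = {0};
    int i, kept = 0;
    const char *p;

    live[(unsigned char)live_out] = 1;
    for (i = n - 1; i >= 0; i--) {
        unsigned char lhs = (unsigned char)code[i].l;
        keep[i] = live[lhs];
        if (!keep[i])
            continue;               /* result never used later: dead */
        kept++;
        live[lhs] = 0;              /* defined here, so not live above this point */
        for (p = code[i].r; *p != '\0'; p++)
            if (isalpha((unsigned char)*p))
                live[(unsigned char)*p] = 1;   /* operands become live */
    }
    return kept;
}
```

On the five statements of the sample run, with r as the only live-out name, the statement a=9 is marked dead and the other four are kept, which agrees with the "after dead code elimination" listing.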
DISADVANTAGES:
 Debugging made difficult
 Code moves around or disappears
 Important to be able to switch off optimization
 Increases compilation time
VIVA QUESTIONS:
1.Define Dead-code Elimination.
Dead code is one or more code statements which are:
 Either never executed or unreachable,
 Or if executed, their output is never used.
Thus, dead code plays no role in any program operation and therefore it can simply be
eliminated.
2. What is a common subexpression?
An occurrence of an expression E is a common subexpression if E was
previously computed and the values of the variables in E have not changed since.
3. How is liveness of a variable calculated? (AU APR/MAY-2015)
A name in a basic block is said to be live at a given point if its value is used
after that point in the program.
4. Define Peephole optimization. (AU MAY/JUN 2007)
A statement-by-statement code generation strategy often produces target code
that contains redundant instructions and suboptimal constructs. "Optimizing" is
misleading because there is no guarantee that the resulting code is optimal. Peephole
optimization is a method for trying to improve the performance of the target program by
examining a short sequence of target instructions and replacing these instructions with a
shorter or faster sequence.
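One classic peephole rule removes a load that directly follows a store of the same value, for example "MOV R0,t1" right after "MOV t1,R0". The sketch below applies only this one rule to a small instruction array. It assumes that no label stands between the two instructions, and the names peephole and parse_mov are made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Splits "MOV dst,src" into its two operands; returns 1 on success. */
static int parse_mov(const char *ins, char *dst, char *src)
{
    return sscanf(ins, "MOV %15[^,],%15s", dst, src) == 2;
}

/* Removes every "MOV b,a" that directly follows "MOV a,b".
 * code[] is compacted in place; the new instruction count is returned. */
int peephole(char code[][32], int n)
{
    char d1[16], s1[16], d2[16], s2[16];
    int i, out = 0;

    for (i = 0; i < n; i++) {
        if (out > 0
            && parse_mov(code[out - 1], d1, s1)
            && parse_mov(code[i], d2, s2)
            && strcmp(d1, s2) == 0 && strcmp(s1, d2) == 0)
            continue;               /* the value is already in place: drop it */
        if (out != i)
            strcpy(code[out], code[i]);
        out++;
    }
    return out;
}
```

Applied to the sequence MUL R0 R1, MOV t1,R0, MOV R0,t1, MOV R1,z, ADD R0 R1, the sketch drops the third instruction and keeps the other four.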

RESULT:
Thus the above program is compiled and executed successfully and output is verified.
CONTENT BEYOND THE SYLLABI
Ex.No:12 Implementation of Code Generator

Aim:
To write a C program to implement Simple Code Generator.
Algorithm:
Input: Set of three address code sequence.

Output: Assembly code sequence for three address codes (opd1=opd2, op, opd3).
Method:
Step 1- Start
Step 2- Get address code sequence.
Step 3- Determine current location of 3 using address (for 1st operand).
Step 4- If current location not already exist generate move (B,O).
Step 5- Update address of A(for 2nd operand).
Step 6- If current value of B and () is null,exist.
Step 7- If they generate operator () A,3 ADPR.
Step 8- Store the move instruction in memory
Step 9- Stop.
Program:
#include<stdio.h>
#include<conio.h>
#include<string.h>
#include<ctype.h>
#include<graphics.h>
typedef struct
{
char var[10];
int alive;
}
regist;
regist preg[10];
void substring(char exp[],int st,int end)
{
int i,j=0;
char dup[10]="";
for(i=st;i<end;i++)
dup[j++]=exp[i];
dup[j]='\0'; /* terminate the copy with the null character, not the digit '0' */
strcpy(exp,dup);
}
int getregister(char var[])
{
int i;
for(i=0;i<10;i++)
{
if(preg[i].alive==0)
{
strcpy(preg[i].var,var);
break;
}
}
return(i);
}
void getvar(char exp[],char v[])
{
int i,j=0;
char var[10]="";
for(i=0;exp[i]!='\0';i++)
if(isalpha(exp[i]))
var[j++]=exp[i];
else
break;
strcpy(v,var);
}
void main()
{
char basic[10][10],var[10][10],fstr[10],op;
int i,j,k,reg,vc=0,flag=0; /* vc counts the variables stored in var[] */
clrscr();
printf("\nEnter the Three Address Code:\n");
for(i=0;;i++)
{
gets(basic[i]);
if(strcmp(basic[i],"exit")==0)
break;
}
printf("\nThe Equivalent Assembly Code is:\n");
for(j=0;j<i;j++)
{
getvar(basic[j],var[vc++]);
strcpy(fstr,var[vc-1]);
substring(basic[j],strlen(var[vc-1])+1,strlen(basic[j]));
reg=getregister(var[vc-1]);
if(preg[reg].alive==0)
{
printf("\nMov R%d,%s",reg,var[vc-1]);
preg[reg].alive=1;
}
op=basic[j][strlen(var[vc-1])];
substring(basic[j],strlen(var[vc-1])+1,strlen(basic[j]));
switch(op)
{
case '+': printf("\nAdd"); break;
case '-': printf("\nSub"); break;
case '*': printf("\nMul"); break;
case '/': printf("\nDiv"); break;
}
flag=1;
for(k=0;k<=reg;k++)
{
if(strcmp(preg[k].var,var[vc-1])==0)
{
printf("R%d, R%d",k,reg);
preg[k].alive=0;
flag=0;
break;
}
}
if(flag)
{
printf(" %s,R%d",var[vc-1],reg);
printf("\nMov %s,R%d",fstr,reg);
}
strcpy(preg[reg].var,var[vc-3]);
getch();
}
}
Sample Input & Output:
Enter the Three Address Code:
a=b+c
c=a*c
exit
The Equivalent Assembly Code is:
Mov R0,b
Add c,R0
Mov a,R0
Mov R1,a
Mul c,R1
Mov c,R1
Result:
The above C program was successfully executed and verified.
Ex.No:13 Construction of NFA from Regular Expression
Aim:
To write a C program to construct a Non Deterministic Finite Automata (NFA) from
Regular Expression.
Algorithm:
Step 1: Start the Program.

Step 2: Enter the regular expression R over alphabet E.
Step 3: Decompose the regular expression R into its primitive components
Step 4: For each component construct finite automata.
Step 5: Combine the automata of the basic components, in the way that corresponds to each
operator, to obtain the automaton for the compound regular expression.
Step 6: Stop the Program.
Program:
#include<stdio.h>
#include<conio.h>
#include<ctype.h>
#include<string.h>
#include<graphics.h>
#include<math.h>
#include<process.h>
int minx=1000,miny=0;
void star(int *x1,int *y1,int *x2,int *y2)
{
char pr[10];
ellipse(*x1+(*x2-*x1)/2,*y2-10,0,180,(*x2-*x1)/2,70);
outtextxy(*x1-2,*y2-17,"v");
line(*x2+10,*y2,*x2+30,*y2);
outtextxy(*x1-15,*y1-3,">");
circle(*x1-40,*y1,10);
circle(*x1-80,*y1,10);
line(*x1-30,*y2,*x1-10,*y2);
outtextxy(*x2+25,*y2-3,">");
sprintf(pr,"%c",238);
outtextxy(*x2+15,*y2-9,pr);
outtextxy(*x1-25,*y1-9,pr);
outtextxy((*x2-*x1)/2+*x1,*y1-30,pr);
outtextxy((*x2-*x1)/2+*x1,*y1+30,pr);
ellipse(*x1+(*x2-*x1)/2,*y2+10,180,360,(*x2-*x1)/2+40,70);
outtextxy(*x2+37,*y2+14,"^");
if(*x1-40<minx)minx=*x1-40;
miny=*y1;
}
void star1(int *x1,int *y1,int *x2,int *y2)
{
char pr[10];
sprintf(pr,"%c",238); /* epsilon label, as in star(); pr was used uninitialized */
ellipse(*x1+(*x2-*x1)/2+15,*y2-10,0,180,(*x2-*x1)/2+15,70);
outtextxy(*x1-2,*y2-17,"v");
line(*x2+40,*y2,*x2+60,*y2);
outtextxy(*x1-15,*y1-3,">");
circle(*x1-40,*y1,10);
line(*x1-30,*y2,*x1-10,*y2);
outtextxy(*x2+25,*y2-3,">");
outtextxy(*x1-25,*y1-9,pr);
outtextxy((*x2-*x1)/2+*x1,*y1-30,pr);
outtextxy((*x2-*x1)/2+*x1,*y1+30,pr);
ellipse(*x1+(*x2-*x1)/2+15,*y2+10,180,360,(*x2-*x1)/2+50,70);
outtextxy(*x2+62,*y2+13,"^");
if(*x1-40<minx)minx=*x1-40;
miny=*y1;
}
void basis(int *x1,int *y1,char x)
{
char pr[5];
circle(*x1,*y1,10);
line(*x1+30,*y1,*x1+10,*y1);
sprintf(pr,"%c",x);
outtextxy(*x1+23,*y1-3,">");
circle(*x1+40,*y1,10);
if(*x1<minx)minx=*x1;
miny=*y1;
}
void slash(int *x1,int *y1,int *x2,int *y2,int *x3,int *y3,int *x4,int *y4)
{
char pr[10];
int c1,c2;
sprintf(pr,"%c",238); /* epsilon label, as in star(); pr was used uninitialized */
c1=*x1;
if(*x3>c1)c1=*x3;
c2=*x2;
if(*x4>c2)c2=*x4;
line(*x1-10,*y1,c1-40,(*y3-*y1)/2+*y1-10);
outtextxy(*x1-15,*y1-3,">");
outtextxy(*x3-15,*y4-3,">");
circle(c1-40,(*y4-*y2)/2+*y2,10);
outtextxy(c1-40,(*y4-*y2)/2+*y2+25,pr);
outtextxy(c1-40,(*y4-*y2)/2+*y2-25,pr);
line(*x2+10,*y2,c2+40,(*y4-*y2)/2+*y2-10);
line(*x3-10,*y3,c1-40,(*y3-*y1)/2+*y2+10);
circle(c2+40,(*y4-*y2)/2+*y2,10);
outtextxy(c2+40,(*y4-*y2)/2+*y2-25,pr);
outtextxy(c2-40,(*y4-*y2)/2+*y2+25,pr);
outtextxy(c2+35,(*y4-*y2)/2+*y2-15,"^");
outtextxy(c1+35,(*y4-*y2)/2+*y2+10,"^");
line(*x4+10,*y2,c2+40,(*y4-*y2)/2+*y2+10);
minx=c1-40;
miny=(*y4-*y2)/2+*y2;
}
void main()
{
int d=0,l,x1=200,y1=200,len,par=0,op[10];
int cx1=200,cy1=200,cx2,cy2,cx3,cy3,cx4,cy4;
char str[20];
int gd=DETECT,gm;
int stx[20],endx[20],sty[20],endy[20];
int pos=0,i=0;
clrscr();
initgraph(&gd,&gm,"c:\\dosapp\\tcplus\\bgi");
printf("\n enter the regular expression:");
scanf("%s",str);
len=(strlen(str));
while(i<len)
{
if(isalpha(str[i]))
{
if(str[i+1]=='*')x1=x1+40;
basis(&x1,&y1,str[i]);
stx[pos]=x1;
endx[pos]=x1+40;
sty[pos]=y1;
endy[pos]=y1;
x1=x1+40;
pos++;
}
if(str[i]=='*')
{
star(&stx[pos-1],&sty[pos-1],&endx[pos-1],&endy[pos-1]);
stx[pos-1]=stx[pos-1]-40;
endx[pos-1]=endx[pos-1]+40;
x1=x1+40;
}
if(str[i]=='(')
{
int s;
s=i;
while(str[s]!=')')s++;
if((str[s+1]=='*')&&(pos!=0))x1=x1+40;
op[par]=pos;
par++;
}
if(str[i]==')')
{
cx2=endx[pos-1];
cy2=endy[pos-1];
l=op[par-1];
cx1=stx[1];
cx2=sty[1];
par--;
if(str[i+1]=='*')
{
i++;
star1(&cx1,&cy1,&cx2,&cy2);
cx1=cx1-40;
cx2=cx2+40;
stx[1]=stx[1]-40;
endx[pos-1]=endx[pos-1]+40;
x1=x1+40;
}
if(d==1)
{
slash(&cx3,&cy3,&cx4,&cy4,&cx1,&cy1,&cx2,&cy2);
if(cx4>cx2)x1=cx4+40;
else x1=cx2+40;
y1=(y1-cy4)/2.0+cy4;
d=0;
}
}
if(str[i]=='/')
{
cx2=endx[pos-1];
cy2=endy[pos-1];
x1=200;
y1=y1+100;
if(str[i+1]=='(')
{
d=1;
cx3=cx1;
cy3=cy1;
cx4=cx2;
cy4=cy2;
}
if(isalpha(str[i+1]))
{
i++;
basis(&x1,&y1,str[i]);
stx[pos]=x1;
endx[pos]=x1+40;
sty[pos]=y1;
endy[pos]=y1;
if(str[i+1]=='*')
{
i++;
star(&stx[pos],&sty[pos],&endx[pos],&endy[pos]);
stx[pos]=stx[pos]-40;
endx[pos]=endx[pos]+40;
}
slash(&cx1,&cy1,&cx2,&cy2,&stx[pos],&sty[pos],&endx[pos],&endy[pos]);
if(cx2>endx[pos])x1=cx2+40;
else x1=endx[pos]+40;
y1=(y1-cy2)/2.0+cy2;
cx1=cx1-40;
cy1=(sty[pos]-cy1)/2.0+cy1;
cx2=cx2+40;
cy2=(endy[pos]-cy2)/2.0+cy2;
l=op[par-1];
stx[1]=cx1;
sty[1]=cy1;
endx[pos]=cx2;
endy[pos]=cy2;
pos++;
}
}
i++;
}
circle(x1,y1,13);
line(minx-30,miny,minx-10,miny);
outtextxy(minx-100,miny-10,"start");
outtextxy(minx-15,miny-3,">");
getch();
closegraph();
}
Sample Input & Output:
Result:
Ex.No:7 Implementation of Control Flow Analysis
AIM:
To write a c program to implement control Flow analysis.
 TURBO C
DESCRIPTION:
CFG:
A control flow graph (CFG) in computer science is a representation, using graph
notation, of all paths that might be traversed through a program during its execution.
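The nodes of a control flow graph are basic blocks, and a common first step in building the graph is to find the "leaders", the instructions that start a block. The sketch below marks leaders with the usual three rules. The instruction record (struct instr) and the function name find_leaders are assumptions made for this illustration.

```c
/* Hypothetical instruction record: for a jump, target is the index of
 * the instruction jumped to; for other instructions it is -1. */
struct instr {
    int is_jump;
    int target;
};

/* Sets leader[i] to 1 for every instruction that starts a basic block.
 * Returns the number of basic blocks. */
int find_leaders(const struct instr *code, int n, int *leader)
{
    int i, blocks = 0;

    for (i = 0; i < n; i++)
        leader[i] = 0;
    if (n > 0)
        leader[0] = 1;                        /* rule 1: the first instruction */
    for (i = 0; i < n; i++) {
        if (!code[i].is_jump)
            continue;
        if (code[i].target >= 0 && code[i].target < n)
            leader[code[i].target] = 1;       /* rule 2: the target of a jump */
        if (i + 1 < n)
            leader[i + 1] = 1;                /* rule 3: the instruction after a jump */
    }
    for (i = 0; i < n; i++)
        blocks += leader[i];
    return blocks;
}
```

Each basic block then runs from one leader up to, but not including, the next leader, and the jumps supply the edges between blocks.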
PROCEDURE:
In compiler theory, loop optimization is the process of increasing execution speed and
reducing the overheads associated with loops. It plays an important role in improving cache
performance and making effective use of parallel processing capabilities. Most execution
time of a scientific program is spent on loops; as such, many compiler optimization
techniques have been developed to make them faster.
Loop Invariant Code Motion

 If a computation produces the same value in every loop iteration, move it out of the
loop.
Original loop:
for i = 1 to N
x=x+1
for j = 1 to N
a(i,j) = 100*N + 10*i + j + x
Step 1: 100*N has the same value in every iteration of both loops, so compute it once
before the outer loop:
t1 = 100*N
for i = 1 to N
x=x+1
for j = 1 to N
a(i,j) = t1 + 10*i + j + x
Step 2: 10*i + x does not change inside the inner loop, so compute it once per iteration
of the outer loop:
t1 = 100*N
for i = 1 to N
x=x+1
t2 = 10*i + x
for j = 1 to N
a(i,j) = t1 + t2 + j
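The transformation can be checked in C by writing the loop nest both ways and comparing the results. In this sketch the array size N is fixed at 4 and the function names fill_naive and fill_hoisted are chosen only for illustration. The pseudocode counts from 1, so the array indices are shifted by one.

```c
#define N 4

/* The loop nest as originally written: every term is recomputed
 * in the innermost loop. */
void fill_naive(int a[N][N], int x)
{
    int i, j;
    for (i = 1; i <= N; i++) {
        x = x + 1;
        for (j = 1; j <= N; j++)
            a[i - 1][j - 1] = 100 * N + 10 * i + j + x;
    }
}

/* The same loop nest after loop-invariant code motion. */
void fill_hoisted(int a[N][N], int x)
{
    int i, j, t1, t2;
    t1 = 100 * N;                  /* invariant in both loops */
    for (i = 1; i <= N; i++) {
        x = x + 1;
        t2 = 10 * i + x;           /* invariant in the inner loop */
        for (j = 1; j <= N; j++)
            a[i - 1][j - 1] = t1 + t2 + j;
    }
}
```

Both functions fill the array with identical values; the second one simply performs fewer multiplications and additions in the innermost loop.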
Program:
#include<stdio.h>
#include<conio.h>
int main()
{
int a[10][10], b[10][10], c[10][10],m,n, i, j, k, l,g;
printf("Enter the number of rows and column of first matrix\n");
scanf("%d%d",&m,&n);
printf("\nEnter the elements of first %dx%d matrix\n",m,n);
for (i=0; i< m; i++)
{
for(j=0; j<n; j++)
{
scanf("%d",&a[i][j]);
}
}
s1 : printf("\nEnter the number of rows and column of second matrix\n");
scanf("%d%d",&g,&l);
if(n!=g)
{
printf("\nFor matrix multiplication the column count of the first matrix must equal the row count of the second\nRe-enter ");
getch();
goto s1;
}
printf("\nEnter the elements of second %dx%d matrix",n,l);
for(i = 0; i <n; i++)
{
for (j = 0; j < l; j++)
{
scanf("%d", &b[i][j]);
}
}
printf("\nThe first matrix is :-\n");
for (i = 0; i < m; i++)
{
for (j = 0; j < n; j++)
{
printf("\t%d", a[i][j]);
}
printf("\n");
}
printf("\nThe second matrix is :-\n");
for (i = 0; i < n; i++)
{
for (j = 0; j < l; j++)
{
printf("\t%d", b[i][j]);
}
printf("\n");
}
printf("\nMultiplication of the two matrices is as follows:\n");

for (i = 0;i < m; i++)
{
printf("\n");
for (j = 0; j < l; j++)
{
c[i][j]=0;
for(k=0;k<n;k++)
c[i][j] = c[i][j]+a[i][k] * b[k][j];
printf("\t%d", c[i][j]);
}
}
getch();
}
OUTPUT:
LIMITATIONS:
COMPLICATIONS IN CFG CONSTRUCTION:
 Function calls
o Instruction scheduling may prefer function calls as basic block boundaries
o Special functions such as setjmp() and longjmp()
 Exception handling
 Ambiguous jump
o Jump r1 //target stored in register r1
o Static analysis may generate edges that never occur at runtime
o Record potential targets if possible
 Jump targets outside the current procedure
o PASCAL, Algol: still restricted to lexically enclosing procedure
VIVA QUESTIONS:
1.Define Control Flow Graph?
In a control flow graph each node in the graph represents a basic block, i.e. a
straight-line piece of code without any jumps or jump targets; jump targets start a block,
and jumps end a block.
2. What is the basic idea for data flow analysis?
Data flow analysis derives information about the dynamic behaviour of a program by
only examining the static code.
3.Define Liveness.
A variable is live at a particular point in the program if its value at that point will be
used in the future (dead,otherwise). ∴ To compute liveness at a given point, we need to look
into the future.
4.Define Activation records
Information needed by a single execution of a procedure is managed using a
contiguous block of storage called an activation record or frame, consisting of a collection
of fields.
RESULT:
EX.NO:6 Implementation of Type Checking
OBJECTIVE:
To write a C program to test whether a given identifier is valid or not.
 TURBO C
Definition:
Type System:
A type system is a set of types and type constructors (integers, arrays, classes, etc.)
along with the rules that govern whether or not a program is legal with respect to types (i.e.,
type checking).
Type Checking:
Type checking checks and enforces the rules of the type system to prevent type errors
from happening.
Type Error:
A type error happens when an expression produces a value outside the set of values it
is supposed to have.
Static type checking:
There are two main kinds of static type checking: explicit type decoration and implicit type
inference
Let us study the expression (+ x (string-length y))
 Explicit type decoration

o Variables, parameters, and others are explicitly declared of a given type in the
source program
o It is checked that y is a string and that x is a number
 Implicit type inference
o Variables and parameters are not decorated with type information
o By studying the body of string-length it is concluded that y must be a string
and that the type of (string-length y) has to be an integer
o Because + adds numbers, it is concluded that x must be a number
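The explicit type decoration case for (+ x (string-length y)) can be sketched in C as a small recursive checker. This is a hedged illustration, not the program of this exercise: the enum names, struct expr and the function check are assumptions introduced here, and "number" is modelled as a single integer type.

```c
#include <stddef.h>

/* Hypothetical types and expression nodes, for illustration only. */
enum type { T_INT, T_STRING, T_ERROR };
enum kind { K_VAR, K_ADD, K_STRLEN };

struct expr {
    enum kind kind;
    enum type declared;          /* used only by K_VAR: the declared type */
    const struct expr *left;     /* operand of K_STRLEN, left operand of K_ADD */
    const struct expr *right;    /* right operand of K_ADD */
};

/* Returns the type of e, or T_ERROR if a typing rule is violated. */
enum type check(const struct expr *e)
{
    switch (e->kind) {
    case K_VAR:
        return e->declared;
    case K_STRLEN:   /* string-length: string -> int */
        return check(e->left) == T_STRING ? T_INT : T_ERROR;
    case K_ADD:      /* +: int x int -> int */
        return (check(e->left) == T_INT && check(e->right) == T_INT)
                   ? T_INT : T_ERROR;
    }
    return T_ERROR;
}
```

With x declared as a number and y as a string the expression checks as a number; declaring y as a number instead makes string-length, and therefore the whole sum, a type error.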
PROGRAM LOGIC:
Step 1:Read the given input string.
Step 2:If the initial character of the string is a digit or any special character except ‘_’,
print that it is not a valid identifier.
Step 3:Otherwise, print that it is a valid identifier if the remaining characters of the string
do not contain any special characters except ‘_’.
PROCEDURE:
Go to debug -> run or press CTRL + F9 to run the program.
PROGRAM:
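#include<stdio.h>
#include<conio.h>
#include<ctype.h>
void main()
{
char a[10]="";
int flag=0, i=1; /* flag stays 0 unless the first character is acceptable */
clrscr();
printf("\n Enter an identifier:");
scanf("%9s",a); /* bounded read; gets() could overflow a[] */
if(isalpha(a[0])||a[0]=='_')
flag=1;
while(flag==1&&a[i]!='\0')
{
/* remaining characters may be letters, digits or '_' */
if(!isdigit(a[i])&&!isalpha(a[i])&&a[i]!='_')
{
flag=0;
break;
}
i++;
}
if(flag==1)
printf("\n Valid identifier");
else
printf("\n Not a valid identifier");
getch();
}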
INPUT & OUTPUT:

Input:
Enter an identifier: first
Output:
Valid identifier
Input:
Enter an identifier:1aqw
Output:
Not a valid identifier
BENEFITS OF TYPE CHECKING:
Static typing has the following main benefits:
It allows statically (without running the program) detecting many programming errors
quickly, reliably and automatically. This helps reduce the number of bugs and reduces the
time spent on debugging. Type declarations serve as automatically-checked documentation.
They make programs easier to understand and maintain. Static typing may improve runtime
efficiency.
Dynamic typing has a different, complementary set of benefits:
Dynamic typing is conceptually simpler and easier to understand than static typing,
especially when using powerful container types such as Alore arrays, tuples and maps. This
effect is pronounced for non-expert programmers. Dynamic typing is more flexible. A static
type system always restricts what can be conveniently expressed. Programming with a static
type system often requires more design and implementation effort. Dynamic typing results in
more compact programs, since it is more flexible and does not require types to be spelled out.
The benefits of static typing are more pronounced for large and complex programs. It offers
little benefit over dynamic typing when writing short scripts and prototypes, for example. In
these cases it mainly slows down the programmer, and dynamic typing is preferable.
VIVA QUESTIONS:
1.Define a syntax-directed translation?/SDT? (AU MAY/JUN 2007)

Syntax-directed translation specifies the translation of a construct in terms of
attributes associated with its syntactic components. Syntax-directed translation uses a
context free grammar to specify the syntactic structure of the input. It is an input-
output mapping.
2.Define an attribute. Give the types of an attribute?

An attribute may represent any quantity associated with a grammar symbol, for example
a type, a value, a memory location etc. A syntax-directed definition associates a set of
attributes with each grammar symbol and, with each production, a set of semantic rules for
computing values of the attributes associated with the symbols appearing in that
production. The types of attributes are:
i)Synthesized attributes.
ii)Inherited attributes.
3.What are error recovery strategies in parser?

a)Panic mode
b)Phrase level
c)Error productions
d)Global corrections
4.Define Type System.
A type system is a set of types and type constructors (integers, arrays, classes, etc.)
along with the rules that govern whether or not a program is legal with respect to types (i.e.,
type checking).
5.Define Type Checking.
Type checking checks and enforces the rules of the type system to prevent type errors
from happening.
RESULT:
Thus the program for type checking has been executed and verified successfully.
Ex.No:8 Implement storage allocation strategies(Heap,Stack,Static)
AIM:
To write a C program to implement storage allocation using heap allocation.
 TURBO C
THEORY:
Memory Allocation
Static Allocation (fixed in size)
Sometimes we create data structures that are “fixed” and don’t need to grow or shrink.
Dynamic Allocation (change in size)
At other times, we want to increase and decrease the size of our data structures to
accommodate changing needs.
Static Allocation
 Done at compile time.
 Global variables: variables declared “ahead of time,” such as fixed arrays.
 Lifetime = entire runtime of program
Advantage: efficient execution time.
Disadvantage:
If we declare more static data space than we need, we waste space.
If we declare less static space than we need, we are out of luck
Dynamic Allocation
 Done at run time.
 Data structures can grow and shrink to fit changing data requirements.
 We can allocate (create) additional storage whenever we need them.
 We can de-allocate (free/delete) dynamic space whenever we are done with them.
Advantage: we can always have exactly the amount of space required - no more, no less.
For example, with references to connect them, we can use dynamic data structures to create a
chain of data structures called a linked list
Stack-Based Allocation
Memory allocation and freeing are partially predictable.
Restricted but simple and efficient.
Allocation is hierarchical: memory is freed in the opposite order of allocation. If alloc(A) then
alloc(B) then alloc(C), then it must be free(C) then free(B) then free(A).
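The last-in, first-out rule above can be sketched with a fixed byte array used as a stack. This is a minimal illustration under stated assumptions: the names stack_area, stack_alloc and stack_free and the size 1024 are hypothetical, the caller passes the block size back when freeing, and alignment of the returned blocks is ignored for brevity.

```c
#include <stddef.h>

#define STACK_SIZE 1024                     /* arbitrary size chosen for the sketch */

static unsigned char stack_area[STACK_SIZE];
static size_t stack_top = 0;                /* offset of the first free byte */

/* Pushes a block of `size` bytes; returns NULL when the stack is full. */
void *stack_alloc(size_t size)
{
    void *p;
    if (size > STACK_SIZE - stack_top)
        return NULL;
    p = &stack_area[stack_top];
    stack_top += size;
    return p;
}

/* Pops the most recent block. Returns 0 on success, or -1 if p (with its size)
   is not the block on top of the stack, i.e. the caller broke the LIFO order. */
int stack_free(void *p, size_t size)
{
    if (size > stack_top || (unsigned char *)p != &stack_area[stack_top - size])
        return -1;
    stack_top -= size;
    return 0;
}
```

After alloc(A), alloc(B), alloc(C), a call that tries to free A first is rejected, while freeing C, then B, then A succeeds.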
Stack-Based Allocation:
Example
Procedure call:
Program calls Y, which calls X. Each call pushes another stack frame on top of the stack.
Each stack frame has space for variable, parameters, and return addresses. Stacks are also
useful for tree traversal, expression evaluation, top-down recursive parsers etc
Advantages:
-allows recursion
-conserves storage
Disadvantages:
-Overhead of allocation and deallocation
-Subprograms cannot be history sensitive
-Inefficient references (indirect addressing)
Heap Allocation
Variables local to a procedure are allocated and de-allocated only at runtime. Heap
allocation is used to dynamically allocate memory to the variables and claim it back when the
variables are no more required.
Except statically allocated memory area, both stack and heap memory can grow and
shrink dynamically and unexpectedly. Therefore, they cannot be provided with a fixed
amount of memory in the system.
Explicit versus Implicit Deallocation
Examples:
•Implicit: Java, Scheme
•Explicit: Pascal and C
To free heap memory a specific operation must be called.
Pascal ==> dispose
C ==> free
In explicit memory management, the program must explicitly call an operation to release
memory back to the memory management system.
In implicit memory management, heap memory is reclaimed automatically by a “garbage
collector”.
Heap-Based Storage Allocation
The most flexible allocation scheme is heap-based allocation. Here, storage can be
allocated and deallocated dynamically at arbitrary times during program execution. This will
be more expensive than either static or stack-based allocation. Heap-based allocation is used
ubiquitously in languages such as Lisp/Scheme and Smalltalk.
Issue: when is storage allocated and deallocated?
Allocation is easy. In C, malloc (a standard library function) allocates fresh storage.

In Lisp/Scheme, a new cons cell is allocated when the cons function is called, array storage
can be allocated using make-array, and so forth. In Smalltalk, new storage is allocated
when someone sends the new message to a class.
Deallocation is harder. There are two approaches: programmer-controlled and
automatic. In languages such as C, the programmer is in charge of deciding when heap
storage can be freed (in C using the free function). This is efficient but can lead to
problems if the programmer makes a mistake -- either storage is not freed even though it is no
longer needed (memory leak), or is freed but referred to later (dangling pointer). Dangling
pointers can lead to type insecurities or other errors -- one can refer to storage that has been
re-allocated for another purpose with data of a different type. Some studies estimate that over
30% of development time on large C/C++ projects is related to storage management issues.
Lisp/Scheme and Smalltalk, as well as various other languages, use automatic storage
management. There is no explicit deallocate function; rather, storage is automatically
reclaimed some time after it is no longer accessible.
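Programmer-controlled deallocation in C can be sketched with a linked list built from malloc and released with free. The names struct cell, push_front and free_list are assumptions made for this example. The release loop saves the next pointer before calling free, so a freed cell is never read (no dangling pointer), and every cell is freed exactly once (no memory leak).

```c
#include <stdlib.h>

struct cell { int value; struct cell *next; };

/* Allocates a new cell on the heap and links it in front of `head`.
   Returns the new head, or NULL if malloc fails (the old list is untouched). */
struct cell *push_front(struct cell *head, int value)
{
    struct cell *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->value = value;
    c->next = head;
    return c;
}

/* Frees every cell exactly once and returns how many cells were released. */
int free_list(struct cell *head)
{
    int freed = 0;
    while (head != NULL) {
        struct cell *next = head->next;   /* read the link before freeing */
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

After free_list returns, the caller should set its own list pointer to NULL so that the released storage cannot be referred to later.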
ALGORITHM:
STEP 1: Start the program.
STEP 2: Determine the number of elements of heap and place the element sequentially in an
array.
STEP 3: Create_heap() and Insert() function creates the heap based on the values of the
elements. The root node contains the largest value and all other nodes with the value
less than root node.
STEP 4: In the del_root() function it repeatedly deletes the root node and places the element
in the end of the array. The last node in the heap is replaced in the root position and
tree is again constructed until the element is placed in proper place.
STEP 5: Display() function returns the sorted elements of the heap.
STEP 6: Repeat the same process until all the elements are deleted to perform the sort
operation.
STEP 7: Terminate the program.
/* C Program to Implement a Heap & provide Insertion & Deletion Operation */
#include <stdio.h>
#include <stdlib.h> /* for exit() */
int array[100], n;
main()
{
int choice, num;
n = 0;/*Represents number of nodes in the heap*/
while(1)
{
printf("1.Insert the element \n");
printf("2.Delete the element \n");
printf("3.Display all elements \n");
printf("4.Quit \n");
printf("Enter your choice : ");
scanf("%d", &choice);
switch(choice)
{
case 1:
printf("Enter the element to be inserted to the list : ");
scanf("%d", &num);
insert(num, n);
n = n + 1;
break;
case 2:
printf("Enter the elements to be deleted from the list: ");
scanf("%d", &num);
delete(num);
break;
case 3:
display();
break;
case 4:
exit(0);
default:
printf("Invalid choice \n");
}/*End of switch */
}/*End of while */
}/*End of main()*/
display()
{
int i;
if (n == 0)
{
printf("Heap is empty \n");
return;
}
for (i = 0; i < n; i++)
printf("%d ", array[i]);

printf("\n");
}/*End of display()*/
insert(int num, int location)
{
int parentnode;
while (location > 0)
{
parentnode =(location - 1)/2;
if (num <= array[parentnode])
{
array[location] = num;
return;
}
array[location] = array[parentnode];
location = parentnode;
}/*End of while*/
array[0] = num; /*assign number to the root node */
}/*End of insert()*/
delete(int num)
{
int left, right, i, temp, parentnode;
for (i = 0; i < n; i++) { /* search only the n elements currently in the heap */
if (num == array[i])
break;
}
if (i == n) /* loop ended without a match */
{
printf("%d not found in heap list\n", num);
return;
}
array[i] = array[n - 1];
n = n - 1;
parentnode =(i - 1) / 2; /*find parentnode of node i */
if (array[i] > array[parentnode])
{
insert(array[i], i);
return;
}
left = 2 * i + 1; /*left child of i*/
right = 2 * i + 2; /* right child of i*/
while (right < n)
{
if (array[i] >= array[left] && array[i] >= array[right])
return;
if (array[right] <= array[left])
{
temp = array[i];
array[i] = array[left];
array[left] = temp;
i = left;
}
else
{
temp = array[i];
array[i] = array[right];
array[right] = temp;
i = right;
}
left = 2 * i + 1;
right = 2 * i + 2;
}/*End of while*/
if (left == n - 1 && array[i] < array[left]) { /* only a left child remains and it is larger */
temp = array[i];
array[i] = array[left];
array[left] = temp;
}
}
OUTPUT:
ADVANTAGE:
 provides for dynamic storage management
DISADVANTAGE:
 inefficient and unreliable
VIVA QUESTIONS:
1.What are the sub problems in register allocation strategies?

During register allocation, we select the set of variables that will reside in registers at
a point in the program.
During a subsequent register assignment phase, we pick the specific register that a
variable will reside in.
2. Give the standard storage allocation strategies./List out the various storage allocation
strategies. (AU NOV/DEC 2014)
Static allocation.
Stack allocation.
Heap allocation.
3.Define static allocations and stack allocations

Static allocation lays out storage for all data objects at compile time. Names are
bound to storage as a program is compiled, so there is no need for a run-time support
package.
Stack allocation manages the run-time storage as a stack. It is
based on the idea of a control stack; storage is organized as a stack, and activation records
are pushed and popped as activations begin and end.
4. Give the 2 attributes of syntax directed translation into 3-addr code?

i)E.place, the name that will hold the value of E and
ii)E.code , the sequence of 3-addr statements evaluating E.
5.Define Heap allocation.
Variables local to a procedure are allocated and de-allocated only at runtime. Heap
allocation is used to dynamically allocate memory to the variables and claim it back when the
variables are no more required.
RESULT:
Thus the program for implementation of heap storage allocation has been executed
and verified successfully.
EX.No:9 Construction of DAG
AIM:
To write a C program to construct a DAG.
 TURBO C
THEORY:
DAG:
A directed acyclic graph (DAG) is a directed graph that contains no cycles. A rooted
tree is a special kind of DAG and a DAG is a special kind of directed graph. For example, a
DAG may be used to represent common subexpressions in an optimising compiler.
        +                          +
      .   .                      .   .
     .     .                    .     .
    *       ()                 *<---|  ()
   . .     .  .               . .   | .  .
  .   .   .    .             .   .  | .   |
 a     b f      *           a     b | f   |
               . .                  ^     v
              .   .                 |     |
             a     b                |--<---
       Tree                       DAG
expression: a*b+f(a*b)
Example of Common Subexpression.

The common subexpression a*b need only be compiled once but its value can be
used twice.
A DAG can be used to represent prerequisites in a university course, constraints on
operations to be carried out in building construction, in fact an arbitrary partial-order `<'. An
edge is drawn from a to b whenever a<b. A partial order `<' satisfies:
(i) transitivity, a<b and b<c implies a<c
(ii) irreflexivity, not(a < a)
Standard graph terminology is used throughout this section. In addition, for a
directed edge e we denote by src(e) and dst(e) the source vertex and the destination vertex of
e. Until Section 3.11, when we speak of the subject DAG we mean the expression DAG
without the primary inputs, the primary outputs, and the edges emanating from or incident to
these vertices; these vertices and edges are treated specially in Section 3.11. For instance,
Figure 1 shows a C-code basic block and its corresponding DAG, in which primary inputs are
shown as squares and primary outputs as double circles.
‘Optimizations’ of Basic Blocks

Equivalent transformations: Two basic blocks are equivalent if they compute the
same set of expressions.
-Expressions: are the values of the live variables at the exit of the block.
Two important classes of local transformations:
-structure preserving transformations:
 common sub expression elimination
 dead code elimination
 renaming of temporary variables
 interchange of two independent adjacent statements.
-algebraic transformations (countlessly many):
 simplify expressions
 replace expensive operations with cheaper ones.
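The algebraic transformations listed above can be sketched on a single three-address statement. The statement layout (struct stmt, struct operand) and the function simplify are hypothetical, and for brevity only constants in the right operand are examined. The sketch removes x+0 and x*1 and replaces x*2 with the cheaper x+x.

```c
/* Hypothetical statement form: result = left op right, where each operand is
   either a variable (is_const == 0) or an integer constant (is_const == 1). */
struct operand { int is_const; int value; char name; };
struct stmt { char op; struct operand left, right; };

/* Returns nonzero if operand o is the integer constant v. */
static int is_const_val(struct operand o, int v)
{
    return o.is_const && o.value == v;
}

/* Rewrites *s in place. Returns 1 if the statement became a plain copy of
   s->left (x+0, x*1), 2 if a multiplication by 2 became an addition,
   and 0 if the statement was left unchanged. */
int simplify(struct stmt *s)
{
    if ((s->op == '+' && is_const_val(s->right, 0)) ||
        (s->op == '*' && is_const_val(s->right, 1))) {
        s->op = '=';               /* result = left */
        return 1;
    }
    if (s->op == '*' && is_const_val(s->right, 2)) {
        s->op = '+';               /* x * 2  ->  x + x */
        s->right = s->left;
        return 2;
    }
    return 0;
}
```

A fuller pass would also look at constants in the left operand and fold operations whose operands are both constants.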
The DAG Representation of Basic Blocks
Directed acyclic graphs (DAGs) give a picture of how the value computed by each
statement in the basic block is used in the subsequent statements of the block.
Definition: a dag for a basic block is a directed acyclic graph with the following labels on
nodes:
- leaves are labeled with either variable names or constants.
 they are unique identifiers
 from operators we determine whether l- or r-value.
 represent initial values of names. Subscript with 0.
- interior nodes are labeled by an operator symbol.
- Nodes are also (optionally) given a sequence of identifiers for labels.
- interior nodes represent computed values.
- identifiers in the sequence have that value.
Example of DAG Representation
t1:= 4*i
t2:= a[t1]
t3:= 4*i
t4:= b[t3]
t5:= t2 * t4
t6:= prod + t5
prod:= t6
t7:= i + 1
i:= t7
if i <= 20 goto 1
Three address code
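[Figure: DAG for the three-address code above. Interior nodes: * (t1, t3) over leaves 4 and i0;
[] (t2) over a and the * (t1, t3) node; [] (t4) over b and the same * (t1, t3) node; * (t5) over
the two [] nodes; + (t6, prod) over prod0 and the * (t5) node; + (t7, i) over i0 and 1;
<= (1) over the + (t7, i) node and 20.]
Corresponding DAG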
Utility: Constructing a dag from 3AS is a good way of determining:
• common sub expressions (expressions computed more than once),
• which names are used inside the block but evaluated outside,
• which statements of the block could have their computed value used outside the
block.
Constructing a DAG
Input: a basic block. Statements: (i) x:= y op z (ii) x:= op y (iii) x:= y
Output: a dag for the basic block containing:
- a label for each node. For leaves an identifier - constants are permitted. For
interior nodes an operator symbol.
- for each node a (possibly empty) list of attached identifiers - constants not
permitted.
Method: Initially assume there are no nodes, and node is undefined.
(1) If node(y) is undefined, create a leaf labeled y and let node(y) be this node. In case (i), if
node(z) is undefined, create a leaf labeled z and let that leaf be node(z).
(2) In case (i), determine if there is a node labeled op whose left child is node(y) and right
child is node(z). If not, create such a node. Let n be the node found or created. Cases (ii)
and (iii) are similar.
(3) Delete x from the list attached to node(x). Append x to the list of identifiers for node n
and set node(x) to n.
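Steps (1) to (3) for statements of the form x := y op z can be sketched as follows. This is a simplified illustration, not the program of this exercise: identifiers are assumed to be single letters 'a'..'z', the attached identifier lists are reduced to the node_of table (the current node of each name), no bound check is made against MAX_NODES, and all names are hypothetical.

```c
#include <string.h>

#define MAX_NODES 64
#define MAX_NAMES 26              /* identifiers are single letters 'a'..'z' (assumption) */

struct dag_node { char op; int left, right; char leaf; };   /* op == 0 for a leaf */

struct dag_node nodes[MAX_NODES];
int node_count = 0;
int node_of[MAX_NAMES];           /* node(x): current node of identifier x, -1 if undefined */

/* Empties the DAG and marks every identifier as undefined. */
void dag_reset(void)
{
    node_count = 0;
    memset(node_of, -1, sizeof node_of);   /* all bytes 0xFF, so every entry is -1 */
}

/* Step (1): return node(y), creating a leaf labelled y if it is undefined. */
static int operand_node(char y)
{
    if (node_of[y - 'a'] < 0) {
        nodes[node_count].op = 0;
        nodes[node_count].leaf = y;
        nodes[node_count].left = nodes[node_count].right = -1;
        node_of[y - 'a'] = node_count++;
    }
    return node_of[y - 'a'];
}

/* Processes x := y op z and returns the index of the node that now holds x. */
int dag_assign(char x, char y, char op, char z)
{
    int l = operand_node(y), r = operand_node(z), n;
    for (n = 0; n < node_count; n++)       /* step (2): look for an existing node */
        if (nodes[n].op == op && nodes[n].left == l && nodes[n].right == r)
            break;
    if (n == node_count) {                 /* not found: create the interior node */
        nodes[n].op = op;
        nodes[n].left = l;
        nodes[n].right = r;
        nodes[n].leaf = 0;
        node_count++;
    }
    node_of[x - 'a'] = n;                  /* step (3): node(x) := n */
    return n;
}
```

Processing p := a*b and then q := a*b yields the same node for p and q, which is how the common subexpression is detected. After b := a*b the name b refers to that interior node, so a later r := a*b gets a new node.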
Algorithm DAG: Constructing a DAG
Input: A basic block of three address statements. No pointers or array references.
Output: A DAG where each node n has a value, VALUE(n), which is an operator in the case
of an interior node or a variable name if the node is a leaf. Also, each node n has a (possibly
empty) list of identifiers attached, ID(n).
The DAG Representation of Basic Blocks
The previous algorithm aims at improving the quality of the target code, but only with respect
to register utilization.
There are a number of other issues in the generation of efficient code. One of them is the
elimination of redundant computation. Thus, in the sequence
x := b*c*d+b*c*2.0
b := b*c*3.0
y := b*c+1.0
the same computation of b*c is done three times (the fourth occurrence is not the same because
b is reassigned).
The following algorithm identifies and removes common subexpressions using a DAG as
intermediate representation. This algorithm assumes that there are no array elements or
pointers in the basic block. Traditional compiler optimizations do not deal naturally with
array references and pointers.
Two special operators
The [ ] operator is used to index a (one dimensional) array.
a:=b[i] can be translated as
(1) l R,b(R)
if i is in register R
(2) l R,M
l R,b(R)
if i is in memory location M
(3) l R,S(A)
l R,b(R)
if i is at stack offset S.
The * operator is similar. For example, (2) above is replaced by
l R,M
l R,*R
getreg(y,I)
if there is register R such that RD(R) = {y} and NEXTUSE(y,I) is empty then
return(R)
endif
if there is R in REGISTERS such that RD(R) is empty then
return(R)
endif
R := getanyregister()
forall v in RD(R) do
AD(v) := AD(v) - {R}
if SYMTAB.loc(v) is not in AD(v) then
generate(st R,SYMTAB.loc(v))
AD(v) := AD(v) + {SYMTAB.loc(v)}
endif
enddo
return(R)
ALGORITHM:
1. Compute the indegrees of all vertices.
2. Find a vertex U with indegree 0 and print it (store it in the ordering).
If there is no such vertex then there is a cycle
and the vertices cannot be ordered. Stop.
3. Remove U and all its edges (U,V) from the graph.
4. Update the indegrees of the remaining vertices.
5. Repeat steps 2 through 4 while there are vertices to be processed.
PROGRAM:
#include <stdio.h>
#include <stdlib.h> // for malloc() and free().
#define N 11 // no of total vertices in the graph.
typedef enum {FALSE, TRUE} bool;

typedef struct node node;
struct node {
int count; // for arraynodes : in-degree.
// for listnodes : vertex no this vertex is connected to.
// if this node is out of graph : -1.
// if this has 0 indegree then it occurs in zerolist.
node *next;
};
node graph[N];
node *zerolist;
void addToZerolist( int v ) {

/*
* adds v to zerolist as v has 0 predecessors.
*/
node *ptr = (node *)malloc( sizeof(node) );
ptr->count = v;
ptr->next = zerolist;
zerolist = ptr;
}
void buildGraph( int a[][2], int edges ) {

/*
* fills global graph with input given in a.
* a[i][0] is src vertex and a[i][1] is dst vertex.
*/
int i;
// init graph.
for( i=0; i<N; ++i ) {
graph[i].count = 0;
graph[i].next = NULL;
}
// now add the list entries.

for( i=0; i<edges; ++i ) {
// add new node to src list.
node *ptr = (node *)malloc( sizeof(node) );
ptr->count = a[i][1];
ptr->next = graph[ a[i][0] ].next;
graph[ a[i][0] ].next = ptr;
// increase indegree of dst.
graph[ a[i][1] ].count++;
}
// now create list of zero predecessors.
zerolist = NULL; // list of vertices having 0 predecessors.
for( i=0; i<N; ++i )
if( graph[i].count == 0 ) {
addToZerolist(i);
}
}
void printGraph() {
int i;
node *ptr;
for( i=0; i<N; ++i ) {

node *ptr;
printf( "%d: pred=%d: ", i, graph[i].count );
for( ptr=graph[i].next; ptr; ptr=ptr->next )
printf( "%d ", ptr->count );
printf( "\n" );
}
printf( "zerolist: " );
for( ptr=zerolist; ptr; ptr=ptr->next )
printf( "%d ", ptr->count );
printf( "\n" );
}
int getZeroVertex() {
/*
* returns the vertex with zero predecessors.
* if no such vertex then returns -1.
*/
int v;
node *ptr;
if( zerolist == NULL )

return -1;
ptr = zerolist;
v = ptr->count;
zerolist = zerolist->next;
free(ptr);
return v;
}
void removeVertex( int v ) {

/*
* deletes vertex v and its outgoing edges from global graph.
*/
node *ptr;
graph[v].count = -1;
// walk the list graph[v].next and decrease the indegree of each successor.
for( ptr=graph[v].next; ptr; ptr=ptr->next ) {
if( graph[ ptr->count ].count > 0 ) // normal nodes.
graph[ ptr->count ].count--;
if( graph[ ptr->count ].count == 0 ) // this is NOT else of above if.
addToZerolist( ptr->count );
}
}
void topsort( int nvert ) {

/*
* finds recursively topological order of global graph.
* nvert vertices of graph are needed to be ordered.
*/
int v;
if( nvert > 0 ) {

v = getZeroVertex();
if( v == -1 ) { // no such vertex.
fprintf( stderr, "graph contains a cycle.\n" );
return;
}
printf( "%d.\n", v );
removeVertex(v);
topsort( nvert-1 );
}
}
int main() {
int a[][2] = {
{0,1},
{0,3},
{0,2},
{1,4},
{2,4},
{2,5},
{3,4},
{3,5}
};
buildGraph( a, 8 );
printGraph();
topsort(N);
}
OUTPUT:
VIVA QUESTIONS:
1.What is a DAG? Mention its applications.
Directed acyclic graph (DAG) is a useful data structure for implementing
transformations on basic blocks.
DAG is used in
 Determining the common sub-expressions.
 Determining which names are used inside the block and computed outside the block.
 Determining which statements of the block could have their computed value used outside
the block.
 Simplifying the list of quadruples by eliminating the common sub-expressions and
not performing the assignment of the form x := y unless and until it is a must.
2.Define peephole optimization.
Peephole optimization is a simple and effective technique for locally improving target code.
This technique is applied to improve the performance of the target program by examining the
short sequence of target instructions and replacing these instructions by shorter or faster
sequence.
3.List the characteristics of peephole optimization.
 Redundant instruction elimination
 Flow of control optimization
 Algebraic simplification
 Use of machine idioms
4.What is a basic block?
A basic block is a sequence of consecutive statements in which flow of control enters at the
beginning and leaves at the end without halting or branching, except possibly at the end.
Eg. t1:=a*5
t2:=t1+7
t3:=t2-5
t4:=t1+t3
t5:=t2+b
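The definition in question 4 suggests how a list of three-address statements can be split into basic blocks: the first statement, every jump target, and every statement that follows a jump start a new block. The sketch below marks these leaders. The record struct tac and the function find_leaders are assumptions made for this example, with jump targets given as statement indices.

```c
/* Hypothetical instruction record: is_jump is nonzero for (conditional or
   unconditional) jumps, and target is the index of the jump destination. */
struct tac { int is_jump; int target; };

/* Sets leader[i] = 1 for every statement that starts a basic block and
   returns the number of basic blocks. */
int find_leaders(const struct tac *code, int n, int *leader)
{
    int i, blocks = 0;
    for (i = 0; i < n; i++)
        leader[i] = 0;
    if (n > 0)
        leader[0] = 1;                        /* the first statement */
    for (i = 0; i < n; i++) {
        if (!code[i].is_jump)
            continue;
        if (code[i].target >= 0 && code[i].target < n)
            leader[code[i].target] = 1;       /* a jump target starts a block */
        if (i + 1 < n)
            leader[i + 1] = 1;                /* so does the statement after a jump */
    }
    for (i = 0; i < n; i++)
        blocks += leader[i];
    return blocks;
}
```

The five statements of the example above contain no jumps, so they form a single basic block.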
RESULT:
Thus the program for construction of DAG has been implemented and executed
successfully.
S.A. ENGINEERING COLLEGE, CHENNAI-77
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
VISION
To conceive our department as centre of academic excellence by catering quality education
with ethical standards.
MISSION
To create a conducive atmosphere to achieve active professionalism by fortifying academic
proficiency with ethical standards and to enhance the confidence level to develop sustainable
solutions to upgrade the society forever.
PROGRAMME EDUCATIONAL OBJECTIVES
PEO I:
Our graduates will have professional competency in the field of Computer Science and
Engineering to investigate, analyze and demonstrate problem solving skills across broad
range of application areas with sound technical expertise.
PEO II:
Our graduates will have ethical standards, leadership qualities, communication, presentation
and team work skills necessary to function effectively and professionally.
PEO III:
Our graduates will adapt to new technologies, tools and methodologies, to assess and respond
to the challenges of the changing environment and needs of the society by providing
sustainable innovative solutions to upgrade the society forever.
PROGRAM OUTCOMES
a. Fundamental Knowledge: To apply knowledge of Mathematics, Science and Computing

fundamentals to the problems appropriate to the discipline.
b. Problem Analysis: To identify, analyze and arrive substantiated conclusions for problems
exploiting the fundamental engineering and domain specific knowledge.
c. Design skills: To design a computer-based system, process, module or program to meet
desired needs.
d. Investigation Skills: To conduct investigations of experimental design and data analysis to
provide valid conclusions.
e. Modern Tool Usage: To identify and use suitable techniques, resources and tools necessary
for computing practice.
f. Reasoning skills: To apply reasoning skills to domain specific problems to address the
environmental issues.
g. Sustainability: To understand the impact of computer engineering solutions to necessitate the
sustainable development.
h. Ethical Standards: To comprehend professional and ethical standards and responsibilities.
i. Team Work: To develop self-confidence to work individually and also effectively with
teams.
j. Presentation skills: To communicate effectively with engineering fraternity to make
effective presentations.
k. Project Management: To demonstrate engineering and management principles in order to
evaluate and manage projects as a member.
l. Lifelong Learning: To recognize the need for, and engage in life-long learning in the context
of technological change.
CS6612 COMPILER LABORATORY L T P C: 0 0 3 2
OBJECTIVES:
The student should be made to:
Be exposed to compiler writing tools.
Learn to implement the different Phases of compiler
Be familiar with control flow and data flow analysis
Learn simple optimization techniques
LIST OF EXPERIMENTS:
1. Implementation of Symbol Table.
2. Develop a lexical analyzer to recognize a few patterns in C. (Ex. identifiers, constants, comments,
operators etc.)
3. Implementation of Lexical Analyzer using Lex Tool.
4. Generate YACC specification for a few syntactic categories.
a) Program to recognize a valid arithmetic expression that uses operator +, -, * and /.
b) Program to recognize a valid variable which starts with a letter followed by any number of letters
or digits.
c) Implementation of Calculator using LEX and YACC
5. Convert the BNF rules into Yacc form and write code to generate Abstract Syntax Tree.
6. Implement type checking
7. Implement control flow analysis and Data flow Analysis
8. Implement any one storage allocation strategies (Heap, Stack, Static)
9. Construction of DAG
10. Implement the back end of the compiler which takes the three address code and produces the 8086
assembly language instructions that can be assembled and run using a 8086 assembler. The target
assembly instructions can be simple move, add, sub, jump. Also simple addressing modes are used.
11. Implementation of Simple Code Optimization Techniques (Constant Folding, etc.)
TOTAL: 45 PERIODS
OUTCOMES:
At the end of the course, the student should be able to
Implement the different Phases of compiler using tools
Analyze the control flow and data flow of a typical program
Optimize a given program
Generate an assembly language program equivalent to a source language program
LIST OF EQUIPMENT FOR A BATCH OF 30 STUDENTS:

Standalone desktops with C / C++ compiler and Compiler writing tools 30 Nos.
(or) Server with C / C++ compiler and Compiler writing tools supporting 30 terminals or more.
LEX and YACC
INDEX
Ex.No Date Name of the Experiments Page No
1 Implementation of Symbol Table
2 Develop a lexical analyzer to recognize a few patterns in C. (Ex. identifiers, constants, comments, operators etc.)
3 Implementation of Lexical Analyzer using LEX Tool.
4 Generate YACC specification for a few syntactic categories.
a) Program to recognize a valid arithmetic expression that uses operator +, -, * and /.
b) Program to recognize a valid variable which starts with a letter followed by any number of letters or digits.
c) Implementation of Calculator using LEX and YACC
5 Convert the BNF rules into YACC form and write code to generate Abstract Syntax Tree.
6 Implement type checking
7 Implement control flow analysis
8 Implementation of storage allocation strategies (Heap storage allocation)
9 Construction of DAG
10 Implement the back end of the compiler
11 Implementation of Simple Code Optimization Techniques
Content Beyond the Syllabus
12 Implementation of Code Generator
13 Construction of NFA from Regular Expression
SIGNATURE OF LAB IN-CHARGE
(Ms.J.Sangeetha/AP/CSE)
