International Journal on Emerging Technologies (Special Issue NCETST-2017) 8(1): 377-381(2017)
(Published by Research Trend, Website: www.researchtrend.net)
ISSN No. (Print) : 0975-8364
ISSN No. (Online) : 2249-3255
An Analysis of Compiler Design in Context of Lexical Analyzer
Praveen Saini and Renu Sharma
Department of Computer Science & Engineering,
Amrapali Institute of Technology & Sciences, Haldwani (U.K.)
ABSTRACT: In order to reduce the complexity of designing and building computers, nearly all computers are made to execute relatively simple commands. A program for a computer must be built by combining these very simple commands into what is called a machine-language program. Since this is a tedious and error-prone process, most programming is instead done using a high-level programming language. This language can be very different from the machine language that the computer can execute, so some means of bridging the gap is required. This is where the compiler comes in. A compiler translates a program written in a high-level programming language into the low-level machine language that is required by computers. On the other hand, programs that are written in a high-level language and automatically translated to machine language may run somewhat slower than programs that are hand-coded in machine language. Hence, some time-critical programs are still written partly in machine language. A good compiler will, however, be able to get very close to the speed of hand-written machine code when translating well-structured programs. During this process, the compiler will also attempt to spot and report obvious programmer mistakes. A typical way of doing this is to split the compilation into several phases with well-defined interfaces. Conceptually, these phases operate in sequence, and it is common to let each phase be handled by a separate module. We present a review of the working of the first and most important phase of the compiler, known as the lexical analyzer. Lexical analysis programs are written with Lex. A Lex source is a table of regular expressions and corresponding program fragments. The table is translated to a program which reads an input stream, copying it to an output stream and partitioning the input into strings which match the given expressions. As each such string is recognized, the corresponding program fragment is executed. The recognition of the expressions is performed by a deterministic finite automaton generated by Lex. The program fragments written by the user are executed in the order in which the corresponding regular expressions occur in the input stream. The aim of this paper is to present the working of the lexical analyzer in the simplest possible way, so as to provide in-depth knowledge of this crucial phase of compiler design.
Index Terms - Lex, Yacc Parser, Parser-Lexer
I. INTRODUCTION
A compiler is system software that converts a program written in a high-level programming language into an equivalent program in a low-level (machine) target language. It validates the input program and shows error messages or warnings if there are any; in other words, it attempts to mark and detail the mistakes made by the programmer [1]. Many of the techniques used to construct a compiler are useful in a wide variety of applications involving symbolic data. In particular, every man-machine interface is a form of programming language, and the handling of its input involves these techniques.
The term compilation denotes the conversion of an algorithm expressed in a human-oriented source language into an equivalent algorithm expressed in a hardware-oriented target language. The basic diagram of a compiler is shown in figure 1.

Fig. 1. Basic diagram of compiler.
II. PHASES OF A GENERAL COMPILER

The compiler is made up of different modules or phases. Starting with token recognition, it runs through context-free grammar generation, parsing, acceptability checking, and machine-independent intermediate code generation, and finally reaches the target code generation stage. These phases act as a communication interface between the user and the processor [1, 3]. The first phase of the compiler is lexical analysis. The word "lexical" in the traditional sense means "pertaining to words". In terms of programming languages, words are objects like variable names, numbers, keywords, etc. Such words are traditionally called tokens. The main phases of a compiler are Lexical Analysis, Syntax Analysis, Semantic Analysis, Intermediate Code Generation, Code Optimization, and Target Code Generation. The various phases of the compiler are shown in figure 2.

Fig. 2. Phases of compiler.

III. WORKING PRINCIPLE OF LEXICAL ANALYZER

A lexical analyzer, or lexer for short, takes as its input a string of individual letters and divides this string into tokens. Additionally, it filters out whatever separates the tokens (the so-called white-space), i.e., layout characters (spaces, newlines, etc.) and comments. The main purpose of lexical analysis is to make life easier for the subsequent syntax analysis phase. Lex and yacc were both developed at Bell Laboratories in the 1970s. Yacc was the first of the two, developed by Stephen C. Johnson. Lex was designed by Mike Lesk and Eric Schmidt to work with yacc. Both lex and yacc have been standard UNIX utilities since 7th Edition UNIX.
Lex takes raw input, which is a stream of characters, and converts it into a stream of tokens, which are logical units, each representing one or more characters that belong together. Typically:
1. Each keyword is a token, e.g., then, begin, integer.
2. Each identifier is a token, e.g., a, zap.
3. Each constant is a token, e.g., 123, 123.45, 1.2E3.
4. Each sign is a token, e.g., (, <, <=, +.
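A small Lex-style sketch of such token classes is shown below. The token names and the printf actions are illustrative assumptions, not taken from the paper; the point is only that each class above becomes one rule:

%{
#include <stdio.h>
%}
%%
then|begin|integer            { printf("KEYWORD(%s) ", yytext); }  /* keywords          */
[A-Za-z][A-Za-z0-9]*          { printf("ID(%s) ", yytext); }       /* identifiers       */
[0-9]+(\.[0-9]+)?(E[0-9]+)?   { printf("CONST(%s) ", yytext); }    /* numeric constants */
"<="|"<"|"("|"+"              { printf("SIGN(%s) ", yytext); }     /* signs             */
[ \t\n]+                      ;                                    /* skip white space  */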
Lexical analysis groups the stream of refined input characters into tokens; figure 3 illustrates this.
Fig. 3. Phases in Lexical Analysis.
A. Approaches to Building Lexical Analyzers
The lexical analyzer is the only phase that processes its input character by character, so speed is critical. Either write it yourself, controlling your own input buffering, or use a tool that takes specifications of tokens, often in regular expression notation, and produces a table-driven lexical analyzer for you.
A lexer has to distinguish between several different types of tokens, e.g. numbers, variables and keywords. A lexer does not check whether its entire input is included in the languages defined by the regular expressions. Instead, it has to cut the input into pieces (tokens), each of which is included in one of the languages.
If there are several ways to split the input into legal tokens, the lexer has to decide which of these it should use.
The simplest approach would be to generate a DFA for each token definition and apply the DFAs one at a time to the input. This can, however, be quite slow, so we will instead generate, from the set of token definitions, a single DFA that tests for all the tokens simultaneously. This is not difficult to do: if the tokens are defined by regular expressions r1, r2, ..., rn, then the regular expression r1 | r2 | ... | rn describes the union of the languages of r1, r2, ..., rn, and the DFA constructed from this combined regular expression will scan for all token types at the same time. However, we also wish to distinguish between different token types, so we must be able to know which of the many tokens was recognized by the DFA [4].
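To make the idea concrete, the following is a minimal C sketch of such a single table-driven DFA; the token set (IF, ID, NUM), the character classes and the state numbering are assumptions made for illustration, not the paper's code. Each accepting state carries a token name, and the scanner remembers the longest accepted prefix, anticipating the longest-match rule discussed in the next subsection.

#include <stdio.h>
#include <ctype.h>

enum { T_NONE, T_IF, T_ID, T_NUM };
static const char *name[] = { "NONE", "IF", "ID", "NUM" };

/* character classes: 0 = 'i', 1 = 'f', 2 = other letter, 3 = digit, 4 = other */
static int cls(int c) {
    if (c == 'i') return 0;
    if (c == 'f') return 1;
    if (isalpha(c)) return 2;
    if (isdigit(c)) return 3;
    return 4;
}

/* combined DFA: rows = states, columns = character classes, -1 = dead state */
static const int delta[5][5] = {
    /* state 0: start      */ {  1,  3,  3,  4, -1 },
    /* state 1: "i" seen   */ {  3,  2,  3,  3, -1 },
    /* state 2: "if" seen  */ {  3,  3,  3,  3, -1 },
    /* state 3: identifier */ {  3,  3,  3,  3, -1 },
    /* state 4: number     */ { -1, -1, -1,  4, -1 },
};
/* every accepting state is labelled with the token it recognizes */
static const int accept_tok[5] = { T_NONE, T_ID, T_IF, T_ID, T_NUM };

/* scan one token starting at s, keeping the longest accepted prefix */
static int scan(const char *s, int *tok) {
    int state = 0, i = 0, last_len = 0, last_tok = T_NONE;
    while (s[i] != '\0' && state != -1) {
        state = delta[state][cls((unsigned char)s[i])];
        i++;
        if (state != -1 && accept_tok[state] != T_NONE) {
            last_len = i;              /* remember the longest match so far */
            last_tok = accept_tok[state];
        }
    }
    *tok = last_tok;
    return last_len;
}

int main(void) {
    const char *input = "if17";
    int tok, len = scan(input, &tok);
    printf("%s \"%.*s\"\n", name[tok], len, input);   /* prints: ID "if17" */
    return 0;
}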
B. Splitting the input stream
As mentioned, the lexer must cut the input into tokens. This may be done in several ways. For example, the string if17 can be split in many different ways:
- As one token, which is the variable name if17.
- As the variable name if1 followed by the number 7.
- As the keyword if followed by the number 17.
- As the keyword if followed by the numbers 1 and 7.
- As the variable name i followed by the variable name f17.
- And several more.
A common convention is that it is the longest prefix of the input that matches any token which will be chosen. Hence, the first of the above possible splittings of if17 will be chosen. Note that the principle of the longest match takes precedence over the order of definition of tokens, so even though the string starts with the keyword if, which has higher priority than variable names, the variable name is chosen because it is longer.
To illustrate the precedence rule, figure 4 shows an NFA made by combining NFAs for variable names, the keyword if, integers and floats. When a transition is labeled by a set of characters, it is a shorthand for a set of transitions each labeled by a single character. The accepting states are labeled with token names as described above.

Fig. 4. Combined NFA for several tokens.

The corresponding minimized DFA is shown in figure 5. Note that state G is a combination of states 9 and 12 from the NFA, so it can accept both NUM and FLOAT, but since integers take priority over floats, we have marked G with NUM only.

Fig. 5. Combined DFA for several tokens.
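In Lex the same two conventions apply: the longest match wins, and when two rules match the same longest prefix the rule written first wins. A hedged sketch (the printf actions are illustrative assumptions):

%%
if                      printf("KEYWORD(if) ");
[A-Za-z][A-Za-z0-9]*    printf("ID(%s) ", yytext);
[0-9]+                  printf("NUM(%s) ", yytext);
[ \t\n]+                ;

On the input if17 the identifier rule matches the longer prefix, so an identifier is reported even though the keyword rule is written first; on the input if alone both rules match two characters and the keyword rule wins because it comes first, matching the priority behaviour described above.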
Let us discuss this with the help of an example; suppose the pseudo code:

if (x*y<10)
{
    Z = x;
}

Let us consider the first statement of the above code.
The corresponding token stream of pairs <type, value> is shown in Figure 6. Lex and the input system together constitute the layers of the lexical analyzer [5].

Fig. 6. Output Stage of Lexical Analyzer.
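As a hedged illustration only (the token type names below are assumptions, the actual pairs being those of Figure 6), the statement if (x*y<10) might be delivered to the parser as a stream such as:

<KEYWORD, if> <LPAREN, (> <ID, x> <MUL, *> <ID, y> <RELOP, "<"> <NUM, 10> <RPAREN, )>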
A sample program for Lex:

%%
        int k;
-?[0-9]+        {
        k = atoi(yytext);
        printf("%d", k%7 == 0 ? k+3 : k);
        }
-?[0-9.]+       ECHO;
[A-Za-z][A-Za-z0-9]+    ECHO;

The rule -?[0-9]+ recognizes strings of digits, optionally preceded by a minus sign; atoi converts the digits to binary and stores the result in k. The operator % (remainder) is used to check whether k is divisible by 7; if it is, it is incremented by 3 as it is written out. It may be objected that a program containing only the digit rule would alter input items such as 49.63 or X7, and that it would increment the absolute value of all negative numbers divisible by 7; the rules -?[0-9.]+ and [A-Za-z][A-Za-z0-9]+ echo such number-like and identifier-like strings unchanged.
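Assuming a standard UNIX environment (a hedged usage note, not part of the paper; the file name count.l is hypothetical), such a specification can be turned into a runnable filter, with the Lex library supplying main():

lex count.l          # generates lex.yy.c from the specification
cc lex.yy.c -ll      # the -ll library provides main() and yywrap()
./a.out < input.txt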
C. Parser-Lexer Communication
When you use a lex scanner and a yacc parser together, the parser is the higher-level routine. It calls the lexer, yylex(), whenever it needs a token from the input. The lexer then scans through the input, recognizing tokens. As soon as it finds a token of interest to the parser, it returns to the parser, returning the token's code as the value of yylex(). Not all tokens are of interest to the parser; in most programming languages the parser doesn't want to hear about comments and white space. The lexer and the parser have to agree on what the token codes are.
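A minimal, hedged sketch of this agreement (the file names calc.y and calc.l, the grammar and the token names are illustrative assumptions, not taken from the paper): the token codes are declared once in the yacc grammar, and the header produced by yacc -d is included by the lexer so that both sides use the same values.

/* calc.y -- parser side: declares the token codes */
%token NUMBER IDENT
%%
stmt  : IDENT '=' NUMBER '\n' ;
%%

/* calc.l -- lexer side: returns the codes declared above */
%{
#include "y.tab.h"              /* token codes written by yacc -d */
%}
%%
[0-9]+                  return NUMBER;
[A-Za-z][A-Za-z0-9]*    return IDENT;
[ \t]+                  ;        /* white space: not of interest to the parser */
.|\n                    return yytext[0];

A typical build would be: yacc -d calc.y; lex calc.l; cc y.tab.c lex.yy.c -ly -ll, where the -ly library supplies main() and yyerror().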
D. Left Context Sensitivity
Sometimes it is desirable to have several sets of lexical rules applied at different times in the input. For example, a compiler preprocessor might distinguish preprocessor statements and analyze them differently from ordinary statements. This requires sensitivity to prior context, and there are several ways of handling such problems. The ^ operator, for example, is a prior-context operator, recognizing immediately preceding left context just as $ recognizes immediately following right context. Adjacent left context could be extended to produce a facility similar to that for adjacent right context, but it is unlikely to be as useful, since often the relevant left context appeared some time earlier, such as at the beginning of a line. Consider the following problem: copy the input to the output, changing the word magic to first on every line which began with the letter a, changing magic to second on every line which began with the letter b, and changing magic to third on every line which began with the letter c. All other words and all other lines are left unchanged. These rules are so simple that the easiest way to do this job is with a flag:

        int flag;
%%
^a      {flag = 'a'; ECHO;}
^b      {flag = 'b'; ECHO;}
^c      {flag = 'c'; ECHO;}
\n      {flag = 0; ECHO;}
magic   {
        switch (flag)
        {
        case 'a': printf("first"); break;
        case 'b': printf("second"); break;
        case 'c': printf("third"); break;
        default: ECHO; break;
        }
        }
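Lex's start conditions offer another way of handling the same problem; the following sketch (standard Lex start-condition syntax, intended to be equivalent in effect to the flag version above) keeps the per-line context in the scanner state instead of in a user variable:

%Start AA BB CC
%%
^a          {ECHO; BEGIN AA;}
^b          {ECHO; BEGIN BB;}
^c          {ECHO; BEGIN CC;}
\n          {ECHO; BEGIN 0;}
<AA>magic   printf("first");
<BB>magic   printf("second");
<CC>magic   printf("third");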
E. Error Handling
Error handling in the lexer is basically concerned with errors in the compiler or its environment, design errors in the program being compiled, an incomplete understanding of the source language, transcription errors, incorrect data, etc. The tasks of the error handling process are to detect each error, report it to the user, and possibly make some repair to allow processing to continue.
It cannot generally determine the cause of the error, but can only diagnose the visible symptoms. Similarly, any repair cannot be considered a correction (in the sense that it carries out the user's intent); it merely neutralizes the symptom so that processing may continue. The purpose of error handling is to aid the programmer by highlighting inconsistencies. It has a low frequency in comparison with other compiler tasks, and hence the time required to complete it is largely irrelevant, but it cannot be regarded as an 'add-on' feature of a compiler. We distinguish between the actual error and its symptoms. The diagnosis always involves some uncertainty, so we may choose simply to report the symptoms with no further attempt at diagnosis. Thus the word 'error' is often used when 'symptom' would be more appropriate.
A simple example of the symptom/error distinction is the use of an undeclared identifier in LAX. The use is only a symptom, and could have arisen in several ways:
- The identifier was misspelled on this use.
- The declaration was misspelled or omitted.
- The syntactic structure has been corrupted, causing this use to fall outside of the scope of the declaration.
Most compilers simply report the symptom and let the user perform the diagnosis. An error is detectable if and only if it results in a symptom that violates the definition of the language. This means that the error handling procedure is dependent upon the language definition, but independent of the particular source program being analyzed. For example, spelling errors in an identifier will be detectable in LAX (provided that they do not result in another declared identifier) but not in FORTRAN, which will simply treat the misspelling as a new implicit declaration.
We shall use the term anomaly to denote something that appears suspicious, but that we cannot be certain is an error. Anomalies cannot be derived mechanically from the language definition, but require some exercise of judgment on the part of the implementers. As experience is gained with the users of a particular language, one can spot frequently occurring errors and report them as anomalies before their symptoms arise.
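As a minimal, hedged illustration of error detection at the lexical level (not taken from the paper; the rule set and message text are assumptions), a Lex specification can end with a catch-all rule that reports any character no other rule matches, while processing continues with the next character:

%{
#include <stdio.h>
%}
%%
[ \t\n]+                ;                       /* white space is skipped */
[0-9]+                  printf("NUM ");         /* a recognized token     */
[A-Za-z][A-Za-z0-9]*    printf("ID ");          /* a recognized token     */
.                       fprintf(stderr, "illegal character '%s'\n", yytext);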
IV. CONCLUSION

This paper outlines a novel approach to the lexical phase in compiler construction. Furthermore, expressiveness is barely sacrificed; the compiler can be bootstrapped provided there is enough run-time support. Although the scope of data storage is limited and only a few symbols are used, the main aim has been a clear conception and application of an efficient look-up table approach to finite-state generation for lexical analysis. The next phase of compilation is introduced only to indicate its utility, for the sake of completeness and better understanding. Further study on extending this model with parser generation to generate language constructs, as well as error recovery in lexical analysis, is in progress. The compiler has been used, for example, to study advanced topics such as the implementation of first-class continuations and register allocation.
REFERENCES

[1]. Aho, Alfred V., Sethi, Ravi and Ullman, Jeffrey D. (2007). Compilers: Principles, Techniques, and Tools. Addison-Wesley.
[2]. Mogensen, Torben Ægidius (2009). Basics of Compiler Design, Extended Edition. Lulu.
[3]. Galles, David (2005). Modern Compiler Design. Addison-Wesley.
[4]. Mogensen, Torben Ægidius (2009). Basics of Compiler Design, Extended Edition. Lulu.
[5]. International Journal of Computer Applications (0975-8887), 6(11), September 2010.
[6]. Waite, William M. Department of Electrical Engineering, University of Colorado, Boulder, Colorado 80309, USA. Email: William.Waite@colorado.edu.
[7]. Aho, Alfred V. and Ullman, Jeffrey D. (1972). The Theory of Parsing, Translation, and Compiling. Prentice-Hall, Englewood Cliffs.
[8]. Aho, Alfred V. and Ullman, Jeffrey D. (1977). Principles of Compiler Design. Addison-Wesley.
[9]. Ross, D. T. (1967). The AED free storage package. Communications of the ACM, 10(8): 481-492.
[10]. Rutishauser, H. (1952). Automatische Rechenplanfertigung bei programmgesteuerten Rechenmaschinen. Mitteilungen aus dem Institut für Angewandte Mathematik der ETH Zürich, 3.
[11]. Sale, Arthur H. J. (1971). The classification of FORTRAN statements. Computer Journal, 14: 10-12.
[12]. Sale, Arthur H. J. (1977). Comments on 'Report on the programming language Euclid'. ACM SIGPLAN Notices, 12(4): 10.
[13]. Sale, Arthur H. J. (1979). A note on scope, one-pass compilers, and Pascal. Pascal News, 15: 62-63.