Lexical Analysis
Lexical Analyzer : -
→ The lexical analyzer is the first phase of a compiler.
→ Each token is a single cohesive unit, such as an identifier, keyword, operator or punctuation mark.
[Figure: source program → lexical analyzer → token → parser (syntax analyzer) → rest of compiler → target code, with the symbol table]
→ As the lexical analyzer scans the program to recognize tokens, it is also called a scanner.
→ Upon receiving a "get next token" request from the parser, the lexical analyzer reads input characters until it can identify the next token.
→ It may also perform secondary tasks at the user interface.
→ One such task is stripping out of the source program comments & white spaces in the form of blanks & newline characters.
Functions of Lexical Analyzer : -
→ Produces a stream of tokens.
→ Eliminates blanks & comments.
→ Generates the symbol table, which stores information about identifiers, constants, etc.
→ Keeps track of line numbers.
→ Reports errors while generating tokens.
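The functions listed above can be sketched in a few lines of Python. This is only an illustration under assumptions: the token patterns in `TOKEN_SPEC`, the keyword set and the function name `tokenize` are hypothetical choices for a tiny C-like language, not something given in the notes.

```python
import re

# Hypothetical token patterns for a tiny C-like language (an assumption).
# Order matters: earlier alternatives win, so COMMENT is tried before OPERATOR.
TOKEN_SPEC = [
    ("COMMENT",    r"/\*.*?\*/|//[^\n]*"),
    ("NEWLINE",    r"\n"),
    ("BLANK",      r"[ \t]+"),
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("OPERATOR",   r"[-+*/<>=(),;]"),
    ("ERROR",      r"."),   # any other single character
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC), re.DOTALL)
KEYWORDS = {"if", "int", "void"}   # assumed sample of keywords

def tokenize(source):
    """Return (tokens, symbol_table, errors) for the given source text."""
    tokens, symbol_table, errors = [], {}, []
    line = 1
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "NEWLINE":
            line += 1                        # keep track of line numbers
        elif kind == "COMMENT":
            line += lexeme.count("\n")       # comments are eliminated
        elif kind == "BLANK":
            pass                             # blanks are eliminated
        elif kind == "ERROR":
            errors.append((line, lexeme))    # report the error with its line
        else:
            if kind == "IDENTIFIER" and lexeme in KEYWORDS:
                kind = "KEYWORD"
            elif kind in ("IDENTIFIER", "NUMBER"):
                # record identifiers and constants in the symbol table
                symbol_table.setdefault(lexeme, {"kind": kind, "first_line": line})
            tokens.append((kind, lexeme, line))
    return tokens, symbol_table, errors
```

Calling `tokenize("int a; // c\na = a + 1 @\n")` would return the token stream without the comment and blanks, a symbol table holding `a` and `1`, and one error entry for `@` on line 2.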
→ The lexical analyzer works in 2 phases : -
② In the 2nd phase, it does lexical analysis, meaning it generates a series of tokens.
Tokens, Patterns & Lexemes
Token : -
A sequence of characters having a collective meaning is called a token.
Pattern : -
The set of rules that describe a token are its patterns.
Lexemes : -
The sequence of characters matched by the pattern of a token is called a lexeme.
For e.g. int, i, num, etc.
Example : -
if(a<b)
if - keyword
( - operator
a - identifier
< - operator
b - identifier
) - operator
Identifier : -
→ An identifier is a collection of letters.
→ An identifier is a collection of alphanumeric characters.
→ The first character of an identifier must be a letter.
Operator : -
→ Operators can be arithmetic, logical or relational operators.
→ Parentheses are considered as operators.
→ Comma is treated as a separation operator.
→ Assignment is denoted by an operator.
Keyword : -
→ Keywords are special words to which some meaning is associated.
→ For e.g. int, void are keywords denoting data types.
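The rules for identifiers, operators and keywords can be checked with a small classifier. This is a rough sketch: the sets `KEYWORDS` and `OPERATORS` are assumed samples, and the names `is_identifier` and `classify` are hypothetical.

```python
KEYWORDS = {"if", "int", "void"}   # assumed sample of keywords
# Assumed sample of operators; parentheses and comma count as operators here.
OPERATORS = {"+", "-", "*", "/", "<", ">", "=", "(", ")", ","}

def is_identifier(lexeme):
    """True if the first character is a letter and all characters are alphanumeric."""
    return lexeme.isascii() and lexeme[:1].isalpha() and lexeme.isalnum()

def classify(lexeme):
    """Map a lexeme to 'keyword', 'operator', 'identifier' or 'unknown'."""
    if lexeme in KEYWORDS:          # keywords are checked before identifiers
        return "keyword"
    if lexeme in OPERATORS:
        return "operator"
    if is_identifier(lexeme):
        return "identifier"
    return "unknown"

# The lexemes of if(a<b), classified one by one.
for lexeme in ["if", "(", "a", "<", "b", ")"]:
    print(lexeme, "-", classify(lexeme))
```

Keywords are tested first because a keyword such as `int` also satisfies the identifier rule.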
Input buffering : -
→ The lexical analyzer scans the input from left to right, one character at a time.
→ It uses 2 pointers.
1) a pointer to the beginning of the lexeme
2) forward ptr (fp)
→ The 2 pointers keep track of the portion of the input scanned.
→ Initially, both pointers point to the first character of the input string.
[Figure: input string with fp at its first character]
→ The forward pointer moves ahead to search for the end of the lexeme.
→ As soon as a blank space is encountered, it indicates the end of the lexeme.
[Figure: fp stopped at the blank space that ends the lexeme]
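The two-pointer scan can be sketched as below. This is a simplified illustration: the name `bp` for the pointer at the beginning of the lexeme and the function name `split_lexemes` are assumptions, and only white space is treated as the end of a lexeme, as in the notes.

```python
def split_lexemes(text):
    """Collect lexemes separated by blank space, using two pointers."""
    bp = 0          # assumed name: marks the beginning of the current lexeme
    fp = 0          # forward pointer: moves ahead to find the end of the lexeme
    lexemes = []
    while fp < len(text):
        if text[fp].isspace():
            if fp > bp:                     # a lexeme ends just before the blank
                lexemes.append(text[bp:fp])
            fp += 1
            bp = fp                         # both pointers restart after the blank
        else:
            fp += 1                         # keep scanning the current lexeme
    if fp > bp:                             # last lexeme, if input has no trailing blank
        lexemes.append(text[bp:fp])
    return lexemes

print(split_lexemes("int  num ;"))
```

A real scanner would also stop at operators and punctuation, not just at blanks.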
→ The input characters are read from secondary storage.
→ But reading from secondary storage is costly, so buffering techniques are used.
→ With one buffer, if a lexeme is longer than the buffer, the buffer needs to be refilled to scan the rest of the lexeme, which overwrites the first part of the lexeme.
→ Two buffers are used to store the input string.
→ The two buffers are scanned alternately.
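A minimal sketch of the two-buffer idea follows. The class name `TwoBufferReader`, its `size` parameter and the use of `io.StringIO` to stand in for secondary storage are all assumptions made for illustration.

```python
import io

class TwoBufferReader:
    """Reads characters through two buffers that are filled alternately."""

    def __init__(self, stream, size=4):
        self.stream = stream
        self.size = size                  # characters read per refill
        self.buffers = ["", ""]
        self.current = 0                  # index of the buffer being scanned
        self.pos = 0                      # position of the forward pointer in it
        self.buffers[0] = stream.read(size)

    def next_char(self):
        """Return the next character, or "" at the end of the input."""
        if self.pos == len(self.buffers[self.current]):
            # Current buffer is exhausted: fill the OTHER buffer, so the one
            # just scanned (and the first part of a lexeme in it) stays intact.
            other = 1 - self.current
            self.buffers[other] = self.stream.read(self.size)
            self.current, self.pos = other, 0
            if not self.buffers[other]:
                return ""
        ch = self.buffers[self.current][self.pos]
        self.pos += 1
        return ch

reader = TwoBufferReader(io.StringIO("abcdef"), size=4)
first_five = [reader.next_char() for _ in range(5)]
print(first_five, reader.buffers)
```

After five characters the first buffer still holds `abcd` while the second holds `ef`, so a lexeme that crosses the buffer boundary is not overwritten.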
→ To specify tokens, regular expressions are used.
→ When a pattern is matched by some regular expression, the token is recognized.
→ A string is a collection of a finite number of alphabets or letters.
1) Prefix of a string : -
→ A string obtained by removing zero or more trailing symbols is called a prefix of the string.
2) Suffix of a string : -
→ A string obtained by removing zero or more leading symbols is called a suffix of the string.
3) Substring : -
→ A string obtained by removing a prefix & a suffix is called a substring of the string.
4) Subsequence of a string : -
→ A string obtained by removing zero or more, not necessarily consecutive, symbols is called a subsequence of the string.
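The four definitions above can be tried out on a short string. The function names below are hypothetical and the code is only a small sketch.

```python
def prefixes(s):
    """All strings obtained by removing zero or more trailing symbols."""
    return [s[:i] for i in range(len(s) + 1)]

def suffixes(s):
    """All strings obtained by removing zero or more leading symbols."""
    return [s[i:] for i in range(len(s) + 1)]

def substrings(s):
    """All strings obtained by removing a prefix and a suffix (shortest first)."""
    found = {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}
    return sorted(found, key=lambda t: (len(t), t))

def is_subsequence(t, s):
    """True if t can be obtained from s by deleting symbols, keeping the order."""
    remaining = iter(s)
    # 'ch in remaining' advances the iterator, so symbols must appear in order.
    return all(ch in remaining for ch in t)

print(prefixes("abc"))
print(suffixes("abc"))
print(substrings("abc"))
print(is_subsequence("ac", "abc"))
```

Note that `ac` is a subsequence of `abc` but not a substring, because the removed symbol `b` is in the middle.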
1) r = ε
[Figure: start → q0 –ε→ qf]
2) r = a
[Figure: start → q0 –a→ qf]
3) r = r1 + r2
[Figure: NFA for r1 + r2, with ε-moves]
4) r = r1 . r2
[Figure: NFA for r1 . r2]
5) r = (r1)*
[Figure: NFA for (r1)*]
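The five constructions listed above (ε, a single symbol, r1 + r2, r1 . r2 and (r1)*) can be sketched in Python. This is one common way to build an NFA with ε-moves from a regular expression and may differ in detail from the figures in the notes; in particular, concatenation here joins the two NFAs with an ε-move. All names (`NFA`, `symbol`, `union`, `concat`, `star`, `accepts`) are hypothetical.

```python
from itertools import count

EPS = None            # label used for ε-moves
_ids = count()        # source of fresh state numbers

class NFA:
    def __init__(self, start, accept, moves):
        # moves maps a state to a list of (label, target) pairs
        self.start, self.accept, self.moves = start, accept, moves

def _merge(*nfas):
    """Copy the moves of the given NFAs into one new dictionary."""
    moves = {}
    for n in nfas:
        for state, edges in n.moves.items():
            moves.setdefault(state, []).extend(edges)
    return moves

def symbol(a):        # cases 1 and 2: r = ε (pass EPS) or r = a
    s, f = next(_ids), next(_ids)
    return NFA(s, f, {s: [(a, f)]})

def union(n1, n2):    # case 3: r = r1 + r2
    s, f = next(_ids), next(_ids)
    moves = _merge(n1, n2)
    moves[s] = [(EPS, n1.start), (EPS, n2.start)]
    moves.setdefault(n1.accept, []).append((EPS, f))
    moves.setdefault(n2.accept, []).append((EPS, f))
    return NFA(s, f, moves)

def concat(n1, n2):   # case 4: r = r1 . r2 (joined by an ε-move, an assumption)
    moves = _merge(n1, n2)
    moves.setdefault(n1.accept, []).append((EPS, n2.start))
    return NFA(n1.start, n2.accept, moves)

def star(n1):         # case 5: r = (r1)*
    s, f = next(_ids), next(_ids)
    moves = _merge(n1)
    moves[s] = [(EPS, n1.start), (EPS, f)]
    moves.setdefault(n1.accept, []).extend([(EPS, n1.start), (EPS, f)])
    return NFA(s, f, moves)

def _closure(nfa, states):
    """All states reachable from `states` using only ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for label, target in nfa.moves.get(q, []):
            if label is EPS and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

def accepts(nfa, text):
    """Simulate the NFA on text by tracking the set of possible states."""
    current = _closure(nfa, {nfa.start})
    for ch in text:
        step = {t for q in current for label, t in nfa.moves.get(q, []) if label == ch}
        current = _closure(nfa, step)
    return nfa.accept in current

# (a + b)* . a . b . b ; each fragment is built fresh and used only once
ab_star = star(union(symbol("a"), symbol("b")))
nfa = concat(concat(concat(ab_star, symbol("a")), symbol("b")), symbol("b"))
print(accepts(nfa, "aabb"), accepts(nfa, "ab"))
```

Each fragment should be used in only one larger expression, since reusing it would make two sub-NFAs share the same state numbers.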