Lexical Analysis

Published by: niharika garg on Sep 18, 2009
Unit 2
Role of lexical analyzer
- Recognize tokens and ignore white spaces and comments
- Generate the token stream
- Error reporting
- Model tokens using regular expressions
- Recognize tokens using finite state automata (transition diagram not reproduced here)
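The role described above can be sketched as a small regex-based scanner. This is a minimal illustration, not code from the unit; the token names and patterns are assumed for the example:

```python
import re

# Token patterns; COMMENT and WS are recognized but discarded, so the
# scanner "ignores white spaces and comments" as described above.
# COMMENT is listed before MULOP so "//" is not read as two divisions.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),
    ("WS",      r"\s+"),
    ("NUM",     r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("ASSIGN",  r"="),
    ("ADDOP",   r"[+\-]"),
    ("MULOP",   r"[*/]"),
]
PATTERN = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def token_stream(source):
    """Yield (token, lexeme) pairs, skipping white space and comments."""
    for m in PATTERN.finditer(source):
        if m.lastgroup not in ("WS", "COMMENT"):
            yield (m.lastgroup, m.group())

print(list(token_stream("rate = base + 60  // per hour")))
```

Running this prints the token stream handed to the syntax analyzer; the whitespace and the trailing comment never appear in it.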
Design of lexical analyzer
- Allow white spaces, numbers and arithmetic operators in an expression
- Return tokens and attributes to the syntax analyzer
- A global variable tokenval is set to the value of the number
- Design requires that:
  - a finite set of tokens be defined
  - the strings belonging to each token be described

We now try to construct a lexical analyzer for a language in which white spaces, numbers and arithmetic operators are allowed in an expression. From the input stream, the lexical analyzer recognizes the tokens and their corresponding attributes and returns them to the syntax analyzer. To achieve this, the function returns the corresponding token for the lexeme and sets a global variable, say tokenval, to the value of that token. Thus, we must define a finite set of tokens and specify the strings belonging to each token. We must also keep a count of the line number for the purposes of error reporting and debugging.
Regular Expressions
We use regular expressions to describe the tokens of a programming language. A regular expression is built up from simpler regular expressions using defining rules. Each regular expression denotes a language. A language denoted by a regular expression is called a regular set.
Regular Expressions (Rules)
Regular expressions over an alphabet Σ:

Reg. Expr       Language it denotes
ε               {ε}
a (for a in Σ)  {a}
(r1) | (r2)     L(r1) ∪ L(r2)
(r1) (r2)       L(r1) L(r2)
(r)*            (L(r))*
(r)             L(r)

Shorthands:
(r)+ = (r)(r)*
(r)? = (r) | ε

We may remove parentheses by using precedence rules: * highest, concatenation next, | lowest. Thus ab*|c means (a(b*))|(c).

Examples, with Σ = {0,1}:
- 0|1 => {0, 1}
- (0|1)(0|1) => {00, 01, 10, 11}
- 0* => {ε, 0, 00, 000, 0000, ...}
- (0|1)* => all strings of 0s and 1s, including the empty string
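The example languages above can be checked directly with Python's re module, whose syntax matches these operators closely (fullmatch tests whole-string membership in the denoted language):

```python
import re

# Membership tests for the example regular expressions over {0,1}.
assert re.fullmatch(r"0|1", "0")
assert re.fullmatch(r"(0|1)(0|1)", "10")
assert not re.fullmatch(r"(0|1)(0|1)", "101")   # length 2 only
assert re.fullmatch(r"0*", "")                  # 0* contains the empty string
assert re.fullmatch(r"0*", "0000")
assert re.fullmatch(r"(0|1)*", "01101")         # any string of 0s and 1s

# Precedence: ab*|c parses as (a(b*))|(c).
assert re.fullmatch(r"ab*|c", "abbb")
assert re.fullmatch(r"ab*|c", "c")
assert not re.fullmatch(r"ab*|c", "ac")
print("all membership tests pass")
```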
Specification and recognition of tokens
A token represents a set of strings described by a pattern.
- Identifier represents the set of strings which start with a letter and continue with letters and digits.
- The actual string (e.g. newval) is called a lexeme.
- Tokens: identifier, number, addop, delimiter, ...

Since a token can represent more than one lexeme, additional information must be held for that specific lexeme. This additional information is called the attribute of the token. For simplicity, a token may have a single attribute which holds the required information for that token. For identifiers, this attribute is a pointer to the symbol table, and the symbol table holds the actual attributes for that token.

Some attributes:
- <id, attr> where attr is a pointer into the symbol table
- <assgop, _> no attribute is needed (if there is only one assignment operator)
- <num, val> where val is the actual value of the number

A token type together with its attribute uniquely identifies a lexeme.
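A sketch of how identifier attributes can point into a symbol table while numbers carry their value directly. The install_id helper and the table layout are illustrative assumptions, not taken from the text:

```python
# Symbol table: identifier attributes live here; the token only carries
# an index ("pointer") into the table. (Illustrative layout.)
symbol_table = []      # each entry holds the identifier's attributes
symbol_index = {}      # lexeme -> position in symbol_table

def install_id(lexeme):
    """Return the symbol-table index for lexeme, inserting it if new."""
    if lexeme not in symbol_index:
        symbol_index[lexeme] = len(symbol_table)
        symbol_table.append({"lexeme": lexeme})
    return symbol_index[lexeme]

# <token, attribute> pairs for the input:  newval = oldval + 12
tokens = [
    ("id", install_id("newval")),
    ("assgop", None),              # single assignment operator: no attribute
    ("id", install_id("oldval")),
    ("addop", "+"),
    ("num", 12),                   # attribute is the actual value
]
```

Note that two occurrences of the same identifier get the same index, so the pair <id, attr> identifies the lexeme uniquely, as stated above.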
Regular expressions are widely used to specify patterns. How do we recognize tokens? We now consider the following grammar and try to construct an analyzer that returns <token, attribute> pairs:

relop → < | <= | = | <> | >= | >
id → letter (letter | digit)*
num → digit+ ('.' digit+)? (E ('+' | '-')? digit+)?
delim → blank | tab | newline
ws → delim+
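One way to recognize relop, following the transition-diagram idea: read one character, then look ahead one more to distinguish, say, < from <= and <>. This is a sketch; the attribute names (LT, LE, ...) are assumed, not given in the text:

```python
def relop(s, pos=0):
    """Return (("relop", attr), next_pos) if a relop starts at pos, else None."""
    ch = s[pos] if pos < len(s) else ""
    nxt = s[pos + 1] if pos + 1 < len(s) else ""
    if ch == "<":
        if nxt == "=":                      # lexeme <=
            return (("relop", "LE"), pos + 2)
        if nxt == ">":                      # lexeme <>
            return (("relop", "NE"), pos + 2)
        return (("relop", "LT"), pos + 1)   # lookahead not consumed
    if ch == "=":
        return (("relop", "EQ"), pos + 1)
    if ch == ">":
        if nxt == "=":                      # lexeme >=
            return (("relop", "GE"), pos + 2)
        return (("relop", "GT"), pos + 1)
    return None                             # no relop at this position
```

The returned position tells the caller where the next lexeme begins, mirroring the "retract on other" edges of the usual transition diagram.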
