
Lexical Analysis

Topics covered

● Interaction of lexical analyzer with parser


● Why lexical analysis and parsing?

● Tokens, Patterns and Lexemes

● Regular Languages

● Regular Expressions

● Questions?
Interaction of lexical analyzer with parser

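[Figure: source program → Lexical Analyzer → token → Parser → to semantic analysis. The Parser requests each token by calling getNextToken; both the Lexical Analyzer and the Parser consult the Symbol Table.]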
Lexical analyzer

● Sometimes divided into a cascade of two processes

– Scanning: simple processing that does not require tokenization, such as deleting comments and compacting consecutive whitespace characters into one
– Lexical analysis: the more complex portion, which produces tokens from the output of the scanner
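The scanning stage can be sketched as a small pre-pass in Python. This is only an illustration: the function name `scan` and the choice of C-style `/* ... */` comments are assumptions, not something the slides specify, and the sketch ignores the fact that string literals must be protected from this rewriting.

```python
import re

def scan(source: str) -> str:
    """Pre-pass: delete comments and compact whitespace runs into one blank."""
    # Assumption: comments are written C-style as /* ... */.
    without_comments = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    # Collapse any run of blanks, tabs, and newlines into a single space.
    return re.sub(r"\s+", " ", without_comments).strip()

print(scan("x  =  1; /* set x */\n\ty = 2;"))  # x = 1; y = 2;
```

The output of `scan` would then be handed to the lexical analysis proper, which groups the remaining characters into tokens.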
Why separate lexical analysis and parsing?

● A number of reasons:

– Simplicity of design is the most important consideration


– Compiler efficiency is improved
– Compiler portability is improved
Tokens, Patterns and Lexemes

● Token: a pair consisting of a token name and an optional attribute value

● Pattern: description of the form that the lexemes of a token may take

● Lexeme: sequence of characters in the source program that matches the pattern for a token

● Example:

– printf("Average is %d", avg);
– printf and avg are lexemes matching the pattern for token id
– "Average is %d" is a lexeme matching the pattern for token literal
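As a rough illustration of the token/pattern/lexeme distinction, the following Python sketch pairs each lexeme of the printf statement with the name of the token whose pattern it matches. The pattern table and the names `TOKEN_PATTERNS`, `MASTER`, and `lexemes` are assumptions made for this example; in particular, lumping all punctuation into one `punct` token is a simplification.

```python
import re

# Assumed patterns: each token name is paired with the regular expression
# that describes the form its lexemes may take.
TOKEN_PATTERNS = [
    ("literal", r'"[^"]*"'),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("punct", r"[(),;]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))

def lexemes(source):
    """Return (token name, lexeme) pairs for every match in the source text."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(source)]

for pair in lexemes('printf("Average is %d",avg);'):
    print(pair)
```

Here `printf` and `avg` both come out as `id`, while the quoted text comes out as a single `literal`, mirroring the example above.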
Classes for tokens

● One token for each keyword

● Tokens for the operators, either individually or in classes such as comparison operators

● One token representing all identifiers

● One or more tokens representing constants (numbers, literals)

● One token for each punctuation symbol


Token Specification

● Regular expressions for specifying patterns

● Strings and Languages

– Alphabets and character classes
– The set {0,1} is the binary alphabet
– ASCII, Unicode, and EBCDIC are computer alphabets
– Strings (words and sentences)
– Empty string ε
– Language: any set of strings over some fixed alphabet
Operations on languages

● Union, concatenation, and closure

● Let L be the set {A,B,...,Z,a,b,...,z} and D the set {0,1,2,3,4,5,6,7,8,9}

– L ∪ D is the set of letters and digits
– LD is the set of strings consisting of a letter followed by a digit
– L^4 is the set of all four-letter strings
– L* is the set of all strings of letters, including ε
– L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter
– D+ is the set of all strings of one or more digits
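The operations above can be tried out on finite sets in Python. This is a minimal sketch: the helper names `concat`, `power`, and `closure_up_to` are made up for the illustration, and since a closure is an infinite set, only a finite approximation up to a chosen length is computed.

```python
from itertools import product
import string

L = set(string.ascii_letters)   # {A,...,Z,a,...,z}
D = set(string.digits)          # {0,...,9}

def concat(A, B):
    """Concatenation AB: every string of A followed by every string of B."""
    return {a + b for a, b in product(A, B)}

def power(A, n):
    """A^n: n-fold concatenation of A with itself; A^0 is {""}, i.e. {ε}."""
    result = {""}
    for _ in range(n):
        result = concat(result, A)
    return result

def closure_up_to(A, n):
    """Finite approximation of A*: the union of A^0, A^1, ..., A^n."""
    result = set()
    for i in range(n + 1):
        result |= power(A, i)
    return result

print(len(L | D))                             # 62 letters and digits
print(len(concat(L, D)))                      # 520 letter-digit strings
print(sorted(power({"a", "b"}, 2)))           # ['aa', 'ab', 'ba', 'bb']
print(sorted(closure_up_to({"a", "b"}, 1)))   # ['', 'a', 'b']
```

Small alphabets are used for the power and closure calls because L^4 already contains 52^4 strings.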
Regular Expression
● Notation used to define a language (a set of strings) precisely

● Regular expression for identifiers in Pascal

– letter ( letter | digit )*

● Each regular expression r denotes a language L(r)
Regular Expression

Rules that define regular expressions over an alphabet Σ

– ε is a regular expression denoting {ε}
– If a is a symbol in Σ, then a is a regular expression denoting {a}
– If u and v are regular expressions denoting languages L(u) and L(v), then:

(u)|(v) denotes L(u) ∪ L(v)

(u)(v) denotes L(u)L(v)

(u)* denotes (L(u))*

(v)^2 denotes (L(v))^2

A language denoted by a regular expression is called a regular set.
Example

Let Σ = {0,1}

RE (0|1) denotes the set {0, 1}

RE (0 (1|0)) denotes the set {01, 00}

RE (1*) denotes the infinite set {ε, 1, 11, 111, 1111, ...}
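These three examples can be checked mechanically. The sketch below enumerates every string over {0,1} up to a chosen length and keeps those that a regular expression matches in full, using Python's `re` module; the helper name `denoted` and the length bound are assumptions of this illustration.

```python
import re
from itertools import product

def denoted(regex, max_len):
    """Strings over {0,1} of length <= max_len that the regex matches in full."""
    strings = ("".join(p)
               for n in range(max_len + 1)
               for p in product("01", repeat=n))
    return {s for s in strings if re.fullmatch(regex, s)}

print(sorted(denoted(r"(0|1)", 3)))    # ['0', '1']
print(sorted(denoted(r"0(1|0)", 3)))   # ['00', '01']
print(sorted(denoted(r"1*", 3)))       # ['', '1', '11', '111']
```

The empty string in the last result plays the role of ε; raising `max_len` yields longer members of the same set.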
Regular Definition
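● For notational convenience, we may give names to regular expressions

● If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form

– d1 → r1
– d2 → r2
– d3 → r3
– ...
– dn → rn

● Each di is a distinct name, and each ri is a regular expression over the symbols in Σ ∪ {d1, d2, ..., di-1}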
Example

letter → A | B | ... | Z | a | b | ... | z

digit → 0 | 1 | 2 | 3 | ... | 9

id → letter ( letter | digit )*
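A regular definition maps naturally onto named string fragments that are substituted into later patterns. The following Python sketch mirrors the definition of id; the variable name `id_` (with a trailing underscore to avoid shadowing Python's built-in `id`) and the helper `is_id` are choices made for this example.

```python
import re

# Each name is defined only in terms of basic symbols and earlier names.
letter = r"[A-Za-z]"
digit = r"[0-9]"
id_ = rf"{letter}({letter}|{digit})*"

def is_id(s: str) -> bool:
    """True if the whole string s is a lexeme of id."""
    return re.fullmatch(id_, s) is not None

print(is_id("count2"))  # True
print(is_id("2count"))  # False
```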
Example

Unsigned numbers such as 245, 1345.456, 345.324E34, or 1.87543E-23 are described by the regular definition

– digit → 0 | 1 | 2 | 3 | ... | 9
– digits → digit digit*
– optional_fraction → ( . digits ) | ε
– optional_exponent → ( E ( + | - | ε ) digits ) | ε
– num → digits optional_fraction optional_exponent
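The definition of num can be transcribed almost line by line into Python regular-expression fragments, with an empty alternative standing in for ε. This is a sketch for experimentation; the helper name `is_num` is an assumption of the example.

```python
import re

digit = r"[0-9]"
digits = rf"{digit}{digit}*"
optional_fraction = rf"(\.{digits}|)"        # the empty alternative plays ε
optional_exponent = rf"(E(\+|-|){digits}|)"
num = rf"{digits}{optional_fraction}{optional_exponent}"

def is_num(s: str) -> bool:
    """True if the whole string s is a lexeme of num."""
    return re.fullmatch(num, s) is not None

for text in ["245", "1345.456", "324E34", "1.87543E-23", "345.", ".5"]:
    print(text, is_num(text))
```

The last two strings are rejected because the definition requires at least one digit on both sides of the decimal point.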
Token Recognition
● Consider the grammar fragment

– stmt → if expr then stmt | if expr then stmt else stmt | ε
– expr → term relop term | term
– term → id | num
● The terminals if, then, else, relop, id, and num generate sets of strings given by the following regular definitions
– if → if
– then → then
– else → else
– relop → < | <= | = | <> | > | >=
– id → letter ( letter | digit )*
– num → digit+ ( . digit+ )? ( E ( + | - )? digit+ )?
Token Recognition continued
● The lexical analyzer also strips out whitespace, recognized by
– delim → blank | tab | newline
– ws → delim+
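Putting the regular definitions together, a recognizer for this fragment can be sketched in Python as a generator that the parser pulls from, one token per request, in the spirit of getNextToken. The names `SPEC`, `MASTER`, and `tokens` are assumptions of this sketch, as are listing keywords before id so that they take priority and using `\b` to stop `if` from matching the start of an identifier such as `ifx`.

```python
import re

# Assumed specification, tried in order: keywords come before id.
SPEC = [
    ("ws", r"[ \t\n]+"),                                # delim+
    ("if", r"if\b"),
    ("then", r"then\b"),
    ("else", r"else\b"),
    ("relop", r"<=|<>|<|>=|>|="),                       # longest forms first
    ("num", r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in SPEC))

def tokens(source):
    """Yield (token name, lexeme) pairs, skipping whitespace."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        if m.lastgroup != "ws":        # ws is recognized but never returned
            yield (m.lastgroup, m.group())

stream = tokens("if x1 <= 25 then y else z")
print(next(stream))    # ('if', 'if')
print(next(stream))    # ('id', 'x1')
print(list(stream))    # the remaining tokens
```

Each call to `next(stream)` plays the role of one getNextToken request. A fuller version would return an attribute value (for example, which relational operator was seen) instead of the raw lexeme.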
Translation Table
References

● Compilers: Principles, Techniques, and Tools (Alfred V. Aho, Columbia University; Monica S. Lam, Stanford University; Ravi Sethi, Avaya; Jeffrey D. Ullman, Stanford University)
● Compilers course (Alex Aiken, Stanford University)
