Covering topics
● Regular Languages
● Regular Expressions
● Questions?
Interaction of lexical analyzer with parser
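[Figure: the source program feeds the lexical analyzer, which returns a token to the parser each time the parser calls getNextToken; the parser's output goes on to semantic analysis, and both the lexical analyzer and the parser consult the symbol table.]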
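The pull-style interaction described here can be sketched in a few lines of Python. This is only an illustration under assumed names: the `Lexer` class, its `get_next_token` method, the token names `id`, `num` and `eof`, and the dictionary used as a symbol table are all hypothetical, not part of the slides.

```python
import re

class Lexer:
    """Hands out one token per call, whenever the parser asks for it."""

    def __init__(self, source, symbol_table):
        # Split the source into candidate lexemes: names, integers, single symbols.
        self.lexemes = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
        self.pos = 0
        self.symbol_table = symbol_table  # shared with the parser

    def get_next_token(self):
        if self.pos == len(self.lexemes):
            return ("eof", None)
        lexeme = self.lexemes[self.pos]
        self.pos += 1
        if re.fullmatch(r"[A-Za-z_]\w*", lexeme):
            # Enter the identifier in the symbol table (once) and return its index.
            index = self.symbol_table.setdefault(lexeme, len(self.symbol_table))
            return ("id", index)
        if re.fullmatch(r"\d+", lexeme):
            return ("num", int(lexeme))
        return (lexeme, None)  # operators and punctuation stand for themselves


def parse(source):
    """Stand-in for a parser: it only pulls tokens until end of input."""
    symbol_table = {}
    lexer = Lexer(source, symbol_table)
    tokens = []
    while True:
        token = lexer.get_next_token()
        if token[0] == "eof":
            break
        tokens.append(token)
    return tokens, symbol_table
```

Calling `parse("x = y + 42")` drives the lexer one token at a time; the identifiers `x` and `y` end up in the symbol table, and the tokens carry their table indices rather than the lexemes themselves.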
Lexical analyzer
● Number of reasons:
● Pattern: description of the form that the lexemes of a token may take
● Example:
– printf("Average is %d", avg);
– printf and avg are lexemes matching the pattern for token id
– "Average is %d" is a lexeme matching the pattern for token literal
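As a rough sketch of how patterns map lexemes to tokens, the statement from the example can be split with Python's `re` module. The token names `literal`, `id` and `punct`, and the patterns attached to them, are assumptions chosen for this illustration.

```python
import re

# Hypothetical token classes; each pattern describes the lexemes of one token.
TOKEN_PATTERNS = [
    ("literal", r'"[^"\n]*"'),
    ("id", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("punct", r"[(),;]"),
    ("skip", r"\s+"),  # whitespace is matched but not reported
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)

def tokenize(text):
    """Return (token, lexeme) pairs, scanning left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"no pattern matches at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For `printf("Average is %d",avg);` this reports `printf` and `avg` as `id` and the quoted string as `literal`, which mirrors the classification given in the example.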
Classes for tokens
– ε is a regular expression denoting {ε}
– If a is a symbol in Σ, then a is a regular expression denoting {a}
– Suppose u and v are regular expressions denoting languages L(u) and L(v). Then:
● (u)|(v) denotes L(u) ∪ L(v)
● (u)(v) denotes L(u)L(v)
● (u)* denotes (L(u))*
● (v)² denotes (L(v))²
● A language denoted by a regular expression is called a regular set.
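The operations on languages listed above can be tried out on small finite sets. The sketch below models a language as a Python set of strings; the function names are hypothetical, and because L* is infinite in general, `star` only enumerates strings up to an assumed length bound `max_len`.

```python
def union(l1, l2):
    """L1 ∪ L2: strings that are in either language."""
    return l1 | l2

def concat(l1, l2):
    """L1L2: every string of L1 followed by every string of L2."""
    return {x + y for x in l1 for y in l2}

def power(lang, n):
    """L^n: n-fold concatenation of L with itself; L^0 is {ε}."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result

def star(lang, max_len):
    """The strings of L* whose length is at most max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        longer = {w for w in concat(frontier, lang) if len(w) <= max_len}
        frontier = longer - result  # keep only strings not seen before
        result |= frontier
    return result
```

Here the empty string `""` plays the role of ε, so `star({"1"}, 3)` yields the empty string together with `1`, `11` and `111`.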
Example
● Let Σ = {0, 1}
● RE (0|1) denotes the set {0, 1}
● RE (0(1|0)) denotes the set {01, 00}
● RE (1*) denotes the set {ε, 1, 11, 111, 1111, ...}, that is, every string of zero or more 1s
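One way to check such examples is to enumerate all short strings over the alphabet and keep those that a regex engine accepts. The helper below is a sketch only: the name `language` and the length bound `max_len` are assumptions, and Python's `re` syntax happens to coincide with the slide notation for these three expressions.

```python
import re
from itertools import product

def language(pattern, alphabet="01", max_len=4):
    """All strings over alphabet, up to max_len long, that fully match pattern."""
    regex = re.compile(pattern)
    matched = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            word = "".join(chars)
            if regex.fullmatch(word):
                matched.append(word)
    return matched
```

With the default bound, `language("0|1")` and `language("0(1|0)")` return the two finite sets from the example, while `language("1*")` returns only the first five members of an infinite set, starting with the empty string.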
Regular Definition
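● For notational convenience, we may give names to regular expressions.
● If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form
– d1 → r1
– d2 → r2
– d3 → r3
– ...
– dn → rn
● Each di is a distinct name, and each ri is a regular expression over the symbols in Σ ∪ {d1, d2, ..., di-1}.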
Example
● letter → A | B | ... | Z | a | b | ... | z
● digit → 0 | 1 | 2 | 3 | ... | 9
● id → letter ( letter | digit )*
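The same regular definition can be mirrored in Python by building each named pattern from the ones defined before it. This is a minimal sketch: the character classes `[A-Za-z]` and `[0-9]` stand in for the long alternations, and the helper `is_id` is a hypothetical name.

```python
import re

# Each name is defined using only earlier names, as in a regular definition.
letter = "[A-Za-z]"
digit = "[0-9]"
ident = f"{letter}({letter}|{digit})*"  # id → letter ( letter | digit )*

def is_id(lexeme):
    """True if the whole lexeme matches the pattern for id."""
    return re.fullmatch(ident, lexeme) is not None
```

Under this definition `avg` and `x1` are identifiers, while `1x` is not, because an id must begin with a letter.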
Example
● Unsigned numbers (integer or floating point) such as 245, 1345.456, 345.324E34 or 1.87543E-23 are described by the regular definition
– digit → 0 | 1 | 2 | 3 | ... | 9
– digits → digit digit*
– optional_fraction → ( . digits ) | ε
– optional_exponent → ( E ( + | - | ε ) digits ) | ε
– num → digits optional_fraction optional_exponent
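The definition of num translates almost line by line into Python's `re` syntax. In this sketch an alternative of the form `X | ε` is written as `(X)?`, the dot is escaped because it is a metacharacter in `re`, and the helper `is_num` is a hypothetical name.

```python
import re

digit = "[0-9]"
digits = f"{digit}{digit}*"                 # digits → digit digit*
optional_fraction = rf"(\.{digits})?"       # ( . digits ) | ε
optional_exponent = rf"(E[+-]?{digits})?"   # ( E ( + | - | ε ) digits ) | ε
num = f"{digits}{optional_fraction}{optional_exponent}"

def is_num(lexeme):
    """True if the whole lexeme matches the pattern for num."""
    return re.fullmatch(num, lexeme) is not None
```

Note that a string such as `345.` is rejected: once the dot appears, the definition requires at least one digit after it.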
Tokens Recognition
● Consider grammar fragment