Professional Documents
Culture Documents
Chapter 3
Lexical Analysis
Xuejun Liang
2019 Fall
2
Outlines (Sections)
1. The Role of the Lexical Analyzer
2. Input Buffering (Omit)
3. Specification of Tokens
4. Recognition of Tokens
5. The Lexical -Analyzer Generator Lex
6. Finite Automata
7. From Regular Expressions to Automata
8. Design of a Lexical-Analyzer Generator
9. Optimization of DFA-Based Pattern Matchers
3
Examples of Tokens
Token Classes:
1. One token for each keyword
2. Tokens for the operators
3. One token representing all identifiers
4. One or more tokens representing constants
5. Tokens for each punctuation symbol
7
E = M * C ** 2
8
String Operations
Language Operations
• Union Example:
L M = {s s L or s M} Compute
• Concatenation LD
LM = {xy x L and y M} LD
D 4
• Exponentiation
L0 = {}; Li = Li-1L D*
L(LD)*
• Kleene closure D +
L* = i=0,…, Li
where
• Positive closure L = {A, B, ..., Z, a, b, ... , z}
L = i=1,…, L
+ i
and D = {0, 1, . . . 9}
11
4. Recognition of Tokens
Example 3.8: A Grammar for branching statements
The terminals of the grammar, which are if, then, else, relop ,
id, and number, are the names of tokens for lexical analyzer.
17
whitespace
19
Transition Diagrams
relop <<=<>>>==
start < =
0 1 2 return(relop, LE)
>
3 return(relop, NE)
other
4 * return(relop, LT)
=
5 return(relop, EQ)
> =
6 7 return(relop, GE)
other
8 * return(relop, GT)
Whitespace
Sketch of implementation of relop 21
transition diagram
22
5. Lexical-Analyzer Generator:
Lex and Flex
• Lex and its newer cousin flex are scanner
generators
• Scanner generators systematically translate
regular definitions into C source code for
efficient scanning
• Generated code is easy to integrate in C
applications
23
lex.yy.c C a.out
compiler
input sequence
stream a.out of tokens
24
lex spec.l
gcc lex.yy.c -ll
./a.out < spec.l
27
6. Finite Automata
• Design of a Lexical Analyzer Generator
– Translate regular expressions to NFA
– Translate NFA to an efficient DFA
Optional
regular
NFA DFA
expressions
Transition Graph
• An NFA can be diagrammatically represented by a
labeled directed graph called a transition graph
• Example
– an NFA recognizing the language of regular expression
(alb) * abb
a
start a b b
0 1 2 3
b
S = {0,1,2,3}, = {a,b}, s0 = 0, F = {3}
34
Transition Table
• The mapping of an NFA can be
represented in a transition table
Input Input
State
(0,a) = {0,1} a b
(0,b) = {0} 0 {0, 1} {0}
(1,b) = {2} 1 {2}
(2,b) = {3}
2 {3}
35
Simulating a DFA
a a
38
Computing -closure(T)
push all states of T onto stack;
initialize -closure(T) to T;
while ( stack is not empty ) {
pop t, the top element, off stack;
for ( each state u with an edge from t to u labeled )
if ( u is not in -closure(T) ) {
add u to -closure(T) ;
push u onto stack;
}
}
41
C
b a b
start a b b
A B D E
a
a
a
42
start N(r1)
r1 r2 i f
N(r2)
INDUCTION r1 r2 start
i N(r1) N(r2) f
r* start
i N(r) f
46
r3 r1 r2
r4 r3
r5 r4
47
Subset construction
DFA
48
a { action1 }
start a b b
abb { action2 } 3 4 5 6
a b
a*b+ { action3 }
start
7 b 8
a
1 2
start
0 3
a
4
b
5
b
6
a b
7 b 8
49
a a b a
none
0 2 7 8 action3
1 4
3 7 Must find the longest match:
7 Continue until no further moves are possible
When last state is accepting: execute action
50
a b b a
none
0 2 5 6 action2
1 4 8 8 action3
3 7
7 When two or more accepting states are reached, the
first action given in the Lex specification is executed
51
Examples
a a b a
a b b a
52
Note:
1. The NFA is constructed by Thompson’s Algorithm
2. The important states in the NFA are numbered
54
Leaf true
{1, 2} | {1, 2}
Algorithm: followpos
for each node n in the tree {
if n is a cat-node with left child c1 and right child c2
for each i in lastpos(c1) {
followpos(i) := followpos(i) firstpos(c2)
}
else if n is a star-node
for each i in lastpos(n) {
followpos(i) := followpos(i) firstpos(n)
}
}
60
C
b
b a b
start a b b start a b b
A B D E AC B D E
a a
a
a b a a