Professional Documents
Culture Documents
Department of Computer
Science
Compiler Design
Symbol Table
• Upon receiving a “Get next token” command from parser, the lexical analyzer reads the input
character until it can identify the next token.
• Lexical analyzer also stripping out comments and white space in the form of blanks, tabs, and
newline characters from the source program.
+ Operator2
45 Constant1
Lexemes
Lexemes of identifier: total, sum
Lexemes of operator: =, +
Lexemes of constant: 45
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 10
Token \Attributes
• More than one lexeme can match a pattern
• the lexical analyzer must provide the information about the particular
lexeme that matched.
• For example, the pattern for token number matches both 0, 1, 934, …
• But code generator must know which lexeme was found in the source
program.
• Thus, the lexical analyzer returns to the parser a token name and an
attribute value
• For each lexeme the following type of output is produced
(token-name, attribute-value)
• attribute-value points to an entry in the symbol table for this token
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 11
Example
• E = M * C ** 2
• <id, pointer to symbol table entry for E>
• < assign_op>
• <id, pointer to symbol table entry for M>
• <mult_op>
• <id, pointer to symbol table entry for C>
• <exp_op>
• <number, integer value 2>
• The lexical analysis scans the input string from left to right one character at a time.
• Buffer divided into two N-character halves, where N is the number of character on
one disk block. : : : E : : = : : Mi : * : : C: * : * : 2 : eof : : :
:
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 14
Buffer pairs
Two pointers to the input are
maintained : : : E : : = : : Mi : * : : : C: * : * : 2 : eof : : :
lexemeBegin – marks the
beginning of the current forward forward
lexeme
lexeme_beginnig
Forward – scans ahead until a
pattern match is found
forward
lexeme_beginnig
• In buffer pairs we must check, each time we move the forward pointer that we
have not moved off one of the buffers.
• Thus, for each character read, we make two tests.
• We can combine the buffer-end test with the test for the current character.
• We can reduce the two tests to one if we extend each buffer to hold a sentinel
character at the end.
• The sentinel is a special character that cannot be part of the source program, and a
natural choice is the character EOF.
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 17
Sentinels
: : E : : = : : Mi : * : eof : C: * : * : 2 : eof : : eof
*
𝜖
a
aa
aaa Infinite …..
aaaa
aaaaa….
.
+
a
aa
aaa Infinite …..
aaaa
aaaaa….
.
𝑺𝒕𝒓𝒊𝒏𝒈𝒔:𝟎𝟎,𝟏𝟎𝟏,𝒂𝒃𝒂,𝒃𝒂𝒂𝒃…
12.String starts and ends with same character
……
• The terminals of the grammar, which are if, then, else, relop, id, and
number, are the names of tokens
• The patterns for the given tokens are:
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 40
Recognition of tokens …
• The lexical analyzer also has the job of stripping out whitespace, by
recognizing the "token" ws defined by:
is a state
is a transition
is a start state
is a final state
<
0 1
=
2 return (relop,LE)
>
3 return (relop,NE)
=
other
5
4 return (relop,LT)
return (relop,EQ)
>
6 =
7 return (relop,GE)
other
8 return (relop,GT)
E digit
3
5280
39.37
1.894 E - 4
2.56 E + 7
45 E + 6
96 E 2
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 46
Finite Automata
Finite Automata
• Finite Automata are recognizers.
• FA simply say “Yes” or “No” about each possible input string.
• Finite Automata is a mathematical model consist of:
1. Set of states
2. Set of input symbol
3. A transition function move
4. Initial state
5. Final states or accepting states
𝑖 N(s) 𝑓
start
start N(t)
𝑖 𝑓
𝜖
Ex: ab
2. For in , construct the NFA
𝑖 𝑓
start a a b
1 2 3
𝜖 N(t) 𝜖 𝜖
Ex: a*
𝜖
Ex: (a|b) a
2 3
𝜖 𝜖 𝑎 𝜖
1
𝜖
6
1 2 3 4
𝜖 𝜖 𝜖
4 5
b
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 52
Regular expression to NFA using Thompson's rule
• a*b
𝜖 𝑎 𝜖 𝑏
1 2 3 4 5
𝜖
𝜖
• b*ab
𝜖 𝑏 𝜖 𝑎 𝑏
1 2 3 4 5 6
𝜖
if is not in then
add as unmarked state to
end
end
a
2 3
𝜖 𝜖
𝜖 𝜖 a b b
0 1 6 7 8 9 10
𝜖 𝜖
4 5
b
a
2 3
𝜖 𝜖
𝜖 𝜖 a b b
0 1 6 7 8 9 10
𝜖 𝜖
4 5
b
{0, 1, 7, 2,
𝜖- Closure(0)=
4}
=
---- A
{0,1,2,4,7}
a
2 3
States a b
𝜖 𝜖 A = {0,1,2,4,7} B
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B=
{1,2,3,4,6,7,8}
𝜖 𝜖
4 5
b
𝜖
A= {0, 1, 2, 4, 7}
Move(A,a)= {3,8}
𝜖-
Closure(Move(A,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 60
Conversion from NFA to DFA
𝜖
a
2 3
States a b
𝜖 𝜖 A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B=
{1,2,3,4,6,7,8}
C = {1,2,4,5,6,7}
𝜖 𝜖
4 5
b
𝜖
A= {0, 1, 2, 4, 7}
Move(A,b) ={5}
𝜖- Closure(Move(A,b)) ={5, 6, 7, 1, 2, 4}
----
= {1,2,4,5,6,7}
C
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 61
Conversion from NFA to DFA
𝜖
a
2 3
States a b
𝜖 𝜖 A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B= B
{1,2,3,4,6,7,8}
C = {1,2,4,5,6,7}
𝜖 𝜖
4 5
b
𝜖
B = {1, 2, 3, 4, 6, 7, 8}
Move(B,a)= {3,8}
𝜖-
Closure(Move(B,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 62
Conversion from NFA to DFA
𝜖
a
2 3
States a b
𝜖 𝜖 A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B= B D
{1,2,3,4,6,7,8}
C = {1,2,4,5,6,7}
𝜖 𝜖
4 5 D=
b {1,2,4,5,6,7,9}
𝜖
B= {1, 2, 3, 4, 6, 7,
8}
Move(B,b)= {5,9}
𝜖-
= {5, 6, 7, 1, 2, 4, 9}
Closure(Move(B,b)) ----
= {1,2,4,5,6,7,9}
D
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 63
Conversion from NFA to DFA
𝜖
a
2 3
States a b
𝜖 𝜖 A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B= B D
{1,2,3,4,6,7,8}
C = {1,2,4,5,6,7} B
𝜖 𝜖
4 5 D=
b {1,2,4,5,6,7,9}
C= {1, 2, 4, 5, 6 ,7}
Move(C,a)= {3,8}
𝜖-
Closure(Move(C,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 64
Conversion from NFA to DFA
𝜖
a
2 3
States a b
𝜖 𝜖 A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B= B D
{1,2,3,4,6,7,8}
C = {1,2,4,5,6,7} B C
𝜖 𝜖
4 5 D=
b {1,2,4,5,6,7,9}
𝜖
C= {1, 2, 4, 5, 6,
7}
Move(C,b)
{5}
=
𝜖-
Closure(Move(C,b))= {5, 6, 7, 1, 2, 4}
----
= {1,2,4,5,6,7}
C
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 65
Conversion from NFA to DFA
𝜖
a
2 3
States a b
𝜖 𝜖 A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B= B D
{1,2,3,4,6,7,8}
C = {1,2,4,5,6,7} B C
𝜖 𝜖
4 5 D= B
b {1,2,4,5,6,7,9}
𝜖
D= {1, 2, 4, 5, 6, 7,
9}
Move(D,a)= {3,8}
𝜖-
Closure(Move(D,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 66
Conversion from NFA to DFA
𝜖
a
2 3
States a b
𝜖 𝜖 A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B= B D
{1,2,3,4,6,7,8}
C = {1,2,4,5,6,7} B C
𝜖 𝜖
4 5 D= B E
b {1,2,4,5,6,7,9}
E=
{1,2,4,5,6,7,10}
𝜖
D= {1, 2, 4, 5, 6, 7,
9}
Move(D,b)= {5,10}
𝜖- = {5, 6, 7, 1, 2, 4,
Closure(Move(D,b)) 10}
= {1,2,4,5,6,7,10}---- E
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 67
Conversion from NFA to DFA
𝜖
a
2 3
States a b
𝜖 𝜖 A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B= B D
{1,2,3,4,6,7,8}
C = {1,2,4,5,6,7} B C
𝜖 𝜖
4 5 D= B E
b {1,2,4,5,6,7,9}
E= B
{1,2,4,5,6,7,10}
𝜖
E= {1, 2, 4, 5, 6, 7,
10}
Move(E,a)= {3,8}
𝜖-
Closure(Move(E,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 68
Conversion from NFA to DFA
𝜖
a
2 3
States a b
𝜖 𝜖 A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B= B D
{1,2,3,4,6,7,8}
C = {1,2,4,5,6,7} B C
𝜖 𝜖
4 5 D= B E
b {1,2,4,5,6,7,9}
E= B C
{1,2,4,5,6,7,10}
𝜖
E= {1, 2, 4, 5, 6, 7,
10}
Move(E,b)={5}
𝜖-
{5,6,7,1,2,4}
Closure(Move(E,b))=
----
= {1,2,4,5,6,7}
C
Mekonen M. # CoSc4072 Unit 2 – Lexical Analysis 69
Conversion from NFA to DFA
a
b
States
A = {0,1,2,4,7}
a
B
b
C
a B a
D
B=
{1,2,3,4,6,7,8}
C = {1,2,4,5,6,7}
B
B
D
C A a a b
D=
{1,2,4,5,6,7,9}
E=
B
B
E
C
b
C b
E
{1,2,4,5,6,7,10}
Transition
Table
b
Note:
• Accepting state in NFA is 10 DFA
• 10 is element of E
• So, E is acceptance state in DFA
{𝐷} E B C
States a b
A B A
B B D