Faculty of Informatics
(Faculty of Informatics Engineering)
Compiler Design
Lexical Analyzer
The main task of the lexical analyzer is to group the input characters coming
from the source program text into a set of words (tokens), which in turn make up
the sentences processed by the syntax analyzer (parser).
Examples:
a|b*
nat = [0-9]+
signedNat = (+|-)? nat
number = signedNat ("." nat)? (E signedNat)?
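The definitions above can be tried out directly. The following is a minimal sketch that transcribes nat, signedNat, and number into Python's re syntax; the names NAT, SIGNED_NAT, NUMBER, and is_number are chosen here for illustration and do not come from the slides.

```python
import re

# Transcription of the regular definitions above into Python regex syntax.
# NAT, SIGNED_NAT, NUMBER and is_number are illustrative names (assumptions).
NAT = r"[0-9]+"                      # nat = [0-9]+
SIGNED_NAT = rf"[+-]?{NAT}"          # signedNat = (+|-)? nat
NUMBER = re.compile(rf"{SIGNED_NAT}(\.{NAT})?(E{SIGNED_NAT})?")


def is_number(text: str) -> bool:
    """Return True if the whole string matches the `number` definition."""
    return NUMBER.fullmatch(text) is not None
```

Note that, as written in the definition, only an uppercase E introduces the exponent, and a fraction part requires at least one digit after the point.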
Regular Expressions for Programming Language Tokens
Reserved Words and Identifiers
reserved = if | while | do | …
Typically, an identifier must begin with a letter and contain only letters and digits
letter = [a-zA-Z]
digit = [0-9]
identifier = letter (letter|digit)*
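One common way to combine these definitions is to match every word against the identifier pattern first and then look it up in a reserved-word set. The sketch below assumes that approach; the function name classify and the returned labels are hypothetical, and the set holds only the three reserved words actually listed above, since the source list is open-ended.

```python
import re

# identifier = letter (letter|digit)*
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")

# Only the reserved words named in the slide; the real list is longer.
RESERVED = {"if", "while", "do"}


def classify(word: str) -> str:
    """Label a word as RESERVED, ID, or ERROR (labels are illustrative)."""
    if IDENTIFIER.fullmatch(word) is None:
        return "ERROR"
    return "RESERVED" if word in RESERVED else "ID"
```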
Comments
Other than acting as a token delimiter, white space is usually ignored in
free-format languages.
Finite Automata
Finite Automata, or finite state machines, can be used to describe the process of
recognizing patterns in input strings, and so can be used to construct scanners
identifier = letter (letter|digit)*
[Figure: DFA for identifier, with the accepting state marked]
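A DFA of this kind can be simulated with a single state variable. The sketch below assumes two named states, "start" and "in_id" (names chosen here, not taken from the diagram), with "in_id" as the accepting state; any missing transition is treated as a move to an implicit error state.

```python
def is_letter(ch: str) -> bool:
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def is_digit(ch: str) -> bool:
    return "0" <= ch <= "9"


def accepts_identifier(text: str) -> bool:
    """Simulate the DFA for identifier = letter (letter|digit)*."""
    state = "start"
    for ch in text:
        if state == "start" and is_letter(ch):
            state = "in_id"          # first character must be a letter
        elif state == "in_id" and (is_letter(ch) or is_digit(ch)):
            state = "in_id"          # loop on letters and digits
        else:
            return False             # no transition: implicit error state
    return state == "in_id"          # accept only in the accepting state
```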
digit = [0-9]
nat = digit+
signedNat = (+|-)? nat
number = signedNat ("." nat)? (E signedNat)?
[Figures: DFAs for nat, signedNat, signedNat ("." nat)?, and number]
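The definition of number can be turned into a DFA whose transitions are stored in a dictionary. The following is a sketch under assumptions: the state numbers 1 to 8 and the character classes "sign" and "digit" are chosen here for illustration and need not match the numbering in the original diagrams.

```python
def char_class(ch: str) -> str:
    """Map an input character to the class used as a transition label."""
    if "0" <= ch <= "9":
        return "digit"
    if ch in ("+", "-"):
        return "sign"
    if ch in (".", "E"):
        return ch
    return "other"


# (state, class) -> next state; missing entries mean error.
TRANSITIONS = {
    (1, "sign"): 2, (1, "digit"): 3,                 # optional sign
    (2, "digit"): 3,
    (3, "digit"): 3, (3, "."): 4, (3, "E"): 6,       # nat
    (4, "digit"): 5,
    (5, "digit"): 5, (5, "E"): 6,                    # ("." nat)?
    (6, "sign"): 7, (6, "digit"): 8,                 # (E signedNat)?
    (7, "digit"): 8,
    (8, "digit"): 8,
}
ACCEPTING = {3, 5, 8}


def accepts_number(text: str) -> bool:
    state = 1
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING
```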
Example: comments
A DFA for accepting comments surrounded by braces
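A three-state machine is enough for this kind of comment. The sketch below assumes states numbered 1 (start), 2 (inside the comment), and 3 (accepting); the numbering and the function name are illustrative. An opening brace inside the comment is treated like any other character, so comments do not nest.

```python
def accepts_brace_comment(text: str) -> bool:
    """Simulate a DFA for comments of the form { ... }."""
    state = 1
    for ch in text:
        if state == 1 and ch == "{":
            state = 2                        # comment opened
        elif state == 2:
            state = 3 if ch == "}" else 2    # stay inside until "}"
        else:
            return False                     # no transition: error
    return state == 3
```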
Example: C-like comments
A DFA for accepting C-like comments
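C-like comments need two extra states because both delimiters are two characters long. The following sketch assumes five states, numbered here for illustration: 1 (start), 2 (seen "/"), 3 (inside the comment), 4 (seen "*" inside the comment), and 5 (accepting).

```python
def accepts_c_comment(text: str) -> bool:
    """Simulate a DFA for comments of the form /* ... */."""
    state = 1
    for ch in text:
        if state == 1 and ch == "/":
            state = 2
        elif state == 2 and ch == "*":
            state = 3                        # comment opened
        elif state == 3:
            state = 4 if ch == "*" else 3    # a "*" may start the closer
        elif state == 4:
            if ch == "/":
                state = 5                    # comment closed
            elif ch == "*":
                state = 4                    # still a candidate closer
            else:
                state = 3                    # false alarm, back inside
        else:
            return False                     # no transition: error
    return state == 5
```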
In the identifier DFA, the error transition represents one of two facts: either
an identifier is not to be recognized (if we came from the start state), or
a delimiter has been seen and we must now accept and generate an identifier token.
DFA for Scanners
A Modified DFA
In this DFA:
The brackets surrounding "other" indicate that the delimiting character should be
treated as lookahead and returned to the input string rather than consumed.
The error state has become the accepting state.
The diagram expresses the principle of longest substring: the DFA continues to
match letters and digits until a delimiter is found.
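Example: in the input xtemp=ytemp, the "=" acts as the delimiter that ends the
identifier xtemp and must not be consumed as part of it.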
Implementation of Finite Automata in Code
Ad hoc Solution
Brackets around a state number indicate that the transition should not consume the input
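An ad hoc implementation hard-codes the states of the DFA in the control flow of the scanner. The following is a minimal sketch for the identifier DFA, assuming states 1 (start), 2 (inside an identifier), and 3 (accepting); the function name scan_identifier and its return convention are hypothetical. The transition into state 3 does not advance the position, so the delimiter stays in the input.

```python
def is_letter(ch: str) -> bool:
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def is_digit(ch: str) -> bool:
    return "0" <= ch <= "9"


def scan_identifier(text: str, pos: int = 0):
    """Return (lexeme, next_pos) for an identifier at pos, or None."""
    state = 1
    start = pos
    while state in (1, 2):
        # "" stands for end of input and falls into the "other" case.
        ch = text[pos] if pos < len(text) else ""
        if state == 1:
            if is_letter(ch):
                pos += 1          # consume the first letter
                state = 2
            else:
                return None       # error: no identifier starts here
        else:                     # state == 2
            if is_letter(ch) or is_digit(ch):
                pos += 1          # consume and stay in state 2
            else:
                state = 3         # [other]: accept without consuming
    return text[start:pos], pos
```

For example, scan_identifier("xtemp=ytemp") stops in front of the "=", and calling it again at position 6 picks up the second identifier.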
Transition Table for C-like Comments
T (next state; blank entries are error)
State / in char    letter    digit    other
1                  2
2                  2         2        3
3

Advance (does the transition consume the input character?)
State / in char    letter    digit    other
1                  true
2                  true      true     false
3

Accept
State    Accept
1        false
2        false
3        true
Code Schema for a Table-Driven Scanner
state := 1;
ch := next input character;
while not Accept[state] and not (T[state, ch] = error) do
    newstate := T[state, ch];
    if Advance[state, ch] then ch := next input character;
    state := newstate;
end while;
if Accept[state] then accept;
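The schema above can be made runnable by storing T, Advance, and Accept as dictionaries. The sketch below is one possible rendering, using the letter/digit/other entries listed earlier, which have the shape of the identifier DFA. The names char_class and scan, the use of None for error, and the (lexeme, next_pos) return value are assumptions made for illustration.

```python
ERROR = None  # assumption: a missing table entry stands for "error"


def char_class(ch: str) -> str:
    if ("a" <= ch <= "z") or ("A" <= ch <= "Z"):
        return "letter"
    if "0" <= ch <= "9":
        return "digit"
    return "other"  # includes "" for end of input


T = {1: {"letter": 2}, 2: {"letter": 2, "digit": 2, "other": 3}, 3: {}}
ADVANCE = {
    1: {"letter": True},
    2: {"letter": True, "digit": True, "other": False},
    3: {},
}
ACCEPT = {1: False, 2: False, 3: True}


def scan(text: str, pos: int = 0):
    """Run the table-driven loop; return (lexeme, next_pos) or None."""
    start = pos
    state = 1
    ch = text[pos] if pos < len(text) else ""
    while not ACCEPT[state] and T[state].get(char_class(ch)) is not ERROR:
        cls = char_class(ch)
        newstate = T[state][cls]
        if ADVANCE[state][cls]:
            pos += 1                                  # consume ch
            ch = text[pos] if pos < len(text) else ""
        state = newstate
    if ACCEPT[state]:
        return text[start:pos], pos
    return None
```

The loop itself never mentions letters or digits, so a different token can be scanned by swapping in different tables.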