Atif Ishaq - Lecturer GC University, Lahore

Compiler Construction
CS-4207
Lecture – 06

Transition Diagram
As an intermediate step in the construction of a lexical analyzer, the patterns are converted into
stylized flowcharts called transition diagrams. For now we will draw transition diagrams by hand,
but there are mechanical ways to construct them from a collection of regular expressions.
A transition diagram has a collection of nodes, or circles, called states. Each state
represents a condition that could occur while scanning the input for a lexeme that matches one of
several patterns. We have already talked about two pointers, lexemebegin and forward. With these
in mind, we may think of a state as summarizing everything we need to know about the characters
seen between lexemebegin and forward. Edges labelled by a symbol or a set of symbols are directed
from one state to another.
We need to remember some conventions about transition diagrams:
1. A transition diagram has one or more final (accepting) states, indicated by a double circle
or a + sign. These states indicate that a lexeme has been found. If an action is associated
with the match, such as returning a token and an attribute value to the parser, the action is
attached to the accepting state.
2. A transition diagram has one designated start state. It is indicated by an incoming edge
labelled start that comes from nowhere.

Below is the transition diagram that recognizes the lexemes matching the token relop. The start state is 0,
and from the start state there are three possibilities: the first input symbol can be <, = or >. If < is the
first input symbol, then among the lexemes that match the pattern for relop we can look for <, <= or
<>. For instance, if the next symbol is =, then from state 1 we move to the final state 2 after reading the
input symbol = and return the token relop with attribute value LE. The other possible inputs can be traced
in the same way to find the expected output token and attribute value.

Below is the transition diagram that recognizes ids and keywords. Keywords like if or else are reserved
words, so they are not identifiers even though they look like identifiers. A diagram like the one below is
usually used to recognize identifiers, although we may also use it to recognize keywords.

To handle reserved words that look like identifiers, either install the keywords in the symbol table, marked
as reserved words, or create a separate transition diagram for each keyword. In the diagram above we begin
in state 9 by checking that the lexeme begins with a letter, and then go to state 10. State 10 has a loop that
reads either a digit or a letter. When we encounter anything other than a digit or a letter we go to state 11
and accept: a lexeme has been found. The last character read is not part of the lexeme.
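The walk through states 9, 10 and 11 can be sketched in C++ as follows. This is a minimal sketch, not the lecture's own code: the function name scanIdentifier, its parameters, and the convention of returning an empty string when the diagram does not match are all assumptions made for illustration. Instead of reading the "other" character and stepping back, the sketch simply does not advance forward past it, which has the same effect.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: simulates states 9, 10 and 11 of the identifier diagram.
// Returns the lexeme found at lexemeBegin, or an empty string if none starts there.
std::string scanIdentifier(const std::string& input, std::size_t lexemeBegin) {
    assert(lexemeBegin <= input.size());
    std::size_t forward = lexemeBegin;
    int state = 9;
    while (true) {
        // End of input is treated as a character that is neither letter nor digit.
        unsigned char c = 0;
        if (forward < input.size()) c = static_cast<unsigned char>(input[forward]);
        switch (state) {
        case 9:   // start state: the lexeme must begin with a letter
            if (std::isalpha(c)) { state = 10; ++forward; }
            else return "";          // no match; another diagram must be tried
            break;
        case 10:  // loop on letters and digits
            if (std::isalnum(c)) ++forward;
            else state = 11;         // "other" character: leave it unread
            break;
        case 11:  // accepting state: the lexeme lies between lexemeBegin and forward
            return input.substr(lexemeBegin, forward - lexemeBegin);
        }
    }
}
```

For example, scanning "count1 = 5" from position 0 yields the lexeme count1, and the blank that ended the loop stays in the input for the next token.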
Consider the following transition diagram for unsigned numbers.

Beginning in state 12, we first read a digit and go to state 13. In this state we can read any number
of additional digits. If in state 13 we encounter anything other than a digit, a dot or an E (we have seen
an integer such as 345), we move to state 20. The lexical analyzer then returns the token number and a
pointer to a table of constants where the found lexeme is entered. If instead we encounter a dot in state 13,
we have a fraction, so we must look for one or more additional digits, and state 15 is used for this purpose.
If we encounter an E, we have an optional exponent. In summary, whenever we reach a final state we have
a lexeme that is to be returned.
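A possible C++ rendering of this diagram is sketched below. Only states 12, 13, 15 and 20 are named in the text; the remaining state numbers (14, 16, 17, 18) and the edges for the exponent part are assumptions that follow the usual textbook layout, and the function name scanUnsignedNumber is hypothetical. The sketch returns an empty string when the diagram gets stuck, for instance after a dot that is not followed by a digit; a real analyzer would need a recovery rule for that case.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: simulates the unsigned-number diagram starting in state 12.
// Returns the lexeme, or an empty string if the diagram fails.
std::string scanUnsignedNumber(const std::string& input, std::size_t lexemeBegin) {
    assert(lexemeBegin <= input.size());
    std::size_t forward = lexemeBegin;
    int state = 12;
    while (true) {
        // End of input is treated as an "other" character.
        char c = forward < input.size() ? input[forward] : '\0';
        bool digit = std::isdigit(static_cast<unsigned char>(c)) != 0;
        bool accept = false;
        switch (state) {
        case 12: if (digit) state = 13; else return ""; break;
        case 13:                                  // integer part
            if (digit) state = 13;
            else if (c == '.') state = 14;
            else if (c == 'E') state = 16;
            else accept = true;                   // state 20 in the text
            break;
        case 14: if (digit) state = 15; else return ""; break;
        case 15:                                  // fraction digits
            if (digit) state = 15;
            else if (c == 'E') state = 16;
            else accept = true;
            break;
        case 16:                                  // just after E
            if (c == '+' || c == '-') state = 17;
            else if (digit) state = 18;
            else return "";
            break;
        case 17: if (digit) state = 18; else return ""; break;
        case 18: if (!digit) accept = true; break; // exponent digits
        }
        // On acceptance the "other" character is left unread.
        if (accept) return input.substr(lexemeBegin, forward - lexemeBegin);
        ++forward;
    }
}
```

With these assumptions, "345;" gives 345, "3.14 " gives 3.14, and "6.02E+23)" gives 6.02E+23.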
Another transition diagram, shown below, is for white space. It looks for one or more white space
characters, represented by delim. Typically these characters are blanks, tabs and newlines, and they are
not significant to the language.
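Skipping white space needs very little code. The sketch below assumes that delim means exactly blank, tab and newline, as listed above; the function name skipWhitespace is hypothetical. It returns the position of the first character that is not a delimiter, so a result equal to the starting position means the diagram did not match.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: consumes one or more delim characters starting at pos
// and returns the index of the first character that is not a delimiter.
std::size_t skipWhitespace(const std::string& input, std::size_t pos) {
    assert(pos <= input.size());
    while (pos < input.size() &&
           (input[pos] == ' ' || input[pos] == '\t' || input[pos] == '\n')) {
        ++pos;  // the loop on delim in the diagram
    }
    return pos;
}
```

No token is returned for white space; the lexical analyzer simply restarts at the returned position.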


What Else Does the Lexical Analyzer Do?


All keywords (reserved words) are matched as ids if the keywords are stored in the symbol table. After the
match, the symbol table or a special keyword table is consulted. Each entry in the keyword table is a keyword
along with its associated token value. When a match is found, the token is returned along with its symbolic
value, e.g. “if”, 15. If no match is found, it is assumed that an id has been found.

if 15
then 16
begin 17
.. ..
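
The keyword table lookup can be sketched in C++ as below. The three entries are the ones shown in the table above; the function name tokenFor and the token code ID = 1 for identifiers are assumptions made for the example, since the notes do not give a value for id.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Assumed token code for identifiers; the notes do not specify one.
const int ID = 1;

// Hypothetical helper: called after a lexeme has been matched as an id.
// Returns the keyword's token value if the lexeme is a keyword, otherwise ID.
int tokenFor(const std::string& lexeme) {
    assert(!lexeme.empty());  // a matched id has at least one letter
    static const std::unordered_map<std::string, int> keywordTable = {
        {"if", 15}, {"then", 16}, {"begin", 17}
    };
    std::unordered_map<std::string, int>::const_iterator it = keywordTable.find(lexeme);
    return it != keywordTable.end() ? it->second : ID;
}
```

Here tokenFor("if") gives 15, while a lexeme such as count is not in the table and is reported as an id.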

Coding Regular Definition in Transition Diagram


A transition diagram can be used to build a lexical analyzer. In the code, each state is represented by a
piece of code. A variable state holds the number of the current state of the transition diagram. We can use
a switch statement for this: a switch on the value of state takes us to the code for each possible state,
where we find the action of that state. Below is sample code in C++ (not complete) that simulates the
transition diagrams discussed above.

Code

The code presented above returns a token as its return value. A token is a pair consisting of a token name
and an attribute value. getRelOp first creates a new object retToken and initializes its first component to
RELOP, the symbolic code for token relop. In state 0, the function nextChar() obtains the next character
from the input and assigns it to the local variable c. We then check c against the three characters we
expect. If the next character is not one that can begin a comparison operator, the function fail() is called.
What fail() does depends on the global error recovery strategy of the lexical analyzer. It should reset the
forward pointer to lexemebegin, so that another transition diagram can be applied to the true beginning of
the unprocessed input. It might then change the value of state to the start state of another transition
diagram. Alternatively, if no transition diagram remains unused, fail() could initiate an error correction
phase that tries to repair the input and find a lexeme.
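
For readers without the code listing at hand, a minimal C++ sketch of a function matching this description follows. The names getRelOp, retToken, RELOP, nextChar(), fail(), state, c and LE come from the text. Everything else is an assumption for illustration: the Lexer struct, the retract() helper, the attribute names other than LE, the NONE token name used to signal failure, and the state numbers beyond 0, 1 and 2, which follow the usual textbook numbering. Here fail() only resets forward; it does not switch to another diagram.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum TokenName { NONE, RELOP };                       // NONE is an assumed failure marker
enum Attribute { NO_ATTR, LT, LE, EQ, NE, GT, GE };   // only LE is named in the notes

struct Token { TokenName name; Attribute attribute; };

struct Lexer {                                        // hypothetical container for the two pointers
    std::string input;
    std::size_t lexemeBegin;
    std::size_t forward;

    char nextChar() {                                 // read one character and advance forward
        char c = forward < input.size() ? input[forward] : '\0';
        ++forward;
        return c;
    }
    void retract() { assert(forward > 0); --forward; } // un-read one character
    void fail() { forward = lexemeBegin; }             // let another diagram start over

    Token getRelOp() {
        Token retToken = {RELOP, NO_ATTR};
        int state = 0;
        while (true) {
            switch (state) {
            case 0: {
                char c = nextChar();
                if (c == '<') state = 1;
                else if (c == '=') state = 5;
                else if (c == '>') state = 6;
                else { fail(); retToken.name = NONE; return retToken; }
                break;
            }
            case 1: {
                char c = nextChar();
                if (c == '=') state = 2;
                else if (c == '>') state = 3;
                else state = 4;
                break;
            }
            case 2: retToken.attribute = LE; return retToken;
            case 3: retToken.attribute = NE; return retToken;
            case 4: retract(); retToken.attribute = LT; return retToken;
            case 5: retToken.attribute = EQ; return retToken;
            case 6: {
                char c = nextChar();
                state = (c == '=') ? 7 : 8;
                break;
            }
            case 7: retToken.attribute = GE; return retToken;
            case 8: retract(); retToken.attribute = GT; return retToken;
            }
        }
    }
};
```

States 4 and 8 call retract() because the character that ended the lexeme was read but does not belong to it. On input "<= b" the sketch returns (RELOP, LE) with forward at position 2.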

Code2
