You are on page 1of 32

Regular Expressions to Finite Automata

• High-level sketch

NFA

Regular
DFA
expressions

Lexical Table-driven
Specification Implementation of DFA

59
Regular Expressions to NFA (0)

• The steps are:


– Break down the RE into constituent sub-
expressions
– Create simple NFA’s for each basic symbol
– Combine the NFA’s using rules for the
operators (concatenation, alternation,
exponentiation
• Note: details here differ slightly from book, but
are equivalent
60
Regular Expressions to NFA (1)

• For each kind of rexp, define an NFA


– Notation: NFA for rexp M

For ϵ
ϵ

For input a
a

61
Regular Expressions to NFA (2)

For AB
A B

(slightly different in book)

For A | B
ϵ B ϵ
ϵ
ϵ A

62
Regular Expressions to NFA (3)

For A*
ϵ

ϵ A ϵ

63
Example of RegExp -> NFA conversion

• Consider the regular expression


(1|0)*1
• The NFA is

ϵ C
1 ϵ
E 1
A ϵ B ϵ
ϵ 0 ϵ
G H I
D F
ϵ

64
NFA to DFA. The Trick

• Simulate the NFA


• Each state of DFA
= a non-empty subset of states of the NFA
• Start state
= the set of NFA states reachable through ϵ-moves from NFA
start state
See ϵ -closure() algorithm, p.69 in textbook
• Add a transition S →a S’ to DFA iff
– S’ is the set of NFA states reachable from any state in S
after seeing the input a
• considering ϵ -moves as well

65
NFA -> DFA Example: Machine for (1|0)*1

C
1 ϵ
ϵ E
ϵ 1
A B 0 G ϵ H I
ϵ D F ϵ

•NFA Start state is marked

66
ϵ -closure(A) = DFA Start State
ϵ -closure(A) ϵ
1 ϵ
ϵ C E
A ϵ B G ϵ
1
0 H I
ϵ D F ϵ

ϵ
(States reachable from A on ϵ−transitions)
S
Call this “S”.
ABCDH Where can we get from
S on input “0”?
(ϵ -closure(F))
67
Second DFA state, Reachable on 0 from
Start State
ϵ -closure(F) ϵ
1 ϵ
ϵ C E
A ϵ B G ϵ
1
0 H I
ϵ D F ϵ

ϵ
T 0
S 0 BCDFGH
ABCDH
Call this “T”. Next: back to “S” to look
Where can we get for another possible state
from T on input “0”? 68
Answer: only back to T
DFA Start State Again

ϵ
1 ϵ
ϵ C E
A ϵ B G ϵ
1
0 H I
ϵ D F ϵ

From “S” again:


Where can we get on
input “1”?
ϵ-closure(E), + I
69
Third DFA state, Reachable on “1” From
Start State
ϵ-closure(E), + I ϵ
1 ϵ
ϵ C E
A ϵ B G ϵ
1
0 H I
ϵ D F ϵ

ϵ
T 0 Call this “U”.
Because this
S 0 BCDFGH
includes
0 1 state I, it is
ABCDH 1 an accepting
1 BCDEGHI state.
U 70
The Resultant DFA: Three States

(1|0)*1
0
0 T
S 0 1
1
1 U

No more distinct states were found, so


this is the complete DFA.

71
NFA to DFA. Remarks

• An NFA may be in many states at any time

• How many different states ?

• If there are N states, the NFA must be in


some subset of those N states

• How many non-empty subsets are there?


– 2N - 1 = finitely many

72
Implementation

• A DFA can be implemented by a 2D table T[]


– One dimension is “states”
– Other dimension is “input symbols”
– For every transition Si →a Sk define T[i,a] = k
• DFA “execution”
– If in state Si and input a, read T[i,a] = k and skip to
state Sk
– Very efficient

73
Table Implementation of a DFA

0
0 T

S 0 1
1
1 U

(1|0)*1
0 1 “U” is the
S T U accepting
state, so it
T T U is marked
with a “*”.
U* T U

74
Algorithm to simulate DFA (Fig 3.22, p.116)
array move[ ] input int final(int s) {/* “1” iff s is final */
return (s == 2);
state 0 1
}
0 1 2 int scan() {
1 1 2 int s = 0; -- current state
int c, move[3,2];
2* 1 2 c = getchar();
while (c != EOF) {
s = move[s,c];
S,T, & U are now states c = getchar();
0, 1, and 2
}
if (final(s)) return (YES);
This algorithm reads
an input string, else return (NO);
attempting to verify }
it using a DFA encoded
as a table.
75
Implementation (Wrap-up)

• NFA -> DFA conversion is at the heart of tools


such as lex and flex

• But, DFAs can be huge

• There are algorithms for reducing DFA’s but


we will not look at this issue

76
Lex and Flex: Scanner Generators

Lex source lex.yy.c


program Lex contains
foo.l compiler
function
yylex( )

a.out
lex.yy.c C (binary
compiler
executable)

candidate token
a.out
text file stream

77
Lex “Program” Structure

%{
.....C header stuff, copied into lex.yy.c.....
%}
.....Regular Definitions....
%%
.....Translation Rules for all tokens.....
%%
.....Auxiliary C Procedures, also copied
into lex.yy.c

78
Regular Definitions and Translation Rules

D1 R1
D2 R2
D3 R3
%%
P1 { action1 }
P2 { action2 }
P3 { action3 }
P4 { action4 }

79
Earlier Example Token Specs
(Added ASSIGN token)

Token Code lexval


“if” IF -
“then” THEN -
“else” ELSE -
id ID (symtab entry)
num NUM (value)
:= ASSIGN -
< RELOP LT
<= RELOP LE
= RELOP EQ
<> RELOP NE
> RELOP GT
>= RELOP GE
80
A Lex Implementation
%{
#define IF 3
#define THEN 4
#define ELSE 5 /* Regular definitions */
#define ID 6 delim [ \t\n]
#define NUM 7 ws {delim}+
#define RELOP 8 letter [A-Za-z]
#define ASSIGN 10 digit [0-9]
/* Subtypes of RELOP */ id {letter}({letter}|{digit})*
#define LT 1 num {digit}+
#define LE 2 %%
#define EQ 3
#define NE 4
#define GT 5
#define GE 6
int yylval; /* tokenval */
#include "sym.h"
Sym *insert(),*lookup();
%}
81
A Few Notes about Lex

• When a pattern is matched, the corresponding action


is executed
• Static char array “yytext” contains matched string
• If the action does not return, Lex tries to consume
more input, looking for another token
– Therefore actions normally execute a return statement with
a token code
– To “throw away” something, just use an “empty” action
• If more than one pattern matches a prefix of the
remaining input, chooses the longest matching string
• If more than one pattern matches the same input,
chooses the first one defined
• Unrecognized input is copied to stdout
82
The Translation Rules
{ws} { /* throw it away */ }
if { return(IF); }
then { return(THEN); }
else { return(ELSE); }
{id} { if ((yylval=(int)lookup(yytext))==NULL)
yylval=(int)insert(yytext,ID); return(ID); }
{num} { yylval=atoi(yytext); return(NUM); }
"<" { yylval=LT; return(RELOP); }
"<=" { yylval=LE; return(RELOP); }
"<>" { yylval=NE; return(RELOP); }
"=" { yylval=EQ; return(RELOP); }
">" { yylval=GT; return(RELOP); }
">=” { yylval=GE; return(RELOP); }
":=" { return (ASSIGN); }
%%
/* Code for insert( ) and lookup( ) can go here... */

83
Demonstration of Lex Example

/* A Short mainline to test the lexical analyser*/

extern int yylval;


main() {
int tok;
init_strtab(20000); /* part of symbol table package */
do {
tok = yylex( );
printf("Token %d Value %d\n", tok, yylval);
}
while (tok != 0);
}

84
Compile, Link and Run..

torch% a.out typed input


torch% lex scanner.l if a<=26 then b:=sum else b:=0 string
Token 3 Value 0
torch% cc main.c sym.c lex.yy.c -ll Token 6 Value 53416
main.c: Token 8 Value 2
sym.c: Token 7 Value 26 “left-overs”
lex.yy.c: Token 4 Value 26
Linking: Token 6 Value 53440
Token 10 Value 53440
torch%
Token 6 Value 53464
Token 5 Value 53464
Token 6 Value 53440
Token 10 Value 53440
IF 3 Token 7 Value 0
THEN 4 Token 0 Value 0
ELSE 5
ID 6 torch%
NUM 7
RELOP 8
ASSIGN 10 85
86
First Glimpse of the Term Project...

• A small object-oriented language, compiled to MIPS


• “Cool”: Classroom Objected-Oriented Language”
• “UnCool” (because I simplified it, so it isn’t Cool
anymore!)
• Multiple classes, all in single source file
• Int, Bool, String, and Int array primitive types
• Public Methods and private Attribute fields
– With Attribute initializers
• Assignment, expressions, if and while
• Simple input-output of numbers, and output of Strings
– (Uses built-in facilities in MIPS simulator)

87
Example Program: Factorial.cl

class Fac {
computeFac(num : Int) : Int {
if num < 1 then 1
else num * (computeFac(num-1)) fi
};
};
class Main {
main() : Object {
{ out_string("Enter a number ");
out_int((new Fac).computeFac(in_int()));
out_string("\n");
}
};
}; 88
Project (continued)

• Language spec is on the web


– (see "UnCool Manual" in Downloads section)
• Bison-ready grammar file will be provided
– You will need to provide a lexical analyser (Can use flex)
• On-line bison and flex manuals available from course web page
• Symbol table manager and several test programs will be provided
• “Reference compiler” binary on torch (see manual)
• Three passes:
– Parsing and construction of an AST
– Semantic Analysis Type checking
– Code Generation
• The reference compiler is about 1900 lines of code +bison & flex
files
89
Programming Assignment 1

• Familiarize yourself with the UnCool language


– and the MIPS simulator
• by writing and debugging a small program
– Quick Sort, to sort a list of numbers
– starting point provided (as a Java program)

• Complete assignment posted on web


• Due one week from today (January 23)
– worth 5 points, of the 35% for the term Project

90

You might also like