
COS 320

Compilers
David Walker

Outline

Last Week

Introduction to ML

Today:

Lexical Analysis

Reading: Chapter 2 of Appel



The Front End

Lexical Analysis: Create sequence of tokens from characters

Syntax Analysis: Create abstract syntax tree from sequence of tokens

Type Checking: Check program for well-formedness constraints

[Diagram: stream of characters -> Lexer -> stream of tokens -> Parser -> abstract syntax -> Type Checker]

Lexical Analysis

Lexical Analysis: Breaks stream of ASCII characters (source) into tokens

Token: An atomic unit of program syntax

  i.e., a word as opposed to a sentence

Tokens and their types:

Type      Characters Recognized    Token
ID        foo, x, listcount        ID(foo), ID(x), ...
REAL      10.45, 3.14, -2.1        REAL(10.45), REAL(3.14), ...
SEMI      ;                        SEMI
LPAREN    (                        LPAREN
NUM       50, 100                  NUM(50), NUM(100)
IF        if                       IF

Lexical Analysis Example

Input characters:  x = ( y + 4.0 ) ;

After Lexical Analysis:  ID(x) ASSIGN LPAREN ID(y) PLUS REAL(4.0) RPAREN SEMI
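
The example above can be reproduced with a small hand-written tokenizer. The following is only a sketch in Python, not how ml-lex works internally: the rule list, the CARRIES_TEXT set, and the function name tokenize are assumptions made for illustration. Python's regex alternation picks the first alternative that matches, so REAL is listed before NUM to keep 4.0 from being split.

```python
import re

# Hypothetical token rules, in the order they are tried: (token type, regex).
TOKEN_RULES = [
    ("REAL",   r"\d+\.\d+"),
    ("NUM",    r"\d+"),
    ("ID",     r"[a-z][a-z0-9]*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SEMI",   r";"),
]
# Token types that keep the matched characters as an attribute.
CARRIES_TEXT = {"ID", "NUM", "REAL"}

# One combined regex with a named group per rule.
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_RULES))

def tokenize(source):
    """Return the list of tokens for source, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"illegal character at position {pos}")
        kind = m.lastgroup  # name of the rule that matched
        tokens.append(f"{kind}({m.group()})" if kind in CARRIES_TEXT else kind)
        pos = m.end()
    return tokens

print(" ".join(tokenize("x = ( y + 4.0 ) ;")))
# ID(x) ASSIGN LPAREN ID(y) PLUS REAL(4.0) RPAREN SEMI
```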

Lexer Implementation

Implementation Options:

1. Write a Lexer from scratch

   Boring, error-prone and too much work

2. Use a Lexer Generator

   Quick and easy. Good for lazy compiler writers.

[Diagram: Lexer Specification -> lexer generator -> Lexer; stream of characters -> Lexer -> stream of tokens]

How do we specify the lexer?

Develop another language

We'll use a language involving regular expressions to specify tokens

What is a lexer generator?

Another compiler ...



Some Definitions

We will want to define the language of legal tokens our lexer can recognize

Alphabet: a collection of symbols (ASCII is an alphabet)

String: a finite sequence of symbols taken from our alphabet

Language of legal tokens: a set of strings

  Language of ML keywords: set of all strings which are ML keywords (FINITE)
  Language of ML tokens: set of all strings which map to ML tokens (INFINITE)

A language can also be a more general set of strings:
  e.g., ML Language: set of all strings representing correct ML programs (INFINITE).

Regular Expressions: Construction

Base Cases:

  For each symbol a in alphabet, a is an RE denoting the set {a}

  Epsilon (ε) denotes {""}, the set containing only the empty string

Inductive Cases (M and N are REs)

  Alternation (M | N) denotes strings in M or N

    (a | b) == {a, b}

  Concatenation (M N) denotes strings in M concatenated with strings in N

    (a | b) (a | c) == { aa, ac, ba, bc }

  Kleene closure (M*) denotes strings formed by any number of repetitions of strings in M

    (a | b)* == {ε, a, b, aa, ab, ba, bb, ...}
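
The three inductive cases can be checked directly by treating each RE as the set of strings it denotes. This is a minimal sketch in Python; the function names alt, cat and star are made up for illustration, and star is cut off after max_reps repetitions because the true Kleene closure is infinite.

```python
def alt(m, n):
    """Alternation M | N: strings in M or in N."""
    return m | n

def cat(m, n):
    """Concatenation M N: a string from M followed by a string from N."""
    return {x + y for x in m for y in n}

def star(m, max_reps):
    """Kleene closure M*, truncated to at most max_reps repetitions."""
    result = {""}   # zero repetitions: the empty string
    layer = {""}
    for _ in range(max_reps):
        layer = cat(layer, m)   # strings made of exactly one more repetition
        result |= layer
    return result

a, b, c = {"a"}, {"b"}, {"c"}
print(sorted(alt(a, b)))                   # ['a', 'b']
print(sorted(cat(alt(a, b), alt(a, c))))   # ['aa', 'ac', 'ba', 'bc']
print(sorted(star(alt(a, b), 2), key=lambda s: (len(s), s)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```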


Regular Expressions

Integers begin with an optional minus sign, continue with a sequence of digits

Regular Expression:
  (- | ε) (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)*

So writing (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)
and even worse (a | b | c | ...) gets tedious...

Regular Expressions (REs)

common abbreviations:

  [a-c] == (a | b | c)

  . == any character except \n

  \n == new line character

  a+ == one or more

  a? == zero or one

all abbreviations can be defined in terms of the "standard" REs
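
One way to convince yourself that the abbreviations add no power is to compare each one against its expansion on every short string. The sketch below uses Python's re module, whose notation for these abbreviations happens to match; the helper same_language is a hypothetical name, and it only checks strings up to a length bound, so it is a sanity test rather than a proof.

```python
import re
from itertools import product

def same_language(re1, re2, alphabet, max_len):
    """Check that two regexes accept the same strings up to max_len over alphabet."""
    p1, p2 = re.compile(re1), re.compile(re2)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(p1.fullmatch(s)) != bool(p2.fullmatch(s)):
                return False
    return True

print(same_language(r"[a-c]", r"(a|b|c)", "abcd", 3))   # True
print(same_language(r"a+", r"aa*", "ab", 4))            # True
print(same_language(r"a?", r"(a|)", "ab", 3))           # True

# The integer RE from the earlier slide, abbreviated and spelled out.
digits = "(0|1|2|3|4|5|6|7|8|9)"
print(same_language(r"-?[0-9]*", rf"(-|){digits}*", "-09a", 4))   # True
```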

Ambiguous Token Rule Sets

A single RE is a completely unambiguous specification of a token.

  call the association of an RE with a token a "rule"

To lex an entire programming language, we need many rules

  but ambiguities arise:

    multiple REs or sequences of REs match the same string

    hence many token sequences possible



Ambiguous Token Rule Sets

Example:

  Identifier tokens: [a-z] [a-z0-9]*

  Sample keyword tokens: if, then, ...

How do we tokenize:

  foobar ==> ID(foobar) or ID(foo) ID(bar)

  if ==> ID(if) or IF



Ambiguous Token Rule Sets

We resolve ambiguities using two conventions:

  Longest match: The regular expression that matches the longest string takes precedence.

  Rule Priority: The regular expressions identifying tokens are written down in sequence. If two regular expressions match the same (longest) string, the first regular expression in the sequence takes precedence.

Ambiguous Token Rule Sets

Example:

  Identifier tokens: [a-z] [a-z0-9]*

  Sample keyword tokens: if, then, ...

How do we tokenize:

  foobar ==> ID(foobar) or ID(foo) ID(bar)

    use longest match to disambiguate: ID(foobar)

  if ==> ID(if) or IF

    keyword rules have higher priority than identifier rule: IF
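
The two conventions can be sketched in a few lines of Python. This is an illustration under assumptions, not the ml-lex implementation: the RULES list and the function names are made up, and each rule is matched separately with Python's re, which returns the longest match only for simple patterns like these.

```python
import re

# Rules in priority order (earlier = higher priority); names are illustrative.
RULES = [
    ("IF",   re.compile(r"if")),
    ("THEN", re.compile(r"then")),
    ("ID",   re.compile(r"[a-z][a-z0-9]*")),
]

def next_token(text, pos):
    """Pick the rule with the longest match at pos; ties go to the earliest rule."""
    best = None  # (match length, rule name)
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strict '>' means a later rule never displaces an earlier one on a tie.
        if m and (best is None or m.end() - pos > best[0]):
            best = (m.end() - pos, name)
    if best is None:
        raise ValueError(f"no rule matches at position {pos}")
    length, name = best
    lexeme = text[pos:pos + length]
    return (f"ID({lexeme})" if name == "ID" else name), pos + length

def tokenize(text):
    """Tokenize text, skipping single spaces between tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos] == " ":
            pos += 1
            continue
        tok, pos = next_token(text, pos)
        tokens.append(tok)
    return tokens

print(tokenize("foobar"))    # ['ID(foobar)']
print(tokenize("if"))        # ['IF']
print(tokenize("iffy if"))   # ['ID(iffy)', 'IF']
```

Longest match decides foobar and iffy; rule priority decides if, where IF and ID both match two characters.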

Lexer Implementation
Implementation Options:
1. Write Lexer from scratch
   Boring and error-prone
2. Use Lexical Analyzer Generator
   Quick and easy
ml-lex is a lexical analyzer generator for ML.
lex and flex are lexical analyzer generators for C.

ML-Lex Specification

Lexical specification consists of 3 parts:

  User Declarations (plain ML types, values, functions)
  %%
  ML-LEX Definitions (RE abbreviations, special stuff)
  %%
  Rules (association of REs with tokens)
        (each token will be represented in plain ML)

User Declarations

User Declarations:

  User can define various values that are available to the action fragments.

  Two values must be defined in this section:

    type lexresult
      type of the value returned by each rule action.

    fun eof ()
      called by lexer when end of input stream is reached.

ML-LEX Definitions

ML-LEX Definitions:

  User can define regular expression abbreviations:

    DIGITS = [0-9]+;
    LETTER = [a-zA-Z];

  Define multiple lexers to work together. Each is given a unique name.

    %s LEX1 LEX2 LEX3;

Rules

Rules:

  A rule consists of a pattern and an action:

    Pattern is a regular expression.

    Action is a fragment of ordinary ML code.

    Longest match & rule priority used for disambiguation

  Rules may be prefixed with the list of lexers that are allowed to use this rule.

    <lexer_list> regular_expression => (action.code) ;

Rules

Rule actions can use any value defined in the User Declarations section, including

  type lexresult
    type of value returned by each rule action

  val eof : unit -> lexresult
    called by lexer when end of input stream reached

special variables:

  yytext: input substring matched by regular expression

  yypos: file position of the beginning of matched string

  continue (): doesn't return token; recursively calls lexer

A Simple Lexer
datatype token = Num of int | Id of string | IF | THEN | ELSE | EOF
type lexresult = token (* mandatory *)
fun eof () = EOF (* mandatory *)
(* itos: convert the matched digit string to an int *)
fun itos s = case Int.fromString s of SOME x => x | NONE => raise Fail "itos: not an integer"
%%
NUM = [1-9][0-9]*;
ID = [a-zA-Z]([a-zA-Z]|{NUM})*;
%%
if => (IF);
then => (THEN);
else => (ELSE);
{NUM} => (Num (itos yytext));
{ID} => (Id yytext);

Using Multiple Lexers

Rules prefixed with a lexer name are matched only when that lexer is executing

Initial lexer is called INITIAL

Enter new lexer using:

  YYBEGIN LEXERNAME;

Aside: Sometimes useful to process characters, but not return any token from the lexer. Use:

  continue ();

Using Multiple Lexers
type lexresult = unit (* mandatory *)
fun eof () = () (* mandatory *)
%%
%s COMMENT;
%%
<INITIAL> if => ();
<INITIAL> [a-z]+ => ();
<INITIAL> "(*" => (YYBEGIN COMMENT; continue ());
<COMMENT> "*)" => (YYBEGIN INITIAL; continue ());
<COMMENT> "\n" | . => (continue ());

A (Marginally) More Exciting Lexer
type lexresult = string (* mandatory *)
fun eof () = (print "End of file\n"; "EOF") (* mandatory *)
%%
%s COMMENT;
INT = [1-9] [0-9]*;
%%
<INITIAL> if => ("IF");
<INITIAL> then => ("THEN");
<INITIAL> {INT} => ( "INT(" ^ yytext ^ ")" );
<INITIAL> "(*" => (YYBEGIN COMMENT; continue ());
<COMMENT> "*)" => (YYBEGIN INITIAL; continue ());
<COMMENT> "\n" | . => (continue ());
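
To see how the INITIAL and COMMENT lexers hand control back and forth, here is a rough Python imitation of the specification above. It is a sketch under assumptions: the RULES table layout and the function lex are invented for illustration, a rule with no token function plays the role of continue (), and a whitespace rule has been added to INITIAL (it is not in the slide) so the sample input can contain spaces.

```python
import re

# Per-lexer rules in priority order: (regex, token function or None, next lexer or None).
# A None token function means "continue ()": consume the text, return no token.
RULES = {
    "INITIAL": [
        (re.compile(r"if"),          lambda t: "IF",             None),
        (re.compile(r"then"),        lambda t: "THEN",           None),
        (re.compile(r"[1-9][0-9]*"), lambda t: "INT(" + t + ")", None),
        (re.compile(r"\(\*"),        None,                       "COMMENT"),  # YYBEGIN COMMENT
        (re.compile(r"[ \n]+"),      None,                       None),       # assumption: skip whitespace
    ],
    "COMMENT": [
        (re.compile(r"\*\)"),        None,                       "INITIAL"),  # YYBEGIN INITIAL
        (re.compile(r"\n|."),        None,                       None),
    ],
}

def lex(text):
    """Return all tokens in text, ending with "EOF"."""
    state, pos, tokens = "INITIAL", 0, []
    while pos < len(text):
        best = None  # (end position, token function, next lexer, lexeme)
        for pattern, make_token, next_state in RULES[state]:
            m = pattern.match(text, pos)
            # Longest match wins; strict '>' keeps the earlier rule on ties.
            if m and (best is None or m.end() > best[0]):
                best = (m.end(), make_token, next_state, m.group())
        if best is None:
            raise ValueError(f"no rule matches in lexer {state} at position {pos}")
        pos, make_token, next_state, lexeme = best
        if next_state is not None:
            state = next_state
        if make_token is not None:
            tokens.append(make_token(lexeme))
    tokens.append("EOF")
    return tokens

print(lex("if (* then 7 *) then 42"))   # ['IF', 'THEN', 'INT(42)', 'EOF']
```

The then and 7 inside the comment produce no tokens because only COMMENT rules are considered while that lexer is active.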

Implementing ML-Lex

By compiling, of course:

  convert REs into non-deterministic finite automata

  convert non-deterministic finite automata into deterministic finite automata

  convert deterministic finite automata into a blazingly fast table-driven algorithm

you did mostly everything but possibly the last step in your favorite algorithms class

  need to deal with disambiguation & rule priority

  need to deal with multiple lexers



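Refreshing your memory:
RE ==> NDFA ==> DFA

Lex rules:
  if => (Tok.IF);
  [a-z][a-z0-9]* => (Tok.Id);

NDFA:
  1 --i--> 2 --f--> 3          (3 is final: Tok.IF)
  1 --[a-z]--> 4               (4 is final: Tok.Id)
  4 --[a-z0-9]--> 4

DFA:
  1 --i--> {2,4}
  1 --[a-hj-z]--> 4
  {2,4} --f--> {3,4}
  {2,4} --[a-eg-z0-9]--> 4
  {3,4} --[a-z0-9]--> 4
  4 --[a-z0-9]--> 4

  Final states:
    {2,4}: Tok.Id
    {3,4}: Tok.IF (could be Tok.Id; decision made by rule priority)
    4: Tok.Id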
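
The NDFA-to-DFA step for these two rules can be sketched with the subset construction. The Python below is a minimal illustration, assuming the NDFA has states 1 to 4 with no epsilon transitions (so no epsilon-closure step is needed); all names (NFA, ACCEPT, subset_construction, final_token) are hypothetical.

```python
import string

LETTERS = string.ascii_lowercase
ALNUM = LETTERS + string.digits

# NDFA transitions: (state, symbol) -> set of next states.
NFA = {}
def add_edge(src, symbols, dst):
    for ch in symbols:
        NFA.setdefault((src, ch), set()).add(dst)

add_edge(1, "i", 2)       # rule: if
add_edge(2, "f", 3)
add_edge(1, LETTERS, 4)   # rule: [a-z][a-z0-9]*
add_edge(4, ALNUM, 4)

# Accepting NDFA states with rule order as priority (0 = first rule).
ACCEPT = {3: (0, "Tok.IF"), 4: (1, "Tok.Id")}

def subset_construction(start):
    """Build the DFA states (frozensets of NDFA states) reachable from start."""
    dfa, todo = {}, [frozenset([start])]
    while todo:
        current = todo.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        for ch in ALNUM:
            target = frozenset(s for q in current for s in NFA.get((q, ch), ()))
            if target:
                dfa[current][ch] = target
                todo.append(target)
    return dfa

def final_token(dfa_state):
    """Token for a DFA state: the highest-priority accepting NDFA state it contains."""
    hits = [ACCEPT[q] for q in dfa_state if q in ACCEPT]
    return min(hits)[1] if hits else None

dfa = subset_construction(1)
for state in sorted(dfa, key=sorted):
    print(sorted(state), final_token(state))
# [1] None
# [2, 4] Tok.Id
# [3, 4] Tok.IF
# [4] Tok.Id
```

The state {3,4} contains accepting states for both rules; taking the minimum priority is where rule priority enters the construction.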

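Table-driven algorithm

DFA (states conveniently renamed):

  S1 = 1, S2 = {2,4}, S3 = {3,4}, S4 = 4

  S1 --i--> S2
  S1 --[a-hj-z]--> S4
  S2 --f--> S3
  S2 --[a-eg-z0-9]--> S4
  S3 --[a-z0-9]--> S4
  S4 --[a-z0-9]--> S4

  Final states: S2 (Tok.Id), S3 (Tok.IF), S4 (Tok.Id)
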

Table-driven algorithm

Transition Table (rows: input character; columns: current DFA state):

        S1   S2   S3   S4
  a     S4   S4   S4   S4
  b     S4   S4   S4   S4
  ...
  i     S2   S4   S4   S4
  ...

Final State Table:

  S1   S2       S3       S4
  -    Tok.Id   Tok.IF   Tok.Id

Algorithm:
  Start in start state
  Transition from one state to next using transition table
  Every time you reach a potential final state, remember it + position in stream
  When no more transitions apply, revert to last final state seen + position
  Execute associated rule code
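
The algorithm steps above can be sketched as a short Python loop over the two tables. This is an illustration, not ml-lex's generated code: the dictionary layout and the name next_token are assumptions, a missing table entry stands for "no transition applies", and instead of executing rule code the function just returns the token name.

```python
import string

LETTERS = string.ascii_lowercase
ALNUM = LETTERS + string.digits

# Transition table for the DFA with states S1..S4 (missing entry = no transition).
TRANSITIONS = {"S1": {}, "S2": {}, "S3": {}, "S4": {}}
for ch in LETTERS:
    TRANSITIONS["S1"][ch] = "S2" if ch == "i" else "S4"
for ch in ALNUM:
    TRANSITIONS["S2"][ch] = "S3" if ch == "f" else "S4"
    TRANSITIONS["S3"][ch] = "S4"
    TRANSITIONS["S4"][ch] = "S4"

# Final state table: S1 is not final.
FINAL = {"S2": "Tok.Id", "S3": "Tok.IF", "S4": "Tok.Id"}

def next_token(text, start):
    """Run the DFA from start; return (token, lexeme, end) for the longest match."""
    state, pos = "S1", start
    last_final = None   # (token, end position) of the last final state seen
    while pos < len(text) and text[pos] in TRANSITIONS[state]:
        state = TRANSITIONS[state][text[pos]]
        pos += 1
        if state in FINAL:
            last_final = (FINAL[state], pos)
    if last_final is None:
        raise ValueError(f"no token at position {start}")
    token, end = last_final   # revert to the last final state and its position
    return token, text[start:end], end

print(next_token("if", 0))       # ('Tok.IF', 'if', 2)
print(next_token("iffy+1", 0))   # ('Tok.Id', 'iffy', 4)
print(next_token("x9 y", 0))     # ('Tok.Id', 'x9', 2)
```

In this particular DFA every state after S1 is final, so the revert step never backs up; it matters for rule sets where a longer prefix can fail after a shorter one has already matched.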

Dealing with Multiple Lexers
Lex rules:
<INITIAL> if => (Tok.IF);
<INITIAL> [a-z][a-z0-9]* => (Tok.Id);
<INITIAL> "(*" => (YYBEGIN COMMENT; continue ());
<COMMENT> "*)" => (YYBEGIN INITIAL; continue ());
<COMMENT> . => (continue ());

[Diagram: INITIAL --"(*"--> COMMENT; COMMENT --"*)"--> INITIAL; INITIAL loops on [a-z][a-z0-9]; COMMENT loops on .]

Summary

A Lexer:

  input: stream of characters

  output: stream of tokens

Writing lexers by hand is boring, so we use a lexer generator: ml-lex

  lexer generators work by converting REs through automata theory to efficient table-driven algorithms.

Moral: don't underestimate your theory classes!

  great application of cool theory developed in the 70s.

  we'll see more cool apps as the course progresses
