
Compiler Construction
Lecture 3
Scanner (Lexical Analyzer)

Notation
• For lexical analysis we care about regular languages.
• Regular languages can be described using regular expressions.

Regular Languages
• Each regular expression is a notation for a regular language.
• If A is a regular expression, we write L(A) to refer to the language denoted by A.

Regular Expression
ε = the empty string
R|S = either R or S
RS = S follows R (concatenation)
R* = concatenation of R zero or more times
(R* = ε | R | RR | RRR | ...)

RE Extensions
R? = ε | R (zero or one R)
R+ = RR* (one or more R)
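The two extension identities can be spot-checked with Python's `re` module, whose syntax includes the same `?`, `+`, and `*` operators. This is only a sketch: the helper name `same_language` and the sample strings are assumptions made for illustration, and because Python has no ε literal, an empty alternative `(|a)` stands in for `ε | a`.

```python
import re

def same_language(pattern_a, pattern_b, samples):
    """Return True if both patterns accept exactly the same sample strings."""
    return all(
        bool(re.fullmatch(pattern_a, s)) == bool(re.fullmatch(pattern_b, s))
        for s in samples
    )

# A small, hand-picked set of test strings (an assumption, not exhaustive).
samples = ["", "a", "aa", "aaa", "b", "ab"]

# R? = ε | R, with R = a; the empty alternative plays the role of ε.
print(same_language(r"a?", r"(|a)", samples))

# R+ = RR*, with R = a.
print(same_language(r"a+", r"aa*", samples))
```

Agreement on a handful of samples is evidence, not a proof, that the two expressions denote the same language.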

Regular Expression
RE        Strings in L(R)
a         “a”
ab        “ab”
a|b       “a” “b”
(ab)*     “” “ab” “abab” ...
(a|ε)b    “ab” “b”
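The rows of this table can be reproduced by brute force: enumerate every string over {a, b} up to some length and keep those the expression matches in full. The sketch below uses Python's `re` module; the function name `strings_in_language`, the length bound of 4, and writing ε as an empty alternative are assumptions for illustration.

```python
import re
from itertools import product

def strings_in_language(pattern, alphabet="ab", max_len=4):
    """List every string over `alphabet`, up to `max_len` long, matched by `pattern`."""
    matches = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            # fullmatch requires the whole string to match, as L(R) membership does.
            if re.fullmatch(pattern, s):
                matches.append(s)
    return matches

# "(a|)b" is the Python spelling of (a|ε)b.
for pattern in ["a", "ab", "a|b", "(ab)*", "(a|)b"]:
    print(pattern, strings_in_language(pattern))
```

For `(ab)*` the list is cut off by the length bound; the language itself is infinite, as the `...` in the table indicates.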
Example: identifiers
• identifier: string of letters or digits, starting with a letter
• C identifier: [a-zA-Z_][a-zA-Z0-9_]*
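The C identifier expression can be tried directly, since Python's `re` module accepts the same character-class syntax. This is a minimal sketch: the names `C_IDENTIFIER` and `is_c_identifier` and the candidate strings are assumptions for illustration, and the check does not exclude reserved words such as `while`, which a real scanner would handle separately.

```python
import re

# The regular expression from the slide, compiled once.
C_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

def is_c_identifier(text):
    """Return True if the whole string matches the identifier pattern."""
    return C_IDENTIFIER.fullmatch(text) is not None

for candidate in ["count", "_tmp1", "x", "9lives", "my-var", ""]:
    print(repr(candidate), is_c_identifier(candidate))
```

Using `fullmatch` rather than `match` matters here: `match` would accept `my-var` because the prefix `my` is a valid identifier.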
Finite Automata (FA)
• Specification: Regular Expressions
• Implementation: Finite Automata

Finite Automata
A finite automaton consists of:
• An input alphabet
• A set of states
• A start (initial) state
• A set of transitions
• A set of accepting (final) states
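The five components listed above map naturally onto a small record type. The sketch below is one possible encoding, not a prescribed one: the class name `FiniteAutomaton`, its field names, and the state names `A` and `B` are assumptions, and it further assumes the automaton is deterministic (at most one transition per state and symbol) and rejects when no transition applies.

```python
from dataclasses import dataclass

@dataclass
class FiniteAutomaton:
    alphabet: set       # the input alphabet
    states: set         # the set of states
    start: str          # the start (initial) state
    transitions: dict   # maps (state, symbol) to the next state
    accepting: set      # the set of accepting (final) states

    def accepts(self, text):
        """Run the automaton over `text` and report whether it ends in an accepting state."""
        state = self.start
        for symbol in text:
            if symbol not in self.alphabet:
                return False
            key = (state, symbol)
            if key not in self.transitions:
                return False  # no transition defined: reject
            state = self.transitions[key]
        return state in self.accepting

# An automaton that accepts only the string "1".
only_one = FiniteAutomaton(
    alphabet={"0", "1"},
    states={"A", "B"},
    start="A",
    transitions={("A", "1"): "B"},
    accepting={"B"},
)

for text in ["1", "", "11", "0"]:
    print(repr(text), only_one.accepts(text))
```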
Finite Automaton
State Graphs
[Figure: symbols for a state, the start state, and an accepting state]
Finite Automaton
State Graphs
[Figure: a transition, drawn as an edge labeled a]

FA Example
• An FA that accepts only “1”
[Figure: state graph with one transition, labeled 1]

FA Example
• An FA that accepts any number of 1’s followed by a single 0
[Figure: state graph with transitions labeled 1 and 0]
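This automaton can be simulated with a transition table and a short loop. The sketch below assumes two states, here called `A` (start, looping on 1) and `B` (accepting, reached on 0); those names and the helper `run_fa` are illustrative and not taken from the slide.

```python
def run_fa(transitions, start, accepting, text):
    """Follow one transition per input symbol; reject if none is defined."""
    state = start
    for symbol in text:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting

# Any number of 1's followed by a single 0, i.e. the language of 1*0.
ONES_THEN_ZERO = {
    ("A", "1"): "A",  # stay in the start state while reading 1's
    ("A", "0"): "B",  # the single 0 moves to the accepting state
}

def accepts_ones_then_zero(text):
    return run_fa(ONES_THEN_ZERO, "A", {"B"}, text)

for text in ["0", "10", "1110", "", "1", "100", "01"]:
    print(repr(text), accepts_ones_then_zero(text))
```

State `B` has no outgoing transitions, so any input after the 0 is rejected; that is what enforces "a single 0".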

FA Example
• An FA that accepts ab*a
• Alphabet: {a,b}
[Figure: state graph with transitions labeled a, b, and a]
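Because the regular expression is the specification and the automaton is the implementation, the two can be checked against each other. The sketch below assumes three states, named `S` (start), `M` (after the first a, looping on b), and `F` (accepting); these names, the nested-dictionary layout, and the length bound are illustrative choices, and Python's `re` module stands in for the specification.

```python
import re
from itertools import product

# Transition table: state -> {symbol -> next state}.
TRANSITIONS = {
    "S": {"a": "M"},
    "M": {"b": "M", "a": "F"},
    "F": {},
}

def fa_accepts(text):
    """Simulate the automaton for ab*a; a missing transition means reject."""
    state = "S"
    for symbol in text:
        state = TRANSITIONS[state].get(symbol)
        if state is None:
            return False
    return state == "F"

def agrees_with_regex(max_len=6):
    """Compare the automaton with the regex on every string over {a, b} up to max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            s = "".join(letters)
            if fa_accepts(s) != bool(re.fullmatch(r"ab*a", s)):
                return False
    return True

print(fa_accepts("abba"), fa_accepts("ab"), agrees_with_regex())
```

The exhaustive comparison only covers strings up to the chosen bound, so it is a sanity check rather than a proof of equivalence.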
