5 - Regular Expression
● ℒ(R1R2) = ℒ(R1) ℒ(R2)
● ℒ(R1 ∪ R2) = ℒ(R1) ∪ ℒ(R2)
● ℒ(R*) = ℒ(R)*
● ℒ((R)) = ℒ(R)
Apply this recursive definition to a(b∪c)((d)) and see what you get.
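As a rough sketch of how the recursive definition can be applied mechanically, the Python below represents a regex as nested tuples and computes ℒ(R) restricted to strings of length at most `max_len` (ℒ(R*) is infinite in general, so some bound is needed). The tuple encoding, the `language` function, and the `"char"` and `"eps"` base cases are assumptions made for illustration; they do not come from the slides.

```python
def language(regex, max_len):
    """Return the strings of L(regex) whose length is at most max_len."""
    kind = regex[0]
    if kind == "eps":                      # assumed base case: the empty string
        return {""}
    if kind == "char":                     # assumed base case: a single symbol
        return {regex[1]} if max_len >= 1 else set()
    if kind == "paren":                    # L((R)) = L(R)
        return language(regex[1], max_len)
    if kind == "union":                    # L(R1 ∪ R2) = L(R1) ∪ L(R2)
        return language(regex[1], max_len) | language(regex[2], max_len)
    if kind == "concat":                   # L(R1R2) = L(R1)L(R2)
        left = language(regex[1], max_len)
        right = language(regex[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":                     # L(R*) = L(R)*
        base = language(regex[1], max_len)
        result, frontier = {""}, {""}
        while frontier:                    # keep appending until nothing new fits
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown regex node: {kind!r}")


a, b, c, d = (("char", ch) for ch in "abcd")
# a(b∪c)((d)) as nested tuples
example = ("concat",
           ("concat", a, ("paren", ("union", b, c))),
           ("paren", ("paren", d)))
result = language(example, max_len=5)
```

Here `result` is the set containing `abd` and `acd`: the parentheses contribute nothing beyond grouping, and the union picks one of b or c.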
Designing Regular Expressions
● Let Σ = {a, b}.
● Let L = { w ∈ Σ* | w contains aa as a substring }.
(a ∪ b)*aa(a ∪ b)*
Equivalently, writing Σ for (a ∪ b): Σ*aaΣ*
Sample strings in L:
bbabbbaabab
aaaa
bbbbbabbbbaabbbbb
Designing Regular Expressions
● Let Σ = {a, b}.
● Let L = { w ∈ Σ* | |w| = 4 }.
The length of a string w is denoted |w|.
ΣΣΣΣ
Equivalently: Σ⁴
Sample strings in L:
aaaa
baba
bbbb
baaa
Designing Regular Expressions
● Let Σ = {a, b}.
● Let L = { w ∈ Σ* | w contains at most one a }.
Here are some candidate regular expressions for the language L. Which of these are correct?
Σ*aΣ*
b*ab* ∪ b*
b*(a ∪ ε)b*
b*a*b* ∪ b*
b*(a* ∪ ε)b*
One correct answer: b*(a ∪ ε)b*
Equivalently, using the ? shorthand: b*a?b*
Sample strings in L:
bbbbabbb
bbbbbb
abbb
a
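One way to sanity-check the candidate expressions is to compare each against the definition of L on every short string. The sketch below does this with Python's `re` module. The translations into Python syntax (`|` for ∪, an empty alternative for ε, `[ab]` for Σ) were done by hand and are assumptions of this example, as are the helper names.

```python
import re
from itertools import product

# Candidates from the slide, translated by hand into Python `re` syntax.
CANDIDATES = {
    "Σ*aΣ*": r"[ab]*a[ab]*",
    "b*ab* ∪ b*": r"b*ab*|b*",
    "b*(a ∪ ε)b*": r"b*(a|)b*",
    "b*a*b* ∪ b*": r"b*a*b*|b*",
    "b*(a* ∪ ε)b*": r"b*(a*|)b*",
}


def in_language(w):
    """Membership in L straight from the definition: at most one a."""
    return w.count("a") <= 1


def first_mismatch(pattern, max_len=6):
    """Return the first string (shortest first) on which the pattern and L
    disagree, or None if they agree on every string up to max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if (re.fullmatch(pattern, w) is not None) != in_language(w):
                return w
    return None


results = {name: first_mismatch(p) for name, p in CANDIDATES.items()}
```

On this bounded check, `b*ab* ∪ b*` and `b*(a ∪ ε)b*` agree with L everywhere. `Σ*aΣ*` already fails on the empty string, and the two candidates containing `a*` fail on `aa`. Agreement up to a length bound is evidence rather than a proof, but a mismatch is a genuine counterexample.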
A More Elaborate Design
● Let Σ = { a, ., @ }, where a represents “some letter.”
● Let's make a regex for email addresses.
cs103@cs.stanford.edu
first.middle.last@mail.site.org
dot.at@dot.com
Building the regex piece by piece:
aa*
aa* (.aa*)*
aa* (.aa*)* @
Final regex, using the ⁺ shorthand:
a⁺(.a⁺)*@a⁺(.a⁺)⁺
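For a concrete feel, here is a minimal sketch of the final email regex in Python's `re` syntax. It mirrors the slide's abstraction by first replacing every letter with `a`. Treating digits as letters too (so that `cs103` is accepted) is an assumption of this example, and the names `abstract` and `looks_like_email` are hypothetical. Real-world email validation is considerably more involved.

```python
import re

# a+(.a+)*@a+(.a+)+ with the literal dot escaped for Python's regex syntax.
EMAIL = re.compile(r"a+(\.a+)*@a+(\.a+)+")


def abstract(address):
    """Map every letter or digit to 'a'; leave all other characters alone.
    Assumption: digits count as "some letter" for this toy alphabet."""
    return "".join("a" if ch.isalnum() else ch for ch in address)


def looks_like_email(address):
    """True if the abstracted address matches the regex from the slide."""
    return EMAIL.fullmatch(abstract(address)) is not None


examples = ["cs103@cs.stanford.edu",
            "first.middle.last@mail.site.org",
            "dot.at@dot.com"]
accepted = [looks_like_email(e) for e in examples]
```

All three sample addresses are accepted. A string such as `cs103@stanford` is rejected because the part after `@` must contain at least one dot.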
For Comparison
a⁺(.a⁺)*@a⁺(.a⁺)⁺
[Figure: state diagram of an automaton for the same language, with states q0 through q8 and transitions labeled a, ., @, and Σ.]
Shorthand Summary
● Rⁿ is shorthand for RR … R (n times).
  ● Edge case: define R⁰ = ε.
● Σ is shorthand for “any character in Σ.”
● R? is shorthand for (R ∪ ε), meaning “zero or one copies of R.”
● R⁺ is shorthand for RR*, meaning “one or more copies of R.”
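Python's `re` module happens to offer close analogues of these shorthands (`{n}`, `?`, `+`, and character classes). The sketch below compares each shorthand with its expansion on all short strings over {a, b}. The choice of R = ab and the helper name `same_language` are assumptions made for illustration only.

```python
import re
from itertools import product


def same_language(p1, p2, alphabet="ab", max_len=6):
    """True if the two Python patterns accept exactly the same strings
    among all strings over `alphabet` of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(re.fullmatch(p1, w)) != bool(re.fullmatch(p2, w)):
                return False
    return True


# Each shorthand (left) against its expansion (right), taking R = ab.
checks = {
    "Rⁿ": same_language(r"(ab){3}", r"ababab"),
    "R⁰": same_language(r"(ab){0}", r""),
    "Σ": same_language(r"[ab]", r"a|b"),
    "R?": same_language(r"(ab)?", r"(ab|)"),
    "R⁺": same_language(r"(ab)+", r"ab(ab)*"),
}
```

Every entry in `checks` comes out true on this bounded comparison. By contrast, `(ab)+` and `(ab)*` differ on the empty string.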
Time-Out for Announcements!
Problem Set Four Graded
● Your diligent and hardworking TAs have finished grading PS4. Grades and feedback are now available on Gradescope.
● As always, please review your feedback! Knowing where to improve is more important than just seeing a raw score.
● Did we make a mistake? Regrades on Gradescope will open tomorrow and are due in one week.
Problem Set Six
● Problem Set Five was due at 2:30PM today.
● Problem Set Six goes out today. It’s due next Friday at 2:30PM.
● Design DFAs and NFAs for a range of problems!
● Explore formal language theory!
● See some clever applications!
Back to CS103!
The Lay of the Land
[Figure: diagram with regions labeled “Languages you can build a DFA for,” “Languages you can build an NFA for,” and “Regular Languages.”]
Generalizing NFAs
[Figure: an NFA with states q₀ through q₄ and transitions labeled Σ, a, b, and ε.]
These transition labels are all regular expressions!
Generalizing NFAs
[Figure: a generalized NFA with states q₀ through q₃ whose transitions are labeled ab ∪ b, a, ab*, and a*b?a*.]
Note: Actual NFAs aren't allowed to have transitions like these. This is just a thought experiment.
Input string traced on the slides: a a a b a a b b b
Key Idea 1: Imagine that we can label transitions in an NFA with arbitrary regular expressions.
Generalizing NFAs
[Figure: a generalized NFA with a single transition from start state q₀ to q₁ labeled ab ∪ b.]
Is there a simple regular expression for the language of this generalized NFA?
[Figure: a generalized NFA with a single transition from start state q₀ to q₁ labeled a⁺(.a⁺)*@a⁺(.a⁺)⁺.]
Is there a simple regular expression for the language of this generalized NFA?
Key Idea 2: If we can convert an NFA into a generalized NFA that looks like this...
[Figure: a generalized NFA with a single transition from start state q₀ to q₁ labeled some-regex.]
From NFAs to Regular Expressions
[Figure: a generalized NFA with start state q₁ and accepting state q₂. q₁ has a self-loop labeled R₁₁, q₂ has a self-loop labeled R₂₂, the transition from q₁ to q₂ is labeled R₁₂, and the transition from q₂ to q₁ is labeled R₂₁.]
Here, R₁₁, R₁₂, R₂₁, and R₂₂ are arbitrary regular expressions.
Question: Can we get a clean regular expression from this NFA?
Key Idea 3: Somehow transform this NFA so that it looks like this:
[Figure: a generalized NFA with a single transition from start state q₀ to q₁ labeled some-regex.]
The first step is going to be a bit weird.
[Figure: the same NFA with a new start state qs and a new accepting state qf, an ε-transition from qs to q₁, and an ε-transition from q₂ to qf.]
Could we eliminate this state (q₁) from the NFA?
[Figure: a new transition from qs to q₂ labeled ε R₁₁* R₁₂ that bypasses q₁.]
Note: We're using concatenation and Kleene closure in order to skip this state.
[Figure: q₁ removed. The remaining states are qs, q₂, and qf; q₂ has self-loops labeled R₂₂ and R₂₁R₁₁*R₁₂.]
Note: We're using union to combine these transitions together.
[Figure: the transition from qs to q₂ is labeled R₁₁* R₁₂ and the transition from q₂ to qf is labeled ε.]
[Figure: only qs and qf remain, joined by a single transition labeled R₁₁* R₁₂ (R₂₂ ∪ R₂₁R₁₁*R₁₂)*, shown next to the original two-state NFA for comparison.]
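The two eliminations in this example can be written out as a short derivation. This is a sketch that uses the transition labels from the slides; the arrow notation for "the label on the transition from one state to another" is our own.

```latex
\begin{align*}
\text{Eliminate } q_1:\quad
  & q_s \to q_2 :\; \varepsilon\, R_{11}^{*}\, R_{12} \\
  & q_2 \to q_2 :\; R_{22} \cup R_{21} R_{11}^{*} R_{12} \\[4pt]
\text{Eliminate } q_2:\quad
  & q_s \to q_f :\; R_{11}^{*} R_{12}\,
      \bigl(R_{22} \cup R_{21} R_{11}^{*} R_{12}\bigr)^{*}\, \varepsilon \\
  & \phantom{q_s \to q_f :\;} = R_{11}^{*} R_{12}\,
      \bigl(R_{22} \cup R_{21} R_{11}^{*} R_{12}\bigr)^{*}
\end{align*}
```

The self-loop on q₂ picks up the extra term R₂₁R₁₁*R₁₂ because a path can leave q₂, spend any number of steps at q₁, and then return to q₂.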
The State-Elimination Algorithm
● Start with an NFA N for the language L.
● Add a new start state qs and accept state qf to the NFA.
  ● Add an ε-transition from qs to the old start state of N.
  ● Add ε-transitions from each accepting state of N to qf, then mark them as not accepting.
● Repeatedly remove states other than qs and qf from the NFA by “shortcutting” them until only two states remain: qs and qf.
● The label on the transition from qs to qf is then a regular expression for the NFA.
The State-Elimination Algorithm
● To eliminate a state q from the automaton, do the following for each pair of states q₀ and q₁ (which may be the same state), where there's a transition from q₀ into q and a transition from q into q₁:
  ● Let Rin be the regex on the transition from q₀ to q.
  ● Let Rout be the regex on the transition from q to q₁.
  ● If there is a regular expression Rstay on a transition from q to itself, add a new transition from q₀ to q₁ labeled ((Rin)(Rstay)*(Rout)).
  ● If there isn't, add a new transition from q₀ to q₁ labeled ((Rin)(Rout)).
● If a pair of states has multiple transitions between them labeled R₁, R₂, …, Rₖ, replace them with a single transition labeled R₁ ∪ R₂ ∪ … ∪ Rₖ.
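The procedure can be sketched in Python as follows. This is one possible reading of the steps rather than a reference implementation: labels are stored as strings in Python `re` syntax so the result can be tested directly, ε is written as the empty string, union is written `|`, and the names `state_elimination` and `agrees` are hypothetical. It also assumes no state of the input NFA is already called `qs` or `qf`, and that any special characters in labels have been escaped by the caller.

```python
import re
from itertools import product


def state_elimination(states, start, accepting, transitions):
    """Return a Python-regex string for the NFA, or None if no path exists.
    transitions is a list of (src, label, dst) triples."""
    edges = {}  # (src, dst) -> regex label

    def add(src, dst, label):
        # Parallel transitions between the same pair are merged with union.
        key = (src, dst)
        edges[key] = f"{edges[key]}|{label}" if key in edges else label

    for src, label, dst in transitions:
        add(src, dst, label)
    add("qs", start, "")                   # ε from the new start state
    for q in accepting:
        add(q, "qf", "")                   # ε into the new accepting state

    for q in states:                       # eliminate every original state
        stay = edges.pop((q, q), None)     # Rstay, if q has a self-loop
        incoming = [(s, r) for (s, d), r in edges.items() if d == q]
        outgoing = [(d, r) for (s, d), r in edges.items() if s == q]
        for key in [k for k in edges if q in k]:
            del edges[key]
        loop = f"(?:{stay})*" if stay is not None else ""
        for s, r_in in incoming:
            for d, r_out in outgoing:      # shortcut: (Rin)(Rstay)*(Rout)
                add(s, d, f"(?:{r_in}){loop}(?:{r_out})")
    return edges.get(("qs", "qf"))


def agrees(pattern, predicate, alphabet, max_len):
    """Compare the pattern with a membership predicate on all short strings."""
    return all((re.fullmatch(pattern, w) is not None) == predicate(w)
               for n in range(max_len + 1)
               for w in map("".join, product(alphabet, repeat=n)))


# Example: a three-state NFA for "contains aa as a substring" over {a, b}.
nfa_regex = state_elimination(
    states=["q0", "q1", "q2"], start="q0", accepting=["q2"],
    transitions=[("q0", "a", "q0"), ("q0", "b", "q0"), ("q0", "a", "q1"),
                 ("q1", "a", "q2"), ("q2", "a", "q2"), ("q2", "b", "q2")])
contains_aa_ok = agrees(nfa_regex, lambda w: "aa" in w, "ab", 7)
```

The resulting regex is heavily parenthesized because every label is wrapped before it is concatenated. That keeps union from leaking across concatenation and matches the fully parenthesized labels in the description above. States are eliminated in the order given, and a different order can yield a different but equivalent expression.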
Our Transformations