
Chapter two

Regular Languages
• If a language is regular, there exists a finite accepter for it.
Therefore every regular language can be described by some DFA
or NFA. For every regular language there also exists a
corresponding regular expression.
Theorem:
Let L be a regular language. Then there exists a regular expression r
such that L = L(r).
Proof:
• If L is regular, there exists an NFA for it. We can assume, without
loss of generality, that this NFA has a single final state, distinct
from its initial state. We convert this NFA to a complete
generalized transition graph and apply the NFA-to-regular-expression
procedure to it. This yields the required regular expression r.
The set of regular expressions over an alphabet Σ is defined
as follows:
• Every symbol of Σ is a regular expression.
• ε is a regular expression denoting the set {ε}.
• Ø is a regular expression denoting the empty set.
• If r1 and r2 are regular expressions, then so are r1+r2, r1.r2, and r1*, with
  L(r1+r2) = L(r1) ∪ L(r2)   (union)
  L(r1.r2) = L(r1)L(r2)      (concatenation)
  L(r1*) = (L(r1))*          (Kleene star)
• R* (Kleene star) denotes the smallest superset of the set
described by R that contains ε and is closed under string
concatenation. This is the set of all strings that can be made by
concatenating any finite number (including zero) of strings from the
set described by R. For example, {"0","1"}* is the set of all finite
binary strings (including the empty string), and {"ab", "c"}* = {ε,
"ab", "c", "abab", "abc", "cab", "cc", "ababab", "abcab", ... }.
Priority
• * has the highest priority
• . has the next priority
• + has the lowest priority
• Parentheses are used to override the above priorities
For example:
• (a+(b.c))* stands for the star closure of {a} ∪ {bc}, that is, the
language {ε, a, bc, aa, abc, bca, bcbc, aaa, aabc, ...}
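The star closure in the example above can be enumerated mechanically. The following is a minimal sketch, not a definitive implementation; the function name `kleene_star` and the length bound `max_len` are assumptions introduced here so that the (infinite) language can be listed up to a finite length.

```python
def kleene_star(base, max_len):
    """Return all strings in base* whose length is at most max_len."""
    result = {""}      # zero concatenations give the empty string
    frontier = {""}    # strings found in the previous round
    while frontier:
        next_frontier = set()
        for prefix in frontier:
            for piece in base:
                candidate = prefix + piece
                # Keep only new strings that respect the length bound.
                if len(candidate) <= max_len and candidate not in result:
                    result.add(candidate)
                    next_frontier.add(candidate)
        frontier = next_frontier
    return result


# Star closure of {a} U {bc}, listed up to length 3.
print(sorted(kleene_star({"a", "bc"}, 3), key=lambda s: (len(s), s)))
```

Strings such as bcbc and aabc from the example have length 4, so they only appear when `max_len` is raised to 4 or more.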
• In other words, a regular expression r has one of the following forms:
r = ε
r = a
r = r1 + r2
r = r1 r2
r = r1*
r = (r1)
The language represented (generated) by a regular expression r is a regular
language, denoted L(r).

• Example 1:
• ab* specifies the strings that start with a, followed by zero or more
b's.
• (ab)* specifies zero or more repetitions of ab.
Example 2:
• For Σ = {a,b}, the expression
r = (a+b)*(a+bb) is regular.
It denotes the language L(r) = {a, bb, aa, abb, ba, bbb, ...}. So L(r) is the set
of all strings on {a,b} terminated by either an a or a bb.
Example 3:
• r = (aa)*(bb)*b denotes the set of all strings with an even number of
a's followed by an odd number of b's, that is,
• L(r) = {a^{2n} b^{2m+1} : n ≥ 0, m ≥ 0}
Example 4:
• r = (0+1)*00(0+1)* denotes the set of all strings on {0,1} that contain
at least one pair of consecutive zeros.
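Examples 2 to 4 can be checked with Python's `re` module. This is only a sketch: the translation from textbook notation to Python syntax (union `+` written as `|`) is done by hand here, and the helper name `in_language` is an assumption of this example, not part of the notes.

```python
import re

# Textbook notation -> Python re syntax: "+" (union) becomes "|".
EXAMPLES = {
    "(a+b)*(a+bb)": r"(a|b)*(a|bb)",
    "(aa)*(bb)*b": r"(aa)*(bb)*b",
    "(0+1)*00(0+1)*": r"(0|1)*00(0|1)*",
}


def in_language(textbook_regex, string):
    """True if the whole string matches the translated pattern."""
    return re.fullmatch(EXAMPLES[textbook_regex], string) is not None


print(in_language("(a+b)*(a+bb)", "abb"))     # ends in bb
print(in_language("(aa)*(bb)*b", "aab"))      # two a's, one b
print(in_language("(0+1)*00(0+1)*", "0101"))  # no two consecutive zeros
```

`re.fullmatch` is used rather than `re.search` because membership in L(r) requires the entire string to match.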
Algebra of regular expressions
Identity laws
a. ε.R = R.ε = R
b. Ø + R = R + Ø = R
Idempotent laws
R + R = R
(R*)* = R*
Distributive laws
• A.(B+C) = A.B + A.C
Associative laws
• A.(B.C) = (A.B).C
• A+(B+C) = (A+B)+C
Regular Grammars
• A language is regular if it can be generated by a regular grammar.
Regular languages are exactly the languages generated by type 3 grammars.
• Linear grammars are either left linear or right linear:
Right Linear Grammars:
• Rules of the forms
  A → ε
  A → a
  A → aB
Left Linear Grammars:
• Rules of the forms
  A → ε
  A → a
  A → Ba
Exercise: transform the following right linear grammar into an
equivalent ε-NFA.
S → aS | bA
A → cA | ε
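One possible way to carry out this transformation is sketched below. It assumes the common construction in which every non-terminal becomes a state, a rule A → aB becomes a transition from A to B on a, and a rule that ends the derivation leads to one extra accepting state. The state name `qf`, the rule encoding, and all function names are assumptions made for illustration.

```python
FINAL = "qf"  # hypothetical name for the single accepting state


def grammar_to_nfa(rules, start):
    """Build an epsilon-NFA from right linear rules.

    rules maps a non-terminal to a list of (terminal, next_non_terminal)
    pairs; terminal "" stands for epsilon and next_non_terminal None means
    the rule ends the derivation.
    """
    delta = {}
    for head, bodies in rules.items():
        for terminal, target in bodies:
            delta.setdefault((head, terminal), set()).add(target or FINAL)
    return delta, start, {FINAL}


def eps_closure(delta, states):
    """All states reachable from `states` using only epsilon moves."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def accepts(nfa, word):
    delta, start, finals = nfa
    current = eps_closure(delta, {start})
    for symbol in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = eps_closure(delta, moved)
    return bool(current & finals)


# S -> aS | bA,  A -> cA | epsilon
RULES = {"S": [("a", "S"), ("b", "A")], "A": [("c", "A"), ("", None)]}
NFA = grammar_to_nfa(RULES, "S")
print(accepts(NFA, "aabcc"), accepts(NFA, "ac"))
```

Under this encoding the resulting automaton accepts exactly the strings of the form a...ab c...c, which is what the grammar derives.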
Right linear grammar
In a rule A -> aB:
1. A is a single symbol (corresponding to a state) called a 'non-terminal symbol'.
2. a is a terminal symbol; it corresponds to a lexical item.
3. B is a single non-terminal symbol.
Formal definition of Right Linear Grammars
A right linear grammar is a 4-tuple <T, N, S, R>, where:
1. N is a finite set of non-terminals
2. T is a finite set of terminals, including the empty string
3. S is the start symbol
4. R is a finite set of rewriting rules of the form A-> xB or A-> x, where A and B
stand for non-terminals and x stands for a terminal.
Formal example:
G1 = <T, N, S, R>, where T = {a, b}, N = {S, A, B}, and
R consists of the rules
S -> aA
A -> aA
A -> bB
B -> bB
In a left regular grammar (also called left linear grammar), all
rules obey the forms
A → a, where A is a non-terminal in N and a is a terminal in Σ
A → Ba, where A and B are in N and a is in Σ
A → ε, where A is in N and ε is the empty string.
An example of a left regular grammar G with N = {S, A}, Σ = {a,
b, c}: P consists of the following rules
S → Sa
S → Ab
A → ε
A → Ac
and S is the start symbol. This grammar describes the same
language as the regular expression c*ba*.
A regular grammar is a left or right regular grammar.
Relation between regular languages and
regular expressions
They are equivalent:
With every regular expression we can associate a regular
language.
Conversely, every regular language can be obtained from a
regular expression.
Examples:
– Regular expression: ab*c
– Regular language: {ac, abc, abbc, ...}
Let Σ be an alphabet. The primitive regular expressions over Σ are:
Ø represents the empty set { }
ε represents the set {ε}
a represents the set {a}, for any symbol a in Σ
[Figures omitted: the diagrams for Ø, for ε, and for a.]
Types of automata
There are four basic types of automata, distinguished
by the following characteristics:
• Finite state automata (FSA): have no memory beyond the current state;
correspond to regular grammars.
• Pushdown automata: in addition to the tape, they use a stack to
read from and write to; correspond to context-free grammars.
• Linear bounded automata: read and write on a tape of finite length in
both directions; correspond to context-sensitive grammars.
• Turing machines: read and write on an infinite tape in both
directions; correspond to unrestricted grammars.
Finite Automata
• An abstract machine that can be used to implement regular
expressions (among other things).
• Has a finite number of states and a finite amount of memory (i.e.,
the current state).
• Can be represented by directed graphs (state transition diagrams) or
transition tables.
Representation
• An FSA may be represented as a directed graph; each node (or
vertex) represents a state, and the edges (or arcs) connecting the
nodes represent transitions.
• Each state is labeled.
• Each transition is labeled with a symbol from the alphabet over
which the regular language represented by the FSA is defined, or
with ε, the empty string.
• Among the FSA's states, there is a start state and at least one final
state (or accepting state).
• Given an input string, an FSA will either accept or reject the input.
• If the FSA is in a final (or accepting) state after all input symbols
have been consumed, then the string is accepted (or recognized).
• Otherwise (including the case in which an input symbol cannot be
consumed), the string is rejected.
• Informally, an FSA is a state diagram that comprehensively captures all
possible states and transitions that a machine can take while
responding to a stream or sequence of input symbols.
• An FSA is a recognizer for regular languages.
Deterministic Finite Accepters
• The first type of automaton we study in detail is the finite accepter
that is deterministic in its operation. We start with a precise
formal definition of deterministic accepters. A deterministic accepter
has internal states, rules for transitions from one state to another,
some input, and ways of making decisions.
• Definition:
• A DFA is defined by the quintuple
M = (Q, Σ, δ, q0, F), where
Q: a finite set of states
Σ: a finite input alphabet
q0: the initial (starting) state, q0 ∈ Q
F: a set of final (accepting) states, F ⊆ Q
δ: a transition function, which is a total function from Q × Σ to Q
• A deterministic finite accepter operates in the following manner. Initially,
it is assumed to be in the initial state q0, with its input mechanism on the
leftmost symbol of the input string. During each move of the automaton, the input
mechanism advances one position to the right, so each move consumes one input
symbol. When the end of the string is reached, the string is accepted if the
automaton is in one of its final states. Otherwise the string is rejected. The input
mechanism can move only from left to right and reads exactly one symbol on
each step. The transitions from one internal state to another are governed by the
transition function δ.
• For example, if δ(q0, a) = q1 and the DFA is in state q0 with current input
symbol a, the DFA will go into state q1.
The graph below represents the DFA
• M = ({q0, q1, q2}, {0,1}, δ, q0, {q1})
• where δ is given by
• δ(q0,0) = q0,
• δ(q0,1) = q1,
• δ(q1,0) = q0,
• δ(q1,1) = q2,
• δ(q2,0) = q2,
• δ(q2,1) = q1.
• The string 01 is accepted. The DFA does not accept the
string 00, since after reading two consecutive 0's it will be in
state q0. By similar reasoning, we see that the automaton
will accept the strings 101, 0111, and 11001, but not 100 or
1100.
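The operation described above can be traced with a short simulation of the example DFA. This is a sketch under one stated assumption: the transition δ(q2,1) = q1 is taken to be the one implied by the accepted string 0111, since q1 is the only final state. The names `DELTA`, `run_dfa`, and `accepts` are illustrative.

```python
# Transition function of the example DFA. The entry for (q2, 1) is an
# assumption inferred from the accepted string 0111.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
}
START, FINALS = "q0", {"q1"}


def run_dfa(word):
    """Return the state reached after consuming every symbol of word."""
    state = START
    for symbol in word:          # one move per input symbol, left to right
        state = DELTA[(state, symbol)]
    return state


def accepts(word):
    return run_dfa(word) in FINALS


for w in ["01", "00", "101", "0111", "11001", "100", "1100"]:
    print(w, accepts(w))
```

Because δ is a total function, the dictionary lookup never fails for strings over {0,1}.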
The language accepted by a DFA M = (Q, Σ, δ, q0, F) is the set
of all strings on Σ accepted by M. In formal notation,
L(M) = {w ∈ Σ* : δ*(q0, w) ∈ F}.
• A DFA will process every string in Σ* and either accept it or not accept it.
Nonacceptance means that the DFA stops in a nonfinal state.
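The extended transition function δ* appears in the notation above without a definition. Assuming the usual recursive convention (an assumption, since the notes do not state it), it can be written as follows, with a short worked evaluation that uses only δ(q0,0) = q0 and δ(q0,1) = q1 from the example DFA:

```latex
\delta^*(q, \varepsilon) = q, \qquad
\delta^*(q, wa) = \delta\bigl(\delta^*(q, w), a\bigr)
\quad \text{for } w \in \Sigma^*,\ a \in \Sigma

\delta^*(q_0, 01) = \delta\bigl(\delta^*(q_0, 0), 1\bigr)
                  = \delta(q_0, 1) = q_1 \in F
```

So 01 belongs to L(M), in agreement with the earlier observation that the string 01 is accepted.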
Theorem
Let M = (Q, Σ, δ, q0, F) be a deterministic finite accepter, and
let G_M be its associated transition graph. Then for every qi, qj
∈ Q and w ∈ Σ*, δ*(qi, w) = qj if and only if there is in G_M a walk
with label w from qi to qj.
The following automaton is an example of a trap state (dead state):
a state is a trap state if it is not an accepting state and has no
outgoing transitions except to itself.
Nondeterministic Finite State Automata (NFA)
• An NFA is an automaton whose states may have zero, one, or more outgoing
arrows for a given symbol.
• An NFA is a five-tuple:
M = (Q, Σ, δ, q0, F), where
Q: a finite set of states
Σ: a finite input alphabet
q0: the initial (starting) state, q0 ∈ Q
F: a set of final (accepting) states, F ⊆ Q
δ: a transition function, which is a total function from Q × Σ to 2^Q
δ: Q × Σ → 2^Q, or δ: Q × (Σ ∪ {ε}) → 2^Q when ε-transitions are allowed
• Example #1: some 0's followed by some 1's
• Q = {q0, q1, q2}
• Σ = {0, 1}
• Start state is q0
• F = {q2}
δ:    0           1
q0    {q0, q1}    {}
q1    {}          {q1, q2}
q2    {q2}        {q2}
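The transition table of Example #1 can be simulated by tracking the set of states the NFA could currently be in. This is a minimal sketch with the table copied as given; the names `DELTA` and `nfa_accepts` are illustrative, and missing dictionary entries stand for the empty set.

```python
# Transition table of Example #1, copied as given; missing entries mean {}.
DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q2"},
    ("q2", "1"): {"q2"},
}
START, FINALS = "q0", {"q2"}


def nfa_accepts(word):
    """Accept if at least one possible run ends in a final state."""
    current = {START}
    for symbol in word:
        # Union of the moves available from every currently possible state.
        current = set().union(*(DELTA.get((q, symbol), set()) for q in current))
    return bool(current & FINALS)


print(nfa_accepts("0011"), nfa_accepts("10"))
```

If the set of possible states becomes empty, no run can continue and the string is rejected, which matches the rule that a string is rejected when an input symbol cannot be consumed.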
• An NFA for the language of all strings over {a,b} that contain ababb
Conversion of NFA to DFA:
Suppose that you want to find an equivalent DFA for an
NFA. The algorithm is the following:
• Start from the set containing only the start state and see where 0 or 1 takes you.
• For every new subset you find, see where 0 or 1 takes you.
• Repeat until no new subsets are found.
• A subset is a final state of the DFA if it contains at least one final state of the NFA.
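The steps above can be sketched as a breadth-first exploration of subsets. The NFA used here is the one from Example #1 (its transition table copied as given); the function name `subset_construction` and the data layout are assumptions of this sketch, which does not handle ε-transitions.

```python
from collections import deque

# NFA of Example #1; missing entries mean the empty set.
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q1", "1"): {"q1", "q2"},
    ("q2", "0"): {"q2"},
    ("q2", "1"): {"q2"},
}
ALPHABET = ("0", "1")


def subset_construction(nfa_delta, start, finals, alphabet):
    start_set = frozenset({start})
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:                       # repeat until no new subsets appear
        subset = queue.popleft()
        for symbol in alphabet:
            target = frozenset(
                r for q in subset for r in nfa_delta.get((q, symbol), ())
            )
            dfa_delta[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A subset is final if it contains at least one final NFA state.
    dfa_finals = {s for s in seen if s & finals}
    return seen, dfa_delta, start_set, dfa_finals


STATES, DFA_DELTA, DFA_START, DFA_FINALS = subset_construction(
    NFA_DELTA, "q0", {"q2"}, ALPHABET
)
for state in STATES:
    print(sorted(state), [sorted(DFA_DELTA[(state, a)]) for a in ALPHABET])
```

The empty subset shows up as an ordinary DFA state here; it plays the role of a trap state.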
To find a DFA equivalent to the NFA of the figure, we complete the
following table.
The NFA below has 3 states q0, q1, q2 with Σ = {0,1}.

Example 2

Transition table:
State        A            B
{0}          {0,1}        {0}
{0,1}        {0,1,2}      {0}
{0,1,2}      {0,1,2,3}    {0}
{0,1,2,3}    {0,1,2,3}    {0,3}
{0,3}        {0,1,3}      {0,3}
{0,1,3}      {0,1,2,3}    {0,3}
What does a DFA do on reading an input string?
• Input: a word w in Σ*
• Question: Is w accepted by the DFA?
• Steps:
• Start at the start state q0.
• For every input symbol in the sequence w, compute the next state from the
current state, given the current input symbol in w and the transition function.
• If, after all symbols in w are consumed, the current state is one of the final
states (F), then accept w;
• otherwise, reject w.
Minimization of DFA:
For storage efficiency, it is desirable to reduce the number of states as far as possible.
We now describe an algorithm that accomplishes this.
• Recall that a DFA is a quintuple M = (Q, Σ, δ, q0, F).
• Two states are either indistinguishable or distinguishable. States p and q are
indistinguishable if, for every string w, δ*(p,w) ∈ F exactly when δ*(q,w) ∈ F.
Indistinguishability has the properties of an equivalence relation: if p and q are
indistinguishable and q and r are also indistinguishable, then all three states
are indistinguishable.
• One method of reducing the states of a DFA is based on finding and combining
indistinguishable states.
Example
Transition table:
State    a    b
A        B    C
B        B    D
C        B    C
D        B    E
E        B    C
Minimize the number of states of the DFA:
The initial partition Π contains two groups:
the non-final states (ABCD) and the final state (E).
In Π_new, the group (E) cannot be split further.
On input b, A, B, and C go to members of the group (ABCD) of Π, while D goes to E, a
member of another group.
Thus, in Π_new, (ABCD) must be split into two groups, (ABC) and (D).
The new value of Π is (ABC), (D), (E).
Again on input b, A and C go to C, while B goes to D.
So Π_new is (AC), (B), (D), (E).
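The splitting procedure walked through above can be sketched as a partition-refinement loop. The transition table is the one from the example, with input columns a and b and E as the only final state; the function name `minimize` and the use of frozensets for groups are choices made for this sketch, not something prescribed by the notes.

```python
# Transition table of the example; E is the only final state.
DELTA = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
FINALS = {"E"}


def minimize(delta, finals, alphabet=("a", "b")):
    """Return the groups of indistinguishable states as a set of frozensets."""
    states = set(delta)
    # Initial partition: final states versus non-final states.
    partition = {frozenset(finals), frozenset(states - finals)} - {frozenset()}
    while True:
        def group_of(state):
            return next(g for g in partition if state in g)

        new_partition = set()
        for group in partition:
            # States stay together only if every input symbol sends them
            # into the same group of the current partition.
            buckets = {}
            for state in group:
                signature = tuple(group_of(delta[state][s]) for s in alphabet)
                buckets.setdefault(signature, set()).add(state)
            new_partition |= {frozenset(b) for b in buckets.values()}
        if new_partition == partition:
            return partition
        partition = new_partition


print(sorted(sorted(group) for group in minimize(DELTA, FINALS)))
```

On this table the loop passes through (ABC), (D), (E) before stabilizing, mirroring the hand computation.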


Transition table of the minimized DFA (A represents the group (AC)):
State        a    b
Start  A     B    A
       B     B    D
       D     B    E
Accept E     B    A
• The transition diagram for the optimized (minimized) DFA
Closure properties of regular languages
We have already seen the union, concatenation, and Kleene star
properties. Now let us move on to complement.
Given a DFA for a language, all we do is make the accepting states
non-accepting and make the non-accepting states accepting.
In terms of the 5-tuple M = (Q, Σ, δ, q0, F), all we do is
replace F with Q − F.
Using this construction, we have a proof that the complement
of any regular language is another regular language.
Refer to the diagram below. The regular languages are closed
under complement.
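The construction of replacing F with Q − F can be sketched in a few lines. The DFA below (strings over {0,1} with an even number of 1s) is a hypothetical example chosen for this illustration and does not come from the notes; the construction relies on δ being a total function.

```python
# Hypothetical DFA over {0,1} accepting strings with an even number of 1s.
STATES = {"even", "odd"}
DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}
START, FINALS = "even", {"even"}


def complement(states, finals):
    """Replace F with Q - F; Q, delta, and q0 stay unchanged."""
    return states - finals


def accepts(word, finals):
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in finals


CO_FINALS = complement(STATES, FINALS)
print(accepts("11", FINALS), accepts("11", CO_FINALS))
```

Every string is accepted by exactly one of the two automata, which is what closure under complement requires.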

Intersection
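We can take the cross product of the two DFAs A_L = (Q_L, Σ, δ_L, q_L, F_L)
and A_M = (Q_M, Σ, δ_M, q_M, F_M):
• A_LM = (Q_L × Q_M, Σ, δ_LM, (q_L, q_M), F_L × F_M),
where δ_LM((p, q), a) = (δ_L(p, a), δ_M(q, a)).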
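De Morgan's law
The intersection of L and M is the complement of the union of the complement
of L and the complement of M, so closure under union and complement also gives
closure under intersection.
Closed under difference
L − M is the intersection of L with the complement of M, so the regular
languages are closed under difference as well.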


Reversal
Given a language L, L^R is the set of strings whose reversal is in L.
Example: L = {0, 01, 100};
L^R = {0, 10, 001}.
Proof: Let E be a regular expression for L. We show how to reverse E to
provide a regular expression E^R for L^R.
Example: Reversal of a regular expression:
Let E = 01* + 10*.
E^R = (01* + 10*)^R
    = (01*)^R + (10*)^R
    = (1*)^R 0^R + (0*)^R 1^R
    = (1^R)* 0 + (0^R)* 1
    = 1*0 + 0*1.
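The derivation above applies three rules: union is reversed term by term, concatenation is reversed with its operands swapped, and star commutes with reversal. A minimal sketch of those rules on a nested-tuple encoding of regular expressions follows; the encoding and the function name `reverse` are assumptions of this example.

```python
def reverse(expr):
    """Reverse a regular expression given as a nested tuple.

    Forms: a one-character string (a symbol), ("+", e1, e2),
    (".", e1, e2), and ("*", e).
    """
    if isinstance(expr, str):
        return expr                                        # a^R = a
    op = expr[0]
    if op == "+":
        return ("+", reverse(expr[1]), reverse(expr[2]))   # (E+F)^R = E^R + F^R
    if op == ".":
        return (".", reverse(expr[2]), reverse(expr[1]))   # (EF)^R = F^R E^R
    if op == "*":
        return ("*", reverse(expr[1]))                     # (E*)^R = (E^R)*
    raise ValueError(f"unknown operator: {op!r}")


# E = 01* + 10*
E = ("+", (".", "0", ("*", "1")), (".", "1", ("*", "0")))
print(reverse(E))
```

The printed tuple encodes 1*0 + 0*1, the same result as the hand derivation.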
Homomorphisms
• A homomorphism on an alphabet is a function that gives a
string for each symbol in that alphabet.
Example: h(0) = ab; h(1) = ε.
Extend to strings by h(a1…an) = h(a1)…h(an).
Example: h(01010) = ababab.
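The extension of h from symbols to strings amounts to concatenating the images of the individual symbols. A small sketch, with the dictionary `H` and the function name `apply_homomorphism` introduced here for illustration:

```python
# h(0) = ab, h(1) = epsilon (the empty string)
H = {"0": "ab", "1": ""}


def apply_homomorphism(h, word):
    """Compute h(a1...an) = h(a1)...h(an)."""
    return "".join(h[symbol] for symbol in word)


print(apply_homomorphism(H, "01010"))
```

Each 0 contributes ab and each 1 contributes nothing, so the three 0s in 01010 yield ababab.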
