
Context Free Languages

• Context Free Grammars
• Leftmost and rightmost derivation of strings
• Derivation tree
• Parsing Arithmetic Expressions
• Ambiguity in CFGs

September 24, 2021 Formal Language Theory 1


Introduction

• The productions in a regular grammar are restricted in two ways: the left side must be a single variable, and the right side has a special form.
• To create grammars that are more powerful, we must relax some of these restrictions.
• By retaining the restriction on the left side but permitting anything on the right side, we obtain context-free grammars.



Context Free Grammar (CFG)
• Definition: A grammar G = (N, T, P, S) is context free if all productions in P have the form
  A → β, where A ∈ N and β ∈ (N ∪ T)*.
• CFGs are used to define the syntax of programming languages and to parse arithmetic expressions.
• A language generated by a CFG is called a Context Free Language (CFL).
• Ex.
  a) S → aB           b) S → aB | A
     B → bA | b          A → aA | a | CBA
     A → a               B → λ
                         C → c
CFG: cont’d
• Let G be a CFG. Then x ∈ L(G) iff S ⇒* x, i.e., S derives x in zero or more steps over G.
• Membership x ∈ L(G) can also be shown with a derivation tree (parse tree): the root of the tree is S, and x is the sequence of leaves read from left to right.
• Leftmost derivation: at each step, the leftmost variable of the sentential form is replaced.
• Rightmost derivation: at each step, the rightmost variable of the sentential form is replaced.



Left and right…

• Given the grammar
  S → aAB
  A → bBb
  B → A | λ
• the leftmost derivation of abbbb is
  S ⇒ aAB ⇒ abBbB ⇒ abAbB ⇒ abbBbbB ⇒ abbbbB ⇒ abbbb
• and the rightmost derivation is
  S ⇒ aAB ⇒ aA ⇒ abBb ⇒ abAb ⇒ abbBbb ⇒ abbbb

Derivation/parse tree
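• Let G = (N, T, P, S) be a CFG. An ordered tree is a derivation tree for G if and only if it satisfies the following conditions:
1. The root is labelled S.
2. Every leaf has a label from T ∪ {λ}.
3. Every interior vertex has a label from N.
4. If a vertex has label A ∈ N and its children are labelled (from left to right) a1, a2, …, an, then P must contain a production of the form A → a1a2…an.
5. A leaf labelled λ has no siblings; that is, a vertex with a child labelled λ can have no other children.
Partial derivation tree: a tree that satisfies conditions 3, 4 and 5, but in which condition 1 need not hold and condition 2 is relaxed so that every leaf has a label from N ∪ T ∪ {λ}.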



• Consider the grammar G:
  S → aAB
  A → bBb
  B → A | λ
• The first figure is a partial derivation tree.
• The second figure is a complete derivation/parse tree.
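A derivation tree can be stored as a labelled ordered tree, and its yield (the leaves read left to right) is the derived string. This is a minimal sketch; the `Node` class and both helper names are hypothetical. It builds the tree for abbbb that corresponds to the leftmost derivation shown earlier, plus the partial tree S(a, A, B), whose yield still contains variables.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                   # variable, terminal, or "λ"
    children: list = field(default_factory=list)

def tree_yield(node):
    """Concatenate leaf labels from left to right; λ leaves contribute nothing."""
    if not node.children:
        return "" if node.label == "λ" else node.label
    return "".join(tree_yield(child) for child in node.children)

def is_complete(node, variables):
    """Condition 2: every leaf is a terminal or λ (no variable is left as a leaf)."""
    if not node.children:
        return node.label not in variables
    return all(is_complete(child, variables) for child in node.children)

# Grammar: S -> aAB, A -> bBb, B -> A | λ
full = Node("S", [
    Node("a"),
    Node("A", [Node("b"),
               Node("B", [Node("A", [Node("b"), Node("B", [Node("λ")]), Node("b")])]),
               Node("b")]),
    Node("B", [Node("λ")]),
])
partial = Node("S", [Node("a"), Node("A"), Node("B")])

print(tree_yield(full), is_complete(full, {"S", "A", "B"}))         # abbbb True
print(tree_yield(partial), is_complete(partial, {"S", "A", "B"}))   # aAB False
```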



CFG: cont’d
• A grammar G is ambiguous if some string x ∈ L(G) has two different leftmost derivations; otherwise it is unambiguous.
(Equivalently, a grammar is ambiguous if it can produce more than one parse tree for some sentence.)
• Ex.
1. G1 = (N, T, P, S) with productions:
   S → AB
   A → aA | a
   B → bB | b
   Let x = aaabbb.
   a) Find a leftmost and a rightmost derivation for x.
   b) Draw the parse tree for x.



CFG: cont’d

2. G2 = (N, T, P, S) with productions:
   S → SbS | ScS | a
   Let x = abaca ∈ L(G2).
   a) Find a leftmost and a rightmost derivation for x.
   b) Draw the parse tree for x.
3. Is G1 ambiguous? Is G2?
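One way to check answers to question 3 is to count parse trees. The sketch below is specialised to G2 and uses memoised interval counting; the function name `count_trees_g2` is hypothetical. A count of 2 or more for a single string proves that the grammar is ambiguous, while a count of 1 for one string proves nothing about other strings (ambiguity of CFGs is undecidable in general).

```python
from functools import lru_cache

def count_trees_g2(w):
    """Number of distinct parse trees of w under S -> SbS | ScS | a."""
    @lru_cache(maxsize=None)
    def count(i, j):                       # trees for S =>* w[i:j]
        total = 1 if w[i:j] == "a" else 0  # production S -> a
        for k in range(i + 1, j - 1):      # both sides of the operator are non-empty
            if w[k] in "bc":               # production S -> S w[k] S
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(w))

print(count_trees_g2("abaca"))   # 2, so G2 is ambiguous
```

The two trees correspond to grouping abaca as a b (a c a) or as (a b a) c a.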



Parsing Arithmetic Expression

• Consider the following grammar:
  E → T | E + T | E - T
  T → F | T * F | T / F
  F → a | b | c | (E)
Draw parse trees for
a) a*b+c   b) a+b*c   c) (a+b)*c   d) a-b-c
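The shape of these parse trees can be previewed with a small recursive-descent parser. This is a minimal sketch, not a construction from the slides: it returns abstract syntax trees as nested tuples (operator, left, right) instead of full parse trees with E/T/F nodes, and it replaces the left-recursive rules E → E + T and T → T * F with loops (a standard recursive-descent technique), which keeps + - * / left associative. The function name `parse` is hypothetical.

```python
def parse(text):
    """Parse with E -> T | E+T | E-T, T -> F | T*F | T/F, F -> a | b | c | (E)."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        pos += 1
        return tok

    def expr():                        # E: left recursion becomes a loop
        tree = term()
        while peek() in ("+", "-"):
            op = take()
            tree = (op, tree, term())
        return tree

    def term():                        # T: same idea for * and /
        tree = factor()
        while peek() in ("*", "/"):
            op = take()
            tree = (op, tree, factor())
        return tree

    def factor():                      # F -> a | b | c | (E)
        if peek() == "(":
            take("(")
            tree = expr()
            take(")")
            return tree
        if peek() in ("a", "b", "c"):
            return take()
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing input at position {pos}")
    return tree

for s in ["a*b+c", "a+b*c", "(a+b)*c", "a-b-c"]:
    print(s, parse(s))
```

Because * and / are handled one level below + and -, a+b*c groups as a+(b*c), matching the parse tree the grammar forces.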



Ambiguity of CFG

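• Show that the following grammars are ambiguous:
1. S → SS | a | b
2. S → A | B | b
   A → aAB | ab
   B → abB | λ
Provide two different parse trees for the same string in each case.
• A context-free language for which every generating grammar is ambiguous is called an inherently ambiguous language. (One ambiguous grammar is not enough, since the same language may also have an unambiguous grammar.)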

Exercise:



Chapter five

Simplification of context-free grammars and normal forms
Outlines

• Methods for transforming grammars
• Normal forms for grammars and parsing
• Chomsky's hierarchy of grammars



Introduction
• The definition of context-free grammars (CFGs) allows us to develop a wide variety of grammars, since it imposes no restriction whatsoever on the right side of a production.
• Often, some productions of a CFG are not useful and are redundant.
• This happens because the definition of CFGs does not prevent us from writing such redundant productions.
• Simplifying a CFG removes these redundant productions while keeping the transformed grammar equivalent to the original grammar.
• Hence we need methods to transform a CFG into an equivalent one that satisfies certain restrictions, using:
  • Transformation
  • Substitutions
• Two grammars are called equivalent if they generate the same language.
Simplifying CFGs is a necessary step before converting them into normal forms.
Introduction

• We also investigate normal forms for CFGs:
  • CNF (Chomsky normal form)
  • GNF (Greibach normal form)
• The types of redundant productions, and the procedures for removing them, are described below:
  • Remove λ-productions.
  • Remove unit productions.
  • Remove useless productions.
Removing useless production

• We want to remove productions that can never take part in any derivation. For example, in
  S → aSb | λ | A
  A → aA
• the production S → A is useless, because A can never be transformed into a terminal string.
• Removing this production leaves the language unaffected.
Removing useless production
• Let G = (N, T, S, P) be a CFG. A variable A ∈ N is useful iff there is at least one w ∈ L(G) such that
  S ⇒* xAy ⇒* w, with x, y ∈ (N ∪ T)*.
• A variable that is not useful is useless, and a production is useless if it involves any useless variable.
• E.g.
  G = ({S, A, B}, {a, b}, S, P)
  S → A
  A → aA | λ
  B → bA
• The variable B is useless: although it can derive a terminal string (B ⇒ bA ⇒ b), it is not reachable from the start symbol S.
• Exercise: eliminate the useless symbols and productions from G = (N, T, S, P), where N = {S, A, B, C}, T = {a, b}, and P consists of
  S → aS | A | C
  A → a
  B → aa
  C → aCb
• Result:
  S → aS | A
  A → a
• The remaining productions (B → aa and C → aCb) are useless: C cannot derive any terminal string, and B is not reachable from S.
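The exercise follows a two-pass procedure: first keep only the variables that can derive a terminal string, then keep only the variables reachable from S (doing the passes in this order matters). The sketch below assumes single-character symbols (uppercase = variable, lowercase = terminal, "" = λ); the name `remove_useless` is hypothetical.

```python
def remove_useless(grammar, start):
    """Remove useless variables and productions (generating pass, then reachability)."""
    def only_uses(body, allowed):
        return all(not s.isupper() or s in allowed for s in body)

    # Pass 1: variables that can derive some terminal string.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(only_uses(b, generating) for b in bodies):
                generating.add(head)
                changed = True

    # Drop non-generating variables and every production that mentions one.
    pruned = {h: [b for b in bodies if only_uses(b, generating)]
              for h, bodies in grammar.items() if h in generating}

    # Pass 2: variables reachable from the start symbol.
    reachable, stack = {start}, [start]
    while stack:
        for body in pruned.get(stack.pop(), []):
            for s in body:
                if s.isupper() and s not in reachable:
                    reachable.add(s)
                    stack.append(s)

    return {h: bodies for h, bodies in pruned.items() if h in reachable}

G = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
print(remove_useless(G, "S"))   # {'S': ['aS', 'A'], 'A': ['a']}
```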
Removing λ-Productions
• Example
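A common way to remove λ-productions is to first find the nullable variables (those with A ⇒* λ) and then, for every production, add each variant obtained by dropping some nullable occurrences, never adding an empty body. This is a minimal sketch assuming λ ∉ L(G) and single-character symbols ("" = λ); the function name `remove_lambda` and the example grammar (with X as a variable) are hypothetical.

```python
from itertools import product

def remove_lambda(grammar):
    """Return (grammar without λ-productions, set of nullable variables)."""
    nullable = set()
    changed = True
    while changed:  # A is nullable if some body consists only of nullable symbols
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(head)
                changed = True

    new = {}
    for head, bodies in grammar.items():
        variants = set()
        for body in bodies:
            # every nullable symbol may be kept or dropped
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate:            # never re-introduce a λ-production
                    variants.add(candidate)
        new[head] = sorted(variants)
    return new, nullable

G = {"S": ["aXb"], "X": ["aXb", ""]}
print(remove_lambda(G))   # ({'S': ['aXb', 'ab'], 'X': ['aXb', 'ab']}, {'X'})
```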
Removing Unit-Productions

• Any production of a context-free grammar of the form A → B, where A, B ∈ N, is called a unit production.
• To remove unit rules: whenever A ⇒* B using only unit productions, add the rule A → β for each non-unit rule B → β, and then delete all unit productions.
• Example

Finally, the grammar is:

S → Aa | a | bc | bb
A → a | bc | bb
B → a | bc | bb

Note: the removal of the unit productions has made B and its associated productions useless.
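The unit-production procedure can be sketched as: for each variable A, find every B with A ⇒* B through unit rules only, then give A all non-unit bodies of those B. This is a minimal sketch assuming single-character symbols; `remove_units` is a hypothetical name, and the input grammar S → Aa | B, B → A | bb, A → a | bc | B is an assumed starting grammar that yields the final grammar shown above.

```python
def remove_units(grammar):
    """Replace unit productions A -> B by the non-unit bodies reachable from B."""
    def is_unit(body):
        return len(body) == 1 and body.isupper()

    result = {}
    for head in grammar:
        reach, stack = {head}, [head]        # all B with head =>* B via unit rules
        while stack:
            for body in grammar.get(stack.pop(), []):
                if is_unit(body) and body not in reach:
                    reach.add(body)
                    stack.append(body)
        result[head] = sorted({body for b in reach for body in grammar.get(b, [])
                               if not is_unit(body)})
    return result

G = {"S": ["Aa", "B"], "B": ["A", "bb"], "A": ["a", "bc", "B"]}   # assumed input
print(remove_units(G))
# {'S': ['Aa', 'a', 'bb', 'bc'], 'B': ['a', 'bb', 'bc'], 'A': ['a', 'bb', 'bc']}
```

A follow-up pass that removes useless productions would then delete B, as the note says.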
Simplification and Normal Forms
• Theorem: Let L be a context-free language that does not contain λ. Then there exists a CFG that generates L and does not have any useless productions, λ-productions, or unit productions.
• Proof: apply the following steps in this order (removing λ-productions can create unit productions, and removing unit productions can create useless ones):
  • Remove λ-productions.
  • Remove unit productions.
  • Remove useless productions.
• So, before converting a grammar to one of the normal forms, eliminate these three kinds of rules.
Normal Forms

• Chomsky normal form.
• Greibach normal form.
Chomsky Normal Form (CNF)
• A CFG is in Chomsky normal form if every production has the form A → BC or A → a, where A, B, C ∈ N and a ∈ T.
CFG to CNF steps
• Step 1: Eliminate the start symbol from the RHS. If the start symbol S appears on the right-hand side of any production, create a new start symbol S1 with the production S1 → S.
• Step 2: Remove the null, unit and useless productions (see Simplification of CFG).
• Step 3: Eliminate terminals from the RHS of a production when they occur together with other non-terminals or terminals. For example, the production S → aA can be decomposed as: S → RA, R → a.
• Step 4: Eliminate right-hand sides with more than two non-terminals. For example, S → ASB can be decomposed as: S → RB, R → AS.



CNF
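• Example: convert the following grammar to CNF.
  S → ABa
  A → aab
  B → Ac
• As the theorem requires, the grammar has no λ-productions, unit productions or useless productions, and S does not appear on any right side, so the conversion can start at Step 3.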
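Steps 3 and 4 can be automated. This is a minimal sketch, assuming Steps 1 and 2 are already done; bodies are tuples of symbols, variables start with an uppercase letter and terminals are lowercase. The fresh names T_a (for terminal a) and D1, D2, … are hypothetical choices that must not clash with existing variables. Unlike the slide's Step 4 example (S → RB, R → AS), this sketch splits long bodies from the right (S → A D1, D1 → B T_a); both groupings give an equivalent CNF grammar.

```python
def to_cnf(grammar):
    """Apply Step 3 (terminals -> variables) and Step 4 (split long bodies)."""
    cnf = {}
    term_var = {}
    counter = 0

    def add(head, body):
        cnf.setdefault(head, []).append(tuple(body))

    def var_for(t):                      # Step 3: one new variable per terminal
        if t not in term_var:
            term_var[t] = "T_" + t
            add(term_var[t], (t,))
        return term_var[t]

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:           # already of the form A -> a
                add(head, body)
                continue
            syms = [s if s[0].isupper() else var_for(s) for s in body]
            lhs = head
            while len(syms) > 2:         # Step 4: A -> X1 X2 ... Xn becomes a chain
                counter += 1
                new_var = f"D{counter}"
                add(lhs, (syms[0], new_var))
                lhs, syms = new_var, syms[1:]
            add(lhs, syms)
    return cnf

G = {"S": [("A", "B", "a")], "A": [("a", "a", "b")], "B": [("A", "c")]}
for head, bodies in to_cnf(G).items():
    print(head, "->", " | ".join(" ".join(b) for b in bodies))
```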
CNF
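Greibach Normal Form (GNF)
• A CFG is in Greibach normal form if every production has the form A → ax, where a ∈ T and x ∈ N*. (If λ ∈ L, the production S → λ is also allowed, provided S does not appear on any right side.)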
Steps for converting CFG into GNF
Step 1: Convert the grammar into CNF.
If the given grammar is not in CNF, convert it into CNF (see the topic Chomsky normal form).
Step 2: If the grammar has left recursion, eliminate it.
If the context-free grammar contains left recursion, eliminate it (see the topic Left Recursion).
Step 3: Convert the remaining production rules into GNF form.
If any production rule in the grammar is not in GNF form, convert it.
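Step 2 usually starts with immediate left recursion. A λ-free version of the standard rewrite replaces A → Aα1 | … | Aαn | β1 | … | βm by A → βi | βiZ and Z → αi | αiZ, for a fresh variable Z, so no λ-production is introduced. This is a minimal sketch: bodies are tuples of symbols, `new_var` is a hypothetical fresh name, and it assumes no production of the form A → A.

```python
def remove_immediate_left_recursion(head, bodies, new_var):
    """A -> A a1 | ... | A an | b1 | ... | bm   becomes
       A -> b1 | ... | bm | b1 Z | ... | bm Z
       Z -> a1 | ... | an | a1 Z | ... | an Z"""
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: list(bodies)}      # no immediate left recursion
    return {
        head: betas + [beta + (new_var,) for beta in betas],
        new_var: alphas + [alpha + (new_var,) for alpha in alphas],
    }

# E -> E + T | T
print(remove_immediate_left_recursion("E", [("E", "+", "T"), ("T",)], "Z"))
# {'E': [('T',), ('T', 'Z')], 'Z': [('+', 'T'), ('+', 'T', 'Z')]}
```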



GNF-Examples
• The grammar
  G1: S → aAB | aB | ε
      A → aA | a
      B → bB | b
  is in GNF (S → ε is permitted because S does not appear on any right side).
• The grammar
  G2: S → aAB | aB
      A → aA | ε
      B → bB | ε
  is not in GNF, because the productions A → ε and B → ε are not allowed.
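The two examples can be checked against the GNF definition with a short predicate. This is a minimal sketch assuming single-character symbols (uppercase = variable, lowercase = terminal, "" = ε); the name `is_gnf` is hypothetical, and tolerating S → ε only when S never occurs on a right side is the convention used for G1 above.

```python
def is_gnf(grammar, start):
    """Every body must be one terminal followed by zero or more variables."""
    start_on_rhs = any(start in body for bodies in grammar.values() for body in bodies)
    for head, bodies in grammar.items():
        for body in bodies:
            if body == "":
                if head != start or start_on_rhs:
                    return False         # ε only as S -> ε with S not on any RHS
            elif body[0].isupper() or not all(s.isupper() for s in body[1:]):
                return False
    return True

G1 = {"S": ["aAB", "aB", ""], "A": ["aA", "a"], "B": ["bB", "b"]}
G2 = {"S": ["aAB", "aB"], "A": ["aA", ""], "B": ["bB", ""]}
print(is_gnf(G1, "S"), is_gnf(G2, "S"))   # True False
```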
GNF
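• In a GNF grammar, every step of a leftmost derivation produces exactly one terminal, so a non-empty string of length n is derived in exactly n steps. GNF is also used to prove the equivalence of CFGs and PDAs: for every CFG G there is a PDA M such that L(M) = L(G).
• Convert the grammar
  S → abSb | aa
  into GNF.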
Exercise:

• Transform the grammars below to Chomsky normal form: a) b)
• Transform the grammars below to Greibach normal form:
  a) S → aSb | ab
  b) S → ab | aS | aaS
  c) S → XB | AA   (read about left recursion)
     A → a | SA
     B → b
     X → a



Push Down Automata (PDA)

• Non-deterministic PDA
• Languages accepted by NPDA
• Deterministic PDA
• Languages accepted by DPDA



Push Down Automata (PDA)
• A class of automata associated with CFLs.
• An FSA in a given state advances to the next state based only on the input it reads.
• A PDA advances to the next state based on the input it reads and the topmost symbol of its stack.
• Unlike an FSA, a PDA has memory in the form of a stack.
• Nondeterministic PDAs are more powerful than FSAs: they accept exactly the CFLs, which strictly include the regular languages.
• PDA = FSA + stack
• There are two types of PDA: NPDA and DPDA.



• A stack has 2 basic operations:
  • Push: a new element is added to the top of the stack.
  • Pop: the topmost element is removed.
• A PDA has 3 components:
  • An input tape
  • A finite control unit
  • A stack of unbounded size
[Figure: the finite control unit takes input from the input tape, pushes/pops the stack, and outputs accept/reject.]



Non-deterministic PDA (NPDA)
• Definition: An NPDA is a 7-tuple M = (Q, Σ, Γ, δ, q0, Z0, F), where Γ is also written V and δ is also written P:
1. Q: finite set of states
2. Σ: input alphabet
3. Γ (V): finite set of stack symbols (stack alphabet)
4. δ (P): the pushdown (transition) function
   δ: Q × (Σ ∪ {λ}) × Γ → finite subsets of Q × Γ*
5. q0 ∈ Q: the start state
6. Z0 ∈ Γ: the stack initializer (stack start symbol)
7. F ⊆ Q: set of final states



NPDA: cont’d
• The arguments of δ in δ(q, a, X) are:
  • q: the current state in Q
  • a: the current input symbol in Σ, or a = λ
  • X: the current symbol on top of the stack, a member of Γ
• The result is a set of pairs (p, v), where
  • p is the next state, and
  • v is a string that is put on top of the stack in place of the single symbol X that was there before.
• λ-transitions are possible, i.e., the second argument may be empty (λ).
• No move is possible if the stack is empty.
• If v = λ, the stack is popped.
• If v = X, the stack is unchanged.
• If v = YZ, then X is replaced by Z and Y is pushed on top of it, so Y becomes the new top.



NPDA: cont’d
NPDA operations (execution)
• Read an input symbol
• Pop the top element from the stack
• Push element(s) onto the stack
• Enter the next state
• The operations can be represented as (q', s, x; q'', y), where
  • q': current state
  • s: element popped from the stack
  • x: incoming input
  • q'': next state
  • y: symbol(s) pushed onto the stack



PDA-graphical notation

• Finite state automaton: an arc from state A to state B labelled a, where a is the input symbol read.
• PDA: an arc from state A to state B labelled a, b → c, where
  • a is the input symbol read (it may be λ),
  • b is the symbol on top of the stack, which is popped (b = λ means the stack is neither read nor popped),
  • c is the symbol pushed onto the stack.
Example

• Construct a PDA that accepts L = {0ⁿ1ⁿ | n ≥ 0}.
• This language is L = {λ, 01, 0011, 000111, …}.
• Here the number of 0s and the number of 1s have to be the same.
• Initially we put a special symbol '$' (Z0) into the empty stack.
• Then, at state q2, if we read input 0 and the top is $ or 0, we push 0 onto the stack. This may iterate. If we read input 1 and the top is 0, we pop this 0 and move to state q3.
• At state q3, if we read input 1 and the top is 0, we pop this 0. This may also iterate.
• When the special symbol '$' is at the top of the stack, it is popped and the PDA finally goes to the accepting state q4.
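Because this PDA never has a choice of move, it can be simulated with a plain list used as a stack. This is a minimal sketch of the description above; the function name `accepts_0n1n` is hypothetical, and it assumes the final λ-move on $ is also allowed from q2, so that λ (n = 0) is accepted.

```python
def accepts_0n1n(w):
    """Walk through the PDA described above (states q2, q3, then q4)."""
    stack = ["$"]           # q1: push the bottom marker $ and move to q2
    state = "q2"
    for symbol in w:
        top = stack[-1]
        if state == "q2" and symbol == "0":
            stack.append("0")                    # push each 0
        elif symbol == "1" and top == "0" and state in ("q2", "q3"):
            stack.pop()                          # match a 1 against a stored 0
            state = "q3"
        else:
            return False                         # no move is defined: reject
    # λ-move on $: pop it and enter the accepting state q4
    return stack == ["$"]

for w in ["", "01", "000111", "0101", "001"]:
    print(repr(w), accepts_0n1n(w))
```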



NPDA: cont’d
• The operations can also be represented using a transition diagram: (q', s, x; q'', y) is drawn as an arc from state q' to state q'' labelled s, x; y.
• Example: suppose the set of transition rules of an NPDA contains δ(q1, a, b) = {(q2, cd), (q3, λ)}.
Then, if at any time the automaton is in state q1, the input symbol read is a, and the symbol on top of the stack is b, either:
1. the automaton goes into state q2 and the string cd replaces b on top of the stack (with c on top), or
2. it goes into state q3 and the symbol b is removed from the top of the stack.



NPDA: cont’d
• Instantaneous Description (ID)
An ID of a PDA M is a triple (q, x, α), where q ∈ Q is the current state, x ∈ Σ* is the unread input and α ∈ Γ* is the stack content (top at the left). An ID is an informal notation for how a PDA computes on an input string and decides whether the string is accepted or rejected.
• The initial ID is (q0, x, Z0), i.e., the PDA is in state q0, the input string to be processed is x, and the stack contains Z0.
• The move relation ⊢ between IDs is defined as:
(q, a1a2…an, Z1Z2…Zm) ⊢ (q', a2…an, βZ2…Zm)
if δ(q, a1, Z1) contains (q', β). Similarly, (q, x, Z1Z2…Zm) ⊢ (q', x, βZ2…Zm) if δ(q, λ, Z1) contains (q', β).
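The move relation ⊢ translates directly into a function that lists every successor of an ID. This is a minimal sketch assuming single-character stack symbols, stack strings written with the top at the left, and "" for λ; the name `moves` and the dictionary layout of δ are hypothetical. The example reuses δ(q1, a, b) = {(q2, cd), (q3, λ)} from the previous slide.

```python
# δ as a dict: (state, input symbol or "" for λ, stack top) -> set of (next state, pushed string)
def moves(id_, delta):
    """Return every ID reachable from id_ = (q, x, alpha) in one move."""
    q, x, alpha = id_
    if not alpha:
        return set()                      # no move is possible on an empty stack
    top, rest = alpha[0], alpha[1:]
    result = set()
    for p, beta in delta.get((q, "", top), set()):         # λ-moves
        result.add((p, x, beta + rest))
    if x:
        for p, beta in delta.get((q, x[0], top), set()):   # consume one input symbol
            result.add((p, x[1:], beta + rest))
    return result

delta = {("q1", "a", "b"): {("q2", "cd"), ("q3", "")}}
print(sorted(moves(("q1", "ab", "bz"), delta)))
# [('q2', 'b', 'cdz'), ('q3', 'b', 'z')]
```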



Acceptance of strings by NPDA
• The set of strings accepted by an NPDA M is denoted by L(M) and is defined in one of two ways:
1. L(M) = {x | (q0, x, Z0) ⊢* (q, λ, v) for some q ∈ F and v ∈ Γ*} (acceptance by final state)
2. L(M) = {x | (q0, x, Z0) ⊢* (q, λ, λ) for some q ∈ Q} (acceptance by empty stack)
• Note that in (1) the stack content v is irrelevant, i.e., every string that can put M into a final state at the end of the string is accepted.



Acceptance by NPDA: cont’d

• Example 1:
Construct an NPDA that accepts the language
L = {xcxᴿ | x ∈ {a, b}*}
An example of a string in L is
w = abbcbba



Acceptance by NPDA: cont’d
• Solution:
The NPDA operates as follows:
1. As it reads the symbols to the left of c, it pushes each symbol read and remains in the same state.
2. When it sees c, it enters a new state without changing the stack.
3. It then compares each incoming symbol with the top element of the stack. If they match, it pops the top element; otherwise, the computation stops and the string is rejected.
4. When the input is exhausted and Z0 is on top of the stack, it moves to the final state.
Acceptance by NPDA: cont’d
• The pushdown function δ (P) is:
1. δ(q0, λ, Z0) = {(q1, Z0)}
2. δ(q1, a, Z0) = {(q1, aZ0)}
3. δ(q1, b, Z0) = {(q1, bZ0)}
4. δ(q1, b, a) = {(q1, ba)}
5. δ(q1, a, b) = {(q1, ab)}
6. δ(q1, a, a) = {(q1, aa)}
7. δ(q1, b, b) = {(q1, bb)}
8. δ(q1, c, a) = {(q2, a)}, δ(q1, c, b) = {(q2, b)}, δ(q1, c, Z0) = {(q2, Z0)}
9. δ(q2, b, b) = {(q2, λ)}
10. δ(q2, a, a) = {(q2, λ)}
11. δ(q2, λ, Z0) = {(q3, λ)}
Thus M = (Q, Σ, Γ, δ, q0, Z0, F), where
Q = {q0, q1, q2, q3}
Z0 = #
Σ = {a, b, c}
Γ = {a, b, Z0}
F = {q3}, and δ is given above.
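Since an NPDA may have several possible moves, a simulator has to explore all of them. This is a minimal sketch that searches the IDs breadth first and accepts by final state; the names `npda_accepts` and `DELTA_XCXR` are hypothetical, stack symbols are single characters with the top at the left, "" stands for λ, and `max_steps` is an assumed safety bound (the search reports False if the bound is hit).

```python
from collections import deque

def npda_accepts(w, delta, start, z0, finals, max_steps=10_000):
    """Breadth-first search over IDs (state, unread input, stack); final-state acceptance."""
    queue = deque([(start, w, z0)])
    seen = set()
    steps = 0
    while queue and steps < max_steps:
        q, x, stack = queue.popleft()
        if (q, x, stack) in seen:
            continue
        seen.add((q, x, stack))
        steps += 1
        if x == "" and q in finals:
            return True
        if not stack:
            continue                                     # no move on an empty stack
        top, rest = stack[0], stack[1:]
        for p, push in delta.get((q, "", top), ()):      # λ-moves
            queue.append((p, x, push + rest))
        if x:
            for p, push in delta.get((q, x[0], top), ()):
                queue.append((p, x[1:], push + rest))
    return False

# δ from the slide, with Z0 = "#"
DELTA_XCXR = {
    ("q0", "", "#"): {("q1", "#")},
    ("q1", "a", "#"): {("q1", "a#")}, ("q1", "b", "#"): {("q1", "b#")},
    ("q1", "b", "a"): {("q1", "ba")}, ("q1", "a", "b"): {("q1", "ab")},
    ("q1", "a", "a"): {("q1", "aa")}, ("q1", "b", "b"): {("q1", "bb")},
    ("q1", "c", "a"): {("q2", "a")}, ("q1", "c", "b"): {("q2", "b")},
    ("q1", "c", "#"): {("q2", "#")},
    ("q2", "b", "b"): {("q2", "")}, ("q2", "a", "a"): {("q2", "")},
    ("q2", "", "#"): {("q3", "")},
}
print(npda_accepts("abbcbba", DELTA_XCXR, "q0", "#", {"q3"}))   # True
print(npda_accepts("abcab", DELTA_XCXR, "q0", "#", {"q3"}))     # False
```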



Acceptance by NPDA: cont’d

• Example:
Let w = abbcbba. Trace the computation manually to check whether w is accepted by M.



Acceptance by NPDA: cont’d
• Solution (stack written with the top at the left)
State   Input      Stack
q0      abbcbba    #
q1      abbcbba    #
q1      bbcbba     a#
q1      bcbba      ba#
q1      cbba       bba#
q2      bba        bba#
q2      ba         ba#
q2      a          a#
q2      λ          #
q3      λ          λ      (input is accepted)



Acceptance by NPDA: cont’d
• Example 2:
Construct an NPDA for the following language:
L = {w ∈ {a, b}* : na(w) = nb(w)}
Solution: M = ({q0, qf}, {a, b}, {0, 1, z}, δ, q0, z, {qf}), with δ given as
Assume w = baab.
The string is accepted.
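The transitions for this machine were not preserved, so the sketch below uses one hypothetical δ consistent with M = ({q0, qf}, {a, b}, {0, 1, z}, δ, q0, z, {qf}): a 0 on the stack counts a surplus a, a 1 counts a surplus b, and δ(q0, λ, z) = {(qf, z)} accepts when the counts balance. Apart from that final λ-move the machine is deterministic, so the trace is a simple loop; the names `DELTA` and `trace_equal_ab` are hypothetical.

```python
# Hypothetical δ for q0 (stack top at the left): (input, top) -> string replacing the top.
DELTA = {
    ("a", "z"): "0z", ("b", "z"): "1z",   # first unmatched symbol
    ("a", "0"): "00", ("b", "0"): "",     # a adds a surplus a, b cancels one
    ("a", "1"): "",   ("b", "1"): "11",   # a cancels a surplus b, b adds one
}

def trace_equal_ab(w):
    """Return the sequence of IDs (state, unread input, stack) for input w."""
    stack = "z"
    ids = [("q0", w, stack)]
    for i, symbol in enumerate(w):
        stack = DELTA[(symbol, stack[0])] + stack[1:]
        ids.append(("q0", w[i + 1:], stack))
    if stack[0] == "z":                    # δ(q0, λ, z) = {(qf, z)}
        ids.append(("qf", "", stack))
    return ids

for id_ in trace_equal_ab("baab"):
    print(id_)
```

For w = baab the trace ends in ("qf", "", "z"), so the string is accepted.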



Exercise 2: a,zaz
b,zbz
Construct an NPDA for accepting the language: a,aaa
a,bba
L = {wwr : w Є {a, b}+ } b,aab
stack
q0 b,bbb
z λ,aa a,a λ
λ,bb b,b λ

q1 q2
λ,z z

bba

Or



NPDA – CFG equivalence
• Theorem: If L is a CFL, then there exists an NPDA M such that L = L(M).
Proof (by construction):
Let the CFG be G = (N, T, P, S).
Define M = (Q, Σ, Γ, δ, q0, z, {q2}), where Q = {q0, q1, q2}, Σ = T, Γ = N ∪ T ∪ {z}, and δ is given by:
δ(q0, λ, z) = {(q1, Sz)}   (push the start symbol)
δ(q1, λ, A) = {(q1, β) | A → β ∈ P}   for each A ∈ N (replace a variable by a right-hand side)
δ(q1, a, a) = {(q1, λ)}   for all a ∈ T (pop a matching terminal)
δ(q1, λ, z) = {(q2, λ)}
so that x ∈ L(G) iff x ∈ L(M).



Example
• PDA for the CFG below:
• S → aSb | ab, with N = {S}, T = {a, b}
• Then the PDA can be:
δ(q0, λ, z) = {(q1, Sz)}
δ(q1, λ, S) = {(q1, aSb), (q1, ab)}
δ(q1, a, a) = {(q1, λ)}
δ(q1, b, b) = {(q1, λ)}
δ(q1, λ, z) = {(q2, λ)}
Check for "aabb":
(q0, aabb, z) ⊢ (q1, aabb, Sz) ⊢ (q1, aabb, aSbz) ⊢ (q1, abb, Sbz) ⊢ (q1, abb, abbz) ⊢ (q1, bb, bbz) ⊢ (q1, b, bz) ⊢ (q1, λ, z) ⊢ (q2, λ, λ)
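The construction in the theorem can be carried out mechanically and the resulting NPDA searched breadth first. This is a minimal sketch under stated assumptions: uppercase letters are variables, every other character is a terminal, "z" is reserved for the stack bottom, and the grammar has no λ-productions, which justifies pruning any ID whose stack (excluding z) is longer than the remaining input. The names `cfg_to_pda` and `accepts` are hypothetical.

```python
from collections import deque

def cfg_to_pda(grammar, start):
    """Build δ as in the theorem: push S, expand variables, match terminals."""
    delta = {("q0", "", "z"): {("q1", start + "z")},
             ("q1", "", "z"): {("q2", "")}}
    terminals = set()
    for head, bodies in grammar.items():
        delta[("q1", "", head)] = {("q1", body) for body in bodies}
        terminals |= {s for body in bodies for s in body if not s.isupper()}
    for a in terminals:
        delta[("q1", a, a)] = {("q1", "")}
    return delta

def accepts(w, delta, finals=("q2",)):
    queue, seen = deque([("q0", w, "z")]), set()
    while queue:
        q, x, stack = queue.popleft()
        # prune: without λ-productions each symbol above z needs at least one input symbol
        if (q, x, stack) in seen or len(stack) > len(x) + 1:
            continue
        seen.add((q, x, stack))
        if x == "" and q in finals:
            return True
        if not stack:
            continue
        top, rest = stack[0], stack[1:]
        for p, push in delta.get((q, "", top), ()):
            queue.append((p, x, push + rest))
        if x:
            for p, push in delta.get((q, x[0], top), ()):
                queue.append((p, x[1:], push + rest))
    return False

delta = cfg_to_pda({"S": ["aSb", "ab"]}, "S")
print(accepts("aabb", delta), accepts("abab", delta))   # True False
```

The same two functions also handle the left-recursive expression grammar on the next slides, because the length bound keeps the search finite.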



Or first convert to GNF, then construct the PDA
• Construct a PDA that accepts the language generated by the grammar with productions S → aSbb | a.
• We first change it to GNF, which looks like:
  S → aSA | a
  A → bB
  B → b
• The corresponding automaton will have three states {q0, q1, q2}, with initial state q0 and final state q2.
First, the start symbol S is put on the stack by δ(q0, λ, z) = {(q1, Sz)}.
• The production S → aSA is simulated by the PDA by reading a, removing S from the stack and replacing it with SA; together with S → a this gives δ(q1, a, S) = {(q1, SA), (q1, λ)}. Similarly, the other transitions are:
  δ(q1, b, A) = {(q1, B)}
  δ(q1, b, B) = {(q1, λ)}
• Finally, the appearance of the stack start symbol z on top of the stack tells the PDA that the input has been completely processed, so it moves to the final state (accepted): δ(q1, λ, z) = {(q2, λ)}.
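When the grammar is in GNF, each production A → ax becomes the single transition (q1, x) ∈ δ(q1, a, A), so the PDA reads one terminal per move and needs no separate matching rules. This is a minimal sketch assuming single-character symbols and the GNF grammar S → aSA | a, A → bB, B → b; the name `gnf_to_pda` is hypothetical.

```python
def gnf_to_pda(grammar, start):
    """For each GNF production A -> a x, add (q1, x) to δ(q1, a, A)."""
    delta = {("q0", "", "z"): {("q1", start + "z")},   # push the start symbol
             ("q1", "", "z"): {("q2", "")}}             # z on top: go to the final state
    for head, bodies in grammar.items():
        for body in bodies:
            terminal, variables = body[0], body[1:]
            delta.setdefault(("q1", terminal, head), set()).add(("q1", variables))
    return delta

GNF = {"S": ["aSA", "a"], "A": ["bB"], "B": ["b"]}
delta = gnf_to_pda(GNF, "S")
print(sorted(delta[("q1", "a", "S")]))   # [('q1', ''), ('q1', 'SA')]
```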



Cont…
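• Consider the grammar
• Since the grammar is in GNF, we can construct the PDA from its productions by adding two extra rules: one that pushes the start symbol, and one that moves to the final state when z reappears on top of the stack.

The corresponding derivation:

S ⇒ aA ⇒ aaABC ⇒ aaaBC ⇒ aaabC ⇒ aaabc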



NPDA – CFG equivalence: cont’d

• Example: Given G = (N, T, P, S) with S = E, T = {a, b, c, +, -, *, /, (, )}, N = {E, F, T} and P:
  E → T | E + T | E - T
  T → F | T * F | T / F
  F → a | b | c | (E)
Construct an NPDA M that simulates leftmost derivations of the grammar (to accept a+(b*c)).



NPDA – CFG equivalence: cont’d

• Theorem: If L = L(M) for some NPDA M, then L is a CFL. (reading assignment)



Deterministic PDA (DPDA)
• A DPDA is a 7-tuple machine M = (Q, Σ, Γ, P, q0, Z0, F) with the following properties:
a) P(q, a, A) contains at most one element, for every q ∈ Q, a ∈ Σ ∪ {λ}, A ∈ Γ
(i.e., for any given input symbol and any stack top, at most one move can be made);
b) if P(q, λ, A) ≠ ∅, then P(q, a, A) = ∅ for all a ∈ Σ
(i.e., when a λ-move is possible for some configuration, no input-consuming alternative is available).
• A language accepted by a DPDA is called a deterministic CFL, or simply a deterministic language.



DPDA: cont’d
• Example: The language L = {aⁿbⁿ : n ≥ 0} is a deterministic CFL.
DPDA M = ({q0, q1, q2}, {a, b}, {0, 1}, P, q0, 0, {q0}) with
P(q0, a, 0) = (q1, 10)
P(q1, a, 1) = (q1, 11)
P(q1, b, 1) = (q2, λ)
P(q2, λ, 0) = (q0, λ)
P(q2, b, 1) = (q2, λ)
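Because a DPDA has at most one applicable move in every configuration, it can be run with a single loop and no search. The sketch below stores P as a dict with one value per key (so condition a holds by construction), checks condition b, and runs the example machine with acceptance by final state. Names are hypothetical; stack strings have the top at the left, "" stands for λ, and the loop assumes the machine has no cycle of λ-moves.

```python
# P of the DPDA above: (state, input or "" for λ, stack top) -> (next state, pushed string)
P = {
    ("q0", "a", "0"): ("q1", "10"),
    ("q1", "a", "1"): ("q1", "11"),
    ("q1", "b", "1"): ("q2", ""),
    ("q2", "", "0"): ("q0", ""),
    ("q2", "b", "1"): ("q2", ""),
}

def is_deterministic(delta, sigma):
    """Condition (b): a λ-move excludes every input-consuming move on the same (q, top)."""
    return not any(a == "" and any((q, s, top) in delta for s in sigma)
                   for (q, a, top) in delta)

def dpda_accepts(w, delta, start="q0", z0="0", finals=("q0",)):
    q, stack, i = start, z0, 0
    while True:
        top = stack[:1]
        if top and (q, "", top) in delta:                  # λ-move
            q, push = delta[(q, "", top)]
        elif i < len(w) and top and (q, w[i], top) in delta:
            q, push = delta[(q, w[i], top)]
            i += 1
        else:
            return i == len(w) and q in finals            # no move left: decide
        stack = push + stack[1:]

print(is_deterministic(P, "ab"))                          # True
print([dpda_accepts(w, P) for w in ["", "ab", "aabb", "aab", "abab"]])
# [True, True, True, False, False]
```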



DPDA
• At any moment, at most one move is possible.
[Figure: allowed transitions, for a ∈ Σ, b ∈ Γ.]

NPDA – DPDA equivalence

• In contrast to FSAs, DPDAs and NPDAs are not equivalent: there are CFLs that are not deterministic, for example {wwᴿ : w ∈ {a, b}*}.



• Design of Top-Down Parser
• For top-down parsing, a PDA has the following four types of transitions:
  • Pop the non-terminal on the left-hand side of a production from the top of the stack and push its right-hand-side string.
  • If the top symbol of the stack matches the input symbol being read, pop it.
  • Push the start symbol 'S' onto the stack.
  • If the input string is fully read and only the bottom marker remains on the stack, go to the final state 'F'.
• Example
• Design a top-down parser for the expression "x+y*z" for the grammar G with the following production rules:
• P: S → S+X | X, X → X*Y | Y, Y → (S) | x | y | z
Solution
• If the PDA is (Q, Σ, S, δ, q0, I, F), then the top-down parsing is:
• (x+y*z, I) ⊢ (x+y*z, SI) ⊢ (x+y*z, S+XI) ⊢ (x+y*z, X+XI)
• ⊢ (x+y*z, Y+XI) ⊢ (x+y*z, x+XI) ⊢ (+y*z, +XI) ⊢ (y*z, XI)
• ⊢ (y*z, X*YI) ⊢ (y*z, Y*YI) ⊢ (y*z, y*YI) ⊢ (*z, *YI) ⊢ (z, YI) ⊢ (z, zI) ⊢ (ε, I)

 SS+XX+XY+Xx+Xx+X*Yx+Y*Xx+y*Yx+y*z



• δ(q0, ε, I) = {(q1, SI)}
• δ(q1, ε, S) = {(q1, S+X), (q1, X)}
• δ(q1, ε, X) = {(q1, X*Y), (q1, Y)}
• δ(q1, ε, Y) = {(q1, (S)), (q1, x), (q1, y), (q1, z)}
• δ(q1, t, t) = {(q1, ε)} for every terminal t ∈ {x, y, z, +, *, (, )}
• δ(q1, ε, I) = {(q2, I)}

