Context Free Grammar Explained

Chapter Three
Context Free Grammar (CFG)

Outline
I. Introduction
II. Derivation in CFG
III. Parsing & Ambiguity in Grammars
IV. CFG Simplification
V. Normal Forms
VI. CFL & Closure Properties
VII. Pumping Lemma for CFL
Introduction
•In the literary sense of the term, grammars denote syntactical rules for conversation in natural languages.
•Linguists have attempted to define grammars since the inception (origin) of natural languages like Amharic, English, French, German, etc.
•The theory of formal languages is applied extensively in the fields of Computer Science.
•Noam Chomsky gave a mathematical model of grammar in 1956 which is effective for writing computer languages.
Definition.
Grammar: A grammar G is formally defined as a 4-tuple G = (V, T, S, P), where:
• V is a finite set of variables (non-terminal symbols).
• T (also written ∑) is a finite set of terminal symbols (the input alphabet).
• S is a special variable called the start symbol, S ∈ V.
• P is a finite set of production rules.
A production rule has the form α → β, where α and β are strings of symbols over V ∪ ∑, i.e. α ∈ (V ∪ ∑)⁺ and β ∈ (V ∪ ∑)*, and α contains at least one variable.
Example:
Grammar G1 = ({S, A, B}, {a, b}, S, P) with P = {S → AB, A → a, B → b}.
Here S, A, and B are non-terminal symbols, a and b are terminal symbols, and S is the start symbol (S ∈ V).
Grammar G2 = ({S, A}, {a, b}, S, P) with P = {S → aAb, aA → aaAb, A → ε}.
Here S and A are non-terminal symbols (variables), a and b are terminal symbols, ε is the empty string, and S is the start symbol (S ∈ V).
In a CFG, the start symbol is used to derive the string. You derive a string by repeatedly replacing a non-terminal with the right-hand side of one of its productions, until all non-terminals have been replaced by terminal symbols.
Example:
L = {wcw^R | w ∈ {a, b}*}
Production rules:
S → aSa
S → bSb
S → c
Now check that the string abbcbba can be derived from the given CFG.
S ⇒ aSa
  ⇒ abSba
  ⇒ abbSbba
  ⇒ abbcbba
By applying the productions S → aSa and S → bSb recursively and finally applying the production S → c, we get the string abbcbba.
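The derivation above can be replayed mechanically. The following is a minimal sketch, assuming productions are stored as (left side, right side) pairs; the helper name `derive` is illustrative and not part of the original notes.

```python
# Sketch: replay a derivation for the grammar S -> aSa | bSb | c.
# The (lhs, rhs) encoding and the name `derive` are assumptions for illustration.

RULES = {("S", "aSa"), ("S", "bSb"), ("S", "c")}

def derive(start, rules, steps):
    """Apply each (lhs, rhs) production to the leftmost occurrence of lhs."""
    form = start
    trace = [form]
    for lhs, rhs in steps:
        if (lhs, rhs) not in rules:
            raise ValueError(f"{lhs} -> {rhs} is not a production of the grammar")
        if lhs not in form:
            raise ValueError(f"{lhs} does not occur in {form}")
        form = form.replace(lhs, rhs, 1)  # rewrite the leftmost occurrence only
        trace.append(form)
    return trace

trace = derive("S", RULES, [("S", "aSa"), ("S", "bSb"), ("S", "bSb"), ("S", "c")])
print(" => ".join(trace))
```

Each entry of `trace` is one sentential form, so the last entry is the derived string.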
Capabilities of CFG
• Context-free grammars can describe the syntax of most programming languages.
• If the grammar is properly designed, an efficient parser can be constructed automatically.
• Using associativity and precedence information, suitable grammars for expressions can be constructed.
• Context-free grammars can describe nested structures such as balanced parentheses, matching begin-end, corresponding if-then-else, and so on.
Context Free Language:
Context-free languages are the family of formal languages accepted (recognized) by pushdown automata. They are exactly the languages that can be generated by context-free grammars, and they include languages, such as {a^n b^n | n ≥ 0}, that no finite state automaton accepts.
• Every regular language is a context-free language, but the reverse is not true.
• Every regular grammar is a context-free grammar, but the reverse is not true.
Chomsky Classification of Grammars
•According to Noam Chomsky, there are four types of grammars:
•Type 0, known as unrestricted (recursively enumerable) grammar.
•Type 1, known as context-sensitive grammar.
•Type 2, known as context-free grammar.
•Type 3, known as regular grammar.
The following sections show how they differ from each other.
Type – 3 (Regular) Grammar
•Type-3 grammars generate regular languages. These are exactly the languages that can be decided by a finite state automaton. Type 3 is the most restricted form of grammar.
•Type-3 productions must have one of the forms:
V → VT* | T* [left-linear grammar, LLG] or
V → T*V | T* [right-linear grammar, RLG].
Example:
X → a | aY
Y → b
X → ε
Type-2 (Context Free) grammars
Type-2 grammars generate context-free languages.
•The productions must be of the form α → β, where α ∈ V (a single non-terminal, so |α| = 1) and β ∈ (T ∪ V)* (a string of terminals and non-terminals). There is no restriction on β.
•These languages are recognized by a non-deterministic pushdown automaton (PDA).
Example:
S → Xa
X → a
X → aX
X → abc
X → ε
Type-1 [Context Sensitive] grammars
Type-1 grammars generate context-sensitive languages.
The language generated by such a grammar is recognized by a linear bounded automaton.
The productions must be of the form α → β with |α| ≤ |β|, i.e. the number of symbols in α is less than or equal to the number in β, where α, β ∈ (V ∪ T)⁺ and α contains at least one variable.
Example: Consider the following CSG.
S → abc | aAbc
Ab → bA
Ac → Bbcc
bB → Bb
aB → aa | aaA
What is the language generated by this grammar?
Type-0 (Unrestricted or Recursively Enumerable ) Grammar
•Type-0 grammars include all formal grammars.
•Type-0 grammars generate recursively enumerable languages. The productions have no restrictions.
•They generate the languages that are recognized by a Turing machine.
•The productions are of the form α → β, where α ∈ (V+T)*V(V+T)* and β ∈ (V+T)*.
Example:
S → ABaC,
Ba → aaB,
BC → DC | E,
aD → Da,
AD → AB,
aE → Ea,
AE → λ.
Introduction to Context-Free Grammar
Context-free grammar: a formal grammar used to generate all possible strings in a given formal language.
Definition: A context-free grammar (CFG) is a quadruple (V, T, P, S) where:
• V is a finite set of non-terminal symbols.
• T is a finite set of terminals, where V ∩ T = ∅.
• P is a finite set of rules of the form A → β, where A ∈ V and β ∈ (V ∪ T)*.
• S is a special symbol called the start symbol, S ∈ V.
Example: The language {a^n b^n : n ≥ 0} is a CFL, and an equivalent CFG is P: S → aSb | λ
In addition the following are some of CFG.
P:
A → aA,
A → abc.
P:
S → aSa, S → bSb, S → ε
P: S → 00S | 11F, F → 00F | ε.

Regular Expression Vs Context Free Grammar
•Regular expressions can describe the syntax of tokens. Any syntactic construct that can be described by a regular expression can also be described by a context-free grammar.
•Regular expressions are most useful for describing the structure of lexical constructs such as identifiers, constants, etc. Context-free grammars are most useful for describing nested or syntactic structure such as balanced parentheses, if-else, etc., which cannot be defined by a regular expression.
•RE: (a|b)(a|b|0|1)*, CFG: S → aA | bA, A → aA | bA | 0A | 1A | ε.

•A context-free grammar can be constructed for any regular expression, as the following examples show.
•Example 1:
•Construct the CFG for the language having any number of a's over the set ∑ = {a}.
•Solution: The regular expression for this language is RE = a*.
The production rules for the regular expression are as follows:
1. S → aS (rule 1)
2. S → ε (rule 2)
•To derive the string "aaaaaa", we start with the start symbol:
1. S
2. aS (rule 1)
3. aaS (rule 1)
4. aaaS (rule 1)
5. aaaaS (rule 1)
6. aaaaaS (rule 1)
7. aaaaaaS (rule 1)
8. aaaaaaε (rule 2)
9. aaaaaa
•The RE a* generates the set of strings {ε, a, aa, aaa, ...}. The null string is included because S is the start symbol and rule 2 gives S → ε.
Example 2: Construct a CFG for the regular expression (0+1)*
Solution:
The CFG can be given by the production rules (P):
1. S → 0S | 1S
2. S → ε
The rules prepend any combination of 0's and 1's to the start symbol. (0+1)* denotes {ε, 0, 1, 01, 10, 00, 11, ....}. Since ε is in this set, we include the rule S → ε.
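To see which strings the grammars of Examples 1 and 2 generate, one can enumerate them up to a length bound. The sketch below assumes single-character symbols, a dictionary from each variable to its right-hand sides, and that every recursive production adds at least one terminal (true for both examples, and needed for the search to stop). The function name `generate` is illustrative.

```python
from collections import deque

def generate(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    grammar maps each variable to a list of right-hand sides (strings);
    the empty string stands for ε. Only leftmost derivations are explored.
    """
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, or None if the form is all terminals
        pos = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if pos is None:
            results.add(form)
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for ch in new if ch not in grammar)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=lambda s: (len(s), s))

a_star = generate({"S": ["aS", ""]}, "S", 3)
binary = generate({"S": ["0S", "1S", ""]}, "S", 2)
print(a_star)
print(binary)
```

The first call lists the strings of a* up to length 3, the second the strings of (0+1)* up to length 2; ε appears as the empty string.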
Example 3: Construct a CFG for the language L = {wcw^R | w ∈ {a, b}*}.
Solution: Some strings in the language are {aacaa, bcb, abcba, bacab, abbcbba, ....}
The grammar could be:
1. S → aSa (rule 1)
2. S → bSb (rule 2)
3. S → c (rule 3)
To derive the string "abbcbba", we start with the start symbol:
S ⇒ aSa (rule 1)
  ⇒ abSba (rule 2)
  ⇒ abbSbba (rule 2)
  ⇒ abbcbba (rule 3)
Thus any string of this kind can be derived from the given production rules.
Example 4:
• Construct a CFG for the language L = {a^n b^(2n) | n ≥ 1}.
Solution: Some strings in the language are {abb, aabbbb, aaabbbbbb, ....}.
The grammar could be:
S → aSbb | abb
To derive the string "aabbbb", we start with the start symbol:
S ⇒ aSbb ⇒ aabbbb
Difference:
2. Derivation
A derivation is a sequence of production rule applications that produces the input string from the start symbol. During parsing we have to make two decisions:
• We have to decide which non-terminal is to be replaced.
• We have to decide the production rule by which the non-terminal will be replaced.
There are two common options for deciding which non-terminal to replace: always the leftmost one or always the rightmost one.
a) Left-Most Derivation
In a leftmost derivation, the leftmost variable of the sentential form is replaced at every step, so the input string is produced from left to right.
Example:
Production rules:
S → S + S
S → S - S
S → a | b | c
Input: a - b + c
The left-most derivation is:
S ⇒ S + S
  ⇒ S - S + S
  ⇒ a - S + S
  ⇒ a - b + S
  ⇒ a - b + c
b) Right-Most Derivation
In a rightmost derivation, the rightmost variable of the sentential form is replaced at every step, so the input string is produced from right to left.
Example:
S → S + S
S → S - S
S → a | b | c
Input: a - b + c
The right-most derivation is:
S ⇒ S - S
  ⇒ S - S + S
  ⇒ S - S + c
  ⇒ S - b + c
  ⇒ a - b + c
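Both derivation orders can be read off a parse tree by always expanding the leftmost or the rightmost variable. The sketch below assumes a tree is a (label, children) pair with terminals as plain strings; the names `derivation`, `PLUS_ROOT`, and `MINUS_ROOT` are illustrative. Note that the leftmost example above starts with S → S + S while the rightmost example starts with S → S - S, so they correspond to two different trees for a - b + c.

```python
def render(form):
    """Show a sentential form: variables by label, terminals as themselves."""
    return "".join(x if isinstance(x, str) else x[0] for x in form)

def derivation(tree, leftmost=True):
    """List the sentential forms obtained by expanding one variable per step."""
    form = [tree]
    steps = [render(form)]
    while True:
        variables = [i for i, x in enumerate(form) if not isinstance(x, str)]
        if not variables:
            return steps
        i = variables[0] if leftmost else variables[-1]
        _label, children = form[i]
        form[i:i + 1] = children  # replace the variable by its children
        steps.append(render(form))

def leaf(ch):
    return ("S", [ch])  # subtree for S -> a, S -> b or S -> c

# Tree with + at the root: (a - b) + c
PLUS_ROOT = ("S", [("S", [leaf("a"), "-", leaf("b")]), "+", leaf("c")])
# Tree with - at the root: a - (b + c)
MINUS_ROOT = ("S", [leaf("a"), "-", ("S", [leaf("b"), "+", leaf("c")])])

print(" => ".join(derivation(PLUS_ROOT, leftmost=True)))
print(" => ".join(derivation(MINUS_ROOT, leftmost=False)))
```

That one input has two different trees is exactly what the later sections call ambiguity.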

c) Mixed Derivation: the variable to replace is chosen freely at each step, sometimes the leftmost, sometimes the rightmost, or any other.
Example:
S → S + S
S → S - S
S → a | b | c
A mixed derivation of a - b + c is:
S ⇒ S - S
  ⇒ S - S + S (RM)
  ⇒ a - S + S (LM)
  ⇒ a - S + c (RM)
  ⇒ a - b + c
d) Derivation Tree: also called a parse tree. A derivation tree is an ordered rooted tree that graphically represents how a string is derived from a context-free grammar.
Representation Technique:
• Root vertex − must be labeled by the start symbol.
• Interior vertex − labeled by a non-terminal symbol.
• Leaves − labeled by a terminal symbol or ε.
• A parse tree is a graphical representation of a derivation; its nodes are labeled with symbols, which can be terminals or non-terminals.
• In parsing, the string is derived from the start symbol, so the root of the parse tree is the start symbol.
• A parse tree reflects the precedence of operators. The deepest sub-tree is evaluated first, so the operator in a parent node has lower precedence than the operator in its sub-tree.
The parse tree follows these points:
• All leaf nodes have to be terminals.
• All interior nodes have to be non-terminals.
• In-order traversal (reading the leaves from left to right) gives the original input string.
Example:
Production rules:
• T → T + T | T * T
• T → a | b | c
• Input: a * b + c
There are two different approaches to drawing a derivation tree:
•Top-down approach − starts with the start symbol S and goes down to the tree leaves using production rules.
•Bottom-up approach − starts from the tree leaves and proceeds upward to the root, which is the start symbol S.
Derivation Result [Yield of a Tree]: The yield of a parse tree is the final string obtained by concatenating the labels of the leaves of the tree from left to right, ignoring the nulls. However, if all the leaves are null, the yield is null.
A derivation tree tells how a string is derived from S using production rules.
Example
Let a CFG be G = (V, T, S, P) with V = {S}, T = {a, b}, start symbol S, and P: S → SS | aSb | ε. One string derivable from this CFG is "abaabb":
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abaSb ⇒ abaaSbb ⇒ abaabb (LMD)
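The yield of the parse tree for this example can be computed by concatenating the leaves from left to right. This is a small sketch, assuming a tree is a (label, children) pair, terminals are plain strings, and an ε leaf is the empty string; the names `tree_yield`, `pair`, and `TREE` are illustrative.

```python
EPSILON = ""  # the empty string stands for an ε leaf

def tree_yield(node):
    """Concatenate the leaf labels from left to right; ε leaves contribute nothing."""
    if isinstance(node, str):
        return node
    _label, children = node
    return "".join(tree_yield(child) for child in children)

def pair(inner):
    """Subtree for the production S -> a S b."""
    return ("S", ["a", inner, "b"])

EMPTY = ("S", [EPSILON])                          # S -> ε
TREE = ("S", [pair(EMPTY), pair(pair(EMPTY))])    # S -> S S at the root

print(tree_yield(TREE))
```

The root uses S → SS; its left child yields ab and its right child yields aabb.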

Partial Derivation Tree: A partial derivation tree is a sub-tree of a derivation tree (parse tree) such that, for each of its nodes, either all of the node's children are in the sub-tree (see S below) or none of them are (see the variables A and B).
Sentential Form: any string of terminals and variables that can be derived from the start symbol S. The yield of a partial derivation tree that contains the root S is a sentential form.
Sentence: the final string of terminals obtained after a number of derivation steps.
Example: Suppose the productions of a CFG are:
S → AB, A → aaA | ε, B → Bb | ε
The partial derivation tree can be the following:
Leftmost and Rightmost Derivation
•Leftmost derivation − obtained by applying a production to the leftmost variable in each step.
•Rightmost derivation − obtained by applying a production to the rightmost variable in each step.
Example: Let the set of production rules in a CFG be:
X → X+X | X*X | X | a, over the alphabet {a, +, *}.
•A leftmost derivation for the string "a+a*a" can be:
X ⇒ X+X ⇒ a+X ⇒ a+X*X ⇒ a+a*X ⇒ a+a*a
•The stepwise derivation of the above string is shown below:
Every regular grammar and every linear grammar is a CFG, but a CFG is not necessarily linear or regular.
•A rightmost derivation for the same string "a+a*a" may be:
X ⇒ X*X ⇒ X*a ⇒ X+X*a ⇒ X+a*a ⇒ a+a*a
•The stepwise derivation of the above string is shown below:

Classification of Context Free Grammars
Context-free grammars (CFGs) can be classified on the basis of the following two properties:
1) Based on the number of strings generated.
• If a CFG generates a finite number of strings, the CFG is non-recursive.
• If a CFG can generate an infinite number of strings, the grammar is said to be recursive.
Left and Right Recursive Grammars
In a context-free grammar G:
•If there is a production of the form X → Xa, where X is a non-terminal and 'a' is a string of terminals, it is called a left recursive production.
•A grammar having a left recursive production is called a left recursive grammar.
•If there is a production of the form X → aX, where X is a non-terminal and 'a' is a string of terminals, it is called a right recursive production.
•A grammar having a right recursive production is called a right recursive grammar.
Examples of Recursive and Non-Recursive Grammars
Recursive Grammars
1) S → SaS | b. The language (set of strings) generated by this grammar is {b, bab, babab, …}, which is infinite.
2) S → Aa, A → Ab | c. The language generated by this grammar is {ca, cba, cbba, …}, which is infinite.
Non-Recursive Grammars
S → Aa, A → b | c. The language generated by this grammar is {ba, ca}, which is finite.
Types of Recursive Grammars: Based on the nature of the recursion, a recursive CFG can be further divided into the following:
– Left Recursive Grammar (having left recursion)
– Right Recursive Grammar (having right recursion)
– General Recursive Grammar (having both kinds of recursion)

During compilation, the parser uses the grammar of the language to make a parse tree (or derivation tree) out of the source code. The grammar used must be unambiguous.
2) Based on the number of derivation trees.
•If every string generated by the CFG has exactly one derivation tree, the CFG is unambiguous.
•If some string has more than one derivation tree, the CFG is ambiguous. Equivalently, a CFG is ambiguous if some string has more than one leftmost derivation (LMD) or more than one rightmost derivation (RMD).
3. Parsing and Ambiguity
Ambiguity in Context Free Grammars and Context Free Languages
Suppose we have a context-free grammar G with production rules:
S → aSb | bSa | SS | ε
Leftmost Derivation (LMD) and Derivation Tree: A leftmost derivation of a string from the start symbol S is done by replacing the leftmost non-terminal symbol by the RHS of a corresponding production rule. For example, a leftmost derivation of the string abab from the grammar G above is:
S ⇒ aSb ⇒ abSab ⇒ abab
At each step the leftmost non-terminal is replaced using a production rule.
Derivation Tree: It tells how a string is derived from S using production rules and is shown in Figure 1.
•Rightmost Derivation (RMD): A rightmost derivation of a string from the start symbol S is done by replacing the rightmost non-terminal symbol by the RHS of a corresponding production rule. For example, a rightmost derivation of the string abab from the grammar G above is:
•S ⇒ SS ⇒ SaSb ⇒ Sab ⇒ aSbab ⇒ abab
•At each step the rightmost non-terminal is replaced using a production rule. The derivation tree for abab using this rightmost derivation is shown in Figure 2.
A derivation can be an LMD, an RMD, or both.
For example, S ⇒ aSb ⇒ abSab ⇒ abab is an LMD as well as an RMD, but S ⇒ SS ⇒ SaSb ⇒ Sab ⇒ aSbab ⇒ abab is an RMD but not an LMD.
Ambiguous Context Free Grammar: A context-free grammar is called ambiguous if there exists more than one LMD or more than one RMD for some string generated by the grammar. Such a string also has more than one derivation tree. The grammar described above is ambiguous because there are two derivation trees for abab (Figure 1 and Figure 2).
There is more than one RMD for the string abab:
S ⇒ SS ⇒ SaSb ⇒ Sab ⇒ aSbab ⇒ abab
S ⇒ aSb ⇒ abSab ⇒ abab
Ambiguous Context Free Languages: A context-free language is called inherently ambiguous if there is no unambiguous grammar that defines it.
Example: L = {a^n b^n c^m} ∪ {a^n b^m c^m}
Note:
•If a context-free grammar G is ambiguous, the language L(G) may or may not be inherently ambiguous.
•It is not always possible to convert an ambiguous CFG to an unambiguous CFG. Only some ambiguous CFGs can be converted.
•There is no algorithm that converts every ambiguous CFG to an unambiguous CFG.
•A CFL that is not inherently ambiguous always has an unambiguous CFG.
Ambiguity is a common feature of natural languages, where it is tolerated and dealt with in a variety of ways. In programming languages, where there should be only one interpretation of each statement, ambiguity must be removed when possible. Often we can achieve this by rewriting the grammar in an equivalent, unambiguous form.
An ambiguous grammar is not good for compiler construction. No method can automatically detect and remove ambiguity in general, but in many cases you can remove it by rewriting the grammar.
Unambiguous Grammar
•A grammar is unambiguous if no string has more than one leftmost derivation, more than one rightmost derivation, or more than one parse tree.
•To convert an ambiguous grammar to an unambiguous grammar, we can apply the following rule:
1. If left-associative operators (+, -, *, /) are used in a production rule, apply left recursion in that rule. Left recursion means that the leftmost symbol on the right side is the same as the non-terminal on the left side. For example, X → Xa.
Unambiguous Grammar
Example 1: Consider a grammar G given as follows:
1. S → AB | aaB
2. A → a | Aa
3. B → b
•Determine whether the grammar G is ambiguous or not. If G is ambiguous, construct an unambiguous grammar equivalent to G.
Solution: Let us derive the string "aab".
S ⇒ aaB ⇒ aab (LMD1)
S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aab (LMD2)
•As there are two different parse trees for deriving the same string, the given grammar is ambiguous. An unambiguous grammar is:
1. S → AB
2. A → Aa | a
3. B → b
Unambiguous Grammar
Example 2: Show that the given grammar is ambiguous. Also,
find an equivalent unambiguous grammar.
1. S → ABA
2. A → aA | ε
3. B → bB | ε
Solution: The given grammar is ambiguous because we can derive two different parse trees for the string aa.
• The unambiguous grammar is:
1. S → aXY | bYZ | ε
2. Z → aZ | a
3. X → aXY | a | ε
4. Y → bYZ | b | ε
Unambiguous Grammar
Example 3: Show that the given grammar is ambiguous. Also, find an equivalent unambiguous grammar.
1. E → E + E
2. E → E * E
3. E → id
• Solution:
• Let us derive the string "id + id * id".
• As there are two different parse trees for deriving the same string, the given grammar is ambiguous.
• An unambiguous grammar is:
1.E → E + T
2.E → T
3.T → T * F
4.T → F
5.F → id
Unambiguous Grammar
Example 4: Check whether the given grammar is ambiguous or not. Also, find an equivalent unambiguous grammar.
1. S → S + S
2. S → S * S
3. S → S ^ S
4. S → a
Solution:
•The given grammar is ambiguous because a string such as a + a * a has more than one parse tree.
An unambiguous grammar is:
1. S → S + A | A
2. A → A * B | B
3. B → C ^ B | C
4. C → a
1. Parsing: The term parsing describes finding a sequence of productions by which a string W ∈ L(G) is derived.
• Exhaustive Search Parsing (ESP) is a form of top-down parsing, which we can view as the construction of a derivation tree from the root downward.
• We apply leftmost derivations during this method of parsing.
• If the grammar has no λ-productions and no unit productions, no more than 2|W| rounds are needed to decide whether W can be derived.
• Exhaustive search parsing has serious flaws and is a tedious process.
• It is not an efficient parsing method; it is used only where efficiency is a secondary issue.
Example: Consider the following grammar:
S → SS | aSb | bSa | λ
Use the ESP method to parse the input string W = aabb.
Procedure:
Round 1 gives us:
S ⇒ SS, selected
S ⇒ aSb, selected
S ⇒ bSa, not selected
S ⇒ λ, not selected
Here the last two productions can be removed without further consideration.
Round 2 gives us the sentential forms:
S ⇒ SS ⇒ SSS, selected
S ⇒ SS ⇒ aSbS, selected
S ⇒ SS ⇒ bSaS, not selected
S ⇒ SS ⇒ S, not selected (already evaluated at round 1)
which are obtained by replacing the leftmost S in sentential form 1 with all applicable substitutions. Similarly, from sentential form 2 we get the additional sentential forms:
S ⇒ aSb ⇒ aSSb, selected
S ⇒ aSb ⇒ aaSbb, selected
S ⇒ aSb ⇒ abSab, not selected
S ⇒ aSb ⇒ ab, not selected
Here again several of these can be removed from further consideration.
Round 3 gives us the sentential forms:
S ⇒ SS ⇒ SSS ⇒ SSSS, selected
S ⇒ SS ⇒ SSS ⇒ aSbSS, selected
S ⇒ SS ⇒ SSS ⇒ bSaSS, not selected
S ⇒ SS ⇒ SSS ⇒ SS, not selected (already evaluated)
--------------------------------------------------
S ⇒ SS ⇒ aSbS ⇒ aSSbS, selected
S ⇒ SS ⇒ aSbS ⇒ aaSbbS, selected
S ⇒ SS ⇒ aSbS ⇒ abSabS, not selected
S ⇒ SS ⇒ aSbS ⇒ abS, not selected
--------------------------------------------------
S ⇒ aSb ⇒ aSSb ⇒ aSSSb, selected
S ⇒ aSb ⇒ aSSb ⇒ aaSbSb, selected
S ⇒ aSb ⇒ aSSb ⇒ abSaSb, not selected
S ⇒ aSb ⇒ aSSb ⇒ aSb, not selected (already evaluated before)
--------------------------------------------------
S ⇒ aSb ⇒ aaSbb ⇒ aaSSbb, selected
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb, not selected
S ⇒ aSb ⇒ aaSbb ⇒ aabSabb, not selected
S ⇒ aSb ⇒ aaSbb ⇒ aabb, selected [parsing is over with success]
The exhaustive search succeeds, so the input is syntactically correct.
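The rounds above can be sketched as a breadth-first search over leftmost sentential forms. This is an illustrative sketch, not a definitive implementation: the function name `exhaustive_parse`, the pruning test, and the `max_rounds` cut-off are assumptions. The cut-off is needed because this grammar has a λ-production and the rule S → SS, so the search would not stop on its own for strings outside the language.

```python
def exhaustive_parse(productions, start, w, max_rounds=10):
    """Return the round in which w is first derived, or None if not found.

    productions maps each variable to its right-hand sides (strings, "" for λ).
    A form is "not selected" when its terminal prefix disagrees with w or it
    already contains more terminals than w.
    """
    variables = set(productions)

    def viable(form):
        if sum(1 for ch in form if ch not in variables) > len(w):
            return False
        prefix = ""
        for ch in form:
            if ch in variables:
                break
            prefix += ch
        return w.startswith(prefix)

    forms = [start]
    seen = {start}
    for round_no in range(1, max_rounds + 1):
        next_forms = []
        for form in forms:
            pos = next((i for i, ch in enumerate(form) if ch in variables), None)
            if pos is None:
                continue  # all terminals and not equal to w: dead end
            for rhs in productions[form[pos]]:
                new = form[:pos] + rhs + form[pos + 1:]
                if new == w:
                    return round_no
                if new not in seen and viable(new):
                    seen.add(new)
                    next_forms.append(new)
        forms = next_forms
    return None

G = {"S": ["SS", "aSb", "bSa", ""]}
print(exhaustive_parse(G, "S", "aabb"))
print(exhaustive_parse(G, "S", "aab"))
```

For W = aabb the search succeeds in round 3, matching the worked example; for aab, which is not in the language, it gives up after `max_rounds` rounds and returns None.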
Individual Assignment 10%:
Apply exhaustive search parsing for W = aaabbb and W = aabbab, and show the steps clearly and neatly.
Solution: W = aaabbb
Round 1 gives us:
S ⇒ SS, selected
S ⇒ aSb, selected
S ⇒ bSa, not selected
S ⇒ λ, not selected
Here the last two productions can be removed without further consideration.
Round 2:
S ⇒ SS ⇒ SSS
S ⇒ SS ⇒ aSbS
S ⇒ SS ⇒ bSaS
S ⇒ SS ⇒ S
S ⇒ aSb
Simple Grammar (S-Grammar)
A context-free grammar G = (V, T, S, P) is said to be a simple grammar or S-grammar if all its productions are of the form A → aX, where A ∈ V, a ∈ T, and X ∈ V*, and any pair (A, a) occurs at most once in P.
While S-grammars are quite restrictive, they are of some interest. Many features of common programming languages can be described by S-grammars.
Example: The grammar S → aS | bSS | aSS | c is not an S-grammar because the pair (S, a) occurs in the two productions S → aS and S → aSS. The grammar S → aS | bSS | c is an example of an S-grammar.
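The S-grammar condition is easy to check mechanically. The sketch below assumes single-character symbols and productions given as (left side, right side) pairs; the function name `is_s_grammar` is illustrative.

```python
def is_s_grammar(productions, variables):
    """Check that every production is A -> aX with a a terminal and X in V*,
    and that no pair (A, a) occurs more than once."""
    seen_pairs = set()
    for lhs, rhs in productions:
        if lhs not in variables:
            return False
        if not rhs or rhs[0] in variables:
            return False  # the right side must start with a terminal
        if any(sym not in variables for sym in rhs[1:]):
            return False  # everything after the terminal must be a variable
        pair = (lhs, rhs[0])
        if pair in seen_pairs:
            return False  # the pair (A, a) occurs twice
        seen_pairs.add(pair)
    return True

not_simple = [("S", "aS"), ("S", "bSS"), ("S", "aSS"), ("S", "c")]
simple = [("S", "aS"), ("S", "bSS"), ("S", "c")]
print(is_s_grammar(not_simple, {"S"}))
print(is_s_grammar(simple, {"S"}))
```

The first grammar fails because (S, a) occurs twice; the second passes.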
Exercise: Find an S-grammar for L(aaa*b + b).
L = {b, aab, aaab, aaaab, aaaaab, …}
G = ({S, A, B}, {a, b}, S, P),
P: S → aA | b
A → aB
B → aB | b
Q. Write S-grammars for the following languages.
a) L = {a^n b^n : n ≥ 1}
b) L = {a^n b^(n+1) : n ≥ 2} (home quiz)
a) Solution: S → aA, A → aAB | b, B → b
S ⇒ aA ⇒ ab
S ⇒ aA ⇒ aaAB ⇒ aabB ⇒ aabb
S ⇒ aA ⇒ aaAB ⇒ aaaABB ⇒ aaabBB ⇒ aaabbB ⇒ aaabbb
b) Three variants of L = {a^n b^(n+1)}, differing in the lower bound on n:
L = {a^n b^(n+1) : n ≥ 0}. Solution: S → aAC | b, A → aAB | b, B → b, C → b
L = {a^n b^(n+1) : n ≥ 1}. Solution: S → aAC, A → aAB | b, B → b, C → b
L = {a^n b^(n+1) : n ≥ 2}. Solution: S → aAC, A → aXB, X → aXB | b, B → b, C → b, or
S → aA, A → aB, B → bC | aBD, C → bD, D → b
Answer Key (derivations with the first grammar for n ≥ 2):
1. S ⇒ aAC ⇒ aaXBC ⇒ aabBC ⇒ aabbC ⇒ aabbb
2. S ⇒ aAC ⇒ aaXBC ⇒ aaaXBBC ⇒ aaabBBC ⇒ aaabbBC ⇒ aaabbbC ⇒ aaabbbb
3. S ⇒ aAC ⇒ aaXBC ⇒ aaaXBBC ⇒ aaaaXBBBC ⇒ aaaabBBBC ⇒ aaaabbBBC ⇒ aaaabbbBC ⇒ aaaabbbbC ⇒ aaaabbbbb
2. Ambiguity: If a context-free grammar G has more than one distinct derivation tree, or more than one LMD or more than one RMD, for some string w ∈ L(G), it is called an ambiguous grammar.
Problem 1: Check whether the grammar G with production rules
X → X+X
X → X*X
X → X | a
is ambiguous or not.
Solution: Let us find the derivation trees for the string w = "a+a*a". It has two leftmost derivations.
Derivation 1: X ⇒ X+X ⇒ a+X ⇒ a+X*X ⇒ a+a*X ⇒ a+a*a
Parse tree 1:
Derivation 2: X ⇒ X*X ⇒ X+X*X ⇒ a+X*X ⇒ a+a*X ⇒ a+a*a
Parse tree 2:
Since there are two parse trees for the single string "a+a*a", the grammar G is ambiguous.
Problem 2:
Consider the grammar E → E+E | id.
We can create two parse trees from this grammar for the string id+id+id, each generated by a different leftmost derivation.
•Both parse trees are derived from the same grammar rules, but the trees are different. Hence the grammar is ambiguous.
Problem 3: Now consider the following grammar over the alphabet ∑ = {0, …, 9, +, *, (, )}:
E → I
E → E + E
E → E * E
E → (E)
I → ε | 0 | 1 | … | 9
From this grammar the string 3*2+5 can be derived in two ways:
I) First leftmost derivation
E ⇒ E*E ⇒ I*E ⇒ 3*E ⇒ 3*E+E ⇒ 3*I+E ⇒ 3*2+E ⇒ 3*2+I ⇒ 3*2+5
II) Second leftmost derivation
E ⇒ E+E ⇒ E*E+E ⇒ I*E+E ⇒ 3*E+E ⇒ 3*I+E ⇒ 3*2+E ⇒ 3*2+I ⇒ 3*2+5
HW: Parse the string 3*(2+5).
The following are some examples of ambiguous grammars:
S → aS | Sa | ε
E → E+E | E*E | id
A → AA | (A) | a
S → SS | AB, A → Aa | a, B → Bb | b
whereas the following grammars are unambiguous:
S → (L) | a, L → LS | S
S → AA, A → aA, A → b
HW:
1. Show that the following grammar is ambiguous:
S → AB | aaB
A → a | Aa
B → b
2. For the grammar E → I | E+E | E*E | (E), I → a | b | c, find two derivation trees for the string "a+b*c" ∈ L(G).

Ambiguity of grammar
E.g. Consider a grammar G with production rules
P: S → aS | Sa | a
For the string 'aaa' there are 4 parse trees, hence G is ambiguous.
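The number of parse trees of a string can be counted by trying every way to split the string among the symbols of each right-hand side. The sketch below is illustrative: the function name `count_parse_trees` and the tuple encoding of right-hand sides are assumptions, and it assumes the grammar has no ε-productions and no cycles of unit productions (otherwise the recursion would not terminate).

```python
from functools import lru_cache

def count_parse_trees(grammar, start, tokens):
    """Count the parse trees of tokens; grammar maps a variable to a list of
    right-hand sides, each a tuple of symbols."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(sym, i, j):
        if sym not in grammar:  # terminal: must match exactly one token
            return 1 if j == i + 1 and tokens[i] == sym else 0
        return sum(count_seq(rhs, i, j) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        first, rest = rhs[0], rhs[1:]
        total = 0
        # every remaining symbol needs at least one token
        for k in range(i + 1, j - len(rest) + 1):
            left = count_symbol(first, i, k)
            if left:
                total += left * count_seq(rest, k, j)
        return total

    return count_symbol(start, 0, len(tokens))

G_S = {"S": [("a", "S"), ("S", "a"), ("a",)]}
G_E = {"E": [("E", "+", "E"), ("E", "*", "E"), ("id",)]}
G_ETF = {"E": [("E", "+", "T"), ("T",)], "T": [("T", "*", "F"), ("F",)], "F": [("id",)]}
expr = ["id", "+", "id", "*", "id"]

print(count_parse_trees(G_S, "S", "aaa"))
print(count_parse_trees(G_E, "E", expr))
print(count_parse_trees(G_ETF, "E", expr))
```

A count above 1 for any string shows that the grammar is ambiguous; a count of 1 for one string does not by itself prove the grammar unambiguous.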
Inherently Ambiguous Language
Let L be a context-free language (CFL). If every context-free grammar G with L = L(G) is ambiguous, then L is said to be an inherently ambiguous language. Otherwise, ambiguity is a property of a grammar, not of a language. An ambiguous grammar is unlikely to be useful for a programming language, because two or more parse trees for the same input string (program) imply two different meanings (executable programs) for the program.
Note: Ambiguity of a grammar is undecidable, i.e. there is no algorithm that decides whether an arbitrary CFG is ambiguous, and no general algorithm for removing ambiguity. In particular cases we can still remove ambiguity by rewriting (disambiguating) the grammar by hand.
•Deterministic CFLs are always unambiguous, i.e. every deterministic CFL has an unambiguous grammar.
The set of deterministic context-free languages is closed under the following operations:
•complement
•inverse homomorphism
•right quotient with a regular language
The set of deterministic context-free languages is not closed under the following operations:
•union
•intersection
•concatenation
•Kleene star
•ε-free morphism
•Mirror image
CFL: Closures Property
•Context-free languages can be either deterministic or non-deterministic.
•Context-free languages are closed under the union, concatenation, and Kleene star operations.
1) Union: Let L1 and L2 be two context-free languages. Then L1 ∪ L2 is also context free.
Example 1: Let L1 = {a^n b^n : n > 0}. The corresponding grammar G1 has P: S1 → aS1b | ab.
Let L2 = {c^m d^m : m ≥ 0}. The corresponding grammar G2 has P: S2 → cS2d | ε.
The union of L1 and L2 is L3 = L1 ∪ L2 = {a^n b^n} ∪ {c^m d^m}.
The corresponding grammar G has the new start symbol S and the additional production S → S1 | S2.
Example 2: L1 = {a^n b^n c^m | m ≥ 0 and n ≥ 0}
G1, P: S' → S1S2, S1 → aS1b | ε, S2 → S2c | ε, and
L2 = {a^n b^m c^m | n ≥ 0 and m ≥ 0},
G2, P: S'' → S3S4, S3 → S3a | ε, S4 → bS4c | ε (the variables of G2 are renamed so that they do not clash with those of G1).
L3 = L1 ∪ L2 = {a^n b^n c^m} ∪ {a^n b^m c^m}, n ≥ 0, m ≥ 0, is also context free.
G3, P: S → S' | S'', together with the productions of G1 and G2.
L1 says the number of a's should be equal to the number of b's, and L2 says the number of b's should be equal to the number of c's.
•Their union says that either of the two conditions must be true, so it is also a context-free language.
•Note: So CFLs are closed under union.

2) Concatenation: If L1 and L2 are context-free languages, then L1.L2 is also context free.
Example: The concatenation of the languages L1 and L2 from Example 1 above is L = L1L2 = {a^n b^n c^m d^m : n > 0, m ≥ 0}.
•The corresponding grammar G has the additional production S → S1S2.
L1 says the number of a's should be equal to the number of b's, and L2 says the number of c's should be equal to the number of d's. Their concatenation says that first the number of a's should be equal to the number of b's, and then the number of c's should be equal to the number of d's. So we can create a PDA which first pushes for a's, pops for b's, pushes for c's, and then pops for d's. The language can therefore be accepted by a pushdown automaton, hence it is context free.
Note: So CFLs are closed under concatenation.

3) Kleene Star: If L is a context-free language, then L* is also context free.
Example: Let L = {a^n b^n : n ≥ 0}.
The corresponding grammar G has P: S → aSb | ε.
Kleene star: L* = {a^n b^n : n ≥ 0}*. The corresponding grammar G1 has a new start symbol S1 and the additional productions S1 → SS1 | ε.
Note: So CFLs are closed under Kleene closure.
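The three closure constructions can be sketched as small functions that add a new start symbol to existing grammars. This is an illustrative sketch: the function names, the new start symbols "S" and "T", and the bounded enumerator `language` are assumptions. Right-hand sides are lists of symbols, the empty list stands for ε, and the two grammars are assumed to use different variable names.

```python
def union(g1, s1, g2, s2, start="S"):
    """New start symbol with S -> S1 | S2."""
    return {**g1, **g2, start: [[s1], [s2]]}

def concatenation(g1, s1, g2, s2, start="S"):
    """New start symbol with S -> S1 S2."""
    return {**g1, **g2, start: [[s1, s2]]}

def star(g, s, start="T"):
    """New start symbol with T -> S T | ε."""
    return {**g, start: [[s, start], []]}

def language(grammar, start, max_len):
    """Strings of length <= max_len, found by bounded leftmost derivation."""
    results, seen, stack = set(), {(start,)}, [(start,)]
    while stack:
        form = stack.pop()
        pos = next((i for i, s in enumerate(form) if s in grammar), None)
        if pos is None:
            results.add("".join(form))
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + tuple(rhs) + form[pos + 1:]
            if sum(s not in grammar for s in new) <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return results

G1 = {"S1": [["a", "S1", "b"], ["a", "b"]]}   # {a^n b^n : n > 0}
G2 = {"S2": [["c", "S2", "d"], []]}           # {c^m d^m : m >= 0}
G = {"S": [["a", "S", "b"], []]}              # {a^n b^n : n >= 0}

print(sorted(language(union(G1, "S1", G2, "S2"), "S", 4)))
print(sorted(language(concatenation(G1, "S1", G2, "S2"), "S", 4)))
print(sorted(language(star(G, "S"), "T", 4)))
```

The star construction keeps the original productions untouched and only adds the new start symbol, so a string such as aababb, which is not a concatenation of blocks a^n b^n, is not generated.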

1) Intersection − If L1 and L2 are context-free languages, then L1 ∩ L2 is not necessarily context free.
Example:
L1 = {a^n b^n c^m | n ≥ 0 and m ≥ 0} and L2 = {a^m b^n c^n | n ≥ 0 and m ≥ 0}
L3 = L1 ∩ L2 = {a^n b^n c^n | n ≥ 0} is not context free.
L1 says the number of a's should be equal to the number of b's, and L2 says the number of b's should be equal to the number of c's. Their intersection says both conditions need to be true, but a pushdown automaton with a single stack can compare only two of the counts. So L3 cannot be accepted by a pushdown automaton, hence it is not context free.
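The two conditions and their intersection can be written as simple membership tests. This sketch only illustrates the set definitions; it does not simulate a pushdown automaton, and the function names are illustrative.

```python
import re

def counts(s):
    """Return (#a, #b, #c) if s has the shape a*b*c*, else None."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return tuple(len(g) for g in m.groups()) if m else None

def in_L1(s):
    """a^n b^n c^m: as many a's as b's."""
    c = counts(s)
    return c is not None and c[0] == c[1]

def in_L2(s):
    """a^m b^n c^n: as many b's as c's."""
    c = counts(s)
    return c is not None and c[1] == c[2]

def in_L3(s):
    """The intersection a^n b^n c^n: both conditions at once."""
    return in_L1(s) and in_L2(s)

for s in ["aabbc", "abbcc", "aabbcc"]:
    print(s, in_L1(s), in_L2(s), in_L3(s))
```

Only aabbcc satisfies both conditions, which is the three-way comparison a single stack cannot perform.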

Intersection with Regular Language: If L1 is a context-free language and L2 is a regular language, then L1 ∩ L2 is a context-free language.
L1 = {a^n b^m c^m | n ≥ 0 and m ≥ 0}
P: S → AB | ε, A → aA | ε, B → bBc | ε.
L2 = {a^n c^m | m ≥ 0, n ≥ 0},
P: S → AB | ε, A → aA | ε, B → cB | ε.
L1 ∩ L2 = {a^n | n ≥ 0}.
P: S → aS | ε
3) Complement − Similarly, the complement of a context-free language L1, which is L1' = ∑* – L1, need not be context free.
Note:
CFLs are not closed under difference, intersection, and complementation. They are closed under reversal.
Reading Assignment:
Deterministic Context-free Languages
Deterministic CFLs (DCFLs) are the subset of CFLs which can be recognized
by a deterministic PDA (DPDA). A DPDA has at most one move
for a given state, input symbol and top-of-stack symbol, i.e., it never has a choice.
For a language to be a DCFL, it must be clear at every step whether to push or
pop.
Deterministic Context-free Languages
For example, L1 = { a^n b^n c^m | m >= 0 and n >= 0 } is a DCFL because for a’s
we can push on the stack and for b’s we can pop. It can be recognized by a
deterministic PDA.
On the other hand, L2 = { a^n b^n c^m } ∪ { a^n b^m c^m }, n >= 0, m >= 0, cannot be
recognized by a DPDA, because either the number of a’s and b’s or
the number of b’s and c’s can be equal, and the machine cannot know in advance
which of the two to check. So it can only be implemented by an
NPDA.
Thus, it is a CFL but not a DCFL.
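The push/pop idea for L1 = { a^n b^n c^m } can be sketched as a small simulation in Python. This is a minimal illustration under stated assumptions, not a formal DPDA definition: the state names A, B, C and the function name accepts_anbncm are invented for this sketch, and a Python list plays the role of the stack. At every step exactly one branch applies, which is what makes the machine deterministic.

```python
def accepts_anbncm(s: str) -> bool:
    """Simulate a deterministic push/pop machine for { a^n b^n c^m }."""
    stack = []
    state = "A"  # A: reading a's, B: reading b's, C: reading c's
    for ch in s:
        if state == "A" and ch == "a":
            stack.append("a")          # push one symbol per a
        elif state in ("A", "B") and ch == "b" and stack:
            stack.pop()                # pop one symbol per b
            state = "B"
        elif ch == "c" and not stack:
            state = "C"                # c's are allowed once a's and b's match
        else:
            return False               # no move is defined: reject
    return not stack                   # every a must have been matched by a b
```

For example, accepts_anbncm("aabbc") is accepted, while accepts_anbncm("aab") is rejected because one a is left on the stack.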
Note : DCFLs are closed under complementation and inverse homomorphism, but
not under union, intersection, concatenation or Kleene star.
Question
Question : Consider the languages L1, L2, L3 given below.
L1 = { a^m b^n | m, n >= 0 }
L2 = { a^n b^n | n >= 0 }
L3 = { a^n b^n c^n | n >= 0 }
Which of the following statements is NOT TRUE?
A. Push Down Automata (PDA) can be used to recognize L1 and L2
B. L1 is a regular language
C. All the three languages are context free
D. Turing machine can be used to recognize all the three languages
Question : The language L = { 0^i 1 2^i | i ≥ 0 } over the alphabet {0, 1, 2} is :
A. Not recursive
B. Is recursive and deterministic CFL
C. Is regular
D. Is CFL but not deterministic CFL.
Cont…
Question : Consider the following languages:
L1 = { 0^n 1^n | n ≥ 0 }
L2 = { w c w^R | w ∈ {a,b}* }
L3 = { w w^R | w ∈ {a,b}* }
Which of these languages are deterministic context-free languages?
A. None of the languages
B. Only L1
C. Only L1 and L2
D. All three languages
Question : Which one of the following grammars generates the language L = { a^i b^j |
i ≠ j }?
A. S -> AC | CB, C -> aCb | a | b, A -> aA | ε, B -> Bb | ε
B. S -> aS | Sb | a | b
C. S -> AC | CB, C -> aCb | ε, A -> aA | ε, B -> Bb | ε
D. S -> AC | CB, C -> aCb | ε, A -> aA | a, B -> Bb | b
4. Simplifying Context Free Grammars
The definition of context free grammars (CFGs) allows us to
develop a wide variety of grammars. Often some of
the productions of a CFG are redundant, because the
definition of CFGs does not restrict us from
writing them.
By simplifying a CFG we remove these redundant
productions from the grammar, while keeping the transformed
grammar equivalent to the original grammar. Two grammars are
called equivalent if they generate the same language.
Simplifying a CFG is a necessary step before converting it into a normal
form. It is related to the concept of optimization.
•Many languages can be represented efficiently by a context-free grammar. Not every grammar is optimized: a grammar may contain extra (non-terminal) symbols, which unnecessarily increase its length.
•Simplification of a grammar means reducing it by removing useless symbols.
•The properties of a reduced grammar are given below:
•Each variable (i.e. non-terminal) and each terminal of G appears in the derivation of some word in L.
•There should not be any production of the form X → Y where X and Y are non-terminals.
•If ε is not in the language L, then there need not be any production X → ε.
•Simplification of a grammar includes:
1. Removal of useless symbols
2. λ-production removal
3. Unit production removal
Cont…
1. Useless Productions Removal: Productions that can
never take part in the derivation of any string are called useless
productions. Similarly, a variable that can never take part in the
derivation of any string is called a useless variable.
Example: S → abS | abA | abB, A → cd, B → aB, C → dc. In this
example, the production ‘C → dc’ is useless because the variable
‘C’ never occurs in the derivation of any string: the other
productions are written in such a way that ‘C’ can never be
reached from the start variable ‘S’.
•The production ‘B → aB’ is also useless because a derivation that uses it
can never terminate.
Cont…
If it never terminates, it can never produce a string. Hence
the production can never take part in any derivation.
To remove useless productions, we first find all the variables
which can never lead to a terminal string, such as variable ‘B’.
We then remove all the productions in which variable ‘B’ occurs.
So the modified (simplified) grammar becomes:
S → abS | abA, A → cd, C → dc. We then identify all the
variables that can never be reached from the start variable,
such as variable ‘C’, and remove all the productions in which
variable ‘C’ occurs.
Cont…
The grammar below is now free of useless productions: S → abS | abA, A → cd
Exercise:
1. S → aSb | λ | A, A → aA
2. S → A, A → aA | λ, B → bA
3. S → aS | A | c, A → a, B → aa, C → acb

Identify the useless productions in the above grammars and transform each into a new
grammar.
A symbol is useless in two cases:
1. It cannot be reached from the start symbol
2. It cannot derive a terminal string
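The two cases can be turned into a short Python sketch. It assumes a simple encoding that is not part of the slides: a grammar is a dict from a variable to a list of right-hand sides, every symbol is a single character, uppercase letters are variables, and the empty string stands for λ. The function name remove_useless is likewise hypothetical. The sketch first drops variables that cannot derive a terminal string and only then drops variables that cannot be reached from the start symbol.

```python
def remove_useless(grammar: dict, start: str) -> dict:
    """Remove non-generating and then unreachable variables."""

    def uses_only(body: str, allowed: set) -> bool:
        # Terminals are always fine; variables must be in the allowed set.
        return all((not sym.isupper()) or sym in allowed for sym in body)

    # Case 2: find the variables that can derive a terminal string.
    generating = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var not in generating and any(uses_only(b, generating) for b in bodies):
                generating.add(var)
                changed = True

    pruned = {
        var: [b for b in bodies if uses_only(b, generating)]
        for var, bodies in grammar.items()
        if var in generating
    }

    # Case 1: find the variables that can be reached from the start symbol.
    reachable = set()
    todo = [start] if start in pruned else []
    while todo:
        var = todo.pop()
        if var in reachable:
            continue
        reachable.add(var)
        todo.extend(sym for body in pruned[var] for sym in body if sym.isupper())

    return {var: bodies for var, bodies in pruned.items() if var in reachable}
```

Applied to the worked example, `remove_useless({"S": ["abS", "abA", "abB"], "A": ["cd"], "B": ["aB"], "C": ["dc"]}, "S")` keeps only the productions of S and A. The order of the two passes matters: removing non-generating variables first can make further variables unreachable.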

Cont…
2. λ-Productions Removal: Sometimes it is undesirable
for the right side of a production to be the empty string (λ).
Productions of the type ‘A → λ’ are called λ-productions (also
called lambda productions, null productions or epsilon
productions). They can be removed completely
only if the grammar does not generate λ
(the empty string). It is possible for a grammar to contain
null productions and yet not produce the empty string.
•To remove null productions, we first have to find all the
nullable variables. A variable ‘A’ is called nullable if λ
can be derived from ‘A’.
Cont…
Steps:
1. For all productions A → λ, put A into the set VN (nullable
variables), i.e. VN = {A}.
2. For all productions ‘B → A1A2…An’ where all Ai are
nullable variables, put B into VN, i.e. VN = {A, B}. Repeat this step
until no more variables can be added.
Once the set VN has been found, we are ready to construct the new
production rules P’. To do so, we look at all productions in P of
the form A → X1X2…Xn, n >= 1, where each Xi ∈ V ∪ T. For each
such production of P, we put into P’ that production as well as all
those generated by replacing nullable variables with λ in all
possible combinations.
Cont…
After finding all the nullable variables, we can start to
construct the null-production-free grammar.
•For every production in the original grammar, we add the
original production as well as all the variants that can be
formed by replacing the nullable variables
in the production by λ.
•If all the symbols on the RHS of a production are nullable,
we do not add the resulting ‘A → λ’ to the new grammar. See production (2) of the example.
•An example will make the point clear.

Cont…
Consider the grammar:
S → ABCd (1)
A → BC (2)
B → bB | λ (3)
C → cC | λ (4)
Remove the λ-productions.
Let’s first find all the nullable variables. Variables ‘B’ and ‘C’ are
clearly nullable because they have ‘λ’ on the RHS of a
production. Variable ‘A’ is also nullable because in (2) both
variables on the RHS are nullable. Variable ‘S’ is not
nullable, because the RHS of (1) contains the terminal ‘d’.
So variables ‘A’, ‘B’ and ‘C’ are the nullable variables.

Cont…
Let’s create the new grammar with P’. We start with the first
production and add it as it is. Then we create all
the possible combinations that can be formed by replacing the
nullable variables with λ.
Therefore line (1) now becomes:
S → ABCd | ABd | ACd | BCd | Ad | Bd | Cd | d.
We apply the same rule to lines (2), (3), (4). The new grammar now
becomes:
P’: S → ABCd | ABd | ACd | BCd | Ad | Bd | Cd | d
A → BC | B | C, B → bB | b, C → cC | c
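The procedure for this example can be sketched in Python. The encoding is an assumption made for illustration: a grammar is a dict from a variable to a list of right-hand sides, each symbol is a single character, and the empty string stands for λ. The function names nullable_variables and remove_lambda are hypothetical. The sketch computes the nullable set by repeating step 2 until nothing changes, then emits every variant of each production in which nullable symbols are kept or dropped, skipping the empty right-hand side.

```python
from itertools import product


def nullable_variables(grammar: dict) -> set:
    """Variables from which the empty string can be derived."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            # A body is nullable if every symbol in it is nullable
            # (the empty body "" satisfies this trivially).
            if var not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(var)
                changed = True
    return nullable


def remove_lambda(grammar: dict) -> dict:
    """Build a grammar without lambda-productions."""
    nullable = nullable_variables(grammar)
    new_grammar = {}
    for var, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # Each nullable symbol may either stay or be replaced by "".
            choices = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for combo in product(*choices):
                candidate = "".join(combo)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        new_grammar[var] = new_bodies
    return new_grammar
```

For the grammar with S → ABCd, A → BC, B → bB | λ, C → cC | λ, the nullable set is {A, B, C} and the result contains the eight S-productions listed in P’. If the start symbol itself is nullable, the output grammar generates the original language without λ.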
Cont…
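Exercise:
Find a CFG without λ-productions equivalent to each grammar
defined below.
A) S → aS1b, S1 → aS1b | λ
B) S → ABaC, A → Bc, B → b | λ, C → D | λ, D → d.
C) S → XYX
X → 0X | ε
Y → 1Y | ε
Solution to (C):
S → XYX | XY | YX | XX | X | Y
X → 0X | 0
Y → 1Y | 1
Here S itself is nullable, so the new grammar generates the original language without ε.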
Cont…
3. Unit Productions Removal: Productions of the type A → B,
where A, B ∈ V, are called unit productions. To remove unit
productions, we use the useful substitution rule.
Example: Remove all unit productions from
S → Aa | A | B, B → A | bb, A → a | bc | B.
The unit productions are: S → A, S → B, B → A and A → B.
The non-unit productions are: S → Aa, A → a | bc, B → bb.
The new rules are: S → a | bc | bb, for S → A & S → B.
A → bb, for A → B
B → a | bc, for B → A
Cont…
To obtain the equivalent grammar G, we add the new rules to the non-unit productions:
S → a | bc | bb | Aa
A → a | bb | bc
B → a | bb | bc
Note: Useful substitution may create useless productions.
In the above example, the removal of unit productions has made B and
its productions useless, since B can no longer be reached from S.
In general, removing λ-productions may create new unit productions, and
removing unit productions may create useless productions. So we have to
follow this sequence of steps:
1. Removal of λ-productions
2. Removal of unit productions
3. Removal of useless productions
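Unit production removal can be sketched in Python as follows. The encoding is again an assumption made for illustration: a dict from a variable to a list of right-hand sides, single-character symbols, uppercase letters as variables. The name remove_unit is hypothetical. For each variable the sketch collects every variable reachable through unit productions alone, then copies the non-unit right-hand sides of those variables. This is one common way to apply the substitution, not necessarily the exact procedure intended by the slides.

```python
def remove_unit(grammar: dict) -> dict:
    """Replace unit productions A -> B by the non-unit productions of B."""

    def is_unit(body: str) -> bool:
        # A unit production has exactly one variable on its right-hand side.
        return len(body) == 1 and body.isupper()

    new_grammar = {}
    for var in grammar:
        # Collect all variables reachable from var using unit productions only.
        closure = {var}
        todo = [var]
        while todo:
            current = todo.pop()
            for body in grammar.get(current, []):
                if is_unit(body) and body not in closure:
                    closure.add(body)
                    todo.append(body)

        # var inherits every non-unit right-hand side found in its closure.
        bodies = []
        for member in sorted(closure):
            for body in grammar.get(member, []):
                if not is_unit(body) and body not in bodies:
                    bodies.append(body)
        new_grammar[var] = bodies
    return new_grammar
```

For the example S → Aa | A | B, B → A | bb, A → a | bc | B, the result gives S the right-hand sides a, bc, bb and Aa, and gives both A and B the right-hand sides a, bb and bc. A useless-symbol pass afterwards would then drop B.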
Cont…
Example:
S → 0A | 1B | C
A → 0S | 00
B → 1 | A
C → 01
Worksheet
1. Use the exhaustive parsing method to parse the string w = abbbbbb with the
grammar S → aAB, A → bBb, B → A | λ
2. Remove useless productions:
a) P: S → AC | B, A → a, C → c | BC, E → aA | e
b) P: S → a | aA | B | C, A → aB | λ, B → Aa, C → cCD, D → ddd.
3. Remove null productions from the following:
a) P: S → ASA | aB | b, A → B, B → b | ε
b) P: S → AaB | aaB, A → λ, B → bbA | λ
c) S → aS1b, S1 → aS1b | λ
d) S → ABaC, A → Bc, B → b | λ, C → D | λ, D → d.
4. Remove unit productions from the following:
a) S → XY, X → a, Y → Z | b, Z → M, M → N, N → a
b) S → Aa | B, A → b | B, B → A | a
c) S → abAB | ba, A → aaa, B → aA | bb
d) S → aA | aBB, A → aaA | λ, B → bB | bbC, C → B
Questions 2b, 3a & c and 4b & d are group assignment work (10%) to be submitted.

