
Context Free Grammar

 A context free grammar defines a language.
 The language is a set of token sequences.
 Any sequence of tokens that can be derived
using the grammar is syntactically valid.
 CFGs are used as the basis of the
syntax analysis phase of compilation.
Context Free Grammar
 Context-free grammars play a central role in
the description and design of programming
languages and compilers.
 They are also used for analyzing the syntax of
natural languages.
Example to understand the Context
Free Grammar
 A palindrome is a string that reads the
same forward and backward, i.e. w = w^R
 E.g. 001100, 01110, 100001 and
101101101 are palindrome strings.
 Let us denote this language by Lpal
 This can be written as Lpal = {w | w = w^R},
where w^R is the reverse of w


Class Assignment
 Use Pumping Lemma to prove that Lpal is
not regular
Solution
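 Assume, for contradiction, that Lpal is regular
 Then the Pumping Lemma gives a constant n such
that every w in Lpal with |w| ≥ n can be written
as w = xyz, where y ≠ є, |xy| ≤ n, and x y^i z is
in Lpal for every i ≥ 0
 Choose w = 0^n 1 0^n, which is a palindrome
of length 2n+1, so w = xyz as above
 Since |xy| ≤ n, x and y lie within the leading zeroes:
x = 0^k (k zeroes, k < n)
y = 0^m (m zeroes, m ≥ 1, k + m ≤ n)
z = the remaining n-k-m zeroes, then 1, then n zeroes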
Solution
 Here y can be pumped, which means y
can repeat 0 or more times. Let
us say 0 times
 Then xz should also belong to Lpal, that is,
xz should also be a palindrome
 But xz has fewer zeroes before the 1 than
the n zeroes after it, so xz is not a
palindrome, which is a contradiction
 Hence Lpal is not regular.
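The argument above can be sanity-checked for small n with a short Python sketch. It is not a proof: it only tries every split w = xyz of w = 0^n 1 0^n that satisfies the standard pumping-lemma conditions |xy| ≤ n and y ≠ є, and confirms that xz is never a palindrome. The function names are illustrative and not from the slides.

```python
def is_palindrome(w: str) -> bool:
    """Return True when w reads the same forward and backward."""
    return w == w[::-1]


def pumping_down_breaks_palindrome(n: int) -> bool:
    """Try every split w = xyz of w = 0^n 1 0^n with |xy| <= n and y != ''.

    Returns True if removing y (pumping it zero times) always leaves a
    string that is not a palindrome.
    """
    w = "0" * n + "1" + "0" * n
    for k in range(n):                    # k = |x|
        for m in range(1, n - k + 1):     # m = |y|, with k + m <= n
            x, y, z = w[:k], w[k:k + m], w[k + m:]
            if is_palindrome(x + z):
                return False
    return True


print(all(pumping_down_breaks_palindrome(n) for n in range(1, 8)))  # True
```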
Rules to define Palindrome strings
BASIS RULE
 P → є
 P → 0
 P → 1
INDUCTION RULE
 P → 0P0
 P → 1P1
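The basis and induction rules translate directly into a recursive membership test. The following is a minimal Python sketch under that reading; the function name is an assumption made for illustration.

```python
def derives_palindrome(w: str) -> bool:
    """Return True if w can be derived from P using the five rules."""
    # Basis rules: P -> є, P -> 0, P -> 1
    if w in ("", "0", "1"):
        return True
    # Induction rules: P -> 0P0 and P -> 1P1.
    # The first and last symbols must be the same terminal, and the
    # middle part must itself be derivable from P.
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        return derives_palindrome(w[1:-1])
    return False


print(derives_palindrome("001100"))  # True
print(derives_palindrome("0010"))    # False
```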
Context Free Grammar
 A context free grammar is a formal
notation for expressing recursive
definitions of languages
 A formal language is context free if there
is a context free grammar that generates it
 CFGs are powerful enough to describe the
syntax of most programming languages
Context Free Grammar
 Not all formal languages are context free
 Informally, a Context Free Grammar is simply
a set of rewriting rules or productions (refer
previous example)
Context Free Grammar
A CFG G is defined by its four components
G = (V, T, P, S)
where
V: set of non-terminal variables
T: set of terminals
P: set of production rules
S: start variable
Terminals and nonterminals
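 Symbols that are to be replaced are called
nonterminals
 Symbols that cannot be replaced are called
terminals
 Example:
1. 01A10 (0 and 1 are terminals, A is a nonterminal)
2. if (expression) statement else statement
(expression and statement are nonterminals; if, else
and the parentheses are terminals)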
Defining CFG: an example
Example of a palindrome
G = ({P}, {0,1}, A, P)
where A is the set of rules
P → є
P → 0
P → 1
P → 0P0
P → 1P1
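One way to hold the four components (V, T, P, S) in a program is a small record type. The sketch below is an assumption about representation, not something the slides prescribe: production bodies are plain strings of one-character symbols, and the empty string stands for є.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    variables: frozenset    # V
    terminals: frozenset    # T
    productions: tuple      # P: (head, body) pairs, "" stands for є
    start: str              # S

    def is_well_formed(self) -> bool:
        """Check the basic consistency of the four components."""
        symbols = self.variables | self.terminals
        return (
            self.start in self.variables
            and not (self.variables & self.terminals)
            and all(head in self.variables and set(body) <= symbols
                    for head, body in self.productions)
        )


G_PAL = Grammar(
    variables=frozenset({"P"}),
    terminals=frozenset({"0", "1"}),
    productions=(("P", ""), ("P", "0"), ("P", "1"),
                 ("P", "0P0"), ("P", "1P1")),
    start="P",
)

print(G_PAL.is_well_formed())  # True
```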
Example from simple English
1. <sentence> → <noun phrase> <predicate>
2. <noun phrase> → <article> <noun>
3. <predicate> → <verb>
4. <article> → a
5. <article> → the
6. <noun> → cat
7. <noun> → dog
8. <verb> → runs
9. <verb> → walks
Derivation of “the dog walks”
<sentence> ⇒ <noun phrase> <predicate> (P1)
⇒ <noun phrase> <verb> (P3)
⇒ <article> <noun> <verb> (P2)
⇒ the <noun> <verb> (P5)
⇒ the dog <verb> (P7)
⇒ the dog walks (P9)
Language of the grammar
 All strings that can be derived by applying the
production rules of a grammar belong to the
language described by that grammar
 In this example
L = {a dog runs, a cat runs, a dog walks, a cat walks,
the dog runs, the cat runs, the dog walks, the cat walks}
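Because this small grammar has no recursion, its whole language can be enumerated by expanding every alternative. The Python sketch below assumes the nine rules are stored in a dictionary from each nonterminal to its list of bodies; the names RULES and expand are illustrative.

```python
from itertools import product

RULES = {
    "<sentence>": [["<noun phrase>", "<predicate>"]],
    "<noun phrase>": [["<article>", "<noun>"]],
    "<predicate>": [["<verb>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["cat"], ["dog"]],
    "<verb>": [["runs"], ["walks"]],
}


def expand(symbol):
    """Return every list of terminal words derivable from symbol."""
    if symbol not in RULES:              # a terminal derives only itself
        return [[symbol]]
    results = []
    for body in RULES[symbol]:
        # Combine every expansion of each body symbol, left to right.
        for parts in product(*(expand(s) for s in body)):
            results.append([word for part in parts for word in part])
    return results


LANGUAGE = {" ".join(words) for words in expand("<sentence>")}
print(len(LANGUAGE))                 # 8
print("the dog walks" in LANGUAGE)   # True
```

This approach only terminates for grammars without recursive rules; a recursive grammar such as the palindrome grammar generates infinitely many strings.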
CFG for regular expressions
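 Let us consider the regular expression
(a+b)(a+b+0+1)*, which describes identifiers: a
letter a or b followed by any string of a, b, 0 and 1
 Then we can have rules that use two non-
terminal variables, say E and I: I generates these
identifiers and E generates expressions built from
them with +, * and parentheses
 The CFG can be defined as a four-tuple
G = ({I, E}, {0, 1, a, b, +, *, (, )}, P, E)
where
{I, E}: set of non-terminal variables (V)
{0, 1, a, b, +, *, (, )}: set of terminals (T)
P: set of production rules
E: start symbol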
Production rules for given G
1. E → I
2. E → E+E
3. E → E*E
4. E → (E)
5. I → a
6. I → b
7. I → Ia
8. I → Ib
9. I → I0
10. I → I1
Compact notation for
productions
 The production rules described above can
be written as follows
E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
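The compact notation is only shorthand: each alternative separated by | is a separate production. A small Python sketch of that expansion follows. It assumes the rules are typed with -> for the arrow, and it relies on | not being a terminal of this grammar; the function name is illustrative.

```python
COMPACT = [
    "E -> I|E+E|E*E|(E)",
    "I -> a|b|Ia|Ib|I0|I1",
]


def expand_compact(lines):
    """Split each compact rule into a list of (head, body) productions."""
    rules = []
    for line in lines:
        head, bodies = line.split("->")
        for body in bodies.split("|"):
            rules.append((head.strip(), body.strip()))
    return rules


RULES = expand_compact(COMPACT)
print(len(RULES))   # 10
print(RULES[8])     # ('I', 'I0')
```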
Context Free Grammar
 The term "context-free" expresses the fact that
non-terminals can be rewritten without regard
to the context in which they occur.
 A formal language is context-free if some
context-free grammar generates it.
Derivations using a grammar
 Productions of a CFG are applied to
infer that certain strings are in the
language of a certain variable
 There are two approaches
1. Recursive inference

2. Derivation
Recursive Inference
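 Recursive inference uses the rules from body
to head: strings already known to be in the
language of each variable in a body are
concatenated with the terminals of that body,
and the result is inferred to be in the
language of the head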
Derivations
 Derivations use the production rules from
head to body, starting from the start symbol.
The language of the grammar is all strings of
terminals obtained this way
Inferences
 Consider the grammar
G= ( {I,E}, {0,1,a,b,+,*,(,)},P, E)
 To find the strings in the language
defined by G, we infer the following
i. Production rule 5 implies that string a is in
the language of I
ii. Production rule 6 implies that string b is in
the language of I
iii. Production rule 9 implies that string a0 is in
the language of I (also uses (i))
Inferring strings
       String       For language   Production   Strings
       inferred     of             used         used
i      a            I              5            -
ii     b            I              6            -
iii    b0           I              9            ii
iv     b00          I              9            iii
v      a            E              1            i
vi     b00          E              1            iv
vii    a+b00        E              2            v, vi
viii   (a+b00)      E              4            vii
ix     a*(a+b00)    E              3            v, viii
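The table can be replayed mechanically. The Python sketch below applies one production from body to head per step and refuses to use a string that has not already been inferred for the required variable. The rule numbering follows the ten productions of G; the helper name infer and the step encoding are assumptions made for illustration.

```python
RULES = {
    1: ("E", "I"), 2: ("E", "E+E"), 3: ("E", "E*E"), 4: ("E", "(E)"),
    5: ("I", "a"), 6: ("I", "b"), 7: ("I", "Ia"), 8: ("I", "Ib"),
    9: ("I", "I0"), 10: ("I", "I1"),
}
VARIABLES = {"E", "I"}


def infer(rule_no, used, known):
    """Infer a new string from body to head.

    used lists the already inferred strings, one per variable in the
    body, from left to right. known is a set of (string, variable) facts.
    """
    head, body = RULES[rule_no]
    remaining = list(used)
    parts = []
    for symbol in body:
        if symbol in VARIABLES:
            s = remaining.pop(0)
            if (s, symbol) not in known:
                raise ValueError(f"{s!r} not yet inferred for {symbol}")
            parts.append(s)
        else:
            parts.append(symbol)      # terminals are copied as they are
    if remaining:
        raise ValueError("more strings supplied than variables in body")
    result = "".join(parts)
    known.add((result, head))
    return result


known = set()
steps = [(5, []), (6, []), (9, ["b"]), (9, ["b0"]), (1, ["a"]),
         (1, ["b00"]), (2, ["a", "b00"]), (4, ["a+b00"]),
         (3, ["a", "(a+b00)"])]
for rule_no, used in steps:
    last = infer(rule_no, used, known)
print(last)  # a*(a+b00)
```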
Leftmost Derivations
 The derivation obtained by replacing the
leftmost variable by one of its production
bodies at each step, is called the leftmost
derivation

A leftmost derivation is indicated by ⇒ with the
subscript lm, written ⇒lm

Derivation of string a*(a+b00)
using Leftmost derivation
E ⇒lm E*E ⇒lm I*E ⇒lm a*E ⇒lm a*(E)
⇒lm a*(E+E) ⇒lm a*(I+E) ⇒lm a*(a+E)
⇒lm a*(a+I) ⇒lm a*(a+I0) ⇒lm a*(a+I00)
⇒lm a*(a+b00)
Rightmost Derivations
 The derivation obtained by replacing the
rightmost variable by one of its production
bodies at each step, is called the rightmost
derivation


A rightmost derivation is indicated by ⇒ with the
subscript rm, written ⇒rm
Rightmost Derivations

E ⇒rm E*E ⇒rm E*(E) ⇒rm E*(E+E)
⇒rm E*(E+I) ⇒rm E*(E+I0) ⇒rm E*(E+I00)
⇒rm E*(E+b00) ⇒rm E*(I+b00) ⇒rm E*(a+b00)
⇒rm I*(a+b00) ⇒rm a*(a+b00)
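Both kinds of derivation of a*(a+b00) can be replayed by a short Python sketch that always rewrites the leftmost (or rightmost) variable of the current sentential form. The rule numbers are those of the ten productions of G; the sequences LEFT and RIGHT were read off the derivations above, and the function name is illustrative.

```python
RULES = {
    1: ("E", "I"), 2: ("E", "E+E"), 3: ("E", "E*E"), 4: ("E", "(E)"),
    5: ("I", "a"), 6: ("I", "b"), 7: ("I", "Ia"), 8: ("I", "Ib"),
    9: ("I", "I0"), 10: ("I", "I1"),
}
VARIABLES = {"E", "I"}


def derive(start, rule_numbers, leftmost=True):
    """Apply each rule to the leftmost (or rightmost) variable in turn."""
    form = start
    steps = [form]
    for n in rule_numbers:
        head, body = RULES[n]
        positions = [i for i, s in enumerate(form) if s in VARIABLES]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"rule {n} does not apply to {form[i]} in {form}")
        form = form[:i] + body + form[i + 1:]   # head replaced by body
        steps.append(form)
    return steps


LEFT = [3, 1, 5, 4, 2, 1, 5, 1, 9, 9, 6]
RIGHT = [3, 4, 2, 1, 9, 9, 6, 1, 5, 1, 5]

print(" => ".join(derive("E", LEFT, leftmost=True)))
print(" => ".join(derive("E", RIGHT, leftmost=False)))
```

Both calls end in a*(a+b00) after eleven steps, although the intermediate sentential forms differ.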
Language of a Grammar
 If G= (V,T,P,S) is a context free grammar, the
language (CFL) of G is the set of terminal
strings that have derivations from the start
symbol
L(G) = {w in T* | S ⇒* w}
where ⇒* means derivation in zero or more steps
using the productions of G
Palindromes
If Lpal is the language of palindromes, it is
generated by Gpal = ({P}, {0,1}, A, P)
where A is the set of rules
P → є
P → 0
P → 1
P → 0P0
P → 1P1
 The strings derived from these rules form a
context free language
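To make the definition of L(G) concrete, the strings of the palindrome grammar up to a chosen length can be enumerated by searching over sentential forms. The Python sketch below is a breadth-first search under the assumption that є is written as the empty string; the length bound and the function name are illustrative choices.

```python
from collections import deque

PRODUCTIONS = [("P", ""), ("P", "0"), ("P", "1"),
               ("P", "0P0"), ("P", "1P1")]


def language_up_to(max_len):
    """Return all terminal strings of length <= max_len derivable from P."""
    found = set()
    seen = {"P"}
    queue = deque(["P"])
    while queue:
        form = queue.popleft()
        i = form.find("P")
        if i == -1:                    # no variable left: a terminal string
            found.add(form)
            continue
        for _head, body in PRODUCTIONS:
            new = form[:i] + body + form[i + 1:]
            # Discard forms that already hold too many terminals.
            if len(new.replace("P", "")) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=lambda w: (len(w), w))


print(language_up_to(3))
# ['', '0', '1', '00', '11', '000', '010', '101', '111']
```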
Sentential Form of a Grammar
 If G = (V,T,P,S) is a context free
grammar, then any string α in (V ∪ T)*
such that S ⇒* α is a sentential form
Class Assignment
 Given production rules of a grammar are
1. S → A1B
2. A → 0A | є
3. B → 0B | 1B | є
 Give leftmost derivation of strings
 00101
 1001
Parse trees
 The derivations obtained by a CFG can
be represented using tree structure
 The tree shows how the symbols of a
terminal string are grouped into sub
strings
 These trees are known as “parse trees”
Parse trees
 The parse trees for G are trees with following
conditions
1. Each interior node is labeled by a variable in V
2. Each leaf is labeled by a variable, or a
terminal, or an epsilon
3. If an interior node is labeled A, and its
children are labeled x1, x2, x3, …, xk
respectively, from the left, then
A → x1 x2 x3 … xk is a production in P
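The three conditions can be checked by a short recursive function. The Python sketch below uses the expression grammar with variables E and I as its example and represents a tree node as a (label, children) pair; that representation and the function name are assumptions made for illustration, with є written as the empty string.

```python
PRODUCTIONS = {
    ("E", "I"), ("E", "E+E"), ("E", "E*E"), ("E", "(E)"),
    ("I", "a"), ("I", "b"), ("I", "Ia"), ("I", "Ib"),
    ("I", "I0"), ("I", "I1"),
}
VARIABLES = {"E", "I"}
TERMINALS = set("01ab+*()")


def is_parse_tree(node):
    """Check the three parse tree conditions for node = (label, children)."""
    label, children = node
    if not children:
        # Condition 2: a leaf is a variable, a terminal, or є ("").
        return label in VARIABLES or label in TERMINALS or label == ""
    if label not in VARIABLES:
        return False                  # Condition 1: interior node is a variable
    body = "".join(child[0] for child in children)
    if (label, body) not in PRODUCTIONS:
        return False                  # Condition 3: children spell a production body
    return all(is_parse_tree(child) for child in children)


# Tree for the derivation of I+E from E.
TREE = ("E", [("E", [("I", [])]), ("+", []), ("E", [])])
print(is_parse_tree(TREE))                  # True
print(is_parse_tree(("E", [("a", [])])))    # False: E -> a is not a production
```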
Example
        E
      / | \
     E  +  E
     |
     I

A parse tree showing the derivation of I+E
Example
 Consider the grammar of language of
palindrome strings
 Learning how to
 Derive the palindrome 110010011
 Draw the parse tree for the above derivation
Solution
 The grammar has production
rules
1. P → є
2. P → 0
3. P → 1
4. P → 0P0
5. P → 1P1
Derivation
Then
P ⇒ 1P1 (Rule 5)
⇒ 11P11 (Rule 5)
⇒ 110P011 (Rule 4)
⇒ 1100P0011 (Rule 4)
⇒ 110010011 (Rule 3)
Parse tree

            P
          / | \
         1  P  1
          / | \
         1  P  1
          / | \
         0  P  0
          / | \
         0  P  0
            |
            1
Yield of a parse tree
 The leaves of the parse tree, on
concatenation from left, give a string,
called the yield of the tree
 This is a string derived from the root
variable
 When the root is the start symbol and the yield
is a terminal string, i.e. all leaves are labeled
either with a terminal or with є, the yield is a
string in the language of the grammar
Yield of the parse tree
 Example :Palindromes
 Yield is 110010011
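The yield can be computed by concatenating the leaves from left to right. The Python sketch below rebuilds the parse tree of 110010011 for the palindrome grammar, representing each node as a (label, children) pair; the helper wrap and this representation are assumptions made for illustration.

```python
def tree_yield(node):
    """Concatenate the leaf labels of node = (label, children), left to right."""
    label, children = node
    if not children:
        return label
    return "".join(tree_yield(child) for child in children)


def wrap(symbol, inner):
    """Build the subtree for the production P -> symbol P symbol."""
    return ("P", [(symbol, []), inner, (symbol, [])])


# Rules 5, 5, 4, 4 from the root downwards, then rule 3 (P -> 1) at the bottom.
TREE = wrap("1", wrap("1", wrap("0", wrap("0", ("P", [("1", [])])))))

print(tree_yield(TREE))  # 110010011
```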
Class Assignment
 Draw a parse tree for a*(a+b00)
