
Formal Definition of a CFG

• There is a finite set of symbols that form the strings,
i.e. there is a finite alphabet. The alphabet symbols
are called terminals (think of the leaves of a parse tree).
• There is a finite set of variables, sometimes called
non-terminals or syntactic categories. Each variable
represents a language (i.e. a set of strings).
– In the palindrome example, the only variable is P.
• One of the variables is the start symbol. Other
variables may exist to help define the language.
• There is a finite set of productions or production
rules that represent the recursive definition of the
language. Each production consists of:
1. A single variable that is being defined, to the left of the
production symbol (the head of the production)
2. The production symbol →
3. A string of zero or more terminals or variables, called the
body of the production. To form strings, we substitute the
body of one of a variable's productions for that variable
wherever it appears.
• A CFG G therefore has four components:
– V is the set of variables
– T is the set of terminals
– P is the set of production rules
– S is the start symbol.
• CFGs derive their name from the fact that the substitution
of the variable on the left of a production can be made
any time that variable appears in a sentential form. It
does not depend on the symbols in the rest of the
sentential form (the context). This feature is the
consequence of allowing only a single variable on the
left side of each production.
Context-Free Grammar: G = (V, T, S, P)

All productions in P are of the form

A → s

where A is a variable and s is a string of
variables and terminals.
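
The four components and the production form A → s can be written down directly as data. The following is a minimal sketch, not part of the original slides: the class name `CFG`, its field names, and the method `is_context_free` are illustrative assumptions, and the sample grammar S → aSb | ε is used only as test input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """Hypothetical container for G = (V, T, S, P)."""
    variables: frozenset   # V
    terminals: frozenset   # T
    start: str             # S
    productions: tuple     # P as (head, body) pairs; body is a tuple of symbols

    def is_context_free(self) -> bool:
        """Check that every production has the form A -> s."""
        if self.start not in self.variables:
            return False
        if self.variables & self.terminals:
            return False
        symbols = self.variables | self.terminals
        for head, body in self.productions:
            if head not in self.variables:       # a single variable on the left
                return False
            if any(sym not in symbols for sym in body):
                return False
        return True


# Sample grammar S -> aSb | epsilon; the empty tuple stands for epsilon.
g = CFG(
    variables=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    start="S",
    productions=(("S", ("a", "S", "b")), ("S", ())),
)
print(g.is_context_free())  # True
```

Storing each body as a tuple of symbols keeps ε (the empty tuple) distinct from any terminal.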

Example of Context-Free Grammar

S → aSb | ε

G = (V, T, S, P)

V = {S}                   (variables)
T = {a, b}                (terminals)
S                         (start variable)
P = {S → aSb, S → ε}      (productions)
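
A short sketch of how this grammar produces strings: apply S → aSb some number of times, then finish with S → ε. The function name `derive` is an assumption made for illustration. That the result is always a^n b^n is a standard fact about this grammar and is not stated on the slide.

```python
def derive(n: int) -> list[str]:
    """Return the sentential forms in the derivation of a^n b^n."""
    forms = ["S"]
    for _ in range(n):
        forms.append(forms[-1].replace("S", "aSb"))  # apply S -> aSb
    forms.append(forms[-1].replace("S", ""))         # apply S -> epsilon
    return forms


print(" ⇒ ".join(form or "ε" for form in derive(2)))
# S ⇒ aSb ⇒ aaSbb ⇒ aabb
```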
Another Example

Context-free grammar G:
S → aSa | bSb | ε
Example derivations:
S ⇒ aSa ⇒ abSba ⇒ abba
S ⇒ aSa ⇒ abSba ⇒ abaSaba ⇒ abaaba

L(G) = {ww^R : w ∈ {a, b}*}
Palindromes of even length
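
The recursive shape of the productions can be mirrored by a recognizer that peels one matching symbol off each end. This is a sketch under the assumption that strings are Python `str` values over {a, b}; the name `in_language` is hypothetical.

```python
def in_language(w: str) -> bool:
    """Recognize L(G) for S -> aSa | bSb | epsilon."""
    if w == "":                       # S -> epsilon
        return True
    if len(w) >= 2 and w[0] == w[-1] and w[0] in ("a", "b"):
        return in_language(w[1:-1])   # S -> aSa or S -> bSb
    return False


print(in_language("abba"))    # True
print(in_language("abaaba"))  # True
print(in_language("aba"))     # False (odd length)
```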
Another Example
Context-free grammar G:
S → aSb | SS | ε
Example derivations:
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ ab
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abaSb ⇒ abab

L(G) = {w : n_a(w) = n_b(w),
        and n_a(v) ≥ n_b(v) in any prefix v}

Describes matched parentheses: () ((( ))) (( )), with a = ( and b = )
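
The two conditions in the definition of L(G) translate into a single left-to-right scan that tracks n_a(v) − n_b(v) for the prefix read so far. A minimal sketch; the function names `is_balanced` and `as_parentheses` are assumptions for illustration.

```python
def is_balanced(w: str) -> bool:
    """Check n_a(w) == n_b(w) and n_a(v) >= n_b(v) for every prefix v."""
    balance = 0                  # n_a(v) - n_b(v) for the prefix read so far
    for symbol in w:
        if symbol == "a":
            balance += 1
        elif symbol == "b":
            balance -= 1
        else:
            return False         # not a string over {a, b}
        if balance < 0:          # some prefix has more b's than a's
            return False
    return balance == 0


def as_parentheses(w: str) -> str:
    """Rename a to ( and b to )."""
    return w.translate(str.maketrans("ab", "()"))


print(is_balanced("abab"), as_parentheses("abab"))  # True ()()
print(is_balanced("ba"))                            # False
```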
Derivation Order and Derivation Trees

Derivation Order

Consider the following example grammar
with 5 productions:

1. S → AB     2. A → aaA     4. B → Bb
              3. A → ε       5. B → ε

Leftmost derivation order of string aab
(the number above each ⇒ is the production applied):

  1    2      3     4      5
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab

At each step, we substitute the leftmost variable.
Rightmost derivation order of string aab
(the number above each ⇒ is the production applied):

  1    4     5    2      3
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab

At each step, we substitute the rightmost variable.
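
Both derivation orders can be replayed mechanically from the production numbers. The sketch below assumes sentential forms are plain strings with one character per symbol; the names `PRODUCTIONS`, `VARIABLES`, and `derive` are illustrative, not from the slides.

```python
PRODUCTIONS = {
    1: ("S", "AB"),
    2: ("A", "aaA"),
    3: ("A", ""),    # A -> epsilon
    4: ("B", "Bb"),
    5: ("B", ""),    # B -> epsilon
}
VARIABLES = {"S", "A", "B"}


def derive(rule_numbers, leftmost=True):
    """Apply the numbered productions to the leftmost (or rightmost) variable."""
    form = "S"
    forms = [form]
    for number in rule_numbers:
        head, body = PRODUCTIONS[number]
        positions = [i for i, sym in enumerate(form) if sym in VARIABLES]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"rule {number} does not apply to {form[i]} in {form}")
        form = form[:i] + body + form[i + 1:]
        forms.append(form)
    return forms


print(" ⇒ ".join(derive([1, 2, 3, 4, 5], leftmost=True)))
# S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
print(" ⇒ ".join(derive([1, 4, 5, 2, 3], leftmost=False)))
# S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab
```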
Derivation Trees
Consider the same example grammar:
S → AB     A → aaA | ε     B → Bb | ε
And a derivation of aab:
S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb ⇒ aab
Definition:
• A parse tree for a context-free grammar G = (V, T, S, P) is a
tree whose nodes are labeled by elements of V ∪ T ∪ {ε} and that
satisfies the following conditions.
• The root is labeled by the start symbol S.
• Each interior node is labeled by a non-terminal.
• If an interior node is labeled A and its children, from left to
right, are labeled X1 ... Xk, then A → X1 ... Xk is a production in P.
• Each leaf is labeled by a terminal symbol or by ε.
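
A parse tree and its yield can be sketched with a small node type. The class `Node` and the function `tree_yield` are hypothetical names; the tree built here is the one for the derivation of aab given above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                    # a variable, a terminal, or "ε"
    children: list = field(default_factory=list)  # empty for leaves


def tree_yield(node: Node) -> str:
    """Concatenate the leaf labels from left to right, dropping ε."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(tree_yield(child) for child in node.children)


# Parse tree of aab for S -> AB, A -> aaA | ε, B -> Bb | ε
tree = Node("S", [
    Node("A", [Node("a"), Node("a"), Node("A", [Node("ε")])]),
    Node("B", [Node("B", [Node("ε")]), Node("b")]),
])
print(tree_yield(tree))  # aab
```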

Building the derivation tree step by step
S → AB     A → aaA | ε     B → Bb | ε

Each derivation step expands one leaf of the tree. The yield
(the leaf labels read left to right) is the current sentential form.

Step  Derivation                           Expansion in the tree     Yield
1     S ⇒ AB                               S gets children A, B      AB
2     S ⇒ AB ⇒ aaAB                        A gets children a, a, A   aaAB
3     S ⇒ AB ⇒ aaAB ⇒ aaABb                B gets children B, b      aaABb
4     S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb         lower A gets child ε      aaεBb = aaBb
5     S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb ⇒ aab   lower B gets child ε      aaεεb = aab

Derivation tree (parse tree) after the last step:

          S
        /   \
       A     B
     / | \   | \
    a  a  A  B  b
          |  |
          ε  ε
Sometimes, derivation order doesn't matter
Leftmost derivation:
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
Rightmost derivation:
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab

Both give the same derivation tree:

          S
        /   \
       A     B
     / | \   | \
    a  a  A  B  b
          |  |
          ε  ε
Ambiguity

Grammar for mathematical expressions

E → E + E | E * E | (E) | a

Example string:
(a + a) * a + (a + a * (a + a))

Here a denotes any number.
E → E + E | E * E | (E) | a

A leftmost derivation for a + a * a:

E ⇒ E + E ⇒ a + E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a

      E
    / | \
   E  +  E
   |   / | \
   a  E  *  E
      |     |
      a     a

Another leftmost derivation for a + a * a:

E ⇒ E * E ⇒ E + E * E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a

         E
       / | \
      E  *  E
    / | \   |
   E  +  E  a
   |     |
   a     a

So there are two derivation trees for a + a * a:
one with + at the root and one with * at the root.
Take a = 2:

a + a * a = 2 + 2 * 2

Compute the expression result using the tree.

Good tree: 2 + 2 * 2 = 6

      E        (value 6)
    / | \
   E  +  E     (right subtree: 2 * 2 = 4)
   |   / | \
   2  E  *  E
      |     |
      2     2

Bad tree: 2 + 2 * 2 = 8

         E     (value 8)
       / | \
      E  *  E  (left subtree: 2 + 2 = 4)
    / | \   |
   E  +  E  2
   |     |
   2     2
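
Evaluating each tree bottom-up reproduces the two results. In this sketch a tree is encoded as either a number (a leaf) or a `(left, op, right)` triple; that encoding and the names `OPS`, `evaluate`, `good_tree`, and `bad_tree` are assumptions for illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate a tree given as a number (leaf) or a (left, op, right) triple."""
    if isinstance(tree, tuple):
        left, op, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree


good_tree = (2, "+", (2, "*", 2))   # + at the root
bad_tree = ((2, "+", 2), "*", 2)    # * at the root

print(evaluate(good_tree))  # 6
print(evaluate(bad_tree))   # 8
```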
Two different derivation trees
may cause problems in applications which
use the derivation trees:

• Evaluating expressions

• In general, in compilers
for programming languages

Ambiguous Grammar:
A context-free grammar G is ambiguous
if there is a string w ∈ L(G) which has:

two different derivation trees

or

two different leftmost derivations

(Two different derivation trees give two
different leftmost derivations and vice-versa)
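
For a specific grammar, ambiguity of a given string can be checked by counting its derivation trees. The sketch below is written by hand for the expression grammar E → E + E | E * E | (E) | a and is not a general algorithm; the function name `count_trees` is an assumption, and input strings are assumed to contain no spaces.

```python
from functools import lru_cache


def count_trees(s: str) -> int:
    """Count derivation trees of s from E in E -> E + E | E * E | (E) | a."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:          # trees deriving s[i:j] from E
        if j - i == 1:
            return 1 if s[i] == "a" else 0     # E -> a
        total = 0
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)       # E -> (E)
        for k in range(i + 1, j - 1):          # operator at position k
            if s[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)   # E -> E + E | E * E
        return total

    return count(0, len(s)) if s else 0


print(count_trees("a+a*a"))    # 2, so the grammar is ambiguous
print(count_trees("(a+a)*a"))  # 1
```

A count greater than 1 for any string in the language is enough to show that the grammar is ambiguous.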
Example: E → E + E | E * E | (E) | a

This grammar is ambiguous since the string a + a * a
has two derivation trees: one with + at the root and
one with * at the root.

It is ambiguous also because the string a + a * a
has two leftmost derivations:

E ⇒ E + E ⇒ a + E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a

E ⇒ E * E ⇒ E + E * E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a
Another ambiguous grammar:

IF_STMT → if EXPR then STMT
        | if EXPR then STMT else STMT

Variables: IF_STMT, EXPR, STMT     Terminals: if, then, else

Very common piece of grammar
in programming languages
if expr1 then if expr2 then stmt1 else stmt2

Two derivation trees:

Tree 1 (the else belongs to the inner if):

           IF_STMT
              |
    if expr1 then STMT
                   |
      if expr2 then stmt1 else stmt2

Tree 2 (the else belongs to the outer if):

           IF_STMT
              |
    if expr1 then STMT else stmt2
                   |
         if expr2 then stmt1
In general, ambiguity is bad
and we want to remove it

Sometimes it is possible to find
a non-ambiguous grammar for a language

But, in general, it is difficult to achieve this
A successful example:

Ambiguous Grammar          Equivalent Non-Ambiguous Grammar
E → E + E                  E → E + T | T
E → E * E                  T → T * F | F
E → (E)                    F → (E) | a
E → a

(generates the same language)
E → E + T | T
T → T * F | F
F → (E) | a

E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ a + T ⇒ a + T * F
  ⇒ a + F * F ⇒ a + a * F ⇒ a + a * a

Unique derivation tree for a + a * a:

        E
      / | \
     E  +  T
     |   / | \
     T  T  *  F
     |  |     |
     F  F     a
     |  |
     a  a
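
Counting derivation trees under the non-ambiguous grammar gives exactly one tree for the strings tried here. This is a hand-written sketch for E → E + T | T, T → T * F | F, F → (E) | a only; the name `count_trees` and the inner helpers are assumptions, and input strings are assumed to contain no spaces. A count of 1 on sample strings illustrates, but does not prove, that the grammar is non-ambiguous.

```python
from functools import lru_cache


def count_trees(s: str) -> int:
    """Count derivation trees of s from E in the E/T/F expression grammar."""

    @lru_cache(maxsize=None)
    def E(i, j):
        total = T(i, j)                          # E -> T
        for k in range(i + 1, j - 1):
            if s[k] == "+":
                total += E(i, k) * T(k + 1, j)   # E -> E + T
        return total

    @lru_cache(maxsize=None)
    def T(i, j):
        total = F(i, j)                          # T -> F
        for k in range(i + 1, j - 1):
            if s[k] == "*":
                total += T(i, k) * F(k + 1, j)   # T -> T * F
        return total

    @lru_cache(maxsize=None)
    def F(i, j):
        if j - i == 1:
            return 1 if s[i] == "a" else 0       # F -> a
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            return E(i + 1, j - 1)               # F -> (E)
        return 0

    return E(0, len(s)) if s else 0


print(count_trees("a+a*a"))  # 1
```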
An unsuccessful example:

L = {a^n b^n c^m} ∪ {a^n b^m c^m},   n, m ≥ 0

L is inherently ambiguous:

every grammar that generates this
language is ambiguous
Example (ambiguous) grammar for L:

L = {a^n b^n c^m} ∪ {a^n b^m c^m}

S → S1 | S2     S1 → S1 c | A     S2 → a S2 | B
                A → aAb | ε       B → bBc | ε
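
In this grammar S1 generates the strings a^n b^n c^m and S2 generates the strings a^n b^m c^m, so any string of the form a^n b^n c^n can be derived through either alternative of S. The sketch below checks that overlap directly on the string shapes rather than by parsing; the function names are assumptions for illustration.

```python
import re


def split_blocks(w: str):
    """Return the lengths (i, j, k) if w = a^i b^j c^k, else None."""
    match = re.fullmatch(r"(a*)(b*)(c*)", w)
    return tuple(len(g) for g in match.groups()) if match else None


def derivable_from_S1(w: str) -> bool:   # S1 generates a^n b^n c^m
    blocks = split_blocks(w)
    return blocks is not None and blocks[0] == blocks[1]


def derivable_from_S2(w: str) -> bool:   # S2 generates a^n b^m c^m
    blocks = split_blocks(w)
    return blocks is not None and blocks[1] == blocks[2]


for n in range(4):
    w = "a" * n + "b" * n + "c" * n
    print(repr(w), derivable_from_S1(w), derivable_from_S2(w))
# every line ends with: True True
```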

The string a^n b^n c^n ∈ L
always has two different derivation trees
(for any grammar)

For example, in the grammar above:

   S             S
   |             |
   S1            S2
  /  \          /  \
 S1   c        a    S2
