## Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

Chapter 5 Context-Free Languages

Wuu Yang National Chiao-Tung University, Taiwan, R.O.C. September 15, 2008

1

Chapter Outline 1. Context-Free Grammars 2. Parsing and Ambiguity 3. Context-Free Grammars and Programming Languages

2

We have seen many languages that are not regular, for instance, {(n )n | n ≥ 0}, which is a special case of properly nested parentheses widely used in conventional programming languages. Context-free languages are mostly used in the speciﬁcation of high-level computer programming languages, such as Java and Perl. To decide the membership problem (whether a string belongs to a context-free language) is called parsing, which is the front-end of a compiler.

3

§5.1 Context-Free Grammars Deﬁnition. A grammar G =def (V, T, S, P ) is a context-free grammar if all production rules in P have the form A → α, where A ∈ V and α ∈ (V ∪ T )∗ . A language L is context-free if and only if L = L(G) for some context-free grammar G. Note that a regular grammar satisﬁes the above deﬁnition and, hence, it is also a context-free grammar. Consequently, a regular language is also a context-free language. Example 5.1. The following grammar is context-free, but is not regular. S → aSa S → bSb S→λ Here is a sample derivation: S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa. The language generated

4

by this grammar is {wwR | w ∈ Σ∗ }, which is context-free, but not regular. 2 Note that this grammar is linear (see slide 3-20) in that the right-hand side of each production rule contains at most one nonterminal. But it is not right-linear nor left-linear. From this example, we conclude that the family of regular languages is a proper subclass of the family of the context-free languages.

5

Example 5.2. The following grammar is context-free, but is not regular. S → abB A → aaBb B → bbAa A→λ The language generated by this grammar is {ab(bbaa)n bba(ba)n | n ≥ 0}. This language, which is similar to {en f n | n ≥ 0}, is not regular. Note that, though, similar to a right-linear grammar, the right-hand side of each production rule contains at most one nonterminal, the grammar is not right-linear (hence, not regular). 2

6

Example. The language L =def {w ∈ {a, b}∗ | na (w) = nb (w)} is context-free, but is not regular. We can derive a grammar for L: V → V aV bV V → V bV aV V →λ There are at least two derivations of the sentence abab: V ⇒ V aV bV ⇒ aV bV ⇒ abV ⇒ abV aV bV ⇒ abaV bV ⇒ ababV ⇒ abab and V ⇒ V aV bV ⇒ V aV b ⇒ V ab ⇒ V aV bV ab ⇒ aV bV ab ⇒ abV ab ⇒ abab and V ⇒ V aV bV ⇒ V aV b ⇒ aV b ⇒ aV bV aV b ⇒ aV bV ab ⇒ abV ab ⇒ abab. We way this grammar is ambiguous. 2

7

Example. The language L =def {w ∈ {a, b}∗ | na (w) ≥ nb (w)} is context-free, but is not regular. We can derive a grammar for L: T → T aT T →V V → V aV bV | V bV aV | λ This grammar is also ambiguous. 2 Example. The language L =def {w ∈ {a, b}∗ | na (w) > nb (w)} is context-free, but is not regular. We can derive a grammar for L: S → T aT T → T aT | V V → V aV bV | V bV aV | λ This grammar is also ambiguous. 2

8

Example. The language L =def {w ∈ {a, b}∗ | na (w) = nb (w)} is context-free, but is not regular. We can derive a grammar for L: S → T aT | U bU T → T aT | V U → U bU | V V → V aV bV | V bV aV | λ This grammar is also ambiguous. This language is the complement of a previous context-free langauge. 2

9

Example. The language L =def {an bm | n = m} is context-free, but is not regular. We can derive a grammar for L: V → aV b V →λ This grammar is unambiguous. 2 Example. The language L =def {an bm | n ≥ m} is context-free, but is not regular. We can derive a grammar for L: T → aT T →V V → aV b | λ The strings derived from T contains zero or more a’s than b’s. This grammar is unambiguous. 2

10

Example. The language L =def {an bm | n > m} is context-free, but is not regular. We can derive a grammar for L: S → aT T → aT | V V → aV b | λ The strings derived from S contains one or more a’s than b’s. This grammar is unambiguous. 2

11

Example 5.3. The language L =def {an bm | n = m} is context-free, but is not regular. We can derive a grammar for L: S → aT | U b T → aT | V U → Ub | V V → aV b | λ Either (1) the strings derived from S contains one or more a’s than b’s (if we take S → aT during the ﬁrst derivation step) or (2) the strings derived from S contains one or more b’s than a’s (if we take S → U b during the ﬁrst derivation step). This grammar is unambiguous. (2nd solution). Here is the grammar from the textbook: S → AV | V B A → aA | a B → Bb | b

12

V → aV b | λ This grammar is unambiguous. 2 How can we show that the two grammars generate the same language? Exercise 25. Find a linear grammar for this language.

13

Example 5.4. Consider the following grammar: S → aSb | SS | λ The language generated by this grammar is {w ∈ {a, b}∗ | na (w) = nb (w); na (v) ≥ nb (v), for any preﬁx v of w}. This is the language of properly nested parentheses commonly used in computer programming languages and mathematical expressions. 2 This language is not regular. Question. Is there a linear grammar for this language? (See chapter 8.)

14

Leftmost and Rightmost Derivations A derivation is a sequence of steps. In each step we expand a nonterminal A by replacing A with the right-hand side of an A-production rule. For example, consider the following grammar: S → AB A → aaA A→λ B → Bb B→λ The language generated by this grammar is {a2n bm | n ≥ 0, m ≥ 0}. The string aab is a sentence (or an element) of this language. Here are two derivations of this sentence: S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab

15

The result of each derivation step is called a sentential form. The derivation stops when non more nonterminal is left. A sentential form without nonterminals is called a sentence. The ﬁrst derivation is a leftmost derivation in which the leftmost nonterminal is expanded ﬁrst. Similarly, the second derivation is a rightmost derivation in which the rightmost nonterminal is expanded ﬁrst. A derivation can be drawn as a derivation tree. A derivation tree is also called a syntax tree. For example:

16

S A a a A B B b a A a A

(b) a partial derivation tree (a) a derivation tree Fig 5.1

A derivation tree is an ordered tree, which means that there is an ordering among siblings. The root of a derivation is labelled with the start symbol of the grammar. The leaves are labelled with an element of T ∪ {λ}. The internal nodes are labelled with a nonterminal (or a variable, which is an element of V ). A subtree of the derivation tree with some sub-subtrees removed is called a partial derivation tree.

17

Example 5.6. Consider the following grammar: S → aAB A → bBb B→A|λ The language generated by this grammar is {a(bb)m | m ≥ 1}. The string abbbb is a sentence of this language. Here is the leftmost derivation of this sentence: S ⇒ aAB ⇒ abBbB ⇒ abbB ⇒ abbA ⇒ abbbBb ⇒ abbbb

18

S a b A B b b (a) a derivation tree B A B b b

B A B b

(b) a partial derivation tree Fig 5.2

19

Theorem 5.1. There is an obvious correspondence between a derivation of a sentence w ∈ L(G) and its derivation tree.

20

§5.2 Parsing and Ambiguity There are two sides of a (context-free) grammar: • We may use a grammar to generate sentences (derivation). • We may ask whether a string can be generated by a grammar (parsing). A simple parsing method is to try all possible derivations and see if the string could be derived. We use a top-down, breadth-ﬁrst, left-to-right approach.

21

0. exhaustive search 1. Input is a string w and a grammar G. 2. T = {S} (the start symbol of the grammar) 3. repeat 4. for each sentential form f in T do 5. locate the leftmost nonterminal, say A, 6. expand A with every A-rule 7. T := T − {f } ∪ { new sentential forms } 8. delete those sentential forms that cannot generate the required string. 9. end for 10. until we ﬁnds a leftmost derivation of the string or the collection of sentential becomes empty. If w ∈ L(G), then this algorithm always terminates and returns a leftmost derivation of w. If w ∈ L(G), this algorithm may not terminate.

22

An alternative strategy of exhaustive search. We may do this by following the leftmost derivation. When expanding a nonterminal A, we try each A-rule in turn. Deriving a sentence stops whenever it is possible to decide whether the result is the required string. This exhaustive search method may not terminate, even if w ∈ L(G), due to left-recursive rules (that is, rules of the form L → Lα). This same problem occurs if we follow the rightmost derivation, due to right-recursive rules. This is a top-down, depth-ﬁrst approach. 2

23

Recall the reverse of a grammar GR deﬁned in §3.3. A leftmost derivation in G corresponds to a rightmost derivation in GR .

24

Example 5.7. Consider the string aabb and the grammar S → SS | aSb | bSa | λ In the 1st round, we will try the following derivations in turn: S ⇒ SS S ⇒ aSb S ⇒ bSa S⇒λ The last two derivations cannot lead to the string aabb. In the 2nd round, we have 8 sentential forms: S ⇒ SS ⇒ SSS S ⇒ SS ⇒ aSbS S ⇒ SS ⇒ bSaS

25

S ⇒ SS ⇒ λS S ⇒ aSb ⇒ aSSb S ⇒ aSb ⇒ aaSbb S ⇒ aSb ⇒ abSab S ⇒ aSb ⇒ aλb The 3rd, 7th, and 8th derivations cannot lead to the required string aabb. There are 5 sentential forms left. We may conduct the 3rd round and will ﬁnd a leftmost derivation: S ⇒ aSb ⇒ aaSbb ⇒ aaλbb

26

Problems with exhaustive search: • It is ineﬃcient. • It may not terminate if w ∈ L(G). If we impose the additional constraint that there is no λ rules (that is, rules of the form A → λ) nor rules of the form A → B, then the above exhaustive search method always terminates with a correct, deﬁnite answer whether or not w ∈ L(G). We will see later that this constraint does not aﬀect the power of context-free grammars in any signiﬁcant way.

27

Example 5.8. The grammar in example 5.7 is equivalent to the following grammar (except the empty sentence), which satisﬁes the above constraint (no λ-rules): T → T T | aT b | bT a | ab | ba 2 Corollary. Let G be a context-free grammar which does not include rules of the forms A → λ and A → B where A, B ∈ V . Then the derivation of a sentence w ∈ L(G) takes at most 2|w| − 1 steps. Proof. Note that in such grammars, every derivation step increases the length of the derived sentential form by at least 1 or it changes a nonterminal to a terminal (with a rule A → a). 2

28

Theorem 5.2. Let G be a context-free grammar which does not include rules of the forms A → λ and A → B where A, B ∈ V . Then the exhaustive search method always terminate with a correct answer. Proof. Due to the above corollary, we can limit our search to at most 2|w| − 1 rounds (there is a derivation step per round), where w is the given string. If w ∈ L(G) we will ﬁnd a (leftmost) derivation. Otherwise, the search will terminate with a NO answer. 2

29

Next we will consider the time complexity of exhaustive search. Initially, there is a single sentential form (which consists of the single start symbol S). In each round, a sentential form is expanded into at most |P | new sentential forms. There are at most 2|w| − 1 rounds. Hence the upper bound of the number of sentential forms is |P | + |P | + |P | + . . . + |P |

2 3 2|w|−1

|P |2|w| − |P | = O(|P |2|w| ) = |P | − 1

This is an exponential function on the length of the input string |w|. There are more eﬃcient general parsers, such as CYK and Earley’s parsers.

30

Theorem 5.3. Every context-free grammars have a O(n3 )-time parser. Context-free grammars and parsing are used mostly in programming languages and compilers. In practice we usually require a linear-time parser. Not all context-free grammars have a linear-time parser.

31

Deﬁnition. A (context-free) grammar G is ambiguous if and only if there is a sentence w ∈ L(G) that have two or more leftmost derivations. Equivalently, a (context-free) grammar G is ambiguous if and only if there is a sentence w ∈ L(G) that have two or more rightmost derivations. Equivalently, a (context-free) grammar G is ambiguous if and only if there is a sentence w ∈ L(G) that have two or more derivation trees. Example 5.10. The grammar S → aSb | SS | λ is ambiguous since the sentence aabb has the following two leftmost derivations: S ⇒ SS ⇒ S ⇒ aSb ⇒ aaSbb ⇒ aabb S ⇒ aSb ⇒ aaSbb ⇒ aabb

32

S S a a S S S b b

Fig 5.4 S a a S S b b

Sometimes it is possible to transform an ambiguous grammar into an unambiguous one. For instance, the above grammar is equivalent to the following unambiguous grammar: S→T |λ T → U | UT U → ab | aU b

33

It is very diﬃcult to determine if a context-free grammar is ambiguous. (We will discuss this later) Example 5.11. The following grammar E → E + E | E ∗ E | (E) | a is ambiguous. This grammar is used to model the usual arithmetic expressions. Usually, we impose the additional stipulation that ∗ is performed before + (that is, ∗ has a higher precedence than +). We may use the following (unambiguous) grammar to show this precedence: Example 5.12. E →E+T | T T →T ∗F | F F → (E) | a

34

The above examples show that a context-free grammar can be used to impose precedence. Similarly, associativity can also be enforced by context-free grammars. For left-associative operations, such as +: L→L+E | E For right-associative operations, such as ∗∗: R → E ∗ ∗R | E

35

We have shown that ambiguity sometimes can be removed by properly transforming the grammar. However, this is not always possible. Certain context-free languages have only ambiguous grammars. They are called inherently ambiguous languages. Deﬁnition. Let L be a context-free grammar. If L has an unambiguous grammar, it is unambiguous. Otherwise, it is inherently ambiguous. Example 5.13. Consider the following language L =def {an bn cm } ∪ {an bm cm } The left part {an bn cm } can be generated by a grammar: S → Sc | A A → aAb | λ Similarly, the right part {an bm cm } can be generated by a grammar:

36

T → aT | B B → bBc | λ Their union is described by one additional rule: Q→S |T The string an bn cn , which belongs to both parts, have two derivations. Though this does not shown L is inherently ambiguous, it is quite possible that it is never possible to combine the two parts with a single unambiguous grammar.

37

§5.4 Context-Free Languages and Programming Languages The syntax of a programming language is usually speciﬁed by a context-free grammar. Due to the consideration of parsing eﬃciency, we are usually restricted to the subclass of LL(1) or LR(1) grammars. The following page contains C’s LALR(1) grammar.

38

39

Index

ambiguous grammar, 7, 32 sentence, 16 sentential form, 16

39-1

- SEM 5
- language implementation
- Formal Language Theory (Slides)
- Boettke Syntax and the Turing Test
- 01.Overview
- 9810228848_Graph
- 47,48
- Languages Grammars
- CFG
- 08L12
- Generating Semantically Precise Scene Graphs From Textual Descriptions for Improved Image Retrieval
- aral96
- MCQ
- LectureNLP
- FA Definitions
- Lexical and Parser tool for CBOOP program
- wf5020
- Kuligowski Parsing
- CLch4
- ibps it officer pattern
- The Definitive Antlr 4 Reference
- steels2011fluidity.pdf
- Levy 2014, Identifikacija
- online
- Compiler prac.docx
- Testing Equivalence of Regular Expressions
- Compilers
- Iitk Mtech Cs
- Mildly Context-Sensitive Languages
- uedinburgh-nissim qa

Skip carousel

- UT Dallas Syllabus for cs6320.501 06f taught by Sanda Harabagiu (sanda)
- tmp44D8.tmp
- Journalism for Women A Practical Guide by Bennett, Arnold, 1867-1931
- NLP to Create sql Query
- UT Dallas Syllabus for cs6320.001.11f taught by Sanda Harabagiu (sanda)
- UT Dallas Syllabus for cs6353.001 06f taught by I-ling Yen (ilyen)
- Patent Analysis using Text Mining
- Aspect Mining
- tmp4749.tmp
- tmp22DB.tmp
- UT Dallas Syllabus for cs6353.501.07f taught by I-ling Yen (ilyen)
- English Grammar in Familiar Lectures by Kirkham, Samuel
- Novel Database-Centric Framework for Incremental Information Extraction
- tmpC24

Sign up to vote on this title

UsefulNot usefulClose Dialog## Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

Close Dialog## This title now requires a credit

Use one of your book credits to continue reading from where you left off, or restart the preview.

Loading