Chapter 12: Context-Free Grammars

TAMUC CSCI549
Automata Theory

Automata Theory
CSCI549
Stephen T. Ha, Ph.D.
2016 - present

Required textbook:

Introduction to Computer Theory, by Daniel I.A. Cohen

Ch12.ppt
Not to be circulated outside class

Excerpts from textbook by Daniel Cohen
and slides from Dr. Kyung Lee


Syntax for defining languages

This time we have included parentheses around every component factor. This
avoids the ambiguity of expressions like 3 + 4 * 5 and 8/4/2 by making them
illegal. We shall present a better definition of this set later.
First we must design a machine that can figure out how a given input string
was built up from these basic rules. Then we should be able to translate this
sequence of rules into an assembler language program, since all of these rules
are pure assembler language instructions.
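
As a rough illustration of such a machine, the following Python sketch recognizes fully parenthesized arithmetic expressions and recovers how each one was built up. The slide's own rule list is not reproduced in this text, so the grammar in the comment and the names tokenize, parse_ae, and parse are assumptions made for the example.

```python
import re

# Hypothetical grammar (an assumption; the slide's rule list is not in the text):
#   AE -> NUMBER | "(" AE OP AE ")"      OP -> "+" | "-" | "*" | "/"
TOKEN = re.compile(r"\s*(\d+|[()+\-*/])")
OPERATORS = {"+", "-", "*", "/"}

def tokenize(text):
    """Split the input into numbers, parentheses, and operators."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_ae(tokens, i=0):
    """Parse one AE starting at tokens[i]; return (tree, next_index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    if tokens[i].isdigit():
        return tokens[i], i + 1              # rule: AE -> NUMBER
    if tokens[i] != "(":
        raise ValueError(f"expected a number or '(' but found {tokens[i]!r}")
    left, i = parse_ae(tokens, i + 1)        # rule: AE -> ( AE OP AE )
    if i >= len(tokens) or tokens[i] not in OPERATORS:
        raise ValueError("expected an operator")
    op = tokens[i]
    right, i = parse_ae(tokens, i + 1)
    if i >= len(tokens) or tokens[i] != ")":
        raise ValueError("expected ')'")
    return (op, left, right), i + 1

def parse(text):
    """Return the tree for text, or raise ValueError if it is not a legal AE."""
    tokens = tokenize(text)
    tree, end = parse_ae(tokens)
    if end != len(tokens):
        raise ValueError("extra input after a complete expression")
    return tree
```

Under these assumed rules, parse("(3 + (4 * 5))") returns a nested tuple that records which rule was applied at each step, while parse("3 + 4 * 5") raises ValueError because the unparenthesized form is not in the language.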

Rules that involve the meaning of words we call semantics, and rules that do not
involve the meaning of words we call syntax.
"Birds sing" is semantically good. Its syntax is also good, because "birds" is a
noun and "sing" is a verb.
"Concrete sings" is semantically bad, in that the sentence is not meaningful,
although its syntax is good because "concrete" is a noun and "sings" is a verb.
Semantic: relating to meaning in language or logic. (Google)
Syntax: the arrangement of words and phrases to create well-formed sentences in a
language. (Google)
English grammar is mostly syntax. (http://linguistics.stackexchange.com/questions/3484/whats-thedifference-between-syntax-and-grammar)

In general, the rules of computer language grammar are all syntactic and not
semantic.


Symbolism for generative grammars

The words that cannot be replaced by anything are called terminals. Words that must be
replaced by other things we call nonterminals.
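
The distinction can be sketched in Python. The toy grammar below is hypothetical (it is not taken from the slides), as are the names PRODUCTIONS and expand. Nonterminals are the symbols that have replacement rules, and terminals are the symbols that appear only on the right-hand sides.

```python
from itertools import product

# Hypothetical toy grammar (not from the slides): each key is a nonterminal,
# and each value lists the alternatives that may replace it.
PRODUCTIONS = {
    "SENTENCE": [["NOUN", "VERB"]],
    "NOUN": [["birds"], ["concrete"]],
    "VERB": [["sing"], ["sings"]],
}

NONTERMINALS = set(PRODUCTIONS)
TERMINALS = {
    symbol
    for alternatives in PRODUCTIONS.values()
    for rhs in alternatives
    for symbol in rhs
    if symbol not in NONTERMINALS
}

def expand(symbol):
    """Return every list of terminals derivable from symbol.

    This only terminates for grammars without recursion, such as the toy above.
    """
    if symbol in TERMINALS:
        return [[symbol]]
    results = []
    for rhs in PRODUCTIONS[symbol]:
        # Expand each symbol of the alternative, then combine the pieces.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

sentences = [" ".join(words) for words in expand("SENTENCE")]
```

Here sentences contains all four NOUN VERB combinations, including "concrete sings": the grammar checks only form, not meaning, and this toy version ignores number agreement as well.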

In formal language theory, a grammar (when the context is not given, often called a formal
grammar for clarity) is a set of production rules for strings in a formal language. The rules describe
how to form strings from the language's alphabet that are valid according to the language's syntax.
A grammar does not describe the meaning of the strings or what can be done with them in
whatever context, only their form. (https://en.wikipedia.org/wiki/Formal_grammar)
Context: the circumstances that form the setting for an event, statement, or idea, and in terms of
which it can be fully understood and assessed. (google)

Symbolism for generative grammars - example

The above is Dr. Lee's alternative presentation.

Exercise: Trace the production of the word aabb.

The above is Dr. Lee's alternative presentation.

Discussion: Discuss what words are produced by the above CFG.

The total language generated by this CFG (in the previous slide) is all strings
of a's and b's, null or otherwise. The language generated is (a + b)*.

The above is Dr. Lee's alternative presentation.


The language of the CFG on the right is also (a + b)*, but the sequence of productions
that is used to generate a specific word is not unique.
If we deleted the third and fourth productions, the language generated would
be the same.
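
This claim can be checked with a small Python sketch. The grammar figure is not reproduced in this text, so the five productions S → aS | bS | a | b | Λ used below are an assumption, chosen so that the third and fourth productions are S → a and S → b. The function name derivation_counts is likewise hypothetical.

```python
from collections import Counter

def derivation_counts(productions, max_len):
    """Count the derivations of every word of length <= max_len.

    Each right-hand side is a string of terminals optionally ending in "S",
    so every working string has the form w or wS with w made of terminals.
    Assumes each production ending in "S" adds at least one terminal.
    """
    counts = Counter()
    frontier = [""]                      # terminal prefixes w of working strings wS
    while frontier:
        next_frontier = []
        for prefix in frontier:
            for rhs in productions:
                if rhs.endswith("S"):
                    grown = prefix + rhs[:-1]
                    if len(grown) <= max_len:
                        next_frontier.append(grown)
                else:
                    word = prefix + rhs  # S is gone, so the derivation ends here
                    if len(word) <= max_len:
                        counts[word] += 1
        frontier = next_frontier
    return counts

# Assumed grammars; the empty string stands for the null production S -> Λ.
FIVE_PRODUCTIONS = ["aS", "bS", "a", "b", ""]
THREE_PRODUCTIONS = ["aS", "bS", ""]     # third and fourth productions deleted

five = derivation_counts(FIVE_PRODUCTIONS, 3)
three = derivation_counts(THREE_PRODUCTIONS, 3)
same_language = set(five) == set(three)
```

Under these assumptions both grammars produce the same 15 words of length at most 3, but each nonempty word has two derivations in the five-production grammar (for example, ab ends with either S → b or S → bS followed by S → Λ) and only one after the deletion.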

Symbolism for generative grammars - recall EVEN-EVEN

Symbolism for generative grammars

Not just the language of identifiers but the language of all proper FORTRAN
instructions can be defined by a CFG. This is also true of all the statements
in the languages C, PASCAL, BASIC, PL/I, and so on.
A computer must determine the grammatical structure of a computer language statement
before it can execute the instruction.
"Almost all modern languages are context free. C++ is one of the rare exceptions due to
some things you can do with templates. No others come to mind that are not context
free, although things like XML integration in VB.NET and also LINQ queries in both C#
and VB might require context sensitivity. At least, Paul Vick, then head of the VB
team, mentioned that XML in VB is quite hard to parse." (Konrad Rudolph,
http://stackoverflow.com/questions/898489/what-programming-languages-are-context-free)


Trees

These tree diagrams are called syntax trees or parse trees or generation trees
or production trees or derivation trees.

A CFG is called ambiguous if for at least one word in the language that it
generates there are two possible derivations of the word that correspond to
different syntax trees. If a CFG is not ambiguous, it is called unambiguous.
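
The definition can be illustrated by counting syntax trees. The Python sketch below assumes a hypothetical unparenthesized expression grammar, E → E OP E | NUMBER, which is not taken from the slides; count_trees is an illustrative name. A word with more than one tree shows that the grammar is ambiguous.

```python
from functools import lru_cache

# Hypothetical ambiguous grammar (an assumption for illustration):
#   E -> E OP E | NUMBER        OP -> "+" | "*" | "/"
OPERATORS = {"+", "*", "/"}

def count_trees(tokens):
    """Return the number of distinct syntax trees for the token sequence."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo].isdigit() else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] in OPERATORS:
                # tokens[k] is the operator at the root of this tree.
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

trees_for_sum_product = count_trees("3 + 4 * 5".split())
trees_for_quotients = count_trees("8 / 4 / 2".split())
```

With this assumed grammar, 3 + 4 * 5 and 8 / 4 / 2 each have two syntax trees, one for each choice of which operator sits at the root, so the grammar is ambiguous.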


Ambiguity


Total language tree


Total language tree - example


Reading assignments

1. Textbook, 2nd ed., Chapter 12 (1st ed., Chapter 13, Chapter 14)
2. Chap12.pdf, by Dr. Kyung Lee in eCollege folder Doc Sharing
