
Introduction to

Formal Languages
Outline
Overview of languages
Strings & languages
Introduction to grammars
Regular grammars | languages
Context Free grammars | Languages

October 18, 2019 Formal Language Theory 2


Overview of languages: natural vs. formal

◼ Natural Languages
❑ rules come after the language
❑ evolve and develop

❑ highly flexible

❑ quite powerful

❑ no special learning effort needed

Disadvantages
❑ vague

❑ imprecise

❑ ambiguous

❑ user and context dependent

❑ Ex. Amharic, English, French, …



Overview of languages: cont’d

◼ Formal Languages
❑ developed with strict rules
→predefined syntax and semantics
❑ precise
❑ unambiguous

→can be processed by machines!


Disadvantages
❑ unfamiliar notation

❑ initial learning effort

❑ Ex. Programming languages: Pascal, C++, …



Overview of languages: cont’d

◼ Sentences: the basic building blocks of languages
◼ Sentence = Syntax + Semantics
◼ Grammar: the study of the structure of a sentence
◼ Ex:
<simple sentence> ::= <noun phrase><verb><noun phrase>
<noun phrase> ::= <article><noun>

→A person entered the room


Overview of languages: cont’d
<simple sentence>

<noun phrase> <verb> <noun phrase>

<article> <noun> <article> <noun>

A person entered the room

Derivation tree for the simple sentence: A person entered the room.



Overview of languages: cont’d

◼ In Pascal (as well as in many other languages), for example, an identifier is
specified as follows:

<identifier> ::= <letter> | <letter> {<letter> | <digit>}*


<letter> ::= a | b | c | …
<digit> ::= 0 | 1 | 2 | … | 9

Ex. a, x1, num, count1, …
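
The identifier rule can be checked mechanically. The following is a minimal Python sketch, assuming that <letter> covers the ASCII letters a-z and A-Z (the slide elides the full list); the function name is_identifier is chosen here for illustration.

```python
import string

# Assumption: <letter> means the ASCII letters; the slide only lists "a | b | c ...".
LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)  # <digit> ::= 0 | 1 | 2 | ... | 9


def is_identifier(s: str) -> bool:
    """Return True if s is a <letter> followed by zero or more letters or digits."""
    if not s or s[0] not in LETTERS:
        return False
    return all(ch in LETTERS or ch in DIGITS for ch in s[1:])


for candidate in ["a", "x1", "num", "count1", "1x", ""]:
    print(repr(candidate), is_identifier(candidate))
```

The four examples from the slide are accepted, while "1x" (starts with a digit) and the empty string are rejected.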



Strings and languages
◼ Strings
❑ An alphabet, ∑, is a finite set of symbols.
❑ A string over an alphabet ∑ is a sequence of symbols from ∑.

❑ An empty string is a string without symbols, and is denoted by λ.

❑ Let w be a string, then its length, denoted by |w|, is the number of
symbols of w.
Ex. Let ∑ = {0, 1}, the following are some strings over ∑
w = λ, |w| = 0; w = 01, |w| = 2; w = 010110, |w| = 6
❑ Given an alphabet ∑, ∑* denotes the set of all strings (including
λ) over ∑.
❑ ∑+ = ∑* - {λ}
Ex. ∑ = {0, 1} => ∑* = {λ, 0, 1, 01, 00, 11, 111, 0101, 0000, …}
❑ ∑^i is the set of strings of length i, i = 0, 1, 2, …
❑ Let x Є ∑* and |x| = n, then x = a1a2…an, ai Є ∑
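
The sets of strings of a fixed length, and a finite slice of ∑*, can be enumerated directly. This is a small Python sketch in which λ is represented by the empty string ""; the function names are illustrative and not part of the slides.

```python
from itertools import product


def strings_of_length(sigma, i):
    """Return all strings of length i over the alphabet sigma."""
    return ["".join(p) for p in product(sorted(sigma), repeat=i)]


def sigma_star_up_to(sigma, max_len):
    """Return the strings over sigma of length <= max_len (a finite part of sigma*)."""
    result = []
    for i in range(max_len + 1):
        result.extend(strings_of_length(sigma, i))
    return result


sigma = {"0", "1"}
print(strings_of_length(sigma, 2))  # ['00', '01', '10', '11']
print(sigma_star_up_to(sigma, 2))   # ['', '0', '1', '00', '01', '10', '11']
```

Since ∑* is infinite whenever ∑ is nonempty, only a bounded part of it can be listed; dropping the first element "" of the result gives the corresponding part of ∑+.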



Strings and languages: cont’d

❑ Operations on strings
◼ Concatenation operation
❑ Let x = a1a2…an and y = b1b2…bm be strings in ∑* (ai, bj Є ∑), so that
|x| = n and |y| = m. Then xy, the concatenation of x and y, is a1a2…anb1b2…bm
❑ The set ∑* has an identity element λ with respect to the
binary operation of concatenation.
Ex. For any x Є ∑*, xλ = λx = x
❑ ∑* has left and right cancellation
For x, y, z Є ∑*,
zx = zy => x = y (left cancellation)
xz = yz => x = y (right cancellation)
❑ For x, y Є ∑*, we have |xy| = |x| + |y|



Strings and languages: cont’d

◼ Transpose operation
❑ For any x in ∑* and a in ∑, (xa)^T = a(x)^T, with λ^T = λ
Ex. (aaabab)^T = babaaa
❑ A palindrome of even length can be obtained by the
concatenation of a string and its transpose.
❑ A prefix of a string is a substring of leading symbols of that
string.
w is a prefix of y if there exists y’ in ∑* such that y=wy’
Ex. y = 123, list all prefixes of y.
❑ A suffix of a string is a substring of trailing symbols of that
string.
w is a suffix of y if there exists y’ in ∑* such that y=y’w
Ex. y = 123, list all suffixes of y.
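
The transpose, prefix and suffix definitions translate almost literally into code. The sketch below is one possible Python rendering, with λ represented by the empty string; the function names are chosen here for illustration.

```python
def transpose(x: str) -> str:
    """Reverse x using the recursive rule (xa)^T = a(x)^T, with the empty string as base case."""
    if x == "":
        return ""
    return x[-1] + transpose(x[:-1])


def prefixes(y: str):
    """All w such that y = w y' for some y' (includes the empty string and y itself)."""
    return [y[:i] for i in range(len(y) + 1)]


def suffixes(y: str):
    """All w such that y = y' w for some y' (includes y itself and the empty string)."""
    return [y[i:] for i in range(len(y) + 1)]


print(transpose("aaabab"))  # babaaa
print(prefixes("123"))      # ['', '1', '12', '123']
print(suffixes("123"))      # ['123', '23', '3', '']

w = "ab"
print(w + transpose(w))     # abba, an even-length palindrome
```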



Strings and languages: cont’d

❑ A terminal symbol is a unique indivisible object
used in the generation of strings.
❑ A nonterminal symbol is a unique but divisible
object used in the generation of strings.
Ex. In English, a, b, A, B, etc. are terminals and
the words boy, cat, dog, … are nonterminals.
In programming languages, a, A, :, ;, =, if, then, …
are terminals.



Strings and languages: cont’d
◼ Languages
❑ Definition: A language, L, is a set (collection) of strings over a given
alphabet, ∑.
◼ A string in L is called a sentence or word.
Ex. ∑ = {0, 1}, ∑* = {λ, 0, 1, 01, 00, 11, …}
L1 = {λ}, L2 = {0, 1, 01} over ∑
L3 = {a^n | n >= 0} over ∑ = {a}
❑ Let L1, L2 be languages over ∑, then
◼ L1L2 = {xy | x Є L1, y Є L2}
◼ L{λ} = {λ}L = L, for any language L
◼ L^0 = {λ}
◼ L^1 = L
◼ L^2 = LL = {xy | x, y Є L}
◼ …
◼ L^i = L L^(i-1), for i >= 2
◼ L* = U(i=0,∞) L^i
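
For finite languages these operations can be tried out with Python sets. The sketch below represents λ by the empty string and approximates L* by taking the union of the powers up to a chosen bound, since L* itself is usually infinite; all function names are illustrative.

```python
def concat(l1, l2):
    """Concatenation of two languages: every x from l1 followed by every y from l2."""
    return {x + y for x in l1 for y in l2}


def power(l, i):
    """i-th power of l, starting from the 0-th power {lambda} and prepending l each time."""
    result = {""}
    for _ in range(i):
        result = concat(l, result)
    return result


def star_up_to(l, max_power):
    """Union of the powers 0..max_power of l, a finite approximation of l*."""
    result = set()
    for i in range(max_power + 1):
        result |= power(l, i)
    return result


L = {"a", "b"}
print(sorted(power(L, 2)))       # ['aa', 'ab', 'ba', 'bb']
print(sorted(star_up_to(L, 2)))  # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
print(concat({"0", "1", "01"}, {""}) == {"0", "1", "01"})  # True
```

The second power of {a, b} contains ab and ba as well as aa and bb: the two factors are chosen independently from L.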



Introduction to grammars
◼ A formal language is a set of strings over ∑ specified by a set of
rules known as a grammar.
◼ Grammar rules can be represented using a syntax diagram.
◼ Alternatively BNF (Backus-Naur Form) notation can be used.
◼ BNF uses the following:
❑ Nonterminals are enclosed by <>. Terminals are represented as
they are.
❑ { } enclose symbols (terminals or nonterminals) that may be repeated zero
or more times
❑ ::= stands for “is defined as”

❑ | stands for OR

❑ () used to group symbols

Ex. <identifier> ::= <letter>|<letter>{<letter>|<digit>}


<letter> ::= a|b|c|…
<digit> ::= 0|1|2|…|9



Introduction to grammars: cont’d
◼ Phrase Structure Grammar (PSG)
❑ Definition: A PSG is a 4-tuple (N, T, P, S) where:
a. N is a finite set of nonterminals
b. T is a finite set of terminals
c. P is a finite set of productions (rules) of the form α→β, where α and β are
strings over N U T and α contains at least one symbol from N.
d. S Є N is the start symbol of the grammar.
Note: The right hand side of a production, β, can be the empty string. Such a
production is called a λ-production.
Ex. G = (N, T, P, S) = ({S, B, C}, {a, b, c}, P, S)
where P is given by:
S → aSBC | aBC
BC → CB
aB → ab
C → Cc | λ



Introduction to grammars: cont’d

◼ Derivation
1. If α generates β in one step, then we write α => β
2. If α1 => α2, α2 => α3, …, αn-1 => αn, then we write
α1 => α2 => α3 => … => αn or α1 =>+ αn
◼ Let G = (N, T, P, S) be a grammar. If S =>* α (S derives α in
zero or more steps), α Є (N U T)*, then α is called a
sentential form.
◼ A sentence (in G) is a sentential form in T*.
◼ The language generated by the grammar G is
denoted by L(G). L(G) = {x Є T* | S =>* x}
i.e. L(G) is the set of all terminal strings derived
from the start symbol S.
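
The definition of L(G) suggests a simple, if inefficient, way to list sentences: explore all sentential forms reachable from S and keep those in T*. The sketch below does a breadth-first search limited to a fixed number of derivation steps. It assumes single-character symbols with uppercase letters as nonterminals and lowercase letters as terminals, and uses the example grammar S → aSBC | aBC, BC → CB, aB → ab, C → Cc | λ; the name derive and the step limit are illustrative choices.

```python
from collections import deque


def derive(productions, start, max_steps):
    """Collect the terminal strings derivable from start in at most max_steps steps.

    productions is a list of (lhs, rhs) string pairs; "" stands for lambda.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    sentences = set()
    while frontier:
        form, steps = frontier.popleft()
        if not any(ch.isupper() for ch in form):
            sentences.add(form)  # a sentential form without nonterminals is a sentence
            continue
        if steps == max_steps:
            continue
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:  # apply the production at every position where lhs occurs
                new_form = form[:pos] + rhs + form[pos + len(lhs):]
                if new_form not in seen:
                    seen.add(new_form)
                    frontier.append((new_form, steps + 1))
                pos = form.find(lhs, pos + 1)
    return sentences


P = [("S", "aSBC"), ("S", "aBC"), ("BC", "CB"),
     ("aB", "ab"), ("C", "Cc"), ("C", "")]
print(sorted(derive(P, "S", 5)))  # ['ab', 'abc', 'abcc']
```

Each production is applied only in the forward direction (left side replaced by right side). The step limit is needed because the search space is infinite in general.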



Introduction to grammars: cont’d
◼ Ex1.
G = (N, T, P, S) where:
N = {<sentence>, <noun>, <verb>, <adverb>}
T = {Sam, Dan, ate, sang, well}
S = <sentence>
P consists of:
<sentence> → <noun><verb> | <noun><verb><adverb>
<noun> → Sam | Dan
<verb> → ate | sang
<adverb> → well
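
Because this grammar has no recursion, L(G) is finite and can be listed completely. A minimal Python sketch, assuming the productions are stored in a dictionary named P as below (the representation and the function name expand are illustrative):

```python
from itertools import product

# Each nonterminal maps to its list of alternatives; each alternative is a list of symbols.
P = {
    "<sentence>": [["<noun>", "<verb>"], ["<noun>", "<verb>", "<adverb>"]],
    "<noun>": [["Sam"], ["Dan"]],
    "<verb>": [["ate"], ["sang"]],
    "<adverb>": [["well"]],
}


def expand(symbol):
    """Return every terminal word sequence derivable from symbol.

    Only suitable for non-recursive grammars, where the result is finite.
    """
    if symbol not in P:  # a terminal derives only itself
        return [[symbol]]
    results = []
    for rhs in P[symbol]:
        for combo in product(*(expand(s) for s in rhs)):
            results.append([word for part in combo for word in part])
    return results


sentences = [" ".join(words) for words in expand("<sentence>")]
print(len(sentences))  # 8
for s in sentences:
    print(s)
```

The count 8 comes from 2 × 2 sentences without the adverb plus 2 × 2 × 1 sentences with it.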



Introduction to grammars: cont’d
◼ Ex2.
S → A|B
A → aA|bB|a|b
B → bB|b

◼ Ex3.
S → a|bS

◼ Ex4.
S → aA|bB|a|b
A → aA|a
B → bB|b



Introduction to grammars: cont’d
◼ Note that reverse derivation is not permitted. For instance, if S
→ AB is a production, then we can replace S by AB, but we
cannot replace AB by S.
◼ Notations:
i. If A is any set, then A* denotes the set of all strings over A and
A+ = A* - {λ}
ii. A, B, C, A1, A2, … denote nonterminals
iii. a, b, c, … denote terminals
iv. x, y, z, w, … denote strings of terminals
v. α, β, … denote strings from (N U T)*
vi. If A → α is a production where A Є N, the production is called
an A-production
vii. If A → α1, A → α2, A → α3, A → α4, …, A → αn are all the
A-productions, these can be written as A → α1 | α2 | α3 | α4 | … | αn
viii. X^0 = λ for any symbol X Є N U T



Introduction to grammars: cont’d
◼ Definition: Let G1 and G2 be two grammars, then G1 and G2
are equivalent if L(G1) = L(G2).
◼ Ex5. G = ({S}, {a}, {S → SS | a}, S). Find L(G)
◼ Ex6. G = ({S, C}, {a, b}, P, S) where P is given by:
S → aCa
C → aCa | b
Find L(G)
◼ Ex7. G = ({S}, {a}, {S → aS|a}, S). Find L(G)
◼ Ex8. Let L be the set of all palindromes over {a, b}. Construct a
grammar G that generates L.
Hint: Use the following recursive definition
i. λ is a palindrome
ii. a, b are palindromes
iii. If x is a palindrome, axa and bxb are palindromes
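
One grammar that follows the hint is S → aSa | bSb | a | b | λ; this is a candidate answer, not necessarily the only one. The Python sketch below checks it on short strings by comparing what the productions generate against a brute-force list of palindromes. The function names and the length bound are illustrative.

```python
from itertools import product


def generate(max_len):
    """Strings of length <= max_len derivable from S -> aSa | bSb | a | b | lambda.

    Assumes max_len >= 1.
    """
    current = {"", "a", "b"}  # from S -> lambda, S -> a, S -> b
    result = set(current)
    while current:
        # Wrap each string found so far, as S -> aSa and S -> bSb do.
        current = {c + x + c for x in current for c in "ab"
                   if len(x) + 2 <= max_len}
        result |= current
    return result


def all_palindromes(max_len):
    """Brute force: every string over {a, b} of length <= max_len equal to its reverse."""
    return {
        "".join(p)
        for n in range(max_len + 1)
        for p in product("ab", repeat=n)
        if p == p[::-1]
    }


print(generate(4) == all_palindromes(4))  # True
```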



Regular Grammars

◼ Definition: a PSG G = (N, T, P, S) is regular provided that:
i. If there exists a λ-production, then it is of the form S → λ and
S does not appear on the right hand side of any production.
ii. All other productions are of the form:
❑ α → β, α Є N, β Є T, OR
❑ α → βγ, α, γ Є N, β Є T
❑ A language generated from a regular grammar is called a
Regular Language.
❑ Ex1. G1:
S → aS | aB
B → bB | b
Is G1 a regular grammar?
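
The two conditions in the definition can be tested production by production. The following Python sketch assumes each production is given as a pair (left side, tuple of right-side symbols), with the empty tuple standing for λ; the function name is_regular is illustrative.

```python
def is_regular(productions, nonterminals, terminals, start):
    """Check the regular-grammar conditions from the definition.

    productions: list of (lhs, rhs) pairs, rhs being a tuple of symbols.
    """
    start_on_rhs = any(start in rhs for _, rhs in productions)
    for lhs, rhs in productions:
        if lhs not in nonterminals:
            return False  # the left side must be a single nonterminal
        if len(rhs) == 0:
            # A lambda-production is only allowed for the start symbol,
            # and only if the start symbol never appears on a right side.
            if lhs != start or start_on_rhs:
                return False
        elif len(rhs) == 1:
            if rhs[0] not in terminals:
                return False
        elif len(rhs) == 2:
            if rhs[0] not in terminals or rhs[1] not in nonterminals:
                return False
        else:
            return False
    return True


G1 = [("S", ("a", "S")), ("S", ("a", "B")), ("B", ("b", "B")), ("B", ("b",))]
print(is_regular(G1, {"S", "B"}, {"a", "b"}, "S"))  # True

not_regular = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]
print(is_regular(not_regular, {"S", "A", "B"}, {"a", "b"}, "S"))  # False
```

Every production of G1 has the form "terminal" or "terminal followed by nonterminal", so G1 passes the check. The second grammar fails because S → AB has two nonterminals on the right.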



Regular Grammars: cont’d

◼ Theorem: If L1 and L2 are two regular languages, then:
a. L1 U L2 is also a regular language
b. L1L2 is also a regular language
c. L1* is also a regular language



Transition diagrams

◼ A regular grammar G=(N, T, P, S) can be represented by a transition
diagram (a directed graph with labeled arcs) as follows:
❑ The nodes of the graph are labeled with nonterminals
❑ The arcs are labeled with terminals
❑ One of the nodes is an initial node which is designated
with a pointer
❑ One (or more) of the nodes is designated as final node
which is either a square or a double circle
❑ If A→aB is in P, then the arc from A to B is labeled with a
❑ If A→a is in P, then the arc from A to a final state is labeled
with a



Transition diagrams: cont’d

◼ Example:
Let G=(N, T, P, S) be a regular grammar with
P: S→aA|bB
A→aA|a
B→bB|b
Draw a transition diagram that represents G
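
The diagram can also be written down as a table of arcs and used to test strings. The Python sketch below builds the arcs from the productions of this example and follows them for an input word. The name "#" for the single final node is an assumption made here, and each right side is assumed to be either one terminal or a terminal followed by a nonterminal.

```python
FINAL = "#"  # hypothetical label for the final node of the diagram


def build_diagram(productions):
    """Map (node, terminal) to the set of nodes reachable along an arc with that label."""
    arcs = {}
    for lhs, rhs in productions:
        # A -> aB gives an arc from A to B; A -> a gives an arc from A to the final node.
        target = rhs[1] if len(rhs) == 2 else FINAL
        arcs.setdefault((lhs, rhs[0]), set()).add(target)
    return arcs


def accepts(arcs, start, word):
    """Follow all arcs in parallel and report whether the final node can be reached."""
    current = {start}
    for symbol in word:
        current = {t for node in current for t in arcs.get((node, symbol), set())}
    return FINAL in current


P = [("S", "aA"), ("S", "bB"), ("A", "aA"), ("A", "a"), ("B", "bB"), ("B", "b")]
arcs = build_diagram(P)
for word in ["aa", "aaa", "bb", "a", "ab", ""]:
    print(repr(word), accepts(arcs, "S", word))
```

Several arcs with the same label can leave one node (A has arcs labeled a to both A and the final node), so the simulation keeps a set of current nodes rather than a single one.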



Context Free Grammar (CFG)

◼ Definition: A CFG, G, is a PSG G=(N, T, P, S) with
productions of the form A → β, A Є N, β Є (N U T)*.
◼ CFGs are used in defining the syntax of programming
languages and in parsing arithmetic expressions.
◼ A language generated from CFG is called Context Free
Language (CFL).
◼ Ex.
a) S → aB
   B → bA | b
   A → a
b) S → aB | A
   A → aA | a | CBA
   B → λ
   C → c



CFG: cont’d

◼ Let G be a CFG, then x Є L(G) iff x Є T* and S =>* x,
i.e. x can be derived from S in a finite number of steps over G.
◼ x Є L(G) can as well be obtained from a
derivation tree or parse tree. The root of the
tree is S and x is the sequence of leaves read from
left to right.
◼ Leftmost derivation: at every step, the leftmost
nonterminal is the one replaced
◼ Rightmost derivation: at every step, the rightmost
nonterminal is the one replaced



CFG: cont’d
◼ If some string x Є L(G) has two different leftmost
derivations, then the grammar G is said to be ambiguous;
otherwise it is unambiguous
(i.e. a grammar is ambiguous if it can produce more than one
parse tree for a particular sentence).
◼ Ex.
1. G1 = (N, T, P, S) with productions:
S → AB
A → aA|a
B → bB|b
let x = aaabbb
a) find a left most and right most derivations for x
b) draw the parse tree for x
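
A leftmost derivation can be found by a small search that always rewrites the leftmost nonterminal. The Python sketch below is one way to do this for G1. It assumes single-character symbols (uppercase for nonterminals) and a grammar without λ-productions, so that sentential forms never shrink and over-long forms can be discarded; the function name is illustrative.

```python
def leftmost_derivation(productions, start, target):
    """Depth-first search for a leftmost derivation of target.

    productions maps each nonterminal to a list of right-hand-side strings.
    Assumes there are no lambda-productions.
    """
    def search(form, steps):
        # Position of the leftmost nonterminal, or None if the form is all terminals.
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:
            return steps if form == target else None
        # Prune forms that are too long or whose terminal prefix already disagrees.
        if len(form) > len(target) or not target.startswith(form[:idx]):
            return None
        for rhs in productions[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            found = search(new_form, steps + [new_form])
            if found:
                return found
        return None

    return search(start, [start])


G1 = {"S": ["AB"], "A": ["aA", "a"], "B": ["bB", "b"]}
steps = leftmost_derivation(G1, "S", "aaabbb")
print(" => ".join(steps))
# S => AB => aAB => aaAB => aaaB => aaabB => aaabbB => aaabbb
```

A rightmost derivation of the same string would expand B completely before touching A; both derivations correspond to the same parse tree.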



CFG: cont’d

2. G2 = (N, T, P, S) with productions:
S → SbS|ScS|a
let x = abaca Є L(G2)
a) find a left most and right most
derivations for x
b) draw the parse tree for x
3. Is G1 ambiguous? Is G2?
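
For G2, ambiguity can be checked by counting the parse trees of a given string. The Python sketch below is specific to the productions S → SbS | ScS | a: a substring has a parse tree either if it is the single symbol a, or if it splits at some b or c into two parts that both have parse trees. The function name is illustrative.

```python
from functools import lru_cache


def count_parse_trees(x: str) -> int:
    """Count the parse trees of x under S -> SbS | ScS | a."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving the substring x[i:j] from S.
        if j - i == 1:
            return 1 if x[i] == "a" else 0
        total = 0
        for k in range(i + 1, j - 1):  # k is the position of the top-level b or c
            if x[k] in ("b", "c"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(x)) if x else 0


print(count_parse_trees("abaca"))  # 2
print(count_parse_trees("aba"))    # 1
```

The string abaca has two parse trees, one with b at the root and one with c at the root, so G2 is ambiguous.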



Parsing Arithmetic Expression

◼ Consider the following grammar:


E → T | E + T | E - T
T → F | T * F | T / F
F → a | b | c | (E)

Draw parse trees for


a) a*b+c b) a+b*c c) (a+b)*c d) a-b-c
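
The structure this grammar imposes can be seen with a small recursive-descent parser. The Python sketch below is an illustration under two assumptions. First, the left-recursive rules (E → E + T and so on) are handled with loops, which gives the same left-to-right grouping. Second, the result is a condensed tree of the form (operator, left, right), not the full parse tree with E, T and F nodes.

```python
def parse(expr: str):
    """Parse expr according to E -> T | E+T | E-T, T -> F | T*F | T/F, F -> a | b | c | (E)."""
    tokens = list(expr.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_e():
        node = parse_t()
        while peek() in ("+", "-"):  # E -> E + T | E - T, grouped from the left
            op = advance()
            node = (op, node, parse_t())
        return node

    def parse_t():
        node = parse_f()
        while peek() in ("*", "/"):  # T -> T * F | T / F, grouped from the left
            op = advance()
            node = (op, node, parse_f())
        return node

    def parse_f():
        tok = peek()
        if tok in ("a", "b", "c"):
            return advance()
        if tok == "(":
            advance()
            node = parse_e()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree


for e in ["a*b+c", "a+b*c", "(a+b)*c", "a-b-c"]:
    print(e, "->", parse(e))
```

In the output, a+b*c groups as a + (b*c) because * is introduced one level lower in the grammar (at T) than + (at E), and a-b-c groups as (a-b)-c because of the left recursion in E → E - T.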



Closure Properties of CFLs

◼ Theorem: Context free languages (CFLs) are closed under:
a) Union
b) Concatenation
c) Kleene star (*)



Union
◼ Let L1 and L2 be two context free languages.
Then L1 ∪ L2 is also context free.
◼ Ex. Let L1 = {a^n b^n | n > 0}. The corresponding
grammar G1 has P: S1 → aS1b | ab
◼ Let L2 = {c^m d^m | m ≥ 0}. The corresponding
grammar G2 has P: S2 → cS2d | λ
◼ Union of L1 and L2: L = L1 ∪ L2 = {a^n b^n | n > 0} ∪
{c^m d^m | m ≥ 0}
◼ The corresponding grammar G has the productions of G1 and G2 plus the
additional production S → S1 | S2, where S is the new start symbol
Concatenation

◼ If L1 and L2 are context free languages, then
L1L2 is also context free.
◼ Example
◼ Concatenation of the languages L1 and L2 from the union example: L
= L1L2 = {a^n b^n c^m d^m | n > 0, m ≥ 0}
◼ The corresponding grammar G has the productions of G1 and G2 plus the
additional production S → S1 S2, where S is the new start symbol



Kleene Star

◼ If L is a context free language, then L* is also
context free.
◼ Example
◼ Let L = {a^n b^n | n ≥ 0}. The corresponding
grammar G has P: S → aSb | λ
◼ Kleene star: L1 = L* = {a^n b^n | n ≥ 0}*
◼ The corresponding grammar G1 has the productions of G plus the
additional productions S1 → SS1 | λ, where S1 is the new start symbol
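
The three closure constructions only add one new start symbol and its productions to the given grammars. The Python sketch below writes them as functions on grammars stored as dictionaries. It assumes that the two grammars use disjoint nonterminal names and that the new start symbol is fresh, and it writes the example grammars with the recursive nonterminal on the right-hand side (S1 → aS1b | ab, S2 → cS2d | λ, S → aSb | λ). All function names are illustrative.

```python
def union(g1, s1, g2, s2, new_start):
    """Grammar for the union: all productions of both grammars plus new_start -> s1 | s2."""
    g = {**g1, **g2}
    g[new_start] = [[s1], [s2]]
    return g


def concatenation(g1, s1, g2, s2, new_start):
    """Grammar for the concatenation: all productions plus new_start -> s1 s2."""
    g = {**g1, **g2}
    g[new_start] = [[s1, s2]]
    return g


def star(g, s, new_start):
    """Grammar for the Kleene star: all productions plus new_start -> s new_start | lambda."""
    g_star = dict(g)
    g_star[new_start] = [[s, new_start], []]  # the empty list stands for lambda
    return g_star


def show(g):
    """Print the productions, one nonterminal per line."""
    for lhs, alternatives in g.items():
        rhs = " | ".join(" ".join(alt) if alt else "λ" for alt in alternatives)
        print(f"{lhs} → {rhs}")


G1 = {"S1": [["a", "S1", "b"], ["a", "b"]]}  # a^n b^n, n > 0
G2 = {"S2": [["c", "S2", "d"], []]}          # c^m d^m, m >= 0
G = {"S": [["a", "S", "b"], []]}             # a^n b^n, n >= 0

show(union(G1, "S1", G2, "S2", "S"))
show(concatenation(G1, "S1", G2, "S2", "S"))
show(star(G, "S", "S1"))
```

The resulting grammars are still context free because every added production has a single nonterminal on its left side.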

