
Formal Language Theory

Core Competency
Outlines
 Introduction to the Theory of Computation
 Mathematical Preliminaries and Notation

 Languages

 Grammars

 Automata

 Regular languages
 Regular grammars

 Finite state automata

 Context Free Languages


 Context Free Grammars

 Parsing arithmetic expressions

 Push Down Automata


2
Introduction to the theory of computation
 Automata theory deals with the definitions and properties of
mathematical models of computation.
 These models play a role in several applied areas of computer
science. One model, called the finite automaton, is used in text
processing, compilers, and hardware design.
 Another model, called the context-free grammar, is used in
programming languages and artificial intelligence.
 Automata theory is an excellent place to begin the study of the
theory of computation.
 The theories of computability and complexity require a precise
definition of a computer. Automata theory allows practice with
formal definitions of computation as it introduces concepts relevant
to other, nontheoretical areas of computer science
3
Introduction to the theory of computation…
 Computer science is a practical discipline. Those who
work in it often have a marked preference for useful and
tangible problems over theoretical speculation. This is
certainly true of computer science students who are
concerned mainly with difficult applications from the
real world.
 Theoretical questions interest them only if they help in
finding good solutions. This attitude is appropriate,
since without applications there would be little interest
in computers. But given this practical orientation, one
might ask “why study theory?”

4
Introduction to the theory of computation…
 The first answer is that theory provides concepts and principles that
help us understand the general nature of the discipline. The field of
computer science includes a wide range of special topics, from
machine design to programming.
 A second, and perhaps not so obvious, answer is that the ideas we
will discuss have some immediate and important applications. The
fields of digital design, programming languages, and compilers are
the most obvious examples, but there are many others. The
concepts we study here run like a thread through much of computer
science, from operating systems to pattern recognition.
 The third answer is one of which we hope to convince the reader.
The subject matter is intellectually stimulating and fun. It provides
many challenging, puzzle-like problems that can lead to some
sleepless nights. This is problem solving in its pure essence.
5
Mathematical Preliminaries and Notation
 Sets
 Set Operations
 The usual set operations are union (∪), intersection
(∩), and difference (−), defined as
S1 ∪ S2 = {x : x ∈ S1 or x ∈ S2},
S1 ∩ S2 = {x : x ∈ S1 and x ∈ S2},
S1 − S2 = {x : x ∈ S1 and x ∉ S2}
 Relations and Functions
 Properties of Relations
 Graphs and Trees
 Proof techniques

6
Languages and strings
 We are all familiar with the notion of natural languages,
such as English and French. Still, most of us would
probably find it difficult to say exactly what the word
“language” means.
 Dictionaries define the term informally as a system
suitable for the expression of certain ideas, facts, or
concepts, including a set of symbols and rules for their
manipulation. While this gives us an intuitive idea of
what a language is, it is not sufficient as a definition for
the study of formal languages. We need a precise
definition for the term.

7
Languages and Strings
 Symbol: anything like a, b, c, 0, 1, 2, etc.
 Alphabet: a collection of symbols; it must be finite
 An alphabet is denoted by sigma (Σ)
 Example: {a, b}, {d, e, f, g}, {0, 1, 2}
 String: a sequence of symbols
 Example: a, b, 0, 1, ab, ba, bb, 01
 Language: a set of strings
 Example: Σ = {0, 1}
L1 = set of all strings of length 2
= {00, 01, 10, 11}
this is a finite language

8
Languages and Strings

L2 = set of all strings of length 3
= {000, 001, 010, 011, 100, 101, 110, 111}
this is a finite language
L3 = set of all strings that begin with 0
= {0, 00, 01, 010, 011, 0000, …}
this is an infinite language
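The finite example languages above can be enumerated mechanically. Here is a small sketch in Python (the language is chosen only for illustration; `strings_of_length` is our own helper, not from the slides) that generates L1 and L2 over Σ = {0, 1}:

```python
from itertools import product

def strings_of_length(alphabet, n):
    """All strings of exactly length n over the alphabet, in sorted order."""
    return ["".join(p) for p in product(sorted(alphabet), repeat=n)]

# L1: all strings of length 2 over {0, 1}
L1 = strings_of_length({"0", "1"}, 2)
print(L1)  # ['00', '01', '10', '11']

# L2: all strings of length 3 over {0, 1}
L2 = strings_of_length({"0", "1"}, 3)
print(L2)  # ['000', '001', '010', '011', '100', '101', '110', '111']
```

L3 cannot be listed this way because it is infinite; only finite prefixes of it can be enumerated.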

9
Chapter two: Grammars
Outlines
 Introduction to grammar
 Types of grammar
 Regular grammar
 Context free grammar
 Derivation

11
Grammar
 To study languages mathematically, we need a
mechanism to describe them.
Informal descriptions in English are often imprecise and
inadequate.
 A grammar for the English language tells us whether a
particular sentence is well-formed or not.
 A typical rule of English grammar is “a sentence can
consist of a noun phrase followed by a predicate.”

12
Grammar
A grammar G is defined as a quadruple
G =(V, T, S, P),
Where
V is a finite set of objects called variables,
T is a finite set of objects called terminal symbols,
S ∈ V is a special symbol called the start variable,
P is a finite set of productions.

13
Grammar
 It will be assumed without further mention that the sets
V and T are nonempty and disjoint.
 The production rules are the heart of a grammar; they
specify how the grammar transforms one string into
another, and through this they define a language
associated with the grammar.
 In our discussion we will assume that all production
rules are of the form x→y
 where x is an element of (V ∪ T)+ and y is in (V ∪ T)*
 The productions are applied in the following manner:
 Given a string w of the form w = uxv, we say that the
production x→y is applicable to this string, and we may
use it to replace x with y, obtaining the new string
z = uyv. We write w ⇒ z.
14
Grammar
Example 1
Consider the grammar: G = ({S}, {a, b}, S, P), with P
given by
S→aSb
S→λ, then
S⟹aSb⟹aaSbb⟹aabb, so we can write
S⟹* aabb
 The string aabb is a sentence in the language generated by G.
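The derivation S ⟹ aSb ⟹ aaSbb ⟹ aabb can be mimicked by string rewriting. A minimal sketch (the helper `derive` is ours, not from the slides); note that G generates exactly the strings aⁿbⁿ:

```python
def derive(n):
    """Apply S -> aSb n times, then S -> λ; mirrors S => aSb => aaSbb => ..."""
    s = "S"
    steps = [s]
    for _ in range(n):
        s = s.replace("S", "aSb", 1)  # one application of S -> aSb
        steps.append(s)
    s = s.replace("S", "", 1)         # final step: S -> λ
    steps.append(s)
    return steps

print(derive(2))  # ['S', 'aSb', 'aaSbb', 'aabb']
```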

15
Grammar
Example 2
 G =({S, A}, {a, b}, S, P), with production

S→Ab
A→aAb
A→λ
1. What are the tuples?
2. What strings are generated from the given productions?
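As a worked sketch of an answer to question 2 (the helper below is ours): applying A → aAb n times and then A → λ shows that this grammar generates the strings aⁿbⁿ⁺¹ for n ≥ 0.

```python
def derive(n):
    """S => Ab, then apply A -> aAb n times, then A -> λ."""
    s = "S".replace("S", "Ab")       # the only S-production: S -> Ab
    for _ in range(n):
        s = s.replace("A", "aAb", 1)  # one application of A -> aAb
    return s.replace("A", "", 1)      # final step: A -> λ

for n in range(3):
    print(derive(n))  # b, abb, aabbb
```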

16
Types of grammar
 There are different types of Grammars, examples:
 Linear Grammars
 Nonlinear Grammars

17
Types of grammars
 Linear grammars are grammars that have at most one
variable on the right side of any production.
 There are different types of linear grammars based on
the position of the variable on the right side of the
production.
 Nonlinear grammars are grammars that have more than
one variable on the right side of some production.

18
Right-Linear and Left-Linear Grammars

Definition:
 A grammar G =(V, T, S, P) is said to be right-linear if
all productions are of the form
A → xB,
A → x,
where A, B ∈ V, and x ∈ T*.
 A grammar is said to be left-linear if all productions are
of the form
A → Bx,
or
A → x.
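The right-linear condition can be checked mechanically: every right side must be a string of terminals optionally followed by a single variable. A sketch (representing P as a dict mapping each variable to its right sides is an assumption of this example):

```python
def is_right_linear(productions, variables, terminals):
    """Every right side must be terminals optionally followed by one variable."""
    for left, rights in productions.items():
        if left not in variables:
            return False
        for right in rights:
            body = right
            if body and body[-1] in variables:
                body = body[:-1]  # strip the single trailing variable
            if any(sym not in terminals for sym in body):
                return False
    return True

G = {"S": ["aS", "b", ""]}  # S -> aS | b | λ
print(is_right_linear(G, {"S"}, {"a", "b"}))            # True
print(is_right_linear({"S": ["Sa"]}, {"S"}, {"a"}))     # False (left-linear rule)
```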
19
Right-Linear and Left-Linear Grammars

 A regular grammar is one that is either right-linear or
left-linear.
 Note that in a regular grammar, at most one variable
appears on the right side of any production.
 Furthermore, that variable must consistently be either
the rightmost or leftmost symbol of the right side of
any production.

20
Noam Chomsky types of grammar
 Noam Chomsky gave a mathematical model of grammar which
is effective for writing computer languages.
 The four types of grammar according to Noam Chomsky are:
 Type 0 – unrestricted grammars
 Type 1 – context-sensitive grammars
 Type 2 – context-free grammars
 Type 3 – regular grammars
21
Chapter three: Regular Languages
Outlines
 Regular languages
 Finite automata
 Types of finite automata
 Regular language operations
 Conversion from NFA to DFA

23
Automata
 An automaton is an abstract model of a digital computer.
 As such, every automaton includes some essential features.
 It has a mechanism for reading input. It will be assumed that the
input is a string over a given alphabet, written on an input file,
which the automaton can read but not change.
 The input file is divided into cells, each of which can hold one
symbol.
 The input mechanism can read the input file from left to right, one
symbol at a time.
 The input mechanism can also detect the end of the input string (by
sensing an end-of-file condition).

24
Finite automata
 Our introduction to automata in the section above is
brief and informal.
 At this point, we have only a general understanding of
what an automaton is and how it can be represented by
a graph.
 To progress, we must be more precise, provide formal
definitions, and start to develop rigorous results.
 We begin with finite accepters, which are a simple,
special case of the general scheme introduced in
above section.

25
Finite automata
 This type of automaton is characterized by having no
temporary storage.
 Since an input file cannot be rewritten, a finite
automaton is severely limited in its capacity to
“remember” things during the computation.
 A finite amount of information can be retained in the
control unit by placing the unit into a specific state.
 But since the number of such states is finite, a finite
automaton can only deal with situations in which the
information to be stored at any time is strictly
bounded.
26
Finite Automata

27
Deterministic Finite Accepters (DFA)
 The first type of automaton that we are going to
study in detail is the finite accepter that is deterministic
in its operation.
 We start with a precise formal definition of
deterministic accepters.

28
Deterministic Finite Accepters (DFA)
 Definition: A deterministic finite accepter or dfa is
defined by the quintuple (5-tuple)
M = (Q, Σ, δ, q0, F), where
Q is a finite set of internal states,
Σ is a finite set of symbols, the input alphabet,
δ : Q × Σ → Q is a transition function,
q0 ∈ Q is the initial state,
F ⊆ Q is a set of final states.

29
Deterministic Finite Accepters (DFA)
Example 1
M = ({q0, q1, q2}, {0, 1}, δ, q0, {q1}),
Where δ is given by
δ (q0,0) = q0, δ (q0,1) = q1
δ(q1,0) = q0, δ (q1,1) = q2
δ (q2,0) = q2, δ(q2,1) = q1
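The transition function above can be executed directly. A minimal simulator sketch in Python (the dict encoding of δ is our own choice):

```python
def run_dfa(delta, start, finals, w):
    """Follow the transition function one input symbol at a time."""
    q = start
    for sym in w:
        q = delta[(q, sym)]
    return q in finals

# δ of Example 1: M = ({q0, q1, q2}, {0, 1}, δ, q0, {q1})
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
}
print(run_dfa(delta, "q0", {"q1"}, "01"))  # True:  q0 -> q0 -> q1
print(run_dfa(delta, "q0", {"q1"}, "11"))  # False: q0 -> q1 -> q2
```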

30
Deterministic Finite Accepters (DFA)
Example 2: Consider the dfa below

 In the above graph, we allowed the use of two labels on
a single edge. Such multiply labeled edges are shorthand
for two or more distinct transitions: the transition is
taken whenever the input symbol matches any of the
edge labels.
 State q2 is called a trap (dead) state.

31
Deterministic Finite Accepters (DFA)
Example 3: Find a deterministic finite accepter that
recognizes the set of all strings on Σ= {a,b} starting with
the prefix ab.
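One possible answer, sketched as a transition table in Python (the state names are our own): after reading a then b we enter an accepting sink state, and any deviation falls into a trap state.

```python
delta = {
    ("q0", "a"): "q1",   ("q0", "b"): "trap",
    ("q1", "b"): "q2",   ("q1", "a"): "trap",
    ("q2", "a"): "q2",   ("q2", "b"): "q2",    # accepting sink: prefix ab seen
    ("trap", "a"): "trap", ("trap", "b"): "trap",
}

def accepts(w):
    q = "q0"
    for sym in w:
        q = delta[(q, sym)]
    return q == "q2"  # q2 is the only final state

print(accepts("ab"), accepts("abba"), accepts("ba"), accepts("a"))
# True True False False
```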

32
Nondeterministic Finite Accepters (NFA)
 Finite accepters are more complicated if we allow them
to act non-deterministically.
 Non-determinism is a powerful but, at first sight,
unusual idea.
 We normally think of computers as completely
deterministic, and the element of choice seems out of
place. Nevertheless, non-determinism is a useful
notion, as we shall see as we proceed.

33
Definition of a Nondeterministic Accepter

 Non-determinism means a choice of moves for an
automaton. Rather than prescribing a unique move in
each situation, we allow a set of possible moves.
 Formally, we achieve this by defining the transition
function so that its range is a set of possible states.

34
Definition of a Nondeterministic Accepter
Definition
 A nondeterministic finite accepter or nfa is defined by
the quintuple (5-tuple)
M = (Q, Σ, δ, q0, F),
where
Q is a finite set of internal states,
Σ is a finite set of symbols, the input alphabet,
δ : Q × (Σ ∪ {λ}) → 2^Q is the transition function,
q0 ∈ Q is the initial state,
F ⊆ Q is a set of final states.

35
Definition of a Nondeterministic Accepter
Note
 There are three major differences between this
definition and the definition of a dfa.
 In a nondeterministic accepter, the range of δ is the
power set 2^Q, so that its value is not a single element of
Q but a subset of it.
 This subset defines the set of all possible states that can
be reached by the transition.
 If, for instance, the current state is q1, the symbol a is
read, and δ(q1, a) = {q0, q2},

36
Definition of a Nondeterministic Accepter
 then either q0 or q2 could be the next state of the nfa.
Also, we allow λ as the second argument of δ.
 This means that the nfa can make a transition without
consuming an input symbol.
 Although we still assume that the input mechanism can
only travel to the right, it is possible that it is stationary
on some moves.
 Finally, in an nfa, the set δ (q ,a) may be empty,
meaning that there is no transition defined for this
specific situation.
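The definition above can be executed directly: track the set of possible states and close it under λ-moves after each step. A sketch (the particular δ below, reusing δ(q1, a) = {q0, q2} from the text plus an invented λ-move, is only an illustration):

```python
def lambda_closure(states, delta):
    """All states reachable from the given set through λ-moves alone."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, ""), set()):  # "" stands for λ
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def run_nfa(delta, start, finals, w):
    current = lambda_closure({start}, delta)
    for sym in w:
        moved = set()
        for q in current:
            moved |= delta.get((q, sym), set())  # δ(q, a) may be empty
        current = lambda_closure(moved, delta)
    return bool(current & finals)  # accept if any possible state is final

delta = {("q1", "a"): {"q0", "q2"}, ("q0", "a"): {"q0"}, ("q2", ""): {"q0"}}
print(run_nfa(delta, "q1", {"q0"}, "a"))  # True
```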

37
Definition of a Nondeterministic Accepter

 Like dfa's, nondeterministic accepters can be represented
by transition graphs.
 The vertices are determined by Q, while an edge (qi, qj)
with label a is in the graph if and only if δ(qi, a) contains
qj.
 Note that since a may be the empty string, there can be
some edges labeled λ.

38
Equivalence of Deterministic and
Nondeterministic Finite Accepters
 We now come to a fundamental question. In what sense are dfa's
and nfa's different? Obviously, there is a difference in their
definition, but this does not imply that there is any essential
distinction between them.
 To explore this question, we introduce the concept of
equivalence between automata.
 Definition: Two finite accepters, M1 and M2, are said to be
equivalent if they both accept the same language, that is,
L(M1) = L(M2).
 As mentioned, there are generally many accepters for a given
language, so any dfa or nfa has many equivalent accepters.

39
Conversion of NFA to DFA
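The standard subset construction can be sketched as follows (λ-moves are omitted for brevity; the example NFA, accepting strings over {a, b} that end in ab, is our own):

```python
from itertools import chain

def nfa_to_dfa(delta, start, finals, alphabet):
    """Subset construction: each dfa state is a frozenset of nfa states."""
    start_set = frozenset([start])
    dfa_delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        S = todo.pop()
        for a in alphabet:
            # union of δ(q, a) over all nfa states q in S
            T = frozenset(chain.from_iterable(delta.get((q, a), ()) for q in S))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    # a subset is final iff it contains at least one nfa final state
    return dfa_delta, start_set, {S for S in seen if S & finals}

nfa = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}}
dd, q0, F = nfa_to_dfa(nfa, "q0", {"q2"}, "ab")

def run(q, w):
    for a in w:
        q = dd[(q, a)]
    return q in F

print(run(q0, "aab"), run(q0, "aba"))  # True False
```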

40
Context-Free Languages
Context-free language
 In the last chapter, we discovered that not all languages are
regular. While regular languages are effective in describing
certain simple patterns, one does not need to look very far for
examples of nonregular languages.
 The topic of context-free languages is perhaps the most
important aspect of formal language theory as it applies to
programming languages. Actual programming languages have
many features that can be described elegantly by means of
context-free languages.
 What formal language theory tells us about context-free
languages has important applications in the design of
programming languages as well as in the construction of
efficient compilers.

42
Context-free language
 In formal language theory, a context-free language is a
language generated by a context-free grammar.
 The set of context-free languages is identical to the set
of languages accepted by pushdown automata

43
Context-Free Grammars
 The productions in a regular grammar are restricted in
two ways:
 The left side must be a single variable, while the right
side has a special form.
 To create grammars that are more powerful, we must
relax some of these restrictions.
 By retaining the restriction on the left side, but
permitting anything on the right, we get context-free
grammars.

44
Context free grammar
Definition
A context-free grammar is defined by a 4-tuple G = (V, T, S, P), where
V = set of variables or non-terminals
T = set of terminals
S = start symbol
P = production rules of the form
A→x
where A ∈ V and x ∈ (V ∪ T)*
 A language L is said to be context-free if and only if there
is a context free grammar G such that L= L (G).

45
Context free language

46
Pumping lemma

47
Derivation

 A derivation is a sequence of replacements of
structure names by choices on the right hand
sides of grammar rules.

48
Ambiguity
 A grammar that produces more than one parse tree for a
sentence is called an ambiguous grammar.
 It produces more than one leftmost derivation or
 more than one rightmost derivation for the same sentence
(input).

 We should eliminate the ambiguity in the grammar
during the design phase of the compiler.
 An unambiguous grammar should be written to eliminate
the ambiguity.
 Ambiguous grammars (because of ambiguous operators) can
be disambiguated according to the precedence and
associativity rules.

49
Ambiguity: Example
 Example: The arithmetic expression grammar
E → E + E | E * E | ( E ) | id
 permits two distinct leftmost derivations for the

sentence id + id * id:
(a) (b)
E => E + E E => E * E
=> id + E => E + E * E
=> id + E * E => id + E * E
=> id + id * E => id + id * E
=> id + id * id => id + id * id

50
Ambiguity: example
E  E + E | E  E | ( E ) | - E | id
Construct parse tree for the expression:
id + id  id
E E E E

E + E E + E E + E

E  E id E  E

id id
E E E E

E  E E  E E  E

E + E E + E id
Which parse tree is correct?
id id
51
Ambiguity: example…
E  E + E | E  E | ( E ) | - E | id

Find a derivation for the expression: id + id  id


E
According to the grammar, both are correct.
E + E

id E  E
A grammar that produces more than one
id id
parse tree for any input sentence is said
to be an ambiguous grammar. E

E + E

E  E id

id id
52
Parsing
What is parsing?
 Parsing: To break a sentence down into its component
parts with an explanation of the form, function, and
syntactical relationship of each part.
 The syntax of a programming language is usually given
by the grammar rules of a context free grammar (CFG).

53
Parsing…
 The parser can be categorized into two groups:
 Top-down parser
 The parse tree is created top to bottom, starting from the
root to leaves.
 Bottom-up parser
 The parse tree is created bottom to top, starting from the
leaves to root.
 Both top-down and bottom-up parsers scan the input from
left to right (one symbol at a time).
 Efficient top-down and bottom-up parsers can be
implemented by making use of context-free grammars.
 LL parsers for top-down parsing
 LR parsers for bottom-up parsing

54
Derivation
 A derivation is a sequence of replacements of structure
names by choices on the right hand sides of grammar
rules.

 Example: E → E + E | E – E | E * E | E / E | -E
E→(E)
E → id
E ⇒ E + E means that E + E is derived from E
- we can replace E by E + E
- to do this, we must have the production rule E → E + E in our
grammar.
E ⇒ E + E ⇒ id + E ⇒ id + id: such a sequence of
replacements of non-terminal symbols is called a
derivation of id + id from E.
55
Parse tree
 A parse tree is a graphical representation of a derivation.
 It filters out the order in which productions are applied to
replace non-terminals.
 A parse tree corresponding to a derivation is a labeled
tree in which:
 the interior nodes are labeled by non-terminals,

 the leaf nodes are labeled by terminals, and

 the children of each internal node represent the
replacement of the associated non-terminal in one
step of the derivation.

56
Parse tree and Derivation
Grammar: E → E + E | E * E | ( E ) | - E | id
Let's examine this derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + id)
At each step the parse tree grows downward from the root E: first the node -E, then -(E), then -(E + E), and finally the leaves id and id.
This is a top-down derivation
because we start building the
parse tree at the top
Elimination of ambiguity
 These two derivations point out a problem with the grammar:
 The grammar does not have a notion of precedence, or implied order of
evaluation
Precedence
 Create a non-terminal for each level of precedence
 Isolate the corresponding part of the grammar
 Force the parser to recognize high-precedence subexpressions first
For algebraic expressions
 Multiplication and division first (level one)
 Subtraction and addition next (level two)
Associativity
 Left-associative: the next-level (higher-precedence) non-terminal
appears last in the production
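The layered grammar that results from these steps (the textbook construction, not taken from these slides) is E → E + T | E − T | T, T → T * F | T / F | F, F → ( E ) | id. A minimal recursive-descent evaluator sketching how the layering forces * and / to bind tighter and makes + and − left-associative:

```python
import re

# Layered grammar, one non-terminal per precedence level:
#   E -> E + T | E - T | T      (lowest precedence, left-associative)
#   T -> T * F | T / F | F
#   F -> ( E ) | num
def tokenize(s):
    return re.findall(r"\d+|[()+\-*/]", s)

def parse_e(toks):
    val = parse_t(toks)
    while toks and toks[0] in "+-":   # loop realizes left associativity
        op = toks.pop(0)
        rhs = parse_t(toks)
        val = val + rhs if op == "+" else val - rhs
    return val

def parse_t(toks):
    val = parse_f(toks)
    while toks and toks[0] in "*/":
        op = toks.pop(0)
        rhs = parse_f(toks)
        val = val * rhs if op == "*" else val / rhs
    return val

def parse_f(toks):
    tok = toks.pop(0)
    if tok == "(":
        val = parse_e(toks)
        toks.pop(0)  # the closing ")"
        return val
    return int(tok)

print(parse_e(tokenize("1 + 2 * 3")))    # 7: * recognized below +
print(parse_e(tokenize("(1 + 2) * 3")))  # 9: parentheses override
```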

58
Elimination of ambiguity
 Therefore, we can remove all undesirable productions
using the following sequence of steps:
1. Remove λ-productions.
2. Remove unit-productions.
3. Remove useless productions.

59
Two Important Normal Forms

 There are many kinds of normal forms we can establish
for context-free grammars. Some of these, because of
their wide usefulness, have been studied extensively.
We consider two of them briefly.

60
Chomsky Normal Form
 One kind of normal form we can look for is one in which the
number of symbols on the right of a production is strictly limited.
In particular, we can ask that the string on the right of a
production consist of no more than two symbols. One instance of
this is the Chomsky normal form.
 Definition: A context-free grammar is in Chomsky normal form
if all productions are of the form
A → BC
Or
A → a,
where A, B, C are in V, and a is in T.
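The Chomsky normal form condition is easy to check mechanically: every right side is either two variables or a single terminal. A sketch (the dict encoding of the productions is our own):

```python
def is_cnf(productions, variables, terminals):
    """Accept only A -> BC with B, C variables, or A -> a with a a terminal."""
    for left, rights in productions.items():
        for r in rights:
            two_vars = len(r) == 2 and all(s in variables for s in r)
            one_term = len(r) == 1 and r in terminals
            if not (two_vars or one_term):
                return False
    return True

G = {"S": ["AB", "a"], "A": ["SA", "b"], "B": ["b"]}
print(is_cnf(G, {"S", "A", "B"}, {"a", "b"}))   # True
print(is_cnf({"S": ["aS"]}, {"S"}, {"a"}))       # False: terminal mixed with variable
```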

61
Greibach Normal Form
 Another useful grammatical form is the Greibach normal form.
Here we put restrictions not on the length of the right sides of a
production, but on the positions in which terminals and variables
can appear.
 Arguments justifying Greibach normal form are a little
complicated and not very transparent.
 Similarly, constructing a grammar in Greibach normal form
equivalent to a given context-free grammar is tedious. We
therefore deal with this matter very briefly. Nevertheless,
Greibach normal form has many theoretical and practical
consequences.

62
Greibach Normal Form
Definition
 A context-free grammar is said to be in Greibach
normal form if all productions have the form
A → ax,
where a ∈ T and x ∈ V*
 If we compare this with the definition of s-grammars, we
see that the form A → ax is common to both Greibach
normal form and s-grammars, but Greibach normal
form does not carry the restriction that the pair (A, a)
occur at most once.
 This additional freedom gives Greibach normal form a
generality not possessed by s-grammars.
63
Pushdown Automata
Introduction
What is a Pushdown Automaton?
 A Pushdown Automaton (PDA) is a way to implement a
context-free grammar
 It is more powerful than an FSM
 An FSM has limited memory, but a PDA has more memory
 Pushdown Automaton = Finite State Machine + a stack
Stack operations
 A stack is a way we arrange elements one on top of another
 A stack has two basic operations
 PUSH: a new element is added on top of the stack
 POP: the top element of the stack is read and removed

65
Pushdown Automata
 A Pushdown Automaton has three components
 An input tape
 A finite control unit
 A stack with infinite size

66
Pushdown automata
Formal definition
 A pushdown automaton is formally defined by a 7-tuple
M = (Q, Σ, Γ, δ, q0, z, F),
where
Q is a finite set of internal states,
Σ is the input alphabet,
Γ is the stack alphabet,
δ : Q × (Σ ∪ {λ}) × Γ → finite subsets of Q × Γ* is the transition function,
q0 ∈ Q is the initial state,
z ∈ Γ is the stack start symbol,
F ⊆ Q is the set of final states.

67
Pushdown automata…
 The output of δ is a finite set of pairs (p, Y), where
p is the new state
Y is a string of stack symbols that replaces x at the top of
the stack
Example
If Y = λ then x is popped
If Y = x then the stack is unchanged
If Y = yz then x is replaced by z, and y is pushed onto the
stack
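These rules can be exercised on a concrete machine. Below is a sketch (the δ table and state names are our own, acceptance is by final state) of a PDA for L = {aⁿbⁿ : n ≥ 0}: it pushes one a per a read and pops one a per b read.

```python
# δ maps (state, input symbol or "" for λ, stack top) to a set of
# (new state, string that replaces the top of the stack) pairs.
delta = {
    ("q0", "a", "z"): {("q0", "az")},  # push a above the start symbol
    ("q0", "a", "a"): {("q0", "aa")},  # push one a per a read
    ("q0", "b", "a"): {("q1", "")},    # pop one a per b read
    ("q1", "b", "a"): {("q1", "")},
    ("q0", "", "z"): {("q2", "z")},    # accept the empty string
    ("q1", "", "z"): {("q2", "z")},    # all a's matched: accept
}

def accepts(w, state="q0", stack="z", finals=frozenset({"q2"})):
    """Depth-first search over the pda's nondeterministic choices."""
    if not w and state in finals:
        return True
    moves = []
    if stack:
        if w:  # consume one input symbol
            moves += [(w[1:], m) for m in delta.get((state, w[0], stack[0]), ())]
        # λ-moves leave the input untouched
        moves += [(w, m) for m in delta.get((state, "", stack[0]), ())]
    return any(accepts(rest, p, push + stack[1:], finals)
               for rest, (p, push) in moves)

print(accepts("aabb"), accepts("aab"))  # True False
```

Note that the search simply tries every pair in δ's output set, which is exactly the nondeterminism described above; a δ with λ-cycles would need a visited-configuration check.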

68
Nondeterministic Pushdown Automata

69
