
Formal Languages and Theory of Automata

Chapter 1
Introduction
Definitions
• Symbol – An atomic unit, such as a digit, a character, or a lower-case letter; sometimes a whole word. [Formal language theory does not deal with the “meaning” of the symbols.]

• Alphabet – A finite set of symbols, usually denoted by Σ.

Examples: Σ = {0, 1}, Σ = {0, a, 4}, Σ = {a, b, c, d}

• String – A finite-length sequence of symbols, presumably from some alphabet.

Alphabets and strings
• A common way to talk about words, numbers, pairs of words, etc. is by representing them as strings.
• To define strings, we start with an alphabet. An alphabet is a finite set of symbols.
• Examples

Σ1 = {a, b, c, d, …, z}: the set of letters in English
Σ2 = {0, 1, …, 9}: the set of (base 10) digits
Σ3 = {a, b, …, z, #}: the set of letters plus the special symbol #
Σ4 = {(, )}: the set of open and closed brackets
Strings

A string over alphabet Σ is a finite sequence of symbols in Σ.

• The empty string will be denoted by ε (epsilon).


• Examples

abfbz is a string over Σ1 = {a, b, c, d, …, z}
9021 is a string over Σ2 = {0, 1, …, 9}
ab#bc is a string over Σ3 = {a, b, …, z, #}
))()(() is a string over Σ4 = {(, )}
Strings

Let: u = ε, w = 0110, y = 0aa, x = aabcaa, z = 111

concatenation: wz = 0110111
length: |w| = 4, |x| = 6, but |u| = 0
reversal: $y^R$ = aa0
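In a programming language these operations correspond directly to built-in string operations. A minimal Python sketch on the same strings, assuming Python's `+`, `len`, and slice reversal `[::-1]` as stand-ins for concatenation, length, and reversal (Python's empty string "" plays the role of ε):

```python
# Example strings from the slide; "" plays the role of the empty string ε.
u = ""
w = "0110"
y = "0aa"
x = "aabcaa"
z = "111"

concat_wz = w + z      # concatenation wz
len_w = len(w)         # |w|
len_x = len(x)         # |x|
len_u = len(u)         # |u| is 0 for the empty string
reverse_y = y[::-1]    # reversal y^R

print(concat_wz, len_w, len_x, len_u, reverse_y)
```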
Some special sets of strings
• Σ*: all strings of symbols from Σ
• Σ+: Σ* − {ε}
Example: Σ = {0, 1}
• Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, 001, …}
• Σ+ = {0, 1, 00, 01, 10, 11, 000, 001, …}
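Σ* is infinite, so a program can only list a finite part of it. A minimal sketch that lists every string over Σ up to a chosen length; the helper `strings_up_to` and its `max_len` bound are illustrative, not part of the slides:

```python
from itertools import product


def strings_up_to(sigma, max_len):
    """Return all strings over alphabet sigma of length 0..max_len,
    shortest first (a finite slice of Σ*)."""
    result = []
    for n in range(max_len + 1):
        # product(..., repeat=n) yields every length-n tuple of symbols;
        # for n = 0 it yields one empty tuple, i.e. the empty string ε.
        for symbols in product(sorted(sigma), repeat=n):
            result.append("".join(symbols))
    return result


sigma = {"0", "1"}
sigma_star = strings_up_to(sigma, 3)             # begins with "" (ε)
sigma_plus = [s for s in sigma_star if s != ""]  # Σ+ = Σ* − {ε}
print(sigma_star[:9])
```

Within each length the strings follow the sorted alphabet, which reproduces the order used in the listing of Σ* above.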

What is a Formal Language?

A (formal) language is a set of strings over an alphabet, i.e., any subset L of Σ*.

• Classes of formal languages include the regular, context-free, context-sensitive, recursive, and recursively enumerable languages.
• Formal languages are related to programming languages and
natural languages

Formal Language Examples:
Σ = {0, 1}
L = {x | x is in Σ* and x contains an even number of 0’s}

Σ = {0, 1, 2,…, 9, .}
L = {x | x is in Σ* and x forms a finite length real number}
= {0, 1.5, 9.326,…}

Σ = {a, b, c,…, z, A, B,…, Z}


L = {x | x is in Σ* and x is a CPP reserved word}
= {while, for, if, int, …}
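Because a language is just a set of strings, even an infinite language can be described by a membership test. A minimal sketch for the first example above (even number of 0's over {0, 1}); the name `in_even_zeros` is hypothetical:

```python
def in_even_zeros(x: str) -> bool:
    """Membership test for L = {x | x is in {0,1}* and x contains an even number of 0's}."""
    if any(c not in "01" for c in x):
        return False              # x is not a string over Σ = {0, 1}
    return x.count("0") % 2 == 0  # zero 0's counts as even


examples = ["", "1", "0", "00", "0101", "000"]
print([s for s in examples if in_even_zeros(s)])
```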

Formal Language Examples:

Σ = {CPP reserved words} ∪ {(, ), ., :, ;, …} ∪ {legal CPP identifiers}


L = {x | x is in Σ* and x is a syntactically correct CPP program}

Σ = {English words}
L = {x | x is in Σ* and x is a syntactically correct English sentence}

What is automata theory?

• Automata theory is the study of abstract computational devices.
• Abstract devices are (simplified) models of real computations.
• Computations happen everywhere: On your laptop, on your cell
phone, in nature, …
• Why do we need abstract models?

A simple “computer”
[Figure: a circuit with a battery, a switch, and a light bulb]

input: switch
output: light bulb
actions: flip switch
states: on, off
A simple “computer”
[Figure: the same circuit next to a state diagram with states off (start) and on]

input: switch
output: light bulb
actions: f for “flip switch”
states: on, off

The bulb is on if and only if there was an odd number of flips.
Another “computer”
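[Figure: a circuit with a battery, switches 1 and 2, and a light bulb, next to a state diagram with four states (the start state and two others labeled off, one labeled on) where inputs 1 and 2 move between states]

inputs: switches 1 and 2
actions: 1 for “flip switch 1”
actions: 2 for “flip switch 2”
states: on, off

The bulb is on if and only if both switches were flipped an odd number of times.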
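The four states in the diagram can be read as pairs recording whether each switch has been flipped an odd number of times. A minimal Python sketch of that reading, assuming the actions are given as a string of the characters '1' and '2' (this encoding and the name `bulb_is_on` are illustrative):

```python
def bulb_is_on(actions: str) -> bool:
    """Simulate the two-switch device. The state is a pair of parities:
    whether switch 1 and switch 2 have each been flipped an odd number of times."""
    odd1, odd2 = False, False      # start state: neither switch flipped
    for a in actions:
        if a == "1":
            odd1 = not odd1        # flip switch 1
        elif a == "2":
            odd2 = not odd2        # flip switch 2
        else:
            raise ValueError(f"unknown action {a!r}")
    # The bulb is on iff both switches were flipped an odd number of times.
    return odd1 and odd2


print(bulb_is_on("12"), bulb_is_on("1"), bulb_is_on("1212"), bulb_is_on("1222"))
```

Only four combinations of (odd1, odd2) are possible, which is why four states suffice no matter how long the input is.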
A design problem
[Figure: a battery and a light bulb connected to switches 1 to 5 through an unknown circuit marked ?]

Can you design a circuit where the light is on if and only if all the
switches were flipped exactly the same number of times?

A design problem

• Such devices are difficult to reason about, because they can be designed in an infinite number of ways.
• By representing them as abstract computational devices, or automata, we will learn how to answer such questions.
These devices can model many things
• They can describe the operation of any “small computer”, like the control component of an alarm clock or a microwave.
• They are also used in lexical analyzers to recognize well-formed expressions in programming languages:

ab1 is a legal name of a variable in C++
5u= is not
Different kinds of automata

• This was only one example of a computational device, and there are
others
• We will look at different devices, and look at the following questions:
• What can a given type of device compute, and what are its limitations?
• Is one type of device more powerful than another?

Some devices we will see
• finite automata: devices with a finite amount of memory. Used to model “small” computers.
• push-down automata: devices with infinite memory that can be accessed in a restricted way. Used to model parsers, etc.
• Turing machines: devices with infinite memory. Used to model any computer.

Preliminaries of automata theory
• How do we formalize the question

Can device A solve problem B?


• First, we need a formal way of describing the problems that we are
interested in solving

Problems

• Examples of problems we will consider:
• Given a word s, does it contain the subword “fool”?
• Given a number n, is it divisible by 7?
• Given a pair of words s and t, are they the same?
• Given an expression with brackets, e.g. (()()), does every left bracket match with a subsequent right bracket?
• All of these have “yes/no” answers.
• There are other types of problems that ask “Find this” or “How many of that”, but we won’t look at those.
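As a hint of why such yes/no problems fit devices with finite memory, here is a minimal sketch (not taken from the slides) for the question “is n divisible by 7?”: the decimal digits of n are read left to right and only the remainder modulo 7 is remembered, so seven states are enough. The function name is hypothetical.

```python
def divisible_by_7(n_digits: str) -> bool:
    """Decide 'is n divisible by 7?' from the decimal string of n,
    keeping only the remainder so far (one of 7 possible values)."""
    remainder = 0
    for d in n_digits:
        if d not in "0123456789":
            raise ValueError(f"not a decimal digit: {d!r}")
        # Appending digit d to a number with remainder r gives remainder (10*r + d) mod 7.
        remainder = (10 * remainder + int(d)) % 7
    return remainder == 0


print(divisible_by_7("49"), divisible_by_7("50"), divisible_by_7("1001"))
```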
Graphs and trees

• Graphs
• Definition: A graph (undirected graph) consists of:
a. A non-empty set V called the set of vertices,
b. A set E called the set of edges, and
c. A map Φ (phi) which assigns to every edge a unique unordered pair of vertices.

[Figure: an undirected graph with vertices v1, v2, v3, v4 and edges e1, …, e6]
e1 = {v1, v2}
e2 = {v1, v3}
e6 = {v2, v2} (a self loop)

June 27, 2022 Formal Language Theory 21


Graphs and trees: cont’d
• Definition: A directed graph (digraph) consists of:
a. A non-empty set V called the set of vertices,
b. A set E called the set of edges, and
c. A map Φ (phi) which assigns to every edge a unique ordered pair of vertices.

[Figure: a directed graph with vertices v1, v2, v3, v4 and edges e1, …, e5]
e1 = (v1, v2)
v1: a predecessor of v2
v2: a successor of v1


Graphs and trees: cont’d
• Definition: The degree of a vertex v in a graph (directed or undirected) is the number of edges with v as an end vertex.
Note that a self loop is counted twice when calculating the degree of a vertex.
Ex. In the previous graph, deg(v1) = ? deg(v2) = ?
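The degree definition can be checked mechanically once the map Φ is written down. The figure did not survive, so this sketch assumes the edge set e1 = {v1, v2}, e2 = {v1, v3}, e3 = {v2, v3}, e4 = {v3, v4}, e5 = {v2, v4}, e6 = {v2, v2}; e3, e4 and e5 are inferred from the path and circuit examples in these notes, not read off the figure.

```python
from collections import Counter

# Edge map Φ: edge name -> pair of end vertices.
# ASSUMED reconstruction of the figure (e3, e4, e5 inferred from later examples).
phi = {
    "e1": ("v1", "v2"),
    "e2": ("v1", "v3"),
    "e3": ("v2", "v3"),
    "e4": ("v3", "v4"),
    "e5": ("v2", "v4"),
    "e6": ("v2", "v2"),  # self loop
}


def degrees(edges):
    """Count, for each vertex, the edges having it as an end vertex.
    A self loop (u, u) adds 2 to deg(u), as the definition requires."""
    deg = Counter()
    for u, v in edges.values():
        deg[u] += 1
        deg[v] += 1  # for a self loop this is the second count of u
    return deg


deg = degrees(phi)
print(deg["v1"], deg["v2"], deg["v3"], deg["v4"])
```

Since every edge contributes 2 to the total, the degrees always sum to twice the number of edges.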

• Definition: A path in a graph (directed or undirected) is an alternating sequence of vertices and edges of the form $v_1 e_1 v_2 e_2 \ldots e_{n-1} v_n$, beginning and ending with vertices, such that $e_i$ has $v_i$ and $v_{i+1}$ as its end vertices and no edge or vertex is repeated in the sequence. The path is said to be from $v_1$ to $v_n$.

Ex. In the previous graph, $v_1 e_1 v_2 e_3 v_3 e_4 v_4$ is a path from $v_1$ to $v_4$.

Note that a path may be directed (if all the edges in the path have the same direction).
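The path definition can be turned into a direct check. A minimal sketch for undirected graphs, again assuming the reconstructed edge set e1 = {v1, v2}, e2 = {v1, v3}, e3 = {v2, v3}, e4 = {v3, v4}, e5 = {v2, v4}, e6 = {v2, v2} (e3 to e5 are inferred from the examples, not taken from the figure); `is_path` is a hypothetical helper name:

```python
# ASSUMED edge set for the example graph: edge name -> end vertices.
phi = {
    "e1": ("v1", "v2"),
    "e2": ("v1", "v3"),
    "e3": ("v2", "v3"),
    "e4": ("v3", "v4"),
    "e5": ("v2", "v4"),
    "e6": ("v2", "v2"),
}


def is_path(seq, edges):
    """Check the definition: seq alternates vertex, edge, vertex, ...,
    begins and ends with a vertex, each e_i joins v_i and v_{i+1},
    and no vertex or edge is repeated (undirected graphs only)."""
    if len(seq) % 2 == 0:
        return False  # v1 e1 v2 ... vn always has odd length
    vertices, edge_names = seq[0::2], seq[1::2]
    if len(set(vertices)) != len(vertices) or len(set(edge_names)) != len(edge_names):
        return False  # some vertex or edge repeats
    for i, e in enumerate(edge_names):
        if e not in edges:
            return False
        # e_i must have exactly v_i and v_{i+1} as its end vertices.
        if sorted(edges[e]) != sorted((vertices[i], vertices[i + 1])):
            return False
    return True


print(is_path(["v1", "e1", "v2", "e3", "v3", "e4", "v4"], phi))
```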


Graphs and trees: cont’d
• Definition: A graph (directed or undirected) is connected if there is a path between every pair of vertices.
Q. Are the previous two graphs connected?

• Definition: A circuit in a graph is an alternating sequence $v_1 e_1 v_2 e_2 \ldots v_{n-1} e_{n-1} v_1$ of vertices and edges, starting and ending with the same vertex, such that $e_i$ has $v_i$ and $v_{i+1}$ as end vertices (where $v_n$ is read as $v_1$) and no edge or vertex other than $v_1$ is repeated.
Ex. $v_2 e_3 v_3 e_4 v_4 e_5 v_2$ is a circuit in the previous graph.


Graphs and trees: cont’d
• Trees
• Definition: A graph (directed or undirected) is called a tree if it is connected and has no circuits.
Q. Are the previous two graphs trees?
• Properties of trees:
• In a tree there is one and only one path between every pair of vertices (nodes).
• A tree with n vertices has n − 1 edges.
• A leaf in a tree can be defined as a vertex of degree one.
• Vertices other than leaves are called internal vertices.
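The property “a tree with n vertices has n − 1 edges” gives a simple test: a connected graph with n vertices and exactly n − 1 edges has no circuits, so it is a tree. A minimal sketch using breadth-first search for connectivity; the helper names and the small example tree are hypothetical, and the slide graph uses the assumed edge set e1 = {v1, v2}, e2 = {v1, v3}, e3 = {v2, v3}, e4 = {v3, v4}, e5 = {v2, v4}, e6 = {v2, v2}:

```python
from collections import deque


def is_connected(vertices, edges):
    """Breadth-first search from one vertex; connected iff every vertex is reached."""
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(vertices)


def is_tree(vertices, edges):
    """Connected with exactly n - 1 edges (counting loops and parallel edges)."""
    return is_connected(vertices, edges) and len(edges) == len(vertices) - 1


def leaves(vertices, edges):
    """Vertices of degree one (a self loop would add 2)."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return {v for v in vertices if deg[v] == 1}


# HYPOTHETICAL small tree, and the ASSUMED slide graph.
tree_vertices = {"a", "b", "c", "d"}
tree_edges = [("a", "b"), ("a", "c"), ("c", "d")]
slide_vertices = {"v1", "v2", "v3", "v4"}
slide_edges = [("v1", "v2"), ("v1", "v3"), ("v2", "v3"),
               ("v3", "v4"), ("v2", "v4"), ("v2", "v2")]

print(is_tree(tree_vertices, tree_edges), is_tree(slide_vertices, slide_edges))
```

The edge list is kept as a list rather than a set so that parallel edges and self loops are counted.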


Graphs and trees: cont’d
• Definition: An ordered directed tree is a digraph satisfying the following conditions:
• There is one vertex, called the root of the tree, which is distinguished from all other vertices; the root has no predecessors.
• There is a directed path from the root to every other vertex.
• Every vertex except the root has exactly one predecessor.
(For the sake of simplicity, we refer to ordered directed trees simply as trees.)
• The number of edges in a path is called the length of the path.
• The height of a tree is the length of the longest path from the root.
• A vertex v in a tree is at level k if there is a path of length k from the root to the vertex v.
Q. What is the maximum possible level in a tree?
• There are several types of trees: binary, balanced binary, binary search tree, heap, general tree, …
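Because every vertex except the root has exactly one predecessor, the level of every vertex can be computed by a breadth-first walk from the root, and the height is the largest level. A minimal sketch on a hypothetical tree (not the one in the figure), stored as a map from each vertex to its children:

```python
from collections import deque

# HYPOTHETICAL example tree: vertex -> list of children.
children = {
    "r": ["a", "b"],
    "a": ["c", "d"],
    "b": [],
    "c": ["e"],
    "d": [],
    "e": [],
}


def levels(children, root):
    """Level of each vertex = length of the unique path from the root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in children.get(u, []):
            level[w] = level[u] + 1  # one more edge than the parent's path
            queue.append(w)
    return level


lv = levels(children, "r")
height = max(lv.values())  # length of the longest path from the root
print(lv, height)
```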


Graphs and trees: cont’d
[Figure: an example tree rooted at node 1]
Ex. 1. List the leaves.
2. List the internal nodes.
3. What is the length of the path from 1 to 9?
4. What is the height of the tree?

• Note: a path from vertex (node) $n_1$ to node $n_k$ can be simply expressed as the sequence of nodes $n_i$, $i = 1, \ldots, k$, such that $n_i$ is the parent (predecessor) of $n_{i+1}$ for $1 \le i < k$.


• A prefix a of a string w is a proper prefix of w if and only if a ≠ w.

• Substrings of a string
• A string obtained by removing a prefix and/or a suffix from a string w is called a substring of w.
• E.g., if w = abc, then b is a substring of w.
• Every prefix and suffix of a string w is a substring of w, but not every substring of w is a prefix or a suffix.
• For every string w, both w and λ are prefixes, suffixes, and substrings of w.
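With strings as Python strings (the empty string "" standing for λ), prefixes, suffixes and substrings are all slices. A minimal sketch with hypothetical helper names:

```python
def prefixes(w):
    """All prefixes of w, from λ up to w itself."""
    return [w[:i] for i in range(len(w) + 1)]


def suffixes(w):
    """All suffixes of w, from w itself down to λ."""
    return [w[i:] for i in range(len(w) + 1)]


def substrings(w):
    """Remove a prefix and/or a suffix: every slice w[i:j] with i <= j."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


w = "abc"
proper_prefixes = [p for p in prefixes(w) if p != w]
print(prefixes(w), suffixes(w), sorted(substrings(w)))
```

For w = abc, the substring b appears in `substrings(w)` but in neither the prefix list nor the suffix list, matching the remark above.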


Strings and languages: cont’d
• A terminal symbol is a unique, indivisible object used in the generation of strings.
• A nonterminal symbol is a unique but divisible object used in the generation of strings.
Ex. In English, a, b, A, B, etc. are terminals and the words boy, cat, dog, … are nonterminals.
In programming languages, a, A, :, ;, =, if, then, … are terminals.


Strings and languages: cont’d
• Languages
• Definition: A language, L, is a set (collection) of strings over a given alphabet, ∑.
• A language over ∑ is a subset of ∑*.

• A string in L is called a sentence or word.

Ex. ∑ = {0, 1}, ∑* = {λ, 0, 1, 00, 01, 10, 11, …}

L1 = {λ},
L2 = {0, 1, 01} over ∑
L3 = $\{a^n \mid n \ge 0\}$ over ∑ = {a}
L4 = {λ, 0, 00, 000, …} is a language over alphabet ∑ = {0}


Finite Automata

Example of a finite automaton

[Figure: a two-state automaton with states off (start) and on]

• There are states off and on; the automaton starts in off and tries to reach the “good state” on.
• What sequences of fs lead to the good state?
• Answer: {f, fff, fffff, …} = $\{f^n : n \text{ is odd}\}$
• This is an example of a deterministic finite automaton over alphabet {f}.
Deterministic finite automata

• A deterministic finite automaton (DFA) is a 5-tuple (Q, Σ, δ, q0, F) where
• Q is a finite set of states
• Σ (sigma) is an alphabet
• δ: Q × Σ → Q is a transition function
• q0 ∈ Q is the initial state
• F ⊆ Q is a set of accepting states (or final states).
• In diagrams, the accepting states will be denoted by double circles.
Deterministic finite automata

• At the initial time, it is assumed to be in the initial state q0, with its input mechanism on the leftmost symbol of the input string.
• During each move of the automaton, the input mechanism advances one position to the right, so each move consumes one input symbol.
• When the end of the string is reached, the string is accepted if the automaton is in one of its final states; otherwise the string is rejected.
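The 5-tuple and the left-to-right reading procedure translate almost line for line into code. A minimal sketch, assuming δ is stored as a dictionary keyed by (state, symbol); the class and field names are illustrative. The example instance is the switch automaton, with states off and on over the alphabet {f}:

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A DFA as the 5-tuple (Q, Σ, δ, q0, F).

    delta maps (state, symbol) -> state; it needs an entry for every
    pair in Q × Σ for δ to be a (total) function.
    """
    states: frozenset
    alphabet: frozenset
    delta: dict
    start: str
    accepting: frozenset

    def accepts(self, w: str) -> bool:
        q = self.start                  # begin in the initial state q0
        for a in w:                     # read the input left to right
            if a not in self.alphabet:
                raise ValueError(f"symbol {a!r} is not in the alphabet")
            q = self.delta[(q, a)]      # each move consumes one symbol
        return q in self.accepting      # accept iff we stop in a final state


# The switch automaton: states off and on over the alphabet {f}.
M = DFA(
    states=frozenset({"off", "on"}),
    alphabet=frozenset({"f"}),
    delta={("off", "f"): "on", ("on", "f"): "off"},
    start="off",
    accepting=frozenset({"on"}),
)

print([w for w in ["", "f", "ff", "fff"] if M.accepts(w)])
```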
Example

[Figure: state transition diagram. q0 has a self loop on 0 and an edge to q1 on 1; q1 has a self loop on 1 and an edge to q2 on 0; q2 has a self loop on 0, 1]

alphabet Σ = {0, 1}
states Q = {q0, q1, q2}
initial (start) state q0
accepting states F = {q0, q1}

Transition table (rows: states, columns: inputs)
state   0    1
q0      q0   q1
q1      q2   q1
q2      q2   q2

For every transition rule δ(qi, a) = qj, the graph has an edge (qi, qj) labeled a. The vertex associated with q0 is called the initial vertex, while those labeled with qf ∈ F are the final vertices.
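Reading off the table, this DFA appears to accept exactly the strings made of some 0s followed by some 1s (written 0*1* as a regular expression, a notation these slides have not introduced yet). A minimal sketch that checks this guess by brute force against Python's `re.fullmatch` on every string of length at most 8; agreement on short strings is evidence, not a proof:

```python
import re
from itertools import product

# Transition table of the example DFA (states q0, q1, q2; alphabet {0, 1}).
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q2", ("q1", "1"): "q1",
    ("q2", "0"): "q2", ("q2", "1"): "q2",  # q2 is a trap state
}
start, accepting = "q0", {"q0", "q1"}


def accepts(w):
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in accepting


# Compare with the regular expression 0*1* on every string of length <= 8.
agree = all(
    accepts(w) == (re.fullmatch(r"0*1*", w) is not None)
    for n in range(9)
    for w in ("".join(t) for t in product("01", repeat=n))
)
print(agree)
```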
Language of a DFA

The language of a DFA (Q, Σ, δ, q0, F) is the set of all strings over Σ that, starting from q0 and following the transitions as the string is read left to right, will reach some accepting state.

M: [Figure: the two-state automaton with states off (start) and on]

• Language of M is {f, fff, fffff, …} = $\{f^n : n \text{ is odd}\}$
Examples
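[Figure: state diagrams of three DFAs over {0, 1}; the first two have states q0 and q1, the third is the DFA with states q0, q1, q2 from the earlier example]

What are the languages of these DFAs?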
Examples
• Construct a DFA that accepts the language

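L = {010, 1} (Σ = {0, 1})
• Answer

[Figure: DFA with start state q, accepting states q010 and q1, and a dead state qdie]
δ(q, 0) = q0, δ(q, 1) = q1
δ(q0, 1) = q01, δ(q01, 0) = q010
All other transitions (δ(q0, 0), δ(q01, 1), and every transition out of q010, q1, and qdie) go to qdie.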
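A sketch of this answer DFA in Python, using the state names from the diagram. As a design choice, only the transitions that lead toward 010 or 1 are listed; every missing (state, symbol) pair is treated as a move to the dead state qdie:

```python
# Transitions of the DFA for L = {010, 1}; any pair not listed goes to qdie.
delta = {
    ("q", "0"): "q0",
    ("q", "1"): "q1",
    ("q0", "1"): "q01",
    ("q01", "0"): "q010",
}
START = "q"
ACCEPTING = {"q010", "q1"}
DEAD = "qdie"


def accepts(w: str) -> bool:
    state = START
    for a in w:
        # A missing entry means "go to qdie"; qdie has no entries, so it loops on 0, 1.
        state = delta.get((state, a), DEAD)
    return state in ACCEPTING


print(accepts("010"), accepts("1"), accepts("0101"), accepts(""))
```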
Examples

• Construct a DFA over alphabet {0, 1} that accepts all strings that end
in 101

40
Examples

• Construct a DFA over alphabet {0, 1} that accepts all strings that end
in 101

• Hint: The DFA must “remember” the last 3 bits of the string it is
reading
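One possible construction that follows the hint: let the state be the last three bits read (or fewer, at the start). A minimal sketch; the function name is hypothetical:

```python
def ends_in_101(w: str) -> bool:
    """DFA following the hint: the state is the last (up to) 3 bits read,
    giving 1 + 2 + 4 + 8 = 15 possible states ('', '0', ..., '111')."""
    state = ""                     # start state: nothing read yet
    for a in w:
        if a not in "01":
            raise ValueError(f"symbol {a!r} is not in {{0, 1}}")
        state = (state + a)[-3:]   # keep only the last 3 bits
    return state == "101"          # the single accepting state


print(ends_in_101("0101"), ends_in_101("1011"))
```

This uses 15 states; a minimal DFA for the same language needs only 4, tracking how much of 101 has just been matched.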

Reading Assignment

• Mathematical Preliminaries
• sets,
• relations,
• functions,
• sequences,
• graphs,
• trees.