Professional Documents
Culture Documents
Code :
UNIT – I
PRILIMINARIES
Languages:
A general definition of language must cover a variety of distinct categories:
natural languages, programming languages, mathematical languages, etc. The
notion of natural languages like English, Hindi, etc. is familiar to us.
Informally, language can be defined as a system suitable for expression of
certain ideas, facts, or concepts, which includes a set of symbols and rules to
manipulate these. The languages we consider for our discussion is an
abstraction of natural languages. That is, our focus here is on formal
languages that need precise and formal definitions. Programming languages
belong to this category. We start with some basic concepts and definitions
required in this regard.
Symbol:
A symbol is an abstract entity. There is no formal definition for a symbol.
Examples: letters, digits and special characters etc.
Alphabet:
A finite non-empty set of symbols is called ‘alphabet’. It is denoted by ∑ .
Example: ∑ ={a,b,…z}, ∑ ={a,b}, ∑ ={0,1}
String:
A sentence over an alphabet is a finite sequence of the symbols from an
alphabet.
Example:
1.. In ∑={a,b} is the alphabet
ab, abb are valid strings, where as abc is not a string because ‘c’
doesn’t belongs to ∑
2
Computer Science & Engineering Formal Languages and Automata Theory
Length of a string:
The number of symbols in a string w is called its length, denoted by |w|.
Example: | 011 | = 3, |11| = 2, | b | = 1
Assumption:
In theory of computation, there exists a string of length zero called as ‘epsilon’
i.e. ‘ ∈ ’.
S. ∈ =S
∈ .S=S
∈ .S. ∈ =S
Convention: We will use small case letters towards the beginning of the
English alphabet to denote symbols of an alphabet and small case letters
3
Computer Science & Engineering Formal Languages and Automata Theory
Prefix: Any sequence of leading symbols over the given string is called ‘prefix’.
Example: For the string ‘cse’, the possible prefixes are ∈ ,c, cs, cse.
Suffix: Any sequence of trailing symbols over the given string is called ‘suffix’.
Example: For the string ‘cse’, the possible suffixes are ∈, e, se, cse
Substring: Any sequence of symbols over the given string is called ‘substring’.
Example: For the string ‘computer’ the possible substrings are ∈, com,
put, te, mp,….
Powers of Strings:
For any string x and integer , we use to denote the string formed by
sequentially concatenating n copies of x. We can also give an inductive
definition of as follows:
= ∈, if n = 0 ; otherwise
∈
Example: If x = 011, then = 011011011, = 011 and x =0
Powers of Alphabets:
We write (for some integer k) to denote the set of strings of length k with
Hence, for any alphabet, denotes the set of all strings of length zero. That is,
4
Computer Science & Engineering Formal Languages and Automata Theory
∑ *= ∑0 ¿ ∑1 ¿ ∑2 ¿ ∑3….∑n
The set contains all the strings that can be generated by iteratively
concatenating symbols from any number of times.
∈
∑*= set of all strings over ∑ including ‘ ’
The set of all nonempty strings over an alphabet is denoted by ∑ +=. That is,
∑+= ∑1 ¿ ∑2 ¿ ∑3….∑n
Here ‘+’ is the positive closure operator
∑*=∑+ ¿ ∈
Reversal:
Languages:
A language over an alphabet is a set of strings over that alphabet. Therefore, a
language L is any subset of ∑*. That is, any L *is a language.
Example:
5
Computer Science & Engineering Formal Languages and Automata Theory
Set Operations on Languages: Since languages are set of strings we can apply
set operations to languages. Here are some simple examples (though there is
nothing new in it).
Example: {0, 11, 01, 011} {11, 01, 111} = {0, 11, 01, 011, 111}
Reversal of a language:
Example: Let L = {0, 11, 01, 011}. Then = {0, 11, 10, 110}.
Language concatenation:
= { xy | and }.
6
Computer Science & Engineering Formal Languages and Automata Theory
L*= (Union n in N)
L+= LL*, i.e., all strings derivable by one or more concatenations of strings in L.
That is
=
L*=