VIGNAN’S LARA INSTITUTE OF TECHNOLOGY AND SCIENCE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Faculty : Mr. S Deva Kumar

Subject : Formal Languages and Automata Theory

Code :

Year & Branch : II CSE


Computer Science & Engineering Formal Languages and Automata Theory

UNIT – I

PRELIMINARIES

Languages:
A general definition of language must cover a variety of distinct categories:
natural languages, programming languages, mathematical languages, etc. The
notion of natural languages like English, Hindi, etc. is familiar to us.
Informally, language can be defined as a system suitable for expression of
certain ideas, facts, or concepts, which includes a set of symbols and rules to
manipulate these. The languages we consider in our discussion are
abstractions of natural languages. That is, our focus here is on formal
languages that need precise and formal definitions. Programming languages
belong to this category. We start with some basic concepts and definitions
required in this regard.

Symbol:
A symbol is an abstract entity. There is no formal definition for a symbol.
Examples: letters, digits, and special characters.

Alphabet:
A finite non-empty set of symbols is called an ‘alphabet’. It is denoted by ∑.
Example: ∑ = {a, b, …, z}, ∑ = {a, b}, ∑ = {0, 1}

String:
A string (or sentence) over an alphabet is a finite sequence of symbols from
that alphabet.
Example:
1. If ∑ = {a, b} is the alphabet, then ab and abb are valid strings, whereas
abc is not a string over ∑ because ‘c’ does not belong to ∑.
2. If ∑ = {0, 1} is the alphabet, then 00, 110 and 1011 are valid strings.
Note: The length of any nonempty string over the given alphabet is at least 1.

Length of a string:
The number of symbols in a string w is called its length, denoted by |w|.
Example: | 011 | = 3,  |11| = 2,  | b | = 1

It is convenient to introduce a notation ε (epsilon) for the empty string, which
contains no symbols at all. The length of the empty string ε is zero, i.e., |ε| = 0.

Assumption:
In theory of computation, there exists a string of length zero, called ‘epsilon’
and written ‘ε’. It is the identity for concatenation: for any string S,
S.ε = S
ε.S = S
ε.S.ε = S
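
The definitions so far can be checked with a minimal sketch, assuming symbols are single characters and strings are ordinary Python `str` values. The names `EPSILON` and `is_string_over` are illustrative and do not come from these notes.

```python
EPSILON = ""  # the empty string ε, modelled as Python's empty str

def is_string_over(w, alphabet):
    """Return True if every symbol of w belongs to the alphabet."""
    return all(symbol in alphabet for symbol in w)

sigma = {"a", "b"}
print(is_string_over("abb", sigma))  # abb uses only the symbols a and b
print(is_string_over("abc", sigma))  # c does not belong to the alphabet
print(len("011"), len(EPSILON))      # |011| and |ε|

s = "011"
# ε is the identity for concatenation: S.ε = S and ε.S = S
print(s + EPSILON == s and EPSILON + s == s)
```

Note that `is_string_over(EPSILON, sigma)` holds for every alphabet, which matches the convention that ε is a string over any ∑.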

Convention: We will use lower-case letters towards the beginning of the
English alphabet to denote symbols of an alphabet and lower-case letters
towards the end to denote strings over an alphabet. That is, a, b, c, … ∈ ∑
(symbols) and u, v, w, x, y, z are strings.

Some String Operations:

Let x = a₁a₂…aₙ and y = b₁b₂…bₘ be two strings. The concatenation of x
and y, denoted by xy, is the string a₁a₂…aₙb₁b₂…bₘ. That is, the
concatenation of x and y is the string that has a copy of x
followed by a copy of y without any intervening space between them.

Example: Concatenation of the strings 0110 and 11 is 011011, and
concatenation of the strings good and boy is goodboy.

Prefix: Any sequence of leading symbols of the given string is called a ‘prefix’.
Example: For the string ‘cse’, the possible prefixes are ε, c, cs, cse.

Suffix: Any sequence of trailing symbols of the given string is called a ‘suffix’.
Example: For the string ‘cse’, the possible suffixes are ε, e, se, cse.

Substring: Any sequence of consecutive symbols within the given string is called
a ‘substring’.
Example: For the string ‘computer’, the possible substrings are ε, com,
put, te, mp, ….
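
Concatenation, prefixes, suffixes and substrings can be illustrated with a short sketch, assuming strings are Python `str` values. The helper names `prefixes`, `suffixes` and `substrings` are hypothetical, chosen only for this example.

```python
def prefixes(w):
    """All prefixes of w, from the empty string up to w itself."""
    return [w[:i] for i in range(len(w) + 1)]

def suffixes(w):
    """All suffixes of w, from w itself down to the empty string."""
    return [w[i:] for i in range(len(w) + 1)]

def substrings(w):
    """All distinct substrings of w (runs of consecutive symbols), including the empty string."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

print("0110" + "11")          # concatenation of 0110 and 11
print(prefixes("cse"))        # the empty string appears as ''
print(suffixes("cse"))
print({"com", "put", "te", "mp"} <= substrings("computer"))
```

Every prefix and every suffix is also a substring, so `substrings(w)` contains both lists.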

Powers of Strings:

For any string x and integer n ≥ 0, we use xⁿ to denote the string formed by
sequentially concatenating n copies of x. We can also give an inductive
definition of xⁿ as follows:

xⁿ = ε, if n = 0; otherwise xⁿ = xxⁿ⁻¹

Example: If x = 011, then x³ = 011011011, x¹ = 011 and x⁰ = ε
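
The inductive definition translates almost literally into a recursive function. This is a sketch only; the name `power` is an assumption made for illustration.

```python
def power(x, n):
    """n-th power of the string x: the empty string if n = 0, otherwise x followed by the (n-1)-th power."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    if n == 0:
        return ""                 # base case: the zeroth power is the empty string
    return x + power(x, n - 1)    # inductive step

print(power("011", 3))
print(power("011", 1))
print(repr(power("011", 0)))      # repr makes the empty string visible as ''
```

In everyday Python the same result is obtained with `x * n`; the recursion is spelled out here only to mirror the definition.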

Powers of Alphabets:

We write ∑ᵏ (for some integer k ≥ 0) to denote the set of strings of length k with
symbols from ∑. In other words, ∑ᵏ = { w | w is a string over ∑ and |w| = k }.

Hence, for any alphabet, ∑⁰ denotes the set of all strings of length zero. That is,
∑⁰ = {ε}. For the binary alphabet {a, b} we have the following.

∑⁰ = {ε}
∑¹ = {a, b}
∑² = {aa, ab, ba, bb}
∑³ = {aaa, aab, aba, abb, baa, bab, bba, bbb}
∑ⁿ = {set of all strings of length ‘n’ possible over the alphabet}
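
The powers of an alphabet can be enumerated with the standard library. The sketch below assumes single-character symbols; the function name `sigma_power` is hypothetical.

```python
from itertools import product

def sigma_power(alphabet, k):
    """Set of all strings of length k with symbols from the alphabet."""
    return {"".join(symbols) for symbols in product(sorted(alphabet), repeat=k)}

sigma = {"a", "b"}
for k in range(4):
    # An alphabet with m symbols has m**k strings of length k.
    print(k, sorted(sigma_power(sigma, k)))
```

For k = 0 the product yields exactly one empty tuple, so the result is the set containing only the empty string, in agreement with the definition.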
The set of all strings over an alphabet ∑ is denoted by ∑*. That is,

∑* = ∑⁰ ∪ ∑¹ ∪ ∑² ∪ ∑³ ∪ … ∪ ∑ⁿ ∪ …

The set ∑* contains all the strings that can be generated by iteratively
concatenating symbols from ∑ any number of times.

∑* = set of all strings over ∑ including ‘ε’

Here ‘*’ is the Kleene closure operator.

The set of all nonempty strings over an alphabet ∑ is denoted by ∑+. That is,
∑+ = ∑* - {ε}
   = ∑¹ ∪ ∑² ∪ ∑³ ∪ … ∪ ∑ⁿ ∪ …
Here ‘+’ is the positive closure operator.
∑* = ∑+ ∪ {ε}

Reversal:

For any string w = a₁a₂…aₙ, the reversal of the string is wᴿ = aₙaₙ₋₁…a₂a₁.

Languages:
A language over an alphabet is a set of strings over that alphabet. Therefore, a
language L is any subset of ∑*. That is, any L ⊆ ∑* is a language.

Example:

1. ∅ is the empty language.
2. ∑* is a language for any ∑.
3. {ε} is a language for any ∑. Note that ∅ ≠ {ε}, because the language ∅
does not contain any string but {ε} contains one string of length zero.
4. The set of all strings over {0, 1} containing equal number of 0's and 1's.
5. The set of all strings over {a, b, c} that starts with a.
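
Infinite languages such as examples 4 and 5 are conveniently represented by membership tests rather than by listing their strings. The sketch below assumes that representation; all names in it are hypothetical.

```python
def equal_zeros_ones(w):
    """Membership test for example 4: strings over {0, 1} with as many 0's as 1's."""
    return set(w) <= {"0", "1"} and w.count("0") == w.count("1")

def starts_with_a(w):
    """Membership test for example 5: strings over {a, b, c} that start with a."""
    return set(w) <= {"a", "b", "c"} and w.startswith("a")

EMPTY_LANGUAGE = set()   # the empty language: contains no strings at all
ONLY_EPSILON = {""}      # the language whose single member is the empty string

print(equal_zeros_ones("0110"), equal_zeros_ones("011"))
print(starts_with_a("abc"), starts_with_a("bca"))
print(len(EMPTY_LANGUAGE), len(ONLY_EPSILON))   # the two languages differ in size
```

The empty string belongs to the language of example 4 (zero 0's and zero 1's) but not to the language of example 5, since it has no first symbol.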

Convention: Capital letters A, B, C, L, etc. with or without subscripts are
normally used to denote languages.

Set Operations on Languages: Since languages are sets of strings, we can apply
set operations to languages. Here are some simple examples (though there is
nothing new in them).

Union: A string x ∈ L₁ ∪ L₂ iff x ∈ L₁ or x ∈ L₂.

Example: {0, 11, 01, 011} ∪ {11, 01, 111} = {0, 11, 01, 011, 111}

Intersection: A string x ∈ L₁ ∩ L₂ iff x ∈ L₁ and x ∈ L₂.

Example: {0, 11, 01, 011} ∩ {1, 01, 110} = {01}

Complement: Usually, ∑* is the universe that a complement is taken with
respect to. Thus for a language L, the complement is L̄ (L bar) = { x ∈ ∑* | x ∉ L }.

Example: Let L = {x | |x| is even}. Then its complement is the language
{x ∈ ∑* | |x| is odd}.
Similarly we can define other usual set operations on languages like relative
complement, symmetric difference, etc.
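
For finite languages these operations map directly onto Python's built-in `set` type. The complement is taken with respect to an infinite universe, so the sketch below only approximates it by restricting attention to strings up to a chosen length; that bound, and the helper names, are assumptions of this example.

```python
from itertools import product

def all_strings_up_to(alphabet, max_len):
    """All strings over the alphabet of length 0..max_len (a finite slice of the universe)."""
    return {"".join(t)
            for k in range(max_len + 1)
            for t in product(sorted(alphabet), repeat=k)}

def complement_up_to(language, alphabet, max_len):
    """Strings of length <= max_len over the alphabet that are not in the language."""
    return all_strings_up_to(alphabet, max_len) - set(language)

A = {"0", "11", "01", "011"}
union = A | {"11", "01", "111"}          # set union
intersection = A & {"1", "01", "110"}    # set intersection

universe = all_strings_up_to({"0", "1"}, 3)
even = {w for w in universe if len(w) % 2 == 0}
odd = complement_up_to(even, {"0", "1"}, 3)

print(sorted(union), sorted(intersection))
print(all(len(w) % 2 == 1 for w in odd))   # the complement holds only odd-length strings
```

Relative complement and symmetric difference correspond to the `-` and `^` operators on sets in the same way.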

Reversal of a language:

The reversal of a language L, denoted as Lᴿ, is defined as: Lᴿ = { wᴿ | w ∈ L }.

Example: Let L = {0, 11, 01, 011}. Then Lᴿ = {0, 11, 10, 110}.
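
Reversal of a string and of a finite language can be sketched as follows, assuming strings are Python `str` values and languages are Python sets. The function names are illustrative.

```python
def reverse_string(w):
    """Reversal of a string: the same symbols in the opposite order."""
    return w[::-1]

def reverse_language(language):
    """Reversal of a language: the set of reversals of its strings."""
    return {reverse_string(w) for w in language}

L = {"0", "11", "01", "011"}
print(sorted(reverse_language(L)))
print(reverse_language(reverse_language(L)) == L)   # reversing twice gives back L
```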

Language concatenation:

The concatenation of languages L₁ and L₂ is defined as

L₁L₂ = { xy | x ∈ L₁ and y ∈ L₂ }.

Example: {a, ab}{b, ba} = {ab, aba, abb, abba}.
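
For finite languages, concatenation is a set comprehension over all pairs of strings. The sketch below uses the hypothetical name `concat`; it also checks that swapping the operands generally gives a different language, and how the empty language and the language containing only the empty string behave.

```python
def concat(l1, l2):
    """Concatenation of two languages: every string of l1 followed by every string of l2."""
    return {x + y for x in l1 for y in l2}

A = {"a", "ab"}
B = {"b", "ba"}
print(sorted(concat(A, B)))
print(concat(A, B) == concat(B, A))   # order of the operands matters
print(concat(A, set()))               # concatenating with the empty language
print(concat(A, {""}) == A)           # concatenating with the language holding only the empty string
```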


Note that,
1. L₁L₂ ≠ L₂L₁ in general.
2. L{ε} = {ε}L = L and L∅ = ∅L = ∅.
     
Kleene's Star Operation:
The Kleene star operation on a language L, denoted as L*, is defined as follows:

L* = L⁰ ∪ L¹ ∪ L² ∪ … (the union of Lⁿ over all n in N)

   = {x | x is the concatenation of zero or more strings from L}

Here L⁰ = {ε} and Lⁿ = LLⁿ⁻¹ for n > 0. Thus L* is the set of all strings
derivable by any number of concatenations of strings in L. It is also useful
to define

L+ = LL*, i.e., all strings derivable by one or more concatenations of strings in L.
That is,

L+ = L¹ ∪ L² ∪ L³ ∪ … (the union of Lⁿ over all n in N with n > 0)

Example: Let L = {a, ab}. Then we have,

L* = L⁰ ∪ L¹ ∪ L² ∪ …

   = {ε} ∪ {a, ab} ∪ {aa, aab, aba, abab} ∪ …

L+ = L¹ ∪ L² ∪ …

   = {a, ab} ∪ {aa, aab, aba, abab} ∪ …

Note: ε is in L* for every language L, including ∅.
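
The star of a language is usually infinite, so the sketch below computes only a finite approximation: the union of the powers of L up to a chosen bound. The bound `max_power` and all function names are assumptions made for illustration.

```python
def concat(l1, l2):
    """Concatenation of two languages."""
    return {x + y for x in l1 for y in l2}

def language_power(language, n):
    """n-th power of a language: the zeroth power holds only the empty string, each further power prepends L."""
    result = {""}
    for _ in range(n):
        result = concat(language, result)
    return result

def star_up_to(language, max_power):
    """Finite approximation of the Kleene star: union of powers 0..max_power."""
    out = set()
    for n in range(max_power + 1):
        out |= language_power(language, n)
    return out

def plus_up_to(language, max_power):
    """Finite approximation of the positive closure: union of powers 1..max_power."""
    out = set()
    for n in range(1, max_power + 1):
        out |= language_power(language, n)
    return out

L = {"a", "ab"}
print(sorted(language_power(L, 2)))
print(sorted(star_up_to(L, 2)))      # the empty string sorts first as ''
print(star_up_to(set(), 3))          # the star of the empty language
```

Since the zeroth power of any language contains only the empty string, the empty string is in the star of every language, including the empty one.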
