You are on page 1of 42

Lecture # 1

CS-312
Theory of Automata and Formal
Languages
Books

1. Introduction to Computer Theory, by Daniel I. Cohen,


Latest Edition
2. An Introduction to Formal Languages and Automata,
Peter Linz, 6th edition,
3. Reference Book: Introduction to Languages and
Theory of Computation, by J. C. Martin, McGraw
Hill, Latest Edition
Agenda 1
 Objective of Course
 Introduction
 Theory of Automata
 Evolution of languages
 Type of Languages
 Difference between Formal and Natural languages
 Alphabets, Strings, Words
 Valid and invalid alphabet
 Length of a String, Reverse of a String
Objective of Course
 This course is about the fundamental capabilities and
limitations of computers.
 This theory is very much relevant to practice, for example,
in the design of new programming languages, compilers,
string searching, etc., etc.
 This course helps you to learn problem solving skills.
 Every time we introduce a new machine, we will learn
its language; and every time we develop a new language,
we will try to find a machine that corresponds to it.
Introduction
 Psychologists, mathematicians, engineers and some of
the first computer scientists shared a common
interest:
 To model the human thought process.
 Whether in the brain or in a computer.
 Warren McCulloch and Walter Pitts, two
neurophysiologists, were the first to present a
description of finite automata in 1943.
What does Theory of Automata means?
 The word “Theory” means that this subject is more
mathematical subject and less practical.
 It is not like other courses such as programing but
this subject is foundation of many other practical
subjects.
 Automata is Greek letter.
 It is plural of automaton, and it means “something
that works automatically”
 It accepts input, produce output, many have some
temporary storage and can make decisions in
transforming the input into the output.
Theory of Automata
 Theory of Automata is the study of:
• Abstract machines ('mathematical' machines or systems)
• and computational problems (languages) that can be solved using
these machines.
 These abstract machines are called automata.
 This subject play major role in:
• Theory of Computation
• Compiler Construction
• Formal Verification
• Defining Computer Languages
• Parsing
Introduction to Languages
 Language: Collection of different types of inputs through which a
machine is operated is said to be language of machine
 Basically, a language is comprises of three different types of entities:
letters, words, sentences
 Letters are from a finite alphabet { a, b, c, . . . , z }
 Words are made up of certain combinations of letters from the
alphabet.
Not all combinations of letters lead to a valid English word.
 Sentences are made up of certain combinations of words.
Not all combinations of words lead to a valid English sentence.
 So we see that some basic units are combined to make bigger units.
Introduction to Languages
 There are two types of languages
• Informal Language (Semantic language)
• Formal Language (Syntactic language)
 Informal/Semantic Language
• Concerned with the interpretation or meaning of a sentence
(what output to produce in context of machines)
• Affected by ambiguity the most.
 Formal/Syntactic Language
 Defines rules for combining the units to form valid sentences
(computer programs in context of machines)
Informal languages
 Natural languages are generally defined informally
 Human brain
 Capable to understand incoherent even invalid sentences.
 You mangoes like
 Rectify grammatical errors etc.
 Resolve ambiguity
 Interpret according to context
 Supporting aids such as Facial expressions and body language etc.
How to communicate with machines ?
 Need a language: what sort
 Machines don’t have human mind though may have its
partial limitation
 Would fail on incorrect or ambiguous input
 Thus need a precise, explicit and universal definition of
communication language
Formal languages
 Rules defined explicitly and clearly
 No ambiguities
 Lets the machine
 Interpret an input uniformly every time. i.e.
always produces same output for a particular
input
 Explicitly reject invalid input
Formal Languages
 Need uniformly understandable notation
 Representations
 Alphabet
 ={a,b,….z.} ={A,…Z}
 Binary Numbers
 ∑ = {0,1}
Whole Numbers
 ∑ = {0,1,2,3,4,5,6,7,8,9}
Alphabets
 Definition:
A finite non-empty set of symbols (letters), is called an
alphabet. It is denoted by Σ ( Greek letter sigma).
 Example:
Σ={a,b}
Σ={0,1} //important as this is the language
//which the computer understands.
Σ={i,j,k}
Strings

 Definition:
Concatenation of finite symbols from the alphabet is
called a string.
 Example:
If Σ= {a,b} then
{a, abab, aaabb, ababababababababab,..}
NOTE:

EMPTY STRING or NULL STRING


 Sometimes a string with no symbol at all is used, denoted
by (Small Greek letter Lambda) λ or (Capital Greek
letter Lambda) Λ, is called an empty string or null string.
 The capital lambda will mostly be used to denote the
empty string, in further discussion.
Words

 Definition:
Words are strings belonging to some language.
Example:
If Σ= {x} then a language L can be defined as
L={xn : n=1,2,3,…..} or L={x,xx,xxx,….}
Here x,xx,… are the words of L
NOTE:

 Words are strings that belong to some specific language.

 All words are strings, but not all strings are


words.
Alphabets Guideline
 The following three important rules for defining
alphabets for a language:
• Should not contain empty symbol Λ
• Should be finite
• Should not be ambiguous
Valid/In-valid Alphabets
 While defining an alphabet, an alphabet may contain
letters consisting of group of symbols for example
Σ1= {B, aB, bab}.

 Now consider an alphabet


Σ2= {B, Ba, bab} and

a string BababB.
Valid/In-valid alphabets

String : BababB , Σ2= {B, Ba, bab, d}


This string can be factored in two different ways
 (Ba), (bab), (B)
 (B), (abab), (B)
Which shows that the second group cannot be identified
as a string, defined over
Σ = {a, b}.
This is due to ambiguity in the defined alphabet Σ2
Valid/In-valid alphabets
 As when this string is scanned by the compiler
(Lexical Analyzer), first symbol B is identified as a
letter belonging to Σ,
 While for the second letter the lexical analyzer
would not be able to identify,
 So while defining an alphabet it should be kept in
mind that ambiguity should not be created.
Remarks:
 While defining an alphabet of letters consisting of
more than one symbols, no letter should be started
with the letter of the same alphabet i.e. one letter
should not be the prefix of another. {B,Ba}
 However, a letter may be ended in the letter of same
alphabet i.e. one letter may be the suffix of
another.{B,aB}
Conclusion
 Σ1= {B, aB, bab, d}
 Σ2= {B, Ba, bab, d}

Σ1 is a valid alphabet while Σ2 is an in-valid alphabet.


Ambiguity Example
 Σ1= {A, aA, bab, d}.
 Σ1= {A, Aa, bab, d}.
Σ1 is valid alphabet while Σ2 is an invalid alphabet.
Similarly,
 Σ1= {a, ab, ac}.
 Σ1= {a, ba, ca}.
In this case, Σ1 is invalid alphabet while Σ2 is a valid
alphabet.
Length of Strings
 Definition:
The length of string s, denoted by |s|,
is the number of letters in the string.
 Example:
Σ={a,b}
s=ababa
Tokenize s = (a), (b),(a),(b),(a)
|s|=5
Length of Strings
 Example:
Σ= {B, aB, bab, d}
s=BaBbabBd
Tokenizing=(B), (aB), (bab), (B), (d)
|s|=5
One important point to note here is that aB has length 1
not 2, Similarly bab has length 1 not 3.
 Example:
If s=xxxx then length of s is 4
Length(428) is 3
Length(Λ) is 0
Reverse of a String
 Definition:
The reverse of a string s denoted by Rev(s) or sr, is
obtained by writing the letters of s in reverse order.
 Example:
If s=abc is a string defined over Σ={a,b,c}
then Rev(s) or sr = cba
Reverse of a String
 Example:
Σ= {B, aB, bab, d}
s=BaBbabBd
Tokenize=(B),(aB),(bab),(B),(d)
Rev(s)=dBbabaBB
Note:
Wrong Rev(s)= dBbabBaB
Agenda 2
 Introduction to defining languages
 Recursive definition of a language
 Regular Expression
 Kleene Closure
 Union and Concatenation
Introduction to Defining Languages
 In theory of Automata languages can be defined in
different ways , such as
 Descriptive definition
 Recursive definition
 Using Regular Expressions(RE)
 Using Finite Automaton(FA) etc.
Descriptive definition of language
The language is defined, describing the conditions
imposed on its words.
 Example:
The language L of strings of odd length, defined over
Σ={a}, can be written as
L={a, aaa, aaaaa,…..}
 Example:
The language L of strings that does not start with a,
defined over Σ={a,b,c}, can be written as
L={b, c, ba, bb, bc, ca, cb, cc, …}
Example of Descriptive Language
 Example:
The language L of strings of length 2, defined over
Σ={0,1,2}, can be written as
L={00, 01, 02,10, 11,12,20,21,22}
 Example:
The language L of strings ending in 0, defined over
Σ ={0,1}, can be written as
L={0,00,10,000,010,100,110,…}
Example of Descriptive Language
 Example: The language EQUAL, of strings with number of
a’s equal to number of b’s, defined over Σ={a,b}, can be
written as
L= {Λ ,ab, aabb, abab,baba,abba,…}
 Example: The language EVEN-EVEN, of strings with even
number of a’s and even number of b’s, defined over
Σ={a,b}, can be written as
L={Λ, aa, bb, aaaa,aabb,abab, abba, baab, baba, bbaa,
bbbb,…}
Example of Descriptive Language
 Example: The language {anbn }, of strings defined over
Σ={a,b}, as
L={an bn : n=1,2,3,…}, can be written as
L={ab, aabb, aaabbb,aaaabbbb,…}

 Example: The language {anbnan }, of strings defined over


Σ={a,b}, as
L={an bn an: n=1,2,3,…}, can be written as
L= {aba, aabbaa, aaabbbaaa,aaaabbbbaaaa,…}
Example of Descriptive Language
 Example:
 The language PRIME, of strings defined over
Σ={a}, as
p
L={a : p is prime}, can be written as
L={aa,aaa,aaaaa,aaaaaaa,aaaaaaaaaaa…}
An Important language
 PALINDROME:
The language consisting of Λ and the strings s
defined over Σ such that Rev(s)=s.
It is to be denoted that the words of
PALINDROME are called palindromes.
 Example:For Σ={a,b},
PALINDROME={Λ , a, b, aa, bb, aaa, aba, bab,
bbb, ...}
Kleene Star Closure

 Given an alphabet Σ, then the Kleene Star Closure of


the alphabet Σ, denoted by Σ*, is the collection of all
strings defined over Σ, including Λ.
 It is to be noted that Kleene Star Closure can be
defined over any set of strings.

38
Examples
 If Σ = {x}
Then Σ* = {Λ, x, xx, xxx, xxxx, ….}
 If Σ = {0,1}
Then Σ* = {Λ, 0, 1, 00, 01, 10, 11, ….}
 If Σ = {a, b, c}
Then Σ* = {Λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb,
cc, aaa, aab, …… }

39
Note

 Languages generated by Kleene Star Closure of set of


strings, are infinite languages.
 (By infinite language, it is supposed that the language
contains infinite many words, each of finite length).

40
PLUS Operation (+)
 Plus Operation is same as Kleene Star Closure except that it
does not generate Λ (null string), automatically.
Example:
 If Σ = {0,1}
Then Σ+ = {0, 1, 00, 01, 10, 11, ….}
 If Σ = {a, b, c}
Then Σ+ = {a, b, c, aa, ab, ac, ba, bb, bc,….}

41
Remark
 It is to be noted that Kleene Star can also be operated on any
string i.e. a* can be considered to be all possible strings
defined over {a}, which shows that a* generates
{Λ, a, aa, aaa, …}
 It may also be noted that a+ can be considered to be all
possible non empty strings defined over {a}, which shows
that a+ generates
{a, aa, aaa, aaaa, …}

42

You might also like