# Introduction to theory of computation

## Course Objectives

1. To introduce the concepts and methods that underlie the formal (mathematical) study of computing machines
   - What is a computing machine?
   - How can we characterise and classify computing machines?
2. To present some of the basic results concerning the capabilities and limits of computing machines
   - Are there limits in principle to what can be computed?
     - Every program computes a function from its input (a string of bits) to its output (a string of bits). Since a string of bits may be viewed as a binary number, every program may be viewed as computing a function from N to N. But is every function from N to N computable?
     - Are all programming languages and computing machines equal (in principle)? Or are some more equal than others?
   - Are there limits in practice to what can be computed?
     - Are there computable problems which, no matter how clever an algorithm we devise, how efficient the language we write them in, or how 'next generation' the hardware, will still not finish on inputs of small size before the heat death of the universe?
     - How do we identify these problems?
3. To extend basic mathematical skills and to develop further logical and analytical skills directly related to Computer Science.
4. To provide a theoretical foundation for other Computer Science courses.

## Definition of Theory of Computation

Theory of computation is a branch of mathematics and computer science that deals with whether and how efficiently problems can be solved on a model of computation using an algorithm. Sipser (1996) avers that theory of computation is about the fundamental ideas related to computer hardware, software and certain applications.

Theory of computation is concerned with asking the following fundamental questions:

1. What are the limits of computation?
2. Are there problems which cannot be computed?
3. How do we model computation?

Prepared By Zanamwe N

The first two questions can only be addressed after the last question is addressed. Computation is modelled using languages and machines.

## Branches of theory of computation

Theory of computation is divided into two branches:

1. Computability theory
2. Complexity theory

In complexity theory, the objective is to classify problems as easy ones and hard ones; in computability theory the classification of problems is by those that are solvable and those that are not. Computability theory introduces several of the concepts used in complexity theory. Although both branches deal with formal models of computation, this course focuses only on computability theory.

Computability theory focuses on formulating mathematical models that describe with varying degrees of accuracy parts of computers, types of computers and similar machines. Note that computational models may be accurate in some ways and not in other ways. Unlike other models (SE, DB, DAA), computation models deal with all computers that exist, will exist and that can ever be dreamed of.

## Relevance of theory to practice

1. Theory provides conceptual tools that are used by practitioners in computer engineering. For example, when designing a new programming language for a specialised application you need an understanding of grammars. Further, an understanding of finite automata and regular expressions is useful with string searching and pattern matching, and NP-Complete theory helps us distinguish the tractable from the intractable.
2. Theory gives us a new viewpoint of computers, which are complex machines, that is, how to represent computation in forms that admit rigorous analysis and not merely execution. Further, theory shows us another elegant side of computation. This course can heighten our aesthetic sense and help us build more beautiful systems.
3. Theory expands your mind because studying it trains you in areas such as the abilities to think, to express yourself clearly and precisely, to solve problems and to know when you haven't solved a problem.

## Computer theory applications

Theory of computation provides us with many good applications, and hopefully you have already seen some of them. Finite state machines are used in string searching algorithms, compiler design, control unit design in computer architecture, and many other modelling applications. Context free grammars and their restricted forms are the basis of compilers and parsing. We do not focus on these applications. They are deep enough to require a separate course.

## Major areas of focus

1. Automata theory
2. Pushdown automata theory
3. Turing theory

### Automata theory

Is concerned with the definitions and properties of mathematical models of computation.

One model, called the finite automaton, is used in text processing, compilers, and hardware design. Another model, called the context-free grammar, is used in programming languages and artificial intelligence.

## Languages

The fact that our study is sometimes called theory of formal languages makes it imperative to study languages. It has already been indicated that TOC deals with asking the question, how do we model computation? And it has been indicated that computation is modelled using languages and machines. The other reason why we study languages is that languages are used to model computation.

A language is defined as a game of symbols with formal rules. The word formal means that all the rules for the language are explicitly stated in terms of what strings of symbols can occur.

Natural languages like English are made up of letters, words, sentences, paragraphs etc. Similarly, with computer languages, certain character strings are recognisable as words (END, DO, WHILE etc), certain strings of words are recognisable as commands and certain sets of commands become a program that can be translated into machine commands.

## Terminology

1. Alphabet (Γ or Σ) (gamma and sigma): Is a finite set of fundamental units out of which we build structures (Cohen, 1991) or any finite set of symbols. Example: Σ = {a, b, c, ..., z} or Γ = {0, 1}.
2. Symbols: Are members of an alphabet and usually denoted by small letters.
3. String over an alphabet: Is a finite sequence of symbols from that alphabet written adjacent to one another and not separated by commas. For example if Σ = {a, b, c, d} then aadc, cdabb, and adcd are strings over Σ.
4. Words: Are strings containing the symbols in some alphabet. Two words are considered the same if all their characters are the same and in the same order.
5. String length (|x| or length(x)): Refers to the number of symbols in a string.
6. Empty string or null string: Is a string of length zero and is denoted by Λ or ε. For clarity, the symbol Λ or ε is not allowed to be part of the alphabet for any language. For any word x in any language, if length(x) = 0 then x = Λ.
7. Language: Is a certain specified set of strings of characters from an alphabet. Is denoted by L.
8. Empty language (Φ): Is a language that has no words or strings.

Points of thought

1. Is there a difference between Φ and Λ (language without words and word without symbols)?
2. Is L + Λ = L? (+ is the union of sets operation)
3. Is L + Φ = L?
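The terminology above can be made concrete with a minimal Python sketch. The encoding is an assumption made for illustration only: symbols are single characters, words are Python strings, languages are Python sets, and the names `SIGMA`, `EMPTY_STRING`, `EMPTY_LANGUAGE` and `is_string_over` are hypothetical.

```python
# Alphabet: a finite set of symbols (here each symbol is one character).
SIGMA = {"a", "b", "c", "d"}

EMPTY_STRING = ""        # Λ: the word with no symbols
EMPTY_LANGUAGE = set()   # Φ: the language with no words


def is_string_over(word: str, alphabet: set) -> bool:
    """Return True if every symbol of `word` belongs to `alphabet`."""
    return all(symbol in alphabet for symbol in word)


# Strings over Σ from the text, plus one that uses a symbol outside Σ.
print(is_string_over("aadc", SIGMA))    # True
print(is_string_over("cdabb", SIGMA))   # True
print(is_string_over("abz", SIGMA))     # False

# String length |x| is the number of symbols; Λ has length 0.
print(len("aadc"), len(EMPTY_STRING))   # 4 0

# The points of thought, with + read as set union.
L = {"ab", "ba"}
print(EMPTY_LANGUAGE == {EMPTY_STRING})   # False: Φ has no words, {Λ} has one
print((L | {EMPTY_STRING}) == L)          # False: Λ was not already in L
print((L | EMPTY_LANGUAGE) == L)          # True: no new words were added
```

Modelling a language as a set makes the difference between Φ (an empty set) and Λ (an empty string that may or may not be a member of a set) visible directly in the types.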

Answers

1. There is a subtle but important difference between the word that has no letters and the language that has no words. It is false that Λ is a word in the language Φ since this language has no words at all.
2. If a language L does not contain Λ, then L + Λ is not the same as L.
3. L + Φ = L since no new words were added.

## Defining Languages

There are two types of language defining rules:

1. Rules that can be used to test whether a word is valid
2. Rules used to construct all the words in the language by some clear procedures

### Valid words

If a word is contained in a given language it is valid, otherwise it is invalid.

Question: Given the following languages:

- L1 = p^x q^y r^(x+y), where x and y range over all the natural numbers, and p^x denotes the string containing x successive copies of the symbol p
- L2 = p^x q^y r^(x-y), where x and y range over all the natural numbers, p^x denotes the string containing x successive copies of the symbol p, and x > y

For each language, list 5 valid and invalid words.

### Concatenation operation

Is used to join two or more strings; a concatenation is a string obtained by appending one string to the end of another. For example, if L1 = {good} and L2 = {boy}, then the concatenation L1L2 = {goodboy}. (The union L1 + L2 would instead be {good, boy}.)

### Reverse function

If c is a word in some language L, then reverse(c) is the same string of letters spelled backward, called the reverse of c, even if this backward string is not a word in L. Example: reverse(eert) = tree.

### Palindrome

Assume a new language Palindrome is defined over the alphabet Σ = {m, n}. Then Palindrome = {Λ, and all strings y such that reverse(y) = y}, so the words in Palindrome are {Λ, m, n, mm, nn, mmm, mnm, nmn, nnn, ...}.

Note that if you concatenate two words in Palindrome the obtained word is only sometimes in Palindrome: mm followed by mm gives mmmm, which is a palindrome, but m followed by nn gives mnn, which is not.

### Kleene Closure of an alphabet

Denoted by Σ*. Is defined as a language in which any string of letters from Σ is a word, even the null string. This notation is sometimes known as the Kleene star.

Example: if Σ = {b, c} then Σ* = {Λ, b, c, bb, bc, cb, cc, bbb, bbc, ...}.

The Kleene star is an operation that makes an infinite language of strings of letters out of an alphabet. The term infinite language means infinitely many words, each of a finite length.
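The reverse function, the Palindrome test and the Kleene closure of an alphabet can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original notes: the function names are hypothetical, and because Σ* is infinite the closure is only listed up to a chosen maximum length.

```python
from itertools import product


def reverse(word: str) -> str:
    """Spell the word backward."""
    return word[::-1]


def is_palindrome(word: str) -> bool:
    """A word is in Palindrome exactly when it equals its own reverse."""
    return word == reverse(word)


def kleene_closure(alphabet, max_length):
    """List the words of Σ* up to max_length, shortest first.

    Σ* itself is infinite, so only a finite part of it can be listed.
    """
    symbols = sorted(alphabet)
    words = []
    for n in range(max_length + 1):
        # product(..., repeat=0) yields one empty tuple, which gives Λ.
        for letters in product(symbols, repeat=n):
            words.append("".join(letters))
    return words


print(reverse("eert"))                 # tree
print(kleene_closure({"b", "c"}, 2))   # ['', 'b', 'c', 'bb', 'bc', 'cb', 'cc']

# Palindromes over {m, n} of length at most 3.
print([w for w in kleene_closure({"m", "n"}, 3) if is_palindrome(w)])
# ['', 'm', 'n', 'mm', 'nn', 'mmm', 'mnm', 'nmn', 'nnn']

# Concatenating two palindromes only sometimes gives a palindrome.
print(is_palindrome("mm" + "mm"), is_palindrome("m" + "nn"))   # True False
```

The empty Python string `''` plays the role of Λ in the printed lists.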

### Lexicographic ordering

Means that strings must be arranged in size order (words of shortest length first) and words of the same length must be put alphabetically. For example if Σ = {0, 1} then Σ* = {Λ, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, ...}.

## Closure of a set of words

The use of the Kleene star can be generalised to sets of words, not just sets of alphabet letters. If S is a set of words then S* is the set of all finite strings formed by concatenating words from S, where any word can be used as often as we like and where the null string is also included.

For example if S = {aa, aaa} then S* = {Λ plus all strings of more than one a} or {a^n for n = 0, 2, 3, 4, 5, ...} or {Λ, aa, aaa, aaaa, aaaaa, aaaaaa, ...}.

Example 1: if S = {00, 1} then S* = {Λ plus any word composed of factors of 00 and 1} = {Λ plus all strings of 0's and 1's in which 0's occur in even clumps} = {Λ, 1, 00, 11, 001, 100, 111, 0000, 0011, 1001, 1100, 1111, 00001, 00100, 00111, 10000, 10011, 11001, 11100, 11111, ...}. The string 0010001 is invalid since it has a clump of 0's of length 3.

Example 2: if S = {x, xy} then S* = {Λ plus any word composed of factors of x and xy} = {Λ plus all strings of x's and y's except those that start with y and those that contain a double y} = {Λ, x, xx, xy, xxx, xxy, xyx, xxxx, xxxy, xxyx, xyxx, xyxy, xxxxx, xxxxy, xxxyx, xxyxx, xxyxy, xyxxx, xyxxy, xyxyx, ...}.

### Proving the existence of a word in the closure

This is done by showing how a word can be written as a concatenation of words from the base set S. Sometimes this factoring is unique, sometimes it is not.

1. Using Example 2, prove the existence of xxyxxxyx in S*. Solution: factor the string as follows: (x) (xy) (x) (x) (xy) (x). These six factors are all in the set S so their concatenation is in S*.
2. Using Example 1, prove whether or not the string 1000011001110001 is in the closure of S.
3. Given that S = {aa, aaa}, prove whether aaaaaaa is in S*. The factors are: (aa) (aaa) (aa) or (aaa) (aa) (aa) or (aa) (aa) (aaa).
4. Determine the smallest power of a (> 1) that we cannot form out of factors of (aa) and (aaa).
5. For the same S = {aa, aaa}, prove that S* contains all a^n for n ≠ 1.

### Proof by constructive algorithm

Is a way of proving that something exists by showing how to create it. We proceed as follows. Assume here that we start making a list of how to construct the various powers of a. On this list we state how to form a^2, a^3, a^4, a^5, a^6 and so on. Assume that there are some powers of a we could not produce by concatenating factors of (aa) and (aaa). Since we can produce a^4, a^5 and a^6, the strings that we cannot produce must be large. Assume that we work our way successfully up to a^(n-1) but then we cannot figure out how to form a^n. Establish how a^(n-2) was formed and then concatenate another factor of aa in front of it, and you will have a^n. This contradicts the assumption, so every a^n with n ≠ 1 is in S*.
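The idea of proving membership in S* by exhibiting a factoring can be mechanised. The sketch below is one possible approach, not the method of the notes: it uses dynamic programming over prefixes, in the same spirit as the constructive argument (a factoring of a longer prefix is built from a factoring of a shorter one). The function name `factor` and the list encoding of S are assumptions made for illustration.

```python
def factor(word, base):
    """Return one factoring of `word` into words from `base`, or None.

    ok[i] holds a factoring of the prefix word[:i] when one exists.
    """
    ok = [None] * (len(word) + 1)
    ok[0] = []   # Λ is the concatenation of zero factors
    for i in range(1, len(word) + 1):
        for piece in sorted(base):
            j = i - len(piece)
            # Skip Λ as a factor; extend a known factoring of word[:j].
            if piece and j >= 0 and ok[j] is not None and word[j:i] == piece:
                ok[i] = ok[j] + [piece]
                break
    return ok[len(word)]


# Example 2 from the text: S = {x, xy}.
print(factor("xxyxxxyx", ["x", "xy"]))   # ['x', 'xy', 'x', 'x', 'xy', 'x']

# Example 1 from the text: S = {00, 1}; a clump of three 0's cannot be factored.
print(factor("1000011001110001", ["00", "1"]))   # None

# S = {aa, aaa}: every a^n except n = 1 can be factored (checked here up to 20).
print([n for n in range(21) if factor("a" * n, ["aa", "aaa"]) is None])   # [1]
```

When several factorings exist, as for aaaaaaa, the function returns only one of them; which one depends on the order in which the words of S are tried.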

Note: Λ is an element of L* for all languages L.

Sometimes the notation + instead of * is used to modify the concept of closure to refer to only the concatenation of some (not zero) strings from a set S. If Σ = {a} then Σ+ = {a, aa, aaa, aaaa, ...}. For any language S, S* = S+ + {Λ}; if S does not contain Λ, then S+ is S* with the word Λ removed.

If S = Φ (the empty set) then S* = {Λ}. This is not the same as: if S = {Λ}, then S* = {Λ}, which is also true but for a different reason, that is Λ = ΛΛ. Likewise |Λ| = 0 and |Φ| = 0, but the first counts the letters in the word Λ and the second counts the words in the language Φ.

Cohen (1991) notes that anyone who thinks that Λ is not confusing has missed something. It is already a problem and it gets worse later.

### Theorem 1

For any set S of strings, S* = S**.

Proof: Every word in S** is made up of factors from S*. Every factor from S* is made up of factors from S. Therefore, every word in S** is also a word in S*. This can be expressed as S** ⊆ S*. It can be generalised that for any set A we know that A ⊆ A*, since in A* we can choose as a word any one factor from A. So if we consider A to be our set S*, we have S* ⊆ S**. Together the two inclusions prove that S** = S*.

Illustration: if S = {xx, yyy} then S* is the set of strings where the x's occur in even clumps and the y's occur in groups of 3, 6, 9, ... Some strings in S* are xxyyyxxxx, yyy and yyyxx. If we concatenate these three elements of S* we get one big word in S** which is also in S*:

xxyyyxxxxyyyyyyxx = (xx) (yyy) (xx) (xx) (yyy) (yyy) (xx)

This is analogous to saying that if computers are made up of circuits and circuits are made up of logic gates then computers are made up of logic gates.

## Ways of Defining Languages

There are several ways of defining languages, notably:

1. Recursive definition of languages
2. Regular expressions
3. Finite automata
4. Transition Graph
5. Other

## Recursive definition

Is a method of defining sets and has three steps:

1. Specify some base objects in the set
2. Give rules for constructing more objects in the set from the ones we already know
3. Declare that no objects except those constructed in this way are allowed in the set.

Examples:

1. Recursive definition of a set of positive even numbers
   - Rule 1: 2 is in EVEN
   - Rule 2: if x and y are both in EVEN then so is x+y
2. Recursive definition of positive integers
   - Rule 1: 1 is in INTEGERS
   - Rule 2: if x is in INTEGERS so is x+1
3. Recursive definition of integers
   - Rule 1: 1 is in INTEGERS
   - Rule 2: if both x and y are in INTEGERS, then so are x+y and x-y
4. Recursive definition of factorial
   - Rule 1: 0! = 1
   - Rule 2: n! = n*(n-1)!
5. Recursive definition of polynomial. A polynomial is a finite sum of terms, each of which is in the form: a real number times a power of x (that may be x^0 = 1).
   - Rule 1: Any number is in POLYNOMIAL

- Rule 2: the variable x is in POLYNOMIAL
- Rule 3: if p and q are in POLYNOMIAL then so are p+q, p-q, pq and (p)

Show that 2x^2 + 3x - 10 is in POLYNOMIAL:

- By rule 1, 2 is in POLYNOMIAL
- By rule 2, x is in POLYNOMIAL
- By rule 3, (2)(x) is in POLYNOMIAL, call it 2x
- By rule 3, (2x)(x) is in POLYNOMIAL, call it 2x^2
- By rule 1, 3 is in POLYNOMIAL
- By rule 3, (3)(x) is in POLYNOMIAL
- By rule 3, 2x^2 + 3x is in POLYNOMIAL
- By rule 1, -10 is in POLYNOMIAL
- By rule 3, 2x^2 + 3x + (-10) = 2x^2 + 3x - 10 is in POLYNOMIAL

## REGULAR EXPRESSIONS

- Cohen (2001) defines REs as language defining symbols whereas Sipser (1996) defines them as expressions describing languages.
- Languages defined by REs are referred to as Regular Languages (RLs).
- REs are limited in capacity because there are some languages that cannot be defined by REs.
- A RL is one that can be defined by a RE.
- The value of a RE is a language.

### Formal Definition of a Regular Expression

Symbols that appear in REs include letters of the alphabet Σ, the symbol of the null string Λ, the symbol for the empty language Φ, parentheses, the star operator and the plus sign.

The set of regular expressions is defined as follows:

- Rule 1: every letter of the alphabet Σ can be made into a regular expression by writing it in bold face. Λ itself is a RE and so is Φ.
- Rule 2: if r1 and r2 are REs then so are:
  - (r1)
  - r1 r2
  - r1 + r2
  - r1*

- Rule 3: nothing else is a RE

Note that:

- r + Λ = Λ + r, but this is not always equal to r
- rΛ = Λr = r
- r + Φ = r
- rΦ = Φr = Φ

"but what is far less clear is exactly what Φ* should mean. We shall avoid this philosophical crisis by never using this symbolism and avoiding those who do" Cohen (2001)

### Moving from RE notation to set notation

We use the L operator and its rules are as follows:

- L(a) = {a}
- L(a+b) = L(a) + L(b)
- L(ab) = L(a)L(b)
- L(r*) = (L(r))*
- Language of Λ = L(Λ) = {Λ}
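The rules for the L operator are themselves a recursive definition, so they can be sketched as a recursive function. The sketch below is illustrative only: the nested-tuple encoding of regular expressions and the name `language` are assumptions, and because L(r*) is usually infinite the function returns only the words up to a chosen length.

```python
def language(r, max_len):
    """Words of L(r) with length <= max_len.

    `r` is a regular expression written as a nested tuple, an encoding
    assumed for this sketch:
    ("letter", "a"), ("null",), ("empty",),
    ("union", r1, r2), ("concat", r1, r2), ("star", r1).
    """
    kind = r[0]
    if kind == "letter":    # L(a) = {a}
        return {r[1]} if len(r[1]) <= max_len else set()
    if kind == "null":      # L(Λ) = {Λ}
        return {""}
    if kind == "empty":     # L(Φ) = Φ
        return set()
    if kind == "union":     # L(r1 + r2) = L(r1) + L(r2)
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "concat":    # L(r1 r2) = L(r1) L(r2)
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":      # L(r1*) = (L(r1))*
        base = language(r[1], max_len)
        words, frontier = {""}, {""}
        while frontier:     # keep appending factors until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - words
            words |= frontier
        return words
    raise ValueError(f"unknown expression: {r!r}")


b, NULL, EMPTY = ("letter", "b"), ("null",), ("empty",)

print(language(("union", b, EMPTY), 3))        # {'b'}  since r + Φ = r
print(language(("concat", b, NULL), 3))        # {'b'}  since rΛ = r
print(language(("concat", b, EMPTY), 3))       # set()  since rΦ = Φ
print(sorted(language(("union", b, NULL), 3))) # ['', 'b']  so r + Λ is not r here
print(sorted(language(("star", b), 3)))        # ['', 'b', 'bb', 'bbb']
```

Cutting every language off at `max_len` keeps each set finite, which is what lets the star case terminate; the identities in the note above can then be checked by comparing finite sets.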

Examples:

- L([…]) is a language that contains […]
- […] - strings with exactly 2 […]
- L((a+b)(a+b)(a+b)) - strings over a and b with 3 symbols
- […] - all strings over […]
- […] - all strings ending in […] and […]
- L((a+b)((a+b)(a+b))*) - all strings with an odd number of symbols
- L(((a+b)(a+b))*) - all strings with an even number of symbols
- […] only eliminates […] sequences of […] in which the […] precede the […], and the shortest string is […]
- […] this can be expressed as […], which is just shorthand, not a RE
- […] - accepts all strings over a and b with first and last symbols different
- […] with an a preceding […] with precisely one […]
- Give the RE for a language over a and b that accepts all strings with the first and last symbols different or second from last and second from first different.
- L(a*b*) - all strings over a and b in which all a's (if any) precede all b's (also if any)
- L((a*b*)*) - all strings over a and b
- L(a(a+b)*a + b(a+b)*b + a + b) - all strings over a and b that start and end with the same symbol

Note that:

- L(a*Φ) = Φ - concatenating the empty language to any language yields the empty language.
- r + Φ = r - adding the empty language to any other language will not change it.
- rΛ = r - appending the empty (null) string to any string will not change it.

But:

- r + Λ may not equal r; for example if r = b then L(r) = {b} but L(r + Λ) = {b, Λ}.
- rΦ may not equal r; if r = b, then L(r) = {b} but L(rΦ) = Φ. The two agree only when r = Φ.

Notice that the use of the plus sign is far from the normal meaning of addition in the algebraic sense. For plus as union, or plus as choice, the following all make sense:

- b* = b* + b*
- b* = b* + b* + b*
- b* = b* + bbb
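Several of the example expressions can be checked mechanically with Python's standard `re` module. This is a hedged sketch rather than part of the notes: Python writes the union operator as `|` where the notes write `+`, `re.fullmatch` is used so that the whole word must match, and the helper names `accepts` and `words_over_ab` are hypothetical. Each claim is tested only on strings over {a, b} of length at most 5.

```python
import re
from itertools import product


def accepts(pattern: str, word: str) -> bool:
    """True if the whole word matches the pattern (not just a prefix)."""
    return re.fullmatch(pattern, word) is not None


def words_over_ab(max_length: int) -> list:
    """All strings over {a, b} of length <= max_length, shortest first."""
    return ["".join(p) for n in range(max_length + 1)
            for p in product("ab", repeat=n)]


WORDS = words_over_ab(5)

# a*b*: all a's (if any) precede all b's (if any), so "ba" never occurs.
print(all(accepts("a*b*", w) == ("ba" not in w) for w in WORDS))   # True

# (a*b*)*: every string over a and b.
print(all(accepts("(a*b*)*", w) for w in WORDS))                   # True

# a(a+b)*a + b(a+b)*b + a + b: starts and ends with the same symbol.
SAME_ENDS = "a(a|b)*a|b(a|b)*b|a|b"
print(all(accepts(SAME_ENDS, w) == (w != "" and w[0] == w[-1])
          for w in WORDS))                                         # True

# Plus as union: b* + bbb describes the same language as b*.
print(all(accepts("b*|bbb", w) == accepts("b*", w) for w in WORDS))  # True
```

A bounded check like this is evidence, not a proof: it can expose a wrong expression quickly, but equality of two regular languages in general needs an argument that covers strings of every length.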

Also note that in algebra ab = ba, but in formal languages ab ≠ ba, and also that (ab)* ≠ a*b*.

## Languages associated with Regular Expressions

Below are rules that define the language associated with any regular expression:

1. Rule 1: the language that is associated with the regular expression that is just a single letter is that one-letter word alone, and the language associated with Λ is just {Λ}, a one word language.
2. Rule 2: if r1 is a regular expression associated with the language L1 and r2 is a regular expression associated with the language L2, then:
   a. The regular expression (r1)(r2) is associated with the product L1L2, that is, the language L1 times L2: Language((r1)(r2)) = L1L2
   b. The regular expression r1 + r2 is associated with the language formed by the union of the sets L1 and L2: Language(r1 + r2) = L1 + L2
   c. The language associated with the regular expression (r1)* is L1*, the Kleene closure of the set L1 as a set of words: Language(r1*) = L1*

The relationship between REs and RLs leaves open 2 questions:

- Is there an algorithm for determining whether different REs describe the same language?
- Is it true that every language can be described by a regular expression?

## Finite languages are Regular

Theorem: if L is a finite language (a language with only finitely many words), then L can be defined by a regular expression. In other words all finite languages are regular.

Proof: To make one RE that defines the language L, convert all words in L into bold face type and insert plus signs between them. For example, the RE that defines the language L = {ab, bb, abb, bbb} is ab + bb + abb + bbb, and if L = {a, aa, aaa, aaaa} the algorithm above gives the RE a + aa + aaa + aaaa. Here we need only to show that at least one RE exists. This trick only works for finite languages because with infinite languages the RE will be infinitely long, which is forbidden.
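The construction in the proof is short enough to sketch directly. The function name `finite_language_to_re` is hypothetical, and two choices are assumptions of this sketch rather than part of the notes: the words are listed shortest first, and the empty language and the empty word are written as Φ and Λ.

```python
def finite_language_to_re(words):
    """Build a regular expression for a finite language.

    Follows the proof above: list the words and insert plus signs
    between them.
    """
    if not words:
        return "Φ"   # the empty language has no words to list
    ordered = sorted(words, key=lambda w: (len(w), w))
    return " + ".join(w if w else "Λ" for w in ordered)


print(finite_language_to_re({"ab", "bb", "abb", "bbb"}))   # ab + bb + abb + bbb
print(finite_language_to_re({"a", "aa", "aaa", "aaaa"}))   # a + aa + aaa + aaaa
print(finite_language_to_re({"", "a"}))                    # Λ + a
```

The output grows with the number of words in L, which is the practical face of the remark that the trick cannot work for an infinite language.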

Prepared By Zanamwe N ©2011 .