
University of California, San Diego Department of Computer Science and Engineering Course: CSE 105 − Theory of Computability 2004

Summer Session I July 21, 2004

Instructor: Adriana Palacio TAs: Ragesh Jaiswal Vadim Lyubashevsky Saurabh Panjwani

Problem Set 2 Solutions
1. Exercise 2.1. The parse trees are in Figure 1.

a. E ⇒ T ⇒ F ⇒ a
b. E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ a + T ⇒ a + F ⇒ a + a
c. E ⇒ E + T ⇒ E + T + T ⇒ T + T + T ⇒ F + T + T ⇒ a + T + T ⇒ a + F + T ⇒ a + a + T ⇒ a + a + F ⇒ a + a + a
d. E ⇒ T ⇒ F ⇒ (E) ⇒ (T) ⇒ (F) ⇒ ((E)) ⇒ ((T)) ⇒ ((F)) ⇒ ((a))

Figure 1: Parse trees for Exercise 2.1 a, b, c, d.
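The derivations above can be replayed mechanically. The following is a minimal Python sketch, not part of the original solutions. It assumes only the rules that actually appear in the derivations (E → E+T | T, T → F, F → (E) | a); the function name leftmost_derivation and its parameters are hypothetical. It searches breadth-first for a leftmost derivation of a target string.

```python
from collections import deque

# Rules that appear in the derivations above (an assumed fragment of the
# textbook grammar; rules not used in this exercise are left out).
RULES = {
    "E": ["E+T", "T"],
    "T": ["F"],
    "F": ["(E)", "a"],
}


def leftmost_derivation(target, start="E", max_forms=100_000):
    """Breadth-first search for a leftmost derivation of `target`.

    Returns the list of sentential forms from `start` to `target`,
    or None if no derivation is found within the search budget.
    """
    queue = deque([[start]])
    seen = {start}
    while queue and len(seen) <= max_forms:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        # Locate the leftmost variable in the current sentential form.
        idx = next((i for i, ch in enumerate(form) if ch in RULES), None)
        if idx is None:
            continue  # all terminals, but not the target
        for rhs in RULES[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # No rule here shrinks a form, so forms longer than the
            # target can never lead to it and are pruned.
            if len(new) <= len(target) and new not in seen:
                seen.add(new)
                queue.append(path + [new])
    return None


if __name__ == "__main__":
    print(" => ".join(leftmost_derivation("a+a")))
```

Calling leftmost_derivation("a+a+a") or leftmost_derivation("((a))") reproduces parts c and d in the same way. The length-based pruning is only valid because none of the assumed rules has an empty right-hand side.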


2. Exercise 2.3 a, b, ... , m.

a. The variables are {R, S, T, X}. The terminals are {a, b}. The start variable is R.
b. {ab, ba, aab}
c. {ε, aa, bb}
d. False
e. True
f. False
g. True
h. True
i. False
j. True
k. True
l. False
m. All strings over {a, b} that are not palindromes.
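Answer m can be sanity-checked by brute force. The sketch below is not part of the original solutions. It assumes the grammar of Exercise 2.3 is R → XRX | S, S → aTb | bTa, T → XTX | X | ε, X → a | b; this is quoted from memory of the textbook and is not reproduced in these solutions. The helper names language_up_to and non_palindromes are hypothetical.

```python
from itertools import product

# Assumed grammar of Exercise 2.3 (an assumption: recalled from the
# textbook, not stated in this document).
RULES = {
    "R": ["XRX", "S"],
    "S": ["aTb", "bTa"],
    "T": ["XTX", "X", ""],
    "X": ["a", "b"],
}


def language_up_to(n, start="R"):
    """All terminal strings of length <= n derivable from `start`."""
    seen, stack, words = {start}, [start], set()
    while stack:
        form = stack.pop()
        # Expand the leftmost variable; every derivable string has a
        # leftmost derivation, so nothing is missed.
        idx = next((i for i, ch in enumerate(form) if ch in RULES), None)
        if idx is None:
            words.add(form)
            continue
        for rhs in RULES[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # Only T can vanish, and a form holds at most one T, so a form
            # longer than n + 1 cannot yield a word of length <= n.
            if len(new) <= n + 1 and new not in seen:
                seen.add(new)
                stack.append(new)
    return {w for w in words if len(w) <= n}


def non_palindromes(n):
    """All strings over {a, b} of length <= n that are not palindromes."""
    return {
        "".join(p)
        for k in range(n + 1)
        for p in product("ab", repeat=k)
        if p != p[::-1]
    }


if __name__ == "__main__":
    print(language_up_to(6) == non_palindromes(6))
```

Passing start="T" or start="X" explores what those variables derive, which is one way to double-check answers such as e and i under the same assumed grammar.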

3. Exercise 2.4 b, f.

b. Let L = { w | w starts and ends with the same symbol }. Let G = (V, Σ, R, S) where
• V = {S, T} (there are two variables, one of which is the start variable S)
• Σ = {0, 1} (the problem says to assume this for all parts)
• The set R of rules contains the following eight rules:
    S → 0T0 | 1T1 | 0 | 1
    T → TT | 0 | 1 | ε
Grammar G generates L.

Comment: From the variable T, one can derive any string in Σ*. Therefore, from S one can derive any string whose first and last symbols are the same. Note that the first and last symbols of the string 0 are the same, and likewise for the string 1. Therefore, these must be derivable. On the other hand, the empty string has no symbols, so its first and last symbols are not the same and it should not be derivable.

When you provide grammars in your solutions, use the above template for the specification. Do not merely list the rules; you must also state the set V of variables, the set Σ of terminals, and the start variable, and say that the grammar is the 4-tuple of these.

f. Let L = { w | w = w^R, that is, w is a palindrome }. Let G = (V, Σ, R, S) where
• V = {S} (there is only one variable, namely the start variable S)


• Σ = {0, 1} (the problem says to assume this for all parts)
• The set R of rules contains the following five rules:
    S → 0S0 | 1S1 | 0 | 1 | ε
Grammar G generates L.

4. Exercise 2.6 b, c.

b. Let L = { a^n b^n | n ≥ 0 }. We give a grammar for the complement L̄ of L. Let G = (V, Σ, R, S) where
• V = {S, S1, S2, S3, T}
• Σ = {a, b}
• The set R of rules contains the following 14 rules:
    S  → S1 | S2 | S3
    S1 → a | aS1 | aS1b
    S2 → b | S2b | aS2b
    S3 → TbaT
    T  → TT | a | b | ε
Grammar G generates L̄.

Comment: We want the grammar to generate all strings w ∈ {a, b}* not consisting of some number of a's followed by the same number of b's. To simplify the task, we break it up into parts depending on the structure of w. We note that w is not of the form a^n b^n if and only if one of the following is true:
(1) w is of the form a^i b^j for some i > j ≥ 0, or
(2) w is of the form a^i b^j for some j > i ≥ 0, or
(3) w contains ba as a substring.
In other words, the language in question is the union of the three languages L1, L2, L3, where
(1) L1 = { a^i b^j | i > j ≥ 0 }
(2) L2 = { a^i b^j | j > i ≥ 0 }
(3) L3 = { w ∈ {a, b}* | ba is a substring of w }.
The grammar G above was designed so that
    { w ∈ {a, b}* | S1 ⇒* w } = L1
    { w ∈ {a, b}* | S2 ⇒* w } = L2
    { w ∈ {a, b}* | S3 ⇒* w } = L3.
The first rule of the grammar G implies that L(G) = L1 ∪ L2 ∪ L3. We remark that the high-level idea of the construction follows the idea behind the proof of the closure of the class of CFLs under union given in lecture. Also notice that the last rule of this grammar is very similar to the last rule of the grammar of Problem 3.b, above, and has the same functionality, namely to generate any string. This illustrates how one can reuse pieces of known grammars in building new ones.
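The case analysis behind the grammar of part b (a string is not of the form a^n b^n exactly when it lies in L1, L2, or L3) can be checked exhaustively on short strings. The following Python sketch is not part of the original solutions, and the predicate names are hypothetical. It tests membership in L1, L2, L3 directly from their definitions rather than by simulating the grammar.

```python
from itertools import product


def is_anbn(w):
    """True iff w = a^n b^n for some n >= 0."""
    n = len(w) // 2
    return w == "a" * n + "b" * n


def split_counts(w):
    """Return (i, j) if w = a^i b^j, else None."""
    i = len(w) - len(w.lstrip("a"))  # number of leading a's
    j = len(w) - i
    return (i, j) if w == "a" * i + "b" * j else None


def in_L1(w):
    c = split_counts(w)
    return c is not None and c[0] > c[1]


def in_L2(w):
    c = split_counts(w)
    return c is not None and c[1] > c[0]


def in_L3(w):
    return "ba" in w


def decomposition_holds(max_len):
    """Check: w is not a^n b^n  iff  w is in L1, L2, or L3."""
    for k in range(max_len + 1):
        for p in product("ab", repeat=k):
            w = "".join(p)
            if (not is_anbn(w)) != (in_L1(w) or in_L2(w) or in_L3(w)):
                return False
    return True


if __name__ == "__main__":
    print(decomposition_holds(10))
```

This only confirms the set identity on strings up to the chosen length; it is a sanity check, not a proof, and it does not verify that S1, S2, S3 derive exactly L1, L2, L3.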

c. Let L = { w#x | w^R is a substring of x for w, x ∈ {0, 1}* }. Let G = (V, Σ, R, S) where
• V = {S, T, U}
• Σ = {0, 1, #}
• The set R of rules contains the following 8 rules:
    S → TU
    T → 0T0 | 1T1 | #U
    U → UU | 0 | 1 | ε
Grammar G generates L.

Comment: We want the grammar to generate all strings s ∈ {0, 1, #}* such that s = w#u w^R v for some w, u, v ∈ {0, 1}*. Observe that the rule X → 0X0 | 1X1 | ε allows one to derive all strings in { w w^R | w ∈ {0, 1}* }. From variable U one can derive any string in {0, 1}*. Thus from variable T one can derive all strings of the form w#u w^R, where w, u ∈ {0, 1}*. From variable S one can then derive all strings of the form w#u w^R v, where w, u, v ∈ {0, 1}*.

5. Exercise 2.8

The first derivation of the sentence is as follows:
    SENTENCE ⇒ NOUN-PHRASE VERB-PHRASE
    ⇒ CMPLX-NOUN VERB-PHRASE
    ⇒ ARTICLE NOUN VERB-PHRASE
    ⇒ the NOUN VERB-PHRASE
    ⇒ the girl VERB-PHRASE
    ⇒ the girl CMPLX-VERB
    ⇒ the girl VERB NOUN-PHRASE
    ⇒ the girl touches NOUN-PHRASE
    ⇒ the girl touches CMPLX-NOUN PREP-PHRASE
    ⇒ the girl touches ARTICLE NOUN PREP-PHRASE
    ⇒ the girl touches the NOUN PREP-PHRASE
    ⇒ the girl touches the boy PREP-PHRASE
    ⇒ the girl touches the boy PREP CMPLX-NOUN
    ⇒ the girl touches the boy with CMPLX-NOUN
    ⇒ the girl touches the boy with ARTICLE NOUN
    ⇒ the girl touches the boy with the NOUN
    ⇒ the girl touches the boy with the flower

This derivation corresponds to the interpretation that the flower is something that the "boy" has (for example, the boy could be holding the flower in his hand). How the girl touches him (with her hand, with her elbow, with another flower, etc.) is unspecified by this interpretation.

The second derivation works as follows:
    SENTENCE ⇒ NOUN-PHRASE VERB-PHRASE
    ⇒ CMPLX-NOUN VERB-PHRASE
    ⇒ ARTICLE NOUN VERB-PHRASE
    ⇒ the NOUN VERB-PHRASE
    ⇒ the girl VERB-PHRASE
    ⇒ the girl CMPLX-VERB PREP-PHRASE
    ⇒ the girl VERB NOUN-PHRASE PREP-PHRASE
    ⇒ the girl touches NOUN-PHRASE PREP-PHRASE
    ⇒ the girl touches CMPLX-NOUN PREP-PHRASE
    ⇒ the girl touches ARTICLE NOUN PREP-PHRASE
    ⇒ the girl touches the NOUN PREP-PHRASE
    ⇒ the girl touches the boy PREP-PHRASE
    ⇒ the girl touches the boy PREP CMPLX-NOUN
    ⇒ the girl touches the boy with CMPLX-NOUN
    ⇒ the girl touches the boy with ARTICLE NOUN
    ⇒ the girl touches the boy with the NOUN
    ⇒ the girl touches the boy with the flower

This derivation corresponds to the interpretation that the flower is something that the "girl" is using to carry out her action (i.e., to touch the boy).

6. Problem 2.25

Comment: The solution for this problem has been written in discrete steps that could be followed in proceeding towards the solution. Our recommendation is that a reader who has only partially succeeded in solving this problem should view each step as a hint and try to get the complete solution hint by hint.

(1) Your first step should be to figure out the kind of strings that can be derived from the variable Y alone (this is because the strings that are contained in the language are related to whatever can be derived from Y). If you spend some time thinking, you should be able to see that every possible string in the set {a, b}* (including ε) can be derived from Y (try writing out the strings that can be obtained by applying the rules for Y and you'll see this).

(2) Using the above observation, figure out the kind of strings that can be derived from S. Look

at the simpler rules first. Any string starting in a b can be derived from S (using the rule S → bY). Similarly, any string ending in an a can be derived from S (using the rule S → Ya). What are the strings in {a, b}* that are not covered by these two kinds of strings? The strings which both start in an a and end in a b, and the string ε. Also observe that the string ε cannot be derived from S by the application of any of the 3 rules for S. We conclude that the language L(G) must at least contain any nonempty string which is not of the form axb (where x ∈ {a, b}*).

(3) But hey! Look at the first rule for S. It says S → aSb, meaning that some strings starting in an a and ending in a b can be derived from S. The rule S → aSb gives us the impression that many strings beginning with an a and ending in a b can also be derived from S. But can all such strings in {a, b}* be derived from it? That in turn depends on what S can derive. We saw in step (2) above that any string that isn't ε and that doesn't start in an a and end in a b can surely be derived from S. Which ones can't be derived? Well, can you derive the string ab from S? NO. Since we can't derive the empty string from S, we can't use the rule S → aSb in deriving the string ab. Similarly, can you derive the string aabb from S? NO. Since we can't derive ab from S, we can't use the rule S → aSb in order to derive aabb.

(4) Applying this argument recursively, we conclude that any string that is formed by concatenating a sequence of a's with a sequence of b's of equal length cannot be derived from S. This means that L(G) = {a, b}* \ { a^n b^n | n ≥ 0 }. Note that this notation captures the fact that ε is not in the language.

(5) What is the complement of L(G)? The language { a^n b^n | n ≥ 0 }. We've seen this (and similar) language(s) many times in class. The grammar for it is simple and the rules are as follows:
    S → aSb | ε
so we are done.

7. Prove that the class of context-free languages is closed under the concatenation operation.

Given: Languages L1 and L2 are context-free. In other words, there is a CFG G1 = (V1, Σ1, R1, S1) that generates L1 and a CFG G2 = (V2, Σ2, R2, S2) that generates L2.

Want: To construct a CFG G = (V, Σ, R, S) that generates the language L1 · L2, showing that L1 · L2 is a context-free language.

Construction: Grammar G = (V, Σ, R, S) is defined as follows:
• V = V1 ∪ V2 ∪ {S} where S is a new variable
• Σ = Σ1 ∪ Σ2
• R = R1 ∪ R2 ∪ {S → S1S2} (the set of rules contains the rules of G1, the rules of G2, and one new rule, namely the rule S → S1S2)
In this construction, we assume that V1 ∩ V2 = ∅, meaning that the two given grammars have no variables in common. This can always be made true by renaming variables in one of the grammars, if necessary.

Correctness: We claim that L(G) = L(G1) · L(G2). (Since L(G1) = L1 and L(G2) = L2 this means that L(G) = L1 · L2.) Therefore, we claim that S ⇒* w in G if and

only if w has the form w1w2 for some w1, w2 such that S1 ⇒* w1 in G1 and S2 ⇒* w2 in G2. This is true because the first rule applied in a derivation of any w via G is S → S1S2, and after that, the only way the derivation can proceed is to expand S1 according to G1 and S2 according to G2. The fact that V1 ∩ V2 = ∅ is important to ensure that a derivation starting from, say, S1, can only use rules in G1 and not use rules in G2, and vice-versa.

8. Problem 2.16

First we will show how to convert a regular expression R directly to an equivalent CFG G. Then we will use this to give another proof that every regular language is a CFL.

• R = s for s in the alphabet Σ
    G = ({S}, Σ, {S → s}, S)
• R = ε
    G = ({S}, Σ, {S → ε}, S)
• R = ∅
    G = ({S}, Σ, ∅, S) OR G = ({S}, Σ, {S → S}, S)
• (RA ∪ RB) for reg exps RA, RB
    If L(G1) = L(RA), where G1 = (V1, Σ, R1, S1), and L(G2) = L(RB), where G2 = (V2, Σ, R2, S2) and V1 ∩ V2 = ∅, then
    G = (V1 ∪ V2 ∪ {S}, Σ, R1 ∪ R2 ∪ {S → S1 | S2}, S)
• (RA · RB) for reg exps RA, RB
    If L(G1) = L(RA), where G1 = (V1, Σ, R1, S1), and L(G2) = L(RB), where G2 = (V2, Σ, R2, S2) and V1 ∩ V2 = ∅, then
    G = (V1 ∪ V2 ∪ {S}, Σ, R1 ∪ R2 ∪ {S → S1S2}, S)
• (RA)* for reg exp RA
    If L(G1) = L(RA), where G1 = (V1, Σ, R1, S1), then
    G = (V1 ∪ {S}, Σ, R1 ∪ {S → SS | S1 | ε}, S)

Let L be a regular language. Then there exists a regular expression R such that L(R) = L. Let G be a CFG equivalent to R, constructed as shown above. Then L(G) = L(R) = L. Thus, L is a CFL, i.e., there is a CFG that generates L.

9. Problem 2.18 a, c.

a. We use a template similar to the one we used to prove that a language is nonregular via the Pumping Lemma for regular languages. You should use this template in your own solutions.

The proof is by contradiction. Assume that L = { 0^n 1^n 0^n 1^n | n ≥ 0 } is a CFL. This assumption means that the Pumping Lemma for CFLs (Theorem 2.19, page 115 of the textbook) applies to L. Let p be the pumping length given by the P.L. Let s = 0^p 1^p 0^p 1^p. Then s ∈ L and |s| ≥ p. By the P.L., s can be

We know. w = 0p 1p and hence uv 2 xy 2 z ∈ L. In either case. Assume that L = { w#x | w is a substring of x. x ∈ {a. y. This means our assumption that L was a CFL is false. Then s ∈ L and |s| ≥ p. By the P. Case 1) If vxy is a substring of the first 0p 1p segment of s = uvxyz = 0p 1p 0p 1p . however. then. uv 2 xy 2 z has the form 0p w1p . x. The proof is by contradiction. Then the Pumping Lemma for CFLs (Theorem 2. that they satisfy the three conditions. w = 0p 1p and hence uv 2 xy 2 z ∈ L. z. v. Case 2) If vxy is a substring of 1p 0p . page 115 of the textbook) applies to L. uv 2 xy 2 z has the form 0p 1p w. that they satisfy the three conditions. z. In either case. s can be split into uvxyz such that the following conditions hold: (1) (2) (3) |vy| > 0 |vxy| ≤ p For all i ≥ 0: uv i xy i z ∈ L Comment: Recall that we can choose s. By condition (1). but we have no control over u. b}∗ } is a CFL. contradicting condition (3) above. Let p be the pumping length given by the P. uv 2 xy 2 z has the form w0p 1p . Since we are not given information about the length of u. or 2) 1p 0p . Now our goal is to show that conditions (1) and (2) imply that for some i ≥ 0: uv i xy i z is not in L. v. Condition (2) says that the length of vxy is at most p.split into uvxyz such that the following conditions hold: (1) (2) (3) |vy| > 0 |vxy| ≤ p For all i ≥ 0: uv i xy i z ∈ L Comment: Recall that we can choose s.L. c. We have to consider several possibilities. In either case. we do not know where vxy lies in the string s = uvxyz = 0p 1p 0p 1p . Condition (2) says that the length of vxy is at most 8 . x.L. Let s = ap bp #ap bp . however. We know. where w has more than p 0s or more than p 1s. where w. Since we are not given information about the length of u. We know that s = uvxyz = ap bp #ap bp . y. we do not know where vxy lies in the string s = uvxyz = ap bp #ap bp . but we have no control over u. by condition (1). then by condition (1) (which says that vy has non-zero length). 
or 3) the last 0p 1p segment. Case 3) The last case is that vxy is a substring of the second 0p 1p segment of s = uvxyz = 0p 1p 0p 1p . and in each case show that uv 2 xy 2 z ∈ L. Now our goal is to show that conditions (1) and (2) imply that for some i ≥ 0: uv i xy i z is not in L.. This means that vxy is a substring of one of the following: 1) the first 0p 1p segment. where w has more than p 0s or more than p 1s. We now consider these three cases one by one. contradicting condition (3) above. w = 1p 0p so uv 2 xy 2 z ∈ L. We have to consider several possibilities.19. This means our assumption that L was a CFL is false. where w has more than p 0s or more than p 1s. We know that s = uvxyz = 0p 1p 0p 1p .

p. This means that one of the following is true: either 1) vxy is a substring of the first a^p b^p segment of s, or 2) vxy contains #, or 3) vxy is a substring of the last a^p b^p segment of s. We now consider these three cases one by one. For the first case, we show that u v^2 x y^2 z ∉ L. For the second case, we show that uxz ∉ L or u v^2 x y^2 z ∉ L. For the third case, we show that uxz ∉ L. This is enough to contradict condition (3) above.

Case 1) The first case is that vxy is a substring of the first a^p b^p segment of s = uvxyz = a^p b^p # a^p b^p. Condition (1) says that vy has non-zero length. This means that u v^2 x y^2 z has the form w # a^p b^p, where w has more than p a's or more than p b's. In either case, w is not a substring of a^p b^p and hence u v^2 x y^2 z ∉ L.

Case 2) The second case is that vxy contains #. If v or y contains # then u v^2 x y^2 z has more than one #, so u v^2 x y^2 z ∉ L. If neither v nor y contains #, then x must contain #. By condition (1), v ≠ ε or y ≠ ε. If v ≠ ε, then, by condition (2), v consists of one or more b's and y consists of zero or more a's. Therefore, u v^2 x y^2 z has the form w # a^(p+j) b^p, where j ≥ 0 and w has more than p b's. Thus w is not a substring of a^(p+j) b^p, and hence u v^2 x y^2 z ∉ L. If v = ε, then, by conditions (1) and (2), y consists of one or more a's. Therefore, uxz has the form a^p b^p # w where w has less than p a's. Thus a^p b^p is not a substring of w and hence uxz ∉ L. This shows that if vxy contains #, then uxz ∉ L or u v^2 x y^2 z ∉ L.

Case 3) The last case is that vxy is a substring of the second a^p b^p segment of s = uvxyz = a^p b^p # a^p b^p. By condition (1), uxz has the form a^p b^p # w, where w has less than p a's or less than p b's. In either case, a^p b^p is not a substring of w and hence uxz ∉ L.

This means that for all cases there is some i ≥ 0 such that u v^i x y^i z is not in L.

Comment: Notice that this argument is a little more involved than the ones we used in class and the one in part a, above. Unlike previous problems, for this one we cannot always say exactly which i will yield a string that is not in L. But we show that in every case there is an i ≥ 0 that yields such a string. (For the first case, i = 2; for the second case, either i = 0 or i = 2; for the third case, i = 0.) The first and third cases are standard (and somewhat similar to the first and third cases in the proof discussed in class for the non-CFL language { ww | w ∈ {0, 1}* }).

10. Problem 2.26

Let C = { x#y | x, y ∈ {0, 1}* and x ≠ y }. We show that C is a CFL by constructing a PDA that recognizes C.

Comment: Notice that to conclude that strings x and y are different, it is sufficient to show that their lengths are different or find one position in which x and y differ, whereas to conclude that the two strings are equal, it is necessary to check that all symbols in x and y are identical. PDAs cannot test for equality of strings. They can, however, test for inequality.

The PDA we construct to recognize C guesses whether to check if strings x and y have different lengths, or to first guess a position in which these strings differ and then check that the symbols in this position of x and y are indeed different. The state diagram of this PDA is in Figure 2.

To keep track of the position, the PDA pushes some symbol, say a, onto the stack while it reads its input until it makes the guess. It then reads the following symbols in the input

string until it reaches the symbol #. Then it pops a's from the stack while it continues to read its input, until the stack is empty. If the symbol read when the marker $ is popped is different from the symbol read when the PDA made its guess, then the PDA accepts.

When the PDA makes a guess to check if strings x and y have different lengths, it pushes a's onto the stack while it reads its input until it reaches the symbol #. Then it pops a's from the stack while it continues to read its input, until the stack is empty or the input is exhausted. If the input is exhausted before the stack is empty (meaning |x| > |y|), then the machine accepts. If the input has not been exhausted when the marker $ is popped (meaning |y| > |x|), then the PDA also accepts.

Notice that we use the symbol a to emphasize that the PDA is only keeping track of the number of symbols and not the symbols themselves. We could have just used, instead, say 1.

Notice that if the input string does not contain exactly one occurrence of the symbol #, then the PDA will reject. If the input string is of the form x#y, where x, y ∈ {0, 1}* and |x| ≠ |y|, then the PDA will accept. If the input string is of the form x#y, where x, y ∈ {0, 1}* and |x| = |y|, then the PDA will accept if and only if there is some position where x and y differ (i.e., iff x ≠ y).

Figure 2: PDA for Problem 2.26. (State diagram; its transition labels have the form "input symbol, pop → push" over the input symbols 0, 1, # and the stack symbols a, $.)

16. Exercise 3.5

a. Can a Turing machine ever write the blank symbol on its tape? YES. By the definition of the transition function, a TM can write any symbol from the tape alphabet to the tape. The blank symbol is part of the tape alphabet (although it is not part of the input alphabet).

b. Can the tape alphabet Γ be the same as the input alphabet Σ? NO. The tape alphabet always contains the blank symbol, but the input alphabet cannot contain this symbol. If the blank symbol were part of the input alphabet, the TM would never know when the input actually ends!

c. Can a Turing machine's read head ever be in the same location in two successive steps? YES. The only time this can happen is when the TM is on the first tape square and it tries to move left. It will stay in place instead of falling off the tape.

d. Can a Turing machine contain just a single state? NO. A TM must contain at least two states: an accept state and a reject state. Notice that since being in either of these states halts the computation, a different start state is necessary in order for the TM to read any input whatsoever.

17. Exercise 3.7

Mbad = "The input is a polynomial p over variables x1, . . . , xk.
    1. Try all possible settings of x1, . . . , xk to integer values.
    2. Evaluate p on all of these settings.
    3. If any of these settings evaluates to 0, accept; otherwise, reject."

This description has several problems. The loop structure is too vague to really specify what it does. Does the machine first write all possible values of the variables to the tape and then evaluate each one of them? Or does it produce some values of the variables and evaluate that assignment before considering another assignment? The machine cannot try all possible assignments of the variables, because there are an infinite number of them. The most serious problem is in the last step. There is no condition when the machine will reject.

Note: If you are careful about how the loop is structured, you can make a TM that recognizes the language that this machine attempts to decide.

18. Problem 3.14 a, b, c, d, e.

Comment: For simplicity, in each of these problems we consider languages over the same alphabet Σ. All of the proofs can be modified to account for languages over distinct alphabets.

a. Prove that the class of decidable languages is closed under union.

Given: Decidable languages L1 and L2 over some alphabet Σ. Since L1 is decidable, there is a TM M1 that decides it, i.e., there is a TM M1 such that for every w ∈ Σ*:

    w ∈ L1 ⟹ M1(w) accepts
    w ∉ L1 ⟹ M1(w) rejects
Likewise, there is a TM M2 such that for every x ∈ Σ*:
    x ∈ L2 ⟹ M2(x) accepts
    x ∉ L2 ⟹ M2(x) rejects

Want: A TM M that decides L1 ∪ L2. (This would mean that L1 ∪ L2 is decidable, and thus that the class of decidable languages is closed under union.) M must be such that for every y ∈ Σ*:
    y ∈ L1 ∪ L2 ⟹ M(y) accepts
    y ∉ L1 ∪ L2 ⟹ M(y) rejects

Construction:
    M = On input a string y
        Run M1(y). If it accepts then accept EndIf
        Run M2(y). If it accepts then accept EndIf
        Reject

Correctness: If y ∈ L1 ∪ L2, then y ∈ L1 OR y ∈ L2. If y ∈ L1 then M1(y) accepts and hence M(y) accepts. If y ∈ L2 then M2(y) accepts and hence M(y) accepts. If y ∉ L1 ∪ L2, then y ∉ L1 AND y ∉ L2. Therefore, M1(y) rejects and M2(y) rejects, and hence M(y) rejects.

b. Prove that the class of decidable languages is closed under concatenation.

Given: Decidable languages L1 and L2 over some alphabet Σ. Since L1 is decidable, there is a TM M1 that decides it, i.e., there is a TM M1 such that for every w ∈ Σ*:
    w ∈ L1 ⟹ M1(w) accepts
    w ∉ L1 ⟹ M1(w) rejects
Likewise, there is a TM M2 such that for every x ∈ Σ*:
    x ∈ L2 ⟹ M2(x) accepts
    x ∉ L2 ⟹ M2(x) rejects

Want: A TM M that decides L1 · L2. (This would mean that L1 · L2 is decidable, and thus that the class of decidable languages is closed under concatenation.) M must be such that for every y ∈ Σ*:
    y ∈ L1 · L2 ⟹ M(y) accepts
    y ∉ L1 · L2 ⟹ M(y) rejects

Construction:
    M = On input a string y
        n ← |y|

        For i = 0, . . . , n do
            Run M1(y[1] · · · y[i])   (for i = 0 this is the empty string ε)
            Run M2(y[i + 1] · · · y[n])   (for i = n this is the empty string ε)
            If both of these computations accept then accept EndIf
        EndFor
        Reject

Comment: This can also be written as follows (for example):
    M = On input a string y
        For each pair of strings w, x ∈ Σ* such that y = wx do
            Run M1(w)
            Run M2(x)
            If both of these computations accept then accept EndIf
        EndFor
        Reject

Comment: Observe that the number of pairs of strings w, x ∈ Σ* such that y = wx is finite. Thus the computation of M on any input halts.

Correctness: If y ∈ L1 · L2, then there exist w ∈ L1 and x ∈ L2 such that y = wx. Since M checks all pairs w, x ∈ Σ* for which y = wx, it will find a pair such that w ∈ L1 and x ∈ L2. Thus it will find a pair such that M1(w) accepts and M2(x) accepts. Therefore, M(y) accepts. If y ∉ L1 · L2, then there do not exist w ∈ L1 and x ∈ L2 such that y = wx. Therefore, for all pairs w, x ∈ Σ* that M checks, w ∉ L1 or x ∉ L2. Thus for all pairs w, x ∈ Σ* that M checks, M1(w) rejects or M2(x) rejects, and hence M(y) rejects.

c. Prove that the class of decidable languages is closed under star.

Given: A decidable language L over some alphabet Σ. Since L is decidable, there is a TM M that decides it, i.e., there is a TM M such that for every w ∈ Σ*:
    w ∈ L ⟹ M(w) accepts
    w ∉ L ⟹ M(w) rejects

Want: A TM M′ that decides L*. (This would mean that L* is decidable, and thus that the class of decidable languages is closed under star.) M′ must be such that for every y ∈ Σ*:
    y ∈ L* ⟹ M′(y) accepts
    y ∉ L* ⟹ M′(y) rejects

Construction:
    M′ = On input a string y
        If y = ε then accept EndIf
        n ← |y|
        For k = 1, . . . , n do
            For each sequence of strings w1, . . . , wk ∈ Σ* \ {ε} such that y = w1 · · · wk do
                Run M(wi) for i = 1, . . . , k

                If all of these computations accept then accept EndIf
            EndFor
        EndFor
        Reject

Comment: Observe that the number of sequences of strings w1, . . . , wk ∈ Σ* \ {ε} such that y = w1 · · · wk is finite. Thus the computation of M′ on any input halts.

Correctness: If y ∈ L*, then y = ε or there exist w1, . . . , wk ∈ L \ {ε} such that y = w1 · · · wk. If y = ε, then M′(y) accepts by construction. If y ≠ ε, then since M′ checks all sequences w1, . . . , wk ∈ Σ* \ {ε} for which y = w1 · · · wk, it will find a sequence such that w1, . . . , wk ∈ L \ {ε}. Since M is a decider for L, M(wi) accepts, for i = 1, . . . , k. Therefore, M′(y) accepts. If y ∉ L*, then y ≠ ε and there does not exist a sequence w1, . . . , wk ∈ L \ {ε} such that y = w1 · · · wk. Therefore, for all sequences w1, . . . , wk ∈ Σ* \ {ε} that M′ checks, there is some i ∈ {1, . . . , k} such that wi ∉ L. Therefore, for all sequences w1, . . . , wk ∈ Σ* \ {ε} that M′ checks, there is some i ∈ {1, . . . , k} such that M(wi) rejects. Therefore, M′(y) rejects.

d. Prove that the class of decidable languages is closed under complementation.

Given: A decidable language L over some alphabet Σ. Since L is decidable, there is a TM M that decides it, i.e., there is a TM M such that for every w ∈ Σ*:
    w ∈ L ⟹ M(w) accepts
    w ∉ L ⟹ M(w) rejects

Want: A TM M′ that decides L̄, the complement of L. (This would mean that L̄ is decidable, and thus that the class of decidable languages is closed under complementation.) M′ must be such that for every y ∈ Σ*:
    y ∈ L̄ ⟹ M′(y) accepts
    y ∉ L̄ ⟹ M′(y) rejects

Construction:
    M′ = On input a string y
        Run M(y)
        If it accepts then reject else accept EndIf

Correctness: If y ∈ L̄, then y ∉ L. Since M is a decider for L, M(y) rejects and hence M′(y) accepts. If y ∉ L̄, then y ∈ L. Therefore, M(y) accepts and M′(y) rejects.

e. Prove that the class of decidable languages is closed under intersection.

Given: Decidable languages L1 and L2 over some alphabet Σ. Since L1 is decidable, there is a TM M1 that decides it, i.e., there is a TM M1 such that for every w ∈ Σ*:

    w ∈ L1 ⟹ M1(w) accepts
    w ∉ L1 ⟹ M1(w) rejects
Likewise, there is a TM M2 such that for every x ∈ Σ*:
    x ∈ L2 ⟹ M2(x) accepts
    x ∉ L2 ⟹ M2(x) rejects

Want: A TM M that decides L1 ∩ L2. (This would mean that L1 ∩ L2 is decidable, and thus that the class of decidable languages is closed under intersection.) M must be such that for every y ∈ Σ*:
    y ∈ L1 ∩ L2 ⟹ M(y) accepts
    y ∉ L1 ∩ L2 ⟹ M(y) rejects

Construction:
    M = On input a string y
        Run M1(y)
        Run M2(y)
        If both accept then accept else reject EndIf

Correctness: If y ∈ L1 ∩ L2, then y ∈ L1 AND y ∈ L2. Therefore, M1(y) accepts and M2(y) accepts, and hence M(y) accepts. If y ∉ L1 ∩ L2, then y ∉ L1 OR y ∉ L2. Therefore, M1(y) rejects OR M2(y) rejects, and hence M(y) rejects.

19. Problem 3.15 a, b, c, d.

Comment: For simplicity, in each of these problems we consider languages over the same alphabet Σ. All of the proofs can be modified to account for languages over distinct alphabets.

a. Prove that the class of recognizable languages is closed under union.

Given: Recognizable languages L1 and L2 over some alphabet Σ. Since L1 is recognizable, there is a TM M1 that recognizes it, i.e., there is a TM M1 such that for every w ∈ Σ*:
    w ∈ L1 ⟹ M1(w) accepts
    w ∉ L1 ⟹ M1(w) rejects or does not halt
Likewise, there is a TM M2 such that for every x ∈ Σ*:
    x ∈ L2 ⟹ M2(x) accepts
    x ∉ L2 ⟹ M2(x) rejects or does not halt

Want: A TM M that recognizes L1 ∪ L2. (This would mean that L1 ∪ L2 is recognizable, and thus that the class of recognizable languages is closed under union.) M must be such that for every y ∈ Σ*:

    y ∈ L1 ∪ L2 ⟹ M(y) accepts
    y ∉ L1 ∪ L2 ⟹ M(y) rejects or does not halt

Construction:
    M = On input a string y
        Run M1(y) and M2(y) in parallel
        If either accepts then accept EndIf
        Reject

Comment: Notice that if M runs M1(y) before running M2(y) and M1 does not halt on input y, then even if M2(y) accepts, machine M will not halt. This is why it is necessary to run M1 and M2 in parallel.

Correctness: If y ∈ L1 ∪ L2, then y ∈ L1 OR y ∈ L2. If y ∈ L1 then M1(y) accepts and hence M(y) accepts. If y ∈ L2 then M2(y) accepts and hence M(y) accepts. If y ∉ L1 ∪ L2, then y ∉ L1 AND y ∉ L2. Therefore, M1(y) rejects or does not halt and M2(y) rejects or does not halt. Hence M(y) either rejects or does not halt.

b. Prove that the class of recognizable languages is closed under concatenation.

Given: Recognizable languages L1 and L2 over some alphabet Σ. Since L1 is recognizable, there is a TM M1 that recognizes it, i.e., there is a TM M1 such that for every w ∈ Σ*:
    w ∈ L1 ⟹ M1(w) accepts
    w ∉ L1 ⟹ M1(w) rejects or does not halt
Likewise, there is a TM M2 such that for every x ∈ Σ*:
    x ∈ L2 ⟹ M2(x) accepts
    x ∉ L2 ⟹ M2(x) rejects or does not halt

Want: A TM M that recognizes L1 · L2. (This would mean that L1 · L2 is recognizable, and thus that the class of recognizable languages is closed under concatenation.) M must be such that for every y ∈ Σ*:
    y ∈ L1 · L2 ⟹ M(y) accepts
    y ∉ L1 · L2 ⟹ M(y) rejects or does not halt

Construction:
    M = On input a string y
        n ← |y|, numsteps ← 1
        While (0 = 0) do
            For each pair of strings w, x ∈ Σ* such that y = wx do
                Run M1(w) for numsteps steps
                Run M2(x) for numsteps steps
                If both of these computations have accepted then accept EndIf
            EndFor

Correctness: If y ∈ L1 · L2, then there exist w ∈ L1 and x ∈ L2 such that y = wx. M considers all pairs w, x ∈ Σ∗ for which y = wx. For each of them, it runs M1 on input w and M2 on input x for numsteps steps to see if they both accept. Eventually, numsteps becomes large enough that for some pair w, x ∈ Σ∗ that M checks, M1(w) and M2(x) both accept within numsteps steps. At that point, M accepts.

If y ∉ L1 · L2, then there do not exist w ∈ L1 and x ∈ L2 such that y = wx. Therefore, for all pairs w, x ∈ Σ∗ such that y = wx, w ∉ L1 or x ∉ L2. Thus for all pairs w, x ∈ Σ∗ that M checks, M1(w) does not accept (i.e., it rejects or does not halt) or M2(x) does not accept (i.e., it rejects or does not halt), regardless of the value of numsteps. Therefore, M(y) does not halt.

Comment: Notice that M never rejects. It accepts if y ∈ L1 · L2 and it does not halt if y ∉ L1 · L2. The formal definition of M includes a reject state, but M never enters this state. That is ok because we just want a machine that recognizes L1 · L2.

c. Prove that the class of recognizable languages is closed under star.

Given: A recognizable language L over some alphabet Σ. Since L is recognizable, there is a TM M that recognizes it, i.e., there is a TM M such that for every w ∈ Σ∗:

    w ∈ L   ⟹   M(w) accepts
    w ∉ L   ⟹   M(w) rejects or does not halt

Want: A TM M′ that recognizes L∗. (This would mean that L∗ is recognizable, and thus that the class of recognizable languages is closed under star.) M′ must be such that for every y ∈ Σ∗:

    y ∈ L∗   ⟹   M′(y) accepts
    y ∉ L∗   ⟹   M′(y) rejects or does not halt

Construction:

    M′ = On input a string y
           If y = ε then accept EndIf
           n ← |y| ; numsteps ← 1 ; acc ← 1
           While (0 = 0) do
             For k = 1, ..., n do
               For each sequence of strings w1, ..., wk ∈ Σ∗ \ {ε} such that y = w1 · · · wk do
                 acc ← 1
                 For i = 1, ..., k do
                   Run M(wi) for numsteps steps
                   If this computation has not accepted then acc ← 0 EndIf
                 EndFor
                 If acc = 1 then accept EndIf
               EndFor
             EndFor
             numsteps ← numsteps + 1
           EndWhile
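The search over sequences w1, ..., wk in the construction of M′ can be sketched in Python as well. The same assumptions apply as for any such illustration: `m(w, steps)` is a hypothetical step-bounded simulator of M, and `max_rounds` is a cutoff added only so the demo halts.

```python
from itertools import count


def splits(y):
    """Yield every way to cut y into non-empty pieces (one empty split for "")."""
    if not y:
        yield []
        return
    for i in range(1, len(y) + 1):
        for rest in splits(y[i:]):
            yield [y[:i]] + rest


def recognize_star(m, y, max_rounds=None):
    """Return True if y can be cut into pieces that m all accepts."""
    if y == "":
        return True  # the empty string is always in L*
    for numsteps in count(1):
        if max_rounds is not None and numsteps > max_rounds:
            return None  # cutoff for the demo; M' itself would not halt
        for pieces in splits(y):
            if all(m(w, numsteps) for w in pieces):
                return True


# Toy recognizer (hypothetical): accepts only "ab", and needs 3 steps to do so.
def only_ab(w, steps):
    return w == "ab" and steps >= 3


print(recognize_star(only_ab, "abab"))                # True
print(recognize_star(only_ab, "aba", max_rounds=20))  # None
```

A string of length n has 2^(n-1) such splits, so each round is finite, which is all the argument needs.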

Correctness: If y ∈ L∗, then y = ε or there exist w1, ..., wk ∈ L \ {ε} such that y = w1 · · · wk. If y = ε, then M′(y) accepts by construction. If y ≠ ε, then M′ considers all sequences w1, ..., wk ∈ Σ∗ \ {ε} for which y = w1 · · · wk. For each of them, it runs M on each wi for numsteps steps to see if M accepts all of these inputs. Eventually, numsteps becomes large enough that for some sequence w1, ..., wk ∈ Σ∗ \ {ε} that M′ checks, M(wi) accepts within numsteps steps, for i = 1, ..., k. At that point, M′ accepts.

If y ∉ L∗, then y ≠ ε and there does not exist a sequence w1, ..., wk ∈ L \ {ε} such that y = w1 · · · wk. Therefore, for all sequences w1, ..., wk ∈ Σ∗ \ {ε} such that y = w1 · · · wk, there is some i ∈ {1, ..., k} such that wi ∉ L. Thus for all sequences w1, ..., wk ∈ Σ∗ \ {ε} that M′ checks, there is some i ∈ {1, ..., k} such that M(wi) does not accept (i.e., it rejects or does not halt), regardless of the value of numsteps. Therefore, M′(y) does not halt.

d. Prove that the class of recognizable languages is closed under intersection.

Given: Recognizable languages L1 and L2 over some alphabet Σ. Since L1 is recognizable, there is a TM M1 that recognizes it, i.e., there is a TM M1 such that for every w ∈ Σ∗:

    w ∈ L1   ⟹   M1(w) accepts
    w ∉ L1   ⟹   M1(w) rejects or does not halt

Likewise, there is a TM M2 such that for every x ∈ Σ∗:

    x ∈ L2   ⟹   M2(x) accepts
    x ∉ L2   ⟹   M2(x) rejects or does not halt

Want: A TM M that recognizes L1 ∩ L2. (This would mean that L1 ∩ L2 is recognizable, and thus that the class of recognizable languages is closed under intersection.) M must be such that for every y ∈ Σ∗:

    y ∈ L1 ∩ L2   ⟹   M(y) accepts
    y ∉ L1 ∩ L2   ⟹   M(y) rejects or does not halt

Construction:

    M = On input a string y
          Run M1(y)
          If it accepts then
            Run M2(y)
            If it accepts then accept EndIf
          EndIf
          Reject

Correctness: If y ∈ L1 ∩ L2, then y ∈ L1 AND y ∈ L2. Therefore, M1(y) accepts and M2(y) accepts, and hence M(y) accepts. If y ∉ L1 ∩ L2, then y ∉ L1 OR y ∉ L2. Therefore, M1(y) rejects or does not halt OR M2(y) rejects or does not halt, and hence M(y) rejects or does not halt.

20. Exercise 4.1
a. Is ⟨M, 0100⟩ ∈ A_DFA? YES. DFA M accepts 0100.
b. Is ⟨M, 011⟩ ∈ A_DFA? NO. DFA M rejects 011.
c. Is ⟨M⟩ ∈ A_DFA? NO. ⟨M⟩ is not a valid encoding of a DFA and a string.
d. Is ⟨M, 0100⟩ ∈ A_REX? NO. ⟨M, 0100⟩ is not a valid encoding of a regular expression and a string.
e. Is ⟨M⟩ ∈ E_DFA? NO. L(M) ≠ ∅ (e.g., M accepts 0100).
f. Is ⟨M, M⟩ ∈ EQ_DFA? YES. L(M) = L(M).

21. Exercise 4.2 The problem can be expressed as the following language:

    EQ_DR = { ⟨B, E⟩ | B is a DFA equivalent to regular expression E }

We prove that EQ_DR is decidable as follows.

Want: A TM M that decides EQ_DR, i.e., such that for all DFAs B and all regular expressions E:

    L(B) = L(E)   ⟹   M(⟨B, E⟩) accepts
    L(B) ≠ L(E)   ⟹   M(⟨B, E⟩) rejects

Construction: Here we exploit a fact already proved about decidability. Theorem 4.5 says that the following language is decidable:

    EQ_DFA = { ⟨A, B⟩ | A and B are DFAs and L(A) = L(B) }

Let M_ED be a TM that decides EQ_DFA. The following TM decides EQ_DR.

    M = On input ⟨B, E⟩
          Convert E into an equivalent DFA C
          Run M_ED(⟨B, C⟩)
          If it accepts then accept else reject

Comment: Note that we follow Sipser's convention of not including explicitly in the description of a TM the verification of the format of the input. If M is given an input that isn't a proper encoding of a DFA and a regular expression, it rejects.
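To make the subroutine used here concrete, the following Python sketch shows one way to test whether two DFAs recognize the same language, which is the job the construction hands to the decider from Theorem 4.5. It is not the proof from the textbook: it searches the reachable pairs of states for a pair on which exactly one DFA accepts. The tuple encoding of a DFA is an assumption of this sketch, and the conversion of the regular expression E into the DFA C is taken as already done.

```python
from collections import deque


def dfa_equivalent(a, b, alphabet):
    """Decide L(a) == L(b) for total DFAs over the same alphabet.

    Each DFA is (start, accepting_states, delta) with
    delta[(state, symbol)] -> state (hypothetical encoding).
    """
    start_a, acc_a, delta_a = a
    start_b, acc_b, delta_b = b
    seen = {(start_a, start_b)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        if (p in acc_a) != (q in acc_b):
            return False  # some string is accepted by exactly one DFA
        for c in alphabet:
            nxt = (delta_a[(p, c)], delta_b[(q, c)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


# Even number of 1s, with 2 states and with 4 states (counting 1s mod 4).
parity = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
even2 = ("e", {"e"}, parity)
odd2 = ("e", {"o"}, parity)
mod4 = {(i, c): (i + 1) % 4 if c == "1" else i for i in range(4) for c in "01"}
even4 = (0, {0, 2}, mod4)

print(dfa_equivalent(even2, even4, "01"))  # True
print(dfa_equivalent(even2, odd2, "01"))   # False
```

The search always terminates because there are only finitely many pairs of states, which is why this test is a decider and not merely a recognizer.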

Correctness: If L(E) = L(B) (i.e., ⟨B, E⟩ ∈ EQ_DR), then L(C) = L(E) = L(B). Hence M_ED(⟨B, C⟩) accepts and M accepts. If L(E) ≠ L(B) (i.e., ⟨B, E⟩ ∉ EQ_DR), then L(C) = L(E) ≠ L(B). Hence M_ED(⟨B, C⟩) rejects and M rejects. Thus M decides EQ_DR.

22. Exercise 4.4 Consider the language Aε_CFG = { ⟨G⟩ | G is a CFG that generates ε }.

Want: A TM M that decides Aε_CFG, i.e., such that for all CFGs G:

    G generates ε           ⟹   M(⟨G⟩) accepts
    G does not generate ε   ⟹   M(⟨G⟩) rejects

Construction: Theorem 4.6 says that the following language is decidable:

    A_CFG = { ⟨G, w⟩ | G is a CFG that generates w }

Let M_CFG be a TM that decides A_CFG. The following TM decides Aε_CFG.

    M = On input ⟨G⟩
          Run M_CFG(⟨G, ε⟩)
          If it accepts then accept else reject

Correctness: If G generates ε (i.e., ⟨G⟩ ∈ Aε_CFG), then M_CFG(⟨G, ε⟩) accepts and M accepts. If G does not generate ε (i.e., ⟨G⟩ ∉ Aε_CFG), then M_CFG(⟨G, ε⟩) rejects and M rejects. Thus M decides Aε_CFG.

23. Exercise 4.5 Consider the language INFINITE_DFA = { ⟨A⟩ | A is a DFA and L(A) is an infinite language }.

Want: A TM M that decides INFINITE_DFA, i.e., such that for all DFAs A:

    L(A) is an infinite language   ⟹   M(⟨A⟩) accepts
    L(A) is a finite language      ⟹   M(⟨A⟩) rejects

Construction: First we make a claim. Then we construct a TM M that decides INFINITE_DFA based on this claim. Finally, we prove the claim.

Claim: A regular language L is infinite if and only if the DFA that recognizes L has some state which, through a series of transitions, can reach itself and an accept state, and can also be reached from the start state (i.e., the DFA that recognizes L has a loop reachable from the start state that can reach some accept state).

The following TM decides INFINITE_DFA.

    M = On input ⟨A⟩
          For every state s of A do
            Clear all marks
            Mark all states that have transitions coming into them from s
            Repeat until no new state gets marked
              Mark any state that has a transition coming into it from any state that is already marked
            EndRepeat
            If state s and some accept state got marked then
              Clear all marks
              Mark the start state
              Repeat until no new state gets marked
                Mark any state that has a transition coming into it from any state that is already marked
              EndRepeat
              If state s got marked then accept EndIf
            EndIf
          EndFor
          Reject

For every state s, the algorithm checks whether s can reach itself through one or more transitions and can reach an accept state. If so, then s lies on a loop that can reach an accept state. We then need to check whether s can be reached from the start state. If so, then we accept because we found a loop that can be reached from the start state and can reach an accept state. If, after we go through all the states, none of them satisfies the above conditions, we conclude that the language recognized by DFA A is finite, and we reject.

Correctness: We need to prove the above claim.

(⇐) If the DFA has a loop that can be reached from the start state and can reach an accept state, then we can just cycle through the loop as many times as we want, and get an arbitrarily long string recognized by the DFA.

(⇒) Say that a state is pointless if it cannot be reached from the start state or cannot reach an accept state. Let's remove all these pointless states and all the transitions associated with them. Notice that we are now left with an NFA (it could still be a DFA, but not necessarily) that recognizes the same language as our initial DFA, and all the states of the NFA can reach an accept state and are reachable from the start state. Say this new NFA has p states. If the NFA has no loops, then it cannot possibly accept a string of length greater than p. (Because accepting a string of length greater than p would mean visiting some state twice, but that would require a loop!) Thus if this NFA has no loops, then the language it accepts is finite. Equivalently, if the language is infinite, then the NFA has a loop, and every state on that loop can be reached from the start state and can reach an accept state. And the Claim is proved.
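The marking procedure for INFINITE_DFA translates almost directly into Python. The sketch below is a minimal illustration that assumes a hypothetical encoding of the DFA A as a nested dictionary `delta[state][symbol]`; the helper `reachable` plays the role of the "Repeat until no new state gets marked" loops.

```python
def reachable(delta, sources):
    """States reachable from `sources` in one or more transitions."""
    marked = set()
    frontier = list(sources)
    while frontier:
        s = frontier.pop()
        for t in delta.get(s, {}).values():
            if t not in marked:
                marked.add(t)
                frontier.append(t)
    return marked


def dfa_language_is_infinite(start, accepting, delta):
    """delta[state][symbol] -> state (hypothetical encoding of the DFA A)."""
    from_start = reachable(delta, [start]) | {start}
    for s in delta:
        marked = reachable(delta, [s])
        # s lies on a loop, can reach an accept state, and is reachable.
        if s in marked and marked & set(accepting) and s in from_start:
            return True
    return False


# a* over {a}: one accepting state with a self-loop.
a_star = {"q0": {"a": "q0"}}
print(dfa_language_is_infinite("q0", {"q0"}, a_star))  # True

# The language {a} over {a, b}; state u has a loop but is unreachable.
just_a = {
    "q0": {"a": "q1", "b": "dead"},
    "q1": {"a": "dead", "b": "dead"},
    "dead": {"a": "dead", "b": "dead"},
    "u": {"a": "u", "b": "q1"},
}
print(dfa_language_is_infinite("q0", {"q1"}, just_a))  # False
```

The second example exercises both failure modes from the claim: the state `dead` is on a loop but cannot reach an accept state, and `u` is on a loop that reaches an accept state but cannot be reached from the start state.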

24. Problem 4.10 Consider the language A = { ⟨M⟩ | M is a DFA which doesn't accept any string containing an odd number of 1s }.

Want: A TM M_A that decides A, i.e., such that for all DFAs M:

    M doesn't accept any string containing an odd number of 1s   ⟹   M_A(⟨M⟩) accepts
    M accepts some string containing an odd number of 1s         ⟹   M_A(⟨M⟩) rejects

Construction: Theorem 4.4 says that the following language is decidable:

    E_DFA = { ⟨B⟩ | B is a DFA and L(B) = ∅ }

Let M_E be a TM that decides E_DFA. Let T be a TM that on input ⟨M1, M2⟩, where M1 and M2 are DFAs, returns ⟨N⟩, where N is a DFA such that L(N) = L(M1) ∩ L(M2). Notice that this TM exists since the transformation underlying the proof that the class of regular languages is closed under intersection can be automated. Note that the language B = { w ∈ {0, 1}∗ | w contains an odd number of 1s } is regular. Let D_B be a DFA that recognizes B. The following TM decides A.

    M_A = On input ⟨M⟩
            Let ⟨N⟩ be the output of T(⟨M, D_B⟩)
            Run M_E(⟨N⟩)
            If it accepts then accept else reject

Correctness: If M is a DFA that doesn't accept any string containing an odd number of 1s (i.e., ⟨M⟩ ∈ A), then L(M) ∩ L(D_B) = ∅. Hence M_E(T(⟨M, D_B⟩)) accepts and M_A accepts. If M is a DFA that accepts some string containing an odd number of 1s (i.e., ⟨M⟩ ∉ A), then L(M) ∩ L(D_B) ≠ ∅. Hence M_E(T(⟨M, D_B⟩)) rejects and M_A rejects. Thus M_A decides A.

24. Problem 5.2 DEF: A language is co-recognizable if its complement is recognizable.

Consider the language EQ_CFG = { ⟨G1, G2⟩ | G1, G2 are CFGs and L(G1) = L(G2) }. We show that EQ_CFG is co-recognizable by constructing a recognizer for its complement, namely, the language { ⟨G1, G2⟩ | G1, G2 are CFGs and L(G1) ≠ L(G2) }.

Want: A TM M that recognizes the complement of EQ_CFG, i.e., such that for all CFGs G1, G2:

    L(G1) ≠ L(G2)   ⟹   M(⟨G1, G2⟩) accepts
    L(G1) = L(G2)   ⟹   M(⟨G1, G2⟩) rejects or does not halt

Construction: Theorem 4.6 says that the following language is decidable:

    A_CFG = { ⟨G, w⟩ | G is a CFG that generates w }

Let M_CFG be a TM that decides A_CFG. The following TM recognizes the complement of EQ_CFG.

    M = On input ⟨G1, G2⟩
          Let Σ be the union of the terminal alphabets of G1 and G2
          For all w ∈ Σ∗ (enumerated in order of increasing length) do
            If M_CFG(⟨G1, w⟩) accepts then acc1 ← 1 else acc1 ← 0 EndIf
            If M_CFG(⟨G2, w⟩) accepts then acc2 ← 1 else acc2 ← 0 EndIf
            If acc1 ≠ acc2 then accept EndIf
          EndFor

Correctness: If L(G1) ≠ L(G2) then there is some w ∈ Σ∗ such that 1) w ∈ L(G1) and w ∉ L(G2) OR 2) w ∉ L(G1) and w ∈ L(G2). M searches for a string w satisfying this property. When it finds one, it accepts.

If L(G1) = L(G2) then for all w ∈ Σ∗ 1) w ∈ L(G1) and w ∈ L(G2) OR 2) w ∉ L(G1) and w ∉ L(G2). M searches for a string w such that 1) w ∈ L(G1) and w ∉ L(G2) OR 2) w ∉ L(G1) and w ∈ L(G2). It does not find one, so it does not halt.
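The search performed by M can be tried out in Python. The sketch below rests on assumptions that are not in the solution above: grammars are taken to be in Chomsky normal form and encoded as a hypothetical 4-tuple, membership (the role of the decider for A_CFG) is implemented with the CYK algorithm, and `max_len` is a cutoff added so that the demo halts when the two grammars are equivalent.

```python
from itertools import count, product


def generates(g, w):
    """CYK membership test; g is assumed to be in Chomsky normal form.

    g = (start, terminal_rules, binary_rules, accepts_empty), where
    terminal_rules is a set of (A, a) and binary_rules a set of (A, B, C).
    """
    start, terminal_rules, binary_rules, accepts_empty = g
    n = len(w)
    if n == 0:
        return accepts_empty
    table = {}  # table[(i, l)] = variables deriving the substring w[i:i+l]
    for i, ch in enumerate(w):
        table[(i, 1)] = {A for (A, a) in terminal_rules if a == ch}
    for l in range(2, n + 1):
        for i in range(n - l + 1):
            cell = set()
            for k in range(1, l):
                left, right = table[(i, k)], table[(i + k, l - k)]
                cell |= {A for (A, B, C) in binary_rules
                         if B in left and C in right}
            table[(i, l)] = cell
    return start in table[(0, n)]


def distinguishing_string(g1, g2, alphabet, max_len=None):
    """Search by increasing length for a string in exactly one language."""
    for length in count(0):
        if max_len is not None and length > max_len:
            return None  # cutoff for the demo; the TM M would keep searching
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            if generates(g1, w) != generates(g2, w):
                return w


terminals = {("A", "a"), ("B", "b")}
# { a^n b^n | n >= 1 }: S -> AT | AB, T -> SB
anbn = ("S", terminals, {("S", "A", "T"), ("S", "A", "B"), ("T", "S", "B")}, False)
# { ab }: S -> AB
just_ab = ("S", terminals, {("S", "A", "B")}, False)

print(distinguishing_string(anbn, just_ab, "ab"))         # aabb
print(distinguishing_string(anbn, anbn, "ab", max_len=6)) # None
```

Enumerating strings by increasing length guarantees that every string is reached after finitely many membership tests, so a distinguishing string is found whenever one exists.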