
MELJUN P.

CORTES, MBA,MPA,BSCS,ACS
CSC 3130: Automata theory and formal languages

Undecidable problems for CFGs and descriptive complexity

Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130

Decidable vs. undecidable
decidable
“DFA M accepts w”
“PDA M accepts w”
“DFA M accepts all inputs”

undecidable
“TM M accepts w” “TM M halts on w”
“TM M accepts some input” “TM M accepts all inputs”
“TM M and M’ accept the same inputs”

?
“PDA P accepts all inputs” “CFG G is ambiguous” other kinds of problems?

Computation is local
q0
lotus ootus oktus okrus

M

q6 q3

q0

›q0lotus
›oq6otus
›okq3tus
›okrq0us
›okraq1s
›okrqaa☐
computation tableau

q1 okras qacc okra

The changes between rows occur in a 2x3 window

Computation histories as strings
• If M halts on w, we can represent the computation tableau by a string t over the alphabet Γ∪Q∪{#, ›}
›q0lotus
›oq6otus
›okq3tus
›okrq0us
›okraq1s
›okrqaa☐
›q0lotus#›oq6otus#...#›okrqaa☐#

M accepts w: qa occurs in string t
M rejects w: qa does not occur in t
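The encoding of a tableau as a string can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical toy machine `DELTA` (not the machine M from the slides), a trailing blank in the first row, and a machine that never moves left of the › marker; the names `tableau` and `encode` are made up for this sketch.

```python
BLANK = "☐"

# Hypothetical toy machine (not the one on the slides): scan right over a's,
# then step left and accept at the first blank.
DELTA = {
    ("q0", "a"): ("q0", "a", "R"),
    ("q0", BLANK): ("qa", BLANK, "L"),
}

def tableau(delta, w, start="q0", halting=("qa", "qr"), max_steps=1000):
    """Rows of the computation tableau on input w, or None if the machine
    has not halted within max_steps (looping cannot be detected in general)."""
    tape = ["›"] + list(w) + [BLANK]
    head, state, rows = 1, start, []
    for _ in range(max_steps):
        # The state symbol is written just left of the scanned cell.
        rows.append(tape[:head] + [state] + tape[head:])
        if state in halting or (state, tape[head]) not in delta:
            return rows
        state, tape[head], move = delta[(state, tape[head])]
        head += 1 if move == "R" else -1
        if head == len(tape):
            tape.append(BLANK)
    return None

def encode(rows):
    """Concatenate the rows, ending each one with #."""
    return "".join("".join(row) + "#" for row in rows)

rows = tableau(DELTA, "aa")
t = encode(rows)
accepted = any("qa" in row for row in rows)  # M accepts w iff qa occurs in t
print(t, accepted)
```

Rows are kept as lists of symbols so that a state such as q0 counts as one symbol of the alphabet.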

Undecidable problems for PDAs
ALLPDA = {〈P〉: P is a PDA that accepts all inputs}
• Theorem

The language ALLPDA is undecidable.
• Proof: We will show that if ALLPDA can be decided, so can ATM.

Undecidable problems for PDAs
〈P〉 → A → accept if P accepts all inputs, reject if not

〈M〉, w → 〈P〉 → A → accept if M rej/loops on w, reject if M accepts w

P accepts all inputs if M rejects or loops on w
P does not accept some input if M accepts w

Undecidability via computation histories
P accepts all inputs if M rejects or loops on w
P does not accept some input if M accepts w

The input to P is a candidate computation history of M on w:
– accepting histories, such as ›q0lotus#›oq6otus#...#›okrqaa☐: reject
– every other string: accept

M accepts w: P rejects the accepting history t
M rej/loops on w: no accepting histories, P accepts everything

Undecidability via computation histories
• Task: Design a PDA P that takes a candidate computation history t of M on w and rejects exactly the accepting histories, such as
›q0lotus
›oq6otus
›okq3tus
›okrq0us
›okraq1s
›okrqaa☐

Expect t of the form w1#w2#...#wk#
If w1 ≠ ›q0w, accept t.

If t does not contain qa, accept t.
If two consecutive blocks wi#wi+1 do not correspond to a proper transition of M, accept t.
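The three conditions can be checked directly by an ordinary program, which helps to see what P has to recognize. The sketch below is not a PDA: it is a deterministic check that compares each block with the full successor configuration instead of guessing one window. The toy machine `DELTA`, the state set `STATES`, the trailing blank in the first block, and the names `successor` and `p_accepts` are all assumptions made for illustration.

```python
BLANK = "☐"

# Hypothetical toy machine (not the one on the slides).
DELTA = {
    ("q0", "a"): ("q0", "a", "R"),
    ("q0", BLANK): ("qa", BLANK, "L"),
}
STATES = {"q0", "qa"}

def successor(row, delta):
    """Next configuration of a row such as ['›', 'q0', 'a', '☐'], or None
    if the row has no successor (no state, nothing scanned, or no rule)."""
    i = next((k for k, sym in enumerate(row) if sym in STATES), None)
    if i is None or i + 1 >= len(row) or (row[i], row[i + 1]) not in delta:
        return None
    new_state, written, move = delta[(row[i], row[i + 1])]
    tape = row[:i] + [written] + row[i + 2:]      # tape without the state symbol
    head = i + 1 if move == "R" else i - 1        # new head position on the tape
    return tape[:head] + [new_state] + tape[head:]

def p_accepts(rows, w, delta=DELTA, start="q0", accept="qa"):
    """True iff the candidate history (a list of blocks) is NOT an accepting history."""
    if not rows or rows[0] != ["›", start] + list(w) + [BLANK]:
        return True                               # w1 is not the start configuration
    if not any(accept in row for row in rows):
        return True                               # qa never appears
    for cur, nxt in zip(rows, rows[1:]):
        if successor(cur, delta) != nxt:
            return True                           # some step is not a valid transition
    return False

good = [list("›") + ["q0", "a", "a", BLANK],
        ["›", "a", "q0", "a", BLANK],
        ["›", "a", "a", "q0", BLANK],
        ["›", "a", "qa", "a", BLANK]]
print(p_accepts(good, "aa"))
```

A PDA cannot store a whole block, which is why the slides replace the full comparison with a nondeterministic guess of a single faulty window.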

Implementing P
On input t: Nondeterministically make one of the following choices
– Look in the first block w1 of t. If w1 ≠ ›q0w, accept t.
– Look for the appearance of qa. If t does not contain qa, accept t.
– Look for the beginning of the ith block of t. If two consecutive blocks wi#wi+1 do not represent a valid transition of M, accept t.

›okq3tus # ›okrq0us

valid transition

wi#wi+1 represents a valid transition if all 3x2 windows correspond to possible transitions of M

Valid and invalid windows
Each window is written as top row / bottom row:
– t t u / t t u: valid window
– t q3 u / t a q7: valid if δ(q3, u) = (q7, a, R)
– q3 t u / k t q0: invalid window
– c a t / c a p: invalid window
– t t u / t t q3: valid window
– c a t / b a t: valid window

Implementing P
wi#wi+1 represent a valid transition of M

• To check this it is better to write t in boustrophedon
›q0lotus
›oq6otus
›okq3tus
›okrq0us
›okraq1s
›okrqaa☐

Instead of ›q0lotus#›oq6otus#...#›okrqaa☐#
write ›q0lotus#sutoq6o›#...#›okrqaa☐#

Alternate rows are written in reverse
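Reversing alternate rows is a one-line transformation once each row is a list of symbols. The following is a small sketch; the function names `boustrophedon` and `encode` are chosen here for illustration, and states such as q6 are treated as single symbols.

```python
def boustrophedon(rows):
    """Keep rows 1, 3, 5, ... as they are and reverse rows 2, 4, 6, ..."""
    return [row if i % 2 == 0 else row[::-1] for i, row in enumerate(rows)]

def encode(rows):
    """Concatenate the rows, ending each one with #."""
    return "".join("".join(row) + "#" for row in rows)

rows = [
    ["›", "q0", "l", "o", "t", "u", "s"],
    ["›", "o", "q6", "o", "t", "u", "s"],
]
print(encode(boustrophedon(rows)))
```

With this layout the end of one row sits next to the corresponding end of the following row, so matching positions can be found by pushing and popping a stack.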

Implementing P
wi#wi+1 represent a valid transition of M
wi = ›okq3tus, wi+1 = ›okrq0us (proper transition)

…#›okq3tus#suq0rko›#…
(wi followed by wi+1 written in reverse)

– Nondeterministically look for the beginning of a 3x2 window
– Remember the first row of the window in the state
– Use the stack to detect the beginning of the second row
– Remember the second row of the window in the state
– If the window is not valid, accept, otherwise reject.

The Post Correspondence Problem
• Input: A set of tiles like this
top:    bab  c   a   baa  a     bab
bottom: cc   ab  ab  a    baba  ε

• Given an infinite supply of such tiles, can you match top and bottom?
top:    a   baa  bab  c   c   bab  a
bottom: ab  a    ε    ab  ab  cc   baba
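Checking a proposed match is easy; finding one is the hard part. The sketch below verifies a tile sequence and runs a brute-force search up to a length bound. The tile set `TILES` is the one read off the slide (an assumption about the flattened figure), and `is_match` and `find_match` are names invented here. The bounded search can only confirm that a match exists; it can never show that none exists.

```python
from itertools import product

# (top, bottom) pairs; "" stands for the empty string ε.
TILES = [("bab", "cc"), ("c", "ab"), ("a", "ab"),
         ("baa", "a"), ("a", "baba"), ("bab", "")]

def is_match(tiles, seq):
    """True iff the non-empty index sequence seq spells the same string on top and bottom."""
    top = "".join(tiles[i][0] for i in seq)
    bottom = "".join(tiles[i][1] for i in seq)
    return len(seq) > 0 and top == bottom

def find_match(tiles, max_len):
    """Shortest match using at most max_len tiles, or None if there is none that short."""
    for n in range(1, max_len + 1):
        for seq in product(range(len(tiles)), repeat=n):
            if is_match(tiles, seq):
                return list(seq)
    return None

print(is_match(TILES, [2, 3, 5, 1, 1, 0, 4]))   # a seven-tile match
print(find_match(TILES, 3))
```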

Undecidability of PCP
PCP = {D: D is a collection of tiles that contains a top-bottom match}
• Theorem

The language PCP is undecidable.
• Proof: We will show that if PCP can be decided, so can ATM.

Undecidability of PCP
〈M〉, w → T (collection of tiles)

If M accepts w, then T can be matched
If M rej/loops on w, then T cannot be matched

• Idea: Matches represent accepting histories
top:    ›q0lotus#›oq6otus#›okq3t...#›qa☐☐☐☐
bottom: ›q0lotus#›oq6otus#›okq3t...#›qa☐☐☐☐
The match is built tile by tile (top / bottom): ε / ›q0lotus#, then ›q0l / ›oq6, o / o, t / t, u / u, s / s, # / #, › / ›, oq6o / okq3, …

Some technicalities
• We will assume that
– Before accepting, TM M erases its tape
– One of the PCP tiles is marked as a starting tile (s)
top:    bab  c   a   baa  a (s)
bottom: cc   ab  ab  a    baba

• These assumptions can be made without loss of generality (we will see why later)

Undecidability of PCP
〈M〉, w → T (collection of tiles)

If M accepts w, then T can be matched
If M rej/loops on w, then T cannot be matched

• To decide ATM, we construct these tiles for PCP
Each tile is written top / bottom:
– starting tile (s): ε / ›q0w#
– a1qia3 / b1b2b3 for each valid window of this form
– ☐# / #
– a / a for all a in Γ∪{#, ›}
– “final” tiles: #›qa / ε and ☐ / ε

Undecidability of PCP
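accepting computation history:
›q0lotus#›oq6otus#...#›oq1☐☐☐#›qa☐☐☐☐

In a match, the top and the bottom both spell out this history:
top:    ›q0lotus#›oq6otus#...#›oq1☐☐☐#›qa☐☐☐☐
bottom: ›q0lotus#›oq6otus#...#›oq1☐☐☐#›qa☐☐☐☐

Tiles (top / bottom): starting tile (s) ε / ›q0w#, a1qia3 / b1b2b3, ☐# / #, a / a, #›qa / ε, ☐ / ε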

Undecidability of PCP
• If M rejects w, then qr appears on the bottom at some point, but it cannot be matched on top
• If M loops on w, then the matching keeps going forever
Tiles (top / bottom): starting tile (s) ε / ›q0w#, a1qia3 / b1b2b3, ☐# / #, a / a, #›qa / ε, ☐ / ε

A technicality
• We assumed that one tile is marked as the starting tile
top:    a (s)  bab  c
bottom: baba   cc   ab

• We can remove this assumption by changing the tiles a bit (each tile written top / bottom)
– “starting tile” begins with *: *a* / *b*a*b*a
– “middle tiles”: b*a*b* / *c*c and c* / *a*b
– “ending tile” matches last *: ε / *
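The change to the tiles is mechanical. The sketch below follows the pattern visible on the slide: tops get a * after every symbol, bottoms get a * before every symbol, and the starting tile gets one extra leading * on top. It assumes that * does not occur in the original tiles and that the ending tile has an empty top and bottom *; the function names are made up for this illustration.

```python
def star_after(s):
    """a b c  ->  a* b* c*"""
    return "".join(c + "*" for c in s)

def star_before(s):
    """a b c  ->  *a *b *c"""
    return "".join("*" + c for c in s)

def remove_start_marker(tiles, start):
    """Tiles (top, bottom) with a marked starting tile at index `start`
    -> an ordinary tile set in which every match must begin with that tile."""
    top, bottom = tiles[start]
    new_tiles = [("*" + star_after(top), star_before(bottom))]           # starting tile
    new_tiles += [(star_after(t), star_before(b)) for t, b in tiles]     # middle tiles
    new_tiles.append(("", "*"))                                          # ending tile (assumed form)
    return new_tiles

tiles = [("a", "baba"), ("bab", "cc"), ("c", "ab")]
for t, b in remove_start_marker(tiles, 0):
    print(repr(t), "/", repr(b))
```

Every non-empty bottom now begins with *, and only the starting tile has a top that begins with *, so no other tile can come first in a match.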

Ambiguity of CFGs
AMB = {G: G is an ambiguous CFG}
• Theorem

The language AMB is undecidable.
• Proof: We will show that if AMB can be decided, so can PCP.

Ambiguity of CFGs
T
(collection of tiles)

G

(CFG)

If T can be matched, then G is ambiguous
If T cannot be matched, then G is unambiguous

• Proof:
Step 1: Number the tiles (top / bottom)
1: bab / cc
2: c / ab
3: a / ab

Ambiguity of CFGs
T
(collection of tiles)

G

(CFG)

Terminals: a, b, c, 1, 2, 3
Variables: S, T, B
Productions:
S → T | B
T → babT1 | cT2 | aT3 | bab1 | c2 | a3
B → ccB1 | abB2 | abB3 | cc1 | ab2 | ab3

Tiles (top / bottom): 1: bab / cc, 2: c / ab, 3: a / ab
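The grammar can be generated from any numbered tile collection. The sketch below is one way to do it, with two assumptions that are mine: the recursion ends with productions of the form T → (top)(number) and B → (bottom)(number), chosen so that the empty string does not get two derivations, and there are at most nine tiles so that each tile number is a single terminal. `grammar_from_tiles` is a name invented here.

```python
def grammar_from_tiles(tiles):
    """Productions of G as a dict: variable -> list of right-hand sides.
    tiles is a list of (top, bottom) pairs, numbered from 1."""
    assert len(tiles) <= 9, "sketch assumes single-digit tile numbers"
    productions = {"S": ["T", "B"], "T": [], "B": []}
    for i, (top, bottom) in enumerate(tiles, start=1):
        productions["T"] += [f"{top}T{i}", f"{top}{i}"]        # T spells tops
        productions["B"] += [f"{bottom}B{i}", f"{bottom}{i}"]  # B spells bottoms
    return productions

G = grammar_from_tiles([("bab", "cc"), ("c", "ab"), ("a", "ab")])
for variable, bodies in G.items():
    print(variable, "→", " | ".join(bodies))
```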

Ambiguity of CFGs
• Each sequence of tiles gives two derivations
Tiles 1, 2, 2 (top / bottom): bab / cc, c / ab, c / ab

S → T → babT1 → babcT21 → babcc221
S → B → ccB1 → ccabB21 → ccabab221

• If the tiles match, these two derive the same string
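The two derived strings can be computed without running the grammar: each is the concatenated tops (or bottoms) followed by the tile numbers in reverse order. A small sketch, with `derived_strings` as a hypothetical helper name and tiles numbered from 1 as on the slide:

```python
def derived_strings(tiles, seq):
    """Strings derived via T and via B for the tile sequence seq (numbers from 1)."""
    numbers = "".join(str(i) for i in reversed(seq))
    via_t = "".join(tiles[i - 1][0] for i in seq) + numbers
    via_b = "".join(tiles[i - 1][1] for i in seq) + numbers
    return via_t, via_b

tiles = [("bab", "cc"), ("c", "ab"), ("a", "ab")]
via_t, via_b = derived_strings(tiles, [1, 2, 2])
print(via_t, via_b, via_t == via_b)
```

The number suffixes always agree, so the two strings are equal exactly when the tops and bottoms agree, that is, when the sequence is a match.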

Ambiguity of CFGs
T
(collection of tiles)

G

(CFG)

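✓ If T can be matched, then G is ambiguous
✓ If T cannot be matched, then G is unambiguous
• Argue by contradiction:
– If G is ambiguous, then the ambiguity must look like this: some string has one parse tree that starts with S → T and uses tiles n1, n2, ..., ni with tops a1, a2, ..., ai, and another that starts with S → B and uses tiles m1, m2, ..., mj with bottoms b1, b2, ..., bj
– Both trees derive the same string: a1a2...ai ni...n2n1 = b1b2...bj mj...m2m1
– Then n1...ni = m1...mj, so both trees use the same tiles in the same order
– So a1a2...ai = b1b2...bi, and there is a match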

Descriptive complexity

Roulette
• In a game of roulette, you bet $1 on even or odd
• The outcome is a number between 1 and 36
– If you guessed correctly, you double your bet
– Otherwise, you lose
[Figure: a sequence of roulette outcomes]

Randomness
• If we write E for even, O for odd, what we saw is
OEOEOEOEOEOEOEOEOEOE
• It seems the wheel is crooked. If it weren’t, we would expect something more like
OOOEEOEOOEOEOOOEEEOE
• But both sequences have the same probability! Why does one appear less random than the other?

Turing Machines with output
[Figure: a Turing Machine M with a work tape and an output tape]

• The goal of a Turing Machine with output is to write something on the output tape and go into state qhalt

Descriptive complexity
• The descriptive complexity K(x) of x is the length of the shortest description of a Turing Machine that outputs x

• We will assume x is long
Andrey Kolmogorov
(1903-1987)

Example of descriptive complexity
x = “OE...OE” = (OE)^n
(n = 1,000,000,000)
Repeat for n steps:
  At odd step print O
  At even step print E

• Turing machine implementation:
Write n in binary on work tape              ≈ log2 n states
While work tape not equal to 0,             ≈ 3 states
  Subtract 1 from number on work tape       ≈ 15 states
  If number is odd, write O                 ≈ 2 states
  If number is even, write E
K(x) ≈ log2 n + 20
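The same effect shows up if Python source text is used as the description language instead of Turing machines (an assumption made only for this illustration; the constants differ, the logarithmic growth does not). `short_program` and `run` are hypothetical helper names.

```python
import contextlib
import io

def short_program(n):
    """Source text of a Python program that prints (OE)^n."""
    return f"print('OE' * {n}, end='')"

def run(source):
    """Execute a program given as source text and return what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source)
    return buffer.getvalue()

print(run(short_program(3)))
for n in (10, 10**3, 10**9):
    # length of x is 2n, while the description grows only with the digits of n
    print(n, 2 * n, len(short_program(n)))
```

Going from n = 10 to n = 1,000,000,000 makes x a hundred million times longer but adds only eight characters to the program.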

Bounds on descriptive complexity
• Theorem 1

For every x of length n, K(x) is at most n + O(1)
• Proof: Let x = x1...xn and consider the following TM:
Write x1 to output tape and move right
Write x2 to output tape and move right
...
Write xn to output tape and halt.
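The "write it out symbol by symbol" description has a direct analogue when Python source text stands in for the Turing machine description (again an assumption for illustration only): a program that contains x as a literal. `literal_program` and `run` are hypothetical names, and the sketch assumes x consists of plain letters so that its literal needs no escaping.

```python
import contextlib
import io

def literal_program(x):
    """Source text of a program that prints x by spelling it out."""
    return f"print({x!r}, end='')"

def run(source):
    """Execute a program given as source text and return what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source)
    return buffer.getvalue()

x = "OOOEEOEOOEOEOOOEEEOE"
program = literal_program(x)
print(program, len(program) - len(x))   # the overhead does not depend on x
```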

Descriptive complexity and randomness
• Theorem 2: For 99% of strings of length n, K(x) ≥ n − 10.

Scale of K(x), from low to high: 0, O(log n), n − 10, n + O(1)
– “simple” strings: 111...1, OEOE...OE, 3.14159265, 1212321234321
– “randomness-deficient” strings
– “random-looking” strings
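The 99% figure comes from counting: there are too few short descriptions to go around. The sketch below does the count exactly, assuming that both strings and descriptions are binary and that each description outputs at most one string; `fraction_compressible` is a name invented here.

```python
from fractions import Fraction

def fraction_compressible(n, c=10):
    """Upper bound on the fraction of n-bit strings x with K(x) < n - c."""
    # Binary descriptions of length 0, 1, ..., n - c - 1: 2^(n-c) - 1 in total.
    descriptions = 2 ** (n - c) - 1
    return Fraction(descriptions, 2 ** n)

bound = fraction_compressible(100)
print(float(bound))          # just under 1/1024, so under 1%
```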

Evaluating randomness
• How do we know if the casino is crooked?
[Figure: a sequence of roulette outcomes]

• Idea: Compute K(sequence).
If it is much less than n, this indicates that the sequence is not random
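In practice one can only get upper bounds on K, for example from a general-purpose compressor. The sketch below uses zlib's compressed length as a rough, computable stand-in; it is not K(sequence), and a sequence that zlib fails to compress may still have a short description. The sequences and the helper `compressed_len` are made up for illustration.

```python
import random
import zlib

def compressed_len(sequence):
    """Bytes needed by zlib at maximum compression: an upper-bound proxy, not K."""
    return len(zlib.compress(sequence.encode("ascii"), 9))

alternating = "OE" * 500                          # what the crooked wheel produced
rng = random.Random(0)                            # fixed seed, for repeatability
mixed = "".join(rng.choice("OE") for _ in range(1000))

print(compressed_len(alternating), compressed_len(mixed))
```

The alternating sequence shrinks to a few dozen bytes, while the mixed one stays well above a hundred.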

Computing descriptive complexity
It is not possible to compute K(x).
• Proof: Suppose it is. Fix n and consider this TM M:
Output the first x of length n (in lexicographic order) such that K(x) ≥ n − 10

Let x = output of M. Then K(x) ≥ n − 10, but K(x) ≤ |〈M〉| = log2 n + O(1)
So (when n is large) we get K(x) > K(x), impossible!