
SYLLABUS
CS4: Theoretical Foundations Of Computer Science

Unit I: Mathematical preliminaries. Sets, operations, relations, strings, transitive closure, countability and diagonalisation, induction and proof methods, pigeonhole principle and simple applications, concept of language, grammars and production rules, Chomsky hierarchy.

Unit II: Finite state machines, regular languages, deterministic finite automata, conversion to deterministic automata, ε-closures, regular expressions and finite automata, minimization of automata, Moore and Mealy machines and their equivalence.

Unit III: Pumping lemma for regular sets, closure properties of regular sets, decision properties for regular sets, equivalence between regular languages and regular grammars. Context-free languages, parse trees and ambiguity, reduction of CFGs, Chomsky and Greibach normal forms.

Unit IV: Push down automata (PDA), non-determinism, acceptance by two methods and their equivalence, conversion of PDA to CFG, CFLs and PDAs, closure and decision properties of CFLs.

Unit V: Turing machines and their variants, recursively enumerable (r.e.) sets, recursive sets, TM as computer of functions, decidability and solvability, reductions, Post correspondence problem (PCP) and unsolvability of the ambiguity problem of CFGs, Church's hypothesis.

Unit VI: Introduction to recursive function theory, primitive recursive and partial recursive functions. Parsing: top-down and bottom-up approaches, derivation and reduction.

Unit I
Mathematical preliminaries: Sets, operations, relations, strings, transitive closure, countability and diagonalisation, induction and proof methods, pigeonhole principle and simple applications, concept of language, grammars and production rules, Chomsky hierarchy.

Mathematical preliminaries

Sets, operations, relations: A set is a collection of objects with no repetition. The simplest way to describe a set is by listing its elements. If a set is described using a defining property, the description should clearly specify the objects and the universe of discourse.

For all sets A, B, and C in the universe U the following set properties hold:

Idempotency: $A \cup A = A$, $A \cap A = A$

Involution law: $(A^c)^c = A$

De Morgan: $(A \cup B)^c = A^c \cap B^c$, $(A \cap B)^c = A^c \cup B^c$

A set A is said to be finite if A contains a finite number of elements.

A set A is said to be infinite if A contains an infinite number of elements. The set A is said to be countable or enumerable if there is a way to list the elements of A. More formally, a set A is enumerable or countable if A is finite or if there is a bijection between A and the set of natural numbers. Example. The following sets are countable:

R. If the domain is the same as the range, A = B, then R is a relation on the set A.

Properties of relations:
R is reflexive if aRa for every a in A
R is symmetric if aRb implies bRa
R is transitive if aRb and bRc implies aRc
R is antisymmetric if aRb and bRa implies a = b
R is an equivalence relation if R is reflexive, symmetric and transitive

If R is a relation on A, then the reflexive closure of R is the smallest reflexive relation on A with R as a subset. Likewise, the symmetric closure of R is the smallest symmetric relation on A with R as a subset, and the transitive closure of R is the smallest transitive relation on A with R as a subset. The relation R on A is an ordering relation if R is reflexive, antisymmetric and transitive.
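The relation properties listed above can be checked mechanically for a finite relation. The following is a minimal sketch, assuming a relation is stored as a Python set of pairs; the function names and the two sample relations are illustrative and do not come from the notes.

```python
from itertools import product

def is_reflexive(rel, domain):
    # aRa must hold for every element of the domain
    return all((a, a) in rel for a in domain)

def is_symmetric(rel):
    # aRb implies bRa
    return all((b, a) in rel for (a, b) in rel)

def is_antisymmetric(rel):
    # aRb and bRa together imply a = b
    return all(a == b for (a, b) in rel if (b, a) in rel)

def is_transitive(rel):
    # aRb and bRc imply aRc
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)

def is_equivalence(rel, domain):
    return is_reflexive(rel, domain) and is_symmetric(rel) and is_transitive(rel)

def is_ordering(rel, domain):
    return is_reflexive(rel, domain) and is_antisymmetric(rel) and is_transitive(rel)

# Two sample relations on a small domain (chosen only for illustration)
domain = {1, 2, 3}
less_equal = {(a, b) for a, b in product(domain, repeat=2) if a <= b}
same_parity = {(a, b) for a, b in product(domain, repeat=2) if a % 2 == b % 2}

print(is_ordering(less_equal, domain))      # <= is an ordering relation
print(is_equivalence(same_parity, domain))  # same parity is an equivalence relation
```

Storing the relation as a set of pairs keeps each definition a one-line translation of the corresponding property.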

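A is the domain of f and B is the codomain of f. Note the difference between codomain and range: the range is the set of values that f actually takes, which is a subset of the codomain B.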

-to-to-one. Exercise 3.1 A={(1,a),(1,b),(2,a),(2,b)} {(1,a),(2,b)} , {(1,b),(2,a)} , {(1,b),(2,b)} , {(2,a),(2,b)} , {(1,a),(1,b),(2,a)}, {(1,a),(1,b),(2,b)} , {(1,a),(2,a),(2,b)} , {(1,b),(2,a),(2,b)}, {(1,a),(1,b),(2,a),(2,b)}}

2. Determine the properties for the following relations:
i) < on the set of real numbers: transitive, antisymmetric
ii) on the set of integer numbers: reflexive, transitive, antisymmetric
iii) "is a relative of" over the set of persons living in the UK: reflexive, symmetric

total, onto, one-to-one, bijective (try solving the equations x + y = a and x - y = b)

Strings and Languages: A string is a finite sequence of symbols $a_1 a_2 a_3 \ldots a_n$ where each $a_i$ is an element of the alphabet. The length of a string x, denoted |x|, is the number of symbols in x.

The University of Nottingham, School of Computer Science and IT, Dr. Dario Landa-Silva

... where $x = a_1 a_2 \ldots a_m$, and the concatenation of x and y is given by xy = ... a postfix of x ... is a substring of z. The concatenation of two languages $L_1$ and $L_2$ is given by $L_1 L_2 = \{xy \mid x \in L_1, y \in L_2\}$.

Exercise 3.2
1. ... two first bits of x are the same as the two first bits of y. Determine if R is an equivalence relation.
L = {0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111}
Yes, R is an equivalence relation.
2. How many prefixes, postfixes and substrings exist in a string of length n?
n + 1 prefixes, n + 1 postfixes, $\frac{n(n+1)}{2} + 1$ substrings

... string in L. Examples: aa, bb, cc, abab, cbacba
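The counting question in Exercise 3.2 can be explored with a short sketch. It assumes strings are ordinary Python strings and counts the empty string as a prefix, postfix and substring. For a string of n distinct symbols this gives n + 1 prefixes, n + 1 postfixes and n(n+1)/2 + 1 substrings; when symbols repeat, these numbers are upper bounds. The function names are illustrative only.

```python
def prefixes(x):
    # every x[:i], including the empty string and x itself
    return {x[:i] for i in range(len(x) + 1)}

def postfixes(x):
    # every x[i:], including x itself and the empty string
    return {x[i:] for i in range(len(x) + 1)}

def substrings(x):
    # every x[i:j] with i <= j; the empty string appears once in the set
    return {x[i:j] for i in range(len(x) + 1) for j in range(i, len(x) + 1)}

def concat_languages(l1, l2):
    # L1L2 = {xy | x in L1, y in L2}
    return {x + y for x in l1 for y in l2}

x = "abcd"  # four distinct symbols, so n = 4
n = len(x)
print(len(prefixes(x)), len(postfixes(x)), len(substrings(x)))
print(sorted(concat_languages({"a", "ab"}, {"", "b"})))
```

Using sets removes duplicates automatically, which is why a string such as "aaaa" yields fewer distinct substrings than the bound.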

Transitive Closure: Closures of Binary Relations

A binary relation R on a set S may not have a particular property such as reflexivity, symmetry, or transitivity. However, it may be possible to extend the relation so that it does have the property. Extending R means finding a larger subset of $S \times S$ that contains R and which has the desired property. The closure of a relation on S with respect to a property is the smallest such extension that has the desired property.

Commonly used closures:
- transitive closure
- reflexive closure
- symmetric closure

Transitive Closure of Binary Relation:
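As an illustration of the three closures named above, here is a minimal sketch for finite relations stored as sets of pairs. The transitive closure uses Warshall's algorithm, which is one standard technique and not necessarily the construction the notes had in mind; the function names and the sample relation are assumptions made for the example.

```python
def reflexive_closure(rel, domain):
    # add (a, a) for every element of the domain
    return set(rel) | {(a, a) for a in domain}

def symmetric_closure(rel):
    # add the reverse of every pair
    return set(rel) | {(b, a) for (a, b) in rel}

def transitive_closure(rel):
    # Warshall's algorithm: allow each element k in turn as an intermediate point
    closure = set(rel)
    elements = {a for pair in rel for a in pair}
    for k in elements:
        for i in elements:
            for j in elements:
                if (i, k) in closure and (k, j) in closure:
                    closure.add((i, j))
    return closure

r = {(1, 2), (2, 3), (3, 4)}  # sample relation, chosen for illustration
print(sorted(transitive_closure(r)))
```

Each function returns the smallest extension of the input with the named property, matching the definition of closure given in the text.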

Countability And Diagonalisation: Induction

Let P(n) be a proposition involving integer n. The Principle of Mathematical Induction states that P(n) is true for all $n \ge n_0$ if the following are true: $P(n_0)$, and, for $n \ge n_0$, P(k) for all $n_0 \le k \le n$ implies P(n + 1).

Problems
1. The number of subsets of a set of size n is $2^n$.
2. $\sum_{i=0}^{n} i = n(n+1)/2$
3. $\sum_{i=0}^{n} i^2 = n(n+1)(2n+1)/6$
4. $\sum_{i=0}^{n} i^3 = \left(\sum_{i=0}^{n} i\right)^2$
5. $\sum_{i=0}^{n} 2^i = 2^{n+1} - 1$
6. $\sum_{i=1}^{n} \frac{1}{i(i+1)} = \frac{n}{n+1}$
7. For $n \ge 4$, $n! > 2^n$

Induction And Proof Methods: Pigeonhole Principle

If A and B are finite sets and |A| > |B|, then there is no one-to-one function from A to B.
Example: In any group of at least two people there are at least two persons that have the same number of acquaintances within the group.
Example: How many shoes must be drawn from a box containing 10 pairs to ensure a match?
Example: In NYC there are at least two people with the same number of hairs on their heads.
Example: If A is a set of 10 numbers

between 1 and 100, there are two distinct disjoint subsets of A whose elements sum to the same number.

Diagonalization Proofs

Theorem (Georg Cantor). The set of all subsets of natural numbers is uncountable.
Proof by contradiction. Suppose there is a one-to-one function f from N onto pow(N). Then pow(N) = $\{S_0, S_1, S_2, \ldots\}$ where $S_i = f(i)$. Let $D = \{n \in N \mid n \notin S_n\}$; D is the diagonal set for N. D is a set of natural numbers, hence $D = S_k$ for some k in N.
If k is in $S_k$, then k is not in D. But $S_k = D$.
If k is not in $S_k$, then k is in D. But $S_k = D$.
Both cases are contradictory, so pow(N) is not countable.

Concept Of Language: Context-Free Languages and Context-Free Grammars

From the results in the previous chapters, if a language is regular, it is easy to find a general format for sentences of the language by using regular expressions. Also, it is easy to check if a string is in the language by using the language's DFA model. However, not all languages are regular. In fact, some basic properties of programming languages require something beyond regular languages.

Example 4.0.4. In order to deal with mathematical expressions such as $((x + y) \cdot z + x \cdot y)$, a programming language needs to have the ability to recognize the language $L = \{\, (^n\, )^n : n \ge 0 \,\}$, which describes a simple kind of nested structure in programming languages.

Context-Free Grammars

Definition 4.1.1. A grammar G = (V, T, S, P) is context-free if all production rules in P have the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$. A language L is context-free if there exists a context-free grammar G such that L = L(G),

where $L(G) = \{ w \in T^* : S \Rightarrow^* w \}$. It is easy to see that any regular grammar is context-free. But a context-free grammar may not be regular.

Example 4.1.2. $L = \{ a^n b^n : n \ge 0 \}$ is not regular. Moreover, L is generated by the context-free grammar $G = (\{S\}, \{a, b\}, S, \{S \to aSb \mid \lambda\})$.

For a sentence w of a context-free language, there may be more than one derivation for w starting from S: $S \Rightarrow \ldots \Rightarrow w$. Furthermore, since it is possible to have more than one variable on the right-hand side of a production rule, there are several possibilities for applying production rules.

Production Rules:

Definition 4.1.3. A derivation is called leftmost if in each step the leftmost variable in the sentential form is replaced. A derivation is called rightmost if in each step the rightmost variable in the sentential form is replaced.

Example 4.1.4. $G = (\{A, B, S\}, \{a, b\}, S, P)$ where P consists of the numbered rules
(1) $S \to AB$
(2) $A \to aaA$
(3) $A \to \lambda$
(4) $B \to Bb$
(5) $B \to \lambda$
Leftmost: $S \overset{1}{\Rightarrow} AB \overset{2}{\Rightarrow} aaAB \overset{3}{\Rightarrow} aaB \overset{4}{\Rightarrow} aaBb \overset{5}{\Rightarrow} aab$
Rightmost: $S \overset{1}{\Rightarrow} AB \overset{4}{\Rightarrow} ABb \overset{5}{\Rightarrow} Ab \overset{2}{\Rightarrow} aaAb \overset{3}{\Rightarrow} aab$

A second way of showing derivations is by using derivation trees. This manner of showing derivations is independent of the order in which production rules are used.

Definition 4.1.5. Let G = (V, T, S, P) be a context-free grammar. A derivation tree is a tree in which
(1) the root is labeled S;
(2) every leaf has a label in $T \cup \{\lambda\}$;
(3) every interior vertex has a label in V;
(4) for every vertex $A \in V$, if A's children are $a_1, a_2, \ldots, a_n$, then P must contain the production rule $A \to a_1 a_2 \ldots a_n$;
(5) every leaf with label $\lambda$ has no sibling.

Like transition graphs for finite automata, derivation trees give a very explicit and easily comprehended description of a derivation.

[Figure: derivation tree with nodes labeled exp, expp, term, termp, factor, number, ( exp ) and lambda]

Theorem 4.1.6. Let G = (V, T, S, P) be a context-free grammar. For all $w \in \Sigma^*$, w is in L(G) if and only if there exists a derivation tree of G whose yield is w.
Proof. ($\Rightarrow$):

(1) We will first prove that for every sequence $S \Rightarrow x_1 \Rightarrow \ldots \Rightarrow x_n$ with $x_i \in (V \cup T)^*$, $i = 1..n$, there exists a partial derivation tree with root S which satisfies Conditions 1, 3, 4, 5 and yields $x_n$. We will prove this fact by induction on the length of the sequence.
n = 1: The tree is constructed by using the only production rule for deriving $x_1$ from S.
$n = k \ge 1$: Assume that for every sequence $S \Rightarrow x_1 \Rightarrow \ldots \Rightarrow x_k$ with $x_i \in (V \cup T)^*$, $i = 1..k$, there exists a partial derivation tree with root S which satisfies Conditions 1, 3, 4, 5 and yields $x_k$.
n = k + 1: Since the grammar is context-free, for every sequence $S \Rightarrow x_1 \Rightarrow \ldots \Rightarrow x_k \Rightarrow x_{k+1}$ of length k + 1, where $x_i \in (V \cup T)^*$, $i = 1..k+1$, $x_k$ must be of the form $uAv$. Moreover, there must be a production rule $A \to z$ in the set of production rules P, and $x_{k+1}$ must be of the form $uzv$ for some $A \in V$ and $u, v, z \in (V \cup T)^*$. From the induction assumption, there exists a partial derivation tree with root S that yields $uAv$. We simply add the children for node A following the production rule $A \to z$. The new partial tree has root S, satisfies Conditions 1, 3, 4, 5 and yields $uzv = x_{k+1}$.
(2) We will now prove that for all $w \in \Sigma^*$, if $w \in L(G)$ then there exists a derivation tree of G whose yield is w. Since $w \in L(G)$, there exists a derivation sequence $S \Rightarrow \ldots \Rightarrow w$. Therefore, there exists a partial derivation tree with root S which satisfies Conditions 1, 3, 4, 5 and yields w. This tree also satisfies Condition 2 because $w \in T^*$, which means all leaves of the tree are in $T \cup \{\lambda\}$.

($\Leftarrow$):
(1) We will first prove that for every partial derivation tree with $\ge 1$ interior node, whose yield is $x \in (V \cup T)^*$, there exists a sequence $S \Rightarrow \ldots \Rightarrow x$. We will prove this fact by induction on the number of interior nodes.
n = 1: The only interior node is S, and the sequence is $S \Rightarrow x$.
$n = k \ge 1$: Assume that for every partial derivation tree with $k \ge 1$ interior nodes, whose yield is $x \in (V \cup T)^*$, there exists a sequence $S \Rightarrow \ldots \Rightarrow x$.
n = k + 1: Since k + 1 > 1, every tree with k + 1 interior nodes must have a leaf $z_i \in V \cup T \cup \{\lambda\}$ such that its direct parent node $A \in V$ is different from S. (Otherwise, there is only one interior node, i.e., S.) Remove $z_i$ and all of its siblings. The new partial derivation tree has k interior nodes. Therefore, from the induction assumption, there exists a sequence $S \Rightarrow \ldots \Rightarrow uAv$ for some $u, v \in (V \cup T)^*$. Simply add $\Rightarrow uzv = x$ to the sequence, where z is the concatenation of all children of A; we have the sequence $S \Rightarrow \ldots \Rightarrow uAv \Rightarrow uzv = x$ for the tree.
(2) We will now prove that for all $w \in \Sigma^*$, if there exists a derivation tree of G whose yield is w, then $w \in L(G)$:

For every derivation tree whose yield is w, there exists a sequence $S \Rightarrow \ldots \Rightarrow w$. Since $w \in \Sigma^*$, $w \in L(G)$.

Theorem. Let G = (V, T, S, P) be a context-free grammar which does not have any $\lambda$-rules (i.e., $A \to \lambda$ where $A \in V$) or unit-production rules (i.e., $A \to B$ where $A, B \in V$). Then for all $w \in \Sigma^*$, the exhaustive search algorithm either produces a parsing of w or tells us that no parsing is possible.
Proof. After one round, either the length of the sentential form or the number of terminal symbols increases by at least one. Since neither the length of the sentential form nor the number of terminal symbols can exceed |w|, a derivation cannot involve more than $2|w|$ rounds.

Problem. This algorithm, however, is very inefficient. The upper bound for the number of sentential forms is $M = |P| + |P|^2 + \cdots + |P|^{2|w|}$.
Claim. We will reduce the complexity of the algorithm to $|w|^3$ or lower.

A context-free grammar G = (V, T, S, P) is called a simple grammar (s-grammar) if all of its production rules are of the form $A \to aX$, where $A \in V$, $a \in T$, $X \in V^*$, and every pair (A, a) occurs at most once in P.

Lemma 4.2.5. If a grammar G = (V, T, S, P) is simple, then every $w \in \Sigma^*$ can be parsed in at most |w| steps.
Proof. Assume that $w = a_1 a_2 \ldots a_n$.
- If P does not have the rule $S \to a_1 A_1 \ldots$ then stop, the string $w \notin L(G)$; else apply the production rule $S \to a_1 A_1 \ldots$
- If P does not have the rule $A_1 \to a_2 A_2 \ldots$ then stop, the string $w \notin L(G)$; else apply the production rule $A_1 \to a_2 A_2 \ldots$
- ...

Definition 4.2.6. A context-free grammar G is ambiguous if there exists a sentence $w \in L(G)$ which has at least two distinct derivation trees (equivalently, at least two distinct leftmost derivations).

Chomsky's hierarchy:

N. Chomsky: Three models for the description of language, IRE Trans. Information Th. 2, 113-124, 1956.

Motivating example:
Sentence -> Noun Verb Noun, e.g.: Bob loves Alice
Sentence -> Sentence Conjunction Sentence, e.g.: Bob loves Alice and Rome fights Carthage

Grammar G(V, A, P, S).
V: alphabet of non-terminal symbols, variables, grammatical types;
A: alphabet of terminal symbols;
S: start symbol, an element of V;
P: unordered set of productions of the form L -> R, where $L, R \in (V \cup A)^*$.
Rewriting step: for $x, y, y', z \in (V \cup A)^*$, u -> v iff u = xyz, v = xy'z and y -> y' is a production in P.
Derivation: ->* is the transitive, reflexive closure of ->, i.e.

u ->* v iff there exist $w_0, w_1, \ldots, w_j$ ($j \ge 0$) with $w_0 = u$, $w_{i-1}$ -> $w_i$ for $i = 1..j$, and $w_j = v$.
Language defined by G: $L(G) = \{ w \in A^* : S$ ->* $w \}$

Various restrictions on the productions define different types of grammars and corresponding languages:
Type 0, phrase structure grammar: no restrictions
Type 1, context sensitive: $|L| \le |R|$ (exception: S -> $\lambda$ is allowed if S never occurs on any right-hand side)
Type 2, context free: $L \in V$
Type 3, regular: $L \in V$, and R is of the form a or aB with $a \in A$, $B \in V$

Chomsky Normal Form (CNF)

Definition. A context-free grammar is in Chomsky Normal Form (CNF) if all production rules are of the form $A \to BC$ or $A \to a$ where $A, B, C \in V$, $a \in T$.

Algorithm 4.5.2. [CNF]
Input: A context-free grammar G = (V, T, S, P) with $\lambda \notin L(G)$
Output: An equivalent context-free grammar $\hat{G} = (\hat{V}, \hat{T}, \hat{S}, \hat{P})$ that is in CNF
Step 1: Remove $\lambda$-production rules and unit-production rules in G.
Step 2: Construct $G_1$ as follows. For every rule $A \to x_1 x_2 \ldots x_n$:
- if n = 1, there must be a terminal symbol a such that $A \to a$. Add $A \to a$ to $P_1$.
- if $n \ge 2$, add $A \to C_1 C_2 \ldots C_n$ to $P_1$, where $C_i = x_i$ if $x_i$ is a variable, and $C_i = B_a$ is a new variable if $x_i = a$ is a terminal symbol. Add $B_a \to a$ to $P_1$ for every new variable $B_a$.
Step 3: Construct $\hat{G}$ from $G_1$:
- Put into $\hat{P}$ all rules in $P_1$ of the form $A \to a$ and $A \to BC$.
- Replace $A \to C_1 C_2 \ldots C_n$ where n > 2 by $A \to C_1 D_1$, $D_1 \to C_2 D_2$, ..., $D_{n-2} \to C_{n-1} C_n$.
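Steps 2 and 3 of the CNF algorithm above can be sketched in a few lines. This is an illustration under stated assumptions, not a complete converter: Step 1 (removing λ-rules and unit rules) is assumed to have been done already, a grammar is represented as a list of (head, body) pairs with the body a tuple of symbols, and the generated names B_x and D1, D2, ... are assumed not to clash with existing variables. The function name `to_cnf` and the sample grammar are hypothetical.

```python
def to_cnf(rules, variables):
    """Apply Steps 2 and 3 only; rules is a list of (head, body-tuple) pairs."""
    # Step 2: replace terminals inside long bodies by fresh variables B_a
    p1 = []
    terminal_var = {}  # terminal a -> new variable name B_a
    for head, body in rules:
        if len(body) == 1:
            # after Step 1, a single right-hand symbol must be a terminal
            p1.append((head, body))
            continue
        new_body = []
        for sym in body:
            if sym in variables:
                new_body.append(sym)
            else:
                terminal_var.setdefault(sym, "B_" + sym)
                new_body.append(terminal_var[sym])
        p1.append((head, tuple(new_body)))
    for a, b_a in terminal_var.items():
        p1.append((b_a, (a,)))

    # Step 3: split bodies longer than 2 using fresh variables D1, D2, ...
    cnf = []
    counter = 0
    for head, body in p1:
        while len(body) > 2:
            counter += 1
            d = "D%d" % counter
            cnf.append((head, (body[0], d)))
            head, body = d, body[1:]
        cnf.append((head, body))
    return cnf

# Sample grammar for {a^n b^n : n >= 1}: S -> aSb | ab (no lambda-rules, no unit rules)
rules = [("S", ("a", "S", "b")), ("S", ("a", "b"))]
cnf = to_cnf(rules, {"S"})
for head, body in cnf:
    print(head, "->", " ".join(body))
```

Every rule produced this way has either two variables or a single terminal on the right-hand side, which is the CNF shape given in the definition.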

Unit II
Finite state machines, regular languages, deterministic finite automata, conversion to deterministic automata, ε-closures, regular expressions and finite automata, minimization of automata, Moore and Mealy machines and their equivalence.

Finite state machine: Finite State Machines with Output (Mealy and Moore Machines)

Introduction
If a combinational logic circuit is an implementation of a Boolean function, then a sequential logic circuit can be considered an implementation of a finite state machine. There is a little more to it than that (because a sequential logic circuit can contain combinational logic circuits). If you take a course in programming languages, you will also learn about finite state machines. Usually, you will call it a DFA (deterministic finite automaton). While finite state machines with outputs are essentially DFAs, the purpose behind them is different.

DFAs in programming languages
When you are learning about models of computation, one simple model is a deterministic finite automaton, or DFA for short. Formally, a DFA consists of:
Q, a set of states
S, a single state which is an element of Q. This is the start state.
F, a set of states designated as the final states
Sigma, the input alphabet
delta, a transition function that maps a state and a letter from the input alphabet to a state

DFAs are used to recognize a language, L. A language is a set of strings made from characters in the input alphabet. If a language can be recognized by a DFA, it is said to be a regular language (equivalently, it has a regular grammar).

To use a DFA, you start in an initial state, and process the input string a character at a time. For example, if the input alphabet consists of "a" and "b", then a typical question is to ask whether the string "aaab" is accepted by a DFA. To find out whether it is accepted, you start off in the start state, S. Then you process each character (first "a", then "a", then "a", and finally "b"). This may cause you to move from one state to another. After the last character is processed, if you are in a final state, then the string is in the language. Otherwise, it's not in the language.

There are some languages that can't be recognized by a DFA (for example, palindromes). Thus, while a DFA is reasonably powerful, there are other (mathematical) machines that are more powerful. Often, tokens in programming languages can be described using a regular grammar.

FSM with output in hardware
A finite state machine with output is similar to describe formally:
Q, a set of states
S, a single state which is an element of Q. This is the start state.
Sigma, the input alphabet
Pi, the output alphabet
delta, a transition function that maps a state and a letter from the input alphabet to a state and a letter from the output alphabet

The primary difference is that there is no set of final states, and that the transition function not only puts you in a new state, but also generates an output symbol. The goal of this kind of FSM is not accepting or rejecting strings, but generating a set of outputs given a set of inputs. Recall that a black box takes in inputs, processes, and generates outputs. FSMs are one way of describing how the inputs are being processed, based on the inputs and state, to generate outputs. Thus, we're very interested in what output is generated. In DFAs, we don't care what output is generated. We care only whether a string has been accepted by the DFA or not.
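The DFA procedure described in this section (start in S, consume one character at a time, check for a final state at the end) can be sketched as follows. The notes do not give a concrete machine for the "aaab" question, so the transition table below is a hypothetical DFA over {a, b} that accepts exactly the strings ending in "b"; the state names q0 and q1 are also assumptions.

```python
def dfa_accepts(delta, start, finals, string):
    # follow one transition per input character
    state = start
    for symbol in string:
        state = delta[(state, symbol)]
    # accept only if the last state reached is a final state
    return state in finals

# Hypothetical DFA: q1 means "the last character read was b"
delta = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q0", ("q1", "b"): "q1",
}
start, finals = "q0", {"q1"}

for s in ["aaab", "aaba", ""]:
    print(repr(s), dfa_accepts(delta, start, finals, s))
```

Representing delta as a dictionary keyed by (state, symbol) mirrors the formal definition of the transition function as a map from a state and a letter to a state.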
Since we're talking about circuits, the input alphabet is going to be the set of k bit bitstrings, while the output alphabet is the set of m bit bitstrings. We'll look at this more informally, just in case you're confused. An Example Let's look at an example of an FSM.

Each of the circles is a state. For now, all you need to know is that, at any given moment, you are in one state. Think of this as a game, where there are circles drawn on the ground, and at any moment, you are standing in exactly one circle. Each circle is given a unique binary number. The number of bits used depends on the total number of states. If there are N states, then you need ceil( lg N ) bits (the ceiling of log base 2 of N). The states are labelled with the letter q, plus subscripts. In this example, it's q1q0.

You may have k input bits. The input bits tell you which state to transition to. For example, if you have 2 input bits (x1x0), then there are four possible outgoing edges (x1x0 = 00, x1x0 = 01, x1x0 = 10, and x1x0 = 11). In general, there are 2^k outgoing edges for k bits of input. Thus, the number of edges depends on the number of bits used in the input.

Tracing an Example
You might be asked: what is the sequence of states and outputs, assuming you start in state 00 and have input (1, 1, 0, 0, 1)?

State   00 (Start)   01   10   01   01   10
Input   1            1    0    0    1

So, you start in state 00, reading input 1 (see column 1 of the table), which puts you in state 01. At that point, you read in input 1 (see column 2), and go into state 10 (column 3), etc.

FSM with Outputs: Moore machines
The goal of FSMs is to describe a circuit with inputs and outputs. So far, we have inputs that tell us which state we should go to, given some initial start state. However, the machine generates no outputs. We modify the FSM shown above by adding outputs. Moore machines add outputs to each state. Thus, each state is associated with an output. When you transition into the state, the output corresponding to the state is produced. The information in the state is typically written as 01/1. 01 indicates the state, while 1 indicates the output. 01/1 is shorthand for q1q0 = 01 / z = 1.

The number of bits in the output is arbitrary, and depends on whatever your application needs. Thus, the number of bits may be less than, equal to, or greater than the number of bits used to represent the state. Let's look at an example of a Moore machine.

In this example, you see two bits for the state and two bits for the output. Thus, when you see 00/01 inside one of the circles, it is shorthand for q1q0 = 00 / z1 z0 = 01. Tracing using Timing Diagrams Given the Moore machine in the previous diagram, and the timing diagram below, you might be asked to determine the state and output.

The timing diagram isn't too hard to follow. Basically, you will start off in some state (let's say, 00), and draw the diagram to indicate what happens to the state (q1q0) and to the output (z1z0). You'll notice the input does NOT change at the positive edge. That way, it's easier for you to tell the value of the input at the positive edge. To make it easier to read, I've added the value of x at the positive edge. Thus, the inputs are 1, 1, 0, 1, 1, 0. Let's look at the timing diagram at the first positive edge (drawn with a vertical line). Before the first edge, the state is q1q0 = 00. The input is 1. This should put us in state 01 (i.e., q1q0 = 01), which outputs 11 (i.e., z1z0 = 11).

You have to read down the columns. The first column says that the machine is in state 00, with output 01. The second column says that the machine is in state 01, with output 11. The reason the second column says that is due to the input, x, read in at the first positive edge. The input x is 1, which caused the FSM to move from state 00 to state 01. The values of the state and output are placed in the middle, but really, it's the dark line that tells you when this happens. The state and output change value on the positive edge (technically, it takes a small but finite amount of time after the positive edge for the state and output to finally settle down, but we'll draw the diagrams as if it happens instantaneously, even though it doesn't). Here's the rest of the timing diagram.

FSM with Outputs: Mealy machines
A Moore machine has outputs that are a function of state. That is, $z = f(q_{k-1}, \ldots, q_0)$.
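To make "output is a function of state" concrete, here is a minimal Moore machine sketch. Only three facts are taken from the text: state 00 outputs 01, state 01 outputs 11, and input 1 in state 00 leads to state 01. Every other transition and the output of state 10 are hypothetical placeholders, because the state diagram itself is not reproduced in these notes.

```python
def run_moore(delta, output, start, inputs):
    # the output is looked up from the state alone, including the start state
    state = start
    trace = [(state, output[state])]
    for x in inputs:
        state = delta[(state, x)]
        trace.append((state, output[state]))
    return trace

# Output attached to each state (the value for "10" is a placeholder)
output = {"00": "01", "01": "11", "10": "00"}

# Transition table; only ("00", 1) -> "01" is stated in the text,
# the remaining entries are hypothetical
delta = {
    ("00", 0): "00", ("00", 1): "01",
    ("01", 0): "01", ("01", 1): "10",
    ("10", 0): "01", ("10", 1): "00",
}

for state, z in run_moore(delta, output, "00", [1, 1, 0, 1, 1, 0]):
    print("state", state, "output", z)
```

Note that the trace has one more entry than there are inputs: a Moore machine produces an output for its start state before any input is read.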

A Mealy machine has outputs that are a function of state and input. That is, $z = f(q_{k-1}, \ldots, q_0, x_{m-1}, \ldots, x_0)$. We usually indicate that the output is dependent on current state and input by drawing the output on the edge. In the example below, look at the edge from state 00 to state 01. This edge has the value 1/1. This means that if you are in state 00, and you see an input of 1, then you output a 1, and transition to state 01. Thus, 1/1 is shorthand for x = 1 / z = 1.

Here's a sample Mealy machine. One thing you will notice is the numbering of the states. Usually, if there are 3 states, we number them 00, 01, and 10, since those are the first 3 unsigned binary (UB) numbers. However, given that we're using two bits, we can, in principle, pick any 3 of the 4 possible 2-bit numbers. One reason we might want to pick something else besides 00, 01, and 10 is because implementing an FSM with minimal gates often involves picking the correct state numbering. Thus, if you're careful about which states are numbered, say, 00, 01, and 11, you may be able to create a circuit that has fewer gates. However, minimization of the circuit based on well-chosen state numberings is outside the scope of the course. We pick these state numberings only to note that this could happen, but we won't take advantage of this fact.

Another interesting point to observe is how a Mealy machine differs from a Moore machine. Already we said that a Mealy machine's output may depend on both the values of state and input variables. We can see this in the example. Look at the edge from 00 to 01. This edge says that if a 1 is input, we will transition to state 01 and output a 1. Now look at the loop in state 01. This says that if the input is 1, we will loop back to state 01, and output a 0. So, in the first case, going to state 01 outputs a 1, whereas in the second case, going to state 01 outputs a 0. In a Moore machine, this would not happen. The output depends only on the state you transition into, not how you got into that state.

Tracing using Timing Diagrams
To see how the previous Mealy machine behaves, we can use timing diagrams. We'll use the same input as before.
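A Mealy machine can be traced in code by attaching the output to each transition instead of each state. In the sketch below, two edges come from the text: in state 00, input 1 outputs 1 and moves to 01, and in state 01, input 1 outputs 0 and stays in 01. The remaining two edges are hypothetical, and the machine is reduced to two states for brevity, so this is not the machine from the missing figure.

```python
def run_mealy(delta, start, inputs):
    # delta maps (state, input) to (next state, output)
    state = start
    states, outputs = [state], []
    for x in inputs:
        state, z = delta[(state, x)]
        states.append(state)
        outputs.append(z)
    return states, outputs

delta = {
    ("00", 1): ("01", 1),  # from the text: edge 1/1 from 00 to 01
    ("01", 1): ("01", 0),  # from the text: loop 1/0 on state 01
    ("00", 0): ("00", 0),  # hypothetical
    ("01", 0): ("00", 0),  # hypothetical
}

states, outputs = run_mealy(delta, "00", [1, 1, 0, 1, 1, 0])
print(states)
print(outputs)
```

The first two steps both end in state 01 but emit different outputs (1, then 0), which is exactly the behaviour the text says a Moore machine cannot show.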

The following timing diagram shows what happens to the state (q1q0) and to the output (z).

Just to see what happens: initially, we're in state 00, with an output of 0. It doesn't terribly matter what the initial output is. Unlike in a Moore machine, a Mealy machine's output is not determined by the current state alone. In state 00, we see an input of a 1. This takes us to state 01, with an output of 1. If you read the second column of numbers, you see a 0 and a 1 (which is state 01), followed by a 1 (which is the output).

Equivalence of Mealy and Moore machines
We have two ways to describe an FSM: Mealy and Moore machines. A mathematician might ask: are the two machines equivalent? Initially, you might think not. A Mealy machine can have its output depend on both input and state. Thus, if we ignore the input, we should be able to convert a Moore machine to a Mealy machine. It's not so easy to see that you can convert an arbitrary Mealy machine to a Moore machine.

It turns out that the two machines are equivalent. What does that mean? It means that given a Moore machine, you can create a Mealy machine, such that if both machines are fed the same sequence of inputs, they will both produce the same sequence of outputs. You can also convert from a Mealy machine to its equivalent Moore machine, and again generate the same outputs given the same sequence of inputs.

Actually, to be precise we must ignore one fact about Moore machines. Moore machines generate output even if no input has been read in. So, if you ignore this initial output of the Moore machine, you can convert between one machine and the other.

The actual algorithm is beyond the scope of the course. However, the basic idea of converting a Mealy machine to a Moore machine is to increase the number of states. Roughly speaking, if you have a Mealy machine with N states, and there are k bits of input, you may need up to 2^k N states in the equivalent Moore machine. Effectively, the new states record information about how that state was reached.

Regular Language:

Regular languages

We define two new operations on languages, considering languages as sets of strings.

Def 17.9 (p. 462) The concatenation (or set product) XY (sometimes written $X \cdot Y$) of two sets of strings X and Y is $XY = \{xy : x \in X, y \in Y\}$. The Kleene star or closure of a set of strings X is X*, the set of all strings formed by concatenating members of X any number of times (including zero) in any order and allowing repetitions.

This is just like our existing notion A* for alphabet A except that now X is a set of strings, not just an alphabet (which we can consider as a set of strings of length one). So the old notion is just a special case of the new more general notion, the case where A is a set of strings of length one.

Using these two new operations on sets of strings, as well as standard set-theoretic notions, we can now define the regular languages recursively.

Definition 17.10 (p. 463) Given an alphabet A:
1. $\emptyset$ is a regular language.
2. For any string x in A*, {x} is a regular language.
3. If X, Y are regular languages, then so is $X \cup Y$.
4. If X, Y are regular languages, then so is XY.
5. If X is a regular language, then so is X*.
6. Nothing else is a regular language.

Deterministic Finite Automata (DFA):

We now begin the machine view of processing a string over alphabet $\Sigma$. The machine has a semi-infinite tape of squares holding one alphabet symbol per square. The machine has a finite state set, K, with a known start state. Initially we want the processing to be deterministic, that is, there is only one possible outcome from processing a string. Here is how we process it:
- place the string on a tape with one symbol in each square
- place the machine in the start state and the read head on the first square
- a computation step is done by considering the pair (current state, current tape symbol) and, based on the value of this pair, moving to a new state and moving the tape head one square to the right
- stop the machine when there are no more symbols to process

According to the description, the state change operation is a function $\delta : K \times \Sigma \to K$. There are a number of interpretations we can give to this processing method, but the one of most interest to us is that of accepting the input string based on terminal state. Because of the deterministic behavior, we can say more strongly that it is deciding this string, as to whether it belongs to the language or not. The natural interpretation is that an accepted string's terminal state belongs to a set of final, or accepting, states $F \subseteq K$.

The description of a DFA $M = (K, \Sigma, \delta, s, F)$ is that which is defined in the textbook. We must define precisely what it means to compute an input string. Because the machine never goes back, and stops after the last symbol, we can characterize the machine state as a configuration, which is an element of $K \times \Sigma^*$ representing (current machine state, remaining portion of string to process). We define the binary relation

$\vdash \; \subseteq (K \times \Sigma^*) \times (K \times \Sigma^*)$, the yields in one step relation.

It is defined as follows: for all $\sigma \in \Sigma$, $(p, \sigma w) \vdash (q, w)$ for all $w \in \Sigma^*$ if and only if $\delta(p, \sigma) = q$.

This makes rigorous several notions about processing in a DFA:
- symbols are processed from left to right, processing each one only once
- the state change only consults the current symbol (what is ahead in the string is irrelevant)

Define $\vdash^*$ to be the reflexive, transitive closure of $\vdash$; this is called the yields relation. In a DFA, computing the string w means to put the machine in the start configuration (s, w). A terminal state, q, is one for which $(s, w) \vdash^* (q, \varepsilon)$. Because the state transition is a function, there is only one state p such that $(q, w) \vdash^* (p, \varepsilon)$. It is sometimes convenient to express the state transition as a function

$\delta^* : K \times \Sigma^* \to K$ where $\delta^*(q, w) = p$ if and only if $(q, w) \vdash^* (p, \varepsilon)$.

Because of the nature of the DFA, this function is well-defined.

Acceptance
We say the string w is accepted if, proceeding from the start state, the terminal state we reach is final, i.e., belongs to F. Concisely, w is accepted if $\delta^*(s, w) \in F$. Because of the determinism, it is sometimes said that w is decided by a DFA. The language accepted by a DFA is the set of all strings accepted by the DFA.

DFA and Regular Language Equivalence

One of the main goals of Chapter 2 is to show the following:

Theorem: A language is accepted by a DFA if and only if it is a regular language (i.e., has a regular expression).

Graphical representation of DFA

Finite automata lend themselves readily to graphical interpretation. Each state is a node in the graph; the start state and the final states carry distinguishing marks (the node symbols are not reproduced here). A transition δ(p, σ) = q is represented by the labelled edge (p, σ, q). Although it seems obvious, we should still state the equivalence of the two representations: state transitions on a string correspond to paths in the graph.

Claim: (q0, σ1, q1), (q1, σ2, q2), ..., (qn-1, σn, qn) are labelled edges if and only if (q0, σ1σ2...σn) ⊢* (qn, ε). The empty path (no labelled edges) corresponds to the empty string (no symbols).

Example 2.1.1 in textbook: This is the even-parity checker for the number of b's in a string, i.e., the machine accepts the language L = { w : the number of b's in w is even }. Accepted strings include ε, every string in a*, and every string in a*ba*b. We'll soon see the construction of a RE from a DFA, but this one is easy. You can see two looping paths from the start/final state back to itself, by either a or ba*b. The expression (a ∪ ba*b) represents one loop taken along either of these two general "paths". We can repeat this 0 or more times, getting (a ∪ ba*b)* as our RE for the DFA. Switching final and non-final states gives us a DFA which represents the complement of the previous language: { w : the number of b's in w is odd }. Note that the RE doesn't easily "complement" in any way.

Example 2.1.2 in textbook: L = { w : w does not contain the substring bbb }. The state q3 is called a dead state, because no path leaving it can reach a final state.
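The dead state of Example 2.1.2 can be watched in action with a small sketch. The textbook figure is not reproduced in these notes, so the transition table below is an assumption: state qi (i < 3) is taken to mean "the input read so far ends in exactly i consecutive b's", and q3 absorbs everything once bbb has been seen.

```python
def accepts(delta, start, finals, w):
    """Run the DFA on w and report whether the terminal state is final."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals


# Assumed transition table for "w does not contain the substring bbb".
NO_BBB = {}
for i in range(3):
    NO_BBB[(f"q{i}", "a")] = "q0"           # an a resets the count of trailing b's
    NO_BBB[(f"q{i}", "b")] = f"q{i + 1}"    # a b extends the run of b's
NO_BBB[("q3", "a")] = "q3"                  # dead state: every edge loops back
NO_BBB[("q3", "b")] = "q3"
FINALS = {"q0", "q1", "q2"}

print(accepts(NO_BBB, "q0", FINALS, "abbabb"))  # no bbb anywhere
print(accepts(NO_BBB, "q0", FINALS, "abbbaa"))  # stuck in q3 after bbb
```

Once the run enters q3, no suffix can bring it back to a final state, which is exactly the dead-state property.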
Again, switching final and non-final states constructs the DFA for the complement language: { w : w contains the substring bbb }. It's also easy to see that a regular expression for this complement is (a ∪ b)*bbb(a ∪ b)*, since we only need to find the substring bbb somewhere in the string, not necessarily its first occurrence, which is what the DFA finds.

DFAs for substring acceptance

The previous example can be generalized. Consider the language { w ∈ Σ* : w contains the substring u }. This is very easily expressed by the regular expression (Σ*)u(Σ*). To generate a DFA, write u = σ1...σn. Draw the partial DFA representing the "success" transitions: (q0, σ1, q1), (q1, σ2, q2), ..., (qn-1, σn, qn). Start state = q0, final state = qn.

What is missing are the failure transitions (qi-1, σ, ??) for σ ≠ σi. We don't always go back to the start state when we fail. In general, look at the failure string y = σ1...σi-1σ, where σ ≠ σi. What we want to find is the string x, where x = the longest suffix of the failure string y which is also a prefix of the success string u. Take this string x, run the DFA from the start state to reach a state p, and add the failure transition (qi-1, σ, p).

Here is another example with Σ = {a, b}: L = { w : w contains the substring abaa }. (The figure showing the setup, i.e. the success transitions for abaa, is not reproduced here.) For example, consider the failure string at q3: abab. Observe that, for x = ab, the failure string is ab·x and the success string is x·aa, and that x is the longest string satisfying this requirement. Therefore, to get the target state of the failure transition, run x from the start, thereby adding (q3, b, q2). Completing this procedure for every state gives the full DFA.

Complement

We now state precisely one concept that we have been suggesting in the above examples.

Theorem: If a language L is accepted by a DFA, then there is a derived DFA which accepts the complement Σ* - L. Given the DFA (K, Σ, δ, s, F), the derived DFA is (K, Σ, δ, s, K - F). Namely, the final state set of the derived DFA is the complement of the final state set of the original DFA.

Product Construction: intersection and union

Theorem: If languages L1 and L2 over Σ are accepted by DFAs, then there are derived DFAs which accept
- the intersection L1 ∩ L2
- the union L1 ∪ L2

Given the DFAs (K1, Σ, δ1, s1, F1) and (K2, Σ, δ2, s2, F2) for L1 and L2, respectively, the derived DFAs are of the form (K1 × K2, Σ, δ, (s1, s2), F), where δ((p, q), σ) = (δ1(p, σ), δ2(q, σ)) and F is either
- for the intersection: F = F1 × F2
- for the union: F = (F1 × K2) ∪ (K1 × F2)

Intuitively, the idea is to run the two DFAs simultaneously to a terminal state (q1, q2) and then accept the string if
- (for the intersection) both q1 and q2 are final in their respective DFAs
- (for the union) at least one of q1 and q2 is final in its DFA

Intersection Example

Find a DFA for the language over {a, b}: { w : w has an even number of b's and does not contain the substring bb }. The two component languages, each with its own DFA (figures not reproduced here), are { w : w has an even number of b's } and { w : w does not contain the substring bb }. The intersection construction gives us a machine with 6 states, {Ax, Ay, Az, Bx, By, Bz}, derived from all state pairs. The hard part is usually figuring out how to draw the constructed DFA meaningfully. We can lay the states out in a 2x3 grid, but the crossing of the edges tends to obscure any simple sense of the behavior. In the resulting machine, states Az and Bz are both dead and can effectively be replaced by a single state.

Minimization

Reducing the number of states is a step towards minimizing the DFA. There is a procedure for doing so, and it concerns finding states which are equivalent in the sense that the set of strings which lead to final states starting from any of them is the same. In the case of a dead state, the set of such strings is simply ∅. Equivalent states can be replaced by a single state.

Conversion deterministic automata:

E- closures- regular expression finite automata:

A regular expression specifies a language. The regular languages are those languages specified by regular expressions.

Example: 01* | 00 is the regular expression denoting the strings that consist of a 0 followed by any number of 1s, or of a 0 followed by a single 0.

Terminology: finite automaton = finite state automaton. The class is sometimes called FSA, or just FA. The languages accepted by fsa are called finite state languages. These are also the regular languages, but we use a separate definition to define regular language, and then we prove the equivalence of the two classes. We will also consider Chomsky formal grammars and the Chomsky hierarchy. We can prove that a particular class of formal grammars, the Type 3 grammars, define the same class of languages, the regular languages. We will thus have three independent characterizations of the same class of languages.

An automaton may be deterministic or non-deterministic. We will first define deterministic fsa, then non-deterministic fsa, and then show that for fsa (this is not true for some other classes of automata) the two subclasses are equivalent with respect to the class of languages accepted.

State diagrams. Example illustrating how dfa work. [blackboard; fig 17-2, p. 456] Transitions are of the form (qi, a, qj), or equivalently δ(qi, a) = qj.

Definition. A deterministic finite automaton (dfa) M is a 5-tuple <K, A, δ, qin, F>, where (1)
- K is a finite set of states
- A is an alphabet
- qin ∈ K is the initial state
- F ⊆ K is the set of final states
- δ : K × A → K is the transition function (or next-state function)

What makes this automaton deterministic is that for each state and symbol there is exactly one transition to a next state. Automata accept some strings (and don't accept others).

(1) We use A for the alphabet where PtMW use Σ, and we use qin for the initial state where PtMW use q0.

Ling 726: Mathematical Linguistics, Lecture 13-14 Finite State Automata and Languages V. Borschev and B. Partee, October 31- Nov 1, 2006

Definition. Given a dfa M, a string x ∈ A*, x = a1a2…an, ai ∈ A, is accepted by M iff there exists a sequence of states q1, q2, …, qn, qn+1 such that:
1) q1 = qin is the initial state,
2) qn+1 ∈ F is a final state,
3) if x is not empty (i.e. n > 0), then for every i, 1 ≤ i ≤ n, δ(qi, ai) = qi+1.
In the case n = 0 the string x is empty, x = e, and for e to be accepted by M it is enough that the first two conditions, (1) and (2), hold, i.e. q1 = qin and q1 ∈ F.

The language L(M) accepted by a dfa M is the set of all strings accepted by M.

Non-deterministic fas (nfa). There are two in-principle weakenings of the requirements, and two more that are optional but commonly included.
(i) For a given state-symbol pair, possibly more than one next state. [this is THE crucial one]
(ii) For a given state-symbol pair, possibly no next state. [this could always be modelled by adding a dead-end state]
(iii) Allowing a transition of the form (qi, w, qj) where w ∈ A*, i.e. being able to read a string of symbols in one move, not only a single symbol. And as a noteworthy subcase of that,
(iv) allowing a transition of the form (qi, e, qj): changing state without reading a symbol.

Example. fig 17-3, p. 459

A string is accepted by a non-deterministic fa if there is some path through the state diagram which begins in the initial state, reads the entire string, and ends in a final state.
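The "some path" condition can be checked without guessing by tracking the set of all states the nfa could be in after each symbol. The sketch below is a minimal illustration under stated assumptions: it supports weakenings (i), (ii) and (iv) only (single symbols and e-moves, with e written as the empty string ""), and the small machine for "strings ending in ab" is hypothetical.

```python
def eps_closure(states, delta):
    """All states reachable from `states` using only e-transitions (label "")."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def nfa_accepts(delta, start, finals, x):
    """True iff some path from `start` reads all of x and ends in a final state."""
    current = eps_closure({start}, delta)
    for symbol in x:
        moved = set()
        for q in current:
            moved |= set(delta.get((q, symbol), ()))  # a missing entry means no next state
        current = eps_closure(moved, delta)
    return bool(current & finals)


# Hypothetical nfa over {a, b} accepting the strings that end in "ab".
ENDS_IN_AB = {
    ("s", ""): {"q0"},            # weakening (iv): an e-move
    ("q0", "a"): {"q0", "q1"},    # weakening (i): two possible next states
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},          # weakening (ii): q1 has no move on a
}

print(nfa_accepts(ENDS_IN_AB, "s", {"q2"}, "aab"))
```

The sets of states computed along the way are exactly the states of the equivalent deterministic automaton, which is why that automaton can need up to 2^n states for an nfa with n states.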

Formal definition of non-deterministic fa. This is just like the formal definition of a dfa, except that in place of the transition function we have a transition relation Δ, a finite subset of K × A* × K (i.e. a set of transitions of the form (qi, w, qj) where w ∈ A*). The definition of acceptance of a string is similar to the one for dfa.

Definition. Given a nfa M, a string x ∈ A* is accepted by M iff there exist two sequences, a sequence of strings w1, w2, …, wn, wi ∈ A*, n ≥ 0, with x = w1w2…wn, and a sequence of states q1, q2, …, qn, qn+1 such that:
1) q1 = qin is the initial state,
2) qn+1 ∈ F is a final state,
3) if n > 0, then for every i, 1 ≤ i ≤ n, (qi, wi, qi+1) ∈ Δ.
If x is empty, x = e, then for e to be accepted by M it is enough that the first two conditions, (1) and (2), hold, i.e. q1 = qin and q1 ∈ F.

Equivalence of deterministic and non-deterministic fsa. This is a major result; it is not self-evident. The algorithm for constructing an equivalent deterministic fsa, given a non-deterministic one, is a bit complex and we won't do it; in the worst case it may give a dfa with 2^n states corresponding to a nfa with n states. (And that presupposes that we take the narrower definition of nfa, with weakenings (i) and (ii) but not (iii) or (iv).)

Why it is useful to have both notions: the deterministic fa are conceptually more straightforward, but in a given case it is often easier to construct a non-deterministic fa. Also, for some other classes of automata that we will consider, the two subclasses are not equivalent, so the notions remain important.

Theorem. (Kleene) A set of strings is a finite automaton language iff it is a regular language.

We can sketch one half of the proof by showing how to construct a finite state automaton corresponding to any given regular expression. (See pp. 464-468) Steps in the proof:
i. The empty language is a fal (finite automaton language).
ii. For any string x in A*, {x} is a fal.
iii. fals are closed under union.
iv. fals are closed under concatenation.
v. fals are closed under the Kleene star operation.

Minimization Of Automata:

One important result on finite automata, both theoretically and practically, is that for any regular language there is a unique DFA having the smallest number of states that accepts it. Let M = < Q , Σ , q0 , δ , A > be a DFA that accepts a language L. Then the following algorithm produces the DFA, denoted M1, that has the smallest number of states among the DFAs that accept L.

Minimization Algorithm for DFA

Construct a partition Π = { A, Q - A } of the set of states Q ;
Π_new := new_partition( Π ) ;
while ( Π_new ≠ Π )
    Π := Π_new ;
    Π_new := new_partition( Π ) ;
Π_final := Π ;

function new_partition( Π )
for each set S of Π do
    partition S into subsets such that two states p and q of S are in the same subset of S if and only if, for each input symbol, p and q make a transition to (states of) the same set of Π.
    The subsets thus formed are sets of the output partition in place of S. If S is not partitioned in this process, S remains in the output partition.
end

Minimum DFA M1 is constructed from Π_final as follows:

Select one state in each set of the partition Π_final as the representative for the set. These representatives are the states of the minimum DFA M1. Let p and q be representatives, i.e. states of the minimum DFA M1. Let us also denote by p and q the sets of states of the original DFA M represented by p and q, respectively. Let s be a state in p and t a state in q. If a transition from s to t on symbol a exists in M, then the minimum DFA M1 has a transition from p to q on symbol a. The start state of M1 is the representative of the set which contains the start state of M. The accepting states of M1 are the representatives that are in A. Note that each set of Π_final is either a subset of A or disjoint from A.

Remove from M1 the dead states and the states not reachable from the start state, if there are any. Any transitions to a dead state become undefined. A state is a dead state if it is not an accepting state and has no out-going transitions except to itself.
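The partition-refinement loop of the algorithm can be sketched as follows. This is an illustration under assumptions, not the notes' own code: the DFA is given as a total transition dictionary, blocks are frozensets, and the four-state machine (two redundant copies of a two-state DFA for "strings ending in a") is invented for the example. The sketch stops at Π_final and does not remove dead or unreachable states.

```python
def new_partition(partition, delta, alphabet):
    """Split each block so that two states stay together only if, on every
    input symbol, they move into the same block of `partition`."""
    block_of = {q: i for i, block in enumerate(partition) for q in block}
    refined = []
    for block in partition:
        groups = {}
        for q in sorted(block):
            signature = tuple(block_of[delta[(q, a)]] for a in alphabet)
            groups.setdefault(signature, set()).add(q)
        refined.extend(frozenset(g) for g in groups.values())
    return refined


def final_partition(states, alphabet, delta, accepting):
    """Iterate new_partition from { A, Q - A } until nothing changes."""
    partition = [b for b in (frozenset(accepting), frozenset(states - accepting)) if b]
    while True:
        refined = new_partition(partition, delta, alphabet)
        if len(refined) == len(partition):  # refinement only splits, so equal size means unchanged
            return set(refined)
        partition = refined


# Hypothetical DFA: states 1 and 2 are accepting, 0 and 3 are not.
DELTA = {
    (0, "a"): 1, (0, "b"): 3,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 2, (3, "b"): 0,
}

print(final_partition({0, 1, 2, 3}, "ab", DELTA, {1, 2}))
```

Here the initial partition is already stable, so the minimum DFA has two states, one representative per block.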

Example 1 : Let us try to minimize the number of states of the following DFA.


Initially Π = { { 1 , 5 } , { 2 , 3 , 4 } }.

new_partition is applied to Π. Since on b state 2 goes to state 1 and state 3 goes to state 4, and 1 and 4 are in different sets in Π, states 2 and 3 are going to be separated from each other in Π_new. Also, since on a state 4 goes to state 4 and state 3 goes to state 5, and 4 and 5 are in different sets in Π, states 3 and 4 are going to be separated from each other in Π_new. Further, since on b state 2 goes to 1 and state 4 goes to 4, and 1 and 4 are in different sets in Π, states 2 and 4 are separated from each other in Π_new. On the other hand, 1 and 5 make the same transitions, so they are not going to be split. Thus the new partition is { { 1 , 5 } , { 2 } , { 3 } , { 4 } }. This becomes the Π of the second iteration.

When new_partition is applied to this new Π, since 1 and 5 make the same transitions, Π remains unchanged. Thus Π_final = { { 1 , 5 } , { 2 } , { 3 } , { 4 } }.

Select 1 as the representative for { 1 , 5 }. Since the rest are singletons, they have the obvious representatives. Note here that state 4 is a dead state because the only transition out of it is to itself. Thus the set of states for the minimized DFA is { 1 , 2 , 3 }. For the transitions, since 1 goes to 3 on a and to 2 on b in the original DFA, in the minimized DFA transitions are added from 1 to 3 on a and from 1 to 2 on b. Also, since 2 goes to 1 on b and 3 goes to 1 on a in the original DFA, in the minimized DFA transitions are added from 2 to 1 on b and from 3 to 1 on a. Since the rest of the states are singletons, all transitions between them are inherited for the minimized DFA. Thus the minimized DFA is as given in the following figure:

Example 2 : Let us try to minimize the number of states of the following DFA.

Initially Π = { { 3 } , { 1 , 2 , 4 , 5 , 6 } }. By applying new_partition to this Π, Π_new = { { 3 } , { 1 , 4 , 5 } , { 2 , 6 } } is obtained. Applying new_partition to this Π, Π_new = { { 3 } , { 1 , 4 } , { 5 } , { 2 } , { 6 } } is obtained. Applying new_partition again, Π_new = { { 1 } , { 2 } , { 3 } , { 4 } , { 5 } , { 6 } } is obtained. Thus the number of states of the given DFA is already minimum and it cannot be reduced any further.

Moore and Mealy machine and their equivalence:

Example Q3. Derive a minimal state table for a single-input and single-output Moore-type FSM that produces an output of 1 if in the input sequence it detects either the pattern 110 or the pattern 101. Overlapping sequences should be detected. (Show the detailed steps of your solution.)

Unit III Pumping lemma for regular sets-closure properties of regular sets-decision properties for regular sets, equivalence between regular language and regular grammar. Context free language parse trees and ambiguity, reduction of CFGs, Chomsky and Greibach normal forms

Pumping lemma for regular sets:

Another view of the concept of language: not the formalization of the notion of effective procedure, but a set of words satisfying a given set of rules. Origin: formalization of natural language.

Example
1. a phrase is of the form subject verb
2. a subject is a pronoun
3. a pronoun is he or she
4. a verb is sleeps or listens

Possible phrases:
1. he listens
2. he sleeps
3. she sleeps
4. she listens

Grammars:
- Grammar: generative description of a language
- Automaton: analytical description
- Example: programming languages are defined by a grammar (BNF), but recognized with an analytical description (the parser of a compiler)
- Language theory establishes links between analytical and generative language descriptions.
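The generative reading of the four phrase rules above can be made concrete with a short sketch. The dictionary encoding and the function name expand are assumptions for illustration; the approach enumerates every derivation and therefore only terminates for grammars without recursion, such as this one.

```python
from itertools import product

# The four rules of the example, written as symbol -> list of right-hand sides.
RULES = {
    "phrase": [["subject", "verb"]],
    "subject": [["pronoun"]],
    "pronoun": [["he"], ["she"]],
    "verb": [["sleeps"], ["listens"]],
}


def expand(symbol):
    """Return every word sequence that `symbol` can generate."""
    if symbol not in RULES:          # a word of the language: nothing left to rewrite
        return [[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # Combine every choice for each symbol on the right-hand side.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


PHRASES = sorted(" ".join(words) for words in expand("phrase"))
print(PHRASES)
```

The grammar generates exactly the four phrases listed in the example.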

Context free language: Context free grammars (CFG) and languages (CFL)

Goals of this chapter: CFGs and CFLs as models of computation that define the syntax of hierarchical formal notations as used in programming or markup languages. Recursion is the essential feature that distinguishes CFGs and CFLs from FAs and regular languages. Properties, strengths and weaknesses of CFLs. Equivalence of CFGs and NPDAs. Non-equivalence of deterministic and non-deterministic PDAs. Parsing. Context sensitive grammars CSG.

Context free grammars and languages (CFG, CFL)

Algol 60 pioneered CFGs and CFLs to define the syntax of programming languages (Backus-Naur Form).

Ex: arithmetic expression E, term T, factor F, primary P, a-op A = {+, -}, m-op M = {·, /}, exp-op = ^. E → … P, P → ( E ) [Notice the recursion: E →* ( E )]

Ex Recursive data structures and their traversals: Binary tree T, leaf L, node N: T → L | N T T (prefix), or T → L | T N T (infix), or T → L | T T N (suffix). These definitions can be turned directly into recursive traversal procedures, e.g.:
procedure traverse (p: ptr);
begin if p ≠ nil then begin visit(p); traverse(p.left); traverse(p.right); end; end;

Df CFG: G = (V, A, P, S)
V: non-terminal symbols, "variables"; A: terminal symbols; S ∈ V: start symbol, "sentence";
P: set of productions or rewriting rules of the form X → w, where X ∈ V, w ∈ (V ∪ A)*.
Rewriting step: for u, v, x, y, y', z ∈ (V ∪ A)*: u → v iff u = x y z, v = x y' z and y → y' ∈ P.
Derivation: →* is the transitive, reflexive closure of →, i.e. u →* v iff there are w0, w1, …, wk with k ≥ 0, u = w0, wj-1 → wj for 1 ≤ j ≤ k, and wk = v.
L(G), the context free language generated by G: L(G) = { w ∈ A* | S →* w }.

Ex Symmetric structures: L = { 0^n 1^n | n ≥ 0 }, or even palindromes L0 = { w w^reversed | w ∈ {0, 1}* }.
G(L) = ( {S}, {0, 1}, { S → 0S1, S → ε }, S ); G(L0) = ( {S}, {0, 1}, { S → 0S0, S → 1S1, S → ε }, S ).
Palindromes (length even or odd): add the rules S → 0, S → 1 to G(L0).

Ex Parenthesis expressions: V = {S}, T = { (, ), [, ] }, P = { S → ε, S → (S), S → [S], S → SS }
Sample derivation: S → SS → SSS →* ()[S][ ] → ()[SS][ ] →* ()[()[ ]][ ]
The rule S → SS makes this grammar ambiguous. Ambiguity is undesirable in practice, since the syntactic structure is generally used to convey semantic information.

Ex Ambiguous structures in natural languages: "Time flies like an arrow" vs. "Fruit flies like a banana". "Der Gefangene floh" vs. "Der gefangene Floh".

Bad news: There exist CFLs that are inherently ambiguous, i.e. every grammar for them is ambiguous (see Exercise). Moreover, the problem of deciding whether a given CFG G is ambiguous or not is undecidable.
Good news: For practical purposes it is easy to design unambiguous CFGs.

Exercise:
a) For the Algol 60 grammar G (simple arithmetic expressions) above, explain the purpose of the rule E → AT and show examples of its use. Prove or disprove: G is unambiguous.
b) Construct an unambiguous grammar for the language of parenthesis expressions above.
c) The ambiguity of the "dangling else". Several programming languages (e.g. Pascal) assign to nested if-then[-else] statements an ambiguous structure. It is then left to the semantics of the language to disambiguate. Let E denote Boolean expression, S statement, and consider the 2 rules: S → if E then S, and S → if E then S else S. Discuss the trouble with this grammar, and fix it.
d) Give a CFG for L = { 0^i 1^j 2^k | i = j or j = k }. Try to prove: L is inherently ambiguous.

Equivalence of CFGs and NPDAs

Thm (CFG ~ NPDA): L ⊆ A* is CF iff there is an NPDA M that accepts L.

Pf →: Given CFL L, consider any grammar G(L) for L. Construct an NPDA M that simulates all possible derivations of G. M is essentially a single-state FSM, with a state q that applies one of G's rules at a time. The start state q0 initializes the stack with the content S ¢, where S is the start symbol of G, and ¢ is the bottom of stack symbol. This initial stack content means that M aims to read an input that is an instance of S. In general, the current stack content is a sequence of symbols that represent tasks to be accomplished in the characteristic LIFO order (last-in first-out). The task on top of the stack, say a non-terminal X, calls for the next characters of the input string to be an instance of X. When these characters have been read and verified to be an instance of X, X is popped from the stack, and the new task on top of the stack is started. When ¢ is on top of the stack, i.e. the stack is empty, all tasks generated by the first instance of S have been successfully met, i.e. the input string read so far is an instance of S. M moves to the accept state and stops.

The following transitions lead from q to q:
1) ε, X → w for each rule X → w. When X is on top of the stack, replace X by a right-hand side for X.
2) a, a → ε for each terminal a ∈ A. When terminal a is read as input and a is also on top of the stack, pop the stack.

Rule 1 reflects the following fact: one way to meet the task of finding an instance of X as a prefix of the input string not yet read is to solve all the tasks, in the correct order, present in the right-hand side w of the production X → w. M can be considered to be a non-deterministic parser for G. A formal proof that M accepts precisely L can be done by induction on the length of the derivation of any w ∈ L.
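The single-state machine M of this proof direction can be simulated by searching all of its computations. The sketch below makes several assumptions that are not in the notes: the bottom-of-stack symbol is dropped (acceptance means empty stack with all input read), rules are a dictionary from non-terminal to right-hand-side tuples, and branches are pruned when the stack holds more terminals than there is input left. That pruning guarantees termination for grammars such as S → 0S1 | ε, where every recursive rule adds a terminal, but not for every CFG.

```python
from collections import deque


def npda_accepts(rules, start, word):
    """Breadth-first search over configurations (input position, stack with top first)."""
    initial = (0, (start,))
    queue, seen = deque([initial]), {initial}
    while queue:
        pos, stack = queue.popleft()
        if not stack:
            if pos == len(word):
                return True              # all tasks met and all input read
            continue
        top, rest = stack[0], stack[1:]
        if top in rules:                 # transition 1: replace X by a right-hand side
            successors = [(pos, tuple(rhs) + rest) for rhs in rules[top]]
        elif pos < len(word) and word[pos] == top:
            successors = [(pos + 1, rest)]   # transition 2: match the terminal and pop
        else:
            successors = []
        for config in successors:
            terminals = sum(1 for s in config[1] if s not in rules)
            if terminals <= len(word) - config[0] and config not in seen:
                seen.add(config)
                queue.append(config)
    return False


# Grammar for { 0^n 1^n | n >= 0 }: S -> 0 S 1 | e (the empty tuple is the e rule).
ZERO_ONE = {"S": [("0", "S", "1"), ()]}

print(npda_accepts(ZERO_ONE, "S", "0011"))
```

The search plays the role of non-determinism: the word is accepted if at least one sequence of rule choices empties the stack exactly when the input ends.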

Pf ← (sketch): Given NPDA M, construct a CFG G that generates L(M). For simplicity's sake, transform M to have the following features: 1) a single accept state, 2) empty stack before accepting, and 3) each transition either pushes a single symbol, or pops a single symbol, but not both.

For each pair of states p, q ∈ Q, introduce a non-terminal Vpq. L( Vpq ) = { w | Vpq →* w } will be the language of all strings that can be derived from Vpq according to the productions of the grammar G to be constructed. In particular, L( Vsf ) = L(M), where s is the starting state and f the accepting state of M.

Invariant: Vpq generates all strings w that take M from p with an empty stack to q with an empty stack. The idea is to relate all Vpq to each other in a way that reflects how labeled paths and subpaths through M's state space relate to each other. LIFO stack access implies: any w ∈ L( Vpq ) will lead M from p to q regardless of the stack content at p, and leave the stack at q in the same condition as it was at p. Different w's achieve this in different ways, which leads to different rules of G:
1) The stack may be empty only in p and in q, never in between. If so, w = a v b, for some a, b ∈ A, v ∈ A*, and M includes the transitions (p, a, ε) → (r, t) and (s, b, t) → (q, ε). Add the rules: Vpq → a Vrs b.
2) The stack may be empty at some point between p and q, in state r. For each triple p, q, r ∈ Q, add the rules: Vpq → Vpr Vrq.
3) For each p ∈ Q, add the rule Vpp → ε.

The figure at left illustrates Rule 1, at right Rule 2. If M includes the transitions (p, a, ε) → (r, t) and (s, b, t) → (q, ε), then one way to travel from p to q with identical stack content at the start and the end of the journey is to break the trip into three successive parts: 1) read a symbol a and push t; 2) travel from r to s with identical stack content at the start and the end of this sub-journey; 3) read a symbol b and pop t.

Normal forms

When trying to prove that all objects in some class C have a given property P, it is often useful to first prove that each object O in C can be transformed to some equivalent object O' in some subclass C' of C. Here, "equivalent" implies that the transformation preserves the property P of interest. Thereafter, the argument can be limited to the subclass C', taking advantage of any additional properties this subclass may have.

Any CFG can be transformed into a number of "normal forms" (NF) that are (almost!) equivalent. Here, "equivalent" means that the two grammars define the same language, and the proviso "almost" is necessary because these normal forms cannot generate the null string.

Chomsky normal form (right-hand sides are short):
All rules are of the form X → Y Z or X → a, for some non-terminals X, Y, Z ∈ V and terminal a ∈ A.
Thm: Every CFG G can be transformed into a Chomsky NF G' such that L(G') = L(G) - {ε}.
Pf idea: repeatedly replace a rule X → v w, |v| ≥ 1, |w| ≥ 1, by X → Y Z, Y → v, Z → w, where Y and Z are new non-terminals used only in these new rules. Both right hand sides v and w are shorter than the original right hand side v w.
The Chomsky NF changes the syntactic structure of L(G), an undesirable side effect in practice. But Chomsky NF turns all syntactic structures into binary trees, a useful technical device that we exploit in later sections on the Pumping Lemma and the CYK parsing algorithm.

Greibach normal form (at every step, produce 1 terminal symbol at the far left - useful for parsing):
All rules are of the form X → a w, for some terminal a ∈ A and some w ∈ V*.
Thm: Every CFG G can be transformed into a Greibach NF G' such that L(G') = L(G) - {ε}.
Pf idea: for a rule X → Y w, ask whether Y can ever produce a terminal at the far left, i.e. Y →* a v. If so, replace X → Y w by rules such as X → a v w. If not, X → Y w can be omitted, as it will never lead to a terminating derivation.

The pumping lemma for CFLs

Recall the pumping lemma for regular languages, a mathematically precise statement of the intuitive notion "an FSM can count at most up to some constant n". It says that for any regular language L, any sufficiently long word w in L can be split into 3 parts, w = x y z, such that all strings x y^k z, for any k ≥ 0, are also in L.

PDAs, which correspond to CFGs, can count arbitrarily high - though essentially in unary notation, i.e. by storing k symbols to represent the number k. But the LIFO access limitation implies that the stack can only be used to represent one single independent counter at a time. To understand what "independent" means, consider a PDA that recognizes a language of balanced parenthesis expressions, such as ((([[..]]))). This task clearly calls for an arbitrary number of counters to be stored at the same time, each one dedicated to counting its own subexpression. In the example above, the counter for ((( must be saved when the counter for [[ is activated. Fortunately, balanced parentheses are nested in such a way that changing from one counter to another matches the LIFO access pattern of a stack - when a counter, run down to 0, is no longer needed, the next counter on top of the stack is exactly the next one to be activated. Thus, the many counters coded into the stack interact in a controlled manner; they are not independent.

The pumping lemma for CFLs is a precise statement of this limitation. It asserts that every long word in L serves as a seed that generates an infinity of related words that are also in L.

Thm: For every CFL L there is a constant n such that every z ∈ L of length |z| ≥ n can be written as z = u v w x y such that the following holds: 1) v x ≠ ε, 2) |v w x| ≤ n, and 3) u v^k w x^k y ∈ L for all k ≥ 0.

Pf: Given CFL L, choose any G = G(L) in Chomsky NF. This implies that the parse tree of any z ∈ L is a binary tree, as shown in the figure below at left. The length n of the string at the leaves and the height h of a binary tree are related by h ≥ log n, i.e. a long string requires a tall parse tree. By choosing the critical length n = 2^(|V| + 1) we force the height of the parse trees considered to be h ≥ |V| + 1. A root-to-leaf path of this length passes through at least |V| + 1 nodes labeled with non-terminals, and since G has only |V| distinct non-terminals, this implies that on some long root-to-leaf path we must encounter 2 nodes labeled with the same non-terminal, say W, as shown at right.

For two such occurrences of W (in particular, the two lowest ones), and for some u, v, w, x, y ∈ A*, we have: S →* u W y, W →* v W x and W →* w. But then we also have W →* v^2 W x^2, and in general, W →* v^k W x^k, and S →* u v^k W x^k y and S →* u v^k w x^k y for all k ≥ 0, QED.

For problems where intuition tells us "a PDA can't do that", the pumping lemma is often the perfect tool needed to prove rigorously that a language is not CF. For example, intuition suggests that neither of the languages L1 = { 0^k 1^k 2^k | k ≥ 0 } or L2 = { w w | w ∈ {0, 1}* } is recognizable by some PDA. For L1, a PDA would have to count up the 0s, then count down the 1s to make sure there are equally many 0s and 1s. Thereafter, the counter is zero, and although we can count the 2s, we can't compare that number to the number of 0s, or of 1s, an information that is now lost. For L2, a PDA would have to store the first half of the input, namely w, and compare that to the second half to verify that the latter is also w. Whereas this worked trivially for palindromes, w w^reversed, the order w w is the worst case possible for LIFO access: although the stack contains all the information needed, we can't extract the info we need at the time we need it. The pumping lemma confirms these intuitive judgements.

Ex 1: L1 = { 0^k 1^k 2^k | k ≥ 0 } is not context free.
Pf (by contradiction): Assume L1 is CF, and let n be the constant asserted by the pumping lemma. Consider z = 0^n 1^n 2^n = u v w x y. Although we don't know where v w x is positioned within z, the assertion |v w x| ≤ n implies that v w x contains at most two distinct letters among 0, 1, 2. In other words, one or two of the three letters 0, 1, 2 is missing in v w x. Now consider u v^2 w x^2 y. By the pumping lemma, it must be in L1. The assertion v x ≠ ε implies that u v^2 w x^2 y is longer than u v w x y. But u v w x y had an equal number of 0s, 1s, and 2s, whereas u v^2 w x^2 y cannot, since only one or two of the three distinct symbols increased in number. This contradiction proves the thm.

Ex 2: L2 = { w w | w ∈ {0, 1}* } is not context free.
Pf (by contradiction): Assume L2 is CF, and let n be the constant asserted by the pumping lemma. Consider z = 0^(n+1) 1^(n+1) 0^(n+1) 1^(n+1) = u v w x y. Using k = 0, the lemma asserts z0 = u w y ∈ L2, but we show that z0 cannot have the form t t, for any string t, and thus that z0 ∉ L2, leading to a contradiction. Recall that |v w x| ≤ n, and thus, when we delete v and x, we delete symbols that are within a distance of at most n from each other. By analyzing three cases we show that, under this restriction, it is impossible to delete symbols in such a way as to retain the property that the shortened string z0 = u w y has the form t t. We illustrate this using the example n = 3, but the argument holds for any n.
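For the illustrative case n = 3, the claim can also be checked by brute force. The sketch below is an assumption-laden sanity check, not part of the proof: it deletes every non-empty set of characters inside every window of length n, which is more generous than the lemma (there v and x are two contiguous pieces), and looks for any result of the form t t. The helper names are invented for the example.

```python
from itertools import combinations


def is_square(s):
    """True iff s has the form t t."""
    half, rem = divmod(len(s), 2)
    return rem == 0 and s[:half] == s[half:]


def surviving_squares(z, n):
    """Strings of the form t t obtainable from z by deleting between 1 and n
    characters that all lie inside one window of length n."""
    found = set()
    for left in range(len(z) - n + 1):
        window = range(left, left + n)
        for size in range(1, n + 1):
            for doomed in combinations(window, size):
                shortened = "".join(c for i, c in enumerate(z) if i not in doomed)
                if is_square(shortened):
                    found.add(shortened)
    return found


Z = "0000111100001111"   # z = 0^(n+1) 1^(n+1) 0^(n+1) 1^(n+1) for n = 3
print(surviving_squares(Z, 3))
```

An empty result for Z means that no deletion confined to a window of length 3 leaves a string of the form t t, in agreement with the case analysis. On a short string such as 0101 with a window of length 4 the same search does find squares, which shows that the long blocks in z matter.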

Given z = 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1, slide a window of length n = 3 across z, and delete any characters you want from within the window. Observe that the blocks of 0s and of 1s within z are so long that the truncated z, call it z', still has the form 0s 1s 0s 1s. This implies that if z' can be written as z' = t t, then t must have the form t = 0s 1s. Checking the three cases: the window of length 3 lies entirely within the left half of z; the window straddles the center of z; and the window lies entirely within the right half of z, we observe that in none of these cases z' has the form z' = t t, and thus that z0 = u w y ∉ L2.

Closure properties of the class of CFLs

Thm (CFL closure properties): The class of CFLs over an alphabet A is closed under the regular operations union, catenation, and Kleene star.
Pf: Given CFLs L, L' ⊆ A*, consider any grammars G, G' that generate L and L', respectively. Combine G and G' appropriately to obtain grammars for L ∪ L', L L', and L*. E.g., if G = (V, A, P, S), we obtain G(L*) = ( V ∪ {S0}, A, P ∪ { S0 → S S0 , S0 → ε }, S0 ).

The proof above is analogous to the proof of closure of the class of regular languages under union, catenation, and Kleene star. There we combined two FAs into a single one using series, parallel, and loop combinations of FAs. But beyond the three regular operations, the analogy stops. For regular languages, we proved closure under complement by appealing to deterministic FAs as acceptors. For these, changing all accepting states to non-accepting, and vice versa, yields the complement of the language accepted. This reasoning fails for CFLs, because deterministic PDAs accept only a subclass of CFLs. For non-deterministic PDAs, changing accepting states to non-accepting, and vice versa, does not produce the complement of the language accepted. Indeed, closure under complement does not hold for CFLs.

Thm: The class of CFLs over an alphabet A is not closed under intersection and is not closed under complement.

We prove this theorem in two ways: first, by exhibiting two CFLs whose intersection is provably not CF, and second, by exhibiting a CFL whose complement is provably not CF.

Pf (intersection): Consider CFLs L0 = { 0^m 1^m 2^n | m, n ≥ 1 } and L1 = { 0^m 1^n 2^n | m, n ≥ 1 }. Their intersection L0 ∩ L1 = { 0^k 1^k 2^k | k ≥ 1 } is not CF, as we proved in the previous section using the pumping lemma.
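The shape of the intersection can be confirmed on all short strings. The sketch below is a small sanity check under assumptions: membership in L0 and L1 is tested with a regular expression for the block structure plus a length comparison, and only strings of length at most 7 over {0, 1, 2} are enumerated.

```python
import re
from itertools import product

BLOCKS = re.compile(r"(0+)(1+)(2+)")   # non-empty blocks of 0s, 1s, 2s, in this order


def in_L0(s):
    """0^m 1^m 2^n with m, n >= 1."""
    m = BLOCKS.fullmatch(s)
    return bool(m) and len(m[1]) == len(m[2])


def in_L1(s):
    """0^m 1^n 2^n with m, n >= 1."""
    m = BLOCKS.fullmatch(s)
    return bool(m) and len(m[2]) == len(m[3])


MAX_LEN = 7
BOTH = {
    "".join(chars)
    for length in range(MAX_LEN + 1)
    for chars in product("012", repeat=length)
    if in_L0("".join(chars)) and in_L1("".join(chars))
}
print(sorted(BOTH, key=len))
```

Up to length 7 the only common members are 012 and 001122, i.e. the strings 0^k 1^k 2^k with k = 1 and k = 2.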

This implies that the class of CFLs is not closed under complement. If it were, it would also be closed under intersection, because by De Morgan's law an intersection can be expressed using union and complement only. We can also prove this result in a direct way by exhibiting a CFL L whose complement is not context free. L's complement is the notorious language L2 = { w w | w ∈ {0,1}* }, which we have proven not context free using the pumping lemma.
Pf: We show that L = { u | u is not of the form u = w w } is context free by exhibiting a CFG for L:
S -> Y | Z | Y Z | Z Y
Y -> 1 | 0 Y 0 | 0 Y 1 | 1 Y 0 | 1 Y 1
Z -> 0 | 0 Z 0 | 0 Z 1 | 1 Z 0 | 1 Z 1
The productions for Y generate all odd strings, i.e. strings of odd length, with a 1 as center symbol. Analogously, Z generates all odd strings with a 0 as center symbol. Odd strings are not of the form u = w w, hence they are included in L by the productions S -> Y | Z. Now we show that the strings u of even length that are not of the form u = w w are precisely those of the form Y Z or Z Y.
First, consider a word of the form Y Z, such as the catenation of y = 1 1 0 1 0 0 0 and z = 1 0 1, where the center symbols are the 1 in position 4 of y and the 0 in position 2 of z. Writing y z = 1 1 0 1 0 0 0 1 0 1 as the catenation of two strings of equal length, namely 1 1 0 1 0 and 0 0 1 0 1, shows that the former center symbols 1 of y and 0 of z have both become the 4-th symbol in their respective strings of length 5. Thus, they are a witness pair whose clash shows that y z ≠ w w for any w. This, and the analogous case for Z Y, show that all strings of the form Y Z or Z Y are in L.
Conversely, consider any even word u = a1 a2 .. aj .. ak b1 b2 .. bj .. bk which is not of the form u = w w. There exists an index j where aj ≠ bj, and we can take each of aj and bj as center symbol of its own odd string: aj is the center of the prefix of length 2j-1, and bj is the center of the remaining suffix of length 2(k-j)+1. The following example shows a clashing pair at index j = 4: u = 1 1 0 0 1 1 1 0 1 1, with halves 1 1 0 0 1 and 1 1 0 1 1. Now u can be written as u = z y, where z = 1 1 0 0 1 1 1 has center symbol a4 = 0 and y = 0 1 1 has center symbol b4 = 1.
The word problem.
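CFL parsing in time O(n^3) by means of dynamic programming. Informally, the word problem asks: given G and w, decide whether w ∈ L(G).

More precisely: is there an algorithm that applies to any grammar G in some given class of grammars, and any w ∈ A*, to decide whether w ∈ L(G)? Many algorithms solve the word problem for CFGs, e.g.: a) convert G to Greibach NF and enumerate all derivations of length ≤ |w| to see whether any of them generates w (in Greibach NF every derivation step produces exactly one terminal); or b) construct an NPDA M that accepts L(G), and feed w into M. ... count the number of 0s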

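Ex2: L = { w ∈ {0,1}* | #0s = #1s }. G: S -> ε | 0 Y | 1 Z, Y -> 1 S | 0 Y Y, Z -> 0 S | 1 Z Z. Invariant: Y generates any string with an extra 1, Z generates any string with an extra 0. The production Z -> 0 S | 1 Z Z means that Z has two ways to meet its goal: either produce a 0 now and follow up with a string in S, i.e. with an equal number of 0s and 1s; or produce a 1 but create two new tasks Z.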

For CFGs there is a bottom-up algorithm (Cocke, Younger, Kasami) that systematically computes all possible parse trees of all contiguous substrings of the string w to be parsed, and works in time O(|w|^3). We illustrate the idea of the CYK algorithm using the following example: Ex2a: L = { w ∈ {0,1}+ | #0s = #1s }. We exclude the nullstring in order to convert G to Chomsky NF. For the sake of formality, introduce Y' that generates a single 1, similarly for Z' and 0. Shorten the right hand side 1 Z Z by introducing a nonterminal Z'' -> Z Z, and similarly Y'' -> Y Y. Every w with two extra 1s can be split as w = u v with one extra 1 in each part: scanning w from left to right, there comes an index k where #1s = #0s + 1, and that prefix of w can be taken as u. The remainder v has again #1s = #0s + 1. The grammar below maintains the invariants: Y' generates a single 1; Y generates any string with an extra 1; Y'' generates any string with 2 extra 1s. Analogously for Z', Z, Z'' and 0.
S -> Z' Y | Y' Z (start with a 0 and remember to generate an extra 1, or start with a 1 and ...)
Z' -> 0, Y' -> 1 (Z' and Y' are mere formalities)
Z -> 0 | Z' S | Y' Z'' (produce an extra 0 now, or produce a 1 and remember to generate 2 extra 0s)
Y -> 1 | Y' S | Z' Y'' (produce an extra 1 now, or produce a 0 and remember to generate 2 extra 1s)
Z'' -> Z Z, Y'' -> Y Y (split the job of generating 2 extra 0s or 2 extra 1s)
The following table parses a word w = 001101 with |w| = n. Each of the n (n+1)/2 entries corresponds to a substring of w. Entry (L, i) records all the parse trees of the substring of length L that begins at index i. The entries for L = 1 correspond to rules that produce a single terminal, the other entries to rules that produce 2 nonterminals.
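The table-filling idea described above can be sketched in a few lines of Python. This is only an illustrative sketch: it records which nonterminals derive each substring rather than full parse trees, and the names Y1/Z1 (a single 1 or 0) and Y2/Z2 (two extra 1s or 0s), as well as the functions `cyk_table` and `accepts`, are assumptions of this sketch and not notation from the notes.

```python
from itertools import product

# Chomsky NF grammar for nonempty strings with equally many 0s and 1s.
# Y1/Z1 derive a single 1/0, Y/Z one extra 1/0, Y2/Z2 two extra 1s/0s.
TERMINAL_RULES = {"0": {"Z1", "Z"}, "1": {"Y1", "Y"}}
BINARY_RULES = {
    ("Z1", "Y"): {"S"}, ("Y1", "Z"): {"S"},
    ("Z1", "S"): {"Z"}, ("Y1", "Z2"): {"Z"},
    ("Y1", "S"): {"Y"}, ("Z1", "Y2"): {"Y"},
    ("Z", "Z"): {"Z2"}, ("Y", "Y"): {"Y2"},
}

def cyk_table(w):
    """Entry (length, i) holds the nonterminals deriving w[i:i+length]."""
    n = len(w)
    table = {}
    for i, ch in enumerate(w):
        table[(1, i)] = set(TERMINAL_RULES.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = set()
            # Try all (length - 1) ways of splitting the substring in two.
            for split in range(1, length):
                left = table[(split, i)]
                right = table[(length - split, i + split)]
                for pair in product(left, right):
                    cell |= BINARY_RULES.get(pair, set())
            table[(length, i)] = cell
    return table

def accepts(w):
    """True if the start symbol S derives the whole nonempty word w."""
    return len(w) > 0 and "S" in cyk_table(w)[(len(w), 0)]

print(accepts("001101"))
```

The three nested loops (substring length, start index, split point) are where the cubic running time comes from.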

The picture at the lower right shows that for each entry at level L, we must try (L-1) distinct ways of splitting that entry's substring into 2 parts. Since (L-1) < n and there are n (n+1)/2 entries to compute, the CYK parser works in time O(n^3). Useful CFLs, such as parts of programming languages, should be designed so as to admit more efficient parsers,

preferably parsers that work in linear time. LR(k) grammars and languages are a subset of CFGs and CFLs that can be parsed in a single scan from left to right, with a look-ahead of k symbols.
Context sensitive grammars and languages
The rewriting rules B -> w of a CFG imply that a nonterminal B can be replaced by a word w in any context. In contrast, a context sensitive grammar (CSG) has rules of the form u B v -> u w v, where u, v, w are strings of terminals and nonterminals (w nonempty), implying that B can be replaced by w only in the context u on the left, v on the right. It turns out that this definition is equivalent (apart from the nullstring) to requiring that any CSG rule be of the form v -> w, where v, w are strings of terminals and nonterminals with |v| ≤ |w|. This monotonicity property (in any derivation, the current string never gets shorter) implies that the word problem for CSLs (given a CSG G and a word w, is w ∈ L(G)?) is decidable. An exhaustive enumeration of all derivations up to the length |w| settles the issue. As an example of the greater power of CSGs over CFGs, recall that we used the pumping lemma to prove that the language 0^k 1^k 2^k is not CF. By way of contrast, we prove:
Thm: L = { 0^k 1^k 2^k | k ≥ 1 } is context sensitive.
The following CSG generates L. Function of the nonterminals V = {S, B, C, K, Y, Z}: each Y and Z generates a 1 or a 0 at the proper time; B initially marks the beginning (left end) of the string, and later converts the Zs into 0s; K is a counter that ensures an equal number of 0s, 1s, 2s are generated; C converts the Ys into 1s. Nonterminals play a similar role as markers in Markov algorithms. Whereas the latter have a deterministic control structure, grammars are nondeterministic.
S -> B K 2 (at the last step in any derivation, B C generates 01, balancing this 2)
K -> Z Y K 2 (counter K generates (Z Y)^k 2^k)
K -> C (when k has been fixed, C may start converting Ys into 1s)
Y Z -> Z Y (Zs may move towards the left, Ys towards the right at any time)
B Z -> 0 B (B may convert a Z into a 0 and shift it left at any time)
Y C -> C 1 (C may convert a Y into a 1 and shift it right at any time)
B C -> 01 (when B and C meet, all permutations, shifts and conversions have been done)
The Chomsky hierarchy: Types of grammars defined in terms of additional restrictions on the form of the rules:

Type 0: No restriction. Type 1: Each rule is of the form αPβ → απβ, where P ∈ VN and π ≠ e. Type 2: Each rule is of the form P → π. Type 3: Each rule is of the form P → xB or P → x.
Common names: Type 0: Unrestricted rewriting systems. Type 1: Context-sensitive grammars. Type 2: Context-free grammars. Type 3: Right-linear, or regular, or finite state grammars.
Note that for Type 2 (and of course Type 3) grammars the definition of rule application becomes simpler: string y is obtained from the string x by application of the rule P → π if these strings can be represented in the form x = lPr and y = lπr, where l, r ∈ A*.
Correspondence of type 3 grammars and fsas. (Construction p.473) Every type 3 language is a fal. Can also show that every fal is a type 3 language (construction p.474). Automata viewed as either generators or acceptors. Grammars viewed as either generators or acceptors.
Grammars and trees. Grammars of the types 1-3 generate not only strings but also trees of immediate constituents on these strings (for context-sensitive grammars (type 1) such a tree doesn't mirror context-restrictions in the process of this generation). [see 16.3 16.4]
6. Properties of regular languages. Closure properties: We already know that the class of fals is closed under union, concatenation, and Kleene star. What about intersection? Complementation? Show complementation: if L is a fal, then A* - L is a fal. Use the fsa construction. Assume we have a deterministic fsa M that accepts L. We can construct a deterministic fsa M' which accepts the complement of L just by interchanging final and non-final states. Therefore fals are also closed under intersection (why?). Therefore the class of regular languages over any fixed alphabet is a Boolean algebra. (Ling 726: Mathematical Linguistics, Lecture 13-14, Finite State Automata and Languages, V. Borschev and B. Partee, October 31 - Nov 1, 2006.)
Decidability properties: is there an algorithm for determining ... ? -- The membership question: yes.

-- The emptiness question: yes. -- Does M accept all of A*? Problem (opt. exercise): Is there an algorithm for determining, given two machines M1, M2, whether L(M1) ⊆ L(M2)? (Yes. Show it.) Is there an algorithmic solution to the question of whether two fsas accept the same language?
Language Parse Trees And Ambiguity:
Parse trees
A useful property of Boolean grammars is that they define parse trees of the strings they generate [18], which represent parses of a string according to positive conjuncts in the rules. These are, strictly speaking, finite acyclic graphs rather than trees. A parse tree of a string w = a_1 ... a_|w| from a nonterminal A contains a leaf labelled a_i for every i-th position in the string; the rest of the vertices are labelled with rules from P. The subtree accessible from any given vertex of the tree contains leaves in the range between i + 1 and j, and thus corresponds to a substring a_{i+1} ... a_j. In particular, each leaf a_i corresponds to itself. For each vertex labelled with a rule A → α_1 & ... & α_m & ¬β_1 & ... & ¬β_n and associated to a substring a_{i+1} ... a_j, the following conditions hold:
1. It has exactly |α_1| + ... + |α_m| direct descendants corresponding to the symbols in positive conjuncts. For each nonterminal in each α_k, the corresponding descendant is labelled with some rule for that nonterminal, and for each terminal a ∈ Σ, the descendant is a leaf labelled with a.
2. For each k-th positive conjunct of this rule, let α_k = s_1 ... s_ℓ. There exist numbers i_1, ..., i_{ℓ-1}, where i = i_0 ≤ i_1 ≤ ... ≤ i_{ℓ-1} ≤ i_ℓ = j, such that each descendant corresponding to each s_t encompasses the substring a_{i_{t-1}+1} ... a_{i_t}.
3. For each k-th negative conjunct of this rule, a_{i+1} ... a_j ∉ L_G(β_k).
The root is the unique vertex with no incoming arcs; it is labelled with any rule for the nonterminal A, and all leaves are reachable from it. To consider the uniqueness of a parse tree for different strings, it is useful to assume that only terminal leaves can have multiple incoming arcs. Condition 3 ensures that the requirements imposed by negative conjuncts are satisfied. However, nothing related to these negative conjuncts is reflected in the actual trees. For instance, parse trees of the second grammar from Example 3.1 reflect only the conjunct S → AB, and thus are plain context-free trees. On the other hand, parse trees corresponding to any conjunctive grammar, such as the first grammar in Example 3.1, reflect full information about the membership of a string in the language.
4.2.2 Ambiguity
Unambiguous context-free grammars can be defined in two ways:
1. for every string generated by the grammar there is a unique parse tree (in other words, a unique leftmost derivation);
2. for every nonterminal A and for every string w ∈ L(A) there exists a unique rule A → s_1 ... s_ℓ with w ∈ L(s_1 ... s_ℓ), and a unique factorization w = u_1 ... u_ℓ with u_i ∈ L(s_i).
Assuming that L(A) ≠ ∅ for every nonterminal A, these definitions are equivalent. In the case of Boolean grammars, the first definition becomes useless, because negative conjuncts are not accounted for in a parse tree. The requirement of parse tree uniqueness can be trivially satisfied as follows. Given any grammar G over an alphabet Σ = {a_1, ..., a_m} and with a start symbol S, one can define a new start symbol S' and additional symbols Ŝ and A, with the following rules:
S' → A & ¬Ŝ
Ŝ → A & ¬S
A → a_1 A | ... | a_m A | ε
This grammar generates the same language, and every string in L(G) has a unique parse tree, which reflects only the nonterminal A and hence bears no essential information. (A. Okhotin, Formal grammars, draft November 26, 2009.)

Trying to generalize the second approach for Boolean grammars in the least restrictive way, one may produce the following definition: for every nonterminal A and for every string w ∈ L(A) there exists a unique rule
A → α_1 & ... & α_m & ¬β_1 & ... & ¬β_n (4.4)
with w ∈ L_G(α_t) and w ∉ L_G(β_t) for all t, such that for every positive conjunct α_t = s_1 ... s_ℓ there exists a unique factorization w = u_1 ... u_ℓ with u_i ∈ L(s_i). However, this definition can be trivialized similarly to the previous case. Given a Boolean grammar G, replace every rule (4.4) with A → ¬C_1 & ... & ¬C_m & ¬β_1 & ... & ¬β_n, where every new nonterminal C_t has a unique rule C_t → ¬α_t. The resulting grammar generates the same language and contains only negative conjuncts, and so the condition on factorizations in positive conjuncts is trivially satisfied (while the choice of a rule can be made unique as well using some additional transformations). Therefore, a proper definition of ambiguity for Boolean grammars must take into account factorizations of strings according to negative conjuncts. The following definition is obtained:
Definition 4.6. A Boolean grammar G = (Σ, N, P, S) is unambiguous if
I. Different rules for every single nonterminal A generate disjoint languages, that is, for every string w there exists at most one rule A → α_1 & ... & α_m & ¬β_1 & ... & ¬β_n with w ∈ L_G(α_1) ∩ ... ∩ L_G(α_m) and w ∉ L_G(β_1) ∪ ... ∪ L_G(β_n).
II. All concatenations are unambiguous, that is, for every conjunct A → ±s_1 ... s_ℓ and for every string w there exists at most one factorization w = u_1 ... u_ℓ with u_i ∈ L_G(s_i) for all i.
Note that Condition II applies to positive and negative conjuncts alike. In the case of a positive conjunct belonging to some rule, this means that a string that is potentially generated by this rule must be uniquely factorized according to this conjunct. For a negative conjunct A → ¬DE, Condition II requests that a factorization of w ∈ L_G(DE) into L_G(D) · L_G(E) is unique even though w is not generated by any rule involving this conjunct. As argued above,

this condition cannot be relaxed. Consider some examples. Both grammars in Example 3.1 are unambiguous. To see that Condition II is satisfied with respect to the conjunct S → AB, consider that a factorization w = uv, with u ∈ L(A) and v ∈ L(B), implies that u ∈ a* and v ∈ b*c*, so the boundary between u and v cannot be moved. The same argument applies to the conjuncts S → DC and S → ¬DC. Different rules for each of A, B, C, D clearly generate disjoint languages. On the other hand, the grammar in Example ?? is ambiguous because Condition II does not hold. Consider the string w = aabb and the conjunct S → AB. This string has two factorizations w = a · abb = aab · b, with a ∈ L(A), abb ∈ L(B), aab ∈ L(A) and b ∈ L(B). This, by definition, means that the grammar is ambiguous. It is not known whether there exists an unambiguous Boolean grammar generating the same language. Though, as mentioned above, the uniqueness of a parse tree does not guarantee that the grammar is unambiguous, the converse holds:
Proposition 4.1. For any unambiguous Boolean grammar, for any nonterminal A ∈ N and for any string w ∈ L_G(A), there exists a unique parse tree of w from A (assuming that only terminal vertices may have multiple incoming arcs).
Another thing to note is that the first condition in the definition of unambiguity can be met for every grammar using simple transformations. Assume every nonterminal A has either a unique rule (4.1) of an arbitrary form, or multiple rules each containing a single positive conjunct:
A → α_1 | ... | α_n (where α_i ∈ (Σ ∪ N)*) (4.5)
There is no loss of generality in this assumption, because any multiple-conjunct rule for A can be replaced with a rule of the form A → A', where A' is a new nonterminal with a unique rule replicating the original rule for A. Then, for every nonterminal with multiple rules of the form (4.5), these rules can be replaced with the following n rules, which clearly generate disjoint

languages:
A → α_1
A → α_2 & ¬α_1
A → α_3 & ¬α_1 & ¬α_2
...
A → α_n & ¬α_1 & ¬α_2 & ... & ¬α_{n-1} (4.5)
The grammar obtained by this transformation will satisfy Condition I. Additionally, Condition II, if it holds, will be preserved by the transformation.
Proposition 4.2. For every Boolean grammar there exists a Boolean grammar generating the same language, for which Condition I is satisfied. If the original grammar satisfies Condition II, then so will the constructed grammar.
This property does not hold for context-free grammars. Consider the standard example of an inherently ambiguous context-free language: { a^i b^j c^k | i, j, k ≥ 0, i = j or j = k }. Following is the most obvious ambiguous context-free grammar generating this language:
S → AB | DC
A → aA | ε
B → bBc | ε
C → cC | ε
D → aDb | ε
Condition II is satisfied for the same reasons as in Example 3.1. On the other hand, Condition I fails for the nonterminal S and for strings of the form a^n b^n c^n, which can be obtained using each of the two rules, and this is what makes this grammar ambiguous. If the above context-free grammar is regarded as a Boolean grammar (ambiguous as well), then the given transformation disambiguates it in the most natural way by replacing the rules for the start symbol with the following rules: S → AB | DC & ¬AB. So it has been demonstrated that ambiguity in the choice of a rule represented by Condition I can be fully controlled in a Boolean grammar, which is a practically very useful property not found in the context-free grammars. On the other hand, ambiguity of concatenations formalized in Condition II seems to be, in general, beyond such control.

Reduction of CFGS: Let A and B be languages over the same alphabet Σ. A reduction from A to B is an algorithm that transforms:
- Strings in A to strings in B.
- Strings not in A to strings not in B.
If A reduces to B, then: B is decidable ⇒ A is decidable; A is undecidable ⇒ B is undecidable. One way to show a problem B to be undecidable is to reduce an undecidable problem A to B.
1. A Turing machine M computes a function f if: M halts on all inputs, and on input x it writes f(x) on the tape and halts. Such a function f is called a computable function. Examples: increment, addition, multiplication, shift. Any algorithm with output is computing a function.
2. Let A and B be languages over Σ. A is reducible to B if and only if there exists a computable function f : Σ* → Σ* such that for all w ∈ Σ*, w ∈ A ⟺ f(w) ∈ B. Notation: A ≤m B. FACT: A ≤m B ⟺ complement(A) ≤m complement(B), because w ∈ A ⟺ f(w) ∈ B is equivalent to w ∉ A ⟺ f(w) ∉ B.
3. To reiterate, a reduction has two parts. Construction: f(w) is obtained from w by an algorithm. Correctness: w ∈ A ⟺ f(w) ∈ B.
4. An example involving DFAs. EQ_DFA = { ⟨A, B⟩ | A, B are DFAs and L(A) = L(B) }. E_DFA = { ⟨A⟩ | A is a DFA and L(A) = ∅ }. A reduction machine on input ⟨A, B⟩, two DFAs:
1. Constructs the DFA A' such that L(A') is the complement of L(A).
2. Constructs the DFA B' such that L(B') is the complement of L(B).
3. Constructs the DFA M1 such that L(M1) = L(A) ∩ L(B').
4. Constructs the DFA M2 such that L(M2) = L(A') ∩ L(B).
5. Constructs the DFA C such that L(C) = L(M1) ∪ L(M2).
6. Outputs ⟨C⟩.
Correctness: Suppose L(A) = L(B). Then L(C) = ∅. Suppose L(A) ≠ L(B). Then L(C) ≠ ∅. That is, EQ_DFA ≤m E_DFA.
5. An example involving CFGs. ALL_CFG = { ⟨G⟩ | G is a CFG and L(G) = Σ* }.

EQ_CFG = { ⟨G, H⟩ | G, H are CFGs and L(G) = L(H) }. A reduction machine on input ⟨G⟩, a context-free grammar with alphabet Σ:
1. Constructs a CFG H with rules of the form S0 → a S0 | ε, for all a ∈ Σ.
2. Outputs ⟨G, H⟩.
Note that L(H) = Σ*. Correctness: Suppose G generates all strings in Σ*. Then L(G) = L(H). Suppose G does not generate some string in Σ*. Then L(G) ≠ L(H). That is, ALL_CFG ≤m EQ_CFG.
Chomsky And Greibach Normal Forms: A context free grammar is in Chomsky normal form if each production yields a terminal or two nonterminals. Any context free grammar can be converted to Chomsky normal form. First remove all ε productions. If x → ε, then a production like y → AxBxC spins off the productions y → ABC | AxBC | ABxC. Perform similar substitutions for all nonterminals that lead to the empty string, then remove productions that yield ε. If ε is in the language, i.e. s → ε, then this production must remain. This is the only nonstandard production in Chomsky normal form; all other productions must yield a terminal or two nonterminals. Next, remove any productions x → x, as they are pointless. Given x → y, let x derive everything that y derives, then remove the unit production x → y. At this point the right side of each production is a single terminal or has two or more symbols. Introduce new symbols to play out the right side. For instance, x → AyzBC might be replaced with the following.
x → q1 q2
q1 → A
q2 → y q3
q3 → z q4
q4 → q5 q6
q5 → B
q6 → C
Do this across the board and the resulting grammar is in Chomsky normal form. Prove that a word of length n is derived in 2n-1 steps. Hint, each terminal implies one step.
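The step that removes empty productions can be sketched as follows. This is a minimal illustration, assuming every grammar symbol is a single character and that the set of nonterminals deriving the empty string is already known; the function name `expand_nullable` is hypothetical.

```python
from itertools import product

def expand_nullable(rhs, nullable):
    """Return every right-hand side obtained from rhs by keeping or dropping
    each occurrence of a nullable symbol (the empty result is discarded)."""
    # Each nullable occurrence may stay or vanish; other symbols always stay.
    options = [(sym, "") if sym in nullable else (sym,) for sym in rhs]
    variants = {"".join(choice) for choice in product(*options)}
    variants.discard("")
    return variants

# The production y -> AxBxC with x nullable keeps its original right-hand
# side and spins off ABC, AxBC and ABxC.
print(sorted(expand_nullable("AxBxC", {"x"})))
```

A right-hand side with k nullable occurrences yields up to 2^k variants, which is why this step can enlarge the grammar considerably.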

Rechaining
A process called rechaining breaks left self-references. For instance, let the grammar contain the rules x → x^3 | x^8. Assume the grammar is grounded, hence x leads to other things as well. Manufacture nonterminals q and r for the occasion and replace these two productions with the following.
x → q^3 | q^8 | q^2 r | q^7 r
r → q^3 r | q^8 r
q → (everything x produces, except x^3 and x^8)
r → (everything x produces, except x^3 and x^8)
Start with a word and its derivation under the original grammar. Whenever x is replaced with a string of x's, look ahead to see if there are other such substitutions, and perform them now. The derivation will no longer be canonical, but that's all right. Expand x into x^3 or x^8 wherever we can, and if the new instances of x are so replaced, do that now, and so on. Finally there is no instance of x in the intermediate string that is replaced with x^3 or x^8 under the given derivation. A block of x has increased in length, thanks to these productions. Use the new productions, based on q and r, to create a block of the same length, consisting of the nonterminals x, q, and r. Since we can replace q and r as we would x, replace each of these as dictated by the original derivation. Then proceed along the original path, until you run into another x → x^3 | x^8, whence you can expand the blocks of x as before, and repeat the process. The end result is a new derivation for the given word, using the new grammar. In other words, the language has not changed. Perform rechaining across the entire grammar, so that a symbol never leads to a string of itself. Notice that the manufactured symbols never reference themselves on the left. Verify that the new grammar is grounded and connected. Once this is accomplished, a similar form of rechaining eliminates all instances of self-reference on the left. Suppose x produces xa, xb, c, and d, where a and b are not powers of x, and c and d do not start with x.
Note that something besides xa and xb must exist, else the nonterminal x would not be grounded. Let q be a new nonterminal, manufactured for the occasion. Replace the aforementioned productions of x with the following.
x → c | d | cq | dq
q → a | aq | b | bq
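The replacement of x → xa | xb | c | d by the four new productions for x and q can be sketched as a small Python function. This is an illustrative sketch only: right-hand sides are represented as non-empty tuples of symbols (so it assumes empty productions have already been removed), and the function name and the choice of the fresh symbol are assumptions.

```python
def eliminate_left_self_reference(x, rhs_list, q):
    """Rewrite the productions of x so that no right-hand side starts with x.

    rhs_list: list of non-empty tuples of symbols (the right-hand sides of x).
    q: a fresh nonterminal not used elsewhere in the grammar.
    Returns a dict mapping x and q to their new right-hand sides.
    """
    tails = [rhs[1:] for rhs in rhs_list if rhs[0] == x]   # the a, b in xa, xb
    bases = [rhs for rhs in rhs_list if rhs[0] != x]       # the c, d
    if not tails:
        return {x: list(rhs_list)}                          # nothing to do
    if not bases:
        raise ValueError("x is not grounded: every production starts with x")
    new_x = bases + [base + (q,) for base in bases]        # c | d | cq | dq
    new_q = []
    for tail in tails:                                      # a | aq | b | bq
        new_q += [tail, tail + (q,)]
    return {x: new_x, q: new_q}

rules = eliminate_left_self_reference(
    "x", [("x", "a"), ("x", "b"), ("c",), ("d",)], "q")
print(rules)
```

Both grammars generate a base string (c or d) followed by any sequence of a's and b's; the new one builds that sequence from left to right through q instead of through a left reference to x.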

Once again, q does not reference itself on the left. Since there are no ε productions, we can expand any of the symbols in a and b, as we like, and q never references itself on the left. However, x might still reference itself, if a or b starts with x. If b starts with x^5, the left reference is now x^5 instead of x^6. Rechain again and the left reference becomes x^4. Repeat until all left references go away. Do this for all nonterminals and there are no left self-references in sight.
Substitution
The following two productions imply the third; this is a form of substitution.
x → uvwC
u → 123
x → 123vwC
In the first production, x leads to u, but after substitution, x leads to 1. Substitutions can occur anywhere, but in what follows, we will only replace the initial symbol of the right hand side.
Greibach Intermediate Form
A grammar is in Greibach intermediate form if each rule leads to a terminal, a terminal followed by some nonterminals, or a string of nonterminals. Notice that Chomsky normal form satisfies Greibach intermediate form. Also, any of the aforementioned rechaining operations leaves a Greibach intermediate grammar in Greibach intermediate form. Finally, when one production is substituted into another, replacing the lead symbol of the right hand side, the new production, which is implied by the first two, is in Greibach intermediate form. We can substitute and rechain to our heart's content.
Greibach Normal Form
A grammar is in Greibach normal form if each production yields a terminal followed by a (possibly empty) sequence of nonterminals. Again, s → ε is a special case.

Assume a context free grammar is grounded and connected, and convert it to chomsky normal form, which is also greibach intermediate. Order the nonterminals in any way you like, but let s be the first nonterminal, also known as x1. The terminals are in a group by themselves, beyond the nonterminals. Start by rechaining x1, so there are no self-references on the left. Place any manufactured symbols at the end of the ordered list of nonterminals. Suppose a production carries xj to x1, for j > 1. In other words, x1 appears first in the right side of the production. Substitute for x1, using all the rules of x1. Now xj produces strings that begin with x2 or higher, and we can drop the original rule that carried xj to x1. Perform similar replacements for all the rules of xj, for all j beyond 1. If x2 references itself on the left, perform rechaining as described above. Now x2 leads to x3 or higher. If j exceeds 2, and a rule carries xj to x2, substitute for x2, so that xj leads to x3 or higher. Do the same for x3, x4, x5, and so on, until every production carries a symbol to a terminal or a higher nonterminal. What about the rules that start with a manufactured symbol such as q? Suppose, at the time of its creation, q yields x1. Throughout the course of our substitutions, x1 might become x3, which could become x7, which could become x129, the last nonterminal in the original grammar. However, the grammar is grounded, and terminals come after nonterminals, so there must be a substitution that turns x129 into a terminal. This carries q to something higher, and as mentioned earlier, q never references itself. The process runs through all the nonterminals, including the manufactured symbols, and it terminates. Suppose a rule replaces x3 with a string of nonterminals that begins with x7. Replace x7 with all its right hand sides, and x3 leads to strings that begin with terminals, or nonterminals beyond x7. Do this again and again, until each rule leads to a terminal. 
The language is the same, and the grammar is in greibach normal form.
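Whether a given grammar already has the target shape can be checked mechanically. The sketch below is an assumption-laden illustration: rules are given as a dict from a nonterminal to a list of right-hand-side tuples, the empty tuple stands for the empty string, and the function name `is_greibach` is hypothetical.

```python
def is_greibach(rules, nonterminals, start):
    """True if every production is a terminal followed by nonterminals only.

    The single exception allowed is start -> () (the empty string).
    """
    for lhs, alternatives in rules.items():
        for rhs in alternatives:
            if not rhs:
                if lhs != start:
                    return False
                continue
            head, tail = rhs[0], rhs[1:]
            # The head must be a terminal, the rest must be nonterminals.
            if head in nonterminals or any(s not in nonterminals for s in tail):
                return False
    return True

# Two grammars for { 0^n 1^n | n >= 1 }: only the first is in Greibach normal form.
gnf = {"S": [("0", "S", "B"), ("0", "B")], "B": [("1",)]}
plain = {"S": [("0", "S", "1"), ("0", "1")]}
print(is_greibach(gnf, {"S", "B"}, "S"), is_greibach(plain, {"S"}, "S"))
```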

Unit IV Push down Automata (PDA)-non Determinism-acceptance by two methods and their equivalence, conversion of PDA to CFG CFLs and PDAs-closure and decision properties of CFLs.
Push down Automata (PDA):
Just as a DFA is a way to implement a regular expression, a pushdown automaton is a way to implement a context free grammar.
- A PDA is equivalent in power to a CFG
- Can choose the representation most useful to our particular problem
- Essentially identical to a regular automaton except for the addition of a stack
- The stack is of unbounded size
- The stack allows us to recognize some of the non-regular languages

Conversion Of PDA To CFG:

Non-Deterministic pushdown automaton: A non-deterministic pushdown automaton (NPDA), or just pushdown automaton (PDA) is a variation on the idea of a non-deterministic finite automaton (NDFA). Unlike an NDFA, a PDA is associated with a stack (hence the name pushdown). The transition function must also take into account the ``state'' of the stack. Formally defined, a pushdown automaton $M$ is a 7-tuple $M=(Q,\Sigma,\Gamma,T,q_0,\bot,F)$ , where $Q, \Sigma, q_0,$ and $F$ , like those in an NDFA, are the set of states, the input alphabet, the start state, and the set of final states respectively. $\Gamma$ is the stack alphabet, specifying the set of symbols that can be pushed onto the stack. $\Gamma$ is not necessarily disjoint from $\Sigma$ . $\bot$ is an element of $\Gamma$ called the start stack symbol. The transition function is $$T : Q\times(\Sigma\cup\{\lambda\})\times\Gamma\to\mathcal{P}(Q \times \Gamma^*).$$ How It Works To see how the computing machine $M$ works, first imagine $M$ with the following features: a finite set $Q$ of internal states, a horizontal tape of cells each containing an input symbol of $\Sigma$ , a tape reader that reads at most one tape cell in any given internal state, and a vertical stack of cells storing symbols of $\Gamma$ . Now, given that $M$ is in state $p$ , with symbol $A$ on top of the stack, and tape reader pointing at a tape cell containing symbol $a$ , it may do one of the following: if $T(p,a,A)\ne \varnothing$ , then it ``pops'' $A$ off the stack, ``pushes'' word $A_1\cdots A_n$ onto the stack, by starting with symbol $A_n$ , and ending with symbol $A_1$ , ``consumes'' $a$ by moving the tape reader to the right of the cell containing $a$ , and enters state $q$ , provided that $(q,A_1\cdots A_n)\in T(p,a,A)$ ; if $T(p,a,A)=\varnothing$ , then $M$ does nothing. 
if $T(p,\lambda,A)\ne \varnothing$ , then, without reading $a$ , it ``pops'' $A$ off the stack, ``pushes'' word $A_1\cdots A_n$ onto the stack, and enters state $q$ , as long as $(q,A_1\cdots A_n)\in T(p,\lambda,A)$ ; if $T(p,\lambda,A)=\varnothing$ , then $M$ does nothing. If $(q,\lambda) \in T(p,a,A)$ , then $A$ gets popped off, and nothing gets pushed onto the stack. Modes of Acceptance

A PDA is a language acceptor. We describe how words are accepted by a PDA $M$ . First, we start with configurations. A configuration of $M$ is an element of $Q\times \Sigma^* \times \Gamma^*$ . For any word $u$ , the configuration $(q_0,u,\bot)$ is called the start configuration of $u$ . A binary relation $\vdash$ on the set of configurations is defined as follows: if $(p,u,\alpha)$ and $(q,v,\beta)$ are configurations of $M$ , then $$(p,u,\alpha)\vdash (q,v,\beta)$$ provided that $\alpha=A\gamma$ and $\beta=B_1\cdots B_n\gamma$ , for some $A,B_1,\ldots, B_n \in \Gamma$ , and either $u=av$ , and $(q,B_1\cdots B_n)\in T(p,a,A)$ , or $u=v$ , and $(q,B_1\cdots B_n)\in T(p,\lambda,A)$ . Now, take the reflexive transitive closure $\vdash^*$ of $\vdash$ . When $(p,u,\alpha) \vdash^* (q,v,\beta)$ , we say that $v$ is derivable from $u$ . A word $u \in\Sigma^*$ is said to be accepted on final state by $M$ if $(q_0,u,\bot) \vdash^* (q,\lambda,\alpha)$ for some final state $q\in F$ , accepted on empty stack by $M$ if $(q_0,u,\bot) \vdash^* (q,\lambda,\lambda)$ , accepted on final state and empty stack by $M$ if $(q_0,u,\bot) \vdash^* (q,\lambda,\lambda)$ for some $q\in F$ . Languages Accepted by a PDA Given a mode of acceptance, the set of words accepted by $M$ is called the language accepted by $M$ based on that mode of acceptance. Given a PDA $M$ , there are three languages accepted by $M$ , corresponding to the three acceptance modes above. It turns out that three modes of acceptance are equivalent, in the following sense: if a language $L$ is accepted by $M$ on one acceptance mode, there are PDA $M_1$ and $M_2$ that accept $L$ in the other two acceptance modes. In general, unless otherwise stated, the language $L(M)$ accepted by a PDA $M$ stands for the language accepted by $M$ on final state. Remarks. Two PDAs are said to be equivalent if they accept the same language. 
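The configuration relation and acceptance on final state defined above can be sketched as a breadth-first search over configurations. This is a minimal sketch under stated assumptions: the transition function is a Python dict keyed by (state, input symbol or "" for λ, top of stack), pushed words are tuples whose first element becomes the new top, "#" plays the role of the start stack symbol, and the search gives up (returning False) after `max_configs` configurations, since λ-moves can in general grow the stack without bound. The example machine for balanced parentheses and all names are illustrative, not taken from the text.

```python
from collections import deque

def accepts_by_final_state(T, start, bottom, finals, word, max_configs=10_000):
    """Search configurations (state, input position, stack) breadth-first."""
    start_cfg = (start, 0, (bottom,))
    seen = {start_cfg}
    queue = deque([start_cfg])
    while queue and len(seen) <= max_configs:
        state, pos, stack = queue.popleft()
        if pos == len(word) and state in finals:
            return True                      # all input read, final state reached
        if not stack:
            continue                         # no move is possible on an empty stack
        top, rest = stack[0], stack[1:]
        moves = []
        if pos < len(word):                  # moves that consume one input symbol
            moves += [(q, push, pos + 1)
                      for q, push in T.get((state, word[pos], top), ())]
        moves += [(q, push, pos)             # lambda-moves
                  for q, push in T.get((state, "", top), ())]
        for q, push, new_pos in moves:
            cfg = (q, new_pos, tuple(push) + rest)
            if cfg not in seen:
                seen.add(cfg)
                queue.append(cfg)
    return False

# A PDA for balanced parentheses: push A on "(", pop A on ")",
# and move to the final state q1 when only the bottom symbol is left.
BALANCED = {
    ("q0", "(", "#"): {("q0", ("A", "#"))},
    ("q0", "(", "A"): {("q0", ("A", "A"))},
    ("q0", ")", "A"): {("q0", ())},
    ("q0", "", "#"): {("q1", ("#",))},
}

def balanced(word):
    return accepts_by_final_state(BALANCED, "q0", "#", {"q1"}, word)

print(balanced("(())()"), balanced("(()"))
```

Acceptance on empty stack could be sketched the same way by testing for an empty stack instead of a final state once the input is consumed.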
It can be shown that any PDA is equivalent to a PDA where $T(p,\lambda,A)=\varnothing$ for all $p\in F$ and $A\in \Gamma$ (called a $\lambda$ -free PDA). One of the main reasons for studying PDA is: the notion of a PDA is equivalent to the notion of a context-free grammar. This means that, every language accepted by a PDA is context-free, and every context-free language is accepted by some PDA. Representation by State Diagrams

Like an NDFA, a PDA can be presented visually as a directed graph, called a state diagram. Instead of simply labelling edges representing transitions with the leading symbol, two additional symbols are added, representing what symbol must be matched and removed from the top of the stack (or $\lambda$ if none) and what symbol should be pushed onto the stack (or $\lambda$ if none). For instance, the notation $\verb=a=\,\verb=A/B=$ for an edge label indicates that a must be the first symbol in the remaining input string and A must be the symbol at the top of the stack for this transition to occur, and after the transition, A is replaced by B at the top of the stack. If the label had been $\verb=a=\,\lambda\verb=/B=$ , then the symbol at the top of the stack would not matter (the stack could even be empty), and B would be pushed on top of the stack during the transition. If the label had been $\verb=a=\,\verb=A/=\lambda$ , A would be popped from the stack and nothing would replace it during the transition. For example, consider the alphabet $\Sigma := \left\{ \verb=(=, \verb=)= \right\}$ . Let us define a context-free language $L$ that consists of strings where the parentheses are fully balanced. If we define $\Gamma := \left\{ A \right\}$ , then a PDA for accepting such strings can be given as a state diagram (figure not reproduced).

CFLs and PDAs: closure and decision properties of CFLs

Closure Properties of CFL's: Substitution
If a substitution s assigns a CFL to every symbol in the alphabet of a CFL L, then s(L) is a CFL.
Proof
- Take a grammar for L and a grammar for each language $L_a = s(a)$.
- Make sure all the variables of all these grammars are different. We can always rename variables whatever we like, so this step is easy.
- Replace each terminal a in the productions for L by $S_a$, the start symbol of the grammar for $L_a$.
- A proof that this construction works is in the reader. Intuition: this replacement allows any string in $L_a$ to take the place of any occurrence of a in any string of L.
Example
- $L = \{0^n 1^n \mid n \ge 1\}$, generated by the grammar $S \to 0S1 \mid 01$.
- $s(0) = \{a^n b^m \mid m \le n\}$, generated by the grammar $S \to aSb \mid A$; $A \to aA \mid ab$.
- $s(1) = \{ab, abc\}$, generated by the grammar $S \to abA$; $A \to c \mid \epsilon$.
1. Rename second and third S's to $S_0$ and $S_1$, respectively. Rename second A to B. Resulting grammars are:
   $S \to 0S1 \mid 01$
   $S_0 \to aS_0b \mid A$; $A \to aA \mid ab$
   $S_1 \to abB$; $B \to c \mid \epsilon$
2. In the first grammar, replace 0 by $S_0$ and 1 by $S_1$. The combined grammar:
   $S \to S_0SS_1 \mid S_0S_1$
   $S_0 \to aS_0b \mid A$; $A \to aA \mid ab$
   $S_1 \to abB$; $B \to c \mid \epsilon$

Consequences of Closure Under Substitution
1. Closed under union, concatenation, star. Proofs are the same as for regular languages, e.g. for concatenation of CFL's $L_1$, $L_2$, use $L = \{ab\}$, $s(a) = L_1$, and $s(b) = L_2$.
2. Closure of CFL's under homomorphism.

Nonclosure Under Intersection
- The reader shows the following language $L = \{0^i 1^j 2^k 3^l \mid i = k \text{ and } j = l\}$ not to be a CFL. Intuitively, you need a variable and productions like $A \to 0A2 \mid 02$ to generate the matching 0's and 2's, while you need another variable to generate matching 1's and 3's. But these variables would have to generate strings that did not interleave.
- However, the simpler language $\{0^i 1^j 2^k 3^l \mid i = k\}$ is a CFL. A grammar: $S \to S3 \mid A$; $A \to 0A2 \mid B$; $B \to 1B \mid \epsilon$.
- Likewise the CFL $\{0^i 1^j 2^k 3^l \mid j = l\}$.
- Their intersection is L.

Nonclosure of CFL's Under Complement
- Proof 1: Since CFL's are closed under union, if they were also closed under complement, they would be closed under intersection by DeMorgan's law.
- Proof 2: The complement of L above is a

CFL. Here is a PDA P recognizing it:
- Guess whether to check $i \ne k$ or $j \ne l$. Say we want to check $i \ne k$.
- As long as 0's come in, count them on the stack.
- Ignore 1's.
- Pop the stack for each 2.
- As long as we have not just exposed the bottom-of-stack marker when the first 3 comes in, accept, and keep accepting as long as 3's come in.
- But we also have to accept, and keep accepting, as soon as we see that the input is not in $L(0^*1^*2^*3^*)$.

Closure of CFL's Under Reversal
Just reverse the body of every production.

Closure of CFL's Under Inverse Homomorphism
PDA-based construction.
- Keep a "buffer" in which we place h(a) for some input symbol a.
- Read inputs from the front of the buffer ($\epsilon$ OK).
- When the buffer is empty, it may be reloaded with h(b) for the next input symbol b, or we may continue making $\epsilon$-moves.

Testing Emptiness of a CFL
As for regular languages, we really take a representation of some language and ask whether it represents $\emptyset$.
- In this case, the representation can be a CFG or PDA. Our choice, since there are algorithms to convert one to the other.
- The test: use a CFG and check whether the start symbol is useless.

Testing Finiteness of a CFL
- Let L be a CFL. Then there is some pumping-lemma constant n for L.
- Test all strings of length between n and 2n − 1 for membership (as in next section).
- If there is any such string, it can be pumped, and the language is infinite.
- If there is no such string, then n − 1 is an upper limit on the length of strings, so the

language is finite. Trick: if there were a string z = uvwxy of length 2n or longer, you can find a shorter string uwy in L, but it's at most n shorter. Thus, if there are any strings of length 2n or more, you can repeatedly cut out vx to get, eventually, a string whose length is in the range n to 2n − 1.

Testing Membership of a String in a CFL
Simulating a PDA for L on string w doesn't quite work, because the PDA can grow its stack indefinitely on $\epsilon$ input, and we never finish, even if the PDA is deterministic.
- There is an $O(n^3)$ algorithm (n = length of w) that uses a "dynamic programming" technique. It is called the Cocke-Younger-Kasami (CYK) algorithm.
- Start with a CNF grammar for L.
- Build a two-dimensional table:
  - Row = length of a substring of w.
  - Column = beginning position of the substring.
  - Entry in row i and column j = set of variables that generate the substring of w beginning at position j and extending for i positions.
  - In reader, these entries are denoted $X_{j,i+j-1}$, i.e., the subscripts are the first and last positions of the string represented, so the first row is $X_{11}, X_{22}, \ldots, X_{nn}$, the second row is $X_{12}, X_{23}, \ldots, X_{n-1,n}$, and so on.
Basis: (row 1) $X_{ii}$ = the set of variables A such that $A \to a$ is a production, and a is the symbol at position i of w.
Induction: Assume the rows for substrings of length up to m − 1 have been computed, and compute the row for substrings of length m.
- We can derive $a_i a_{i+1} \cdots a_j$ from A if there is a production $A \to BC$, B derives any prefix of $a_i a_{i+1} \cdots a_j$, and C derives the rest.
- Thus, we must ask if there is any value of k such that
  - $i \le k < j$.

  - B is in $X_{ik}$.
  - C is in $X_{k+1,j}$.

Example
In class, we'll work the table for the grammar $S \to AS \mid SB \mid AB$, $A \to a$, $B \to b$ and the string aabb.

Decision and Closure Properties of CFLs Part of my automata notes. Similar to decision and closure properties of regular languages. When we talk about a context-free language, we are really talking about a representation of the language: either a context-free grammar or a pushdown automaton accepting by final state or empty stack. There are algorithms to decide if:

string w is in CFL L
CFL L is empty
CFL L is infinite
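As one illustration of the decidable questions listed above, the emptiness test can be sketched as a fixed-point computation of the variables that generate some terminal string. This is a minimal sketch, not the notes' own code: the dictionary representation of a grammar (variable mapped to a list of bodies, each body a tuple of symbols, with the empty tuple standing for the empty string) and the function names are assumptions made for illustration.

```python
def generating_variables(productions, terminals):
    """Return the variables that derive at least one string of terminals.

    productions: dict mapping a variable to a list of bodies (tuples of symbols).
    terminals:   set of terminal symbols.
    """
    generating = set()
    changed = True
    while changed:                      # repeat until no new variable is found
        changed = False
        for head, bodies in productions.items():
            if head in generating:
                continue
            for body in bodies:
                # A body is "good" if every symbol is a terminal or already
                # known to be generating; the empty body () is trivially good.
                if all(s in terminals or s in generating for s in body):
                    generating.add(head)
                    changed = True
                    break
    return generating


def is_empty(productions, terminals, start):
    """The language is empty exactly when the start symbol generates nothing."""
    return start not in generating_variables(productions, terminals)


# S -> 0S1 | 01 generates strings, so its language is not empty.
nonempty = {"S": [("0", "S", "1"), ("0", "1")]}
# S -> AS, A -> a: S can never get rid of itself, so the language is empty.
empty = {"S": [("A", "S")], "A": [("a",)]}
print(is_empty(nonempty, {"0", "1"}, "S"), is_empty(empty, {"a"}, "S"))
```

The loop adds at least one variable per pass or stops, so it runs at most once per variable plus a final pass.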

Non-Decision Properties Many questions that can be decided for regular sets cannot be decided for CFLs. Example: Are two CFLs the same? Are two CFLs disjoint? Need theory of Turing machines and decidability to prove no algorithm exists. Testing Emptiness Already did this. We learned to eliminate useless variables. If the start symbol is one of these, then the CFL is empty, otherwise not empty. Testing Membership

Want to know if string w is in L(G).

Assume G is in CNF (or convert to CNF). w = ε is a special case, since CNF gets rid of the empty string, but we can solve it by testing if the start symbol is nullable. Algorithm CYK is a good example of dynamic programming and runs in O(n^3), where n=|w|. Let w = a1...an. Construct an n by n triangular array of sets of variables Xij = {variables A | A =>* ai...aj}. Induction on j-i+1, the length of the derived string. Finally, ask if S is in X1n. Basis: Xii = {A | A -> ai is a production}. Induction: Xij = {A | there is a production A -> BC and an integer k, i<=k<j, such that B is in Xik and C is in Xk+1,j}.
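The basis and induction steps above translate almost line for line into a table-filling loop. The following is a minimal sketch under stated assumptions: the grammar is already in CNF and is passed as two lists, `terminal_rules` (pairs `(A, a)` for productions A -> a) and `binary_rules` (triples `(A, B, C)` for A -> BC). These names and this representation are chosen here for illustration and do not come from the notes. Positions are 1-based to match the Xij notation.

```python
def cyk_table(word, terminal_rules, binary_rules):
    """Fill the triangular CYK table X[i, j] for 1 <= i <= j <= len(word)."""
    n = len(word)
    X = {}
    # Basis: X[i, i] holds every A with a production A -> a_i.
    for i in range(1, n + 1):
        X[i, i] = {A for (A, a) in terminal_rules if a == word[i - 1]}
    # Induction on the substring length j - i + 1.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):               # split point, i <= k < j
                for (A, B, C) in binary_rules:
                    if B in X[i, k] and C in X[k + 1, j]:
                        cell.add(A)
            X[i, j] = cell
    return X


def cyk_accepts(word, terminal_rules, binary_rules, start="S"):
    """True if the CNF grammar derives the non-empty string `word`."""
    if not word:
        return False    # the empty string needs the separate nullability test
    X = cyk_table(word, terminal_rules, binary_rules)
    return start in X[1, len(word)]


# Grammar of the worked example: S->AB, A->BC | a, B->AC | b, C->a | b.
TERMINAL_RULES = [("A", "a"), ("B", "b"), ("C", "a"), ("C", "b")]
BINARY_RULES = [("S", "A", "B"), ("A", "B", "C"), ("B", "A", "C")]
table = cyk_table("ababa", TERMINAL_RULES, BINARY_RULES)
print(sorted(table[1, 2]), sorted(table[1, 5]))
```

The three nested loops over length, start position, and split point give the O(n^3) running time, with the grammar size as a constant factor.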

Example of the CYK algorithm: S->AB, A->BC | a, B->AC | b, C->a | b. String w=ababa. For the basis, use the productions whose body is a or b, so X11={A,C} X22={B,C} X33={A,C} X44={B,C} X55={A,C}. Next, X12={B,S}. When j=i+1, k can only be i. That is, the only way to derive a string of length two is to use a production that replaces one variable by two, each of which then leads directly to one of the terminals you want. S is in X12 because S->AB, A is in X11 and B is in X22. Similarly, B is in X12 because B->AC, A is in X11 and C is in X22. A is not in X12 even though it can generate strings of two terminals: its only two-variable body is BC, so those strings must start with b, while X12 must start with a (B is not in X11). So the sets for substrings of length two are X12={B,S} X23={A} X34={B,S} X45={A}

For X13 there are two split points. With k=1 we combine X11 and X23, which only gives the bodies AA and CA, and no production has either body. With k=2 we combine X12 and X33, and the body BC matches A->BC, so X13={A}. Continuing in the same way gives X24={B,S}, X35={A}, X14={B,S}, X25={A}, and finally X15, the whole string, which only has A. Since X15 does not contain the start symbol S, the string is not in the language. Closure Properties of CFLs: CFLs are closed under union, concatenation, Kleene closure, reversal, homomorphism, and inverse homomorphism. But they are not closed under intersection or difference.

UNION Let L and M be CFLs with grammars G and H, respectively. Assume G and H have no variables in common. The names of variables do not affect the language, so we can always rename them; the only point is to keep the two sets of productions from getting mixed up. The grammars may share terminals. S1 and S2 are the start symbols after renaming the variables. Form a new grammar for L U M by combining all symbols and productions of G and H. Then add a new start symbol S with productions S -> S1 | S2.

CONCATENATION Start the same way: let L and M have grammars G and H. Assume no variables in common, with S1 and S2 the start symbols. Form a new grammar for LM by starting with all symbols and productions of G and H. Add a new start symbol S with production S -> S1S2.

KLEENE STAR Let L have grammar G, start symbol S1. Form a new grammar for L* by introducing to G a new start symbol S with productions S -> S1S | ε.

A rightmost derivation from S generates a sequence of zero or more S1s, each of which generates some string in L.
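The three constructions above only add a new start symbol on top of the existing productions, which is easy to see in code. This is a minimal sketch, assuming a grammar is a dictionary from variable to a list of bodies (tuples of symbols, with `()` for ε); the function names and the default name `"S"` for the new start symbol are hypothetical choices made for illustration.

```python
def _merge(g, h, new_start):
    """Combine productions of g and h, insisting on disjoint variable names."""
    if set(g) & set(h):
        raise ValueError("rename variables so the grammars share none")
    if new_start in g or new_start in h:
        raise ValueError("new start symbol must be a fresh variable")
    return {**g, **h}


def union(g, h, s1, s2, new_start="S"):
    merged = _merge(g, h, new_start)
    merged[new_start] = [(s1,), (s2,)]          # S -> S1 | S2
    return merged


def concatenation(g, h, s1, s2, new_start="S"):
    merged = _merge(g, h, new_start)
    merged[new_start] = [(s1, s2)]              # S -> S1 S2
    return merged


def star(g, s1, new_start="S"):
    merged = _merge(g, {}, new_start)
    merged[new_start] = [(s1, new_start), ()]   # S -> S1 S | epsilon
    return merged


G = {"S1": [("0", "S1", "1"), ("0", "1")]}      # {0^n 1^n | n >= 1}
H = {"S2": [("a", "S2"), ()]}                   # a*
print(union(G, H, "S1", "S2")["S"])
print(concatenation(G, H, "S1", "S2")["S"])
print(star(G, "S1")["S"])
```

The explicit check for shared variables mirrors the renaming assumption in the notes: without it, productions of one grammar could be applied inside derivations of the other.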

REVERSAL For CFL L with grammar G, form a grammar for L^R by reversing the body of every production (so A->BC becomes A->CB). Example: S->0S1 | 01 becomes S->1S0 | 10. No proof given here; it is a simple induction on lengths of derivations.

HOMOMORPHISM CFL L with grammar G. Let h be a homomorphism on the terminal symbols of G. Construct a grammar for h(L) by replacing each terminal symbol a by h(a). Example: G has productions S -> 0S1 | 01. h is defined by h(0)=ab, h(1)=ε. h(L(G)) has the grammar with productions S -> abS | ab.
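Both reversal and homomorphism are purely syntactic rewrites of the production bodies, which the following sketch makes concrete. It assumes the same illustrative representation as before (a dictionary from variable to a list of bodies, each a tuple of single-character symbols) and represents h as a dictionary from terminal to string; these choices and the function names are assumptions, not part of the notes.

```python
def reverse_grammar(productions):
    """Grammar for L^R: reverse the body of every production."""
    return {
        head: [tuple(reversed(body)) for body in bodies]
        for head, bodies in productions.items()
    }


def apply_homomorphism(productions, h):
    """Grammar for h(L): replace each terminal a by the symbols of h(a).

    Symbols that are not keys of h (the variables) are left unchanged.
    """
    result = {}
    for head, bodies in productions.items():
        new_bodies = []
        for body in bodies:
            new_body = []
            for symbol in body:
                if symbol in h:
                    new_body.extend(h[symbol])   # h(a) may be empty
                else:
                    new_body.append(symbol)
            new_bodies.append(tuple(new_body))
        result[head] = new_bodies
    return result


G = {"S": [("0", "S", "1"), ("0", "1")]}
print(reverse_grammar(G))                        # S -> 1S0 | 10
print(apply_homomorphism(G, {"0": "ab", "1": ""}))  # S -> abS | ab
```

With h(0)=ab and h(1)=ε this reproduces the productions S -> abS | ab from the example above.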

INVERSE HOMOMORPHISM Grammars do not help much here, but PDAs do. Let L=L(P) for some PDA P. Construct a PDA P' to accept h^(-1)(L). P' simulates P, but keeps, as one component of a two-component state, a buffer that holds the result of applying h to one input symbol. Formal construction of P': the states are pairs [q, w], where q is a state of P and w is a suffix of h(a) for some symbol a. Thus, there are only a finite number of possible values for w. Stack symbols of P' are those of P. Start state of P' is [q0, ε].

Input symbols of P' are the symbols to which h applies. Final states of P' are the states [q, ε] such that q is a final state of P. Transitions of P': δ'([q, ε], a, X) = {([q, h(a)], X)} for any input symbol a of P' and any stack symbol X. That is, when the buffer is empty, P' can reload it. δ'([q, bw], ε, X) contains ([p, w], α) if δ(q, b, X) contains (p, α), where b is either an input symbol of P or ε. This simulates P from the buffer. To prove correctness of P', we need to show L(P')=h^(-1)(L(P)).

Unit V Turing machines-various-recursively enumerable (r.e.) set-recursive sets, TM as computer of function, decidability and solvability, reductions, post correspondence problem (PCP) and unsolvability of ambiguity problem of CFGs, Church's hypothesis

Turing machines:
A Turing Machine consists of:
- A state machine
- An input tape
- A movable r/w tape head
A move of a Turing Machine:
- Read the character on the tape at the current position of the tape head
- Change the character on the tape at the current position of the tape head
- Move the tape head
- Change the state of the machine based on current state and character read

Plan: TM variants; computing a function using TMs.

Turing Machine Variants
Some variants of the TM:
- Multitape TMs: have multiple tapes, with a single tape head that is reading/writing the same position on each tape at any one given configuration.
- Non-deterministic TMs: more than one move is possible from any given configuration.
- Semi-infinite TM: what our book calls the basic TM. The real basic TM has an infinite tape in both directions.
Thankfully, all these variants can be shown to be equivalent to the basic TM. Specifically, non-determinism does not add extra computing power to a TM, much like FAs but unlike PDAs.

TMs and languages
Two classes of languages that involve TMs:
- A language is recursively enumerable if there is a TM that accepts it.
- A language is recursive if there is a TM that recognizes it (that is, one that halts on every input and leaves a 1 or a 0 on the tape to say whether the string is in the language).
First observation: Every recursive language is also recursively enumerable. Modify the TM that recognizes the language so that it goes into a nowhere state just before placing the 0 on the tape indicating that a string is not in the


Game plan
1. Show that there exists a language that is not recursively enumerable.
2. Show that there exists a language that is recursively enumerable but not recursive.

Enumerable and Recursive Sets
Unfortunately, it seems that very few of the general problems concerning the nature of computation are solvable. Now is the time to take a closer look at some of these problems and classify them with regard to a finer metric. To do this we need more precision and formality. So, we shall bring forth a little basic mathematical logic. Examining several of the sets with unsolvable membership problems, we find that while we cannot decide their membership problems, we are often able to determine when an element is a member of some set. In other words, we know that if a Turing machine accepts a particular set and halts for some input, then that input is a member of the set. Thus the Turing machine halts for members of the set and provides no information about inputs that are not members. An example is K, the set of Turing machines that halt when given their own indices as input. Recalling that
$$K = \{\, i \mid M_i(i) \text{ halts} \,\} = \{\, i \mid i \in W_i \,\}$$
consider the machine M that can be constructed from the universal Turing machine ($M_u$) as follows.
$$M(i) = M_u(i, i)$$
Another way to describe M (possibly more intuitively) is:
$$M(i) = \begin{cases} \text{halt} & \text{if } M_i(i) \text{ halts} \\ \text{diverge} & \text{otherwise} \end{cases}$$

This is a Turing machine. And, since it was just $M_u(i, i)$ we know exactly how to build it and even find its index in our standard enumeration. Furthermore, if we examine it carefully, we discover that it accepts the set K. That is, M will halt for all inputs which are members of K but diverge for nonmembers. There is an important point about this that needs to be stressed. If some integer x is a member of K then M(x) will always tell us so. Otherwise, M(x) provides us with absolutely no information. This is because we can detect halting but cannot always detect divergence. After all, if we knew when a machine did not halt, we would be able to solve the halting problem. In fact, there are three cases of final or terminal behavior in the operation of Turing machines: a) halting, b) non-halting which we might detect, and c) non-detectable divergence. The latter is the troublesome kind that provides us with unsolvability. Some of the computable sets have solvable membership problems (for example, the sets of even integers or prime numbers) but many such as K do not. In traditional mathematical logic or recursion theory we name our collection of computable sets the class of recursively enumerable sets. There is a reason for this exotic sounding name that will be completely revealed below. The formal definition for members of the class follows. Definition. A set is recursively enumerable (abbreviated r.e.) if and only if it can be accepted by a Turing machine. We call this family of sets the class of r.e. sets and earlier we discovered an enumeration of all of them which we denoted $W_1, W_2, \ldots$ to correspond to our standard enumeration of Turing machines. Noting that any set with a solvable membership problem is also an r.e. set (as we shall state in a theorem soon) we now present a definition of an important subset of the r.e. sets and an immediate theorem. Definition. 
A set is recursive if and only if it has a solvable membership problem. Theorem 1. The class of recursively enumerable (r.e.) sets properly contains the class of recursive sets. Proof. Two things need to be accomplished. First, we state that every recursive set is r.e. because if we can decide if an input is a member of a set, we can certainly accept the set. Next we present a set that does not have a solvable membership problem, but is r.e. That of course, is our old friend, the diagonal set K. That is fine. But, what else do we know about the relationship between the r.e. sets and the recursive sets? If we note that since we have total information about recursive sets and only partial information about membership in r.e. sets, the following characterization of the recursive sets follows very quickly.

Theorem 2. A set is recursive if and only if both the set and its complement are recursively enumerable. Proof. Let A be a recursive set. Then its complement $\overline{A}$ must be recursive as well. After all, if we can tell whether or not some integer x is a member of A, then we can also decide if x is not a member of A. Thus both are r.e. via the last theorem. Suppose that A and $\overline{A}$ are r.e. sets. Then, due to the definition of r.e. sets, there are Turing machines that accept them. Let $M_a$ accept the set A and $M_{\overline{a}}$ accept its complement $\overline{A}$. Now, let us consider the following construction.
$$M(x) = \begin{cases} 1 & \text{if } M_a(x) \text{ halts} \\ 0 & \text{if } M_{\overline{a}}(x) \text{ halts} \end{cases}$$
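The machine M in this construction answers 1 when the acceptor for A halts and 0 when the acceptor for the complement halts, so it has to advance both computations in alternation rather than wait on either one. A rough sketch of that time-sharing idea follows, with Python generators standing in for Turing machines: each `yield` counts as one step, and a generator that finishes counts as halting. The generator protocol, the function names, and the toy set of even numbers are all assumptions made for illustration only.

```python
_HALTED = object()   # sentinel returned by next() once a generator has finished


def decide(x, accept_set, accept_complement):
    """Alternate single steps of two acceptors until one of them halts."""
    run_a = accept_set(x)
    run_co = accept_complement(x)
    while True:
        if next(run_a, _HALTED) is _HALTED:
            return 1          # the acceptor for A halted, so x is in A
        if next(run_co, _HALTED) is _HALTED:
            return 0          # the acceptor for the complement halted


def accept_even(x):
    """Toy acceptor: halts after x steps if x is even, else runs forever."""
    for _ in range(x):
        yield
    while x % 2 != 0:
        yield


def accept_odd(x):
    """Toy acceptor for the complement: halts only on odd x."""
    for _ in range(x):
        yield
    while x % 2 == 0:
        yield


print([decide(x, accept_even, accept_odd) for x in range(6)])
```

Because every x is in exactly one of the two sets, exactly one of the generators finishes, so the loop always terminates even though each acceptor alone may run forever.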

If we can build M as a Turing machine, we have the answer because M does solve membership for the set A. But, it is not clear that M is indeed a Turing machine. We must explain exactly how M operates. What M must do is to run $M_a(x)$ and $M_{\overline{a}}(x)$ at the same time. This is not hard to do if M has four tapes. It uses two of them for the computation of $M_u(a, x)$ and two for the computation of $M_u(\overline{a}, x)$ and runs them in time-sharing mode (a step of $M_a$, then a step of $M_{\overline{a}}$). We now note that one of $M_a(x)$ and $M_{\overline{a}}(x)$ must halt. Thus M is indeed a Turing machine that decides membership for the set A. From this theorem come several interesting facts about computation. First, we gain a new characterization of the recursive sets and solvable decision problems, namely, both the problem and its complement are computable. We shall also soon be able to present our first uncomputable or non-r.e. set. In addition, another closure property for the r.e. sets falls out of this examination. Here are these results. Theorem 3. The complement of K is not a recursively enumerable set. Proof. The last theorem states that if $\overline{K}$ were r.e. then both it and its complement K must be recursive. Since K is not a recursive set, $\overline{K}$ cannot be an r.e. set. Corollary. The class of r.e. sets is not closed under complement. Remember the halting problem? Another one of the ways to state it is as a membership problem for the set of pairs:
$$H = \{\, \langle i, x \rangle \mid M_i(x) \text{ halts} \,\} = \{\, \langle i, x \rangle \mid x \in W_i \,\}$$

We have shown that it does not have a solvable membership problem. A little bit of contemplation should be enough to convince anyone that H is an r.e. set just like K. But, what about its complement? The last two theorems provide the machinery to show that it also is not r.e. Theorem 4. The complement of the halting problem is not r.e. Proof. We now know two ways to show that the complement of the halting problem, namely the set $\{ \langle i, x \rangle \mid M_i(x) \text{ diverges} \}$, is not an r.e. set. The first is to use theorem 2 that states that a set is recursive if and only if both it and its complement are r.e. If the complement of the halting problem were r.e. then the halting problem would be recursive (or solvable). This is not so and thus the complement of the halting problem must not be r.e. Another method is to note that if $\{ \langle i, x \rangle \mid M_i(x) \text{ diverges} \}$ were r.e. then $\overline{K}$ would have to be r.e. also. This is true because we could use the machine that accepts the complement of the halting problem in order to accept $\overline{K}$. Since $\overline{K}$ is not r.e. then the complement of the halting problem is not either. The second method of the last proof brings up another fact about reducibilities. It is actually the r.e. version of a corollary to the theorem stating that if a nonrecursive set is reducible to another then it cannot be recursive either. Theorem 5. If A is reducible to B and A is not r.e. then neither is B. Proof. Let A be reducible to B via the function f. That is, for all x: $x \in A$ iff $f(x) \in B$. Let us assume that B is an r.e. set. That means that there is a Turing machine $M_b$ that accepts B. Now construct M in the following manner: $M(x) = M_b(f(x))$ and examine the following sequence of events.
$x \in A$ iff $f(x) \in B$ [since A is reducible to B via f]
iff $M_b(f(x))$ halts [since $M_b$ accepts B]
iff $M(x)$ halts [due to definition of M]
This means that if M is a Turing machine (and we know it is because we know exactly how to build it), then M accepts A. Thus A must also be an r.e. set since the r.e. sets are those accepted by Turing machines. 
Well, there is a contradiction! A is not r.e. So, some assumption made above must be wrong. By examination we find that the only one that could be wrong was when we assumed that B was r.e. Now the time has come to turn our discussion to functions instead of sets. Actually, we shall really discuss functions and some of the things that they can do to and with sets. We know something about this since we have seen reducibilities and they are functions that perform operations upon sets. Returning to our original definitions, we recall that Turing machines compute the computable functions and some compute total functions (those that always halt and present an answer) while others compute partial functions which are

defined on some inputs and not on others. We shall now provide names for this behavior. Definition. A function is (total) recursive if and only if it is computed by a Turing machine that halts for every input. Definition. A function is partial recursive (denoted prf) if and only if it can be computed by a Turing machine. This is very official sounding and also quite precise. But we need to specify exactly what we are talking about. Recursive functions are the counterpart of recursive sets. We can compute them totally, that is, for all inputs. Some intuitive examples are:
$$f(x) = 3x^2 + 5x + 2$$
$$f(x, y) = \begin{cases} x & \text{if } y \text{ is prime} \\ 0 & \text{otherwise} \end{cases}$$

Partial recursive functions are those which do not give us answers for every input. These are exactly the functions we try not to write programs for! This brings up one small thing we have not mentioned explicitly about reducibilities. We need to have answers for every input whenever a function is used to reduce one set to another. This means that the reducibility functions need to be recursive. The proper traditional definition of reducibility follows. Definition. The set A is reducible to the set B (written $A \le B$) if and only if there is a recursive function f such that for all x: $x \in A$ iff $f(x) \in B$. Another thing that functions are useful for doing is set enumeration (or listing). Some examples of set enumeration functions are:
e(i) = 2i = the ith even number
p(i) = the ith prime number
m(i) = the ith Turing machine encoding
These are recursive functions and we have mentioned them before. But, we have not mentioned any general properties about functions and the enumeration of the recursive and r.e. sets. Let us first define what exactly it means for a function to enumerate a set. Definition. The function f enumerates the set A (or A = range of f), if and only if for all y, a) if $y \in A$ then there is an x such that f(x) = y, and b) if f(x) = y then $y \in A$. Note that partial recursive functions as well as (total) recursive functions can enumerate sets. For example, the function:
$$k(i) = \begin{cases} i & \text{if } M_i(i) \text{ halts} \\ \text{diverge} & \text{otherwise} \end{cases}$$

is a partial recursive function that enumerates the set K. Here is a general theorem about the enumeration of r.e. sets which explains the reason for their exotic name. Theorem 6. A set is r.e. if and only if it is empty or the range of a recursive function. Proof. We shall do away with one part of the theorem immediately. If a set is empty then of course it is r.e. since it is recursive. Now what we need to show is that non-empty r.e. sets can be enumerated by recursive functions and that any set enumerated by a recursive function is r.e. a) If a set is not empty and is r.e. we must find a recursive function that enumerates it. Let A be a non-empty, r.e. set. We know that there is a Turing machine (which we shall call $M_a$) which accepts it. Since A is not empty, we may assume that there is some input (let us specify the integer k) which is a member of A. Now consider:
$$M(x, n) = \begin{cases} x & \text{if } M_a(x) \text{ halts in exactly } n \text{ steps} \\ k & \text{otherwise} \end{cases}$$

We claim that M is indeed a Turing machine and we must demonstrate two things about it. First, the range of M is part of the set A. This is true because M either outputs k (which is a member of A) or some x for which $M_a$ halted in n steps. Since $M_a$ halts only for members of A, we know that the range of M $\subseteq$ A. Next we must show that our enumerating machine M outputs all of the members of A. For any $x \in A$, $M_a(x)$ halts in some finite number (let us say m) of steps. Thus M(x, m) = x. So, M eventually outputs all of the members of A. In other words: A $\subseteq$ range of M, and we can assert that M exactly enumerates the set A. (N.B. This is not quite fair since enumerating functions are supposed to have one parameter and M has two. If we define M(z) to operate the same as the above machine with:
x = number of zeros in z
n = number of ones in z

then everything defined above works fine after a little extra computation to count zeros and ones. This is because sooner or later every pair of integers shows up. Thus we have a one parameter machine M(z) = M(<x, n>) which enumerates A.) We also need to show that M does compute a recursive function. This is so because M always halts. Recall that M(x, n) simulates $M_a(x)$ for exactly n steps and then makes a decision of whether to output x or k. b) The last part of the proof involves showing that if A is enumerated by some recursive function (let us call it f), then A can be accepted by a Turing machine. So, we shall start with A as the range of the recursive function f and examine the following computing procedure.
AcceptA(x)
  n = 0;
  while f(n) ≠ x do n = n + 1;
  halt
This procedure is computable and halts for all x which are enumerated by f (and thus members of A). It diverges whenever x is not enumerated by f. Since this computable procedure accepts A, we know that A is an r.e. set. This last theorem provides the reason for the name recursively enumerable set. In recursion theory, stronger results have been proven about the enumeration of both the recursive and r.e. sets. The following theorems (provided without proof) demonstrate this. Theorem 7. A set is recursive if and only if it is finite or can be enumerated in strictly increasing order. Theorem 8. A set is r.e. if and only if it is finite or can be enumerated in non-repeating fashion.

TM as computer of function: Computation with Turing Machines
Computing functions with TMs: The result of the function applied to an input string x will be left on the tape when the machine halts. Functions with multiple arguments can be placed on the input tape with arguments separated by blanks.
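To make the convention concrete (arguments on the tape separated by blanks, result left on the tape at halting), here is a minimal sketch of a single-tape simulator together with a machine for unary addition. Everything named here is an illustrative assumption rather than the notes' own machine: the blank is written `_`, the transition table `ADD` maps `(state, symbol)` to `(new state, symbol to write, move)`, and the step budget is only a guard for the sketch.

```python
BLANK = "_"


def run_tm(delta, tape_input, start, halt, max_steps=10_000):
    """Run a deterministic TM and return the tape contents without blanks
    at either end. delta maps (state, symbol) -> (state, write, move)."""
    tape = dict(enumerate(tape_input))   # sparse tape: position -> symbol
    state, head, steps = start, 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted; machine may diverge")
        symbol = tape.get(head, BLANK)               # read
        state, write, move = delta[state, symbol]    # change state
        tape[head] = write                           # write
        head += {"L": -1, "R": 1, "S": 0}[move]      # move the head
        steps += 1
    if not tape:
        return ""
    cells = range(min(tape), max(tape) + 1)
    return "".join(tape.get(i, BLANK) for i in cells).strip(BLANK)


# Unary addition: on input 1^m _ 1^n, leave 1^(m+n) on the tape.
# Idea: turn the separating blank into a 1, then erase the last 1.
ADD = {
    ("q0", "1"): ("q0", "1", "R"),      # skip the first argument
    ("q0", BLANK): ("q1", "1", "R"),    # overwrite the separator
    ("q1", "1"): ("q1", "1", "R"),      # skip the second argument
    ("q1", BLANK): ("q2", BLANK, "L"),  # step back onto the last 1
    ("q2", "1"): ("halt", BLANK, "S"),  # erase it and halt
}

print(run_tm(ADD, "111_11", "q0", "halt"))
```

Each pass through the loop performs the four parts of a move: read the current cell, change state, write, and move the head.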

Decidability And Solvability:
1. Describe the Turing-machine model of computation
2. Show a problem to be decidable
3. Show a problem to be undecidable
4. Explain the notion of reducibility of problems

Solvability and the Halting Problem: Our development period is over. Now it is time for some action. We have the tools and materials and we need to get to work and discover some things that are not computable. We know they are there and now it is time to find and examine a few. Our task in this section is to find some noncomputable problems. However we must first discuss what exactly problems are. Many of our computational tasks involve questions or decisions. We shall call these problems. For example, some problems involving numbers are: Is this integer a prime? Does this equation have a root between 0 and 1? Is this integer a perfect square? Does this series converge? Is this sequence of numbers sorted? As computer scientists, we are very aware that not all problems involve numbers. Many of the problems that we wish to solve deal with the programs we write. Often we would like to know the answers to questions concerning our methods, or our programs. Some of these problems or questions are: Is this program correct? How long will this program run? Does this program contain an infinite loop? Is this program more efficient than that one? A brief side trip to set forth more definitions and concepts is in order. We must describe some other things closely related to problems or questions. In fact, often when we describe problems we state them in terms of relations or predicates. For example, the predicate Prime(x) that indicates prime numbers could be defined: Prime(x) is true if and only if x is a prime number. and this predicate could be used to define the set of primes: PRIMES = { x | Prime(x) }. Another way to link the set of primes with the predicate for being a prime is to state:

$x \in$ PRIMES if and only if Prime(x) (N.B. Two comments on notation are necessary. We shall use iff to mean if and only if and will often just mention a predicate as we did above rather than stating that it is true.) We now have several different terms for problems or questions. And we know that they are closely related. Sets, predicates, and problems can be used to ask the same question. Here are three equivalent questions: Is $x \in$ PRIMES? Is Prime(x) true? Is x a prime number? When we can completely determine the answer to a problem, the value of a predicate, or membership in a set for all instances of the problem, predicate, or things that may be in the set; we say that the problem, predicate, or set is decidable or solvable. In computational terms this means that there is a Turing machine which can in every case determine the answer to the appropriate question. The formal definition of solvability for problems follows. Definition. A problem P is solvable if and only if there is a Turing machine $M_i$ such that for all x:
$$M_i(x) = \begin{cases} 1 & \text{if } P(x) \text{ is true} \\ 0 & \text{if } P(x) \text{ is false} \end{cases}$$
If we can always solve a problem by carrying out a computation it is a solvable problem. Many examples of solvable problems are quite familiar to us. In fact, most of the problems we attempt to solve by executing computer programs are solvable. Of course, this is good because it guarantees that if our programs are correct, then they will provide us with solutions! We can determine whether numbers are prime, find shortest paths in graphs, and many other things because these are solvable problems. There are lots and lots of them. But there must be some problems that are not solvable because we proved that there are things which Turing machines (or programs) cannot do. Let us begin by formulating and examining a historically famous one. Suppose we took the Turing machine $M_1$ and ran it with its own index as input. That is, we examined the computation of $M_1(1)$. What happens? 
Well, in this case we know the answer because we remember that $M_1$ was merely the machine: 0 0 halt and we know that it only halts when it receives an input that begins with a zero. This is fine. But, how about $M_2(2)$? We could look at that also. This is easy; in fact, there is almost nothing to it. Then we could go on to $M_3(3)$. And so forth. In general, let us take some arbitrary integer i and ask about the behavior of $M_i(i)$. And, let's not ask for much, we could put forth a very simple question: does it halt? Let us ponder this a while. Could we write a program or design a Turing machine that receives i as input and determines whether or not $M_i(i)$ halts? We might design a machine like the universal Turing machine that first produced

the description of Mi and then simulated its operation on the input i. This however, does not accomplish the task we set forth above. The reason is because though we would always know if it halted, if it went into an infinite loop we might just sit there and wait forever without knowing what was happening in the computation. Here is a theorem about this that is very reminiscent of the result where we showed that there are more sets than computable sets. Theorem 1. Whether or not a Turing machine halts when given its own index as input is unsolvable. Proof. We begin by assuming that we can decide whether or not a Turing machine halts when given its own index as input. We assume that the problem is solvable. This means that there is a Turing machine that can solve this problem. Let's call this machine Mk and note that for all inputs i: Mx x x k x x () () () = 1 if M halts 0 if M diverges

(This assertion came straight from the definition of solvability.) Since the machine $M_k$ exists, we can use it in the definition of another computing procedure. Consider the following machine.

\[ M(x) = \begin{cases} \text{halt} & \text{if } M_k(x) = 0 \\ \text{diverge} & \text{if } M_k(x) = 1 \end{cases} \]
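The construction of M from a claimed decider can be sketched in Python. This is only an illustration under stated assumptions: machines are modeled as Python functions, divergence is modeled by the sentinel value DIVERGE rather than a real infinite loop (so the sketch itself always terminates), and the names make_diagonal and refute are hypothetical. Whatever a candidate for $M_k$ answers about the index d of M, the behaviour of M on d is the opposite.

```python
from typing import Callable, Tuple

# Outcomes of a computation. DIVERGE is a stand-in for "runs forever";
# using a sentinel instead of a real loop is an assumption of this sketch.
HALT, DIVERGE = "halt", "diverge"


def make_diagonal(decider: Callable[[int], int]) -> Callable[[int], str]:
    """Build M from a claimed M_k: halt if M_k(x) = 0, diverge if M_k(x) = 1."""
    def machine(x: int) -> str:
        return HALT if decider(x) == 0 else DIVERGE
    return machine


def refute(decider: Callable[[int], int], d: int) -> Tuple[str, str]:
    """Compare the decider's claim about M_d(d) with what M does on d.

    d is assumed to be the index of M in the enumeration, so M_d = M.
    """
    machine = make_diagonal(decider)
    claimed = HALT if decider(d) == 1 else DIVERGE  # what M_k predicts for M_d(d)
    actual = machine(d)                             # what M_d(d) really does
    return claimed, actual


# Any candidate decider is contradicted on the diagonal machine's own index.
for candidate in (lambda x: 1, lambda x: 0):
    print(refute(candidate, d=7))
```

The two values printed for each candidate always differ, which is the contradiction the proof derives.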

This is not too difficult to construct from $M_k$ and our universal Turing machine $M_u$. We just run $M_k(x)$ until it provides an output and then either halt or enter an infinite loop.

We shall apply Church's thesis once more. Since we have developed an algorithm for the above machine M, we may state that it is indeed a Turing machine and as such has an index in our standard enumeration. Let the integer d be its index. Now we inquire about the computation of $M_d(d)$. This inquiry provides the following sequence of conclusions. (Recall that iff stands for if and only if.)

$M_d(d)$ halts iff $M(d)$ halts (since $M_d = M$)
iff $M_k(d) = 0$ (see definition of M)
iff $M_d(d)$ diverges (see definition of $M_k$)

Each step in the above deduction follows from definitions stated previously. Thus they all must be true. But there is a slight problem, since a contradiction was proven! Thus something must be wrong, and the only thing that could be incorrect must be some assumption we made. We only made one, namely our original assumption that the problem was solvable. This means that whether a Turing machine halts on its own index is unsolvable, and we have proven the theorem.

Now we have seen an unsolvable problem. Maybe it is not too exciting, but it is unsolvable nevertheless. If we turn it into a set we shall then have a set in which membership is undecidable. This set is named K and is well-known and greatly cherished by recursion theorists. It is:

\[ K = \{\, i \mid M_i(i) \text{ halts} \,\} \]

K was one of the first sets to be proven undecidable and is thus of great historical interest. It will also prove quite useful in later proofs. Another way to state our last theorem is:

Corollary. Membership in K is unsolvable.

Let us quickly follow up on this unsolvability result and prove a more general one. This is possibly the most famous unsolvable problem that exists. It is called the halting problem or membership problem.

Theorem 2 (Halting Problem).
For arbitrary integers i and x, whether or not $M_i(x)$ halts is unsolvable.

Proof. This follows directly from the previous theorem. Suppose we could solve halting for $M_i(x)$ on any values of i and x. All we have to do is plug in the value i for x, and we are now looking at whether $M_i(i)$ halts. We know from the last theorem that this is not solvable. So the general halting problem (does $M_i(x)$ halt?) must be unsolvable also, since if it were solvable we could solve the restricted version of the halting problem, namely membership in the set K.

This is interesting from the point of view of a computer scientist. It means that no program can ever predict the halting of all other programs. Thus we shall never be able to design routines which unfailingly check for infinite loops and warn the programmer, nor can we add routines to operating systems or compilers which always detect looping. This is why one never sees worthwhile infinite loop checkers in the software market.

Let's try another problem. It seems that we cannot tell if a machine will halt on arbitrary inputs. Maybe the strange inputs (such as the machine's own index) are causing the problem. This might be especially true if we are looking at weird machines that halt when others do not and so forth! It might be easier to ask if a machine always halts. After all, this is a quality we desire in our computer programs. Unfortunately that is unsolvable too.

Theorem 3. Whether or not an arbitrary Turing machine halts for all inputs is an unsolvable problem.

Proof. Our strategy for this proof will be to tie this problem to a problem that we know is unsolvable. Thus it is much like the last proof. We shall show that if halting for all inputs were solvable, then halting on one's own index would be solvable as well. Then, since whether a machine halts on its own index is unsolvable, the problem of whether a machine halts for all inputs must be unsolvable also.

In order to explore this, let's take an arbitrary machine $M_i$ and construct another Turing machine $M_{all}$ such that:

$M_{all}$ halts for all inputs iff $M_i(i)$ halts

At this point let us not worry about how we build $M_{all}$; this will come later. We now claim that if we can decide whether or not a machine halts for all inputs, we can solve the problem of whether a machine halts on its own index. Here is how we do it. To decide if $M_i(i)$ halts, just ask whether $M_{all}$ halts on all inputs. But we have shown that we cannot decide if a machine halts upon its own index. So if we are able to construct $M_{all}$, a decision procedure for halting on all inputs would solve membership in K, which is a contradiction. Thus the problem of detecting halting on all inputs must be unsolvable also.
Let us get to work. A machine like the above $M_{all}$ must be built from $M_i$. We shall use all of our tools in this construction. As a start, consider:

\[ M(x, i) = M_u(i, i) = M_i(i) \]

Note that M does not pay attention to its input x. It just turns the universal machine $M_u$ loose on the input pair (i, i), which is the same as running $M_i$ on its own index. So, no matter what x equals, M just computes $M_i(i)$. Yet another appeal to Church's thesis assures us that M is indeed a Turing machine and exists in the standard enumeration. Let us say that M is machine $M_m$. Thus for all i and x:

\[ M_m(x, i) = M(x, i) = M_u(i, i) = M_i(i). \]

Now we shall call upon the s-m-n theorem. It says that there is a function s(m, i) such that for all i and x:

\[ M_{s(m,i)}(x) = M_m(x, i) = M(x, i) \]

If we let all = s(m, i), then we know that for fixed i and all x:

\[ M_{all}(x) = M(x, i) = M_u(i, i) = M_i(i) \]

Another way to depict the operation of $M_{all}$ is:

\[ M_{all}(x) = \begin{cases} \text{halt} & \text{if } M_i(i) \text{ halts} \\ \text{diverge} & \text{if } M_i(i) \text{ diverges} \end{cases} \]
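The way $M_{all}$ is obtained from $M_i$ resembles fixing one argument of a two-argument program. The following Python sketch illustrates this under simplifying assumptions: the enumeration of machines is a small dictionary of total Python functions, universal stands in for $M_u$, and s stands in for the s-m-n function. All of these names are hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict


def universal(machines: Dict[int, Callable[[int], int]], i: int, x: int) -> int:
    """Stand-in for M_u: run machine number i on input x."""
    return machines[i](x)


def s(two_arg_machine: Callable[[int, int], int], i: int) -> Callable[[int], int]:
    """Stand-in for s(m, i): fix the second argument, leaving a one-input machine."""
    return lambda x: two_arg_machine(x, i)


def build_m_all(machines: Dict[int, Callable[[int], int]], i: int) -> Callable[[int], int]:
    """Construct M_all for the given index i."""
    def m(x: int, j: int) -> int:
        # M(x, j) ignores x and runs M_j on its own index j.
        return universal(machines, j, j)
    return s(m, i)


# A toy "enumeration" with two machines.
machines = {0: lambda x: x + 1, 1: lambda x: x * 10}
m_all = build_m_all(machines, 1)
print(m_all(5), m_all(99))  # both calls behave exactly like M_1(1)
```

Because m_all ignores its input, it behaves the same on every x, which is why it halts on all inputs exactly when $M_i(i)$ halts.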

To sum up, from an arbitrary machine $M_i$ we have constructed a machine $M_{all}$ which will halt on all inputs if and only if $M_i(i)$ halts. The following derivation shows this.

$M_i(i)$ halts iff $M_u(i, i)$ halts
iff for all x, $M(x, i)$ halts
iff for all x, $M_m(x, i)$ halts
iff for all x, $M_{s(m,i)}(x)$ halts
iff for all x, $M_{all}(x)$ halts

Each line in the above sequence follows from definitions made above or theorems (the s-m-n and universal Turing machine theorems) we have proven before. Now we have exactly what we were looking for, a machine $M_{all}$ which halts for all inputs if and only if $M_i(i)$ halts. Recalling the discussion at the beginning of the proof, we realize that our theorem has been proven.

Let us reflect on what we have done in this section. Our major accomplishment was to present an unsolvable problem. In addition, we presented two more which were related to it. They all concerned halting and as such are relevant to programming and computer science. From this we know that we can never get general answers to questions such as: will this program halt on this data set? will this program halt on every data set? This is indeed a very fine state of affairs! We have shown that there is no way to ever do automatic, general checks on loops or even correctness for the programs we develop. It is unfortunate to close on such a sad note, but the actual situation is even worse! We shall presently find out that hardly anything interesting is solvable.

Reductions:

Let A and B be languages over the same alphabet $\Sigma$. A reduction from A to B is an algorithm that transforms:
- Strings in A to strings in B.

- Strings not in A to strings not in B.

B is decidable $\Rightarrow$ A is decidable. A is undecidable $\Rightarrow$ B is undecidable. One way to show a problem B to be undecidable is to reduce an undecidable problem A to B.

A Turing machine M computes a function f if M halts on all inputs and, on input x, writes f(x) on the tape and halts. Such a function f is called a computable function. Examples: increment, addition, multiplication, shift. Any algorithm with output is computing a function.

Let A and B be languages over $\Sigma$. A is reducible to B if and only if there exists a computable function $f : \Sigma^* \to \Sigma^*$ such that for all $w \in \Sigma^*$, $w \in A \Leftrightarrow f(w) \in B$. Notation: $A \le_m B$.

FACT: $A \le_m B \Leftrightarrow \overline{A} \le_m \overline{B}$, since $w \in A \Leftrightarrow f(w) \in B$ is equivalent to $w \notin A \Leftrightarrow f(w) \notin B$.

To reiterate:
1. Construction: f(w) is obtained from w by an algorithm.
2. Correctness: $w \in A \Leftrightarrow f(w) \in B$.

An Example involving DFAs

$EQ_{DFA} = \{\langle A, B\rangle \mid A, B \text{ are DFAs and } L(A) = L(B)\}$.
$E_{DFA} = \{\langle A\rangle \mid A \text{ is a DFA and } L(A) = \emptyset\}$.

A reduction machine on input $\langle A, B\rangle$, two DFAs:
1. Constructs the DFA $A'$ such that $L(A') = \overline{L(A)}$.
2. Constructs the DFA $B'$ such that $L(B') = \overline{L(B)}$.
3. Constructs the DFA $M_1$ such that $L(M_1) = L(A) \cap L(B')$.
4. Constructs the DFA $M_2$ such that $L(M_2) = L(A') \cap L(B)$.
5. Constructs the DFA C such that $L(C) = L(M_1) \cup L(M_2)$.
6. Outputs $\langle C\rangle$.

Correctness:
- Suppose $L(A) = L(B)$. Then $L(C) = \emptyset$.
- Suppose $L(A) \neq L(B)$. Then $L(C) \neq \emptyset$.
That is, $EQ_{DFA} \le_m E_{DFA}$.

An Example involving CFGs

$ALL_{CFG} = \{\langle G\rangle \mid G \text{ is a CFG and } L(G) = \Sigma^*\}$.
$EQ_{CFG} = \{\langle G, H\rangle \mid G, H \text{ are CFGs and } L(G) = L(H)\}$.

A reduction machine on input $\langle G\rangle$, a context-free grammar with alphabet $\Sigma$:
1. Constructs a CFG H with rules of the form $S' \to aS' \mid \varepsilon$, for all $a \in \Sigma$, so that $L(H) = \Sigma^*$.
2. Outputs $\langle G, H\rangle$.

Correctness:
- Suppose G generates all strings in $\Sigma^*$. Then $L(G) = L(H)$.

- Suppose G does not generate some string in $\Sigma^*$. Then $L(G) \neq L(H)$.
That is, $ALL_{CFG} \le_m EQ_{CFG}$.

The Halting problem

$HALT_{TM} = \{\langle M, w\rangle \mid M \text{ is a DTM and } M \text{ halts on } w\}$.

The reduction machine outputs a DTM that loops whenever M reaches the rejecting state. On input $\langle M, w\rangle$:
1. Constructs the following machine $M'$: Read input x. Simulate M on x. If M accepts, halt and accept. If M halts and rejects, enter a loop.
2. Outputs $\langle M', w\rangle$.
That is, $A_{TM} \le_m HALT_{TM}$. $HALT_{TM}$ is undecidable since $A_{TM}$ is undecidable.

Three machines are involved:
- A reduction machine that is a DTM.
- Input to the reduction machine is $\langle M, w\rangle$, where M is a DTM.
- Output of the reduction machine is $\langle M', w\rangle$, where $M'$ is a DTM.

The output DTM $M'$ has the input M hard-coded into it. Let $M = (Q, \Sigma, \Gamma, \delta, q_s, q_a, q_r)$. What is $M'$?

$M' = (Q', \Sigma, \Gamma, \delta', q_s, q_a, q_r')$, where $Q' = Q \cup \{q_l, q_r'\}$. Define $\delta'$ as follows:
- For all states p, for all states $q \neq q_r$, for all symbols a, b, for all $D \in \{L, R, S\}$: if $\delta(p, a) = (q, b, D)$ then $\delta'(p, a) = (q, b, D)$.
- For all states p, for all symbols a, b, for all $D \in \{L, R, S\}$: if $\delta(p, a) = (q_r, b, D)$ then $\delta'(p, a) = (q_l, b, S)$.
- For all symbols a, include the transition $\delta'(q_l, a) = (q_l, a, S)$.
$M'$ is constructed from M by the reduction machine.

Non-emptiness

$\overline{E_{TM}} = \{\langle M\rangle \mid M \text{ is a TM and } L(M) \neq \emptyset\}$

A reduction machine on input $\langle M, w\rangle$:
1. Constructs the following machine $M'$: Read input x. If $x \neq w$ then reject. If $x = w$, simulate M on w. If M accepts, halt and accept.
2. Outputs $\langle M'\rangle$.

Correctness:
- If M accepts w then $L(M')$ is not empty.

- If M does not accept w then $L(M')$ is empty.
That is, $A_{TM} \le_m \overline{E_{TM}}$.

This implies that $\overline{E_{TM}}$ is undecidable. This in turn implies that $E_{TM}$ is undecidable. At least one of them must be not recognizable. $\overline{E_{TM}}$ is recognizable. $E_{TM}$ is not recognizable.

$EQ_{TM} = \{\langle M_1, M_2\rangle \mid M_1 \text{ and } M_2 \text{ are TMs and } L(M_1) = L(M_2)\}$

$EQ_{TM}$ is undecidable. Is $EQ_{TM}$ recognizable? See the text book.

Language is Context-free

$CONTEXT\text{-}FREE_{TM} = \{\langle M\rangle \mid M \text{ is a TM and } L(M) \text{ is context-free}\}$

A reduction machine on input $\langle M, w\rangle$:
1. Constructs the following machine $M'$: Read input x. If x has the form $0^n1^n2^n$ then halt and accept. If x is not of this form, simulate M on w. If M accepts, halt and accept.
2. Outputs $\langle M'\rangle$.

Correctness:
- $L(M')$ is $\Sigma^*$ if M accepts w.
- $L(M')$ is not context-free if M does not accept w.
$A_{TM} \le_m CONTEXT\text{-}FREE_{TM}$. $CONTEXT\text{-}FREE_{TM}$ is undecidable since $A_{TM}$ is undecidable.

Language is Not Regular

$\overline{REGULAR_{TM}} = \{\langle M\rangle \mid M \text{ is a TM and } L(M) \text{ is NOT regular}\}$

A reduction machine on input $\langle M, w\rangle$:
1. Constructs the following machine $M'$: Read input x. If x is not of the form $0^n1^n$ then halt and reject. If x is of this form, simulate M on w. If M accepts, halt and accept.
2. Outputs $\langle M'\rangle$.

Correctness:
- $L(M')$ is $\emptyset$ if M does not accept w.
- $L(M')$ is not regular if M accepts w.
$A_{TM} \le_m \overline{REGULAR_{TM}}$. $\overline{REGULAR_{TM}}$ is undecidable since $A_{TM}$ is undecidable.
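Returning to the DFA example earlier in this section, the six construction steps of the reduction from $EQ_{DFA}$ to $E_{DFA}$ can be sketched in Python. This is a minimal sketch under stated assumptions: both DFAs share one alphabet and have total transition functions, and the class DFA and the helper names (complement, combine, reduce_eq_to_empty, is_empty) are hypothetical names introduced here.

```python
from collections import deque
from dataclasses import dataclass
from itertools import product


@dataclass
class DFA:
    states: set
    alphabet: set
    delta: dict      # total map: (state, symbol) -> state
    start: object
    accepting: set


def complement(a):
    """Steps 1 and 2: a DFA accepting the complement of L(a)."""
    return DFA(a.states, a.alphabet, a.delta, a.start, a.states - a.accepting)


def combine(a, b, accept):
    """Product DFA; accept(in_a, in_b) says which state pairs are accepting."""
    states = set(product(a.states, b.states))
    delta = {((p, q), c): (a.delta[(p, c)], b.delta[(q, c)])
             for (p, q) in states for c in a.alphabet}
    accepting = {(p, q) for (p, q) in states
                 if accept(p in a.accepting, q in b.accepting)}
    return DFA(states, a.alphabet, delta, (a.start, b.start), accepting)


def reduce_eq_to_empty(a, b):
    """Steps 3 to 6: build C whose language is the symmetric difference."""
    m1 = combine(a, complement(b), lambda x, y: x and y)   # L(A) minus L(B)
    m2 = combine(complement(a), b, lambda x, y: x and y)   # L(B) minus L(A)
    return combine(m1, m2, lambda x, y: x or y)            # union of the two


def is_empty(c):
    """Decider for E_DFA: L(c) is empty iff no accepting state is reachable."""
    seen, queue = {c.start}, deque([c.start])
    while queue:
        state = queue.popleft()
        if state in c.accepting:
            return False
        for symbol in c.alphabet:
            nxt = c.delta[(state, symbol)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


# Two DFAs for "even number of 1s" (different state names) and one for all strings.
even_a = DFA({"e", "o"}, {"0", "1"},
             {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"},
             "e", {"e"})
even_b = DFA({0, 1}, {"0", "1"},
             {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}, 0, {0})
everything = DFA({"s"}, {"0", "1"}, {("s", "0"): "s", ("s", "1"): "s"}, "s", {"s"})

print(is_empty(reduce_eq_to_empty(even_a, even_b)))      # equal languages
print(is_empty(reduce_eq_to_empty(even_a, everything)))  # different languages
```

Composing the reduction with a decider for $E_{DFA}$ yields a decider for $EQ_{DFA}$, which is the general pattern "B is decidable implies A is decidable" for $A \le_m B$.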

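Not Recognizable

$A_{TM} \le_m \overline{REGULAR_{TM}} \Rightarrow \overline{A_{TM}} \le_m REGULAR_{TM}$. Hence $REGULAR_{TM}$ is not recognizable.
$A_{TM} \le_m REGULAR_{TM} \Rightarrow \overline{A_{TM}} \le_m \overline{REGULAR_{TM}}$. Hence $\overline{REGULAR_{TM}}$ is not recognizable.

Properties of Reductions
1. $A \le_m B$ and $B \le_m C \Rightarrow A \le_m C$.
2. $A \le_m B \Rightarrow \overline{A} \le_m \overline{B}$.
3. $A \le_m B$ and B is decidable $\Rightarrow$ A is decidable.
4. Let A be recognizable. Then $A \le_m \overline{A}$ if and only if A and $\overline{A}$ are decidable.
5. $A \le_m B$ and B is recognizable $\Rightarrow$ A is recognizable.

Post Correspondence Problem (PCP):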

The Post correspondence problem (due to Emil Post) is another undecidable problem that turns out to be a very helpful tool for proving problems in logic or in formal language theory to be undecidable.

Let $\Sigma$ be an alphabet with at least two letters. An instance of the Post Correspondence problem (for short, PCP) is given by two sequences $U = (u_1, \ldots, u_m)$ and $V = (v_1, \ldots, v_m)$ of strings $u_i, v_i \in \Sigma^*$. The problem is to find whether there is a (finite) sequence $(i_1, \ldots, i_p)$, with $i_j \in \{1, \ldots, m\}$ for $j = 1, \ldots, p$, so that

\[ u_{i_1} u_{i_2} \cdots u_{i_p} = v_{i_1} v_{i_2} \cdots v_{i_p}. \]

Equivalently, an instance of the PCP is a sequence of pairs $(u_1, v_1), \ldots, (u_m, v_m)$.

For example, consider the following instance, written as pairs $(u_i, v_i)$:

(abab, ababaaa), (aaabbb, bb), (aab, baab), (ba, baa), (ab, ba), (aa, a).

The index sequence 1234556 is a solution:

abab aaabbb aab ba ab ab aa = ababaaa bb baab baa ba ba a.

We are beginning to suspect that this is a hard problem. Indeed, it is undecidable!
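Checking a proposed solution is easy, and a solution can be searched for up to a fixed length; what cannot exist is a procedure that always terminates with the right answer. The sketch below illustrates both points on the instance above. The function names is_solution and bounded_search are hypothetical, and the search is only a semi-decision procedure cut off at max_len.

```python
from collections import deque

# The example instance from the text, as (u_i, v_i) pairs; indices are 1-based.
PAIRS = [("abab", "ababaaa"), ("aaabbb", "bb"), ("aab", "baab"),
         ("ba", "baa"), ("ab", "ba"), ("aa", "a")]


def is_solution(pairs, indices):
    """Check u_{i1}...u_{ip} == v_{i1}...v_{ip} for a non-empty index sequence."""
    top = "".join(pairs[i - 1][0] for i in indices)
    bottom = "".join(pairs[i - 1][1] for i in indices)
    return len(indices) > 0 and top == bottom


def bounded_search(pairs, max_len):
    """Breadth-first search for a solution of length at most max_len.

    Returns a list of indices, or None if no solution exists within the bound.
    A None result says nothing about longer sequences.
    """
    queue = deque([((), "", "")])
    while queue:
        seq, top, bottom = queue.popleft()
        if seq and top == bottom:
            return list(seq)
        if len(seq) == max_len:
            continue
        for i, (u, v) in enumerate(pairs, start=1):
            t, b = top + u, bottom + v
            # Keep only partial solutions where one side is a prefix of the other.
            if t.startswith(b) or b.startswith(t):
                queue.append((seq + (i,), t, b))
    return None


print(is_solution(PAIRS, [1, 2, 3, 4, 5, 5, 6]))
print(bounded_search(PAIRS, 7))
```

Raising max_len lets the search find longer solutions when they exist, but for instances without a solution no finite bound ever certifies that fact in general.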

Theorem 6.8.1 (Emil Post, 1946) The Post correspondence problem is undecidable, provided that the alphabet $\Sigma$ has at least two symbols.

There are several ways of proving Theorem 6.8.1, but the strategy is more or less the same: reduce the halting problem to the PCP, by encoding sequences of IDs as partial solutions of the PCP. For instance, this can be done for RAM programs. The first step is to show that every RAM program can be simulated by a single-register RAM program. Then, the halting problem for RAM programs with one register is reduced to the PCP (using the fact that only four kinds of instructions are needed). A proof along these lines was given by Dana Scott.

As an application, we prove the following result:

Theorem 6.8.2 It is undecidable whether a context-free grammar is ambiguous.

Proof. We reduce the PCP to the ambiguity problem for CFGs. Given any instance $U = (u_1, \ldots, u_m)$ and $V = (v_1, \ldots, v_m)$ of the PCP, let $c_1, \ldots, c_m$ be m new symbols, and consider the following languages:

\[ L_U = \{ u_{i_1} \cdots u_{i_p} c_{i_p} \cdots c_{i_1} \mid 1 \le i_j \le m,\ 1 \le j \le p,\ p \ge 1 \}, \]
\[ L_V = \{ v_{i_1} \cdots v_{i_p} c_{i_p} \cdots c_{i_1} \mid 1 \le i_j \le m,\ 1 \le j \le p,\ p \ge 1 \}, \]

and $L_{U,V} = L_U \cup L_V$.
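To see why a common word of $L_U$ and $L_V$ corresponds to a PCP solution, the two languages can be enumerated for short index sequences. The sketch below makes two assumptions for illustration: each new symbol $c_i$ is encoded as the text `<i>` (so the characters `<` and `>` must not occur in the instance), and the three-pair instance used for the enumeration is a small hypothetical one, not taken from the text. The names word and common_words are also hypothetical.

```python
from itertools import product


def word(strings, indices):
    """Build s_{i1}...s_{ip} c_{ip}...c_{i1}, with c_i written as '<i>'."""
    body = "".join(strings[i - 1] for i in indices)
    markers = "".join(f"<{i}>" for i in reversed(indices))
    return body + markers


def common_words(u, v, max_p):
    """Words in both L_U and L_V built from index sequences of length <= max_p."""
    lu, lv = set(), set()
    for p in range(1, max_p + 1):
        for indices in product(range(1, len(u) + 1), repeat=p):
            lu.add(word(u, indices))
            lv.add(word(v, indices))
    return lu & lv


# Hypothetical small instance; the index sequence 3, 2, 3, 1 solves it.
U = ("a", "ab", "bba")
V = ("baa", "aa", "bb")
print(common_words(U, V, 4))
```

Since the marker suffix records the index sequence, a word lies in both languages exactly when the same sequence produces equal U-side and V-side strings, that is, when the sequence is a PCP solution.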

We can easily construct a CFG, $G_{U,V}$, generating $L_{U,V}$. The productions (for $1 \le i \le m$) are:

$S \to S_U$
$S \to S_V$
$S_U \to u_i S_U c_i$
$S_U \to u_i c_i$
$S_V \to v_i S_V c_i$
$S_V \to v_i c_i$.

It is easily seen that the PCP for (U, V) has a solution iff $L_U \cap L_V \neq \emptyset$ iff $G_{U,V}$ is ambiguous.

Remark: As a corollary, we also obtain the following result: it is undecidable for arbitrary context-free grammars $G_1$ and $G_2$ whether $L(G_1) \cap L(G_2) = \emptyset$ (see also Theorem 6.9.2).

Greibach's Theorem:

Recall that the computations of a Turing machine M can be described in terms of instantaneous descriptions, $upav$. We can encode computations

\[ ID_0 \vdash ID_1 \vdash \cdots \vdash ID_n \]

halting in a proper ID, as the language $L_M$ consisting of all strings

\[ w_0 \# w_1^R \# w_2 \# w_3^R \# \cdots \# w_{2k} \# w_{2k+1}^R, \]

or

\[ w_0 \# w_1^R \# w_2 \# w_3^R \# \cdots \# w_{2k-2} \# w_{2k-1}^R \# w_{2k}, \]

where $k \ge 0$, $w_0$ is a starting ID, $w_i \vdash w_{i+1}$ for all i with $0 \le i < 2k+1$ and $w_{2k+1}$ is a proper halting ID in the first case, $0 \le i < 2k$ and $w_{2k}$ is a proper halting ID in the second case.

Undecidable Properties Of Languages:

The language $L_M$ turns out to be the intersection of two context-free languages $L_M^0$ and $L_M^1$ defined as follows:

(1) The strings in $L_M^0$ are of the form
\[ w_0 \# w_1^R \# w_2 \# w_3^R \# \cdots \# w_{2k} \# w_{2k+1}^R \quad\text{or}\quad w_0 \# w_1^R \# w_2 \# w_3^R \# \cdots \# w_{2k-2} \# w_{2k-1}^R \# w_{2k}, \]
where $w_{2i} \vdash w_{2i+1}$ for all $i \ge 0$, and $w_{2k}$ is a proper halting ID in the second case.

(2) The strings in $L_M^1$ are of the form
\[ w_0 \# w_1^R \# w_2 \# w_3^R \# \cdots \# w_{2k} \# w_{2k+1}^R \quad\text{or}\quad w_0 \# w_1^R \# w_2 \# w_3^R \# \cdots \# w_{2k-2} \# w_{2k-1}^R \# w_{2k}, \]
where $w_{2i+1} \vdash w_{2i+2}$ for all $i \ge 0$, $w_0$ is a starting ID, and $w_{2k+1}$ is a proper halting ID in the first case.

Theorem 6.9.1 Given any Turing machine M, the languages $L_M^0$ and $L_M^1$ are context-free, and $L_M = L_M^0 \cap L_M^1$.

Proof. We can construct PDAs accepting $L_M^0$ and $L_M^1$. It is easily checked that $L_M = L_M^0 \cap L_M^1$.

As a corollary, we obtain the following undecidability result:

Theorem 6.9.2 It is undecidable for arbitrary context-free grammars $G_1$ and $G_2$ whether $L(G_1) \cap L(G_2) = \emptyset$.

Proof. We can reduce the problem of deciding whether a partial recursive function is undefined everywhere to the above problem. By Rice's theorem, the first problem is undecidable. However, this problem is equivalent to deciding whether a Turing machine never halts in a proper ID. By Theorem 6.9.1, the languages $L_M^0$ and $L_M^1$ are context-free. Thus, we can construct context-free grammars $G_1$ and $G_2$ so that $L_M^0 = L(G_1)$ and $L_M^1 = L(G_2)$. Then, M never halts in a proper ID iff $L_M = \emptyset$ iff (by Theorem 6.9.1) $L_M = L(G_1) \cap L(G_2) = \emptyset$.

Given a Turing machine M, the language $L_M$ is defined over the alphabet $\Delta = \Gamma \cup Q \cup \{\#\}$. The following fact is also useful to prove undecidability:

Theorem 6.9.3 Given any Turing machine M, the language $\Delta^* - L_M$ is context-free.

Proof. One can easily check that the conditions for not belonging to $L_M$ can be checked by a PDA.

As a corollary, we obtain:

Theorem 6.9.4 Given any context-free grammar, $G = (V, \Sigma, P, S)$, it is undecidable whether $L(G) = \Sigma^*$.

Proof. We can reduce the problem of deciding whether a Turing machine never halts in a proper ID to the above problem. Indeed, given M, by Theorem 6.9.3, the language $\Delta^* - L_M$ is context-free. Thus, there is a CFG, G, so that $L(G) = \Delta^* - L_M$. However, M never halts in a proper ID iff $L_M = \emptyset$ iff $L(G) = \Delta^*$.

As a consequence, we also obtain the following:

Theorem 6.9.5 Given any two context-free grammars, $G_1$ and $G_2$, and any regular language, R, the following facts hold:
(1) $L(G_1) = L(G_2)$ is undecidable.
(2) $L(G_1) \subseteq L(G_2)$ is undecidable.
(3) $L(G_1) = R$ is undecidable.
(4) $R \subseteq L(G_2)$ is undecidable.

In contrast to (4), the property $L(G_1) \subseteq R$ is decidable!

We conclude with a nice theorem of S. Greibach, which is a sort of version of Rice's theorem for families of languages.

Let $\mathcal{L}$ be a countable family of languages. We assume that there is a coding function $c : \mathcal{L} \to \mathbb{N}$ and that this function can be extended to code the regular languages (all alphabets are subsets of some given countably infinite set). We also assume that $\mathcal{L}$ is effectively closed under union and concatenation with regular languages. This means that given any two languages $L_1$ and $L_2$ in $\mathcal{L}$, we have $L_1 \cup L_2 \in \mathcal{L}$, and $c(L_1 \cup L_2)$ is given by a recursive function of $c(L_1)$ and $c(L_2)$, and that for every regular language R, we have $L_1 R \in \mathcal{L}$, $R L_1 \in \mathcal{L}$, and $c(RL_1)$ and $c(L_1R)$ are recursive functions of $c(R)$ and $c(L_1)$.

Given any language, $L \subseteq \Sigma^*$, and any string, $w \in \Sigma^*$, we define L/w by

\[ L/w = \{ u \in \Sigma^* \mid uw \in L \}. \]

Theorem 6.9.6 (Greibach) Let $\mathcal{L}$ be a countable family of languages that is effectively closed under union and concatenation with the regular languages, and assume that the problem $L = \Sigma^*$ is undecidable for $L \in \mathcal{L}$ and any given sufficiently large alphabet $\Sigma$. Let P be any nontrivial property of languages that is true for the regular languages and so that if P(L) holds then P(L/a) also holds for any letter a. Then, P is undecidable for $\mathcal{L}$.

Proof. Since P is nontrivial for $\mathcal{L}$, there is some $L_0 \in \mathcal{L}$ so that $P(L_0)$ is false. Let $\Sigma$ be large enough, so that $L_0 \subseteq \Sigma^*$, and the problem $L = \Sigma^*$ is undecidable for $L \in \mathcal{L}$.

We show that given any $L \in \mathcal{L}$, with $L \subseteq \Sigma^*$, we can construct a language $L_1 \in \mathcal{L}$, so that $L = \Sigma^*$ iff $P(L_1)$ holds. Thus, the problem $L = \Sigma^*$ for $L \in \mathcal{L}$ reduces to property P for $\mathcal{L}$, and since for $\Sigma$ big enough, the first problem is undecidable, so is the second.

For any $L \in \mathcal{L}$, with $L \subseteq \Sigma^*$, let

\[ L_1 = L_0 \# \Sigma^* \cup \Sigma^* \# L. \]

Since $\mathcal{L}$ is effectively closed under union and concatenation with the regular languages, we have $L_1 \in \mathcal{L}$.

If $L = \Sigma^*$, then $L_1 = \Sigma^* \# \Sigma^*$, a regular language, and thus, $P(L_1)$ holds, since P holds for the regular languages.

Conversely, we would like to prove that if $L \neq \Sigma^*$, then $P(L_1)$ is false. Since $L \neq \Sigma^*$, there is some $w \notin L$. But then, $L_1/\#w = L_0$. Since P is preserved under quotient by a single letter, by a trivial induction, if $P(L_1)$ holds, then $P(L_0)$ also holds. However, $P(L_0)$ is false, so $P(L_1)$ must be false. Thus, we proved that $L = \Sigma^*$ iff $P(L_1)$ holds, as claimed.

Greibach's theorem can be used to show that it is undecidable whether a context-free grammar generates a regular language. It can also be used to show that it is undecidable whether a context-free language is inherently ambiguous.

Unsolvability Of Ambiguity Problem Of CFGs:

AMBIGUITY

In this chapter, the notions of ambiguity, degree of ambiguity, and inherent ambiguity are discussed in some detail.

3.1 Degree of Ambiguity

A CFG G = (V, T, P, S) is ambiguous if there is at least one string w in L(G) for which there are at least two different parse trees, each with its root labeled S and yielding w [Hopcroft et al. 01]. Each parse tree corresponds to exactly one left-most derivation and exactly one right-most derivation. The number of different parse trees of a string w is called the degree of ambiguity of w [Kuich and Salomaa 85]. If no string produced by a grammar G has a degree of ambiguity more than x, the degree of ambiguity of G is x. It is possible to classify ambiguous grammars based on their degree of ambiguity. If the number of distinct parse trees for each string increases with the length of strings generated by a grammar, it is possible that the degree of ambiguity of that grammar is infinite.

EXAMPLE: Given the following ambiguous grammar,

E -> E+E
E -> E*E
E -> (E)
E -> a

there are two left-most derivations for a*a+a:

E ==> E+E ==> E*E+E ==> a*E+E ==> a*a+E ==> a*a+a
E ==> E*E ==> a*E ==> a*E+E ==> a*a+E ==> a*a+a

Hence, the degree of ambiguity of a*a+a is two.

3.2 Ambiguity Detection and Removal

The problem of deciding whether a given (context-free) grammar for a language is ambiguous is unsolvable [Hopcroft et al. 01]. In other words, there is no general algorithm that can tell us whether a CFG is ambiguous or not. Post's Correspondence Problem (PCP), which is known to be undecidable, is reducible to the problem of detecting ambiguity in a context-free grammar. Hence, the problem of context-free grammar ambiguity detection is also undecidable.

To handle the ambiguity arising from shift-reduce or reduce-reduce conflicts (see Section 2.1), disambiguating rules can be written [Johnson et al. 78]. Disambiguating rules attempt to remove specific known ambiguities. Disambiguating rules can assign priorities to rules (i.e., which rule to choose when a reduce-reduce conflict occurs) and to operations (i.e., whether to perform a shift or a reduce when a shift-reduce conflict occurs). However, there is no algorithm which, given an ambiguous CFG as input, can always produce an unambiguous context-free grammar as output that generates the same language [Ullian 69].

3.3 Inherent Ambiguity

If all grammars that generate a language are ambiguous, that language is said to be inherently ambiguous [Hopcroft et al. 01]. An ambiguous grammar does not necessarily generate an ambiguous language. In other words, for a language L to be unambiguous, at least one of the grammars that can generate it should be unambiguous. The problem of determining whether an arbitrary language is inherently ambiguous is recursively unsolvable [Ginsburg and Ullian 66].

In a language L that can be defined as the union of two other languages L1 and L2, all sentences in the intersection of the sets L1 and L2 have two different interpretations because they belong to both L1 and L2. This means that ambiguity is inherent in L, and it is not possible to disambiguate languages such as L. For example, the language $L = \{a^n b^n c^m d^m \mid n \ge 1, m \ge 1\} \cup \{a^n b^m c^m d^n \mid n \ge 1, m \ge 1\}$, with the language $\{a^n b^n c^n d^n \mid n \ge 1\}$ as $L_1 \cap L_2$, is inherently ambiguous.

Church's hypothesis:

Church-Turing thesis

In computability theory, the Church-Turing thesis (also known as the Turing-Church thesis, the Church-Turing conjecture, Church's thesis, Church's conjecture, and Turing's thesis) is a combined hypothesis ("thesis") about the nature of functions whose values are effectively calculable; or, in more modern terms, functions whose values are algorithmically computable. In simple terms, the Church-Turing thesis states that a function is algorithmically computable if and only if it is computable by a Turing machine.

Several attempts were made in the first half of the 20th century to formalize the notion of computability:
- American mathematician Alonzo Church created a method for defining functions called the $\lambda$-calculus,

- British mathematician Alan Turing created a theoretical model for a machine, now called a universal Turing machine, that could carry out calculations from inputs,
- Church, along with mathematician Stephen Kleene and logician J.B. Rosser, created a formal definition of a class of functions whose values could be calculated by recursion.

All three computational processes (recursion, the $\lambda$-calculus, and the Turing machine) were shown to be equivalent: all three approaches define the same class of functions. This has led mathematicians and computer scientists to believe that the concept of computability is accurately characterized by these three equivalent processes. Informally, the Church-Turing thesis states that if some method (algorithm) exists to carry out a calculation, then the same calculation can also be carried out by a Turing machine (as well as by a recursively definable function, and by a $\lambda$-function).

The Church-Turing thesis is a statement that characterizes the nature of computation and cannot be formally proven. Even though the three processes mentioned above proved to be equivalent, the fundamental premise behind the thesis (the notion of what it means for a function to be effectively calculable) is "a somewhat vague intuitive one". Thus, the "thesis" remains a hypothesis. Despite the fact that it has not been formally proven, the Church-Turing thesis now has near-universal acceptance.

Unit VI Introduction to recursive function theory- primitive recursive and partial recursive functions, Parsing top down and bottom up approach, derivation and reduction

Introduction to recursive function theory:

Definition 1. A (total) function is a binary relation $f \subseteq A \times B$ that is univocal, i.e.
\[ \forall a \in A.\ \forall b_1, b_2 \in B.\ (a f b_1 \wedge a f b_2 \Rightarrow b_1 = b_2), \]
and for which $\forall a \in A.\ \exists b \in B.\ a f b$ holds. The set A will be called the domain, the set B will be called the codomain. If the second requirement is dropped, the function is called partial. The function f will then be called undefined for those $a \in A$ such that $\neg \exists b \in B.\ a f b$.

Definition 2. A (total) function f is defined from g and h by primitive recursion if
\[ f(x, 0) = g(x), \qquad f(x, s(y)) = h(x, y, f(x, y)), \]
where n is the arity of f, n - 1 is the arity of g and n + 1 is the arity of h, and where x is an element of the domain of g.

Definition 3. A (total) function f is defined from g by $\mu$-recursion if
1. $(\forall x)(\exists y)(g(x, y) = 0)$,
2. $f(x) = \mu y\,(g(x, y) = 0)$,
where n is the arity of f, n + 1 is the arity of g, where x is an element of the domain of f and where $\mu y\,(g(x, y) = 0)$ is the least number y such that g(x, y) = 0. The first condition has to be dropped for partial functions.

Type Theories

A dependent function type (X : A)B (where B is a family of types depending on A, i.e. for each X : A, B[X] is a type) is the type of functions f such that for every X : A, f(X) : B[X], also written as $(\Pi X : A.\,B)$ or $\prod_{x:A} B$. In the case that the family of types B does not depend on A we get the normal (non-dependent) function type $A \to B$.

Type universes are sets of types (where a type is just a set of objects). A type universe will be called impredicative if it is closed under construction of dependent function types over arbitrary types. In other words, if the type B is in the type universe and if A is any type, then (X : A)B will still be in the type universe. If a type universe is not impredicative, it will be called predicative.
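Definition 3 above can be illustrated with a short Python sketch of the $\mu$ operator. The names mu and isqrt are hypothetical. If condition 1 of the definition fails for some x, the loop never terminates, which mirrors the fact that $\mu$-recursion yields partial functions in general.

```python
def mu(g):
    """Return f with f(x...) = the least y such that g(x..., y) == 0.

    The search is unbounded: if no such y exists, the call does not return.
    """
    def f(*x):
        y = 0
        while g(*x, y) != 0:
            y += 1
        return y
    return f


# Example: integer square root as the least y with (y + 1)^2 > x.
# g returns 0 exactly when the condition holds.
isqrt = mu(lambda x, y: 0 if (y + 1) ** 2 > x else 1)

print(isqrt(0), isqrt(10), isqrt(16))
```

Here a suitable y exists for every x, so isqrt is total; replacing g by a function that is never 0 would give a function that is undefined everywhere.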
The Calculus of Constructions

An impredicative type theory extending the simply typed $\lambda$-calculus by a general form of dependent function types. It uses $\beta$-reduction as its computational model; it is strongly normalizing, so all functions which are representable are total and computable. Type-checking is decidable.

Definition 4. The set of terms of the calculus of constructions is defined inductively as:

\[ X \mid M\,N \mid [X : A]M \mid (X : A)B \mid \mathrm{Prop} \mid \mathrm{Type} \]

Notice that X is bound in M and B respectively for [X : A]M and (X : A)B.

The Generalized Calculus of Constructions

Extends the calculus of constructions by a hierarchy of type universes, maintaining the impredicative type universe Prop and adding the hierarchy $\{\mathrm{Type}_i \mid i \in \mathbb{N}\}$ of predicative type universes. The universe hierarchy is governed by a universe containment relation $\in$, which is defined by

\[ \mathrm{Prop} \in \mathrm{Type}_0 \in \mathrm{Type}_1 \in \mathrm{Type}_2 \in \cdots \]

and induces a subtyping order on type universes:

\[ \mathrm{Prop} \preceq \mathrm{Type}_0 \preceq \mathrm{Type}_1 \preceq \mathrm{Type}_2 \preceq \cdots \]

Primitive Recursive And Partial Recursive Functions: The Primitive Recursive Functions The class of primitive recursive functions is defined in terms of base functions and closure operations. Definition 4.6.1 Let = {a1, . . . , aN}. The base functions over are the following functions: (1) The erase function E, defined such that E(w) = _, for all w ; (2) For every j, 1 j N, the j-successor function Sj , defined such that Sj(w) = waj , for all w ; (3) The projection functions Pn i , defined such that Pn i (w1, . . . , wn) = wi, for every n 1, every i, 1 i n, and for all w1, . . . , wn . Note that P1 1 is the identity function on . Projection functions can be used to permute the arguments of another function. A crucial closure operation is (extended) composition. Definition 1.Let = {a1, . . . , aN}. For any function g: _ ___ m , and any m functions hi: _ ___ n , the composition of g and the hi is the function

f: _ ___ n , denoted as g (h1, . . . , hm), such that f(w1, . . . , wn) = g(h1(w1, . . . , wn), . . . , hm(w1, . . . , wn)), for all w1, . . . , wn . As an example, f = g (P2 2 , P2 1 ) is such that f(w1, w2) = g(w2, w1). Another crucial closure operation is primitive recursion. Definition 2. Let = {a1, . . . , aN}. For any function g: _ ___ m1 , where m 2, and any N functions hi: _ ___ m+1 , the function f: _ ___ m , is defined by primitive recursion from g and h1, . . . , hN, if f(_, w2, . . . , wm) = g(w2, . . . , wm), f(ua1, w2, . . . , wm) = h1(u, f(u, w2, . . . , wm), w2, . . . , wm), ...=... f(uaN, w2, . . . , wm) = hN(u, f(u, w2, . . . , wm), w2, . . . , wm), for all u,w2, . . . , wm . When m = 1, for some fixed w , we have f(_) = w, f(ua1) = h1(u, f (u)), ...=... f(uaN) = hN(u, f (u)), for all u . For numerical functions (i.e., when = {a1}), the scheme of primitive recursion is simpler: f(0, x2, . . . , xm) = g(x2, . . . , xm), f(x + 1, x2, . . . , xm) = h1(x, f(x, x2, . . . , xm), x2, . . . , xm), for all x, x2, . . . , xm N. The successor function S is the function S(x) = x + 1. Addition, multiplication, exponentiation, and super-exponentiation can, be defined by primitive recursion as follows (being a bit loose, we should use some projections ...):

add(0, n) = n,
add(m + 1, n) = S(add(m, n)),
mult(0, n) = 0,
mult(m + 1, n) = add(mult(m, n), n),
rexp(0, n) = 1,
rexp(m + 1, n) = mult(rexp(m, n), n),
exp = rexp ∘ (P^2_2, P^2_1),
supexp(0, n) = 1,
supexp(m + 1, n) = exp(n, supexp(m, n)).

Here rexp(m, n) = n^m, so that exp(m, n) = rexp(n, m) = m^n.

As an example over {a, b}, the following function g: Σ* × Σ* → Σ* is defined by primitive recursion:
g(ε, v) = P^1_1(v),
g(u a_i, v) = (S_i ∘ P^3_2)(u, g(u, v), v),
where 1 ≤ i ≤ N. It is easily verified that g(u, v) = vu. Then, f = g ∘ (P^2_2, P^2_1) computes the concatenation function, i.e. f(u, v) = uv.

Definition 3. Let Σ = {a_1, ..., a_N}. The class of primitive recursive functions is the smallest class of functions (over Σ*) which contains the base functions and is closed under composition and primitive recursion.

We leave as an exercise to show that every primitive recursive function is a total function. The class of primitive recursive functions may not seem very big, but it contains all the total functions that we would ever want to compute. Although it is rather tedious to prove, the following theorem can be shown.

Theorem 4. For an alphabet Σ = {a_1, ..., a_N}, every primitive recursive function is Turing computable.

The best way to prove the above theorem is to use the computation model of RAM programs. Indeed, it was shown in Theorem 4.4.1 that every Turing machine can simulate a RAM program. It is also rather easy to show that the primitive recursive functions are RAM-computable.

In order to define new functions it is also useful to use predicates.

Definition 5. An n-ary predicate P (over Σ*) is any subset of (Σ*)^n. We write that a tuple (x_1, ..., x_n) satisfies P as (x_1, ..., x_n) ∈ P or as P(x_1, ..., x_n). The characteristic function of a predicate P is the function C_P: (Σ*)^n → {a_1}* defined by C_P(x_1, ..., x_n) =
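The numerical scheme of primitive recursion can be tried out directly. The following is a minimal sketch, not part of the original notes: the helper name prim_rec is hypothetical, Python integers stand in for the natural numbers, and the projections are left implicit, as in the loose definitions above.

```python
def prim_rec(g, h):
    """Build f from g and h by the numerical primitive recursion scheme:
    f(0, rest) = g(rest) and f(x + 1, rest) = h(x, f(x, rest), rest).
    (prim_rec is an illustrative name, not from the notes.)"""
    def f(x, *rest):
        acc = g(*rest)                # value of f(0, rest)
        for i in range(x):
            acc = h(i, acc, *rest)    # value of f(i + 1, rest)
        return acc
    return f


def successor(x):
    return x + 1


add = prim_rec(lambda n: n, lambda x, prev, n: successor(prev))
mult = prim_rec(lambda n: 0, lambda x, prev, n: add(prev, n))
rexp = prim_rec(lambda n: 1, lambda x, prev, n: mult(prev, n))   # rexp(m, n) = n^m


def exp(m, n):
    """exp = rexp composed with the swapped projections, so exp(m, n) = m^n."""
    return rexp(n, m)


supexp = prim_rec(lambda n: 1, lambda x, prev, n: exp(n, prev))

print(add(3, 4), mult(3, 4), exp(2, 3), supexp(3, 2))
```

Every function built this way is computed by a loop whose number of iterations is fixed before the loop starts, which is one way to see why primitive recursive functions are total.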

a_1 iff P(x_1, ..., x_n),
ε iff not P(x_1, ..., x_n).

A predicate P is primitive recursive iff its characteristic function C_P is primitive recursive. We leave to the reader the obvious adaptation of the notion of primitive recursive predicate to functions defined over N. In this case, 0 plays the role of ε and 1 plays the role of a_1.

It is easily shown that if P and Q are primitive recursive predicates (over (Σ*)^n), then P ∨ Q, P ∧ Q and ¬P are also primitive recursive. As an exercise, the reader may want to prove that the predicate (defined over N): prime(n) iff n is a prime number, is a primitive recursive predicate. For any fixed k ≥ 1, the function: ord(k, n) = exponent of the kth prime in the prime factorization of n, is a primitive recursive function.

We can also define functions by cases.

Lemma 6. If P_1, ..., P_n are pairwise disjoint primitive recursive predicates (which means that P_i ∩ P_j = ∅ for all i ≠ j) and f_1, ..., f_(n+1) are primitive recursive functions, the function g defined below is also primitive recursive:
g(x) = f_1(x) iff P_1(x),
...
g(x) = f_n(x) iff P_n(x),
g(x) = f_(n+1)(x) otherwise
(writing x for (x_1, ..., x_n)).

It is also useful to have bounded quantification and bounded minimization.

Definition 7. If P is an (n+1)-ary predicate, then the bounded existential predicate ∃y/x P(y, z) holds iff some prefix y of x makes P(y, z) true. The bounded universal predicate ∀y/x P(y, z) holds iff every prefix y of x makes P(y, z) true.

Lemma 8. If P is an (n + 1)-ary primitive recursive predicate, then ∃y/x P(y, z) and ∀y/x P(y, z) are also primitive recursive predicates.

As an application, we can show that the equality predicate, u = v?, is primitive recursive.

Definition 9. If P is an (n + 1)-ary predicate, then the bounded minimization of P, min y/x P(y, z), is the function defined such that min y/x P(y, z) is the shortest prefix y of x such that P(y, z) if such a y exists, x a_1 otherwise.
The bounded maximization of P, max y/x P(y, z), is the function defined such that max y/x P(y, z) is the longest prefix y of x such that P(y, z) if such a y exists, x a_1 otherwise.

Lemma 10. If P is an (n + 1)-ary primitive recursive predicate, then min y/x P(y, z) and max y/x P(y, z) are primitive recursive functions.

So far, the primitive recursive functions do not yield all the Turing-computable functions. In order to get a larger class of functions, we need the closure operation known as minimization.

The Partial Recursive Functions

The operation of minimization (sometimes called minimalization) is defined as follows.

Definition 11. Let Σ = {a_1, ..., a_N}. For any function g: (Σ*)^(m+1) → Σ*, where m ≥ 0, for every j, 1 ≤ j ≤ N, the function f: (Σ*)^m → Σ* is defined by minimization over {a_j}* from g, if the following conditions hold for all w_1, ..., w_m ∈ Σ*:
(1) f(w_1, ..., w_m) is defined iff there is some n ≥ 0 such that g(a_j^p, w_1, ..., w_m) is defined for all p, 0 ≤ p ≤ n, and g(a_j^n, w_1, ..., w_m) = ε.
(2) When f(w_1, ..., w_m) is defined, f(w_1, ..., w_m) = a_j^n, where n is such that g(a_j^n, w_1, ..., w_m) = ε and g(a_j^p, w_1, ..., w_m) ≠ ε for every p, 0 ≤ p ≤ n - 1.

We also write f(w_1, ..., w_m) = min_j u[g(u, w_1, ..., w_m) = ε].

Note: When f(w_1, ..., w_m) is defined, f(w_1, ..., w_m) = a_j^n, where n is the smallest integer such that condition (1) holds. It is very important to require that all the values g(a_j^p, w_1, ..., w_m) be defined for all p, 0 ≤ p ≤ n, when defining f(w_1, ..., w_m). Failure to do so allows noncomputable functions.

Minimization can be viewed as an abstract version of a while loop:
u := ε;
while g(u, w_1, ..., w_m) ≠ ε do
    u := u a_j;
endwhile
let f(w_1, ..., w_m) = u

Remark: Kleene used the μ-notation: f(w_1, ..., w_m) = μ_j u[g(u, w_1, ..., w_m) = ε], actually, its numerical form: f(x_1, ..., x_m) = μx[g(x, x_1, ..., x_m) = 0].

The class of partial computable functions is defined as follows.

Definition 12. Let Σ = {a_1, ..., a_N}. The class of partial recursive functions is the smallest class of functions (over Σ*) which contains the base functions and is closed under composition, primitive recursion, and minimization. The class of recursive functions is the subset of the class of partial recursive functions consisting of functions defined for every input.

One of the major results of computability theory is the following theorem.

Theorem. For an alphabet Σ = {a_1, ..., a_N}, every partial recursive function is Turing-computable. Conversely, every Turing-computable function is a partial recursive function. Similarly, the class of recursive functions is equal to the class of Turing-computable functions that halt in a proper ID for every input.

To prove that every partial recursive function is indeed Turing-computable, since by Theorem 4.4.1 every Turing machine can simulate a RAM program, the simplest thing to do is to show that every partial recursive function is RAM-computable. For the converse, one can show that given a Turing machine, there is a primitive recursive function describing how to go from one ID to the next. Then, minimization is used to guess whether a computation halts. The proof shows that every partial recursive function needs minimization at most once. The characterization of the recursive functions in terms of TMs follows easily.

There are recursive functions that are not primitive recursive. Such an example is given by Ackermann's function.
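The while-loop reading of minimization, in its numerical form μx[g(x, x_1, ..., x_m) = 0], can be sketched as follows. This is an illustration only: the names minimize and ceil_sqrt are hypothetical, and the example predicate is chosen so that the search always succeeds.

```python
def minimize(g):
    """Return f with f(args) = least x such that g(x, args) == 0.
    If no such x exists the loop never terminates, so f is only a
    partial function. (minimize is an illustrative name.)"""
    def f(*args):
        x = 0
        while g(x, *args) != 0:
            x += 1
        return x
    return f


# Least x with x * x >= n, i.e. the ceiling of the square root of n.
ceil_sqrt = minimize(lambda x, n: 0 if x * x >= n else 1)

print(ceil_sqrt(10))
print(ceil_sqrt(16))
```

Unlike the loops produced by primitive recursion, the number of iterations here is not known in advance, which is exactly what makes the resulting class of functions partial.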
Ackermann's function: A recursive function which is not primitive recursive:
A(0, y) = y + 1,
A(x + 1, 0) = A(x, 1),
A(x + 1, y + 1) = A(x, A(x + 1, y)).

It can be shown that:
A(0, x) = x + 1,
A(1, x) = x + 2,
A(2, x) = 2x + 3,
A(3, x) = 2^(x+3) - 3,
and A(4, x) = 2^2^...^2^16 - 3, where the tower has x twos stacked below the 16, with A(4, 0) = 16 - 3 = 13. For example A(4, 1) = 2^16 - 3, A(4, 2) = 2^(2^16) - 3.

Actually, it is not so obvious that A is a total function. This can be shown by induction, using the lexicographic ordering ≤ on N × N, which is defined as follows: (m, n) ≤ (m′, n′) iff either m = m′ and n = n′, or m < m′, or m = m′ and n < n′. We write (m, n) < (m′, n′) when (m, n) ≤ (m′, n′) and (m, n) ≠ (m′, n′).

We prove that A(m, n) is defined for all (m, n) ∈ N × N by complete induction over the lexicographic ordering on N × N.

In the base case, (m, n) = (0, 0), and since A(0, n) = n + 1, we have A(0, 0) = 1, and A(0, 0) is defined.

For (m, n) ≠ (0, 0), the induction hypothesis is that A(m′, n′) is defined for all (m′, n′) < (m, n). We need to conclude that A(m, n) is defined.
If m = 0, since A(0, n) = n + 1, A(0, n) is defined.
If m ≠ 0 and n = 0, since (m - 1, 1) < (m, 0), by the induction hypothesis, A(m - 1, 1) is defined, but A(m, 0) = A(m - 1, 1), and thus A(m, 0) is defined.
If m ≠ 0 and n ≠ 0, since (m, n - 1) < (m, n), by the induction hypothesis, A(m, n - 1) is defined. Since (m - 1, A(m, n - 1)) < (m, n), by the induction hypothesis, A(m - 1, A(m, n - 1)) is defined. But A(m, n) = A(m - 1, A(m, n - 1)), and thus A(m, n) is defined.

Thus, A(m, n) is defined for all (m, n) ∈ N × N. It is possible to show that A is a recursive function, although the quickest way to prove it requires some fancy machinery (the recursion theorem). Proving that A is not primitive recursive is harder.

We can also deal with languages.
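The three defining equations can be evaluated mechanically for small arguments. The sketch below is an illustration, not part of the original notes; it replaces the recursion with an explicit stack of pending first arguments (an implementation choice made here to avoid Python's recursion limit) and checks the closed forms quoted above.

```python
def ackermann(m, n):
    """Evaluate A(m, n) from the three defining equations, keeping the
    pending outer calls on an explicit stack instead of recursing."""
    stack = [m]
    while stack:
        m = stack.pop()
        if m == 0:
            n = n + 1              # A(0, y) = y + 1
        elif n == 0:
            stack.append(m - 1)    # A(x + 1, 0) = A(x, 1)
            n = 1
        else:
            stack.append(m - 1)    # outer call A(x, .) waits for the inner result
            stack.append(m)        # inner call A(x + 1, y)
            n = n - 1
    return n


for x in range(4):
    # Compare with x + 1, x + 2, 2x + 3 and 2^(x+3) - 3.
    print(ackermann(0, x), ackermann(1, x), ackermann(2, x), ackermann(3, x))
```

Only small inputs are practical: already A(4, 2) = 2^(2^16) - 3 is far beyond what this loop can reach.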

Recursively Enumerable Languages and Recursive Languages:

We define the recursively enumerable languages and the recursive languages. We assume that the TMs under consideration have a tape alphabet containing the special symbols 0 and 1.

Definition 12. Let Σ = {a_1, ..., a_N}. A language L ⊆ Σ* is recursively enumerable (for short, an r.e. set) iff there is some TM M such that for every w ∈ L, M halts in a proper ID with the output 1, and for every w ∉ L, either M halts in a proper ID with the output 0, or it runs forever. A language L ⊆ Σ* is recursive iff there is some TM M such that for every w ∈ L, M halts in a proper ID with the output 1, and for every w ∉ L, M halts in a proper ID with the output 0.

Thus, given a recursively enumerable language L, for some w ∉ L, it is possible that a TM accepting L runs forever on input w. On the other hand, for a recursive language L, a TM accepting L always halts in a proper ID.

When dealing with languages, it is often useful to consider nondeterministic Turing machines. Such machines are defined just like deterministic Turing machines, except that their transition function is just a (finite) set of quintuples in K × Γ × Γ × {L, R} × K, with no particular extra condition. It can be shown that every nondeterministic Turing machine can be simulated by a deterministic Turing machine, and thus, nondeterministic Turing machines also accept the class of r.e. sets.

It can be shown that a nonempty recursively enumerable language is the range of some recursive function. It can also be shown that a language L is recursive iff both L and its complement are recursively enumerable. There are recursively enumerable languages that are not recursive.

Turing machines were invented by Turing around 1935. The primitive recursive functions were known to Hilbert circa 1890. Gödel formalized their definition in 1929. The partial recursive functions were defined by Kleene around 1934. Church also introduced the λ-calculus as a model of computation around 1934.
Other models: Post systems, Markov systems. The equivalence of the various models of computation was shown around 1935/36. RAM programs were only defined around 1963.

A further study of the partial recursive functions requires the notions of pairing functions and of universal functions (or universal Turing machines). First, we prove the following lemma showing that restricting ourselves to total functions is too limiting.

Let F be any set of total functions that contains the base functions and is closed under composition and primitive recursion (and thus, F contains all the primitive recursive functions). We say that a function f: Σ* × Σ* → Σ* is universal for the one-argument functions in F iff for every function g: Σ* → Σ* in F, there is some n ∈ N such that f(a_1^n, u) = g(u) for all u ∈ Σ*.

Lemma 4.8.2 For any countable set F of total functions containing the base functions and closed under composition and primitive recursion, if f is a universal function for the functions g: Σ* → Σ* in F, then f ∉ F.

Proof. Assume that the universal function f is in F. Let g be the function such that g(u) = f(a_1^|u|, u) a_1 for all u ∈ Σ*. We claim that g ∈ F. It is enough to prove that the function h such that h(u) = a_1^|u| is primitive recursive, which is easily shown. Then, because f is universal, there is some m such that g(u) = f(a_1^m, u) for all u ∈ Σ*. Letting u = a_1^m, we get g(a_1^m) = f(a_1^m, a_1^m) = f(a_1^m, a_1^m) a_1, a contradiction.

Thus, either a universal function for F is partial, or it is not in F.

Parsing top down and bottom up approach:

Top-Down Parsing
A parser is top-down if it discovers a parse tree top to bottom.
A top-down parse corresponds to a preorder traversal of the parse tree.
Top-down parsers come in two forms:
    Predictive parsers, which choose productions using lookahead tokens. These will be our focus.
    Backtracking parsers, which may run in exponential time. These will not be considered.
The following top-down parsing techniques will be studied: recursive-descent parsing and LL parsing.

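Example 1:
S → A c B
A → a A | ε
B → b B S | ε

Example 2:
E → T Q
Q → + T Q | - T Q | ε
T → F R
R → * F R | / F R | ε
F → ( E ) | id

First and Follow sets for Example 1 (the nullable nonterminals are A and B):
First(S) = First(AcB) = (First(A) - {ε}) ∪ First(cB) = {a, c}
First(A) = First(aA) ∪ First(ε) = {a, ε}
First(B) = First(bBS) ∪ First(ε) = {b, ε}
Follow(S) = {$} ∪ Follow(B)
Follow(A) = {c}
Follow(B) = Follow(S) ∪ First(S) = {$, a, c}
Follow(S) = {$, a, c}

First and Follow sets for Example 2 (the nullable nonterminals are Q and R):
First(E) = First(TQ) = First(T) = First(FR) = First(F) = {(, id}
First(Q) = {+, -, ε}
First(R) = {*, /, ε}
Follow(E) = {$, )}
Follow(Q) = Follow(E) = {$, )}
Follow(T) = (First(Q) - {ε}) ∪ Follow(E) ∪ Follow(Q) = {+, -, $, )}
Follow(R) = Follow(T) = {+, -, $, )}
Follow(F) = (First(R) - {ε}) ∪ Follow(T) ∪ Follow(R) = {*, /, +, -, $, )}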
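The First and Follow sets of Example 2 can be checked with a small fixed-point computation. This is a sketch under assumptions: the dictionary encoding of the grammar and the function names are choices made here for illustration, with the empty tuple standing for an ε-production.

```python
EPS, END = "ε", "$"

# Example 2; each right-hand side is a tuple of symbols, () is an ε-production.
GRAMMAR = {
    "E": [("T", "Q")],
    "Q": [("+", "T", "Q"), ("-", "T", "Q"), ()],
    "T": [("F", "R")],
    "R": [("*", "F", "R"), ("/", "F", "R"), ()],
    "F": [("(", "E", ")"), ("id",)],
}


def first_of_seq(seq, first):
    """First set of a sequence of grammar symbols."""
    out = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})   # a terminal's First set is itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)                            # every symbol in seq was nullable
    return out


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                          # iterate until nothing new is added
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                new = first_of_seq(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, prods in grammar.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:  # only nonterminals have Follow sets
                        continue
                    tail = first_of_seq(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:         # the rest of the rule can vanish
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, "E", FIRST)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```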

Example on Recursive-Descent Parsing

Consider the following grammar for expressions in EBNF notation:
expr → term { addop term }
term → factor { mulop factor }
factor → ( expr ) | id | num

Each repetition { ... } is implemented as a while loop:

procedure expr( )
begin
    term( );
    while token = ADDOP do
        match(ADDOP);
        term( );
    end while;
end expr;

procedure term( )
begin
    factor( );
    while token = MULOP do
        match(MULOP);
        factor( );
    end while;
end term;

procedure factor( )
begin
    case token of
        '(': match('('); expr( ); match(')');
        ID: match(ID);
        NUM: match(NUM);
        else syntax_error(token);
    end case;
end factor;

Syntax Tree Construction for Expressions

A recursive-descent parser can be used to construct a syntax tree:
SyntaxTree := expr( );
The parser function for the start symbol is called as given above. The function new node allocates a tree node and returns a pointer to it.

function expr( ) : TreePtr
begin
    left := term( );
    while token = ADDOP do
        op := ADDOP.op;
        match(ADDOP);
        right := term( );
        left := new node(op, left, right);
    end while;
    return left;
end expr;

function term( ) : TreePtr
begin
    left := factor( );
    while token = MULOP do
        op := MULOP.op;
        match(MULOP);
        right := factor( );
        left := new node(op, left, right);
    end while;
    return left;
end term;

For a factor, we have the following parsing function. symtable.lookup(ID.name) searches a symbol table for a given name; the lookup function returns a pointer to an identifier symbol in symtable. NUM.ptr is a pointer to a literal symbol in the literal table.

function factor( ) : TreePtr
begin
    case token of
        '(': match('('); ptr := expr( ); match(')');
        ID: ptr := symtable.lookup(ID.name); match(ID);
        NUM: ptr := NUM.ptr; match(NUM);
        else syntax_error(token, "Expecting a number, an identifier, or (");
    end case;
    return ptr;
end factor;

Node Structure for Expression Trees
Node operator: +, -, *, /, etc. (different for each operator); ID and NUM for identifiers and literals
Left Pointer: pointer to left child expression tree
Right Pointer: pointer to right child expression tree
Line, Pos: keeps track of line and position of each tree node
Type: associates a type with each tree node
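The pseudocode above translates almost line for line into a runnable parser. The following is a minimal sketch under assumptions: the tokenizer, the Parser class, and the use of nested tuples (op, left, right) in place of allocated tree nodes are choices made here for illustration, and identifiers are returned as plain strings rather than symbol-table pointers.

```python
import re


def tokenize(text):
    """Split text into (kind, value) pairs; kinds mirror the pseudocode tokens."""
    tokens = []
    for lexeme in re.findall(r"\d+|[A-Za-z_]\w*|\S", text):
        if lexeme.isdigit():
            tokens.append(("NUM", int(lexeme)))
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append(("ID", lexeme))
        elif lexeme in ("+", "-"):
            tokens.append(("ADDOP", lexeme))
        elif lexeme in ("*", "/"):
            tokens.append(("MULOP", lexeme))
        elif lexeme in ("(", ")"):
            tokens.append((lexeme, lexeme))
        else:
            raise SyntaxError(f"unexpected character {lexeme!r}")
    tokens.append(("$", None))          # end-of-input marker
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    @property
    def token(self):
        return self.tokens[self.pos]

    def match(self, kind):
        """Consume the current token if it has the expected kind."""
        if self.token[0] != kind:
            raise SyntaxError(f"expected {kind}, found {self.token[0]}")
        value = self.token[1]
        self.pos += 1
        return value

    def expr(self):                     # expr -> term { addop term }
        left = self.term()
        while self.token[0] == "ADDOP":
            op = self.match("ADDOP")
            right = self.term()
            left = (op, left, right)
        return left

    def term(self):                     # term -> factor { mulop factor }
        left = self.factor()
        while self.token[0] == "MULOP":
            op = self.match("MULOP")
            right = self.factor()
            left = (op, left, right)
        return left

    def factor(self):                   # factor -> ( expr ) | id | num
        kind = self.token[0]
        if kind == "(":
            self.match("(")
            node = self.expr()
            self.match(")")
            return node
        if kind in ("ID", "NUM"):
            return self.match(kind)
        raise SyntaxError("expecting a number, an identifier, or (")


def parse(text):
    parser = Parser(text)
    tree = parser.expr()
    parser.match("$")                   # reject trailing input
    return tree


print(parse("a + b * 2"))
```

Because each while loop folds the new operand into left, operators of equal precedence associate to the left, so 1 - 2 - 3 is parsed as (1 - 2) - 3.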

LL(k) parsing means that k tokens of lookahead are used.

An LL parser consists of:
Parser stack that holds grammar symbols: non-terminals and tokens
Parsing table that specifies the parser action
Driver function that interacts with parser stack, parsing table and scanner

LL Parsing Actions
Match: to match top of parser stack with next input token
Predict: to predict a production and apply it in a derivation step
Accept: to accept and terminate the parsing of a sequence of tokens
Error: to report an error message when matching or prediction fails

Example: parsing ( ( ) ) $ with the grammar S → ( S ) S | ε

Parser Stack     Input        Parser Action
S                ( ( ) ) $    Predict S → ( S ) S
( S ) S          ( ( ) ) $    Match (
S ) S            ( ) ) $      Predict S → ( S ) S
( S ) S ) S      ( ) ) $      Match (
S ) S ) S        ) ) $        Predict S → ε
) S ) S          ) ) $        Match )
S ) S            ) $          Predict S → ε
) S              ) $          Match )
S                $            Predict S → ε
Empty            $            Accept

Grammar Analysis: Determining the Predict Set
For a production A → α:
If α is NOT nullable then Predict(A → α) = First(α)
If α is nullable then Predict(A → α) = (First(α) - {ε}) ∪ Follow(A)

For the expression grammar
E → T Q
Q → + T Q
Q → - T Q
Q → ε
T → F R
R → * F R
R → / F R
R → ε
F → ( E )
F → id
the predict sets are:
Predict(E → T Q) = First(TQ) = First(T) = {(, id}
Predict(Q → + T Q) = First(+TQ) = {+}
Predict(Q → - T Q) = First(-TQ) = {-}
Predict(Q → ε) = Follow(Q) = {$, )}
Predict(T → F R) = First(FR) = First(F) = {(, id}
Predict(R → * F R) = First(*FR) = {*}
Predict(R → / F R) = First(/FR) = {/}
Predict(R → ε) = Follow(R) = {+, -, $, )}
Predict(F → ( E )) = {(}
Predict(F → id) = {id}

Design of LL(1) Parser:

LL(1) Grammars
Not all context-free grammars are suitable for LL parsing.
A grammar is LL(1) if, for every nonterminal A with productions A → α_1 | α_2 | ... | α_n, Predict(A → α_i) ∩ Predict(A → α_j) = ∅ for all i ≠ j. That is, the predict sets of productions with the same LHS are pairwise disjoint.

Example:
S → A c B
A → a A        Predict(A → a A) = {a}
A → ε          Predict(A → ε) = Follow(A) = {c}
B → b B S      Predict(B → b B S) = {b}
B → ε          Predict(B → ε) = Follow(B) = {$, a, c}
The predict sets for A are disjoint, and so are the predict sets for B, so this grammar is LL(1).

Constructing the LL(1) Parsing Table
The predict sets can be represented in an LL(1) parse table.
If A is a nonterminal and tok is the lookahead token then Table[A][tok] indicates which production to predict.
If no production can be used, Table[A][tok] gives an error value.
Table[A][tok] = A → α iff tok ∈ Predict(A → α)

1: S → A c B      Predict(1) = {a, c}
2: A → a A        Predict(2) = {a}
3: A → ε          Predict(3) = {c}
4: B → b B S      Predict(4) = {b}
5: B → ε          Predict(5) = {$, a, c}

        a    b    c    $
S       1         1
A       2         3
B       5    4    5    5

Empty slots indicate error conditions.

Here is a second example on constructing the parsing table. Empty entries correspond to error conditions.

1: E → T Q        Predict(1) = {(, id}
2: Q → + T Q      Predict(2) = {+}
3: Q → - T Q      Predict(3) = {-}
4: Q → ε          Predict(4) = {$, )}
5: T → F R        Predict(5) = {(, id}
6: R → * F R      Predict(6) = {*}
7: R → / F R      Predict(7) = {/}
8: R → ε          Predict(8) = {+, -, $, )}
9: F → ( E )      Predict(9) = {(}
10: F → id        Predict(10) = {id}
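The rule Table[A][tok] = A → α iff tok ∈ Predict(A → α) is easy to mechanize, and doing so also checks the LL(1) condition. The sketch below uses the first example grammar; the dictionary layout and the name build_table are assumptions made for illustration.

```python
# Productions of the first example, numbered as in the text; () is ε.
PRODUCTIONS = {
    1: ("S", ("A", "c", "B")),
    2: ("A", ("a", "A")),
    3: ("A", ()),
    4: ("B", ("b", "B", "S")),
    5: ("B", ()),
}
PREDICT = {
    1: {"a", "c"},
    2: {"a"},
    3: {"c"},
    4: {"b"},
    5: {"$", "a", "c"},
}


def build_table(productions, predict):
    """Map (nonterminal, lookahead) to a production number.
    A slot claimed twice means two predict sets overlap, so the
    grammar is not LL(1). (build_table is an illustrative name.)"""
    table = {}
    for number, (lhs, _rhs) in productions.items():
        for tok in predict[number]:
            key = (lhs, tok)
            if key in table:
                raise ValueError(f"not LL(1): conflict at {key}")
            table[key] = number
    return table


TABLE = build_table(PRODUCTIONS, PREDICT)
for key in sorted(TABLE):
    print(key, TABLE[key])
```

Missing keys play the role of the empty slots, that is, error entries.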

LL(1) Parser Driver Algorithm

The LL(1) parser driver algorithm can be described as follows:

Token := scan( )
Stack.push(StartSymbol)
while not Stack.empty( ) do
    X := Stack.pop( )
    if terminal(X) then
        if X = Token then Token := scan( )
        else process a syntax error at Token
        end if
    else (* X is a nonterminal *)
        Rule := Table[X][Token]
        if Rule = X → Y1 Y2 ... Yn then
            for i from n downto 1 do Stack.push(Yi) end for
        else process a syntax error at Token
        end if
    end if
end while
if Token = $ then accept parsing
else report a syntax error at Token
end if

Consider the parsing of id * (id + id) $

Parser Stack        Input            Parser Action
E                   id*(id+id)$      Predict E → T Q
T Q                 id*(id+id)$      Predict T → F R
F R Q               id*(id+id)$      Predict F → id
id R Q              id*(id+id)$      Match id
R Q                 *(id+id)$        Predict R → * F R
* F R Q             *(id+id)$        Match *
F R Q               (id+id)$         Predict F → ( E )
( E ) R Q           (id+id)$         Match (
E ) R Q             id+id)$          Predict E → T Q
T Q ) R Q           id+id)$          Predict T → F R
F R Q ) R Q         id+id)$          Predict F → id
id R Q ) R Q        id+id)$          Match id
R Q ) R Q           +id)$            Predict R → ε
Q ) R Q             +id)$            Predict Q → + T Q
+ T Q ) R Q         +id)$            Match +
T Q ) R Q           id)$             Predict T → F R
F R Q ) R Q         id)$             Predict F → id
id R Q ) R Q        id)$             Match id
R Q ) R Q           )$               Predict R → ε
Q ) R Q             )$               Predict Q → ε
) R Q               )$               Match )
R Q                 $                Predict R → ε
Q                   $                Predict Q → ε
Empty               $                Accept

The stack is written with its top at the left, so it grows backwards from right to left.

The parsing table used in this trace is:

        +    -    *    /    (    )    id   $
E                           1         1
Q       2    3                   4         4
T                           5         5
R       8    8    6    7         8         8
F                           9         10

1: E → T Q
2: Q → + T Q
3: Q → - T Q
4: Q → ε
5: T → F R
6: R → * F R
7: R → / F R
8: R → ε
9: F → ( E )
10: F → id
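The driver algorithm can be exercised on the expression grammar with the table above. The following is a minimal sketch, assuming the input is already a list of token names; the function name ll1_parse and the returned list of predicted production numbers are choices made here for illustration.

```python
# Productions 1 to 10 of the expression grammar; [] is an ε right-hand side.
PRODUCTIONS = {
    1: ("E", ["T", "Q"]),
    2: ("Q", ["+", "T", "Q"]),
    3: ("Q", ["-", "T", "Q"]),
    4: ("Q", []),
    5: ("T", ["F", "R"]),
    6: ("R", ["*", "F", "R"]),
    7: ("R", ["/", "F", "R"]),
    8: ("R", []),
    9: ("F", ["(", "E", ")"]),
    10: ("F", ["id"]),
}

# LL(1) parsing table: nonterminal -> lookahead -> production number.
TABLE = {
    "E": {"(": 1, "id": 1},
    "Q": {"+": 2, "-": 3, ")": 4, "$": 4},
    "T": {"(": 5, "id": 5},
    "R": {"*": 6, "/": 7, "+": 8, "-": 8, ")": 8, "$": 8},
    "F": {"(": 9, "id": 10},
}


def ll1_parse(tokens, start="E"):
    """Run the driver loop and return the production numbers predicted,
    in order; raise SyntaxError on a failed match or an empty table slot."""
    stream = list(tokens) + ["$"]
    pos = 0
    stack = [start]
    predicted = []
    while stack:
        x = stack.pop()
        tok = stream[pos]
        if x not in TABLE:                      # x is a terminal: Match
            if x != tok:
                raise SyntaxError(f"expected {x}, found {tok}")
            pos += 1
        else:                                   # x is a nonterminal: Predict
            rule = TABLE[x].get(tok)
            if rule is None:
                raise SyntaxError(f"unexpected {tok} while expanding {x}")
            predicted.append(rule)
            # Push the right-hand side in reverse so its first symbol is on top.
            stack.extend(reversed(PRODUCTIONS[rule][1]))
    if stream[pos] != "$":
        raise SyntaxError(f"unexpected {stream[pos]} after end of expression")
    return predicted


print(ll1_parse(["id", "*", "(", "id", "+", "id", ")"]))
```

The printed sequence of production numbers corresponds to the Predict steps of the trace, read top to bottom.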

Bottom Up Parsing Technique:

Bottom-up parsing

As the name suggests, bottom-up parsing works in the opposite direction from top-down. A top-down parser begins with the start symbol at the top of the parse tree and works downward, driving productions in forward order until it gets to the terminal leaves. A bottom-up parse starts with the string of terminals itself and builds from the leaves upward, working backwards to the start symbol by applying the productions in reverse. Along the way, a bottom-up parser searches for substrings of the working string that match the right side of some production. When it finds such a substring, it reduces it, i.e., substitutes the left side nonterminal for the matching right side. The goal is to reduce all the way up to the start symbol and report a successful parse.

In general, bottom-up parsing algorithms are more powerful than top-down methods, but not surprisingly, the constructions required are also more complex. It is difficult to write a bottom-up parser by hand for anything but trivial grammars, but fortunately, there are excellent parser generator tools like yacc that build a parser from an input specification, not unlike the way lex builds a scanner to your spec.

Shift-reduce parsing is the most commonly used and the most powerful of the bottom-up techniques. It takes as input a stream of tokens and develops the list of productions used to build the parse tree, but the productions are discovered in reverse order of a top-down parser. Like a table-driven predictive parser, a bottom-up parser makes use of a stack to keep track of the position in the parse and a parsing table to determine what to do next.

To illustrate stack-based shift-reduce parsing, consider this simplified expression grammar:
S → E
E → T | E + T
T → id | (E)

The shift-reduce strategy divides the string that we are trying to parse into two parts: an undigested part and a semi-digested part. The undigested part contains the tokens that are still to come in the input, and the semi-digested part is put on a stack. If parsing the string v, it starts out completely undigested, so the input is initialized to v, and the stack is initialized to empty. A shift-reduce parser proceeds by taking one of three actions at each step:

Reduce: If we can find a rule A → w, and if the contents of the stack are qw for some q (q may be empty), then we can reduce the stack to qA. We are applying the production for the nonterminal A backwards. For example, using the grammar above, if the stack contained (id we can use the rule T → id to reduce the stack to (T. There is also one special case: reducing the entire contents of the stack to the start symbol with no remaining input means we have recognized the input as a valid sentence (e.g., the stack contains just w, the input is empty, and we apply S → w). This is the last step in a successful parse. The w being reduced is referred to as a handle. Formally, a handle of a right sentential form u is a production A → w, and a position within u where the string w may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of u. Recognizing valid handles is the difficult part of shift-reduce parsing.

Shift: If it is impossible to perform a reduction and there are tokens remaining in the undigested input, then we transfer a token from the input onto the stack. This is called a shift. For example, using the grammar above, suppose the stack contained ( and the input contained id+id). It is impossible to perform a reduction on ( since it does not match the entire right side of any of our productions. So, we shift the first token of the input onto the stack, giving us (id on the stack and +id) remaining in the input.

Error: If neither of the two above cases apply, we have an error. If the sequence on the stack does not match the right-hand side of any production, we cannot reduce. And if shifting the next input token would create a sequence on the stack that cannot eventually be reduced to the start symbol, a shift action would be futile. Thus, we have hit a dead end where the next token conclusively determines the input cannot form a valid sentence. This would happen in the above grammar on the input id+). The first id would be shifted, then reduced to T and again to E, next + is shifted. At this point, the stack contains E+ and the next input token is ). The sequence on the stack cannot be reduced, and shifting the ) would create a sequence that is not viable, so we have an error.

The general idea is to read tokens from the input and push them onto the stack attempting to build sequences that we recognize as the right side of a production. When we find a match, we replace that sequence with the nonterminal from the left side and continue working our way up the parse tree. This process builds the parse tree from the leaves upward, the inverse of the top-down parser. If all goes well, we will end up moving everything from the input to the stack and eventually construct a sequence on the stack that we recognize as a right-hand side for the start symbol.

Let's trace the operation of a shift-reduce parser in terms of its actions (shift or reduce) and its data structure (a stack). The chart below traces a parse of (id+id) using the previous example grammar:
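The shift and reduce actions can be tried out with a deliberately naive sketch. It is not the table-driven method developed later, and its step numbering need not match any chart in the notes: it assumes that, for this particular grammar, reducing greedily whenever the top of the stack matches a right side (trying E → E + T before E → T) always picks a valid handle, which does not hold for grammars in general. The names RULES and shift_reduce are hypothetical, and errors are only detected once the input is exhausted rather than at the earliest possible token.

```python
# Rules are tried in this order; listing E -> E + T before E -> T keeps
# the parser from reducing the T in "E + T" on its own.
RULES = [
    ("E", ["E", "+", "T"]),
    ("T", ["(", "E", ")"]),
    ("T", ["id"]),
    ("E", ["T"]),
]


def shift_reduce(tokens):
    """Return (accepted, trace) for a list of token names."""
    stack, rest, trace = [], list(tokens), []
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:        # right side is on top of the stack
                del stack[-len(rhs):]
                stack.append(lhs)
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if not rest:
                break                           # nothing to reduce, nothing to shift
            stack.append(rest.pop(0))
            trace.append(f"shift {stack[-1]}")
    accepted = stack == ["E"]
    if accepted:
        trace.append("reduce S -> E (accept)")
    return accepted, trace


ok, steps = shift_reduce(["(", "id", "+", "id", ")"])
for step in steps:
    print(step)
```

On the erroneous input id+) the sketch ends with E + ) on the stack and reports failure.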

In the above parse on step 7, we ignored the possibility of reducing E → T because that would have created the sequence (E + E on the stack which is not a viable prefix of a right sentential form. Formally, viable prefixes are the set of prefixes of right sentential forms that can appear on the stack of a shift-reduce parser, i.e. prefixes of right sentential forms that do not extend past the end of the rightmost handle. Basically, a shift-reduce parser will only create sequences on the stack that can lead to an eventual reduction to the start symbol. Because there is no right-hand side that matches the sequence (E + E and no possible reduction that transforms it to such, this is a dead end and is not considered. Later, we will see how the parser can determine which reductions are valid in a particular situation.

As they were for top-down parsers, ambiguous grammars are problematic for bottom-up parsers because these grammars could yield more than one handle under some circumstances. These types of grammars create either shift-reduce or reduce-reduce conflicts. The former refers to a state where the parser cannot decide whether to shift or reduce. The latter refers to a state where the parser has more than one choice of production for reduction. An example of a shift-reduce conflict occurs with the if-then-else construct in programming languages. A typical production might be:
S → if E then S | if E then S else S

Consider what would happen to a shift-reduce parser deriving this string:
if E then if E then S else S
At some point the parser's stack would have:
if E then if E then S
with else as the next token. It could reduce because the contents of the stack match the right-hand side of the first production or shift the else trying to build the right-hand side of the second production. Reducing would close off the inner if and thus associate the else with the outer if. Shifting would continue building and later reduce the inner if with the else. Either is syntactically valid given the grammar, but two different parse trees result, showing the ambiguity. This quandary is commonly referred to as the dangling else. Does an else appearing within a nested if statement belong to the inner or the outer? The C and Java languages agree that an else is associated with its nearest unclosed if. Other languages, such as Ada and Modula, avoid the ambiguity by requiring a closing endif delimiter.

Reduce-reduce conflicts are rare and usually indicate a problem in the grammar definition.

Now that we have a general idea of how a shift-reduce parser operates, we will look at how it recognizes a handle, and how it decides which production to use in a reduction. To deal with these two issues, we will look at a specific shift-reduce implementation called LR parsing.

LR Parsing

LR parsers ("L" for left to right scan of input, "R" for rightmost derivation) are efficient, table-driven shift-reduce parsers. The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive

LL parsers. In fact, virtually all programming language constructs for which CFGs can be written can be parsed with LR techniques. As an added advantage, there is no need for lots of grammar rearrangement to make it acceptable for LR parsing the way that LL parsing requires. The primary disadvantage is the amount of work it takes to build the tables by hand, which makes it infeasible to hand-code an LR parser for most grammars. Fortunately, there are LR parser generators that create the parser from an unambiguous CFG specification. The parser tool does all the tedious and complex work to build the necessary tables and can report any ambiguities or language constructs that interfere with the ability to parse it using LR techniques.

We begin by tracing how an LR parser works. Determining the handle to reduce in a sentential form depends on the sequence of tokens on the stack, not only the topmost ones that are to be reduced, but the context at which we are in the parse. Rather than reading and shifting tokens onto a stack, an LR parser pushes "states" onto the stack; these states describe what is on the stack so far. Think of each state as encoding the current left context. The state on top of the stack, possibly augmented by peeking at a lookahead token, enables us to figure out whether we have a handle to reduce, or whether we need to shift a new state on top of the stack for the next input token.

An LR parser uses two tables:
1. The action table Action[s,a] tells the parser what to do when the state on top of the stack is s and terminal a is the next input token. The possible actions are to shift a state onto the stack, to reduce the handle on top of the stack, to accept the input, or to report an error.
2. The goto table Goto[s,X] indicates the new state to place on top of the stack after a reduction of the nonterminal X while state s is on top of the stack.

The two tables are usually combined, with the action table specifying entries for terminals, and the goto table specifying entries for nonterminals.

LR Parser Tracing

We start with the initial state s0 on the stack. The next input token is the terminal a and the current state is st. The action of the parser is as follows:
If Action[st,a] is shift, we push the specified state onto the stack. We then call yylex() to get the next token a from the input.
If Action[st,a] is reduce Y → X1...Xk then we pop k states off the stack (one for each symbol in the right side of the production) leaving state su on top. Goto[su,Y] gives a new state sv to push on the stack. The input token is still a (i.e., the input remains unchanged).
If Action[st,a] is accept then the parse is successful and we are done.
If Action[st,a] is error (the table location is blank) then we have a syntax error. With the current top of stack and next input we can never arrive at a sentential form with a handle to reduce.

As an example, consider the following simplified expression grammar. The productions have been sequentially numbered so we can refer to them in the action table:
1) E → E + T
2) E → T
3) T → (E)
4) T → id

Here is the combined action and goto table. In the action columns, sN means shift the state numbered N onto the stack, and rN means reduce using the production numbered N. The goto column entries are the number of the new state to push onto the stack after reducing the specified nonterminal. This is an LR(0) table (more details on table construction will come in a minute).

Derivation And Reduction:

In computer science, LR parsers are a type of bottom-up parsers that efficiently handle deterministic context-free languages in guaranteed linear time.[1] The LALR parsers and the SLR parsers are common variants of LR parsers. LR parsers are often mechanically generated from a formal grammar for the language by a parser generator tool. They are very widely used for the processing of computer languages, more than other kinds of generated parsers.

The name LR is an acronym. The L means that the parser reads input text in one direction without backing up; that direction is typically Left to right within each line, and top to bottom across the lines of the full input file. (This is true for most parsers.) The R means that the parser produces a reversed Rightmost derivation; it does a bottom-up parse, not a top-down LL parse or ad-hoc parse. The name LR is often followed by a numeric qualifier, as in LR(1) or sometimes LR(k). To avoid backtracking or guessing, the LR parser is allowed to peek ahead at k lookahead input symbols before deciding how to parse earlier symbols. Typically k is 1 and is not mentioned. The name LR is often preceded by other qualifiers, as in SLR and LALR. LR parsers are deterministic; they produce a single correct parse without guesswork or backtracking, in linear time. This is ideal for computer languages. But LR parsers are not suited for human languages which need more flexible but slower methods. Other parser methods that backtrack or yield multiple parses may take O(n^2) or O(n^3) time when they guess badly. The above properties of L, R, and k are actually shared by all shift-reduce parsers, including precedence parsers. But by convention, the LR name stands for the form of parsing invented by Donald Knuth, and excludes the earlier, less powerful precedence methods.[1] LR parsers can handle a larger range of languages and grammars than precedence parsers or top-down LL parsing.[2] This is because the LR parser waits until it has seen an entire instance of some grammar pattern before committing to what it has found. An LL parser has to decide or guess what it is seeing much sooner, when it has only seen the leftmost input symbol of that pattern. LR is also better at error reporting. It detects syntax errors as early in the input stream as possible.
When using an LR parser within some larger program, you can usually ignore all the mathematical details about states, tables, and generators. All of the parsing actions and outputs and their timing can be simply understood by viewing the LR parser as just a shift-reduce parser with some nifty decision method. If the generator tool complains about some parts of your grammar, you may need some understanding of states and the difference between LR and LALR in order to tweak your grammar into an acceptable form. Full understanding of grammar and state analysis algorithms is needed only by the tool implementer and by students of parsing theory courses. Bottom-Up Parse Tree for Example A*2 + 1

Bottom-up parse tree built in numbered steps An LR parser scans and parses the input text in one forward pass over the text. The parser builds up the parse tree incrementally, bottom up, and left to right, without guessing or backtracking. At every point in this pass, the parser has accumulated a list of subtrees or phrases of the input text that have been already parsed. Those subtrees are not yet joined together because the parser has not yet reached the right end of the syntax pattern that will combine them. At step 6 in the example parse, only "A*2" has been parsed, incompletely. Only the shaded lower-left corner of the parse tree exists. None of the parse tree nodes numbered 7 and above exist yet. Nodes 3, 4, and 6 are the roots of isolated subtrees for variable A, operator *, and number 2, respectively. These three root nodes are temporarily held in a parse stack. The remaining unparsed portion of the input stream is "+ 1". Shift & Reduce Actions As with other shift-reduce parsers, an LR parser works by doing some combination of Shift steps and Reduce steps.

A Shift step advances in the input stream by one symbol. That shifted symbol becomes a new single-node parse tree. A Reduce step applies a completed grammar rule to some of the recent parse trees, joining them together as one tree with a new root symbol.

If the input has no syntax errors, the parser continues with these steps until all of the input has been consumed and all of the parse trees have been reduced to a single tree representing an entire legal input.

LR parsers differ from other shift-reduce parsers in how they decide when to reduce, and how to pick between rules with similar endings. But the final decisions and the sequence of shift or reduce steps are the same. Much of the LR parser's efficiency is from being deterministic. To avoid guessing, the LR parser often looks ahead (rightwards) at the next scanned symbol, before deciding what to do with previously scanned symbols. The lexical scanner works one or more symbols ahead of the parser. The lookahead symbols are the 'right-hand context' for the parsing decision.[3] Bottom-Up Parse Stack

Bottom-Up Parser at step 6 Like other shift-reduce parsers, an LR parser lazily waits until it has scanned and parsed all parts of some construct before committing to what the combined construct is. The parser then acts immediately on the combination instead of waiting any further. In the parse tree example, the phrase A gets reduced to Value and then to Products in steps 1-3 as soon as lookahead * is seen, rather than waiting any later to organize those parts of the parse tree. The decisions for how to handle A are based only on what the parser and scanner have already seen, without considering things that appear much later to the right. Reductions reorganize the most recently parsed things, immediately to the left of the lookahead symbol. So the list of already-parsed things acts like a stack. This parse stack grows rightwards. The base or bottom of the stack is on the left and holds the leftmost, oldest parse fragment. Every reduction step acts only on the rightmost, newest parse fragments. (This accumulative parse stack

is very unlike the predictive, leftward-growing parse stack used by top-down parsers.)

Bottom-Up Parse Steps for Example A*2 + 1

Step  Parse Stack        Unparsed   Shift/Reduce
0     empty              A*2 + 1    shift
1     id                 *2 + 1     Value → id
2     Value              *2 + 1     Products → Value
3     Products           *2 + 1     shift
4     Products *         2 + 1      shift
5     Products * int     + 1        Value → int
6     Products * Value   + 1        Products → Products * Value
7     Products           + 1        Sums → Products
8     Sums               + 1        shift
9     Sums +             1          shift
10    Sums + int         eof        Value → int
11    Sums + Value       eof        Products → Value
12    Sums + Products    eof        Sums → Sums + Products
13    Sums               eof        done

Step 6 applies a grammar rule with multiple parts:

Products → Products * Value

This matches the stack top holding the parsed phrases "... Products * Value". The reduce step replaces this instance of the rule's right hand side, "Products * Value", by the rule's left hand side symbol, here a larger Products. If the parser builds complete parse trees, the three trees for inner Products, *, and Value are combined by a new tree root for Products. Otherwise, semantic details from the inner Products and Value are output to some later compiler pass, or are combined and saved in the new Products symbol.[4]

LR Parse Steps for Example A*2 + 1

In LR parsers, the shift and reduce decisions are potentially based on the entire stack of everything that has been previously parsed, not just on a single, topmost stack symbol. If done naively, that could lead to very slow parsers that get slower and slower for longer inputs. LR parsers do this with constant speed, by summarizing all the relevant left context information into a single number called the LR(0) parser state. For each grammar and LR analysis method, there is a fixed (finite) number of such states. Besides holding the already-parsed symbols, the parse stack also remembers the state numbers reached by everything up to those points.

At every parse step, the entire input text is divided into a stack of previously parsed phrases, a current lookahead symbol, and the remaining unscanned text. The parser's next action is determined by its current LR(0) state number (rightmost on the stack) and the lookahead symbol. In the steps below, the stacked symbols are exactly the same as in other non-LR shift-reduce parsers. LR parser stacks add state information, written here as the number directly after each symbol, summarizing the phrases to its left on the stack and what syntax possibilities to expect next. Users of an LR parser can usually ignore state information. These states are explained in a later section.

Step  Parse Stack              Look Ahead  Unscanned  Parser Action  Grammar Rule                   Next State
0     0                        id          *2 + 1     shift                                         9
1     0 id9                    *           2 + 1      reduce         Value → id                     7
2     0 Value7                 *           2 + 1      reduce         Products → Value               4
3     0 Products4              *           2 + 1      shift                                         5
4     0 Products4 *5           int         + 1        shift                                         8
5     0 Products4 *5 int8      +           1          reduce         Value → int                    6
6     0 Products4 *5 Value6    +           1          reduce         Products → Products * Value    4
7     0 Products4              +           1          reduce         Sums → Products                1
8     0 Sums1                  +           1          shift                                         2
9     0 Sums1 +2               int         eof        shift                                         8
10    0 Sums1 +2 int8          eof                    reduce         Value → int                    7
11    0 Sums1 +2 Value7        eof                    reduce         Products → Value               3
12    0 Sums1 +2 Products3     eof                    reduce         Sums → Sums + Products         1
13    0 Sums1                  eof                    done


At initial step 0, the input stream "A*2 + 1" is divided into

an empty section on the parse stack, lookahead text "A" scanned as an id symbol, and the remaining unscanned text "*2 + 1".

The parse stack begins by holding only initial state 0. When state 0 sees the lookahead id, it knows to shift that id onto the stack, and scan the next input symbol *, and advance to state 9.

At step 4, the total input stream "A*2 + 1" is currently divided into

the parsed section "A *" with 2 stacked phrases Products and *, lookahead text "2" scanned as an int symbol, and the remaining unscanned text " + 1".

The states corresponding to the stacked phrases are 0, 4, and 5. The current, rightmost state on the stack is state 5. When state 5 sees the lookahead int, it knows to shift that int onto the stack as its own phrase, and scan the next input symbol +, and advance to state 8.

At step 12, all of the input stream has been consumed but only partially organized. The current state is 3. When state 3 sees the lookahead eof, it knows to apply the completed grammar rule

r1: Sums → Sums + Products

by combining the stack's rightmost three phrases for Sums, +, and Products into one thing. State 3 itself doesn't know what the next state should be. This is found by going back to state 0, just to the left of the phrase being reduced. When state 0 sees this new completed instance of a Sums, it advances to state 1 (again). This consulting of older states is why they are kept on the stack, instead of keeping only the current state.

Grammar for the Example A*2 + 1

LR parsers are constructed from a grammar that formally defines the syntax of the input language as a set of patterns. The grammar doesn't cover all language rules, such as the size of numbers, or the consistent use of names and their definitions in the context of the whole program. LR parsers use a context-free grammar that deals just with local patterns of symbols. The example grammar used here is a tiny subset of the Java or C language:

r0: Goal → Sums eof
r1: Sums → Sums + Products
r2: Sums → Products
r3: Products → Products * Value
r4: Products → Value
r5: Value → int
r6: Value → id

The grammar's terminal symbols are the multi-character symbols or 'tokens' found in the input stream by a lexical scanner. Here these include + * and int for any integer constant, and id for any identifier name, and eof for end of input file. The grammar doesn't care what the int values or id spellings are, nor does it care about blanks or line breaks. The grammar uses these terminal symbols but does not define them. They are always at the bottom bushy end of the parse tree.

The capitalized terms like Sums are nonterminal symbols. These are names for concepts or patterns in the language. They are defined in the grammar and never occur themselves in the input stream. They are always above the bottom of the parse tree. They only happen as a result of the parser applying some grammar rule. Some nonterminals are defined with two or more rules; these are alternative patterns. Rules can refer back to themselves. This grammar uses recursive rules to handle repeated math operators. Grammars for complete languages use recursive rules to handle lists, parenthesized expressions, and nested statements.

Any given computer language can be described by several different grammars. An LR(1) parser can handle many but not all common grammars. It is usually possible to manually modify a grammar so that it fits the limitations of LR(1) parsing and the generator tool. The grammar for an LR parser must be unambiguous itself, or must be augmented by tie-breaking precedence rules. This means there is only one correct way to apply the grammar to a given legal example of the language, resulting in a unique parse tree with just one meaning, and a unique sequence of shift/reduce actions for that example. LR parsing is not a useful technique for human languages with ambiguous grammars that depend on the interplay of words.
Human languages are better handled by parsers like Generalized LR parser, the Earley parser, or the CYK algorithm that can simultaneously compute all possible parse trees in one pass. Parse Table for the Example Grammar Most LR parsers are table driven. The parser's program code is a simple generic loop that is the same for all grammars and languages. The knowledge of the grammar and its syntactic implications are encoded into unchanging data tables called parse tables. The tables show whether to shift or reduce (and by which grammar rule), for every legal combination of parser state and lookahead symbol. The parse tables also tell how to compute the next state, given just a current state and a next symbol.

The parse tables are much larger than the grammar. LR tables are hard to accurately compute by hand for big grammars. So they are mechanically derived from the grammar by some parser generator tool like Bison.[5] Depending on how the states and parsing table are generated, the resulting parser is called either a SLR (simple LR) parser, LALR (look-ahead LR) parser, or canonical LR parser. LALR parsers handle more grammars than SLR parsers. Canonical LR parsers handle even more grammars, but use many more states and much larger tables. The example grammar is SLR.

LR parse tables are two-dimensional. Each current LR(0) parser state has its own row. Each possible next symbol has its own column. Many combinations of state and next symbol are impossible for valid input streams. These blank cells trigger syntax error messages.

The left half of the table has columns for lookahead terminal symbols. These cells determine whether the next parser action is shift (to state n), or reduce (by grammar rule rn).

The Goto right half of the table has columns for nonterminal symbols. These cells show which state to advance to, after some reduction's Left Hand Side has created an expected new instance of that symbol. This is like a shift action but for nonterminals; the lookahead terminal symbol is unchanged.

The table column "Current Rules" documents the meaning and syntax possibilities for each state, as worked out by the parser generator. It is not included in the actual tables used at parsing time. The • marker shows where the parser is now, within some partially recognized grammar rules. The things to the left of • have been parsed, and the things to the right are expected soon. A state has several such current rules if the parser has not yet narrowed possibilities down to a single rule.

Curr                                      Lookahead                  LHS Goto
State  Current Rules                      int  id   *    +    eof    Sums  Products  Value
0      Goal → • Sums eof                  8    9                     1     4         7
1      Goal → Sums • eof                                 2    done
       Sums → Sums • + Products
2      Sums → Sums + • Products           8    9                           3         7
3      Sums → Sums + Products •                     5    r1   r1
       Products → Products • * Value
4      Sums → Products •                            5    r2   r2
       Products → Products • * Value
5      Products → Products * • Value      8    9                                     6
6      Products → Products * Value •                r3   r3   r3
7      Products → Value •                           r4   r4   r4
8      Value → int •                                r5   r5   r5
9      Value → id •                                 r6   r6   r6

In state 2 above, the parser has just found and shifted-in the + of grammar rule

r1: Sums → Sums + • Products

The next expected phrase is Products. Products begins with terminal symbols int or id. If the lookahead is either of those, the parser shifts them in and advances to state 8 or 9, respectively. When a Products has been found, the parser advances to state 3 to accumulate the complete list of summands and find the end of rule r0. A Products can also begin with nonterminal Value. For any other lookahead or nonterminal, the parser announces a syntax error.

In state 3, the parser has just found a Products phrase, that could be from two possible grammar rules:

r1: Sums → Sums + Products •
r3: Products → Products • * Value

The choice between r1 and r3 can't be decided just from looking backwards at prior phrases. The parser has to check the lookahead symbol to tell what to do. If the lookahead is *, we are in rule 3 so the parser shifts in the * and advances to state 5. If the lookahead is eof, we are at the end of rule 1 and rule 0 so the parser is done.

In state 9 above, all the non-blank, non-error cells are for the same reduction r6. Some parsers save time and table space by not checking the lookahead symbol in these simple cases. Syntax errors are then detected somewhat later, after some harmless reductions, but still before the next shift action or parser decision.

Individual table cells must not hold multiple, alternative actions, otherwise the parser would be nondeterministic with guesswork and backtracking. If the grammar is not LR(1), some cells will have shift/reduce conflicts between a possible shift action and reduce action, or reduce/reduce conflicts between multiple grammar rules. LR(k) parsers resolve these conflicts (where possible) by checking additional lookahead symbols beyond the first.

LR Parser Loop

The LR parser begins with a nearly empty parse stack containing just the start state 0, and with the lookahead holding the input stream's first scanned symbol. The parser then repeats the following loop step until done, or stuck on a syntax error:

The topmost state on the parse stack is some state s, and the current lookahead is some terminal symbol t. Look up the next parser action from row s and column t of the Lookahead Action table. That action is either Shift, Reduce, Done, or Error:

Shift n: Shift the matched terminal t onto the parse stack and scan the next input symbol into the lookahead buffer. Push next state n onto the parse stack as the new current state.

Reduce rm: Apply grammar rule rm: Lhs → S1 S2 ... SL. Remove the matched topmost L symbols (and parse trees and associated state numbers) from the parse stack. This exposes a prior state p that was expecting an instance of the Lhs symbol. Join the L parse trees together as one parse tree with new root symbol Lhs. Look up the next state n from row p and column Lhs of the LHS Goto table. Push the symbol and tree for Lhs onto the parse stack. Push next state n onto the parse stack as the new current state. The lookahead and input stream remain unchanged.

Done: Lookahead t is the eof marker. End of parsing. If the state stack contains just the start state, report success. Otherwise, report a syntax error.

Error (no action): Report a syntax error. The parser ends, or attempts some recovery.
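The loop above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code any particular generator emits: the names RULES, ACTION, GOTO, and parse are assumptions made for this example, the table entries are transcribed from the example parse table, and the input is assumed to be already scanned into terminal names (id, int, *, +). To stay short it keeps only the state numbers on the stack and records which rules were reduced instead of building parse trees.

```python
# Rule number -> (left-hand side, number of right-hand-side symbols).
RULES = {
    1: ("Sums", 3),      # Sums -> Sums + Products
    2: ("Sums", 1),      # Sums -> Products
    3: ("Products", 3),  # Products -> Products * Value
    4: ("Products", 1),  # Products -> Value
    5: ("Value", 1),     # Value -> int
    6: ("Value", 1),     # Value -> id
}


def _reduce_row(rule):
    """States 6-9 reduce by one rule on every lookahead *, + or eof."""
    return {t: ("reduce", rule) for t in ("*", "+", "eof")}


# Lookahead Action table: ACTION[state][terminal]; missing cells are errors.
ACTION = {
    0: {"int": ("shift", 8), "id": ("shift", 9)},
    1: {"+": ("shift", 2), "eof": ("done",)},
    2: {"int": ("shift", 8), "id": ("shift", 9)},
    3: {"*": ("shift", 5), "+": ("reduce", 1), "eof": ("reduce", 1)},
    4: {"*": ("shift", 5), "+": ("reduce", 2), "eof": ("reduce", 2)},
    5: {"int": ("shift", 8), "id": ("shift", 9)},
    6: _reduce_row(3),
    7: _reduce_row(4),
    8: _reduce_row(5),
    9: _reduce_row(6),
}

# LHS Goto table: GOTO[exposed state][nonterminal] -> next state.
GOTO = {
    0: {"Sums": 1, "Products": 4, "Value": 7},
    2: {"Products": 3, "Value": 7},
    5: {"Value": 6},
}


def parse(tokens):
    """Run the LR loop; return the rule numbers in the order they were reduced."""
    stream = list(tokens) + ["eof"]
    pos = 0
    states = [0]          # state stack, start state at the bottom
    reductions = []
    while True:
        lookahead = stream[pos]
        action = ACTION[states[-1]].get(lookahead)
        if action is None:
            raise SyntaxError(f"unexpected {lookahead!r} in state {states[-1]}")
        if action[0] == "shift":
            states.append(action[1])
            pos += 1                      # scan the next input symbol
        elif action[0] == "reduce":
            lhs, length = RULES[action[1]]
            del states[-length:]          # pop one state per right-hand-side symbol
            states.append(GOTO[states[-1]][lhs])
            reductions.append(action[1])  # lookahead stays unchanged
        else:                             # done
            return reductions
```

Calling parse(["id", "*", "int", "+", "int"]) for the input A*2 + 1 reduces by rules 6, 4, 5, 3, 2, 5, 4, 1 in that order, which is the same sequence of reductions as in the LR parse steps table.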

LR Generator Analysis

This section can be skipped by most users of LR parser generators.

LR States

State 2 in the example parse table is for the partially parsed rule

r1: Sums → Sums + • Products

This shows how the parser got here, by seeing Sums then + while looking for a larger Sums. The • marker has advanced beyond the beginning of the rule. It also shows how the parser expects to eventually complete the rule, by next finding a complete Products. But more details are needed on how to parse all the parts of that Products. The partially parsed rules for a state are called its "core LR(0) items". The parser generator adds additional rules or items for all the possible next steps in building up the expected Products:

r3: Products → • Products * Value
r4: Products → • Value
r5: Value → • int
r6: Value → • id
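As a rough sketch of how a generator might derive such added items, the Python below expands a set of core items until nothing new appears, and then advances the marker over one symbol to get the core of a next state. The representation is an assumption made for this illustration: an item is a pair (rule number, marker position), and the names GRAMMAR, closure, goto, and show are hypothetical helpers, not part of any real tool.

```python
GRAMMAR = [
    ("Goal", ("Sums", "eof")),                  # r0
    ("Sums", ("Sums", "+", "Products")),        # r1
    ("Sums", ("Products",)),                    # r2
    ("Products", ("Products", "*", "Value")),   # r3
    ("Products", ("Value",)),                   # r4
    ("Value", ("int",)),                        # r5
    ("Value", ("id",)),                         # r6
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add every rule for each nonterminal that directly follows a marker."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))   # new item with the marker at the start
                    work.append((i, 0))
    return result


def goto(items, symbol):
    """Core of the next state: advance the marker over `symbol` where it is expected."""
    return {(rule, dot + 1) for rule, dot in items
            if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol}


def show(item):
    """Render an item as text; '.' stands for the marker."""
    rule, dot = item
    lhs, rhs = GRAMMAR[rule]
    return f"r{rule}: {lhs} -> " + " ".join(rhs[:dot] + (".",) + rhs[dot:])


state2 = closure({(1, 2)})   # core item r1: Sums -> Sums + . Products
for item in sorted(state2):
    print(show(item))
```

Starting from the single core item of state 2, closure adds the items for r3, r4, r5, and r6 with the marker at the beginning. Applying goto(state2, "Products") then yields the two-item core made of r1 with the marker at the end and r3 with the marker after the first Products, while goto(state2, "int") yields the single item for r5.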

Note that the • marker is at the beginning of each of these added rules; the parser has not yet confirmed and parsed any part of them. These additional items are called the "closure" of the core items. For each nonterminal symbol immediately following a •, the generator adds the rules defining that symbol. This adds more • markers, and possibly different follower symbols. This closure process continues until all follower symbols have been expanded. The follower nonterminals for state 2 begin with Products. Value is then added by closure. The follower terminals are int and id.

The core and closure items together show all possible legal ways to proceed from the current state to future states and complete phrases. If a follower symbol appears in only one item, it leads to a next state containing only one core item with the marker advanced. So int leads to next state 8 with core

r5: Value → int •

If the same follower symbol appears in several items, the parser cannot yet tell which rule applies here. So that symbol leads to a next state that shows all remaining possibilities, again with the marker advanced. Products appears in both r1 and r3. So Products leads to next state 3 with core

r1: Sums → Sums + Products •
r3: Products → Products • * Value

In words, that means if you've seen a single Products, you might be done, or you might still have even more things to multiply together. Note that all the core items have the same symbol preceding the • marker; all transitions into this state are always with that same symbol.

Some transitions will be to cores and states that have been enumerated already. Other transitions lead to new states. The generator starts with the grammar's goal rule. From there it keeps exploring known states and transitions until all needed states have been found.

These states are called "LR(0)" states because they use a lookahead of k=0, i.e. no lookahead. The only checking of input symbols occurs when the symbol is shifted in. Checking of lookaheads for reductions is done separately by the parse table, not by the enumerated states themselves.

Finite State Machine

The parse table describes all possible LR(0) states and their transitions. They form a finite state machine. An FSM is a simple engine for parsing simple unnested languages, without using a stack. In this LR application, the FSM's modified "input language" has both terminal and nonterminal symbols, and covers any partially parsed stack snapshot of the full LR parse. Recall step 5 of the Parse Steps Example:

Step  Parse Stack            Look Ahead  Unscanned
5     0 Products4 *5 int8    +           1

The parse stack shows a series of state transitions, from the start state 0, to state 4 and then on to 5 and current state 8. The symbols on the parse stack are the shift or goto symbols for those transitions. Another way to view this is that the finite state machine can scan the stream "Products * int + 1" (without using yet another stack) and find the leftmost complete phrase that should be reduced next. And that is indeed its job! How can a mere FSM do this, when the original unparsed language has nesting and recursion and definitely requires an analyzer with a stack? The trick is that everything to the left of the stack top has already been fully reduced. This eliminates all the loops and nesting from those phrases. The FSM can ignore all the older beginnings of phrases, and track just the newest phrases that might be completed next. The obscure name for this in LR theory is "viable prefix".

Lookahead Sets

The states and transitions give all the needed information for the parse table's shift actions and goto actions. The generator also needs to calculate the expected lookahead sets for each reduce action. In SLR parsers, these lookahead sets are determined directly from the grammar, without considering the individual states and transitions. For each nonterminal S, the SLR generator works out Follow(S), the set of all the terminal symbols which can immediately follow some occurrence of S. In the parse table, each reduction to S uses Follow(S) as its LR(1) lookahead set. Such follow sets are also used by generators for LL top-down parsers. A grammar that has no shift/reduce or reduce/reduce conflicts when using Follow sets is called an SLR grammar.
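The Follow sets for the Sums/Products/Value example grammar can be worked out with a small fixed-point computation. The sketch below is a simplified illustration under a stated assumption: the grammar has no empty rules, so the first symbol of a right-hand side is enough to find what a nonterminal can start with. The names GRAMMAR, first_sets, and follow_sets are hypothetical and chosen only for this example.

```python
GRAMMAR = [
    ("Goal", ("Sums", "eof")),                  # r0
    ("Sums", ("Sums", "+", "Products")),        # r1
    ("Sums", ("Products",)),                    # r2
    ("Products", ("Products", "*", "Value")),   # r3
    ("Products", ("Value",)),                   # r4
    ("Value", ("int",)),                        # r5
    ("Value", ("id",)),                         # r6
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """Terminals that can begin each nonterminal (assumes no empty rules)."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """Terminals that can immediately follow each nonterminal."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    # whatever can start the next symbol can follow this one
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    # at the end of a rule, inherit what follows the left-hand side
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


FOLLOW = follow_sets()
```

For this grammar the result is Follow(Sums) = {+, eof} and Follow(Products) = Follow(Value) = {*, +, eof}. These are the lookahead columns in which the example parse table places r1 and r2 (reductions to Sums) and r3 through r6 (reductions to Products and Value).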

LALR parsers have the same states as SLR parsers, but use a more complicated, more precise way of working out the minimum necessary reduction lookaheads for each individual state. Depending on the details of the grammar, this may turn out to be the same as the Follow set computed by SLR parser generators, or it may turn out to be a subset of the SLR lookaheads. Some grammars are okay for LALR parser generators but not for SLR parser generators. This happens when the grammar has spurious shift/reduce or reduce/reduce conflicts using Follow sets, but no conflicts when using the exact sets computed by the LALR generator. The grammar is then called LALR(1) but not SLR. An SLR or LALR parser avoids having duplicate states. But this minimization is not necessary, and can sometimes create unnecessary lookahead conflicts. Canonical LR parsers use duplicated (or "split") states to better remember the left and right context of a nonterminal's use. Each occurrence of a symbol S in the grammar can be treated independently with its own lookahead set, to help resolve reduction conflicts. This handles a few more grammars. Unfortunately, this greatly magnifies the size of the parse tables if done for all parts of the grammar. This splitting of states can also be done manually and selectively with any SLR or LALR parser, by making two or more named copies of some nonterminals. A grammar that is conflict-free for a canonical LR generator but has conflicts in an LALR generator is called LR(1) but not LALR(1), and not SLR. SLR, LALR, and canonical LR parsers make exactly the same shift and reduce decisions when the input stream is correct language. When the input has a syntax error, the LALR parser may do some additional (harmless) reductions before detecting the error than would the canonical LR parser. And the SLR parser may do even more. 
This happens because the SLR and LALR parsers are using a generous superset approximation to the true, minimal lookahead symbols for that particular state. Syntax Error Recovery LR parsers can generate somewhat helpful error messages for the first syntax error in a program, by simply enumerating all the terminal symbols that could have appeared next instead of the unexpected bad lookahead symbol. But this does not help the parser work out how to parse the remainder of the input program to look for further, independent errors. If the parser recovers badly from the first error, it is very likely to mis-parse everything else and produce a cascade of unhelpful spurious error messages. In the yacc and bison parser generators, the parser has an ad hoc mechanism to abandon the current statement, discard some parsed phrases and lookahead

tokens surrounding the error, and resynchronize the parse at some reliable statement-level delimiter like semicolons or braces. This often works well for allowing the parser and compiler to look over the rest of the program. Many syntactic coding errors are simple typos or omissions of a trivial symbol. Some LR parsers attempt to detect and automatically repair these common cases. The parser enumerates every possible single-symbol insertion, deletion, or substitution at the error point. The compiler does a trial parse with each change to see if it worked okay. (This requires backtracking to snapshots of the parse stack and input stream, normally unneeded by the parser.) Some best repair is picked. This gives a very helpful error message and resynchronizes the parse well. However, the repair is not trustworthy enough to permanently modify the input file. Repair of syntax errors is easiest to do consistently in parsers (like LR) that have parse tables and an explicit data stack. Variants of LR Parsers The LR parser generator decides what should happen for each combination of parser state and lookahead symbol. These decisions are usually turned into read-only data tables that drive a generic parser loop that is grammar- and state-independent. But there are also other ways to turn those decisions into an active parser. Some LR parser generators create separate tailored program code for each state, rather than a parse table. These parsers can run several times faster than the generic parser loop in table-driven parsers. The fastest parsers use generated assembler code. In the recursive ascent parser variation, the explicit parse stack structure is also replaced by the implicit stack used by subroutine calls. Reductions terminate several levels of subroutine calls, which is clumsy in most languages. So recursive ascent parsers are generally slower, less obvious, and harder to hand-modify than recursive descent parsers. 
Another variation replaces the parse table by pattern-matching rules in nonprocedural languages such as Prolog. GLR Generalized LR parsers use LR bottom-up techniques to find all possible parses of input text, not just one correct parse. This is essential for highly ambiguous grammars such as for human languages. The multiple valid parse trees are computed simultaneously, without backtracking. GLR is sometimes helpful for computer languages that are not easily described by an unambiguous, conflict-free LALR(1) grammar.

Left corner parsers use LR bottom-up techniques for recognizing the left end of alternative grammar rules. When the alternatives have been narrowed down to a single possible rule, the parser then switches to top-down LL(1) techniques for parsing the rest of that rule. LC parsers have smaller parse tables than LALR parsers and better error diagnostics. There are no widely used generators for deterministic LC parsers. Multiple-parse LC parsers are helpful with human languages with very large grammars. Theory LR parsers were invented by Donald Knuth in 1965 as an efficient generalization of precedence parsers. Knuth proved that LR parsers were the most general-purpose parsers possible that would still be efficient in the worst cases. "LR(k) grammars can be efficiently parsed with an execution time essentially proportional to the length of the string." "A language can be generated by an LR(k) grammar if and only if it is deterministic, if and only if it can be generated by an LR(1) grammar."[1] In other words, if a language was reasonable enough to allow an efficient onepass parser, it could be described by an LR(k) grammar. And that grammar could always be mechanically transformed into an equivalent (but larger) LR(1) grammar. So an LR(1) parsing method was, in theory, powerful enough to handle any reasonable language. In practice, the natural grammars for many programming languages are close to being LR(1).[citation needed] The canonical LR parsers described by Knuth had too many states and very big parse tables that were impractically large for the limited memory of computers of that era. 
LR parsing became practical when Frank DeRemer invented SLR and LALR parsers with much fewer states.[6][7] For full details on LR theory and how LR parsers are derived from grammars, see [8] if you can find it, otherwise see their current textbook.[9] Earley parsers apply the techniques and notation of LR parsers to the task of generating all possible parses for ambiguous grammars such as for human languages.

Additional Example 1+1

Bottom-up parse of 1+1

This example of LR parsing uses the following small grammar with goal symbol E:

(1) E → E * B
(2) E → E + B
(3) E → B
(4) B → 0
(5) B → 1

to parse the following input:

1+1

Action and goto tables

The two LR(0) parsing tables for this grammar look as follows:

        action                     goto
state   *    +    0    1    $      E    B
0                 s1   s2          3    4
1       r4   r4   r4   r4   r4
2       r5   r5   r5   r5   r5
3       s5   s6             acc
4       r3   r3   r3   r3   r3
5                 s1   s2               7
6                 s1   s2               8
7       r1   r1   r1   r1   r1
8       r2   r2   r2   r2   r2

The action table is indexed by a state of the parser and a terminal (including a special terminal $ that indicates the end of the input stream) and contains three types of actions:

shift, which is written as 'sn' and indicates that the next state is n
reduce, which is written as 'rm' and indicates that a reduction with grammar rule m should be performed
accept, which is written as 'acc' and indicates that the parser accepts the string in the input stream.

The goto table is indexed by a state of the parser and a nonterminal and simply indicates what the next state of the parser will be if it has recognized a certain nonterminal. This table is important to find out the next state after every reduction. After a reduction, the next state is found by looking up the goto table entry for top of the stack (i.e. current state) and the reduced rule's LHS (i.e. non-terminal). Parsing Steps The table below illustrates each step in the process. Here the state refers to the element at the top of the stack (the right-most element), and the next action is determined by referring to the action table above. Also note that a $ is appended to the input string to denote the end of the stream.

State  Input stream  Output stream  Stack      Next action
0      1+1$                         [0]        Shift 2
2      +1$                          [0,2]      Reduce 5
4      +1$           5              [0,4]      Reduce 3
3      +1$           5,3            [0,3]      Shift 6
6      1$            5,3            [0,3,6]    Shift 2
2      $             5,3            [0,3,6,2]  Reduce 5
8      $             5,3,5          [0,3,6,8]  Reduce 2
3      $             5,3,5,2        [0,3]      Accept
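The rows of this table can be regenerated mechanically from the action and goto tables. The Python sketch below does so for the five-rule E/B grammar; the names RULES, ACTION, GOTO, and trace are assumptions made for this illustration, and each input character is treated as one terminal.

```python
# Rule number -> (left-hand side, number of right-hand-side symbols).
RULES = {1: ("E", 3), 2: ("E", 3), 3: ("E", 1), 4: ("B", 1), 5: ("B", 1)}

TERMINALS = "*+01$"

# Action table: 'sN' = shift to state N, 'rM' = reduce by rule M, 'acc' = accept.
ACTION = {
    0: {"0": "s1", "1": "s2"},
    1: dict.fromkeys(TERMINALS, "r4"),
    2: dict.fromkeys(TERMINALS, "r5"),
    3: {"*": "s5", "+": "s6", "$": "acc"},
    4: dict.fromkeys(TERMINALS, "r3"),
    5: {"0": "s1", "1": "s2"},
    6: {"0": "s1", "1": "s2"},
    7: dict.fromkeys(TERMINALS, "r1"),
    8: dict.fromkeys(TERMINALS, "r2"),
}

# Goto table: state exposed after popping, and the reduced nonterminal.
GOTO = {0: {"E": 3, "B": 4}, 5: {"B": 7}, 6: {"B": 8}}


def trace(text):
    """Yield (state, input left, output so far, stack, next action) per step."""
    stream = text + "$"          # append the end-of-input marker
    stack, output, pos = [0], [], 0
    while True:
        action = ACTION[stack[-1]].get(stream[pos])
        if action is None:
            raise SyntaxError(f"unexpected {stream[pos]!r} in state {stack[-1]}")
        yield stack[-1], stream[pos:], tuple(output), tuple(stack), action
        if action == "acc":
            return
        n = int(action[1:])
        if action[0] == "s":
            stack.append(n)      # shift: push the next state, consume one symbol
            pos += 1
        else:
            lhs, length = RULES[n]
            del stack[-length:]  # reduce: pop one state per right-hand-side symbol
            stack.append(GOTO[stack[-1]][lhs])
            output.append(n)     # write the rule number to the output stream


for row in trace("1+1"):
    print(row)
```

For the input 1+1 this yields eight rows, one per line of the table, ending in state 3 with output 5, 3, 5, 2 and the accept action.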

Walkthrough

The parser starts out with the stack containing just the initial state ('0'):

[0]

The first symbol from the input string that the parser sees is '1'. In order to find out what the next action is (shift, reduce, accept or error), the action table is indexed with the current state (remember that the "current state" is just whatever is on the top of the stack), which in this case is 0, and the current input symbol, which is '1'. The action table specifies a shift to state 2, and so state 2 is pushed onto the stack (again, remember that all the state information is in the stack, so "shifting to state 2" is the same thing as pushing 2 onto the stack). The resulting stack is

[0 '1' 2]

where the top of the stack is 2. For the sake of explanation we also show the symbol (e.g., '1', B) that caused the transition to the next state, although strictly speaking it is not part of the stack.

In state 2 the action table says that regardless of what terminal we see on the input stream, we should do a reduction with grammar rule 5. If the table is correct, this means that the parser has just recognized the right-hand side of rule 5, which is indeed the case. In this case we write 5 to the output stream, pop one state from the stack (since the right-hand side of the rule has one symbol), and push on the stack the state from the cell in the goto table for state 0 and B, i.e., state 4. The resulting stack is:

[0 B 4]

However, in state 4 the action table says we should now do a reduction with rule 3. So we write 3 to the output stream, pop one state from the stack, and find the new state in the goto table for state 0 and E, which is state 3. The resulting stack:

[0 E 3]

The next terminal that the parser sees is a '+' and according to the action table it should then go to state 6:

[0 E 3 '+' 6]

Note that the resulting stack can be interpreted as the history of a finite state automaton that has just read a nonterminal E followed by a terminal '+'. The transition table of this automaton is defined by the shift actions in the action table and the goto actions in the goto table.

The next terminal is now '1' and this means that we perform a shift and go to state 2:

[0 E 3 '+' 6 '1' 2]

Just like the previous '1', this one is reduced to B, giving the following stack:

[0 E 3 '+' 6 B 8]

Again note that the stack corresponds to a list of states of a finite automaton that has read a nonterminal E, followed by a '+' and then a nonterminal B. In state 8 we always perform a reduce with rule 2. Note that the top 3 states on the stack correspond to the 3 symbols in the right-hand side of rule 2. These three states are popped, which leaves state 0 on top, and the goto table entry for state 0 and E gives state 3:

[0 E 3]

Finally, we read a '$' from the input stream, which means that according to the action table (the current state is 3) the parser accepts the input string. The rule numbers that will then have been written to the output stream will be [5, 3, 5, 2], which is indeed a rightmost derivation of the string "1 + 1" in reverse.

Constructing LR(0) parsing tables

Items

The construction of these parsing tables is based on the notion of LR(0) items (simply called items here), which are grammar rules with a special dot added somewhere in the right-hand side. For example the rule E → E + B has the following four corresponding items:

E → • E + B
E → E • + B
E → E + • B
E → E + B •

Rules of the form A → ε have only a single item A → •. The item E → E • + B, for example, indicates that the parser has recognized a string corresponding with E on the input stream and now expects to read a '+' followed by another string corresponding with B.

Item sets

It is usually not possible to characterize the state of the parser with a single item because it may not know in advance which rule it is going to use for reduction. For example if there is also a rule E → E * B then the items E → E • + B and E → E • * B will both apply after a string corresponding with E has been read. Therefore we will characterize the state of the parser by a set of items, in this case the set { E → E • + B, E → E • * B }.

Extension of Item Set by expansion of non-terminals

An item with a dot before a nonterminal, such as E → E + • B, indicates that the parser expects to parse the nonterminal B next. To ensure the item set contains all possible rules the parser may be in the midst of parsing, it must include all items describing how B itself will be parsed. This means that if there are rules such as B → 1 and B → 0 then the item set must also include the items B → • 1 and B → • 0. In general this can be formulated as follows:

If there is an item of the form A → v • B w in an item set and in the grammar there is a rule of the form B → w' then the item B → • w' should also be in the item set.

Closure of item sets

Thus, any set of items can be extended by recursively adding all the appropriate items until all nonterminals preceded by dots are accounted for. The minimal extension is called the closure of an item set and written as clos(I) where I is an item set. It is these closed item sets that we will take as the states of the parser, although only the ones that are actually reachable from the begin state will be included in the tables.

Augmented grammar

Before we start determining the transitions between the different states, the grammar is always augmented with an extra rule

(0) S → E

where S is a new start symbol and E the old start symbol. The parser will use this rule for reduction exactly when it has accepted the input string. For our example we will take the same grammar as before and augment it:

(0) S → E
(1) E → E * B
(2) E → E + B
(3) E → B
(4) B → 0
(5) B → 1

It is for this augmented grammar that we will determine the item sets and the transitions between them.

Table construction

Finding the reachable item sets and the transitions between them

The first step of constructing the tables consists of determining the transitions between the closed item sets. These transitions will be determined as if we are considering a finite automaton that can read terminals as well as nonterminals. The begin state of this automaton is always the closure of the first item of the added rule, S → • E:

Item set 0
S → • E
+ E → • E * B
+ E → • E + B
+ E → • B
+ B → • 0
+ B → • 1

The "+" in front of an item marks the items that were added for the closure (not to be confused with the mathematical '+' operator, which is a terminal). The original items without a "+" are called the kernel of the item set.

Starting at the begin state (S0) we will now determine all the states that can be reached from this state. The possible transitions for an item set can be found by looking at the symbols (terminals and nonterminals) we find right after the dots; in the case of item set 0 those symbols are the terminals '0' and '1' and the nonterminals E and B. To find the item set that each symbol x leads to, we follow this procedure for each of the symbols:

1. Take the subset, S, of all items in the current item set where there is a dot in front of the symbol of interest, x.
2. For each item in S, move the dot to the right of x.
3. Close the resulting set of items.

For the terminal '0' (i.e. where x = '0') this results in:

Item set 1
B → 0 •

and for the terminal '1' (i.e. where x = '1') this results in:

Item set 2
B → 1 •

and for the nonterminal E (i.e. where x = E) this results in:

Item set 3
S → E •
E → E • * B
E → E • + B

and for the nonterminal B (i.e. where x = B) this results in:

Item set 4
E → B •

Note that the closure does not add new items in all cases. In the new sets above, for example, there are no nonterminals following the dot. We continue this process until no more new item sets are found. For the item sets 1, 2, and 4 there will be no transitions since the dot is not in front of any symbol. For item set 3 we see that the dot is in front of the terminals '*' and '+'. For '*' the transition goes to:

Item set 5
E → E * • B
+ B → • 0
+ B → • 1

and for '+' the transition goes to:

Item set 6
E → E + • B
+ B → • 0
+ B → • 1
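The closure rule and the three-step transition procedure above can be sketched as follows. This is only an illustration under stated assumptions: an item is represented as a pair (rule number, dot position), the names GRAMMAR, closure, goto and show are hypothetical, and the dot is printed as "." to keep the output plain ASCII.

```python
# Augmented grammar, indexed by rule number: (left-hand side, right-hand side).
GRAMMAR = [
    ("S", ("E",)),
    ("E", ("E", "*", "B")),
    ("E", ("E", "+", "B")),
    ("E", ("B",)),
    ("B", ("0",)),
    ("B", ("1",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add the item B -> . w' for every nonterminal B that directly follows a dot."""
    result = set(items)
    pending = list(items)
    while pending:
        rule, dot = pending.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for number, (lhs, _) in enumerate(GRAMMAR):
                new_item = (number, 0)
                if lhs == rhs[dot] and new_item not in result:
                    result.add(new_item)
                    pending.append(new_item)
    return frozenset(result)


def goto(item_set, symbol):
    """Steps 1-3: pick items with the dot before symbol, advance the dot, close."""
    moved = {
        (rule, dot + 1)
        for rule, dot in item_set
        if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol
    }
    return closure(moved)


def show(item):
    """Render an item such as (2, 2) as 'E -> E + . B'."""
    rule, dot = item
    lhs, rhs = GRAMMAR[rule]
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


item_set_0 = closure({(0, 0)})        # begin state: closure of S -> . E
item_set_3 = goto(item_set_0, "E")
item_set_6 = goto(item_set_3, "+")
for item in sorted(item_set_6):
    print(show(item))
# E -> E + . B
# B -> . 0
# B -> . 1
```

Using frozenset for closed item sets makes them hashable and comparable, which is convenient when checking whether a transition leads to an item set that was already found.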

For item set 5 we have to consider the terminals '0' and '1' and the nonterminal B. For the terminals we see that the resulting closed item sets are equal to the already found item sets 1 and 2, respectively. For the nonterminal B the transition goes to:

Item set 7
E → E * B •

For item set 6 we also have to consider the terminals '0' and '1' and the nonterminal B. As before, the resulting item sets for the terminals are equal to the already found item sets 1 and 2. For the nonterminal B the transition goes to:

Item set 8
E → E + B •

These final item sets have no symbols beyond their dots, so no more new item sets are added and we are finished. The finite automaton, with item sets as its states, is shown below. The transition table for the automaton now looks as follows:

Item Set   *   +   0   1   E   B
0                  1   2   3   4
1
2
3          5   6
4
5                  1   2       7
6                  1   2       8
7
8

Constructing the action and goto tables

From this table and the found item sets we construct the action and goto table as follows:

1. The columns for nonterminals are copied to the goto table.
2. The columns for the terminals are copied to the action table as shift actions.
3. An extra column for '$' (end of input) is added to the action table that contains acc for every item set that contains S → E •.
4. If an item set i contains an item of the form A → w • and A → w is rule m with m > 0 then the row for state i in the action table is completely filled with the reduce action rm.

The reader may verify that this indeed results in the action and goto tables that were presented earlier on.

A note about LR(0) versus SLR and LALR parsing

Note that only step 4 of the above procedure produces reduce actions, and so all reduce actions must occupy an entire table row, causing the reduction to occur regardless of the next symbol in the input stream. This is why these are LR(0) parse tables: they don't do any lookahead (that is, they look ahead zero symbols) before deciding which reduction to perform. A grammar that needs lookahead to disambiguate reductions would require a parse table row containing different reduce actions in different columns, and the above procedure is not capable of creating such rows. Refinements to the LR(0) table construction procedure (such as SLR and LALR) are capable of constructing reduce actions that do not occupy entire rows. Therefore, they are capable of parsing more grammars than LR(0) parsers.

Conflicts in the constructed tables

The automaton is constructed in such a way that it is guaranteed to be deterministic. However, when reduce actions are added to the action table it can happen that the same cell is filled with a reduce action and a shift action (a shift-reduce conflict) or with two different reduce actions (a reduce-reduce conflict). It can be shown that when this happens the grammar is not an LR(0) grammar.
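The four construction steps, together with the conflict check, can be sketched in one function. This is a hedged illustration rather than a reference implementation: build_tables and its return shape are assumptions made for this example, items are pairs (rule number, dot position), and the order of the symbols list is chosen only so that the states are numbered as in the text.

```python
def build_tables(grammar, symbols):
    """grammar: list of (lhs, rhs tuple); rule 0 must be the added rule S -> E."""
    nonterminals = {lhs for lhs, _ in grammar}
    terminals = [s for s in symbols if s not in nonterminals] + ["$"]

    def closure(items):
        result, pending = set(items), list(items)
        while pending:
            rule, dot = pending.pop()
            rhs = grammar[rule][1]
            if dot < len(rhs) and rhs[dot] in nonterminals:
                for n, (lhs, _) in enumerate(grammar):
                    if lhs == rhs[dot] and (n, 0) not in result:
                        result.add((n, 0))
                        pending.append((n, 0))
        return frozenset(result)

    def goto(item_set, symbol):
        return closure({(r, d + 1) for r, d in item_set
                        if d < len(grammar[r][1]) and grammar[r][1][d] == symbol})

    # Find all reachable item sets and the transitions between them.
    states = [closure({(0, 0)})]
    transitions = {}
    i = 0
    while i < len(states):
        for symbol in symbols:
            target = goto(states[i], symbol)
            if target:
                if target not in states:
                    states.append(target)
                transitions[(i, symbol)] = states.index(target)
        i += 1

    action = {i: {} for i in range(len(states))}  # state -> terminal -> set of actions
    goto_table = {}
    for (i, symbol), j in transitions.items():
        if symbol in nonterminals:
            goto_table[(i, symbol)] = j                              # step 1
        else:
            action[i].setdefault(symbol, set()).add(("shift", j))    # step 2
    for i, item_set in enumerate(states):
        for rule, dot in item_set:
            if dot == len(grammar[rule][1]):                         # dot at the end
                if rule == 0:
                    action[i].setdefault("$", set()).add(("accept", None))  # step 3
                else:
                    for t in terminals:                              # step 4
                        action[i].setdefault(t, set()).add(("reduce", rule))
    # A cell holding more than one action is a conflict.
    conflicts = [(i, t) for i in action for t, acts in action[i].items() if len(acts) > 1]
    return action, goto_table, conflicts


GRAMMAR = [("S", ("E",)), ("E", ("E", "*", "B")), ("E", ("E", "+", "B")),
           ("E", ("B",)), ("B", ("0",)), ("B", ("1",))]
action, goto_table, conflicts = build_tables(GRAMMAR, ["0", "1", "E", "B", "*", "+"])
print(goto_table)   # {(0, 'E'): 3, (0, 'B'): 4, (5, 'B'): 7, (6, 'B'): 8}
print(conflicts)    # []

# A grammar that is not LR(0): E -> 1 E | 1 (augmented with S -> E).
BAD = [("S", ("E",)), ("E", ("1", "E")), ("E", ("1",))]
_, _, bad_conflicts = build_tables(BAD, ["1", "E"])
print(bad_conflicts)  # [(1, '1')]
```

Storing a set of actions per cell, instead of a single action, is what lets the sketch report conflicts instead of silently overwriting an entry.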

A small example of a non-LR(0) grammar with a shift-reduce conflict is:

(1) E → 1 E
(2) E → 1

One of the item sets we then find is:

Item set 1
E → 1 • E
E → 1 •
+ E → • 1 E
+ E → • 1

There is a shift-reduce conflict in this item set because in the cell in the action table for this item set and the terminal '1' there will be both a shift action to state 1 and a reduce action with rule 2.

A small example of a non-LR(0) grammar with a reduce-reduce conflict is:

(1) E → A 1
(2) E → B 2
(3) A → 1
(4) B → 1

In this case we obtain the following item set:

Item set 1
A → 1 •
B → 1 •

There is a reduce-reduce conflict in this item set because in the cells in the action table for this item set there will be both a reduce action for rule 3 and one for rule 4.

Both examples above can be solved by letting the parser use the follow set (see LL parser) of a nonterminal A to decide if it is going to use one of A's rules for a reduction; it will only use the rule A → w for a reduction if the next symbol on the input stream is in the follow set of A. This solution results in so-called Simple LR (SLR) parsers.
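To see why follow sets resolve both conflicts, the follow sets of the two example grammars can be computed directly. The sketch below is a simplified illustration: it assumes a grammar without empty right-hand sides, adds the rule S → E to each grammar as in the augmented-grammar convention, and uses the hypothetical names first_sets and follow_sets.

```python
def first_sets(grammar, nonterminals):
    """FIRST sets, assuming no rule has an empty right-hand side."""
    first = {n: set() for n in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            head = rhs[0]
            new = first[head] if head in nonterminals else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(grammar, start):
    """FOLLOW sets, assuming no rule has an empty right-hand side."""
    nonterminals = {lhs for lhs, _ in grammar}
    first = first_sets(grammar, nonterminals)
    follow = {n: set() for n in nonterminals}
    follow[start].add("$")                      # end of input follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, symbol in enumerate(rhs):
                if symbol not in nonterminals:
                    continue
                if i + 1 < len(rhs):            # something follows the nonterminal
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in nonterminals else {nxt}
                else:                           # nonterminal ends the rule
                    new = follow[lhs]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


# Shift-reduce example: E -> 1 E | 1
g1 = [("S", ("E",)), ("E", ("1", "E")), ("E", ("1",))]
print(follow_sets(g1, "S")["E"])        # {'$'}

# Reduce-reduce example: E -> A 1 | B 2, A -> 1, B -> 1
g2 = [("S", ("E",)), ("E", ("A", "1")), ("E", ("B", "2")),
      ("A", ("1",)), ("B", ("1",))]
f2 = follow_sets(g2, "S")
print(f2["A"], f2["B"])                 # {'1'} {'2'}
```

In the first grammar the follow set of E contains only '$', so the reduction with rule 2 is entered only in the '$' column and no longer collides with the shift on '1'. In the second grammar the follow sets of A and B are disjoint, so rules 3 and 4 end up in different columns.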