CMSC 451 Selected Lecture Notes

http://www.cs.umbc.edu/~squire/cs451_lect.html

CS451 Selected Lecture Notes
This is one big WEB page, used for printing
Note that this file is plain text and .gif files. The plain text spells out Greek letters because they can not be typed.
These are not intended to be complete lecture notes. Complicated figures or tables or formulas are included here in case they were not clear or not copied correctly in class. Information from the language definitions, automata definitions, computability definitions and class definitions is not duplicated here. Lecture numbers correspond to the syllabus numbering.

Contents
Lecture 1 Fast Summary Part 1
Lecture 2 Fast Summary Part 2
Lecture 3 DFA and regular expressions
Lecture 4 Nondeterministic Finite Automata NFA
Lecture 5 NFA with epsilon moves
Lecture 6 regular expression to NFA
Lecture 7 NFA to regular expression, Moore, Mealy
Lecture 8 Pumping Lemma for regular languages
Review basics of proofs
Lecture 9 Intersection of two languages, closure
Lecture 10 Decision Algorithms
Lecture 11 Quiz 1
Lecture 12 Myhill-Nerode minimization
Lecture 13 Formal Grammars, CFG
Lecture 14 Context Free Grammar derivation trees
Lecture 15 CFG 'simplification' algorithm
Lecture 16 Chomsky Normal Form
Lecture 17 Greibach Normal Form
Lecture 18 Inherently ambiguous CFL's, Project
Lecture 19 Quiz 2
Lecture 20 Push Down Automata
Lecture 21 CFG/CFL to NPDA
Lecture 22 NPDA to CFG/CFL
Lecture 23 Turing Machine Model
Lecture 24 CYK algorithm for CFG's
Lecture 25 Pumping Lemma for Context Free Languages


05/21/2011 09:31 AM


Lecture 25a CFL closure properties
Lecture 26 The Halting Problem
Lecture 27 Church Turing Thesis
Lecture 28 Review
Lecture 29 Final Exam
Other Links

Lecture 1 Fast Summary Part 1
A fast summary of what will be covered in this course (part 1). For reference, read through the language definitions and automata definitions.

Lecture 2 Fast Summary Part 2
A fast summary of what will be covered in this course (part 2). For reference, read through the computability definitions and complexity class definitions.

Lecture 3 DFA and regular expressions
Example of a Deterministic Finite Automaton, DFA

Machine Definition M = (Q, Sigma, delta, q0, F)

  Q     = { q0, q1, q2, q3, q4 }  the set of states (finite)
  Sigma = { 0, 1 }                the input string alphabet (finite)
  delta                           the state transition table - below
  q0    = q0                      the starting state
  F     = { q2, q4 }              the set of final states (accepting when in
                                  this state and no more input)

                     inputs
    delta   |    0    |    1    |
   ---------+---------+---------+
        q0  |   q3    |   q1    |
        q1  |   q1    |   q2    |
 states q2  |   q2    |   q2    |
        q3  |   q4    |   q3    |
        q4  |   q4    |   q4    |
         ^       ^         ^
         |       |         |
         |       +---------+-- every transition must have a state
         +-- every state must be listed


An exactly equivalent diagram description for the machine M.

Each circle is a unique state. The machine is in exactly one state and stays in that state until an input arrives. Connection lines with arrows represent a state transition from the present state to the next state for the input symbol(s) by the line.

L(M) is the notation for a Formal Language defined by a machine M.
Some of the shortest strings in L(M) = { 00, 11, 000, 001, 010, 101, 110, 111, 0000, 0001, 0010, 0011, 0100, 0101, 0110, 1001, ... }

In words, L is the set of strings over { 0, 1 } that contain at least two 0's starting with 0, or that contain at least two 1's starting with 1.

Every input sequence goes through a sequence of states, for example

   input    sequence of states
   00       q0 q3 q4
   11       q0 q1 q2
   000      q0 q3 q4 q4
   001      q0 q3 q4 q4
   010      q0 q3 q3 q4
   011      q0 q3 q3 q3
   0110     q0 q3 q3 q3 q4
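The state sequences listed above can be reproduced mechanically. The following is a minimal sketch, not the course's dfa simulator: the transition table is copied from the definition of M, and the names DELTA, run_dfa and accepts are illustrative choices made here.

```python
# Transition table delta for the example machine M, copied from the notes.
DELTA = {
    ("q0", "0"): "q3", ("q0", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
    ("q3", "0"): "q4", ("q3", "1"): "q3",
    ("q4", "0"): "q4", ("q4", "1"): "q4",
}
START = "q0"
FINAL = {"q2", "q4"}


def run_dfa(word):
    """Return the list of states visited while reading word, start state first."""
    states = [START]
    for symbol in word:
        states.append(DELTA[(states[-1], symbol)])
    return states


def accepts(word):
    """A word is accepted when the machine ends in a final state."""
    return run_dfa(word)[-1] in FINAL


if __name__ == "__main__":
    for w in ["00", "11", "000", "001", "010", "011", "0110"]:
        verdict = "accept" if accepts(w) else "reject"
        print(w, " ".join(run_dfa(w)), verdict)
```

Because delta has exactly one entry for every (state, symbol) pair, the lookup never fails and each input has exactly one state sequence.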

More information on DFA

Definition of a Regular Expression
----------------------------------

  A regular expression may be the null string,
     r = epsilon

  A regular expression may be an element of the input alphabet, sigma,
     r = a

  A regular expression may be the union of two regular expressions,
     r = r1 + r2

  A regular expression may be the concatenation (no symbol) of two regular expressions,
     r = r1 r2

  A regular expression may be the Kleene closure (star) of a regular expression
     r = r1*   (the asterisk should be a superscript, but this is plain text)

  A regular expression may be a regular expression in parenthesis
     r = (r1)

  Nothing is a regular expression unless it is constructed with only the rules given above.

The language represented or generated by a regular expression is a Regular Language, denoted L(r).

The regular expression for the machine M above is

  r = (1(0*)1(0+1)*)+(0(1*)0(0+1)*)

Later we will give an algorithm for generating a regular expression from a machine definition. For simple DFA's, start with each accepting state and work back to the start state writing the regular expression. The union of these regular expressions is the regular expression for the machine.

For every DFA there is a regular language and for every regular language there is a regular expression. Thus a DFA can be converted to a regular expression and a regular expression can be converted to a DFA.
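As a sanity check on the regular expression for M, it can be compared against the description of L(M) in words on every short string. This is only a sketch under one assumption: the notes' union operator + is written as | in Python's re syntax. The function names are illustrative.

```python
import re
from itertools import product

# r = (1(0*)1(0+1)*)+(0(1*)0(0+1)*) with the notes' "+" (union) written as "|".
PATTERN = re.compile(r"(1(0*)1(0|1)*)|(0(1*)0(0|1)*)")


def regex_accepts(word):
    """True when the whole word is generated by the regular expression."""
    return PATTERN.fullmatch(word) is not None


def in_language(word):
    """L(M) in words: starts with 0 and has at least two 0's,
    or starts with 1 and has at least two 1's."""
    return bool(word) and word.count(word[0]) >= 2


def all_words(max_len):
    """Every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            yield "".join(letters)


if __name__ == "__main__":
    mismatches = [w for w in all_words(8) if regex_accepts(w) != in_language(w)]
    print("mismatches:", mismatches)
```

Agreement on all strings up to a fixed length is evidence, not a proof; the proof is the conversion algorithm given later in the notes.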

Given a DFA and one or more strings, determine if the string(s) are accepted by the DFA. This may be error prone and time consuming to do by hand. Fortunately, there is a program available to do this for you.

On linux.gl.umbc.edu do the following:

  ln -s /afs/umbc.edu/users/s/q/squire/pub/linux/dfa dfa
  cp /afs/umbc.edu/users/s/q/squire/pub/download/ab_b.dfa .
  dfa < ab_b.dfa    # or  dfa < ab_b.dfa > ab_a.out

On irix.gl.umbc.edu do the following:

  ln -s /afs/umbc.edu/users/s/q/squire/pub/dfa dfa
  cp /afs/umbc.edu/users/s/q/squire/pub/download/ab_b.dfa .
  dfa < ab_b.dfa    # or  dfa < ab_b.dfa > ab_a.out

Full information is available at Simulators
The source code for the family of simulators is available.

Lecture 4 Nondeterministic Finite Automata, NFA
Important! nondeterministic has nothing to do with random, nondeterministic implies parallelism.

The difference between a DFA and a NFA is that the state transition table, delta, for a DFA has exactly one target state but for a NFA has a set, possibly empty (phi), of target states.

Example of a NFA, Nondeterministic Finite Automata given by
  a) Machine Definition M = (Q, sigma, delta, q0, F)
  b) Equivalent regular expression
  c) Equivalent state transition diagram
and example tree of states for input string 0100011
and an equivalent DFA, Deterministic Finite Automata for the first 3 states.

a) Machine Definition M = (Q, sigma, delta, q0, F)

  Q     = { q0, q1, q2, q3, q4 }  the set of states
  sigma = { 0, 1 }                the input string alphabet
  delta                           the state transition table
  q0    = q0                      the starting state
  F     = { q2, q4 }              the set of final states (accepting when in
                                  this state and no more input)

                     inputs
    delta   |    0    |    1    |
   ---------+---------+---------+
        q0  | {q0,q3} | {q0,q1} |
        q1  |   phi   |  {q2}   |
 states q2  |  {q2}   |  {q2}   |
        q3  |  {q4}   |   phi   |
        q4  |  {q4}   |  {q4}   |
         ^       ^         ^
         |       |         |
         |       +---------+-- a set of states, phi means empty set
         +-- every state must be listed

b) The equivalent regular expression is (0+1)*(00+11)(0+1)*

This NFA represents the language L = all strings over {0,1} that have at least two consecutive 0's or 1's.

c) Equivalent NFA state transition diagram.

Note that state q3 does not go anywhere for an input of 1. We use the terminology that the path "dies" if in q3 getting an input 1.

The tree of states this NFA is in for the input 0100011

  q0                                 input
  | \
  q3  q0                               0
 dies | \
      q1  q0                           1
     dies | \
          q3  q0                       0
          |   | \
          q4  q3  q0                   0
          |   |   | \
          q4  q4  q3  q0               0
          |   |  dies | \
          q4  q4      q1  q0           1
          |   |       |   | \
          q4  q4      q2  q1  q0       1
          ^   ^       ^
          |   |       |
          accepting paths in NFA tree

Construct a DFA equivalent to the NFA above using just the first three rows of delta (for brevity, consider q3 and q4 do not exist).
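The tree of states can be followed in code by keeping the set of states that are still alive after each input symbol. The sketch below is an illustration, not the course's nfa simulator; a missing table entry plays the role of phi, and the names DELTA, step, trace and accepts are chosen here. A set records each state once, so the two q4 paths of the tree collapse into a single q4.

```python
# NFA transition table from the notes; a missing entry means phi (the path dies).
DELTA = {
    ("q0", "0"): {"q0", "q3"}, ("q0", "1"): {"q0", "q1"},
    ("q1", "1"): {"q2"},
    ("q2", "0"): {"q2"}, ("q2", "1"): {"q2"},
    ("q3", "0"): {"q4"},
    ("q4", "0"): {"q4"}, ("q4", "1"): {"q4"},
}
START = "q0"
FINAL = {"q2", "q4"}


def step(states, symbol):
    """All states reachable from any live state on one input symbol."""
    reached = set()
    for state in states:
        reached |= DELTA.get((state, symbol), set())
    return reached


def trace(word):
    """Return the set of live states before the input and after each symbol."""
    current = {START}
    history = [current]
    for symbol in word:
        current = step(current, symbol)
        history.append(current)
    return history


def accepts(word):
    """Accept when at least one path ends in a final state."""
    return bool(trace(word)[-1] & FINAL)


if __name__ == "__main__":
    for level in trace("0100011"):
        print(sorted(level))
    print("accepted:", accepts("0100011"))
```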

The DFA machine is M' = (Q', sigma, delta', q0', F')

The set of states is Q' = 2**Q, the power set of Q
  = { phi, {q0}, {q1}, {q2}, {q0,q1}, {q0,q2}, {q1,q2}, {q0,q1,q2} }

Note: read the eight elements of the set Q' as names of states of M';
      use [ ] in place of { } if you prefer.

sigma is the same, sigma = { 0, 1 }

The state transition table delta' is given below

The starting state is the set containing only q0, q0' = {q0}

The set of final states is a set of sets
  F' = { {q2}, {q0,q2}, {q1,q2}, {q0,q1,q2} }

Note: Include any group that contains a final state of the NFA.

                          sigma
     delta'     |      0      |      1      |
    ------------+-------------+-------------+
     phi        |    phi      |    phi      |  never reached
     {q0}       |   {q0}      |  {q0,q1}    |
     {q1}       |    phi      |   {q2}      |  never reached
 Q'  {q2}       |   {q2}      |   {q2}      |  never reached
     {q0,q1}    |   {q0}      | {q0,q1,q2}  |
     {q0,q2}    |  {q0,q2}    | {q0,q1,q2}  |
     {q1,q2}    |   {q2}      |   {q2}      |  never reached
     {q0,q1,q2} |  {q0,q2}    | {q0,q1,q2}  |

Note: Some of the states in the DFA may be unreachable yet must be specified. Later we will use Myhill minimization.

DFA (not minimized) equivalent to lower branch of NFA above.

The delta' was constructed directly from delta. Using the notation f'({q0},0) = f(q0,0) to mean: in delta' in state {q0} with input 0 goes to the state shown in delta with input 0. Take the union of all such states. Further notation, phi is the empty set so phi union the set A is just the set A.
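The rule "take the union of all such states" is enough to generate every row of delta'. The following sketch applies it to the first three rows of the NFA table (q3 and q4 left out, as in the notes) and marks the subsets that cannot be reached from {q0}. All names are illustrative, and this is one possible way to code the subset construction rather than the method of the course simulators.

```python
from itertools import combinations

# First three rows of the NFA delta, with q3 and q4 dropped as in the notes.
NFA_DELTA = {
    ("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"},
    ("q1", "1"): {"q2"},
    ("q2", "0"): {"q2"}, ("q2", "1"): {"q2"},
}
STATES = ["q0", "q1", "q2"]
SIGMA = ["0", "1"]


def power_set(items):
    """All subsets of items, smallest first, as frozensets."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def dfa_delta(subset, symbol):
    """delta'(subset, symbol) = union of delta(q, symbol) over q in subset."""
    target = set()
    for state in subset:
        target |= NFA_DELTA.get((state, symbol), set())
    return frozenset(target)


def reachable(start):
    """Subsets that can actually be reached from the DFA start state."""
    seen = {start}
    todo = [start]
    while todo:
        current = todo.pop()
        for symbol in SIGMA:
            nxt = dfa_delta(current, symbol)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def name(subset):
    """Print a subset the way the notes write it."""
    return "{" + ",".join(sorted(subset)) + "}" if subset else "phi"


if __name__ == "__main__":
    live = reachable(frozenset({"q0"}))
    for subset in power_set(STATES):
        row = [name(dfa_delta(subset, a)) for a in SIGMA]
        note = "" if subset in live else "never reached"
        print(name(subset), row, note)
```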

Some samples:

  f'({q0,q1},0) = f(q0,0) union f(q1,0) = {q0}
  f'({q0,q1},1) = f(q0,1) union f(q1,1) = {q0,q1,q2}
  f'({q0,q2},0) = f(q0,0) union f(q2,0) = {q0,q2}
  f'({q0,q2},1) = f(q0,1) union f(q2,1) = {q0,q1,q2}

The sequence of states is unique for a DFA, so for the same input as above 0100011 the sequence of states is

  {q0} 0 {q0} 1 {q0,q1} 0 {q0} 0 {q0} 0 {q0} 1 {q0,q1} 1 {q0,q1,q2}

This sequence does not have any states involving q3 or q4 because just a part of the above NFA was converted to a DFA. This DFA does not accept the string 00 whereas the NFA above does accept 00.

Given a NFA and one or more strings, determine if the string(s) are accepted by the NFA. This may be error prone and time consuming to do by hand. Fortunately, there is a program available to do this for you.

On linux.gl.umbc.edu do the following:

  ln -s /afs/umbc.edu/users/s/q/squire/pub/linux/nfa nfa
  cp /afs/umbc.edu/users/s/q/squire/pub/download/fig2_7.nfa .
  nfa < fig2_7.nfa    # or  nfa < fig2_7.nfa > fig2_7.out

On irix.gl.umbc.edu do the following:

  ln -s /afs/umbc.edu/users/s/q/squire/pub/nfa nfa
  cp /afs/umbc.edu/users/s/q/squire/pub/download/fig2_7.nfa .
  nfa < fig2_7.nfa    # or  nfa < fig2_7.nfa > fig2_7.out

Full information is available at Simulators
The source code for the family of simulators is available.

Lecture 5 NFA with epsilon moves
Definition and example of a NFA with epsilon transitions. Remember, epsilon is the zero length string, so it can be anywhere in the input string, front, back, between any symbols.

There is a conversion algorithm from a NFA with epsilon transitions to a NFA without epsilon transitions.

Consider the NFA-epsilon move machine M = { Q, sigma, delta, q0, F}

  Q     = { q0, q1, q2 }
  sigma = { a, b, c } and epsilon moves
  q0    = q0
  F     = { q2 }
  delta   the state transition table, with columns for sigma plus epsilon

   delta  |  a   |  b   |  c   |epsilon
    ------+------+------+------+-------
      q0  | {q0} | phi  | phi  | {q1}
    ------+------+------+------+-------
      q1  | phi  | {q1} | phi  | {q2}
    ------+------+------+------+-------
      q2  | phi  | phi  | {q2} | phi
    ------+------+------+------+-------

The language accepted by the above NFA with epsilon moves is the set of strings over {a,b,c} including the null string and all strings with any number of a's followed by any number of b's followed by any number of c's. ("any number" includes zero)

Now convert the NFA with epsilon moves to a NFA M = ( Q', sigma, delta', q0', F')

First determine the states of the new machine, Q' = the epsilon closure of the states in the NFA with epsilon moves. There will be the same number of states but the names can be constructed by writing the state name as the set of states in the epsilon closure. The epsilon closure is the initial state and all states that can be reached by one or more epsilon moves.

Thus q0 in the NFA-epsilon becomes {q0,q1,q2} because the machine can move from q0 to q1 by an epsilon move, then check q1 and find that it can move from q1 to q2 by an epsilon move.

q1 in the NFA-epsilon becomes {q1,q2} because the machine can move from q1 to q2 by an epsilon move.

q2 in the NFA-epsilon becomes {q2} just to keep the notation the same. q2 can go nowhere except q2, that is what phi means, on an epsilon move.

We do not show the epsilon transition of a state to itself here, but, beware, we will take into account the state to itself epsilon transition when converting NFA's to regular expressions.
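The epsilon closure described above is a small reachability search over the epsilon column only. A minimal sketch, with the epsilon column copied from the table and the names EPSILON_MOVES and epsilon_closure chosen here for illustration:

```python
# The epsilon column of delta from the notes; phi is written as an empty set.
EPSILON_MOVES = {
    "q0": {"q1"},
    "q1": {"q2"},
    "q2": set(),
}


def epsilon_closure(state):
    """The state itself plus every state reachable by one or more epsilon moves."""
    closure = {state}
    todo = [state]
    while todo:
        current = todo.pop()
        for nxt in EPSILON_MOVES.get(current, set()):
            if nxt not in closure:
                closure.add(nxt)
                todo.append(nxt)
    return closure


if __name__ == "__main__":
    for q in ["q0", "q1", "q2"]:
        print(q, "becomes", sorted(epsilon_closure(q)))
```

The search follows chains of epsilon moves, which is why q0 picks up q2 even though there is no direct epsilon move from q0 to q2.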

The initial state of our new machine is {q0,q1,q2}, the epsilon closure of q0.

The final state(s) of our new machine is the new state(s) that contain a state symbol that was a final state in the original machine.

The new machine accepts the same language as the old machine, thus same sigma.

So far we have for our new NFA

  Q'    = { {q0,q1,q2}, {q1,q2}, {q2} }   or renamed { qx, qy, qz }
  sigma = { a, b, c }
  F'    = { {q0,q1,q2}, {q1,q2}, {q2} }   or renamed { qx, qy, qz }
  q0    = {q0,q1,q2}                      or renamed qx

                              inputs
    delta'         |      a       |      b       |      c
  -----------------+--------------+--------------+--------------
  qx or {q0,q1,q2} |              |              |
  -----------------+--------------+--------------+--------------
  qy or {q1,q2}    |              |              |
  -----------------+--------------+--------------+--------------
  qz or {q2}       |              |              |
  -----------------+--------------+--------------+--------------

Now we fill in the transitions. Remember that a NFA has transition entries that are sets. Further, the names in the transition entry sets must be only the state names from Q'.

Very carefully consider each old machine transition in the first row. You can ignore any "phi" entries and ignore the "epsilon" column. In the old machine delta(q0,a)=q0 thus in the new machine delta'({q0,q1,q2},a)={q0,q1,q2}. This is just because the new machine accepts the same language as the old machine and must at least have the same transitions for the new state names.

                              inputs
    delta'         |      a       |      b       |      c
  -----------------+--------------+--------------+--------------
  qx or {q0,q1,q2} | {{q0,q1,q2}} |              |
  -----------------+--------------+--------------+--------------
  qy or {q1,q2}    |              |              |
  -----------------+--------------+--------------+--------------
  qz or {q2}       |              |              |
  -----------------+--------------+--------------+--------------

No more entries go under input a in the first row because old delta(q1,a)=phi, delta(q2,a)=phi.

Now consider the input b in the first row, delta(q0,b)=phi, delta(q1,b)={q1} and delta(q2,b)=phi. The reason we considered q0, q1 and q2 in the old machine was because our new state has symbols q0, q1 and q2 in the new state name from the epsilon closure. Since q1 is in {q0,q1,q2} and delta(q1,b)=q1 then delta'({q0,q1,q2},b)={q1,q2}. WHY {q1,q2} ?, because {q1,q2} is the new machine's name for the old machine's name q1. Just compare the zeroth column of delta to delta'. So we have

                              inputs
    delta'         |      a       |      b       |      c
  -----------------+--------------+--------------+--------------
  qx or {q0,q1,q2} | {{q0,q1,q2}} |  {{q1,q2}}   |
  -----------------+--------------+--------------+--------------
  qy or {q1,q2}    |              |              |
  -----------------+--------------+--------------+--------------
  qz or {q2}       |              |              |
  -----------------+--------------+--------------+--------------

Now, because our new qx state has a symbol q2 in its name and delta(q2,c)=q2 is in the old machine, the new name for the old q2, which is qz or {q2}, is put into the input c transition in row 1.

                              inputs
    delta'         |      a       |      b       |      c
  -----------------+--------------+--------------+--------------
  qx or {q0,q1,q2} | {{q0,q1,q2}} |  {{q1,q2}}   | {{q2}} or qz
  -----------------+--------------+--------------+--------------
  qy or {q1,q2}    |              |              |
  -----------------+--------------+--------------+--------------
  qz or {q2}       |              |              |
  -----------------+--------------+--------------+--------------

Now, tediously, move on to row two, ... . You are considering all transitions in the old machine, delta, for all old machine state symbols in the name of the new machine's states. Find the old machine state that results from an input and translate the old machine state to the corresponding new machine state name and put the new machine state name in the set in delta'. Below are the "long new state names" and the renamed state names in delta'.

                              inputs
    delta'         |      a       |      b       |      c
  -----------------+--------------+--------------+----------------
  qx or {q0,q1,q2} | {{q0,q1,q2}} |  {{q1,q2}}   | {{q2}} or {qz}
  -----------------+--------------+--------------+----------------
  qy or {q1,q2}    |     phi      |  {{q1,q2}}   | {{q2}} or {qz}
  -----------------+--------------+--------------+----------------
  qz or {q2}       |     phi      |     phi      | {{q2}} or {qz}
  -----------------+--------------+--------------+----------------
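The row by row filling of delta' can be sketched in code. This follows the renaming procedure of these notes rather than any library routine, and it takes delta(q1,b) = {q1}, the value used in the derivation. The names CLOSURE_NAME, SHORT_NAME and delta_prime are made up for this illustration.

```python
# Old machine: non-epsilon transitions (missing entries are phi) and epsilon moves.
# delta(q1, b) = {q1} is the value used in the derivation in the notes.
DELTA = {("q0", "a"): {"q0"}, ("q1", "b"): {"q1"}, ("q2", "c"): {"q2"}}
EPSILON_MOVES = {"q0": {"q1"}, "q1": {"q2"}}
OLD_STATES = ["q0", "q1", "q2"]
SIGMA = ["a", "b", "c"]


def epsilon_closure(state):
    """The state plus everything reachable from it by epsilon moves."""
    closure, todo = {state}, [state]
    while todo:
        for nxt in EPSILON_MOVES.get(todo.pop(), set()):
            if nxt not in closure:
                closure.add(nxt)
                todo.append(nxt)
    return frozenset(closure)


# New name of each old state: its epsilon closure, also renamed qx, qy, qz.
CLOSURE_NAME = {q: epsilon_closure(q) for q in OLD_STATES}
SHORT_NAME = dict(zip((CLOSURE_NAME[q] for q in OLD_STATES), ["qx", "qy", "qz"]))


def delta_prime(new_state, symbol):
    """For every old state symbol inside the new state's name, look up the old
    transition and translate each old target to its new state name."""
    targets = set()
    for old in new_state:
        for old_target in DELTA.get((old, symbol), set()):
            targets.add(CLOSURE_NAME[old_target])
    return targets


if __name__ == "__main__":
    for q in OLD_STATES:
        new_state = CLOSURE_NAME[q]
        row = []
        for symbol in SIGMA:
            entry = sorted(SHORT_NAME[t] for t in delta_prime(new_state, symbol))
            row.append("{" + ",".join(entry) + "}" if entry else "phi")
        print(SHORT_NAME[new_state], row)
```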

                 inputs
    delta' |  a   |  b   |  c        <-- input alphabet sigma
        ---+------+------+-----
     /  qx | {qx} | {qy} | {qz}
    /   ---+------+------+-----
  Q'    qy | phi  | {qy} | {qz}
    \   ---+------+------+-----
     \  qz | phi  | phi  | {qz}
        ---+------+------+-----

The figure above labeled NFA shows this state transition table.

It seems rather trivial to add the column for epsilon transitions, but we will make good use of this in converting regular expressions to machines.

  regular-expression -> NFA-epsilon -> NFA -> DFA.

Lecture 6 Construction: machine from regular expression
Given a regular expression there is an associated regular language L(r). Since there is a finite automata for every regular language, there is a machine, M, for every regular expression such that L(M) = L(r).

The constructive proof provides an algorithm for constructing a machine, M, from a regular expression r. The six constructions below correspond to the cases:

1) The entire regular expression is the null string, i.e. L={epsilon}
   r = epsilon

2) The entire regular expression is empty, i.e. L=phi
   r = phi

3) An element of the input alphabet, sigma, is in the regular expression
   r = a   where a is an element of sigma.

4) Two regular expressions are joined by the union operator, +
   r1 + r2

5) Two regular expressions are joined by concatenation (no symbol)
   r1 r2

6) A regular expression has the Kleene closure (star) applied to it
   r*

The construction proceeds by using 1) or 2) if either applies.

The construction first converts all symbols in the regular expression using construction 3).

Then working from inside outward, left to right at the same scope, apply the one construction that applies from 4) 5) or 6).
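Constructions 3) through 6) can be sketched as small functions that each return an NFA-epsilon fragment with one start state and one final state. The figures are not reproduced in this text, so the exact arrow layout below (new start and final states linked by epsilon moves, as in the usual textbook construction) is an assumption, and every function name is illustrative. Epsilon is represented by None.

```python
from itertools import count

_ids = count()  # source of fresh state numbers


def symbol(a):
    """Construction 3): one arrow labeled a."""
    s, f = next(_ids), next(_ids)
    return s, f, [(s, a, f)]


def union(m1, m2):
    """Construction 4): new start and final states joined by epsilon moves."""
    (s1, f1, e1), (s2, f2, e2) = m1, m2
    s, f = next(_ids), next(_ids)
    return s, f, e1 + e2 + [(s, None, s1), (s, None, s2),
                            (f1, None, f), (f2, None, f)]


def concat(m1, m2):
    """Construction 5): final of the first linked to start of the second."""
    (s1, f1, e1), (s2, f2, e2) = m1, m2
    return s1, f2, e1 + e2 + [(f1, None, s2)]


def star(m):
    """Construction 6): allow skipping the fragment and repeating it."""
    s1, f1, e1 = m
    s, f = next(_ids), next(_ids)
    return s, f, e1 + [(s, None, s1), (f1, None, f),
                       (s, None, f), (f1, None, s1)]


def closure(states, edges):
    """Epsilon closure of a set of states."""
    seen, todo = set(states), list(states)
    while todo:
        current = todo.pop()
        for src, label, dst in edges:
            if src == current and label is None and dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return seen


def accepts(machine, word):
    """Simulate the NFA-epsilon by tracking the set of live states."""
    start, final, edges = machine
    current = closure({start}, edges)
    for ch in word:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = closure(moved, edges)
    return final in current


def any01():
    """(0+1)* built fresh each time so fragments never share states."""
    return star(union(symbol("0"), symbol("1")))


# r = (0+1)* (00+11) (0+1)*, built from inside outward.
MIDDLE = union(concat(symbol("0"), symbol("0")),
               concat(symbol("1"), symbol("1")))
R = concat(concat(any01(), MIDDLE), any01())

if __name__ == "__main__":
    for w in ["0100011", "00", "0101", ""]:
        print(repr(w), accepts(R, w))
```

Each fragment is built from fresh states because reusing one fragment in two places would merge paths that the regular expression keeps separate.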

Note: add one arrow head to figure 6) going into the top of the second circle.

The result is a NFA with epsilon moves. This NFA can then be converted to a NFA without epsilon moves. Further conversion can be performed to get a DFA. All these machines have the same language as the regular expression from which they were constructed.

The construction covers all possible cases that can occur in any regular expression. Because of the generality there are many more states generated than are necessary. The unnecessary states are joined by epsilon transitions. Very careful compression may be performed. For example, the fragment regular expression aba would be

      a       e       b       e       a
  q0 ---> q1 ---> q2 ---> q3 ---> q4 ---> q5

with e used for epsilon, this can be trivially reduced to

      a       b       a
  q0 ---> q1 ---> q2 ---> q3

A careful reduction of unnecessary states requires use of the Myhill-Nerode Theorem of section 3.4 in 1st Ed. or section 4.4 in 2nd Ed. This will provide a DFA that has the minimum number of states. Within a renaming of the states and reordering of the delta, state transition table, all minimum machines of a DFA are identical.

Conversion of a NFA to a regular expression was started in this lecture and finished in the next lecture. The notes are in lecture 7.

Example: r = (0+1)* (00+11) (0+1)*

Solution: find the primary operator(s) that are concatenation or union. In this case, the two outermost are concatenation, giving, crudely:

     //---------------\     /----------------\\     /-----------------\
  -->|| <> M((0+1)*) <> |->| <> M((00+11)) <> ||->| <> M((0+1)*) <<>> |
     \\---------------/     \----------------//     \-----------------/

There is exactly one start "-->" and exactly one final state "<<>>"
The unlabeled arrows should be labeled with epsilon.

Now recursively decompose each internal regular expression.

Lecture 7 Convert NFA to regular expression
Conversion algorithm from a NFA to a regular expression.

Start with the transition table for the NFA with the following state naming conventions:
  the first state is 1 or q1 or s1 which is the starting state.
  states are numbered consecutively, 1, 2, 3, ... n

The transition table is a typical NFA where the table entries are sets of states and phi the empty set is allowed.

The set F of final states must be known.

We call the variable r a regular expression.

We can talk about r being the regular expression with i,j subscripts, written r_ij in this plain text.

Note r_12 is just a (possibly) different regular expression from r_53

Because we need multiple columns in a table we are going to build, we also use a superscript in the naming of regular expression, written r^k_ij in this plain text.

  r^1_12   r^3_64   r^k_1k   r^(k-1)_ij   are just names of different regular expressions

We are going to build a table with n^2 rows and n+1 columns labeled
(r^k_ij is written here for r with superscript k and subscripts ij)

        |  k=0   |  k=1   |  k=2   | ... |  k=n
   -----+--------+--------+--------+-----+--------
   r_11 | r^0_11 | r^1_11 | r^2_11 | ... | r^n_11   Only build column n
   -----+--------+--------+--------+-----+--------  for 1,final state
   r_12 | r^0_12 | r^1_12 | r^2_12 | ... | r^n_12
   -----+--------+--------+--------+-----+--------  The final regular expression
   r_13 | r^0_13 | r^1_13 | r^2_13 | ... | r^n_13   is then the union, +, of
   -----+--------+--------+--------+-----+--------  the column n
   r_21 | r^0_21 | r^1_21 | r^2_21 | ... |
   -----+--------+--------+--------+-----+--------
   r_22 | r^0_22 | r^1_22 | r^2_22 | ... |
   -----+--------+--------+--------+-----+--------
   r_23 | r^0_23 | r^1_23 | r^2_23 | ... |
   -----+--------+--------+--------+-----+--------
   r_31 | r^0_31 | r^1_31 | r^2_31 | ... |
   -----+--------+--------+--------+-----+--------
   r_32 | r^0_32 | r^1_32 | r^2_32 | ... |
   -----+--------+--------+--------+-----+--------
   r_33 | r^0_33 | r^1_33 | r^2_33 | ... |
     ^
     |- Note n^2 rows, all pairs of numbers from 1 to n

Now, build the table entries for the k=0 column:

   r^0_ij = +{ x | delta(q_i,x) = q_j }             for i /= j

   r^0_ij = +{ x | delta(q_i,x) = q_j } + epsilon   for i = j

where delta is the transition table function,
      x is some symbol from sigma,
      the q's are states

r^0_ij could be phi, epsilon, a, 0+1, or a+b+d+epsilon

notice there are no Kleene Star or concatenation in this column

Next, build the k=1 column:

   r^1_ij = r^0_i1 (r^0_11)* r^0_1j + r^0_ij
                                       note: all items are from the previous column

Next, build the k=2 column:

   r^2_ij = r^1_i2 (r^1_22)* r^1_2j + r^1_ij
                                       note: all items are from the previous column

Then, build the rest of the k=k columns:

   r^k_ij = r^(k-1)_ik (r^(k-1)_kk)* r^(k-1)_kj + r^(k-1)_ij
                                       note: all items are from previous column

Finally, for final states p, q, r the regular expression is

   r^n_1p + r^n_1q + r^n_1r

Note that this is from a constructive proof that every NFA has a language for which there is a corresponding regular expression.

Some minimization rules for regular expressions
These can be applied at every step.

Note: phi is the empty set
      epsilon is the zero length string
      0, 1, a, b, c, are symbols in sigma
      x is a variable or regular expression
      ( ... )( ... ) is concatenation
      ( ... ) + ( ... ) is union
      ( ... )* is the Kleene Closure = Kleene Star

  (phi)(x) = (x)(phi) = phi

  (epsilon)(x) = (x)(epsilon) = x
  (phi) + (x) = (x) + (phi) = x
  x + x = x
  (epsilon)* = (epsilon)(epsilon) = epsilon
  (x)* + (epsilon) = (x)* = x*
  (x + epsilon)* = x*
  x* (a+b) + (a+b) = x* (a+b)
  x* y + y = x* y
  (x + epsilon)x* = x* (x + epsilon) = x*
  (x+epsilon)(x+epsilon)* (x+epsilon) = x*

Now for an example:
Given M=(Q, sigma, delta, q0, F) as

   delta   |  a   |  b   |  c
   --------+------+------+-----
     q1    | {q2} | {q2} | {q1}
   --------+------+------+-----
     q2    | phi  | phi  | phi
   --------+------+------+-----

   Q     = { q1, q2 }
   sigma = { a, b, c }
   q0    = q1
   F     = { q2 }

        |    k=0      |   k=1  (using e for epsilon)
   -----+-------------+------------------------------------
   r_11 | c + epsilon | (c+e)(c+e)* (c+e) + (c+e) = c*
   -----+-------------+------------------------------------
   r_12 | a + b       | (c+e)(c+e)* (a+b) + (a+b) = c* (a+b)
   -----+-------------+------------------------------------
   r_21 | phi         | phi (c+e)* (c+e) + phi = phi
   -----+-------------+------------------------------------
   r_22 | epsilon     | phi (c+e)* (a+b) + e = e
   -----+-------------+------------------------------------

        |    k=0      |   k=1    |   k=2  (using e for epsilon)
   -----+-------------+----------+-------------------------
   r_11 | c + epsilon | c*       |
   -----+-------------+----------+-------------------------
   r_12 | a + b       | c* (a+b) | c* (a+b)(e)* (e) + c* (a+b)   only final state
   -----+-------------+----------+-------------------------
   r_21 | phi         | phi      |
   -----+-------------+----------+-------------------------
   r_22 | epsilon     | e        |
   -----+-------------+----------+-------------------------

the final regular expression minimizes to c* (a+b)

Additional topics include Moore Machines and Mealy Machines

Lecture 8 Pumping Lemma for Regular Languages
Review of Basics of Proofs.

The Pumping Lemma is generally used to prove a language is not regular.

If a DFA, NFA or NFA-epsilon machine can be constructed to exactly accept a language, then the language is a Regular Language.

If a regular expression can be constructed to exactly generate the strings in a language, then the language is regular.

If a regular grammar can be constructed to exactly generate the strings in a language, then the language is regular.

To prove a language is not regular requires a specific definition of the language and the use of the Pumping Lemma for Regular Languages.

A note about proofs using the Pumping Lemma:

Given: Formal statements A and B. A implies B.
If you can prove B is false, then you have proved A is false.

For the Pumping Lemma, the statement "A" is "L is a Regular Language". The statement "B" is a statement from the Predicate Calculus.

(This is a plain text file that uses words for the upside down A that reads 'for all' and the backwards E that reads 'there exists')

Formal statement of the Pumping Lemma:

  L is a Regular Language implies
  (there exists n)(for all z)[z in L and |z|>=n implies
    {(there exists u,v,w)(z = uvw and |uv|<=n and |v|>=1 and
       (for all i>=0)(u v^i w is in L) )}]

The two commonest ways to use the Pumping Lemma to prove a language is NOT regular are:

a) show that there is no possible n for the (there exists n), this is usually accomplished by showing a contradiction such as (n+1)(n+1) < n*n+n

b) show there is no way to partition z into u, v and w such that u v^i w is in L, typically for a value i=0 or i=2. Be sure to cover all cases by argument or enumerating cases.

Note: The pumping lemma only applies to languages (sets of strings) with infinite cardinality. A DFA can be constructed for any finite set of strings. This comes from the fact that regular languages are closed under union. Use the regular expression to NFA 'union' construction.

Notation: the string having n a's followed by n b's is a b with each letter carrying a superscript n, which is reduced to one line by writing a^n b^n

Languages that are not regular:
  L = { a^n b^n  n>0 }
  L = { a^f1(n) b^f2(n) n0 } for any function f(n)< k*n+c for all constants k and c
  L = { a^(n*n) n>0 }  also applies to n a prime(n log n), 2^n, n!
  L = { a^n b^k  n>0 k>n }  can not save count of a's to check b's k>n
  L = { a^n b^k+n  n>0 k>1 }  same language as above

Languages that are regular:
  L = { a^n  n>=0 }  this is just r = a*
  L = { a^n b^k  n>0 k>0 }  no relation between n and k, r = a a* b b*
  L = { a^(37*n+511)  n>0 }  511 states in series, 37 states in loop

Lecture 9 Intersection and other closures
A class of languages is simply a set of languages with some property that determines if a language is in the class (set).

The class of languages called regular languages is the set of all languages that are regular. Remember, a language is regular if it is accepted by some DFA, NFA, NFA-epsilon, regular expression or regular grammar.

A single language is a set of strings over a finite alphabet and is therefore countable. A regular language may have an infinite number of strings. The strings of a regular language can be enumerated, written down for length 0, length 1, length 2 and so forth.

But the class of all languages over an alphabet is the power set of sigma star. From mathematics we know the power set of a countably infinite set is not countable. A set that is not countable, like the real numbers, can not be enumerated. The class of regular languages, in contrast, is countable, because each regular language is given by a regular expression and regular expressions are finite strings that can be enumerated.

A class of languages is closed under some operation, op, when for any two languages in the class, say L1 and L2, L1 op L2 is in the class. This is the definition of "closed."

Summary.
Regular languages are closed under operations: concatenation, union, intersection, complementation, difference, reversal, Kleene star, substitution, homomorphism and any finite combination of these operations. All of these operations are "effective" because there is an algorithm to construct the resulting language.

Simple constructions include complementation by interchanging final states and non final states in a DFA. Concatenation, union and Kleene star are constructed using the corresponding regular expression to NFA technique. Intersection is constructed using DeMorgan's theorem and difference is constructed using L - M = L intersect complement M. Reversal is constructed from a DFA using final states as starting states and the starting state as the final state, reversing the direction of all transition arcs.

We have seen that regular languages are closed under union because the "+" in regular expressions yields a union operator for regular languages. Similarly, for DFA machines, L(M1) union L(M2) is L(M1 union M2) by the construction in lecture 6.

Thus we know the set of regular languages is closed under union. In symbolic terms, L1 and L2 regular languages and L3 = L1 union L2 implies L3 is a regular language.

The complement of a language is defined in terms of set difference from sigma star.

  L1 bar = sigma star - L1

The language L1 bar, written as L1 with a bar over it, has all strings from sigma star except the strings in L1. Sigma star is all possible strings over the alphabet sigma. It turns out a DFA for a language can be made a DFA for the complement language by changing all final states to not final states and vice versa. (Warning! This is not true for NFA's) Thus regular languages are closed under complementation.

Given union and complementation, by DeMorgan's Theorem, L1 and L2 regular languages and L3 = L1 intersect L2 implies L3 is a regular language, because

  L3 = complement( complement(L1) union complement(L2) )

The construction of a DFA, M3, such that L(M3) = L(M1) intersect L(M2) is given by:

  Let M1 = (Q1, sigma, delta1, q1, F1)
  Let M2 = (Q2, sigma, delta2, q2, F2)
  Let S1 x S2 mean the cross product of sets S1 and S2, all ordered pairs.

  Then M3 = (Q1 x Q2, sigma, delta3, [q1,q2], F1 x F2)

where [q1,q2] is an ordered pair from Q1 x Q2, and delta3 is constructed from

  delta3([x1,x2],a) = [delta1(x1,a), delta2(x2,a)]

for all a in sigma and all [x1,x2] in Q1 x Q2.

We choose to say the same alphabet is used in both machines, but this works in general by picking the alphabet of M3 to be the intersection

Regular set properties:

One way to show that an operation on two regular languages produces a regular language is to construct a machine that performs the operation. Remember, the machines have to be DFA, NFA or NFA-epsilon type machines because these machines have corresponding regular languages.

Consider two machines M1 and M2 for languages L1(M1) and L2(M2). To show that L1 intersect L2 = L3 and L3 is a regular language we construct a machine M3 and show by induction that M3 only accepts strings that are in both L1 and L2.

    M1 = (Q1, Sigma, delta1, q1, F1) with the usual definitions
    M2 = (Q2, Sigma, delta2, q2, F2) with the usual definitions

Now construct M3 = (Q3, Sigma, delta3, q3, F3) defined as

    Q3 = Q1 x Q2      set cross product
    q3 = [q1,q2]      q3 is an element of Q3, the notation means an ordered pair
    Sigma3 = Sigma1 = Sigma2   we choose to use the same alphabet, else some fix up is required
    F3 = F1 x F2      set cross product
    delta3 is constructed from
      delta3([qi,qj],x) = [delta1(qi,x),delta2(qj,x)]
    as you might expect, this is most easily performed on a DFA

The language L3(M3) is shown to be the intersection of L1(M1) and L2(M2) by induction on the length of the input string.

For example:

    M1: Q1 = {q0, q1},     Sigma = {a, b}, delta1, q1=q0, F1 = {q1}
    M2: Q2 = {q3, q4, q5}, Sigma = {a, b}, delta2, q2=q3, F2 = {q4,q5}

    delta1 | a  | b  |
        ---+----+----+
        q0 | q0 | q1 |
        ---+----+----+
        q1 | q1 | q1 |
        ---+----+----+

    delta2 | a  | b  |
      -----+----+----+
        q3 | q3 | q4 |
      -----+----+----+
        q4 | q5 | q3 |
      -----+----+----+
        q5 | q5 | q5 |
      -----+----+----+

M3 now is constructed as

    Q3 = Q1 x Q2 = {[q0,q3], [q0,q4], [q0,q5], [q1,q3], [q1,q4], [q1,q5]}
    F3 = F1 x F2 = {[q1,q4], [q1,q5]}
    initial state q3 = [q0,q3]
    Sigma = technically Sigma_1 intersect Sigma_2
    delta3 is constructed from
      delta3([qi,qj],x) = [delta1(qi,x),delta2(qj,x)]
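The cross product construction for M3 can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary encoding of delta and the names intersect_dfa and accepts are hypothetical, not part of the course software. The machines are M1 and M2 from the example.

```python
from itertools import product

def intersect_dfa(states1, delta1, start1, final1,
                  states2, delta2, start2, final2, sigma):
    """Build M3 = (Q1 x Q2, sigma, delta3, [q1,q2], F1 x F2).

    delta1 and delta2 map (state, symbol) -> state (hypothetical encoding).
    """
    states3 = list(product(states1, states2))
    # delta3([qi,qj],x) = [delta1(qi,x), delta2(qj,x)]
    delta3 = {((qi, qj), x): (delta1[qi, x], delta2[qj, x])
              for (qi, qj) in states3 for x in sigma}
    final3 = set(product(final1, final2))
    return states3, delta3, (start1, start2), final3

def accepts(delta, start, final, word):
    """Simulate a DFA on word and report whether it ends in a final state."""
    state = start
    for symbol in word:
        state = delta[state, symbol]
    return state in final

# M1 and M2 from the example above
delta1 = {("q0", "a"): "q0", ("q0", "b"): "q1",
          ("q1", "a"): "q1", ("q1", "b"): "q1"}
delta2 = {("q3", "a"): "q3", ("q3", "b"): "q4",
          ("q4", "a"): "q5", ("q4", "b"): "q3",
          ("q5", "a"): "q5", ("q5", "b"): "q5"}

Q3, delta3, start3, F3 = intersect_dfa(
    ["q0", "q1"], delta1, "q0", {"q1"},
    ["q3", "q4", "q5"], delta2, "q3", {"q4", "q5"}, "ab")

for word in ["", "b", "bb", "ba"]:
    print(repr(word), accepts(delta3, start3, F3, word))
```

A string is accepted by M3 exactly when both component states are final, which is why F3 is the cross product F1 x F2.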

     delta3 |    a    |    b    |
    --------+---------+---------+
    [q0,q3] | [q0,q3] | [q1,q4] |   this is a DFA when both
    --------+---------+---------+   M1 and M2 are DFA's
    [q0,q4] | [q0,q5] | [q1,q3] |
    --------+---------+---------+
    [q0,q5] | [q0,q5] | [q1,q5] |
    --------+---------+---------+
    [q1,q3] | [q1,q3] | [q1,q4] |
    --------+---------+---------+
    [q1,q4] | [q1,q5] | [q1,q3] |
    --------+---------+---------+
    [q1,q5] | [q1,q5] | [q1,q5] |
    --------+---------+---------+

As we have seen before there may be unreachable states. In this example [q0,q4] and [q0,q5] are unreachable. It is possible for the intersection of L1 and L2 to be empty, thus a final state will never be reachable. Coming soon, the Myhill-Nerode theorem and minimization to eliminate useless states.

Lecture 10 Decision algorithms and review
Decision algorithms for regular sets:

Remember: An algorithm must always terminate to be called an algorithm! Basically, an algorithm needs to have four properties
1) It must be written as a finite number of unambiguous steps
2) For every possible input, only a finite number of steps will be performed, and the algorithm will produce a result
3) The same, correct, result will be produced for the same input
4) Each step must have properties 1) 2) and 3)

Remember: A regular language is just a set of strings over a finite alphabet. Every regular set can be represented by a regular expression and by a minimized finite automaton, a DFA.

We choose to use DFA's, represented by the usual M=(Q, Sigma, delta, q0, F). There are countably many DFA's yet every DFA we look at has a finite description. We write down the set of states Q, the alphabet Sigma, the transition table delta, the initial state q0 and the final states F. Thus we can analyze every DFA and even simulate them.

Theorem: The regular set accepted by DFA's with n states:
1) the set is non empty if and only if the DFA accepts at least one string of length less than n. Just try less than |Sigma|^|Q| strings (finite)
2) the set is infinite if and only if the DFA accepts at least one string of length k, n <= k < 2n. By pumping lemma, just more to try (finite)

Rather obviously the algorithm proceeds by trying the null string first. The null string is either accepted or rejected in a finite time. Then try all strings of length 1, i.e. each character in Sigma. Then try all strings of length 2, 3, ... , k. Every try results in an accept or reject in a finite time.

Thus we say there is an algorithm to decide if any regular set represented by a DFA is a) empty, b) finite and c) infinite.
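The two tests in the Theorem can be run by brute force. The sketch below is illustrative only; the function names and the dictionary encoding of delta are assumptions, and the machine tried is the two state DFA over {a, b} with delta(q0,a)=q0, delta(q0,b)=q1, delta(q1,a)=delta(q1,b)=q1 and F={q1}.

```python
from itertools import product

def accepts(delta, start, final, word):
    """Simulate the DFA; delta maps (state, symbol) -> state."""
    state = start
    for symbol in word:
        state = delta[state, symbol]
    return state in final

def accepts_some(delta, start, final, sigma, lengths):
    """True if the DFA accepts at least one string whose length is in lengths."""
    return any(accepts(delta, start, final, word)
               for k in lengths
               for word in product(sigma, repeat=k))

def classify(states, sigma, delta, start, final):
    """Decide empty / finite / infinite using the theorem for n-state DFA's."""
    n = len(states)
    # 1) non empty iff some string of length less than n is accepted
    if not accepts_some(delta, start, final, sigma, range(n)):
        return "empty"
    # 2) infinite iff some string of length k, n <= k < 2n, is accepted
    if accepts_some(delta, start, final, sigma, range(n, 2 * n)):
        return "infinite"
    return "finite"

delta1 = {("q0", "a"): "q0", ("q0", "b"): "q1",
          ("q1", "a"): "q1", ("q1", "b"): "q1"}
print(classify(["q0", "q1"], "ab", delta1, "q0", {"q1"}))
```

The number of strings tried grows exponentially with n, which is one reason the practical application is painful.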

The practical application is painful! e.g. Given a regular expression, convert it to a NFA, convert NFA to DFA, use Myhill-Nerode to get a minimized DFA. Now we know the number of states, n, and the alphabet Sigma. Now run the tests given in the Theorem above.

An example of a program that has not been proven to terminate is terminates.c with output terminates.out

We will cover a problem, called the halting problem: no Turing machine can determine, for every program, whether that program will terminate.

Review for the quiz. (See homework WEB page.)

Lecture 11 Quiz 1
Closed book. Multiple choice. Covers lectures, reading assignments and homework. See details on the Homework WEB page.

Lecture 12 Myhill-Nerode Minimization
Myhill-Nerode theorem and minimization to eliminate useless states.

The Myhill-Nerode Theorem says the following three statements are equivalent:
1) The set L, a subset of Sigma star, is accepted by a DFA. (We know this means L is a regular language.)
2) L is the union of some of the equivalence classes of a right invariant (with respect to concatenation) equivalence relation of finite index.
3) Let equivalence relation RL be defined by: xRLy if and only if for all z in Sigma star, xz is in L exactly when yz is in L. Then RL is of finite index.

The notation RL means an equivalence relation R over the language L. The notation RM means an equivalence relation R over a machine M. We know for every regular language L there is a machine M that exactly accepts the strings in L.

Think of an equivalence relation as being true or false for a specific pair of strings x and y. Thus xRy is true for some set of pairs x and y. We will use a relation R such that
  xRy <=> yRx                x has a relation to y if and only if y has the same relation to x. This is known as symmetric.
  xRy and yRz implies xRz    This is known as transitive.
  xRx is true                This is known as reflexive.


Our RL is defined xRLy <=> for all z in Sigma star (xz in L <=> yz in L)
Our RM is defined xRMy <=> xzRMyz for all z in Sigma star.
In other words delta(q0,xz) = delta(delta(q0,x),z) = delta(delta(q0,y),z) = delta(q0,yz)
for x, y and z strings in Sigma star.

RM divides the set Sigma star into equivalence classes, one class for each state reachable in M from the starting state q0. To get RL from this we have to consider only the Final reachable states of M.

From this theorem comes the provable statement that there is a smallest, fewest number of states, DFA for every regular language. The labeling of the states is not important, thus the machines are the same within an isomorphism. (iso=same, morph=form)

Now for the algorithm that takes a DFA (we know how to reduce a NFA or NFA-epsilon to a DFA) and produces a minimum state DFA.

-3) Start with a machine M = (Q, Sigma, delta, q0, F) as usual

-2) Remove from Q, F and delta all states that can not be reached from q0. Remember a DFA is a directed graph with states as nodes. Thus use a depth first search to mark all the reachable states. The unreachable states, if any, are then eliminated and the algorithm proceeds.

-1) Build a two dimensional matrix labeling the rows q0, q1, ... running down and denote this as the "p" first subscript. Label the top as q0, q1, ... and denote this as the "q" second subscript

0) Put dashes in the major diagonal and the lower triangular part of the matrix (everything below the diagonal). We will always use the upper triangular part because xRMy = yRMx is symmetric. We will also use (p,q) to index into the matrix with the subscript of the state called "p" always less than the subscript of the state called "q". We can have one of three things in a matrix location where there is no dash. An X indicates a distinct state from our initialization in step 1). A link indicates a list of matrix locations (pi,qj), (pk,ql), ... that will get an x if this matrix location ever gets an x. At the end, we will label all empty matrix locations with a O. (Like tic-tac-toe) The "O" locations mean the p and q are equivalent and will be the same state in the minimum machine. (This is like {p,q} when we converted a NFA to a DFA, and is the transitive closure just like in NFA to DFA.)

NOW FINALLY WE ARE READY for 1st Ed. Page 70, Figure 3.8 or 2nd Ed. Page 159.

1) For p in F and q in Q-F put an "X" in the matrix at (p,q). This is the initialization step. Do not write over dashes. These matrix locations will never change. An X or x at (p,q) in the matrix means states p and q are distinct in the minimum machine. If (p,q) has a dash, put the X in (q,p)

2) BIG LOOP TO END
For every pair of distinct states (p,q) in F X F do 3) through 7) and for every pair of distinct states (p,q) in (Q-F) x (Q-F) do 3) through 7). (Actually we will always have the index of p < index of q and p never equals q so we have fewer checks to make.)

3) 4) 5) Let r=delta(p,a) and s=delta(q,a). If for any input symbol 'a', (r,s) has an X or x then put an x at (p,q). Check (s,r) if (r,s) has a dash. Also, if a list exists for (p,q) then mark all (pi,qj) in the list with an x. Do it for (qj,pi) if (pi,qj) has a dash. You do not have to write another x if one is there already.

6) 7) If the (r,s) matrix location does not have an X or x, start a list at (r,s), or add to the list already there, recording (p,q). Of course, do not do this if r = s, or if the entry is already on the list. Change (r,s) to (s,r) if the subscript of the state r is larger than the subscript of the state s

END BIG LOOP

Now for an example, non trivial, where there is a reduction. M = (Q, Sigma, delta, q0, F) and we have run a depth first search to eliminate states from Q, F and delta that can not be reached from q0.


    Q = {q0, q1, q2, q3, q4, q5, q6, q7, q8}
    Sigma = {a, b}
    q0 = q0
    F = {q2, q3, q5, q6}

    delta | a  | b  |
      ----+----+----+
       q0 | q1 | q4 |
       q1 | q2 | q3 |
       q2 | q7 | q8 |
       q3 | q8 | q7 |
       q4 | q5 | q6 |
       q5 | q7 | q8 |
       q6 | q7 | q8 |
       q7 | q7 | q7 |
       q8 | q8 | q8 |
      ----+----+----+

note Q-F = {q0, q1, q4, q7, q8}

We use an ordered F x F = {(q2, q3), (q2, q5), (q2, q6), (q3, q5), (q3, q6), (q5, q6)}

We use an ordered (Q-F) x (Q-F) = {(q0, q1), (q0, q4), (q0, q7), (q0, q8), (q1, q4), (q1, q7), (q1, q8), (q4, q7), (q4, q8), (q7, q8)}

Now, build the matrix labeling the "p" rows q0, q1, ... and labeling the "q" columns q0, q1, ... and put in dashes on the diagonal and below the diagonal

       q0  q1  q2  q3  q4  q5  q6  q7  q8
   q0 | - |   |   |   |   |   |   |   |   |
   q1 | - | - |   |   |   |   |   |   |   |
   q2 | - | - | - |   |   |   |   |   |   |
   q3 | - | - | - | - |   |   |   |   |   |
   q4 | - | - | - | - | - |   |   |   |   |
   q5 | - | - | - | - | - | - |   |   |   |
   q6 | - | - | - | - | - | - | - |   |   |
   q7 | - | - | - | - | - | - | - | - |   |
   q8 | - | - | - | - | - | - | - | - | - |

Now fill in for step 1) (p,q) such that p in F and q in (Q-F)
  { (q2, q0), (q2, q1), (q2, q4), (q2, q7), (q2, q8),
    (q3, q0), (q3, q1), (q3, q4), (q3, q7), (q3, q8),
    (q5, q0), (q5, q1), (q5, q4), (q5, q7), (q5, q8),
    (q6, q0), (q6, q1), (q6, q4), (q6, q7), (q6, q8) }

       q0  q1  q2  q3  q4  q5  q6  q7  q8
   q0 | - |   | X | X |   | X | X |   |   |
   q1 | - | - | X | X |   | X | X |   |   |
   q2 | - | - | - |   | X |   |   | X | X |
   q3 | - | - | - | - | X |   |   | X | X |
   q4 | - | - | - | - | - | X | X |   |   |
   q5 | - | - | - | - | - | - |   | X | X |
   q6 | - | - | - | - | - | - | - | X | X |
   q7 | - | - | - | - | - | - | - | - |   |
   q8 | - | - | - | - | - | - | - | - | - |

Now fill in more x's by checking all the cases in step 2) and apply steps 3) 4) 5) 6) and 7). Finish by filling in blank matrix locations with "O".

For example (r,s) = (delta(p=q0,a), delta(q=q1,a)) so r=q1 and s=q2. Note that (q1,q2) has an X, thus (q0,q1) gets an "x".

Another example: (r,s) = (delta(p=q4,b), delta(q=q5,b)) so r=q6 and s=q8, thus since (q6,q8) has an X then (p,q) = (q4,q5) is marked as distinct. (This pair already holds an X from step 1, because q4 is not in F and q5 is.)

Another: (r,s) = (delta(p,a), delta(q,a)) where p=q0 and q=q8, then r = delta(q0,a) = q1 and s = delta(q8,a) = q8, but (q1,q8) is blank. Thus start a list in (q1,q8) and put (q0,q8) in this list. [ This is what 7) says: put (p,q) on the list for (delta(p,a), delta(q,a)) and for our case the variable "a" happens to be the symbol "a". ] Eventually (q1,q8) will get an "x" and the list, including (q0,q8), will get an "x".

It depends on the order of the choice of (p,q) in step 2) whether a (p,q) gets added to a list in a cell or gets an "x".

Performing the tedious task results in the matrix:

       q0  q1  q2  q3  q4  q5  q6  q7  q8
   q0 | - | x | X | X | x | X | X | x | x |
   q1 | - | - | X | X | O | X | X | x | x |
   q2 | - | - | - | O | X | O | O | X | X |
   q3 | - | - | - | - | X | O | O | X | X |
   q4 | - | - | - | - | - | X | X | x | x |
   q5 | - | - | - | - | - | - | O | X | X |
   q6 | - | - | - | - | - | - | - | X | X |
   q7 | - | - | - | - | - | - | - | - | O |
   q8 | - | - | - | - | - | - | - | - | - |

The "O" at (q1,q4) means {q1,q4} is a state in the minimum machine.
The "O" for (q2,q3), (q2,q5) and (q2,q6) means they are one state {q2,q3,q5,q6} in the minimum machine. Many other "O" just confirm this.
The "O" in (q7,q8) means {q7,q8} is one state in the minimum machine.

The resulting minimum machine is M' = (Q', Sigma, delta', q0', F') with

    Q' = { {q0}, {q1,q4}, {q2,q3,q5,q6}, {q7,q8} }   four states
    F' = { {q2,q3,q5,q6} }                           only one final state
    q0' = {q0}

           delta' |       a       |       b       |
    --------------+---------------+---------------+
             {q0} | {q1,q4}       | {q1,q4}       |
          {q1,q4} | {q2,q3,q5,q6} | {q2,q3,q5,q6} |
    {q2,q3,q5,q6} | {q7,q8}       | {q7,q8}       |
          {q7,q8} | {q7,q8}       | {q7,q8}       |
    --------------+---------------+---------------+

Note: Fill in the first column of states first. Check that every state occurs in some set and in only one set. Since this is a DFA the next columns must use exactly the state names found in the first column. e.g. q0 with input "a" goes to q1, but q1 is now {q1,q4}.

Use the same technique as was used to convert a NFA to a DFA, but in this case the result is always a DFA even though the states have the strange looking names that appear to be sets, but are just names of the states in the DFA.
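The matrix marking for this example can be checked mechanically. The sketch below is a simplification, not the algorithm exactly as stated: instead of keeping the lists of steps 6) and 7) it repeats passes over all pairs until no new pair is marked, which reaches the same final marks. The function name and data encoding are assumptions, and it expects unreachable states to have been removed already.

```python
from itertools import combinations

def equivalence_classes(states, sigma, delta, final):
    """Group the states of a DFA into classes of equivalent states."""
    # Step 1: mark every pair with exactly one state in F (the "X" entries).
    marked = {frozenset(pair) for pair in combinations(states, 2)
              if (pair[0] in final) != (pair[1] in final)}
    # Repeated passes replace the lists: mark (p,q) when (r,s) is marked.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in marked:
                continue
            for a in sigma:
                r, s = delta[p, a], delta[q, a]
                if r != s and frozenset((r, s)) in marked:
                    marked.add(frozenset((p, q)))   # the "x" entries
                    changed = True
                    break
    # Unmarked pairs are the "O" entries: merge them into one state.
    classes = []
    for p in states:
        for group in classes:
            if frozenset((p, group[0])) not in marked:
                group.append(p)
                break
        else:
            classes.append([p])
    return classes

# delta table of the example: state -> (next on a, next on b)
table = {"q0": ("q1", "q4"), "q1": ("q2", "q3"), "q2": ("q7", "q8"),
         "q3": ("q8", "q7"), "q4": ("q5", "q6"), "q5": ("q7", "q8"),
         "q6": ("q7", "q8"), "q7": ("q7", "q7"), "q8": ("q8", "q8")}
delta = {(state, x): nxt
         for state, row in table.items() for x, nxt in zip("ab", row)}

classes = equivalence_classes(list(table), "ab", delta,
                              {"q2", "q3", "q5", "q6"})
print(classes)
```

Each returned group corresponds to one state of the minimum machine.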

It is possible for the entire matrix to be "X" or "x" at the end. In this case the DFA started with the minimum number of states.

At the heart of the algorithm is the following: The sets Q-F and F are disjoint, thus the pairs of states (Q-F) X (F) are distinguishable, marked X. For the pairs of states (p,q) and (r,s) where r=delta(p,a) and s=delta(q,a), if r is distinguishable from s, then p is distinguishable from q, thus mark (p,q) with an x.

If you do not wish to do minimizations by hand, on irix.gl.umbc.edu use
    ln -s /afs/umbc.edu/users/s/q/squire/pub/myhill myhill
or get the C++ source code from
    /afs/umbc.edu/users/s/q/squire/pub/download/myhill.cpp
Instructions on use are here.

Lecture 13 Context Free Grammars, CFG
Grammars that have the same languages as DFA's

A grammar is defined as G = (V, T, P, S) where
    V is a set of variables. We usually use capital letters for variables.
    T is a set of terminal symbols. This is the same as Sigma for a machine.
    P is a list of productions (rules) of the form:
        variable -> concatenation of variables and terminals
    S is the starting variable. S is in V.

A string z is accepted by a grammar G if some sequence of rules from P can be applied to z with a result that is exactly the variable S.

We say that L(G) is the language generated (accepted) by the grammar G.

To start, we restrict the productions P to be of the form
    A -> w      w is a concatenation of terminal symbols
    B -> wC     w is a concatenation of terminal symbols
                A, B and C are variables in V
and thus get a grammar that generates (accepts) a regular language.

Suppose we are given a machine M = (Q, Sigma, delta, q0, F) with
    Q     = { S }
    Sigma = { 0, 1 }
    q0    = S
    F     = { S }

    delta | 0 | 1 |
       ---+---+---+
        S | S | S |
       ---+---+---+

this looks strange because we would normally use q0 in place of S

The regular expression for M is (0+1)*

We can write the corresponding grammar for this machine as G = (V, T, P, S) where
    V = { S }       the set of states in the machine
    T = { 0, 1 }    same as Sigma for the machine
    P = S -> epsilon | 0S | 1S
    S = S           the q0 state from the machine

The construction of the rules for P is directly from M's delta.
If delta has an entry from state S with input symbol 0 go to state S, the rule is S -> 0S.
If delta has an entry from state S with input symbol 1 go to state S, the rule is S -> 1S.

There is a rule generated for every entry in delta.
    delta(qi,a) = qj yields a rule qi -> a qj

An additional rule is generated for each final state, i.e. S -> epsilon

(An optional encoding is to generate an extra rule for every transition to a final state: delta(qi,a) = any final state, qi -> a. With this option, if the start state is a final state, the production S -> epsilon is still required.)

See g_reg.g file for worked example.

The shorthand notation S -> epsilon | 0S | 1S is the same as writing the three rules. Read "|" as "or".

Grammars can be more powerful (read accept a larger class of languages) than finite state machines (DFA's NFA's NFA-epsilon regular expressions).

For example the language L = { 0^i 1^i | i=0, 1, 2, ... } is not a regular language. Yet, this language has a simple grammar
    S -> epsilon | 0S1

Note that this grammar violates the restriction needed to make the grammar's language a regular language, i.e. rules can only have terminal symbols and then one variable. This rule has a terminal after the variable.
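The rule "one production per entry in delta, plus an epsilon production per final state" is mechanical. A minimal sketch, assuming delta is stored as a Python dictionary and a right side is a tuple of symbols with () standing for epsilon (both encodings are assumptions made for illustration):

```python
def dfa_to_grammar(delta, start, final):
    """Return (start variable, list of productions) for a DFA.

    delta maps (state, symbol) -> state. Each state name doubles as a
    grammar variable. A production is (left side, right side tuple).
    """
    # delta(qi,a) = qj yields the rule qi -> a qj
    productions = [(qi, (a, qj)) for (qi, a), qj in sorted(delta.items())]
    # each final state q yields the rule q -> epsilon
    productions += [(q, ()) for q in sorted(final)]
    return start, productions

# The one state machine for (0+1)*
delta = {("S", "0"): "S", ("S", "1"): "S"}
start, productions = dfa_to_grammar(delta, "S", {"S"})
for left, right in productions:
    print(left, "->", " ".join(right) if right else "epsilon")
```

For the one state machine this prints the three rules S -> 0 S, S -> 1 S and S -> epsilon.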

A grammar for matching parenthesis might be
    G = (V, T, P, S)
    V = { S }
    T = { ( , ) }
    P = S -> epsilon | (S) | SS
    S = S

We can check this by rewriting an input string

    ( ( ( ) ( ) ( ( ) ) ) )
    ( ( ( ) ( ) ( S ) ) )      S -> (S) where the inside S is epsilon
    ( ( ( ) ( ) S ) )          S -> (S)
    ( ( ( ) S S ) )            S -> (S) where the inside S is epsilon
    ( ( ( ) S ) )              S -> SS
    ( ( S S ) )                S -> (S) where the inside S is epsilon
    ( ( S ) )                  S -> SS
    ( S )                      S -> (S)
    S                          S -> (S)

Thus the string ((()()(()))) is accepted by G because the rewriting produced exactly S, the start variable.

More examples of constructing grammars from language descriptions:

Construct a CFG for non empty Palindromes over T = { 0, 1 }
The strings in this language read the same forward and backward.

    G = ( V, T, P, S)   T = { 0, 1 },  V = { S },  S = S,  P is below:
    S -> 0 | 1 | 00 | 11 | 0S0 | 1S1

We started the construction with S -> 0 and S -> 1, the shortest strings in the language.
    S -> 0S0 is a palindrome with a zero added to either end
    S -> 1S1 is a palindrome with a one added to either end
But, we needed S -> 00 and S -> 11 to get the even length palindromes started.
"Non empty" means there can be no rule S -> epsilon.

Construct the grammar for the language L = { a^n b^n | n>0 }

    G = ( V, T, P, S )   T = { a, b }   V = { S }   S = S   P is:
    S -> ab | aSb

Because n>0 there can be no S -> epsilon
The shortest string in the language is ab
a's have to be on the front, b's have to be on the back. When either an "a" or a "b" is added the other must be added in order to keep the count the same. Thus S -> aSb.

The toughest decision is when to stop adding rules. In this case start "generating" strings in the language
    S -> ab        ab      for n=1
    S -> aSb       aabb    for n=2
    S -> aaSbb     aaabbb  for n=3   etc.
Thus, no more rules needed.

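One way to check a small grammar is to generate every string it derives up to some length and compare with the language description. The sketch below is illustrative only: it assumes single character symbols, upper case letters for variables, and a grammar with no epsilon productions (so a sentential form longer than the bound can be discarded). The function name generate is hypothetical.

```python
def generate(productions, start, max_len):
    """All terminal strings of length <= max_len derivable from start.

    productions maps a variable to a list of right side strings.
    Assumes no epsilon productions, so sentential forms never shrink.
    """
    found, seen, pending = set(), {start}, [start]
    while pending:
        form = pending.pop()
        # position of the leftmost variable, if any
        index = next((i for i, c in enumerate(form) if c.isupper()), None)
        if index is None:
            found.add(form)          # only terminals: a string in L(G)
            continue
        for right in productions[form[index]]:
            new = form[:index] + right + form[index + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                pending.append(new)
    return sorted(found, key=lambda s: (len(s), s))

anbn = {"S": ["ab", "aSb"]}
palindromes = {"S": ["0", "1", "00", "11", "0S0", "1S1"]}

print(generate(anbn, "S", 6))
print(generate(palindromes, "S", 3))
```

The matching parenthesis grammar is left out on purpose: it has S -> epsilon, which breaks the length bound this sketch relies on.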
"Generating" the strings in a language defined by a grammar is also called "derivation" of the strings in a language.

Lecture 14, CFG Derivation Trees
Given a grammar with the usual representation G = (V, T, P, S) with variables V, terminal symbols T, set of productions P and the start symbol from V called S.

A derivation tree is constructed with
1) each tree vertex is a variable or terminal or epsilon
2) the root vertex is S
3) interior vertices are from V, leaf vertices are from T or epsilon
4) an interior vertex A has children, in order, left to right, X1, X2, ... , Xk when there is a production in P of the form A -> X1 X2 ... Xk
5) a leaf can be epsilon only when there is a production A -> epsilon and the leaf's parent can have only this child.

Watch out! A grammar may have an unbounded number of derivation trees. It just depends on which production is expanded at each vertex.

For any valid derivation tree, reading the leaves from left to right gives one string in the language defined by the grammar. There may be many derivation trees for a single string in the language.

If the grammar is a CFG then a leftmost derivation tree exists for every string in the corresponding CFL. There may be more than one leftmost derivation tree for some string. See example below and ((()())()) example in previous lecture.

If the grammar is a CFG then a rightmost derivation tree exists for every string in the corresponding CFL. There may be more than one rightmost derivation tree for some string.

The grammar is called "ambiguous" if the leftmost (rightmost) derivation tree is not unique for every string in the language defined by the grammar.

The leftmost and rightmost derivations are usually distinct but might be the same.

Given a grammar and a string in the language represented by the grammar, a leftmost derivation tree is constructed bottom up by finding a production in the grammar that has the leftmost character of the string (possibly more than one may have to be tried) and building the tree towards the root. Then work on the second character of the string. After much trial and error, you should get a derivation tree with a root S. We will get to the CYK algorithm that does the parsing in a few lectures.

Examples:
Construct a grammar for L = { x 0^n y 1^n z  n>0 }

Recognize that 0^n y 1^n is a base language, say B
    B -> y | 0B1      (The base y, the recursion 0B1 )
Then, the language is completed S -> xBz using the prefix, base language and suffix. (Note that x, y and z could be any strings not involving n)
(As written, B -> y also derives xyz, the n=0 case; to enforce n>0 strictly, use B -> 0y1 | 0B1.)

    G = ( V, T, P, S ) where
    V = { B, S }
    T = { x, y, z, 0, 1 }
    S = S
    P = S -> xBz
        B -> y | 0B1

                                                 *
Now construct an arbitrary derivation for     S => x00y11z
                                                 G

A derivation always starts with the start variable, S. The "=>", "*" and "G" stand for "derivation", "any number of steps", and "over the grammar G" respectively.

The intermediate terms, called sentential forms, may contain variable and terminal symbols. Any variable, say B, can be replaced by the right side of any production of the form B -> <right side>

A leftmost derivation always replaces the leftmost variable in the sentential form. (In general there are many possible replacements, the process is nondeterministic.)

One possible derivation using the grammar above is
    S => xBz => x0B1z => x00B11z => x00y11z

The derivation must obviously stop when the sentential form has only terminal symbols. (No more substitutions possible.) The final string is in the language of the grammar. But, this is a very poor way to generate all strings in the grammar!

A "derivation tree" sometimes called a "parse tree" uses the rules above: start with the starting symbol, expand the tree by creating branches using any right side of a starting symbol rule, etc.

                S
              / | \
             /  |  \
            x   B   z
              / | \
             /  |  \
            0   B   1
              / | \
             0  B  1
                |
                y

Derivation ends with all leaves terminal symbols, a string in the language generated by the grammar:
    x 0 0 y 1 1 z
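The leftmost derivation S => xBz => x0B1z => x00B11z => x00y11z can be replayed step by step. This is only a sketch; the function name and the encoding of each step as a (variable, right side) pair are assumptions made for illustration.

```python
def leftmost_derivation(start, steps, variables):
    """Apply each (variable, right side) step to the leftmost variable.

    Returns the list of sentential forms, beginning with the start symbol.
    """
    form = start
    forms = [form]
    for left, right in steps:
        # index of the leftmost variable in the current sentential form
        index = min(form.index(v) for v in variables if v in form)
        if form[index] != left:
            raise ValueError("step does not rewrite the leftmost variable")
        form = form[:index] + right + form[index + 1:]
        forms.append(form)
    return forms

steps = [("S", "xBz"), ("B", "0B1"), ("B", "0B1"), ("B", "y")]
forms = leftmost_derivation("S", steps, {"S", "B"})
print(" => ".join(forms))
```

The last form contains no variables, so the derivation stops there.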

More examples of grammars are:

G(L) for L = { x a^n y b^k z  k > n > 0 }
note that there must be more b's than a's, thus B -> aybb | aBb | Bb

    G = ( V, T, P, S ) where
    V = { B, S }
    T = { a, b, x, y, z }
    S = S
    P = S -> xBz
        B -> aybb | aBb | Bb

Incremental changes for "n > k > 0"      B -> aayb | aBb | aB
Incremental changes for "n >= k >= 0"    B -> y | aBb | aB

Independent exponents do not cause a problem when nested, equivalent to nesting parenthesis.

G(L) for
    L = { a^i b^j c^j d^i e^k f^k   i>=0, j>=0, k>=0 }
          |   |   |   |   |   |
          |   +---+   |   +---+
          +-----------+

    G = ( V, T, P, S )
    V = { I, J, K, S }
    T = { a, b, c, d, e, f }
    S = S
    P = S -> IK
        I -> J | aId
        J -> epsilon | bJc
        K -> epsilon | eKf

G(L) for L = { a^i b^j c^k | any unbounded relation such as i=j=k>0, 0<i<k<j }
the G(L) can not be a context free grammar. Try it. This will be intuitively seen in the push down automata and provable with the pumping lemma for context free languages.

What is a leftmost derivation tree for some string? It is a process that looks at the string left to right and runs the productions backwards. Here is an example, time starts at top and moves down.

    Given G = (V, T, P, S)  V={S, E, I}  T={a, b, c}  S=S  P=
        I -> a | b | c
        E -> I | E+E | E*E
        S -> E               (a subset of grammar from book)

    Given a string      a  +  b  *  c
                        I
                        E
                        S                 derived but not used
                              I
                              E
                       [E  +  E]
                           E
                           S              derived but not used
                                    I
                                    E
                          [E     *  E]
                                 E
                                 S        done! Have S and no more input.

Left derivation tree, just turn upside down, delete unused.

                  S
                  |
                  E
                / | \
               /  |  \
              E   *   E
            / | \     |
           E  +  E    I
           |     |    |
           I     I    c
           |     |
           a     b

Check: Read leaves left to right, must be initial string, all in T. Interior nodes must be variables, all in V. Every vertical connection must be traceable to a production.

Lecture 15 CFG simplification algorithm
The goal here is to take an arbitrary Context Free Grammar G = (V, T, P, S) and perform transformations on the grammar that preserve the language generated by the grammar but reach a specific format for the productions.

Overview:
    Step 1a) Eliminate useless variables that can not become terminals
    Step 1b) Eliminate useless variables that can not be reached
    Step 2)  Eliminate epsilon productions
    Step 3)  Eliminate unit productions
    Step 4)  Make productions Chomsky Normal Form
    Step 5)  Make productions Greibach Normal Form

The CYK parsing uses Chomsky Normal Form as input
The CFG to NPDA uses Greibach Normal Form as input

Details: one step at a time

1a) Eliminate useless variables that can not become terminals
See 1st Ed. book p88, Lemma 4.1, figure 4.7; 2nd Ed. section 7.1
Basically: Build the set NEWV from productions of the form V -> w where V is a variable and w is one or more terminals.

Insert V into the set NEWV. Then iterate over the productions, now accepting any variable in w as a terminal if it is in NEWV. Thus NEWV is all the variables that can be reduced to all terminals.

Now, all productions containing a variable not in NEWV can be thrown away. Thus T is unchanged, S is unchanged, V=NEWV and P may become the same or smaller. The new grammar G=(V,T,P,S) represents the same language.

1b) Eliminate useless variables that can not be reached from S
See 1st Ed. book p89, Lemma 4.2, 2nd Ed. book 7.1
Set V'=S, T'=phi, mark all productions as unused. Iterate repeatedly through all productions until no change in V' or T'. For any production A -> w, with A in V', insert the terminals from w into the set T' and insert the variables from w into the set V' and mark the production as used.

Now, delete all productions from P that are marked unused. V=V', T=T', S is unchanged. The new grammar G=(V,T,P,S) represents the same language.

2) Eliminate epsilon productions.
See 1st Ed. book p90, Theorem 4.3, 2nd Ed. book 7.1
This is complex. If the language of the grammar contains the null string, epsilon, then in principle remove epsilon from the grammar, eliminate epsilon productions, then put S -> epsilon back into the grammar later.

The new grammar G=(V,T,P,S) represents the same language except the new language does not contain epsilon.

3) Eliminate unit productions.
See 1st Ed. book p91, Theorem 4.4, 2nd Ed. 7.1
Iterate through productions finding A -> B type "unit productions". Delete this production from P. Make a copy of all productions B -> gamma, replacing B with A. Be careful of A -> B, B -> C, C -> D type cases, there needs to be copies of B -> gamma, C -> gamma, D -> gamma for A.

Delete duplicate productions. (sort and remove adjacent duplicate) The new grammar G=(V,T,P,S) represents the same language.

Briefly, some pseudo code for the above steps.

Step 1a)
    The set V' = phi
    loop through the productions, P, to find:
        A -> w where w is all terminals
            union V' with A
    n := 0
    while n /= |V'|
        n := |V'|
        loop through productions to find:
            A -> alpha where alpha is only terminals and variables in V'
                union V' with A
    end while
    Eliminate := V - V'
    loop through productions
        delete any production containing a variable in Eliminate,
    V := V'

Step 1b)
    The set V' = {S}
    The set T' = phi
    n := 0
    while n /= |V'| + |T'|
        n := |V'| + |T'|
        loop through productions to find:
            A -> alpha where A in V'
                union V' with variables in alpha
                union T' with terminals in alpha
    end while
    loop through productions
        delete any production containing anything outside V' T' and epsilon
    V := V'
    T := T'

Step 2)
    The set N = phi
    n := -1
    while n /= |N|
        n = |N|
        loop through productions to find:
            A -> epsilon
                union N with A
                delete production
            A -> alpha where no terminals in alpha and all variables in alpha are in N
                union N with A
                delete production
    end while
    if S in N set null string accepted
    loop through productions
        A -> alpha where at least one variable in alpha in N
            generate rules A -> alpha' where alpha' is all combinations of eliminating the variables in N

Step 3)
    P' := all non unit productions ( not A -> B )
    U := all unit productions
    loop through productions in U, |U| times, to find:
        A -> A
            ignore this
        A -> B
            loop through productions in P'
                copy/substitute B -> gamma to A -> gamma in P'
    P := P'
    eliminate duplicate productions (e.g. sort and check i+1 against i)
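Steps 1a) and 1b) of the pseudo code translate almost directly into Python. This is a hedged sketch, not the course's cykp code: productions are encoded as (variable, right side string) pairs, upper case letters are variables, an empty right side stands for epsilon (treated here as a string of terminals), and step 1b) tracks only the reachable variables. The small grammar is hypothetical.

```python
def remove_nonproductive(productions, terminals):
    """Step 1a: drop variables that can not become strings of terminals."""
    productive = set()          # plays the role of V' in the pseudo code
    n = -1
    while n != len(productive):
        n = len(productive)
        for left, right in productions:
            if all(c in terminals or c in productive for c in right):
                productive.add(left)
    return [(left, right) for left, right in productions
            if left in productive
            and all(c in terminals or c in productive for c in right)]

def remove_unreachable(productions, start):
    """Step 1b: drop variables that can not be reached from start."""
    reachable = {start}
    n = 0
    while n != len(reachable):
        n = len(reachable)
        for left, right in productions:
            if left in reachable:
                reachable.update(c for c in right if c.isupper())
    return [(left, right) for left, right in productions
            if left in reachable]

# Hypothetical grammar: B never terminates, C is never reached from S.
grammar = [("S", "AB"), ("S", "a"), ("A", "a"), ("B", "Bb"), ("C", "c")]
step1a = remove_nonproductive(grammar, set("abc"))
step1b = remove_unreachable(step1a, "S")
print(step1a)
print(step1b)
```

The order matters: removing S -> AB in step 1a) is what makes A unreachable in step 1b).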

           P := P'
           eliminate duplicate productions (e.g. sort and check i+1 against i)

  See link to "Turing machines and parsers."
  The CYKP, CYK parser, has the above steps coded in C++ and
  with "verbose 3" in the grammar file, most of the simplification is printed.

  Of possible interest is a test case g_elim.g input data to cykp
  and output g_elim.out

Lecture 16 Chomsky Normal Form

  Chomsky Normal Form is used by the CYK algorithm to determine
  if a string is accepted by a Context Free Grammar.

  Step 4) in the overall grammar "simplification" process is
  to convert the grammar to Chomsky Normal Form.

  Productions can be one of two formats  A -> a  or  A -> BC
  The right side of the production is either exactly
  one terminal symbol or exactly two variables.

  The grammars must have the "simplification" steps 1), 2) and 3) out
  of the way, that is 1) No useless variables, 2) No nullable variables and
  3) no unit productions.

  Step 4) of "simplification" is the following algorithm:

  'length' refers to the number of variables plus terminal
  symbols on the right side of a production.

  Loop through the productions
    For each production with length greater than 1 do
      Replace each terminal symbol with a new variable and
      add a production  new variable -> terminal symbol.

  Loop through the productions
    For each production with length greater than 2 do
      Replace two rightmost variables with a new variable and
      add a production  new variable -> two rightmost variables.
      (Repeat, either on a production or loop until no replacements.)

  Now the grammar, as represented by the productions, is in Chomsky
  Normal Form. Proceed with CYK.

  An optimization is possible but not required. For any two variables
  that each have exactly one production, with the same right side,
  delete the second production and replace all occurrences of the
  second production's left variable with the left variable of the
  first production in all productions. (A variable that has other
  productions must not be used as the replacement, that would
  change the language.)

  Example grammar:

  G = (V, T, P, S)   V={S,A}   T={a,b}   S=S

  S -> aAS
  S -> a
  A -> SbA
  A -> SS
  A -> ba

  First loop through productions (Check n>1)

  S -> aAS  becomes  S -> BAS   (B is the next unused variable name)
                     B -> a
  S -> a    stays    S -> a
  A -> SbA  becomes  A -> SCA   (C is the next unused variable name)
                     C -> b
  A -> SS   stays    A -> SS
  A -> ba   becomes  A -> DE
                     D -> b
                     E -> a

  Second loop through productions (Check n>2)

  S -> BAS  becomes  S -> BF    (F is the next unused variable)
                     F -> AS
  B -> a    stays    B -> a
  S -> a    stays    S -> a
  A -> SCA  becomes  A -> SG
                     G -> CA
  C -> b    stays    C -> b
  A -> SS   stays    A -> SS
  A -> DE   stays    A -> DE
  D -> b    stays    D -> b
  E -> a    stays    E -> a

  Optimization is possible. B -> a and E -> a have the same right side
  and neither B nor E has any other production, so E -> a can be deleted
  and all occurrences of 'E' replaced by 'B'. S -> a has the same right
  side too, but S has other productions, so 'S' can not stand in for
  'B' or 'E' without changing the language; S -> a is kept as is.
  Similarly D -> b can be deleted, keeping the C -> b production and
  substituting 'C' for 'D'.

  Giving the reduced Chomsky Normal Form:

  S -> BF
  F -> AS
  B -> a
  S -> a
  A -> SG
  G -> CA
  C -> b
  A -> SS
  A -> CB

  For a computer generated reduction, a different naming convention

  was chosen (to aid in debugging). First, a terminal symbol "a" was
  replaced by prefixing "T_", thus "a" becomes "T_a". Once a substitution
  is made, that substitution is remembered so that there will be at most
  |T| rules generated of the form  T_a -> a

  When there are more than two variables on the right hand side of a
  production, the new production is named "C_" concatenated with the
  last two variables separated by an underscore. This provides an easy
  reduction if the same two variables are replaced more than once.
  The productions will be sorted and duplicates will sort together
  and can be detected and eliminated quickly.
  (In order to be completely safe, this algorithm requires that no
  underscores were used in the initial grammar.)

  An example uses g_elim.out shown in the last lecture.
  Extracted and cleaned up, here is the Chomsky portion.

  after eliminate, sorted productions:
  A -> a
  B -> a
  S -> A B
  S -> a A
  S -> a S a S a
  S -> a S a a
  S -> a a S a
  S -> a a a

  Chomsky 1, replace terminal with variable
  Chomsky part 1, sorted productions:
  A -> a
  B -> a
  S -> A B
  S -> T_a A
  S -> T_a S T_a S T_a
  S -> T_a S T_a T_a
  S -> T_a T_a S T_a
  S -> T_a T_a T_a
  T_a -> a

  Chomsky 2, new production for each pair over two
  Chomsky Part 2 generated productions
  C_ST_a -> S T_a
  C_T_aS -> T_a S
  C_ST_a -> S T_a
  C_T_aT_a -> T_a T_a
  C_ST_a -> S T_a
  C_ST_a -> S T_a
  C_T_aS -> T_a S
  C_T_aT_a -> T_a T_a

  after Chomsky, sorted productions:
  A -> a
  B -> a
  C_ST_a -> S T_a
  C_T_aS -> T_a S
  C_T_aT_a -> T_a T_a
  S -> A B
  S -> T_a A
  S -> T_a C_ST_a
  S -> T_a C_T_aS
  S -> T_a C_T_aT_a
  T_a -> a

  after Chomsky, Variables: A B C_ST_a C_T_aS C_T_aT_a S T_a
  Terminals unchanged, S unchanged.

Lecture 17 Greibach Normal Form

  Greibach Normal Form of a CFG has all productions of the form

     A -> aV

  Where 'A' is a variable, 'a' is exactly one terminal and
  'V' is a string of none or more variables.

  Every CFG can be rewritten in Greibach Normal Form.
  This is step 5 in the sequence of "simplification" steps for CFG's.
  Greibach Normal Form will be used to construct a PushDown Automata
  that recognizes the language generated by a Context Free Grammar.

  Starting with a grammar: G = ( V, T, P, S)

  1a) Eliminate useless variables that can not become terminals
  1b) Eliminate useless variables that can not be reached
  2)  Eliminate epsilon productions
  3)  Eliminate unit productions
  4)  Convert productions to Chomsky Normal Form
  5)  Convert productions to Greibach Normal Form using algorithm below:

  Re-label all variables such that the names are A1, A2, ... , Am .
  The notation A(j) refers to the variable named Aj .
  New variables may be created with names B1, B2, ... referred to as B(j) .
  Productions may be created and/or removed (a mechanical implementation
  may use coloring to track processed, added and deleted productions.)

  Step 1 (repeat until no productions are added, m-1 times at most)
   begin (m is the number of variables, can change on each repeat)
    for k := 1 to m do
     begin
      for j := 1 to k-1 do
       for each production of the form A(k) -> A(j) alpha do
        begin
         for all productions A(j) -> beta do

           add production A(k) -> beta alpha
         remove production A(k) -> A(j) alpha
        end;
      for each production of form A(k) -> A(k) alpha do
        begin
         add production B(k) -> alpha  and  B(k) -> alpha B(k)
         remove production A(k) -> A(k) alpha
        end;
      for each production A(k) -> beta where beta does not begin A(k) do
        add production A(k) -> beta B(k)
     end;
   end.

  Step 2 (sort productions and delete any duplicates )
     Remove A -> A beta by creating B -> alpha B
     Substitute so all rules become Greibach with starting terminal.
     For neatness, sort productions and delete any duplicates

  see book: 2nd Ed. page 271, section 7.1, Exercise 7.1.11 construction

  Example (in file format, input given to program cykp,
           output file greibach.pda)

  // g1.g  a test grammar
  // G = (V, T, P, S)  V={S,A}  T={a,b}  S=S
  //
  start S
  terminal a b ;
  verbose 7   // causes every step of every loop to be printed
              // runs a very long time !
  S -> a A S ;
  S -> a ;
  A -> S b A ;
  A -> S S ;
  A -> b a ;
  enddef
  abaa

  // greibach.pda , edited, from  cykp < g1.g > g1_cykp.out
  start S
  terminal a ;
  terminal b ;
  variable A ;
  variable C_AS ;
  variable C_T_bA ;
  variable S ;
  variable T_a ;
  A      -> a C_AS C_T_bA ;

  A      -> a C_AS S ;
  A      -> a C_T_bA ;
  A      -> a S ;
  A      -> b T_a ;
  C_AS   -> a C_AS C_T_bA S ;
  C_AS   -> a C_AS S S ;
  C_AS   -> a C_T_bA S ;
  C_AS   -> a S S ;
  C_AS   -> b T_a S ;
  C_T_bA -> b A ;
  S      -> a ;
  S      -> a C_AS ;
  T_a    -> a ;
  enddef

  Note that the last rule, T_a -> a, has the same right side as S -> a,
  but T_a can not be replaced by S: S has another production, so the
  substitution would change the language.

Lecture 18 Inherently ambiguous CFL's, project

  See project description

  A CFL that is inherently ambiguous is one for which no
  unambiguous CFG can exist.

  L = {a^i b^i c^j d^j | i,j>0} union {a^i b^j c^j d^i | i,j>0}
  is such a language

  An ambiguous grammar is a grammar for a language where at least one
  string in the language has two parse trees. This is equivalent to
  saying some string has more than one leftmost derivation or more
  than one rightmost derivation.

  The productions for a^i b^i c^j d^j could be
  1) S -> I J
  2) I -> a b
  3) I -> a I b
  4) J -> c d
  5) J -> c J d

  The productions for a^i b^j c^j d^i could be (using K instead of J)
  6) S -> a K d
  7) S -> a S d
  8) K -> b c
  9) K -> b K c

  Now consider the case i = j. This is a string generated by both
  grammars and thus will have two rightmost derivations.
  (one involving I and J, and one involving K)
  Thus ambiguous and not modifiable to make it unambiguous.
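  The i = j case can be checked mechanically. The Python sketch below
  counts the parse trees of a string under productions 1) to 9) taken
  together as one grammar with start symbol S. The names PRODUCTIONS and
  count_parse_trees are illustrative assumptions, not part of the course
  software, and the counting relies on every right side here having at
  least two symbols and no epsilon.

```python
from functools import lru_cache

# Productions 1) to 9) from the text, as tuples of right-side symbols.
PRODUCTIONS = {
    "S": [("I", "J"), ("a", "K", "d"), ("a", "S", "d")],
    "I": [("a", "b"), ("a", "I", "b")],
    "J": [("c", "d"), ("c", "J", "d")],
    "K": [("b", "c"), ("b", "K", "c")],
}


def count_parse_trees(w, start="S"):
    """Count the distinct parse trees that derive the string w from start."""

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        # Number of parse trees deriving w[i:j] from one symbol.
        if symbol not in PRODUCTIONS:  # a terminal matches only itself
            return 1 if w[i:j] == symbol else 0
        return sum(sequence(rhs, i, j) for rhs in PRODUCTIONS[symbol])

    @lru_cache(maxsize=None)
    def sequence(rhs, i, j):
        # Number of ways the symbol sequence rhs derives w[i:j].
        if not rhs:
            return 1 if i == j else 0
        head, rest = rhs[0], rhs[1:]
        # Each symbol derives at least one character, so the head takes
        # w[i:m] and leaves at least len(rest) characters for the rest.
        return sum(
            trees(head, i, m) * sequence(rest, m, j)
            for m in range(i + 1, j - len(rest) + 1)
        )

    return trees(start, 0, len(w))


if __name__ == "__main__":
    for word in ["abcd", "aabbccdd", "aabbcd", "abbccd"]:
        print(word, count_parse_trees(word))
```

  Strings with i = j, such as abcd and aabbccdd, get one tree through
  I and J and one through K, while a string that belongs to only one of
  the two sets gets a single tree.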

Lecture 19 Quiz 2

  Similar to Quiz 1
  Closed book. Multiple choice questions based on lectures,
  reading assignments and homework.

Lecture 20 Push Down Automata, PDA

  Push Down Automata, PDA, are a way to represent the language class called
  Context Free Languages, CFL, covered above. By themselves PDA's are not
  very important but the hierarchy of Finite State Machines with
  corresponding Regular Languages, PDA's with corresponding CFL's and
  Turing Machines with corresponding Recursively Enumerable Sets (Languages),
  is an important concept.

  The definition of a Push Down Automata is:

  M = (Q, Sigma, Gamma, delta, q0, Z0, F)  where

  Q     = a finite set of states including q0
  Sigma = a finite alphabet of input symbols (on the input tape)
  Gamma = a finite set of push down stack symbols including Z0
  delta = a group of nondeterministic transitions mapping
          Q x (Sigma union {epsilon}) x Gamma to finite sets of Q x Gamma star
  q0    = the initial state, an element of Q, possibly the only state
  Z0    = the initial stack contents, an element of Gamma,
          possibly the only stack symbol
  F     = the set of final, accepting, states but may be empty
          for a PDA "accepting on an empty stack"

  Unlike finite automata, the delta is not presented in tabular form.
  The table would be too wide. Delta is a list of, nondeterministic,
  transitions of the form:

     delta(q,a,A) = { (qi,gammai), (qj,gammaj), ...}  where

     q is the current state,
     a is the input tape symbol being read, an element of Sigma union {epsilon}
     A is the top of the stack being read,
     The ordered pairs (q sub i, gamma sub i) are respectively the next state
     and the string of symbols to be written onto the stack. The machine
     is nondeterministic, meaning that all the pairs are executed causing
     a branching tree of PDA configurations. Just like the branching
     tree for nondeterministic finite automata except additional copies
     of the pushdown stack are also created at each branch.

  The operation of the PDA is to begin in state q0, read the symbol on the
  input tape or read epsilon. If a symbol is read, the read head moves
  to the right and can never reverse to read that symbol again.

  The top of the stack is read by popping off the symbol.

  Now, having a state, an input symbol and a stack symbol a delta
  transition is performed. If there is no delta transition defined
  with these three values the machine halts. If there is a delta
  transition with the (q,a,A) then all pairs of (state,gamma) are
  performed. The gamma represents a

  sequence of push down stack symbols and are pushed right to left onto
  the stack. If gamma is epsilon, no symbols are pushed onto the stack.
  Then the machine goes to the next state, q.

  When the machine halts a decision is made to accept or reject the input.
  If the last, rightmost, input symbol has not been read then reject.
  If the machine is in a final state accept.
  If the set of final states is empty, Phi, and the only symbol on the
  stack is Z0, then accept. (This is the "accept on empty stack" case)

  Now, using pictures we show the machines for FSM, PDA and TM

  +-------------------------+----------------- DFA, NFA, NFA epsilon
  | input string            |                  accepts Regular Languages
  +-------------------------+-----------------
     ^ read, move right
     |
     |  +-----+
     |  |     |--> accept
     +--+ FSM |                 M = ( Q, Sigma, delta, q0, F)
        |     |--> reject
        +-----+

  +-------------------------+----------------- Push Down Automata
  | input string            |Z0  stack         accepts Context Free Languages
  +-------------------------+-----------------
     ^ read, move right      ^ read and write (push and pop)
     |                       |
     +-----------------------+
     |  +-----+
     |  |     |--> accept
     +--+ FSM |                 M = ( Q, Sigma, Gamma, delta, q0, Z0, F)
        |     |--> reject
        +-----+

  +-------------------------+----------------- Turing Machine
  | input string            |BBBBBBBB ...      accepts Recursively Enumerable
  +-------------------------+-----------------                      Languages
     ^ read and write, move left and right
     |
     |  +-----+
     |  |     |--> accept
     +--+ FSM |                 M = ( Q, Sigma, Gamma, delta, q0, B, F)
        |     |--> reject
        +-----+

  An example of a language that requires the PDA to be a NPDA,
  Nondeterministic Push Down Automata, is
  L = { w wr | w in Sigma star and wr is w written backwards }

Lecture 21 CFG/CFL to NPDA

  Given a Context Free Grammar, CFG, in Greibach Normal Form, A -> aB1B2B3...
  Construct an NPDA machine that accepts the same language as that grammar.

  We are given G = ( V, T, P, S ) and must now construct
  M = ( Q, Sigma, Gamma, delta, q0, Z0, F)

  Q     = {q}  the one and only state!
  Sigma = T
  Gamma = V
  delta is shown below
  q0    = q
  Z0    = S
  F     = Phi  NPDA will "accept on empty stack"

  Now, for each production in P (individual production, no vertical bars)

     A -> a gamma  constructs
     |    |   |
     |    |   +-------------+
     |    |                 |
     |    +-----------+     |   gamma is a possibly empty, sequence of
     |                |     |   symbols from Gamma
     +---+            |     |
         |            |     |
         V            V     V
  delta(q, a, A) = {(q, gamma), more may come from other productions}

  If gamma is empty, use epsilon as in (q, epsilon)

  Finished!

  An example: Given production    Resulting delta             #
              C -> 0 C T          delta(q, 0, C) = (q, CT)    1
              C -> 0 T T          delta(q, 0, C) = (q, TT)    2
              S -> 0 C            delta(q, 0, S) = (q, C)     3
              S -> 0 T            delta(q, 0, S) = (q, T)     4
              T -> 1              delta(q, 1, T) = (q, #e)    5

  NPDA running with input tape 0011 and initial stack S

  0011      |S|
  ^         | |

  delta #3 and #4 both apply, resulting in two stacks and advancing input

  0011      |C|    |T|
   ^        | |    | |

  delta #1 and #2 apply to the first stack, the second stack dies, advancing

  0011      |C|    |T|    dies
    ^       |T|    |T|
            | |    | |

  delta #5 applies to the second stack, the first stack dies, advancing

  0011      dies   |T|
     ^             | |

  delta #5 applies, the stack becomes empty and advance to beyond input

  0011             | |
      ^            | |

  accept input string.

  Another conversion algorithm that uses more states, three to be exact,
  is somewhat simpler.

  We are given G = ( V, T, P, S ) and must now construct
  M = ( Q, Sigma, Gamma, delta, q0, Z0, F)

  Q     = {q0, q1, q2}
  Sigma = T
  Gamma = V union {z}  where z not in V (an extra symbol)
  delta = Q x (Sigma union epsilon) x Gamma -> Q x Gamma   shown below
  q0    = q0
  Z0    = z  (to get the NPDA started)
  F     = q2 the final accepting state

  Two predefined transitions, the start and the accept are:
  delta(q0, epsilon, z) = { (q1, Sz) }
  delta(q1, epsilon, z) = { (q2, z) }

  For every rule in P , Greibach normal form, A -> aU generate
  delta(q1, a, A) = { (q1, U) } union all other sets from (q1, a, A)

  Note: The empty string must be removed to create Greibach normal form,
        thus a grammar that initially accepted the empty string needs
        the extra transition
        delta(q0, epsilon, z) = { (q2, z) }

  The conversion is proved to simulate a leftmost derivation.

Lecture 22 NPDA to CFG/CFL

  Given a NPDA  M = ( Q, Sigma, Gamma, delta, q0, Z0, Phi) and
  Sigma intersection Gamma = Phi,
  Construct a CFG  G = ( V, T, P, S )

  Set T = Sigma
      S = S
      V = { S } union { [q,A,p] | q and p in Q and A in Gamma }
      This can be a big set! q is every state with A every Gamma with
      p every state. The cardinality of V, |V|=|Q|x|Gamma|x|Q|
      Note that the symbology [q,A,p] is just a variable name.
      (the states in the NPDA are renamed q0, q1, ... if necessary)

  Construct the productions in two stages, the S -> , then the [q,A,p] ->

  S -> [q0,Z0,qi]  for every qi in Q (including q0)  |Q| of these productions

  [qi,A,qm+1] -> a[qj,B1,q2][q2,B2,q3][q3,B3,q4]...[qm,Bm,qm+1] is created
     for each q2, q3, q4, ..., qm+1 in Q
     for each a in Sigma union {epsilon}
     for each A,B1,B2,B3,...,Bm in Gamma
     such that there is a delta of the form

     delta(qi,a,A) = { ...,(qj,B1B2B3...Bm), ...}

     Note three degenerate cases:
     delta(qi,a,A)=phi                    makes no productions
     delta(qi,a,A)={(qj,epsilon)}         makes [qi,A,qj] -> a
     delta(qi,epsilon,A)={(qj,epsilon)}   makes [qi,A,qj] -> epsilon

  The general case:
  Pictorially, given delta(qi,a,A) = (qj,B1B2) generate the set

     [qi,A,qm+1] -> a[qj,B1,qk][qk,B2,qm+1]

  for qk being every state, while qm+1 is every state.

  The book suggests to follow the chain of states starting with
  the right sides of the S -> productions, then the new right sides
  of the [q,A,p] -> productions. The correct grammar is built
  generating all productions. Then the "simplification" can be
  applied to eliminate useless variables, eliminate nullable variables,
  eliminate unit productions, convert to Chomsky Normal Form,
  convert to Greibach Normal Form.

  WOW! Now we have the Greibach Normal Form of a NPDA with any number
  of states and we can convert this to a NPDA with just one state
  by the construction in the previous lecture.

  The important concept is that the constructions CFG to NPDA
  and NPDA to CFG provably keep the same language being accepted.
  Well, to be technical, the language generated by the CFG
  is exactly the language accepted by the NPDA.
  Fixing up any technical details like renaming Gamma symbols
  if Gamma intersection Sigma not empty and accepting or rejecting
  the null string appropriately.

  The reverse of the example in the previous lecture.
  |Q|=1 makes it easy.

  Given: NPDA = (Q, Sigma, Gamma, delta, q0, Z0, F)
         Q={q}  Sigma={0,1}  Gamma={C,S,T}  q0=q  Z0=S  F=Phi
         delta(q, 0, C) = (q, CT)
         delta(q, 0, C) = (q, TT)
         delta(q, 0, S) = (q, C)
         delta(q, 0, S) = (q, T)

         delta(q, 1, T) = (q, epsilon)

  Build: G = (V, T, P, S)
         V = { S, qCq, qSq, qTq }  four variable names
             (dropping the previous punctuation [,,])
         T = Sigma = {0, 1}
         S = Z0 = S

  S-> productions
         S -> qSq    Note! qSq is a single variable
                     just dropped the [, ] symbols

  other productions:
         delta(q, 0, C) = (q, CT)   generates
         qCq -> 0 qCq qTq
         was C -> 0 C T

  continue using same method on each delta:
         qCq -> 0 qTq qTq
         qSq -> 0 qCq
         qSq -> 0 qTq
         qTq -> 1    (epsilon becomes nothing)

  Now, if you prefer, rename the variables to single letters
  (assuming you have a big enough alphabet)
  For this simple example qCq becomes just C, qSq becomes just S
  and qTq becomes just T, thus the productions become:

  C -> 0 C T
  C -> 0 T T
  S -> 0 C
  S -> 0 T
  T -> 1

  This grammar is Greibach normal form for L(G)={0^n 1^n | n>0}

  Now, working an example for another NPDA for this same language:

  NPDA  M = ( Q, Sigma, Gamma, delta, q0, Z0, Phi)
        Q     = { q0, q1 }
        Sigma = { 0, 1 }
        Gamma = { Z0, V0, V1 }
        delta = (q0,0,Z0) = (q0,V0)        pop Z0 write V0 (for zero)
                (q0,0,V0) = (q0,V0V0)      add another V0 to stack
                (q0,1,V0) = (q1,epsilon)   pop a V0 for a one
                (q1,1,V0) = (q1,epsilon)   pop a V0 for each 1
        accept on empty stack

  Build: G = (V, T, P, S)
         V = { S, see below in productions }

         T = Sigma = {0, 1}
         S = Z0 = S

  S-> productions                      (one for each state)
         S -> [q0,Z0,q0]                          P1
         S -> [q0,Z0,q1]                          P2

  delta productions
         (q0,0,Z0) = (q0,V0)           (one for each state)
         [q0,Z0,q0] -> 0 [q0,V0,q0]               P3
         [q0,Z0,q1] -> 0 [q0,V0,q1]               P4

         (q0,0,V0) = (q0,V0V0)         (two combinations of two states)
         [q0,V0,q0] -> 0 [q0,V0,q0] [q0,V0,q0]    P5
         [q0,V0,q0] -> 0 [q0,V0,q1] [q1,V0,q0]    P6
         [q0,V0,q1] -> 0 [q0,V0,q0] [q0,V0,q1]    P7
         [q0,V0,q1] -> 0 [q0,V0,q1] [q1,V0,q1]    P8

         (q0,1,V0) = (q1,epsilon)
         [q0,V0,q1] -> 1                          P9

         (q1,1,V0) = (q1,epsilon)
         [q1,V0,q1] -> 1                          P10

  A brief check, consider the string from the derivation
  P2, P4, P9           that produces the string 01
  P2, P4, P8, P9, P10  that produces the string 0011
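  The two-state NPDA used in this example can also be run directly. The
  Python sketch below explores every branch of an NPDA that accepts on
  empty stack and is applied to the four delta transitions of M. The
  function name accepts_on_empty_stack and the dictionary encoding of
  delta are illustrative assumptions. The search assumes there are no
  epsilon transitions that grow the stack without limit (M has no
  epsilon transitions at all).

```python
def accepts_on_empty_stack(delta, start_state, start_symbol, w):
    """Return True if some branch empties the stack exactly at the end of w.

    delta maps (state, input symbol or "" for epsilon, top of stack) to a
    set of (next state, tuple of symbols to push, new top first).
    """
    todo = [(start_state, 0, (start_symbol,))]
    seen = set()
    while todo:
        config = todo.pop()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if not stack:
            if pos == len(w):
                return True   # empty stack and all input read: accept
            continue          # empty stack too early: this branch dies
        top, rest = stack[0], stack[1:]   # reading the stack pops the top
        moves = [("", pos)]               # an epsilon move reads no input
        if pos < len(w):
            moves.append((w[pos], pos + 1))
        for symbol, next_pos in moves:
            for next_state, push in delta.get((state, symbol, top), ()):
                todo.append((next_state, next_pos, tuple(push) + rest))
    return False


# The machine M from the text: Q={q0,q1}, Z0 is the initial stack symbol.
DELTA = {
    ("q0", "0", "Z0"): {("q0", ("V0",))},        # pop Z0, write V0
    ("q0", "0", "V0"): {("q0", ("V0", "V0"))},   # add another V0
    ("q0", "1", "V0"): {("q1", ())},             # pop a V0 for a one
    ("q1", "1", "V0"): {("q1", ())},             # pop a V0 for each 1
}

if __name__ == "__main__":
    for word in ["01", "0011", "001", "10"]:
        print(word, accepts_on_empty_stack(DELTA, "q0", "Z0", word))
```

  The strings 01 and 0011 checked by derivation above are accepted by
  this run of M, while strings such as 001 and 10 leave every branch
  dead.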

Lecture 23 Turing Machine Model

  M = ( Q, Sigma, Gamma, delta, q0, B, F)

  Q     = finite set of states including q0
  Sigma = finite set of input symbols not including B
  Gamma = finite set of tape symbols including Sigma and B
  delta = transitions mapping Q x Gamma to Q x Gamma x {L,R}
  q0    = initial state
  B     = blank tape symbol, initially on all tape not used for input
  F     = set of final states

  +-------------------------+-----------------
  | input string            |BBBBB ...   accepts Recursively Enumerable Languages
  +-------------------------+-----------------
     ^ read and write, move left and right
     |
     |  +-----+
     |  |     |--> accept
     +--+ FSM |
        |     |--> reject
        +-----+

  +-------------------------+-----------------
  | input and output string |BBBBB ...   computes partial recursive functions
  +-------------------------+-----------------
     ^ read and write, move left and right
     |
     |  +-----+
     |  |     |
     +--+ FSM |--> done  (a delta [q,a]->[empty], but may never happen )
        |     |
        +-----+

  delta is a table or list of the form:
  [qi, ai] -> [qj, aj, L]  or  [qi, ai] -> [qj, aj, R]
  (optional [qi, ai] -> [qj, aj, N])

  qi is the present state
  ai is the symbol under the read/write head

  qj is the next state
  aj is written to the tape at the present position
  L  is move the read/write head left one position after the write
  R  is move the read/write head right one position after the write
  N  is optional no movement of the tape.

  It is generally a pain to "program" a Turing machine. You have to
  convert the algorithm into the list of delta transitions. The fallback
  is to describe in English the steps that should be performed. The
  amount of detail needed depends on the reader of the algorithm
  accepting that there is an obvious way for the Turing machine to
  perform your steps.

  There are a lot of possible Turing machines and a useful technique
  is to code Turing machines as binary integers. A trivial coding is
  to use the 8 bit ASCII for each character in the written description
  of a Turing machine concatenated into one long bit stream.
  Having encoded a specific Turing machine as a binary integer, we
  can talk about TMi as the Turing machine encoded as the number "i".

  It turns out that the set of all Turing machines is countable and
  enumerable.

  Now we can construct a Universal Turing Machine, UTM, that takes
  an encoded Turing machine on its input tape followed by normal
  Turing machine input data on that same input tape. The Universal Turing
  Machine first reads the description of the Turing machine on the
  input tape and uses this description to simulate the Turing machine's
  actions on the following input data. Of course a UTM is a TM and can
  thus be encoded as a binary integer, so a UTM can read a UTM from
  the input tape, read a TM from the input tape, then read the input
  data from the input tape and proceed to simulate the UTM that is
  simulating the TM. Etc. Etc. Etc.

  In a future lecture we will make use of the fact that a UTM can be
  represented as an integer and can thus also be the input data on
  the input tape.

  For an example of programming a Turing Machine see Turing Machine simulator

  Basically, any algorithm can be coded in a high order language,
  or coded in assembly language, or coded as a Turing Machine program,
  or built out of digital logic.

  A Turing Machine program is a bit-by-bit description of an algorithm.
  Each program step, as in assembly language, is one simple operation.
  But at a much lower level than any assembly language.

  A Turing Machine program step is a 'delta' entry
  [qi, ai] -> [qj, aj, move]

  When in state qi, if the read head sees tape symbol ai, then transition
  to state qj, writing symbol aj to the tape and moving one tape position
  according to 'move' which can be L for left, R for right, N for no move.

  For computer input to a TM simulator, the five items are just written
  with white space as a separator and an optional sixth field that is a
  comment.

  Special character pairs #b are used for one blank.
  ## is used for epsilon, nothing written to tape.

  A sample computer input for an algorithm to add unary strings is:

  // add.tm  add unary strings of zeros
  // input tape  000...00 000...00  blank separated, blank at end
  // output tape 000...000  sum of zeros on input tape

  start s0
  halt  s9  // use halt rather than 'final' when computing a result
  limit 20

  s0 0  s0 ## R  skip over initial 0's  delta 1
  s0 #b s1 0  R  write 0 over blank     delta 2
  s1 0  s1 ## R  keep moving            delta 3
  s1 #b s2 ## L  detect at end          delta 4
  s2 0  s9 #b R  blank extra 0          delta 5

  tape 000#b00#b  should end with five zeros on tape

  The simulation starts with the TM in state s0 and read head on
  the first character.

  +-----------
  |000 00       ---- tape
  +-----------
   ^
   |
   | read/write head position
     TM in state s0

  Thus, delta 1 applies. The new state is the same s0, nothing is
  written to the tape and the head is moved one place right.

  +-----------
  |000 00       ---- tape
  +-----------
    ^
    |
    | read/write head position
      TM in state s0

  Following the steps, delta 2 writes a zero over the blank and
  goes to state s1. When in state s1 and a zero is read from the tape,
  delta 3, stay in state s1 and move one space right.
  When in state s1 and a blank is read from the tape, delta 4,
  go to state s2 and back up one space (now over last zero).
  When in state s2 and a zero is read from the tape, delta 5,
  go to the final state, s9, and write a blank over the zero.
  Machine stops when no delta applies. If in a final state when
  machine stops, the algorithm (program) finished successfully.
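  The walk through of add.tm can be replayed with a few lines of Python.
  This is a minimal sketch, not the course's Turing Machine simulator:
  the function run_tm, its parameters and the dictionary ADD_TM are
  assumptions made for illustration. It keeps the conventions from the
  text, #b for a blank and ## for "write nothing", and stops when no
  delta applies or when the step limit is reached.

```python
def run_tm(deltas, tape, start, halt_states, limit=1000, blank="#b"):
    """Run a one-tape Turing machine given as (state, symbol) -> (state, write, move).

    Returns (final state, final tape as a list, True if stopped in a halt state).
    """
    cells = dict(enumerate(tape))   # sparse tape, index -> symbol
    state, head = start, 0          # head starts on the first character
    for _ in range(limit):
        symbol = cells.get(head, blank)
        if (state, symbol) not in deltas:
            break                   # machine stops when no delta applies
        state, write, move = deltas[(state, symbol)]
        if write != "##":           # ## means nothing is written
            cells[head] = write
        head += {"L": -1, "R": 1, "N": 0}[move]
    if cells:
        final_tape = [cells.get(i, blank) for i in range(min(cells), max(cells) + 1)]
    else:
        final_tape = []
    return state, final_tape, state in halt_states


# The five delta entries of add.tm.
ADD_TM = {
    ("s0", "0"): ("s0", "##", "R"),    # delta 1: skip over initial 0's
    ("s0", "#b"): ("s1", "0", "R"),    # delta 2: write 0 over blank
    ("s1", "0"): ("s1", "##", "R"),    # delta 3: keep moving
    ("s1", "#b"): ("s2", "##", "L"),   # delta 4: detect at end
    ("s2", "0"): ("s9", "#b", "R"),    # delta 5: blank extra 0
}

if __name__ == "__main__":
    tape = ["0", "0", "0", "#b", "0", "0", "#b"]   # tape 000#b00#b
    print(run_tm(ADD_TM, tape, "s0", {"s9"}, limit=20))
```

  Run on the tape 000#b00#b, the sketch stops in s9 with five zeros
  followed by blanks. A tape containing a symbol with no matching delta,
  for example a 1, makes it stop in a non-halt state instead.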

  +-----------
  |00000        ---- tape
  +-----------
         ^
         |
         | read/write head position
           TM in state s9

  What would happen if there was a '1' on the tape with the above TM program?
  The machine would stop in a non-final state which indicates no answer
  was computed (like segfault).

Lecture 24 CYK algorithm for CFG's

  Use the "simplification" steps to get to a Chomsky Normal Form.

  Given a CFG grammar G in Chomsky Normal Form and a string x of length n

  Group the productions of G into two sets
  { A | A -> a }   target is a terminal
  { A | A -> BC }  target is exactly two variables

  V is a two dimensional matrix. Each element of the matrix is a set.
  The set may be empty, denoted phi, or the set may contain one or
  more variables from the grammar G. V can be n by n yet only part is used.

  x[i] represents the i th character of the input string x

  Parse x using G's productions

  for i in 1 .. n
     V[i,1] = { A | A -> x[i] }
  for j in 2 .. n
     for i in 1 .. n-j+1
     {
        V[i,j] = phi
        for k in 1 .. j-1
           V[i,j] = V[i,j] union { A | A -> BC where B in V[i,k]
                                                 and C in V[i+k,j-k]}
     }
  if S in V[1,n] then x is in CFL defined by G.

  In order to build a derivation tree, a parse tree, you need to extend
  the CYK algorithm to record
  (variable, production number, from a index, from B index, from C index)
  in V[i,j]. V[i,j] is now a set of five tuples.
  Then find one of the (S, production number, from a, from B, from C)
  entries in V[1,n] and build the derivation tree starting at the root.
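  The table filling loops above translate almost line for line into
  Python. The sketch below is a recognizer only (no derivation tree) and
  is not the cykp program; the function name cyk and the way the two
  production sets are passed in are illustrative assumptions. Indices
  are kept 1-based to match V[i,j] in the pseudo code, and the small
  Chomsky Normal Form grammar at the end is just sample data.

```python
def cyk(terminal_rules, binary_rules, x):
    """Fill the CYK table for string x.

    terminal_rules: list of (A, a) for productions A -> a
    binary_rules:   list of (A, B, C) for productions A -> B C
    Returns V where V[i][j] is the set of variables deriving the
    substring of length j that starts at position i (1-based).
    """
    n = len(x)
    V = [[set() for _ in range(n + 2)] for _ in range(n + 2)]
    for i in range(1, n + 1):
        V[i][1] = {A for (A, a) in terminal_rules if a == x[i - 1]}
    for j in range(2, n + 1):              # substring length
        for i in range(1, n - j + 2):      # start position
            for k in range(1, j):          # length of the left part
                for (A, B, C) in binary_rules:
                    if B in V[i][k] and C in V[i + k][j - k]:
                        V[i][j].add(A)
    return V


# Sample grammar in Chomsky Normal Form.
TERMINAL_RULES = [("A", "a"), ("B", "b"), ("C", "a")]
BINARY_RULES = [("S", "A", "B"), ("S", "B", "C"), ("A", "B", "A"),
                ("B", "C", "C"), ("C", "A", "B")]

if __name__ == "__main__":
    x = "baaba"
    V = cyk(TERMINAL_RULES, BINARY_RULES, x)
    print("accepted" if "S" in V[1][len(x)] else "rejected")
```

  The three nested loops over j, i and k, each bounded by n, are where
  the cubic running time of the method comes from.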

  Notes: The parse is ambiguous if there is more than one (S, ...) in V[1,n]
  Multiple levels of the tree may be built while working back V[*,k] to
  V[*,k-1] and there may be more than one choice at any level if the
  parse is ambiguous.

  Example: given a string x = baaba
           given grammar productions

           A -> a      S -> AB
           B -> b      S -> BC
           C -> a      A -> BA
                       B -> CC
                       C -> AB

  V[i,j]                   i
             1(b)    2(a)    3(a)    4(b)    5(a)
        1    B       A,C     A,C     B       A,C
        2    S,A     B       S,C     S,A
    j   3    phi     B       B
        4    phi     S,A,C
        5    S,A,C
             ^
             |_ accept

  Derivation tree

            S
           / \
          B   C
          |  / \
          b A   B
            |  / \
            a C   C
             / \   \
            A   B   a
            |   |
            a   b

  This can be a practical parsing algorithm. But, not for large input.
  If you consider a computer language, each token is treated as a
  terminal symbol. Typically punctuation and reserved words are
  unique terminal symbols while all numeric constants may be grouped
  as one terminal symbol and all user names may be grouped as another
  terminal symbol.

  The size problem is that for n tokens, the V matrix is 1/2 n^2
  times the average number of CFG variables in each cell.
  The running time is O(n^3) with a small multiplicative constant.
  Thus, a 1000 token input might take 10 megabytes of RAM
  and execute in about one second. But this would typically be
  only a 250 line input, much smaller than many source files.

  For computer languages the LALR1 and recursive descent parsers
  are widely used.

  For working small problems, given a CFG find if it generates
  a specific string, use the available program cykp
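  The extension that records where each variable came from can be
  sketched in Python as well. Instead of the five tuples described in
  the previous lecture section, this simplified version stores one
  back pointer per variable and cell (the first derivation found), which
  is enough to rebuild one derivation tree. The names cyk_tree and
  leaves are illustrative assumptions and this is not how cykp itself
  is written.

```python
def cyk_tree(terminal_rules, binary_rules, x, start="S"):
    """Return one derivation tree for x as nested tuples, or None."""
    n = len(x)
    # back[i][j] maps a variable to how it was first derived:
    # a terminal character (j == 1) or (k, B, C) for A -> B C,
    # where B covers the first k characters.
    back = [[{} for _ in range(n + 2)] for _ in range(n + 2)]
    for i in range(1, n + 1):
        for A, a in terminal_rules:
            if a == x[i - 1]:
                back[i][1].setdefault(A, a)
    for j in range(2, n + 1):
        for i in range(1, n - j + 2):
            for k in range(1, j):
                for A, B, C in binary_rules:
                    if B in back[i][k] and C in back[i + k][j - k]:
                        back[i][j].setdefault(A, (k, B, C))

    def build(A, i, j):
        how = back[i][j][A]
        if j == 1:
            return (A, how)                       # leaf: A -> terminal
        k, B, C = how
        return (A, build(B, i, k), build(C, i + k, j - k))

    if n == 0 or start not in back[1][n]:
        return None
    return build(start, 1, n)


def leaves(tree):
    """Concatenate the terminals at the leaves of a tree from cyk_tree."""
    if isinstance(tree[1], str):
        return tree[1]
    return "".join(leaves(child) for child in tree[1:])


# The grammar of the example above.
TERMINAL_RULES = [("A", "a"), ("B", "b"), ("C", "a")]
BINARY_RULES = [("S", "A", "B"), ("S", "B", "C"), ("A", "B", "A"),
                ("B", "C", "C"), ("C", "A", "B")]

if __name__ == "__main__":
    tree = cyk_tree(TERMINAL_RULES, BINARY_RULES, "baaba")
    print(tree)
    print(leaves(tree))
```

  Because the example grammar is ambiguous for baaba, which tree comes
  out depends on the order in which splits and productions are tried.
  Keeping every back pointer instead of only the first would expose all
  parses.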

  Using the 'cykp' program on the sample grammar, trimming some,
  the result was lect24.out
  The input was lect24.g

Lecture 25 Pumping Lemma for Context Free Languages

  The Pumping Lemma is generally used to prove a language is not context free.

  If a PDA machine can be constructed to exactly accept a language,
  then the language is a Context Free Language.

  If a Context Free Grammar can be constructed to exactly generate the
  strings in a language, then the language is Context Free.

  To prove a language is not context free requires a specific definition
  of the language and the use of the Pumping Lemma for Context Free
  Languages.

  A note about proofs using the Pumping Lemma:

  Given: Formal statements A and B.
         A implies B.
  If you can prove B is false, then you have proved A is false.

  For the Pumping Lemma, the statement "A" is "L is a Context Free Language",
  The statement "B" is a statement from the Predicate Calculus.
  (This is a plain text file that uses words for the upside down A that
   reads 'for all' and the backwards E that reads 'there exists')

    A   B  |  A implies B
   --------+-------------
    F   F  |  T
    F   T  |  T   (you can prove anything to be true with a false premise)
    T   F  |  F
    T   T  |  T

  Formal statement of the Pumping Lemma:

  L is a Context Free Language implies
  (there exists n)(for all z)[z in L and |z|>=n implies
    {(there exists u,v,w,x,y)(z = uvwxy and |vwx|<=n and |vx|>=1 and
                              (for all i>=0)(u v^i w x^i y is in L) )}]

  The two commonest ways to use the Pumping Lemma to prove a language
  is NOT context free are:

  a) show that there is no possible n for the (there exists n),
     this is usually accomplished by showing a contradiction such
     as  (n+1)(n+1) < n*n+n

  b) show there is no way to partition some z into u,v,w,x,y such that

     u v^i w x^i y is in L, typically for a value i=0 or i=2.
     Be sure to cover all cases by argument or enumerating cases.
     [This gives a contradiction to the (for all z) clause.]

  Some Context free languages and corresponding grammars:

  L={a^i b^i | i>=1}             S->ab | aSb

  L={a^2i b^3i | i>=1}           S->aabbb | aaSbbb
                                 [any pair of constants for 2,3]

  L={a^i b^j c^i | i,j>=1}       S->aBc | aSc   B->b | bB

  L={a^i b^i c^j | i,j>=1}       S->DC   D->ab | aDb   C->c | cC

  L={a^i b^i a^j b^j | i,j>=1}   S->CC   C->ab | aCb

  L={u a^i w b^i y | i>=1 and u,w,y any fixed strings}
                                 S->uDy   D->awb | aDb

  L={w w^R | w in Sigma star and w^R is w written backwards}
                                 S->xx | xSx  for all x in Sigma

  Some languages that are NOT Context Free Languages

  L={ww | w in Sigma star}

  L={a^i b^i c^i | i>=1}

  L={a^i b^j c^k | k > j > i >= 1}

  L={a^f(i) b^i | i>=1}  where f(i)=i**2
                         f(i) is the ith prime,
                         f(i) not bounded by a constant times i
                         meaning f(i) is not linear

  L={a^i b^j c^i d^j | i,j>=1}

  Review of basics of proofs:

  You may be proving a lemma, a theorem, a corollary, etc

  A proof is based on:

  definition(s)
  axioms
  postulates
  rules of inference (typical normal logic and mathematics)

To be accepted as "true" or "valid"
  Recognized people in the field need to agree your
    definitions are reasonable
    axioms, postulates, ... are reasonable
    rules of inference are reasonable and correctly applied

"True" and "Valid" are human intuitive judgments but can be
based on solid reasoning as presented in a proof.

Types of proofs include:

  Direct proof (typical in Euclidean plane geometry proofs)
    Write down line by line provable statements,
    (e.g. definition, axiom, statement that follows from applying
    the axiom to the definition, statement that follows from
    applying a rule of inference from prior lines, etc.)

  Proof by contradiction:
    Given definitions, axioms, rules of inference
    Assume Statement_A
    use proof technique to derive a contradiction
    (e.g. prove not Statement_A or prove
    Statement_B = not Statement_B, like 1 = 2 or n > 2n)

  Proof by induction (on Natural numbers)
    Given a statement based on, say n, where n ranges over natural numbers
    Prove the statement for n=0 or n=1, then either
    a) Prove the statement for n+1 assuming the statement true for n
    b) Prove the statement for n+1 assuming the statement true for
       every value in 1..n

  Prove two sets A and B are equal,
    prove part 1, A is a subset of B
    prove part 2, B is a subset of A

  Prove two machines M1 and M2 are equal,
    prove part 1 that machine M1 can simulate machine M2
    prove part 2 that machine M2 can simulate machine M1

Limits on proofs:

  Godel incompleteness theorem:
  a) Any formal system with enough power to handle arithmetic will
     have true theorems that are unprovable in the formal system.
     (The theorem can also be proved using Turing machines.)
  b) Adding axioms to the system in order to be able to prove all the
     "true" (valid) theorems will make the system "inconsistent."
     Inconsistent means a theorem can be proved that is not accepted
     as "true" (valid).
  c) Technically, any formal system with enough power to do arithmetic
     is either incomplete or inconsistent.
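As a worked illustration of proof by contradiction with the Pumping Lemma for Context Free Languages, the sketch below takes L={a^i b^i c^i | i>=1} and the string z = a^n b^n c^n for one small n. It enumerates every split z = uvwxy with |vwx|<=n and |vx|>=1 and counts the splits for which both i=0 and i=2 keep u v^i w x^i y in L. The function names in_L, pump and surviving_splits are made up for this example. A finite check for one n is not a proof; the proof needs the case argument for every n. The check only shows what that argument claims.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if s[0..len) has the form a^k b^k c^k with k >= 1, else 0. */
int in_L(const char *s, size_t len)
{
    size_t k, i;
    if (len == 0 || len % 3 != 0) return 0;
    k = len / 3;
    for (i = 0; i < len; i++) {
        char want = (i < k) ? 'a' : (i < 2 * k) ? 'b' : 'c';
        if (s[i] != want) return 0;
    }
    return 1;
}

/* Write u v^i w x^i y into out and return its length, where
   u = z[0..p), v = z[p..q), w = z[q..r), x = z[r..t), y = z[t..len). */
size_t pump(const char *z, size_t len, size_t p, size_t q,
            size_t r, size_t t, int i, char *out)
{
    size_t n = 0;
    int rep;
    memcpy(out + n, z, p);            n += p;            /* u   */
    for (rep = 0; rep < i; rep++) {                      /* v^i */
        memcpy(out + n, z + p, q - p); n += q - p;
    }
    memcpy(out + n, z + q, r - q);    n += r - q;        /* w   */
    for (rep = 0; rep < i; rep++) {                      /* x^i */
        memcpy(out + n, z + r, t - r); n += t - r;
    }
    memcpy(out + n, z + t, len - t);  n += len - t;      /* y   */
    return n;
}

/* For z = a^n b^n c^n, count the splits z = uvwxy with |vwx| <= n and
   |vx| >= 1 for which both i=0 and i=2 stay in L.  Limited to n <= 20
   so that the fixed-size buffers below are large enough. */
int surviving_splits(int n)
{
    char z[3 * 20], out[4 * 20];   /* pumped string has at most 4n letters */
    size_t len, p, q, r, t;
    int count = 0;

    assert(n >= 1 && n <= 20);
    len = 3 * (size_t)n;
    memset(z, 'a', (size_t)n);
    memset(z + n, 'b', (size_t)n);
    memset(z + 2 * n, 'c', (size_t)n);

    for (p = 0; p <= len; p++)
        for (q = p; q <= len; q++)
            for (r = q; r <= len; r++)
                for (t = r; t <= len; t++) {
                    if (t - p > (size_t)n) continue;         /* |vwx| <= n */
                    if ((q - p) + (t - r) < 1) continue;     /* |vx| >= 1  */
                    if (in_L(out, pump(z, len, p, q, r, t, 0, out)) &&
                        in_L(out, pump(z, len, p, q, r, t, 2, out)))
                        count++;
                }
    return count;
}
```

Called from a main program, surviving_splits(4) should return 0. Because |vwx|<=n, the substring vwx touches at most two of the three blocks of letters, so deleting v and x (i=0) leaves the three counts unequal.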

T2. An example grammar is coded in acfg. T1 union T2. As a supplement. S1) P6 is P1 with every occurrence of "a" replaced with S2. and a file that is a main program that runs the parser. S1) for language L1(G1) G2 = (V2.c acfg.cs. P6+P2. P1+P2+P3. Notice that L1 intersect L2 may not be a CFG. P1+P2+P5. We can easily get CFG's for the following languages: L1 union L2 = L3(G3) G3 = (V1 union V2 union {S3}. P1.tab. S4) S4 -> S1S2 L1 star = L5(G5) G5 = (V1 union V2 union {S5}.c a. P2. The steps are to create a file xxx.j>0} L2={a b c | i.j>0} are CFG's i i i but L1 intersect L2 = {a b c | i>0} which is not a CFG.y and a sample main program is coded in yymain.c One possible set of commands to run the sample is: bison acfg.y gcc yymain. the following shows how to take a CFG and possibly use 'yacc' or 'bison' to build a parser for the grammar. The intersection of a Context Free Language with a Regular Language is a Context Free Language. T1.y that includes the grammar. T1-{a} union T2. The complement of L1 may not be a CFG.html Lecture 25a CFL closure properties Given two CFG's G1 = (V1.CMSC 451 Selected Lecture Notes http://www. T1 union T2. S3) S3 -> S1 | S2 L1 concatenated L2 = L4(G4) G4 = (V1 union V2 union {S4}. The difference of two CFG's may not be a CFG.umbc. P1+P2+P4. S5) S5 -> S5S1 | epsilon L2 substituted for terminal "a" in L1 = L6(G6) G6 = (V1 union V2.edu/~squire/cs451_lect. S2) for language L2(G2) Rename variables until V1 intersect V2 is Phi. T1 union T2. i i j i j i Example: L1={a b c | i.out 60 of 66 05/21/2011 09:31 AM .

  abaa

The output from this run shows the input being read and
the rules being applied:

  read a
  read b
  read a
  A -> ba
  read a
  read end of line \n converted to 0
  S -> a
  S -> aAS
  accepted

A grammar that may be of more interest is a grammar for a calculator.
Simple statements such as
  a=2
  b=3
  a+b
that print the answer 5 can be coded as a CFG.

The following example uses bison format and has a simple lexical
analysis built in for single letter variables:
calc.y and a sample main program yymain.c

One possible set of commands to run calc is:

  bison calc.y
  gcc -o calc yymain.c calc.tab.c
  calc
  a=2
  b=3
  a+b

You should get the result 5 printed.
See the grammar to find out what other operations are available.

Lecture 26 The Halting Problem

The "Halting Problem" is a very strong, provably correct, statement
that no one will ever be able to write a computer program or design
a Turing machine that can determine if an arbitrary program will
halt (stop, exit) for a given input.

This is NOT saying that some programs or some Turing machines can not
be analyzed to determine that they, for example, always halt.

The Halting Problem says that no computer program or Turing machine
can determine if ALL computer programs or Turing machines will halt
or not halt on ALL inputs. To prove the Halting Problem is unsolvable
we will construct one program and one input for which there is no
computer program or Turing machine that can correctly determine if
it halts or does not halt.
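To make the distinction concrete, here is a minimal sketch of one loop that is easy to analyze and one that is not. Both function names are invented for this illustration. sum_to always halts because its loop counter runs from 0 up to a bound fixed on entry. collatz_steps follows the well known 3n+1 rule; with the step limit removed, nobody has proved that its loop halts for every starting value (the Collatz conjecture), although it halts for every value tried so far. The step limit is an assumption added here so that the function as written always returns.

```c
#include <assert.h>

/* Always halts: the loop body runs exactly n times and n never changes.
   (The sum wraps around for very large n; that does not affect halting.) */
unsigned long sum_to(unsigned long n)
{
    unsigned long i, total = 0;
    for (i = 0; i < n; i++)
        total += i + 1;
    return total;
}

/* Apply the 3n+1 rule until n reaches 1.
   Returns the number of steps taken, or -1 if max_steps steps were
   not enough.  Without the max_steps cutoff it is an open question
   whether this loop halts for every n >= 1.
   Note: 3*n+1 can overflow for very large n; this sketch ignores that. */
long collatz_steps(unsigned long n, long max_steps)
{
    long steps = 0;
    assert(n >= 1);
    while (n != 1) {
        if (steps >= max_steps)
            return -1;
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}
```

For example, starting from 6 the sequence is 6, 3, 10, 5, 16, 8, 4, 2, 1, so collatz_steps(6, 1000) returns 8. Finding that particular inputs halt does not contradict the Halting Problem, which concerns a single method that works for ALL programs and ALL inputs.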

We will use very powerful mathematical concepts and do the proofs
for both a computer program and a Turing machine. The mathematical
concepts we need are:

Proof by contradiction. Assume a statement is true, show that the
assumption leads to a contradiction. Thus the statement is proved false.

Self referral. Have a computer program or a Turing machine operate
on itself, well, a copy of itself, as input data. Specifically we will
use diagonalization, taking the enumeration of Turing machines and
using TMi as input to TMi.

Logical negation. Take a black box that accepts input and outputs
true or false, put that black box in a bigger black box that switches
the output so it is false or true respectively.

The simplest demonstration of how to use these mathematical concepts
to get an unsolvable problem is to write on the front and back of
a piece of paper "The statement on the back of this paper is false."
Starting on side 1, you could choose "True" and thus deduce side 2
is "False". But starting on side 2, which is exactly the same as
side 1, you get that side 2 is "True" and side 1 is "False."
Since side 1, and side 2, can be both "True" and "False" there
is a contradiction. The problem of determining if sides 1 and 2 are
"True" or "False" is unsolvable.

The Halting Problem for a programming language. We will use the "C"
programming language, yet any language will work.

Assumption: There exists a way to write a function named Halts such that:

  int Halts(char * P, char * I)
  {
    /* code that reads the source code for a "C" program, P,
       determines that P is a legal program, then determines if P
       eventually halts (or exits) when P reads the input string I,
       and finally sets a variable "halt" to 1 if P halts on input I,
       else sets "halt" to 0 */
    return halt;
  }

Construct a program called Diagonal.c as follows:

  int main()
  {
    char I[100000000]; /* make as big as you want or use malloc */
    read_a_C_program_into( I );
    if ( Halts(I,I) ) { while(1){} } /* loop forever, means does not halt */
    else return 1;
  }

Compile and link Diagonal.c into the executable program Diagonal.
Now execute

  Diagonal < Diagonal.c

Consider two mutually exclusive cases:

Case 1: Halts(I,I) returns a value 1.
        This means, by the definition of Halts, that Diagonal.c halts
        when given the input Diagonal.c.
        BUT! we are running Diagonal.c (having been compiled and linked)
        and so we see that Halts(I,I) returning a value 1 causes the "if"
        statement to be true and the "while(1){}" statement to be
        executed, which never halts, thus our executing Diagonal.c
        does NOT halt.
        This is a contradiction because this case says that Diagonal.c
        does halt when given input Diagonal.c.
        Well, try the other case.

Case 2: Halts(I,I) returns a value 0.
        This means, by the definition of Halts, that Diagonal.c does NOT
        halt when given the input Diagonal.c.
        BUT! we are running Diagonal.c (having been compiled and linked)
        and so we see that Halts(I,I) returning a value 0 causes the
        "else" to be executed and the main function halts (stops, exits).
        This is a contradiction because this case says that Diagonal.c
        does NOT halt when given input Diagonal.c.

There are no other cases, Halts can only return 1 or 0.
Thus what must be wrong is our assumption "there exists a way
to write a function named Halts..."

An example of a program that has not been proven to halt on all
numbers is terminates.c; some cases are shown in terminates.out

Every Turing machine can be represented as a unique binary number.
Any method of encoding could be used. We can assume ASCII encoding of
a sample
     s        0        space    ...  R
  as 01110011 00110000 00100000 ...  01010010
just a big binary integer.

The Halting Problem for Turing machines.

Assumption: There exists a Turing machine, TMh, such that:
When the input tape contains the encoding of a Turing machine, TMj,
followed by input data k, TMh accepts if TMj halts with input k and
TMh rejects if TMj is not a Turing machine or TMj does not halt with
input k.

Note that TMh always halts and either accepts or rejects.

Pictorially TMh is:

  +----------------------------
  | encoded TMj B k BBBBB ...
  +----------------------------
     ^ read and write, move left and right
     |
     |  +-----+
     |  |     |--> accept
     +--+ FSM |               always halts
        |     |--> reject
        +-----+
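The ASCII encoding mentioned above can be sketched in a few lines of C. The function name encode_bits and its buffer-size parameter are choices made for this illustration, and the code assumes the compiler's character set is ASCII. It writes each character of a machine description as eight bits, most significant bit first, so the whole description becomes one long string of 0's and 1's, that is, one big binary integer.

```c
#include <assert.h>
#include <string.h>

/* Write the 8-bit code of every character of text into out as the
   characters '0' and '1', most significant bit first, followed by a
   terminating '\0'.  Returns the number of bits written, or 0 if
   out_size is too small (8 bits per character plus the terminator). */
size_t encode_bits(const char *text, char *out, size_t out_size)
{
    size_t n = 0;
    assert(text != NULL && out != NULL);
    if (8 * strlen(text) + 1 > out_size)
        return 0;
    for (; *text != '\0'; text++) {
        unsigned char c = (unsigned char)*text;
        int bit;
        for (bit = 7; bit >= 0; bit--)
            out[n++] = ((c >> bit) & 1) ? '1' : '0';
    }
    out[n] = '\0';
    return n;
}
```

With an ASCII character set, encoding the three characters "s0 " gives 01110011 00110000 00100000 (written here with spaces for readability), matching the sample in the notes. Any other one-to-one encoding would serve equally well.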

We now use the machine TMh to construct another Turing machine TMi.
We take the Finite State Machine, FSM, from TMh and
  1) make none of its states be final states
  2) add a non final state ql that on all inputs goes to ql
  3) add a final state qf that is the accepting state

Pictorially TMi is:

  +-------------------------------------------
  | encoded TMj B k BBBBB ...
  +-------------------------------------------
     ^ read and write, move left and right
     |
     |  +----------------------------------+
     |  |                      __          |
     |  |                     /  \  0,1    |
     |  |                  +-| ql |--+     |
     |  | +-----+          |  \__/   |     |
     |  | |     |--> accept+    ^    |     |
     +--+-+ FSM |               |____|     |   may not halt
        | |     |--> reject+     __        |
        | +-----+          |   //  \\      |
        |                  +--|| qf ||-----|--> accept
        |                      \\__//      |
        +----------------------------------+

We now have Turing machine TMi operate on a tape that has TMi as
the input machine and TMi as the input data.

  +-------------------------------------------
  | encoded TMi B encoded TMi BBBBB ...
  +-------------------------------------------
     ^ read and write, move left and right
     |
     |  +----------------------------------+
     |  |                      __          |
     |  |                     /  \  0,1    |
     |  |                  +-| ql |--+     |
     |  | +-----+          |  \__/   |     |
     |  | |     |--> accept+    ^    |     |
     +--+-+ FSM |               |____|     |   may not halt
        | | TMh |--> reject+     __        |
        | +-----+          |   //  \\      |
        |                  +--|| qf ||-----|--> accept
        |                      \\__//      |
        +----------------------------------+

Consider two mutually exclusive cases:

Case 1: The FSM accepts thus TMi enters the state ql.
        This means, by the definition of TMh, that TMi halts with
        input TMi.
        BUT! we are running TMi on input TMi with input TMi
        and so we see that the FSM accepting causes TMi to loop forever
        thus NOT halting.
        This is a contradiction because this case says that TMi
        does halt when given input TMi with input TMi.
        Well, try the other case.

Case 2: The FSM rejects thus TMi enters the state qf.
        This means, by the definition of TMh, that TMi does NOT halt
        with input TMi.
        BUT! we are running TMi on input TMi with input TMi
        and so we see that the FSM rejecting causes TMi to accept
        and halt.
        This is a contradiction because this case says that TMi
        does NOT halt when given input TMi with input TMi.

There are no other cases, FSM either accepts or rejects.
Thus what must be wrong is our assumption "there exists a
Turing machine, TMh, such that..."

QED.

Thus we have proved that no Turing machine TMh can ever be created
that can be given the encoding of any Turing machine, TMj, and any
input, k, and always determine if TMj halts on input k.

Lecture 27 Church Turing Thesis

This is a mathematically unprovable belief that a reasonable
intuitive definition of "computable" is equivalent to the list of
provably equivalent formal models of computation:

  Turing machines
  Lambda Calculus
  Post Formal Systems
  Partial Recursive Functions
  Unrestricted Grammars
  Recursively Enumerable Languages (decision problems)

and intuitively what is computable by a computer program written
in any reasonable programming language.

Lecture 28 Review

Review previous lectures

Lecture 29 Final Exam

Closed book. Multiple choice and short answer questions.
Covers all lectures, all reading assignments, all homework and project.

  1) go over the Quiz 1 and Quiz 2
     the final exam will include some of these questions
  2) You may find it helpful to go over the Lecture Notes
     but not the details of constructions
  3) Understand what classes of languages go with what
     machines (automata) and grammars and regular expressions
  4) Understand the statement of the Pumping Lemma for
     Context Free Languages
  5) Understand the Halting Problem and why it is not computable
  6) Understand the Church Turing Hypothesis

Other links

  Syllabus
  homework details HW1..HW6, Quiz 1
  homework details HW7..HW10, Quiz 2 and Final
  Lecture notes, one per page
  Lecture notes, one big page
  Project description
  Turing machine and other simulators and parsers
  Downloadable source and executables
  CMSC 451 Course Page
  Formal Language Definitions
  Automata Definitions
  Computability Definitions
  Language Class Definitions

Last updated 3/25/04
