Chapter 14

The finite state control structure
1 Analogy
How are you feeling right now? Maybe you are happy or sleepy, dopey or grumpy, hungry
or angry. Maybe you feel several of these at once, but for simplicity, let’s assume that just
one word describes your current state. How did you come to be in this state? Clearly your
state is affected by things that happen to you — the inputs you receive. But input alone
does not determine how you feel. For example, receiving a 90% on an exam can leave you
either delighted or disappointed, depending on your expectations. Hence, your current
state is really a function of two factors: your state a little while ago, and the input you
have received since then. In other words, your state at time t+1 is determined by your
state at time t and the input you encountered between t and t+1.
The notion of state and of transition from one state to another as determined by a
combination of previous state and input is the basis for a set of computational models
called finite state machines (FSMs). The FSM model in turn leads us in a very natural
way to a powerful and broadly applicable programming strategy which we will examine in
this chapter.

2 Introduction
Computational models, or models of computation, are abstractions of devices that
produce outputs (answers) from inputs (data). For simplicity, we'll assume a basic model
of a computational device as a black box that has a single input channel and a single
output channel. One simple physical form of such a computational device would be a
black box with a set of buttons on the front, exactly one of which can be pressed at any
time, and a set of lights on the top, exactly one of which is lit at any time. An input value is
specified by pressing one of the buttons; the output value is specified by the single light
that is lit. We can easily think of this device as producing a single output (light) in response
to a single input (button). But we can also think of the device as being given a sequence of
inputs (by punching one button after another) and producing a sequence of outputs (by
one light after another being lit). The most recent input is often referred to as the current
input, and the light currently lit is the current output. Note that what goes on inside the
box can be very complicated or even random; this simple model can be modified to
accommodate any computational task we like. But we are interested only in a small set of
the possible behaviors of the box, and we are interested only in behaviors that are
deterministic.

© 2001 Donald F. Stanat and Stephen F. Weiss
Chapter 14: Finite State Machines
The simplest computational model is one in which the box computes a function of the
input value. Such a box will produce an output value (turn on a light) solely on the basis
of the current input value (the last button pressed). Because the output value is determined
only by the most recent input value, we say that the computation requires no memory; this
means that no information about previous inputs has to be stored to perform the
computation that determines which light shall be lit. Furthermore, given a particular input,
say the ith button, the output is always the same.
We are interested in a more complex computational model that uses some information
about past inputs in determining its output. Conceptually, we can imagine a device that
keeps track of every input it has processed, that is, the entire input history. Simpler devices
might keep track of less information, such as how many times the leftmost button was
pressed, but even this information is unbounded in the sense that the number that must be
stored may become arbitrarily large, and require arbitrarily much storage to represent it.
We're interested in a simpler class of machines called Finite State Machines (FSM), which
can store only a finite amount of information, and give outputs that depend on that
information and the input.
A Finite State Machine is a device that can be in any one of a specified finite set of states
and which can change state as a result of an input. Thus, each time an FSM gets an input,
we consider it to change states, although it may enter the same state it was in before.
Finite state machines are useful for programming because they provide an alternative
model for controlling program execution. Recall that program control is what determines
which program statement (or block of statements) is to be executed next. The most
common control structure is the default sequential structure; this causes the statements of
a program to be executed sequentially, one after another, just as they appear in the
program. The other control structures are
1. Alternative selection. This is usually embodied as an if, if...else, or switch
statement. This performs one or more tests, and based on the result of the tests,
chooses one block of code from a collection of blocks and executes it. The block
of code may of course be empty, as it is in the false branch of an if statement without an else.
2. Iteration. This is any loop structure; it causes some loop body to be executed a
number of times with exit from the loop based on a test.
3. Subroutine. This causes the current action to be suspended. Control then branches
to the subroutine code; upon completion of the subroutine, the original action
resumes where it left off. Often recursion (which occurs when a subroutine calls
itself) is treated as a separate control structure, but we choose to include it under
the subroutine structure.
To this collection we now add the finite state control structure, which chooses a statement
to execute (or a block of statements or a subroutine to call) based on the state of a finite
state machine and the most recent input.
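To make the finite state control structure concrete, here is a small hypothetical example in Java that does not come from the original text. A word counter's control is a two-state machine (inside a word / outside a word), and a switch on the current state, combined with the most recent input character, chooses which block of code runs. The class name, the state names, and the rule that only the blank character separates words are all assumptions made for illustration.

```java
// Hypothetical illustration of the finite state control structure:
// the block of code to execute is chosen by the current state
// together with the most recent input character.
class WordCounter {
    // The two states of the controlling machine.
    enum State { OUTSIDE_WORD, INSIDE_WORD }

    // Counts maximal runs of non-blank characters in line.
    static int countWords(String line) {
        State state = State.OUTSIDE_WORD;      // start state
        int words = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);           // most recent input
            boolean blank = (c == ' ');
            switch (state) {
                case OUTSIDE_WORD:
                    if (!blank) {              // a word begins here
                        words++;
                        state = State.INSIDE_WORD;
                    }
                    break;
                case INSIDE_WORD:
                    if (blank) {               // the current word has ended
                        state = State.OUTSIDE_WORD;
                    }
                    break;
            }
        }
        return words;
    }
}
```

For example, `WordCounter.countWords("  the quick  brown fox ")` returns 4. No counter of previous characters is needed, because the state alone records everything about the past that matters.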
The remainder of this chapter will define finite state machines and work through several
simple examples to develop the concept and to give you practice in working with them.
Then, we will give some examples to show how the finite state machine can be used to
solve a more complex problem.

3 The Basic Finite State Machine Model
A finite state machine (also called a finite state automaton, or simply a finite automaton)
is a device whose input is a sequence of symbols. The automaton is always in some
identifiable state. Each time an input symbol is received, the machine enters a new state,
although the new state may be the same as the state it was in before. At each point, the
current state of the machine is determined completely by:
1. the state it was in prior to the last input, and
2. the value of the last input symbol.
Formally, a finite state machine consists of six components:
S: a finite set of states. A FSM is always in exactly one of its states.
s0: a particular state called the start state that the machine is in before it has seen any
input.
I: a finite set of input symbols.
O: a finite set of outputs.
$\delta$: the next state function, which maps a current state and current input to a new state:
$\delta: S \times I \to S$.
$\lambda$: the output function, which maps the current state and the current input symbol to an
output: $\lambda: S \times I \to O$.
The machine starts out in s0, looking at the first symbol of a sequence of input
symbols. It then issues the output appropriate to this state-symbol combination and goes
to the appropriate next state. It then goes on to the next input symbol and the process
repeats until the input sequence is exhausted; the machine then stops. Note that the
machine is always in some state. It starts out in s0 and is left in some state when operation
stops.
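The six components translate almost directly into a table-driven program. The following minimal sketch is not the authors' code. It assumes that states are numbered 0 to |S|-1 and input symbols 0 to |I|-1, so that two arrays can play the roles of $\delta$ and $\lambda$. The class name `TableFSM` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal table-driven finite state machine (a sketch).
// States are numbered 0..|S|-1 and input symbols 0..|I|-1; the two
// tables play the roles of the next state function delta and the
// output function lambda. O is the type of the output symbols.
class TableFSM<O> {
    private final int startState;       // s0
    private final int[][] nextState;    // delta: nextState[s][x]
    private final O[][] output;         // lambda: output[s][x]

    TableFSM(int startState, int[][] nextState, O[][] output) {
        this.startState = startState;
        this.nextState = nextState;
        this.output = output;
    }

    // Runs the machine over the whole input, returning one output per input symbol.
    List<O> run(int[] input) {
        List<O> outputs = new ArrayList<>();
        int state = startState;                // the machine starts in s0
        for (int x : input) {
            outputs.add(output[state][x]);     // issue the output for (state, x)
            state = nextState[state][x];       // then move to the next state
        }
        return outputs;                        // the machine is left in some state
    }
}
```

For instance, the next state and output tables of Figure 1 become `new int[][]{{0, 1}, {1, 0}}` and `new Integer[][]{{0, 1}, {0, 0}}`. Running that machine on the digits 011010111 yields 010010010.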

Example: Our first example of an FSM is one that takes as input a sequence of binary digits and produces as output the same sequence except that alternate ones have been changed to zeros. Thus the input 011010111 becomes 010010010. Both the input and output sets are {0, 1}. The machine has two states, s0 and s1, with s0 being the start state. The next state and output functions are shown in the tables below.

Next state function            Output function
          input                          input
state     0     1              state     0     1
S0        S0    S1             S0        0     1
S1        S1    S0             S1        0     0
Figure 1

An alternate but equivalent representation for a FSM is shown in Figure 2 below and is called a state diagram. States are shown as circles; the start state is indicated by the bold incoming arrow. The next state function and output functions are shown using directed arrows from one state to another. Each arrow is labeled with one element of I and one element of O. If the machine is in some state s and the current input symbol is x, then we follow the arc labeled x/y from s to a new state and produce output y.

Figure 2: State diagram of the alternate-ones machine (S0 to S0 on 0/0, S0 to S1 on 1/1, S1 to S1 on 0/0, S1 to S0 on 1/0)

The machine shown above produces a stream of output symbols that is exactly as long as the input stream. By allowing “nothing” to be an element of the output symbol set, we can specify machines whose output stream is shorter than the input. For example, the machine in Figure 3 has a two symbol input set {a, b} and produces outputs of {a, b, “nothing”}. It reads in strings of a’s and b’s and collapses substrings of a’s into a single a, and substrings of b’s into a single b. Hence the input string “aaaabaabbbbbabbab” will produce the output “abababab”, and the input “aaaaaaaaaaaaaab” will produce “ab”.
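The machine of Figure 3 can be sketched directly in Java. This is an illustration rather than the authors' code. It assumes that S1 means "the last symbol read was a" and S2 means "the last symbol read was b". The “nothing” output is modeled by appending no character, and any symbol other than a or b is rejected because the input set is {a, b}. The class name `AbReducer` is hypothetical.

```java
// Sketch of the "ab reducer" of Figure 3. Input symbols are assumed to be
// only 'a' and 'b'; the output "nothing" is modeled by appending no character.
class AbReducer {
    static String reduce(String input) {
        int state = 0;                          // S0: start state, nothing read yet
        StringBuilder out = new StringBuilder();
        for (char x : input.toCharArray()) {
            if (x != 'a' && x != 'b')
                throw new IllegalArgumentException("input set is {a, b}: " + x);
            switch (state) {
                case 0:                         // S0: copy the first symbol
                    out.append(x);
                    state = (x == 'a') ? 1 : 2;
                    break;
                case 1:                         // S1: last symbol read was 'a'
                    if (x == 'b') { out.append('b'); state = 2; }
                    break;                      // on 'a': output nothing, stay in S1
                default:                        // S2: last symbol read was 'b'
                    if (x == 'a') { out.append('a'); state = 1; }
                    break;                      // on 'b': output nothing, stay in S2
            }
        }
        return out.toString();
    }
}
```

`AbReducer.reduce("aaaabaabbbbbabbab")` gives "abababab", matching the example in the text.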

Figure 3: State diagram of the ab reducer (states S0, S1, S2; arcs labeled a/a, b/b, a/nothing, b/nothing)

3.1 Example: the stamp machine
Consider a simple vending machine that dispenses 25-cent stamps.1 The set of inputs is nickels, dimes and quarters, inserted one at a time rather than all at once. The machine must keep track of how much money has been put in so far, and if the amount is 25 cents or more, dispense one stamp and reduce the customer's "credit" by 25 cents. If, for the sake of simplicity, we assume no coin return mechanism, there are only two outputs: one is a single stamp from the roll, the other is nothing, that is, no stamp from the roll. Note that the machine need not remember exactly how much money has been put in it; it must only remember the outstanding credit. Therefore, since it has no other form of memory, it must have one state for each possible amount of credit. There will be one state indicating that the credit is currently zero, one state indicating a credit of 5 cents, plus states for 10, 15 and 20 cents. There is no need for a state with a credit amount of 25 cents or more, because when the credit amount reaches 25 cents the machine will produce a stamp and decrease the credit amount by 25 cents. The list of states and possible actions is described in a table in Figure 4. This table merges the output and next state functions; each rectangle contains an output (top line) and a new state (bottom line).

Figure 4: Combined output and next state table for the stamp machine

1 With postal rates constantly going up, it’s impossible to keep this example current. So just return with us to those thrilling days of yesteryear when a first class stamp was 25 cents.

In the above table, every possibility has been accounted for: no matter what state the machine is in and which input it receives, there is exactly one output to produce and one next state. For example, suppose the machine is in the state “10 cents”. If the customer puts in a nickel, it adds five cents to the amount put in so far and hence goes to state “15 cents” and produces the output “nothing”. If another nickel is added, the machine goes to the “20 cents” state and again produces “nothing”. If the customer then adds a dime, the machine now has enough money to dispense a stamp and have 5 cents credit left over. Hence the machine produces the output “one stamp” and goes to the “5 cents” state. Notice that new state doesn’t always imply different state. For any credit balance, adding a quarter produces a stamp and leaves the credit balance unchanged. In other words, when the input is a quarter, the machine always enters the same state it was in before and dispenses a stamp; the new state is in fact the same as the old.
The state diagram for the stamp machine, shown in Figure 5, provides an easy way to visualize what the stamp machine does. It starts out in the initial state (credit = 0 cents). An input can be a nickel, a dime, or a quarter. When the input is received the machine goes from the current state into a new state, following the arrow which is labeled with the type of coin that has been inserted. Some arrows are also labeled to denote that a stamp is produced as output when these paths are taken. If the arrow the machine follows does not say to output a stamp, then the output is “nothing”. Thus, from any state in the stamp machine, adding a quarter will result in the output “one stamp” with no change in the credit balance.

Figure 5: State diagram for the stamp machine

As it is currently specified, the stamp machine will cheat the user out of some money if he or she runs out of coins while the machine is in some state other than 0. We could make the machine more realistic and more humane by adding another input, “I am done”, and adding four new outputs corresponding to giving change of 5, 10, 15, and 20 cents. We would then add a new arc from each state to the 0 state with the input symbol being “I am done” and the output being the appropriate amount of change. For example, the arc from the 15 cents state would dispense 15 cents in change. The arc from the 0 state back to itself would produce no output.
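Here is a minimal Java sketch of the basic stamp machine, without the “I am done” extension. Because the machine's only memory is its state, the single field `credit` holds the current state, which is always one of 0, 5, 10, 15 or 20 cents. Representing coins by their value in cents, along with the names `StampMachine`, `insert`, and `Output`, are assumptions made for illustration. Each call to `insert` performs one transition.

```java
// Sketch of the stamp machine (no coin return). The state is the
// outstanding credit in cents: one of 0, 5, 10, 15, 20.
// Inputs are coin values: 5 (nickel), 10 (dime) or 25 (quarter).
class StampMachine {
    enum Output { STAMP, NOTHING }

    private int credit = 0;   // start state: credit = 0 cents

    // One transition: returns the output and moves to the next state.
    Output insert(int coin) {
        if (coin != 5 && coin != 10 && coin != 25)
            throw new IllegalArgumentException("not a nickel, dime or quarter: " + coin);
        int total = credit + coin;
        if (total >= 25) {          // enough money: dispense one stamp
            credit = total - 25;    // at most 20 cents remain, so still a valid state
            return Output.STAMP;
        }
        credit = total;             // below 25 cents: just remember the credit
        return Output.NOTHING;
    }

    int credit() { return credit; }
}
```

Replaying the trace in the text (dime, nickel, nickel, dime) gives three “nothing” outputs followed by a stamp, and leaves the machine in the 5 cents state. Inserting a quarter after that produces a stamp and leaves the credit at 5 cents.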

4 Implementing a FSM
A FSM can be implemented with a simple loop.

    state = start state;
    while (there is more input) {
        x = next input symbol;
        output(λ(state, x));    // Generate appropriate output.
        state = δ(state, x);    // Move to next state.
    }

The body of the loop contains three operations: each time through the loop we get one input symbol, produce the appropriate output symbol, and then go to the next state. First, we must get the next input symbol. This could be done with a read statement or perhaps by getting the next element from an array or linked list. The second and third statements contain function calls to generate the appropriate output and go to the next state. This could be done by hard wiring the output and transition information into the code of the functions or by using a more general table look-up scheme. When the input stream is exhausted, the loop terminates and the machine stops.

5 Final output machines
The machines we have seen so far all take a sequence of input symbols and produce a sequence of output symbols. The first machine took in one binary integer and produced another of the same length. The ab reducer machine took an input stream and produced a possibly shorter stream with duplicates eliminated. The stamp machine took in a sequence of coins and produced a sequence of stamps and possibly some change. While a FSM always produces a sequence of outputs in response to a sequence of inputs, we can choose to ignore all the outputs except for the last one produced, that is, the output symbol that was associated with the last symbol of the input sequence.
As an example, let’s build a FSM that will take a string of a’s as input and tell us whether the input contained an odd or even number of a’s. The input set will consist of just the single symbol “a”, and the output set will consist of E for even and O for odd. How many states should the machine have? To figure this out, remember that the states of a finite state machine correspond roughly to its memory. It is not necessary to remember exactly how many a’s have been seen. In fact, there are only two possibilities: that we have seen an odd number of a’s so far or that we have seen an even number of a’s so far. Hence we will need two states: s0 for even so far, and s1 for odd so far. The even state, s0, will be the start state, since initially we have seen zero a’s and zero is an even number. The next state and output functions will be as follows. If we are in the even state and see an a, then we have now seen an odd number; conversely, if we are in the odd state and see an a, then we have now seen an even number of a’s. That is, if we have seen an even number so far and see another a, then we go to the odd state and output an O since we have now seen an odd number of a’s. If we have seen an odd number so far and then see another a, then we go to the even state and output an E since we have now seen an even number of a’s.
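Below, the Section 4 loop is applied to the even/odd machine, keeping only the final output. This is a sketch, not the authors' code. `nextState` and `output` stand in for δ and λ. Because outputs are attached to transitions, the empty string produces no output at all, which the sketch models as `null`. The class and method names are hypothetical.

```java
// Sketch of the even/odd machine run with the loop of Section 4,
// keeping only the final output. Output is associated with transitions,
// so the empty input produces no output at all (represented by null).
class ParityMachine {
    static final int EVEN = 0, ODD = 1;   // s0 and s1

    // delta: every 'a' flips the parity.
    static int nextState(int state, char x) { return state == EVEN ? ODD : EVEN; }

    // lambda: report the parity after reading x.
    static char output(int state, char x) { return state == EVEN ? 'O' : 'E'; }

    // Returns the last output symbol, or null if the input was empty.
    static Character finalOutput(String input) {
        int state = EVEN;                      // start state: zero a's seen
        Character last = null;
        for (char x : input.toCharArray()) {   // x = next input symbol
            if (x != 'a') throw new IllegalArgumentException("input set is {a}");
            last = output(state, x);           // generate appropriate output
            state = nextState(state, x);       // move to next state
        }
        return last;
    }
}
```

`finalOutput("aaa")` yields 'O' and `finalOutput("aaaa")` yields 'E'. `finalOutput("")` yields `null`, even though zero is even.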

The state diagram for this machine is shown below.

Figure 6: Even/odd machine (S0: even, S1: odd; S0 to S1 on a/O, S1 to S0 on a/E)

The last output symbol produced by the machine just before it stops gives the parity of the input. Notice one unusual thing about this machine. Since output is associated with state transitions, there can be no output produced by the empty string, even though the string of zero a’s is of even length. We can think of such machines as implementing a function mapping input strings onto a single element of the output set.
Figure 7 shows a FSM that determines whether a binary integer is evenly divisible by 2, by 3, by both 2 and 3, or by neither. The input is the sequence of binary digits that constitute the number (reading left to right). The output is 2, 3, b (for both) or n (for neither). The sequence of output symbols has no real significance, although the individual symbols give the division property of that portion of the input seen so far. The last output symbol gives the property for the entire number.

Figure 7: Divisibility machine (states S0 through S5; arcs labeled 0/b, 0/2, 1/3, 1/n)

How was the state diagram of Figure 7 designed? The divisibility of an integer n by 2 and 3 is determined by the value of n mod 6, or the remainder when n is divided by 6. Thus, if n is divisible by 6, then it is divisible by both 2 and 3, and if the remainder of n divided by 6 is 4, then n is divisible by 2 (because 4 is) and not by 3 (because 4 is not). The state subscript on the states in Figure 7 represents the value of n mod 6. When the binary representation of n is extended by adding a 0, the result is the binary representation of 2n. When the binary representation of n is extended by adding a 1, the result is the binary representation of 2n + 1. With those facts in hand, constructing the state diagram of Figure 7 is straightforward.
Figure 8 shows a FSM whose input is an arbitrary string of letters, digits, and blanks. The final output indicates whether the string is blank (consisting solely of blanks), numeric (digits and blanks), alphabetic (letters and blanks), or alphanumeric (letters, digits, and blanks). Rather than labeling the arcs with all possible inputs, we use 'L' for letter, 'D' for digit, and 'B' for blank. The outputs are 'B' for blank, 'N' for numeric, 'Ab' for alphabetic, and 'An' for alphanumeric.
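The Figure 8 classifier lends itself to a table-driven sketch in Java that keeps only the final output. The state numbering is an assumption consistent with the surrounding text: S0 is the start state, S1 means blanks only, S2 alphabetic, S3 numeric, and S4 is the alphanumeric sink. In this machine every arc into a given state carries the same output, so the sketch stores the output per target state. Using `Character.isLetter` and `Character.isDigit` for L and D is also an assumption.

```java
// Sketch of the Figure 8 classifier (outputs on transitions, final output kept).
// Assumed numbering: S0 start, S1 blanks only, S2 alphabetic, S3 numeric,
// S4 alphanumeric (sink).
class StringClassifier {
    // Output produced by any transition that enters each state.
    private static final String[] OUTPUT_INTO = {null, "B", "Ab", "N", "An"};

    // delta: NEXT[state][symbol class], classes 0 = L, 1 = D, 2 = B
    private static final int[][] NEXT = {
        /* S0 */ {2, 3, 1},
        /* S1 */ {2, 3, 1},
        /* S2 */ {2, 4, 2},
        /* S3 */ {4, 3, 3},
        /* S4 */ {4, 4, 4},   // sink state: every input leads back here
    };

    private static int symbolClass(char c) {
        if (Character.isLetter(c)) return 0;   // L
        if (Character.isDigit(c)) return 1;    // D
        if (c == ' ') return 2;                // B
        throw new IllegalArgumentException("not a letter, digit or blank: " + c);
    }

    // Returns the final output, or null for the empty string (no transition, no output).
    static String classify(String s) {
        int state = 0;
        String last = null;
        for (char c : s.toCharArray()) {
            state = NEXT[state][symbolClass(c)];
            last = OUTPUT_INTO[state];         // output of this transition
        }
        return last;
    }
}
```

`classify("1 a")` returns "An": once the machine reaches S4, no later input can move it out.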

Figure 8: String classifier (states S0 through S4; arcs labeled with input classes L, D, B and outputs B, N, Ab, An)

Note that once a string is found to contain both a letter and a digit, it is certainly alphanumeric regardless of what else is in the string. Hence state 4, which is where we go when we find a string to be alphanumeric, is what is called a sink state. It is to finite state machines what a Roach Motel is to a roach: once you get there, you can never leave.

6 Acceptor machines
In some cases, we can eliminate the need for the separate output function altogether and instead incorporate the output into the states. The interpretation is that if the machine is in a particular state, then it produces the output associated with that state, and the final output is the output associated with the state in which the machine stops rather than the output associated with the last transition. Shown below is a modified version of the “even/odd a” machine from Figure 6, but this machine has the output symbols associated with the states rather than along the transitions. For every nonempty input it produces the same final output as the machine in Figure 6. As a bonus, this machine correctly produces the output E for the empty string.

Figure 9: Even/odd machine with outputs in the states (S0: even, S1: odd; both arcs labeled a)

The second example is derived from the FSM in Figure 7. We can determine the division property of the input simply by observing the state that the machine is in when it stops after having read the entire input sequence. If the machine stops in state 0, then the input is divisible by both 2 and 3. If it ends up in states 2 or 4, then the input is evenly divisible by only 2; if it ends up in state 3, then it is divisible by 3 only. And if it ends up in states 1 or 5, then the input is divisible by neither 2 nor 3. Hence we can modify this machine to incorporate the output into the states, as is shown in Figure 10.
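Here is a Java sketch of the state-output form of the divisibility machine. It is an illustration, not the authors' code. State i stands for "the bits read so far denote a number n with n mod 6 = i", and reading bit b takes n to 2n + b, exactly as described above. The final output is simply the output of the state the machine stops in. The class and method names are hypothetical.

```java
// Sketch of the divisibility machine with outputs attached to states.
// State i means "the bits read so far denote n with n mod 6 == i";
// reading bit b takes n to 2n + b, so state i goes to (2i + b) mod 6.
class DivisibleBy2Or3 {
    // Output of each state: b = both, 2 = by 2 only, 3 = by 3 only, n = neither.
    private static final char[] STATE_OUTPUT = {'b', 'n', '2', '3', '2', 'n'};

    static int nextState(int state, char bit) {
        if (bit != '0' && bit != '1')
            throw new IllegalArgumentException("not a binary digit: " + bit);
        return (2 * state + (bit - '0')) % 6;
    }

    // The final output is the output of the state the machine stops in.
    static char classify(String binary) {
        int state = 0;                     // start state (no bits read denotes 0)
        for (char bit : binary.toCharArray())
            state = nextState(state, bit);
        return STATE_OUTPUT[state];
    }
}
```

For example, `classify("1010")` (ten) passes through states 1, 2, 5 and 4 and returns '2'. `classify("1001")` (nine) returns '3'.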

Figure 10: Divisibility machine with outputs in the states (S0: b, S1: n, S2: 2, S3: 3, S4: 2, S5: n; arcs labeled 0 or 1)

We can do the same thing with the FSM in Figure 8. Note that we have been able to add a new output, e for empty, associated with s0. State 1 indicates blanks only, state 2 indicates alphabetic, state 3 indicates numeric, and state 4 indicates alphanumeric. The revised state diagrams are shown in Figures 10 and 11. These figures also show one further shorthand notation.

Figure 11: String classifier with outputs in the states (S0: e, S1: B, S2: Ab, S3: N, S4: An)

A special case of such a machine is one in which the output set contains only two elements. All input sequences are thus mapped onto one or the other output symbol. We can think of the output symbols as the binary digits 0 and 1 and associate the notion of “accept” with 1 and “reject” with 0. Then we can think of the machine as either accepting or rejecting an input depending on whether that input causes the machine to stop in an accepting or rejecting state. Such machines are called acceptor automata. Instead of writing the output values 0 or 1 in each state, we use a double circle to indicate accepting states (output of 1) and a single circle to indicate rejecting states (output of 0). The first three examples below are acceptor automata made from examples we have seen already. The machine in Figure 12 accepts strings of a’s that are of even length and rejects strings of odd length. Notice that this machine correctly accepts the empty string. The machine in Figure 13 accepts binary integers that are evenly divisible by 2 or by 3 or by both. The machine in Figure 14 accepts strings that are either alphabetic or numeric and rejects strings that are blank or alphanumeric.

Figure 12: Even-length acceptor (S0: even, accepting; S1: odd; both arcs labeled a)
Figure 13: Acceptor for binary integers divisible by 2 or 3 (states S0 through S5; arcs labeled 0 or 1)
Figure 14: Acceptor for alphabetic or numeric strings (states S0 through S4; arcs labeled L, D, B)

The machine in Figure 15 might be used in the lexical analysis phase of a compiler. It takes strings of characters and accepts those that are valid identifiers (for example, names of variables or procedures). To be accepted, a string must begin with a letter (indicated by the generic 'L') followed by letters and digits ('D'). We denote by 'S' any character, such as ‘$’ or ‘?’, that is neither a letter nor a digit. Note that encountering any 'S' character takes us immediately to the sink state S2, which is a rejecting state and from which there is no exit. Note also that this FSM imposes no limit on the length of the input. However, a language may limit the length of names; Pascal, for example, allows names of up to 255 characters, and there is no easy way to impose such a limit on a FSM. This is a limitation that is inherent to the FSM and one we will consider in more detail below.
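The identifier acceptor can be sketched in Java with a switch on the state. This sketch is not from the text. Its three states follow Figure 15 as read here: S0 is the start state, S1 is accepting, and S2 is the rejecting sink. Treating "letter" and "digit" as `Character.isLetter` and `Character.isDigit` is an assumption, since a real lexer would follow its own language's definition. The class name is hypothetical.

```java
// Sketch of the identifier acceptor: a letter followed by letters and digits.
// States: 0 = start, 1 = accepting, 2 = rejecting sink (no exit).
class IdentifierAcceptor {
    static boolean accepts(String s) {
        int state = 0;
        for (char c : s.toCharArray()) {
            boolean letter = Character.isLetter(c);   // 'L'
            boolean digit = Character.isDigit(c);     // 'D'; anything else is 'S'
            switch (state) {
                case 0:                               // must start with a letter
                    state = letter ? 1 : 2;
                    break;
                case 1:                               // L or D stay; S goes to the sink
                    state = (letter || digit) ? 1 : 2;
                    break;
                default:                              // sink: every input stays here
                    state = 2;
                    break;
            }
        }
        return state == 1;                            // only S1 is accepting
    }
}
```

Note that nothing in this loop bounds the length of the identifier, which is exactly the limitation discussed above.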

Figure 15: Identifier acceptor (states S0, S1, S2; arcs labeled L, D, S)

The FSM in Figure 16 reads strings of characters and accepts only those strings that contain a single unsigned integer, possibly preceded and followed by blanks. For the sake of simplicity, it is customary not to show the sink rejecting state nor the arrows leading to it. Instead, you can assume that if the machine is in some state s and looking at input symbol x, and if there is no arrow labeled x leading from s, then you go to the sink rejecting state and stay there. Thus, for example, if the machine is in state 0 and sees the letter ‘a’, then it next goes to a rejecting sink state. The five states in this machine can be thought of as being associated with the five different classes of strings. In state 0 we have seen nothing yet; if the machine stops in s0 then we know the input was empty. State 1 indicates that we have seen only blanks so far; ending there indicates a string made up of blanks only. State 2 is associated with strings that have zero or more blanks followed by a contiguous substring of digits. In state 3 we know that we have seen zero or more leading blanks followed by a contiguous substring of digits followed by one or more blanks. And state 4 (not shown) is the sink state where we go if we encounter a character other than a digit or blank or if we see more than one contiguous string of digits. States 2 and 3 are accepting; the others are rejecting.

Figure 16: Unsigned integer acceptor (states S0 through S3; d = digit, b = blank; S2 and S3 accepting)

We can take our integer acceptor one step further by accepting a string of characters that contains one real number optionally preceded and followed by blanks. The real number can be represented either in standard notation or in scientific notation. The construction of this machine is left as an exercise. Try the machine on the following real numbers as well as on some strings that do not contain valid reals:
100   -100   3.1415   6.02E24   -8.8E-11
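The "missing arrow means sink" convention maps naturally onto a table with a sentinel entry. The Java sketch below implements the Figure 16 integer acceptor that way and is not the authors' code. Only the arrows described above are stored (S0 and S1 go to S2 on a digit and to S1 on a blank; S2 loops on digits and goes to S3 on a blank; S3 loops on blanks). `NO_ARROW` marks a missing arrow, and because the sink has no exit the loop can reject immediately. The class and constant names are hypothetical.

```java
// Sketch of the unsigned-integer acceptor, using the convention that a
// missing arrow leads to an implicit rejecting sink state (state 4, not stored).
class UnsignedIntegerAcceptor {
    private static final int NO_ARROW = -1;
    private static final int DIGIT = 0, BLANK = 1;

    // NEXT[state][symbol class] for states S0..S3.
    private static final int[][] NEXT = {
        {2, 1},          // S0: nothing seen yet
        {2, 1},          // S1: blanks only
        {2, 3},          // S2: blanks then digits (accepting)
        {NO_ARROW, 3},   // S3: blanks, digits, then blanks (accepting)
    };

    static boolean accepts(String s) {
        int state = 0;
        for (char c : s.toCharArray()) {
            int cls;
            if (c >= '0' && c <= '9') cls = DIGIT;
            else if (c == ' ') cls = BLANK;
            else return false;                  // no arrow for any other character
            state = NEXT[state][cls];
            if (state == NO_ARROW) return false; // implicit sink: reject at once
        }
        return state == 2 || state == 3;
    }
}
```

`accepts("  42  ")` is true, while `accepts("4 2")` is false because S3 has no arrow labeled d.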

7 String Searching
Another practical use of acceptors is in string searching. String searching problems are very common, most notably in text editing. The usual statement of the problem is: "determine whether string X occurs in string Y." (For this problem string X will be referred to as the pattern and string Y as the target.) The naive method of doing this would be to write a loop that goes through the target string a character at a time and checks to see if the pattern occurs beginning at that character. Once the comparison at that position is finished (either the whole pattern has matched or a mismatch has been found), go to the next character in the target string and start again. This simple algorithm, which we will call algorithm A, is implemented by the following code (which could be simplified somewhat by using the substring facility of Java):

7.1 Naive String Search (Algorithm A)

    // Determine whether one string occurs in another string
    // starting at a specified point.

    // Does s1 match a substring of s2 starting in position pos?
    public boolean match(String s1, String s2, int pos)
    {
        // pre:  pos >= 0
        // post: Returned value is true iff s1 matches a substring
        //       of s2 starting in position pos.

        // Trivial case: not enough room in s2 for a match.
        if (s1.length() > s2.length() - pos) return false;
        int i;
        for (i = 0; i < s1.length() && s1.charAt(i) == s2.charAt(i + pos); i++) {}
        return i == s1.length();
    }

    // Find and report all matches of pattern in target.
    public void findAllMatches(String pattern, String target)
    {
        // pre:  true
        // post: All occurrences of pattern in target have been reported.
        for (int i = 0; i < target.length(); i++)
        {
            // inv: All occurrences of pattern in target starting in
            //      positions less than i have been reported.
            if (match(pattern, target, i))
                System.out.println("Match found starting in position " + i);
        }
    }

This simple solution can be inefficient because, in the worst case, most characters in the target are examined m times: once to see if they could be the first character in an occurrence of the pattern, once to see if they could be the second character in an occurrence of the pattern that began one symbol to the left, and so on. Thus the number of character-character comparisons done in the worst case is $O(mn)$, where m is the length of the pattern and n is the length of the target.
A way to avoid the multiple comparisons of each character is for the program to remember some information about the characters that have been read so far. No characters more than m positions to the left of the character currently being examined can affect whether or not this character is part of an occurrence of the pattern. This is because these earlier characters are separated from the current character by a distance which is longer than the length of the pattern. Therefore all we need to know is, at most, what the last m characters read were. Since m is a finite number (and the character set is finite), there are only a finite number of possible combinations for these m characters to have had. Therefore, this information can be stored in the states of a finite state machine. What is needed is a finite state machine that will read an input string and accept the string if it contains the pattern. This means that a finite state machine can be used to solve the string searching problem. Since a finite state machine doesn't go back and reread any characters of the input string, the finite state machine will read each character of the target exactly once. Clearly, this is better than algorithm A.2
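What follows is one way, sketched in Java, to build such a machine from the pattern alone. It is an illustration, not the authors' code. State j means "j is the largest value such that the last j characters read equal the first j characters of the pattern". Rather than tabulating a row for every character of the alphabet, the sketch computes each transition on the fly from the failure function used by the Knuth-Morris-Pratt algorithm; this is a swapped-in technique chosen for brevity. The method names `initialState`, `stateTransition` and `accepting` mirror the names used in Algorithm B; the class `PatternAutomaton` and its `findAll` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a pattern-matching automaton built from the pattern alone.
// State j (0 <= j <= m) means "j is the largest value such that the last j
// characters read equal the first j characters of the pattern".
class PatternAutomaton {
    private final String pattern;
    private final int[] fail;   // fail[j] = length of the longest proper prefix of
                                // pattern[0..j] that is also a suffix of it

    PatternAutomaton(String pattern) {
        if (pattern.isEmpty()) throw new IllegalArgumentException("empty pattern");
        this.pattern = pattern;
        int m = pattern.length();
        fail = new int[m];                   // fail[0] = 0
        int k = 0;
        for (int j = 1; j < m; j++) {
            while (k > 0 && pattern.charAt(j) != pattern.charAt(k)) k = fail[k - 1];
            if (pattern.charAt(j) == pattern.charAt(k)) k++;
            fail[j] = k;
        }
    }

    int initialState() { return 0; }

    boolean accepting(int state) { return state == pattern.length(); }

    // delta(state, c): fall back until c extends a matched prefix (or none is left).
    int stateTransition(int state, char c) {
        int j = state;
        while (j > 0 && (j == pattern.length() || pattern.charAt(j) != c)) j = fail[j - 1];
        if (pattern.charAt(j) == c) j++;
        return j;
    }

    // Algorithm B style loop: one pass over the target, reporting start positions.
    List<Integer> findAll(String target) {
        List<Integer> starts = new ArrayList<>();
        int state = initialState();
        for (int i = 0; i < target.length(); i++) {
            state = stateTransition(state, target.charAt(i));
            if (accepting(state)) starts.add(i - pattern.length() + 1);
        }
        return starts;
    }
}
```

With the pattern "123" and the target "2122123", the states visited are 0, 1, 2, 0, 1, 2, 3, and `findAll` reports a single match starting at position 4. The fallback loop inside `stateTransition` may take several steps for one character, but the total work over the whole target is still linear in its length.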

Boyer and Moore have developed a string matching algorithm that is faster. which is the same thing as saying that it depends on what state the automaton is in. Since the pattern starts with the character "1". Therefore the machine stays in state s0. " : . all that remains to be specified is the state transition function of the finite state machine. Whenever it finds a character that does not fit the pattern it must retreat some number of steps. than this one.length(). This could be at the beginning of an occurrence of the pattern so the machine goes to state s1. For a more detailed view. The first character read is "2". target. i<target. Now read the character "1". At each point this Although this algorithm is better than algorithm A. read the next character which is "2". for (int i=0. This is the first character of the pattern string. 2 Printed % . The machine starts in the initial state s0. Likewise. if (accepting(state)) System. but note that the finite state machine is determined completely by the pattern.charAt(i)). The following is a finite state machine that finds the first occurrence of the pattern "123" in target strings of digits.Chapter 14 Finite State Machines Page 15 characters of the input string.2 Finite State String Search (Algorithm B) var state = initialState. by a linear factor.println("Match found starting in position "+i). the finite state machine will read each character of the target exactly once. it is not the best we can do. Clearly. this is the better than algorithm A2. what has been read so far can't be the beginning of an occurrence of the pattern in the target. The number of steps it retreats depends on what the what the previously read target characters were.i++) { // inv:  We'll discuss this shortly! state = stateTransition(state.out. 
1 Figure 17 A high-level description of the automaton is that as it reads characters of the target string which might be part of an occurrence of the pattern "123" it proceeds straight across the diagram from left to right. trace what happens with the input string "2122123". } Now. The program corresponding to this finite state machine would look something like this: 7.

Now, as before, trace what happens with the input string "2122123". The first character, "2", cannot begin an occurrence of the pattern, so the machine stays in s0. Now read in the next character, which is "1", and the machine moves to state s1. Next the machine reads the "2" and goes into state s2. The following character is another "2". This is not the next character of the pattern, which is a "3". In this case the characters read are not an occurrence of the pattern, and none of them still might be part of one, so the machine must go backwards two steps to state s0. Continuing on, the machine reads a "1" and moves to s1. Now it reads a "2", so the machine moves to s2, and then it finally reads the "3" and goes into state s3. This is the accepting state, so upon reaching it the machine reports that it has found an occurrence of the pattern. At this point it has read the entire target string, so the task is finished.
From this we can deduce the loop invariant for the corresponding program:
INV: The machine is in state si (0 <= i <= 3) if and only if i is the largest value such that the last i characters of the target string that were read are equal to the first i characters of the pattern string.
Thus if the machine is in state s0 (as it is when it hasn't yet read any characters of the target string), then it has matched 0 characters of the pattern. If it is in state s2, then the last 2 characters read match the first 2 characters of the pattern. And if it is in s3, then the entire pattern must have been found in the target. The simplicity of this loop invariant should by itself be enough to show that this is a good way to solve the string searching problem.
A problem we haven't talked about is how to use this method if the pattern to be searched for is not known in advance. In this case the transition function cannot be prepared ahead of time, but will have to be computed as part of the searching program. This is somewhat harder, but still easy enough to make the finite automaton method worthwhile. This algorithm, known as the Knuth-Morris-Pratt (KMP) string matching algorithm, is a well-known application of finite state machines: for any pattern string, it constructs a finite automaton with which to process the target string. The KMP algorithm is somewhat more complex than we've let on.

8 What a FSM cannot do
We have seen in the examples above a variety of languages that can be accepted by acceptor automata. Given a finite input set I, we denote by I* the set of all possible finite-length input strings made from elements of I. While every string in I* is of finite length, the set I* itself is infinite. Any acceptor automaton M with input set I divides I* into two subsets: those strings that are accepted by M and those that are rejected. The two subsets are disjoint and their union is I*. We refer to the accepted subset as the language accepted by the machine.
The question arises as to whether any language can be accepted by an acceptor automaton. That is, given an arbitrary division of I* into two disjoint subsets A and B, does there exist an acceptor automaton that accepts exactly A and rejects B? The answer is most emphatically 'No'. There are many languages that cannot be accepted by an acceptor automaton.
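To make these definitions concrete, the following sketch lists every string in I* up to length 3. It then splits those strings into the language accepted by a small acceptor and the rejected remainder. The input set I = {a, b} and the acceptor (strings containing the substring "ab") are hypothetical examples chosen for brevity. They are not machines from this chapter.

```java
import java.util.ArrayList;
import java.util.List;

class LanguagePartition {
    // Hypothetical acceptor over I = {a, b}: accepts strings containing "ab".
    // State 0: no progress; state 1: last character was 'a';
    // state 2: "ab" has been seen (accepting, and never left).
    static boolean accepts(String s) {
        int state = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (state == 0) {
                state = (c == 'a') ? 1 : 0;
            } else if (state == 1) {
                state = (c == 'b') ? 2 : 1;
            }
        }
        return state == 2;
    }

    // Every string over {a, b} of length 0..maxLen: a finite slice of I*.
    static List<String> stringsUpTo(int maxLen) {
        List<String> all = new ArrayList<>();
        all.add("");
        int start = 0;
        for (int len = 1; len <= maxLen; len++) {
            int end = all.size();
            for (int k = start; k < end; k++) {
                all.add(all.get(k) + "a");
                all.add(all.get(k) + "b");
            }
            start = end;
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> accepted = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        for (String s : stringsUpTo(3)) {
            if (accepts(s)) accepted.add(s); else rejected.add(s);
        }
        System.out.println("accepted: " + accepted); // accepted: [ab, aab, aba, abb, bab]
        System.out.println("rejected: " + rejected.size() + " strings"); // rejected: 10 strings
    }
}
```

Raising the length bound enlarges the slice but never exhausts I*. The acceptor, however, classifies strings of any length with the same three states.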

To get a feel for the kinds of languages that cannot be accepted by acceptor automata, consider the language of algebraic expressions consisting of single-letter variable names and the operators +, -, * and /. The very simple machine in Figure 18 accepts this language. Figure 19 shows a slightly more complex machine that accepts arithmetic expressions with one level of parenthesization allowed. But no acceptor automaton can accept arithmetic expressions that contain unbounded parenthesization.

Figure 18 (states S0, S1, S2; transitions labelled L for a letter and op for an operator)

Figure 19 (states S0, S1, S2 with transitions labelled L, op, '(' and ')')

To see why this is true, let's assume the contrary. That is, let's assume that an acceptor automaton M accepts the language of valid arithmetic expressions with no limit on the level of parentheses. Since M is a FSM, it must have a finite number of states, say n states. Now consider the expression (ⁿa)ⁿ: a string of n left parentheses, followed by the letter a, followed by n right parentheses. It is a valid arithmetic expression and should thus be accepted by M. If we trace the action of M operating on this string, we will see that M visits some state si at least twice while processing the left parentheses. We know this must be true because M undergoes n state transitions while processing the n left parentheses and hence visits n+1 states. Since there are only n states in M, at least one state must have been visited at least twice.
This is an example of the pigeon hole principle. It derives its name from the pigeon holes used by post office workers to sort mail. Simply stated, if you have lots of letters to be put into only a few holes, then at least one of the holes must receive more than one letter. More formally, if you have a set of size n and draw n+1 samples from this set (with replacement), then at least one of the elements of the set will be drawn at least twice. In the case of a FSM, processing an input of length n takes the machine through n+1 states: the start state plus the n new states that are arrived at via the n state transitions. If the machine has only n states, then one of the states, say si, must be visited at least twice.
Given that M visits si twice, we can divide the string of left parentheses into three parts: x, y, and z. The first substring x contains the parentheses that take us from the start state s0 to the first occurrence of si. It is possible that x is empty if si is in fact s0. Then substring y takes us from the first occurrence of si to the second occurrence of si. This substring must contain at least one symbol. And the substring z simply contains the rest of the left parentheses. It takes the machine from si to sj, the state that M is in after seeing all of the left parentheses. z might be empty too.
Now consider the string xza)ⁿ. Since y contains at least one parenthesis, the string xz has fewer left parentheses than does xyz. Hence xza)ⁿ contains more right parentheses than left parentheses, is invalid, and should be rejected. But x takes M from s0 to si, and z takes M from si to sj. Because y takes M from si back to si, xyz also takes M from s0 to sj. Thus in processing both strings, M is in state sj with a)ⁿ remaining to be seen. Since the action of a FSM is completely determined by its state and the input, M will do the same thing on both strings: it will either accept both or reject both. But that's an error, because M was supposed to accept one and reject the other. And so we have arrived at a contradiction. We assumed that M existed and have shown that any M that alleges to accept the language will err, either by accepting an invalid string or by rejecting a valid string. Hence our only conclusion is that the assumption that M exists must be false.
Intuitively, the weakness of finite state machines is that they have only a bounded amount of memory. If a task requires more than that amount of memory, the FSM cannot handle the task. For example, while a FSM can accept strings that contain a single integer, no FSM can determine the value of an arbitrary integer. Similarly, matching arbitrarily nested parentheses takes an unbounded amount of memory and hence is beyond the capability of the FSM.

9 Extensions to the basic model
The basic FSM model is a useful theoretical model of computation, but it is really too weak for most practical problems. This weakness in the power of the FSM clearly limits what we can do with it. Fortunately, the restrictions imposed by the basic FSM model turn out to be an artificial limit on how we use the model. The basic notion of the FSM can be extended and used as a control mechanism in more powerful computational models. We can easily extend the basic FSM model to provide a very powerful and useful control structure by allowing the machine to do arbitrary computation in each state. This includes giving the machine access to arbitrary data structures such as counters, arrays, lists, etc. The FSM then serves as a control structure much like loops and alternative selection. Just like the statements inside a loop or select, the statements inside the FSM control structure are unrestricted.
Consider, for example, the problem of determining whether an algebraic expression is properly parenthesized. This cannot be done in the pure FSM model, because parentheses can be nested arbitrarily deep. When the machine has read more left parentheses than it has states, it will necessarily become 'confused.' But a simple machine with one control state and a counter can handle this problem.
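A minimal Java sketch of a machine with one control state plus a counter might look like this (the class and method names are our own). It assumes that characters other than parentheses need no checking here, so it simply skips them.

```java
class ParenChecker {
    // One control state (s0) plus an integer counter: '(' adds 1, ')' subtracts 1,
    // and a ')' read while the counter is 0 sends the machine to the fail state.
    static boolean balanced(String s) {
        int counter = 0;                      // the extra register
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                counter++;
            } else if (c == ')') {
                if (counter == 0) {
                    return false;             // fail state: unmatched ')'
                }
                counter--;
            }
            // any other character leaves the state and the counter unchanged
        }
        return counter == 0;                  // accept only if every '(' was closed
    }

    public static void main(String[] args) {
        System.out.println(balanced("(((a)))"));  // true
        System.out.println(balanced("((a)))"));   // false
        System.out.println(balanced("((a)"));     // false
    }
}
```

With n = 3 and y a single '(', the strings "(((a)))" and "((a)))" have the shapes xyza)ⁿ and xza)ⁿ from the argument above. The counter tells them apart because, unlike a fixed set of states, it can grow as large as the nesting requires. In this sketch that limit is the range of a Java int.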

Parenthesis checker:
   s0 -- '(' and counter >= 0 / +1 --> s0
   s0 -- ')' and counter > 0 / -1 --> s0
   s0 -- ')' and counter = 0 --> fail
   Accept if, at end of string, the automaton is in the final state and the counter = 0.
The initial state of the machine is s0, with the counter initialized to 0. As each left or right parenthesis is encountered in an input string, the counter is updated: if a '(' is read, the counter is incremented, and if a ')' is read, the counter is decremented. If the value of the counter is ever 0 and a ')' is read, the string is not well-formed and the machine enters the 'fail' state. And if the counter is not equal to 0 when the whole string has been processed, there were more left parentheses than right ones. Note that both acceptance and the next state function (and in general, the output function as well) use the value of the registers in determining what to do next. When in state s0, the next state is chosen based both on the present state and on the value of the counter. The next state function is a function not only of the state si and the input symbol, but also of the other parts of the state: the values of registers, counters, arrays, etc.
A text editor on a computer provides another illustration of the finite state model. The principal input to a text editor is keystrokes, but the same keystrokes can mean entirely different things to the text editor, depending on the 'state' of the software. Most commonly, keystrokes are text to be inserted into the document. They may also represent the name of the file under which the document is to be stored, a character sequence to be searched for within the document, or instructions on how the document is to be formatted for printing. Thus the text editor can be considered to have many states, and its reaction to a sequence of keystrokes depends on which state it is in.
The user interface for the UCSD Pascal System for the Apple II computer has three states: system, editor, and filer. What the interface does in response to a user's input depends both on the input and on the state. For example, typing the character "e" in the system state causes the system to go to the editor state. Once in the editor, typing an "e" adds that character to the file currently being edited. And typing an "e" in the filer state produces an extended listing of the files on the current volume.
A program skeleton for using a general FSM control structure is given below. Here the FSM has been implemented using the Java switch statement, and the end of input is indicated by a sentinel. Note that the statements inside the case statement (indicated as "process input symbol in state si") are arbitrary statements. They can be compound statements, or even another finite state machine construct.

var state = startState;
while (true) {
    get the next input symbol;
    // inv: all symbols preceding the current input have been processed
    //      in the correct state, and the current state reflects the
    //      input history.
    if (input symbol == sentinel) break;   // Stop FSM at end of input.
    switch (state) {
        case s0:
            process input symbol in state s0;
            state = stateTransition(state, input symbol);
            break;
        case s1:
            process input symbol in state s1;
            state = stateTransition(state, input symbol);
            break;
        . . .
        case sn:
            process input symbol in state sn;
            state = stateTransition(state, input symbol);
            break;
    }
}

As an example of the application of this control mechanism, consider a more realistic version of the integer acceptor machine shown previously in Figure 16. Instead of just accepting or rejecting the input, we want to read an input string and do two things. First, determine whether it is empty (contains no characters), blank (contains only blanks), valid (contains a single unsigned integer), or invalid (anything other than empty, blank or valid). Second, if the string contains a valid integer, determine its value. We will ignore for now the possible problem of the integer being too large.
Integer Reader 1 below shows the traditional way of approaching this problem. It is a sieve algorithm: a series of tests. As the input passes each test, it goes on to the next; if it fails a test, the algorithm stops. The first test determines whether the line is empty. If it is nonempty, the leading blanks are stripped off. If nothing remains after blank stripping, then the string must have been blank. If the string is non-blank, then trailing blanks are stripped. The remaining characters are then tested to see if they are all digits, and if they are, their value is accumulated in the integer variable value. To convert a character digit to its integer counterpart, we first cast the character to an integer and then subtract the integer version of '0'. Hence ((int)'3')-(int)'0' has the integer value 3. This algorithm works, but it is somewhat ad hoc, and it may look at some of the characters in the string more than once.
Integer Reader 2 does the same thing, but is controlled by exactly the same FSM as was used in the integer acceptor. The state in which the machine stops indicates the result: s0 for empty, s1 for blank, s2 and s3 for valid, and s4 for invalid. If the string is valid, its value is accumulated along the way. The computation within the states, particularly in state 2, extends the power of the machine. This algorithm makes a single pass over the input. While it is longer than its predecessor, it is easy to write, understand and modify.
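The conversion and accumulation steps described above can be tried in isolation. In this sketch the helper name digitsToInt is hypothetical. It applies value = value*10 + digit exactly as both readers do. Character.digit from the Java standard library is shown only as an equivalent way to obtain the value of a single decimal digit.

```java
class DigitValue {
    // Accumulate the value of a string of decimal digits one character at a time.
    // Assumes s contains only '0'..'9' and that the result fits in an int.
    static int digitsToInt(String s) {
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            int d = (int) s.charAt(i) - (int) '0';   // e.g. '3' -> 3
            value = value * 10 + d;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println((int) '3' - (int) '0');     // 3
        System.out.println(Character.digit('3', 10));  // 3
        System.out.println(digitsToInt("0427"));       // 427
    }
}
```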

9.1 Integer Reader 1

public static String stripLeadingBlanks(String s) {
// pre:  true
// post: Returned value is s with leading blanks removed.
    while (s.length()>0 && s.charAt(0)==' ')   // SC eval
        s=s.substring(1);
    return s;
}

public static String stripTrailingBlank(String s) {
// pre:  s contains a non-blank character
// post: Returned value is s with trailing blanks removed.
    while (s.charAt(s.length()-1)==' ')
        s=s.substring(0,s.length()-1);
    return s;
}

public static String reader1(String s) {
// pre:  true
// post: Returned String gives string type and, if valid,
//       its value.
    String res=s;
    // See if string is empty.
    if (s.length()==0) return "|"+res+"|"+" is empty";
    // String is nonempty, strip leading blanks.
    s=stripLeadingBlanks(s);
    if (s.length()==0) return "|"+res+"|"+" is all blank";
    // String is not all blank, strip trailing blanks.
    s=stripTrailingBlank(s);
    // Check for all digits and accumulate value.
    int value=0;
    for (int i=0;i<s.length();i++) {
        if (s.charAt(i)<'0' || '9'<s.charAt(i))   // Non-digit found.
            return "|"+res+"|"+" is invalid";
        value=value*10+((int)s.charAt(i)-(int)'0');
    }
    // Only digits found.
    return "|"+res+"|"+" is valid: "+value;
}

9.2 Integer Reader 2

public static String reader2(String s) {
// pre:  true
// post: Returned String gives string type and, if valid,
//       its value.
    String res=s;
    int state=0;      // State of FSM.
    int i=0;          // String index.
    int value=0;
    char c;
    while (i<s.length()) {
        // Translate character: ' '            -> 'b' (blank)
        //                      digit          -> 'd'
        //                      something else -> 's'
        if (s.charAt(i)==' ') c='b';
        else if (s.charAt(i)>='0' && s.charAt(i)<='9') c='d';
        else c='s';
        switch (state) {
            // Nothing seen so far.
            case 0:
                if (c=='b') state=1;
                else if (c=='d') {
                    state=2;
                    value=(int)(s.charAt(i))-(int)'0';
                }
                else state=4;
                break;
            // Only blanks seen so far.
            case 1:
                if (c=='d') {
                    state=2;
                    value=(int)(s.charAt(i))-(int)'0';
                }
                else if (c=='s') state=4;
                break;
            // Seeing digits.
            case 2:
                if (c=='d') value=value*10+(int)(s.charAt(i))-(int)'0';
                else if (c=='b') state=3;
                else state=4;
                break;
            // Valid plus trailing blanks.
            case 3:
                if (c=='d' || c=='s') state=4;
                break;
            // Invalid. Sink state.
            case 4:
                break;
        }
        i++;
    }
    switch (state) {
        case 0:  return "|"+res+"|"+" is empty";
        case 1:  return "|"+res+"|"+" is all blank";
        case 2:  return "|"+res+"|"+" is valid: "+ value;
        case 3:  return "|"+res+"|"+" is valid: "+ value;
        default: return "|"+res+"|"+" is invalid";
    }
}

The real number acceptor (exercise 7) can be similarly extended to determine the value of the real number.

9.3 Comment locator
Suppose that you wish to locate the comments in a Java program. There are two considerations. Any text following "//" up to the end of the line (written <cr> here; the code below tests for '\n') is a comment. Any text between '/*' and '*/' is a comment, and this text may include line returns. The appropriate states for doing so can be listed as follows:
0  Outside a comment.
1  Outside a comment, but have just seen a '/'.
2  Have just seen a second '/'. Now inside comment mode until <cr>.
3  Have just seen a '*' that followed a '/'. Now inside comment mode.
4  Just saw a '*' inside a /* . . . */ comment.

Comment Stripper (state diagram with states S0 to S4 and transitions on '/', '*' and <cr>)

Given the state diagram, the code is not difficult to write. It's worth a moment's reflection, though, to consider how difficult the code would be to understand without knowledge of how it arose. Documentation of code based on a FSM should always describe the FSM, either with a diagram or (when a diagram is not feasible) a state transition table, and should give a similar description of how the output is generated.

// Strip comments from parameter string.
public static String commentStrip(String s)
// pre:  true
// post: Returned String is the inbound string stripped of comments.
{
    int state = 0;      // Current FSM state.
    String outS="";     // Will hold outbound string.
    char c=' ';         // Current character.
    for (int i=0;i<s.length();i++) {
        c=s.charAt(i);
        switch (state) {
            // Not in comment mode.
            case 0:
                if (c=='/') state=1;
                else outS=outS+c;
                break;
            // Have seen a '/'.
            case 1:
                if (c=='/') state=2;
                else if (c=='*') state=3;
                else {
                    state=0;
                    outS=outS+'/'+c;
                }
                break;
            // Have seen second '/'; in comment mode for the rest of this line.
            case 2:
                if (c=='\n') {
                    state=0;
                    outS=outS+'\n';
                }
                break;
            // Have seen '/*'; in comment mode until "*/".
            case 3:
                if (c=='*') state=4;
                break;
            // In comment mode and have seen a '*'.
            // If next char is '/', leave comment mode.
            case 4:
                if (c=='/') state=0;
                else if (c!='*') state=3;
                break;
        }
    }
    // A '/' at the very end of s did not start a comment, so keep it.
    if (state==1) outS=outS+'/';
    return outS;
}

9.4 Text Compression
Let's say we wanted to send a message consisting of only the characters 'a', 'b', and 'c'. Since the message may contain long runs of the same character (for example, "aaaaabaaaaacccccccbbbbbbababc"), we want to try to compress the message for more efficient transmission.

A very simple way to do this is to encode any run of three or more of the same character as <count><character>, where count is an integer indicating the number of occurrences of character. For example, the message "aaaaabaaaaacccccccbbbbbbababc" would be abbreviated as "5ab5a7c6bababc". Encoding runs of length two would produce no compression, and encoding singletons would actually make the resulting string longer. Doing the compression requires that we keep track of two things: the character seen most recently and the number of consecutive occurrences of that character. The former can be done with a FSM, since there are only three characters. The latter, keeping track of the count, is beyond what a FSM can do and is handled as an extension. The procedure that produces the encoded form of a single run, and the procedure that compresses a whole string, are shown below.

public static String output(int count, char c) {
// Compress a homogeneous string of characters.
// if count = 1, return c
// if count is 2, return cc
// if count is 3 or more, return compressed form: count+c
// pre:  count >= 1
    if (count == 1) return String.valueOf(c);
    else if (count == 2) return String.valueOf(c)+String.valueOf(c);
    else return count+String.valueOf(c);
}

public static String compress(String s) {
// Compress s by encoding runs of length 3 or more.
// pre:  s contains only the characters 'a', 'b' and 'c'.
// FSM states: 's' for start,
//             'a' seeing a's,
//             'b' seeing b's,
//             'c' seeing c's.
    char state = 's';          // FSM state.
    int count=1;               // Current run length.
    String outString = "";     // String to be returned.
    for (int i=0;i<s.length();i++) {
        switch (state) {
            case 's':          // Get first character of s.
                state = s.charAt(0);
                break;
            case 'a':          // Have seen one or more a's.
                if (s.charAt(i)=='a') count++;
                else {
                    outString = outString + output(count,'a');
                    count = 1;
                    state = s.charAt(i);
                }
                break;
            case 'b':          // Have seen one or more b's.
                if (s.charAt(i)=='b') count++;
                else {
                    outString = outString + output(count,'b');
                    count = 1;
                    state = s.charAt(i);
                }
                break;
            case 'c':          // Have seen one or more c's.
                if (s.charAt(i)=='c') count++;
                else {
                    outString = outString + output(count,'c');
                    count = 1;
                    state = s.charAt(i);
                }
                break;
        }
    }
    // Flush final character(s); an empty s leaves the machine in 's'.
    if (state != 's') outString = outString + output(count,state);
    return outString;
}

The above code for compress, developed from a straightforward FSM model, works well, but it has several sections that look very similar. A few moments of thought reveal that the redundancies are due to the fact that the actions in all the states except the start state are nearly identical, so the separate cases for the remaining states can be combined into one. The start state can be eliminated by making the initial state the first character of the sequence (thus changing the range of the for loop). This eliminates the redundancies and gives the following code:

public static String compress(String s) {
// Compress s by encoding runs of length 3 or more.
// pre:  true
// post: Returned value is compressed version of s.
    if (s.length()==0) return "";   // Handle empty string.
    char state = s.charAt(0);       // Initialize state.
    int count=1;                    // Current run length.
    String outString = "";          // String to be returned.
    for (int i=1;i<s.length();i++) {
        if (s.charAt(i)==state)     // Repeat of previous character.
            count++;
        else {                      // New character.
            outString = outString + output(count,state);
            count = 1;
            state = s.charAt(i);
        }
    }
    // Flush final character(s).
    outString = outString + output(count,state);
    return outString;
}

10 Summary
The notion of state and of transition between states based on input is very common, ranging from the children's board game Candyland to traffic lights to many software applications. As we have seen, the basic finite state machine is not powerful enough to handle most of these applications. It is, however, easily extended, and it is the basis for a very useful and powerful programming paradigm.

11 Exercises
1. Draw the state-transition diagram for the following finite state machine and describe in English what it does.
   Set of input symbols = {a, $}
   Set of output symbols = {a, 0, 1, 2}
   Set of states = {s0, s1, s2}
   Initial state = s0
   output function F:
   state transition function G:
   What will the machine output if the input is:
   i) aaa
   ii) aaaaaaa$
   iii) aa$aaaa$a$$

2. Construct the formal specification for the finite state machine given by the following state transition diagram and explain in English what it does. (In the diagram, the notation on the arrows gives the input symbol followed by the output symbol.)
3. Binary Coded Decimal, or BCD, is a standard way of encoding decimal numbers in computers. Each decimal digit is stored as a sequence of four bits, with "0000" standing for "0", "0001" for "1", etc. Thus the BCD number "10010011" would represent the decimal number "93". Construct a finite state machine which will translate BCD numbers into decimal digits.
4. Write a program which, given a pattern string, generates the state transition table to be used by a finite automaton which will find all occurrences of the pattern string in an unspecified target string.
5. Assume that you have been hired by a traffic light manufacturer to design an "intelligent" traffic light. Inputs to your system will include signals from various timers, from sensors that detect the presence of cars in the left-turn and through lanes, and from pedestrian "walk" buttons. The outputs set the lights (including left-turn and pedestrian signals), ring a bell for blind pedestrians, and reset the timers.
   a.) Write an English specification of what your intelligent traffic-light system is to do.
   b.) Construct a finite state machine to perform to these specifications.
      a.) What are the input symbols?

      b.) What are the output symbols?
      c.) How many states are needed?
      d.) Draw the state transition diagram.
6. Literal strings are strings of characters delimited by quotation marks ("). Any character may be contained within a literal string. If a quotation mark is to be represented within a literal string, it is done by using two consecutive quotation marks ("").
   a.) Design an acceptor for literal strings which do not contain any quotation marks.
   b.) Design an acceptor for literal strings which may contain quotation marks.
   c.) Design a finite automaton which reads in a literal string (enclosed in quotation marks) and writes out the literal string within the quotation marks, changing occurrences of "" to ".
7. a.) Construct a real number acceptor, e.g. for "123.934e7", "1.000", "3.465" or ".3298798".
   b.) Expand the real number acceptor to calculate the value of the real number. Ignore the possibility of overflow.
   c.) Expand the real number acceptor/calculator so that it accepts numbers with commas to separate out thousands, e.g. "2,712.98" or "345,235.45". Do not accept numbers with incorrectly placed commas.
8. Construct acceptors for the following sets of strings:
   a.) all strings of a's and b's in which every a is immediately followed by a b
   b.) all strings of a's and b's in which the substring "ab" occurs at least twice
   c.) all strings of a's and b's in which the third character from the end is a b
   d.) all strings of a's and b's which contain either the pattern "aab" or the pattern "baa"
9. Suppose that we have encoded the letters of the alphabet as decimal numbers which are represented in character form. For example, "01" is "a", "02" is "b" and so on. Write a Pascal program which translates a string of digits into a sequence of characters using a finite automaton.

10. Describe informally the strings accepted by the machines given by the following diagrams (a.), (b.) and (c.).
11. Construct a finite state machine which is an acceptor for valid telephone numbers. Valid telephone numbers consist of the following:
   a.) If the first digit is a "0" then the call is operator assisted. A "0" is itself a legal number, as is a "0" followed by any other legal number.
   b.) If the first digit is not a "0" and the second digit is neither "0" nor "1" then it is a local number and should be exactly seven digits long.

   c.) If the first digit is not a "0" and the second digit is a "0" or a "1" then it is a long-distance number and should be exactly ten digits long.
12. Construct a finite automaton that takes as input strings of "0"s and "1"s and accepts those strings that contain the substring "011010".
13. Doctor Victor Frankenstein says: "The monster has been very difficult to deal with lately. It is almost impossible to wake him up in the morning. The only thing that will rouse him is 10,000 volts applied to the bolts on his neck. This wakes him up, but unfortunately it never fails to make him enraged, and he goes out and terrorizes the village. Whenever this happens, I send Igor out to calm him down. If this is successful, then the monster becomes docile and will help me with my experiments. If not, then the villagers threaten him with sticks and pitchforks, and he gets frightened and retreats back into the castle, where he falls asleep. When he is docile, his only problem is that he is too eager to help and gets in the way. When this happens, I have Igor sing to him. Under this stimulus he becomes sentimental and sits and hums to himself. If Igor keeps singing, the monster will fall asleep, but if Igor stops singing, the monster becomes docile and helpful again."
   a.) Consider the monster as a finite automaton. What are the states? What are the inputs and outputs?
   b.) Draw the monster's state transition diagram.

Chapter 14: The finite state control structure
1 ANALOGY
2 INTRODUCTION
3 THE BASIC FINITE STATE MACHINE MODEL
3.1 Example: the stamp machine
4 IMPLEMENTING A FSM
5 FINAL OUTPUT MACHINES
6 ACCEPTOR MACHINES
7 STRING SEARCHING
7.1 Naive String Search (Algorithm A)
7.2 Finite State String Search (Algorithm B)
8 WHAT A FSM CANNOT DO
9 EXTENSIONS TO THE BASIC MODEL
9.1 Integer Reader 1
9.2 Integer Reader 2
9.3 Comment locator
9.4 Text Compression
10 SUMMARY
11 EXERCISES