
An Example of ASM Design: A Binary Multiplier

Dr. D. Capson Electrical and Computer Engineering McMaster University

Introduction
An algorithmic state machine (ASM) is a finite state machine that uses a sequential circuit (the Controller) to coordinate a series of operations among other functional units such as counters, registers and adders (the Datapath). The series of operations implements an algorithm. The Controller passes control signals, which can be Moore or Mealy outputs of the Controller, to the Datapath. The Datapath returns status information to the Controller, which can then be used to determine the sequence of states in the Controller. Both the Controller and the Datapath may have external inputs and outputs, and both are clocked simultaneously, as shown in the following figure:
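[Figure: block diagram of an ASM. The Controller and the Datapath each have external Inputs and Outputs and share a common clock. Control signals pass from the Controller to the Datapath; Status signals pass from the Datapath back to the Controller.]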

Think about this: A microprocessor may be considered as a (large!) ASM with many inputs, states and outputs. A program (any software) is really just a method for specifying its initial state.

The two basic strategies for the design of a controller are:
1. hardwired control, which includes techniques such as one-hot-state (also known as "one flipflop per state") and decoded sequence registers.
2. microprogrammed control, which uses a memory device to produce a sequence of control words for a datapath.

Since hardwired control is, generally speaking, fast compared with microprogramming strategies, most modern microprocessors incorporate hardwired control to help achieve their high performance (or in some cases, a combination of hardwired and microprogrammed control). The early generations of microprocessors used microprogramming almost exclusively. We will discuss some basic concepts in microprogramming later in the course; for now we concentrate on a design example of hardwired control. The ASM we will design is an n-bit unsigned binary multiplier.

Binary Multiplication
The design of binary multiplication strategies has a long history. Multiplication is such a fundamental and frequently used operation in digital signal processing that most modern DSP chips have dedicated multiplication hardware to maximize performance. Examples include filtering, coding and compression for telecommunications and control applications, as well as many others. Multiplier units must be fast!

The first example that we considered (in class), which used a repeated addition strategy, is not always fast. In fact, the time required to multiply two numbers is variable and depends on the value of the multiplier itself. For example, the calculation of 5 x 9 as 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 + 5 requires more clock pulses than the calculation of 5 x 3 = 5 + 5 + 5. The larger the multiplier, the more iterations are required. This is not practical.

Think about this: How many iterations are required for multiplying, say, two 16-bit numbers in the worst case?

Another approach to achieve fast multiplication is the look-up table (LUT). The multiplier and multiplicand are used to form an address in memory at which the corresponding pre-computed value of the product is stored. For an n-bit multiplier (that is, multiplying an n-bit number by an n-bit number), a (2^(n+n) x 2n)-bit memory is required to hold all possible products. For example, a 4-bit x 4-bit multiplier requires 2^8 x 8 = 2048 bits. For an 8-bit x 8-bit multiplier, a 2^(8+8) x 16 = 1 Mbit memory is required. This approach is conceptually simple and has a fixed multiply time equal to the access time of the memory device, regardless of the data being multiplied. But it is also impractical for larger values of n.

Think about this: What memory capacity is required for multiplying two 16-bit numbers? Two 32-bit numbers?

Most multiplication hardware units use iterative algorithms implemented as an ASM for which the worst-case multiplication time can be guaranteed. The algorithm we present here is similar to the pencil-and-paper technique that we naturally use for multiplying in base 10. Consider the following example:

      123       (the multiplicand)
    x 432       (the multiplier)
    -----
      246       (1st partial product)
     369        (2nd partial product)
    492         (3rd partial product)
    -----
    53136       (the product)
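The costs quoted earlier in this section for the repeated-addition and look-up-table strategies can be reproduced with a short Python sketch. The function names `lut_bits` and `worst_case_additions` are illustrative only, and the repeated-addition count assumes a simplified cost model of one iteration per unit of the multiplier value.

```python
def lut_bits(n: int) -> int:
    """Bits of memory for an n-bit x n-bit look-up-table multiplier.

    The address is the multiplier and multiplicand side by side (2n bits),
    and every stored product is 2n bits wide.
    """
    return (2 ** (2 * n)) * (2 * n)


def worst_case_additions(n: int) -> int:
    """Assumed cost model: one iteration per unit of the largest n-bit multiplier."""
    return 2 ** n - 1


for n in (4, 8):
    print(f"n={n}: LUT needs {lut_bits(n)} bits, "
          f"repeated addition needs up to {worst_case_additions(n)} iterations")
```

The LUT size grows as 2^(2n), which is why the approach becomes impractical so quickly as n increases.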

Each digit of the multiplier is multiplied by the multiplicand to form a partial product. Each partial product is shifted left (that is, multiplied by the base) by an amount equal to the power of the corresponding multiplier digit. In the example above, 246 is actually 246 x 10^0 = 246, 369 is actually 369 x 10^1 = 3690 and 492 is actually 492 x 10^2 = 49200. There are as many partial products as there are digits in the multiplier. Binary multiplication can be done in exactly the same way:

        1100    (the multiplicand)
      x 1011    (the multiplier)
      ------
        1100    (1st partial product)
       1100     (2nd partial product)
      0000      (3rd partial product)
     1100       (4th partial product)
    --------
    10000100    (the product)

However, with binary digits we can make some important observations:
- Since we multiply by only 1 or 0, each partial product is either a copy of the multiplicand shifted by the appropriate number of places, or it is 0.
- The number of partial products is the same as the number of bits in the multiplier.
- The number of bits in the product is twice the number of bits in the multiplicand. Multiplying two n-bit numbers produces a 2n-bit product.
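The observations above can be illustrated with a minimal Python sketch of the pencil-and-paper method. The helper names `partial_products` and `multiply_by_partial_products` are hypothetical and are used only for illustration.

```python
def partial_products(multiplicand: int, multiplier: int, n: int) -> list:
    """Return the n partial products, each already shifted left into place."""
    products = []
    for i in range(n):
        bit = (multiplier >> i) & 1  # i-th multiplier bit, lsb first
        # Each partial product is a shifted copy of the multiplicand, or 0.
        products.append((multiplicand << i) if bit else 0)
    return products


def multiply_by_partial_products(multiplicand: int, multiplier: int, n: int) -> int:
    """Sum the shifted partial products to form the 2n-bit product."""
    return sum(partial_products(multiplicand, multiplier, n))


pp = partial_products(0b1100, 0b1011, 4)
print([format(p, "b") for p in pp])   # shifted copies of 1100, or 0
print(format(sum(pp), "08b"))         # the 8-bit product
```

Summing the shifted partial products directly is what would call for a 2n-bit adder in hardware.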

We could then design datapath hardware using a 2n-bit adder plus some other components (as in the example of Figure 10.17 of Brown and Vranesic) that emulates this manual procedure. However, the hardware requirement can be reduced by considering the multiplication in a different light. Our algorithm may be informally described as follows. Consider each bit of the multiplier from right to left. When a bit is 1, the multiplicand is added to the running total, which is then shifted right. When the multiplier bit is 0, no add is necessary since the partial product is 0, and only the shift takes place. After n cycles of this strategy (once for each bit in the multiplier) the final answer is produced. Consider the previous example again:

     1100        (the multiplicand)
     1011        (the multiplier)
     ----
     0000        (initial partial product, start with 0000)
     1100        (1st multiplier bit is 1, so add the multiplicand)
     ----
     1100        (sum)
     01100       (shift sum one position to the right)
     1100        (2nd multiplier bit is 1, so add multiplicand again)
     ----
    100100       (sum, with a carry generated on the left)
     100100      (shift sum once to the right, including carry)
     0100100     (3rd multiplier bit is 0, so skip add, shift once)
     1100        (4th multiplier bit is 1, so add multiplicand again)
     ----
    10000100     (sum, with a carry generated on the left)
     10000100    (shift sum once to the right, including carry)
Notice that all the adds take place in the same 4 bit positions, so we need only a 4-bit adder! We also need shifting capability to capture the bits moving to the right, as well as a way to store the carries resulting from the additions. The final answer (the product) consists of the accumulated sum and the bits shifted out to the right. A hardware design that can implement this algorithm is described in the next section.
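The add-then-shift-right procedure traced above can be sketched in Python as follows. This is a software illustration under stated assumptions rather than a description of the hardware: the variable names `c`, `a` and `q` (carry, accumulated sum, and the multiplier bits still to be examined) are chosen here for convenience, and a 0 is assumed to enter the carry position on each shift.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply two unsigned n-bit numbers using only n-bit additions."""
    mask = (1 << n) - 1
    c, a, q = 0, 0, multiplier & mask
    for _ in range(n):
        if q & 1:
            # Current multiplier bit is 1: add the multiplicand (n-bit add).
            total = a + (multiplicand & mask)
            c, a = total >> n, total & mask   # keep the carry out separately
        # Shift c|a|q one position to the right as a single (1 + n + n)-bit word.
        q = ((a & 1) << (n - 1)) | (q >> 1)
        a = (c << (n - 1)) | (a >> 1)
        c = 0
    return (a << n) | q                        # msbs in a, lsbs in q


print(format(shift_add_multiply(0b1100, 0b1011, 4), "08b"))
```

The loop always runs n times, so the running time does not depend on the data being multiplied.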

Design of the Binary Multiplier Datapath


The multiplication described above can be implemented with the components shown in the figure below (note that, for simplicity, the clock inputs are not shown). It is the role of the controller to provide a sequence of inputs to each component that causes the datapath hardware to perform the desired operations. Registers A and Q are controlled with synchronous inputs Load (parallel load), Shift (shift one position to the right with left serial input) and Clear (force the contents to 0). The D flipflop has an asynchronous Clear input and the counter has an asynchronous input Init (force the contents to 11..1).

The log2(n)-bit counter (Counter P) is used to keep track of the number of iterations (n). Counter P is loaded with the value n-1 and counts down to zero, thus ensuring n operations. Each operation is either (a) add then shift or (b) just shift, as described in the multiply algorithm above. Zero detection on the counter produces an output Z that is HI when the counter hits zero; this tells the controller that the sequence is complete. Counter P is initialized to n-1 with input Init = 1.

The multiplicand is applied to one n-bit input of the adder. The sum output from the adder is stored as a parallel load into Register A. Register A can also shift to the right, accepting a 1-bit serial input from the left. This is provided by the output of a D flipflop (C), which stores the value of the carry out from the adder in the previous addition. When shifting, Register Q receives its left serial input from the right-most bit (lsb) of Register A. Registers A and Q are identical in operation (but controlled differently) and, together with the carry flipflop, they form a (1 + n + n)-bit shift register. That is, C, A and Q are connected such that the carry value stored in the flipflop enters Register A from the left and the bit shifted out from the right of Register A enters Register Q from its left.
At the end of the process, Registers A and Q will hold the 2n-bit product (the n msbs are in Register A). The multiplier is initially stored in Register Q via its parallel load capability. This provides a convenient way to access each bit of the multiplier in succession at the lsb position (Q0) of Register Q. In the multiply algorithm, each bit of the multiplier is used to decide whether there should be (a) an add with shift or (b) a shift only. So, Q0 is used to tell the controller which of these operations to perform on each iteration. After each shift, one bit of the multiplier is lost to the right and one bit of the product shifts into Register Q from the left. After n shifts, Register Q holds the n lsbs of the product and the multiplier is totally lost.

Putting the datapath circuit for the binary multiplier into a box, we see it has:

Data inputs: Multiplicand (n bits), Multiplier (n bits)
Data outputs: Product (2n bits)
Control inputs: Clear (for the carry flipflop); Load, Shift and Clear (for each shift register); Init (for the counter)
Status outputs: Z (zero detect) and Q0 (each bit of the Multiplier, in succession)

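[Figure: Datapath for Binary Multiplier. An n-bit Parallel Adder (inputs A, B and Cin; outputs SUM and Cout) takes the Multiplicand on one input. SUM is parallel-loaded into the n-bit shift register Register A, and Cout is stored in the D flipflop C (with Clear input). Register A and Register Q (which is loaded with the Multiplier) each have Load, Shift and Clear controls and a left serial input: C feeds the left serial input of Register A, and the lsb of Register A feeds the left serial input of Register Q. The log2(n)-bit Binary Down Counter (Counter P) has input Init (preset to n-1) and a Zero Detect output Z. Register A supplies the msbs of the Product and Register Q the lsbs; Q0 is the lsb of Register Q.]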

Design of the Binary Multiplier Controller


An ASM chart that implements the binary multiply algorithm is given below. Note that << indicates an assignment, for example, C<<0 means set C to 0.

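[Figure: ASM chart for the binary multiplier.
State IDLE: if G = 0, remain in IDLE; if G = 1, perform C << 0, A << 0, P << n-1, Q << multiplier and go to MUL0.
State MUL0: if Q0 = 1, perform A << A + multiplicand and C << Cout; if Q0 = 0, perform C << 0. Go to MUL1.
State MUL1: perform C|A|Q << shr (C|A|Q) and P << P-1. If Z = 0, go to MUL0; if Z = 1, go to IDLE.]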

The process is achieved with 3 states (IDLE, MUL0 and MUL1). Each state provides control signals to the Datapath to perform the multiplication sequence. The process is started with an input G (labelled Go in the circuit diagrams). As long as G remains LO, the ASM remains in state IDLE. When G=1, the multiplication process is started. As the ASM moves to state MUL0, the carry flipflop is cleared (C << 0), Register A is cleared (A << 0), the counter is preset to n-1 (P << n-1) and Register Q is loaded with the multiplier. In state MUL0, the value of each bit of the multiplier (available on Q0) determines whether the multiplicand is added (Q0=1) or not (Q0=0). For the case Q0=0, the carry flipflop is cleared; for the case Q0=1, the Cout from the adder is stored in the carry flipflop. The next state is always MUL1.

In MUL1, the carry flipflop, Register A and Register Q are treated as a (1 + n + n)-bit register and shifted one position to the right, together. This is indicated with the notation C|A|Q << shr (C|A|Q) in the ASM chart. The counter is also decremented (P << P-1). The value of Z then determines whether to:
- return to state MUL0 (Z=0) to continue iterating, or
- return to state IDLE (Z=1), thus completing the process.
Remember that Z=1 means that the counter has counted down from n-1 to 0 and therefore n iterations have been completed. IDLE=0 therefore indicates that the multiplier is currently multiplying, and when the ASM returns to state IDLE (IDLE=1), it indicates that the multiplication is complete.

At this point in the design process, the control signals must be identified and their names chosen. This is done by inspection of the ASM chart and the datapath circuit. In state IDLE (when G=1), the operations P << n-1, A << 0 and Q << multiplier are all independent of one another in the datapath, so they can be done simultaneously and can share a common control signal (Initialize). However, the operation C << 0 must have its own control signal (Clear_C) since it occurs in both state IDLE and state MUL0. Operations C << Cout and A << A + multiplicand, required in state MUL0, can share a control signal (Load) since they are also independent functions in the datapath. Similarly, the shifting of registers C|A|Q and the decrementing of Counter P, both required in state MUL1, can share a common control signal (Shift_dec) since they are independent operations in the datapath. The names of the control signals are, of course, a matter of design choice.

We can summarize all the operations that must take place on each component in the datapath, and indicate the corresponding control signal names that should be passed to the datapath, in the following table:

Datapath component   Operation                  Control signal name
------------------   ------------------------   -------------------
Carry flipflop       C << 0                     Clear_C
                     C << Cout (from the adder) Load
Counter P            P << n-1                   Initialize
                     P << P-1                   Shift_dec
Register A           A << 0                     Initialize
                     A << A + multiplicand      Load
                     C|A|Q << shr (C|A|Q)       Shift_dec
Register Q           Q << multiplier            Initialize
                     C|A|Q << shr (C|A|Q)       Shift_dec
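The table above can be read as a description of what each control signal does to the datapath on a clock edge. The following Python class is a rough behavioural sketch of that reading, not the circuit itself. The class name `Datapath` and its method names are hypothetical, every update is modelled as synchronous, a 0 is assumed to shift into C, and the counter is assumed to wrap modulo n (n a power of two).

```python
class Datapath:
    """Behavioural model of C, A, Q and Counter P for an n-bit multiplier."""

    def __init__(self, n: int):
        self.n = n
        self.mask = (1 << n) - 1
        self.c, self.a, self.q = 0, 0, 0
        self.p = n - 1

    @property
    def z(self) -> int:
        """Status output Z: 1 when Counter P holds zero."""
        return int(self.p == 0)

    @property
    def q0(self) -> int:
        """Status output Q0: lsb of Register Q."""
        return self.q & 1

    def clock(self, multiplicand, multiplier,
              initialize=0, clear_c=0, load=0, shift_dec=0):
        """Apply one clock edge; all updates use the pre-edge register values."""
        n, mask = self.n, self.mask
        c, a, q, p = self.c, self.a, self.q, self.p
        if initialize:                       # A << 0, Q << multiplier, P << n-1
            a, q, p = 0, multiplier & mask, n - 1
        if clear_c:                          # C << 0
            c = 0
        if load:                             # A << A + multiplicand, C << Cout
            total = self.a + (multiplicand & mask)
            c, a = total >> n, total & mask
        if shift_dec:                        # C|A|Q << shr (C|A|Q), P << P-1
            q = ((self.a & 1) << (n - 1)) | (self.q >> 1)
            a = (self.c << (n - 1)) | (self.a >> 1)
            c = 0
            p = (self.p - 1) % n
        self.c, self.a, self.q, self.p = c, a, q, p


# Drive one "add then shift" iteration by hand for 12 x 5 with n = 4.
dp = Datapath(4)
dp.clock(0b1100, 0b0101, initialize=1, clear_c=1)
dp.clock(0b1100, 0b0101, load=1)
dp.clock(0b1100, 0b0101, shift_dec=1)
print(dp.c, format(dp.a, "04b"), format(dp.q, "04b"), format(dp.p, "02b"))
```

Grouping the operations under four signals keeps the interface between controller and datapath small, which is the point of sharing control signals between independent operations.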

The state transition diagram for the controller for this ASM is shown below. Note that only the inputs are shown; the outputs are not indicated:

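[Figure: state transition diagram. IDLE remains in IDLE while G=0 and goes to MUL0 when G=1. MUL0 always goes to MUL1. MUL1 returns to MUL0 when Z=0 and to IDLE when Z=1.]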

From inspection of the state transition diagram, the input equations for the D flipflops (using one flipflop per state) are easily formed. Here ' denotes the complement of a signal and · denotes AND:

D_IDLE = G'·IDLE + MUL1·Z
D_MUL0 = IDLE·G + MUL1·Z'
D_MUL1 = MUL0

From the ASM chart and the table above, the equations for the control signal outputs from the controller are formed:

Initialize = G·IDLE
Clear_C = G·IDLE + MUL0·Q0'
Load = MUL0·Q0
Shift_dec = MUL1

Finally, to provide a mechanism to force the state machine to state IDLE (such as at power-up), an asynchronous input Reset_to_IDLE is connected to the asynchronous inputs of the flipflops. The circuit for the controller is then simply an implementation of all of these equations, as follows:
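The controller equations can be checked against the state transition diagram with a small Python sketch. It assumes the equations are read with G, Z and Q0 complemented wherever the diagram and ASM chart require it (remain in IDLE while G=0, return to MUL0 while Z=0, clear C when Q0=0). The function names `controller_step` and `controller_outputs` are illustrative.

```python
def controller_step(state, g, z):
    """Next one-hot state (IDLE, MUL0, MUL1) from the D flipflop input equations."""
    idle, mul0, mul1 = state
    d_idle = (idle & (1 - g)) | (mul1 & z)        # stay idle, or finish
    d_mul0 = (idle & g) | (mul1 & (1 - z))        # start, or iterate again
    d_mul1 = mul0                                 # MUL0 always goes to MUL1
    return (d_idle, d_mul0, d_mul1)


def controller_outputs(state, g, q0):
    """Control signals produced in the current state."""
    idle, mul0, mul1 = state
    return {
        "Initialize": idle & g,
        "Clear_C": (idle & g) | (mul0 & (1 - q0)),
        "Load": mul0 & q0,
        "Shift_dec": mul1,
    }


IDLE, MUL0, MUL1 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(controller_step(IDLE, g=1, z=0))        # leaves IDLE when G=1
print(controller_outputs(MUL0, g=0, q0=1))    # add requested when Q0=1
```

Because exactly one state flipflop is set at a time, each next-state equation is simply the OR of the transitions entering that state.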

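[Figure: Controller for Binary Multiplier. Three D flipflops, one per state (IDLE, MUL0, MUL1), are clocked by Clock and forced to state IDLE by the asynchronous input Reset_to_IDLE. Inputs Go, Q0 and Z feed the gating logic; the outputs are IDLE, Initialize, Clear_C, Load and Shift_dec.]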

Our binary multiplier ASM has the form:

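[Figure: the binary multiplier ASM as two blocks. The Controller has inputs Go and Reset to IDLE and receives the status signals Z and Q0 from the Datapath. It sends the control signals Initialize, Clear_C, Load and Shift_dec to the Datapath. The Datapath has n-bit inputs Multiplicand and Multiplier and a 2n-bit output Product. Both blocks share the clock.]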

Combining the controller and the datapath to form the top level of our design, the binary multiplier may be viewed as:
[Figure: top-level view of the Binary Multiplier. Inputs: Multiplier (n bits), Multiplicand (n bits), Go, Reset to IDLE and Clock. Outputs: Product (2n bits) and IDLE.]

Note that the IDLE state variable has been brought to the top level since it can be used to indicate when the Binary Multiplier is busy. The Go and IDLE lines are called handshaking lines and are used to coordinate the operation of the multiplier with the external world. If IDLE=1, a multiply can be started by putting the numbers to be multiplied on the Multiplier and Multiplicand inputs and setting Go=1, at which time the state machine jumps to state MUL0 (and therefore, simultaneously, IDLE changes to 0) to start the process. When IDLE returns to 1, the answer is available on the Product output and another multiplication can be started. No multiplication should be attempted while IDLE is 0.

Conclusion
This design of a binary multiplier is valid for any value of n. For example, for n=16 (the multiplication of two 16-bit numbers), the datapath components would simply be extended to accommodate 16 bits in Registers A and Q, and the counter would require log2(16) = 4 bits. The adder would also need to be 16 bits wide. However, the same controller implementation can be used since its design is independent of n. The multiplication time for n=16 would be 2(16) + 1 = 33 clocks. The product would contain 32 bits. Further refinements can be made to enhance the speed and capability of the ASM. For example, in our algorithm, each 0 in the multiplier input data causes a shift without an add, each taking a clock pulse. If the multiplier input contains runs of consecutive 0s, a barrel shifter could be used to implement all of the required shifts (equal to the length of the run of 0s) in a single clock.
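The scaling figures above follow directly from n and can be tabulated with a few lines of Python. This is only a sketch: the function name `multiplier_parameters` is hypothetical, and it assumes n is a power of two so that the counter width log2(n) is a whole number of bits.

```python
def multiplier_parameters(n: int) -> dict:
    """Sizes and timing of the n-bit multiplier, assuming n is a power of two."""
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    return {
        "register_bits": n,                  # Registers A and Q
        "adder_bits": n,
        "counter_bits": n.bit_length() - 1,  # log2(n) for a power of two
        "product_bits": 2 * n,
        "clocks": 2 * n + 1,                 # one start clock plus two per bit
    }


for n in (4, 16):
    print(n, multiplier_parameters(n))
```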
Think about this: What modifications to our design would be required in order to handle signed numbers?

Example: Multiply 12 x 5 = 60 (with n = 4)


Assuming a 4-bit multiplier, in binary this is 1100 x 0101 = 00111100. The following table summarizes all the values in the ASM for each step in this multiplication. The left column represents each clock pulse applied to the multiplier. The multiplication time for this ASM is always 2n+1 clocks (confirm this with the state transition diagram). Since n=4, 9 clocks are required to complete a multiplication. Multiplication time is not data dependent, as it was in our first example that used repeated addition!

The first row of the table is the initial state (state IDLE) at which every multiply begins. Then, for each clock pulse applied, we move down one row in the table. Counter P in this example has 2 bits (to count 4 iterations) and the zero detect Z can be seen to be Z=1 only when Counter P counts down to 00. The values of registers C, A and Q are shown for each clock pulse in the process. Note that the multiplier is initially stored in Q, then shifted out to the right, giving access to each bit in the multiplier at the Q0 (lsb) position. At the same time, the product shifts in from the left. The product is formed in Registers A and Q, with the addition on each iteration occurring in Register A if the lsb of Register Q is HI (i.e. Q0 = 1). Notice that registers C, A and Q are shifted on every iteration and that the final answer 00111100 is contained in Registers A and Q on the final clock pulse. At this point, we have returned to state IDLE, indicating that the multiplication is complete.

The current state of the ASM is indicated with a 1 in the appropriate States column. Note that since we are using one flipflop per state, only one of the 3 columns can contain a 1; the others are, of course, 0. In the Control Signals columns, the values for each control signal are provided for each clock pulse. Note that Initialize, Clear_C and Load are Mealy-type outputs since they are a function of both current state and inputs. Shift_dec is a Moore-type output since it depends only on current state (MUL1) and is not a function of any input. In fact, Shift_dec = MUL1. Work through this example line by line to verify its operation.

Example: 12 x 5

Clock  Counter                          States             Control Signals
pulse  P        Z  C  Reg A  Reg Q   IDLE MUL0 MUL1   Initialize Clear_C Load Shift_dec
-----  -------  -  -  -----  -----   ---- ---- ----   ---------- ------- ---- ---------
init   11       0  x  xxxx   xxxx    1    0    0      1          1       0    0
1      11       0  0  0000   0101    0    1    0      0          0       1    0
2      11       0  0  1100   0101    0    0    1      0          0       0    1
3      10       0  0  0110   0010    0    1    0      0          1       0    0
4      10       0  0  0110   0010    0    0    1      0          0       0    1
5      01       0  0  0011   0001    0    1    0      0          0       1    0
6      01       0  0  1111   0001    0    0    1      0          0       0    1
7      00       1  0  0111   1000    0    1    0      0          1       0    0
8      00       1  0  0111   1000    0    0    1      0          0       0    1
9      11       0  0  0011   1100    1    0    0      0          0       0    0
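The 12 x 5 example can also be stepped through in software. The following Python sketch clocks a model of the controller and datapath together and records the register contents after each clock pulse. It is an illustration under stated assumptions, not the circuit itself: the function name `simulate` is hypothetical, Go is assumed HI for the first clock only, the registers start at 0 rather than unknown, a 0 shifts into C, and Counter P wraps modulo n.

```python
def simulate(multiplicand: int, multiplier: int, n: int = 4) -> list:
    """Return one (clock, P, C, A, Q, state) row per clock until IDLE is re-entered."""
    mask = (1 << n) - 1
    idle, mul0, mul1 = 1, 0, 0            # after Reset_to_IDLE
    c, a, q, p = 0, 0, 0, n - 1
    g, clock, rows = 1, 0, []
    while True:
        z, q0 = int(p == 0), q & 1        # status signals before the clock edge
        # Controller outputs for the current state.
        initialize = idle & g
        clear_c = initialize | (mul0 & (1 - q0))
        load = mul0 & q0
        shift_dec = mul1
        # Next-state equations (one flipflop per state).
        next_state = ((idle & (1 - g)) | (mul1 & z),
                      (idle & g) | (mul1 & (1 - z)),
                      mul0)
        # Datapath update on the clock edge.
        if initialize:
            a, q, p = 0, multiplier & mask, n - 1
        if clear_c:
            c = 0
        if load:
            total = a + (multiplicand & mask)
            c, a = total >> n, total & mask
        if shift_dec:
            q = ((a & 1) << (n - 1)) | (q >> 1)
            a = (c << (n - 1)) | (a >> 1)
            c = 0
            p = (p - 1) % n
        idle, mul0, mul1 = next_state
        g, clock = 0, clock + 1
        state = "IDLE" if idle else "MUL0" if mul0 else "MUL1"
        rows.append((clock, p, c, a, q, state))
        if idle:
            return rows


for clock, p, c, a, q, state in simulate(0b1100, 0b0101):
    print(f"{clock}  P={p:02b}  C={c}  A={a:04b}  Q={q:04b}  {state}")
```

Each printed row shows the values after the corresponding clock pulse, so the rows can be compared line by line with the register columns of the table.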
