
2015 Fifth International Conference on Communication Systems and Network Technologies

Hardware Design Procedure: Principles and Practices


Marcus L. George, Geetam Singh Tomar
Department of Electrical and Computer Engineering
University of the West Indies, St. Augustine, Trinidad and Tobago
Marcus.george99@yahoo.com, gstomar@ieee.org

Abstract: This paper presents the hardware design procedure (principles and practices) required to convert a specification into the hardware representation of the given design. The paper first examines general design principles required to guarantee optimum performance of the system to be designed, such as pipelining and sequencing using finite state machines (FSM). The general hardware design methodology is then presented, after which the hardware implementation of the designed system is discussed. Finally, the procedure to be used for verifying and validating the implemented system is presented.

Keywords: Hardware Design Procedure, Design Methodology, FSM

INTRODUCTION
Hardware design and implementation involves taking a specification through the steps required to represent it in hardware. An understanding of design methodology, hardware implementation, verification and validation is necessary for the successful development of the required hardware. Where the system consists of several sub-systems, an understanding of finite state machine (FSM) [7] design and implementation is also necessary. The following sections of this paper summarize the relevant principles of the general digital design methodology. Design decisions are always made according to the requirements of the application, and circuit blocks are selected to produce better results.

PIPELINING METHODOLOGY
Pipelining (in digital electronics) is the division of a process into a series of stages using registers, with the intention of improving the performance of the system beyond what would be achieved if the system were not pipelined. Pipelining facilitates the effective use of the hardware of a combinational circuit at the cost of little extra hardware. Combinational logic circuit ABC, shown in Figure 1, consists of three units which together produce output F(X) from input X. If units A and B have different latencies then their outputs would not be available for use by unit C at the same time, resulting in the incorrect computation of output F(X).

[Figure 1: Dataflow diagram of logic circuit ABC. Input X feeds unit A (10 ns) and unit B (16 ns); their outputs A(X) and B(X) feed unit C (23 ns), which produces F(X).]

Pipelining speeds up operation in combinational logic circuits by holding the inputs to subsequent stages stable during their computation, thereby allowing earlier stages to start processing new inputs as soon as they finish. The pipelined version of combinational logic circuit ABC, shown in Figure 2, uses registers to hold the outputs of units A and B stable while unit C computes F(X). Units A and B can therefore process input X(i+1) while unit C performs its computation on X(i). Combinational circuit ABC has thus been converted to a two-stage pipeline.

[Figure 2: Dataflow diagram of pipelined version of logic circuit ABC. Registers hold the outputs A(X) and B(X) of units A (10 ns) and B (16 ns) for unit C (23 ns), which produces F(X).]

The minimum period of the clock common to all registers is equal to the maximum pipeline stage delay, which is 23 ns in Figure 2. The latency of a K-stage pipeline is K times the period of the clock common to all registers, and is therefore 46 ns in Figure 2. The throughput of a K-stage pipeline is the frequency of the clock common to all registers, and is therefore 1/(23 ns). Table I demonstrates that pipelining increases the latency but also increases the throughput of the combinational logic circuit.

TABLE I: COMPARISON OF UN-PIPELINED AND 2-STAGE PIPELINED VERSIONS OF LOGIC CIRCUIT ABC
Version             Latency (ns)    Throughput (1/ns)
un-pipelined        39              1/39
2-stage pipeline    46              1/23
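
The entries of Table I can be checked with a short calculation. The sketch below assumes, as Figures 1 and 2 suggest, that the un-pipelined latency is the slower of units A and B followed by unit C. The symbols t_A, t_B, t_C and T_clk are introduced here for illustration only and do not appear in the original figures.

```latex
\begin{align*}
t_{\text{un-pipelined}} &= \max(t_A, t_B) + t_C = \max(10, 16) + 23 = 39\ \text{ns} \\
T_{\text{clk}} &= \max\bigl(\max(t_A, t_B),\ t_C\bigr) = \max(16, 23) = 23\ \text{ns} \\
t_{\text{pipelined}} &= K \cdot T_{\text{clk}} = 2 \times 23 = 46\ \text{ns} \\
\text{throughput} &: \quad \frac{1}{39\ \text{ns}} \approx 25.6\ \text{MHz}
  \quad\longrightarrow\quad \frac{1}{23\ \text{ns}} \approx 43.5\ \text{MHz}
\end{align*}
```

Each result takes 7 ns longer to emerge from the pipelined circuit, but a new result is available every 23 ns instead of every 39 ns.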

978-1-4799-1797-6/15 $31.00 © 2015 IEEE
DOI 10.1109/CSNT.2015.198

The pipelining methodology consists of three steps. Consider the combinational logic circuit AG shown in Figure 3.

[Figure 3: Dataflow diagram of logic circuit AG, from INPUTS to OUTPUTS. Unit latencies: A 5 ns, B 2 ns, C 8 ns, D 4 ns, E 2 ns, F 4 ns, G 2 ns.]

The first step involves placing two dots on Figure 3 and drawing lines that cross every output in the circuit and connect to both dots. These lines demarcate the pipeline stages. The second step involves adding pipeline registers at every point where a separating line crosses a connection, so as to generate valid pipelines. The pipelined version of circuit AG after step 2 is given in Figure 4.

[Figure 4: Dataflow diagram of the pipelined logic circuit AG (after step 2)]

The third step involves the removal of excess registers based on the latency of the modules in each pipeline stage. For example, the third stage of the pipelined version of AG determines the minimum period of the clock common to all registers of the system, since unit C has the longest latency of all units in the system. Other pipeline stages with shorter latencies can therefore be combined as long as their combined latency does not exceed the current minimum clock period. The 4th (unit F) and 5th (unit G) pipeline stages can be combined into one pipeline stage since their combined latency is 6 ns, which is 2 ns shorter than the minimum clock period of 8 ns. This modification reduces the number of pipeline stages and the overall system latency while maintaining the minimum clock period of 8 ns for the system. The pipelined version of circuit AG after step 3 is given in Figure 5.

[Figure 5: Dataflow diagram of the pipelined logic circuit AG (after step 3)]

FSM DESIGN AND IMPLEMENTATION METHODOLOGY
Finite state machines are computational models consisting of a finite number of states, transitions between those states, and accompanying actions. The inputs of the state machine logic are the system inputs and the present state, while the outputs are the next state and the system outputs. For Mealy state machines [7] the system output logic is dependent on the system inputs and the present state. This means that the state diagram for a Mealy state machine should include both the input and output signals for each transition edge. States are represented by circles. The state transitions are represented by lines leaving state circles. For each transition the inputs and outputs are represented on the arcs in the format "input bits/output bits".

The procedure for the design and implementation of Mealy state machines is as follows. First, the state transition tables for each finite state machine must be developed. A state transition table shows what state a finite state machine will move to, based on the current state and other inputs. Table II gives the state transition table for a modulo 4 counter which is required to count from 0 to 3 based on the rising edge of the clock C and to assert T to logic 1 when the current count reaches 3. The inputs of the state machine are placed in the left-hand columns while the outputs are in the right-hand columns.

TABLE II: STATE TRANSITION TABLE FOR A MODULO 4 COUNTER
Present State    C    Next State    T    D1D0
S0               0    S0            0    00
S0               1    S1            0    00
S1               0    S1            0    01
S1               1    S2            0    01
S2               0    S2            0    10
S2               1    S3            0    10
S3               0    S3            0    11
S3               1    S0            1    11

The state diagrams corresponding to these state transition tables must then be developed. For a Mealy state
diagram the inputs and outputs are represented on the lines leaving the circle in the format inputs/outputs. Figure 6 gives the Mealy state diagram of the modulo 4 counter. For Moore state machines [7] the inputs are represented on the lines leaving the circle while the outputs are represented adjacent to the states (circles).

[Figure 6: Mealy state diagram for the modulo 4 counter, with states S0, S1, S2 and S3 and arcs labelled in the format input/outputs]

The finite state machine is then implemented using an HDL, e.g. VHDL. The state machine logic is described with case statements [6] or if-else statements [6]. All current states and inputs are enumerated and the appropriate values for the next state and the outputs are specified. In VHDL the case items are parameters that specify the state encoding for each case:

    case present_state is
        when reset_state =>
            next_state <= idle;
        when S0 =>                     -- count 0
            if (C = '1') then
                next_state <= S1;
            else
                next_state <= S0;
            end if;
        when S1 =>                     -- count 1
            if (C = '1') then
                next_state <= S2;
            else
                next_state <= S1;
            end if;
        when S2 =>                     -- count 2 (from Table II)
            if (C = '1') then
                next_state <= S3;
            else
                next_state <= S2;
            end if;
        when S3 =>                     -- count 3 (from Table II)
            if (C = '1') then
                next_state <= S0;
            else
                next_state <= S3;
            end if;
        when others =>                 -- assumed: unlisted states such as idle go to S0
            next_state <= S0;
    end case;

Outputs are encoded in a similar manner to the encoding of the next state value. Case statements or if-else statements are used and the appropriate value is assigned to the outputs depending on the particular state transition or state value. More information about the implementation of finite state machines in VHDL can be obtained from [6].

NUMERIC FORMATS
System developers, especially those who are new to digital signal processors, are sometimes uncertain whether they need to use integer, fixed-point or floating-point arithmetic for implementing their systems. The choice of numeric format depends on the nature of the application and the constraints of the system. Integer arithmetic is not capable of accurately representing fractional values and is normally used in applications where the quantized versions of the computed output are required, since only the integer component of numbers can be represented. When using integer arithmetic the order in which arithmetic operations are performed matters. For example, if an algorithm to be implemented with integer arithmetic consists of a multiplication and a division, a more accurate result is obtained by performing the multiplication before the division. This is because if the division were performed first, the quantized version of the quotient would be used in the multiplication, so accuracy is lost before the multiplication is performed. For instance, (7 × 3) ÷ 2 gives 10 in integer arithmetic whereas (7 ÷ 2) × 3 gives 9, the exact value being 10.5.

Fixed-point arithmetic offers narrow dynamic ranges and is favoured for high-volume applications like digitization of voice, where unit manufacturing costs need to be kept low. Unlike integer arithmetic, fixed-point arithmetic is capable of representing fractional values.

Floating-point arithmetic provides enhanced performance over fixed-point and integer arithmetic due to its large dynamic range. Unlike fixed-point and integer arithmetic, floating-point arithmetic can represent very small and very large numbers with a fixed number of bits. A large dynamic range is important when handling extremely large data samples and data samples whose range cannot be easily predicted. Floating-point arithmetic also provides greater mathematical flexibility and accuracy than fixed-point and integer arithmetic. The internal representations of data in floating-point arithmetic are more exact than in fixed-point, ensuring greater accuracy in end results.

Floating-point arithmetic should be used for applications that require great computational accuracy and flexibility, e.g. image recognition used in medicine, where a high degree of accuracy is required. On the other hand, applications that do not require these computational features can normally use fixed-point or integer arithmetic. If fixed-point or integer units were used instead of floating-point units for applications that require great computational accuracy and flexibility, the algorithms would have to be carefully mapped to their limited dynamic ranges and scaled through each function in the datapath. This requires a significant number of rounding and saturation steps and can adversely affect algorithm performance if done improperly.

Although floating-point arithmetic provides enhanced performance over fixed-point and integer arithmetic, it requires more internal circuitry, which increases the cost of implementation.

HARDWARE DESIGN METHODOLOGY
The hardware design procedure consists of several important stages which are crucial to the proper functioning of the system to be implemented. The first step involves the development of the dataflow diagram from the system specifications. A dataflow diagram models a system as a network of functional processes and data in a system to
clearly illustrate how data moves through the system. When developing a dataflow diagram the arithmetic operations required must be identified. In signal processing each operation is represented by a symbol as shown in Figure 7.

[Figure 7: Symbols for digital signal processing operations: adder (x = a + b), subtractor (x = a − b), accumulator (x = Σa), multiplication (x = a × b), division (x = a ÷ b), delay (x = a[n-1]) and counter (n = n + 1)]

Assume, for example, that the system specification requires the design of a unit which implements Equation 1.

$$f(n) = k \cdot f(n-1) + 2 \cdot g(n) \cdot p(n)$$
Equation 1: Function f(n)

In the example, Equation 1 consists of five operations:
• Delay operation: $f(n-1)$
• Multiplication operation 1: $k \cdot f(n-1)$
• Multiplication operation 2: $g(n) \cdot p(n)$
• Multiplication operation 3: $2 \cdot g(n) \cdot p(n)$
  o An alternative is an addition process that adds the result of the second multiplication operation to itself.
• Addition operation: $k \cdot f(n-1) + 2 \cdot g(n) \cdot p(n)$
All operation symbols are put together to create the dataflow diagram that implements Equation 1, shown in Figure 8.

[Figure 8: Dataflow diagram for function f(n). f(n-1) is obtained through a delay and multiplied by k to give kf(n-1); g(n) and p(n) are multiplied to give g(n)p(n), which is multiplied by 2 to give 2g(n)p(n); the two results are added to give f(n).]

The second step of the hardware design methodology involves the application of pipelining to the dataflow diagram as presented. In applications where the latency of the system is substantially shorter than the period at which new inputs become available to the system, there is no need to apply pipelining, e.g. if the system that implements Equation 1 has a latency of 12 ns but is required to perform the computation of f(n) only once every 10 ms. In this case, however, registers can still be included in the system to stabilize inputs if some of the arithmetic components used in the implementation inherently keep their outputs valid for only a very short period of time, e.g. if a fixed-point multiplier used to compute the product of k and f(n-1) keeps its result kf(n-1) valid for only half a clock cycle after the result is computed. There is then a possibility that the result would not be valid when the subsequent units are ready to perform their computation. Registers are installed to solve this problem.

The third step of the hardware design methodology involves the development of the datapath block diagram of the system. This requires the inclusion of the ports and the respective port widths for all components used. A finite state machine must be designed and included in the system. The datapath block diagram of the system that implements Equation 1 is given in Figure 9.

[Figure 9: Datapath block diagram of the unit responsible for implementing the function f(n). Multiplier blocks (ports A, B, start, product, finish) and an Adder block (ports A, B, Cin, sum, Cout) are connected through 32-bit registers (HR) with clk and RESET inputs, enabled by store_A, store_B, store_C and store_D; a Controller Unit with inputs start, mult1_done and mult2_done generates start_mult1, start_mult2, the store signals and done.]

HARDWARE IMPLEMENTATION
High-density programmable logic devices such as CPLDs and FPGAs can be used to integrate large amounts of logic in a single IC. Hardware description languages (HDLs) such as VHDL, AHDL and Verilog provide an alternative to Boolean equations or gate-level descriptions of digital components. HDLs also provide high-level constructs that enable designers to describe large circuits, e.g. the creation of libraries which can store components for reuse in subsequent designs. VHDL is a standard language for hardware description as approved by the IEEE [2] and provides portability of code between synthesis and simulation tools as well as device-independent design.

After implementation using an HDL the digital circuit is synthesised. Synthesis is the process of converting a high-level description of the design into an optimized gate-level representation given a standard-cell library and design constraints. Computer-aided logic design tools such as Xilinx ISE and Quartus II are used to synthesise digital circuits and can greatly reduce the design cycle time and
improve design productivity. With these synthesis
tools designers are able to write technology-independent,
abstract high-level descriptions and produce a
technology-dependent gate-level netlist of the required digital system.
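
As an illustration of the kind of technology-independent description referred to above, Equation 1, which the operation list gives as f(n) = k·f(n−1) + 2·g(n)·p(n), might be written behaviourally as sketched below. The entity name eq1_unit, the port names, the signed 32-bit interpretation of the operands and the truncation of the 64-bit intermediate result to 32 bits are all assumptions made for illustration. The multi-cycle, register-by-register datapath of Figure 9 is not reproduced here.

```vhdl
-- Behavioural sketch of Equation 1: f(n) = k*f(n-1) + 2*g(n)*p(n).
-- All names and the 32-bit signed format are illustrative assumptions.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity eq1_unit is
  port (
    clk     : in  std_logic;
    reset   : in  std_logic;
    k, g, p : in  signed(31 downto 0);
    f       : out signed(31 downto 0)
  );
end entity eq1_unit;

architecture behavioural of eq1_unit is
  -- Holds the most recent result, i.e. f(n-1) when f(n) is computed.
  signal f_prev : signed(31 downto 0) := (others => '0');
begin
  process (clk)
    variable kf  : signed(63 downto 0);  -- k * f(n-1)
    variable gp  : signed(63 downto 0);  -- g(n) * p(n)
    variable sum : signed(63 downto 0);  -- k*f(n-1) + 2*g(n)*p(n)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        f_prev <= (others => '0');
      else
        kf  := k * f_prev;
        gp  := g * p;
        -- Multiplication by 2 is expressed as a one-bit left shift.
        sum := kf + shift_left(gp, 1);
        -- Assumption: the result is truncated to 32 bits without
        -- overflow detection or saturation.
        f_prev <= resize(sum, 32);
      end if;
    end if;
  end process;

  f <= f_prev;
end architecture behavioural;
```

Nothing in this description names a particular device. The synthesis tool decides how the two multiplications, the shift and the addition are mapped onto the resources of the chosen target.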
VERIFICATION AND VALIDATION
Verification testing is used to confirm that an
implemented system functions as specified. Verification
testing is done by creating test-benches for execution on
simulated/implemented systems after synthesis. A test-
bench includes test cases, inputs and expected outputs. All
test cases are exercised to ensure that actual outputs are
equivalent to expected outputs. Validation testing is used to
establish that a system satisfies an operational requirement.
Timing simulation of the implemented system can be used
to determine the latency in nanoseconds or cycles.
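
A minimal self-checking test-bench in the sense described above might look as follows. It is a sketch and not the authors' test-bench. The design under test is a small VHDL rendering of the modulo 4 counter of Table II in which C is treated as a count-enable input sampled on a separate clock. The names mod4_counter, clk, rst, PERIOD and EXPECTED are assumptions introduced for illustration.

```vhdl
-- Design under test: modulo 4 counter of Table II (Mealy outputs T and D).
-- Assumptions: separate clk and rst ports; C acts as a count enable.
library ieee;
use ieee.std_logic_1164.all;

entity mod4_counter is
  port (clk, rst, C : in  std_logic;
        T           : out std_logic;
        D           : out std_logic_vector(1 downto 0));
end entity mod4_counter;

architecture rtl of mod4_counter is
  type state_t is (S0, S1, S2, S3);
  signal present_state, next_state : state_t;
begin
  -- State register with synchronous reset to S0.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        present_state <= S0;
      else
        present_state <= next_state;
      end if;
    end if;
  end process;

  -- Next-state and output logic, one branch per pair of rows in Table II.
  process (present_state, C)
  begin
    next_state <= present_state;  -- C = '0': hold the current state
    T <= '0';
    case present_state is
      when S0 =>
        D <= "00";
        if C = '1' then next_state <= S1; end if;
      when S1 =>
        D <= "01";
        if C = '1' then next_state <= S2; end if;
      when S2 =>
        D <= "10";
        if C = '1' then next_state <= S3; end if;
      when S3 =>
        D <= "11";
        if C = '1' then next_state <= S0; T <= '1'; end if;
    end case;
  end process;
end architecture rtl;

-- Test-bench: applies C = '1' and compares D and T with expected values.
library ieee;
use ieee.std_logic_1164.all;

entity mod4_counter_tb is
end entity mod4_counter_tb;

architecture sim of mod4_counter_tb is
  constant PERIOD : time := 10 ns;
  type count_array is array (natural range <>) of std_logic_vector(1 downto 0);
  constant EXPECTED : count_array(0 to 4) := ("00", "01", "10", "11", "00");
  signal clk  : std_logic := '0';
  signal rst  : std_logic := '1';
  signal C    : std_logic := '0';
  signal T    : std_logic;
  signal D    : std_logic_vector(1 downto 0);
  signal done : boolean := false;
begin
  dut : entity work.mod4_counter
    port map (clk => clk, rst => rst, C => C, T => T, D => D);

  -- Free-running clock, stopped once all test cases have been exercised.
  clk <= not clk after PERIOD / 2 when not done else '0';

  stimulus : process
  begin
    wait until rising_edge(clk);     -- the reset is sampled on this edge
    rst <= '0';
    C   <= '1';
    for i in EXPECTED'range loop
      wait until falling_edge(clk);  -- check mid-cycle, outputs settled
      assert D = EXPECTED(i)
        report "unexpected count at step " & integer'image(i) severity error;
      assert (T = '1') = (i = 3)     -- T is expected only in S3 with C = '1'
        report "unexpected T at step " & integer'image(i) severity error;
    end loop;
    report "test-bench finished";
    done <= true;
    wait;
  end process;
end architecture sim;
```

The expected outputs are stored alongside the stimulus, so a mismatch is reported by the simulator rather than found by inspecting waveforms. The same test-bench can be rerun against the post-synthesis netlist for timing simulation.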
CONCLUSIONS
This paper presented general aspects of hardware
design procedure. Design principles such as pipelining, and
FSM design and implementation methodology were
presented after which the hardware design methodology was
presented. Hardware implementation along with the
verification and validation procedure was then summarized.
ACKNOWLEDGEMENT
The author is grateful to Professor Stephan Gift, Dr.
Lucien Ngalamou and Dr. Cathy-Ann Radix of the
University of the West Indies for the invaluable guidance
given throughout the production of this paper.
REFERENCES
[1] Hambley, Allan. Electrical Engineering Principles and Applications.
2nd. ed. New Jersey: Prentice Hall, 2001.
[2] Institute of Electrical and Electronics Engineers (IEEE). 1993. IEEE
Standard VHDL Language Reference Manual. Std 1076.3.
[3] Klemperer, Peter, Shelly, Chen, Karthik, Pattabiraman, Zbigniew,
Kalbarczyk, and Ravishankar, Iyer. 2008. FPGA Hardware
Implementation of Statically-derived Application-aware Error
Detectors. Proceedings of the 2008 International Conference on
Dependable Systems and Networks, 34-40. Washington, DC, USA:
IEEE Computer Society.
[4] Meyer-Baese, U. 2000. Digital Signal Processing with Field
Programmable Gate Arrays. 1st ed. New York: Springer.
[5] Perry, Douglas. 1998. VHDL. 3rd ed. New York: McGraw-Hill.
[6] Skahill, Kevin. 1996. VHDL for Programmable Logic. 1st ed. New
York: Addison-Wesley.
[7] Wakerly, John. 1999. Digital Design Principles and Practices. 4th ed.
New York: Prentice Hall.
[8] Paul Myers, “Interfacing using Serial Protocols Using SPI and I2C”,
Proc. ESP 2005, pp.1-9, 2005.
[9] GS Tomar, ML George, “Hardware Implementation of Pulse Code
Modulation Speech Compression Algorithm”, Asia-Pacific Journal of
Multimedia Services Convergence with Art, Humanities and
Sociology, Vol.2, No.1, pp. 17-24, 2012.
