which occurs at the sign-bit time. As a consequence, we
Coefficients: A1 = 0.72, A2 = -0.30, A3 = 0.95, A4 = 0.11.

Table 1. Input code and 32-word memory contents.

Ts  b1n  b2n  b3n  b4n   Memory Contents
0   0    0    0    0     0
0   0    0    0    1     A4 = 0.11
0   0    0    1    0     A3 = 0.95
0   0    0    1    1     A3+A4 = 1.06
0   0    1    0    0     A2 = -0.30
0   0    1    0    1     A2+A4 = -0.19
0   0    1    1    0     A2+A3 = 0.65
0   0    1    1    1     A2+A3+A4 = 0.76
0   1    0    0    0     A1 = 0.72
0   1    0    0    1     A1+A4 = 0.83
0   1    0    1    0     A1+A3 = 1.67
0   1    0    1    1     A1+A3+A4 = 1.78
0   1    1    0    0     A1+A2 = 0.42
0   1    1    0    1     A1+A2+A4 = 0.53
0   1    1    1    0     A1+A2+A3 = 1.37
0   1    1    1    1     A1+A2+A3+A4 = 1.48
1   0    0    0    0     0
1   0    0    0    1     -A4 = -0.11
1   0    0    1    0     -A3 = -0.95
1   0    0    1    1     -(A3+A4) = -1.06
1   0    1    0    0     -A2 = +0.30
1   0    1    0    1     -(A2+A4) = +0.19
1   0    1    1    0     -(A2+A3) = -0.65
1   0    1    1    1     -(A2+A3+A4) = -0.76
1   1    0    0    0     -A1 = -0.72
1   1    0    0    1     -(A1+A4) = -0.83
1   1    0    1    0     -(A1+A3) = -1.67
1   1    0    1    1     -(A1+A3+A4) = -1.78
1   1    1    0    0     -(A1+A2) = -0.42
1   1    1    0    1     -(A1+A2+A4) = -0.53
1   1    1    1    0     -(A1+A2+A3) = -1.37
1   1    1    1    1     -(A1+A2+A3+A4) = -1.48

The top half of Table 1 (Ts = 0) is the 16-word memory that is used with the sign control (0 = add, 1 = subtract).

Figure 1a. Adder and Full Memory (32-word memory, Table 1).
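The full-memory mechanization of Table 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's hardware: the function names, the 12-bit word length, and the test inputs are assumptions chosen for the example, and the inputs are taken to be two's-complement fractions in [-1, 1) presented least-significant bit first.

```python
# Coefficients A1..A4 as listed in Table 1.
A = [0.72, -0.30, 0.95, 0.11]


def build_memory_32(coeffs):
    """Return the 32-word memory addressed by (Ts, b1, b2, b3, b4).

    Words 0..15 (Ts = 0) hold the sum of the A_k whose address bit is 1;
    words 16..31 (Ts = 1) hold the negated sums used at sign-bit time.
    """
    k_inputs = len(coeffs)
    words = []
    for ts in (0, 1):
        for addr in range(1 << k_inputs):
            total = sum(a for k, a in enumerate(coeffs)
                        if (addr >> (k_inputs - 1 - k)) & 1)
            words.append(-total if ts else total)
    return words


def to_bits(x, nbits):
    """Two's-complement bits of x in [-1, 1), sign bit first (hypothetical helper)."""
    q = int(round(x * (1 << (nbits - 1)))) & ((1 << nbits) - 1)
    return [(q >> (nbits - 1 - n)) & 1 for n in range(nbits)]


def da_dot_full_memory(coeffs, xs, nbits=12):
    """Bit-serial (1 BAAT) dot product: one memory lookup and one add per clock."""
    memory = build_memory_32(coeffs)
    bits = [to_bits(x, nbits) for x in xs]
    acc = 0.0
    for n in range(nbits - 1, -1, -1):      # least-significant bit first
        ts = 1 if n == 0 else 0             # Ts marks the sign-bit time
        addr = ts
        for b in bits:
            addr = (addr << 1) | b[n]       # address = Ts b1n b2n b3n b4n
        acc = acc / 2 + memory[addr]        # shift right, then add the word
    return acc


# Assumed test inputs, exactly representable in 12 bits.
x = [0.5, -0.25, 0.125, -0.75]
print(da_dot_full_memory(A, x))             # DA result
print(sum(a * v for a, v in zip(A, x)))     # direct multiply-accumulate
```

The two printed values should agree to within floating-point rounding, since no multiplier is needed once the sums of coefficients have been stored.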
Figure 1b. Adder/Subtractor and Memory.

Table 2. Input code and 8-word memory contents, Q.

b1n  b2n  b3n  b4n   Memory Contents, Q
0    0    0    0     -1/2(A1+A2+A3+A4) = -0.74
0    0    0    1     -1/2(A1+A2+A3-A4) = -0.63
0    0    1    0     -1/2(A1+A2-A3+A4) = 0.21
0    0    1    1     -1/2(A1+A2-A3-A4) = 0.32
0    1    0    0     -1/2(A1-A2+A3+A4) = -1.04
0    1    0    1     -1/2(A1-A2+A3-A4) = -0.93
0    1    1    0     -1/2(A1-A2-A3+A4) = -0.09
0    1    1    1     -1/2(A1-A2-A3-A4) = 0.02

The rows above (b1n = 0) are the top half of Table 2 and form the 8-word ROM; the initial condition is Q(0) = -0.74.
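The 8-word memory of Table 2 and its use can be sketched as follows. The sketch assumes the offset-binary-coded form of DA: each address line carries x_k XOR x_1, the add/subtract control is x_1 XOR Ts, and Q(0) replaces the shifted feedback on the first (least-significant) clock. The function names, the 12-bit word length, and the test inputs are hypothetical choices made for the example.

```python
# Coefficients A1..A4 as used in Tables 1 and 2.
A = [0.72, -0.30, 0.95, 0.11]


def build_obc_memory(coeffs):
    """Return the 8-word table Q addressed by (b2, b3, b4) with b1 taken as 0.

    Each word is one half of the signed sum of the coefficients, where a
    0 bit contributes -A_k and a 1 bit contributes +A_k.
    """
    a1, rest = coeffs[0], coeffs[1:]
    m = len(rest)
    table = []
    for addr in range(1 << m):
        total = -a1
        for k, a in enumerate(rest):
            bit = (addr >> (m - 1 - k)) & 1
            total += a if bit else -a
        table.append(total / 2)
    return table


def to_bits(x, nbits):
    """Two's-complement bits of x in [-1, 1), sign bit first (hypothetical helper)."""
    q = int(round(x * (1 << (nbits - 1)))) & ((1 << nbits) - 1)
    return [(q >> (nbits - 1 - n)) & 1 for n in range(nbits)]


def da_dot_obc(coeffs, xs, nbits=12):
    """Bit-serial dot product using the halved memory and an adder/subtractor."""
    table = build_obc_memory(coeffs)
    q0 = table[0]                           # initial condition Q(0)
    bits = [to_bits(x, nbits) for x in xs]
    acc = 0.0
    for n in range(nbits - 1, -1, -1):      # least-significant bit first
        ts = 1 if n == 0 else 0             # Ts marks the sign-bit time
        b1 = bits[0][n]
        addr = 0
        for b in bits[1:]:
            addr = (addr << 1) | (b[n] ^ b1)    # address line sees x_k XOR x_1
        subtract = b1 ^ ts                      # A/S control: 1 = subtract
        feedback = q0 if n == nbits - 1 else acc / 2
        acc = feedback - table[addr] if subtract else feedback + table[addr]
    return acc


# Assumed test inputs, exactly representable in 12 bits.
x = [0.5, -0.25, 0.125, -0.75]
print(build_obc_memory(A))                  # compare with Table 2
print(da_dot_obc(A, x))                     # DA result
print(sum(a * v for a, v in zip(A, x)))     # direct multiply-accumulate
```

The table printed first should reproduce the eight entries of Table 2, and the last two values should agree to within rounding. The memory is half the size of the 16-word version because complementing every address bit only changes the sign of the stored word.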
Figure 1. DA mechanization of y = A1x1 + A2x2 + A3x3 + A4x4 for bit-serial (1 BAAT) implementation.
The Gauss brackets tell us to round up to the next integer. DA is often most efficient when the number of input lines is commensurate with the number of clocks required to load the data, or equivalently, when w = 1. For our example
Figure 4. DA mechanization of y = A1x1 + A2x2 + A3x3 + A4x4 for 1 BAAT mechanization similar to Figure A3, showing alternate derivation of A/S (Add/Subtract) control.
address line in Figure 4 that is driven by $x_k$ ($k = 2, 3, 4$) actually sees $x_k \oplus T_s \oplus (T_s \oplus x_1) = x_k \oplus x_1$. This is the same as in Figure 1c. The derivation of the A/S control lines is the same in both figures.

In Figure 5, we see a demonstration of an L = 4 design in order to show the structure. The memory cost is extremely great for such a small computation, but it is presented to illustrate the principle. Some more practical cases are shown below.

APPLICATION OF DA TO A BIQUADRATIC DIGITAL FILTER (AN EXAMPLE OF VECTOR DOT-PRODUCT AND VECTOR-MATRIX-PRODUCT MECHANIZATION)

A typical biquadratic digital filter has a transfer function of the form

$$\frac{Y(z)}{X(z)} = \frac{A_0 + A_1 z^{-1} + A_2 z^{-2}}{1 + B_1 z^{-1} + B_2 z^{-2}} \tag{20}$$

where the poles are determined by $B_1$ and $B_2$, and the gain and zeros are determined by $A_0$, $A_1$, and $A_2$. The time-domain description is

$$y_n = [A_0 \;\; A_1 \;\; A_2 \;\; -B_1 \;\; -B_2]\,[x_n \;\; x_{n-1} \;\; x_{n-2} \;\; y_{n-1} \;\; y_{n-2}]^T \tag{21}$$

where the coefficient vector is $[A_0 \; A_1 \; A_2 \; -B_1 \; -B_2]$ and the data vector is $[x_n \; x_{n-1} \; x_{n-2} \; y_{n-1} \; y_{n-2}]^T$. A direct DA mechanization of the filter is shown in Figure 6. Notice the extreme economy of this filter. This figure differs slightly from those that we have seen above. We have a 5-dimensional input vector to the addressing logic, but a scalar input, $x_n$, and a scalar output, $y_n$, for the processor.

We have added a pair of delays (the $z^{-1}$ blocks) in the input-signal path so that we could develop the delayed signals, $x_{n-1}$ and $x_{n-2}$, from the input, $x_n$; and we have added a third delay block so that we could develop the delayed output, $y_{n-2}$, from $y_{n-1}$. We obtain $y_{n-1}$ from a parallel-to-serial register (P/S), since the output from the summer in the accumulator loop is a parallel word. The contents of the 16-word ROM are shown in Table 3. We may, of course, use parallelism to increase the speed, as was discussed in the previous section.

Another form of digital filter, the normal form, is important because of its low roundoff noise, low coefficient sensitivity, and favorable limit-cycle behavior. It also serves to illustrate the vector-matrix form of DA, and we will now discuss it. An excellent tradeoff study was recently presented by Barnes [35] to show how one could design a normal-form second-order digital filter to meet prescribed performance criteria. In this section, an extremely efficient set of realizations is shown, one in which the speed and complexity can be effectively traded. (Much of the following is taken from References 36 and 37.)

A block diagram of the normal-form structure is shown in Figure 7. The multipliers $b_1$, $b_2$, $c_1$, and $c_2$ determine the pole locations of the filter. The input multipliers $a_1$ and $a_2$ are used to determine the input scaling, and the multipliers $a_0$, $d_1$, and $d_2$ to determine the zero locations of the filter. There are nine multipliers in total, as compared to the five that are required in the so-called direct mechanization. Barnes shows procedures whereby one may reduce this number of multipliers; however, we shall show how to eliminate them.

The vector-matrix equation that describes the configuration of Figure 7 is given below:

$$\begin{bmatrix} u_{1,n} \\ u_{2,n} \\ y_n \end{bmatrix} = \begin{bmatrix} b_1 & c_1 & a_1 \\ c_2 & b_2 & a_2 \\ d_1 & d_2 & a_0 \end{bmatrix} \begin{bmatrix} u_{1,n-1} \\ u_{2,n-1} \\ x_n \end{bmatrix} \tag{22}$$

The relationships between Equations 20 and 22 are

$$\begin{aligned}
A_0 &= a_0 \\
A_1 &= a_1 d_1 + a_2 d_2 - a_0 (b_1 + b_2) \\
A_2 &= a_0 (b_1 b_2 - c_1 c_2) + a_1 (c_2 d_2 - b_2 d_1) + a_2 (c_1 d_1 - b_1 d_2) \\
B_1 &= -(b_1 + b_2) \\
B_2 &= b_1 b_2 - c_1 c_2
\end{aligned} \tag{23}$$
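The correspondence between the normal form and the direct form can be checked numerically by comparing impulse responses. The sketch below is illustrative only. It assumes the state update u1 <- b1 u1 + c1 u2 + a1 x, u2 <- c2 u1 + b2 u2 + a2 x with output y = d1 u1 + d2 u2 + a0 x, and it takes B1 = -(b1 + b2) and B2 = b1 b2 - c1 c2, which is what the denominator of Equation 20 requires under that assumption. The function names and the multiplier values are hypothetical.

```python
def direct_from_normal(a0, a1, a2, b1, b2, c1, c2, d1, d2):
    """Map the nine normal-form multipliers to the five direct-form coefficients."""
    A0 = a0
    A1 = a1 * d1 + a2 * d2 - a0 * (b1 + b2)
    A2 = (a0 * (b1 * b2 - c1 * c2)
          + a1 * (c2 * d2 - b2 * d1)
          + a2 * (c1 * d1 - b1 * d2))
    B1 = -(b1 + b2)
    B2 = b1 * b2 - c1 * c2
    return A0, A1, A2, B1, B2


def run_normal_form(x, a0, a1, a2, b1, b2, c1, c2, d1, d2):
    """Vector-matrix (state-space) recursion with zero initial state."""
    u1 = u2 = 0.0
    y = []
    for xn in x:
        y.append(d1 * u1 + d2 * u2 + a0 * xn)
        u1, u2 = (b1 * u1 + c1 * u2 + a1 * xn,
                  c2 * u1 + b2 * u2 + a2 * xn)
    return y


def run_direct_form(x, A0, A1, A2, B1, B2):
    """Inner-product recursion: coefficient vector times data vector."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = A0 * xn + A1 * x1 + A2 * x2 - B1 * y1 - B2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


# Assumed multiplier values giving a stable filter (poles at 0.5 +/- 0.3j).
params = dict(a0=1.0, a1=0.5, a2=0.25, b1=0.5, b2=0.5,
              c1=0.3, c2=-0.3, d1=0.7, d2=-0.4)
impulse = [1.0] + [0.0] * 49
y_normal = run_normal_form(impulse, **params)
y_direct = run_direct_form(impulse, *direct_from_normal(**params))
max_diff = max(abs(p - q) for p, q in zip(y_normal, y_direct))
print(max_diff)
```

A negligible difference between the two impulse responses indicates that the nine-multiplier structure and the five-coefficient direct form describe the same transfer function.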
words. If the words that are stored are 16 bits long, then our three outputs together call for 3 × 16 = 48 bits per stored word. The total number of bits stored, however, is a modest 4 × 48 = 192. The somewhat detailed DA realization is illustrated in Figure 8. Notice how the addressing section has reduced to a pair of EXOR gates. The 4-word by 48-bit memory is shown for clarity as comprising three memories. In fact, the ROM may physically consist of three identically addressed 4-word by 16-bit memories or a single 4-word extended-length (48-bit) memory. Each 16-bit output segment drives a separate accumulator loop, each with its own initial-condition register, just as we encountered earlier. The add/subtract control line is common, and the outputs of two of the accumulator loops are converted to serial form in their parallel-to-serial registers to be fed back to the memory addressing gates.

In order to simplify our subsequent development, it will be useful to redraw Figure 8 as shown in Figure 9. We can see the essence and utter simplicity of the structure, which is somewhat startling when one realizes that this is a realization of the 9-multiplier configuration of Figure 7.

Figure 10 shows a factor-of-four speedup over the circuit of Figure 9 by using the data 4 bits at a time (4BAAT). The parallel-to-serial registers shown do not output a serial data stream, but rather provide a sequence of four 4-bit-wide segments. The time required to perform the filtering function has been reduced to 4 clock periods. This increase in speed demands that the ROM size be increased to $\tfrac{1}{2}(2^3)^4 = 2048$ words. One would like to be able to make the throughput rate equal to the clock rate, rather than just a quarter of that rate. By quadrupling the memory of the configuration of Figure 10 and complicating the 3 adders, we can create a very fast but memory-hungry (8K word by 48 bit) filter structure that can perform the filtering function in a single clock period, as shown in Figure 11.

An alternate approach, which is shown in Figure 12, may be used in which eight ROMs are addressed by the three data streams, 2 bits at a time (2BAAT). Each memory is now a more modest $\tfrac{1}{2}(2^3)^2 = 32$ words × 48 bits, for a total memory requirement of 8 × 32 × 48 = 12,288 bits. The three adders are each of a complexity less than that of a 16-bit modified-Booth multiplier. In this approach, we have reduced the memory size at the expense of complicating the adders.

Figure 10. Single memory 4 BAAT DA structure, four clock periods per filter function.

The outputs from the memories are 16 (or whatever, say N, number of) bits. In the shift-and-add process that occurs in the accumulators, the accuracy of the results degrades because of the least-significant bits that are lost through the trauma of quantization. Figure 13 illustrates the problem; as the data circulates through the accumulator loop, LSB's are lost at the shift stage. (These lost bits are often modeled as an additive error.) From the point in the filter that this error is introduced (the accumulator outputs), we follow them around the recursive loop of the filter and see the reinforcement addition that is the cause of noise gain within recursive filters. We will create a parallel error path, now, with error subtractions that will nearly cancel the error additions.

Figure 14 shows the state-space filter of Figure 7 with
Figure 12. Eight memory 2 BAAT DA structure, one clock period per filter function.
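The memory budgets quoted for the structures of Figures 9 through 12 can be tallied with a short calculation. This is a sketch under stated assumptions: three data streams, 48-bit stored words (three 16-bit outputs), and a ROM of one half of 2 raised to the number of address bits, the halving being the saving already seen in Table 2. Treating the single-clock structure of Figure 11 as four 2048-word memories is our reading of "quadrupling the memory"; the function names are hypothetical.

```python
WORD_BITS = 3 * 16          # three 16-bit outputs per stored word


def rom_words(num_inputs, bits_at_a_time):
    """Words per ROM: half of 2**(address bits), assuming the halved memory."""
    return 2 ** (num_inputs * bits_at_a_time) // 2


def total_bits(num_roms, words_per_rom, word_bits=WORD_BITS):
    """Total storage for a bank of identical ROMs."""
    return num_roms * words_per_rom * word_bits


# (description, number of ROMs, words per ROM)
structures = [
    ("1 BAAT, one memory (Figure 9)", 1, rom_words(3, 1)),
    ("4 BAAT, one memory (Figure 10)", 1, rom_words(3, 4)),
    ("one clock per output (Figure 11)", 4, rom_words(3, 4)),
    ("2 BAAT, eight memories (Figure 12)", 8, rom_words(3, 2)),
]

for name, roms, words in structures:
    print(f"{name}: {roms * words} words, {total_bits(roms, words)} bits")
```

The tally makes the tradeoff explicit: widening the address of a single memory grows the storage exponentially, whereas splitting the data among several small memories grows it only linearly at the cost of more complicated adders.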
of DA in the mechanization of a simple, direct, high-performance complex multiplier has been the reason for its success in FFT applications, and has spurred additional work in the development of efficient complex multipliers [42]. The development path for complex multipliers took an interesting twist with the advent of radix-3 and radix-6 FFT's [43,44]. Their charm is that they give larger-radix multiplier-free building blocks that lead to increased efficiency in transform processing. By the simple expedient of turning to non-orthogonal coordinate systems for complex arithmetic, the resulting FFT structures [45] gave birth to a new complex multiplier [46] to perform the reduced number of "twiddles" on the interstage coupling.

Image processing has turned to the discrete-cosine transform (DCT) for more efficient processing. There, again, DA has found a home [47,48].

Figure 15. DA mechanization of Figure 14.

NONLINEAR AND/OR NONSTATIONARY PROCESSING WITH DA

We have only considered the use of DA in linear, time-invariant systems. It is not so restricted. For variable coefficients, we may use RAM's rather than ROM's. In fact, one of the trailblazers in this approach was Schroder [16]. In 1981 Cowan and Mavor [49] described an 8-tap adaptive transversal filter that employed DA, and in 1983 an expanded version was published by Cowan, Smith, and Elliott [50]. Andrews [51] compared it favorably in a mechanization study involving traditional and nontraditional arithmetic mechanizations of adaptive filters. The capacity of the DA adaptive structure can be increased by using block-processing concepts [52,53].

The mechanization of nonlinear difference equations by DA was presented by Sicuranza in [54], who represented the filter by a truncated discrete Volterra series. Chiang et al. in [55] report the results of a mechanization trade-off study of various implementations of quadratic filters. They conclude that DA is as efficient as matrix decomposition implemented with systolic arrays, but that DA lacks the modularity. Satisfactory DA adaptive nonlinear filters have been investigated and reported by Sicuranza and Ramponi [56], and by Smith et al. [57].

CONCLUSIONS

DA is a very efficient means to mechanize computations that are dominated by inner products. As we have seen, the coefficients of the equations can be time varying, and the equations themselves can be nonlinear. When a great many computing methods are compared, DA has always fared well, not always (but often) best, and never poorly. As a consequence, whenever the performance/cost ratio is critical (especially in custom designs), DA should be seriously considered as a contender.

ACKNOWLEDGMENTS

I want to express my thanks to the many workers in the field who have quickly and generously shared their results with me; to my coworkers who, over the years, have patiently simulated, analyzed, built, and tested endless DA concepts, and corrected my errors; and to the reviewers who offered very helpful suggestions and led me to references that were new and unfamiliar to me.