Professional Documents
Culture Documents
1
0. .25
0. 156 5
0. 39 5
0. 09 3
6
1
5
9. 5E 5
6
3 8 07
96 7
4 8
8
9
0
0
1
9. E 1
5E 2
3
implement in hardware [7]. The problem with the
0 2
00 2
0 0 06
0 0 76
10 4
52 0
81 -0
53 -0
5. E-0
1. E-0
72 -0
31 0
32 -1
82 -1
45 1
63 -1
09 -1
-1
0. 6
6. 024
1. 4E-
2. 7E-
9. 5E-
1. E-
0
3. 6E
3. 9E
2. 3E
5. E
3. E
0
8
1
5
8
fixed-point format is that precision falls off rapid-
ly to zero. If we allow the precision to fall off at a
Peak signal amplitude
slower rate, can we simultaneously achieve good
noise performance and a wider dynamic range? (a)
To accomplish this goal, we re-examine Table
1. From this table we discover a mechanism for 120
recasting the 16-bit fixed-point format into a s15e0 (fractional)
s11e4
Peak signal / peak roundoff error (dB)
0. 15 5
0. 39 5
0. 09 3
6. 024 6
1. 4E 1
3. 6E 5
9. 5E 5
2. 7E 6
4 7
96 7
3. 9E 8
9. 5E 8
2. 3E 9
5. 8E 0
1. 1E 0
3. 5E 1
9. 8E 1
5E 2
3
where S represents the number’s sign, M repre-
1
0 2
00 62
00 06
00 76
10 4
52 -0
81 -0
53 -0
38 -0
5. E-0
1. E-0
72 -0
31 -0
32 -1
82 -1
45 -1
63 -1
09 -1
-1
0. 6
0
0
0. 156 5
0. 39 5
0. 09 3
6
1
5
9. 5E 5
2. 7E 6
4 7
96 7
4 8
8
9
0
0
1
9. E 1
5E 2
3
6. 024
1. 4E-
9. 5E-
1. E-
0
3. 6E
3. 9E
2. E
5. E
3. E
0
3
8
1
5
8
Format
The main reasons for Numeric range S = sign bit Significant binary digits
introducing a new X = either 1 or 0
fractional format are
1.0–0.5 S1XXXXXXXXXXXXXX 15
that this format has
a path into a C data 0.5–0.25 S01XXXXXXXXXXXXX 14
type with the
0.25–0.125 S001XXXXXXXXXXXX 13
embedded-C
standard (ISO, 0.125–0.0625 S0001XXXXXXXXXXX 12
2003a), and it has a
0.0625–0.03125 S00001XXXXXXXXXX 11
proven record
for use in 0.03125–0.015625 S000001XXXXXXXXX 10
0.00390625–0.001953125 S000000001XXXXXX 7
0.001953125–0.0009765625 S0000000001XXXXX 6
0.00048828125–0.000244140625 S000000000001XXX 4
0.000244140625–0.0001220703125 S0000000000001XX 3
0.0001220703125–0.00006103515625 S00000000000001X 2
0.000030517578125 S000000000000001 1
0 S000000000000000 0
floating-point formats, the mantissa length and ered within the numeric range of the number
numeric precision are variable for formats format (e.g., 1.0 to 2–15 for fixed-point fraction-
derived from this class. This is especially impor- al). The dynamic range of the format is the ratio
tant for DSP formats. of the upper and of the lower limit (multiplied
Combining exponents: Individual exponents by two if rounding is employed).
can be combined to produce exponents of higher We should point out that this class of floating-
precision. For example, two 12-bit exponents point formats is mainly applicable to problems
could be combined to form a 13-bit exponent. that can benefit from an uneven distribution of
One 12-bit exponent could be combined with an precision. However, with the common use of
exponent from each of the other precisions (11, AGC mechanisms, DSP applications fall into this
10, 9…1) to also create a 13-bit exponent. category of problems. For applications that
Dual exponent mapping: A number’s actual require an even distribution of precision across
value for this class of formats is given by the the numeric range, standard floating-point for-
equation: value = (–1)S × 1.M × 2 f(EL,ER). The mats are more appropriately used.
exponent is essentially determined by the num- This class of floating-point formats could eas-
ber of contiguous zeros on each end of the man- ily be developed much further. But we don’t
tissa. The mantissa itself consists of the bits in require further development to derive from it
between the left-most and right-most ones data formats that are optimized for DSP. To
(excluding the sign bit). These two ones can be derive a specific format from this class, we mere-
used as separators because only zeroes are ly define the exponent function f(EL,ER) for the
allowed in the exponent fields. Each individual individual derived format. With this class of for-
binary exponent is derived by examining both mats as our primary tool, we now derive a data
the left-edge and right-edge exponents (EL and format optimized for 16-bit DSP applications.
ER). A mechanism is then used (possibly, a table
lookup) to map the two edge exponents into a
single binary exponent f(EL,ER).
A NEW FRACTIONAL FORMAT
Optimal sequences: Exponents can be The authors have previously derived several data
ordered in any sequence, with the caveat that formats optimized for 16-bit DSP applications
each binary range (e.g., 0.5 to 0.25) must be cov- from this class of floating-point formats [8]. In this
Format
S = sign bit This format provides
Numeric range M = mantissa Significant binary digits almost double the
0 = exponent
1 = separator dynamic range of
standard fixed point
1.0–0.5 SMMMMMMMMMMMMMM1 15 at the cost of
0.5–0.25 SMMMMMMMMMMMMM10 14 decreased precision
at the very top of
0.25–0.125 SMMMMMMMMMMMM100 13
the numeric range
0.125–0.0625 SMMMMMMMMMMM1000 12 (1.0–0.5). It also
provides improved
0.0625–0.03125 SMMMMMMMMMM10000 11
precision over fixed
0.03125–0.015625 SMMMMMMMMM100000 10 point throughout the
remainder of the
0.015625–0.0078125 SMMMMMMMM1000000 9
numeric range.
0.0078125–0.00390625 SMMMMMMM10000000 8
0.00390625–0.001953125 SMMMMMM100000000 7
0.001953125–0.0009765625 SMMMMM1000000000 6
0.00048828125–0.000244140625 SMMM100000000000 4
0.000244140625–0.0001220703125 SMM1000000000000 3
0.0001220703125–0.00006103515625 SM10000000000000 2
0.000030517578125 s100000000000000 1
0 s000000000000000 0
Table 2. Fractional fixed point data format recast with trailing zeros.
article we examine only one new data format, a ranges (A1 through M13 in Table 4) is combined
new fractional format. The main reasons for intro- to form a single 15-bit magnitude. This procedure
ducing a new fractional format are that this format is repeated to form a single 14-bit magnitude and
has a path into a C data type with the embedded-C on down the line. The result is that the binary
standard (ISO, 2003a), and it has a proven record fixed-point format can be considered a single
for use in signal-processing applications. If we can instance of the new class of floating-point formats
develop a fractional format with improved dynamic we developed in Table 3. To develop a new frac-
range that doesn’t sacrifice noise performance, we tional format, we follow the same approach, but
will have accomplished something worthwhile. skip the creation of the first 15-bit magnitude.
Any format optimized for fractional DSP This leaves us with two terms of each exponent
should take a clue from the noise performance size, and we order them in decreasing precision as
of fixed point and have the largest mantissas at shown in Table 5 and illustrated in Fig. 1c.
the top of the range. The mantissa should then This format provides almost double the
fall off gradually to the smallest mantissa. Such a dynamic range of standard fixed point at the cost
format derived from our new class is shown in of decreased precision at the very top of the
Table 4 and illustrated in Fig. 1b. This format numeric range (1.0–0.5). It also provides
provides a significant dynamic-range improve- improved precision over fixed point throughout
ment over the standard fixed-point format and the remainder of the numeric range. As we dis-
the prevailing 16-bit floating-point format (s10e5 cover, the increased dynamic range and addi-
or binary16). It should also provide significantly tional overall precision actually improve the
improved noise performance over the binary16 round-off noise performance of this format when
floating-point format. Its only weakness is that a compared to 16-bit fixed point.
single bit of precision was sacrificed in the top
two ranges when compared to fixed point. For a
fractional format, this is significant, and we can
SIMULATION AND RESULTS
do better by sacrificing some dynamic range. A simulation was performed to validate
By examining the binary fixed-point format, improved noise performance for the second new
we learn that a single term from each of the fractional format. Here we provide a summary
Format
S = sign bit Having derived a
M = mantissa data format that
Numeric range 0 = exponent N F
1 = separator actually improves
Red = actual digit numeric perfor-
Black = implied digit
mance for 16-bit
1.0–0.5 S0.1MMMMMMMMMMMMM1 14 A1 DSP applications,
we are left with
0.5–0.25 S0.01MMMMMMMMMMMM1 13 B2
several implementa-
0.25–0.125 S0.001MMMMMMMMMMMM10 13 B1 tion issues that must
be addressed before
0.125–0.0625 S0.0001MMMMMMMMMMM10 12 C2
the format is
0.0625–0.03125 S0.00001MMMMMMMMMMM100 12 C1 truly useful.
0.03125–0.015625 S0.000001MMMMMMMMMMM1 12 C3
0.015625–0.0078125 S0.0000001MMMMMMMMMM1 11 D4
0.0078125–0.00390625 S0.00000001MMMMMMMMMM1000 11 D1
0.00390625–0.001953125 S0.000000001MMMMMMMMMM100 11 D2
0.001953125–0.0009765625 S0.0000000001MMMMMMMMMM10 11 D3
… … …
0.0 0000000000000000 0 O1
other words, if there are five ones or five increased in size (16-bits) with this approach, the
zeros at the beginning of this field, you computational element (multiplier, arithmetic
replace them with ten ones or ten zeros. If the logic unit [ALU], accumulator) may now have
shift bit is a zero, then you remove one of the increased in size from 16 to 32 bits (at least for
leading sign bits at the front of the field. This addition). This format should prove easy to
number is then left justified into a 32-bit two’s implement in hardware but may result in a drop
complement format, and the least significant in throughput and an increase in power con-
bits are all set to zero. You now have expand- sumption due to the use of a 32-bit arithmetic
ed your 16-bit number into a 32-bit number unit together with encoding and decoding logic.
with maximum precision for the very largest Though this approach may slightly reduce the
fractions. throughput of a 16-bit DSP device, it should not
The data can now be operated upon with a significantly affect the size or cost of the device
traditional 32-bit two’s complement arithmetic because these attributes are driven primarily by
unit. However, we should point out that a 15-bit the size of the main data and address buses. The
DSP multiplier may be more appropriate use of a smaller 15 × 15-bit multiplier may also
because there are not more than 15 bits of preci- help with the size issue.
sion in either multiplicand. The resulting prod- Some might argue that a hardware imple-
uct can be shifted into a 32-bit format after the mentation of this new format would not be
15 × 15 bit multiplication is complete. worthwhile because it does not directly improve
Encoding from a 32-bit (or larger) two’s com- important hardware performance parameters
plement word to a 16-bit new fractional format such as circuit complexity, delay, and power con-
is accomplished by following the reverse opera- sumption. This point is granted. However, imple-
tion. The number of leading zeros is halved menting this new format in hardware directly
(rounding up). The shift bit is then set to zero if improves algorithm performance for many DSP
the original number of leading sign bits is odd. applications. Obviously, not all DSP applications
Once the number of leading sign bits has been require additional dynamic range or improved
halved and the shift bit set, then rounding must noise performance, but other DSP applications
be performed to compress the value into the go to great lengths algorithmically to obtain
resulting 15-bit two’s complement format. additional performance in these two categories.
Although the number format itself has not For these types of systems, improved algorithm
0.25–0.125 S0.001MMMMMMMMMMMM1 13 B2
0.0625–0.03125 S0.00001MMMMMMMMMMM1 12 C3
0.015625–0.0078125 S0.0000001MMMMMMMMMM1 11 D4
0.00390625–0.001953125 S0.000000001MMMMMMMMM1 10 E5
… … …
0.0 0000000000000000 0 O1
Note: Among the 16-bit formats blue indicates the best performer and red the second best performer.
BIOGRAPHIES
REFERENCES HOSSEIN SAIEDIAN [SM] (saiedian@eecs.ku.edu) received his
[1] ISO/IEC JTC1/SC22 WG14/N854, “DSP-C: An Extension to Ph.D. from Kansas State University in 1989. He is currently
IOS/IEC IS 9899:1990,” 1998; http://www.open- a professor and an associate chair in the Department of
std.org/jtc1/sc22/wg14/www/docs/n854.pdf (accessed Electrical Engineering and Computer Science at the Univer-
Nov. 13, 2006). sity of Kansas (KU) and a member of the KU Information
[2] A. V. Oppenheim and R. W. Shafer, Discrete-Time Signal and Telecommunication Technology Center (ITTC). His pri-
Processing, 2nd ed., Prentice-Hall, 1999. mary area of research is software engineering and informa-
[3] ISO/IEC JTC1/SC22 WG14/N1021, “Extensions for the tion security. He has over 100 publications on a variety of
Programming Language C to Support Embedded Pro- topics in software engineering and computer science.
cessors: ISO/IEC DTR 18037,” 2003; http://www.open-
std.org/jtc1/sc22/wg14/www/docs/n1021.pdf (accessed MANUEL RICHEY (manuel.richey@honeywell.com) graduated
Nov. 13, 2006). from the University of California at Davis in 1983 with a
[4] IEEE Std P754 Draft 1.2.5, “Draft Standard for Floating- Bachelor’s degree in electrical engineering. In 2007 he
Point Arithmetic”; math.berkeley.edu/~scanon/754 earned his Master’s degree in computer science from the
(accessed Nov. 13, 2006). University of Kansas. He is a principal engineer at Honey-
[5] ISO, “Rationale for International Standard-Programming well International Inc. where he has worked since 1985. At
Languages — C,” revision 5.1, 2003; http://www.open- Honeywell, he has worked primarily in the fields of digital
std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf signal processing and software radio. He holds seven U.S.
(accessed Oct. 18, 2006). patents and has four additional patents pending.