See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/310970269

Correction of VHDL 2008 fixed-point library

Technical Report · October 2016


DOI: 10.13140/RG.2.2.33860.42884

3 authors:

Manuel Carmona, University of Barcelona
David Roma, IEEC Institute of Space Studies of Catalonia
José María Gómez Cama, University of Barcelona

All content following this page was uploaded by José María Gómez Cama on 28 November 2016.


Correction of VHDL 2008 fixed-point library


Manuel Carmona, David Roma, and Jose M. Gomez

M. Carmona, D. Roma, and J. M. Gomez are with the Department of Electronic Engineering, Universitat de Barcelona, Barcelona, Spain. J. M. Gomez is also with the Institute of Cosmos Sciences, University of Barcelona (IEEC-UB), Barcelona, Spain.
Manuscript received October 19, 2016; revised November 20, 2016.

Abstract: The VHDL 2008 standard provides a fixed-point type and an associated library. This library contains a small bug that causes the same number to be rounded differently depending on its representation. The authors have found that a small error in the definition of the resize function produces this incorrect behavior. A test that exposes the undesired behavior is presented. Finally, a workaround is provided that solves the problem while the new library is being prepared.

I. INTRODUCTION

Fixed-point arithmetic is widely used in image and signal processing. It is simpler to implement in FPGAs or ASICs than its floating-point counterpart, and it provides equivalent precision at a lower cost in resources. For this reason, VHDL 2008 [1] has incorporated the fixed-point package. This package provides the arithmetic operators and functions necessary to work with these numbers.

The authors have been using this package during the last two years. During this time, they have detected some mismatches between the results obtained using fixed-point arithmetic and the expected ones, which were calculated using floating-point numbers.

In particular, the number -0.125 could be rounded to 0.0 or -1.0 depending on the way it was represented in fixed-point notation. For this reason, the code of the library was reviewed, and the error was traced to the resize function.

II. FIXED-POINT NOTATION

In VHDL 2008, the signed fixed-point type is defined in the following way:

    type sfixed is array (INTEGER range <>)
      of STD_ULOGIC;

As a result, an sfixed number is declared as sfixed (high downto low), where high is the index of the most significant bit and low is the index of the least significant one. The following code shows an example:

    signal x: sfixed (7 downto -4);

This declares a signed fixed-point number 12 bits wide (7 - (-4) + 1), with 4 bits after the binary point. The bits of the number can be indexed as $x_i$. The sign is coded using two's complement. The value represented by the sfixed number can be determined using the formula:

$$ y = -\left( x_{high} \cdot 2^{high} - \sum_{i=low}^{high-1} x_i \cdot 2^{i} \right) \tag{1} $$

where $x_{high}$ is the sign bit. If it is 0, the value represented is zero or positive. If not, the value is negative.

As a result, the number 6.5 is represented as:

$$ y = -\left( 0 - \left( 2^{2} + 2^{1} + 2^{-1} \right) \right) = 6.5 \tag{2} $$

or, expressed in binary:

    x <= "000001101000";

The value -0.125 can be represented as:

    x <= "111111111110";

whose value can be calculated using the formula:

$$ y = -\left( 2^{7} - \left( 2^{6} + 2^{5} + 2^{4} + 2^{3} + 2^{2} + 2^{1} + 2^{0} + 2^{-1} + 2^{-2} + 2^{-3} \right) \right) = -0.125 \tag{3} $$

The same number can also be represented in different ways depending on the range of bits used. The minimum number of bits needed to represent the value 6.5 with an sfixed is 5:

    signal x: sfixed (3 downto -1);
    x <= "01101";

while -0.125 only needs 2 bits:

$$ y = -\left( 2^{-2} - 2^{-3} \right) = -0.125 \tag{4} $$

so it can be written as:

    signal x: sfixed (-2 downto -3);
    x <= "11";

Although the representation differs when 12, 5, or 2 bits are used, we expect to get the same result when applying a mathematical function. This is not the case for the resize function included in the fixed-point package. This can be seen using the test described in the following section.

III. TEST

The error can be detected with some simple operations that use resize with rounding. If truncation is used, no error appears. The VHDL test code is:

    library IEEE;
    use IEEE.std_logic_1164.all;
    use IEEE.fixed_float_types.all;
    use IEEE.fixed_pkg.all;
    use IEEE.math_real.all;
    use std.textio.all;

    entity sfixed_resize is
    end entity sfixed_resize;

    architecture test of sfixed_resize is
      signal a: sfixed(-2 downto -3);  -- -0.125 in 2 bits
      signal b: sfixed(7 downto -4);   -- -0.125 in 12 bits
      signal c: sfixed(7 downto 0);    -- a resized to an 8-bit integer
      signal d: sfixed(7 downto 0);    -- b resized to an 8-bit integer
    begin
      a <= "11";
      b <= "111111111110";

      SFIXED_RESIZE_LOGIC: process is
        variable print: line;
      begin
        wait for 10 * 1 ns;
        c <= resize(a, 7, 0,
                    fixed_saturate, fixed_round);
        d <= resize(b, 7, 0,
                    fixed_saturate, fixed_round);
        wait for 10 * 1 ns;
        -- First line: the two inputs
        write(print, to_real(a));
        write(print, string'(" "));
        write(print, to_real(b));
        writeline(output, print);
        -- Second line: the two resized values
        write(print, to_real(c));
        write(print, string'(" "));
        write(print, to_real(d));
        writeline(output, print);
        wait;
      end process SFIXED_RESIZE_LOGIC;
    end architecture test;

The results are:

    -1.250000e-01 -1.250000e-01
    -1.000000e+00 0.000000e+00

The test converts each value to the sfixed equivalent of an integer number (the lowest bit index is 0). The rounding method in this case is equivalent to the "round to nearest, ties to even" mode described in IEEE 754-2008 [2].

Both a and b represent the same number (-1.25e-01, or -0.125). Both are converted to a signal with the same format (sfixed(7 downto 0)), which represents an 8-bit signed integer. However, the results differ: c is -1, while d is 0. Given the rounding applied, the value should be 0 in both cases.

IV. WORKAROUND

The solution would be to modify the package, but since it is a standard package, we prefer to write a workaround while the standard is modified. This can be done by analysing the code. The incorrect results appear when three conditions are met:

• The lower limit of the resulting fixed-point number is greater than the upper limit of the input one. For example, the lower limit for signals c and d is 0. The upper limit of a is -2, so the condition is met. The upper limit of b is 7, so it is not met.
• The lower limit of the resulting number is different from the upper limit of the input number plus one. For a, the upper limit plus one is -1, so the condition is met again. For b, it is 8, which also differs from 0, so this condition holds as well; b is excluded by the first condition.
• The rounding style is fixed_round, which is the case for both signals.

When all three conditions are met (signal a), the resize function fails. In any other case (signal b), the results are correct.

A new function, my_resize, has been written that returns 0 when the three conditions are met. In any other case, it calls the original resize:

    function my_resize (
      arg                     : UNRESOLVED_sfixed;
      constant left_index     : INTEGER;  -- integer portion
      constant right_index    : INTEGER;  -- size of fraction
      constant overflow_style : fixed_overflow_style_type :=
        fixed_overflow_style;
      constant round_style    : fixed_round_style_type :=
        fixed_round_style)
      return UNRESOLVED_sfixed
    is
      constant arghigh : INTEGER := arg'high;
      constant arglow  : INTEGER := arg'low;
      variable result  :
        UNRESOLVED_sfixed(left_index downto right_index) :=
        (others => '0');
    begin  -- resize
      if (right_index > arghigh) and
         (round_style = fixed_round) and
         (right_index /= arghigh+1) then
        -- All three conditions hold: return zero
        result := (others => '0');
      else
        result := resize(arg, left_index,
                         right_index,
                         overflow_style,
                         round_style);
      end if;
      return result;
    end function my_resize;

The function can be tested by replacing resize with my_resize in the test. The results in this case are:

    -1.250000e-01 -1.250000e-01
    0.000000e+00 0.000000e+00

As expected, the results are independent of the representation. The condition has been obtained from the package body source code. This implies that this case can be easily corrected.
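
As an illustration, the check described above can be made self-verifying by replacing the printed values with assertions. The following is only a minimal sketch and not part of the original test: the entity name my_resize_check, the process label and the report messages are hypothetical, and the default arguments of my_resize are written explicitly as fixed_saturate and fixed_round (an assumption made here so that the function can be declared outside the package without referring to the package generics).

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.fixed_float_types.all;
use IEEE.fixed_pkg.all;

-- Hypothetical self-checking testbench for the my_resize workaround.
entity my_resize_check is
end entity my_resize_check;

architecture test of my_resize_check is
  -- Workaround function from this section. The explicit defaults
  -- (fixed_saturate, fixed_round) are an assumption of this sketch.
  function my_resize (
    arg                     : UNRESOLVED_sfixed;
    constant left_index     : INTEGER;
    constant right_index    : INTEGER;
    constant overflow_style : fixed_overflow_style_type := fixed_saturate;
    constant round_style    : fixed_round_style_type    := fixed_round)
    return UNRESOLVED_sfixed
  is
    variable result : UNRESOLVED_sfixed(left_index downto right_index) :=
      (others => '0');
  begin
    if (right_index > arg'high) and
       (round_style = fixed_round) and
       (right_index /= arg'high + 1) then
      -- The three conditions hold: return zero.
      result := (others => '0');
    else
      result := resize(arg, left_index, right_index,
                       overflow_style, round_style);
    end if;
    return result;
  end function my_resize;
begin
  CHECK: process is
    -- The same value, -0.125, in two representations.
    constant a : sfixed(-2 downto -3) := "11";
    constant b : sfixed(7 downto -4)  := "111111111110";
    variable c, d : sfixed(7 downto 0);
  begin
    c := my_resize(a, 7, 0);
    d := my_resize(b, 7, 0);
    assert to_real(a) = to_real(b)
      report "a and b should encode the same value" severity failure;
    assert c = d
      report "rounded result depends on the representation" severity failure;
    assert to_real(c) = 0.0
      report "-0.125 should round to 0" severity failure;
    report "my_resize check finished";
    wait;
  end process CHECK;
end architecture test;
```

With the behaviour reported in this section, the simulation should finish without any assertion failure. Calling resize instead of my_resize in the first assignment would be expected to trigger the second assertion, since c would then be -1 while d is 0.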

V. CONCLUSIONS
A bug has been detected in the resize function of the
VHDL-2008 Fixed point package. It gives different results
for the same number depending on its original representation.
A workaround function is presented that solves the problem
while the package is updated in the different VHDL compilers
and synthesizers.

ACKNOWLEDGMENT
This work has been funded by the Spanish MINECO
through project ESP2015-66494-R, including a percentage
from European FEDER funds.

REFERENCES
[1] IEEE Std 1076-2008 - IEEE Standard VHDL Language Reference Manual. IEEE, 2008.
[2] IEEE Std 754-2008 - IEEE Standard for Floating-Point Arithmetic. IEEE, 2008.
