1. Beating Floating Point at its Own Game: Posit Arithmetic
A new data type called a posit is designed as a direct drop-in replacement for IEEE Standard 754 floating-point numbers (floats). Unlike earlier forms of universal number (unum) arithmetic, posits do not require interval arithmetic or variable-size operands; like floats, they round if an answer is inexact. However, they provide compelling advantages over floats, including larger dynamic range, higher accuracy, better closure, bitwise identical results across systems, simpler hardware, and simpler exception handling. Posits never overflow to infinity or underflow to zero, and “Not-a-Number” (NaN) indicates an action instead of a bit pattern. A posit processing unit takes less circuitry than an IEEE float FPU. With lower power use and a smaller silicon footprint, the posit operations per second (POPS) supported by a chip can be significantly higher than the FLOPS using similar hardware resources. GPU accelerators and deep learning processors, in particular, can do more per watt and per dollar with posits, yet deliver superior answer quality.

2. Posits as an alternative to floats for weather and climate models

Posit numbers, a recently proposed alternative to floating-point numbers, claim to have smaller arithmetic rounding errors in many applications. By studying weather and climate models of low and medium complexity (the Lorenz system and a shallow water model), we present the benefits of posits compared to floats at 16 bits. As a standardised posit processor does not exist yet, we emulate posit arithmetic on a conventional CPU. Using a shallow water model, forecasts based on 16-bit posits with 1 or 2 exponent bits are clearly more accurate than those based on half-precision floats. We therefore propose 16 bits with 2 exponent bits as a standard posit format, as its wide dynamic range of 32 orders of magnitude offers great potential for many weather and climate models. Although the focus is on geophysical fluid simulations, the results are also meaningful and promising for reduced-precision posit arithmetic in the wider field of computational fluid dynamics.

3. Efficient posit multiply-accumulate unit generator for deep learning applications

The recently proposed posit number system is more accurate and can provide a wider dynamic range than conventional IEEE 754-2008 floating-point numbers. Its nonuniform data representation makes it suitable for deep learning applications. Posit adders and multipliers have been well developed recently in the literature. However, the use of posits in fused arithmetic units has not yet been investigated. To facilitate the use of the posit number format in deep learning applications, this paper proposes an efficient architecture for a posit multiply-accumulate (MAC) unit. Unlike IEEE 754-2008, which defines four standard binary formats, the posit format is more flexible: the total bitwidth and exponent bitwidth can be any number. The bitwidths of all datapaths in the proposed design are therefore parameterized, and a posit MAC unit generator written in C is presented. The generator can produce Verilog HDL code for a posit MAC unit with any given total bitwidth and exponent bitwidth. The generated code is a combinational design; however, a 5-stage pipeline strategy is also presented and analyzed. The worst-case delay, area, and power consumption of the generated MAC unit under an STM 28 nm library, for different bitwidth choices, are provided and analyzed.
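To make the number format behind these abstracts concrete, the following minimal Python sketch (an illustration, not code from any of the papers above) decodes an n-bit posit with es exponent bits into its sign, regime, exponent, and fraction fields, and evaluates the extremes of the (16, 2) format proposed in item 2. The function name and layout are our own; only the encoding rules follow the published posit definition.

    def decode_posit(bits, n=16, es=2):
        """Decode an unsigned integer holding an n-bit posit into a Python float."""
        mask = (1 << n) - 1
        bits &= mask
        if bits == 0:
            return 0.0
        if bits == 1 << (n - 1):            # the pattern 100...0 is the exception value
            return float("nan")
        sign = bits >> (n - 1)
        if sign:                            # negative posits decode from the two's
            bits = (-bits) & mask           # complement of the bit pattern
        body = format(bits & ((1 << (n - 1)) - 1), "0{}b".format(n - 1))
        run_bit = body[0]                   # regime: run of identical bits after the sign
        run = len(body) - len(body.lstrip(run_bit))
        k = run - 1 if run_bit == "1" else -run
        rest = body[run + 1:]               # skip the regime and its terminating bit
        exp_bits = rest[:es]                # exponent bits; bits cut off by the word end are taken as zero
        e = (int(exp_bits, 2) << (es - len(exp_bits))) if exp_bits else 0
        frac_bits = rest[es:]               # fraction with an implied leading 1
        f = 1.0 + (int(frac_bits, 2) / (1 << len(frac_bits)) if frac_bits else 0.0)
        useed = 1 << (1 << es)              # useed = 2^(2^es) = 16 when es = 2
        value = (useed ** k) * (2 ** e) * f
        return -value if sign else value

    print(decode_posit(0x7FFF))             # maxpos of (16, 2): 16**14 ≈ 7.2e16
    print(decode_posit(0x0001))             # minpos of (16, 2): 16**-14 ≈ 1.4e-17

The two prints show maxpos = useed^(n-2) = 16^14 ≈ 7.2e16 and minpos = 1/maxpos ≈ 1.4e-17 for the (16, 2) format, which is the wide dynamic range referred to in item 2.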
4. Performance-efficiency trade-off of low-precision numerical formats in deep neural networks

Deep neural networks (DNNs) have been demonstrated as effective prognostic models across various domains, e.g. natural language processing, computer vision, and genomics. However, modern-day DNNs demand large amounts of compute and memory to execute any reasonably complex task. To reduce the inference time and power consumption of these networks, DNN accelerators with low-precision representations of data and DNN parameters are being actively studied. An interesting research question is how low-precision networks can be ported to edge devices with performance similar to that of high-precision networks. In this work, we employ the fixed-point, floating-point, and posit numerical formats at ≤8-bit precision within a DNN accelerator, Deep Positron, with exact multiply-and-accumulate (EMAC) units for inference. A unified analysis quantifies the trade-offs between overall network efficiency and performance across five classification tasks. Our results indicate that posits are a natural fit for DNN inference, outperforming the other formats at ≤8-bit precision, and can be realized with resource requirements competitive with those of floating point.
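As a rough illustration of the exact multiply-and-accumulate (EMAC) idea mentioned above, the short Python sketch below quantizes operands to an assumed 8-bit Q2.6 fixed-point layout, accumulates the exact integer products in a wide accumulator, and rounds only once at the end. It is a simplification under our own assumptions, not the Deep Positron design, and it uses fixed-point operands for brevity; a posit EMAC would first decode each posit into a fixed-point significand and scale.

    FRAC_BITS = 6                     # assumed Q2.6 layout for the 8-bit operands

    def quantize_q2_6(x):
        """Round x to the nearest representable 8-bit Q2.6 fixed-point value."""
        q = round(x * (1 << FRAC_BITS))
        return max(-128, min(127, q)) # saturate to the signed 8-bit range

    def exact_mac(weights, activations):
        """Dot product with a single final rounding, mimicking an EMAC datapath."""
        acc = 0                       # wide accumulator (exact as a Python int)
        for w, a in zip(weights, activations):
            acc += quantize_q2_6(w) * quantize_q2_6(a)   # exact integer product
        return acc / float(1 << (2 * FRAC_BITS))         # convert back once

    print(exact_mac([0.5, -1.25, 0.03], [1.0, 0.75, 2.0]))  # ≈ -0.3755 (exact real result: -0.3775)

Because the products are accumulated exactly, the only errors are the initial operand quantization and the single final conversion, which is the property the accelerator's EMAC units exploit at ≤8-bit precision.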