units in the convolutional layer as well as the fully connected layers. Since CNNs are tolerant to small errors due to the nature of convolutional filters, the approximate arithmetic operations incur little or no noticeable loss in the accuracy of the CNN, which we demonstrate in our test results. For the approximate MAC unit, we use the Dynamic Range Unbiased Multiplier (DRUM) approximate multiplier and the Approximate Adder with OR operations on LSBs (AOL), which can substantially reduce the chip area and power consumption. The configuration of the approximate MAC units within each layer affects the overall accuracy of the CNN. We implemented various configurations of approximate MACs on an FPGA and evaluated the accuracy using an extended MNIST dataset. Our implementation and evaluation with selected approximate MACs demonstrate that the proposed CNN accelerator reduces the area of the CNN by 15% at the cost of a small accuracy loss of only 0.982% compared to the reference CNN.

Keywords: Approximate Arithmetic, Convolutional Neural Network (CNN), Hardware Accelerators, Approximate MACs.

Google TPUs, FPGAs [1], and ASICs accelerate CNN algorithms efficiently and are competitive alternatives to conventional computing systems, as they can overcome the compute-bound issue in CPUs or GPUs. However, these systems still suffer from heavy data traffic to external memory, because accessing external memory consumes a significant amount of energy per read/write operation and incurs long access times. To reduce this data-movement overhead, alternative approaches are to 1) modify the memory to allow a wider data bus, and 2) use multiple distributed memories. Parallel access allows several data items to be processed in one clock cycle, increasing the operating speed and improving the hardware resource utilization.

We take advantage of the useful property that CNNs are intrinsically error tolerant. Due to this property, using approximate multipliers and adders has little impact on the accuracy, while significantly reducing the implementation cost and the power consumption [2].
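To make the approximate arithmetic concrete, the following is a minimal software sketch of the two unit types named above: a DRUM-style multiplier, which reduces each operand to a k-bit segment starting at its leading one (forcing the kept LSB to 1 for unbiased rounding), and an AOL-style adder, which replaces the carry chain on the n least significant bits with a bitwise OR. The parameter values (k = 6, n = 4) and function names are illustrative choices for this sketch, not the configuration used in the paper's hardware.

```python
def drum_approx(x: int, k: int = 6) -> int:
    """Reduce an unsigned operand DRUM-style: keep the k bits starting
    at the leading one, force the lowest kept bit to 1 (unbiased
    rounding), and zero everything below the segment."""
    if x < (1 << k):           # values that fit in k bits stay exact
        return x
    msb = x.bit_length() - 1   # position of the leading one
    shift = msb - k + 1        # number of discarded low-order bits
    seg = (x >> shift) | 1     # k-bit segment with LSB forced to 1
    return seg << shift

def drum_mult(a: int, b: int, k: int = 6) -> int:
    """DRUM-style approximate multiply: multiply the reduced operands
    (in hardware, a small k-by-k multiplier plus a shifter)."""
    return drum_approx(a, k) * drum_approx(b, k)

def aol_add(a: int, b: int, n: int = 4, width: int = 16) -> int:
    """AOL-style approximate add: the n LSBs are combined with a
    carry-free bitwise OR; the remaining upper bits use an exact add."""
    mask = (1 << n) - 1
    low = (a & mask) | (b & mask)       # OR replaces the LSB carry chain
    high = ((a >> n) + (b >> n)) << n   # exact addition on the MSBs
    return (high | low) & ((1 << width) - 1)
```

For example, `drum_mult(5, 7)` is exact (both operands fit in 6 bits), while `aol_add(3, 5)` returns 7 instead of 8 because the OR of the low bits drops the carry; such small errors are what the convolutional filters absorb.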
Authorized licensed use limited to: University of Cape Town. Downloaded on May 19,2021 at 13:48:51 UTC from IEEE Xplore. Restrictions apply.
II. OUR REGULAR CNN IMPLEMENTATION