Approximate Adder Generation for Image Processing Using Convolutional Neural Network

Conference Paper · November 2018
DOI: 10.1109/ISOCC.2018.8649928

Ryuta Ishida, Graduate School of ECS, Fukuoka University, td172002@cis.fukuoka-u.ac.jp
Toshinori Sato, Dept. of EECS, Fukuoka University, toshinori.sato@computer.org
Tomoaki Ukezono, Dept. of EECS, Fukuoka University, tukezo@fukuoka-u.ac.jp

Abstract— This paper proposes a design methodology for configurable approximate arithmetic circuits that takes into account the data processed by the target circuits. A prototype system based on a deep neural network is built to confirm the practicability of the methodology.

Keywords— approximate circuit; design methodology; CNN

I. INTRODUCTION
Approximate computing [3] is a promising technique for achieving low power, high performance, and a small footprint at the same time, and many approximate arithmetic circuits have been studied. Configurable approximate circuits [6,11,12] have recently attracted researchers' interest. Unfortunately, determining the parameters of their configurations is a difficult and tedious task.
Several previous studies generate approximate circuits. SALSA [13] modifies a given RTL description into its approximate version. Similarly, ABACUS [7] synthesizes approximate circuits from their behavioral descriptions. SABER [9] determines how many trailing bits of the operands to approximate through an analytical expression of the function. All of these studies consider only the circuit structure and leave the target data out of consideration. Hence, they cannot perform any optimization for specific data. In contrast, this paper studies a design methodology oriented to the processed data.

II. DATA-DIRECTED CIRCUIT GENERATION
Different data require different circuits, each optimized for several design constraints, even when their functionality is the same. Based on this observation, the data-directed design methodology shown in Fig. 1 is proposed. The approximate circuit generator considers its target data as well as design constraints and the user's requirements. It handles parameterized approximate circuits and determines their parameters automatically, which relieves designers of the tedious task of determining the parameters.

Figure 1. Data-Directed Circuit Generation (inputs: target data; design constraints such as power, delay, and area; user's requirements such as error rate, error distance, and PSNR. The data-directed circuit generator outputs an approximate circuit.)

III. A PROOF-OF-CONCEPT
The image sharpening circuit [5] is used to build a proof-of-concept prototype. First, it performs Gaussian smoothing:

$$R(i, j) = \frac{1}{273} \sum_{k=-2}^{2} \sum_{l=-2}^{2} G(k+2, l+2) \cdot I(i-k, j-l)$$

$$G = \begin{bmatrix} 1 & 4 & 7 & 4 & 1 \\ 4 & 16 & 26 & 16 & 4 \\ 7 & 26 & 41 & 26 & 7 \\ 4 & 16 & 26 & 16 & 4 \\ 1 & 4 & 7 & 4 & 1 \end{bmatrix}$$

where I(i, j) and R(i, j) are the pixels of the input and the smoothed images, respectively, and G is the Gaussian kernel, whose entries sum to 273. Second, it obtains the sharpened image S as S = 2I − R. Multiplications are replaced by additions and shift operations, and then only the additions are approximated. The Lower-part OR Adder (LOA) [6] and the Carry-Maskable Adder (CMA) [10,11] are chosen as the targets of circuit generation in this prototype.
The LOA consists of a precise adder in the leading bits and OR gates in the trailing bits, because the lower bits are less important to accuracy than the upper ones. Although it is very simple, it is moderately accurate and efficient in both power and area. The LOA has no dynamic configurability, so its configuration must be determined statically at design time. The CMA consists of multiple Carry-Maskable Full Adders (CMFAs) cascaded like a ripple carry adder (RCA): the carry output of each CMFA is connected to the carry input of the next CMFA in the chain. The CMFA is depicted in Fig. 2. The signal ¬mask configures the CMFA. When ¬mask is 0 (mask is 1), the CMFA works as an OR gate, and both Cout and Cin are fixed to 0. Otherwise (mask is 0), it works as a precise full adder (FA). Hence, if the CMFAs in the leading bits are configured as FAs and those in the trailing bits as OR gates, the CMA works similarly to the LOA. The number of OR gates in the LOA and the number of masked bits in the CMA can be determined in the same way. The notable difference is that the CMA has dynamic configurability.

Figure 2. CMFA (inputs x, y, cin, and mask; outputs s and cout)

Figure 3. AlexNet-based CNN (input: 256x256 image; convolutional layers with 96, 256, 384, 384, and 256 feature maps, interleaved with max pooling; fully-connected layers with 4096, 4096, and 9 outputs)

Figure 4. Prototype Tool (a 256x256 image and a PSNR requirement are input; the AlexNet-based CNN outputs the number of mask bits, from which the RTL generator produces the image sharpening circuit with LOA/CMA)
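To make the adder behavior and its effect on image quality concrete, here is a minimal behavioral sketch in Python. It is not the authors' RTL generator, and several assumptions are ours: `loa_add` follows the usual LOA description in [6], where an AND of the two lower-part MSBs supplies the carry into the precise upper part; `cma_add` chains the CMFA behavior described above, so with k masked cells it matches the LOA except for that carry; only the accumulations of the Gaussian smoothing go through the approximate adder (the prototype also approximates the additions that replace multiplications); the division by 273 is exact; and image borders are clamped. The function names and the synthetic test image are hypothetical.

```python
import math
from typing import Callable

Image = list[list[int]]


def loa_add(a: int, b: int, k: int) -> int:
    """LOA sketch: OR gates on the k trailing bits, exact adder on the rest.

    Assumption: as in the usual LOA description, the AND of the two
    lower-part MSBs is fed to the exact upper adder as its carry-in.
    """
    if k == 0:
        return a + b
    low = (a | b) & ((1 << k) - 1)
    carry = (a >> (k - 1)) & (b >> (k - 1)) & 1
    return (((a >> k) + (b >> k) + carry) << k) | low


def cmfa(x: int, y: int, cin: int, mask: bool) -> tuple[int, int]:
    """One carry-maskable full adder on single bits; returns (s, cout)."""
    if mask:  # ¬mask = 0: behaves as an OR gate and breaks the carry chain
        return x | y, 0
    s = x ^ y ^ cin  # mask = 0: precise full adder
    return s, (x & y) | (cin & (x ^ y))


def cma_add(a: int, b: int, k: int, width: int) -> int:
    """CMA sketch: `width` CMFAs rippled like an RCA, the k trailing cells masked."""
    result, carry = 0, 0
    for i in range(width):
        s, carry = cmfa((a >> i) & 1, (b >> i) & 1, carry, mask=i < k)
        result |= s << i
    return result | (carry << width)  # final carry-out becomes the top bit


# 5x5 Gaussian kernel G from Section III (entries sum to 273).
G = [[1, 4, 7, 4, 1],
     [4, 16, 26, 16, 4],
     [7, 26, 41, 26, 7],
     [4, 16, 26, 16, 4],
     [1, 4, 7, 4, 1]]


def sharpen(img: Image, add: Callable[[int, int], int]) -> Image:
    """S = 2I - R; only the smoothing accumulations go through `add` (simplification)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0
            for dk in range(-2, 3):
                for dl in range(-2, 3):
                    y = min(max(i - dk, 0), h - 1)  # clamp at borders (assumption)
                    x = min(max(j - dl, 0), w - 1)
                    acc = add(acc, G[dk + 2][dl + 2] * img[y][x])
            r = acc // 273  # exact division (assumption)
            out[i][j] = min(max(2 * img[i][j] - r, 0), 255)
    return out


def psnr(ref: Image, approx: Image, peak: int = 255) -> float:
    """PSNR = 10 log10(A_s^2 / MSE), with MSE averaged over the X*Y pixels."""
    n = len(ref) * len(ref[0])
    mse = sum((p - q) ** 2
              for row_r, row_a in zip(ref, approx)
              for p, q in zip(row_r, row_a)) / n
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    # Synthetic 32x32 test pattern (hypothetical; not from ILSVRC2012).
    img = [[(7 * x + 13 * y + (x * y) % 31) % 256 for x in range(32)] for y in range(32)]
    exact = sharpen(img, lambda a, b: a + b)
    for k in range(9):
        approx = sharpen(img, lambda a, b, k=k: loa_add(a, b, k))
        print(f"k={k}: PSNR = {psnr(exact, approx):.2f} dB")
```

Running the script prints one PSNR per bitwidth k for the synthetic pattern. Picking the largest k whose PSNR stays at or above 40 dB mirrors the requirement the prototype targets, although the paper does not state how its training labels were derived.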
A slight variation of AlexNet [4] is used to determine the bitwidth, that is, the number of OR gates in the LOA or the number of masked bits in the CMA. Only the eight trailing bits are candidates for this choice. As shown in Fig. 3, the network has five convolutional layers, three max pooling layers, and three fully-connected layers. It is implemented with TensorFlow [1] and Keras [14]. The input is a 256x256 grayscale bitmap image with 8-bit pixels, and the output is the number of mask bits, from 0 to 8. The training and test sets consist of 10,000 and 2,000 images, respectively, randomly selected from the ILSVRC2012 dataset [8]. The batch size and the number of epochs are 250 and 50, respectively.
The prototype design tool is depicted in Fig. 4. Representative images and the user's requirement, given as a PSNR, are entered into the tool, which mainly consists of AlexNet and an RTL generator. PSNR is defined as follows [2]:

$$\mathrm{PSNR} = 10 \log_{10} \frac{A_s^2}{\mathrm{MSE}}$$

$$\mathrm{MSE} = \frac{1}{X \cdot Y} \sum_{x=1}^{X} \sum_{y=1}^{Y} \left( s(x, y) - \tilde{s}(x, y) \right)^2$$

where $A_s$, $s(x, y)$, and $\tilde{s}(x, y)$ are the maximum, the precise, and the approximate pixel values, respectively, and X and Y are the image dimensions. Because the prototype is still immature, only a PSNR of 40 dB can currently be selected. AlexNet predicts the LSB width of the approximate adder, and given this bitwidth, the RTL generator provides the approximate adder that satisfies the PSNR requirement.
In the experiments, the LSB width is correctly determined for 98.75% of the 2,000 test images, and dedicated approximate circuits are generated for them. The power, delay, and area of the sharpening circuit are evaluated using Mentor Graphics ModelSim and Synopsys Design Compiler with the NanGate 45nm library. For Lena, which is in neither the training nor the test set, the optimal bitwidth of five is successfully selected for the LOA, improving power, delay, and area by 33.28%, 5.67%, and 21.86%, respectively. Fig. 5 shows the images processed by the precise adder and by the LOA, which are almost visually indistinguishable.

Figure 5. Sharpened Images: (a) precise adder, (b) LOA (5-bit ORs)

IV. CONCLUSIONS
This paper proposes a data-directed methodology for designing approximate arithmetic circuits. By exploiting the characteristics of the processed data, the parameters configuring the target circuit are determined. A proof-of-concept prototype is built, and the practicability of the methodology is confirmed.

ACKNOWLEDGMENT
This work was supported by JSPS KAKENHI (JP17K00088), by funds (No.175007 and No.177005) from the Central Research Institute of Fukuoka Univ., and by VDEC, the University of Tokyo in collaboration with Synopsys, Inc.

REFERENCES
[1] M. Abadi, et al., arXiv:1603.04467, 2015.
[2] D. Bull, “Communicating pictures,” Academic Press, 2014.
[3] J. Han, et al., doi:10.1109/ETS.2013.6569370, ETS, 2013.
[4] A. Krizhevsky, doi:10.1145/3065386, CACM, 2017.
[5] M. S. Lau, et al., doi:10.1145/1629395.1629434, CASES, 2009.
[6] H. R. Mahdiani, et al., doi:10.1109/TCSI.2009.2027626, TCAS I, 2010.
[7] K. Nepal, et al., doi:10.7873/DATE.2014.374, DATE, 2014.
[8] O. Russakovsky, et al., arXiv:1409.0575, 2014.
[9] D. Sengupta, et al., doi:10.1145/3061639.3062314, DAC, 2017.
[10] K. Tajima, et al., http://id.nii.ac.jp/1001/00185028/, 2018.
[11] T. Yang, et al., doi:10.1109/ISQED.2018.8357311, ISQED, 2018.
[12] T. Yang, et al., doi:10.1109/ASPDAC.2018.8297389, ASP-DAC, 2018.
[13] S. Venkataramani, doi:10.1145/2228360.2228504, DAC, 2012.
[14] https://github.com/keras-team/keras
