Design Space Exploration of Convolution Algorithms To Accelerate CNNs On FPGA
Department of Electrical Engineering
Indian Institute of Technology Patna, India
debdeep.ee15@iitp.ac.in
Abstract—Deep Neural Networks (DNN) are promising solutions for various artificial intelligence tasks. The Convolutional Neural Network (CNN) is a variant of DNN which is widely used in computer vision tasks like image and face recognition, autonomous vehicles, games, video surveillance and various medical applications. CNNs are both compute and memory bound, and convolutional layers are the most computationally complex operations in a CNN. Owing to the computation demanded by the convolutions of CNNs, FPGAs are found to be suitable for accelerating CNNs. In this paper we carry out a design space exploration of various algorithms for performing the operations in the different convolutional layers of CNNs. An analysis has been done to select an appropriate algorithm for the various convolution layers of the AlexNet CNN model, based on the kernel size and the input feature map. The first convolution layer of AlexNet, with three channels of 227×227 feature size and 96 channels of 11×11 kernels, has been implemented on a Xilinx Virtex-7 FPGA.

Index Terms—Convolutional Neural Network, Deep learning, FFT, FPGA, Winograd minimal filtering

I. INTRODUCTION

In recent years, deep learning has gained wide popularity in performing various machine learning tasks. Among deep learning techniques, the Deep Neural Network (DNN) has the capability of learning high-level features with high performance. A common form of DNN is the Convolutional Neural Network (CNN), which consists of a number of convolutional layers. CNNs find applications in speech processing, natural language processing, image classification, face recognition, cancer detection, weather forecasting and many other consumer electronic devices. In recent years, CNNs have been accelerated using Graphics Processing Units (GPU), Application Specific Integrated Circuits (ASIC) and Field Programmable Gate Arrays (FPGA). GPUs are widely used for CNN tasks, but their power consumption is very high and hence unsuitable for embedded system applications [13]. Various FPGA based implementations [6]–[10], [13], [14], [16]–[18] and ASIC implementations [1], [2] of CNNs are available from various research groups. ASIC based accelerators give high performance but with limited flexibility, whereas FPGA based implementations show acceptable power consumption and performance.

Convolution layers account for more than 90% of the total computations in a CNN. As a CNN model goes deeper, the number of layers increases and so does the computational complexity. Therefore, in this paper we restrict our focus to these layers. Various techniques for performing convolution operations are available in the research literature. Three major methods for computing convolutions are discussed in this paper: the conventional method, Fast Fourier Transform (FFT) based convolution, and the Winograd minimal filtering method. Depending on the input feature map size and the kernel size, the choice of convolution algorithm varies across the convolutional layers of a CNN model. In this paper we perform a design space exploration of the various algorithms used for performing convolutions in CNNs.

The rest of the paper is organized as follows. Section II gives the background and related work. Section III discusses FFT based convolution and Winograd minimal filtering for computing convolution operations. Section IV gives the design space exploration of convolution techniques. Implementation and evaluation results are described in Section V. Section VI concludes the paper.

II. BACKGROUND AND RELATED WORK

CNNs are composed of different layers: convolution (CONV) layers, pooling layers, normalization layers and fully connected (FC) layers. CONV layers perform feature extraction and pooling layers perform sub-sampling. Feature classification is performed by the fully connected layers, which are memory bound owing to the large number of weights in their computation. In a convolution operation, each element of the input is multiplied with the corresponding coefficient of the filter and the results are summed up; this is a MAC (multiply-accumulate) operation. CONV layers make up about 90% of the total computations in a CNN.

Consider M channels of H × W input feature maps and K × K kernels, a stride of S, and an output feature map denoted Y. Conventional convolution (direct convolution) is then given by

Y[n][r][c] = Σ_{m=0}^{M−1} Σ_{i=0}^{K−1} Σ_{j=0}^{K−1} X[m][S·r + i][S·c + j] · W[n][m][i][j]

where X is the input, W[n] is the n-th filter, and each output element is one MAC over an M × K × K window.

978-1-5386-6575-6/18/$31.00 © 2018 IEEE
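The direct method can be sketched in a few lines of NumPy. The function and array names are illustrative, not the paper's implementation; shapes follow the M, H, W, K, S notation above.

```python
import numpy as np

def direct_conv(x, w, stride):
    """Direct (conventional) convolution.
    x: input feature maps, shape (M, H, W)
    w: filters, shape (N, M, K, K)
    Returns y of shape (N, (H-K)//stride + 1, (W-K)//stride + 1)."""
    M, H, W = x.shape
    N, _, K, _ = w.shape
    out_h = (H - K) // stride + 1
    out_w = (W - K) // stride + 1
    y = np.zeros((N, out_h, out_w))
    for n in range(N):
        for r in range(out_h):
            for c in range(out_w):
                # One MAC group: multiply every input element under the
                # window by the matching filter coefficient and sum.
                patch = x[:, r*stride:r*stride+K, c*stride:c*stride+K]
                y[n, r, c] = np.sum(patch * w[n])
    return y

# CONV1-like parameters (scaled-down input): 3 channels, 11x11 kernels, stride 4
x = np.random.rand(3, 27, 27)
w = np.random.rand(4, 3, 11, 11)
y = direct_conv(x, w, stride=4)
print(y.shape)  # (4, 5, 5)
```

With the real CONV1 dimensions (H = W = 227, K = 11, S = 4) the same shape formula gives the 55×55 output per filter.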
Fig. 2. FFT based Convolution
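FFT based convolution (the scheme Fig. 2 refers to) replaces the sliding-window MACs by element-wise products in the frequency domain. A minimal single-channel sketch, with illustrative sizes; note that CNN "convolution" is actually correlation, so the kernel is flipped before the transform:

```python
import numpy as np

def fft_conv2d(x, k):
    """2-D convolution of one channel via the FFT.
    Both operands are zero-padded to H+K-1 so the circular
    convolution computed by the FFT equals linear convolution."""
    H, W = x.shape
    K = k.shape[0]
    size = (H + K - 1, W + K - 1)
    X = np.fft.rfft2(x, size)
    F = np.fft.rfft2(k, size)
    full = np.fft.irfft2(X * F, size)   # full linear convolution
    return full[K-1:H, K-1:W]           # keep the 'valid' region

# CNN layers compute correlation: convolve with the flipped kernel
x = np.random.rand(32, 32)
k = np.random.rand(5, 5)
y_fft = fft_conv2d(x, k[::-1, ::-1])
```

The multiplication count no longer depends on K, which is why this method pays off for large kernels but not for small ones, where the transform overhead dominates.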
2018 Eighth International Symposium on Embedded Computing and System Design (ISED)
TABLE I
DESIGN PARAMETERS FOR CONVOLUTION
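Winograd minimal filtering, the third method considered, trades multiplications for additions. The 1-D case F(2,3) produces 2 outputs of a 3-tap filter from a 4-element input tile with 4 multiplications instead of 6; a sketch using the standard transform matrices from Lavin and Gray [4]:

```python
import numpy as np

# Transform matrices for F(2,3): input transform B^T, filter
# transform G, and output transform A^T (Lavin and Gray [4]).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """y = A^T [(G g) * (B^T d)]: 4 element-wise multiplies, 2 outputs."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])   # input tile
g = np.array([1.0, 1.0, 1.0])        # filter
print(winograd_f23(d, g))            # [6. 9.], same as np.correlate(d, g, 'valid')
```

The 2-D tile variants nest these transforms, which is why Winograd is attractive for the small 3×3 kernels of the later CNN layers but less so for large kernels such as CONV1's 11×11.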
TABLE III
RESOURCE UTILIZATION
…ture is also written back to DDR. A single channel of the 227×227 input feature map is considered first, which has to be convolved with 96 filters of size 11×11 each. In our architecture we use 11 MAC units per filter for performing a row convolution. We have 96 such row convolution units, which perform the convolution operations in parallel. Similarly, the three channels of the input feature map in CONV1 are considered for computation. The outputs of all the channels are added together, reusing the accumulators in the MAC (multiply-accumulate) units.

Fig. 6. CONV1 layer of AlexNet

Fig. 7. Proposed Architecture

VI. CONCLUSION

Convolutional layers account for more than 90% of the total computation in Convolutional Neural Networks, so an efficient hardware architecture is required for this highly complex computational unit. We have explored various algorithms for performing convolution operations in CNNs, taking the AlexNet CNN model as a case study. The analysis shows that the first layer of AlexNet gives better performance with the conventional algorithm. In this paper we have proposed an efficient architecture for implementing the convolution operation in the first layer of AlexNet, using 32-bit floating point arithmetic. Our architecture has been implemented on a Xilinx XC7V2000 FPGA with an operating frequency of 200 MHz, and it gives a performance of 422 GFLOPs for the first convolution layer of the AlexNet CNN model. Future work will focus on the implementation of the rest of the layers of AlexNet using suitable fast algorithms.

ACKNOWLEDGEMENT

This work is supported in part by the Kerala State Council for Science, Technology and Environment (KSCSTE), under the Back-to-Lab programme of the Women Scientist Division.
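The reported 422 GFLOPs is consistent with the peak throughput of the described datapath: 96 parallel row-convolution units, 11 MAC units each, two floating-point operations per MAC, at 200 MHz. A quick arithmetic check (our reading of the paper's numbers, not its measurement methodology):

```python
# Peak throughput of the proposed CONV1 datapath, as described.
units = 96           # parallel row-convolution units (one per filter)
macs_per_unit = 11   # one MAC per kernel column of the 11x11 filter
flops_per_mac = 2    # one multiply + one accumulate
clock_hz = 200e6     # 200 MHz operating frequency

peak_gflops = units * macs_per_unit * flops_per_mac * clock_hz / 1e9
print(peak_gflops)   # 422.4, matching the reported 422 GFLOPs
```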
REFERENCES
[1] Fengbin Tu, Shouyi Yin, Peng Ouyang, Shibin Tang, Leibo Liu and Shaojun Wei, "Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, August 2017, pp. 2220–2233.
[2] Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun and Olivier Temam, "DaDianNao: A Machine-Learning Supercomputer", in Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014.
[3] Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", in NIPS 2012.
[4] Andrew Lavin and Scott Gray, "Fast Algorithms for Convolutional Neural Networks", in IEEE CVPR 2016.
[5] Roberto DiCecco et al., "Caffeinated FPGAs: FPGA Framework For Convolutional Neural Networks", in IEEE FPT 2016.
[6] Liqiang Lu, Yun Liang, Qingcheng Xiao and Shengen Yan, "Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs", in IEEE FCCM 2017.
[7] Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao and Jason Cong, "Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks", in ACM/SIGDA FPGA 2015.
[8] Manoj Alwani, Han Chen, Michael Ferdman and Peter Milder, "Fused-Layer CNN Accelerators", in Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2016.
[9] Kaiyuan Guo, Lingzhi Sui, Jiantao Qiu, Jincheng Yu, Junbin Wang, Song Yao, Song Han, Yu Wang and Huazhong Yang, "Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA", IEEE TCAD 37, 1 (Jan. 2018), pp. 35–47.
[10] Mohammad Motamedi, Philipp Gysel, Venkatesh Akella and Soheil Ghiasi, "Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks", in IEEE ASP-DAC 2016.
[11] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition", in ICLR 2015.
[12] Michael Mathieu, Mikael Henaff and Yann LeCun, "Fast Training of Convolutional Networks through FFTs", in ICLR 2014.
[13] Huimin Li, Xitian Fan, Li Jiao, Wei Cao, Xuegong Zhou and Lingli Wang, "A High Performance FPGA-based Accelerator for Large-Scale Convolutional Neural Networks", in FPL 2016.
[14] Yufei Ma, Naveen Suda, Yu Cao, Jae-sun Seo and Sarma Vrudhula, "Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA", in FPL 2016.
[15] Maurice Peemen, Arnaud A. A. Setio, Bart Mesman and Henk Corporaal, "Memory-Centric Accelerator Design for Convolutional Neural Networks", in IEEE ICCD 2013.
[16] Srimat Chakradhar, Murugan Sankaradas, Venkata Jakkula and Srihari Cadambi, "A Dynamically Configurable Coprocessor for Convolutional Neural Networks", in International Symposium on Computer Architecture, ISCA 2010.
[17] Jiantao Qiu et al., "Going Deeper with Embedded FPGA Platform for Convolutional Neural Network", in ACM/SIGDA FPGA 2016.
[18] Chen Zhang, Zhenman Fang, Peipei Zhou, Peichen Pan and Jason Cong, "Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks", in Proceedings of the International Conference on Computer-Aided Design, ICCAD 2016.
[19] Abhinav Podili, Chi Zhang and Viktor Prasanna, "Fast and Efficient Implementation of Convolutional Neural Networks on FPGA", in Proceedings of ASAP 2017.
[20] Naveen Suda, Vikas Chandra, Ganesh Dasika, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo and Yu Cao, "Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks", in FPGA 2016.
[21] Tahmid Abtahi, Amey Kulkarni and Tinoosh Mohsenin, "Accelerating Convolutional Neural Network with FFT on Tiny Cores", in ISCAS 2017.