
Performance Evaluation of Low-Precision
Quantized LeNet and ConvNet Neural Networks

Guner TATAR
Dept. of Electrical-Electronics Eng., Fatih Sultan Mehmet Vakıf University, Istanbul, Turkey
gtatar@fsm.edu.tr

Salih BAYAR
Dept. of Electrical-Electronics Eng., Marmara University, Istanbul, Turkey
salih.bayar@marmara.edu.tr

Ihsan CICEK
Dept. of Electronics Eng., Gebze Technical University, Kocaeli, Turkey
ihsancicek@gtu.edu.tr

Abstract—Low-precision neural network models are crucial for reducing the memory footprint and computational density. However, existing methods typically rely on 32-bit floating-point (FP32) arithmetic to maintain accuracy. Floating-point numbers impose severe memory requirements in convolutional and deep neural network models, and large bit-widths cause too much computational density in hardware architectures. Moreover, existing models must evolve into deeper network models with millions or billions of parameters to solve today's problems. The large number of model parameters increases the computational complexity and causes memory allocation problems, so existing hardware accelerators become insufficient to address these problems. In applications where accuracy can be traded off for the sake of hardware complexity, quantization of models enables the use of limited hardware resources to implement neural networks. From a hardware design point of view, quantized models are more advantageous in terms of speed, memory, and power consumption than FP32 models. In this study, we compared the training and testing accuracy of the quantized LeNet and our own ConvNet neural network models at different epochs. We quantized the models using low-precision int-4, int-8, and int-16. As a result of the tests, we observed that the LeNet model could only reach 63.59% test accuracy at 400 epochs with int-16. On the other hand, the ConvNet model achieved a test accuracy of 76.78% at only 40 epochs with low-precision int-8 quantization.

Index Terms—Convolutional neural networks, Quantized neural networks, FPGA, Hardware accelerators, Floating-point arithmetic, Fixed-point arithmetic, LeNet, ConvNet

I. INTRODUCTION

Convolutional neural networks (CNNs) have demonstrated significant improvements in performance in a variety of applications, including object detection, recognition, and classification [1], [2]. CNN models often have an extensive computational density. Hence, software optimization and powerful hardware accelerator architectures are needed to alleviate this computational complexity. Neural network designers can opt for CPU/GPU clusters [3], FPGAs [4], and application-specific integrated circuits (ASICs) [5]–[7] for training and inference of CNN models. Customized accelerators in FPGAs have demonstrated promising throughput and power efficiency when compared to traditional CPU/GPU architectures, owing to inherent parallel processing capabilities [8].

Fig. 1. The trade-off between computational complexity and memory requirements [9].

Designers have developed more extensive architectures to increase the performance of CNN architectures. In an example classification problem, using a wider and deeper CNN architecture in this manner reduces the classification error rate. In [9], the authors clearly demonstrated the relationship between computational density and memory requirement in the ImageNet classification example using different CNN models [10], [12]. As can be seen from Fig. 1, the ImageNet classification error rate drops from 17% to 2.9%. Likewise, it is obvious that the computational density increases as the network model grows. In addition, the memory requirement increases noticeably. General-purpose CPUs cannot handle CNNs with such computational complexity and memory requirements. Another problem is that CNN models consist of millions of parameters, which brings along a bandwidth problem. This situation also makes memory accesses difficult. Model pruning or weight and activation function quantization methods are used to reduce the computational density and the number of parameters of CNN models [11]. While the pruning method prevents the network from being overfitted during training, the quantization method causes a decrease in the accuracy of the inference if not used properly. The designer should realize a goal-oriented model by considering these trade-offs.
The quantization of CNN model weights and activation functions has a drawback that may reduce inference accuracy. Since the model parameters are initially kept in FP32 (32-bit floating-point arithmetic), the model exhibits successful CNN inference after training. If we instead set our network to int-8 (8-bit integer) or lower precision, inference accuracy would be low even if the network is well trained. On the other hand, the memory requirement of the quantized network is much lower than that of the floating-point counterpart, and the system will consume less energy. Hence, this makes the quantized model more suitable for battery-powered embedded devices. In [13], the authors performed image classification using the ResNet model. They compared the inference accuracy of the ResNet model implemented using floating-point arithmetic against the integer-quantized network implementation. The authors of [13] consequently showed that the network model with floating-point arithmetic gives better results than the integer-quantized network in terms of accuracy. On the other hand, in the study conducted in [14], the authors stated that a quantized neural network reduces computation time and energy consumption.

Fig. 2. Representative quantization-aware training scheme.

In this study, we used the LeNet [15] and ConvNet models to classify the CIFAR-10 dataset for FPGA-based hardware accelerators. We set the weights and activation functions of the models to int-4, int-8, and int-16 low-precision integers and performed a performance comparison of the networks. We used the PyTorch framework to build our model and the Brevitas library to quantize weights and activation functions [16]. Brevitas is a PyTorch library for quantization-aware training (QAT). The idea of QAT is to reduce the effect of quantization-induced data loss in neural networks during training, so that quantization degrades the inference accuracy as little as possible. The QAT representation scheme is given in Fig. 2. Here, we obtained int-32 by adding int-8 valued inputs and biases. By using the QuantReLU at the output, quantization can be performed according to the intended int-bit value. We used the QuantReLU method from the Brevitas library to quantize the activation functions of the models. To quantize the bias values of the network, we imported the Int8Bias quantizer from the Brevitas library and adjusted the network appropriately.
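For illustration, a minimal Brevitas layer stack of this kind might look like the sketch below. The channel counts and kernel size are placeholder values, and BIT_WIDTH stands in for the int-4/int-8/int-16 settings; this is a sketch of the general pattern, not our exact layer configuration.

```python
# Minimal Brevitas QAT sketch (illustrative only): layer sizes and the
# kernel size are placeholder values, not our exact configuration.
import torch.nn as nn
from brevitas.nn import QuantConv2d, QuantIdentity, QuantReLU
from brevitas.quant import Int8Bias

BIT_WIDTH = 8  # we repeated the experiments with 4, 8 and 16


class QuantBlock(nn.Module):
    def __init__(self):
        super().__init__()
        # Quantize incoming activations; Int8Bias derives its scale from
        # the input and weight scales, so the input must be a QuantTensor.
        self.quant_in = QuantIdentity(bit_width=BIT_WIDTH,
                                      return_quant_tensor=True)
        # Weights quantized to BIT_WIDTH bits, biases with Int8Bias.
        self.conv = QuantConv2d(3, 6, kernel_size=5,
                                weight_bit_width=BIT_WIDTH,
                                bias_quant=Int8Bias)
        # QuantReLU re-quantizes the wide accumulator output back down
        # to BIT_WIDTH-bit activations, as in Fig. 2.
        self.relu = QuantReLU(bit_width=BIT_WIDTH)

    def forward(self, x):
        return self.relu(self.conv(self.quant_in(x)))
```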
The quantized activation function QuantReLU is defined by the equations below [17]. Onnxruntime [18], FINN [19], [20], and PyTorch's quantized inference operators can then be used to transfer the quantized model to FPGA hardware platforms.

\[
\mathrm{QuantReLU}(x, z_x, z_y, k) =
\begin{cases}
z_y & \text{if } x < z_x \\
z_y + k(x - z_x) & \text{if } x \ge z_x
\end{cases}
\tag{1}
\]

When $z_x = 0$, $z_y = 0$ and $k = 1$, the commonly used ReLU in deep learning models is actually a special case of the above definition:

\[
\mathrm{ReLU}(x, 0, 0, 1) =
\begin{cases}
0 & \text{if } x < 0 \\
x & \text{if } x \ge 0
\end{cases}
\tag{2}
\]

We now look at how to calculate the QuantReLU mathematically. Starting from the floating-point activation $y = \mathrm{ReLU}(x, 0, 0, 1)$ and substituting the dequantization relations $x = s_x(x_q - z_x)$ and $y = s_y(y_q - z_y)$:

\[
\begin{aligned}
s_y(y_q - z_y) &= \mathrm{ReLU}\big(s_x(x_q - z_x), 0, 0, 1\big) \\
&= \begin{cases}
0 & \text{if } s_x(x_q - z_x) < 0 \\
s_x(x_q - z_x) & \text{if } s_x(x_q - z_x) \ge 0
\end{cases} \\
&= \begin{cases}
0 & \text{if } x_q < z_x \\
s_x(x_q - z_x) & \text{if } x_q \ge z_x
\end{cases}
\end{aligned}
\tag{3}
\]

Accordingly,

\[
y_q =
\begin{cases}
z_y & \text{if } x_q < z_x \\
z_y + \dfrac{s_x}{s_y}(x_q - z_x) & \text{if } x_q \ge z_x
\end{cases}
= \mathrm{ReLU}\!\left(x_q, z_x, z_y, \frac{s_x}{s_y}\right)
\tag{4}
\]

As a result, to perform the QuantReLU corresponding to the floating-point $y = \mathrm{ReLU}(x, 0, 0, 1)$, we simply need to compute

\[
y_q = \mathrm{ReLU}\!\left(x_q, z_x, z_y, \frac{s_x}{s_y}\right)
\tag{5}
\]

where $x_q$ is the quantized tensor, $z_x$ and $z_y$ are the zero points, and $s_x$, $s_y$ are positive floating-point scale values.
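As a sanity check of this identity, the short sketch below (using NumPy, with arbitrary example scales and zero points of our own choosing) compares the floating-point ReLU mapped back into the quantized domain against Eq. (5) applied directly to the quantized input.

```python
import numpy as np


def quant_relu(x, z_x, z_y, k):
    # Generalized ReLU of Eq. (1): constant z_y below the zero point
    # z_x, a line of slope k above it.
    return np.where(x < z_x, z_y, z_y + k * (x - z_x))


# Arbitrary example quantization parameters (integer zero points z,
# positive floating-point scales s).
s_x, z_x = 0.05, 3
s_y, z_y = 0.10, -2

x_q = np.random.randint(-128, 128, size=1000)  # int-8 style inputs
x = s_x * (x_q - z_x)                          # dequantize: x = s_x(x_q - z_x)

# Floating-point path: ordinary ReLU (Eq. 2), then map y back to the
# quantized domain via y_q = y / s_y + z_y (a real pipeline would also
# round and clip this value).
y = quant_relu(x, 0.0, 0.0, 1.0)
y_q_float = y / s_y + z_y

# Integer-domain path of Eq. (5): apply the generalized ReLU to x_q.
y_q_int = quant_relu(x_q, z_x, z_y, s_x / s_y)

assert np.allclose(y_q_float, y_q_int)         # both paths agree
```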

The remainder of the paper is organized as follows: In Section II, we address the most recent advances and methods in quantization and provide background on LeNet models with low-precision integer quantization. We also discuss the benefits and drawbacks of floating-point and low-precision quantized network models. In Section III, we set up the environment for training and inference of the LeNet and ConvNet models. In addition, we describe the challenges we encountered and how we overcame them. In Section IV, we assess the performance of the LeNet and ConvNet models and show the effects of the training steps. Finally, we provide a brief overview of our work, our contribution to the literature, and our future studies in Section V.

II. BACKGROUND AND MOTIVATION

Researchers use CNN models for many purposes, including feature extraction, classification, or object recognition, by passing the input data through more than one type of layer. Neurons are used in each layer to hold the input feature values and transfer them to the next layer. The layers of a CNN architecture can be composed of convolution, pooling, normalization, fully connected, or flatten layers, depending on the intended use. Convolutional and fully connected layers often require high computational density and large memory size. Generally, we can classify CNN architectures as thin neural networks if they have fewer than 50 layers, as medium if they have 50 to 100 layers, and finally as deep for more than 100 layers [9]. For battery-powered portable systems, a small memory footprint and low computational density are crucial, because sustained heavy computation consumes more power. The quantization of the CNN model extends the battery life of such devices.

Studies have shown that low-precision fixed-point arithmetic provides almost as much inference accuracy as floating-point arithmetic [22], [23]. Accurate quantization of the weights and activation functions of the neural network model is advantageous in terms of inference accuracy as well as memory requirements and power consumption. In the study [21], the authors argued that the best scale for the network layers of the model is 16-bit fixed-point quantization. In the study conducted in [22], the authors showed that for the LeNet model, 62.75% accuracy was achieved with 8-bit quantization, while 68.70% accuracy was obtained with 16-bit quantization. In addition, they showed that a 68.6% accuracy rate was achieved when they used FP32. In the study [23], the authors proposed 16-bit fixed-point arithmetic for the best scale factor at each layer of the model. They showed that the use of fixed-point arithmetic incurs a loss of accuracy of about 2% compared to floating-point. Inference accuracy and latency depend on software optimizations and the usage of hardware accelerators. The designer should make the work goal-oriented. A one-to-one comparison of studies in the literature is often not possible. Moreover, the setup preferred for one model may not give good results for different models, or vice versa. For example, Adam optimization may give the desired results in one model, while SGD may show better results in another model. We can determine this by trial and error.

III. MODEL SETUP, TRAINING AND INFERENCE

We built our model by considering the LeNet infrastructure given in Fig. 3. In contrast to the conventional LeNet, we added a dropout layer after the fully-connected (FC) layers in our model. We observed that the network received proper training without memorization using this approach. We realized that, before the addition of the dropout layer, the network trained normally up to a certain number of epochs (50-60), after which it began memorizing at higher epochs (more than 60). We determined the model's cost function in classification with categorical cross-entropy, which is the mean of the cross-entropy over the whole training data set. Categorical cross-entropy is used when the labels of the data set are one-hot encoded. This indicates that only one bit of data at a time is correct, such as [0, 0, 1], [0, 1, 0], and [1, 0, 0]. We utilized stochastic gradient descent (SGD) for optimization in our model. We also tried the Adam optimization method, but we noticed that the network ran slower than with SGD during training. We also observed that SGD performed better when the accuracy of the inference was taken into account. We can define categorical cross-entropy and SGD mathematically as in Eq. 6 and Eq. 7, respectively.

\[
\mathrm{Loss} = -\sum_{i=1}^{N} y_i \log(\hat{y}_i)
\tag{6}
\]

where $y_i$ is the true label and $\hat{y}_i$ is the predicted label, respectively.

\[
\theta = \theta - \eta \cdot \nabla_\theta J\big(\theta; x^{(i)}; y^{(i)}\big)
\tag{7}
\]

\[
\nabla = \frac{\partial}{\partial x}\hat{\imath} + \frac{\partial}{\partial y}\hat{\jmath} + \frac{\partial}{\partial z}\hat{k}
\tag{8}
\]

where $\nabla$ is the gradient operator, $\eta$ is the learning rate, $x^{(i)}$ is the training example, and $y^{(i)}$ represents its label, respectively.
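To make Eq. 6 and Eq. 7 concrete, the sketch below computes the categorical cross-entropy of one-hot labels by hand and performs a single SGD update on a toy linear classifier; the batch size, layer shape, and learning rate are arbitrary example values, not our experimental settings.

```python
import torch

torch.manual_seed(0)

model = torch.nn.Linear(4, 3)  # toy 3-class classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # eta = 0.01

x = torch.randn(8, 4)                             # example batch
y_true = torch.eye(3)[torch.randint(0, 3, (8,))]  # one-hot labels

# Eq. 6: categorical cross-entropy over softmax outputs, averaged over
# the batch (equivalent to nn.CrossEntropyLoss on class indices).
y_hat = torch.softmax(model(x), dim=1)
loss = -(y_true * torch.log(y_hat)).sum(dim=1).mean()

# Eq. 7: theta <- theta - eta * grad_theta J(theta; x, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```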
Fig. 3. LeNet model classifier architecture [15].

We trained the models according to the algorithmic flowchart presented in Fig. 4. In the flowchart, the weight and

activation functions of the model are determined randomly first. Then, we perform feed-forward propagation according to the input data set. Afterwards, the error between the predicted result and the actual result is computed. We end the training if the error is the smallest possible number, and we continue to update the weights if the error is not within the preset region. Finally, if the epoch value is insufficient, it is increased.

Fig. 4. Algorithmic flow of the model training process.
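The flowchart corresponds to a standard supervised training loop. A schematic version is sketched below; `model`, `loader`, and `criterion` are assumed to be defined elsewhere, and the learning rate, epoch limit, and error threshold are illustrative placeholders rather than our exact settings.

```python
import torch


def train(model, loader, criterion, max_epochs=400, target_loss=1e-3):
    # Weights start from PyTorch's random initialization, matching the
    # first step of the flowchart in Fig. 4.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for epoch in range(max_epochs):        # epoch counter increases
        epoch_loss = 0.0
        for x, y in loader:                # feed-forward over the data set
            loss = criterion(model(x), y)  # error: predicted vs. actual
            optimizer.zero_grad()
            loss.backward()                # error outside preset region:
            optimizer.step()               # update the weights
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < target_loss:
            return epoch                   # error small enough: stop
    return max_epochs                      # otherwise epochs exhausted
```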
TABLE I
PERFORMANCE COMPARISON OF LOW-PRECISION QUANTIZATION IN LENET

Activation Func. & Weights bits | Epochs | Loss   | Train acc. (%) | Test acc. (%)
4                               | 100    | 1.0767 | 62             | 60.69
4                               | 200    | 1.0395 | 63             | 60.97
4                               | 300    | 1.0224 | 63             | 60.86
4                               | 400    | 1.0147 | 64             | 60.59
8                               | 100    | 1.0826 | 62             | 60
8                               | 200    | 1.0337 | 63             | 61.13
8                               | 300    | 0.9957 | 65             | 62
8                               | 400    | 0.9906 | 65             | 62.477
16                              | 100    | 1.0696 | 64             | 61.13
16                              | 200    | 1.0353 | 65             | 61.94
16                              | 300    | 0.0997 | 65             | 63.294
16                              | 400    | 0.9876 | 66             | 63.59

As can be observed in Table I, the LeNet model achieved 66% training accuracy with int-16 bit-width weights and activation functions at 400 epochs, while the testing accuracy was 63.59%. As is commonly known, LeNet is the first and simplest network model [24]. The reason for the low training and test accuracy was that the model could not learn enough. Additionally, CIFAR-10 is one of the data sets that are very difficult to classify.

We included our own ConvNet model in our study to get better results and demonstrate the performance of the model. Fig. 5 illustrates the ConvNet model, which consists of four convolution, three fully connected (FC), and two max-pooling layers. The number of parameters increases proportionally to the number of layers in the model. This directly affects the time used for training and inference. In our new model, we performed a few training and inference runs at 20, 30, and 40 epochs.

Fig. 5. Proposed ConvNet model classifier architecture.
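For reference, a plain-PyTorch skeleton with the layer counts of Fig. 5 might look as follows. The channel widths, kernel sizes, FC sizes, and dropout rate are illustrative guesses rather than the exact hyperparameters of our model, and in the actual experiments the layers were additionally quantized with Brevitas as described in Section I.

```python
import torch.nn as nn


class ConvNet(nn.Module):
    """Skeleton with the layer counts of Fig. 5: four convolution, two
    max-pooling and three fully connected layers (CIFAR-10 input 3x32x32).
    All concrete sizes below are illustrative guesses."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Dropout(0.5),                      # dropout after FC layers
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```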
We updated the weight and activation functions for each training run to be int-4, int-8, and int-16. Training took approximately 11 minutes per epoch and about 4.6 hours for the full 40 epochs. As is obvious from these results, we were limited to only a few training and inference runs due to time constraints. Table II shows the performance results of the ConvNet model. The ConvNet model clearly outperforms the other models even after only 20 epochs. Furthermore, Table II clearly shows that int-4 and int-8 quantization outperform int-16 quantization, with int-8 quantization performing best of all. It should be noted that high bit-width quantization for weights and activation functions had a negative impact on inference accuracy in our experiments. We used a computer with the features listed in Table III for the training of both models.

IV. PERFORMANCE EVALUATION

CNN and Deep Neural Network (DNN) based algorithms are among the most commonly used methods for solving artificial intelligence problems. Depending on the level and type of problem, CNN or DNN models can be simple or complex. Simple CNN models performed exceptionally well in problem-solving when machine learning and deep learning were first introduced. The complexity of today's problems necessitates more complex and in-depth models for their solution. Deeper networks' success in solving today's problems is also dependent on software optimization and the utilization of hardware accelerators.

TABLE II
PERFORMANCE COMPARISON OF LOW-PRECISION QUANTIZATION IN PROPOSED CONVNET

Activation Func. & Weights bits | Epochs | Loss   | Train acc. (%) | Test acc. (%)
4                               | 20     | 0.6546 | 77             | 72.34
4                               | 30     | 0.1016 | 96             | 74.56
4                               | 40     | 0.0546 | 98             | 76.13
8                               | 20     | 0.6493 | 77             | 72.56
8                               | 30     | 0.1131 | 96             | 74.57
8                               | 40     | 0.0543 | 98             | 76.78
16                              | 20     | 0.6868 | 75             | 70.56
16                              | 30     | 0.1178 | 95             | 73.48
16                              | 40     | 0.0646 | 96             | 74.13

TABLE III
SPECIFICATIONS OF THE COMPUTER USED IN CALCULATIONS

Parameters          | Value      | Unit
CPU Manufacturer    | INTEL      | -
CPU Variant         | i7-7700HQ  | -
CPU Clock Frequency | 2.8 to 3.8 | GHz
CPU Core Count      | 4          | -
Cache Size          | 6          | MB
RAM Size            | 16         | GB
GPU Manufacturer    | NVIDIA     | -
GPU Chipset         | GTX1050    | -
GPU RAM Size        | 4          | GB

Since such a model is so complex, it requires more memory and causes performance bottleneck issues in hardware. It also causes power consumption issues, particularly in battery-powered portable devices. These scenarios place the user in a bind. As a result, designers have devised techniques for reducing the computational density and power consumption of complex neural networks. Pruning, pooling, and thinning methods are frequently used by researchers to find solutions [25], [26]. In addition to these, by carefully balancing accuracy and complexity, model quantization can result in hardware-implementable neural networks. Indeed, our study has shown that quantization can ease memory requirements and computational complexity. Since CIFAR-10 is already a difficult-to-classify data set, the highest test accuracy we obtained using LeNet was 63.59%. As shown in Table II, we achieved 76.78% test accuracy in just 40 epochs with our more advanced ConvNet model using low-precision int-8.

Fig. 6. Low precision int-8 LeNet test accuracy and cost function at 400 epochs.
V. CONCLUSION AND FUTURE WORK

In this paper, we compared the performance of CIFAR-10 dataset classification using different low-precision bit-widths for use in FPGA-based hardware accelerator architectures. Here, we first applied the parameter values shown in Table I to the LeNet model. As a result, we discovered that low-precision int-16 quantization outperformed the others. The accuracy and loss function during training of the LeNet model for 400 epochs are shown in Fig. 6-a and Fig. 6-b, respectively. Second, we applied the parameter values from Table II to our ConvNet model with lower epochs. Consequently, low-precision int-8 quantization outperformed the others. We also present the accuracy and loss function during training of the ConvNet model for 40 epochs in Fig. 7-a and Fig. 7-b, respectively. We can conclude that a similar problem can be solved with a smaller bit-width rather than a large bit-width. As a result, we can reduce hardware computational complexity and memory requirements.

Artificial intelligence and its sub-branches ML and DL are becoming increasingly utilized in consumer electronics, industry, automotive, and defense. Our research aims to develop an infrastructure for the implementation of digital design-based advanced driver assistance systems including FPGAs for vehicles in use today. The automotive industry has already started the new era of self-driving vehicles, in which the driver is replaced by computers running ML and DL algorithms along with the operating system. All these enhancements increase the power consumption footprint of the autonomous car's electrical infrastructure. As a consequence, future AI problems require solutions that not only provide enough accuracy for proper decision making, but also consume less power, which mandates hardware solutions. The balance set by the quantization of models between the required accuracy and hardware complexity will set the performance boundaries for what these algorithms can achieve in the future.

Fig. 7. Low precision int-8 ConvNet test accuracy and cost function at 40 epochs.

REFERENCES

[1] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[2] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[3] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le et al., "Large scale distributed deep networks," in Advances in Neural Information Processing Systems, 2012, pp. 1223–1231.
[4] K. Ovtcharov, O. Ruwase, J.-Y. Kim, J. Fowers, K. Strauss, and E. S. Chung, "Accelerating deep convolutional neural networks using specialized hardware," Microsoft Research Whitepaper, vol. 2, no. 11, pp. 1–4, 2015.
[5] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning," in ACM SIGPLAN Notices, vol. 49, no. 4, 2014, pp. 269–284.
[6] G. Tatar, S. Bayar, and I. Cicek, "FPGA design of a high-resolution FIR band-pass filter by using LabVIEW environment," Avrupa Bilim ve Teknoloji Dergisi, no. 29, pp. 273–277, Dec. 2021, doi:10.31590/ejosat.1016363.
[7] G. Tatar, I. Cicek, and S. Bayar, "FPGA design of a fourth order elliptic IIR band-pass filter using LabVIEW," Avrupa Bilim ve Teknoloji Dergisi, no. 26, pp. 122–127, Jul. 2021, doi:10.31590/ejosat.951601.
[8] X. Wei, C. H. Yu, P. Zhang, Y. Chen, Y. Wang, H. Hu, Y. Liang, and J. Cong, "Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs," in Proceedings of the 54th Annual Design Automation Conference, 2017, p. 29.
[9] C. Wu et al., "Low-precision floating-point arithmetic for high-performance FPGA-based CNN acceleration," ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 15, no. 1, pp. 1–21, 2021.
[10] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[11] M. P. Véstias et al., "A fast and scalable architecture to run convolutional neural networks in low density FPGAs," Microprocessors and Microsystems, vol. 77, 2020, 103136.
[12] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
[13] B. Jacob et al., "Quantization and training of neural networks for efficient integer-arithmetic-only inference," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[14] M. Nagel et al., "A white paper on neural network quantization," arXiv preprint arXiv:2106.08295, 2021.
[15] Y. LeCun et al., "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[16] A. Pappalardo, "Xilinx/brevitas," 2021. URL https://doi.org/10.5281/zenodo.3333552.
[17] L. Mao, "Quantization for neural networks," https://leimao.github.io/article/Neural-Networks-Quantization/ (accessed: May 10, 2022).
[18] ONNX Runtime developers, "ONNX Runtime," https://onnxruntime.ai/, 2021.
[19] Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers, "FINN: A framework for fast, scalable binarized neural network inference," in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '17), 2017, pp. 65–74.
[20] M. Blott, T. B. Preußer, N. J. Fraser, G. Gambardella, K. O'Brien, Y. Umuroglu, M. Leeser, and K. Vissers, "FINN-R: An end-to-end deep-learning framework for fast exploration of quantized neural networks," ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 11, no. 3, pp. 1–23, 2018.
[21] Q. Xiao, Y. Liang, L. Lu, S. Yan, and Y.-W. Tai, "Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs," in 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), 2017, pp. 1–6.
[22] K. Guo, L. Sui, J. Qiu, J. Yu, J. Wang, S. Yao, S. Han, Y. Wang, and H. Yang, "Angel-Eye: A complete design flow for mapping CNN onto embedded FPGA," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 1, pp. 35–47, 2017.
[23] Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, "Optimizing the convolution operation to accelerate deep neural networks on FPGA," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 26, no. 7, pp. 1354–1367, 2018.
[24] A. Bouti et al., "A robust system for road sign detection and classification using LeNet architecture based on convolutional neural network," Soft Computing, vol. 24, no. 9, pp. 6721–6733, 2020.
[25] T. Liang et al., "Pruning and quantization for deep neural network acceleration: A survey," Neurocomputing, vol. 461, pp. 370–403, 2021.
[26] J.-H. Luo and J. Wu, "AutoPruner: An end-to-end trainable filter pruning method for efficient deep model inference," Pattern Recognition, vol. 107, 2020, 107461.
