Authorized licensed use limited to: California State University Fresno. Downloaded on June 19,2021 at 07:11:31 UTC from IEEE Xplore. Restrictions apply.
MARCHISIO et al.: FEECA: DESIGN SPACE EXPLORATION FOR LOW-LATENCY AND ENERGY-EFFICIENT CapsNet ACCELERATORS 717
718 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 29, NO. 4, APRIL 2021
TABLE I
Input Size, Number of Trainable Parameters, and Output Size of Each Layer of the CapsNet
Fig. 10. Performance of each step of the ClassCaps layer during the inference pass.
Fig. 11. Our FEECA methodology for obtaining Pareto-optimal design configurations of the CapsNet accelerators with respect to the given optimization objectives.
Fig. 13. Architecture of different components of our CapsAcc. (a) Processing element array. (b) Single (n_pe = 1) PE. (c) Accumulator. (d) Activation unit. (e) Squashing function unit. (f) Norm function unit. (g) Softmax function unit.
Fig. 15. Dataflow of our CapsAcc for different scenarios of the case study. (a) Convolutional layer mapping. (b) Sum generation & squashing operation mapping for the first routing iteration. (c) Update & softmax operation mapping. (d) Sum generation & squashing operation mapping for all but the first routing iteration.
function takes the vector s_j (elementwise) and its norm ||s_j|| as inputs. The norm input comes from its respective unit; hence, the Norm operation is not implemented again inside the squash unit. The LUT takes a 6-bit fixed-point data and a 5-bit fixed-point norm as inputs to produce an 8-bit output. The first output of the vector is produced with just one additional clock cycle compared to the Norm. We decided to limit the bit width to reduce the computational requirements at this stage, following the analysis performed in Section III, which shows the highest computational load for this operation. Such an LUT-based design significantly reduces the latency of the squashing operation, as we will demonstrate in Section V-D. A pure logic-based implementation would have required complex mathematical operations that would not be efficient when implemented in hardware.

The softmax function design is shown in Fig. 13(g). Initially, it computes the exponential function (8-bit lookup table) and accumulates the sum in a register, followed by a division. Overall, for an array of n elements, this block is able to compute the softmax function of the whole array in 2n cycles.

4) Control Unit: At each stage of the inference process, the control unit generates different control signals for all the components of the accelerator architecture, according to the operations needed. Its functionality is shown in Fig. 14. The core of the control unit is a finite state machine (FSM), which generates at its output the control signals for the multiplexers, the memories, the buffers, and all the other components of the CapsAcc architecture. A set of counters interacts with the FSM to guarantee the correct timing of all the operations. For example, in a convolution operation, the number of clock cycles needed to process the data for a given set of weights is counted before the next set of weights is loaded onto the PE array. Therefore, the control unit is essential for correctly scheduling the operations of the accelerator.

5) Memory Hierarchy: Besides the registers that are embedded in the PE array and in the activation unit, the memory hierarchy is organized as follows. All the weights for each operation are stored in the on-chip weight memory, while the initial data, which correspond to the pixel intensities of the input image, are stored in the on-chip data memory. As the interface between the memories and the accelerator, the data buffer and weight buffer work as a cushion for the interaction with the PE array at high bandwidth and high access rate. Moreover, the accumulator unit contains a buffer for storing the output partial sums, and the routing buffer is used to store the coefficients of the dynamic routing.

Algorithm 1 Mapping Algorithm for CapsuleNet Operations Onto the PE Array

B. Dataflow Design

In this section, we provide the details on how to map the processing of different types of layers and operations onto our CapsAcc architecture in a step-by-step fashion. To feed the PE array, we adopt the mapping policy described in Algorithm 1. For ease of understanding, we illustrate the process with the help of a case study performing MNIST classification on our CapsAcc. Note that each stage of the CapsuleNet inference requires its own mapping scheme.

1) Dataflow of the Conv1 Layer: The Conv1 layer has filters of size 9 × 9 and 256 channels. As shown in Fig. 16(a), we designed a row-by-row mapping (A, B), and after the last row, we move to the next channel (C). Fig. 17(a) shows how the dataflow is mapped onto our CapsAcc architecture. An illustrative example of mapping the weights onto the weight buffer is shown in Fig. 17. To perform the convolution efficiently, we hold the weight values in the PE array to reuse the filter across different input data.

2) Dataflow of the PrimaryCaps Layer: Compared to the Conv1 layer, the PrimaryCaps layer has one more dimension, which is the capsule size (i.e., 8). However, we treat the 8-D capsule as a convolutional layer with eight output channels. Thus, as shown in Fig. 16(b), we map the parameters row-by-row (A, B), then move through different input channels (C), and only at the third stage do we move on to the next output channel (D). This mapping procedure allows us to minimize the accumulator size because our CapsAcc computes
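The mapping orders described in 1) and 2) can be written as a loop nest. The sketch below is illustrative only, with our own function and variable names (not the paper's Algorithm 1); it emits the PrimaryCaps-style weight-streaming order: rows first (A, B), then input channels (C), and only then the next output channel (D).

```python
def primarycaps_mapping_order(n_rows, n_in_ch, n_out_ch):
    """Yield (out_ch, in_ch, row) tuples in the order the weights are
    streamed onto the PE array: row-by-row (A, B), then across input
    channels (C), and only then to the next output channel (D).
    Keeping the output channel outermost lets each output's partial
    sums complete before moving on, which keeps the accumulator small."""
    for out_ch in range(n_out_ch):       # (D) advanced last
        for in_ch in range(n_in_ch):     # (C) advanced third
            for row in range(n_rows):    # (A, B) advanced first
                yield (out_ch, in_ch, row)
```

For the Conv1 mapping of Fig. 16(a), the same nest applies with a single output-channel step per filter, since the filter is held in the PE array and reused across the input data.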
Fig. 16. Dataflow of the process of mapping different layers onto our CapsAcc architecture. (a) Conv1 layer. (b) PrimaryCaps layer. (c) ClassCaps layer.

Fig. 18. Synthesis flow and tool chain of our experimental setup.
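The squashing and softmax functions realized by the LUT-based units described above can be sketched in software as follows. This is a floating-point reference sketch, not the RTL: the hardware evaluates these through 6-bit/8-bit fixed-point lookup tables.

```python
import math

def squash(s):
    """CapsNet squashing nonlinearity: scales the vector s_j by
    ||s_j||^2 / (1 + ||s_j||^2) and normalizes it, so short vectors
    shrink toward zero and long ones approach unit length. In the
    hardware unit, the norm arrives precomputed from the Norm unit."""
    norm = math.sqrt(sum(x * x for x in s))
    scale = norm / (1.0 + norm * norm)  # = (norm^2 / (1 + norm^2)) / norm
    return [scale * x for x in s]

def softmax(x):
    """Softmax over an n-element list. The hardware evaluates exp()
    through an 8-bit LUT, accumulates the sum in a register, and then
    divides, processing the whole array in 2n cycles."""
    m = max(x)                           # subtract max for stability
    e = [math.exp(v - m) for v in x]
    total = sum(e)
    return [v / total for v in e]
```

Note how squash() consumes both the vector and its norm, matching the hardware split between the Norm unit and the squash unit.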
Algorithm 2 NSGA-II
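The heart of NSGA-II [41] is the repeated extraction of non-dominated fronts from the current population. A minimal sketch of that step, assuming all objectives are minimized, is given below (names are ours, not the paper's listing):

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors,
    i.e., the first front extracted by NSGA-II's non-dominated sort."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

In the full algorithm, the remaining points are re-sorted into successive fronts, and a crowding-distance measure spreads the selected solutions along each front.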
Fig. 25. Experimental setup (orange) and toolflow for this section.

Fig. 26. Power consumption and area of PEs with various bit widths of P.Sum (b_out) and n_stg = 1. The dotted lines show the maximal number of inputs n_pe that can be synthesized without violating the constraint for a given bit width.

TABLE IV
Parameters of SRAM
Fig. 29. Distribution of the n_pe and #COLS parameters for configurations that are Pareto-optimal for the E-L and A-L objectives. The blue figures show the distribution of the objectives of the whole CapsNet, and the red ones those of the configurations optimized for a single layer.
are highly overlapping with the solutions generated with the NSGA-II algorithm, meaning that the latter is an efficient and high-fidelity design space exploration algorithm. Moreover, pointer ② indicates that there is a relatively small area difference between the configurations. Note that there are different solutions with the same area but different delays. We also compared the Pareto-optimal solutions found by the NSGA-II-based FEECA methodology with a random search [43] over the same number of candidate solutions. Compared to the Pareto-optimal points found by the random search (see the green points in Fig. 27), the Pareto-optimal points found by the NSGA-II-based search exhibit 67× and 146× lower average normalized Euclidean distance (ANED) to the optimal points for the E versus D and A versus D objectives, respectively.

For the E versus D objectives, Fig. 28 shows the energy consumption and the delay of the configurations optimized for: 1) the overall E versus D and 2) the E versus D of every single layer. The PrimaryCaps layer has the biggest impact on the overall energy and delay; thus, the layerwise and the CapsNet-optimal configurations, in that case, fall almost on the same curve. On the other hand, the CapsNet-optimal configurations degrade the performance of the Sum, Update, and mostly Conv1 layers, but these layers contribute to the overall objectives with a lower impact compared to the PrimaryCaps layer. Indeed, as indicated by pointers ③, an optimal solution for the whole CapsNet belongs only to the PrimaryCaps layer optimum, while it is not optimal for the other layers.

Another view on the optimal configurations is presented in Fig. 29, which shows the distribution of the parameters of the CapsNet accelerator for different configurations. Note that, considering all the objectives, better results are achieved when using #ROWS = 1. Considering the E versus D objectives, it is convenient to maximize mem_bw. The highest contribution to the overall delay and energy consumption is due to the PrimaryCaps layer. It is convenient to choose the value of n_pe in the range between 1 and 7. However, considering Conv1 only, a better choice would have been n_pe ∈ {4, 7} (see pointer ④), and equal to 4 for the ClassCaps layer. The Sum, Update, and ClassCaps layers prefer a PE array size equal to 32 × 1 (see pointer ⑤). As visible, the distribution of the optimal parameters for the A versus D design objectives is different. Since the area strongly depends on mem_bw, all its values lead to some Pareto-optimal solutions.

C. Heuristic Search Algorithm

The brute-force algorithm eventually finds the optimal solutions. However, it is very slow because all the possible solutions are explored. Therefore, we employ the heuristic NSGA-II algorithm to speed up the search process. For the E-L objectives, the NSGA-II runs for 1000 iterations of the generation process, with a population size |P| = |Q| = 50, to find up to 50 Pareto-optimal configurations.

The NSGA-II algorithm needs only 50 050 evaluations (0.44% of the search space). Therefore, the exploration time is decreased from 2.5 h to 30 s, compared to using the brute force. The design for the E versus D objective is not trivial because the optimal Pareto frontier consists of 228 configurations. Therefore, the initial settings |P| = |Q| = 50 allow finding only a small subset of the optimal solutions, regularly distributed due to the crowding distance. However, almost all the found solutions belong to the optimal Pareto set, and the ANED from the found solutions to the nearest optimal ones is 4 · 10⁻⁵. However, the ANED from the optimal solutions to the nearest found solutions is 0.006. To reduce this distance from the optimal solutions, we increase the population size to |P| = |Q| = 150 (150 150 evaluations; 1.31% of the exploration time). With these settings, we found 150 solutions. Although this modification requires 3× more design time, the ANED from optimal to found solutions decreases to 0.001, and each found solution belongs to the optimal Pareto set. The heuristic design for the A versus D objective with |P| = |Q| = 150 allows us to find 97 of the 127 configurations, with an ANED from optimal to found solutions equal to 2 · 10⁻⁴. The results are shown in Fig. 27.

D. Multiobjective Optimization

By running the search algorithm on our benchmark, three objectives of the CapsNet accelerator are optimized: the area on the chip, the energy consumption for the inference of one input image, and the delay (i.e., the latency of the inference).
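The ANED figure of merit used in the comparisons above can be computed as follows. This is a sketch under the assumption that every objective has been normalized to [0, 1] over the explored space before distances are measured:

```python
import math

def aned(found, optimal):
    """Average normalized Euclidean distance from each point in `found`
    to its nearest point in `optimal`. Both arguments are lists of
    objective vectors, already normalized to [0, 1] per objective.
    Swapping the arguments gives the distance from the optimal set to
    the found set, the second direction reported in the text."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sum(min(dist(f, o) for o in optimal) for f in found) / len(found)
```

The asymmetry of the metric explains why the text quotes two values: found-to-optimal (how accurate the found points are) and optimal-to-found (how much of the true frontier is covered).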
TABLE V
R ESULTS FOR THE PE A RRAY AND E STIMATED D ELAY, A REA , AND E NERGY C ONSUMPTION FOR THE W HOLE C APS A CC A RCHITECTURE . T HE fastest
(L OWEST D ELAY ) C ONFIGURATION OF THE C APS A CC I S H IGHLIGHTED IN G REEN IN THE F IRST ROW, W HILE THE O RIGINAL V ERSION OF THE
C APS A CC , W HICH WAS A NALYZED IN S ECTION V, I S R EPORTED IN THE S ECOND L AST ROW. A LL C IRCUITS H AVE
B EEN S YNTHESIZED W ITH THE C LOCK P ERIOD T = 3 NS
Fig. 31. Distribution of the configuration parameters for the optimal solutions found for three objectives. The n_stg parameter was always equal to 1.
Fig. 30 reports two different visualization perspectives of the results. The first one (top) shows the results as a 3-D plot in which we can identify the resulting Pareto frontiers. The same results are also visualized in 2-D plots (bottom), where each pair of objectives is combined into a product: energy × delay (EDP), area × delay (ADP), and energy × area (EAP), respectively. By reducing the dimensionality of the space, only a smaller number of solutions remain on the Pareto frontiers, which are shown by the gray lines. For example, the solution that is analyzed as a case study in Section VII-E, and marked with a gray circle in Fig. 30, lies on the Pareto frontier only in the last two plots, i.e., the ADP versus energy tradeoff (pointer ⑥) and the EAP versus delay

E. Case Study: Synthesis of a Pareto-Optimal Solution

As a case study, we synthesized the complete PE array of the selected solution (highlighted with a gray circle in Fig. 30) using the Synopsys Design Compiler. The microarchitectural structure of the PE array is shown in Fig. 32. Note that, since the solution has one row, the structure of the PE differs from the generic PE (see Fig. 24) in two aspects.
1) Since there is only one row, the Weight1 Reg. is not needed because there is no reason to store the weight values for the subsequent rows.
2) Since there is only one row, the input partial sums are null. Therefore, all the relative connections and additions are omitted.
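The pairwise products used to flatten the three-objective space into the 2-D plots can be written out explicitly. This is a trivial illustrative helper; the function and key names are ours, not the paper's:

```python
def combined_metrics(energy, area, delay):
    """Collapse the (energy, area, delay) objectives of a configuration
    into the pairwise products used in the 2-D views: the energy-delay
    product (EDP), area-delay product (ADP), and energy-area product (EAP)."""
    return {
        "EDP": energy * delay,
        "ADP": area * delay,
        "EAP": energy * area,
    }
```

A configuration on the 3-D Pareto frontier need not remain Pareto-optimal for every product, which is why the gray-circled solution appears on only two of the three 2-D frontiers.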
[26] M. Capra, B. Bussolino, A. Marchisio, M. Shafique, G. Masera, and M. Martina, “An updated survey of efficient hardware architectures for accelerating deep convolutional neural networks,” Future Internet, vol. 12, no. 7, p. 113, Jul. 2020.
[27] S. Sharify et al., “Laconic deep learning inference acceleration,” in Proc. 46th Int. Symp. Comput. Archit., Jun. 2019, pp. 304–317.
[28] J. Li et al., “SqueezeFlow: A sparse CNN accelerator exploiting concise convolution rules,” IEEE Trans. Comput., vol. 68, no. 11, pp. 1663–1677, Nov. 2019.
[29] A. Marchisio, V. Mrazek, M. A. Hanif, and M. Shafique, “ReD-CaNe: A systematic methodology for resilience analysis and design of capsule networks under approximations,” in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2020, pp. 1205–1210.
[30] M. Capra, B. Bussolino, A. Marchisio, G. Masera, M. Martina, and M. Shafique, “Hardware and software optimizations for accelerating deep neural networks: Survey of current trends, challenges, and the road ahead,” IEEE Access, vol. 8, pp. 225134–225180, 2020.
[31] M. A. Hanif and M. Shafique, “A cross-layer approach towards developing efficient embedded deep learning systems,” Microprocess. Microsyst., 2021, Art. no. 103609, doi: 10.1016/j.micpro.2020.103609.
[32] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[33] S. Houben, J. Stallkamp, J. Salmen, M. Schlipsing, and C. Igel, “Detection of traffic signs in real-world images: The German traffic sign detection benchmark,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Aug. 2013, pp. 1–8.
[34] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. Int. Conf. Mach. Learn. (ICML), 2010, pp. 1–8.
[35] A. D. Kumar, “Novel deep learning model for traffic sign detection using capsule networks,” CoRR, vol. abs/1805.04424, May 2018.
[36] A. Paszke et al., “Automatic differentiation in PyTorch,” in Proc. NIPS Autodiff Workshop, 2017, pp. 1–4.
[37] A. Marchisio et al., “FasTrCaps: An integrated framework for fast yet accurate training of capsule networks,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2020, pp. 1–8.
[38] A. Marchisio, M. A. Hanif, M. T. Teimoori, and M. Shafique, “CapStore: Energy-efficient design and management of the on-chip memory for CapsuleNet inference accelerators,” CoRR, vol. abs/1902.01151, Apr. 2019.
[39] A. Marchisio, V. Mrazek, M. A. Hanif, and M. Shafique, “DESCNet: Developing efficient scratchpad memories for capsule network hardware,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., early access, Oct. 13, 2020, doi: 10.1109/TCAD.2020.3030610.
[40] T. Glasmachers, “A fast incremental BSP tree archive for non-dominated points,” in Evolutionary Multi-Criterion Optimization. Berlin, Germany: Springer-Verlag, 2017, doi: 10.1007/978-3-319-54157-0_18.
[41] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Trans. Evol. Comput., vol. 6, no. 2, pp. 182–197, Apr. 2002.
[42] S. Li, K. Chen, J. H. Ahn, J. B. Brockman, and N. P. Jouppi, “CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2011, pp. 694–701.
[43] T. Jansen, Evolutionary Algorithms and Other Randomized Search Heuristics. Berlin, Germany: Springer, 2013.
[44] A. Marchisio, M. A. Hanif, and M. Shafique, “CapsAcc: An efficient hardware accelerator for CapsuleNets with data reuse,” in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2019, pp. 964–967.

Alberto Marchisio (Graduate Student Member, IEEE) received the B.Sc. degree in electronic engineering and the M.Sc. degree in electronic engineering (electronic systems) from the Politecnico di Torino, Turin, Italy, in October 2015 and April 2018, respectively. He is currently working toward the Ph.D. degree at the Computer Architecture and Robust Energy-Efficient Technologies (CARE-Tech.) Lab, Institute of Computer Engineering, Technische Universität Wien (TU Wien), Vienna, Austria, under the supervision of Prof. Dr. Muhammad Shafique.
His main research interests include hardware and software optimizations for machine learning, brain-inspired computing, VLSI architecture design, emerging computing technologies, robust design, and approximate computing for energy efficiency. He received the honorable mention at the Italian National Finals of the Maths Olympic Games in 2012 and the Richard Newton Young Fellow Award in 2019.

Vojtech Mrazek (Member, IEEE) received the Ing. and Ph.D. degrees in information technology from the Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic, in 2014 and 2018, respectively.
He was a Visiting Post-Doctoral Researcher with the Department of Informatics, Institute of Computer Engineering, Technische Universität Wien (TU Wien), Vienna, Austria, from 2018 to 2019. He is currently a Researcher with the Faculty of Information Technology, Brno University of Technology. He has authored or coauthored over 30 conference papers and journal articles focused on approximate computing and evolvable hardware. His research interests are approximate computing, genetic programming, and machine learning.
Dr. Mrazek received several awards for his research in approximate computing, including the Joseph Fourier Award in 2018 for research in computer science and engineering.

Muhammad Abdullah Hanif (Graduate Student Member, IEEE) received the B.Sc. degree in electronic engineering from the Ghulam Ishaq Khan Institute of Engineering Sciences and Technology (GIKI), Topi, Pakistan, in 2011, and the M.Sc. degree in electrical engineering with a specialization in digital systems and signal processing from the School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad, Pakistan, in 2016. He is currently working toward the Ph.D. degree in computer engineering at Technische Universität Wien (TU Wien), Vienna, Austria, under the supervision of Prof. M. Shafique.
He was also a Research Associate with the Vision Processing Lab, Information Technology University, Lahore, Pakistan, and a Lab Engineer with GIKI, Pakistan. He is currently a University Assistant with the Department of Informatics, Institute of Computer Engineering, TU Wien. His research interests are in brain-inspired computing, machine learning, approximate computing, computer architecture, energy-efficient design, robust computing, system-on-chip design, and emerging technologies.
Mr. Hanif was a recipient of the President’s Gold Medal for outstanding academic performance during the M.Sc. degree.

Muhammad Shafique (Senior Member, IEEE) received the Ph.D. degree in computer science from the Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany, in 2011.
Afterward, he established and led a highly recognized research group at KIT for several years and conducted impactful collaborative Research and Development activities across the globe. In October 2016, he joined the Faculty of Informatics, Institute of Computer Engineering, Technische Universität Wien (TU Wien), Vienna, Austria, as a Full Professor of computer architecture and robust, energy-efficient technologies. Since September 2020, he has been with the Division of Engineering, New York University Abu Dhabi (NYU-AD), Abu Dhabi, United Arab Emirates. He is also a Global Network Faculty with the NYU Tandon School of Engineering, New York, NY, USA. He holds one U.S. patent and has (co)authored six books, more than ten book chapters, and over 300 articles in premier journals and conferences. His research interests are in design automation and system-level design for brain-inspired computing, AI and machine learning hardware, wearable healthcare devices and systems, autonomous systems, energy-efficient systems, robust computing, hardware security, emerging technologies, field-programmable gate arrays (FPGAs), multiprocessor systems-on-chip (MPSoCs), and embedded systems. His research has a special focus on cross-layer analysis, modeling, design, and optimization of computing and memory systems. The researched technologies and tools are deployed in application use cases from Internet-of-Things (IoT), smart cyber-physical systems (CPS), and ICT for Development (ICT4D) domains.
Dr. Shafique received the 2015 ACM/SIGDA Outstanding New Faculty Award, the AI 2000 Chip Technology Most Influential Scholar Award in 2020, six gold medals, and several best paper awards and nominations at prestigious conferences. He has served as the PC Chair, the General Chair, the Track Chair, and a PC Member for several prestigious IEEE/ACM conferences. He has given several keynotes, invited talks, and tutorials, as well as organized many special sessions at premier venues.