Received February 7, 2020, accepted February 25, 2020, date of publication February 28, 2020, date of current version March 11, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.2977087
ABSTRACT With the rapid advancement of the internet of things, systems that include sensing, communication, and computation are becoming ubiquitous. The systems built with these technologies are increasingly complex and therefore require more automation and intelligent decision-making, while often interacting with humans. It is thus critical that such interactions run smoothly in real time and that the automation strategies do not introduce significant delays, usually no larger than 100 milliseconds, roughly the duration of a blink of a human eye. Pushing the deployment of the algorithms onto embedded devices, closer to where the data is collected, in order to avoid such delays is one of the main motivations of edge computing. Further advantages of edge computing include improved reliability and data privacy management. This work showcases the possibilities of different embedded platforms that are often used as edge computing nodes: embedded microcontrollers, embedded microprocessors, FPGAs and embedded GPUs. The embedded solutions are compared with respect to their cost, complexity, energy consumption and computing speed, establishing valuable guidelines for designers of complex systems that need to make use of edge computing. Furthermore, this paper shows the possibilities of hardware-agnostic programming using OpenCL, illustrating the price paid in efficiency when software can be easily deployed on different hardware platforms.
INDEX TERMS Internet of Things, edge computing, FPGA, system on chip, neural network.
FIGURE 2. Historical introduction to edge computing: enabling technologies and applications [3].
There are several reasons to do so, such as lower latencies or lower energy consumption, which are illustrated in the remainder of this section.

A. BANDWIDTH SHORTAGE
By 2025, it is expected that there will be more than 40 billion connected IoT devices generating 79.4 zettabytes of data, following an annual growth of 28.7% [23]. Most data will be created by surveillance applications, but new applications in the industrial and medical areas will significantly increase the available data. Despite advances in communications, this will greatly exceed communication capabilities, highlighting the importance of edge computing.

Recent works have shown that it is possible to achieve intelligent decision-making even on resource-constrained embedded devices. For example, optimization-based techniques used for energy management systems were deployed on microcontrollers [24], as well as power electronic controllers on FPGAs [25]. This avoids the necessity to continuously send all data and decisions to the cloud, limiting the communication to less frequent updates and upgrades.

B. SECURITY AND PRIVACY
Today, millions of embedded devices are used in safety- and security-critical applications such as industrial control systems, modern vehicles, and critical infrastructure [26]. IoT devices are constantly producing, consuming and exchanging critical data, which makes the security of the IoT system very important [7]. Edge computing can help the security and privacy aspects of IoT applications from several viewpoints [27], [28]. First, if less data is sent over the network, less data is vulnerable to unsecured communication. Additionally, more processing power at the edge provides more encryption and decryption capacity, which can improve security. For applications that handle sensitive personal data, as in the field of smart homes, increasing the amount of edge computing can reduce the amount of data that users need to share with cloud services, giving the user greater control over data ownership and data sovereignty [29].

C. RESPONSE TIME
The latency of a cloud-based system is affected by several different factors, including the effective network bandwidth and the computation power available at the cloud server [30]. This results in an unpredictable response time of the application. In many applications, however, the reliability of the system depends on a specific and limited response time.

III. PROPOSED BENCHMARK: DEEP LEARNING
Comparing the performance of different hardware platforms and different software implementations is a challenging task. In order to obtain fair and comparable results, the evaluation of a deep neural network is chosen as a benchmark. Neural networks already play a significant role in intelligent systems and, because of recent developments in deep learning, it is expected that their role will become even more relevant in the future.

In this work, we used a fully connected deep neural network with three neurons in the input layer and two neurons in the output layer. This structure corresponds to the neural network necessary to represent a predictive controller of a half-bridge power inverter for induction heating, as presented in [31]. The control of power electronics, e.g., in the field of power systems or smart grids, is an example where cloud computing is not possible, because intelligent decisions based on sensor information need to be taken in the range of microseconds.

A feedforward deep neural network is defined as a composition of several functions that compute the neural network function N : R^{n_x} → R^{n_y}, and can be written as:

    N(x; θ, M, L) = f_{L+1} ∘ g_L ∘ f_L ∘ ... ∘ g_1 ∘ f_1(x),    (1)

where the input of the network is x ∈ R^{n_x} and the output of the network is y ∈ R^{n_y}. M is the number of neurons in each hidden layer and L is the number of hidden layers. Each hidden layer applies an affine function:

    f_l(ξ_{l-1}) = W_l ξ_{l-1} + b_l,    (2)
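To make (1) and (2) concrete, the building blocks of such a network can be sketched in C. The choice of ReLU for the activations g_l and all sizes and values below are illustrative assumptions; the paper does not fix them in this excerpt.

```c
/* Minimal sketch of the layers of Eqs. (1)-(2): each hidden layer
   applies f_l(xi) = W_l * xi + b_l followed by an activation g_l
   (ReLU assumed here). Sizes and weights are illustrative only. */
#include <stddef.h>

/* One affine layer: out = W * in + b, with W stored row-major. */
static void affine(const float *W, const float *b,
                   const float *in, float *out,
                   size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r) {
        float acc = b[r];
        for (size_t c = 0; c < cols; ++c)
            acc += W[r * cols + c] * in[c];
        out[r] = acc;
    }
}

/* Element-wise activation g_l, here ReLU. */
static void relu(float *v, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (v[i] < 0.0f)
            v[i] = 0.0f;
}
```

A forward pass is then the alternating composition of affine and relu calls, ending with the affine map f_{L+1} of the output layer.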
FIGURE 4. Evaluated platforms. From left to right: microcontroller, embedded processor, FPGA SoC, and embedded GPU.

TABLE 1. Specifications of the general-purpose CPU device.

TABLE 3. Specifications of the microcontroller device.

TABLE 4. Specifications of the embedded processor device.

B. MICROCONTROLLER
Because of their simplicity and efficiency, microcontrollers have always been used as processing units in embedded applications like edge computing. For example, in [33], a microcontroller design has been introduced for IoT edge computing and its performance and power consumption have been measured. Due to advances in microelectronics, we now have access to relatively high-performance and low-power microcontrollers. In this work, the EFM32 Giant Gecko board manufactured by Silicon Labs has been used.

The microcontroller unit of this board is based on the ARM Cortex-M3 architecture. This is a mid-range, low-power microcontroller, so it is a suitable example for IoT edge devices. The specifications of this microcontroller are presented in Table 3.

The performance results for the neural network inference application on this microcontroller are shown in FIGURE 7 (a). Standard microcontrollers have only a few hundred kilobytes of RAM. Because of this, it was not possible to evaluate large neural networks, and this is the only hardware platform where a smaller network with only 10 layers has been used. Small networks can still be used to perform advanced decision-making strategies, as shown in [9] for a smart building energy management system. In addition, the best performance achieved with this microcontroller and the peak power consumption are shown in Table 7.

Because the specific microcontroller used in this work does not have a dedicated floating-point unit, it must emulate all the floating-point operations at the software level.

C. EMBEDDED PROCESSOR
A possible solution to alleviate the problem of limited performance and memory typical of low-cost microcontrollers is to use embedded processors. Embedded processors can run at significantly higher clock frequencies (up to 2 GHz) and can have memory capacities of up to several GBs. In this work, the neural network inference problem is evaluated on a Raspberry Pi 3 B+ embedded processor. The specifications of the Raspberry Pi 3 B+ board used in this work are shown in Table 4.

Embedded processors usually need operating systems to handle some system-level tasks like memory management and process handling. This also adds to the complexity of application development with embedded processors. The performance results of the implementation of the neural network inference with an embedded processor are shown in FIGURE 7 (b). The peak performance and power consumption of the embedded processor in the neural network inference application are also shown in Table 7.

D. FPGA
FPGA-SoCs were introduced in order to combine the flexibility of software design with the performance gain of FPGAs. In this work, a Xilinx Zynq device has been chosen as the target FPGA-SoC. The evaluation board used in this work is the Zybo Z7 made by Digilent. The specifications of the Zybo Z7 board are given in Table 5.

To reach a more complete conclusion about FPGA-SoC implementation, we have evaluated all major design methodologies for FPGA-SoCs. The structure of a typical system in an FPGA-SoC is shown in FIGURE 5. The main application is executed on the software side (shown in the figure as the Hard Processing System) and the computation-intensive parts are implemented on the FPGA side (shown in the figure as Custom Hardware). The key difference between the various implementation methodologies for FPGA-SoCs lies in the different approaches used to design and implement the accelerator part on the FPGA side and how to integrate it with the software side.
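To give an idea of the software/hardware integration in such an FPGA-SoC, the following C sketch shows how the processing system might drive a memory-mapped accelerator. The register layout, field names and polling protocol are purely hypothetical and not taken from the paper; in a real Zynq design the register block would be reached via mmap() of the address assigned in Vivado, while here a local struct stands in for it.

```c
/* Sketch of a software-side driver for a hypothetical matrix-vector
   accelerator mapped into the processor's address space.
   All register names and the protocol are illustrative assumptions. */
#include <stdint.h>

typedef struct {
    volatile uint32_t ctrl;      /* bit 0: start, bit 1: done       */
    volatile uint32_t n_rows;    /* matrix dimensions               */
    volatile uint32_t n_cols;
    volatile uint32_t src_addr;  /* physical address of W and x     */
    volatile uint32_t dst_addr;  /* physical address of the result  */
} accel_regs_t;

/* In a real design this would come from mmap() of the accelerator's
   base address; a local struct stands in for the register block. */
static accel_regs_t fake_regs;

/* Program the operand addresses and sizes, then set the start bit. */
static void accel_start(accel_regs_t *r, uint32_t rows, uint32_t cols,
                        uint32_t src, uint32_t dst)
{
    r->n_rows = rows;
    r->n_cols = cols;
    r->src_addr = src;
    r->dst_addr = dst;
    r->ctrl = 1u;
}

/* Poll the done bit before reading back the result. */
static int accel_done(const accel_regs_t *r)
{
    return (int)((r->ctrl >> 1) & 1u);
}
```

The software side writes the operand addresses, sets the start bit, and polls the done bit before reading back the result, which is the integration pattern the design methodologies below automate to different degrees.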
FIGURE 5. System structure of a typical FPGA-SoC system.

To accelerate an application with the FPGA-SoC, first the bottleneck of the application must be determined. The bottleneck of the application is the part that needs a significant amount of computation and where the majority of the execution time is spent. To find this, the application must first be profiled.

One way of profiling an application is to run it and record the execution time of each function. Then, by comparing these execution times, we can decide which part should be accelerated on the FPGA side. In neural network inference, the matrix-vector multiplication that calculates the results of every layer of the neural network is the bottleneck of the application. For this reason, in all FPGA-SoC implementations, we consider the hardware accelerator to be a matrix-vector multiplier co-processor.

In our work, we have followed three main approaches to design the system on the FPGA-SoC platform. These approaches are explained as follows.

1) REGISTER TRANSFER LEVEL (RTL)
There are many research efforts to accelerate the neural network inference task with FPGAs. For example, in [34], PIE has been introduced as a novel pipelined implementation to reach parallelism between layers in addition to in-layer parallelism. Also, in [35], a layer-based structure has been introduced to perform the inference task in CNNs. Both these works have reported an improvement in performance and

2) HIGH LEVEL SYNTHESIS (HLS)
In this approach, the accelerator core is designed and described with a high-level programming language like C/C++ [25], [36]. Then, this high-level description is synthesized into a lower-level hardware system. In this work, this is done through the Xilinx HLS design flow. This gives the opportunity to design the accelerator core in the C/C++ language and skip the complexity of RTL design and its verification.

Several works have considered this type of implementation for a neural network. For example, in [37] a pipelined architecture for a CNN has been implemented using the Xilinx HLS compiler. The goal of that design was to use loop unrolling and pipelining techniques to achieve the highest possible hardware utilization and performance.

The integration of the accelerator core with the software side must still be done manually within the Vivado System Integrator. However, because of the limited customization capabilities, it is expected that the HLS-implemented core is not as efficient as the RTL design, both in performance and in area usage.

The most important directives given to the HLS compiler to customize the hardware design as much as possible are the following. The directive #pragma HLS PIPELINE II=1 is used to pipeline the main matrix-vector multiplication with an initiation interval equal to 1. #pragma HLS INLINE is used to flatten the hierarchical structure and also create the opportunity for resource sharing between functions in the code.
FIGURE 7. Performance results for the studied platforms: (a) microcontroller, (b) embedded processor, (c) FPGA RTL, HLS and SDSoC implementations, and (d) Jetson TX2 and GeForce GPUs.
TABLE 7. Summary of performance, power consumption and design complexity for the evaluated platforms.
with a non-embedded, more powerful GPU, the performance of the device is comparable (see FIGURE 7 (d)), with a significant decrease in power consumption, as can be seen in Table 7.

F. PLATFORM INDEPENDENT IMPLEMENTATION
One big challenge for IoT developers moving their designs toward edge computing is linked to platform-dependent implementation. Designing a system based on an embedded GPU is completely different from implementing the same system based on an FPGA-SoC. The details of each hardware platform are very diverse, and they must be considered to obtain an optimized system.

To deal with this diversity in design and implementation methods, some solutions have been introduced. In this work, the use of OpenCL as a unified design method for all platforms has been evaluated. An OpenCL neural network inference application has been tested on the three different platforms used in this work that currently support OpenCL (Intel CPU, GeForce GPU and FPGA-SoC).

OpenCL is a heterogeneous programming language that is designed to be platform independent. However, it is still not possible to have exactly the same OpenCL code running on all platforms.

An OpenCL program consists of two different parts: the host program, which runs on the main processor, and the device program (kernel code), which is accelerated on the co-processor, which can be a GPU, an FPGA or simply another CPU. Because of the different architectures of co-processor devices, it is not possible to get the same performance gain from them with the same kernel code. For example, in FPGAs pipelining is very important, but for GPUs data parallelism is more vital.

The performance results for each platform running the OpenCL application are listed in Table 7. As expected, the performance of the OpenCL program is significantly lower than that of the individually optimized application for each platform. However, the much lower complexity of the implementation may make it attractive for edge computing and IoT development.

V. DISCUSSION
This section discusses the main conclusions and observations that can be obtained from the results presented previously. While the exact performance metrics might depend on the application at hand, it is expected that these results can serve as an indicator of the estimated performance of the different hardware platforms and the different types of implementation, which can be used as guidelines by non-expert software/hardware designers (see the summary in FIGURE 8).

A. PERFORMANCE
For heterogeneous computing platforms, it is very important to consider the concept of the computation-to-communication (CTC) ratio when analyzing the obtained results. As explained in [37], with higher CTC ratios, more performance gain is achievable until the boundaries of the system's memory capabilities are reached.

The considered embedded IoT test application has a low CTC ratio. In each layer of our test neural network application, we perform a matrix-vector multiplication and a vector-vector addition. The CTC ratio will be significantly larger for applications that use large input dimensions, e.g., in the field of image recognition. This means that the I/O bandwidth can rapidly become the main bottleneck of the implementation, reducing the advantages of hardware accelerators.

Among the embedded devices evaluated in this work, microcontrollers have the weakest computing performance, in the order of tens of kFLOPS. Most low-cost, low-energy microcontrollers do not have a dedicated floating-point unit and have to emulate those operations at the software level. Changing the accuracy of the computations from floating-point to fixed-point can improve their performance significantly. In this work, a 32-bit fixed-point operation instead of floating point resulted in three times lower computing times for the same network.

Embedded processors use an operating system to handle tasks like scheduling and memory management. Still, because of their more powerful processors and the larger memory available to them, these processors deliver significantly more performance than microcontrollers.

For further increases in performance on embedded devices, heterogeneous implementations can be considered. FPGA-SoCs are suitable platforms for edge computing and bring a wide variety of implementation options. The application type is very important when choosing a proper implementation method for FPGA-SoCs, especially in cases where the input/output bandwidth between the accelerator and the software, or the optimization of the digital resources, is the main bottleneck. Automatic tools such as HLS significantly reduce the design time, but can also lead to important performance decreases when compared to a very detailed RTL implementation. For instance, whereas the RTL implementation can easily achieve initiation intervals equal to one for the FPU MAC unit, i.e., one new data item each clock cycle, HLS or SDSoC implementations require more cycles. Although in other applications this issue can be addressed by interleaving the input array to the core, this is not useful in the considered edge computing application, where task parallelism rather than data parallelism is required. The parallelization capabilities of embedded GPUs can be easily leveraged to obtain very high performance, especially in the case of large problems.

B. POWER CONSUMPTION
For embedded devices that are going to be used at the edge of IoT systems, power consumption is a critical performance indicator. In some applications, the edge device must operate on battery power. As expected, low-power microcontrollers have the lowest power consumption among all the platforms. However, to allow a fair comparison between platforms, the performance-per-power ratio is evaluated for all platforms and shown in Table 7.

FIGURE 8. Comparative chart.

As can be seen, the power consumed by the embedded devices is much less than the power consumption of the general-purpose CPUs and GPUs. Among embedded devices, embedded GPUs are promising alternatives for complex calculations at the edge, as they have a significantly better performance/power ratio, as can be seen for the Jetson TX2.

To measure the power of the Jetson TX2 and the EFM32 microcontroller, the internal measurement capabilities that the boards offer were used. For the FPGA board, the overall power was measured using an external microcontroller and a shunt resistor. The power consumption reported in Table 7 for the general-purpose platforms is the one provided by the hardware specifications given by the vendor.

FIGURE 9. Computational roof detailed in [37].

C. COMPLEXITY OF DESIGN
More complexity in design means more development time, which results in a longer time to market and higher development costs. While the flexibility in software design is usually high, this is not the case for hardware design. In order to combine the positive features of each side, techniques like heterogeneous programming and hardware-software co-design have been introduced.

In this work, heterogeneous programming was evaluated using OpenCL and CUDA on FPGAs and GPUs. CUDA resulted in better performance for Nvidia's GPUs, but the main advantage of OpenCL is that it is hardware independent and usable for both GPUs and FPGAs.

Also, HLS tools are becoming popular for hardware design, especially for FPGAs. New approaches in HLS techniques have enabled developers to create hardware-accelerated applications without writing a single line of HDL code. They need much less design time, but they are still limited in design flexibility. However, many designs can be tested easily, making it possible to perform an automated sweep of the design space to obtain near-optimal hardware designs, as done in [29].
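The floating-point to fixed-point conversion mentioned in the performance discussion can be sketched as follows. The Q16.16 split is an illustrative assumption, since the paper only states that 32-bit fixed-point operations were used.

```c
/* Sketch of 32-bit fixed-point arithmetic (Q16.16 format assumed)
   for the multiply-accumulate at the core of the benchmark network.
   On an FPU-less core, these integer operations avoid the cost of
   software-emulated floating point. */
#include <stdint.h>

#define Q 16  /* fractional bits: Q16.16 */

typedef int32_t fix32;

static fix32 fix_from_float(float f) { return (fix32)(f * (1 << Q)); }
static float fix_to_float(fix32 x)   { return (float)x / (1 << Q); }

/* Fixed-point multiply: widen to 64 bits, then shift back. */
static fix32 fix_mul(fix32 a, fix32 b)
{
    return (fix32)(((int64_t)a * b) >> Q);
}

/* Multiply-accumulate: acc += a * b, all in Q16.16. */
static fix32 fix_mac(fix32 acc, fix32 a, fix32 b)
{
    return acc + fix_mul(a, b);
}
```

On an FPU-less core such as the Cortex-M3 used in this work, such integer operations replace software-emulated floating-point arithmetic, which is the source of the reported speed-up.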
D. COST
Cost is an important decision factor that will determine, in many applications, the technology to be deployed. Nowadays, the cheapest available technology is the microcontroller, with a wide range of families and manufacturers and prices starting at just a few cents. However, as has been explained, their edge computing capabilities are limited. In recent years, the proliferation of embedded CPUs such as the Raspberry Pi and others has made possible the implementation of inexpensive stand-alone solutions within the range of tens of dollars, making them an excellent option for the low-to-mid cost range or for large deployments. Finally, when high computing capabilities are required, both GPUs and FPGAs are candidates. Despite the high computing capabilities and parallel processing of FPGAs, advances in microelectronics combined with optimized GPU architectures have given the latter the best performance-to-cost and performance-to-energy ratios.

TABLE 8. Computing alternatives with a running time of 100 ms.

E. COMPUTING IN THE BLINK OF AN EYE
A common feature of most edge devices is their short response time. Many of these devices are used to interact with human users. To provide users with the required performance, the possibility of human-machine cooperation, or a pleasant user experience, these devices must be able to respond within a time interval short enough not to be noticeable by the user. To illustrate the possibilities of each platform for real-time human-machine interaction, Table 8 presents the largest neural network that can be evaluated in the approximate time of an eye blink (100 ms), including also the necessary power. The size of the neural network with three inputs and two outputs can be measured by the number of layers and neurons in each layer, or by the total number of parameters that define the neural network. All this information can be seen in Table 8, which shows that embedded GPUs can deal with the largest networks.

VI. CONCLUSION
Rapid developments of the internet of things have made sensing, communication and computation ubiquitous. These technologies are increasingly complex and, therefore, require more automation and intelligent decision-making. When interacting with humans, low latency becomes essential, being the focus of this work.

In this paper, some of the most relevant technologies available to designers to implement edge computing systems have been reviewed and discussed, including microcontrollers, embedded processors, FPGAs, and embedded GPUs. In order to compare and discuss these technologies, a relevant deep neural network case study has been selected and implemented on the different platforms and development tools. The obtained results for each platform have been compared in terms of processing power, energy, latency, design complexity, and cost. Nowadays, microcontrollers offer the most affordable solution for low-complexity and low-cost systems, whereas embedded processors are excellent alternatives when additional processing power is required. Modern FPGAs also offer high performance, at the cost of additional complexity. This can be alleviated by using HLS or SDSoC design methodologies, at the cost of reduced system performance. Finally, modern GPU architectures offer not only the best computation capabilities but also the best computation-to-energy ratio, being the hardware platform of choice for highly demanding edge computing systems.

REFERENCES
[1] E. Baccarelli, P. G. V. Naranjo, M. Scarpiniti, M. Shojafar, and J. H. Abawajy, "Fog of everything: Energy-efficient networked computing architectures, research challenges, and a case study," IEEE Access, vol. 5, pp. 9882-9910, 2017.
[2] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, p. 436, May 2015.
[3] M. Satyanarayanan, "The emergence of edge computing," Computer, vol. 50, no. 1, pp. 30-39, Jan. 2017.
[4] F. Lamnabhi-Lagarrigue, "Systems & control for the future of humanity, research agenda: Current and future roles, impact and grand challenges," Annu. Rev. Control, vol. 43, pp. 1-64, Jan. 2017.
[5] W. Shi and S. Dustdar, "The promise of edge computing," Computer, vol. 49, no. 5, pp. 78-81, May 2016.
[6] W. Yu, F. Liang, X. He, W. G. Hatcher, C. Lu, J. Lin, and X. Yang, "A survey on the edge computing for the Internet of Things," IEEE Access, vol. 6, pp. 6900-6919, 2018.
[7] R. Lu, K. Heung, A. H. Lashkari, and A. A. Ghorbani, "A lightweight privacy-preserving data aggregation scheme for fog computing-enhanced IoT," IEEE Access, vol. 5, pp. 3302-3312, 2017.
[8] S.-J. Wu, N. Gebraeel, M. A. Lawley, and Y. Yih, "A neural network integrated decision support system for condition-based optimal predictive maintenance policy," IEEE Trans. Syst., Man, Cybern., A, Syst. Humans, vol. 37, no. 2, pp. 226-236, Mar. 2007.
[9] B. Karg and S. Lucia, "Deep learning-based embedded mixed-integer model predictive control," in Proc. Eur. Control Conf. (ECC), Jun. 2018, pp. 2075-2080.
[10] R. Feraund, O. J. Bernier, J.-E. Viallet, and M. Collobert, "A fast and accurate face detector based on neural networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 1, pp. 42-53, Jan. 2001.
[11] J. E. Stone, D. Gohara, and G. Shi, "OpenCL: A parallel programming standard for heterogeneous computing systems," Comput. Sci. Eng., vol. 12, no. 3, pp. 66-73, May 2010.
[12] S. Mittal, "A survey on optimized implementation of deep learning models on the NVIDIA Jetson platform," J. Syst. Archit., vol. 97, pp. 428-442, Jan. 2019.
[13] J. Gu, Y. Liu, Y. Gao, and M. Zhu, "OpenCL caffe: Accelerating and enabling a cross platform machine learning framework," in Proc. ACM Int. Conf., vols. 19-21, 2016, pp. 1-5.
[14] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang, "MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems," 2015, arXiv:1512.01274. [Online]. Available: http://arxiv.org/abs/1512.01274
[15] X. Zhang, Y. Wang, and W. Shi, "PCamp: Performance comparison of machine learning packages on the edges," in Proc. USENIX Workshop Hot Topics Edge Comput. (HotEdge), 2018, vol. 2, no. 1, pp. 1-6.
[16] M. Papadonikolakis, C.-S. Bouganis, and G. Constantinides, "Performance comparison of GPU and FPGA architectures for the SVM training problem," in Proc. Int. Conf. Field-Program. Technol., Dec. 2009, pp. 388-391.
[17] T. Hao et al., "Edge AIBench: Towards comprehensive end-to-end edge computing benchmarking," in Benchmarking, Measuring, and Optimizing (Lecture Notes in Computer Science), vol. 11459, C. Zheng and J. Zhan, Eds. Cham, Switzerland: Springer, 2019. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-030-32813-9_3#citeas
[18] C.-H. Hong and B. Varghese, "Resource management in Fog/Edge computing," ACM Comput. Surv., vol. 52, no. 5, pp. 1-37, Sep. 2019.
[19] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in Proc. IEEE Int. Symp. Workload Characterization (IISWC), Oct. 2009, pp. 44-54.
[20] M. Gusev and S. Dustdar, "Going back to the roots—The evolution of edge computing, an IoT perspective," IEEE Internet Comput., vol. 22, no. 2, pp. 5-15, Mar. 2018.
[21] D. Georgakopoulos, P. P. Jayaraman, M. Fazia, M. Villari, and R. Ranjan, "Internet of Things and edge cloud computing roadmap for manufacturing," IEEE Cloud Comput., vol. 3, no. 4, pp. 66-73, Jul. 2016.
[22] K. Zhang, Y. Mao, S. Leng, Q. Zhao, L. Li, X. Peng, L. Pan, S. Maharjan, and Y. Zhang, "Energy-efficient offloading for mobile edge computing in 5G heterogeneous networks," IEEE Access, vol. 4, pp. 5896-5907, 2016.
[23] Cisco, "Cisco visual networking index: Forecast and trends, 2017-2022," Cisco, San Jose, CA, USA, White Paper, 2019.
[24] G. Serale, M. Fiorentini, A. Capozzoli, D. Bernardini, and A. Bemporad, "Model predictive control (MPC) for enhancing building and HVAC system energy efficiency: Problem formulation, applications and opportunities," Energies, vol. 11, no. 3, p. 631, Mar. 2018.
[25] S. Lucia, D. Navarro, O. Lucia, P. Zometa, and R. Findeisen, "Optimized FPGA implementation of model predictive control for embedded systems using high-level synthesis tool," IEEE Trans. Ind. Informat., vol. 14, no. 1, pp. 137-145, Jan. 2018.
[26] M. Mukherjee, R. Matam, L. Shu, L. Maglaras, M. A. Ferrag, N. Choudhury, and V. Kumar, "Security and privacy in fog computing: Challenges," IEEE Access, vol. 5, pp. 19293-19304, 2017.
[27] J. A. Stankovic, "Research directions for the Internet of Things," IEEE Internet Things J., vol. 1, no. 1, pp. 3-9, Feb. 2014.
[28] A.-R. Sadeghi, C. Wachsmann, and M. Waidner, "Security and privacy challenges in industrial Internet of Things," in Proc. 52nd ACM/EDAC/IEEE Des. Autom. Conf. (DAC), Jun. 2015, pp. 1-6.
[29] K. Irion, "Government cloud computing and national data sovereignty," Policy Internet, vol. 4, nos. 3-4, pp. 40-71, Jan. 2013.
[30] X. Meng, W. Wang, and Z. Zhang, "Delay-constrained hybrid computation offloading with cloud and fog computing," IEEE Access, vol. 5, pp. 21355-21367, 2017.
[31] S. Lucia, D. Navarro, B. Karg, H. Sarnago, and O. Lucia, "Deep learning-based model predictive control for resonant power converters," IEEE Trans. Ind. Informat., to be published.
[32] B. Karg and S. Lucia, "Efficient representation and approximation of model predictive control laws via deep learning," 2018, arXiv:1806.10644. [Online]. Available: http://arxiv.org/abs/1806.10644
[33] A. Pullini, D. Rossi, I. Loi, A. Di Mauro, and L. Benini, "Mr. Wolf: A 1 GFLOP/s energy-proportional parallel ultra low power SoC for IoT edge processing," in Proc. IEEE 44th Eur. Solid State Circuits Conf. (ESSCIRC), Sep. 2018, pp. 274-277.
[34] Y. Zhao, Q. Yu, X. Zhou, X. Zhou, X. Li, and C. Wang, "PIE: A pipeline energy-efficient accelerator for inference process in deep neural networks," in Proc. IEEE 22nd Int. Conf. Parallel Distrib. Syst. (ICPADS), Dec. 2016, pp. 1067-1074.
[35] C. Huang, S. Ni, and G. Chen, "A layer-based structured design of CNN on FPGA," in Proc. IEEE 12th Int. Conf. ASIC (ASICON), Oct. 2017, pp. 1037-1040.
[36] O. Jimenez, O. Lucia, I. Urriza, L. A. Barragan, D. Navarro, and V. Dinavahi, "Implementation of an FPGA-based online hardware-in-the-loop emulator using high-level synthesis tools for resonant power converters applied to induction heating appliances," IEEE Trans. Ind. Electron., vol. 62, no. 4, pp. 2206-2214, Apr. 2015.
[37] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, "Optimizing FPGA-based accelerator design for deep convolutional neural networks," in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays (FPGA), Monterey, CA, USA, 2015, pp. 161-170.

MOHAMMAD HOSSEIN GHASEMI was born in Shiraz, Iran, in 1993. He received the B.Sc. and M.Sc. degrees in electrical engineering from the University of Tehran, Tehran, Iran, in 2015 and 2018, respectively. From 2017 to 2018, he was a Guest Student at TU Berlin, Berlin, Germany, for two semesters. His current research interests include embedded system design, hardware-software co-design, edge computing in Internet of Things (IoT) applications, and high-performance computing.

OSCAR LUCÍA (Senior Member, IEEE) received the M.Sc. and Ph.D. degrees (Hons.) in electrical engineering from the University of Zaragoza, Spain, in 2006 and 2010, respectively. From 2006 to 2007, he held a Research Internship at the Bosch and Siemens Home Appliances Group. Since 2008, he has been with the Department of Electronic Engineering and Communications, University of Zaragoza, where he is currently an Associate Professor. He was a Visiting Scholar with the Center for Power Electronics Systems (CPES), Virginia Tech, in 2009 and 2012. He has also been with TU Berlin, Germany, since 2019. His main research interests include resonant power conversion, wide-bandgap devices, and digital control, mainly applied to contactless energy transfer, induction heating, electric vehicles, and biomedical applications. In these topics, he has published more than 75 international journal articles and 125 conference papers, and he has filed more than 40 international patents. Dr. Lucía is a member of the Aragon Institute for Engineering Research (I3A) and an active member of the Power Electronics (PELS) and Industrial Electronics (IES) societies. He is currently an Associate Editor of the IEEE Transactions on Industrial Electronics, the IEEE Transactions on Power Electronics, and the IEEE Open Journal of the Industrial Electronics Society.

SERGIO LUCIA (Member, IEEE) received the M.Sc. degree in electrical engineering from the University of Zaragoza, in 2010, and the Dr.-Ing. degree in optimization and automatic control from TU Dortmund, in 2014. He then joined the Otto-von-Guericke-Universität Magdeburg, and visited the Massachusetts Institute of Technology as a Postdoctoral Fellow. Since May 2017, he has been an Assistant Professor and holds the Chair of Internet of Things for Smart Buildings, TU Berlin, and the Einstein Center Digital Future. His research efforts focus on decision-making under uncertainty, distributed control, and embedded optimization using microcontrollers and FPGAs in the framework of the Internet of Things. His applications of interest include smart buildings and energy systems. He is an active member of the IEEE CSS Society and an Associate Editor of the Journal of Process Control.