2018 Microprocessor Optimizations For The IOT

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 37, NO.
1, JANUARY 2018 7
Microprocessor Optimizations for the

Internet of Things: A Survey
Tosiron Adegbija, Member, IEEE, Anita Rogacs, Chandrakant Patel, Fellow, IEEE,
and Ann Gordon-Ross, Member, IEEE
Abstract—The Internet of Things (IoT) refers to a pervasive expansive variety of devices, protocols, domains, and applica-
presence of interconnected and uniquely identifiable physical tions. The IoT will involve devices that gather data and drive
devices. These devices’ goal is to gather data and drive actions actions in order to improve productivity, and ultimately reduce
in order to improve productivity, and ultimately reduce or
eliminate reliance on human intervention for data acquisition, or eliminate reliance on human intervention for data acquisi-
interpretation, and use. The proliferation of these connected tion, interpretation, and use [9]. The IoT has been described as
low-power devices will result in a data explosion that will one of the disruptive technologies that will transform life, busi-
significantly increase data transmission costs with respect to ness, and the global economy [66]. Based on analysis of key
energy consumption and latency. Edge computing reduces these potential IoT use-cases (e.g., healthcare, smart cities, smart
costs by performing computations at the edge nodes, prior to
data transmission, to interpret and/or utilize the data. While home, transportation, manufacturing, etc.), it has been esti-
much research has focused on the IoT’s connected nature and mated that by 2020, the IoT will constitute a trillion dollar
communication challenges, the challenges of IoT embedded com- economic impact and include more than 50 billion low-power
puting with respect to device microprocessors has received much devices that will generate petabytes of data [29], [91], [106].
less attention. This paper explores IoT applications’ execution Due to the IoT’s expected growth and potential impact,
characteristics from a microarchitectural perspective and the
microarchitectural characteristics that will enable efficient and much research has focused on the IoTs communication and
effective edge computing. To tractably represent a wide vari- software layer [11], [34], [61], [68], however, the challenges
ety of next-generation IoT applications, we present a broad of IoT computing, especially with respect to device micro-
IoT application classification methodology based on applica- processors, has received much less attention. Computing on
tion functions, to enable quicker workload characterizations IoT devices introduces new substantial challenges, since IoT
for IoT microprocessors. We then survey and discuss poten-
tial microarchitectural optimizations and computing paradigms devices’ microprocessors must satisfy increasingly growing
that will enable the design of right-provisioned microprocessors computational and memory demands, maintain connectivity,
that are efficient, configurable, extensible, and scalable. This and adhere to stringent design and operational constraints, such
paper provides a foundation for the analysis and design of a as low cost, low energy budgets, and in some cases, real-time
diverse set of microprocessor architectures for next-generation constraints. These challenges necessitate new research focus
IoT devices.
on microarchitectural optimizations that will enable design-
Index Terms—Adaptable microprocessors, approximate ers to develop right-provisioned architectures that are efficient,
computing, edge computing, energy harvesting, heterogeneous configurable, extensible, and scalable for next-generation IoT
architectures, Internet of Things (IoT), IoT survey, low-power
embedded systems, microprocessor optimizations. devices.
Fig. 1 depicts an IoT use-case that illustrates the high-level
components of the traditional IoT model. The IoT typically
I. I NTRODUCTION comprises of several low-power/low-performance edge nodes,
HE INTERNET of Things (IoT) is an emerging technol- such as sensor nodes, that gather data and transmit the data
T ogy that refers to a pervasive presence of interconnected
and uniquely identifiable physical devices, comprising an
to high-performance head nodes, such as servers, that perform
computations for visualization and analytics. In a data cen-
ter, for example, data aggregation from edge nodes facilitates
Manuscript received September 15, 2016; revised February 1, 2017; power and cooling management [14], [73].
accepted May 22, 2017. Date of publication June 20, 2017; date of cur-
rent version December 20, 2017. This paper was recommended by Associate However, the growth of the IoT and the resulting expo-
Editor J. Xue. (Corresponding author: Tosiron Adegbija.) nential increase in acquired/transmitted data poses significant
T. Adegbija is with the Department of Electrical and Computer bandwidth and latency challenges. These challenges are exac-
Engineering, University of Arizona, Tucson, AZ 85721 USA (e-mail:
tosiron@email.arizona.edu). erbated by the intrinsic resource constraints of most embedded
A. Rogacs and C. Patel are with HP Laboratories, Palo Alto, CA 94304 edge nodes (e.g., size, battery capacity, real-time deadlines,
USA (e-mail: rogacs@hp.com; chandrakant.patel@hp.com). cost, etc.). These resource constraints must be taken into
A. Gordon-Ross is with the University of Florida, Gainesville, FL 32611
USA, and also with the Center for High Performance Reconfigurable account in the design process, and may make it more dif-
Computing, University of Florida, Gainesville, FL 32611 USA (e-mail: ficult to achieve design objectives (e.g., minimizing energy,
ann@ece.ufl.edu). size, etc.). Additionally, increasing consumer demands for
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org. high-performance IoT applications will necessitate acquisition
Digital Object Identifier 10.1109/TCAD.2017.2717782 and transmission of complex data. For example, a potentially
0278-0070 c 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: ANURAG GROUP OF INSTITUTIONS. Downloaded on July 28,2022 at 08:19:03 UTC from IEEE Xplore. Restrictions apply.
8 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 37, NO. 1, JANUARY 2018
sufficient computational capabilities and algorithms to extract

information and interpret the data. Only processed data
(e.g., information about an anomaly in the patient) is trans-
mitted to the medical personnel, thus speeding up the diag-
noses and reducing the MRI machine’s energy consumption.
Alternatively, the data could be quantifiably reduced using
intelligent algorithms and computations, such that only impor-
tant information is transmitted to the medical personnel.
Gaura et al. [30] examined the benefits of edge mining,
in which data mining takes place on the edge devices. The
authors showed that edge mining has the potential to reduce
the amount of transmitted data, thus reducing energy con-
sumption and storage requirements. However, the edge nodes
computing capabilities must be sufficient/right-provisioned to
perform and sustain the required computations, while adher-
ing to the nodes design constraints (e.g., form factor, energy
Fig. 1. Illustration of the high-level components of the IoT.
consumption, etc.) [83].
This paper explores microarchitectural optimizations and
emerging computing paradigms that will enable edge com-
impactful IoT use-case is medical diagnostics [67]. With puting on the IoT. To ensure that microprocessor architectures
the advent of technological advances such as cheap portable designed and/or selected for the IoT have sufficient comput-
magnetic resonance imaging (MRI) devices and portable ultra- ing capabilities, a holistic approach, involving both application
sound machines, several gigabytes (GBs) of high resolution and microarchitecture characteristics, must be taken to deter-
images will be transmitted to medical personnel for remote mine microarchitectural design tradeoffs. However, due to
data processing and medical diagnosis. In some cases, this the wide variety of IoT applications and the diverse set of
system must scale to a network of several portable medical available architectures, determining the appropriate architec-
devices that transfer data to medical personnel. Transmitting tures is very challenging. The study presented herein seeks to
this data will result in bandwidth bottlenecks and pose addi- address these challenges and motivate future research in this
tional challenges for real-time scenarios (e.g., medical emer- direction.
gencies) where the latency must adhere to stringent deadline In this paper, we perform an expansive study and character-
constraints. ization of the emerging IoT application space and propose an
The IoT can also incur significant, and potentially unsustain- application classification to broadly represent IoT applications
able, energy overheads. Previous work [13], [57] established with respect to their execution characteristics. To enable the
that energy consumed while transmitting data is significantly design of right-provisioned microprocessors, we propose the
more than the energy consumed while performing compu- use of computational kernels that provide a tractable starting
tations on the data. For example, the energy required by point for representing key computations that occur in the IoT
Rockwell Automations sensor nodes to transmit 1-bit of data application space. Using computational kernels, rather than
is 1500×–2000× more than the energy required to execute a full applications, follows the computational dwarfs methodol-
single instruction (depending on the transmission range and ogy [8] and allows IoT computational patterns to be accurately
specific computations) [76]. represented at a high level of abstraction.
To address these challenges, fog computing [16] has been Furthermore, we propose a high-level design methodology
proposed as a virtualized platform that provides compute, stor- for identifying right-provisioned architectures for edge com-
age, and networking services between edge nodes and cloud puting use-cases, based on the executing applications and the
computing data centers. Rather than performing computations applications execution characteristics (e.g., compute intensity,
in the cloud, fog computing reduces the bandwidth bottleneck memory intensity, etc.). Finally, in order to motivate future
and latency by moving computation closer to the edge nodes. research, we survey a few potential microprocessor optimiza-
This paper focuses on further reducing the bandwidth, latency, tions and computing paradigms that will enable the design of
and energy consumption through edge computing, where the right-provisioned IoT microprocessor architectures.
edge nodes are directly equipped with sufficient computation
capacity in order to minimize data transmission [4].
Edge computing performs computations that process, II. H IGH -L EVEL I OT C HARACTERISTICS AND T HEIR
interpret, and use data at the edge nodes. Performing these D EMANDS ON M ICROPROCESSOR A RCHITECTURES
computations on the edge nodes minimizes data transmis- The IoT’s characteristics necessitate new designs and opti-
sion, thereby improving latency, bandwidth, and energy mizations for microprocessors that will be employed in IoT
consumption. For example, in the medical diagnostics use- devices. We briefly describe seven key characteristics—based
case described above, rather than sending several GBs of on previous research [75]—that, together, distinguish the IoT
MRI data to the medical personnel for diagnoses, the from other connected systems: intelligence, heterogeneity,
portable MRI machine (the edge node) is equipped with complexity, scale, real-time constraints, spatial constraints,
ADEGBIJA et al.: MICROPROCESSOR OPTIMIZATIONS FOR IoT: SURVEY 9
and internode support. We also describe the demands that 7) Internode Support: The IoT will comprise of several
these characteristics place on microprocessor architectures. devices/nodes that can share execution resources among
1) Intelligence: Since the goal of the IoT is to reduce each other. Due to the wide variety of IoT applications
reliance on human intervention for data acquisition that may execute on a device, and the stringent resource
and use [9], raw data must be autonomously collected constraints, it may be impractical to equip every device
and processed to create actionable information. IoT with all the execution resources it will require through-
microprocessors must be able to dynamically adapt to out its lifetime. Thus, to maintain efficient execution,
varying runtime execution scenarios and adaptable data IoT devices must be able to share execution resources
characteristics [32]. with each other, when necessary.
2) Heterogeneity: One of the key characteristics of the
IoT is that it involves a high degree of heterogeneity,
featuring different kinds of devices, applications, and
contexts [75]. Thus, IoT microprocessors must be spe- III. I OT A PPLICATION C LASSIFICATION
cialized to the different execution characteristics of IoT The IoT offers computing potential for many application
applications. IoT microprocessor heterogeneity may be domains, including transportation and logistics, healthcare,
chip-level—a single chip with heterogeneous cores—or smart environment, personal and social domains [11], etc.
network-level, where different devices feature different One of the key goals of the IoT, from an edge comput-
kinds of cores. Despite this heterogeneity, the devices ing perspective, is to equip edge devices with sufficient
must be able to seamlessly communicate with each resources to perform computations that would otherwise
other and share resources for efficient data interpretation have been transferred to a high-performance device. In
and use. order to rightly provision these devices, we must first
3) Complexity: The organization and management of the understand potential applications that will be executed on
IoT will be very complex. Apart from the large num- the devices.
bers of heterogeneous architectures, the architectures Previous works have proposed classifications for various
must be able to execute a wide variety of applica- IoT components. Gubbi et al. [34] presented a taxonomy
tions, many of which may be memory- and compute- for a high-level definition of IoT components with respect
intensive. Interactions between the different IoT devices to hardware, middleware, and presentation/data visualiza-
will dynamically vary. Some devices will be added tion. Tilak et al. [93] presented a taxonomy to classify
to the IoT network, while others will be removed; wireless sensor networks according to different communica-
these changes may impact individual devices’ execution tion functions, data delivery models, and network dynamics.
behaviors. Tory and Möller [94] presented a high-level visualization tax-
4) Scale: The IoT will comprise more than 50 billion onomy that classified algorithms based on the characteristics
devices by 2020, and the numbers are expected to grow of the data models.
continuously [91]. In addition to the increase in the However, there is currently very little research that char-
number of devices, the interactions among them will acterizes these applications with respect to their execution
also increase. To support this scale, IoT microprocessors characteristics. One of the biggest challenges the IoT presents
must be efficient—cost, energy, and area efficient— is the huge number and diversity of use-cases and potential
and constitute minimal overhead to the IoT device. In applications that will be executed on IoT devices. This chal-
addition, the microprocessors must be able to portably lenge is exacerbated by the fact that only a small fraction of
execute different kinds of applications. these applications are currently available in society. Thus, a
5) Real-Time Constraints: Some of the most important significant amount of foresight is required in designing micro-
IoT use-cases—for example, patient monitoring, medical processor architectures to support the IoT’s emergence and
diagnostics, aircraft monitoring—involve real-time con- growth.
straints, where execution must adhere to stringent dead- Much prior work has characterized IoT applications
lines. IoT microprocessors must be able to dynamically according to different use-cases and domains. For example,
determine and adhere to deadlines, based on various Atzori et al. [11] and Sundmaeker et al. [91] categorized IoT
inputs, such as user inputs, application characteristics, applications into three domains: 1) industry; 2) environment;
quality of service (QoS). and 3) society. Asin and Gascon [10] categorized IoT applica-
6) Spatial Constraints: Several IoT use-cases are location- tions into 54 domains under 12 categories. In this paper, our
based. An IoT device’s location may change throughout goal is a tractable and extensible classification that enables us
the device’s lifetime. In addition, the device may be to identify the IoT applications’ key execution characteristics.
exposed to variable, and potentially nonideal, environ- As an initial step toward understanding IoT applications’
mental conditions. For example, tracking devices may execution characteristics, we performed an expansive study of
be exposed to extreme heat, extreme cold, and/or rain IoT use-cases and the application functions present in these
at different times or in different locations. Thus, IoT use-cases. Since it is impractical to consider every IoT applica-
microprocessors must feature fault tolerance and adapt- tion within these use-cases/application domains, based on this
ability that allows them to adhere to variable operation paper, we propose an application classification methodology
conditions. that provides a high level, broad, and tractable representation
of a variety IoT applications using the application func- There are many communication technologies (e.g., Bluetooth,
tions. Our IoT application classification consists of six key Wi-Fi, etc.), and communication protocols [e.g., transfer con-
application functions. trol protocol (TCP), the emerging 6lowpan (IPv6 over low
1) Sensing. power wireless personal area network), etc.]. In this paper,
2) Communications. we highlight software defined radio (SDR) [59], which is
3) Image processing. a communication system in which physical layer functions
4) Compression (lossy/lossless). (e.g., filters, modems, etc.) that are typically implemented in
5) Security. hardware are implemented in software.
6) Fault tolerance. SDR is an emerging and rapidly developing communica-
We note that this classification is not exhaustive; however, it tion system that is driving the innovation of communications
represents a wide variety of current and potential IoT applica- technology, and promises to impact all areas of communica-
tions. The classification also provides an extensible framework tion. SDR is growing in popularity, and attractive for the IoT,
that allows emerging applications/application domains to be because of its inherent flexibility, which allows for flexible
analyzed. In this section, we describe the application functions incorporation and enhancements of multiple radio functions,
and motivate these functions using a medical diagnostics use- bands, and modes, without requiring hardware updates. SDR
case, where applicable, or other specific examples of current typically involves an antenna, an analog-to-digital converter
and/or emerging IoT applications. connected to an antenna (for receiving) and a digital-to-analog
converter connected to the antenna (for transmitting). Digital
signal processing (DSP) operations (e.g., fast Fourier trans-
A. Sensing form) are then used to convert the input signals to any form
Sensing involves data acquisition (e.g., temperature, pres- required by the application.
sure, motion, etc.) about objects or phenomena, and will Even though SDR applications are typically compute inten-
remain one of the most common functions in IoT applica- sive, with small data and instruction memory footprints, recent
tions. In these applications, activities, information, and data work [20] shows that the overheads of SDR can be kept
of interest are gathered for further processing and decision small in the IoT domain by focusing on optimizing the key
making. We use sensing in our IoT application classification kernels (e.g., synchronization and finite impulse response)
to represent applications where data acquired using sensors that dominate SDR computations and power consumption. In
must be converted to a more useable form. Our motivating general, SDR algorithms can be efficiently executed using
example for sensing applications is sensor fusion [70], where general purpose microprocessors or more specialized pro-
sensed data from multiple sensors are fused to create data that cessors, such as DSPs or field-programmable gate arrays
is considered qualitatively or quantitatively more accurate and (FPGAs). Alternatively, heterogeneous architectures [40] can
robust than the original data. also combine different kinds of microprocessors to satisfy
Sensor fusion algorithms can involve various levels of com- different operations’ execution requirements while minimiz-
plexity and compute/memory intensity. For example, sensor ing overheads, such as energy consumption. Other examples
fusion could involve aggregating data from various sources of communication applications include packet switching and
using simple mathematical computations, such as addition, TCP/IP.
minimum, maximum, mean, etc. Alternatively, sensor fusion
could involve more computationally complex/expensive appli-
cations, such as fusing vector data (e.g., video streams from C. Image Processing
multiple sources), which requires a substantial increase in In the IoT context, image processing represents applications
intermediate processing. that involve any form of signal processing where the input is an
In a medical diagnostics use-case, for example, sensing is image or video stream from which characteristics/parameters
vital in a body area network [19], where noninvasive sensors must be extracted/identified. Additionally, this classification
can be used to automatically monitor a patients physiolog- also involves applications in which an image/video input must
ical activities, including blood pressure, heart rate, motion, be converted to a more usable form. Several emerging IoT
etc. Several sensing devices, such as portable electrocardiog- applications, such as automatic number license plate recog-
raphy (ECG), electroencephalography (EEG), and electromyo- nition, traffic sign recognition, face recognition, etc., involve
graphy machines, motion and blood pressure sensors could be various forms of image processing. For example, face recog-
equipped with additional computational resources and algo- nition involves operations, such as face detection, landmark
rithms that enable the devices to not only gather data, but also recognition, feature extraction, and feature classification, all
analyze the data in order to reduce the amount of transmitted of which involve image processing.
data, with minimal energy or area overheads. Image processing is important for several impactful IoT
use-cases, and necessitates microarchitectures that can effi-
ciently perform image processing operations. For example,
B. Communications in medical diagnostics, image processing can be used to
Communications is one of the most common IoT appli- increase the reliability and reproducibility of disease diagnos-
cation functions due to the IoTs intrinsic connected struc- tics. Image processing can provide medical personnel with
ture, where data transfers traverse several connected nodes. quantitative data from historical images, which can be used
to supplement qualitative data currently used by specialists. In security applications to prevent unauthorized access to sensi-
addition, portable medical devices, e.g., portable ultrasounds, tive data and functions. Implantable medical devices, such as
can be equipped with image processing applications to provide pacemakers, implantable cardiac defibrillators, and neurostim-
speedy analysis for remote assessment of patients [69]. ulators are especially susceptible to potentially fatal security
The National Institute of Health supports the medical image and privacy issues, such as replay attacks [37], [48]. Since
processing, analysis, and visualization application [67], which medical device security is still in its infancy, there still exists
enables medical researchers to easily share research data and a wide knowledge gap with respect to the microprocessor
enhance their ability to diagnose, monitor, and treat medical characteristics that will support security algorithms execu-
disorders. However, since image processing applications are tion requirements without sacrificing the devices functional
typically data-rich, and both memory and compute intensive, requirements.
novel optimization techniques are required to enable the effi- We highlight data encryption [89], which is a common tech-
cient execution of these applications in the context of IoT nique for ensuring data confidentiality, wherein an encryption
edge computing. Furthermore, some image processing appli- algorithm is used to generate encrypted data that can only
cations require large input, intermediate, or output data to be be read/used if decrypted. Data encryption applications (e.g.,
stored (e.g., medical imaging), thus requiring a large amount secure hash algorithm) are typically compute intensive and
of storage. memory intensive, since encryption speed is also dependent
on the memory access latency for data retrieval and storage.
D. Compression
F. Fault Tolerance
With the increase in data and bandwidth-limited systems,
Fault tolerance [39] refers to a system’s ability to oper-
compression can reduce communication requirements to
ate properly when some of its components fail. Fault tolerant
ensure that data is quickly retrieved, transmitted, and/or ana-
applications are especially vital since IoT devices may be
lyzed. Several emerging IoT use-cases will involve large
deployed in harsh and unattended environments, where QoS
volumes of data, which will necessitate efficient compres-
must be maintained in potentially adverse conditions, such
sion techniques to accommodate the rapid growth of the data
as cryogenic to extremely high temperatures, shock, vibra-
and reduce transmission latency and bandwidth costs [100].
tion, etc. In some emerging IoT devices, such as implantable
Additionally, since most IoT devices are resource-constrained,
medical devices, fault tolerance could be the single most crit-
compression also reduces storage requirements when data
ical requirement, since faults can be potentially fatal. Thus,
must be stored on the edge node. For example, data gath-
fault tolerance must be incorporated into such devices without
ered using sensors in a body area network can be quantifiably
accruing significant overheads.
and intelligently reduced in order to minimize transmission
Fault tolerance can be achieved in different ways. Hardware-
and storage requirements for medical diagnosis devices.
based techniques usually rely on redundancy—redundant array
Compression involves encoding information using fewer
of independent disks [74] is a common example—wherein
bits than the original representation. The data can be encoded
redundant disks or devices are used to provide fault toler-
at the data source before storage or transmission, known as
ance in the event of a failure. This kind of redundancy can
source encoding, or during transmission, known as channel
be achieved in IoT devices using a dedicated IoT device, or
coding [6]. In our studies, however, we focus on source encod-
integrated into a larger, less constrained device, in order to
ing, as this type of encoding will be more relevant in the
minimize the attendant overheads of redundancy. Alternatively,
context of edge computing.
redundancy can be incorporated directly into the IoT devices,
Compression techniques can be broadly classified as lossy
at the expense of area and power overheads. To reduce
or lossless compression. Lossy compression (e.g., JPEG) typ-
the overheads from hardware-based fault tolerance, software-
ically exploits the perceptibility of the data in question, and
based fault tolerance can also be employed. Software-based
removes unnecessary data, such that the lost data is impercep-
fault tolerance [39], [80], [95] involves applications and
tible to the user. Alternatively, lossless compression removes
algorithms that perform operations, such as memory scrub-
statistically redundant data in order to concisely represent data.
bing, cyclic-redundancy checks (CRCs), error detection and
Lossless compression typically achieves a lower compression
correction, etc.
ratio and is usually more compute and memory intensive
than lossy compression. However, lossy compression may
be unsuitable in some scenarios where high data fidelity is IV. D ETERMINING I OT M ICROPROCESSOR
required to maintain the QoS (e.g., in medical imaging). C ONFIGURATIONS
One of the major challenges for IoT microprocessor design
is determining the best microprocessor configurations that sat-
E. Security isfy the IoT device’s execution requirements. In this section,
Since IoT devices are often deployed in open or poten- we describe a sample high-level process through which an
tially unsafe environments, where the devices are susceptible IoT microprocessor can be designed and optimized. Fig. 2
to malicious attacks, security applications are necessary to illustrates a high-level IoT microprocessor design life-cycle,
maintain the integrity of both devices and data. Furthermore, consisting of six steps. First, the use-case needs to be speci-
sensitive scenarios (e.g., medical diagnostics) may require fied. This step describes the overall functionality and behavior
way for determining these characteristics is through simu-

lations. During design space exploration [85], the kernels’
execution characteristics—memory intensity, compute inten-
sity, instructions per cycle, memory references, etc.—using
different microprocessor configurations are then analyzed
to determine which configurations best satisfy the applica-
tion resource requirements. These configurations can then
be refined, if necessary, after verifying that the functional
requirements of the use-case are met.
V. S TATE - OF - THE -A RT IN I OT M ICROARCHITECTURE

C ONFIGURATIONS
We performed an extensive survey and study of the
state-of-the-art in commercial-off-the-shelf (COTS) embedded
Fig. 2. Illustration of a high-level IoT microprocessor design life-cycle. systems microprocessor architectures from several design-
ers and manufacturers ranging from low-end microcontrollers
TABLE I
to high-end/high-performance low-power embedded systems
A PPLICATION F UNCTIONS AND S AMPLE R EPRESENTATIVE K ERNELS microprocessors. Our studies included publicly available infor-
mation on these microprocessors’ configurations, and conver-
sations with researchers and engineers directly involved with
microprocessor design and development in several manufac-
turing companies.
Based on our studies, we categorized the microprocessors
in terms of several microprocessor characteristics, includ-
ing number of cores, on-chip memory (e.g., cache), off-chip
memory support, power consumption, number of pipeline
stages, etc. Using this information, we developed a set of four
high-level microarchitecture configurations for IoT edge com-
of the IoT device, which will dictate the microprocessor puting support. These configurations represent the range of
requirements. Based on the use-case, the required applications available state-of-the-art COTS microprocessors, and provide
to achieve the desired functionality are then specified. For a reference point from which future IoT microprocessors and
example, a medical diagnostics use-case involving a portable optimizations can be developed. While microprocessors could
ultrasound device [50] may require applications for image include central processing units (CPUs), graphics processing
capture, anomaly detection, anomaly recognition, data encryp- units (GPUs), DSPs, etc., in this survey, we focus on CPUs,
tion, and data transmission. Thereafter, the specific functions since they are typically the backbone for most edge computing
within each application are determined, and these functions applications.
are broken down into their respective computational kernels. Table II depicts the microarchitecture configurations, com-
Computational kernels are basic execution blocks that repre- prising of four configurations—config1, config2, config3,
sent applications’ functions; the kernels disconnect the execu- and conf4, representing different kinds of microproces-
tions from specific implementations, programming languages, sors. We highlight specific state-of-the-art microcontroller/
and algorithms. Using computational kernels can make the microprocessor examples to motivate the configurations, how-
design process more manageable, since kernels are faster to ever, we note that these configurations are only representative
simulate and can represent a variety of applications/application and not necessarily descriptive.
functions. In addition, kernels expose computational nuances Config1 represents low-power and low-performance micro-
and reveal execution characteristics that may not be visible controller units, such as the ARM Cortex-M4 [101] found in
when considering the full application. Kernels also provide a several IoT-targeted MCUs from several developers, including
fine-grained view of applications, such that additional design Freescale Semiconductors, Atmel, and STMicroelectronics.
processes (e.g., hardware/software partitioning [90]) can be Conf1 contains a single core with 48-MHz clock frequency,
sped up. Using computational kernels to represent applica- three pipeline stages, in-order execution, and support for 1 MB
tion functions is supported by the concept of computational of flash memory.
dwarfs [8]. Computational dwarfs represent patterns of com- Config2 represents recently developed IoT-targeted CPUs,
putation at high levels of abstraction to encompass several such as the Intel Quark Technology [78], and contains a single
computational methods in modern computing. core with 400-MHz clock frequency, five pipeline stages, in-
Table I illustrates sample kernels that can be used to rep- order execution, 16-kB level one (L1) instruction and data
resent the different application classes (Section III). After the caches, and support for 2-GB RAM.
computational kernels are determined, the kernels’ execution Config3 represents mid-range CPUs, such as the ARM
characteristics are then determined. One of the most common Cortex-A7 [98] found in several general purpose embedded
TABLE II
S TATE - OF - THE -A RT M ICROPROCESSOR C ONFIGURATIONS
TABLE III
S UMMARY OF C OMPUTING PARADIGMS AND P OTENTIAL B ENEFITS
systems, and contains four cores with 1-GHz clock frequency, microarchitectures with configurations that can be
eight pipeline stages, in-order execution, 32-kB L1 instruction autonomously specialized to different applications in
and data caches, 1-MB level two (L2) cache, and support for order to achieve optimal execution, especially in terms
2-GB RAM. of energy efficiency.
Finally, config4 represents high-end/high-performance 3) Security: Security must be one of the primary design
embedded systems CPUs, such as the ARM Cortex-A15 [98], goals of IoT microprocessors, especially since IoT
and contains four cores with 1.9-GHz clock frequency, devices will be inherently more susceptible to attacks.
eight pipeline stages, 32-kB L1 instruction and data caches, 4) Future-Proof: Designing IoT microprocessors will
2-MB L2 cache, support for 4-GB RAM, and out-of-order require a lot of foresight. With the rapid emergence of
execution. Out-of-order execution allows instructions to new IoT applications, and the applications’ increasing
execute as soon as the instruction becomes available, unlike memory and compute requirements, IoT microproces-
in-order execution where instructions must execute in program sors must be able to execute future applications without
order. being over-provisioned for current applications.
5) Extensibility: Future-proofing IoT microprocessors can
be achieved by extending the microprocessors with
VI. M ICROPROCESSOR O PTIMIZATIONS , C OMPUTING additional functionalities (e.g., specialized instructions,
PARADIGMS , AND F UTURE D IRECTIONS security monitors, and new on-chip peripherals). Thus,
Using the configurations described in Section V and the ker- IoT microprocessors must be designed with ease of
nels listed in Table I as benchmarks, we performed detailed integration, customization, and extension in mind. Such
architectural simulations on GEM5 [15] to analyze the charac- extensibility will allow new levels of performance and
teristics of the configurations. The details of our analysis can energy efficiency to be achieved.
be found in our preliminary work [4]. Based on our analysis In this section, we survey a few potential microproces-
and extensive surveys, we have identified five key character- sor optimizations and computing paradigms, from a context
istics that IoT microprocessors must have in order to support of edge computing, that will enable the aforementioned IoT
the IoT’s growth. characteristics, and support the IoT’s growth. We first discuss
1) Efficiency: Due to the typical stringent resource con- optimizations for achieving adaptability in IoT micropro-
straints of IoT devices, IoT microprocessors must be cessors; we then survey and discuss research directions in
optimized for energy, cost, performance, and area effi- other computing paradigms that will enable microprocessor
ciency. optimizations for the IoT, including nonvolatile processors,
2) Configurability: One of the major observations from our approximate computing, in-memory processing, and securemi-
studies is that different IoT applications have vastly dif- croarchitectures. Our goal in this section is not to provide an
ferent runtime resource requirements. With the growth exhaustive survey of potential microprocessor optimizations;
of the IoT and the current trend of IoT applications, we our goal is to motivate researchers with future directions for
envision that this variability in runtime resource require- developing novel, configurable, and extensible low-overhead
ments will increase even further with next-generation IoT microprocessors. Table III summarizes the different
IoT applications. Therefore, these variable runtime computing paradigms, benefits, and references for further
resource requirements necessitate adaptable/configurable details.
A. Configurable/Adaptable Architectures for IoT microprocessors. The intrinsic characteristics of the

In order to achieve the right balance between performance, IoT necessitate further studies and development of innovative
power, and area in IoT applications, IoT microprocessors must cache tuning techniques for IoT microprocessors that are low-
be adaptable to the application requirements. This adaptability overhead, robust, and versatile for the variety of applications
can be achieved through the ability to change the microproces- that will execute on these microprocessors.
sor’s configuration at runtime or by heterogeneous processors To orchestrate the cache tuning process, hardware
that offer different processing resources for executing the and/or software cache tuners employ cache tuning algo-
applications. Several microprocessor components can be con- rithms/heuristics to determine the best cache configurations
figured, including the issue queue [28], reorder buffer [53], to meet design constraints. However, the tuner could impose
register files [1], and pipelines [27]. However, in this sec- significant power, area, and/or performance overheads while
tion, we focus on configurable caches [104] due to their exploring the configuration design space [3]. Thus, to max-
potential impact on the microprocessor’s end-to-end energy, imize the benefits of configurable caches in IoT micropro-
performance, and area. cessors, novel cache tuners must be designed such that they
The memory will arguably remain the most important constitute minimal overhead and effectively implement the
microprocessor component for performance and energy con- cache tuning algorithms.
sumption. Since emerging IoT applications will increase in
memory and compute intensity, IoT microprocessors must be B. Distributed Heterogeneous Architectures
equipped with more advanced memory hierarchies to take
Heterogeneous architectures [56] allow a coarse-grained
advantage of the spatial and temporal locality of the IoT
specialization of system resources to application requirements
applications. Due to the memory hierarchy’s large impact on
by equipping a microprocessor with different kinds of cores
system performance and energy consumption, much empha-
or different core configurations. The different cores execute
sis must be placed on efficient caching techniques for IoT
the same instruction set, but have different capabilities and
microprocessors.
performance levels. Thus, at runtime, the system software
Previous work has shown that specializing the cache config-
evaluates the resource requirements of applications or appli-
urations to different application or phase memory requirements
cation phases and determines the core that best satisfies the
can reduce the memory hierarchy’s energy consumption by optimization goals for the executing applications.
up to 62% [33]. In addition, our studies revealed that the One of the major advantages of heterogeneous architec-
cache is one of the more easily over-provisioned resources in tures for IoT microprocessors, from a design perspective, is
an IoT microprocessor, resulting in high energy consumption that existing cores (e.g., CPUs, DSPs, GPUs, etc.) can be
with no performance benefits. The energy consumption can be reused in the implementation of heterogeneous microproces-
quantifiably reduced, without any performance degradation, by sors; this allows previous design and verification efforts to
dynamically changing the cache configurations (e.g., reducing be amortized. However, unlike configurable architectures, het-
the cache size). Thus, a prominent optimization for IoT micro- erogeneous architectures offer a much smaller design space,
processors is dynamically configurable cache architectures that which necessitates greater design time effort in determining
allow the caches parameter values to be specified/changed the best core configurations that will satisfy the application
during runtime. requirements. In addition, in a system with a large number of
Three major challenges must be addressed in order to applications, heterogeneous cores may have a lower optimiza-
enable dynamically configurable caches for IoT microproces- tion potential than configurable cores, since there are fewer
sors: 1) augmenting caches for configurability; 2) cache tuning configurations to choose from in heterogeneous cores.
algorithms/heuristics; and 3) cache tuners. In order to maxi- Much previous research efforts have targeted heterogeneous
mize the benefits of configurable caches, the required hardware cores in general purpose computers, embedded systems, etc.,
optimizations to enable configurability must accrue minimal but their applicability to IoT microprocessors have yet to
overhead. For example, a potential technique for enabling be explicitly determined [68], [88]. Two major challenges
cache configurability uses bit-width configuration registers that that must be addressed in designing heterogeneous micro-
allow a caches banks to be shutdown to configure the cache processors for the IoT are the number and choice of cores,
size, or concatenated to configure the cache associativity [104]. and scheduling of applications to the appropriate cores. In
For optimal execution, the best configurations that achieve order to maximize the optimization potential, designers must
optimization goals and satisfy design constraints for different expend a considerable amount of effort to determine the
applications must be dynamically determined. Cache tun- best cores or configurations to incorporate into the micro-
ing determines the optimal cache configurations that match processor. To provide an effective platform that satisfies the
an application’s runtime behavior. The cache tuning algo- execution requirements of a wide variety of application char-
rithm/heuristic can result in time and energy overheads, since acteristics, the selected cores must cater to a wide range
the processor must stall during the tuning process. Much of computational complexities and performance requirements.
previous work (e.g., [2], [33], [36], [71], and [104]) have This effort would require a priori knowledge and analysis of
proposed different algorithms/heuristics for cache tuning to the applications/application domains that will execute on the
minimize the potential of overhead and maximize the cache microprocessor. In addition, potential core configurations and
tuning benefits. These works offer a valuable foundation their characteristics must be extensively analyzed.
a promising technique for replacing or supplementing energy

sources (e.g., batteries), especially in ultralow power applica-
tions. Energy can be harvested from several sources, including
solar, thermal gradients, radio frequency radiation, etc. [18],
[31], [35], [60], [77]. These energy sources, however, are typ-
ically not reliable; external factors—distance from a power
source, physical obstacles, and electromagnetic signals—can
disrupt the energy supply [97]. Due to this unreliability, tra-
ditional processors may be impractical for systems equipped
with energy harvesting—when the energy supply is disrupted,
volatile processors will lose their operating state.
Nonvolatile processors [62] use nonvolatile storage
components—nonvolatile memories—to store the processor
Fig. 3. Distributed heterogeneous architectures.
state when the power supply is disrupted. When the power is
restored, the processor’s state is restored from the nonvolatile
Given the application execution requirements, the appro- memory to continue execution. Thus, nonvolatile processors
priate core on which to execute/schedule the application allow continuous computation despite power disruptions. For
must also be determined either statically or dynamically [7]. example, one of the earlier nonvolatile processors [99] allowed
Static scheduling suffices when the applications are known system states to be backed up within 7 µs and restored within
a priori. However, when the applications are unknown, 3 µs. Additionally, since embedded systems typically spend
dynamic scheduling evaluates application characteristics at a significant amount of time idling, resulting in high leakage
runtime and schedules the applications to the appropriate power, nonvolatile processors can reduce the idle power by
cores. Much research is needed to develop low-overhead, com- allowing the system to be shut down while idle. The state can
putationally simple, and accurate scheduling techniques, for then be instantaneously restored on wake-up [63], [102].
IoT microprocessors, that will achieve optimization goals and There are several potential nonvolatile architectures that
satisfy the microprocessors resource constraints. can be used at different abstraction levels to achieve
An alternative to heterogeneous cores on a single device is nonvolatile processors for energy harvesting systems. For
a network of distributed heterogeneous architectures. Fig. 3 example, various FeRAMs, STT-RAMs, PCRAMs, and
illustrates the distributed heterogeneous network. Different ReRAMs have been explored for use in energy harvesting
devices are equipped with different computational resources systems [22], [62], [72], [92], [108]. Several factors must be
that may be required by the devices at different times. For considered when selecting an energy harvesting nonvolatile
example, Fig. 3 depicts six nodes containing six different kinds processor system. The input power characteristics—for exam-
of microprocessors: 1) a single core in-order microproces- ple, the power behavior when interrupted—affect the choice of
sor (1-CPU); 2) a 72-core general purpose GPU (72-GPU); nonvolatile architectures [65]. The application characteristics
3) a DSP; 4) an FPGA; 5) a quad-core out-of-order pro- also affect the choice of nonvolatile architectures. Application
cessor (OoO 4-CPU); and 6) a quad-core in-order processor characteristics, such as deadlines, real-time, and QoS require-
(In-order 4-CPU). Assuming an eight-threaded application A ments, must also be taken into consideration. For example,
with deadline constraints arrives on 1-CPU—1-CPU may be solar powered systems can typically be used to meet real-
under-provisioned for A’s execution (i.e., the execution time time QoS requirements more effectively than RF or thermal
on 1-CPU will exceed the deadline)—an alternative node in source [65]. Much research is required to quantify the trade-
the network can be used to execute A. Apart from determin- offs of different energy sources with respect to the nonvolatile
ing which node contains sufficient resources to execute A, architectures.
the costs—energy and time—to transfer A from 1-CPU to the
right-provisioned node, in addition to the time to retrieve the
results, must not negate the savings from executing A on a
right-provisioned core D. Approximate Computing
To enable distributed heterogeneous architectures, several Approximate computing has recently gained a lot of traction
research challenges must be addressed: benefits of waiting as a viable alternative to exact computing. Exact comput-
for a right-provisioned node if the node is busy; trade- ing targets exact numerical or Boolean equivalence, while
offs of time—execution and transmission times—and energy approximate computing allows a nonexact, inaccurate result
in the presence or absence of deadline constraints; run- that maintains the desired output quality [21]. Approximate
time, low-overhead determination of right-provisioned nodes; computing allows new optimization options for processors that
and developing portable applications and standardizing the execute resilient applications—applications that can produce
communication protocols between different kinds of nodes. outputs of sufficient quality despite some imprecise com-
putations, e.g., signal processing, multimedia, graphics, etc.
C. Energy Harvesting and Nonvolatile Processors Allowing bounded approximation in processors can provide
One of the most important, and most persistent, challenges significant performance and energy gains, while achieving an
for IoT devices is energy consumption. Energy harvesting is acceptable amount of accuracy.
Approximate computing can enable energy-efficient edge sparse data vectors and retrieve them when presented with
computing in IoT devices, such as wearable electron- noisy or incomplete versions of the vectors [43]. However,
ics [54], [81]. One of the first steps to incorporating approxi- SDM architectures are challenging due to the often conflict-
mate computing into IoT devices is identifying the devices’ ing design goals of achieving both high throughput and energy
applications, and the applications’ resilience to computing efficiency.
error. Chippa et al. [21] presented an automatic resilience To enable SDM, compute memory has been proposed
characterization framework to evaluate how amenable an as a viable implementation architecture [46]. Compute
application is to approximate computing. This framework memory [44], [45] is an in-memory processing architecture
uses approximation models that evaluate different approximate that implements both memory and processing in a single archi-
computing techniques for different partitions of an applica- tecture in order to completely eliminate the processor-memory
tion. This framework relies on significant amounts of a priori interface. The compute memory architecture implements infer-
knowledge about the executing applications, such as, the com- ence algorithms in the periphery of the memory array, and
putational patterns and input data. Since applications typically does not modify the core bit-cell array, thus maintaining the
have different execution phases, and the phase behaviors could storage density. The compute memory is able to implement
change at runtime, key to efficient approximation is a run- operations, such as the sum of absolute differences, signed
time framework that automatically detects resilient application multiplication, etc. The SDM implementation, using compute
phases and adjusts the computing exactness to match the memory, has been shown to achieve both high throughput and
currently executing phase’s resilience [42]. energy efficiency for data-rich applications, such as pattern
Several optimizations to enable approximate comput- recognition [46].
ing have been developed at different abstraction lev- However, most current compute memory implementations
els. Using various DSP filters and an ECG application, are application-specific. Ahn et al. [5] proposed a processor
Venkataraman et al. [96] presented a system-level design flow in memory application that uses specialized instructions, called
to study a system’s exactness and used elimination heuris- PIM-enabled instructions, to invoke in-memory computations.
tics to explore the design space under inexactness, area, The goal of the proposed work was to allow processing in
and energy constraints. Several other techniques have been memory operations to be compatible with existing systems and
proposed [38] for achieving inexactness at the circuit level applications, without the need to specifically design the pro-
through different approximate components, such as approx- cessor in memory for specific applications. Much work exists
imate adders [26], [64], approximate multipliers [55], [58], to extend compute memory architectures to multiapplication
and approximate logic synthesis [86], [87]. Similarly, approx- use-cases, with minimal overheads.
imate computing can also be exploited at the memory level, Another architecture, similar to the compute memory, with
for example, by storing data approximately [82] or through high potential for IoT devices is the compute sensor [105],
systems that can tolerate memory errors while maintaining the which offers in-sensor processing. The compute sensor takes
desired QoS [17]. advantage of machine learning algorithms’ inherent adaptabil-
ity to noise, and embeds information processing functionality
for these algorithms into the sensor substrates. The com-
E. In-Memory Processing pute sensor eliminates the sensor-processor interface, wherein
In-memory processing—or processing in memory—has sensed data is typically transmitted to a processor for data
been studied in relation to big data and distributed com- visualization, as is the case in traditional sensors. The compute
puting systems [41], [103]. In-memory processing addresses sensor significantly reduces energy consumption and latency
the well-known processor-memory performance gap through of feature extraction and classification functions, without
internal memory accesses; it avoids delays caused by off- sacrificing accuracy [105].
chip communication. Caches have been widely used to bridge
the processor-memory performance gap; in-memory process-
ing further reduces this gap by allowing computations to be F. Secure Microarchitectures
performed on the memory chip without the need for processor- Security applications are some of the most important
to-memory communication. With the growth of the IoT, applications that will be executed on IoT microproces-
massive amounts of data generated, and resource constraints sors. The IoT’s characteristics—pervasiveness, interdepen-
of IoT devices, in-memory processing offers an attractive dence, connectedness, mobility—makes IoT devices inherently
optimization for IoT devices. vulnerable to increasing number of attacks. IoT devices are
One of the major attractions of in-memory processing for susceptible to physical, side channel, cryptoanalysis, software,
IoT devices, in the context of edge computing, is the need for and network attacks. Security can no longer be an afterthought
real-time in-situ data processing on large data volumes. The in microprocessor design; it is imperative that microprocessors
data processing must be energy-efficient and involve low hard- are designed to be inherently secure. However, security in IoT
ware overhead. To achieve such capabilities, researchers have devices is especially challenging due to the typically stringent
explored inherently robust brain-inspired models of computa- resource constraints of these devices.
tion that involve highly efficient inference applications [24]. Most IoT devices will have no hardware support for vir-
An example of such a computing model is the sparse dis- tualization or enhanced security features, such as trusted
tributed memory (SDM), which can be trained to remember execution [52]. The processing capabilities of IoT devices’
microprocessors may be exceeded by the resource require- R EFERENCES

ments of security processes and algorithms. As a result, [1] J. Abella and A. González, “On reducing register pressure and energy in
security designers often need to tradeoff other vital optimiza- multiple-banked register files,” in Proc. 21st Int. Conf. Comput. Design,
tion goals, such as energy, performance, or cost [51]. Apart San Jose, CA, USA, 2003, pp. 14–20.
[2] T. Adegbija, A. Gordon-Ross, and A. Munir, “Phase distance map-
from the resource constraints of IoT device microprocessors, ping: A phase-based cache tuning methodology for embedded systems,”
these devices also generate massive amounts of data, some of Design Autom. Embedded Syst., vol. 18, nos. 3–4, pp. 251–278, 2014.
which may contain private or sensitive information [79]. [3] T. Adegbija, A. Gordon-Ross, and M. Rawlins, “Analysis of cache tuner
architectural layouts for multicore embedded systems,” in Proc. IEEE
Much previous research has proposed hardware security Int. Perform. Comput. Commun. Conf. (IPCCC), Austin, TX, USA,
techniques for embedded systems [12], [47], [49], [84]. While 2014, pp. 1–8.
most of these techniques can also be employed in IoT devices, [4] T. Adegbija, A. Rogacs, C. Patel, and A. Gordon-Ross, “Enabling right-
provisioned microprocessor architectures for the Internet of Things,” in
one critical requirement for IoT devices is the need to have Proc. Int. Mech. Eng. Congr. Expo., Houston, TX, USA, 2015.
runtime configurable hardware security policies that can adapt [5] J. Ahn, S. Yoo, O. Mutlu, and K. Choi, “PIM-enabled instructions:
to varying security requirements [23]. This requirement is A low-overhead, locality-aware processing-in-memory architecture,”
in Proc. ACM/IEEE 42nd Annu. Int. Symp. Comput. Archit. (ISCA),
motivated by the resource constraints of IoT devices and the Portland, OR, USA, 2015, pp. 336–348.
fact that the sensitivity of various computations vary depending [6] J. B. Anderson and S. Mohan, Source and Channel Coding: An
on the executing applications. Thus, secure microarchitectures Algorithmic Approach, vol. 150. New York, NY, USA: Springer, 2012.
[7] H. Arabnejad and J. G. Barbosa, “List scheduling algorithm for het-
for the IoT must have the capability to be adjusted to satisfy erogeneous systems by an optimistic cost table,” IEEE Trans. Parallel
specific applications’ or tasks’ needs, while incurring minimal Distrib. Syst., vol. 25, no. 3, pp. 682–694, Mar. 2014.
overheads. [8] K. Asanović et al., “The landscape of parallel computing research:
A view from Berkeley,” Dept. EECS, Univ. California at Berkeley,
There are currently very few techniques that have been Berkeley, CA, USA, Tech. Rep. UCB/EECS-2006-183, 2006.
proposed for configurable hardware security in microproces- [9] K. Ashton, “That ‘Internet of Things’ thing,” RFID J., vol. 22, no. 7,
sor architectures, especially for IoT devices [23]. The main pp. 97–114, 2009.
[10] A. Asin and D. Gascon, “50 sensor applications for a smarter world,”
advantage of security at the level of the microarchitecture is Libelium Comunicaciones Distribuidas, Zaragoza, Spain, Tech. Rep.,
that microarchitectures can typically be easily augmented for 2012.
runtime configurability. In addition, configurable microarchi- [11] L. Atzori, A. Iera, and G. Morabito, “The Internet of Things: A survey,”
Comput. Netw., vol. 54, no. 15, pp. 2787–2805, 2010.
tectures can be used to achieve multiple optimization goals. [12] S. Babar, A. Stango, N. Prasad, J. Sen, and R. Prasad, “Proposed
Thus, ensuring hardware security, using configurable microar- embedded security framework for Internet of Things (IoT),” in
chitectures, need not be at the expense of other optimization Proc. 2nd Int. Conf. Wireless Commun. Veh. Technol. Inf. Theory
Aerosp. Electron. Syst. Technol. (Wireless VITAE), Chennai, India,
goals, such as energy consumption. For example, configurable 2011, pp. 1–5.
caches can be used as a moving target defense [107] against [13] J. Baliga, R. W. Ayre, K. Hinton, and R. S. Tucker, “Green cloud
side-channel attacks in caches, while also maintaining the computing: Balancing energy in processing, storage, and transport,”
Proc. IEEE, vol. 99, no. 1, pp. 149–167, Jan. 2011.
other optimization benefits of configurability (e.g., energy and [14] C. B. Bash, C. D. Patel, and R. K. Sharma, “Dynamic thermal manage-
performance) [25]. ment of air cooled data centers,” in Proc. 10th Intersoc. Conf. Thermal
Thermomech. Phenomena Electron. Syst. (ITHERM), San Diego, CA,
USA, 2006, p. 8.
[15] N. Binkert et al., “The gem5 simulator,” Comput. Archit. News, vol. 39,
VII. C ONCLUSION no. 2, pp. 1–7, 2011.
The IoT is expected to transform life, business, and the [16] F. Bonomi, R. Milito, J. Zhu, and S. Addepalli, “Fog computing and
its role in the Internet of Things,” in Proc. 1st Edition MCC Workshop
global economy. The IoT’s scale and rapid proliferation will Mobile Cloud Comput., Helsinki, Finland, 2012, pp. 13–16.
generate massive amounts of data that will result in commu- [17] D. Bortolotti et al., “Approximate compressed sensing: Ultra-low power
nication bandwidth bottlenecks, and latency and energy over- biosignal processing via aggressive voltage scaling on a hybrid memory
heads. Edge computing significantly reduces these overheads multi-core processor,” in Proc. Int. Symp. Low Power Electron. Design,
2014, pp. 45–50.
by equipping IoT devices with right-provisioned microproces- [18] A. P. Chandrakasan, D. C. Daly, J. Kwong, and Y. K. Ramadass, “Next
sors and algorithms that can perform computations on the edge generation micro-power systems,” in Proc. IEEE Symp. VLSI Circuits,
nodes to interpret, visualize, and use data. Honolulu, HI, USA, 2008, pp. 2–5.
[19] M. Chen, S. Gonzalez, A. Vasilakos, H. Cao, and V. C. M. Leung,
This paper presented an overview of microprocessor char- “Body area networks: A survey,” Mobile Netw. Appl., vol. 16, no. 2,
acteristics that will support the growth of the IoT, from pp. 171–193, 2011.
an edge computing perspective, and optimizations that will [20] Y. Chen et al., “A low power software-defined-radio baseband proces-
sor for the Internet of Things,” in Proc. IEEE Int. Symp. High Perform.
enable those characteristics. The survey presented herein Comput. Archit. (HPCA), Barcelona, Spain, 2016, pp. 40–51.
should provide researchers with a foundation for designing [21] V. K. Chippa, S. Venkataramani, S. T. Chakradhar, K. Roy, and
IoT microprocessors that are efficient, configurable, secure, A. Raghunathan, “Approximate computing: An integrated hardware
approach,” in Proc. Asilomar Conf. Signals Syst. Comput., Pacific
future-proof, and extensible. We have also discussed some of Grove, CA, USA, 2013, pp. 111–117.
the challenges with achieving the discussed optimizations, and [22] J.-M. Choi, C.-M. Jung, and K.-S. Min, “PCRAM flip-flop circuits
presented some potential solutions for addressing the chal- with sequential sleep-in control scheme and selective write latch,”
J. Semicond. Technol. Sci., vol. 13, no. 1, pp. 58–64, 2013.
lenges. Since edge computing on the IoT is a growing area of [23] J. Crenne et al., “Configurable memory security in embedded
research, this paper provides a foundation for further research systems,” ACM Trans. Embedded Comput. Syst., vol. 12, no. 3, 2013,
into application requirements and microprocessor optimiza- Art. no. 71.
[24] J. M. Cruz-Albrecht, M. W. Yung, and N. Srinivasa, “Energy-efficient
tions that will support edge computing in next-generation IoT neuron, synapse and STDP integrated circuits,” IEEE Trans. Biomed.
devices. Circuits Syst., vol. 6, no. 3, pp. 246–256, Jun. 2012.
[25] C. Dai and T. Adegbija, “Exploiting configurability as a defense against [49] M. M. Kermani, M. Zhang, A. Raghunathan, and N. K. Jha, “Emerging
cache side channel attacks,” in Proc. IEEE Comput. Soc. Annu. Symp. frontiers in embedded security,” in Proc. 12th Int. Conf. Embedded
VLSI (ISVLSI), Bochum, Germany, 2017. Syst. 26th Int. Conf. VLSI Design, Pune, India, 2013, pp. 203–208.
[26] K. Du, P. Varman, and K. Mohanram, “High performance reliable vari- [50] G.-D. Kim et al., “A single FPGA-based portable ultrasound
able latency carry select addition,” in Proc. Design Autom. Test Europe imaging system for point-of-care applications,” IEEE Trans.
Conf. Exhibit. (DATE), Dresden, Germany, 2012, pp. 1257–1262. Ultrason., Ferroelect., Freq. Control, vol. 59, no. 7, pp. 1386–1394,
[27] A. Efthymiou and J. D. Garside, “Adaptive pipeline structures for spec- Jul. 2012.
ulation control,” in Proc. 9th Int. Symp. Asynchronous Circuits Syst., [51] P. Kocher, R. Lee, G. McGraw, A. Raghunathan, and S. Ravi, “Security
Vancouver, BC, Canada, 2003, pp. 46–55. as a new dimension in embedded system design,” in Proc. 41st Annu.
[28] D. Folegnani and A. González, “Energy-effective issue logic,” ACM Design Autom. Conf., San Diego, CA, USA, 2004, pp. 753–760.
SIGARCH Comput. Archit. News, vol. 29, no. 2, pp. 230–239, 2001. [52] P. Koeberl, S. Schulz, A.-R. Sadeghi, and V. Varadharajan, “TrustLite:
[29] (2016). Gartner. Accessed on Apr. 2016. [Online]. Available: A security architecture for tiny embedded devices,” in Proc. 9th Eur.
http://www.gartner.com/newsroom/id/2684616 Conf. Comput. Syst., Amsterdam, The Netherlands, 2014, p. 10.
[30] E. I. Gaura et al., “Edge mining the Internet of Things,” IEEE Sensors [53] Y. Kora, K. Yamaguchi, and H. Ando, “MLP-aware dynamic instruc-
J., vol. 13, no. 10, pp. 3816–3825, Oct. 2013. tion window resizing for adaptively exploiting both ILP and MLP,” in
[31] S. Gollakota, M. S. Reynolds, J. R. Smith, and D. J. Wetherall, Proc. 46th Annu. IEEE/ACM Int. Symp. Microarchit., Davis, CA, USA,
“The emergence of RF-powered computing,” Computer, vol. 47, no. 1, 2013, pp. 37–48.
pp. 32–39, 2014.
[54] L. Kugler, “Is good enough computing good enough?” Commun. ACM,
[32] V. S. Gopinath, J. Sprinkle, and R. Lysecky, “Modeling of data adapt-
vol. 58, no. 5, pp. 12–14, 2015.
able reconfigurable embedded systems,” in Proc. 18th IEEE Int. Conf.
[55] P. Kulkarni, P. Gupta, and M. Ercegovac, “Trading accuracy for power
Workshops Eng. Comput. Based Syst. (ECBS), Las Vegas, NV, USA,
with an underdesigned multiplier architecture,” in Proc. 24th Int. Conf.
2011, pp. 276–283.
VLSI Design, Chennai, India, 2011, pp. 346–351.
[33] A. Gordon-Ross, F. Vahid, and N. D. Dutt, “Fast configurable-cache
tuning with a unified second-level cache,” IEEE Trans. Very Large [56] R. Kumar, K. Farkas, N. P. Jouppi, P. Ranganathan, and D. M. Tullsen,
Scale Integr. (VLSI) Syst., vol. 17, no. 1, pp. 80–91, Jan. 2009. “Processor power reduction via single-ISA heterogeneous multi-core
[34] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of architectures,” IEEE Comput. Archit. Lett., vol. 2, no. 1, p. 2,
Things (IoT): A vision, architectural elements, and future directions,” Jan./Dec. 2003.
Future Gener. Comput. Syst., vol. 29, no. 7, pp. 1645–1660, 2013. [57] T. T.-O. Kwok and Y.-K. Kwok, “Computation and energy efficient
[35] K. Gudan, S. Chemishkian, J. J. Hull, M. S. Reynolds, and S. Thomas, image processing in wireless sensor networks based on reconfigurable
“Feasibility of wireless sensors using ambient 2.4GHz RF energy,” in computing,” in Proc. Int. Conf. Parallel Process. Workshops (ICPP)
Proc. IEEE Sensors, Taipei, Taiwan, 2012, pp. 1–4. Workshops, Columbus, OH, USA, 2006, p. 8.
[36] H. Hajimiri and P. Mishra, “Intra-task dynamic cache reconfiguration,” [58] K. Y. Kyaw, W. L. Goh, and K. S. Yeo, “Low-power high-speed
in Proc. 25th Int. Conf. VLSI Design (VLSID), Hyderabad, India, 2012, multiplier for error-tolerant application,” in Proc. IEEE Int. Conf.
pp. 430–435. Electron Devices Solid-State Circuits (EDSSC), Hong Kong, 2010,
[37] D. Halperin, T. S. Heydt-Benjamin, K. Fu, T. Kohno, and W. H. Maisel, pp. 1–4.
“Security and privacy for implantable medical devices,” IEEE Pervasive [59] H. Lee et al., “Software defined radio—A high performance embed-
Comput., vol. 7, no. 1, pp. 30–39, Jan./Mar. 2008. ded challenge,” in Proc. Int. Conf. High Perform. Embedded Archit.
[38] J. Han and M. Orshansky, “Approximate computing: An emerging Compilers, Barcelona, Spain, 2005, pp. 6–26.
paradigm for energy-efficient design,” in Proc. 18th IEEE Eur. Test [60] C. Li, W. Zhang, C.-B. Cho, and T. Li, “SolarCore: Solar energy driven
Symp. (ETS), Avignon, France, 2013, pp. 1–6. multi-core architecture power management,” in Proc. IEEE 17th Int.
[39] V. Izosimov, P. Pop, P. Eles, and Z. Peng, “Design optimization of Symp. High Perform. Comput. Archit., San Antonio, TX, USA, 2011,
time-and cost-constrained fault-tolerant distributed embedded systems,” pp. 205–216.
in Proc. Conf. Design Autom. Test Europe, vol. 2. Munich, Germany, [61] X. Li et al., “Smart community: An Internet of Things application,”
2005, pp. 864–869. IEEE Commun. Mag., vol. 49, no. 11, pp. 68–75, Nov. 2011.
[40] B. Jeff, “Big.LITTLE system architecture from arm: Saving power [62] Y. Liu et al., “Ambient energy harvesting nonvolatile processors:
through heterogeneous multiprocessing and task context migration,” From circuit to system,” in Proc. 52nd Annu. Design Autom. Conf.,
in Proc. 49th Annu. Design Autom. Conf., New York, NY, USA, 2012, San Francisco, CA, USA, 2015, p. 150.
pp. 1143–1146. [63] Y. Liu et al., “A 65nm ReRAM-enabled nonvolatile processor with 6x
[41] T. Jiang et al., “Understanding the behavior of in-memory comput- reduction in restore time and 4x higher clock frequency using adaptive
ing workloads,” in Proc. IEEE Int. Symp. Workload Characterization data retention and self-write-termination nonvolatile logic,” in Proc.
(IISWC), Raleigh, NC, USA, 2014, pp. 22–30. IEEE Int. Solid-State Circuits Conf. (ISSCC), San Francisco, CA, USA,
[42] A. B. Kahng and S. Kang, “Accuracy-configurable adder for approx- 2016, pp. 84–86.
imate arithmetic designs,” in Proc. 49th Annu. Design Autom. Conf., [64] S.-L. Lu, “Speeding up processing with approximation circuits,”
San Francisco, CA, USA, 2012, pp. 820–825. Computer, vol. 37, no. 3, pp. 67–73, 2004.
[43] P. Kanerva, Sparse Distributed Memory. Cambridge, MA, USA: MIT
[65] K. Ma et al., “Architecture exploration for ambient energy harvesting
Press, 1988.
nonvolatile processors,” in Proc. IEEE 21st Int. Symp. High Perform.
[44] M. Kang, S. K. Gonugondla, M.-S. Keel, and N. R. Shanbhag,
Comput. Archit. (HPCA), Burlingame, CA, USA, 2015, pp. 526–537.
“An energy-efficient memory-based high-throughput VLSI architec-
[66] McKinsey. (2016). Disruptive Technologies: Advances That Will
ture for convolutional networks,” in Proc. IEEE Int. Conf. Acoust.
Transform Life, Business, and the Global Economy. Accessed on
Speech Signal Process. (ICASSP), Brisbane, QLD, Australia, 2015,
Jan. 2016. [Online]. Available: http://www.mckinsey.com
pp. 1037–1041.
[45] M. Kang, M.-S. Keel, N. R. Shanbhag, S. Eilert, and K. Curewitz, [67] Medical. (2015). Medical Image Processing, Analysis, and
“An energy-efficient VLSI architecture for pattern recognition via Visualization. Accessed on Jan. 2016. [Online]. Available:
deep embedding of computation in SRAM,” in Proc. IEEE Int. http://mipav.cit.nih.gov/
Conf. Acoust. Speech Signal Process. (ICASSP), Florence, Italy, 2014, [68] D. Miorandi, S. Sicari, F. De Pellegrini, and I. Chlamtac, “Internet of
pp. 8326–8330. Things: Vision, applications and research challenges,” Ad Hoc Netw.,
[46] M. Kang and N. R. Shanbhag, “In-memory computing architectures vol. 10, no. 7, pp. 1497–1516, 2012.
for sparse distributed memory,” IEEE Trans. Biomed. Circuits Syst., [69] A. Mort et al., “Combining transcranial ultrasound with intelligent
vol. 10, no. 4, pp. 855–863, Aug. 2016. communication methods to enhance the remote assessment and man-
[47] A. Kanuparthi, R. Karri, and S. Addepalli, “Hardware and embedded agement of stroke patients: Framework for a technology demonstrator,”
security in the context of Internet of Things,” in Proc. ACM Workshop Health Informat. J., vol. 22, no. 3, pp. 691–701, 2016.
Security Privacy Dependability Cyber Veh., Berlin, Germany, 2013, [70] E. F. Nakamura, A. A. F. Loureiro, and A. C. Frery, “Information fusion
pp. 61–64. for wireless sensor networks: Methods, models, and classifications,”
[48] N. Karimian, P. A. Wortman, and F. Tehranipoor, “Evolving authentica- ACM Comput. Surveys, vol. 39, no. 3, p. 9, 2007.
tion design considerations for the Internet of biometric things (IoBT),” [71] O. Navarro, T. Leiding, and M. Hübner, “Configurable cache tuning
in Proc. Int. Conf. Hardw./Softw. Codesign Syst. Syn. (CODES+ISSS), with a victim cache,” in Proc. 10th Int. Symp. Reconfig. Commun.
Pittsburgh, PA, USA, 2016, pp. 1–10. Centric Syst. Chip (ReCoSoC), Bremen, Germany, 2015, pp. 1–6.
[72] S. Onkaraiah et al., “Bipolar ReRAM based non-volatile flip-flops for [96] S. Venkataraman, A. Kumar, J. Schlachter, and C. Enz, “Designing
low-power architectures,” in Proc. IEEE 10th Int. New Circuits Syst. inexact systems efficiently using elimination heuristics,” in Proc.
Conf. (NEWCAS), Montreal, QC, Canada, 2012, pp. 417–420. Design Autom. Test Europe Conf. Exhibit., Grenoble, France, 2015,
[73] C. D. Patel, C. E. Bash, R. Sharma, M. Beitelmal, and R. Friedrich, pp. 758–763.
“Smart cooling of data centers,” in Proc. ASME Int. Electron. Packag. [97] H. J. Visser, A. C. F. Reniers, and J. A. C. Theeuwes, “Ambient RF
Tech. Conf. Exhibit., 2003, pp. 129–137. energy scavenging: GSM and WLAN power density measurements,” in
[74] D. A. Patterson, G. Gibson, and R. H. Katz, “A case for redundant Proc. 38th Eur. Microw. Conf. (EuMC), Amsterdam, The Netherlands,
arrays of inexpensive disks (RAID),” ACM SIGMOD Rec., vol. 17, 2008, pp. 721–724.
no. 3, pp. 109–116, 1988. [98] W. Wang and T. Dey. (2011). A Survey on Arm Cortex a Processors.
[75] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, “Context [Online]. Available: http://www.cs.virginia.edu1∼skadron
aware computing for the Internet of Things: A survey,” IEEE Commun. [99] Y. Wang et al., “A 3us wake-up time nonvolatile processor based on fer-
Surveys Tuts., vol. 16, no. 1, pp. 414–454, 1st Quart., 2014. roelectric flip-flops,” in Proc. ESSCIRC (ESSCIRC), Bordeaux, France,
[76] V. Raghunathan, S. Ganeriwal, M. Srivastava, and C. Schurgers, 2012, pp. 149–152.
“Energy efficient wireless packet scheduling and fair queuing,” ACM [100] Z. Xiong, X. Wu, S. Cheng, and J. Hua, “Lossy-to-lossless compres-
Trans. Embedded Comput. Syst., vol. 3, no. 1, pp. 3–23, 2004. sion of medical volumetric data using three-dimensional integer wavelet
[77] V. Raghunathan, A. Kansal, J. Hsu, J. Friedman, and M. Srivastava, transforms,” IEEE Trans. Med. Imag., vol. 22, no. 3, pp. 459–470,
“Design considerations for solar energy harvesting wireless embed- Mar. 2003.
ded systems,” in Proc. 4th Int. Symp. Inf. Process. Sensor Netw., [101] J. Yiu, The Definitive Guide to ARM R Cortex-M3
R and Cortex-M4
R
2005, p. 64. Processors. Newnes, 2013.
[78] M. C. Ramon, “Intel Galileo and intel Galileo gen 2,” in IntelR Galileo [102] W.-K. Yu et al., “A non-volatile microcontroller with integrated
and Intel R Galileo Gen 2. Springer, 2014, pp. 1–33. floating-gate transistors,” in Proc. IEEE/IFIP 41st Int. Conf. Depend.
[79] A.-R. Sadeghi, C. Wachsmann, and M. Waidner, “Security and privacy Syst. Netw. Workshops (DSN-W), Hong Kong, 2011, pp. 75–80.
challenges in industrial Internet of Things,” in Proc. 52nd Annu. Design [103] M. Zaharia et al., “Resilient distributed datasets: A fault-tolerant
Autom. Conf., San Francisco, CA, USA, 2015, p. 54. abstraction for in-memory cluster computing,” in Proc. 9th USENIX
[80] G. K. Saha, “Software based fault tolerance: A survey,” Ubiquity, Conf. Netw. Syst. Design Implement., San Jose, CA, USA,
vol. 2006, p. 1, Jul. 2006. 2012, p. 2.
[81] F. Samie, L. Bauer, and J. Henkel, “An approximate compres- [104] C. Zhang, F. Vahid, and W. Najjar, “A highly configurable cache archi-
sor for wearable biomedical healthcare monitoring systems,” in tecture for embedded systems,” in Proc. 30th Annu. Int. Symp. Comput.
Proc. 10th Int. Conf. Hardw./Softw. Codesign Syst. Synth., Amsterdam, Archit., San Diego, CA, USA, 2003, pp. 136–146.
The Netherlands, 2015, pp. 133–142. [105] S. Zhang, M. Kang, C. Sakr, and N. Shanbhag, “Reducing the energy
[82] A. Sampson, J. Nelson, K. Strauss, and L. Ceze, “Approximate storage cost of inference via in-sensor information processing,” arXiv preprint
in solid-state memories,” ACM Trans. Comput. Syst., vol. 32, no. 3, arXiv:1607.00667, 2016.
p. 9, 2014. [106] L. Zhou and H.-C. Chao, “Multimedia traffic security architecture
[83] E. Sánchez-Sinencio, “Smart nodes of Internet of Things (IoT): A hard- for the Internet of Things,” IEEE Netw., vol. 25, no. 3, pp. 35–40,
ware perspective view & implementation,” in Proc. 24th Edition Great May/Jun. 2011.
Lakes Symp. VLSI, Houston, TX, USA, 2014, pp. 137–138. [107] R. Zhuang, S. A. DeLoach, and X. Ou, “Towards a theory of moving
[84] D. N. Serpanos and A. G. Voyiatzis, “Security challenges in embedded target defense,” in Proc. 1st ACM Workshop Moving Target Defense,
systems,” ACM Trans. Embedded Comput. Syst., vol. 12, no. 1s, p. 66, Scottsdale, AZ, USA, 2014, pp. 31–40.
2013. [108] M. Zwerg et al., “An 82µa/MHz microcontroller with embed-
[85] Y. S. Shao, B. Reagen, G.-Y. Wei, and D. Brooks, “Aladdin: A pre-RTL, ded FeRAM for energy-harvesting applications,” in Proc. IEEE
power-performance accelerator simulator enabling large design space Int. Solid-State Circuits Conf., San Francisco, CA, USA, 2011,
exploration of customized architectures,” in Proc. ACM/IEEE 41st pp. 334–336.
Int. Symp. Comput. Archit. (ISCA), Minneapolis, MN, USA, 2014,
pp. 97–108.
[86] D. Shin and S. K. Gupta, “Approximate logic synthesis for error toler- Tosiron Adegbija (M’11) received the B.Eng.
ant applications,” in Proc. Conf. Design Autom. Test Europe, Dresden, degree in electrical engineering from the University
Germany, 2010, pp. 957–960. of Ilorin, Ilorin, Nigeria, in 2005, and the M.S. and
[87] D. Shin and S. K. Gupta, “A new circuit simplification method for error Ph.D. degrees in electrical and computer engineer-
tolerant applications,” in Proc. Design Autom. Test Europe, Grenoble, ing from the University of Florida, Gainesville, FL,
France, 2011, pp. 1–6. USA, in 2011 and 2015, respectively.
[88] A. K. Singh, M. Shafique, A. Kumar, and J. Henkel, “Mapping on He is currently an Assistant Professor of Electrical
multi/many-core systems: Survey of current and emerging trends,” and Computer Engineering with the University of
in Proc. 50th Annu. Design Autom. Conf., Austin, TX, USA, 2013, Arizona, Tucson, AZ, USA. His current research
pp. 1–10. interests include computer architecture, with empha-
[89] S. P. Singh and R. Maini, “Comparison of data encryption algorithms,” sis on adaptable computing, low-power embedded
Int. J. Comput. Sci. Commun., vol. 2, no. 1, pp. 125–127, 2011. systems design and optimization methodologies, and microprocessor optimiza-
[90] G. Stitt, R. Lysecky, and F. Vahid, “Dynamic hardware/software partitions for the Internet of Things.
tioning: A first approach,” in Proc. 40th Annu. Design Autom. Conf., Dr. Adegbija was a recipient of the Best Paper Award at the Ph.D. forum
Anaheim, CA, USA, 2003, pp. 250–255. of the IEEE Computer Society Annual Symposium on VLSI in 2014.
[91] H. Sundmaeker, P. Guillemin, P. Friess, and S. Woelfflé, “Vision
and challenges for realising the Internet of Things,” Cluster
Eur. Res. Projects Internet Things, Eur. Commision, vol. 20, Anita Rogacs received the B.S. degree in mechan-
2010. ical engineering from San Jose State University,
[92] K. Swaminathan, R. Mukundrajan, N. Soundararajan, and San Jose, CA, USA, and the M.S. and Ph.D. degrees
V. Narayanan, “Towards resilient micro-architectures: Datapath in mechanical engineering from Stanford University,
reliability enhancement using STT-MRAM,” in Proc. IEEE Comput. Stanford, CA, USA.
Soc. Annu. Symp. VLSI, Chennai, India, 2011, pp. 236–241. At HP Laboratories, Palo Alto, CA, USA, she
[93] S. Tilak, N. B. Abu-Ghazaleh, and W. Heinzelman, “A taxonomy built Life Sciences Laboratories, which is now
of wireless micro-sensor network models,” ACM SIGMOBILE Mobile home to research efforts in areas of plasmonics,
Comput. Commun. Rev., vol. 6, no. 2, pp. 28–36, 2002. Raman spectroscopy, microfluidics, analytical chem-
[94] M. Tory and T. Möller, “Rethinking visualization: A high-level taxon- istry, molecular biology, and data mining. She is
omy,” in Proc. IEEE Symp. Inf. Visual. INFOVIS, Austin, TX, USA, also the HP Laboratories Principle Investigator of
2004, pp. 151–158. the CRADA collaboration with the Federal Food and Drug Administration,
[95] O. S. Unsal, I. Koren, and C. M. Krishna, “Towards energy-aware which aims to develop screening methods using surface enhanced Raman
software-based fault tolerance in real-time systems,” in Proc. Int. Symp. spectroscopy sensors.
Low Power Electron. Design (ISLPED), Monterey, CA, USA, 2002, Dr. Rogacs was a recipient of the National Science Foundation and the
pp. 124–129. Sandia National Laboratories Research Fellowships.
Chandrakant Patel (F’08) received the B.S. degree Ann Gordon-Ross (M’00) received the B.S. and
in mechanical engineering from the University of Ph.D. degrees in computer science and engineering
California at Berkeley, Berkeley, CA, USA, in 1983 from the University of California, Riverside, CA,
and the M.S. degree in mechanical engineering from USA, in 2000 and 2007, respectively.
San Jose State University, San Jose, CA, USA, She is currently an Associate Professor of
in 1988. Electrical and Computer Engineering with the
He is currently Chief Engineer and a Senior University of Florida, Gainesville, FL, USA, and
Fellow of HP Laboratories, Palo Alto, CA, USA. also a member of the NSF Center for High
He has led HP Laboratories in delivering innovations Performance Reconfigurable Computing. She is also
in chips, systems, data centers, storage, networking, a Faculty Advisor of the Women in Electrical and
print engines, and software platforms. He is a pio- Computer Engineering and the Phi Sigma Rho
neer in thermal and energy management in data centers, and in the application National Society for Women in Engineering and Engineering Technology,
of information technology for available energy management at the scale of and an Active Member of the Women in Engineering Proactive Network. Her
cities. An advocate of a return to fundamentals, he has served as an Adjunct current research interests include embedded systems, computer architecture,
Faculty Member of Engineering with Chabot College, Hayward, CA, USA, low-power design, reconfigurable computing, dynamic optimizations, hard-
UC Berkeley Extension, Berkeley, CA, USA, San Jose State University, ware design, real-time systems, and multicore platforms.
San Jose, CA, USA, and Santa Clara University, Santa Clara, CA, USA. Dr. Gordon-Ross was a recipient of the CAREER Award from the National
In 2014, he was elected to the Silicon Valley Engineering Hall of Fame. He Science Foundation in 2010, the Best Paper Awards at the Great Lakes
holds 151 patents and has published over 150 papers. Symposium on VLSI in 2010 and the IARIA International Conference on
Mr. Patel is an ASME Fellow. Mobile Ubiquitous Computing, Systems, Services and Technologies in 2010,
and the Best Ph.D. Poster at the IEEE Computer Society Annual Symposium
on VLSI in 2014. She is very active in promoting diversity in STEM fields,
and has been a Guest Speaker at several international workshops/conferences
on this topic, organizes workshops, and participates in local outreach programs
at local K-12 schools.

2018 Microprocessor Optimizations For The IOT

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

2018 Microprocessor Optimizations For The IOT

Uploaded by

Copyright:

Available Formats

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 37, NO.

Microprocessor Optimizations for the

sufficient computational capabilities and algorithms to extract

way for determining these characteristics is through simu-

V. S TATE - OF - THE -A RT IN I OT M ICROARCHITECTURE

A. Configurable/Adaptable Architectures for IoT microprocessors. The intrinsic characteristics of the

a promising technique for replacing or supplementing energy

microprocessors may be exceeded by the resource require- R EFERENCES

You might also like