Computer Architecture and Design Methodologies
Andreas Weichslgartner
Stefan Wildermann
Michael Glaß
Jürgen Teich
Invasive Computing for Mapping Parallel Programs to Many-Core Architectures
Computer Architecture and Design
Methodologies
Series editors
Anupam Chattopadhyay, Noida, India
Soumitra Kumar Nandy, Bangalore, India
Jürgen Teich, Erlangen, Germany
Debdeep Mukhopadhyay, Kharagpur, India
The twilight zone of Moore’s law is affecting computer architecture design like never
before. The strongest impact on computer architecture is perhaps the move from
unicore to multicore architectures, represented by commodity architectures like
general-purpose graphics processing units (GPGPUs). Besides that, the deep impact of
application-specific constraints from emerging embedded applications is presenting
designers with new, energy-efficient architectures like heterogeneous multi-core,
accelerator-rich Systems-on-Chip (SoCs). These effects, together with the security,
reliability, thermal, and manufacturability challenges of nanoscale technologies, are
forcing computing platforms to move towards innovative solutions. Finally, the
emergence of technologies beyond conventional charge-based computing has led to
a series of radical new architectures and design methodologies.
The aim of this book series is to capture these diverse, emerging architectural
innovations as well as the corresponding design methodologies. The scope will
cover the following.
Heterogeneous multi-core SoC and their design methodology
Domain-specific Architectures and their design methodology
Novel Technology constraints, such as security, fault-tolerance and their impact
on architecture design
Novel technologies, such as resistive memory, and their impact on architecture
design
Extremely parallel architectures
Andreas Weichslgartner
Department of Computer Science
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Erlangen, Bayern
Germany

Stefan Wildermann
Department of Computer Science
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Erlangen, Bayern
Germany

Michael Glaß
Embedded Systems/Real-Time Systems
University of Ulm
Ulm, Baden-Württemberg
Germany

Jürgen Teich
Department of Computer Science
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Erlangen, Bayern
Germany
This work originated from within the Transregional Collaborative Research Center
89 “Invasive Computing” (abbr. InvasIC), in which a novel paradigm for the design
and resource-aware programming of future parallel computing systems is investigated.
For systems with 1000 and more cores on a chip, resource-aware programming
is of utmost importance to obtain high utilization as well as high
computational and energy efficiency, but also in order to achieve predictable
qualities of execution of parallel programs. The basic principle of invasive computing
and innovation is to give a programmer explicit handles to specify and argue
about resource requirements desired or required in different phases of execution.
InvasIC is funded by the Deutsche Forschungsgemeinschaft (DFG), aggregating
researchers from Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU),
Karlsruher Institut für Technologie (KIT), and Technische Universität München
(TUM). Its scientific team includes specialists in parallel algorithm design, hardware
architects for reconfigurable MPSoC development as well as language, tool,
application, and operating system designers.
At this point, we would like to thank all participating scientists of InvasIC who enabled
and jointly contributed to the achievements of InvasIC in general and to the results
summarized in this book in particular. Our particular thanks go to the DFG for
funding InvasIC.
Contents
1 Introduction
1.1 Contributions
1.1.1 (A) Decentralized Application Mapping
1.1.2 (B) Hybrid Application Mapping
1.1.3 (C) Nonfunctional Properties
1.2 Outline of this Book
References
2 Invasive Computing
2.1 Principles of Invasive Computing
2.2 Invasive Programming Language
2.2.1 Invade, Infect, Retreat, and Claims
2.2.2 Communication-Aware Programming
2.2.3 Actor Model and Nonfunctional Properties
2.3 Overhead Analysis of Invasive Computing
2.3.1 Invasive Speedup and Efficiency Analysis
2.4 Invasive Hardware Architectures
2.4.1 Invasive Tightly Coupled Processor Arrays
2.4.2 The Invasive Core—i-Core
2.4.3 Dynamic Many-Core i-let Controller—CiC
2.5 Invasive Network on Chip—i-NoC
2.5.1 Router
2.5.2 Invasive Network Adapter—i-NA
2.5.3 Control Network Layer
2.6 Invasive Run-Time and Operating System
2.7 Related Work
References
3 Fundamentals
3.1 Application Model
3.2 System Architecture
3.3 Application Mapping
3.4 Composability
3.5 Predictability
3.5.1 -Predictability
References
4 Self-embedding
4.1 Self-embedding Algorithm
4.2 Incarnations of Embedding Algorithms
4.2.1 Path Load and Best Neighbor
4.2.2 Random Walk
4.2.3 Discussion
4.3 Seed-Point Selection
4.4 Hardware-Based Acceleration for Self-embedding
4.4.1 Application Graph Preprocessing
4.4.2 Serialization
4.4.3 Protocol
4.4.4 Implementation
4.5 Experimental Results
4.5.1 Simulation Setup
4.5.2 Evaluation Metrics
4.5.3 Scalability
4.5.4 Random Walk with Weighted Probabilities
4.5.5 Hardware-Based Self-embedding
4.6 Related Work
4.7 Summary
References
5 Hybrid Application Mapping
5.1 HAM Methodology
5.2 Static Performance Analysis
5.2.1 Composable Communication Scheduling
5.2.2 Composable Task Scheduling
5.3 Design Space Exploration
5.3.1 Generation of Feasible Application Mappings
5.3.2 Optimization Objectives and Evaluation
5.4 Run-Time Constraint Solving
5.4.1 Constraint Graphs
5.4.2 Run-Time Mapping of Constraint Graphs
5.4.3 Backtracking Algorithm
5.4.4 Run-Time Management and System Requirements
Symbols
A Attacker, 17
A Variable assignment in CSP, 105
ai Underutilization factor, 21
avgnet Average network load, 75
B Communication channel, 102, 105, 141, 148
BE Best-case execution time, 46
b Map task, 55, 59, 60, 64, 65, 67, 91, 92, 95, 100, 102, 104,
107, 111, 113, 120, 142
bCG Map task cluster of constraint graph function, 103, 105
bDSE Map task function in DSE, 102
bw Minimum required message bandwidth, 46, 47, 49, 59, 64,
71, 98
C Task cluster, 101, 102, 103, 104, 105, 112, 141
c Cost function for self-embedding algorithm, 60, 62
cap Link capacity, 49, 59, 66, 98
CL Worst-case communication latency, 93
Conf Confidentiality, 17
D Domain in CSP, 106
d Deadline, 46, 91
Dc Number of cores to invade, 23
E Set of edges of an application graph, 46, 47, 49, 50
e Edge of an application graph, 46
ECPU Overall maximal processor energy consumption of a
mapping, 99
Einc Energy consumption of all mapped operating points by
incremental RM, 114
embAlg Embedding algorithm, 60
EMMKP Energy consumption of all mapped operating points by
MMKP RM, 114
ENOC Overall maximal NoC energy consumption of a mapping,
99
Env Environment, 17
EOV Overall maximal energy consumption of a mapping, 90,
99, 107, 109
2 Conf 2-confidentiality, 17
equaltype Checks if the resource type of a tile matches a certain
resource type, 101
Erel Relative energy consumption of MMKP RM and incremental RM, 114
ELbit Energy consumption of one bit in a NoC link, 99, 112
ESbit Energy consumption of one bit in a NoC router, 99, 112
ENoCbit Energy consumption of routing one bit over a NoC router,
99
gR NoC router delay, 93
f Frequency, 49, 69, 125
GAPP0(V, E) Example application graph, 46, 47, 50
GArch Short notation of architecture graph, 106
GArch0(U, L) Example architecture graph, 48
GArch(U, L) Architecture graph, 47, 50, 91, 102, 141
GApp(V, E) Application graph, 46, 49, 71, 91, 103, 141
GApp(V = T ∪ M, E) Application graph, 70, 71
GC Short notation of constraint graph, 106
GC(VC, EC) Constraint graph, 101, 102, 141
gettype Determines the resource type of a tile: U → R, 47, 49, 50,
94, 96, 98, 99, 101, 144
hop Hop constraint in the constraint graph, 102, 103
H+ Hop distance, 48, 49, 93, 102, 104
H Manhattan distance, 48, 49, 50, 66, 67, 93, 99, 102, 104
h Max hop distance in self-embedding algorithm, 60, 61, 62,
64, 65, 66, 76
Hq+ Hop distance of a route, 93, 99
Hq Manhattan distance of a route, 50, 93, 99
I Input space, 53
i Running variable, 18, 19, 20, 66
=E Invasive efficiency, 22, 23
INF Infimum, 53
INFL Best-case end-to-end latency, 125
INFLComp Best-case tile latency, 125
INFLNoC Best-case NoC latency, 125
INFTrNoC Best-case NoC throughput, 126
=P Average number of processors utilized, 22, 23
=S Invasive speedup, 22, 23
isrouted Function evaluates whether a message is routed over the
NoC or utilizes local tile communication, 100
=T Invasive execution time, 22, 23
Chapter 1
Introduction
Abstract One of the most important trends in computer architecture in recent years
is the paradigm shift toward multi- and many-core chips. This chapter outlines the
implications and challenges of future many-core architectures and gives an overview
of the book’s contributions.
One of the most important trends in computer architecture in recent years is the
paradigm shift toward multi- and many-core chips. Until the year 2005, the performance
gain of new processor generations mainly stemmed from advances in the
microarchitecture and an increased frequency (see Fig. 1.1).
Then, frequency scaling reached its limit, and additional performance gains by
improving the core architecture would result in a huge increase in power consumption
[5]. As Moore’s law still holds, the number of transistors still increases exponentially.
These additional transistors contribute best to a higher performance when
used for increasing the core count. By exploiting parallelism, multiple “weaker”
cores can outperform a single core. To accelerate programs which cannot profit from
parallelism, specialized hardware blocks (e.g., for cryptography or signal processing)
or mixtures of powerful and weaker processors (e.g., ARM big.LITTLE) can
be used. This heterogeneity might also help to circumvent the problem of dark silicon
[9]. The term dark silicon describes the fact that not all transistors on a chip can
be utilized concurrently because of power density limits, as otherwise the temperature
would exceed its limits. As a direct consequence, some parts of the chip do no computation
at all and stay “dark.” Overall, heterogeneous many-core architectures seem
to be the most promising solution to cope with the aforementioned problems. This
affects all markets and branches, ranging from high-performance computing (HPC)
over gaming and mobile devices to processors in the automotive and embedded
sector. Targeting the HPC market, Intel’s latest generation of the many-core chip Intel
Xeon Phi, Knights Landing [12], offers a chip with 72 Atom cores. Also, the leading
supercomputer of the Top 500 list [13], TaihuLight, incorporates clusters of many-core
systems. Altogether, the system consists of 40,960 nodes where each node is an
SW26010 processor with 260 cores [6]. TaihuLight not only outperforms all other
systems which rely on processors with fewer cores or use acceleration by graphics
processing units (GPUs) but is also more energy-efficient than other supercomputers.
The Tile-Mx 100 from Mellanox (previously Tilera) incorporates 100 ARMv8
cores onto a single chip [7]. The company markets the chip for the networking and
data center area. Academic research also aims at massive many-core chips. In [3],
the design and implementation of a kilo-core, a chip with 1,000 processing cores, is
presented.
Fig. 1.1 Development of processor architectures over the last decades. While the frequency and
the power have saturated, the number of transistors still increases exponentially and performance
gains result mainly from an increased core count (cf. [1]; plotted with data from [11])
It can be observed that the aforementioned chips do not use specially designed and
sophisticated processor cores but rather employ already developed energy-efficient
cores from the embedded domain. The design focus shifts to the so-called uncore.
Uncore describes everything on a chip which is not a processing core, e.g., last-level
cache, communication infrastructure, and memory controller. Obviously, conventional
single arbitrated buses or crossbars do not scale up to thousands of cores.
Networks on chip (NoCs) with regular structures and simple building blocks have
therefore emerged as an easily extendable communication infrastructure [2].
As these computing systems interfuse more and more with our daily life, ranging
from industry automation over transportation to the internet of things and smart devices,
requirements with respect to nonfunctional execution properties drastically increase.
A functionally correct execution of a program is mostly no longer sufficient. Nowadays,
the energy consumption or a predictable execution time of a program already plays an
important role. Especially for mobile and embedded devices, a small power
footprint is crucial. For example, the uncore of the processor already consumes over
50% of the power budget of a processor [4]. But other nonfunctional execution
properties also gain more and more importance: In safety-critical environments, e.g.,
automotive or aerospace, hard real-time requirements are a prerequisite. Additionally,
to meet certain safety standards, e.g., safety integrity level (SIL), programs and
communication may be conducted redundantly. Even in non-safety-critical environments,
nonfunctional execution properties gain importance. For example, the user
wants to have a minimum video throughput and quality for a paid streaming service
and has high demands on privacy and security of his/her data and programs.
In summary, modern chip architectures comprise more and more heterogeneous
computing cores interconnected with a NoC. Efficiently exploiting the computational
performance of these systems while considering nonfunctional execution properties
is one of the major challenges in today’s computer science. To tackle these problems,
Teich proposed invasive computing [14]. Invasive computing gives the application
programmer the possibility to invade resources according to her/his specified
constraints, infect the returned claim of resources with program code, and retreat
from these resources after the computation is finished. Invasive computing comprises
various aspects of computing, ranging from language development to invasive
hardware. One aspect is mapping applications onto many-core architectures. This is
a challenging task, especially when considering nonfunctional execution properties,
resource utilization, and short mapping times.
1.1 Contributions
The book at hand investigates and proposes novel application mapping methodologies.
As detailed in Fig. 1.2, the application mapping can take different amounts of
time and can fulfill various nonfunctional execution requirements.
The main contributions of this book may be summarized as follows:
(A) An approach to decentrally map applications to NoC architectures with a focus
on communication [16] and the possibility of hardware acceleration inside NoC
routers [18].
(B) A hybrid application flow that combines the strengths of static analysis
and dynamic adaptivity [15, 17, 19, 21, 22].
(C) Assuring nonfunctional properties such as timeliness [15, 17, 19] and security
[8, 20].
1.1.1 (A) Decentralized Application Mapping
Application graphs, besides other application characteristics, express the task-level
parallelism of applications. This parallelism can also be exploited during the mapping
process. The concept of self-embedding [16] describes a class of distributed
algorithms where each task is responsible for mapping its succeeding tasks and the
communication in between. These algorithms are highly scalable as they do not
require global knowledge and make their mapping decisions based on local information.
Dedicated hardware modules, attached to each network router inside an invasive
network on chip (i-NoC) [10], have direct access to the i-NoC link utilization and
can accelerate the self-embedding [18].
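The idea of tasks placing their own successors from purely local information can be illustrated with a small simulation. The following is only a toy sketch and not the self-embedding algorithm of [16]: the mesh layout, the `capacity` parameter, the load-based choice among direct neighbors, and all names (`self_embed`, `local_view`) are assumptions made for illustration, and communication (bandwidth, link load) is ignored entirely.

```python
from collections import deque


def self_embed(successors, width, height, seed_tile, capacity=1):
    """Toy self-embedding: every placed task maps its own successors.

    successors: dict task -> list of successor tasks (first key is the entry task)
    width, height: size of the hypothetical 2D mesh of tiles
    seed_tile: (x, y) tile where the entry task starts
    capacity: assumed number of tasks one tile can host
    """
    load = {seed_tile: 1}                 # tile -> tasks hosted so far
    entry = next(iter(successors))
    mapping = {entry: seed_tile}          # task -> tile
    queue = deque([entry])

    def local_view(tile):
        """Tiles a task may inspect: its own tile and the direct mesh neighbors."""
        x, y = tile
        candidates = [(x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        return [(cx, cy) for cx, cy in candidates
                if 0 <= cx < width and 0 <= cy < height]

    while queue:
        task = queue.popleft()
        for succ in successors[task]:
            if succ in mapping:           # already placed by another predecessor
                continue
            free = [t for t in local_view(mapping[task])
                    if load.get(t, 0) < capacity]
            if not free:
                raise RuntimeError(f"{task} found no free tile for {succ}")
            # Local decision: least-loaded tile in the neighborhood.
            target = min(free, key=lambda t: load.get(t, 0))
            mapping[succ] = target
            load[target] = load.get(target, 0) + 1
            queue.append(succ)            # the successor continues the embedding
    return mapping


# Hypothetical fork-join application graph on a 3x2 mesh.
app = {"t0": ["t1", "t2"], "t1": ["t3"], "t2": ["t3"], "t3": []}
placement = self_embed(app, width=3, height=2, seed_tile=(0, 0))
```

No step in the loop consults global state beyond the tile loads in a task's immediate neighborhood, which is the property the text credits for scalability. A realistic variant would also weigh link utilization when choosing the target tile.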
Fig. 1.2 Overview of the structure and the contributions of this book. Chapters 2 and 3 introduce
the required context and fundamentals while Chaps. 4–6 present the contributions in the area of
application mapping. (Figure panels: Chapter 2, introduction to invasive computing and invasive
programming, and overview of invasive architectures and hardware; Chapter 3, formal models for
applications and architectures and fundamentals; Chapter 4, fast mapping heuristic
(self-embedding) with hardware support; Chapter 5, hybrid application mapping methodology;
Chapter 6, hybrid application mapping for security-critical applications.)
1.1.2 (B) Hybrid Application Mapping
Dynamic application mapping algorithms have a limited time budget to find a suitable
mapping. Hence, they cannot perform extensive formal analyses to determine bounds
on nonfunctional properties to ensure a predictable program execution. In contrast,
static approaches are unable to react to run-time events or to changes in the composition
of the executed applications (i.e., inter-application scenarios). As the number
of possible scenarios is exponential in the number of applications, scenario-based
approaches scale poorly. In contrast to existing hybrid application
mapping approaches, this book proposes the design-time application analysis and
run-time mapping (DAARM) design flow, which is capable of exploring multiple
objectives rather than only timing and energy. Most existing approaches also simplify
the NoC communication in their analysis and the run-time mapping process.
For more realistic results, a detailed model of the invasive NoC [10] for latency,
throughput, link utilization, and energy is an integral part of the DAARM design
flow [17, 19]. During a design space exploration (DSE) at design time, infeasible
mappings which overutilize NoC resources can already be discarded. Only feasible
mappings, along with the explored objectives, are handed over to the run-time
management (RM). As an intermediate representation, the book at hand proposes
the notion of a constraint graph. This graph encodes all constraints which need to
hold for the run-time mapping so that it adheres to the objectives evaluated at design
time. To perform the run-time mapping of this constraint graph, this book proposes
a backtracking algorithm.
1.1.3 (C) Nonfunctional Properties
As detailed before, the proposed hybrid application mapping approach makes it possible
to give upper bounds for nonfunctional execution properties. These properties are derived
by a static analysis but hold true even in the context of dynamic run-time mapping.
In this book, we consider the following objectives: (a) timing (best-case/worst-case
end-to-end latency) [15, 17, 19] (see Chap. 5), (b) energy consumption [17, 19] (see
Chap. 5), and (c) security (spatial isolation of communication and computation) [8,
20] (see Chap. 6). We present the needed analysis models and methodologies to
integrate these nonfunctional execution properties and investigate the implications
on the mapping process.
References
18. Weichslgartner A, Heisswolf J, Zaib A, Wild T, Herkersdorf A, Becker J, Teich J (2015) Position
paper: Towards hardware-assisted decentralized mapping of applications for heterogeneous
NoC architectures. In: Proceedings of the International Workshop on Multi-Objective
Many-Core Design (MOMAC). VDE, pp 1–4. http://ieeexplore.ieee.org/document/7107099/
19. Weichslgartner A, Wildermann S, Gangadharan D, Glaß M, Teich J (2017) A design-time/run-time
application mapping methodology for predictable execution time in MPSoCs. ArXiv
e-prints pp 1–30, arXiv: 1711.05932
20. Weichslgartner A, Wildermann S, Götzfried J, Freiling F, Glaß M, Teich J (2016)
Design-time/run-time mapping of security-critical applications in heterogeneous MPSoCs. In:
Proceedings of the Conference on Languages, Compilers and Tools for Embedded Systems (SCOPES).
ACM, pp 153–162. https://doi.org/10.1145/2906363.2906370
21. Wildermann S, Weichslgartner A, Teich J (2015) Design methodology and run-time management
for predictable many-core systems. In: Proceedings of the Workshop on Self-Organizing
Real-Time Systems (SORT). IEEE, pp 103–110. https://doi.org/10.1109/ISORCW.2015.48
22. Wildermann S, Bader M, Bauer L, Damschen M, Gabriel D, Gerndt M, Glaß M, Henkel J,
Paul J, Pöppl A, Roloff S, Schwarzer T, Snelting G, Stechele W, Teich J, Weichslgartner A,
Zwinkau A (2016) Invasive computing for timing-predictable stream processing on MPSoCs.
it - Inf Technol 58(6):267–280. https://doi.org/10.1515/itit-2016-0021
Chapter 2
Invasive Computing
Abstract As this book originates in the context of invasive computing, this chapter
gives an overview of the invasive computing paradigm and its realization in software
and hardware. It starts with its basic principles and then gives an overview of how
the paradigm is expressed at the language level. Afterwards, a formal definition
and analysis of invasive speedup and efficiency according to Teich et al. is given.
For the formal analysis of individual application programs independent from each
other through composability presented in the later chapters of this book, it is a
prerequisite to consider an actual invasive hardware architecture. Therefore, a tiled
invasive architecture with its building blocks is detailed with a focus on the invasive
network on chip (i-NoC). Finally, a brief description of the employed operating system
is given before other approaches which deal with heterogeneous many-core systems are
reviewed.
Future and even today’s many-core systems come with various challenges
and obstacles. Namely, programmability, adaptivity, scalability, physical constraints,
reliability, and fault-tolerance are mentioned in [52]. These issues motivate the new
computing paradigm invasive computing, first proposed by Teich in [50], which
introduces resource-aware programming. This gives the application programmer the
possibility to distribute the workload of the application based on the availability
and status of the underlying hardware resources. In [52], Teich et al. define invasive
computing as follows:
In contrast to statically mapped applications, resources are only claimed when they
are actually needed and are available for other applications after they are freed. This
increases the resource utilization drastically and hence the efficiency (for a formal
analysis of the invasive efficiency, see Sect. 2.3). Also, each application can adapt
itself to the amount and types of available resources. For example, if there are more
computing resources available, it can utilize a higher degree of parallelism. Or, if
there is a special accelerator module available, the programmer can use this resource
to execute an implementation variant of the algorithm which is tailored for exactly
this accelerator. Additionally, the application can retreat from resources which are
becoming too hot or unreliable [22]. All this is done in a decentralized manner and is
thus highly scalable, which is crucial for systems with 1,000 cores and more.
Invasive computing relies on three basic primitives: invade, infect, and retreat.
Their typical state transitions are depicted by the chart in Fig. 2.1. First, an
initial claim is assembled by issuing an invade call. A claim can consist of
computing resources such as processor cores, communication (e.g., NoC bandwidth),
and memory (e.g., caches, scratch pads). Subsequently, infect starts the application’s
code on the allocated cores of the claim. After the execution finishes, the claim size
can be increased by issuing another invade, also known as re-invade, or decreased by
a retreat, also known as a partial retreat. It is also possible to call infect again, a
so-called reinfect, with another application on the same claim. After the program
execution terminates, the retreat primitive frees the claim and makes the resources
available to other applications.
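The life cycle of a claim described above can be mimicked in a few lines. This is a minimal sketch under strong assumptions and not the invasive programming language or run-time system: a claim holds only core identifiers (no communication or memory resources), infect runs the given function sequentially once per claimed core, and the class and method names (`Platform`, `Claim`, `_grab`) are hypothetical.

```python
class Platform:
    """Hypothetical many-core platform that tracks which cores are free."""

    def __init__(self, num_cores):
        self.free = set(range(num_cores))

    def _grab(self, count):
        if count > len(self.free):
            raise RuntimeError("not enough free cores to invade")
        picked = set(sorted(self.free)[:count])
        self.free -= picked
        return picked

    def invade(self, count):
        """Assemble an initial claim of `count` cores."""
        return Claim(self, self._grab(count))


class Claim:
    """Set of cores owned by one application between invade and retreat."""

    def __init__(self, platform, cores):
        self.platform = platform
        self.cores = cores

    def infect(self, work):
        """Run `work(core)` for every claimed core (sequentially in this sketch)."""
        return [work(core) for core in sorted(self.cores)]

    def invade(self, count):
        """Re-invade: grow the claim by `count` further cores."""
        self.cores |= self.platform._grab(count)

    def retreat(self, count=None):
        """Free `count` cores (partial retreat) or the whole claim if count is None."""
        n = len(self.cores) if count is None else min(count, len(self.cores))
        released = set(sorted(self.cores, reverse=True)[:n])
        self.cores -= released
        self.platform.free |= released


platform = Platform(num_cores=4)
claim = platform.invade(2)                        # initial claim
squares = claim.infect(lambda core: core * core)  # infect with application code
claim.invade(1)                                   # re-invade: claim grows
claim.retreat(2)                                  # partial retreat: claim shrinks
claim.infect(lambda core: f"second program on core {core}")  # reinfect
claim.retreat()                                   # final retreat frees everything
```

After the final retreat all four cores are back in `platform.free`, so another application could invade them.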