Power-aware multi-objective evolutionary optimisation for application mapping on
network-on-chip platforms
M. V.C. da Silva (a), N. Nedjah (a) & L. M. Mourelle (b)
(a) Department of Electronics Engineering and Telecommunications, State University of
Rio de Janeiro, Rio de Janeiro, Brazil
(b) Department of Systems Engineering and Computation, State University of Rio de
Janeiro, Rio de Janeiro, Brazil
To cite this article: M. V.C. da Silva, N. Nedjah & L. M. Mourelle (2010): Power-aware multi-
objective evolutionary optimisation for application mapping on network-on-chip platforms,
International Journal of Electronics, 97:10, 1163-1179
International Journal of Electronics
Vol. 97, No. 10, October 2010, 1163–1179
1. Introduction
As the integration rate of semiconductors increases, more complex systems-on-chip
(SoCs) are launched. A simple SoC is formed by homogeneous or heterogeneous
independent components, while a complex SoC is formed by interconnected
heterogeneous components. The interconnection and communication of these components
through a communication architecture forms a network-on-chip (NoC). A NoC is similar
to a general network but with limited resources such as bandwidth, area and power.
Each component of a NoC is designed as an intellectual property (IP) block. An IP
block can be of general or special purpose, such as a processor, a memory or a digital
signal processor (DSP) (Hu and Marculescu 2003).
Normally, a NoC is designed to run a specific application. This application
usually consists of a limited number of tasks that are implemented by a set of IP
blocks. Different applications may have a similar, or even the same, set of tasks.
An IP block can be assigned to more than one task of the application or it
can be dedicated to executing a single task. For instance, a processor IP block can
execute different tasks, as a general-purpose processor does, but the NoC designer
can, for performance reasons, assign just one task to that specific processor. On the
other hand, a multiplier IP block for floating point numbers can only multiply floating
point numbers, and the NoC designer can reuse that IP if the application has more than
one floating point multiplication task. The number of IP block designers, as well as
the number of available IP blocks, is growing fast.
A NoC consists of sets of resources and switches. Resources and switches are
connected by resource network interfaces (RNIs). Switches are connected by
communication channels. A switch/resource pair forms a tile. The simplest way to
connect the available resources and switches is arranging them as a mesh so these are
able to communicate with each other by sending messages via an available
communication path. A switch is able to buffer and route messages between
Downloaded by [Instituto De Ciencias Matematicas] at 15:45 16 October 2011
IP blocks can be plugged into the network if their footprint fits into an available
resource and if this resource is equipped with an adequate RNI.
Given an application, described by its ACG, the problem that we are concerned with
now is to determine how to topologically map the selected IPs onto the network
platform, such that the objectives of interest are optimised. At this stage, a more
accurate evaluation can be done taking into account the distance between resources
and the number of switches and channels crossed by a data packet along a path.
The result of this process should be an optimal allocation of one of the prescribed IP
assignments, to execute a desired application on a NoC platform.
The mapping stage uses the result obtained from the assignment, which consists
of many non-dominated solutions. Let $s$ be the number of distinct assignments
evolved, $p_i$ be the number of processors used in assignment $i$, and $n_i$ be the minimal
number of resources in the NoC to be utilised in the implementation of the
application with assignment solution $i$. In this case, the total number of possible
mappings is defined as in Equation (2).
\[ M_s = \sum_{i=1}^{s} \frac{n_i!}{(n_i - p_i)!} \qquad (2) \]
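As a quick numerical check of Equation (2), the count can be evaluated directly. The following is a minimal Python sketch; the function name `mapping_count` and the sample values are illustrative assumptions and do not come from the article.

```python
from math import perm


def mapping_count(assignments):
    """Evaluate Equation (2): the sum over all assignments of n_i! / (n_i - p_i)!.

    `assignments` is a list of (n_i, p_i) pairs, where n_i is the number of
    NoC resources and p_i the number of processors used by assignment i.
    """
    total = 0
    for n_i, p_i in assignments:
        if p_i > n_i:
            raise ValueError("an assignment cannot use more processors than resources")
        # math.perm(n, k) computes n! / (n - k)!
        total += perm(n_i, p_i)
    return total


# Hypothetical input: one assignment places 2 processors on 4 resources,
# another places 3 processors on 3 resources: 12 + 6 possible mappings.
print(mapping_count([(4, 2), (3, 3)]))  # prints 18
```

Each term counts the ordered placements of $p_i$ processors on $n_i$ resources, which is why the search space grows so quickly with the mesh size.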
application referred to as task $t_i \in T$. Each directed arc $d_{i,j} \in D$, between tasks $t_i$ and
$t_j$, characterises either data or control dependencies.
Each task $t_i$ is annotated with relevant information, such as a unique identifier
and the type of task in the network. Each $d_{i,j}$ is associated with a value $V(d_{i,j})$, which
represents the volume of bits exchanged during the communication between tasks $t_i$
and $t_j$. Once the IP assignment has been completed, each task is associated with an IP
identifier. The result of the assignment is a graph of IPs representing the processor
elements (PEs) responsible for executing the application. This graph is called the
ACG. An ACG $G = G(C, A)$ is a directed graph, where each vertex $c_i \in C$ represents
an IP assigned to one or more tasks, forming a core, and each directed arc $a_{i,j}$
characterises the communication process from core $c_i$ to core $c_j$. Each $a_{i,j}$ can be
5.1. Representation
The chromosome is formed by a set of genes and each one represents a node id from
the TG. Each gene g has an IP id field that corresponds to an IP from the repository
1168 M.V.C. da Silva et al.
numbered successively from top-left to bottom-right, row by row. The row of the ith
tile is given by $\lfloor i/N \rfloor$, and the corresponding column by $i \bmod N$. Note that the first
resource id, the first row and the first column are all numbered 0.
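The tile numbering can be illustrated with a short Python sketch. It assumes zero-based tile ids and takes N to be the number of tiles per row, so that integer division gives the row; this matches the 3 × 3 example used later for XY routing. The function name `tile_position` is hypothetical.

```python
def tile_position(i, n_cols):
    """Return (row, column) of tile i in a mesh with n_cols tiles per row.

    Assumes tiles, rows and columns are all numbered from 0, row by row,
    from top-left to bottom-right.
    """
    if i < 0 or n_cols <= 0:
        raise ValueError("tile id must be non-negative and n_cols positive")
    # divmod returns (i // n_cols, i % n_cols)
    return divmod(i, n_cols)


# On a 3 x 3 mesh, tile 5 sits in row 1, column 2.
print(tile_position(5, 3))  # prints (1, 2)
```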
Three objectives of interest were identified for platform-based NoC design
optimisation. In this article, the DM’s preferences will constrain the power
consumption of the NoC while the other two objectives, area occupied and time
of execution, will be ranked in the normal way.
5.2.1. Area
In order to compute the area required, it is necessary to add up the area of each
processor used in a given solution S. The identifier of each processor (procID) is
retrieved by visiting each gene of S. The processors of solution S are identified by
grouping the nodes that share the same processor and by singling out the nodes that
run on dedicated processors. Equation (3) shows how to compute the area of solution S,
wherein function $PE(S)$ provides the set of non-dedicated processors used in S. The
notation $S[t]_{ip}$ indicates the IP assigned to task t in S and $S[t]_{dedi}$ the value of field
dedi for task t in S.
\[ \mathrm{Area}(S) = \sum_{t \in TG} \mathit{area}_{S[t]_{ip}} \times S[t]_{dedi} + \sum_{p \in PE(S)} \mathit{area}_p \qquad (3) \]
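Equation (3) can be sketched in Python as follows. The `Gene` record, the `ip_area` lookup table and the sample values are assumptions made for illustration; in particular, the sketch assumes that non-dedicated tasks carrying the same IP identifier share a single processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    task: int  # task identifier in the task graph
    ip: str    # identifier of the IP (processor) assigned to the task
    dedi: int  # 1 if the processor is dedicated to this task, 0 otherwise


def assignment_area(solution, ip_area):
    """Area of an assignment following Equation (3).

    Every dedicated task contributes the full area of its own processor;
    each non-dedicated processor (the set PE(S)) is counted only once.
    """
    dedicated = sum(ip_area[g.ip] * g.dedi for g in solution)
    shared_processors = {g.ip for g in solution if g.dedi == 0}
    return dedicated + sum(ip_area[p] for p in shared_processors)


# Hypothetical repository areas and a four-task solution.
ip_area = {"P1": 10.0, "P2": 4.0}
solution = [Gene(0, "P1", 0), Gene(1, "P1", 0), Gene(2, "P2", 1), Gene(3, "P2", 1)]
print(assignment_area(solution, ip_area))  # prints 18.0
```

Tasks 0 and 1 share one instance of P1 (10.0), while tasks 2 and 3 each get their own dedicated instance of P2 (2 × 4.0).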
should be executed in parallel are allocated to the same processor, these tasks must
be scheduled sequentially. If at least one of these tasks is from the critical path, the
execution time will be increased. Assume that the scheduling order is dictated by the
increasing order of the task identifier. In this context, consider the case where t1,
t2, . . ., tk, are k tasks that can be implemented in parallel, but are allocated to the
same processor. The execution time associated with a path that goes through a task ti
is increased by the sum of execution times of all tasks that are scheduled before ti.
These tasks are those whose identifier is smaller than the identifier of the task ti.
Equation (4) shows the details of this computation. In this context, the function
$C(g)$ returns all possible paths of the task graph g, function $P(t)$ returns the set of all
tasks in the ACG that may be executed in parallel with task t and are associated with
the same processor in the solution S, and function $D(t)$ returns all the tasks that depend
on the execution of t and that are also allocated to the same processor in S. Note that
the attribute level of the nodes of a task graph can be used to determine the members
of the set returned by function $P(t)$.
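\[ \mathrm{Time}(S) = \max_{M \in C(TG)} \left( \sum_{t \in M} \mathit{time}_{S[t]_{ip}} + T(S) \right) \]
\[ T(S) = \begin{cases} 0 & \text{if } S[t]_{dedi} = 1 \text{ or } P(t) = D(t) = \emptyset \\ \sum\limits_{t' \in P(t) \cup D(t),\; t' < t} \mathit{time}_{S[t']_{ip}} & \text{otherwise} \end{cases} \qquad (4) \]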
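The sequential-scheduling penalty of Equation (4) can be sketched for a single task as follows. This is a simplified illustration: the function name `sequential_delay` and the dictionary-based inputs are assumptions, and the sets P(t) and D(t) are supplied by the caller rather than derived from the task graph.

```python
def sequential_delay(t, dedi, parallel, dependents, exec_time):
    """Penalty term T of Equation (4) for task t.

    dedi[t]       -- 1 if t runs on a dedicated processor, 0 otherwise
    parallel[t]   -- set P(t): tasks parallel to t on the same processor
    dependents[t] -- set D(t): tasks depending on t on the same processor
    exec_time[u]  -- execution time of task u on its assigned IP
    """
    if dedi[t] == 1:
        return 0.0
    competitors = parallel[t] | dependents[t]
    # Tasks are scheduled by increasing identifier, so only tasks with a
    # smaller identifier delay t; an empty set yields a zero penalty.
    return sum(exec_time[u] for u in competitors if u < t)


# Hypothetical case: tasks 1, 2 and 3 could run in parallel but share a processor.
exec_time = {1: 2.0, 2: 3.0, 3: 1.5}
dedi = {1: 0, 2: 0, 3: 0}
parallel = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
dependents = {1: set(), 2: set(), 3: set()}
print(sequential_delay(3, dedi, parallel, dependents, exec_time))  # prints 5.0
```

Task 3 waits for tasks 1 and 2 (2.0 + 3.0), whereas task 1, having the smallest identifier, incurs no penalty.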
5.3.1. Area
To compute the area required by a given mapping it is necessary to know the area
needed for the selected processors and that occupied by the used channels and
switches. As a processor can be responsible for more than one task, each ACG node
must be visited in order to check the processor identifier for each node. It is necessary
to identify those cases where a processor is dedicated to a task before grouping the
nodes with the same procID attribute. Nodes with the same procID marked as
non-dedicated are executed by the same processor. Nodes marked as dedicated are
executed by a dedicated processor. The total number of channels and switches can be
obtained through the consideration of all communication paths between exploited
tiles. Note that a given IP mapping may not use all the available tiles, links and
switches that are available in the NoC structure. Also, observe that a portion of a
path may be re-used in several communication paths.
In this article, we adopted the deterministic XY routing strategy (Duato,
Yalamanchili and Ni 2003). The data sent from tile i to tile j are first forwarded
horizontally, to the left or right of the corresponding switch, until they reach the
column of tile j. They are then sent up or down, depending on the position of tile j with
respect to tile i, until they reach the row of tile j. The number of channels in the
aforementioned route can be computed by the function $CH(i, j)$ as described in
Equation (6). This is also called the Manhattan distance between tiles i and j.
The number of hops between tiles along a given path leads to the number of channels
between those tiles, and incrementing that number by 1 yields the number of
traversed switches, as shown in Equation (7). The total area required is computed by
summing up the areas required by the implementation of all distinct processors,
switches and channels. The area required by switches and channels depends on the
NoC platform. A general decision support tool must allow the designer to configure
these parameters for different platforms.
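The routing rule and the two counting functions can be sketched in Python. The code follows the textual description only (Manhattan distance for channels, one more for switches); the function names, the zero-based row-by-row tile numbering and the `n_cols` parameter are assumptions of this sketch.

```python
def xy_route(src, dst, n_cols):
    """List the channels, as (from_tile, to_tile) pairs, crossed under XY routing."""
    row, col = divmod(src, n_cols)
    dst_row, dst_col = divmod(dst, n_cols)
    channels = []
    current = src
    # First travel horizontally until the destination column is reached.
    while col != dst_col:
        col += 1 if dst_col > col else -1
        nxt = row * n_cols + col
        channels.append((current, nxt))
        current = nxt
    # Then travel vertically until the destination row is reached.
    while row != dst_row:
        row += 1 if dst_row > row else -1
        nxt = row * n_cols + col
        channels.append((current, nxt))
        current = nxt
    return channels


def ch(src, dst, n_cols):
    """Number of channels between two tiles: their Manhattan distance."""
    r1, c1 = divmod(src, n_cols)
    r2, c2 = divmod(dst, n_cols)
    return abs(r1 - r2) + abs(c1 - c2)


def sw(src, dst, n_cols):
    """Number of traversed switches: one more than the number of channels."""
    return ch(src, dst, n_cols) + 1


print(xy_route(0, 8, 3))        # prints [(0, 1), (1, 2), (2, 5), (5, 8)]
print(ch(0, 8, 3), sw(0, 8, 3))  # prints 4 5
```

On a 3 × 3 mesh the route from tile 0 to tile 8 crosses four channels and five switches.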
Equation (8) describes the computation involved in obtaining the total area of a
given mapping solution S. For a given allocation, the function $\mathrm{Area}_A(\cdot)$ gives the area
of the allocation exactly as Equation (3) does. The allocation that originated
mapping S is given by $A_S$. Function $E(g)$ returns all the edges of the task graph g,
while attributes src and tgt return the source and target tasks, respectively. Notation
$S[t]_{res}$ indicates the index of the resource where task t is mapped in solution S.
Constants $\mathit{area}_c$ and $\mathit{area}_s$ represent the communication channel and switch areas,
respectively.
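\[ \mathrm{Area}_M(S) = \mathrm{Area}_A(A_S) + \sum_{d \in E(TG)} \mathit{area}_c \times CH\left(S[d_{src}]_{res},\, S[d_{tgt}]_{res}\right) + \sum_{d \in E(TG)} \mathit{area}_s \times SW\left(S[d_{src}]_{res},\, S[d_{tgt}]_{res}\right) \qquad (8) \]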
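A minimal Python sketch of Equation (8) is given below. It assumes a mesh with zero-based, row-by-row tile numbering, takes the assignment area as a precomputed number, and uses the Manhattan distance for CH and that distance plus one for SW, as described in the text. All names and sample values are hypothetical.

```python
def manhattan(i, j, n_cols):
    """Manhattan distance between tiles i and j on a mesh with n_cols columns."""
    r1, c1 = divmod(i, n_cols)
    r2, c2 = divmod(j, n_cols)
    return abs(r1 - r2) + abs(c1 - c2)


def mapping_area(area_assignment, edges, tile_of, n_cols, area_c, area_s):
    """Total area of a mapping following Equation (8).

    area_assignment -- area of the underlying assignment (Equation (3))
    edges           -- task-graph edges as (src_task, tgt_task) pairs
    tile_of         -- dict mapping each task to the tile index it is mapped on
    area_c, area_s  -- platform-dependent channel and switch areas
    """
    channels = sum(manhattan(tile_of[s], tile_of[t], n_cols) for s, t in edges)
    switches = sum(manhattan(tile_of[s], tile_of[t], n_cols) + 1 for s, t in edges)
    return area_assignment + area_c * channels + area_s * switches


# Hypothetical mapping of three tasks on a 3 x 3 mesh.
edges = [(0, 1), (1, 2)]
tile_of = {0: 0, 1: 8, 2: 2}
print(mapping_area(18.0, edges, tile_of, 3, area_c=0.5, area_s=2.0))  # prints 37.0
```

The two edges cross 4 + 2 channels and 5 + 3 switches, giving 18.0 + 0.5 × 6 + 2.0 × 8. Like the equation, the sketch sums over edges, so a channel or switch shared by several paths is counted once per path.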
delay concerning the two aforementioned situations. Delay caused by situation (i) is
computed by function f1 and delay caused by situation (ii) is computed by function f2
in Equation (9).
\[ \mathrm{Time}(S) = \max_{Q \in C(TG)} \left( \mathit{Time}_p(Q) + \mathit{Time}_c(Q) + T'(Q) \right) \]
\[ T'(Q) = t_L \left( f_1(Q) + f_2(Q) \right) \qquad (9) \]
The time spent in computation for path Q of the task graph is computed as shown in
Equation (10). Function $P(t)$ returns all the tasks at the same level as task t and
associated with the same processor, for a given mapping solution S. Function $D(t)$
returns all the tasks that depend on t and are executed by the same processor, while
$A_S[t]_{ip}$ returns information about the IP assigned to a task t in S.
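\[ \mathit{Time}_p(Q) = \sum_{t \in Q} \mathit{time}_{A_S[t]_{ip}} + T''(S) \]
\[ T''(S) = \begin{cases} 0 & \text{if } S[t]_{dedi} = 1 \text{ or } P(t) = D(t) = \emptyset \\ \sum\limits_{t' \in P(t) \cup D(t),\; t' < t} \mathit{time}_{S[t']_{ip}} & \text{otherwise} \end{cases} \qquad (10) \]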
The time spent in communication between the tasks along path Q is computed as
shown in Equation (11). Function $E(TG)$ returns all the edges d(src, tgt) of the task
graph, $vol_{d(t,t')}$ is the volume of bits transmitted from task t to task t', $t_R$ is the
switch processing time and $t_L$ is the channel transmission time.
\[ \mathit{Time}_c(Q) = \sum_{d(t,t') \in E(TG) \,\mid\, t \in Q,\; t' \in Q} \frac{vol_{d(t,t')}}{phit}\; t_L \; SW\left(S[d_t]_{res},\, S[d_{t'}]_{res}\right) (t_R + t_L) \qquad (11) \]
shown in Equation (14). Parameters $E_{Sbit}$ and $E_{Cbit}$ represent power consumed by
switches and channels, respectively. These parameters are platform dependent and
must be set by the NoC designer.
\[ E_{bit}^{i,j} = SW \times E_{Sbit} + CH \times E_{Cbit} \qquad (14) \]
The task graph gives the volume of bits sent from task t to t' through the oriented edge
$d_{t,t'}$. Assuming that tasks t and t' are mapped on tiles i and j, respectively, the amount
of bits transmitted from tile i to j is denoted as $vol_{d(t,t')}$. Communication between tiles i
and j can be established with a unique channel $c_{i,j}$ or with a sequence of m > 1
channels $[c_{i,x_0}, c_{x_0,x_1}, c_{x_1,x_2}, \ldots, c_{x_{m-1},j}]$. For example, on a 3 × 3 mesh-based NoC
with XY routing, a task mapped on tile 0 (the top left) sends data to a task mapped
on tile 8 (bottom right) through the following sequence of communication channels:
$[c_{0,1}, c_{1,2}, c_{2,5}, c_{5,8}]$. The total power consumption in communication is given by
Equation (15), where Targets(t) returns the set of tasks that depend on task t
and $S[t]_{res}$ returns the index of the tile where task t is mapped in mapping
solution S.
\[ \mathit{Power}_c(S) = \sum_{t \in TG,\; \forall t' \in Targets(t)} vol_{d(t,t')} \times E_{bit}^{S[t]_{res},\, S[t']_{res}} \qquad (15) \]

Table 1. Applications used in the experiments (N: number of tasks; M: number of data dependencies).
ID  Application       N   M   Combinations
1   auto-indust-tg0   6   4   1,183,744
2   auto-indust-tg2   9   9   606,076,928
3   consumer-tg0      7   8   2,247,264
4   consumer-tg1      7   5   176,868
5   networking-tg2    4   3   41,616
6   office-tg0        5   5   210,681
7   telecom-tg1       6   6   9,516,192

Table 2. Number of assignment and mapping solutions obtained by the power-aware NSGA-II and microGA implementations.
                      Assignment           Mapping
ID  Application       NSGA-II   microGA    NSGA-II   microGA
1   auto-indust-tg0   2         4          2         7
2   auto-indust-tg2   17        23         11        47
3   consumer-tg0      9         6          3         10
4   consumer-tg1      3         9          7         18
5   networking-tg2    2         6          3         7
6   office-tg0        6         18         8         25
7   telecom-tg1       2         2          1         4
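The communication power model of Equations (14) and (15) can be sketched in Python as follows. The sketch assumes a mesh with zero-based, row-by-row tile numbering, XY routing (so CH is the Manhattan distance and SW is CH + 1), and purely illustrative values for the per-bit switch and channel parameters; the function and variable names are hypothetical.

```python
def energy_per_bit(i, j, n_cols, e_sbit, e_cbit):
    """Equation (14): cost of sending one bit from tile i to tile j."""
    r1, c1 = divmod(i, n_cols)
    r2, c2 = divmod(j, n_cols)
    n_channels = abs(r1 - r2) + abs(c1 - c2)  # CH: Manhattan distance
    n_switches = n_channels + 1               # SW: one more than CH
    return n_switches * e_sbit + n_channels * e_cbit


def communication_power(volumes, tile_of, n_cols, e_sbit, e_cbit):
    """Equation (15): sum over task-graph edges of volume times per-bit cost.

    volumes -- dict mapping (task, dependent_task) edges to a volume in bits
    tile_of -- dict mapping each task to the tile index it is mapped on
    """
    return sum(
        vol * energy_per_bit(tile_of[t], tile_of[u], n_cols, e_sbit, e_cbit)
        for (t, u), vol in volumes.items()
    )


# Hypothetical task graph with two edges mapped on a 3 x 3 mesh.
volumes = {(0, 1): 100, (1, 2): 50}
tile_of = {0: 0, 1: 8, 2: 2}
print(communication_power(volumes, tile_of, 3, e_sbit=0.25, e_cbit=0.5))  # prints 412.5
```

The edge from tile 0 to tile 8 costs 5 × 0.25 + 4 × 0.5 = 3.25 per bit, and the edge from tile 8 to tile 2 costs 3 × 0.25 + 2 × 0.5 = 1.75 per bit, giving 100 × 3.25 + 50 × 1.75.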
6. Results
The E3S (0.9) Benchmark Suite (Dick 2008) was used to carry out the simulations.
The suite contains the characteristics of 17 embedded processors. These processors
are characterised by the measured execution times of 46 different types of tasks,
power consumption derived from processor datasheets, die size required, price and
clock frequency. In addition, E3S contains common applications executed by
embedded systems in various environments. We show the results obtained for seven
different applications, as described in Table 1, wherein N is the number of tasks and
M is the number of data dependencies of the application. The minimal and average
values of power consumption obtained were used in this article to set the preferred
minimal and maximal bounds, respectively, of power consumption for each application.
However, any other value of power consumption can be set by the DM as suited.
The numbers of assignment and mapping solutions obtained by the
power-aware NSGA-II and microGA implementations are given in Table 2 and
compared in Figures 2 and 3, respectively.
Figure 4. Comparison of the optimisation time to locate the best assignments with
power-aware and non-preference-based implementations by NSGA-II and microGA.
(a) Power-aware optimisation. (b) Optimisation with no preference.
Figure 5. Comparison of the optimisation time to locate the best mappings with power-
aware and non-preference-based implementations by NSGA-II and microGA. (a) Power-
aware optimisation. (b) Optimisation with no preference.
In addition to the number of solutions obtained, the search time was compared.
Figure 4(a) and (b) show the search time spent by NSGA-II and microGA to evolve the best
assignment solutions for the power-aware and the non-preference-based
implementations, respectively. The search time needed to find the best mapping
solutions with the power-aware and the non-preference-based NSGA-II implementations
is shown in Figure 5(a) and (b).
The three objectives of interest are the hardware area required by the NoC-based
implementation, the execution time of the application and the power required by the
implementation. In the non-preference-based approach, a Pareto-front is therefore a
three-dimensional surface. With the introduced power-aware approach, a Pareto-front is a
two-dimensional curve with respect to the trade-off between area occupied and time of
execution, since power consumption is constrained by the DM’s preferences. Figure 6
shows the Pareto-front of application 3, which is the one with the largest number of
possible solutions (da Silva, Nedjah and de Macedo Mourelle 2009).
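The reduction from a three-dimensional to a two-dimensional Pareto-front can be pictured with the following Python sketch. It is a deliberate simplification and not the NSGA-II or microGA implementation used in the article: it treats the DM's preference as a hard bound on power and then keeps the solutions that are non-dominated in area and time. The function name, the tuple layout and the sample values are assumptions.

```python
def power_aware_front(solutions, power_min, power_max):
    """Return the (area, time) non-dominated solutions within the power bounds.

    solutions -- list of (area, time, power) tuples, all to be minimised
    """
    feasible = [s for s in solutions if power_min <= s[2] <= power_max]
    front = []
    for s in feasible:
        # s is dominated if another feasible solution is no worse in both
        # area and time and strictly better in at least one of them.
        dominated = any(
            o[0] <= s[0] and o[1] <= s[1] and (o[0] < s[0] or o[1] < s[1])
            for o in feasible
        )
        if not dominated:
            front.append(s)
    return front


# Hypothetical candidate solutions as (area, time, power).
candidates = [(10, 5, 3), (8, 7, 4), (12, 6, 3), (6, 9, 10)]
print(power_aware_front(candidates, power_min=2, power_max=5))
# prints [(10, 5, 3), (8, 7, 4)]
```

The last candidate is discarded because its power lies outside the bounds, and (12, 6, 3) is dominated by (10, 5, 3), so only two trade-off points remain.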
7. Conclusions
The problems of assigning and mapping IPs are NP-hard and are key research
problems in the NoC design field. In this article, we propose an innovative
power-aware multi-objective evolutionary decision support system to aid NoC designers
in assigning and mapping a prescribed set of IPs onto a NoC physical structure. A
power-aware optimisation was performed and the performance of NSGA-II and
microGA was compared. The latter performed better for all applications.
Structured and intelligible representations of a NoC, a TG and of a repository of
IPs were used and these can be easily extended to different NoC applications.
Although we adopted the E3S Benchmark Suite (Dick 2008) as our
repository of IPs, any other repository could be used and modelled using XML,
making this tool compatible with different repositories.
The proposed shift crossover and inner swap mutation genetic operators can be
used in any optimisation problem where no loss of data from an individual is
accepted. Choosing preference-based MOEAs instead of non-preference-based
ones speeds up the evolutionary process and reduces the number of solutions
presented to the DM. These goals were achieved by both MOEAs adopted in this
article.
Future work is fourfold: adopting a dynamic topology strategy to try to evolve
the most adequate topology for a given application, exploring the use of different
objectives based on different repositories, proposing an interfacing mechanism with a
hardware description simulator to integrate our tool into the NoC design platform and
exploring different approaches to introduce the DM’s preferences into a platform-
based aid system.
Acknowledgements
We are grateful to FAPERJ (Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro,
http://www.faperj.br), CNPq (Conselho Nacional de Desenvolvimento Científico e
Tecnológico, http://www.cnpq.br) and CAPES (Coordenação de Aperfeiçoamento de Pessoal
de Ensino Superior, http://www.capes.gov.br) for their continuous financial support.
References
Coello, C.A.C., Lamont, G.B., and Veldhuizen, D.A.V. (2006), Evolutionary Algorithms for
Solving Multi-Objective Problems (Genetic and Evolutionary Computation), Secaucus, NJ:
Springer-Verlag.
da Silva, M.V.C., Nedjah, N., and de Macedo Mourelle, L. (2009), ‘Optimal IP Assignment
for Efficient NoC-based System Implementation using NSGA-II and MicroGA’,
International Journal of Computational Intelligence Systems, 2, 115–123.
Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002), ‘A fast and elitist
multiobjective genetic algorithm: NSGA-II’, IEEE Transactions on Evolutionary
Computation, 6, 182–197.
Dick, R.P. (2008), ‘Embedded System Synthesis Benchmarks Suite (E3S)’.
http://ziyang.eecs.northwestern.edu/dickrp/e3s/.
Duato, J., Yalamanchili, S., and Ni, L. (2003), Interconnection Networks: An Engineering
Approach, San Francisco, CA: Morgan Kaufmann.
Garey, M.R., and Johnson, D.S. (1979), Computers and Intractability; a Guide to the Theory of
NP-completeness, USA: W. H. Freeman.
Hu, J., and Marculescu, R. (2003), ‘Energy-aware mapping for tile-based NoC architectures
under performance constraints’, in Proceedings of the 2003 conference on Asia South
Pacific Design Automation, Kitakyushu, Japan, New York, NY: ACM, pp. 233–239.
Murali, S., and Micheli, G.D. (2004), ‘SUNMAP: a tool for automatic topology selection and
generation for NoCs’, in Proceedings of the 41st Annual conference on Design Automation
(DAC-04), Jun. 7–11, New York: ACM Press, pp. 914–919.
Ogras, Ü.Y., Hu, J., and Marculescu, R. (2005), ‘Key research problems in NoC design: a
holistic perspective’, in Proceedings of the 3rd International Conference on Hardware/
Software Codesign and System Synthesis, eds. P. Eles, A. Jantsch, and R.A. Bergamaschi,
New York, NY: ACM, pp. 69–74.