
This article was downloaded by: [Instituto De Ciencias Matematicas]

On: 16 October 2011, At: 15:45


Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered
office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

International Journal of Electronics


Publication details, including instructions for authors and
subscription information:
http://www.tandfonline.com/loi/tetn20

Power-aware multi-objective evolutionary optimisation for application mapping on network-on-chip platforms
M. V.C. da Silva (a), N. Nedjah (a) & L. M. Mourelle (b)
(a) Department of Electronics Engineering and Telecommunications, State University of Rio de Janeiro, Rio de Janeiro, Brazil
(b) Department of Systems Engineering and Computation, State University of Rio de Janeiro, Rio de Janeiro, Brazil

Available online: 06 Oct 2010

To cite this article: M. V.C. da Silva, N. Nedjah & L. M. Mourelle (2010): Power-aware multi-
objective evolutionary optimisation for application mapping on network-on-chip platforms,
International Journal of Electronics, 97:10, 1163-1179

To link to this article: http://dx.doi.org/10.1080/00207217.2010.512105


Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions

This article may be used for research, teaching, and private study purposes. Any
substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing,
systematic supply, or distribution in any form to anyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation
that the contents will be complete or accurate or up to date. The accuracy of any
instructions, formulae, and drug doses should be independently verified with primary
sources. The publisher shall not be liable for any loss, actions, claims, proceedings,
demand, or costs or damages whatsoever or howsoever caused arising directly or
indirectly in connection with or arising out of the use of this material.
International Journal of Electronics
Vol. 97, No. 10, October 2010, 1163–1179

Power-aware multi-objective evolutionary optimisation for application mapping on network-on-chip platforms
M.V.C. da Silva(a), N. Nedjah(a)* and L.M. Mourelle(b)
(a)Department of Electronics Engineering and Telecommunications, State University of Rio de Janeiro, Rio de Janeiro, Brazil; (b)Department of Systems Engineering and Computation, State University of Rio de Janeiro, Rio de Janeiro, Brazil


(Received 19 July 2010; final version received 30 July 2010)

Network-on-chip (NoC) is considered the next generation of communication
infrastructure, which will be omnipresent in different environments. In the
platform-based design methodology, an application is implemented by a set of
collaborating intellectual property (IP) blocks. The selection of the most suited set
of IPs as well as their physical mapping onto the NoC to efficiently implement the
application at hand are two hard combinatorial problems. In this article, we
propose an innovative power-aware multi-objective evolutionary algorithm to
perform the assignment and mapping stages of a platform-based NoC design
synthesis tool. Our algorithm uses the well-known multi-objective evolutionary
algorithms NSGA-II and microGA as kernels. The optimisation is driven by the
required area and the execution time, while the decision maker's preference
restricts the power consumption of the implementation.
Keywords: network-on-chip; multi-objective optimisation; evolutionary
computation; application mapping

1. Introduction
As the integration rate of semiconductors increases, more complex system-on-chips
(SoCs) are launched. A simple SoC is formed by homogeneous or heterogeneous
independent components while a complex SoC is formed by interconnected hetero-
geneous components. The interconnection and communication of these components
with a communication architecture forms a network-on-chip (NoC). A NoC is similar
to a general network but with limited resources such as bandwidth, area and power.
Each component of a NoC is designed as an intellectual property (IP) block. An IP
block can be of general or special purpose such as processors, memory and digital
signal processors (DSPs) (Hu and Marculescu 2003).
Normally, a NoC is designed to run a specific application. This application
usually consists of a limited number of tasks that are implemented by a set of IP
blocks. Different applications may have a similar, or even the same, set of tasks.
An IP block can be assigned to more than one task of the application, or it
can be dedicated to a single task. For instance, a processor IP block can
execute different tasks, as a general-purpose processor does, but the NoC designer may,
for performance reasons, assign just one task to that specific processor. On the other hand,
a floating-point multiplier IP block can only multiply floating-point
numbers, and the NoC designer can reuse that IP if the application has more than
one floating-point multiplication task. The number of IP block designers, as well as
the number of available IP blocks, is growing fast.

*Corresponding author. Email: nadia@eng.uerj.br

ISSN 0020-7217 print/ISSN 1362-3060 online
© 2010 Taylor & Francis
DOI: 10.1080/00207217.2010.512105
http://www.informaworld.com
A NoC consists of sets of resources and switches. Resources and switches are
connected by resource network interfaces (RNIs). Switches are connected by
communication channels. A switch/resource pair forms a tile. The simplest way to
connect the available resources and switches is to arrange them as a mesh so that they are
able to communicate with each other by sending messages via an available
communication path. A switch is able to buffer and route messages between
resources. On a mesh-based NoC each switch is connected to up to four other
neighbouring switches through input and output channels. While a switch is sending
data through a channel, it can buffer incoming data through another channel.
The power consumption depends on the number of exchanged messages.
Therefore, the communication between resources and the distance between them
must be considered during the mapping evaluation. Figure 1 shows the architecture
of a mesh-based NoC where each resource contains one or more IP blocks (D for
DSP, M for memory, C for cache, P for processor, FP for floating-point unit and Re
for reconfigurable block). Besides the mesh topology, which is considered in this
article, there are other topologies like torus, hypercube, 3-stage Clos and butterfly
(Murali and Micheli 2004).
Figure 1. Mesh-based NoC with nine resources.

Usually, an application is described as a graph of tasks called a task graph (TG),
which is a high-level description. The IP blocks' features can be obtained from their
manufacturer documentation. IP assignment and IP mapping are key research
problems for efficient NoC-based designs (Ogras, Hu and Marculescu 2005).
Electronic design automation (EDA) tools must deal with these two problems. With
a low-level description, every IP mapping must be synthesised, leading to a very slow
but precise evaluation. With a high-level description, evaluation is first driven by
models of the NoC-based platform, leading to a fast evaluation whose precision
depends on the modelling. Along the design process, the abstraction level of the description
is lowered until a register-transfer-level description is reached.
IP assignment and IP mapping are combinatorial optimisation problems
classified as NP-hard problems (Garey and Johnson 1979). We use multi-objective
evolutionary algorithms (MOEAs) with specific operators and objective functions to
search for optimal IP assignments and IP mappings. As multi-objective problems
normally present a set of solutions, we consider the preferences of a decision maker
(DM) to find a single solution or at most a small subset of solutions. In this article,
we propose a power-aware multi-objective evolutionary decision support system to
help NoC designers with a high-level stage of a platform-based NoC design. For this
purpose, we use two MOEAs: NSGA-II (Deb, Pratap, Agarwal and Meyarivan
2002) and microGA (Coello, Lamont and Veldhuizen 2006). Both of these
algorithms were modified according to some prescribed NoC design constraints
and to accept preferences defined by the DM.
The rest of the article is organised as follows. In Section 2, we describe the internal
structure of a NoC. In Section 3, we introduce the IP assignment and IP mapping
problems in platform-based designs. Then, in Section 4, we describe a structured TG
and IP repository model based on the E3S data. After that, in Section 5, we sketch the
multi-objective evolutionary approach and present the objective functions. Later, in
Section 6, we show some experimental results. Last but not least, in Section 7, we draw
some conclusions and outline some future work.

2. NoC internal structure


A NoC consists of sets of resources (R) and switches (S). Resources and switches are
connected by RNIs. Switches are connected by communication channels (or just
channels). The pair (R, S) forms a tile. The simplest way to connect the available
resources and switches is arranging them as a mesh so they are able to communicate
with each other by sending messages via an available communication path. A switch
is able to buffer and route messages between resources. On a mesh-based NoC each
switch is connected to up to four other neighbouring switches through input
and output channels. While a switch is sending data through a channel, it can buffer
incoming data through another channel. Note that power consumption is
proportional to the number of message exchanges. Therefore, the communication
between resources and the distance between them must be considered during the
mapping evaluation. Figure 1 shows the architecture of a mesh-based NoC where
each resource contains one or more IP blocks (D for DSP, M for memory, C for
cache, P for processor, FP for floating-point unit and Re for reconfigurable block).
Besides the mesh topology, which is considered in this article, there are
other topologies like torus, hypercube, 3-stage Clos and butterfly (Murali and Micheli
2004).
Every resource has a unique identifier and is connected to the network via a
switch. It communicates with the switch through the available RNI. Thus, any set of
IP blocks can be plugged into the network if their footprint fits into an available
resource and if this resource is equipped with an adequate RNI.

3. IP assignment and mapping problems


The platform-based design methodology for SoC encourages the reuse of
components to reduce costs and to reduce the time-to-market of new designs. The
designer of NoC-based systems faces two main problems: selecting an adequate set
of IPs and finding the best physical mapping of these IPs into the NoC structure.
With a platform-based design, the selection of IPs is called the IP assignment stage
and the physical mapping is called the IP mapping stage.
The main objective of the IP assignment stage is to select, from an IP repository,
a set of IPs that exploit re-usability and optimise the execution of a given
application. At this stage, no information about the physical location of the IPs is
available so optimisation must be done based on the application’s description (as a
TG) and IP features only. So, the result of this stage is the set of IPs that maximises
NoC performance based on IP features alone. The TG is then annotated and an application
characterisation graph (ACG) is produced, wherein each node (task) has an IP
assigned to it. The TG and ACG are defined in Section 4. The number of
possible assignments is given by Equation (1), where m represents the number of tasks in
the application, $t_0, t_1, \ldots, t_{m-1}$, and $n_i$ is the number of IPs that can be assigned to
task $t_i$.

$$A = n_0 \times n_1 \times \cdots \times n_{m-2} \times n_{m-1} = \prod_{i=0}^{m-1} n_i \qquad (1)$$
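As a quick illustration of how fast Equation (1) grows, the number of possible assignments is simply a product over the tasks. The sketch below assumes the candidate counts $n_i$ are already known; the function name and the task counts in the example are hypothetical.

```python
from math import prod


def count_assignments(candidates_per_task: list[int]) -> int:
    """Equation (1): A = n_0 * n_1 * ... * n_{m-1}.

    candidates_per_task[i] holds n_i, the number of repository IPs
    able to execute task t_i.
    """
    return prod(candidates_per_task)


# Hypothetical application with four tasks having 3, 5, 2 and 4 candidate IPs.
print(count_assignments([3, 5, 2, 4]))  # 120
```

Adding a fifth task with ten candidates would multiply the count by ten, which is why exhaustive enumeration quickly becomes impractical.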

Given an application, described by its ACG, the problem that we are concerned with
now is to determine how to topologically map the selected IPs onto the network
platform, such that the objectives of interest are optimised. At this stage, a more
accurate evaluation can be done taking into account the distance between resources
and the number of switches and channels crossed by a data package along a path.
The result of this process should be an optimal allocation of one of the prescribed IP
assignments, to execute a desired application on a NoC platform.
The mapping stage uses the result obtained from the assignment, which consists
of many non-dominated solutions. Let s be the number of distinct assignments
evolved, pi be the number of processors used in assignment i, and ni be the minimal
number of resources in the NoC to be utilised in the implementation of the
application with assignment solution i. In this case, the total number of possible
mappings is defined as in Equation (2).
$$M_s = \sum_{i=1}^{s} \frac{n_i!}{(n_i - p_i)!} \qquad (2)$$
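Each term of Equation (2) counts the ordered placements of $p_i$ processors onto $n_i$ resources, which Python's `math.perm(n, k)` computes as $n!/(n-k)!$. In this sketch each assignment is described by a hypothetical pair $(n_i, p_i)$; the numbers in the example are made up for illustration.

```python
from math import perm


def count_mappings(assignments: list[tuple[int, int]]) -> int:
    """Equation (2): sum over the s assignments of n_i! / (n_i - p_i)!.

    Each pair is (n_i, p_i): resources available and processors to place.
    """
    return sum(perm(n, p) for n, p in assignments)


# Hypothetical: two assignments on a 3 x 3 mesh using 4 and 6 processors.
print(count_mappings([(9, 4), (9, 6)]))  # 3024 + 60480 = 63504
```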

4. Task graph and IP repository models


In order to formulate the IP mapping problem, it is necessary to introduce a formal
definition of an application description first. An application can be described as a set
of tasks that can be executed sequentially or in parallel. It can be represented by a
directed acyclic graph of tasks, called task graph. A task graph (TG) $G = G(T, D)$ is a
directed acyclic graph where each node represents a computational module in the
application, referred to as task $t_i \in T$. Each directed arc $d_{i,j} \in D$, between tasks $t_i$ and
$t_j$, characterises either data or control dependencies.
Each task $t_i$ is annotated with relevant information, such as a unique identifier
and the type of task in the network. Each $d_{i,j}$ is associated with a value $V(d_{i,j})$, which
represents the volume of bits exchanged during the communication between tasks $t_i$
and $t_j$. Once the IP assignment has been completed, each task is associated with an IP
identifier. The result of the assignment is a graph of IPs representing the processing
elements (PEs) responsible for executing the application. This graph is called the
ACG. An ACG $G = G(C, A)$ is a directed graph, where each vertex $c_i \in C$ represents
an IP assigned to one or more tasks, forming a core, and each directed arc $a_{i,j}$
characterises the communication process from core $c_i$ to core $c_j$. Each $a_{i,j}$ can be
tagged with IP-/application-specific information, such as communication rate,
communication bandwidth or volume of bits exchanged between cores $c_i$ and $c_j$. A
TG is based on application features only while an ACG is based on application and
IP features, providing us with a much more realistic representation of an application
in runtime on a NoC platform. The abstraction level decreases from the TG to the
ACG representation.
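A minimal sketch of how an ACG can be derived from a TG once every task has been bound to a core. It assumes tasks are identified by integers, that arc volumes $V(d_{i,j})$ are given in bits, and that traffic between two tasks sharing the same core is local and therefore dropped; the function name and data layout are illustrative assumptions, not the authors' data model.

```python
from collections import defaultdict


def build_acg(tg_edges, core_of):
    """Collapse a TG into an ACG.

    tg_edges: (t_i, t_j) -> V(d_ij), bits sent from task t_i to task t_j
    core_of:  task -> core (IP instance) executing it after assignment
    Returns (c_i, c_j) -> total bits sent from core c_i to core c_j.
    """
    acg = defaultdict(int)
    for (src, dst), volume in tg_edges.items():
        ci, cj = core_of[src], core_of[dst]
        if ci != cj:  # assumption: traffic inside one core is local
            acg[(ci, cj)] += volume
    return dict(acg)


tg = {(0, 1): 100, (0, 2): 50, (1, 3): 20, (2, 3): 30}
print(build_acg(tg, {0: "A", 1: "B", 2: "B", 3: "A"}))
# {('A', 'B'): 150, ('B', 'A'): 50}
```

Arcs of the TG that end up between the same pair of cores are merged, so the ACG is never larger than the TG it comes from.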

5. Power-aware multi-objective evolutionary approach


The search space of the IP assignment and mapping problems, for a given
application, may exceed millions or even billions of possible combinations. Among
the huge number of possible solutions, it is possible to find many equally optimal
solutions, called non-dominated solutions (Coello et al. 2006). In such a huge, discrete
search space, deterministic approaches do not deal well with multi-objective problems (MOPs).
In order to deal with such a big search space in a reasonable time, a power-aware
multi-objective evolutionary decision support system to aid platform-based NoC
design is proposed.
Instead of obtaining a single solution after IP assignment and mapping, as in the
typical platform-based NoC methodology, the proposed system exploits several
solutions to balance the trade-offs among the objectives.
The kernel of the proposed aid system is driven by two well-known MOEAs:
NSGA-II (Deb et al. 2002) and microGA (Coello et al. 2006). Both adopt the
domination concept with a ranking method of classification.
The ranking method separates solutions into Pareto fronts, where each front
corresponds to a given rank. Solutions of rank one, which form the Pareto-optimal
front, are mutually non-dominated, and every solution in a front of higher rank is
dominated by at least one solution in a front of lower rank.
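To make the ranking concrete, here is a small sketch of Pareto dominance and front ranking with every objective minimised. It uses a straightforward repeated scan rather than the fast non-dominated sorting procedure of NSGA-II, so it is not the authors' implementation; the objective vectors in the example are hypothetical (area, execution time) pairs.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (every objective minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(points):
    """Rank 1 = non-dominated points; rank k+1 = non-dominated once ranks 1..k are removed."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Hypothetical (area, execution time) values for four solutions.
print(pareto_ranks([(10, 5), (8, 7), (12, 6), (9, 9)]))  # [1, 1, 2, 2]
```

Identical objective vectors never dominate each other, so they always land in the same front and the loop always terminates.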
In order to deal with the IP assignment and IP mapping problems both
algorithms, NSGA-II and microGA, were adapted to recognise two individual
representations: an assignment representation and a mapping representation.
Originally, those algorithms do not consider the preferences of a DM, so major
modifications were introduced to turn them into power-aware multi-objective genetic
algorithms.

5.1. Representation
The chromosome is formed by a set of genes, each of which represents a node id from
the TG. Each gene g has an IP id field that corresponds to an IP from the repository
capable of executing the associated task type. If an IP is dedicated to executing a single
task of the TG, the dedi field value is 1; otherwise it is 0. Initially, a random IP id is
assigned to each gene, subject to the IP-type constraint. Tournament selection,
one-point crossover and simple mutation were used. The crossover operator, without any
constraint, can only produce feasible individuals because the order of genes is not
changed. The mutation is controlled by the IP-type constraint to avoid selecting a
random IP of a different type from the IP repository.
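The assignment representation and its two variation operators can be sketched as follows. The `Gene` fields mirror the node id, IP id and dedi fields described above; the `task_type` field and the repository layout (a dict from task type to compatible IP ids) are assumptions made for illustration. Because one-point crossover exchanges genes position by position, every child still holds, for each task, an IP taken from one of its parents, which is why no repair step is needed.

```python
import random
from dataclasses import dataclass


@dataclass
class Gene:
    task_id: int    # node id from the TG
    task_type: str  # assumption: used to look up compatible IPs
    ip_id: int      # IP chosen from the repository
    dedi: int = 0   # 1 if the IP is dedicated to this task only


def one_point_crossover(parent_a, parent_b, rng):
    """Swap gene tails at a random cut; gene order (task ids) is preserved."""
    cut = rng.randrange(1, len(parent_a))  # needs at least two genes
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def type_constrained_mutation(chromosome, repository, rate, rng):
    """Redraw a gene's IP, but only among IPs of the gene's task type."""
    child = []
    for g in chromosome:
        if rng.random() < rate:
            g = Gene(g.task_id, g.task_type, rng.choice(repository[g.task_type]), g.dedi)
        child.append(g)
    return child


rng = random.Random(1)
repo = {"fft": [1, 2], "mul": [3, 4, 5]}
a = [Gene(0, "fft", 1), Gene(1, "mul", 3), Gene(2, "mul", 4)]
b = [Gene(0, "fft", 2), Gene(1, "mul", 5), Gene(2, "mul", 5)]
child, _ = one_point_crossover(a, b, rng)
child = type_constrained_mutation(child, repo, 0.5, rng)
print([g.task_id for g in child])  # [0, 1, 2]
```

Mutation builds new `Gene` objects instead of editing shared ones, so parents reused across several crossovers are never modified behind the scenes.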
The mapping individual representation is inherited from the assignment
individual representation. It is augmented with the RES id field, which indicates
the resource on which a gene is mapped on the NoC platform, so representing
physical information. On an N × N regular mesh, assume that the tiles are
numbered successively from top-left to bottom-right, row by row, starting from 0. The row of the ith
tile is given by $\lfloor i/N \rfloor$ and the corresponding column by $i \bmod N$. Note that the first
resource, located in the first row and the first column, is numbered 0.
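With tiles numbered row by row from 0, a tile index converts to its mesh coordinates with integer division and remainder, as in the following sketch; the function name and the 3 × 3 example mesh are only illustrative.

```python
def tile_position(i: int, n: int) -> tuple[int, int]:
    """(row, column) of tile i on an n x n mesh numbered row by row from 0."""
    return divmod(i, n)  # (floor(i / n), i mod n)


print([tile_position(i, 3) for i in (0, 5, 8)])  # [(0, 0), (1, 2), (2, 2)]
```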
Three objectives of interest were identified for platform-based NoC design
optimisation. In this article, the DM’s preferences will constrain the power
consumption of the NoC while the other two objectives, area occupied and time
of execution, will be ranked in the normal way.

5.2. Assignment evaluation


The fitness of an assignment solution S is measured in terms of the silicon area that
would be used to implement the NoC-based application using S, the approximate
execution time of the so-implemented application, and the power consumption
required when the application is executed. Note that in this case, only computation
time and power due to computation are considered. Those introduced by
communication can only be considered when the actual locations of the IPs
within the NoC resource nodes are known, i.e. in the mapping stage. In the
following, we explain in detail how each of these characteristics is quantified.

5.2.1. Area
In order to compute the required area, it is necessary to add up the area of each
processor used in a given solution S. The identifier of each processor (procID) is
retrieved by visiting each gene of S. The processors of solution S are identified by
grouping the nodes assigned to the same processor and by identifying the nodes
assigned to dedicated processors. Equation (3) shows how to compute the area of solution S, wherein
function $PE(S)$ provides the set of non-dedicated processors used in S. The notation
$S[t]_{ip}$ indicates the IP assigned to task t in S and $S[t]_{dedi}$ the value of field dedi for
task t in S.

$$\mathrm{Area}(S) = \sum_{t \in TG} \mathrm{area}_{S[t]_{ip}} \times S[t]_{dedi} + \sum_{p \in PE(S)} \mathrm{area}_p \qquad (3)$$
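A direct transcription of Equation (3), assuming a solution is given as a list of (task, ip, dedi) triples and that IP areas come from a hypothetical lookup table. Dedicated IPs contribute once per task they serve, while each non-dedicated IP in $PE(S)$ contributes once, however many tasks share it.

```python
def assignment_area(solution, ip_area):
    """Equation (3).

    solution: list of (task, ip, dedi) triples, one per TG task
    ip_area:  ip -> silicon area (hypothetical units)
    """
    dedicated = sum(ip_area[ip] * dedi for _, ip, dedi in solution)
    pe = {ip for _, ip, dedi in solution if dedi == 0}  # PE(S): non-dedicated IPs
    return dedicated + sum(ip_area[ip] for ip in pe)


sol = [(0, "P1", 1), (1, "P2", 0), (2, "P2", 0), (3, "P3", 0)]
print(assignment_area(sol, {"P1": 4.0, "P2": 10.0, "P3": 6.0}))  # 20.0
```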

5.2.2. Execution time


In order to compute the execution time required by a solution S, it is necessary to
find the critical path of the ACG. The critical path can be found by visiting all nodes
of all paths and recording the execution time of the slowest path. When tasks that
should be executed in parallel are allocated to the same processor, these tasks must
be scheduled sequentially. If at least one of these tasks is on the critical path, the
execution time will increase. Assume that the scheduling order is dictated by the
increasing order of the task identifiers. In this context, consider the case where
$t_1, t_2, \ldots, t_k$ are k tasks that can be executed in parallel but are allocated to the
same processor. The execution time associated with a path that goes through a task $t_i$
is increased by the sum of the execution times of all tasks that are scheduled before $t_i$,
i.e. those whose identifier is smaller than that of $t_i$.
Equation (4) shows the details of this computation. In this context, the function
$C(g)$ returns all possible paths of the task graph g, the function $P(t)$ returns the set of all
tasks in the ACG that may be executed in parallel with task t and are associated with
the same processor in solution S, and the function $D(t)$ returns all the tasks that depend
on the execution of t and are also allocated to the same processor in S. Note that
the level attribute of the nodes of a task graph can be used to determine the members
of the set returned by function $P(t)$.

$$\mathrm{Time}(S) = \max_{M \in C(TG)} \left( \sum_{t \in M} \left( \mathrm{time}_{S[t]_{ip}} + T(S) \right) \right)$$

$$T(S) = \begin{cases} 0 & \text{if } S[t]_{dedi} = 1 \text{ or } P(t) = D(t) = \emptyset \\ \displaystyle\sum_{t' \in P(t) \cup D(t),\; t' < t} \mathrm{time}_{S[t']_{ip}} & \text{otherwise} \end{cases} \qquad (4)$$
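The following sketch evaluates Equation (4) under stated assumptions: `succ` lists the direct successors of each task, `exec_time[t]` stands for $\mathrm{time}_{S[t]_{ip}}$, $P(t)$ is taken as the non-dedicated tasks at the same level that share t's IP id, and $D(t)$ as the direct successors of t on the same processor. Because each task's term does not depend on which path it lies on, the maximum over all paths in $C(TG)$ is computed here by dynamic programming over the DAG instead of enumerating every path; the result is the same. All names are hypothetical.

```python
def execution_time(succ, exec_time, ip, dedi):
    """Sketch of Equation (4).

    succ:      task -> list of direct successors in the TG (sinks may be omitted)
    exec_time: task -> execution time of its assigned IP, i.e. time_{S[t]ip}
    ip:        task -> IP id in S;  dedi: task -> 1 if the IP is dedicated, else 0
    """
    tasks = sorted(exec_time)
    preds = {t: [] for t in tasks}
    for t, nxts in succ.items():
        for u in nxts:
            preds[u].append(t)

    level = {}

    def get_level(t):  # roots are level 0
        if t not in level:
            level[t] = 1 + max((get_level(p) for p in preds[t]), default=-1)
        return level[t]

    def same_processor(a, b):  # assumption: shared only if both are non-dedicated
        return dedi[a] == 0 and dedi[b] == 0 and ip[a] == ip[b]

    def extra(t):  # T(S) evaluated for task t
        if dedi[t] == 1:
            return 0
        parallel = {u for u in tasks
                    if u != t and get_level(u) == get_level(t) and same_processor(u, t)}
        dependents = {u for u in succ.get(t, []) if same_processor(u, t)}
        return sum(exec_time[u] for u in parallel | dependents if u < t)

    weight = {t: exec_time[t] + extra(t) for t in tasks}
    memo = {}

    def heaviest_from(t):  # heaviest path from t down to a sink
        if t not in memo:
            memo[t] = weight[t] + max((heaviest_from(u) for u in succ.get(t, [])), default=0)
        return memo[t]

    return max(heaviest_from(t) for t in tasks if not preds[t])


succ = {0: [1, 2], 1: [3], 2: [3]}
exec_time = {0: 2, 1: 3, 2: 4, 3: 1}
ip = {0: "A", 1: "B", 2: "B", 3: "C"}
print(execution_time(succ, exec_time, ip, {t: 0 for t in exec_time}))  # 10
```

In the example, tasks 1 and 2 are parallel and share IP "B", so task 2 (the higher identifier) waits for task 1, and the path 0, 2, 3 grows from 7 to 10 time units.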

5.2.3. Power consumption


To evaluate the power consumption of an application represented by a TG, the power
consumption of each assigned IP must be added up. In Equation (5), $\mathrm{power}_{S[t]_{ip}}$
represents the power consumption when task t is executed by its assigned IP in
solution S, and $x'_a$ and $x_a$ are the lower and upper power constraints imposed on the assignment.

$$x'_a \le \mathrm{Power}(S) = \sum_{t \in TG} \mathrm{power}_{S[t]_{ip}} \le x_a \qquad (5)$$
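A sketch of Equation (5) and of the DM's power restriction, assuming a hypothetical table `task_power` that gives the power drawn by each (task, IP) pair, for instance derived from benchmark data such as the E3S suite used in Section 6. The function names and the bounds in the example are illustrative.

```python
def assignment_power(assignment, task_power):
    """Middle term of Equation (5): sum over TG tasks of power_{S[t]ip}.

    assignment: task -> IP id;  task_power: (task, IP id) -> power (hypothetical table)
    """
    return sum(task_power[(t, ip)] for t, ip in assignment.items())


def satisfies_power_restriction(power, x_lo, x_hi):
    """The DM's restriction x'_a <= Power(S) <= x_a from Equation (5)."""
    return x_lo <= power <= x_hi


task_power = {(0, "P1"): 1.5, (1, "P2"): 0.75, (2, "P2"): 0.75}
p = assignment_power({0: "P1", 1: "P2", 2: "P2"}, task_power)
print(p, satisfies_power_restriction(p, 0.0, 2.5))  # 3.0 False
```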

5.3. Mapping evaluation


The fitness of mapping solution S is measured in terms of the silicon area that would
be used to implement the NoC-based application using S, the execution time of
the so-implemented application and the power consumption required when the
application is executed. In the following, we explain in detail how each of these
characteristics is quantified.

5.3.1. Area
To compute the area required by a given mapping it is necessary to know the area
needed for the selected processors and that occupied by the used channels and
switches. As a processor can be responsible for more than one task, each ACG node
must be visited in order to check the processor identifier for each node. It is necessary
to identify the cases where a processor is dedicated to a task before grouping the
nodes with the same procID attribute. Nodes with the same procID marked as
non-dedicated are executed by the same processor. Nodes marked as dedicated are
executed by a dedicated processor. The total number of channels and switches can be
obtained by considering all communication paths between the used
tiles. Note that a given IP mapping may not use all the tiles, links and
switches available in the NoC structure. Also, observe that a portion of a
path may be re-used in several communication paths.
In this article, we adopted the XY deterministic routing strategy (Duato,
Yalamanchili and Ni 2003). Data sent from tile i to tile j are first routed
horizontally, to the left or right of the corresponding switch, until they reach the
column of tile j; they are then routed up or down, also depending on the position of tile j
with respect to tile i, until they reach the row of tile j. The number of channels in the
aforementioned route can be computed by the function $CH(i, j)$, as described in
Equation (6). This is also called the Manhattan distance between tiles i and j.

$$CH(i, j) = \left| \lfloor i/N \rfloor - \lfloor j/N \rfloor \right| + \left| i \bmod N - j \bmod N \right| \qquad (6)$$

The number of hops between tiles along a given path leads to the number of channels
between those tiles, and incrementing that number by 1 yields the number of
traversed switches, as shown in Equation (7). The total area required is computed by
summing up the areas required by the implementation of all distinct processors,
switches and channels. The area required by switches and channels depends on the
NoC platform. A general decision support tool must allow the designer to configure
these parameters for different platforms.

$$SW(i, j) = CH(i, j) + 1 \qquad (7)$$
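Equations (6) and (7) translate directly into code; the sketch assumes an N × N mesh with tiles numbered row by row from 0, and the function names are illustrative. For the 3 × 3 case of a route from tile 0 to tile 8, it gives four channels and five switches.

```python
def ch(i: int, j: int, n: int) -> int:
    """Equation (6): channels on the XY route from tile i to tile j."""
    return abs(i // n - j // n) + abs(i % n - j % n)


def sw(i: int, j: int, n: int) -> int:
    """Equation (7): switches traversed, one more than the channels."""
    return ch(i, j, n) + 1


print(ch(0, 8, 3), sw(0, 8, 3))  # 4 5
```

Note that, read literally, Equation (7) charges one switch even when both tasks sit on the same tile (`sw(i, i, n) == 1`).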

Equation (8) describes the computation involved in obtaining the total area of a
given mapping solution S. For a given allocation, the function $\mathrm{Area}_A(\cdot)$ gives the area
of the allocation exactly as Equation (3) does. The allocation from which
mapping S originated is denoted by $A_S$. Function $E(g)$ returns all the edges of the task graph g,
while attributes src and tgt return the source and target tasks, respectively. The notation
$S[t]_{res}$ indicates the index of the resource where task t is mapped in solution S.
Constants $area_c$ and $area_s$ represent the communication channel and switch areas,
respectively.

$$\mathrm{Area}_M(S) = \mathrm{Area}_A(A_S) + \sum_{d \in E(TG)} area_c \times CH\left(S[d_{src}]_{res}, S[d_{tgt}]_{res}\right) + \sum_{d \in E(TG)} area_s \times SW\left(S[d_{src}]_{res}, S[d_{tgt}]_{res}\right) \qquad (8)$$
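Equation (8) can be sketched as below. It assumes that $\mathrm{Area}_A(A_S)$ has already been computed with Equation (3), that `res` maps each task to its tile index, and that the channel and switch areas are platform constants chosen by the designer; the function name and the numbers in the example are hypothetical. Like Equation (8), it charges channels and switches once per TG edge, so a channel shared by several paths is counted several times.

```python
def mapping_area(area_a, edges, res, n, area_c, area_s):
    """Equation (8): assignment area plus channel and switch area per TG edge."""
    total = area_a
    for src, tgt in edges:
        i, j = res[src], res[tgt]
        ch = abs(i // n - j // n) + abs(i % n - j % n)  # CH(i, j), Equation (6)
        total += area_c * ch + area_s * (ch + 1)        # SW(i, j) = CH(i, j) + 1
    return total


# Hypothetical 3 x 3 mesh: tasks 0, 1, 2 on tiles 0, 1 and 8.
print(mapping_area(20.0, [(0, 1), (1, 2)], {0: 0, 1: 1, 2: 8}, 3, 0.5, 2.0))  # 34.0
```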

5.3.2. Execution time


To compute the execution time of a given mapping, we consider the execution time
of each task of the critical path, their schedule and the additional time due to data
transportation through channels and switches along the communication path. The
execution time of each task is defined by the taskTime attribute in TG. Channels and
switches can be counted using Equations (6) and (7), respectively. Analysing the
assignment problem, we identified a situation that increases the execution time of an
application, which occurs when parallel tasks are allocated to the same processor.
The mapping problem analysis revealed two other situations that can increase
the execution time of the application: (i) parallel tasks with a common source sharing
communication channels and (ii) parallel tasks with a common target sharing
communication channels.
Equation (9) gives the execution time, considering computation and communication,
for a given mapping solution S. Function $C(g)$ returns all possible paths Q for a
task graph g, $Time_p(Q)$ returns the time necessary to process the tasks of path Q, and
$Time_c(Q)$ returns the time spent in communication between the tasks in path Q,
assuming there is no contention. Considering contention, it is necessary to add the
delay concerning the two aforementioned situations. The delay caused by situation (i) is
computed by function $f_1$ and the delay caused by situation (ii) is computed by function $f_2$
in Equation (9).
$$\mathrm{Time}(S) = \max_{Q \in C(TG)} \left( Time_p(Q) + Time_c(Q) + T'(Q) \right), \qquad T'(Q) = t_L \times \left( f_1(Q) + f_2(Q) \right) \qquad (9)$$

The time spent in computation for path Q of the task graph is computed as shown in
Equation (10). Function $P(t)$ returns all the tasks at the same level as task t and
associated with the same processor, for a given mapping solution S. Function $D(t)$
returns all the tasks that depend on t and are executed by the same processor, while $A_S[t]_{ip}$
returns information about the IP assigned to a task t in S.

$$Time_p(Q) = \sum_{t \in Q} \left( \mathrm{time}_{A_S[t]_{ip}} + T''(S) \right)$$

$$T''(S) = \begin{cases} 0 & \text{if } S[t]_{dedi} = 1 \text{ or } P(t) = D(t) = \emptyset \\ \displaystyle\sum_{t' \in P(t) \cup D(t),\; t' < t} \mathrm{time}_{S[t']_{ip}} & \text{otherwise} \end{cases} \qquad (10)$$

The time spent in communication between the tasks along path Q is computed as
shown in Equation (11). Function $E(TG)$ returns all the edges d(src, tgt) of the task
graph, $vol_{d(t,t')}$ is the volume of bits transmitted from task t to task t', $t_R$ is the
switch processing time and $t_L$ is the channel transmission time.

$$Time_c(Q) = \sum_{\substack{d(t,t') \in E(TG) \\ t \in Q,\; t' \in Q}} \left( t_L \left\lceil \frac{vol_{d(t,t')}}{phit} \right\rceil + SW\left(S[t]_{res}, S[t']_{res}\right) \times (t_R + t_L) \right) \qquad (11)$$

Function $f_1$ computes delays on path Q regarding situation (i); Algorithm 1 describes
this function. For each parallel task whose data must be sent through the same
communication channel, the overall execution time is increased due to data
pipelining. Function Targets(t) returns all the tasks of the task graph that depend
on task t, $iCH(t, t')$ is the index of the initial communication channel from task t to task t', and
penalty is the number of flits that would be transmitted when situation (i) occurs.
A flit represents the flow unit, a multiple of the phit, which represents the physical unit
given by the channel width. Function $f_2$ computes delays on path Q regarding
situation (ii). If a switch receives packages from two different channels at the same
time and needs to route them through the same output channel, there will be package
pipelining. Algorithm 2 computes how many times this situation occurs. The
function $CHs(t, t')$ returns an ordered list of the channels required during
communication between tasks t and t', and penalty is the number of flits that would be
transmitted when situation (ii) occurs.

Algorithm 1: f1(Q) – SameSrcDiffTgt
    penalty := 0
    for all t ∈ Q do
        if |Targets(t)| > 1 then
            let t1 ∈ Targets(t) such that t1 ∈ Q
            for all t2 ∈ Targets(t) \ {t1} do
                if iCH(t, t1) = iCH(t, t2) and t1 > t2 then
                    penalty := penalty + ⌈vol(t, t2) / phit⌉
                end if
            end for
        end if
    end for
    return penalty

Algorithm 2: f2(Q) – DiffSrcSameTgt
    penalty := 0
    for all t ∈ Q do
        for all t1 ∈ TG such that t1 ≠ t and level(t) = level(t1) do
            for all s ∈ Targets(t) and s1 ∈ Targets(t1) such that s = s1 do
                w := CHs(t, s); w1 := CHs(t1, s1)
                if there exists i ∈ [0, min(w.length, w1.length)) such that w(i) = w1(i) and t > t1 then
                    penalty := penalty + ⌈vol(t1, s) / phit⌉
                end if
            end for
        end for
    end for
    return penalty
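The two algorithms can be sketched in Python as follows. The helper `xy_route` plays the role of $CHs(t, t')$ and its first element that of $iCH(t, t')$, with each channel represented as a (from_tile, to_tile) pair; `targets`, `level`, `vol` and `res` are hypothetical inputs holding Targets(t), the task levels, edge volumes in bits and the tile of each task. In `f1`, t1 is assumed to be the successor of t along Q. The channel comparison in `f2` is index by index, exactly as written in Algorithm 2.

```python
import math


def xy_route(i, j, n):
    """CHs: ordered channels (from_tile, to_tile) of the XY route i -> j, n x n mesh."""
    r, c = divmod(i, n)
    tr, tc = divmod(j, n)
    channels = []
    while c != tc:  # horizontal moves first
        step = 1 if tc > c else -1
        channels.append((r * n + c, r * n + c + step))
        c += step
    while r != tr:  # then vertical moves
        step = 1 if tr > r else -1
        channels.append((r * n + c, (r + step) * n + c))
        r += step
    return channels


def f1(path, targets, vol, res, n, phit):
    """Algorithm 1 (SameSrcDiffTgt): flits delayed by a shared first channel."""
    penalty = 0
    for k, t in enumerate(path[:-1]):
        outs = targets.get(t, [])
        if len(outs) <= 1:
            continue
        t1 = path[k + 1]                         # assumption: t's successor on Q
        ich1 = xy_route(res[t], res[t1], n)[:1]  # iCH(t, t1), empty if same tile
        for t2 in outs:
            if t2 == t1 or not ich1:
                continue
            if xy_route(res[t], res[t2], n)[:1] == ich1 and t1 > t2:
                penalty += math.ceil(vol[(t, t2)] / phit)
    return penalty


def f2(path, targets, level, vol, res, n, phit):
    """Algorithm 2 (DiffSrcSameTgt): flits delayed on channels towards a common target."""
    penalty = 0
    for t in path:
        for t1 in level:  # every TG task
            if level[t1] != level[t] or t <= t1:  # same level, and t > t1
                continue
            for s in set(targets.get(t, [])) & set(targets.get(t1, [])):
                w = xy_route(res[t], res[s], n)
                w1 = xy_route(res[t1], res[s], n)
                if any(w[x] == w1[x] for x in range(min(len(w), len(w1)))):
                    penalty += math.ceil(vol[(t1, s)] / phit)
    return penalty


print(xy_route(0, 8, 3))  # [(0, 1), (1, 2), (2, 5), (5, 8)]
print(f1([0, 2], {0: [1, 2]}, {(0, 1): 70, (0, 2): 10}, {0: 0, 1: 1, 2: 2}, 3, 32))  # 3
print(f2([1, 2], {0: [2], 1: [2]}, {0: 0, 1: 0, 2: 1},
         {(0, 2): 40, (1, 2): 10}, {0: 1, 1: 3, 2: 8}, 3, 32))  # 2
```

The first call reproduces the channel sequence from tile 0 to tile 8 quoted for the 3 × 3 mesh with XY routing. In the `f2` example, the routes from tiles 3 and 1 to tile 8 both use channel (5, 8) as their third hop, which triggers the penalty.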

5.3.3. Power consumption


To compute the power consumption of a mapping, it is necessary to consider the
power consumed in processing and communication. The total power consumed is
given by Equation (12), wherein $Power_p$ and $Power_c$ represent processing and
communication consumption, respectively, and $x_m$ and $x'_m$ are the power constraints.

$$x'_m \le \mathrm{Power}(S) = Power_p(S) + Power_c(S) \le x_m \qquad (12)$$

The power consumption in processing is obtained by summing the power
consumption of each executed task of a mapping solution S. In Equation (13),
$\mathrm{power}_{S[t]_{ip}}$ represents the power consumed when task t is executed by its assigned
processor $S[t]_{ip}$.

$$Power_p(S) = \sum_{t \in TG} \mathrm{power}_{S[t]_{ip}} \qquad (13)$$

The power consumed in communication is an important feature to be considered
in the NoC power model in order to yield an accurate evaluation. This feature
depends on the application communication pattern and the NoC platform. The
communication pattern is given by the assignment and mapping, while the NoC
platform is defined by the network topology, switching strategy and routing
algorithm. The power consumed by sending one bit from tile i to tile j is computed as
shown in Equation (14). Parameters $E_{S_{bit}}$ and $E_{C_{bit}}$ represent the power consumed by
switches and channels, respectively. These parameters are platform dependent and
must be set by the NoC designer.

$$E_{bit}^{i,j} = SW(i, j) \times E_{S_{bit}} + CH(i, j) \times E_{C_{bit}} \qquad (14)$$

Table 1. Applications and number of assignments and mappings for power-aware optimisation.

ID  Application       N  M  Combinations
1   auto-indust-tg0   6  4     1,183,744
2   auto-indust-tg2   9  9   606,076,928
3   consumer-tg0      7  8     2,247,264
4   consumer-tg1      7  5       176,868
5   networking-tg2    4  3        41,616
6   office-tg0        5  5       210,681
7   telecom-tg1       6  6     9,516,192

Table 2. Applications and number of assignments and mappings for power-aware optimisation.

                        Assignment           Mapping
ID  Application       NSGA-II  microGA    NSGA-II  microGA
1   auto-indust-tg0         2        4          2        7
2   auto-indust-tg2        17       23         11       47
3   consumer-tg0            9        6          3       10
4   consumer-tg1            3        9          7       18
5   networking-tg2          2        6          3        7
6   office-tg0              6       18          8       25
7   telecom-tg1             2        2          1        4

The task graph gives the volume of bits sent from task t to task t' through the oriented edge $d_{t,t'}$.
Assuming that tasks t and t' are mapped on tiles i and j, respectively, the number of
bits transmitted from tile i to tile j is denoted as $vol_{d(t,t')}$. Communication between tiles i
and j can be established through a single channel $c_{i,j}$ or through a sequence of m > 1
channels $[c_{i,x_1}, c_{x_1,x_2}, \ldots, c_{x_{m-1},j}]$. For example, on a 3 × 3 mesh-based NoC
with XY routing, a task mapped on tile 0 (the top left) sends data to a task mapped
on tile 8 (bottom right) through the following sequence of communication channels:
$[c_{0,1}, c_{1,2}, c_{2,5}, c_{5,8}]$. The total power consumption in communication is given by
Equation (15), where Targets(t) returns the set of tasks that depend on task t
and $S[t]_{res}$ returns the index of the tile where task t is mapped, considering a mapping
solution S.

$$Power_c(S) = \sum_{t \in TG,\; \forall t' \in Targets(t)} vol_{d(t,t')} \times E_{bit}^{S[t]_{res},\, S[t']_{res}} \qquad (15)$$
Downloaded by [Instituto De Ciencias Matematicas] at 15:45 16 October 2011
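To make Equation (15) concrete, the sketch below builds the XY route explicitly (columns first, then rows) on a row-major mesh, which reproduces the channel sequence [c_{0,1}, c_{1,2}, c_{2,5}, c_{5,8}] of the 3 × 3 example, and then accumulates vol(d_{t,t′}) × E_bit over the edges of the task graph. The edge dictionary stands in for iterating t ∈ TG and t′ ∈ Targets(t); the switch-count convention (SW = CH + 1, zero for tile-local traffic) and all names are assumptions.

```python
from typing import Dict, List, Tuple


def xy_route(i: int, j: int, cols: int) -> List[Tuple[int, int]]:
    """Channels (from_tile, to_tile) used by XY routing on a row-major mesh."""
    r, c = divmod(i, cols)
    rj, cj = divmod(j, cols)
    path = []
    while c != cj:  # X dimension (columns) first
        step = 1 if cj > c else -1
        path.append((r * cols + c, r * cols + c + step))
        c += step
    while r != rj:  # then Y dimension (rows)
        step = 1 if rj > r else -1
        path.append((r * cols + c, (r + step) * cols + c))
        r += step
    return path


def communication_power(volumes: Dict[Tuple[str, str], int],
                        tile_of: Dict[str, int], cols: int,
                        es_bit: float, ec_bit: float) -> float:
    """Power_c(S): sum of vol(d_{t,t'}) * E_bit^{S[t]_res, S[t']_res}.

    volumes maps each task-graph edge (t, t') to its bit volume;
    tile_of maps each task to its tile index S[t]_res.
    """
    total = 0.0
    for (src, dst), bits in volumes.items():
        ch = len(xy_route(tile_of[src], tile_of[dst], cols))
        sw = ch + 1 if ch > 0 else 0  # assumption: SW = CH + 1, local traffic free
        total += bits * (sw * es_bit + ch * ec_bit)
    return total
```

`xy_route(0, 8, 3)` returns `[(0, 1), (1, 2), (2, 5), (5, 8)]`, and a single 100-bit edge between tiles 0 and 8 with ES_bit = 1.0 and EC_bit = 0.5 yields `700.0`.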

Figure 2. Comparison of the number of power-aware assignments and mappings obtained by NSGA-II and microGA. (a) Power-aware assignments. (b) Power-aware mappings.

6. Results
Figure 3. Comparison of the number of mappings with power-aware and non-preference-based implementations obtained by NSGA-II and microGA. (a) Power-aware optimisation. (b) Optimisation with no preference.

Figure 4. Comparison of the optimisation time to locate the best assignments with power-aware and non-preference-based implementations by NSGA-II and microGA. (a) Power-aware optimisation. (b) Optimisation with no preference.

The E3S (0.9) Benchmark Suite (Dick 2008) was used to carry out the simulations. The suite contains the characteristics of 17 embedded processors. These processors are characterised by the measured execution times of 46 different types of tasks, power consumption derived from processor datasheets, required die size, price and clock frequency. In addition, E3S contains common applications executed by embedded systems in different environments (automotive/industrial, consumer, networking, office automation and telecommunications). We show the results obtained for seven different applications, described in Table 1, wherein N is the number of tasks and M is the number of data dependencies of the application. The minimal and average values of the power consumption obtained were used in this article to set the preferred minimal and maximal power bounds, respectively, for each application. However, the DM can set any other power bounds as suited. The numbers of assignment and mapping solutions obtained by the power-aware NSGA-II and microGA implementations are given in Table 2 and compared in Figures 2 and 3.
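A small sketch of the bound-setting rule used in the experiments: the lower bound x′_m is the minimum and the upper bound x_m is the average of a set of observed power values. Representing those observations as a plain list, and the function name, are assumptions for illustration.

```python
from statistics import mean
from typing import List, Tuple


def power_bounds(observed: List[float]) -> Tuple[float, float]:
    """Return (lower, upper) = (minimum, average) of the observed power values."""
    if not observed:
        raise ValueError("at least one observed power value is required")
    return min(observed), mean(observed)
```

`power_bounds([2.0, 4.0, 6.0])` returns `(2.0, 4.0)`; a DM preferring other limits would simply pass its own pair instead.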
Figure 5. Comparison of the optimisation time to locate the best mappings with power-aware and non-preference-based implementations by NSGA-II and microGA. (a) Power-aware optimisation. (b) Optimisation with no preference.

Figure 6. Pareto-fronts obtained for application 3 with power-aware implementations of NSGA-II and microGA. (a) Assignment with NSGA-II. (b) Mapping with NSGA-II. (c) Assignment with microGA. (d) Mapping with microGA.

Besides comparing the number of solutions obtained, Figures 4a and b compare the search time spent by NSGA-II and microGA to evolve the best assignment solutions for the power-aware and the non-preference-based implementations, respectively. The search time needed to find the best mapping solutions with the power-aware and the non-preference-based implementations of NSGA-II and microGA is shown in Figures 5a and b, respectively.
The three objectives of interest are the hardware area required by the NoC-based implementation, the execution time of the application and the power required by the implementation. Hence, in the non-preference-based approach, a Pareto-front is a three-dimensional surface. With the introduced power-aware approach, a Pareto-front is a two-dimensional curve representing the trade-off between occupied area and execution time, since power consumption is constrained by the DM's preferences. Figure 6 shows the Pareto-fronts of application 3, which is the one with the largest number of possible solutions (da Silva, Nedjah and de Macedo Mourelle 2009).
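The reduction from a three-dimensional to a two-dimensional front can be sketched as follows: discard solutions whose power falls outside the DM's bounds, then keep those not dominated in (area, time), both minimised. The tuple layout and sample data are hypothetical, and this quadratic filter is only an illustration; NSGA-II and microGA maintain their fronts inside the evolutionary loop.

```python
from typing import List, Tuple

Solution = Tuple[float, float, float]  # hypothetical layout: (area, time, power)


def power_aware_front(solutions: List[Solution],
                      lower: float, upper: float) -> List[Solution]:
    """Non-dominated (area, time) solutions among those with power in [lower, upper]."""
    feasible = [s for s in solutions if lower <= s[2] <= upper]

    def dominates(a: Solution, b: Solution) -> bool:
        # a dominates b if it is no worse in area and time and better in at least one
        return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

    return [s for s in feasible if not any(dominates(o, s) for o in feasible)]
```

For `[(10, 5, 3), (8, 7, 3), (12, 6, 3), (6, 4, 9)]` with bounds 2 and 4, the last solution is excluded by its power, `(12, 6, 3)` is dominated by `(10, 5, 3)`, and the front is `[(10, 5, 3), (8, 7, 3)]`.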

7. Conclusions
Assigning and mapping IPs are NP-hard problems and key research problems in the NoC design field. In this article, we propose an innovative power-aware multi-objective evolutionary decision support system to aid NoC designers in assigning and mapping a prescribed set of IPs onto a NoC physical structure. A power-aware optimisation was performed and the performance of NSGA-II and microGA was compared. The latter performed better for all applications.
Structured and intelligible representations of a NoC, a TG and a repository of IPs were used, and these can easily be extended to different NoC applications. Although we adopted the E3S Benchmark Suite (Dick 2008) as our repository of IPs, any other repository could be modelled using XML and used, making this tool compatible with different repositories.
The proposed shift crossover and inner swap mutation genetic operators can be used in any optimisation problem where no loss of data from an individual is acceptable. Choosing preference-based MOEAs instead of non-preference-based ones speeds up the evolutionary process and reduces the number of solutions presented to the DM. These goals were achieved by both MOEAs adopted in this article.
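As a generic illustration of an operator that loses no data, the sketch below exchanges two genes of a chromosome, so the offspring is always a permutation of its parent. It is a plain swap mutation written for this illustration and is not claimed to be identical to the authors' inner swap mutation, whose exact definition is given elsewhere in the article.

```python
import random
from typing import List, Optional, TypeVar

T = TypeVar("T")


def swap_mutation(chromosome: List[T],
                  rng: Optional[random.Random] = None) -> List[T]:
    """Return a copy of the chromosome with two distinct positions exchanged.

    The multiset of genes is unchanged, so no gene is lost or duplicated.
    """
    rng = rng or random.Random()
    child = list(chromosome)
    if len(child) < 2:
        return child  # nothing to swap
    i, j = rng.sample(range(len(child)), 2)  # two distinct positions
    child[i], child[j] = child[j], child[i]
    return child
```

Because only positions change, a repair step is never needed after mutation, which is the property the conclusions emphasise.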
Future work is fourfold: adopting a dynamic topology strategy to evolve the most adequate topology for a given application; exploring the use of different objectives based on different repositories; proposing an interfacing mechanism with a hardware description simulator to integrate our tool into the NoC design platform; and exploring different approaches to introduce the DM's preferences into a platform-based aid system.

Acknowledgements
We are grateful to FAPERJ (Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro, http://www.faperj.br), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, http://www.cnpq.br) and CAPES (Coordenação de Aperfeiçoamento de Pessoal de Ensino Superior, http://www.capes.gov.br) for their continuous financial support.

References
Coello, C.A.C., Lamont, G.B., and Veldhuizen, D.A.V. (2006), Evolutionary Algorithms for
Solving Multi-Objective Problems (Genetic and Evolutionary Computation), Secaucus, NJ:
Springer-Verlag.
da Silva, M.V.C., Nedjah, N., and de Macedo Mourelle, L. (2009), ‘Optimal IP Assignment
for Efficient NoC-based System Implementation using NSGA-II and MicroGA’,
International Journal of Computational Intelligence Systems, 2, 115–123.
Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002), ‘A fast and elitist multiobjective genetic algorithm: NSGA-II’, IEEE Transactions on Evolutionary Computation, 6, 182–197.
Dick, R.P. (2008), ‘Embedded System Synthesis Benchmarks Suite (E3S)’. http://ziyang.eecs.northwestern.edu/dickrp/e3s/.
Duato, J., Yalamanchili, S., and Ni, L. (2003), Interconnection Networks: An Engineering
Approach, San Francisco, CA: Morgan Kaufmann.
Garey, M.R., and Johnson, D.S. (1979), Computers and Intractability; a Guide to the Theory of
NP-completeness, USA: W. H. Freeman.
Hu, J., and Marculescu, R. (2003), ‘Energy-aware mapping for tile-based NoC architectures
under performance constraints’, in Proceedings of the 2003 conference on Asia South
Pacific Design Automation, Kitakyushu, Japan, New York, NY: ACM, pp. 233–239.
Murali, S., and Micheli, G.D. (2004), ‘SUNMAP: a tool for automatic topology selection and
generation for NoCs’, in Proceedings of the 41st Annual conference on Design Automation
(DAC-04), Jun. 7–11, New York: ACM Press, pp. 914–919.
Ogras, Ü.Y., Hu, J., and Marculescu, R. (2005), ‘Key research problems in NoC design: a
holistic perspective’, in Proceedings of the 3rd International Conference on Hardware/
Software Codesign and System Synthesis, eds. P. Eles, A. Jantsch, and R.A. Bergamaschi,
New York, NY: ACM, pp. 69–74.
