
IET Computers & Digital Techniques

Research Article

Efficient and scalable cross-by-pass-mesh topology for networks-on-chip

ISSN 1751-8601
Received on 5th December 2016
Accepted on 18th January 2017
E-First on 22nd May 2017
doi: 10.1049/iet-cdt.2016.0184
www.ietdl.org

Usman Ali Gulzari1,2, Sheraz Anjum3, Shahrukh Agha1, Sarzamin Khan4, Frank Sill Torres5
1Department of Electrical Engineering, COMSATS Institute of Information Technology, Islamabad, Pakistan
2Department of Electrical Engineering, The University of Lahore, Islamabad Campus, Pakistan
3Department of Computer Science, COMSATS Institute of Information Technology, Wah Cantt, Pakistan
4Department of Electrical Engineering, COMSATS Institute of Information Technology, Wah Cantt, Pakistan
5Department of Electronic Engineering, Federal University of Minas Gerais, Belo Horizonte, Brazil

E-mail: usmangulzari2000@yahoo.com

Abstract: This study presents an efficient and scalable network-on-chip (NoC) topology termed cross-by-pass-mesh (CBP-Mesh). The proposed architecture is derived from the traditional mesh topology by adding cross-by-pass links to the network. The design of the cross-by-pass links and their impact on the topology are analysed in detail with the help of synthetic, hotspot and embedded traffic traces. The advantages of the proposed CBP-Mesh over its competitor topologies include a reduced network diameter, an increased bisection bandwidth, a lower average number of hops and improved symmetry and regularity of the network. Synthetic traffic traces and some real embedded system workloads are applied to the proposed CBP-Mesh and its competitor two-dimensional NoC topologies. The comparison of the results in terms of performance and cost for different network dimensions indicates that the proposed CBP-Mesh offers short latency, high throughput and good scalability at a small increase in power and energy.

1 Introduction

The rising complexity of system-on-chip (SoC) designs, caused by adding more and more processing elements (PEs), requires communication schemes that are more flexible and robust than the classic shared buses [1]. A promising solution for on-chip communication among PEs is the networks-on-chip (NoC) paradigm, which is based on packet-oriented communication [2]. The distinguishing characteristics of NoCs are their modular structure and their concurrency of computation and communication [3]. The principal issues to be addressed in NoCs are the reduction of power consumption and energy utilisation at low penalties in performance, latency and throughput [4]. Further issues are network scalability and the design complexity of routing elements [5]. Therefore, the choice of an appropriate topology is essential. The most promising and widely applied NoC topology is the mesh, which profits from a regular and simple structure [6]. However, mesh networks scale poorly to large numbers of PEs because of the great number of multi-hop links needed to provide complete reachability [7]. Alternative solutions are meshes with hierarchical topologies such as diagonal-mesh (D-Mesh) [8] and flattened butterfly [9], which reduce the average hop count in the NoC. However, most proposed structures increase router complexity and incur higher costs in terms of power consumption and energy [9].
This work presents an efficient cross-by-pass-mesh (CBP-Mesh) architecture design with high scalability. The architecture of the proposed network topology is based on the basic mesh topology and adds cross-by-pass links (CBP-Links). The added CBP-Links provide shorter paths by over-passing the in-between nodes when packets traverse from source to destination, and they increase the performance of the network with respect to its predecessor and competitor topologies. These additional links are most effective for reaching distant nodes; they considerably reduce the average latency and result in a higher throughput of the network. In order to demonstrate the performance of the proposed CBP-Mesh, synthetic hotspot traffic and five different embedded application workloads were applied to the proposed CBP-Mesh and the selected topologies. The proposed topology is compared with some of its predecessors, such as the mesh, Torus and central-connected-mesh (C2-Mesh) topologies, as well as its competitors extended-mesh (XD-Mesh) and D-Mesh [10–15].
The simulation results indicate that CBP-Mesh is an efficient candidate among the selected topologies owing to its lower average latency and increased throughput at the cost of a slight increase in network power and energy for on-chip communication.
The rest of the paper is organised as follows. Section 2 provides background and related works, while Section 3 proposes the new topology. Section 4 discusses the impact of adding links on the topology characteristics and Section 5 presents the comparisons with different mesh topologies. Finally, Section 6 draws the conclusions.

2 Background

Mesh (see Fig. 1a) is a commonly applied topology for NoCs in multicore systems [11]. The mesh topology offers a simple and regular network design, and thus is a widely chosen option for NoCs [13]. However, with an increasing number of PEs, mesh networks start to suffer performance degradation. This follows mainly from the drastic increase of the network diameter, which leads to higher latencies. Further, the throughput is reduced because the number of nodes increases faster than the bisection width (see also Section 4). Hence, novel types of NoC mesh topologies have been proposed, aiming at the reduction of the network diameter and the increase of the bisection width. The principal difference between all these new topologies is the type and the application of the interconnections (see Fig. 1).
We propose in the following a classification into four basic types of links for NoC mesh topologies (see Fig. 1). The classic mesh-links (M-Links, see Fig. 1a) connect direct horizontal and vertical neighbour PEs, while Torus-links (T-Links, see Fig. 1b) interlink horizontal and vertical PEs over longer distances [14, 15]. Diagonal-links (D-Links, see Fig. 1c) connect direct diagonal neighbour PEs, extended-links (XD-Links) connect the opposite corner PEs of a mesh through the central node (see Fig. 1d) and cross-links (C-Links, see Fig. 1e) interlink diagonal PEs over longer distances [13, 16].
The Torus topology (Fig. 1b) applies T-Links and possesses a considerably lower network diameter compared with the standard mesh [13]. The D-Mesh topology (Fig. 1c) confronts this
IET Comput. Digit. Tech., 2017, Vol. 11 Iss. 4, pp. 140-148 140
© The Institution of Engineering and Technology 2017
Fig. 1  3 × 3 meshes including routers (Ro) and interconnection links
(a) Mesh, (b) Torus, (c) D-Mesh, (d) XD-Mesh, (e) C2-Mesh, (f) CBP-Mesh

limitation by using additional D-Links [12]. However, the D-Mesh topology requires higher-degree routers, resulting in considerably increased costs in terms of power consumption [14]. In detail, D-Mesh applies M- and D-Links [14]. Thereby, the number of D-Links is higher than in mesh and Torus. It can be concluded that D-Mesh is a complex and costly network topology [11].
XD-Mesh (Fig. 1d) and C2-Mesh (Fig. 1e) have been proposed to reduce complexity and costs while retaining high scalability [13, 16]. In detail, XD-Mesh applies XD-Links between corner nodes through the centre node, while C2-Mesh adds a single link between two opposite nodes for each 3 × 3 mesh network [13]. It should be noted that XD-Mesh and C2-Mesh offer simplicity and low cost, but have a lower performance than D-Mesh [16].

Fig. 2  CBP-Mesh router (Ro) with router links for connecting to neighbouring router nodes

3 CBP-Mesh

This section describes the motivation for the proposed CBP-Mesh, its architecture, the placement of the CBP-Links and their features.

3.1 Motivation
To increase the performance of the mesh network, the worst-case hop count for a packet travelling from its source to its destination node should be addressed. The worst cases in a 3 × 3 mesh network are the opposite corner pairs Ro0,0 ↔ Ro2,2 and Ro0,2 ↔ Ro2,0 (see Fig. 1a), each of which requires four hops. The T-Links of the Torus network cover this distance in two hops by hopping between the corner nodes. By using D-Links, the XD-Mesh and D-Mesh topologies take two hops through the central node to reach the opposite corner node. C2-Mesh adds one extra C-Link to a 3 × 3 mesh network, which reduces the hop count between the two opposite corner nodes Ro0,0 ↔ Ro2,2 in Fig. 1e. However, the other pair of opposite corner nodes, Ro0,2 ↔ Ro2,0 in Fig. 1e, is ignored. The proposed CBP-Mesh design adds two CBP-Links to a mesh network, placed between both pairs of opposite corner nodes Ro0,0 ↔ Ro2,2 and Ro0,2 ↔ Ro2,0, so that these nodes are connected directly (see Fig. 1f for details). The CBP-Links thus reduce the distance from two hops, as in the Torus, XD-Mesh and D-Mesh networks, to a single hop. Addressing the worst-case scenario of opposite corners in the CBP-Mesh network results in a higher performance of the network.
The proposed CBP-Mesh is a scalable topology whose basic building block is shown in Fig. 1f. The CBP-Mesh architecture can be extended to an odd number of nodes per dimension (5 × 5), a higher number of nodes or an odd/even number of nodes (3 × 4), as shown in Figs. 3, 4 and 6c, respectively. The CBP-Links, in addition to the M-Links, provide multiple paths, which helps to accommodate more adaptive and dynamic routing algorithms in the proposed network.

3.2 Principle architecture
This section describes the proposed CBP-Mesh architecture and details the placement of the CBP-Links.

3.2.1 Assign links to CBP-Mesh network: A CBP-Mesh network is a combination of a classic-mesh network with extra CBP-Links (see also Section 3). Thus, the average node distance

can be decreased, leading to a smaller network diameter. The placement of these CBP-Links is an essential step in designing a CBP-Mesh network and is detailed in the following.
A CBP-Mesh is defined as follows:

Definition 1: A CBP-Mesh is a two-dimensional network of size m × n, with m, n ≥ 3. It consists of a set of nodes N = {(x, y) | 0 ≤ x ≤ m − 1, 0 ≤ y ≤ n − 1}, where (0,0) is the most north-western node. Each node has its own router Rox,y.

A CBP-Mesh router (see Fig. 2) is defined as follows:

Definition 2: A CBP-Mesh router has a degree of 3, 4, 5 or 8, with possible ports LN, LS, LE and LW enabling connections to the neighbour routers on the north (rN), south (rS), east (rE) and west (rW) using links lN, lS, lE and lW. Further, a CBP-Mesh router can have ports LNE, LNW, LSE and LSW for connecting the CBP-Links lNE, lNW, lSE and lSW (see green arrow lines in Fig. 2).

The east neighbour rx,yE and west neighbour rx,yW of router Rox,y of node (x, y) are the routers of nodes (x, y + 1) and (x, y − 1). Similarly, the north neighbour rx,yN and south neighbour rx,yS of Rox,y are the routers of nodes (x − 1, y) and (x + 1, y).
The insertion of CBP-Links is defined as follows:

Definition 3: The router Rox,y of node (x, y), where x and y are even numbers, is connected to the routers of nodes (x − 2, y − 2), (x − 2, y + 2), (x + 2, y − 2) and (x + 2, y + 2) by CBP-Links using the ports LNW, LNE, LSW and LSE of Rox,y.

Fig. 3 depicts an exemplary 5 × 5 CBP-Mesh with origin point (0,0). Here, router Ro2,2 is connected to its direct neighbours Ro1,2, Ro3,2, Ro2,3 and Ro2,1 via links at its ports LN, LS, LE and LW. Further, Ro2,2 is connected via CBP-Links at the ports LNW, LNE, LSW and LSE with the routers Ro0,0, Ro0,4, Ro4,0 and Ro4,4. The related assignment algorithm for all links of a CBP-Mesh can be found in the Appendix.

Fig. 3  5 × 5 CBP-Mesh network

The CBP-Mesh network contains 2n^2 − 2n M-Links, and the total numbers of links Nlinks,CBP_odd and Nlinks,CBP_even of odd and even CBP-Meshes can be determined from the following equations:

$$N_{\text{links,CBP\_odd}} = 2n^2 - 2n + 2\left\lfloor \frac{n}{2} \right\rfloor^2 \quad (n \ge 3) \tag{1a}$$

$$N_{\text{links,CBP\_even}} = 2n^2 - 2n + 2\left(\frac{n}{3}\right)^2 \quad (n \ge 6) \tag{1b}$$

3.3 CBP-Mesh network features

The CBP-Links of the proposed CBP-Mesh add the following benefits over its competitor topologies.
Fig. 4 illustrates how the CBP-Mesh network reduces the network diameter and the distance between nodes. In Fig. 4, routers in three different colours (blue, green and red) are shown as hexagonal boxes interlinked over the network. The blue routers have one, two or four CBP-Links (see green lines) in addition to M-Links (see black lines). A green router is one hop and a red router is two hops away from the nearest router with CBP-Links. The CBP-Links connect distant nodes by over-passing the in-between nodes of the network (like a flyover on a road). The CBP-Links connect the corner routers Ro0,0, Ro0,4, Ro4,0 and Ro4,4 to the central router Ro2,2 in one hop. The middle terminal nodes Ro0,2, Ro4,2 ↔ Ro2,0, Ro2,4 are also connected in one hop (see blue arrow lines in Fig. 4).
Similarly, the corner nodes Ro0,0, Ro0,4, Ro4,0 and Ro4,4 take two hops via the centre node Ro2,2 to reach the opposite nodes in the diagonal direction. The middle terminal nodes Ro0,2 ↔ Ro4,2 (via Ro2,0 or Ro2,4) and Ro2,0 ↔ Ro2,4 (via Ro0,2 or Ro4,2) reach each other through a single intermediate node. The adjacent green router nodes take one more hop and the red router nodes take two more hops using M-Links from the above router nodes in the CBP-Mesh network.
A further advantage is the connection of CBP-Links to the central and terminal nodes of the network (see Fig. 4), which provides improved traffic flow and a reduced hop count. The addition of CBP-Links in the proposed CBP-Mesh effectively reduces the worst-case hop count between nodes Ro0,0 ↔ Ro4,4 or nodes Ro0,4 ↔ Ro4,0 from eight in a 5 × 5 mesh network to two hops.
The grey areas in Fig. 4a indicate three types of network diameters for the m × n CBP-Mesh, namely the diagonal diameter (DDia), the end-to-end diameter (EDia) and the middle diameter (MDia). The estimation of these parameters for a symmetric CBP-Mesh with dimension n × n is as follows:

$$\text{DDia} = n - \frac{n}{2} + 1 \tag{2}$$

$$\text{EDia} = n - \frac{n}{2} - 1 \tag{3}$$

$$\text{MDia} = n - \frac{n}{2} \quad (n = \{3, 7, 11, 15, \ldots\}) \tag{4a}$$

$$\text{MDia} = n - \frac{n}{2} \quad (n = \{5, 9, 13, 17, \ldots\}) \tag{4b}$$

The grey areas in Fig. 4b indicate the path between nodes Ro1,0 and Ro1,6 in a 3 × 7 network, which would have a hop count of six in a mesh network. In contrast, in the proposed CBP-Mesh the hop count reduces to five (see double arrow lines in Fig. 4b).
For networks with a larger number of nodes, the gain due to CBP-Links increases considerably. For example, in the 3 × 9 network depicted in Fig. 4c, the hop count between the extreme nodes Ro1,0 and Ro1,8 reduces from 8 for a common mesh to 6 for the proposed CBP-Mesh.
Because the CBP-Links provide alternative paths between two nodes, they increase the tolerance of the network against failing links and routers. Consequently, the proposed CBP-Mesh is more robust than the classic mesh and the C2-Mesh.
As the proposed CBP-Mesh scales up, the CBP-Links become more effective in reducing the distance between nodes in the network. The 3 × 9 CBP-Mesh is shown in Fig. 4. For Ro0,0 ↔ Ro0,4 and Ro2,0 ↔ Ro2,4 (see blue dotted arrow in Fig. 4), the hop count reduces to two as compared with four in mesh, Torus, XD-Mesh and D-Mesh networks. Similarly, Ro0,0 ↔ Ro0,6 and Ro2,0
Fig. 4  Different CBP-Mesh network architectures
(a) 5 × 5 CBP-Mesh, (b) 3 × 7 CBP-Mesh, (c) 3 × 9 CBP-Mesh

↔ Ro2,6 take three hops by using the CBP-Links, and the adjacent green router nodes take one more hop to route packets to their destinations.

4 Topology characteristics
This section presents the characteristics of NoC topologies and compares the proposed CBP-Mesh with other selected topologies.

4.1 Characteristics of a CBP-Mesh
A NoC topology can be characterised by its network diameter, bisection width, number of links, degree of routers and the existence of path diversity [17–20]. Table 1 compares the general characteristics of the analysed NoC topologies, where symmetric meshes are assumed (size n × n).
The network diameter is defined as the maximum shortest path between all terminal node pairs of the NoC [18]. For example, the network diameter of an n × n mesh is 2n − 2. Reducing the network diameter minimises the hop count between nodes, and thus decreases the overall latency. It follows that mesh has the longest diameter, while CBP-Mesh offers the shortest one.
The bisection width is the number of links that need to be removed in order to separate a network into two equal parts [13]. For example, the bisection width of an n × n mesh is n [14]. The bisection bandwidth, which is the bandwidth available between both parts, is the product of bisection width and link bandwidth. Adding links to the network increases the bisection width owing to the larger number of paths between the two sub-networks. Consequently, the throughput is higher and the traffic flow in the network is improved. The topology with the smallest bisection bandwidth is mesh, followed by XD-Mesh and C2-Mesh. In contrast, mesh has the lowest number of links, followed by C2-Mesh, XD-Mesh and the proposed CBP-Mesh.
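The diameter definition and the hop counts quoted in Section 3.3 can be checked with a breadth-first search. The following is a minimal sketch, not the authors' tooling: the function names `build_links`, `hops` and `diameter` are hypothetical, and the CBP-Links are placed as stated in Definition 3 (between nodes whose coordinates are both even, two rows and two columns apart).

```python
from collections import deque


def build_links(m, n, cbp=True):
    """Adjacency of an m x n mesh; with cbp=True the CBP-Links of Definition 3 are added."""
    adj = {(x, y): set() for x in range(m) for y in range(n)}
    for (x, y) in list(adj):
        steps = [(1, 0), (0, 1)]            # M-Links to the south and east neighbours
        if cbp and x % 2 == 0 and y % 2 == 0:
            steps += [(2, 2), (2, -2)]      # CBP-Links towards the SE and SW even nodes
        for dx, dy in steps:
            nb = (x + dx, y + dy)
            if nb in adj:                   # skip targets outside the network
                adj[(x, y)].add(nb)
                adj[nb].add((x, y))         # links are bidirectional
    return adj


def hops(adj, src, dst):
    """Minimum hop count from src to dst by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    raise ValueError("destination not reachable")


def diameter(adj):
    """Maximum shortest path over all node pairs."""
    return max(hops(adj, s, d) for s in adj for d in adj)


if __name__ == "__main__":
    # 5 x 5: opposite corners Ro0,0 and Ro4,4
    print(hops(build_links(5, 5, cbp=False), (0, 0), (4, 4)))  # 8
    print(hops(build_links(5, 5), (0, 0), (4, 4)))             # 2
    # 3 x 7: Ro1,0 and Ro1,6
    print(hops(build_links(3, 7, cbp=False), (1, 0), (1, 6)))  # 6
    print(hops(build_links(3, 7), (1, 0), (1, 6)))             # 5
```

Only the south-east and south-west CBP directions are generated per node because every link is stored in both directions, so the north-west and north-east links appear from the other endpoint.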

Fig. 5  Comparison of proposed and selected topologies
(a) Average network latency, (b) Average network throughput, (c) Total network power, (d) Energy per data transferred packets

Fig. 6  MPEG-4 application implemented on D-Mesh and CBP-Mesh networks


(a) MPEG-4 application task graph with the bandwidth requirements, (b) MPEG-4 cores on 3 × 4 D-Mesh network, (c) MPEG-4 cores on 3 × 4 CBP-Mesh network

A link is the connection between two routing elements, i.e. routers, in a NoC. For example, the number Nlinks of links of an n × n mesh is 2n^2 − 2n. The highest number of links is used in D-Mesh.
The degree of a router is the number of links that can be connected to the router. A higher degree leads to an increase in complexity and costs in terms of power consumption of the router. The comparison of the required router types reveals that the mesh and Torus topologies apply only medium-degree routers, while the other topologies use very complex routers with a degree of up to 9.
Path diversity refers to the number of available paths between two nodes in a NoC. It should be noted that higher path diversity increases the fault tolerance of the network.
Table 1  General characteristics of different symmetric mesh network topologies (size n × n)
Characteristics | Mesh | Torus | D-Mesh | XD-Mesh | C2-Mesh | CBP-Mesh
number of nodes | n^2 | n^2 | n^2 | n^2 | n^2 | n^2
diameter | 2n − 2 | n − 1 | n − 1 | n − 1 | n − 1 | n − 1
bisection width | n | 2n | 3n − 2 | n + 2 | n + 2 | 2(n + 1)
number of links | 2n^2 − 2n | 2n^2 | 4n^2 − 6n + 2 | 2n^2 − 2n + 8 | 2n^2 − 2n + 4 | 2n^2 − 2n + 2⌊n/2⌋^2
router degree | 3 to 5 | 5 | 4 to 9 | 4 to 9 | 4 to 9 | 4 to 9
path diversity | yes | yes | yes | yes | yes | yes
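The closed-form entries of Table 1 are easy to evaluate for a concrete size. The sketch below is illustrative only: the function names are hypothetical, the CBP-Mesh link count uses the floor of n/2 (an assumption for odd n that reproduces the 48 links listed for the 5 × 5 CBP-Mesh in Table 3), and the link bandwidth of one 32-bit flit per cycle at 1 GHz is an assumption built from the flit length and clock frequency in Table 4, not a figure stated in the text.

```python
def link_counts(n):
    """Number of links per topology for a symmetric n x n network (Table 1 formulas)."""
    m_links = 2 * n * n - 2 * n
    return {
        "Mesh": m_links,
        "Torus": 2 * n * n,
        "D-Mesh": 4 * n * n - 6 * n + 2,
        "XD-Mesh": m_links + 8,
        "C2-Mesh": m_links + 4,
        # Assumption: floor(n/2), intended for odd n
        "CBP-Mesh": m_links + 2 * (n // 2) ** 2,
    }


def bisection_widths(n):
    """Bisection width per topology for a symmetric n x n network (Table 1 formulas)."""
    return {
        "Mesh": n,
        "Torus": 2 * n,
        "D-Mesh": 3 * n - 2,
        "XD-Mesh": n + 2,
        "C2-Mesh": n + 2,
        "CBP-Mesh": 2 * (n + 1),
    }


def bisection_bandwidth(width, flit_bits=32, clock_hz=1e9):
    """Bisection bandwidth in bit/s = bisection width x link bandwidth.

    Assumption: each link carries one flit per clock cycle.
    """
    return width * flit_bits * clock_hz


if __name__ == "__main__":
    n = 5
    for name, links in link_counts(n).items():
        width = bisection_widths(n)[name]
        gbit = bisection_bandwidth(width) / 1e9
        print(f"{name:9s} links={links:3d} bisection width={width:2d} ({gbit:.0f} Gbit/s)")
```

Under these assumptions the 5 × 5 CBP-Mesh has a bisection width of 12, compared with 5 for the plain mesh, at the price of eight additional links.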

Table 2  Types of links used by analysed symmetric mesh topologies
Topology | M-Links | T-Links | D-Links | CBP-Links
mesh | 2n^2 − 2n | - | - | -
Torus | 2n^2 − 2n | 2n | - | -
D-Mesh | 2n^2 − 2n | - | 2(n − 1)^2 | -
XD-Mesh | 2n^2 − 2n | - | 2(n − 1) | -
C2-Mesh | 2n^2 − 2n | - | - | (n − 1)
CBP-Mesh (odd) | 2n^2 − 2n | - | - | 2⌊n/2⌋^2
CBP-Mesh (even) | 2n^2 − 2n | - | - | 2(n/3)^2

Table 2 compares the type of links each analysed topology applies (see also Fig. 2), where again symmetric meshes are assumed (size n × n). The C- and CBP-Links are assumed to belong to the same class because of their shape; similarly, the XD- and D-Links are considered as one class. As all topologies are mesh-like, the number of M-Links is the same for all. A further analysis reveals that Torus and C2-Mesh require only small numbers of additional links, while the number of D-Links for D-Mesh increases quadratically with n. The same tendency holds for the number of CBP-Links in the proposed CBP-Mesh, however, with a four or nine times lower growth factor depending on the type of mesh.
Table 3 details the number of required routers and links for 5 × 5 NoCs realised in the analysed topologies. The data indicate that D-Mesh requires a considerable number of 6- and/or 9-port routers and possesses a very high number of links. In contrast, mesh and Torus apply solely routers with up to five ports and a low number of links. Finally, XD-Mesh, C2-Mesh and CBP-Mesh have balanced requirements of routers and links.
The advantages gained in the proposed CBP-Mesh network can be summarised as follows:

• Reduction in network diameter.
• Reduction in number of hops between nodes.
• Increase in bisection bandwidth of the network.
• Availability of multi-paths to network's centre node.
• Additional fault tolerance of the network.
• Scalability.

5 Simulation results

To compare the proposed CBP-Mesh with existing approaches, different network topologies are implemented and analysed using the NoCTweak [21] simulator. The results are presented in this section.

5.1 Simulation environment

The NoC topologies mesh, Torus, XD-Mesh, C2-Mesh, D-Mesh and CBP-Mesh are implemented in the NoCTweak simulator, which is an open-source, cycle-accurate tool written in SystemC [21]. NoCTweak is selected for the implementation and simulation of all networks because of the availability of large sets of workloads comprising different synthetic traffic and real embedded system application traces.
The existing source routing algorithm is used to compute the shortest path, and the NMAP algorithm is used to map embedded applications onto the processing cores of the network [21]. Hotspot, synthetic and real embedded traffic traces are applied to the proposed CBP-Mesh and its competitor topologies for comparison. The other simulator configurations used in the simulations are given in Table 4.

Table 4  Parameters of the experiments performed in NoCTweak
technology | 65 nm
operating voltage | 1.0 V
clock frequency | 1 GHz
input buffer size | 16 flits
number of virtual channels | 8
packet length (flit units) | 150
flit length | 32 bits
router | 3-stage pipeline
each simulation runs | 100,000 cycles
warm-up cycle time | 20,000 cycles
links length | 1000 μm
flit injection rate | 0.02 flits/cycle/node

5.2 Scalability analysis

The synthetic traffic traces of the hotspot workload are applied to the proposed CBP-Mesh and its competitor networks. The analysis focuses on the scalability of the network topologies in terms of latency, throughput, power and energy. Therefore, each network topology has been implemented for different symmetric mesh sizes from 3 × 3 to 9 × 9, and the results are depicted in Figs. 5a–d.
As expected, the mesh topology scales badly. Its latency increases more and more for complex networks, while the throughput drops significantly (see Figs. 5a and b). However, the mesh network has a low cost in terms of total network power and energy owing to its simple network design, as shown in Figs. 5c and d.

Table 3  Number of links and routers with different degrees for the selected 5 × 5 topologies
Topology | 3-Port | 4-Port | 5-Port | 6-Port | 7-Port | 9-Port | Nlinks
mesh | 4 | 12 | 9 | 0 | 0 | 0 | 40
Torus | 0 | 0 | 25 | 0 | 0 | 0 | 50
D-Mesh | 0 | 4 | 0 | 12 | 0 | 9 | 72
XD-Mesh | 0 | 16 | 4 | 0 | 4 | 1 | 48
C2-Mesh | 0 | 16 | 8 | 0 | 0 | 1 | 44
CBP-Mesh | 0 | 12 | 8 | 4 | 0 | 1 | 48
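The CBP-Mesh row of Table 3 can be reproduced by enumerating the links of Definitions 1 to 3 and counting the ports of every router. This is a minimal sketch with a hypothetical function name; it assumes that a router's port count equals its number of network links plus one local port towards its PE, which is an inference from the table (a mesh corner router with two links is listed as 3-port) rather than something the text states.

```python
from collections import Counter


def cbp_mesh_ports(m, n):
    """Return (number of links, {port count: number of routers}) for an m x n CBP-Mesh."""
    links = set()
    for x in range(m):
        for y in range(n):
            targets = [(x + 1, y), (x, y + 1)]                  # M-Links (south, east)
            if x % 2 == 0 and y % 2 == 0:
                targets += [(x + 2, y + 2), (x + 2, y - 2)]     # CBP-Links (SE, SW)
            for tx, ty in targets:
                if 0 <= tx < m and 0 <= ty < n:
                    links.add(((x, y), (tx, ty)))

    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1

    # Assumption: one extra local port connects the router to its PE.
    ports = Counter(deg + 1 for deg in degree.values())
    return len(links), dict(ports)


if __name__ == "__main__":
    n_links, ports = cbp_mesh_ports(5, 5)
    print(n_links)                      # 48
    for p in sorted(ports):
        print(f"{p}-port routers: {ports[p]}")
```

For the 5 × 5 case this yields twelve 4-port, eight 5-port, four 6-port and one 9-port router, matching the CBP-Mesh row above.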

Similar observations can be made for Torus. The Torus topology shows an average increase of latency and a moderate reduction in throughput for rising network complexity. However, larger networks require long interconnections, resulting in designs that are not regular. Consequently, router design and routing strategies have to be more complex, resulting in lower scalability. In contrast, XD-Mesh and C2-Mesh offer good performance parameters at a reasonable increase in power and energy. The XD-Mesh and C2-Mesh networks possess low complexity and low cost. Their performance is lower than that of D-Mesh, though. The results indicate good performance parameters in latency and throughput for D-Mesh. However, the extensive use of links and high-degree routers leads to high costs in terms of power and energy. In contrast, the proposed CBP-Mesh scales with very good performance and with considerably lower penalties than D-Mesh (see Figs. 5a–d).
As can be seen, the proposed 7 × 7 CBP-Mesh has the lowest average latency and good throughput, which is only outperformed by the costly D-Mesh and the poorly scalable Torus. Compared with the classic mesh, latency and throughput improve by 36.9 and 38.9%, while power consumption and energy increase by 13.5 and 13.6%. In comparison to D-Mesh, latency and throughput of the CBP-Mesh are 13.0 and 33.3% lower at 23.7 and 18.2% lower costs in terms of power and energy. In case of the 9 × 9 network topologies, CBP-Mesh has the lowest average latency, which is 21.8 and 45.7% lower in comparison to D-Mesh and the standard mesh, respectively. In contrast, the 11 × 11 CBP-Mesh has 34.4% lower power consumption and 35.5% lower energy in comparison to its D-Mesh counterpart.

5.3 Performance for embedded application

Besides the synthetic traffic, the NoCTweak simulator provides several real-time embedded application traces. The NMAP algorithm is selected to map all the tasks of an embedded application's task graph to the tiles of the NoC. Some real-world embedded applications selected for the analysis and comparison of topologies are given in Table 5:

Table 5  Some embedded applications with required number of cores
Embedded applications | Applications' required tasks
MPEG4 decoder | Mpeg4 with 12 cores
Wifirx baseband receiver | WiFi with 25 cores
video object plane and decoder | Vopd with 16 cores
video conference encoder | Vce with 25 cores
multimedia system | mms with 25 cores

The complete task graph of one of the chosen applications, i.e. the MPEG-4 decoder with 12 cores, showing the bandwidth requirements and the information flow among different tasks, is depicted in Fig. 6a. The NMAP algorithm is applied to map the tasks of the embedded applications onto the tiles of the networks. The placement of cores for the D-Mesh and CBP-Mesh networks is shown in Figs. 6b and c. The SDRAM (C9) has a higher traffic load than the other cores in the MPEG-4 application and therefore requires more links to connect to other cores in the network (see Fig. 6a). C9 plays a vital role in the overall performance of the network. The role of C9 is therefore illustrated for two competitors, i.e. D-Mesh and CBP-Mesh. D-Mesh provides C9 with eight direct links to connect to other PEs in the network. C9 is directly connected to C0, C1, C2, C4, C5, C8, C6 and C10 in D-Mesh, which reduces the hop count in the network. However, the traffic from C2 → C8 has to pass through C9 in order to take the shortest path, and therefore packets have to be buffered at C9. Also, the traffic from C3 → C8 may traverse the path through the C9 and C5 routers. Similarly, packets from C4 → C7 may also pass through C9. This means that C9 becomes a bottleneck in the network owing to its heavy traffic load. D-Mesh has the highest number of links in the network, which increases the complexity of the routers and the cost of the network to a great extent.
The proposed CBP-Links in CBP-Mesh provide direct connectivity between the C0 → C9 and C2 → C8 pairs of nodes. The C9 node directly connects with C0, C4, C6 and C7. The traffic from C3 → C8 takes only one hop through C2 using the M- and CBP-Link, as compared with three hops in the D-Mesh network. C5 and C10 both require only one hop to their destination node C9. CBP-Mesh distributes traffic with good balance in the network and therefore does not create a bottleneck at C9 as in the D-Mesh network. The individual latency of C9 in both D-Mesh and CBP-Mesh is compared using NoCTweak. The CBP-Mesh V9 and V10 took 37 and 75% less latencies than the D-Mesh C9 node. The average network latency, throughput, total power and energy of the networks are analysed by applying five different real-world embedded application traffic traces to six different NoC networks including the proposed network. The results for these four parameters are shown in Figs. 7a–d.
Simulation results indicate that the MPEG-4 application of a CBP-Mesh improves latency and throughput by up to 8.9 and

Fig. 7  Comparison of different applications for proposed and selected topologies


(a) Average network latency, (b) Average network throughput, (c) Total network power, (d) Energy per data transferred packets

15.7% over its predecessor C2-Mesh, while the costs in terms of power and energy increase by up to 3.6 and 8.3%. Compared with the more complex D-Mesh NoC topology, latency is reduced by 7.6%, while the throughput is up to 15.7% lower than that of D-Mesh. However, CBP-Mesh outperforms the D-Mesh topology in terms of power and energy penalty, which are 22 and 31% lower. Similarly, the proposed CBP-Mesh has been implemented with five real embedded system application workload traces and compared with the other selected topologies. Simulation results indicate that the CBP-Mesh improves latency compared with all other topologies.
Across all applications, the throughput of CBP-Mesh is higher than that of the other topologies except D-Mesh, which has a higher number of links than the CBP-Mesh. However, CBP-Mesh outperforms D-Mesh in terms of power and energy penalty, which are lower in all the applications (see Figs. 7a–d).

6 Conclusion

NoC is a promising paradigm to enable fast and reliable on-chip communication in large-scale multiprocessor systems. The performance of a NoC is driven by several parameters including the network topology. The impact of the topology further increases with increasing system complexity. The principal difference between NoC topologies is the type and the application of the interconnections, which have been classified into four basic link types in this study.
Additionally, this work proposes a new NoC topology termed CBP-Mesh that improves the standard mesh by adding new CBP-Links to the network. This modification reduces the network diameter, minimises the average number of hops and increases the bisection width of the NoC.
The proposed CBP-Mesh and five other topologies proposed in previous works are implemented using NoCTweak. Synthetic as well as five different real embedded system workloads are applied to analyse the average network latency, throughput, total power and energy of all the networks. Simulation results indicate that CBP-Mesh efficiently reduces the distance among nodes as compared with the other selected topologies. The proposed network also improves the average network latency and throughput at a lower cost of power and energy than its predecessors, i.e. mesh, Torus, C2-Mesh and XD-Mesh. Compared with its competitor D-Mesh topology, CBP-Mesh exhibits lower latency as well as lower throughput. However, CBP-Mesh outperforms the D-Mesh network in terms of power and energy penalty. Furthermore, the analytical results also indicate good scalability of the proposed network with increasing network complexity. In short, the proposed NoC topology can be used for communication among cores with lower latency and greater throughput at a reasonable penalty in terms of power and energy consumption.

7 Acknowledgment

The authors are thankful to COMSATS Institute of Information Technology for providing the platform to carry out this research work. This work was partially supported by the Brazilian agencies FAPEMIG and CNPq.

8 References

[1] Zarandi, H.R.: ‘A fault-tolerant core mapping technique in networks-on-chip’, 2013, 7, (August), pp. 238–245
[2] Sehgal, V.K., Chauhan, D.S.: ‘State observer controller design for packets flow control in networks-on-chip’, J. Supercomput., 2010, 54, (3), pp. 298–329
[3] Khawaja, S.G., Mushtaq, M.H., Khan, S.A., et al.: ‘Designing area optimized application-specific network-on-chip architectures while providing hard QoS guarantees’, PLoS One, 2015, 10, (4), pp. 1–17
[4] Pomante, L.: ‘HW/SW co-design of dedicated heterogeneous parallel systems: an extended design space exploration approach’, IET Comput. Digit. Tech., 2013, 7, (6), pp. 246–254
[5] Morgan, A.A., El-Kharashi, M.W., Elmiligi, H., et al.: ‘Unified multi-objective mapping and architecture customisation of networks-on-chip’, IET Comput. Digit. Tech., 2013, 7, (6), pp. 282–293
[6] Ju, X., Yang, L.: ‘Performance analysis and comparison of 2 × 4 network on chip topology’, Microprocess. Microsyst., 2012, 36, (6), pp. 505–509
[7] Catania, V., Mineo, A., Monteleone, S., et al.: ‘Energy efficient transceiver in wireless network on chip architectures’. Proc. DATE ‘16, 2016, pp. 1321–1326
[8] Balfour, J., Dally, W.J.: ‘Design tradeoffs for tiled CMP on-chip networks’. Proc. 20th Annual Int. Conf. Supercomputing ICS 06, 2006, vol. 28, no. 1, p. 187
[9] Kim, J., Balfour, J., Dally, W.J.: ‘Flattened butterfly topology for on-chip networks’
[10] Anjum, S., Chen, J., Yue, P., et al.: ‘Delay optimized architecture for on-chip communication’, 2009, 7, (2), pp. 104–109
[11] Ya-gang, W., Hui-min, D.U., Xu-bang, S.: ‘Topological properties and routing algorithm for semi-diagonal torus networks’, 2011, 18, (October), pp. 64–70
[12] Hu, W., Lee, S.E., Bagherzadeh, N.: ‘DMesh: a diagonally-linked mesh network-on-chip architecture’
[13] Arora, L.K.: ‘C 2 Mesh’, 2012, pp. 282–286
[14] Gulzari, U.A., Anjum, S., Agha, S.: ‘Cross by pass-mesh architecture for on-chip communication’. Proc. IEEE 9th Int. Symp. Embedded Multicore/Manycore SoCs, MCSoC 2015, 2015, pp. 267–274
[15] Ouyang, Y., Zhu, B., Liang, H., et al.: ‘Networks on chip based on diagonal interlinked mesh topology structure’, Comput. Eng., 2009
[16] Swaminathan, K., Lakshminarayanan, G., Ko, S.: ‘A novel hybrid topology for network on chip’, 2014, pp. 1–6
[17] Via, O., Insertion, L.L.: ‘It's a small world after all: NoC performance’, 2006, 14, (7), pp. 693–706
[18] Sanju, V., Chiplunkar, N., Khalid, M., et al.: ‘A performance study of 2D mesh & torus for network on chip based system’, 2013, pp. 0–4
[19] Elmiligi, H., Morgan, A., El-Kharashi, M.: ‘Power optimization for application-specific networks-on-chips: a topology-based approach’, Microprocessors, 2009
[20] Grecu, C., Ivanov, A., Pande, P., et al.: ‘Towards open network-on-chip benchmarks’. Proc. Int. Symp. Networks-on-Chips, NOCS, 2007
[21] Tran, A., Baas, B.: ‘NoCTweak: a highly parameterizable simulator for early exploration of performance and energy of networks on-chip’, 2012

9 Appendix

Link assignment algorithm for a CBP-Mesh network with size m × n. The current router node is ro(x,y), and the connecting links to the neighbour routers are l_i^N, l_i^S, l_i^E and l_i^W and the CBP-Links l_i^NE, l_i^NW, l_i^SE and l_i^SW (see Fig. 8).

Fig. 8  Link assignment algorithm for a CBP-Mesh network
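The listing of Fig. 8 is not available in the text. The following is a minimal sketch of a per-router link assignment that is consistent with Definitions 1 to 3; it is not the authors' algorithm, and the function name `assign_links` and the port labels as dictionary keys are hypothetical. The diagonal directions follow the geometric reading used for Fig. 3 (north is x − 1, east is y + 1), so Ro2,2 in a 5 × 5 network reaches Ro0,4 through its NE port.

```python
def assign_links(x, y, m, n):
    """Return {port: neighbour node} for router Ro(x,y) in an m x n CBP-Mesh."""
    # M-Links to the four direct neighbours (Definition 2).
    candidates = {
        "N": (x - 1, y),
        "S": (x + 1, y),
        "E": (x, y + 1),
        "W": (x, y - 1),
    }
    # CBP-Links exist only at routers whose coordinates are both even (Definition 3).
    if x % 2 == 0 and y % 2 == 0:
        candidates.update({
            "NW": (x - 2, y - 2),
            "NE": (x - 2, y + 2),
            "SW": (x + 2, y - 2),
            "SE": (x + 2, y + 2),
        })
    # Keep only the targets that lie inside the network (Definition 1).
    return {
        port: (tx, ty)
        for port, (tx, ty) in candidates.items()
        if 0 <= tx < m and 0 <= ty < n
    }


if __name__ == "__main__":
    print(assign_links(2, 2, 5, 5))   # centre router: four M-Links and four CBP-Links
    print(assign_links(0, 0, 5, 5))   # north-west corner: S, E and one CBP-Link (SE)
    total = sum(len(assign_links(x, y, 5, 5)) for x in range(5) for y in range(5))
    print(total // 2)                 # each link is seen from both ends
```

Summing the assigned ports over all routers and halving gives the total link count, which offers a quick consistency check against Eq. (1a).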
