
2016 Euromicro Conference on Digital System Design

Fault Tolerant Deadlock-free Adaptive Routing Algorithms for Hexagonal Networks-on-Chip
Sadia Moriam∗† and Gerhard P. Fettweis ∗†
∗ Dep. of Electrical Engineering and Information Technology / Vodafone Chair Mobile Communication Systems
† Centre for Advancing Electronics Dresden (CFAED)

Technische Universität Dresden, 01062 Dresden, Germany


Email: {sadia.moriam, gerhard.fettweis}@tu-dresden.de

Abstract—Technology scaling has allowed the integration of a large number of cores on a single chip, which significantly improves the speed of on-chip processing. Network-on-chip is the interconnection network which provides efficient and flexible communication between cores in such multi-processor systems-on-chip. However, the performance enhancements of technology scaling come at the cost of reliability, as on-chip components, particularly the network-on-chip, become increasingly prone to faults. Redundancy is the basic approach to fault tolerance, and in this paper we investigate the hexagonal on-chip network topology with redundant diagonal inter-router links, having approximately 1.5 times the number of links of the mesh topology. To evaluate the fault tolerance of the hexagonal network with wormhole-switched routing, we present deadlock-free fault tolerant routing algorithms obtained by applying the turn model and without the use of costly virtual channels. To circumvent the problem of finding the right selection of turns to prevent deadlock, we propose an approach based on the transitive closure of the channel dependency matrix. The results indicate that the hexagonal NoC with the proposed adaptive routing algorithms significantly improves NoC resilience by being able to tolerate two router faults, while the mesh NoC can tolerate only one router fault. Moreover, the proposed approach is general and can be adopted for developing adaptive routing algorithms for any regular network topology.

This work is supported in part by the German Research Foundation (DFG) within the Cluster of Excellence "Center for Advancing Electronics Dresden" (cfaed).

978-1-5090-2817-7/16 $31.00 © 2016 IEEE
DOI 10.1109/DSD.2016.71

I. INTRODUCTION

In multi-processor systems-on-chip (MPSoCs), network-on-chip or NoC is the highly scalable interconnection network for connecting modules in a packet switched communication network. Due to its efficiency and high bandwidth, network-on-chip has gained popularity over the last decade as the dominant on-chip interconnect. However, with the scaling of transistor gate sizes, components of the NoC become highly susceptible to transient and permanent faults [1], making NoC reliability a great challenge for current and future technologies.

The routing algorithm in the NoC determines the path taken by a flit from the source module to the destination module. Deterministic routing algorithms, which always provide the same path for a source-destination pair, are simple and less costly to implement. However, they cannot tolerate faults and will thus result in performance degradation in the presence of faults. Adaptive routing algorithms can adaptively select alternate routes to the destination based on congestion and/or to bypass permanently faulty links and routers. However, they are complex to design and may take non-minimal routes to the destination, particularly to achieve fault-tolerance. Nevertheless, to prevent flit loss due to faults, the routing algorithm must be adaptive.

Redundancy is the main approach to fault tolerance and different forms of redundancy are needed for tolerating different fault classes [1]. For future chips, envisioned to have hundreds to thousands of cores [2], increasing failure rates will make it necessary to include redundancy to tolerate these failures. Spatial redundancy, i.e. usage of redundant components, is needed to tolerate permanently failing components. In this paper, we apply spatial redundancy to tolerate permanent NoC failures and investigate a hexagonal or 60 degrees NoC topology obtained by adding links diagonally (in the North-East and South-West directions) to the standard mesh, as shown in Fig. 1(a). The hexagonal arrangement representation is shown in Fig. 1(b).

Fig. 1. Mesh NoC with diagonal links and representation as Hexagonal NoC. (a) Mesh topology with diagonal link. (b) Hexagonal NoC topology.

The predominant packet switching method used in NoC routers is wormhole switching. In this switching approach, several flits make up a packet and the first or the header flit determines the path, reserving the channels as it progresses forward. The remaining flits follow the header in a pipelined manner and the tail (last) flit releases the channels. The advantage gained is reduced latency as well as reduced buffer sizes [3], an important factor since the on-chip resources, especially area and power consumption, are very limited. The drawback of the wormhole switching is that it may lead to
deadlock, a situation in which several channels are blocked and no flits can progress forward. This occurs when flits occupying some channels are waiting to be granted access to the next channel and the blocking-and-waiting-for happens in a closed cycle. Deadlock is generally prevented by the routing algorithm by restricting the channels that may be consecutively requested. As a result, in wormhole switched networks the adaptive routing algorithm must be carefully designed to prevent deadlock, and this restricts the adaptivity of the algorithm in selecting alternate paths to the destination.

To determine the fault resilience of the hexagonal NoC to faulty routers, we initially developed deadlock- and livelock-free adaptive routing algorithms for this topology based on the turn model [4]. To ensure deadlock freedom, we generated the channel dependency matrix of the NoC and determined the presence of cycles in this matrix by calculating its transitive closure. Moreover, we ensured livelock freedom by limiting the number of times the flit is misrouted while being routed around faulty routers. Our investigations produced 18 different possible adaptive routing algorithms, 6 of which are unique due to symmetry. Our analysis showed that these fault-tolerant routing algorithms can tolerate up to two faulty routers located at any position. By faulty routers, we mean permanently defective units which cannot receive or transmit any flits. We present cycle accurate simulations to evaluate the performance of the algorithms in the presence and absence of router faults.

The following sections are arranged as follows: section II discusses the related works, section III discusses the fault-tolerant routing algorithms for the hexagonal NoC, section IV presents performance evaluation results and section V concludes the paper.

II. RELATED WORK

Fault tolerance of networks to permanently faulty components, both off-chip and on-chip, has been a topic of intense research over the last few years. A great portion of these works are based on adaptive routing techniques to route around faults and many of these are based on the turn model [4], which gives a methodology on how to design adaptive routing algorithms for wormhole switched networks. In [5], a fault-tolerant version of the Negative-First Routing algorithm for the 2D mesh is presented and it is shown that the algorithm can tolerate any one faulty router. Authors in [6] extend this work to tolerate any one faulty link or router. Another work which considers the turn model approach for 2D meshes is [7], in which a fault-tolerant and deadlock free routing protocol based on the odd-even turn model is presented. The faults are contained in disjoint rectangular sets called faulty blocks, comprising both faulty as well as disabled healthy nodes. Authors in [8] propose a fault-tolerant routing algorithm which is deadlock-free and able to achieve a higher throughput with a smaller number of deactivated nodes. None of the above approaches requires the use of virtual channels. The approach of sacrificing healthy nodes to route around fault regions containing both healthy and faulty nodes is considered in many works, such as in [9] [10].

Another approach to designing fault tolerant routing algorithms considers the use of virtual channels as in [11] [12], where the authors use two virtual channels to tolerate all cases of one and two faulty links and all cases of one faulty router, respectively, while keeping the network performance optimal by providing alternate minimal routes for the packet traversal. Instead of using adaptive routing, authors in [13] reconfigure the routing algorithm in the case of link failures. Authors in [14] propose an algorithm that reconfigures the routing table through reinforcement learning, and their approach is applicable to any topology and not dependent on the shape of the fault region. In [15], authors present a low-cost fault tolerant routing algorithm for efficient routing path selection using the traffic status of the NoC. Although all of the above mentioned works consider the mesh topology, which is the commonly used NoC topology, other topologies such as the torus and the hexagonal mesh have also been considered in some works. The hexagonal network is investigated in works such as [16] [17], as the hexagonal network, having a lower average hop count, is considered to have better performance than the mesh. In [16], authors present a deadlock free adaptive routing algorithm for the hexagonal torus interconnection network using three virtual channels per physical channel. In [17], authors propose a different addressing scheme for the hexagonal network which allows the turn model based adaptive routing algorithms for the mesh to be adapted to the hexagonal network. A diagonally connected mesh network is presented in [18] with an adaptive quasi-minimal routing algorithm.

In most works, the deadlock freedom verification of the routing algorithm is based on examining the channel dependency graph (CDG). According to the theorem presented in [19], for deadlock freedom the CDG must be acyclic. Authors in [20] present a method of designing application specific deadlock-free routing algorithms for any topology by using a heuristic to cut edges in the CDG and thereby preventing cycles. Our contribution in this paper is different from previous works in that we propose fault-tolerant deadlock free routing algorithms for the hexagonal NoC topology without the use of virtual channels. Moreover, we use a general approach to ensure deadlock freedom of the adaptive routing algorithm while applying the turn model.

III. HEX ADAPTIVE ROUTING ALGORITHMS

A. Application of the Turn Model

We use the turn model [4] for developing deadlock-free adaptive routing algorithms as this approach does not require the use of virtual channels, which are very costly for on-chip implementation. For the formation of adaptive routing algorithms according to this model, the channels in the network must be first partitioned into the directions in which they route packets. As deadlock is created when, due to routing, a set of channels are requested in a closed cycle, deadlock is prevented by ensuring these cycles of dependencies do not occur. According to the turn model, a turn occurs when a flit traveling in a certain direction is forwarded to a different direction. Deadlock cycles are created between turns leading to
closed cycles and so sufficient turns between these directions should be avoided to prevent cycle formations. Moreover, this set of turns should also prevent all possible simple and complex deadlock cycles formed as a result of combinations of clockwise (CW) and counter-clockwise (CCW) cycles. In the case of the hexagonal NoC, as shown in Fig. 2, there are 3 directions, each pair of which are at least 60 degrees apart. On closer inspection, it can be seen that the simplest cycles in the hexagonal network are formed due to two basic triangular cycles, as can be seen in Fig. 2. Furthermore, many different bigger deadlock cycles can form by combinations of these 2 basic cycles.

Fig. 2. Examples of some deadlock cycles forming in the 60 degrees or hexagonal NoC between the three directions.

To prevent deadlock in e.g. the CW direction, two 60° turns (i.e. two turns between two different pairs of directions) must be prevented as well as the turn formed by the combination of these two turns, i.e. a 120° turn. This is because the two triangles can be combined in 3 ways (due to the 3 sides of a triangle). Thus by preventing a turn in each triangle (between different pairs of directions) and another turn from the combination of these two triangles, deadlock is prevented. Some 180° turns can be allowed to increase adaptivity when deadlock is not created by them.

Fig. 3. Example of a set of forbidden turns to prevent deadlock.
Directions: N (+Y), NE (+ diag), E (+X), S (-Y), SW (- diag), W (-X).
Prevented turns: (CW) NE-to-S, N-to-E, NE-to-E; (CCW) N-to-SW, N-to-W, NE-to-W.

A possible set of prevented turns, necessary for avoiding deadlock, is shown in Fig. 3. The figure shows the prevented turns when the 60° NoC is represented as the standard grid mesh with diagonal links. When presenting the routing algorithms, we consider the hexagonal NoC as presented in Fig. 1(a) and use the direction names of the x-, y- and diagonal directions, as in a standard 90° system, to have easier comparison to the mesh topology. For the hexagonal NoC, our investigations showed that there are 18 possible adaptive routing algorithms, 6 of which are unique due to symmetry.

As there are many possibilities of simple and complex cycles forming due to combinations of the cycles in the CW and CCW directions, the process of obtaining the right selection of prevented turns can be quite cumbersome for any topology. To simplify the selection of the right combination of turns, we have used concepts from graph theory. By selecting a certain combination of turns in the CW and CCW directions and generating the channel adjacency matrix representation of the CDG, called channel dependency matrix (CDM), the CDM is checked for cycles by finding its transitive closure.

Fig. 4. The channel dependency matrix, A, and the Transitive Closure, TC, for 2 × 2 Mesh NoC with the indicated turn restrictions. (The figure also shows the 2 × 2 mesh with routers 1 to 4, channels c1 to c8 and the restricted turns.)

    A:
          c1 c2 c3 c4 c5 c6 c7 c8
      c1   0  0  1  1  0  0  0  0
      c2   0  0  0  0  1  1  0  0
      c3   0  0  0  0  0  0  1  1
      c4   0  0  0  0  0  0  0  0
      c5   0  0  0  0  0  0  0  0
      c6   0  0  0  0  0  0  1  1
      c7   0  0  0  1  0  0  0  0
      c8   0  0  0  0  1  0  0  0

    TC:
          c1 c2 c3 c4 c5 c6 c7 c8
      c1   0  0  1  1  1  0  1  1
      c2   0  0  0  1  1  1  1  1
      c3   0  0  0  1  1  0  1  1
      c4   0  0  0  0  0  0  0  0
      c5   0  0  0  0  0  0  0  0
      c6   0  0  0  1  1  0  1  1
      c7   0  0  0  1  0  0  0  0
      c8   0  0  0  0  1  0  0  0

Each entry A_ij of the CDM represents the dependency of channel i on channel j, i.e. whether a turn is allowed from channel i to channel j. A_ij equals 1 iff i and j are adjacent and channel j can be requested immediately after channel i. Next, by determining the transitive closure (TC) of the CDM, using standard algorithms such as the Warshall algorithm, and by detecting ones in the diagonal of the TC, the presence of cycles in the CDG is determined. This method to simplify the turn selection process is applicable to any regular network to obtain the required turn restrictions for the development of deadlock free routing algorithms. As an example, the CDM and the TC for a 2 × 2 mesh NoC with a selected set of prevented turns are shown in Fig. 4. With the selection of prevented turns North-to-East and West-to-South and the 180° turns West-to-East and North-to-South, the diagonal of the TC contains all zeros. Thus, for these prevented turns, the routing algorithm is deadlock free. However, the size of the NoC must also be taken into consideration. For the hexagonal NoC, to include
all cycles formed from the two triangles, a minimum NoC size of 3 × 3 should be used to generate the CDM.

B. Fault Tolerant Routing Algorithm

Two of the 6 adaptive routing algorithms provide at least three alternate paths for flit transmission, leaving via three output-port directions and arriving via three input-port directions, and therefore should be able to tolerate 2 faulty routers. The two routing algorithms are the W-SW-S First (or alternatively N-NE-E Last) and the E-S-SW First (or alternatively W-N-NE Last) algorithms, where E, W, S, N, NE and SW stand for the East, West, South, North, NorthEast and SouthWest directions, respectively. In the following we discuss the fault-tolerant W-SW-S First algorithm, in which flits should be routed adaptively first in the West, South or SouthWest directions and then adaptively in the North, NorthEast and East directions to reach the destination. This algorithm is also called the Negative First Fault-tolerant (HexNegFirstFT) routing algorithm, as W, SW and S are the negative x, diagonal and y directions respectively for the hexagonal NoC and all turns from the positive directions to the negative directions are prevented. The Negative-First Fault-tolerant routing algorithm for the mesh was presented in [5] and was shown to be able to tolerate all cases of 1 faulty router, as it can provide 2 alternate routes from a source to a destination node.

The turns prevented in the HexNegFirstFT algorithm are the following: (CW) E-to-SW (60°), NE-to-S (60°), E-to-S (120°) and (CCW) N-to-SW (60°), NE-to-W (60°), N-to-W (120°). In addition the following 180° turns are disallowed: E-to-W, NE-to-SW and N-to-S. Initially only local knowledge of faults is assumed, i.e. a router knows only which of its immediate neighbors are faulty. However, as we will see later, when the destination is to the NE of the current router, local knowledge is not sufficient to achieve 2 router-fault tolerance. Although in this section we describe the HexNegFirstFT algorithm, the E-S-SW algorithm also provides 3 alternate paths to the destination and is therefore also 2 router fault tolerant.

According to the algorithm, whenever two or three valid output ports are available towards a destination, the output port which has the higher path diversity is selected, in order to increase fault-tolerance. Accordingly, when the destination is to the W, S or SW of the current node, the output ports in the W, S or SW directions are chosen adaptively to reach the destination. When the destination is to the N or E or NE of the current node, the flit should be first forwarded in the negative direction, i.e. in the W, SW or S directions, adaptively, to reach a position from which 3 disjoint paths toward the destination are available. Then the positive directions, i.e. the N, NE or E directions, are selected adaptively towards the destination, as shown in Fig. 5. Since it is possible for all source-destination pairs to begin in 3 possible directions and also to reach the destination via 3 ports, the routing algorithm can tolerate any 2 router faults.

Algorithm 1 HexNegFirstFT: Destination to NE
    X_offset = X_target − X_source, Y_offset = Y_target − Y_source
    if X_offset > 1 & Y_offset > 1 & X_offset ≠ Y_offset then
        if X_offset > Y_offset then
            if neighbour(E direction) is not faulty then
                Select East
            else if neighbour(NE direction) is not faulty then
                Select NorthEast
            else if neighbour(N direction) is not faulty then
                Select North
            else
                Drop Flit
            end if
        else if X_offset < Y_offset then
            if neighbour(N direction) is not faulty then
                Select North
            else if neighbour(NE direction) is not faulty then
                Select NorthEast
            else if neighbour(E direction) is not faulty then
                Select East
            else
                Drop Flit
            end if
        end if
    else if X_offset > 1 & Y_offset == 1 then
        if neighbour(S direction) is not faulty then
            Select South
        else if neighbour(E direction) is not faulty then
            Select East
        else if neighbour(NE direction) is not faulty then
            Select NorthEast
        else
            Drop Flit
        end if
    else if X_offset == 1 & Y_offset > 1 then
        if neighbour(W direction) is not faulty then
            Select West
        else if neighbour(N direction) is not faulty then
            Select North
        else if neighbour(NE direction) is not faulty then
            Select NorthEast
        else
            Drop Flit
        end if
    else if X_offset == Y_offset then
        if neighbour(NE direction) is not faulty then
            Select NorthEast
        else if neighbour(E direction) is not faulty then
            Select East
        else if neighbour(N direction) is not faulty then
            Select North
        else
            Drop Flit
        end if
    end if

The pseudo-code for destinations to the NorthEast of the
source is given in Algorithm 1. Essentially, when the destination is to the NE of the current node, it is possible to go in three directions, i.e. N or E or NE, or even in the negative directions, but only if the previous direction of travel was not already in a positive direction. If x_offset and y_offset are both greater than 1, the three positive directions, N, NE or E, are taken adaptively. The output port direction which leads to the higher path diversity and which does not lead to a faulty router is selected, as can be seen in Algorithm 1. If x_offset is equal to y_offset, the NE output port is preferred. If the router in this direction is faulty, then either the E or N port can be selected, as both lead to the destination in an equal number of hops (Fig. 5(b)).

Fig. 5. Paths to the destination at NE of the source. (Panels (a) to (f); S: Source, D: Destination, X: Faulty Router.)

However, at this point, with only local knowledge of faults, it is not possible to ensure reaching the destination, as seen in Fig. 5(c), where if the E output port had been selected, the destination would not have been reachable. Because of this, the router requires the fault knowledge of the neighbor which is 2 hops away in the E-NE or alternatively in the N-NE direction. With this knowledge, the router is able to find a path even in the presence of two faulty routers. This problem occurs only when the destination is 2 hops to the NE of the current router. The flit has exactly 3 paths toward the destination and if the NE router is faulty, the other two paths are completely independent of each other. If either x_offset or y_offset is equal to 1, then one hop in the negative direction (to the W or to the S direction, respectively) is first taken, as can be seen in Figs. 5(d-e). If these neighbour routers are faulty, then the positive directions are used to travel to the destination.

When the destination is to the SW of the source, if the SW router is faulty, either the W or S can be chosen, as here there is more than one path towards the destination in each direction, as shown in Fig. 6. In this case, knowledge of neighbour routers' fault condition is sufficient. However, greater knowledge of non-neighbour faults would reduce the path length.

Fig. 6. Paths to the destination at SW of the source.

In the algorithm, when the flit is incoming from a positive direction of travel, i.e. through the W or S or SW input ports of the current router, then further movement to the W, S or SW direction is disallowed by the deadlock freedom requirement. When the destination is to the East or North, then first 2 hops in the negative direction, i.e. in the W or S or SW, are taken, so as to bring the destination to the North East of the current router, from which there are 3 disjoint paths to the destination. Then the rules in Algorithm 1 are followed to the destination. Some examples with different locations of faulty routers are shown in Fig. 7. As in these cases the path taken to the destination is a non-minimal one, there is some loss of performance. However, due to the diagonal link, the average path length is still lower in comparison to the mesh with the Negative First fault tolerant routing algorithm. Moreover, when minimal paths exist to the destination, e.g. for destinations to the W or S of the source, with each fault encountered the path length is increased by 2 hops for the mesh whereas it is increased by 1 hop for the hex NoC. Thus, in comparison to the mesh, the performance of the hexagonal mesh is always better in terms of average latency and fault-tolerance.

Fig. 7. Destinations to the East or North of the source. (Panels (a) and (b).)
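The port-selection order of Algorithm 1 can be illustrated with a minimal Python sketch. The function name `select_port_ne` and the dictionary-based fault interface are assumptions made for this illustration and are not taken from the authors' simulator. The sketch covers only the priority order of Algorithm 1; it does not model the restriction on the previous direction of travel or the two-hop fault knowledge discussed above.

```python
from typing import Dict, Optional


def select_port_ne(x_offset: int, y_offset: int,
                   faulty: Dict[str, bool]) -> Optional[str]:
    """Pick an output port following the priority order of Algorithm 1.

    `faulty[d]` is True when the neighbouring router in direction d is faulty;
    directions missing from the dict are treated as healthy (an assumption of
    this sketch). Returns None when the flit would be dropped.
    """
    if x_offset > 1 and y_offset > 1 and x_offset != y_offset:
        # Prefer the direction with the larger remaining offset.
        order = ["E", "NE", "N"] if x_offset > y_offset else ["N", "NE", "E"]
    elif x_offset > 1 and y_offset == 1:
        # One hop in the negative y direction is tried first.
        order = ["S", "E", "NE"]
    elif x_offset == 1 and y_offset > 1:
        # One hop in the negative x direction is tried first.
        order = ["W", "N", "NE"]
    elif x_offset == y_offset:
        # Destination lies on the diagonal: NE is preferred.
        order = ["NE", "E", "N"]
    else:
        # Offsets not covered by Algorithm 1 (destination not to the NE).
        return None

    for port in order:
        if not faulty.get(port, False):
            return port
    return None  # all candidate neighbours are faulty: drop the flit


if __name__ == "__main__":
    print(select_port_ne(3, 2, {}))            # E
    print(select_port_ne(2, 2, {"NE": True}))  # E
```

Encoding each case as an ordered candidate list keeps the sketch close to the if/else chains of the pseudo-code while making the priority order easy to compare case by case.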

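The deadlock check of Section III-A (transitive closure of the CDM, then inspection of its diagonal) can be sketched in a few lines of Python. The matrix below is the CDM of the 2 × 2 mesh as read from Fig. 4, with the last entry of row c4 taken to be 0, which is an assumption consistent with the printed transitive closure. The function names are hypothetical and the code is an illustration of the Warshall algorithm, not the authors' implementation.

```python
from typing import List

Matrix = List[List[int]]


def transitive_closure(adj: Matrix) -> Matrix:
    """Warshall's algorithm: tc[i][j] == 1 iff j is reachable from i."""
    n = len(adj)
    tc = [row[:] for row in adj]  # work on a copy
    for k in range(n):
        for i in range(n):
            if tc[i][k]:
                for j in range(n):
                    if tc[k][j]:
                        tc[i][j] = 1
    return tc


def has_cycle(adj: Matrix) -> bool:
    """A one on the diagonal of the closure means a channel depends on itself."""
    tc = transitive_closure(adj)
    return any(tc[i][i] for i in range(len(tc)))


# CDM for the 2 x 2 mesh of Fig. 4 (rows and columns c1..c8).
CDM_2X2: Matrix = [
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
]

if __name__ == "__main__":
    print(has_cycle(CDM_2X2))  # False: the chosen turn restrictions are deadlock free

    # Hypothetical extra dependency c5 -> c1 closes the loop c1 -> c3 -> c8 -> c5 -> c1.
    with_extra_turn = [row[:] for row in CDM_2X2]
    with_extra_turn[4][0] = 1
    print(has_cycle(with_extra_turn))  # True
```

For a candidate set of prevented turns, one would regenerate the CDM for the topology (at least 3 × 3 for the hexagonal NoC, as noted in Section III-A) and accept the set only if `has_cycle` returns False.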
When the path is along the West or South edge of the NoC, and a fault blocks the path to the destination, then some hops are taken around the fault to bypass it, as can be seen in Fig. 7(b) and Fig. 8. Although in such cases some forbidden turns are taken (such as the E-to-S turn in Fig. 8), deadlock does not happen, as a cycle cannot form through a faulty node at the edge.

Fig. 8. Routing along the west or south edge.

IV. PERFORMANCE EVALUATION

To evaluate the performance of the adaptive routing algorithm, we implemented the hexagonal NoC topology with the Negative First fault tolerant routing scheme in a cycle accurate simulator [21], based on a C++/SystemC simulation model. We also implemented the Negative First fault tolerant routing algorithm for the mesh NoC, in order to compare its performance to the hexagonal NoC. We evaluated the performance in terms of latency with varying flit injection rates and the resilience of the network to different numbers of router faults. Fault resilience or tolerance is measured by the average ratio of successfully delivered flits to the total number of injected flits. The faulty routers are modeled as permanently defective units. Each router is aware of the immediate neighbouring faulty routers, i.e. 4 for the mesh and 6 for the hexagonal mesh. As discussed earlier, in the hexagonal NoC, each router is also aware of the fault of the neighbour two hops away in the E-NE direction. We assumed a uniform random traffic pattern, i.e. all nodes generate packets with equal probability and random distribution. Moreover, we assumed nodes connected to faulty routers do not generate any packets and no messages are destined for these nodes. The routers have input buffers with a buffer depth of 6 flits and the packet size is kept at 5 flits.

A. Latency Evaluation

We evaluated the latency performance of the two topologies with Negative First Fault Tolerant Routing for a 4 × 4 sized NoC. The simulations were run for at least 100,000 cycles and the number of faulty routers was varied from 0 to 2. The locations of faults are chosen randomly and over all possible combinations. The results are shown in Fig. 9, which shows the average latency (averaged over all source-destination pairs) versus the flit acceptance rate. As the hexagonal topology provides shorter paths with the adaptive routing algorithm (due to the diagonal link), the average path length is shorter and therefore the average path latency is lower than that for the mesh at the same flit acceptance rate. As a result, the hexagonal NoC has a higher flit acceptance rate. In particular, with 0 and 2 faults, the hexagonal NoC is able to accept up to 0.44 and 0.28 flits/node/cycle respectively before the network becomes saturated, which is up to 46% higher than the mesh, which can accept only up to 0.3 and 0.23 flits/node/cycle respectively before becoming saturated.

Fig. 9. Latency versus flit acceptance rate for 4 × 4 NoC with uniform traffic. (Plot: latency (cycles) versus acceptance rate (flits/node/cycle); curves: HexNoC and MeshNoC with 0, 1 and 2 faults.)

B. Fault Tolerance Evaluation

For the fault tolerance investigation, we evaluated a 6 × 6 sized network and the results are depicted in Fig. 10. The locations of the faulty routers were randomly selected over 1000 iterations. The results show that while the mesh has 100% fault reliability at any position of 1 faulty router, the hexagonal NoC has 100% reliability to any position of 2 faulty routers. Note, however, that the two corner routers (upper left and bottom right) have only two input or output ports and therefore cannot be two router fault tolerant. We do not consider these two nodes in our calculation, as they are not part of the true hexagonal NoC. When the number of faulty routers is increased above 2, the reliability decreases but at a slower rate for the hexagonal topology than for the mesh topology. In particular, with 5 faulty routers, the hexagonal NoC has a resilience of 97.54%, which is 9% higher than that of the mesh.

We also investigated the fault reliability for higher numbers of faults and compared the reliability of two sizes of hexagonal NoC. The results for 6 × 6 and 8 × 8 NoCs are depicted in Fig. 11. As can be seen, for the same percentage of faults, the smaller size NoC has a higher fault tolerance than the larger sized NoC, e.g. with 25% faulty routers, the 6 × 6 network has a resilience of 87% while the 8 × 8 NoC has a resilience of 81%. This is due to the fact that for the same percentage of faulty routers, the 8 × 8 network has a greater number of faulty routers than the 6 × 6 network and also has a longer average path length. As a result, while traversing through the network, flits have a higher probability of encountering a faulty router in the 8 × 8 network than in the 6 × 6 network.
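The resilience metric and the comparison of network sizes at equal fault percentage can be made concrete with a small Python sketch. The function names are hypothetical, and the flit counts in the usage lines are made-up inputs chosen only to show the calculation, not simulation results.

```python
def fault_resilience(delivered_flits: int, injected_flits: int) -> float:
    """Resilience in percent: successfully delivered flits over injected flits."""
    if injected_flits <= 0:
        raise ValueError("injected_flits must be positive")
    return 100.0 * delivered_flits / injected_flits


def faulty_router_count(side: int, faulty_fraction: float) -> int:
    """Number of faulty routers in a side x side NoC for a given fault fraction."""
    return round(side * side * faulty_fraction)


if __name__ == "__main__":
    # At 25% faulty routers the larger network contains more faulty routers.
    print(faulty_router_count(6, 0.25))  # 9
    print(faulty_router_count(8, 0.25))  # 16

    # Illustrative counts only: 870 of 1000 injected flits delivered.
    print(fault_resilience(870, 1000))   # 87.0
```

With 25% faulty routers, a 6 × 6 network has 9 faulty routers out of 36 while an 8 × 8 network has 16 out of 64, which is the absolute difference the paragraph above refers to.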

Fig. 10. Comparison of fault resilience of 6 × 6 mesh and hexagonal NoC for different numbers of faulty routers. (Plot: fault resilience (%) versus number of faulty routers, 1 to 5; curves: Hex NoC, Mesh NoC.)

Fig. 11. Comparison of fault resilience of 6 × 6 and 8 × 8 sized NoCs for different ratios of faulty routers. (Plot: fault resilience (%) versus faulty routers (%), 0 to 50; curves: 6x6 Hex NoC, 8x8 Hex NoC.)

V. CONCLUSION

With decreasing transistor gate sizes, the network-on-chip becomes increasingly susceptible to faults. In this paper, we investigated the hexagonal NoC topology and evaluated the fault tolerance and performance of this topology in comparison to the mesh, which is the most commonly used NoC topology. We developed deadlock free fault tolerant routing algorithms for the hex NoC based on the turn model. We proposed a general approach, based on the channel dependency matrix, to ensure the right selection of turns for deadlock freedom. The results indicate that with the proposed routing algorithms, the hex NoC, with approximately 1.5 times the number of links of the mesh, significantly improves the NoC resilience by being able to tolerate any position of two faulty routers, whereas the mesh can tolerate only one faulty router. Moreover, the hex NoC has a superior performance in terms of average latency and throughput. In a 4 × 4 network, the hex NoC has 46% and 22% higher flit acceptance rate at 0 and 2 faulty routers, respectively, in comparison to the mesh at saturation point.

REFERENCES

[1] M. Radetzki, C. Feng, X. Zhao, and A. Jantsch, “Methods of fault tolerance in networks-on-chip,” ACM Computing Surveys, vol. 46, no. 1, Oct. 2013.
[2] S. Borkar, “Thousand Core Chips: A Technology Perspective,” in Proceedings of the 44th Annual Design Automation Conference, ser. DAC ’07, 2007, pp. 746–749.
[3] L. Ni and P. K. McKinley, “A survey of wormhole routing techniques in direct networks,” Computer, vol. 26, no. 2, pp. 70–78, Feb. 1993.
[4] C. J. Glass and L. M. Ni, “The turn model for adaptive routing,” Journal of the Association for Computing Machinery, vol. 41, no. 5, pp. 874–902, Sep. 1994.
[5] C. Glass and L. Ni, “Fault-tolerant wormhole routing in meshes,” in FTCS-23. Proceedings, 1993, pp. 240–249.
[6] M. Imai and T. Yoneda, “Improving dependability and performance of fully asynchronous on-chip networks,” in Proceedings of the 17th IEEE International Symposium on Asynchronous Circuits and Systems, 2011.
[7] J. Wu, “A fault-tolerant and deadlock-free routing protocol in 2d meshes based on odd-even turn model,” IEEE Transactions on Computers, vol. 52, no. 9, pp. 1154–1169, Sep. 2003.
[8] Y. Fukushima, M. Fukushi, and S. Horiguchi, “Fault-Tolerant Routing Algorithm for Network on Chip without Virtual Channels,” in 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2009.
[9] R. Boppana and S. Chalasani, “Fault-tolerant wormhole routing algorithms for mesh networks,” IEEE Transactions on Computers, vol. 44, no. 7, pp. 848–864, 1995.
[10] D. Fick et al., “Vicis: a reliable network for unreliable silicon,” in Design Automation Conference, Apr. 2009, pp. 812–816.
[11] E. Masoumeh, M. Daneshtalab, J. Plosila, and H. Tenhunen, “MAFA: Adaptive fault-tolerant routing algorithm for networks-on-chip,” in Digital System Design (DSD), 15th Euromicro Conference on, 2012, pp. 201–207.
[12] E. Masoumeh, M. Daneshtalab, and J. Plosila, “High performance fault-tolerant routing algorithm for noc-based many-core systems,” in Parallel, Distributed, and Network-Based Processing, 21st Euromicro Conference on, 2013.
[13] D. Fick, A. DeOrio, G. Chen, V. Bertacco, D. Sylvester, and D. Blaauw, “A Highly Resilient Routing Algorithm for Fault Tolerant NoCs,” in Design, Automation Test in Europe Conference Exhibition (DATE), Apr. 2009, pp. 21–26.
[14] C. Feng, Z. Lu, A. Jantsch, J. Li, and M. Zhang, “A reconfigurable fault-tolerant deflection routing algorithm based on reinforcement learning for network-on-chip,” in Third International Workshop on Network on Chip Architectures, NoCArc’10. ACM, May 2010, pp. 11–16.
[15] J. Liu, J. Harkin, Y. Li, and L. Maguire, “Low cost fault-tolerant algorithm for networks-on-chip,” Journal Microprocessors and Microsystems, vol. 39, pp. 358–372, 2015.
[16] A. Shamaei and B. Bose, “Adaptive routing in hexagonal torus interconnection networks,” in IEEE High Performance Extreme Computing Conference, Sep. 2013, pp. 10–12.
[17] H. Gu, J. Zhang, Z. Liu, and X. Tu, “Routing in Hexagonal Networks under a Corner-Based Addressing Scheme,” IEICE TRANS. INF. and SYST., vol. E89-D, no. 5, pp. 1755–1758, May 2006.
[18] W. Hu, S. Lee, and N. Bagherzadeh, “DMesh: a diagonally-linked mesh Network-on-Chip architecture,” in NoCArc, First International Workshop on Network on Chip Architectures, 2008.
[19] J. Duato, “A new theory of deadlock-free adaptive routing in wormhole networks,” Parallel and Distributed Systems, IEEE Transactions on, vol. 4, no. 12, pp. 1320–1331, Dec. 1993.
[20] V. Catania, R. Holsmark, S. Kumar, and M. Palesi, “A methodology for design of application specific deadlock-free routing algorithms for NoC systems,” in International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS). Proceedings, 2006, pp. 142–147.
[21] M. Winter and G. P. Fettweis, “A Network-on-Chip Channel Allocator for Run-Time Task Scheduling in Multi-Processor System-on-Chips,” Digital Systems Design, Euromicro Symp. on, pp. 133–140, 2008.
