Abstract—Technology scaling has allowed the integration of a large number of cores on a single chip, which significantly improves the speed of on-chip processing. Network-on-chip is the interconnection network which provides efficient and flexible communication between cores in such multi-processor systems-on-chip. However, the performance enhancements of technology scaling come at the cost of reliability as on-chip components, particularly the network-on-chip, become increasingly prone to faults. Redundancy is the basic approach to fault tolerance and in this paper we investigate the hexagonal on-chip network topology with redundant diagonal inter-router links, having approximately 1.5 times the number of links as the mesh topology. To evaluate the fault tolerance of the hexagonal network with wormhole-switched routing, we present deadlock-free fault tolerant routing algorithms obtained by applying the turn model and without the use of costly virtual channels. To circumvent the problem of finding the right selection of turns to prevent deadlock, we propose an approach based on the transitive closure of the channel dependency matrix. The results indicate that the hexagonal NoC with the proposed adaptive routing algorithms significantly improves NoC resilience by being able to tolerate two router faults, while the mesh NoC can tolerate only one router fault. Moreover, the proposed approach is general and can be adopted for developing adaptive routing algorithms for any regular network topology.

I. INTRODUCTION

In multi-processor systems-on-chip (MPSoCs), network-on-chip or NoC is the highly scalable interconnection network for connecting modules in a packet switched communication network. Due to its efficiency and high bandwidth, network-on-chip has gained popularity over the last decade as the dominant on-chip interconnect. However, with the scaling of transistor gate sizes, components of the NoC become highly susceptible to transient and permanent faults [1], making NoC reliability a great challenge for current and future technologies.

The routing algorithm in the NoC determines the path taken by a flit from the source module to the destination module. Deterministic routing algorithms which always provide the same path for a source-destination pair are simple and less costly to implement. However, they cannot tolerate faults and will thus result in performance degradation in presence of faults. Adaptive routing algorithms can adaptively select alternate routes to the destination based on congestion and/or to bypass permanently faulty links and routers. However, they are complex to design and may take non-minimal routes to the destination, particularly to achieve fault-tolerance. Nevertheless, to prevent flit loss due to faults, the routing algorithm must be adaptive.

Redundancy is the main approach to fault tolerance and different forms of redundancy are needed for tolerating different fault classes [1]. For future chips, envisioned to have hundreds to thousands of cores [2], increasing failure rates will make it necessary to include redundancy to tolerate these failures. Spatial redundancy i.e. usage of redundant components is needed to tolerate permanently failing components. In this paper, we apply spatial redundancy to tolerate permanent NoC failures and investigate a hexagonal or 60 degrees NoC topology obtained by adding links diagonally (in the North-East and South-West directions) to the standard mesh, as shown in Fig. 1(a). The hexagonal arrangement representation is shown in Fig. 1(b).

Fig. 1. Mesh NoC with diagonal links and representation as Hexagonal NoC. (a) Mesh topology with diagonal link. (b) Hexagonal NoC topology.

The predominant packet switching method used in NoC routers is wormhole switching. In this switching approach, several flits make up a packet and the first or the header flit determines the path, reserving the channels as it progresses forward. The remaining flits follow the header in a pipeline manner and the tail (last) flit releases the channels. The advantage gained is reduced latency as well as reduced buffer sizes [3], an important factor since the on-chip resources especially area and power consumption are very limited. The drawback of the wormhole switching is that it may lead to

This work is supported in part by the German Research Foundation (DFG) within the Cluster of Excellence “Center for Advancing Electronics Dresden” (cfaed).
closed cycles and so sufficient turns between these directions should be avoided to prevent cycle formations. Moreover, these set of turns should also prevent all possible simple and complex deadlock cycles formed as a result of clockwise (CW) and counter-clockwise (CCW) cycles combination. In the case of the hexagonal NoC, as shown in Fig. 2, there are 3 directions, each pair of which are at least 60 degrees apart. On closer inspection, it can be seen that the simplest cycles in hexagonal network are formed due to two basic triangular cycles, as can be seen in Fig. 2. Furthermore, many different bigger deadlock cycles can form by combinations of these 2 basic cycles.

Fig. 2. Examples of some deadlock cycles forming in the 60 degrees or hexagonal NoC between the three directions. (Legend: restricted turns.)

algorithms, we consider the hexagonal NoC as presented in Fig. 1(a) and use the direction names of the x-, y- and diagonal directions, as in a standard 90° system to have easier comparison to the mesh topology. For the hexagonal NoC, our investigations showed that there are 18 possible adaptive routing algorithms, 6 of which are unique due to symmetry.

As there are many possibilities of simple and complex cycles forming due to combinations of the cycles in the CW and CCW directions, the process of obtaining the right selection of prevented turns can be quite cumbersome for any topology. To simplify the selection of the right combination of turns, we have used concepts from graph theory. By selecting a certain combination of turns in the CW and CCW directions and generating the channel adjacency matrix representation of the CDG, called channel dependency matrix (CDM), the CDM is checked for cycles by finding its transitive closure.

(Figure: routers 1–4 connected by channels c1–c8, with the matrix A over these channels.)

$$
A =
\begin{array}{c|cccccccc}
    & c_1 & c_2 & c_3 & c_4 & c_5 & c_6 & c_7 & c_8 \\ \hline
c_1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
c_2 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
c_3 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
c_4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
c_5 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
c_6 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
c_7 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
c_8 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0
\end{array}
$$
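The cycle check on the CDM described in the text can be illustrated with a minimal Python sketch. It assumes the transitive closure is computed with Warshall's algorithm (the paper does not state which method is used) and takes the 8-channel matrix A given in the text as input. A dependency cycle exists exactly when some channel can reach itself, i.e. when the closure has a 1 on its diagonal. The function names and the extra c4-to-c1 dependency are illustrative assumptions, not taken from the paper.

```python
from typing import List

Matrix = List[List[int]]


def transitive_closure(adj: Matrix) -> Matrix:
    """Warshall's algorithm: reach[i][j] = 1 if channel j is reachable from i."""
    n = len(adj)
    reach = [row[:] for row in adj]  # copy so the input matrix is not modified
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach


def has_cycle(adj: Matrix) -> bool:
    """A cycle exists if any channel depends (transitively) on itself."""
    closure = transitive_closure(adj)
    return any(closure[i][i] for i in range(len(adj)))


# Channel dependency matrix A over c1..c8, as given in the text.
A = [
    [0, 0, 1, 1, 0, 0, 0, 0],  # c1
    [0, 0, 0, 0, 1, 1, 0, 0],  # c2
    [0, 0, 0, 0, 0, 0, 1, 1],  # c3
    [0, 0, 0, 0, 0, 0, 0, 0],  # c4
    [0, 0, 0, 0, 0, 0, 0, 0],  # c5
    [0, 0, 0, 0, 0, 0, 1, 1],  # c6
    [0, 0, 0, 1, 0, 0, 0, 0],  # c7
    [0, 0, 0, 0, 1, 0, 0, 0],  # c8
]

# Hypothetical variant: add a dependency c4 -> c1, which closes c1 -> c4 -> c1.
A_with_extra_turn = [row[:] for row in A]
A_with_extra_turn[3][0] = 1

cycle_free = not has_cycle(A)
cycle_after_extra_turn = has_cycle(A_with_extra_turn)
```

For this matrix the closure has an all-zero diagonal, so the dependencies shown are cycle-free, while the hypothetical extra dependency introduces a cycle. Warshall's algorithm costs O(n^3) in the number of channels, which is modest for the small NoC sizes needed to generate the CDM.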
all cycles formed from the two triangles, a minimum NoC size of 3 × 3 should be used to generate the CDM.

B. Fault Tolerant Routing Algorithm

Two of the 6 adaptive routing algorithms provide at least three alternate paths for flit transmission, leaving via three output-port directions and arriving via three input-port directions, and therefore should be able to tolerate 2 faulty routers. The two routing algorithms are the W-SW-S First (or alternatively N-NE-E Last) and the E-S-SW First (or alternatively W-N-NE Last) algorithms, where E, W, S, N, NE and SW stand for the East, West, South, North, NorthEast and SouthWest directions, respectively. In the following we discuss the fault-tolerant W-SW-S First algorithm, in which flits should be routed adaptively first in the West, South or SouthWest directions and then adaptively in the North, NorthEast and East directions to reach the destination. This algorithm is also called Negative First Fault-tolerant (HexNegFirstFT) routing algorithm, as W, SW and S are the negative x, diagonal and y directions respectively for the hexagonal NoC and all turns from the positive directions to the negative directions are prevented. The Negative-First Fault-tolerant routing algorithm for the mesh was presented in [5] and was shown to be able to tolerate all cases of 1 faulty router, as it can provide 2 alternate routes from a source to a destination node.

The turns prevented in the HexNegFirstFT algorithm are the following: (CW) E-to-SW (60°), NE-to-S (60°), E-to-S (120°) and (CCW) N-to-SW (60°), NE-to-W (60°), N-to-W (120°). In addition the following 180° turns are disallowed: E-to-W, NE-to-SW and N-to-S. Initially only local knowledge of faults is assumed, i.e. a router knows only which of its immediate neighbors are faulty. However, as we will see later, in the case when the destination is to the NE of the current router, local knowledge is not sufficient to have 2 router-fault tolerance. Although in this section we describe the HexNegFirstFT algorithm, the E-S-SW algorithm also provides 3 alternate paths to the destination and is therefore also 2 router fault tolerant.

According to the algorithm, whenever two or three valid output ports are available towards a destination, the output port is selected which has a higher path diversity, in order to increase fault-tolerance. Accordingly, when the destination is to the W, S or SW of the current node, the output ports in the W, S or SW directions are chosen adaptively, to reach the destination. When the destination is to the N or E or NE of the current node, the flit should be first forwarded in the negative direction, i.e. in the W, SW or S directions, adaptively, to reach a position from which 3 disjoint paths toward the destination are available. Then the positive directions, i.e. the N, NE or E directions are selected adaptively towards the destination, as shown in Fig. 5. Since it is possible for all source-destination pairs to begin in 3 possible directions and also to reach the destination via 3 ports, the routing algorithm can tolerate any 2 router faults.

The pseudo-code for destinations to the NorthEast of the source is given in Algorithm 1. Essentially, when the destination is to the NE of the current node, it is possible to go in three directions, i.e. N or E or NE or even in the negative directions, but only if the previous direction of travel was not already in a positive direction. If the x_offset and the y_offset are both greater than 1, the three positive directions, N, NE or E, are taken adaptively. The output port direction which leads to the higher path diversity and which does not lead to a faulty router is selected, as can be seen in Algorithm 1. If x_offset is equal to y_offset, the NE output port is preferred. If the router in this direction is faulty, then either the E or N port can be selected, as both lead to the destination in equal number of hops (Fig. 5(b)).

Algorithm 1 HexNegFirstFT: Destination to NE
  X_offset = X_target − X_source, Y_offset = Y_target − Y_source
  if X_offset > 1 & Y_offset > 1 & X_offset ≠ Y_offset then
    if X_offset > Y_offset then
      if neighbour(E direction) is not faulty then
        Select East
      else if neighbour(NE direction) is not faulty then
        Select NorthEast
      else if neighbour(N direction) is not faulty then
        Select North
      else
        Drop Flit
      end if
    else if X_offset < Y_offset then
      if neighbour(N direction) is not faulty then
        Select North
      else if neighbour(NE direction) is not faulty then
        Select NorthEast
      else if neighbour(E direction) is not faulty then
        Select East
      else
        Drop Flit
      end if
    end if
  else if X_offset > 1 & Y_offset == 1 then
    if neighbour(S direction) is not faulty then
      Select South
    else if neighbour(E direction) is not faulty then
      Select East
    else if neighbour(NE direction) is not faulty then
      Select NorthEast
    else
      Drop Flit
    end if
  else if X_offset == 1 & Y_offset > 1 then
    if neighbour(W direction) is not faulty then
      Select West
    else if neighbour(N direction) is not faulty then
      Select North
    else if neighbour(NE direction) is not faulty then
      Select NorthEast
    else
      Drop Flit
    end if
  else if X_offset == Y_offset then
    if neighbour(NE direction) is not faulty then
      Select NorthEast
    else if neighbour(E direction) is not faulty then
      Select East
    else if neighbour(N direction) is not faulty then
      Select North
    else
      Drop Flit
    end if
  end if

Fig. 5. Paths to the destination at NE of the source. Panels (a)–(f); S: Source, D: Destination, X: Faulty Router.

However, at this point, with only local knowledge of faults, it is not possible to ensure reaching the destination, as seen in Fig. 5(c), where if the E output port had been selected, the destination would not have been reachable. Because of this, the router requires the fault knowledge of the neighbor which is 2 hops

directions, respectively) are first taken, as can be seen in Figs.

When the destination is to the SW of the source, if the SW router is faulty, either the W or S can be chosen, as here there is more than 1 path towards the destination in each direction, as shown in Fig. 6. In this case, knowledge of neighbour routers’ fault condition is sufficient. However, greater knowledge of non-neighbour faults would reduce the path length.

Fig. 6. Paths to the destination at SW of the source.

In the algorithm, when the flit is incoming from a positive direction of travel, i.e. through the W or S or SW input ports of the current router, then further movement to the W, S or SW direction is disallowed by the deadlock freedom requirement. When the destination is to the East or North, then first 2 hops in the negative direction, i.e. in the W or S or SW, are taken, so as to bring the destination to the North East of the current router, from which there are 3 disjoint paths to the destination. Then the rules in Algorithm 1 are followed to the destination. Some examples with different locations of faulty routers are shown in Fig. 7. As in these cases the path taken to the destination is a non-minimal one, there is some loss of performance. However, due to the diagonal link, the average path length is still lower in comparison to the mesh with the Negative First fault tolerant routing algorithm. Moreover, when minimal paths exist to the destination, e.g. for destinations to the W or S of the source, with each fault encountered the path length is increased by 2 hops for the mesh whereas it is increased by 1 hop for the hex NoC. Thus, in comparison to the mesh the performance of the hexagonal mesh is always higher in terms of average latency and fault-tolerance.
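The prevented turns listed for HexNegFirstFT in Section B can be captured as a small lookup, sketched below in Python. The sketch checks that the nine listed turns are exactly the turns from a positive direction (N, NE, E) to a negative direction (W, SW, S), and tests whether a sequence of hop directions respects the restriction. The names `PREVENTED_TURNS` and `path_is_legal` are illustrative assumptions, and the sketch deliberately ignores the edge-of-network exceptions discussed in the text.

```python
from itertools import product
from typing import List

POSITIVE = ("N", "NE", "E")
NEGATIVE = ("W", "SW", "S")

# (incoming travel direction, outgoing travel direction) pairs from Section B.
PREVENTED_TURNS = {
    ("E", "SW"), ("NE", "S"), ("E", "S"),   # clockwise turns
    ("N", "SW"), ("NE", "W"), ("N", "W"),   # counter-clockwise turns
    ("E", "W"), ("NE", "SW"), ("N", "S"),   # 180-degree turns
}

# Consistency check: the listed turns are all positive-to-negative turns.
listed_equals_pos_to_neg = PREVENTED_TURNS == set(product(POSITIVE, NEGATIVE))


def path_is_legal(directions: List[str]) -> bool:
    """Return True if no consecutive pair of hops forms a prevented turn."""
    return all(
        (prev, nxt) not in PREVENTED_TURNS
        for prev, nxt in zip(directions, directions[1:])
    )


negative_first_ok = path_is_legal(["W", "S", "N", "NE", "E"])  # negative hops first
positive_then_negative_ok = path_is_legal(["N", "W"])          # N-to-W is prevented
```

Because every prevented turn goes from a positive to a negative direction, a legal path consists of a (possibly empty) run of negative hops followed by a run of positive hops, which is the negative-first property the algorithm's name refers to.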
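The port preference order of Algorithm 1 can also be written as a short Python sketch. It mirrors only the branches of the listing: it does not model the check on the incoming direction of travel, nor the two-hop fault knowledge that the text says is additionally required. The function name `select_port_ne` and the `is_faulty` callback are hypothetical; returning `None` stands for dropping the flit, and offsets not covered by the listing raise an error.

```python
from typing import Callable, Optional


def select_port_ne(
    x_offset: int,
    y_offset: int,
    is_faulty: Callable[[str], bool],
) -> Optional[str]:
    """Pick an output port for a destination to the NE, or None to drop the flit.

    is_faulty(port) is assumed to report whether the neighbour router
    in the given direction ("N", "NE", "E", "S", "W") is faulty.
    """
    if x_offset > 1 and y_offset > 1 and x_offset != y_offset:
        # Prefer the direction with the larger remaining offset.
        order = ["E", "NE", "N"] if x_offset > y_offset else ["N", "NE", "E"]
    elif x_offset > 1 and y_offset == 1:
        order = ["S", "E", "NE"]
    elif x_offset == 1 and y_offset > 1:
        order = ["W", "N", "NE"]
    elif x_offset == y_offset:
        order = ["NE", "E", "N"]
    else:
        raise ValueError("offsets not covered by Algorithm 1")

    for port in order:
        if not is_faulty(port):
            return port
    return None  # all candidate neighbours are faulty: drop the flit


def faults(*ports: str) -> Callable[[str], bool]:
    """Helper for the examples: build an is_faulty callback from a port list."""
    return lambda port: port in ports


no_fault_choice = select_port_ne(3, 2, faults())            # East preferred
east_faulty_choice = select_port_ne(3, 2, faults("E"))      # falls back to NE
diagonal_faulty_choice = select_port_ne(2, 2, faults("NE")) # falls back to East
all_faulty_choice = select_port_ne(2, 3, faults("N", "NE", "E"))
```

Expressing each branch as an ordered candidate list keeps the fault check in one place, so changing a preference order only requires editing the list for that branch.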
When the path is along the West or South edge of the NoC, and a fault blocks the path to the destination, then some hops are taken around the fault to bypass it, as can be seen in Fig. 7(b) and Fig. 8. Although in such cases some forbidden turns are taken (such as the E-to-S turn in Fig. 8), deadlock does not happen, as a cycle cannot form through a faulty node at the edge.

IV. PERFORMANCE EVALUATION

To evaluate the performance of the adaptive routing algorithm, we implemented the hexagonal NoC topology with the Negative First fault tolerant routing scheme in a cycle accurate simulator [21], based on a C++/SystemC simulation model. We also implemented the Negative First fault tolerant routing algorithm for the mesh NoC, in order to compare its performance to the hexagonal NoC. We evaluated the performance in terms of latency with varying flit injection rates and the resilience of the network to different numbers of router faults. Fault resilience or tolerance is measured by the average ratio of successfully delivered flits to the total number of injected flits. The faulty routers are modeled as permanently defective units. Each router is aware of the immediate neighbouring faulty routers, i.e. 4 for the mesh and 6 for the hexagonal mesh. As discussed earlier, in the hexagonal NoC, each router is also aware of the fault of the neighbour two hops away in the E-NE direction. We assumed a uniform random traffic pattern, i.e. all nodes generate packets with equal probability and random distribution. Moreover, we assumed nodes connected to faulty routers do not generate any packets and no messages are destined for these nodes. The routers have input buffers with buffer depth of 6 flits and the packet size is kept at 5 flits.

A. Latency Evaluation

We evaluated the latency performance of the two topologies with Negative First Fault Tolerant Routing for a 4 × 4 sized NoC. The simulations were run for at least 100,000 cycles and the number of faulty routers was varied from 0 to 2. The locations of faults are chosen randomly and over all possible combinations. The results are shown in Fig. 9, which shows the average latency (averaged over all source-destination pairs) versus the flit acceptance rate. As the hexagonal topology provides shorter paths with the adaptive routing algorithm (due to the diagonal link), the average path length is shorter and therefore the average path latency is lower than that for the mesh at the same flit acceptance rate. As a result, the hexagonal NoC has a higher flit acceptance rate. In particular with 0 and 2 faults, the hexagonal NoC is able to accept up to 0.44 and 0.28 flits/node/cycle respectively before the network becomes saturated, which is up to 46% higher than the mesh, which can accept only up to 0.3 and 0.23 flits/node/cycle respectively before becoming saturated.

Fig. 9. Latency versus flit acceptance rate for 4 × 4 NoC with uniform traffic. (Axes: Latency (Cycles) versus Acceptance Rate (Flits/Node/Cycle); curves: HexNoC and MeshNoC with 0, 1 and 2 faults.)

B. Fault Tolerance Evaluation

For the fault tolerance investigation, we evaluated a 6 × 6 sized network and the results are depicted in Fig. 10. The locations of the faulty routers were randomly selected over 1000 iterations. The results show that while the mesh has 100% fault reliability at any position of 1 faulty router, the hexagonal NoC has 100% reliability to any position of 2 faulty routers. Note, however, that the two corner routers (upper left and bottom right) have only two input or output ports and therefore cannot be two router fault tolerant. We do not consider these two nodes in our calculation, as they are not part of the true hexagonal NoC. When the number of faulty routers is increased above 2, the reliability decreases but at a slower rate for the hexagonal topology than for the mesh topology. In particular, with 5 faulty routers, the hexagonal NoC has a resilience of 97.54% which is 9% higher than that of the mesh.

We also investigated the fault reliability for higher numbers of faults and compared the reliability of two sizes of hexagonal NoC. The results for 6 × 6 and 8 × 8 NoCs are depicted in Fig. 11. As can be seen, for the same percentage of faults, the smaller size NoC has a higher fault tolerance than the larger sized NoC, e.g. with 25% faulty routers, the 6 × 6 network has a resilience of 87% while the 8 × 8 NoC has a resilience of 81%. This is due to the fact that for the same percentage of faulty routers, the 8 × 8 network has a greater number of faulty routers than the 6 × 6 network and also has a longer average path length. As a result, while traversing through the network, flits have a higher probability of encountering a faulty router in the 8 × 8 network than in the 6 × 6 network.
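The percentages quoted in this section can be cross-checked with simple arithmetic, sketched below in Python. The block recomputes the relative gain in saturation acceptance rate from the flits/node/cycle values stated in Section IV-A, and the absolute number of faulty routers that 25% corresponds to in the two network sizes of Section IV-B. The function and variable names are illustrative only; no simulation data beyond the figures stated in the text is used.

```python
def relative_gain_percent(hex_rate: float, mesh_rate: float) -> float:
    """Relative improvement of the hexagonal NoC over the mesh, in percent."""
    return (hex_rate - mesh_rate) / mesh_rate * 100.0


# Saturation acceptance rates (flits/node/cycle) stated for the 4 x 4 NoC.
gain_0_faults = relative_gain_percent(0.44, 0.30)  # about 46.7 %
gain_2_faults = relative_gain_percent(0.28, 0.23)  # about 21.7 %

# Number of faulty routers corresponding to 25 % of all routers.
faulty_routers_6x6 = (6 * 6) * 25 // 100  # 9 of 36 routers
faulty_routers_8x8 = (8 * 8) * 25 // 100  # 16 of 64 routers
```

The gains come out at roughly 46.7% and 21.7%, consistent with the quoted "up to 46%" figure for the fault-free case. At 25% faulty routers the 8 × 8 network contains 16 faulty routers against 9 in the 6 × 6 network, which is the larger absolute fault count the text refers to.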
Fig. 10. Comparison of fault resilience of 6 × 6 mesh and hexagonal NoC for different numbers of faulty routers. (Axes: Fault resilience (%) versus Number of faulty routers; curves: Hex NoC, Mesh NoC.)

Fig. 11. Comparison of fault resilience of 6 × 6 and 8 × 8 sized NoCs for different ratios of faulty routers. (Axes: Fault Resilience (%) versus Faulty Routers (%); curves: 6x6 Hex NoC, 8x8 Hex NoC.)

V. CONCLUSION

With decreasing transistor gate sizes, the network-on-chip becomes increasingly susceptible to faults. In this paper, we investigated the hexagonal NoC topology and evaluated the fault tolerance and performance of this topology in comparison to the mesh, which is the most commonly used NoC topology. We developed deadlock free fault tolerant routing algorithms for the hex NoC based on the turn model. We proposed a general approach to ensure the right selection of turns to have deadlock freedom based on the channel dependency matrix. The results indicate that with the proposed routing algorithms, the hex NoC, with 1.5 times greater number of links than the mesh, significantly improves the NoC resilience by being able to tolerate any position of two faulty routers, whereas the mesh can tolerate only one faulty router. Moreover, the hex NoC has a superior performance in terms of average latency and throughput. In a 4 × 4 network, the hex NoC has 46% and 22% higher flit acceptance rate at 0 and 2 faulty routers, respectively, in comparison to the mesh at saturation point.

REFERENCES

[1] M. Radetzki, C. Feng, X. Zhao, and A. Jantsch, “Methods of fault tolerance in networks-on-chip,” ACM Computing Surveys, vol. 46, no. 1, Oct 2013.
[2] S. Borkar, “Thousand Core Chips: A Technology Perspective,” in Proceedings of the 44th Annual Design Automation Conference, ser. DAC ’07, 2007, pp. 746–749.
[3] L. Ni and P. K. McKinley, “A survey of wormhole routing techniques in direct networks,” Computer, vol. 26, no. 2, pp. 70–78, Feb. 1993.
[4] C. J. Glass and L. M. Ni, “The turn model for adaptive routing,” Association for Computer Machinery, vol. 41, no. 5, pp. 874–902, Sep. 1994.
[5] C. Glass and L. Ni, “Fault-tolerant wormhole routing in meshes,” in FTCS23. Proceedings, 1993, pp. 240–249.
[6] M. Imai and T. Yoneda, “Improving dependability and performance of fully asynchronous on-chip networks,” in Proceedings of the 17th IEEE International Symposium on Asynchronous Circuits and Systems, 2011.
[7] J. Wu, “A fault-tolerant and deadlock-free routing protocol in 2d meshes based on odd-even turn model,” IEEE Transaction on Computers, vol. 52, no. 9, pp. 1154–1169, Sep. 2003.
[8] Y. Fukushima, M. Fukushi, and S. Horiguchi, “Fault-Tolerant Routing Algorithm for Network on Chip without Virtual Channels,” in 24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2009.
[9] R. Boppana and S. Chalasani, “Fault-tolerant wormhole routing algorithms for mesh networks,” IEEE Transaction on Computers, vol. 44, no. 7, pp. 848–864, 1995.
[10] D. Fick et al., “Vicis: a reliable network for unreliable silicon,” in Design Automation Conference, April 2009, pp. 812–816.
[11] E. Masoumeh, M. Daneshtalab, J. Plosila, and H. Tenhunen, “MAFA: Adaptive fault-tolerant routing algorithm for networks-on-chip,” in Digital System Design (DSD), 15th Euromicro Conference on, 2012, pp. 201–207.
[12] E. Masoumeh, M. Daneshtalab, and J. Plosila, “High performance fault-tolerant routing algorithm for noc-based many-core systems,” in Parallel, Distributed and Network-Based Processing, 21st Euromicro Conference on, 2013.
[13] D. Fick, A. DeOrio, G. Chen, V. Bertacco, D. Sylvester, and D. Blaauw, “A Highly Resilient Routing Algorithm for Fault Tolerant NoCs,” in Design, Automation Test in Europe Conference Exhibition (DATE), 2009, April 2009, pp. 21–26.
[14] C. Feng, Z. Lu, A. Jantsch, J. Li, and M. Zhang, “A reconfigurable fault-tolerant deflection routing algorithm based on reinforcement learning for network-on-chip,” in Third International Workshop on Network on Chip Architectures, NoCArc’10. ACM, May 2010, pp. 11–16.
[15] J. Liu, J. Harkin, Y. Li, and L. Maguire, “Low cost fault-tolerant algorithm for networks-on-chip,” Journal Microprocessors and Microsystems, vol. 39, pp. 358–372, 2015.
[16] A. Shamaei and B. Bose, “Adaptive routing in hexagonal torus interconnection networks,” in IEEE High Performance Extreme Computing Conference, Sep 2013, pp. 10–12.
[17] H. Gu, J. Zhang, Z. Liu, and X. Tu, “Routing in Hexagonal Networks under a Corner-Based Addressing Scheme,” IEICE TRANS. INF. and SYST., vol. E89-D, no. 5, pp. 1755–1758, May 2006.
[18] W. Hu, S. Lee, and N. Bagherzadeh, “DMesh: a diagonally-linked mesh Network-on-Chip architecture,” in NoCArc, First International Workshop on Network on Chip Architectures, 2008.
[19] J. Duato, “A new theory of deadlock-free adaptive routing in wormhole networks,” Parallel and Distributed Systems, IEEE Transactions on, vol. 4, no. 12, pp. 1320–1331, Dec 1993.
[20] V. Catania, R. Holsmark, S. Kumar, and M. Palesi, “A methodology for design of application specific deadlock-free routing algorithms for NoC systems,” in International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS). Proceedings, 2006, pp. 142–147.
[21] M. Winter and G. P. Fettweis, “A Network-on-Chip Channel Allocator for Run-Time Task Scheduling in Multi-Processor System-on-Chips,” Digital Systems Design, Euromicro Symp. on, pp. 133–140, 2008.