
Computers and Electrical Engineering 38 (2012) 801–810



journal homepage: www.elsevier.com/locate/compeleceng

A novel 3D NoC architecture based on De Bruijn graph


Yiou Chen ⇑, Jianhao Hu, Xiang Ling, Tingting Huang
National Key Lab of Science and Technology on Communications, University of Electronic Science and Technology of China, Chengdu 610054, China

Article info

Article history:
Available online 15 December 2011

Abstract

Networks on Chip (NoC) and 3-Dimensional Integrated Circuits (3D IC) have been proposed as solutions to the
ever-growing communication problem in System on Chip (SoC). Most contemporary 3D architectures are based on the
Mesh topology, which fails to achieve small latency and power consumption because of its inherently large network
diameter. Moreover, conventional XY routing lacks fault tolerance. In this paper, we propose a new 3D NoC
architecture that adopts the De Bruijn graph as the topology in physical horizontal planes, leveraging its
advantages of small latency, simple routing, low power, and great scalability. We employ an enhanced pillar
structure for vertical interconnection. We design two shifting based routing algorithms to meet separate
performance requirements in latency and computing complexity. We also use fault tolerant routing to guarantee
reliable data transmission. Our simulation results show that the proposed 3D NoC architecture achieves better
network performance and power efficiency than the 3D Mesh and XNoTs topologies.
© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

The complexity of Systems on Chip (SoC) rises with the rapid development of semiconductor technology. In the future,
hundreds or even thousands of processors will be integrated in a single chip. Traditional buses can no longer meet the
communication requirements of future SoCs. Accordingly, Network on Chip (NoC) has emerged as a promising candidate for
the on-chip communication system [1–3] owing to its high throughput, parallel computing, expandability, and reusability.
NoCs in 2-Dimensional (2D) ICs limit the on-chip communication performance [4–6]. Fortunately, 3-Dimensional (3D)
NoCs with Through Silicon Vias (TSV) for vertical chip interconnection [6–8] have been shown to significantly improve
NoC performance [4,5,9–11] thanks to the delay and power reduction offered by 3D packaging technology.
However, most contemporary 3D NoC architectures are based on the Mesh topology, which fails to achieve small latency
and power because of its large network diameter. Also, the XY routing used in the Mesh topology lacks fault
tolerance. Previous studies [12,13] show that the De Bruijn graph provides a reliable network topology with small
diameter and simple routing. Therefore, a De Bruijn graph based architecture for 3D NoC can support fault tolerant
routing and lower network latency and power consumption. In this paper, we propose a new 3D NoC architecture based on
the De Bruijn graph. Fig. 1 shows an example of 64 nodes, with 16 nodes on each plane. The De Bruijn graph is used to
construct the horizontal topology, while an enhanced pillar structure is used in the vertical topology. Our simulation
results show that this structure achieves lower network latency than most Mesh-based architectures.
The rest of the paper is organized as follows: Section 2 reviews the literature on 3D NoC architectures. Section 3
introduces the De Bruijn graph. Our proposed 3D NoC architecture based on the De Bruijn graph is detailed in Section 4.
Section 5 presents the performance metrics and simulation results under different traffic models. Finally, we conclude
the paper in Section 6.

Reviews processed and proposed for publication to the Editor-in-Chief by Guest Editor Prof. Fangyang Shen.
⇑ Corresponding author. Tel.: +86 28 61830329; fax: +86 28 61830326.
E-mail address: chenyiou2009@gmail.com (Y. Chen).

0045-7906/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compeleceng.2011.11.016

[Figure omitted; labels: enhanced pillar, router, switch (S), horizontal plane network.]

Fig. 1. The 3D architecture based on De Bruijn graph.

2. Related work

Feero and Pande [4] introduced four 3D architectures: 3D Mesh, Stacked Mesh, Ciliated 3D Mesh, and 3D BFT, as shown in
Fig. 2. 3D Mesh is a direct extension of the 2D Mesh structure, and Stacked Mesh and Ciliated 3D Mesh further enhance
it. In particular, Stacked Mesh connects multiple 2D Mesh layers using a bus spanning the entire vertical direction of
the chip. It takes advantage of the short inter-layer distance of 3D packaging to achieve smaller network latency than
3D Mesh, but inherits all the drawbacks of bus based connection schemes. Ciliated 3D Mesh is a Mesh network with
multiple planes for processor cores and one or several planes for routers. Compared with 3D Mesh, it achieves higher
network utilization and smaller network latency, but incurs network congestion and complicates the routers. 3D BFT has
the same topology as 2D BFT (Butterfly Fat Tree). It maps the 2D BFT network onto a multi-plane 3D NoC, which
simplifies wire routing and reduces the longest inter-router path by a factor of at least two. Nevertheless, 3D BFT is
difficult to implement owing to its irregular and complex connection topology.

[Figure omitted; panels: 3D_Mesh, Stacked Mesh, Ciliated 3D Mesh, 3D BFT; legend: PE, router, vertical node.]

Fig. 2. Mesh-based 3D network and tree-based 3D network.

[Figure omitted; labels: crossbar switch, pillar.]

Fig. 3. XNoTs architecture.



Matsutani et al. [11] proposed the XNoTs architecture, which consists of multiple network layers tightly connected via
crossbar switches. Mesh is used for the horizontal topology and pillars for the vertical topology, as shown in Fig. 3.
A pillar consists of the cores and routers with the same horizontal coordinate on different planes, and a crossbar
switch for data transmission among the cores. Matsutani et al. also proposed a deadlock-free routing scheme that
restricts inter-plane packet transfer from a lower plane to a higher plane. However, this architecture has several
drawbacks. Firstly, XNoTs is a Mesh based topology with a large diameter, which results in long routing paths.
Secondly, indirect inter-core data transmission results in large network latency. Finally, XNoTs is not fault
tolerant.

3. Properties of De Bruijn graph

A De Bruijn graph [12,13] DB (d, k) has $N = d^k$ nodes with diameter k and degree 2d. Any two nodes with identifiers
i and j are connected to each other if either one of the following equations holds:

$$i = (d \cdot j + r) \pmod{N}, \quad r = 0, 1, \ldots, d-1 \tag{1}$$

$$j = (d \cdot i + r) \pmod{N}, \quad r = 0, 1, \ldots, d-1 \tag{2}$$

The identifiers correspond to the states of a shift register of k d-ary digits, which changes its state by a shifting
operation. Consider a node with identifier $(i_{k-1} i_{k-2} \ldots i_1 i_0)$, where $i_j \in \{0, 1, \ldots, d-1\}$
and $0 \le j \le k-1$. Its neighbors are $(i_{k-2} \ldots i_1 i_0\, p)$ and $(p\, i_{k-1} i_{k-2} \ldots i_1)$, where
$p \in \{0, 1, \ldots, d-1\}$. Fig. 4 shows the connection of DB (2, 4).
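The adjacency rule of Eqs. (1) and (2) can be checked numerically. The following is a minimal sketch, assuming node identifiers are stored as plain integers; the function name `de_bruijn_neighbors` is introduced here for illustration only and does not come from the paper.

```python
def de_bruijn_neighbors(node: int, d: int, k: int) -> tuple[set[int], set[int]]:
    """Return (left-shift neighbors, right-shift neighbors) of `node` in DB(d, k)."""
    n = d ** k
    # Left shift: drop the most significant digit and append a padding digit p.
    # These are the nodes j = (d*i + r) mod N of Eq. (2), with i = node.
    left = {(d * node + p) % n for p in range(d)}
    # Right shift: drop the least significant digit and prepend a padding digit p.
    # These are the nodes j that satisfy Eq. (1), i = (d*j + r) mod N, with i = node.
    right = {node // d + p * d ** (k - 1) for p in range(d)}
    return left, right


left, right = de_bruijn_neighbors(4, d=2, k=4)
print(sorted(left), sorted(right))  # [8, 9] [2, 10]
```

For node 4 (0100) in DB (2, 4), the left shifts give 1000 and 1001, and the right shifts give 0010 and 1010, so the node has 2d = 4 neighbors.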
Hosseinabady et al. [12] proposed the design of NoC based on De Bruijn graph. Compared to that work, our paper extends
the performance of the De Bruijn graph in the following ways:

(1) Our paper proposes a significantly simpler routing algorithm and router architecture than those in Ref. [12]. Both
Ref. [12] and our paper use source routing, which incurs the highest area overhead in routers. As shown in Fig. 5,
our paper removes the additional addressing module of Ref. [12], which converts identifiers to routing addresses,
since we use identifiers as the addresses. Moreover, our paper employs two shift registers rather than the
overcomplicated binary tree of Ref. [12] in routing calculation.
(2) The network in our paper has better scalability than that of Ref. [12]. The design of Hosseinabady et al. [12]
inevitably incurs large calculation logic when converting identifiers to addresses. Our paper does not need an
extra addressing process.
(3) Our paper proposes two alternative routing algorithms to suit different performance requirements. The General
Shifting based Routing Algorithm (GSRA), which features low cost, can be used to meet a tight area budget. The
Shortest Shifting based Routing Algorithm (SSRA), which achieves the shortest routing path by iterative calculation,
is used in high performance designs. By contrast, Ref. [12] lacks specific routing algorithms.

[Figure omitted; nodes 0 to 15 of DB (2, 4).]

Fig. 4. Connection of DB (2, 4).

[Figure omitted; block diagrams. (a) Router in reference [12]: identifier, addressing, address, generation of two
binary trees, path calculation in binary tree 1 (Path 1) and binary tree 2 (Path 2), comparator, routing path.
(b) Router in this paper: identifier (address), left shift register (Path 1) and right shift register (Path 2),
comparator, routing path.]

Fig. 5. Schematic diagrams of router comparison.


4. 3D NoC architecture based on De Bruijn graph

4.1. Topology

Definition 1: A Horizontal Plane Network (HPN) is a network on a physical horizontal plane, which corresponds to a
wafer layer of the 3D IC. An HPN can adopt an arbitrary topology. We assume the same HPN topology for all layers of the
3D NoC.
Definition 2: An Enhanced Pillar is a vertical structure consisting of the PE cores and routers with the same
horizontal coordinate on different planes, which are connected by a crossbar switch as shown in Fig. 6.
Compared with the pillar in Ref. [11], the enhanced pillar has connections between routers and their corresponding PE
cores for exchanging data packets. Thus, the enhanced pillar achieves higher data transfer efficiency in both the
horizontal and vertical directions, and also reduces the traffic load of the crossbar switch.
Our proposed architecture is called DB_EP for simplicity. In this architecture, the De Bruijn graph is used to
establish the topology of routers on the same HPN, and the enhanced pillar is used to provide the vertical connection
between different horizontal planes.

4.2. Shifting based Routing Algorithms

A router and its corresponding PE core have the same network address: (HPN address, z). The HPN address is the De Bruijn
address on the HPN, and z is the vertical coordinate. We use a dimension-order based routing algorithm in our
architecture. In data transmission, the source core sends data packets to its corresponding router (the source router).
The packets are then transmitted to the destination router through the HPN. At the destination router, if the source
and destination cores are located on the same plane, the packets are delivered directly to the destination core.
Otherwise they are sent to the switch, which delivers them to the destination core.
In DB (d, k), every node i has 2d neighbors. The neighbors' addresses can be obtained by left or right shifting one bit
into i's address. The shifted-in bit is called the padding bit. Furthermore, an N-bit address (N here being the address
length) can be shifted N times to generate an arbitrary N-bit address. Thus, we can reach any destination address by
shifting the source address. The intermediate shifting results give the routing path from source to destination.
In this paper, we design two Shifting based Routing Algorithms (SRA) for the HPN: the General SRA (GSRA) and the
Shortest SRA (SSRA). The pseudo code of SRA, GSRA, and SSRA is shown in Table 1.

[Figure omitted; labels: PE, router, switch (S).]

Fig. 6. Enhanced pillar structure.

Fig. 7. An example of GSRA and SSRA.



Table 1
Pseudo code of SRA, GSRA, and SSRA.

SRA
  # left shifting (LS)
  for h = N - 1; h > 0; h-- do
    if the lowest h bits of source address = the highest h bits of destination address then
      F = h + 1;
      break;
    end if
  end for
  old address = source address;
  for F = h + 1; F < N + 1; F++ do
    new address = {old address[N - 2:0], destination address[N - F]};
    left intermediate address[F - (h + 1)] = new address;
    old address = new address;
  end for
  # right shifting (RS)
  for h = N - 1; h > 0; h-- do
    if the highest h bits of source address = the lowest h bits of destination address then
      F = h;
      break;
    end if
  end for
  old address = source address;
  for F = h; F < N; F++ do
    new address = {destination address[F], old address[N - 1:1]};
    right intermediate address[F - h] = new address;
    old address = new address;
  end for
GSRA
  adopt SRA to calculate left and right routing paths;
  choose the shorter one between left and right routing paths.
SSRA
  adopt SRA to calculate left and right routing paths;
  if the length of either path < 3 then
    break;
  else
    for h = N - 1; h > 0; h-- do
      if the highest h bits of source address = the lowest h bits of destination address then
        F = h;
        break;
      end if
    end for
    old address = source address;
    for F = h; F < N; F++ do
      new address = {destination address[F], old address[N - 1:1]};
      right intermediate address[F - h] = new address;
      old address = new address;
    end for
  end if
  choose the shortest path from the candidates
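
The SRA and GSRA entries of Table 1 can be rendered as runnable code. The following is a minimal sketch for the binary case (d = 2), assuming addresses are given as N-character bit strings with the most significant bit first; the function names `left_path`, `right_path`, and `gsra` are illustrative and not taken from the paper, and the tie-breaking rule (prefer the left path) is an assumption.

```python
def left_path(src: str, dst: str) -> list[str]:
    """Left-shifting route: reuse the longest suffix of src that is a prefix of dst."""
    n = len(src)
    # Largest h in N-1..1 with lowest h bits of src == highest h bits of dst, else 0.
    h = next((h for h in range(n - 1, 0, -1) if src[n - h:] == dst[:h]), 0)
    path, cur = [], src
    for bit in dst[h:]:              # remaining destination bits act as padding bits
        cur = cur[1:] + bit          # shift left, pad at the least significant end
        path.append(cur)
    return path


def right_path(src: str, dst: str) -> list[str]:
    """Right-shifting route: reuse the longest prefix of src that is a suffix of dst."""
    n = len(src)
    # Largest h in N-1..1 with highest h bits of src == lowest h bits of dst, else 0.
    h = next((h for h in range(n - 1, 0, -1) if src[:h] == dst[n - h:]), 0)
    path, cur = [], src
    for bit in reversed(dst[:n - h]):  # pad with dst bits from the overlap outward
        cur = bit + cur[:-1]           # shift right, pad at the most significant end
        path.append(cur)
    return path


def gsra(src: str, dst: str) -> list[str]:
    """GSRA: the shorter of the two single-direction paths (assumed tie-break: left)."""
    if src == dst:
        return []
    return min(left_path(src, dst), right_path(src, dst), key=len)


print(left_path("0100", "1100"))   # ['1001', '0011', '0110', '1100']
print(right_path("0100", "1100"))  # ['0010', '1001', '1100']
print(gsra("0100", "1100"))        # ['0010', '1001', '1100']
```

For source 4 (0100) and destination 12 (1100), the left-shifting path takes four hops and the right-shifting path three, so this sketch of GSRA selects the right-shifting path.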

The SSRA achieves smaller network latency than the GSRA at the cost of higher computational complexity. In practice, we
can choose a suitable one according to the performance requirements. Fig. 7 gives an example of delivering packets from
node (4, 3) to node (12, 2) based on GSRA and SSRA.
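
To see what a shortest HPN path looks like when left and right shifts may be mixed, a plain breadth-first search can serve as a reference. Note that this is a swapped-in technique for checking path lengths, not the shift-based iteration that SSRA itself uses; the sketch assumes a binary De Bruijn graph with integer node identifiers, and the names `neighbors` and `shortest_path` are hypothetical.

```python
from collections import deque


def neighbors(node: int, k: int) -> list[int]:
    """Left-shift and right-shift neighbors of `node` in DB(2, k)."""
    n = 1 << k
    return [(2 * node) % n, (2 * node + 1) % n,
            node >> 1, (node >> 1) | (1 << (k - 1))]


def shortest_path(src: int, dst: int, k: int) -> list[int]:
    """Breadth-first search; returns one shortest node sequence from src to dst."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            break
        for nxt in neighbors(cur, k):
            if nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    path, node = [], dst
    while node is not None:          # walk the parent links back to the source
        path.append(node)
        node = parent[node]
    return path[::-1]


path = shortest_path(4, 12, k=4)
print(len(path) - 1)  # 2
```

The search confirms that node 12 is two hops from node 4 in DB (2, 4), one hop fewer than either single-direction path. More than one two-hop path exists (through node 8 or through node 9), so the particular path returned depends on the neighbor ordering chosen in this sketch.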
In the packet header shown in Fig. 8, we use sequences of shifting directions and padding bits to indicate the
intermediate addresses of the routing path in the HPN. $S_{HPN}$ and $D_{HPN}$ are the HPN addresses of the source and
destination, and $S_z$ and $D_z$ are their z coordinates, respectively. Hop denotes the remaining hop count to the HPN
destination. Dirc gives the directions for HPN address shifting. Pad holds the padding bits for HPN address shifting.
When a router receives a packet, it analyzes the packet header to determine the transmission direction. If $D_{HPN}$ is
the local HPN address, the router transmits the packet to its corresponding PE core or to the connected switch
according to $D_z$. Otherwise, it uses the most significant bits (MSBs) of Dirc and Pad to identify the output port.
The packet header is then updated by decreasing the value of Hop by 1 and left shifting the Dirc and Pad fields.
Deadlock-free routing in De Bruijn graphs has been studied in [14], to which interested readers may refer.
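
The per-router header handling described above can be sketched as follows. This is a simplified model under stated assumptions: only the HPN-related fields are kept (the z coordinates and the source address are omitted), fields are modeled as bit strings rather than fixed-width registers, and a Dirc bit of 0 is assumed to mean a left shift and 1 a right shift, which is inferred from the header values in the example of Section 4.3. The names `Header` and `forward` are hypothetical.

```python
from dataclasses import dataclass

LEFT, RIGHT = "0", "1"  # assumed encoding of Dirc bits (inferred, not stated in the text)


@dataclass
class Header:
    d_hpn: str   # destination HPN address
    hop: int     # remaining hop count
    dirc: str    # shifting directions, most significant bit first
    pad: str     # padding bits, most significant bit first


def forward(addr: str, hdr: Header) -> tuple[str, Header]:
    """One router step: pick the next address from the MSBs, then update the header."""
    direction, pad_bit = hdr.dirc[0], hdr.pad[0]
    if direction == LEFT:
        nxt = addr[1:] + pad_bit     # left shift, pad at the least significant end
    else:
        nxt = pad_bit + addr[:-1]    # right shift, pad at the most significant end
    # Decrease Hop by 1 and left shift Dirc and Pad (drop the consumed MSBs).
    return nxt, Header(hdr.d_hpn, hdr.hop - 1, hdr.dirc[1:], hdr.pad[1:])


addr, hdr = "0100", Header(d_hpn="1100", hop=2, dirc="01", pad="11")
path = [addr]
while addr != hdr.d_hpn:
    addr, hdr = forward(addr, hdr)
    path.append(addr)
print(path)  # ['0100', '1001', '1100']
```

With Hop = 2, Dirc = 01, and Pad = 11, the packet leaves node 4 by a left shift with padding bit 1 (reaching node 9) and then takes a right shift with padding bit 1 (reaching node 12).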

4.3. Fault tolerant routing

DB_EP is fault-tolerant since its horizontal topology is fault-tolerant. A faulty link can be bypassed for successful
packet delivery under the fault-tolerant routing algorithm [12]. Define node i as a child of node j and node j as a
parent of node i if Eq. (1) holds. Children of the same parent are brothers. It has been proved [12] that two parents
i and (N/2 + i) (mod N) have the same set of children, and the four nodes form a family circle as shown in Fig. 9.
In a 16-node De Bruijn graph, each family circle has at most four nodes and each node has at most two children. In
Fig. 9, if the link $l_{ai}$ fails, node a can send packets to node i through the other side of the circle (i.e.,
a → j → b → i), which bypasses the faulty link using two more hops. Meanwhile, node a alters the packet header
according to the new path.

[Figure omitted; header fields in order: D_HPN, D_Z, S_HPN, S_Z, Hop, Dirc, Pad.]

Fig. 8. Packet header of DB_EP.

[Figure omitted; parents i and j with |i - j| = N/2, brothers a and b with |a - b| = 1, and link l_ai between nodes
a and i.]

Fig. 9. A family circle formed by four nodes.

Example: Consider the previous example in this section where packets are transmitted from node (4, 3) to node (12, 2).
According to the SSRA, the routing path in HPN is (4, 3) → (9, 3) → (12, 3). The header of the outgoing packets at each
node will be:

At the output of node (4, 3): 1100-11-0100-10-010-01-11.
At the output of node (9, 3): 1100-11-0100-10-001-1-1.

If the link between nodes (9, 3) and (12, 3) fails, node (9, 3) will send packets using path
(9, 3) → (4, 3) → (8, 3) → (12, 3). In this case, the header of the outgoing packets at each node will be:

At the output of node (4, 3): 1100-11-0100-10-010-01-11.
At the output of node (9, 3): 1100-11-0100-10-011-101-001.
At the output of node (4, 3): 1100-11-0100-10-010-01-01.
At the output of node (8, 3): 1100-11-0100-10-001-1-1.

This guarantees successful data transmission in the network. In contrast, 3D_Mesh and XNoTs, which adopt deterministic
XY routing in their HPNs, encounter packet congestion at the faulty link, and data transmission is interrupted.
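
The family-circle detour used in the example can be sketched for the binary case (d = 2). The helper below is hypothetical and only covers the situation described in the text, where the failed link joins a child a to one of its parents i; it assumes integer node identifiers and an HPN with n nodes.

```python
def bypass_path(a: int, i: int, n: int) -> list[int]:
    """Detour a -> j -> b -> i around a failed link between child a and parent i."""
    # a must be a child of i in the sense of Eq. (1): a = (2*i + r) mod n.
    assert a in ((2 * i) % n, (2 * i + 1) % n), "a must be a child of i"
    j = (i + n // 2) % n   # the other parent, which shares the same children
    b = a ^ 1              # the brother of a: same parents, last bit flipped
    return [a, j, b, i]


print(bypass_path(9, 12, 16))  # [9, 4, 8, 12]
```

For the failed link between nodes 9 and 12 in a 16-node HPN, the sketch returns the same detour as the example, 9 → 4 → 8 → 12, which is two hops longer than the direct link.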
If each HPN contains exactly $d^k$ nodes, the network is regular. Our discussions above on network topology, routing,
and fault-tolerant algorithms assume regular networks. To apply the proposed algorithms to an irregular network, we
first need to regularize it, which can be achieved by adding routers without corresponding PE cores so that the total
number of routers in each HPN becomes $d^k$. Other topologies such as Butterfly Fat Tree may also face the irregularity
problem, and the redundant area caused by adding routers will not significantly affect the system's performance.

5. Performance evaluation

5.1. Complexity analysis

The De Bruijn graph used as the HPN topology has a simpler routing algorithm and router architecture than the Mesh
topology in [9,12]. The computational complexity of the routing algorithm lies mainly in calculating the routing path,
and can be estimated as comp × hop, where comp is the cost of calculating one intermediate address in the routing path
and hop is the number of hops. For a NoC with $2^N$ nodes, the network diameters of the De Bruijn graph and Mesh are
$N$ and $2(2^{N/2} - 1)$, respectively. Therefore, the average number of routing hops in the De Bruijn graph is smaller
than that in Mesh. Moreover, the De Bruijn graph uses identifiers as the routing addresses. Since shifting an
identifier yields its neighbors, only one shift operation is needed for each intermediate address in the routing path.
In contrast, the addresses in Mesh are two dimensional, consisting of horizontal and vertical addresses. Each
intermediate address requires comparisons on both addresses, which is more complicated than one shift operation.
Therefore, the routing algorithm and the router architecture of the De Bruijn graph can be simpler than those of Mesh.
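
The two diameter expressions can be compared numerically. The short sketch below assumes a binary De Bruijn graph DB (2, N) and a square 2D Mesh, both with 2^N nodes and N even; the function name `diameters` is illustrative.

```python
def diameters(n_bits: int) -> tuple[int, int]:
    """Diameters of DB(2, N) and of a square 2D Mesh, each with 2**N nodes (N even)."""
    side = 2 ** (n_bits // 2)          # the Mesh is side x side
    return n_bits, 2 * (side - 1)


for n_bits in (4, 6, 8):
    print(2 ** n_bits, diameters(n_bits))
# 16 (4, 6)
# 64 (6, 14)
# 256 (8, 30)
```

For the 16-node HPN used in this paper the diameters are 4 and 6, and the gap widens as the network grows because the Mesh diameter increases with the square root of the node count while the De Bruijn diameter increases with its logarithm.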
Compared to Ref. [11], our paper uses an enhanced pillar structure in the vertical direction, which connects PEs with
their corresponding routers. Therefore, the data from PEs travel directly to the router instead of passing through the
switch in the pillar, which achieves higher efficiency and fewer routing hops than Ref. [11].

5.2. Performance evaluation

We use MSNS [15], a SystemC-based simulation platform, to evaluate the performance of the proposed schemes. MSNS
covers all top-down layers of a NoC system, and communications among them follow the Message Passing Interface (MPI)
standard. For comparison, the performance of XNoTs and 3D_Mesh is also investigated in the case study.
We use two traffic models in the simulation: uniform and hotspot.

(1) Uniform: All traffic is uniformly distributed in 3D networks. That is, every PE core has the same probability to receive
data packets.
(2) Hotspot: Eighty percent of packets are sent to the source horizontal planes, and 20% are sent to other planes.

We consider a model with 64 nodes in total and 16 nodes on each plane. In practice, it is unlikely that all nodes send
packets at the same time. We assume that four nodes (i.e., source nodes) are sending data packets on each plane.
Different distributions of source nodes result in different system performance. To reflect real network conditions, we
consider two typical source node distribution patterns: centralized distribution and dispersive distribution, as shown
in Fig. 10. Centralized distribution matches the practical scenario with heavy transmission congestion, where source
nodes are located very close together. Dispersive distribution matches the practical scenario where source nodes are
far apart from each other and transmission congestion is light. We further assume that packets are injected into the
network according to a Poisson process, and the total number of transmitted data packets is one million for each case
study. Average packet latency and power consumption are used as the performance metrics.
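
The traffic setup above can be sketched as a small generator. This is an illustrative model only, not the MSNS implementation: it assumes 4 planes of 16 nodes, addresses of the form (HPN address, z), that a source never sends to itself, and that within each class (same plane or other planes) destinations are chosen uniformly. The names `pick_destination` and `injection_times` are hypothetical.

```python
import random

PLANES, NODES_PER_PLANE = 4, 16   # 64 nodes in total, 16 per plane


def pick_destination(src, model, rng):
    """Choose a destination (hpn_address, z) for `src` under the given traffic model."""
    hpn, z = src
    if model == "uniform":
        # Every other PE core is equally likely (excluding the source is an assumption).
        candidates = [(a, p) for p in range(PLANES)
                      for a in range(NODES_PER_PLANE) if (a, p) != src]
        return rng.choice(candidates)
    # Hotspot: 80% of packets stay on the source plane, 20% go to other planes.
    if rng.random() < 0.8:
        return (rng.choice([a for a in range(NODES_PER_PLANE) if a != hpn]), z)
    other_plane = rng.choice([p for p in range(PLANES) if p != z])
    return (rng.randrange(NODES_PER_PLANE), other_plane)


def injection_times(rate, count, rng):
    """Poisson process: exponential inter-arrival gaps with mean 1/rate seconds."""
    t, times = 0.0, []
    for _ in range(count):
        t += rng.expovariate(rate)
        times.append(t)
    return times


rng = random.Random(0)
print(pick_destination((4, 3), "hotspot", rng))
print(injection_times(2e5, 3, rng))
```

A fixed seed keeps a run reproducible; the injection rate passed to `injection_times` corresponds to the packet/s axis of the latency and power plots.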

(a) Centralized source distribution for De Bruijn graph and Mesh

(b) Dispersive source distribution for De Bruijn graph and Mesh

Fig. 10. Source nodes distribution patterns for De Bruijn graph and Mesh (gray nodes are source nodes).

[Figure omitted; plot of average packet latency (ns, ×10^4) versus injection rate (packet/s, ×10^5) for DB_EP, XNoTs,
and 3D_Mesh under centralized and dispersive distributions.]

Fig. 11. Average packet latency of uniform traffic.



[Figure omitted; plot of average packet latency (ns, ×10^4) versus injection rate (packet/s, ×10^5) for DB_EP, XNoTs,
and 3D_Mesh under centralized and dispersive distributions.]

Fig. 12. Average packet latency of hotspot traffic.

5.2.1. Average packet latency


Average packet latency is the average travel time of a packet from the source node to the destination node [16].
Smaller average packet latency results in higher system efficiency. Generally, average latency increases with the
traffic injection rate, since the data packets have to wait for buffers and channels in the heavily loaded paths.
Figs. 11 and 12 show the average packet latency of the uniform and hotspot traffic models under different source node
distributions, respectively.
Figs. 11 and 12 show that, for all architectures in both traffic models, packet latency under centralized distribution
is larger than that under dispersive distribution. This is because most of the traffic load is confined to a small area
in centralized distribution, which leads to a longer waiting time due to congestion. In contrast, the traffic load is
evenly scattered in the dispersive distribution, and thus smaller packet latency can be achieved due to less
congestion.
In addition, under a given source distribution pattern, the packet latency of hotspot traffic is smaller than that of
uniform traffic. This is because most data packet transmissions cross different HPNs in the uniform traffic model, but
stay within the same HPN in the hotspot traffic model. Cross-HPN transmissions require extra vertical hops in routing,
which increases packet latency.
When the injection rate is low, the three architectures have similar average packet latencies. As the injection rate
goes beyond a threshold (e.g., $1.2 \times 10^6$ packets/s), DB_EP achieves the smallest packet latency and 3D_Mesh the
highest. This is because the De Bruijn graph has a smaller diameter than Mesh, and thus DB_EP takes fewer routing hops
in the HPN than the others. Unlike 3D_Mesh, XNoTs has only one hop in the vertical direction, and thus achieves smaller
packet latency than 3D_Mesh. Our simulation results show that DB_EP performs better under heavy traffic load.

5.2.2. Power consumption


Power consumption is one of the most important metrics in chip design. It can be calculated by the following equation:

$$P = P_{Wire} + P_{Node} \tag{3}$$

where $P_{Wire}$ is the wire power and $P_{Node}$ is the node power. Let M be the total number of wires in the 3D NoC.
We have:

$$P_{Wire} = \sum_{m} C_m V_{DD}^2 f_m \tag{4}$$

where $C_m$ and $f_m$ stand for the capacitance and bit flipping frequency of wire m, and $V_{DD}$ for the voltage of
the chip. $C_m$ is linearly proportional to the capacitance per unit length ($C_{unit}$). We use the wire model of SMIC
0.18 µm process technology to estimate the length of each wire. $f_m$ is calculated by MSNS [15]. In addition, we
obtain the node power by Synopsys Design Compiler. Figs. 13 and 14 show wire power consumption under different traffic
models.

[Figure omitted; plot of wire power consumption (mW) versus injection rate (packet/s, ×10^5) for DB_EP, XNoTs, and
3D_Mesh under centralized and dispersive distributions.]

Fig. 13. Wire power consumption of uniform traffic.

[Figure omitted; plot of wire power consumption (mW) versus injection rate (packet/s, ×10^5) for DB_EP, XNoTs, and
3D_Mesh under centralized and dispersive distributions.]

Fig. 14. Wire power consumption of hotspot traffic.

Table 2
Node power consumption.

Architecture    Node power (mW)
DB_EP           2311.3024
3D_Mesh         2310.5728
XNoTs           2310.2400

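
Eqs. (3) and (4) translate directly into a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and the capacitance, frequency, and voltage values are made-up inputs chosen to exercise the formula, not values from the SMIC wire model or from the simulations.

```python
def wire_power(capacitances, flip_freqs, vdd):
    """Eq. (4): sum over all wires m of C_m * V_DD^2 * f_m (SI units give watts)."""
    return sum(c * vdd ** 2 * f for c, f in zip(capacitances, flip_freqs))


def total_power(p_wire, p_node):
    """Eq. (3): total power is wire power plus node power."""
    return p_wire + p_node


# Hypothetical inputs: two wires of 1 pF and 2 pF toggling at 1 MHz and 2 MHz, V_DD = 1.8 V.
p_wire = wire_power([1e-12, 2e-12], [1e6, 2e6], vdd=1.8)
print(f"{p_wire:.3e} W")  # 1.620e-05 W
```

Since each term is proportional to the bit flipping frequency of its wire, a route with more hops toggles more wires and raises the sum, which is the dependence the wire power comparison relies on.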
Figs. 13 and 14 show that wire power consumption is linearly proportional to the injection rate. For all architectures,
power consumption under centralized distribution is larger than that under dispersive distribution. According to
Eq. (4), wire power consumption varies with the bit flipping frequency on the wires. From our earlier latency analysis,
centralized distribution needs more routing hops (or transmission links), and thus results in larger wire power
consumption due to a higher bit flipping frequency.
Under a given source distribution pattern, the wire power consumption of hotspot traffic is smaller than that of
uniform traffic. According to our earlier latency analysis, hotspot traffic needs fewer routing hops than uniform
traffic, because most of its packets stay within the same HPN, and therefore has a lower bit flipping frequency.
Hence, the hotspot traffic model achieves smaller wire power consumption.
In Figs. 13 and 14, DB_EP achieves the smallest wire power consumption and 3D_Mesh the highest. This is because DB_EP
has the fewest routing hops, whereas 3D_Mesh has the most.
Table 2 shows the total node power consumed by all 64 nodes in each architecture, as estimated by Synopsys Design
Compiler. The node power consumptions of the three architectures are close, since their node architectures are similar.

6. Conclusion

We proposed a new 3D NoC architecture, DB_EP, based on the De Bruijn graph, together with two fault-tolerant routing
algorithms based on the De Bruijn graph's properties. Our proposed architecture was compared with two existing Mesh
based architectures (3D_Mesh and XNoTs) in terms of network latency and power consumption. Simulation results showed
that DB_EP outperforms both of them, achieving smaller network latency and wire power consumption. Thus, we conclude
that DB_EP achieves better performance than those Mesh-based architectures, although the latter are easier to implement
in VLSI placement and routing.

Acknowledgements

The authors thank the anonymous reviewers for their many useful suggestions. This work was supported by Research
Fund for the Doctoral Program of Higher Education of China (No. 200806141015).

References

[1] Benini L, De Micheli G. Network on chips: a new SoC paradigm. IEEE Comput 2002;35(1):70–8.
[2] Jantsch A, Tenhunen H. Networks on chip. New York: Kluwer; 2003.
[3] Bjerregaard T, Mahadevan S. A survey of research and practices of network-on-chip. ACM Comput Surv 2006;38(1):1–51.
[4] Feero BS, Pande PP. Networks-on-chip in a three-dimensional environment: a performance evaluation. IEEE Trans Comput 2009;58(1):32–45.

[5] Davis WR, Wilson J, Mick S, Xu J, Hua H, Mineo C, et al. Demystifying 3D ICs: the pros and cons of going vertical. IEEE Des Test Comput
2005;22(6):498–510.
[6] Topol AW, Tulipe DC, Shi L, Frank DJ, Bernstein K, Steen SE, et al. Three-dimensional integrated circuits. IBM J Res Dev 2006;50(4.5):491–506.
[7] Loi I, Mitra S, Lee TH, Fujita S, Benini L. A low-overhead fault tolerance scheme for TSV-based 3D network on chip links. In: Proceeding of IEEE/ACM
international conference on computer-aided design. San Jose, CA; 2008. p. 598–602.
[8] Chen Y, Hu J, Ling X. Research on topologic architecture of three-dimensional network on chip. Telecommun Sci 2009;25(4):39–44.
[9] Pavlidis VF, Friedman EG. 3-D Topologies for networks-on-chip. IEEE Trans Very Large Scale Integr Syst 2007;15(10):1081–90.
[10] Jang DM, Ryu C, Lee KY, Cho BH, Kim J, Oh TS, et al. Development and evaluation of 3-D SiP with vertically interconnected through silicon vias. In:
Proceeding of electronic components and technology conference. Reno, Nevada; 2007. p. 847–52.
[11] Matsutani H, Koibuchi M, Amano H. Tightly-coupled multi-layer topologies for 3-D NoCs. In: Proceeding of international conference on parallel
processing. Xi’an, China; 2007. p. 75–84.
[12] Hosseinabady M, Kakoee R, Mathew J, Pradhan DK. Reliable Network-on-chip based on generalized de Bruijn graph. In: Proceeding of IEEE
international high level design validation and test workshop. Irvine, CA; 2007. p. 3–10.
[13] Jungnickel D. Graphs, networks and algorithms. 3rd ed. Berlin: Springer; 2007.
[14] Park H, Agrawal DP. A novel deadlock-free routing technique for a class of de Bruijn graph based networks. In: Proceeding of 9th International Parallel
Processing Symposium. 1995. p. 524–531.
[15] Li Z, Ling X, Hu J. MSNS: a top-down MPI-style hierarchical simulation framework for network-on-chip. In: Proceeding of international conference on
communications and mobile computing. Kunming, China; 2009. p. 609–14.
[16] Dally WJ, Towles B. Principles and practices of interconnection networks. San Francisco: Morgan Kaufmann; 2004.

Yiou Chen received the B.E. degree in Computer Communication Engineering, from University of Electronic Science and Technology of China, Chengdu, in
2004, and the M.S. degree in Communication and Information Engineering from University of Electronic Science and Technology of China, Chengdu, in 2007,
where she is currently pursuing the Ph.D. degree. Her current research interests include software radio, 3D NoC, and VLSI.

Jianhao Hu received the B.E. and Ph.D. degrees in communication systems from University of Electronic Science and Technology of China in 1993 and 1999,
respectively. He joined City University of Hong Kong from 1999 to 2000 as a postdoctoral researcher. From 2000 to 2004, he served as a senior system engineer at the
3G research center in University of Hong Kong. He has been a professor of the national key lab of communication in University of Electronic Science and
Technology of China since 2005. His research interests include high speed DSP technology with VLSI, NoC and software radio.

Tingting Huang received the B.E. and M.S. degree in Communication Engineering, from University of Electronic Science and Technology of China, Chengdu,
in 2008 and 2011, respectively. Her research interests include wireless communication and NoC.

Xiang Ling received the B.E., M.S., and Ph.D. degrees from the University of Electronic Science and Technology of China in 1995, 1999, and 2004,
respectively. He is currently an associate professor in the University of Electronic Science and Technology of China. His research interests include wireless
communication, mobile communication and VLSI.
