
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 2, FEBRUARY 2011, ISSN 2151-9617

HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 33

Kautz Mesh Topology for on-Chip Networks


R. Sabbaghi-Nadooshan

Abstract— In recent years, the 2D mesh Network-on-Chip has attracted much attention due to its suitability for VLSI implementation. This paper introduces the two-dimensional Kautz topology for Networks-on-Chip as an attractive alternative to the popular simple 2D mesh. The cost of the 2D Kautz is equal to that of the simple 2D mesh, but it has a logarithmic diameter. We compare the proposed network and the mesh network in terms of power consumption and network performance. Compared to the equal-sized simple mesh NoC, the proposed Kautz-based network has better performance while consuming less energy. In addition, by vertically stacking two or more silicon wafers, connected with a high-density and high-speed interconnect, it is now possible to combine multiple active device layers within a single IC. In this paper we propose an efficient three-dimensional layout for a novel 2D mesh structure based on the Kautz topology. Simulation results show that by using the third dimension, performance and latency can be improved compared to the 2D VLSI implementation.

Index Terms— 3D layout, 3D NoCs, Kautz, NoCs, Performance evaluation, Power consumption, SoC.

——————————  ——————————

1 INTRODUCTION

Nowadays there are many challenges in designing a complicated integrated circuit containing several processing and memory elements integrated on a single chip as a System-on-Chip (SoC). SoC design has some limitations in deep sub-micron (DSM) technologies. One of the major problems associated with future SoC designs arises from non-scalable global wire delays [1]. These limitations have been mentioned in many studies [1][2], including the limitation in the number of IP cores that can be connected to the shared bus, arbitration for accessing the shared bus, reliability, and so on. To overcome these limitations, the network-on-chip (NoC) has been introduced in recent years and much research has been conducted in this area. A proper classification of these studies is discussed in [3]. Another new technology that has been proposed is three-dimensional VLSI, which exploits the vertical dimension to alleviate interconnect-related problems and to facilitate heterogeneous integration of technologies to realize a SoC design [4]. By combining the ideas in these two types of technology, a new kind of architecture for NoCs is imaginable. In [5], new insights on network topology design for 3D NoCs are provided, and issues related to processor placement across 3D layers and data management in L2 caches are addressed. In three-dimensional designs, with the aid of small links between adjacent layers, there is a noticeable improvement in network performance.
The performance and also the power consumption of the circuit mainly depend on the traffic pattern, routing algorithm, switching method, and topology. Some common structures have been used for NoC implementations, including mesh and torus networks.
The mesh topology is the most dominant topology for today's regular tile-based NoCs. It is well known that the mesh topology is very simple. It has low cost and consumes low power. During the past years, much effort has been made toward understanding the relationship between power consumption and performance for mesh-based topologies [6]. Despite the advantages of meshes for on-chip communication, some packets may suffer from long latencies due to the lack of short paths between remotely located nodes. A number of previous works try to tackle this shortcoming by adding some application-specific links between distant nodes in the mesh [7], bypassing some intermediate nodes by inserting express channels [8], or using some other topologies with lower diameter [9].
The fact that the Kautz network has a logarithmic diameter and a cost equal to the linear array topology motivated us to evaluate it as an underlying topology for on-chip networks. The Kautz topology is a well-known network structure which was initially proposed by Kautz [10] as an efficient topology for parallel processing. Several researchers have studied topological properties, routing algorithms, efficient VLSI layout and other important aspects of Kautz networks [11].
In this paper, we propose a two-dimensional Kautz-based mesh topology for NoCs. We compare equivalent mesh and 2D Kautz architectures using the two most important performance factors, network latency and power consumption. A routing scheme for the 2D Kautz network has been developed, and the performance and power consumption of the two networks under similar working conditions have been evaluated using simulation experiments. Simulation results show that the proposed network can outperform its equivalent popular mesh topology in terms of network performance and energy dissipation.
Furthermore, we propose an efficient 3D VLSI layout for the 2D Kautz topology and we show that it is possible to improve performance further by using 3D VLSI technologies.
The rest of the paper is organized as follows. The next section presents the 2D Kautz structure. Section 3 discusses the background in NoC and 3D VLSI, and explains the characteristics of 3D designs that may affect the performance of 3D NoCs. The 3D VLSI layout of the 2D Kautz structure is also discussed in Section 3. Section 4 describes the simulation environment in which the results were obtained, followed by the simulation results and the experimental evaluation of our approach. Finally, we conclude the paper in Section 5.

• R. Sabbaghi-Nadooshan is with the Department of Electronics, Islamic Azad University Central Tehran Branch, Tehran, Iran.

2 KAUTZ TOPOLOGY
2.1 Kautz architecture
A class of digraphs that generalize Kautz digraphs [10] is defined in [11]. Suppose that the vertices are numbered with integers modulo N (N being the number of nodes). Assuming an out-degree of d, a vertex v is joined to the vertices $u \equiv -dv - i \pmod{N}$, for i = 1, 2, …, d. The diameter of the resulting digraph is at most $\log_d N$, and if $N = d^n + d^{n-k}$ (for a positive odd integer k), the diameter becomes n.
Based on the above definition, an N-node binary Kautz digraph is defined as follows. Node v, with a linear address in 0…N-1, is joined to the vertices with integer addresses -2v-1 (mod N) and -2v-2 (mod N).
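The definition above can be checked on a small instance. The following is a minimal sketch in Python, not the author's implementation; the function names `kautz_successors` and `diameter` are illustrative assumptions. It builds the successor list from the rule u = (-dv - i) mod N and measures the diameter by breadth-first search from every node.

```python
from collections import deque


def kautz_successors(v, n_nodes, d=2):
    """Out-neighbours of node v under the rule u = (-d*v - i) mod N, i = 1..d."""
    return [(-d * v - i) % n_nodes for i in range(1, d + 1)]


def diameter(n_nodes, d=2):
    """Longest shortest-path length over all ordered node pairs (BFS per source)."""
    worst = 0
    for src in range(n_nodes):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in kautz_successors(v, n_nodes, d):
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        if len(dist) < n_nodes:
            raise ValueError("digraph is not strongly connected")
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    for v in range(8):
        print(v, "->", kautz_successors(v, 8))
    print("diameter:", diameter(8))
```

For N = 8 and d = 2 the sketch lists nodes 7 and 6 as the successors of node 0 and reports a diameter of 3, which equals log_2 8. Under this numbering, nodes 2 and 5 each have one link that points back to themselves.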
An 8-node binary Kautz digraph is depicted in Fig. 1. Each node generates two connections to other nodes and accepts two connections from other nodes. Owing to the fact that these connections are unidirectional, the degree of the network is the same as that of one-dimensional mesh networks. The diameter of a Kautz network with size N is $\log_2 N$, which is the minimum distance between nodes 0 and N.

5 4 1 7 0 3 6 2
Fig. 1. 1D Kautz network with 8 nodes.

2.2 Two dimensional Kautz
The 2D Kautz networks have some interesting topological properties which motivate us to consider them as a suitable candidate for on-chip network architectures [12]. The architecture of this network is depicted in Fig. 2. In this network, the nodes in each row and column form a Kautz network. The most important property is that while the number of links in a 2D Kautz and an equal-size mesh is exactly the same, the network diameter of the 2D Kautz is less than the diameter of the mesh.
The Kautz links are unidirectional, and the maximum of 8 unidirectional links per node in the Kautz equals the maximum number of links for mesh nodes (which is 4 bidirectional links). Since the node degree of a topology has an important contribution in (and usually acts as the dominant factor of) the network cost, the proposed topology can achieve lower average distance than a 2D mesh while it imposes almost the same cost.

Fig. 2. 8×8 2D Kautz

2.3 Layout improvement
Based on the necklace properties in the de Bruijn layout [13], we have considered a more efficient layout for each row and column of the Kautz, as shown in Fig. 3. With this new layout the total wire length used in the network is decreased. For example, for an 8×8 2D Kautz a reduction of about 57% in total wire length is obtained.

Fig. 3. A better node placement of Kautz network

2.4 Routing algorithm
The Kautz graph has similarities to the de Bruijn graph [11], and for the routing algorithm we adopt the routing methods introduced in [14] by Park as a common routing scheme when developing routing algorithms for the 2D Kautz networks.
Park's algorithm uses virtual channels unevenly, where very few packets use all the virtual channels. Numbering virtual channels in increasing order starting from 0, it uses virtual channel 0 more than the other virtual channels. For example, for N=16, some of the source nodes use only virtual channel 0, some nodes use virtual channels 0 and 1, and a few source nodes use virtual channels 0, 1, and 2.
This routing algorithm was revised to make a more balanced use of virtual channels. Here, at each node, the packet has a degree of flexibility in selecting virtual channels [9]. For the same example of N=16 given above, for the last group of source nodes (that use 3 virtual channels), virtual channel 0 is selected to inject the message. For the second group, packets start their journey with virtual channel 1, and for the first group (using 1 virtual channel) they start with virtual channel 2. If a message wants to start with virtual channel 2 and it is occupied, it can try virtual channel 1, and if that is also occupied it can try starting with virtual channel 0. Through this method, the virtual channels are used more uniformly, which results in a more balanced traffic over network channels. Fig. 4 shows the pseudo code of the routing algorithm.
In this paper, we apply the routing algorithm given in Fig. 4. To this end, a packet is first routed in the X dimension using the above-mentioned routing algorithm in such a way that the route length is minimal. When the x-value of the current node and the x-value of the destination node addresses become equal, the packet is then routed to the destination by applying the same routing algorithm in the destination column. Note that routing in the two dimensions cannot generate cyclic dependencies, as the base routing algorithm in each dimension is deadlock-free and the packets travel the network in dimension order. Therefore, the resulting routing algorithm is deadlock-free.

Algorithm Routing (C, D, Vadd, VC)
Inputs: Current node C=(XC, YC); Destination node D=(Xd, Yd);
        Current sub-graph indicator Vadd;
        Current virtual channel VC;
Begin
  If (C=D) then return EjectionChannel;

  If (XC≠Xd) then
    (PC, g) = Next-Node-X (XC, Xd);
    If Vadd=decreasing and g=increasing then VC++;
    Vadd=g;
    return (PC, VC);
  Endif;

  If (YC≠Yd) then
    (PC, g) = Next-Node-Y (YC, Yd);
    If Vadd=decreasing and g=increasing then VC++;
    Vadd=g;
    return (PC, VC);
  Endif;

Fig. 4. The pseudo code of the routing algorithm.

3. 3D VLSI TECHNOLOGIES
There are four basic components in every NoC that should be considered: 1) processing elements, which are the IP cores connected by the network; 2) routers and switches, which route the packets until they are received at the destination; 3) network adapters, which are the interface between PEs and switches; 4) links, which connect two adjacent switches. When analyzing the network, the behavior of these components should be considered carefully. A different set of constraints exists when adapting these architectures to the SoC design paradigm. High throughput and low latency are the desirable characteristics of a multiprocessor system. Instead of aiming strictly for speed, designers increasingly need to consider energy consumption constraints, especially in the SoC domain [15]. So, in order to compare different NoC architectures, there are some important metrics that should be considered, such as network latency, energy consumption, and throughput [15]. In this paper, these metrics are investigated in mesh and 2D Kautz architectures in the context of 3D VLSI.
Presently, there are several possible fabrication technologies that can be used to realize multiple layers of active area (single-crystal Si or re-crystallized poly-Si) separated by interlayer dielectrics (ILDs) for 3D circuit processing. A brief description of these alternatives is given in [4].
Generally, there are some main advantages of using the third dimension in VLSI design, and these advantages can be very useful in NoC architectures. The benefits of 3D ICs include: 1) higher packing density due to the addition of a third dimension to the conventional two-dimensional layout, 2) higher performance due to reduced average interconnect length, and 3) lower interconnect power consumption due to the reduction in total wiring length [5]. Furthermore, the 3D chip design technology can be exploited to build SoCs by placing circuits with different voltage and performance requirements in different layers [4]. The first benefit is true for conventional circuits and also for NoC architectures. For example, if 64 IP cores in a NoC architecture are organized in a 3D network instead of a 2D organization, the chip area is reduced by almost a factor of four and a higher level of integration is achieved. In conventional integrated circuits, the length of global wires is very important for latency and power consumption, especially in emerging deep sub-micron technologies. In NoC architectures, even though wires are invariable in size, links between vertical layers can be very short in comparison with the links within each layer. The shorter the links are, the less power they consume. The last benefit, which is particularly valuable in SoC design, also applies to NoC architectures. The digital and analog components in mixed-signal systems can be placed on different Si layers, thereby achieving better noise performance due to lower electromagnetic interference between such circuit blocks [4].
Many studies have investigated the performance of three-dimensional designs [4, 16, 17, 18]. Most of such evaluations are based on wire-length distributions. It means that a stochastic 3D interconnect model is presented and the impact of 3D integration on circuit performance and power consumption is investigated. In this study, our attention is focused on a network architecture

point of view of three dimensional technologies.
There are some characteristics and constraints that should be considered when designing 3D architectures. The characteristics of conventional VLSI are still important in the second dimension even in a 3D design context.
A) Latency in vertical communications: Vertical links, due to their small length, provide fast communication between vertical layers. For instance, in a 70nm technology, the distance between adjacent vertical layers is about 3-70 micron. If a 4x inverter drives the link, it takes about 7ps, which is not considerable in comparison with the links in horizontal communications [15]. So, the capacitance and resistance of these links are different. In [17, 18], the link between two components in different active layers is split into two parts, horizontal and vertical, which differ in resistance and capacitance.
B) Vertical link density in 3D technologies: The density of links in vertical communications is limited by the fabrication process. It means that the space between vertical channels is limited. Via pitch in different processes varies between 1 and 7 micron [15, 19]. The constraints on links may result in a limitation in bus bandwidth. Compared to a wire pitch of 0.1 µm, inter-layer vias are significantly larger and cannot achieve the same wiring density as intra-layer interconnects [5].
C) Area overhead of vertical links: As mentioned, there is a limitation in the density of vertical links, so they may take significant area in each layer. It may be about 10 percent of the area occupied by routers and switches [5].
D) Complexity of NoC routers: One inter-layer interconnect option is to extend the NoC into three dimensions. This requires the addition of two more links (up and down) to each router. However, adding two extra links to a NoC router will increase its complexity (from 5 links to 7 links) [5].
E) Heat dissipation: An extremely important issue in 3D ICs is heat dissipation. The problem is expected to be exacerbated by the reduction in chip size, assuming that the same power generated in a 2D chip will now be generated in a smaller 3D chip, resulting in a sharp increase in the power density [4]. So, the number of active layers in 3D technologies is usually limited to 4 or 5 layers.
F) Different layers: Different layers may be processed by different technologies, so the characteristics of each layer may differ and should be considered in simulation.

3.1. THE 3D VLSI LAYOUT OF 2D KAUTZ
In our proposed layout, we assume 4 layers for the 3D implementation of the network, and the 64 nodes are distributed over these 4 layers. Fig. 3 (in the previous section) can be changed to Fig. 5 for 4 layers spanning three dimensions.
In Fig. 5, each node represents a row of the network shown in Fig. 2. For instance, each of the 8 nodes in Row#5 is connected by a unidirectional link to its counterpart in Row#4, and each of the 8 nodes in Row#7 is connected by a bidirectional link to its counterpart in Row#0.
By connecting vertical nodes with fast vertical interconnects, the 3D structure outperforms the 2D implementations. The area of the chip is also reduced by nearly a factor of two. Moreover, this structure has another useful characteristic: there are fewer links between nodes in layer#0, and so the power dissipated in this layer is less than in the upper layers. This feature can be beneficial in 3D architectures because of the heat dissipation in 3D VLSI technologies.

Fig. 5. 3D VLSI layout of 2D Kautz architecture.

4. SIMULATION ENVIRONMENT
To simulate the proposed NoC topologies, an interconnection network simulator developed on the basis of POPNET [20], with the latest version of the Orion power library [21] embedded, was used. Providing detailed power characteristics of the network elements, Orion enables designers to make rapid power evaluations at the architecture level [21].
The POPNET simulator was modified to mimic the exact operation of the 2D mesh and the 2D Kautz NoCs. The simulator was also customized to support other topologies and other routing algorithms, such as shuffle-exchange [22], de Bruijn [23], and Kautz topologies [12]. The power consumption can be obtained for each component of the network and for each layer (if implemented using 3D VLSI technology).
For each simulated network, the physical link width was assumed to be 32 bits. The power was calculated based on a NoC with 90 nm technology whose routers operate at 250 MHz. Based on the core size information presented in [24], the side length of each IP core was set to 2 mm, and the length of each wire was set based on the number of cores it passes through.
The message length was assumed to be M=32 flits, and V=2 virtual channels per physical channel were used. At each node, messages were generated according to a Poisson distribution with an average generation rate of λ messages per cycle. The simulations were performed under uniform, matrix-transpose, and hotspot [25] traffic patterns.
Note that for the matrix-transpose traffic load, it was assumed that 30% of the messages generated by a node were of matrix-transpose type (i.e. node (x,y) sends a message to node (y,x)) and the rest of the messages were sent to other nodes uniformly. In the hotspot traffic load, a hotspot rate of 16% was assumed, i.e. each node sent 16% of its messages to the hotspot node (which is assumed to be node (4,0) in the 8×8 network) and the rest of the messages were sent to other network nodes uniformly.

© 2011 Journal of Computing Press, NY, USA, ISSN 2151-9617
http://sites.google.com/site/journalofcomputing/

4.1. SIMULATION RESULTS
In Fig. 6(a), the average message latency is displayed as a function of the message generation rate for the 8×8 2D mesh and 2D Kautz NoCs, using V=2 virtual channels per physical channel and a message size of M=32 flits, under various traffic patterns.
As can be seen in Fig. 6(a), the 2D Kautz NoC achieves a reduction in message latency with respect to the simple 2D mesh network for the full range of network load under various traffic patterns. Non-fixed link lengths will also result in some variations in the delay and power of the network links. Since the operating frequency of a NoC is often determined by the longest router pipeline stage, the long wires need not degrade the NoC operating frequency. This can be achieved by segmenting long links into regular fixed-length links connected by 1-flit buffers. The size of each segment equals the size of a link connecting two adjacent nodes. Using 1-flit buffers (which is inspired by pipelined circuit switching methods in conventional interconnection networks [25]) provides pipelining over the link and also acts as a repeater for it. By sending the flits of a message over a long link in a pipelined fashion, latency-insensitive operation is guaranteed, as discussed in [7]. Note that we have taken this pipelined transmission into account in the simulation, as well as the effect of buffers and link lengths on the power consumption.
Fig. 6(b) depicts the power consumption of the considered networks under various traffic patterns. As shown in the figure, the 2D Kautz can effectively reduce the power consumption of the NoC compared to the equivalent mesh topology. The main source of this noticeable reduction is the lower average hop count of the messages, which saves the power that would be consumed in intermediate routers in an equivalent mesh topology.
In Fig. 7, the average latency and power consumption of the 3D and 2D implementations of the Kautz structure are shown. Here, horizontal links in each layer are assumed to be nearly 10 times slower than vertical links. In the case of uniform traffic, the 3D structure has lower latency because of the fast vertical links in the network. Shorter links consume less power, which in turn affects the total power consumption.
One of the main characteristics of 3D technologies is the reduction in power dissipation of links in the network. Fig. 8(a) compares the total power dissipated in links for the 2D and 3D structures. The results show that with the aid of vertical links the power dissipation decreases by nearly 40%. The 3D layout proposed in this paper has an important characteristic for three-dimensional designs. The lower layers in the architecture consume less power because they have fewer links and therefore less traffic. This feature can reduce the effect of heat dissipation in 3D VLSI designs. Fig. 8(b) shows the power dissipated in each of the four layers of the structure.
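The synthetic workloads described in the simulation environment (uniform, 30% matrix-transpose, and 16% hotspot traffic towards node (4,0), with Poisson message generation at rate λ) can be sketched as follows. This is a minimal illustration in Python and not the modified POPNET code; the names `pick_destination` and `next_arrival_gap` and their parameters are assumptions, and exponential inter-arrival gaps are used here as one common way to model a Poisson process.

```python
import random


def pick_destination(src, pattern, rng, n=8,
                     transpose_fraction=0.30, hotspot_fraction=0.16,
                     hotspot=(4, 0)):
    """Choose a destination (x, y) for a message injected at node src."""
    x, y = src
    # Matrix-transpose share: node (x, y) sends to node (y, x).
    if pattern == "transpose" and x != y and rng.random() < transpose_fraction:
        return (y, x)
    # Hotspot share: a fixed fraction of messages goes to the hotspot node.
    if pattern == "hotspot" and src != hotspot and rng.random() < hotspot_fraction:
        return hotspot
    # Remaining messages: uniform choice over all nodes other than the source.
    while True:
        dst = (rng.randrange(n), rng.randrange(n))
        if dst != src:
            return dst


def next_arrival_gap(lam, rng):
    """Cycles until the next message of a Poisson process with rate lam per cycle."""
    return rng.expovariate(lam)


if __name__ == "__main__":
    rng = random.Random(1)
    print([pick_destination((1, 2), "hotspot", rng) for _ in range(5)])
    print(next_arrival_gap(0.01, rng))
```

Passing a seeded `random.Random` instance keeps a run reproducible, which makes it easier to compare two topologies under the same message sequence.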

(a) (b)

Fig. 6. Average delay and total power comparison between two-dimensional Kautz and mesh structure.

(a) (b)

Fig. 7. Average delay and total power comparison between three and two dimensional implementation of Kautz structure (Vertical links
are 10 times faster than horizontal links).

(a) (b)
Fig. 8. a) Total link power comparison between three and two dimensional implementation of Kautz. b) Layer Power consumption.

Fig. 9. The area overhead of 8x8 and 16x16 digraph-based mesh NoCs with different buffer depths
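The area analysis relies on the statement, made in the section on the two-dimensional Kautz, that a 2D Kautz and an equal-size mesh have the same number of links. The sketch below checks this by counting unidirectional links in Python. It assumes that a link from a node to itself is not implemented as a physical wire; that assumption and the function names are illustrative and are not taken from the paper.

```python
def kautz_row_links(n):
    """Unidirectional links in an n-node binary Kautz row, self-loops excluded."""
    return sum(1 for v in range(n) for i in (1, 2) if (-2 * v - i) % n != v)


def kautz_2d_links(n):
    """n rows plus n columns, each forming an n-node binary Kautz network."""
    return 2 * n * kautz_row_links(n)


def mesh_2d_links(n):
    """2n(n-1) bidirectional channels, each counted as two unidirectional links."""
    return 2 * (2 * n * (n - 1))


if __name__ == "__main__":
    for n in (8, 16):
        print(n, kautz_2d_links(n), mesh_2d_links(n))
```

Under that assumption both functions return 224 unidirectional links for the 8×8 network, which corresponds to 2n(n-1) = 112 bidirectional mesh channels.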

4.2 Area overhead
The area overhead due to the additional inter-router wires was analyzed by calculating the number of channels in a NoC. An n×n 2D mesh has 2n(n-1) channels. The 2D Kautz NoCs have the same number of channels, although some links use longer wires. To evaluate the imposed area overhead, we estimated the area of the routers and links using the Orion 2.0 tool [26]. The area of the router was calculated in 90nm technology.
In the analysis, the lengths of the input and output network interface buffers were considered to be as large as 64 flits. This was a modest size for the network interface queues.
In Table 1, the area overhead of a Kautz NoC is evaluated for 8×8 networks. The results show that in an 8×8 mesh the total areas of the links and routers are 0.0366 mm² and 0.2742 mm², respectively. Based on these area estimations, the area of the network part of the 2D Kautz network shows a 22% increase compared to the 2D mesh NoC of equal size. Considering 2mm×2mm processing elements, the increase in the entire chip area is about 2%.
It is noteworthy that the router area is a function of its buffer depth. The depth of the input and output buffers of the routers, in this paper, was set to 2 flits (or a 4-flit buffer at each port). A performance analysis in [27], however, showed that a typical router in next-generation high-performance CMPs (using 22nm technology) will need 64-flit buffers at each port, which results in a higher router/switch area. Fig. 9 plots the area overhead of the 2D Kautz NoCs over a mesh (with the same size and buffering capacity) for network size N=8×8 and different buffer depths in a system with 32-bit channels. Based on Fig. 9, a system using 64-flit buffers reduces the area overhead of the 2D Kautz networks to about 9%.

Table 1
Area Comparison

Network             Link Area   Router Area   Added area     Added area with respect to
                    (mm²)       (mm²)         (percentage)   the whole chip (percentage)
64-node 2D Mesh     0.0366      0.2742        0              0
64-node 2D Kautz    0.0627      0.2742        22.75          2.08

5 CONCLUSION
The simple 2D mesh topology has been widely used in a variety of interconnection network applications, especially for NoC design. However, the Kautz network has not yet been studied as the underlying topology for 2D tiled NoCs. In this paper, we introduced the two-dimensional Kautz network, which has the same cost as the popular mesh but has a logarithmic diameter. We then conducted a comparative simulation study to assess the network latency and power consumption of the two networks. Results showed that the 2D Kautz topology improves the network latency, especially for heavy traffic loads. The power consumption in the 2D Kautz network was also less than that of the equivalent simple 2D mesh NoC. Furthermore, in this paper, a 3D VLSI layout for the 2D Kautz is proposed, and this design further decreases the power consumption, especially in the NoC links.
Combining Kautz with other topologies can be a challenging direction for future work.

REFERENCES
[1] C. Grecu, P. P. Pande, A. Ivanov, and R. Saleh, "Timing Analysis of Network on Chip Architectures for MP-Soc Platforms," Microelectronics, Vol. 36, pp. 833-845, 2005.
[2] P. P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, "Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnection Architectures," IEEE Transactions on Computers, Vol. 54, No. 8, August 2005.
[3] T. Bjerregaard and S. Mahadevan, "A Survey of Research and Practices of Network-on-Chip," ACM Computing Surveys, Vol. 38, March 2006.
[4] K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, "3D ICs: A Novel Chip Design for Improving Deep-Submicrometer Interconnect Performance and Systems-on-Chip Integration," Proceedings of the IEEE, Vol. 89, No. 5, May 2001.
[5] F. Li, C. Nicopoulis, T. Richardson, Y. Xie, V. Krishnan, and M. Kandemir, "Design and Management of 3D Chip Multiprocessors Using Network-in-Memory," International Symposium on Computer Architecture (ISCA'06), USA, pp. 130-141, 2006.
[6] K. Srivasan, K. S. Chata, and G. Konjevad, "Linear Programming Based Techniques for Synthesis of Networks-on-chip Architectures," IEEE International Conference on Computer Design (ICCD), pp. 422-429, 2004.
[7] U. Y. Ogras and R. Marculescu, "Application-Specific Network-on-Chip Architecture Customization via Long-Range Link Insertion," in IEEE/ACM Intl. Conf. on Computer Aided Design, San Jose, 2005.
[8] W. J. Dally, "Express Cubes: Improving the Performance of K-ary N-cube Interconnection Networks," in IEEE Trans. on Computers, Vol. 40, No. 9, 1991.
[9] R. Sabbaghi-Nadooshan, M. Modarressi, and H. Sarbazi-Azad, "The 2D digraphed-based NoCs: attractive alternatives to the 2D mesh NoCs," Journal of Supercomputing, DOI 10.1007/s11227-010-0410-6.
[10] W. H. Kautz, "The design of optimum interconnection networks for multiprocessors," in Architecture and Design of Digital Computers, NATO Advanced Summer Institute, pp. 249-77, 1969.
[11] J. C. Bermond, R. W. Dawas, and F. O. Ergincan, "De Bruijn and Kautz Bus Networks," Networks, Vol. 30, No. 3, pp. 205-218, 1997.
[12] R. Sabbaghi-Nadooshan and H. Sarbazi-Azad, "The Kautz mesh: A New Topology for SoCs," International SoC Design Conference, pp. 300-303, 2008.
[13] C. Chen, P. Agarwal and J. R. Burke, "dBcube: A New class of Hierarchical Multiprocessor Interconnection Networks with Area Efficient Layout," IEEE Transaction on Parallel and Distributed Systems, Vol. 4, No. 12, pp. 1332-1344, Dec 1993.
[14] H. Park and D. P. Agrawal, "A Novel Deadlock-free Routing Technique for a class of de Bruijn based Networks," 7th IEEE Symposium on Parallel & Distributed Processing, pp. 92-97, 1995.
[15] K. Puttaswamy and G. H. Loh, "Implementing Caches in a 3-D technology for High Performance Processors," IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'05), USA, pp. 525-532, 2005.
[16] S. Das, A. Chandrakasan and R. Reif, "Three Dimensional Integrated Circuits: Performance, Design, Methodology and CAD tools," IEEE Computer Society Annual Symposium on VLSI, USA, pp. 13-18, 2003.
[17] S. J. Souri, K. Banerjee, A. Mehrotra and K. C. Saraswat, "Multiple Si Layer ICs: Motivation, Performance Analysis and Design Implications," Design Automation Conference (DAC'00), USA, pp. 213-220, 2000.
[18] R. Zhang, K. Roy, C. K. Koh, and D. B. Janes, "Power Trends and Performance Characterization of 3-Dimensional Integration for Future Technology Generation," International Symposium on Quality Electronic Design, USA, pp. 217-222, 2001.
[19] J. Cong, A. Jagannathan, Y. Ma, G. Reinman, J. Wei and Y. Zhang, "An Automated Design Flow for 3D Microarchitecture Evaluation," Asia and South Pacific Conference on Design Automation (ASPDAC'06), pp. 384-389, 2006.
[20] http://www.princeton.edu/~lshang/popnet.html, August 2007.
[21] H. Wang, X. Zhu, L. Peh, and S. Malik, "Orion: A Power-Performance Simulator for Interconnection Networks," 35th International Symposium on Microarchitecture, pp. 294-305, 2002.
[22] R. Sabbaghi-Nadooshan, M. Modarressi, and H. Sarbazi-Azad, "2DSEM: A novel high-performance and low-power mesh-bases topology for networks-on-chip," International Journal of Parallel, Emergent and Distributed Systems, Vol. 25, No. 4, pp. 331-344, August 2010.
[23] R. Sabbaghi-Nadooshan, M. Modarressi, and H. Sarbazi-Azad, "2D DBM: An Attractive Alternative to the Simple 2D Mesh Topology for on-chip Networks," ICCD, pp. 486-491, 2008.
[24] R. Mullins, A. West, and S. Moore, "The Design and Implementation of a Low-Latency On-Chip Network," Asia and South Pacific Design Automation Conference, pp. 164-169, 2006.
[25] J. Duato, S. Yalamanchili, and N. Li, Interconnection Networks: An Engineering Approach, Morgan Kaufmann Publishers, 2005.
[26] A. B. Kahng, B. Li, L.-S. Peh, and K. Samadi, "Orion2.0: A Fast and Accurate NoC Power and Area Model for Early-Stage Design Space Exploration," DATE, pp. 423-428, 2009.
[27] J. Duato, et al, "A High Performance Router Architecture for Interconnection Networks," in Proc. Int. Conf. Parallel Processing, pp. 61-68, 1996.

R. Sabbaghi-Nadooshan received the B.S. and M.S. degrees in electrical engineering from the Science and Technology University, Tehran, Iran, in 1991 and 1994 and the Ph.D. degree in electrical engineering from the Science and Research Branch, Islamic Azad University, Tehran, Iran in 2010. In 1998 he became a faculty member of the Department of Electronics in the Central Tehran Branch, Islamic Azad University, Tehran, Iran. His research interests include interconnection networks, Networks-on-Chips, and embedded systems.
