Kautz Mesh Topology For On-Chip Networks
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 33
Abstract— In recent years, the 2D mesh Network-on-Chip (NoC) has attracted much attention due to its suitability for VLSI implementation. This paper introduces the two-dimensional (2D) Kautz topology for Networks-on-Chip as an attractive alternative to the popular simple 2D mesh. The cost of the 2D Kautz network equals that of the simple 2D mesh, but its diameter is logarithmic in the network size. We compare the proposed network and the mesh network in terms of power consumption and network performance. Compared to an equal-sized simple mesh NoC, the proposed Kautz-based network has better performance while consuming less energy. In addition, vertically stacking two or more silicon wafers, connected with a high-density and high-speed interconnect, now makes it possible to combine multiple active device layers within a single IC. In this paper we propose an efficient three-dimensional layout for a novel 2D mesh structure based on the Kautz topology. Simulation results show that, by using the third dimension, performance and latency can be improved compared to the 2D VLSI implementation.
Index Terms— 3D layout, 3D NoCs, Kautz, NoCs, Performance evaluation, Power consumption, SoC.
1 INTRODUCTION
… discusses the background in NoC and 3D VLSI and explains the characteristics of 3D designs that may affect the performance of 3D NoCs. The 3D VLSI layout of the 2D Kautz structure is also discussed in Section 3. Section 4 explains the simulation environment in which the results were obtained and then describes the simulation results and the experimental evaluation of our approach. Finally, we conclude the paper in Section 5.
2 KAUTZ TOPOLOGY
2.1 Kautz architecture
A class of digraphs that generalizes the Kautz digraphs [10] is defined in [11]. Suppose that the vertices are numbered with integers modulo N, where N is the number of nodes. Assuming an out-degree of d, a vertex v is joined to the vertices $u \equiv -dv - i \pmod{N}$, for $i = 1, 2, \ldots, d$. The diameter of the resulting digraph is at most $\lceil \log_d N \rceil$, and if $N = d^{n} + d^{n-k}$ for a positive odd integer k, the diameter becomes n.
Based on the above definition, an N-node binary Kautz digraph is defined as follows: node v, with a linear address in 0, …, N-1, is joined to the vertices with integer addresses $-2v-1 \pmod{N}$ and $-2v-2 \pmod{N}$.
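The definition above is easy to check mechanically. The following Python sketch is illustrative only: the helper names `kautz_successors`, `eccentricity`, `diameter`, and `row_wire_length` are hypothetical and not taken from the paper. It builds the successor lists from the rule $-2v-1$, $-2v-2 \pmod{N}$, measures the diameter with breadth-first search, and compares the total wiring of one 8-node row laid out in address order with the order 5 4 1 7 0 3 6 2, in which every pair of neighbouring nodes is directly linked (the idea behind the improved placement of Fig. 3). The wiring model is an assumption: each directed link costs the number of node pitches it spans, and self-loops cost nothing.

```python
from collections import deque


def kautz_successors(v: int, n: int) -> tuple:
    """Out-neighbours of node v in the N-node binary Kautz digraph: -2v-1, -2v-2 (mod N)."""
    # Python's % returns a value in 0..n-1 even when the left operand is negative.
    return ((-2 * v - 1) % n, (-2 * v - 2) % n)


def eccentricity(src: int, n: int) -> int:
    """Largest shortest-path (BFS) distance from src to any other node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in kautz_successors(u, n):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) != n:
        raise ValueError("digraph is not strongly connected")
    return max(dist.values())


def diameter(n: int) -> int:
    """Diameter of the N-node binary Kautz digraph (maximum eccentricity)."""
    return max(eccentricity(v, n) for v in range(n))


def row_wire_length(order: list) -> int:
    """Total length, in node pitches, of all links of one row laid out in `order`.

    Hypothetical wiring model: link u -> w costs |pos(u) - pos(w)|, so self-loops cost 0.
    """
    n = len(order)
    pos = {node: i for i, node in enumerate(order)}
    return sum(abs(pos[u] - pos[w]) for u in range(n) for w in kautz_successors(u, n))


if __name__ == "__main__":
    n = 8
    necklace = [5, 4, 1, 7, 0, 3, 6, 2]
    # Every pair of neighbouring positions in this order is joined by a Kautz link.
    for a, b in zip(necklace, necklace[1:]):
        assert b in kautz_successors(a, n) or a in kautz_successors(b, n)
    print("diameter:", diameter(n))                           # diameter: 3
    print("address order:", row_wire_length(list(range(n))))  # address order: 52
    print("necklace order:", row_wire_length(necklace))       # necklace order: 24
```

Under this toy model the necklace order cuts the wiring of a row from 52 to 24 pitches (about 54% less). The model is deliberately simple and is not meant to reproduce the 57% reduction reported for the 8×8 network, which may rest on a different wiring model.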
An 8-node binary Kautz digraph is depicted in Fig. 1. Each node generates two connections to other nodes and accepts two connections from other nodes. Because these connections are unidirectional, the degree of the network is the same as that of one-dimensional mesh networks. The diameter of a Kautz network with N nodes (the largest shortest-path distance over all pairs of nodes) is at most $\lceil \log_2 N \rceil$; for the 8-node network of Fig. 1 it is 3.

Fig. 1. 1D Kautz network with 8 nodes.

2.2 Two dimensional Kautz
The 2D Kautz networks have some interesting topological properties which motivate us to consider them as suitable candidates for on-chip network architectures [12]. The architecture of this network is depicted in Fig. 2. In this network, the nodes in each row and in each column form a Kautz network. The most important property is that, while a 2D Kautz network and an equal-size mesh have exactly the same number of links, the diameter of the 2D Kautz network is smaller than that of the mesh (6 versus 14 hops for 8×8 networks).
The Kautz links are unidirectional, and the maximum of 8 unidirectional links per node in the 2D Kautz network equals the maximum number of links of a mesh node (4 bidirectional links). Since the node degree of a topology contributes significantly to (and usually dominates) the network cost, the proposed topology can achieve a lower average distance than a 2D mesh while imposing almost the same cost.

Fig. 2. 8×8 2D Kautz.

2.3 Layout improvement
Based on the necklace properties of the de Bruijn layout [13], we have considered a more efficient layout for each row and column of the Kautz network, as shown in Fig. 3. With this new layout, the total wire length used in the network decreases; for example, for an 8×8 2D Kautz network, a reduction of about 57% in total wire length is obtained.

Fig. 3. A better node placement of Kautz network (node order: 5 4 1 7 0 3 6 2).

2.4 Routing algorithm
The Kautz graph is similar to the de Bruijn graph [11]; for routing, we therefore adopt the method introduced by Park in [14], a common starting point when developing routing algorithms for 2D Kautz networks.
Park's algorithm uses virtual channels unevenly: very few packets use all the virtual channels. If the virtual channels are numbered in increasing order starting from 0, the algorithm uses virtual channel 0 more than the others. For example, for N=16, some source nodes use only virtual channel 0, some use virtual channels 0 and 1, and a few use virtual channels 0, 1, and 2.
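The rebalanced injection rule that the authors adopt from [9], spelled out in the paragraph that follows, can be read as a small policy: a source whose route may need k of the V virtual channels prefers to start on channel V - k and, if that channel is busy, falls back to lower-numbered channels. The Python sketch below encodes that reading for the N=16, V=3 example. It is our interpretation, not the authors' code; the function `select_start_vc` and the `busy` set are hypothetical, and the sketch covers only the choice of the first virtual channel, not Park's routing itself.

```python
from typing import Optional, Set


def select_start_vc(vcs_needed: int, num_vcs: int, busy: Set[int]) -> Optional[int]:
    """Pick the virtual channel on which a new message is injected.

    Assumed reading of the rebalanced policy: a route may step up through
    `vcs_needed` channels, so starting no higher than num_vcs - vcs_needed keeps
    the last channel it uses in range. Lower channels are tried if the preferred
    one is busy; None means every candidate is occupied and the message waits.
    """
    if not 1 <= vcs_needed <= num_vcs:
        raise ValueError("vcs_needed must be between 1 and num_vcs")
    preferred = num_vcs - vcs_needed
    for vc in range(preferred, -1, -1):  # preferred, preferred - 1, ..., 0
        if vc not in busy:
            return vc
    return None


# N = 16 example from the text, with V = 3 channels numbered 0, 1, 2:
print(select_start_vc(1, 3, busy=set()))      # 2: one-channel sources start high
print(select_start_vc(2, 3, busy=set()))      # 1
print(select_start_vc(3, 3, busy=set()))      # 0
print(select_start_vc(1, 3, busy={2}))        # 1: falls back when channel 2 is busy
print(select_start_vc(1, 3, busy={1, 2}))     # 0
print(select_start_vc(1, 3, busy={0, 1, 2}))  # None
```

Starting the sources that need few channels on the high-numbered ones is what spreads the load away from virtual channel 0.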
This routing algorithm was revised to make more balanced use of the virtual channels. Here, at each node, the packet has some flexibility in selecting virtual channels [9]. For the N=16 example given above, the last group of source nodes (those that use 3 virtual channels) selects virtual channel 0 to inject the message. For the second group, packets start their journey on virtual channel 1, and for the first group (using 1 virtual channel) they start on virtual channel 2. If a message wants to start on virtual channel 2 and it is occupied, it can try virtual channel 1, and if that is also occupied it can try starting on virtual channel 0. With this method, the virtual channels are used more uniformly, which results in more balanced traffic over the network channels. Fig. 4 shows the pseudo code of the routing algorithm.

In this paper, we apply the routing algorithm given in Fig. 4. A packet is first routed in the X dimension using the above-mentioned routing algorithm, so that the route length is minimal. When the x-value of the current node address equals the x-value of the destination node address, the packet is routed to the destination by applying the same routing algorithm in the destination column. Routing in the two dimensions cannot generate cyclic dependencies, because the base routing algorithm in each dimension is deadlock-free and packets traverse the network in dimension order. Therefore, the resulting routing algorithm is deadlock-free.

Algorithm Routing (C, D, Vadd, VC)
Inputs: Current node C = (XC, YC); Destination node D = (Xd, Yd);
        Current sub-graph indicator Vadd;
        Current virtual channel VC;
Begin
    If (C = D) then return EjectionChannel;
    If (XC ≠ Xd) then
        (PC, g) ← Next-Node-X (XC, Xd);
        If (Vadd = decreasing and g = increasing) then VC++;
        Vadd ← g;
        return (PC, VC);
    Endif;
    If (YC ≠ Yd) then
        (PC, g) ← Next-Node-Y (YC, Yd);
        If (Vadd = decreasing and g = increasing) then VC++;
        Vadd ← g;
        return (PC, VC);
    Endif;
End

Fig. 4. The pseudo code of the routing algorithm.

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 2, FEBRUARY 2011, ISSN 2151-9617

3. 3D VLSI TECHNOLOGIES
There are four basic components in every NoC that should be considered: 1) processing elements, which are the IP cores connected by the network; 2) routers and switches, which route the packets until they reach their destination; 3) network adapters, which are the interface between PEs and switches; and 4) links, which connect two adjacent switches. When analyzing the network, the behavior of these components should be considered carefully. A different set of constraints applies when adapting these architectures to the SoC design paradigm. High throughput and low latency are the desirable characteristics of a multiprocessor system. Instead of aiming strictly for speed, designers increasingly need to consider energy consumption constraints, especially in the SoC domain [15]. Therefore, to compare different NoC architectures, important metrics such as network latency, energy consumption, and throughput should be considered [15]. In this paper, these metrics are investigated for the mesh and 2D Kautz architectures in the context of 3D VLSI.
Presently, there are several possible fabrication technologies that can be used to realize multiple layers of active area (single-crystal Si or re-crystallized poly-Si) separated by interlayer dielectrics (ILDs) for 3D circuit processing. A brief description of these alternatives is given in [4].
Generally, using the third dimension in VLSI design has several main advantages, and these advantages can be very useful in NoC architectures. The benefits of 3D ICs include: 1) higher packing density due to the addition of a third dimension to the conventional two-dimensional layout, 2) higher performance due to reduced average interconnect length, and 3) lower interconnect power consumption due to the reduction in total wiring length [5]. Furthermore, 3D chip design technology can be exploited to build SoCs by placing circuits with different voltage and performance requirements in different layers [4]. The first benefit holds for conventional circuits and also for NoC architectures. For example, if 64 IP cores in a NoC architecture are organized in a 3D network instead of a 2D organization, the chip area is reduced almost four times and the design becomes more highly integrated. In conventional integrated circuits, the length of global wires strongly affects latency and power consumption, especially in emerging deep sub-micron technologies. In NoC architectures, even though the wires themselves do not change in size, links between vertical layers can be very short compared with the links within each layer. The shorter the links are, the less power they consume. The last benefit is especially attractive in SoC design and is applicable to NoC architectures: the digital and analog components of mixed-signal systems can be placed on different Si layers, thereby achieving better noise performance due to lower electromagnetic interference between such circuit blocks [4].
Many studies have investigated the performance of three-dimensional designs [4, 16, 17, 18]. Most of these evaluations are based on wire-length distributions: a stochastic 3D interconnect model is presented, and the impact of 3D integration on circuit performance and power consumption is investigated. In this study, our attention is focused on three-dimensional technologies from a network-architecture point of view.
Several characteristics and constraints should be considered when designing 3D architectures. The characteristics of conventional VLSI remain important within each layer (the second dimension), even in a 3D design context.
A) Latency in vertical communications: Because of their short length, vertical links provide fast communication between vertical layers. For instance, in a 70 nm technology, the distance between adjacent vertical layers is about 3-70 microns. If a 4x inverter drives the link, the delay is about 7 ps, which is negligible compared with the links used for horizontal communication [15]. The capacitance and resistance of vertical links therefore differ from those of horizontal links. In [17, 18], the link between two components in different active layers is split into two parts, horizontal and vertical, which differ in resistance and capacitance.
B) Vertical link density in 3D technologies: The density of vertical links is limited by the fabrication process, i.e., the spacing between vertical channels cannot be made arbitrarily small. Via pitch varies between 1 and 7 microns across processes [15, 19]. These constraints may limit bus bandwidth. Compared with a wire pitch of 0.1 µm, inter-layer vias are significantly larger and cannot achieve the same wiring density as intra-layer interconnects [5].
C) Area overhead of vertical links: As mentioned, the density of vertical links is limited, so they may occupy significant area in each layer, possibly about 10 percent of the area occupied by routers and switches [5].
D) Complexity of NoC routers: One inter-layer interconnect option is to extend the NoC into three dimensions. This requires adding two more links (up and down) to each router. However, these two extra links increase the complexity of a NoC router (from 5 links to 7 links) [5].
E) Heat dissipation: An extremely important issue in 3D ICs is heat dissipation. The problem is expected to be exacerbated by the reduction in chip size: assuming that the same power generated in a 2D chip is now generated in a smaller 3D chip, the power density increases sharply [4]. For this reason, the number of active layers in 3D technologies is usually limited to 4 or 5.
F) Different layers: Different layers may be processed by different technologies, so the characteristics of each layer may differ and should be considered in simulation.

3.1. THE 3D VLSI LAYOUT OF 2D KAUTZ
In our proposed layout, we assume 4 layers for the 3D implementation of the network, with the 64 nodes distributed over these 4 layers. The placement of Fig. 3 (in the previous section) can then be transformed into the arrangement of Fig. 5, which spans 4 layers in three dimensions.
In Fig. 5, each node represents a row of the network shown in Fig. 2. For instance, each of the 8 nodes of Row#5 is connected by a unidirectional link to its counterpart in Row#4, and each of the 8 nodes of Row#7 is connected by a bidirectional link to its counterpart in Row#0.
By connecting vertically adjacent nodes with fast vertical interconnects, the 3D structure outperforms the 2D implementation, and the chip area is also reduced by nearly a factor of two. Moreover, this structure has another useful characteristic: there are fewer links between the nodes in layer#0, so less power is dissipated in this layer than in the upper layers. This feature can be beneficial given the heat-dissipation problem of 3D VLSI technologies.

Fig. 5. 3D VLSI layout of 2D Kautz architecture.

4. SIMULATION ENVIRONMENT
To simulate the proposed NoC topologies, we used an interconnection network simulator developed on top of POPNET [20], with the latest version of the Orion power library [21] embedded. By providing detailed power characteristics of the network elements, Orion enables designers to make rapid power evaluations at the architecture level [21].
The POPNET simulator was modified to mimic the exact operation of the 2D mesh and 2D Kautz NoCs. The simulator was also customized to support other topologies and routing algorithms, such as the shuffle-exchange [22], de Bruijn [23], and Kautz [12] topologies. The power consumption can be obtained for each component of the network and for each layer (if implemented using 3D VLSI technology).
For each simulated network, the physical link width was assumed to be 32 bits. The power was calculated for a NoC in 90 nm technology whose routers operate at 250 MHz. Based on the core size information presented in [24], the side length of each IP core was set to 2 mm, and the length of each wire was set according to the number of cores it passes.
The message length was assumed to be M=32 flits, and V=2 virtual channels per physical channel were used. At each node, messages were generated according to a Poisson distribution with an average generation rate of λ messages per cycle. The simulations were performed under uniform, matrix-transpose, and hotspot [25] traffic patterns.
Note that for the matrix-transpose traffic load, it was assumed that 30% of the messages generated by a node were of matrix-transpose type (i.e., node (x,y) sends a message to node (y,x)) and the rest of the messages were sent to other nodes uniformly. For the hotspot traffic load, a hotspot rate of 16% was assumed, i.e., each node sent 16% of its messages to the hotspot node (assumed to be node (4,0) in the 8×8 network) and the rest of its messages to other network nodes uniformly.

4.1. SIMULATION RESULTS
In Fig. 6(a), the average message latency is displayed as a function of the message generation rate for the 8×8 2D mesh and 2D Kautz NoCs, using V=2 virtual channels per physical channel and a message size of M=32 flits, under various traffic patterns.
As can be seen in Fig. 6(a), the 2D Kautz NoC achieves a reduction in message latency with respect to the simple 2D mesh network over the full range of network load under various traffic patterns. The non-uniform link lengths of the Kautz network also cause some variation in the delay and power of the network links. Since the operating frequency of a NoC is often determined by the longest router pipeline stage, long wires need not degrade the NoC operating frequency, as long as no single wire segment is slower than that stage. This can be achieved by segmenting long links into regular fixed-length links connected by 1-flit buffers, each segment being as long as a link connecting two adjacent nodes. Using 1-flit buffers (inspired by pipelined circuit switching methods in conventional interconnection networks [25]) provides pipelining over the link and also acts as a repeater for it. By sending the flits of a message over a long link in a pipelined fashion, latency-insensitive operation is guaranteed, as discussed in [7]. Note that we have taken this pipelined transmission into account in the simulation, as well as the effect of buffers and link lengths on the power consumption.
Fig. 6(b) depicts the power consumption of the considered networks under various traffic patterns. As shown in the figure, the 2D Kautz network can effectively reduce the power consumption of the NoC compared to the equivalent mesh topology. The main source of this noticeable reduction is the smaller average hop count of the messages, which saves the power that would otherwise be consumed in intermediate routers of the equivalent mesh topology.
Fig. 7 shows the average latency and power consumption of the 3D and 2D implementations of the Kautz structure. In these experiments, horizontal links within each layer are nearly 10 times slower than vertical links. Under uniform traffic, the 3D structure has lower latency because of the fast vertical links in the network. Shorter links also consume less power, which lowers the total power consumption.
One of the main characteristics of 3D technologies is the reduction in the power dissipated by the links of the network. Fig. 8(a) compares the total power dissipated in the links of the 2D and 3D structures. The results show that, with the aid of vertical links, the link power dissipation decreases by nearly 40%. The 3D layout proposed in this paper has another important property for three-dimensional designs: the lower layers of the architecture consume less power because they have fewer links and therefore carry less traffic. This feature can reduce the effect of heat dissipation in 3D VLSI designs. Fig. 8(b) shows the power dissipated in each of the four layers of the structure.
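The traffic mixes of the simulation environment can be reproduced with a few lines of code. The Python sketch below is an illustration under stated assumptions, not the POPNET configuration used in the paper: it draws exponential inter-arrival times with mean 1/λ (one common way to realize Poisson message generation), sends 30% of messages to the transposed node under matrix-transpose traffic and 16% to the hotspot (4,0) under hotspot traffic, and otherwise picks a destination uniformly among the other 63 nodes. The function names and the handling of diagonal nodes and of the hotspot node itself are our own hypothetical choices.

```python
import random
from typing import Iterator, Tuple

Node = Tuple[int, int]
K = 8                   # 8x8 network, nodes (x, y) with 0 <= x, y < K
HOTSPOT: Node = (4, 0)  # hotspot node used in the paper
TRANSPOSE_RATE = 0.30   # share of matrix-transpose messages
HOTSPOT_RATE = 0.16     # share of messages sent to the hotspot


def uniform_dest(src: Node, rng: random.Random) -> Node:
    """Destination drawn uniformly from all nodes except the source."""
    while True:
        dst = (rng.randrange(K), rng.randrange(K))
        if dst != src:
            return dst


def pick_dest(src: Node, pattern: str, rng: random.Random) -> Node:
    """Destination of one new message under 'uniform', 'transpose' or 'hotspot' traffic."""
    x, y = src
    # Assumption: diagonal nodes (x == y) would send to themselves, so they use uniform traffic.
    if pattern == "transpose" and x != y and rng.random() < TRANSPOSE_RATE:
        return (y, x)
    # Assumption: the hotspot node itself only generates uniform traffic.
    if pattern == "hotspot" and src != HOTSPOT and rng.random() < HOTSPOT_RATE:
        return HOTSPOT
    return uniform_dest(src, rng)


def arrival_times(lam: float, horizon: float, rng: random.Random) -> Iterator[float]:
    """Generation instants (in cycles) of a Poisson process with rate lam messages/cycle."""
    t = rng.expovariate(lam)  # exponential gaps with mean 1/lam
    while t < horizon:
        yield t
        t += rng.expovariate(lam)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [pick_dest((2, 5), "hotspot", rng) for _ in range(20000)]
    print("hotspot share:", sum(d == HOTSPOT for d in sample) / len(sample))
    print("messages in 1000 cycles at lam=0.02:", sum(1 for _ in arrival_times(0.02, 1000.0, rng)))
```

Because uniform messages can also land on the hotspot, the measured hotspot share should come out near 0.16 + 0.84/63 ≈ 0.173 rather than exactly 0.16.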
Fig. 6. (a) Average delay and (b) total power comparison between the two-dimensional Kautz and mesh structures.
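The link-segmentation argument in the results can be made concrete with a little arithmetic. The Python sketch below assumes, as a simplification not stated in the paper, that each node-pitch segment takes one cycle and that the 1-flit buffers let the flits of a message stream back-to-back. Under that reading, a link spanning k pitches becomes k segments joined by k - 1 buffers, and an M-flit message needs k + M - 1 cycles to cross it, while throughput stays at one flit per cycle whatever the link length. The helper names are hypothetical; the 2 mm pitch is the core size used in the simulation environment.

```python
PITCH_MM = 2.0  # side length of an IP core, from the simulation setup


def segments(span_pitches: int) -> int:
    """Fixed-length segments for a link spanning span_pitches adjacent-node distances."""
    if span_pitches < 1:
        raise ValueError("a link spans at least one node pitch")
    return span_pitches


def buffers(span_pitches: int) -> int:
    """1-flit buffers joining consecutive segments (none for a one-pitch link)."""
    return segments(span_pitches) - 1


def pipelined_latency(span_pitches: int, flits: int) -> int:
    """Cycles for an M-flit message to cross the link, assuming one cycle per segment.

    The head flit needs one cycle per segment; because every buffer holds one flit,
    the remaining flits follow back-to-back, one per cycle.
    """
    return segments(span_pitches) + flits - 1


if __name__ == "__main__":
    for span in (1, 3, 5):
        print(f"{span * PITCH_MM:.0f} mm link: {buffers(span)} buffer(s), "
              f"{pipelined_latency(span, 32)} cycles for M=32")
```

For M=32 this gives 32, 34, and 36 cycles for 2, 6, and 10 mm links: the extra length adds only a few cycles of head latency, which is why long Kautz links need not set the clock period.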
Fig. 7. Average delay and total power comparison between the three- and two-dimensional implementations of the Kautz structure (vertical links are 10 times faster than horizontal links).
Fig. 8. a) Total link power comparison between the three- and two-dimensional implementations of Kautz. b) Layer power consumption.
Fig. 9. The area overhead of 8x8 and 16x16 digraph-based mesh NoCs with different buffer depths
[11] J. C. Bermond, R. W. Dawas, and F. O. Ergincan, "De Bruijn and Kautz Bus Networks," Networks, vol. 30, no. 3, pp. 205-218, 1997.
[12] R. Sabbaghi-Nadooshan and H. Sarbazi-Azad, "The Kautz Mesh: A New Topology for SoCs," International SoC Design Conference, pp. 300-303, 2008.
[13] C. Chen, P. Agarwal, and J. R. Burke, "dBcube: A New Class of Hierarchical Multiprocessor Interconnection Networks with Area Efficient Layout," IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 12, pp. 1332-1344, Dec. 1993.
[14] H. Park and D. P. Agrawal, "A Novel Deadlock-free Routing Technique for a Class of de Bruijn based Networks," 7th IEEE Symposium on Parallel & Distributed Processing, pp. 92-97, 1995.
[15] K. Puttaswamy and G. H. Loh, "Implementing Caches in a 3-D Technology for High Performance Processors," IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'05), USA, pp. 525-532, 2005.
[16] S. Das, A. Chandrakasan, and R. Reif, "Three Dimensional Integrated Circuits: Performance, Design, Methodology and CAD Tools," IEEE Computer Society Annual Symposium on VLSI, USA, pp. 13-18, 2003.
[17] S. J. Souri, K. Banerjee, A. Mehrotra, and K. C. Saraswat, "Multiple Si Layer ICs: Motivation, Performance Analysis and Design Implications," Design Automation Conference (DAC'00), USA, pp. 213-220, 2000.
[18] R. Zhang, K. Roy, C. K. Koh, and D. B. Janes, "Power Trends and Performance Characterization of 3-Dimensional Integration for Future Technology Generation," International Symposium on Quality Electronic Design, USA, pp. 217-222, 2001.
[19] J. Cong, A. Jagannathan, Y. Ma, G. Reinman, J. Wei, and Y. Zhang, "An Automated Design Flow for 3D Microarchitecture Evaluation," Asia and South Pacific Conference on Design Automation (ASPDAC'06), pp. 384-389, 2006.
[20] http://www.princeton.edu/~lshang/popnet.html, August 2007.
[21] H. Wang, X. Zhu, L. Peh, and S. Malik, "Orion: A Power-Performance Simulator for Interconnection Networks," 35th International Symposium on Microarchitecture, pp. 294-305, 2002.
[22] R. Sabbaghi-Nadooshan, M. Modarressi, and H. Sarbazi-Azad, "2DSEM: A Novel High-Performance and Low-Power Mesh-Based Topology for Networks-on-Chip," International Journal of Parallel, Emergent and Distributed Systems, vol. 25, no. 4, pp. 331-344, August 2010.
[23] R. Sabbaghi-Nadooshan, M. Modarressi, and H. Sarbazi-Azad, "2D DBM: An Attractive Alternative to the Simple 2D Mesh Topology for On-Chip Networks," ICCD, pp. 486-491, 2008.
[24] R. Mullins, A. West, and S. Moore, "The Design and Implementation of a Low-Latency On-Chip Network," Asia and South Pacific Design Automation Conference, pp. 164-169, 2006.
[25] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann Publishers, 2005.
[26] A. B. Kahng, B. Li, L.-S. Peh, and K. Samadi, "Orion 2.0: A Fast and Accurate NoC Power and Area Model for Early-Stage Design Space Exploration," DATE, pp. 423-428, 2009.
[27] J. Duato et al., "A High Performance Router Architecture for Interconnection Networks," in Proc. Int. Conf. Parallel Processing, pp. 61-68, 1996.

… Tehran, Iran, in 1991 and 1994, and the Ph.D. degree in electrical engineering from the Science and Research Branch, Islamic Azad University, Tehran, Iran, in 2010. In 1998 he became a faculty member of the Department of Electronics, Central Tehran Branch, Islamic Azad University, Tehran, Iran. His research interests include interconnection networks, Networks-on-Chip, and embedded systems.