

HyperX-NOC: A multipath-optimized network-on-chip topology


Reza Kourdy, Department of Computer Engineering, Islamic Azad University, Khorramabad Branch, Iran
Mohammad Reza Nouri rad, Department of Computer Engineering, Islamic Azad University, Khorramabad Branch, Iran

Abstract: In recent years, researchers have proposed a wide variety of topologies for data-center networks, all with the goal of providing high bisection bandwidth at low cost. In this paper we focus on the HyperX topology, because it is considered attractive for high-bisection-bandwidth data-center networks. For large-scale networks, high-radix switches reduce hop and switch counts, which decreases latency and power; silicon nanophotonic technology provides a long-term solution to this problem. We also carry out a high-level simulation of the on-chip network using ns-2 to verify the analysis.

Index Terms: HyperX, Network-on-Chip (NoC), Multi-Processor System-on-Chip (MpSoC)

1 INTRODUCTION
In a Multi-Processor System-on-Chip (MpSoC), the system is composed of multiple processors and functional units. In the past, inter-module on-chip communication was mostly based on shared buses between various masters and slaves. This approach is feasible for a system with a small number of modules; however, recent systems, which require high-volume data transfer between multiple modules across the chip, are not straightforwardly served by bus-based interconnects. To serve the communication needs of such large systems, scalable packet-switched Networks-on-Chip (NoC) [1] have been developed. The NoC provides a feasible solution to the performance and scalability issues and is actively studied in academia [2, 3, 4, 5].

Every 11 years since 1976, supercomputer performance has increased by a factor of 1000, reaching petaflop performance in 2008 [6]. By 2019, Moore's law is expected to provide a 160-fold increase in transistor density. While single-thread performance is rising only modestly, we expect a commensurate improvement in peak per-socket performance due to increased core count, coupled with advances in memory technology that provide the memory bandwidth needed to maintain system balance. This implies that at least a sixfold increase in the number of sockets will be needed to achieve exaflop performance. In 2008, the first two petascale systems were delivered: IBM's 18,802-socket Roadrunner [7] and Cray's Jaguar [8], comprising approximately 38,000 sockets.

Previous work has shown that high-radix switches can help achieve high bandwidth and low latency at reasonable power [9]. High-radix switches with high-bandwidth links cut the number of routing chips while maintaining high network bandwidth, and they reduce latency due to the reduced hop count. Supercomputers have often used a folded-Clos (also called a fat-tree) network topology. Kim, Dally, and Abts [10] showed, however, that when high-radix switches are available, the flattened butterfly is more cost-effective than the folded Clos.

2 RELATED WORK
The HyperX network, proposed by Ahn et al. [11], also recognizes the benefits of a flattened butterfly topology for building highly scalable networks; they do not address power or energy efficiency in their work. Al-Fares et al. [12] present a scheme to build a high-bandwidth datacenter network as a folded-Clos topology with commodity Ethernet chips. They find that their network, composed of 1 Gb/s Ethernet chips, uses considerably less power than a hierarchical network composed of high-end, power-inefficient 10 Gb/s switches. Our topology comparison is independent of any particular switch chip. A switch with sufficient radix, routing, and congestion-sensing capabilities allows building a flattened butterfly topology, which uses half the number of switch chips of the folded Clos. We then go even further by dynamically adjusting the power envelope of the network to increase its energy proportionality, an issue not tackled by Al-Fares et al. The VL2 datacenter network proposed by Greenberg et al. [13] also uses the folded-Clos topology, as does PortLand [14]; neither addresses the issue of power.

A recent study [15] shows that an exascale system will likely have 100,000 computational nodes. The increasing scale and performance will put tremendous pressure on the network, which is rapidly becoming both a power and a performance bottleneck [16]. High-radix network switches [17] are attractive since increasing the radix reduces the number of switches required for a given system size and the number of hops a packet must travel from source to destination. Both factors contribute to reduced communication latency, component cost, and power. High-radix switches can be connected hierarchically (in topologies such as folded-Clos networks [18]), directly (in a flattened butterfly or HyperX topology [19, 20]), or in a hybrid manner [21].


3 MULTIPATH TOPOLOGIES
Most of these topologies also require choices to be made for a variety of parameters:

N: total number of servers (or external connections)
R: switch radix (port count)
T: terminals connected to a single switch
S: total number of switches
L: levels of a tree
D: dimensions in a HyperX network
K: link bandwidth
W: total number of links in a network
C: number of top switches in a tree

In this paper, we consider the following topology families:

3.1 FatTree

Rather than limiting our approach to the three-level k-ary fat-tree structures described by Al-Fares et al. [12], we consider a generalized version of the Clos topologies with parametrized levels and fatness at each level, first defined by Öhring et al. [22] as Extended Generalized Fat Trees (EGFTs).

3.2 HyperX

HyperX [11] is an extension of the hypercube and flattened butterfly topologies. Switches are points in a D-dimensional integer lattice, with S_k switches in each dimension k = 1..D. The dimensions need not be equal. A switch connects to all other switches that share all but one of its coordinates (e.g., in a 2-D HyperX, a switch connects to all switches in the same row and in the same column). The link bandwidths K_1, ..., K_D are assumed to be fixed within each dimension, but can vary across dimensions. At each switch, T ports are assigned to server downlinks. We can therefore describe a network as HyperX(D, S, K, T), with S = (S_1, ..., S_D) and K = (K_1, ..., K_D) as vectors. HyperX(D, S, K, T) has P = ∏_{k=1..D} S_k switches, T·P servers, and (P/2)·∑_{k=1..D} (S_k − 1)·K_k links. However, we plan to support other interesting server-to-server topologies such as BCube [23] and CamCube [24], as well as traditional 2- or 3-tier topologies, to allow designers improved flexibility.

More formally (writing L for the number of HyperX dimensions, the D of the parameter list above), a HyperX is a direct network of switches in which each switch is connected to some fixed number T of terminals. A terminal can be a compute node, a cluster of compute nodes, an I/O node, or any other interconnected device. The switches are viewed as points in an L-dimensional integer lattice, and each switch is identified by a coordinate vector, or multi-index, I = (I_1, ..., I_L), where 0 <= I_k < S_k for each k = 1..L. In each dimension the switches are fully connected: a switch connects to all others whose multi-index is the same in all but one coordinate. Thus there are bidirectional links from each switch to exactly ∑_{k=1..L} (S_k − 1) other switches, and the number P of switches in the HyperX satisfies P = ∏_{k=1..L} S_k.

In a simple HyperX all links have uniform bandwidth. The topology can be generalized by allowing the link bandwidths to be multiples of some unit of bandwidth, to model the option of trunking multiple physical-layer links. This flexibility can be exploited to provide uniform bandwidth between dimensions with different values of S_k, and to allow different bandwidths between terminal and intra-switch links. We let K = (K_1, ..., K_L) represent the relative link bandwidths in each of the dimensions, where the unit of bandwidth (conceptually offered by one hardware link and one switch port) is the bandwidth of the terminal-to-switch connections. A regular HyperX is one for which S_k = S and K_k = K for all k = 1..L. Thus, a regular HyperX is determined by the parameters L, S, K, and T, and we refer to it as a regular (L, S, K, T) HyperX.

Figure 1 shows two examples of the HyperX topology. There are 32 terminals in Figure 1(a), with 4 terminals attached per switch. Switches are organized in two dimensions: two switches in the first dimension and four in the second, creating an irregular HyperX. The regular HyperX in Figure 1(b) also has 4 terminals per switch and two switch dimensions, but each dimension consists of three switches, and it supports 36 terminals. Note that a hypercube is a regular HyperX with (S = 2, K = 1, T = 1), and a fully connected graph is a HyperX with L = 1. We can also describe the flattened butterfly topology as a regular HyperX with T = S and either K = 1 or, for full bisection bandwidth, K = 2. The topology of the YARC high-radix switch [25] is (L = 2, S = 8, K = 1, T = 1).

Fig. 1. Two examples of the HyperX topology. Dark circles (marked T) are terminals, light circles (marked R) are switches.

The bisection bandwidth of a HyperX is realized by cutting one of its dimensions in half. The channel bisection of such a cut, if dimension m is bisected, is

C_m = (1/4) K_m S_m ∏_{k=1..L} S_k = (P/4) K_m S_m.    (1)

If K_m S_m is smallest among all the dimensions of the HyperX, so that K_m S_m <= K_k S_k for all k = 1..L, then (1) determines the bisection bandwidth. A HyperX network needs at least TP/2 bidirectional links crossing any bisection in order to be nonblocking; thus, the ratio β = K_m S_m / (2T) measures the relative bisection bandwidth of the architecture. As discussed above, for a regular (L, S, 2, S) HyperX, i.e. a flattened butterfly with double-wide switch-switch links, we have β = 1.
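To make these counting formulas concrete, the following Tcl sketch (our own illustration, not part of the paper; the procedure name and argument layout are assumptions) computes the switch, terminal, and link counts, the channel bisection of equation (1), and the relative bisection bandwidth β for a given HyperX.

proc hyperxMetrics {Slist Klist T} {
    # Slist = (S_1 ... S_L), Klist = (K_1 ... K_L), T = terminals per switch
    set P 1
    foreach s $Slist { set P [expr {$P * $s}] }             ;# P = product of the S_k
    set portsPerSwitch 0
    foreach s $Slist k $Klist {
        set portsPerSwitch [expr {$portsPerSwitch + ($s - 1) * $k}]
    }
    set links [expr {$P * $portsPerSwitch / 2.0}]           ;# every link is counted at both endpoints, hence /2
    set minKS -1
    foreach s $Slist k $Klist {                             ;# dimension with the smallest K_m*S_m
        set ks [expr {$k * $s}]
        if {$minKS < 0 || $ks < $minKS} { set minKS $ks }
    }
    set Cm   [expr {$P / 4.0 * $minKS}]                     ;# channel bisection, equation (1)
    set beta [expr {double($minKS) / (2 * $T)}]             ;# relative bisection bandwidth
    return [list switches $P terminals [expr {$T * $P}] links $links bisection $Cm beta $beta]
}

# The irregular HyperX of Figure 1(a): S = (2, 4), K = (1, 1), T = 4
puts [hyperxMetrics {2 4} {1 1} 4]   ;# reports 8 switches, 32 terminals, 16 links, beta = 0.25

For the regular HyperX of Figure 1(b), hyperxMetrics {3 3} {1 1} 4 reports 9 switches and 36 terminals, in agreement with the description above.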



4 THE SPACE OF REGULAR HYPERX NETWORKS


Since it is determined by fewer parameters, we can visualize the regular HyperX design space more easily than the general one. In a regular HyperX, the network shape S and the trunking factor K are scalars, so there are four free design-space parameters, namely L, S, K, and T. For some (R, N, K, L) combinations, the lower bound on S lies entirely above these two upper bounds, and there are no feasible HyperX designs.
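To make the feasibility question concrete, the Tcl sketch below (our own illustration, not the paper's design tool) enumerates regular (L, S, K, T) designs under constraints that we assume from the definitions in Section 3.2: the port budget T + K·L·(S − 1) <= R, the terminal count T·S^L >= N, and the relative bisection bandwidth K·S/(2T) >= β_min. The search ranges and the helper name are assumptions.

proc enumerateRegularHyperX {R N betaMin} {
    # Returns every regular (L,S,K,T) HyperX that fits radix-R switches,
    # offers at least N terminals and relative bisection bandwidth >= betaMin.
    set designs {}
    for {set L 1} {$L <= 5} {incr L} {
        for {set S 2} {$S <= $R} {incr S} {
            for {set K 1} {$K <= 4} {incr K} {
                for {set T 1} {$T <= $R} {incr T} {
                    set ports [expr {$T + $K * $L * ($S - 1)}]     ;# ports used per switch
                    set terms [expr {$T * pow($S, $L)}]            ;# terminals provided
                    set beta  [expr {double($K * $S) / (2 * $T)}]  ;# relative bisection bandwidth
                    if {$ports <= $R && $terms >= $N && $beta >= $betaMin} {
                        lappend designs [list L $L S $S K $K T $T switches [expr {round(pow($S, $L))}]]
                    }
                }
            }
        }
    }
    return $designs
}

# Example: radix-64 switches, at least 1024 terminals, full bisection bandwidth
set feasible [enumerateRegularHyperX 64 1024 1.0]
puts "[llength $feasible] feasible regular designs, e.g. [lindex $feasible 0]"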

4.1 Comparing general and regular HyperX

The irregular HyperX has fewer switches in all cases, the reduction ranging from 8 to 28 percent. Interestingly, the best results were always three-dimensional in the general case, whereas the best regular designs were four-dimensional in two cases. Flattened butterflies are regular HyperX networks for which K = 2. If we add this restriction, then for the full-bisection-bandwidth case (β = 1) we get an inferior design (L = 4, S = 11, K = 2, T = 9) having 14,641 switches. If we add the further restriction imposed by the flattened butterfly, namely T = S, then we have a network with S = 11 and size N = 11^5 = 161,051, with 29,979 more terminal ports than needed.

5 COST MODEL

In order to optimize the total cost of a network, we must have a cost model. Some costs are relatively easy to model; these include:

Parts costs: These cover items that a system vendor would buy from other suppliers, such as switches, cables, and connectors.

Manufacturing costs: Given the large physical size of a container-based cluster and the relatively small quantities manufactured, cables for these systems are installed by hand.

Design costs: A network designer must spend considerable time understanding the requirements for a network, then generating and evaluating specific options. Our approach aims to reduce this cost while improving the designs produced. A vendor of container-based clusters would also prefer to deal with a limited number of designs, since each new design requires new Quality Assurance (QA) processes, and each new design must be explained to justifiably skeptical customers.

SKU costs: When a system vendor must deal with a large variety of different parts (often called Stock-Keeping Units, or SKUs), this creates complexity and generally increases costs. One of our goals, therefore, is to generate network designs that require only a small set of SKUs; generally this means only a few types and lengths of pre-built cables.

Cost to reconfigure the network: Some clusters are born large; others have largeness thrust upon them later on. A good network design allows for the incremental installation of capacity, for example one rack of servers at a time, without requiring the re-wiring of the existing network. When such rewiring is required, it should be minimized.

Maintenance costs: Electronics, cables, and connectors do fail. A network design that confuses repair people will lead to higher repair costs and/or more frequent mis-repairs.

6 SIMULATION FRAMEWORK

In this paper, we have modeled our HyperX-NoC architecture concepts with the widely used network simulator ns-2 [26]. ns-2 has been widely applied in research related to the design and evaluation of computer networks, and it has been used to evaluate various design options for NoC architectures [27], including the design of routers, communication protocols, etc.
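For reference, a minimal ns-2 skeleton into which the topology fragment of Section 6.2 can be pasted is sketched below; this is our own illustration, and the variable names ($s1, $s2, $show_Label_Switch), trace-file names, and stop time are assumptions chosen to match the script that follows.

set ns [new Simulator]
set s1 3                      ;# switches in the first dimension (assumed value)
set s2 3                      ;# switches in the second dimension (assumed value)
set show_Label_Switch 1       ;# label switch nodes in the nam animation

set tracefd [open hyperx.tr w]
$ns trace-all $tracefd
set namfd [open hyperx.nam w]
$ns namtrace-all $namfd

proc finish {} {
    global ns tracefd namfd
    $ns flush-trace
    close $tracefd
    close $namfd
    exit 0
}

# ... HyperX topology and traffic definitions go here (Section 6.2) ...

$ns at 5.0 "finish"
$ns run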


6.1 HyperX topology simulations

6.2 Simulation experiments

All of the topology parameters can be described in a Tcl script file. The part of the ns-2 script that constructs the topology is shown below.

# Create the switch nodes
for {set i 1} {$i <= $s1} {incr i} {
    for {set j 1} {$j <= $s2} {incr j} {
        set sw([expr ($i*10+$j)]) [$ns node]
        $sw([expr ($i*10+$j)]) color blue
        if {$show_Label_Switch == 1} {
            $sw([expr ($i*10+$j)]) label sw[expr ($i*10+$j)]
        }
    }
}

# Create the switch-to-switch links (dimension x)
for {set i 1} {$i <= $s1} {incr i} {
    for {set j1 1} {$j1 <= [expr ($s2-1)]} {incr j1} {
        for {set j2 [expr ($j1+1)]} {$j2 <= $s2} {incr j2} {
            $ns duplex-link $sw([expr ($i*10+$j1)]) $sw([expr ($i*10+$j2)]) 1Mb 10ms DropTail
        }
    }
}

# Create the switch-to-switch links (dimension y)
for {set j 1} {$j <= $s2} {incr j} {
    for {set i1 1} {$i1 <= [expr ($s1-1)]} {incr i1} {
        for {set i2 [expr ($i1+1)]} {$i2 <= $s1} {incr i2} {
            $ns duplex-link $sw([expr ($i1*10+$j)]) $sw([expr ($i2*10+$j)]) 1Mb 10ms DropTail
        }
    }
}
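The fragment above only builds the switch-to-switch links. A possible continuation (our own sketch, not part of the paper's script) attaches T terminal (resource) nodes per switch, reusing the $i*10+$j numbering convention, and drives a single CBR flow across the network; the value T = 2 and the traffic parameters are assumptions.

# Attach T terminal (resource) nodes to every switch
set T 2
for {set i 1} {$i <= $s1} {incr i} {
    for {set j 1} {$j <= $s2} {incr j} {
        for {set t 1} {$t <= $T} {incr t} {
            set term([expr (($i*10+$j)*10+$t)]) [$ns node]
            $ns duplex-link $term([expr (($i*10+$j)*10+$t)]) $sw([expr ($i*10+$j)]) 1Mb 10ms DropTail
        }
    }
}

# One CBR flow from a terminal on switch (1,1) to a terminal on switch (s1,s2)
set udp [new Agent/UDP]
$ns attach-agent $term(111) $udp
set sink [new Agent/Null]
$ns attach-agent $term([expr (($s1*10+$s2)*10+1)]) $sink
$ns connect $udp $sink
set cbr [new Application/Traffic/CBR]
$cbr attach-agent $udp
$cbr set rate_ 0.5Mb
$ns at 0.5 "$cbr start"
$ns at 4.5 "$cbr stop"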



Fig. 2. 2D HyperX 3*3*2: 9 switches, 18 resources.
Fig. 3. 2D HyperX 3*3*4: 9 switches, 36 resources.
Fig. 4. 2D HyperX 3*4*2: 12 switches, 24 resources.
Fig. 5. 3D HyperX 3*3*3*2: 27 switches, 54 resources.

7 CONCLUSIONS AND FUTURE WORK


In this paper, we have discussed the major research challenges in providing QoS in future NoC platforms. A novel simulation of the HyperX-NoC has been presented using ns-2, which is suitable for a wide range of network-on-chip applications. Further research should be performed to find additional simplifications and improvements of the methods.

REFERENCES
[1] W. J. Dally and B. Towles, "Route packets, not wires: On-chip interconnection networks," in Proc. 38th Annual Design Automation Conference (DAC), 2001.
[2] M. Dall'Osso, G. Biccari, L. Giovannini, D. Bertozzi, and L. Benini, "xpipes: a latency insensitive parameterized network-on-chip architecture for multiprocessor SoCs," in Proc. International Conference on Computer Design (ICCD), 2003, pp. 536-539.
[3] M. Millberg, E. Nilsson, R. Thid, S. Kumar, and A. Jantsch, "The Nostrum backbone - a communication protocol stack for networks on chip," in Proc. VLSI Design Conference, Jan. 2004, pp. 693-696.
[4] K. Srinivasan and K. S. Chatha, "A low complexity heuristic for design of custom network-on-chip architectures," in Proc. Design, Automation and Test in Europe (DATE), 2006, pp. 130-135.
[5] K. Srinivasan, K. S. Chatha, and G. Konjevod, "Linear programming based technique for synthesis of network-on-chip architectures," IEEE Transactions on VLSI Systems, vol. 14, pp. 407-420, April 2006.
[6] "Top500 List," http://www.top500.org/list.
[7] A. Komornicki, G. Mullen-Schulz, and D. Landon, "Roadrunner: Hardware and Software Overview," IBM Redpaper, 2009.
[8] National Center for Computational Sciences, "Jaguar," http://www.nccs.gov/computing-resources/jaguar.
[9] J. Kim, W. J. Dally, B. Towles, and A. K. Gupta, "Microarchitecture of a High-Radix Router," in Proc. ISCA, Jun. 2005.
[10] J. Kim, W. J. Dally, and D. Abts, "Flattened Butterfly: a Cost-Efficient Topology for High-Radix Networks," in Proc. ISCA, Jun. 2007.
[11] J. Ahn, N. Binkert, A. Davis, M. McLaren, and R. S. Schreiber, "HyperX: Topology, Routing, and Packaging of Efficient Large-Scale Networks," in Proc. Conference on High Performance Computing Networking, Storage and Analysis (SC), Nov. 2009.
[12] M. Al-Fares, A. Loukissas, and A. Vahdat, "A scalable, commodity data center network architecture," in Proc. ACM SIGCOMM, 2008.
[13] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, "VL2: a scalable and flexible data center network," in Proc. ACM SIGCOMM, 2009, pp. 51-62.
[14] R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, "PortLand: a scalable fault-tolerant layer 2 data center network fabric," SIGCOMM Computer Communication Review, vol. 39, no. 4, pp. 39-50, 2009.
[15] P. M. Kogge (editor), "Exascale computing study: Technology challenges in achieving exascale systems," Technical Report TR-2008-13, University of Notre Dame, 2008.
[16] A. Krishnamoorthy, R. Ho, X. Zheng, H. Schwetman, J. Lexau, P. Koka, G. Li, I. Shubin, and J. Cunningham, "The integration of silicon photonics and VLSI electronics for computing systems," in Proc. International Conference on Photonics in Switching (PS '09), 2009, pp. 1-4.
[17] J. Kim, W. J. Dally, B. Towles, and A. K. Gupta, "Microarchitecture of a High-Radix Router," in Proc. ISCA, Jun. 2005.
[18] J. Kim, W. J. Dally, and D. Abts, "Adaptive Routing in High-Radix Clos Network," in Proc. SC'06, Nov. 2006.
[19] J. Kim, W. J. Dally, and D. Abts, "Flattened Butterfly: a Cost-Efficient Topology for High-Radix Networks," in Proc. ISCA, Jun. 2007.
[20] J. Kim, W. J. Dally, S. Scott, and D. Abts, "Technology-Driven, Highly-Scalable Dragonfly Topology," in Proc. ISCA, Jun. 2008.
[21] M. Al-Fares, A. Loukissas, and A. Vahdat, "A scalable, commodity data center network architecture," in Proc. ACM SIGCOMM, 2008.
[22] S. Öhring, M. Ibel, S. Das, and M. Kumar, "On generalized fat trees," in Proc. International Parallel Processing Symposium, 1995.
[23] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu, "BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers," in Proc. ACM SIGCOMM, Barcelona, 2009.
[24] H. Abu-Libdeh, P. Costa, A. Rowstron, G. O'Shea, and A. Donnelly, "Symbiotic Routing in Future Data Centers," in Proc. ACM SIGCOMM, 2010.
[25] S. Scott, D. Abts, J. Kim, and W. J. Dally, "The BlackWidow High-Radix Clos Network," in Proc. ISCA, Jun. 2006.
[26] The Network Simulator ns-2, www.isi.edu/nsnam/ns.
[27] R. Lemaire, F. Clermidy, Y. Durand, D. Lattard, and A. Jerraya, "Performance Evaluation of a NoC-Based Design for MC-CDMA Telecommunications Using NS-2," in Proc. 16th IEEE International Workshop on Rapid System Prototyping, Jun. 2005, pp. 24-30.
