

Survey of Network-on-chip Proposals
Erno Salminen, Ari Kulmala, and Timo D. Hämäläinen
Tampere University of Technology, P.O. Box 553, FIN-33101 Tampere, Finland
Abstract — This paper gives an overview of the state of the art in network-on-chip (NoC) proposals. The NoC paradigm replaces dedicated, design-specific wires with a scalable, general-purpose, multi-hop network. Numerous examples from the literature are selected to highlight contemporary approaches and reported implementation results. The major trends of NoC research and the aspects that require further investigation are pointed out. A packet-switched 2-D mesh is the most used and studied topology so far, and it also represents a sort of average NoC at present. Good results and interesting proposals abound; however, large differences in implementation results, vague documentation, and a lack of comparison were also observed.

Keywords: network-on-chip, literature study, router, topology, comparison, implementation

CONTENTS

I    Introduction
II   Comparison Criteria
III  NoC Architecture Comparison
     III-A  Switching policy
     III-B  Topology
     III-C  Routing
     III-D  Quality-of-Service
     III-E  Testing and fault-tolerance
IV   NoC Evaluation Methods
     IV-A  Evaluation metrics
     IV-B  Test cases
V    NoC Router Implementation
     V-A  Router parameters
     V-B  Minimum latency
     V-C  Area
     V-D  Operating frequency
VI   Discussion
VII  Conclusions
References


I. INTRODUCTION

System-on-chip (SoC) architectures are becoming communication-bound both from the physical wiring and the distributed computation point of view. Wiring delays are becoming dominant over gate delays, which favors short links. Furthermore, the larger the SoC, the more probably the overall computation is heterogeneous and localized rather than evenly balanced over the chip. These two factors motivate the Network-on-Chip (NoC), which brings the techniques developed for macro-scale, multi-hop networks onto a chip.

The early work and basic principles of the NoC paradigm were outlined in various seminal articles, for example [99][34][35][25][8][53][97][91][36][72][73], and a few text books [44][69][9]. NoCs have been widely reported in several special issues of journals, numerous special sessions at conferences, and recently also a dedicated NoC symposium (www.nocs.org). As an example of the increased interest, Fig. 1 shows the hit count for the search "network-on-chip" in the IEEE Xplore document archive (http://ieeexplore.ieee.org).

The key research areas in network-on-chip design can be summarized, for example [72][12], as
- Communication infrastructure: topology and link optimization, buffer sizing, floorplanning, clock domains, power
- Communication paradigm: routing, switching, flow control, quality-of-service, network interfaces
- Application mapping: task mapping/scheduling and IP component mapping
- Benchmarking and traffic characterization for design- and runtime optimization

Task scheduling refers to the order in which the application tasks are executed, and task mapping defines which processing element (PE) executes a certain task. IP mapping, on the other hand, defines how PEs and other resources are connected to the NoC. Topology defines their logical layout (connections) whereas the floorplan defines the physical layout. Routing decides the path taken from source to destination, whereas switching and flow control policies define the timing of transfers.

However, the aforementioned sources do not present many implementation examples or conclusions about the current proposals. This article complements the surveys [12][87] by providing a categorized, in-depth literature study of NoC proposals and their implementation results. We analyzed well over 100 peer-reviewed NoC publications and created a classification of their general properties, what kind of details have been reported, and what we should conclude for the future of NoC. The good old shared bus is still very common in practical SoC implementations, so it is well motivated to have a deeper look at NoC proposals and figure out what really matters. We present extensive comparisons to show the major trends, and this paper also identifies the aspects calling for further research. For illustrative purposes, we describe the potential pitfalls in general terms and without pointing to the sources; we have made a deliberate choice not to point out what we perceive as mistakes in others' work.

Fig. 1. Search hits for "network-on-chip" in the IEEE Xplore archive, by year of publication (2000-2007).

II. COMPARISON CRITERIA

The major goal of communication-centric design and the NoC paradigm is to achieve greater design productivity and performance by handling the increasing parallelism, manufacturing complexity, wiring problems, and reliability. The three critical challenges for NoC according to Owens et al. are power, latency, and CAD compatibility [73].

A NoC consists of routers, links, and network interfaces. Routers direct data over several links (hops). The function of a network interface (adapter) is to decouple computation (the resources) from communication (the network). Fig. 2 shows an example SoC with a NoC and nine heterogeneous IP blocks that are CPUs, memories, input/output devices, and HW accelerators. The size of one resource is, for example, 50-200 kilogates or larger. This means that tens of resources are possible on a single chip with modern 65 nm processing technology.

Despite the vast literature on the topic, most articles do not give a clear definition of what they mean by the term NoC. Sometimes it seems to mean only packet-switched, mesh-like networks.

Furthermore, many papers give an impression that a "NoC" magically improves every design where it is utilized, but fail to show that or to characterize the minimal requirements of a NoC. To avoid such hype, this paper uses a simple, although loose, definition that "network-on-chip is a communication network that is used on chip". Some sources assume packet-switching as a key property of NoCs, but we will consider also the circuit-switched networks and do not make any limitation by topology or switching in this paper. An excellent text book about network basics in general is [24]. Interested readers are encouraged to study also the NoC-related on-line resources, especially OCP-IP's NoC bibliography (http://www.ocpip.org/university/biblio main.htm). Another bibliography is collected by R. Mullins, Univ. of Cambridge (http://www.cl.cam.ac.uk/ rdm34/onChipNetBib/browser.htm).

III. NOC ARCHITECTURE COMPARISON

An extensive summary of NoC proposals is listed in Table I. Although very extensive, Table I cannot be all-inclusive due to the limited space. However, we argue that the selected publications offer a representative view of NoC design. The focus is on the design and implementation of the network components, and only a single reference per NoC is included. Other aspects like design automation [10][59][72][74][45], performance evaluation [75][33][87], encoded signaling [56][43], error detection and correction [67][11], wire optimization [56][22][70][65], and programming models [77] are covered in more detail in the referenced publications. In [8], the network is the abstraction of the communication among components and must satisfy quality-of-service requirements, such as reliability, performance, and energy bounds.

For each NoC, the table gives the topology, switching scheme, and routing, as well as the utilized evaluation methodology and criteria; in other words, we consider architectural features and implementation-specific figures of merit as well as the validation for each paper. The upper part of Table I lists the circuit-switched (c in the table) networks and the lower part the packet-switched (p) ones, both in alphabetical order. The table lists support of various configurations if the papers explicitly report it; any unclear or partially applying properties are marked in parentheses. The bottom rows show the number and percentage of the papers reporting/using the given property. For comparison, the percentages from another survey [87] are also given. Each property is explained in more detail in the following.

Fig. 2. An example SoC that has a 2-D mesh NoC with 9 resources (CPUs, memories, I/O, and HW accelerators connected through routers and links). The network interface is denoted with NI.

A. Switching policy

Circuit-switching forms a path from source to destination prior to the transfer by reserving the routers (switches) and links. All data follow that route, and the path is torn down after the transfer has completed. The transmitted data remain in order since the circuit stays intact throughout the transfer. The circuit-switching scheme also reduces the buffering needs at the routers. Circuit-switching is best suited for predictable transfers that are long enough to amortize the setup latency and which require performance guarantees.

Packet-switching performs routing on a per-packet basis. It necessitates buffering and introduces unpredictable latency (jitter) but is more flexible, especially for small transfers. It is an open research problem to analyze the break-even point (predictability and duration of transfers) between the two schemes. Switching policy is not clearly stated in all cases, and sometimes both schemes are supported. Packet-switching is more common and is utilized in about 80% of the studied NoCs.

There are three choices for how packets are forwarded and stored at routers: store-and-forward, cut-through, and wormhole. The store-and-forward (s) method waits for the whole packet before making routing decisions, whereas cut-through (c) forwards the packet already when the header information is available. Both methods need buffering capacity for one full packet at minimum. Wormhole switching (w) is the most popular and well suited on chip. It splits the packets into several flits (flow control digits). Routing is done as soon as possible, similarly to cut-through, but the buffer space can be smaller (only one flit at smallest). Hence, the packet may spread over many consecutive routers and links like a worm.
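As a back-of-the-envelope illustration of the trade-off described above, the sketch below compares the zero-load (contention-free) latency of store-and-forward and wormhole/cut-through forwarding using the standard first-order model (see, e.g., [24]). The hop count, packet length, and one-cycle router delay are illustrative assumptions, not figures from any surveyed NoC.

```python
def store_and_forward_latency(hops, packet_flits, router_cycles=1):
    # Every router must receive the whole packet before forwarding it,
    # so the serialization delay is paid again on each hop.
    return hops * (router_cycles + packet_flits)

def wormhole_latency(hops, packet_flits, router_cycles=1):
    # Only the header pays the per-hop router delay; the body flits
    # follow in a pipeline, so serialization is paid only once.
    return hops * router_cycles + packet_flits

# Example: 4-hop path, 8-flit packet, single-cycle routers (illustrative values).
print(store_and_forward_latency(4, 8))  # 36 cycles
print(wormhole_latency(4, 8))           # 12 cycles
```

The gap grows with both distance and packet length, which is one reason wormhole switching dominates the surveyed proposals.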

TABLE I
EXTENSIVE SUMMARY OF NETWORK-ON-CHIP PROPOSALS IN LITERATURE.
(The table covers 60 networks: the circuit-switched AET, aSOC, Crossroad, dTDMA, HIBI, Hu, Nexus, PNoC, ProtoNoC, SDM NoC, SoCBus, and Wolkotte in the upper part, and the packet-switched Aethereal, Ahmad, ANOC, Arteris, Bartic, Blackbus, Bouhraoua, Butterfly FT, Carloni, CDMA NoC, CDMA NoC(2), Chi, Cliche, Crossbow, DyAD, Eclipse, Ext. mesh, Faust, Feero, Felicijan, Gezel, GDB, Hermes, Kavaldjiev, Kim, Karim, Kumar, Lochside, Mango, Microspider, Mocres, Mondinelli, Nostrum, Octagon, Orion/Luna, Pavlidis, PMCNOC, Proteo, QNoC, Ring road, Slim-Spider, SNA, SocIn/RaSoc, SPIN, Star-mesh, TeraFLOPS, XGFT, Xpipes, and Xu in the lower part, both in alphabetical order. For each network it gives the reference; switching (c = circuit-switched, p = packet-switched) and forwarding scheme (w = wormhole, c = cut-through, s = store-and-forward); supported topologies (symbols as in Table II); routing (d = deterministic, a = adaptive, with runtime and custom variants); offered guarantees; evaluation method (analytical, simulation, synthesis, FPGA, ASIC); evaluation metrics (area, frequency, latency, utilization, power/energy, other); comparison targets (bus, crossbar, mesh/torus, tree/fat-tree, ring, point-to-point, custom); use of applications; the number of NoC terminals in the largest prototype; and the number of reported criteria. The bottom rows give the number and percentage of papers reporting each property, together with percentages from another survey. Unclear or partially applying properties are marked in parentheses.)

B. Topology

The most common topologies are the 2-D mesh and torus, which together constitute over 60% of the cases. Both have connections between four neighbor nodes, but the torus has wraparound links connecting the nodes on the network edges whereas the mesh does not. Custom topologies, fat-trees, and crossbars have roughly even proportions, followed by ring-based topologies. A crossbar (or star) is a non-blocking network with high performance; however, its high implementation cost restricts its usage to local communication instead of large systems. CDMA NoCs [49][101] are here considered as crossbars. Table II lists the used topology symbols, and Fig. 3 shows some basic topologies. An otherwise similar distribution was observed in [87], except that buses were commonly used as a reference instead of multi-hop topologies.

TABLE II
THE TOPOLOGY SYMBOLS USED IN OTHER TABLES

  Symbol    Topology
  b, hb     (single shared) bus, hierarchical bus
  x         crossbar
  m, t      2-D mesh, 2-D torus
  em, sm    2-D extended mesh, 2-D star-mesh
  3m, h     3-D mesh, 3-D hypercube (i.e., torus)
  tr, ftr   tree, fat-tree
  r, er     ring, extended ring
  c, (c)    custom; (c) = custom, but only one topology is reported
  p         point-to-point
  db        de Bruijn graph

Fig. 3. Some basic network topologies (point-to-point, bus, hierarchical bus, ring, crossbar, fat-tree, 2-D torus, custom). All examples have 8 terminals and bidirectional links.

Physical restrictions in IC layout favor the utilization of two-dimensional topologies, and all but three papers in Table I have a two-dimensional topology. However, integrating ICs in a 3-D fashion clearly favors 3-D NoCs [28][78]; Soteriou et al. found the 3-D mesh the most suitable NoC in many cases [92]. A regular structure suits especially well general-purpose chip-multiprocessors (CMPs) with homogeneous resources [95]. Resources in contemporary SoCs, on the other hand, often have varying size and shape, which should be accounted for in topology/floorplan analyses.

Unlike general-purpose macro-networks, NoCs can be tailored according to the requirements of the application domain. Application-specific networks can obtain superior performance while minimizing both area and energy; see for example [40][53][10][31]. A topology can also be extended by systematically adding links between non-neighboring nodes [71][46][27]. Such links reduce congestion and increase fault-tolerance but require special attention to avoid deadlocks. Hierarchically constructed networks differentiate local and global topologies, for example Slim-spider (hierarchical star) [56], star-mesh [21], and SNA [57]. This way the topology better matches the differences between local and global communication. A few approaches are customizable, but the results are given for one topology only [4][7][20][58][90]. However, customization is contradictory to the early projections of simplified layout and wiring optimization that are possible with regular topologies [35][25][36]. The properties of basic topologies are quite well known, but customization and IP mapping emphasize the importance of efficient design automation tools.

It is worth noting that no single fixed topology categorically outperforms all others. It was found in the survey [87] that 3 networks are compared on average, and that the largest studies included 9 [56] and 12 networks [92]. Although 5 network sizes were included on average, half of the cases considered only one size (number of terminals). Hence, more disciplined evaluation is called for.

C. Routing

With deterministic routing (d in the table), all packets follow the same route between a given source-destination pair. With adaptive schemes (a), the route can vary. Packet-switched networks mostly utilize deterministic routing (about 70%), but adaptivity, or at least some means for reprogramming the routing policy, is necessary for fault-tolerance. Updating the routing tables at runtime allows adaptivity even though the routing is deterministic (marked d*) [6][71]. The studied circuit-switched NoCs do not commonly discuss how the circuits are set up, hence the notation (d). DyAD [41] switches automatically between deterministic and adaptive routing at runtime and guarantees freedom from deadlock and livelock, whereas Hermes supports selection of the scheme at design time [64]. Deflection routing forwards packets every cycle and uses adaptive misrouting (other than the minimal path) in case the target direction is occupied [62].
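A minimal sketch of dimension-ordered XY routing, the canonical deterministic scheme for 2-D meshes, makes the "same route for every packet" property concrete. The coordinates, direction names, and the function itself are purely illustrative and not drawn from any specific surveyed proposal.

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) deterministic routing in a 2-D mesh.

    Returns the sequence of output directions a packet takes from router
    `src` to router `dst`: first the X offset is resolved, then the Y offset.
    Coordinates are (x, y) tuples.
    """
    (sx, sy), (dx, dy) = src, dst
    hops = []
    while sx != dx:                       # resolve the X offset first
        sx, step = (sx + 1, "E") if dx > sx else (sx - 1, "W")
        hops.append(step)
    while sy != dy:                       # then the Y offset
        sy, step = (sy + 1, "N") if dy > sy else (sy - 1, "S")
        hops.append(step)
    return hops                           # every (src, dst) pair yields the same path

print(xy_route((0, 0), (2, 1)))           # ['E', 'E', 'N']
```

Because the turn from X to Y is the only one allowed, the scheme is deadlock-free without virtual channels, which is a large part of its popularity.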

Although deadlock is generally avoided, out-of-order packet delivery is problematic with adaptive routing. Many sources neglect this phenomenon totally, and others, for example [34][62], assume that the network interface or software performs the re-ordering. The cost of re-ordering in terms of runtime and area overheads is unfortunately neglected, which we find a major deficiency. An interesting option is to split the traffic across several paths to reduce congestion in a certain area of the network [10].

D. Quality-of-Service

Quality-of-service (QoS) refers to the levels of guarantees, either hard or soft, given for data transfers. Guarantees are related to timing (min. throughput, max. latency, max. latency jitter), integrity (max. error rate, max. packet loss), and packet delivery (in-order or out-of-order). Almost half of the articles mention QoS, but only a few actually discuss the implementation in detail or evaluate its efficiency. Most QoS issues are coupled with the routing and flow control policies.

a) Best effort (BE): the BE scheme forwards packets as soon as possible, but no guarantees are given for latency or throughput in the general case. If packet injection into the network is restricted by the NI, a (loose) upper bound can be determined for the network latency but not for the waiting time at the NI.

b) Guaranteed throughput (GT): the GT scheme can offer a minimum level of transfer capability through the network. Over half of the articles promise guarantees and concentrate mostly on timing aspects. Three methods are employed to give timing guarantees: network dimensioning, circuit-switching, and prioritized packet scheduling [61]. In network dimensioning, the network size (link width, buffer sizes) is set to tolerate the estimated worst-case conditions; this is the most common approach nowadays. With circuit-switching, the simultaneously activated circuits must be statically determined; otherwise, the circuit setup may fail at runtime and lead to a non-deterministic delay prior to the transfer. Prioritizing packets offers relative guarantees, for example lower latency for high-priority packets, but no exact guarantees either.

The fundamental feature of GT is the necessity of a priori knowledge about the traffic load conditions. At the same time, the complexity of traffic modeling and system simulation is high, and no guarantees can be given to new applications [61]. Conservative models are especially critical when designing real-time systems. More research is needed to obtain accurate, representative traffic models that can be used in benchmarking and in NoC design. Moreover, guaranteed services should coexist with BE and be provided with minimal overhead.
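One widely used way to realize hard timing guarantees in the literature is a TDMA-style slot table (as in, for example, Aethereal-type GT services). The sketch below shows the bandwidth and injection-wait bound implied by such a reservation; the function and its parameter values are an illustrative model under that assumption, not the exact mechanism or figures of any surveyed NoC.

```python
def tdma_guarantee(reserved_slots, table_size, slot_cycles, link_bits_per_cycle):
    # Fraction of the raw link bandwidth guaranteed to one connection.
    guaranteed_bw = link_bits_per_cycle * reserved_slots / table_size
    # Loose bound: a flit injected at an arbitrary moment waits at most
    # one full table revolution before an owned slot comes around.
    worst_wait_cycles = table_size * slot_cycles
    return guaranteed_bw, worst_wait_cycles

bw, wait = tdma_guarantee(reserved_slots=2, table_size=16,
                          slot_cycles=4, link_bits_per_cycle=32)
print(bw, wait)   # 4.0 bits/cycle guaranteed, at most 64 cycles of injection wait
```

This also makes the drawback explicit: the guarantee is only as good as the a priori reservation, which is exactly the traffic-knowledge requirement discussed above.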
E. Testing and fault-tolerance

Testing aims to detect the errors that occurred during fabrication, such as stuck-at faults or short circuits. Decreasing feature sizes, higher frequencies, and increased process variation expose the modern SoC to various faults, and countermeasures must be actively sought and studied. Fault-tolerance means the ability to operate in the presence of faults. There are five key elements in tolerating faults: avoidance, detection, containment, isolation, and recovery. Fault-tolerance can be achieved via error detection and correction, adaptive routing, stochastic communication, and both temporal and spatial redundancy [32].

The test mechanism must meet the time, coverage, and temperature constraints. A special self-test mechanism for NoC was presented in [47]. Using already tested routers to deliver test data and exploiting the NoC's parallelism for simultaneous testing of multiple nodes is presented in [32]. Link testing must be addressed in addition to the NoC routers [31][74]. Unfortunately, these issues are mostly neglected in the studied papers, except in [60] and [47].

IV. NOC EVALUATION METHODS

Simulation and synthesis are clearly the most popular evaluation methods. One third of the studies use analytical methods and one fourth have presented prototype chips. The simulation evidently leads to some simplifications and inaccuracies [87].

Such phenomena could be avoided with real prototypes, although measurements are usually more complex than with a simulator and the costs are higher. In our experience, it is essential at the first stages to perform verification with small systems, such as the 4-16 terminals common nowadays. However, NoCs must be utilized in their anticipated scale, which contains dozens of terminals, in order to properly analyze them.

A. Evaluation metrics

The most important metrics for NoCs are application runtime, silicon area, and power consumption. All of these are to be minimized, and usually an appropriate trade-off is sought [87]. The required silicon area is the most commonly reported value (77%), followed by latency (55%) and maximum operating frequency (50%). Power or energy consumption is reported in 40% of the cases (a mere 30% in [87]). The other metrics have a lower occurrence; the category "others" denotes metrics that are included only in a few articles, such as wire length, error tolerance, latency jitter, or packet loss. Similar observations on the utilized metrics were made in [87], as shown on the bottom row of Table I.

About 3 criteria are reported on average, which is inadequate for a good comparison. The measurement setup and the definition of the metrics vary between research groups, and this complicates comparison. Hence, researchers must aim for a wide spectrum of results instead of the common practice of reporting just a few — for example, measuring area with a sensitivity analysis over multiple router configurations. Furthermore, using a freely available NoC (simulator), such as Hermes [26], FPGA-NoC (http://www.da.isy.liu.se/research/soc/fpganoc/), Noxim (http://noxim.sourceforge.net/), or Netmaker (http://www-dyn.cl.cam.ac.uk/ rdm34/wiki/), as a common reference might prove beneficial, since most NoCs are proprietary and unavailable for public evaluation.

Over half of the studies provide some sort of comparison. It is necessary to justify the proposals and to show that the results are in accordance with the related work. However, comparison is practically always done between in-house developed approaches which are not well documented. Hence, a comparison based solely on a single source is very limited; at the least, third-party reference points are valuable.
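Because the surveyed papers define and measure these metrics differently, it helps to be explicit about what is actually computed. The sketch below derives average latency, maximum latency, and accepted throughput from per-packet records of a hypothetical simulation run; the record format and the numbers are illustrative, not results from any cited study.

```python
# One record per delivered packet: (injection_cycle, delivery_cycle, payload_bits).
packets = [(0, 18, 256), (3, 30, 256), (7, 25, 512), (9, 41, 256)]

latencies = [done - start for start, done, _ in packets]
avg_latency = sum(latencies) / len(latencies)
max_latency = max(latencies)

sim_cycles = max(done for _, done, _ in packets)
throughput_bits_per_cycle = sum(bits for _, _, bits in packets) / sim_cycles

print(avg_latency, max_latency, round(throughput_bits_per_cycle, 2))
# 23.75 32 31.22
```

Reporting the definitions alongside the numbers (e.g., whether latency includes the source-queue waiting time) would remove much of the ambiguity noted above.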
B. Test cases

The majority of publications use synthetic traffic, such as uniform random traffic. Actual applications give the best accuracy, but their traffic profiles are also suitable; for example, the profiles of [10] have been used by many researchers. Unfortunately, applications are rarely used. Test cases can be divided into computation kernels and full applications. Examples of the first category include image binarization, IIR filtering, FFT, DCT, vector sum, dot product, matrix multiplication, segmentation, and smoothing.

FPGA prototypes are very valuable for ensuring the correct functionality of a NoC. Modern FPGA devices are large enough for multiprocessor SoC implementation in the range of a million gates and are notably cheaper than ASICs in small series. They also allow special hardware structures, such as monitors and traffic generators, which cannot be included in product ASICs. Furthermore, the obtained (near) real-time execution of parallel applications allows longer runs and better credibility of the results. Note also that mere FPGA synthesis is not the same as running full applications on the FPGA (which naturally includes synthesis). Synthesizable traffic generators can be utilized in simulators but also when the complete system does not fit into a (single) chip.

Two of the reported FPGA studies use 4 processing elements (PEs) but do not provide any application results [2][64]. An MPEG-2 encoder in FPGA with 8 hardwired PEs and either a custom mesh or a point-to-point network is presented in [55]. An FPGA-based MPEG-4 encoder with multiple Nios II processors, HW accelerators, and a hierarchical bus has been presented in [85][52]; there can be 16 processors on a single FPGA and up to 35 processors with 23 other IPs when three FPGA boards are used.

Some reported prototypes omit the computation resources or results from real applications, which is a valid approach on the first steps of design. Two ASIC examples include the network only: SocBus with 2 [37] and Nexus with 16 terminals [60]. The Nostrum NoC was prototyped with 16 traffic generators [68], and an ASIC prototype of a 64-node fat-tree with traffic generators was presented in [63]. The Slim-Spider chip [56] includes the NoC and 9 resources, and the same hierarchical star was used in an object recognition system [50]. The FAUST chip is targeted for baseband processing and includes 23 IPs and an asynchronous 2-D mesh NoC [54]. By far the largest NoC chip to date is Intel's TeraFLOPS, which has an 80-node 2-D mesh [95]. The chip is fabricated in a 65 nm technology, designed to run at 4 GHz, and contains 100 million transistors occupying 275 mm2 of silicon area.

Note that a prototype offers a valuable reference point for calibrating the simulation and analytical models. Unfortunately, prototypes are too scarce and lack concrete results from real applications. In summary, more effort towards prototyping and standardized evaluation methods in the NoC community is strongly encouraged.

Unfortunately, these applications are not realistic for a billion-transistor multi-processor SoC (MPSoC); those applications represent too fine-grained parallelism, which is outside the projected NoC scope. The latter group, the full applications, includes video encoders (H.263, MPEG-4) and decoders (motion-JPEG, MPEG-2), OFDM, smart camera, radar signal analysis, and large database processing. These give a better view of real NoC performance but are still insufficient alone. None of the studies has yet reported a NoC running several large applications simultaneously. Doing so necessitates including such aspects as a (real-time) operating system, the micronetwork stack [8], and other middleware that have profound implications on runtime and memory usage. This is an important step towards bringing evaluations closer to the expected usage scenario of NoC-based systems. In addition, the impact of more detailed resource models has not been widely explored yet. In [87], most studies used statistical traffic generators, and 3.3 test cases were used on average. The largest study includes 30 application traces [92], but on average only 1.1 cases are used. Application results from a physical prototype are given only in [40][52][54][71][95]; the others rely on simulation. A standardized test case set is currently being defined by the OCP-IP NoC Benchmarking group [33]. Furthermore, we encourage systematically evaluating a large set of (representative) traffic parameters.

V. NOC ROUTER IMPLEMENTATION

Table III lists examples of reported implementation results for NoC routers. The results are given per router or wrapper and are sorted according to the processing technology (note that some feature sizes are grouped together, for example 90 nm and 100 nm) and then alphabetically. FPGA results are omitted for brevity. Table III also includes results for networks that are used as references in the studies (marked with the prefix Ref.), as well as a few reference implementations synthesized for this paper and targeting a 400 MHz frequency; these are described in [85][86][26]. The reference buses support hierarchical structures, and the results are for one wrapper. The reference octagon is implemented for this paper based on [46] but uses packet switching only. Re-implementing ideas from the literature is important for obtaining a fair comparison via direct control over the parameters.

Power consumption is a very important issue with NoCs [73], but it is omitted from the table since it is not a static metric in the same sense as the area or the maximum frequency. Leakage power is proportional to area (and is also application-dependent if power supply gating is applied), whereas dynamic power is related to the sustainable traffic rate of a NoC and hence application-dependent. Moreover, the power depends on the used wire model and the required wire length, which both require knowledge about the layout. The lowest frequency that meets the application demands may be set at runtime to minimize dynamic power. Estimating and lowering the frequency prior to synthesis avoids over-design; in addition, low-leakage logic gates can be opted for with less stringent delay constraints. Further analysis of power consumption can be found in [10][55][56][80][102].

A. Router parameters

For a given topology, the router area is mainly defined by the flit width and the buffer size. The number of ports is defined by the topology, the mesh being the most popular (5 ports: PE, North, East, South, and West). The results are given per router, but the number of routers in the full network depends on the topology. The flit width here refers to the data payload alone; the control signals are excluded. Note that flits are used for flow control (reserving buffers and links), whereas the phit (physical digit) denotes the number of parallel wires between the routers. When specified, they have the same width in most cases. The most common width is 32 bits, which is significantly smaller than the 256 bits assumed in [25]. A few approaches use wide flits, for example 96 bits [83], 128 bits [62], and 256 bits [19].

In Table III, the buffer size is listed as the product of the number of ports, the virtual channels per port, and the depth of a buffer in flits — in other words, ports x VC x flits. The reported buffer depths range from 1 flit to 30 flits, but not all sources document the buffer sizes; the average is 5 x 3 x 6 flits per router. Buffers are usually constructed from flip-flops, but using SRAM is also viable, as in [81] and [83], and a special FIFO design can offer clear area savings [83]. Several sources report buffers taking 50-90% of the router area and a major part of the power consumption, especially on FPGAs. Circuit-switched proposals are often called "bufferless", but 1-flit buffers at each output are assumed for them.

Multiple virtual channels (VCs) may be associated with each physical port. Each VC has a separate buffer, which reduces blocking and hence improves performance; a higher VC count increases performance up to about 4-8 VCs per port [75]. Virtual channels are also needed by certain routing algorithms to ensure correct operation without deadlock, and they are necessary in adaptive routing schemes to avoid deadlocks. Nevertheless, virtual channels are omitted in most current implementations, see for example [48][56][90]. Mango supports 8 VCs [13] and the circuit-switched SDM NoC up to 32 VCs per port [58].

The network interface (NI) handles the packetization, scheduling, and possibly re-ordering or retransmissions. Details about the needed network interface are too often omitted, although the interface may double the required silicon area. For example, the NoC area in [54], i.e., the sum of the router and the associated network and GALS interfaces, is 19+10+12 = 41 kilogates per NoC terminal. In the reference mesh at the bottom of Table III there are two buffers per PE, one for tx and the other for rx.
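The ports x VC x flits bookkeeping above is easy to make explicit. The short sketch below multiplies out the storage implied by the "average" configuration quoted in the text; the helper function and the second, light-weight configuration are illustrative and not taken from any specific paper.

```python
def buffer_bits(ports, vcs_per_port, flits_per_vc, flit_width_bits):
    # Total payload bits buffered in one router: ports x VC x flits x flit width.
    return ports * vcs_per_port * flits_per_vc * flit_width_bits

# The "average" router quoted above: 5 ports x 3 VCs x 6 flits of 32 bits.
print(buffer_bits(5, 3, 6, 32))   # 2880 bits held in flip-flops
# A light-weight single-VC mesh router with 2-flit buffers, for contrast.
print(buffer_bits(5, 1, 2, 32))   # 320 bits
```

Even this simple product makes it plain why buffers are reported to dominate both the area and the power of many routers.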

TABLE III
IMPLEMENTATION RESULT EXAMPLES OF NOC ROUTERS.
(The table lists 35 routers/wrappers — among them Orion/Luna, TeraFLOPS, Xpipes, GDB, HIBI, Proteo, Slim-spider, Chi, CHAIN/AET, Hermes, QNoC (synchronous and asynchronous), SoCBus, aSoc, Crossroad, DyAD, Lochside, Mondinelli, SPIN, XGFT, the Aethereal router and NI, ANOC, Faust, Kavaldjiev, Mango, SDM NoC, Wolkotte, and the reference bus, crossbar, ring, octagon, and mesh implementations synthesized for this paper — sorted by processing technology (350 nm down to 50 nm) and then by author. For each entry it gives the topology, the flit width [bits], the buffering as ports x VC x flits, the technology [nm], the minimum latency [cycles and/or ns], the router area [mm2 and/or kilogates], the operating frequency [MHz] with notes such as pred. = predicted and async. = asynchronous, and the detail level of the result (C = chip, L = layout, R = RTL). Averages are given for the 130 nm and 180 nm entries, and the bottom rows show how many of the values are reported.)

B. Minimum latency

The minimum header latency induced by a router is given in the column Min. lat., in clock cycles and/or nanoseconds. The shown values are for the optimum case without contention. The latency depends on the level of pipelining, being about 5 clock cycles on average, but details are not usually available. A basic 5-cycle router pipeline has the following stages: input buffering, routing, virtual channel handling, output arbitration, and switching. A detailed analysis of router pipelining is presented in [79]. A novel router design allows single-cycle latency [66], and a look-ahead mechanism that performs the routing decision for the next router is presented in [19]. Once the header has been forwarded, the payload is transmitted (usually) at the rate of one phit per clock cycle. SoCBus [89] reports a 6-cycle latency for circuit setup, whereas the latency for the data transfer is most likely one cycle.

In [54], the NI and the GALS interface together introduce double the latency of a single router (12 ns vs. 6 ns). The latency of the Aethereal NI is 4-10 cycles, whereas a SW implementation of packetization takes nearly 50 cycles [81]. A combined latency of 10 cycles for the two network interfaces (sending and receiving ends) was reported in [10]. In our opinion, the reported HW latencies are low enough; the end-to-end latency can, however, grow considerably when the overheads of the middleware and the network stack are included.
Minimizing the latency and memory overhead of the micronetwork stack while still enabling simple reuse is an open research problem.
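To see how quickly these interface and stack overheads add up on top of the router pipeline, the sketch below combines them into a single no-contention estimate. The default cycle counts are illustrative assumptions in the spirit of the figures quoted above (5-cycle routers, NI costs of a few cycles, roughly 50-cycle software packetization), not measured values from any cited chip.

```python
def end_to_end_latency(hops, payload_flits, router_cycles=5,
                       ni_send_cycles=5, ni_recv_cycles=5,
                       sw_packetize_cycles=0):
    # The header pays the router pipeline on every hop, the payload then
    # streams at one flit per cycle; NI and optional software packetization
    # costs are added at both ends of the path.
    network = hops * router_cycles + payload_flits
    return sw_packetize_cycles + ni_send_cycles + network + ni_recv_cycles

# 4-hop path, 8-flit payload: hardware NIs vs. software packetization.
print(end_to_end_latency(4, 8))                          # 38 cycles
print(end_to_end_latency(4, 8, sw_packetize_cycles=50))  # 88 cycles
```

With software packetization the stack cost exceeds the whole network traversal, which is why the reuse-vs-overhead question above matters.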

C. Area

The router area is given in mm2 or kilogates. These values are meant only for coarse-level comparison, since the router parameters or the inclusion/exclusion of the wiring area in the results are not always stated. This is rather counterintuitive to note, because area is the most often reported metric (cf. Table I). The overhead gets larger if separate wiring channels are reserved for the signals between the routers; area overheads in the range of 9-45% were reported in [75].

The reported average area at 130 nm technology is 0.14 mm2, and the reported gate count per router is around 15 kilogates, which is reasonably low for large MPSoCs. Circuit-switching allows even smaller routers due to the absence of buffers, see for example [58][59][100]. At the same time, an average router (15-18 kilogates) would impose a notable overhead in chip area, roughly 15-36% even without NIs, when NoCs are used with modest-sized IPs. Sylvester and Keutzer [93] have estimated that PEs with a complexity of about 50-100 kilogates will allow dependable design and verification also in the future. Hence, NoC area optimization is encouraged when NoCs are used with modest-sized IPs. Many sources assume quite large blocks to be connected via the NoC, for example 1-9 mm2, which would clearly diminish the relative area overhead. Such a block is large enough to necessitate a local network, which is a natural spot for the hierarchical NoCs; large tiles consist of a few hundred kilogates. For example, TeraFLOPS has 3 mm2 tiles, and its 53-kilogate routers account for about 17% of the transistors.
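The 15-36% figure above follows directly from dividing the typical reported router complexity by the PE sizes estimated in [93]; the one-line helper below just reproduces that arithmetic and is only illustrative.

```python
def relative_router_overhead(router_kgates, pe_kgates):
    # Router gate count relative to the processing element it serves.
    return router_kgates / pe_kgates

# 15-18 kilogate routers attached to 50-100 kilogate PEs [93].
print(f"{relative_router_overhead(15, 100):.0%}")  # 15%
print(f"{relative_router_overhead(18, 50):.0%}")   # 36%
```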
D. Operating frequency

The next column shows the reported operating frequencies. There are great differences between the fastest and the slowest results, over 11x with 130 nm and 6x with 180 nm technology. An order-of-magnitude difference is so large that a reader cannot assume anything about a NoC's performance unless it is explicitly given. This emphasizes the importance of contrasting proposals with existing ones and providing commentary if large deviations from the average performance level are observed. The results are for the router only, assuming that the link delays have no effect; however, wiring has a notable impact on the frequency in deep sub-micron technologies. Hence, the synthesis settings and silicon conditions must be documented.

The average frequency is about 600 MHz for 130 nm technology, which seems adequate for cost-optimized, medium-performance devices; a few very fast NoCs [58][100] increase the average value notably. Most implementation results are obtained from a synthesis tool, but good results are obtained also in real silicon implementations: 1.6 GHz for Slim-Spider at 180 nm [56], 1.35 GHz for Nexus at 130 nm [60], and an astonishing 4.67 GHz for TeraFLOPS at 65 nm [95].

A few NoCs are asynchronous [7][13][29][54][60][84][101]. Most NoCs use synchronous components that may be asynchronous with respect to each other (globally asynchronous, locally synchronous, GALS). The GALS paradigm is crucial in deep submicron technologies [8] due to difficulties in clock distribution and to enable power saving. It requires synchronization logic that affects latency in addition to area. These penalties, as well as synchronization failures, must be accounted for in evaluation. Similar to the area, this is a commonly overlooked issue.

VI. DISCUSSION

The most remarkable observation is the lack of standardized NoC benchmarks, together with the anecdotal nature and limited number of results and the lack of comparison. The main concepts for benchmarking NoCs in a systematic and comparable way are currently addressed by the OCP-IP workgroup [33]. The group is formalizing a set of relevant metrics, associated measurement methodologies, and a set of parameterized reference inputs for the NoC benchmarks. These ensure a meaningful comparison between various sources, and the overall view can be determined in incremental steps.

A self-evident point is the importance of proper documentation and comparison. To illustrate their importance, there are a few properties that were not included in the previous tables, such as packet length, buffer type, and the parameters related to the implementation results. Most studies cited here originate from academia, but a growing interest is observed also in the industry. At the same time, more and more high-quality papers are published continuously as the research matures. To put it simply, we need more applications, larger setups, better models for traffic cases and for networks (especially for NIs, the network stack, and synchronization), more prototypes, a larger range of evaluated parameters and configurations, and continued motivation to keep the NoCs evolving.

VII. CONCLUSIONS

This paper gives an overview of state-of-the-art network-on-chips. An average set of currently utilized properties is identified and presented, and a datasheet of a stereotypical NoC is given to alleviate the reporting. Table IV shows this example data sheet for a contemporary, average NoC together with our improvement suggestions.

Packet-switched NoCs with a two-dimensional topology and deterministic routing are the most common. Silicon area, application runtime, network latency, and power consumption are the most important metrics. NoCs are mostly evaluated with simulation and synthesis, but these should be complemented with analytical studies and (FPGA) prototypes. Furthermore, evaluations with large NoC configurations running (multiple) representative applications are desired. Good results and interesting proposals are plenty; however, large differences in implementation results, vague documentation, and lack of comparison were also observed.

Based on the study, the following topics were identified as especially crucial for the continued development and success of the NoC paradigm: procedures and test cases for benchmarking, traffic characterization and modeling, QoS policies, fault-tolerance, design automation, and network interface design. Network-on-chip is a very active research field with many practical applications in industry.

TABLE IV
DATASHEET FOR A STEREOTYPICAL NOC AND SUGGESTED RESEARCH DIRECTIONS

  Property              Average value
  Switching             packet-switched
  Topology              2-D mesh
  Links                 bidirectional
  Flow control          wormhole
  Data delivery         in-order
  Virtual channels      no
  Routing               deterministic
  QoS                   yes
  Clocking              synchronous (or GALS)
  Network interface     excluded
  Fault-tolerance       neglected
  Testability           neglected
  Evaluation            simulation, synthesis
  Prototype             no
  Used metrics          area, frequency, latency
  Comparison            in-house reference
  Applications          not used
  Flit width            32 bits
  Phit width            same as flit
  Packet length [flit]  hdr: 1, payload: n, tail: 1
  Buffering per router  5 x 3 x 6 flits
  Buffer type           based on flip-flops
  Router area           0.14 mm2 @ 130 nm, 15-18 kilogates
  Router frequency      600 MHz @ 130 nm
  Min. lat./router      5 cycles
  Power                 ?
  Results from          RTL
  Silicon conditions    n/a
  Wire+repeaters        n/a

For each property, the table's third column lists the necessary improvement, for example QoS support, IP mapping, link pipelining, 2-8 VCs per port, fault-tolerance and testability considered, the network interface and synchronization impact considered, analytical evaluation and prototyping, application runtime and latency-vs-power-vs-predictability metrics, third-party/standardized references, several simultaneous applications, and more detailed documentation of the implementation results.

REFERENCES

[1] B. Ahmad, A. T. Erdogan, and S. Khawam, "Architecture of a dynamically reconfigurable NoC for adaptive reconfigurable MPSoC," in AHS, Jun. 2006, pp. 405-411.
[2] T. Ahonen and J. Nurmi, "Integration of a NOC-based multimedia processing platform," in FPL, Aug. 2005, pp. 606-611.
[3] A. Andriahantenaina and A. Greiner, "Micro-network for SoC: Implementation of a 32-port SPIN network," in DATE, Mar. 2003, pp. 1128-1129.
[4] K. Anjo et al., "Black-bus: a new data-transfer technique using local address on networks-on-chips," in IPDPS, Apr. 2004.
[5] "A comparison of network-on-chip and busses," Arteris, white paper, 2005.
[6] T. Bartic et al., "Topology adaptive network-on-chip design and implementation," IEE Proc. Computers and Digital Techniques, vol. 152, no. 4, Jul. 2005, pp. 467-472.
[7] E. Beigne et al., "An asynchronous NOC architecture providing low latency service and its multi-level design framework," in ASYNC, Mar. 2005, pp. 54-63.
[8] L. Benini and G. de Micheli, "Networks on chips: A new SoC paradigm," IEEE Computer, vol. 35, no. 1, Jan. 2002, pp. 70-78.
[9] ——, Networks on chips: technology and tools. Morgan Kaufmann, 2006.
[10] D. Bertozzi et al., "NoC synthesis flow for customized domain specific multiprocessor systems-on-chip," IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 2, Feb. 2005, pp. 113-129.
[11] D. Bertozzi, L. Benini, and G. de Micheli, "Low power error resilient coding for on-chip data buses," in DATE, 2002.
[12] T. Bjerregaard and S. Mahadevan, "A survey of research and practices of network-on-chip," ACM Computing Surveys, vol. 38, no. 1, Mar. 2006, article 1.
[13] T. Bjerregaard and J. Sparsoe, "A router architecture for connection-oriented service guarantees in the MANGO clockless network-on-chip," in DATE, Mar. 2005, pp. 1226-1231.

46–55. Kim.html.” in ISVLSI. Aug. “QNoC: QoS architecture and design process for network on chip. Nelson. E. 2002.” in DATE. and S. 139–145. 12. Y. 81–86. 1770–1773. [37] T. 559–564. D. “Coping with latency in soc design. A.” in SOCC. [22] J. 24. 218–221. Henkel. 2002. vol. Bouhraoua and M. pp. vol. 2006. “Essential fault-tolerance metrics for noc infrastructures. dissertation. Oct. Postula. and A.. pp. Hu. Tech. “Zooming in on network-on-chip architectures. Tampere University of Technology. S. pp. “A virtual channel network-on-chip for GT and BE traffic. Morgan Kaufmann Publishers. Jul. Sep. Towles. pp.” IEEE Trans. Jun. A. “A flexible circuit switched NOC for FPGA based systems. Greiner. J. “Towards open network-on-chip benchmarks. Feb. “A 0. 1. Carloni and A. [53] S.-Feb. [38] C. 2004. 2001. Wolf. 38. Marculescu. Kim. WHITE PAPER. pp. [52] A. pp.” in DAC. 2000. IEEE. “On-chip networks: a scalable. P. [33] C. CCIT 565.” [Online] http://toledo. Lee. no. MARCH 2008 12 [14] E. Shen. Lee. Symposium on Soc. no. Ivanov. “Network on chip: An architecture for billion transistor era. pp. Kulmala.. H. Sep. 281–320. Towles. M. 4. Jul.” IEEE Trans. R. “Hermes: an infrastructure for low area overhead packet-switching networks on chip.-Y. “Design and implementation of a routing switch for on-chip interconnection networks. L. Millberg. Pande.-Oct. Sobelman. pp. Lee. 573–579. Pande. and R. 223–235. Hu and R. Moraes. Pande. 2002.pucrs. vol. and G. R. 2004. Yoo. Lee. [60] A. “µspider: a CAD tool for efficient NoC design. no. 2004. vol.” in FPL. [56] K. vol. [16] L.. and F. 141–142. 305–310.. . de Mello and L. Vajna. 2–3. Castells-Rufas..” in SAMOS VII.” in Norchip. G. Furber. “Design space exploration and prototyping for on-chip multimedia applications. Hilton and B.. D.” IEEE J. and H. Sep. “Asynchronous interconnect for synchronous SoC design. 2007. pp. 24–35. Apr. [39] M. the VLSI Journal. Ivanov. 105–112. [48] N. J. not wires: on-chip interconnection networks. Aug. S. 36–45. Jantsch and H. pp.-Oct. Kumar et al. 2003. pp. Networks on Chip. 2004.” in NOCS. Mar. “Low-power network-on-chip for high-performance SoC design. 2004.” Integration.. Dec.-C. [23] 2D-Fabric 402c Parallel Processing Interface. 2006. 105–128. pp. Chang. 1. “Testing network-on-chip communication fabrics. 2001.” ACM Trans. Apr.smart routing for networks-on-chip. Nov. R. “A generic architecture for on-chip packet-switched interconnections. “Guaranteed bandwidth using looped containers in temporally disjoint networks within the Nostrum network on chip. 00. 2201–2014. 2005.” IEEE Micro. Carrabina. Moraes et al. [32] C. [25] W.. 5. Dey. [54] D. 2004. Guerrier and A. Symposium on Soc. Jun.-J. Schaumont. 2004. H¨ am¨ al¨ ainen. Jul. 2007. Sugar Land. [36] J. and T. 260–263. Kavaldjiev et al. Cidon and I. Salminen. Feb. 10. “MoCSYS: A multi-clock hybrid two-layer router architecture and integrated topology synthesis framework for system-level design of fpga based on-chip networks. Nguyen. vol. Jan. pp. vol. “A scalable high-performance computing solution for networks on chips. Verbauwhede. Jantsch.SALMINEN ET AL. Kumar.” IEEE Trans. Jul. Apr. 505–508. pp. P. 159–162.” in IPDPS. VLSI Syst. Japan. Kim. [28] B. 2005. no. Dec. 148–160. Dally and B. Saleh. [49] D. “A network on chip architecture and design methodology. and H. Jan. pp. Kobe. [26] A. c OCP-IP. E. and J. P.” in IOLTS 07.-F. and H. Sep. and T. [47] H.. N. 143–148. J.. 684–689. 2006. [59] J. 2002. E. 2004. [58] A. 14. Lee. pp. Calazans. Jul. 
Computer-Aided Design of Integrated Circuits and Systems. pp. “A new multi-channel on-chip-bus architecture for system-on-chips. Marculescu.-S. May 2007. vol. Ching. [61] A.. Kim. Feero and P. M¨ oller. 1.. “An interconnect architecture for networking systems on chips. Hemani. vol.-H. V. 32–35. and S. [30] M. A.” in ISVLSI. Borgatti. 2004. and A.” in DATE. J. “A high-throughput network-on-chip architecture for systems-on-chip interconnect.. 711–726.” in DDECS. Yoo. Rep. Ivanov. 2002. Deng. no. Janarthanan and K. Oberg. 115–118.” in ISCAS. 2006. Bolotin et al. pp. Chen. Lindqvist. [31] C.-Oct. “Hermes project web page. Felicijan and S. M. [24] W. pp. Mathew.. E. The Netherlands: Kluwer Academic Publishers. Lauter. “Reliable network-on-chip based on generalized de Bruijn graph. “An interconnect-centric design flow for nanometer technologies. A.” Ph. and D. Tedesco. pp. 7. Kim. [64] F.” in SOCC. Nov. Mondinelli. 2. [17] D. pp. pp. no. L. 2006. pp. Nov. Karim. vol. vol. pp. “An asynchronous on-chip network router with quality-of-service (QoS) support.” Journal of Systems Architecture. Nov. Joven. [50] D. “Design of a high-performance scalable CDMA router for on-chip switched networks. W. “System-level point-to-point communication synthesis using floorplanning information. Jan. 274–277. no. 127–130.-J. 250–256. 2004. VLSI Syst. Grecu. Evain. [62] M. Wiklund. Millberg. TX. 890–895. 2007. Jantsch. “An 81.” in DAC. 22. Leroy et al. [45] T. “A reconfigurable baseband platform based on an asynchronous network-on-chip. and D. vol. communication-centric embedded system design paradigm. Pradhan.” IEEE Micro. A. 5. and P. “Evaluation and design trade-offs between circuit-switched and packet-switched NOCs for application-specific SOCs. [20] D. SURVEY OF NETWORK-ON-CHIP PROPOSALS. pp. pp. [44] A. “Prototyping and evaluating large system-on-chips on Multi-FPGA platform. 22. “Performance evaluation for three-dimensional networks-on-chip. pp. 2006. Embedded Computing Systems. Kim et al. “Uml-based multi-processor soc design framework. “Power analysis of link level and end-to-end data protection in networks on chip. 2. Elrabaa. ¨ [35] A. 2005. 32–41. Kariniemi.” IEEE Micro. Mello. Pande.-C. May 2006. Dordrecht. 2005. vol. Kangas et al. and A. Diguet. [51] J. and R. Chen. [18] K. 2004. 3–10. J. pp. [21] I.13 um 1Gb/s/channel store-and-forward network on-chip. [42] A. A.” [43] A.inf. Principles and practices of interconnection networks. Lattard et al. Sep. 2.” in VLSI. “Evaluation of current QoS mechanisms in network on chip. 43. C. 2006. pp. K. Liang et al. “VLSI implementation of a switch for on-chip networks. Feb.br/ gaph/Projects/Hermes/Hermes. 2002. Lines. pp. Keidar. [34] P. P. and Z.” in Norchip. S. Jan. Marculescu. Vitkowski. M. Liu. vol. and D. Jantsch. Saleh. no. Jantsch. pp.” in VLSI. Houzet.” in Intl. no. J..-J. J. Crossbow Technologies Inc. pp. 2008.” in HLVDT. 2007. 2006. pp.” in DAC.6 GOPS object recognition processor based on noc and visual image processing memory. “Integrated modeling and generation of a reconfigurable network-on-chip. 137–142. Sangiovanni-Vincentelli. Eds. T. P. “A low latency router supporting adaptivity for on-chip interconnects. Dally and B. pp. pp. [40] J. 50. “Spatial division multiplexing: a novel approach for guaranteed throughput on nocs. 179–189. Grecu. pp. Sep. 2005. Salminen. 2005. Lee et al. p. 89. Lee. E.D. Sep. 22. USA. Nilsson. 305–308. [55] H.” in ASP-DAC/VLSI. no. [41] J. A. “Route packets. 69–93.” in DAC. 845–851. M. 
“An architecture and compiler for scalable on-chip communication. “On-line reconfigurable extended generalized fat tree network-on-chip for multiprocessor system-on-chip circuits. 24–26. Chakradhar. Jun. “A validation and performance evaluation tool for ProtoNoc. no. and I. Forsell. 2004. and D. Kakoee. 5. Tenhunen. [27] S. 2000. 2003. Cong. 37–42. “DyAD . [29] F. Sep.” in AP-ASIC. pp. Henriksson. [63] F. Grecu. 205. Nov. 26. [46] F. [15] A. 2004. no.-J.” IEEE Micro. 2007. 5. Jul.” Technion Department of Electrical Engineering.” in DAC.” Proc. 392–395. pp. 2004. Chi and J. Solid-State Circuits.” in Intl.” in CODES+ISSS. Anghel. [57] S. 2004. Symposium on Soc.” in Intl. R. Mar.” in ISSOC.” in SOCC. pp.” in CICC. Tomko. [19] H. Hosseinabady.

[102] J.” in Euromicro DSD. 2004. 185–188. Forsell.” Ph. vol. [82] T. Ogras. 2007. [97] P. Aug. Feb. 294–302. “An asynchronous router for multiple service levels networks on chip.” IEEE Des. A. Jan. [95] S. Solid-State Circuits. Sweden. “Comparative analysis of serial vs parallel links in noc. [81] A.” in Norchip. “Switched interconnect for system-on-a-chip designs. Jan. [103] C. Oct. “Communication architecture optimization: making the shortest path shorter in regular networks-on-chip. Wang. 2001. Radulescu et al. Link¨ oping university. 8. Bayoumi. pp. A. and A. “Extending platform-based design to network on chip systems.” in Intl. 219–219. pp. Kolodny. 2007. Goossens. 5117. Salminen. vol. Interconnect-Centric Design for Advanced SoC and NoC.D. “Methodology for design. dissertation. Morgenshtein. “Analysis of error recovery schems for networks on chips.” in SiPS. Dally.” IEE Proc. pp. Apr. 2003. “3-d topologies for networks-on-chip. 198–203. “Design. Tenhunen. no. 2005...” in IPDPS. 2005.. P. “Applying cdma technique to network-on-chip. vol.. “A delay model for router micro-architectures. Wiklund. May 2003. Sep. pp. P. and M. 43. 196–200. “A high level power model for the Nostrum NoC. Pelkonen. “HIBI communication network for system-on-chip. Sep. Computers and Digital Techniques. Zhao. Test Comput. pp. Kreku. Ginosar. 2007. 2006. p.” IEEE Micro. pp. 43. Cidon. [80] S... SURVEY OF NETWORK-ON-CHIP PROPOSALS. pp.” Journal of System Architectures (in press). “On network-on-chip comparison. 1. no. 487–492. J.” IEEE Trans. 2001. [100] P. 1091–1100. [92] V. G. Jan/Feb. vol. 150. vol. “Impact of small process geometries on microarchitectures in systems on a chip. pp. 1. Wiklund and D. 855–868. Kreutz. and Video Technology. “Design of a switching node (router) for on-chip networks. “Development and performance evaluation of networks on chip.. M. c OCP-IP 2008 . vol. Wiklund.” IEEE Trans. vol. J. A. pp. Keutzer. Rep. “Parallel programming models for a multi-processor soc platform applied to high-speed traffic mangement. Apr. Liu. and D. Jan. Oct. 2006.” Journal of VLSI Signal Processing-Systems for Signal.” Integration. C. Liu. “An efficient on-chip NI offering guaranteed services. Kumar. 2004.” IEEE Trans.” Royal Institute of Technology. “Rasoc: a router soft-core for networks-on-chip. Oct. Symposium on Soc.” in DATE. pp. vol. Hu. “Reducing interconnect cost in noc through serialized asynchronous links. Susin. 2006. pp. 2005. 75–78.” IEEE Trans. vol. Kumar. Mar. 2005. [87] E. and S.” Proc. 155a. [77] P. no. vol. [72] U. Benini. pp. pp. Sathe. Aug. [88] H. Link¨ oping. and R. 5.. 2007. synthesis. the VLSI Journal. Rostislav et al. 2007.” IEEE Trans..” in ASYNC. [91] J. H¨ am¨ al¨ ainen.. Sylvester and K. 467–489. Ahonen. I.” in DATE. 89. Aug. no. “The design and implementation of a low-latency on-chip network.” IEEE Micro.” in ASP-DAC. Nov. [96] N. VLSI Circuits and Systems.” in ISCAS.” in SPIE. M. May 2005.” IEEE Design and test. [66] R. no. 2004. Peh and W. c OCP-IP. F. and T. 22. 69–75. 2006.. 10. “PMCNOC: A pipelining multi-channel central caching network-on-chip communication architecture design. 434–442. Computer-Aided Design of Integrated Circuits and Systems. A. A. J. “A hybrid SoC interconnect with dynamic TDMA-based transaction-less buses and on-chip networks. pp. 15.” in DSD.. [94] T. 2004. Y. [86] ——. Dordrecht. S.. no. Owens et al. A. 2003. pp. WHITE PAPER. Nurmi. 15. Aug.” in CODES. Jan. Murali et al. 1025–1040. Sep-Oct. Richardson et al. 2002. 
“Benchmarking mesh and hierarchical bus networks in system-on-chip context. Aug. “Research challenges for on-chip interconnection networks. [69] J. Vangal et al.” in ASIC. modeling. “Switch-based interconnect architecture for future systems on chip. no. pp. and analysis of networks-on-chip. G. Apr. Jantsch. 2005. Samuelsson and S. “An energy-efficient reconfigurable circuit-switched network-on-chip. Nilsson and J. no.. 8.” IEEE J.a nostrum network on chip prototype. Jantsch. shared-memory abstraction.. J. 2003. and A. Apr. [90] D. Eds. “Networks on silicon: blessing or nightmare?” in DSD. 4. and J. 5. 198–203. “Issues in the development of a practical NoC: the Proteo concept. “Panacea . T. Wolkotte et al. pp.-S. 24. Mar. pp. ¨ [68] E. and test of networks on chips. [89] S. [99] D. 29–41. 4–17. Jan. pp. 2. Isoaho. 3.. 2007. 2005. “Ring road noc architecture.a case study on the panacea noc . Image. Wielage and K. “Trade offs in the design of a router with both guaranteed and best-effort services for network on chip (extended version).. 185–205. vol. Soteriou et al. Valtonen et al. H. 5. and J. MARCH 2008 13 [65] A. Ahonen. 8. vol. 2007. Salminen et al. vol. 228–237. Xu et al. [83] E. The Netherlands: Kluwer Academic Publishers. Paulin et al.” [78] V. Penolazzi and A. pp. 477–488. [98] D. Jantsch. 38.” in VLSI Design. 26–34. [101] Xin Wang. pp. no. May 2006. Yakovlev. Marculescu. [67] S. Friedman.” in Norchip. Computers. 2000. Sanusi. Aug. VLSI Syst. [75] ——. pp. no. and R. pp. VLSI Syst. 1778–1781. E. [84] D. 22. 404–413.. pp. pp. Moore. Rijpkema et al. 2001. A. “An 80-tile Sub-100-W TeraFLOPS processor in 65-nm CMOS. 95–105. [73] J. 401–408. Valli. Zeferino. [85] E. D. pp. 54. “Performance evaluation and design trade-offs for network-on-chip interconnect architectures. no. [76] ——. pp. Soininen. no. and L. Mohamed. 198–192.. [71] U. and S. A. 96–108. [79] L.” in NOCS.” in IP2000. vol. 224–229. Oct. Oberg. Nurmi. 1. 2006. Oct. Mullins. Y. IEEE. no. Ogg. 2005. Sep. pp. 673–676. 229. 16–19. “Polaris: A system-level roadmapping toolchain for on-chip interconnection networks.” in VLSI design. [93] D. “Key research problems in NoC design: a holistic perspective.. 53. 2007. Nurmi. Tech. D’Alessandro. VLSI Syst. 2005. Ogras et al. T.-P. pp. 2004. “An autonomous error-tolerant cell for scalable network-on-chip architectures. and flexible network configuration. [74] P. Al-Hashimi.SALMINEN ET AL. Sig¨ uenza-Tortosa. pp. Pavlidis and E. Kulmala. 503–510. Pande et al. 5. vol. B. West. 27. [70] S.