



Network on Chip Switch-to-Switch Flow Control

Reza Kourdy Department of Computer Engineering Islamic Azad University, Khorramabad Branch, Iran Mohammad Reza Nouri rad Department of Computer Engineering Islamic Azad University, Khorramabad Branch, Iran

Abstract—The NoC (Network on Chip) protocol family consists of switching techniques, routing protocols, and flow control. These techniques are responsible for low-latency packet transfer, and they strongly affect the performance, hardware amount, and power consumption of on-chip interconnection networks. Packet routing schemes decide the routing path between a given source node and destination node. The channel buffer management technique ensures that packets are not discarded between two neighboring routers.

Index Terms—Network on Chip (NoC), store-and-forward (SAF), wormhole (WH) switching, virtual cut-through (VCT), asynchronous wormhole (AWH).

The regular tile-based NoC architecture was proposed as a solution to the complex communication problems of System-on-Chip [1]. The NoC concept neatly separates the concerns of computation and communication. It is hoped that NoC will provide solutions to increasing system complexity and declining design productivity. Several researchers have suggested that a 2-D mesh architecture [2], [3] is more efficient for NoC in terms of latency, power consumption, and ease of implementation than other topologies. The most common NoC topologies are the mesh and the torus, which together constitute over 60% of cases [2]. For these two reasons, the platform under consideration is composed of an n × n array of tiles inter-connected by a 2-D mesh network.

NoC routing algorithms can be broadly classified into oblivious and adaptive [4]. In oblivious routing algorithms, the path of a packet/flit is determined solely by the source and destination addresses; in adaptive routing, the path for a given source and destination also depends on dynamic network conditions (e.g., links congested by traffic variability). The main advantage of deterministic routing is the simplicity of the router design. Because of the simplified logic, deterministic routing provides low latency when the network is not congested. However, as the packet injection rate increases, deterministic routers are likely to suffer throughput degradation because they cannot respond dynamically to network congestion. In contrast, adaptive routers avoid congested links by using alternative routing paths, which leads to higher throughput. However, due to the extra logic needed to choose a good routing path, adaptive routing has higher latency at low levels of network congestion. Jingcao Hu and Radu Marculescu have proposed a routing algorithm for NoC called Dynamic Adaptive Deterministic routing (DyAD), which combines the advantages of both deterministic and adaptive routing schemes [5]. DyAD contains both a deterministic and an adaptive algorithm and switches between the two based on the network's congestion conditions.

Nodes/tiles of a NoC may generate huge amounts of data for communication as application complexity increases day by day. As the data grows, the packet size increases, and handling bigger packets increases the complexity of the router. The data generated by a node is treated as a message; the message is divided into several packets, and each packet is transmitted flit by flit. In this scenario, a node may receive a few packets of a message while waiting for the remaining packets of the same message to arrive. To reduce this waiting time, the concept of demand-based routing is introduced in this paper. When a node is waiting for some packets of a message, it can raise a demand for those packets. Once the router of a node receives a demand flit for a packet, and the packet is available in that router's buffer, priority is given to that packet and it is forwarded accordingly. The major advantage of demand-based routing is improved message delivery time, which in turn improves throughput in terms of tasks, since tasks depend on these messages. We propose this demand-based routing for NoC by adding "demanding a packet/flit" and "supplying a packet/flit" to DyAD routing. Figure 1 shows an example NoC that consists of 16 tiles, each of which has a processing core and a router. In these networks, source nodes (i.e., cores) generate packets that consist of a header and payload data.
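The deterministic half of a scheme such as DyAD is often illustrated with dimension-ordered XY routing on the mesh. The sketch below is ours, not taken from [5]: the function name and the (x, y) coordinate convention are illustrative assumptions, and it shows only that an oblivious route is fully determined by the source and destination addresses.

```python
def xy_route(src, dst):
    """Deterministic (oblivious) XY routing on a 2-D mesh.

    Returns the sequence of tiles visited: first fully along the
    X dimension, then along Y. The path depends only on src and dst,
    never on network state -- the defining property of oblivious routing.
    """
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:                       # route along X first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                       # then along Y
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

An adaptive router would instead consult congestion information at each hop before choosing between the X and Y output ports.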

2012 Journal of Computing Press, NY, USA, ISSN 2151-9617



Packets are transferred to their destination through multiple routers along the routing path in a hop-by-hop manner. Each router keeps forwarding an incoming packet to the next router until the packet reaches its final destination. Switching techniques decide when a router forwards the incoming packet to the neighboring router, and therefore affect both network performance and the buffer size needed at each router.

Fig.1. Network-on-Chip: routers, cores, and links.

On-chip routers transfer these packets through the connected links, and destination nodes decompose them. High-quality communication that never loses data within the network is required on chip, because delayed packets of inter-process communication may degrade the overall performance of the target (parallel) application. Switching techniques, routing algorithms, and flow control have been studied for several decades in the context of off-chip interconnection networks. A general discussion of these techniques is provided by existing textbooks [6-8], and some textbooks describe them in the NoC context [9, 10]. We introduce them from the viewpoint of on-chip communication, discuss their pros and cons in terms of throughput, latency, hardware amount, and power consumption, and survey their use in various commercial and prototype NoC systems.

3.1 Store-and-Forward (SAF) Switching

Every packet is split into transfer units called flits. A single flit is sent from an output port of a router at each time unit. Once a router receives a header flit, the body flits of the packet arrive one per time unit. To avoid input channel buffer overflow in a simple way, the input buffer must be larger than the maximum packet size. The header flit is forwarded to the neighboring router only after the router receives the tail flit. This switching technique is called store-and-forward (SAF). The advantage of SAF switching is the simple control mechanism needed between routers, due to its packet-based operation; other switching techniques, such as the wormhole switching described below, use flit-based operation (Figure 2). The main drawback of SAF switching is the large channel buffer required, which increases the hardware amount of the router. Moreover, SAF suffers from larger latency than other switching techniques, because a router at every hop must wait to receive the entire packet before forwarding the header flit. Thus, SAF switching does not fit the requirements of NoCs well.

3.2 Wormhole (WH) Switching

Taking advantage of the short link lengths on a chip, an inter-router hardware control mechanism that stores only fractions of a single packet [i.e., flit(s)] can be built with small buffers. Theoretically, the channel buffer at every router can be as small as a single flit. In wormhole (WH) switching, a header flit can be routed and transferred to the next hop before the next flit arrives, as shown in Figure 2, because each router can forward flits of a packet before receiving the entire packet. These flits are often stored across multiple routers along the routing path, so their movement looks like a worm. WH switching reduces per-hop latency because the header flit is processed before the arrival of the following flits. Wormhole switching is therefore better than SAF switching in terms of both buffer size and (unloaded) latency.

The main drawback of WH switching is performance degradation due to chains of blocked packets. Since fractions of a packet can be stored across different routers along the routing path, a single packet often keeps occupying buffers in multiple routers when its header cannot make progress due to conflicts. Such a situation is referred to as head-of-line (HOL) blocking. The buffers occupied during HOL blocking block other packets that want to traverse the same links, resulting in performance degradation.
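The unloaded-latency difference between SAF and WH switching can be sketched with a back-of-the-envelope model. The assumptions below are our simplifications: one flit crosses one link per time unit, there is no contention, and router processing is folded into the link time.

```python
def saf_latency(hops, packet_flits):
    """SAF: every router buffers the whole packet before forwarding it,
    so each hop costs a full packet's worth of time units."""
    return hops * packet_flits

def wh_latency(hops, packet_flits):
    """WH: the header advances one hop per time unit and the body
    flits pipeline behind it."""
    return hops + packet_flits - 1

# An 8-flit packet crossing 4 hops:
print(saf_latency(4, 8))  # 32 time units
print(wh_latency(4, 8))   # 11 time units
```

The gap widens linearly with hop count, which is why SAF is rarely considered a good match for NoCs.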


A NoC can improve the performance and scalability of on-chip communication by introducing a network structure that consists of a number of packet routers and point-to-point links. However, because they perform complicated internal operations, such as routing computation and buffering, routers introduce larger per-hop packet latency than the repeater buffers they replace on a bus structure. These delays are caused by both intra-router operations (e.g., crossbar arbitration) and inter-router operations. We focus our discussion on inter-router switching and channel buffer management techniques for low-latency communication.





Fig.2. Store-and-forward (SAF) and wormhole (WH) switching techniques.

3.3 Virtual Cut-Through (VCT) Switching

To mitigate the HOL blocking that frequently occurs in WH switching, each router can be equipped with enough channel buffers to store a whole packet. This technique is called virtual cut-through (VCT); like WH switching, it can forward the header flit before the next flit of the packet arrives. VCT switching thus combines low latency with less HOL blocking. A variation called asynchronous wormhole (AWH) switching uses channel buffers smaller than the maximum packet size (but larger than the packet header). When a header is blocked by another packet at a router, the router stores as many flits as its channel buffers can hold, so flits of the same packet may be stored at different routers. Thus, AWH switching theoretically accepts an unbounded packet length, whereas VCT switching can only cope with packets no longer than its channel buffer. Another variation of VCT switching, customized for NoCs, is based on a cell structure using a fixed single-flit packet [11]; this is similar to the asynchronous transfer mode (ATM) of traditional wide-area networks. As mentioned above, the main drawback of WH switching is that the buffer is smaller than the maximum packet size, which frequently causes HOL blocking. To mitigate this problem, cell-based (CB) switching limits the maximum packet size to a single flit, with each flit carrying its own routing information. To simplify packet management, cell-based switching removes support for variable-length packets in routers and network interfaces. With a single-flit packet structure, routing information is transferred on dedicated wires beside the data lines of a channel (Figure 3).

Fig.3. Packet structure of the various switching techniques discussed in this section.
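The buffer requirements that distinguish these techniques can be summarized in a small lookup. The flit counts follow the rules stated in the text; the function name and the "header plus one flit" value for AWH are illustrative assumptions of ours.

```python
def min_channel_buffer(technique, max_packet_flits, header_flits=1):
    """Smallest per-channel input buffer, in flits, under the
    simplified rules described in this section."""
    if technique in ("SAF", "VCT"):   # must hold a whole packet
        return max_packet_flits
    if technique == "WH":             # a single flit suffices in theory
        return 1
    if technique == "AWH":            # larger than the header, smaller than a packet
        return header_flits + 1
    if technique == "CB":             # cell-based: every packet is one flit
        return 1
    raise ValueError(f"unknown technique: {technique}")

for t in ("SAF", "WH", "VCT", "AWH", "CB"):
    print(t, min_channel_buffer(t, max_packet_flits=8))
```

Note that SAF and VCT need the same buffer size; they differ in *when* the header is forwarded, not in storage.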




Fig.4. Channel buffer management techniques.

The single-flit packet structure introduces a new problem: because control information is attached to every transfer unit, the ratio of raw data (payload) in each transfer unit decreases.
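This payload-ratio penalty can be quantified with a short calculation. The 64-bit flit and 8-bit control field below are hypothetical widths of ours, chosen only to make the effect visible.

```python
def payload_ratio(payload_bits, control_bits):
    """Fraction of the transferred bits that are raw data."""
    return payload_bits / (payload_bits + control_bits)

# A 16-flit packet pays its 8-bit control overhead once ...
multi_flit = payload_ratio(16 * 64, 8)
# ... while single-flit packets pay it on every transfer unit.
single_flit = payload_ratio(64, 8)

print(round(multi_flit, 3), round(single_flit, 3))  # 0.992 0.889
```

The wider the control field relative to the flit, the steeper the penalty of the cell-based structure.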


To implement a switching technique without buffer overflow, channel buffer management between routers is needed.

4.1 Go & Stop Control

The simplest buffer management is the Go & Stop control, sometimes called Xon/Xoff or on/off control. As shown in Figure 4, the receiver router sends a stop signal to the sender router as soon as a certain amount of its channel buffer becomes occupied, in order to avoid channel buffer overflow. When the buffer space used by packets falls below a preset threshold, the receiver router sends a go signal to the sender router to resume sending. The receiver buffer must be able to store at least the number of flits that are in flight between the sender and receiver routers while the stop signal is being processed. Therefore, the minimum channel buffer size is calculated as follows:

Minimum Buffer Size = Flit Size × (Roverhead + Soverhead + 2 × Link delay)   (1)

where Roverhead and Soverhead are, respectively, the overhead (in time units) to issue the stop signal at the receiver router and the overhead to stop sending a flit once the stop signal is received.

4.2 Credit-Based Control

The Go & Stop control requires at least the buffer size calculated in Equation (1), and the buffer makes up most of the hardware of a lightweight router. The credit-based control makes the best use of channel buffers, and can be implemented regardless of the link length or of the sender and receiver overheads. In credit-based control, the receiver router sends a credit that allows the sender router to forward one more flit as soon as a used buffer slot is released (becomes free). The sender router can send flits up to the number of credits it holds, and uses up one credit each time it sends a flit, as shown in Figure 4. If its credit count reaches zero, the sender router cannot forward a flit and must wait for a new credit from the receiver router. The main drawback of credit-based control is that it needs more control signals between the sender and receiver routers than the Go & Stop control.
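Both schemes reduce to a few lines of bookkeeping. The sketch below evaluates Equation (1) and models the sender side of credit-based control; the class and parameter names are ours, and the numeric overheads in the example are arbitrary.

```python
def min_go_stop_buffer(flit_size, r_overhead, s_overhead, link_delay):
    """Equation (1): buffer space for the flits still in flight while
    the stop signal is issued, propagated, and acted on."""
    return flit_size * (r_overhead + s_overhead + 2 * link_delay)

class CreditSender:
    """Sender side of credit-based flow control: one credit per free
    buffer slot at the receiver."""
    def __init__(self, receiver_buffer_flits):
        self.credits = receiver_buffer_flits

    def send_flit(self):
        if self.credits == 0:
            return False          # out of credits: must wait
        self.credits -= 1         # each flit consumes one credit
        return True

    def receive_credit(self):
        self.credits += 1         # receiver freed one buffer slot

# Example: 1-flit-wide link, 2-time-unit overheads at each end, 3-time-unit link.
print(min_go_stop_buffer(1, 2, 2, 3))  # 10 flits
```

Note how the credit-based sender never needs to know the link delay: correctness follows from the credit count alone, which is exactly the property the text highlights.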

References

[1] W. J. Dally and B. Towles, "Route Packets, Not Wires: On-Chip Interconnection Networks," Proc. Design Automation Conference (DAC), pp. 683-689, 2001.
[2] Erno Salminen, Ari Kulmala, and Timo, "Survey of Network-on-chip Proposals," White Paper, OCP-IP, March 2008.
[3] P. Pratim Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, "Performance evaluation and design trade-offs for network-on-chip interconnect architectures," IEEE Transactions on Computers, vol. 54, no. 8, pp. 1025-1040, 2005.
[4] Ville Rantala, Teijo Lehtonen, and Juha Plosila, "Network on Chip Routing Algorithms," TUCS Technical Report No. 779, August 2006.
[5] Jingcao Hu and Radu Marculescu, "DyAD - Smart Routing for Networks-on-Chip," Proc. Design Automation Conference (DAC), ACM, 2004.
[6] J. Duato, S. Yalamanchili, and L. M. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann, 2002.
[7] W. J. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, 2004.
[8] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Fourth Edition, Morgan Kaufmann, 2007.
[9] L. Benini and G. De Micheli, Networks on Chips: Technology and Tools, Morgan Kaufmann, 2006.
[10] A. Jantsch and H. Tenhunen, Networks on Chip, Kluwer Academic Publishers, 2003.
[11] M. Koibuchi, K. Anjo, Y. Yamada, A. Jouraku, and H. Amano, "A simple data transfer technique using local address for networks-on-chips," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 12, pp. 1425-1437, Dec. 2006.
