International Journal of Computer Trends and Technology (IJCTT) – volume 4 Issue 9– Sep 2013

ISSN: 2231-2803 http://www.ijcttjournal.org Page 2975

Design of Improved Routers for Network on Chip
Bhavana P. Shrivastava#1, Kavita Khare*2

#Department of Electronics and Communication Engg.
Maulana Azad National Institute of Technology
Bhopal, India


Abstract— This paper presents an improved router design for
network on chip (NoC). Network-on-chip provides large-scale
interconnection schemes for complex SoC designs. Parameters
such as power, delay, area, and throughput influence the performance
of an NoC. Here the power delay product (PDP) is analyzed, which is of
prime importance for hardware implementation.
In a router, power is consumed mainly by the buffers and the
crossbar. The proposed routers are designed in 180 nm technology,
and their performance is compared with a baseline router and a
virtual channel router on the same platform.
The proposed router using a dual crossbar achieves 25.5% and
60.88% lower PDP, while the proposed router using a multicrossbar
achieves 37.91% and 61.92% lower PDP, compared to the
baseline router and the virtual channel router respectively.

Keywords— Virtual channel, MP-SoC, Elastic Channel buffer,
Virtual Allocator, FIFO.
I. INTRODUCTION
As IC technology scales, gate delay decreases but wire delay
increases in relative terms, so wire delay becomes the main factor
deciding overall performance [12]. Many VLSI designers try to solve
this long-wire delay problem through buffer insertion. In addition,
many current Systems-on-Chip (SoCs) use a system bus to connect
functional units. A bus can support only a limited number of
functional units, so bus-based SoCs face scaling problems in
large-scale chip multiprocessors (CMPs) [13]. To solve these wire
delay and scalability issues, many studies have suggested a
packet-based communication network known as Network-on-Chip (NoC).
An NoC connects many functional units through a universal
communication network [1, 2, 3]. In an NoC, a router sends packets
from a source to a destination router through several intermediate
nodes. If the head of a packet is blocked during transmission, the
router cannot forward the rest of the packet. To address this
blocking problem, researchers proposed the wormhole method. A
wormhole router splits the packet into several flits, each of which
can be transferred in a single transmission. Buffer allocation and
flow control are performed at the flit level. Power dissipation in
NoC architectures is mainly the power consumed in the links,
crossbars, and input buffers. Among these components, input buffers
alone can consume up to 46% of the total power of the
interconnection network [4]. As a result, reducing the size of the
input buffers or eliminating them completely is a natural approach
to designing low-power NoCs. On the other hand, increasing the size
of an input buffer allows more upstream packets to be transmitted
and buffered, thus increasing the throughput [5]. Therefore, simply
reducing the size of the input buffers in each router may degrade
performance, as the performance of an interconnection network is
strongly affected by the size of the buffers. Different techniques
have been proposed based on two approaches: buffered and bufferless.
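The wormhole splitting described above can be illustrated with a minimal sketch. The flit format is not specified in this paper, so the `Flit` fields, the head/body/tail labels, and the 4-byte flit size below are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flit:
    kind: str        # "head", "body", or "tail" (hypothetical labels)
    packet_id: int   # identifies the packet the flit belongs to
    payload: bytes   # slice of the packet data carried by this flit


def packet_to_flits(packet_id: int, data: bytes, flit_bytes: int = 4) -> List[Flit]:
    """Split a packet payload into flits of at most `flit_bytes` bytes.

    The first flit is marked "head" and the last "tail"; a packet that
    fits in one flit yields a single "head" flit in this sketch.
    """
    if flit_bytes <= 0:
        raise ValueError("flit_bytes must be positive")
    chunks = [data[i:i + flit_bytes] for i in range(0, len(data), flit_bytes)] or [b""]
    flits = []
    for i, chunk in enumerate(chunks):
        if i == 0:
            kind = "head"
        elif i == len(chunks) - 1:
            kind = "tail"
        else:
            kind = "body"
        flits.append(Flit(kind, packet_id, chunk))
    return flits
```

Because buffers are allocated per flit rather than per packet, a router only needs room for one flit to make progress, which is why the input buffers can be kept small.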
II. RELATED WORK
A. Buffered Approach
In the buffered approach, iDEAL reduces the size of the input
buffers and uses the repeaters along the inter-router channels as
storage units, thereby reducing buffer size without reducing
performance [9]. The cost increases, however, because of the extra
control unit. Another approach that uses channel buffering is
Elastic Channel Buffers (ECB), which replaces all the repeaters with
flip-flops and eliminates the input buffers [6]. All packets are
transmitted from one channel buffer to the next by a handshaking
protocol, which adds a small delay to flit traversal.
B. Bufferless Approach
In the bufferless approach, Flit-BLESS proposed a
scheme that sends all incoming packets to output ports,
irrespective of whether those output ports are
productive [7]. In Flit-BLESS, each packet carries
several desired output ports to compete in switch
allocation. As a result, the excessive computation
needed to find the final allocation limits
the performance of the switch allocator, so routers
in Flit-BLESS can only work at a lower frequency
than other NoCs. Another design that
improves upon bufferless routing is SCARAB,
which eases the workload of the switch allocator: if
none of the productive output ports is available, the
packet is dropped and a NACK signal is
transmitted through a dedicated circuit-switched
NACK network to trigger a retransmission [8]. Both
bufferless techniques are suitable only under low-load
conditions.

III. ARCHITECTURE OF BASELINE ROUTER
A baseline router is implemented using the wormhole technique. It
consists of buffers, switches, and control units, which are required
to store flits and forward them from the input ports to the desired
output ports [10, 11]. The architecture is similar to that of modern
routers, but with smaller area and buffer size. Fig: 1 shows a NoC
router with 16 buffer slots per input port. The buffer slots are
divided into four queues, and each queue is called a virtual channel
(VC). There are four cardinal input ports and output ports connected
from and to the +X, -X, +Y and -Y directions. The last pair of
input/output ports is connected from and to the processing element
(PE). The four VCs are sandwiched between the de-multiplexer
connected to the input port and the multiplexer connected to the
crossbar. Each input unit communicates with the routing unit, the
virtual-channel allocator, and the switch allocator, which are
responsible for Routing Computation (RC), Virtual-Channel Allocation
(VA), and Switch Allocation (SA), respectively. The crossbar is
controlled by the switch allocator so that input ports are connected
to the correct output ports. When a flit arrives at the internal
flit buffer, the RC unit directs it to one of the physical channels.
The VA unit receives the credit information from the neighboring
routers, arbitrates among all the header flits that request the same
VC, and then selects one of them according to the arbitration
policy. The SA unit arbitrates among the flits waiting in all VCs to
access the crossbar and allows only one flit to get crossbar
permission. The SA operation depends on the VA stage, since the flit
data in the buffer comes from the previous router on the route. The
flit then passes over the crossbar toward the destination node.
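The input-port organization described above (16 buffer slots divided into four VC queues, with credit information exchanged between routers) can be sketched as follows. This is a minimal behavioural model, not the circuit used in the paper: the class name `InputPort`, the equal split of four slots per VC, and the interpretation of a credit as one free slot are all assumptions.

```python
from collections import deque


class InputPort:
    """Behavioural sketch of one input port with virtual-channel queues."""

    def __init__(self, num_vcs=4, total_slots=16):
        # Assumption: the 16 slots are shared equally, 4 per virtual channel.
        self.depth = total_slots // num_vcs
        self.vcs = [deque() for _ in range(num_vcs)]

    def credits(self, vc):
        """Free slots in a VC, i.e. the credit count an upstream router would see."""
        return self.depth - len(self.vcs[vc])

    def write(self, vc, flit):
        """Buffer-write step: store the flit if the VC has a free slot."""
        if self.credits(vc) == 0:
            return False  # no credit left, upstream must hold the flit
        self.vcs[vc].append(flit)
        return True

    def read(self, vc):
        """Remove the oldest flit of a VC once it wins switch allocation."""
        return self.vcs[vc].popleft() if self.vcs[vc] else None
```

Keeping a separate queue per VC lets a blocked packet in one VC wait without stalling flits in the other VCs of the same physical port, at the cost of the extra VA stage.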














Fig: 1 Baseline router
IV. PROPOSED DUAL CROSSBAR ROUTER
The proposed router combines the advantages of both buffered and
bufferless routers. To obtain both advantages, a dual crossbar is
used. Under low traffic, flits traverse the primary crossbar; under
high load, flits traverse the secondary crossbar through an elastic
buffer. At low load, all packets traverse only the primary crossbar,
follow the minimum path, and experience minimum delay, so the router
behaves as a bufferless network. At high load, packets traverse the
secondary crossbar through an elastic buffer, which performs
handshaking by means of ready and valid signals. Fig: 2 shows the
proposed dual crossbar router with elastic buffers. It has four
input ports. Under low traffic the data flits traverse the primary
crossbar; under heavy load the flits are stored in the elastic
buffers and traverse the secondary crossbar. The function of the
processing element is to give feedback from output to input to
indicate whether the flit is valid. Buffers are provided in front of
the secondary crossbar, and data moves through them serially.
Because the virtual channels are eliminated, the virtual channel
allocation stage is eliminated as well.
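The choice between the two crossbars can be sketched as a simple steering rule. The paper does not state how low and high load are detected, so this sketch assumes that a flit takes the primary crossbar whenever its output is free and nothing is already waiting, and otherwise is stored in the elastic buffer feeding the secondary crossbar. The class name, the buffer depth of 2, and the "stall" outcome are hypothetical.

```python
from collections import deque


class DualCrossbarInput:
    """Sketch of per-input steering between primary and secondary crossbar."""

    def __init__(self, eb_depth=2):
        self.eb = deque()          # elastic buffer in front of the secondary crossbar
        self.eb_depth = eb_depth   # assumed storage depth

    def steer(self, flit, primary_output_free):
        """Return which path the flit takes: "primary", "secondary", or "stall"."""
        # Low load: output is free and no older flit is waiting, so bypass the buffer.
        if primary_output_free and not self.eb:
            return "primary"
        # High load: store the flit; it will later cross the secondary crossbar.
        if len(self.eb) < self.eb_depth:
            self.eb.append(flit)
            return "secondary"
        # Buffer full: the ready signal would be deasserted toward upstream.
        return "stall"

    def drain(self):
        """Pop the oldest buffered flit for traversal of the secondary crossbar."""
        return self.eb.popleft() if self.eb else None
```

Requiring the buffer to be empty before using the primary path is a design choice made here to keep flits of one input in order; the paper does not say how ordering is handled.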

Fig: 2 Proposed Router Design using dual crossbar
The switch arbiter (SA) is modified to control the Demux and Mux so
as to maintain the correct packet flow in both crossbars. Eliminating
the VCs eliminates the VA stage and simplifies the SA stage, so that
SA and ST can be performed in the same cycle. The number of stages
reduces to 3, as shown in Fig: 3.









Fig: 3 The 5 stages of the baseline router reduce to 3 in the proposed router
Fig: 4 shows the circuit design of the proposed dual crossbar
router, implemented in Cadence Virtuoso.


Fig: 4 Partial schematic of proposed dual crossbar router
A. Elastic Buffer

An elastic buffer (EB) uses a ready-valid handshake to
advance a flit (flow-control digit). An upstream
ready (R) signal indicates that the downstream EB
has at least one empty storage location and can
store an additional flit [6]. A downstream valid (V)
signal indicates that the flit currently being driven is
valid. A flit advances when both the ready and valid
signals between two elastic buffers are asserted at
the rising clock edge.
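The ready-valid rule above can be sketched as a cycle-level model of a chain of elastic buffers. This is a behavioural illustration only: the names `ElasticBuffer` and `clock_edge`, and the capacity of two flits per EB, are assumptions and do not describe the latch-level circuit.

```python
from collections import deque


class ElasticBuffer:
    """Behavioural sketch of one elastic buffer stage."""

    def __init__(self, capacity=2):
        self.slots = deque()
        self.capacity = capacity   # assumed number of storage locations

    @property
    def ready(self):
        # R: at least one empty storage location is available.
        return len(self.slots) < self.capacity

    @property
    def valid(self):
        # V: the flit currently driven at the output is valid.
        return len(self.slots) > 0


def clock_edge(chain, sink_ready, new_flit=None):
    """Apply one rising clock edge to a chain of EBs.

    Returns (delivered, accepted): the flit handed to the sink (or None)
    and whether `new_flit` was taken by the first stage.
    """
    n = len(chain)
    # Sample all handshake signals before any state changes.
    ready = [eb.ready for eb in chain] + [sink_ready]
    valid = [eb.valid for eb in chain]
    delivered = None
    # A flit moves from stage i to stage i+1 only if V(i) and R(i+1) are both asserted.
    for i in reversed(range(n)):
        if valid[i] and ready[i + 1]:
            flit = chain[i].slots.popleft()
            if i == n - 1:
                delivered = flit
            else:
                chain[i + 1].slots.append(flit)
    accepted = False
    if new_flit is not None and ready[0]:
        chain[0].slots.append(new_flit)
        accepted = True
    return delivered, accepted
```

With the sink stalled, the stages fill from the downstream end and the first stage eventually refuses new flits; once the sink is ready again, flits drain in arrival order, so the chain behaves as a distributed FIFO.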









Fig: 5 Elastic Buffers block diagram
Fig: 5 shows an elastic buffer implemented using a
D flip-flop (DFF), which uses a ready-valid
handshake to advance a flit. The elastic buffer
works on the principle of handshaking. In an EB
channel, multiple EBs form a distributed FIFO.
Fig: 6 shows the schematic of the DFF and Fig: 7
shows the EB control logic.


[Fig: 2 labels: Routing Computation (RC), Switch arbiter (SA), Look ahead signal, Primary Crossbar, Secondary Crossbar, four Elastic Buffers, Processing Element; input and output ports +X, -X, +Y, -Y]
[Fig: 3 labels: BW, RC, VA, SA, ST, LT; BW, RC, SA, ST, LT]
[Fig: 5 labels: two latch blocks (D, Q, EN), Data, EB Control Logic, ENM, ENS, R_IN, V_IN, R_OUT, V_OUT]



Fig: 6 Schematic of DFF



Fig: 7 Schematic of EB control logic

B. Switch Arbiter
The switch arbiter uses the arbitration result to set the two
crossbars correctly, to control all of the multiplexers that link
each output port of the router to the correct crossbar, and to
control all the de-multiplexers that direct incoming flits to either
the primary or the secondary crossbar. The arbiter implemented here
is a modified round-robin arbiter, shown in Fig: 8. It provides fast
arbitration, which reduces delay and power consumption.
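For reference, the behaviour of a plain round-robin arbiter can be sketched as below. The paper does not describe how its arbiter is modified, so this sketch shows only the standard scheme, with a hypothetical class name and a rotating priority pointer; it should not be read as the circuit of Fig: 8.

```python
class RoundRobinArbiter:
    """Standard round-robin arbiter: the most recent winner gets lowest priority."""

    def __init__(self, num_requesters):
        self.n = num_requesters
        self.pointer = 0  # requester with the highest priority in the next round

    def grant(self, requests):
        """Return the index of the granted requester, or None if nobody requests.

        `requests` is a list of booleans, one per requester.
        """
        for offset in range(self.n):
            idx = (self.pointer + offset) % self.n
            if requests[idx]:
                # Rotate priority so the winner is checked last next time.
                self.pointer = (idx + 1) % self.n
                return idx
        return None
```

Rotating the pointer after each grant gives every input port a bounded wait, which is the fairness property that makes round-robin a common choice for switch allocation.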



Fig: 8 Schematic of Switch Arbiter

C. Dual Crossbar Switch

In the dual crossbar organization, a single crossbar is split
into two parts, each with a smaller number of input
and output ports. The dual crossbars physically connect
an input port to its destined output port based on the
control signal from the switch arbiter. The crossbar is
implemented using multiplexers, as shown in Fig: 10.
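A multiplexer-based crossbar can be modelled as one multiplexer per output port, each driven by a select value from the switch arbiter. The sketch below is a functional illustration under that assumption; the function names and the use of `None` for an idle output are hypothetical.

```python
def crossbar(inputs, select):
    """Model a crossbar built from one multiplexer per output port.

    inputs: list of flits (or None) present on the input ports.
    select: for each output port, the index of the input it forwards,
            or None when the output is idle.
    """
    outputs = []
    for sel in select:
        outputs.append(None if sel is None else inputs[sel])
    return outputs


def is_conflict_free(select):
    """Check that no input port is connected to more than one output port."""
    used = [sel for sel in select if sel is not None]
    return len(used) == len(set(used))
```

For example, `crossbar(["a", "b", "c", "d"], [2, None, 0, 1])` forwards input 2 to output 0, leaves output 1 idle, and forwards inputs 0 and 1 to outputs 2 and 3. Splitting the crossbar reduces the number of inputs each multiplexer must select from.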


Fig: 10 Schematic of Dual Crossbar switch

V. PROPOSED MULTICROSSBAR ROUTER
In the proposed multicrossbar router shown in Fig: 11, the single
crossbar is split into four smaller crossbars to reduce delay. The
four crossbars are divided along the four quadrants: (+x, +y)
[North-East], (-x, -y) [South-West], (-x, +y) [North-West] and
(+x, -y) [South-East]. Suppose a packet arrives from the +x
direction at input I0 with destination quadrant (+x, +y). This
packet can be routed to either O0 (+x direction) or O2 (+y
direction) using the North-East crossbar. Similarly, if a packet
arrives from the +x direction at I0 and enters the South-East
crossbar, the possible outgoing directions are O0 and O3, indicating
that the destination quadrant is (+x, -y). Therefore, by limiting
the crossbar connections and combining selected crossbar outputs,
the design provides more opportunities for the output ports to be
occupied than a conventional crossbar.
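The quadrant selection can be sketched as a lookup from the destination offset to the candidate output ports. Only the North-East (O0, O2) and South-East (O0, O3) mappings are stated in the text; the North-West and South-West entries, the treatment of a zero offset as positive, and the function names are assumptions made for this illustration. Local ejection to the processing element is not modelled.

```python
# Candidate outputs per quadrant crossbar.
QUADRANT_OUTPUTS = {
    "NE": ("O0", "O2"),  # (+x, +y): stated in the text
    "SE": ("O0", "O3"),  # (+x, -y): stated in the text
    "NW": ("O1", "O2"),  # (-x, +y): assumed by symmetry
    "SW": ("O1", "O3"),  # (-x, -y): assumed by symmetry
}


def quadrant(dx, dy):
    """Map signed hop offsets to the destination to a quadrant name.

    Assumption: a zero offset is grouped with the positive direction.
    """
    ns = "N" if dy >= 0 else "S"
    ew = "E" if dx >= 0 else "W"
    return ns + ew


def candidate_outputs(dx, dy):
    """Return the two output ports reachable through the selected quadrant crossbar."""
    return QUADRANT_OUTPUTS[quadrant(dx, dy)]
```

Each quadrant crossbar only needs connections to two outputs, which is how the split keeps the individual crossbars small.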























Fig: 11 Block diagram of Proposed Router Using Multicrossbar
The multi-crossbar configuration provides lower area due to the
split crossbars, reduced delay due to shorter path lengths, and
higher throughput due to selective merging of different output
ports. The function of the processing element is to give feedback
from output to input to indicate whether the flit is valid. Data
moves serially through the buffers; because the virtual channels are
eliminated, the virtual channel allocation stage is eliminated as
well. The switch arbiter (SA) is modified to control the Demux and
Mux so as to maintain the correct packet flow in the crossbars. The
multi-crossbar circuit, implemented in Cadence, is shown in Fig: 12.
Fig: 13 shows the Cadence Virtuoso circuit design of the proposed
router using the multicrossbar.
VI. SIMULATION AND RESULT

The proposed router using the multicrossbar switch reduces PDP by
27.5% compared to the baseline router. It reduces delay by 51.37%
compared to the baseline router and by 11.41% compared to the
proposed dual crossbar router. The proposed router using the dual
crossbar switch reduces PDP by 25.5% compared to the baseline router.













Fig: 12 Schematic of multi-crossbar switch


Fig: 13 Partial schematic of proposed router using multicrossbar
The proposed dual crossbar design reduces power by 68.97% and
reduces the delay by 45.10% compared to the baseline router. As
shown in Table I, the proposed router designs provide better
performance than the other techniques.
TABLE I
PDP CALCULATION

Design                                 Delay (nsec)   Total average power (µW)   PDP (femtowatt-sec)
Baseline router                        4.117          194.6                      801.16
Proposed router using dual crossbar    2.260          263.9                      596.41
Proposed router using multicrossbar    2.002          290.2                      580.98
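The PDP column of Table I is the product of the delay and power columns, and the quoted percentage reductions follow from it. The short sketch below recomputes them from the tabulated values; the dictionary and function names are chosen here for illustration.

```python
# (delay in ns, total average power in µW), copied from Table I
TABLE_I = {
    "Baseline router": (4.117, 194.6),
    "Proposed router using dual crossbar": (2.260, 263.9),
    "Proposed router using multicrossbar": (2.002, 290.2),
}


def pdp_fj(delay_ns, power_uw):
    """Power delay product: ns * µW = 1e-9 s * 1e-6 W = 1e-15 J (femtowatt-sec)."""
    return delay_ns * power_uw


def reduction_percent(baseline, value):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


base_delay, base_power = TABLE_I["Baseline router"]
base_pdp = pdp_fj(base_delay, base_power)
for name, (delay, power) in TABLE_I.items():
    pdp = pdp_fj(delay, power)
    print(f"{name}: PDP = {pdp:.2f}, "
          f"delay reduction = {reduction_percent(base_delay, delay):.2f}%, "
          f"PDP reduction = {reduction_percent(base_pdp, pdp):.2f}%")
```

The recomputed products agree with the PDP column to within rounding, and the delay and PDP reductions relative to the baseline router agree with the percentages quoted in the text.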
[Fig: 11 labels: Routing Computation (RC), Switch arbiter (SA), Look ahead signal, four Elastic Buffers, Processing Element; input and output ports +X, -X, +Y, -Y; quadrant crossbars (+X, +Y), (-X, -Y), (-X, +Y), (+X, -Y); inputs I0 to I3; outputs O0 to O4]

Fig: 13 shows the simulation waveform of the baseline router. The
simulation result of the dual crossbar router is shown in Fig: 14.
Fig: 15 shows the simulation results of the multicrossbar router.

Fig: 13 Simulation waveform of baseline router


Fig: 14 Simulation waveform of proposed router using dual crossbar

Fig: 15 Simulation waveform of proposed router using multicrossbar
VII. CONCLUSION
The PDP of the proposed routers is significantly reduced. To improve
performance further, a multi-crossbar switch is used in place of the
dual crossbar switch. The multi-crossbar configuration provides
lower delay due to the split crossbars: delay reduces because of
shorter path lengths, and throughput increases because of selective
merging of different output ports. Both proposed designs combine the
advantages of buffered and bufferless routers and enhance
performance by reducing PDP.

REFERENCES

[1] R. Ho, K. W. Mai, and M. A. Horowitz (2001) “The future of
wires,” Proceedings of the IEEE, vol. 89, pp. 490–504.
[2] L. Benini and G. D. Micheli (2002) “Networks on chips: A new
SoC paradigm,” IEEE Computer, vol. 35, pp. 70–78.
[3] W. J. Dally and B. Towles (2001) “Route packets, not wires,” in
Proceedings of the Design Automation Conference (DAC),
Las Vegas, NV, USA, June 18-22.
[4] J. D. Owens, W. J. Dally, R. Ho, D. N. Jayasimha, S. W.
Keckler, and L. S. Peh (2007) “Research challenges for on-chip
interconnection networks,” IEEE Micro, vol. 27, no. 5, pp. 96–108.
[5] S. Heo and K. Asanovic (2005) “Replacing global wires with an
on-chip network: A power analysis,” in Proceedings of the
International Symposium on Low Power Electronics and
Design (ISLPED), San Diego, CA, USA, pp. 369–374.
[6] G. Michelogiannakis, J. Balfour, and W. J. Dally (2008)
“Elastic-buffer flow control for on-chip networks,”
IEEE Micro, pp. 151-162.
[7] T. Moscibroda and O. Mutlu (2007) “A case for bufferless
routing in on-chip networks,” in Proceedings of the 36th Annual
International Symposium on Computer Architecture.
[8] M. Hayenga, N. E. Jerger, and M. Lipasti (2009) “Scarab: A
single cycle adaptive routing and bufferless network,” in
Proceedings of the 42nd Annual IEEE/ACM International
Symposium on Microarchitecture.
[9] A. K. Kodi, A. Sarathy, and A. Louri (2008) “iDEAL: Inter-router
dual-function energy- and area-efficient links for
network-on-chip (NoC),” in Proceedings of the 35th International
Symposium on Computer Architecture, Beijing, China, pp. 241–250.
[10] C. A. Nicopoulos, D. Park, J. Kim, N. Vijaykrishnan, M. S.
Yousif, and C. R. Das (2006) “ViChaR: A dynamic virtual
channel regulator for network-on-chip routers,” in MICRO’39:
Proceedings of the 39th Annual IEEE/ACM International
Symposium on Microarchitecture, pp. 333–346.
[11] P. Guerrier and A. Greiner (2000) “A generic architecture for
on-chip packet-switched interconnections,” in DATE ’00:
Proceedings of the Conference on Design, Automation and
Test in Europe, pp. 250–256.
[12] S. Borkar (1999) “Design challenges of technology scaling,”
IEEE Micro, vol. 19, pp. 23–29.
[13] W. J. Dally and B. Towles (2004) Principles and Practices of
Interconnection Networks. San Francisco, USA: Morgan
Kaufmann.