You are on page 1of 5

International Conference on Communication and Signal Processing, April 3-5, 2014, India

Design And Verification Of Five Port Router


For Network On Chip

E. M. Choudhari, Dr. P. K. Dakhole, Member, IEEE

the inter process communication among different modules


Abstract- Traditional system on chip (SOC) designs offer takes place by transfer of packets instead of polling or
integrated solutions to exigent design tribulations in areas which arbitration as in bus architecture. The NOC design paradigm
necessitate outsized computation and restriction in certain area. has been proposed as the future of ASIC design.
Because of the common bus architecture in SOC system,
There are three main components of NoC namely router,
performance becomes sluggish which limits the processing speed.
resource and resource to network interface. For the efficient
The network on chip (NOC), due to their characteristics such as
NoC architecture, the router should be efficiently design as the
scalability, flexibility, high bandwidth have been proposed as a
valid approach to meet communication requirements in SoC, router is the central component of the network on chip system.
where common bus architecture replaced by network. The It is the communication backbone of the NOC system. Routers
communication on network on chip is carried out by means of are used on a network for directing the traffic from source to
router, so for implementing better NOC, the router should be the destination. It co-ordinates the data flow which is very
efficiently design. In this paper we present the design and crucial in communication network. The communication on
verification of router for Mesh topology using Verilog HDL network on chip is carried out by means of router.
which supports five parallel connections at the same time. It uses
The goal of this paper is to provide efficient router
store and forward type of flow control and FSM controller
architecture for efficient NoC Mesh Topology having five
deterministic routing which improves the performance of router.
input port and five output port. The router we present in this
Design unit is targeted to Sparten 3E xc3s500e-4fg320 FPGA
device and simulated in XILINX 13.1 Software. paper supports five parallel connections at the same time. One
port is the local port used to assure communication between
Index terms- FIFO, FSM, Mesh Topology, Network-On­ the processing element and the router via the network
Chip, Register blocks, Router Simulation. interface. But for the network communication we are not
consider this port in our paper. The remaining ports allow
connecting the router with its neighbors. It uses store and
I . INTRODUCTION
forward type of flow control. The switching mechanism used
In old era ICs have been designed with dedicated point to here is packet switching which is generally used on network
point connections, with one wire dedicated to each signal. Due on chip. In packet switching the data transfers in the form of
to this, large design had some limitations from physical design packets between co-operating routers and independent routing
viewpoint. The wires (wire bus) occupy large area of chip and decision is taken. The store and forward mechanism is best
in CMOS technology, interconnection dominates performance because it does not reserve channels like circuit switching and
and dynamic power dissipation. Network on Chip (NoC) is a thus does not lead to ideal physical channels. The Controller
new paradigm to make the interconnections inside a System design in this paper is FSM Controller so that every channel
on Chip (SoC) system. It reduces complexity of designing once get chance to transfer its data. In this router both input
wires and also increases speed and reliability. NoC can and output buffering is used so that congestion can be avoided
provide separation between computation and communication at both sides. We design here the router which support for the
supports modularity and IP reuse via standard interfaces, serve communication of Mesh topology which has five input and
as a platform for system test, handles synchronization issue, five output ports. And for the transmission of data round robin
and, hence increasing engineering productivity. In NoC scheduling is used. In section II, some related work is
technology the bus structure is replaced with a network which discussed. In section III, the overview of NoC design approach
is a lot similar to the Internet. is introduced, in section IV, the detailed proposed architecture
of router is described, section V presents the Synthesis and
Segments communicate with each other by sending packetized simulation results of our design and in section VI conclusions
data over this network. Network on Chip gives solution that are discussed.

II. RELATED WORK


E. M. Choudhari, M. Tech (student) in the Department of Electronics
Engineering from Yeshwantrao Chavan College of Engineering, an The router we design in this paper is avoiding congestion
autonomous institute affiliated to Rashtrasant Tukadoji Maharaj Nagpur and communication bottleneck. Although there are number of
University, Maharashtra, India. (e-mail: ektaOI2@gmail.com).
router implementation has already been done, some of the
Dr. P. K. Dakhole, Professor in the Department of Electronics
Engineering, Yeshwantrao Chavan College of Engineering, an autonomous related works are included here. In [I], FPGA implementation
institute affiliated to Rashtrasant Tukadoji Maharaj Nagpur University, of network on chip framework is given. In this they conclude
Maharashtra, India.(e-mail: pravin_dakhole@yahoo.com).

978-1-4799-3358-7114/$31.00 ©2014 IEEE +-IEEE


Advancing Technology
for Humanity

607
that as compare to common bus architecture NoC architecture Most common topology is 2-D Mesh due to its grid-type
is effective in terms of speed performance and throughput. In shapes and regular structure which are the most appropriate
[2] analysis of five port router is given. The diagonal Mesh for the two dimensional layout on a chip. Mesh topology is
topology is designed to offer a good tradeoff between easy to implement as all nodes are equally distance as shown
hardware cost and theoretical quality of service in [3]. Design in Figure.l and also makes addressing of the cores quiet
can contain large number of nodes without changing the simple during routing. The local interconnection between the
maximum diameter. They also present a new router resources and routers are independent of the size of network.
architecture called Flexible, Extensible Router NoC Moreover routing in a 2-D Mesh is easy resulting in
(FeRoNoc). The NoC router architecture with five port potentially small switches, high capacity, short clock cycle
presented in [4], which utilizes dual crossbar arrangement. and overall scalability [11]. Therefore we choose the mesh
Due to which latency arises, this reduces using predominant topology. A mesh topology has four inputs and four outputs
routing algorithm. In [5], NoC is implemented using Torus from!to other routers, and another input and output from! to
topology. They proposed router architecture, a router the PE. So we design here router for the implementation of
algorithm and also provided solution to the problem mesh topology.
introduced by the long wires in torus topology by pipelining
both the long wires and short wires and by lengthening the
input buffers attached to the long wires. Pratiksha Gehlot and
Shailesh Singh Chouhan had evaluated performance of
different Noc architectures and compare them on the basis of
some performance parameter. They conclude that Mesh and
Folded Torus has moderate values of all parameters [6]. In [7],
they mainly worked on analysis of router. They discussed
different types of router structure and used Markov chain
analysis to derive an analytical model for input queue mesh
based router. In [8], a low area overhead packet-switched Noc
architecture is presented but the drawback with that is , packet
has two headers which is quite expensive. The buffer here is
present only with input channel. The absence of output buffer
creates a serious problem in the implementation of router as it
increases the problem of congestion. Theodore Marescaux et.
Figure I: 3x3 Mesh topology
al. in their paper [9], presented the implementation of router
for NoC based system which has 2D torus network topology. C. Routing Algorithm:
Packet size has 16 data bits and 2 control bits. The main
The routing algorithm which defmes a path taken by a
drawback of this paper is a 2D torus formed using ID router
packet between the source and the destination is a main task in
which creates a serious bottleneck in traffic. Our paper
network layer design of NoC[I2]. According to how a path is
removes most of the problems cited above and improves the
defmed to transmit packets, routing can be classified as
performance of router.
deterministic or adaptive. In deterministic routing, the path is
uniquely defined by the source and target addresses.
Deterministic routing algorithms are widely used due to easy
III. AN OVERVIEW OF NOC DESIGN APPROACH implementation. and in adaptive routing algorithm use
Switching method, Topology and Routing algorithm are information about the network's state to make routing
three important things in the design of NoC. decisions. A mesh network topology consists of m column and
n rows. The routers are situated in the interconnection of two
A. Switching Technique: wires and processing element near routers. Addresses of
Circuit switching and packet switching are two major routers and resources can be easily defined by X and Y co­
switching techniques. Packet switching is better than circuit ordinates in a mesh. Hence for Mesh topology XY
switching as it does not reserve the path like circuit switching. deterministic routing algorithm is used.
Packet switching is utilized in most of NoC platforms because
of its potential for providing simultaneous data
communication between many source-destination pairs. IV. ROUTER ARCHITECTURE
Further it can be classified into Store and forward, Virtual cut
A router is a device that forwards data packets across
through and Wormhole switching [10].
computer networks. A router is a microprocessor controlled
B. Topology: device that is connected to two or more data lines from

Topology defines how nodes are placed and connected, different networks. When a data packet comes from in on one

affecting bandwidth and latency of a network. Many different of the lines, the router reads the address information in the

topologies have been proposed [5][6]. Such as mesh, torus, packet to determine its ultimate destination. With the help of

octagon, SPIN etc. Some researchers have proposed routing algorithm it directs the packet from router to router

application specific topology that can offer superior through the networks until it goes to the destination.

performance while minimizing area and energy consumption.

608
The proposed router architecture consist of four blocks i.e.
register, demultiplexer, FIFO, controller. The register is uta)
similar to the buffer which stores the input data.
Demultiplexer provides the channel which sends the input data
to the appropriate output port. FIFO is a first in first out
memory queue with control logic that manages read and write rei '
operations. It is used to control the flow of data between ,,2

source and destination. The controller is based on Fsm ,,]

deterministic routing which improves the performance of R1GISJlR DBltuWLIXER comOWR


router. The router which we design is a five port network
router. Router accepts the data in the form of packet. Packet [,\DHlVl
contains three parts. They are Header, data and frame check [,\R1G
sequence. Packet width is 8 bits and the length of the packet
between I to 63 bytes. The router drives the packet to
respective ports based on the destination address of the
packets. Figure.2 shows the block diagram of the router block. resell
This block is used to design the central router of Mesh
topology having five input and five output ports shown in
Figure 2: Internal structure of router block
Figure.3, In this block transfer of data from all inputs to all
output ports is carried out by round robin mode.

A. Resister SCHEDUlAR
D.\B_O!Jl)

We design the 8 bit register having clock, reset and enable. v.IJ.JD_CB.\'\ILJ

At the posedge of the clock, reset should be low and enable


should be high then and only then we get the output. When
Input
enable remain low, the register keeps its current value. The
ControUer D.H.\_0IJ1)
output of the register is the input of the demultiplexer. The V.WDJH.\'\1LJ

register is acting as a buffer. It stores the value temporally. As


the register we design here, it first stores the data and forwards
it when all the conditions are proper. We use the store and
D.\H_Olil)
forward flow control which avoid the congestion.
,r.lliIl_CH.\.'ill:)
Output
B. Demultiplexer ControUer
We design the 1:4 bit demultiplexer to define the four
D.U.�_Oli1_�
output ports. The last two bits of the input data acts as a select
\'ALIDJH.-\.'ill:_(
signal which indicates the port. Table I Shows the output port
regarding last two bits. Demultiplexer transfers the data
coming from the register block to the appropriate output port.

TABLE I Figure 3: Internal structure of mesh router


PORT IDENTIFICATION

Last Two Bits


C. FIFO
Sr. No. Output Port
of Data
We design the FIFO block having 8 bit width and memory
array of 0 to 15 location which stores the data called as depth
1. 00 Port I
of memory array. Here there are four fifo's to store the data
2. 01 Port 2
coming from individual output ports,It performs the read and
3. 10 Port 3
4. II Port 4 write operations. Fifo have fifo_full and fifo_empty control
signals. Fifo's are used to safely pass multi-bit data words
from one clock domain to another or to control the flow of
according to the last two bits defme in the table. The
data between source and destination side sitting in the same
demultiplexer also has an enable. When enable is high then
domain. Here for read and write operation we use the
only the demultiplexer directs the data to the appropriate
synchrounous clock. Write operation is done at the posedge of
output port at the same time corresponding write enable pin
the clock when the reset is low, write enable pin is one and
should also high. And other output ports and write enable pin
fifo is not full. In this condition only the data write into the
should remain zero. The output of the each port is connected
memory of the fifo. Read operation is done at the rising edge
to the individual FIFO unit.
of the clock ,when reset is low and simultaneouly the read
enable pin also one and fifo is not empty then only the data is
read from the memory of the fifo. When the fifojull signal is
high, it indicates that all the locations inside the fifo has been
written and it is unable to write the data into the memory of
fifo. Similarly when the fifo_empty signal is high, it indicates
that all the locations inside the fifo has been empty and it is

609
unable to read the data from the memory of the fifo. For four �"

� � =
fifo blocks there are four fifo_full signals and fifo_empty -.
signals which is the input to the controller.
� ----I
----....
---4 �,'
...
D. FSM Controller

I
This module generates all the control signals when new
packet is coming to the router. These control signals are used
by other modules to send data at output. As the demultiplexers
output is connected to the contoller, the data of the each port is
accepted by the controller when the respective port enable pin
is high as well as packet_valid signal is high. At the posedge II � I
of the clock when resetn is low then only data can be accepted.
fifojull and fifo_empty signals are also handlled by the

-----"'--
controller. There are four FIFO's regarding four output
ports.Controller checks which fifo is full, which FIFO is --,,-±=
empty and according to the port address it gives signal to write .--""-.
the data into the respective FIFO when we pin of controller is ----t.."j
high. and when fifo_full signal becomes high data is read from -=

the respective FIFO and send to the respective output port of


the controller. When reading process is completed data is -.-
ready to transmit. It indicates by the respective valid_chanel
signal when it becomes high. There are four valid_chanel for Figure 4: RTL schematic of router block
the four output ports of the controller. if any one of the chanel
is low, it means that chanel is busy in communicating and
respective fifo_empty signal is high. It means that it is ready
to write the data into the present fifo. When the suspend_data
f��E:
riB ,-
-
� �
==F

t:� �

- �
signal is low, then only router can accept the new data from -
the input and latched data is send to the FIFO and we pin is = I ' � � F--- I---
-
--'-
generated for writing the data into present FIFO. In present I---
controller, fsm logic is used due to which cycle for each F t==: : I---
�""''-
output port is repeated. Hence performance is increases.
If=

v. SYNTHESIS AND SIMULATION RESULTS

A. Synthesis
III
The synthesis of the proposed design has been done using
�..
Xilinx and following is the RTL schematic of the proposed
"'+ �
router. The figure.4 below is a schematic of a router block
.;:..

with the basic modules i.e. Register, Demultiplexer, FIFO and �t:�
Controller having one input and four output port and Figure 5 ...
is the RTL schemayic of Mesh router. It is seen that the design
has a four input ports and four output ports which include FigureS: RTL schematic of mesh router

router block and round robin schedular having input controller �


and output controller. Which manages the control on flow of ' �."
�� �� ". �� .,� f"."
" Ill." <ll. " "

packets from all four input to all four output ports using round lilUlpmd.daU n

robin scheduling. �. 0
linid.dIInriI ,
li!O'id.mm! 1 -
B. Simulation Result lIraid.dIWIl n

1I1iW.dIintK 0
Simulation refers to the verification of a design, its . , d\.O'.tIjnl 101m:o !tOllOOj

. � •."". IOmO) .. ..,


function and performance.The simulation is performed in • � •.oJlIl' 00:)0:10 """"
Xilinx 13.1 software. Figure.6 shows the simulation result of � ' d\.oot4(nl 00:)0:,0 """"
�. 0 J J J J J J J J J J
the router block having one input port and five output port. liEJj}IG ,
�.-... , -
lirmln 0
� .... ,.. ,
�� ,
�� 0
�" 0
� � dtI.ii11l1 lOll!all .....,

Figure 6: Simulation result of router block

Router block accept the input at the posedge of clock when rst
is low, respective re pin is high and suspend_data signal is low

610
as well as EN_REG and EN_DEMUX pin also high. As the Similarlly in future we can work for the implementation of
input is 100llO00 it indicates the data should get at port different topologies by designing suitable efficient router
I.After 195 ns when fifo get full, controller send signal and architecture regarding topology using improved verification
data is get at ch_outl, and it is ready to transmit which is methodology.
shown by the valid_chanell which becomes high. and Figure
7 shows the simulation result of Mesh router having four input
and four output port. REFERENCES
[I] Shubha B C, Srinkanta P, "FPGA Implementation of Network on Chip
� Framework using HDL", Students' Technology Symposium(TechSym),
- - - - - - - � - [EEE PP. 151-155, April 2010.
� ' D+JI.JXITI(7� lG1l0100
� 10110101
!l4 Do1J.J.'IJ12I711!
[2] Swapna S, Ayas Kanta Swain, Kamala Kanta Mahapatra,"Design And
� '[WAJ"IJl3�JJI 10110l!O
� It [WA.OOIIj1.i1j 101l0m I)l Analysis of Five Port Router For Network On Chip", Asia Pacific
lIW11.OWfl) 1
Conference on Postgraduate Research in Microelectronics and
lIw.l11.OWfl) 1
lIW1l.OWfl) 1 Electronics(PrimeAsia)2012.
1llNllJHIIlIJ 1
• l1li 001.,11(7111 [3]Majdi Elhajji, Brahim Attia, Abdelkrim Zitouni, Rached Tourki,Jean-luc
� If Do\J,JI2[7JJj Dekeyser, "FERONOC:Flexible and Extensible Router Implementation
� IIW [l,ijl,.t6[7JJ1
� II1II twAJI4j1JJj For Diagonal Mesh Topology"Author manuscript, published in
,0< nnHi Conference on Design and Architecture for Signal and Image
�"'
l.i�Ifl.WIl) 0 Processing(20II).

f!!I,I �I-f--J - L----.J I L...-..f [4] M.S.Suraj, D.Muralidharan, K. Seshu Kumar, "A HDL Based Reduced
li Mff.WD.4 0 Area NoC Router Architecture"978-1-4673-5301-4/\3/$31.00 © 20\3
[EEE.

[5] Angelo Kuti Lusala, Philippe Manet, Bertrand Rousseau, Jean-Didier


Legat "NOC IMPLEMENTAnON IN FPGA USING TORUS
Figure 7: Simulation result of mesh router TOPOLOGY" International Conference on Field Programmable Logic
and Applications, PP 778-781, IEEE-2007.
In this at the posedge of the clock when reset is low, the data
[6] Pratiksha Gehlot, Shailesh Singh Chouhan, "Performance Evaluation Of
which is at the any input port out of four input port goes to the Network on Chip Architectures",Departrnent of Electronics and
respective output port which is indicate by the last two bits of communication Institute of Engineering & Technology DAVV Indore,
the data. The input port accepts the data when respective India, International Conference on Emerging Trends in Electronics and
PACKET_VALID signal becomes high. When we get the data Photonic Devices & Systems (ELECTRO-2009).

at output port, it becomes ready to transmit which is shown by [7] Haythm Elrniligi, Ahmed A. Morgan, M. Watheq EI-Kharashi, and

respective VALID_CHANEL making high. In Table II we Fayez Gebali "Performance Analysis of Networks-on-Chip Routers",
Department of Electrical and Computer Engineering University of
shows the comparison of resource utilization of the proposed
Victoria, BC, Canada V8W 3P6 Mentor Graphics Egypt,Cairo
router and router design in reference[2]. From this simulation I 134I,Egypt, IEEE-2007.
we get 125.47MHz frequency whereas the design proposed in
[8] Fernando Moraes et.a!. A Low Area Overhead Packet-switched
reference[2] have 144.l96MHz frequency. But when we
Network on Chip: Architectutre and Prototyping. In IFIP VLSI­
simulate our design in Sparten 6 FPGA targeted to device
SOC 2003, pages 318-323, 2003.
xc6slx4-3tggI44, we get the frequency 193.43MHz.
[9] Theodore Marescaux et.a!. Interconnection Networks Enable
TABLE II Fine-Grain Dynamic Multi-Tasking on FPGAs. In FPL' 2002,
COMPARISON OF RESOURCE UTILIZATION OF FPGA pages 795-805, September 2002.
(DEVICE SELECTED- xc3s500E-4FG320)
[10] Partha Pratim Pande, Cristian Grecu, Michael Jones, Andre
Designed Unit Ivanov,Resve Saleh, " Performance Evaluation and Design Trade-offs
Sr. No. Recourses Proposed Design in Reference for Network-on-Chip Interconnect Architectures",TEEE Transactions On
[2] Computers, Vol 54, No.8, August 2005.

[II] Shashi Kumar, Axel Jantsch, Juha-Pekka Soininen, Martti Forsell,


I. Slices 2024 1464 Mikael Millberg, Johny Oberg, Kari Tiensyrja and Ahmed Hemani, " A
2. Slice Flip Flops 2133 1897 Network on chip Architecture and Design Methodology" Proceedings of
3. 4 input LUTs 2231 1901 The IEEE Computer Society Annual Symposium on VLSI, [EEE-2002.
4. Bonded lOB's 74 87
[12] Ville Rantala, Teijo Lehtonen, Juha Plosila, "Network On Chip Routing
5. GCLK's 5 I
Algorithms", TUCS Technical Report No 779, August 2006.

[\3] William J Dally, Brian Towels, "Route Packets Not Wires: On Chip
Interconnection Network",DAC 200I,June 2001Y.

[[4] Sharma, V. V. Joshi, V. M. Rohokale, "Performance of Router Design


VI. CONCLUSION AND FUTURE SCOPE
for Network on Chip [mplementation", [nternational Conference on
Mesh router having four input and four output ports using Communication, Information & Computing Technology (lCCICT), Oct.
Fsm logic has been proposed in this paper. So that every 19-20, Mumbai, India 2012

channel gets chance to transmit the data and performance [15] Muhammad Awais Aslam,Shashi Kumar,Rickard Holsmark, "An

increases. The synthesis and simulation of the proposed router Efficient Router Architecture and its FPGA Prototyping to Support
t
Junction Based Routing in NoC Platforms", 16 h Euromicro Conference
is verified through verilog code using Xilinx 13.1 software.
on Digital System Design 2013.
The simulation facilitates clear understanding of the
functionality of the router for network on chip. When we
simulate proposed mesh router in FPGA Sparten 3E we get the
125.47MHz frequency and when we simulate it on FPGA
Sparten 6 we get 193.43MHz frequency. In future we can
implement the mesh topology using the proposed router.

611

You might also like