
Design and Implementation of Dual-Port Network on Chip Based on Multi-core System
Yu-Kun Song, Qing-Song Qian*, Duo-Li Zhang

Institute of VLSI Design, Hefei University of Technology, Hefei 230009, China


*Email: sqhfut@163.com

Abstract

In order to make full use of the parallelism of NoC and improve the parallel communication ability between resource nodes, this paper introduces a high-performance dual-port network and its associated routing algorithm. The Torus structure is also used in the design to further improve network performance, since it provides better network connectivity and a smaller radius. This paper designs and implements an 8×8 dual-port network on FPGA and tests the throughput and latency of this structure. The experimental results show that the structure has higher throughput and lower transmission latency.

1. Introduction

Compared with a traditional single-core system, a multi-core system has the advantages of higher computing capability and lower power consumption. At the same time, it brings high-throughput and high-parallelism challenges to on-chip communication. Network on chip (NoC) has been proposed as an effective communication architecture [1][2]. However, a general mesh-based NoC provides just one local port for each resource node, which limits each node to sending or receiving one-way data at a time, so it cannot fully exploit the parallelism of the NoC to improve the parallel communication ability between resource nodes. Some researchers have proposed using multi-port networks to increase throughput and decrease delay, and thereby improve the parallelism of communication. References [3] and [4] put forward the same-dimension dual-port network. In [3], the dual-port design was used for fault tolerance: once a port was congested, the design selected the other port to communicate. In [4], the two ports can not only replace each other for fault tolerance but also communicate simultaneously with other resource nodes; in addition, they can send or receive two-way data at the same time. However, that structure decreases the efficiency of network communication because it occupies too many paths in one dimension.

This paper designs a Torus-based diagonal dual-port network (TDPN). Each router in the TDPN provides two local ports, and each resource node connects with two routers along a diagonal. This communication structure can improve the communication efficiency between resource nodes. Meanwhile, the diagonal connection and the Torus topology increase network throughput and reduce transmission latency.

2. Design of TDPN

2.1 Structure of TDPN
The structure of TDPN is shown in Figure 1, where every router can handle requests from two resource nodes concurrently. At the same time, each resource node can send and receive two-way data. The merits of TDPN are the symmetry of the whole network, the smaller network diameter, and good scalability.

Figure 1. Structure of the dual-port network (legend: router, resource node, routing path)

2.2 Router architecture
Figure 2 shows the architecture of the TDPN router, which consists of six parts: six Input modules, a Decoder, a Priority_encoder, an Arbiter, a Crossbar, and six Output modules.

Figure 2. Router architecture
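
To make the "smaller network diameter" merit from Section 2.1 concrete, the following Python sketch compares the worst-case router-to-router hop count of an n×n mesh with that of an n×n Torus. It is only an illustration under stated assumptions: minimal (shortest-path) routing, one hop per link, and no modelling of the diagonal dual-port attachment of resource nodes. The function names are hypothetical and do not come from the paper.

```python
from itertools import product


def ring_distance(a: int, b: int, n: int) -> int:
    """Hops between positions a and b on a ring of n routers (wraparound links)."""
    d = abs(a - b)
    return min(d, n - d)


def line_distance(a: int, b: int) -> int:
    """Hops between positions a and b on a line of routers (no wraparound)."""
    return abs(a - b)


def diameter(n: int, torus: bool) -> int:
    """Worst-case minimal hop count over all router pairs of an n x n network."""
    coords = list(product(range(n), repeat=2))
    worst = 0
    for (x0, y0), (x1, y1) in product(coords, repeat=2):
        if torus:
            hops = ring_distance(x0, x1, n) + ring_distance(y0, y1, n)
        else:
            hops = line_distance(x0, x1) + line_distance(y0, y1)
        worst = max(worst, hops)
    return worst


mesh_hops = diameter(8, torus=False)   # corner to corner: 2 * (8 - 1) = 14
torus_hops = diameter(8, torus=True)   # half a ring per dimension: 2 * (8 // 2) = 8
print(mesh_hops, torus_hops)           # prints "14 8"
```

For the 8×8 size used in this paper, the wraparound links cut the worst-case distance from 14 hops to 8 hops under these assumptions.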

978-1-4673-9719-3/16/$31.00 ©2016 IEEE


2.3 Routing algorithm
The routing algorithm obtains the transmission direction of a routing request by comparing the address of the destination node with that of the current node. In this paper, the router has two local addresses: the address of the Local0 direction (X0, Y0) and the address of the Local1 direction (X1, Y1). The relationship between (X0, Y0) and (X1, Y1) is given by equation (1).

$$(X_1, Y_1) = \begin{cases} (X_0 - 1,\ Y_0 - 1), & X_0 \neq 1 \text{ and } Y_0 \neq 1 \\ (N,\ N), & X_0 = 1 \text{ or } Y_0 = 1 \end{cases} \tag{1}$$

The TDPN routing algorithm consists of two parts: the intermediate path routing algorithm and the destination node switching algorithm.

The intermediate path routing algorithm governs routing from the source node to the nodes around the destination. The turning routing algorithm [5] is used in this paper; it determines the possible routing directions by comparing the addresses of the destination node and the local node. The arbiter chooses the available direction from all the possible routing directions in order. Meanwhile, the Torus interconnection increases path diversity, so the network distribution also determines the possible routing directions. The pseudo code of the TDPN intermediate path routing algorithm is shown in Figure 3, where Net_R stands for the network radius.

    if ((Dest_X > Local1_X && Dest_X - Local1_X <= Net_R)
        || (Dest_X < Local1_X && Local1_X - Dest_X >= Net_R))
        Select right output port;
    if ((Dest_X < Local2_X && Local2_X - Dest_X <= Net_R)
        || (Dest_X > Local2_X && Dest_X - Local2_X >= Net_R))
        Select left output port;
    if ((Dest_Y > Local1_Y && Dest_Y - Local1_Y <= Net_R)
        || (Dest_Y < Local1_Y && Local1_Y - Dest_Y >= Net_R))
        Select bottom output port;
    if ((Dest_Y < Local2_Y && Local2_Y - Dest_Y <= Net_R)
        || (Dest_Y > Local2_Y && Dest_Y - Local2_Y >= Net_R))
        Select top output port;

Figure 3. Pseudo code of intermediate path algorithm

The destination node switching algorithm selects one of the two routing interfaces that connect the resource node with its routers. The schematic diagram of node switching is shown in Figure 4. In (a), request1 and request2 have the same destination address. When request2 reaches router0 and the Local0 direction is already occupied by request1, request2 can switch to another routing node (router3) to reach the destination node. Request2 can be routed to router3 through either router1 or router2. In (b), when request1 and request2 arrive at router1 at the same time, they can enter the destination node through the two routers that are connected with the destination node, respectively. This reduces the data transmission time when the destination node needs two-way data, so it improves the parallel communication efficiency between resource nodes. When the two routing requests arrive at router2 and router3, the destination node switching method is similar to cases (a) and (b).

Figure 4. Switch between two routers of destination port (panels (a) and (b))

2.4 Implementation
According to the features of the dual-port network, this paper implements the six modules and the interconnection of the routing nodes on the basis of the Torus network model. Both the DPN [4] and the TDPN, with a network size of 8 × 8, have been designed and implemented on the Xilinx V6VLX760-1ff1176 FPGA. The hardware resource utilization is shown in Table 1. The maximum clock frequency of TDPN is 263.16 MHz.

Table 1. Resource utilization

              DPN      TDPN     Reduction
  LUTs        35712    34176    4.3%
  Registers   87429    79762    8.8%

3. Performance evaluation

In order to evaluate the efficiency of TDPN, this paper chooses average throughput and average flit latency as the performance indices and compares TDPN with DPN to analyze its data transmission performance in the binary affair experiment. Average throughput and average flit latency are defined as follows:

$$\text{Throughput} = \frac{(\text{Send Times}) \cdot (\text{Package Length})}{(\text{Number of Nodes}) \cdot (\text{Total Time})} \tag{2}$$

$$\text{Latency} = \frac{\sum_{i=1}^{\text{Send Times}} \text{Latency}_i}{(\text{Send Times}) \cdot (\text{Package Length})} \tag{3}$$
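
As a minimal sketch of equations (2) and (3), the Python helpers below compute both metrics from measured counters. The function and parameter names are hypothetical, the units (flits for package length, clock cycles for time) are an assumption, and the sample numbers are made up purely for illustration; they are not measurements from this paper.

```python
def average_throughput(send_times: int, package_length: int,
                       number_of_nodes: int, total_time: int) -> float:
    """Equation (2): flits sent, normalised by node count and total time."""
    return (send_times * package_length) / (number_of_nodes * total_time)


def average_flit_latency(latencies: list, package_length: int) -> float:
    """Equation (3): summed per-transaction latency divided by total flits sent.

    Send Times is taken to be the number of recorded transactions.
    """
    send_times = len(latencies)
    return sum(latencies) / (send_times * package_length)


# Made-up example: 4 transactions of 2000 flits from 2 nodes in 16000 cycles.
throughput = average_throughput(send_times=4, package_length=2000,
                                number_of_nodes=2, total_time=16000)
latency = average_flit_latency([2100, 2300, 2500, 2700], package_length=2000)
print(throughput, latency)
```

With these illustrative inputs, the throughput works out to 8000 / 32000 = 0.25 and the latency to 9600 / 8000 = 1.2 in the assumed units.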

This paper reports evaluation experiments executed on FPGA. The experimental conditions are:
(1) Data in the network are transmitted using PCC technology [5], and the packet length is set to 2000 flits;
(2) Each resource node establishes 200 data transmission transactions, and the destination node is randomly generated;
(3) The network load is controlled by changing the network injection rate: the interval between two data transactions ranges from 0 cycles to 20K cycles. The shorter the interval, the heavier the network load.

3.1 Experiment of binary affair
In the binary affair experiment, each resource node uses both ports to send two-way data to two destination nodes at the same time. The results are shown in Figure 5 and Figure 6. As they show, the average throughput and the average flit latency both grow as the network load increases. The throughput of TDPN, as shown in Figure 5, grows faster than that of DPN and saturates when the sending interval reaches 2K cycles, whereas DPN is already saturated when the sending interval reaches 10K cycles. Compared with DPN, the average throughput of TDPN increases by 36.7%, and the average flit latency decreases by 31.2%.
The binary affair experiment shows that TDPN has better network performance. On the one hand, the diagonal access manner and the associated routing algorithm let a resource node switch between two routing nodes, which improves the success rate of link establishment and reduces flit latency without affecting network throughput. On the other hand, introducing the Torus structure reduces the network radius and network congestion and further improves the average throughput.

Figure 5. Comparisons of throughput in binary affair

Figure 6. Comparisons of latency in binary affair

4. Conclusion

In this paper, a high-performance dual-port network and its associated routing algorithm are designed and implemented. The synthesis report shows that it costs fewer hardware resources than DPN. More importantly, the TDPN has higher average throughput and lower average flit latency.

Acknowledgments

This work was supported by The National Natural Science Foundation of China (61204024, 61179036, 61106020) and Natural Science Foundation of Jiangsu Province of China (BK2011185).

References

[1] Benini, Luca, and G. De Micheli, Networks on chips: a new SoC paradigm, Computer 35.1, pp. 70-78, (2002).
[2] Dally, William J., and Towles, Brian, Route Packets, Not Wires: On-Chip Interconnection Networks, Design Automation Conference, Proceedings IEEE, pp. 684-689, (2004).
[3] Ouyang Yiming, Hu Chunlei, Liang Huaguo, Xie Tao, Fault-tolerant Architecture of NoC Based on Dual-port RNI, Journal of Computer Engineering, Vol.38, No.13, July (2012).
[4] Duoli Zhang, Shiyuan Li and Yukun Song, Design and implementation of Dual-port Network on Chip [C]// Solid-State and Integrated Circuit Technology (ICSICT), 2014 12th IEEE International Conference on. IEEE, (2014).
[5] Li Li, Wan Jian and Wan Jiawen, NoC Retrograde-turn Routing Algorithm Based on Packet-circuit Switching, Journal of Electronics & Information Technology, Vol.23, No.3, Mar (2011).
