Abstract-Round-robin and matrix arbiters are widely used in networks-on-chip (NoCs). Both mechanisms are implemented in this paper, and their performance in a 2D-mesh topology is tested on an FPGA platform. The resource consumption and throughput of the round-robin arbiter and the matrix arbiter are compared. The experimental results show that the matrix arbiter has higher throughput than the round-robin arbiter, whereas the round-robin arbiter uses far fewer resources than the matrix arbiter. A trade-off between the two mechanisms should therefore be considered when designing NoC arbiters.

Keywords-Network-on-chip, round-robin arbiter, matrix arbiter

I. INTRODUCTION

As the area and speed of a single chip now face great challenges, more and more processing elements are placed on a system-on-chip (SoC). Network-on-chip (NoC) is a new method of on-chip communication that addresses the problems facing the system-on-chip. The physical interconnection on a chip has become a primary factor limiting performance and power consumption. As the speed of crossbar switches increases rapidly, one major problem to resolve is how to implement a fast and fair arbiter that maximizes the switch throughput and timing performance of NoCs.

NoC has advantages over the traditional bus-based system-on-chip in architecture, performance, reusability and scalability. In an NoC implementation, each basic module needs to be designed carefully to guarantee low latency and high throughput. Among these basic modules, the flow control of the virtual channels plays an important role in alleviating packet congestion. The architecture and the flow control (such as Ack/Nack or credit) significantly affect the design of the NoC arbiter.

In [1], the behaviors of two popular arbiters, the round-robin arbiter and the matrix arbiter, are analyzed, and a High-Speed and Decentralized Round-robin Arbiter (HDRA) is presented. The authors analyze the area and critical path and compare the proposal with the round-robin arbiter. In this paper we compare the behavior of the matrix arbiter with that of the round-robin arbiter.

Fig. 1 shows the block diagram of an N x N switch. Each input port contains N virtual channels (VCs) to avoid head-of-line blocking. The task of the arbiter is to decide a set of contention-free connections between input and output ports. For high-performance switching, actions such as packet arrival, scheduling, switching and departure are performed in a pipelined way [5].

Fig. 1. The block diagram of an N x N switch

In addition to a crossbar switch fabric, the internal structure of an N x N network switch consists of VCs and arbiters (there may be additional hardware components, such as memory at the input port, in case VC overflows occur).

This paper is organized as follows. Section II introduces the round-robin arbiter and its implementation on FPGA. In Section III, the matrix arbiter and its gate-level architecture are described and implemented. Section IV analyzes the performance and cost of these arbiters. Finally, conclusions are drawn in Section V.
[...] round-robin, matrix-arbiter and other arbitration mechanisms. In this paper we consider only round-robin and matrix arbitration, and emulate them on an FPGA platform.

Fig. 2. General router architecture

Whenever a resource, such as a buffer, a channel or a switch port, is contended for by several requests, an arbiter is required to assign access to the resource to one request at a time. Consider, for example, an n-input arbiter that arbitrates the use of an input port among the virtual channels (VCs) connected to that input port. Each virtual channel that has a flit to send requests access to the input port by asserting its request. Assume there are 8 virtual channels, labeled VC1, VC2, ..., VC8. If VC1, VC3 and VC6 issue resource requests simultaneously and VC3 has higher priority than the other two, the resource is allocated to VC3; VC1 and VC6 wait and reissue their requests in the next clock cycle.

When many input ports request the same output or resource within a given period, the arbiter is in charge of resolving the priorities among the different request inputs. The arbiter releases the output port connected to the crossbar once the last flit of the packet has been transmitted, so that other waiting packets can obtain the output through arbitration. A round-robin arbiter operates on the principle that a request that was just served should have the lowest priority in the next round of arbitration. This can be accomplished by generating the next priority vector $p$ from the current grant vector $g$.

Figure 3 shows the gate architecture of the round-robin arbiter. The round-robin arbiter makes the last winning request the lowest priority for the next round of arbitration. When there are no requests, the priority is unchanged. If a resource is allocated in the current clock cycle, one bit $g_i$ of the grant vector goes high, which causes the corresponding bit $p_{i+1}$ of the priority vector to be '1' in the next clock period. The next request input therefore receives the highest priority in the next clock cycle, and the request which had [...]

Fig. 3. The architecture of the round-robin arbiter

In NoC design, when an arbiter with fixed priority is used, there is no bound on the waiting time of a request with lower priority. In the round-robin arbiter, the request input that has just obtained the resource receives the lowest priority, so the round-robin arbiter has strong fairness.

III. THE MECHANISM OF MATRIX-ARBITER

A matrix arbiter uses an N x N matrix and implements a least-recently-served priority scheme by maintaining a triangular array of state bits $w_{i,j}$ for all $i < j$. The element $w_{i,j}$ in row $i$ and column $j$ indicates that request $i$ takes priority over request $j$: if request input $i$ has higher priority than request input $j$, $w_{i,j}$ is set to 1; otherwise it is 0. Only the upper triangular portion of the matrix needs to be maintained, since $w_{i,j} = \neg w_{j,i}$ for $i \neq j$. When request $i$ has higher priority than all other requests, it obtains the resource, such as an output port or a virtual channel. Each time a request $i$ is granted, it clears all bits in its row, which means that every other request now has higher priority than request $i$, and sets all bits in its column, which gives it the lowest priority because it is now the most recently served. The priority between request $i$ and itself has no physical meaning, so the diagonal elements $w_{i,i}$ are marked by X in the matrix of Fig. 4.

$$
\begin{bmatrix}
X & w_{1,2} & w_{1,3} & w_{1,4} \\
w_{2,1} & X & w_{2,3} & w_{2,4} \\
w_{3,1} & w_{3,2} & X & w_{3,4} \\
w_{4,1} & w_{4,2} & w_{4,3} & X
\end{bmatrix}
$$

Fig. 4. The priority matrix
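The two update rules described above can be illustrated with a small behavioral model. This is a minimal Python sketch, not the paper's gate-level or FPGA implementation. The names `round_robin_arbitrate` and `MatrixArbiter` are hypothetical, inputs are indexed from 0, the one-hot priority vector $p$ is collapsed to a single index, and the initial matrix state (upper triangle set to 1) is an assumption.

```python
from typing import List, Optional, Tuple


def round_robin_arbitrate(requests: List[bool],
                          priority: int) -> Tuple[Optional[int], int]:
    """One round-robin arbitration step (behavioral sketch).

    `priority` is the index holding the highest priority in this cycle.
    Returns (granted index or None, priority index for the next cycle).
    """
    n = len(requests)
    for offset in range(n):
        i = (priority + offset) % n
        if requests[i]:
            # The winner becomes lowest priority: the input right after
            # it holds the highest priority in the next cycle.
            return i, (i + 1) % n
    # No requests: the priority is unchanged.
    return None, priority


class MatrixArbiter:
    """Least-recently-served arbiter; w[i][j] == 1 means i beats j."""

    def __init__(self, n: int) -> None:
        self.n = n
        # Assumed initial state: upper triangle 1, lower triangle 0,
        # so input 0 starts with the highest priority, then 1, 2, ...
        self.w = [[1 if i < j else 0 for j in range(n)] for i in range(n)]

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        grant = None
        for i in range(self.n):
            # Request i is disabled if any other requesting input j
            # currently has priority over it (w[j][i] == 1).
            beaten = any(requests[j] and self.w[j][i]
                         for j in range(self.n) if j != i)
            if requests[i] and not beaten:
                grant = i
                break
        if grant is not None:
            for j in range(self.n):
                if j != grant:
                    self.w[grant][j] = 0  # clear the winner's row
                    self.w[j][grant] = 1  # set the winner's column
        return grant
```

Under these assumptions, a four-input `MatrixArbiter` that sees requests only on its second and third inputs grants them alternately on successive calls, and `round_robin_arbitrate` leaves the priority index untouched when no request is asserted.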
Fig. 5. The state transition after arbitration

Figure 6 shows the four-input gate architecture of a matrix arbiter. In the figure, each block represents an S-R latch, and the state is maintained in the six S-R latches in the upper triangular portion of the matrix. Each of the dotted blocks in the lower triangular portion of the matrix represents the complementary output of the diagonally symmetric solid box.

Fig. 6. The gate architecture of the matrix arbiter

[...] the upper-triangle elements of the matrix arbiter are set to 1; by the complementary principle, the elements in the lower triangle should be 0. Request 1 then has the highest priority, followed by request 2, request 3, [...] acquires a response according to the priority matrix. When arbitrating, the priority matrix is updated according to the previous priority matrix and the current grant vector. Figure 7 shows the analysis chart: the input request vector Req is 4'b0110, and the grant vector gnt obtained from the priority matrix indicates that ports 2, 3, 2, 3 obtain the resource successively.

[Figure 7: sequence of priority matrices for Req = 4'b0110]

[...] permutation of inputs. If the initial state is invalid, the arbiter can easily deadlock. For example, if $w_{0,1} = w_{1,2} = 1$ and $w_{0,2} = 0$, then when inputs 0, 1 and 2 send requests simultaneously, every request is disabled by another one (0 disables 1, 1 disables 2, and 2 disables 0), so no grant is issued. The initial state must therefore be chosen carefully in the design.

IV. EMULATION RESULTS AND COMPARISONS

The round-robin arbiter and the matrix arbiter are emulated on an FPGA platform. We vary the number of request inputs, that is, the length of the request vector, and collect statistics on the resource utilization, maximum clock frequency and power consumption of the two arbitration mechanisms. When packets from the virtual channels of the inputs request the crossbar switch simultaneously, the number of request inputs of the arbiter increases. For example, with 5 inputs, each having 6 virtual channels, the arbiter needs 5 x 6 = 30 request inputs to resolve the concurrency conflicts when the packets from the virtual channels are transferred to the crossbar.

[Figure 8: resource consumption (slices) versus number of request inputs, for the matrix arbiter and the round-robin arbiter]

Figure 8 shows that the matrix arbiter and the round-robin arbiter consume similar resources when there are few requests, about 100 slices each. However, in a 3D NoC the number of VCs in a router can exceed 30. As the number of request inputs increases, the matrix arbiter consumes far more resources, whereas the round-robin arbiter does not. When the number of request inputs approaches 32, the [...]
frequency, approximately 547 MHz. In almost all cases, the maximum clock frequency of the matrix arbiter is far higher than that of the round-robin arbiter, which means that the matrix arbiter has higher throughput and faster computation. In particular, with 7 request inputs the former is 1.4 times the latter. The maximum clock frequency declines as the number of request inputs increases in both mechanisms. The matrix arbiter achieves high-speed computation at the cost of considerable resources and silicon area.

[Figure: maximum clock frequency versus number of request inputs]

[...] number of inputs increasing. The round-robin arbiter consumes less power than the matrix arbiter, and the difference is nearly 1 mW. In arbiter design, a trade-off should be made among resources or silicon area, maximum clock frequency and power consumption, and a suitable arbitration mechanism should be chosen accordingly.

[Figure: power consumption versus number of request inputs]

V. CONCLUSION

[...] process data more quickly. In terms of power consumption, the two arbiters are similar. In future research, we will analyze the queuing arbiter and the fixed-priority arbiter. In NoC design, these basic arbiters will be studied further in order to design more complex virtual-channel allocators.

References

[1]. Yun-Lung Lee, Jer Min Jou and Yen-Yu Chen. A High-Speed and Decentralized Arbiter Design for NoC[J], 350-353.
[2]. Gao Xiaopeng, Zhang Zhe, Long Xiang. Round Robin Arbiters for Virtual Channel Router, IMACS Multiconference on "Computational Engineering in Systems Applications", 1610-1614.
[3]. Li-Shiuan Peh, William J. Dally. A Delay Model and Speculative Architecture for Pipelined Routers[J], the 7th [...]
[...] Architecture for On-Chip Networks[J], Proceedings of the 33rd International Symposium on Computer Architecture (ISCA'06).
[S]. Robert Mullins, Andrew West and Simon Moore. Low-Latency Virtual-Channel Routers for On-Chip Networks[J]. Proceedings of the 31st Annual International Symposium on Computer Architecture. 1-10.
[9]. L. Benini and G. Micheli, "Networks on Chips: A New SoC [...]
[...] Codesign of Fast Parallel Round-Robin Arbiters", IEEE Transactions on Parallel and Distributed Systems, vol. 18, issue 1, pp. 84-95, Jan. 2007.