I. INTRODUCTION
The advantages of Network-on-Chip (NoC) over traditional bus-based architectures have been reported in many studies.
The NoC architecture offers both scalability and
flexibility, so it can be organized to run homogeneous cores
in parallel to improve performance for specific purposes [1].
Such an approach is a suitable way to realize a
high-throughput computational system on FPGA.
Data encryption/decryption is a computational algorithm
often implemented in research for performance demonstration. The characteristics of a cryptographic algorithm affect the selection of the flit size for routing, the packet size in traffic
communication and the architecture of the Processing Element
(PE). Given the current demand for data protection,
a high-performance NoC specific to cryptography
must be analyzed.
Our work has realized a 5×5 2-D mesh with virtual cut-through (VCT) switching,
running 25 Data Encryption Standard (DES) computations in
parallel. The goal of this paper is to evaluate the throughput of
a high-workload NoC. The main contribution is the set of
performance verification results of MCNoC architectures for
parallel DES computation. Our results indicate that the proposed
work achieves a considerable speedup over previous works.
This paper is organized as follows: Section II describes
the related work of DES on other NoC systems. Section III
introduces the proposed architecture. Section IV describes
C. Flow Control
VCT delivers higher throughput as the load increases, because wormhole switching suffers from rapid resource saturation when packet blocking occurs [6]. Banerjee [7] shows
that VCT gave lower latencies at higher acceptance rates and
provided better performance than wormhole switching.
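The difference between the two switching schemes can be illustrated with a minimal sketch of the forwarding condition at a router. It assumes the textbook definitions: VCT lets a header advance only when the downstream buffer can hold the entire packet, whereas wormhole needs room for a single flit. The function name `can_forward` and its parameters are hypothetical and not taken from the proposed design.

```python
def can_forward(switching: str, free_flit_slots: int, packet_flits: int) -> bool:
    """Return True if a packet header may advance to the next router.

    Assumption (textbook model, not the paper's RTL):
      - "vct": the downstream buffer must have room for the whole packet,
        so a blocked packet collapses into a single router buffer.
      - "wormhole": room for one flit is enough, so a blocked packet
        stays spread over several routers and keeps their links busy.
    """
    if switching == "vct":
        return free_flit_slots >= packet_flits
    if switching == "wormhole":
        return free_flit_slots >= 1
    raise ValueError(f"unknown switching scheme: {switching}")


if __name__ == "__main__":
    # Hypothetical 4-flit packet facing a downstream buffer with 2 free slots.
    for scheme in ("vct", "wormhole"):
        print(scheme, can_forward(scheme, free_flit_slots=2, packet_flits=4))
```

Under this model a wormhole packet may start moving into a nearly full buffer and then stall across several links, which is the saturation effect described above.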
D. Architecture of PE
According to the structure of DES, the reasonable numbers
of iterations per PE are the divisors of 16, i.e. 1, 2, 4, 8 and 16 (with 16, one
PE completes one DES operation). A small PE that performs only
one iteration needs another 15 computations to complete one
DES operation, causing more packet routing in the network. By
contrast, a large PE that contains the full 16 iterations keeps a packet
inside the PE longer, so the network traffic allows more data to
fill up other PEs. Whether the fast reaction of a small PE improves
the throughput of the overall network is therefore a factor
to consider. This part of the testing is discussed in Section V.
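The relation between PE size and the number of PE passes can be sketched as follows. The sketch assumes that a PE implementing k DES iterations must be visited 16/k times per block; the name `pe_visits` is illustrative only.

```python
DES_ROUNDS = 16  # DES applies 16 rounds to each 64-bit block


def pe_visits(rounds_per_pe: int) -> int:
    """Number of PE passes needed to complete one DES operation."""
    if rounds_per_pe <= 0 or DES_ROUNDS % rounds_per_pe != 0:
        raise ValueError("rounds_per_pe must be a divisor of 16")
    return DES_ROUNDS // rounds_per_pe


if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16):
        visits = pe_visits(k)
        # Every pass after the first implies another trip through the network.
        print(f"{k:2d}-iterative PE: {visits:2d} passes, {visits - 1:2d} additional computations")
```

For the 1-iterative PE this gives 16 passes, i.e. the 15 additional computations mentioned above.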
[Figure: routing algorithms. (a) XY routing, (b) WF routing, (c) NF routing, (d) NL routing]
B. Simulation Results
1) Simulation Results of PE Size: Table I lists
the performance evaluation of the 1-, 8- and 16-iterative
PEs. The resulting slice utilization indicates that the 16-iterative PE
architecture fits the slice architecture better than the others.
In the throughput experiments, the benefit of the
short data processing period of a low-iterative PE does not
compensate for the loss of throughput caused by congestion.
The 1-iterative PE saturates the NoC quickly because the
routing time is much longer than the data processing time.
Consequently, more packets stay on links rather than in PEs,
resulting in congestion. When the insertion rate reaches 727Mbps,
packet congestion occurs in the 8-iterative design, and only
15.73% of the packets return to their original terminal tiles.
The processing time of one packet in Table II shows that the
1- and 8-iterative PEs process faster than the router, since they
implement only part of the DES computation. A 16-iterative PE
holds a packet longer, giving the router more chances
to service another packet, which further helps reduce traffic in the
network.
TABLE II: Processing Time of One Packet
PE Size          Processing Time (PE)    Processing Time (Router)
1 iteration      35ns                    80ns
8 iterations     75ns                    80ns
16 iterations    115ns                   90ns
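The comparison drawn from Table II can be checked with a short sketch. The timing values are those listed in Table II; the dictionary layout and the helper name `pe_outlasts_router` are assumptions made for illustration.

```python
# Processing time of one packet in nanoseconds, as listed in Table II.
TABLE_II = {
    1: {"pe_ns": 35, "router_ns": 80},
    8: {"pe_ns": 75, "router_ns": 80},
    16: {"pe_ns": 115, "router_ns": 90},
}


def pe_outlasts_router(iterations: int) -> bool:
    """True if the PE holds a packet longer than the router needs to route one."""
    entry = TABLE_II[iterations]
    return entry["pe_ns"] > entry["router_ns"]


if __name__ == "__main__":
    for iterations in sorted(TABLE_II):
        entry = TABLE_II[iterations]
        slack = entry["pe_ns"] - entry["router_ns"]
        print(f"{iterations:2d}-iterative PE: PE time minus router time = {slack:+d}ns")
```

Only the 16-iterative PE has a positive margin, which is the condition under which the router has time to service another packet while the PE is busy.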
PE Size in       Maximum        Slice Usage    Slice Usage
Tested MCNoC     Frequency      (Register)     (LUT)
1 iteration      229.764MHz     12%            15%
8 iterations     263.832MHz     13%            28%
16 iterations    263.832MHz     23%            24%
in MCNoC.
4) Simulation Results of Throughput: According to the comparison results described in the previous sections, the DES MCNoC using the WF routing algorithm has the best performance
of all. It has the highest packet insertion rate and the lowest
processing latency, owing to higher PE utilization and
lower traffic contention than the other algorithms. XY routing
has a higher packet insertion rate than NF and NL routing, but
gives the lowest throughput due to its vulnerability to network
congestion. Even so, XY shows very competitive
performance in a high-throughput design. All designs have maximum frequencies over 250MHz, and the throughputs, calculated
in gigabits per second, are listed in Table III.
Compared with the previous works listed in Table IV, the
proposed work is 6.17 times faster than [3], which is composed
of soft-core processors and pipeline technology, and 14.71 times
faster than [4], which is also a complicated design that applies
NePA and group pipelining.
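The speedup figures can be reproduced from the reported throughputs. The sketch below assumes that the comparison uses the best proposed result (WF routing, 5.65Gbps) and that 1Gbps = 1000Mbps; the function name `speedup` is illustrative.

```python
def speedup(proposed_mbps: float, baseline_mbps: float) -> float:
    """Ratio of the proposed throughput to a baseline throughput."""
    return proposed_mbps / baseline_mbps


WF_THROUGHPUT_MBPS = 5650.0                   # 5.65Gbps, WF routing
PREVIOUS_WORKS_MBPS = {"[3]": 915.0, "[4]": 384.0}

if __name__ == "__main__":
    for ref, mbps in PREVIOUS_WORKS_MBPS.items():
        print(f"speedup over {ref}: {speedup(WF_THROUGHPUT_MBPS, mbps):.2f}x")
```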
VI. CONCLUSIONS
The results show that a high-throughput DES computation
design can be achieved with low-cost switching, packet format
and routing algorithms in a 5×5 mesh-based MCNoC. Using a
large PE is area efficient on FPGA, and having a PE processing
time longer than the routing time is a key factor for PE architecture
Routing    Max. Freq.
XY         265MHz
WF         264MHz
NF         265MHz
NL         264MHz

Work    PE Arch.            Frequency    Throughput
[2]     MicroBlaze          100MHz       12.8Mbps
[3]     MicroBlaze          100MHz       915Mbps
[4]     NePA                100MHz       384Mbps
XY      16-iterative DES    250MHz       4.80Gbps
WF      16-iterative DES    250MHz       5.65Gbps
NF      16-iterative DES    250MHz       4.82Gbps
NL      16-iterative DES    250MHz       5.54Gbps