Professional Documents
Culture Documents
NI architectures with pipelined fashion between IPs and the propose a generic and modular architecture of NIs that support
router of NoC. These NIs allow system designers to send data many modes of OCP standard and can use Handshake 4phase
from IPs to NOC, and vice versa with low latency and jitter. or Credit-Based flow control for the mesh 2 D NoC and it can
We present how we can apply the decoupling between be used for other topologies. A NI design for Asynchrony NoC
computation and communications to achieve the IP modules is presented in [25]. In [26], authors present a Network
and interconnections to be designed independently from each Interface Sharing Techniques that can be used to optimize the
other. The proposed NI allows reducing the design time of new area of NoC Architectures. In [27], authors present an FPGA
systems and hide implementation details of the network. The implementation of a shared Network Interface architecture is
modularity of design is obtained through a separation between proposed to reduce area and power by sharing the buffering
data flows and between IP side and NoC side. The low latency resources and using the stoppable clock technique. In [29],
and minimal jitter between packets are obtained through the authors describe a Network Interface design which supports
separation between header and payload memories. The paper is serial-link packet based transmission model for network-on-
organized as follows. In Section 2, the related works are chip application. The weak point of this design is the highest
presented. In Section 3, an overview of NoC is given. In latency in injection and extraction path. Among the presented
Section 4 we describe and detail the architecture of the works, few approaches as the ones presented in [16-17] have
proposed NIs. Section 5 presents experimental results. Section proposed a generic architecture model which is used as a
6 presents a comparison with other works. Finally in section 7 template throughout the design process for accelerating design
we conclude the paper. cycle for SoC based bus. Only the works of [18-19-24] present
a modular design of NI for NoC. Many Other cited works
II. RELATED WORKS present a specific implementation for NI for NoC using
different protocols. This paper proposes a new generic
There are many works published on the design of novel architecture model of network interface. It addresses the
network architectures as presented by [14], but few problem of the integration of IP cores in such networks and
publications have addressed particular issues to the design of proposes how we can obtain a modular NI with low latency
Network Interface modules. In [15], the authors compared and jitter constraint. The proposed modular NI architecture
three schemes of paketization strategy such as software library, provides a very low Latency in the injection and extraction
on-core and off-core implementation, and related costs in terms path for MNIs and SNIs, which is much lower than a software
of latency and area are projected, showing tradeoffs in these stack implementation and other preview works. It reduces the
schemes. They insisted that a hardware wrapper jitter between two successive packets and allows NI to work
implementation has the lowest area overhead and latency. without blocking fashion. The NI can receive a new request by
Another approach is presented in [16], where the authors Master IP before the end of the transmission of the previous
propose a generic architecture model which is used as a packet. It also allows the reception of a new packet before the
template throughout the design process for accelerating the end of transaction with the slave IP. We present three different
design cycle. The key characteristics of this model are its great implementations each one uses a specific flow control. This
modularity and scalability which make it reusable for a large paper also presents the impact of the control flow mechanisms
class of applications. Its drawback is that is useful for SoCs on cost and performance of NIs. To prove our concept, we
based bus only. The authors, in [17], define a set of parameters evaluate the cost and performance of the proposed NI
for automatic generation of interface for multiprocessor SoC architectures by implementing our designs on ASIC. We use
integration based bus. However, they limited the embedded IP the industrial standard for IP cores AMBA AHB as processing
cores to CPUs (ARM7 and MC68000). The designs of wrapper elements and a wormhole [30-31-32-33] Network on Chip. It is
for the application of specific cores still lack generic aspects based on the mesh 2D topology and the use of a determinist
and only tackle restricted IP cores. An efficient on chip NI routing algorithm called XY or spidergone. We evaluate the
ASIC design proposed in [18], which offers guaranteed area, power, speed, latency, and Throughput of implementing
services, shared memory abstraction, and flexible network NI tasks that use the most three used flow controls in NoC.
configuration that implement three standard sockets compatible
with aethereal NoC. In [19], authors present a generic
architecture of network interface and associated wrappers for a III. PROPOSED NETWORK INTERFACE
networked processor to be used with mesh based NoC Separation between injection and extraction functions
architecture. In [20], a micro-network that is a generic, allows easy reuse of dual components in both master and slave
scalable interconnect architecture for system on chip called NIs, since injection corresponds to packet composition and
SPIN is presented. It gives to the system designer a simple transmission, while ejection corresponds to packet reception
view of a single shared address space and provides a variable and decoding. Shell and kernel separation through relatively
number of VCI compliant network interfaces for both initiators well-defined interfaces is really important for minimizing the
(masters) and targets (slaves). The authors, in [21], present the effort of supporting different sockets, while keeping a fixed
design of low latency network interface for mesh based NoC kernel structure and changing only the shell part. We have
compatible with AHB standard and its associated designed two types of NI for AHB based cores for our
implementation in Xilinx Virtex5 FPGA board. In [22], an network-on-chip, named Master Network interface (MNI)
OCP compliant NI for the Xpipes NoC was touched upon. The attached to master IP and Slave Network Interface (SNI)
NI has a low area but it supports only a single outstanding read attached to slaves IP. Each type of NI is additionally split in
or write transaction. An OCP compliant NIs for the mesh 2 D two sub modules, one for the request and one for the response
NoC was designed in [23]. These NIs have a low area and a data flow or channel (injection and extraction path).
low latency and they support only a burst precise mode
outstanding read and write transaction. The authors, in [24],
3
A. Master Network Interface Architecture Hresp signals to the appropriate value. The transactions with
There are two fundamental separations in the NI the network and the issuance of flits are managed by the kernel
architecture that enable this modularity: a horizontal one which output module. First, the transfer begins with the issuance of
distinguishes the injection path (request data flow) from the header flit. If FIFO header is not empty, the kernel output reads
extraction path (response data flow), and a vertical one which a header flit from header memory and transmits it to the local
distinguishes between the network-dependent and the network- port of the router and set BOP to high state.
independent (connected component) part. These two parts are Master Network Interface
Phillips AEthereal NI [18]. Separation between injection and Shell input Request transmit
corresponds to packet reception and decoding. Shell and kernel Payload Memory Output
Eop
Data
MUX
flexibility in the packet format that can be configured at
Hresp
Extraction Path
instantiation time. Since kernel deals with packet, while shell Hrdata Response Module Response Receive
not full and return Ack or credit signal to the local port of response memory. The last stage is where the response packet
router. The shell output manages end-to-end protocol is transmitted to the local port of the router.
interactions with master IP cores directly connected to the NI.
When the response memory is not empty, the Shell collects IV. EXPERIMENTAL RESULTS
data from response memory and provides it to the master IP
and sets hready and hresp signals to the appropriate value. In this section the synthesis results will be presented, and a
cost analysis of area and power consumption will be made
B. Slave Network Interface Architecture based on the synthesis results. The MNI’s performance and
SNI’s performance will be evaluated in terms of speed, latency,
The tasks of the SNI are to receive request packages from and throughput. We will present a comparative study of three
the network, decapsulate the request packages, transmit the different implementations for NI. On the IP side the three
request to the slave core, receive response from the slave core, implementations use AMBA AHB protocol. The first
encapsulate response and transmit response to the network. implementation of NI uses a Handshake 4 phases flow control.
Clearly, the proposed architecture of the slave network The second uses Handshake 2 phases and the third uses the
interface is built on two path. The extraction path performs the Credit-Based. Master and slave network interfaces with 32 bit
transformation of the request packet of our NoC to an AHB AHB data fields and 32 bit network ports have been modeled
request. The injection path performs the transformation of the with VHDL language on RTL level. They were simulated and
AHB response to a response packet to our NoC. Figure2 synthesized respectively by using the ModelSim tool and
illustrates the SNI internal architecture. The physical division Synopsys Design Vision tool. We synthesized these NIs using
of the interface is distributed in two parts: Shell and Kernel. cell based design with ST 0.13nm CMOS technology using the
The Shell part communicates with slave IP respecting the AHB High Speed (HS) library. The synthesis result of the MNI was
protocol. done with FIFO data and FIFO response having a depth of 4
Slave Network Interface words of 32 bits and the FIFO header has a depth of 2 words.
Kernel Core Interface Each used FIFO has an adjustable depth and width. The
synthesis result of the SNI was done with FIFO data and FIFO
Header Memory
Write Read
Data In Data Out
Full Empty response having a depth of 4 words of 32 bits and the FIFO
Req
Ack or Credit Kernel header has a depth of 2 words of 19 bits. The FIFO address has
a depth of 2 words of 12 bits. For master or slave network
Bop Payload Memory
Hwrite
Hburst
Full
three NI versions.
Hsize
Shell Hrdata
Hready
Hresp
Read
Eop
Data
Output
Data Out Data In
is smaller than the IP cores. An exploration of the
area/frequency trade off was performed for three NI
Empty Full
injection path and requests in the extraction path. The kernel 0.035
part is also divided into two parts called Kernel Input and
rea(mm2)
request packets. Four memories implemented as FIFO are used Credit Based
that uses Credit-Based control flow is the most reduced modes have approximately the same power. It should be noted
compared to the other modes. The MNI that uses the 2 phases that the power of the slave network interface is dominated by
control flow is the greatest compared to other modes. This is the power consumed by the four FIFOs. We conclude that the
due mainly to the fact that the number of states of the two sub- Credit-Based flow control is the best choice for NoC designer
modules Kernel input and Kernel output in Handshake modes to have a low power NI without decreasing the maximum
is higher than the number of states of the Credit-Based mode. operating frequency.
For example, the two other modes (Handshake 2 and 4 phases) HS
have approximately the same area. 15
2PH
HS 4PH
0.056 Credit Based
2PH
0.054 4PH
Credit Based 10
0.052
Power (mw)
0.05
m2)
0.048
Area(m
0.046 5
0.044
0.042
0.04
0
0 100 200 300 400 500 600 700 800 900
0.038
0 100 200 300 400 500 600 700 800 900 Frequency (MHz)
Frequency (MHz)
Figure 4. Slave Network interface area –frequency tradeoff. Figure 6. Slave Network interface Power –frequency trade off.
The area of SNI that uses Credit-Based mode has as an area V. COMPARATIVE STUDY
about 0.054mm2 at 833MHz. The area of SNI that uses 2
Phases and 4 phases Handshake modes has as areas In this section we compare our proposed NI with other
respectively 0.049mm2 and 0.051mm2 at 714MHz. For SNI, architectures. This comparison is presented in Table 1. An
The results show that the area occupied by the SNIs for Credit- exact comparison is complicated due to the fact that these
Based and Handshake 4 phase control flow has approximately architectures have been implemented with different
the same area. The 2 phases control flow has a little difference technologies and exhibit variations in their specifications and
in terms of area with other modes. It should be noted that a capabilities. Nevertheless, it will be noted that the design
large area of the slave network interface is occupied by the four presented exhibits the highest level of modularity and
FIFOs. We can conclude that the Credit-Based flow control is flexibility for supporting other standards.
the best choice for NoC designer to have a low area NI for TABLE I
Master Network Interface. COMPARATIVE STUDY
[29] [25] [22] [18] [19] This
work
B. Power of Network Interface Protocol NA OCP OCP OCP, specific AMBA
The power consumption results are from the Synopsys AXI, AHB
DTL
Design Vision (Power Compiler). An exploration of the
power/frequency trade off was performed for three NI technologies 0,13 0,13 0,13 0,13 90 0,13µm
µm µm µm µm Nm
implementations. By varying the target synthesis clock, Frequency MNI : MNI : NI: NI: MNI :
different power estimation results were reported. The results (Mhz) NA 725 1000 500 719 1000
presented in figure 5 show that the power consumption of the SNI: SNI: SNI:
1086 1000 833
MNI that uses Credit-Based control flow is the most reduced Area NI: MNI: MNI: NI: NI: MNI:
compared to the other modes. (mm2) 0.43 0.058 0.036 0.169 0.053 0.030
HS
SNI: SNI: SNI:
25 0.020 0.045 0.054
2PH
Power NA NA MNI: NA NI: MNI:
20 4PH
Credit Based
(mw) 33.5 15 9
SNI: SNI:
36.9 14.5
w)
15
er(m
(cycles) :6 [3,4]
P
10
SNI : SNI :
5
10 [3,3]
0
0 200 400 600 800 1000 1200 The proposed MNI and SNI implementations that use the
Frequency (MHz)
credit based flow control have the best performance compared
Figure 5. Master Network interface Power –frequency trade off.
to the handshake flow control implementations. For this reason
we will use this implementation to compare our NI with other
The MNI that uses Handshake 2 phases consumes more works reported in table 1. The maximum operating frequency
than other modes. The power estimation obtained by synthesis for our design is about 1111 MHz in 0.13 µm technology. The
with HS library of MNI that uses Credit-Based mode has as speed of our MNI outperform all others works. The speed of
power 8.63mW at 1GHz. The estimated power of MNI that our SNI is bigger than [18-1] and smaller than [22-25].The area
uses 2 and 4 phases Handshake modes has respectively of proposed MNI in Credit-Based mode is smaller than other
20.67mW and 19.07mW at 1.111GHz. For SNI, The three works. The area of the proposed SNI in Credit-Based mode is
6
approximately equal to the work of [19], smaller than [18-29], [12] Open Core Protocol Specification, Release 2.0, www.ocpip.org, OCP-
and finally bigger than [22-25]. The power results presented in IP Association, 2003.
[13] ARM, AMBA AHB Protocol Specification, version 2.0 ARM,1999.
table 4 show that the proposed NI has the most reduced value [14] ARM, AMBA AXI Protocol Specification, version 1.0, ARM, 2004.
compared to other works. We can conclude that our work [15] E. Salminen et al. "Survey of Network on Chip proposal,"White paper,
outperforms the presented other works in terms of power OCP IP, Mars 2008.
consumption. The result of latency of the NI of [29] is between [16] P. Bhojwani, and R. Mahapatra, "Interfacing cores with on chip packet
eight and ten cycles. The latency of [25] NI in injection and switched networks," In VLSI design conference 2003, pp. 382-387.
[17] A. Baghdadi, D. Lyonnard, N. Zergainoh, and A. A. Jerraya, "An
extraction path is respectively 4 and 6. In [22], the latency of efficient architecture model for systematic design of application-specific
MNI and SNI are respectively 6 and 10. The latency of multiprocessor soc," In DATE conference, 2001, pp.55-62.
AETERAL NI [18] is between 4 and 10 cycles. Finally, The [18] D. Lyonnard, S. Yoo, A. Baghdadi, A. A. Jerraya, "Automatic
Latency of [19] for SINGLE and BLOCK transmission in NI in generation of application-specific architectures for heterogeneous
the injection path are 4 and 5 cycles, respectively. Its latency in multiprocessor system-on-chip," In DAC 2001,pp. 518-523.
extraction path is 5 cycles. In this work, the latency in injection [19] A. Radulescu, J. Dielissen, K. Goossens, E. Rijpkema, and P. Wielage,
"An efficient on-chip network interface offering guaranteed services,
and extraction path is respectively 3 and 4 for MNI and 3 for shared-memory abstraction, and flexible network configuration," in
SNI.These results show that our NI outperforms all other Design, Automation , and Test in Europe conference, 2004, pp. 878-883.
architectures in terms of latency in injection and extraction path [20] E. L. Seung, H.B. Jun , S. Y. Yoon , and N. Bagherzadeh, "A Generic
for MNI and SNI. Network Interface Architecture for a Networked Processor Array
(NePA), " in ARCS 2008, LNCS 4934, pp. 247–260.
[21] A. Adriahantenaina, H. Charlery, A. Greiner, and L. Mortiez, "SPIN: A
VI. CONCLUSION scalable, packet switched, on-chip micro Network." in Design,
Automation, and Test in Europe conference, 2003, pp. 70-73.
In this paper, we present a new pipelined Network Interface [22] B. Attia, W. Chouchene, A. Zitouni, N. Abid, and R.Tourki, "Design
architecture whish implement the services provided by the and implementation of low latency network interface for Network on
protocol layer of OSI model at a low cost. The distributed Chip," In IEEE International Design and Test workshop, 2010, pp:37-
buffer structure implemented in proposed Network interfaces 42.
allow to reduce the jitter between two successive packets and [23] S. Stergiou, and al, "Xpipes Lite: a Synthesis Oriented Design Library
for Networks on Chips", in DAC, 2005, pp. 559-564.
allows NI to work without blocking fashion. The performance [24] B. Attia, A. Zitouni, and R. Tourki, "Design and implementation of
study and the cost analyze of three implementations using network interface compatible OCP For packet based NOC," in DTIS,
different flow control shows that the proposed architectures 2010, pp. 1-8.
allow to obtain a low cost and high speed. A comparative [25] B. Attia, A. Zitouni, N. Abid and R. Tourki, "A modular network
study has been conducted with other works .The obtained result interface adapter design for OCP compatibles NoCs," International
shows that the proposed NI outperforms other works in term of Journal of Computer and Network Security, IJCNS 2009, vol. 1, no. 2,
pp : 101-109.
latency and power. The long-term objective is to develop a tool [26] T. Bjerregaard, S. Mahadevan, R. Olsen, and J. Sparso, "An OCP
that automatically generates a specific application of NI which compliant network adapter for GALS-based SoC design using the
accepts as inputs the IP core interface specifications and NoC MANGO network-on-chip," In SoC,2005, pp.171-174.
parameters like (flow control, routing algorithm, flit width, [27] A. Ferrante, S. Medardoni, and D. Bertozzi, "Network Interface Sharing
queuing technique,..). Techniques for Area Optimized NoC Architectures," in 11th
EUROMICRO Conference on DSD methods and tools, 2008, pp. 10-17.
[28] B. Attia, W. Chouchene, A. Zitouni and R. Tourki, "Network Interface
REFERENCES Sharing for SoCs based NoC," in International Conference on
Communications, Computing and Control Applications ,2011,pp.1-6.
[1] J. Liang, S. Swaminathan, and R. Tessier, “ASOC : a scalable, single- [29] L. Fiorin, G. Palermo, and C. Silvano, "MPSoCs Run-Time Monitoring
chip communications architectures, ” in IEEE Int. Conf .on Parallel through Networks-on-Chip," in DATE, 2009, pp.558-561.
Architectures and Compilation Techniques, 2000, pp. 37-46. [30] Y. L. Lai, S. W. Yang, M. H. Sheu, Y. T. Hwang, H. Y. Tang, and P. Z.
[2] W. J. Dally, “Virtual-channel flow control,” IEEE Trans .on Parallel Huang, "A High-Speed Network Interface Design for Packet-Based
and Distributed Systems, vol. 3, pp. 194–205, Mar. 1992. NoC", In International conference on Communications, Circuits and
[3] A. Mello, L. Tedesco, N. Calazans, and F. Moraes," Virtual channels Systems, 2006, pp.2667-2671.
[4] in Network on Chips: Implementation and evaluation on Hermes NoC, [31] S. Felperin, P. Raghavan, and E. Upfal, "A theory of wormhole
” in 18th Symposium on Integrated Circuits and Systems Design, 2005, Routing," IEEE Transaction on Computer, Vol. 45, 1996, pp. 704-713.
pp. 178-183. [32] B. Attia, W. Chouchene, A. Zitouni, N. Abid, and R. Tourki, "A
[5] L. Benini, and G. D. Micheli, “ Network on chips: a new SoC Modular Router architecture design for network on chip," in 8th
Paradigm,” IEEE Computer, vol. 35,pp.70-78,Jan. 2002. International Multi-Conference on Systems, Signals and Devices, 2011,
[6] S. Kumar, A. Jantsch. J. P. Soininen, M. Forsell, M. Millberg, J. Oberg, pp. 1-6.
K. Tiensyrja, A. Hemani, "A Network on Chip Architecture and Design [33] E. Majdi, B. Attia, A. Zitouni, R. Tourki, S. Meftali,and J. L. Dekeyser,
Methodology, " in VLSI, 2002, pp. 117-124. "FERONOC: Flexible and Extensible Router Implementation For
[7] S. Yoo, G. Nicolescu, D. Lyonnard, A. Baghdadi, and A. Jeraya, diagonally mesh topology," accepted in DASIP, 2011.
"A Generic Wrapper Architecture for MultiProcessor SoC cosimulation [34] W. Chouchene, B. Attia, A. Zitouni, N. Abid, and R. Tourki, " A low
and Design", Int. Symposium on HW/SW Codesign, 2001, pp. 195-200. latency router architecture for QoS Network on Chip," accepted in
[8] M. T. Rose, “The Open Book: A Practical Perspective on OSI,” SETIT 2011.
Prentice Hall, 1990.
[9] K. Keutzer, A. R. Newton, J. M. Rabaey, and A. S. Vincentelli,
"System-level design: Orthogonalization of concerns and platform-
based design," IEEE Trans. on CAD of Integrated Circuits and Systems,
vol. 19, pp.1523-1543, Dec 2000.
[10] M. Sgroi, M. Sheets, A. Mihal, K. Keutzer, S. Malik, J. Rabaey, and A.
S. Vincentelli, "Addressing the system-on-a-chip interconnect woes
through communication-based design," In Design Automation
Conference , 2001, pp. 667-672.
[11] Virtual component interface standard - draft specification, v. 2.2.0.