
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 28, NO. 5, MAY 2017 1403

Path Setup for Hybrid NoC Architectures Exploiting Flooding and Standby

Edoardo Fusella, Jose Flich, Senior Member, IEEE, and Alessandro Cilardo, Senior Member, IEEE

Abstract—Future many-core systems will require energy-efficient, high-throughput and low-latency communication architectures.
Silicon photonics appears today to be a promising solution towards these goals. The inability of photonic networks to perform in-flight
buffering and logic computation suggests the use of hybrid photonic-electronic architectures. In order to exploit the full potential of
photonics, it is essential to carefully design the path-setup architecture, which is a primary source of performance degradation and
power consumption. In this paper, we propose a new path-setup approach which can put allocated circuits in a stand-by state, rapidly
restoring them when needed. Path-setup messages are sent using a flooding routing strategy to enhance the possibility of finding free
optical paths. We compare the proposed approach with a commonly used path-setup strategy as well as some other alternatives
available. The results exhibit encouraging improvements in terms of both performance and energy consumption.

Index Terms—Silicon photonics, path-setup, hybrid photonic-electronic networks-on-chip, parallel architectures, on-chip interconnection
networks

• E. Fusella and A. Cilardo are with the Department of Electrical Engineering and Information Technologies, University of Naples Federico II, Naples 80125, Italy. E-mail: {edoardo.fusella, acilardo}@unina.it.
• J. Flich is with the Department of Computer Engineering (DISCA), Universitat Politecnica de Valencia, Valencia 46071, Spain. E-mail: jflich@disca.upv.es.

Manuscript received 15 Feb. 2016; revised 8 Oct. 2016; accepted 16 Oct. 2016. Date of publication 27 Oct. 2016; date of current version 12 Apr. 2017.
Recommended for acceptance by A. Benoit.
Digital Object Identifier no. 10.1109/TPDS.2016.2622265

1 INTRODUCTION

SINCE the last decade, the semiconductor industry has stopped aiming for solutions based on single, very powerful microprocessors, looking instead for parallel systems made of an increasing number of cores integrated on a single die. However, highly parallel chips have to face tight challenges in terms of energy and heat dissipation. Transistor power consumption does not scale with the integration density and performance, causing a serious increase in the overall chip power dissipation. In this context, the role of the on-chip communication infrastructure is critical, as it provides the required facilities to distribute the computation among the different cores, largely impacting the overall system performance and power consumption. Networks-on-chip (NoCs) are a prominent paradigm for organizing the communication in chip multiprocessors (CMPs) due to their scalable and modular nature, although they are still constrained by power dissipation, latency, and achievable bandwidth. In fact, the power due to the communication architecture is around 30 percent of the whole die power [1] and is still too high to meet the expected needs of future systems [2]. On the other hand, the latency and bandwidth offered by today's NoCs do not fully satisfy the needs of CMP applications, particularly for accessing on-chip memory [2].

Silicon photonics appears today to be a promising path to low-power ultra-high bandwidth on-chip communication. Nanophotonic waveguides, the photonic counterpart of a wire, can in fact achieve bandwidths in the order of terabits per second by exploiting wavelength division multiplexing (WDM) [3], while broadband electro-optic switches are capable of routing single-channel data rates up to 40 Gb/s [4]. In addition, photonic signaling is expected to consume less power than electrical interconnects. However, photonics is not suitable for implementing traditional buffered NoCs due to its inability to store and process data without an optical-electronic-optical conversion. A viable solution consists of combining electronic and photonic technologies to build a hybrid network made of two subnetworks: an electronic packet-switched network (ENoC) for handling control and short messages and a photonic circuit-switched network (ONoC) for larger messages or bursty traffic [5].

Circuit switching requires a path-setup protocol to allocate the required resources to send a message (or burst) optically. Obviously, the path-setup incurs an overhead in terms of performance and costs, possibly amortized by sending large data messages: the setup latency is higher than the transmission latency and the setup power consumption is much higher than the transmission power consumption. In addition, when two or more transmissions require the same resource, a conflict arises. Conflicts can be handled in two different ways: messages are either sent using the ENoC, or the requestor tries again to set up the path for the conflicted message. Sending a large message through the electronic network may however lead to a drastic degradation in terms of latency and power. As a consequence, reducing the number of large messages sent electronically is a must. Note that, once a path is set, there are no additional costs or consumption in the electronic network. On the other hand, keeping an optical path unused results in consuming power in the optical layer as well as increasing the probability of conflicts in path setup processes. This tradeoff raises the question of when and for
how long to allocate the resources under given power constraints.

To fully exploit the benefits inherent in photonics, one needs to design the control network as well as the path-setup protocol so as to minimize the path-setup overhead while maximizing the number of concurrent optical paths and reducing the probability of conflicts. In this paper, we propose a new path-setup protocol exploiting a flooding routing strategy and the possibility of putting allocated circuits in a stand-by state. Compared to traditional path-setup protocols, the main advantages are listed below:

• The flooding routing strategy provides high adaptiveness in routing, enhancing the possibility of finding free paths. In this way, we can reduce the number of conflicts that increase the latency and energy consumption.
• By putting allocated circuits in a stand-by state, the number of path-setup attempts is reduced. In this way, we can allocate an optical path without performing an entire path-setup phase, thus limiting the flooding of control messages in the network. This will further improve the energy efficiency.

We compare the proposed approach with a commonly used path-setup algorithm and some available alternatives. The results exhibit encouraging improvements in terms of both performance and energy consumption.

A few preliminary results of this work were reported in a previous conference publication [6]. In this paper, we present the full methodology and we consider a more detailed architecture description, improved optical loss and power estimation models, and a comprehensive experimental evaluation involving both synthetic traffic and real applications. In addition, a formal proof of the deadlock freedom of the proposed approach is presented. The paper is organized as follows: Section 2 discusses some related works in the technical literature. Section 3 introduces the target architecture and presents the adopted optical loss and energy models. Section 4 thoroughly describes the path-setup protocol, the routing algorithm, and the deadlock avoidance policy. Section 5 presents the experimental setup and the results achieved. Section 6 provides some final remarks.

2 RELATED WORK

2.1 Photonic Networks-on-Chip

While optical interconnects are widely used to provide communication facilities over long and medium distances [7], the recent advancements in nanophotonic technologies also make silicon photonics a promising approach for on-chip communication. Several works in recent years have analyzed the benefits and limits of optics versus electronics. In [8], a comparison between electrical wires and optical interconnects in terms of delay and power is presented. The results show that optics is advantageous for global signaling, while electrical wires are more power-effective for shorter link lengths. Based on the semiconductor technology roadmap, [9] investigates the requirements that photonic interconnects should meet in order to outperform traditional electronic interconnects. In [10], a study of the critical length beyond which optical interconnect pays off is presented. The results show that this length depends on the process technology scaling, e.g., in case of a 22 nm process, it is equal to one-tenth of the chip edge length. A comparison of different emerging technologies for on-chip interconnects is carried out in [11]. According to the authors, silicon photonics exhibits the lowest latency and the highest possible bandwidth in case of wavelength division multiplexing. In [12], the authors outline the opportunities and challenges of different emerging interconnect technologies, including nanophotonic communication. They evaluate the possibility of combining three-dimensional (3D) stacking technology and silicon photonics to exploit the advantages of both technologies.

In recent years, several research efforts have focused on proposing various solutions at the architectural level. [13] presents a high-bandwidth multiple-wavelength transmission scheme, which essentially exploits the ability to transmit parallel wavelength channels through a single microring resonator. Shacham et al. [5] propose a hybrid torus-based NoC made up of a photonic circuit-switched and an electronic packet-switched network. In [14], the Corona Multiple-Writer Single-Reader (MWSR) photonic ring is presented. Pan et al. [15] describe Firefly, a clustered architecture locally connected via an electronic medium and globally connected via an optical crossbar, partitioned into multiple logical crossbars arbitrated locally. Mo et al. [16] propose a hybrid optical-electronic mesh-based NoC exploiting hybrid optical-electronic routers able to perform selective transmission. In [17], the THOE architecture is proposed, a low-cost torus-based hybrid optical-electronic NoC that can be implemented with a lower number of waveguides and ring resonators compared to [5]. Bahirat and Pasricha [18] present METEOR, a complete communication architecture consisting of a reconfigurable electrical mesh coupled with an optical ring whose waveguides can be configured as a combination of Single-Writer Multiple-Readers (SWMR) and Multiple-Writers Multiple-Readers (MWMR). Beux et al. [19] introduce CHAMELEON, a Single-Writer Single-Reader optical network-on-chip able to exploit reconfigurability at either run time or compile time in order to open and close dedicated channels between cores. In [20], a hybrid electronic/photonic, hybrid-topology NoC, called H2ONoC, is presented. Thanks to the novel topology, H2ONoC achieves a better energy efficiency compared to torus-based architectures. To reduce the energy consumption and achieve a higher bandwidth, [21] presents a topology comparison between different architectures and introduces an improved topology for the solution presented in [5]. Finally, [22] presents PhoNoCMap, a computer-aided design tool addressing the design space exploration of optical NoC mapping solutions. PhoNoCMap automatically assigns application tasks to the nodes of a generic photonic NoC architecture such that either the worst-case insertion loss or the crosstalk noise is minimized.

2.2 Path-Setup

A few electronic NoC architectures employ circuit switching in order to provide guaranteed service levels. A representative example is the ÆTHEREAL [23] NoC, developed at Philips, that provides guaranteed throughput alongside best-effort service. A basic version of the path-setup protocol relies on four types of control messages: path-setup, path-ack, path-nack, and path-teardown. Basically, a path-setup
message is sent from the source of the communication. The aim of this message is to reach the destination node while properly setting the required resources along the path. In case of success, a path-ack message is generated and sent back to the source of the communication. On the contrary, when a path-setup message reaches a router and it is unable to take an output port due to conflicts with other allocated circuits, a path-nack is sent back to the source. When a path-ack reaches its destination, the end-to-end path is ready for the communication and hence the data message is sent. When the communication ends, the resources previously allocated are released by sending a path-teardown message.
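
The four-message handshake described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the per-port state names (`UNUSED`, `RESERVED`, `ALLOCATED`), the representation of a path as a list of `(router, output_port)` hops, and the function names are all assumptions made for illustration, and message transport, timing, and routing are deliberately left out.

```python
from enum import Enum


class PortState(Enum):
    """State of one router output port (names are illustrative assumptions)."""
    UNUSED = "unused"
    RESERVED = "reserved"
    ALLOCATED = "allocated"


def baseline_path_setup(path, port_state):
    """Run one path-setup attempt along `path`.

    `path` is a list of hypothetical (router, output_port) hops and
    `port_state` maps each hop to a PortState (missing hops are UNUSED).
    Returns "path-ack" on success and "path-nack" on a conflict.
    """
    reserved = []
    for hop in path:  # the path-setup message advances hop by hop
        if port_state.get(hop, PortState.UNUSED) is not PortState.UNUSED:
            # Conflict: the path-nack travels back and frees earlier hops.
            for done in reversed(reserved):
                port_state[done] = PortState.UNUSED
            return "path-nack"
        port_state[hop] = PortState.RESERVED
        reserved.append(hop)
    # Destination reached: the path-ack travels back and confirms each hop.
    for done in reversed(reserved):
        port_state[done] = PortState.ALLOCATED
    return "path-ack"


def path_teardown(path, port_state):
    """Release every hop of a circuit once the data transfer has ended."""
    for hop in path:
        port_state[hop] = PortState.UNUSED


ports = {}
circuit_a = [((0, 0), "E"), ((1, 0), "E"), ((2, 0), "N")]
circuit_b = [((1, 1), "S"), ((1, 0), "E")]  # shares port ((1, 0), "E") with A
print(baseline_path_setup(circuit_a, ports))  # path-ack
print(baseline_path_setup(circuit_b, ports))  # path-nack
path_teardown(circuit_a, ports)
print(baseline_path_setup(circuit_b, ports))  # path-ack
```

In this sketch the second request fails only because it needs an output port already held by the first circuit, and it succeeds once that circuit has been torn down, which is the retry behavior the text attributes to conflicts.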
The standard path-setup protocol is used, with some enhancements, in many hybrid electronic-photonic architectures [5], [16], [17], [24]. It is henceforth referred to as the baseline protocol. To deal with the overhead introduced by the path-setup as well as the E/O and O/E conversions, existing solutions often use a selective transmission policy, injecting in the optical network only messages with a size higher than a certain threshold. In addition, a few approaches [5], [17] exploit the symmetric property of optical paths in order to send back the path-ack message along the reserved optical path instead of using the electronic control network. This requires a non-negligible implementation cost since routers should be enhanced with additional optical components, i.e., microring resonators and waveguides [17]. In [5], the standard torus network is augmented with optical paths exploiting path multiplicity, so that the blocking probability is reduced with a consequent benefit in terms of path setup latency.

However, the congestion on the control layer is the main drawback of all these approaches. A few works [18], [25], [26] rely on an alternative ring-based path-setup network that is able to configure multiple optical switches simultaneously instead of sequentially. However, this benefit comes at a non-negligible implementation cost in terms of required waveguides and ring resonators. In addition, implementing two different optical networks, i.e., a control and a data network, introduces many waveguide crossings due to the planar nature of the topologies implementable on silicon. This consequently leads to a rise in crosstalk noise and power loss, which potentially constrains the network scalability and prevents the whole architecture from operating properly.

To the best of our knowledge, this is the first work investigating new path-setup protocols able to put allocated circuits in a stand-by state, rapidly restoring them when needed. In addition, to enhance the chance of finding free paths, path-setup messages rely on a flooding routing strategy. The key contributions in this paper thus are:

• a new path-setup protocol;
• a routing algorithm designed to take full advantage of the path-setup protocol;
• a formal proof of the deadlock freedom of the proposed routing algorithms for the path-setup protocol with standby;
• a comparison with a commonly used path-setup approach and a few other alternatives using both synthetic traffic and real applications.

3 ARCHITECTURE OVERVIEW

The communication architecture is made of two different layers: an electronic packet-switched network where short data and control messages are transmitted, and an optical circuit-switched network for large messages or bursty traffic. The nodes are organized in a mesh topology (see Footnote 1). The architecture is composed of $k_0 \times k_1$ tiles, each containing an electronic router and a photonic switch connected with an IP core and with the four neighboring tiles. In the following we will use $(x, y)$ with $0 \le x \le k_0 - 1$ and $0 \le y \le k_1 - 1$ to indicate a tile of the network.

Footnote 1. Notice that, although we rely on a mesh topology, the proposed approach can be extended to any other topology. In case of topologies requiring five-port optical routers, no changes are needed. Differently, in the other cases, the router data structures and the deadlock avoidance rules should be adapted.

Each core is connected to a network interface (NI) owning the necessary logic to perform selective transmission. Basically, when the head flit of a message is injected in the NI, it is buffered according to the message size: messages shorter than a predefined value are stored in a buffer and then directly sent via the ENoC, while larger or bursty messages use a different buffer directly connected to the Optical/Electronic/Optical (O/E/O) interface handling serialization, deserialization, and O/E and E/O conversion. Buffers are implemented as FIFO queues. When a large message is to be sent, the path-setup phase begins and, if successful, the message is sent via the ONoC.

Fig. 1. The two building blocks of the Hybrid NoC: (a) The electronic router. (b) The Cygnus photonic switch.

Fig. 1a shows the router block diagram. Since this router is designed to handle only short data and control messages, it does not require the support for large messages and, hence, it is optimized for latency and not for throughput. In addition, it holds the necessary logic to take path-setup decisions. The network is non-interfering, meaning that two virtual networks are used in order to prevent data packets from blocking control packets. This is essential since the setup procedure must be as fast as possible in order to prevent sending large messages through the ENoC. The two networks are identical from a topological point of view. Consequently, each electronic router is coupled with a photonic switch (PS) that is the basic building block of the ONoC.

Fig. 1b shows the five-port PS considered in our study, called Cygnus, first presented in [27]. As in the electronic router, there are five bidirectional ports: the four ports corresponding to the cardinal directions and the local port, which is connected to the O/E and E/O conversion modules. Cygnus is a non-blocking switch, meaning that it is always possible to use simultaneously two different pairs of
input-output ports. In addition, unlike a few previously proposed switches, it is not optimized for a certain routing algorithm. Furthermore, it implements straight default paths, i.e., paths taken by the signal when all the rings are in an OFF resonance state, thereby crossing a reduced number of microring resonators (MRs) in ON resonance with a consequent power gain. As explained in Section 4, these features are essential for our approach. Each Path-Setup Unit (PSU) controls the MRs in the same tile by injecting electrical carriers into the devices, exploiting the free-carrier dispersion effect [28], in order to ensure that all the microring resonators are always set properly for a data transmission. A detailed description of the path-setup protocols and of the PSU is given in Section 4.

3.1 O/E/O Interface

The O/E/O interface is made up of several components: serializer, laser source, and driver circuit in the generation stage as well as photodetector, transimpedance amplifier (TIA), limiting amplifier (LA), and deserializer in the reception stage. The generation stage handles the data transition from the electronic domain to the optical domain. The data is first serialized. Then, the driver circuit controls the optical modulator device which translates the electrical signal into an optical signal through a laser source by modulating the light intensity according to the data bit values. Conversely, the reception stage is responsible for bringing back the optical signal into the electrical domain. The photodetector turns the light waves into an electrical current that is converted into electrical voltage by the TIA. Then, the LA brings the electrical voltage to the proper logic level and, last, the digital signal goes through the deserializer stage.

Serialization/deserialization circuits are key for handling the electronic/photonic clock mismatch. For the simulation model of the serializer and deserializer, we rely on the design presented in [29] due to its low power consumption. Concerning the light source, we choose a vertical cavity surface emitting laser (VCSEL) [30] since it can be directly modulated by the driving current without the need for an ad-hoc optical modulator device and requires reduced power consumption. For the photodetector, we consider a low-voltage germanium photodetector with a sensitivity of -14.2 dBm at a $10^{-12}$ bit error rate and a data rate of 10 Gbps, as demonstrated in [31]. This device provides a good tradeoff between its sensitivity and the supported data rate. Note that the photodetector sensitivity and the bit error rate (BER) are highly interrelated: the receiver BER for different optical signal power levels can vary by several orders of magnitude. In this paper, we neglect such data integrity aspects as they are out of the scope of the work. Concerning the TIA-LA circuits, we rely on the design presented in [32] since it achieves a lower power consumption than other proposals.

3.2 Optical Loss and Bandwidth Model

Electronic networks make use of several distinct wires to enhance the parallelism of a communication. Differently, by exploiting Wavelength-Division Multiplexing, each wavelength in an optical communication can carry different data in a single waveguide. In order to achieve this benefit, we need to properly deal with insertion loss (IL), which affects photonic signals as they propagate through a waveguide: the optical signal power must be higher than the photodetector sensitivity plus the worst-case insertion loss to ensure a proper detection in the reception stage. In addition, the total power cannot exceed a certain threshold due to the nonlinearities of the silicon material. In case of WDM signals, these considerations yield the following inequality [21]

$$P - S \geq IL_{wc} + 10 \log_{10} N, \tag{1}$$

where $P$ is the maximum signal power allowed to be injected considering a threshold for nonlinearities, $S$ is the photodetector sensitivity and $N$ is the total number of wavelengths.

The worst-case insertion loss is given by Equation (2)

$$IL_{wc}^{dB} = IL_{mod}^{dB} + IL_{detect}^{dB} + IL_{coup}^{dB} + IL_{prop}^{dB} + IL_{cross}^{dB} + IL_{bend}^{dB} + IL_{drop}^{dB} + IL_{pass}^{dB}, \tag{2}$$

with $IL_{mod}^{dB}$, $IL_{detect}^{dB}$, $IL_{coup}^{dB}$, $IL_{prop}^{dB}$, $IL_{cross}^{dB}$, $IL_{bend}^{dB}$, $IL_{drop}^{dB}$, $IL_{pass}^{dB}$ being respectively the loss due to electro-optic modulators, photodetectors, couplers, waveguide propagation, waveguide crossings, waveguide bends, ring dropping, and ring passing in the worst-case scenario. Table 1 shows the unitary values of insertion loss.

TABLE 1
Optical Loss Parameters

Parameter                     Value          Ref.
Modulator                     0.5 dB         [33]
Photodetector                 0.1 dB         [33]
Coupler                       1 dB           [33]
Propagation Loss in Silicon   1.5 dB/cm      [34]
Waveguide Crossing            0.05 dB        [35]
Waveguide Bend                0.005 dB/90°   [34]
Dropping into a Ring          0.5 dB         [28]
Passing by a Ring             0.005 dB       [28]

Once the maximum number of exploitable wavelengths is defined, it is possible to evaluate the maximum number of wavelength channels and hence the bandwidth as

$$B_w = clk \times N, \tag{3}$$

where $clk$ is the optical clock speed. Note that the network scalability, including the maximum number of wavelength channels, may be limited by electro-magnetic effects such as the crosstalk noise [36]. In addition, microrings need to provide multiwavelength routing capabilities and hence they must be designed with a small free spectral range (FSR), i.e., the spacing between different wavelengths that resonate with the ring. However, the FSR is inversely proportional to the circumference of the ring and hence the minimum achievable FSR is bounded by physical constraints such as the maximum ring diameter. This paper neglects such electro-magnetic effects as well as the physical constraints as it mainly focuses on the design of the electronic path-setup architecture.

3.3 Energy Models

In order to evaluate the energy consumption of the whole network, we need to consider the consumption of both the ENoC and the ONoC. Concerning the electronic power consumption, we rely on the method presented in [37]
previously used in a similar work [5]. The method is based on the assumption that the following operations are required at each hop:

• reading from the buffer;
• taking routing and arbitration decisions;
• crossing the inner switch;
• going through the link;
• writing to a buffer.

As a consequence, the energy consumed by sending a message end-to-end is equal to the sum of these energy values times the number of hops:

$$E_{message} = E_{hop} \times N_{hops}, \tag{4}$$

where $N_{hops}$ is the number of hops taken by a message and $E_{hop}$ is the energy necessary to cross a single hop:

$$E_{hop} = (E_{link} \times d_{link}) + E_{buffer} + E_{crossbar} + E_{static}, \tag{5}$$

where $d_{link}$ is the link length between two adjacent routers. Table 2 reports the values of the energy per bit consumed for these operations ($E_{buffer}$ is the sum of the components due to reading and writing, while the energy due to routing and arbitration is neglected) as evaluated in [5].

TABLE 2
Power Consumption for Electronic Signaling

Parameter    Value
E_link       0.34 pJ/mm/bit
E_buffer     0.12 pJ/bit
E_crossbar   0.36 pJ/bit
E_static     0.35 pJ/bit

Differently, when sending a message in the photonic network, the energy is consumed in the O/E/O interface and in the optical network components along the path. Photonic signaling benefits from the two properties of bit-rate transparency and low loss in optical waveguides, meaning that the energy consumption in the optical network components, i.e., waveguides and microring resonators, is independent of the bitrate and the distance between the two end-points. The power consumption of a microring resonator depends on its state: in the OFF state the power is negligible, while in the ON state it consumes around 10 mW when designed for switching multiwavelength broadband signals [38]. Consequently, the energy that a message consumes depends on how long the microrings are in the ON state regardless of whether they are used or not.

On the other hand, the O/E/O interfaces are made up of several power-hungry components: serializer, deserializer, driver, VCSEL, photodetector, and TIA-LA circuits. The power consumptions of the serializer and deserializer are respectively 2 and 1.6 mW in 90-nm CMOS processes [29]. The power consumptions of the VCSEL driver and TIA-LA circuits are respectively 2 and 6 mW in 80-nm CMOS processes [32]. Since we target a 32-nm process technology, the above values are scaled linearly to 32 nm. We neglect the photodetector power consumption since it is some orders of magnitude lower. Finally, we evaluate the power consumption of the VCSEL as follows. As in [17], we assume the laser source to generate just enough power to ensure a proper detection of the optical signal at the photodetectors. As a consequence, when generating a new optical signal, we need to evaluate the power loss the signal is subject to when traveling to the destination. In this way, we can avoid considering the worst-case insertion loss for every communication. Table 3 reports the power consumption values for these operations, including the VCSEL power consumption for four possible loss values assuming 30 percent laser efficiency and receivers with a sensitivity of -14.2 dBm. Notice that our approach is independent of the used power consumption values and optical loss coefficients, which are only indicative and may be influenced by technology and architectural variables.

TABLE 3
Power Consumption for Photonic Signaling

Parameter       Value
MR_ON           10 mW
serializer      0.71 mW
deserializer    0.57 mW
driver          0.8 mW
TIA-LA          2.4 mW
VCSEL @ 4 dB    0.32 mW
VCSEL @ 8 dB    0.80 mW
VCSEL @ 12 dB   2 mW
VCSEL @ 16 dB   5.04 mW

4 PROPOSED PATH-SETUP FOR HYBRID NOCS

4.1 Minimal Flooding Routing

The path-setup protocol is responsible for allocating the required resources, guaranteeing that a required optical communication can take place. A good path-setup protocol should reduce the number of conflicts, which slow down the path-setup phase and result in a higher overhead in terms of performance and power consumption: for every conflict, the path-setup procedure is interrupted and iterated until the destination is reachable and the data can be transferred. As a consequence, fewer conflicts result in a lower average number of path-setup attempts per communication.

The baseline protocol described in Section 2.2 relies on the XY dimension-order scheme for routing control messages. However, the four turns allowed by the XY solution do not permit any adaptiveness in routing, potentially leading to poor performance. Differently, the proposed approach implements a minimal path flooding routing algorithm to route path-setup messages. With the minimal path flooding, each router checks if it is the target router. If not, the path-setup message is sent to the neighbors that are closer to the target router. Each router repeats the process or, in case of a conflict, generates a path-nack message.

Flooding leads to exploring paths otherwise inaccessible using XY routing. However, it requires message replication and may potentially increase the power consumption as well as the network congestion, perturbing the ENoC performance. On the other hand, there are more possibilities to find a free path, thus avoiding multiple path-setup attempts. In other words, the differences in the routing approach introduce a tradeoff between the breadth and depth of the path exploration process and the number of times the exploration
is performed. At the same time, in the photonic layer, the latency is independent of the distance between the two end-points, while PSs with straight default paths make the energy consumption depend only on the number of turns. Hence, optical paths with fewer turns are preferred.
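
As a rough illustration of the routing idea in this section, the following Python sketch enumerates the minimal paths that a flooded path-setup could explore on a 2D mesh and compares them with the single XY route. It is only a sketch under stated assumptions: tiles are `(x, y)` tuples, `busy_links` is a hypothetical set of directed links already held by other circuits, a dropped branch stands in for a path-nack, and picking the free path with the fewest turns is one possible way to apply the preference stated above. The recursive enumeration is exponential and meant for small examples only.

```python
def minimal_next_hops(cur, dst):
    """Neighbors of tile `cur` that are one hop closer to `dst` in a 2D mesh."""
    (x, y), (dx, dy) = cur, dst
    hops = []
    if x != dx:
        hops.append((x + (1 if dx > x else -1), y))
    if y != dy:
        hops.append((x, y + (1 if dy > y else -1)))
    return hops


def flood_minimal_paths(src, dst, busy_links=frozenset()):
    """Return all minimal paths from `src` to `dst` that avoid busy links.

    A branch that meets a busy link is dropped, standing in for the
    path-nack generated at the conflicting router.
    """
    if src == dst:
        return [[src]]
    paths = []
    for nxt in minimal_next_hops(src, dst):
        if (src, nxt) in busy_links:
            continue
        for tail in flood_minimal_paths(nxt, dst, busy_links):
            paths.append([src] + tail)
    return paths


def count_turns(path):
    """Number of direction changes along a path of (x, y) tiles."""
    steps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:])]
    return sum(1 for s, t in zip(steps, steps[1:]) if s != t)


def xy_path(src, dst):
    """Dimension-order route: move along x first, then along y."""
    path = [src]
    x, y = src
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


busy = {((1, 0), (2, 0))}  # hypothetical link already used by another circuit
xy = xy_path((0, 0), (2, 1))
xy_blocked = any(link in busy for link in zip(xy, xy[1:]))
free = flood_minimal_paths((0, 0), (2, 1), busy)
best = min(free, key=count_turns)
print(xy_blocked)  # True
print(len(free))   # 2
print(best)        # [(0, 0), (0, 1), (1, 1), (2, 1)]
```

In this example the XY route is blocked by the busy link, while flooding still reaches the destination along two other minimal paths, and the one with a single turn is selected.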

4.2 Path-Setup Protocol with Standby


As previously described, using a flooding routing algorithm
does not only improve the chances of finding free paths but
it also increases the number of messages in the control net-
work and hence the possibility of congestion. The proposed
path-setup protocol with standby allows using flooding for
routing path-setup messages, and yet it reduces the number
of messages in the network by decreasing the probability of
conflicts in the routers.
The idea is to exploit the deterministic nature of burst
communication, where a source node communicates with a
destination node for a while before switching to a new desti-
nation. As soon as an optical communication finishes, we
put the path in standby instead of tearing it down. In a
standby path, the ring resonators are in OFF mode but the
path is reserved for a further use without the need to perform again costly flooding operations.

Implementing the proposed protocol requires two new messages, i.e., path-standby and path-wakeup, in addition to the four messages of the baseline protocol (path-setup, path-ack, path-nack, and path-teardown). Note that these control messages contain only the destination address and three bits specifying the message type.

In order to implement a distributed protocol, we added in each router a Path-Setup Unit (PSU) whose functions are:

• configuring the switching functions of the corresponding PS;
• keeping track of the PS state;
• managing path-setup conflicts;
• implementing the routing policy for control messages.

Each PSU owns an internal PS Connectivity Table (PCT) and a Counter Array (CA) whose structures are shown respectively in Figs. 2a and 2b. The PCT contains for each output port the fields State, Input Port, Source Node, and k fields Destination, where k is the maximum number of paths going through this port which can be simultaneously in the standby state. Note that the maximum number of standby paths in the network depends on application communication patterns. Assuming standby paths uniformly distributed over the source nodes, k may be evaluated as $\left\lceil \frac{\sum_{i=0}^{N_{src}-1} N_{dst_i}}{N_{src}} \right\rceil$, where $N_{src}$ is the total number of source nodes and $N_{dst_i}$ is the number of destination nodes involved in an optical communication with the ith source node. The size of these data structures is constant for the NoC design, regardless of the network size, guaranteeing scalability. In case of different topologies requiring high-radix routers, the size of these structures scales linearly with the radix of the routers.

Fig. 2. The data structures located in the Path-Setup Unit for the basic path-setup protocol. (a) The PS Connectivity Table. (b) The Counter Array. (c) The finite-state transition diagram for the state of output ports of PSs.

The PS output ports are in one of five states: Unused, Reserved, Allocated, Standby, and Woken up. Fig. 2c shows the finite-state transition diagram for the proposed path-setup protocol. At the beginning, the state is set to Unused, meaning that the port is free for further reservations. When a path-setup message arrives, the state changes to Reserved and the input port from which the message arrived is stored in the associated entry. This information is required to guarantee that path-ack/path-nack/path-teardown messages are able to reach the source of the communication in the backward direction. In the Reserved state, when a path-nack message arrives, the state goes back to Unused. Otherwise, in case of path-ack, the state changes to Allocated. In addition, when a path-ack message reaches a router, the IDs of source and destination nodes are stored in the corresponding fields. This is essential for keeping track of the optical paths in standby state. The Destination node fields are managed with a Least Recently Used policy keeping track of the recently used optical connections. An output port in the Allocated state implies that the necessary ring resonators are set to establish an optical path between the input and output ports. Hence, the PS is ready to be used and, as a consequence, it is consuming energy. As soon as the optical communication ends, the source sends a path-standby message to set the state to Standby instead of tearing the path down. A Standby state means that the ring resonators are in OFF mode but the path is reserved for a further use. The source node that plans to use a path on standby must send a path-wakeup message and wait for a path-ack message. In case it receives a path-nack message, it retries by sending a new path-wakeup. Path-nack messages are generated in case of conflicts since it is possible that another source is trying to tear down the standby optical path. A new flooding is performed only after a path-teardown message is received or in case it is necessary to communicate with a new destination.

Paths in Standby state can be used only by the source node that allocated them for the first time. On the other hand, in order to avoid source nodes keeping unutilized or standby paths, which prevent other sources from establishing optical circuits, a path-setup, not able to reach its destination, is allowed to tear down standby circuits allocated by other sources. The path-teardown crosses the network up to the source node of
the standby circuit, freeing the PCT entry corresponding to that path. Since the path-teardown message could be sent from any node along the path and not from the destination node, it is possible that some part of the circuit remains in the Standby state. In this case, the junk piece of circuit will never be used again and will be torn down when necessary.
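
The output-port state handling and the sizing of the k Destination fields described above can be sketched as follows. This is only an illustration under stated assumptions: the transitions marked "assumed" are not spelled out in the text and are a guess at what Fig. 2c depicts, message/state pairs not listed leave the state unchanged as a simplification, and the names `State`, `TRANSITIONS`, `step`, and `standby_slots` are hypothetical.

```python
from enum import Enum, auto
from typing import Dict, List, Tuple


class State(Enum):
    UNUSED = auto()
    RESERVED = auto()
    ALLOCATED = auto()
    STANDBY = auto()
    WOKEN_UP = auto()


# (current state, incoming message) -> next state of a PS output port.
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.UNUSED, "path-setup"): State.RESERVED,
    (State.RESERVED, "path-nack"): State.UNUSED,
    (State.RESERVED, "path-ack"): State.ALLOCATED,
    (State.ALLOCATED, "path-standby"): State.STANDBY,
    (State.ALLOCATED, "path-teardown"): State.UNUSED,  # assumed (baseline teardown)
    (State.STANDBY, "path-wakeup"): State.WOKEN_UP,    # assumed
    (State.STANDBY, "path-teardown"): State.UNUSED,
    (State.WOKEN_UP, "path-ack"): State.ALLOCATED,     # assumed
    (State.WOKEN_UP, "path-nack"): State.STANDBY,      # assumed
}


def step(state: State, message: str) -> State:
    """Apply one control message; unlisted pairs keep the state (simplification)."""
    return TRANSITIONS.get((state, message), state)


def standby_slots(n_dst: List[int]) -> int:
    """k = ceil(sum_i N_dst_i / N_src), with one list entry per source node."""
    total, n_src = sum(n_dst), len(n_dst)
    return (total + n_src - 1) // n_src  # integer ceiling


# Setup, data transfer, standby, and a later wake-up of the same path.
state = State.UNUSED
for msg in ["path-setup", "path-ack", "path-standby", "path-wakeup", "path-ack"]:
    state = step(state, msg)

k = standby_slots([3, 5, 7, 4])  # four sources with 3, 5, 7, and 4 destinations
```

With the example destination counts, the average is 4.75, so five Destination fields per output port would be provisioned.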
Conflicts arise when path-setup or path-wakeup messages are
not able to reach the next hop since all the required output
ports are Reserved, Allocated, or Woken up and belong to a dif-
ferent optical path. As a consequence, path-teardown mes-
sages will release the resources corresponding to the path in
the Standby state, if any, in order to cross the output port
where the conflict arises. Otherwise, we can either retry the setup process or send the message via the ENoC. In our solution we adopted a mixed policy: we try to set up the optical path for a certain amount of time, after which the message is sent electronically. In this way, the time that a message can
stay in the queue is bounded, preventing head-of-line block-
ing yet avoiding messages being sent via the ENoC in case of
sporadic conflicts. In case of concurrent path-wakeup and
path-teardown messages related to the same circuit, the path-
wakeup will reach a router where the circuit has already been
torn down. At this time, a path-nack message is generated
and sent back to the source node. Then, a new flooding will
occur, since, in the meanwhile, the path-teardown message will have reached the source node.

Fig. 3. A timing diagram showing an example of possible path-setup scenarios.
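
The mixed policy described above (retry the optical setup for a bounded time, then fall back to the ENoC) can be sketched as below. The cycle budget, the per-attempt cost, and the names `send_burst` and `try_optical_setup` are assumptions made for illustration; the text does not give the actual timeout value.

```python
from typing import Callable


def send_burst(try_optical_setup: Callable[[], bool],
               setup_budget_cycles: int,
               attempt_cost_cycles: int) -> str:
    """Retry the optical path setup until the cycle budget is spent,
    then fall back to sending the message over the electronic network."""
    elapsed = 0
    while elapsed + attempt_cost_cycles <= setup_budget_cycles:
        elapsed += attempt_cost_cycles
        if try_optical_setup():
            return "optical"
    return "electronic"


# Hypothetical outcomes: two conflicts followed by a successful setup.
outcomes = iter([False, False, True])
with_room = send_burst(lambda: next(outcomes), 300, 100)

# Same conflicts, but the budget only allows two attempts.
outcomes = iter([False, False, True])
too_tight = send_burst(lambda: next(outcomes), 200, 100)
```

Bounding the number of attempts keeps the queueing time of a message finite, while a sporadic conflict still leaves room for a second try before the ENoC is used.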
Since we rely on a flooding routing algorithm, it is possible that a path-setup message needs to be routed to more than a single output port. Assuming that the path-setup message is sent through k output ports with k > 1, two resulting scenarios are possible: the router will receive k path-nack messages or one path-ack and k − 1 path-nack messages. In both cases, we need a synchronization mechanism in order to avoid leaving junk messages in the network: for each input port, a counter (Fig. 2b) keeps track of how many messages coming from that input port were sent. When a path-ack or path-nack message arrives, the corresponding counter is decremented. Only when the counter is equal to 0, the path-ack or path-nack message is sent back to the previous router along the path. In this way, only one message arrives to the source NI and there are no junk messages left in the network.

Fig. 3 shows a few examples of possible path-setup scenarios where the Source Node 0 tries to set up a path to the Destination Node 0. Fig. 3a illustrates, first, a failed setup since the required output port in Router 2 is allocated by another source node, followed by a successful setup. At the end of the optical data transmission, the path is put in standby. Last, the path is woken up for a further optical communication. Differently, in Fig. 3b, the path-setup message is not able to reach the destination node since a required output port is in a standby state belonging to another optical path. As a consequence, a path-teardown message is generated and sent to the source of the other optical path to release the resources in standby. Fig. 3c shows an example of a failed wake-up of a standby path. Source 0 owns an optical path in standby to reach Destination 0 through Routers 0-1-2. However, Source 1 requires tearing down the optical path in order to reach another destination. Since the teardown message has not yet reached Source 0, Source 0 sends a wake-up message. As soon as the wake-up message reaches Router 2, it finds the output port reserved by another source node and hence a path-nack message is generated and sent back. At the next path-setup attempt, a path-setup message will be generated, since in the meanwhile the standby resources will have been torn down.

4.3 Deadlock Freedom

Deadlock is caused by packets waiting on each other in a cycle. In order to ensure deadlock freedom, the routing algorithm can be designed so as to prohibit just enough turns to break all of the cycles in the network. Basically, a turn involves a 90-degree change in the traveling direction. In 2D meshes there are eight types of turns. A turn is called an NE turn if it requires a change of direction from North to East. Similar definitions apply to the other turns, i.e., NW, EN, ES, SE, SW, WN, WS. The drawback is that prohibiting some turns reduces the adaptiveness of the algorithm. In order to overcome this limitation, virtual channels can be used [39]: physical channels may be split into virtual channels, i.e., abstractions that share the same physical channel, although each virtual channel has its own queue. Basically, virtual channels decouple the allocation of buffers from the allocation of channels by providing multiple buffers for each physical channel. The cost of implementing virtual channels is small. Each virtual channel requires its own queue, but the queue size can be as small as a single flit. A prohibited turn can be allowed as long as we change the virtual channel, effectively breaking potential dependency cycles. In a mesh topology, in case of an arbitrary routing strategy, the number of forbidden turns in the worst-case path grows linearly with the network size. Obviously, it is not possible to change virtual channel for every invalid turn. As a consequence, if a path-setup message cannot cross a router due to a forbidden turn and the lack of available virtual channels, then a path-nack message is generated and sent back.
Likewise, when a path-wakeup message is blocked due to a conflict, a path-nack message is generated and sent back. Similar considerations apply for path-ack messages. Note that sending a message in the backward direction introduces 180-degree turns, further exacerbating the deadlock issue.
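
The idea of crossing a prohibited turn by changing virtual channel, and of replying with a path-nack when no further channel is available, can be sketched as follows. The sketch assumes that channels are taken in increasing ID order and that the set of forbidden turns is supplied by the caller; the set used in the example is only illustrative, and the names `turn_name` and `cross_router` are hypothetical.

```python
from typing import Callable, Optional, Tuple

OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def turn_name(heading_in: str, heading_out: str) -> Optional[str]:
    """'NE' for a North-to-East turn, 'U' for a 180-degree turn, None if straight."""
    if heading_in == heading_out:
        return None
    if OPPOSITE[heading_in] == heading_out:
        return "U"
    return heading_in + heading_out


def cross_router(heading_in: str, heading_out: str, vc: int, highest_vc: int,
                 is_forbidden: Callable[[str], bool]) -> Tuple[str, int]:
    """Return ('forward', vc_to_use) or ('nack', vc) for one control message."""
    turn = turn_name(heading_in, heading_out)
    if turn is None or not is_forbidden(turn):
        return ("forward", vc)       # no restriction: keep the current channel
    if vc < highest_vc:
        return ("forward", vc + 1)   # cross the forbidden turn on the next channel
    return ("nack", vc)              # no channel left: reply with a path-nack


FORBIDDEN = {"EN", "SW"}  # illustrative set of prohibited turns at some node
first = cross_router("E", "N", vc=0, highest_vc=1, is_forbidden=FORBIDDEN.__contains__)
second = cross_router("E", "N", vc=1, highest_vc=1, is_forbidden=FORBIDDEN.__contains__)
free = cross_router("E", "S", vc=0, highest_vc=1, is_forbidden=FORBIDDEN.__contains__)
```

In this toy setting a message may cross one forbidden turn per available channel step, which is why a path with too many forbidden turns ends in a path-nack.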
In the following, we show that the proposed approach is deadlock free as long as the following rules are met:

Rule 1. At least two virtual channels are used. A higher number of virtual channels may be used to improve the adaptiveness, in which case an even number $N_{vc}$ of virtual channels is required.

Rule 2. Control packets are not allowed to take EN and SW turns at any nodes located in an even column without changing virtual channel. Control packets are not allowed to take ES and NW turns at any nodes located in an odd column without changing virtual channel.

Rule 3. The virtual channels are ordered according to their IDs. In case of $N_{vc}$ virtual channels, the IDs range from 0 to $N_{vc} - 1$. A message stored on a virtual channel $VC_i$ cannot be forwarded to a virtual channel $VC_j$ if $j < i$, with $0 \le i \le N_{vc} - 1$ and $0 \le j \le N_{vc} - 1$.

Rule 4. Path-setup, path-wakeup, and path-standby messages are allowed to use a virtual channel $VC_i$ only if $0 \le i \le \frac{N_{vc}}{2} - 1$, while path-ack, path-nack, and path-teardown messages are allowed to use a virtual channel $VC_i$ only if $\frac{N_{vc}}{2} \le i \le N_{vc} - 1$.

Rule 1 guarantees that there are enough virtual channels to allow path-ack/path-nack messages to come back to the source using the same path of path-setup/path-wakeup messages in the backward direction. Rule 2 specifies where to put restrictions and is equivalent to Rules 1-2 of the odd-even turn model [40]. However, unlike the odd-even turn model, restrictions are bidirectional, meaning that if a packet crosses a restriction when going in a certain direction, another packet will cross a restriction when going in the backward direction. Rule 3 defines a partial order of virtual channel classes. Rule 4 excludes the occurrence that path-setup/path-wakeup messages and path-ack/path-nack messages are waiting on each other in a cycle.

Fig. 4. A few examples of how virtual channels are used to eliminate deadlocks. The gray and black nodes are respectively sources and destinations, while the red nodes cannot be crossed due to conflicts. The bidirectional black arrows indicate routing restrictions. The virtual channel changes are highlighted and colored according to the message type.

Theorem 1. The minimal flooding routing algorithm for the path-setup protocol with standby is deadlock free as long as Rules 1, 2, 3, and 4 are met.

Proof. Assume that there exists a set of packets $p_0, p_1, \ldots, p_{n-1}$, where $p_{i+1}$ waits for $p_i$ with $0 \le i \le n - 2$. A deadlock arises if $p_0$ waits for $p_{n-1}$, as a circular dependence is generated through the packets. Different cases need to be considered.

Case 1. All the packets belong to path-setup, path-wakeup, or path-standby messages. 180-degree turns do not occur since there are no path-ack, path-nack, or path-teardown messages. According to Rule 4, the packets are stored in queues of virtual channel $VC_i$ with $0 \le i \le \frac{N_{vc}}{2} - 1$. If $N_{vc} = 2$, then all the packets are stored in the same virtual channel. Since Rule 2 is equivalent to Rules 1-2 of the odd-even turn model, the reasoning follows the proof of the odd-even theorem [40]. Differently, if $N_{vc} > 2$, then the packets could be stored in multiple virtual channels. However, according to Rule 3, virtual channels are ordered and routing is restricted to visit channels in increasing order. This will eliminate cycles in the channel dependency graph and, hence, the routing function is deadlock free according to Dally's Theorem [39].

Case 2. All the packets belong to path-ack, path-nack, or path-teardown messages. This case can be treated as Case 1.

Case 3. Packets belong to arbitrary control messages. Suppose packet $p_i$, with $0 \le i \le n - 1$, belongs to a path-setup, path-wakeup or path-standby message and packet $p_j$, with $0 \le j \le n - 1$, belongs to a path-ack, path-nack, or path-teardown message.

Suppose that $n = 2$, then 180-degree turns can occur since there are both path-setup/path-wakeup and path-ack/path-nack messages. However, according to Rule 4, path-setup/path-wakeup/path-standby and path-ack/path-nack/path-teardown messages are stored in queues of respectively virtual channel $VC_i$ with $0 \le i \le \frac{N_{vc}}{2} - 1$ and $VC_j$ with $\frac{N_{vc}}{2} \le j \le N_{vc} - 1$. These two sets of virtual channels are disjoint, breaking any possible two-cycle with 180-degree turns. Differently, suppose that $n > 2$ and $i < j$. Then, every packet $p_k$ with $j \le k \le n - 1$ must belong to path-ack/path-nack/path-teardown messages due to Rules 3, 4. Similarly, every packet $p_h$ with $0 \le h \le i$ must belong to path-setup/path-wakeup/path-standby messages. This means that $p_0$ is stored in a virtual channel $VC_i$ with $0 \le i \le \frac{N_{vc}}{2} - 1$ and $p_{n-1}$ is stored in a virtual channel $VC_j$ with $\frac{N_{vc}}{2} \le j \le N_{vc} - 1$. Consequently, according to Rule 3, virtual channels are ordered and routing is restricted to visit channels in increasing order and, hence, it is not possible that $p_0$ waits for $p_{n-1}$. Similar considerations hold true in case of $i > j$. This will eliminate cycles in the channel dependency graph and, hence, the routing function is deadlock free according to Dally's Theorem [39]. □

Fig. 4 shows an example of how virtual channels are used in the proposed approach. Consider the communication between nodes (0, 5) and (2, 3), and two virtual channels available. The path-setup message is flooded and it can reach the destination through multiple paths using virtual
channel $VC_0$. To avoid deadlocks, the path-ack message is sent back on $VC_1$. Note that in nodes (1, 5) and (1, 4) the path-setup message is not sent to the South port due to ES routing restrictions. Similarly, in the communication between nodes (3, 4) and (5, 3), the path-setup message is not able to cross nodes (3, 3) and (5, 4) respectively due to a conflict and a router restriction. Path-nack and path-ack messages are generated and sent back on virtual channel $VC_1$. Last, consider the communication between nodes (1, 1) and (3, 2) and assume here four virtual channels. Due to conflicts on nodes (1, 2) and (3, 1), there is only a single path that can be used. Since there are four virtual channels, we can cross a single restriction. The path-setup message is sent through $VC_0$. As soon as it reaches node (2, 1), the message switches to $VC_1$ due to an EN restriction. In the same way, the path-ack message is sent through $VC_2$ and, in order to cross node (2, 1), it exploits $VC_3$.

5 SIMULATION RESULTS

5.1 Simulation Setup

The simulation environment was set up by modifying gMemNoCsim, an in-house event-driven cycle-accurate NoC simulator. The optical bandwidth, the number of usable wavelengths as well as the energy consumption are evaluated by incorporating the models provided in Sections 3.2 and 3.3 into the simulator. The main simulation parameters are summarized in Table 4. We targeted a 32-nm process technology and we assumed an 8 × 8 mesh topology and a 400 mm² CMP die area. The area footprint of a single tile is slightly less than 1 mm², allowing the integration of all the 64 tiles in the die. This footprint is evaluated by assuming the individual area of modulators and photodetectors equal to, respectively, 115 and 125 µm² and the ring diameters of passive filters and broadband comb switches equal to, respectively, 6 and 200 µm. The hop length, required to calculate the propagation loss, is evaluated as $\sqrt{\frac{D}{(k_0 - 1)(k_1 - 1)}}$, where $D$ is the die area and $k_0 \times k_1$ is the topology size.

TABLE 4
Simulation Parameters

Parameter                            Value
Chip size (mm²)                      20 × 20
Topology size ($k_0 \times k_1$)     8 × 8
Flit Size (Bytes)                    4
Buffer Size (Flits)                  6
Electronic Clock Frequency (GHz)     1
Optical Clock Frequency (GHz)        10
Photodetector sensitivity (dBm)      −14.2
Modulation rate (Gb/s)               10
# Wavelength Channels                20

Regarding the ENoC, all the data channels have the same bus width of 32 bits, equal to a single flit. The routers are configured to have a buffer depth of 6 flits. The electronic components in the network are clocked at 1.0 GHz, while the optical clock speed is conservatively set to 10 GHz as in [33]. According to [41], a 10 Gb/s fixed modulation rate per wavelength is assumed and we consider 20 wavelength channels. Data messages in the ENoC are routed using XY dimension-order routing. Of course, different routing algorithms can be used, but, since our goal is to investigate the control network, this choice provides a simple electronic infrastructure that is enough to guarantee basic connectivity.

In order to validate the proposed approach, and to demonstrate the benefits of using a path-setup protocol with standby, we tested and compared the proposed approach with a commonly used path-setup algorithm and a few other alternatives. The comparison is done with both synthetic traffic and real-world applications. Table 5 shows the characteristics of the five path-setup approaches analyzed. The baseline approach [5], [16], [17], [24] uses the protocol introduced in Section 2.2 and XY dimension-order routing in the control network. Differently, the proposed approach exploits the path-setup protocol with standby and the minimal flooding routing algorithm. In addition, four virtual channels are used in order to enhance the routing adaptiveness, allowing messages to cross a single forbidden turn as explained in Section 4. In order to extend our comparisons, three other approaches, exploiting the baseline protocol, were implemented. Arch-A and Arch-B are equipped with minimal flooding routing while in Arch-B a single forbidden turn is allowed thanks to four virtual channels. Arch-C is also configured with four virtual channels and, in addition, it can take advantage of non-minimal paths with a single non-minimal hop in each path. Non-minimal path flooding removes the closeness restriction by allowing for each path a fixed number of non-minimal hops.

TABLE 5
Characteristics of the Architectures Analyzed

Approach    Path-setup msg routing    #VCs    Stand-by optical paths
Baseline    XY                        2       NO
Arch-A      Flooding (minimal)        2       NO
Arch-B      Flooding (minimal)        4       NO
Arch-C      Flooding (non-minimal)    4       NO
Proposed    Flooding (minimal)        4       YES

5.2 Comparison Under Synthetic Traffic

We first carried out a set of experiments to evaluate the benefits of using the proposed path-setup approach presented in this article and to analyze its performance and energy consumption. The main simulation parameters are summarized in Table 6.

Non-bursty traffic is injected at a rate of 0.2 flits/cycle/NI in order to keep the ENoC busy. All the considerations in the rest of this section concern burst and control traffic. Bursty traffic is described by three parameters: the burst message size, the burst injection rate, and the percentage of nodes that are involved in a burst transfer as sources or destinations. These nodes are randomly selected and during the simulation are changed in order to provide a scenario independent of the possible mapping choices. Each node is characterized by a low, medium, or high value. The values are chosen so as to have light traffic when all the values are low and an extremely challenging traffic when all the values are high. Other traffic behaviours are achieved with a different combination of these values. This traffic characterization is suitable for describing scientific or multimedia applications where communication patterns are far from full
connectivity [42]. In fact, applications that scale most efficiently to large numbers of IP cores tend to depend on point-to-point communication patterns where the average topological degree of communication (TDC)² ranges between three and seven distinct destinations [43]. Notice that this kind of applications may need to handle heavy traffic with large-size messages. For instance, the H.264 video decoder and the sparse matrix solver applications, analyzed in the next section, are characterized by an average message size of 1,280 and 820 bytes, respectively.

2. The Topological Degree of Communication is defined as the number of destinations that a given processing element must reach.

TABLE 6
Synthetic Traffic Parameters

Parameter                                   Level     Value
Message Size (Bytes)                                  8
Injection Rate (flits/cycle/NI)                       0.2
Burst Message Size (Bytes)                  Small     256
                                            Medium    1,024
                                            Large     4,096
Burst Injection Rate (flits/cycle/NI)       Low       1/300
                                            Medium    1/250
                                            High      1/200
Percentage of source and destination        Low       20% src, 20% dst
nodes of bursty traffic                     Medium    40% src, 40% dst
                                            High      40% src, 60% dst

Simulations are performed for each combination of the three values characterizing the bursty traffic. Fig. 5 shows the average latency for each architecture when varying the simulation parameters values. The latency is divided into three parts: the latency due to electronic and optical data transfers and the latency due to the path-setup overhead. It is evaluated as the average of the latencies of all the burst messages that reach their destination via the ENoC as well as the ONoC.

The baseline architecture and Arch-C perform adequately for low values of the burst parameters, but as the burst size, injection rate, and percentage of nodes are increased, the latency grows faster compared to the other architectures. In the worst cases, when the burst size and injection rate are high, the network gets congested. The main drawback of non-minimal flooding is that routing the messages to all the router output ports results in reserving a prohibitive amount of resources. As a consequence, the number of path setups that can be carried out simultaneously decreases drastically, leading to a higher number of conflicts and hence a higher path-setup overhead. However, compared to the baseline approach, Arch-C performs better due to the higher probability of finding free paths. Differently, Arch-A and Arch-B perform between 30 and 60 percent better than the baseline solution, depending on the size of the messages and the injection rate. Basically, the gap increases as the two parameters grow. The ability of safely crossing forbidden turns in Arch-B, due to the four virtual channels, leads to an average latency reduction of about 27 percent with a maximum of 60 percent for the high burst size. All these architectures perform poorly in case of a high number of nodes sending large-sized messages with a high injection rate. When these values are high, the network gets congested. On the contrary, the proposed approach performs better than all the other approaches by exploiting the path-setup protocol with standby state: the network is never congested and only in case of all the burst values being high, the latency increases significantly. In other cases, when Arch-B incurs high latencies, the proposed approach is able to reach a reduction up to 50 percent.

Fig. 6 shows the average energy per bit consumption for the burst messages. The energy is divided into three parts: the energy for sending messages via ENoC, via ONoC, and the energy due to the path-setup overhead. The ONoC energy includes the energy due to the E/O/E interface. The energy due to the path-setup is the main component in case of small and medium size messages: more than 95 percent of the whole energy in case of small size messages and more than 80 percent for medium size messages using the baseline architecture. When the burst size is large and the injection rate high, a number of messages are sent via the ENoC due to the congestion on the control network. This causes an increment of the ENoC energy that can become the main component even in case of few messages. The general behavior is that Arch-C performs very poorly due to the large amount of flooding messages being sent, followed by Arch-B and Arch-A. The more messages are sent during the path-setup phase, the more energy is consumed. The baseline consumes less energy since no message replication is performed. This trend does not occur in case of large messages, as the path-setup overhead is independent of the size of the message and hence it tends to be negligible. Concerning the baseline and the Arch-C architecture, a high injection rate leads to a congestion on the control network and hence to a large increase in the energy consumed by the ENoC. This holds true for all architectures but Arch-B is able to route fewer messages in the ENoC. Different considerations apply to the proposed approach. Its ability to put an optical circuit in a standby state allows reducing the number of floodings and hence power consumption. Compared to the baseline architecture, which does not perform floodings, it reaches the same values for small-size messages, while for larger messages it achieves a reduction greater than 50 percent.

5.3 Comparison Under Real Applications

In addition to the comparison under synthetic traffic, the different approaches were compared under a real scenario. We considered four real applications that are included in the MCSL realistic NoC traffic benchmark suite [44], namely, FPPPP, a chemical program performing multi-electron integral derivatives; H264-720p_dec, an H.264 video decoder with a resolution of 720p; H264-1080p_dec, an H.264 video decoder with a resolution of 1080p; and SPARSE, a random sparse matrix solver for electronic circuit simulations. Their characteristics are summarized in Table 7. We chose applications with various characteristics to evaluate the effectiveness and the scalability of the proposed method for different traffic scenarios. Note that the MCSL suite provides, for each application, recorded traffic patterns with detailed communication traces including the memory space allocation, the tasks mapping and scheduling.
Fig. 5. A comparison between the different path-setup architectures in terms of latency under synthetic traffic.

Fig. 7 shows the average latency and energy consumption for each path-setup approach when implementing the different applications in an 8 × 8 mesh topology. As in the previous section, the figure displays the breakdown of the latency and energy consumption in terms of path-setup overhead as well as electronic and optical transmission. In general, the traffic pattern is less challenging than the worst-case synthetic traffic and hence the network never saturates.

Concerning the latency, the baseline architecture performs worse than the other architectures. Arch-C performs better than the baseline solution, achieving an average reduction of around 20 percent. This is due to the flooding algorithm with non-minimal paths, which increases the probability of finding free paths. A better reduction is provided by Arch-A and Arch-B, able to reach a reduction of more than 40 percent compared to the baseline solution. Notice that the two additional virtual channels owned by Arch-B give a slight advantage over Arch-A. Last, the proposed approach achieves a 45 percent improvement over the baseline solution while, compared to Arch-A and Arch-B, it incurs a negligible 3 percent penalty in terms of latency caused by the protocol with standby.

Differently, concerning the energy consumption, Arch-C performs poorly due to the non-minimal behaviour of its flooding routing algorithm: the energy consumption is around 50 percent higher than the baseline architecture. Arch-A consumes an average 30 percent less energy than
Fig. 6. A comparison between the different path-setup architectures in terms of energy per bit consumption under synthetic traffic.

the baseline solution, while Arch-B consumes an average 12 percent less energy than Arch-A. Last, the proposed approach consumes less energy compared to all the other architectures. This advantage roughly accounts for an 80 percent reduction compared to the baseline architecture. The results above point out that the architecture exploiting the path-setup protocol with standby is highly energy efficient, while the latency overhead is negligible.

TABLE 7
Characteristics of the Applications Analyzed

Application       #Tasks    #Comm links    Avg msg size (Byte)    Avg msg injection rate    TDC
FPPPP             334       1,145          204.8                  0.00064                   0.17
H264-720p_dec     2,311     13,461         1,280                  0.000018                  0.10
H264-1080p_dec    5,191     7,781          1,280                  0.000018                  0.19
SPARSE            96        67             819.2                  0.00031                   0.02

6 CONCLUSION

This paper has shown the importance of using a suitable path-setup strategy to enable efficient use of the photonic resources in hybrid photonic-electronic networks. We proposed a new path-setup protocol that can reduce the path-setup latency and the power consumption due to its ability to put allocated circuits in a stand-by state. In
Authorized licensed use limited to: Georgia Institute of Technology. Downloaded on September 03,2023 at 07:30:41 UTC from IEEE Xplore. Restrictions apply.
FUSELLA ET AL.: PATH SETUP FOR HYBRID NOC ARCHITECTURES EXPLOITING FLOODING AND STANDBY 1415

[6] E. Fusella, J. Flich, A. Cilardo, and A. Mazzeo, “On the design of a


path-setup architecture for exploiting hybrid photonic-electronic
NoCs,” in Proc. Workshop Exploiting Silicon Photon. Energy-Efficient
High Performance Comput., 2015, pp. 9–16.
[7] R. Ramaswami, K. Sivarajan, and G. Sasaki, Optical Networks: A Prac-
tical Perspective. San Mateo, CA, USA: Morgan Kaufmann, 2009.
[8] P. Kapur and K. C. Saraswat, “Comparisons between electrical
and optical interconnects for on-chip signaling,” in Proc. IEEE Int.
Interconnect Technol. Conf., 2002, pp. 89–91.
[9] M. Haurylau, et al., “On-chip optical interconnect RoadMap:
Challenges and critical directions,” IEEE J. Select. Topics Quantum
Electron., vol. 12, no. 6, pp. 1699–1705, Nov./Dec. 2006.
[10] G. Chen, et al., “Predictions of CMOS compatible on-chip optical
interconnect,” VLSI J. Integr., vol. 40, no. 4, pp. 434–446, 2007.
[11] K.-H. Koo, H. Cho, P. Kapur, and K. C. Saraswat, “Performance
comparisons between carbon nanotubes, optical, and cu for future
high-performance on-chip interconnect applications,” IEEE Trans.
Electron Devices, vol. 54, no. 12, pp. 3206–3215, Dec. 2007.
[12] L. P. Carloni, P. Pande, and Y. Xie, “Networks-on-chip in emerg-
ing interconnect paradigms: Advantages and challenges,” in Proc.
3rd ACM/IEEE Int. Symp. Netw.-on-Chip, 2009, pp. 93–102.
[13] B. A. Small, B. G. Lee, K. Bergman, Q. Xu, and M. Lipson,
“Multiple-wavelength integrated photonic networks based on
microring resonator devices,” J. Opt. Netw., vol. 6, no. 2, pp. 112–
120, 2007.
[14] D. Vantrease, et al., “Corona: System implications of emerging
nanophotonic technology,” in Proc. ACM SIGARCH Comput.
Archit. News, 2008, vol. 36, no. 3, pp. 153–164.
[15] Y. Pan, P. Kumar, J. Kim, G. Memik, Y. Zhang, and A. Choudhary,
“Firefly: Illuminating future network-on-chip with nano-
photonics,” in Proc. ACM SIGARCH Comput. Archit. News, 2009,
vol. 37, no. 3, pp. 429–440.
[16] K. H. Mo, Y. Ye, X. Wu, W. Zhang, W. Liu, and J. Xu, “A hierarchi-
cal hybrid optical-electronic network-on-chip,” in Proc. IEEE Com-
put. Soc. Annu. Symp. VLSI, 2010, pp. 327–332.
[17] Y. Ye, J. Xu, X. Wu, W. Zhang, W. Liu, and M. Nikdast, “A torus-
based hierarchical optical-electronic network-on-chip for multi-
processor system-on-chip,” ACM J. Emerging Technol. Comput.
Syst., vol. 8, no. 1, 2012, Art. no. 5.
[18] S. Bahirat and S. Pasricha, “Meteor: Hybrid photonic ring-mesh
network-on-chip for multicore architectures,” ACM Trans. Embed-
Fig. 7. A comparison between the different path-setup architectures in
ded Comput. Syst., vol. 13, no. 3s, 2014, Art. no. 116.
terms of latency and energy consumption under real applications traffic.
[19] S. L. Beux, et al., “Chameleon: Channel efficient optical network-
on-chip,” in Proc. IEEE Des. Autom. Test Europe Conf. Exhibition,
addition, path-setup messages are sent using a flooding 2014, pp. 1–6.
routing strategy in order to enhance the probability of [20] E. Fusella and A. Cilardo, “H2 ONoC: A hybrid optical-electronic
finding free optical paths. We analyzed the performance NoC based on hybrid topology,” IEEE Trans. Very Large Scale Integr.
Syst., vol. 99, pp. 1–14, 2016, Doi: 10.1109/TVLSI.2016.2581486.
and energy consumption of the proposed path-setup
[21] J. Chan, G. Hendry, A. Biberman, and K. Bergman, “Architectural
strategy and compared it with a commonly used exploration of chip-scale photonic interconnection network
approach as well as some alternative solutions. The designs using physical-layer analysis,” J. Lightwave Technol.,
results show that, as the message sizes and the injection vol. 28, no. 9, pp. 1305–1315, 2010.
rates increase, the number of conflicts incurred while [22] E. Fusella and A. Cilardo, “PhoNoCMap: An application mapping
tool for photonic networks-on-chip,” in Proc. IEEE Des. Autom.
establishing the circuit grows significantly, leading to Test Europe Conf. Exhibition, 2016, pp. 289–292.
performance and energy efficiency degradation. In this [23] K. Goossens, J. Dielissen, and A. Radulescu, “Æthereal network
scenario, the proposed approach turned out to reduce on chip: Concepts, architectures, and implementations,” IEEE
the latency up to 45 percent and the power consumption Des. Test Comput., vol. 22, no. 5, pp. 414–421, Sep./Oct. 2005.
[24] M. Petracca, B. G. Lee, K. Bergman, and L. P. Carloni, “Photonic
up to a 80 percent. NoCs: System-level design exploration,” IEEE Micro, vol. 29,
no. 4, pp. 74–85, Jul./Aug. 2009.
REFERENCES [25] S. Pasricha and S. Bahirat, “Opal: A multi-layer hybrid photonic
NoC for 3D ICS,” in Proc. 16th Asia South Pacific Des. Autom. Conf.,
[1] N. Klrman, et al., “On-chip optical technology in future bus-based 2011, pp. 345–350.
multicore designs,” IEEE Micro, vol. 27, no. 1, pp. 56–66, Jan. 2007. [26] P. Grani and S. Bartolini, “Simultaneous optical path-setup for
[2] A. Cilardo and E. Fusella, “Design automation for application- reconfigurable photonic networks in tiled CMPS,” in Proc. IEEE
specific on-chip interconnects: A survey,” VLSI J. Integr., vol. 52, Int. Conf. High Performance Comput. Commun., IEEE 11th Int. Conf.
pp. 102–121, 2016. Embedded Softw. Syst., IEEE 6th Int. Symp. Cyberspace Safety Secur.,
[3] E. Fusella and A. Cilardo, “Lighting up on-chip communications 2014, pp. 482–485.
with photonics: Design tradeoffs for optical NoC architectures,” [27] H. Gu, K. H. Mo, J. Xu, and W. Zhang, “A low-power low-cost
IEEE Circuits Syst. Mag., vol. 16, no. 3, pp. 4–14, Jul.–Sep. 2016. optical router for optical networks-on-chip in multiprocessor sys-
[4] A. Biberman, et al., “Broadband silicon photonic electrooptic tems-on-chip,” in Proc. IEEE Comput. Soc. Annu. Symp. VLSI, 2009,
switch for photonic interconnection networks,” IEEE Photon. Tech- pp. 19–24.
nol. Lett., vol. 23, no. 8, pp. 504–506, Apr. 2011. [28] B. G. Lee, A. Biberman, P. Dong, M. Lipson, and K. Bergman, “All-
[5] A. Shacham, K. Bergman, and L. P. Carloni, “Photonic networks- optical comb switch for multiwavelength message routing in sili-
on-chip for future generations of chip multiprocessors,” IEEE con photonic networks,” IEEE Photon. Technol. Lett., vol. 20, no. 10,
Trans. Comput., vol. 57, no. 9, pp. 1246–1260, Sep. 2008. pp. 767–769, May2008.
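The two ideas combined by the proposed protocol, flooding the path-setup request and parking a released circuit in stand-by so that it can be restored later, can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the router architecture evaluated in this paper: the `Mesh` class, its method names, the three link states, and the use of a breadth-first search to stand in for flooding are all hypothetical. The sketch also does not model how a stand-by circuit is reclaimed by a competing request, nor any timing or energy cost.

```python
from collections import deque

# Hypothetical link states, used for illustration only.
FREE, ACTIVE, STANDBY = "free", "active", "standby"


class Mesh:
    """Toy 2D mesh whose links can be reserved by optical circuits."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.link_state = {}  # frozenset({a, b}) -> state; a missing key means FREE
        self.standby = {}     # (src, dst) -> node list of a parked circuit

    def neighbors(self, node):
        x, y = node
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < self.width and 0 <= ny < self.height:
                yield (nx, ny)

    def _state(self, a, b):
        return self.link_state.get(frozenset((a, b)), FREE)

    def _mark(self, path, state):
        for a, b in zip(path, path[1:]):
            self.link_state[frozenset((a, b))] = state

    def flood(self, src, dst):
        """Explore every free link outward from src (breadth-first).

        Returns the first path that reaches dst, which may be non-minimal
        with respect to the mesh when some links are busy, or None.
        """
        parent = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for nxt in self.neighbors(node):
                if nxt not in parent and self._state(node, nxt) == FREE:
                    parent[nxt] = node
                    queue.append(nxt)
        return None

    def setup(self, src, dst):
        """Return (path, how), with how in {'restored', 'flooded', 'blocked'}."""
        path = self.standby.pop((src, dst), None)
        if path is not None:          # a parked circuit exists: restore it
            self._mark(path, ACTIVE)
            return path, "restored"
        path = self.flood(src, dst)   # otherwise search for a free path
        if path is None:
            return None, "blocked"
        self._mark(path, ACTIVE)
        return path, "flooded"

    def release(self, src, dst, path):
        """Park the circuit in stand-by instead of tearing it down."""
        self._mark(path, STANDBY)
        self.standby[(src, dst)] = path
```

In this sketch, a first call to `setup((0, 0), (2, 2))` on a `Mesh(3, 3)` has to flood, whereas the same request issued after `release` is served from the stand-by table without any search, which is the effect the protocol relies on to cut path-setup overhead for recurring communication pairs.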


Edoardo Fusella received the BS, MS, and PhD degrees in computer engineering from the University of Naples Federico II, Italy, in 2008, 2011, and 2015, respectively, where he is currently a post-doctoral researcher in the Department of Electrical Engineering and Information Technologies. He has been a visiting researcher with the Universitat Politècnica de València, Spain and NEC Europe Ltd., Heidelberg, Germany. His research interests focus on the domain of many-core systems, for which he has been investigating on-chip communication architectures, with emphasis on both electronic and optical on-chip networks, ranging from design tools to implementation and physical design.

José Flich received the PhD in computer engineering, in 2001. He is a full professor with UPV, where he leads the research activities related to NoCs. He published more than 100 conference and journal papers, and has served in different conference program committees (ISCA, PACT, HPCA, NOCS, ICPP, IPDPS, HiPC, CAC, CASS, ICPADS, ISCC), as program chair (INA-OCMC, CAC) and track co-chair (EUROPAR). He has collaborated with different Institutions (Ferrara, Naples, Catania, Jonkoping, USC) and companies (AMD, Intel, Sun). Current research activities focus on routing, coherency protocols, and congestion management within NoCs. He has co-invented different routing strategies, reconfiguration and congestion control mechanisms, some of them with high recognition (RECN and LBDR for on-chip networks). He is a member of the HiPEAC NoE. He is coeditor of the book Designing Network-on-Chip Architectures in the Nanoscale Era. He coordinated the FP7 NaNoC project and the H2020 MANGO project. He is a senior member of the IEEE.

Alessandro Cilardo received the five-year degree in electronics engineering, magna cum laude, and the PhD degree in computer science from the University of Naples Federico II, in January 2003 and November 2006, respectively. He is currently an assistant professor with the University of Naples Federico II. His research activities focus on electronic design automation, high-performance and embedded computing, as well as computer arithmetic, with special emphasis on the application domain of security and cryptography-related processing. He is the single or main author of around 70 peer-reviewed papers published in leading scientific journals and conferences. He is a senior member of the Institute of Electrical and Electronics Engineers and a member of the HiPEAC NoE.
