
The Square-Root-NOC:

New Architecture for Network on chip

Teamour Esmaeili
Dept. of Computer Engineering
DareShahr Branch,
Islamic Azad University, Iran
Ghazal Lak
Dept. of Computer Engineering
DareShahr Branch,
Islamic Azad University, Iran
Akram Noori Rad
Dept. of Computer Engineering
DareShahr Branch,
Islamic Azad University, Iran
Abstract: The performance of modern chip-scale computing systems is increasingly determined by the characteristics of
the interconnection network. Photonic technology has been proposed as an alternative to traditional electronic interconnects for
its advantages in bandwidth density, latency, and power efficiency. Circuit-switched photonic network architectures take
advantage of the optical spectrum to create high-bandwidth transmission links through the transmission of data channels on
multiple parallel wavelengths. The square-root principle is known to achieve low search time for peer-to-peer search techniques
that do not utilize query routing indices (e.g., query flooding or random walk searches).
Index Terms: Square-Root-NOC, chip multiprocessor (CMP), high-performance computing (HPC).
Future high-performance computing (HPC) systems
will be increasingly met with more stringent physical
demands and growing performance requirements to
meet the needs of emerging HPC applications [1]. Per-
formance requirements have thus far been obtained
through the increased parallelism of chip multiprocessors
(CMPs) and greater memory capacity; however, the cur-
rent trend in processor and memory scaling will ulti-
mately be met with fundamental roadblocks.
Interconnecting the cores on a CMP is becoming in-
creasingly difficult due to physical limitations in wire
densities and packaging power dissipation. Moreover,
memory and IO communications are also being bottle-
necked by pin density and limited surface area on the
chip package. These challenges have led system architects
to investigate alternative technologies for handling the
communication infrastructure required by HPC systems.
Integrated silicon photonics is currently seen as an attractive
interconnect solution for mitigating the bandwidth and energy
bottlenecks that are facing HPC systems. Photonic interconnects
offer orders-of-magnitude
improved bandwidth density over electronics by leverag-
ing wavelength-division multiplexing (WDM) to concur-
rently transmit multiple spectrally-parallel streams of
data through a single optical waveguide. Photonics also
affords improved energy efficiency by eliminating the
need for electronic buffers and switches that are required
within typical electronic networks. These notable advan-
tages have prompted the proposal of many novel photon-
ics-enabled interconnect architectures [2]-[4].
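As a rough illustration of the WDM argument above, the following Python sketch compares the aggregate bandwidth of one waveguide carrying several wavelength channels with the number of electrical wires needed for the same bandwidth at one bit stream per wire. The function names, the channel count, and the data rates are hypothetical values chosen for the example; they are not figures from this paper.

```python
import math


def waveguide_bandwidth_gbps(num_wavelengths, rate_per_channel_gbps):
    """Aggregate bandwidth of one waveguide carrying independent WDM channels."""
    if num_wavelengths < 1 or rate_per_channel_gbps <= 0:
        raise ValueError("need at least one channel with a positive data rate")
    return num_wavelengths * rate_per_channel_gbps


def wires_needed(target_gbps, rate_per_wire_gbps):
    """Electrical wires needed for the same bandwidth, one bit stream per wire."""
    if rate_per_wire_gbps <= 0:
        raise ValueError("wire data rate must be positive")
    return math.ceil(target_gbps / rate_per_wire_gbps)


# Assumed example values: 64 wavelengths at 10 Gb/s each, wires at 5 Gb/s each.
optical = waveguide_bandwidth_gbps(64, 10.0)
wires = wires_needed(optical, 5.0)
```

Under these assumed numbers a single waveguide stands in for well over a hundred wires, which is the density advantage the text refers to.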
Modern computing platforms are leveraging the paral-
lel computation capabilities of chip multiprocessors
(CMPs) to achieve improved computational performance
and energy efficiency. The transition away from single-
core architectures to CMPs occurred due to fundamental
limitations in the power dissipation capabilities of current
packaging technology, which consequently hampered
progress in on-chip clock-rate scaling. This subsequent
shift toward CMP architectures creates a need for high-
performance chip-scale interconnects for core-to-core and
core-to-memory communications. Chip-scale intercon-
nects have thus far been implemented with electronics,
but they achieve limited on-chip and off-chip bandwidth
scaling due to minimum achievable wire pitches and the
same power dissipation restrictions that constrict the
clock rate.
These restrictions will prevent electronics from achiev-
ing the performance requirements of current and future
applications that are communication bound. Addition-
ally, there will be difficult challenges in meeting the con-
nectivity requirements of computer systems with increas-
ing number of cores. New technologies need to be ex-
plored to address these issues. Integrated photonic tech-
nology is an attractive interconnect solution that can be
used to mitigate the energy and bandwidth bottlenecks
that are arising in CMP systems.
Photonic interconnects can enable improved band-
width density by leveraging wavelength-division multi-
plexing (WDM) to concurrently transmit multiple spec-
trally parallel streams of data through a single optical
waveguide, which contrasts with electronic interconnects
that require a unique metal wire per bit stream. As a re-
sult, photonics can alleviate the problems facing inter-
connect subsystems that are reaching limits in wire and
input/output (I/O) pin density. Additionally, photonics
also affords improved energy efficiency by eliminating
the need for the electronic buffering and switching that
are found in conventional electronic networks. The combined
advantages of better bandwidth density and power efficiency
make photonic interconnects a serious contender as a
technological replacement for electronic interconnects.
Recent advancements in silicon nano-photonic tech-
nology have opened the possibility of integrating photon-
ics for chip-scale interconnection networks. In compari-
son to electronics, photonics has the potential to offer
higher-bandwidth connections by leveraging the data parallelism
offered by wavelength-division multiplexing (WDM).
Additionally, energy dissipated in photonic signaling
is effectively distance independent, enabling greater en-
ergy efficiency for global chip- and board-scale communi-
cations. Although these advantages exist, optical signal-
ing is incapable of practical in-flight processing or buffer-
ing without optical-electronic-optical conversion. Also,
signal regeneration in optics cannot be easily accom-
plished on the CMOS-compatible silicon photonic plat-
form. Photonic messages must therefore be able to propa-
gate the length of the transmission path without accumu-
lating significant optical loss. In light of these constraints,
many novel photonic interconnect designs have been
proposed for enabling optical data transmission in the
chip-scale domain [5-7].
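The loss constraint described above can be sketched as a simple optical power budget: the losses of every device along a path are summed in dB, and the path is usable only if the launched power minus that sum still meets the detector sensitivity. The Python sketch below is a minimal illustration; the per-device loss values, the class and function names, and the power levels are all assumptions made for the example and are not measurements from this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceLossDb:
    # Hypothetical per-device insertion losses in dB (assumed, not from the paper).
    propagation_per_cm: float = 1.0
    crossing: float = 0.1
    bend: float = 0.005
    ring_through: float = 0.01
    ring_drop: float = 0.5


def path_insertion_loss_db(length_cm, crossings, bends, rings_passed,
                           rings_dropped, loss=DeviceLossDb()):
    """Sum the insertion loss (dB) accumulated along one optical path."""
    return (length_cm * loss.propagation_per_cm
            + crossings * loss.crossing
            + bends * loss.bend
            + rings_passed * loss.ring_through
            + rings_dropped * loss.ring_drop)


def path_is_feasible(total_loss_db, launch_power_dbm, detector_sensitivity_dbm):
    """True if the signal still meets the detector sensitivity at the receiver."""
    return launch_power_dbm - total_loss_db >= detector_sensitivity_dbm


# Assumed worst-case path: 2 cm long, 30 crossings, 10 bends, 40 rings passed, 3 drops.
worst = path_insertion_loss_db(2.0, 30, 10, 40, 3)
ok = path_is_feasible(worst, launch_power_dbm=0.0, detector_sensitivity_dbm=-20.0)
```

Because no regeneration is available along the path, the worst-case path of a topology, rather than the average one, determines whether the design is feasible.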
We report, to the best of our knowledge, the first de-
tailed physical-layer analysis of chip-scale photonic inter-
connection networks. Although many photonic topolo-
gies have been proposed in an effort to improve comput-
ing performance, less emphasis has been placed on un-
derstanding whether such designs are feasible from a
physical-layer standpoint. Since it is not currently practi-
cal to test full network topologies in a laboratory environ-
ment due to fabrication yield limitations, we implement
physically-accurate simulation models for this analysis.
We model the previously proposed Torus topology [7],
and introduce two new topologies, TorusNX and Square
Root. These three networks are analyzed in terms of three
physical-layer metrics that play a critical role in determining
the overall scalability and performance of the
network design: insertion loss, crosstalk, and energy.
Advancements in silicon photonic device technology have
brought about the development of all the functional com-
ponents necessary in constructing chip-scale interconnec-
tion networks based on photonics. The set of fundamental
devices include waveguides [8,9], bends [8], crossings
[10], filters [11], switches [12], modulators [13], and detec-
tors [14]. Replicating the functionality of electronic inter-
connect designs with these photonic devices is possible;
however, the advantages that photonic technology offers
will not be fully appreciated since their behavior and cha-
racteristics are fundamentally different from those of their
electronic counterparts. In what can be considered as the
first step toward a full-scale photonic platform, Ophir et
al. demonstrated the operation of an optical bus (i.e.,
point-to-point link) operating at a data rate of 3 Gb/s [15].
Network architects have also proposed a variety of advanced
novel interconnect designs in order to fully leverage
the capabilities of photonics [16-24].
Wavelength-routed topologies are constructed using ring-
resonator-based filters which accordingly route light
waves based on their wavelengths [16-20]. Any source
node can address its intended destination through the
selection of an appropriate transmission wavelength (i.e.,
source routing), which is then guided by the ring filters
throughout the network. Transmission latencies can be
designed to be extremely short when using wavelength
routing, since the propagation delay is simply the time of
flight at the speed of light. However, spectral bandwidth
is leveraged for routing purposes which could have oth-
erwise been used to increase communication data rates.
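To make the source-routing idea concrete, the Python sketch below uses a cyclic wavelength assignment in which the wavelength index equals the offset from source to destination. This particular assignment is an assumption chosen for illustration and is not taken from the cited designs; the function names are hypothetical.

```python
def wavelength_for(src, dst, num_nodes):
    """Wavelength index a source selects to reach a destination (assumed cyclic scheme)."""
    if not (0 <= src < num_nodes and 0 <= dst < num_nodes):
        raise ValueError("node index out of range")
    if src == dst:
        raise ValueError("a node does not route to itself")
    return (dst - src) % num_nodes  # always in 1 .. num_nodes - 1


def destination_of(src, wavelength, num_nodes):
    """Node that the ring filters would deliver this wavelength to."""
    return (src + wavelength) % num_nodes


# With 8 nodes, node 2 reaches node 5 by transmitting on wavelength 3.
lam = wavelength_for(2, 5, 8)
```

In this scheme a network of N nodes consumes N - 1 distinct wavelengths purely for addressing, which illustrates the trade-off noted above between routing and data bandwidth.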
Spatial routing uses electro-optic broadband ring resona-
tors to guide a large set of parallel wavelength channels
along an optical path [21-23]. The ring resonators act as
comb switches to simultaneously control the path of all
incident wavelength channels. Spatial routing requires a
priori establishment of the entire optical path which is
typically created using a circuit-switching style method-
ology. While spatial routing exhibits longer latencies than
wavelength routing due to the overhead of the circuit-
switching protocol, it is able to leverage the entirety of the
available optical spectrum for data striping to create ex-
tremely high bandwidth links. A previous study
showed that the circuit-switching overhead can be amor-
tized over large data messages, which is a characteristic in
certain scientific applications typically executed on high-
performance systems [22].
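The amortization argument can be illustrated with a short calculation: the effective bandwidth of a circuit-switched transfer is the message size divided by the sum of the path-setup time and the transmission time. The Python sketch below assumes a fixed setup overhead; the function name and all numbers are hypothetical and are not results from [22].

```python
def effective_bandwidth_gbps(message_bits, link_gbps, setup_ns):
    """Effective bandwidth of one circuit-switched transfer.

    bits / (Gb/s) yields nanoseconds, and bits / ns yields Gb/s.
    """
    if message_bits <= 0 or link_gbps <= 0 or setup_ns < 0:
        raise ValueError("invalid parameters")
    transfer_ns = message_bits / link_gbps
    return message_bits / (setup_ns + transfer_ns)


# Assumed values: a 100 Gb/s link and a 100 ns path-setup overhead.
small = effective_bandwidth_gbps(10_000, 100.0, 100.0)      # short message
large = effective_bandwidth_gbps(1_000_000, 100.0, 100.0)   # long message
```

With these assumed values the short message reaches only half of the link bandwidth, while the long message comes within about one percent of it.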
The usage of time-division multiplexing (TDM) has also
been previously proposed as a technique for improving
optical on-chip network performance [24]. TDM routing
temporally divides the transmission medium into a continuous
series of frames. Each frame is subdivided into
several time slots, each of which represents a different
configuration of the entire optical network, and the set of all unique
time slots completely connects all nodes in the network.
The network is constructed using broadband ring switch-
es, identical to the switches used
for spatial routing, which are electro-optically reconfig-
ured at the beginning of each time slot. A queued mes-
sage at a source node will wait until an appropriate time
slot arrives before it begins transmission, which contrasts
with the spatial routing mechanism of immediately re-
questing the circuit allocation. Incarnations of some of the
aforementioned TDM routing and the WSSR concept pre-
sented in this work were previously proposed and ana-
lyzed for multiprocessor networks and wide-area net-
works [25,26]. The previous work showed that the use of
WDM and TDM was effective for reducing network-level
latency. With respect to TDM techniques, a comparison of
link multiplexing and path multiplexing was conducted,
and it showed that link multiplexing performed better in
certain traffic configurations with a significant reduction
in design complexity [25]. The alternative WDM tech-
nique was also described to have similar performance
characteristics as the TDM case [26].
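The frame and time-slot mechanism can be sketched with a simple cyclic schedule, in which slot s connects every node i to node (i + s) mod N. This schedule is an assumption made for illustration; it ignores physical path conflicts and is not the arbitration scheme of [24]. The function names are hypothetical.

```python
def slot_configuration(slot, num_nodes):
    """Source -> destination mapping for one time slot (slot in 1 .. N-1)."""
    if not 1 <= slot < num_nodes:
        raise ValueError("slot must be in 1 .. num_nodes - 1")
    return {src: (src + slot) % num_nodes for src in range(num_nodes)}


def wait_slots(current_slot, src, dst, num_nodes):
    """Whole slots a queued message waits before its slot begins."""
    if src == dst:
        raise ValueError("source and destination must differ")
    needed = (dst - src) % num_nodes   # slot that connects src to dst
    frame = num_nodes - 1              # slots per frame
    return (needed - current_slot) % frame


# In a 4-node network during slot 1, a message from node 0 to node 3 waits 2 slots.
delay = wait_slots(1, 0, 3, 4)
```

The union of the N - 1 slot configurations covers every ordered source-destination pair, matching the requirement that the set of all time slots completely connects the network.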
A previously proposed circuit-switched topology is the
TorusNX, which is designed with a reduced number of
crossings and an optimized switching layout [23]. A 4×4
version of the TorusNX is illustrated in Fig. 1, consisting
of 16 gateway switches and 16 4×4 non-blocking switches.
The structure of each switch configured with two WDM
partitions is given in Fig. 2. Each pair of rings (indicated
by a red ring and a blue ring) forms the two cascaded
rings of a two-partition router. The original
single-partition designs can be reconstructed by removing
either the red or blue set of rings from the layout.
We investigate space-routed photonic networks, which
are designed to use actively-controlled silicon ring-
resonator-based broadband switches to route WDM mes-
sages, composed of a set of wavelength channels, from
source to destination. The ring resonators are electro-optic
devices that can be manipulated to be in an off-resonance
through state, allowing signals to pass by, or in an on-
resonance drop state, which shifts the signal onto another
waveguide. An electronic control plane, mirroring the
photonic network layout, is necessary to control each
broadband switch through a circuit-switching protocol.
When a photonic connection is being provisioned, a path-
setup message on the control plane will trace out an opti-
cal path on the photonic plane by reserving and configur-
ing the appropriate optical switches. This form of routing
can fully utilize the optical spectrum by leveraging WDM
to create extremely high-throughput links. This method
contrasts with wavelength-routed networks, which lever-
age filters and wavelength selectivity to perform routing.
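The path-setup step of the circuit-switching protocol can be sketched as follows. This minimal Python model assumes that a path-setup message reserves each switch along a given route in order and releases its partial reservations if it meets a switch that is already held. Reserving a whole switch rather than a port pair, the class name, and the method names are simplifying assumptions for illustration, not the protocol of the cited work.

```python
class ControlPlane:
    """Toy electronic control plane that reserves photonic switches for circuits."""

    def __init__(self):
        self._reserved = {}  # switch id -> circuit id currently holding it

    def is_reserved(self, switch):
        return switch in self._reserved

    def setup_path(self, circuit_id, switches):
        """Reserve every switch on the path, or nothing if any switch is busy."""
        taken = []
        for sw in switches:
            if sw in self._reserved:
                # Blocked: undo the partial reservation made so far.
                for t in taken:
                    del self._reserved[t]
                return False
            self._reserved[sw] = circuit_id
            taken.append(sw)
        return True

    def teardown(self, circuit_id):
        """Release all switches held by a circuit once transmission is done."""
        held = [s for s, c in self._reserved.items() if c == circuit_id]
        for sw in held:
            del self._reserved[sw]


cp = ControlPlane()
first = cp.setup_path("a", [(0, 0), (0, 1), (1, 1)])   # succeeds
second = cp.setup_path("b", [(1, 0), (1, 1)])          # blocked at switch (1, 1)
```

Once the path is reserved, the optical message can use the full WDM spectrum of the circuit; the cost is the setup latency paid before transmission starts.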
the main network through which data is routed. The Torus
requires an additional access network, represented by
thinner lines (additional waveguides) and by injection and
ejection blocks, to facilitate entering and exiting the
main network [7].
3.1 TorusNX
TorusNX improves the Torus topology by introducing
a new switch design that eliminates the need for the access
network and directly integrates the gateway into the
main topology.
3.2 Square Root
Square Root (Fig. 3) is an alternative hierarchical topology
optimized to reduce the required number of
waveguide crossings and switching points. Due to the
recursive nature of constructing the Square Root, the
number of nodes along the X and Y dimensions of the topology
must be equal and a positive integer power of two
(i.e., 2, 4, 8, 16, ...). TorusNX and Square Root were both
designed in response to preliminary physical-layer shortcomings
of the Torus, since insertion losses due to waveguide
crossings and the large number of switches have a
dramatic impact on system performance.
Fig. 1. (Color online) Schematic of the TorusNX topology. G blocks
represent gateway photonic switches and X blocks represent 4×4
non-blocking photonic switches. Lines indicate bidirectional links
which are composed of two waveguides that are used for counter-
propagating lightwaves.
Fig. 2. (Color online) Schematic of the TorusNX photonic routers
configured with two WDM partitions: (a) gateway switch and (b)
4×4 non-blocking photonic switch.
Fig. 3. 4×4 Square Root topology.
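The size constraint stated for the Square Root topology (equal X and Y dimensions, each a positive integer power of two) can be checked with a few lines of Python. The function name is hypothetical, and the bit trick is simply a standard way to test for a power of two.

```python
def is_valid_square_root_size(nodes_x, nodes_y):
    """True if the grid is square and each side is 2, 4, 8, 16, ..."""
    return (
        nodes_x == nodes_y
        and nodes_x >= 2
        # A power of two has exactly one bit set, so n & (n - 1) is zero.
        and (nodes_x & (nodes_x - 1)) == 0
    )


valid_sizes = [n for n in range(1, 20) if is_valid_square_root_size(n, n)]
```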
In unstructured networks, such as that in Gnutella, the
topology of the network is built up over time as peers
choose neighbors essentially randomly. Without any outside
interference, such networks tend toward a power-law
distribution, where the number of neighbors of the
$i$th most connected peer is proportional
to $1/i^{\alpha}$. Here, $\alpha$ is a constant that determines the skew of
the distribution. For such networks, random walk
searches have been shown to be effective [27,28]. A simple random
walk search starts at one peer in the network, and is
processed over that peer's content. That peer then forwards
the search to a random neighbor, who processes
and forwards the query again. In this way, the search
"walks" randomly around the network, until it terminates,
either because enough results have been found or
because a time-to-live (TTL) has been reached [28].
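The simple random walk search described above can be sketched in Python as follows. The sketch assumes that a single matching result is enough to stop the search and that the TTL counts forwarding hops; the function name and the adjacency-list representation are hypothetical choices for illustration.

```python
import random


def random_walk_search(neighbors, content, start, wanted, ttl, rng=None):
    """Walk randomly from `start` until `wanted` is found or the TTL runs out.

    neighbors: dict mapping peer -> list of neighboring peers
    content:   dict mapping peer -> set of items stored at that peer
    Returns (peer, hops) on success, or None if the TTL is reached first.
    """
    if rng is None:
        rng = random.Random()
    peer = start
    for hops in range(ttl + 1):
        # Each visited peer processes the query over its own content.
        if wanted in content.get(peer, ()):
            return peer, hops
        if hops == ttl or not neighbors[peer]:
            break
        # Forward the query to a uniformly random neighbor.
        peer = rng.choice(neighbors[peer])
    return None


# Two-peer example: the item is one hop away from the starting peer.
toy_neighbors = {0: [1], 1: [0]}
toy_content = {1: {"x"}}
hit = random_walk_search(toy_neighbors, toy_content, start=0, wanted="x", ttl=1)
```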
Consider a peer-to-peer network with $N$ peers. Each peer
$k$ in the network has degree $d_k$ (that is, $d_k$ is the number
of neighbors that $k$ has). The total degree in the network
is $D$, where $D = \sum_{k=1}^{N} d_k$.
We define the square-root topology as a topology where
the degree of each peer is proportional to the square root
of the popularity of the peer's content.
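As a small worked example of this definition, the Python sketch below distributes a total degree D over the peers in proportion to the square root of each peer's content popularity. The function name and the popularity values are hypothetical, and the result is left as real numbers; how to round to integer neighbor counts is not specified here.

```python
import math


def square_root_degrees(popularity, total_degree):
    """Target degree of each peer, proportional to sqrt(popularity)."""
    if any(g < 0 for g in popularity):
        raise ValueError("popularity values must be non-negative")
    roots = [math.sqrt(g) for g in popularity]
    total_root = sum(roots)
    if total_root == 0:
        raise ValueError("at least one peer must have non-zero popularity")
    scale = total_degree / total_root
    return [scale * r for r in roots]


# Assumed popularities 1, 4, 9, 16 with total degree D = 20.
degrees = square_root_degrees([1, 4, 9, 16], 20)
```

For these assumed inputs the square roots are 1, 2, 3, and 4, so the total degree of 20 is split into 2, 4, 6, and 8: a peer whose content is sixteen times as popular receives only four times as many neighbors.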
The square-root topology is optimal for simple random
walk searches. But are simple random walk searches the
best search strategy for the square-root topology? Previous
work [27-29] has shown that content movement can
improve simple random walks significantly. However,
we can still optimize random walks for cases
where content movement is not feasible. In this section,
we describe two optimizations that work together to improve
search efficiency for random walks in square-root
networks. Both optimizations introduce determinism into
the routing process, so to avoid routing loops between the
same sets of nodes, state keeping must be used [11]. With
state keeping, nodes remember where they have forwarded
searches and avoid forwarding them to the same
neighbors over and over again.
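State keeping can be sketched as a small change to a random walk: each peer records the neighbors to which it has already forwarded the query and prefers neighbors it has not yet used. The Python sketch below illustrates only this mechanism, not the two optimizations themselves; the function name, the per-query memory, and the reset rule once every neighbor has been tried are assumptions.

```python
import random


def walk_with_state_keeping(neighbors, content, start, wanted, ttl, rng=None):
    """Random walk in which peers avoid re-forwarding to the same neighbor.

    Returns (peer, hops) on success, or None if the TTL is reached first.
    """
    if rng is None:
        rng = random.Random()
    forwarded = {}  # peer -> neighbors this query was already forwarded to
    peer = start
    for hops in range(ttl + 1):
        if wanted in content.get(peer, ()):
            return peer, hops
        if hops == ttl or not neighbors[peer]:
            break
        seen = forwarded.setdefault(peer, set())
        fresh = [n for n in neighbors[peer] if n not in seen]
        if not fresh:
            # Every neighbor was tried already: forget and start over.
            seen.clear()
            fresh = list(neighbors[peer])
        nxt = rng.choice(fresh)
        seen.add(nxt)
        peer = nxt
    return None


# Star network: hub 0 with four leaves; the item is stored on leaf 4.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
found = walk_with_state_keeping(star, {4: {"x"}}, start=0, wanted="x", ttl=7)
```

On this star network the hub never sends the query to the same leaf twice, so the item is found within seven hops regardless of the random choices, whereas a memoryless walk has no such bound.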
[1] E. Robinson, et al., "Photonics for HPEC: a low-
powered solution for high bandwidth applications,"
in OFC, OWH2 (2011).
[2] C. Batten, et al., "Building many-core processor-to-
DRAM networks with monolithic CMOS silicon pho-
tonics," IEEE Micro 29 (4) pp. 8-21 (2009).
[3] D. Vantrease, et al., "Corona: System implications of
emerging nanophotonic technology," in ISCA, pp.
153-164 (2008).
[4] A. Shacham, et al., "Photonic networks-on-chip for
future generations of chip multiprocessors," IEEE
Trans. Comput. 57 (9) pp. 1246-1260 (2008).
[5] C. Batten, et al., "Building many-core
processor-to-DRAM networks with monolithic CMOS silicon
photonics," IEEE Micro 29 (4) 8-21 (2009).
[6] D. Vantrease, et al., "Corona: System implications of
emerging nanophotonic technology," in ISCA '08:
Proc. of the 35th International Symposium on Computer
Architecture, pp. 153-164 (2008).
[7] A. Shacham, K. Bergman, and L. P. Carloni, "Photonic
networks-on-chip for future generations of chip
multiprocessors," IEEE Trans. on Computers 57 (9)
1246-1260 (September 2008).
[8] F. Xia, L. Sekaric, and Y. Vlasov, "Ultracompact optical
buffers on a silicon chip," Nat. Photonics, vol. 1,
pp. 65-71, 2006.
[9] M. Gnan, S. Thoms, D. Macintyre, R. De La Rue, and
M. Sorel, "Fabrication of low-loss photonic wires in
silicon-on-insulator using hydrogen silsesquioxane
electron-beam resist," Electron. Lett., vol. 44, no. 2,
pp. 115-116, Jan. 2008.
[10] W. Bogaerts, P. Dumon, D. V. Thourhout, and R.
Baets, "Low-loss, low-cross-talk crossings for
silicon-on-insulator nanophotonic waveguides," Opt. Lett.,
vol. 32, no. 19, pp. 2801-2803, 2007.
[11] B. Little, J. Foresi, G. Steinmeyer, E. Thoen, S. Chu, H.
Haus, E. Ippen, L. Kimerling, and W. Greene,
"Ultra-compact Si-SiO2 microring resonator optical channel
dropping filters," IEEE Photon. Technol. Lett., vol. 10,
no. 4, pp. 549-551, Apr. 1998.
[12] B. Lee, A. Biberman, P. Dong, M. Lipson, and K.
Bergman, "All-optical comb switch for multiwavelength
message routing in silicon photonic networks,"
IEEE Photon. Technol. Lett., vol. 20, no. 10,
pp. 767-769, May 2008.
[13] S. Manipatruni, Q. Xu, B. Schmidt, J. Shakya, and M.
Lipson, "High speed carrier injection 18 Gb/s silicon
micro-ring electro-optic modulator," in 20th Annu.
Meeting of the IEEE Lasers and Electro-Optics Society
(LEOS), Oct. 2007, pp. 537-538.
[14] S. Assefa, B. G. Lee, C. Schow, W. M. Green, A.
Rylyakov, R. A. John, and Y. A. Vlasov, "20 Gbps receiver
based on germanium photodetector hybrid-integrated
with 90 nm CMOS amplifier," in CLEO: 2011 - Laser
Applications to Photonic Applications,
2011, PDPB11.
[15] N. Ophir, K. Padmaraju, A. Biberman, L. Chen, K.
Preston, M. Lipson, and K. Bergman, "First demonstration
of error-free operation of a full silicon on-chip
photonic link," in Optical Fiber Communication
Conf., 2011, OWZ3.
[16] C. Batten, A. Joshi, J. Orcutt, A. Khilo, B. Moss, C.
Holzwarth, M. Popovic, H. Li, H. Smith, J. Hoyt, F.
Kartner, R. Ram, V. Stojanovic, and K. Asanovic,
"Building many-core processor-to-DRAM networks
with monolithic CMOS silicon photonics," IEEE Micro,
vol. 29, no. 4, pp. 8-21, July-Aug. 2009.
[17] N. Kirman, M. Kirman, R. K. Dokania, J. F. Martinez,
A. B. Apsel, M. A. Watkins, and D. H. Albonesi, "On-chip
optical technology in future bus-based multicore
designs," IEEE Micro, vol. 27, no. 1, pp. 56-66, 2007.
[18] M. J. Cianchetti, J. C. Kerekes, and D. H. Albonesi,
"Phastlane: a rapid transit optical routing network,"
in Proc. of the 36th Annu. Int. Symp. on Computer
Architecture (ISCA), 2009, pp. 441-450.
[19] D. Vantrease, R. Schreiber, M. Monchiero, M. McLaren,
N. P. Jouppi, M. Fiorentino, A. Davis, N. Binkert,
R. C. Beausoleil, and J. H. Ahn, "Corona: System
implications of emerging nanophotonic technology," in
Proc. of the 35th Annu. Int. Symp. on Computer
Architecture (ISCA), June 2008, pp. 153-164.
[20] S. Koohi, M. Abdollahi, and S. Hessabi, "All-optical
wavelength-routed NoC based on a novel hierarchical
topology," in 5th IEEE/ACM Int. Symp. on
Networks on Chip (NoCS), May 2011, pp. 97-104.
[21] A. Shacham, K. Bergman, and L. Carloni, "Photonic
networks-on-chip for future generations of chip
multiprocessors," IEEE Trans. Comput., vol. 57, no. 9, pp.
1246-1260, Sept. 2008.
[22] G. Hendry, S. Kamil, A. Biberman, J. Chan, B. Lee, M.
Mohiyuddin, A. Jain, K. Bergman, L. Carloni, J.
Kubiatowicz, L. Oliker, and J. Shalf, "Analysis of photonic
networks for a chip multiprocessor using scientific
applications," in 3rd ACM/IEEE Int. Symp. on
Networks-on-Chip (NOCS), May 2009, pp. 104-113.
[23] J. Chan, G. Hendry, A. Biberman, and K. Bergman,
"Architectural exploration of chip-scale photonic
interconnection network designs using physical-layer
analysis," J. Lightwave Technol., vol. 28, no. 9, pp.
1305-1315, May 2010.
[24] G. Hendry, J. Chan, S. Kamil, L. Oliker, J. Shalf, L.
Carloni, and K. Bergman, "Silicon nanophotonic
network-on-chip using TDM arbitration," in 2010 IEEE
18th Annu. Symp. on High Performance Interconnects
(HOTI), Aug. 2010, pp. 88-95.
[25] C. Qiao and R. Melhem, "Reducing communication
latency with path multiplexing in optically
interconnected multiprocessor systems," IEEE Trans. Parallel
Distrib. Syst., vol. 8, no. 2, pp. 97-108, Feb. 1997.
[26] X. Yuan, R. Melhem, and R. Gupta, "Distributed path
reservation algorithms for multiplexed all-optical
interconnection networks," IEEE Trans. Comput., vol.
48, no. 12, pp. 1355-1363, Dec. 1999.
[27] L. Adamic, R. Lukose, A. Puniyani, and B. Huber-
man. Search in power-law networks. Phys. Rev. E,
64:46135-46143, 2001.
[28] E. Cohen and S. Shenker. Replication strategies in
unstructured peer-to-peer networks. In Proc. SIG-
COMM, 2002.
[29] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker. Search
and replication in unstructured peer-to-peer networks.
In Proc. Int'l Conf. on Supercomputing (ICS),
