Teamour Esmaeili, Dep. of Computer Engineering, DareShahr Branch, Islamic Azad University, Iran
Ghazal Lak, Dep. of Computer Engineering, DareShahr Branch, Islamic Azad University, Iran
Akram Noori Rad, Dep. of Computer Engineering, DareShahr Branch, Islamic Azad University, Iran

Abstract
The performance of modern chip-scale computing systems is increasingly determined by the characteristics of the interconnection network. Photonic technology has been proposed as an alternative to traditional electronic interconnects for its advantages in bandwidth density, latency, and power efficiency. Circuit-switched photonic network architectures take advantage of the optical spectrum to create high-bandwidth transmission links by transmitting data channels on multiple parallel wavelengths. The square-root principle is known to achieve low search time for peer-to-peer search techniques that do not utilize query routing indices (e.g., query flooding or random walk searches).

Index Terms: Square-Root-NOC, chip multiprocessor (CMP), high-performance computing (HPC).

1 INTRODUCTION
Future high-performance computing (HPC) systems will increasingly be met with more stringent physical demands and growing performance requirements to meet the needs of emerging HPC applications [1]. Performance requirements have thus far been met through the increased parallelism of chip multiprocessors (CMPs) and greater memory capacity; however, the current trend in processor and memory scaling will ultimately be met with fundamental roadblocks. Interconnecting the cores on a CMP is becoming increasingly difficult due to physical limitations in wire densities and packaging power dissipation. Moreover, memory and I/O communications are also being bottlenecked by pin density and limited surface area on the chip package.
These challenges have led system architects to investigate alternative technologies for handling the communication infrastructure required by HPC systems. Integrated silicon photonics is currently seen as an attractive interconnect solution for mitigating the bandwidth and energy bottlenecks that are facing HPC systems. Photonic interconnects offer orders-of-magnitude improved bandwidth density over electronics by leveraging wavelength-division multiplexing (WDM) to concurrently transmit multiple spectrally parallel streams of data through a single optical waveguide. Photonics also affords improved energy efficiency by eliminating the need for the electronic buffers and switches that are required within typical electronic networks. These notable advantages have prompted the proposal of many novel photonics-enabled interconnect architectures [2]-[4].

Modern computing platforms are leveraging the parallel computation capabilities of chip multiprocessors (CMPs) to achieve improved computational performance and energy efficiency. The transition away from single-core architectures to CMPs occurred due to fundamental limitations in the power dissipation capabilities of current packaging technology, which consequently hampered progress in on-chip clock-rate scaling. This shift toward CMP architectures creates a need for high-performance chip-scale interconnects for core-to-core and core-to-memory communications. Chip-scale interconnects have thus far been implemented with electronics, but they achieve limited on-chip and off-chip bandwidth scaling due to minimum achievable wire pitches and the same power dissipation restrictions that constrict the clock rate. These restrictions will prevent electronics from achieving the performance requirements of current and future applications that are communication bound.
Additionally, there will be difficult challenges in meeting the connectivity requirements of computer systems with an increasing number of cores. New technologies need to be explored to address these issues. Integrated photonic technology is an attractive interconnect solution that can be used to mitigate the energy and bandwidth bottlenecks that are arising in CMP systems.

Photonic interconnects can enable improved bandwidth density by leveraging wavelength-division multiplexing (WDM) to concurrently transmit multiple spectrally parallel streams of data through a single optical waveguide, which contrasts with electronic interconnects that require a unique metal wire per bit stream. As a result, photonics can alleviate the problems facing interconnect subsystems that are reaching limits in wire and input/output (I/O) pin density. Photonics also affords improved energy efficiency by eliminating the need for the electronic buffering and switching that are found in conventional electronic networks. The combined advantages of better bandwidth density and power efficiency make photonic interconnects a serious contender as a technological replacement for electronic interconnects.

JOURNAL OF COMPUTING, VOLUME 4, ISSUE 6, JUNE 2012, ISSN (Online) 2151-9617 https://sites.google.com/site/journalofcomputing WWW.JOURNALOFCOMPUTING.ORG

Recent advancements in silicon nano-photonic technology have opened the possibility of integrating photonics for chip-scale interconnection networks. In comparison to electronics, photonics has the potential to offer higher-bandwidth connections by leveraging the data parallelism offered by wavelength-division multiplexing (WDM). Additionally, the energy dissipated in photonic signaling is effectively distance independent, enabling greater energy efficiency for global chip- and board-scale communications.
Although these advantages exist, optical signaling is incapable of practical in-flight processing or buffering without optical-electronic-optical conversion. Also, signal regeneration in optics cannot be easily accomplished on the CMOS-compatible silicon photonic platform. Photonic messages must therefore be able to propagate the length of the transmission path without accumulating significant optical loss. In light of these constraints, many novel photonic interconnect designs have been proposed for enabling optical data transmission in the chip-scale domain [5-7].

We report, to the best of our knowledge, the first detailed physical-layer analysis of chip-scale photonic interconnection networks. Although many photonic topologies have been proposed in an effort to improve computing performance, less emphasis has been placed on understanding whether such designs are feasible from a physical-layer standpoint. Since it is not currently practical to test full network topologies in a laboratory environment due to fabrication yield limitations, we implement physically accurate simulation models for this analysis. We model the previously proposed Torus topology [7], and introduce two new topologies, TorusNX and Square Root. These three networks are analyzed in terms of three physical-layer metrics that play a critical role in determining the overall scalability and performance of the network design: insertion loss, crosstalk, and energy.

2 PHOTONIC INTERCONNECTION NETWORKS
Advancements in silicon photonic device technology have brought about the development of all the functional components necessary for constructing chip-scale interconnection networks based on photonics. The set of fundamental devices includes waveguides [8,9], bends [8], crossings [10], filters [11], switches [12], modulators [13], and detectors [14].
Replicating the functionality of electronic interconnect designs with these photonic devices is possible; however, the advantages that photonic technology offers will not be fully realized, since the behavior and characteristics of photonic devices are fundamentally different from those of their electronic counterparts. In what can be considered the first step toward a full-scale photonic platform, Ophir et al. demonstrated the operation of an optical bus (i.e., point-to-point link) operating at a data rate of 3 GHz [15]. Network architects have also proposed a variety of advanced novel interconnect designs in order to fully leverage the capabilities of photonics [16-24].

Wavelength-routed topologies are constructed using ring-resonator-based filters, which route light waves based on their wavelengths [16-20]. Any source node can address its intended destination through the selection of an appropriate transmission wavelength (i.e., source routing), which is then guided by the ring filters throughout the network. Transmission latencies can be designed to be extremely short when using wavelength routing, since the propagation delay is simply the time of flight at the speed of light. However, spectral bandwidth that could otherwise have been used to increase communication data rates is spent on routing.

Spatial routing uses electro-optic broadband ring resonators to guide a large set of parallel wavelength channels along an optical path [21-23]. The ring resonators act as comb switches to simultaneously control the path of all incident wavelength channels. Spatial routing requires a priori establishment of the entire optical path, which is typically created using a circuit-switching style methodology.
While spatial routing exhibits longer latencies than wavelength routing due to the overhead of the circuit-switching protocol, it is able to leverage the entirety of the available optical spectrum for data striping to create extremely high bandwidth links. A previous study showed that the circuit-switching overhead can be amortized over large data messages, which are characteristic of certain scientific applications typically executed on high-performance systems [22].

The usage of time-division multiplexing (TDM) has also been previously proposed as a technique for improving optical on-chip network performance [24]. TDM routing temporally divides the transmission medium into a continuous series of frames. Each frame is subdivided into several time slots, each of which represents a different configuration of the entire optical network, and the set of all unique time slots completely connects all nodes in the network. The network is constructed using broadband ring switches, identical to the switches used for spatial routing, which are electro-optically reconfigured at the beginning of each time slot. A queued message at a source node will wait until an appropriate time slot arrives before it begins transmission, which contrasts with the spatial routing mechanism of immediately requesting the circuit allocation.

Incarnations of some of the aforementioned TDM routing and the WSSR concept presented in this work were previously proposed and analyzed for multiprocessor networks and wide-area networks [25,26]. The previous work showed that the use of WDM and TDM was effective for reducing network-level latency. With respect to TDM techniques, a comparison of link multiplexing and path multiplexing was conducted, and it showed that link multiplexing performed better in certain traffic configurations with a significant reduction in design complexity [25].
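As a rough illustration of the TDM idea described above (a frame of time slots, each slot a full network configuration, and the set of slots jointly connecting all nodes), the following Python sketch builds one hypothetical frame and computes how long a queued message waits for its slot. The cyclic-shift schedule and the names `tdm_frame` and `slots_to_wait` are assumptions made for illustration; they are not the arbitration scheme of [24], and the sketch ignores waveguide-level contention inside a slot.

```python
def tdm_frame(num_nodes):
    """Build one TDM frame as a list of time slots.

    Assumed schedule (illustrative only): in the slot with shift s, every
    source i is connected to destination (i + s) % num_nodes, so each slot
    is a permutation and the frame covers every source/destination pair.
    """
    return [
        {src: (src + shift) % num_nodes for src in range(num_nodes)}
        for shift in range(1, num_nodes)
    ]


def slots_to_wait(frame, current_slot, src, dst):
    """Number of slots a message queued at src waits until a slot
    connecting src to dst begins (0 means the current slot already does)."""
    n = len(frame)
    for wait in range(n):
        if frame[(current_slot + wait) % n][src] == dst:
            return wait
    raise ValueError("no slot in the frame connects src to dst")


if __name__ == "__main__":
    frame = tdm_frame(4)
    print(len(frame), "slots per frame")
    print("wait for 0 -> 3 starting at slot 0:", slots_to_wait(frame, 0, 0, 3))
```

The waiting loop mirrors the behavior noted in the text: the source does not request a circuit, it simply holds the message until the appropriate slot comes around.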
The alternative WDM technique was also described as having performance characteristics similar to those of the TDM case [26].

A previously proposed circuit-switched topology is the TorusNX, which is designed with a reduced number of crossings and an optimized switching layout [23]. A 4×4 version of the TorusNX is illustrated in Fig. 1, consisting of 16 gateway switches and 16 4×4 non-blocking switches. The structure of each switch configured with two WDM partitions is given in Fig. 2. Each pair of rings (indicated by a red ring and a blue ring) forms the two cascaded rings of a two-partition router. The original single-partition designs can be reconstructed by removing either the red or the blue set of rings from the layout.

3 CHIP-SCALE PHOTONIC NETWORKS
We investigate space-routed photonic networks, which are designed to use actively controlled silicon ring-resonator-based broadband switches to route WDM messages, composed of a set of wavelength channels, from source to destination. The ring resonators are electro-optic devices that can be set either to an off-resonance through state, which allows signals to pass by, or to an on-resonance drop state, which shifts the signal onto another waveguide. An electronic control plane, mirroring the photonic network layout, is necessary to control each broadband switch through a circuit-switching protocol. When a photonic connection is being provisioned, a path-setup message on the control plane traces out an optical path on the photonic plane by reserving and configuring the appropriate optical switches. This form of routing can fully utilize the optical spectrum by leveraging WDM to create extremely high-throughput links. This method contrasts with wavelength-routed networks, which leverage filters and wavelength selectivity to perform routing.
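The path-setup step described above can be sketched in a few lines of Python. This is a minimal toy model under stated assumptions, not the control protocol of the cited works: the class name `TorusControlPlane`, the dimension-order (X then Y) routing, the rule that a switch carries at most one circuit, and the all-at-once reservation (rather than a hop-by-hop path-setup message) are all simplifications introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class TorusControlPlane:
    """Toy electronic control plane for a width x height torus of switches."""
    width: int
    height: int
    # Maps switch coordinate (x, y) -> id of the circuit that reserved it.
    reserved: dict = field(default_factory=dict)

    @staticmethod
    def _step(cur, target, size):
        """+1 or -1, whichever wrap-around direction is shorter."""
        forward = (target - cur) % size
        return 1 if forward <= size - forward else -1

    def route(self, src, dst):
        """Assumed dimension-order route: move along X first, then along Y."""
        x, y = src
        path = [src]
        while x != dst[0]:
            x = (x + self._step(x, dst[0], self.width)) % self.width
            path.append((x, y))
        while y != dst[1]:
            y = (y + self._step(y, dst[1], self.height)) % self.height
            path.append((x, y))
        return path

    def setup(self, circuit_id, src, dst):
        """Reserve every switch on the path, or return None if blocked."""
        path = self.route(src, dst)
        if any(switch in self.reserved for switch in path):
            return None  # some switch is already held by another circuit
        for switch in path:
            self.reserved[switch] = circuit_id
        return path

    def teardown(self, circuit_id):
        """Release all switches held by the given circuit."""
        self.reserved = {
            sw: cid for sw, cid in self.reserved.items() if cid != circuit_id
        }


if __name__ == "__main__":
    plane = TorusControlPlane(4, 4)
    print(plane.setup("a", (0, 0), (2, 1)))
    print(plane.setup("b", (1, 0), (1, 2)))  # blocked at switch (1, 0)
```

Once `setup` returns a path, the optical message can traverse it without buffering, which is why the setup overhead matters less for large messages.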
the main network through which data is routed. The Torus requires an additional access network, represented by thinner lines (additional waveguides) and the injection and ejection blocks, to facilitate entering and exiting the main network [7].

3.1 TorusNX
TorusNX improves the Torus topology by introducing a new switch design that eliminates the need for the access network and directly integrates the gateway into the main topology.

3.2 Square Root
Square Root (Fig. 3) is an alternative hierarchical topology optimized to reduce the required number of waveguide crossings and switching points. Due to the recursive nature of constructing the Square Root, the number of nodes along the X and Y dimensions of the topology must be equal and a positive integer power of two (i.e., 2, 4, 8, 16, ...). TorusNX and Square Root were both designed in response to preliminary physical-layer shortcomings of the Torus, since insertion losses due to waveguide crossings and the large number of switches have a dramatic impact on system performance.

Fig. 1. (Color online) Schematic of the TorusNX topology. 'G' blocks represent gateway photonic switches and 'X' blocks represent 4×4 non-blocking photonic switches. Lines indicate bidirectional links, which are composed of two waveguides that are used for counter-propagating lightwaves.
Fig. 2. (Color online) Schematic of the TorusNX photonic routers configured with two WDM partitions: (a) gateway switch and (b) 4×4 non-blocking photonic switch.
Fig. 3. 4×4 Square Root topology.

4 THE SQUARE-ROOT TOPOLOGY
In unstructured networks, such as that in Gnutella, the topology of the network is built up over time as peers choose neighbors essentially randomly.
Without any outside interference, such networks tend toward a power-law distribution, where the number of neighbors of the $i$th most connected peer is proportional to $1/i^{\alpha}$. Here, $\alpha$ is a constant that determines the skew of the distribution. For such networks, random walk searches have been shown to be effective [27,28]. A simple random walk search starts at one peer in the network, and is processed over that peer's content. That peer then forwards the search to a random neighbor, who processes and forwards the query again. In this way, the search walks randomly around the network until it terminates, either because enough results have been found or because a time-to-live (TTL) has been reached [28].

Consider a peer-to-peer network with $N$ peers. Each peer $k$ in the network has degree $d_k$ (that is, $d_k$ is the number of neighbors that $k$ has). The total degree in the network is $D$, where $D = \sum_{k=1}^{N} d_k$. We define the square-root topology as a topology where the degree of each peer is proportional to the square root of the popularity of the peer's content.

The square-root topology is optimal for simple random walk searches. But are simple random walk searches the best search strategy for the square-root topology? Previous work [27-29] has shown that content movement can improve simple random walks significantly. However, we can still optimize random walks for cases where content movement is not feasible. In this section, we describe two optimizations that work together to improve search efficiency for random walks in square-root networks. Both optimizations introduce determinism into the routing process, so to avoid routing loops between the same sets of nodes, state keeping must be used [11]. With state keeping, nodes remember where they have forwarded searches and avoid forwarding them to the same neighbors over and over again.

REFERENCES
[1] E. Robinson, et al., "Photonics for HPEC: a low-powered solution for high bandwidth applications," in OFC, OWH2 (2011).
[2] C. Batten, et al., "Building many-core processor-to-DRAM networks with monolithic CMOS silicon photonics," IEEE Micro 29 (4) pp. 8-21 (2009).
[3] D. Vantrease, et al., "Corona: System implications of emerging nanophotonic technology," in ISCA, pp. 153-164 (2008).
[4] A. Shacham, et al., "Photonic networks-on-chip for future generations of chip multiprocessors," IEEE Trans. Comput. 57 (9) pp. 1246-1260 (2008).
[5] C. Batten, et al., "Building many-core processor-to-DRAM networks with monolithic CMOS silicon photonics," IEEE Micro 29 (4) pp. 8-21 (2009).
[6] D. Vantrease, et al., "Corona: System implications of emerging nanophotonic technology," in ISCA '08: Proc. of the 35th International Symposium on Computer Architecture, pp. 153-164 (2008).
[7] A. Shacham, K. Bergman, and L. P. Carloni, "Photonic networks-on-chip for future generations of chip multiprocessors," IEEE Trans. on Computers 57 (9) pp. 1246-1260 (September 2008).
[8] F. Xia, L. Sekaric, and Y. Vlasov, "Ultracompact optical buffers on a silicon chip," Nat. Photonics, vol. 1, pp. 65-71, 2006.
[9] M. Gnan, S. Thoms, D. Macintyre, R. De La Rue, and M. Sorel, "Fabrication of low-loss photonic wires in silicon-on-insulator using hydrogen silsesquioxane electron-beam resist," Electron. Lett., vol. 44, no. 2, pp. 115-116, Jan. 2008.
[10] W. Bogaerts, P. Dumon, D. V. Thourhout, and R. Baets, "Low-loss, low-cross-talk crossings for silicon-on-insulator nanophotonic waveguides," Opt. Lett., vol. 32, no. 19, pp. 2801-2803, 2007.
[11] B. Little, J. Foresi, G. Steinmeyer, E. Thoen, S. Chu, H. Haus, E. Ippen, L. Kimerling, and W. Greene, "Ultra-compact Si-SiO2 microring resonator optical channel dropping filters," IEEE Photon. Technol. Lett., vol. 10, no. 4, pp. 549-551, Apr. 1998.
[12] B. Lee, A. Biberman, P. Dong, M. Lipson, and K. Bergman, "All-optical comb switch for multiwavelength message routing in silicon photonic networks," IEEE Photon. Technol. Lett., vol. 20, no. 10, pp. 767-769, May 2008.
[13] S. Manipatruni, Q. Xu, B. Schmidt, J. Shakya, and M. Lipson, "High speed carrier injection 18 Gb/s silicon micro-ring electro-optic modulator," in 20th Annu. Meeting of the IEEE Lasers and Electro-Optics Society (LEOS), Oct. 2007, pp. 537-538.
[14] S. Assefa, B. G. Lee, C. Schow, W. M. Green, A. Rylyakov, R. A. John, and Y. A. Vlasov, "20 Gbps receiver based on germanium photodetector hybrid-integrated with 90 nm CMOS amplifier," in CLEO 2011: Laser Applications to Photonic Applications, 2011, PDPB11.
[15] N. Ophir, K. Padmaraju, A. Biberman, L. Chen, K. Preston, M. Lipson, and K. Bergman, "First demonstration of error-free operation of a full silicon on-chip photonic link," in Optical Fiber Communication Conf., 2011, OWZ3.
[16] C. Batten, A. Joshi, J. Orcutt, A. Khilo, B. Moss, C. Holzwarth, M. Popovic, H. Li, H. Smith, J. Hoyt, F. Kartner, R. Ram, V. Stojanovic, and K. Asanovic, "Building many-core processor-to-DRAM networks with monolithic CMOS silicon photonics," IEEE Micro, vol. 29, no. 4, pp. 8-21, July-Aug. 2009.
[17] N. Kirman, M. Kirman, R. K. Dokania, J. F. Martinez, A. B. Apsel, M. A. Watkins, and D. H. Albonesi, "On-chip optical technology in future bus-based multicore designs," IEEE Micro, vol. 27, no. 1, pp. 56-66, 2007.
[18] M. J. Cianchetti, J. C. Kerekes, and D. H. Albonesi, "Phastlane: a rapid transit optical routing network," in Proc. of the 36th Annu. Int. Symp. on Computer Architecture (ISCA), 2009, pp. 441-450.
[19] D. Vantrease, R. Schreiber, M. Monchiero, M. McLaren, N. P. Jouppi, M. Fiorentino, A. Davis, N. Binkert, R. G. Beausoleil, and J. H. Ahn, "Corona: System implications of emerging nanophotonic technology," in Proc. of the 35th Annu. Int. Symp. on Computer Architecture (ISCA), June 2008, pp. 153-164.
[20] S. Koohi, M. Abdollahi, and S. Hessabi, "All-optical wavelength-routed NoC based on a novel hierarchical topology," in 5th IEEE/ACM Int. Symp. on Networks on Chip (NoCS), May 2011, pp. 97-104.
[21] A. Shacham, K. Bergman, and L. Carloni, "Photonic networks-on-chip for future generations of chip multiprocessors," IEEE Trans. Comput., vol. 57, no. 9, pp. 1246-1260, Sept. 2008.
[22] G. Hendry, S. Kamil, A. Biberman, J. Chan, B. Lee, M. Mohiyuddin, A. Jain, K. Bergman, L. Carloni, J. Kubiatowicz, L. Oliker, and J. Shalf, "Analysis of photonic networks for a chip multiprocessor using scientific applications," in 3rd ACM/IEEE Int. Symp. on Networks-on-Chip (NOCS), May 2009, pp. 104-113.
[23] J. Chan, G. Hendry, A. Biberman, and K. Bergman, "Architectural exploration of chip-scale photonic interconnection network designs using physical-layer analysis," J. Lightwave Technol., vol. 28, no. 9, pp. 1305-1315, May 2010.
[24] G. Hendry, J. Chan, S. Kamil, L. Oliker, J. Shalf, L. Carloni, and K. Bergman, "Silicon nanophotonic network-on-chip using TDM arbitration," in 2010 IEEE 18th Annu. Symp. on High Performance Interconnects (HOTI), Aug. 2010, pp. 88-95.
[25] C. Qiao and R. Melhem, "Reducing communication latency with path multiplexing in optically interconnected multiprocessor systems," IEEE Trans. Parallel Distrib. Syst., vol. 8, no. 2, pp. 97-108, Feb. 1997.
[26] X. Yuan, R. Melhem, and R. Gupta, "Distributed path reservation algorithms for multiplexed all-optical interconnection networks," IEEE Trans. Comput., vol. 48, no. 12, pp. 1355-1363, Dec. 1999.
[27] L. Adamic, R. Lukose, A. Puniyani, and B. Huberman, "Search in power-law networks," Phys. Rev. E, vol. 64, pp. 46135-46143, 2001.
[28] E. Cohen and S. Shenker, "Replication strategies in unstructured peer-to-peer networks," in Proc. SIGCOMM, 2002.
[29] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, "Search and replication in unstructured peer-to-peer networks," in Proc. Int'l Conf. on Supercomputing (ICS), 2002.
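As a supplementary illustration of Section 4, the Python sketch below shows one way the square-root degree rule and a simple random walk search with a time-to-live could be expressed. It is a sketch under assumptions, not the authors' implementation: the function names, the dictionary-based network representation, the rounding of target degrees, and the single-result termination rule are all hypothetical choices, and the state-keeping optimization is not modeled.

```python
import math
import random


def square_root_degrees(popularity, total_degree):
    """Assign each peer a target degree proportional to the square root of
    the popularity of its content, scaled so the degrees sum to roughly
    total_degree (rounded, with an assumed floor of one neighbor)."""
    roots = {peer: math.sqrt(p) for peer, p in popularity.items()}
    scale = total_degree / sum(roots.values())
    return {peer: max(1, round(scale * r)) for peer, r in roots.items()}


def random_walk_search(neighbors, content, start, wanted, ttl, rng=random):
    """Simple random walk: process the query at the current peer, then
    forward it to a uniformly random neighbor until the item is found or
    the TTL is exhausted. Returns (peer or None, hops taken)."""
    peer = start
    for hops in range(ttl + 1):
        if wanted in content[peer]:
            return peer, hops
        if hops == ttl or not neighbors[peer]:
            break
        peer = rng.choice(neighbors[peer])
    return None, hops


if __name__ == "__main__":
    print(square_root_degrees({"a": 1, "b": 4, "c": 9}, total_degree=12))
    ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
    items = {"a": set(), "b": set(), "c": {"x"}}
    print(random_walk_search(ring, items, "a", "x", ttl=5))
```

Passing a seeded `random.Random` instance as `rng` makes a walk reproducible, which is convenient when comparing topologies.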