Computers and Electrical Engineering: Peng Dai, Jenhui Chen, Yiqiang Zhao, Yen-Han Lai

Computers and Electrical Engineering xxx (2015) xxx–xxx
A study of a wire–wireless hybrid NoC architecture with an energy-proportional multicast scheme for energy efficiency ☆
Peng Dai a, Jenhui Chen b,⇑,1, Yiqiang Zhao a, Yen-Han Lai b
a School of Electronic and Information Engineering, Tianjin University, Nankai, Tianjin 300072, PR China
b Department of Computer Science and Information Engineering, School of Electrical and Computer Engineering, College of Engineering, Chang Gung University, Kweishan, Taoyuan 33302, Taiwan, ROC
Article info

Article history:
Received 16 February 2014
Received in revised form 2 June 2015
Accepted 3 June 2015
Available online xxxx

Keywords:
CMP
CSMA
Energy efficiency
Multicast
Network-on-chip
Wireless

Abstract

The efficiency of interconnect network-on-chip (NoC) design significantly affects the thermal and energy-consumption problems. The wireless interconnect NoC (WiNoC) design provides a promising NoC architecture for multicast in chip multiprocessor (CMP) as compared with fully wired NoC. However, wireless routers (WRs) cost a larger area size as well as larger energy consumption than wired routers do. In this paper, we study a 2-tier wire–wireless hybrid NoC (WHNoC) architecture in an $(N \times S)$-processing-element (PE) CMP where N PEs use wires to connect with a wireless-enabled hub forming a star topology subnet and S wireless-enabled hubs form a fully connected WiNoC, named sWHNoC. We first investigate the performance of slotted p-persistent carrier sense multiple access (CSMA) protocol on the fully connected WiNoC. To greatly reduce the energy consumption of WiNoC, we propose an energy-proportional multicast scheme (EMS) by using a power-gating (PG) technique to switch off non-member WRs during the period of multicast transmission. A comprehensive comparison of the star-ring WHNoC, the mesh-based WHNoC, and the proposed sWHNoC is studied. Detailed analyses of the energy consumption of sWHNoC are presented. The correctness of analysis is validated by using Orion 2.0 simulator. Based on our investigation, the sWHNoC with the slotted p-persistent CSMA and the EMS will significantly reduce the energy consumption as well as the transmission latency in CMP.
© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
Cache coherence among processing elements (PEs) in a multiprocessor system-on-chip (MPSoC) or chip multiprocessor (CMP) architecture is a fundamental problem that dominates multiprocessing performance as well as energy efficiency. During the last decade, network-on-chip (NoC) technology has emerged as a communication backbone that enables a high degree of integration in CMP [1]. Unlike in bus-based systems, PEs in an NoC communicate with each other by sending data packets across an on-chip network instead of driving voltage signals across a dedicated bus [2]. Although the traditional planar metal NoC architecture performs better than the bus-based one, it is still limited by high latency and large
☆ Reviews processed and recommended for publication to the Editor-in-Chief by Associate Editor Dr. M. Daneshtalab.
⇑ Corresponding author at: Department of Computer Science and Information Engineering, Chang Gung University, No. 259, Wen-Hwa 1st Road, Kweishan, Taoyuan 33302, Taiwan, ROC.
E-mail addresses: daipeng000@tju.edu.cn (P. Dai), jhchen@mail.cgu.edu.tw (J. Chen), yq_zhao@tju.edu.cn (Y. Zhao), nelovico@gmail.com (Y.-H. Lai).
1 This work was supported in part by the Ministry of Science and Technology, Taiwan, R.O.C., under Contract MOST 103-2221-E-182-042.
http://dx.doi.org/10.1016/j.compeleceng.2015.06.005
0045-7906/© 2015 Elsevier Ltd. All rights reserved.
power consumption due to the parasitic RC of the metal lines [3]. With the introduction of viable millimeter-wave (mm-Wave) antennas and on-chip transceiver technology [4], the wireless interconnect network-on-chip (WiNoC) architecture has been investigated as a way to enhance multiprocessing performance by reducing the number of multi-hop links [5–7].
Traditionally, an efficient way to transmit packets from one PE to multiple PEs (e.g., for cache coherence among PEs on a CMP) is multicast transmission [8–15]. However, these studies of multicast transmission are all based on wired interconnection and need a multicast routing protocol to transmit multicast packets. These packets usually reach their destinations over multiple hops, which lengthens the transmission delay and costs considerable energy [16–18].
The inherent broadcast property of WiNoC promises to enhance multicast transmission performance in CMP as compared with the 3-D topological NoC [19], the optical NoC [20], and the radio-frequency interconnect (RF-I) NoC [21], since a multicast requires only a single transmission (i.e., one-hop transmission). Besides, these NoC technologies (3-D NoC, optical NoC, and RF-I NoC) are limited by today's semiconductor manufacturing technologies [6]; thus, the complementary metal–oxide–semiconductor (CMOS) compatible WiNoC has a unique opportunity for implementation. Hence, only metal-wire and wireless interconnects can be easily implemented and mass-produced with today's CMOS technologies.
Wired NoC and WiNoC have different merits. First, the wired NoC requires a smaller area overhead (i.e., lower energy consumption and heat) than the WiNoC because each wireless router (WR) of WiNoC requires a large area to implement a wireless interface (WI) consisting of a transmitter, a receiver, and antennas. Secondly, WiNoC has a better chance of achieving lower latency and higher throughput than wired NoC if a multi-hop transmission can be completed in one hop.
Considering the optimization of both transmission latency and energy efficiency, it is attractive to combine these two kinds of NoC architectures to form a wire–wireless hybrid NoC (WHNoC) in CMP [22]. Deb et al. [7] proposed a WHNoC architecture where all PEs are divided into multiple subnets. Each subnet uses a star-ring topology to connect its PEs. The interconnection among subnets uses hybrid wired and wireless links, which are determined by using the principle of small-world graphs. The WiNoC adopts a token-passing protocol, which has several drawbacks. First, the token circulates among all WRs (i.e., round-robin scheduling) whether or not the WRs need to transmit. Secondly, token circulation leads to a long access delay if the number of WRs is large. Thirdly, the token may be lost, which incurs a token re-election overhead. Fourthly, priority access is not easy to implement with a token-passing protocol.
The performance of p-persistent carrier sense multiple access (CSMA) has been shown to be well manageable in a finite population [23–25]. Seo et al. [23] showed that the throughput upper bound (or mean access delay lower bound) can be achieved by choosing an optimal persistence probability p based on the number of backlogged terminals (i.e., WRs in our paper) in a finite-population CSMA system (i.e., a fixed number of WRs on a chip). Thus, the aforementioned drawbacks of the token-passing protocol and the achievements of [23,24] motivate us to adopt the p-persistent CSMA in a fully connected WiNoC architecture to achieve NoC energy efficiency, since the number of PEs on a CMP is fixed.
Although today's CMP technologies perform better than their predecessors, they still consume a lot of energy [26]. In this paper, we focus on the energy efficiency of NoC and study a WHNoC architecture in which all subordinate PEs of a subnet connect by wires to a WI-equipped hub (i.e., a WR) to form a star topology, and the WRs connect with each other in a fully connected topology, called sWHNoC, as shown in Fig. 1(a). To greatly reduce the energy consumption, an
Fig. 1. (a) An 8 × 8 sWHNoC architecture with star topology subnets where PEs connect with a WR via metal wires in the intra-subnets and subnets connect with each other via a fully connected WiNoC in the inter-subnets. (b) The architecture of WR with a sleep transistor ring design.
energy-proportional multicast scheme (EMS) is proposed to support this kind of sWHNoC. The EMS uses a power-gating (PG) technique to temporarily power off WRs when they are not involved in the multicast transmission. To the best of the authors' knowledge, we are the first to propose the slotted p-persistent CSMA for sWHNoC with EMS in the CMP.
The rest of the paper is organized as follows. Section 2 introduces the hybrid architecture and the mechanism of EMS with
PG technique. The energy consumption of EMS is analyzed in Section 3. A performance comparison between CSMA protocol
and token-passing protocol is presented in Section 4. Section 5 evaluates the energy efficiency of EMS with simulation and
analysis results. Finally, some conclusions are given in Section 6.
2. System model
The sWHNoC is a hierarchical architecture that contains two levels: the bottom-level (i.e., the intra-subnet) and top-level
(i.e., the inter-subnet). The sWHNoC is partitioned into S subnets where each subnet occupies one WR and uses the star
topology to connect N PEs belonging to the subnet. The subnet constructs the bottom-level. The top-level is constructed
by the WRs that are inserted in subnets as the central hubs for wireless transmission to connect all the subnets.
2.1. WHNoC architecture
Fig. 1(a) shows a case of the sWHNoC with 64 PEs where $S = 16$. In this case, four PEs are grouped into a subnet and the top-level is a fully connected WiNoC. To distinguish different task assignments, Fig. 1(a) uses different colors to represent different tasks. PEs with the same color belong to the same multicast group. In multicast transmission, only one PE of a multicast group acts as the multicast source; the others act as multicast receivers. PEs that do not belong to the multicast group are called multicast non-member PEs. Similarly, WRs are either multicast member WRs or multicast non-member WRs. Since WRs are shared by PEs, a WR may belong to several multicast groups concurrently, as shown in Fig. 1(a). The member WRs are those WRs that connect to at least one member PE. If the PEs belonging to a WR are all non-member PEs in this multicast transmission, the WR is a non-member WR. We call a subnet with a member WR a member subnet, and a subnet with a non-member WR a non-member subnet. Let $R = \{R_1, R_2, \ldots, R_S\}$ denote the set of subnets. Obviously, a member subnet may contain non-member PEs. For example, $R_3$, $R_5$, $R_7$, and $R_{14}$ are subnets that contain non-member PEs in multicast transmission.
When a multicast packet is generated by a PE, the PE first sends the multicast packet to its connected WR. The WR maintains a first-in-first-out (FIFO) queue to receive the multicast packets and a multicast table using the multicast group identity (MID) as the table entry. Each entry in the multicast table contains one MID and a list of multicast member IDs $\langle R_1, R_2, \ldots, R_i \rangle$, $1 \le i \le S$. If the entry contains the WR's own ID, the WR immediately sends the multicast packet to its other connected multicast members, except the source PE. If the entry contains other WR IDs (i.e., IDs that are not its own), the WR sends the multicast packet via its WI.
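The table lookup described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the class name `WirelessRouter`, the dictionary layout of the multicast table, and the `dispatch` method are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class WirelessRouter:
    """Hypothetical model of a WR holding a multicast table keyed by MID."""
    wr_id: int
    # Assumed layout: MID -> list of member WR IDs for that multicast group.
    multicast_table: dict = field(default_factory=dict)

    def dispatch(self, mid: int) -> dict:
        """Decide where a multicast packet with the given MID must go."""
        members = self.multicast_table.get(mid, [])
        return {
            # Entry contains this WR's own ID: replicate to local member PEs.
            "deliver_locally": self.wr_id in members,
            # Entry contains other WR IDs: transmit through the wireless interface.
            "send_via_wi": any(m != self.wr_id for m in members),
        }
```

For example, a WR with ID 3 whose table maps MID 7 to the members [3, 5] would both replicate the packet locally and transmit it over its WI.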
The wireless transmission mechanism follows the slotted p-persistent CSMA scheme [27,28]. In this scheme, a WR with a multicast packet to send senses the channel. If the channel is sensed idle for one slot, the WR transmits the packet with probability $p$ at the beginning of the next slot or defers with probability $1 - p$. If the channel is sensed busy, the WR keeps sensing the channel until it detects an idle slot and then proceeds as described above. A transmission that fails because at least two WRs transmit at the same time is rescheduled until it succeeds. To make the multicast transmission reliable, a multicast acknowledgment scheme [29] is adopted.
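To make the contention rule concrete, the following sketch simulates one contention round among backlogged WRs. It is a simplified model under stated assumptions: packet and ACK durations are not modeled, a collision is counted as a single event after which contention resumes, and the function name `contend` and its parameters are hypothetical.

```python
import random


def contend(backlogged: int, p: float, rng: random.Random, max_slots: int = 100_000):
    """Return (idle_slots, collisions) seen before exactly one WR transmits alone.

    Assumption: each loop iteration is one contention opportunity that follows
    an idle slot; the busy time of a transmission is abstracted away.
    """
    idle_slots = 0
    collisions = 0
    for _ in range(max_slots):
        # Each backlogged WR independently transmits with probability p.
        senders = sum(1 for _ in range(backlogged) if rng.random() < p)
        if senders == 1:
            return idle_slots, collisions  # successful transmission
        if senders == 0:
            idle_slots += 1                # every WR deferred
        else:
            collisions += 1                # two or more WRs transmitted
    raise RuntimeError("no successful transmission within max_slots")
```

Averaging the returned counts over many seeded runs gives a rough feel for how the choice of p trades idle slots against collisions for a given number of backlogged WRs.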
In the sWHNoC architecture, each multicast packet passes through at least one WR (when all members are in the same subnet) and at most two WRs (inter-subnet transmission). The sWHNoC therefore has a simple transmission process with a fixed number of hops for each multicast transmission (i.e., two or three hops). Let $|L_i|$ denote the size of the $i$th multicast group (i.e., the number of PEs) and $|M_i|$ the number of WRs in this group. For multicast communication efficiency, all members of a multicast group are assumed to be placed close together until the subnet is saturated; thus, we have $|M_i| = \lceil |L_i|/N \rceil$, where $N$ is the subnet size in PEs.
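As a small worked example of the relation between group size and member WRs, the sketch below computes the ceiling of the group size over the subnet size and the resulting hop count. The function names are hypothetical; the hop counts follow the two-or-three-hop statement above, assuming a group confined to one subnet uses two hops (PE to WR to PE) and any larger group uses three.

```python
def member_wr_count(group_size: int, subnet_size: int) -> int:
    """Number of member WRs, ceil(group_size / subnet_size), using integer arithmetic."""
    if group_size < 1 or subnet_size < 1:
        raise ValueError("group_size and subnet_size must be positive")
    return -(-group_size // subnet_size)


def hops_per_multicast(group_size: int, subnet_size: int) -> int:
    """Two hops within one subnet, three hops when the group spans several subnets."""
    return 2 if member_wr_count(group_size, subnet_size) == 1 else 3
```

With a subnet size of 4, a group of 10 PEs spans 3 member WRs and needs three hops, while a group of 4 PEs fits into one subnet and needs two.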
Besides the star topology, two other intra-subnet topologies are considered in other works, namely the mesh topology [5] and the star-ring topology [7]. These topologies make intra-subnet communication more efficient by using wired links between adjacent members instead of going through a central hub. To implement the adjacent wired connections, however, a traditional router is required for every PE, which leads to extra area overhead and increases energy consumption. Because of this drawback, we make a comprehensive comparison of the star topology with these two kinds of topologies in Section 4.
2.2. EMS
During the period of multicast transmission, WRs who belong to the multicast group transmit or receive packets while
multicast non-members do not. Hence, if all multicast non-members can be powered off during the period of time, the
energy will be saved. The degree of energy saving depends on the scale of non-members. Traditionally, in wired NoC, every
PE contains one router to exchange packets with other PEs. All routers form the interconnection NoC for data delivery. To
Fig. 2. An illustration of the multicast packet format with MID.
power off any one of these routers in this fabric will cause disorder of routing functions and, thus, it is not feasible in wired
NoC. In WiNoC, the WRs are independent and non-member WRs can be temporarily powered off during multicast transmis-
sion without causing the fault of transmission.
The term energy-proportional in EMS means that only the multicast member WRs need to be powered on to receive packets, while the non-member WRs are powered off for a given period. To implement EMS, the PG technique is required. The energy consumption is reduced by cutting off the power supply path when the circuit block is in the idle state. Its key component is a series of parallel transistors that can balance the power supply to every part of a block. Fig. 1(b) demonstrates a feasible scheme to implement the PG technique with a ring of "footer" sleep transistors (STs). The footer transistors switch between the virtual VSS (directly connected to the WR) and the permanent VSS (i.e., the true VSS ground). All switch terminals of the sleep transistors of a WR are connected together and controlled by an individual, tiny PG-controller, as shown in Fig. 1(b). Note that each WR has one PG-controller and the PG-controller is never powered off. This tiny PG-controller is composed of only several logic gates; therefore, even though it is never powered off, it does not consume much power [30]. The optimal transistor size and the optimal number of transistors vary with the CMOS technology and its application. They usually depend on the drain current in the ON and OFF states (i.e., $I_{on}$ and $I_{off}$), body bias ($V_{bb}$), power supply ($V_{DD}$), and so on. Keating et al. [30] have presented further details on the design and implementation of sleep transistors under different CMOS technologies and applications.
The EMS is exercised at each WR to determine whether it has to be switched off for a specific period or not. A WR consists
of a multi-bidirectional-ports wormhole router with virtual channels and a WI consisting of a pair of transmitter and receiver
(including antennas) [31–33]. In order to implement multicast communication, a packet replicator block is needed to repli-
cate the packet and transmit it to different output ports.
A cache update message generated in the PE is first divided into fixed-length flits. The transmission between the PE and its connected WR uses flit-based wormhole flow control. The first flit (header flit) uses the MID address as the wormhole routing information to the connected WR. Every WR uses an MID detector (see Fig. 1(b)) to compare the MID with its local ID (LID) list. If the MID matches its LID (i.e., local transmissions), the multiplexer (MUX) is enabled for flit receiving and replicates flits to other connected PEs by using wormhole flow control. In this case, the flits are not packed as a packet and no buffer is needed. For the output transmission (i.e., among WRs), the WR packs these flits into a single packet and adds the MID, sequence number (SEQ), and duration (unit: time slots) to the header of the packet, as shown in Fig. 2. To do so, a buffer is needed for packing/unpacking these flits. Transmission among WRs is packet-based in EMS. We note that packing several short flits into a packet for wireless transmission is more energy-efficient due to the property of wireless transmission [27,28]. We emphasize that the energy consumed by packing/unpacking in the buffer is much lower than the energy consumed by transmitting flits individually via wireless transmission. In each multicast transmission among WRs, each WR follows the multicast receiving procedures described above. If the packet is identified, the WR receives the packet and unpacks it into flits for sending to its connected PEs. Otherwise, the WR abandons receiving the multicast packet and forwards the duration information to its PG-controller, which powers off the WR by switching off all sleep transistors as shown in Fig. 1(b). During the sleep period, the non-member WRs sleep for the duration value (unit: time slots) and are woken up by the PG-controller when the duration expires.
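The receive-or-sleep decision at each WR can be sketched as below. This is an illustrative software model under assumptions, not the hardware design: `PacketHeader`, `PGController`, and `on_header` are hypothetical names, and the controller is modeled as a simple slot counter.

```python
from dataclasses import dataclass


@dataclass
class PacketHeader:
    mid: int        # multicast group identity (MID)
    seq: int        # sequence number (SEQ)
    duration: int   # remaining transmission time, in time slots


class PGController:
    """Always-on controller that gates the sleep transistors of one WR (assumed model)."""

    def __init__(self) -> None:
        self.sleep_slots_left = 0

    @property
    def wr_powered_on(self) -> bool:
        return self.sleep_slots_left == 0

    def power_off(self, duration: int) -> None:
        """Switch off the sleep transistors for the given number of slots."""
        self.sleep_slots_left = max(duration, 0)

    def tick(self) -> None:
        """Advance one time slot; the WR wakes up when the counter reaches zero."""
        if self.sleep_slots_left > 0:
            self.sleep_slots_left -= 1


def on_header(local_ids: set, header: PacketHeader, pg: PGController) -> str:
    """MID detector: member WRs keep receiving, non-member WRs sleep."""
    if header.mid in local_ids:
        return "receive"
    pg.power_off(header.duration)
    return "sleep"
```

A non-member WR that sees a header with a duration of 3 slots stays powered off for three calls to `tick()` and is powered on again afterwards.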
3. Energy consumption analysis
In this section, we mainly focus on the energy consumption caused by multicast transmission in the sWHNoC with EMS. Multicast transmission comprises intra-subnet communication and inter-subnet communication. In intra-subnet communication, data collisions can be avoided by using an arbiter and virtual channels. In inter-subnet communication, the optimal slotted p-persistent CSMA is adopted [23,25].
Fig. 3 illustrates the wireless channel status of the p-persistent CSMA, whose time axis is slotted with a slot duration of $a$ seconds (the propagation delay). To ensure that carrier sensing works well, the slot length is defined as the propagation delay over the farthest distance among the WRs. The system state alternates between the idle period $I$, in which no WR has a packet, and the busy period $B$, in which at least one WR has a packet. The busy period $B$ is divided into several sub-busy periods, and the $j$th sub-busy period (denoted by $B^{(j)}$) consists of a transmission delay (denoted by $D^{(j)}$) followed by a transmission period plus a slot (denoted by $T^{(j)}$), as shown in Fig. 3. In this system, $T^{(j)} = 1 + a$ whether the transmission is successful or not, and $B^{(j)} = D^{(j)} + 1 + a$ for $j = 1, 2, \ldots, \infty$. The packet arrivals of a finite population in each period are assumed to be independent and geometrically distributed [23,34]. Let packets arrive at an empty WR (one with no packets in its buffer) according to a geometric process with probability $q$ ($0 < q < 1$) in any slot, where we assume that $q$ comprises both new and rescheduled packets.
Fig. 3. Channel status in the slotted p-persistent CSMA with PG technique on WR by using EMS.
3.1. Derivation of $N_t$
To evaluate the inter-subnet energy consumption for each packet, $E_{\text{inter}}$, we first need the total expected number of transmissions $N_t$ (including several collisions and one success) that a WR makes before it successfully transmits a data packet. Let $J$ be the number of sub-busy periods in a busy period $B$. The expected value of $J$ was derived in [34] and is given by $\bar{J} = 1/(1-q)^{(1+1/a)S}$; the mean idle period is given by $\bar{I} = a/(1-(1-q)^{S})$. Let $p_n(X_j)$ denote the probability that there are $n$ arrivals among the $S$ WRs in $X_j$ slots and $n \ge 1$. Using the geometric packet arrival rate $q$ at each of the $S$ WRs in a slot, the probability is expressed as

$$p_n(X_j) = \frac{1}{1-(1-q)^{S X_j}} \binom{S}{n} \left[1-(1-q)^{X_j}\right]^{n} (1-q)^{X_j (S-n)}, \tag{1}$$

where $n = 1, 2, \ldots, S$.
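Eq. (1) reads as a binomial distribution over the S WRs, conditioned on at least one arrival, so its values for n = 1, ..., S should sum to one. The sketch below checks this numerically; the function name `arrival_pmf` and the parameter values in the usage note are illustrative assumptions.

```python
from math import comb


def arrival_pmf(n: int, S: int, q: float, X: float) -> float:
    """Probability of n arrivals among S WRs in X slots, given at least one arrival."""
    no_arrival = (1 - q) ** X            # one WR receives no packet during X slots
    binomial = comb(S, n) * (1 - no_arrival) ** n * no_arrival ** (S - n)
    return binomial / (1 - no_arrival ** S)  # condition on n >= 1


def total_probability(S: int, q: float, X: float) -> float:
    """Sum of the conditional probabilities over n = 1..S (should be close to 1)."""
    return sum(arrival_pmf(n, S, q, X) for n in range(1, S + 1))
```

For instance, `total_probability(16, 0.00125, 1)` evaluates to 1 up to floating-point rounding, which is a quick sanity check on the reconstructed normalization term.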
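Let $P_s^{(j)}$ be the probability of successful transmission for a specific WR in the $j$th sub-busy period, as derived in [35]; it is given by

$$
\begin{aligned}
P_s^{(j)} ={}& \frac{p}{1-(1-q)^{X_j S}} \sum_{k=0}^{\infty} \left\{ (1-p)^k - (1-q)^{X_j} \left[ \frac{p(1-p)^k - q(1-q)^k}{p-q} \right] \right\} \\
&\times \left\{ (1-p)^{k+1} - p(1-q)^{X_j} \left[ \frac{(1-p)^{k+1} - (1-q)^{k+1}}{p-q} \right] \right\}^{S-1} \\
&- \frac{pq(1-q)^{X_j S}}{1-(1-q)^{X_j S}} \sum_{k=1}^{\infty} \left[ \frac{(1-q)^k - (1-p)^k}{p-q} \right] \left[ \frac{p(1-q)^{k+1} - q(1-p)^{k+1}}{p-q} \right]^{S-1},
\end{aligned} \tag{2}
$$

where $X_j = 1$ slot for $j = 1$ and $X_j = 1 + 1/a$ slots for $j \ge 2$.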

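The failure probability $P_f$ that a specific WR fails in the first attempt to access the channel [35] is given by

$$P_f = \frac{\bar{I} + \bar{D}^{(1)}}{\bar{I} + \bar{B}} \left[ 1 - \frac{P_s^{(1)}}{1 - P_e^{(1)}} \right] + \frac{\bar{B} - \bar{D}^{(1)}}{\bar{I} + \bar{B}} \left[ 1 - \frac{P_s^{(2)}}{1 - P_e^{(2)}} \right], \tag{3}$$

where $P_e^{(j)}$ is the probability of a specific WR not having any packet arrivals during $B^{(j)}$ and is given in [35]

$$P_e^{(j)} = \frac{1-(1-q)^{X_j (S-n)}}{1-(1-q)^{X_j S}} \, (1-q)^{(X_j + \bar{D}^{(j)}/a)}, \tag{4}$$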
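where $\bar{D}^{(j)}$ is the expected value of $D^{(j)}$ and is given by [34]

$$
\begin{aligned}
\bar{D}^{(j)} ={}& \frac{a}{1-(1-q)^{X_j S}} \sum_{k=1}^{\infty} \left\{ (1-p)^k - (1-q)^{X_j}\, p \left[ \frac{(1-p)^k - (1-q)^k}{p-q} \right] \right\}^{S} \\
&- \frac{a(1-q)^{X_j S}}{1-(1-q)^{X_j S}} \sum_{k=1}^{\infty} \left[ \frac{p(1-q)^k - q(1-p)^k}{p-q} \right]^{S}.
\end{aligned} \tag{5}
$$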
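Because Eq. (5) involves infinite sums, it is typically evaluated by truncation. The following sketch does this under stated assumptions: it requires p ≠ q, truncates each sum after a fixed number of terms, and uses the hypothetical function name `mean_delay`. It follows the reading of Eq. (5) in which every minus sign lost in typesetting is restored.

```python
def mean_delay(S: int, p: float, q: float, a: float, X: float, terms: int = 5000) -> float:
    """Truncated numerical evaluation of the mean transmission delay of Eq. (5).

    S: number of WRs, p: persistence probability, q: per-slot arrival probability,
    a: slot duration, X: number of slots X_j preceding the sub-busy period.
    Assumes p != q; `terms` is the truncation point of the infinite sums.
    """
    if p == q:
        raise ValueError("the closed form used here requires p != q")
    g = (1 - q) ** X                      # one WR has no arrival during X slots
    norm = 1 - g ** S                     # at least one of the S WRs has an arrival
    total = 0.0
    for k in range(1, terms + 1):
        # WR (possibly backlogged at the start) has not transmitted after k slots.
        silent = (1 - p) ** k - g * p * ((1 - p) ** k - (1 - q) ** k) / (p - q)
        # WR that was empty at the start has not transmitted after k slots.
        empty = (p * (1 - q) ** k - q * (1 - p) ** k) / (p - q)
        total += silent ** S - g ** S * empty ** S
    return a * total / norm
```

A useful sanity check is the single-WR case: for S = 1 the sum collapses to a(1 − p)/p, so `mean_delay(1, 0.5, 0.1, 0.01, 1)` should be close to 0.01.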
According to (5), the mean busy period $\bar{B}$ is derived in [34] as $\bar{B} = \bar{D}^{(1)} + 1 + a + \left[(1-q)^{-(1+1/a)S} - 1\right]\left(\bar{D}^{(2)} + 1 + a\right)$. Based on (2)–(5), the expected total number of attempts (transmissions) that a specific WR makes until it successfully transmits a packet (including the last successful attempt) can be obtained by

$$N_t = \frac{P_f}{\bar{J} P_s^{(1)}} + \frac{(\bar{J}-1) P_f}{\bar{J} P_s^{(2)}} + 1. \tag{6}$$
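Once the probabilities are known, Eq. (6) and the mean period lengths are simple to evaluate. The helper functions below are a sketch with hypothetical names; the success and failure probabilities are passed in as plain numbers and would come from Eqs. (2) and (3).

```python
def mean_sub_busy_periods(S: int, q: float, a: float) -> float:
    """Mean number of sub-busy periods: 1 / (1 - q)^((1 + 1/a) * S)."""
    return (1 - q) ** (-(1 + 1 / a) * S)


def mean_idle_period(S: int, q: float, a: float) -> float:
    """Mean idle period: a / (1 - (1 - q)^S)."""
    return a / (1 - (1 - q) ** S)


def mean_busy_period(D1: float, D2: float, J: float, a: float) -> float:
    """Mean busy period built from the first and the subsequent sub-busy periods."""
    return D1 + 1 + a + (J - 1) * (D2 + 1 + a)


def expected_attempts(P_f: float, J: float, Ps1: float, Ps2: float) -> float:
    """Eq. (6): expected attempts of a WR, including the final successful one."""
    return P_f / (J * Ps1) + (J - 1) * P_f / (J * Ps2) + 1
```

If the first attempt never fails (P_f = 0), `expected_attempts` returns exactly 1, which matches the interpretation of the trailing "+ 1" as the successful transmission.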
3.2. Derivation of $E_{\text{inter}}$
Let $K_T$ and $K_R$ be the power consumption per bit of the transmitter and receiver (including antennas) in the WI, $K_X^{(n)}$ be the power consumption per bit of a router with $n$ bi-directional ports, and $K_W$ be the power consumption per bit of a wire line (one hop). Let $R_d$ be the transmission data rate (bits per $T$), $\alpha$ be the fraction of $T$ occupied by the packet, and $\beta = 1 - \alpha$ be the fraction of $T$ occupied by the period of ACKs plus $a$. The lower part of Fig. 3 shows the power status of the four parts of the WRs during transmission. The terms $S$, $M$, and $M'$ represent the multicast source, multicast members, and multicast non-members. $S_W$, $S_X$, $S_T$, and $S_R$ represent the power statuses of the wire line, router, transmitter, and receiver of the source WR. Similarly, $M_W$, $M_X$, $M_T$, and $M_R$ represent those of a member WR, and $M'_W$, $M'_X$, $M'_T$, and $M'_R$ represent those of a non-member WR, respectively.
A high level in these waveforms means that the component consumes energy, and a low level means that it is powered off. When a multicast
packet is generated by a source PE, the PE will send the multicast packet to its connected WR first. After receiving the packet,
the WR uses the WI to transmit the packet to other WRs. At the beginning of transmission, all WRs are powered on to receive
the MID field of the packet. If the MID matches the LID, the multicast members continuously receive the rest of the packet.
When the packet transmission is finished, all WRs are back to the standby status to wait for the next packet transmission.
According to (5), (6), and Fig. 3, the energy consumption of a multicast source for one packet transmission (including collisions) is given by

$$E_S = R_d \left[ \left(\bar{D}^{(j)} + 1\right) K_W + \alpha K_T + \beta K_R + K_X^{(N+1)} \right] \left[ \frac{P_f}{\bar{J} P_s^{(1)}} + \frac{(\bar{J}-1) P_f}{\bar{J} P_s^{(2)}} + 1 \right]. \tag{7}$$
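Since all multicast member WRs except the source WR have to receive the packet, the average energy consumption of multicast receivers can be obtained by

$$
\begin{aligned}
E_M ={}& \left(|M_i| - 1\right) R_d \left\{ \gamma_m a K_R \left[ \frac{P_f}{\bar{J} P_s^{(1)}} + \frac{(\bar{J}-1) P_f}{\bar{J} P_s^{(2)}} \right] + \left[ \alpha K_R + a K_T + \left(\alpha + a - \gamma_m a\right) K_X^{(N+1)} \right] \right\} \\
&+ \left(|L_i| - 1\right) R_d \left[ \left(\alpha + a - \gamma_m a\right) K_W \right],
\end{aligned} \tag{8}
$$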
where $\gamma_m$ represents the length of the MID field (in slots). Since all non-multicast members $M'_i$ have to detect the MID in each packet, the average energy consumption of the non-members is

$$E_{M'} = \left(S - |M_i|\right) R_d \left(\gamma_m a K_R\right) \left[ \frac{P_f}{\bar{J} P_s^{(1)}} + \frac{(\bar{J}-1) P_f}{\bar{J} P_s^{(2)}} + 1 \right]. \tag{9}$$
The inter-subnet energy consumption for each packet transmission (including collisions and one success) is obtained by

$$E_{\text{inter}} = E_S + E_M + E_{M'}. \tag{10}$$
3.3. Derivation of $E_c$
If all the multicast member PEs are in a single subnet, only intra-subnet communication takes place. The energy consumption can be expressed as

$$E_{\text{intra}} = R_d \left( |L_i| K_W + K_X^{(N+1)} \right). \tag{11}$$

We define $P_t$ as the probability of inter-subnet communication, so that $1 - P_t$ is the probability of intra-subnet communication. Hence, the normalized energy consumption per PE for each packet of the whole system is

$$E_c = \frac{(1 - P_t) E_{\text{intra}} + P_t E_{\text{inter}}}{NS}. \tag{12}$$
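Eqs. (11) and (12) can be combined in a few lines of code. The sketch below uses hypothetical function names, and the numeric values in the usage note are placeholders: in particular the router energy per bit is an assumed number, since the actual values are read from the simulation results.

```python
def intra_energy(R_d: float, group_size: int, K_W: float, K_X: float) -> float:
    """Eq. (11): energy of an intra-subnet multicast.

    K_X is the per-bit router energy for N + 1 bi-directional ports.
    """
    return R_d * (group_size * K_W + K_X)


def energy_per_pe(P_t: float, E_intra: float, E_inter: float, N: int, S: int) -> float:
    """Eq. (12): energy per packet normalized by the N * S PEs of the system."""
    return ((1 - P_t) * E_intra + P_t * E_inter) / (N * S)
```

For example, with an assumed `K_X` of 1.0 pJ/bit, `intra_energy(200, 4, 0.154, 1.0)` gives 323.2, and `energy_per_pe(0.5, 100.0, 300.0, 4, 16)` gives 3.125 in the same units.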
4. Performance metrics of CSMA and token-passing protocols
In this section, we compare the performance of CSMA and token-passing protocols. To study the performance metrics, we
use simulation to compare the CSMA protocol with the token-passing protocol in terms of area cost, energy consumption,
and MAC access delay. Both CSMA and token-passing protocols use the time division multiplexing (TDM) scheme to access
the wireless medium. The time is divided into slots. A token is circulated among WRs in sequence for obtaining the medium
access right in the token-passing protocol [38]. The token is passed via the wireless channel. For simplicity, we assume the
token will not be lost during token passing process.
4.1. Area cost
Since CSMA and token-passing protocols are algorithms, the hardware overhead is not related to the protocol. Thus the
area cost is the same.
4.2. MAC delay
In this subsection, we study the MAC delay $D_M$ (in slots), which is defined as the time from when a packet (composed of several flits) reaches the head of the queue (waiting to be transmitted) to when the packet starts to be transmitted. For a fair comparison, the length of each packet, $T_{tok}$, is a fixed value of 10 time slots in both the CSMA and token-passing protocols. Passing the token from one WR to another takes a time duration $T_{tp}$. Usually the size of a token is 48 bits, or equal to 6 flits. To clearly demonstrate the performance metrics, we define a ratio $\eta = T_{tok}/T_{tp}$, which shows the impact of different token sizes on the performance metrics. In the token-based protocol, if a WR has no packets to transmit, it passes the token to the next WR immediately, taking $T_{tp}$. Each WR has $T_{tok}$ to transmit packets once it holds the token. Conversely, in the CSMA protocol, each WR transmits packets with a probability (e.g., $p = 0.03$ [34]). Packet transmissions may collide if two or more WRs transmit their packets in the same slot.
Fig. 4 shows the average $D_M$ for different $S$. $D_M$ of the token-passing protocol grows with $S$ because all WRs circulate the token in turn, whereas $D_M$ of CSMA increases only slowly with $S$ because CSMA uses a contention strategy with an appropriate $p$ to reduce the collision probability. This shows that the token-passing protocol is not suitable for large-scale CMPs, and it is why CSMA can be adopted for communications in WiNoC. Overall, the CSMA protocol outperforms the token-passing protocol, and the gap widens as $S$ increases. Additionally, the impact of the token size on $D_M$ is higher when the traffic load is heavy.
4.3. Energy consumption
To compare energy overhead, a 2-D mesh architecture of token-passing is adopted. The number of WRs is $S = n + k$, where $n$ is the number of WRs that have no packets to transmit and $k$ is the number of WRs that have packets to transmit (i.e., source nodes). Therefore, circulating the token for one cycle takes $k T_{tok} + S T_{tp}$. Suppose the system has a total traffic load $L_{total}$ to be delivered. The time this takes is $L_{total}/R_d$. Then each source WR (i.e., one that has packets to transmit) needs $L_{total}/(R_d k T_{tok})$ turns to transmit its packets. During one token-passing cycle, each WR has only $T_{tok}$ time slots to transmit packets and spends $(k-1) T_{tok} + S T_{tp}$ time slots as a receiver. Since communication among WRs uses a fully connected topology and no specific power saving mechanism is applied, the energy consumption for a source WR per cycle can be simplified as $R_d [T_{tok} M_T + (S-1) T_{tok} M_R]$, where $M_T$ and $M_R$ are the power per bit of a WR in transmitting mode and receiving mode, respectively. Considering the energy overhead during the token-passing period, the total energy per cycle of the NoC with $k$ WRs
Fig. 4. Medium access delay comparison between CSMA and token-passing protocols. (Plot of $D_M$ in slots versus $S$ for CSMA and token-based protocols with $G = 0.1$ and $G = 0.9$, and $\eta = 3$ and $\eta = 10$.)
Fig. 5. Energy comparison between CSMA and token-passing protocols. (Plot of normalized energy per WR in nJ/bit versus $S$ for CSMA and token-passing with $\eta = 1.3$, $\eta = 3$, and $\eta = 10$.)
that have flits to transmit is $R_d \{ k [T_{tok} M_T + (S-1) T_{tok} M_R] + S T_{tp} (M_T + M_R) \}$. As it takes $L_{total}/(R_d k T_{tok})$ cycles to transmit, the total energy consumed for the traffic load $L_{total}$ is

$$E_{tok} = R_d \left\{ k \left[ T_{tok} M_T + (S-1) T_{tok} M_R \right] + S T_{tp} (M_T + M_R) \right\} \frac{L_{total}}{R_d k T_{tok}}. \tag{13}$$

The average energy per bit can be presented as:

$$\bar{E}_{tok} = M_T + (S-1) M_R + \frac{S (M_T + M_R)}{\eta k}. \tag{14}$$
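The step from Eq. (13) to Eq. (14) is a division by the total load followed by the substitution of the ratio of packet time to token-passing time. The sketch below checks this numerically; the function names are hypothetical, and the energy values used in the usage note are illustrative stand-ins rather than the parameters used for Fig. 5.

```python
def token_energy_total(L_total, R_d, k, S, T_tok, T_tp, M_T, M_R):
    """Eq. (13): total energy to deliver L_total bits with token passing."""
    per_cycle = R_d * (k * (T_tok * M_T + (S - 1) * T_tok * M_R)
                       + S * T_tp * (M_T + M_R))
    cycles = L_total / (R_d * k * T_tok)   # number of token-passing cycles needed
    return per_cycle * cycles


def token_energy_per_bit(k, S, eta, M_T, M_R):
    """Eq. (14): average energy per bit, with eta = T_tok / T_tp."""
    return M_T + (S - 1) * M_R + S * (M_T + M_R) / (eta * k)
```

With illustrative values such as `T_tok = 10`, `T_tp = 2` (so the ratio is 5), `k = 4`, and `S = 16`, dividing the result of `token_energy_total` by `L_total` reproduces `token_energy_per_bit` up to rounding, and the per-bit energy grows as the ratio shrinks.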
Here we ignore the static power consumption and assume that energy is consumed only when data are transmitted or received. The energy consumption of CSMA can be found in Section 3. Fig. 5 shows the energy consumption of the CSMA and token-passing protocols for varying $S$, where $N = 4$, $p = 0.03$, $a = 0.01$, and $P_t = 0.5$. The energy parameters of the WR follow the same parameters used in [7]. All simulation results are obtained by using the simulator Orion 2.0 (see Section 5). Fig. 5 shows that the energy consumption of the token-passing mechanism depends on $\eta$: the smaller $\eta$ is, the higher the energy consumption, because the token then accounts for a larger portion of each transmission. As Fig. 5 also shows, CSMA does not need the redundant token to decide the access right among WRs. Therefore, CSMA is more energy-efficient than the token-passing mechanism.
5. Experimental results
To illustrate the energy efficiency of EMS, a case of an $h \times h$ NoC with $h = 8$ is examined. Suppose it has $S$ subnets and every subnet contains $N$ PEs (i.e., the subnet size is $N$). The WI component considered in this study is a mm-Wave transceiver with body-enabled techniques [4], which is well studied with a metal zigzag antenna on an mm-Wave NoC in [7] because it provides a wide bandwidth as well as low on-chip power consumption. The power consumption per bit of the transmitter, receiver, and antenna is 0.875 pJ/bit, 0.557 pJ/bit, and 0.3125 pJ/bit, respectively. To evaluate the energy consumption of sWHNoC with EMS, we adopt the NoC simulator Orion 2.0, which has been validated with post-layout simulations of the Intel 80-core Teraflops chip. The resulting power consumption per bit of some parts of the WiNoC is listed in Table 1. $K_W$, $K_T$, and $K_R$ are the power per bit of the wire line (one hop), transmitter, and receiver.
In 16 GHz mm-Wave, one slot a can carry 2 bits and then Rd ¼ 2=a. For instance, if a ¼ 0:0001; T ¼ 2 kbits. The packet arri-
val probability to a WR q ¼ aG=N, where G is offered work load to the chip. Other simulation parameters are
a ¼ 1 32a; b ¼ 32a, and cm ¼ 2. The switch (router) we used in the simulation is the wormhole router which possesses
4 virtual channels in each port and the buffer depth is 32 bits.
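The parameter relations above can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, the period $T$ is taken as normalized to 1 as in the text, and the offered load $G = 0.5$ is an arbitrary value picked for illustration.

```python
import math

BITS_PER_SLOT = 2  # one slot carries 2 bits, as stated in the text


def bits_per_period(a):
    """R_d = 2 / a: bits carried in one normalized period T with slot duration a."""
    if not 0.0 < a <= 1.0:
        raise ValueError("slot duration a must lie in (0, 1]")
    return BITS_PER_SLOT / a


def arrival_probability(a, offered_load, n):
    """Packet arrival probability at a WR, q = a * G / N."""
    return a * offered_load / n


def timing_fractions(a):
    """Return (alpha, beta) with alpha = 1 - 32a and beta = 32a."""
    beta = 32 * a
    return 1.0 - beta, beta


if __name__ == "__main__":
    a = 0.001
    alpha, beta = timing_fractions(a)
    print("bits per T:", bits_per_period(a))
    print("alpha, beta:", alpha, beta)
    # G = 0.5 is an assumed offered load, N = 4 is the subnet size used in the figures.
    print("q:", arrival_probability(a, offered_load=0.5, n=4))
```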
Table 1
Power parameters.
                  $K_W$    $K_T$    $K_R$
Power (pJ/bit)    0.154    1.188    0.869
5.1. Multicast group size
Routers, as the backbone component, consume the most energy in an NoC, and the buffers in the input ports account for most of the router energy. Therefore, the number of bi-direction ports significantly affects the total energy consumption of the NoC. Fig. 6 illustrates the router energy consumption per bit for different numbers of bi-direction ports, obtained from Orion 2.0 simulation results. Note that the number of bi-direction ports is $N + 1$ (i.e., $N$ ports to PEs plus one port to the WI).
According to the above derivation and the simulation results (i.e., $K_X^{(N+1)}$ in Fig. 6), we obtain the normalized energy consumption per PE shown in Fig. 7. In the 64-PE case, $N \times S = 64$ is fixed (i.e., $N$ is inversely proportional to $S$), and the energy consumption increases as $|L_i|$ increases.
As shown in Fig. 7, $N = 4$ is the most energy-efficient of the five cases. A larger subnet needs more bi-direction ports in the router, which consumes more energy during intra-subnet communication. On the other hand, a smaller subnet requires more WRs in the system, which costs much more energy and area. A turning point can also be observed as the subnet size increases. Based on this observation, an appropriate $N$ (and hence $S$) should be chosen for optimal performance tuning. In the 64-PE case, the optimal number of WRs is $S = 16$ (i.e., one quarter of the total number of PEs).
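The subnet sizes considered in this section ($N = 2$, 4, 8, 16, and 32) can be enumerated with a short Python sketch. It only tabulates the structural quantities stated in the text, namely $S = 64/N$ hubs and $N + 1$ bi-direction ports per hub router; the function name is hypothetical and no energy values are computed.

```python
TOTAL_PES = 64
SUBNET_SIZES = (2, 4, 8, 16, 32)


def subnet_configurations(total_pes, subnet_sizes):
    """List (N, S, ports) for every subnet size N that divides the PE count."""
    configs = []
    for n in subnet_sizes:
        if total_pes % n != 0:
            raise ValueError(f"N = {n} does not divide {total_pes} PEs")
        configs.append({"N": n, "S": total_pes // n, "ports": n + 1})
    return configs


if __name__ == "__main__":
    for cfg in subnet_configurations(TOTAL_PES, SUBNET_SIZES):
        print(cfg)
```

The entry for $N = 4$ gives $S = 16$ hubs, each with a 5-port router, which is the configuration the text identifies as optimal.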
5.2. Multicast group distribution
The parameter $P_t$ in (12) reflects the probability of performing inter-subnet communication, which is determined by the multicast group distribution. The more multicast member PEs are gathered in one subnet, the smaller $P_t$ is. Assuming $a = 0.001$, $p = 0.03$, and $|L_i| = 20$, the normalized energy consumption $E_c$ with growing $P_t$ is shown in Fig. 8. Energy consumption increases as $P_t$ grows because the WRs are used more often. At the same value of $P_t$, a smaller subnet size $N$ consumes more energy, and this gap widens as $P_t$ grows, because a smaller $N$ means a larger number of WRs and thus more energy.
Fig. 6. Router energy consumption per bit with different bi-direction ports.
Fig. 7. Comparison of normalized energy consumption in five sizes of subnet by increasing $|L_i|$ when $a = 0.001$, $p = 0.03$, and $P_t = 0.5$.
5.3. The number of bits per T
For the normalized $T$, the slot duration is defined as $a$ in Fig. 3. The parameter $a$ reflects the number of bits per $T$ because $R_d = 2/a$. In general, the more bits are transmitted, the more energy is consumed. Normalizing the energy consumption to each bit therefore gives a clearer basis for comparison, as shown in Fig. 9. A small value of $a$ improves energy efficiency when the subnet size is small (i.e., $N = 2$ or 4). When the subnet size is large (i.e., $N = 8$, 16, or 32), the efficiency gain from a small $a$ is not evident. Fig. 9 also shows that $N = 4$ is the best size for energy efficiency in the 64-PE case.
5.4. Energy comparison with fully wired NoC
The fully wired NoC studied here is a mesh-based wired NoC architecture (i.e., an $8 \times 8$ wired NoC). In the fully wired NoC, multicast transmission is implemented in one of two ways. In the first way, multicast packets are replicated at the source PE, which delivers them to all multicast members through repeated unicast transmissions; this is called the multi-unicast method [36]. Obviously, this way leads to high latency and low throughput. In the second way, multicast packets are replicated inside the routers. The multicast source PE delivers them by either of two methods: the tree-based method [37] or the path-based method [14]. The extra energy caused by replicating multicast packets is ignored for simplicity.
Fig. 8. Normalized energy consumption with different subnet size by varying $P_t$ when $a = 0.001$, $p = 0.03$, and $|L_i| = 20$.
Fig. 9. Normalized energy consumption with different subnet size by varying $a$ when $p = 0.03$, $P_t = 0.5$, and $|L_i| = 20$.
In the tree-based method, the router replicates the packet to every output channel and delivers the copies to the multicast members. Obviously, the tree-based method wastes much energy, especially when the multicast group size is large. The path-based method avoids this unnecessary waste by listing the destination IDs in the header of the delivered packet. According to these destination IDs, the packet is routed directly to the destination PEs, and the multicast non-member PEs do not receive any packet.
In an $h \times h$ wired NoC, every PE occupies a router that contains 5 bi-direction ports (i.e., east, west, north, south, and local). The normalized energy per bit for this router is defined as $K_X^{(5)}$. The energy consumed by the wire line per hop is normalized as $K_W$, and $|L_i|$ represents the size of multicast group $i$ in PEs. For comparison, we adopt a model of multi-unicast for the Orion 2.0 simulator provided by Wang et al. [36] in the wired NoC. The energy consumption per bit of an $h \times h$ NoC can be presented as $E_{mul\text{-}uni} = h^2 (h - 1)(K_X^{(5)} + K_W)$. Using the same model to analyze the tree-based method, the energy consumption per bit is $E_{tre} = (h^2 - 1) K_W + h^2 K_X^{(5)}$. We assume that multicast members belonging to the same group are located near one another, which gives the wired NoC its optimal energy consumption and makes the comparison fair. Thus the path-based energy consumption per bit is derived as $E_{pat} = |L_i| K_X^{(5)} + (|L_i| - 1) K_W$, where $|L_i| \le h^2$.
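The three wired-NoC expressions can be compared numerically with the Python sketch below. The function names are hypothetical. $K_W = 0.154$ pJ/bit is taken from Table 1, while the value used for $K_X^{(5)}$ is a placeholder assumption, since the text does not state it.

```python
import math

K_W = 0.154          # pJ/bit per wire hop, from Table 1
K_X5_ASSUMED = 0.5   # pJ/bit, placeholder for K_X^(5) (assumption)


def multi_unicast_energy(h, k_x5, k_w):
    """E_mul-uni = h^2 * (h - 1) * (K_X^(5) + K_W)."""
    return h ** 2 * (h - 1) * (k_x5 + k_w)


def tree_energy(h, k_x5, k_w):
    """E_tre = (h^2 - 1) * K_W + h^2 * K_X^(5)."""
    return (h ** 2 - 1) * k_w + h ** 2 * k_x5


def path_energy(group_size, h, k_x5, k_w):
    """E_pat = |L_i| * K_X^(5) + (|L_i| - 1) * K_W, valid for 1 <= |L_i| <= h^2."""
    if not 1 <= group_size <= h ** 2:
        raise ValueError("group size must lie between 1 and h^2")
    return group_size * k_x5 + (group_size - 1) * k_w


if __name__ == "__main__":
    h = 8
    print("multi-unicast:", multi_unicast_energy(h, K_X5_ASSUMED, K_W))
    print("tree-based:   ", tree_energy(h, K_X5_ASSUMED, K_W))
    for size in (10, 30, 60):
        print(f"path-based, |L_i| = {size}:", path_energy(size, h, K_X5_ASSUMED, K_W))
```

At $|L_i| = h^2$ the path-based expression equals the tree-based one, so under these formulas the path-based method never costs more than the tree-based method.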
Fig. 11 compares the normalized energy consumption per PE in an $8 \times 8$ NoC for four multicast methods: multi-unicast, tree-based, path-based, and sWHNoC with EMS. Since the energy consumption resulting from the multi-unicast method is too large (i.e., 275.1 nJ/bit), its value is not shown in Fig. 11. The energy consumption of the tree-based method is constant because it does not depend on the multicast group size. The path-based method outperforms sWHNoC with EMS in energy efficiency when $|L_i| < 6$ because the energy consumed by routers in the wired NoC is smaller than that consumed by WRs in sWHNoC. However, as $|L_i|$ increases, sWHNoC with EMS gradually outperforms the path-based method because it benefits from a single one-hop transmission via the WR. In contrast, the routers in the fully wired NoC need multiple transmissions over multiple hops to deliver multicast packets.
Fig. 10. Normalized energy consumption comparison between with and without EMS. (a) $a = 0.001$, $p = 0.03$, $P_t = 0.5$, and $N = 4$. (b) $a = 0.001$, $p = 0.03$, $|L_i| = 20$, and $N = 4$. (c) $p = 0.03$, $P_t = 0.5$, $|L_i| = 20$, and $N = 4$.
Fig. 11. Normalized energy consumption versus $|L_i|$ among the sWHNoC with EMS, path-based, and tree-based methods when $N = 4$.
5.5. Energy comparison with other hybrid architectures
The star-ring subnet [7] requires a router with 3 bi-direction ports at every PE, and these routers connect to a central WR. The star-ring subnet allows adjacent PEs to communicate with each other directly through the 3-port routers. Whether this direct path or the path through the WR is more energy efficient depends on whether the distance exceeds a threshold value. We define $|D|$ as the distance (in hops) between two PEs. The energy consumed along the 3-port routers is $E_{sr1} = (|D| + 1) K_X^{(3)} + |D| K_W$, $|D| = 1, 2, 3, \ldots$. The energy with the WR is $E_{sr2} = K_X^{(N)} + 2 K_X^{(3)} + 2 K_W$. Obviously, when $|D|$ is small enough that $E_{sr1} < E_{sr2}$, the WR need not be used. When $|D|$ is large, the WR is used and the energy consumption is stable (it does not increase with $|D|$). As in the star-ring case, the mesh subnet decides according to $|D|$ whether the WR path should be used, but the router of each PE has 5 bi-direction ports. For our star subnet, the energy per bit is $K_X^{(N)} + 2 K_W$. Assuming the subnet size $N = 8$, the normalized energy per bit of these three kinds of subnet is shown in Fig. 12. It illustrates that the threshold distance is 6 for the star-ring and 2 for the mesh. The dashed lines in Fig. 12 indicate the case in which no WRs are used. The energy efficiency of the star subnet is prominent when $|D| > 4$ compared with the star-ring case, and when $|D| > 1$ compared with the mesh case.
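The threshold behaviour of the star-ring subnet can be sketched in Python as follows. The function names are hypothetical, and the router energies $K_X^{(3)}$ and $K_X^{(N)}$ are placeholder assumptions, so the threshold printed here is not expected to match the one read from Fig. 12; only $K_W$ comes from Table 1.

```python
K_W = 0.154          # pJ/bit per wire hop, from Table 1
K_X3_ASSUMED = 0.3   # pJ/bit, placeholder for K_X^(3) (assumption)
K_XN_ASSUMED = 1.0   # pJ/bit, placeholder for K_X^(N) (assumption)


def ring_path_energy(d, k_x3, k_w):
    """E_sr1 = (|D| + 1) * K_X^(3) + |D| * K_W for a direct path of d hops."""
    if d < 1:
        raise ValueError("distance must be at least 1 hop")
    return (d + 1) * k_x3 + d * k_w


def wr_path_energy(k_xn, k_x3, k_w):
    """E_sr2 = K_X^(N) + 2 * K_X^(3) + 2 * K_W for the path through the WR."""
    return k_xn + 2 * k_x3 + 2 * k_w


def star_subnet_energy(k_xn, k_w):
    """Energy per bit of the star subnet, K_X^(N) + 2 * K_W."""
    return k_xn + 2 * k_w


def largest_direct_distance(k_xn, k_x3, k_w, max_hops=64):
    """Largest |D| with E_sr1 < E_sr2, or 0 if the WR path always wins."""
    limit = wr_path_energy(k_xn, k_x3, k_w)
    best = 0
    for d in range(1, max_hops + 1):
        if ring_path_energy(d, k_x3, k_w) < limit:
            best = d
        else:
            break  # E_sr1 grows with d, so no larger distance can qualify
    return best


if __name__ == "__main__":
    d_max = largest_direct_distance(K_XN_ASSUMED, K_X3_ASSUMED, K_W)
    print("direct path is cheaper up to |D| =", d_max)
    print("star subnet energy:", star_subnet_energy(K_XN_ASSUMED, K_W))
```

Because $E_{sr1}$ grows linearly with $|D|$ while $E_{sr2}$ is constant, a single crossover distance always exists once the WR path becomes cheaper.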
5.6. Capacity of EMS on energy efficiency
Since EMS achieves its energy efficiency goal by switching off the non-member WRs, its energy saving capacity can be evaluated by comparison with the non-EMS case. In the non-EMS case, all member and non-member WRs are powered on during the whole multicast communication, and their energy per $T$ can be obtained by
$$E_N = (S-1) R_d \left[ \beta K_T + \alpha K_R + K_X^{(N+1)} \right] \left[ \frac{P_f}{J P_s^{(1)}} + \frac{(J-1) P_f}{J P_s^{(2)}} + 1 \right] + |L_i| R_d K_W. \tag{15}$$
Then, the normalized energy consumption for each PE is
$$E_{cn} = \frac{(1 - P_t) E_{intra} + P_t E_N}{NS}. \tag{16}$$
Fig. 10 shows the resulting comparison between the cases with and without EMS.
5.7. Energy saving capacity in the whole system
In this experiment, we investigate the impact of the ratio of multicast traffic to total traffic, $q$ ($0 < q < 1$), on energy consumption. The energy consumption of multicast depends not only on the multicast traffic ratio but also on the distance between the source and destination nodes. To isolate the effect of the traffic ratio, we assume a fixed distance between the source node and the destination node. In a 2-D $8 \times 8$ NoC, the longest transmission path is 14 hops (e.g., the diagonal transmission), and the shortest path is 1 hop. We take the average number of transmission hops between source and destination nodes to be 6. The mean energy consumption of unicast traffic per bit is then $E_{uni} = 7 K_X^{(5)} + 6 K_W$. The total energy consumption per bit can be presented as $q E_X + (1 - q) E_{uni}$, where $E_X$ represents the multicast energy consumption per bit and is either $E_{mul\text{-}uni}$, $E_{tre}$, or $E_{pat}$ (refer to Section 5.4). The energy consumption of sWHNoC can be obtained by $q N S E_c + (1 - q) E_{uni}$.
Fig. 12. Normalized energy comparison of three subnet topologies when $N = 8$.
Fig. 13(a) shows the relationship between the total energy consumption and $q$ when $|L_i| = 30$, $a = 0.001$, $P_t = 0.5$, $p = 0.03$, $N = 4$, and $S = 16$. sWHNoC achieves the lowest energy consumption compared with the other three methods. Although the energy consumption of all methods increases with $q$, the effect on sWHNoC with EMS is almost negligible. This result shows that sWHNoC with EMS handles multicast transmissions in CMPs better. Fig. 13(b) illustrates the ratio of the energy consumed by multicast to the total energy consumption, denoted as $R_M$, for different values of $q$. As $q$ increases, $R_M$ increases for all methods. Fig. 13(b) shows that the multi-unicast method performs worst in terms of $R_M$: it starts at $R_M = 40\%$ when $q = 1\%$, while the $R_M$ values of the other three methods are below 10%. When $q$ grows to 10%, the $R_M$ values of the four methods reach nearly 88%, 50%, 32%, and 18%, respectively. These data show that if the multicast traffic ratio is not large (i.e., $q < 40\%$), sWHNoC with EMS can maintain an acceptable energy consumption ratio.
Fig. 13. Energy consumption in different values of $q$ when $|L_i| = 30$, $a = 0.001$, $P_t = 0.5$, $p = 0.03$, $N = 4$, and $S = 16$. (a) The total energy consumption per bit. (b) $R_M$.
Finally, we show the energy saving improvement ratio of sWHNoC with EMS compared with the three wired multicast methods. The energy saving improvement ratio is defined as $R_{SAV} = (E_{om} - E_s)/E_{om}$, $0 \le R_{SAV} \le 1$, where $E_{om}$ is the energy consumption caused by the other multicast methods (i.e., the multi-unicast, tree-based, and path-based multicast methods) and $E_s$ is the energy consumption caused by sWHNoC with EMS. Fig. 14 illustrates the relationship between $R_M$ and $R_{SAV}$ for three multicast scales (expressed as $|L_i|/64$, that is, $10/64 = 0.15625$, $30/64 = 0.46875$, and $60/64 = 0.9375$). $R_{SAV}$ increases as $R_M$ increases. Replacing the multi-unicast method with sWHNoC with EMS yields the highest $R_{SAV}$ at every multicast scale, followed by the tree-based and then the path-based method. Overall, the more energy a multicast method consumes, the larger the $R_{SAV}$ that sWHNoC with EMS can achieve. The $R_{SAV}$ gaps across different $|L_i|$ are larger for the path-based method because its energy consumption is proportional to $|L_i|$, as shown in Fig. 11.
Fig. 14. Energy saving improvement ratio $R_{SAV}$ versus the ratio $R_M$ of energy consumed by multicast to total consumed energy on the NoC when sWHNoC with EMS replaces three multicast schemes (all in wired NoC): the multi-unicast, tree-based, and path-based multicast schemes under $|L_i| = 10$, 30, and 60 in a 64-PE CMP.
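The quantities used in this subsection can be sketched in Python as follows. The function names are hypothetical, and the closed form for $R_M$ is an assumption that reads "energy consumed by multicast over total energy" as $qE_X / (qE_X + (1-q)E_{uni})$. The energy ratio $E_X/E_{uni} = 66$ in the example is also hypothetical, chosen only because it reproduces the 40% and 88% values quoted above for the multi-unicast method.

```python
import math


def total_energy_per_bit(q, e_multicast, e_unicast):
    """Mixed-traffic energy per bit: q * E_X + (1 - q) * E_uni."""
    if not 0.0 < q < 1.0:
        raise ValueError("multicast traffic ratio q must lie in (0, 1)")
    return q * e_multicast + (1.0 - q) * e_unicast


def multicast_energy_share(q, e_multicast, e_unicast):
    """R_M, read here as multicast energy divided by total energy (assumption)."""
    return q * e_multicast / total_energy_per_bit(q, e_multicast, e_unicast)


def saving_ratio(e_other, e_swhnoc):
    """R_SAV = (E_om - E_s) / E_om."""
    if e_other <= 0:
        raise ValueError("E_om must be positive")
    return (e_other - e_swhnoc) / e_other


if __name__ == "__main__":
    # Energies are expressed relative to E_uni = 1; the ratio 66 is hypothetical.
    for q in (0.01, 0.10):
        share = multicast_energy_share(q, e_multicast=66.0, e_unicast=1.0)
        print(f"q = {q:.2f}: R_M = {share:.2%}")
    # Placeholder energies for the saving ratio (assumptions).
    print("R_SAV:", saving_ratio(e_other=10.0, e_swhnoc=4.0))
```

The same `saving_ratio` function applies whichever wired method supplies $E_{om}$, which is why a more expensive baseline yields a larger $R_{SAV}$ for a fixed $E_s$.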
6. Conclusion
In this paper, an sWHNoC architecture with EMS for energy efficiency was studied. We proposed the p-persistent CSMA for the fully connected WiNoC. To the best of the authors' knowledge, this is the first time that the p-persistent CSMA for a fully connected WiNoC has been proposed and studied in the literature. Our study indicated the following results:
• The hybrid star-topology subnet and fully connected WiNoC architecture is suitable for energy efficiency in NoC.
• The optimal energy efficiency can be achieved by adjusting the proportion of $N$ and $S$ in the sWHNoC architecture. The optimal value of $S$ is one quarter of the total number of PEs on the CMP.
• The optimal p-persistent CSMA based on the finite number of WRs is one of the candidates for a high-performance WiNoC.
• The EMS with the PG technique can significantly reduce the energy consumption in multicast transmission.
As a final remark, the slotted p-persistent CSMA merits further performance investigation since the number of PEs is fixed on a CMP. The values of $N$ and $S$ can be tuned to achieve optimal performance in terms of energy and transmission latency. Based on this study, the sWHNoC shows itself to be the most suitable topology for the CMP.
References
[1] Benini L, Micheli GD. Networks on chips: a new SoC paradigm. Computers 2002;35(1):70–8.
[2] Seongmoo H, Asanovic K. Replacing global wires with an on-chip network: a power analysis. In: Proc IEEE ISLPED’2005, San Diego, California; 2005. p.
369–74.
[3] Tavakoli E, Tabandeh M, Kaffash S, Raahemi B. Multi-hop communications on wireless network-on-chip using optimized phased-array antennas.
Comput Electr Eng 2013;39(7):2068–85.
[4] Yu X, Sah SP, Deb S, Pande PP, Belzer B, Deukhyoun H. A wideband body-enabled millimeter-wave transceiver for wireless network-on-chip. In: Proc
IEEE MWSCAS’2011, Seoul, Korea; 2011. p. 1–4.
[5] Ganguly A, Chang K, Deb S, Pande PP, Belzer B, Teuscher C. Scalable hybrid wireless network-on-chip architectures for multicore systems. IEEE Trans
Comput 2011;60(10):1485–502.
[6] Deb S, Ganguly A, Pande PP, Belzer B, Heo D. Wireless NoC as interconnection backbone for multicore chips: promises and challenges. IEEE J Emer Sel
Top Circ Syst 2012;2(2):228–39.
[7] Deb S, Chang K, Yu X, Sah SP, Cosic M, Ganguly A, et al. Design of an energy efficient CMOS compatible NoC architecture with millimeter-wave wireless
interconnects. IEEE Trans Comput 2013;62(12):2382–96.
[8] Yan S, Lin B. Custom networks-on-chip architectures with multicast routing. IEEE Trans VLSI Syst 2009;17(3):342–55.
[9] Xiang D, Zhang Y. Cost-effective power-aware core testing in NoCs based on a new unicast-based multicast scheme. IEEE Trans Comput Aid Des Int Circ
Syst 2011;30(1):135–47.
[10] Yuan J, Liu H, Jiang X, Xie W, Wang X. Key techniques of multicast communication for network on chip. In: Proc. int’l conf. internet technique and
applications, Wuhan, China; 2010. p. 1–4.
[11] Stefan R, Molnos A, Ambrose A, Goossens K. A TDM NoC supporting QoS, multicast, and fast connection set-up. In: Proc. DATE 2012, Dresden, Germany;
2012. p. 1283–8.
[12] Samman F, Hollstein T, Glesner M. Adaptive and deadlock-free tree-based multicast routing for networks-on-chip. IEEE Trans VLSI
2010;18(7):1067–80.
[13] Samman F, Hollstein T, Glesner M. New theory for deadlock-free multicast routing in wormhole-switched virtual-channelless networks-on-chip. IEEE
Trans Parallel Distrib Syst 2011;22:544–57.
[14] Ebrahimi M, Daneshtalab M, Liljeberg P, Plosila J, Flich J, Tenhunen H. Path-based partitioning methods for 3d networks-on-chip with minimal
adaptive routing. IEEE Trans Comput 2014;63(3):718–33.
[15] Ebrahimi M, Daneshtalab M, Liljeberg P, Plosila J, Tenhunen H. Path-based multicast routing for 2D and 3D mesh networks. In: Palesi M, Daneshtalab
M, editors. Routing algorithms in networks-on-chip. New York: Springer Science+Business Media; 2014.
[16] Sethuraman B, Vemuri R. Multicasting based topology generation and core mapping for a power efficient networks-on-chip. In: Proc. ACM/IEEE
ISLPED’07, Portland, Oregon; 2007. p. 399–402.
[17] Hu W, Lu Z, Jantsch A, Liu H. Power-efficient tree-based multicast support for networks-on-chip. In: Proc. ASP-DAC’11, Yokohama, Japan; 2011. p. 363–
8.
[18] Wang X, Yang M, Jiang Y, Liu P. On an efficient NoC multicasting scheme in support of multiple applications running on irregular sub-networks. J
Microprocess Microsyst 2011;35(2):119–29.
[19] Pavlidis VF, Friedman EG. 3-D topologies for networks-on-chip. In: Proc. IEEE int’l SoC conf., Taipei, Taiwan; 2006. p. 285–288.
[20] Li H, Gu H, Yang Y, Yu X. A hybrid packet-circuit switched router for optical network on chip. Comput Electr Eng 2013;39(7):2197–206.
[21] Chang MF, Cong J, Kaplan A, Naik M, Reinman G, Socher E, Tam S-W. CMP network-on-chip overlaid with multi-band RF-interconnect. In: Proc. int’l
conf. HPCA’2008, Salt Lake City, UT; 2008. p. 191–202.
[22] Chung H, Teuscher C, Pande P. Design and evaluation of technology-agnostic heterogeneous networks-on-chip. ACM J Emerg Technol Comput Syst
2014;10(3):1–27. article 20.
[23] Seo J-B, Jin H, Leung VCM. Throughput upper-bound of slotted CSMA systems with unsaturated finite population. IEEE Trans Commun
2013;61(6):2477–87.
[24] Abadal S, Mestres A, Iannazzo M, Solé-Pareta J, Alarcón E, Cabellos-Aparicio A. Evaluating the feasibility of wireless networks-on-chip enabled by
graphene. In: Proc. NoCArc’14, Cambridge, UK; 2014. p. 51–6.
[25] Yang Y, Yum T-SP. Delay distributions of slotted ALOHA and CSMA. IEEE Trans Commun 2003;51(11):1846–57.
[26] Lee S-E, Bagherzadeh N. A high level power model for Network-on-Chip (NoC) router. Comput Electr Eng 2009;35(6):837–45.
[27] Kleinrock L, Tobagi F. Packet switching in radio channels: Part I – Carrier sense multiple-access modes and their throughput-delay characteristics. IEEE
Trans Commun COM 1975;23(12):1400–16.
[28] Chen J. AMNP: ad hoc multichannel negotiation protocol with broadcast solutions for multi-hop mobile wireless networks. IET Commun
2010;4(5):521–31.
[29] Chen J, Sheu S-T. A reliable broadcast/multicast MAC protocol for multi-hop mobile ad hoc networks. IEICE Trans Commun 2006;E89-B(3):867–78.
[30] Keating M, Flynn D, Aitken R, Gibbons A, Shi K. Low power methodology manual: for system-on-chip design. New York: Springer; 2007. ch. 5–12.
[31] Boppana RV, Chalasani S, Raghavendra CS. Resource deadlocks and performance of wormhole multicast routing algorithms. IEEE Trans Parallel Distrib
Syst 1998;9(6):535–49.
[32] Jerger NE, Peh LS, Lipasti M. Virtual circuit tree multicasting: a case for on-chip hardware multicast support. In: Proc. IEEE ISCA’08, Beijing, China; 2008.
p. 229–40.
[33] Wang L, Jin Y, Kim H, Kim EJ. Recursive partitioning multicast: a bandwidth-efficient routing for networks-on-chip. In: Proc. ACM/IEEE NOCS’09, San
Diego, CA; 2009. p. 64–73.
[34] Takagi H, Kleinrock L. Throughput analysis for persistent CSMA systems. IEEE Trans Commun 1985;33(7):627–38.
[35] Gkelias A, Dohler M, Friderikos V, Aghvami AH. Average packet delay of CSMA/CA with finite user population. IEEE Commun Lett 2005;9(3):273–5.
[36] Wang H-S, Zhu X, Peh L-S, Malik S. Orion: a power-performance simulator for interconnection networks. In: Proc. IEEE/ACM MICRO’2002, Istanbul,
Turkey; 2002. p. 294–305.
[37] Malumbres MP, Duato J, Torrellas J. An efficient implementation of tree-based multicast routing for distributed shared-memory multiprocessors. In:
Proc. IEEE IPDPS’1996, New Orleans, LA; 1996. p. 186–189.
[38] DiTomaso D, Kodi A, Matolak D, Kaya S, Laha S, Rayess W. Energy-efficient adaptive wireless NoCs architecture. In: Proc. IEEE NoCS’2013, Tempe, AZ;
2013. p. 1–8.
Peng Dai received the B.S. and M.A. degrees in the department of Microelectronics from Tianjin University, Tianjin, China, in June 2012 and January 2015
respectively. His research focuses on the design and implementation of VLSI, Mixed-Signal integrated circuit. Currently, he works at Spreadtrum
Communications Ltd.
Jenhui Chen received the B.S. and Ph.D. degrees in the department of Computer Science and Information Engineering (CSIE), Tamkang University, Taipei,
Taiwan in January 2003. He is a professor in the department of CSIE, College of Engineering, Chang Gung University. His main research interests include
design, analysis, and implementation of communication protocols, wireless networks, cloud computing, big data, augmented reality, SoC, and NoC.
Yiqiang Zhao is a professor at the School of Electronic Information Engineering, Tianjin University. His primary research interests are mixed-signal
integrated circuit and system, VLSI imaging system, information security.
Yen-Han Lai received the B.S. degree in the department of mathematics, National Taitung University, Taitung, Taiwan, and M.S. degree in the department of
CSIE, Chang Gung University, Taoyuan, Taiwan, in 2009 and 2013 respectively. He is currently a Ph.D. student in the department of CSIE, Chang Gung
University. His main research focuses on wireless communications and network-on-chip.
