
Author's Accepted Manuscript

A hardened network-on-chip design using Runtime hardware trojan mitigation methods
Jonathan Frey, Qiaoyan Yu

www.elsevier.com/locate/vlsi

PII: S0167-9260(16)30031-1
DOI: http://dx.doi.org/10.1016/j.vlsi.2016.06.008
Reference: VLSI1225

To appear in: Integration, the VLSI Journal


Received date: 12 March 2016
Revised date: 13 June 2016
Accepted date: 27 June 2016
Cite this article as: Jonathan Frey and Qiaoyan Yu, A hardened network-on-chip
design using Runtime hardware trojan mitigation methods, Integration, the VLSI
Journal, http://dx.doi.org/10.1016/j.vlsi.2016.06.008
This is a PDF file of an unedited manuscript that has been accepted for
publication. As a service to our customers we are providing this early version of
the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting galley proof before it is published in its final citable form.
Please note that during the production process errors may be discovered which
could affect the content, and all legal disclaimers that apply to the journal pertain.

A Hardened Network-on-Chip Design using Runtime Hardware Trojan Mitigation Methods

Jonathan Frey, Qiaoyan Yu

University of New Hampshire, Durham, NH 03824 USA.

jpg73@wildcats.unh.edu
qiaoyan.yu@unh.edu

Abstract
Due to the globalized semiconductor business model, malicious hardware modifications, known as hardware Trojans (HTs), have emerged as a major concern for chip security. HT detection and mitigation methods for general integrated circuits have been investigated in the past decade. However, the majority of the existing efforts are not customized for HTs in Networks-on-Chip (NoCs). To complement the firmware- and software-level methods for rogue NoC detection, we propose countermeasures to harden the NoC hardware design against tampering. More specifically, we propose a collaborative dynamic permutation and flit integrity check method to mitigate potential inside-router HTs inserted by a disloyal member of the NoC design house or by the 3rd-party system integration company. Our method improves the number of received packets by up to 70.1% over the other methods if the HT controls the NoC packet destination address. The average link availability of our method is 43.7% higher than that of the existing methods. Our method increases the effective average latency by up to 63.4%, 68.2%, and 98.9% for the single HT in the destination, header, and tail fields, respectively, over the existing methods.

Index Terms: Network-on-chip (NoC), hardware Trojan, hardware security, bandwidth depletion, deadlock, livelock, denial-of-service attack, latency, throughput.

This manuscript was submitted on March 11, 2016, and revised on June 13, 2016.

I. INTRODUCTION
Outsourced fabrication, assembly, and testing have resulted in chip designs facing a new challenge: threats to hardware security [1-4]. Among the various hardware threats, hardware Trojans (HTs), which are malicious hardware modifications to the original chip, are well known. An HT is composed of a trigger circuit and a payload circuit. The trigger circuit is used to examine the arrival of the trigger condition that the attacker specifies in the Trojan insertion stage. The Trojan payload circuit could cause a denial-of-service (DoS) problem, alter a chip's normal operations, or provide the adversary with the privilege to access a confidential memory space [5-7]. The Semiconductor Research Corporation (SRC) [34] concluded that one of the most challenging hardware security areas is the lack of HT detection methods at and before the Register Transfer Level (RTL).
Existing efforts on HT detection can be categorized as non-destructive and destructive detection methods [3, 8, 9]. The non-destructive methods include functional testing [10, 11], accelerated Trojan activation followed by testing [12], and analysis of side-channel signals (e.g. power, delay, temperature, or electromagnetic profiles) [13-15]. The destructive category mainly consists of destructive reverse engineering, such as Chemical Mechanical Polishing followed by Scanning Electron Microscope (SEM) image reconstruction and analysis [16], and fingerprint examination [17]. As HT designs become more intricate and the HT target system integrates more transistors, HT detection in a very-large-scale integrated system is comparable to finding a needle in a haystack. Consequently, HT detection at testing time (or static time) cannot guarantee a HT detection rate of 100%. The residual HTs on chips will continue to harm the systems that contain those malicious circuits.
As billions of transistors are integrated on a single die, Networks-on-Chip (NoCs) have emerged as an efficient on-chip communication infrastructure [18-20]. The consequences of using a compromised NoC have been highlighted in [10, 31, 41, 42]. HTs inserted in NoCs will lead to information leakage, unauthorized memory access, and DoS attacks, such as incorrect path routing, deadlock, and livelock. Existing HT detection methods and countermeasures have been designed for general integrated circuits [21-23]. For System-on-Chip (SoC) integrators, SoC firmware-level countermeasures [41, 42] are also available to detect the presence of rogue NoCs. However, there is a lack of effective countermeasures that specifically harden the NoC design. This work addresses this need. We propose a runtime HT detection and mitigation method to harden the NoC against tampering. Compared to the existing approaches, our method fully considers the NoC features, including high modularity, large scalability, and parallel transmission channels, when strengthening the NoC's capability to resist potential HT attacks.
The remainder of this work is organized as follows. The HT insertion scenarios of interest in this work are described in Section II. We summarize the related HT detection and mitigation methods for NoCs and our contributions in Section III. Section IV discusses the proposed method, which is composed of a collaborative HT detection and NoC data integrity check, dynamic permutation with a Physical Unclonable Function (PUF) controller, and a local unique and unpredictable permutation selection signal generator. In Section V, we present the thorough experimental results attained from the simulations performed on a gate-level NoC implementation. Conclusions and future work are provided in Section VI.

Figure 1. The hardware Trojan insertion scenarios considered in this work.

Figure 2. Packet format used in this work.

[Figure 1 labels: NoC Design Company A (core members and other members in the design team); NoC (firm) IP market; 3rd-party EDA tool; NoC Design Company B (Company A's competitor); memory, PEs (process elements), and NoC; to foundry; scenarios of HT insertion.]

[Figure 2 labels: header flit (HEAD bit, TAIL bit, SRC field with source address, DEST field with destination address); payload flit (data); tail flit (data).]

II. NoC Hardware Trojan Insertion Scenarios

A. Attack Scenarios
HTs are designed to deliver attacks such as DoS, information leakage, data manipulation, and system degradation. During the design stage, HTs can be embedded in the RTL description [6] or directly in the gate-level netlist, leading to logical attacks on the system. At the fabrication stage, design layouts can be modified to include a HT, changing internal circuit characteristics such as delay. The countermeasure proposed in this work aims to address HTs that are inserted by a disloyal team member in the NoC design house or by the 3rd-party multiple-processor SoC (MPSoC) integration company, as shown in Fig. 1. Team members hired by the competitor of NoC design Company A could place HTs in the NoC design. Although not a core member of the design team, the disloyal employee may have access to the NoC gate-level netlist and maliciously modify a few (not all) router instantiations in some firm NoC IPs sold by Company A. Even if the original NoC IP does not contain HTs, the untrusted 3rd-party EDA tool could insert HTs into the NoC IP, producing an insecure MPSoC for the customers. The first scenario shown in Fig. 1 will cause Company A to lose reputation and market share. The second scenario shown in Fig. 1 will cause catastrophic consequences if the compromised MPSoCs are used in military and government applications. In addition to relying on regulation policies, a NoC designer should implement countermeasures to resist potential HT insertions from the supply chain.

B. NoC Basics
In NoCs, a flit is the basic flow control unit transferred over the network. Each NoC packet has one header flit, one tail flit, and several payload flits. The first bit is the header bit, where a high logic value indicates the presence of a header flit. The second bit is the tail bit, where a high logic value specifies the arrival of a tail flit. The rest of the flit bits either contain information, such as a source identifier, destination address, and routing protocol, or are data bits. The flit source/destination address and flit type information guide the NoC router in transferring flits over multiple-hop routing paths. Figure 2 shows the packet format used in this work. This format is a simplified version of the one used in Intel's NoC chip [18]. Malicious modifications to the header flit or flit type may destroy the flit's data integrity, resulting in a misrouted packet or a flit loss. A misrouted packet may violate the deadlock-free routing rules, creating deadlocks in the network. Deadlock, livelock, and flit loss will lead to bandwidth depletion.
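As a rough illustration of the packet format described above, the following Python sketch packs a header flit and classifies flits by their first two bits. The 36-bit width is taken from the FlitWidth parameter in the pseudo code of Fig. 6; the 4-bit address fields, the most-significant-bit-first ordering, the treatment of a flit with both role bits set, and all function names are assumptions made for this example only.

```python
FLIT_WIDTH = 36   # FlitWidth parameter from the pseudo code of Fig. 6
ADDR_BITS = 4     # assumed address width (not specified in the text)

HEAD_POS = FLIT_WIDTH - 1            # assumed: "first bit" is the MSB
TAIL_POS = FLIT_WIDTH - 2            # assumed: "second bit" follows it
SRC_SHIFT = TAIL_POS - ADDR_BITS     # SRC field sits right after the role bits
DEST_SHIFT = SRC_SHIFT - ADDR_BITS   # DEST field follows the SRC field
ADDR_MASK = (1 << ADDR_BITS) - 1


def pack_header(src, dest):
    """Build a header flit: HEAD=1, TAIL=0, then the SRC and DEST fields."""
    return (1 << HEAD_POS) | (src << SRC_SHIFT) | (dest << DEST_SHIFT)


def flit_type(word):
    """Classify a flit from its header and tail bits."""
    head = (word >> HEAD_POS) & 1
    tail = (word >> TAIL_POS) & 1
    if head and not tail:
        return "header"
    if tail and not head:
        return "tail"
    if not head and not tail:
        return "payload"
    return "invalid"  # assumed: both role bits set is not a legal flit here


def header_fields(word):
    """Return (source, destination) from a header flit."""
    return (word >> SRC_SHIFT) & ADDR_MASK, (word >> DEST_SHIFT) & ADDR_MASK


if __name__ == "__main__":
    flit = pack_header(src=3, dest=9)
    print(flit_type(flit), header_fields(flit))
    # A single flipped DEST bit silently redirects the packet.
    print(header_fields(flit ^ (1 << DEST_SHIFT)))
    # A flipped HEAD bit makes the header look like a payload flit.
    print(flit_type(flit ^ (1 << HEAD_POS)))
```

The last two lines mimic the kind of single-bit tampering discussed in this section: one changes the destination, the other changes the flit type.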

C. Attacks in NoCs
In the context of NoC-based MPSoCs, HT attacks that extract secret information or perform hijacking are more likely to happen in process element cores than in the NoC routers. As the primary focus of this work is to harden the NoC design, we address the HTs that perform DoS attacks in the NoC routers. The consequences of HT DoS attacks include bandwidth depletion, incorrect path routing, deadlock, and livelock [5]. More specifically, HTs in routers can maliciously change the flit source/destination address or flit type information of a packet that has left the transmitter network interface (NI). If a Trojan payload modifies the destination address of a packet, that packet could be directed to an unauthorized IP core. A dropped header flit or tail flit will result in the incomplete packet being retained in the router until some operation resets the router. More detailed HT attacks in NoCs are presented in [5].

III. Related Work and Our Contributions

A. Previous HT Detection and Mitigation Methods in NoCs
Software-level methods for NoC security are discussed in [24, 25]. The presence of a compromised NoC can be detected by firmware-level methods [41, 42]. To thwart attacks from a rogue NoC, an authenticated encryption framework is proposed in [43]. The majority of hardware-level methods for the security management of NoCs are employed in the network interfaces (NIs) or the modules directly connected to IP cores [25-29]. Memory address checking [28, 29] is employed to prevent untrusted IP cores from accessing trusted IP cores. Permission lookup tables are implemented in a memory protection unit [27, 30] to provide authentication against malicious packets. Communication obstacles are placed between the trusted and untrusted zones in [47] to stop attacks from the untrusted zone. A security wrapper and key-keeper cores [31] are used to encrypt private and public keys at the network level; at the core level, elliptic curve cryptography is run to resist power attacks. Existing approaches applied to NIs can block unauthorized memory accesses and packet relay, but cannot effectively prevent NoC bandwidth depletion.
There are some works that perform HT detection and mitigation for on-chip communication networks at
the NoC microarchitecture level. Wang et al. [32] set an upper throughput limit on the switch allocator to
detect whether a HT is maliciously controlling the allocator to perform a DoS attack. Control signals are
added to the input and output arbiters to prevent the number of low-security flits exceeding the static limit
of the low-security channel. The throughput is measured by a user-defined counter. If the counter is
compromised, the security protection mechanism fails.
Baron et al. [30] use two security wrappers to address masquerade attacks and DoS attacks. The first security wrapper in [30] compares the destination address of the outgoing packet with the illegal address range to block masquerade packets. The second security wrapper in [30] counts the number of clock cycles elapsed since the first packet was injected into the NoC, and postpones the injection of the next packet. In addition, a tail flit is manually appended to a packet that is suspected to be a malicious packet without a tail flit. The security wrappers may be able to block malicious packets from non-secure IP cores, but may lose their defense capability if an adversary has access to the wrapper design. Moreover, the counter-based HT detection methods induce latency and area overhead.
Kim and Villasenor [33] apply restricted address registers to the address decoder to prevent a malicious bus master/slave from accessing the restricted address range. Although the restricted address registers are invisible to all masters and writable from external ports, those registers may be changed maliciously by the attacker. Kim and Villasenor believe that a secret code used by an authorized person can address the remaining malicious attacks. Unfortunately, that work has not considered or assessed the potential risk and cost of using an authentication code. The bus arbiter in [33] uses a time-based counter or statistical analyzer to identify suspicious use of bus mastership. Similar to [30], the counter-based detection method is not effective as soon as the HT is triggered. This potentially causes bandwidth depletion due to malicious packets propagating over the entire network.
In the previous work [28, 33], examination of the memory access range is used as the main method to ensure the NoC is secure. However, those countermeasures have a common limitation: the memory range checking unit itself lacks protection (unless a cipher is used in the memory). Moreover, after the packet passes the address filter, no further protection is available in the NI or router. Unfortunately, NIs and routers can be compromised after the route computation stage. For instance, one may modify the memory address of a packet that is saved in the output buffer. Many HT countermeasures for NoCs [26, 28, 29] are mainly located in NIs, rather than routers. Routers have five interconnection channels with neighboring nodes (i.e. routers and NI); in contrast, each NI has only one connection to its IP core and one connection to its router. Therefore, we believe that it is more feasible to execute DoS attacks in a router than in a NI.

B. Our Contributions
The main contributions are as follows.
(1) We propose countermeasures to harden the NoC design, rather than fully relying on software or
firmware solutions [41, 42] to detect a compromised NoC used in the MPSoC. Our NoC hardening
method addresses the HTs placed in the NoC netlist by the disloyal employee in the design house or
untrusted 3rd-party system integration company.
(2) Our method is designed for the NoC routers, instead of the NIs targeted in the existing works, to promptly mitigate the HT effects before bandwidth depletion spreads to the entire NoC. Moreover, our router-level HT mitigation mechanism raises the bar for an adversary to simultaneously control multiple routing hops to create a malicious communication path between two IP cores in the NoC-based MPSoC. As a result, a packet affected by HTs cannot successfully traverse multiple router hops and reach the destination to complete the malicious task.
(3) The proposed collaborative dynamic permutation and flit integrity check method is capable of examining the invariants of the NoC to immediately terminate the detected HTs. The proposed PUF-based random vector generation method exploits the unique process variations and the random content from each router's round-robin registers so that each router independently resists potential HT attacks.
(4) More specific HT attacks in NoCs are considered in this work than in the existing literature. To the best of our knowledge, this is the first work that mitigates HTs that maliciously modify the flit content, such as critical flit role bits and routing path bits. Accordingly, our simulation results demonstrate the significant impact of different flit bits being compromised on the NoC throughput, link availability, traffic hotspot migration, bandwidth depletion, area cost, and average packet latency. We further assess the NoC resistance against HT attacks if the attacker has knowledge of the applied countermeasure.

IV. Proposed HT Mitigation Method in NoC Router

A. Overview of Proposed Framework

Figure 3. High-level overview of the proposed countermeasure.


Figure 4. Router architecture used in this work. One proposed input and output port pair for hardware Trojan detection and mitigation. Solid
blocks highlight our innovations over the generic router architecture.

[Figure labels: mesh NoC with NIs, links, and routers with north (N), east (E), south (S), west (W), and local (L) ports; HT insertion locations; locations of the proposed HT countermeasure; two NoC ports with the proposed HT countermeasure (south input port and east output port); flit, critical flit bits, control signals, key bits; Flit in, input FIFO and input FIFO controller, flush FIFO, flit processing unit, crossbar, arbiter, output FIFO and output FIFO controller, Flit out.]

The proposed HT mitigation method aims to detect and mitigate the HT attacks that (1) modify the flit type, (2) change the legal packet destination address to an unauthorized one, and (3) sabotage the integrity of a packet. The main consequence of the HTs targeted in this work is NoC bandwidth depletion. We assume that the links between NoC routers are trusted (ensured by a technique such as [45]). Due to the regular implementation and large area consumption of the FIFOs, we assume that FIFOs are more likely and easier to be compromised by the adversary than the other router submodules (such as the flit processing unit).

As shown on the left side of Fig. 3, a generic mesh NoC is composed of links, NIs, and routers with five ports (north, east, south, west, and local input/output ports). We use one pair of input and output ports (shadowed area in Fig. 3) as an example to present the novelty of our router design. The zoomed-in picture in Fig. 3 depicts the high-level view of the proposed HT countermeasures (HTC) applied in the south input port and the east output port. In the input port, the HTC1 module dynamically permutes the incoming NoC flit bits before they reach the FIFO. As a result, the NoC flit saved in the input FIFO is scrambled. Due to the randomness and dynamicity of the permutation patterns, it is much harder for an attacker to change the flit content into something meaningful through HTs in our proactively defended router than in a router with static protection. The goal of the proposed permutation is to reduce the probability of a HT inserted in the FIFO successfully modifying the critical bits in the NoC packet. The HTC2 module is used to examine the integrity of flits, which could be sabotaged by the HTs placed in the input FIFO. The main function of the HTC2 is to prevent malicious flits from entering the rest of the network by dropping the flit and flushing the input FIFO if necessary. HTs could also be located in the flit processing unit and the FIFO in the east output port. Hence, we place a HTC3 module after the FIFO to stop the malicious flit from leaving the current router and then recover the scrambled flit back to the original bit order. The three HTC modules proposed in Fig. 3 cooperatively provide proactive defense to thwart potential HTs that harm the NoC flit integrity and thus deplete the NoC bandwidth.
The HTC modules in Fig. 3 are detailed in Fig. 4. The highlighted solid blocks in Fig. 4 are the changes over the generic router. To guarantee that a packet reaches its original destination, the critical fields (i.e. source/destination address and flit type bits) in a NoC packet must not be tampered with. To maintain the flit integrity despite potential HTs in the router, we propose to first encode the critical flit fields with an Error Control Code (ECC). Next, we apply a Dynamic Flit Permutation (DFP) technique to the packet buffers, so that the critical information bits in a NoC flit are randomly reordered before that flit is sent to the input FIFO. If the attacker cannot precisely know the dynamic permutation pattern at each specific moment, the HTs he/she inserted in the router will not be triggered as expected or execute the malicious mission as designed. Symmetrically, the Dynamic Flit De-Permutation (DFDeP) is applied at the output port. After de-permutation, an ECC decoder (DeECC) enables the router to detect and correct all 1-bit errors in the critical flit content.

[Figure labels: Flit In, flit ECC encoder, FP, input FIFO and control, proposed route computation (PFDP, PDeECC, route computation), crossbar switch, output FIFO and control, flit DeECC, FDP, Flit Out; PPS vector and control signals; critical flit bits and full flit bits. Legend: FP: flit permutation; FDP: flit de-permutation; ECC: error correction coding encoder; PFDP: partial flit de-permutation; PPS vector: permutation pattern selection vector.]

The permutation randomness is achieved by a pattern selection vector, which is dynamically generated by the Local Random-Vector Generator. Each router has its own random number generator, which provides the selection vector to choose one of the flit permutation patterns for the DFP and DFDeP units. We exploit the unique routing history saved in the Arbiter to generate a random vector through a PUF structure implemented in each router. Due to the uniqueness of each PUF and the routing history, the dynamic permutation pattern for DFP and DFDeP varies over time and differs from router to router. As the packet routing history in each router depends on the NoC application at the deployment stage, the output of the Local Random-Vector Generator is unpredictable at the NoC design stage. Thus, the adversary cannot easily place HTs to perform successful attacks at the design time.
The Proposed Route Computation unit is designed to interpret multiple versions of the permuted flit content into a request to use the correct output port for the next routing hop. To reduce the hardware cost, we propose to use Partial Flit De-Permutation and DeECC to correctly interpret important internal and external signals such as write request and FIFO buffer status signals.

Figure 5. Proposed collaborative hardware Trojan detection and mitigation method based on dynamic permutation and our flit integrity check.

B. Proposed Collaborative Dynamic Permutation and Flit Integrity Check Method for Routers
We propose to integrate a dynamic permutation mechanism with ECC to address HTs inserted in the
router. The overview of the proposed collaborative method is shown in Fig. 5. To minimize the hardware
cost and maximize the error correction strength, we only encode the critical flit bits (i.e. header, tail and
destination address bits) with the flit ECC encoder. Together with the non-critical flit bits, the codeword
from the ECC encoder is permuted before being stored in the input FIFO. The proposed route computation
unit processes the critical flit bits by partially de-permuting and partially error control decoding before the
normal route computation. This is because the partial de-permutation and decoder outputs are only used to
generate a valid output-channel request. If the loss of flit integrity is detected, we either correct the flit, if
correctable, or drop that packet, if uncorrectable. An uncorrectable bit error will trigger a permutation
configuration update.
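The encode, permute, store, de-permute, and decode flow described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not name the ECC, so an extended Hamming (8,4) code is assumed here because it corrects one flipped bit and flags two flipped bits; the four critical bits, the fixed example pattern, and all function names are hypothetical.

```python
def secded_encode(data):
    """Encode 4 critical bits into an 8-bit extended Hamming codeword (assumed ECC)."""
    d0, d1, d2, d3 = data
    p1 = d0 ^ d1 ^ d3
    p2 = d0 ^ d2 ^ d3
    p4 = d1 ^ d2 ^ d3
    code = [p1, p2, d0, p4, d1, d2, d3]
    return code + [sum(code) % 2]          # append overall parity bit


def secded_decode(code):
    """Return (data, status): 'ok', 'corrected', or (None, 'drop')."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4        # 1-based position of a single error
    parity_error = sum(c) % 2              # 1 when an odd number of bits flipped
    if parity_error:
        if syndrome:
            c[syndrome - 1] ^= 1           # correct the single flipped bit
        status = "corrected"
    elif syndrome:
        return None, "drop"                # two flipped bits: uncorrectable
    else:
        status = "ok"
    return [c[2], c[4], c[5], c[6]], status


def permute(bits, pattern):
    """Move bit k to position pattern[k]."""
    out = [0] * len(bits)
    for k, bit in enumerate(bits):
        out[pattern[k]] = bit
    return out


def depermute(bits, pattern):
    """Inverse of permute for the same pattern."""
    return [bits[pattern[k]] for k in range(len(pattern))]


def router_pass(critical_bits, pattern, flipped_positions=()):
    """Encode, permute, let an HT flip stored FIFO bits, then de-permute and decode."""
    stored = permute(secded_encode(critical_bits), pattern)
    for pos in flipped_positions:          # HT payload acting on FIFO positions
        stored[pos] ^= 1
    return secded_decode(depermute(stored, pattern))


PATTERN = [5, 2, 7, 0, 3, 6, 1, 4]         # hypothetical permutation pattern

if __name__ == "__main__":
    bits = [1, 0, 1, 1]
    print(router_pass(bits, PATTERN))              # untouched flit
    print(router_pass(bits, PATTERN, [6]))         # one flipped bit is corrected
    print(router_pass(bits, PATTERN, [0, 3]))      # two flipped bits: drop
```

In this sketch a "drop" result is the point at which the router would discard the packet and update the permutation configuration. Three or more flipped bits are beyond what this assumed code can handle reliably.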
The flit permutation technique obfuscates the flit content to a certain degree, so that the attacker cannot
easily implement a HT to modify the flit content. In addition, the permuting mechanism may distance the
positions of the target bits of a DoS attack. As a result, not all critical flit bits are modified by the HT. The
distributed error bits can allow us to use a simple error correction code to reconstruct the critical flit bits
and prevent the depletion of bandwidth accordingly. The error detection and correction capability available
in our router provides a second line of defense to thwart HTs, if the attacker successfully bypasses the
dynamic permutation.
As shown in Fig. 5, the same unpredictable permutation pattern selection (PPS) vector is applied to the flit permutation (FP), flit de-permutation (FDP), and partial flit de-permutation (PFDP). To further reduce the overhead induced by the random selection vector generation, one selection vector can be shared among the five pairs of input and output ports in each router. Our method can be extended to dynamically change the permutation configuration once an error is detected in the critical flit field. More details of the baseline route computation module in Fig. 5 can be found in [46].

C. Dynamic Permutation with PUF-based Controller

We propose to permute the flit bit positions before storing the flit in the input buffer, and to de-permute the flit after the output buffer. The principle of our permutation is expressed in (1).

$$flit_{dp}(t, i, j) = dp\big[\,flit(t, i, j),\; PPS(t, i, j)\,\big] \tag{1}$$

The variables i and j in (1) and (2) denote the router ID and the input port ID, respectively. Variables flit(t) and flit_dp(t) are the original flit and the flit after dynamic permutation at time t, respectively. The function dp[] stands for our proposed dynamic permutation algorithm, which is a function of the incoming flit and a unique PPS vector. Equation (1) is not an explicit mathematical expression of flit permutation; instead, we use this equation to show the dependent factors of the proposed dynamic permutation.
A pseudo code for dynamic permutation is shown in Fig. 6. In each input port of a router, a permutation pattern is dynamically selected by the PPS vector generated by the PUF structure located in the router. The essence of flit permutation is a process of swapping flit bit indexes. The flit permutation is implemented by hardwiring flit bits to the inputs of multiplexers, which are controlled by the PPS vector. Each router performs flit permutation independently, as each router has its own PUF structure. While flit permutation is executed at runtime, the list of different permutation patterns is generated offline. A designer can define the total number of permutation patterns and specify the flit width. The core of the permutation can be done in MATLAB or other software with various built-in permutation algorithms. In order to avoid the critical flit bits being placed in the same positions by different permutation patterns, we put one restriction on the offline permutation pattern generation, as indicated in Fig. 6. The hardware implementation is shown in Fig. 7. Without loss of generality, Fig. 7 shows a case in which the header, tail, and destination address bits are distributed to different bit positions through dynamic permutation. To add a non-linear property to the permutation, we configure the permutation mapping with the unfixed PPS vector. The pattern selection vector is the output of a PUF-based structure and varies with process variations and the dynamic challenge vector X, as expressed in (2).

$$PPS(t, i, j) = f\big(\text{Process Variation},\; X(t, i, j)\big) \tag{2}$$

The challenge vector X(t, i, j) depends on the routing history in each router's output port. More details on PPS and X are provided in Section IV.D. As each router has a unique and unpredictable routing history, the random PPS(t, i, j), and thus the permutation configuration, is unique for each input port of each router at different moments.
In the proposed design, we introduce three random factors to the bit permutation module: (1) multiple
permutation configurations, (2) traffic condition and process variation dependent PPS signal, and (3) the
choice of which permuting configurations to use at runtime. By using these three random factors, we
introduce a large space of obfuscation in the design, thus adding difficulty for the adversary to achieve
meaningful attacks on the RTL design. The hardware overhead for this technique is only due to
multiplexers; the implementation for the permutation algorithms is done by hardwiring. The de-permutation
algorithm reverses the effect of permutation, and has a similar structure.
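The offline pattern generation and the runtime permutation described above can be sketched in Python as follows. The eight patterns and the 36-bit flit width follow the parameters given in the pseudo code of Fig. 6; the positions of the critical bits, the use of rejection sampling, the seed, and the function names are assumptions made for this example, and the PPS value is simply passed in as an integer rather than produced by a PUF.

```python
import random

FLIT_WIDTH = 36                      # FlitWidth in the pseudo code of Fig. 6
NUM_PATTERNS = 8                     # NumPattern in the pseudo code of Fig. 6
CRITICAL_BITS = (0, 1, 2, 3, 4, 5)   # assumed indexes of header, tail, and DEST bits


def generate_patterns(num_patterns=NUM_PATTERNS, width=FLIT_WIDTH,
                      critical=CRITICAL_BITS, seed=0):
    """Offline step: draw random permutations, rejecting any candidate that
    sends a critical bit to a position already used for it by an earlier pattern."""
    rng = random.Random(seed)
    patterns = []
    while len(patterns) < num_patterns:
        candidate = list(range(width))
        rng.shuffle(candidate)
        if all(candidate[c] != prev[c] for prev in patterns for c in critical):
            patterns.append(candidate)
    return patterns


def permute(flit_bits, pattern):
    """Runtime step: bit k of the flit is stored at position pattern[k]."""
    out = [0] * len(flit_bits)
    for k, bit in enumerate(flit_bits):
        out[pattern[k]] = bit
    return out


def depermute(stored_bits, pattern):
    """Reverse the permutation at the output port."""
    return [stored_bits[pattern[k]] for k in range(len(pattern))]


if __name__ == "__main__":
    patterns = generate_patterns()
    flit = [1, 0] + [k % 2 for k in range(FLIT_WIDTH - 2)]   # example header flit
    pps = 5                                  # 3-bit PPS value selecting one pattern
    stored = permute(flit, patterns[pps])
    assert depermute(stored, patterns[pps]) == flit
    # Each critical bit lands in a different FIFO position under each pattern.
    print([patterns[p][0] for p in range(NUM_PATTERNS)])
```

In hardware the patterns would be fixed wiring in front of multiplexers; the lists here only stand in for that wiring.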

//Bit permutation in each flit, executed at runtime
for i = 1 : No.Router                  //number of routers
  for j = 1 : 5                        //five ports in each router
    for k = 1 : FlitWidth {
      k' = PermPattern( PPS(i, t), k );          //permuted position of bit k
      flitdp(t, i, j) [k'] = flit(t, i, j) [k];
    }

//Permutation pattern generation function, executed offline
function PermPattern(Pid, Bid):
  //Pid: pattern index, and Bid: bit position index
  parameter NumPattern = 8;            //defined by user
  parameter FlitWidth = 36;            //defined by user
  for i = 1 : NumPattern {
    vector(i,:) = perm(1:FlitWidth);
    while (critical bits in vector(i) are placed in
           the same positions in other vectors (any m < i) ):
      vector(i,:) = perm(1:FlitWidth);
  }
  NewBid = vector(Pid, Bid);
  return NewBid;

Figure 6. Pseudo code for proposed dynamic permutation.

[Figure 7 labels: input Flit(t, i, j) with HEAD bit, TAIL bit, SRC bit [2], DEST bit [0], and data bits [5:3], [8:6], [11:9], [23:20] hardwired to MUXes selected by the PPS vector; output Flitdp(t, i, j) with HEAD bit, TAIL bit, SRC bit [2], and DEST bit [0].]

Figure 7. Schematic for the proposed PUF-based flit permutation function dp. The permutation algorithms are implemented with hardwiring before the multiplexers.

[Figure 8 labels: (a) flit bit input, multiplexers controlled by challenge bits X[0], X[1], X[2], and D flip-flops clocked by PPS CLK producing PPS[0], PPS[1], PPS[2]; (b) RRReg0[n] to RRReg4[n], CLK, and outputs X0[n], X1[n], X2[n].]

Figure 8. Proposed circuit diagram for the PPS vector. (a) PUF-like structure, (b) challenge vector generation circuit.

Figure 9. Example time diagram for generation of a new PPS vector.

Figure 10. NoC divided into secure and non-secure zones for experimental setup.

D. PUF-based Vector Generation for Random Permutation Pattern Selection (PPS)

The permutation pattern selection vector determines which permutation algorithm the flit permuting and de-permuting modules will use at the current time. To prevent the attacker from attacking the permutation mechanisms, we propose to dynamically change the selection vector for the permutation algorithm at runtime. The selection vector is generated locally in each router. As shown in Fig. 4, each pair of input and output ports needs to have one proposed Local Random Vector Generator. No centralized random vector management for the entire NoC is needed. This further hardens our method against the adversary, as the local router selection vector will differ at runtime. As the flit permutation configuration in each router is different, cracking one router would not guarantee that the malicious packet has the full network access needed to reach its destination.
We exploit a PUF-like structure [35, 36] and the random content saved in the round-robin register to
generate a PPS vector. As shown in Fig. 8(a), the PUF-like network is made up of multiplexers that
propagate a flit bit along a path determined by the multiplexer challenge vector. Due to process
variations, the propagation delays of the two inputs of each multiplexer are slightly different, so the
two inputs are not equivalent. The challenge vector controls which multiplexer inputs propagate to their
respective outputs. The output of the multiplexer network is latched by the registers at the clock edge.
The period of the user-defined PPS clock pulse should be longer than the time needed for all packets
left in the input and output buffers to move to the next hop. If eight permutation algorithms are
sufficient to obfuscate the flit, a 3-bit PPS vector is needed for runtime dynamic permutation.

[Figure residue, Fig. 9 timing diagram. Signals: Tail Flit Arrival; Forced Input Buffer Full; New PPS Vector Generation Enable; Output Buffer Empty; PPS CLK; PPS Vector (current PPS vector to new PPS vector). Transitions are labeled (1) to (6).]

Besides exploiting process variation, the proposed method dynamically changes the challenge vector
using the internal value of the Round-Robin Registers (RRReg) in each router. As the RRReg value is set
based on the output port requests from the real-time packet traffic, different routers will have unique
RRReg values at a given time. Moreover, the RRReg value varies with the traffic traces extracted
from the NoC application. We propose to exploit the time differential concept to design the RRReg-based
challenge vector generator. Equations (3)-(5) express how we use the RRReg content at different clock
cycles to generate the challenge vector X.
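$$X_2[n] = \mathrm{RRReg}_0[n-3] \oplus \Big( (!X_2[n-3]) \;\&\; \big( \mathrm{RRReg}_4[n-6] \oplus \mathrm{RRReg}_3[n-6] \oplus \mathrm{RRReg}_2[n-6] \oplus \mathrm{RRReg}_1[n-6] \big) \Big) \tag{3}$$

$$X_1[n] = \mathrm{RRReg}_2[n-2] \oplus \mathrm{RRReg}_1[n-2] \tag{4}$$

$$X_0[n] = \mathrm{RRReg}_4[n-1] \oplus \mathrm{RRReg}_3[n-1] \tag{5}$$
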

where RRReg0[n] stands for the first RRReg bit at clock cycle n, and RRReg0[n-2] means the first RRReg
bit stored at clock cycle (n-2). We use a shift register to keep the RRReg history, as shown in Fig. 8(b).
In addition, the PPS vector depends on the incoming flit, so it cannot be predicted at design time.
This unpredictability makes it even more difficult to execute a successful HT attack at runtime.
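
As a rough illustration of the time-differential idea in Equations (3)-(5), the following Python sketch keeps a short history of the 5-bit RRReg value and derives the 3-bit challenge vector from it. It assumes that the RRReg terms are combined with XOR, that the bracketed indices are delays of the form n-k, and that all history registers start at zero; the function names are hypothetical and this is not the authors' hardware description.

```python
from collections import deque


def make_challenge_generator():
    """Return a step() function that maps each new RRReg sample to (X2, X1, X0)."""
    # rr_hist[k] holds the RRReg value sampled at cycle n-k (k = 0..6).
    rr_hist = deque([(0, 0, 0, 0, 0)] * 7, maxlen=7)
    # x2_hist[k-1] holds X2 computed at cycle n-k (k = 1..3).
    x2_hist = deque([0, 0, 0], maxlen=3)

    def step(rrreg):
        # rrreg = (RRReg0, RRReg1, RRReg2, RRReg3, RRReg4) at the current cycle n.
        rr_hist.appendleft(tuple(rrreg))

        def r(bit, delay):
            return rr_hist[delay][bit]

        # Eq. (3), assuming XOR between the RRReg terms.
        parity = r(4, 6) ^ r(3, 6) ^ r(2, 6) ^ r(1, 6)
        x2 = r(0, 3) ^ ((x2_hist[2] ^ 1) & parity)
        # Eq. (4) and Eq. (5) under the same assumption.
        x1 = r(2, 2) ^ r(1, 2)
        x0 = r(4, 1) ^ r(3, 1)
        x2_hist.appendleft(x2)
        return (x2, x1, x0)

    return step
```

Each router would own one such generator, so two routers that see different arbitration histories produce different challenge sequences even with identical logic.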

Dynamic PPS Update


The timing relation between the PPS vector generation and the internal router signals is shown in Fig. 9.
When it is time to change the PPS vector, we wait until a tail flit arrives, signifying that a full packet
has been received. As shown in Fig. 9, upon arrival of the tail flit (1), the input buffer controller forces
the input buffer full signal high in order to stop more flits from entering the input buffers. Once the
input buffer full signal is forced high, transition (2) happens: the PPS generation network is turned on
and allows an incoming flit bit to enter the PPS vector generator. As soon as all of the router's output
buffers are emptied, transition (3) occurs. Transition (3) triggers a clock pulse for the PPS vector
generation, which latches the random PPS vector at the current time (4). The negative edge of the clock
pulse leads to transitions (5) and (6), which turn off the PPS generation network and set the buffer
signals back to normal, respectively.
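
The handshake described above can be read as a small state machine. The Python sketch below is a behavioral approximation only: the state names, the `request_update` trigger, and the one-transition-per-call timing are assumptions made for illustration, and the numbered comments refer to the transitions (1)-(6) in the text.

```python
class PpsUpdateController:
    """Toy model of the PPS update handshake; one tick() call is one router cycle."""

    def __init__(self, initial_pps):
        self.pps = initial_pps
        self.state = "IDLE"
        self.force_input_full = False  # forced "input buffer full" signal
        self.gen_enable = False        # PPS generation network on/off

    def request_update(self):
        # Hypothetical trigger for "it is time to change the PPS vector".
        if self.state == "IDLE":
            self.state = "WAIT_TAIL"

    def tick(self, tail_flit_arrived, output_buffers_empty, candidate_pps):
        if self.state == "WAIT_TAIL" and tail_flit_arrived:
            self.force_input_full = True   # (1) stop new flits from entering
            self.gen_enable = True         # (2) turn the generator on
            self.state = "DRAIN"
        elif self.state == "DRAIN" and output_buffers_empty:
            self.pps = candidate_pps       # (3), (4) clock pulse latches the new vector
            self.state = "RELEASE"
        elif self.state == "RELEASE":
            self.gen_enable = False        # (5) generator off on the falling edge
            self.force_input_full = False  # (6) buffer signal back to normal
            self.state = "IDLE"
        return self.pps
```

The vector only changes while the input buffers are blocked and the output buffers are empty, which mirrors the requirement that no in-flight flit sees two different permutation configurations.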

Experimental Results
Experimental Setup
Our baseline network was a 4x4 mesh NoC that used an XY routing algorithm. The link width was 32 bits,
and our input and output FIFOs had a buffer depth of 8. The crossbar used a round-robin arbiter for
directional priority. A Hamming (15, 11) code was used as the error control codec. The PPS vector is a
3-bit vector for the random selection of 8 permutation patterns. We divided the NoC into secure and
non-secure zones, as shown in Fig. 10. The IP cores in the secure zone are authenticated and trusted. The
IP cores in the non-secure zone may contain some untrusted modules. Packet transmission in the NoC
is restricted by secure-zone and non-secure-zone communication rules. Each secure (non-secure) node
sends packets to any of the other secure (non-secure) nodes with equal probability.
We considered packet injection rates (λ) ranging from 0.0769 to 0.2 packets per cycle per node.
Each simulation ran for over twenty thousand clock cycles so that all methods reached a steady state and
yielded more accurate performance results.
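
For readers unfamiliar with the error control codec named above, the following Python sketch shows a textbook Hamming (15, 11) encoder and single-error corrector. The bit ordering (parity bits at codeword positions 1, 2, 4, and 8) and the function names are illustrative assumptions; the paper does not specify how the code is mapped onto flit fields.

```python
PARITY_POS = (1, 2, 4, 8)
DATA_POS = [p for p in range(1, 16) if p not in PARITY_POS]


def hamming15_11_encode(data_bits):
    """Encode 11 data bits (0/1) into a 15-bit codeword; list index 0 is position 1."""
    if len(data_bits) != 11:
        raise ValueError("expected 11 data bits")
    code = [0] * 16  # index 0 is unused so that indices equal positions
    for pos, bit in zip(DATA_POS, data_bits):
        code[pos] = bit
    for p in PARITY_POS:
        parity = 0
        for pos in range(1, 16):
            if pos != p and (pos & p):
                parity ^= code[pos]
        code[p] = parity  # makes every parity group sum to zero
    return code[1:]


def hamming15_11_decode(codeword):
    """Return (data_bits, syndrome); a non-zero syndrome is the corrected position."""
    code = [0] + list(codeword)
    syndrome = 0
    for pos in range(1, 16):
        if code[pos]:
            syndrome ^= pos
    if syndrome:
        code[syndrome] ^= 1  # flip the single erroneous bit
    return [code[pos] for pos in DATA_POS], syndrome
```

A single flipped bit anywhere in the 15-bit codeword is located by the syndrome and corrected; two or more flipped bits are beyond what this code can repair.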

Related Work under Comparison in this Section

Figure 11. Number of valid packets received. (a) No HT, (b) 1 DEST HT, and (c) 3 DEST HTs.
Figure 12. Number of valid packets received. (a) No HT, (b) 1 HEAD HT, and (c) 3 HEAD HTs.
Figure 13. Number of valid packets received. (a) No HT, (b) 1 TAIL HT, and (c) 3 TAIL HTs.

In this section, we evaluate all methods under HTs that are injected between the input
FIFO and the route computation module. Three HT types were considered in our experiments:
destination address (i.e. DEST HT), header bits (i.e. HEAD HT), and tail bits (i.e. TAIL HT).
The HT injection spots were center routers R5 and R9 and corner router R0. We injected HTs into the
center routers because the incurred effects should be more detrimental for routers with more traffic
passing through. We chose a corner router as another injection spot to view the effects of a HT inside a
router with fewer I/O ports.


The proposed method was compared with existing methods [28, 30, 32, 37]. The work in [28] presents a
data protection unit in detail to address potential hardware Trojans attacking the NoC NI. Saponara et
al. [38] follow a similar idea to detect whether a memory access falls outside the secure memory range in
the NI. We assume that address filtering should always be done at the network interface. Hereafter, the
Data Protection Unit (DPU) [28] is referred to as the Baseline. In [30], a counter-based traffic
adjustment NI is presented to address packet loss, and thus bandwidth depletion, due to hardware
Trojans. We refer to this method as NI-Term. When a packet without a tail flit is left in the NoC,
that packet cannot move forward and will block future packets from entering the network, resulting in
network bandwidth depletion. Timeout mechanisms [39, 40] are typically used to resolve
bandwidth depletion induced by deadlock or link faults. Similarly, the work in [32] uses a counter to
terminate the current packet transmission if malicious behavior is detected. We apply the timeout
mechanism to the router by manually inserting a tail flit after a pre-determined time in order to
terminate the transmission of the current packet. This method is referred to as R-Term. The data
protection unit [28] and the wrapper method in [30] are not designed for router-level hardware Trojan
detection. The work [39] proposes to investigate the feasibility of applying
address filters to NoC routers. Despite minor differences, the key principle of the DPU and Wrapper
methods is to check whether the pair of source and destination addresses carried in the header flit matches
an entry in the table of legal source-destination pairs. If the address pair is not legal, the packet is
filtered out by the router. Hereafter, we name this method R-AddrFilter.
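
The address-pair check behind R-AddrFilter can be sketched in a few lines of Python. The zone membership used below is a hypothetical example (the actual secure zone is defined in Fig. 10), and the helper names are assumptions; the only rule taken from the text is that a packet passes when its source-destination pair is in the table of legal pairs, with traffic confined to its own zone.

```python
def build_legal_pairs(secure_nodes, num_nodes=16):
    """Build the table of legal (source, destination) pairs for two traffic zones."""
    secure = set(secure_nodes)
    non_secure = set(range(num_nodes)) - secure
    pairs = set()
    for zone in (secure, non_secure):
        # Nodes may only talk to other nodes inside the same zone.
        pairs |= {(s, d) for s in zone for d in zone if s != d}
    return pairs


def r_addr_filter(src, dest, legal_pairs):
    """Return True if the header's address pair is legal, False if it is filtered out."""
    return (src, dest) in legal_pairs
```

A DEST HT that rewrites the destination to a node in the other zone is caught by this check, but a rewrite to another node in the same zone passes, which is one reason filtering alone does not restore the lost packet.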


Throughput
We compared the throughput of the different methods by examining the total number of valid packets
received by the NoC NIs in the given simulation time. We considered a packet valid if it eventually
reaches its original destination address, rather than a modified destination. Comparing the performance
shown in Figs. 11(a), (b), and (c), we can see that the proposed method receives the same number of valid
packets for the same traffic injection rate, regardless of how many HTs are inserted. In contrast, the
number of valid packets received by the other methods drops significantly as more HTs are inserted. For
the single DEST HT scenario, the proposed method improves the number of received packets by up to 6.4%,
70.1%, 14.3%, and 11.5% over the baseline, NI-Term, R-Term, and R-AddrFilter, respectively. For the three
DEST HTs scenario, the proposed method achieves an even larger throughput advantage over the other
methods: our approach successfully receives 43.1% more packets than R-AddrFilter. The proposed
method outperforms the other methods in the DEST HT scenario because it is able to detect and
correct a 1-bit change to the destination address.

Figure 14. Link availability comparison among methods that suffer from DEST HTs.
Figure 15. Link availability comparison among methods that suffer from HEAD HTs.
Figure 16. Link availability comparison among methods that suffer from TAIL HTs.

In Fig. 12, we show the impact of HEAD HTs on the number of received packets. Again, our method
outperforms the other approaches. This is because when a header flit is dropped, the other methods cannot
recover that flit, and the packet is lost. The throughput of our method remains unchanged with the
introduction of HTs. The other methods, however, receive 43.3% fewer packets in the three HEAD HTs case
than in the zero HEAD HT case. In addition, the throughput of the R-Term, R-AddrFilter, and baseline
methods remains equal, as those methods provide no protection against HEAD HTs. This is
different from the throughput performance shown in Fig. 11.
When the TAIL HT causes tail flit losses, a more radical change in the number of received packets is
observed than in the prior HT cases. As shown in Fig. 13, the proposed method outperforms the other
methods by receiving more valid packets when TAIL HTs are introduced. Our method achieves the same
throughput in the TAIL HT cases as in the HEAD HT cases. The impact of the different HT types on the
throughput of the other methods is more significant. For instance, as shown in Fig. 13(b), the R-AddrFilter
method receives 94% fewer packets than our method. This performance drop is larger than what was observed
in the HEAD HT case. The drop is more detrimental because the absence of tail flits prevents
routing paths from being released, so the network bandwidth depletes rapidly. The R-Term
method is designed to handle TAIL HTs after a pre-defined waiting period. As the number of inserted HTs
increases, multiple routers waiting for the timeout to insert tail flits will block new packets from being
injected into the network, resulting in a significant decrease in the number of received packets.
Compared to the baseline method, the proposed method improves the throughput by 6.4% (for the 1 DEST
HT case) and 43.12% (for the 3 DEST HTs case) at the highest injection rate. When the HT controls the
header flit bits of the NoC packet, the proposed method provides 11.6% and 43.67% increases in throughput
over the baseline method for 1 HEAD HT and 3 HEAD HTs, respectively. If the HT compromises the tail bits of
the NoC packet, the proposed method achieves up to a 98.8% improvement in throughput for the 3 TAIL
HTs scenario over the baseline method.

Link Availability

Figure 17. Traffic hotspot migration and bandwidth depletion, induced by DEST HTs.

Figure 18. Traffic hotspot migration and bandwidth depletion, induced by HEAD HTs.

Figure 19. Traffic hotspot migration and bandwidth depletion, induced by TAIL HTs.

The variation in the number of received packets shown in Figs. 11-13 demonstrates, from a throughput
perspective, how differently the methods cope with network bandwidth depletion when a HT
is present. Now, let us examine the effectiveness of each HT mitigation method from a link availability
point of view. The switching activity of each NoC link is recorded throughout the simulation, and the
total number of idle links at each clock cycle is accumulated. Given a traffic trace, a smaller number of
idle links indicates a more severe level of bandwidth depletion.

In equation (6), we define the network bandwidth availability as the ratio of the number of cycles in
which n links are idle to the total number of simulation cycles.

$$\text{Availability}_{n\ \text{idle links}} = \frac{\text{Cycles for } n \text{ Idle Links}}{\text{Total Simulation Cycles}} \tag{6}$$
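
Equation (6) amounts to a normalized histogram over the per-cycle idle-link count. The Python sketch below shows one way to compute it from a simulation log; the input format (one idle-link count per clock cycle) and the function names are assumptions, and the weighted mean is offered only as one plausible reading of "average link availability".

```python
from collections import Counter


def link_availability(idle_links_per_cycle):
    """Map each idle-link count n to (cycles with n idle links) / (total cycles)."""
    total = len(idle_links_per_cycle)
    if total == 0:
        raise ValueError("empty simulation trace")
    counts = Counter(idle_links_per_cycle)
    return {n: c / total for n, c in sorted(counts.items())}


def average_idle_links(availability):
    """Availability-weighted mean number of idle links (an assumed summary metric)."""
    return sum(n * a for n, a in availability.items())
```

For example, a four-cycle trace with 40, 40, 42, and 38 idle links gives availabilities of 0.25, 0.5, and 0.25 for 38, 40, and 42 idle links, and a mean of 40 idle links.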

Subplot (a) in Figs. 14-16 shows the link availability of the baseline NoC without HT
insertion. Figure 14 shows the bandwidth availability of the NoC for each HT mitigation method
against three DEST HTs injected into the NoC routers. The traffic injection rate, λ, was 0.2 packets per
cycle per node. If the histogram peaks toward a larger number of idle links (i.e. higher availability),
the corresponding HT mitigation method successfully manages the presence of HTs. In contrast, if the peak
is shifted toward a smaller number of idle links, the method may be suffering more from
depleted link bandwidth. As shown in Fig. 14, the R-AddrFilter and NI-Term methods exhibit less link
availability than the baseline, R-Term, and proposed methods if DEST HTs appear in the routers. The
proposed method achieves 19.1% and 22.5% higher average link availability than the R-AddrFilter and
NI-Term methods, respectively.
Now, we examine the impact of three HEAD HTs on the link availability achieved by the different methods.
Comparing Figs. 14 and 15, we can see that only the proposed method obtains the same link availability
as the baseline without any HTs. This is because our method detects the HEAD HT and
reconstructs the packet header flit in the same clock cycle, while the other methods lack protection against
HEAD HTs. With three HEAD HTs injected into the NoC routers, the average number of idle links for the
baseline, R-Term, and R-AddrFilter methods is reduced by 10% relative to the baseline NoC in the 0 HT case;
the NI-Term method suffers a 29.8% decrease in the average number of available links relative to the 0 HT
case. The decreases in the link availability of the other methods result from the extra payload and tail
flits propagating into the router FIFOs, which hinder network performance by wasting
FIFO buffer space.

Figure 20. Average packet latency for a wide range of traffic injection rate and different number of HTs.
Figure 21. Effective average packet latency.


When three TAIL HTs are injected into the NoC routers, our method is the only one that follows the trend of
the baseline without HTs, as shown in Fig. 16. Due to the HT payload effects, the other methods show
signs of bandwidth depletion, as certain links have data stuck on them for extended periods of time. The
proposed method improves the average link availability by 33.7%, 33.7%, 26.4%, and 43.7% over the
baseline, R-AddrFilter, NI-Term, and R-Term methods, respectively.

Traffic Hotspot Migration and Bandwidth Depletion


We monitored the arrival of new flits at each output port of the routers to estimate the switching
activity (and thus the power consumption) of each router throughout the entire simulation. The total number
of flits transferred through each router is accumulated and then plotted as a tile plot, as shown
in Figs. 17-19. Each square in the figure represents the switching activity of one router, placed at the
corresponding coordinates in the physical NoC implementation. The color of each square indicates the
number of output port switching operations that occur over the entire simulation period. Figures
17-19 show the impact of two DEST HTs, two HEAD HTs, and two TAIL HTs on the router switching
activities, respectively. The tile plots of each method were compared to the baseline NoC without HTs in
the network, which is subplot (a) in each figure.
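
The accumulation behind such a tile plot can be sketched as follows in Python. The event log format (one router ID per new flit observed at an output port) is an assumption; the ID-to-coordinate mapping follows the coordinates quoted in this section, where R0, R1, R4, and R5 sit at (X=1, Y=1), (X=2, Y=1), (X=1, Y=2), and (X=2, Y=2).

```python
def router_coords(router_id, width=4):
    """Map a router ID to 1-based (X, Y) coordinates in a row-major mesh."""
    return (router_id % width + 1, router_id // width + 1)


def switching_activity(flit_events, width=4, height=4):
    """Count output-port flit events per router; grid[Y-1][X-1] is one tile."""
    grid = [[0] * width for _ in range(height)]
    for router_id in flit_events:
        x, y = router_coords(router_id, width)
        grid[y - 1][x - 1] += 1
    return grid
```

The resulting grid can be passed to any heat-map routine; comparing the grid of a method under attack against the 0 HT baseline grid exposes hotspot migration as tiles whose counts rise or fall.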
Among the tile plots in Fig. 17, the proposed method exhibits the activity closest to that of the
baseline with 0 HTs. The proposed method shows more intense switching than the baseline on two routers
because of a measurement error: whenever a correction occurs, it is counted as two switching operations
instead of one. The other methods show less router switching either because the contaminated
packets are filtered out by the HT mitigation method (e.g. R-AddrFilter), or because the packets are
transmitted based on a counter (e.g. NI-Term). The baseline NoC with 2 DEST HTs differs from the 0 HT
case at a few routers, due to data being routed to the wrong local port. Another example is router R1 (X=2
and Y=1) in the baseline NoC, which exhibits more switching activity with 2 HTs than with 0 HTs. This
is because R1's neighbor routers R0 (X=1 and Y=1) and R5 (X=2 and Y=2) are the HT injection sites.
Traffic migration due to the DEST HTs can also be seen on router R4 (X=1, Y=2).
The tile plots for the HEAD HT case are shown in Fig. 18. As none of the methods except the proposed
method can prevent the effects of the HEAD HTs, those methods have less link usage at the
affected routers. This indicates that the R-Term, Baseline, R-AddrFilter, and NI-Term methods
receive fewer packets than the baseline NoC in the no-HT case, which is confirmed by our results in Fig.
12. Note that the NI-Term method receives the fewest packets, as its total simulated traffic is lower than
that of the other methods due to the counter-based traffic adjustment.
Figure 19 shows the tile plots for the traffic condition of the NoC with different HT mitigation methods
against two TAIL HTs. Only the R-Term and proposed methods can mitigate TAIL HTs; the other
methods therefore suffer from bandwidth depletion. Our method corrects the tail flit if a HT is detected.
In contrast, the R-Term method depends on a counter to time out the suspended packet transmission. The
counter-based waiting process eventually makes the network halt. This is why the overall switching activity
of the R-Term method is low in this case, even though this method provides some protection against
TAIL attacks.

Effective Average Packet Latency


The packet latency is the time interval from the moment when the packet is completely injected into the
network to the time when the tail flit of that packet arrives at the packet destination. Figure 20 shows the
average packet latency of the proposed method, which does not increase with an increasing number of
DEST HTs. When the traffic injection rate increases, the average latency varies by no more than 2.2%.
As some HTs can cause erratic latency results in the other methods (for instance, infinite
latency), the average latency would become meaningless. Therefore, we use an effective average latency
metric to compare the latency performance of the different methods. As expressed in (7), the effective
average latency (i.e. Avg. LatencyEffective) is the product of the real average latency for all valid packets
and the ratio of the valid packets received by the baseline without HTs to those collected by the method
under test.
$$\text{Avg. Latency}_{\text{Effective}} = \text{Avg. Latency} \times \frac{\text{Valid Packets}_{\text{Baseline w/o HT}}}{\text{Valid Packets}_{\text{method under test}}} \tag{7}$$

Using this effective latency, we are able to consider the impact of HTs on latency fairly, given some
methods will never receive the packets affected by HTs. As shown in Fig. 21, the proposed method
improves the effective average latency by up to 63.4%, 68.2%, and 98.9% over the existing methods for the
single DEST HT, single HEAD HT, and single TAIL HT scenarios, respectively.
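
Equation (7) can be written as a one-line helper. In the Python sketch below, the argument names are hypothetical, and returning infinity when the method under test delivers no valid packets is an assumption added here to keep the function total; the paper does not state how that corner case is handled.

```python
def effective_average_latency(avg_latency, valid_baseline_no_ht, valid_method):
    """Scale the measured average latency by the valid-packet ratio of Eq. (7)."""
    if valid_method == 0:
        # Assumed convention: no valid packets means unbounded effective latency.
        return float("inf")
    return avg_latency * valid_baseline_no_ht / valid_method
```

With made-up numbers, a method that measures 20 cycles of average latency but delivers only 500 of the 1000 valid packets seen by the HT-free baseline is charged an effective latency of 40 cycles, so losing packets cannot make a method look faster.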

Figure 22. Success rate PAS of tail bit attacks for different number of permutation configurations available in a router.

Figure 23. Statistical pie chart for the number of valid bits that an adversary modifies on destination address to perform a successful attack.

Figure 24. Probability of attack success PAS of the proposed method if the adversary knows the permutation configurations applied in the
router.

[Figure residue, Fig. 23 pie chart labels: 2-bit change: 16.35633%; 3-bit change: 1.37491%; 4-bit change: 0.05198%; 5-bit change: 0.00092%; 6-bit change: 0.00000%.]
[Figure residue, Fig. 24 plot labels: y-axis Probability of Attack Success (0 to 1); x-axis Number of Permutation Configurations in Router (1 to 16); trend line y = 0.4887x^-1.402; labeled value 0.0144.]

Comparison of Area Cost and Power Consumption


We implemented the 4x4 NoCs with the compared HT detection/mitigation methods in Verilog HDL
and synthesized the source code in Synopsys Design Vision with a 65nm TSMC library. The clock
frequency was set to 500 MHz for all the methods compared in this section.
The area shown in the second column of Table 1 represents the total hardware cost needed at each
NoC node. If the HT detection/mitigation mechanism is applied in the NI, we sum the area of the
router and the HT countermeasure in the NI as the total area. A similar computation is performed for the
dynamic and leakage power measurements. As shown in Table 1, the proposed method increases the area
by 39% and 17% over the baseline and NI-Term, respectively. The area overhead is caused by the
permuting mechanisms and the extended internal flit bit widths due to the ECC. Note that because we
implement the permutation with multiplexers rather than with a software-level algorithm, our method
increases the router area. Our method consumes more area than R-Term and R-AddrFilter but achieves
better latency and bandwidth availability, as discussed in the previous subsections. Although the area
overhead of our method is higher than that of the other methods, it is still significantly less than that of
an encryption module. For instance, an AES-128 encryption module costs 89830 µm^2, which is larger than the
entire router. Balancing security, performance, and cost, our method is feasible for security-critical
applications.
We also compare the power consumption in Table 1. As shown in the third column, our
method increases the dynamic power consumption by 13.1% over the baseline. NI-Term consumes the
highest dynamic power due to the buffer and the counter used in the NI for packet termination. Although
the proposed method increases the router switching power over the baseline, R-Term, and R-AddrFilter, the
increased power consumption is still less than the dynamic power of the pipelined AES. Compared with
the other methods, our method consumes a comparable amount of leakage power.

Figure 25. Impact of different traffic injection rates and traffic traces on the occurrence of different PUF challenge vectors for the dynamic
permutation configuration.

Probability of HT Attack Success (PAS)


PAS for the Case of Countermeasure Unknown
We used the probability of attack success (PAS) as a metric to evaluate the obfuscation efficiency of the
proposed bit permuting mechanism. PAS is defined as the ratio of the number of cases in which the hardware
Trojan successfully selects the bits belonging to the flit field of interest to the total number of test cases.
If the attackers do not have sufficient knowledge of the specific permutation configuration, they will simply
overwrite the field of interest at the position that they blindly guessed.
Figure 22 shows the PAS achieved by our permutation method against tail bit attacks for NoCs with flit
sizes of 32 bits, 64 bits, and 128 bits. Without the permutation, we assume that the hardware Trojan can
always successfully control the tail bit, i.e. PAS is 1. Since our permutation configuration is
dynamically chosen by the PPS vector, our PAS is less than 1, as shown in Fig. 22. We examined the impact
of the flit width on the PAS for the tail bit attack. Our simulation results show that the average PAS of our
method is 0.0206, 0.0104, and 0.0053 for 32-bit flits, 64-bit flits, and 128-bit flits, respectively.
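
To make the PAS definition concrete, the following Python sketch estimates it by Monte Carlo simulation for a deliberately simplified attacker. Everything in the model is an assumption for illustration: the permutation configurations are drawn as uniformly random permutations, the runtime PPS selection is replaced by a uniform random choice, and the Trojan guesses one bit position uniformly at random. It is not expected to reproduce the values reported above, which come from the authors' router implementation.

```python
import random


def estimate_pas(flit_width, num_configs, trials, seed=0):
    """Fraction of trials in which a blind guess hits the permuted tail-bit position."""
    rng = random.Random(seed)
    tail_pos = 0  # hypothetical position of the tail bit before permutation
    configs = []
    for _ in range(num_configs):
        perm = list(range(flit_width))
        rng.shuffle(perm)  # perm[i] is the new position of original bit i
        configs.append(perm)
    hits = 0
    for _ in range(trials):
        perm = rng.choice(configs)          # stand-in for the runtime PPS vector
        guess = rng.randrange(flit_width)   # attacker's blind guess
        if perm[tail_pos] == guess:
            hits += 1
    return hits / trials
```

In this toy model the estimate approaches 1/flit_width, so wider flits lower the success rate, which agrees in direction with the trend reported for 32-bit, 64-bit, and 128-bit flits.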

Figure 23 shows the probability of partially or completely identifying the exact flit-field positions
and replacing them with the intended malicious content. The flit field of interest holds a binary vector
that may share bits with the intended malicious vector. Even if the adversary does not correctly guess
all of the target bits, they may still change the target field to a different malicious vector. As shown in
Fig. 23, the attacker can successfully perform an attack by changing one of the destination address
bits with a probability of 82.2%. A successful change of 3 destination address bits occurs with a
probability of 1.37%, which is over one order of magnitude lower than in the 1-bit case. No 6-bit change
of the destination address was observed among one million test cases. Figure 23 indicates that
our error control code can obtain a PAS below 10^-6 even if a 5-bit malicious vector is injected by the
attacker.

PAS for When the Adversary Knows the Countermeasure


The goal of the HT countermeasures in [30, 32, 33] and of our approach is to minimize the HT attack
success rate (PAS). A lower PAS means a more secure design. Previous work [30, 32, 33] assumes that the
attacker does not know which HT countermeasure mechanism is used in the design.
Unfortunately, this assumption is not always valid if the attacker has access to the design details. The
methods in [30, 32, 33] yield a PAS of 1 when the applied HT countermeasure is known to the adversary.
This is because the adversary may be able to change the restricted memory range and mute the counter for
HT detection. In contrast, our method assumes that the adversary may know the bit permuting mechanism
in the design but does not know the exact PPS vector. This is because the PPS vector is uniquely
generated in each router at runtime. As shown in Fig. 24, PAS decreases significantly as the
number of permutation configurations increases. The nonlinear trend line is fitted by the power function
y = 0.4887x^-1.402. As predicted, if the number of permutation configurations available in the router is
equal to 16, our method achieves a PAS of 1.04*10^-2 in the case that the countermeasure is known to the
adversary. Our PAS is two orders of magnitude lower than that of the previous work [30, 32, 33].
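
The power-law trend line quoted from Fig. 24 can be evaluated directly. The Python sketch below uses the rounded coefficients given in the text; the function name is hypothetical, and extrapolating beyond the plotted range of 1 to 16 configurations is not supported by the data.

```python
def pas_trend(num_configs, a=0.4887, b=-1.402):
    """Evaluate the fitted trend y = a * x**b for x permutation configurations."""
    if num_configs < 1:
        raise ValueError("need at least one permutation configuration")
    return a * num_configs ** b
```

With these rounded coefficients, the trend gives about 0.185 for 2 configurations, about 0.026 for 8, and about 1.0*10^-2 for 16, which is in line with the 1.04*10^-2 reported above for 16 configurations.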

Randomness of Proposed Dynamic Permutation Configuration


The randomness of the dynamic permutation configuration comes from the following two factors: (1)
the PUF structure implemented in each router produces different select bits due to process variations, and
(2) the challenge bits from the round-robin table depend on the real-time traffic injection rate and traffic
pattern. As a result, the selection of the dynamic permutation configuration is unknown at design time.
To balance the hardware cost against the security achieved by the dynamicity proposed in this
work, in this section we use eight configuration patterns provided by the PUF structure to demonstrate the
permutation unpredictability of our method. In each NoC router, we implemented eight configurations of
dynamic bit permutation. Figure 25 clearly indicates that the round-robin-table-based dynamic permutation
makes the permutation configuration unpredictable, thus raising the bar for the attacker to insert a
meaningful hardware Trojan. If one increases the number of multiplexers and registers used in the PUF
structure, more challenge vectors will be generated, further improving the randomness of the dynamic
pattern selection vector. One traffic trace for Fig. 25 was the secure and non-secure zone case shown in
Fig. 10, with a traffic injection rate of 0.143 packets per cycle per node. We also examined the impact of
different traffic injection rates and traffic traces on the occurrences of different challenge vectors. As
shown in Fig. 25, different traffic injection rates for the same traffic trace can lead to significant
differences in the occurrence frequency of different challenge vectors. For instance, configurations 0 and 5
yield -73.9% and +141.4% occurrence changes, respectively, when the traffic injection rate λ varies from
0.143 to 0.2 packets per cycle per node. When the uniform traffic trace is injected into the network, the
frequency of some challenge vectors (e.g. 1 and 7) more than doubles.
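
Occurrence statistics of this kind can be tallied from a log of the challenge vectors observed during simulation. The Python sketch below assumes the log is a sequence of configuration IDs from 0 to 7; the function names are hypothetical, and no data from the paper is embedded in it.

```python
from collections import Counter


def occurrence_counts(challenge_log, num_configs=8):
    """Count how often each configuration ID appears in a simulation log."""
    counts = Counter(challenge_log)
    return [counts.get(cfg, 0) for cfg in range(num_configs)]


def percent_change(before, after):
    """Relative change in occurrences between two runs, in percent."""
    if before == 0:
        raise ValueError("no occurrences in the reference run")
    return 100.0 * (after - before) / before
```

Comparing `occurrence_counts` for two injection rates with `percent_change` yields signed figures of the same form as the occurrence changes quoted above.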
Figure 26 shows the occurrences of different PUF challenge vectors over the 21,000 clock cycles. As can
be seen, each router demonstrates a unique characteristic of the occurrence frequency for the permutation
configurations. If we increase the number of dynamic patterns available in each router, the randomness
achieved by the routing-history-based PUF output will be further improved. More PUF challenge vectors
will make it more difficult for the attacker to successfully insert a meaningful HT in the NoC router to
deplete the network bandwidth.

Figure 26. Occurrence numbers of PUF challenge vectors for different NoC routers. The X-axis is the decimal number of each dynamic
permutation configuration ID. The Y-axis is the number of occurrence times of that challenge vector in the total simulation period.

Conclusion


Security threats in NoCs, such as hardware Trojans, are receiving an increasing amount of attention.
Although HT detection and mitigation approaches have been investigated for general integrated circuits and
systems, the existing methods are not customized for HTs in Networks-on-Chip (NoCs). In this work, we
propose to harden the NoC design, rather than detecting a rogue NoC at the software or firmware level.
Previous studies of HTs in NoCs mainly focused on the network interfaces instead of the routers. To fill
this gap, we propose a collaborative dynamic permutation and flit integrity check method to prevent an
attacker from maliciously modifying the flit content in the routers. Our method complements the previous
NoC works aiming at NI security. To make the flit permutation dynamic, we further propose a
PUF-based random selection vector generation method, which selects exactly one of
the multiple permutation configurations in each router. As each router selects the permutation configuration
independently, our method makes it more difficult for the attacker to perform a successful HT attack
through multiple routers.
Gate-level hardware assessments show that our method outperforms address filtering in NIs, packet
termination in NIs, address filtering in routers, and packet termination in routers in terms of
throughput, link availability, bandwidth depletion, and average latency. In addition to evaluating the impact
of destination address HTs on the NoC performance, we also extensively compare our method with other
approaches by considering scenarios in which header and tail bits are controlled by the HTs. Despite an
increase in the number of injected HTs and variation in HT types, our method always attains the best
performance. Our method improves the number of valid received packets by up to 70.1% over other methods if
the HT controls the destination address. The proposed method improves the average link availability by up to
43.7% over other methods. Our approach improves the effective average latency by up to 63.4%,
68.2%, and 98.9% over the existing methods for the single destination HT, header HT, and tail HT
scenarios, respectively. More interestingly, we assess the attack success rate of our method when the
attacker knows the applied countermeasure. Simulation results show that our method can keep the
attack success rate on the order of 10^-2, while other methods have an attack success
rate of 1.

One limitation of our method is its area and power overhead. The proposed method consumes 39% more
area and 13% more dynamic power than the baseline design. In future work, we will optimize the hardware
implementation to reduce the area overhead. To assess the router- and NI-level address filter methods, we
used pseudo secure and non-secure traffic zones. In future work, we will also evaluate the NoC
performance under real secure and non-secure traffic scenarios.

Jonathan Frey (S'14) is pursuing his Master of Science degree in electrical and computer engineering at the
University of New Hampshire, Durham, NH. He expects to receive his Master's degree in December 2015.
His research interests include network-on-chip and hardware Trojan detection.

Qiaoyan Yu (S'03-M'11) received the B.S. degree from the Xidian University of China, Xi'an, China, in
2002, the M.S. degree from the Zhejiang University of China, Hangzhou, China, in 2005, and the Ph.D.
degree in electrical and computer engineering from the University of Rochester, Rochester, NY, in 2011. Dr.
Yu is currently an Assistant Professor with the Department of Electrical and Computer Engineering,
University of New Hampshire, Durham, NH. Her research interests include hardware security and trust,
cyber-physical systems, error control for networks-on-chip, fault tolerance for many-core systems, and
emerging nanoelectronics.
Dr. Yu serves on the technical program committees of DFT, ASP-DAC, GLSVLSI, and ISCAS. She is a
member of the editorial boards of Integration, the VLSI Journal, Microelectronics Journal, and Journal of Circuits, Systems, and
Computers.

Table 1. Hardware cost and power consumption

Design          Area (µm²)   Dynamic Power (mW)   Leakage Power (µW)
Baseline        41359.3      12.5229              202.8677
NI-Term         49080.2      15.5648              240.6631
R-Term          41375.9      12.5229              202.9453
R-AddrFilter    42040.1      12.2889              206.3470
Proposed        57514.3      14.1659              287.7810
Pipelined AES   89830.7      4.5501               352.54
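
The relative overheads quoted in the conclusion (39% more area and 13% more dynamic power for the proposed design) can be checked against the Table 1 entries. The following is a minimal sketch, not part of the original evaluation flow: the dictionary layout and the helper names `overhead_pct` and `overheads` are illustrative assumptions, and the numbers are transcribed from Table 1 in the column order area, dynamic power, leakage power.

```python
# Values transcribed from Table 1: (area, dynamic power, leakage power).
# Units are those of the table; only ratios to the baseline are used here.
TABLE_1 = {
    "Baseline":      (41359.3, 12.5229, 202.8677),
    "NI-Term":       (49080.2, 15.5648, 240.6631),
    "R-Term":        (41375.9, 12.5229, 202.9453),
    "R-AddrFilter":  (42040.1, 12.2889, 206.3470),
    "Proposed":      (57514.3, 14.1659, 287.7810),
    "Pipelined AES": (89830.7, 4.5501, 352.54),
}


def overhead_pct(value, baseline):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


def overheads(design, reference="Baseline"):
    """Return (area, dynamic, leakage) overheads of `design` vs. `reference`."""
    return tuple(
        overhead_pct(v, b) for v, b in zip(TABLE_1[design], TABLE_1[reference])
    )


if __name__ == "__main__":
    # Print one line per design with its overheads relative to the baseline.
    for name in TABLE_1:
        area, dyn, leak = overheads(name)
        print(f"{name:14s} area {area:+7.2f}%  dynamic {dyn:+7.2f}%  leakage {leak:+7.2f}%")
```

For the proposed design this gives an area overhead that rounds to 39% and a dynamic power overhead that rounds to 13%, consistent with the figures stated in the conclusion.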