IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 41, NO. 11, NOVEMBER 2022 3661

BLAST: Belling the Black-Hat High-Level Synthesis Tool
Mohammed Abderehman, Rupak Gupta, Rakesh Reddy Theegala, and Chandan Karfa, Senior Member, IEEE

Abstract—A hardware Trojan (HT) is a malicious modification of the design done by a rogue employee or a malicious foundry to leak secret information, create a backdoor for attackers, alter functionality, degrade performance, and even halt the system. In Black-hat high-level synthesis (HLS) (Pilato et al., 2019), the authors introduced the possibility of HT insertion in the register transfer level (RTL) design by the HLS tool itself. Specifically, the degradation attack (DA), the battery exhaustion (BE) attack, and the downgrade attack (DG) were proposed in that work. In this study, we show how all three HTs inserted by Pilato et al. (2019) can be detected using a C-to-RTL equivalence checking framework. We assume that both the input C code and the Trojan-infected RTL code are available for our analysis. Specifically, our framework extracts an RTL-level finite-state machine with datapath (RTL-FSMD) from the HLS-generated RTL. During finite-state machine with datapath (FSMD) construction, a BE attack can be identified. Our proposed method then compares the FSMD of the input C code with the RTL-FSMD to identify the DA and the DG. The experimental results confirm the detection of HTs of the black-hat HLS tool.

Index Terms—Finite-state machine with datapath (FSMD), hardware Trojan (HT), high-level synthesis (HLS), HLS optimizations, register transfer level (RTL), rewriting method.

Manuscript received 5 August 2022; accepted 5 August 2022. Date of current version 24 October 2022. The work of Chandan Karfa was supported in part by the Department of Science and Technology (DST), India, under Grant CRG/2019/001300, and in part by the Qualcomm Faculty Award 2021. This article was presented at the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) 2022 and appeared as part of the ESWEEK-TCAD special issue. This article was recommended by Associate Editor A. K. Coskun. (Corresponding author: Mohammed Abderehman.)
The authors are with the Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati 781039, India (e-mail: ma.adem@iitg.ac.in; rgupta@iitg.ac.in; rakes170101071@iitg.ac.in; ckarfa@iitg.ac.in).
Digital Object Identifier 10.1109/TCAD.2022.3200513

I. INTRODUCTION

THE COMPLEXITY of modern day integrated circuits (ICs) is growing exponentially [2]. To keep pace with this complexity and to reduce design time, the use of high-level synthesis (HLS) [3] tools is rapidly increasing. About 14 out of the top 20 semiconductor companies are using HLS tools for IC development [4]. The HLS tool converts the high-level C/C++ input specification into an equivalent register transfer level (RTL) design. The substeps of the HLS process are: 1) preprocessing, which applies various compiler optimizations; 2) the scheduling phase, which assigns each operation to a time step; 3) allocation and binding, which identifies the minimum functional units (FUs) and registers for the operations and the variables, respectively, of the C specification based on the schedule; and 4) datapath and controller generation, which creates the datapath interconnections and a controller finite state machine (FSM). The RTL consists of a datapath and a controller FSM. A higher abstraction level, shorter design cycle, easy design space exploration, 10× less coding at a higher level, and shorter verification time make HLS an attractive starting point for IC developers [5].

Hardware Trojans (HT) [6] are malicious design modifications by an adversary to change functionality, degrade performance, leak information, or cause denial of service. The HT has two main parts: 1) a trigger: a circuit (signal) that activates the Trojan and 2) a payload: a circuit that performs the malicious function activated by the trigger signal. Most HTs are activated by a rare condition. The circuit performs correctly in normal scenarios. Therefore, HTs are very hard to detect during the presilicon validation phase. Once the HT is activated, the circuit will start malfunctioning. HTs may also be inserted in any phase of the design cycle by an untrusted synthesis tool [1], [7]. The impact of HTs includes economic damage, planned obsolescence, or cyber-attacks on national assets [8]. Therefore, the detection of HTs is an important task for securing the design. This has been an active domain of research in this decade [9], [10], [11], [12].

Commercial electronic design automation (EDA) companies sell proprietary HLS CAD tools with a set of IPs as their component library. It may be the case that the licensed software is altered by a rogue employee. As a result, the HLS tool will generate Trojan-infected hardware which may not perform as expected after a certain time (i.e., once the Trojan gets activated). The employee may do this to create significant economic damage for the company, to give attackers access to the secret key of cryptographic hardware, or to create a bad name for the company. Since the HT primarily reuses the actual datapath components, it will be hard to detect in the testing phase.

In a recent study [1], [7], it is shown that an HT can actually be inserted by the HLS tool itself. The authors have shown that it is easier to insert HTs with the HLS tool than with other EDA tools such as logic synthesis and physical synthesis tools. Specifically, the Black-Hat HLS tool [1] inserts three types of HTs: 1) the battery exhaustion attack (BE), which may increase power consumption; 2) the degradation attack (DA), which degrades the performance of the IPs; and 3) the downgrade attack (DG), which reduces the security level of the design. Since the HLS process transforms untimed C/C++ code into timed RTL code and applies various optimizations in each of its substeps, it is a difficult task for formal verification tools to find the correlation between the initial specification and the RTL generated by the HLS tool [13]. Therefore, simulation is the primary way to verify the correctness of the HLS result. Since simulation does not provide complete coverage, HTs inserted during HLS are likely to go undetected.

The objective of this work is to develop a formal HLS Trojan detection framework. Since an HLS tool user generates RTL from an initial C specification, we assume that a detection framework has access to both the initial C code and the
1937-4151 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY GUWAHATI. Downloaded on May 11,2023 at 11:01:43 UTC from IEEE Xplore. Restrictions apply.
corresponding RTL code. However, it does not have access to any intermediate synthesis information of the HLS tool, such as the scheduling of operations or the variable-to-register mapping. The question is “can we detect the HLS Trojan by comparing the generated RTL with the initial C specification?” It may be noted that our objective is not to prove the equivalence between the C and the RTL; rather, we try to find the difference between these two behaviors. This behavioral difference may lead to the detection of HLS HTs.

Our HT detection framework is developed by utilizing two of our previous works, FastSim [14] and DEEQ [15]. In [14], a way to extract a high-level behavior from the HLS-generated RTLs is shown. In [15], that high-level behavior of the RTL is used to prove the equivalence between the C and RTL. For completeness, we discuss the ideas of [14] and [15] briefly in this article. However, neither FastSim nor DEEQ can detect HLS-inserted HTs. In this work, we develop an HT detection framework by building on both. Specifically, we look for any inconsistency or difference during the extraction of a high-level behavior from the RTL in [14] or during equivalence checking in [15]. Once such a difference or inconsistency is identified, we analyze it further to detect the HTs. Specifically, the contributions of this article are as follows.
1) A detection mechanism called BLAST for HLS tool inserted HTs [1] is presented here.
2) BLAST utilizes a method that extracts a high-level behavior from RTL and a C-to-RTL equivalence checking method for HT detection.
3) We show that all HLS Trojans presented in [1] can be identified by BLAST.
4) A prototype of BLAST is implemented. The experimental results show the usefulness of the proposed method.
This is the first attempt to detect the HLS inserted HTs.

The remainder of this article is organized as follows. In Section II, related works are discussed. Background and an overview of BLAST are presented in Section III. Detection of HTs is presented in Sections IV–VI. The experimental results are presented in Section VII. The performance of BLAST for various HLS optimizations is discussed in Section VIII. Section IX concludes this article.

II. RELATED WORKS

A. Hardware Trojan Detection
The HT detection mechanisms depend on the deployment phase (e.g., specification, RTL, layout, and fabrication) and the required inputs (e.g., golden chips). A survey of several techniques for detecting HTs at different stages of the design flow has been presented in [16]. In [17], an optical inspection-based HT detection technique is presented. In this method, the layout of the circuit under test is compared with a picture of the manufactured circuit under test, obtained by removing the layers one by one. This method requires sophisticated and highly accurate techniques to obtain and analyze the die photo of the chip under test. However, the process is expensive and time consuming to apply. Jha and Jha [18] proposed a randomization to compare, in probability, the functionality of the original design and the final circuit. Salmani et al. [19] presented a technique to increase the probability of generating a transition in a Trojan and analyze its activation time. In both cases, however, it is not guaranteed to find the test vectors capable of triggering Trojans and, therefore, detection using this technique is not guaranteed.

In [20], a side channel-based HT detection mechanism has been presented. In this method, principal component analysis (PCA) is used as a side-channel fingerprint of the circuit to compare it with the golden model. However, the characteristics of the physical design can be modified by other factors and not only by an HT. As a result, HT detection may be ineffective and time consuming. Liu et al. [21] have replaced the requirement of the golden model by using a golden parametric signature obtained from a trusted simulation model, parameters from the die, and advanced statistical modeling techniques. However, the requirement of a precise model of the process makes the technique difficult. In [22], a run-time detection technique has been presented. Both hardware and software are used to detect HTs. Additional circuitry (logic) is added in order to support security monitoring at run time. However, this technique is expensive in terms of circuit area. Hicks et al. [9] presented an HT detection method at the RTL level. The HT detection problem is formulated as an unused circuit identification (UCI) problem. However, how to define an unused circuit is not easy and is not quite clear. Therefore, most of these approaches compare a golden model with the design circuit to detect HTs. To the best of our knowledge, there are no techniques that can detect high level synthesis (HLS) HTs [1].

B. Verification of HLS
Formal verification of HLS is still evolving. Most of the existing techniques propose phase-wise verification of HLS [23]. These methods rely on intermediate synthesis information from the HLS tool. Several path-based equivalence checking methods have been proposed for verification of compiler optimization and scheduling tasks [24], [25], [26], [27]. In these methods, the input C specification and the scheduled behavior are modeled by FSMs with datapaths (FSMDs). In general, path-based approaches decompose each FSMD into a set of paths and the equivalence is established by showing path-level equivalence between two FSMDs. There are a few works that target verification of register allocation [28] and the datapath and controller generation phase [29] as well. However, these methods are not applicable for end-to-end verification of HLS. A recent work [15] proposes C to RTL equivalence checking for HLS. In the presence of an HT in the RTL, this method may not show the equivalence or may result in a false positive. Therefore, the method needs to be tuned and further analysis is needed to detect HLS HTs. Therefore, these approaches are not directly applicable for HLS HT detection.

III. BACKGROUND AND OVERVIEW OF BLAST
In this work, we model C and RTL as FSMDs. The FSMDs and the equivalence theory are discussed briefly here.

A. FSMDs and Their Equivalence
An FSMD is an inherently deterministic model that can represent any hardware circuit [30]. An FSMD $M$ is defined as a 7-tuple $\langle Q, q_0, I, O, V, f, h \rangle$, where $Q$ is the finite set of states, $q_0 \in Q$ is the reset (initial) state, $I$ is the finite set of input variables, $O$ is the finite set of output variables, $V$ is the finite

ABDEREHMAN et al.: BLAST: BELLING THE BLACK-HAT HLS TOOL 3663

set of storage variables, $f : Q \times 2^S \rightarrow Q$ is the state transition function, and $h : Q \times 2^S \rightarrow U$ is the update function. Here, $S$ represents the set of relations over arithmetic expressions and Boolean literals, and $U$ represents a set of storage and output assignments.

A trace of an FSMD is a finite walk from the reset state $q_0$ to itself, and $q_0$ should not occur in between. The condition of execution $c_\tau$ of a trace $\tau$ is a logical expression over $I$, which must be satisfied by the initial data state in order to traverse the path $\tau$. The data transformation $s_\tau$ of a trace $\tau$ over $O$ is an ordered tuple $\langle e_j \rangle$ of algebraic expressions over $I$ such that the expression $e_j$ represents the value of the output $o_j$ after execution of the trace in terms of the input variables. The condition of execution and the data transformation can be computed by forward substitution. The forward-substitution method of finding data transformations is based on symbolic execution [31]. A path $p$ from $q_i$ to $q_j$, where $q_i, q_j \in Q$, is a finite transition sequence of states where all the intermediate states are distinct.

Let the input C behavior be represented by the FSMD $M_0 = \langle Q_0, q_{0,0}, I, V_0, O, f_0, h_0 \rangle$ and the RTL be represented by the FSMD $M_1 = \langle Q_1, q_{1,0}, I, V_1, O, f_1, h_1 \rangle$. It may be noted that the inputs and outputs of both the behaviors are identical. $V_0$ consists of the variables in the input C program. $V_1$ consists of the registers of the RTL behavior. The state transition function may differ since the control structure may be altered due to application of compiler transformations or by scheduling. The equivalence of $M_0$ and $M_1$ means that for all possible inputs, the execution traces of $M_0$ and $M_1$ produce the same outputs. So, a trace, which represents one possible execution of an FSMD, takes one assignment of inputs and produces the outputs. The equivalence of traces can be defined as follows.

Definition 1 (Equivalence of Traces): A trace $\tau_0$ of an FSMD $M_0$ is equivalent to a trace $\tau_1$ of another FSMD $M_1$, denoted as $\tau_0 \simeq \tau_1$, if $c_{\tau_0} \equiv c_{\tau_1}$ and $s_{\tau_0} \equiv s_{\tau_1}$, where $c_{\tau_0}$ and $c_{\tau_1}$ represent the conditions of execution of $\tau_0$ and $\tau_1$, respectively, and $s_{\tau_0}$ and $s_{\tau_1}$ represent the data transformations of $\tau_0$ and $\tau_1$, respectively.

The following two definitions capture the notion of equivalence of FSMDs.

Definition 2 (Containment of FSMDs): An FSMD $M_0$ is said to be contained in an FSMD $M_1$, symbolically $M_0 \sqsubseteq M_1$, if for any trace $\tau_0$ of $M_0$, there exists a trace $\tau_1$ of $M_1$ such that $\tau_0 \simeq \tau_1$.

Definition 3 (Equivalence of FSMDs): Two FSMDs $M_0$ and $M_1$ are said to be computationally equivalent, i.e., $M_0 \simeq M_1$, if $M_0 \sqsubseteq M_1$ and $M_1 \sqsubseteq M_0$.

B. BLAST: HLS Trojan Detection Framework
The semantic gap between the C and RTL is the primary challenge in correlating the C and the corresponding RTL generated by an HLS tool. To tackle this, we first abstract out a high-level behavior, called RTL-level finite-state machine with datapath (RTL-FSMD), from the RTL using the idea of [14]. The input C code is modeled as a C-FSMD. Next, the equivalence checking between the input C-FSMD and the RTL-FSMD is carried out.

We can detect all three HTs inserted by the Black-hat HLS tool [1] during this process. The overall flow of our HT detection framework BLAST is given in Fig. 1. During RTL-FSMD extraction from the RTL, any spurious form of operation will be identified. With further analysis, the BE attack can be detected. The DA and the DG will be detected using equivalence checking. The detection mechanisms are discussed in detail in the following sections.

Fig. 1. Overall flow of our HT detection framework.

IV. DETECTION OF BATTERY EXHAUSTION ATTACK
Hardware circuits (especially small devices created with IPs) require a battery power source. Managing energy utilization is a key design principle in circuit design. BE attacks can drain power by including extra (useless) or idle FUs that use a considerable amount of power from the source. As a result, extra power will be consumed by the useless FU when it is switched on, and battery lifetime is shortened with no impact on the functionality.

A. Attack Model
In a BE attack in [1], the idle FUs are used to drain power when the Trojan is activated. The number of FUs required for one type of operation (e.g., multiplier) is determined by the maximum number of operations of that type scheduled to execute in parallel in a control state. In a control state where the number of operations scheduled is less than the number of FUs present in the datapath, some of the FUs will remain idle in that state. These idle FUs are reused to execute some fake operations in a BE attack. The actual inputs of the idle FU are bit-flipped and then multiplexed with the actual inputs for the FU. This multiplexer is controlled by the Trojan trigger. The results of such fake operations will not be stored in the destination register. This can be done by disabling the write enable signal of the destination register. If no idle FU is available in any control state, the tool may insert an additional state and implement the attack on that state. The idea behind this attack is to trigger the combinational functionality and increase dynamic power consumption.

Example 1: Let us consider the example given in Fig. 2. The expected datapath is shown in Fig. 2(a). The inputs in1 and in2 to the multiplier come from many time-multiplexed registers. The details are not important in our context. We consider only the relevant part of the datapath to explain the effect of the BE attack. The datapath is modified as shown in Fig. 2(b) to introduce the attack. Specifically, the output of the multiplexer is negated (i.e., bit-flipped) and is multiplexed with the actual input of the multiplier. The bit-flipped data is stored

in the register to avoid the combinational loop. These two additional multiplexers and the registers are controlled by the Trojan trigger tj.

Fig. 2. Example to illustrate the effect of BE attack. Panels (a)–(d).

The BE attack will be detected during the RTL-FSMD extraction from the Verilog RTL. The HLS-generated RTL has a separate datapath and controller FSM. So, in the FSMD extraction phase as explained in [14], the datapath is analyzed for the control signal assignment of each state and the RTL operations executed in that particular state are identified. In this way, the controller FSM and datapath can be converted into an equivalent FSMD, which is nothing but a high-level behavior. The overall idea of the RTL-FSMD extraction process is explained in the next section, and how the BE attack is detected during that phase is explained in the subsequent section.

Algorithm 1: RTL-FSMD_Extraction
Input: RTL
Result: RTL-FSMD
/* RTL consists of a datapath D and a controller FSM F */
1  foreach state S in the controller FSM F do
2      Find the active micro-operations M_S for the control signal assignments in S;
3      R_S = ∅;  /* Set of RT-operations in S */
4      foreach micro-opn of the form μ: r ← r_in in M_S do
           /* Rewrite method */
5          do
6              w = Find the left-most wire signal in the RHS exp μ_e of μ;
7              Find a micro-opn of the form w ← e_w in M_S;
8              Replace w with (e_w) in μ_e;
9          while (some signal in the RHS exp μ_e of μ is not an Input, Reg, or Constant);
10         R_S = R_S ∪ {μ};
11     end foreach
12     Replace the control signal assignments in S of F with R_S;
13 end foreach
14 Return F;  /* FSM F is converted to FSMD F at this point */
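
To make the datapath change of Example 1 concrete, the following Python sketch models one input register of an idle FU at cycle level. It is a behavioral toy model and not the RTL produced by the tool in [1]: the 8-bit width, the function name `count_input_toggles`, and the use of input bit toggles as a stand-in for dynamic power are all assumptions made for illustration.

```python
WIDTH = 8                      # assumed register width (hypothetical)
MASK = (1 << WIDTH) - 1


def count_input_toggles(initial, trigger, cycles):
    """Count bit toggles seen at one input of an idle FU over `cycles` clock cycles.

    initial : value held by the input register while the FU is idle
    trigger : Trojan trigger tj; when True the register is reloaded with its
              own bit-flipped value in every cycle (R <- ~R)
    """
    reg = initial & MASK
    toggles = 0
    for _ in range(cycles):
        # Trigger-controlled multiplexer: keep the value or take the negated one.
        nxt = (~reg & MASK) if trigger else reg
        toggles += bin(reg ^ nxt).count("1")   # bits that switch at the FU input
        reg = nxt                              # the flipped value is registered
    return toggles


if __name__ == "__main__":
    print(count_input_toggles(0x3C, trigger=False, cycles=4))  # input stays stable
    print(count_input_toggles(0x3C, trigger=True, cycles=4))   # every bit flips each cycle
```

With the trigger off, the input never changes, so the idle FU sees no switching. With the trigger on, all `WIDTH` bits flip in every cycle while the result is still discarded, which is the effect the BE attack relies on.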

B. RTL-FSMD Extraction
In the datapath, signal flow is controlled by the control signals. For each datapath module, the input-to-output assignments are termed micro-operations. For example, for a multiplexer out = MUX(in1, in2, sel), there are two possible micro-operations, i.e., out ← in1 and out ← in2, and the associated control signal assignments are sel = 0 and sel = 1, respectively. Given a control signal assignment in a control state, we have a set of active micro-operations in each transition of the controller FSM. All the assignment operations are active in all control steps. The RTL operations in each state are then obtained by application of the rewriting method of the work [14]. Starting from a micro-operation of the form r ⇐ rin, the rewriting method identifies the spatial sequence of data flow needed for an RT operation in reverse order. The method consists of rewriting terms one after another in the right-hand-side expression using the active micro-operations. The method stops when all the terms in the RHS are either registers, inputs, or constants. The rewriting takes place from left to right in a breadth-first manner.

The above process will identify the RTL operation(s) executed in a state of the controller FSM. The same process can be applied for each state of the controller FSM to extract the RTL-FSMD from the RTL. The extraction process is given in Algorithm 1. Lines 5–9 in the algorithm represent the rewriting process. In this method, we use a Mealy model representation in which the operations are associated with the state transition. Based on the transition condition, the operation performed is decided. The process is explained with an example below.

Fig. 3. (a) Controller FSM. (b) Datapath. (c) RTL-FSMD.

Example 2: Let us consider the datapath and controller FSM shown in Fig. 3. All the control signal names start with CS. Let the order of the control signals be CSr1, CSr2, CSr3, CSr4, CSm1, CSm2, CSm3, CSm4, CSf1, CSf2, CSf3. Let us consider the control assertion A = ⟨0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1⟩ of the transition q2 → q3. For this control signal assignment, the activated micro-operations are: {r1out ⇐ r1, r2out ⇐ r2, r3out ⇐ r3, r4out ⇐ r4, m1out ⇐ r3out, m2out ⇐ r1out, m3out ⇐ r3out, m4out ⇐ r4out, f1out ⇐ m1out + m2out, f2out ⇐ m3out − m4out, f3out ⇐ f1out * f2out, r2 ⇐ f3out}. Out of them, r2 ⇐ f3out is the micro-operation with register r2 at the LHS. The sequence of the rewriting process to accomplish the corresponding RT-operation is as follows:

r2 ← f3out
r2 ← f1out * f2out [since f3out ⇐ f1out * f2out]
r2 ← (m1out + m2out) * (m3out − m4out) [since f1out ⇐ m1out + m2out and f2out ⇐ m3out − m4out]
r2 ← (r3out + r1out) * (m3out − m4out) [since m1out ⇐ r3out and m2out ⇐ r1out]
r2 ← (r3out + r1out) * (r3out − r4out) [since m3out ⇐ r3out and m4out ⇐ r4out]
r2 ← (r3 + r1) * (r3out − r4out) [since r3out ⇐ r3 and r1out ⇐ r1]
r2 ← (r3 + r1) * (r3 − r4) [since r3out ⇐ r3 and r4out ⇐ r4].
Since r1, r3, and r4 are registers in the RHS, the rewriting process stops. So, the RT-operation r2 ← (r3 + r1) * (r3 − r4) is executed by the given control assignment in the transition q2 → q3. The RT-operation for other state transitions of the FSM can be found in a similar manner. The obtained RTL-FSMD behavior is given in Fig. 3(c).

C. Detection

Algorithm 2: Detect_BE_Attack (RTL, s_n)
Result: An instance of the battery exhaustion attack in the state s_n.
1  Collect all bit-flipped registers in the state s_n in R_bf from R_Sn (Ref Algorithm 1);
2  Collect all idle FUs in the state s_n in F_idle.
3  for each Register r_i in R_bf do
4      for f_i ∈ F_idle do
5          Apply the rewrite method from the output of the idle FU f_i.
6          if the input pattern of f_i contains any register from R_bf then
7              Report “A possible instance of battery exhaustion attack” with relevant detail.
8          end if
9      end for
10 end for
11 Report “No battery exhaustion attack is found in the state s_n.”
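
The rewriting of Example 2 and the check of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration and not the BLAST implementation: micro-operations are assumed to be stored as a dictionary from wire name to a string expression, `~` stands for ¬, the helper names `rewrite` and `detect_be_attack` are hypothetical, and the BE instance (registers r4 and r5 feeding an idle multiplier through wires named `in1`, `in2`, and `mul_out`) is made-up data. The sketch always expands the left-most wire, so its intermediate steps may appear in a different order from the sequence listed for Example 2, although the final RT-operation is the same.

```python
import re

TOKEN = re.compile(r"[A-Za-z_]\w*")   # signal names inside an expression


def rewrite(expr, micro_ops):
    """Expand the left-most wire by its definition until only registers,
    inputs, or constants remain (any name that is not a key of micro_ops)."""
    while True:
        wire = next((m for m in TOKEN.finditer(expr) if m.group() in micro_ops), None)
        if wire is None:
            return expr
        definition = micro_ops[wire.group()]
        if not TOKEN.fullmatch(definition):        # parenthesise compound terms
            definition = f"({definition})"
        expr = expr[:wire.start()] + definition + expr[wire.end():]


def detect_be_attack(rt_ops, idle_fu_outputs, micro_ops):
    """Sketch of Algorithm 2 for a single control state.

    rt_ops          : register -> rewritten RHS of its RT-operation
    idle_fu_outputs : FU name -> output wire of that idle FU
    """
    # Line 1: registers whose RT-operation has the form R <- ~R.
    bit_flipped = {r for r, rhs in rt_ops.items() if rhs.replace(" ", "") == f"~{r}"}
    reports = []
    for fu, out_wire in idle_fu_outputs.items():
        pattern = rewrite(out_wire, micro_ops)      # line 5: input pattern of the FU
        used = sorted(set(TOKEN.findall(pattern)) & bit_flipped)
        if used:                                    # lines 6-7
            reports.append((fu, pattern, used))
    return reports


# Active micro-operations of Example 2 (transition q2 -> q3), wire -> expression.
# The register assignment r2 <= f3out is left out; rewriting starts from f3out.
EXAMPLE2 = {
    "r1out": "r1", "r2out": "r2", "r3out": "r3", "r4out": "r4",
    "m1out": "r3out", "m2out": "r1out", "m3out": "r3out", "m4out": "r4out",
    "f1out": "m1out + m2out", "f2out": "m3out - m4out", "f3out": "f1out * f2out",
}

# Hypothetical BE instance: r4 and r5 are bit-flipped and feed an idle multiplier.
BE_OPS = {"in1": "r4", "in2": "r5", "mul_out": "in1 * in2"}

if __name__ == "__main__":
    print("r2 <-", rewrite("f3out", EXAMPLE2))
    print(detect_be_attack({"r4": "~r4", "r5": "~r5"}, {"mult": "mul_out"}, BE_OPS))
```

An empty list returned by `detect_be_attack` corresponds to the "no attack found" report of Algorithm 2; each tuple in a non-empty list names the idle FU, its input pattern, and the bit-flipped registers involved.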
Our idea of detection is to identify the Trojan pattern shown in Fig. 2(b) in the datapath during FSMD extraction. The trigger to identify such a pattern is an RTL operation of the form $R \leftarrow \neg R$ (bit-flipped) found in a state during the rewriting method. For each state $s_i$, there are some active micro-operations. The rewriting method takes a micro-operation in which a register $R_i$ is present in the LHS and keeps rewriting the RHS terms one by one until the RHS expression consists only of inputs, registers, or constants. If the rewriting method starts with a micro-operation $R_i \leftarrow w$ (where $w$ is a wire signal) and stops with an RTL operation $R_i \leftarrow \neg R_i$, then we store the instance of the register in the set $R_{bf}$ (set of bit-flipped registers). Here, the RHS consists only of the LHS register in negated form. We then perform the following analysis to identify the attack.

We shall collect all the bit-flipped registers in a set $R_{bf}$ in each control state. We shall also collect all the idle FUs ($F_{idle}$) in each control state. Let us assume that the FU $f_i$ is idle in control state $s_n$. An FU is idle in a control state if it is not used by any of the RTL operations in that state. The same can be identified by some additional book-keeping in Algorithm 1. Since all the control signals have some value in each control state, data from some registers are coming to the inputs of an idle FU as well. We now apply the rewriting method from the output of the idle FU. This will identify the operation starting from the idle FU output (and not from a destination register). This operation for $f_i$ is called the input pattern of $f_i$. If the input pattern of $f_i$ contains any register from the set $R_{bf}$, we shall report a possible BE attack in the state $s_n$ with all relevant details to the user. The overall idea is that if some register’s value is bit-flipped whenever the trigger signal is on, that register’s value is used in some FU, and the output of the FU is not utilized, then that is an instance of a BE attack. The overall detection mechanism is presented as Algorithm 2.

Example 3: Let us now consider the example given in Fig. 2. Let us assume that the multiplier in Fig. 2(a) is idle in state s1. Since the multiplier is idle in s1, the register r1 will not be updated in this state. The multiplier output can be visualized as fOut ← in1 × in2 in normal mode as shown in Fig. 2(c). This is the correct version of the controller for this state transition. However, the controller FSM behavior will be affected by the BE attack as shown in Fig. 2(d). In this case, there will be two transitions controlled by the Trojan trigger tj from the state s1 in the HT-affected design. In normal mode, when the HT is disabled, i.e., tj is False, no spurious operation will be executed. However, when the Trojan is triggered, the spurious operations shown in the transition with condition tj will be executed.

During FSMD construction, Algorithm 2 will identify the operations r4 = ¬r4 and r5 = ¬r5 in line 1 and store r4 and r5 in $R_{bf}$. Since the multiplier is idle in this state, the rewriting method will identify the input pattern r4 × r5 for the multiplier. Since r4 and r5 occur in the input pattern of the multiplier, Algorithm 2 will report a possible BE attack. In addition, it will also report the involved multiplexers, registers, and the FU for further debugging.

V. DETECTION OF DEGRADATION ATTACK
The IPs are reused in different applications of system design to reduce design costs. They are excellent candidates for hardware accelerator design. Many circuit design companies use HLS tools to create reusable IPs from high-level specifications. Therefore, an attacker is more interested in potential modification of the IPs during the design process. In the DA, the attacker inserts empty states in the controller FSM. As a result, the performance of the IPs will degrade when the Trojan trigger is activated.

A. Attack Model
The DA inserts a few empty states (i.e., bubbles) in the controller FSM. These bubbles create a divergence in control flow (i.e., an alternative path) from a specific state si before coming back to the next state si. The transitions are controlled by the trigger signal of the Trojan. This alternative trace just consumes some extra clock cycles before coming back to the original behavior. The circuit will perform properly in a normal scenario. The Trojan can be activated only after a predefined amount of time or in the case of a specific input sequence. When a Trojan is activated, the alternative trace is executed and it will slow down (i.e., degrade) the actual computation.

Example 4: Let us consider the FIR behavior in Fig. 4. The behavior is simplified by removing a few loops related to the memory read and the wait for the start signal. The corresponding C-FSMD obtained from the input C

Fig. 4. C code for FIR filter.

Fig. 5. Example to illustrate the effect of DA. (a) C-FSMD. (b) RTL-FSMD after HT insertion.

Algorithm 3: C_to_RTL_EqCheck (C, RTL)
Input: C is the input C to HLS, RTL is generated by an HLS tool from C
Result: Equivalent, detect degradation or downgrade attack
1  M0 = constructFSMD(C);
2  M1 = RTL-FSMD_Extraction(RTL);
3  T0 = findTrace(M0); T1 = findTrace(M1);
4  while (T0 ≠ φ) do
5      τ0 = select a trace from T0;
6      TC = getTestcase(τ0);
7      τ1 = getCorrespondingTrace(T1, TC);
8      if ((c_τ0 ≡ c_τ1) ∧ (s_τ0 ≡ s_τ1)) then
9          // τ0 and τ1 are equivalent
10         removeTrace(τ1, T1);
11     end if
12     else if {(c_τ0 ∧ c_τ1 ≠ φ) ∧ (s_τ0 ≡ s_τ1)} then
13         Detect_DA_Attack(τ0, T1);  // Algorithm 4
           if (not DA) then
14             Detect_DG_Attack(τ0, T1);  // Algorithm 5
15         end if
16     end if
17     else if {(c_τ0 ∧ c_τ1 ≠ φ) ∧ (s_τ0 ≢ s_τ1)} then
18         Detect_DG_Attack(τ0, T1);  // Algorithm 5
19     end if
20     removeTrace(τ0, T0);
21 end while
22 if (no DA or DG) then
23     Report “No attack is found”;
24 end if
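
The case split in lines 8 to 19 of Algorithm 3 can be illustrated with a small Python sketch. It is only a stand-in for the symbolic checks: a trace is assumed to be a pair of Python functions (condition of execution, data transformation), equivalence is decided by exhaustive evaluation over a small assumed input domain instead of symbolic reasoning, and the trigger `tj` is modelled as an extra variable. The names `classify` and `DOMAIN` and the example traces are hypothetical, and the calls to Algorithms 4 and 5 are replaced by returned labels.

```python
from itertools import product

# Assumed finite input domain: x in 0..7 plus a Trojan trigger bit tj.
DOMAIN = [{"x": x, "tj": tj} for x, tj in product(range(8), (False, True))]


def classify(trace0, trace1, domain=DOMAIN):
    """Mimic the case split of Algorithm 3 for one pair of traces.

    Each trace is (cond, transform): cond(env) -> bool, transform(env) -> output.
    """
    c0, s0 = trace0
    c1, s1 = trace1
    same_cond = all(c0(e) == c1(e) for e in domain)          # c0 equivalent to c1
    overlap = any(c0(e) and c1(e) for e in domain)           # c0 and c1 not empty
    same_data = all(s0(e) == s1(e) for e in domain)          # s0 equivalent to s1
    if same_cond and same_data:
        return "equivalent"
    if overlap and same_data:
        return "check DA, then DG"      # stands for Algorithm 4, then Algorithm 5
    if overlap:
        return "check DG"               # stands for Algorithm 5
    return "no corresponding trace"     # not a case of Algorithm 3; kept for safety


# C-FSMD trace: taken whenever x > 0, output 2*x (the trigger does not exist in C).
tau0 = (lambda e: e["x"] > 0, lambda e: 2 * e["x"])
# RTL-FSMD trace after a hypothetical degradation attack: same computation,
# but this path is only taken while the trigger is off.
tau1_normal = (lambda e: e["x"] > 0 and not e["tj"], lambda e: 2 * e["x"])
# RTL-FSMD trace whose output differs from the C behavior.
tau1_changed = (lambda e: e["x"] > 0, lambda e: e["x"])

if __name__ == "__main__":
    print(classify(tau0, tau0))
    print(classify(tau0, tau1_normal))
    print(classify(tau0, tau1_changed))
```

The second call shows the situation that hints at a DA: the data transformation is unchanged, but the condition of execution of the RTL trace has been narrowed by the trigger.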

is shown in Fig. 5(a). The RTL-FSMD obtained from the RTL is shown in Fig. 5(b). The HLS tool inserts a bubble state t7 as shown in Fig. 5(b). The DA creates a divergence of control flow from the state t4. In the normal scenario, the transition t4 −[!tj]→ t5 will be executed, so there will be no degradation. The execution will follow the bubble state (t4 −[tj]→ t7 → t5) when the Trojan trigger tj is True. As a result, for every iteration, there will be one cycle of degradation. The bubble is inserted inside the loop, which iterates ntaps times. Therefore, the total degradation will be ntaps cycles.

The DA will be detected during equivalence checking between C-FSMD and RTL-FSMD. The DA actually changes the behavior of the controller FSM. As a result, the number of traces has increased in the RTL-FSMD and the condition of execution of some traces has also changed. Consequently, the equivalence of a few traces of C-FSMD could not be found in RTL-FSMD. In such a situation, we will analyze to find a set of traces in RTL-FSMD whose union is equivalent to the trace in C-FSMD. With further analysis of the condition of executions of those traces, the DA can be detected. In the following, we briefly discuss the equivalence checking method followed by the detection of DA.

B. C-to-RTL Equivalence Checking

The primary challenge in C-to-RTL equivalence checking is the abstraction gap between the C and the RTL codes. Therefore, the RTL-FSMD abstracted from the RTL is used in the equivalence checking method. The input C is represented as C-FSMD. The construction of the FSMD model from the C is discussed in detail in [32]. Algorithm 3 is used to check the equivalence between C-FSMD and RTL-FSMD. The equivalence checking method used here is adapted from [15]. The algorithm examines whether the trace pairs of C-FSMD and RTL-FSMD are equivalent or not. The algorithmic steps are described briefly below.

1) Generate All Traces in Both the Behaviors: The function findTrace extracts all the traces of M0 and M1 and assigns them to the sets T0 and T1, respectively. We have used the tool Klee [33] for this purpose. We have modified Klee’s source code to get the symbolic data transformation (sτ) and the condition of execution (cτ) of each trace τ in M0 and M1.

2) Find Potential Corresponding Traces Between Two Behaviors: For checking equivalence between M0 and M1, we need to check equivalence between the traces. A naive algorithm will take O(n^2) comparisons (n is the number of traces in an FSMD) because it will compare each trace in M0 with all traces in M1 (in the worst case) to find the equivalence. To reduce complexity, a data-driven approach is taken to find the potential corresponding traces between T0 and T1. We use the Klee tool [33] to get a test case for each trace in a behavior. Hence, we know the values of input variables (test case) for each trace τ0 in T0. Now, we run M1 with this test case and find the trace τ1 which is followed for this particular test case. Lines 5–7 of Algorithm 3 implement this idea. This data-driven approach will reduce the complexity of equivalence checking to O(n) comparisons.

3) Equivalence Checking of Traces Between Two Behaviors: Finally, the trace-wise equivalence of potential corresponding traces is checked using the SMT solver Z3 [34] in this work. A potential corresponding trace pair is equivalent if their respective condition of executions and data transformations are equivalent (lines 8–10). If they are not equivalent, we check for possible instances of DAs or DGs.
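The three steps above, together with the trace-union analysis mentioned for DA, can be illustrated with a small sketch. This is not the BLAST implementation: a trace is modeled as a hypothetical pair of Python callables (condition of execution, data transformation), and exhaustive enumeration over a tiny input domain stands in for Klee and Z3. All names (`get_testcase`, `corresponding_trace`, `classify`, `union_covers`) and the toy traces are assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy model (not BLAST's implementation): a trace is a pair of
# callables (condition c, data transformation s) over a dict of inputs.
# Exhaustive enumeration of a tiny input domain replaces Klee and Z3 here.
DOMAIN = {"x": range(4), "tj": (False, True)}
INPUTS = [dict(zip(DOMAIN, vals)) for vals in product(*DOMAIN.values())]


def get_testcase(trace):
    """Return one concrete input that drives execution along `trace`."""
    cond, _ = trace
    return next(i for i in INPUTS if cond(i))


def corresponding_trace(traces, testcase):
    """Data-driven matching: the trace of the other behavior taken on `testcase`."""
    return next((t for t in traces if t[0](testcase)), None)


def classify(t0, t1):
    """Mirror the three branches of Algorithm 3 for one trace pair."""
    (c0, s0), (c1, s1) = t0, t1
    same_cond = all(c0(i) == c1(i) for i in INPUTS)
    overlap = any(c0(i) and c1(i) for i in INPUTS)
    same_data = all(s0(i) == s1(i) for i in INPUTS)
    if same_cond and same_data:
        return "equivalent"
    if overlap and same_data:
        return "check DA, then DG"
    if overlap:
        return "check DG"
    return "unrelated"


def union_covers(t0, traces):
    """True if the traces with a stronger condition and the same data
    transformation as t0 jointly cover exactly the condition of t0."""
    c0, s0 = t0
    picked = [c for c, s in traces
              if all((not c(i)) or c0(i) for i in INPUTS)   # c implies c0
              and all(s(i) == s0(i) for i in INPUTS)]       # same data
    return all(c0(i) == any(c(i) for c in picked) for i in INPUTS)


# C-FSMD trace: the computation runs when x > 0 (the C code never mentions tj).
c_trace = (lambda i: i["x"] > 0, lambda i: i["x"] + 1)
# RTL-FSMD traces: same computation, split on a hypothetical trigger tj.
rtl_traces = [
    (lambda i: i["x"] > 0 and not i["tj"], lambda i: i["x"] + 1),
    (lambda i: i["x"] > 0 and i["tj"], lambda i: i["x"] + 1),
]

tc = get_testcase(c_trace)
verdict = classify(c_trace, corresponding_trace(rtl_traces, tc))
split_found = union_covers(c_trace, rtl_traces)
```

In this toy setting only one RTL trace is examined per C trace, which is the point of the data-driven matching; the union check is run only when the pair is not directly equivalent. A real flow would replace the enumeration with symbolic conditions and an SMT query.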


Algorithm 4: Detect_DA_Attack (τ0, T1)
Input: τ0, T1
Result: Instance of the degradation (Trojan trigger condition)
1   ccomb = φ; //combined condition of traces
2   foreach τi ∈ T1 do
3       // check if cτi is a stronger condition than cτ0 and data transformations match
4       if ((cτi → cτ0) ∧ (sτi ≡ sτ0)) then
5           ccomb = ccomb ∨ cτi;
6       end if
7   end foreach
8   //check whether the union of the stronger conditions cτi is equal to condition cτ0
9   if (ccomb ≡ cτ0) then
10      Report “Possible degradation attack is found in T1”.
11  end if
12  else
13      Report “No degradation attack is found in T1.”
14  end if

C. Detection

We can detect DAs with the help of the equivalence checker tool. The tool will give us the trace level equivalence between the C-FSMD and RTL-FSMD. As shown in [1], this bubble effectively creates an alternative trace in the behavior with no effective operation inside it. Therefore, our objective is to identify such spurious traces in the RTL-FSMD. It may be noted that each of these traces is associated with a condition (i.e., the trigger of the Trojan) and those conditions are not present in the initial behavior. During equivalence checking, therefore, the equivalent trace cannot be found for these traces. Let p: qi ⇒ qj be one path from the state qi to qj in the RTL-FSMD in which the condition to enable the HT is incorporated by the attacker. In fact, there will be another parallel path (or a set of paths) from qi in RTL-FSMD which is associated with the negation of the trigger condition. It is, therefore, possible to find two (or a set of) traces in RTL-FSMD through qi and qj whose union will be equivalent to the corresponding trace in C-FSMD. By analyzing the conditions of these traces, the HT trigger condition will be identified. The overall DA detection mechanism is presented in Algorithm 4. Algorithm 4 is invoked in line 13 of Algorithm 3 when the equivalence of τ0 of M0 and τ1 of M1 cannot be shown because the respective condition of executions are not equivalent but their intersection is not NULL (i.e., there is some common condition of executions between them). In Algorithm 4, we first identify a set of traces in the RTL-FSMD M1 which have the same data transformation as that of τ0 but each of which has a stronger condition of execution than that of τ0 of C-FSMD M0. The stronger condition of execution is indicated by implication in line 4 of Algorithm 4. Then, we check if the union of condition of executions of all these traces turns out to be equivalent to cτ0. This indicates that some spurious traces may be added in the RTL-FSMD by the attacker for implementing DA or DG attacks. A trace of C-FSMD can also be split into multiple traces in RTL-FSMD during scheduling as shown in [35]. In such a case, a possible DA attack will be detected which is a false positive. Therefore, a careful manual inspection of the condition of executions of the traces will identify if any spurious HT trigger condition is added in the RTL-FSMD.

Example 5: Consider the FSMDs in Fig. 5. During checking equivalence, Algorithm 3 could not find any equivalence for a trace τ0 = s1 → s2 → (s3 −[c]→ s4 → s5 −[c]→ s6 → s3)^ntaps → s1 of C-FSMD in Fig. 5(a) in the RTL-FSMD in Fig. 5(b). Then Algorithm 3 calls Algorithm 4 to check a possible instance of DA. It found two traces τ1 = t1 → t2 → (t3 −[c]→ t4 −[tj]→ t7 → t5 → t6 −[c]→ t3)^ntaps → t1 and τ2 = t1 → t2 → (t3 −[c]→ t4 −[!tj]→ t5 → t6 −[c]→ t3)^ntaps → t1 in the RTL-FSMD such that the union of these two traces is equivalent to τ0 and cτ1 and cτ2 are stronger than cτ0. After a careful inspection of cτ1 and cτ2, we found the trigger condition is tj.

Fig. 6. Example to illustrate the effect of DG.

VI. DETECTION OF DOWNGRADE ATTACK

A compromised HLS tool can inject a malicious functionality to access the behaviors of the design. This malicious functionality can be used by the actual attacker to extract useful information from the circuit. As a result, the security properties of the design like cryptography algorithms (AES, SHA) will be compromised. The level of trust of these algorithms depends on the number of rounds that are executed. If the algorithms execute fewer rounds than a given count, the design becomes vulnerable. It is sufficient to reduce the number of executed rounds in the cryptography algorithms to compromise the security of the algorithms.

A. Attack Model

DG changes the functionality of the input specification. The tool might reduce the number of executed rounds in the secure hash algorithm (SHA) [36]. This can be done by modifying the loop constants in the RTL or by preloading a value higher than 0 as the initial value of the loop iterator. As a result, the security of the algorithm will be compromised. The value of the loop constant depends on the trigger signal of the Trojan. For example, message pairs with a collision can be generated for the SHA algorithm when the number of rounds is reduced from 64 to 18 [37]. The following example explains the DG attack.

Example 6: Let us consider the example given in Fig. 6. Consider the datapath shown in Fig. 6(a). The constant nb1 and the loop variable i are inputs to the comparator, and the output determines whether the loop body will be executed or not. We consider only the relevant part of the datapath to explain the effect of a DG. The datapath is modified as shown in Fig. 6(b) to introduce the attack. Specifically, an extra constant (i.e., nb2) is introduced and is multiplexed with nb1. The output of the multiplexer depends on the trigger condition of the HT. If the trigger condition is True, the loop will iterate nb2 times instead of nb1 times. The value of the loop bound is changed based on the trigger condition.

Since the functionality has changed in this attack, the equivalence checking method will find a possible nonequivalence
between the C-FSMD and the RTL-FSMD. In addition to that, the traces for which equivalence could not be found by the method will also be reported. These traces correspond to the block where functional changes are made by the tool. An attacker in the foundry or a rogue user can activate the Trojan after a predefined amount of time or by a specific input sequence. With the careful inspection of this information, the user can pinpoint the Trojan inserted by the HLS tool.

Fig. 7. (a) C Code from AES before adding tj. (b) Representative RTL-C after DG.

Fig. 8. Example to illustrate the effect of DG. (a) C-FSMD (M0) obtained from Fig. 7(a). (b) RTL-FSMD (M1) obtained from Fig. 7(b).

Fig. 9. Example to illustrate the effect of DG. (a) Trace of τ00 from C-FSMD obtained after unrolling (nb = 64). (b) Trace of τ10 from RTL-FSMD obtained after unrolling (nb = 32). (c) Trace of τ11 from RTL-FSMD obtained after unrolling (nb = 64).

Algorithm 5: Detect_DG_Attack (τ0, T1)
Input: τ0, T1
Result: Instance of the downgrade (Trojan trigger condition)
1   ccomb = φ; //combined condition of traces
2   foreach τi ∈ T1 do
3       // check if cτi is a stronger condition than cτ0
4       if (cτi → cτ0) then
5           ccomb = ccomb ∨ cτi;
6       end if
7   end foreach
8   //check whether the union of the stronger conditions ccomb is equal to condition cτ0
9   if (ccomb ≡ cτ0) then
10      Report “Possible downgrade attack is found in T1”.
11  end if
12  else
13      Report “No downgrade attack is found in T1.”
14  end if

B. Detection

DG compromises the circuit behavior only when the Trojan is activated. As long as the Trojan trigger is not activated, the design produces correct results and the Trojan stays undetected. We use our equivalence checker tool to detect the HT trigger condition for DG. In a DG attack, the numbers of traces in M0 and M1 are different due to the HT trigger condition. Specifically, M1 contains more traces than M0. The overall DG detection mechanism is presented in Algorithm 5. Algorithm 5 is invoked in Algorithm 3 when our equivalence checker could not find an equivalent of a trace τ0 of M0 in M1 in two scenarios (in lines 13 and 17). In the former case (line 13), the respective condition of executions overlap but are not equivalent, while the data transformations are equivalent. In the latter case (line 17), neither the condition of executions nor the data transformations are equivalent. In Algorithm 5, we first identify a set of traces τi in the RTL-FSMD for which the condition of execution is stronger (stronger condition is indicated by implication) than that of τ0. Then, we check if the union of condition of executions of all these traces, ccomb, turns out to be equivalent to the condition of execution cτ0. This indicates that some spurious traces may be added in the RTL-FSMD by the attacker for implementing a DG attack. By a careful analysis of the traces involved in ccomb, we can figure out the spurious HT trigger condition of the DG and how the value of the loop counter is changed.

Example 7: Consider the input C-code and the corresponding RTL-C (after the HLS tool inserts the Trojan) as shown in Figs. 7(a) and (b), respectively; their corresponding FSMDs are shown in Figs. 8(a) and (b), respectively. The traces of the respective behaviors are shown in Fig. 9. Specifically, the C-FSMD has one trace and the RTL-FSMD has two traces. During equivalence checking between traces in M0 and M1, the equivalence of τ00 could not be found. But, the algorithm will select τ10 (or τ11) and Algorithm 5 will be called in line 17 (or line 13). Since cτ10 and cτ11 are stronger than cτ00 and (cτ10 ∨ cτ11) ≡ cτ00, Algorithm 5 will report a possible DG attack. After a careful inspection of the data transformation and condition of execution of τ10 and τ11, we identify that
the hardware trigger condition is tj and the value of the loop count (variable) is decided by tj. The loop executes a reduced number of rounds (from 64 to 32) when the HT is activated. In addition to the loop count, the data transformation of the traces is also affected by the modification of the loop count.

TABLE I
EXPERIMENTAL RESULTS OF HT DETECTION FOR DIFFERENT HLS BENCHMARKS

VII. EXPERIMENTAL RESULTS

Implementation Detail and Experimental Setups: Our HT detection framework BLAST is implemented in Python. BLAST first extracts an abstract syntax tree (AST) from the Verilog using the pyVerilog [38] parser and then applies the rewriting method on the AST to obtain the RTL-FSMD. Specifically, we have adapted FastSim [14] to extract the RTL-FSMD in our work. The RTL-FSMD of our flow and the RTL-C of FastSim represent the same reverse engineered high-level behavior of RTL. The C-FSMD is extracted from the input C behavior. For identifying traces in an FSMD, we have used Klee [33] and for checking the equivalence of traces between two FSMDs, we have used the SMT solver Z3 [34]. The experiments have been performed on a machine with an Intel Core i7 CPU at 2.5 GHz and 8 GB RAM on a set of HLS benchmarks. We have used the Vivado HLS tool [39] to generate Verilog RTL for the benchmarks written in C. We then manually inserted all three HTs (BE, degradation, and DGs) on the RTLs and generated various versions of the RTL.1 The HTs are inserted by following the logic presented in [1]. We have not added any trigger circuit in the RTLs to activate the HTs. Instead, we assume the trigger condition as an input. We perform the RTL simulation to ensure the functional correctness of the modified RTLs when the HTs are not activated and the desired behaviors when the HTs are activated.

1 The BlackHat HLS tool [1] is not available online. Therefore, we have implemented the same idea on the Vivado HLS generated RTLs.

Experiments: We evaluated our method on a variety of HLS benchmarks [Waka, Motion, Parker, Array add, Find min, FMM, and auto-regressive lattice filter with branch (ARFNC) and without branch (ARFNB)], each of them written in C-code. The benchmarks are taken from the distribution of Bambu HLS [40]. The experimental results of our benchmarks are shown in Table I. The second (type) and third (#C) columns show the type of attacks and the number of input C code lines for each benchmark, respectively. We have recorded the number of code lines (#RTL) of RTL and the number of code lines (#RTLC) of RTL-C in the fourth and fifth columns, respectively. The sixth and seventh columns are the trace count (#TC) in the source C and RTL-C, respectively. It may be noted that the trace count is not measured for the BE attack since it is identified in RTL-FSMD extraction. The number of instances (#instance) of HT insertion is given in the eighth column. The ninth column represents the HT detection time by our tool BLAST. Each row (BE/DA/DG) represents a BE/degradation/downgrade HT scenario created from the original RTL. In the case of a BE attack, the detection time is less compared to other attacks because the BE attack is identified during the FSMD extraction phase. The equivalence checking is not needed to detect the BE attack. For FMM, although the number of traces is high, the BE attack is detected quickly. The DA attack detection time is much higher as compared to BE attack detection for FMM. The DA attack detection time for Parker, Find_min and FMM is higher as compared to the DA attack detection in other test cases since the number of traces is larger in these cases. As a result, equivalence checking takes more time. In all cases, HT attacks are correctly detected by our proposed method. Generally, the detection time for our framework is not high. We have tested that the runtime of our tool is not much impacted by the applications of HLS optimizations on our benchmarks since the overall steps to be checked are mostly the same in all cases.

HT Overheads: We synthesized the original RTL code [RTL (original)] and the RTL code after inclusion of HTs [RTL (HTs)] to check the resource utilization overheads of HT implementation. We evaluated the attacks implemented in Vivado HLS generated RTL code. In this experiment, we reported the scenario from Table I on which the overhead is maximum. For example, among the three scenarios of Parker in Table I, we reported the BE scenario since the overhead was maximum in BE. All the designs were synthesized for a Virtex4 XC4VCX15 series FPGA. From the device utilization summary report obtained after synthesis, we calculate the overhead (Slices, Flipflop, and LUT) needed by the additional logic added to implement HTs in the original RTL with respect to the available resource in the device. Table II presents the device utilization summary and maximum area overhead of RTL (HTs) as compared to the RTL (original) for bigger test cases. As shown, the hardware needed to implement RTL (HTs) is slightly more than the hardware needed to implement RTL (original). The area overhead is less than 1% in most cases. In general, these results show that the HTs minimally impact the area. Similarly, minimum input arrival time before clock (MIATBC), maximum output required time after clock (MORTAC), and maximum combinational path delay (MCPD) of RTL (BE) and RTL (DA) as compared to RTL (original) obtained from the timing report after synthesis are reported in Table III. As shown in the table, the delay is increased by an average of 1 ns for all the scenarios. We have not reported the DG scenarios here since the results were similar. Hence, we conclude that the extra logic added to implement HTs into the original RTL minimally impacts the area and speed of the design.
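As a small worked illustration of the overhead calculation described above, the sketch below assumes that the overhead of a resource is the extra amount used by RTL (HTs) over RTL (original), expressed as a percentage of what the device offers. That reading of "with respect to the available resource in the device", the function name `overhead_percent`, and every number in `report` are assumptions for illustration; none of the figures come from Table II.

```python
def overhead_percent(used_original, used_with_ht, available):
    """Extra resources consumed by the HT logic, as a percentage of device capacity."""
    return 100.0 * (used_with_ht - used_original) / available


# Hypothetical utilization figures: (RTL original, RTL with HTs, available).
# These are placeholders, not values reported in the paper.
report = {
    "Slices": (400, 412, 6000),
    "Flipflop": (350, 356, 12000),
    "LUT": (700, 724, 12000),
}

overheads = {name: overhead_percent(*vals) for name, vals in report.items()}
max_resource = max(overheads, key=overheads.get)
```

With these placeholder numbers every resource stays well below 1%, which is the kind of outcome the text describes for most benchmarks.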


TABLE II
COMPARISONS OF AREA OVERHEAD FOR RTL (ORIGINAL) WITH RESPECT TO THE RTL (HTS)

TABLE III
COMPARISONS OF INCREASE IN DELAY FOR RTL (ORIGINAL) WITH RTL (BE) AND RTL (DA)

VIII. PERFORMANCE OF BLAST FOR HLS OPTIMIZATIONS

Modern day HLS tools are equipped with various software and hardware-oriented optimizations to provide efficient hardware from a given C/C++ behavior. In the front-end of the HLS tool, a compiler like GCC or LLVM is used to parse the input behavior. Since these C/C++ compilers include hundreds of software code optimizations, those optimizations are now available in the HLS tool. On the other hand, the HLS tool applies various hardware-oriented optimizations like array partitioning, loop pipelining, loop unrolling, data flow optimizations, etc. in the back-end of HLS to make the input C/C++ code hardware efficient. In this section, we have analyzed how HLS optimizations affect the performance of the BLAST framework.

BLAST has two phases: an RTL-FSMD extraction phase and a C-to-RTL equivalence checking phase. The RTL-FSMD extraction phase relies on the FastSim [14] tool. This tool is equipped to handle all kinds of optimizations applied in HLS. To detect the BE attack, BLAST essentially adds a module (i.e., Algorithm 2) in the FastSim flow to analyze the BE attack in presence of bit-flipped operations in a state as discussed in Section IV-C. Therefore, the BE attack can be detected by BLAST irrespective of what HLS optimizations are applied by the HLS tool. Since BLAST analyzes the RTLs generated by the HLS tool state by state over the controller FSM, the run time of BE attack detection is not impacted much by the applications of HLS optimizations. Usually, the BE attack is identified in milliseconds.

The DA and DG attack detection relies on the C-to-RTL equivalence checking in which the RTL-FSMD extracted in phase one is formally compared with the input C behavior (i.e., C-FSMD). Since BLAST checks the trace level equivalence between these two behaviors, a major change in the control flow due to HLS optimizations will impact the DA and DG attack detection probability. The front-end optimizations like constant propagation, copy propagation, common subexpression elimination, dead code elimination, static single assignment, code motion, operator strength reduction (e.g., multiplication by a constant is replaced by a left shift by a constant), etc. mostly impact the data dependence in the behavior. Such optimizations do not have much impact on the control flow of the input behavior. Therefore, the performance of BLAST won’t be impacted by applications of such software optimizations in the front-end of the HLS.

Let us now discuss the hardware oriented optimizations. The array partitioning essentially breaks an array into multiple arrays to map them into multiple RAMs in order to improve memory access time. The array merging is the reverse process of array partitioning. In our case, the RAMs are represented as arrays in RTL-FSMD. So, we have two behaviors where the number of intermediate arrays is different. The control structure of the input behavior is not impacted by this optimization. Therefore, array partitioning/merging won’t impact our DA and DG detection. Loop unrolling unrolls the loop of the input C. In Algorithm 3, we use Klee to identify traces in the behaviors. Klee unrolls loops to identify the traces. Although loop unrolling changes the control structure, it won’t impact the detection of DA and DG attacks in BLAST since loops are unrolled during detection.

The loop pipelining creates multiple stages within a state where each stage works on the data of different iterations of the loop. This helps in running multiple iterations of the loop in parallel to improve the latency. For a pipelined function, the pipelined stages work in a similar manner. FastSim creates a sequential representation of the pipelined stages with suitable logic to handle the inherent dataflow among the subsequent stages. Consider the example in Fig. 10 to understand this. Assume the operations within a loop body are scheduled in three pipeline stages as shown in Fig. 10(a). The corresponding RTL-FSMD behavior is shown in Fig. 10(b). Each pipeline stage is activated by a flag. In the first clock,
only stage 1 is active and in the second clock both stage 1 and stage 2 are active. From the third clock, all stages are active. FastSim copies the value of each intermediate variable into a temporary variable and uses them in the right-hand expression of the operations. Consequently, at the ith clock, stage 1 works on the ith inputs, stage 2 works on the (i − 1)th data and stage 3 works on the (i − 2)th data. During equivalence checking between C-FSMD and RTL-FSMD, such a pipelined loop will result in a single trace. The corresponding loop of C-FSMD also results in a single trace. Thus, there won’t be any change in the control flow between C-FSMD and RTL-FSMD in presence of loop pipelining. Therefore, BLAST can detect DA and DG attacks in presence of loop and function pipelining.

Fig. 10. Representation of pipelined loop in C.

In data-flow optimization, the producer-consumer relation between various modules in the input C code is identified and such modules are executed in parallel in RTL. A first-in-first-out (FIFO) or Ping-Pong buffer is used between a producer-consumer pair for asynchronous data communication between them. To model such parallel behavior in RTL-FSMD (which is a sequential behavior), we extract the FSMD of each module first. We then generate a global RTL-FSMD in which one of the states of each module will be executed in each clock.2 The next state to be executed in a module is determined by the state transition of the RTL-FSMD of the corresponding module. The detail of such modeling may be found in [14]. Since the control flow of the RTL-FSMD in presence of data flow optimization is completely different from that of the C-FSMD, BLAST cannot detect a DA or DG attack in such a scenario. Specifically, Algorithm 3 returns a false negative (in line 23) if dataflow optimizations are applied. In general, if the control flow of the input behavior is modified significantly by HLS, BLAST may return false-negative results. We have used Klee to obtain the traces in a program (line 3 of Algorithm 3) and the Z3 SMT solver for checking the equivalence of traces in our algorithms. So, the run time of BLAST for detecting DA and DG attacks largely depends on these two tools.

2 Since RTL-FSMD is a cycle accurate model, operations executed in each clock can be tracked.

IX. CONCLUSION

In this article, we presented the BLAST framework to detect HLS tool inserted HTs in the RTL. The strength of our detection framework is the ability to construct a C-like behavioral specification from the RTL generated by the HLS tool. This helps us to correlate the RTL with the input C-specification. We have shown that all three Trojans inserted by the Black-hat HLS tool are detected in our framework. Our framework will report the possible instances of the HTs. The developer can identify the actual HT with further analysis. We have also analyzed the area and delay overheads of our benchmarks. We concluded that the inclusion of HTs minimally impacts the area and the delay. The experimental results for a commercial HLS tool for several HLS benchmarks show that our method can efficiently detect the attacks automatically without taking any information from the HLS tool.

Based on the impact of HTs, they can be categorized into: 1) change functionality; 2) degrade performance; 3) leak information; and 4) denial of service. While a denial of service type HT will be hard to implement at HLS, the other HTs are possible. In fact, the BE attack degrades performance by consuming more power, the DA attack degrades performance by increasing the latency of the design and the DG attack changes the functionality. Similar to BE and DA attacks, the other attacks can also be formulated to leak information about secret data in terms of power or latency. The novelty of our work lies in the fact that we have identified the generic impacts of the HTs at a higher abstraction level. Specifically, an HT results in either 1) spurious bit-flipping operations executed around idle FUs in a state or 2) branches in the controller FSM created by the HT trigger condition. The latter results in a situation where a set of traces is created from a single trace of the input behavior. The approach proposed in this paper tries to find these two scenarios. Therefore, BLAST should be able to detect any kind of HT that changes functionality, degrades performance, or leaks information.

REFERENCES

[1] C. Pilato, K. Basu, F. Regazzoni, and R. Karri, “Black-hat high-level synthesis: Myth or reality?” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 4, pp. 913–926, Apr. 2019.
[2] H. Stefan, K. Sri, and P. Dickon, Creating Value in the Semiconductor Industry. New York, NY, USA: McKinsey, Aug. 2011, pp. 1–15.
[3] P. Coussy, D. D. Gajski, M. Meredith, and A. Takach, “An introduction to high-level synthesis,” IEEE Design Test Comput., vol. 26, no. 4, pp. 8–17, Jul./Aug. 2009.
[4] B. Bailey. “Is High-Level Synthesis Ready for Prime Time?” Jun. 2012. [Online]. Available: http://www.edn.com/design/integrated-circuit-design/4375454/Is-high-level-synthesis-ready-for-prime-time
[5] J. Cong, B. Liu, S. Neuendorffer, J. Noguera, K. Vissers, and Z. Zhang, “High-level synthesis for FPGAs: From prototyping to deployment,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 30, no. 4, pp. 473–491, Apr. 2011.
[6] R. Karri, J. Rajendran, K. Rosenfeld, and M. Tehranipoor, “Trustworthy hardware: Identifying and classifying hardware trojans,” Computer, vol. 43, no. 10, pp. 39–46, Oct. 2010.
[7] K. Basu et al., “CAD-base: An attack vector into the electronics supply chain,” ACM Trans. Design Autom. Electron. Syst., vol. 24, no. 4, pp. 1–30, Apr. 2019.
[8] S. Bhunia and M. M. Tehranipoor, The Hardware Trojan War: Attacks, Myths, and Defenses. Cham, Switzerland: Springer, Jan. 2018.
[9] M. Hicks, M. Finnicum, S. T. King, M. M. K. Martin, and J. M. Smith, “Overcoming an untrusted computing base: Detecting and removing malicious hardware automatically,” in Proc. IEEE S P, 2010, pp. 159–172.
[10] U. Alsaiari and F. Gebali, “Hardware trojan detection using reconfigurable assertion checkers,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 7, pp. 1575–1586, Jul. 2019.
[11] C. Bao, D. Forte, and A. Srivastava, “Temperature tracking: Toward robust run-time detection of hardware trojans,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 34, no. 10, pp. 1577–1585, Oct. 2015.
[12] N. Fern and K. T. Cheng, “Evaluating assertion set completeness to expose hardware trojans and verification blindspots,” in Proc. DATE, Mar. 2019, pp. 402–407.
[13] W. Chen, S. Ray, J. Bhadra, M. Abadir, and L.-C. Wang, “Challenges and trends in modern SoC design verification,” IEEE Design Test, vol. 34, no. 5, pp. 7–22, Oct. 2017.


[14] M. Abderehman, J. Patidar, J. Oza, Y. Nigam, T. A. Khader, and C. Karfa, “FastSim: A fast simulation framework for high-level synthesis,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 41, no. 5, pp. 1371–1385, May 2022.
[15] M. Abderehman, T. R. Reddy, and C. Karfa, “DEEQ: Data-driven end-to-end EQuivalence checking of high-level synthesis,” in Proc. 23rd ISQED, 2022, pp. 64–70.
[16] S. Bhasin and F. Regazzoni, “A survey on hardware trojan detection techniques,” in Proc. ISCAS, 2015, pp. 2021–2024.
[17] S. Bhasin, J.-L. Danger, S. Guilley, X. T. Ngo, and L. Sauvage, “Hardware trojan horses in cryptographic IP cores,” in Proc. Workshop Fault Diagnosis Tolerance Cryptogr., 2013, pp. 15–29.
[18] S. Jha and S. Jha, “Randomization based probabilistic approach to detect trojan circuits,” in Proc. 11th IEEE High Assurance Syst. Eng. Symp., 2008, pp. 117–124.
[19] H. Salmani, M. Tehranipoor, and J. Plusquellic, “A novel technique for improving hardware trojan detection and reducing trojan activation time,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 1, pp. 112–125, Jan. 2012.
[20] D. Agrawal, S. Baktir, D. Karakoyunlu, P. Rohatgi, and B. Sunar, “Trojan detection using IC fingerprinting,” in Proc. IEEE SP, 2007, pp. 296–310.
[21] Y. Liu, K. Huang, and Y. Makris, “Hardware trojan detection through golden chip-free statistical side-channel fingerprinting,” in Proc. DAC, 2014, pp. 1–6.
[22] G. Bloom, B. Narahari, and R. Simha, “OS support for detecting trojan circuit attacks,” in Proc. HOST, 2009, pp. 100–103.
[23] C. Karfa, D. Sarkar, C. Mandal, and C. Reade, “Hand-in-hand verification of high-level synthesis,” in Proc. GLSVLSI, 2007, pp. 429–434.
[39] “Vivado high-level synthesis.” Accessed: May 21, 2022. [Online]. Available: http://xilinx.com/support/download.html
[40] “Bambu tool reference.” Accessed: May 21, 2022. [Online]. Available: https://panda.dei.polimi.it/

Mohammed Abderehman received the B.Tech. degree in computer engineering from Defence University, Engineering College, Bishoftu, Ethiopia, in 2011, and the M.Tech. degree from the Defence Institute of Advanced Technology, Pune, India, in 2014. He is currently pursuing the Ph.D. degree with the Indian Institute of Technology Guwahati, Guwahati, India.
His current research interests include the verification of high-level synthesis, hardware security, and embedded system.

Rupak Gupta received the B.E. degree in computer science and engineering from M.I.T.M Indore (Rajiv Gandhi Proudyogiki Vishwavidyalaya), Indore,
[24] S. Kundu, S. Lerner, and R. K. Gupta, “Translation validation of high- India, in 2018, and the M.Tech. degree in computer
level synthesis,” IEEE Trans. Comput.-Aided Design Integr. Circuits science and engineering from IIT Guwahati,
Syst., vol. 29, no. 4, pp. 566–579, Apr. 2010. Guwahati, India, in 2021.
[25] K. Banerjee, C. Karfa, D. Sarkar, and C. Mandal, “Verification of code He is currently working as a Software Engineer
motion techniques using value propagation,” IEEE Trans. Comput.-Aided with Amagi Media Labs, Bengaluru, India. His
Design Integr. Circuits Syst., vol. 33, no. 8, pp. 1180–1193, Aug. 2014. current research interests include hardware trojan
detection, reverse engineering register to variable
[26] R. Chouksey, C. Karfa, and P. Bhaduri, “Translation validation of code
mapping and RTL to C equivalence checking.
motion transformations involving loops,” IEEE Trans. Comput.-Aided
Design Integr. Circuits Syst., vol. 38, no. 7, pp. 1378–1382, Jul. 2019.
[27] R. Chouksey and C. Karfa, “Verification of scheduling of conditional
Behaviors in high-level synthesis,” IEEE Trans. Very Large Scale Integr.
(VLSI) Syst., vol. 28, no. 7, pp. 1638–1651, Jul. 2020.
[28] C. Karfa, D. Sarkar, C. Mandal, and C. Reade, “Register sharing verifi-
cation during data-path synthesis,” in Proc. ICCTA, 2007, pp. 135–140. Rakesh Reddy Theegala received the B.Tech.
degree from IIT Guwahati, Guwahati, India, in
[29] C. Karfa, D. Sarkar, and C. Mandal, “Verification of Datapath and
2021.
controller generation phase in high-level synthesis of digital circuits,”
He is working as a Google Software Engineering,
IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 29, no. 3,
IIT Guwahati. His current interest includes RTL to
pp. 479–492, Mar. 2010.
C equivalence checking.
[30] D. Gajski and L. Ramachandran, “Introduction to high-level synthe-
sis,” IEEE Trans. Design Test Comput., vol. 11, no. 4, pp. 44–54, 1994.
[Online]. Available: https://ieeexplore.ieee.org/document/329454
[31] Z. Manna, Mathematical Theory of Computation. Tokyo, Japan:
McGraw-Hill Kogakusha, 1974.
[32] C. Karfa, C. Mandal, and D. Sarkar, “Formal verification of code motion
techniques using data-flow-driven equivalence checking,” ACM Trans.
Design Autom. Electron. Syst., vol. 17, no. 3, p. 30, 2012.
[33] C. Cadar, D. Dunbar, and D. Engler, “KLEE: Unassisted and automatic
generation of high-coverage tests for complex systems programs,” in
Proc. OSDI, Dec. 2008, pp. 209–224. Chandan Karfa (Senior Member, IEEE) received
the M.S. and Ph.D. degrees in computer science
[34] “Z3—The SMT solver.” Accessed: May 21, 2022. [Online]. Available:
and engineering from IIT Kharagpur, Kharagpur,
https://github.com/Z3Prover/z3
India, in 2007 and 2011, respectively.
[35] O. Peñalba, J. M. Mendías, and R. Hermida, “A global approach to He has worked for five years as an Sr. R&D
improve conditional hardware reuse in high-level synthesis,” J. Syst. Engineer with Synopsys (India) Pvt. Ltd.,
Archit., vol. 47, no. 12, pp. 959–975, Jun. 2002. Bengaluru, India. He is currently working as
[36] “Secure hash standard—SHS: Federal information processing, standard an Associate Professor with the Department of
FIPS 180-4, U.S. department of commerce and national institute of stan- Computer Science and Engineering, IIT Guwahati,
dards and technology (NIST).” 2012. https://csrc.nist.gov/publications/ Guwahati, India. He has published more than fifty
detail/fips/180/4/final research papers in reputed international journals
[37] S. K. Sanadhya and P. Sarkar, “Attacking reduced round SHA-256,” and conferences. His research interests include formal verification, high-level
in Applied Cryptography and Network Security. Heidelberg, Germany: synthesis, hardware security, and formal methods.
Springer, 2008, pp. 130–143. Dr. Karfa has received the Qualcomm Faculty Award from Qualcomm in
[38] S. Takamaeda-Yamazaki, “Pyverilog: A Python-based hard- 2021, the TechnoInventor Award by India Electronics and Semiconductor
ware design processing toolkit for verilog HDL,” in Applied Association in 2014, the Innovative Student Projects Award from Indian
Reconfigurable Computing (Lecture Notes in Computer Science National Academy of Engineers in 2008 and 2013, the Best Paper Awards
9040). Cham, Switzerland: Springer Int., Apr. 2015, pp. 451–460, in ADCOM Conference in 2007 and in I-CARE conference in 2013, and the
doi: 10.1007/978-3-319-16214-0_42. Microsoft Research India Ph.D. Fellowship in 2008.
