Optimizing Privacy-Preserving Outsourced Convolutional Neural Network Predictions

Minghui Li, Sherman S. M. Chow, Senior Member, IEEE, Shengshan Hu, Yuejing Yan, Minxin Du, and Zhibo Wang, Senior Member, IEEE

Abstract—Neural networks provide better prediction performance than previous techniques. Prediction-as-a-service thus becomes popular, especially in the outsourced setting since it involves extensive computation. Recent research focuses on the privacy of the query and the results, but existing schemes do not provide model privacy against the model-hosting server and may leak partial information about the results. Some of them further require frequent interactions with the querier or incur heavy computation overheads. This paper proposes a new scheme for privacy-preserving neural network prediction in the outsourced setting, i.e., the server cannot learn the query, the (intermediate) results, or the model. Similar to SecureML (S&P'17), a representative work which provides model privacy, we leverage two non-colluding servers with secret sharing and triplet generation to minimize the usage of heavyweight cryptography. Further, we adopt asynchronous computation to improve the throughput, and design garbled circuits for the non-polynomial activation function to keep the same accuracy as the underlying network (instead of approximating it). Our experiments on four neural network architectures show that our scheme achieves an average 282× improvement in latency compared to SecureML. Compared to MiniONN (CCS'17) and EzPC (EuroS&P'19), both without model privacy, our scheme also has 18× and 10× lower latency, respectively. For the communication costs, our scheme outperforms SecureML by 122×, MiniONN by 49×, and EzPC by 38×.

Index Terms—Convolutional neural network, outsourcing computation, privacy preservation, homomorphic encryption

1 INTRODUCTION

NEURAL networks (NN) [1] attempt to identify relationships underlying a set of data by mimicking the way the human brain operates. Convolutional neural networks (CNN), which are based on biologically-inspired variants of multi-layer perceptrons, are proven to be very useful in medical image analysis, natural language processing, and recognition of images and videos.

The trained network/model is a valuable asset generated after collecting and learning from a massive amount of data. It is tempting for an adversary to steal the model, pirate it, or use it to provide a commercial prediction service for profits [2]. Moreover, with the knowledge of the model, the risk of compromising the privacy of a model is higher since white-box attacks can infer more information than black-box attacks, e.g., membership inference attacks [3], [4], [5] for determining if a specific sample was in the training dataset. Knowing the model also makes adversarial example attacks [6], [7], [8], [9] more effective. A tiny perturbation of the input can deceive the model and threaten its accuracy. Outsourcing the model and the prediction service to any third-party cloud server (say, for relieving from the cost of maintaining an online server) thus comes with great privacy and security implications. This paper tackles model privacy. We aim to protect the model not only from the querier as most of the existing works did, but also from the hosting server.

• M. Li, S. Hu, Y. Yan, and Z. Wang are with the School of Cyber Science and Engineering, School of Computer Science, Wuhan University, Wuhan, 430072, China, and State Key Laboratory of Cryptology, P.O. Box 5159, Beijing, 100878, China. E-mail: {minghuili, hushengshan, yjyan, zbwang}@whu.edu.cn
• S. Chow and M. Du are with the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong. E-mail: {sherman, dm018}@ie.cuhk.edu.hk

1.1 Related Work

Privacy-Preserving Prediction without Model Privacy. Many existing schemes can protect the privacy of the query from the server. Yet, they consider the model owner to be the server performing the prediction tasks, i.e., the prediction service only works with the plaintext knowledge of the model. Some works [10], [11] use additive homomorphic encryption to perform operations over the encrypted query and the clear model. Others [12], [13], [14], [15], [16] design protocols for various kinds of computations in the secure two-party computation (S2C) setting. The S2C approach often expects the queriers to remain online and interact with the server continuously, which brings some burden to the queriers and incurs higher costs in network round-trip time.

Fully Homomorphic Encryption (FHE). FHE [17] allows processing (any polynomial functions over) encrypted data. It is thus a handy tool for not only processing the query in an encrypted format but also processing over an encrypted model. As a counterexample, in the case of decision trees, Tai et al. [18] managed to use only additive HE instead of FHE to support S2C evaluation, but it is subject to the limitation that the server needs to know the model in the clear. With its powerful computability, FHE introduces high overheads to the servers. It also fails to cover non-polynomial operations, which are common in NN. CryptoDL [19] and E2DM [20] approximate them by polynomials, which degrades the prediction accuracy.
Fig. 1: System overview. ① Secret-sharing the query, ② convolution, ③ activation, ④ pooling, ⑤ fully connected, ⑥ prediction; the two servers additionally run an offline triplet generation phase.

A major drawback is that this approach requires the model owner to encrypt the model w.r.t. the public key of each querier. Not only does this become impractical for multiple queriers, it also increases the risk of model leakage since any querier has the power to decrypt the model. A very recent work uses multi-key FHE for supporting outsourced decision-tree evaluation specifically [21]. Yet, it further requires the FHE scheme to support multiple keys, i.e., the decryption requires multiple secret keys and hence involves multiple parties.

Secret Sharing and Non-Colluding Computation Model. Cryptography is for reducing our trust assumptions as much as possible. In many different application domains [22], [23], [24], the non-colluding assumption is leveraged to reduce the use of (relatively heavyweight) cryptographic techniques. In the minimal setting with two non-colluding (cloud) servers, neither of them is trusted; yet, they are assumed not to be colluding with each other.

Baryalai et al. [25] realize prediction services by S2C among two servers, one with the model and the other one with a secret key. They do not consider the privacy of the intermediate data (e.g., the output of the activation function), which may infer the prediction results.

In SecureML [26], the model is divided into two random shares deployed to the two servers, respectively. The querier shares the query across the two servers. The two servers interact with each other. Eventually, they derive their corresponding shares of the prediction result. The user can then derive the final result by merging both shares locally. To speed up the online (inner-product) computation, SecureML generates Beaver's multiplication triplets [27] for the additively-shared data in an offline preprocessing stage. For the non-polynomial operation (i.e., comparison), they use Yao's garbled circuits (GC) [28] for boolean computations. Yet, SecureML focuses on simple neural networks (and linear/logistic regression) without considering the more complex convolutional NNs.

Table 1 coarsely compares existing schemes. It is fair to say that there is no outsourcing solution with model privacy and satisfactory performance in accuracy and efficiency.

1.2 Technical Highlights

In this paper, we study the problem of privacy-preserving neural network prediction. In particular, we make the following technical contributions.

• We design protocols for different phases of the convolutional neural network prediction. Our system can simultaneously protect the query, the model, any intermediate results, and the final prediction results against the servers while guaranteeing high accuracy and efficiency.
• We accelerate the triplet generation [26], [27] using data packing and single-instruction-multiple-data (SIMD) [29]. Besides, we adopt asynchronous computation to speed up both offline and online computations.
• For non-polynomial activation functions, we design a series of garbled circuits for S2C between the two servers. This preserves the accuracy, differing from existing approaches which only approximate the activation functions by polynomials.
• For efficiency, we replace the non-polynomial max-pooling function with the linear average-pooling function. Our experimental result of training via these two methods demonstrates that the final accuracy of average-pooling is comparable to or even higher than that of max-pooling.
• Our extensive experiments on four neural network architectures show that our scheme achieves 282×, 18×, and 10× lower latency than SecureML [26], MiniONN [12], and EzPC [15], respectively. For the communication costs, our scheme outperforms prior art such as SecureML by 122×, MiniONN by 49×, and EzPC by 38×.
• We provide a security analysis in the simulation paradigm to show the privacy of our scheme.
2 PROBLEM STATEMENT AND PRELIMINARY

2.1 System Model and Threat Model

Fig. 1 illustrates our system model. The model, generated by its owner, is randomly split and shared between two servers, S1 and S2. A user U who holds the query also randomly splits it into two shares and uploads them to S1 and S2, respectively. Based on the shared model and query, S1 interacts with S2 to make predictions. After the interactions, the servers obtain the corresponding shared results and return them to U, who can eventually recover the real prediction result locally.

Following [25], [26], [12], [13], [14], [15], [16], we assume that S1 and S2 are honest-but-curious. Besides, they are not colluding with each other. Most service providers (say, Microsoft Azure and Amazon EC2) are motivated to maintain their reputation instead of risking it with collusion. With this assumption, our design goal becomes ensuring both S1 and S2 cannot learn any information about the prediction model, the query from U, and the (intermediate) prediction results. Intuitively, this gives us hope for higher efficiency without greatly affecting accuracy.

TABLE 1: Comparison of related work
Scheme | Model para. privacy | Query privacy | Inter. data privacy | Accuracy | Efficiency
NDCS [25] | ✗ | ✓ | ✗ | High | Low
CryptoNets [10] | ✗ | ✓ | ✓ | Low | Medium
CryptoDL [19] | ✓ | ✓ | ✓ | Low | Medium
E2DM [20] | ✓ | ✓ | ✓ | Low | Medium
XONN [16] | ✗ | ✓ | ✓ | Medium | High
Deepsecure [14] | ✗ | ✓ | ✓ | High | Medium
SecureML [26] | ✓ | ✓ | ✓ | High | Medium
MiniONN [12] | ✗ | ✓ | ✓ | High | High
Gazelle [13] | ✗ | ✓ | ✓ | High | High
EzPC [15] | ✗ | ✓ | ✓ | High | High
Our Scheme | ✓ | ✓ | ✓ | High | High

2.2 CNN and Notations

2.2.1 Convolutional Neural Network (CNN)

CNN is good for tasks such as image recognition and classification [30]. We use image processing as the running example. A typical CNN consists of four classes of layers.

The core one is the convolutional layer. It computes the convolution, essentially an inner product, between the raw pixel values in each local region of the image and a set of convolutional kernels. Let G be the query image, an n × n matrix of 8-bit pixels. For simplicity, we consider only one convolution kernel C of size m × m (i.e., the padding mode is "same"). The convolution transforms G into a matrix H.

After the convolutional layer, it is usually the activation layer, which applies an element-wise activation function. A non-polynomial activation function increases the nonlinear properties of the trained model. We use the most fashionable ReLU function, which only keeps the positive part of any real-valued input (i.e., f(x) = max(0, x)). We let J be the image activated by f(x), which remains of size n × n.

The pooling layer performs a down-sampling along the spatial dimensions while retaining the critical information. The usual ones are max-pooling and average-pooling, which output the maximum or the average value of the pool, respectively. The size of the pool is q × q. The resulting smaller image K is of size (n/q)^2.

The final layer is the fully-connected layer. The image matrix from the previous layer is first transformed into a column vector whose dimension is the number of neurons of the first layer in the fully-connected part, i.e., each element in the vector is the input of the corresponding neuron. Every neuron in the previous layer is connected to every neuron in the next layer. The weight matrix of the fully-connected layer is denoted by W. By the scalar product of the output of the previous layer and the corresponding weight matrix, we obtain a column vector, each element of which corresponds to the input of the next layer. The output of the last layer is deemed as the final prediction results.

The classes with the desired scores form the final prediction results F, which correspond to the most probable categories. See [1] for more details about CNN and its applications.

The goal of our system is to let the querier learn F without revealing the query G to the servers. Meanwhile, the servers do not need to know the neural network.

TABLE 2: Notations
G | the original image (of pixel size n × n)
C | the (m × m) convolutional kernels of CNN
H | the convoluted image
f(x) | the non-polynomial activation function
J | the activated image
q × q | the size of the pooling window
K | the down-sampled image
W | the weight matrix in the fully-connected layer
F | the prediction results
· | the inner product operation

2.2.2 Notations

We tabulate the notations in Table 2. We operate on secret-shared values extensively. For differentiation, we use a superscript 1 or 2 to denote the shared information of server S1 or S2. For example, the shared images held by S1 and S2 are G^1 and G^2, respectively, and the entry at the i-th row and j-th column held by S1 is G^1_{i,j}. The inner product operator is ·. For the inner product of two matrices, we first transform the matrices into vectors.
tri ← TRIP(a1, b1, (pk, sk); a2, b2, pk)
Input: S1 holds the shares a1, b1 ∈ Z_{2^t} and the key pair (pk, sk). S2 holds the other shares a2 = (a − a1) mod 2^t, b2 = (b − b1) mod 2^t, and the public key pk.
Output: S1 gets tri_1 = (a1, b1, z1). S2 gets tri_2 = (a2, b2, z2), where z = z1 + z2 = a · b mod 2^t.
At S1's side:
1) Encrypt a1 and b1 to have [a1]_pk and [b1]_pk;
2) Send the ciphertexts [a1]_pk and [b1]_pk to S2.
At S2's side:
1) Compute V1 = [a1 · b2]_pk = [a1]_pk ⊗ [b2]_pk and V2 = [a2 · b1]_pk = [b1]_pk ⊗ [a2]_pk;
2) Send V3 = V1 ⊕ V2 ⊕ [r]_pk to S1, where r ∈ Z_{2^t} is random;
3) Set z2 = (a2 · b2 − r) mod 2^t.
At S1's side:
1) Decrypt V3 to have v = a1 · b2 + a2 · b1 + r;
2) Set z1 = (v + a1 · b1) mod 2^t.
The triplet is tri = ((a1, a2), (b1, b2), (z1, z2)).

Fig. 2: Secure triplet generation protocol

(H^1; H^2) ← CONV(G^1, C^1, (pk, sk); G^2, C^2, pk)
Input: S1 holds the shares of the image G^1 and of the convolutional kernel C^1, and the key pair (pk, sk). S2 holds the shares G^2, C^2, and the public key pk.
Output: S1 and S2 get the shared convoluted images H^1 and H^2, respectively.
At S1 and S2:
1) for i, j in range n
   - Choose the sub-image ⟨G^1_{p,q}⟩ for p = i − (m−1)/2, ..., i + (m−1)/2 and q = j − (m−1)/2, ..., j + (m−1)/2 as the vector G̃^1_{i,j}, and ⟨G^2_{p,q}⟩ likewise as the vector G̃^2_{i,j};
   - Call TRIP to generate the triplet vectors tri = (A1 = ⟨a^k_1⟩, A2 = ⟨a^k_2⟩, B1 = ⟨b^k_1⟩, B2 = ⟨b^k_2⟩, Z1 = ⟨z^k_1⟩, Z2 = ⟨z^k_2⟩) for k = 1, ..., m^2;
     // z^k_1 + z^k_2 = (a^k_1 + a^k_2)(b^k_1 + b^k_2), i.e., Z1 + Z2 = AB
   - S1 sends U^1 = G̃^1_{i,j} − A1 and V^1 = C^1 − B1 to S2; S2 sends U^2 = G̃^2_{i,j} − A2 and V^2 = C^2 − B2 to S1;
   - S1 and S2 compute U = G̃_{i,j} − A and V = C − B;
   - S1 computes H^1_{i,j} = −U·V + G̃^1_{i,j}·V + C^1·U + Z1, and S2 computes H^2_{i,j} = G̃^2_{i,j}·V + C^2·U + Z2.
2) Return H^1 and H^2.
// H^1_{i,j} + H^2_{i,j} = Z1 + Z2 + G̃_{i,j}·V + C·U − U·V = G̃_{i,j} · C

Fig. 3: Convolution operation


2.3 Cryptographic Tools

2.3.1 Homomorphic Encryption (HE)

HE (e.g., [17], [31]) allows computations over encrypted data without access to the secret key. Decrypting the resultant ciphertexts gives the computation result of the operations as if they had been performed on the plaintext. For the public/private-key pair (pk, sk) and an encryption of x denoted by [x]_pk, we have homomorphisms ⊕ and ⊗ where [a]_pk ⊕ [b]_pk = [a + b]_pk and [a]_pk ⊗ [b]_pk = [a × b]_pk. Following MiniONN [12], this work uses the ring-based FHE scheme called YASHE [31]. It offers a plaintext space which is large enough for the usual computations. With single-instruction multiple-data (SIMD) techniques [29], 4096 independent plaintexts in our application can be packed into one ciphertext and operated on in parallel, which reduces the communication and computation.

2.3.2 Secure Two-Party Computation (S2C)

S2C allows two parties to jointly compute a function over their private inputs while warranting the correctness. No party learns anything about the other's input beyond what is implied by the function. Garbled circuits proposed by Yao [28] can evaluate an arbitrary function represented as a boolean circuit. At a high level, one party prepares the garbled circuit and sends it to the other party, who obliviously obtains the circuit and then evaluates the entire circuit gate-by-gate. During the entire process, no party learns anything about the input data of the other beyond what is implied by the final results.

Secret sharing allows one to distribute a secret among several parties by distributing shares to each party. For example, in (t, n)-Shamir secret sharing [32], there are n shares which are random elements of a finite field. Any set of at least t shares allows recovery of the secret value via Lagrange interpolation. Any non-qualifying set of shares looks randomly distributed, which provides perfect confidentiality. In this paper, we use (2, 2)-Shamir secret sharing, or simply additive secret sharing, in which the two shares add up to the secret value in the field. Many S2C protocols operate on the shares (e.g., see [33]).

3 OUR CONSTRUCTION

3.1 Accelerated Triplet Generation

The convolutional and fully-connected layers involve many inner-product computations. For preserving privacy, S1 and S2 operate on the secret shares of the inputs. They first prepare a triplet for minimizing the cost of online computation over secret shares. Specifically, S1 holds a1 and b1, and S2 holds a2 and b2. They would like to compute the multiplication of a = a1 + a2 and b = b1 + b2 in a shared form, i.e., S1 obtains z1 and S2 obtains z2 where z1 + z2 = ab.

The triplet generation algorithm TRIP in Fig. 2 consists of time-consuming homomorphic operations. Since ab = (a1 + a2)(b1 + b2) = a1·b1 + a1·b2 + a2·b1 + a2·b2, the servers can compute a1·b1 and a2·b2 locally. To compute a1·b2 and a2·b1, S1 first encrypts a1 and b1 and sends the ciphertexts to S2. S2 can then add V1, V2, and a random number r together to obtain V3 via the additive homomorphism without information leakage. The shared result of S2 is z2 = a2·b2 − r. By decrypting V3, S1 obtains v, which is added to a1·b1 to obtain the shared result z1. In this way, the shared results z1 and z2 can recover the multiplication result z by simple addition, but neither S1 nor S2 can learn z. This preparation is done in an offline stage.

The triplet generation process involves many homomorphic operations. To further speed it up, we also adopt data packing and asynchronous computation.
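Our implementation realizes TRIP with YASHE ciphertexts via SEAL; the following Python sketch only traces the message flow of Fig. 2 with a mock additively homomorphic scheme (the MockAHE class is an illustrative stand-in, not a real library, and provides no security).

```python
import secrets

T = 2 ** 32                                    # ring Z_{2^t} with t = 32 (illustrative)

class MockAHE:
    """Stand-in for an additively homomorphic scheme: 'ciphertexts' are plain
    integers so the data flow of TRIP can be traced; no security is provided."""
    def enc(self, m):  return m % T
    def dec(self, c):  return c % T
    def add(self, c1, c2):  return (c1 + c2) % T           # Enc(a) (+) Enc(b) = Enc(a+b)
    def mul_plain(self, c, k):  return (c * k) % T          # Enc(a) (x) b = Enc(a*b)

def trip(a1, b1, a2, b2, he=MockAHE()):
    # S1: encrypt its shares and send them to S2
    ca1, cb1 = he.enc(a1), he.enc(b1)
    # S2: compute V1 = Enc(a1*b2) and V2 = Enc(a2*b1), blind the sum with a fresh r
    r = secrets.randbelow(T)
    V3 = he.add(he.add(he.mul_plain(ca1, b2), he.mul_plain(cb1, a2)), he.enc(r))
    z2 = (a2 * b2 - r) % T                      # S2's triplet share
    # S1: decrypt and add its local product
    z1 = (he.dec(V3) + a1 * b1) % T             # a1*b1 + a1*b2 + a2*b1 + r
    return z1, z2

a1, a2 = secrets.randbelow(T), secrets.randbelow(T)
b1, b2 = secrets.randbelow(T), secrets.randbelow(T)
z1, z2 = trip(a1, b1, a2, b2)
assert (z1 + z2) % T == ((a1 + a2) * (b1 + b2)) % T        # z = a * b mod 2^t
```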
Fig. 4: Packed triplet generation process. (Toy example: the packed shares are encrypted, multiplied, and blinded slot-wise, yielding z1 = a1·b1 + a1·b2 + a2·b1 + r for S1 and z2 = a2·b2 − r for S2.)

3.1.1 Data Packing

Considering n shares ⟨a^i_1, b^i_1, a^i_2, b^i_2⟩ for i = 1, ..., n, we pack ⟨a^i_1⟩ together to be a single plaintext A1 so as to generate n triplets simultaneously. Analogously, we compute the packed data B1 = ⟨b^i_1⟩, A2 = ⟨a^i_2⟩, and B2 = ⟨b^i_2⟩. The homomorphic operations are then performed on the packed data via SIMD [29] to get the packed triplets ((A1, A2), (B1, B2), (Z1, Z2)). They can then be unpacked to extract the real triplets ⟨(a^i_1, a^i_2), (b^i_1, b^i_2), (z^i_1, z^i_2)⟩ for i = 1, ..., n, where z^i_1 + z^i_2 = (a^i_1 + a^i_2) · (b^i_1 + b^i_2). In essence, our approach reduces the encryption, transmission, addition, and multiplication costs. Fig. 4 gives a toy example for the packed triplet generation process.
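The effect of packing can be previewed with plain numpy arrays standing in for the SIMD slots of a packed plaintext; the slot count and the (small) ring modulus below are illustrative assumptions.

```python
import numpy as np

T, l = 2 ** 16, 4096                              # toy ring modulus and slot count
rng = np.random.default_rng(0)

# l independent share pairs, packed slot-wise into single "plaintexts"
A1 = rng.integers(0, T, l, dtype=np.int64); A2 = rng.integers(0, T, l, dtype=np.int64)
B1 = rng.integers(0, T, l, dtype=np.int64); B2 = rng.integers(0, T, l, dtype=np.int64)
R  = rng.integers(0, T, l, dtype=np.int64)

# one slot-wise multiplication/addition acts on all l slots at once
Z1 = (A1 * B1 + A1 * B2 + A2 * B1 + R) % T        # held by S1 after decryption
Z2 = (A2 * B2 - R) % T                            # held by S2
assert np.all((Z1 + Z2) % T == ((A1 + A2) * (B1 + B2)) % T)
```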
3.1.2 Asynchronous Computation

In the offline phase, S1 and S2 have to wait for the intermediate results from each other while generating the triplets. We call this synchronous computation, as demonstrated on the left of Fig. 5. To speed it up, we design an asynchronous computation scheme, exhibited on the right of Fig. 5. Instead of waiting for the feedback, the servers continue with the remaining operations which do not involve the feedback. For example, S1 can encrypt b1 while transmitting the ciphertext of a1. S2 can encrypt the random number r and compute a2·b2 ahead of time. Compared with synchronous computing, asynchronous computation reduces the waiting time and hence the latency.
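As a rough illustration (not our C++ pipeline), the scheduling idea of Fig. 5 can be mimicked with a thread pool, where sleeps stand in for encryption and network transfer; the timings are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def encrypt(x):   time.sleep(0.05); return x      # stand-in for local (homomorphic) work
def transmit(c):  time.sleep(0.05); return c      # stand-in for the network transfer

def synchronous(items):
    for x in items:
        transmit(encrypt(x))                       # wait for each step in turn

def asynchronous(items):
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = None
        for x in items:
            c = encrypt(x)                         # overlaps with the previous transmit
            if pending: pending.result()
            pending = pool.submit(transmit, c)
        if pending: pending.result()

for f in (synchronous, asynchronous):
    t0 = time.time(); f(range(8)); print(f.__name__, round(time.time() - t0, 2), "s")
```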
3.2 Our Design

With reference to the system overview in Fig. 1, our system consists of the following stages.

3.2.1 Initialization

S1 possesses one share of the prediction model, i.e., the convolutional kernel C^1 and the weight matrix W^1, and the HE key pair (pk, sk). S2 owns the other share of the prediction model (i.e., C^2 and W^2) and the public key pk. U holds the query image G.

3.2.2 Secret-Sharing the Query

U randomly divides each pixel G_{i,j} (i, j ∈ [1, n]) of the query G into G^1_{i,j} and G^2_{i,j} such that G_{i,j} = G^1_{i,j} + G^2_{i,j}. G^1 and G^2 are distributed to S1 and S2, respectively.

3.2.3 Convolution Computation

As in Fig. 3, S1 and S2 first extract the sub-images G̃^1_{i,j} and G̃^2_{i,j} of size m × m from the shared images G^1 and G^2, respectively. They then engage in the TRIP protocol to generate tri = ((A1, A2), (B1, B2), (Z1, Z2)). With tri, S1 (resp. S2) uses A1 (resp. A2) as a one-time pad to hide the sub-image G̃^1_{i,j} (resp. G̃^2_{i,j}). Likewise, they use B1 (resp. B2) to hide the convolutional kernel C^1 (resp. C^2). After they exchange the padded shares, they can locally compute H^1 and H^2. H^1_{i,j} + H^2_{i,j} gives

Z1 + Z2 + G̃_{i,j}·V + C·U − (G̃_{i,j} − A)(C − B)
= AB + G̃_{i,j}C − G̃_{i,j}B + CG̃_{i,j} − CA − (G̃_{i,j}C − AC − G̃_{i,j}B + AB)
= G̃_{i,j} · C = H_{i,j}.
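The following numpy sketch mimics one pixel of the CONV protocol with a dealer-generated triplet standing in for TRIP; it checks that the exchanged values are one-time-padded while the local shares still sum to the convolution result. All parameters are illustrative.

```python
import numpy as np

T, m = 2 ** 16, 3                                    # toy ring modulus and kernel size
rng = np.random.default_rng(1)

def share(x):                                        # additive sharing in Z_T
    x1 = rng.integers(0, T, x.shape, dtype=np.int64)
    return x1, (x - x1) % T

G_sub = rng.integers(0, 256, m * m)                  # sub-image G~_{i,j} (flattened)
C     = rng.integers(0, 16, m * m)                   # kernel (flattened)
G1, G2 = share(G_sub)
C1, C2 = share(C)

# offline triplet vectors with Z1 + Z2 = A * B (dealer-style stand-in for TRIP)
A1, A2 = share(rng.integers(0, T, m * m, dtype=np.int64))
B1, B2 = share(rng.integers(0, T, m * m, dtype=np.int64))
Z1, Z2 = share(((A1 + A2) * (B1 + B2)) % T)

# online: exchange one-time-padded differences, then compute local shares of H_{i,j}
U1, V1 = (G1 - A1) % T, (C1 - B1) % T                # sent by S1
U2, V2 = (G2 - A2) % T, (C2 - B2) % T                # sent by S2
U, V = (U1 + U2) % T, (V1 + V2) % T                  # known to both servers
H1 = (-U * V + G1 * V + C1 * U + Z1) % T
H2 = (          G2 * V + C2 * U + Z2) % T
assert (H1 + H2).sum() % T == np.dot(G_sub, C) % T   # H_{i,j} = G~_{i,j} · C
```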
Fig. 5: Synchronous and asynchronous computations (in the asynchronous variant, the encryption of the next input overlaps with the transmission and homomorphic processing of the previous one)
3.2.4 Activation Computation

S1 and S2 use the garbled circuit in Fig. 6 to compute the activation function over H^1 and H^2. The circuit utilizes four sub-circuits:

• ADD(a, b) outputs the sum a + b;
• GT(a, b) returns the bit denoting if a > b;
• MUX(a, b, c) outputs a or b according to c, i.e., if c holds, MUX(a, b, c) = b, otherwise MUX(a, b, c) = a; and
• SUB(a, b) returns the difference a − b.

(ActF): ((0/1)·H − R; R) ← ActF(H^1; (H^2, R)). S1 and S2 run ActF with H^1 and (H^2, R) being the respective inputs, where R is a random matrix generated by S2. The outputs of S1 and S2 are (0/1)·H − R and R, respectively. Specifically, ADD(H^1, H^2) adds H^1 and H^2 to get H. Then GT(H, 0) compares H and 0 element-wise for ReLU. The output is a binary matrix B of the same size as H, where the pixel B_{i,j} = 1 if H_{i,j} > 0, and 0 otherwise. With B, MUX(0, H, B) performs the activation function, i.e., if H_{i,j} > 0 (B_{i,j} = 1), it outputs H_{i,j}, otherwise 0. Finally, SUB makes an element-wise subtraction to get B·H − R, which is the output of S1. Let J be the activated image B·H. Then J^1 = B·H − R, and the output of S2 is regarded as J^2 = R.

Fig. 6: Activation function circuit (ADD, GT, MUX, and SUB sub-circuits composing B·H − R from the shares H^1, H^2 and the mask R)
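The garbled circuit itself is generated with ABY; the numpy sketch below only checks, in plaintext, the input/output relation that ActF enforces on the shares (the ring size and the mask R are illustrative).

```python
import numpy as np

T = 2 ** 16
rng = np.random.default_rng(2)

H = rng.integers(-500, 500, (4, 4))              # signed convolution output (toy values)
H1 = rng.integers(0, T, (4, 4), dtype=np.int64)
H2 = (H - H1) % T                                # additive shares of H held by S1 and S2
R = rng.integers(0, T, (4, 4), dtype=np.int64)   # random mask chosen by S2

# inside the circuit: reconstruct H, compare with 0, select, subtract the mask
H_rec = (H1 + H2) % T
H_signed = np.where(H_rec >= T // 2, H_rec - T, H_rec)   # interpret as signed values
B = (H_signed > 0).astype(np.int64)              # GT(H, 0)
J = B * H_signed                                 # MUX: ReLU(H)
J1 = (J - R) % T                                 # output of S1
J2 = R                                           # output of S2
assert np.array_equal((J1 + J2) % T, np.maximum(H, 0) % T)
```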
3.2.5 Pooling Computation

The two servers can perform average-pooling on their respective shared data (e.g., J^1 and J^2) without any interaction:

(J^1_1 + J^1_2 + ··· + J^1_{q^2})/q^2 + (J^2_1 + J^2_2 + ··· + J^2_{q^2})/q^2 = (J_1 + J_2 + ··· + J_{q^2})/q^2.

(K^1; K^2) ← POOL(J^1; J^2)
Input: S1 holds the shared activated image J^1; S2 holds the other share J^2.
Output: S1 obtains the shared pooled image K^1; S2 holds the other share K^2.
At S1's side:
1) for i, j in range ⌊n/q⌋
   - Compute (1/(q × q)) · Σ_{u=0}^{q−1} Σ_{v=0}^{q−1} J^1_{qi−u, qj−v};
   - Set the average as the value of K^1_{i,j}.
At S2's side:
1) for i, j in range ⌊n/q⌋
   - Compute (1/(q × q)) · Σ_{u=0}^{q−1} Σ_{v=0}^{q−1} J^2_{qi−u, qj−v};
   - Set the average as the value of K^2_{i,j}.

Fig. 7: Pooling operation

Fig. 7 shows how S1 and S2 perform element-wise average-pooling over (J^1, J^2) to obtain (K^1, K^2). The pixel value K^1_{i,j} is the average of the corresponding q × q pixels in J^1. Analogously, K^2_{i,j} can be derived from J^2. Employing the average value to replace the original pixels, we reduce J to the down-sampled image K of size ⌈n/q⌉ × ⌈n/q⌉.
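Because average-pooling is linear, each server can pool its own share locally; a minimal sketch (the fixed-point handling of the division in Z_{2^t} is omitted here) is as follows.

```python
import numpy as np

q = 2
rng = np.random.default_rng(3)
J = rng.integers(0, 256, (4, 4)).astype(float)        # activated image (toy values)
J1 = rng.integers(0, 256, (4, 4)).astype(float)       # one additive share
J2 = J - J1                                            # the other share

def avg_pool(X, q=2):                                  # purely local, no interaction
    n = X.shape[0] // q
    return X.reshape(n, q, n, q).mean(axis=(1, 3))

K1, K2 = avg_pool(J1), avg_pool(J2)                    # each server pools its own share
assert np.allclose(K1 + K2, avg_pool(J))               # linearity: pooling commutes with sharing
```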
3.2.6 Fully Connected Layer

This layer in essence performs dot products between the pooled data K and the weight parameters W, which can be computed using the triplets similarly to the convolutional layer illustrated in Fig. 3. We skip the largely repetitive details. Specifically, S1 and S2 take as input the shares K^1, W^1 and K^2, W^2, respectively, resulting in the shared prediction results F^1 and F^2. User U can merge F^1 and F^2 to recover the result F = KW.
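The fully-connected layer is evaluated with triplets exactly as in the convolution sketch above, so we only illustrate the final user-side step: merging the two returned shares and reading off the predicted class (toy values, assuming 10 classes).

```python
import numpy as np

T = 2 ** 16
rng = np.random.default_rng(4)
F = rng.integers(0, 1000, 10)                  # true logits F = KW (10 classes, toy values)
F1 = rng.integers(0, T, 10, dtype=np.int64)    # share returned by S1
F2 = (F - F1) % T                              # share returned by S2

F_rec = (F1 + F2) % T                          # user U merges the two shares locally
assert np.array_equal(F_rec, F)
print("predicted class:", int(np.argmax(F_rec)))
```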
3.3 Accelerated Asynchronous Online Computations

For the convolution and fully-connected layers, we can conduct the computations (of U^1, V^1, U^2, V^2 before transmission, and of U, V, H^1_{i,j}, H^2_{i,j} after transmission) and the transmission (of U^1, V^1, U^2, V^2) at the same time. Fig. 8 illustrates the asynchronous computation process. The computation process at the tail of an arrow is in parallel with the transmission process at the arrow head. For example, the pre-transmission and post-transmission operations of query 2 are in parallel with the transmission of query 1 and query 3, respectively.

Fig. 8: Online asynchronous computation (the pre-transmission computation, transmission, and post-transmission computation of consecutive queries overlap)

4 PERFORMANCE EVALUATION

4.1 Experimental Setup and Dataset

Our experiments used the MNIST dataset [34], a widely-used dataset in the machine learning community, which consists of 60,000 training and 10,000 test black-white hand-written digit images of 28 × 28 pixels belonging to 10 classes. Following MiniONN [12], we trained and implemented a realistic neural network. Fig. 9 details its architecture.

1) Convolution: input image 28 × 28, window size 5 × 5, stride (1, 1), number of output channels 16: R^{16×576} ← R^{16×25} × R^{25×576}.
2) ReLU Activation: calculates ReLU for each input.
3) Average Pooling: window size 1 × 2 × 2, outputs R^{16×12×12}.
4) Convolution: input image 12 × 12, window size 5 × 5, stride (1, 1), number of output channels 16: R^{16×64} ← R^{16×400} × R^{400×64}.
5) ReLU Activation: calculates ReLU for each input.
6) Average Pooling: window size 1 × 2 × 2, outputs R^{16×4×4}.
7) Fully Connected: fully connects the incoming 256 nodes to the outgoing 100 nodes: R^{100×1} ← R^{100×256} × R^{256×1}.
8) ReLU Activation: calculates ReLU for each input.
9) Fully Connected: fully connects the incoming 100 nodes to the outgoing 10 nodes: R^{10×1} ← R^{10×100} × R^{100×1}.

Fig. 9: The neural network architecture used in our experiment
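For reference, a plaintext PyTorch rendition of this architecture (used here only to convey the layer shapes; the stride and padding choices are inferred from the stated dimensions and are assumptions) could look as follows.

```python
import torch
import torch.nn as nn

class MiniONNNet(nn.Module):
    """Plaintext version of the Fig. 9 architecture (assumed 'valid' convolutions)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5),   # 28x28 -> 16 x 24x24
            nn.ReLU(),
            nn.AvgPool2d(2),                   # -> 16 x 12x12
            nn.Conv2d(16, 16, kernel_size=5),  # -> 16 x 8x8
            nn.ReLU(),
            nn.AvgPool2d(2),                   # -> 16 x 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, 100),        # 256 -> 100
            nn.ReLU(),
            nn.Linear(100, 10),                # 100 -> 10
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = MiniONNNet()(torch.zeros(1, 1, 28, 28))   # sanity check: output shape (1, 10)
```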

For U, S1, and S2, we use separate machines, each with an Intel 4-core CPU operating at 1.60 GHz and equipped with 8 GB RAM, running the Ubuntu 18.04 operating system. We implemented our scheme in C++ with Python bindings. For garbled circuits, we use the ABY library with SIMD circuits [33]. Following MiniONN, we use YASHE for FHE and implement it with the SEAL library [35], which supports SIMD. In YASHE, the degree of the polynomial n is set to 4096. In this way, we can pack 4096 elements together. The plaintext modulus is 101,285,036,033. The length of the plaintext is 64 bits. The ciphertext modulus is 128 bits. These parameters matter in Section 4.3. All results are averaged over at least 5 runs, in which the error is controlled within 3%.

4.2 Accuracy Evaluation

4.2.1 Effect of Pooling Method

To compare the average-pooling we use with the max-pooling used in existing works, we trained the model with both methods. Fig. 10 and Fig. 11 plot the accuracy against the number of iterations for average-pooling and max-pooling, respectively. The light blue line P1 represents the accuracy of the specific training epoch, where P1(i) (0 < i ≤ 10000) is the accuracy after the i-th iteration. The sky blue line P2 shows the smoothed accuracy P2(i) = αP2(i − 1) + (1 − α)P1(i), with the smoothing factor α set to 0.6. The dark blue line P3 shows the accuracy on the test dataset. The accuracy grows with the iterations until stabilized. After 1,000 iterations, the accuracy of max-pooling achieves 98.1% while that of average-pooling is just 97.5%. Yet, after 10,000 iterations, the accuracy of both methods can achieve 99.2%. In brief, although the convergence of average-pooling is relatively slow, the final accuracy is comparable to or even higher than that of max-pooling.
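The smoothed curve P2 above is a standard exponential moving average; a minimal sketch (the initial value P2(0) = P1(1) is an assumption) is given below.

```python
def smooth(acc, alpha=0.6):
    """Exponential smoothing of per-iteration accuracy: P2(i) = a*P2(i-1) + (1-a)*P1(i)."""
    out, prev = [], acc[0]
    for p in acc:
        prev = alpha * prev + (1 - alpha) * p
        out.append(prev)
    return out

print(smooth([0.91, 0.95, 0.97, 0.98, 0.99]))
```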
Fig. 10: Accuracy with average-pooling (per-iteration, smoothed, and test accuracy over 10,000 iterations, in the range 0.90 to 1.00)

Fig. 11: Accuracy with max-pooling (same setting as Fig. 10)

4.2.2 Effect of Activation Function

Some FHE-based schemes [10], [19] approximate the activation function with polynomials. Fig. 12 illustrates the curves of the original ReLU function and its approximations with different degrees, which are fitted by the polynomial regression function polyfit from the Python package numpy. When the absolute value of the input is smaller than 4, we can obtain a satisfactory approximation effect. However, for larger inputs, the error introduced by the approximation becomes very large, which makes the accuracy reduction noticeable. Our scheme reaches the same accuracy as plaintext computation by directly designing the garbled circuits for ReLU, which compute an identical output as ReLU.

Fig. 12: Approximations of the ReLU function (polynomials of degree 2 to 6 versus ReLU over the input range [−5, 5])
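The polynomial approximations of Fig. 12 can be reproduced in spirit with numpy's polyfit; the fitting interval [−5, 5] below is an assumption based on the plotted range.

```python
import numpy as np

x = np.linspace(-5, 5, 1001)
relu = np.maximum(x, 0)

for degree in (2, 3, 4, 5, 6):
    coeffs = np.polyfit(x, relu, degree)            # least-squares polynomial fit
    approx = np.polyval(coeffs, x)
    err = np.max(np.abs(approx - relu))
    print(f"degree {degree}: max |error| on [-5, 5] = {err:.3f}")
```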
4.3 Efficiency Evaluation

4.3.1 Triplet Generation

TABLE 3: Comparison of triplet generation
Methodology | Complexity | Required operations
SecureML [26] | O(n) | n · (5Enc + 2CMul + 2Add + 1Dec)
MiniONN [12] | O(n/l) | (n/l) · (5Enc + 2CMul + 2Add + 1Dec)
Our Scheme | O(n/l) | (n/l) · (5Enc + 2CMul + 2Add + 1Dec)
(Enc: encryption; CMul: constant multiplication between a ciphertext and a plaintext; Add: addition between two ciphertexts; Dec: decryption.)

Table 3 compares our triplet generation method with prior work in terms of the complexity and the number of operations. As in Fig. 2, triplet generation for two n-dimensional shared vectors in SecureML [26] encrypts each element of the shared vectors separately. This process consists of 5n encryptions, 2n constant multiplications and additions, and n decryptions, in which n ciphertexts are transferred between the two servers. The complexity is O(n).

In contrast, MiniONN [12] and our scheme use the packing technique. All the l elements can be computed at the same time. The n-dimensional vector is compressed into an (n/l)-dimensional vector. This means that the complexity of MiniONN and our scheme is just O(n/l).

TABLE 4: Triplet generation costs
SecureML [26] | Our scheme with acceleration | Our scheme with asyn. comp. | Performance gain
79716.524 ms | 19.635 ms | 16.970 ms | 4697×

Table 4 shows that our accelerated triplet generation scheme enjoys several orders-of-magnitude improvement over SecureML [26] for 4096 triplets. Further adopting asynchronous computation improves the efficiency by another 13.6%, resulting in an overall speedup of 4697 times.

4.3.2 Activation Function

To make our contribution stand out, we run the activation function circuits alone to demonstrate their performance. Table 5 summarizes our results. We perform our ReLU with the SIMD circuits and activate 4096 data items simultaneously. For the packed data, the offline time costs for generating the circuits are 87.815 ms and 100.143 ms, respectively. The time consumptions in the online stage are 752.545 ms and 516.413 ms. These time costs are for 4096 pieces of data. The average per-data time costs of the offline and online computations are 0.045 ms and 0.310 ms.

TABLE 5: ReLU costs (ms)
 | Offline | Online | Avg. Offline | Avg. Online
S1 | 87.815 | 752.545 | 0.021 | 0.184
S2 | 100.143 | 516.413 | 0.024 | 0.126

Compared with the existing works using approximate polynomials, our GC-based circuits also provide higher efficiency. Table 6 lists the approximation polynomials of different degrees. If the degree is set to 6, the approximation polynomial is 0.1249 + 0.5000x + 0.3729x^2 − 0.0410x^4 + 0.0016x^6. Executing this polynomial involves 13 multiplications and 4 additions, which take 82.20 ms. Therefore, our ReLU circuits outperform the approximation polynomial by 265 times.

TABLE 6: Polynomial approximation
Approximate function | Time (ms) | Performance gain
0.1992 + 0.5002x + 0.1997x^2 | 19.02 | 61×
0.1995 + 0.5002x + 0.1994x^2 − 0.0164x^3 | 38.00 | 123×
0.1500 + 0.5012x + 0.2981x^2 − 0.0004x^3 − 0.0388x^4 | 69.62 | 225×
0.1488 + 0.4993x + 0.3007x^2 + 0.0003x^3 − 0.0168x^4 | 69.64 | 224×
0.1249 + 0.5000x + 0.3729x^2 − 0.0410x^4 + 0.0016x^6 | 82.20 | 265×
4.3.3 Evaluation on MNIST

Table 7 reports the total latency for 4096 images and the amortized time for each image. In the offline phase, the two servers interact with each other to prepare the triplets, which costs about 2000 s. The amortized offline time for each image is thus 0.487 s. In the online phase, the activation function dominates since it uses garbled circuits. For the convolution layers, pooling layers, and fully connected layers, all the computations are executed over shares without HE or garbled circuits, which just takes a little time. Notably, the client just needs 2.197 µs to encode the query and decode the prediction result. The amortized online time cost for the user is 3.284 s.

TABLE 7: Performance in each phase
Phase | Latency (s) | Amortized time per image (s)
Offline, S1 | 913.735 | 0.223
Offline, S2 | 1084.340 | 0.264
Query encoding | 0.005 | 0.000
Online: Conv | 268.932 | 0.065
Online: ReLU | 11694.716 | 2.855
Online: Pooling | 0.182 | 0.000
Online: Conv | 29.881 | 0.007
Online: ReLU | 1299.412 | 0.317
Online: Pooling | 0.020 | 0.000
Online: Fully connected | 29.876 | 0.007
Online: ReLU | 126.895 | 0.031
Result decoding | 0.004 | 0.000
Total | 13451.094 | 3.284

4.3.4 Evaluation on Different Network Architectures

We conduct experiments over the network architectures of four published works [26], [10], [14], [12], which use combinations of FC and Conv layers as follows. For FC(784 → 128), 784 and 128 respectively represent the size of the input and the output. For Conv(1 × 28 × 28 → 5 × 13 × 13), the input image is of size 28 × 28 with 1 channel, and the output image is of size 13 × 13 with 5 channels. The square activation function is essentially a multiplication between two secret shares; in our scheme, we can achieve it by using the triplets, similarly to the convolutional operation.

• Network1 [26]: FC(784 → 128) ⇒ Square ⇒ FC(128 → 128) ⇒ Square ⇒ FC(128 → 10).
• Network2 [10]: Conv(1 × 28 × 28 → 5 × 13 × 13) ⇒ Square ⇒ Pooling(5 × 13 × 13 → 5 × 13 × 13) ⇒ Conv(5 × 13 × 13 → 50 × 5 × 5) ⇒ Pooling(50 × 5 × 5 → 50 × 5 × 5) ⇒ FC(1250 → 100) ⇒ Square ⇒ FC(100 → 10).
• Network3 [14]: Conv(1 × 28 × 28 → 5 × 13 × 13) ⇒ ReLU ⇒ FC(845 → 100) ⇒ ReLU ⇒ FC(100 → 10).
• Network4 [12]: the same as the one in Section 4.1.

Table 8 shows the performance of synchronous and asynchronous computations in both the offline and online phases. For Network1, the total time cost of the existing schemes with synchronous computation is 0.040 s, while the time cost of our scheme with asynchronous computation is just 0.029 s. In other words, we obtain a 27.5% performance saving. Analogously, the savings for Network2, Network3, and Network4 are 20.0%, 1.7%, and 1.4%, respectively. In short, our asynchronous computation reduces the latency.

TABLE 8: Performance of the synchronous and asynchronous computations on different networks
 | Network 1 (Syn. / Asyn.) | Network 2 (Syn. / Asyn.) | Network 3 (Syn. / Asyn.) | Network 4 (Syn. / Asyn.)
Offline phase (s) | 0.007 / 0.005 | 0.020 / 0.016 | 0.052 / 0.050 | 0.492 / 0.489
Online phase (s) | 0.033 / 0.023 | 0.050 / 0.039 | 0.323 / 0.319 | 3.285 / 3.234
Total (s) | 0.040 / 0.028 | 0.070 / 0.057 | 0.375 / 0.368 | 3.777 / 3.723

To demonstrate the superiority of our scheme, we compare with SecureML [26], MiniONN [12], and EzPC [15]. Fig. 13 and Fig. 14 show the computation and communication costs on different neural network architectures, respectively. Compared with SecureML [26] (with model privacy), we provide 282× faster computation time and 122× lower communication cost. Compared with MiniONN [12] (without model privacy), our scheme performs 18× and 49× better. Compared with EzPC [15] (without model privacy), ours is 10× and 38× better.

Fig. 13: Computation costs
Fig. 14: Communication costs

5 SECURITY ANALYSIS

Both the query and the model are randomly shared and distributed to two independent servers. As long as they do not collude, the privacy of the query and the model is preserved. For the intermediate data and the results, both servers possess the corresponding shares with pre-computed triplets, which are independent of the sensitive data. The query, the model, the intermediate data, and the prediction results are secret-shared. Hence no meaningful information is revealed to the semi-honest servers.

We provide simulation-based proofs for security under the semi-honest model.

Security Definition: Let f = (f_U, f_{S1}, f_{S2}) be a probabilistic polynomial function and Π a protocol computing f. The user and the two servers want to compute f(G, C^1, C^2, W^1, W^2) using Π, where G is the query image of the user U, and (C^1, W^1) and (C^2, W^2) are the shares of the model deposited on the two servers. The view of U during the execution of Π is V_U(G, C^1, C^2, W^1, W^2) = (G, r_U, m_U), where r_U is the random tape of U and m_U is the messages received by U. Similarly, the views of S1 and S2 are defined as V_{S1}(G, C^1, C^2, W^1, W^2) = (C^1, W^1, r_{S1}, m_{S1}) and V_{S2}(G, C^1, C^2, W^1, W^2) = (C^2, W^2, r_{S2}, m_{S2}). The protocol Π achieving the function f is regarded as secure if, for every possible input G, C^1, C^2, W^1, W^2 of f, there exist probabilistic polynomial-time simulators Φ_U, Φ_{S1}, and Φ_{S2} such that

Φ_U(G, f_U(G, C^1, C^2, W^1, W^2)) ≡_p V_U(G, C^1, C^2, W^1, W^2),
Φ_{S1}((C^1, W^1), f_{S1}(G, C^1, C^2, W^1, W^2)) ≡_p V_{S1}(G, C^1, C^2, W^1, W^2),
Φ_{S2}((C^2, W^2), f_{S2}(G, C^1, C^2, W^1, W^2)) ≡_p V_{S2}(G, C^1, C^2, W^1, W^2),

where ≡_p denotes computational indistinguishability.
Theorem 1: The triplet generation protocol in Section 3.1 is secure against semi-honest adversaries.

Proof: S1 holds the secret key. All the messages passed from S1 to S2 are encrypted. All those passed from S2 to S1 are distributed uniformly by the addition of random data independent of the data of S2. The view of S1 is V_{S1} = (C^1, W^1, a_1, b_1, z_1), where a_1, b_1, z_1 come from the triplet share of S1. We construct the simulator Φ_{S1}(C^1, W^1) as:
1) Pick random integers â_1, b̂_1, ẑ_1 from Z_{2^t}.
2) Output Φ_{S1}(C^1, W^1) = (C^1, W^1, â_1, b̂_1, ẑ_1).
As the randomness â_1 (or b̂_1, ẑ_1) is generated in the same manner as a_1 (or b_1, z_1) and is independent from the other data, the distributions of (C^1, W^1, â_1, b̂_1, ẑ_1) and (C^1, W^1, a_1, b_1, z_1) are indistinguishable. Thus we have proven that V_{S1} ≡_p Φ_{S1}(C^1, W^1). Analogously, Φ_{S2}(C^2, W^2) = (C^2, W^2, â_2, b̂_2, ẑ_2) and V_{S2} ≡_p Φ_{S2}(C^2, W^2).

Theorem 2: The image distribution protocol in Section 3.2.2 is secure against semi-honest adversaries.

Proof: User U additively shares the query image into two parts. The views of the three parties are V_U = (G, R_G), V_{S1} = (C^1, W^1, G^1), and V_{S2} = (C^2, W^2, G^2), where G^1 = R_G and G^2 = G − R_G. We construct a simulator Φ_{S1}(C^1, W^1) as:
1) Pick a random R̂_G from Z_{2^t} and set Ĝ^1 = R̂_G.
2) Output Φ_{S1}(C^1, W^1) = (C^1, W^1, Ĝ^1).
Both R_G and R̂_G are generated randomly. Hence the distributions of R_G and R̂_G are indistinguishable. As a consequence, we have V_{S1} ≡_p Φ_{S1}(C^1, W^1). Analogously, Φ_{S2}(C^2, W^2) = (C^2, W^2, Ĝ^2) and V_{S2} ≡_p Φ_{S2}(C^2, W^2).

Theorem 3: The convolution computation protocol in Section 3.2.3 is secure against semi-honest adversaries.

Proof: The two servers perform the convolution with the help of triplets. Their views V_{S1} and V_{S2} are (C^1, W^1, G^1, H^1, A_1, B_1, Z_1) and (C^2, W^2, G^2, H^2, A_2, B_2, Z_2), respectively. We construct a simulator Φ_{S1}(C^1, W^1) as:
1) Call the triplet generation protocol to get Â_1, B̂_1, Ẑ_1.
2) Pick random integers Ĝ^1, Û^2, V̂^2 from Z_{2^t}.
3) Compute Û = Ĝ^1 − Â_1 + Û^2 and V̂ = C^1 − B̂_1 + V̂^2.
4) Compute Ĥ^1 = −Û·V̂ + Ĝ^1·V̂ + C^1·Û + Ẑ_1.
5) Output Φ_{S1}(C^1, W^1) = (C^1, W^1, Ĝ^1, Ĥ^1).
The distributions of H^1 and Ĥ^1 are indistinguishable. Hence V_{S1} ≡_p Φ_{S1}(C^1, W^1) holds. Analogously, we have Φ_{S2}(C^2, W^2) = (C^2, W^2, Ĝ^2, Ĥ^2) and V_{S2} ≡_p Φ_{S2}(C^2, W^2).

Theorem 4: The activation computation protocol in Section 3.2.4 is secure against semi-honest adversaries.

Proof: The two servers compute the activation function by using the designed garbled circuits. The view of S1 is V_{S1} = (C^1, W^1, H^1, J^1). The view of S2 is V_{S2} = (C^2, W^2, H^2, J^2). Since the garbled circuits are secure against semi-honest adversaries, we only consider the inputs and outputs of the circuits rather than the internal details. We construct a simulator Φ_{S1}(C^1, W^1) as:
1) Pick random Ĥ^1, Ĵ^1 from Z_{2^t}.
2) Output Φ_{S1}(C^1, W^1) = (C^1, W^1, Ĥ^1, Ĵ^1).
The distributions of H^1 (J^1) and Ĥ^1 (Ĵ^1) are indistinguishable. Hence V_{S1} ≡_p Φ_{S1}(C^1, W^1) holds. Analogously, we have Φ_{S2}(C^2, W^2) = (C^2, W^2, Ĥ^2, Ĵ^2) and V_{S2} ≡_p Φ_{S2}(C^2, W^2).

Theorem 5: The pooling computation protocol in Section 3.2.5 is secure against semi-honest adversaries.

Proof: The view of S1 is V_{S1} = (C^1, W^1, J^1, K^1). The view of S2 is V_{S2} = (C^2, W^2, J^2, K^2). We construct a simulator Φ_{S1}(C^1, W^1) as:
1) Pick random Ĵ^1, K̂^1 from Z_{2^t}.
2) Output Φ_{S1}(C^1, W^1) = (C^1, W^1, Ĵ^1, K̂^1).
Since the distributions of J^1 (K^1) and Ĵ^1 (K̂^1) are indistinguishable, V_{S1} ≡_p Φ_{S1}(C^1, W^1) holds. Analogously, Φ_{S2}(C^2, W^2) = (C^2, W^2, Ĵ^2, K̂^2) and V_{S2} ≡_p Φ_{S2}(C^2, W^2).
Theorem 6: The fully connected protocol in Section 3.2.6 is secure against semi-honest adversaries.

Proof: The views V_{S1} and V_{S2} are (C^1, W^1, K^1, F^1, A_1, B_1, Z_1) and (C^2, W^2, K^2, F^2, A_2, B_2, Z_2), respectively. We construct a simulator Φ_{S1}(C^1, W^1) as:
1) Call the triplet generation protocol to get Â_1, B̂_1, Ẑ_1.
2) Pick random integers K̂^1, Û^2, V̂^2 from Z_{2^t}.
3) Compute Û = K̂^1 − Â_1 + Û^2 and V̂ = W^1 − B̂_1 + V̂^2.
4) Compute F̂^1 = −Û·V̂ + K̂^1·V̂ + W^1·Û + Ẑ_1.
5) Output Φ_{S1}(C^1, W^1) = (C^1, W^1, K̂^1, F̂^1).
The distributions of F^1 and F̂^1 are indistinguishable. Hence V_{S1} ≡_p Φ_{S1}(C^1, W^1) holds. Analogously, we have Φ_{S2}(C^2, W^2) = (C^2, W^2, K̂^2, F̂^2) and V_{S2} ≡_p Φ_{S2}(C^2, W^2).

6 ADDITIONAL RELATED WORKS

Wang et al. [36] focused on the problem of natural language processing and proposed a secure model learning method. They leveraged the non-colluding assumption for secure computation without letting the servers learn the inputs, intermediate results, or final results. Bost et al. [37] considered traditional machine-learning classifications (e.g., hyperplane decision-based classifiers, naïve Bayes classifiers, decision trees) over FHE-encrypted data. Tai et al. [18] proposed privacy-preserving decision-tree evaluation which only uses linear functions, hence additive homomorphic encryption suffices. Aloufi et al. [21] considered privacy-preserving decision-tree evaluation in the outsourced setting; yet, it requires multi-key FHE. All these works did not consider neural networks.

Among the latest representative works for neural network evaluation in the S2C setting without model privacy, EzPC [15] proposes a compiler which translates between arithmetic and boolean circuits. XONN [16] "replaces" the matrix multiplication with XNOR, which is virtually free in GC. Yet, this scheme only applies to binary neural networks, where the parameters are binary. DeepSecure [14] preprocesses the data and network before S2C but leaks some information about the network parameters.

Differential privacy [38], [39], [40] adds noise without sacrificing too much data utility, but it cannot wholly ensure data privacy. Trusted processor (e.g., Intel SGX) approaches [41], [42], [43], [44] work on the data within the trusted perimeter, but they are subject to the memory constraint (currently 128MB), which can be easily exceeded by a specific layer of a deep neural network. Moreover, paging introduces overhead. Cryptographic approaches, in general, do not have these problems.

7 CONCLUSION

We improve the accuracy and efficiency of privacy-preserving neural network prediction in the outsourcing setting (i.e., the servers providing the prediction service can get nothing about the query, the model, any intermediate results, and the final result). We adopt the non-linear activation function as-is, which preserves the accuracy of the underlying neural network. We also reduce both the computation and communication overheads by adopting packing and asynchronous computation. Extensive experiments show that our scheme outperforms the state-of-the-art.

REFERENCES

[1] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.
[2] Y. Adi, C. Baum, M. Cissé, B. Pinkas, and J. Keshet, "Turning your weakness into a strength: Watermarking deep neural networks by backdooring," in USENIX Security Symposium, 2018, pp. 1615–1631.
[3] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, "Membership inference attacks against machine learning models," in IEEE Symposium on Security and Privacy (S&P), 2017, pp. 3–18.
[4] J. Hayes, L. Melis, G. Danezis, and E. D. Cristofaro, "LOGAN: Membership inference attacks against generative models," PoPETs, vol. 2019, no. 1, pp. 133–152, 2019.
[5] A. Salem, Y. Zhang, M. Humbert, P. Berrang, M. Fritz, and M. Backes, "ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models," in ISOC Network and Distributed System Security Symposium (NDSS), 2019.
[6] N. Papernot, P. D. McDaniel, I. J. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, "Practical black-box attacks against machine learning," in ACM Asia Conference on Computer and Communications Security (AsiaCCS), 2017, pp. 506–519.
[7] J. Jia and N. Z. Gong, "AttriGuard: A practical defense against attribute inference attacks via adversarial machine learning," in USENIX Security Symposium, 2018, pp. 513–529.
[8] G. F. Elsayed, S. Shankar, B. Cheung, N. Papernot, A. Kurakin, I. J. Goodfellow, and J. Sohl-Dickstein, "Adversarial examples that fool both computer vision and time-limited humans," in Neural Information Processing Systems (NeurIPS), 2018, pp. 3914–3924.
[9] Z. Wang, M. Song, S. Zheng, Z. Zhang, Y. Song, and Q. Wang, "Invisible adversarial attack against deep neural networks: An adaptive penalization approach," IEEE Trans. Dependable Sec. Comput., vol. PP, pp. 1–1, DOI: 10.1109/TDSC.2019.2929047, 2019.
[10] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. E. Lauter, M. Naehrig, and J. Wernsing, "CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy," in International Conference on Machine Learning (ICML). ACM, 2016, pp. 201–210.
[11] H. Chabanne, A. de Wargny, J. Milgram, C. Morel, and E. Prouff, "Privacy-preserving classification on deep neural network," IACR Cryptology ePrint Archive 2017/035, 2017.
[12] J. Liu, M. Juuti, Y. Lu, and N. Asokan, "Oblivious neural network predictions via MiniONN transformations," in ACM Conference on Computer and Communications Security (CCS), 2017, pp. 619–631.
[13] C. Juvekar, V. Vaikuntanathan, and A. Chandrakasan, "GAZELLE: A low latency framework for secure neural network inference," in USENIX Security Symposium, 2018, pp. 1651–1669.
[14] B. D. Rouhani, M. S. Riazi, and F. Koushanfar, "DeepSecure: Scalable provably-secure deep learning," in Design Automation Conference (DAC), 2018, pp. 2:1–2:6.
[15] N. Chandran, D. Gupta, A. Rastogi, R. Sharma, and S. Tripathi, "EzPC: Programmable, efficient, and scalable secure two-party computation," in EuroS&P. IEEE, 2019, pp. 496–511.
[16] M. S. Riazi, M. Samragh, H. Chen, K. Laine, K. E. Lauter, and F. Koushanfar, "XONN: XNOR-based oblivious deep neural network inference," in USENIX Security Symposium, 2019, pp. 1501–1518.
[17] C. Gentry, "Fully homomorphic encryption using ideal lattices," in Symposium on Theory of Computing (STOC). ACM, 2009, pp. 169–178.
[18] R. K. H. Tai, J. P. K. Ma, Y. Zhao, and S. S. M. Chow, "Privacy-preserving decision trees evaluation via linear functions," in European Symposium on Research in Computer Security (ESORICS), 2017, pp. 494–512.
[19] E. Hesamifard, H. Takabi, M. Ghasemi, and R. N. Wright, "Privacy-preserving machine learning as a service," PoPETs, vol. 2018, no. 3, pp. 123–142, 2018.
[20] X. Jiang, M. Kim, K. E. Lauter, and Y. Song, "Secure outsourced matrix computation and application to neural networks," in ACM Conference on Computer and Communications Security (CCS), 2018, pp. 1209–1222.
[21] A. Aloufi, P. Hu, H. W. H. Wong, and S. S. M. Chow, "Blindfolded evaluation of random forests with multi-key homomorphic encryption," IEEE Trans. Dependable Sec. Comput., 2019, to appear.
[22] S. S. M. Chow, J. Lee, and L. Subramanian, "Two-party computation model for privacy-preserving queries over distributed databases," in ISOC Network and Distributed System Security Symposium (NDSS), 2009.
[23] S. Kamara, P. Mohassel, and M. Raykova, "Outsourcing multi-party computation," IACR Cryptology ePrint Archive 2011/272.
[24] B. Wang, M. Li, S. S. M. Chow, and H. Li, "A tale of two clouds: Computing on data encrypted under multiple keys," in IEEE Conference on Communications and Network Security (CNS), 2014, pp. 337–345.
[25] M. Baryalai, J. Jang-Jaccard, and D. Liu, "Towards privacy-preserving classification in neural networks," in Privacy, Security and Trust (PST), 2016, pp. 392–399.
[26] P. Mohassel and Y. Zhang, "SecureML: A system for scalable privacy-preserving machine learning," in IEEE Symposium on Security and Privacy (S&P). IEEE, 2017, pp. 19–38.
[27] D. Beaver, "Efficient multiparty protocols using circuit randomization," in Advances in Cryptology – CRYPTO, 1991, pp. 420–432.
[28] A. C. Yao, "How to generate and exchange secrets," in IEEE Symposium on Foundations of Computer Science. IEEE, 1986, pp. 162–167.
[29] N. P. Smart and F. Vercauteren, "Fully homomorphic SIMD operations," Des. Codes Cryptography, vol. 71, no. 1, pp. 57–81, 2014.
[30] Q. Zou, Z. Zhang, Q. Li, X. Qi, Q. Wang, and S. Wang, "DeepCrack: Learning hierarchical convolutional features for crack detection," IEEE Trans. Image Processing, vol. 28, no. 3, pp. 1498–1512, 2019.
[31] J. W. Bos, K. E. Lauter, J. Loftus, and M. Naehrig, "Improved security for a ring-based fully homomorphic encryption scheme," in IMA Int'l Conference on Cryptography and Coding (IMACC), 2013, pp. 45–64.
[32] A. Shamir, "How to share a secret," Commun. ACM, vol. 22, no. 11, pp. 612–613, 1979.
[33] D. Demmler, T. Schneider, and M. Zohner, "ABY - A framework for efficient mixed-protocol secure two-party computation," in ISOC Network and Distributed System Security Symposium (NDSS), 2015.
[34] Y. LeCun, "The MNIST database of handwritten digits," http://yann.lecun.com/exdb/mnist, 1998.
[35] N. Dowlin, R. Gilad-Bachrach, K. Laine, K. E. Lauter, M. Naehrig, and J. Wernsing, "Manual for using homomorphic encryption for bioinformatics," Proc. of the IEEE, vol. 105, no. 3, pp. 552–567, 2017.
[36] Q. Wang, M. Du, X. Chen, Y. Chen, P. Zhou, X. Chen, and X. Huang, "Privacy-preserving collaborative model learning: The case of word vector training," IEEE Trans. Knowl. Data Eng., vol. 30, no. 12, pp. 2381–2393, 2018.
[37] R. Bost, R. A. Popa, S. Tu, and S. Goldwasser, "Machine learning classification over encrypted data," in ISOC Network and Distributed System Security Symposium (NDSS), 2015.
[38] M. Abadi, A. Chu, I. J. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, "Deep learning with differential privacy," in ACM Conference on Computer and Communications Security (CCS), 2016, pp. 308–318.
[39] A. D. Smith, A. Thakurta, and J. Upadhyay, "Is interaction necessary for distributed private learning?" in IEEE Symposium on Security and Privacy (S&P), 2017, pp. 58–77.
[40] L. Xiang, J. Yang, and B. Li, "Differentially-private deep learning from an optimization perspective," in IEEE International Conference on Computer Communications (INFOCOM), 2019, pp. 559–567.
[41] O. Ohrimenko, F. Schuster, C. Fournet, A. Mehta, S. Nowozin, K. Vaswani, and M. Costa, "Oblivious multi-party machine learning on trusted processors," in USENIX Security Symposium, 2016, pp. 619–636.
[42] T. Hunt, C. Song, R. Shokri, V. Shmatikov, and E. Witchel, "Chiron: Privacy-preserving machine learning as a service," CoRR, 2018.
[43] S. Hu, L. Y. Zhang, Q. Wang, Z. Qin, and C. Wang, "Towards private and scalable cross-media retrieval," IEEE Trans. Dependable Sec. Comput., vol. PP, pp. 1–1, DOI: 10.1109/TDSC.2019.2926968, 2019.
[44] F. Tramèr and D. Boneh, "Slalom: Fast, verifiable and private execution of neural networks in trusted hardware," in International Conference on Learning Representations (ICLR), 2019.

Minghui Li received her B.S. degree in Information Security in 2016, and her M.S. degree in Computer Technology in 2018, from Wuhan University, China. She is currently pursuing the Ph.D. degree in the School of Cyber Science and Engineering, Wuhan University, China. Her research focuses on security and privacy issues in cloud computing and artificial intelligence.

Sherman S. M. Chow received the Ph.D. degree from New York University. He is an associate professor with the Chinese University of Hong Kong. He was a research fellow with the University of Waterloo. His interest is applied cryptography. He has published in AsiaCrypt, CCS, EuroCrypt, ITCS, NDSS, and Usenix Security. He served on the program committees of many conferences including AsiaCrypt, CCS, CRYPTO, ESORICS, Financial Crypt, ICDCS, Infocom, PETS, PKC, and TheWeb. He has also served on the editorial boards of many journals and book series, including IEEE Transactions on Information Forensics and Security (TIFS), IET Information Security (IET-IFS), Intl. J. Information Security (IJIS), J. of Information Security and Applications (JISA), EAI Endorsed Transactions on Scalable Information Systems, and SpringerBriefs on Cyber Security Systems and Networks (CSSN). He is a European Alliance for Innovation (EAI) Fellow (2019, inaugural), and was named one of the 100 Most Influential Scholars (Security and Privacy, 2018) by ArnetMiner (AMiner). He received the Early Career Award from the Hong Kong Research Grants Council.

Shengshan Hu received the B.E. degree in Computer Science and Technology from Wuhan University, China, in 2014. He is currently a Ph.D. candidate in the School of Cyber Science and Engineering, Wuhan University. His research interest focuses on secure outsourcing of computations.

Yuejing Yan received her B.S. and M.S. degrees from Wuhan University, China. She is currently pursuing the Ph.D. degree in the State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, China. Her research interest focuses on secure outsourcing of computation.

Minxin Du received his B.S. and M.S. degrees in Information Security from Wuhan University. He is now a Ph.D. student with the Chinese University of Hong Kong. His research focuses on applied cryptography.

Zhibo Wang received the B.E. degree in Automation from Zhejiang University, China, in 2007, and his Ph.D. degree in Electrical Engineering and Computer Science from the University of Tennessee, in 2014. He is currently a Professor with the School of Cyber Science and Engineering, Wuhan University, China. His current research interests include the Internet of Things, network security, and privacy protection. He is a Senior Member of IEEE and a Member of ACM.
