Abstract—Blockchain has gradually attracted widespread attention from the IoT research community because of its decentralization, consistency, and other attributes. It builds a secure and robust system by generating a local backup at each participant node so that the nodes collectively maintain the network. However, this feature raises privacy concerns: since all nodes can access the chain data, users' sensitive information is at risk of leakage. The local differential privacy (LDP) mechanism is a promising way to address this issue because it perturbs data before they are uploaded to the chain. However, traditional LDP mechanisms do not fit well with the blockchain because they require a fixed input range, a large data volume, and the same privacy budget for all users, which are practically difficult to guarantee in a decentralized environment. To overcome these problems, we propose a novel LDP mechanism that splits numerical input data and perturbs them digit by digit, which requires neither a fixed input range nor a large data volume. In addition, we use an iteration approach to adaptively allocate the privacy budget among the perturbation procedures, which minimizes the total deviation of the perturbed data and increases the data utility. We employ mean estimation as the statistical utility metric under both identical and randomized privacy budgets to evaluate the performance of our novel LDP mechanism. The experiment results indicate that the proposed LDP mechanism performs better in different scenarios, and our adaptive privacy budget allocation model can significantly reduce the deviation of the perturbation function to provide high data utility while maintaining privacy.

Index Terms—Adaptive privacy budget allocation, blockchain, local differential privacy (LDP), mean estimation, numerical splitting.

Manuscript received 10 December 2021; accepted 7 January 2022. Date of publication 25 January 2022; date of current version 7 April 2023. (Corresponding author: Kai Zhang.)
The authors are with the Department of Computing Technologies, Swinburne University of Technology, Melbourne, VIC 3122, Australia (e-mail: kevin.zhang0522@gmail.com; 102346450@student.swin.edu.au; hxiao@swin.edu.au; yingzhao@swin.edu.au; 102506526@student.swin.edu.au; jinjun.chen@gmail.com).
Digital Object Identifier 10.1109/JIOT.2022.3145845

I. INTRODUCTION

In recent years, IoT applications and related research [1], [2] have been growing at an impressive speed. When people use IoT devices, a large amount of personal data is collected automatically by the equipment. For data sharing or additional services (prediction, statistics, etc.), some people allow the IoT system to upload their data to cloud servers, which leads to heavy data interaction with third-party applications. This poses strict requirements on the performance and stability of the central server. To build a stable and robust interactive system, many researchers [3]–[8] have explored employing a decentralized blockchain network in IoT scenarios.

In the blockchain, each node generates a local backup of the whole chain data to maintain and synchronize the network [9]. However, such a deep supervision mechanism brings increasing privacy concerns. Since all nodes have access to the chain data, sensitive information faces the threat of leakage. To address the privacy issue, researchers have applied privacy-preserving algorithms to the blockchain, such as secure multiparty computation [10], zero-knowledge proof [11], homomorphic computation [12], [13], and so on. These studies focus on encryption approaches that work on the ciphertext, which provide higher security in theory but cost considerable computing resources. This is practically unacceptable for IoT devices given their constraints in storage and computing capability. In addition, encryption methods lose statistical utility when other users want to extract valuable information through database queries.

Since there is no centralized server in the blockchain, the local differential privacy (LDP) mechanism can be a promising approach to address privacy issues, as it provides both a privacy guarantee and data utility without increasing the computational complexity. However, traditional LDP mechanisms [14]–[17] do not fit well with the blockchain in terms of the mean estimation of numerical data. The reasons are as follows.
1) The input range is limited in advance. In the Laplace mechanism and Duchi's solution, all input data need to be mapped into a range of [−A, A] in preprocessing. However, in a decentralized scenario, it is practically difficult to ask every participant to follow the same input rules.
2) The mean estimation result does not perform well with a small amount of data. A large amount of data is required to balance the positive and negative noise. When the input data volume is small, the perturbation degree is hard to control.
3) The employed privacy budget needs to be the same for all users. The privacy budget corresponds to the protection strength of LDP mechanisms. In a decentralized system, it is challenging to request all participants to choose the same protection strength, since privacy is a very subjective factor.
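To make the first two limitations concrete, the following minimal Python sketch shows a Laplace-style baseline under assumptions that are not taken from this article: every input is clipped to [−A, A], each user adds Laplace noise of scale 2A/ε locally, and the collector averages the reports. The function names (`laplace_report`, `estimate_mean`) and the clipping step are illustrative only.

```python
import random


def laplace_report(value, bound, epsilon, rng):
    """Clip value to [-bound, bound] and add Laplace noise of scale 2*bound/epsilon."""
    clipped = max(-bound, min(bound, value))
    scale = 2.0 * bound / epsilon  # a value in [-A, A] can change by at most 2A
    # The difference of two independent Exp(1) draws is a Laplace(0, 1) sample.
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return clipped + noise


def estimate_mean(values, bound, epsilon, seed=0):
    """Average the noisy reports; the noise only cancels out when there are many of them."""
    rng = random.Random(seed)
    reports = [laplace_report(v, bound, epsilon, rng) for v in values]
    return sum(reports) / len(reports)
```

In this sketch every user must agree on the same `bound` and `epsilon` before reporting, and with only a few reports the noise term (whose scale grows with A/ε) can dominate the average.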
2327-4662 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: George Mason University. Downloaded on February 02,2024 at 15:44:10 UTC from IEEE Xplore. Restrictions apply.
6734 IEEE INTERNET OF THINGS JOURNAL, VOL. 10, NO. 8, 15 APRIL 2023
To address these issues, we propose a novel LDP mechanism adapted to the nature of blockchain in IoT. The usage scenario focuses on privacy preservation for numerical data in the statistical mean estimation query. In general, for any numerical input, this method splits it into several digits and then translates each digit into a binary value for perturbation. After perturbation, the perturbed digits are aggregated by an inverse function. In addition, our method employs an iteration approach to adaptively allocate privacy budgets among the perturbation procedures to minimize the total deviation of the perturbation functions. According to the experiment results, the proposed mechanism provides both a strong privacy guarantee and high utility on mean estimation without requiring a fixed input range, a large data volume, or the same privacy budget. The main contributions of this article are summarized as follows.
1) We propose a novel LDP mechanism that fits the blockchain well, with no requirement of a limited input range or a large data volume.
2) We solve the problem of implementing the LDP mechanism in a blockchain whose participants employ different privacy budgets.
3) We provide an iteration approach to adaptively allocate privacy budgets among the perturbation procedures.

Paper Organization: Section II presents related work and problem analysis. Section III introduces preliminaries. Section IV expounds on our algorithm and approaches. The experimental results and evaluation metrics are demonstrated in Section V. Section VI concludes the article and discusses future directions.

II. RELATED WORK AND PROBLEM ANALYSIS

In the past few years, many researchers have contributed to privacy preservation in the blockchain. Wu et al. [18] presented blockchain-based solutions to address the privacy issues in 5G-enabled drone communications. They also [19] conducted a deep exploration of privacy preservation in the blockchain and edge computing for Industry 4.0. Regarding studies of DP and LDP mechanisms in the blockchain, Hassan et al. [20] discussed the privacy-preserving solutions of blockchain-based IoT systems. Mohanta et al. [21] employed blockchain as a secure database to address the privacy issues in IoT scenarios. Zhao et al. [22] used the blockchain to trace the update operation in a federated learning model to avoid malicious attacks. Gai et al. [23] integrated IoT with edge computing and blockchain, and proposed a framework to establish a privacy-preserving mechanism for industrial IoT scenarios. Zhao et al. [24] proposed a blockchain-based approach to save and track the cost of the DP model.

In general, many studies regarded the blockchain system as a secure data collector while paying less attention to the privacy issues of the blockchain itself. According to the previous discussion, the LDP mechanism can be a feasible solution for the blockchain system in IoT scenarios. However, traditional LDP mechanisms are designed for a centralized server environment, which is the essential reason why they cannot adapt well to the blockchain. In the centralized scenario, there is only one data collector. Hence, users have to follow the same data uploading rules, including a fixed input data range and the same privacy budget. Otherwise, the traditional LDP mechanisms cannot conduct preprocessing (the Laplace mechanism needs to work out the global sensitivity Δf, and Duchi's solution needs to implement discretization) and the data utility degrades greatly. In the blockchain, by contrast, each node can be regarded as a data collector. Nodes are equal to each other, which means they all have the right to set the data upload rules. The same holds for the users, who tend to set the privacy protection strength (privacy budget ε) by themselves.

Based on the above discussion, we propose a novel LDP mechanism for the blockchain in IoT scenarios to overcome these problems, and it is expounded in the following sections.

III. PRELIMINARIES

The LDP mechanism emerged to remove the requirement of a trusted third-party collector. The users perturb their data before uploading to the server, and since the raw data stay hidden locally, privacy is protected. The LDP model is defined as follows: given a mechanism M with domain Dom(M) and range Ran(M), if any two input records t and t′ (t, t′ ∈ Dom(M)) and any output t∗ (t∗ ∈ Ran(M)) satisfy (1), then mechanism M satisfies ε-local differential privacy:

$$P[M(t) = t^{*}] \le e^{\varepsilon} \times P[M(t') = t^{*}]. \tag{1}$$

From the definition, we can learn that the key point of the LDP model is to control the output of mechanism M. For any two inputs t and t′ from Dom(M), the outputs of mechanism M follow similar distributions. According to (1), when the privacy budget ε is close to 0, P(M(t) = t∗) is approximately equal to P(M(t′) = t∗), which indicates that the algorithm is highly protective of the input data. The smaller the privacy budget, the stronger the data privacy protection and the worse the data utility.

IV. OUR PROPOSED APPROACH

A. LDP-Based Data Interaction Framework in Blockchain

The overall framework of the proposed LDP model and its data stream is shown in Fig. 1. Specifically, the IoT devices collect users' data and perturb them locally, then upload the perturbed data to the intermediate client module. The intermediate client module is directly connected to the smart contract through API functions and forwards the perturbed data from users to the blockchain network. Every data upload must pass the consensus. Whenever an upload fails because of a network malfunction, the intermediate client module resends the request. Third-party data analysts can send a query to the smart contract to obtain the mean estimation result, and the querying process is presented in detail in the part on the smart contract.
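As a small check of definition (1) in Section III, the sketch below uses one-bit randomized response as an assumed example mechanism M (it is chosen here only for illustration): the true bit is reported with probability e^ε/(e^ε + 1) and flipped otherwise. The code enumerates every input pair and output and confirms that the probability ratio never exceeds e^ε. The helper names are hypothetical.

```python
import math


def rr_probability(t, output, epsilon):
    """P[M(t) = output] for one-bit randomized response."""
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return p_keep if t == output else 1.0 - p_keep


def satisfies_ldp(epsilon, tol=1e-12):
    """Check inequality (1) for all inputs t, t' and all outputs t*."""
    bound = math.exp(epsilon)
    for t in (0, 1):
        for t_prime in (0, 1):
            for out in (0, 1):
                lhs = rr_probability(t, out, epsilon)
                rhs = bound * rr_probability(t_prime, out, epsilon)
                if lhs > rhs + tol:
                    return False
    return True


def worst_case_ratio(epsilon):
    """Largest ratio P[M(t) = t*] / P[M(t') = t*] over all t, t', t*."""
    return max(
        rr_probability(t, out, epsilon) / rr_probability(t_prime, out, epsilon)
        for t in (0, 1) for t_prime in (0, 1) for out in (0, 1)
    )
```

For this mechanism the worst-case ratio equals e^ε, so it approaches 1 as ε approaches 0, which matches the observation that a smaller budget makes the two output distributions harder to tell apart.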
ZHANG et al.: NUMERICAL SPLITTING AND ADAPTIVE PRIVACY BUDGET-ALLOCATION-BASED LDP MECHANISM 6735
B. Novel LDP Algorithm

In general, the proposed algorithm can be divided into three steps: 1) encoding; 2) perturbation; and 3) aggregation. Each step is described as follows.

1) Encoding: For any numerical input, the encoding function splits it into its decimal digits and then translates each digit from decimal mode into binary mode.

Algorithm (a) Split Numerical Data into Digits by Bit. (b) Transfer Digits into Four-bit Binary Mode
Input: Users' sensitive numeric data Nr;
       Dp denotes the accuracy of the number of decimal places.
Output: Four-bit binary mode array of integer part int_ANr;
        Four-bit binary mode array of decimal part decimal_ANr
Function transfer(num)
    Bm[0...9] ← ["0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111", "1000", "1001"]
    for j < 10 do
        if num = j then
            return Bm[j]
    end for
End
Set arrays int_ANr, decimal_ANr to be empty
int_Nr ← int(Nr)
decimal_Nr ← Nr − int_Nr
int_LNr ← length(int_Nr)
decimal_LNr ← Dp
######### For Integer part #########
for i < int_LNr do
    Di ← int_Nr % 10
    int_Nr ← int_Nr // 10
    Bi ← transfer(Di)
    int_ANr[i] ← Bi
end for
######### For Decimal part #########
for j < decimal_LNr do
    decimal_Nr ← decimal_Nr ∗ 10
    Dj ← int(decimal_Nr)
    decimal_Nr ← decimal_Nr − Dj
    Bj ← transfer(Dj)
    decimal_ANr[j] ← Bj
end for
Return: int_ANr, decimal_ANr

As the pseudocode above shows, during the encoding procedure the model takes the user's sensitive numerical data Nr as input, and the parameter Dp gives the number of decimal places to keep. This procedure produces the 4-bit binary arrays of both the integer part int_ANr and the decimal part decimal_ANr. The function transfer uses enumeration to realize the transformation between binary and decimal mode: once the input digit matches an index of array Bm, the function returns the corresponding binary string value. Di denotes the decimal value of each digit split from Nr, and Bi denotes the corresponding 4-bit binary value. When the input data have a decimal part, the integer part and the decimal part need to be processed separately. Specifically, the decimal part needs to be converted into an integer before processing.

To explain the encoding procedure more clearly, we give an example: for the input number 256.48, we set Dp = 2, and the resulting encoding is shown in Table I.

TABLE I: ENCODING EXAMPLE OF NUMERICAL NUMBER Nr = 256.48

The following perturbation approach is based on the generalized randomized response (GRR) algorithm, which requires the input data to be a binary value before the algorithm is applied. That is the reason why we transfer each digit into binary mode.

2) Data Perturbation: The perturbation function is applied to each 4-bit binary value. To be specific, for each bit the perturbation function returns the original value with probability p and the opposite value with probability 1 − p. In this way, the perturbed 4-bit binary value is obtained. Before perturbation, we need to set the key parameter, the total privacy budget ε. Here, we use α to denote the length of the valid input number (2), and allocate the privacy budget evenly to each digit

$$\alpha = \mathit{int\_LN_r} + \mathit{decimal\_LN_r}. \tag{2}$$

Then, the privacy budget of perturbation for each digit equals ε/α (i.e., by the sequential composition property of the differential privacy model proposed by McSherry [25]). Since the perturbation function applies GRR to each bit of a 4-bit binary value, each bit receives a budget of ε/(4α), and according to the definition of differential privacy

$$p = \frac{e^{\varepsilon/(4\alpha)}}{e^{\varepsilon/(4\alpha)} + 1}. \tag{3}$$
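The encoding step and the per-bit perturbation with the probability from (3) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes non-negative inputs, uses `decimal.Decimal` to avoid floating-point digit errors (an implementation choice made here), and the function names are hypothetical. Integer digits are emitted least-significant first, as in the encoding pseudocode.

```python
import math
import random
from decimal import Decimal


def encode(value, dp):
    """Split a non-negative number into 4-bit strings, one per decimal digit."""
    scaled = int(Decimal(str(value)) * 10 ** dp)      # e.g. 256.48 -> 25648 for dp = 2
    int_part, frac_part = divmod(scaled, 10 ** dp)
    int_digits = [int(c) for c in reversed(str(int_part))]   # least significant first
    frac_digits = [int(c) for c in str(frac_part).zfill(dp)] if dp > 0 else []
    int_bits = [format(d, "04b") for d in int_digits]
    frac_bits = [format(d, "04b") for d in frac_digits]
    return int_bits, frac_bits


def keep_probability(epsilon, alpha):
    """Per-bit probability of reporting the true bit, following (3)."""
    x = math.exp(epsilon / (4 * alpha))
    return x / (x + 1)


def perturb(bits, p, rng):
    """Keep each bit with probability p; otherwise flip it."""
    flip = {"0": "1", "1": "0"}
    return "".join(b if rng.random() < p else flip[b] for b in bits)
```

For example, `encode(256.48, 2)` yields the integer digits 6, 5, 2 and the decimal digits 4, 8 as 4-bit strings, and `perturb(bits, keep_probability(epsilon, 5), random.Random())` would then be applied to each string (α = 5 for this input).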
TABLE II: PROBABILITY OF PERTURBATION FUNCTION

As Table II shows, there are five scenarios for the perturbation result: 1) the binary value is unchanged; 2) 1 bit changes; 3) 2 bits change; 4) 3 bits change; and 5) 4 bits change. The next step is to transfer the perturbed 4-bit binary value into a decimal value. The detailed enumeration result for input digits (from 0 to 9) is given in the Appendix. From this enumeration, we obtain the general mathematical expectation of the output of the perturbation algorithm for an input digit n:

$$\mathrm{Result}_P(n) = (-15 + 2n)p + 15 - n. \tag{4}$$

Algorithm (a) Implement GRR Algorithm on Each Four-bit Binary Digit by Bit. (b) Transfer Perturbed Four-bit Binary Digit into Decimal Value
Input: Four-bit binary mode array of integer part int_ANr;
       Four-bit binary mode array of decimal part decimal_ANr
Output: Perturbed array of integer part int_RP;
        Perturbed array of decimal part decimal_RP
Function perturb(bin)
    Set Ret[0...3] to be empty
    for i < 4 do
        r ← rand.uniform(0, 1)
        if r > p then
            if bin[i] = 0 then
                Ret[i] ← 1
            else
                Ret[i] ← 0
        else
            Ret[i] ← bin[i]
    end for
    return Ret
End
Function transfer(num)
    Bm[0...15] ← ["0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111", "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"]
    for j < 16 do
        if num = Bm[j] then
            return j
    end for
End
Giving ε, int_LNr, decimal_LNr
Set p, int_RP, decimal_RP to be empty
α ← int_LNr + decimal_LNr
p ← e^(ε/(4α)) / (e^(ε/(4α)) + 1)
######### For Integer part #########
for i < int_LNr do
    Tmpi ← perturb(int_ANr[i])
    Ri ← transfer(Tmpi)
    int_RP[i] ← Ri
end for
######### For Decimal part #########
for i < decimal_LNr do
    Tmpi ← perturb(decimal_ANr[i])
    Ri ← transfer(Tmpi)
    decimal_RP[i] ← Ri
end for
Return: int_RP, decimal_RP

As the pseudocode above shows, during the perturbation procedure the input is the 4-bit binary arrays of both the integer part int_ANr and the decimal part decimal_ANr. The perturbed outputs int_RP and decimal_RP correspond to the integer part and the decimal part, respectively, and have been transferred into decimal mode. Here α denotes the length of the valid input number and p represents the perturbation probability.

3) Aggregation: From the output of the second step, we have obtained the perturbed decimal value for each digit. However, the mathematical expectation of the perturbation function is biased for input digits. To rectify the output, we adopt a coefficient C to adjust the output result, according to (4)

$$C = 2n + (15 - 2n)p - 15. \tag{5}$$

After adding the coefficient C, the output result is unbiased, since the expectation in (4) plus C equals n. Then, the last step is to aggregate all perturbed digits.

Algorithm (a) Rectify the Perturbed Decimal Array RP. (b) Aggregate All Perturbed Digits
Input: Perturbed array of integer part int_RP;
       Perturbed array of decimal part decimal_RP
Output: Final output result Sp
Giving Dn, int_LNr, decimal_LNr
Set Sp, int_Sp, decimal_Sp to be empty
######### For Integer part #########
T ← 1
for i < int_LNr do
    n ← Dn[i]
    C ← 2n + (15 − 2n)p − 15
    int_Sp ← int_Sp + (C + int_RP[i]) ∗ T
    T ← T ∗ 10
end for
######### For Decimal part #########
T ← 1
for i < decimal_LNr do
    n ← Dn[i]
    C ← 2n + (15 − 2n)p − 15
    T ← T ∗ 0.1
    decimal_Sp ← decimal_Sp + (C + decimal_RP[i]) ∗ T
end for
Sp ← int_Sp + decimal_Sp
Return: Sp

Here, array Dn denotes the raw input digits, and the final output result is Sp. This completes the whole process of the novel LDP mechanism. Since the proposed LDP mechanism applies directly to numerical data with no other requirements, it adapts well to the decentralized system. Furthermore, because the perturbation function is based on the 4-bit binary value, whose decimal value lies in [0, 15], the corrected output for each digit is bounded in the range [C, 15 + C]. Hence, the bounded output has better utility performance when different privacy budgets are employed.

C. Iteration Approach to Minimize Deviation

With the proposed LDP mechanism, we can obtain great data utility in the statistical mean estimation query. However, the previous procedure allocates the privacy budget evenly across the digits, which produces a larger deviation in the high-order digits. To further improve data utility, we employ an iteration approach to adaptively allocate the privacy budget for each digit so as to minimize the total deviation of our perturbation function. The deviation is defined as follows:

$$\text{deviation} = \operatorname{abs}(\text{perturbed data} - \text{original data}) \tag{6}$$

$$\text{decrement rate} = \frac{\operatorname{abs}(\text{deviation}_{\text{last}} - \text{deviation}_{\text{current}})}{\text{deviation}_{\text{last}}}. \tag{7}$$
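The expectation in (4), the correction in (5), and the idea of shifting budget toward high-order digits can be explored with the following Python sketch. It rests on assumptions that go beyond the text: instead of the sampled deviation in (6), it minimizes a deterministic stand-in, the expected squared deviation of the corrected output (85·p(1 − p) per 4-bit digit, scaled by the squared place value), and it moves a fixed step of budget between two digits at a time so that the total budget stays constant. The names `digit_moments`, `expected_sq_deviation`, and `allocate` are hypothetical.

```python
import math
from itertools import product


def keep_probability(eps_digit):
    """Per-bit keep probability when one digit (4 bits) receives budget eps_digit."""
    x = math.exp(eps_digit / 4.0)
    return x / (x + 1.0)


def digit_moments(n, p):
    """Exact mean and variance of the perturbed 4-bit value of digit n."""
    bits = format(n, "04b")
    mean = second = 0.0
    for out in product("01", repeat=4):
        prob = 1.0
        for b, o in zip(bits, out):
            prob *= p if b == o else 1.0 - p
        value = int("".join(out), 2)
        mean += prob * value
        second += prob * value * value
    return mean, second - mean * mean


def expected_sq_deviation(budgets, weights):
    """Assumed objective: sum over digits of weight^2 * 85 * p * (1 - p)."""
    total = 0.0
    for eps_digit, w in zip(budgets, weights):
        p = keep_probability(eps_digit)
        total += (w * w) * 85.0 * p * (1.0 - p)
    return total


def allocate(budgets, weights, step=0.2, epochs=500):
    """Greedy search: per epoch, apply the best single transfer of `step` budget."""
    best = list(budgets)
    best_dev = expected_sq_deviation(best, weights)
    for _ in range(epochs):
        current = list(best)
        improved = False
        for j in range(len(current)):
            if current[j] - step < 1e-9:
                continue  # keep every digit's budget strictly positive
            for k in range(len(current)):
                if k == j:
                    continue
                trial = list(current)
                trial[j] -= step  # take budget from digit j ...
                trial[k] += step  # ... and give it to digit k
                dev = expected_sq_deviation(trial, weights)
                if dev < best_dev:
                    best, best_dev, improved = trial, dev, True
        if not improved:
            break
    return best
```

For a five-digit input such as 256.48, `weights` would be `[100.0, 10.0, 1.0, 0.1, 0.01]` (most significant digit first), and an even start of `[ε/5] * 5` drifts toward the high-order digits under this objective. The step size 0.2 and the epoch limit 500 mirror the LR and Epoch values of the iteration pseudocode; everything else is a simplification.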
Algorithm (a) Iterate the Allocation of Privacy Budget
Input: Initial Eparray
Output: Final Allocation Privacy Budget Array Eparray
Giving Input Number N
deviation_original ← Perturb(N, Eparray)
Set Eparray to be empty
Epoch ← 500
LR ← 0.2
LN ← length(N)
deviation_min ← deviation_original
for i < Epoch do
    Eptmp ← Eparray
    for j < LN do
        Eptmp[j] ← Eptmp[j] − LR
        for k < LN do
            Eptmp[k] ← Eptmp[k] + LR
            deviation_tmp ← Perturb(N, Eptmp)
            if deviation_tmp < deviation_min then
                deviation_min ← deviation_tmp
                Eparray ← Eptmp
        end for
        Eptmp[j] ← Eptmp[j] + LR
    end for
end for
Return: Eparray

TABLE III: 4-BIT BINARY VALUE OF t, t′, AND t∗

part. Then, transfer decimal parts into integers according to the precision setting. Finally, store both integer and decimal parts into a two-dimensional array with the same first-index number.

When the blockchain receives a mean estimation query, the smart contract adds up all stored values for the integer and decimal parts separately. Then, it returns these results to the external interface function together with the precision rate of the decimal part. The external interface function works out the final result according to (8) and responds to the query

$$R_{\text{final}} = \mathrm{Sum}_{\text{integer}} + \mathrm{Sum}_{\text{decimal}} \times \mathrm{Rate}_{\text{precision}}. \tag{8}$$
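The combination in (8) can be illustrated with a short Python sketch. It assumes, for illustration only, that each stored record is a pair (integer part, decimal part scaled to an integer), that the precision rate is 10^(−Dp), and that the mean is obtained by dividing the final result by the number of records; the function names are hypothetical and do not correspond to any smart contract API.

```python
def final_result(stored, dp):
    """Combine separately summed integer and decimal parts, following (8)."""
    sum_integer = sum(row[0] for row in stored)
    sum_decimal = sum(row[1] for row in stored)
    rate_precision = 10 ** (-dp)   # e.g. 0.01 when two decimal places are kept
    return sum_integer + sum_decimal * rate_precision


def mean_estimate(stored, dp):
    """Assumed last step of a mean query: divide the combined sum by the record count."""
    return final_result(stored, dp) / len(stored)
```

Keeping the decimal part as a scaled integer until the last step means the on-chain sums only involve integer additions in this sketch; the single multiplication by the precision rate happens in the external interface function.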
TABLE IV: PRIVACY BUDGET VALUES DURING ITERATION

However, privacy concerns continue to rise: since all participant nodes can access the chain data, users' sensitive information is vulnerable in the blockchain network.

To address these privacy issues, this article proposes a novel LDP mechanism that protects sensitive data without requiring a fixed input range, a large data volume, or the same privacy budget. In addition, we use an iteration approach to adaptively allocate the privacy budget among the perturbation procedures and minimize the total deviation of the perturbation function. The experiments compare our proposed mechanism with traditional ones in different cases and demonstrate the details of the iteration approach.

In conclusion, our novel mechanism performs better (i.e., with higher utility) under different circumstances. In future work, we will consider extending our proposed mechanism with the exponential mechanism so that it can deal with categorical data in acceptable time complexity.
[25] F. D. McSherry, "Privacy integrated queries: An extensible platform for privacy-preserving data analysis," in Proc. ACM SIGMOD Int. Conf. Manag. Data, 2009, pp. 89–97.
[26] "Heart Rate Analysis." 2017. [Online]. Available: https://github.com/JenniferLing/heart_rate_analysis (accessed Aug. 31, 2021).
[27] "Medical Cost Personal Datasets." 2017. [Online]. Available: https://www.kaggle.com/mirichoi0218/insurance (accessed Aug. 31, 2021).
[28] "FISCO BCOS. The Building Block of Open Consortium Chain." [Online]. Available: https://www.fisco-bcos.org/ (accessed Aug. 31, 2021).

Hongwang Xiao is currently pursuing the Ph.D. degree with the Swinburne University of Technology, Melbourne, VIC, Australia.
His research interests include computer vision, capsule networks, and related topics.

Ying Zhao is currently pursuing the Ph.D. degree with the Department of Computing Technologies, Swinburne University of Technology, Melbourne, VIC, Australia.
Her current research focuses on data privacy.

Jinjun Chen (Fellow, IEEE) received the Ph.D. degree in information technology from Swinburne University of Technology, Melbourne, VIC, Australia.
He is a Professor with the Swinburne University of Technology. His research has been published significantly in various venues. His research is in data security and privacy, cloud computing, and scalable data processing.