Ad Hoc Networks
journal homepage: www.elsevier.com/locate/adhoc
Keywords: Wireless networks; Weighted graph; Privacy protection; Differential privacy

With the development of 5G communication technology, the Internet of Things (IoT) has gained new opportunities for development. In IoT applications, spatial and social relations can be used to make users' life and work more convenient, but they also carry the risk of personal privacy disclosure. The data transmitted in wireless networks contain a large amount of graph-structured data, and the edge weights in a weighted graph increase the risk of privacy disclosure. In this paper we therefore design a privacy protection algorithm for weighted graphs that adopts the differential privacy model to protect both the edge weights and the graph structure. Firstly, the whole graph set is disturbed, with noise added during graph generation. Secondly, the privacy budget is allocated to protect the weight values of edges. The graph is encoded so that its structure can be handled conveniently without being separated from the edge information, and the disturbed edge weights are then integrated into the graph. After that, the graph structure is protected during frequent graph mining combined with differential privacy. Finally, the proposed algorithm is validated by experiments.
1. Introduction

The fifth generation (5G) of mobile communications and the Internet of Things (IoT) are changing the way information is transmitted and data are shared. The unexpected surge of wireless data and the sustained development of various digital terminals make people increasingly rely on digital services in social networking, transportation, payment, navigation, online shopping and e-health [1,2]. As a result, a large amount of network data is generated every day, and each user has become a digital individual. Network data are generally expressed as graphs: the nodes represent individuals or organizations in a social network, and the edges indicate the relationships between nodes [3–5]. In particular, an edge can be weighted to represent a quantified relationship between nodes, such as the degree of intimacy between users or the amount of transactions between companies. These network graph data therefore contain a lot of user information [6–9]. However, the increasing complexity of information technology and the growing capacity to analyze and disseminate network data pose a significant threat to the privacy of users. If these graph data are released without any protective processing, privacy will be disclosed.

When graph data are published, anonymization is usually adopted to delete or replace the node IDs in the graph. However, simply hiding node information still carries the risk of privacy disclosure, because not only the node information but also the structural information can expose the user's privacy. Moreover, in a weighted network graph, such as a traffic network graph or a communication network graph, the weight values also carry the risk of privacy disclosure. When considering the privacy protection of weighted network graphs, the following issues need to be addressed. The first is the applicability of the privacy protection model and the complexity of the graph. Because a graph is a complex data structure, a suitable privacy protection strategy has to be designed when applying a privacy protection model, so that the algorithm remains feasible. The second is the tension between changing the data and protecting the data: both the utility of the graph data and the privacy of the data are required. The third is that graph structure and edge weights are different kinds of data, which makes it difficult to choose processing methods and to apply existing algorithms directly. To address these challenges, we need to employ a stricter privacy protection model for weighted network graph data and design an effective privacy protection algorithm that protects both the graph structure and the edge weight values while ensuring that the disturbed graph data retain high utility.

C. Dwork et al. [10] proposed the differential privacy protection model, which is a method based on data distortion. In this method, the risk of privacy disclosure is strictly defined and a mathematical proof is provided, so that the availability of data can be greatly guaranteed.
∗ Corresponding author.
E-mail address: ningbo@dlmu.edu.cn (B. Ning).
https://doi.org/10.1016/j.adhoc.2020.102303
Received 2 June 2020; Received in revised form 26 August 2020; Accepted 16 September 2020
Available online 23 September 2020
1570-8705/© 2020 Elsevier B.V. All rights reserved.
B. Ning et al. Ad Hoc Networks 110 (2021) 102303
In this model, the privacy risk of adding or deleting a record in a data set is kept within a small bound, which makes it impossible for an attacker to infer a user's accurate information. In other words, the presence or absence of a record has only a small impact on the final output. In addition, because the model is mathematically grounded, it has attracted great interest from researchers and poses new challenges. Therefore, in this paper, we adopt the differential privacy protection model to protect the privacy of weighted graph data.

At present, differential privacy has been applied to the protection of non-weighted graph data with good results [11–13]. Q. Xiao et al. [11] used the HRG (hierarchical random graph) model to transform the graph into a tree structure in order to deal with the complex structure of the graph, and used the MCMC (Markov chain Monte Carlo) method to sample the HRG model space while satisfying differential privacy. Compared with earlier work, this approach reduces the noise added and effectively retains the basic structural characteristics of the network. Kang Dong et al. [13] improved on the work of Q. Xiao et al. However, edge weights also increase the risk of privacy disclosure, and algorithms for non-weighted graphs cannot be used directly to protect weighted graphs, because the weight of each edge has to be considered as well.

In the privacy protection of weighted graph data, most studies employed the k-anonymity method [14–16] to protect personal privacy. Because k-anonymity relies on assumptions about the attacker's background knowledge, the protection these algorithms offer is limited. There have also been many attempts to protect the weights by differential privacy, but many of them do not consider protecting the graph structure while protecting the edge weights. Hu et al. [17] protected edge weight information from leakage by transforming edge weights into probability values, so that the graph is published as an uncertain graph, but data utility is largely ignored. Xiaoye Li et al. [18] transformed the edge weight sequence into a non-attribution histogram and then used differential privacy to protect the weight information in the graph. However, they only consider the privacy protection of the weights, not of the graph structure. Li and Chen et al. [19,20] emphasized that the weight information of edges is not considered in the privacy protection of non-weighted graphs, and that the weights and structure of weighted graphs increase the risk of privacy leakage. They therefore designed a privacy protection algorithm that combines edge weight values and graph structure. Li et al. [21] transformed graphs into ordered triples and designed a query model, wsquery, that satisfies differential privacy.

In addition, some work has used differential privacy to protect the privacy of frequent subgraphs. E. Shen et al. [22] combined frequent graph pattern mining with a differential privacy guarantee to guard against the mining of frequent subgraphs without permission. At the same time, to ensure both privacy and utility, they proposed an effective neighbor counting technique. Xiang Cheng et al. [23], building on E. Shen et al., addressed the inaccurate mining results and weak differential privacy protection caused by a large output space. Their whole algorithm satisfies $\varepsilon$-differential privacy, which largely guarantees both the utility and the privacy of the data. Many algorithms for mining frequent subgraphs have been proposed. The gSpan algorithm [24] used in this paper is a frequent subgraph mining algorithm based on depth-first search of the graph; it makes the representation of a graph unique and easy to operate on.

In this paper, we design a privacy protection algorithm for weighted graphs and use the differential privacy protection model to protect the edge weights and the structure of the graph. At the same time, the proposed method preserves data utility as much as possible. The main contributions are as follows:

(1) According to the frequency of edges, Laplace noise from differential privacy is applied in the graph generation process, and reasonable rules for graph generation are designed.
(2) After obtaining the disturbed graph sets, we design the edge weight protection algorithm, including a reasonable privacy budget allocation strategy.
(3) The disturbed edge weights are then integrated into the coding process of the graph, and the frequent subgraphs of the graph set are mined. In the mining process, the Laplace mechanism and the exponential mechanism of differential privacy are used to protect the graph structure, which improves data utility.
(4) Extensive experiments verify the rationality and effectiveness of our algorithm.

2. Related definition

The differential privacy (DP) model [10] limits an attacker's ability to infer private information, even when the attacker is powerful. Owing to the robustness of this model, it has been widely used for privacy protection of graph data. The definitions and properties of differential privacy are described as follows.

Definition 2.1 ($\varepsilon$-Differential Privacy). Given a random algorithm $M$, let $Range(M)$ denote the set of all possible outputs of $M$. If, for any two neighbor data sets $D$ and $D'$ and any subset $S$ of $Range(M)$, $\Pr(M(D) \in S) \le \Pr(M(D') \in S) \times \exp(\varepsilon)$ holds, then algorithm $M$ satisfies $\varepsilon$-differential privacy.

Neighbor data sets are two data sets that differ in only one record. $\varepsilon$ is the privacy budget, which determines the strength of privacy protection. Generally, the smaller the privacy budget $\varepsilon$, the stronger the privacy protection.

In addition, there are two common mechanisms of differential privacy, the Laplace mechanism and the exponential mechanism [25]. The Laplace mechanism is used for numerical privacy protection, while the exponential mechanism is used for non-numerical privacy protection. The two mechanisms are defined below. The choice of noise mechanism determines the accuracy of queries.

Definition 2.2 (Laplace Mechanism). Given a data set $D$ and a function $f: D \to R^d$ with sensitivity $\Delta f$, the random algorithm $M(D) = f(D) + Lap(\Delta f/\varepsilon)$ satisfies $\varepsilon$-differential privacy, where $Lap(\Delta f/\varepsilon)$ is random noise drawn from a Laplace distribution with scale parameter $\Delta f/\varepsilon$. The magnitude of noise is directly proportional to $\Delta f$ and inversely proportional to $\varepsilon$.

Next, the concept of global sensitivity is given. This parameter, which is required by the mechanism, determines the degree of disturbance; it depends only on the query function and is independent of the data set.

Definition 2.3 (Global Sensitivity). Given a function $f: D \to R^d$ whose input is a data set $D$ and whose output is a $d$-dimensional real vector, for any neighbor data sets $D$ and $D'$, $\Delta f = \max_{D,D'} \| f(D) - f(D') \|_1$ is the global sensitivity of $f$, where $R$ denotes the real number space of the mapping and $\| \cdot \|_1$ is the $L_1$ distance between $f(D)$ and $f(D')$.

Definition 2.4 (Exponential Mechanism). Given a data set $D$, let the random algorithm $M$ output an entity object $o \in Range(M)$, let $u(D, o)$ be the utility function, and let $\Delta u = \max_{\forall o, D, D'} \| u(D, o) - u(D', o) \|$ be the sensitivity of the utility function. If the algorithm selects and outputs $o$ from $Range(M)$ with probability proportional to $\exp\left(\frac{\varepsilon u(D, o)}{2 \Delta u}\right)$, then algorithm $M$ satisfies $\varepsilon$-differential privacy.
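The Laplace mechanism (Definition 2.2) and the exponential mechanism (Definition 2.4) can be illustrated with a minimal Python sketch. It is not the authors' implementation (the paper reports a Java implementation); the function names `laplace_noise`, `laplace_mechanism`, `exponential_probabilities` and `exponential_mechanism` are hypothetical, and the sketch assumes a scalar query and a finite candidate list.

```python
import math
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two independent Exp(1) variables follows Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Definition 2.2 for a scalar query: f(D) + Lap(sensitivity / epsilon)."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)


def exponential_probabilities(utilities, sensitivity, epsilon):
    """Selection probabilities proportional to exp(epsilon * u / (2 * sensitivity))."""
    top = max(utilities)  # shifting by the maximum avoids overflow, ratios are unchanged
    weights = [math.exp(epsilon * (u - top) / (2.0 * sensitivity)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, utilities, sensitivity, epsilon, rng=random):
    """Definition 2.4: pick one candidate with the probabilities computed above."""
    probs = exponential_probabilities(utilities, sensitivity, epsilon)
    return rng.choices(candidates, weights=probs, k=1)[0]
```

For example, `laplace_mechanism(42.0, sensitivity=1.0, epsilon=0.5)` adds noise with scale 2, and a smaller `epsilon` gives a larger scale, matching the remark that noise is inversely proportional to the privacy budget.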
Where $(min\_length(V_i, V_j) - 1)$ represents the shortest path length between the current node $v_i$ and another node $v_j$. That is to say, according to the six degrees principle, the value of $R(V_i, V_j)$ can be 0, 0.2, 0.4, 0.6, 0.8 or 1. Here only the number of edges on the shortest path between nodes is considered; the weights on the edges are not used in computing the shortest path length. Since the shortest path length may exceed 6 in other network graphs, the value of $R(V_i, V_j)$ is defined to be 0 whenever the shortest path length between the nodes exceeds 6.

Then, the relationship between a target node $v_{j^*}$ and the other generated nodes $v_i$ can be obtained by summing and averaging:

$$\bar{R}(V_i, V_{j^*}) = \frac{1}{n} \sum_{i=1}^{n} R(V_i, V_{j^*}) \qquad (2)$$

In a weighted graph, the weight of an edge should also be considered. In particular, when an edge brings a new node into the graph, the new node may be related to some of the existing nodes. We therefore obtain the neighbor edge filtering condition, using the letter $I$ to denote the amount of information carried by an edge:

$$I = \alpha \times \bar{R}(V_i, V_{j^*}) + (1 - \alpha) \times W_{ij^*} \qquad (3)$$

where $\alpha$ is the balance parameter. The principle behind this formula is twofold: the stronger the relationship between a node and the other nodes, the more likely it is to be selected; and, since the weight value generally represents the degree of a relationship or a transaction amount, the larger the weight, the more important the node. Hence the neighbor edge with the larger $I$ value is more likely to be selected.

As shown in Fig. 2(a), if $\alpha$ is $1/2$, the candidate neighbor set formed by the current edge $e_{12}$ is $N = \{e_{13}, e_{16}, e_{17}, e_{23}, e_{28}\}$. The $I$ value of edge $e_{13}$ is $I = 1/2 \times (1/2 \times (1 + 0.8)) + (1 - 1/2) \times 2 = 0.45 + 1 = 1.45$. Similarly, the $I$ values of the other neighbor edges are $\{1.95, 1.45, 2, 0.95\}$. Therefore, according to the neighbor edge filtering rule, edge $e_{23}$ is selected and added to the new graph. The candidate neighbor set is then formed from the selected edges $e_{12}$ and $e_{23}$, and the filtering is repeated until a new graph is formed, as shown in Fig. 2(b). In addition, there will be cases where two edges share the same largest $I$ value; edges $e_{13}$ and $e_{17}$, for example, have equal $I$ values. In this case, the edge with the higher frequency is selected to join the graph.

Algorithm 1 is the graph dataset generation algorithm (GDGA). Line 2 disturbs the frequency of the edges in the graph to form a new frequency set $F_i'$. Line 3 initializes the new graph by selecting the edge $e_1$ corresponding to the first item in $F_i'$ into the graph $G_i'$. Line 4 generates the neighbor edge set $N$ according to the nodes of the selected edge. Lines 5–8 filter and judge the neighbor candidate set. Line 9 regenerates the neighbor edge set $N$ for the edges generated so far and continues the filtering and judgment of Lines 5–8. Finally, Line 12 returns the generated graph data set $GD'$ after the disturbance.

Algorithm 1: graph dataset generation algorithm
Input: original graph data set $GD$, edge frequency set $F_i$
Output: generated graph dataset $GD'$
 1  for $i = 1$ to $n$ do
 2      $F_i' \leftarrow F_i + \mathrm{Laplace}(1/\varepsilon_1)$;
 3      Initialize $G_i' \leftarrow e_1$;
 4      Generate $G_i'$ neighbor collection $N$;
 5      foreach neighbor edge $e \in N$ do
 6          if $e$'s frequency $\ge$ Threshold and $I$ is max then
 7              Remove $e$ from $N$;
 8              $G_i' \leftarrow G_i' + e$;
 9              goto Line 4;
10          else
11              $GD' \leftarrow G_i'$;
12  Return $GD'$;

3.2. Privacy protection of edge weight

After getting the disturbed graph set, this section designs the privacy protection algorithm for edge weights. Because the algorithm designed in this paper processes each graph in a graph set, the protection of the edge weights can be transformed into the protection of edge weight sequences.

The reasonable use of the privacy budget is a problem that has to be considered in differential privacy protection. A good privacy budget allocation strategy can avoid wasting the privacy budget and reduce the risk of privacy disclosure. In edge weight protection, if the privacy budget is evenly distributed, there is still a risk of privacy disclosure. At the same time, considering that treating the graphs separately may destroy the correlation between them, we design the allocation strategy of the privacy budget according to the number of edges, which preserves the correlation across the whole graph set. Under this principle, the privacy budget of each edge weight sequence is $\frac{E_i}{E} \times \varepsilon$. If we directly use $\frac{E_i}{E} \times \varepsilon$ as the parameter of the Laplace function, the noise will be very large, which disturbs the data too much and wastes the privacy budget. Therefore, we use the negative exponential function to adjust the privacy budget, which keeps the denominator of the Laplace scale between 0 and 1; that is, it reduces the noise disturbance and ensures data utility while still satisfying differential privacy. The Laplace function then allocates noise to each edge of the graph, expressed as $\mathrm{Laplace}\left(\Delta w / e^{-\frac{E_i}{E} \times \varepsilon_2}\right)$, where $\Delta w$ is the sensitivity of the weight sequence and $\varepsilon_2$ is the privacy budget allocated to edge weight protection. The settings of the two functions will be described in detail in the privacy analysis of this chapter. Algorithm 2 is the edge weight protection algorithm (EWPA). Line 4 adds Laplace noise to the edge weight, and Line 5 checks the weight after disturbance, so that the weight does not lose its practical meaning.

Algorithm 2: edge weight protection algorithm
Input: edge weight sequence set $WS$, privacy budget $\varepsilon_2$, the number of edges in each graph $E_i$, the size of graph set $GD$, total number of edges $E$
Output: disturbed edge weight sequence set $WS'$
 1  for $i = 1$ to $n$ do
 2      foreach $WS_i \in WS$ do
 3          for $j = 1$ to $E_i$ do
 4              $w_j' = w_j + \mathrm{Laplace}\left(\Delta w / e^{-\frac{E_i}{E} \times \varepsilon_2}\right)$;
 5              if $w_j' < 0$ then
 6                  goto Line 4
 7              else
 8                  $WS' \leftarrow w_j'$;
 9  Return $WS'$;

3.3. Privacy protection of graph structure

The protection of graph structure is a difficult point in the privacy protection of graphs, especially weighted graphs. In addition, the existing non-weighted graph protection methods are difficult to apply directly to the privacy protection of weighted graphs. To protect the frequent graph structure, we extend the coding process of the existing frequent subgraph mining algorithm, integrate the edge weight values into the graph coding, and adopt differential privacy protection in the frequent subgraph mining process.

Table 1
EDFS codes of the edge orders of Fig. 3(b) to Fig. 3(d).
Edge   Fig. 3(b)           Fig. 3(c)           Fig. 3(d)
0      ⟨0, 1, 3, X, Y⟩     ⟨0, 1, 4, Y, Z⟩     ⟨0, 1, 3, X, Y⟩
1      ⟨1, 2, 5, Y, X⟩     ⟨2, 0, 5, X, Y⟩     ⟨1, 2, 4, Y, Z⟩
2      ⟨2, 3, 3, X, Z⟩     ⟨1, 2, 6, Z, X⟩     ⟨2, 3, 6, Z, X⟩
3      ⟨4, 1, 4, Z, Y⟩     ⟨2, 3, 3, X, Z⟩     ⟨3, 1, 5, X, Y⟩
4      ⟨2, 4, 6, X, Z⟩     ⟨0, 4, 3, Y, X⟩     ⟨3, 4, 3, X, Z⟩
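The neighbor edge filtering of Section 3.1 and the weight perturbation of Algorithm 2 can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: all function names are hypothetical, the relation values $R$ are taken as given inputs, and the Laplace scale $\Delta w / e^{-(E_i/E) \times \varepsilon_2}$ is one reading of the formula in Section 3.2, which is an assumption.

```python
import math
import random


def information_value(alpha, relations, weight):
    """Formula (3): I = alpha * mean(R) + (1 - alpha) * W for one neighbor edge."""
    mean_relation = sum(relations) / len(relations)  # formula (2), the averaged relation
    return alpha * mean_relation + (1.0 - alpha) * weight


def pick_neighbor_edge(i_values, frequency):
    """Return the edge with the largest I value; ties go to the higher frequency."""
    return max(i_values, key=lambda e: (i_values[e], frequency.get(e, 0)))


def laplace_noise(scale, rng):
    """Zero-mean Laplace sample built from two independent Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def perturb_weights(weights, sensitivity, eps2, edges_in_graph, total_edges, rng=None):
    """Sketch of Algorithm 2 for a single graph: add noise, redraw negative weights."""
    rng = rng or random.Random()
    # ASSUMPTION: scale = delta_w / exp(-(E_i / E) * eps2), as read from Section 3.2.
    scale = sensitivity / math.exp(-(edges_in_graph / total_edges) * eps2)
    noisy = []
    for w in weights:
        candidate = w + laplace_noise(scale, rng)
        while candidate < 0:  # Lines 5-6: a negative weight has no practical meaning
            candidate = w + laplace_noise(scale, rng)
        noisy.append(candidate)
    return noisy
```

With the numbers of the worked example, `information_value(0.5, [1.0, 0.8], 2.0)` reproduces the value 1.45 for edge $e_{13}$, and `pick_neighbor_edge` applied to the five $I$ values of the example returns $e_{23}$.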
3.3.1. Graph extended coding

The gSpan algorithm [24] used in this paper is a classic algorithm based on depth-first search (DFS). By coding the graph, direct operations on the graph are avoided, and the workload caused by subgraph isomorphism is reduced. The core idea of the algorithm consists of two parts. (1) Coding process: the graph is DFS-encoded and identified by its smallest DFS code. (2) Subgraph mining process: if the DFS code of a candidate subgraph is not the smallest, the graph is pruned; if the code is the smallest, the subgraph is filtered according to the threshold condition.

In this paper, the graph representation extends the encoding process of the gSpan algorithm. When encoding the graph, the disturbed weight values obtained from Algorithm 2 are integrated into the encoding process, and the rules for forming the minimum DFS code are adjusted accordingly. Coding the graph achieves the following goals. The first is to simplify operations on the graph and overcome the difficulty of operating on it directly. The second is to ensure that a change of edge weight can affect the generation of the graph structure, so as to establish a relationship between the two. The third is to facilitate the subsequent frequent graph mining and reduce the search space. In this paper, the extended coding algorithm of the graph is called the GEC algorithm. An example is given in this section to illustrate how the algorithm works.

As shown in Fig. 3, performing depth-first search on the vertices of Fig. 3(a) may generate multiple DFS search trees. Fig. 3(b) to Fig. 3(d) show three search trees obtained by depth-first search on Fig. 3(a); they are isomorphic. The identification subscript of each node in a search tree is set according to the order of traversal: the earlier a node is found, the smaller its subscript. For example, the first visited node $X$ in Fig. 3(b) is set to $v_0$, the second visited node $Y$ is set to $v_1$, and so on for each visited node. The representation of each edge then follows. For example, the edge between $Y$ and $Z$ in Fig. 3(c) can be represented as $(v_0, v_1)$. Outer edges and inner edges are formed in the search tree, represented by solid and dotted lines respectively. The edge between $Y$ and $Z$ in Fig. 3(b) corresponds to the dotted edge $(v_4, v_1)$ between node IDs $v_1$ and $v_4$, which is an inner edge; the node ID subscripts of an inner edge are in reverse order. At the same time, the edge order $<_T$ can be formed according to the visited nodes. The rules of edge order formation are as follows. Assume $e_1 = (i_1, j_1)$ and $e_2 = (i_2, j_2)$: (1) if $i_1 = i_2$ and $j_1 < j_2$, then $e_1 <_T e_2$; (2) if $i_1 < j_1$ and $j_1 = i_2$, then $e_1 <_T e_2$; (3) if $e_1 <_T e_2$ and $e_2 <_T e_3$, then $e_1 <_T e_3$. According to these rules, the expression of the edge order is unique. The edge orders of Fig. 3(b) to Fig. 3(d) are then obtained as follows:

∙ The edge order of Fig. 3(b) is $(v_0, v_1)$, $(v_1, v_2)$, $(v_2, v_3)$, $(v_1, v_4)$, $(v_4, v_2)$;
∙ The edge order of Fig. 3(c) is $(v_0, v_1)$, $(v_1, v_2)$, $(v_2, v_0)$, $(v_2, v_3)$, $(v_0, v_4)$;
∙ The edge order of Fig. 3(d) is $(v_0, v_1)$, $(v_1, v_2)$, $(v_2, v_3)$, $(v_3, v_1)$, $(v_3, v_4)$.

Each search tree corresponds to an edge order representation, which gives the original graph a reasonable representation, so we treat the graph as a set of tuples. Since the edge weight is considered in the coding, each edge of the sequence can be expressed as a six-tuple following the tuple design of the original algorithm, with the weight value placed in the third position; that is, the tuple representation of each edge is $\langle i, j, W_{ij}, l_i, l_{(i,j)}, l_j \rangle$. The tuple that includes the weight value is called the edge-ordered EDFS code, where $i$ and $j$ are the node identifiers, $W_{ij}$ is the weight of the edge, and $(l_i, l_{(i,j)}, l_j)$ are the labels of the nodes and the edge. Because edge labels are not considered in this paper, the GEC algorithm omits the edge label when constructing the EDFS code, so each edge can be expressed as a five-tuple. For example, $(v_0, v_1)$ in Fig. 3(b) can be expressed as the tuple $\langle 0, 1, 3, X, Y \rangle$.

Therefore, the EDFS codes of the edge orders are obtained as shown in Table 1.

Since the edge weight value is included in the edge tuple, the formation rule of the minimum EDFS code changes. The selection rule of the minimum EDFS code is as follows. Suppose there are two edges $e_1 = (v_i, v_j, W_{ij}, l(v_i), l(v_j))$ and $e_2 = (v_x, v_y, W_{xy}, l(v_x), l(v_y))$. Then $e_1 < e_2$ (the smaller edge is selected first) if and only if one of the following conditions is met:

(1) $(v_i, v_j) < (v_x, v_y)$, or
(2) $(v_i, v_j) = (v_x, v_y)$ and $W_{ij} > W_{xy}$, or
(3) $(v_i, v_j) = (v_x, v_y)$ and $W_{ij} = W_{xy}$ and $(l(v_i), l(v_j)) < (l(v_x), l(v_y))$.

In the minimum EDFS code selection rule, edges are compared item by item. The first two items, i.e. the node IDs, are compared first; by condition (1), the edge with the smaller node IDs is selected first. The change introduced in condition (2) is that when the node identifiers of two edges are equal, the weight value is taken into account, and the edge with the higher weight is selected first. This condition reflects the fact that the weight value usually represents the intimacy of a relationship, a transaction amount, and so on, so an edge with a larger weight should be selected earlier.
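The minimum EDFS code selection rule can be expressed as a sort key, sketched below in Python. The names `EdgeTuple`, `edge_key` and `minimum_edfs` are hypothetical, and the sketch assumes that node ID pairs and label pairs are compared lexicographically, which the text does not spell out. Negating the weight makes the larger weight compare as smaller, which is condition (2).

```python
from typing import Dict, List, NamedTuple


class EdgeTuple(NamedTuple):
    """Five-tuple <i, j, W_ij, l_i, l_j> of one edge in an EDFS code."""
    i: int
    j: int
    weight: float
    label_i: str
    label_j: str


def edge_key(e: EdgeTuple):
    # Conditions (1)-(3): smaller node IDs first, then larger weight, then smaller labels.
    return (e.i, e.j, -e.weight, e.label_i, e.label_j)


def minimum_edfs(codes: Dict[str, List[EdgeTuple]]) -> str:
    """Return the name of the code that is smallest when compared edge by edge."""
    return min(codes, key=lambda name: [edge_key(e) for e in codes[name]])


# EDFS codes copied from Table 1.
TABLE_1 = {
    "Fig. 3(b)": [EdgeTuple(0, 1, 3, "X", "Y"), EdgeTuple(1, 2, 5, "Y", "X"),
                  EdgeTuple(2, 3, 3, "X", "Z"), EdgeTuple(4, 1, 4, "Z", "Y"),
                  EdgeTuple(2, 4, 6, "X", "Z")],
    "Fig. 3(c)": [EdgeTuple(0, 1, 4, "Y", "Z"), EdgeTuple(2, 0, 5, "X", "Y"),
                  EdgeTuple(1, 2, 6, "Z", "X"), EdgeTuple(2, 3, 3, "X", "Z"),
                  EdgeTuple(0, 4, 3, "Y", "X")],
    "Fig. 3(d)": [EdgeTuple(0, 1, 3, "X", "Y"), EdgeTuple(1, 2, 4, "Y", "Z"),
                  EdgeTuple(2, 3, 6, "Z", "X"), EdgeTuple(3, 1, 5, "X", "Y"),
                  EdgeTuple(3, 4, 3, "X", "Z")],
}

print(minimum_edfs(TABLE_1))  # prints "Fig. 3(c)"
```

All three codes start with node IDs $(0, 1)$, so the first edge is decided by weight: the code of Fig. 3(c) has weight 4 against 3 for the other two and is therefore the smallest under this key.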
Therefore, the EDFS code corresponding to Fig. 3(c) is the minimum EDFS code of Fig. 3(a), that is, the unique representation of the graph. We can then filter frequent subgraphs and protect their privacy according to the minimum code during subgraph mining. The specific process is described in the next section.

3.3.2. Mining and protection of frequent graph structure

This paper uses differential privacy to protect the graph structure during mining. Because differential privacy allows the strength of protection to be analyzed quantitatively, it has been widely used in data publishing and data mining. However, privacy protection and data mining are always in tension: the greater the degree of privacy protection, the harder the mining, and vice versa. In existing subgraph mining work that uses differential privacy, the usual approach is to add Laplace noise directly to the support of subgraphs and then filter frequent subgraphs. Considering that graph structure is non-numerical data, this paper proposes using the exponential mechanism of differential privacy to protect the privacy of graph structure. By setting the parameters of the exponential mechanism, it is successfully applied to the subgraph mining algorithm, so that the algorithm uses both mechanisms of differential privacy to protect the graph structure, which improves the data utility of the mining results.

In the design of the algorithm, 1-graphs are taken as the candidate set of subgraphs, and each subgraph is checked for whether its EDFS code is the minimum EDFS code. If it is, children of the subgraph are generated by adding one edge at a time. Each child has a support; the support is disturbed to obtain a noisy support, which is then compared with the threshold. If the noisy support is below the threshold, the subgraph is eliminated. Children are then generated from the remaining subgraphs, and the frequent subgraphs are obtained by repeating the support disturbance and threshold screening. The algorithm does not end there: it filters the frequent subgraphs again, using the exponential mechanism of differential privacy to select the most suitable frequent subgraphs as the final output, so as to ensure both data utility and privacy. The frequent subgraph protection algorithm (FSPA) is shown in Algorithm 3.

Algorithm 3: Frequent subgraph protection algorithm
Input: $i$-subgraph candidate $C_i$; privacy budget $\varepsilon_3$, $\varepsilon_4$; frequency threshold; frequent $i$-graphs $n_i$
Output: Frequent $i$-subgraphs $F_i$
 1  if $s \ne min\_EDFS(s)$ then
 2      Return $FS \leftarrow FS \cup \{s\}$;
 3  for $j = 1$ to $n_i$ do
 4      foreach $s \in C_i$ do
 5          enumerate $S \in G \subseteq GD$ and count its children;
 6          foreach $c$, $c$ is $s$' child with one edge growth in $GD$ do
 7              if lsupport($c$) = support($c$) + laplace($1/\varepsilon_3$) $\ge$ Threshold then
 8                  $C_i' \leftarrow c$;
 9      if $C_i' \ne \emptyset$ then
10          $g_j \leftarrow$ select a subgraph $g$ from $C_i'$ such that
11              $\Pr\{\text{Selecting subgraph } g\} \propto \exp\left(\frac{\varepsilon_4 \times support}{2 n_i}\right)$;
12          Remove $g_j$ from $C_i$;
13          $F_i \leftarrow g_j$;
14      $FSPA(GD, \varepsilon_3, \varepsilon_4, FS, s)$;
15  Return $F_i$;

3.4. Algorithm analysis

In this section, we analyze the correctness of differential privacy and the computational complexity of the proposed algorithms.

Firstly, the correctness of differential privacy can be shown as follows. The WGPA algorithm divides the privacy budget $\varepsilon$ into four parts such that $\varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \varepsilon_4 \le \varepsilon$. Among them, $\varepsilon_1$ is used to disturb the graph data set, $\varepsilon_2$ is used to disturb the edge weights, and $\varepsilon_3$ and $\varepsilon_4$ are used to disturb the graph structure during frequent subgraph mining. Based on the parallel composition property [10] of differential privacy, each part of the algorithm satisfies differential privacy. Since the graph generation algorithm is designed on the whole graph data set, according to the sequential composition property [10] of differential privacy, if an algorithm $A_1$ satisfies $\varepsilon$-differential privacy, an algorithm $A_2(A_1())$ built on it still conforms to differential privacy. This ensures that the WGPA algorithm satisfies $\varepsilon$-differential privacy with $\varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \varepsilon_4 \le \varepsilon$.

Secondly, we analyze the time complexity of the three algorithms proposed in this paper. The time complexity of Algorithm 1 is $O(n \times m)$, where $n$ is the size of the graph data set $GD$ and $m$ is the size of the edge frequency set $F_i'$. The time complexity of Algorithm 2 is $O(n \times E_i)$, where $n$ is the size of the graph data set $GD$ and $E_i$ is the number of edges in each graph. The time complexity of Algorithm 3 is $O(n_i \times |C_i|)$, where $n_i$ is the number of frequent $i$-subgraphs and $|C_i|$ is the size of the candidate $i$-subgraph set.
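The two-stage filtering of Algorithm 3 can be sketched in Python under simplifying assumptions: candidate subgraphs are represented only by a name and an exact support, and child generation and the minimum EDFS check are left out. The function name `frequent_subgraph_selection` and its parameters are hypothetical; this is an illustration of the noisy threshold test followed by exponential-mechanism selection, not the authors' implementation.

```python
import math
import random


def laplace_noise(scale, rng):
    """Zero-mean Laplace sample built from two independent Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def frequent_subgraph_selection(supports, threshold, n_i, eps3, eps4, rng=None):
    """Keep candidates whose noisy support passes the threshold, then pick up to n_i."""
    rng = rng or random.Random()
    # Stage 1 (Line 7): compare support + Laplace(1 / eps3) against the threshold.
    survivors = {g: s for g, s in supports.items()
                 if s + laplace_noise(1.0 / eps3, rng) >= threshold}
    selected = []
    # Stage 2 (Lines 9-13): select with probability proportional to
    # exp(eps4 * support / (2 * n_i)), without replacement.
    while survivors and len(selected) < n_i:
        names = list(survivors)
        top = max(survivors.values())  # shift by the maximum to avoid overflow
        weights = [math.exp(eps4 * (survivors[g] - top) / (2.0 * n_i)) for g in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        selected.append(pick)
        del survivors[pick]
    return selected
```

A call such as `frequent_subgraph_selection({"a": 10, "b": 9, "c": 8, "d": 1}, threshold=5, n_i=2, eps3=1.0, eps4=1.0)` returns at most two candidate names; candidates with higher support are more likely both to pass the threshold and to be selected.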
From the analysis above, we can see that the proposed method is effective and efficient.

4. Experiment

In this chapter, the proposed algorithm is verified by experiments. To observe the influence of the algorithm's parameters on the experimental results, two evaluation measures are adopted. Two variants serve as comparison algorithms: the algorithm without the graph generation process (called Naive) and the algorithm that uses only the Laplace mechanism in frequent subgraph mining (called Basic).

4.1. Experimental setup

The experiments were carried out on a 64-bit Windows 7 operating system with an Intel i3 3.8 GHz processor and 12 GB of memory, and were implemented in Java. Two real datasets are used to verify the effectiveness of the algorithms, as shown in Table 2.

In addition, RE and F1-score are used to test the performance of the algorithms. They are defined as follows.

RE (relative error) measures the reliability of mining results, as shown in formula (4).

$$RE = \frac{MeasuredValue - ActualValue}{ActualValue} \times 100\% \qquad (4)$$
Table 2
Experimental data set.

Dataset     Number of graphs   Vertices   Total edges   Average edges   Weight range
Grd [29]    340                9189       9317          28              0-5
IBM [30]    1230               14304      25221         21              0-20

F1-score measures the data availability of the mining results, as shown in formula (5).

\[ F1\text{-}score = 2 \times \frac{precision \times recall}{precision + recall} \tag{5} \]

Here, $precision = |U_p \cap U_c| / |U_p|$ and $recall = |U_p \cap U_c| / |U_c|$, where $U_c$ is the exact result of frequent subgraph mining and $U_p$ is the result of frequent subgraph mining under differential privacy.

4.2. Experimental results and analysis

This section shows the effect of the algorithms on different datasets, mainly to observe the impact of the privacy budget $\varepsilon$ and the threshold on the experimental results. The threshold and $\varepsilon$ are chosen according to the actual situation. In general, $\varepsilon$ should not be set too large: if $\varepsilon$ is too large, the privacy protection is essentially lost. If the threshold is too large, there may be only one or zero mining results, which is not meaningful. Therefore, to better observe the effect of the two parameters on the algorithm, in the experiments the privacy budget $\varepsilon$ ranges between 0 and 30 and is allocated in the proportion $\varepsilon_1 : \varepsilon_2 : \varepsilon_3 : \varepsilon_4 = 2 : 3 : 2 : 3$, and the threshold ranges from 0 to 0.6.

As shown in Fig. 4(a) and Fig. 4(b), the F1-score increases as the threshold increases. The reason is that the larger the threshold is, the fewer subgraphs satisfy the frequency threshold during frequent subgraph mining, and the more likely the mined subgraphs are to be truly frequent. A larger F1-score means greater data utility. The F1-score of the Basic algorithm does not exceed 0.8 and is generally between 0.6 and 0.7, which is especially evident on large datasets. In contrast, the F1-score of the WGPA algorithm stays above 0.8 and reaches about 0.9, which is most evident on the IBM dataset.

As shown in Fig. 5(a) and Fig. 5(b), the value of RE decreases as the threshold increases. When the threshold is very small, more subgraphs satisfy the threshold condition, so more of them are not truly frequent, and the relative error is larger. The choice of threshold is therefore also a factor affecting the experimental results. In the experiments, thresholds of 0.3–0.4 give relatively good results. Of course, the threshold can also be set according to actual needs. In addition, the RE value of the WGPA algorithm is always the smallest on both datasets and can be lower than 0.02.

As shown in Fig. 6(a) and Fig. 6(b), the larger the privacy budget is, the weaker the privacy protection of the graph is and the less the data are perturbed, so the data utility is higher and the F1-score is larger. In the mining process of Naive, the Laplace mechanism and the exponential mechanism are adopted together, which ensures that better subgraphs are selected, so the utility of the final output set of frequent graphs is better. WGPA improves the accuracy of the mining results on the basis of Naive, especially on the IBM
dataset, where the F1-score is relatively high, which also shows that the algorithm performs better on complex graph data.

As shown in Fig. 7(a) and Fig. 7(b), the RE value decreases steadily as the privacy budget increases, because the interference to the data decreases at the same time. The WGPA algorithm performs better. On the IBM dataset, it basically keeps the RE value below 0.16, so the relative error of the mining results is very small.

Finally, the running time RT of the algorithms is also observed. The time required by the Naive algorithm and the WGPA algorithm to mine frequent subgraphs is measured on the Grd and IBM datasets, as shown in Fig. 8. By comparison, the WGPA algorithm needs more time. In the mining process, the threshold affects the mining time. In general, the time needed decreases as the threshold increases, because the larger the threshold is, the fewer frequent subgraphs in the graph set satisfy the condition, and the smaller the running time RT is. At the same time, mining takes more time on the IBM dataset, mainly because the more complex the graphs are, the more time is required.

5. Conclusions

In this paper, we propose a privacy protection algorithm for weighted graphs in the Internet of Things, which adopts the differential privacy model to protect both the edge weights and the graph structure. First, we perturb the whole graph set and add noise in the process of graph generation. Second, we design an edge weight protection algorithm for the perturbed graph set, then encode the graphs and integrate the perturbed edge weights into them. Then, we mine and protect the frequent graph structures of the graph set, applying differential privacy in the process of mining. Finally, experiments are carried out on real datasets, and the experimental results show that our method is feasible and effective.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research was supported by the National Natural Science Foundation of China (Grant No. 61976032).

References

[11] Q. Xiao, R. Chen, K. Tan, Differentially private network data release via structural inference, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'14, New York, USA, 2014, pp. 911–920.
[12] H. Nguyen, A. Imine, M. Rusinowitch, et al., Network structure release under differential privacy, Trans. Data Privacy 9 (3) (2016) 215–241.
[13] K. Dong, Z. Liu, Y. Xu, et al., Differentially private big data publication via structural inference and community detection, United Kingdom, ISPAN-FCST-ISCC, 2017.
[14] M.E. Skarkala, M. Maragoudakis, S. Gritzalis, et al., Privacy preservation by k-anonymization of weighted social networks, in: ASONAM, Turkey, 2012, pp. 423–428.
[15] L. Chuan, L. Ihsien, Y. Wunsheng, et al., K-anonymity against neighborhood attacks in weighted social networks, Secur. Commun. Netw. 8 (18) (2015) 3864–3882.
[16] X. Zhang, Q. Zhou, C. Gu, Published weighted social networks privacy preservation based on community division, in: Proceedings of the 7th International Conference on Communication and Network Security, Tokyo Japan, ICCNS, 2017, pp. 86–90.
[17] J. Hu, J. Yan, Z. Wu, et al., A privacy-preserving approach in friendly-correlations of graph based on edge-differential privacy, J. Inf. Sci. Eng. 35 (4) (2019) 821–837.
[18] X. Li, J. Yang, Z. Sun, et al., Differential privacy for edge weights in social networks, Secur. Commun. Netw. 4267921 (2017) 1–10.
[19] Y. Li, H. Shen, C. Lang, H. Dong, Practical anonymity models on protecting private weighted graphs, Neurocomputing 2 (18) (2016) 359–370.
[20] J. Chen, B. Zhang, M. Chen, A γ-Strawman privacy-preserving scheme in weighted social networks, Secur. Commun. Netw. 9 (18) (2016) 5625–5638.
[21] L. Lihui, J. Shiguang, Weighted social network privacy protection based on differential privacy, J. Commun. 36 (9) (2016) 145–159.
[22] E. Shen, T. Yu, Mining Frequent Graph Patterns with Differential Privacy, SIGKDD, USA, 2013, pp. 545–553.
[23] X. Cheng, S. Sh, S. Xu, et al., A two-phase algorithm for differentially private frequent subgraph mining, IEEE Trans. Knowl. Data Eng. 30 (8) (2018) 1411–1425.
[24] X. Yan, J. Han, GSpan: Graph-Based Substructure Pattern Mining, ICDM, Maebashi City, Japan, 2002, pp. 721–724.
[25] F. Mcsherry, K. Talwar, Mechanism Design Via Differential Privacy, FOCS, USA, 2007, pp. 94–103.
[26] Q. Liu, G. Wang, F. Li, et al., Preserving privacy with probabilistic indistinguishability in weighted social networks, IEEE Trans. Parallel Distrib. Syst. 28 (5) (2017) 1417–1429.
[27] Y. Shao, J. Liu, S. Shi, et al., Fast de-anonymization of social networks with structural information, Data Sci. Eng. 4 (1) (2019) 76–92.
[28] D.J. Watts, Six Degrees: The Science of a Connected Age, W. W. Norton, New York, 2003.
[29] betterenvi.gSpan[OL], https://github.com/betterenvi/gSpan/blob/master/graphdata/graph.data.
[30] IBM Research[OL], http://www.almaden.ibm.com/cs/projects/iis/hdb/Projects/data_mining/datasets/syndata.html/assocSynData, (Accessed 7 August 2019).
Xiaoyu Tao received the B.S. degree in Computer Science and Technology from Liaoning University of Technology, China in 2017. She is currently pursuing the M.S. degree in Computer Science at Dalian Maritime University. Her research interests focus on privacy preservation for graph data.

Guanyu Li received the B.S. degree in Computer Science from Dalian Marine College, China in 1985, the M.S. degree in Computer Science from Dalian Marine College, China in 1993, and the Ph.D. degree in Management Science and Engineering from Dalian University of Technology, China in 2010. He is currently a professor at Dalian Maritime University, China. His primary research interests are in Semantic Web, Ontology Engineering, Internet of Things, Knowledge Graph, etc. He has published more than 60 papers in refereed journals and conferences.