
Ad Hoc Networks 110 (2021) 102303

Contents lists available at ScienceDirect

Ad Hoc Networks
journal homepage: www.elsevier.com/locate/adhoc

Differential privacy protection on weighted graph in wireless networks


Bo Ning ∗, Yunhao Sun, Xiaoyu Tao, Guanyu Li
School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China

ARTICLE INFO

Keywords: Wireless networks; Weighted graph; Privacy protection; Differential privacy

ABSTRACT

With the development of 5G communication technology, the Internet of Things has gained new development opportunities. In Internet of Things applications, spatial and social relations can be used to make users' life and work more convenient, but they also bring the risk of personal privacy disclosure. The data transmitted in wireless networks contains a large amount of graph-structured data, and the edge weights in a weighted graph increase the risk of privacy disclosure. Therefore, in this paper we design a privacy protection algorithm for weighted graphs and adopt the privacy protection model to protect both the edge weights and the graph structure. Firstly, the whole graph set is disturbed, with noise added during the process of graph generation. Secondly, the privacy budget is allocated to protect the weight values of edges. The graph is encoded so that its structure can be handled conveniently without being separated from the edge information, and the disturbed edge weights are then integrated into the graph. After that, the privacy protection of the graph structure is realized in the process of frequent graph mining combined with differential privacy. Finally, the proposed algorithm is validated by experiments.

1. Introduction

The fifth generation (5G) of mobile communications and the Internet of Things (IoT) are changing the way information is transmitted and data is shared. The unexpected surge of wireless data and the sustained development of various digital terminals make people increasingly reliant on digital services in social networks, transportation, payment, navigation, online shopping and e-health [1,2]. As a result, a large amount of network data is generated every day, and each user has become a digital individual. Network data generally takes the form of graphs: the nodes represent individuals or organizations in a social network, and the edges indicate the relationships between nodes [3–5]. In particular, an edge can be weighted to represent a quantified relationship between nodes, such as the degree of intimacy between users or the amount of transactions between companies. Therefore, such network graph data contains a lot of user information [6–9]. However, the increasing complexity of information technology and the growing capacity to analyze and disseminate network data pose a significant threat to the privacy of users. If these graph data are released without any protective processing, privacy disclosure will follow.

When graph data is published, anonymization is usually adopted to delete or replace the node IDs in the graph. However, simply hiding node information still carries the risk of privacy disclosure, because not only the node information but also the structural information can expose the user's privacy. Moreover, in a weighted network graph, such as a traffic network graph or a communication network graph, the weight values also carry the risk of privacy disclosure. When considering the privacy protection of weighted network graphs, the following issues need to be addressed. The first concerns the applicability of the privacy protection model and the complexity of graphs. Because a graph is a complex data structure, a dedicated privacy protection strategy is needed to ensure the feasibility of the algorithm when a privacy protection model is applied. The second concerns the tension between data perturbation and data protection: both the data utility of the graph and the privacy of the data are required. The third is that the graph structure and the edge weights are different kinds of data, which makes it difficult to select processing methods and to apply existing algorithms directly. Therefore, to address these challenges, we need to employ a stricter privacy protection model to protect the privacy of weighted network graph data, and to design an effective privacy protection algorithm that protects both the graph structure and the edge weight values while ensuring that the disturbed graph data retains high utility.

C. Dwork et al. [10] proposed the differential privacy protection model, which is a method based on data distortion. In this method, the risk of privacy disclosure is strictly defined and a mathematical proof is provided, so that the availability of data can be largely guaranteed.

∗ Corresponding author.
E-mail address: ningbo@dlmu.edu.cn (B. Ning).

https://doi.org/10.1016/j.adhoc.2020.102303
Received 2 June 2020; Received in revised form 26 August 2020; Accepted 16 September 2020
Available online 23 September 2020
1570-8705/© 2020 Elsevier B.V. All rights reserved.
B. Ning et al. Ad Hoc Networks 110 (2021) 102303

In this model, the privacy risk of adding or deleting a record in a data set is kept within a small bound, which makes it impossible for an attacker to infer the user's accurate information. In other words, the presence or absence of a single record has only a small impact on the final output. In addition, because the model is mathematically grounded, it has attracted great interest from researchers and poses challenges to them. Therefore, in this paper, we adopt the differential privacy protection model to protect the privacy of weighted graph data.

At present, differential privacy has been applied to the protection of non-weighted graph data with good results [11–13]. Q. Xiao et al. [11] used the HRG (hierarchical random graph) model to transform the graph into a tree structure in order to handle the complex structure of the graph, and used the MCMC (Markov chain Monte Carlo) method to sample the HRG model space while satisfying differential privacy. Compared with earlier work, this approach reduces the amount of noise introduced and effectively retains the basic structural characteristics of the network. Kang Dong et al. [13] made improvements on the basis of the work of Q. Xiao et al. However, edge weights also increase the risk of privacy disclosure, and algorithms designed for non-weighted graphs cannot be used directly to protect weighted graphs, because the edge weights add complexity that must be taken into account.

In the privacy protection of weighted graph data, most studies employed the k-anonymity method [14–16] to protect personal privacy. Because k-anonymity relies on assumptions about the attacker's background knowledge, the protection these algorithms provide is limited. There have also been many attempts to protect edge weights with differential privacy, but many of them do not consider the protection of the graph structure. Hu et al. [17] protected edge weight information from leakage by transforming edge weights into probability values, so that the graph is published as an uncertain graph, but data utility is largely ignored. Xiaoye Li et al. [18] transformed the edge weight sequence into an unattributed histogram, and then used differential privacy to protect the weight information in the graph. However, they only consider the privacy of the weights rather than the structure of the graph. Li and Chen et al. [19,20] emphasized that the weight information of edges is not considered in the privacy protection of non-weighted graphs, and that the weights and structure of weighted graphs increase the risk of privacy leakage; a privacy protection algorithm was therefore designed by combining edge weight values and graph structure. Li et al. [21] transformed graphs into ordered triples and designed a query model, wsquery, that satisfies differential privacy.

In addition, some work has used differential privacy to protect the privacy of frequent subgraphs. E. Shen et al. [22] applied a frequent graph pattern mining algorithm with a differential privacy guarantee to protect the mining of frequent subgraphs without permission. At the same time, to ensure both privacy and utility, an effective neighbor counting technique was proposed. Building on E. Shen et al., Xiang Cheng et al. [23] addressed the problems of inaccurate mining results and weak differential privacy protection caused by a large output space; the whole algorithm satisfies $\varepsilon$-differential privacy, which largely guarantees both the utility and the privacy of the data. Many algorithms for mining frequent subgraphs have been proposed. The gSpan algorithm [24] used in this paper is a frequent subgraph mining algorithm based on depth-first search of the graph, and it makes the representation of a graph unique and easy to operate on.

In this paper, we design a privacy protection algorithm for weighted graphs, and use the differential privacy protection model to protect both the edge weights and the structure of the graph. At the same time, the proposed method preserves data utility as much as possible. The main contributions are as follows:

(1) According to the frequency of edges, the Laplace noise disturbance of differential privacy is used in the graph generation process, and reasonable rules of graph generation are designed.
(2) After getting the disturbed graph set, we design the edge weight protection algorithm, including a reasonable privacy budget allocation strategy.
(3) Then, the disturbed edge weights are integrated into the coding process of the graph, and the frequent subgraphs of the graph set are mined. In the mining process, the Laplace mechanism and the exponential mechanism of differential privacy are used to protect the graph structure, which improves data utility.
(4) Extensive experiments verify the rationality and effectiveness of our algorithm.

2. Related definition

The differential privacy (DP) model [10] limits an attacker's inference of private information even when the attacker is powerful. Owing to the robustness of this model, it has been widely used for privacy protection in graph data. The definitions and properties of differential privacy are described as follows.

Definition 2.1 ($\varepsilon$-Differential Privacy). Given a random algorithm $M$, let $Range(M)$ denote the set of all possible outputs of $M$. If, for any two neighbor data sets $D$ and $D'$ and any subset $S$ of $Range(M)$, $\Pr(M(D) \in S) \le \Pr(M(D') \in S) \times \exp(\varepsilon)$ holds, then algorithm $M$ satisfies $\varepsilon$-differential privacy.

Neighbor data sets are two data sets that differ in only one record, and $\varepsilon$ is the privacy budget, which determines the strength of privacy protection. Generally, the smaller the privacy budget $\varepsilon$, the stronger the privacy protection.

In addition, there are two common mechanisms of differential privacy, the Laplace mechanism and the exponential mechanism [25]. The Laplace mechanism is used for numerical data, while the exponential mechanism is used for non-numerical data. The two mechanisms are defined below. The choice of noise mechanism determines the accuracy of the query.

Definition 2.2 (Laplace Mechanism). Given a data set $D$ and a function $f: D \rightarrow R^d$ with sensitivity $\Delta f$, the random algorithm $M(D) = f(D) + Lap(\Delta f / \varepsilon)$ satisfies $\varepsilon$-differential privacy, where $Lap(\Delta f / \varepsilon)$ is random noise drawn from a Laplace distribution with scale parameter $\Delta f / \varepsilon$. The magnitude of the noise is directly proportional to $\Delta f$ and inversely proportional to $\varepsilon$.

Next, the concept of global sensitivity is given. This parameter determines the degree of disturbance required by the mechanism; it is determined only by the query function and is independent of the data set.

Definition 2.3 (Global Sensitivity). Given a function $f: D \rightarrow R^d$ whose input is a data set $D$ and whose output is a $d$-dimensional real vector, for any neighbor data sets $D$ and $D'$, $\Delta f = \max_{D,D'} \| f(D) - f(D') \|_1$ is the global sensitivity of $f$, where $R$ denotes the real number space of the mapping and $\| f(D) - f(D') \|_1$ is the $L_1$ distance between $f(D)$ and $f(D')$.

Definition 2.4 (Exponential Mechanism). Given the data set $D$, the random algorithm $M$ outputs an entity object $o \in Range(M)$; $u(D, o)$ is the utility function, and $\Delta u = \max_{\forall o, D, D'} \| u(D, o) - u(D', o) \|$ is the sensitivity of the utility function. If the algorithm selects and outputs $o$ from $Range(M)$ with a probability proportional to $\exp\left(\frac{\varepsilon u(D, o)}{2 \Delta u}\right)$, then algorithm $M$ satisfies $\varepsilon$-differential privacy.
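As an illustration of Definition 2.2, the following is a minimal sketch of the Laplace mechanism for a scalar query, not the authors' implementation. The function names, the fixed seed, and the example query value are assumptions made for this sketch; the noise is sampled as the difference of two exponential variables, which follows a Laplace distribution with the same scale.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent Exponential(rate = 1/scale)
    variables is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float, rng: random.Random) -> float:
    """Return true_value + Lap(sensitivity / epsilon) as in Definition 2.2."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(samples, truth):
    """Average absolute deviation of the noisy answers from the truth."""
    return sum(abs(s - truth) for s in samples) / len(samples)


rng = random.Random(0)   # fixed seed only so the sketch is repeatable
edge_count = 12          # hypothetical query answer with sensitivity 1
noisy_small_eps = [laplace_mechanism(edge_count, 1.0, 0.1, rng)
                   for _ in range(2000)]
noisy_large_eps = [laplace_mechanism(edge_count, 1.0, 2.0, rng)
                   for _ in range(2000)]
```

Comparing `mean_abs_error(noisy_small_eps, edge_count)` with `mean_abs_error(noisy_large_eps, edge_count)` shows the trade-off stated in the definition: a smaller privacy budget gives noisier answers.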

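The exponential mechanism of Definition 2.4 can be sketched in the same spirit. This is an illustrative example under stated assumptions: the candidate names and their support counts are hypothetical, the support count plays the role of the utility $u(D, o)$, and its sensitivity is assumed to be 1.

```python
import math
import random


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng):
    """Pick one candidate with probability proportional to
    exp(epsilon * utility(candidate) / (2 * sensitivity))."""
    scores = [utility(c) for c in candidates]
    best = max(scores)  # shifting by the max keeps exp() numerically stable
    weights = [math.exp(epsilon * (s - best) / (2.0 * sensitivity))
               for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical candidates with support counts used as utility scores.
support = {"A-B": 40, "B-C": 35, "C-D": 5}
rng = random.Random(1)  # fixed seed only so the sketch is repeatable
picks = [exponential_mechanism(list(support), support.get, 1.0, 1.0, rng)
         for _ in range(3000)]
share = {name: picks.count(name) / len(picks) for name in support}
```

Subtracting the maximum score before exponentiating leaves the selection probabilities unchanged, since they are only defined up to a common factor. In `share`, candidates with higher utility are selected more often, but lower-utility candidates keep a nonzero probability.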

Generally, according to how the data is accessed, the protection framework of the differential privacy model can be divided into two types, interactive and non-interactive. In the interactive framework, the data manager designs the privacy algorithm for each user query and returns the protected query results to users until the privacy budget is exhausted, after which the query model no longer satisfies differential privacy. In the non-interactive framework, the data manager designs the privacy protection algorithm for the whole database and provides the disturbed database to users, who can then query and access the database according to their needs. This paper adopts a non-interactive framework to protect weighted network graph data.

Besides, differential privacy has two properties, sequence combination and parallel combination [10]. Sequence combination means that the privacy budget can be allocated across different steps of a method, while parallel combination guarantees differential privacy when the algorithms act on disjoint subsets of the data set.

Property 1 (Sequence Combination). Given the data set $D$ and $n$ random algorithms $\{M_1, \cdots, M_n\}$, where $M_i$ ($1 \le i \le n$) satisfies $\varepsilon_i$-differential privacy, the combination $\{M_i\}$ ($1 \le i \le n$) satisfies $\varepsilon$-differential privacy, where $\varepsilon = \sum_{i=1}^{n} \varepsilon_i$.

Property 2 (Parallel Combination). Given the data set $D$ divided into $n$ disjoint subsets, i.e. $D = \{D_1, \ldots, D_n\}$, and $n$ random algorithms, where $M_i$ ($1 \le i \le n$) satisfies $\varepsilon_i$-differential privacy, the combined algorithm of $\{M_i\}$ ($1 \le i \le n$) satisfies $\varepsilon$-differential privacy on the data set $D = \{D_1, \ldots, D_n\}$, where $\varepsilon = \max_{1 \le i \le n}(\varepsilon_i)$.

Generally, the network graph is regarded as a simple graph (undirected, with no self-loops and no multiple edges). Let $G = (V, E)$, where $V$ is the vertex set and each vertex represents an individual. If there is a certain relationship between two nodes, there is an edge between them, and $E \subseteq V \times V$ denotes the set of edges. A weighted graph can be expressed as $G = (V, E, W)$, where $W$ is the set of edge weights; the weight on an edge represents some degree of relationship between two nodes, such as intimacy among friends, trust, or the amount of business transactions. This paper mainly designs the algorithm for graph data sets, which are defined as follows:

Definition 2.5 (Graph Dataset). A graph set composed of multiple graphs is denoted $GD$. For a graph data set $GD$ of size $n$, the total number of edges in the graph set is denoted $E$, and the number of edges in each graph is denoted $E_i$.

3. Privacy protection algorithm for weighted graph

At present, much work only considers the privacy protection of graphs without weights, and the privacy protection of weighted graphs is still immature. However, in both theory and practice, a weighted graph provides more structural information than a non-weighted graph. Especially in some network graphs, the weight value on an edge may represent the relationship between nodes, the number of cooperations, the transaction amount, etc., which tends to increase the risk of privacy disclosure. If all the weight information is deleted or replaced, the data utility will be greatly reduced [26].

A simple approach before publishing a graph is to replace the real node information with numbers or risk-free characters, but an attacker can still infer node information from the structural information; nodes or edges are therefore added or deleted to change the structural information [27]. Although these methods improve the isomorphism of the graph, an attacker may still compromise user privacy, especially with a lot of background knowledge. As shown in Fig. 1, suppose the attacker knows that node A is related to node B and that node B is related to node C and node F, and also knows that the weight of edge AB is 2 and the weight of edge BC is 4. The attacker can then infer the identity of node B, so privacy protection is needed for edge weights.

Fig. 1. Privacy disclosure of edge weight.

Therefore, both the privacy protection and the data utility of edge weights should be considered. In this section, we disturb the whole graph set, then disturb the edge weights and the graph structure, and finally propose the weighted graph protection algorithm (WGPA).

3.1. Graph dataset generation algorithm

As a complex data structure, a graph is difficult to process directly, so the object of this paper is the graph data set; that is, the privacy protection algorithm is designed on a divide and conquer strategy. We first disturb the whole graph set and adopt the idea of graph generation: a graph is generated according to some characteristic of the graph, the graph is disturbed during generation in combination with differential privacy, and the disturbed graph data set is finally output [17]. For example, existing work may select the node degree to generate the graph. The feature chosen in this paper is edge frequency: the edge frequency in each graph is counted, and Laplace noise is added to disturb it. During graph generation, edges with higher frequency are given priority, and the generation rules of the graph are then defined. The generation of a graph proceeds as follows.

First, the edge frequencies are disturbed. The frequency of an edge is the number of times the edge appears in each graph. For graph data sets that differ in only one graph, the definition of sensitivity $\Delta f = \max_{D,D'} \| f(D) - f(D') \|$ gives a sensitivity of $\Delta f = 1$ for the edge frequency, so the Laplace noise added to the edge frequency is $Laplace(1/\varepsilon_1)$, with parameter $1/\varepsilon_1$. The set of disturbed edge frequencies is denoted $F_i' = \{f_1, \ldots, f_m\}$, where $1 \le i \le n$, $n$ is the size of the graph set, $m$ is the number of edges in the graph, and the elements of $F_i'$ are stored in descending order.

Then, the graph is generated by adding one edge at a time. The first edge is the edge corresponding to the first item in $F_i'$, that is, the edge with the highest frequency. Candidate neighbor edges are generated from the two nodes of this edge; the multiple candidate neighbor edges formed by the nodes of the edge make up the candidate neighbor set $N$. It is therefore necessary to design reasonable neighbor edge filtering conditions and add the most appropriate edge to the current graph.

The design idea of the neighbor edge filtering condition is as follows: when selecting an edge from the candidate neighbor edge set, an edge can be added either by adding a new node or by connecting two existing nodes. Because there is a certain degree of relationship between nodes, this degree can serve as the selection standard for adding new edges to the graph.

In a social network graph, the degree of relationship between nodes can be reflected by the six degrees of separation theory [28], which states that the relationship between one node and any other node does not exceed six degrees. The formula for the degree of relationship between any two nodes is then obtained as follows:

$R(V_i, V_j) = 1 - 0.2 \times (min\_length(V_i, V_j) - 1)$    (1)
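Eq. (1) can be sketched in a few lines. This is an illustrative example rather than the authors' code: the adjacency-list representation and the function names are assumptions, the shortest path is counted in edges with weights ignored, and distances above six map to 0 as the surrounding text specifies. Returning 0 for identical or unreachable nodes is a further assumption of this sketch.

```python
from collections import deque


def min_length(adjacency, source, target):
    """Hop count of the shortest path from source to target (BFS), or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def relationship_degree(adjacency, vi, vj):
    """Eq. (1): R = 1 - 0.2 * (min_length - 1), set to 0 beyond six hops."""
    hops = min_length(adjacency, vi, vj)
    if hops is None or hops == 0 or hops > 6:
        return 0.0  # identical/unreachable nodes: an assumption of this sketch
    return round(1.0 - 0.2 * (hops - 1), 1)


# A hypothetical path graph 0 - 1 - 2 - ... - 7 stored as adjacency lists.
chain = {k: [n for n in (k - 1, k + 1) if 0 <= n <= 7] for k in range(8)}
```

On this chain, adjacent nodes get `relationship_degree(chain, 0, 1) == 1.0`, nodes two hops apart get 0.8, and nodes six or more hops apart get 0.0.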


where $min\_length(V_i, V_j)$ denotes the shortest path length between the current node $v_i$ and another node $v_j$. That is to say, according to the six degree principle, the value of $R(V_i, V_j)$ can be 0, 0.2, 0.4, 0.6, 0.8 or 1. Only the number of edges on the shortest path between the nodes is considered here; the weights on the edges are not used in calculating the shortest path length. Since the shortest path length may exceed 6 in other network graphs, it is specified that when the shortest path length between two nodes exceeds 6, the value of $R(V_i, V_j)$ is 0.

Then, the relationship between a target node $v_{j^*}$ and the already generated nodes $v_i$ is obtained by summing and averaging:

$R(V_i, V_{j^*}) = \frac{1}{n} \times \sum_{i=1}^{n} R(V_i, V_{j^*})$    (2)

In a weighted graph, the weight of an edge should also be considered, especially when an edge brings in a new node, since the new node may be related to some of the existing nodes. Therefore, we obtain the formula of the neighbor edge filtering condition, using the letter $I$ to represent the amount of information carried by an edge:

$I = \alpha \times R(V_i, V_{j^*}) + (1 - \alpha) \times W_{ij^*}$    (3)

where $\alpha$ is the balance parameter. The design principle of the formula is that the stronger the relationship between a node and the other nodes, the more easily it is selected; at the same time, since the weight value generally represents the degree of relationship or the transaction amount, a greater weight value means a more important node. The neighbor edge with the larger $I$ value is therefore selected more easily.

As shown in Fig. 2(a), if $\alpha$ is $1/2$, the candidate neighbor set formed by the current edge $e_{12}$ is $N = \{e_{13}, e_{16}, e_{17}, e_{23}, e_{28}\}$. The $I$ value of edge $e_{13}$ is $I = 1/2 \times (1/2 \times (1 + 0.8)) + (1 - 1/2) \times 2 = 0.45 + 1 = 1.45$. Similarly, the $I$ values of the other neighbor edges are $\{1.95, 1.45, 2, 0.95\}$. Therefore, according to the neighbor edge filtering rule, edge $e_{23}$ is selected and added to the new graph. The candidate neighbor set is then formed from the selected edges $e_{12}$ and $e_{23}$, and the filtering is repeated until a new graph is formed, as shown in Fig. 2(b). In addition, two edges may share the same largest $I$ value (edges $e_{13}$ and $e_{17}$, for instance, have equal $I$ values). In this case, the edge with the higher frequency is selected to join the graph.

Fig. 2. Original graph and generated graph.

Algorithm 1 is the graph dataset generation algorithm (GDGA). Line 2 of the algorithm disturbs the frequency of the edges in the graph to form a new frequency set $F_i'$. Line 3 initializes the new graph by selecting the edge $e_1$ corresponding to the first item in $F_i'$ into the graph $G_i'$. Line 4 generates the neighbor edge set $N$ according to the nodes of the selected edge. Lines 5–8 filter and judge the neighbor candidate set. Line 9 generates a neighbor edge set $N$ for the generated edges and continues the filtering and judgment process of Lines 5–8. Finally, Line 12 returns the generated graph data set $GD'$ after the disturbance.

Algorithm 1: Graph dataset generation algorithm
Input: original graph data set $GD$, edge frequency set $F_i$
Output: generated graph data set $GD'$
1   for $i = 1$ to $n$ do
2       $F_i' \leftarrow F_i + Laplace(1/\varepsilon_1)$;
3       Initialize $G_i' \leftarrow e_1$;
4       Generate the neighbor collection $N$ of $G_i'$;
5       foreach neighbor edge $e \in N$ do
6           if $e$'s frequency $\ge$ Threshold and $I$ is max then
7               Remove $e$ from $N$;
8               $G_i' \leftarrow G_i' + e$;
9               goto Line 4;
10          else
11              $GD' \leftarrow G_i'$;
12  Return $GD'$;

3.2. Privacy protection of edge weight

After getting the disturbed graph set, this section designs the privacy protection algorithm for edge weights. Because the algorithm in this paper processes each graph in a graph set, the protection of edge weights can be transformed into the protection of edge weight sequences.

The reasonable use of the privacy budget must be considered in differential privacy protection. A good privacy budget allocation strategy avoids wasting the privacy budget and reduces the risk of privacy disclosure. In edge weight protection, if the privacy budget is evenly distributed, the risk of privacy disclosure remains. At the same time, the divide and conquer treatment of the graphs may break the correlation among the graphs. We therefore design the privacy budget allocation strategy according to the number of edges, which still preserves the correlation across the whole graph set. Under this principle, the privacy budget of each edge weight sequence is $\frac{E_i}{E} \times \varepsilon$. If $\frac{E_i}{E} \times \varepsilon$ is used directly as the parameter of the Laplace function, the noise value becomes very large, which disturbs the data too much and wastes the privacy budget. Therefore, we use a negative exponential function to adjust the privacy budget, which keeps the denominator of the Laplace parameter between 0 and 1; that is, it reduces the noise disturbance and preserves data utility while still satisfying differential privacy. The Laplace function then allocates noise to each edge of the graph, expressed as $Laplace\left(\Delta w / \left(e^{-E_i/E} \times \varepsilon_2\right)\right)$, where $\Delta w$ is the sensitivity of the weight sequence and $\varepsilon_2$ is the privacy budget allocated to edge weight protection. The settings of the two functions are described in detail in the privacy analysis. Algorithm 2 is the edge weight protection algorithm (EWPA). Line 4 adds Laplace noise to the edge weight, and Line 5 checks the disturbed weight so that a weight without practical meaning (a negative value) is discarded and redrawn.

Algorithm 2: Edge weight protection algorithm
Input: edge weight sequence set $WS$, privacy budget $\varepsilon_2$, the number of edges in each graph $E_i$, the size of graph set $GD$, total number of edges $E$
Output: disturbed edge weight sequence set $WS'$
1   for $i = 1$ to $n$ do
2       foreach $WS_i \in WS$ do
3           for $j = 1$ to $E_i$ do
4               $w_j' = w_j + Laplace\left(\Delta w / \left(e^{-E_i/E} \times \varepsilon_2\right)\right)$;
5               if $w_j' < 0$ then
6                   goto Line 4
7               else
8                   $WS' \leftarrow w_j'$;
9   Return $WS'$;

3.3. Privacy protection of graph structure

The protection of graph structure is a difficulty in the privacy protection of graphs, especially in the privacy protection of weighted graphs. In addition, the existing non-weighted graph protection methods are difficult to be directly applied to the privacy protection of weighted

graph, which requires us to find a suitable means to simplify the Table 1


operation of the disturbance to graph. Therefore, in the perspective of EDFS codes of edge order.

frequent graph structure, we extend the coding process of the existing Edge Fig. 3(b) Fig. 3(c) Fig. 3(d)

frequent subgraph mining algorithm, integrate the edge weight value 0 ⟨0, 1, 3, 𝑋, 𝑌 ⟩ ⟨0, 1, 4, 𝑌 , 𝑍⟩ ⟨0, 1, 3, 𝑋, 𝑌 ⟩
into the graph coding, and adopt differential privacy protection in the 1 ⟨1, 2, 5, 𝑌 , 𝑋⟩ ⟨2, 0, 5, 𝑋, 𝑌 ⟩ ⟨1, 2, 4, 𝑌 , 𝑍⟩
2 ⟨2, 3, 3, 𝑋, 𝑍⟩ ⟨1, 2, 6, 𝑍, 𝑋⟩ ⟨2, 3, 6, 𝑍, 𝑋⟩
frequent subgraph mining process. 3 ⟨4, 1, 4, 𝑍, 𝑌 ⟩ ⟨2, 3, 3, 𝑋, 𝑍⟩ ⟨3, 1, 5, 𝑋, 𝑌 ⟩
4 ⟨2, 4, 6, 𝑋, 𝑍⟩ ⟨0, 4, 3, 𝑌 , 𝑋⟩ ⟨3, 4, 3, 𝑋, 𝑍⟩
3.3.1. Graph extended coding
The gSpan algorithm [21] used in this paper is a classic algorithm
based on depth first search (DFS). By coding the graph, the direct opera-
tion of graph is avoided, and the workload caused by the isomorphism ∙ The edge order of Fig. 3(b) is (𝑣0 , 𝑣1 ), (𝑣1 , 𝑣2 ), (𝑣2 , 𝑣3 ), (𝑣1 , 𝑣4 ),
of subgraph is reduced. The core idea of the algorithm is (1) coding (𝑣4 , 𝑣2 );
process. The graph is DFS encoded and identified with the smallest ∙ The edge order of Fig. 3(c) is (𝑣0 , 𝑣1 ), (𝑣1 , 𝑣2 ), (𝑣2 , 𝑣0 ), (𝑣2 , 𝑣3 ),
DFS code. (2) Subgraph mining process. If the DFS code of candidate (𝑣0 , 𝑣4 );
subgraph is not the smallest, the graph is pruned; if the code is the ∙ The edge order of Fig. 3(d) is (𝑣0 , 𝑣1 ), (𝑣1 , 𝑣2 ), (𝑣2 , 𝑣3 ), (𝑣3 , 𝑣1 ),
smallest, the subgraph is filtered out within the threshold conditions. (𝑣3 , 𝑣4 ).
In this paper, the graph is represented on the basis of expanding
Each search tree corresponds to an edge order representation, which
the encoding process of gSpan algorithm. When encoding the graph,
makes the original graph with the reasonable representations, so we
the disturbed weight value obtained from Algorithm 2 is integrated
consider the graph as a meta group. Since the edge weight is considered
into the encoding process, and the minimum DFS code formation rules
in the coding, each edge of sequence can be expressed as a six tuple
involved in encoding are adjusted. By coding the graph, we can achieve
according to the tuple design standard of the original algorithm, and
the following goals. The first goal is to simplify the operation of graph
the weight value can be placed in the third column of the tuple, that is,
and overcome the difficulty of direct operation of graph. The second
the tuple representation of each edge is ⟨𝑖, 𝑗, 𝑊𝑖𝑗 , 𝑙𝑖 , 𝑙(𝑖, 𝑗), 𝑙𝑗 ⟩, the tuple
goal is to ensure that the change of edge weight value, which can
considering weight value is called edge ordered EDFS code, where 𝑖,
affect the generation of the graph structure, so as to establish the
𝑗 is the identifier, 𝑊𝑖𝑗 is the weight of edge, (𝑙𝑖 , 𝑙(𝑖, 𝑗),𝑗 ) is the label
relationship between them. The third goal is to provide convenience
for the subsequent frequent graph mining work and reduce the search space. In this paper, the extended coding algorithm of graph is called the GEC algorithm, and an example is given in this section to illustrate how the algorithm works.

As shown in Fig. 3, a depth-first search over the vertices of Fig. 3(a) may generate multiple DFS search trees. Fig. 3(b) to Fig. 3(d) show three search trees obtained by depth-first search on Fig. 3(a); they are isomorphic. The identification subscript of each node in a search tree is set according to the order of traversal: the earlier a node is found, the smaller its subscript. For example, the first visited node $X$ in Fig. 3(b) is set to $v_0$, the second visited node $Y$ is set to $v_1$, and so on, until every visited node has a subscript. The representation of each edge then follows. For example, the edge between $Y$ and $Z$ in Fig. 3(c) can be represented as $(v_0, v_1)$.

The search tree contains outer edges and inner edges, which are drawn as solid and dotted lines respectively. The edge between $Y$ and $Z$ in Fig. 3(b) corresponds to the dotted edge $(v_4, v_1)$ between node IDs $v_1$ and $v_4$; it is an inner edge, and the node ID subscripts of an inner edge are in reverse order. At the same time, the edge order $<_T$ is formed from the visited nodes. The rules of edge order formation are as follows. Assume $e_1 = (i_1, j_1)$ and $e_2 = (i_2, j_2)$: (1) if $i_1 = i_2$ and $j_1 < j_2$, then $e_1 <_T e_2$; (2) if $i_1 < j_1$ and $j_1 = i_2$, then $e_1 <_T e_2$; (3) if $e_1 <_T e_2$ and $e_2 <_T e_3$, then $e_1 <_T e_3$. Under these rules, the expression of the edge order is unique. The edge orders of Fig. 3(b) to Fig. 3(d) can then be obtained as follows:

of node and edge. Because the labeled graph is not considered in this paper, the edge label is not used by the EGC algorithm when constructing the EDFS code, so each edge can be expressed as a five-tuple. For example, $\langle v_0, v_1 \rangle$ in Fig. 3(b) can be expressed as the tuple $\langle 0, 1, 3, X, Y \rangle$. The EDFS code of each edge order can therefore be obtained, as shown in Table 1.

Since the edge weight is part of the edge tuple, the formation rule of the minimum EDFS code changes. The selection rule of the minimum EDFS code is as follows. Suppose there are two edges $e_1 = (v_i, v_j, W_{ij}, l(v_i), l(v_j))$ and $e_2 = (v_x, v_y, W_{xy}, l(v_x), l(v_y))$. Then $e_1 < e_2$ (the smaller edge is selected first) if and only if one of the following conditions is met:

(1) $(v_i, v_j) < (v_x, v_y)$; or
(2) $(v_i, v_j) = (v_x, v_y)$ and $W_{ij} > W_{xy}$; or
(3) $(v_i, v_j) = (v_x, v_y)$, $W_{ij} = W_{xy}$, and $(l(v_i), l(v_j)) < (l(v_x), l(v_y))$.

In the minimum EDFS code selection rule, two edges are compared element by element. The first two items, i.e. the node IDs, are compared first: the smaller the node IDs, the earlier the edge is selected. The change introduced by condition (2) is that when the node IDs of two edges are equal, the weight is taken into account, and the edge with the higher weight is selected first. This condition reflects that the weight usually represents the intimacy of a relationship, a transaction amount, or a similar quantity, so an edge with a larger weight should be selected earlier.
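As an illustration, the minimum EDFS selection rule can be written as a comparator. The Python sketch below is not the authors' implementation (the paper reports a Java implementation). The names `Edge` and `edfs_compare` are hypothetical, plain tuple comparison is assumed for the node-ID pair in condition (1) and for the label pair, and condition (3) is read as a tie-break that applies only when both the node IDs and the weights are equal.

```python
from functools import cmp_to_key
from typing import NamedTuple


class Edge(NamedTuple):
    """One EDFS entry (v_i, v_j, W_ij, l(v_i), l(v_j)); hypothetical container."""
    i: int
    j: int
    weight: float
    label_i: str
    label_j: str


def edfs_compare(e1: Edge, e2: Edge) -> int:
    """Return -1 if e1 < e2 (e1 is selected first), 1 if e2 < e1, else 0."""
    # Condition (1): compare the node-ID pair first; the smaller pair wins.
    if (e1.i, e1.j) != (e2.i, e2.j):
        return -1 if (e1.i, e1.j) < (e2.i, e2.j) else 1
    # Condition (2): same node IDs, so the larger weight wins.
    if e1.weight != e2.weight:
        return -1 if e1.weight > e2.weight else 1
    # Condition (3): same IDs and weight, so fall back to the node labels.
    labels1 = (e1.label_i, e1.label_j)
    labels2 = (e2.label_i, e2.label_j)
    if labels1 != labels2:
        return -1 if labels1 < labels2 else 1
    return 0


# Illustrative edges only; the weights and labels are made up.
edges = [
    Edge(0, 1, 3, "X", "Y"),
    Edge(1, 2, 1, "Y", "Z"),
    Edge(0, 1, 5, "X", "Z"),
    Edge(0, 1, 5, "X", "Y"),
]
ordered = sorted(edges, key=cmp_to_key(edfs_compare))
```

With these illustrative edges, the two edges of weight 5 on $(v_0, v_1)$ sort ahead of the edge of weight 3 on the same node pair, and the edge on $(v_1, v_2)$ comes last.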


Fig. 3. Depth-first search tree.

Therefore, the EDFS code corresponding to Fig. 3(c) is the minimum EDFS code of Fig. 3(a), that is, the unique representation of the graph. Frequent subgraphs can then be filtered and protected according to the minimum code during subgraph mining. The specific process is described in the next section.

3.3.2. Mining and protection of frequent graph structure

This paper uses differential privacy to protect the graph structure in the process of mining. Because differential privacy allows the protection strength to be analyzed quantitatively, it has been widely used in data publishing and data mining. However, privacy protection and data mining pull in opposite directions: the greater the degree of privacy protection, the greater the difficulty of mining, and vice versa. Existing subgraph mining work that uses differential privacy generally adds Laplace noise directly to the support of subgraphs and then filters the frequent subgraphs. Considering that the graph structure is non-numerical data, this paper proposes using the exponential mechanism of differential privacy to protect the graph structure. By setting the parameters of the exponential mechanism, it is applied to the subgraph mining algorithm, so that the algorithm uses two mechanisms of differential privacy to protect the graph structure at the same time, and the data utility of the mining results is better.

In the design of the algorithm, the 1-graphs are taken as the candidate set of subgraphs, and each candidate is checked to see whether its EDFS code is the minimum EDFS code. If it is, the children of the subgraph are generated by adding one edge at a time (each is simply called a child). Each child has a support; this support is perturbed to obtain a noisy support, which is then compared with the threshold. If the noisy support is less than the threshold, the subgraph is eliminated. Children are then generated from the remaining subgraphs, and the frequent subgraphs are obtained by repeating the support perturbation and the threshold screening. The algorithm does not end here: it filters the frequent subgraphs again, using the exponential mechanism of differential privacy to select the most suitable frequent subgraphs as the final output, so as to ensure both data utility and privacy. The frequent subgraph protection algorithm (FSPA) is shown in Algorithm 3.

Algorithm 3: Frequent subgraph protection algorithm
Input: $i$-subgraph candidates $C_i$; privacy budgets $\varepsilon_3$, $\varepsilon_4$; frequency threshold; number of frequent $i$-subgraphs $n_i$
Output: frequent $i$-subgraphs $F_i$
  if $s \neq min\_EDFS(s)$ then
      return $FS \leftarrow FS \cup \{s\}$;
  for $j = 1$ to $n_i$ do
      foreach $s \in C_i$ do
          enumerate $S \in G \subseteq GD$ and count its children;
          foreach $c$, where $c$ is a child of $s$ with one-edge growth in $GD$ do
              if $lsupport(c) = support(c) + laplace(1/\varepsilon_3) \geq Threshold$ then
                  $C_i' \leftarrow c$;
      if $C_i' \neq \emptyset$ then
          $g_j \leftarrow$ select a subgraph $g$ from $C_i'$ such that
              $\Pr\{\text{selecting subgraph } g\} \propto \exp\left(\frac{\varepsilon_4 \times support}{2 n_i}\right)$;
          remove $g_j$ from $C_i$;
          $F_i \leftarrow g_j$;
          $FSPA(GD, \varepsilon_3, \varepsilon_4, FS, s)$;
  return $F_i$;

3.4. Algorithm analysis

In this section, we analyze the correctness of differential privacy and the computational complexity of the proposed algorithms.

Firstly, the correctness of differential privacy can be shown as follows. The WGPA algorithm divides the privacy budget $\varepsilon$ into four parts that satisfy the inequality $\varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \varepsilon_4 \leq \varepsilon$. Among them, $\varepsilon_1$ is used to perturb the graph data set, $\varepsilon_2$ is used to perturb the edge weights, and $\varepsilon_3$ and $\varepsilon_4$ are used to perturb the graph structure in the process of frequent subgraph mining. Based on the parallel composition property [10] of differential privacy, each part of the algorithm satisfies differential privacy. Since the graph generation algorithm is still designed on the whole graph data set, according to the sequential composition property [10] of differential privacy, if an algorithm $A_1$ satisfies $\varepsilon$-differential privacy, an algorithm $A_2(A_1(\cdot))$ designed on the basis of it still conforms to differential privacy. The WGPA algorithm therefore satisfies $\varepsilon$-differential privacy with $\varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \varepsilon_4 \leq \varepsilon$.

Secondly, we analyze the time complexity of the three algorithms proposed in this paper. The time complexity of Algorithm 1 is $O(n \times m)$, where $n$ is the size of the graph data set $GD$ and $m$ is the size of the edge frequency set $F_i'$. The time complexity of Algorithm 2 is $O(n \times E_i)$, where $n$ is the size of the graph data set $GD$ and $E_i$ is the number of edges in each graph. The time complexity of Algorithm 3 is $O(n_i \times |C_i|)$, where $n_i$ is the number of frequent $i$-subgraphs and $|C_i|$ is the number of candidate $i$-subgraphs.
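The two-stage filtering of Algorithm 3 and the division of the privacy budget can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' Java implementation: all function names are hypothetical, supports are plain numbers keyed by a subgraph identifier, Laplace noise is drawn with scale $1/\varepsilon_3$ as written in Algorithm 3, and candidate enumeration, the minimum EDFS check, and the recursion are omitted. The default ratio in `split_budget` is the 2 : 3 : 2 : 3 proportion used in the experiments.

```python
import math
import random


def split_budget(eps, ratio=(2, 3, 2, 3)):
    """Divide eps into parts whose sum equals eps (sequential composition)."""
    total = sum(ratio)
    return [eps * r / total for r in ratio]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_filter(supports, eps3, threshold, rng):
    """Keep candidates whose Laplace-perturbed support reaches the threshold."""
    kept = {}
    for graph_id, support in supports.items():
        noisy = support + laplace_noise(1.0 / eps3, rng)
        if noisy >= threshold:
            kept[graph_id] = support
    return kept


def exponential_select(candidates, eps4, n_i, rng):
    """Draw up to n_i candidates without replacement, each draw with
    probability proportional to exp(eps4 * support / (2 * n_i))."""
    pool = dict(candidates)
    chosen = []
    while pool and len(chosen) < n_i:
        ids = list(pool)
        top = max(pool.values())
        # Subtracting the maximum keeps exp() from overflowing; the
        # selection probabilities are unchanged.
        weights = [math.exp(eps4 * (pool[g] - top) / (2 * n_i)) for g in ids]
        pick = rng.choices(ids, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen


# Illustrative run; the supports and parameters are made up.
rng = random.Random(7)
eps1, eps2, eps3, eps4 = split_budget(10.0)
supports = {"g1": 40.0, "g2": 35.0, "g3": 5.0, "g4": 28.0}
survivors = noisy_filter(supports, eps3, threshold=20.0, rng=rng)
frequent = exponential_select(survivors, eps4, n_i=2, rng=rng)
```

The sketch keeps the true support of each surviving candidate as the utility score of the exponential mechanism, which mirrors the `support` term in the selection probability of Algorithm 3.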


Fig. 4. The effects of Threshold change on F1-score.

Fig. 5. The effects of Threshold change on RE.

Fig. 6. The effects of 𝜀 change on F1-score.

From the analysis above, we can see that the proposed method in this paper is effective and efficient.

4. Experiment

In this chapter, the algorithm proposed in this paper is verified by experiments. To observe the influence of the algorithm's parameters on the experimental results, two evaluation measures are adopted. Two baselines are used for comparison: the algorithm without the graph generation process (called Naive), and the algorithm that uses only the Laplace mechanism in the process of frequent subgraph mining (called Basic).

4.1. Experimental setup

The experiment was carried out on a 64-bit Windows 7 operating system with an Intel i3 3.8 GHz processor and 12 GB of memory, and was implemented in Java. In the experiment, two real datasets are used to verify the effectiveness of the algorithms, as shown in Table 2.

In addition, RE and F1-score are used to test the performance of the algorithms. They are defined as follows.

RE (relative error): used to measure the reliability of the mining results, as shown in formula (4).

$$RE = \frac{MeasuredValue - ActualValue}{ActualValue} \times 100\% \qquad (4)$$
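The two evaluation measures, RE in formula (4) and the F1-score in formula (5), can be computed as in the following Python sketch. The function names are hypothetical and the mining results are assumed to be given as sets of subgraph identifiers. Recall is taken in its standard form $|U_p \cap U_c| / |U_c|$, and RE is returned as a fraction rather than a percentage.

```python
def relative_error(measured, actual):
    """Formula (4) as a fraction; multiply by 100 for a percentage."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return (measured - actual) / actual


def f1_score(private, exact):
    """Formula (5): `private` plays the role of U_p and `exact` of U_c."""
    hits = len(private & exact)
    if hits == 0:
        # No overlap: precision and recall are both zero.
        return 0.0
    precision = hits / len(private)
    recall = hits / len(exact)
    return 2 * precision * recall / (precision + recall)


# Illustrative values only.
re_value = relative_error(110.0, 100.0)
f1_value = f1_score({"a", "b", "c"}, {"b", "c", "d", "e"})
```

In this illustrative call, two of the three privately mined subgraphs are truly frequent, so precision is 2/3 and recall is 1/2.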


Fig. 7. The effects of 𝜀 change on 𝑅𝐸.

Fig. 8. Comparison of run times on different datasets.

Table 2
Experimental data sets.

Dataset     Number of graphs   Vertices   Total edges   Average edges   Weight range
Grd [29]    340                9189       9317          28              0-5
IBM [30]    1230               14304      25221         21              0-20

F1-score: used to measure the data availability of the mining results, as shown in formula (5).

$$F1\text{-}score = 2 \times \frac{precision \times recall}{precision + recall} \qquad (5)$$

where $precision = |U_p \cap U_c| / |U_p|$ and $recall = |U_p \cap U_c| / |U_c|$. $U_c$ is the exact result of frequent subgraph mining, and $U_p$ is the result of frequent subgraph mining under differential privacy.

4.2. Experimental results and analysis

This section shows the effect of the algorithms on different datasets, mainly to observe the impact of the privacy budget and the threshold on the experimental results. The threshold and the privacy budget $\varepsilon$ are chosen according to the actual situation. Generally, $\varepsilon$ should not be set too large; if it is, almost no protection remains. If the threshold is too large, there may be only one or zero mining results, which is not meaningful. Therefore, to better observe the effect of the two parameters on the algorithm, in the experiment the privacy budget $\varepsilon$ is between 0 and 30, allocated in the proportion $\varepsilon_1 : \varepsilon_2 : \varepsilon_3 : \varepsilon_4 = 2 : 3 : 2 : 3$, and the threshold is between 0 and 0.6.

As shown in Fig. 4(a) and Fig. 4(b), the F1-score increases as the threshold increases. The reason is that the larger the threshold, the fewer subgraphs meet the frequency threshold during mining, and the more likely the mined subgraphs are truly frequent. A greater F1-score means greater data utility. The F1-score of the Basic algorithm is no more than 0.8 and generally between 0.6 and 0.7, which is especially obvious on the large data set. With the WGPA algorithm, the F1-score stays above 0.8 and reaches about 0.9, which is most obvious on the IBM data set.

As shown in Fig. 5(a) and Fig. 5(b), the value of RE decreases as the threshold increases. When the threshold is very small, more graphs meet the threshold condition. That is to say, more of these subgraphs are not truly frequent, so the relative error is larger. The choice of threshold is therefore also a factor affecting the experimental results. In the experiments, thresholds of 0.3–0.4 give relatively good results. Of course, the threshold can also be set according to actual needs. In addition, the RE value of the WGPA algorithm is always the smallest on the different data sets and can be lower than 0.02.

As shown in Fig. 6(a) and Fig. 6(b), the larger the privacy budget, the weaker the privacy protection of the graph and the less the interference to the data, so the data utility is higher and the F1-score is larger. In the mining process of Naive, the Laplace mechanism and the exponential mechanism are adopted simultaneously, which ensures that the screened subgraphs are better and that the utility of the final frequent graph set is better. WGPA improves the accuracy of the mining results on the basis of Naive, especially in IBM


dataset, the value of F1-score is relatively high, which also shows that the algorithm performs better on complex graph data.

As shown in Fig. 7(a) and Fig. 7(b), the RE value decreases steadily as the privacy budget increases, because the interference to the data decreases at the same time. The WGPA algorithm performs better: on the IBM data set, it basically keeps the RE value below 0.16, so the relative error of the mining results is very small.

Finally, the running time RT of the algorithm is also observed. The time required by the Naive algorithm and the WGPA algorithm to mine frequent subgraphs is measured on the Grd and IBM data sets, as shown in Fig. 8. In contrast, the WGPA algorithm needs more time. In the mining process, the threshold affects the mining time. Generally, the time needed falls as the threshold rises, because the larger the threshold, the fewer frequent subgraphs in the graph set meet the condition, and the smaller the running time RT. At the same time, mining takes more time on the IBM data set, mainly because the more complex the graph, the more time is needed.

5. Conclusions

In this paper, we propose a privacy protection algorithm for weighted graphs in the Internet of Things, which adopts the differential privacy model to protect the edge weights and the graph structure. First, we perturb the whole graph set and add noise in the process of graph generation. Second, we design an edge weight protection algorithm for the perturbed graph set, then encode the graph and integrate the perturbed edge weights into it. We then mine and protect the frequent graph structure of the graph set, using differential privacy in the process of mining. Finally, experiments are carried out on real datasets, and the experimental results show that our method is feasible and effective.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research was supported by the National Natural Science Foundation of China (Grant No. 61976032).

References

[1] H.S. Song, S.J. Sun, N. Akhtar, et al., Benchmark data and method for real-time people counting in cluttered scenes using depth sensors, IEEE Trans. Intell. Transp. Syst. 20 (10) (2018) 3599–3612.
[2] Y.X. Tong, Y.X. Zeng, Z. Zhou, et al., A unified approach to route planning for shared mobility, Proc. VLDB Endow. (11) (2018) 1633–1646.
[3] Haldar Nah, J.X. Li, M. Reynold, et al., Location prediction in large-scale social networks: an in-depth benchmarking study, VLDB J. 28 (5) (2019) 623–648.
[4] Y. Yuan, X. Lian, G.R. Wang, et al., Weight-constrained route planning over time-dependent graphs, in: ICDE, 2019, pp. 914–925.
[5] Y. Wang, Y. Yuan, Y. Ma, et al., Time-dependent graphs: Definitions, applications, and algorithms, Data Sci. Eng. 4 (4) (2019) 352–366.
[6] T.T. Cai, J.X. Li, A.S. Mian, et al., Target-aware holistic influence maximization in spatial social networks, in: IEEE Transactions on Knowledge and Data Engineering, http://dx.doi.org/10.1109/TKDE.2020.3003047.
[7] J.X. Li, S. Timos S.J. Culpepper, et al., Geo-social influence spanning maximization, IEEE Trans. Knowl. Data Eng. 29 (8) (2017) 1653–1666.
[8] Y.X. Tong, Z. Zhou, Y.X. Zeng, et al., Spatial crowdsourcing: a survey, VLDB J. 29 (1) (2020) 217–250.
[9] Y. Yuan, X. Lian, L. Chen, et al., RSkNN: kNN Search on road networks by incorporating social influence, IEEE Trans. Knowl. Data Eng. 28 (6) (2016) 1575–1588.
[10] C. Dwork, F. Mcsherry, K. Nissim, et al., Calibrating Noise to Sensitivity in Private Data Analysis, TCC Springer, New York, 2006, pp. 265–284.
[11] Q. Xiao, R. Chen, K. Tan, Differentially private network data release via structural inference, in: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'14, New York, USA, 2014, pp. 911–920.
[12] H. Nguyen, A. Imine, M. Rusinowitch, et al., Network structure release under differential privacy, Trans. Data Privacy 9 (3) (2016) 215–241.
[13] K. Dong, Z. Liu, Y. Xu, et al., Differentially private big data publication via structural inference and community detection, United Kingdom, ISPAN-FCST-ISCC, 2017.
[14] M.E. Skarkala, M. Maragoudakis, S. Gritzalis, et al., Privacy preservation by k-anonymization of weighted social networks, in: ASONAM, Turkey, 2012, pp. 423–428.
[15] L. Chuan, L. Ihsien, Y. Wunsheng, et al., K-anonymity against neighborhood attacks in weighted social networks, Secur. Commun. Netw. 8 (18) (2015) 3864–3882.
[16] X. Zhang, Q. Zhou, C. Gu, Published weighted social networks privacy preservation based on community division, in: Proceedings of the 7th International Conference on Communication and Network Security, Tokyo Japan, ICCNS, 2017, pp. 86–90.
[17] J. Hu, J. Yan, Z. Wu, et al., A privacy-preserving approach in friendly-correlations of graph based on edge-differential privacy, J. Inf. Sci. Eng. 35 (4) (2019) 821–837.
[18] X. Li, J. Yang, Z. Sun, et al., Differential privacy for edge weights in social networks, Secur. Commun. Netw. 4267921 (2017) 1–10.
[19] Y. Li, H. Shen, C. Lang, H. Dong, Practical anonymity models on protecting private weighted graphs, Neurocomputing 2 (18) (2016) 359–370.
[20] J. Chen, B. Zhang, M. Chen, A γ-Strawman privacy-preserving scheme in weighted social networks, Secur. Commun. Netw. 9 (18) (2016) 5625–5638.
[21] L. Lihui, J. Shiguang, Weighted social network privacy protection based on differential privacy, J. Commun. 36 (9) (2016) 145–159.
[22] E. Shen, T. Yu, Mining Frequent Graph Patterns with Differential Privacy, SIGKDD, USA, 2013, pp. 545–553.
[23] X. Cheng, S. Sh, S. Xu, et al., A two-phase algorithm for differentially private frequent subgraph mining, IEEE Trans. Knowl. Data Eng. 30 (8) (2018) 1411–1425.
[24] X. Yan, J. Han, GSpan: Graph-Based Substructure Pattern Mining, ICDM, Maebashi City, Japan, 2002, pp. 721–724.
[25] F. Mcsherry, K. Talwar, Mechanism Design Via Differential Privacy, FOCS, USA, 2007, pp. 94–103.
[26] Q. Liu Liu, G. Wang, F. Li, et al., Preserving privacy with probabilistic in weighted social networks, IEEE Trans. Parallel Distrib. Syst. 28 (5) (2017) 1417–1429.
[27] Y. Shao, J. Liu, S. Shi, et al., Fast de-anonymization of social networks with structural information, Data Sci. Eng. 4 (1) (2019) 76–92.
[28] D.J. Watts, Six Degrees: The Science of a Connected Age, W. W. Norton, ISBN, New York, 2003.
[29] betterenvi.gSpan[OL], https://github.com/betterenvi/gSpan/blob/master/graphdata/graph.data.
[30] IBM Research[OL], http://www.almaden.ibm.com/cs/projects/iis/hdb/Projects/data_mining/datasets/syndata.html/assocSynData, (Accessed 7 August 2019).

Bo Ning received the B.S. degree in Computer Science from Dalian University of Technology, China in 2003, and the Ph.D. degree in Computer Science from Northeastern University, China in 2009. He is currently an associate professor at Dalian Maritime University, China. His primary research interests are in network data management, data privacy preserving, etc. He has published more than 30 papers in refereed journals and conferences. Prior to joining Dalian Maritime University, he was a visiting scholar with Swinburne University of Technology, Australia.

Yunhao Sun received his B.S. degree in computer science and technology from Dalian Maritime University, China in 2013, and he is currently a Ph.D. candidate at the Faculty of Information Science and Technology, Dalian Maritime University. His research interests cover query processing and semantic reasoning on knowledge graphs and semantic data.


Xiaoyu Tao received the B.S. degree in Computer Science and Technology from Liaoning University of Technology, China in 2017. She is currently studying Computer Science at Dalian Maritime University for the M.S. degree. Her research interests focus on privacy preserving on graph data.

Guanyu Li received the B.S. degree in Computer Science from Dalian Marine College, China in 1985, the M.S. degree in Computer Science from Dalian Marine College, China in 1993, and the Ph.D. degree in Management Science and Engineering from Dalian University of Technology, China in 2010. He is currently a professor at Dalian Maritime University, China. His primary research interests are in Semantic Web, Ontology Engineering, Internet of Things, Knowledge Graph, etc. He has published more than 60 papers in refereed journals and conferences.
