
Smart Data Agent for Preserving Location Privacy

1st Harkeerat Kaur, Indian Institute of Technology Jammu, Jammu, India, harkeerat.kaur@iitjammu.ac.in
2nd Isao Echizen, National Institute of Informatics, Tokyo, Japan, iechizen@nii.ac.jp
3rd Rohit Kumar, Indian Institute of Technology Jammu, Jammu, India, 2017ucs0054@iitjammu.ac.in

978-1-7281-2547-3/20/$31.00 ©2020 IEEE

Abstract—A novel agent model is proposed that preserves the privacy of location information by using smart data, i.e., data that protect themselves in a manner appropriate to the needs of the user. The model protects personal data by wrapping them in a “cloak of intelligence”. This requires the development of an intelligent agent that acts as the user’s virtual proxy in cyberspace, controlling the release of the user’s information in accordance with the user’s preferences, the context of the information, and/or the nature of the situation. The presented work focuses on developing a location data agent for smartphone users. Many applications request access to the user’s location data in order to provide their services, but some may take advantage of this opportunity to continuously record location data even when they are not being used. This continuous tracking of the user’s location can reveal extensive personal information. The aim of this work is to develop a neural network based intelligent agent that learns the user’s privacy preferences and estimates the preferred privacy levels for future interactions. The agent in the proposed model behaves as an advanced location data manager and interacts with the apps instead of letting them directly access the location data. Apart from deciding the amount of distortion, the agent also works to remove spatial and temporal correlations by adding perturbations to the actual access history so as to prevent a third party from making any kind of prediction on personal data.

Index Terms—Deep Learning, Internet of Things, Location Privacy, Neural Networks, Privacy by Design, Smart Data

I. INTRODUCTION

The technological revolution brought about by artificial intelligence (AI) and the Internet of Things (IoT) has made technology and gadgets integral and indispensable parts of daily life. IoT systems and devices provide value-added services by collecting tremendous amounts of user data. In order to access these services, users often have to submit a lot of personal information such as their choices, behaviors, eating habits, social status, etc. Location information privacy is a particularly important area of concern that is attracting much attention from the research community [1]. It essentially refers to one’s control over the dissemination of one’s location information [2]. Various types of IoT devices continuously access location-based services to map routes, make weather predictions, predict traffic conditions, or issue search queries. In addition to responding to user queries, location-based services (LBS) also record our exact locations, movement patterns, and places of interest. An LBS may perform continuous tracking, learning, and prediction using these data or may sell them to a third party who may use them to commercialize their services or send unwanted advertisements or messages. Some recent studies reveal that users are continuously tracked by the apps installed on their smartphones [3]. It is also typical for users to forget which apps they have installed on their phones, and they are usually unaware of continuous tracking. In the current scenario, the choice of whether to grant an app access to the location is binary, i.e., either grant or deny permission. If the user denies permission, all the services provided by the app may be restricted. Fear of the risk to privacy might cause the user to completely turn off location services, but then the user would be unable to use any services. This lack of options makes it difficult for users to control their privacy. Thus, it is imperative to develop solutions that enable users to avail services while preserving their privacy.

This paper presents a novel neural network based smart data model developed for a smartphone-type IoT framework. It acts like a proxy or virtual agent to protect the privacy of users’ actual location information when interacting with LBS providers. Smart data refers to data which can “think for themselves”. The process of making data smart involves the development of an intelligent web-based agent that protects the privacy and security of location by providing it to the requesting applications in accordance with the users’ instructions/preferences. The agent would also make users aware of the possibility of being tracked. The agent model is based on the concept of “Privacy by Design”, which means “data protection through technology design” [4]. Recent studies show adversarial attacks on prediction models, where the attackers add a small amount of noise to the input samples to make the model predict incorrect results [5]. Along the same lines, the proposed agent model uses the concepts of adversarial attacks, but to benefit the user and protect the user from illegitimate exposure, thereby enabling users to have more control over the disclosure, distortion, and sharing of their location information.

The work is organized as follows. A brief review of the existing location privacy methods is given in Section II. The smart data concept is discussed in Section III, which is followed by the architecture of the proposed smart data agent model in Section IV. The experimental results and discussion are provided in Section V. The key points are summarized in Section VI.

II. RELATED WORKS

A number of approaches have been taken to preserve location privacy, which can be broadly classified as location anonymization, dummy location, and location obfuscation
approaches [6], [7]. This section discusses some important techniques within these categories.

1. Location anonymization approaches: In these approaches, a centralized anonymizer or a third party acts as a trusted agent between the user and the LBS. The anonymizer cloaks the actual location of the user within a set of locations known as the “cloaking region” [8]. In the basic K-anonymity approach, the anonymizer first encloses the actual location within a set of K − 1 similar locations and then forwards the query to the LBS [6]. Several improvements to this approach have been proposed, such as (K, T)-anonymization, which ensures anonymity of a user’s query in the presence of an attacker with knowledge about the time window in which the query was issued [9]. Another improvement is the distributed K-anonymity protocol, which uses cryptographic mechanisms and distributed servers to provide the desired anonymity [10]. A third improvement is the K-anonymity and L-diversity cloaking algorithm, which determines the minimum cloaking region on the basis of the number of buildings (L) and the number of users (K) [11]. These approaches can degrade the quality of service (QoS) when there are not enough users (a certain minimum number of users) near the target user. In such cases, the anonymizer must increase the radius of the cloaking region to have enough users for anonymization. This increase in processing also increases service latency.

2. Dummy location approaches: Another way to prevent attackers from inferring the actual location is to submit fake or dummy queries to an LBS directly rather than having a centralized anonymizer server generate them. Although this eliminates the need for an extra server, if the fake or dummy locations are not properly selected, the attacker may be able to infer the actual location of the user [12]. Moreover, if the successive queries submitted by a user have spatial and/or temporal correlation, it may be possible to infer the actual location by using simple intersection and relating the query probabilities. This can be prevented by extracting the query probabilities and using dummy nodes with similar probabilities to prevent any distinction [13]. Another approach to prevent spatial and temporal correlation between neighboring dummy location sets is to first create a set of randomly chosen dummy locations. Then, for each subsequent request, dummies that could be identified by taking into account the spatio-temporal correlations are filtered out [14]. This filtering is done on the basis of time reachability, direction similarity, and the in-degree and out-degree of the distributions. Another scheme uses a privacy-aware algorithm in which dummy locations are flexibly generated in accordance with a virtual grid or circle. The locations are then blurred so that they map to a candidate location inside the circle. However, this approach does not consider the nature of the region forming the circle [15].

3. Location obfuscation approaches: These approaches prevent the actual location of the user from being identified by distorting or blurring it so that the resulting location cannot be directly linked to the actual location. Distortion is achieved by cloaking, enlarging, or reducing the target region. A basic approach for location obfuscation makes decisions on the basis of geometry only and does not check the real-time probability of the existence of the user at a particular location [12]. Ardagna et al. proposed privacy enhancing techniques based on spatial obfuscation to preserve the location privacy of users while allowing them to specify privacy preferences, and accordingly presented various operators for location obfuscation [16]. They also proposed a middleware component that balances the trade-off between user privacy preferences and location obfuscation to maximize service quality [17]. In another approach, a query initiator and a query requester are connected over a wireless ad hoc network such as Bluetooth to provide an obfuscated location [18]. The query initiator obfuscates the user’s location by reporting a locally cloaked rectangular area of itself along with k − 1 other agents in the network to the query requester. This approach strongly depends on the presence of other agents and network infrastructure, which are not always available. If enough users are not available, QoS may be degraded. A lightweight semantic scheme that operates over a collaborative network was recently proposed for generating obfuscated regions [7].

It can be observed from the literature that designing a location privacy preserving approach is not a trivial task. The major idea behind the approaches described above is to either distort the actual location or supplement it with fake locations to confuse the LBS provider. There is a direct trade-off between QoS and privacy protection with most of these approaches. However, the most important concern with these approaches is the restriction they place on the “choice of privacy”. The notion of “choice of privacy” is more about control, i.e., letting users decide with whom they want to share information and to what extent. With the available approaches, users can only choose to use either distorted or actual locations to access services. Thus, there is limited freedom of choice about such factors as “with whom” and “how much”, and limited ability to alter this behavior over time. This leaves the central idea of privacy unexercised. While each approach has some strengths and weaknesses, it is extremely difficult to design a practical model for location privacy. This motivates the design of an intelligent model that first analyzes the environment and then predicts the amount of privacy needed, such that it is cognizant of the user’s needs and changes accordingly over time.

III. SMART DATA CONCEPT

Smart data refers to a concept which allows data to “think for themselves” by transforming themselves into different “active forms” while interacting with the external environment. This concept was proposed by George Tomko in 2010 and has its origin in the spirit of Privacy by Design [19], [20]. Implementation of the smart data concept requires the development of an agent model that acts as the user’s proxy and transforms data by wrapping them in a “cloak of intelligence”. Data are instructed how to expose or transform themselves to the interacting environment in accordance with the user’s preferences, content, context, and the nature of the situation.

The proposed model introduces a data agent that transcends the above definition with respect to the user’s location data. As
TABLE I
APPS, TAGS AND CONTEXTS

Apps | Tags | Context Words
Ola, Uber, Maps, Sygic | Maps & Navigation | book, ride, go, look, find, way, map, guide, locate
SnapChat, Facebook, Instagram | Social | check-in, live, tag, locate, share
Starbucks, Baskin Robins, Dominos | Food & Drinks | shop, order, book, deliver, address, locate
Urban Clap | Home Services | book, order, address, locate
Chrome | Communication | find 1 km, nearby
Book My Show | Entertainment | nearby, locate, maps
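
Table I can be read as two lookups: one from an app to its tag and one from a tag to the user's context words. The following is a minimal sketch of that reading, not the authors' implementation. The names `APP_TAGS`, `CONTEXT_WORDS`, `tag_for`, and `context_flag` are hypothetical, only a subset of the table is copied, the fallback tag "Others" for unknown apps is an assumption, and so is the idea that a request arrives as free text that can be split into words.

```python
import re

# Subset of Table I; app names, tags, and context words are copied from the table.
APP_TAGS = {
    "Ola": "Maps & Navigation",
    "Uber": "Maps & Navigation",
    "SnapChat": "Social",
    "Facebook": "Social",
    "Starbucks": "Food & Drinks",
}

CONTEXT_WORDS = {
    "Maps & Navigation": {"book", "ride", "go", "look", "find", "way", "map", "guide", "locate"},
    "Social": {"check-in", "live", "tag", "locate", "share"},
    "Food & Drinks": {"shop", "order", "book", "deliver", "address", "locate"},
}


def tag_for(app):
    """Return the tag of an app; unknown apps fall back to 'Others' (assumed default)."""
    return APP_TAGS.get(app, "Others")


def context_flag(app, request_text):
    """Return 1 if the request text contains a context word for the app's tag, else 0."""
    # Lowercase words, keeping hyphenated words such as "check-in" in one piece.
    words = set(re.findall(r"[a-z]+(?:-[a-z]+)*", request_text.lower()))
    return int(bool(words & CONTEXT_WORDS.get(tag_for(app), set())))
```

For instance, `context_flag("Uber", "Book a ride to the airport")` yields 1 because "book" and "ride" are context words for Maps & Navigation, while a request with no listed word yields 0.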

discussed, smartphone users have a limited choice: either allow an app to access location data in order to use its services, or deny access and thus be unable to use its services. The proposed model is aimed at enabling a user to achieve the level of privacy matching the user’s personal choices, behaviors, and context, and to control the extent to which location information is distorted and with whom it is shared on the basis of boundary rules. The concept of the smart data agent preserving location privacy for smartphone users is illustrated in Fig. 1. The original data, i.e., the coordinates of the user’s actual location, are at the center. Instead of directly revealing themselves (coordinates (x, y)) to the interacting environment (apps), they are passed through a transformation/cloaking layer. The outer side of the layer senses the category of the interacting app (Maps & Navigation, Social, Food & Drinks, Health, etc.). It analyzes the app’s previous behavior and accordingly modifies the data to be shared in accordance with the user-defined rules and/or issues a warning to the user if required. For example, if an app like abc or pqr behaves maliciously or very frequently accesses the location data, the model provides it with a distorted location instead of the actual one. Some recent works discuss the use of machine learning techniques to enhance information privacy on social networks and in the IoT ecosystem [21], [22]. However, to the best of our knowledge, the idea of using smart data to preserve the privacy of location data has not been addressed until now.

Fig. 1. Illustration of smart data agent.

Fig. 2. Roles of smart data agent.

IV. DESIGNING SMART DATA AGENT MODEL

The concept of Privacy by Design is based on two main parameters: context and control. In order to exercise these parameters, the model performs four important operations, as shown in Fig. 2. The design for each of these operations is provided below.
1) Sensing the interacting environment
2) Privacy prediction
3) Location distortion
4) Preventing adversarial prediction

A. Sensing Environment
As already mentioned, the data agent acts as a virtual user and manages the location. It starts by calibrating itself to the properties of the external environment. It does this by focusing on the app tags, usage frequency, and context words. This information acts as metadata for prediction and perturbations.
1) App Tags: The agent begins by crawling the web and/or app stores to obtain initial tags for the installed apps. An online tool such as the Google Play scraper is provided in [23]. Table I shows an example set of apps that were tagged category-wise in accordance with the kind of value-added services they provide.
2) Usage Frequency: The usage frequency (UF) corresponds to the number of times location data are accessed within a certain interval of time. It is used to determine whether an app is frequently accessing the location data. Here we assume that the agent considers one time unit to be equal to 10 seconds, so the total number of possible units per day is 8640 (24 × 60 × 6).
Fig. 3. Location history monitoring.

Fig. 4. Network architecture of smart data agent model.

Then the UF of each app is computed by dividing the number of times the location data were accessed by the total number of possible time units:

UF_{app} = \frac{\sum_{i=1}^{N} LA_i}{N}    (1)

where LA_i is the location-accessed flag (0/1) at time unit i and N is the total number of possible time units.
3) Context Words: Context words are obtained by having the user input words for each tag category that the user feels are relevant or may require a precise location for QoS, as shown in Table I.

A separate log file is maintained for each app, wherein each record is a tuple T_i^{app} = {Timeunit(t_i), Loc_Accessed (Boolean), actual coordinates (x_i, y_i), distorted coordinates (x'_i, y'_i)}. Since the logs are maintained day-wise, t_i ∈ (0, 8640). The variable Loc_Accessed holds Boolean values (0/1) depending upon whether the location data were accessed by the corresponding app at time unit t_i, and if accessed, whether they were distorted. For example, consider the situation in which an app starts navigation at 10:00:00 hours (t_i = 3600) at location (32.803528, 74.896323) and ends at 12:30:00 hours (t_j = 4500) at location (32.704872, 74.879152). The corresponding history log is shown in Fig. 3(a). Since it is a navigation app, the location changes continuously during the period from 3600 to 4500, as shown in Fig. 3(b). If the app does not perform any other navigation for the remainder of the day, the access history for the day can be visualized as shown in Fig. 3(c), where the time units between 3600 and 4500 are marked one and all others zero. Instead of using the UF for each day, the smart data agent model
suggests that the UF should be observed over a window of three days and that the average UF should be used to determine app behavior.

Fig. 5. Usage Trends for application A.

Fig. 6. Data Mutation: Transformation over time, usage frequency, and choice.

B. Privacy Prediction
The agent’s core consists of a neural network architecture that predicts a user’s privacy behavior for location data on the basis of the request received and the information obtained from the previous step. The model consists of a five-layered deep fully connected neural network architecture, where each intermediate layer uses the ReLU activation function (Fig. 4).

Model Structure: The input layer has nine input features consisting of [App Tag], usage frequency, and context. Here we consider seven major categories of apps in accordance with their tags: [Maps & Navigation, Social, Health & Fitness, Food & Drinks, Communication, Games & Entertainment, and Others], which are represented as a 7-dimensional one-hot encoded vector. The app tag along with the UF and context amounts to a total of nine inputs to the network. The UF is the average usage over the past three days, like the example shown in Fig. 5. This window is configurable and can be changed by the user to 5 or 7 days for some applications. The context value is either zero or one depending upon whether the request contains context words provided by the user. The three hidden network layers are activated using the ReLU function. The output layer consists of two outputs, r1 and r2, which specify the distance range r1 < d < r2 within which the location to be accessed should be distorted (e.g., 100–200 or 200–500 meters).

Model Training: The model requires data to learn decision-making. This is achieved by calibrating the model in accordance with boundary rules specified by the user’s choice of privacy for a particular app category, in the presence or absence of context, and at a certain usage frequency. The choice is recorded by showing the user various app tags, asking the user to specify context words, and having the user input a choice of distortion range for different usage frequencies. The model is then trained on the data for the initial set of choices. Example training data for a user, say User1, are shown in Table II, which lists the user’s choice of privacy behavior for apps belonging to the category ‘social’. It can be seen from instances 1 and 2 that if the UF is very low, the user decides that no distortion is required. In instance 3, the usage frequency is moderate but the context is absent, and thus the user sets the distortion range to 200-500 mt. However, in instances 4-5 the usage frequency is moderate but context is present, so the user chooses no distortion. Instances 6-10 illustrate the other choices set by the user according to UF% and context information. It is noteworthy that the model is personalized for each user. For another user, the choice of privacy range may be different for different usage frequencies and context availability.

Model Prediction: If the user uses an existing app or downloads a new app belonging to the category ‘social’, its location privacy behavior will be predicted by the model trained on the user’s previous set of choices. This behavior will change as the UF changes over the course of time. Thereafter, whenever the model encounters a similar request associated with a specified tag, it uses the UF and context words to predict the distortion range learned during the training phase. Confidence in the input request is developed by checking its similarity with the training information. The similarity is computed by clustering the training data on the basis of their UF and context combinations. It is proposed that for a new request with a deviation greater than 10% from each existing cluster, the model predicts the privacy range and records the user response. If the user response is satisfactory, it is passed back to the model for learning. Otherwise, the model asks the user to specify the degree and range of distortion, which are used for learning and to set up a new cluster. In the same way, the user can also change previous settings and the list of context words. In contrast to other location privacy techniques, the distortion is not fixed. It is flexible so that a user can enjoy QoS as per personal choice.

Data Mutation Illustration: Fig. 6 illustrates how data transform themselves into different forms, i.e., “think for themselves”. Depending upon how the various parameters like UF, app tags, and privacy range are defined as suggested by the user, the agent instructs the data about what distortion ranges they should take next so as to maintain user privacy and/or QoS
TABLE II
SAMPLE TRAINING DATA FOR APPS BELONGING TO SOCIAL CATEGORY

Tag order: [Maps, Social, Health & Fitness, Food & Drinks, Communication, Games & Entertainment, Others]

No. | Tag | Usage freq. (UF) % | Context | r1 (mt) | r2 (mt)
1 | [0100000] | 3.24 | 1 | 0 | 0
2 | [0100000] | 2.82 | 0 | 0 | 0
3 | [0100000] | 12.42 | 0 | 200 | 500
4 | [0100000] | 15.63 | 1 | 0 | 0
5 | [0100000] | 16.71 | 1 | 0 | 0
6 | [0100000] | 26.71 | 1 | 0 | 0
7 | [0100000] | 32.71 | 1 | 200 | 500
8 | [0100000] | 61.43 | 0 | 1000 | 2000
9 | [0100000] | 77.52 | 1 | 200 | 500
10 | [0100000] | 71.53 | 0 | 1000 | 2000
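
Each row of Table II corresponds to one nine-value network input (a 7-dimensional one-hot tag, the UF, and the context flag) and a two-value target (r1, r2). The sketch below shows one way such inputs could be assembled from a day-wise access log, assuming that the UF is fed to the network as a percentage as printed in the table. The function names are hypothetical, and the three-day default window follows the text of Section IV.

```python
UNITS_PER_DAY = 8640  # 24 h x 60 min x 6 ten-second time units

# Tag order as listed in Section IV-B (Table II abbreviates the first tag to "Maps").
TAG_ORDER = [
    "Maps & Navigation", "Social", "Health & Fitness", "Food & Drinks",
    "Communication", "Games & Entertainment", "Others",
]


def usage_frequency_percent(la):
    """UF of one day as a percentage: accessed time units over all time units (Eq. 1)."""
    return 100.0 * sum(la) / len(la)


def average_uf_percent(daily_la, window=3):
    """Average UF over the most recent `window` day logs (three days by default)."""
    recent = daily_la[-window:]
    return sum(usage_frequency_percent(day) for day in recent) / len(recent)


def encode_features(tag, uf_percent, context):
    """Build the nine-value input: 7-way one-hot tag, UF percentage, context flag."""
    one_hot = [1 if name == tag else 0 for name in TAG_ORDER]
    return one_hot + [uf_percent, context]


# The navigation example from the text: access during time units 3600 to 4499.
nav_day = [0] * UNITS_PER_DAY
for t in range(3600, 4500):
    nav_day[t] = 1
idle_day = [0] * UNITS_PER_DAY

uf_single = usage_frequency_percent(nav_day)                    # 900 / 8640 of a day
uf_window = average_uf_percent([idle_day, idle_day, nav_day])   # averaged over 3 days
row3 = encode_features("Social", 12.42, 0)                      # instance 3 of Table II
```

Averaging over the window dilutes a single busy day, which is the stated reason for not judging an app on one day's UF alone.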

as specified. The form keeps on changing as the affecting parameters vary.

C. Computing Distorted Locations

Distorting location data quickly and in accordance with the privacy range is an important task. The agent cannot handle the workload imposed by continuous loops that search for distorted locations lying in an appropriate range. It is thus important to design a method for quickly mapping location data in accordance with the specified range. We therefore investigated how locations can be distorted such that the specified range is satisfied.

The actual distance between any two points on Earth is given by the Haversine distance. Let R = 6371000 be the radius of the Earth in meters (mt). Then the Haversine distance between any two GPS coordinates, (lat_1, long_1) and (lat_2, long_2), is computed as:

d = 2R \arcsin(\sqrt{h})    (2)

where h = \sin^2\left(\frac{\phi_2 - \phi_1}{2}\right) + \cos(\phi_1)\cos(\phi_2)\sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right), \phi_1 = rad(lat_1), \phi_2 = rad(lat_2), \lambda_1 = rad(long_1), \lambda_2 = rad(long_2).

This formula is unsuitable for real-time use due to the number of arithmetic and trigonometric computations. Instead, the distance can be approximated in less time using the quick distance formula given as:

q = 111.319 \sqrt{d_x^2 + d_y^2} mt.    (3)

where d_x = lat_2 - lat_1 and d_y = (long_2 - long_1) \cos((lat_2 + lat_1) * 0.00872664626).

Proposed Estimation for Quick Mapping: Let the amount of distortion in the coordinate locations be represented as ∆x = 0.00a_x b_x c_x d_x, ∆y = 0.00a_y b_y c_y d_y. We have empirically determined how to adjust the values after the decimal point in order to generate a new location, x′ = x ± ∆x and y′ = y ± ∆y, such that the distance between the new and the actual location is in the desired range. Table III shows how the values of these variables can be randomly varied (0 ≤ a_x, b_x, c_x, d_x, a_y, b_y, c_y, d_y ≤ 9) to obtain the desired distortion. This table can be used for quickly computing distorted locations.

D. Preventing Adversarial Prediction

One way to prevent data mining on location history is to add noise to it to perturb its spatial and/or temporal correlation patterns. The approach taken here attempts to break these correlations for apps with low or medium UF. The proposed model does this in two ways. 1) It distorts the actual coordinate values by a different amount each time, which prevents spatial correlation. 2) It perturbs the access history by introducing location requests at some points of time in a pseudo-random manner.

To achieve this perturbation, the model maintains a pool of random locations that is filled either by accessing the actual locations from the smartphone’s GPS function or by distorting the values from an original pool within some distortion ranges. Then, for each app, it observes the actual location-accessed variable, LA, for the past quarter of a day, i.e., 6 hours, and the LA variable for the same quarter of the previous day (Fig. 7(a-b)). It subjects these two variables to an OR operation to obtain a set of random location-accessed variable indices, LA_rand (Fig. 7(c)). It selects random values from the pool at the onset and offset of the vacant time units corresponding to LA_rand = 0. These are then interpolated to fill in the remaining values. The resulting variable is known as Rand_cord and has dimensions equal to those of LA_rand. While the model is working in the present quarter, if no location is accessed and LA_rand(i) = 1, the model sends false queries to the app LBS with locations Rand_cord(i). This introduces false queries into the LBS with different time patterns (Fig. 7(d)). Fig. 8(a) shows the actual locations, and Fig. 8(b) shows the locations with false inductions. This breaks the spatial and temporal correlations and prevents a third party from learning from the data.

V. EXPERIMENTAL STUDY AND RESULTS

The proposed model was evaluated by implementing it in Python 3.0, running on a local machine with an Intel Core-i7 3.4 GHz processor, 8 GB RAM, and a Microsoft Windows 10 Pro 64-bit OS.

A. Data Collection

The data were collected by surveying a closed sample of 25 users to record their location privacy preference for each
TABLE III
COORDINATE DISTORTION AND PRIVACY RANGES

∆x = 0.00a_x b_x c_x d_x, ∆y = 0.00a_y b_y c_y d_y

Distances | Quick ∆q (mt) | Haversine ∆h (mt)
0.000000-0.000400 | 0-53 | 0-55
0.000400-0.000800 | 53-107 | 55-111
0.000800-0.000999 | 107-134 | 111-138
0.00100-0.001499 | 134-201 | 138-208
0.001500-0.001999 | 201-268 | 208-278
0.001999-0.002499 | 268-335 | 278-347
0.002499-0.003999 | 335-573 | 347-559
0.003999-0.005999 | 573-804 | 559-834
0.005999-0.009999 | 804-1341 | 834-1391
≥ 0.00999 | ≥ 1341 | ≥ 1391
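
The Haversine distance (Eq. 2), the quick distance (Eq. 3), and the lookup idea behind Table III can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code. The constant 111.319 is treated as kilometres per degree, so the result is multiplied by 1000 to obtain meters. The rule in `delta_bounds` that selects every Table III row overlapping the requested range is an assumption, as are all function names. Because the table is empirical, the sketch does not guarantee that every distorted point falls exactly inside the requested range.

```python
import math
import random

EARTH_RADIUS_M = 6371000  # R in Eq. (2)


def haversine_m(lat1, lon1, lat2, lon2):
    """Haversine distance in meters between two points given in degrees (Eq. 2)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    h = math.sin((phi2 - phi1) / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def quick_m(lat1, lon1, lat2, lon2):
    """Quick distance of Eq. (3); 111.319 is taken as km per degree, hence the x1000."""
    dx = lat2 - lat1
    dy = (lon2 - lon1) * math.cos((lat2 + lat1) * 0.00872664626)  # pi/360: mean latitude in rad
    return 111.319 * 1000 * math.sqrt(dx * dx + dy * dy)


# (delta_low, delta_high, quick_low_mt, quick_high_mt), copied from Table III.
# The open-ended last row of the table is omitted.
TABLE_III = [
    (0.000000, 0.000400, 0, 53), (0.000400, 0.000800, 53, 107),
    (0.000800, 0.000999, 107, 134), (0.001000, 0.001499, 134, 201),
    (0.001500, 0.001999, 201, 268), (0.001999, 0.002499, 268, 335),
    (0.002499, 0.003999, 335, 573), (0.003999, 0.005999, 573, 804),
    (0.005999, 0.009999, 804, 1341),
]


def delta_bounds(r1, r2):
    """Delta interval spanned by the table rows overlapping [r1, r2] (assumed rule)."""
    rows = [row for row in TABLE_III if row[3] > r1 and row[2] < r2]
    if not rows:
        raise ValueError("no Table III row overlaps the requested range")
    return rows[0][0], rows[-1][1]


def distort(lat, lon, r1, r2, rng=random):
    """Shift both coordinates by a random signed delta taken from the lookup."""
    low, high = delta_bounds(r1, r2)
    dlat = rng.choice((-1, 1)) * rng.uniform(low, high)
    dlon = rng.choice((-1, 1)) * rng.uniform(low, high)
    return lat + dlat, lon + dlon
```

The lookup replaces a search loop with a single table scan and two random draws, which is the speed argument made in Section IV-C.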

Fig. 7. Illustration of random location insertion.

Fig. 8. Trajectory of actual and randomized locations.
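
The random location insertion illustrated in Fig. 7 can be sketched in a few steps: OR two access histories, pick pool locations at the run boundaries, interpolate in between, and emit false queries where the app is currently idle. The sketch below is one interpretation of the description in Section IV-D, not the authors' implementation. In particular, placing anchor locations at every boundary between runs of zeros and ones and using linear interpolation are assumptions, and all function names are hypothetical. A quarter of a day would hold 2160 time units; short lists are used here for readability.

```python
import random


def or_histories(la_a, la_b):
    """Element-wise OR of two equally long 0/1 access histories (LA_rand)."""
    return [a | b for a, b in zip(la_a, la_b)]


def run_boundaries(flags):
    """Indices where a run of equal values starts or ends, including both ends."""
    idx = {0, len(flags) - 1}
    for i in range(1, len(flags)):
        if flags[i] != flags[i - 1]:
            idx.update((i - 1, i))
    return sorted(idx)


def build_rand_cord(la_rand, pool, rng):
    """Pick pool locations at run boundaries and interpolate linearly between them."""
    anchors = run_boundaries(la_rand)
    picks = {i: rng.choice(pool) for i in anchors}
    if len(anchors) == 1:
        return [picks[anchors[0]]]
    cords = [None] * len(la_rand)
    for left, right in zip(anchors, anchors[1:]):
        (x0, y0), (x1, y1) = picks[left], picks[right]
        for i in range(left, right + 1):
            t = (i - left) / (right - left)
            cords[i] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return cords


def false_queries(la_now, la_rand, rand_cord):
    """(time unit, location) pairs to send when the app is idle but LA_rand is 1."""
    return [(i, rand_cord[i]) for i, (now, r) in enumerate(zip(la_now, la_rand))
            if now == 0 and r == 1]


la_past_quarter = [0, 1, 1, 0, 0, 0]
la_yesterday = [0, 0, 1, 1, 0, 0]
pool = [(32.80, 74.89), (32.75, 74.88), (32.70, 74.87)]

la_rand = or_histories(la_past_quarter, la_yesterday)
rand_cord = build_rand_cord(la_rand, pool, random.Random(1))
queries = false_queries([0, 0, 1, 0, 0, 0], la_rand, rand_cord)
```

In this toy run the app really accesses the location only at time unit 2, so false queries are produced for the other time units where LA_rand is 1.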

TABLE IV
SOME PREDICTION RESULTS FOR USER-1 AND USER-2

User-1 User-2
No. Input features True range Predicted range Input features True range Predicted range
1 [1 0 0 0 0 0, 12.58%, 0] [100-200] [100.33-200.45] [1 0 0 0 0 0, 98.16%, 1] [0-0] [0.09- 0.11]
2 [0 1 0 0 0 0, 72.14%, 0] [ 500-1000] [498.65-998.23] [0 1 0 0 0 0, 76.08%, 1] [250-500] [252.03-503.38]
3 [0 0 1 0 0 0, 72.14%, 0] [0-0] [0.100 -0.015] [0 0 1 0 0 0, 62.27%, 0] [550-1200] [522.17-1155.58]
4 [0 0 0 1 0 0, 15.73%, 1] [200-500] [202.55-503.50] [0 0 0 1 0 0, 15.03%, 0] [300-600] [08.33-615.76]
5 [0 0 0 0 1 0, 63.37%, 0] [1000-2000] [998.64-1997.75] [0 0 0 0 1 0, 9.93%, 1] [0-0] [0.23-0.31]
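
As a worked illustration of the mean square error used to compare true and predicted ranges, the sketch below evaluates it on rows 1, 2, 4, and 5 of the User-1 columns of Table IV. Row 3 is left out because its predicted entry is ambiguous in the table as printed. Averaging the squared error over the two outputs (r1, r2) of each row is an assumption about how the metric is applied, and the names `mse` and `USER1_ROWS` are hypothetical.

```python
def mse(true_range, predicted_range):
    """Mean of the squared differences between true and predicted (r1, r2)."""
    pairs = list(zip(true_range, predicted_range))
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


# (true range, predicted range) in mt, copied from the User-1 columns of Table IV.
USER1_ROWS = [
    ((100, 200), (100.33, 200.45)),
    ((500, 1000), (498.65, 998.23)),
    ((200, 500), (202.55, 503.50)),
    ((1000, 2000), (998.64, 1997.75)),
]

per_row_mse = [mse(true, pred) for true, pred in USER1_ROWS]
mean_mse = sum(per_row_mse) / len(per_row_mse)
```

For the first row the two errors are 0.33 and 0.45 mt, so the per-row value is (0.33^2 + 0.45^2) / 2, a deviation of well under a meter on a 100-200 mt range.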
of the seven categories of apps. For each app category, 5-7 situations are presented for the UF and context (input), and the choice of distortion is recorded (output). Each input sample was replicated a few times in a nearby range to obtain a well-defined training set similar to the example training data shown in Table II. Depending upon their personal choice and app usage frequency, the location privacy preferences that satisfy the users’ convenience level were recorded. For each user, the sample set varies between 150 and 180 points. The sample user records can be obtained from the https://github.com/rohitdavas/Smart-data repository on request.

B. Model performance

Since the proposed model is a personalized agent, its performance has to be analyzed separately for each user. For each user, the dataset was randomly split into training, test, and validation sets with a train:test:validation ratio of 9:1:1 and a batch size of 4. The mean square error (MSE) between the predicted and actual distortion ranges was used as the loss metric. The Adam optimizer was used with the default parameter settings of lr = 0.001, betas = (0.9, 0.999), eps = 1e-08, weight_decay = 0, and amsgrad = False. The variation of MSE versus epoch is shown in the convergence graphs in Fig. 9(a) and (b) for two distinct users, User-1 and User-2, having different privacy preference datasets. The MSE decreases at each successive epoch as learning progresses. The average inference time per instance was recorded to be 0.00343 seconds. Table IV shows some sample results for User-1 and User-2. It can be seen that the predicted distortion range values are very close to the true values. Similar results were observed for various users and values and can be found at the GitHub link. We have not made any comparisons since no such modeling for location data exists until now.

Fig. 9. Convergence graph for (a) User-1, (b) User-2

C. Measuring induced perturbations

This section defines measures to evaluate the spatial and temporal entropy discussed in Section IV-D. Two measures are used to quantify the amount of noise and its effect on the signal: coherence and correlation. Coherence is more specifically termed the “magnitude-squared coherence” between two signals x(t) and y(t), defined as:

C_{xy}(f) = \frac{|G_{xy}(f)|^2}{G_{xx}(f) G_{yy}(f)},    (4)

where G_{xy}(f) is the cross-spectral density between x and y, and G_{xx}(f) and G_{yy}(f) are the auto-spectral densities of x and y, respectively. The coherence values always satisfy 0 ≤ C_{xy}(f) ≤ 1. Coherence measures the degree of linear dependency of two signals by testing for similar frequency components. If two signals correspond perfectly to each other at a given frequency, the magnitude of coherence is 1. If they are totally unrelated, coherence is 0. Fig. 10(a) shows the coherence map for signals that were exactly the same; the values are all equal to one. Figs. 10(b) and (c) show the 1D coherence map between actual and random latitude values and longitude values, respectively, for quarter signals generated using the proposed approach. The distribution is random. High coherence values indicate signal similarity. It can be seen that at many points the coherence is less than 0.5. The percentage of points having coherence less than 0.5 in Fig. 10(b) and Fig. 10(c) is 86.61% and 41.24%, respectively. This indicates that the approach introduces sufficient randomness.

Correlation is another measure of the relationship between two signals. The correlation coefficient is also used to evaluate similarity. If two signals have a high degree of similarity, the absolute magnitude of the computed correlation coefficient is close to 1. The correlation coefficient between two signals x(t) and y(t) is computed as:

C_r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}.    (5)

where n is the number of samples in each signal. The correlation coefficients between actual and random latitude values and longitude values were 0.76 and 0.51, respectively, again indicating unlinkability and a change in their representation. This experiment was repeated for each of the collected data instances, and similar observations were made. Collectively, it was observed that the correlation coefficients varied between 35% and -80% and the coherence values varied between 30% and 90%.

VI. CONCLUSIONS

The proposed smart data agent model for preserving location privacy is based on the concept of “privacy by design”.
Fig. 10. Depiction of coherence Cxy (f ): (a) same signal, (b) actual and random latitudes, (c) actual and random longitudes.

It acts as a personalized assistant and gives smartphone users [9] A. Masoumzadeh and J. Joshi, “An alternative approach to k-anonymity
more options for controlling what to share, how much to share, for location-based services,” Procedia Computer Science, vol. 5, pp.
522–530, 2011.
and with whom to share data. The neural network learns user’s [10] G. Zhong and U. Hengartner, “A distributed k-anonymity protocol
privacy preference and keeps operating in background to alter for location privacy,” in IEEE International Conference on Pervasive
the location data before passing it further. This way it is able Computing and Communications. IEEE, 2009, pp. 1–10.
[11] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam,
to make prediction for existing as well as new applications as “l-diversity: Privacy beyond k-anonymity,” ACM Transactions on Knowl-
the context and usage frequency changes. Several thresholds edge Discovery from Data (TKDD), vol. 1, no. 1, pp. 3–es, 2007.
are used that can be fine tuned by the user. The removal of [12] M. F. Mokbel, “Privacy in location-based services: State-of-the-art
and research directions,” in International Conference on Mobile Data
spatial and temporal correlations in the data by altering the Management. IEEE, 2007, pp. 228–228.
actual access history prevents a third party from mining the [13] B. Niu, Q. Li, X. Zhu, G. Cao, and H. Li, “Achieving k-anonymity in
personal data. Overall, the proposed model enables data to be privacy-aware location-based services,” in IEEE INFOCOM 2014-IEEE
Conference on Computer Communications. IEEE, 2014, pp. 754–762.
transformed in accordance with the type of application that [14] M. L. Damiani, E. Bertino, and C. Silvestri, “Protecting location privacy
will use the data and enables the control parameters to be through semantics-aware obfuscation techniques,” in IFIP International
adjusted in accordance with the conditions so that the user Conference on Trust Management. Springer, 2008, pp. 231–245.
[15] H. Lu, C. S. Jensen, and M. L. Yiu, “Pad: privacy-area aware, dummy-
can enjoy the highest possible quality of service. based location privacy in mobile services,” in Seventh ACM International
Workshop on Data Engineering for Wireless and Mobile Access, 2008,
ACKNOWLEDGEMENTS pp. 16–23.
[16] C. A. Ardagna, M. Cremonini, E. Damiani, S. D. C. Di Vimercati,
This work was partially supported by MEXT KAKENHI and P. Samarati, “Location privacy protection through obfuscation-based
techniques,” in IFIP Annual Conference on Data and Applications
Grants (16H06302 and 18H04120), Japan. Security and Privacy. Springer, 2007, pp. 47–60.
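
As a supplementary illustration of the two measures in Section V-C, the following is a minimal sketch of how magnitude-squared coherence (Eq. (4)) and the correlation coefficient (Eq. (5), in its standard sample form with raw sums) might be computed. It is not the authors' implementation. The function names `coherence` and `correlation`, the segment length, the Welch-style averaging over Hann-windowed segments used to estimate the spectral densities, and the synthetic random-walk "latitude" trace with added Gaussian noise are all assumptions made for illustration. In practice, library routines such as `scipy.signal.coherence` and `numpy.corrcoef` would normally be used instead of a hand-written DFT.

```python
import cmath
import math
import random


def _segment_spectra(signal, seg_len, step):
    """DFTs (non-negative frequencies) of mean-removed, Hann-windowed segments."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / seg_len) for i in range(seg_len)]
    spectra = []
    for start in range(0, len(signal) - seg_len + 1, step):
        seg = signal[start:start + seg_len]
        mean = sum(seg) / seg_len
        seg = [(v - mean) * w for v, w in zip(seg, window)]
        spectra.append([
            sum(v * cmath.exp(-2j * math.pi * k * i / seg_len) for i, v in enumerate(seg))
            for k in range(seg_len // 2 + 1)
        ])
    return spectra


def coherence(x, y, seg_len=32):
    """Magnitude-squared coherence C_xy(f), one value per frequency bin."""
    sx = _segment_spectra(x, seg_len, seg_len // 2)  # 50% overlap (assumed)
    sy = _segment_spectra(y, seg_len, seg_len // 2)
    result = []
    for k in range(seg_len // 2 + 1):
        gxy = sum(a[k].conjugate() * b[k] for a, b in zip(sx, sy))  # cross-spectral density
        gxx = sum(abs(a[k]) ** 2 for a in sx)                       # auto-spectral density of x
        gyy = sum(abs(b[k]) ** 2 for b in sy)                       # auto-spectral density of y
        den = gxx * gyy
        # Scale factors cancel in the ratio; empty bins are reported as 0.
        result.append(abs(gxy) ** 2 / den if den > 0 else 0.0)
    return result


def correlation(x, y):
    """Sample correlation coefficient C_r written with raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Synthetic data (assumption): a random walk stands in for an actual
# coordinate trace, and Gaussian noise stands in for the perturbation.
random.seed(0)
actual, value = [], 0.0
for _ in range(256):
    value += random.gauss(0, 1)
    actual.append(value)
perturbed = [v + random.gauss(0, 2.0) for v in actual]

c_same = coherence(actual, actual)
c_pert = coherence(actual, perturbed)
pct_below = 100.0 * sum(c < 0.5 for c in c_pert) / len(c_pert)

print("coherence of a signal with itself, first bins:", c_same[:3])
print("percentage of bins with coherence < 0.5:", pct_below)
print("correlation coefficient:", correlation(actual, perturbed))
```

The percentage of frequency bins with coherence below 0.5 and the correlation coefficient are the two kinds of summary figures reported in Section V-C. The raw-sum form of the correlation coefficient can lose precision when the samples have a large mean relative to their spread, as real coordinates do, so subtracting the mean first is a sensible precaution.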
[17] C. A. Ardagna, M. Crcmonini, E. Damiani, S. De Vimercati, and
R EFERENCES P. Samarati, “A middleware architecture for integrating privacy prefer-
ences and location accuracy,” in IFIP International Information Security
Conference. Springer, 2007, pp. 313–324.
[1] A. R. Beresford and F. Stajano, “Location privacy in pervasive comput-
[18] T. Hashem and L. Kulik, “Safeguarding location privacy in wireless ad-
ing,” IEEE Pervasive computing, vol. 2, no. 1, pp. 46–55, 2003.
hoc networks,” in International Conference on Ubiquitous Computing.
[2] C. Bettini, S. Jajodia, P. Samarati, and S. X. Wang, Privacy in location-
Springer, 2007, pp. 372–390.
based applications: research issues and emerging trends. Springer
[19] G. J. Tomko, D. S. Borrett, H. C. Kwan, and G. Steffan, “Smartdata:
Science & Business Media, 2009, vol. 5599.
Make the data think for itself,” Identity in the Information Society, vol. 3,
[3] N. T. Times, nytimes.com/interactive/2018/12/10/business/location-data-
no. 2, pp. 343–362, 2010.
privacy-apps.html, 2020 (accessed May 14, 2020).
[20] I. Harvey, A. Cavoukian, G. Tomko, D. Borrett, H. Kwan, and D. Hatz-
[4] GDPR, https://gdpr-info.eu/issues/privacy-by-design/, (accessed May 14, inakos, Smartdata. Springer, 2013.
2020). [21] M. Amiri-Zarandi, R. A. Dara, and E. Fraser, “A survey of machine
[5] N. Narodytska and S. Kasiviswanathan, “Simple black-box adversarial learning-based solutions to protect privacy in the internet of things,”
attacks on deep neural networks,” in 2017 IEEE Conference on Com- Computers & Security, p. 101921, 2020.
puter Vision and Pattern Recognition Workshops (CVPRW). IEEE, [22] I. Bilogrevic, K. Huguenin, B. Agir, M. Jadliwala, M. Gazaki, and
2017, pp. 1310–1318. J.-P. Hubaux, “A machine-learning based approach to privacy-aware
[6] B. Gedik and L. Liu, “Protecting location privacy with personalized k- information-sharing in mobile social networks,” Pervasive and Mobile
anonymity: Architecture and algorithms,” IEEE Transactions on Mobile Computing, vol. 25, pp. 125–142, 2016.
Computing, vol. 7, no. 1, pp. 1–18, 2007. [23] Scrapper, https://pypi.org/project/google-play-scraper/, (accessed May
[7] T. Le and I. Echizen, “Lightweight collaborative semantic scheme for 14, 2020).
generating an obfuscated region to ensure location privacy,” in IEEE
International Conference on Systems, Man, and Cybernetics (SMC).
IEEE, 2018, pp. 2844–2849.
[8] M. Gruteser and D. Grunwald, “Anonymous usage of location-based
services through spatial and temporal cloaking,” in First International
conference on Mobile systems, applications and services, 2003, pp. 31–
42.
