
2017 IEEE Fog World Congress (FWC)

Edge Compression of GPS Data for Mobile IoT


Joydeep Acharya and Sudhanshu Gaur
Digital Solution Platform Laboratory, Hitachi America Ltd., Santa Clara, CA
Email: joydeep.acharya@hal.hitachi.com, sudhanshu.gaur@hal.hitachi.com

Abstract: Intelligent Transportation Systems (ITS) is a key IoT use case. To enable ITS applications, the location information (GPS) of a vehicle needs to be continuously transmitted to the cloud. Due to bandwidth and latency considerations, there is a limit to the aggregate volume and velocity of all the data transmitted to the cloud from the vehicle. To address this problem, this paper proposes a novel technique for compressing the GPS data before transmission to the cloud. Our algorithm at the edge correlates the GPS data with the local GIS information to derive high-precision quantized estimates. At the cloud, our algorithm estimates the vehicular speed from the quantized data to reconstruct the GPS coordinates with minimum error. Thus our algorithm is different from traditional algorithms for GPS trajectory compression. Our proposed technique also achieves the secondary benefit of automatic encryption and obfuscation of the transmitted GPS data, thus improving the privacy and security of ITS systems. Finally we show that, to implement this algorithm in a real deployment, a fog based architecture is needed for addressing the control and management layer functionalities.

I. INTRODUCTION

In the context of the Internet of Things (IoT), recent years have seen fundamental paradigm shifts in data acquisition, processing and transmission from the devices to the cloud. Fog computing principles [1] advocate bringing storage, processing and analytics capabilities closer to the devices and the edge of the network, reducing the total volume, velocity and variety of data that needs to be transmitted to the cloud. The characteristics of the data depend on the specific IoT application and thus influence the processing at the edge gateway or at fog nodes close to the devices.

Intelligent Transportation Systems (ITS) integrates sensing, communications, control and information processing from vehicles to the cloud and is an important IoT application. ITS is a conglomeration of various use cases such as connected vehicles, autonomous driving, fleet management and logistics, optimized navigation and remote vehicle monitoring and diagnostics. To enable these applications, the real-time location of all the vehicles needs to be transmitted to the cloud for subsequent analytics [2]. Continuous transmission of the raw GPS coordinates increases the volume and velocity of data transmitted from the vehicle to the cloud. Assuming 4 bytes of data each for transmitting the latitude, longitude and timestamp, the authors in [3] show that the cost of tracking a fleet of 4000 vehicles when raw GPS coordinates are transmitted every second would be as high as 2.5 million USD annually. A major component of this cost is the data charges for satellite transmission during the times when these trucks are outside cellular coverage. In addition, transmission of raw GPS coordinates can lead to breaches in security and privacy [4], as they can be intercepted by a malicious adversary who can infer the location of the vehicles and the passengers.

Thus the location data needs to be compressed and obfuscated prior to transmission. We address this issue in this paper. Compression of GPS trajectories is a well researched topic for legacy applications. The inventors in [5], [6] propose a form of delta compression while transmitting successive GPS points. The authors in [7] evaluate the performance of several spatio-temporal sampling techniques of a GPS trajectory in order to minimize the error between the actual and sampled points. In their follow-up work [3], they propose a queue based system to quickly remove sampled points from a trajectory while effectively bounding the growth of error caused by the removal of points. Multi-trajectory compression methods are proposed in [8]. The authors in [9] propose separate compression of spatial and temporal information based on graph theoretic models. A different approach from the above is mentioned by the authors of [10], who compress points in a GPS trajectory by estimating the speed and direction change of the trajectory.

Our approach is fundamentally different from all the above, as we assume that the local map of the region where the vehicle is located is available at the cloud and the vehicle and is updated as the vehicle moves to a new location. This map allows the edge and the cloud to establish the GIS context of the vehicular location (the relative position of the vehicle with respect to the road it is on). Using the GIS context we develop an algorithm at the edge that compresses each point in the GPS trajectory of the vehicle to a single bit. We call this context aware compression. At the cloud this algorithm performs intelligent reconstruction of the location by first estimating the vehicular speed from the compressed GPS data.

II. CONTEXT OF LOCATION INFORMATION

GPS values are represented by four bytes each for latitude and longitude, which cover all points on the globe. However, a vehicle in motion is confined within a much smaller geographical area. If the road on which a vehicle is present is known, the set of GPS points where the vehicle can be located is far more limited and hence can be represented in less than four bytes. As the car moves along the same road, only its relative location with respect to the road needs to be transmitted. This is the essential idea of GIS context aware compression of GPS data. We shall formalize this idea now.
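As a rough illustration of the Section II argument (not taken from the paper), the sketch below counts how many bits would be needed just to name one waypoint interval on a known road, compared with the 32 bits of a four-byte coordinate. The function name `index_bits` and the 5 km road length are hypothetical; the 35 m spacing is the average OpenStreetMap separation the paper reports for its urban tests.

```python
import math

COORD_BITS = 4 * 8  # four bytes per latitude or longitude value, as stated in the text


def index_bits(road_length_m: float, spacing_m: float) -> int:
    """Bits needed to name one waypoint interval on a road of the given length."""
    intervals = max(1, math.ceil(road_length_m / spacing_m))
    return max(1, math.ceil(math.log2(intervals)))


# Hypothetical 5 km road with waypoints roughly every 35 m.
bits_for_interval = index_bits(road_length_m=5000.0, spacing_m=35.0)
print(COORD_BITS, bits_for_interval)
```

The scheme developed in the following sections goes further than such an index: it sends only whether the interval has changed since the previous reading, which is why a single bit suffices for most time instants.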

978-1-5386-3666-4/17/$31.00 ©2017 IEEE



Fig. 1. Example of the GIS context of a point $p$ that lies on a road $S$. The time index $n$ has been omitted.

The GIS information of a road is a set of GPS points that define its trajectory. There are many databases that store the GIS information of roads. OpenStreetMap [11] is an example of an open source database, and companies such as Google have their own proprietary ones. The spatial spacing of points for a given road depends on the database; OpenStreetMap, for example, can have points spaced anywhere from 20 to 100 m apart.

Definition 1: Define a GIS information database as $D$ and its entries as $\langle S, \mathcal{S} \rangle$, where $S$ is the index of the road and

$$\mathcal{S} = \left[\mathcal{S}[0], \mathcal{S}[1], \cdots, \mathcal{S}[k-1], \mathcal{S}[k], \mathcal{S}[k+1], \cdots\right], \tag{1}$$

is the set of points on the road, with each point $\mathcal{S}[k]$ of the form $[lat, lon]$, where $lat$ and $lon$ denote the latitude and longitude respectively.

For a two-way road, there will be two separate database entries, each indicative of a given direction (we will ignore multiple lanes in the same direction within a road). Assume that this database is shared between the cloud and the edge node which performs the GPS compression.

Definition 2: Let $\{p_n\}$ be the sequence of GPS readings output by the GPS device in a vehicle. At time instant $n$, the reading $p_n$ is of the form $[lat_n, lon_n]$, where $lat_n$ and $lon_n$ are the latitude and longitude values.

Definition 3: At time instant $n$, if $p_n$ is located on road $S^{(n)}$, define the GIS context of $p_n$ as the set $C_p^{(n)} = \{S^{(n)}, s^{(n)}_{\mathrm{left}}, s^{(n)}_{\mathrm{right}}\}$, where $s^{(n)}_{\mathrm{left}}$ and $s^{(n)}_{\mathrm{right}}$ are two consecutive points in $\mathcal{S}$ defined in (1) such that the points $p_n$, $s^{(n)}_{\mathrm{left}}$ and $s^{(n)}_{\mathrm{right}}$ form an acute triangle.

This is depicted in Figure 1. If the spacing between points in $\mathcal{S}^{(n)}$ is uniform, then $s^{(n)}_{\mathrm{left}}$ and $s^{(n)}_{\mathrm{right}}$ are also the closest points to $p_n$. The OpenStreetMap GIS information database $D$ has non-uniform spacings, but even in this case $s^{(n)}_{\mathrm{left}}$ and $s^{(n)}_{\mathrm{right}}$ are the closest points to $p_n$ with high probability.¹

¹ The terms left and right have been used for ease of explanation with Figure 1 as reference. They do not mean that the points are necessarily to the left and right of $p_n$, which is strictly true only when we are visualizing roads in the west to east direction. The actual algorithm does not depend on specific directional labels. GIS databases have points ordered according to the direction of the trajectory, i.e. a vehicle in motion will pass through points with higher indices, and this is what our algorithm uses.

III. COMPRESSION AND RECONSTRUCTION ALGORITHMS

The main contribution of this paper is to quantize and transmit $C_p^{(n)}$ instead of $p_n$ and to show that this can be done with only one bit most of the time.

The key steps in the compression and reconstruction algorithms are, respectively,

$$\text{Compression:} \quad \left\{C_p^{(n-1)}, p_n\right\} \xrightarrow{\;\Phi\;} C_p^{(n)} \xrightarrow{\;\Psi\;} \tilde{C}_p^{(n)} \tag{2}$$

$$\text{Reconstruction:} \quad \tilde{C}_p^{(n)} \xrightarrow{\;\Psi^{-1}\;} C_p^{(n)} \xrightarrow{\;\Phi^{-1}\;} \hat{p}_n \tag{3}$$

where $\Phi(\cdot)$ is the function to derive $C_p^{(n)}$ from $p_n$ and $\Psi(\cdot)$ is the compression function for $C_p^{(n)}$. We will establish that $\Psi(\cdot)$ can achieve a single bit of compression for most time instants. At the cloud side, $\Psi^{-1}$ leads to perfect reconstruction of $C_p^{(n)}$. There is some loss in $\Phi^{-1}$, leading to $\hat{p}_n \neq p_n$. We will later evaluate this error. We now explain these functions.

A. Function $\Phi(\cdot)$ at Edge

This function has three subcases, explained below.

a) Determine that $C_p^{(n)}$ can't be derived from $C_p^{(n-1)}$. Compute $C_p^{(n)}$ by querying the GIS information database $D$ and performing minimum distance computations over all roads and waypoints to calculate $S^{(n)}$, $s^{(n)}_{\mathrm{left}}$ and $s^{(n)}_{\mathrm{right}}$. This would happen, for example, at the start of the algorithm when there is no stored $C_p^{(n-1)}$. This may also be needed periodically if the point $p_n$ is far away from the line defined by the points $s^{(n-1)}_{\mathrm{left}}$, $s^{(n-1)}_{\mathrm{right}}$ of $C_p^{(n-1)}$. This can happen, for example, if the car has moved on to a new road. To determine if this has happened we propose condition (a) defined by

$$d\left(s^{(n-1)}_{\mathrm{left}}, p_n\right) + d\left(s^{(n-1)}_{\mathrm{right}}, p_n\right) \underset{(b)}{\overset{(a)}{\gtrless}} \alpha\, d\left(s^{(n-1)}_{\mathrm{left}}, s^{(n-1)}_{\mathrm{right}}\right), \tag{4}$$

where $d(p, q)$ is the Euclidean distance between two GPS points $p$ and $q$, and $\alpha > 1$ is a design parameter. Condition (a) holds when the left side exceeds the right side, and condition (b) otherwise.

b) Determine that $C_p^{(n)}$ can be derived from $C_p^{(n-1)}$. The criterion is condition (b) in (4). This means that $p_n$ is close to the points $s^{(n-1)}_{\mathrm{left}}$ and $s^{(n-1)}_{\mathrm{right}}$ and the vehicle is still on the same road, i.e. $S^{(n)} = S^{(n-1)}$. We now need to check the relative location of $p_n$ with respect to $s^{(n-1)}_{\mathrm{left}}$ and $s^{(n-1)}_{\mathrm{right}}$. Consider the triangle $\Delta$ formed by the points $s^{(n-1)}_{\mathrm{left}}$, $p_n$ and $s^{(n-1)}_{\mathrm{right}}$. This again leads to two further subcases.

   i) If $\Delta$ is not acute, it means that $p_n$ has crossed $s^{(n-1)}_{\mathrm{right}}$. Let the point next to $s^{(n-1)}_{\mathrm{right}}$ in the trajectory of $S^{(n)}$

   be $s^{(n-1)}_{\mathrm{next}}$. Update $s^{(n)}_{\mathrm{left}} = s^{(n-1)}_{\mathrm{right}}$ and $s^{(n)}_{\mathrm{right}} = s^{(n-1)}_{\mathrm{next}}$.²

   ii) If $\Delta$ is acute, then update $C_p^{(n)} = C_p^{(n-1)}$.

Note that the complexity of performing the check in (4) is much less than that of computing $C_p^{(n)}$ as explained in step a); otherwise we could have simply computed the exact $C_p^{(n)}$ for each point.

B. Function $\Psi(\cdot)$ at Edge

This function has a one-to-one correspondence to the three subcases mentioned in Section III-A.

a) If $C_p^{(n)}$ can't be derived from $C_p^{(n-1)}$ as per condition (a) in (4), then transmit $T_n = p_n$.

b) If $C_p^{(n)}$ can be derived from $C_p^{(n-1)}$ as per condition (b) in (4), then define a binary variable $\mathrm{Indx}_n$ for time $n$.

   i) If $\Delta$ is not acute, set $\mathrm{Indx}_n = 1$. Transmit $T_n = \mathrm{Indx}_n$.

   ii) If $\Delta$ is acute, set $\mathrm{Indx}_n = 0$. Transmit $T_n = \mathrm{Indx}_n$.

Thus we have included the GPS information in a single bit $\mathrm{Indx}_n$, leading to a compression ratio of 32 : 1.

C. Function $\Psi^{-1}(\cdot)$ at Cloud

This consists of the following familiar operations (i.e. operations very similar to those performed at the edge).

a) If $T_n = p_n$, then compute $C_p^{(n)}$ by querying the GIS information database $D$ and performing minimum distance computations over all roads and waypoints in $D$ to calculate $S^{(n)}$, $s^{(n)}_{\mathrm{left}}$ and $s^{(n)}_{\mathrm{right}}$.

b) If $T_n = \mathrm{Indx}_n$ then

   i) If $\mathrm{Indx}_n = 1$, let the point next to $s^{(n-1)}_{\mathrm{right}}$ in the trajectory of $S^{(n)}$ be $s^{(n-1)}_{\mathrm{next}}$. Update $s^{(n)}_{\mathrm{left}} = s^{(n-1)}_{\mathrm{right}}$ and $s^{(n)}_{\mathrm{right}} = s^{(n-1)}_{\mathrm{next}}$.

   ii) If $\mathrm{Indx}_n = 0$, then update $C_p^{(n)} = C_p^{(n-1)}$.

D. Function $\Phi^{-1}(\cdot)$ at Cloud

This is the reconstruction algorithm to obtain $\hat{p}_n$. We propose the following.

a) When $T_n = p_n$, set $\hat{p}_n = p_n$.

b) If $T_n = \mathrm{Indx}_n$ then

   i) If $\mathrm{Indx}_n = 0$, then set

   $$\hat{p}_n = \gamma_n s^{(n)}_{\mathrm{left}} + (1 - \gamma_n) s^{(n)}_{\mathrm{right}}, \tag{5}$$

   where $\gamma_n$ is a design parameter. This means that $\hat{p}_n$ is chosen to be a point on the line connecting $s^{(n)}_{\mathrm{left}}$ and $s^{(n)}_{\mathrm{right}}$. This is because the reconstruction algorithm knows that the vehicle hasn't crossed over $s^{(n)}_{\mathrm{right}}$, as otherwise the compression algorithm would have transmitted $\mathrm{Indx}_n = 1$.

   ii) If $\mathrm{Indx}_n = 1$, then pick a value of $\hat{p}_n$ close to $s^{(n)}_{\mathrm{right}}$, which means that $d\left(s^{(n)}_{\mathrm{right}}, \hat{p}_n\right) < \epsilon$ where $\epsilon$ is a small number. The value of $\epsilon$ is a design parameter. We could assign $\hat{p}_n = s^{(n)}_{\mathrm{right}}$ by setting $\epsilon = 0$. Choosing $\epsilon \neq 0$ can reduce the variance of the reconstruction error when effects such as realistic road trajectories and the limited accuracy of the initial measured GPS point $p_n$ are considered.

² This strictly needn't be true, as the vehicle could have moved way past $s_{\mathrm{next}}$. However, based on extensive tests done on urban traffic (low to medium speed) using the OpenStreetMap database (average separation of points around 35 meters) and with a new GPS point generated every second, we see that this event is somewhat rare. In the case when the vehicle has moved way beyond $s_{\mathrm{next}}$ and the update ignores this, within a short time the condition in (4) will be triggered, leading to the re-computation of $C_p$ and correction of the $s_{\mathrm{left}}$ and $s_{\mathrm{right}}$ values.

1) Choice of $\gamma_n$: The choice of $\gamma_n$ is crucial to reduce the reconstruction error. Intuitively the value of $\gamma_n$ should be proportional to the vehicular speed around time $n$. To make this notion exact, we begin with the following definition.

Definition 4: At time instant $n$, if $\mathrm{Indx}_n = 0$ and there have been $K$ zeros already transmitted since the last time a non-zero value was received (i.e. there exists some positive integer $K$ such that $\mathrm{Indx}_k = 0$ for $n - K \le k < n$ and $\mathrm{Indx}_{n-K-1} = 1$), then define $\Gamma(n) = K$. If at time $n$, $\mathrm{Indx}_n = 1$ or $T_n = p_n$, then $\Gamma(n) = 0$.

To see the utility of $\Gamma(n)$, consider the situation where the vehicle is moving at constant speed and the set of points $\mathcal{S}$ in the road database are spaced uniformly. In this idealized case, $\{\mathrm{Indx}\}$, the vector of $\mathrm{Indx}_n$ values that the vehicle would transmit over time, is

$$\underbrace{0 \cdots 0}_{N}\; 1\; \underbrace{0 \cdots 0}_{N}\; 1 \cdots,$$

for some integer $N$. Thus $\Gamma(n) = \mathrm{mod}(n, N)$. It is easy to see that in this case, the optimal choice of $\gamma_n$ is $\gamma_n = 1 - \Gamma(n)/N$.

In reality, the vehicle speed will vary in a random fashion due to traffic and driving patterns. The spacing of points in $\mathcal{S}$ also need not be uniform (as seen in OpenStreetMap). For this case, $\{\mathrm{Indx}_n\}$ would be

$$\underbrace{0 \cdots 0}_{N_1}\; 1\; \underbrace{0 \cdots 0}_{N_2}\; 1 \cdots,$$

for integers $N_1, N_2, \cdots$, and the values of these intervals cannot be known in advance. In other words, when the cloud receives $\mathrm{Indx}_n = 0$ for the first time after $\mathrm{Indx}_{n-1} = 1$ or $T_{n-1} = p_{n-1}$, it cannot know in advance the length of the interval of zero values (let us denote this by $N$) that the edge will transmit.

What the cloud does know is the length of the past intervals where the $\mathrm{Indx}$ values were zero. Let us denote them by $N_{-1}$, $N_{-2}$ etc., with $N_{-1}$ the previous interval, $N_{-2}$ the one before that, and so on. We propose that $N$ can be estimated from the values of the past intervals. This is because the values of these intervals still depend on the vehicle speed, even though the dependence is loose due to non-uniform spacings of the points in $\mathcal{S}$. Now for most of the travel (except for sudden traffic, stop signs, red lights etc.), the vehicle speed in one interval will be correlated to the speed in the previous one.

We thus propose a heuristic algorithm to derive $N_{\mathrm{est}}$, the estimate of $N$.

1) When $\mathrm{Indx}_n = 0$ is received after either $\mathrm{Indx}_{n-1} = 1$ or $T_{n-1} = p_{n-1}$, set $N_{\mathrm{est}} = N_{-1}$. Set $\Gamma(n) = 0$.
2) For present and future time instants $k > n$ with $\mathrm{Indx}_k = 0$, compute $\Gamma(k) = k - n$. If $\Gamma(k) = N_{\mathrm{est}} - 1$, then set $N_{\mathrm{est}} = \rho N_{\mathrm{est}}$ where $\rho > 2$, i.e. re-adjust the estimate as the original one was too small.
3) Set $\gamma_k = 1 - \Gamma(k)/N_{\mathrm{est}}$.

4) Reset the algorithm when $\mathrm{Indx}_k = 1$ or $T_k = p_k$.
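The four steps above can be sketched as a small function that turns a stream of received bits into interpolation weights. This is a minimal sketch under stated assumptions, not the authors' implementation: the name `gamma_schedule`, its parameters and the default `rho = 2.5` are illustrative, the initial $N_{-1}$ is assumed to be known, and the test "$\Gamma(k) = N_{\mathrm{est}} - 1$" is applied as `>=` so that it still triggers once $N_{\mathrm{est}}$ is no longer an integer.

```python
from typing import List, Optional


def gamma_schedule(bits: List[int], n_prev: int, rho: float = 2.5) -> List[Optional[float]]:
    """Compute gamma_k for a stream of received Indx bits.

    bits   : received Indx values (0 or 1), one per time instant
    n_prev : N_{-1}, length of the previous run of zeros (assumed known initially)
    rho    : growth factor applied when the estimate proves too small (rho > 2)
    Returns gamma_k for each 0 bit and None for each 1 bit.
    """
    if rho <= 2:
        raise ValueError("the text requires rho > 2")
    gammas: List[Optional[float]] = []
    n_est = float(n_prev)  # step 1: N_est = N_{-1}
    run = 0                # Gamma(k): zeros received since the last reset
    for bit in bits:
        if bit == 1:
            # Step 4: reset. The run that just ended becomes the new N_{-1}.
            if run > 0:
                n_prev = run
            n_est = float(n_prev)
            run = 0
            gammas.append(None)
            continue
        # Step 2: enlarge the estimate once the run reaches N_est - 1.
        if run > 0 and run >= n_est - 1:
            n_est *= rho
        # Step 3: interpolation weight used in equation (5).
        gammas.append(1.0 - run / n_est)
        run += 1
    return gammas


weights = gamma_schedule([0, 0, 0, 0, 1, 0, 0], n_prev=4)
```

With `n_prev = 4`, the weights fall from 1.0 towards `s_right` over the first three zeros. On the fourth zero the estimate is enlarged before the weight is computed, following the literal order of steps 2) and 3), so the weight steps back up rather than reaching zero.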


We call this the Speed Estimation Algorithm. In Section IV we evaluate its performance and compare it with three other algorithms that do not try to estimate the vehicle speed from the received $\mathrm{Indx}_n$ values.
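Before turning to the evaluation, the edge-side steps of Sections III-A and III-B can be sketched for a single road. This is a hedged illustration rather than the deployed code. It assumes points are already projected to planar metres, it replaces the database query of case a) with a search over one road's segments (`find_segment`), and it reads "not acute" as the angle at $s_{\mathrm{right}}$ being 90 degrees or more. All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple, Union

Point = Tuple[float, float]  # planar (x, y) in metres: an assumed local projection


def dist(p: Point, q: Point) -> float:
    """Euclidean distance d(p, q) used in condition (4)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def find_segment(p: Point, road: List[Point]) -> int:
    """Stand-in for the database query of case a): index of s_left on this road."""
    def excess(k: int) -> float:
        # Zero when p lies exactly on the segment between road[k] and road[k + 1].
        return dist(road[k], p) + dist(road[k + 1], p) - dist(road[k], road[k + 1])
    return min(range(len(road) - 1), key=excess)


def off_segment(p: Point, s_left: Point, s_right: Point, alpha: float) -> bool:
    """Condition (a) of (4): p is too far from the stored segment."""
    return dist(s_left, p) + dist(s_right, p) > alpha * dist(s_left, s_right)


def crossed_right(p: Point, s_left: Point, s_right: Point) -> bool:
    """Assumed reading of 'not acute': the angle at s_right is at least 90 degrees."""
    ax, ay = s_left[0] - s_right[0], s_left[1] - s_right[1]
    bx, by = p[0] - s_right[0], p[1] - s_right[1]
    return ax * bx + ay * by <= 0.0


def compress(points: List[Point], road: List[Point], alpha: float) -> List[Union[Point, int]]:
    """Emit T_n for each reading: the raw point (case a) or the bit Indx_n (case b)."""
    out: List[Union[Point, int]] = []
    k: Optional[int] = None  # index of s_left; None means no stored context yet
    for p in points:
        if k is None or off_segment(p, road[k], road[k + 1], alpha):
            k = find_segment(p, road)  # case a): recompute the context
            out.append(p)              # transmit T_n = p_n
        elif crossed_right(p, road[k], road[k + 1]) and k + 2 < len(road):
            k += 1                     # case b-i): shift to the next segment
            out.append(1)
        else:
            out.append(0)              # case b-ii): context unchanged
    return out


road = [(0, 0), (100, 0), (200, 0), (300, 0)]
trace = [(10, 2), (40, 2), (70, 2), (105, 2), (135, 2), (165, 2), (195, 2), (225, 2)]
sent = compress(trace, road, alpha=2.0)
```

In this toy trace only the first reading is sent as a raw point; every later reading becomes one bit, with a 1 each time the vehicle passes a waypoint. The value `alpha = 2.0` is chosen only so that the 25 m overshoot in the last sample does not trigger condition (a).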
IV. PERFORMANCE EVALUATION
We carried out field tests in the Bay Area, California to
evaluate the performance of our proposed algorithm. The tests
involved 10 cars each with its own specific route, driving
every day around the same time over a period of three
months. Each vehicle was fitted with an on-board, Linux-based
computing platform with a GPS receiver that generated a GPS
reading every 0.5 sec. The GPS trajectories had 2000 − 3000
points. The cloud was hosted in a local data center and the
connectivity between the vehicle edge and the cloud was
established through MQTT over an AT&T 4G-LTE network.
OpenStreetMap was used to generate $\mathcal{S}$. This was then loaded into PostgreSQL using its PostGIS extension to perform $C_p^{(n)}$ computations.

The following reconstruction algorithms were evaluated:

1) Speed Estimation: This was described in Section III-D1. The value of $\rho$ was optimized for a given route based on learning over data from multiple days.
2) Constant Interval: This assumes a fixed value of $N_{\mathrm{est}}$ and then performs steps 2) to 4) of the Speed Estimation Algorithm. This value of $N_{\mathrm{est}}$ was optimized for a given route based on learning over data from multiple days.
3) Fractional Interpolation: This heuristic computed

$$\gamma_n = \frac{1}{1 + \beta \Gamma(n)} \tag{6}$$

where the value of $\beta$ was optimized for a given route based on learning over data from multiple days.
4) Exponential Interpolation: This heuristic computed

$$\gamma_n = \exp\left(-\nu \Gamma(n)\right) \tag{7}$$

where the value of $\nu$ was optimized for a given route based on learning over data from multiple days.

The logic of the last two heuristics is that $\gamma_n$ should be a non-increasing function of $\Gamma(n)$. Figure 2 shows the performance of the Speed Estimation algorithm for a single drive instance. Only a portion of the total route is shown. We see that it provides a very close reconstruction of the original route. A single large error occurs in the entire route, which is highlighted. This happens because, at an intersection, the computation of $C_p^{(n)}$ is erroneous and the vehicle is wrongly thought to have moved to an intersecting road. As seen, this is soon corrected.

Fig. 2. Original (green) and reconstructed (red) routes by the Speed Estimation Algorithm.

We will now focus on the error in reconstruction, which is given by the Synchronous Euclidean Distance (SED) [7] between the original and reconstructed GPS trace. Since the compressed GPS trace is generated at a constant interval and no points are skipped, the SED reduces to the Euclidean Distance (ED) between the original and reconstructed GPS sequences. Figure 3 shows the performance of the four algorithms by plotting the mean and standard deviation of the ED sequence. As can be seen, the speed estimation algorithm gives the lowest error, which validates our proposition that the received quantized GPS information can be used to obtain an estimate of the vehicular speed, which can then be used for reconstruction.

Fig. 3. Mean and standard deviation of SED (in meters) for different reconstruction algorithms.

Upon investigation of our data, we found that the GPS receiver had a variable error of 5 to 15 meters in its readings. For example, $p_n$, the location of a vehicle that the GPS receiver at the edge measures, could be shifted by 15 meters from the road $S$ where the vehicle is actually located. This makes the error values in Figure 3 seem higher than they actually are. This is because the database $\mathcal{S}$ generated from OpenStreetMap has correct locations of GPS points for the road $S$ (such as points $s^{(n)}_{\mathrm{left}}$ and $s^{(n)}_{\mathrm{right}}$) and thus the

reconstructed GPS points at the cloud were located on $S$, unlike $p_n$. Thus our algorithm is in reality negating the effect of GPS measurement errors.
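The two interpolation heuristics in (6) and (7), the reconstruction rule (5) and the Euclidean-distance error statistic can be written down compactly. The sketch below is illustrative only: the function names are hypothetical, points are assumed to be in planar metres, and the population standard deviation is used because the paper does not say which estimator it reports.

```python
import math
from statistics import mean, pstdev
from typing import Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) in metres: an assumed local projection


def gamma_fractional(run: int, beta: float) -> float:
    """Equation (6): gamma_n = 1 / (1 + beta * Gamma(n))."""
    return 1.0 / (1.0 + beta * run)


def gamma_exponential(run: int, nu: float) -> float:
    """Equation (7): gamma_n = exp(-nu * Gamma(n))."""
    return math.exp(-nu * run)


def interpolate(s_left: Point, s_right: Point, gamma: float) -> Point:
    """Equation (5): p_hat = gamma * s_left + (1 - gamma) * s_right."""
    return (gamma * s_left[0] + (1.0 - gamma) * s_right[0],
            gamma * s_left[1] + (1.0 - gamma) * s_right[1])


def ed_stats(original: Sequence[Point], reconstructed: Sequence[Point]) -> Tuple[float, float]:
    """Mean and (population) standard deviation of the per-sample Euclidean distance."""
    errors = [math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(original, reconstructed)]
    return mean(errors), pstdev(errors)


p_hat = interpolate((0.0, 0.0), (100.0, 0.0), gamma=0.75)
stats = ed_stats([(0.0, 0.0), (10.0, 0.0)], [(3.0, 4.0), (10.0, 0.0)])
```

Both heuristics start at a weight of 1 (the estimate sits on `s_left`) and decay as the run of zeros grows, which is the non-increasing behaviour the text asks for.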
V. SECURITY IMPLICATIONS
Security is a major concern for all IoT applications. The location and mobility pattern of a person can be inferred from those of his or her vehicle. If raw GPS data is transmitted by a vehicle, it can be intercepted by a malicious entity, causing a breach of privacy and security. Our algorithm limits this risk because raw GPS data is not transmitted for most time instants. If an adversary intercepts the transmitted data at time $n$ and reads the value of $\mathrm{Indx}_n$, it is much harder to decode the actual location without knowing the past values and without having the GIS database $\mathcal{S}$. Thus our algorithm also achieves the secondary purpose of obfuscating (and in a sense encrypting) the transmitted location information.
VI. FOG COMPUTING BASED SOLUTION ARCHITECTURE

In this section, we argue that a system architecture to implement GPS compression efficiently needs fog computing principles. This is because, in order to perform the compression, the vehicular edge needs control information that is specific to the geographical region where it is located, and this changes over time as the vehicle traverses its path. Such localized control information can be best obtained from fog nodes that are located in that region and not from a centralized cloud. Consider the following examples.

1) Until now we have not specified the periodic time interval after which the edge transmits $T_n$. This can be configured to a fixed value, but it is more efficient if it is changed dynamically as per the local traffic conditions. For example, a local fog node can detect that the traffic is slow and instruct all vehicles in the region to increase the reporting interval, as the vehicles are unlikely to have moved much in a short time.
2) Conversely, the fog node may detect that the traffic is moving much faster than normal. It can instruct the vehicle to reduce the reporting interval. It can also instruct the vehicle to keep the same reporting interval but use a $K$-sampled version of $\mathcal{S}$, where the vehicle now uses a modified GIS database

$$\tilde{\mathcal{S}} = \left[\mathcal{S}[0], \mathcal{S}[K-1], \mathcal{S}[2K-1], \cdots\right], \tag{8}$$

instead of the original one in (1). This is because the vehicle is more likely to go past several intervals formed by consecutive points in $\mathcal{S}$ if these intervals are small.

Both of these are examples of local decisions that would be best enforced by a regional fog node rather than a centralized cloud.

The high-level functional architecture is shown in Figure 4. The Compression Rules function at the fog node is responsible for implementing the control plane functionalities that we have discussed above. The fog also informs the cloud of any changes in control plane rules, as these are needed for data reconstruction at the cloud. The cloud and the fog nodes also communicate via a higher level management plane to set long-term system level policies for the control plane (such as whether to reduce the reporting interval or use a $K$-sampled GIS database for high-speed traffic). Note that the data plane is defined only between the cloud and the edge and not for the fog. This is because, to maintain continuous data flow reconstruction when the vehicle moves from the region of one fog node to the next, the cloud is the natural anchor. To summarize, our architecture captures the different natures of the control and data planes and distributes their functionalities between the cloud, fog and edge.

Fig. 4. Fog computing based system architecture for implementing efficient GPS compression.

VII. CONCLUSION

In this paper we have considered a novel way to compress the location information of a vehicle by using the GIS context of the vehicular location and subsequently reconstructing it by estimating the vehicular speed from the compressed values. Our algorithm achieves a large compression ratio (32 : 1) with reasonable error performance. We have implemented this algorithm on an in-vehicle computing platform to test its validity. We have demonstrated that, in order to implement this algorithm in an IoT application, the system architecture should include fog computing for greater control plane efficiency to support the data plane traffic from the edge to the cloud.

REFERENCES

[1] Cisco, “Fog Computing and the Internet of Things: Extend the Cloud to where the things are,” 2015. [Online]. Available: http://www.cisco.com/c/dam/en_us/solutions/trends/iot/docs/computing-overview.pdf
[2] Comtech Telecommunications Corp., “Tomorrows connected car: Location aware,” White Paper, 2016.
[3] J. Muckell, P. W. O. Jr., J.-H. Hwang, C. T. Lawson, and S. S. Ravi, “Compression of trajectory data: a comprehensive evaluation and new approach,” Geoinformatica, July 2014.
[4] Hewlett Packard Enterprise, “Securing the internet of things,” 2015. [Online]. Available: https://www.hpe.com/h20195/V2/GetPDF.aspx/4AA6-3369ENW
[5] E. Crawford, T. Behan, and D. Kelly, “System and method for compressing GPS data,” US Patent 2015/293232 A1, 2015.

[6] M. Ingram and W. EE, “Communication system including telemetry device for a vehicle connected to a cloud service,” US Patent 2011/0234427 A1, 2011.
[7] J. Muckell, J.-H. Hwang, C. T. Lawson, and S. S. Ravi, “Algorithms for
compressing GPS trajectory data: An empirical evaluation,” In Proc.
of the 18th SIGSPATIAL International Conference on Advances in
Geographic Information Systems, January 2010.
[8] J. Birnbaum, H.-C. Meng, J.-H. Hwang, and C. Lawson, “Similarity-
based compression of GPS trajectory data,” In Proc. of the Fourth
International Conference on Computing for Geospatial Research and
Application, July 2013.
[9] X. Xu, X. Gao, X. Zhao, Z. Xu, and H. Chang, “A novel algorithm for
urban traffic congestion detection based on GPS data compression,” In
Proc. of the IEEE International Conference on Service Operations and
Logistics, and Informatics (SOLI), 2016.
[10] M. Chen, M. Xu, and P. Franti, “Compression of GPS trajectories,” In
Proc. of Data Compression Conference (DCC), April 2012.
[11] “Openstreetmap,” https://www.openstreetmap.org/.
