Project: ENGG1003 - Programming Assignment 1

GPS Data Analysis, October 6, 2021 ENGG1003- Introduction to Procedural Programming

ENGG1003 - Programming Assignment 1


GPS Data Analysis
Due date: 9:00am, Thursday 28 October 2021
Submission: Upload your final .py file to Blackboard as an assignment submission
Marking: During your Week 7 lab
Weighting: 20% of your final grade

1 Introduction
Modern fitness tracking apps such as Strava and Garmin Connect make extensive use of sensor data to
provide insights into a user’s exercise program. This project will concentrate on the analysis of GPS data
collected during a mountain bike ride.
The GPS track log is presented in a csv file containing timestamps, latitude, longitude, and elevation
columns recorded at 1 second intervals. In this project you will analyse this GPS data to create some
common analysis metrics provided by fitness tracking services. Some effort will also be made to interpolate
missing data in places where the GPS signal was lost.

2 Tasks
1. Open the provided .csv file in a text editor and take note of the column headings containing the
latitude, longitude, and elevation values for each GPS data point.
2. Write a Python script which opens the .csv file and loads the three columns into numpy arrays.
Note that adding .values to the variable returned by pd.read_csv() returns a numpy array.
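A minimal sketch of this step, assuming the column headings are lat, lon, and ele. These names are hypothetical, so use the headings you noted in Task 1. The sample rows are invented and stand in for the provided file.

```python
import io

import pandas as pd

# Invented stand-in for the real file, with hypothetical column headings.
# With the provided data, pass the file name to pd.read_csv() instead.
csv_text = """time,lat,lon,ele
2021-01-05 23:48:56+00:00,-33.0000,151.0000,50.0
2021-01-05 23:48:57+00:00,-32.9999,151.0000,50.5
2021-01-05 23:48:58+00:00,-32.9999,151.0001,50.25
"""

data = pd.read_csv(io.StringIO(csv_text))

# .values converts each pandas column into a numpy array
lat = data["lat"].values
lon = data["lon"].values
ele = data["ele"].values

print(type(lat), lat.shape)
```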
3. In the “random walk” lab task (Week 5 lab, Task 2) two numpy arrays were interpreted as lists of x
and y coordinates and plotted as such. Here you will do something similar. Although lat and lon are not
x and y coordinates (they are latitude and longitude measured in degrees), you can plot the “walk”
of the mountain bike ride in a similar way. Pay close attention to whether plot(lat,lon) or
plot(lon,lat) produces a plot in the standard map orientation of “North up” by checking that
the GPS data starts and finishes on the East side.
There is a large gap (missing GPS points as the bike travelled through a tunnel) in the data along
the Western edge. Zoom into the Western edge (left hand side) of your plot and confirm that you see
a similar gap as in Figure 1.

Figure 1: The largest gap in GPS data is along the Western edge due to a disused railway tunnel.

4. Read the “time” column into an array of strings (NB: the .values attribute will provide an
array of strings).
Print the first value and ensure that you get: 2021-01-05 23:48:56+00:00.
The format is: YYYY-MM-DD HH:MM:SS+<UTC Offset>.
The UTC offset is zero, indicating that the timestamps are in GMT.
5. From the time strings, create a numpy array of time spans, called timespan, where the value of each
element is the number of seconds which has passed since the previous GPS point. The points are
mostly one second apart so most of the entries in this array will be “1” but there are a few missing
data points so there are several entries where the time step between GPS points will be greater than
“1”.
To create this array you can use the following example as a template:
    import datetime

    # Convert each timestamp string to a datetime object
    time1Str = '2021-01-05 23:48:56+00:00'
    time1Obj = datetime.datetime.strptime(time1Str, '%Y-%m-%d %H:%M:%S+00:00')
    time2Str = '2021-01-06 00:49:57+00:00'
    time2Obj = datetime.datetime.strptime(time2Str, '%Y-%m-%d %H:%M:%S+00:00')
    # Subtracting two datetime objects gives the time between them
    timeApart = time2Obj - time1Obj
    timeDiff = timeApart.seconds

In this example time1Str is a string and the datetime.datetime.strptime() function converts
that string to a time “object”. Arithmetic can then be performed with these time objects. The result
of subtracting one time object from another contains a .seconds attribute.
Since there are several thousand GPS points you will need to perform processing in a loop to create
the full array.
Plot the timespan array using plot(timespan) from the matplotlib.pyplot library. If the
data were “complete” this would be a horizontal line, as all values would be “1”. However, since there are
gaps in this data (missing GPS points) you will see a few places where the timespan is greater than
“1”.
Print out the values of all of the time spans which are greater than one. There should be five of them.
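The loop described in this task might look like the sketch below. The four timestamps are invented, and the sketch assumes timespan holds one entry per pair of consecutive points so that it lines up with a distance array of the same length.

```python
import datetime

import numpy as np

# Invented sample timestamps; the third point is 4 s after the second
timeStr = np.array([
    "2021-01-05 23:48:56+00:00",
    "2021-01-05 23:48:57+00:00",
    "2021-01-05 23:49:01+00:00",
    "2021-01-05 23:49:02+00:00",
], dtype=object)

fmt = "%Y-%m-%d %H:%M:%S+00:00"

# One entry per gap between consecutive points (N points -> N-1 spans)
timespan = np.zeros(len(timeStr) - 1)
for i in range(1, len(timeStr)):
    previous = datetime.datetime.strptime(timeStr[i - 1], fmt)
    current = datetime.datetime.strptime(timeStr[i], fmt)
    timespan[i - 1] = (current - previous).seconds

print(timespan)                # [1. 4. 1.]
print(timespan[timespan > 1])  # [4.]
```

The last line uses a boolean mask to pick out only the spans greater than one second.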
6. Create an array called time where the value of each element is the number of seconds which has passed
since the first GPS point. You can do this in a similar way to Task 5, where the time apart is calculated
by subtracting the first time point from the current time, or you can use the timespan array directly
to calculate the time array.
Plot the time array using plot(time). You should see that in most places time will increase linearly
(increasing by one for each new point) except for the gaps found earlier where time will jump by more
than “1”.
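One way to build time directly from timespan is a cumulative sum, sketched here with invented values. The use of np.cumsum is a suggestion rather than a requirement of the task, and the sketch assumes timespan has one entry per pair of consecutive points.

```python
import numpy as np

timespan = np.array([1.0, 4.0, 1.0])  # invented values, in seconds

# The first point is at t = 0; each later point adds its time span
time = np.concatenate(([0.0], np.cumsum(timespan)))

print(time)  # [0. 1. 5. 6.]
```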
7. Create an array containing the distance travelled between each GPS point. To do this you will need
an equation which translates changes in latitude and longitude to distance.
There are many ways to do this, but you are given the choice between the following two options (worth
different marks):
• Using an equirectangular approximation (partial marks):

$$x = \Delta\lambda \cos\phi_m$$
$$y = \Delta\phi$$
$$d = R\sqrt{x^2 + y^2}$$

where:
– ϕm is “average latitude”. Use cos ϕm = 0.839
– ∆ϕ is change in latitude in radians
– ∆λ is change in longitude in radians
– R is the radius of Earth, 6371 km.
– d is the distance traveled. NB: This will have the same units as R!
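• Using the haversine formula (full marks):

$$a = \sin^2\left(\frac{\Delta\phi}{2}\right) + \cos\phi_1 \cos\phi_2 \sin^2\left(\frac{\Delta\lambda}{2}\right)$$
$$c = 2\,\operatorname{atan2}\left(\sqrt{a}, \sqrt{1 - a}\right)$$
$$d = Rc$$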

where:
– ∆ϕ is the change in latitude
– ϕ1 is the latitude of the first point
– ϕ2 is the latitude of the second point
– ∆λ is the change in longitude
– R is the radius of Earth, 6371 km
– atan2() is the “four quadrant” inverse-tan() function. Use numpy.arctan2().
– d is the distance traveled. NB: This will have the same units as R!
Note that:
• The array of distances will contain one fewer element than the array of points. Store the
distance between the 1st and 2nd points in the first element of this array, the distance between
the 2nd and 3rd points in the second element, and so on.
• The numpy trig functions take arguments in radians and the lat and lon are in degrees. You will
need to convert from degrees to radians first.
• Changes in elevation are ignored. Only calculate the “2D” distance.
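Both formulas can be written with numpy array operations, as in the sketch below. The function names and the three sample points are invented for illustration, and np.diff is used here as one possible way to form the changes between consecutive points.

```python
import numpy as np

R = 6371.0  # radius of Earth in km, so distances come out in km


def haversine_distances(lat_deg, lon_deg):
    """Distance in km between consecutive points (N points -> N-1 values)."""
    phi = np.radians(lat_deg)   # latitude in radians
    lam = np.radians(lon_deg)   # longitude in radians
    dphi = np.diff(phi)         # change in latitude
    dlam = np.diff(lam)         # change in longitude
    a = (np.sin(dphi / 2) ** 2
         + np.cos(phi[:-1]) * np.cos(phi[1:]) * np.sin(dlam / 2) ** 2)
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    return R * c


def equirectangular_distances(lat_deg, lon_deg, cos_phi_m=0.839):
    """Same distances using the equirectangular approximation."""
    x = np.diff(np.radians(lon_deg)) * cos_phi_m
    y = np.diff(np.radians(lat_deg))
    return R * np.sqrt(x ** 2 + y ** 2)


# Invented points: 0.001 degrees north, then 0.001 degrees east
lat = np.array([-33.000, -32.999, -32.999])
lon = np.array([151.000, 151.000, 151.001])

dHav = haversine_distances(lat, lon)
dEqu = equirectangular_distances(lat, lon)
print(dHav)
print(dEqu)
```

For these sample points the two methods agree to within a fraction of a metre, with each step being roughly 0.1 km.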
8. Create another array and store the average speed traveled between each GPS point. Note that

$$\text{speed between points} = \frac{\text{distance between points}}{\text{time between points}}$$

and you already have distance and timespan arrays. For full marks the speed must be calculated
as a single line of code operating on these arrays using numpy vectorisation. Using loops to create
the speed array can be done but will result in partial marks.
If you followed the steps above your timespan will be in seconds and your distance will be in kilometers
so your speed will be in kilometers per second. Convert your speed into km/h and plot this converted
speed. Use plot(speed) or plot(time,speed).
This was a mountain bike ride - the maximum speed will have been around 50 km/h. Note that
outliers are an indication that the time calculation is possibly incorrect - gaps in the GPS data should
still result in a correct average speed during the gap, not an outlier.
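A sketch of the vectorised calculation, using invented distance and timespan values of equal length. The factor of 3600 converts km/s to km/h.

```python
import numpy as np

distance = np.array([0.005, 0.020, 0.004])  # km, invented values
timespan = np.array([1.0, 4.0, 1.0])        # seconds, invented values

# Element-by-element division, then km/s -> km/h
speed = distance / timespan * 3600

print(speed)
```

Note that the second entry covers a 4 second gap, yet its average speed matches the first entry because the distance covered is four times larger.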
9. Given the elevation data, create an array of elevation changes between each data point where the
value of each element is the change in elevation since the previous GPS point.
Plot the elevation data.
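The change between consecutive elevation values can be sketched with np.diff, which is one option among several (a loop works as well). The elevation values below are invented.

```python
import numpy as np

ele = np.array([50.0, 50.5, 50.25, 52.0])  # metres, invented values

# eleChange[i] is ele[i + 1] - ele[i], so it has one fewer element than ele
eleChange = np.diff(ele)

print(eleChange)  # [ 0.5  -0.25  1.75]
```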
10. Filter the elevation data with the following code:
    import scipy.signal as sig
    [b, a] = sig.butter(2, 0.01)
    eleFiltered = sig.lfilter(b, a, ele)

Plot the raw and “filtered” data to observe the effect - the “noise” in the elevation data has been
removed and the data smoothed out.
11. (Difficult Question) Use linear interpolation to calculate new latitude, longitude, and elevation
arrays which have a value at every 1 second interval. This will create extra GPS points to fill in the
gaps of missing GPS points observed earlier.
Note that latitude, longitude, and elevation are three different functions of time. This implies that
the interpolation is performed on latitude, longitude, and elevation independently (ie: with three calls
to the scipy.interpolate.interp1d() function).
The output of the interpolation process will be a new array of GPS points without any gaps (ie: with
1 second between each point). Plot the latitude and longitude points as a “map” on top of the original
data, indicating the interpolated data with red X’s, ie: with plt.plot(...,...,'rx'). Zoom in
to the section on the Western edge, shown in Figure 1, to show that your interpolation worked.
From the interpolated GPS data, calculate a “complete” array of speed vs. time and plot it vs time
in seconds since the start of the GPS track. During the interpolated sections speed will be constant
(ie: a horizontal line).
From the interpolated GPS data, create a plot of filtered elevation vs time.
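The three independent interpolations might be sketched as follows, using an invented four-point track with a gap between t = 1 s and t = 5 s. The names timeFull, latFull, lonFull, eleFull, and isNew are hypothetical.

```python
import numpy as np
from scipy.interpolate import interp1d

# Invented track with a gap: no points recorded at t = 2, 3, 4 s
time = np.array([0.0, 1.0, 5.0, 6.0])
lat = np.array([-33.0000, -32.9999, -32.9995, -32.9994])
lon = np.array([151.0000, 151.0001, 151.0005, 151.0006])
ele = np.array([50.0, 51.0, 55.0, 56.0])

# One sample per second from the first to the last recorded time
timeFull = np.arange(time[0], time[-1] + 1)

# interp1d returns a function of time; linear interpolation is the default
latFull = interp1d(time, lat)(timeFull)
lonFull = interp1d(time, lon)(timeFull)
eleFull = interp1d(time, ele)(timeFull)

# True where a point was created by interpolation rather than recorded
isNew = ~np.isin(timeFull, time)

print(timeFull[isNew])  # [2. 3. 4.]
print(eleFull)          # [50. 51. 52. 53. 54. 55. 56.]
```

The isNew mask selects the points that could be drawn with red X's, for example by plotting lonFull[isNew] against latFull[isNew] in whichever order gave the “North up” orientation in Task 3.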