
A DATASET for GPS Spoofing Detection on Autonomous Vehicles

Authors
Ghilas Aissou, Selma Benouadah, Hassan El Alami, and Naima Kaabouch

Affiliations
School of Electrical Engineering and Computer Science, University of North Dakota, Grand Forks, ND 58202, USA

Corresponding author’s email address


ghilas.aissou@und.edu

Abstract
A dataset of Global Positioning System (GPS) spoofing attacks is presented in this article. This dataset includes data extracted
from authentic GPS signals collected from different locations to emulate a moving and a static autonomous vehicle using a universal
software radio peripheral (USRP) unit configured as a GPS receiver. During the data collection, 13 features were extracted from eight
parallel channels at different receiver stages (i.e., acquisition, tracking, and navigation decoding). In addition to the collected authentic
GPS signals, three types of GPS spoofing attack were simulated: simplistic, intermediate, and sophisticated. The resultant dataset
contains a total of 158,170 samples, of which 55% are legitimate instances and 45% correspond to the three types of
simulated GPS spoofing attacks, all in a balanced distribution. The data described and attached to this article can be used to
investigate the effect of the GPS spoofing attack on the extracted features and contribute to the development of GPS spoofing attack
detection techniques based on supervised and unsupervised machine learning.

Value of the Data


• The raw dataset is collected using real-time feature extraction from authentic GPS signals using a USRP unit configured as
an eight-channel GPS receiver emulating a real GPS receiver in an autonomous vehicle. The dataset includes the correlator's
amplitude at the time the Doppler shift is updated.
• Using the collected GPS signals, three spoofing attack types were simulated: simplistic, intermediate, and sophisticated
attacks. Simulation was chosen over live GPS spoofing attacks, which are illegal and risky; setting up an indoor
test bench was also dismissed because its results are often biased and incorrect.
• To ease robust training of supervised and unsupervised machine learning algorithms for detecting GPS spoofing
attacks, the 3D data from the eight parallel channels is converted into a 2D feature map.
• The three different attacks and legitimate signals can easily be separated from each other using the label field. In this way,
the dataset can be rearranged for binary classification with one attack at a time.
• The dataset is available in Excel format and can be converted to a comma-separated file format to help researchers easily
import the dataset into different research software platforms and programs.
• A separate Excel file containing the raw (original) GPS data is included for researchers to use to simulate customized GPS-
related attacks or for other uses.
• As civilian GPS transmissions are open and unencrypted, GPS spoofing attacks against GPS receivers are frequent and
dangerous. Researchers can utilize this dataset to study how different extracted features impact Artificial Intelligence (AI)
based models' ability to identify and categorize spoofed and real instances. The research based on this dataset will then help
the community develop robust countermeasures for GPS spoofing attacks.
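
The label-based rearrangement mentioned in the list above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the column name `label` and the encoding (`0` for legitimate, `1` to `3` for the three attack types) are assumptions, and the rows are made-up stand-ins for a CSV export of the dataset.

```python
import csv
import io

# Hypothetical label encoding; check the actual values used in the dataset files.
LEGITIMATE = "0"

def binary_subset(rows, attack_label, label_field="label"):
    """Keep legitimate rows plus one attack type, relabelled 0 (legitimate) / 1 (attack)."""
    subset = []
    for row in rows:
        if row[label_field] == LEGITIMATE:
            subset.append({**row, label_field: 0})
        elif row[label_field] == attack_label:
            subset.append({**row, label_field: 1})
    return subset

# Tiny in-memory stand-in for the CSV export (made-up values, not real data).
demo_csv = io.StringIO(
    "PRN,DO,label\n"
    "5,1200.5,0\n"
    "7,-830.0,1\n"
    "9,455.2,2\n"
    "12,90.1,3\n"
)
rows = list(csv.DictReader(demo_csv))
legit_vs_intermediate = binary_subset(rows, attack_label="2")
```

Calling `binary_subset` once per attack label yields three separate binary problems from the same file.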

Data Description

1.1 Data Collection


The data was collected using a GPS receiver in two different scenarios. The first is a driving test at speeds ranging from 0 to 60 mph
with an average speed of 45 mph. The second scenario consists of three stationary positions at different altitudes. It is worth
mentioning that the vertical GPS position error is larger than the horizontal error, since a sufficient satellite spread is only possible
in the horizontal plane [1].
Figure 1: Two scenarios of GPS signal collection, a static GPS receiver in three different locations
(on the right), and a driving scenario (on the left)

The data is available in Excel files in two formats: an 8-channel format and a readable (simple/efficient) format. The 8-channel format
consists of 158,170 data samples with 13 features from eight channels, represented as a 3-dimensional expansion of the channels, as
illustrated in Fig. 2. The extracted features and their corresponding stage of extraction are described in Table 2. The extraction process
is done in parallel in each of the eight channels.

Figure 2: Parallel feature extraction from an 8-channel GPS receiver.

The readable Excel format is a conversion of the time series output data of the eight channels of the receiver into a two-dimensional
(2D) feature map. The zero values in the eight-channel dataset reflect the time spans during which a channel is
not locked onto a GPS signal, so all these values are eliminated in the simplified version.
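
The conversion described above can be sketched as follows. The article does not give the exact procedure, so this is one plausible reading under stated assumptions: the data is indexed as time step, then channel, then feature, and a channel row is dropped when all of its features are zero. The numbers are made up.

```python
def flatten_channels(cube):
    """Convert [time][channel][feature] data into a 2D list of feature rows.

    Rows whose features are all zero (channel not locked) are dropped.
    """
    feature_map = []
    for time_step in cube:
        for channel_row in time_step:
            if any(value != 0 for value in channel_row):
                feature_map.append(list(channel_row))
    return feature_map

# Two time steps, three channels, two features (made-up numbers).
cube = [
    [[5, 1200.5], [0, 0], [9, 455.2]],
    [[5, 1201.0], [7, -830.0], [0, 0]],
]
feature_map = flatten_channels(cube)
```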

Table 2: EXTRACTED FEATURES

Feature name                                        Receiver stage
Satellite Vehicle Number (PRN)                      Post-correlation
Carrier Doppler in Hz (DO)                          Post-correlation
Pseudo-range in meters (PD)                         Post-correlation
Receiver Time (RX)                                  Post-correlation
Time of the Week in seconds (TOW)                   Post-correlation
Carrier Phase Cycles (CP)                           Post-correlation
Magnitude of the Early Correlator (EC)              During correlation
Magnitude of the Late Correlator (LC)               During correlation
Magnitude of the Prompt Correlator (PC)             During correlation
Prompt In-phase Component (PIP)                     During correlation
Prompt Quadrature Component (PQP)                   During correlation
Carrier Doppler in Tracking Loop in Hz (TCD)        During correlation
Carrier to Noise Ratio in dB-Hz (C/N0)              Pre-correlation

1.2 Features description
PRN: The GPS constellation contains 27 operational satellites, each identified by a unique identification number.
DO: The Doppler shift is the result of the satellite and receiver motion. The DO is expressed as the frequency drift between the sent
frequency and received frequency of the GPS signal, as described in (1).
$$f = \left(\frac{c + v_r}{c + v_s}\right) f_i \qquad (1)$$

Where $v_r$ is the receiver speed, $v_s$ is the satellite speed [2], $c$ is the speed of light, and $f_i$ is the sent frequency. The Doppler
offset is within ±5 kHz, with a rate of change of ±20 Hz, depending on the receiver configuration.
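
As a worked illustration of equation (1), the sketch below evaluates the Doppler offset for the GPS L1 carrier (1575.42 MHz). The velocities are line-of-sight components, and the sign convention and the example values are assumptions made for illustration; the article does not state them.

```python
C = 299_792_458.0   # speed of light in m/s
F_L1 = 1_575.42e6   # GPS L1 carrier frequency in Hz

def received_frequency(f_i, v_r, v_s):
    """Equation (1): f = ((c + v_r) / (c + v_s)) * f_i."""
    return (C + v_r) / (C + v_s) * f_i

def doppler_offset(f_i, v_r, v_s):
    """Frequency drift between the received and the sent frequency, in Hz."""
    return received_frequency(f_i, v_r, v_s) - f_i

# No relative motion: no offset.
static_case = doppler_offset(F_L1, v_r=0.0, v_s=0.0)

# Illustrative line-of-sight satellite velocity of -800 m/s (assumed value).
moving_case = doppler_offset(F_L1, v_r=0.0, v_s=-800.0)
```

With these assumed velocities the offset is on the order of 4 kHz, which falls inside the ±5 kHz range quoted above.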
PD: The pseudo-range is derived from the time difference between the transmission and reception times. Expressed in meters, it refers to
the distance between the receiver and the satellite. PD is given by (2).
$$P_s = c\,(t_r - t_s) \qquad (2)$$
Where $t_r$ is the reception time, $t_s$ is the transmission time, and $c$ is the speed of light.
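
A short numerical illustration of equation (2) follows. The 70 ms travel time is an assumed value chosen for illustration, not a measurement from the dataset.

```python
C = 299_792_458.0  # speed of light in m/s

def pseudorange_m(t_r, t_s):
    """Equation (2): P_s = c * (t_r - t_s), times in seconds, result in meters."""
    return C * (t_r - t_s)

# Illustrative travel time of 70 ms (not taken from the dataset).
p_s = pseudorange_m(t_r=100.070, t_s=100.000)
```

A 70 ms difference corresponds to roughly 21,000 km, so timing errors of a few nanoseconds already translate into meter-level range errors.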
TOW: The number of seconds elapsed since the start of the week, as given by the satellite atomic clock. When TOW reaches the
end of the week, which is 604,799 seconds, it resets to zero.
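
Because TOW wraps at the week boundary, differences between TOW values need modular arithmetic. The helper names below are hypothetical and the snippet is only a sketch of that bookkeeping.

```python
SECONDS_PER_WEEK = 604_800  # TOW runs from 0 to 604,799

def advance_tow(tow, elapsed_seconds):
    """Advance a time-of-week value, wrapping to zero at the week boundary."""
    return (tow + elapsed_seconds) % SECONDS_PER_WEEK

def tow_difference(tow_later, tow_earlier):
    """Elapsed seconds between two TOW values taken less than one week apart."""
    return (tow_later - tow_earlier) % SECONDS_PER_WEEK
```

Without the modulo, two samples taken a few seconds apart across the rollover would appear to be almost a week apart.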
RX: The receiver time, given in seconds after the start of the TOW.
CP: The phase of the beat frequency between the received carrier of the satellite signal and a receiver-generated reference
frequency, given in cycles [3]. The CP is consistent with the code phase over any time interval, as shown in (3).
$$\omega_c \left(\tau(t_m) - \tau(t_n)\right) = \phi(t_m) - \phi(t_n) \qquad (3)$$
Where $\omega_c$ is the nominal carrier frequency, $\tau(t_m)$ is the code phase at time $t_m$, and $\phi(t_m)$ is the carrier phase at time $t_m$.
PC: The PC measurement is made by C/A code tracking with early and late correlators at 1-chip spacing, using a replica of the
C/A code. The PC is within one chip of the incoming satellite signal code and is positioned between the early and late correlators (EC and
LC). During the correlation process, the correlator keeps the PC halfway between the EC and LC [4]. Because the correlator
amplitude fluctuates in response to receiver and satellite motion, the correlator's output is extracted only at the time the code phase and
carrier phase are updated and validated in the tracking loops and the data is forwarded to the observables block, as shown in Fig. 3.

Figure 3: The PRN start sample plot versus the correlator data points.

EC: The Early Correlator is at 1/2 chip spacing before the prompt correlator.
LC: The Late Correlator is at 1/2 chip spacing after the prompt correlator.
PIP: It is the in-phase component of the Prompt correlator amplitude.
PQP: It is the quadrature component of the prompt correlator amplitude.
PC can be expressed in terms of PIP and PQP component as shown in the following equation:
$$PC = \sqrt{PIP^2 + PQP^2} \qquad (4)$$
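
Equation (4) also gives a simple consistency check between the three prompt-correlator features. The sketch below uses made-up values, and the function names are hypothetical.

```python
import math

def prompt_magnitude(pip, pqp):
    """Equation (4): PC = sqrt(PIP^2 + PQP^2)."""
    return math.hypot(pip, pqp)

def pc_residual(pc, pip, pqp):
    """Difference between a reported PC and the value implied by PIP and PQP."""
    return pc - prompt_magnitude(pip, pqp)

pc = prompt_magnitude(3.0, 4.0)
```

A residual far from zero for a given row would indicate that the PC, PIP, and PQP columns are not mutually consistent for that sample.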
TCD: The Doppler shift estimated in the tracking loops. Each value is passed through a threshold filter before being forwarded to the
observables block.
C/N0: The ratio of the received carrier power to the received noise power density.

2. Experimental design, materials and methods


The hardware used in the implementation consists of a USRP, a front-end active GPS antenna with right-hand circular
polarization and 27 dBi gain, and a laptop with an Intel Core i5-4300U processor and 8 GB of RAM running Ubuntu 16.04.7 LTS, as
shown in Fig. 4. The USRP is an N200 model with a motherboard built around a Xilinx Spartan 3A-DSP 1800 FPGA, a 100 MS/s dual ADC,
a 400 MS/s dual DAC, and Gigabit Ethernet connectivity. The FPGA motherboard is coupled with a WBX 50-2200 MHz transceiver
as a daughterboard. The WBX transceiver is suitable for land-mobile communication, maritime, and aviation applications.

Figure 4: Field experiment hardware.

The software used is GNSS-SDR, an open-source Global Navigation Satellite Systems software-defined receiver. It is based on
GNU Radio, an open-source framework for SDR applications. Its design accommodates all types of
customization, including the exchangeability of signal sources and signal processing algorithms. It also enables the modification
of output formats, the selection of signal processing techniques, and the creation of interfaces for all signals, parameters, and
intermediate variables. These capabilities are used for real-time feature extraction in various signal processing blocks.

Figure 5: GPS signal collection experiment design.

Feature extraction is performed at different receiver stages, starting from the pre-correlation stage and going through the delay lock
loop and phase lock loop correlation stages, as illustrated in Fig. 5. The extraction process ends in the post-correlation stage, in the
observables block.

3. GPS spoofing attack simulation


Orchestrating over-the-air live GPS spoofing attacks is dangerous and illegal. Moreover, building an indoor testbed for radio
frequency tests is unrealistic and may lead to biased results [5]. For these reasons, three types of spoofing attacks (simplistic,
intermediate, and sophisticated), distinguished by their complexity and sophistication, were simulated using MATLAB. The simulation
process, illustrated in the flowchart in Fig. 6, involves the introduction of random variables with a high degree of statistical variance to
reduce the chance of overfitting and biased results [5] [6].
At the receiver side, the GPS signal received from the satellite is expressed in equation (5).

$$S(t) = A_s\, D(t - \tau_s)\, x(t - \tau_s) \cos\left(2\pi (f_l + f_d)\, t + \phi\right) \qquad (5)$$

Where $A_s$ is the amplitude of the received signal, $D(t)$ is the data bit stream, $x(t)$ is the spreading code, $\tau_s$ is the time of flight, known
as the code phase, $f_l$ is the nominal carrier frequency, $f_d$ is the Doppler shift, and $\phi$ is the carrier phase shift.
In the case of a spoofing attack, the spoofer sends N signals that can be represented as in (6).
$$S_s(t) = \sum_{i=1}^{N} A_{ss_i}\, D_{s_i}(t - \tau_{s_i})\, x_i(t - \tau_{s_i}) \cos\left(2\pi (f_l + f_{d_i})\, t + \phi_{s_i}\right) \qquad (6)$$
Where $A_{ss_i}$ is the amplitude of the spoofed signal, and $\tau_{s_i}$, $f_{d_i}$, and $\phi_{s_i}$ are the induced code phase, Doppler shift, and carrier phase,
respectively.
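
Equations (5) and (6) can be evaluated sample by sample as sketched below. This is a simplified illustration rather than the authors' MATLAB simulation: the caller supplies the already-delayed data bit and code chip as ±1 values instead of modelling the delays, and all numerical values are made up.

```python
import math

def gps_signal_sample(t, amplitude, data_bit, code_chip, f_l, f_d, phase):
    """Equation (5) at one instant.

    data_bit stands for D(t - tau) and code_chip for x(t - tau), each +1 or -1.
    """
    carrier = math.cos(2 * math.pi * (f_l + f_d) * t + phase)
    return amplitude * data_bit * code_chip * carrier

def spoofed_signal_sample(t, components, f_l):
    """Equation (6): sum over N spoofed components.

    Each component is a tuple (amplitude, data_bit, code_chip, f_d, phase).
    """
    return sum(
        gps_signal_sample(t, a, d, x, f_l, f_d, ph)
        for (a, d, x, f_d, ph) in components
    )

F_L1 = 1_575.42e6  # GPS L1 carrier frequency in Hz
authentic = gps_signal_sample(0.0, 2.0, 1, -1, F_L1, 1000.0, 0.0)
spoofed = spoofed_signal_sample(
    0.0,
    [(1.5, 1, 1, 1200.0, 0.0), (0.5, -1, 1, -900.0, 0.0)],
    F_L1,
)
```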

In a simplistic spoofing attack, the spoofed signal is transmitted at a high power level, $A_{ss_i} > A_s$, resulting in an abrupt increase in the
C/N0. The spoofed signal is unsynchronized with the real GPS signal, which initially causes a loss of lock and a code phase, carrier
phase, and Doppler shift that do not match those of the real signal ($\tau_{s_i} \neq \tau_s$, $\phi_{s_i} \neq \phi$, and $f_{d_i} \neq f_d$). In addition, the requirement for
consistent code phase and Doppler shift in the generated signals, such that $\tau_{s_i} = \tau_{s_{i+1}}$ and $f_{d_i} = f_{d_{i+1}}$, introduces fluctuations as the spoofer
tries to match the two values, such as a time change much greater than 100 ns and a Doppler shift rate that exceeds the normal range of ±20 Hz [3].
For these reasons, close monitoring of the signal power, Doppler shift, or code phase can efficiently detect the presence of a simplistic
spoofing attack.
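
The monitoring idea in this paragraph can be sketched as a simple threshold check. This is not the detection method of the article: it assumes the ±20 Hz range applies to the Doppler change between consecutive samples, and the C/N0 jump threshold is a hypothetical value chosen for illustration.

```python
DOPPLER_STEP_LIMIT_HZ = 20.0  # normal range quoted in the text
CN0_JUMP_LIMIT_DB = 6.0       # hypothetical threshold, not from the article

def flag_simplistic(doppler_hz, cn0_dbhz):
    """Flag samples whose Doppler step or C/N0 increase exceeds the limits."""
    flags = [False]  # the first sample has no predecessor to compare with
    for k in range(1, len(doppler_hz)):
        doppler_jump = abs(doppler_hz[k] - doppler_hz[k - 1]) > DOPPLER_STEP_LIMIT_HZ
        power_jump = (cn0_dbhz[k] - cn0_dbhz[k - 1]) > CN0_JUMP_LIMIT_DB
        flags.append(doppler_jump or power_jump)
    return flags

# Made-up series with an abrupt change at index 3.
doppler = [1200.0, 1201.5, 1203.0, 1450.0, 1451.0]
cn0 = [42.0, 42.5, 42.2, 51.0, 50.8]
flags = flag_simplistic(doppler, cn0)
```

Fixed thresholds like these are only a baseline; the dataset is intended for learning such decision boundaries from the features instead.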

Figure 6: Flowchart of GPS spoofing attack simulation.

During intermediate spoofing attacks, the spoofer is assumed to know the target's exact position and to have a good estimate of its
velocity. The attacker aligns the code phase (to within at least half a code chip) and the Doppler shift with the legitimate signal,
$\tau_{s_i} \simeq \tau_s$ and $f_{d_i} \simeq f_d$, with a certain level of uncertainty depending on the receiver configuration. The signal power $A_{ss_i}$ is kept close to
the authentic signal power $A_s$ (i.e., $A_{ss_i} \approx A_s$). In this way, the Doppler and code phase show no anomalies. However, the first attempt
to synchronize the code phase $\tau_{s_i}$ and Doppler $f_{d_i}$ with the authentic signal causes distortions in the delay and phase lock tracking
loop values EC, LC, and PC, which can be seen as sudden jumps in $\frac{dEC}{dt}$, $\frac{dLC}{dt}$, and $\frac{dPC}{dt}$. Once the attacker's signal locks
with the target's receiver, the correlator distortion disappears, and the attacker starts the lift-off process to induce the desired
changes in the Doppler and code phase of the signal. In such an instance, the receiver reference clock can be used for a TOW check [2] [7].
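
The sudden jumps in the time derivatives of EC, LC, and PC can be looked for with a finite-difference check, sketched below. The median-based threshold and the factor of 5 are assumptions chosen for illustration, and the correlator values are made up.

```python
import statistics

def finite_difference(values, dt=1.0):
    """Approximate d/dt between consecutive samples."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]

def sudden_jumps(values, dt=1.0, factor=5.0):
    """Indices k where |d/dt| between samples k and k+1 exceeds factor x the median |d/dt|."""
    rates = [abs(r) for r in finite_difference(values, dt)]
    typical = statistics.median(rates)
    return [k for k, r in enumerate(rates) if r > factor * typical]

# Made-up prompt correlator magnitudes with a distortion between samples 3 and 4.
pc_series = [1000.0, 1004.0, 998.0, 1003.0, 1400.0, 1396.0, 1401.0]
jump_indices = sudden_jumps(pc_series)
```

The same function can be applied to the EC and LC columns of a channel to see whether all three correlators jump at the same time.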

In sophisticated attacks, multiple synchronized transmitters are used to transmit both jamming (nulling) and spoofing signals at
different arrival angles, emulating the GPS constellation. As in an intermediate attack, the attacker is assumed to know the exact target
position and velocity; in addition, the attacker is assumed to know the phase center of the target's antenna [3]. Due to the complexity of
this attack, most detection techniques, including angle-of-arrival techniques, are unable to detect it. Observing
distortions in the correlation loop can be a reliable way to detect these attacks; however, this technique sometimes flags
multipath signals as spoofing attacks, which raises the false alarm rate [8]. In this work, we inserted distortions and changes in the
peak-to-next-peak ratio in the correlation loops in multiple parallel channels, and we introduced a quadrature accumulation shift in the
correlator (PQP), emulating a real spoofing attack using multiple transmitters [9], as shown in Fig. 7.

Figure 7: In-phase versus quadrature components of the authentic signal (left), and of the authentic signal with quadrature
accumulation during spoofing (red scatter).
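
A quadrature accumulation shift of the kind described above can be illustrated as follows. The article does not specify how the shift was generated in MATLAB, so the Gaussian shift, its parameters, and all sample values here are assumptions.

```python
import math
import random

def inject_quadrature_shift(pip, pqp, shift_mean, shift_std, rng):
    """Return (spoofed PQP, recomputed PC) after adding a random shift to each PQP value."""
    spoofed_pqp = [q + rng.gauss(shift_mean, shift_std) for q in pqp]
    spoofed_pc = [math.hypot(i, q) for i, q in zip(pip, spoofed_pqp)]
    return spoofed_pqp, spoofed_pc

rng = random.Random(0)  # seeded so the sketch is repeatable
authentic_pip = [4000.0, 4100.0, 3950.0]
authentic_pqp = [50.0, -30.0, 10.0]
spoofed_pqp, spoofed_pc = inject_quadrature_shift(
    authentic_pip, authentic_pqp, shift_mean=1500.0, shift_std=200.0, rng=rng
)
```

In an in-phase versus quadrature scatter plot, the shifted samples move away from the in-phase axis, which is the effect Fig. 7 illustrates.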

References
[1] B. A. Renfro, M. Stein, N. Boeker, A. Terry, An analysis of Global Positioning System (GPS) Standard Positioning
Service (SPS) performance for 2017, 2018. See https://www.gps.gov/systems/gps/performance/2014-GPS-SPS-performance-analysis.pdf.
[2] A. Ranganathan, H. Ólafsdóttir, S. Capkun, SPREE: A spoofing resistant GPS receiver, in: Proceedings of the
22nd Annual International Conference on Mobile Computing and Networking, 2016, pp. 348–360.
[3] M. L. Psiaki, T. E. Humphreys, GNSS spoofing and detection, Proceedings of the IEEE 104 (6) (2016) 1258–1270.
[4] J. J. Spilker Jr, P. Axelrad, B. W. Parkinson, P. Enge, Global positioning system: theory and applications, volume I, American
Institute of Aeronautics and Astronautics, 1996.
[5] G. Panice et al., A SVM-based detection approach for GPS spoofing attacks to UAV, in: 2017 23rd International Conference on
Automation and Computing (ICAC), 2017, pp. 1–11, doi: 10.23919/IConAC.2017.8081999.
[6] J. R. van der Merwe, A. Nikolikj, S. Kram, I. Lukcin, G. Nadzinski, A. Rügamer, W. Felber, Blind
spoofing detection for multi-antenna snapshot receivers using machine-learning techniques, in: Proceedings of the 33rd
International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2020), 2020, pp. 3294–3312.
[7] C. J. Wullems, A spoofing detection method for civilian L1 GPS and the E1-B Galileo safety of life service, IEEE Transactions on
Aerospace and Electronic Systems 48 (4) (2012) 2849–2864.
[8] M. Turner, S. Wimbush, C. Enneking, A. Konovaltsev, Spoofing detection by distortion of the correlation function, in: 2020
IEEE/ION Position, Location and Navigation Symposium (PLANS), IEEE, 2020, pp. 566–574.
[9] K. D. Wesson, J. N. Gross, T. E. Humphreys, B. L. Evans, GNSS signal authentication via power and distortion monitoring, IEEE
Transactions on Aerospace and Electronic Systems 54 (2) (2017) 739–75
