
Lane-level trajectory reconstruction based on data-fusion

Mohammad Ali Arman a, Chris M.J. Tampère a

a Centre for Industrial Management, Traffic and Infrastructure; KU Leuven; Celestijnenlaan 300; 3001 Leuven, Belgium

Abstract

While lane-changing movements are performed on the entire motorway network for various reasons, such as
overtaking, their intensity is substantially greater near complex segments, such as weaving areas. Aside from mandatory
lane changes, some drivers also conduct lateral maneuvers for cooperation or anticipation near network nodes. Compared with
longitudinal driver behavior (car-following models), lateral driver behavior (lane-changing movements) has received less
research attention. The scarcity of suitable data for analyzing these behaviors and movements may be an important cause
of this research gap. This paper presents a four-step approach for reconstructing and correcting lateral bias in trajectories
collected by a commercial traffic information application running on everyday smartphones. The resulting lateral position
is accurate enough to allow for identification of the driving lane, and thus, the lane changes. The algorithm's core is built
on a data fusion method using trajectory and loop detector data. The evaluation and validation of the proposed algorithm
using drones and closed-circuit television (CCTV) data demonstrate that the core of the algorithm can correctly match
more than 94% of trajectory and loop detector data. Between each pair of successive detector stations, the lateral position
error has been significantly corrected and reduced to less than half the width of a standard lane of motorway networks.
As a result, more than 90% of processed trajectory sample points are in the correct lane. The algorithm requires just two
calibration parameters, so it is relatively simple to apply to other test networks.

Key words: Trajectory reconstruction, Driving lane identification, Data-fusion, Lateral Position Correction, Microscopic traffic
studies.

1. Introduction

Motorway networks are designed to move traffic flow uninterruptedly at high speed and volume. At network nodes where
traffic merges and diverges from the mainstream, the concentration of lane changes can disturb the flow. Although lane-changing
maneuvers occur across the entire network, for instance, for overtaking, the intensity of these maneuvers is much higher in
complex segments, such as weaving sections. In such sections, apart from mandatory lane changes, other drivers perform lateral
maneuvers for the purpose of cooperation or anticipation. Drivers who make lane changes might also change their speed and
headway, causing heterogeneity within and across lanes, also referred to as turbulence (van Beinum et al., 2018). Because a
vehicle temporarily occupies two lanes during a lane change, complex, turbulent motorway sections typically exhibit reduced
capacity, a decrease in traffic speed, congestion spillback, and decreased safety levels.
While many studies have been devoted to the longitudinal behavior of drivers (car-following models), lateral behavior and
lane-changing maneuvers have received much less attention. A plausible reason for this is a lack of data (Keyvan-Ekbatani et al.,
2016). To study lateral driver behavior, researchers typically use trajectory data. To investigate lane-change maneuvers, trajectory
data must be accurate to the extent that the driving lane can be identified. Also, the driving characteristics of the vehicle that
changes lanes, such as its speed profile, as well as the local traffic conditions in the origin and destination lanes, are essential for
the analysis (Ahmed et al., 2022). The usual way to collect trajectory data for studying lateral behavior is through image processing
of video recordings (or, in older research, serial aerial photography). This method suffers from at least four drawbacks: 1) the
cost of image processing is high, as this task can be mostly automated but typically requires some supervision and validation by
a human operator; 2) even after spending this cost, the data set is usually limited by the camera vision: with a few exceptions, all
available datasets cover segments in the order of hundreds of meters over a limited time (a few hours); 3) when trying to extend
the coverage to longer road sections using wide lens cameras, the problem of images' perspective will make vehicle characteristics
like speed and acceleration unreliable, especially at the edges of the image. While this can be partly addressed by recording video
with consecutive cameras, this makes processing more complex and more costly, as it raises technical issues in
video transition and overlap; 4) finally, because of the relatively short sections, the data sets contain just local fragments of a
vehicle's trajectory, while lane-changing maneuvers involve a decision-making process with decisions being made over time as a
series of interrelated decisions (e.g., depending on where one entered the road and planned to leave). If the analyst could observe
the trajectory of a vehicle over multiple kilometers, including several mandatory and/or discretionary lateral maneuvers, this would
significantly increase the richness of the study and provide better support for decisions on how to improve traffic management.

Corresponding author. Tel.: +32 16 32 16 73.
E-mail addresses: mohammadali.arman@kuleuven.be (M.A. Arman), chris.tampere@kuleuven.be (C. M.J. Tampère)

This paper presents an algorithm to correct the bias (especially in lateral position) in trajectory data collected by smartphones
based on a data-fusion of the GPS trajectories with loop detector data. The result will be reconstructed trajectories in which the
vehicles' driving lanes can be identified with high reliability. The algorithm consists of four steps. In the first and second step,
qualified trajectories are selected, and minor revisions are applied. The third step is the core of the algorithm and fuses the
smartphone trajectory data with individual loop detector data. Trajectory data includes the time, geographical location, and speed
of a vehicle for all timestamps along the vehicle's route. The loop detector data includes the passage time, the speed, the headway,
and the length of the vehicle in disaggregate format (i.e., per individual vehicle). While the trajectories only exist for vehicles
carrying an active smartphone app, the loop detector data is registered for all vehicles in the flow and for each lane separately.
The data-fusion algorithm determines which loop detector registration on which lane matches the passage of each GPS trajectory,
allowing us to infer an approximate lateral position that turns out, as we show in this paper, in general, more accurate than the
original lateral position in the GPS signal. Matching criteria for the fusion are the similarity of the passage time, speed, and vehicle
length. As a result of the fusion step, for each trajectory, we have as many points in which we are confident about the driving lane
as there are loop detector stations along the path. The fourth step of the proposed algorithm involves the reconstruction and thus
reducing the bias of trajectory sample points between each two successive loop detector stations based on two forward and
backward modifications. The data used in this study was collected on a test site around the Antwerp ring in Belgium. We used a
probe vehicle equipped with a differential-GPS (d-GPS) device to develop models and evaluate the results. Also, the results have
been validated using two sources of videos recorded by drones and road CCTVs in two different sections in the test network. The
validations show that our algorithm has been effective in reducing the lateral bias of the GPS trajectory data of smartphones
enough to determine the driving lane with high reliability. To the best of our knowledge, it is the first time that fusion of GPS
trajectories from regular smartphones and loop detector data was used to reduce lateral error of the GPS recordings to the extent
that the driving lane can be reliably determined, which opens new research avenues. Among the most paramount applications, we
suggest the development and calibration of lane-change models, calibration of traffic microsimulation software, theoretical and
practical analysis of lateral traffic operations, especially in complex motorway network corridors, and safety analysis.
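
As a rough illustration of the fusion idea outlined above, the following Python sketch picks, among the loop detector registrations at one station, the one closest in passage time and speed to a trajectory's estimated passage. It is a minimal sketch under stated assumptions, not the matching rule detailed in Section 4.3: the function name `best_match`, the tuple layout, the tolerances `max_dt` and `max_dv`, and the equally weighted cost are all hypothetical, and vehicle length is omitted for brevity.

```python
def best_match(est_time, est_speed, registrations, max_dt=2.0, max_dv=3.0):
    """Pick the loop registration closest to an estimated trajectory passage.

    registrations: iterable of (lane, passage_time_s, speed_mps) tuples
    recorded at one detector station (hypothetical layout).
    max_dt, max_dv: illustrative tolerances in seconds and m/s, not the
    calibration parameters of the paper.
    Returns the best (lane, passage_time_s, speed_mps) tuple, or None.
    """
    best, best_cost = None, float("inf")
    for lane, t, v in registrations:
        dt = abs(t - est_time)
        dv = abs(v - est_speed)
        if dt > max_dt or dv > max_dv:
            continue  # outside the tolerance window: not a candidate
        cost = dt / max_dt + dv / max_dv  # normalized, equally weighted
        if cost < best_cost:
            best, best_cost = (lane, t, v), cost
    return best


# Three registrations at one station, one per lane (made-up values).
registrations = [(1, 100.0, 30.0), (2, 100.4, 25.0), (3, 103.0, 25.0)]
match = best_match(est_time=100.5, est_speed=25.5, registrations=registrations)
print(match)
```

The lane of the returned registration would then serve as the inferred driving lane at that station; returning `None` when no candidate falls inside the tolerances avoids forcing a match for a passage that no detector plausibly registered.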
This paper is organized as follows: Section 2 reviews the literature on the processing of trajectory data. Section 3 describes
the test network, data collection, and various data sources used in this paper. Section 4 describes the methodology of
the proposed algorithm in detail. Section 5 presents the results, validation, and discussion. Finally, Section 6
concludes the paper.

2. Literature Review

To the best of our knowledge, this paper is the first successful attempt to correct the bias of trajectories recorded by
smartphone GPS by fusion of these trajectories with loop detector data such that the driving lane can be identified. Therefore, we
cannot provide a comprehensive literature review specifically on this topic. Given that our data-fusion intends to provide data for
the investigation of lane change behavior, such as studies carried out by (Gurupackiam and Jones Jr, 2012; Jin et al., 2019; Knoop
et al., 2012a; Marczak et al., 2016), we will review the broader literature in the following order. We will first briefly review studies
on trajectory data for microscopic traffic state studies, including lane change analysis. Then, we review studies proposing error
corrections on trajectory data. And finally, we discuss some literature on data-fusion techniques in the field of traffic operations
observation and intelligent transportation systems (ITS).

The first attempt to collect trajectory data was to study the hysteresis phenomenon. Data collection was done through aerial
photography and manually extracting trajectories from photos (Treiterer and Myers, 1974). Although some other efforts have been
made to provide this type of data, the release of the next generation simulation (NGSIM) trajectory dataset (Kovvali et al., 2007)
was a milestone in this field that sparked many studies on both car following and lane changing behavior (despite many criticisms
of its accuracy (Coifman and Li, 2017)). The interested reader can find a comprehensive list of traffic studies based on trajectory
data in (Li et al., 2020). In Table 1, we summarize the information of some trajectory databases available in the literature. Except
for the first two datasets in the table, all the others have been derived via video recording and image processing.
Table 1
A short list of some of the available trajectory datasets.

| Authors | Country - region | Length of the road segment | Dataset duration | Types of road segments | Public access |
|---|---|---|---|---|---|
| (Treiterer and Myers, 1974) | USA - Columbus, Ohio | Approximately 366 meters | 150 minutes | Only the median lane | - |
| (Hoogendoorn et al., 2003) | The Netherlands - Utrecht | Maximum 520 meters | Various | Different motorway sites including merging, diverging and weaving sections | - |
| NGSIM (Kovvali et al., 2007) | USA - California | Maximum 640 meters | Eight videos, each for 15 minutes | One signalized arterial and two motorways including merging, diverging and weaving sections | https://ops.fhwa.dot.gov/trafficanalysistools/ngsim.htm |
| (Shariat et al., 2011) | Iran - Tehran | Each one approximately 320 meters | Three videos, each 2 hours | A merging, a diverging and a weaving section | - |
| (Asaithambi and Basheer, 2017; Kanagaraj et al., 2015) | India - Chennai | 245 meters | 5.5 hours (not continuously) | A separated two-lane two-way segment without any merging or diverging | https://toledo.net.technion.ac.il/mixed-traffic-trajectory-data/ |
| (van Beinum et al., 2018) | The Netherlands - various sites | Different lengths from 210 to 1100 meters | Several videos, each for 30 minutes | Different motorway sites including merging, diverging and weaving sections | - |
| (Raju et al., 2018) | India - Mumbai | 120 meters | 12 hours (not continuously) | Ten-lane road segment with mixed traffic without any merging or diverging | - |
| (Kaufmann et al., 2018) | Germany - Düsseldorf | 600 meters | - | A signalized street | - |
| (Krajewski et al., 2018) | Germany - Cologne | 420 meters | 60 videos, each on average for 17 minutes | Six different locations, all simple multilane motorways | https://www.highd-dataset.com/ |
| (Zhan et al., 2019) | Various sites in USA, China, Germany and Bulgaria | Maximum 170 meters | 11 videos with lengths between 24 and 260 minutes | Several roundabouts, signalized and unsignalized intersections, and merging and lane change sections | https://interaction-dataset.com/ |
| pNEUMA (Barmpounakis and Geroliminis, 2020) | Greece - Athens | 1.3 square kilometers | 10 drones, four days; each day includes five videos, each for a maximum of 20 minutes | Network of the central business district with around 100 intersections | https://open-traffic.epfl.ch/ |
| (Liu et al., 2021) | Suzhou, Nanjing and Huaian - China | 4 sites, maximum 250 meters | Maximum 30 minutes | Four different expressways | - |

All the datasets in Table 1 (and many other trajectory datasets) are subject to the four drawbacks listed in the introduction
of this paper. Also, even when all the limitations and expenses of collecting trajectory data with the current methods are accepted, researchers
have no control over road conditions during the data collection campaign. The occurrence of a traffic accident can threaten the
usefulness of the data collected for study purposes; or, inversely, if the objective is to observe incidental conditions, the analyst
depends on coincidence to observe such events in the limited spatiotemporal scope. What is lacking is a method for cheaply
collecting larger trajectory data sets (both in space and time) with sufficient accuracy for studying the lateral behavior of drivers.
This method should be suitable for long-term data collection from a long-distance road network. In addition, this method should
allow researchers to check the history of traffic state and accidents in the study network, and if they encounter an attractive case
study, they should be able to reconstruct the relevant data.
Two existing studies gathered trajectories that continue in time and over a long road length (a few kilometers). One of them
collects data based on an architecture consisting of dome cameras with V2X communication (every 500 meters) and fixed cameras
(every 100 meters) (Passchier et al., 2013). The second employs millimeter-wave radar sensors deployed at 200-500 meters (Wang
et al., 2022). Although these methods can collect high-quality and long-coverage trajectory data, their installation is very
expensive.
Another way to collect trajectory data is to use a probe vehicle equipped with multiple sensors such as a panoramic camera,
LIDAR, etc. (Chen et al., 2019b; Yao et al., 2016; Yao et al., 2012; Zhao et al., 2016). Although such a probe vehicle can collect
data over any desired path length, this method only stores data of vehicles in the sensor range and is thus limited to the immediate
surroundings of the probe vehicle.

According to a definition proposed in (Schuessler and Axhausen, 2009), all actions taken to prepare trajectory data for traffic
and transportation studies, pattern identification, and traffic state analysis can be considered pre-processing. Pre-processing
measures can include outlier and anomaly detection (cleaning), offline resampling, filtering, smoothing, data fusion with other
sources, etc. A proposed classification of pre-processing measures is provided in Fig. 1. Although these pre-processing measures
will increase the quality of trajectory data, the resulting accuracy still does not allow detection of the driving lane. In fact, these
methods are suitable for minor corrections or applications in macroscopic traffic studies.

Fig. 1. Classification of pre-processing measures to improve the quality of trajectory data for usage in traffic studies.

Trajectory data pre-processing, including cleaning, segmentation, completion, calibration, and sampling, is, in most literature,
the first step of several applications of trajectory data (Feng and Zhu, 2016). It has been shown that pre-processing measures
improve the actual analyses of the trajectory data (Alvares et al., 2009); this can include: speed filtering, correcting temporal order,
pruning multiple points with the same timestamp, and filtering for a minimum number of points in each trajectory. Outlier
detection and anomaly removal can include identifying outlier trajectory sample points on a path or identifying an entire path as
an anomaly. In this situation, the quality of a trajectory path is so low that detecting and removing some of the sample points is
no longer effective. Among the first category is a study proposing an incremental local outlier factor algorithm to identify outliers
in trajectory data streams (Pokrajac et al., 2007). In another study, an anomaly detection algorithm for application on trajectory
paths was proposed based on partitioning trajectories and then finding anomaly points based on successive distance and density
of points (Lee et al., 2008). Other methods for detecting anomaly points in trajectories are incremental clustering based on the
successive windows where the next point is expected to locate (Bu et al., 2009) or detecting stop points in trajectory streams as
outliers (Yuan et al., 2013). Lee and Krumm showed that using filters such as mean and median filtering, the Kalman filter, and
the particle filter can improve the accuracy of low-precision trajectories like those collected with cellphones (Lee and Krumm,
2011). Moreover, regression analysis is used to approximate and correct trajectory errors (Zhou et al., 2016). Pulshashi et al.
compare t-fixed partition and k-ahead artificial arcs smoothing methods on effective detection and revision of outlier trajectory
points (Pulshashi et al., 2018). Some studies suggest using stochastic processes such as Markov chains to estimate noise and
uncertainty in the position trajectory sample points (Emrich et al., 2012; Niedermayer et al., 2013).
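
The basic cleaning measures mentioned above (correcting temporal order, pruning multiple points with the same timestamp, speed filtering, and requiring a minimum number of points) can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure of any of the cited studies: the function name `clean_trajectory`, the `(t, x, y, speed)` tuple layout, and the default values of `max_speed` and `min_points` are assumptions made for the example.

```python
def clean_trajectory(points, max_speed=70.0, min_points=5):
    """Apply four simple pre-processing measures to one trajectory.

    points: list of (t_seconds, x, y, speed_mps) tuples (assumed layout).
    max_speed, min_points: illustrative thresholds, not values from the
    literature reviewed here.
    Returns the cleaned list, or an empty list if too few points remain.
    """
    # 1) Correct temporal order (sorted() is stable, so ties keep input order).
    ordered = sorted(points, key=lambda p: p[0])

    # 2) Prune multiple points with the same timestamp (keep the first one).
    deduped = []
    for p in ordered:
        if deduped and p[0] == deduped[-1][0]:
            continue
        deduped.append(p)

    # 3) Speed filtering: drop points with implausible recorded speeds.
    filtered = [p for p in deduped if 0.0 <= p[3] <= max_speed]

    # 4) Require a minimum number of points per trajectory.
    return filtered if len(filtered) >= min_points else []


raw = [(2, 0, 0, 30.0), (1, 0, 0, 29.0), (2, 0, 0, 31.0),
       (3, 0, 0, 200.0), (4, 0, 0, 28.0)]
print(clean_trajectory(raw, min_points=3))
```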

To reconstruct trajectory data, various methods have been proposed, such as generic filtering, data-fusion, and filters based
on car-following models. Data-fusion is a widely used method that integrates information from different data sources to produce
more accurate, consistent, and effective data (Hall and Llinas, 1997; Llinas and Hall, 1998). This technique has many applications
in transportation and traffic engineering, in traffic accident detection based on the fusion of simulated probe vehicle and loop
detector data (Dia and Thomas, 2011), intelligent transportation systems (ITS) (Faouzi et al., 2011), demand prediction for taxis
(Rodrigues et al., 2019), smart cities (Lau et al., 2019), traffic flow prediction (Xie et al., 2020), and traffic congestion mitigation
(Lai et al., 2020). To reconstruct trajectories, data fusion using a particle filter was used to estimate the origin-destination patterns
based on automatic license plate recognition data (Rao et al., 2018). In another application of data-fusion with particle filters, data
obtained from automatic vehicle identification (AVI) and loop detectors were used to reconstruct the trajectories. This study used
PTV-VISSIM to generate trajectories and validate the results, which show more than 90% accuracy; however, this was limited to
longitudinal path information only (Feng et al., 2015). Three information sources, including loop detector data, traffic control
data, and travel time information, were fused, and rules of a microscopic traffic flow model have been used to reconstruct the
trajectory of vehicles in the signalized arterials based on the particle filter method (Xie et al., 2018). In a similar traffic context,
particle filters were used with travel time data and speed limits of the arterials to weight particles (Wei et al., 2020). Another study
proposed an algorithm that reconstructs trajectory data based on loop detector data for the purpose of travel time estimation (Ni
and Wang, 2008). Based on the data collected by mobile traffic sensors, two models, optimization-based and delay-based, were
presented that reconstruct the trajectory of all vehicles in signalized intersections (Sun and Ban, 2013). Also, based on the ping-
pong effect, the travel time information of smartphone antennas was used to reconstruct the trajectory path (Vajakas et al., 2015).
Moreover, a method called contact-enhanced trajectory reconstruction has been introduced to complete missing parts of sparse
mobile phone trajectory data and reconstruct their trajectories with complete and highly accurate longitudinal information of the
path (Chen et al., 2018, 2019a). With the aim of safety analysis of left turn at signalized intersections, a Kalman filter-based
method has been proposed that can reconstruct the trajectories. This method is only applicable to connected vehicles as it requires
vehicle kinematics information through the onboard system (Ma and Zhu, 2021). Finally, a technique has been proposed to
reconstruct all trajectories in traffic flow using the mobile sensing data obtained from connected automated vehicles; however,
this technique is limited to single-lane flow only (Wang et al., 2020). In reviewing the literature, some may encounter studies with
the keyword of trajectory reconstruction that are intended to reduce noise in the trajectories of recorded videos; this topic is
completely outside the scope of this paper and has not been studied.

Although all the above-mentioned studies provide effective methods, within their own application and purpose, to
reconstruct and improve the accuracy of trajectories, none of them addresses the research gap targeted in this paper: none
provides a reproducible method for fully automated (and thus cheap, even for large data sets over long time
periods) reconstruction of trajectories over long distances with an accuracy that allows driving lane detection.

Compared to the existing literature, this paper contributes the following innovations: a method for sufficiently accurate, fully
automated reconstruction of trajectories collected by smartphone GPS so that the bias in lateral position reduces to less than half
the width of a standard motorway lane. In this way, it is possible to provide accurate trajectory data in high volume and for long
distances. As a result, it is possible to analyze the drivers' lane selection and lane-changing behavior due to changes in the traffic
situation of different network segments. This is an achievement that has never existed in traffic engineering studies, enabling more
extensive empirical analysis of lateral behavior, which is a research gap that is quite noticeable, especially when compared to the
vast body of literature on longitudinal driver behavior.

3. Test network and data collection

The method proposed in this paper uses the individual vehicle registrations by double inductive loop detectors and trajectory
data stored by smartphones. In addition, to serve as ground truth in the validation of our methods, three more data sources were
used: trajectory data of a differential-GPS device (d-GPS) mounted on a probe vehicle, videos recorded by a drone's camera, and
road CCTVs. All datasets were collected from a test network near Antwerp, Belgium. Our test network consists of the E313 and
R1 motorways between junctions Wommelgem and Antwerpen-Zuid in both driving directions, passing over an important
interchange Antwerpen-Oost and a busy 5-lane weaving section near Berchem-Borgerhout.

The trajectory data was collected through passive observation of existing users of the Touring Mobilis, RTL Traffic, and
Flitsmeister smartphone applications of Be-Mobile commercial traffic service provider, available for both iOS and Android
operating systems. To this end, a geofence was defined around the motorway network (shown in Fig. 2, where the geofence is
highlighted through a transparent blue mask). Upon entry of the geofence, positioning data is recorded at a frequency of 1 Hz
for each vehicle in which a user has one of Be-Mobile's applications installed. Because the drivers
(or passengers) are entirely anonymous and uninformed (other than their general consent, upon installation of the application, to
collect and exchange anonymized data), neither driving instructions nor supplementary antennas or
other sensors for the vehicles were provided. Thus, drivers are not influenced by the passive observation and decide independently on their driving
lanes and maneuvers such as lane changes and overtaking. Users may place their smartphones anywhere in their vehicle,
as they choose. Trajectory data does not include the type and length of the vehicles. However, because of the diversity
of Be-Mobile's users, the data set does not only include passenger cars but rather a (possibly disproportionate) sample of
different vehicle types. Trajectory data is sent online from the smartphone application to the Be-Mobile server and is available to
the research team for analysis with a delay of a few hours. The trajectory information includes a dedicated unique ID for each vehicle
(smartphone), a timestamp in local time (Central European Time (CET)), the vehicle's spatial location as latitude
and longitude in the WGS84 coordinate system (EPSG: 4326), and speed and heading. Note that the location accuracy is
unknown and varies broadly between trajectories depending on the smartphone GPS receiver quality, internal filtering, placement
inside the vehicle, interference with other signals, atmospheric conditions, multipath reflections, etc. On weekdays,
trajectories of, on average, 4200 vehicles are registered per day. Data storage started in December 2018 and continued until
October 2021. Trajectory data stored on September 30 and November 20, 2019, was used to develop and evaluate the model. For
validation, trajectory data of 20 and 22 November 2019 and 23 and 25 June 2021 have been used.
On average, there is a double inductive loop detector station every 400 meters across the test network as well as on all on-
ramps and off-ramps, and each lane is equipped with a loop pair. The individual vehicle data of the loop detectors used in this
paper include the passage time, passage speed, headway, and vehicle length for every individual vehicle that passed and is stored
in a disaggregated format. The position of the loop detectors in the test network is shown in Fig. 2 using red circles.
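
To make the two input sources concrete, the sketch below models one smartphone trajectory sample and one individual loop detector registration as Python records, and groups registrations per station and lane. The field lists follow the descriptions above, but the class and field names, the units, and the helper `group_by_station_and_lane` are assumptions chosen for illustration; they do not reflect the actual schema of the Be-Mobile or loop detector databases.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrajectoryPoint:
    """One 1 Hz smartphone sample (hypothetical field names and units)."""
    vehicle_id: str    # anonymized unique ID of the smartphone
    timestamp: float   # seconds, local time
    latitude: float    # degrees, WGS84
    longitude: float   # degrees, WGS84
    speed: float       # m/s (unit assumed)
    heading: float     # degrees clockwise from north (convention assumed)


@dataclass(frozen=True)
class LoopPassage:
    """One individual vehicle registration at a double loop (hypothetical)."""
    station_id: str
    lane: int
    passage_time: float    # seconds, same clock as the trajectories
    speed: float           # m/s (unit assumed)
    headway: float         # seconds
    vehicle_length: float  # meters


def group_by_station_and_lane(passages):
    """Return {(station_id, lane): [LoopPassage, ...]} sorted by passage time."""
    grouped = defaultdict(list)
    for p in passages:
        grouped[(p.station_id, p.lane)].append(p)
    for key in grouped:
        grouped[key].sort(key=lambda p: p.passage_time)
    return dict(grouped)
```

Keeping the registrations indexed per station and lane reflects the fact that each lane has its own loop pair, so every registration already carries an unambiguous lane label.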

We set up a data collection experiment to have a ground truth for evaluating the expected results. We used a probe vehicle
equipped with a d-GPS device in that experiment. Also, nine smartphones were placed in the probe vehicle simultaneously so that
we could compare the accuracy of the trajectory data recorded by the smartphone application with ground truth. Data collection
by probe vehicle was repeated for two days (September 30 and November 20, 2019), each for about 3 hours. For technical
reasons, some of the smartphones did not collect GPS data on all of the probe vehicle's trips inside the geofence. As a
result, in total, we made 31 d-GPS trajectories (trips of the probe vehicle inside the geofence), producing 212 smartphone
trajectories through the geofence that, together, passed 2262 times over loop detector positions.

Finally, we used image processing of videos recorded by a drone's camera and road CCTVs to validate the results of the
proposed algorithm. The videos were recorded by the drone's camera on November 20 and 22, 2019 during the afternoon peak
period, for thirty minutes each day. A segment approximately 400 meters long is visible in the recorded videos. This segment of
the test site is highlighted in Fig. 2 with a transparent red mask. All vehicle trajectories were extracted from the videos using
commercial video processing software (Adamec et al., 2020). Finally, video recorded by the road CCTVs has been used to provide
more data for validation purposes. The videos are for two days, June 23 and 25, 2021, each for six consecutive hours starting from
5:00 AM. The position of the CCTVs and the segment of the test network covered by them are highlighted in Fig. 2 with two
camera signs and two orange boxes. CCTVs cover two consecutive detector stations as well as a merging on-ramp in North to
South driving direction in a segment of the network approximately 310 meters long. The two cameras overlap a few meters. This
information was used to validate the reconstructed trajectories.

Fig. 2. The data collection site (the area covered by the transparent blue mask shows the geofence range) (colorful in print)

4. Methodology

The proposed algorithm of this paper consists of 4 steps, which are described below in detail. Fig. 3 shows the overall
flowchart of the proposed algorithm.
Fig. 3. The overall flowchart of the proposed algorithm

4.1. Step one: trajectory selection and map-matching

The first step of the proposed algorithm consists of two actions: trajectory selection and map-matching. Initially, the algorithm
considers three criteria to determine if a trajectory qualifies for the application of the proposed method.
Sometimes trajectory data is recorded with significant temporal and/or spatial interruptions. Numerous causes, such as
the smartphone disconnecting from the GPS satellites, may contribute to
this error. If such an interruption occurred near the positions of the loop detectors, the quality of the results of the proposed
algorithm would be significantly negatively affected. Consequently, we decided to exclude trajectory data with an interruption
longer than a threshold from further processing. The thresholds are a temporal interruption of more than 10 seconds and a spatial
interruption longer than the distance that the vehicle travels in 10 seconds at the average trajectory speed.
These thresholds are somewhat arbitrary; we chose them based on an inspection of our database. The
examination of the final results of the proposed algorithm showed that the algorithm can effectively reduce the error due to
interruptions of 5 seconds or less (or their spatial equivalent), and that trajectory paths with interruptions of 5 to 10 seconds (or their
spatial equivalent) still contain valuable information in their non-interrupted sections. This threshold may change
depending on the method and quality of trajectory data collection, road network geometry, and some other criteria.
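
The interruption criterion described above can be sketched as a simple check over consecutive sample points. This is a minimal sketch under several assumptions that are not stated in the text: points are given as time-ordered `(t, x, y, speed)` tuples in a projected metric coordinate system, the "average trajectory speed" is taken as the mean of the recorded speeds, and a trajectory is flagged when either the temporal or the spatial threshold is exceeded. The function name `has_long_interruption` is hypothetical.

```python
import math


def has_long_interruption(points, max_gap_s=10.0):
    """Flag a trajectory with a gap above the temporal or spatial threshold.

    points: time-ordered list of (t_seconds, x_m, y_m, speed_mps) tuples in
    a projected (metric) coordinate system (assumed layout).
    The spatial threshold is the distance covered in max_gap_s seconds at
    the mean recorded speed of the trajectory (interpretation assumed).
    """
    if len(points) < 2:
        return False
    mean_speed = sum(p[3] for p in points) / len(points)
    max_gap_m = mean_speed * max_gap_s
    for a, b in zip(points, points[1:]):
        dt = b[0] - a[0]
        dist = math.hypot(b[1] - a[1], b[2] - a[2])
        if dt > max_gap_s or dist > max_gap_m:
            return True
    return False


# A 12-second hole between the second and third sample is flagged.
gappy = [(0, 0.0, 0.0, 25.0), (1, 25.0, 0.0, 25.0), (13, 325.0, 0.0, 25.0)]
print(has_long_interruption(gappy))
```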
The core of the proposed algorithm in this paper is based on matching trajectory data and disaggregated loop detector data.
Before applying the algorithm, the lateral and longitudinal error of the trajectory data may be up to 15 meters. Considering the
data collected by the probe vehicle with the d-GPS and the smartphones simultaneously (explained in Section 3), and assuming the
position recorded by the d-GPS as ground truth, the distribution of longitudinal and lateral GPS error of smartphones is shown in
Fig. 4. The lateral error is significant, and it is not possible to determine the driving lane without correcting it. However, taking
into account the speed of the vehicles, the time equivalent of the longitudinal error is of an order of magnitude that still enables the
implementation of the proposed algorithm (more details in Section 4.3 and Fig. 7). Because of these errors, the trajectory path
itself cannot be used to determine the positions of the loop detectors through which the vehicle has passed. As a result, a
map-matched version of trajectory paths is essential. Consequently, the second criterion for a trajectory path to qualify for the
proposed algorithm is that it is possible to apply a map-matching algorithm to them. A coarse mapping is sufficient here because
the only function of the map-matched trajectory is to determine which detector locations (not lane-specific) are crossed by the
trajectory while passing through the corridor. Our test network is a size-limited motorway network without any parallel routes.
Therefore, the best map-matching algorithm here is the fastest one that can work well on low-precision GPS data. In this paper,
we have used the algorithm proposed by Quddus et al. (2006). This method works well on GPS data collected with
low-accuracy devices such as smartphones. Compared with other map-matching methods, it produces fast results with a
high rate of correct outputs (Quddus et al., 2007).
Fig. 4. The lateral and longitudinal error of the smartphone trajectories compared to the d-GPS as ground truth (colorful in print)

Finally, considering the logic of the proposed algorithm (which will be described in detail in section 4.3), the trajectory path
should be long enough to pass over at least two loop detectors. We found that approximately 92% of all trajectory paths in our
database are qualified for applying the proposed algorithm.

4.2. Step two: correction of minor trajectory errors

The second step of the proposed algorithm applies some minor corrections to the trajectory paths selected in the first step.
This step itself consists of four tasks. Each task corresponds to identifying and correcting a specific type of minor error in the
trajectory paths. Several factors can cause errors in the trajectory data. Sensor noise, multi-path signal reflection, signal blocking,
the limited number of visible satellites as well as the effects of the atmosphere and ionosphere are the main causes of these errors
(Merry and Bettinger, 2019), but incorrect filtering of the GPS signal by the simple multi-purpose filters in the smartphone
can also cause minor errors. In many cases, these errors are easily detected and corrected; such errors are the subject of this
step. Although they may seem trivial, neglecting to correct them seriously harms the quality of the final result.
The first task in this step is to complete trajectory paths with interruption intervals. Some momentary issues, as discussed in
section 4.1, can cause a temporary interruption in the continuous recording of spatial data. By considering each trajectory path as
a time series, it is quite easy to detect interruptions. Correcting this error is also straightforward via interpolation.
Interpolation could be done in one of three ways: 1) linear in position, 2) linear in heading and speed of the trajectory
sample points before and after the interruption interval, and 3) interpolation based on the curvature of the road centerline
(obtained from map-matching). Fig. 5 provides a schematic comparison of the results of these three interpolation methods. The first
technique yields a straight line for missing segments, while the third method is hampered by the fact that most trajectories
are not offsets of the road centerline. Consequently, the second method is selected for application in this paper. In this method,
proportional to the number of missing trajectory sample points, new sample points are estimated based on the heading of the last
point before the missing segment in the forward direction. The same action is repeated in the reverse direction from the first point
after the missing segment backward. The final interpolated points are the weighted average of these two forward and backward
estimations. The weight is derived linearly based on the distance between each missing point and the last and first known points
before and after the missing segment.
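The selected interpolation scheme can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `interpolate_gap`, the fixed sampling interval `dt`, the heading convention (radians, counter-clockwise from the x-axis), and the use of the sample index as a proxy for distance in the linear weights are all assumptions of this sketch.

```python
import math

def interpolate_gap(p_before, p_after, n_missing, dt=1.0):
    """Fill n_missing samples between two known samples.

    Each known sample is (x, y, heading_rad, speed_mps) in a metric
    coordinate system. Returns a list of (x, y) estimates.
    """
    x0, y0, h0, v0 = p_before
    x1, y1, h1, v1 = p_after
    filled = []
    for k in range(1, n_missing + 1):
        # Forward estimate: dead-reckon k steps from the last known point.
        fx = x0 + v0 * dt * k * math.cos(h0)
        fy = y0 + v0 * dt * k * math.sin(h0)
        # Backward estimate: dead-reckon back from the first point after the gap.
        steps_back = n_missing + 1 - k
        bx = x1 - v1 * dt * steps_back * math.cos(h1)
        by = y1 - v1 * dt * steps_back * math.sin(h1)
        # Linear weights: the nearer known point dominates the average.
        w_back = k / (n_missing + 1)
        w_fwd = 1.0 - w_back
        filled.append((w_fwd * fx + w_back * bx, w_fwd * fy + w_back * by))
    return filled
```

For a vehicle driving straight at constant speed, the forward and backward estimates coincide, so the weighted average simply reproduces the straight path; the two estimates differ only when heading or speed changes across the gap.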

Fig. 5. Comparison of three proposed interpolation methods for completing interrupted trajectory paths (colorful in print)

For all the qualified trajectories, the coordinate system has been converted to Belgian Lambert 72 (EPSG: 31370), a metric
system suitable for the computations of the proposed algorithm. In addition, the speed, acceleration, heading, and
turning angle of each trajectory sample point (at each timestamp) are calculated based on the sequence of points. As a result,
the trajectory stream of each vehicle is a sequence of 6-tuples, one for each timestamp.
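As a hedged illustration of how these quantities can be derived from the point sequence, the following Python sketch uses simple backward differences on projected (metric) coordinates. The function names, the dictionary layout, and the choice of backward differences are assumptions of this sketch; the paper does not specify the exact formulas.

```python
import math

def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def derive_kinematics(samples):
    """samples: list of (t, x, y) with strictly increasing t (seconds, meters).

    Returns one dict per sample; quantities that cannot be computed yet
    (at the first one or two samples) are left as None.
    """
    out = []
    prev_speed = prev_heading = None
    for i, (t, x, y) in enumerate(samples):
        speed = accel = heading = turn = None
        if i > 0:
            t0, x0, y0 = samples[i - 1]
            dt = t - t0
            dx, dy = x - x0, y - y0
            speed = math.hypot(dx, dy) / dt      # m/s
            heading = math.atan2(dy, dx)         # rad, CCW from x-axis
            if prev_speed is not None:
                accel = (speed - prev_speed) / dt            # m/s^2
                turn = wrap_angle(heading - prev_heading)    # rad
        out.append({"t": t, "x": x, "y": y, "speed": speed,
                    "accel": accel, "heading": heading, "turn": turn})
        prev_speed, prev_heading = speed, heading
    return out
```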
The second task in this step is to correct the error due to sharp deceleration of the vehicle. Smartphones refine the spatial
information received from the GPS satellites with the help of an internal filter. This simple filter is usually based on a linear system
model with constant speed. When the driver suddenly brakes sharply, the filter introduces an error rather than
correcting the GPS information. First, it estimates the location of the next point assuming constant speed (while the speed actually
decreases; hence, the filter introduces bias by moving the point further forward than the actual movement). Next, the filter tries to
compensate for its initial error as new GPS position samples suggest that the vehicle's actual position is behind the previously
filtered position. However, sometimes the deceleration applied by the driver is so sharp that it leads to a considerable error, and the
filter compensates for it by a backward movement (recording the point at timestamp $t$ behind the point recorded at timestamp
$t-1$). An example of this error is shown in Fig. 6 part (a). The process of detecting and correcting this error is presented in
Pseudocode 1.
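The detection of such backward movements can be sketched in Python as follows. This is only an illustrative sketch: the detection rule (a negative displacement along the previous heading) and the correction (holding the last accepted position) are assumptions of this example and are not taken from Pseudocode 1.

```python
import math

def fix_backward_jumps(points, headings):
    """points: list of (x, y) in meters; headings: travel direction (rad,
    CCW from x-axis) at each point.

    Returns (fixed_points, flagged_indices). A point is flagged when it lies
    behind the last accepted point along the direction of travel.
    """
    fixed = [points[0]]
    flagged = []
    for i in range(1, len(points)):
        px, py = fixed[-1]
        x, y = points[i]
        h = headings[i - 1]
        # Signed displacement projected on the direction of travel.
        along = (x - px) * math.cos(h) + (y - py) * math.sin(h)
        if along < 0:
            flagged.append(i)
            fixed.append((px, py))   # assumed correction: hold position
        else:
            fixed.append((x, y))
    return fixed, flagged
```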
Occasionally, a vehicle may be stopped for a few seconds, e.g., due to traffic jams in its path. The GPS receiver and internal
filter of smartphones record such a case as a cluster of points remarkably close to each other, suggesting local erratic motion that
is actually GPS positioning error and filtering noise. An instance of this error is shown in Fig. 6, part (b). Such clusters could
negatively affect the result of the fourth step of the proposed algorithm. The process of detecting and correcting this error is
presented in Pseudocode 2.
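A possible treatment of such stop clusters is sketched below in Python. The clustering rule, the `radius` and `min_len` parameters, and the replacement of each cluster by its centroid are assumptions chosen for illustration; they are not taken from Pseudocode 2.

```python
import math

def collapse_stop_clusters(points, radius=2.0, min_len=3):
    """Replace runs of near-coincident points by their centroid.

    points: list of (x, y) in meters. A run is a maximal sequence of
    consecutive points within `radius` of the run's first point; runs with
    at least `min_len` points are treated as a standstill.
    """
    out = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        while j < n and math.dist(points[i], points[j]) <= radius:
            j += 1
        run = points[i:j]
        if len(run) >= min_len:
            cx = sum(p[0] for p in run) / len(run)
            cy = sum(p[1] for p in run) / len(run)
            out.extend([(cx, cy)] * len(run))   # keep one sample per timestamp
        else:
            out.extend(run)
        i = j
    return out
```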
Finally, the last task in this step is identifying and correcting zig-zag movements. A zig-zag movement is entirely the
result of the smartphone's relatively low GPS accuracy. Considering the vehicle's kinematic model, a vehicle cannot perform
such maneuvers in a few seconds. Examples of this error are shown in Fig. 6 part (c). The process of detecting and correcting this
error is presented in Pseudocode 3.

Fig. 6. Types of minor errors that are identified and corrected in step two of the proposed algorithm (all parts of this figure have been
drawn in an exaggerated manner to show the types of errors clearly). (colorful in print)

Pseudocodes 1-3. The identification and correction of minor errors in trajectory streams
The three types of minor errors described above are not the only minor errors in trajectory data; however, they
are the most prominent, and correcting them before starting the core of the proposed algorithm is fast and effective in improving
the quality of the final results. We found that, on average, approximately 4% of the trajectory sample points of each trajectory path
in our database are subject to one of these three types of errors.

4.3. Step three: data fusion for lateral correction at detector station locations (matching problem)

The most important part of the proposed algorithm is its third step. In this step, we will infer the driving lane of trajectories
based upon passing the detector station locations by fusing information from two independent data sources. The description of
this step of the proposed algorithm begins with a claim. We will then explain the argument that supports this claim. The argument
itself is based on some axioms. We will show that these axioms are valid.
Theorem: We can correctly identify the passage of an individual vehicle through a set of loop detectors in its path. Hence, we
know the vehicle's driving lane near the loop detector station. (The two statements of the theorem are equivalent, expressed in different words.)
Argumentation: With experience, we can gain cognition of the features of any object in general. Also, objects are only
known as they are experienced and not exactly as they are (cognition is not perfect). Whenever we experience more features of
an object or gain multiple experiences from certain features of that object, our cognition of it gets more comprehensive, and the
conviction of this understanding expands(1). Data fusion is a well-known method that leads to more reliable cognition based on
experience from various measurement sources and results in producing more consistent, accurate, and useful information (Hall
and Llinas, 1997). If we observe a vehicle in traffic flow from several data sources, we can study its movement more accurately.
Recording the passing data of a vehicle by the loop detectors and recording the trajectory data of a vehicle are two different sources
of information that reveal different characteristics of a vehicle movement in traffic flow. From the trajectory data, we can infer
when a car passes the longitudinal location of a detector station. If we can identify by which of the lane-specific loops of that
detector station the study vehicle was recorded, then by continuing the rest of the proposed algorithm, we can determine the
driving lane of the vehicle along its entire path and the location of possible lane changes with a high level of certainty as will be
shown in section 5. The length of a vehicle is always constant, so approximately the same value must be recorded for the length
of a vehicle by different loop detectors along its path. Therefore, the vehicle length serves as the control variable.
Trajectory data includes vehicle spatial information and speed at each timestamp. Loop detectors for each passing vehicle
record the time and speed of passage and the size of the vehicle. For the above argument to be valid, the following axioms must
be true:
Axioms:
I. We have a detailed digital map of the study network at our disposal in which the position of the lanes, their connections,
and especially the location of the lane-specific loop detectors are correctly marked with the same references as trajectory
data.
II. The passage time of each vehicle through the location of the loop detectors is detectable with a good approximation based
on the trajectory data.
III. Although neither the speed information recorded by the loop detectors nor the speed information recorded by the trajectory
data is perfect, their combined error is small and negligible.
IV. Although the vehicle length information recorded by the loop detectors is not perfect, its error is small and measurable.
The lane-specific digital maps required by axiom I can be extracted from low-precision trajectory data of smartphones, as
demonstrated by the authors in (Arman and Tampère, 2021). Direct extraction of the digital map from the same smartphone
trajectory data that we now want to position on it has the advantage that potential systematic bias in the GPS signal (e.g., due to
obstruction or reflection of the signal by bridges or buildings) exists in both sources alike, so that our relative positioning of the
trajectory within the lane map (which essentially subtracts and thus cancels both correlated biases) is largely unbiased. This would
not hold for an independent map, e.g., an accurate GIS database of the road authority. In section 5, we will explore to what extent
axioms two through four are valid.
In the remainder of this paper, we consistently differentiate between detector stations and detector loops; the latter are lane-
specific, and the former are groups of loops with the same longitudinal position along the road axis. We have three data sources:
loop detector data, d-GPS data, and trajectory data collected by smartphones carried by the probe vehicle. Let us first define some
notation to refer to information from these data sources. Suppose there are $N$ detector station positions (DSP) inside the test
network. Then $D = \{DSP_1, \dots, DSP_N\}$ is the set of all detector stations inside the network. Let us define
$D_v = \{DSP_1, \dots, DSP_n\}$ as the subset of detector stations that are visited by any specific vehicle whose trajectory we are
examining, $D_v \subseteq D$. From now on, in dealing with each trajectory, we will only deal with its set of visited detector stations.
Consequently, we will continue defining the notation considering only this set of detectors. Each $DSP_i$ includes one or more
double-inductive loops for each of the lanes at that location. So, let us refer to the set of double-inductive loops of each station
as $LD_i = \{LD_{i,1}, \dots, LD_{i,m_i}\}$. For each vehicle that passes through an $LD_{i,j}$ in its path, information
including passage time $t^{LD}_i$, passage speed $v^{LD}_i$, and vehicle length $l^{LD}_i$ is recorded.

(1)
From the beginning of this argumentation until here, the idea is adopted from Immanuel Kant's argument on the importance of categories for cognition based on
experience, from his book Critique of Pure Reason (Kant, I., Critique of Pure Reason, (1781), edited by: Paul Guyer and Allen W. Wood., Cambridge
University Press 1998).
As previously described (in section 3), we divided the data collected by the d-GPS device into d-GPS trajectories within the
geofence area. Each d-GPS trajectory includes a set of samples of trajectory points (the set length is equal to the d-GPS trajectory
travel time). For each sample point, information including time, geographical coordinates, and speed is available. Let us denote
the speed and time of passage of the probe vehicle based on d-GPS data over the detector station $DSP_i$ in its path as $v^{dGPS}_i$
and $t^{dGPS}_i$, respectively. Similar to the trajectory data recorded by d-GPS, the same information is available for the trajectory
data recorded by smartphones during each d-GPS trajectory of the probe vehicle. We denote the probe vehicle's speed and time of
passage based on the data collected by any specific smartphone over the detector station $DSP_i$ as $v^{SP}_i$ and $t^{SP}_i$,
respectively. Finally, let us denote the length of the probe vehicle by $l^{probe}$ (fixed and equal to 450.5 centimeters).
Given the cm-accuracy of d-GPS, there is a negligible error in the passage time of the probe vehicle over any DSP based on
the data recorded by the d-GPS device. We also know precisely which lane-specific loop our probe vehicle passed over at each
detector station. For each passage of the probe vehicle, this time is also recorded by the loop detector and the smartphone
application. The second axiom claims that the smartphone application can record the time of each passage with acceptable
accuracy. Fig. 7 shows the distribution of the difference between the passage time of the probe vehicle recorded by loop detectors
and recorded by the d-GPS device (part a) and between the time recorded by loop detectors and the time recorded by smartphones
(part b) for 321 passages of the probe vehicle over different loop detectors inside the geofence (because several
smartphones were carried in the probe vehicle, data from several smartphones are available for each passage over a loop detector position).
These differences in passage times between different data sources are mathematically defined as:

$$\Delta t^{dGPS}_i = t^{LD}_i - t^{dGPS}_i, \qquad \Delta t^{SP}_i = t^{LD}_i - t^{SP}_i \tag{1}$$

where $t^{LD}_i$, $t^{dGPS}_i$, and $t^{SP}_i$ are the passage times over the $i$-th DSP as recorded by the loop detector, the
d-GPS device, and the smartphone, respectively.

Fig. 7. Passage time difference between a) d-GPS trajectories and corresponding loop detector data b) smartphone trajectories and
corresponding loop detector data (seconds) (colorful in print)

Given that the d-GPS records provide absolute certainty about the matches between loop detector data and probe vehicle, we
can compare the time of passage as registered by both data sources (2). The histogram of Fig. 7 part b represents the time gap
between smartphone trajectories and corresponding loop detector data and indicates that in nearly 89 percent of cases, the absolute
difference between passage time registered by smartphones and loop detectors is less than half a second, which confirms the
validity of the second axiom.
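In a similar way, the validity of the third axiom can be shown by comparing the statistical distributions of $\Delta v^{dGPS}$ and
$\Delta v^{SP}$. Fig. 8 parts (a) and (b) represent the distribution of $\Delta v^{dGPS}$ and $\Delta v^{SP}$, respectively, where:

$$\Delta v^{dGPS}_i = v^{LD}_i - v^{dGPS}_i, \qquad \Delta v^{SP}_i = v^{LD}_i - v^{SP}_i \tag{2}$$

and $v^{LD}_i$, $v^{dGPS}_i$, and $v^{SP}_i$ are the passage speeds over the $i$-th DSP as recorded by the loop detector, the
d-GPS device, and the smartphone, respectively.

(2)
Note that for other vehicles for which we do not have the d-GPS trace, we cannot draw similar histograms because, in contrast to our probe vehicle, we are
never certain that we plot the time difference of observations corresponding to the same vehicle.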

Fig. 8. Passage speed difference between a) d-GPS trajectories and corresponding loop detector data b) smartphone trajectories and
corresponding loop detector data (meters/second) (colorful in print)

Based on Fig. 8, part b, one can verify that the absolute speed difference of the same vehicle passage observed in smartphone
trajectories and related loop detector data is never more than 2.25 meters per second, and in approximately 95% of instances, it is
even less than one meter per second, which proves the third axiom's validity.
To prove that the fourth axiom is also valid, we must show that different loop detectors record the length of our probe vehicle
approximately the same in different measurements. Fig. 9 part(a) shows the distribution of the probe vehicle length as recorded
by various loop detectors inside the geofence. In section 3, it was explained that the data collected by the probe vehicle was divided
into d-GPS trajectories within the geofence area. Parts (b) and (c) of Fig. 9 show the mean and standard deviation of the length of
the probe vehicle recorded on the set of detectors visited on different d-GPS trajectories, respectively.

Fig. 9. a) Distribution of the size of the probe vehicle recorded by the loop detectors, b) Distribution of the average size of the probe
vehicle recorded by the loop detectors in different d-GPS trajectories, c) Distribution of the standard deviation of the size of the probe vehicle
recorded by the loop detectors in different d-GPS trajectories. (colorful in print)

As shown in Fig. 7, when a vehicle passes over a loop detector position, the passage time recorded by the smartphone
application and by the loop detector may differ by up to ±1 second. During this 2-second period (let us call this the search window),
other vehicles may pass over other loops (in adjacent lanes) of the detector station. In addition, more than one vehicle may pass
over each loop of a detector station during the search window in heavy traffic. Although the driving lane is always known in
data collection experiments with the d-GPS, this is not the case for the bulk data for which we only have GPS traces from the
smartphone. Therefore, to obtain the correct match, the speed and passage time based on trajectory data should be compared
with the speed and passage time of all vehicles recorded by all loops of a DSP during the search window. Since both the passage
time and the passage speed criteria must be considered, and these two criteria have different units of measurement, we need to make
these criteria dimensionless. To this end, we can use the probability of similarity of passage time and passage speed based on the
two data sources. These probabilities can be calculated based on probability distribution functions (PDF) that can be estimated
using the histograms in part (b) of Fig. 7 and Fig. 8. The reason for using histograms of part (b) instead of histograms of part (a)
is as follows: except for the experiment of probe vehicle data collection, as described in section 3, trajectory data is continuously
collected by the application installed on the smartphones of anonymous drivers. Those trajectory data always have some errors.
However, as discussed above and stated through the second and third axioms, the error is to the extent that allows successfully
matching the observation of two data sources. This assertion will be verified in section 5 of the paper.
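Collecting the candidate matches within the search window can be sketched in Python as follows. The record layout (dictionaries with `loop`, `t`, `v`, and `length` keys) and the function name are assumptions of this sketch; the half-window of one second follows the 2-second search window described above.

```python
def candidates_in_window(records, t_traj, half_window=1.0):
    """Return all loop detector records of one detector station whose
    passage time lies within the search window around the trajectory's
    passage time.

    records: list of dicts with keys 'loop', 't' (s), 'v' (m/s), 'length' (m),
             pooled over all lane-specific loops of the station.
    t_traj:  passage time of the trajectory over the station (s).
    """
    return [r for r in records if abs(r["t"] - t_traj) <= half_window]
```

Because the driving lane is unknown at this stage, the records of all loops of the station are pooled before filtering.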
A neighbor-based method of kernel density estimation was used to estimate the PDFs. In general form, if the sample
$(x_1, \dots, x_n)$ is univariate independent and identically distributed (i.i.d.) with an unknown density $f$, then its kernel
density estimator is built from a nonnegative kernel function $K$ controlled by a smoothing parameter $h > 0$:

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \tag{3}$$

The smoothing parameter $h$, known as the bandwidth (here the Silverman rule-of-thumb bandwidth), controls the tradeoff between
bias and variance in the resulting distribution. The smaller the bandwidth, the lower the bias of the density estimate, but the
less smooth the shape of the distribution function, indicating high variance. The most common (and accurate) criterion to
select the bandwidth is the expected risk function, commonly known as the mean integrated squared error ($MISE$)
(Hansen, 2005):

$$MISE(h) = E\left[\int \left(\hat{f}_h(x) - f(x)\right)^2 dx\right] \tag{4}$$

The optimal value of $h$ must minimize the $MISE$. For this study, an optimal value of $h$ was obtained separately for the PDF of
the passage time difference and for the PDF of the passage speed difference between smartphone and loop detector data. Let us denote
the estimated PDFs of these two differences by $\hat{f}_{\Delta t}$ and $\hat{f}_{\Delta v}$, respectively. These two PDFs are shown in Fig. 10.
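As a hedged illustration of Eq. (3), the following Python sketch builds a kernel density estimate with a Gaussian kernel and the commonly used form of Silverman's rule of thumb. The Gaussian kernel, the function names, and the sample values in the usage note are assumptions of this sketch; the paper does not state which kernel was used.

```python
import math
import statistics

def silverman_bandwidth(sample):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(sample)
    sd = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)

def kde_pdf(sample, h):
    """Return the kernel density estimate of Eq. (3) as a function of x,
    using a standard Gaussian kernel (an assumption of this sketch)."""
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density
```

For example, given a list of observed passage time differences (hypothetical values such as `[-0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.5]`), `kde_pdf(sample, silverman_bandwidth(sample))` returns a callable that can be evaluated at any candidate difference.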

Fig. 10. Estimated probability distribution functions (PDF) based on the relative differences between data registered by the smartphone
application and loop detectors, a) Passage time (seconds), b) Passage speed (meters/second) (colorful in print)

Fig. 10 also presents the normal distributions with the same mean and standard deviation as the empirical distributions.
In section 5.4, we will examine the use of the normal distributions instead of the empirical distributions, which leads to worse
matching results and thereby demonstrates the importance of the empirical distributions.
In summary, the probability of observing any difference in the time and speed of a vehicle passing a DSP between the
trajectory data of smartphones and the data recorded by the loop detectors would be:

(5)

We now use these probabilities to infer which of the lane-specific loops of a detector station observed the passage of a given
trajectory of interest (so that we can, at that location, pinpoint its driving lane). There are, however, many candidate matches in
the detector loop data. The problem is shown schematically in Fig. 11. All vehicles recorded during the search window in every
DSP visited by a trajectory path, regardless of the driving lane of the trajectory (which is, at this point, unknown), should be
considered. An ID consisting of two numbers is assigned to each vehicle observed by the loop detectors. The first number is the
index of the DSP (within the visited set), and the second number is an ordinal number that runs from one up to the total
number of vehicles recorded during the search window by all loops of that DSP. For each of these recorded vehicles in any DSP,
the passage time, passage speed, and vehicle length are available. All sequences formed by combining
the vehicles recorded on all DSPs in a trajectory path should be considered (the sequences shown in part (b) of Fig.
11). Let us denote the set of these sequences by $S$. Each member of $S$ is a sequence consisting
of the records of vehicles (including their passage time and passage speed) in successive DSPs in the trajectory path. Therefore,
any $s \in S$ can be written as Eq. (6). And considering the known passage time and passage speed of each trajectory over each
DSP, we will have Eq. (7).

(6)

(7)

Fig. 11. a) The schematic shows a trajectory crossing a route that includes four-lane and three-lane road segments. The rectangles
associated with an ID show the vehicles recorded by the loop detectors on the route. The length of the rectangles is assumed to be proportional
to the length of the vehicles, and the color of the rectangles is assumed to be proportional to the speed of the vehicles. b) The schematic shows
the combinatorial problem of enumerating all possible sequences of combinations of vehicles recorded by the loop detectors along a trajectory
path. (colorful in print)

Taking into account Eq. (5) and Eq. (7), now the problem of matching the trajectory data with the data of all the loop detectors
in its path can be formulated as maximizing a conditional likelihood function as follows:

(8)

(9)

(10)

The indicator variable in Eq. (9) and Eq. (10) is the condition controlling the likelihood function based on the fourth axiom. This
variable is equal to one if the standard deviation of the length of vehicles of a sequence is less than or equal to the maximum
acceptable standard deviation. In section 4.4, the value of this maximum acceptable standard deviation will be discussed in detail.
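The matching problem described above can be illustrated with a brute-force Python sketch that enumerates every candidate sequence, discards those that violate the length-consistency condition, and keeps the sequence with the highest product of time and speed similarity. The function name, the data layout, the use of the population standard deviation, and the callables `pdf_dt` and `pdf_dv` (standing in for the estimated PDFs of Fig. 10) are assumptions of this sketch, not the exact formulation of Eq. (8) to Eq. (10).

```python
import itertools
import statistics

def best_sequence_exhaustive(candidates_per_dsp, traj_passages,
                             pdf_dt, pdf_dv, sigma_max):
    """candidates_per_dsp: one list per visited DSP of (t, v, length) records
                           found in the search window.
    traj_passages:       one (t, v) pair per visited DSP from the trajectory.
    pdf_dt, pdf_dv:      callables giving the similarity of a time / speed
                         difference (loop detector minus trajectory).
    sigma_max:           maximum acceptable standard deviation of length (m).
    Returns (best_sequence, likelihood); best_sequence is None if no
    sequence satisfies the length condition with positive likelihood.
    """
    best, best_like = None, 0.0
    for seq in itertools.product(*candidates_per_dsp):
        lengths = [rec[2] for rec in seq]
        if statistics.pstdev(lengths) > sigma_max:
            continue  # length-consistency condition violated
        like = 1.0
        for (t, v, _), (t_traj, v_traj) in zip(seq, traj_passages):
            like *= pdf_dt(t - t_traj) * pdf_dv(v - v_traj)
        if like > best_like:
            best, best_like = seq, like
    return best, best_like
```

The number of sequences grows as the product of the candidate counts at the successive DSPs, which is why this explicit enumeration is only practical for short paths.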

4.4. Calibration of vehicle length consistency function

Fig. 9 showed that in different paths (d-GPS trajectories) inside the test network, the length of the probe vehicle is recorded
by the loop detectors consistently: on average, it is close to the actual length of the probe vehicle, and its standard
deviation within different d-GPS trajectories is small (the index of dispersion observed in different d-GPS
trajectories ranged between 0.003 and 0.034). A first guess for the maximum acceptable standard deviation can be the maximum value
observed in part (c) of Fig. 9. However, this value is obtained for a passenger car of 4.505 meters length and should be calibrated
for other vehicle classes. For this calibration, loop detector data were used from road segments in the test network that
included at least four consecutive loop detector positions without any changes in the traffic flow (i.e., without any on-ramps or
off-ramps along the segment). All records of the loops of each loop detector position were investigated for the early hours of the
day (between 2:00 AM and 4:00 AM). During these hours, the traffic volume is so low that it is easy to
track a given vehicle over a series of consecutive loop detectors. In total, after reviewing 60 hours of data and identifying with
certainty 806 vehicles of different classes and lengths, the calibrated values of the maximum acceptable standard deviation for
different vehicle lengths are shown in Table 2. Vehicles may be classified by length according to different standards;
we decided to use the standard of the Flemish government (Agentschap Wegen en Verkeer (AWV) and Vlaams Verkeerscentrum
(VVC)) for this purpose.

Table 2
Calibration values for maximum acceptable standard deviation of sequences of recorded vehicle length by loop detectors.
Length range (meters)                  | Below 1.0 | 1.0 to 4.9 | 4.9 to 6.9 | 6.9 to 12.0 | Above 12.0
Maximum acceptable standard deviation  | 0.15      | 0.41       | 0.65       | 0.83        | 1.05
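For use in code, Table 2 can be expressed as a simple lookup. The sketch below is illustrative: the function name is hypothetical, the first and last classes are assumed to cover lengths below 1.0 meters and above 12.0 meters, and treating each upper class boundary as inclusive is also an assumption, since the table does not state how boundary values are assigned.

```python
def sigma_max_for_length(length_m):
    """Return the calibrated maximum acceptable standard deviation from
    Table 2 for a vehicle of the given length (meters)."""
    # (upper bound of class in meters, calibrated value); bounds assumed inclusive
    classes = [(1.0, 0.15), (4.9, 0.41), (6.9, 0.65), (12.0, 0.83)]
    for upper, sigma in classes:
        if length_m <= upper:
            return sigma
    return 1.05  # longest class
```

For the probe vehicle (4.505 meters), this lookup returns 0.41.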

As stated above, the algorithm exploits the fact that the length recorded for any given vehicle remains more or less constant
over successive loop detector stations. Nevertheless, different vehicles of similar length may arrive at the location of a loop
detector station at the same time, which can lead to wrong matches when, apart from length, also speed and passage time are
similar. Our validation in section 5 shows that such wrong matches do not happen too often. Another case where there is a high
chance for a wrong match is when the vehicle length detection is very biased. This can happen when the vehicle changes lanes
precisely at the location of the loop detector station, which may lead, for instance, to exceptionally low length registrations,
possibly on both lane detectors (Knoop et al., 2012b). In such circumstances, one might opt to modify the algorithm to look for
matches with either similar or suspiciously short lengths. As we did not have indications of the frequent occurrence of this case,
we did not consider such modifications in this paper.

4.5. Solution method for the maximum likelihood problem

The exact solution to the problem of Eq. (8) to Eq. (10) would consist of the following:

1- Make a list of all possible sequences;
2- For all possible sequences, calculate the median and the standard deviation of the vehicle length;
3- For the sequences in which the standard deviation of the vehicle length satisfies the condition of Eq. (10) based on Table
2, calculate the value of the likelihood function;
4- The sequence that yields the maximum value of the likelihood function is the answer to the problem.
This exact procedure is not feasible. A typical trajectory in our data passes through 10 to 20 (average 15) loop detector
locations in its path. During normal traffic conditions, the number of vehicles recorded by all loops of each detector station within
the search window, depending on the number of lanes, is between 4 and 6 (average 5) but can reach 12 during
peak periods. Therefore, to solve the problem exactly, an average of $5^{15} \approx 3 \times 10^{10}$ sequences would have to be
evaluated explicitly. This amount of computation is very time-consuming and renders the proposed algorithm impractical. Instead of
explicitly evaluating all sequences, a heuristic solution method is proposed. First, the product of the probabilities of passage time
and passage speed similarity between trajectory and loop detector data is estimated for each potential match in each DSP. Then,
starting with the highest product of probabilities, all probable matches in each DSP are ranked. Next, the median and standard
deviation of the vehicle lengths in the sequence of first-rank matches are computed. Unless this violates Eq. (10), we presume
that the match sequence consisting of the first rank in each DSP is the best match. Otherwise, the acceptable vehicle length range
is set equal to the median plus and minus the maximum acceptable standard deviation. If the vehicle length of a match in any DSP
falls outside this range, that match is dropped from the sequence and substituted with the match with the next highest rank. This
procedure is repeated until a series of matches is obtained that does not contradict Eq. (10). This method is described in
Pseudocode 4. Thanks to the heuristic solution method,
the average run time of the matching problem per trajectory path has been reduced from 77 seconds to 0.62 seconds(3).

(3)
A computer equipped with an Intel Core i7-8700K CPU and 32 GB of RAM was used.
Pseudocode 4. The heuristic method for implicit solution of the problem of equations 8 to 10.
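The heuristic described in this section can be sketched in Python as follows. This is a minimal sketch based on the textual description only; the function name, the data layout, the use of the population standard deviation, the iteration cap, and the behavior when no consistent sequence can be found (returning `None`) are assumptions and may differ from Pseudocode 4.

```python
import statistics

def heuristic_match(ranked_per_dsp, sigma_max, max_iter=100):
    """ranked_per_dsp: one list per visited DSP of (score, length) candidates,
                       sorted by descending score (product of the time and
                       speed similarity probabilities).
    sigma_max:        maximum acceptable standard deviation of length (m).
    Returns the index of the chosen candidate per DSP, or None.
    """
    choice = [0] * len(ranked_per_dsp)           # start with first-rank matches
    for _ in range(max_iter):
        lengths = [ranked[c][1] for ranked, c in zip(ranked_per_dsp, choice)]
        if statistics.pstdev(lengths) <= sigma_max:
            return choice                        # length condition satisfied
        median = statistics.median(lengths)
        changed = False
        for i, ranked in enumerate(ranked_per_dsp):
            c = choice[i]
            out_of_range = abs(ranked[c][1] - median) > sigma_max
            if out_of_range and c + 1 < len(ranked):
                choice[i] = c + 1                # fall back to next-ranked match
                changed = True
        if not changed:
            return None                          # no further candidates to try
    return None
```

Only the matches whose length falls outside the acceptable range are replaced in each round, so the number of evaluated sequences stays far below the full enumeration.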

4.6. Step four: trajectory path reconstruction

The last step of the proposed algorithm is to reconstruct the trajectory path. At the end of the previous step, a sequence of
observations of the vehicle is obtained over the DSPs in the trajectory path. Thus, we know in which lane the vehicle drove near
each DSP, and resetting the lateral position at those locations to the middle of the lane minimizes the expected lateral position
error. These known positions along the vehicle path are considered the skeleton of the reconstructed trajectory, as illustrated in
Fig. 12. For comparison, the original trajectory path and the path recorded simultaneously by the d-GPS device are also
shown in this figure.

Fig. 12. The skeleton of an under-reconstruction trajectory path (colorful in print)

The second part of this step is simple. We assume that the trajectory speed and heading information are correct at all timestamps.
In fact, this information is not exact, but it is the only information we have about the vehicle path between two successive
detector stations. The path is then reconstructed from each skeleton point, using forward and backward integration of the speed and
heading. As a result, two reconstructed paths are obtained for each trajectory segment between two successive detector stations. The
final path is the weighted average of these two, with weights evolving linearly with distance from the skeleton points. Note that the
average distance between two successive loop detector positions in the test network is approximately 400 meters. The bias
of the trajectory sample points between two successive detector stations is thus corrected twice, once by the forward and once by the
backward reconstruction. The uncertainty of this correction, however, grows from a minimum at the skeleton points (the detector
stations) to a maximum midway between them (as confirmed further in Fig. 14). This makes the maximum error after bias correction a
function of the detector spacing. Although this holds in principle, our test network offers no systematic variation in detector spacing
that would allow us to quantify this dependency. At least, as our results demonstrate (Fig. 14), the remaining lateral error with
detector spacings up to 738 meters (the maximum in our test network) still allows identifying the driving lane. Moreover, we
point out that detector spacing only affects the accuracy of bias correction in between detectors, while the validity and correctness
of the matching process are independent of the distance between loop detector stations, as this matching is based only on data
around a single detector station.
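
The forward/backward blending described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the planar (x, y) coordinates, the 1-second time step, and the use of travelled distance for the linear weights are all choices made for this example.

```python
import math


def dead_reckon(start_xy, speeds, headings, dt=1.0, direction=1):
    """Integrate speed (m/s) and heading (rad, from the x-axis) from a known point.

    direction=+1 steps forward in time, direction=-1 steps backward.
    """
    x, y = start_xy
    path = [(x, y)]
    for v, h in zip(speeds, headings):
        x += direction * v * math.cos(h) * dt
        y += direction * v * math.sin(h) * dt
        path.append((x, y))
    return path


def reconstruct_segment(p_up, p_down, speeds, headings, dt=1.0):
    """Blend forward and backward integrations between two skeleton points.

    speeds[i] and headings[i] describe the motion from sample i to sample i + 1.
    """
    forward = dead_reckon(p_up, speeds, headings, dt, direction=1)
    # Backward pass: start at the downstream skeleton point and undo the motion.
    backward = dead_reckon(p_down, speeds[::-1], headings[::-1], dt, direction=-1)
    backward.reverse()

    # Cumulative travelled distance, used for the linear weights (an assumption).
    travelled = [0.0]
    for v in speeds:
        travelled.append(travelled[-1] + v * dt)
    total = travelled[-1]

    n = len(speeds)
    blended = []
    for i, ((fx, fy), (bx, by)) in enumerate(zip(forward, backward)):
        # Weight of the backward path grows linearly from 0 (upstream) to 1 (downstream).
        w = travelled[i] / total if total > 0 else (i / n if n else 0.0)
        blended.append(((1 - w) * fx + w * bx, (1 - w) * fy + w * by))
    return blended


# Hypothetical example: 4 s at 10 m/s along the x-axis, with the downstream
# skeleton point 2 m to the side of where the forward integration ends.
path = reconstruct_segment((0.0, 0.0), (40.0, 2.0), [10.0] * 4, [0.0] * 4)
```

In this toy example the blended path passes exactly through both skeleton points, and the 2 m mismatch is distributed linearly over the segment. Each integration is therefore trusted most near the skeleton point it starts from.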

5. Results and discussion

In this section, we will first present the results of the third step of the proposed algorithm. Then we show that, in general, the
proposed algorithm can reconstruct trajectories with acceptable accuracy. As explained earlier, the accuracy we aim for is to allow
detecting the driving lane. Next, the validation results of the proposed method will be presented based on videos recorded by a
drone and by CCTVs. For this purpose, it is necessary to define all the possible situations of reconstructed trajectories versus
validation data. Finally, we will discuss the effect of the estimated PDFs (Fig. 10) on the results.

In this section, we look for matching errors in the results. In general, the goal of the data-fusion
problem is to find the best match between trajectory data and loop detector data. This is the most critical part of the proposed
algorithm. If the solution to this problem is reliable, we can claim a highly reliable reconstructed trajectory.

In total, we could compare the calculated matches with 2262 passages of d-GPS trajectories (212 smartphone
trajectory paths of 31 d-GPS trajectories), for which we know the exact driving lane at each detector station. Driver-view videos
have also been used for double-checking. The algorithm produced a wrong match in only 71 cases, so the data-fusion algorithm
was successful in 96.86% of the cases.

Given this high percentage of correct matches, we tested whether, after applying forward and backward bias correction, the
lateral bias over the entire track had been sufficiently reduced, i.e., to the extent that we can identify the driving lane with high
reliability. In total, we collected 33146 trajectory sample points by smartphone applications inside the geofence area. After full
reconstruction of all smartphone trajectories, it turns out that 93.22% of the bias-corrected sample points are located in the actual
driving lane as recorded by the d-GPS device. The result of applying the whole algorithm to a trajectory path recorded by a
smartphone is shown in Fig. 13. The figure clearly shows how effective the proposed algorithm is in correcting the bias and
reducing the lateral error of smartphone trajectory data.

Fig. 13. Final bias-corrected (reconstructed) trajectory path (colorful in print)


A more detailed illustration is given of a reconstructed probe vehicle trajectory with a length of 272 sample points (272
seconds) in Fig. 14. Part (a) of this figure shows the ground-truth trajectory recorded by d-GPS and the positions of the loop
detectors along the path. The red numbers indicate the sample point numbers (timestamps). The vertical axes of the plots in parts
(b) and (c) correspond to the same numbers. In parts (b-1) and (c-1), the boxplots show the lateral bias of the sample points of
trajectories recorded by 9 smartphones carried in the probe vehicle relative to the path recorded by d-GPS. Each orange dashed
line in these graphs represents the width of a standard lane in motorway networks (3.75 meters). Parts (b-2) and (c-2) show the
lateral bias remaining after the reconstruction of the trajectories. The solid red line represents half the width of a standard lane
(1.875 meters): the proposed algorithm reduced the lateral bias of all smartphone recordings to (far) less than half of the width of
a standard lane. Note how, at detector station locations, the residual bias is minimal but not zero because we positioned the skeleton
points in the middle of the lane while, apparently, the probe vehicle deviated from the middle by 0-0.6m approximately. In between
skeleton points, the error grows but remains small enough to identify the actual driving lane, and consequently, lane change
maneuvers can be identified in most cases.
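
The half-lane-width criterion used above can be made concrete with a small sketch. The function names and the convention of measuring the lateral offset from the carriageway edge are assumptions made for this example; only the 3.75 m lane width comes from the text.

```python
LANE_WIDTH_M = 3.75  # standard motorway lane width quoted in the text


def lane_index(offset_from_edge_m, n_lanes, lane_width_m=LANE_WIDTH_M):
    """Return the 0-based lane for a lateral offset measured from the carriageway edge."""
    idx = int(offset_from_edge_m // lane_width_m)
    # Clamp so that points slightly outside the carriageway map to the outer lanes.
    return min(max(idx, 0), n_lanes - 1)


def lane_center(lane, lane_width_m=LANE_WIDTH_M):
    """Lateral offset of the center of a 0-based lane."""
    return (lane + 0.5) * lane_width_m


center = lane_center(1)                          # center of the second lane
inside = lane_index(center + 1.8, n_lanes=3)     # residual error below half a lane width
outside = lane_index(center + 2.0, n_lanes=3)    # residual error above half a lane width
```

For a vehicle driving at the lane center, a residual error below 1.875 m keeps the point in the correct lane, while a larger error moves it to the neighboring lane. A vehicle that drives off-center has correspondingly less margin on one side.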

The most crucial part of this algorithm is the data-fusion problem: if the data-fusion problem is solved incorrectly,
the forward/backward reconstruction in the fourth step is not effective at all and may even add to the lateral bias. If at
least one of the matches obtained in the third step is incorrect, then, depending on the driving lane at the upstream and downstream
detector stations, one of the following incorrect observations of lane-changing maneuvers may be inferred from the reconstructed
trajectories: 1) two lane-changing maneuvers are reconstructed that have not happened in reality (two false positives); 2) two
lane-changing maneuvers that have happened in reality have not been reconstructed (two false negatives); 3) one actual
lane-changing maneuver has been removed, and one false maneuver has been added.
Fig. 14. Boxplots of lateral bias comparison of smartphone trajectories, before and after application of the proposed algorithm (colorful
in print)
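
The three consequences of a single wrong match listed before Fig. 14 can be illustrated with a simplified sketch. It assumes that a lane change is represented only as a difference between the matched lanes at two successive detector stations; the function names and the toy lane sequences are hypothetical.

```python
def lane_changes(lanes):
    """Return (interval_index, from_lane, to_lane) for each change between successive stations."""
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(lanes, lanes[1:])) if a != b]


def compare(true_lanes, matched_lanes):
    """Count lane changes that were wrongly added or wrongly dropped by the matching."""
    truth = set(lane_changes(true_lanes))
    found = set(lane_changes(matched_lanes))
    return {"false_positives": len(found - truth), "false_negatives": len(truth - found)}


# One wrong match at the middle station in each toy case.
case_1 = compare([1, 1, 1], [1, 2, 1])  # two maneuvers invented
case_2 = compare([1, 2, 1], [1, 1, 1])  # two maneuvers missed
case_3 = compare([1, 1, 2], [1, 2, 2])  # one maneuver moved to the wrong interval
```

In each case only the lane at the middle station is mismatched, yet two lane-change observations are affected, which is why the matching step dominates the overall reliability.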

5.2. Validation based on drone video recording

During recording of the drone videos, 67 GPS trajectories were registered through the smartphone apps of anonymous users.
In this validation phase, only one detector station was visible in the road segment monitored by the drone. The validation results
are presented through the confusion matrix in Table 3. Considering that the drone camera spans 400 meters of road times 67 GPS
tracks, almost 26,800 meters of the path were recorded by the drone. Overall, in 90.68% of this length, the sample points were
reconstructed in the same driving lane as the drone videos show.
Table 3
Validation results based on the drone videos (correct reconstructions are shown with bold numbers and a green background)

                                  Reconstructed
                                  Correct Match                  Wrong Match
Observed                          Without LCM    With LCM
Correct Match    Without LCM      42             5 (7.46%)       1 (1.49%) (both rows)
                 With LCM         0              19

LCM: lane-changing maneuver
TT: travel time
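
The percentages in Table 3 can be checked with a few lines of arithmetic. The dictionary keys below are hypothetical labels reflecting one reading of the table cells; the counts are those reported in the table.

```python
# Counts from Table 3; the key names are labels chosen for this example.
table3 = {
    "correct_without_lcm": 42,   # observed and reconstructed without LCM
    "false_lcm": 5,              # observed without LCM, reconstructed with LCM
    "missed_lcm": 0,             # observed with LCM, reconstructed without LCM
    "correct_with_lcm": 19,      # observed and reconstructed with LCM
    "wrong_match": 1,            # trajectory matched to the wrong detector record
}

n_tracks = sum(table3.values())
share_pct = {key: round(100.0 * count / n_tracks, 2) for key, count in table3.items()}
correct_tracks_pct = share_pct["correct_without_lcm"] + share_pct["correct_with_lcm"]
```

The counts add up to the 67 GPS tracks. The per-track share of correct reconstructions computed this way is a different measure from the 90.68% quoted in the text, which refers to the share of road length rather than to the number of tracks.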

Apart from the possible errors of the algorithm, two other situations occur: Pseudo Error and Induced deviation. Both are
remaining biases in the reconstructed data. A Pseudo Error can be removed in post-processing, whereas an Induced deviation is a
small position error of an otherwise correctly detected lane change:

Pseudo Error: In this case, the remaining lateral bias is somewhat larger than half of the lane width, so that the trajectory
seems to change to another lane while, in reality, it did not. Such cases can be easily filtered out. For instance, the whole maneuver
may take place in a very short period of time (2-5 seconds), which is unlikely to represent two deliberate lane changes by a driver.
Alternatively, the trajectory may drift only slightly into the next lane (in most instances, less than a meter), indicating that it was
not an intended driving maneuver. As its name suggests, the Pseudo Error is not an error; it merely
resembles one. Consequently, although we report it for clarity, it should be considered an error neither in the
evaluation nor in the interpretation of the results.
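
A duration-based filter for Pseudo Errors could look like the sketch below. It is only an illustration under stated assumptions: the per-second lane sequence, the function name, and the 5-second threshold (the upper end of the 2-5 second range mentioned above) are choices made for this example, and the second criterion (a lateral drift of less than a meter) is not implemented.

```python
from itertools import groupby


def remove_pseudo_errors(lanes, max_duration_s=5.0, dt=1.0):
    """Replace short excursions into another lane that return to the original lane.

    lanes is a per-sample sequence of lane indices, sampled every dt seconds.
    """
    runs = [(lane, len(list(group))) for lane, group in groupby(lanes)]
    cleaned = []
    for i, (lane, length) in enumerate(runs):
        interior = 0 < i < len(runs) - 1
        # An excursion is a short run whose neighbors on both sides share the same lane.
        if interior and runs[i - 1][0] == runs[i + 1][0] and length * dt <= max_duration_s:
            lane = runs[i - 1][0]
        cleaned.extend([lane] * length)
    return cleaned


short_excursion = remove_pseudo_errors([1, 1, 1, 2, 2, 2, 1, 1])
long_stay = remove_pseudo_errors([1, 1, 2, 2, 2, 2, 2, 2, 2, 1])
```

The 3-second excursion in the first sequence is removed, while the 7-second stay in the second sequence is kept as two genuine lane changes. Rapidly alternating sequences would need more careful handling than this sketch provides.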

Induced deviation: In this case, a lane change is reconstructed correctly, but with a slight difference in its longitudinal position. During the reconstruction
algorithm, we repositioned the matched trajectory at the center of the lane at skeleton points near the loop detector. In fact, we are
unaware of every vehicle's real lateral position at this point, and this assumption is the simplest one for minimizing expected lateral
error, but it may not be the most accurate one in all cases. This assumption may thus induce a longitudinal shift (bias) in the
location where lane-changing maneuvers are observed in reconstructed trajectories compared to their actual location. Fig. 15
illustrates this.

Fig. 15. Induced deviation in the location of lane-changing maneuvers in reconstructed trajectories (green line) versus the actual driving
path (blue line) because of wrongly resetting position to the lane center at skeleton points (red line) (colorful in print)

Considering the vehicle's speed during the lane-changing maneuver, the time required to complete this maneuver (4-6
seconds (Li et al., 2021; Toledo and Zohar, 2007)), and using some trigonometry (Part b of Fig. 15), the theoretical upper bound
for the induced deviation is approximately 80 meters. Note, however, that we have never observed an induced deviation
greater than 30 meters in any of our validations.
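
The order of magnitude of such a bound can be explored with a simple geometric sketch. This is not the derivation of Fig. 15(b), which is not reproduced here. It assumes that the vehicle crosses one lane width along a straight diagonal at constant speed, and that the lateral position is off by at most half a lane width; the function name and the example speed are hypothetical.

```python
def induced_shift_m(speed_mps, duration_s, lateral_offset_m, lane_width_m=3.75):
    """Longitudinal shift of the lane-boundary crossing caused by a lateral offset.

    Assumes a straight diagonal lane-change path covering one lane width.
    """
    longitudinal_length_m = speed_mps * duration_s
    # Similar triangles: lateral offset / lane width = shift / maneuver length.
    return lateral_offset_m * longitudinal_length_m / lane_width_m


# Hypothetical case: 25 m/s (90 km/h), a 6 s maneuver, half a lane width of offset.
shift = induced_shift_m(25.0, 6.0, 1.875)
```

Under these assumptions the shift equals half the longitudinal length of the maneuver, which for motorway speeds and a 6-second lane change is of the same order as the bound quoted above.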

In the comparison between reconstructed trajectories and drone data, the induced deviation ranged from 3.12 to 24.77
meters, with an average of 19.44 meters. Examples of a wrongly reconstructed trajectory, a Pseudo Error, an induced deviation, and an
entirely correct reconstruction are presented in Fig. 16, parts (a) to (d), respectively. In 11 (16.42%) of the
reconstructed trajectories, we could recognize and correct a Pseudo Error.
Fig. 16. Validation of the results of the proposed algorithm compared to drone video recordings. In parts a to d, the first row shows the
video recorded by the drone and the image processed route of the vehicle under study. In the second row, the reconstructed trajectory of the
same vehicle is shown. The third row compares the lateral position of the path recorded by the drone video after image processing (green line)
and the reconstructed trajectory (red line). For a better comparison, the Y-axis is magnified relative to road length on the X-axis. (colorful in
print)

Although the accuracy of the validation results is slightly lower than the results obtained from the reconstruction of the
trajectories of smartphones inside the probe vehicle, we conclude from the validation by drone data that the proposed algorithm
determines the true driving lane with an accuracy of more than 90%. In fact, the error due to the malfunction of the matching
phase of the proposed algorithm in the validation data is only 1.49%, so matching accuracy is higher than 98.5%.

5.3. Validation based on road CCTV videos

The validation with the drone video was important for two reasons: (a) the section has a sharp horizontal curvature that
changes, which makes a difficult case for the lateral drift of the GPS signals, and (b) there was quite some turbulence due to nearby
on- and off-ramps. However, the quantity of data was limited. To address that limitation, we add another more extensive validation
with ground truth from CCTV videos over 12 hours covering two successive detector stations. The BeMobile server recorded a
total of 1208 trajectories throughout this time period. Table 4 summarizes the algorithm's overall performance for these
trajectories. The induced deviation in the position of lane-change maneuvers in this validation ranged from 15 to 30 meters. The
matching accuracy evaluated via CCTV data equaled 95.15%, computed as (1 - 117/(1208×2)) × 100%. Moreover, in 89 (7.37%) of
the reconstructed trajectories, we could detect and correct a Pseudo Error.
Table 4
Validation results based on the CCTV videos (correct reconstructions are shown with bold numbers and a green background)
Reconstructed and are on same lane and are on same and are on different lanes
Without LCM lane but and are the and are the
Observed With LCM and are not
Same TT Different TT same different
Without LCM 942 19 (0.79%) 0 97 (4.01%) 0 0
and are on same lane
With LCM 0 16 0 0 0
and are on different lanes With LCM 0 0 133 1 (0.04%)
and : upstream and downstream lane of driving based on observation
and : upstream and downstream lane of driving based on the reconstruction
LCM: lane-changing maneuver
TT: travel time

Regarding the error percentages in Table 4, note that the CCTV footage used for validation covered two successive
detector stations. As a result, an error was registered if at least one of the two matches was wrong, and error
percentages are computed as the number of errors divided by 2×1208. In contrast, in the validation based
on drone videos, only one loop detector station was inside the drone's field of view, so this factor 2 did not apply. If the
criterion is instead the number of trajectories that have been reconstructed correctly (which includes two matching problems at the
detector stations and the reconstruction of the trajectory between them), we reach 90.3% accuracy, which is almost the same
as the result obtained for the validation based on drone videos.
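
The two accuracy figures for the CCTV validation use different denominators, which a short calculation makes explicit. The variable names and the labels of the error cells are assumptions made for this example; the counts are those reported in Table 4.

```python
n_trajectories = 1208
stations_in_view = 2

# Erroneous cells of Table 4; the key names are labels chosen for this example.
errors = {"cell_0_79_pct": 19, "cell_4_01_pct": 97, "cell_0_04_pct": 1}
# Correct cells of Table 4 (the bold entries).
correct = [942, 16, 133]

n_errors = sum(errors.values())

# Matching accuracy: errors are related to the number of matching problems (two per trajectory).
matching_accuracy_pct = 100.0 * (1 - n_errors / (n_trajectories * stations_in_view))

# Trajectory accuracy: share of trajectories reconstructed correctly as a whole.
trajectory_accuracy_pct = 100.0 * sum(correct) / n_trajectories
```

Both values come out close to the figures quoted in the text (about 95% and 90.3%). The correct and erroneous counts together account for all 1208 trajectories.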

5.4. Algorithm sensitivity analysis; the effect of probability functions

In section 4.3, we showed that the estimated empirical PDFs do not follow the normal distribution. Examining Fig. 10, it can
be seen that the probability of values close to the mean values in these two empirical distributions (blue curves) is higher than the
corresponding probability based on normal distributions (red curves) with the same means and standard deviations. In contrast,
normal distributions are fatter in the tails. What would happen if we were to
use a normal distribution as a proxy (assuming we could optimize its mean and standard deviation for optimal matching
performance)?

If we perform the third step of the algorithm based on best-fitting normal distributions (red curves in Fig. 10), the number of
matching errors based on the probe vehicle trial increases from 71 to 398. Although this number of errors still means 82.40% of
correct matching results, it should be noted that this number of incorrect matches corresponds to 212 smartphone trajectory paths,
which means approximately 1.9 false matches per trajectory. Taking into account that each wrong match could directly affect the
correct identification of two lane-changing maneuvers, we conclude that it does not seem possible to use the normal distribution
instead of the empirical distributions, even if the means and standard deviations are the same as the empirical values. Note that the
empirical distributions of Fig. 10 are local characteristics of our test network and data collection facilities and hence are probably
not transferable to other cases; a tailored calibration is highly recommended when applying the proposed algorithm to
other networks or with other data collection facilities.
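
The difference between an empirical density and a normal density with the same mean and standard deviation can be illustrated on synthetic data. The sample values, the Gaussian kernel, and the bandwidth of 0.3 are all made up for this sketch and are unrelated to the distributions of Fig. 10.

```python
import math
import statistics


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def kde_pdf(x, samples, bandwidth):
    """Gaussian kernel density estimate evaluated at x."""
    return sum(normal_pdf(x, s, bandwidth) for s in samples) / len(samples)


# Synthetic deviations: most values close to zero plus two outliers.
samples = [-0.3, -0.2, -0.1, 0.0, 0.0, 0.1, 0.2, 0.3, -5.0, 5.0]
mu = statistics.mean(samples)
sigma = statistics.pstdev(samples)

empirical_at_mean = kde_pdf(0.0, samples, bandwidth=0.3)
normal_at_mean = normal_pdf(0.0, mu, sigma)
empirical_off_mean = kde_pdf(2.5, samples, bandwidth=0.3)
normal_off_mean = normal_pdf(2.5, mu, sigma)
```

In this toy sample the empirical density is much higher than the normal density near the mean and much lower at intermediate distances. A likelihood based on the normal proxy therefore penalizes moderate deviations too little, which is one plausible way in which the proxy could admit more wrong matches.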

6. Remarks and conclusion

This paper presents a four-step algorithm for the reconstruction and bias correction of trajectory paths recorded by everyday
smartphones. The core of this algorithm solves a data-fusion problem of trajectory data and individual loop detector data. The
ultimate accuracy of the results depends to a large extent on the correctness of this step. This algorithm was developed to provide
appropriate and reproducible data to study the lateral behavior of drivers in extensive, complex motorway corridors.

The evaluation and validation of the proposed algorithm using drone and CCTV video data show that the data fusion matches
more than 94% of trajectory passages correctly to the corresponding lane-specific detector loop recording. The full algorithm
corrects lateral bias on all trajectory sample points successfully so that just over 90% of all points are mapped into the right lane.
Consequently, for the first time (to the best of our knowledge), researchers can identify the driving lane and locations of lane
change maneuvers based on large quantities of trajectory data recorded by everyday smartphones. An important feature of this
generated data is that it allows studying driver behavior in a wide variety of traffic conditions and over a long motorway corridor
(basically limited only by the configuration of the data collection geofence in the smartphone app back-office). Apart
from the calibration of empirical PDFs necessary to accurately perform the data-fusion, the proposed algorithm has very few
calibration parameters (the maximum standard deviation of vehicle length for different vehicle classes in Eq. (10)). As a result,
this algorithm can be applied to any other motorway network and any other method of collecting trajectory data using GPS
provided that individual inductive loop data per lane are available.

This is an important step in providing data for a better understanding of the lateral behavior of vehicles in traffic flow. We
envision applications in the development and calibration of lane change models, calibration of traffic microsimulation software,
theoretical traffic flow studies, and motorway network safety analysis. The lack of such data and consequently incomplete
understanding of lateral driver behavior has been one of the reasons for the relative inefficiency of management strategies for
weaving sections. Traffic analysis based on the data obtained from the proposed algorithm can inspire novel lane-specific
management strategies for drivers' guidance and cooperative vehicle systems.

Based on GPS data from only a (small) sample of all vehicles, the reconstructed trajectories may indeed be unsuited for some
existing lane-change and safety analyses. However, the reconstructed
data has other advantageous properties that may enable new lane-changing and safety analyses for which previous data sets were
not suited. It is understandable that existing safety and lane-change analyses require the full sample of vehicles: so far, there has
been no need to analyze based on partial data as virtually all existing studies were based on video recordings (as discussed in
Table 1) that naturally observe the full sample. This does not mean that novel safety and lane change analyses using just a sample
of all vehicles would be impossible! The new data resulting from our algorithm has some unique characteristics that might enable
new research avenues: as the raw trajectory and loop detector data were continuously recorded over more than two years and a
significantly longer network than existing datasets (Table 1), analyses are now possible that require long time series, filtering of
specific (and maybe rare) conditions, or longitudinal analysis of drivers tracked over multiple kilometers. For example, traffic
accidents are rare events; it is hard and expensive to plan a conventional trajectory data collection aimed at studying an accident.
Raw data for our algorithm requires the presence of loop detector stations and configuration in the back-office for the GPS-data
collection, making it easy to collect large data sets also on other (and even larger) sites than the one in our paper. With long time
series of accidents and reconstructed trajectory data, correlations between both (e.g., which lane-change hotspots are also accident
black spots?) can be analyzed despite the rare character of accidents. These arguments make us believe that data
that observes only a sample of all vehicles is not necessarily unsuitable for safety and lane change analyses. It invites and
challenges the research community to come up with creative, original types of analysis to extract the rich information embedded
in it that will hopefully complement existing studies.

As with any research, one should be aware of some limitations. First, this method relies on two data sources that may not be
available everywhere: lane-specific individual loop detector data and floating car data sampled at high frequency (1 Hz). Moreover,
its performance is sensitive to the correct functioning of these data sources: even short interruptions of data recording, especially
by the loops and even if only on one of the lanes, severely degrade the results. Secondly, the scope of application of the
algorithm is limited to motorway networks and off-line analyses; it was not tested on urban networks, and it is not suitable for
real-time applications (as it requires vehicle length observations on all detector passages of the same GPS trajectory and hence can
only be run upon completion of a trajectory). Also, the accuracy of the proposed algorithm is limited to determining the driving lane,
and the results cannot be expected to determine a more detailed lateral position of vehicles within their lane.

Whereas we stated that the proposed method was not tested for urban networks, the application of the proposed method can
easily be expanded to networks with parallel road segments. To this end, if the trajectories are long enough that we can observe
beyond where these parallel roads merge or diverge, we know which road they actually were on and, thus, on which detector set
to look for matches. For example, in a section about 800 meters long in our test network, a two-lane segment coming from the
east (E313) through the interchange toward the south and a three-lane segment coming directly from the north toward the south
are parallel, but considering whether the beginning of the trajectory started from the east or the north, there would be no confusion
due to the parallel sections. On the other hand, if we cannot distinguish the actual path of the trajectory between two parallel road
segments based on its upstream or downstream part, then we have a more uncertain data fusion (matching) problem.
For such cases, one could first perform a rough matching over both roads, then see where most matches lie, and then repeat the
matching constrained to the detectors on that road only. As a result, we believe there would not be many cases in which parallel roads
would significantly hamper the proposed method's application (although this needs verification in future applications of the method).
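
The two-pass idea for parallel roads can be sketched as a majority vote followed by a constrained candidate set. This is a hypothetical illustration of the suggestion above, not a tested part of the method: the data layout (pairs of road and detector identifiers), the identifiers themselves, and the function names are assumptions.

```python
from collections import Counter


def choose_road(rough_matches):
    """Pick the road that received most matches in an unconstrained first pass.

    rough_matches is a list of (road_id, detector_id) pairs.
    """
    counts = Counter(road_id for road_id, _ in rough_matches)
    return counts.most_common(1)[0][0]


def constrain_candidates(candidates, road_id):
    """Keep only the candidate detectors located on the chosen road."""
    return [(road, detector) for road, detector in candidates if road == road_id]


# Hypothetical first-pass result: three matches on road "A", one on the parallel road "B".
rough = [("A", "A1"), ("A", "A2"), ("B", "B2"), ("A", "A3")]
road = choose_road(rough)
second_pass = constrain_candidates([("A", "A1"), ("B", "B1"), ("A", "A2"), ("B", "B2")], road)
```

The second matching pass would then be run on `second_pass` only, so that an isolated match on the parallel road cannot introduce a spurious lane change.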

Some further research is possible. For instance, the likelihood function in the data fusion step currently assigns equal
importance weights to the similarity of passage time and of passage speed. As a direction for further studies, a sensitivity analysis
could be performed to find the best combination of weights. Although the bias of the reconstructed trajectories is strongly reduced
at the loop detector locations and effectively corrected between successive loop detector stations, some bias
remains, especially in sharp road curves. Identifying the road geometric design features associated with such residual
biases could help to estimate and reduce them. Finally, our algorithm exploits GPS trajectories from a small fraction of
traffic (~1-2%) and ignores the remaining vehicles even though the individual loop data contains valuable information on their
state; smartly exploiting this information to fine-tune the
trajectory reconstruction given constraints imposed by the other vehicles might be an interesting direction for future study.

7. Acknowledgement

The authors herewith acknowledge


and EU-CEF-project CONCORDA (Action 2016-EU-TM-0327-S) for financial support of this research. Also, the
authors acknowledge Be-Mobile for providing raw trajectory data and Bart van Dessel, Abdelkarim Bellafkih, Koen Rutten,
Steven Muylaert, and Wim van Calster of the Department of Mobility and Public Works of the Flemish Government for providing
the loop detector data and drone and CCTV videos.

8. References

Adamec, V., Herman, D., Schullerova, B., Urbanek, M., 2020. Modelling of traffic load by the DataFromSky System in the Smart City Concept, Smart Governance
for Cities: Perspectives and Experiences. Springer, pp. 135-152, https://doi.org/10.1007/978-3-030-22070-9_7.
Ahmed, I., Karr, A.F., Rouphail, N.M., Chase, R.T., Tanvir, S., 2022. Characterizing lane changing behavior and identifying extreme lane changing traits.
Transportation Letters, 1-15, https://doi.org/10.1080/19427867.2022.2066856.
Alvares, L.O., Oliveira, G., Heuser, C.A., Bogorny, V., 2009. A Framework for Trajectory Data Preprocessing for Data Mining, Twenty-First International
Conference on Software Engineering & Knowledge Engineering (SEKE), Boston, Massachusetts, pp. 698-702,
Arman, M.A., Tampère, C.M., 2021. Lane-level routable digital map reconstruction for motorway networks using low-precision GPS data. Transportation
Research Part C: Emerging Technologies 129, 103234, https://doi.org/10.1016/j.trc.2021.103234.
Asaithambi, G., Basheer, S., 2017. Analysis and Modeling of Vehicle Following Behavior in Mixed Traffic Conditions. Transportation Research Procedia 25,
5094-5103, https://doi.org/10.1016/j.trpro.2017.07.001.
Barmpounakis, E., Geroliminis, N., 2020. On the new era of urban traffic monitoring with massive drone data: The pNEUMA large-scale field experiment.
Transportation Research Part C: Emerging Technologies 111, 50-71, https://doi.org/10.1016/j.trc.2019.11.023.
Bu, Y., Chen, L., Fu, A.W.-C., Liu, D., 2009. Efficient anomaly monitoring over moving object trajectory streams. Association for Computing Machinery, Paris,
France, https://doi.org/10.1145/1557019.1557043.
Chen, G., Viana, A.C., Fiore, M., Sarraute, C., 2018. Individual trajectory reconstruction from mobile network data. [Technical Report] RT-0495. INRIA, Saclay-
Ile-de-France,
Chen, G., Viana, A.C., Fiore, M., Sarraute, C., 2019a. Complete Trajectory Reconstruction from Sparse Mobile Phone Data. EPJ Data Science 8(30),
https://doi.org/10.1140/epjds/s13688-019-0206-8.
Chen, J., Tian, S., Xu, H., Yue, R., Sun, Y., Cui, Y., 2019b. Architecture of vehicle trajectories extraction with roadside LiDAR serving connected vehicles. IEEE
Access 7, 100406-100415, https://doi.org/10.1109/ACCESS.2019.2929795.
Coifman, B., Li, L., 2017. A critical evaluation of the Next Generation Simulation (NGSIM) vehicle trajectory dataset. Transportation Research Part B:
Methodological 105, 362-377, https://doi.org/10.1016/j.trb.2017.09.018.
Dia, H., Thomas, K., 2011. Development and evaluation of arterial incident detection models using fusion of simulated probe vehicle and loop detector data.
Information Fusion 12(1), 20-27, https://doi.org/10.1016/j.inffus.2010.01.001.
Emrich, T., Kriegel, H.-P., Mamoulis, N., Renz, M., Zufle, A., 2012. Querying uncertain spatio-temporal data, 2012 IEEE 28th International Conference on Data
Engineering. IEEE, pp. 354-365, https://doi.org/10.1109/ICDE.2012.94.
Faouzi, N.-E.E., Leung, H., Kurian, A., 2011. Data fusion in intelligent transportation systems: Progress and challenges - A survey. Information Fusion 12(1),
4-10, https://doi.org/10.1016/j.inffus.2010.06.001.
Feng, Y., Sun, J., Chen, P., 2015. Vehicle trajectory reconstruction using automatic vehicle identification and traffic count data. Journal of advanced transportation
49(2), 174-194, https://doi.org/10.1002/atr.1260.
Feng, Z., Zhu, Y., 2016. A survey on trajectory data mining: Techniques and applications. IEEE Access 4, 2056-2067,
https://doi.org/10.1109/ACCESS.2016.2553681.
Gurupackiam, S., Jones Jr, S.L., 2012. Empirical Study Of Accepted Gap And Lane Change Duration Within Arterial Traffic Under Recurrent And Non-recurrent
Congestion. International Journal for Traffic & Transport Engineering 2(4), http://dx.doi.org/10.7708/ije.2012.2(4).02.
Hall, D.L., Llinas, J., 1997. An introduction to multisensor data fusion. Proceedings of the IEEE 85(1), 6-23, https://doi.org/10.1109/5.554205.
Hansen, B.E., 2005. Exact mean integrated squared error of higher order kernel estimators. Econometric Theory 21(6), 1031-1057,
https://doi.org/10.1017/S0266466605050528.
Hoogendoorn, S.P., Van Zuylen, H.J., Schreuder, M., Gorte, B., Vosselman, G., 2003. Traffic Data Collection from Aerial Imagery. IFAC Proceedings Volumes
36(14), 89-94, https://doi.org/10.1016/S1474-6670(17)32401-1.
Jin, C.-J., Knoop, V.L., Li, D., Meng, L.-Y., Wang, H., 2019. Discretionary lane-changing behavior: empirical validation for one realistic rule-based model.
Transportmetrica A: transport science 15(2), 244-262, https://doi.org/10.1080/23249935.2018.1464526.
Kanagaraj, V., Asaithambi, G., Toledo, T., Lee, T.-C., 2015. Trajectory data and flow characteristics of mixed traffic. Transportation Research Record 2491(1),
1-11, https://doi.org/10.3141/2491-01.
Kaufmann, S., Kerner, B.S., Rehborn, H., Koller, M., Klenov, S.L., 2018. Aerial observations of moving synchronized flow patterns in over-saturated city traffic.
Transportation Research Part C: Emerging Technologies 86, 393-406, https://doi.org/10.1016/j.trc.2017.11.024.
Keyvan-Ekbatani, M., Knoop, V.L., Daamen, W., 2016. Categorization of the lane change decision process on freeways. Transportation research part C: emerging
technologies 69, 515-526, https://doi.org/10.1016/j.trc.2015.11.012.
Knoop, V.L., Hoogendoorn, S., Shiomi, Y., Buisson, C., 2012a. Quantifying the number of lane changes in traffic: Empirical analysis. Transportation research
record 2278(1), 31-41, https://doi.org/10.3141/2278-04.
Knoop, V.L., Wilson, R.E., Buisson, C., van Arem, B., 2012b. Number of lane changes determined by splashover effects in loop detector counts. IEEE
Transactions on Intelligent Transportation Systems 13(4), 1525-1534, https://doi.org/10.1109/TITS.2012.2190403.
Kovvali, V.G., Alexiadis, V., Zhang PE, L., 2007. Video-based vehicle trajectory data collection, Transportation Research Board 86th Annual Meeting,
Washington DC, United States,
Krajewski, R., Bock, J., Kloeker, L., Eckstein, L., 2018. The highd dataset: A drone dataset of naturalistic vehicle trajectories on german highways for validation
of highly automated driving systems, 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE, pp. 2118-2125,
https://doi.org/10.1109/ITSC.2018.8569552.
Lai, J.W., Chang, J., Ang, L.K., Cheong, K.H., 2020. Multi-level information fusion to alleviate network congestion. Information Fusion 63, 248-255,
https://doi.org/10.1016/j.inffus.2020.06.006.
Lau, B.P.L., Marakkalage, S.H., Zhou, Y., Hassan, N.U., Yuen, C., Zhang, M., Tan, U.X., 2019. A survey of data fusion in smart city applications. Information
Fusion 52, 357-374, https://doi.org/10.1016/j.inffus.2019.05.004.
Lee, J., Han, J., Li, X., 2008. Trajectory Outlier Detection: A Partition-and-Detect Framework, 2008 IEEE 24th International Conference on Data Engineering,
pp. 140-149, https://doi.org/10.1109/ICDE.2008.4497422.
Lee, W.-C., Krumm, J., 2011. Trajectory preprocessing, Computing with spatial trajectories. Springer, pp. 3-33, https://doi.org/10.1007/978-1-4614-1629-6_1.
Li, L., Jiang, R., He, Z., Chen, X.M., Zhou, X., 2020. Trajectory data-based traffic flow studies: A revisit. Transportation Research Part C: Emerging Technologies
114, 225-240, https://doi.org/10.1016/j.trc.2020.02.016.
Li, Y., Li, L., Ni, D., Zhang, Y., 2021. Comprehensive survival analysis of lane-changing duration. Measurement 182, 109707,
https://doi.org/10.1016/j.measurement.2021.109707.
Liu, Z., He, J., Zhang, C., Yan, X., Wang, C., Qiao, B., 2021. Vehicle trajectory extraction at the exit areas of urban freeways based on a novel composite
algorithms framework. Journal of Intelligent Transportation Systems, 1-19, https://doi.org/10.1080/15472450.2021.2021079.
Llinas, J., Hall, D.L., 1998. An introduction to multi-sensor data fusion, ISCAS'98. Proceedings of the 1998 IEEE International Symposium on Circuits and
Systems (Cat. No. 98CH36187). IEEE, pp. 537-540, https://doi.org/10.1109/ISCAS.1998.705329.
Ma, Y., Zhu, J., 2021. Left-turn conflict identification at signal intersections based on vehicle trajectory reconstruction under real-time communication conditions.
Accident Analysis & Prevention 150, 105933, https://doi.org/10.1016/j.aap.2020.105933.
Marczak, F., Daamen, W., Buisson, C., 2016. Empirical analysis of lane changing behavior at a freeway weaving section, in: Cohen, S., Yannis, G. (Eds.), Traffic
Management, Volume 3. Wiley, pp. 139-151, https://doi.org/10.1002/9781119307822.ch10.
Merry, K., Bettinger, P., 2019. Smartphone GPS accuracy study in an urban environment. PloS one 14(7), https://doi.org/10.1371/journal.pone.0219890.
Ni, D., Wang, H., 2008. Trajectory reconstruction for travel time estimation. Journal of Intelligent Transportation Systems 12(3), 113-125,
https://doi.org/10.1080/15472450802262307.
Niedermayer, J., Züfle, A., Emrich, T., Renz, M., Mamoulis, N., Chen, L., Kriegel, H.-P., 2013. Probabilistic nearest neighbor queries on uncertain moving object
trajectories. Proceedings of the VLDB Endowment 7(3), 205-216, https://doi.org/10.48550/arXiv.1305.3407.
Passchier, I., Netten, B.D., Wedemeijer, H., Maas, S.M., van Leeuwen, C.J., Schackmann, P.-P.M., 2013. DITCM roadside facilities for cooperative systems
testing and evaluation, 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013). IEEE, pp. 936-942,
https://doi.org/10.1109/ITSC.2013.6728352.
Pokrajac, D., Lazarevic, A., Latecki, L.J., 2007. Incremental Local Outlier Detection for Data Streams, 2007 IEEE Symposium on Computational Intelligence and
Data Mining, pp. 504-515, https://doi.org/10.1109/CIDM.2007.368917.
Pulshashi, I.R., Bae, H., Choi, H., Mun, S., 2018. Smoothing of Trajectory Data Recorded in Harsh Environments and Detection of Outlying Trajectories,
Proceedings of the 7th International Conference on Emerging Databases. Springer Singapore, Singapore, pp. 89-98, https://doi.org/10.1007/978-981-10-6520-0_10.
Quddus, M.A., Noland, R.B., Ochieng, W.Y., 2006. A High Accuracy Fuzzy Logic Based Map Matching Algorithm for Road Transport. Journal of Intelligent
Transportation Systems 10(3), 103-115, https://doi.org/10.1080/15472450600793560.
Quddus, M.A., Ochieng, W.Y., Noland, R.B., 2007. Current map-matching algorithms for transport applications: State-of-the art and future research directions.
Transportation Research Part C: Emerging Technologies 15(5), 312-328, https://doi.org/10.1016/j.trc.2007.05.002.
Raju, N., Kumar, P., Jain, A., Arkatkar, S.S., Joshi, G., 2018. Application of trajectory data for investigating vehicle behavior in mixed traffic environment.
Transportation Research Record 2672(43), 122-133, https://doi.org/10.1177/0361198118787364.
Rao, W., Wu, Y.-J., Xia, J., Ou, J., Kluger, R., 2018. Origin-destination pattern estimation based on trajectory reconstruction using automatic license plate
recognition data. Transportation Research Part C: Emerging Technologies 95, 29-46, https://doi.org/10.1016/j.trc.2018.07.002.
Rodrigues, F., Markou, I., Pereira, F.C., 2019. Combining time-series and textual data for taxi demand prediction in event areas: A deep learning approach.
Information Fusion 49, 120-129, https://doi.org/10.1016/j.inffus.2018.07.007.
Schuessler, N., Axhausen, K.W., 2009. Processing Raw Data from Global Positioning Systems without Additional Information. Transportation Research Record
2105(1), 28-36, https://doi.org/10.3141/2105-04.
Shariat, A., Kalantari, N., Khashaiepour, M., Arman, M.A., Fard, M.R., Abedini, M., Babaie, S., 2011. Calibration guide for microscopic traffic simulation
software for use in Tehran (in Persian). Ava-e-Fahim, Tehran, Iran.
Sun, Z., Ban, X., 2013. Vehicle trajectory reconstruction for signalized intersections using mobile traffic sensors. Transportation Research Part C: Emerging
Technologies 36, 268-283, https://doi.org/10.1016/j.trc.2013.09.002.
Toledo, T., Zohar, D., 2007. Modeling duration of lane changes. Transportation Research Record 1999(1), 71-78, https://doi.org/10.3141/1999-08.
Treiterer, J., Myers, J., 1974. The hysteresis phenomenon in traffic flow. Transportation and Traffic Theory 6, 13-38.
Vajakas, T., Vajakas, J., Lillemets, R., 2015. Trajectory reconstruction from mobile positioning data using cell-to-cell travel time information. International
Journal of Geographical Information Science 29(11), 1941-1954, https://doi.org/10.1080/13658816.2015.1049540.
van Beinum, A., Farah, H., Wegman, F., Hoogendoorn, S., 2018. Driving behaviour at motorway ramps and weaving segments based on empirical trajectory data.
Transportation Research Part C: Emerging Technologies 92, 426-441, https://doi.org/10.1016/j.trc.2018.05.018.
Wang, J., Fu, T., Xue, J., Li, C., Song, H., Xu, W., Shangguan, Q., 2022. Realtime wide-area vehicle trajectory tracking using millimeter-wave radar sensors and
the open TJRD TS dataset. International Journal of Transportation Science and Technology, https://doi.org/10.1016/j.ijtst.2022.02.006.
Wang, Y., Wei, L., Chen, P., 2020. Trajectory reconstruction for freeway traffic mixed with human-driven vehicles and connected and automated vehicles.
Transportation Research Part C: Emerging Technologies 111, 135-155, https://doi.org/10.1016/j.trc.2019.12.002.
Wei, L., Wang, Y., Chen, P., 2020. A particle filter-based approach for vehicle trajectory reconstruction using sparse probe data. IEEE Transactions on Intelligent
Transportation Systems 22(5), 2878-2890, https://doi.org/10.1109/TITS.2020.2976671.
Xie, P., Li, T., Liu, J., Du, S., Yang, X., Zhang, J., 2020. Urban flow prediction from spatiotemporal data using machine learning: A survey. Information Fusion
59, 1-12, https://doi.org/10.1016/j.inffus.2020.01.002.
Xie, X., van Lint, H., Verbraeck, A., 2018. A generic data assimilation framework for vehicle trajectory reconstruction on signalized urban arterials using particle
filters. Transportation Research Part C: Emerging Technologies 92, 364-391, https://doi.org/10.1016/j.trc.2018.05.009.
Yao, W., Zeng, Q., Lin, Y., Xu, D., Zhao, H., Guillemard, F., Geronimi, S., Aioun, F., 2016. On-road vehicle trajectory collection and scene-based lane change
analysis: Part II. IEEE Transactions on Intelligent Transportation Systems 18(1), 206-220, https://doi.org/10.1109/TITS.2016.2571724.
Yao, W., Zhao, H., Davoine, F., Zha, H., 2012. Learning lane change trajectories from on-road driving data, 2012 IEEE Intelligent Vehicles Symposium. IEEE,
pp. 885-890, https://doi.org/10.1109/IVS.2012.6232190.
Yuan, N.J., Zheng, Y., Zhang, L., Xie, X., 2013. T-finder: A recommender system for finding passengers and vacant taxis. IEEE Transactions on Knowledge and
Data Engineering 25(10), 2390-2403, https://doi.org/10.1109/TKDE.2012.153.
Zhan, W., Sun, L., Wang, D., Shi, H., Clausse, A., Naumann, M., Kummerle, J., Konigshof, H., Stiller, C., de La Fortelle, A., 2019. Interaction dataset: An
international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps. arXiv preprint arXiv:1910.03088,
https://doi.org/10.48550/arXiv.1910.03088.
Zhao, H., Wang, C., Lin, Y., Guillemard, F., Geronimi, S., Aioun, F., 2016. On-road vehicle trajectory collection and scene-based lane change analysis: Part I.
IEEE Transactions on Intelligent Transportation Systems 18(1), 192-205, https://doi.org/10.1109/TITS.2016.2571726.
Zhou, X., Li, C., Yuan, X., Xia, B., Mao, G., Xiong, L., 2016. A Novel Method for Smoothing Raw GPS Data with Low Cost and High Reliability, 2016 IEEE
84th Vehicular Technology Conference (VTC-Fall), pp. 1-5, https://doi.org/10.1109/VTCFall.2016.7880866.
Author Statement

June 21, 2022


Dear respected editor,

We declare that all persons who meet the authorship criteria are listed as authors, and all
authors certify that they have participated sufficiently in the work to take public responsibility
for the content, including participation in the concept, design, analysis, writing, or revision of
the manuscript. Furthermore, each author certifies that this material or similar material has not
been and will not be submitted to or published in any other publication before its appearance
in Transportation Research Part C: Emerging Technologies. The table below summarizes
the contributions of the authors according to the relevant CRediT roles:

CRediT roles Contribution by Chris M.J. Tampère Contribution by Mohammad Ali Arman
Conceptualization
Methodology
Software (Programming)
Validation
Formal analysis
Investigation
Resources
Data Curation
Writing - Original Draft
Writing - Review & Editing
Visualization
Supervision
Project administration
Funding acquisition

Sincerely yours,
Chris M.J. Tampère and Mohammad Ali Arman
Centre for Industrial Management, Traffic, and Infrastructure; KU Leuven; Celestijnenlaan 300;
3001 Leuven, Belgium
