
Urban Freight Delivery Stop Identification with GPS Data


Xia Yang, Zhanbo Sun, Xuegang J. Ban, and José Holguín-Veras

Delivery stop identification is a crucial but challenging step in the measurement of urban freight performance. This paper presents the application of a robust learning method, the support vector machine (SVM), to identifying delivery stops with GPS data. The duration of a stop, the distance from a stop to the center of the city, and the distance to a stop's closest major bottleneck were extracted as the three features used in the SVM model. A linear SVM with nested K-fold cross validation proved to be highly reliable and robust in identifying delivery stops with relatively long stop durations, such as those made for grocery stores. Second-by-second freight GPS data collected in New York City were used to conduct a case study. The identification accuracy for the case study was higher than 99% for the training and testing data sets.

X. Yang and Z. Sun, Room JEC 5107; X. J. Ban, Room JEC 4034; and José Holguín-Veras, Room JEC 4030, Department of Civil and Environmental Engineering, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180-3590. Corresponding author: X. J. Ban, banx@rpi.edu.

Transportation Research Record: Journal of the Transportation Research Board, No. 2411, Transportation Research Board of the National Academies, Washington, D.C., 2014, pp. 55–61.
DOI: 10.3141/2411-07

Freight performance measurement is intended to assist public agencies and private freight shipping companies to monitor and improve freight performance pertaining to mobility and energy and environmental impacts. This study focused on urban freight that mainly delivers commodities to various locations in urban areas from warehouses. The unique feature of urban freight is therefore that it is tour-centric. A tour refers to an entire urban freight trip, starting at a warehouse, making multiple deliveries or pickups in the middle, and usually ending at the same warehouse. The deliveries or pickups usually take a significant amount of time (from several minutes to a few hours) and are called delivery stops in this paper. Correctly identifying delivery stops is essential to characterize urban freight deliveries, such as tours, tour durations, delivery times, and number of stops, among others. Traditionally, delivery stop identification was done via surveys or driver logs, which consumed time and resources and could only cover limited urban areas.

GPS data have recently gained popularity in measuring freight performance because of the advantages of GPS technologies over traditional tools such as loop detectors and traffic cameras. Instead of gathering data from only a finite number of network locations, GPS units can capture continuous vehicle traces. GPS units are also much smaller and less expensive than traditional data-gathering tools. However, GPS data have limitations, for example, uncertainty about vehicle types (although vehicle class may be inferred from traces (1)), signal loss and spatial inaccuracy caused by urban canyons (e.g., tunnels and tall buildings), difficulties in data cleaning (2–4), and so forth.

The use of GPS data to identify urban freight delivery stops has proved to be challenging. Until recently, the only data provided by GPS units indicating that a vehicle may have stopped was the absence of communications between the satellite and the receiver (pings). For example, if the vehicle speed drops below a certain value, there will be no pings, indicating a possible vehicle stop. As a result, it may be difficult to make a distinction between vehicle stops (such as traffic stops or deliveries) and signal loss. Furthermore, even after a vehicle stop can be properly determined, it is still necessary to distinguish between delivery stops and nondelivery stops. To this end, proper algorithmic techniques need to be applied.

One of the first attempts to discuss the stop identification issue was by Du and Aultman-Hall (4), who analyzed the GPS data collected from a subset of travelers in Lexington, Kentucky, between March 2002 and July 2000, and calibrated the data with manual travel logs provided by the participants. With the two data sets, Du and Aultman-Hall identified the heading change (compass direction), dwell time (time elapsed while vehicle speed drops below a certain level), and distance between the GPS points and the network geometry as main parameters for trip end identification (4). It was found that the proper benchmarks for detecting trip ends were a heading change of 180° or a dwell time between 20 and 140 s. However, the study focused on passenger cars. Since passenger cars typically take less time to park, it is likely that a lower vehicle dwell time can be an acceptable threshold for identifying a trip end for a passenger car. This may not be the case for commercial vehicles.

Greaves and Figliozzi developed an algorithm to identify the stops for commercial vehicles (3). The algorithm analyzed the time difference between GPS-to-satellite communications to determine whether a vehicle was stopped. It was found that 240 s was an adequate threshold to indicate a stop. In addition to the time threshold, the geographic distance between the locations of a vehicle at consecutive communications was also considered. If a vehicle had moved more than the accuracy rating of the device (e.g., 6 m), it was determined that the signal had been lost. The researchers also tagged any points where the vehicle position changed less than 6 m, regardless of the time elapsed, to identify short stops by manual inspection. The limitation of the algorithm was that many of the results relied on manual inspections, which may be biased and also time-consuming for a large data set. The algorithm is further strained in urban areas such as New York City, where a commercial vehicle may have to move several times for the same delivery because of the extremely demanding spatial constraints of the Manhattan traffic network (e.g., numerous one-way streets, tall buildings, and pedestrian traffic). Since the algorithm needs to flag such short trips for manual inspection, it may end up manually checking every delivery site.

McCormack et al. analyzed data from the Seattle, Washington, metropolitan area (2). They used an algorithm that recorded delivery

56 Transportation Research Record 2411

stops when the vehicle's dwell time (i.e., time that the vehicle engine is off or idle) exceeded 3 min (180 s). It was reported that in addition to insignificant truck movement, the GPS points tended to fluctuate when a truck idled. To deal with this issue, if the distance between two consecutive data points was less than 65 ft, the instance was removed. Although this algorithm is effective at filtering spurious trips, it also removes data that could be significant for other freight performance measures, such as service times (i.e., how long it takes for the truck to unload and start the next trip).

The purpose of this study was to develop a robust method to identify urban freight delivery stops from fine-grained (second-by-second) GPS data. The data were collected from a trucking company (the name of the company cannot be released because of the nondisclosure agreement signed by the research team with the company) that delivers groceries to multiple stores in New York City, especially Manhattan. The key technique applied in the algorithm is the support vector machine (SVM). SVM is a recently developed pattern classifier, which can be used for two-group or multiple-group classification (5, 6). In transportation, SVM has been used for incident detection, signal timing estimation, vehicle classification, and regression analysis, among others (1, 7–9). A three-stage algorithm was developed based on linear SVM with nested K-fold cross validation, which proved to be effective in identifying delivery stops with relatively long stop durations, such as those made for grocery stores. The first stage is to identify all the stops based on the speeds as recorded in the second-by-second GPS data. In the second stage, three features of stops are extracted: stop duration; distance from a stop to the center of the city; and binary distance to the closest major bottleneck, such as a tunnel or a toll booth. The third stage is to implement the linear SVM with nested K-fold cross validation to classify all the stops into delivery stops and nondelivery stops. The accuracy of classification is higher than 99% for the training and testing data sets.


Fundamentals of SVM

SVM is a pattern classifier (5, 7). It first represents training data in a transformed feature space so that the points can be separated by a hyperplane with the largest margin between the two classes. The testing data are then mapped into the same space and classified according to the side of the separating hyperplane on which they fall. For linearly inseparable data, kernel methods can be applied to transform the original input space into a higher dimensional feature space where the transformed data become linearly separable (8). In this paper, since a linear SVM model for two-group classification is sufficiently accurate in identifying urban freight delivery stops, only linear SVMs will be briefly presented in this section. Vapnik (5), Bishop (6), and Gunn (10) provide more details on SVM.

For linear SVMs, training data with n data points are denoted by D, that is, $D = \{(x_i, y_i) \mid x_i \in R^p,\ y_i \in \{-1, 1\},\ i = 1, \ldots, n\}$, where $x_i$ is a p-dimensional real vector and $y_i$ is a binary value indicating the class that the point $x_i$ belongs to. Vector x is also called the features of the SVM. Features can be understood as the most salient characteristics of the data points for classification purposes. For the method developed in this paper, there are three features, so p = 3. The main task of SVM is to find the separating hyperplane that divides the two classes as clearly as possible. A hyperplane can be defined by the following equation:

$$ w \cdot x - b = 0 \qquad (1) $$

where
$\cdot$ = dot product,
x = input vector ($x = (x_1, \ldots, x_p)$),
w = vector perpendicular to the hyperplane, and
b = constant.

Figure 1 is a simple illustration of SVM with two-dimensional input (p = 2), in which the hyperplane reduces to a straight line in the two-dimensional space.

FIGURE 1   Optimal separating hyperplane for two-dimensional, two-class problem (axes $x_1$ and $x_2$; the lines $w \cdot x - b = 1$, $w \cdot x - b = 0$, and $w \cdot x - b = -1$ are marked, together with the margin $2/\|w\|$ and the offset $b/\|w\|$).

Separated by the hyperplane, all the training data must satisfy the following constraints:

$$ w \cdot x_i - b \geq +1 \quad \text{for all } y_i = +1 \qquad (2) $$

$$ w \cdot x_i - b \leq -1 \quad \text{for all } y_i = -1 \qquad (3) $$

which is equivalent to

$$ y_i (w \cdot x_i - b) \geq 1 \quad \text{for all } i = 1, \ldots, n \qquad (4) $$

As shown in Figure 1, the best hyperplane is the one that not only correctly separates the data, but also maximizes the margin, that is, the distance from the hyperplane to the closest vectors in the two classes (5). In this paper, the delivery stops (y = 1) and nondelivery stops (y = −1) are separated by a three-feature (p = 3) linear SVM classifier.


SVM for Urban Freight Delivery Stop Identification

SVM was applied to identify urban freight delivery stops. A case study was conducted with second-by-second GPS data of delivery tours in the New York metropolitan area. The deliveries were made to a New York City-based chain of small supermarkets, serving mostly urban customers. Most stores were within Manhattan, including Roosevelt Island, except for two stores in Brooklyn and Scarsdale. The driver log data recorded by drivers of the delivery trucks were also available, which provided the ground truth delivery stops of the tours.
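As a rough illustration of the linear SVM conditions in Equations 1 through 4, the following Python sketch checks whether a candidate hyperplane (w, b) separates a labeled sample with the required margin. The function names and the toy points are illustrative assumptions and are not values from the case study; the margin width 2/||w|| is the standard result for this formulation.

```python
import math

def decision_value(w, x, b):
    """Left-hand side of Equation 1: w . x - b."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b

def satisfies_margin(w, b, points, labels):
    """True if every point meets Equation 4: y_i * (w . x_i - b) >= 1."""
    return all(y * decision_value(w, x, b) >= 1 for x, y in zip(points, labels))

def margin_width(w):
    """Distance between the hyperplanes w . x - b = +1 and w . x - b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical two-dimensional example (p = 2); not data from the paper.
w, b = (1.0, 0.0), 0.0
points = [(2.0, 1.0), (-1.5, 3.0)]
labels = [1, -1]
print(satisfies_margin(w, b, points, labels), margin_width(w))
```

Among all (w, b) pairs that pass this check, the SVM selects the one with the largest margin width, which is equivalent to the smallest ||w||.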
Yang, Sun, Ban, and Holguín-Veras 57

Samples of the driver log data and GPS data are provided in Table 1 and Table 2, respectively. The latitude and longitude information from the GPS data and the store number and name from the log data were removed to protect the privacy of the data providers.

TABLE 1   Driver's Delivery Log for February 27, 2013

Store Number   Store Name   Commit Time        Time In     Time Out     Pieces   Weight (lb)   Time at Store (h:min)   On Time
(removed)      (removed)    5:00–11:00 p.m.    6:15 p.m.   7:00 p.m.    250      5,017         0:45                    OK
(removed)      (removed)    5:00–11:00 p.m.    7:30 p.m.   8:00 p.m.    307      5,349         0:30                    OK
(removed)      (removed)    5:00–11:00 p.m.    8:30 p.m.   9:00 p.m.    463      8,513         0:30                    OK
(removed)      (removed)    5:00–11:00 p.m.    9:30 p.m.   10:00 p.m.   278      5,202         0:30                    OK

Note: Data in some columns in Table 1 and Table 2 are purposely removed due to privacy agreements.

TABLE 2   Second-by-Second GPS Data Set

Index   Date        Time       Latitude    N/S         Longitude   E/W         Height (m)   Speed (km/h)   Heading
54      2/27/2013   20:54:02   (removed)   (removed)   (removed)   (removed)   136          6              0
55      2/27/2013   20:54:03   (removed)   (removed)   (removed)   (removed)   136          7              0
56      2/27/2013   20:54:04   (removed)   (removed)   (removed)   (removed)   136          10             147
57      2/27/2013   20:54:05   (removed)   (removed)   (removed)   (removed)   135          10             149
58      2/27/2013   20:54:06   (removed)   (removed)   (removed)   (removed)   134          10             149
59      2/27/2013   20:54:07   (removed)   (removed)   (removed)   (removed)   134          12             154

A three-stage approach was developed with SVM to identify delivery stops from GPS data. The first stage is to preprocess the driver log data to obtain all the ground truth delivery stops and the second-by-second GPS data to find all the stops, including delivery stops and nondelivery stops. The second stage is to extract features from the GPS data and to determine which combination of the features is the most effective to use in the SVM model. The last stage is to implement the SVM for urban freight delivery identification.


Data Preprocessing

The driver log is first processed to obtain the actual delivery stops. As shown in Table 1, detailed information about delivery stops can be easily obtained from the driver log, including the store number, arrival time and departure time at each store, and delivery time. There were 42 delivery stops in total, and the minimum delivery time was around 15 min. Since the drivers always round the arrival or departure times to the nearest 5 min, the actual minimum delivery time may vary from 10 to 20 min.

The second-by-second GPS data are then processed to get all the stops. Because of the urban canyon effect produced by tall buildings and long tunnels, a speed threshold of 14 km/h was used to detect vehicle stops so as to capture all potential stops. It was found that the difference in stop durations between a speed threshold of 14 km/h and one of 0 km/h is within 30 s, which is negligible relative to the minimum delivery time according to the driver log (10 to 20 min). Since delivery trucks usually make minor movements around the stop location (e.g., move from one parking spot to another, circle to find a parking spot), multiple spurious vehicle stops may be generated. To resolve this issue, consecutive stops that were less than 10 s apart were combined as a single stop. With the aforementioned data processing method, 2,249 stops were detected from the GPS data. Most of the identified stops were nondelivery stops, while only 42 were delivery stops, which were validated with the driver log data.


Feature Extraction

The second stage is to extract features from GPS data that will be used in the SVM model. A feature is a prominent characteristic of the data that can be used to classify samples into multiple groups. It was observed from the driver log data that the stop duration of a delivery stop is usually longer than that of a nondelivery stop. Therefore, stop durations were first extracted as a feature. A scatter plot of the stop durations is shown in Figure 2.

Figure 2 shows that stop duration is an important feature for delivery stop identification. Since a significant amount of data (the squares and stars) overlap each other, it is difficult to distinguish delivery stops from nondelivery stops based on this single feature. In fact, if stop duration is used as the only feature for classification, there would be 30 nondelivery stops misidentified as delivery stops (compared with 42 delivery stops in total), although all the delivery stops can be correctly identified. Therefore, it was necessary to consider other features.

By scrutinizing Figure 2, it was found that the largest challenge was in distinguishing long nondelivery stops from delivery stops. This is illustrated in Figure 3. All the long stops (with durations longer than 500 s) are plotted in the figure: dots represent delivery stops and asterisks represent nondelivery stops. It is found that the delivery stops are concentrated in New York City while the nondelivery stops are farther away or close to the bottlenecks. This finding led to considering the distance from a stop to the center of Manhattan (great-circle distance) and the distance from a stop to its closest bottleneck as other features for classification.

The great-circle distance (d) between two geographical locations is calculated with Equation 5:

$$ d = r \, \Delta\hat{\sigma} \qquad (5) $$

where r is the radius of Earth (6,371,009 m) and $\Delta\hat{\sigma}$ is the central angle between the two points, calculated by Equation 6:

$$ \Delta\hat{\sigma} = \arctan\left( \frac{\sqrt{(\cos\phi_f \sin\Delta\lambda)^2 + (\cos\phi_s \sin\phi_f - \sin\phi_s \cos\phi_f \cos\Delta\lambda)^2}}{\sin\phi_s \sin\phi_f + \cos\phi_s \cos\phi_f \cos\Delta\lambda} \right) \qquad (6) $$

where $\phi_s, \lambda_s$ and $\phi_f, \lambda_f$ are the geographical latitudes and longitudes of the two points, respectively (s and f stand for the two locations, such as the vehicle's stop location and a toll booth), and $\Delta\phi, \Delta\lambda$ are their absolute differences. This feature and the first feature (stop duration) are plotted in Figure 4.

FIGURE 2   Stop duration of all stops (vertical axis: stop duration in minutes; horizontal axis: stop ID number after data sorting based on stop type; series: nondelivery stops and delivery stops).

FIGURE 3   Long stops with duration over 500 s (nondelivery stops are marked by stars, and the large star on the left represents 22 nondelivery stops that are farther west and are not included in the selected region).

Figure 4 shows that the stop duration and the distance to central Manhattan are two effective features to make reliable identification of delivery stops, but there are still some overlaps of the two types of stops (the squares and asterisks). The distance of the stop to its closest major bottleneck was considered as the third feature. The reasoning was that a vehicle may experience large delays at major bottlenecks (such as toll booths, tunnels, or bridges) in New York City. In this study, only regular bottlenecks, such as toll booths and entrances to tunnels or bridges, were considered. Other variable bottlenecks, such as those caused by accidents, construction, and inclement weather, were not taken into account. However, a similar methodology can certainly be applied if information on variable bottlenecks becomes available.

FIGURE 4   Stop duration and distance to city center (horizontal axis: stop duration in minutes; vertical axis: distance to the center of Manhattan in kilometers; series: nondelivery stops and delivery stops).

A binary value was set to indicate the distance from a stop to its closest major bottleneck: the distance was recorded as 1 if the great-circle distance was less than or equal to 1,000 m and recorded as 0 otherwise. It was found that the classification results were not very sensitive to this threshold. The selected bottlenecks included the two ends of Lincoln Tunnel and the toll booth in Weehawken. A scatter plot of the three features is shown in Figure 5. In addition to the three features mentioned above, other related features were explored, such as the average speed during a vehicle stop and the change of heading direction. It was found that the three-feature SVM classifier provided the best results.

FIGURE 5   Stop duration and distance to city center and distance to closest bottleneck (axes: stop duration in minutes, distance to the center of Manhattan in kilometers, and binary distance to the closest bottleneck).
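The first two stages described above (stop detection from speeds, then extraction of the three features) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: a 1 Hz speed trace is assumed, a sample counts as stopped when its speed is strictly below the threshold, the gap between two stops is measured from the last stopped second of one to the first stopped second of the next, and all function and parameter names are hypothetical. The central angle follows Equation 6, with `atan2` used in place of a plain arctangent so that a zero or negative denominator is handled.

```python
import math

EARTH_RADIUS_M = 6_371_009  # radius of Earth used with Equation 5

def detect_stops(speeds_kmh, speed_threshold_kmh=14.0, merge_gap_s=10):
    """Return merged (start, end) second indices of stops in a 1 Hz speed trace."""
    raw, start = [], None
    for t, v in enumerate(speeds_kmh):
        if v < speed_threshold_kmh:          # assumed rule: strictly below threshold
            if start is None:
                start = t
        elif start is not None:
            raw.append((start, t - 1))
            start = None
    if start is not None:                    # trace ends while still stopped
        raw.append((start, len(speeds_kmh) - 1))
    merged = []
    for s, e in raw:
        # Combine consecutive stops that are less than merge_gap_s apart.
        if merged and s - merged[-1][1] < merge_gap_s:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged

def great_circle_m(lat_s, lon_s, lat_f, lon_f):
    """Great-circle distance in meters (Equations 5 and 6); inputs in degrees."""
    phi_s, phi_f = math.radians(lat_s), math.radians(lat_f)
    d_lam = math.radians(abs(lon_f - lon_s))
    num = math.hypot(
        math.cos(phi_f) * math.sin(d_lam),
        math.cos(phi_s) * math.sin(phi_f)
        - math.sin(phi_s) * math.cos(phi_f) * math.cos(d_lam),
    )
    den = (math.sin(phi_s) * math.sin(phi_f)
           + math.cos(phi_s) * math.cos(phi_f) * math.cos(d_lam))
    return EARTH_RADIUS_M * math.atan2(num, den)

def stop_features(stop, lat, lon, center, bottlenecks, near_m=1000.0):
    """Return (duration in min, distance to center in km, binary bottleneck flag)."""
    start, end = stop
    duration_min = (end - start + 1) / 60.0
    dist_center_km = great_circle_m(lat, lon, *center) / 1000.0
    near = int(any(great_circle_m(lat, lon, b_lat, b_lon) <= near_m
                   for b_lat, b_lon in bottlenecks))
    return duration_min, dist_center_km, near

# Hypothetical trace and coordinates, for illustration only.
stops = detect_stops([30, 30, 5, 0, 0, 20, 3, 0, 40, 40])
features = stop_features(stops[0], 0.0, 0.0, center=(0.0, 0.0),
                         bottlenecks=[(0.0, 0.005)])
print(stops, features)
```

In the example trace, the brief movement at second 5 separates two raw stops that are only 2 s apart, so they are combined into one stop, which mirrors the treatment of minor movements around a store.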

Implementation and Results

A nested K-fold cross-validation procedure was used to validate the proposed SVM model. This procedure is well established and commonly used to tune and validate classification models (see Alexander et al. (11) for an in-depth discussion). Use of this procedure simultaneously guaranteed that optimal classifiers could be learned and the model predictions were unbiased and non-overfitted.

In the nested K-fold cross validation, the original data set was first partitioned into K equal subsets. Then K − 1 subsets were used as the training data set while the remaining subset was used as the testing data set to measure the performance of each trained model. During the training procedure, the best model was selected based on the grid search ((K − 1)-fold) of all possible combinations of parameters ($\alpha_i$, i = 1, . . . , m). The pseudocode of this nested K-fold cross-validation procedure (rewritten based on Alexander et al. (11)) is summarized below:

1. Repeat K times.
   – Select K − 1 subsets as the training set.
   – Keep the remaining subset as the testing set.
   1.1  Repeat for i = 1, 2, . . . , m.
        a.  Repeat for j = 1, 2, . . . , K − 1 (for samples only in the training set). Select K − 2 subsets as the training validation set; keep the remaining subset as the testing validation set; train a classifier on the training validation set with parameter $\alpha_i$; test the classifier on the testing validation set.
        b.  Record p(i), the average performance over the K − 1 testing validation sets.
   1.2  Determine $\alpha_j$, where j = argmax(p(i)) for i = 1, 2, . . . , m.
   1.3  Train the classifier on the training set with parameter $\alpha_j$.
   – Test the classifier trained in Step 1.3 on the testing set.
2. Return $\bar{p}$, the average performance over the K testing sets.

This study implemented the linear SVM with nested 10-fold cross validation (K = 10). The results are shown in Table 3. The accuracy is defined as the number of correctly classified stops over the total number of stops; false positive is defined as the number of nondelivery stops misclassified as delivery stops; and false negative is defined as the number of delivery stops misclassified as nondelivery stops.

TABLE 3   Results for Use of GPS Data for Identification of Delivery Stops

Fold Number   Training Accuracy (%)   Testing Accuracy (%)   Testing False Positive   Testing False Negative
1             99.75                   100                    0                        0
2             99.80                   100                    0                        0
3             99.80                   100                    0                        0
4             99.80                   100                    0                        0
5             99.80                   99.56                  1                        0
6             99.80                   100                    0                        0
7             99.80                   100                    0                        0
8             99.80                   100                    0                        0
9             99.80                   100                    0                        0
10            99.80                   100                    0                        0
Average       99.80                   99.96                  0.1                      0

Table 3 shows that the results of the linear SVM with nested K-fold cross validation are highly reliable and robust in delivery stop identification with the use of the second-by-second GPS data. The average accuracy of the proposed model (across the K folds) is higher than 99% for the training and testing sets. The classification results of the testing sets have at most one misclassified stop.


Discussion of Results

This section will briefly discuss some of the key issues related to the development of the SVM method for freight delivery stop identification.


Data Preprocessing to Alleviate the Effects of Urban Canyons and Minor Movements Around Delivery Stores

A big problem in delivery stop identification is idling. To deal with this problem, a high speed threshold (14 km/h) was first used to identify a stop. Two successive stops less than 10 s apart were then combined, which was easy to implement. It turned out that even with the speed threshold set as high as 14 km/h, the stop duration deviation was within 30 s compared with 0 km/h, which is negligible relative to the actual delivery stop duration.

The issue of minor movements at stores was also solved by combining multiple consecutive stops that were less than 10 s apart into a single stop. To some extent, this method might not be suitable for deliveries in extremely congested areas, where circling over a large area for the same delivery can result in multiple stops. This extreme case, however, barely exists in reality because delivery companies always prefer to pay fines instead of circling for a long time to find a normal parking spot (12).


Manual Detection of Major Bottlenecks

The binary distance to a stop's closest major bottleneck was chosen as the third feature of the SVM model to classify stops, which required manual detection of major bottlenecks. To reduce the manual effort, only major bottlenecks that were on the delivery tours and close to the delivery region were inspected. This was done by examining the second-by-second GPS traces on digital maps. The bottlenecks could then be easily detected. Nonetheless, this process requires some manual work, which is a limitation of the proposed method.


Benefits of Applying Robust Learning Methods

The process of constructing the SVM model and generating the results highlights the importance of applying robust learning methods, in this case the SVM method, for delivery stop identification and freight performance measurement with GPS data in general. Such learning methods are highly adaptive, and key model parameters can be learned from the data. In essence, the methods learn from the data the best combinations of the thresholds of the three features used in the SVM model. Alternatively, the thresholds might be determined manually or automatically via experimental studies, for example, as in Greaves and Figliozzi (3) and McCormack et al. (2).

However, those simple methods suffer from at least two drawbacks. First, the methods require significant effort and sometimes trial and error to determine the thresholds. Second, and more critical, the methods can only produce simple combinations of the thresholds. For example (for a two-feature problem), according to the simple method, if Feature A is less than 100, and (or) Feature B is more than 200, the stop is a delivery stop. In reality, the best combinations of the features could be more complicated. For example, a more realistic method would be that if (a) Feature A is between 0 and 100 and Feature B is between 100 and 200 or (b) Feature A is between 100 and 200 and Feature B is more than 200, the stop is a delivery stop. In this case, simple, intuitive methods would have difficulty determining the thresholds. By contrast, robust learning methods such as SVM can easily deal with those cases by learning the structure and the actual thresholds. SVM methods will be more beneficial to use if the separating boundary is nonlinear, in which case the simple, intuitive method will obviously fail to work.

Similar to the previous simple methods, some threshold values were also needed to construct the SVM model. For example, during the preprocessing phase, 14 km/h was used to determine a stop and 10 s was used to combine consecutive stops. However, these threshold values are straightforward and can be readily set by looking at the data. Moreover, they are also needed by the simple, intuitive methods.


Conclusions and Future Research Directions

A linear SVM model with the nested K-fold cross-validation procedure was applied to identify urban freight delivery stops with second-by-second GPS data. A three-stage algorithm was developed including preprocessing of the GPS data, feature extraction, and implementation of the SVM model. The features that were extracted consisted of stop duration, the distance from a stop to the center of the city, and a stop's binary distance to its closest major bottleneck (e.g., toll booths, tunnels, or bridges). Variable bottlenecks such as accidents, construction, and inclement weather were not considered. A case study was conducted that used second-by-second GPS data on delivery tours in the New York City metropolitan area. The classification results showed that the proposed three-stage algorithm is highly reliable. The average accuracy of the model was higher than 99% for the training and testing data sets.

The three-stage SVM model proposed in this paper can be applied to urban freight delivery stop identification, especially to deliveries that are made for small supermarkets or other deliveries with relatively long stop durations at concentrated delivery destination areas. For a delivery stop with much shorter duration (e.g., a courier), the model may not be directly applicable. This issue will be addressed in future studies.

The results from the SVM-based freight delivery stop identification can be used to analyze urban freight delivery performance, such as mobility (speed, travel time, delivery time, and so forth) and fuel consumption and emissions. These topics will be addressed in future research, particularly for the purpose of evaluating freight performance for off-hour deliveries (12).

The SVM-based model developed in this study represents the first step in the use of robust learning methods to explore large urban data sets (GPS data in this case) for urban freight performance assessment. With the advent of urban Big Data, such a robust, learning-based, data-driven method is likely to have great potential for urban freight performance measurement and urban transportation modeling and management in general.


Acknowledgments

The research reported in this paper was supported by the Commercial Remote Sensing and Spatial Information Technologies' project Integrative Freight Demand Management in the New York City Metropolitan Area: Implementation Phase, which was part of the U.S. Department of Transportation's Research and Innovative Technology Administration (U.S. DOT RITA), and by the New York City Department of Transportation. The authors also acknowledge the support and guidance from Caesar Singh of U.S. DOT RITA. The support is both acknowledged and appreciated.


References

1. Sun, Z., and X. Ban. Vehicle Classification Using Mobile Traffic Sensors. Transportation Research Part C: Emerging Technologies, Vol. 37, 2013, pp. 102–117.
2. McCormack, E., X. Ma, C. Klocow, A. Currarei, and D. Wright. Developing a GPS-Based Truck Freight Performance Measure Platform. Washington State Department of Transportation, Olympia, 2010.
3. Greaves, S. P., and M. A. Figliozzi. Collecting Commercial Vehicle Tour Data with Passive Global Positioning System and Technology: Issues and Potential Applications. In Transportation Research Record: Journal of the Transportation Research Board, No. 2049, Transportation Research Board of the National Academies, Washington, D.C., 2008, pp. 158–166.
4. Du, J., and L. Aultman-Hall. Increasing the Accuracy of Trip Rate Information from Passive Multi-Day GPS Travel Datasets: Automatic Trip End Identification Issues. Transportation Research Part A: Policy and Practice, Vol. 41, 2007, pp. 220–232.
5. Vapnik, V. N. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
6. Bishop, C. M. Pattern Recognition and Machine Learning. Springer, New York, 2006.
7. Cortes, C., and V. Vapnik. Support-Vector Networks. Machine Learning, Vol. 20, 1995, pp. 273–297.
8. Yuan, F., and R. L. Cheu. Incident Detection Using Support Vector Machines. Transportation Research Part C: Emerging Technologies, Vol. 11, 2003, pp. 309–328.
9. Hao, P., X. Ban, K. Bennett, Q. Ji, and Z. Sun. Signal Timing Estimation Using Sample Intersection Travel Times. IEEE Transactions on Intelligent Transportation Systems, Vol. 13, 2012, pp. 792–804.
10. Gunn, S. Support Vector Machines for Classification and Regression. Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton, United Kingdom, 1998.
11. Alexander, S., T. Ioannis, D. Yerbolat, and F. A. Constantin. GEMS: A System for Automated Cancer Diagnosis and Biomarker Discovery from Microarray Gene Expression Data. International Journal of Medical Informatics, Vol. 74, 2005, pp. 491–503.
12. Holguín-Veras, J., J. Polimeni, B. Cruz, N. Xu, G. List, J. Nordstrom, and J. Haddock. Off-Peak Freight Deliveries: Challenges and Stakeholders' Perceptions. In Transportation Research Record: Journal of the Transportation Research Board, No. 1906, Transportation Research Board of the National Academies, Washington, D.C., 2005, pp. 42–48.

Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of U.S. DOT or New York City DOT.

The Freight Transportation Data Committee peer-reviewed this paper.
