Available online at www.sciencedirect.com
ScienceDirect
Transportation Research Procedia 69 (2023) 751–758
AIIT 3rd International Conference on Transport Infrastructure and Systems (TIS ROMA 2022),
15th-16th September 2022, Rome, Italy
Assessment of bus speed influencing factors through the
exploitation of machine learning techniques
Aristomenis Kopsacheilis*, Ioannis Politis, Georgios Georgiadis
Transport Engineering Laboratory, Department of Civil Engineering, Aristotle University of Thessaloniki, Thessaloniki, GR-54124, Greece
Abstract
Bus commercial speed is among the most influencing parameters of public transport quality of service. Literature has highlighted
many different factors that affect bus speed in different ways. This paper focuses on the development of predictive models for the
estimation of bus commercial speed, as well as on the determination of the effect of several variables on bus speed between
successive stops. For the need of predicting bus commercial speed, three (3) machine learning models were developed; a Support
Vector Regression (SVR); a Random Forest Regression (RFR); and an Artificial Neural Network (ANN). The models were based
on data referring to 10 lines of the bus network of Thessaloniki, Greece. Data included several features such as bus dwell time,
passenger demand, bus stop location, bus stop infrastructure etc. Results indicate a satisfactory performance by all models, while
the best accuracy was achieved from the ANN model. Afterwards, the analysis focused on the determination of the effect of the
explanatory variables on commercial bus speed. To achieve that, the Shapley Additive Explanations (SHAP) algorithm was applied,
based on the ANN model. Bus stop spacing was identified as the most contributing feature to the model’s output; distance of more
than 450 meters between 2 successive stops was found to have a positive effect on bus speed. Single-shelter bus stops were
associated with higher bus commercial speed, while the location of bus stops in the middle of a building block proved to affect bus
speed in a positive way. Dwell time was also a crucial parameter, having a decreasing effect on bus speed, as its value increases.
Additionally, it was found that an increase in the number of traffic lights and turns along a route section, affects bus speed
negatively. Finally, no apparent relation was observed between total passenger traffic and bus speed.
© 2023 The Authors. Published by ELSEVIER B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the Transport Infrastructure and Systems (TIS ROMA 2022)
Keywords: Public transport; Bus commercial speed; Bus quality of service; Machine Learning; Shapley Additive Explanations
* Corresponding author.
E-mail address: kopsacheilis@civil.auth.gr
2352-1465 © 2023 The Authors. Published by ELSEVIER B.V.
10.1016/j.trpro.2023.02.232
1. Introduction
A major component that influences the sustainability of a transport system, and by extension sustainable development
overall, is the modal share of public transport (Banister, 2011). The contribution of public transport to environmental
sustainability derives not only from the reduced share of the private car, but also from the encouragement of
non-motorised modes (Mohan & Tiwari, 1999). The United Nations has prioritized the promotion of public transport
use by 2030 (United Nations, 2015), since it recognizes its contribution to achieving sustainable development. At the
same time, the European Commission aims at increasing the sustainability of the transport system by encouraging the
shift from private vehicles to shared ones (European Commission, 2011).
Since sustainable development has an economic aspect as well, operators need to place their efforts in making
public transport systems more efficient, in order to increase their level of service and therefore achieve a higher
economic sustainability (Kennedy, 2002) and increase their ridership. Public transport ridership could be influenced
by various parameters, such as bus fare, service coverage and reliability. Furthermore, changes in bus travel time
have a significant effect on ridership. In particular, a study by the Transit Cooperative Research Program (TCRP)
indicated that a 1% increase in total travel time is expected to cause a 0.4% decrease in ridership (TCRP, 2007). Since
bus speed is directly linked to travel time, improvements in commercial speed are expected to have a positive effect
on ridership. Moreover, since passengers rank bus speed as the second most important criterion when evaluating public
transport service quality (de Oña et al., 2012), the need for an accurate bus speed prediction methodology is apparent.
In addition, because bus speed is easily affected by traffic conditions, passenger demand and various other factors,
operators need to identify the key factors that influence commercial speed in order to introduce efficient
policies for improving the provided level of service.
The present research aims at identifying how certain bus stop and bus route factors influence bus commercial speed
between successive bus stops. To that end, three models (Random Forest Regression (RFR), Support Vector Regression
(SVR) and Artificial Neural Network (ANN)) are trained on historic bus line route data and their performance is
compared. Afterwards, we apply the Shapley Additive Explanations (SHAP) algorithm in order to interpret
the best model and investigate the influence of each of the independent variables on bus commercial speed.
The paper is structured as follows. The next section presents a brief literature review of past research on bus
speed and bus arrival time prediction models, highlighting the studies that focused on the
identification of their determining factors. The case study is then presented, followed by the description of
the data collection process. Afterwards, the main methodological elements of the research are described. The results
are summarized in the penultimate section of the paper, while the final section notes the main conclusions, the policy
implications and the limitations of the research.
2. Literature
Bus commercial speed is an important parameter to operators, since it increases the mode’s utility and makes it
more competitive in relation to private cars. Bus speed is also important to passengers, due to its contribution to the
reduction of total travel time (TCRP, 2013). Passengers perceive bus speed through bus arrival time at bus stops, since
waiting time is an influencing factor for the realized level of service (Dell’Olio et al., 2011). Two of the key factors
that affect bus speed are bus running time and bus delay at bus stops (TCRP, 2013). As a result, past research has
focused on the development of models that predict bus speed and bus arrival time, in order to increase public transport
system’s efficiency.
Based on the fact that bus speed is a decisive influencing factor on the estimation of bus arrival time at bus stops
(Weng et al., 2016), research has been conducted on speed forecasting models that attempt to predict the average speed
of buses between stops or along a bus route. Julio et al. (2016) exploited ANN and SVM speed prediction
models that were built using historic and real-time data from Global Positioning System (GPS) vehicle trackers.
Results showed a clear distinction between the predictive capabilities of the machine learning models and those of
more basic mathematical ones. More specifically, the ANN proved the most accurate, with a prediction error
3.3% lower than the next best machine learning model. Salvo et al. (2007) estimated bus speed by
exploiting Radial Basis Function network (RBF) and Multi-layer Perceptron (MLP) models. Results indicated that the flow
per capacity ratio and illegal parking were among the most influential input variables.
Jeong and Rilett (2004) exploited Automatic Vehicle Location data for the development of regression and ANN models
that predicted bus travel time. Results indicated that the ANN model performed significantly
better than all the other ones. Chien et al. (2002) developed an enhanced ANN algorithm for bus arrival time
prediction that was trained on data obtained from the analysis of bus lines in microsimulation software. Yu et al.
(2011) exploited several machine learning models, such as SVMs, ANNs and others, in order to estimate the arrival
times of buses from multiple routes. The SVM model proved to be the most efficient, while results
indicated that combining bus running times from multiple bus routes, instead of individual routes, as input
data contributed to better model performance. Ranjitkar et al. (2019) concluded that the combination of time series
with ANN was the most efficient model for the estimation of bus arrival times, while decision trees
had satisfactory prediction accuracy.
Various studies have examined the factors that affect bus arrival time and in turn bus speed. Research found that
bus arrival time is strongly affected by running time, boarding and alighting passengers and the number of traffic
signals along the bus route (Abkowitz & Engelstein, 1984). Based on previous literature, Strathman et al proposed a
bus running time model that took into account several parameters such as passenger demand, number of stops and
others (Strathman et al., 2000). Results showed that increased passenger demand could add to the total running time.
Lin and Bertini (2004) studied the effects of the bus schedule recovery process on the arrival times at bus stops using a
Markov chain model. Results showed that bus delay incurred at a single bus stop could propagate to bus stops
downstream, but will eventually decay, as bus operators tend to adjust their speeds in order to stay on
schedule.
3. Case Study
3.1. Study Area
The study area is the Thessaloniki urban area, which is the second largest Greek city, with a population of 973,997
residents in its functional urban area (Eurostat, 2021). The city’s urban transport system is characterized by the
dominance of the private car and the poor availability of public transport alternatives. Public transport demand is
served only by a bus network, while the city’s underground remains under construction, with completion and
start of operation estimated for 2023.
Although Thessaloniki’s public transport system served passenger demand efficiently in past years, many factors,
such as the city’s population increase and the multi-year financial crisis, have contributed to its progressive decay.
This fact is highlighted by its modal share that decreased from 36.4% in 1988 to 27.5% in 1998 (Basbas & Taxiltaris,
2001) and 22% in 2019 (Politis et al., 2019). As illustrated (Fig. 1a), the existing bus network covers the majority of
the city’s area, with a higher density observed in the city centre.
3.2. Data collection
The objective of this research was to develop a bus commercial speed approximation model between stops and to
investigate the factors that influence commercial speed between successive stops. The data collection was implemented
for 478 route analysis sections that were defined between successive bus stops Xi and Xi+1, as illustrated in Fig. 1b.
The study considers 10 bus lines (total of 55 in Thessaloniki), which operate in the city centre and serve 32.7% of the
total daily passenger demand.
There is a plethora of variables that need to be taken into account when estimating the speed of a bus line. These
variables fall into three main categories: bus stop demand variables (e.g., dwell time, failure rate, etc.), bus
stop location variables (e.g., stop location, stop design, etc.) and additional speed related variables (e.g., stop spacing,
traffic signal spacing, etc.) (TCRP, 2013). Due to data unavailability at the time of the data collection process, variables
associated with traffic volume were not included in the analysis. Additionally, the variables ‘Running Time’, ‘Bus
stop infrastructure’ and ‘Turns’ were also considered in our analysis. A presentation and brief description of the
variables that were finally selected for our analysis, their type and descriptive characteristics are included in Table 1.
Fig. 1. (a) Bus lines of Thessaloniki’s network (Database retrieved from Thessaloniki’s Transport Authority (TheTA, 2022)); (b) Bus route
section analysed, defined between successive bus stops Xi and Xi+1.
The data collection was implemented in two stages. At first, onboard survey personnel collected data regarding the
passenger boarding and alighting demand, the vehicle running time between successive bus stops and the dwell time
at each bus stop (Varvatou & Spillia, 2019). In the second stage, additional data regarding each analysis section were
collected, such as the number of traffic lights, the number of turns, the distance between bus stops, the location of each
bus stop in relation to the following intersection and the design of the bus stop infrastructure. This data was collected
through Geographic Information Systems (GIS) (QGIS, 2020) and Google Maps applications (Google, 2022).
The commercial bus speed, which is the dependent variable of the studied problem, was calculated for each
analysis section from the following three variables: the distance between two successive
bus stops Xi and Xi+1, in kilometres; the running time of the bus vehicle, in seconds; and the dwell time of the bus
vehicle at bus stop Xi+1, in seconds. Although Running Time is used for the calculation of the commercial speed, the
variable is not included in our ANN model, due to its moderate negative correlation with the dependent variable
(r = -0.516, n = 478, p < 0.001). The rest of the explanatory variables in Table 1 are weakly correlated and are thus
included in our analysis.
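The calculation of the dependent variable can be sketched as follows. The text lists the three inputs but does not print the formula, so this sketch assumes that commercial speed is the stop spacing divided by the sum of running time and dwell time, converted to km/h. The function name and the sample values are hypothetical.

```python
def commercial_speed_kmh(distance_km, running_time_s, dwell_time_s):
    """Commercial speed on one analysis section, in km/h.

    Assumed formula (not stated explicitly in the text):
    distance / (running time + dwell time at stop Xi+1).
    """
    total_time_s = running_time_s + dwell_time_s
    if total_time_s <= 0:
        raise ValueError("total time on the section must be positive")
    # Convert seconds to hours before dividing.
    return distance_km / (total_time_s / 3600.0)


# Hypothetical section: 0.33 km spacing, 60 s running time, 14 s dwell time.
example_speed = commercial_speed_kmh(0.33, 60.0, 14.0)
print(f"{example_speed:.2f} km/h")
```

Under this assumption, a longer dwell time at stop Xi+1 lowers the commercial speed of the section even when the running time is unchanged.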
Table 1. Description of research input variables.

| Variable Name | Description | Coded Variable | Type of Variable | Min Value | Max Value | Average | Standard Deviation |
|---|---|---|---|---|---|---|---|
| Commercial Speed | Commercial speed of bus vehicle, in kilometres per hour (km/h), between bus stops Xi and Xi+1. | Speed (Y) | Scale | 3.67 | 44.91 | 14.88 | 6.58 |
| Stop Spacing | Distance, in kilometres (km), between bus stops Xi and Xi+1. | StopSpacing (X1) | Scale | 0.11 | 0.8 | 0.33 | 0.12 |
| Dwell Time | Dwell time, in seconds (s), of the bus at stop Xi+1. Time interval between the time stamp when the bus’s loading doors open and when the loading doors close (TCRP, 2013). | DwellTime (X2) | Scale | 2 | 57 | 14.31 | 8.18 |
| Boarding Passengers | Number of passengers boarding at bus stop Xi+1. | BoardingPax (X3) | Scale | 0 | 30 | 3.59 | 4.21 |
| Alighting Passengers | Number of passengers alighting at bus stop Xi+1. | AlightingPax (X4) | Scale | 0 | 44 | 3.72 | 5.35 |
| Traffic Lights | Number of traffic lights on the analysis section. | TrafficLights (X5) | Scale | 0 | 6 | 1.15 | 0.86 |
| Turns | Number of turns on the analysis section. | Turns (X6) | Scale | 0 | 4 | 0.31 | 0.56 |
| Bus stop location | Location of bus stop in relation to an intersection. | MidBlock (X7) | Nominal | 0 | 1 | 0.78 | 0.41 |
| | | FarSide (X8) | Nominal | 0 | 1 | 0.11 | 0.31 |
| | | NearSide (X9) | Nominal | 0 | 1 | 0.11 | 0.32 |
| Bus stop infrastructure | The design of the bus stop's passenger waiting infrastructure. | SingleShelter (X10) | Nominal | 0 | 1 | 0.64 | 0.48 |
| | | DoubleShelter (X11) | Nominal | 0 | 1 | 0.12 | 0.32 |
| | | Pole (X12) | Nominal | 0 | 1 | 0.24 | 0.43 |
4. Methodology
4.1. Speed Prediction Models
4.1.1. Artificial Neural Networks
ANNs are well-established, state-of-the-art models that are widely used in transport research for classification,
clustering or regression problems (Patterson, 1990; Sadek, 2007). One key feature of ANNs is that they operate
through neurons and synapses, mimicking the operation of the human brain (Rumelhart et al., 1986). Fig. 2a illustrates
a simple architecture of an ANN for regression with sixteen (16) input variables and two (2) hidden layers.
For this research, a feedforward neural network trained with backpropagation is employed. The input layer consisted of
12 neurons, equal to the number of independent variables of the studied problem (Table 1). After trying many different
combinations of numbers of neurons and hidden layers, an architecture of three hidden layers of 48, 48 and
12 neurons respectively achieved comparatively better results based on the Root Mean Squared Error (RMSE) index.
After comparing the performance under different activation functions, it was concluded that the ‘Rectified Linear’
function was the most suitable. After several trials, the ‘Adam’ optimizer (with a learning rate of 0.0001) and a batch
size of 64 were selected as the optimal hyperparameters for our model, which was trained on 80% of the total dataset
and tested on the remaining 20%. The model was trained for 300 epochs, as the flattening of both learning curves after
the 250th epoch indicates that the cost function has reached a minimum (Fig. 2b). The ANN model was implemented in
the Python 3 programming language (Python, 2022).
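To make the described architecture concrete, the following is a minimal pure-Python sketch of a forward pass through a 12-48-48-12-1 feedforward network with rectified linear activations in the hidden layers and a linear output. It is an illustration under stated assumptions only: the weights are random and untrained, the single linear output neuron is assumed, no backpropagation or Adam update is shown, and the text does not name the library that was actually used. All function names are hypothetical.

```python
import random

# 12 inputs, three hidden layers (48, 48, 12) and one output neuron (assumed).
LAYER_SIZES = [12, 48, 48, 12, 1]


def init_network(layer_sizes, seed=0):
    """Create random (untrained) weights and zero biases for each layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


def forward(layers, x):
    """Propagate one input vector through the network."""
    activations = list(x)
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        is_output_layer = index == len(layers) - 1
        # Hidden layers use ReLU; the regression output stays linear.
        activations = z if is_output_layer else relu(z)
    return activations[0]


def count_parameters(layers):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


network = init_network(LAYER_SIZES)
sample_input = [0.5] * 12  # hypothetical scaled feature vector
prediction = forward(network, sample_input)
n_params = count_parameters(network)
```

With these layer sizes the network has 3,577 trainable parameters, which is large relative to the 478 analysis sections and is one reason the learning curves are worth inspecting for overfitting.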
Fig. 2. (a) Architecture of ANN network; (b) ANN learning curves.
4.1.2. Random Forest Regression
Random Forest Regression (RFR) models are based on the principles of Ensemble Learning and Decision Trees.
In the context of an RFR model, several independent predictor models $h$ are fitted on the training data. The final
prediction value is the unweighted average of the predictions of the models $h$ (Segal, 2004). RFR models
are a popular alternative to linear regression models, due to their considerable prediction accuracy. In our
research, we utilised suitable Python libraries for the application of the RFR model as well as for the optimization of
its hyperparameters (number of predictors, maximum tree depth, etc.). For our model, we chose 50 estimators with a
maximum tree depth of 8.
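The unweighted averaging over independent predictors can be illustrated with a small pure-Python sketch. It is a simplification and not the model used in the study: each predictor here is a one-split regression stump on a single feature, fitted on a bootstrap sample, whereas a full random forest grows deeper trees and also subsamples features. The data values and all names are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump that minimises squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values identical in this sample: predict the mean.
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_ensemble(xs, ys, n_estimators=50, seed=0):
    """Fit each predictor on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    predictors = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        predictors.append(fit_stump([xs[i] for i in idx],
                                    [ys[i] for i in idx]))
    return predictors


def predict(predictors, x):
    """Final prediction: unweighted average over all predictors."""
    return sum(h(x) for h in predictors) / len(predictors)


# Hypothetical toy data: stop spacing (km) against commercial speed (km/h).
spacing_km = [0.15, 0.20, 0.25, 0.30, 0.45, 0.50, 0.60, 0.70]
speed_kmh = [9.0, 10.0, 11.0, 12.0, 18.0, 19.0, 21.0, 22.0]
forest = fit_ensemble(spacing_km, speed_kmh, n_estimators=50)
short_section = predict(forest, 0.20)
long_section = predict(forest, 0.65)
```

Because every predictor sees a different bootstrap sample, the individual stumps disagree on where to split, and the average is smoother than any single stump.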
4.1.3. Support Vector Regression
Support Vector Machines (SVM) are a very popular family of supervised machine learning models that are used for
classification and regression tasks. Support Vector Regression (SVR) models, variants of SVMs, are used in order to
tackle regression problems, by initially fitting a loss function based on the training data (Boser et al., 1992). In order to
minimize error, the model exploits a threshold parameter $\varepsilon$, which defines the boundaries
$[-\varepsilon, \varepsilon]$: points that fall outside this interval are penalized, while points within it are not. In our
research, we used suitable Python libraries in order to optimize the parameter $\varepsilon$, as well as other
hyperparameters. Results showed a value of $\varepsilon = 4$ as optimal, while a linear kernel was the most appropriate option.
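The role of the threshold parameter can be illustrated with the epsilon-insensitive loss commonly associated with SVR. This is a minimal sketch of the per-point penalty only, not of the full optimisation problem; the function name and sample values are hypothetical, and it is an assumption that the threshold is expressed in the same units as the target.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=4.0):
    """Zero inside the [-epsilon, epsilon] tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# A residual of 3 falls inside the tube and is not penalised.
inside = epsilon_insensitive_loss(15.0, 18.0)
# A residual of 6.5 is penalised only for the part beyond epsilon.
outside = epsilon_insensitive_loss(15.0, 21.5)
```

A wider tube ignores more of the small residuals, which makes the fit less sensitive to noise at the cost of a coarser approximation.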
5. Results
The models’ training process was evaluated through the RMSE and Mean Absolute Error (MAE) metrics, which
calculate the loss of the model by comparing the modelled value $\hat{Y}$ against the actual one $Y$. Table 2 summarises the
performance of all three models on predicting the commercial bus speed. According to the average scores for the two
metrics (average of 10 iterations on random data splits), the ANN model is the most accurate, followed by the RFR
and finally the SVR model. In terms of the ANN, there are no signs of overfitting or underfitting, since the training
and validation curves (Fig. 2b) do not deviate significantly at any point (Brownlee, 2019). This observation leads us
to the conclusion that the trained model has adjusted to the different patterns of the dataset and can predict outputs
based on new input data, without being overly sensitive to possible outliers.
Table 2. Commercial speed – Model comparison.

| Model | RMSE | MAE |
|---|---|---|
| Artificial Neural Network (ANN) | 4.9937 | 3.9943 |
| Random Forest Regression (RFR) | 5.0380 | 4.0700 |
| Support Vector Regression (SVR) | 5.2914 | 4.2550 |
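For reference, the two metrics reported in Table 2 can be computed as in the sketch below, using their standard definitions. The sample values are hypothetical and are not taken from the study's dataset.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between actual and modelled values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error between actual and modelled values."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical speeds in km/h: residuals are 3, 4, 0 and 0.
actual_speed = [12.0, 14.0, 18.0, 20.0]
modelled_speed = [15.0, 10.0, 18.0, 20.0]
example_rmse = rmse(actual_speed, modelled_speed)
example_mae = mae(actual_speed, modelled_speed)
```

RMSE is never smaller than MAE on the same residuals, and the gap between them widens when a few sections have large errors.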
5.1. Assessment of bus speed influencing variables
For the estimation of the importance of each input variable and its contribution to the outcome of the models, we
employed the methodology introduced by Lundberg and Lee (2017), which uses Shapley values from game theory to
determine the influence of each input feature on the dependent variable of a particular problem.
For the interpretation of machine learning models, Lundberg and Lee proposed several algorithms that can
estimate the Shapley values for all the input variables, based on the type of the original model. In this research,
the Deep SHAP algorithm was employed, which calculates the variable importance of deep learning models.
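The idea behind Shapley values can be illustrated by computing them exactly for a very small model. The sketch below enumerates all feature orderings against a single baseline, which is feasible only for a handful of features; it is not the Deep SHAP algorithm, which approximates these values for deep networks. The linear toy model and its coefficients are hypothetical, and the baseline simply reuses the averages of Table 1 for three of the variables.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline."""
    n = len(x)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for feature in order:
            # Switch this feature from its baseline value to its actual value.
            current[feature] = x[feature]
            output = model(current)
            contributions[feature] += output - previous_output
            previous_output = output
    return [c / len(orderings) for c in contributions]


def toy_speed_model(features):
    """Hypothetical linear model: spacing (km), dwell time (s), traffic lights."""
    spacing, dwell, lights = features
    return 10.0 + 20.0 * spacing - 0.2 * dwell - 1.5 * lights


baseline = [0.33, 14.31, 1.15]   # averages reported in Table 1
section = [0.50, 20.0, 2.0]      # hypothetical analysis section
phi = shapley_values(toy_speed_model, section, baseline)
total_shift = toy_speed_model(section) - toy_speed_model(baseline)
```

In this toy case the longer stop spacing receives a positive value while the longer dwell time and the extra traffic lights receive negative ones, and the three values sum to the difference between the prediction for the section and the prediction for the baseline.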
The ANN, being the most accurate of the three models, was chosen as the basis for the assessment of the importance
of the independent variables on bus commercial speed. The relative significance of each input variable and its
contribution to commercial speed can be seen in Fig. 3a. On the left Y-axis of Fig. 3a, the 12 input variables are
listed in descending order of significance. On the right Y-axis, the scale indicates the high and low values for each variable.
On the X-axis the SHAP values for each variable are presented, which describe the relative influence of each variable
on the model output. Positive values indicate a positive contribution to the output value, while negative values indicate
the reverse effect. Finally, the width of each shape along the Y-axis indicates the number of values that are concentrated
at that specific spot.
Fig. 3. (a) Factor importance for all independent variables; (b) Influence of ‘Stop Spacing’ feature on ‘Speed’.
Concerning ‘Stop Spacing’, results of Fig. 3b indicate that placing bus stops at a distance of over 450 meters can
have a positive effect on commercial speed. This can be attributed to the fact that buses have the required distance to
increase their speed before coming to a halt at the next stop, and also to periurban/rural areas, where bus stop placement
is less dense and traffic conditions are lighter. Furthermore, bus stops with ‘Single Shelter’ are linked with higher
speed, something that can be attributed to the relatively high number of ‘Single Shelter’ bus stops in our database or
to the fact that in the analysed network, ‘Single Shelter’ infrastructure is mainly observed at less crowded bus stops
that are therefore associated with lower bus dwell time. Regarding ‘Dwell Time’, a decrease in its value influences
speed in a positive way, while on the other hand, increased passenger demand (boarding or alighting) does not seem
to affect commercial speed significantly. This leads us to the assumption that other factors related to dwell time, such
as ticket collection methods, in-vehicle passenger circulation patterns, etc. may have a higher impact on speed.
Regarding the location of bus stops, results show that bus stops located in the middle of a building block (Mid-Block)
affect bus speed in a positive way, probably due to the distance of the bus stop from traffic lights
and the lower friction between the bus and turning traffic at intersections on the edges of building blocks. The number
of traffic lights along a section was also among the top contributing variables. More specifically, an increase in the
number of traffic lights is linked to lower vehicle speeds, since vehicles have to stop more frequently, thus leading to
increased running time along a section and subsequently to lower speed. Additionally, the absence of a bus priority
signal system in Thessaloniki also hinders bus speed at intersections. Finally, the number of turns along an analysis
section also proved to be an influencing factor, with increased values resulting in lower commercial speeds.
6. Conclusions
This research attempts to identify the significance and influence of certain factors related to bus transport demand
and infrastructure on bus speed. The trained ANN model was validated through a combination of two performance
metrics and achieved adequate levels of predictive accuracy. In comparison to equivalent ANN models employed in the
relevant literature, results are satisfactory if the limitations of the dataset are taken into consideration. After the model
learning process was completed, SHAP values were estimated, in order to interpret the model’s results and identify
the key contributing variables to the final output.
Empirical findings indicate bus stop spacing as the key contributor to the bus speed prediction model’s output.
Single shelter bus stops affected bus speed positively, while bus stops at midblock were also linked with increased
bus speed. Dwell time was also a crucial parameter, having a decreasing effect on bus speed, as its value increases.
High numbers of traffic lights and turns along a route section have a negative effect on bus speed. Finally, no apparent
relation was found between boarding and alighting passenger traffic and bus speed. The results extracted from our
analysis come in agreement with existing literature regarding the effect of stop spacing, dwell time and number of
traffic lights on bus speed. On the other hand, results regarding passenger boarding and alighting demand contradict
past research. Finally, no prior work on the influence of midblock and single shelter bus stops on bus speed was found
in order to correspondingly discuss our findings.
Results of this research could prove helpful for public transport operators, practitioners and researchers. The
proposed model could assist in predicting the average speed of their bus fleet thus providing a higher level of service
to their passengers and indicate specific interventions on infrastructure level that could result in increased bus speeds.
A more detailed data collection process that would include data regarding bus stops located on suburban networks
could result in a more robust model that could be applied irrespective of the location of the analysis area. Furthermore,
variables that were omitted from this research, due to data collection capacity constraints, such as the curb lane’s
traffic volume, pedestrian traffic and right-turning traffic, could be also significant for the model and contribute to
improved forecasting capabilities.
Acknowledgements
Part of this study was implemented as part of the RECREATE project: “smaRt ECosystem foR improvEment of
public trAnsporT pErformance” (Project code: ΚΜΡ6-0284565) under the framework of the Action «Investment Plans
of Innovation» of the Operational Program «Central Macedonia 2014-2020», that is co-funded by the European
Regional Development Fund and Greece.
References
Abkowitz, M., & Engelstein, I. (1984). Methods for maintaining transit service regularity. Transportation Research Record, 961, 1-8.
Banister, D. (2011). Cities, mobility and climate change. Journal of Transport Geography, 19(6). https://doi.org/10.1016/j.jtrangeo.2011.03.009
Basbas, S., & Taxiltaris, C. (2001). The quality of an urban public transport system as perceived by the users. European Transport Conference 2001,
Association for European Transport (AET), Planning and Transport Research and Computation (PTRC), 1–12.
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992, July). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual
workshop on Computational learning theory (pp. 144-152). https://doi.org/10.1145/130385.130401
Brownlee, J. (2019). How to use Learning Curves to Diagnose Machine Learning Model Performance.
https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
Chien, S. I. J., Ding, Y., & Wei, C. (2002). Dynamic bus arrival time prediction with artificial neural networks. Journal of Transportation
Engineering, 128(5). https://doi.org/10.1061/(ASCE)0733-947X(2002)128:5(429)
de Oña, J., de Oña, R., & Calvo, F. J. (2012). A classification tree approach to identify key factors of transit service quality. Expert Systems with
Applications, 39(12). https://doi.org/10.1016/j.eswa.2012.03.037
Dell’Olio, L., Ibeas, A., & Cecin, P. (2011). The quality of service desired by public transport users. Transport Policy, 18(1).
https://doi.org/10.1016/j.tranpol.2010.08.005
European Commission. (2011). White Paper. Roadmap to a Single European Transport Area – Towards a competitive and resource efficient
transport system. COM(2011) 144 final. Brussels, 28.3.2011.
Eurostat. (2021). Population on 1 January by age groups and sex - functional urban areas.
https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=urb_lpop1&lang=en
Google. (2022). Google Maps. https://www.google.com/maps/@40.6213265,22.9629665,15z
Jeong, R., & Rilett, L. R. (2004). Bus arrival time prediction using artificial neural network model. IEEE Conference on Intelligent Transportation
Systems, Proceedings, ITSC. https://doi.org/10.1109/itsc.2004.1399041
Julio, N., Giesen, R., & Lizana, P. (2016). Real-time prediction of bus travel speeds using traffic shockwaves and machine learning algorithms.
Research in Transportation Economics, 59. https://doi.org/10.1016/j.retrec.2016.07.019
Kennedy, C. A. (2002). A comparison of the sustainability of public and private transportation systems: Study of the Greater Toronto Area.
Transportation, 29(4). https://doi.org/10.1023/A:1016302913909
Lin, W. H., & Bertini, R. L. (2004). Modeling schedule recovery processes in transit operations for bus arrival time prediction. Journal of Advanced
Transportation, 38(3). https://doi.org/10.1002/atr.5670380306
Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems,
2017-December.
Mohan, D., & Tiwari, G. (1999). Sustainable transport systems: linkages between environmental issues, public transport, non-motorised transport
and safety. Economic and Political Weekly, 34(25).
Patterson, D. W. (1990). Introduction to artificial intelligence and expert systems (1st ed.). Prentice-Hall of India.
Politis, I., Nikolaidou, A., Papadopoulos, E., Fyrogenis, I., & Verani, E. (2019). CHANGE Project, “Development of a 4-stage model for the
Thessaloniki agglomeration”, Deliverable 2.1.
Python. (2022). The Python Language Reference — Python 3.10.4 documentation. https://docs.python.org/3/reference/
QGIS. (2020). Documentation for QGIS 3.4 — QGIS Documentation documentation. https://docs.qgis.org/3.4/en/docs/
Ranjitkar, P., Tey, L. S., Chakravorty, E., & Hurley, K. L. (2019). Bus Arrival Time Modeling Based on Auckland Data. Transportation Research
Record, 2673(6). https://doi.org/10.1177/0361198119840620
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088).
https://doi.org/10.1038/323533a0
Sadek, A. W. (2007). Artificial intelligence applications in transportation. Transportation Research Circular, Artificial Intelligence in Transportation.
Salvo, G., Amato, G., & Zito, P. (2007). Bus speed estimation by neural networks to improve the automatic fleet management. European
Transport\Trasporti Europei, 37, 93–104.
Segal, M. R. (2004). Machine Learning Benchmarks and Random Forest Regression. Biostatistics.
Strathman, J. G., Dueker, K. J., Kimpel, T., Gerhart, R. L., Turner, K., Taylor, P., Callas, S., & Griffin, D. (2000). Service reliability impacts of
computer-aided dispatching and automatic vehicle location technology: a Tri-Met case study. Transportation Quarterly, 54(3).
TCRP. (2007). Report 118: Bus Rapid Transit Practitioner’s Guide.
TCRP. (2013). TCRP Report 165: Transit Capacity and Quality of Service Manual. Transit Capacity and Quality of Service Manual, 3rd Edition.
TheTA. (2022). Downloads | TheTA. http://oseth.com.gr/en/downloads/
United Nations. (2015). Transforming our world: the 2030 Agenda for Sustainable Development. A/RES/70/1.
Varvatou, K., & Spillia, I. (2019). Development of mathematical retrogression models for the evaluation of the parameters which affect the duration
of public transportation routes in urban environment [B.Sc. Thesis]. Aristotle University of Thessaloniki.
Weng, J., Wang, C., Huang, H., Wang, Y., & Zhang, L. (2016). Real-time bus travel speed estimation model based on bus GPS data. Advances in
Mechanical Engineering, 8(11). https://doi.org/10.1177/1687814016678162
Yu, B., Lam, W. H. K., & Tam, M. L. (2011). Bus arrival time prediction at bus stop with multiple routes. Transportation Research Part C: Emerging
Technologies, 19(6). https://doi.org/10.1016/j.trc.2011.01.003
