All content following this page was uploaded by Benjamin Lartey on 07 December 2021.
Abstract— The growth in the transportation sector has led to an enormous increase in the number of vehicles that ply our roads daily. Even though this advancement has provided numerous transportation modes, it has resulted in serious transportation issues including road congestion. Hence, estimating the number of vehicles on a road will enable traffic managers to take appropriate decisions to curb congestion. In this paper, we propose to use an extreme gradient boosting (XGBoost) algorithm to efficiently and accurately predict hourly traffic volume. We investigate the effectiveness of the proposed method for different scenarios, including how well it performs during extreme weather conditions and holidays. We further investigate the effect of ridge and LASSO regularization on the performance of XGBoost. We then propose a new approach for setting the LASSO regularization parameter in terms of the number of observations and predictors. The performance and computational efficiency of the proposed approach are evaluated on data collected from Interstate 94, Minnesota, and the results are compared with existing methods. The results show that the proposed method provides a good balance between performance and computational efficiency.

All authors are with North Carolina Agricultural and Technical State University, Department of Electrical and Computer Engineering, Greensboro, NC 27411, USA.
*Corresponding author: A. Homaifar. Address: 1601 East Market Street, North Carolina A&T State University, Department of Electrical and Computer Engineering, Greensboro, NC 27411, USA. Email: homaifar@ncat.edu.

I. INTRODUCTION

In the past few years, there has been significant growth in the transportation sector, with an enormous increase in the number of vehicles on roads. According to the Federal Highway Administration (a division of the US Department of Transportation), there were over 276 million registered vehicles plying the highways as of 2019 [1]. The increase in the volume of road traffic has given rise to serious transportation issues such as traffic congestion. Congestion is not just a transportation problem but also an economic growth problem: it can impede business attraction and growth in a region and also affect the quality of life of locals.

To address this issue and to accommodate the large number of vehicles on roads, two measures can be taken. First, lane capacities can be increased by expanding the existing roads or constructing completely new roads. This solution is costly to implement and may not be feasible, especially in densely populated urban areas like Manhattan in New York City, due to limited space. The second approach is to employ effective traffic control techniques to make more efficient use of the existing road network. This traffic management approach is less expensive to implement and feasible to apply, especially in urban settings. A typical traffic control strategy is traffic forecasting, which involves the prediction of traffic volume based on historical, time-series data. Traffic volume prediction enables traffic managers to make plans and take appropriate measures to mitigate traffic congestion. Roadway users benefit from traffic volume prediction by having a basis for selecting the most appropriate mode of transportation and less congested routes to their destinations.

Over the past few years, many traffic forecasting techniques have been proposed to solve the traffic volume prediction problem. These techniques can be either statistically based or machine learning based. An example of a statistically based technique that has been used for traffic volume forecasting is the autoregressive integrated moving average (ARIMA) model [2]. Even though this model has a good theoretical background, it does not attain satisfactory performance since it fails to handle the nonlinear nature of traffic conditions such as off-peak and peak periods. On the other hand, machine learning based models produce prediction results that are more stable and accurate. Conventional machine learning models such as support vector machines (SVM) [3] and k-Nearest Neighbors [4] have been employed by researchers for traffic volume prediction. However, these conventional machine learning methods face the challenge of generating accurate results when dealing with enormous amounts of traffic data.

In recent years, deep learning approaches such as long short-term memory (LSTM) [5] and deep neural networks [6] have been employed for traffic volume prediction. Although these approaches are capable of handling huge traffic datasets and generating accurate prediction results, they are computationally expensive. In order to forecast traffic volume accurately and in time, a model that can provide a good balance between performance and computational cost is required. Additionally, the model should be able to handle the nonlinear nature of the traffic data.

In this work, we employ Extreme Gradient Boosting (XGBoost) for traffic volume prediction. XGBoost is scalable, computationally efficient and exhibits high performance; it provides a good balance between performance and computational efficiency. XGBoost has been shown to be effective for time series forecasting problems. For instance, XGBoost was employed for stock price forecasting in [7], and in [8] it was employed for predicting daily electricity consumption, since it is able to capture the nonlinear relationships between attributes and thus perform well on nonlinear traffic data. We employ XGBoost with LASSO regularization to enhance the performance of the XGBoost model and avoid overfitting. LASSO regularization is considered since the weights of leaves that do not make a significant contribution to a tree structure are reduced to
zero or dropped to improve efficiency and performance [9]. Also, we propose a new approach for setting the LASSO regularization parameter α in terms of the number of samples and the number of predictors, as defined in (11).

The remainder of this paper is organized as follows: Section II reviews existing literature on this topic, Section III gives a detailed description of our implementation, Section IV contains the results and discussion, and we present our final conclusions and future work in Section V.

II. RELATED WORK

Early research on traffic volume prediction was based on statistically based approaches such as Autoregressive Integrated Moving Average (ARIMA) models [10] and the Kalman filtering model [11]. Apart from the statistically based approaches, researchers have also employed machine learning based techniques for traffic volume prediction. For instance, a k-Nearest Neighbor (k-NN) model was adopted by [12] for the prediction of short-term traffic conditions; the k-NN model exhibited better performance when set side by side with an artificial neural network (ANN) and a support vector machine (SVM). Also, an SVM model [13] and a time-based SVM model [14] have been employed for the prediction of traffic volume. In [15], a hybrid model of SVM and ARIMA was used for traffic volume prediction, and a wavelet denoising method was used to remove noise from the traffic data. Another machine learning model that has exhibited good performance and minimal error is the classification and regression trees (CART) algorithm. For example, in [16], traffic volume prediction was performed with a CART model which outperformed k-nearest neighbors (k-NN) and Kalman filtering.

Recently, advanced machine learning and deep learning models have been employed for the prediction of traffic volume. Researchers have adopted these advanced techniques in order to improve the prediction accuracy and minimize the prediction error. To achieve these goals, the authors in [6] used a deep learning framework to forecast traffic volume. In their work, traffic volume was predicted by considering adverse snowy conditions and a special event. The main contribution of their work was building a framework based on tanh hidden layers and l1 regularization: the first hidden layer is responsible for capturing the spatio-temporal relationship between independent variables, and the other layers capture the nonlinear relationship. In [17], [18], a gradient boosting decision tree (GBDT) algorithm was used for forecasting traffic volume. The authors in [5] proposed a traffic forecasting technique based on a Long Short-Term Memory (LSTM) framework. A two-layer framework with

III. METHODOLOGY

A. Tree-based machine learning methods

Tree-based machine learning models comprise decision trees (DT), random forests (RF), gradient boosting machines (GBM) and the extreme gradient boosting method. Tree-based methods are equipped with varying functionalities which enable them to (i) handle various data without any rigorous preprocessing and (ii) maintain a normally low computational cost [20]. Decision trees are capable of approximating complex relationships and non-linearities between predictors; however, they often do not perform well on unseen data. To address this limitation, ensemble methods are preferred. Ensemble methods (RF, GBM and XGBoost) are built on decision trees. In RF, each DT is developed irrespective of the other trees, which means there is no collective effort to improve the overall performance. To address this, GBM is considered. GBM builds the individual trees in a sequence such that new trees provide an improvement, or a boost, to the subsequent ones. However, GBM does not make room for regularization to penalize the model's complexity and avoid overfitting [21]. The XGBoost model handles this drawback and is preferred to the other tree-based models since it provides a good balance between performance and computational complexity.

B. XGBoost for traffic volume prediction

Since the task is a time series problem, the current traffic volume is predicted based on the previous traffic volume and historical data. For a given traffic volume dataset with n samples and m predictors, D = {(x_p, V_p) : |D| = n, x_p ∈ R^m, V_p ∈ R}, the objective function of a tree model for traffic volume prediction can be stated as:

C = argmin Σ_{p=1}^{n} l(V_p, V̂_p^{(t)})    (1)

where V_p is the actual traffic volume and V̂_p^{(t)} is the predicted traffic volume acquired by adding a new tree f_t at time step t. The predicted traffic volume for the added tree is computed as:

V̂_p^{(t)} = V̂_p^{(t−1)} + f^{(t)}(x_p)    (2)

where V̂_p^{(t−1)} is the predicted traffic volume at the previous time step. In order to avoid overfitting, a regularization term Ω is added to (1), and (2) is substituted into (1) to obtain the following objective function:

C = argmin Σ_{p=1}^{n} l(V_p, V̂_p^{(t−1)} + f^{(t)}(x_p)) + Ω(f^{(t)}(x_p))    (3)
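To make the additive update in Eq. (2) concrete, the following is a minimal, self-contained sketch of a boosting loop, under the assumptions of squared-error loss, one-dimensional inputs and depth-1 regression stumps as the trees. The toy hour/volume data and the 0.5 learning rate are illustrative assumptions, not the paper's configuration.

```python
def fit_stump(x, residual):
    """Pick the threshold on x whose two leaf means best fit the residuals."""
    best = None
    for s in x:  # candidate split points
        left = [r for xi, r in zip(x, residual) if xi <= s]
        right = [r for xi, r in zip(x, residual) if xi > s]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - (lmean if xi <= s else rmean)) ** 2
                  for xi, r in zip(x, residual))
        if best is None or err < best[0]:
            best = (err, s, lmean, rmean)
    _, s, lmean, rmean = best
    return lambda xi: lmean if xi <= s else rmean

def boost(x, v, rounds=20, lr=0.5):
    """Each round fits a new tree f_t to the residuals and applies Eq. (2):
    V_hat^(t) = V_hat^(t-1) + f^(t)(x)."""
    v_hat = [0.0] * len(x)                      # V_hat^(0)
    for _ in range(rounds):
        residual = [vi - vh for vi, vh in zip(v, v_hat)]
        f_t = fit_stump(x, residual)            # new tree at step t
        v_hat = [vh + lr * f_t(xi) for vh, xi in zip(v_hat, x)]
    return v_hat

# Toy hour-of-day vs. hourly volume data (illustrative only).
hours = [1, 2, 3, 7, 8, 9]
volume = [300.0, 280.0, 320.0, 1400.0, 1500.0, 1450.0]
pred = boost(hours, volume)
```

XGBoost additionally uses the second-order information and the Ω penalty of Eq. (3) when growing each tree; the sketch keeps only the additive structure that Eq. (2) describes.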
C. XGBoost with LASSO regularization

Extreme Gradient Boosting (XGBoost) is an improved variant of the Gradient Boosting Decision Tree (GBDT) algorithm [22]. It is highly scalable, computationally efficient and exhibits high performance [23]. The main motivation behind the XGBoost algorithm is to obtain an optimized objective function value [7].

The theoretical background of the XGBoost algorithm is summarized below; for more details, see [23]. A tree boosting algorithm makes a prediction on given data by summing the predictions from each tree:

ŷ_p = Σ_{k=1}^{K} f_k(x_p)    (4)

where f_k represents a particular tree structure q with leaf weights w, and x_p and ŷ_p are a training sample and its predicted value, respectively. The objective function to be minimized is defined as:

L = Σ_p l(ŷ_p, y_p) + Σ_k Ω(f_k),  where Ω(f_k) = γT + (1/2) α ‖w‖₁    (5)

Here, l is the loss function, which measures the difference between the predicted value ŷ_p and the actual value y_p; T is the number of leaves in each tree; w is the weight of the leaves; γ is the minimum loss reduction needed for splitting a node; α is the regularization parameter; and Ω is the regularization term, which is responsible for penalizing the model's complexity. The regularization term is also included to smoothen the newly learned weights, overcoming overfitting and making the model more robust.

The objective function defined in (5) can be minimized by adding a new tree f_t to the loss function:

L^{(t)} = Σ_{p=1}^{n} l(y_p, ŷ_p^{(t−1)} + f_t(x_p)) + Ω(f_t)    (6)

L^{(t)} is the objective function at the t-th iteration; the algorithm seeks to minimize the loss in every iteration. By applying a second-order Taylor expansion, (6) becomes:

L^{(t)} ≈ Σ_{p=1}^{n} [l(ŷ_p^{(t−1)}, y_p) + g_p f_t(x_p) + (1/2) h_p f_t²(x_p)] + Ω(f_t)    (7)

where g_p = ∂_{ŷ^{(t−1)}} l(ŷ_p^{(t−1)}, y_p) and h_p = ∂²_{ŷ^{(t−1)}} l(ŷ_p^{(t−1)}, y_p) are respectively the first- and second-order gradients of the loss function l. The constant term can be removed to simplify (7), and the expression for Ω is then substituted.

Defining the sample set of each leaf j as S_j, (7) can be rewritten as:

L^{(t)} = γT + Σ_{j=1}^{T} [w_j (G_j + (1/2)α) + (1/2) w_j² H_j]    (8)

where G_j = Σ_{p∈S_j} g_p and H_j = Σ_{p∈S_j} h_p. Considering a fixed tree structure q, the optimal weight and the optimal value of the objective function can be computed respectively as:

w_j* = −(G_j + (1/2)α) / H_j    (9)

L*^{(t)} = γT − (1/2) Σ_{j=1}^{T} (G_j + (1/2)α)² / H_j    (10)

Equation (10) is referred to as the scoring function, which is a measure of each tree structure's quality; smaller scoring values imply a better tree structure q.

IV. RESULTS AND DISCUSSION

Fig. 1 provides a high-level description of our experiments. The following main steps are considered: data preprocessing, feature engineering, splitting the data into train, validation and test sets, model selection, and evaluation.
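The feature engineering and data-splitting steps of such a pipeline can be sketched as follows. This is an illustrative assumption, not the paper's code: it builds lagged hourly-volume predictors and performs a chronological train/validation/test split, avoiding shuffling so that future hours never leak into training. The toy series, the three-hour lag window and the 70/15/15 ratios are hypothetical choices.

```python
def make_lagged_dataset(series, lags=3):
    """Each row of X holds the previous `lags` hourly volumes; y is the
    volume to predict at the current hour."""
    X = [series[t - lags:t] for t in range(lags, len(series))]
    y = [series[t] for t in range(lags, len(series))]
    return X, y

def chronological_split(X, y, train=0.7, val=0.15):
    """Split in time order: earliest samples train, latest samples test."""
    n = len(X)
    i, j = int(n * train), int(n * (train + val))
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])

# Toy hourly volumes (vehicles/hour); the study itself uses the MnDoT
# ATR Station 301 records described in Section IV-A.
volumes = [310, 280, 450, 900, 1500, 1400, 1200, 1100, 600, 400]
X, y = make_lagged_dataset(volumes)
train, val, test = chronological_split(X, y)
```

The resulting (X, y) pairs can then be fed to any of the compared regressors; only the lag construction and the ordering constraint are specific to the time-series setting.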
A. Data description

The traffic data used in this paper is multivariate, sequential, time-series data provided as an open-source dataset on the UCI Machine Learning Repository by the Minnesota Department of Transportation (MnDoT) [24]. The weather data is obtained from OpenWeatherMap [25]. The data was collected between 2012 and 2018 and comprises hourly traffic volume on Interstate 94 Westbound for MnDoT Automatic Traffic Recorder (ATR) Station No. 301, which lies approximately halfway between Minneapolis and St. Paul, Minnesota. It consists of a total of eight predictors and 48,204 observations. This data has also been used by the authors in [26]. A detailed description of each of the attributes is given below:

D. Model selection

The K-fold cross-validation method was employed to compare the various machine learning models and select the best one. We used the typical K value of 10 [27].

E. Hyperparameter setting

The maximum tree depth is set to 8 since it ensures a balance between performance and computational complexity. The same maximum tree depth is used for decision trees, gradient boosting decision trees and random forests.

TABLE I: XGBoost hyperparameter settings

Hyperparameter | Value
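A minimal sketch of the 10-fold cross-validation used for model selection (an illustration, not the paper's implementation): the sample indices are partitioned into K disjoint validation folds, each candidate model is scored once per fold, and the average score drives the selection. The `score_fn` callable and the n = 25 example are hypothetical placeholders.

```python
def kfold_indices(n, k=10):
    """Yield (train_idx, val_idx) pairs; every sample appears in exactly
    one validation fold."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

def cross_val_score(score_fn, n, k=10):
    """Average score_fn(train_idx, val_idx) over the k folds."""
    scores = [score_fn(tr, va) for tr, va in kfold_indices(n, k)]
    return sum(scores) / len(scores)

# Example with a dummy scorer that just reports the validation-fold size.
avg = cross_val_score(lambda tr, va: len(va), n=25, k=10)
```

In practice `score_fn` would fit a model on the training indices and return a validation metric such as RMSE; note that for strictly temporal evaluation a chronological split is often preferred over shuffled folds.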
Fig. 2: Comparison of prediction results of XGBoost and SVM during Independence Day, bad weather conditions and normal days. (c) XGBoost+RIDGE vs. SVM during Independence Day (05 July 2018); (d) XGBoost+RIDGE vs. SVM during a bad weather condition (25 August 2018); (e) XGBoost+LASSO vs. SVM during normal days (18-28 July 2018); (f) XGBoost+RIDGE vs. SVM during normal days (18-28 July 2018).
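Scenario comparisons like those in Fig. 2 amount to scoring each model on the subset of hours tagged with a given scenario (holiday, bad weather, normal). A sketch of that bookkeeping using mean absolute error; the tiny arrays and scenario tags below are hypothetical and are not the paper's measured results.

```python
def mae(actual, predicted):
    """Mean absolute error over paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mae_by_scenario(actual, predicted, tags):
    """Group hourly errors by scenario tag and report per-scenario MAE."""
    out = {}
    for tag in set(tags):
        idx = [i for i, t in enumerate(tags) if t == tag]
        out[tag] = mae([actual[i] for i in idx],
                       [predicted[i] for i in idx])
    return out

# Hypothetical hourly volumes, model predictions and scenario labels.
actual    = [900.0, 950.0, 300.0, 310.0, 1500.0, 1480.0]
predicted = [880.0, 960.0, 350.0, 305.0, 1450.0, 1500.0]
tags      = ["holiday", "holiday", "bad_weather", "bad_weather",
             "normal", "normal"]
scores = mae_by_scenario(actual, predicted, tags)
```

The same grouping works for any per-model metric, so both compared models can be scored on identical scenario subsets.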