All content following this page was uploaded by Benjamin Lartey on 07 December 2021.
Abstract— The growth in the transportation sector has led to an enormous increase in the number of vehicles that ply our roads daily. Even though this advancement has provided numerous transportation modes, it has resulted in serious transportation issues including road congestion. Hence, estimating the number of vehicles on a road will enable traffic managers to take appropriate decisions to curb congestion. In this paper, we propose to use an extreme gradient boosting (XGBoost) algorithm to efficiently and accurately predict hourly traffic volume. We investigate the effectiveness of the proposed method for different scenarios, including how well it performs during extreme weather conditions and holidays. We further investigate the effect of ridge and LASSO regularization on the performance of XGBoost. We then propose a new approach for setting the LASSO regularization parameter in terms of the number of observations and predictors. The performance and computational efficiency of the proposed approach are evaluated on data collected from Interstate 94, Minnesota, and the results are compared with existing methods. The results show that the proposed method provides a good balance between performance and computational efficiency.

All authors are with North Carolina Agricultural and Technical State University, Department of Electrical and Computer Engineering, Greensboro, NC 27411, USA.
*Corresponding author: A. Homaifar. Address: 1601 East Market Street, North Carolina A&T State University, Department of Electrical and Computer Engineering, Greensboro, NC 27411, USA. Email: homaifar@ncat.edu.

I. INTRODUCTION

In the past few years, there has been significant growth in the transportation sector, with an enormous increase in the number of vehicles on roads. According to the Federal Highway Administration (a division of the US Department of Transportation), there were over 276 million registered vehicles plying the highways as of 2019 [1]. The increase in the volume of road traffic has given rise to serious transportation issues such as traffic congestion. Congestion is not just a transportation problem but also an economic growth problem: it can impede business attraction and growth in a region and also affect the quality of life of locals.

To address this issue and to accommodate the large number of vehicles on roads, two measures can be taken. First, lane capacities can be increased by expanding the existing roads or constructing completely new roads. This solution is costly to implement and may not be feasible, especially in densely populated urban areas like Manhattan in New York City, due to limited space. The second approach is to employ effective traffic control techniques to make more efficient use of the existing road network. This traffic management approach is less expensive to implement and feasible to apply, especially in urban settings. A typical traffic control strategy is traffic forecasting, which involves the prediction of traffic volume based on historical, time-series data. Traffic volume prediction enables traffic managers to make plans and take appropriate measures to mitigate traffic congestion. Roadway users benefit from traffic volume prediction by having a basis for selecting the most appropriate mode of transportation and less congested routes to their destinations.

Over the past few years, many traffic forecasting techniques have been proposed to solve the traffic volume prediction problem. These techniques can be either statistically based or machine learning based. An example of a statistically based technique that has been used for traffic volume forecasting is the autoregressive integrated moving average (ARIMA) model [2]. Even though this model has a good theoretical background, it does not attain satisfactory performance since it fails to handle the nonlinear nature of traffic conditions such as off-peak and peak periods. On the other hand, machine learning based models produce prediction results that are more stable and accurate. Conventional machine learning models such as support vector machines (SVM) [3] and k-Nearest Neighbors [4] have been employed by researchers for traffic volume prediction. However, these conventional machine learning methods face the challenge of generating accurate results when dealing with enormous amounts of traffic data.

In recent years, deep learning approaches such as long short-term memory (LSTM) [5] and deep neural networks [6] have been employed for traffic volume prediction. Although these approaches are capable of handling huge traffic datasets and generating accurate prediction results, they are computationally expensive. In order to forecast traffic volume accurately and in time, a model that can provide a good balance between performance and computational cost is required. Additionally, the model should be able to handle the nonlinear nature of the traffic data.

In this work, we employ Extreme Gradient Boosting (XGBoost) for traffic volume prediction. XGBoost is scalable, computationally efficient and exhibits high performance; it provides a good balance between performance and computational efficiency. XGBoost has been shown to be effective for time series forecasting problems. For instance, XGBoost was employed for stock price forecasting in [7], and in [8] it was employed for predicting daily electricity consumption, since it is able to capture the nonlinear relationships between attributes and thus perform well on nonlinear traffic data. We employ XGBoost with LASSO regularization to enhance the performance of the XGBoost model and avoid overfitting. LASSO regularization is considered since the weights of leaves that do not make a significant contribution to a tree structure are reduced to
zero or dropped to improve efficiency and performance [9]. Also, we propose a new approach for setting the LASSO regularization parameter α in terms of the number of samples and the number of predictors, as defined in (11).

The remainder of this paper is organized as follows: Section II reviews existing literature on this topic, Section III gives a detailed description of our implementation, Section IV contains the results and discussion, and we present our final conclusions and future work in Section V.

II. RELATED WORK

Early research on traffic volume prediction was based on statistically based approaches such as Autoregressive Integrated Moving Average (ARIMA) models [10] and the Kalman filtering model [11]. Apart from the statistically based approaches, researchers have also employed machine learning based techniques for traffic volume prediction. For instance, a k-Nearest Neighbor (k-NN) model was adopted by [12] for the prediction of short-term traffic conditions; the k-NN model exhibited better performance when set side by side with an artificial neural network (ANN) and a support vector machine (SVM). Also, an SVM model [13] and a time-based SVM model [14] have been employed for the prediction of traffic volume. In [15], a hybrid model of SVM and ARIMA was used for traffic volume prediction, and a wavelet denoising method was used to remove noise from the traffic data. Another machine learning model that has exhibited good performance and minimal error is the classification and regression trees (CART) algorithm. For example, in [16], traffic volume prediction was performed with a CART model which outperformed k-nearest neighbors (k-NN) and Kalman filtering.

Recently, advanced machine learning and deep learning models have been employed for the prediction of traffic volume. Researchers have adopted these advanced techniques in order to improve the prediction accuracy and minimize the prediction error. To achieve these goals, the authors in [6] used a deep learning framework to forecast traffic volume. In their work, traffic volume was predicted by considering adverse snowy conditions and a special event. The main contribution of their work was building a framework based on tanh hidden layers and l1 regularization: the first hidden layer is responsible for capturing the spatio-temporal relationship between independent variables, and the other layers capture the nonlinear relationship. In [17], [18], a gradient boosting decision tree (GBDT) algorithm was used for forecasting traffic volume. The authors in [5] proposed a traffic forecasting technique based on a Long Short-Term Memory (LSTM) framework. A two-layer framework with

III. METHODOLOGY

A. Tree-based machine learning methods

Tree-based machine learning models comprise decision trees (DT), random forests (RF), gradient boosting machines (GBM) and the extreme gradient boosting method. Tree-based methods are equipped with varying functionalities which enable them to (i) handle various data without any rigorous preprocessing and (ii) maintain a normally low computational cost [20]. Decision trees are capable of approximating complex relationships and non-linearities between predictors; however, they often do not perform well on unseen data. To address this limitation, ensemble methods are preferred. Ensemble methods (RF, GBM and XGBoost) are built on decision trees. In RF, each DT is developed irrespective of the other trees, which means there is no collective effort to improve the overall performance. To address this, GBM is considered. GBM builds the individual trees in a sequence such that new trees provide an improvement, or a boost, to the subsequent ones. However, GBM does not make room for regularization to penalize the model's complexity and avoid overfitting [21]. The XGBoost model handles this drawback and is preferred to the other tree-based models since it provides a good balance between performance and computational complexity.

B. XGBoost for traffic volume prediction

Since the task is a time series problem, the current traffic volume is predicted based on the previous traffic volume and historical data. For a given traffic volume dataset with n samples and m predictors, D = {(x_p, V_p) : |D| = n, x_p ∈ R^m, V_p ∈ R}, the objective function of a tree model for traffic volume prediction can be stated as:

C = argmin Σ_{p=1}^{n} l(V_p, V̂_p^{(t)})    (1)

where V_p is the actual traffic volume and V̂_p^{(t)} is the predicted traffic volume acquired by adding a new tree f_t at time step t. The predicted traffic volume for the added tree is computed as:

V̂_p^{(t)} = V̂_p^{(t−1)} + f^{(t)}(x_p)    (2)

where V̂_p^{(t−1)} is the predicted traffic volume at the previous time step. In order to avoid overfitting, a regularization term Ω is added to (1), and (2) is substituted into (1) to obtain the following objective function:

C = argmin Σ_{p=1}^{n} l(V_p, V̂_p^{(t−1)} + f^{(t)}(x_p)) + Ω(f^{(t)}(x_p))    (3)
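To make the additive update in Eq. (2) concrete, the following is a minimal, self-contained sketch of a boosting loop, under the assumptions of squared-error loss, one-dimensional inputs and depth-1 regression stumps as the trees. The toy hour/volume data and the 0.5 learning rate are illustrative assumptions, not the paper's configuration.

```python
def fit_stump(x, residual):
    """Pick the threshold on x whose two leaf means best fit the residuals."""
    best = None
    for s in x:  # candidate split points
        left = [r for xi, r in zip(x, residual) if xi <= s]
        right = [r for xi, r in zip(x, residual) if xi > s]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - (lmean if xi <= s else rmean)) ** 2
                  for xi, r in zip(x, residual))
        if best is None or err < best[0]:
            best = (err, s, lmean, rmean)
    _, s, lmean, rmean = best
    return lambda xi: lmean if xi <= s else rmean

def boost(x, v, rounds=20, lr=0.5):
    """Each round fits a new tree f_t to the residuals and applies Eq. (2):
    V_hat^(t) = V_hat^(t-1) + f^(t)(x)."""
    v_hat = [0.0] * len(x)                      # V_hat^(0)
    for _ in range(rounds):
        residual = [vi - vh for vi, vh in zip(v, v_hat)]
        f_t = fit_stump(x, residual)            # new tree at step t
        v_hat = [vh + lr * f_t(xi) for vh, xi in zip(v_hat, x)]
    return v_hat

# Toy hour-of-day vs. hourly volume data (illustrative only).
hours = [1, 2, 3, 7, 8, 9]
volume = [300.0, 280.0, 320.0, 1400.0, 1500.0, 1450.0]
pred = boost(hours, volume)
```

XGBoost additionally uses the second-order information and the Ω penalty of Eq. (3) when growing each tree; the sketch keeps only the additive structure that Eq. (2) describes.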
C. XGBoost with LASSO regularization

Extreme Gradient Boosting (XGBoost) is an improved variant of the Gradient Boosting Decision Tree (GBDT) algorithm [22]. It is highly scalable, computationally efficient and exhibits high performance [23]. The main motivation behind the XGBoost algorithm is to obtain an optimized objective function value [7].

The theoretical background of the XGBoost algorithm is summarized below; for more details, see [23]. A tree boosting algorithm makes a prediction on given data by summing the predictions from each tree:

ŷ_p = Σ_{k=1}^{K} f_k(x_p)    (4)

where f_k represents a particular tree structure q with leaf weights w, and x_p and ŷ_p are a training sample and its predicted value, respectively. The objective function to be minimized is defined as:

L = Σ_p l(ŷ_p, y_p) + Σ_k Ω(f_k),  where Ω(f_k) = γT + (1/2) α ‖w‖₁    (5)

Here, l is the loss function, which measures the difference between the predicted value ŷ_p and the actual value y_p; T is the number of leaves in each tree; w is the weight of the leaves; γ is the minimum loss reduction needed for splitting a node; α is the regularization parameter; and Ω is the regularization term, which is responsible for penalizing the model's complexity. The regularization term is also included to smoothen the newly learned weights, overcoming overfitting and making the model more robust.

The objective function defined in (5) can be minimized by adding a new tree f_t to the loss function:

L^{(t)} = Σ_{p=1}^{n} l(y_p, ŷ_p^{(t−1)} + f_t(x_p)) + Ω(f_t)    (6)

L^{(t)} is the objective function at the t-th iteration; the algorithm seeks to minimize the loss in every iteration. By applying a second-order Taylor expansion, (6) becomes:

L^{(t)} ≈ Σ_{p=1}^{n} [l(ŷ_p^{(t−1)}, y_p) + g_p f_t(x_p) + (1/2) h_p f_t²(x_p)] + Ω(f_t)    (7)

where g_p = ∂_{ŷ^{(t−1)}} l(ŷ_p^{(t−1)}, y_p) and h_p = ∂²_{ŷ^{(t−1)}} l(ŷ_p^{(t−1)}, y_p) are respectively the first- and second-order gradients of the loss function l. The constant term can be removed to simplify (7), and the expression for Ω is then substituted.

Defining the sample set of each leaf j as S_j, (7) can be rewritten as:

L^{(t)} = γT + Σ_{j=1}^{T} [w_j (G_j + (1/2)α) + (1/2) w_j² H_j]    (8)

where G_j = Σ_{p∈S_j} g_p and H_j = Σ_{p∈S_j} h_p. Considering a fixed tree structure q, the optimal weight and the optimal value of the objective function can be computed respectively as:

w_j* = −(G_j + (1/2)α) / H_j    (9)

L*^{(t)} = γT − (1/2) Σ_{j=1}^{T} (G_j + (1/2)α)² / H_j    (10)

Equation (10) is referred to as the scoring function, which is a measure of each tree structure's quality; smaller scoring values imply a better tree structure q.

IV. RESULTS AND DISCUSSION

Fig. 1 provides a high-level description of our experiments. The following main steps are considered: data preprocessing, feature engineering, splitting the data into train, validation and test sets, model selection, and evaluation.
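The feature engineering and data-splitting steps of such a pipeline can be sketched as follows. This is an illustrative assumption, not the paper's code: it builds lagged hourly-volume predictors and performs a chronological train/validation/test split, avoiding shuffling so that future hours never leak into training. The toy series, the three-hour lag window and the 70/15/15 ratios are hypothetical choices.

```python
def make_lagged_dataset(series, lags=3):
    """Each row of X holds the previous `lags` hourly volumes; y is the
    volume to predict at the current hour."""
    X = [series[t - lags:t] for t in range(lags, len(series))]
    y = [series[t] for t in range(lags, len(series))]
    return X, y

def chronological_split(X, y, train=0.7, val=0.15):
    """Split in time order: earliest samples train, latest samples test."""
    n = len(X)
    i, j = int(n * train), int(n * (train + val))
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])

# Toy hourly volumes (vehicles/hour); the study itself uses the MnDoT
# ATR Station 301 records described in Section IV-A.
volumes = [310, 280, 450, 900, 1500, 1400, 1200, 1100, 600, 400]
X, y = make_lagged_dataset(volumes)
train, val, test = chronological_split(X, y)
```

The resulting (X, y) pairs can then be fed to any of the compared regressors; only the lag construction and the ordering constraint are specific to the time-series setting.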
A. Data description

The traffic data used in this paper is multivariate, sequential, time-series data provided as an open-source dataset on the UCI Machine Learning Repository by the Minnesota Department of Transportation (MnDoT) [24]. The weather data is obtained from OpenWeatherMap [25]. The data was collected between 2012 and 2018 and comprises hourly traffic volume on Interstate 94 Westbound for MnDoT Automatic Traffic Recorder (ATR) Station No. 301, which lies approximately halfway between Minneapolis and St. Paul, Minnesota. It consists of a total of eight predictors and 48,204 observations. This data has also been used by the authors in [26]. A detailed description of each of the attributes is given below:

D. Model selection

The K-fold cross-validation method was employed to compare the various machine learning models and select the best one. We used the typical K value of 10 [27].

E. Hyperparameter setting

The maximum tree depth is set to 8 since it ensures a balance between performance and computational complexity. The same maximum tree depth is used for decision trees, gradient boosting decision trees and random forests.

TABLE I: XGBoost hyperparameter settings

Hyperparameter | Value
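A minimal sketch of the 10-fold cross-validation used for model selection (an illustration, not the paper's implementation): the sample indices are partitioned into K disjoint validation folds, each candidate model is scored once per fold, and the average score drives the selection. The `score_fn` callable and the n = 25 example are hypothetical placeholders.

```python
def kfold_indices(n, k=10):
    """Yield (train_idx, val_idx) pairs; every sample appears in exactly
    one validation fold."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

def cross_val_score(score_fn, n, k=10):
    """Average score_fn(train_idx, val_idx) over the k folds."""
    scores = [score_fn(tr, va) for tr, va in kfold_indices(n, k)]
    return sum(scores) / len(scores)

# Example with a dummy scorer that just reports the validation-fold size.
avg = cross_val_score(lambda tr, va: len(va), n=25, k=10)
```

In practice `score_fn` would fit a model on the training indices and return a validation metric such as RMSE; note that for strictly temporal evaluation a chronological split is often preferred over shuffled folds.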
Fig. 2: Comparison of prediction results of XGBoost and SVM during Independence Day, bad weather conditions and normal days. (c) XGBoost+RIDGE vs. SVM during Independence Day (05 July 2018); (d) XGBoost+RIDGE vs. SVM during a bad weather condition (25 August 2018); (e) XGBoost+LASSO vs. SVM during normal days (18-28 July 2018); (f) XGBoost+RIDGE vs. SVM during normal days (18-28 July 2018).
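Scenario comparisons like those in Fig. 2 amount to scoring each model on the subset of hours tagged with a given scenario (holiday, bad weather, normal). A sketch of that bookkeeping using mean absolute error; the tiny arrays and scenario tags below are hypothetical and are not the paper's measured results.

```python
def mae(actual, predicted):
    """Mean absolute error over paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mae_by_scenario(actual, predicted, tags):
    """Group hourly errors by scenario tag and report per-scenario MAE."""
    out = {}
    for tag in set(tags):
        idx = [i for i, t in enumerate(tags) if t == tag]
        out[tag] = mae([actual[i] for i in idx],
                       [predicted[i] for i in idx])
    return out

# Hypothetical hourly volumes, model predictions and scenario labels.
actual    = [900.0, 950.0, 300.0, 310.0, 1500.0, 1480.0]
predicted = [880.0, 960.0, 350.0, 305.0, 1450.0, 1500.0]
tags      = ["holiday", "holiday", "bad_weather", "bad_weather",
             "normal", "normal"]
scores = mae_by_scenario(actual, predicted, tags)
```

The same grouping works for any per-model metric, so both compared models can be scored on identical scenario subsets.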