00 1-S2.0-S2405844023016882-Main

Heliyon 9 (2023) e14481
Contents lists available at ScienceDirect
Heliyon
journal homepage: www.cell.com/heliyon
Research article
Forecasting time trend of road traffic crashes in Iran using the

macro-scale traffic flow characteristics
Habibollah Nassiri a, *, Seyed Iman Mohammadpour b, Mohammad Dahaghin c
a
Civil Engineering Department, Sharif University of Technology, Tehran, Iran
b
c
A R T I C L E I N F O A B S T R A C T
Keywords: Background: The serial correlation in the time series datasets should be considered to prevent
Time series biased estimates for coefficients. Nonetheless, the current models almost cannot explicitly handle
SARIMAX autocorrelation and seasonality, and they focus mainly on the discrete nature of data. Nonethe
Traffic safety
less, the crash time series follows a normal distribution at the macro-scale. Moreover, the influ
Multivariate
ential exogenous variables have been overlooked in Iran, employing univariate models. There are
Crash prediction
Speed also contradictory results in the literature regarding the effect of average speed on crash
frequency.
Objective: This study is aimed to evaluate the distinct impacts of mean speed on total and fatal
accident time series at the national level. Besides, the SARIMAX modeling framework is intro
duced as a robust multivariate method for short-term crash frequency prediction.
Method: To this end, monthly total and fatal crash counts were aggregated for all rural highways
in Iran. Besides, the time trends of traffic exposure, and average speed recorded by loop detectors,
were aggregated at the same level as covariates. The Box-Jenkins methodology was employed for
time series analysis.
Results: The results illustrated that the seasonal autoregressive integrated moving average with
explanatory variable (SARIMAX) model outperformed the univariate ARIMA and SARIMA
models. Also, SARIMA was more appropriate than the simple ARIMA when seasonality existed in
the time series. Besides, the average speed had a negative linear association with the total crashes.
In contrast, it revealed an increasing effect on fatal crashes.
Conclusion: Average speed has a dissimilar effect on the different traffic crash severities. Besides,
the seasonal nature of data and the dynamic effects of the influential underlying factors should be
considered to prevent underfitting issues and to predict future time trends accurately.
Applications: The developed instruments could be employed by policymakers to evaluate the in
tervention’s effectiveness and to forecast the future time trends of accidents in Iran.
1. Introduction
Motor vehicle crashes (MVCs) have become a critical public health issue worldwide, causing nearly 1.35 million fatalities annually
* Corresponding author.
E-mail addresses: nassiri@sharif.edu (H. Nassiri), s.mohammadpour@student.sharif.edu (S.I. Mohammadpour), Mohammad.dahaghin@alum.
sharif.edu (M. Dahaghin).
https://doi.org/10.1016/j.heliyon.2023.e14481
Received 1 April 2022; Received in revised form 3 March 2023; Accepted 7 March 2023
Available online 11 March 2023
2405-8440/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
H. Nassiri et al. Heliyon 9 (2023) e14481
[1]. The reported data indicates that MVCs were the eighth-leading cause of death in 2016; however, if no significant measures are
taken, it is expected to escalate to the seventh position by 2030 [2]. Recognizing the severity of this problem, the United Nations
included the prevention of the road traffic injuries as a part of the sustainable development goals in 2015 [3]. In Iran, the annual death
rate from road traffic crashes is 25.5 per 100,000 people, which is significantly higher than the global average of 18.2 per 100,000
people [1]. Therefore, in order to develop effective interventions, it is crucial to gain an understanding of how traffic accidents will
evolve over time, while taking into consideration the concurrent effects of various time-varying influential factors.
Time series regression methods are utilized to identify the relationship between crash counts for a geographic region (e.g., county,
country) and their underlying time-varying regressors, such as macroeconomic conditions, operational traffic characteristics, and
socio-demographic specifications. These methods target three purposes: to forecast future accident trends, given the future values of
explanatory variables, to compare the relative importance of various time-varying contributing factors, and to conduct interrupted
time series analysis aimed to evaluate the effectiveness of a particular policy or countermeasure. Nonetheless, the validity and per
formance of such models largely rely on selecting accurate econometric models. To recognize an adequate model postulation,
recognizing count variables characteristics is essential as MVCs are nonnegative, discrete, and sporadic event count [4].
Previous studies have primarily utilized Negative Binomial (NB), and Poisson regressions for crash prediction, as crash counts are
nonnegative integers. However, such models have a restrictive implicit assumption of independently and identically distributed error
terms, which can be inappropriate for analyzing time series data. In the case of the crash time series, regular and seasonal serial
correlations are intrinsic features that should not be overlooked, as disregarding them may result in inefficient parameter estimates and
erroneous inferences [4]. To address this issue, scholars have introduced autoregressive (AR) terms to discrete-valued regressions,
including the Poisson autoregressive (PAR) [5] and integer-valued Poisson (INAR) models [4]. However, these methods only consider
the AR terms and overlook the moving average (MA) components. Additionally, they neglect the overdispersion feature of crash time
series. To tackle these problems, the NBGARCH [6] and GLARMA [7] methods have been developed to handle autocorrelated,
integer-valued, and over-dispersed time series. Nevertheless, these methods still fail to account for seasonal autocorrelations in crash
time series, which is a significant limitation. In order to address the limitation of traditional time series models the seasonal autor
egressive integrated moving average (SARIMA) model has been proposed as a linear dependence modeling framework that can
effectively analyze time series, taking into account seasonality [8]. Previous studies have utilized the ARIMA method to model monthly
crash counts aggregated at the intersection level and to evaluate photo enforcement efficiency [9]. More recently, the multivariate
ARIMA (ARIMAX) model has been shown to outperform its univariate counterpart by considering vehicle, human, and roadway
predictors of crash time series [10]. Furthermore, some scholars have suggested the use of the multivariate SARIMA (SARIMAX) model
to account for both the dynamic effects of influential predictors and the seasonality of crash time series [11]. One of the main dis
advantages of ARIMA-based models is that they assume a normal Gaussian distribution for the error term, which does not take into
consideration the discrete nature of crash counts. Nevertheless, this restriction may not be particularly relevant in highly aggregated
datasets with large spatial and temporal dimensions, where the mean of counts is relatively high, and the assumption of a Gaussian
normal distribution for the error term is appropriate [4].
The statistical method is widely recognized for its comprehensibility and strong theoretical underpinnings. However, the inflexible
assumptions regarding the data distribution and the restricted paradigmatic relationships imposed on the link function between the
dependent variable and predictors are of significant concern. This is due to the potential bias in parameter estimates and mis
interpreted findings resulting from the violation of implicit assumptions inherent in the statistical approach. In recent times, the
nonlinear associations observed in actual safety data have prompted a shift in focus from traditional frequentist approaches to machine
learning (ML) algorithms. While machine learning models have been criticized for their lack of transparency, this is not a significant
impediment when the primary aim of the study is to make predictions rather than interpret the results. Consequently, prior studies
have utilized feedforward, multilayer feedforward, and recurrent neural networks to accurately represent the nonlinear behavior of
accident time series [8]. Nonetheless, deep learning models tend to perform poorly when dealing with small and less complex datasets,
reducing the chances of effective training. There is empirical evidence that supports this claim. A comprehensive comparative study
that evaluated various time series prediction models for the COVID-19 outbreak revealed that traditional statistical models out
performed advanced deep learning models at the onset of the outbreak due to the low sample size available [12]. That is especially the
case for crash prediction models that rely on monthly data for a short period, containing up to 80 data points. More importantly, in
crash analysis, the applicability of machine learning methods is limited by their black-box nature, particularly when the primary
objective of the study is to prioritize the evaluation of the effects of various exogenous variables on the target variable rather than just
prediction. Moreover, the theoretical basis of neural network methods for time series forecasting is vague, as it is not clear whether
seasonal and trend dependencies should be removed prior to model estimation. Additionally, the best method of splitting the data for
testing and training is not obvious in the methodology [13]. On the other hand, the classical SARIMA model has a well-established
theoretical basis, and the availability of statistical packages to employ such models has increased the instrument’s applicability for
policymakers and local jurisdiction safety experts.
It is evident that speed is a major factor in the number of fatalities on the roads in high-income countries, with an estimated 30% of
deaths attributed to it. Meanwhile, in some low-income and middle-income countries, speed is estimated to be the primary contrib
utory factor in about half of all road crashes [14]. Reviewing the literature, it is evident that if speed increases while other conditions
remain unchanged, accidents will tend to be more severe [15,16]. However, there are contradictory results regarding speed’s effects on
crash frequency.
Previous research has yielded contradictory findings on the relationship between mean speed and crash occurrence, despite the
established understanding that speed variation is detrimental to traffic safety [17–19]. Some studies have found no significant as
sociation or even an inverse association between crash frequency and mean speed. However, these studies have been criticized for not
2
accounting for the “frequency-severity indeterminacy”, which affects the distribution of accident severities and types, leading to
inconclusive results [16].
1.1. Research motivation and objectives
Compared to simple ARIMA models, the SARIMA model is advantageous in modeling linear serial correlations in seasonal series
[20] and is amongst the most efficient linear methods for such series [21]. However, the multivariate SARIMAX model, which accounts
for both the seasonal effects and the underlying exogenous variables, has seldom been applied in the macro crash forecast [4].
Meanwhile, the underfitting issue in univariate models might result in a lack of generalizability and biased out-of-sample predictions.
Accordingly, this research makes a noteworthy contribution by developing the multivariate SARIMA (SARIMAX) model and
comparing its results to those of the univariate model, thereby filling a gap in the literature. Besides, previous studies have not
employed multivariate methods to reveal the quantitative effects of traffic exposure and average speed on the traffic accidents time
trend in Iran. Nonetheless, illustrating the partial contribution of these important contributing factors is crucial for adopting proper
countermeasures. Considering the dynamic effects of the time-varying influential factors would improve the forecast accuracy and
generalizability of the proposed instruments. Accordingly, this research contributes to the literature by employing the multivariate
SARIMAX model to predict the national total and fatal traffic crashes on rural Iran highways. Estimating future time trends of crash
frequency plays a key role in designing interventions to improve traffic safety and assists policymakers in evaluating the effectiveness
of the implemented countermeasures.
To avoid an inverse association between crash likelihood and mean speed and misleading conclusions, the impacts of average speed
on distinct crash types must be studied separately. Otherwise, the most common accident type in the study area could overshadow the
different impacts of speed on other crash types. Unfortunately, previous studies have not investigated the different crash mechanisms
for the different accident severities/types, which would explain the inconsistent findings in the literature regarding the speed-safety
association. Hence, this paper investigates the distinct impacts of mean speed on total and fatal traffic crashes, considering the con
tradictory results in the literature.
The remainder of the paper is structured as follows: Section 2 presents a comprehensive literature review, highlighting state-of-the-
art studies and research gaps. Section 3 provides details of the dataset, SARIMAX, and SARIMA models. Section 4 presents and dis
cusses the forecasting results obtained after fitting the models to the data. Finally, the paper is concluded, and future research rec
ommendations are provided in Section 5.
2. Literature review
2.1. Methodologies in crash count time series analysis
2.1.1. Integer-valued statistical methods

In the last two decades, the negative binomial and Poisson models have widely been used to predict cross-sectional, panel, and time
series crash count data [22,23]. The results of studies illustrated that these models are appropriate for the nonnegative integer nature
of accident count data. Nonetheless, these regression models have the implicit assumption that observations should be independent of
each other. Consequently, they cannot incorporate the autocorrelation and seasonal fluctuations of time series data in the model
development. Accordingly, different solutions have been proposed by researchers to deal with the impairment. For example, some
studies adopted a time trend variable in the model as an explanatory variable to account for the serial correlation [24,25]. Others have
utilized the state-space [26] or the Praise-Winston autoregressive (AR) models on crash count time series [27]. Despite the potential
benefits of implementing the aforementioned methods, there is no guarantee that they will take into account the effect of serial
correlation, particularly in the case of long-time series count data [4].
To explicitly account for the effect of autocorrelation, scholars developed the integer-valued autoregressive (INAR) Poisson model,
dealing both with the serial correlation and discrete nonnegative nature of accident time series [4,28]. However, the INAR (1) Poisson
model still had significant limitations, as it constrained the analysis to the stationary time series process. Besides, the dynamic
properties of the influential factors were not adequately described in the INAR model. Moreover, the model’s structure was not flexible
enough to handle the serial correlation appropriately, as solely the first-order autoregressive term (AR (1)) was allowed, and there was
no moving average (MA) term. Also, it could not handle the over-dispersed crash count data. Recently, studies employed the linear
Poisson autoregressive (PAR(p)) model that can account for higher-order AR(P) processes and also deals well with the discrete nature
of crash counts [5]. Nonetheless, both the PAR and INAR models fail to deal well with the effect of overdispersion. To address the issue,
Ye et al. [6] introduced an extension of an NB logit regression model, termed the NBINGARCH model. The model contained
explanatory variables and simultaneously accounted for the serial correlation and overdispersion characteristics of the time series
count data. However, the methodology has a notable limitation in that none of the mentioned models explicitly account for seasonal
effects in the long-term accident time series.
2.1.2. Real-valued statistical methods vs. machine learning algorithms

The autoregressive integrated moving average (ARIMA) model [29] has been widely used to incorporate the serial correlation
feature of time series. The seasonal ARIMA model (SARIMA) also considers the seasonality of time trends and has indicated superiority
over well-known predictive methods such as artificial neural networks, exponential smoothing, and moving average in the literature
[30,31]. Recently, Blázquez-García et al. [32] compared the predictive capability of the SARIMA model with the artificial neural
3
networks (ANN) and the generalized additive model (GAM). The results indicated that the SARIMA approach outperforms the
competitive models. Qian et al. [8] analyzed the monthly road traffic fatalities in China from 2000 to 2017. The accuracy of SARIMA
and Elman recurrent neural network (ERNN) models was compared, utilizing the mean absolute percentage error (MAPE) criterion.
The ERRN model revealed slightly better performance. Indeed, the nonlinear models, such as the ANN model, solely outperform the
traditional linear ARIMA models when the nonlinear behavior is observed in the time series [8].
A study recently investigated the yearly traffic fatality counts in Malaysia. The research compared the performance of ARIMA
(0,1,1) with the Poisson GLM and the negative binomial GLM models. The results demonstrated that the ARIMA model performed
better based on the MAPE criterion [33]. Quddus [4] reported that when the normality assumption of errors in the ARIMA model is
violated, the INAR Poisson model leads to better performance. Nonetheless, the ARIMA outperforms the INAR model in the case of
aggregated time series. Because, in this case, the mean of the counts is high, and the normal error distribution assumption is satis
factory. In conclusion, despite complicated machine learning and integer-valued time series regression models being other options for
predicting crash counts, the classical ARIMA technique and its extensions still lead to predictions with high accuracy.
A recent study analyzed the time series of occupational traffic fatality rates in the southern states of Brazil. The study applied the
βARIMA and KARIMA extensions of the ARIMA model to model the random effects of the time series data explicitly. Results indicated
that the simple ARIMA model almost outperforms the extensions in the aggregated time series [34]. Disaggregated time series crash
counts generally follow the Poisson-type distribution. So, Quddus [7] employed the Poisson-based generalized linear autoregressive
and moving average (GLARMA) model to a time series dataset of aircraft safety data, analyzing an intervention’s effectiveness within
UK airspace. The study revealed that accounting for serial correlation in airprox counts would be more important than considering the
over-dispersion while dealing with time series crash data. Quddus [7] suggested the use of the real-valued ARIMA model and its
extensions for datasets with a high mean count (greater than 50) and a large spatial and temporal resolution, such as national-level
fatal accidents. It is important to note, however, that the NBINGARCH or GLARMA models may be more suitable for highly dis
aggregated time series datasets. Indeed, if the aggregate data follows a normal distribution, the Poisson distribution assumption of
these models would result in biased estimates and misleading interpretations. Besides, these models cannot explicitly control for
inherent seasonal variations in the time series process.
2.1.3. Multivariate analysis of crash time series

Recent studies have indicated that incorporating human, vehicle, and environmental factors in time series analysis of crash datasets
raises the prediction capability of the univariate forecasting methods [10]. Consequently, the ARIMA with explanatory variable
(ARIMAX) modeling technique could outperform the simple ARIMA in terms of accuracy and comprehensiveness if such data is
available. Nonetheless, the vast majority of studies in the literature analyzed only the aggregated crash counts without considering the
simultaneous effects of influential underlying factors [8,34–36]. Although a limited number of studies have adopted exogenous
variables as covariates, most of them only adopted dummy variables, evaluating an intervention’s effectiveness [37–40]. The studies
which conducted the multivariate dynamic time series analysis have adopted operational traffic characteristics (e.g., traffic flow),
macroeconomic conditions (e.g., GDP, unemployment rate), socio-demographic specifications (e.g., total population, automobile
numbers), weather conditions, and traffic violations [4,5,10,37,41–44]. However, no study analyzed the dynamic impacts of mean
speed on different crash severity levels, controlling for traffic exposure. Besides, the SARIMAX modeling technique has not been
well-established for analyzing accident time series in the literature.
2.2. Speed-crash association
On the one hand, the early research suggested that the speed-crash relationship could be represented by a “U-shaped” curve [45,
46]. This implies that only high and low speed ranges contribute to accidents. Nonetheless, subsequent studies revealed exponential,
power, or linear model postulations for the positive speed-crash association [47–51]. On the other hand, some studies found a negative
[17,18,52,53] or an insignificant association [18,54–56]. Solomon [45] put forward the idea that the deviation from the modus speed
matters. Subsequent studies have revealed that speed variation is a major factor in causing accidents and that the simultaneous effects
of mean speed are comparatively negligible. This has led to the conclusion that “Variance Kills, not speed” [18,56]. In contrast, some
findings suggest that speed and speed variations are both significant predictors of crash frequency [57,58].
Since the emergence of the COVID-19 pandemic, intercity traffic volume has decreased, and mean speed has increased significantly
in some countries whose results on traffic safety would bring valuable insights. The traffic volume dropped sharply during the
pandemic, even up to 50% in more than 12 countries [59]. With a such decline in exposure to crash risk, traffic accidents were expected
to decrease significantly. Meanwhile, despite a consistent decline in the number of total crash counts, the number of road traffic deaths
has increased in some countries. Studies attributed the surge in severe accidents to mobility with empty lines, reduced crowding, and
increased speeding [59]. The empirical findings of recent studies also referred the increase in severe accidents to the increase in
aggressiveness (e.g., speeding, drunk driving, improper passing) and inattentiveness (e.g., unbelted driving, distracted driving, failing
to signal) of drivers on empty roads [60]. These findings suggest that the effect of mean speed on accidents would not be similar for
different severity levels necessarily.
Recent studies have demonstrated that speed and its variations are not the only factors responsible for higher crash frequencies but
that the combination of specific traffic conditions also plays an important role in crash occurrences [61]. Factors such as traffic flow
(traffic exposure), vehicle occupancy, and road geometric design have been reported as influential factors. Traffic flow (traffic
exposure) has always been considered the main predictor of crash risk [22,62,63]. Besides, the time trends of this crucial variable
explain the general trends of crash counts. Consequently, scholars would not correctly extract the partial effects of other variables
4
without controlling for traffic exposure.

Previous researches that have shown an inverse relationship between speed and crash occurrence have attributed their findings to
unobserved heterogeneity, endogeneity, and confounding factors. However, they have not investigated influential mechanisms, such
as the proportion of distinct accident types, which could help interpret these findings.
3. Materials and methods
3.1. Data source and data distribution
The National Traffic Police (NAJA) provided the surveillance data for this study, which consisted of all traffic accidents on rural
Iran highways recorded on standard COM114 forms. The dataset included information about the passengers/occupants involved in
accidents, accident severity, and accident type. Since 2016, the Ministry of Roads and Urban Development (MRUD) has collected daily
data from both police reports and loop detectors. The loop detectors measure the Time Mean Speed (TMS) of vehicles on the roads,
which is “the average speed of all vehicles passing a point on a highway or lane over a specified time period” [64]. The speed of
individual vehicles can be calculated by dividing a specified distance (the length of the detector plus the average length of a vehicle)
over the measured travel time (the time span that the vehicle is sensed by the loop detector). The TMS is simply the mean of individual
vehicle speeds, estimated by the point detector. The average monthly TMSs recorded by 2604 loop detectors across the country were
used in this study as a measure of monthly operating speed on rural highways.
As the traffic exposure variable plays a determinative role in the crash occurrence, inductive loop detectors were used to measure
daily traffic volume on rural highways. To effectively capture the seasonal fluctuations in Iranian travel patterns, the data was
aggregated on a monthly basis according to the Persian calendar. Finally, the time series of total and fatal traffic crashes, average
monthly traffic flow, and average monthly TMS from Farvardin 1, 1395 (March 20, 2016) to Mordad 31, 1400 (August 22, 2021) were
utilized for analysis (e.g., 65 data points). Table 1 presents the descriptive statistics of the target variable and the exogenous variables.
The proposed crash prediction model will evaluate the impact of each contributing factor on the monthly number of crashes.
Fig. 1 indicates the normal probability plots, histograms, and KS statistics of monthly total and fatal accidents time series data.
These statistics imply that the normal distribution assumption for the error term in the Box-Jenkins method is appropriate for the
macro-level data, and suggest that the discrete-valued models with the Poisson distribution assumption would not be suitable for this
case.
3.2. Methodology
The ARIMA model, developed by Box and Jenkins [29], is a combination of the moving average (MA) and autoregressive (AR)
models, which explicitly incorporates differencing in its formulation. The AR model characterizes a time series based on the rela
tionship between current observations and their lagged values, whereas the MA model is a representation of a time series as an
amalgamation of present and prior random error [10]. To account for seasonality, the seasonal ARIMA (SARIMA) adds three seasonal
components to the model. The model’s structure is denoted as SARIMA (p, d, q)(P, D, Q)s , where d is the number of non-seasonal
differencing operations, q is the moving average order, p is the autoregressive order, and s represents the number of periods in
each season. The seasonal components are represented by D, Q, and P. The SARIMA model can be expressed as a series of lag poly
nomials, as represented in Eq. (1) [65]:
ΦP (Ls )φp (L)zt = ΘQ (Ls )θq (L)εt (1)
Where zt = yt − yt− s denotes the first-order seasonal difference, eliminating non-stationarity from the time series, and εt is the Gaussian
white noise error at time t.
The SARIMAX model, as introduced by scholars in Ref. [66], extends the SARIMA model by including exogenous variables to
enhance its explanatory power. Referred to as “dynamic regression” in the literature [67], the SARIMAX model captures the dynamic
effects of external factors on the time series of interest. According to its definition in Ref. [68], the SARIMAX model can be expressed as
Table 1
Descriptive statistics of data.
Variable Number of months Minimum Maximum Mean Std. Deviation
Response variables
Number of total accidents 65 6929 16,138 11203.06 2340.02
Number of fatal accidents 65 235 560 415.26 68.81
Exogenous variables
Mean speed (kph) 65 76.18 80.94 78.61 0.91
Traffic flow (∗105 ) 65 261.42 698.76 515.73 79.53
Eviews version 10 was used to estimate the models employing the maximum likelihood procedure and to calculate the ADF and BDS tests. Addi
tionally, Minitab version 19 was used to conduct the Kolmogorov-Smirnov (KS) and Ljung-Box (LB) tests on residuals and to represent the ACF/PACF
and time series plots.
5
Fig. 1. Normal probability plots, histograms, and KS statistics illustrate the normal distribution of target variables. Panels (a) and (b) represent the
normal probability plots of the fatal and total accidents, respectively; While Panels (c) and (d) indicate the histograms of the fatal and total ac
cidents, respectively.
indicated in Eqs. (2) and (3):

yt = ϖ 0 It + βX + Nt (2)
Where;
θ(B)Θ(B)ut
Nt = (3)
φ(B)Φ(Bs )(1 − B)d (1 − Bs )D
In which It is the intervention term; yt is the appropriate transformation of the target variable, X is a vector of predictors, and Nt
denotes the error component, represented by SARIMA (p, d, q)(P, D, Q)s . Besides, the Φ, φ are the seasonal AR (SAR) and regular AR
operators. The Θ, θ are the seasonal MA (SMA) and regular MA operators, and Bs , B are the seasonal and regular backshift operators.
The ut is a random error component.
The Box-Jenkins method is a three-step iterative modeling process that includes model identification, parameter estimation, and
diagnosis checking [29]. The stationarity assumption is a core implicit assumption of the Box-Jenkins models, which necessitates that
the statistical features of the time series, such as mean, variance, and serial correlation, remain constant over time [10].
To begin with, the stationarity and linearity of the series should be verified using the ADF (Augmented Dickey-Fuller) [69] and BDS
(Brock, Dechert, and Scheinkman) [70] tests, respectively. The ADF test is a statistical tool used to determine the presence of a unit root
in a given series, which implies that the series is non-stationary. The null hypothesis of the ADF test is that the time series contains a
unit root. The BDS test has the null hypothesis that time series data is independent and identically distributed (iid). This test has a
unique capability to detect nonlinearities while not being affected by linear dependencies in the data. It is necessary to set two free
parameters - the epsilon value (ε) and the embedding dimension (m) - when conducting the BDS test. Generally, the embedding
dimension should range from 2 to 8, although the maximum embedding dimension should be lower if there are fewer observations
available [71]. Simulation studies suggest that the BDS test should be computed using an epsilon value in the range of 1.5–2.0 times the
standard deviation of the time series [72]. This study computed the BDS statistics, setting the m and ε parameters to 5 and 1.5 σ ,
respectively. Additionally, stationarity can be evaluated by visually inspecting the time series and analyzing the autocorrelation
function (ACF) and partial autocorrelation function (PACF) plots. Model identification is the subsequent step, which entails deter
mining the parameters order of the Box-Jenkins model through examination of the PACF and ACF plots. Subsequently, the parameters
6
of the formulated model are estimated using the maximum likelihood method. Ultimately, the alternative models are compared based
on the lowest MAPE and BIC criteria.
After developing the model, the next task is to perform diagnostic checking to evaluate its accuracy. Visually assessing the white
noise properties of the residuals, such as the constancy of variance, independence, and zero mean, can be done by inspecting the PACF
and ACF residual plots. Additionally, the normality and independence of residuals can be evaluated using the Kolmogorov-Smirnov
(KS) and Ljung and Box (LB) tests, respectively. The Ljung-Box test [73] is a measure of how well the autocorrelation of the re
siduals matches the autocorrelation of white noise. If the Ljung-Box test statistic is not statistically significant, it means that the null
hypothesis of the white noise characteristics of the residuals cannot be rejected. Finally, the efficiency of the model’s forecasts can be
assessed by comparing the observed and predicted monthly crash frequencies. All statistical tests in this study are conducted at the 5%
significance level.
4. Results and discussion
The purpose of this research was to propose an accurate time series regression model for crash macro forecast in Iran. To this end,
ARIMA, ARIMAX, SARIMA, and SARIMAX econometric models were developed and compared for total and fatal crash count time
series datasets. The datasets were divided into two sections, with the first 55 observations from Farvardin 1, 1395 (March 20, 2016) to
Mehr 30, 1399 (October 21, 2020) utilized as the training set and the last ten months employed as the test set. The developed models
were evaluated employing the higher R2 , the lower MAPE, and the lower Bayesian information criterion (BIC) as forecast accuracy
measures.
4.1. Accident macro forecast for the fatal crash count dataset
Fig. 2 illustrates the monthly time trends of fatal traffic accidents in Iran. It is evident that monthly fatal accidents follow the time
trends of the traffic flow (traffic exposure) closely. They both pick up in the first two months of the year, during the New Year (Nowruz)
holidays. Then they decline till the fourth month when the summer school holidays start. The trends pick up again more intensively till
the sixth month, when summer vacations end. They then drop off during the autumn and the first two months of the winter, and they
start the increasing pattern in the last month again. Consequently, the seasonal nature of the crash time series is evident, confirming the
necessity of implementing the advanced SARIMA models.
The time series plot of the monthly fatal crashes indicates that the series is stationary in terms of variance and mean, as there are no
significant trends or fluctuations in variance over time. The partial autocorrelation function (PACF) and autocorrelation function
(ACF) plots for the first 24 lags display a damped sinusoid behavior, with the significance of the first lags’ autocorrelation coefficients,
which suggests that the series is stationary (see Fig. 3). The ADF test further confirmed the stationarity of the series rejecting the null
hypothesis of non-stationarity at a 99% confidence level (see Table 2).
Furthermore, the BDS test revealed that the monthly fatal accidents follow a linear time series, since the test statistic did not reject
the null hypothesis at a 5% significance level (see Table 2); Therefore, linear time series regression methods would be suitable for
modeling the process.
Fig. 2. Monthly total (*10)/fatal crash counts and monthly traffic flow (*10^5) in Iran.
7
Fig. 3. The ACF and PACF plots of the original fatal crash count time series, represented in panels (a) and (b), respectively.
Table 2
P-values of statistical tests for linearity and stationarity.
Time series BDS test ADF test
m=2 3 4 5
Monthly fatal crashes 0.320 0.938 0.993 0.999 0.000

Ln (monthly total crashes) 0.000 0.000 0.000 0.000 0.050
Monthly total crashes 0.000 0.000 0.000 0.000 0.066
In order to identify the most appropriate model, the ACF and PACF plots were examined. The ACF plot showed a damped sinusoidal
pattern, with only the first lag coefficients having high values. Moreover, the PACF demonstrated a spike at the first lag, followed by a
tailing off, indicating the presence of autoregressive terms in the series. As spikes were also observed in subsequent lags in both plots, it
was concluded that the time series was a SARIMA process. To determine the best model, multiple SARIMA models were developed,
ranked by the lower BIC criterion. The statistical significance of the top 20 models’ parameters was evaluated at a 95% confidence
level, as shown in Fig. 4. Finally, two univariate models were selected for further analysis, and their features are described in Table 3.
The univariate seasonal and non-seasonal models were found to be the best fit, with SARIMA (1, 0, 0)(1, 0, 0)12 and ARIMA (1,0,0)
models, respectively. As presented in Table 3, the estimated ARIMA model had an R2 value of 0.472, indicating that roughly half of the
variation in current monthly fatal crashes could be explained by past fatal crashes. The inclusion of seasonal dependencies in the
SARIMA model increased the R2 by 1.5 times, resulting in a value of 0.666.
The multivariate SARIMAX model demonstrated superior performance to other models, with a coefficient of determination of
0.830. Monthly mean speed and traffic exposure had a significant positive impact on fatal accidents. Diagnostic checks confirmed that
the developed models were appropriate for forecasting time series data of fatal accidents in Iran. The ACF and PACF plots of the
residuals displayed no large coefficients at any lag, indicating that there was no serial correlation among the residuals and that the
models accurately captured the underlying time series process, as illustrated in Fig. 5.
The overall accuracy of the models was also evaluated using the LB test, and the test statistics were found to be statistically sig
nificant at a %95 confidence level (see Table 3). The estimated models were employed to forecast the first ten months of the out-of-
sample period (see Fig. 6). The out-of-sample forecast accuracy of the models was reported as the mean absolute percent error (MAPE)
8
Fig. 4. BIC values of seasonal [panel (a)] and non-seasonal [panel (b)] univariate models. The values in parentheses show the order of model
parameters as (p,q) (P,Q).
Table 3
Accident prediction models for monthly fatal accidents in Iran.
ARIMA ARIMAX SARIMA SARIMAX
Coef. T-stat Coef. T-stat Coef. T-stat Coef. T-stat

Explanatory variables
Non-Seasonal AR (1) 0.677 7.19 0.608 5.180 0.534 4.09 0.442 3.229
Seasonal AR (1) – – – – 0.636 4.46 0.493 3.651
Constant 420.839 20.34 − 44.930 − 0.056 409.677 15.51 – –
Monthly traffic flow (* 105 ) – – 0.615 7.723 – – 0.553 5.760
Mean speed (Km/ ) – – 1.835 1.857 – – 1.679 2.513
hr
Descriptive statistics
Length of series 55 55 55 55
Log-likelihood − 293.30 − 267.67 − 283.64 − 263.57
Accuracy (within sample)

Bayesian information criterion (BIC) 10.884 10.098 10.606 9.949
Mean absolute % error (MAPE) 10.205 6.092 8.209 5.761
Mean absolute deviation (MAD) 41.528 25.160 33.162 24.027
Root mean square error (RMSE) 49.805 31.301 39.589 28.242
R2 0.472 0.791 0.666 0.830
Diagnosis Check of Residuals

Ljung and Box (LB) (K = 14) 47.91 0.73 25.24 0.11 10.53 0.57 10.95 0.76
Kolmogorov-Smirnov (KS) 0.086 <1.46 0.054 <1.46 0.100 <1.46 0.082 <1.46
Forecast Accuracy
Out of sample MAPE (%) 17.67 8.38 11.29 5.61
in Table 3. Reviewing the SARIMAX model in Table 3, the mean speed has a linear positive association with fatal accidents, controlling
for traffic exposure.
4.2. Accident macro forecast for the total crash count dataset
The time series in Fig. 2 showed an increasing trend in monthly accidents, indicating non-stationarity in the mean. The stationarity
was further examined using the ADF test, which did not reject the non-stationarity null hypothesis at a 95% confidence level, as
indicated in Table 2. Additionally, non-stationarity in variance was observed after the 42nd data point, possibly due to the Covid
pandemic (see Fig. 2). Utilizing a Box-Cox transformation with a λ of 0 enabled the mean and variance to be stabilized, resulting in a
stationary series. The ADF test confirmed the stationarity of the transformed series, rejecting the null hypothesis at a 5% significance
level, as shown in Table 2. Additionally, the ACF plot of the transformed time series indicated stationarity, with only the first lag
9
H. Nassiri et al.
10
Heliyon 9 (2023) e14481

Fig. 5. The residuals ACF and PACF of ARIMA and SARIMA Models (fatal crashes). Panels (a) and (b) represent the ACF plots of ARIMA and SARIMA reseduals, respectively; While panels (c) and (d)
indicate PACF plots of ARIMA and SARIMA reseduals, respectively.
Fig. 6. Observed vs. forecasted values of monthly fatal accidents in Iran.
coefficients being statistically significant (see Fig. 7).

The transformed series ACF plot in Fig. 7 demonstrated a damped pattern, while the PACF plot showed a single significant spike at
the first lag, indicating a pure AR (1) process. Despite developing various SARIMA models, none of the estimated models had
Fig. 7. The ACF and PACF plots of the transformed total crash count time series, represented in panels (a) and (b), respectively.
11
statistically significant parameters. Finally, the ARIMA (1,0,0) and ARIMA (1,0,0) with exogenous variables (ARIMAX) models were
chosen as the best models for forecasting monthly total accidents, and their estimated parameters are provided in Table 4.
Upon reviewing Table 4, it is observed that all the regressors, except for the seasonal model, are statistically significant at a 95%
confidence level. The R2 value of the ARIMA model is 0.664, indicating that the current crash counts can be explained by taking into
account the autocorrelations in the series. The multivariate ARIMAX model performs better than the univariate model and explains
90% of the variation in the crash time series. The MAPE values suggest that the aforementioned models accurately reproduce ob
servations with an average absolute error of approximately 1%. Furthermore, the ARIMAX model suggests that the monthly crash
frequency is negatively associated with the mean speed while controlling for the simultaneous impacts of traffic exposure.
After conducting tests on the residuals, including checking for non-autocorrelation, variance stationarity, and zero mean, the
partial autocorrelation function (PACF) and autocorrelation function (ACF) plots were analyzed and are presented in Fig. 8. The results
indicate that the ACF/PACF values are insignificant, suggesting that the residuals are uncorrelated. The Ljung and Box test evaluated
the fit of the developed instruments, where the null hypothesis of the test statistic was not rejected, as shown in Table 4. Moreover, the
Kolmogorov-Smirnov (KS) test confirmed the hypothesis of the Gaussian white noise properties for the error term, as the normality null
hypothesis was not rejected based on the test’s P-values, all of which were above 0.15 (e.g., t-value<1.46).
The developed macro forecast crash prediction models were then applied for out-of-sample crash prediction (see Fig. 9). The
prediction accuracy of the models was evaluated based on the lower MAPE criterion, with the results reported in the last row of
Table 4.
In summary, the results suggest that accident frequency increases when the mean speed declines, particularly in congested traffic,
where elevated interactions among vehicles restrict the freedom of maneuver and relative distance. Therefore, the time series
regression models indicate an inverse relationship between the crash frequency and mean speed. Recent real-time crash prediction
research also supports this inverse association between crash occurrence and mean speed [74–77]. On the other hand, crash severity
declines in congested traffic because lower speed triggers fewer severe accidents [78,79]. The results demonstrate a positive associ
ation between mean speed and fatal crashes, which is in accordance with previous findings [16,80]. Additionally, the SARIMA model
outperforms the ARIMA method when seasonality exists in the time series. This study criticizes the usefulness of the integer-valued
extensions of the ARIMA model, known as GLARMA models, as they are not explicitly designed to model the highly seasonal crash
counts. Finally, the study aggregated daily traffic and crash data based on the Persian calendar monthly, as the travel pattern of
Iranians follows the ceremonies and holidays in their specific culture. Aggregating data on another basis would partially hide the
seasonal nature of accidents time series, which previous studies on time trends of road traffic fatalities in Iran have overlooked [36,39,
81,82].
Table 4
Accident prediction models for log-transformed monthly total accidents in Iran.
ARIMA ARIMAX SARIMA
Coef. t-stat Coef. t-stat Coef. t-stat

Explanatory variables
Non-Seasonal AR (1) 0.815 10.86 0.710 7.130 0.798 11.286
Seasonal AR (1) – – – – 0.260 1.503
Constant 9.289 112.44 14.0495 9.82 9.273 93.776
Monthly traffic flow (* 105 ) – – 0.002 10.30 – –
Mean speed (Km/ ) – – − 0.071 − 3.88 – –
hr
Descriptive statistics
Length of series 55 55 55
Log-likelihood 36.97 70.77 38.15
Accuracy (within sample)

Bayesian information criterion (BIC) − 1.126 − 2.209 − 1.096
Mean absolute % error (MAPE) 1.062 0.560 2.370
Mean absolute deviation (MAD) 0.099 0.052 0.095
Root mean square error (RMSE) 0.122 0.066 0.119
R2 0.664 0.901 0.683
Diagnosis Check of Residuals

Ljung and Box (LB) (K = 14) 9.94 1.15 6.29 0.76 9.52 1.08
Kolmogorov-Smirnov (KS) 0.073 <1.46 0.097 <1.46 0.116 1.85
Forecast Accuracy
Out of sample MAPE (%) 2.26 1.97 2.87
12
H. Nassiri et al.
13
Heliyon 9 (2023) e14481

Fig. 8. The Residuals ACF and PACF of ARIMA and ARIMAX Models (total crashes). Panels (a) and (b) represent the ACF plots of ARIMA and ARIMAX reseduals, respectively; While panels (c) and (d)
indicate PACF plots of ARIMA and ARIMAX reseduals, respectively.
Fig. 9. Observed vs. forecasted values of monthly total accidents in Iran.
5. Conclusions
5.1. Conclusions
This study aimed to employ the classical Box-Jenkins methodology to predict the time trends of monthly total and fatal crash counts
of rural Iran highways, incorporating the dynamic effects of time-varying traffic characteristics. To this end, a range of univariate and
multivariate econometric models was developed, namely ARIMA, ARIMAX, SARIMA, and SARIMAX models. Besides, the dissimilar
impacts of mean speed on fatal and total accidents were explored using the aforementioned models while controlling for traffic
exposure. The estimated models’ in-sample accuracy was compared employing various performance metrics, and the proposed in
struments’ forecast performance was evaluated based on the mean absolute percent error (MAPE) criterion.
The following were made in this research: in the monthly fatal crash dataset, the most accurate crash macro forecast tools were the
SARIMAX, SARIMA, and ARIMA models, respectively. While the ARIMA and ARIMAX models represented the best fits in predicting the
total accidents as the seasonal effects were weaker in this case. Besides, the mean speed indicated a linear positive association with
fatal accidents, controlling for traffic exposure. On the other hand, it was inversely related to monthly total crash frequency.
Conclusively, the results illustrate that the multivariate ARIMA and SARIMA models clearly outperform their univariate counterparts.
So, incorporating the dynamic effects of time-varying exogenous predictors on accidents, especially traffic conditions, would properly
upgrade the forecast adequacy of the univariate time series regressions. Furthermore, the results suggest that it is essential to take into
account seasonal dependencies when predicting crash time series data that exhibits strong seasonal patterns. Consequently, the
SARIMA and SARIMAX models outperform their non-seasonal counterparts in the seasonal datasets. Moreover, the study illustrates the
dissimilar impacts of average speed on total and severe crashes. Considering the contradictory results in the literature, the subsequent
studies should focus on explaining the reasons behind these discrepancies.
5.2. Implications
The models developed in this study could aid policymakers in predicting future accident trends and evaluating the effectiveness of
newly implemented interventions. The multivariate models, which incorporate speed and traffic exposure, can effectively identify the
efficiency of policies even in the presence of external factors that disrupt common trends, such as pandemics. The study found that
increasing the monthly mean speed by 1 km/h leads to a monthly increase of about two fatal accidents on rural highways in Iran.
In low-income and middle-income countries (LMICs), enforcing regulations and penalties for speeding offenses may be less
effective than in high-income countries due to the negligible costs of such offenses. In many high-income countries, heavy vehicles are
equipped with mechanisms to prevent violation of speed limits; however, these devices are often met with resistance in LMICs for
economic reasons or disabled by operators if installed [83]. Drivers in LMICs are often under pressure to speed due to timetables and
incentives based on ticket receipts. Studies suggest that offering an insurance incentive for installing speed enforcement devices in
vehicles could be an effective policy for reducing speed-related accidents in Iran [84]. Additionally, highways in high-income countries
are designed with a forgiving design concept, which can diminish the severity and probability of traffic crashes even if individuals
exceed speed limits or make other mistakes while driving. Therefore, the results of this study raise concerns about speed-related
14
accidents in Iran and highlight the need for policymakers to increase the cost of speed violations, implement modern enforcement
techniques, and upgrade road design standards.
The study also recommends the use of the SARIMAX method as a useful tool for quantifying the explicit impacts of distinct
influential factors on crashes while accounting for underlying serial correlation, seasonality, and dynamic effects of other contributing
factors. Despite its potential benefits, this robust econometric tool has been largely overlooked in traffic safety studies that use
aggregate crash count time series analysis. Finally, the study highlights that accident frequency in Iran closely follows the traffic flow
time trend and the seasonal pattern of recreational travel. Therefore, future interventions aimed at reducing accidents in Iran will need
to include measures to control unnecessary intercity travel, considering the low price of fuel in the country.
5.3. Limitations and recommendations
Containing environmental and behavioral regressors of accidents as exogenous variables in a SARIMAX framework is recom
mended for future studies. Although, such information may not be available at the national level in developing countries. Comparing
the forecast performance of the developed models with advanced modeling techniques such as recurrent neural networks and integer-
valued time series regression methods should also be influential. Although, the better performance of neural networks seems ques
tionable for linear dependence time trends.
Author contribution statement
Habibollah Nassiri: Conceived and designed the experiments; Analyzed and interpreted the data; Wrote the paper. </p>
Seyed Iman Mohammadpour: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the
data; Contributed reagents, materials, analysis tools or data; Wrote the paper. </p>
Mohammad Dahaghin: Contributed reagents, materials, analysis tools or data. </p>
Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Data availability statement
Data will be made available on request.
Declaration of interest’s statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.
References
[1] World Health Organization (WHO), Global Status Report on Road Safety 2018, World Health Organization, Geneva, 2018. Available from: https://www.who.
int/publications/i/item/9789241565684.
[2] United Nations Department for Safety and Security (UNDSS), Road safety strategy booklet [cited 27 Jan 2023]. In: The United Nations (UN) website [Internet].
Available from: https://www.un.org/sites/un2.un.org/files/2020/09/road_safety_strategy_booklet.pdf, 2020 Sep.
[3] P. Anila, Road Safety-Considerations in Support of the 2030 Agenda for Sustainable Development, United Nations publication issued by the United Nations
Conference on Trade and Development, Geneva, 2017. Available from: https://unctad.org/system/files/official-document/dtltlb2017d4_en.pdf.
[4] M.A. Quddus, Time series count data models: an empirical application to traffic accidents, Sep. 1, Accid. Anal. Prev. 40 (5) (2008) 1732–1741.
[5] Y. Zhang, Y. Zou, L. Wu, J. Tang, M. Muneeb Abid, Exploring the application of the linear Poisson autoregressive model for analyzing the dynamic impact of
traffic laws on fatal traffic accident frequency, 2020 Oct 9, J. Adv. Transport. (2020) 1–9.
[6] F. Ye, T.P. Garcia, M. Pourahmadi, D. Lord, Extension of negative binomial GARCH model: analyzing effects of gasoline price and miles traveled on fatal crashes
involving intoxicated drivers in Texas, Transport. Res. Rec. 2279 (1) (2012 Jan) 31–39.
[7] Quddus M. Time-series Regression Models for Analysing Transport Safety Data. InSafe Mobility: Challenges, Methodology and Solutions 2018 Apr 18. Emerald
Publishing Limited. p. 279–296.
[8] Y. Qian, X. Zhang, G. Fei, Q. Sun, X. Li, L. Stallones, H. Xiang, Forecasting deaths of road traffic injuries in China using an artificial neural network, Aug 17,
Traffic Inj. Prev. 21 (6) (2020) 407–412.
[9] W. Vanlaar, R. Robertson, K. Marcoux, An evaluation of Winnipeg’s photo enforcement safety program: results of time series analyses and an intersection
camera experiment, Accid. Anal. Prev. 62 (2014 Jan 1) 238–247.
[10] C.C. Ihueze, U.O. Onwurah, Road traffic accidents prediction modelling: an analysis of Anambra State, Nigeria, Accid. Anal. Prev. 112 (2018 Mar 1) 21–29.
[11] Y. Chen, S. Tjandra, Daily collision prediction with SARIMAX and generalized linear models on the basis of temporal and weather variables, Transport. Res. Rec.
2432 (1) (2014 Jan) 26–36.
[12] R. Gupta, G. Pandey, S.K. Pal, Comparative analysis of epidemiological models for COVID-19 pandemic predictions, Biostatistics & Epidemiology 5 (1) (2021
Jan 2) 69–91.
[13] PA Sánchez-Sánchez, JR García-González, LH Coronell, Encountered Problems of Time Series with Neural Networks: Models and Architectures, Recent Trends in
Artificial Neural Networks - from Training to Prediction 2020 Mar 4, IntechOpen, 2020, pp. 21–38.
[14] M. Chan, M.R. Bloomberg, Reducing speed to save lives [cited 27 January 2023]. In: World Health Organization Newsroom, Commentaries [Internet]. Available
from: https://www.who.int/news-room/commentaries/detail/reducing-speed-to-save-lives, 2017 May 9.
[15] L. Aarts, I. Van Schagen, Driving speed and the risk of road crashes: a review, Accid. Anal. Prev. 38 (2) (2006 Mar 1) 215–224.
[16] E. Hauer, Speed and safety, Transport. Res. Rec. 2103 (1) (2009) 10–17.
15
[17] A. Baruya, Speed-accident relationships on European roads, in: 9th International Conference on Road Safety in Europe, Bergisch Gladbach, Germany,
September, 1998.
[18] N.J. Garber, R. Gadiraju, Factors affecting speed variance and its influence on accidents, Transport. Res. Rec. 1213 (1989) 64–71.
[19] B.B. Mehrabani, B. Mirbaha, Evaluating the relationship between operating speed and collision frequency of rural multilane highways based on geometric and
roadside features, Civil Engineering Journal 4 (3) (2018 Mar) 609.
[20] X. Zhang, Y.A. Hongyan, H.U. Guoqing, C.U. Mengjing, G.U. Yue, H. Xiang, Basic characteristics of road traffic deaths in China, Iran. J. Public Health 42 (1)
(2013) 7.
[21] X. Zhang, Y. Pang, M. Cui, L. Stallones, H. Xiang, Forecasting mortality of road traffic injuries in China using seasonal autoregressive integrated moving average
model, Ann. Epidemiol. 25 (2) (2015 Feb 1) 101–106.
[22] M.A. Abdel-Aty, A.E. Radwan, Modeling traffic accident occurrence and involvement, Accid. Anal. Prev. 32 (5) (2000 Sep 1) 633–642.
[23] D. Lord, S.P. Washington, J.N. Ivan, Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory,
Accid. Anal. Prev. 37 (1) (2005 Jan 1) 35–46.
[24] R.B. Noland, M.A. Quddus, A spatially disaggregate analysis of road casualties in England, Accid. Anal. Prev. 36 (6) (2004 Nov 1) 973–984.
[25] R. Noland, M. Quddus, W. Ochieng, The effect of the congestion charge on traffic casualties in London: an intervention analysis, in: Transportation Research
Board 85st Annual Meeting, Washington, DC, USA, January, 2006.
[26] C. Antoniou, G. Yannis, State-space based analysis and forecasting of macroscopic road safety trends in Greece, Accid. Anal. Prev. 60 (2013 Nov 1) 268–276.
[27] G. Chi, A.G. Cosby, M.A. Quddus, P.A. Gilbert, D. Levinson, Gasoline prices and traffic safety in Mississippi, J. Saf. Res. 41 (6) (2010 Dec 1) 493–500.
[28] T. Brijs, D. Karlis, G. Wets, Studying the effect of weather conditions on daily crash counts using a discrete time-series model, Accid. Anal. Prev. 40 (3) (2008
May 1) 1180–1190.
[29] G.E. Box, G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holdan-Day, San Francisco, 1970.
[30] K.J. Linthicum, A. Anyamba, C.J. Tucker, P.W. Kelley, M.F. Myers, C.J. Peters, Climate and satellite indicators to forecast Rift Valley fever epidemics in Kenya,
Science 285 (5426) (1999 Jul 16) 397–400.
[31] G.E. Box, G.M. Jenkins, G.C. Reinsel, G.M. Ljung, Time Series Analysis: Forecasting and Control, John Wiley & Sons, 2015 May 29.
[32] A. Blázquez-García, A. Conde, A. Milo, R. Sánchez, I. Barrio, Short-term office building elevator energy consumption forecast using SARIMA, Journal of Building
Performance Simulation 13 (1) (2020 Jan 2) 69–78.
[33] A.H.C. Wai, S.Y. Seng, J.L.W. Fei, Fatality Involving Road Accidents in Malaysia: a comparison between three statistical models, in: Proceedings of the 2019 2nd
International Conference on Mathematics and Statistics, Prague, Czech Republic, July. 2019.
[34] C. Melchior, R.R. Zanini, R.R. Guerra, D.A. Rockenbach, Forecasting Brazilian mortality rates due to occupational accidents using autoregressive moving
average approaches, Int. J. Forecast. 37 (2) (2021 Apr 1) 825–837.
[35] A. Bahadorimonfared, H. Soori, Y. Mehrabi, A. Delpisheh, A. Esmaili, M. Salehi, M. Bakhtiyari, Trends of fatal road traffic injuries in Iran, 2004–2011, PLoS One
8 (5) (2013), e65198. May 28.
[36] M. Delavary Foroutaghe, A. Mohammadzadeh Moghaddam, V. Fakoor, Time trends in gender-specific incidence rates of road traffic injuries in Iran, PLoS One 14
(5) (2019 May 9), e0216462.
[37] R. Bergel-Hayat, M. Debbarh, C. Antoniou, G. Yannis, Explaining the road accident risk: weather effects, Accid. Anal. Prev. 60 (2013 Nov 1) 456–465.
[38] Q. Wu, T. Chen, P.A. Byrne, J. Larsen, Y. Elzohairy, General deterrence of drinking and driving: an evaluation of the effectiveness of three Ontario
countermeasures, Int. J. Eng. Manag. Econ. 5 (3–4) (2015) 209–223.
[39] M. Delavary Foroutaghe, A. Mohammadzadeh Moghaddam, V. Fakoor, Impact of law enforcement and increased traffic fines policy on road traffic fatality,
injuries and offenses in Iran: interrupted time series analysis, PLoS One 15 (4) (2020 Apr 17), e0231182.
[40] J.I. Nazif-Munoz, Y. Oulhote, M.C. Ouimet, The association between legalization of cannabis use and traffic deaths in Uruguay, Addiction 115 (9) (2020 Sep)
1697–1706.
[41] T.H. Law, R.R. Umar, S.V. Wong, The Malaysian government’s road accident death reduction target for year 2010, IATSS Res. 29 (1) (2005 Jan 1) 42–49.
[42] C. Li, J. Chen, Traffic accident macro forecast based on ARIMAX model, 2009 Apr 11, in: International Conference on Measuring Technology and Mechatronics
Automation vol. 3, IEEE, 2009, pp. 633–636.
[43] C. Antoniou, G. Yannis, E. Papadimitriou, S. Lassarre, Relating traffic fatalities to GDP in Europe on the long term, Accid. Anal. Prev. 92 (2016 Jul 1) 89–96.
[44] Z. Zhang, W. Yang, S. Wushour, Traffic accident prediction based on LSTM-GBRT model, 2020 Mar 5, J. Control Sci. Eng. (2020) 1, 0.
[45] D.H. Solomon, Accidents on Main Rural Highways Related to Speed, Driver, and Vehicle, US Department of Transportation, Federal Highway Administration,
1964.
[46] J.A. Cirillo, Interstate system accident research study II, interim report II, Public Roads 35 (3) (1968) 71–75.
[47] D.J. Finch, P. Kompfner, C.R. Lockwood, G. Maycock, Speed, Speed Limits and Crashes. Crowthorne, Berkshire, vol. 58, Transport Research Laboratory TRL,
Project Report PR, 1994.
[48] C.N. Kloeden, A.J. McLean, V.M. Moore, G. Ponte, Travelling Speed and the Risk of Crash Involvement Volume 2-case and Reconstruction Details, NHMRC Road
Accident Research Unit, The University of Adelaide, Adelaide, 1997.
[49] C.N. Kloeden, A.J. McLean, G. Glonek, (Road Accident Research Unit, the University of Adelaide, Adelaide, SA). Reanalysis of Travelling Speed and the Risk of
Crash Involvement in Adelaide South Australia. Canberra (ACT), Department of Transport and Regional Services, Australian Transport Safety Bureau, 2002 Apr.
Report No.: CR 207.
[50] M.C. Taylor, A. Baruya, J.V. Kennedy, The Relationship between Speed and Accidents on Rural Single Carriageway Roads. Crowthorne: Transport Research
Laboratory, Road Safety Division, Department of the Environment, Transport and the Regions, 2002 Mar. Report No.: TRL511.
[51] G. Nilsson, Traffic Safety Dimensions and the Power Model to Describe the Effect of Speed on Safety, Ph.D. Thesis, Lund Institute of Technology, Department of
Technology and Society, Traffic Engineering, 2004. Available from: https://www.motor-talk.de/forum/aktion/Attachment.html?attachmentId=689000.
[52] M.C. Taylor, D. Lynam, A. Baruya, The Effect of Drivers’ Speed on the Frequency of Accidents, Crowthorne: Transport Research Laboratory, 2000 Mar. Report
No.: TRL421. Sponsored by Road Safety Division, Department of the Environment, Transport and the Regions.
[53] J. Stuster, Aggressive Driving Enforcement: Evaluations of Two Demonstration Programs, US Department of Transportation, National Highway Traffic Safety
Administration, 2004.
[54] C.A. Lave, Speeding, coordination, and the 55 mph limit, Am. Econ. Rev. 75 (5) (1985 Dec 1) 1159–1164.
[55] K.M. Kockelman, J. Ma, Freeway speeds and speed variations preceding crashes, within and across lanes, J. Transport. Res. Forum 46 (1) (2007) 43–61.
[56] M. Quddus, Exploring the relationship between average speed, speed variation, and accident rates using spatial statistical models and GIS, J. Transport. Saf.
Secur. 5 (1) (2013 Mar 1) 27–45.
[57] D.T. Levy, P. Asch, Speeding, coordination, and the 55-mph limit: comment, Am. Econ. Rev. 79 (4) (1989 Sep 1) 913–915.
[58] M. Tanishita, B. Van Wee, Impact of vehicle speeds and changes in mean speeds on per vehicle-kilometer traffic accident rates in Japan, IATSS Res. 41 (3) (2017
Oct 1) 107–112.
[59] Y.J. Yasin, M. Grivna, F.M. Abu-Zidan, Global impact of COVID-19 pandemic on road traffic collisions, World J. Emerg. Surg. 16 (2021 Dec) 1–4.
[60] X. Dong, K. Xie, H. Yang, How did COVID-19 impact driving behaviors and crash Severity? A multigroup structural equation modeling, Accid. Anal. Prev. 172
(2022 Jul 1), 106687.
[61] P. Choudhary, M. Imprialou, N.R. Velaga, A. Choudhary, Impacts of speed variations on freeway crashes by severity and vehicle type, Accid. Anal. Prev. 121
(2018 Dec 1) 213–222.
[62] L.Y. Chang, Analysis of freeway accident frequencies: negative binomial regression versus artificial neural network, Saf. Sci. 43 (8) (2005 Oct 1) 541–557.
[63] T.F. Golob, W. Recker, Y. Pavlis, Probabilistic models of freeway safety performance using traffic flow data as predictors, Saf. Sci. 46 (9) (2008 Nov 1)
1306–1333.
[64] R.P. Roess, E.S. Prassas, W.R. McShane, Traffic Engineering, Pearson/Prentice Hall, 2004.
16
[65] R. Adhikari, R.K. Agrawal, An Introductory Study on Time Series Modeling and Forecasting, LAP LAMBERT Academic Publishing, 2013.
[66] G.E. Box, G.C. Tiao, Intervention analysis with applications to economic and environmental problems, J. Am. Stat. Assoc. 70 (349) (1975 Mar 1) 70–79.
[67] A. Pankratz, Forecasting with Dynamic Regression Models, John Wiley & Sons, 2012 Jan 20.
[68] K.W. Hipel, A.I. McLeod, Time Series Modelling of Water Resources and Environmental Systems, Elsevier, 1994 Apr 7.
[69] D.A. Dickey, W.A. Fuller, Distribution of the estimators for autoregressive time series with a unit root, J. Am. Stat. Assoc. 74 (366a) (1979 Jun 1) 427–431.
[70] W.A. Broock, J.A. Scheinkman, W.D. Dechert, B. LeBaron, A test for independence based on the correlation dimension, Econom. Rev. 15 (3) (1996 Jan 1)
197–235.
[71] J. Belaire-Franch, D. Contreras, How to compute the BDS test: a software comparison, J. Appl. Econom. 17 (2002) 691–699.
[72] Kanzler L. Very fast and correctly sized estimation of the BDS statistic [Internet]. Social Science Research Network (SSRN) website; 1999 Feb [cited 2023 Mar
11]. Available from: https://dx.doi.org/10.2139/ssrn.151669.
[73] G.M. Ljung, G.E. Box, On a measure of lack of fit in time series models, Biometrika 65 (2) (1978) 297–303.
[74] M. Abdel-Aty, H.M. Hassan, M. Ahmed, Real-time analysis of visibility related crashes: can loop detector and AVI data predict them equally?, in: Transportation
Research Board 91st Annual Meeting Washington, DC, USA, January, 2012.
[75] R. Yu, M. Abdel-Aty, Using hierarchical Bayesian binary probit models to analyze crash injury severity on high speed facilities with real-time traffic data, Accid.
Anal. Prev. 62 (2014 Jan 1) 161–167.
[76] L. Wang, M. Abdel-Aty, Q. Shi, J. Park, Real-time crash prediction for expressway weaving segments, Dec 1, Transport. Res. C Emerg. Technol. 61 (2015) 1, 0.
[77] C. Xu, P. Liu, W. Wang, Evaluation of the predictability of real-time crash risk models, Accid. Anal. Prev. 94 (2016 Sep 1) 207–215.
[78] D. Shefer, P. Rietveld, Congestion and safety on highways: towards an analytical model, Urban Stud. 34 (4) (1997 Apr) 679–692.
[79] C. Xu, A.P. Tarko, W. Wang, P. Liu, Predicting crash likelihood and severity on freeways with real-time loop detector data, Accid. Anal. Prev. 57 (2013 Aug 1)
30–39.
[80] A. Theofilatos, G. Yannis, A review of the effect of traffic and weather characteristics on road safety, Accid. Anal. Prev. 72 (2014 Nov 1) 244–256.
[81] S. Yousefzadeh-Chabok, F. Ranjbar-Taklimie, R. Malekpouri, A. Razzaghi, A time series model for assessing the trend and forecasting the road traffic accident
mortality, Archives of Trauma Research 5 (3) (2016 Sep).
[82] M. Parvareh, A. Karimi, S. Rezaei, A. Woldemichael, S. Nili, B. Nouri, N.E. Nasab, Assessment and prediction of road accident injuries trend using time-series
models in Kurdistan, Burns & Trauma. (2018 Dec 1) 6.
[83] F. Wegman, P. Elsenaar, Sustainable Solutions to Improve Road Safety in The Netherlands, SWOV Institute for Road Safety Research, Leidschendam, 1997.
[84] S. Sahebi, H. Nassiri, B. Van Wee, Y. Araghi, Incorporating car owner preferences for the introduction of economic incentives for speed limit enforcement,
Transport. Res. F Traffic Psychol. Behav. 64 (2019 Jul 1) 509–521.
17

00 1-S2.0-S2405844023016882-Main

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

00 1-S2.0-S2405844023016882-Main

Uploaded by

Copyright:

Available Formats

Heliyon 9 (2023) e14481

Contents lists available at ScienceDirect

Forecasting time trend of road traffic crashes in Iran using the

1.1. Research motivation and objectives

2.1. Methodologies in crash count time series analysis

2.1.1. Integer-valued statistical methods

2.1.2. Real-valued statistical methods vs. machine learning algorithms

2.1.3. Multivariate analysis of crash time series

2.2. Speed-crash association

without controlling for traffic exposure.

3. Materials and methods

3.1. Data source and data distribution

indicated in Eqs. (2) and (3):

4. Results and discussion

Monthly fatal crashes 0.320 0.938 0.993 0.999 0.000

Coef. T-stat Coef. T-stat Coef. T-stat Coef. T-stat

Accuracy (within sample)

Diagnosis Check of Residuals

Heliyon 9 (2023) e14481

Fig. 6. Observed vs. forecasted values of monthly fatal accidents in Iran.

coefficients being statistically significant (see Fig. 7).

Coef. t-stat Coef. t-stat Coef. t-stat

Accuracy (within sample)

Diagnosis Check of Residuals

Heliyon 9 (2023) e14481

Fig. 9. Observed vs. forecasted values of monthly total accidents in Iran.

5.3. Limitations and recommendations

Author contribution statement

Data availability statement

Data will be made available on request.

Declaration of interest’s statement

You might also like