Professional Documents
Culture Documents
net/publication/329191384
CITATIONS READS
0 152
2 authors:
All content following this page was uploaded by Ybralem Bugusa on 26 November 2018.
Abstract- Intelligent Transportation System (ITS) is advanced Developing a model that predicts the real-timeroad safety risk
transportation framework for the more quick-wittedutilization of (driving outcome) for ADAS are important in creating a safer
transportation system. The setup consists of an arrangement of driving environment for drivers, passengers, and pedestrians.
coordinates propelled advancements which utilize hardware,
For all of this, it is important to have a real-time data of the
computers, correspondence, sensor, and cameras to upraise
travelers with relevant information for the safety and competence traffic safety factors,i.e., vehicle and other infrastructure
of transportation system. Advanced Driver Assistance System device information with regard time, location and other
(ADAS) is a piece of ITS, that helps drivers in the driving relevant information in association with the main safety
procedure to empower safe condition. In such a system,to factors of driver behavior, roadway information, and
guarantee the wellbeing of all the street clients, it is important to
invironmental condition data. The aim of this paper is
upgrade the execution of Advanced Driver Assistance System
(ADAS) for lower classes of vehicles. Real-time driving safety risk developing model that predicts the real-time driving outcome
prediction is a primary component of an ADAS.In this paper, we of driver during driving situation that can be updated to
are going to use a dataset from Secondary Strategic Highway individual driver information. It will use Strategic Highway
Research Program (SHRP2) Naturalistic Driving Study. The Research Program 2(SHRP2) Naturalistic Driving Study
elastic net regularized multinomial logistic regression model is (NDS) data to make a predictive model of real-time traffic risk
going to be utilized to predict the traffic safety risk that can be
for ADAS. The model uses some of the data above by
updated to the individual driver based on specific variables. The
prediction performance of the model will be improved by using aggregating the data of individual trip.
re-sampling techniques.
To classify the driving outcome of the driver using multiple
Keywords: traffic safety, driving outcome, naturalistic driving traffic safety factor, the elastic net regularized multinomial
behavior, multinomial logistic regression, elastic net, re-sampling, logistic regression [1] will be used due to its ability work with
advanced driver assistance system. correlated variable and bias/variance trade-off.
655
International Journal of Pure and Applied Mathematics Special Issue
To overcome this distraction many analysis and crash risk The second Strategic Highway Research Program (SHRP2) is
level prediction model have been done by evaluating the crash research method studying the largest and most comprehensive
casual factors data to provide vital information for effective Naturalistic Driving Study (NDS) ever undertaken linked with
transportation policy, vehicle design, driver awareness and Roadway Information Database (RID). Initially, the 100-Car
potential countermeasures. However, there have been certain Naturalistic Driving Study collect the large-scale NDS data in
criticized in data observation studies of crash casual factors the US which is first instrumented-vehicle study funded by the
due to poor quality data; absence of available and valuable National Highway Traffic Safety Administration (NHTSA)
exposure measures linked to the observation; the and the Virginia Department of Transportation (VDOT). Then
incomparability of self-revealed safety-related events; and, the it is followed by a larger and more wide-ranging study, the
difficulty in evaluating accountability for the safety-related Strategic Highway Research Program 2 (SHRP2) conducted
events [2]. from 2006 to 2015 holds data about driver behavior, road,
vehicle, and weather and traffic conditions with age 16-68 in
[5] Paper uses traffic data from loop detector to develop the event of either crash or near-crash.
prediction models of real-time crash risk for different segment
types and states of traffic flow on Korean expressways. They The NDS and RID data allow researchers to associate driving
address the result difference in occupancy and level of behavior to specific roadway conditions for better analysis and
overcrowding play a significant role in the ramp vicinities and traffic risk prediction. Detail report of SHRP2 NDS data
level of congestion difference between the downstream and instruments, data dictionary, all variables of the traffic safety
upstream leads to a hazardous condition. However, it has the factors and how to query and filter states in [11] [18-19]
papers.
656
International Journal of Pure and Applied Mathematics Special Issue
In this study, we are going to use the SHRP2 NDS particular Authors [20] investigate the potential contribution of three
variable of the safety factors,i.e., driver behavior, road, predictors (i.e., driver anger, impulsiveness, and
vehicle, and whether information due to several reasons [2]: aggressiveness) of aggressive and transgressive manners on
the road using multiple regression analysis models.
Have advanced data collection techniques for
different safety factor The [6] develop a model for detecting the driver‟s
No information biases and involvementlikelihood in secondary tasks from distinctive
Quality information primarily to identify the driver attributes of driving behavior (i.e., lateral acceleration,
fault in near-crash/crash events. longitudinal acceleration,throttle position, and yaw rate).
Using supervised feed-forward Artificial Neural Network
b. Driver behavior (ANN) finds that the Selected driving performance attributes
Recently, many kinds of researchwork in the prediction of were useful in detecting the associated secondary tasks with
real-time risk to prevent the sudden loss of human life by car driving behavior
and other vehicle and acknowledged that driver behavior plays
a significant role in driving risk [1]. Itpresents that nearly 90% In [14] the author uses driver behavior and physiological
of the light-vehicle crashes included a similar sort of human features for crash prediction to advanced vehicle collision
error such as impaired conditions, accidental mistakes, and avoidance system using supervised learning model.
risky driving behavior and driver mistake is additionally a
Several studies use different data set to explore the
center explanation behind roughly 87% of every single
relationship between the involvement and the traffic safety-
business vehicle crashes. Since factors associated with
related events and considers input data used for their work.
individual driver is different [10], developing models that can
However, due to several reasons they aware some
be customizable to the individual driver is essential to making
improvements on data sample usage, some of them they don‟t
good countermeasures to traffic safety risk. Driving behavior
useweather-related data and lane-based traffic data [5]. Others
can be more influential when it is associated with roadway,
they may not evaluate variables that influence the performance
vehicle and weather condition data.
of prediction [20] (i.e., years of driving experience is not
In [5] paper the authors develop real-time crash risk prediction considered).
models for different segment types and traffic flow states on
In this study, we are going to use the driver behavior as a
Korean expressways using Genetic programming technique.
major contributor to safety risk with external factors that may
influence it,i.e., roadway information weather condition, etc.
657
International Journal of Pure and Applied Mathematics Special Issue
of SHRP2 NDS data which is the vast research method ever Many studies have been conducted by using driver behavior
conducted. It contains data from vehicles, road, and weather with different factors to predict
condition data about driver behavior. We will consider some safety-related traffic problems. There are also studies which
specific variables from all this data that are important for the donot consider other factors rather than driver behavior.
predictor to classify the real-time driving the outcome of the
individual driver.
[1] Driver behavior along with roadside information Elastic net 68.5%
Paper [3], [6], [9] and [10]‟s has considered the additional Paper [1] uses weather and vehicle information, in addition to
factors like vehicle information and other factors. But, they driver behavior and roadside information which can be more
have used a limited number of sample data which can affect accurate in identifying safety-related traffic problems.
the accuracy of the system. In paper [14], the driver data was
collected from simulator driving. In this case, the accuracy can
be less when it is compared to thereal driving situation.
658
International Journal of Pure and Applied Mathematics Special Issue
To improve the accuracy of [1], we will usere-sampling consists of 19,600 baseline events with 27 variables which
technique. includes detailed epochs, driver state, and driving environment
information derived from video reduction. Event video
III. PROPOSED APPROACH
reduced data consists of 68 crashes or 760 near-crashes with
57 variables which includes detailed event, driver state, and
a. Data preparation and sampling
driving environment information derived from video
reduction. By joining the dataset into Baseline, Near-crash and
For further pre-processing of the SHRP2NDS data set, we will
Crash events, we have selected total 21 important variables
use Data munging and wrangle for cleaning and punctuation
which includes driver state, and driving environment
correction. It also helps to check the structure and value of the
information. Hence the number of Baseline event data is
data according to the data dictionary; thiscontributes to map
extremely large. Therefore, we have applied random sampling
the data into R. R is the tool that we will use in this study. To
techniques on major class (i.e. baseline event data) to reduce
handle the missing value, we are going to use Random Forest
the data size required to achieve better classification.After data
package to fill the missing value which is the fastest approach
cleaning, 3353 total data are left which consists of 2595-
[26].Multivariate Imputation by Chained Equations (MICE) is
Baseline (B), 969-NearCrash (N) and 62- Crash (C)Event
an expensive method used to handle the missing value [27].
Datasets.
We have collected the data from baseline video reduced data
and Event video reduced data.Baseline video reduced dataset
3000
2595
2500
Number of Data
2000
1500
1000 696
500
62
0
B N C
Events
Various applications can overcome the problem of imbalanced or drawing haphazardly with substitution from a set of data
data between classes. The rightness of prediction can be points or original data.
improved when it handle the imbalanced data. The occurrence
of imbalanced data is generalized to one-sided towards the In this study, we are going to use bootstrapping method to
larger part class. In any case, in a few applications, the increase the accuracy of the classifier and overcome such
accuracy of prediction in minority class is additionally problem.
noteworthy as much as in majority class. The accuracy in
minority class will be increased by using re-sampling b. Driving outcome classification
techniques. Such as under-sampling, over-sampling and
bootstrapping are the most used techniques. Under-sampling The driving outcome or risk events of SHRP2 NDS are into
and over-sampling are re-sampling based methods which don't three crash, near-crashand baseline or normal state. Near-crash
consider how the data are scattered in the space [25]. is conflict situation requiring a rapid and careful maneuver to
avoid an accident.
Bootstrapping is statistical resampling strategy to appraise the
accuracy of the sample by utilizing subsets of accessible data The author [1] further decomposes the events into 5 class or
outcome to better create a safe environment for pedestrians as
659
International Journal of Pure and Applied Mathematics Special Issue
well as driver based on the unsafe driver behavior and decomposed the dataset into Baseline1, Baseline2, Baseline3,
secondary task that the driver involved. Accordingly, we have Near-crash and Crash.
TABLE 2: THE DECOMPOSED OUTCOME AND THEIR MEANING
No. Outcome If it is
5 Crash Crash
3000
2595
2500
Number of Data
2000
1500
1000
696
500
62
0
B N C
Events
To create safer environment which is especially usefulfor driving slowly in relation to other vehicle) and secondary
young drivers as well as drivers with limited driving ability. task(i.e. texting, cellphone call, eye glance) that the driver
The baseline event is divided into three based on the driver involves during driving situation.
involvement in unsafe driving behavior (i.e. improper turn,
660
International Journal of Pure and Applied Mathematics Special Issue
This can be done by selecting the prevailing unsafe driving The following figure illustrates the number of each driving
behavior and secondary task from the dataset. The outcomes after classification.
decomposed one is show in table2, two colors are added to
represent the level of risk baseline1 is “green” as it
is,baseline2 is “green-yellow,” and baseline3 is “khaki.” The
remaining near-crash and crash event color is as it is. For
more,we have shown in below figure 5.
Driving Outcome
1600
1344
1400
1200
1000
800 711 696
540
600
400
200 62
0
Baseline1 Baseline2 Baseline3 Near-Crash Crash
661
International Journal of Pure and Applied Mathematics Special Issue
In this paper, we propose weighted regularized multinomial In reverse, the Ridge regression is L2 regularization solves the
logistic regression (MLR) which is precisely known elastic net multicollinearity by shrinking the value of coefficients but
[23] for variable selection and classification. doesn‟t reach to zero [23-24].
Multinomial Logistic Regression is the type of logistic Elastic net is a group of Lasso and Ridge Logistic Regression.
regression technique used to evaluate the parameters of a While we are working with high dimensional, correlated
multivariate explanatory model used when the dependent features, other reasons that improve the accuracy and
variable is binary (0/1, true/false and yes/no) and the performance of prediction are:
independent variable is continuous or categorical [8]. Reasons why we will use elastic net:
It is the type of logistic regression with more than two Integrated variable selection,
outcomes but not ordinal. Hence we have more than two Capability to work with bias/variance trade-off,
outcomes we will use the multinomial logistic regression. Cost-sensitive classification,
However, it's hard to work with a highly correlated Capacity to handle highly correlated variables,
independent variable to overcome it will collaborate with a Ability to deal with high-dimensional problems, and
regularized method that is weighted regularized multinomial Work with both numerical and categorical variables
logistic regression. Elastic net is a merging of L1 and L2
regularization. Problems in prediction can happen due to either biased or
variance; It may also occur by both. Regularized regression
LASSO regression is the type of regression technique that techniques have their own mechanism to overcome this:
works with highly correlated independent variable solves the
high variable relation by shrinkage parameter lambda which
takes one variable and shrinks the rest to zero that is the
problem with multicollinearity. It is the L1 regularization [23-
24].
662
International Journal of Pure and Applied Mathematics Special Issue
Ridge Regularized
Advantage Allow where n < p
Solve multi collinearity by minimizing coefficients but not to zero.
L2 norm based penalty.
Disadvantage No feature selection.
Formula 𝑛 𝑘 𝑘
𝑛 𝑘 𝑘
Lasso Regularized
Advantages Allow where n < p
Similar with ridge but multi collinearity by shrinking coefficients to zero.
L1 norm based penalty.
Feature selection
2
𝑛 𝑘
= min 𝑦𝑖 − 𝛽𝑜 − 𝑢𝑖𝑗 𝛽𝑗 + 𝜆2 𝜷 2 + 𝜆1 𝜷 1
𝛽
𝑖=1 𝑗 =1
663
International Journal of Pure and Applied Mathematics Special Issue
664
International Journal of Pure and Applied Mathematics Special Issue
[5] Kwak, Ho-Chan, and Seungyoung Kho. "Predicting crash risk and [23] Friedman, Jerome, Trevor Hastie, and Rob Tibshirani. "Regularization
identifying crash precursors on Korean expressways using loop detector paths for generalized linear models via coordinate descent." Journal of
data." Accident Analysis & Prevention 88 (2016): 9-19. statistical software 33.1 (2010): 1.
[6] Ye, M., Osman, O. A., Ishak, S., & Hashemi, B."Detection of driver [24] Boulares, Mehrez, and Mohamed Jemni. "Learning sign language
engagement in secondary tasks from observed naturalistic driving machine translation based on elastic net regularization and latent
behavior." Accident Analysis & Prevention 106 (2017): 385-391. semantic analysis." Artificial Intelligence Review 46.2 (2016): 145-166.
[7] You, Jinming, Junhua Wang, and Jingqiu Guo. "Real-time crash [25] Thanathamathee, Putthiporn, and ChidchanokLursinsap. "Handling
prediction on freeways using data mining and emerging techniques." imbalanced data sets with synthetic boundary data generation using
Journal of Modern Transportation (2017): 1-8. bootstrap re-sampling and AdaBoost techniques." Pattern Recognition
[8] Starkweather, Jon, and Amanda Kay Moske. "Multinomial logistic Letters 34.12 (2013): 1339-1347.
regression."Consulted page at September 10th: http://www. unt. [26] Liaw, Andy, and Matthew Wiener. "Classification and regression by
edu/rss/class/Jon/Benchmarks/MLR_JDS_Aug2011. pdf 29 (2011): randomForest." R news 2.3 (2002): 18-22.
2825-2830. [27] Buuren, Stef, and Karin Groothuis-Oudshoorn. "mice: Multivariate
[9] Tian, R., Li, L., Chen, M., Chen, Y., & Witt, G. J. "Studying the imputation by chained equations in R." Journal of statistical software
effects of driver distraction and traffic density on the probability of 45.3 (2011).
crash and near-crash events in naturalistic driving environment." IEEE
Transactions on Intelligent Transportation Systems 14.3 (2013): 1547-
1555.
[10] Guo, Feng, and Youjia Fang. "Individual driver risk assessment using
naturalistic driving data." Accident Analysis& Prevention 61 (2013): 3-
9.
[11] Brewer, Marcus A., Shannon Barkwell, Pei-Sung Lin, SeckinOzkul,
Walter R. Boot, Priyanka Alluri, Larry T. Hagen et al. "Exploration of
the SHRP2 naturalistic driving study data to identify factors related to
the selection of freeway ramp design speed." Ann Arbor 1001 (2017):
48109-2150.
[12] Kosuge, R., Okamura, K., Kihira, M., Nakano, Y., & Fujita,
G."Predictors of driving outcomes including both crash involvement
and driving cessation in a prospective study of Japanese older drivers."
Accident Analysis & Prevention 106 (2017): 131-140.
[13] Klauer, S. G., Dingus, T. A., Neale, V. L., Sudweeks, J. D., & Ramsey,
D. J. "The impact of driver inattention on near-crash/crash risk: An
analysis using the 100-car naturalistic driving study data." (2006).
[14] Ba, Y., Zhang, W., Wang, Q., Zhou, R., & Ren, C."Crash prediction
with behavioral and physiological features for advanced vehicle
collision avoidance system." Transportation Research Part C:
Emerging Technologies 74 (2017): 22-33.
[15] Murray, Daniel C., Brenda Lantz, and Stephen A. Keppler. "Predicting
truck crash involvement: Developing a commercial driver behavior-
based model and recommended countermeasures." (2005)
[16] Treat, John R., et al. "Tri-level study of the causes of traffic accidents:
final report. Executive summary." (1979).
[17] Fell, J. C., D. L. Hendricks, and M. Freedman. "The relative frequency
of unsafe driving acts in serious traffic crashes." PROCEEDINGS OF
THE 44TH ANNUAL CONFERENCE OF THE ASSOCIATION FOR
THE ADVANCEMENT OF AUTOMOTIVE MEDICINE, CHICAGO,
ILLINOIS, USA, OCTOBER 2-4, 2000. 2000.
[18] Dingus, Thomas A., Feng Guo, Suzie Lee, Jonathan F. Antin, Miguel
Perez, Mindy Buchanan-King, and Jonathan Hankey. "Driver crash risk
factors and prevalence evaluation using naturalistic driving data."
Proceedings of the National Academy of Sciences 113, no. 10 (2016):
2636-2641.
[19] Files, Trip. "Analyzing Driver Behavior Using Data from the SHRP 2
Naturalistic Driving Study." (2013).
[20] Berdoulat, Emilie, David Vavassori, and María Teresa Muñoz Sastre.
"Driving anger, emotional and instrumental aggressiveness, and
impulsiveness in the prediction of aggressive and transgressive
driving." Accident Analysis & Prevention 50 (2013): 758-767.
[21] Peden, Margie. "World report on road traffic injury prevention."
(2004).
[22] Li, Q., Xie, B., You, J., Bian, W., & Tao, D."Correlated logistic model
with elastic net regularization for multilabel image classification."
IEEE Transactions on Image Processing 25.8 (2016): 3801-3813.
665
666