
Multinomial Logistic Regression To Estimate The Influence Of Accident Factors On Accident Severity

Muhammad Husaini Nadri
Universiti Sains Malaysia, 11500 USM Penang, Malaysia

Abstract— According to the Department for Transport, road traffic accidents are responsible for more than 3000 deaths per year in the UK. Although progress is being made in a number of areas, the number of vehicles involved in accidents has not been falling in line over the years. This study focuses on identifying the factors contributing to the severity of UK traffic accidents. A dataset of UK traffic accidents from Kaggle covering 2005 to 2007, 2009 to 2011, and 2012 to 2014, with 1.6 million instances and 34 attributes, was chosen for the analysis. A nominal multinomial logistic regression model was built. This type of regression analysis was used due to the mixed nature of the data. Multinomial regression was used to compare the accident severity levels of fatal injury, injury, and Property Damage Only. The influential factors that affect accident severity include Light Conditions, Day of Week, Road Type, Road Class, Road Surface Condition, and Weather Conditions. The analysis shows different factors having a statistically significant impact on accident severity.

Keywords— accident severity, multinomial regression

I. INTRODUCTION

Nowadays, several traffic accidents happen daily. Traffic crashes may end in injury, death, and property damage. Zaloshnja et al. (2004) stated that each crash costs $59,153 to $88,483 in year 2000 dollars. Their cost components include medical costs, emergency service costs, property damage, loss of productivity, monetized value of pain and suffering, and loss of quality of life due to injury or death. Accident severity is of special concern to researchers in traffic safety since this research is aimed not only at prevention of accidents but also at reduction of their severity. One way to accomplish the latter is to identify the most probable factors that affect accident severity. A variety of factors contribute to the risk of accident, including weather condition, road surface condition, light condition, day of week, speed limit, and road type.

This study aims at examining factors that are believed to have a higher potential for serious injury or death. Other factors were not examined because of substantial limitations in the data obtained from accident reports. Logistic regression was used in this study to estimate the effect of the statistically significant factors on accident severity. Multinomial logistic regression has been used to determine the influential factors that contribute to accident severity.

II. LITERATURE REVIEW

A. Introduction

A detailed literature review of related studies was performed to collect information and build background knowledge for the thesis. This literature review is based on published research notes and journal papers on factors contributing to accidents. During the literature review, a search for available methodologies for analyzing influential factors for traffic crashes was also undertaken. In research studies conducted on this topic so far, multinomial logistic regression models are the most common models used in analyzing influential factors for traffic crashes.

B. Influential Factors

Giuliano et al. (2009) combined descriptive statistics and statistical modeling to analyze the factors associated with accidents in the state of California. From the descriptive investigation, it was observed that accidents were least likely to occur in the winter and early spring (January, February, and April) and that most accidents occurred during the late summer and early fall (August, September, and October). It was also observed that very few accidents happened during the late night and early morning, but the accident rate tended to rise throughout the morning and peak in the early afternoon. Additionally, the researchers noticed a crash pattern by day of the week. The data indicated that accidents tended to be more frequent on weekdays, with a minimal accident rate over the weekend.

Chang and Mannering (1999) also state in their research that detailed information on roadway conditions, alcohol use, injury, restraint use, and weather describes the factors contributing to accidents. The speed variable also increased the likelihood of possible injury for the accident severity.

C. Logistic Regression Model

Nassar et al. (1997) developed an integrated accident risk model (ARM) for policy decisions using risk factors affecting both accident occurrences on road sections and severity of injury to occupants involved in the accidents. Using negative binomial regression and a sequential binary logit formulation, they developed models that are practical and easy to use.

Mercier et al. (1997) used logistic regression to determine whether either age or gender (or both) was a factor influencing the severity of injuries suffered in head-on automobile collisions on rural highways.

Kim et al. (1995) built a structural model relating driver characteristics and behaviour to type of crash and injury severity. They explained that the structural model helps to clarify the role of driver characteristics and behaviour in the causal sequence leading to more severe injuries. They estimated the effects of various factors in terms of odds multipliers, that is, how much each factor increases or decreases the odds of more severe crash types and injuries.

In one study conducted by Khorashadi et al. (2005), heavy vehicle crash severity was investigated in urban and rural areas. This study used a multinomial logistic (MNL) model to model four outcomes of heavy vehicle crash severity in urban and rural conditions. Their study found some striking differences between the two area types and their respective models. Most notable was that the different models contained different variables. Additionally, variables shared by both models typically possessed signs of different magnitude and impact. The model also indicated that roadway grades and the presence of curves increased the severity of crashes. Environmental factors such as the weather, the type of roadway, and the area surrounding a roadway also contribute to heavy vehicle crashes and crash severities. These findings underscore the difference between urban and rural large truck crash severities and suggest that complex interactions between driver and other measurable environmental factors are playing a significant role in the demands placed on the driver in rural versus urban areas.

A study by Bham et al. (2012) used a multinomial logistic (MNL) model to inspect the differences in accident contributing factors for six collision types. The multinomial model's estimation results indicate that the risk of a multivehicle crash was higher during weekdays while the risk of a single vehicle collision was higher over the weekend. The results also show that single vehicle accidents were significantly associated with night time and wet conditions.

III. METHODOLOGY

A. Data Description

The current study analyzed UK traffic accident data from the UK Department of Transport database. The data covered traffic crashes in the UK from 2005 to 2007, 2009 to 2011, and 2012 to 2014. The data set used in this study was derived from a sample of more than 1.5 million instances involved in serious accidents reported in traffic police records in the UK.

The dependent variable was accident severity, which was coded into three categories: property-damage-only (PDO), injury, and fatal injury. A multinomial variable with these three levels of injury was created for crash severity and used as the dependent variable. In this variable, 1 represents PDO, 2 represents injury, and 3 represents fatal injury involved in the accident. Only the injury severity of accidents was considered for the purpose of this study. The percentage of fatal crashes is 84.02%, the percentage of injury crashes is 13%, and the percentage of property-damage-only (PDO) crashes is 2.98%.

Figure 1 Distribution of Accident Severity

The study goal was to identify the factors that might affect the severity of the accident and to identify the most significant factors that mostly contribute to accident severity. In the model building part of the methodology, environmental factors like road conditions, light conditions, speed limit, road surface condition, and others were used to determine the most influential factors for accident severity. After model building, the model was tested for regression analysis to evaluate the results. A chi-square test is used to indicate how well the logistic regression model fits the data.

B. Logistic Regression

Logistic regression is a type of probabilistic statistical classification model that is used to predict the outcome of a categorical dependent variable based on one or more predictor variables. Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables, which are usually continuous, by using probability scores as the predicted values of the dependent variable. Logistic regression methods have become an essential element of any data analysis tool used to analyse the relationship between a dependent variable and one or more independent variables.
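As a rough illustration of how a multinomial logistic model turns predictor values into the probability scores described above, the sketch below converts linear predictors for the two non-reference severity levels into probabilities for all three levels. The function name and the numeric linear predictors are hypothetical and are not taken from the fitted model in this study; fatal injury is assumed to be the reference category, as in the results reported later.

```python
import math


def multinomial_probabilities(linear_predictors, reference="fatal injury"):
    """Convert linear predictors into category probabilities.

    linear_predictors maps each non-reference category to
    intercept + sum(coefficient * predictor value). The reference
    category has a linear predictor of 0 by construction.
    """
    exp_terms = {cat: math.exp(eta) for cat, eta in linear_predictors.items()}
    denominator = 1.0 + sum(exp_terms.values())
    probabilities = {cat: term / denominator for cat, term in exp_terms.items()}
    probabilities[reference] = 1.0 / denominator
    return probabilities


# Hypothetical linear predictors for one accident record (illustration only).
example = multinomial_probabilities({"PDO": -3.0, "injury": -1.5})
for category, probability in example.items():
    print(f"{category}: {probability:.3f}")
```

The three probabilities always sum to one, and setting every linear predictor to zero gives each category the same probability.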

1) Logistic Regression Equation

ln(P / (1 − P)) = a + bX    (1)

The logistic formulas are stated in terms of the probability that Y=1, which is referred to as P. The probability that Y is 0 is 1 − P. The ln symbol refers to a natural logarithm. P can also be computed from the regression equation: if we know the regression equation, we could, theoretically, calculate the expected probability that Y=1 for a given value of X.

2) Logistic Regression Model Fit

a) Maximum Likelihood

The data points are used to find the best fit by Maximum Likelihood (ML), rather than by minimizing the squared residuals as in ordinary least squares (OLS). ML is a way of finding the smallest possible deviance between the observed and predicted values using calculus. With ML, the computer uses different "iterations" in which it tries different solutions until it gets the smallest possible deviance or best fit. Once it has found the best solution, it provides a final value for the deviance, which is usually referred to as "negative two log likelihood". According to Pedazur and Hosmer (1989), the deviance statistic is called –2LL and it can be thought of as a chi-square value.

C. Analysis of Data

A multinomial logistic regression model was developed. The main objective of this task was to investigate the complex relationships between the accident severity and the 38 selected predictor variables by using logistic regression modelling. Several environmental factors, which are considered to have an effect on accident severity, were analyzed by using logistic regression in order to determine the most significant ones. The multinomial logistic regression model, which uses the maximum likelihood estimation method, was applied to estimate statistically the effects of these variables in contributing to the occurrence of accident severity levels.

The logistic regression model was built by using stepwise selection. The stepwise selection was made to select the best explanatory variables to build the model. The final results contained the variables that were selected from the stepwise selection procedure. Therefore, these variables were considered as the influential factors.

IV. RESULTS

The first part of this chapter presents the descriptive results of UK traffic accident data based on the data analyzed in this study. The second part discusses the results of the multinomial logistic regression modelling and lastly, in the third part, how the influential factors impact the results is discussed.

A. Dependent Variable Crash Severity Checking

The first part of this analysis was checking whether the dependent variable accident severity is nominal by using the test of parallel lines. The outcome of this test of parallel lines determines which multinomial logistic regression model was used. In the second part of this analysis, logistic regression analysis was used to identify the influential factors, which should be used in the regression modelling for predicting the accident severity.

a) Test of Parallel Lines Procedure

Model              Chi-Square    df    Sig.
Null Hypothesis
General            65.226        34    .001

From the test of parallel lines results shown in Table 4, the significance level is less than 0.05, so it is significant. Therefore the dependent variable accident severity is nominal, so the nominal multinomial logistic regression model was built.

b) Accident Severity Frequency

            Frequency    Percent    Valid Percent
Valid  1    19441        2.98       2.98
       2    204504       13         13
       3    1280205      84.02      84.02
Total       1504150      100.0      100.0

From the dependent variable crash severity frequency table shown above, there was a total of 1504150 observations of accidents: 1.3% of observations involved property-damage-only (PDO), 13.6% of observations involved injury, and 85.1% of observations involved fatal injury.

B. Results of Logistic Regression Modelling for Predictor Variables

The response variable in the dataset, which consisted of three levels of crash, was modelled as a nominal variable. 38 predictor variables were selected from the dataset, six of which were grouped variables. Predictor variables were tested at a 95% confidence level. The significance level and chi-square were presented for each explanatory variable in the likelihood ratio tests and maximum likelihood estimates. They are important criteria for checking whether the explanatory variable was an influential factor on the dependent variable accident severity or not.
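The –2LL deviance and the chi-square used in the likelihood ratio tests can be sketched in a few lines. The example below is only a minimal illustration under stated assumptions: the four observations and the fitted probabilities are invented toy values, the function names are hypothetical, and the intercept-only (null) model is assumed to predict the observed category shares for every record.

```python
import math


def neg2_log_likelihood(predicted_probs, observed):
    """Return -2LL: minus twice the summed log-probability that the
    model assigns to the category actually observed for each record."""
    log_likelihood = sum(
        math.log(probs[category])
        for probs, category in zip(predicted_probs, observed)
    )
    return -2.0 * log_likelihood


def likelihood_ratio_chi_square(neg2ll_null, neg2ll_model):
    """Drop in deviance when predictors are added to the null model."""
    return neg2ll_null - neg2ll_model


# Toy severity codes (1 = PDO, 2 = injury, 3 = fatal injury), illustration only.
observed = [3, 2, 3, 1]

# Null model: every record gets the observed category shares.
shares = {c: observed.count(c) / len(observed) for c in sorted(set(observed))}
null_probs = [shares for _ in observed]

# Hypothetical fitted probabilities from a model with predictors.
model_probs = [
    {1: 0.1, 2: 0.2, 3: 0.7},
    {1: 0.2, 2: 0.6, 3: 0.2},
    {1: 0.1, 2: 0.1, 3: 0.8},
    {1: 0.5, 2: 0.3, 3: 0.2},
]

neg2ll_null = neg2_log_likelihood(null_probs, observed)
neg2ll_model = neg2_log_likelihood(model_probs, observed)
chi_square = likelihood_ratio_chi_square(neg2ll_null, neg2ll_model)
print(f"-2LL null = {neg2ll_null:.3f}, -2LL model = {neg2ll_model:.3f}, "
      f"chi-square = {chi_square:.3f}")
```

A larger drop in –2LL means the predictors explain more of the variation in severity; the chi-square is then compared against its degrees of freedom to obtain the significance value.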

C. Nominal Logistic Regression Model Building

1) Stepwise Selection Procedure

Table 1 Stepwise Summary

From the summary of stepwise selection shown in the table, the stepwise selection procedure removed one variable, Pedestrian Crossing Physical Facilities, in building the multinomial logistic regression model. The variable was removed by using the Wald statistic criterion based on Backward Stepwise selection. This variable was removed because its significance value was higher than 0.05. This variable is not statistically significant to the model and does not have an influential relationship with the dependent variable accident severity.

2) Maximum Likelihood Estimates

Table 2 Maximum Likelihood Estimates

The chi-square is the important criterion in the maximum likelihood estimates for identifying the influential factors for the model. If the p-value (Sig.) is less than 0.05, it means this factor is significant to the dependent variable accident severity. From the table, it can be seen that Day of Week, Light Conditions, Road Type, and Road Class are statistically significant to the dependent variable since all the Sig. values are less than 0.05.

3) Odds Ratio Estimates

Table 3 Odds Ratio Estimates

[Table 3 reports, for each level of Accident_Severity relative to the reference level, the Odds Ratio and the 95% Confidence Interval for Exp(B) (Lower Bound, Upper Bound). Rows: Intercept; Weather_Conditions (Fine with high winds, Fine without high winds, Fog or mist, Other, Raining with high winds, Raining without high winds, Snowing with high winds, Snowing without high winds); Day_of_Week (1 to 6); Light_Conditions (Darkness: No street lighting, Darkness: Street lighting unknown, Darkness: Street lights present and lit, Darkness: Street lights present but unlit); Road_Surface_Conditions (Dry, Flood (Over 3cm of water), Frost/Ice, Snow); Road_Type (Dual carriageway, One way street, Roundabout, Single carriageway, Slip road); 1st_Road_Class (1 to 5).]

The table shows the odds ratio for each predictor. The reference group for this table is accident severity 3, fatal injury. An odds ratio greater than 1 indicates that the outcome is more likely to be in the comparison group; if the odds ratio is less than 1, the outcome is more likely to be in the referent group.

PDO relative to fatal injury

Weather conditions of Fine with high winds, Fog or mist, Raining with high winds, and Raining without high winds have a higher tendency to increase accident severity to fatal injury. All working days from Monday to Friday have odds ratios below 1, which indicates that fatal injuries are more likely. All light conditions, which include no street lighting, street lights present and lit, street lighting unknown, and street lights present but unlit, are more likely to cause injury since the odds ratio is greater than 1. For accidents that happen on a Dry or Flood (Over 3cm of water) road surface, the likely outcome is property damage only. But if the accident happens on a Frost or Snow road surface, it is more likely to involve fatal injuries. Accidents that happen at a roundabout are more likely to have severity up to fatal because the odds ratio is less than 1. However, for Dual carriageway, One way street, and Single carriageway, the accident severity is injury only because the odds ratio is greater than 1.
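The reading of odds ratios used above can be sketched as follows. This is a minimal illustration, not the procedure used to produce Table 3: the coefficient and standard error are hypothetical values, the function names are invented for the example, and the interval assumes the usual normal approximation Exp(B ± 1.96 × SE).

```python
import math


def odds_ratio_with_ci(coefficient, standard_error, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for Exp(B)."""
    return (
        math.exp(coefficient),
        math.exp(coefficient - z * standard_error),
        math.exp(coefficient + z * standard_error),
    )


def interpret(odds_ratio, outcome, reference="fatal injury"):
    """Apply the rule from the text: above 1 favours the outcome,
    below 1 favours the referent group."""
    if odds_ratio > 1.0:
        return f"{outcome} is more likely than {reference}"
    if odds_ratio < 1.0:
        return f"{reference} is more likely than {outcome}"
    return f"no difference between {outcome} and {reference}"


# Hypothetical coefficient and standard error for one predictor level.
ratio, lower, upper = odds_ratio_with_ci(coefficient=-0.25, standard_error=0.05)
print(f"Exp(B) = {ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print(interpret(ratio, outcome="PDO"))
```

When the confidence interval excludes 1, as in this hypothetical case, the direction of the effect is consistent across the whole interval.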

Injury relative to fatal injury

Accidents that happen in weather conditions of Fine with high winds, Raining with high winds, Snowing with high winds, and Snowing without high winds have a higher tendency to result in injury. Weather conditions of Fog or mist and Raining without high winds have a higher tendency to increase accident severity to fatal injury. All working days from Monday to Friday have odds ratios below 1, which indicates that fatal injuries are more likely. The light conditions of no street lighting and street lights present but unlit have odds ratios below 1, which indicates that fatal injuries are more likely. For accidents that happen on a Dry or Flood (Over 3cm of water) road surface, the likely outcome is injury only. But if the accident happens on a Frost or Snow road surface, it is more likely to involve fatal injuries. Accidents that happen at a roundabout are more likely to have severity up to fatal because the odds ratio is less than 1. However, for Dual carriageway, One way street, and Single carriageway, the accident severity is injury only because the odds ratio is greater than 1.

V. CONCLUSIONS AND FUTURE RECOMMENDATIONS

This study examined the influential factors for UK traffic accidents. The main objective of the current study was to determine the influential factors that contribute significantly to the accident severity levels when an accident happens. Based on this main objective, the nominal multinomial logistic regression model was built to investigate characteristics of injury and fatality in the UK. In this study, data from 2005 to 2007, 2009 to 2011, and 2012 to 2014 obtained from the UK Department of Transport was used for the analysis. The multinomial logistic regression model was used because it has the ability to detect influential factors for accident severity.

The dependent variable was accident severity, which was coded into three categories. They were property-damage-only (PDO), injury, and fatal injury. The explanatory variables include several traffic and environmental factors. They are considered to have an effect on the accident severity and they were analyzed by using statistical modeling techniques in order to determine the most significant ones. A total of eight variables were selected for exploratory analysis to investigate characteristics of predictor variables and screen out the most influential ones. A multinomial logistic regression model was built to investigate the influential factors for accident severity in the UK.

The results of the current study suggested that several explanatory variables did influence the likelihood of an accident severity that may cause fatal injury. The influential factors include Light Conditions, Day of Week, Road Type, Road Class, Road surface condition, and weather conditions. No other factors were found to be significant for the likelihood of influencing accident severity. Day of week also influences the accident severity: accidents from Monday to Friday are more likely to cause fatal injuries. For light conditions, no street lighting and street lights present but unlit are more likely to cause fatal injuries. Accidents that happen on a Dry or Flood (Over 3cm of water) road surface are likely to cause property damage and injury only. But if the accident happens on a Frost or Snow road surface, it is more likely to involve fatal injuries.

Future Recommendation

For future work, the following recommendations can be carried out.

1. Other models like the log linear model and nested logistic models can be used to predict accident severity.
2. Different variables like different crash types and different roadways can be used to produce a better model.

Therefore, it is recommended that drivers who get involved in, or more correctly, drivers who cause motorcycle-motor vehicle crashes be educated on the influential factors affecting such crashes in order to take effective countermeasures.

REFERENCES

[1] Zaloshnja, E., & Miller, T. R. (2004). Costs of large truck-involved crashes in the United States. Accident Analysis and Prevention, 36, 801-808.
[2] Giuliano, G., Zhou, J., McFerrin, P., & Miller, M. (2009). Commercial Motor Vehicles' Safety: A California Perspective. Institute of Transportation Studies, University of California, Berkeley. Retrieved from: http://www.path.berkeley.edu/PATH/Publications/PDF/PWP/2010/PWP-2010-01.pdf
[3] Hosmer, D. W., & Lemeshow, S. (1989). Applied logistic regression [M]. New York, NY: John Wiley & Sons, Inc.
[4] Chang, L., & Mannering, F. (1999). Analysis of injury severity and vehicle occupancy in truck- and non-truck-involved accidents. Accident Analysis and Prevention, 31 (5), 579-592.
[5] Bham, G. H., Bhanu, S. J., & Manepalli, U. (2012). Multinomial Logistic Regression Model for Single-Vehicle and Multivehicle Collisions on Urban U.S. Highways in Arkansas. Journal of Transportation Engineering, 138(6), 786-797, November 2012.
[6] Khorashadi, A., Niemeier, D., Shankar, V., & Mannering, F. (2005). Differences in Rural and Urban Driver-Injury Severities in Accidents Involving Large-Trucks: An Exploratory Analysis. Accident Analysis and Prevention, 37 (5), 910-921.
[7] Nassar, S. A., Saccomanno, F. F., & Shortreed, J. H. (1997). Integrated Risk Model (ARM) of Ontario. Presented at the 76th Annual Meeting of the Transportation Research Board, Washington, DC, 1997.
[8] Mercier, C. R., Shelley, M. C., Rimkus, J. B., & Mercier, J. M. (1997). Age and gender as predictors of injury severity in head-on highway vehicular collisions. In: Transportation Research Record 1581, TRB, National Research Council, Washington, DC, 1997.
[9] Kim, K., Lawrence, N., Richardson, J., & Li, L. (1995). Personal and behavioural predictors of automobile crash and injury severity. Accident Analysis and Prevention, 27 (4), 469–481.
Road [11] Mercier. Costs of large truck-involved crashes determine the influential factors that contribute significantly to in the United States. Richardson.H. L.. Integrated Risk The results of the current study suggested that several Model (ARM) of Ontario. The influential factors include Light Conditions. F. . 37 (5). Highways in Arkansas. Mercier.H. the light condition of no street lighting and street lights present but unlit more likely to cause fatal injuries. (2009) Commercial to 2014 obtained from the UK Department of Transport was Motor Vehicles’ SafetyA California Perspective. J. 1989.. and fatal injury. snowing with high winds. November 2012. W. DC. L. 2. snowing without occurrences of motorcycle-motor vehicle crashes should be high winds have higher tendency to have injury during accident. For the accident D. Lawrence.A. Shortreed. drivers who cause the without high winds.pdf. Institute of used for this analysis. TRB. 786-797. A total of eight variables [8] were selected for exploratory analysis to investigate [9] Khorashadi.S.. CONCLUSIONS AND FUTURE RECOMMENDATIONS different roadways can be used to produce better model.M. Injury relative to fatal injury will likely to cause property damage and injury only. New York. educated on the influential factors affecting such crashes in For the light condition of no street lighting and street lights order to take effective countermeasures. 1995. & Manepalli. Personal and behavioural predictors of automobile crash and injury severity. For gender as predictors of injury severity in head-on highway vehicular accident happen on weather conditions of Fine with high winds. In: Transportation Research Record 1581. raining with high winds. Retrieved logistic regression model was built to investigate characteristics from: of injury and fatality of in UK. and Single carriageway it has accident severity to injury only Therefore. M.