
Transportation Research Part A 120 (2019) 252–260


Which curves are dangerous? A network-wide analysis of traffic crash and infrastructure data

Michal Bíl, Richard Andrášik, Jiří Sedoník
CDV – Transport Research Centre, Líšeňská 33a, 636 00 Brno, Czech Republic

ARTICLE INFO

Keywords: Traffic crash; Road alignment; Hazard; Horizontal curve; Hotspot; Bayesian inference

ABSTRACT

We conducted spatial analyses of traffic crashes, which took place in Czechia over 2010–2016, with respect to the road geometry
data. The aim of the work was to identify hazardous road sub-segments where higher than expected numbers of traffic crashes occur.
The entire Czech road network (58,200 km) was segmented at intersections into 39,074 between-intersection segments of varying
lengths. Each road segment was further automatically sectioned, according to its horizontal alignment, into geometry-homogenous
units (horizontal curves and tangents). Overall, 257,101 curves, defined as curved sections with radii below 2100 m, and 136,388
tangents, were identified. Subsequently, traffic crashes were joined to the respective geometrical units to determine their
hazardousness. The degree of hazardousness was determined relatively, on a segment-by-segment basis, in order to eliminate the lack
of precise traffic exposure data. In addition, the exact binomial test and Bayesian inference were used to identify the most
hazardous horizontal curves.
It was found that, in general, the curves with a higher crash risk have lower radii than the other curves. We identified the
geographical locations of all curves with a high crash hazard. We also ranked the curves according to the crash hazard.
Approximately ten percent of road segments contained at least one hazardous horizontal curve. 6943 significantly hazardous curves
were identified by the use of the exact binomial test. The Bayesian inference reduced this number to 1395 (0.31% of the entire road
network) and ranked them according to the Bayes factor. The most hazardous curve was 45 m long and contained 8.7 traffic crashes
per year. Its hazard ratio was 37.4. This state-wide analysis of primary data was conducted over an extremely short time (up to
3 days) as the result of an application of an efficient algorithm for automatic road curvature determination.

Corresponding author: M. Bíl (e-mail address: michal.bil@cdv.cz).

https://doi.org/10.1016/j.tra.2019.01.001
Received 3 July 2018; Received in revised form 12 November 2018; Accepted 3 January 2019
Available online 08 January 2019
0965-8564/ © 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/BY/4.0/).

1. Introduction

Road geometry (road alignment) plays an important role in road infrastructure planning and design. From a traffic safety
perspective, road geometry influences the crash hazard. Roads predominantly consist of straight segments (tangents) and curves. The
curved road segments were previously studied to identify radii with the highest crash hazard. Schneider et al. (2010) studied
motorcycle crashes and concluded that “A decreasing radius is indicative of a sharper curve, which results in a significant increase
in the frequency of crashes”. Similar results, pointing to smaller curve radii as more hazardous, were also published by other
authors (e.g., Anastasopoulos et al., 2008; Fitzpatrick et al., 2010; Khan et al., 2012; Elvik, 2013). Usually a single number
representing how many times a given curve (or a group of curves with similar radii) is more hazardous than a base (reference) one
is returned. Such an approach can then be used when new roads are built to forecast the expected number of traffic crashes. It is
evident, however, that such an approach cannot be used for retrospective hotspot identification.
Many methods have been developed to identify crash hotspots. Some of them were based on clustering of traffic crashes
(Steenberghen et al., 2004; Erdogan et al., 2008; Bíl et al., 2013), while others applied various regression models to predefined
(segmented) homogenous parts of roads (e.g., Lord and Mannering, 2010; Vangala et al., 2015; Zou et al., 2015). The results which
the above-mentioned methods usually produce are called hotspots, i.e., places with a higher than expected number of crashes (Elvik,
2008). This work is to some degree similar to both approaches as it also identifies the most hazardous parts of road infrastructure, i.e.
certain curves among the list of all the previously defined geometrically homogenous infrastructure units.
The aim of this paper is to present an approach capable of evaluating the hazard (probability of traffic crash occurrence) for
individual road geometry units (curves and tangents). ROCA software (Bíl et al., 2018) was used for the rapid determination of road
geometry from digital vector data. This approach allowed us to link traffic crashes with precise spatial locations (data on crashes were
collected using GPS devices) to the respective individual road geometry units and then analyze them. Subsequently, the exact bi-
nomial test and Bayesian inference were applied to identify and rank the most hazardous horizontal curves.

2. Data and methods

2.1. Data

An analysis of the entire Czech road network was conducted. We worked with data on 235,098 traffic crashes recorded by the
Police in the period of 2010–2016. The Czech road network is approximately 58,200 km long (represented by 1,955,498 vertices in
digital data) and consists of 39,074 individual between-intersection segments (i.e., parts of roads which are bounded by two
neighboring intersections).

2.2. Methods

2.2.1. Preparation of sub-segment units


The road network was analyzed using the ROCA software (Bíl et al., 2018; https://roca.cdvinfo.cz/) which utilizes the approach
of road geometry identification introduced by Andrášik and Bíl (2016). The method identifies horizontal curves with their radii and
tangents. Fig. 1 explains the terminology used in this work. The road network has to be divided at intersections first. The resulting
between-intersection segments are each uniformly exposed to the traffic flow and have therefore constant annual average daily traffic
(AADT). ROCA software then identifies the geometrically homogenous sub-segment units (it can be either a horizontal curve or a
tangent) within each between-intersection segment. The respective traffic crashes are then linked to the closest sub-segment units.
This process allowed for further statistical processing (Fig. 2).
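
The linking of crashes to the closest sub-segment unit can be sketched as a nearest-polyline search. This is a minimal illustration and not the ROCA implementation: the function names, the representation of each unit as a list of (x, y) vertices in a projected coordinate system, and the sample coordinates are all assumptions made for this example. Each unit is assumed to have at least two vertices.

```python
import math

def point_segment_distance(p, a, b):
    """Planar distance from point p to the straight line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both end points coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_unit(p, vertices):
    """Distance from point p to a polyline given as a list of vertices."""
    return min(point_segment_distance(p, a, b)
               for a, b in zip(vertices, vertices[1:]))

def link_crashes(crashes, units):
    """Count crashes per unit, assigning each crash to the closest unit.

    units: dict mapping a unit id to its list of (x, y) vertices
    (a hypothetical data layout chosen for this sketch).
    """
    counts = {unit_id: 0 for unit_id in units}
    for crash in crashes:
        closest = min(units, key=lambda uid: distance_to_unit(crash, units[uid]))
        counts[closest] += 1
    return counts

# Hypothetical between-intersection segment: a tangent followed by a curve.
units = {
    "tangent": [(0.0, 0.0), (100.0, 0.0)],
    "curve": [(100.0, 0.0), (130.0, 10.0), (150.0, 30.0)],
}
crashes = [(20.0, 1.0), (128.0, 12.0), (149.0, 27.0)]
counts = link_crashes(crashes, units)
print(counts)
```

For a national network a spatial index would be needed to avoid comparing every crash with every unit; the brute-force search is kept here only for readability.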

2.2.2. Statistical analysis of the curved sub-segment units


The sub-segment units (we focused on horizontal curves in this work) were compared with the between-intersection segments
with respect to the density of traffic crashes. When the traffic intensity along the entire between-intersection segment is constant, the
density of crashes within each sub-segment unit is also constant if and only if the hazard of a traffic crash is constant along the entire
between-intersection segment. A higher density of crashes within a sub-segment unit would therefore imply increased hazard in that
particular sub-segment unit.
It was presumed, as a null hypothesis (H0), that a traffic crash is equally likely to occur anywhere along the entire road segment.
In other words, if it is known that the proportion of the length of a sub-segment unit within a particular road segment is p0,
0 < p0 < 1, the null hypothesis states that the proportion of traffic crashes within the sub-segment unit is also p0. The hazardous
curves were consequently those where the observed proportions of traffic crashes were significantly greater than p0. The data related
to each sub-segment unit and its respective between-intersection segment are summarized in Table 1.

Fig. 1. Derivation of the sub-segment units.

Fig. 2. Data preprocessing.

The measure of the hazard (hazard ratio; HR) for each sub-segment unit can be expressed in the following fraction:
\[
\mathrm{HR} = \frac{n / (L\,p_0)}{N / L} = \frac{n}{N\,p_0},
\]

where L > 0 stands for the length of the between-intersection segment, n is the number of traffic crashes within a sub-segment unit
and N > 0 represents the total number of traffic crashes within the between-intersection segment. HR is approximately one when the
sub-segment unit has the same proportion of crashes per unit length as the entire between-intersection segment. HR values, which are
significantly greater than one, identify the hazardous sub-segment units. HR computation is demonstrated in Fig. 3. First, the lengths
of each unit are derived (A). Subsequently, the sums of the recorded crashes, which took place on the units, are calculated (B).
Finally, HR for each unit is computed (C).
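
The HR computation can be sketched in a few lines. The function name and argument names are assumptions made for this illustration; the input values are those listed for the top-ranked unit in Table 2 (unit length 45 m, segment length 2565 m, 61 of 93 crashes).

```python
def hazard_ratio(n, N, unit_length, segment_length):
    """HR = n / (N * p0), with p0 = unit_length / segment_length.

    n: crashes within the sub-segment unit
    N: crashes within the whole between-intersection segment (N > 0)
    """
    if N <= 0 or n < 0 or n > N:
        raise ValueError("require 0 <= n <= N and N > 0")
    if not 0 < unit_length < segment_length:
        raise ValueError("require 0 < unit_length < segment_length")
    p0 = unit_length / segment_length
    return n / (N * p0)

# Top-ranked unit from Table 2.
hr_top = hazard_ratio(n=61, N=93, unit_length=45, segment_length=2565)
print(round(hr_top, 1))  # 37.4
```

A unit whose share of crashes equals its share of length gives HR = 1, for example `hazard_ratio(10, 100, 50, 500)`.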
To evaluate whether the data (n, N and p0) lead to a significant result, a statistical test is needed. The exact binomial test (EBT; Clopper
and Pearson, 1934) was used in order to test the null hypothesis (see H0 above). The p-value of EBT can be computed directly with the
use of the formula:
\[
p\text{-value} = \sum_{k=n}^{N} \binom{N}{k}\, p_0^{k}\, (1 - p_0)^{N-k}.
\]
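
The p-value formula is the upper tail of a binomial distribution and can be evaluated directly. The sketch below is an illustration only; the function name is an assumption, and the check uses the case n = N with p0 = 0.6, for which the sum reduces to the single term p0^N.

```python
from math import comb

def ebt_p_value(n, N, p0):
    """One-sided exact binomial p-value: P(X >= n) for X ~ Binomial(N, p0)."""
    if not 0 < p0 < 1:
        raise ValueError("require 0 < p0 < 1")
    if not 0 <= n <= N:
        raise ValueError("require 0 <= n <= N")
    return sum(comb(N, k) * p0**k * (1 - p0) ** (N - k) for k in range(n, N + 1))

# All crashes fall inside a unit covering 60% of the segment length.
p_five = ebt_p_value(5, 5, 0.6)    # 0.6**5, about 0.078
p_ten = ebt_p_value(10, 10, 0.6)   # 0.6**10, about 0.006
print(p_five > 0.05, p_ten < 0.05)  # True True
```

At a significance level of 0.05, five crashes out of five do not reject the null hypothesis in this setting, whereas ten out of ten do.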

Since many statistical tests (one for each curve) were performed, we suspected that the EBT could produce a number of false
alarms due to a multiple comparisons problem (Miller, 1981). The Bayesian inference (MacKay, 2003) was therefore applied for each
sub-segment unit. The Bayes factor (BF; Kass and Raftery, 1995) was calculated as a measure of the Bayesian inference:
\[
\mathrm{BF}_{12} = \frac{P(\text{data} \mid H_1)}{P(\text{data} \mid H_2)},
\]

where “data” stands for {n, N, p0}, H1: “The sub-segment unit is more hazardous than the respective between-intersection segment,”
and H2: “The sub-segment unit is less hazardous than or equally hazardous as the respective between-intersection segment”. BF plays
an important role in the Bayesian inference as it is the factor that converts the ratio of prior probabilities (prior odds) into the
ratio of posterior probabilities (posterior odds), i.e.:
\[
\frac{P(H_1 \mid \text{data})}{P(H_2 \mid \text{data})} = \frac{P(H_1)}{P(H_2)} \cdot \mathrm{BF}_{12}.
\]

Regardless of the prior probabilities, a BF of more than 150 means that the observed data provide very strong evidence for H1 (Kass and Raftery,
1995). A domain of rejection of the null hypothesis and a domain of very strong evidence for H1 can be constructed for any total
number of traffic crashes within a between-intersection segment (see Fig. 4). Concerning EBT, an observation above a specific red
curve would reject the null hypothesis and accept the alternative hypothesis that a horizontal curve is significantly hazardous.

Table 1
Data on traffic crashes and exposure related to a sub-segment unit and its respective between-intersection segment.

                              Number of traffic crashes   Proportion of length (exposure)
Sub-segment unit              n                           p0
Complement                    N – n                       1 – p0
Between-intersection segment  N                           1


Fig. 3. An example of HR computation for each sub-segment unit.

Fig. 4. Domains of rejection of the null hypothesis by the use of EBT (left) and domains of very strong evidence for the hypothesis that a horizontal
curve is hazardous (right) for varying total numbers of traffic crashes.

Similarly, an observation above a specific red curve provides very strong evidence for the hypothesis that a horizontal curve is
hazardous in the case of the Bayesian inference.
For instance, “A” in Fig. 4 stands for a situation in which all traffic crashes (TCs) on the between-intersection segment occurred
within the horizontal curve in question (n = N) and the length of the curve is 60% of the length of the between-intersection segment
(p0 = 0.6). If N = 5, the horizontal curve is not determined as hazardous by either of the two methods. In the case of N = 10, the
curve is hazardous according to both approaches (10 TCs on 60% of the length compared to 0 TCs on 40% of the length indicates
hazardousness).
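
One way a Bayes factor of this kind could be computed is sketched below. The prior is an assumption of this sketch, since the text does not state it: the crash proportion p is taken to be uniformly distributed on (p0, 1) under H1 and on (0, p0] under H2. The function names are hypothetical, and the resulting values need not coincide with the BF values reported in Table 2.

```python
from math import comb, inf

def binomial_upper_tail(m, trials, p):
    """P(X >= m) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(m, trials + 1))

def bayes_factor(n, N, p0):
    """BF12 for H1: p > p0 versus H2: p <= p0 (uniform priors assumed)."""
    if not 0 < p0 < 1 or not 0 <= n <= N:
        raise ValueError("require 0 < p0 < 1 and 0 <= n <= N")
    # Share of the integrated binomial likelihood lying in [0, p0].
    # This is the regularized incomplete beta function
    # I_p0(n + 1, N - n + 1), which for integer arguments equals
    # P(X >= n + 1) with X ~ Binomial(N + 1, p0).
    mass_h2 = binomial_upper_tail(n + 1, N + 1, p0)
    mass_h1 = 1.0 - mass_h2
    if mass_h2 == 0.0:
        return inf  # floating-point underflow: evidence for H1 is overwhelming
    # Dividing by the width of each prior interval gives the two marginal
    # likelihoods up to a common factor, which cancels in the ratio.
    return (mass_h1 / (1 - p0)) / (mass_h2 / p0)

bf_five = bayes_factor(5, 5, 0.6)    # about 30.7
bf_ten = bayes_factor(10, 10, 0.6)   # about 412
print(bf_five > 150, bf_ten > 150)   # False True
```

Under this assumed prior, the situation n = N with p0 = 0.6 stays below the threshold of 150 for N = 5 and exceeds it for N = 10, which agrees with the behaviour described for “A” in Fig. 4.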

3. Results

3.1. General overview

The geometry of 39,074 individual between-intersection segments within the Czech road network was analyzed. The network contains
257,101 curves and 136,388 tangents. There were 109,412 traffic crashes related to curves and 125,686 related to tangents.
Only 22.6% of the horizontal curves had at least one TC. The rest of them (77.4%) either contained no TC (58.8%) or were
parts of between-intersection segments with no TC at all (18.6%). In the former case, the horizontal curves cannot be proved to be
hazardous by any retrospective method (HR = 0 for them). In the latter case, nothing can be said about the sub-segment units,
because there was no traffic crash on the respective between-intersection segment.
The between-intersection segments were evaluated independently. This means that no information on traffic exposure (traffic
intensities) was needed for evaluation of hazard within these segments, because the traffic volume (exposure) was constant along
each segment. In contrast, comparing the total numbers of TCs to only the total lengths of horizontal curves (not considering the
traffic intensity) can produce misleading results such as increasing hazard of a TC when the curve radius increases (see Fig. 5).


Fig. 5. Total number of traffic crashes and total length of curves according to their radius.

3.2. Detailed results for the sub-segments

We analyzed all horizontal curves containing at least one traffic crash. EBT arrived at 6943 significantly hazardous curves, which
is 1.45% of the entire road network. Only 1395 (0.31% of the entire road network) of them were also confirmed as being hazardous
by the use of the Bayesian inference. In addition, each curve with very strong evidence (i.e., BF > 150) for being hazardous was also
identified as hazardous by the use of EBT (see Fig. 6).
The horizontal curves were examined with respect to their HR and radii. First, horizontal curves were separated into two groups
according to the result of EBT (rejection/non-rejection of the null hypothesis). These two groups of horizontal curves were compared
in terms of medians (Wilcoxon test) and cumulative probability distributions (two-sample Kolmogorov-Smirnov test), both of which
can be visually represented in boxplots (see Figs. 7 and 8). Confidence intervals of medians are highlighted by notches. Subsequently,
we performed the same comparisons when the horizontal curves were divided into two groups according to the results of the Bayesian
inference.
The HR for hazardous curves, when compared to the other curves, was significantly greater in terms of medians (see Fig. 7) and
also in terms of the cumulative probability distribution shift according to the Kolmogorov-Smirnov test (p-value < 0.0001). This
means that the results of both EBT and Bayesian inference are consistent with the concept of HR.
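
The two group comparisons can be illustrated with a short sketch that computes the medians and the two-sample Kolmogorov-Smirnov statistic, i.e., the largest vertical distance between the two empirical cumulative distribution functions. The sketch computes only the test statistic and not the p-value, and the HR values are hypothetical and serve for illustration only.

```python
from bisect import bisect_right
from statistics import median

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic (maximum ECDF distance)."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The maximum distance is attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical HR values for two groups of curves.
hr_hazardous = [4.2, 6.8, 9.1, 12.5, 15.0]
hr_other = [0.3, 0.8, 1.0, 1.4, 2.2, 3.0]

print(median(hr_hazardous), median(hr_other))
print(ks_statistic(hr_hazardous, hr_other))  # 1.0, the samples do not overlap
```

A statistic of 1 means the two samples are completely separated; values near 0 mean the two empirical distributions nearly coincide.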

Fig. 6. A comparison of results provided by EBT and the Bayesian inference. The horizontal line represents the level of significance (α = 0.05) and
the vertical line stands for BF accounting for 150. Each black dot represents a tested horizontal curve.


Fig. 7. HR compared to the results given by EBT – rejection of the null hypothesis or not (left), and Bayesian inference – very strong evidence for H1
or not (right).

Fig. 8. Radii of hazardous and other horizontal curves. The hazardousness of the curves is based on the results of the EBT (left) and Bayesian
inference (right).

It was determined that the radii of hazardous curves were significantly lower (see Fig. 8) than the radii of the remaining curves in
terms of medians and also in terms of the cumulative probability distribution shift according to the Kolmogorov-Smirnov test (p-
value < 0.0001).
In the last step, the results obtained by the use of EBT and Bayesian inference were compared. Although the cumulative distribution
functions of HR for hazardous curves differ from each other, the cumulative distribution function of curve radii for hazardous
curves determined by the use of EBT is similar to that for hazardous curves determined by the use of the Bayesian inference (see
Fig. 9). The Kolmogorov-Smirnov test of this hypothesis (no difference between the cumulative distribution functions) results,
however, in a test statistic of 0.0912 and a p-value much less than the significance level of 0.05, so the difference, although
small, is statistically significant.
In summary, the general outcome, that horizontal curves with smaller radii are more hazardous (see Fig. 8), was achieved by both
the EBT and Bayesian inference. The Bayesian inference also provided an additional measure – the Bayes factor – which can be used
to rank the hazardous curves. Since 1395 hazardous curves were identified, ranking is necessary so that road administrators can
inspect the most hazardous curves first.


Fig. 9. Empirical cumulative distribution functions of HR (left) and curve radii (right) for hazardous curves. The hazardousness of the curves is
based on the results of the EBT and Bayesian inference.

3.3. Examples of the most hazardous curves

The general statistics for the data were presented above. The primary aim of this analysis is, however, to identify the concrete
places where the most hazardous curves are located. We present certain examples of them below (Table 2).
The most hazardous curved road segment is also visualized in Fig. 10. It is apparent that the number of crashes is enormous and
far from proportional (65.6% of all crashes on the between-intersection segment are located on only 1.8% of its length).

4. Discussion

Road geometry identification software, ROCA (Andrášik and Bíl, 2016; Bíl et al., 2018; https://roca.cdvinfo.cz/), applied in this
work allowed for rapid data processing, i.e., road geometry identification of large datasets within a reasonable time. The
proposed method for evaluating curve hazard can then be applied. This analysis can therefore be used for any, even a very large,
national road network, if input digital data (road network segments and traffic crashes) are available.
The overall number of crashes located on individual between-intersection segments can be seen as an outcome of a random
process influenced by traffic flow and road infrastructure. It was assumed in this work that, under the null hypothesis, crashes
would be distributed among the sub-segment units of a segment in proportion to their lengths. A higher proportion of crashes means
that the respective unit also has a higher crash hazard. The use of the Bayesian inference helped us narrow the results obtained by
EBT to only 0.31% of the length of the road network. The remaining curves, identified as hazardous by EBT but not by the Bayesian
inference, are probably only false alarms.

Table 2
A list of the 20 most hazardous sub-segment units.

Rank  ID       Radius [m]  Length [m]          TCs               BF
                           Unit     Segment    Unit    Segment
1     374,031  25          45       2565       61      93        2.0E+81
2     354,456  835         97       1356       34      36        2.1E+36
3     373,080  148         182      1818       48      93        7.5E+22
4     369,220  30          44       5544       22      108       7.3E+22
5     350,700  906         571      7351       65      202       2.5E+22
6     350,283  2004        466      8923       62      273       3.1E+21
7     378,208  194         327      1387       66      94        5.6E+20
8     260,037  299         179      1904       18      18        3.3E+18
9     254,402  65          165      3542       18      26        7.9E+17
10    287,765  196         145      1570       37      79        1.3E+17
11    300,883  185         209      6035       27      90        7.8E+16
12    351,325  20          19       264        17      21        4.1E+15
13    202,231  146         142      2992       22      50        3.1E+15
14    352,605  899         215      3396       27      63        2.4E+15
15    352,570  1648        579      3317       53      100       8.9E+14
16    378,199  186         303      1346       34      42        6.8E+14
17    331,354  128         112      3550       17      41        1.5E+14
18    367,527  60          32       1194       16      42        5.8E+13
19    286,078  216         54       4532       14      57        3.9E+13
20    383,833  53          163      2672       24      60        1.5E+13

Fig. 10. The most hazardous curve on the Czech road network (HR = 37.4).

This work presents two types of results:

• First, it was confirmed, on Czech traffic crash and infrastructure data, that horizontal curves with lower radii are more hazardous
than curves with higher radii.
• Second, the most hazardous horizontal curves within the Czech road network were also ranked and localized (see Table 2).

The results about the higher crash hazard of the curves with smaller radii conform to the results of previous work (e.g., Persaud
et al., 2000; Anastasopoulos et al., 2008; Khan et al., 2014). These and many other works have, however, used only aggregated data,
i.e., the number of crashes for each of the two elementary geometry types or for a selected curvature interval, always aggregated
over the entire analyzed network (or a large part of it). Such results can be distorted by the varying AADT values across a road network.
This proposed approach is a segment-by-segment approach. This means that results independent of traffic intensity (AADT)
can be obtained. Also, only two elementary datasets are needed to conduct this kind of analysis: data on road crashes and data on road
geometry types. Such a straightforward, rapid and efficient analysis stands in contrast to complex and data demanding analyses of
safety performance functions (e.g., Banihashemi, 2016; Gooch et al., 2016). It can be used as a suitable tool for rapid network-wide
evaluation of hazard.
Using this approach, researchers will also be able to localize the particular curves which are the most hazardous within the
road network. In other words, the proposed procedure aims not only at gathering information about the hazardousness of horizontal
curves in general, but is also able to determine at which particular curves a mitigation measure will be the most effective.

5. Conclusions

The most hazardous curves within the Czech road network were identified. The approach used throughout this work was based on
“segment-by-segment” evaluation. This meant we did not have to take traffic intensities into consideration. Only 0.31% of the entire
road network length consists of the most hazardous curves. The hazard was obtained using hazard ratio and tested by the exact
binomial test and the Bayesian inference. This analysis was only possible due to the existence of an efficient road geometry iden-
tification method (Andrášik and Bíl, 2016) and the ROCA software (Bíl et al., 2018) which processed a large amount of data within a
reasonable time.
The primary contribution of this work can be summarized as follows:

• The road network can be rapidly separated into sub-segment units in terms of their homogenous horizontal alignment (curves
with their radii and tangents).
• The crash hazard should be computed for each between-intersection segment separately so that traffic flow, and thus exposure,
is constant across all sub-segment units.
• The Bayesian inference is a useful tool to identify and rank (by the use of the Bayes factor) the most hazardous geometry units.
• It was determined that the most hazardous curves have lower radii in general than other curves.

Acknowledgments

This work was supported by the Ministry of Education, Youth and Sports within the National Sustainability Program I, project of
Transport R&D Centre (LO1610), on the research infrastructure acquired from the Operation Program Research and Development for
Innovations (CZ.1.05/2.1.00/03.0064). We would like to thank David Livingstone for his help with English editing and the two
anonymous reviewers whose constructive comments helped increase the lucidity of the manuscript.

References

Anastasopoulos, P.Ch., Tarko, A.P., Mannering, F.L., 2008. Tobit analysis of vehicle accident rates on interstate highways. Accid. Anal. Prev. 40 (2), 768–775.
Andrášik, R., Bíl, M., 2016. Efficient road geometry identification from digital vector data. J. Geogr. Syst. 18 (3), 249–264.
Banihashemi, M., 2016. Effect of horizontal curves on urban arterial crashes. Accid. Anal. Prev. 95, 20–26.
Bíl, M., Andrášik, R., Sedoník, J., Cícha, V., 2018. ROCA – an ArcGIS toolbox for road alignment identification and horizontal curve radii computation. PLoS One 13
(12), e0208407. https://doi.org/10.1371/journal.pone.0208407.
Bíl, M., Andrášik, R., Janoška, Z., 2013. Identification of hazardous road locations of traffic accidents by means of kernel density estimation and cluster significance
evaluation. Accid. Anal. Prev. 55, 265–273.
Clopper, C.J., Pearson, E.S., 1934. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26, 404–413.
Elvik, R., 2013. International transferability of accident modification functions for horizontal curves. Accid. Anal. Prev. 59, 487–496.
Elvik, R., 2008. A survey of operational definitions of hazardous road locations in some European countries. Accid. Anal. Prev. 40, 1830–1835.
Erdogan, S., Yilmaz, I., Baybura, T., Gullu, M., 2008. Geographical information systems aided traffic accident analysis system case study: city of Afyonkarahisar. Accid.
Anal. Prev. 40 (1), 174–181.
Fitzpatrick, K., Lord, D., Park, B.-J., 2010. Horizontal curve accident modification factor with consideration of driveway density on rural four-lane highways in Texas.
J. Transp. Eng. 136 (9), 827–835.
Gooch, J.P., Gayah, V.V., Donnell, E.T., 2016. Quantifying the safety effects of horizontal curves on two-way, two-lane rural roads. Accid. Anal. Prev. 92, 71–81.
Kass, R.E., Raftery, A.E., 1995. Bayes factors. J. Am. Stat. Assoc. 90 (430), 773–795.
Khan, G., Bill, A.R., Chitturi, M., Noyce, D.A., 2012. Horizontal curves, signs, and safety. Transp. Res. Rec. 2279, 124–131.
Khan, G., Bill, A., Chitturi, M., Noyce, D.A., 2014. Safety evaluation of horizontal curves on rural undivided roads. Transp. Res. Rec. 2386, 147–157.
Lord, D., Mannering, F.L., 2010. The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transp. Res. Part A 44,
291–305.
MacKay, D.J.C., 2003. Information Theory, Inference, and Learning Algorithms. Cambridge University Press.
Miller, R.G., 1981. Simultaneous Statistical Inference, second ed. Springer Verlag, New York.
Persaud, B., Retting, R., Lyon, C., 2000. Guidelines for identification of hazardous highway curves. Transp. Res. Rec. 1717, 14–18.
ROCA, computer software. CDV – Transport Research Centre, https://roca.cdvinfo.cz/.
Schneider IV, W.H., Savolainen, P.T., Moore, D.N., 2010. Effects of horizontal curvature on single-vehicle motorcycle crashes along rural two-lane highways. Transp.
Res. Rec. 2194, 93–98.
Steenberghen, T., Dufays, T., Thomas, I., Flahaut, B., 2004. Intra-urban location and clustering of road accidents using GIS: a Belgian example. Int. J. Geogr. Inform.
Sci. 18 (2), 169–181.
Vangala, P., Lord, D., Geedipally, S.R., 2015. Exploring the application of the negative binomial-generalized exponential model for analyzing traffic crash data with
excess zeros. Anal. Methods Accid. Res. 7, 29–36.
Zou, Y., Wu, L., Lord, D., 2015. Modeling over-dispersed crash data with a long tail: examining the accuracy of the dispersion parameter in negative binomial models.
Anal. Methods Accid. Res. 5–6, 1–16.
