
Collision and Grounding of Ships and Offshore Structures – Amdahl, Ehlers & Leira (Eds)

© 2013 Taylor & Francis Group, London, ISBN 978-1-138-00059-9

Feasibility of collision and grounding data for probabilistic accident modeling

M. Hänninen, M. Sladojevic, S. Tirunagari & P. Kujala


Aalto University, Department of Applied Mechanics, Espoo, Finland

ABSTRACT: Various sources of data related to marine traffic safety exist, and the amount of data seems set to grow further. However, the data sets have different formats, scopes, and initial purposes. The paper discusses the feasibility of maritime traffic accident and incident data for probabilistic modeling of collision and grounding accidents, especially their causal factors. In addition, a case study is conducted for examining the data feasibility. First, categorical Finnish accident causal data is utilized in learning a Bayesian network model from the data. The data feasibility is then evaluated based on how well the model matches unseen accident cases and how it performs in classifying the accidents. The results indicate that the dataset does not contain enough information for the applied modeling approach. Finally, recommendations for improving the data and ways to cope with the uncertainty are given.

1 INTRODUCTION

The purpose of accident modeling is to learn more about accidents in order to prevent them in the future. Probabilistic accident models, depending on the underlying theoretical accident model type used (see e.g. Hollnagel 2004), quantitatively describe accident causes, mechanisms, event chains, or system variability. Such a model could be utilized within a cost-benefit analysis, risk management or safety-related decision making. However, a ship, and further the marine traffic system as a whole, can be considered as a complex socio-technical system. In such a system an accident is hardly ever a result of a single cause or a chain of events (Hollnagel 2004). On the other hand, accidents are low probability events and thus relatively little data about accidents exists. Therefore, the lack of data combined with the complexity of the problem might result in unreliable or invalid probabilistic models.

This paper discusses the feasibility of ship accident data for probabilistic collision and/or grounding modeling purposes. In addition, as incidents or near-misses occur more frequently than accidents but might be partly governed by the same underlying mechanisms and thus could provide additional information about marine traffic accidents (Harrald et al. 1998), also incident data is considered. The study is based on examining the data itself when available, reviewing relevant literature, and a case study of evaluating accident data feasibility for learning a Bayesian network model of the dependencies between the reported accident causes. The examination is limited to accident databases providing categorical information on the accidents, accident investigation reports, a near-miss reporting database, and Vessel Traffic Service (VTS) violation and incident reports. Other potential data sources such as Port State Control inspection data, occupational safety data, data from insurance companies or classification societies are not addressed. The systems and practices of accident or incident reporting or the corresponding data formats might differ from country to country. Here the emphasis is on data describing the marine traffic in Finland.

The rest of the paper is organized as follows. Chapters 2–4 describe the features of the aforementioned accident and incident data sources and discuss their feasibility for probabilistic collision and grounding modeling. Chapter 5 presents the data, methods, results and discussion of the case study, learning a Bayesian network of reported accident causes in Finnish collisions and groundings. Finally, conclusions from the data, the literature review and the case study results are drawn in Chapter 6.

2 ACCIDENT DATABASES

2.1 EMCIP

All Member States of the European Union are obligated to report any marine casualty or accident occurrence involving merchant ships, recreational crafts and inland waterway vessels to the European Marine Casualty Information Platform (EMCIP) operated by European Maritime Safety Agency EMSA (Correia 2010). In EMCIP, the casualty events are classified into 25 event types. Collisions and groundings can be categorized as a collision with another ship, a collision with multiple ships, a collision when the ship is not underway, contact with floating cargo, contact

with ice, contact with other floating object, contact with unknown floating object, contact with a fixed object, contact with a flying object, drift grounding/stranding, or powered grounding/stranding. The collected information is divided into factual data and casualty analysis data. To describe the sequence of the events related to a casualty, the results obtained in the Casualty analysis methodology for maritime operations (CASMET) project (Caridis 1999) are used. Special focus has been paid to verifying the quality of the reporting and accomplishing the application of the same principles in the investigations of casualties and data analyses across the EU (EMSA 2010).

EMCIP database had operated on a voluntary basis for two years until June 2011 when it became mandatory. Therefore, the data it currently contains might still be too scarce for probabilistic modeling purposes. Further, all accidents stored in the system are available only to EMSA. A particular Member State has access only to her own data, and the access is only granted to authorities. Nevertheless, despite the low number of records and the limited access to the system, EMCIP manages to establish a common taxonomy. This could facilitate different comparison studies in the future.

So far the Finnish EMCIP data has only been utilized in reporting marine traffic accident statistics for the years 2009–2010 (Trafi 2011) and the authors were unable to find studies of further accident modeling based on EMCIP data. Due to researchers not having access to the data, further examination of its feasibility is impossible.

2.2 DAMA

Before EMCIP, from the year 1990 to 2010, the marine accidents of Finnish vessels and accidents to foreign vessels within Finnish territorial waters were stored in accident database DAMA (Laiho 2007, Kallberg 2011). In 2001–2005, the average number of accident cases stored per year was 50, of which 15 were groundings and 5 collisions (Laiho 2007).

DAMA included 20 accident type categories, including ship-ship collision, collision with an offshore platform, collision with a floating object, collision with a bridge or quay, and grounding/stranding. Besides the accident type, DAMA entries included fields listed in Table 1. However, not all fields had been filled in all accident cases. DAMA had 78 alternatives for the accident causes and a possibility to report up to four causes per accident. These causes had been categorized under the following seven cause groups: external factors; ship structure and layout; technical faults in ship equipment; factors related to equipment usage and placement; cargo, cargo and fuel handling and related safety equipment; communication, organizing, instructions and routines; and people, situation assessment, actions.

Table 1. Data fields in the DAMA accident database.

Field                  Format    Field                   Format
Case number            number    Country                 text
Ship name              text      Waters                  cat
Home port              text      Voyage phase            cat
Nationality            text      Working ac.             cat
Type of ship           cat       Wind direction          cat
Construction year      number    Wind force              cat
Renovation year        number    Sea                     cat
Material               cat       Visibility              cat
GRT                    number    Light                   cat
DWT                    number    Cargo                   cat
Length                 number    Pilot onboard           y/n
Classification soc.    text      2. ship name            text
Year                   number    2. ship nation          text
Month                  number    Loss/dam. severity      cat
Day                    number    Evacuated               y/n
Time of event          number    Hull damage             y/n
Day of the week        number    Hull dam. severity      cat
Event #1               cat       Damage length           number
Event #2               cat       Damage width            number
Event #3               cat       Damage depth            number
Cause #1               cat       Hull dam. locat. y      cat
Cause #2               cat       Hull dam. locat. z      cat
Cause #3               cat       Hull dam. locat. x      cat
Cause #4               cat       Death people            number
Departure port         text      Injured people          number
Destination port       text      Oil pollution           number
Latitude               number    Bridge manning          free text
Longitude              number    Damages                 free text

Based on the DAMA data, statistical analyses of accident characteristics such as ship types, circumstances and causes have been conducted (Heiskanen 2001, Laiho 2007, Kujala et al. 2009). However, more in-depth analyses of the marine accidents in Finland, such as an analysis of the correlations between the different factors, or studies for finding subgroups or clusters within the accidents, could not be found.

2.3 HELCOM

Baltic Marine Environment Protection Commission HELCOM (Helsinki Commission) gathers data on Baltic Sea accidents (HELCOM 2012a) covering all accidents of tankers over 150 GT and/or other ships over 400 GT within the states’ territorial waters or EEZs. Due to a change in the reporting format, the data before 2004 and the subsequent years are not fully comparable. In 2005–2009, the average annual number of accidents in HELCOM database was 125. The accident dataset, from 1989 on, can be accessed online with a map based web tool (HELCOM 2012b) and is also available on request.

HELCOM database accidents are divided into collisions, fire, groundings, machinery damages, physical damages, pollutions, sinkings, technical failures and other accidents. Collisions can be further classified as collisions with another vessel, with an object, or as the ones with another vessel and an object. The HELCOM data fields and the numbers of times the field has been filled in the 1989–2009 data can be seen in Table 2. Only one cause per involved ship is reported. The cause categorization into a human factor, a technical factor,

an external factor, or another factor is coarser than the one in DAMA. It is supplemented with a text field for describing the cause in more detail. However, as can be seen from Table 2, it has been filled in only 22% of the cases.

Table 2. Data fields in HELCOM accident database. The number of times reported describes the number of cases where the corresponding field has not been left blank or reported as “n.i.”, “unknown” etc. in 1989–2009. Ship2 size values were found to be identical to the reported Ship1 size in all but one collision with another vessel, so its correctness can be questioned and the reporting percentage is not presented in the table.

Data field                      Entry format         # of times reported   Reporting (%)
Date                            dd.mm.yyyy           1251                  100,0%
Ship1 name                      text                 1251                  100,0%
Ship2 name                      text                 145                   100,0%*
Year                            numeric              1251                  100,0%
Latitude                        numeric              1250                  99,9%
Longitude                       numeric              1250                  99,9%
Accident type                   cat.                 1249                  99,8%
Ship1 category                  cat.                 1230                  98,3%
Pollution                       no/yes/n.i.          1166                  93,2%
Type of pollution               text                 133                   93,0%***
Amount of poll.                 numeric              1021                  81,6%
Collision type                  cat.                 273                   78,0%**
Ship1 type                      text                 964                   77,1%
Ship2 category                  cat.                 108                   75,0%*
Ship2 type                      text                 87                    65,9%*
Country                         text                 756                   60,4%
Ship1 size (gt)                 numeric              725                   58,0%
Time                            hh.mm                646                   51,6%
Ship2 size (gt)                 numeric              68                    51,5%*
Cause, ship1                    cat                  616                   49,2%
Ship1 draught (m)               numeric/interval     590                   47,2%
Pilot, ship1                    cat                  572                   45,7%
Cargo type                      text                 535                   42,8%
Ice conditions                  no/yes/n.i.          507                   40,5%
Damage                          text                 478                   38,2%
Cause, ship2                    cat                  46                    34,8%
Accident details                text                 423                   33,8%
Ship1 size (dwt)                numeric              395                   31,6%
Offence                         text                 277                   22,1%
Cause details                   text                 274                   21,9%
Assistance need                 text                 209                   16,7%
Ship1 hull                      single/double/n.i.   170                   13,6%
Pilot, ship2                    cat                  15                    11,4%*
Ship2 hull                      single/double/n.i.   13                    9,8%*
Ship2 draught (m)               numeric/interval     55                    4,4%*
Additional info                 text                 38                    3,0%
Consequences/response actions   text                 36                    2,9%
Amount of poll. (tons)          numeric              15                    1,2%
Crew trained in ice navigation  no/yes/n.i.          14                    1,1%
Ship2 size (dwt)                numeric              157                   –

*of the collisions with another vessel.
**of collisions.
***of accidents with pollution.

From the data, HELCOM publishes annual accident statistics (HELCOM 2012c). A combination of DAMA data and HELCOM data from the years 1997–1999 and 2001–2006 was also used in evaluating accident statistics for the Gulf of Finland (Kujala et al. 2009). Mazaheri et al. (in prep.) have studied correlations between the ship traffic and the location of the grounding accidents within the HELCOM data. HELCOM data was also used by Hänninen & Kujala (2013) when modeling the dependencies of the Gulf of Finland Port State Control inspection findings and accident involvement.

Compared with DAMA, HELCOM contains fewer accidents from Finnish waters: as an example, in DAMA there are 46 accidents from Finnish waters in 2004, whereas in HELCOM database the number is 8. On the other hand, some of the accidents present in the HELCOM data are missing from DAMA. Nevertheless, although not complete and even containing some errors (Salmi 2010), at the moment HELCOM data is the largest database with a uniform data format of the Baltic Sea accidents.

3 ACCIDENT INVESTIGATION REPORTS

In Finland, Safety Investigation Authority (SIA) investigates and reports “all major accidents regardless of their nature as well as all aviation, marine and rail accidents and their incidents” (SIA 2012a). Marine accidents are investigated if they have occurred within Finnish waters, or if a Finnish vessel has been involved in the accident. SIA investigates and reports how the accident occurred, what the circumstances were, the causes, the consequences and the rescue operations. The reports based on the investigations also provide recommendations of actions for preventing similar accidents.

The marine traffic accident investigation reports of accidents from 1997 on and 10 older reports can be downloaded from SIA web pages (SIA 2012b). In October 2012, 187 reports of accidents, serious incidents, incidents, damages, minor accidents and other incidents were available.

Accident reports are in text format and their usage typically requires human effort in extracting information of interest from the text. The task can become tedious while humans may not always be capable of extracting the information objectively. Text mining is an emerging technology that can be used to augment existing data in electronic textual databases by making unstructured text data available for analysis (Francis & Flynn 2010).

Zheng & Jin (2010) used accident reports and a text data mining technique called attribute reduction for extracting the most frequent human factors which they considered as reasons leading to human errors in marine traffic accidents. Artana et al. (2005) developed and evaluated software utilizing text-mining for encountering maritime hazards as well as a risk management system covering organizations and human resources. Tirunagari et al. (2012) applied NLP and text mining methods to cluster the marine accident

reports. However, the utilization of text mining is a complex task as it involves addressing text data which is very unstructured and fuzzy (Tan 1999). Moreover, there are quite many challenges when accident reports are concerned as the reports are written in natural language with no standard template and often contain misspellings and abbreviations. Also, the detection of multi words such as “safety culture” is difficult because it is not known which word is of greater importance and the words “safety” and “culture” have a different meaning when appearing separately compared with when considered as a single word.

4 NEAR-MISS REPORTING

4.1 Insjö and ForeSea

ForeSea is an anonymous and voluntary experience database initiated by Finnish and Swedish organizations and government agencies. The aim of the database is “to capture the conditions that are normally not reported to authorities” including accidents, near misses and non-conformities (ForeSea 2012). The database is a refined version of the Swedish Insjö system which was launched in 2002 and the plan is to replace Insjö with ForeSea.

In September 2011, twelve companies were reporting to ForeSea and 76 to Insjö (Bråfelt, pers. comm). Approximately one report per year per ship has been obtained to Insjö. On the 7th of December 2011, Insjö contained 1282 accident reports, 841 near misses and 532 non conformity records. 1268 of these reports had been transferred to ForeSea. After ForeSea becomes fully operational in July 2013, every individual member company will be required to provide reports to the database every year.

The philosophy behind the ForeSea taxonomy is “what can be got into”, compared with EMCIP’s philosophy of “what the collector wants to get in” (Bråfelt, pers. comm). The database administrator is responsible for classifying the event into 27 categories based on his interpretation. Data can be separated into five main categories: prerequisite data, the course of events, the causes, the consequences, and the measures. Each of these is further divided into subcategories. The causes are divided into human/manning, working environment, marine environment, technical ship and cargo and management causes.

Data stored in the Insjö database is available to four categories of users with different rights and accesses to features. Researchers have access to most of the features, including also a right to export data to Excel format.

Insjö and ForeSea contain only a short description of the event in narrative textual form, with very little factual data available (the ship type, type of event, the activity of the ship, the location) and its quality depends on the reporter’s skills (Bråfelt, pers. comm). As with accident investigation reports, the utilization of the data would require information extraction from the text, conducted either manually or possibly with text mining. So far, the data has not been utilized for even establishing trends (Erdogan 2011).

4.2 VTS violation and incident reporting

Vessel Traffic Service (VTS) provides information and navigational guidance to the vessels and can organize the traffic within a VTS monitoring area (FTA 2011). In the Gulf of Finland, areas not included in the VTS areas are covered by Mandatory Ship Reporting System GOFREP. Within Finnish territorial waters, vessels with a GT of at least 300 are obliged by law to participate in the VTS monitoring and report their arrival to the GOFREP area or when they are leaving a port in the Gulf (FTA 2012).

VTS operators should report all violations they observe within the Finnish VTS areas and the GOFREP area. Also, incidents or near misses within Finnish waters are reported. However, differences in the numbers of reported violations between VTS operators have been detected (Talja, pers. comm.). In 2010, a total number of 125 incident and violation reports were made at the Gulf of Finland VTS center.

The format of the violation and especially the incident reporting forms has slightly varied over the years but the basic structure, a narrative text field for describing the event and a few check box-type options for the location or circumstances, has remained unchanged. The information the reports covered in the first half of the year 2009 and the fill-up percentages are presented in Tables 3–4. At the beginning of 2012, the reporting system was reformed and all reporting is to be done into an electronic system.

The work of the VTS was described both verbally and statistically based on two two-week periods of VTS operators reporting all situations requiring VTS intervention (Westerlund 2011). Salmi (2010) used violation reports for identifying accident-prone vessels by comparing the vessels present at the violation reports to HELCOM accident statistics. Unfortunately, the categorized data the reports contain does not provide much input to probabilistic models and the information about the situation, the vessel(s) and the circumstances must be transformed into categorical data, which may introduce some uncertainty. On the other hand, as with accident investigation and near miss reports, finding the truth behind the textual information may also be challenging. Nevertheless, the advantage of VTS violation and incident reports is that violations and incidents occur more frequently than accidents and thus there is more data available.

5 CASE STUDY: FEASIBILITY OF CATEGORICAL ACCIDENT CAUSE DATA FOR LEARNING A BAYESIAN NETWORK MODEL

5.1 Purpose of the case study

Although textual descriptions provide rich information on accidents, the terms or expressions when

referring to a similar factor or cause might vary, which might complicate any probabilistic modeling based on the data. Categorical accident information requires less effort on preparing the data for probabilistic analyses and removes the problem of ambiguity. However, fixed categorization might result in information loss and thus introduce uncertainty in the model. Further, given the complexity of the problem, the categorization of accident causes should be rather detailed while the dataset would need to be large in order to have enough data within each category.

Table 3. Information fields of the Finnish VTS violation reports from the year 2009. In addition, a capture of the situation on ECDIS is attached to the report which may include additional AIS information about the speed, course and heading of the vessel. The filling percentages are calculated from 37 VTS violation reports from January–July 2009.

Type of information          Field                               Type of field and filling %
Vessel identification        Name                                Text (100%)
                             Flag                                Text (100%)
                             Port of registry                    Text (65%)
                             Callsign                            Text (100%)
                             Type                                Text (100%)
                             IMO Number                          Text (100%)
                             MMSI                                Text (100%)
                             GT                                  Text (76%)
Time                         Date and time                       Text (92%)
Position, speed and course   Latitude & longitude                Text (100%)
Location                     Territorial waters of               Check box (100%)
                             Finland/international waters
                             Outside scheme/                     Check box
                             Traffic Separation Scheme/          Check box/Text (name)
                             Lane/                               Check box/Text (desc.)
                             Separation zone/                    Check box
                             Other location                      Check box/Text (desc.) (76%)
Identification               Plotted by Radar/Plotted by AIS     Check box (89%)
                             Identified by                       Text (GOFREP or VTS) (0%)
Weather                      Wind direction                      Text (68%)
                             Wind force (m/s)                    Text (68%)
                             Sea state (douglas)                 Text (22%)
                             Visibility (m)                      Text (8%)
Contravened regulations      Rule 10 (b) i                       Check box
                             Rule 10 (b) ii                      Check box
                             Rule 10 (b) iii, joining            Check box
                             Rule 10 (b) iii, leaving            Check box
                             Rule 10 (c)                         Check box
                             Rule 10 (d)                         Check box
                             Rule 10 (e)                         Check box
                             Rule 10 (f)                         Check box
                             Rule 10 (g)                         Check box
                             Rule 10 (h)                         Check box
                             Rule 10 (i)                         Check box
                             Rule 10 (j)                         Check box
                             IMO Resolution                      Check box
                             MSC.139(76) Annex 1                 Check box
                             Other rules                         Check box/Text (95%)
Additional information       Details of the incident             Text (97%)

Table 4. Information fields of the Finnish VTS incident reports from the year 2009. In addition, a capture of the situation on ECDIS is attached to the report which may include additional AIS information about the vessel’s speed, course and heading. The filling percentages are calculated from 21 incident reports from January–July 2009.

Type of information          Field                               Type of field and filling %
Vessel identification        Name                                Text (95%)
                             Callsign                            Text (90%)
                             IMO Number                          Text (76%)
                             Pilot                               Text (38%)
                             Master                              Text (0%)
Time                         Date and time                       Text (100%)
Position, speed and course   Position                            Text (86%)
                             Destination                         Text (81%)
Location                     Hanko VTS                           Check box
                             Helsinki VTS                        Check box
                             Kotka VTS                           Check box
                             GOFREP                              Check box (95%)
Weather                      Weather                             Text (visib. 67%, wind dir. 95%, wind force 95%)
Type of non-conformity       Near miss                           Check box
                             Accident                            Check box
                             AIS                                 Check box
                             Environment                         Check box
                             Pilot                               Check box
                             Equipment                           Check box
                             Personal injuries                   Check box
                             Emergency                           Check box
                             Other                               Check box (100%)
Additional information       Description of incident             Text
                             Actions taken by VTS Operator       Text (descr. and/or actions 100%)
                             Operator                            Text (100%)
                             Supervisor                          Text (95%)

The aim of the case study is to evaluate if categorical accident-cause data is a feasible information source for a probabilistic model of collisions and groundings and their reported causes. The model is constructed directly from the data. The feasibility is evaluated based on how well the model matches unseen accident cases.

5.2 Data and methods

In order to avoid problems from taxonomy differences, a single accident database is used as an input. As EMCIP data is not available, DAMA accident database is chosen as the input data due to featuring the most

Table 5. The number of cases in the dataset and the number of cases with at least one, two or three reported causes.

                      Collisions   Groundings   Total
No. of cases          55           160          215
>0 reported causes    55           157          212
>1 reported causes    11           21           32
>2 reported causes    4            10           14

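The counts in Table 5 can be turned into shares with a few lines of arithmetic, which shows how few cases carry any information on co-occurring causes. The following is only an illustrative sketch: the dictionary layout and variable names are assumptions made here, and the numbers are copied from the table.

```python
# Counts copied from Table 5; the dictionary layout is illustrative only.
cases = {"collisions": 55, "groundings": 160}
multi_cause = {"collisions": 11, "groundings": 21}  # cases with >1 reported cause

total_cases = sum(cases.values())
total_multi = sum(multi_cause.values())

# Share of all cases, and of each accident type, with more than one cause.
share_multi = total_multi / total_cases
share_by_type = {kind: multi_cause[kind] / cases[kind] for kind in cases}

print(f"{total_multi} of {total_cases} cases ({share_multi:.1%}) report more than one cause")
for kind, share in share_by_type.items():
    print(f"  {kind}: {share:.1%}")
```

Only the cases counted in `multi_cause` can inform a model about dependencies between causes, which is why this share matters for structure learning.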
Table 6. The number of cause types within cause categories of the dataset and the number of cases with the reported cause category.

                                                      Collisions   Groundings   Total
People, situation assessment, actions (13)            32           117          149
External conditions (7)                               33           29           62
Technical failure (5)                                 0            30           30
Communication organization procedures etc. (8)        8            7            15
Ship structure and layout (2)                         0            2            2
Equipment and layout (1)                              0            2            2
Other (1)                                             0            2            2
Total (37)                                            73           189          262

Figure 1. A part of the BN model structure learned from the data with the NPC algorithm. The rest of the variables are unconnected or pairwise connected variables and are not shown.

Table 7. The performance of the Bayesian network model given the test set compared to an empty graph (for the scores, higher values are better).

                         BN model   Empty graph
Log-likelihood score     −599.8     −603.2
AIC score                −659.8     −640.2
BIC score                −728.1     −682.3
Classification error     16.7%      19.4%
Precision (collision)    0.625      0.000
Recall (collision)       0.357      0.000
F-measure                0.455      0.000
AUC                      0.87       0.80
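The figures in Table 7 can be cross-checked with simple arithmetic. The sketch below rests on two assumptions that are not stated in the paper. First, it uses a hypothetical confusion matrix for the 72 test cases (5 true positives, 3 false positives, 9 false negatives and 55 true negatives, with collision as the positive class), chosen here only because it reproduces the tabulated precision, recall, F-measure and classification error. Second, it assumes the score definitions AIC = LL − k and BIC = LL − (k/2) ln N, where higher is better, in order to back out the number of free parameters k implied by the reported scores.

```python
import math

# Hypothetical confusion matrix (collision = positive class); an assumption
# chosen to be consistent with Table 7, not a result reported in the paper.
tp, fp, fn, tn = 5, 3, 9, 55
n_test = tp + fp + fn + tn  # 72 test cases

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)
error_rate = (fp + fn) / n_test

# Empty-graph baseline: every case is assigned to the majority class
# (grounding), so every collision is misclassified.
baseline_error = (tp + fn) / n_test


def implied_parameters(log_lik, aic):
    """Number of free parameters k under the assumed AIC = LL - k."""
    return log_lik - aic


def bic_score(log_lik, k, n):
    """BIC under the assumed definition BIC = LL - (k / 2) * ln(n)."""
    return log_lik - 0.5 * k * math.log(n)


k_bn = implied_parameters(-599.8, -659.8)
k_empty = implied_parameters(-603.2, -640.2)
bic_bn = bic_score(-599.8, k_bn, n_test)
bic_empty = bic_score(-603.2, k_empty, n_test)

print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
print(f"error={error_rate:.1%} baseline error={baseline_error:.1%}")
print(f"implied k: BN={k_bn:.0f}, empty graph={k_empty:.0f}")
print(f"BIC: BN={bic_bn:.1f}, empty graph={bic_empty:.1f}")
```

Under these assumptions, the BN model needs roughly 60 free parameters against roughly 37 for the empty graph, so a log-likelihood gain of about 3.4 does not offset the complexity penalty in either AIC or BIC.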
detailed cause categorization and a possibility to report more than one cause per accident. Table 5 summarizes the data consisting of 55 ship-ship collisions and 160 grounding cases. The accidents have occurred within 1997–1999 and January 2001–June 2006. From the accidents, accident type (collision/grounding) and the reported primary, secondary and third causes are considered. 37 different cause types are present in the dataset. These causes can be grouped into seven categories. The frequencies of these categories within the data are shown in Table 6.

A Bayesian network (BN) model (Pearl 1988) consisting of 38 variables in total is learned from the data. In brief, BN is a graphical representation of the joint probability distribution of a set of variables describing a certain problem (Darwiche 2009). The case study model variables describe the accident type (collision/grounding) and whether each cause type had been reported in an accident (yes/no). The graph structure is learned using NPC algorithm (Steck & Tresp 1999) whereas Expectation-maximization method (Dempster et al. 1977) is applied for determining the network probability parameters. Hugin Expert software (Madsen et al. 2005) is utilized in the construction.

For evaluating the quality of the resulting model, the dataset is divided into a training set (143 cases) which is used for learning the model and a test set (72 cases) for evaluating how well the resulting model performs with unseen data. Log-likelihood score is calculated for comparing the model fit to the test set. However, as log-likelihood favors densely connected networks, the Akaike Information Criterion (AIC) (Akaike 1974) and the Bayesian Information Criterion (BIC) (Schwarz 1978) scores, which additionally penalize a model based on its complexity, are also determined. The scores are then compared with the ones of an empty graph, i.e., a model with no dependencies between the variables. In addition, the model’s ability to correctly classify test set cases as collisions is evaluated by calculating the collision misclassification rate, precision and recall and the area under the ROC-curve (AUC) (e.g. Bradley 1997).

5.3 Results and discussion

From the data, NPC algorithm learns a Bayesian network of ten connected variables (including the event type and the presences of nine cause types), 17 unconnected cause type variables, and five pairs of dependent causes. Figure 1 presents the ten connected variables.

The data itself (Table 5) already suggests that it cannot produce a very informative model on the connections between different causes, as in less than 15%

of the cases more than one cause had been reported. the amount of data compared to the complexity of
This can also be seen from the model performance met- accident mechanisms. As an example, the case study
rics. Table 4 presents the log-likelihood, AIC and BIC results suggest that a reliable Bayesian network model
score of the model and the accident type classifica- of the interdependencies of collisions and groundings
tion performance characteristics given the test set data. and their causes cannot be learned from the most com-
Log-likelihood score of the model was slightly better prehensive categorized collision and grounding cause
but not clearly superior to the one of the empty graph. data available. In addition, all data sources have been
Further, AIC and BIC slightly preferred the empty populated by humans and the contents are thus based
graph, which indicates that the BN model variable on their views on the accident and its possible causes,
dependencies were not very strong. When consid- which is inevitably subjective.
ering the accident type classification performance, It can be concluded that using any of the acci-
both models had over 80% overall classification per- dent or near miss data as the only source of input
formance. However, this is largely due to the clear to a quantitative collision or grounding model seems
dominance of groundings over collisions in the data: risky: if factors such as underreporting, biases, errors
empty graph assigned all cases to the most proba- and missing fields are not considered, the models may
ble class, groundings, and yet reached relatively good produce unreliable results. Double checking between
overall performance. However, whereas the empty two or more databases, using the data together with
graph thus had zero recall and precision for collisions, prior knowledge on the problem, combining multiple
the BN model did not do well either. It correctly clas- related data sources when learning the model from
sified only 36% of the collisions and 63% of the cases data, and choosing the modeling approach carefully
classified as collisions were true collisions. A slightly are a few potential ways for decreasing the uncertainty
better classification performance might be possible or improving the validity of the models. However, it is
to achieve by artificially balancing the proportions important to emphasize that any improvements in the
of collisions and groundings in the data by over- or data or its handling will not matter, if the databases
undersampling, but still it seems that DAMA does not stay unavailable to the modelers and further indirectly
contain enough information for the purposes of this to the stakeholders making the decisions based on the
type of BN model construction. models.
Although Bayesian networks can represent rather complex interactions between variables, include uncertainty related to the problem, and handle missing data while also having a qualitative, graphical dimension (Darwiche 2009), it should be noted that the chosen modeling technique and its assumptions always affect the results. If the aim of the model was solely to classify the accidents into collisions or groundings based on the reported causes, a better or comparable classification might have resulted from a simpler naive Bayes classifier than from a Bayesian network fully constructed from data. Moreover, instead of using Bayesian networks, the classification could have been conducted with other methods such as decision trees, logistic regression, neural networks, or support vector machines. However, as the model was aimed not only at classification but mainly at describing the dependencies between the different causes, and not only probabilistically but also visually, Bayesian networks and the applied NPC structure learning algorithm were chosen.

6 CONCLUSIONS

This paper has examined a few potential sources of input data for quantitative marine traffic accident models. Although relatively many sources of data exist, none of them seems to have the quality and quantity to serve as a sufficient information source for probabilistic modeling of collision and grounding occurrence. The quantity problem is further aggravated by the restricted data availability, even for academic research purposes. The largest challenges come from

ACKNOWLEDGEMENTS

The study was conducted as a part of the Competitive Advantage by Safety (CAFE) project, financed by the European Union – European Regional Development Fund – through the Regional Council of Päijät-Häme, City of Kotka, Finnish Shipowners’ Association, and Kotka Maritime Research Centre corporate group: Aker Arctic Technology Inc., Port of HaminaKotka, Port of Helsinki, Kristina Cruises Ltd, and Meriaura Ltd. The authors wish to express their gratitude to the funders. Olle Bråfelt from ICC and Sari Talja from Finnish Transport Agency are warmly thanked for the interviews.

REFERENCES

Artana, K.B.; Putranta, D.D.; Nurkhalis, I.K. & Kuntjoro, Y.D. 2005. Development of Simulation and Data Mining Concept for Marine Hazard and Risk Management. In: Proceedings of the 7th International Symposium on Marine Engineering. Tokyo, October 24th to 28th, 2005.
Bradley, A.P. 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30(7): 1145–1159.
Bråfelt, O. 2011. Pers. comm. Insjö/ForeSea administrator. Interview, 1st of September.
Caridis, P.A. 1999. CASMET. Casualty Analysis Methodology for Maritime Operations. National Technical University of Athens, Athens. C01.FR.003.
Correia, P. 2010. European Marine Casualty Information Platform a common EU taxonomy. 5th International Conference on Collision and Grounding of Ships (ICCGS). TKK-AM-16: 13–17. Espoo, Finland.

Darwiche, A. 2009. Modeling and reasoning with Bayesian networks volume 1. Cambridge University Press.
Dempster, A.P.; Laird, N.M. & Rubin, D.B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological): 1–38.
[EMSA] European Maritime Safety Agency. 2010. European Maritime Casualty Information Platform. EMCIP Reporting (online) [cited 11.10.2012]. Available: http://emcipportal.jrc.ec.europa.eu/EMCIP-Reporting.341.0.html
Erdoğan, I. 2011. Best practices in near-miss reporting. The role of near-miss reporting in creating and enhancing the safety culture. Master’s thesis, Chalmers University of Technology, Sweden.
ForeSea. About (online) [cited 16.10.2012]. Available: http://www.foresea.org/about.aspx
Francis, L. & Flynn, M. 2010. Text mining handbook. In Casualty Actuarial Society E-Forum.
[FTA] The Finnish Transport Agency. 2011. Vessel Traffic Service (online) [cited 16.10.2012]. Available: http://portal.liikennevirasto.fi/sivu/www/e/professionals/vts/vts
[FTA] The Finnish Transport Agency. 2012. GOFREP (online) [cited 16.10.2012]. Available: http://portal.liikennevirasto.fi/sivu/www/e/professionals/vts/gofrep
Geiger, D.; Verma, T. & Pearl, J. 1990. Identifying independence in Bayesian networks. Networks 20(5): 507–534.
Harrald, J.R.; Mazzuchi, T.A.; Spahn, J.; Van Dorp, R.; Merrick, J.; Shrestha, S. & Grabowski, M. 1998. Using system simulation to model the impact of human error in a maritime system. Safety Science 30(1–2): 235–247.
Heiskanen, M. 2001. Accident analysis 1990–2000, Groundings and collisions with vessel (in Finnish). Helsinki, Finland: Finnish Maritime Administration. Merenkulkulaitoksen julkaisuja 7/2001.
[HELCOM] Helsinki Commission. 2012a. Report on shipping accidents in the Baltic Sea area for the year 2011 (online) [cited 11.10.2012]. Available: http://www.helcom.fi/stc/files/shipping/shipping_accidents_2011.pdf
[HELCOM] Helsinki Commission. 2012b. HELCOM Map and Data Service (online) [cited 12.10.2012]. Available: http://maps.helcom.fi/website/mapservice/index.html
[HELCOM] Helsinki Commission. 2012c. Helcom: Accidents and response (online) [cited 12.10.2012]. Available: http://www.helcom.fi/shipping/accidents/en_GB/accidents/
Hollnagel, E. 2004. Barriers and Accident Prevention. Hampshire: Ashgate.
Hänninen, M. & Kujala, P. 2013. Port State Control Inspections and Ship Safety, Part II: Bayesian Network Modeling of Inspection Findings and Accident Involvement. Submitted manuscript.
Kallberg, V.P. 2011. Accident statistics for different modes of transport – preliminary survey (in Finnish). Helsinki, Finland: Finnish Transport Safety Agency. Trafin julkaisuja 1/2011.
Kujala, P.; Hänninen, M.; Arola, T. & Ylitalo, J. 2009. Analysis of the marine traffic safety in the Gulf of Finland. Reliability Engineering & System Safety 94(8): 1349–1357.
Laiho, A. 2007. Ship accident analysis 2001–2005 (in Finnish). Helsinki, Finland: Finnish Maritime Administration. Merenkulkulaitoksen julkaisuja 5/2007.
Mazaheri, A.; Kotilainen, P.; Montewka, J.; Sormunen, O. & Kujala, P. in prep. Correlation study between ship grounding, ship traffic, and waterway complexity – A case study based on the statistics of the Gulf of Finland.
Madsen, A.L.; Jensen, F.; Kjærulff, U.B. & Lang, M. 2005. The Hugin tool for probabilistic graphical models. International Journal of Artificial Intelligence Tools 14(3): 507–544.
Pearl, J. 1988. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann.
Salmi, K. 2010. Targeting accident prone ships by their behavior and safety culture. Aalto University School of Science and Technology, Espoo TKK-AM-14. Available: http://appmech.tkk.fi/julkaisut/TKK-AM-14.pdf
[SIA] Safety Investigation Authority. 2012a. Role and function (online) [cited 12.10.2012]. Available: http://www.turvallisuustutkinta.fi/en/Etusivu/OTKES
[SIA] Safety Investigation Authority. 2012. Marine (online) [cited 12.10.2012]. Available: http://www.turvallisuustutkinta.fi/en/Etusivu/Tutkintaselostukset/Vesiliikenne
Steck, H. & Tresp, V. 1999. Bayesian Belief Networks for Data Mining. Proceedings of the 2. Workshop on Data Mining and Data Warehousing: 145–154.
Talja, S. Pers. comm. Finnish Transport Agency/Gulf of Finland Vessel Traffic Centre. Phone interview, 7th of October, 2011.
Tan, A.H. 1999. Text mining: The state of the art and the challenges. In: Proceedings of the PAKDD 1999 Workshop on Knowledge Discovery from Advanced Databases: 65–70.
Tirunagari, S.; Hänninen, M.; Guggilla, A.; Ståhlberg, K. & Kujala, P. 2012. Impact of similarity measures on causal relation based feature selection method for clustering maritime accident reports. Journal of Global Research in Computer Science 3(8): 46–50.
[Trafi] Finnish Transport Safety Agency & Statistics Finland. 2011. Maritime traffic accident statistics 2010 (in Finnish) (online) [cited 11.10.2012]. Available: http://www.trafi.fi/filebank/a/1322164797/6ce7133934c4017751894384eb04fc41/1616-721-Vesiliikenneonnettomuuksien_vuositilasto_2010.pdf
Westerlund, K. 2011. The Risk Reducing Effect of VTS in Finnish Waters. EfficienSea Deliverable No. D_WP6_5_01. Finnish Transport Agency.
Zheng, B. & Jin, Y. 2010. Analysis on factors leading to human fault in marine accidents based on attribute reduction. Journal of Shanghai Maritime University 1.
