
Communications in Transportation Research 1 (2021) 100011

Contents lists available at ScienceDirect

Communications in Transportation Research


journal homepage: www.journals.elsevier.com/communications-in-transportation-research

Emerging approaches applied to maritime transport research: Past and future
Ran Yan a, Shuaian Wang a, Lu Zhen b,*, Gilbert Laporte c,d
a Department of Logistics and Maritime Studies, The Hong Kong Polytechnic University, 999077, Hong Kong, China
b School of Management, Shanghai University, Shanghai, 200444, China
c Department of Decision Sciences, HEC Montreal, Montreal, Quebec, H3T 2B1, Canada
d School of Management, University of Bath, Bath, BA2 7AY, UK

ARTICLE INFO

Keywords: Maritime transport; Shipping; Port; Data-driven modeling; Digitalization in the maritime industry

ABSTRACT

Maritime transport is the backbone of international trade and globalization. Maritime transport research can be roughly divided into two categories, namely the shipping side and the port side. Most of the classic approaches adopted to address practical problems in these research topics are based on long-term observations and expert knowledge, while few of them are based on historical data accumulated from practice. In recent years, emerging approaches, by which we mean machine learning and deep learning techniques in this essay, have been receiving wider attention for solving practical problems. Although the maritime sector is a relatively conservative industry, some initial trials have been made to apply emerging approaches to its practical problems. The objective of this essay is to review the application of emerging approaches to maritime transport research. The main research topics in maritime transport and the classic methods developed to solve them are first presented. The emerging approaches and their suitability for maritime transport research are then introduced and discussed. Related existing studies are then reviewed according to problem settings, main data sources, and emerging approaches adopted. Challenges and solutions in the process are also discussed from the perspectives of data, models, users, and targets. Finally, promising future research directions are identified. This essay is the first to give a comprehensive review of existing studies on developing machine learning and deep learning models, together with the popular data sources used, to address practical problems in maritime transport.

* Corresponding author.
E-mail addresses: angel-ran.yan@connect.polyu.hk (R. Yan), wangshuaian@gmail.com (S. Wang), lzhen@shu.edu.cn (L. Zhen), gilbert.laporte@cirrelt.net (G. Laporte).

https://doi.org/10.1016/j.commtr.2021.100011
Received 18 August 2021; Received in revised form 29 October 2021; Accepted 30 October 2021
Available online 10 November 2021
2772-4247/© 2021 The Authors. Published by Elsevier Ltd on behalf of Tsinghua University Press. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Overview of maritime transport research

Maritime transport carries more than 80% of world merchandise trade by volume (UNCTAD, 2020). It is therefore regarded as the backbone of globalized trade and of the manufacturing supply chain. The maritime transport sector is also a crucial component of a country's economic system, as it plays a fundamental role in importing and exporting resources and in providing employment opportunities. Given the importance of maritime transport at the international and regional levels, the related academic research has been receiving wide attention. According to Talley (2013) and Shi and Li (2017), research topics in maritime transport can be divided into the "shipping part" and the "port part". After reviewing 1,292 papers, Shi and Li (2017) further concluded that the top five research topics in the shipping part are shipping market analysis, vessel operational management, green shipping, shipping safety and security, and shipping company management. Meanwhile, the most popular research topics in the port part include port management, performance evaluation and competitiveness, and terminal management. Regarding the research methods used in maritime transport research from 2000 to 2014, the authors identified seven methods, namely SIQO (survey, interview, questionnaire, and observation), economic modeling, MES (mathematical, econometric, and statistical analysis), case study, CCCQ (conceptual, content, comparative, and qualitative analysis), literature review, and simulation, which we call classic approaches in this essay. The main aspects of these classic approaches are summarized in Table 1.

2. Emerging technologies and their application in maritime transport research

Most of the classic approaches summarized in Section 1 depend heavily, and in a qualitative manner, on long-term practical experience and expert knowledge. Even when quantitative methods such as economic modeling and MES are applied, the developed models still rely more or less on subjective judgement when making assumptions and constructing models. In addition, classic models rarely use data obtained from practice, as they are not data-driven (with econometric analysis and statistical analysis as exceptions), which prevents them from learning dynamically from data to mine useful information.

Table 1
Summary of classic approaches in maritime transport research.
Method | Basic idea and content
SIQO | Survey: The collection of information from a sample of individuals through their responses to questions, including data collecting, aggregating, and analyzing.
SIQO | Interview: A conversation for gathering information, which usually involves an interviewer, who coordinates the whole process and asks questions, and an interviewee, who responds to the questions.
SIQO | Questionnaire: A research instrument that consists of a set of questions or other types of prompts with the aim of collecting information from a respondent*.
SIQO | Observation: The study of non-experimental situations in which behavior is observed and recorded, generating either qualitative or quantitative results.
Economic modeling | Using different theories as well as quantitative or qualitative models and techniques to analytically evaluate the causes and effects of any economic phenomenon.
MES | Mathematical analysis: Modeling a nonmathematical situation or phenomenon, and the relationships between situations, mathematically, e.g., by developing mathematical models and proposing proper solution approaches.
MES | Econometric analysis: Constructing a set of equations to provide a quantitative explanation of the behavior of economic variables.
MES | Statistical analysis: Collecting and analyzing data based on statistical assumptions and tests to draw meaningful inferences from samples about the whole population.
Case study | An in-depth study of the research object from various comprehensive perspectives to seek patterns in, and causes of, its behaviors.
CCCQ | Conceptual analysis: Breaking down concepts into their constituent parts and analyzing them to gain a better understanding of a particular issue.
CCCQ | Content analysis: Quantifying and analyzing the presence of certain words, themes, or concepts within given qualitative data.
CCCQ | Comparative analysis: Comparing two or more processes, documents, data sets, or other objects.
CCCQ | Qualitative analysis: Using subjective judgement to analyze the research object based on non-quantifiable information.
Literature review | An overview and evaluation of the available literature in a given subject or area.
Simulation | An imitation of the operation of a real-world process or system over time.
Note*: Questionnaire and survey are related but different terms. A survey covers a relatively wide range of activities, from question design to data collection and analysis and drawing conclusions. Thus, a survey always involves questionnaires, while a single questionnaire is only one small part of a survey.

Recent years have witnessed rapid growth in studies developing machine learning (ML) and deep learning (DL) models, which we call emerging approaches in this essay, to solve practical problems. Both emerging approaches are data-driven and belong to the area of artificial intelligence (AI), which comprises techniques that enable a computer to think like humans and mimic their behaviors to solve complex tasks with little or no human intervention. According to Mitchell (1998), an ML problem is defined as follows: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." There are three types of ML models: supervised learning (SL), unsupervised learning (UL), and reinforcement learning (RL), whose definitions can easily be found in most ML textbooks. Here we only give their main points, as discussed in Bishop (2006), as follows.

• SL: A training set comprising examples of input vectors together with their corresponding target vectors is used to train ML models. In the classification task, the objective is to assign each input vector to one of a finite number of discrete categories. In the regression task, the output consists of one or more continuous variables. Common SL models include logistic regression (LR), support vector machine (SVM), support vector regression (SVR), artificial neural networks (ANN), Bayesian network (BN), decision tree (DT), random forest (RF), and boosting machines (e.g., gradient boosting regression tree [GBRT]).
• UL: A training set consisting of input vectors without any corresponding target values is used to train ML models. Common UL tasks include clustering, density estimation, and association rule mining. Common UL algorithms include k-means, principal component analysis, Apriori, and autoencoders.
• RL: The problem of finding suitable actions to take in a given situation to maximize a reward over a sequence of states and actions. Common approaches are model-free RL, which relies on trial-and-error experience to find the optimal policy, and model-based RL, which attempts to model the environment and then choose the optimal policy.

DL, although usually treated separately from ML, is actually a subfield of ML based on deep neural networks (DNNs), which consist of multiple hidden layers with advanced neurons (Larochelle et al., 2009). The most common DL models are convolutional neural networks (CNNs), widely applied in computer vision and speech recognition, and recurrent neural networks (RNNs), designed for sequential data structures such as time-series data and natural language.

Both emerging approaches covered in this essay are data-driven; therefore, model development is highly dependent on data, and model performance is closely related to data quality and quantity. The general procedure of model construction is as follows. To begin with, the practical problem to be addressed should be clarified, especially the application scenarios, users, and prediction accuracy required. Then, the necessary data fields need to be identified, and the related data sources, as well as their availability, costs, quality, and quantity, should be investigated. Next, the required data are collected, and data from different sources are fused. After the whole dataset is formulated, feature engineering should be conducted, including data preprocessing, feature deletion and creation, dimension reduction, feature encoding, and data normalization and standardization. Then, proper ML or DL models should be developed in a trial-and-error manner, considering the features and target(s), the allowed computation time and power, and the required model accuracy and interpretability. Hyperparameters in the ML or DL models should also be tuned for better generalization ability and prediction performance. It should also be mentioned that, for certain models, features might need further processing in this step. Finally, the constructed ML or DL models should be evaluated by various metrics. In regression tasks, common metrics are the mean squared error, the root mean squared error, and the mean absolute error. In classification tasks, common metrics are the confusion matrix, accuracy, recall, precision, and the F1 score. In practical implementation, the model may need to be iteratively adjusted and improved to cope with changing requirements.

The key merits of applying emerging approaches to address practical problems are summarized as follows. First, owing to their data-driven property, emerging approaches use past information (mainly data) to predict the future and can thus avoid false assumptions and biases resulting from subjective expert judgement. Second, decisions based on emerging approaches are generally more accurate and effective than those based on expert knowledge, again as a result of this data-driven property. Therefore, the accountability of such decisions can be significantly improved, and future risks can also be reduced. Third, the prediction results given by emerging approaches, and the follow-up decisions made, are more consistent with the status quo, as they are derived from historical data from practice. Moreover, they can be constantly improved in the future with the accumulation and incorporation of new data.
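The general model-construction procedure described earlier in this section (feature standardization, trial-and-error model fitting, hyperparameter tuning, and evaluation with MSE, RMSE, and MAE) can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the method of any study reviewed here: the three features and the fuel-consumption-like target are synthetic and hypothetical, SVR is chosen only as one of the common SL models listed above, and the hyperparameter grid is arbitrary.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def make_synthetic_data(n_samples: int = 400, seed: int = 0):
    """Synthetic stand-in for a fused dataset (NOT real ship data).

    Hypothetical features: sailing speed (knots), draught (m), wind speed (m/s).
    Hypothetical target: a daily fuel-consumption-like value, cubic in speed.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(low=[10.0, 8.0, 0.0], high=[22.0, 14.0, 20.0], size=(n_samples, 3))
    y = 0.005 * X[:, 0] ** 3 + 2.0 * X[:, 1] + 0.3 * X[:, 2]
    y = y + rng.normal(0.0, 1.0, size=n_samples)  # additive measurement noise
    return X, y


def run_pipeline(seed: int = 0) -> dict:
    X, y = make_synthetic_data(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )

    # Feature standardization followed by SVR; SVR is sensitive to feature scale.
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("model", SVR()),
    ])

    # Hyperparameter tuning via cross-validated grid search on the training set.
    param_grid = {"model__C": [1.0, 10.0, 100.0], "model__epsilon": [0.1, 1.0]}
    search = GridSearchCV(pipeline, param_grid, cv=3, scoring="neg_mean_squared_error")
    search.fit(X_train, y_train)

    # Evaluate the refitted best model on held-out data with regression metrics.
    y_pred = search.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    return {
        "mse": float(mse),
        "rmse": float(np.sqrt(mse)),
        "mae": float(mean_absolute_error(y_test, y_pred)),
        "best_params": search.best_params_,
    }


if __name__ == "__main__":
    print(run_pipeline())
```

For a classification task, the same structure would apply with a classifier in place of `SVR` and metrics such as `sklearn.metrics.confusion_matrix`, `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`.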


Although emerging approaches have already been widely applied in some disciplines, their adoption in the maritime industry remains limited so far. We argue that it is now feasible and necessary to develop these approaches for the maritime industry. On the one hand, the quantity and quality of data in maritime transport have improved, facilitating data-driven analysis, thanks to a growing volume of more accurate shipping data from automatic identification systems (AIS), global positioning systems, and tracking devices for cargo, containers, and machinery; the availability of sea and weather data with finer granularity and higher accuracy; and larger-capacity devices for real-time data transmission and storage. In addition, data reported by governments and maritime institutions are gradually being made available to the public, while more comprehensive data fields are covered by commercial databases (Clarkson, 2021). Therefore, it is justifiable to say that the foundation of emerging approaches, i.e., the acquisition of the required data, can basically be fulfilled in maritime transport research. Moreover, various libraries implementing ML and DL algorithms facilitate the development of emerging approaches even by non-specialists in the maritime industry. Such programming libraries include, but are not limited to, TensorFlow, PyTorch, Theano, Keras, and scikit-learn in Python, and caret, nnet, and deepnet in R, into which newly proposed algorithms are continuously incorporated, making the construction of tailored ML and DL models easier. Furthermore, the rapid growth of computation power and algorithm acceleration techniques further enhances the practicability of emerging approaches, enabling them to process real-time tasks in maritime transportation.

On the other hand, the contemporary maritime industry urgently needs digital transformation and technological innovation, and initial trials have been made with the assistance of AI technology. A research hotspot for applying emerging technology on the shipping side is the development of maritime autonomous surface ships (MASS), which are regarded as the future of the maritime industry (Bakdi et al., 2021; Szelangiewicz et al., 2021). At present, autonomous and remote-controlled ships based on sensor fusion, intelligent control algorithms, and effective communication and connectivity, which allow the ship itself to collect and process data from the surrounding environment and make appropriate and timely responses, are being tested in some sea areas. More commonly, massive data collected from onboard sensors, together with sea, weather, and port conditions, are used to derive the estimated time of arrival (ETA) and the estimated time of departure (ETD), adjust container routing, and reduce fuel consumption. On the port side, the concept of the smart port has emerged, which encompasses state-of-the-art technologies including big data, AI, the Internet of Things, blockchain technology, and 5G connectivity. Pioneer smart ports such as Hamburg, Rotterdam, Quebec, and Singapore are using these technologies to enhance their operations and to develop solutions that optimize maritime and land transport operations while being more environmentally friendly. These limited existing attempts have already shown that shipping and port services can significantly benefit from digital transformation with emerging approaches as the core (UNCTAD, 2019).

3. Objective and structure

The aim of this essay is to review typical applications of emerging approaches to solve maritime transport problems on the shipping side and the port side. In the first step, we searched for related academic papers in the Scopus, Google Scholar, and Science Citation Index databases. In the second step, papers citing the papers found in the first step, as well as papers cited by them, were investigated. In the third step, relevant research topics were identified, and representative studies under each topic were selected for illustration. We then summarize the common data sources and further review the related papers from the perspectives of the problems addressed and the methods adopted. The research topics, problem settings, main data sources, and main types of emerging approaches developed in the related papers are summarized in Table 2. Finally, challenges, solutions, and future research directions of applying emerging approaches to address practical maritime transport problems are presented. The remainder of this essay is organized as follows. Section 4 introduces common data sources adopted in the related existing literature. Section 5 presents typical applications of emerging approaches in maritime transport. Section 6 discusses challenges and solutions of applying emerging approaches to maritime transport research. Section 7 presents promising future research directions. Section 8 concludes this essay.

Table 2
Summary of the papers covered in this essay.
Problem category | Research topic | Problem settings | Main data sources | ML models adopted | DL models adopted
Shipping | Ship trajectory prediction | Prediction of vessel trajectory in the near future based on historical time series of trajectory points | AIS | SVM, ELM, k-NNs, autoencoder | LSTM, GRU, GAN
Shipping | Ship risk prediction and safety management | Prediction of the occurrence of vessel collision accidents; Analysis of ship accident data and reports | AIS; Maritime accident statistics and accident reports | SVM, MLP, tree-based models, k-NN regression, BN, k-means, KDE, polynomial regression, GBRT | LSTM and its variants, CNNs, ANN, deep RL
Shipping | Ship inspection planning | Prediction of ship risk; Optimization of onboard inspection sequence; Analysis of the influencing factors of port state control (PSC) inspection results and the influences of PSC inspection | PSC inspection records; Ship specifications | BN, SVM, tree-based models, association rule mining | None
Shipping | Ship energy efficiency prediction | Prediction of ship fuel consumption in various environments | Ship sailing records (e.g., noon report); Marine weather forecasts | ANN, LR, tree-based models, LASSO, ridge regression, SVR, SOM, GMM, fuzzy c-means clustering | Hybrid model combining LSTM, RNN, and ENN; ANN with 10 hidden layers
Shipping | Ocean freight market condition prediction | Prediction of Baltic Dry Index (BDI); Prediction of sea freight rate | Market condition indicators such as BDI | ANN, SVR, RF, GBR, MLP | RNN, LSTM
Port | Ship destination and arrival time prediction | Prediction of ship destination (turning point identification and trajectory extraction); Prediction of ship arrival time to a port; Prediction of equivalent problems of ship arrival time to a port | AIS; Marine weather forecasts; Port statistics | DBSCAN, ANN, RF, GBRT, SVM | DNN
Port | Port condition prediction | Prediction of port cargo volume; Prediction of port traffic volume | Port statistics; Bill of lading | ANN, SVM, RF, SVR | LSTM

4. Introduction of common data sources

This section briefly introduces the popular data sources used in the maritime transport research based on emerging approaches presented in Table 2.

• AIS

Table 2 indicates that AIS is undoubtedly the most popular and important data source facilitating data-driven maritime transport research using emerging approaches. All ships of 300 gross tonnage and above engaged on international voyages, as well as all passenger ships, are required to be equipped with an AIS transmitter. AIS provides updates on dynamic vessel sailing information at intervals that depend on sailing speed, while static and voyage-related information is usually updated every 6 min or whenever any field is changed. Specifically, dynamic information mainly includes a vessel's real-time position with an accuracy indication, the reporting time, course over ground and speed over ground, heading, navigation status, and rate of turn. Static information mainly includes vessel identifiers (IMO number, maritime mobile service identity [MMSI], call sign, and ship name), vessel type and dimensions (length and beam), and the location of the position-fixing antenna. Voyage-related information includes the vessel's draught, type of cargo, voyage destination, and ETA (Meijer, 2017; Yang et al., 2019). For a detailed introduction of AIS and its applications, readers are referred to Yang et al. (2019) and Tu et al. (2017). At present, AIS data can be collected from commercial websites such as AISHub, Elane, ExactEarth, FleetMon, IHS Markit, and Marine Traffic, to name just a few (Tu et al., 2017).

• Ship specifications

Specifications of individual ships in the world merchant fleet can be found in commercial databases such as Lloyd's List Intelligence1 and Shipping Intelligence Network2, which provide accurate, up-to-date, and comprehensive features of individual vessels, such as current and former names, ship identification numbers, ship type, tonnages and dimensions, builder and build date, ship operational information (e.g., owner, manager, registration, flag, and classification), and main and auxiliary machinery details. Features of vessels with certain characteristics, for example, those in certain classification societies (e.g., American Bureau of Shipping3) and sunken ships (e.g., International Registry of Sunken Ships4), are provided by specific databases.

1 https://ihsmarkit.com/products/maritime-ships-register.html.
2 https://www.clarksons.net/portal/.
3 https://ww2.eagle.org/en.html.
4 http://www.shipwreckregistry.com/.

• Ship sailing records

Ship sailing records mainly come from onboard sensors and ship noon reports. Specifically, sensor data are automatically collected from multiple sensors installed onboard a ship at a fixed time interval ranging from seconds to minutes (and are thus usually large in scale) (Wang et al., 2016). The main sensors include a GPS receiver to track real-time position and navigation speed, a shaft power meter to record shaft torque, a fuel consumption sensor to detect the instantaneous fuel consumption rate, wind speed and direction sensors to detect wind conditions, and a water depth sensor to acquire water depth. In contrast, the ship noon report is the ship's sailing profile, filled in manually by crew members on a daily basis (usually at noon local time) (Du et al., 2019). Common fields of the noon report include the ship's geographical location, distances covered since the start of the voyage and since the last report, average propeller revolutions per minute (RPM), engine speed, sailing speed, total cargo hold, and sea and weather conditions regarding wind, current, and waves (Yan et al., 2020a). One drawback of noon report data is that only one report is generated per day, while many data fields are filled in manually or are averages over a period. Therefore, the quantity and quality of noon report data are generally lower than those of onboard sensor data.

• Ship accident data

Public global ship accident data are published by the International Maritime Organization (IMO) in the Marine Casualties and Incidents5 module of its Global Integrated Shipping Information System (GISIS). It contains two types of information regarding ship casualties: factual data collected from various sources, and more elaborate information based on the reports of investigations into casualties received by IMO (IMO, 2021a). Marine accident statistics and detailed investigation reports are also published by some local marine departments or administrations. For example, information on marine accidents within Hong Kong waters and involving Hong Kong-registered ships is published by the Marine Accident Investigation Section (MAIS) under the Marine Department of the Hong Kong Special Administrative Region6, where an overview and detailed accident statistics are available.

5 https://gisis.imo.org/Public/MCI/Default.aspx.
6 https://www.mardep.gov.hk/en/publication/publications/reports/eoverview.html.

• PSC inspection records

PSC inspection records are published by regional PSC organizations, also called PSC Memoranda of Understanding (MoUs), in their individual databases. Currently, there are nine MoUs around the world in addition to the United States Coast Guard. Among them, the Tokyo MoU7 in the Asia-Pacific region and the Paris MoU8 covering North America and Europe take leading roles. Records of ship inspections carried out by all member states of an MoU, covering ship information (e.g., ship identity, type, certificates, dimensions, operational information, and historical inspection records within the MoU) and inspection information (e.g., detailed deficiencies and detention conditions in an inspection), are published in a public database. It should also be noted that the ship selection procedures, onboard inspection criteria and schemes, and data fields provided differ among MoUs.

7 http://www.tokyo-mou.org/.
8 https://www.parismou.org/.

• Marine weather forecasts

Marine weather refers to the atmospheric and ocean-wave conditions at sea, with a focus on wind speed and direction, wave heights and periods, roughness of nearshore waters, and significant weather (such as storms and hurricanes); forecasts of these conditions facilitate vessel navigation and help ensure smooth sailing on the high seas. Popular online sources of marine weather forecasts used in existing research include the National Marine Environmental Forecasting Center (NMEFC), the National Oceanic and Atmospheric Administration (NOAA), Weathernews Inc. (WNI), the Copernicus Marine Environment Monitoring Service (CMEMS), and the European Centre for Medium-Range Weather Forecasts (ECMWF), which provide different data fields and formats. For example, ECMWF's ERA5 dataset provides comprehensive weather data on a regular latitude-longitude grid, with a horizontal resolution of 0.25° × 0.25° for the atmosphere and 0.5° × 0.5° for ocean waves, and a temporal resolution of 1 h. The weather forecasts are updated once a day (ERA5, 2021).
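Section 2 notes that data from different sources should be fused before feature engineering. One common step when joining AIS positions with a gridded weather product such as ERA5 is to map each AIS record to the nearest grid node and the nearest hour. The sketch below is a minimal illustration under stated assumptions: it assumes a grid whose rows start at 90°N and run south and whose columns start at 0°E and run east (a layout ERA5 files often use, but this should be checked against the actual download); the `AisPoint` record, its field names, and the sample MMSI are hypothetical; and no ERA5 file-reading API is shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AisPoint:
    """Hypothetical minimal AIS dynamic record (only the fields used here)."""
    mmsi: int
    timestamp: datetime
    lat: float  # degrees, north positive
    lon: float  # degrees, east positive, in [-180, 180] or [0, 360)


def nearest_grid_index(lat: float, lon: float, resolution: float) -> tuple[int, int]:
    """Return (row, col) of the nearest node on a regular lat-lon grid.

    Assumed layout: row 0 is 90°N with rows running south; column 0 is 0°E
    with columns running east over [0, 360).
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    n_cols = round(360.0 / resolution)
    row = round((90.0 - lat) / resolution)
    col = round((lon % 360.0) / resolution) % n_cols  # wrap e.g. 359.9°E onto 0°E
    return row, col


def grid_coordinates(row: int, col: int, resolution: float) -> tuple[float, float]:
    """Inverse mapping: (lat, lon) of a grid node, with lon in [0, 360)."""
    return 90.0 - row * resolution, col * resolution


def nearest_hour(ts: datetime) -> datetime:
    """Round a timestamp to the nearest hour to match a 1 h temporal resolution."""
    floored = ts.replace(minute=0, second=0, microsecond=0)
    if ts - floored >= timedelta(minutes=30):
        floored += timedelta(hours=1)
    return floored


def weather_key(point: AisPoint, resolution: float) -> tuple[datetime, int, int]:
    """Key (hour, row, col) under which a gridded weather value could be looked up."""
    row, col = nearest_grid_index(point.lat, point.lon, resolution)
    return nearest_hour(point.timestamp), row, col


if __name__ == "__main__":
    p = AisPoint(mmsi=477000000, timestamp=datetime(2021, 8, 18, 10, 31), lat=22.3, lon=114.2)
    print(weather_key(p, 0.25))  # atmosphere grid
    print(weather_key(p, 0.5))   # ocean-wave grid
```

For the sample point (22.3°N, 114.2°E at 10:31), the 0.25° key is row 271, column 457 (node 22.25°N, 114.25°E) and the 0.5° key is row 135, column 228 (node 22.5°N, 114.0°E), both at 11:00. Nearest-node matching is the simplest choice; bilinear interpolation between the four surrounding nodes is a common alternative when smoother values are needed.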


• Port statistics

Port statistics refer to key port indicators such as vessel statistics, port container throughput, port cargo throughput, marine accidents, and bunker sales. The data are published by individual ports on their websites, and the fields provided may differ from port to port. For example, Hong Kong Port9 provides vessel arrivals by ocean/river, type, flag, and main berthing location; port container throughput of Hong Kong port and the Kwai Tsing container terminals, as well as by handling location and seaborne/river; and port cargo throughput by seaborne/river, amongst others. Similarly, bunker sales, vessel and tanker arrivals, total cargo and container throughputs, and vessel arrivals are published on the website of the Maritime and Port Authority of Singapore10.

• BDI

The BDI is regarded as a leading indicator of economic activity and the barometer of the dry bulk shipping market. It is a highly volatile index that reflects the average price paid for the transport of dry bulk materials by Capesize (40%), Panamax (30%), and Supramax (30%) vessels across more than 20 routes. The BDI is released on each working day by the London-based Baltic Exchange, and the data are fully accessible.

5. Typical applications of emerging approaches to maritime transport

This section reviews typical applications of ML and DL models to address practical problems (mainly prediction tasks) in maritime transport on the shipping side and the port side. Note that only representative studies under each topic are covered, not an exhaustive list.

5.1. Emerging approaches applied to address problems on the shipping side

5.1.1. Ship trajectory prediction

Accurate and efficient ship trajectory prediction is the foundation of sailing risk analysis, and such prediction based on emerging approaches is mainly powered by AIS data. Existing literature developing ML models for ship trajectory prediction includes Liu et al. (2020), who adopt least squares SVM; Mao et al. (2018), who propose an extreme learning machine (ELM); Virjonen et al. (2018), who employ k-nearest neighbors (k-NNs); and Murray and Perera (2020), who develop a dual linear autoencoder approach. As the volume of dynamic AIS data describing ship navigation status is huge, DL models are also widely applied to ship trajectory prediction in addition to ML models. Among them, the RNN is a popular DL model for predicting vessel trajectories, as it can make use of sequential information, with outputs that depend on previous computations, which is in line with the properties of ship trajectories. For example, long short-term memory (LSTM) networks are developed in Huang et al. (2020) and Li et al. (2019), and a gated recurrent unit (GRU)

global maritime sector. Among frequently occurring maritime accidents such as fire/explosion, foundering/sinking, and stranding/grounding, collisions receive the widest attention from both the shipping industry and academia due to their high frequency and the severe consequences they cause (Huang et al., 2020). Among studies on ship collision analysis and prediction based on ML models, SVMs are adopted to predict ships' future navigation status regarding course and speed (Gang et al., 2021) and to divide the ship encounter azimuth map so as to avoid collisions (Gao et al., 2020). The multilayer perceptron (MLP), a type of neural network, is combined with fuzzy logic and a classic expert system for collision avoidance (Ahn et al., 2012). The collision risk index is predicted using several popular ML models, including various tree-based models (e.g., RF, GBRT, and extra trees regression) as well as SVM and k-NN regression, and the prediction results are combined with D–S theory, a traditional method for evaluating ship collision risk (Abebe et al., 2021). DL models are also applied to predict ship collision risk levels and achieve collision avoidance, as in Sawada et al. (2021) and Ma et al. (2020), where LSTM and its variants are used. Ship navigation images extracted from AIS data are analyzed by CNNs to predict collision risk based on image classification (Zhang et al., 2020c).

In addition to ship collision analysis and prediction, other types of ship risk, e.g., ship grounding accidents (Wu et al., 2017), fire accidents involving electric vehicles on vehicle carriers (Wu et al., 2021), seafarers' nonfatal injuries (Zhang et al., 2020b), and unattended machinery plants (Abaei et al., 2021), are also predicted and investigated in the existing literature using various types of BNs. Besides, an ANN together with a belief rule-based inference model and an evidential reasoning rule is used to predict engine wear faults (Xu et al., 2020). DNN, polynomial regression, and GBRT are used to predict damage and oil outflow of tankers (Das et al., 2021), and deep RL is used to predict shaft deformation of a medium-sized oil/chemical tanker (Choi et al., 2021).

Another popular research topic in maritime safety management is the analysis of regional and global ship accident statistics and accident reports, where BNs are the most popular methodology, as they are partially interpretable and the dependencies between features and the classification variable (i.e., the prediction target) can be visualized (Wang and Yang, 2018; Fan et al., 2020a, 2020b; Khan et al., 2020; Li et al., 2021; Ung, 2021; Liu et al., 2021). In addition, k-means is applied to analyze the influence of human errors on ship accidents (Lema et al., 2014), while the combination of k-means with kernel density estimation (KDE) is applied to extract spatial patterns of global maritime accidents (Zhang et al., 2021).

5.1.3. Ship selection and inspection process optimization in port state control

The IMO states that the best way of improving safety at sea is by developing international regulations that are followed by all shipping nations (IMO, 2021a). Vessel inspection carried out by port states, also called PSC, which aims to ensure that visiting foreign ships comply with various international and regional maritime conventions and regulations, is regarded as an effective safety net to catch substandard ships
is applied in Suo et al. (2020). A generative adversarial network (GAN) (IMO, 2021b). PSC is one of the most important international maritime
with attention module and interaction module is also developed for ship policies, and various ML models have been developed to improve the
trajectory prediction (Wang and He, 2021). effectiveness and efficiency of PSC inspection. Most of the studies aim at
One may also notice that the predicted ship trajectory has been achieving accurate identification of high-risk visiting ships by predicting
further extended to ship collision risk analysis in some studies. We ship risks of various types (e.g., the number of deficiencies and the
categorize these studies into ship risk prediction instead of pure ship detention probability) by ML models including BNs (Yang et al., 2018;
trajectory prediction as summarized in Section 5.1.2. Wang et al., 2019), SVMs (Xu et al., 2007; Wu et al., 2021), and
tree-based models (Yan et al., 2020b, 2021a, 2021b, 2021a). In addition,
5.1.2. Ship risk prediction and safety management onboard inspection sequence is optimized based on association rule
Navigation safety has always been one of the top concerns of the mining techniques (Tsou, 2019; Chung et al., 2020; Osman et al., 2020;
Yan et al., 2021d), which belong to typical UL models. Additionally, we
note that BNs are also popular in analyzing the influencing factors of
9
https://www.mardep.gov.hk/en/fact/portstat.html#1. detention (Yang et al., 2021), evaluating the effectiveness of PSC on
10
https://www.mpa.gov.sg/web/portal/home/maritime-singapore/port-stat reducing ship accidents (H€anninen et al., 2014a; Li et al., 2014; Fan et al.,
istics. 2019), and quantifying the improvement of navigation safety brought by

PSC inspections (Hänninen and Kujala, 2014b; Fan et al., 2020). As a typical PSC inspection usually lasts two to three hours and only one record is generated per inspection, the data quantity is quite limited. Therefore, to the best of the authors' knowledge, no study has developed DL models to address practical problems in PSC.

5.1.4. Ship energy efficiency prediction

In recent years, ship fuel consumption prediction and optimization have received increased emphasis as a result of stricter vessel emission regulations, the cyclical downturn of the world economy, and high bunker prices (Yan et al., 2021c). Vessel fuel consumption is widely considered the foundation of subsequent voyage management and optimization, such as speed optimization, trim optimization, and sailing path planning. However, achieving accurate predictions of fuel consumption in different environments is not a trivial task, as various factors, e.g., vessel sailing speed, engine and hull conditions, cargo loaded, and the surrounding sea and weather conditions, could influence vessel fuel consumption in a complex manner (Du et al., 2019; Yan et al., 2021c). Therefore, to capture their intricate interrelations, ML models have been widely applied to predict vessel fuel consumption rates in various scenarios. For example, Pedersen and Larsen (2009) is a pioneering study applying an ANN to vessel fuel consumption prediction, considering sailing speed, wind, wave, air and water temperature, and draft conditions. ANNs are widely developed and are by far the most popular ML model for fuel consumption prediction in the existing literature (Yan et al., 2021c); examples can be found in Petersen (2011), Rudzki and Tarelko (2016), Du et al. (2019), Farag and Ölçer (2020), and Le et al. (2020). Other SL models developed to predict ship fuel consumption rates include tree-based models (Peng et al., 2020; Yan et al., 2020a), least absolute shrinkage and selection operator (LASSO) regression (Wang et al., 2018; Gkerekos et al., 2019; Soner et al., 2019; Uyanık et al., 2020), ridge regression (Soner et al., 2019; Uyanık et al., 2020), and SVR (Uyanık et al., 2020). UL models have also been constructed for vessel fuel consumption prediction, such as self-organizing maps (SOM) (Man et al., 2020), the Gaussian mixture model (GMM) (Petersen, 2011), and fuzzy c-means clustering (Tran, 2020). DL models are employed by only a very limited number of studies, including Panapakidis et al. (2020), which combines LSTM, RNN, and the Elman neural network (ENN). Using the vessel fuel consumption prediction results, downstream voyage optimization models are developed to determine the optimal sailing speeds (Du et al., 2019; Zheng et al., 2019; Yan et al., 2020a), vessel engine settings (Rudzki and Tarelko, 2016; Wang et al., 2016), trim conditions (Du et al., 2019), and sailing routes (Zhang et al., 2019). A recent comprehensive review of vessel fuel consumption prediction approaches can be found in Yan et al. (2021c).

Emissions of gases and particles resulting from ship fuel consumption in port areas can be a source of localized ocean acidification and a threat to human health (Wang et al., 2021; Zhuge et al., 2021). However, to the best of the authors' knowledge, very few studies use ML and DL models to predict ship fuel consumption or emissions in port areas. One such study is Schaub et al. (2019), where an ANN is applied to predict vessel emissions when maneuvering to enter or leave a port.

5.1.5. Ocean freight market condition prediction

As a proxy for dry bulk shipping stocks as well as a general shipping market bellwether, the BDI over different time frames is forecast by several existing studies using emerging approaches. In particular, two ML models are widely adopted, namely ANNs (Leonov and Nikolov, 2012; Zeng et al., 2016; Şahin et al., 2018; Bae et al., 2021) and SVRs (Han et al., 2014; Bao et al., 2016), considering various influencing factors such as past BDI values, iron ore freight volume, coal price, charter rates, and newbuilding development, to name just a few. Another popular prediction target in related studies is the sea freight rate, which is the price requested for transporting cargo from one place to another by sea and is mainly determined by the weight and volume of the cargo and the distance to the destination. ANNs are the most popular models for sea freight rate prediction; examples can be found in von Spreckelsen et al. (2014), Santos et al. (2014), Uyar and İlhan (2016), and Yang and Mehmed (2019). Moreover, several ML models are developed for comparison in Ubaid et al. (2020), including SVR, RF, and GBR, and in Næss (2018), including ANN, MLP, RNN, and LSTM. One may note that in all the literature covered in this subsection so far, apart from Næss (2018), where DL models are used to predict ocean freight market conditions, all the other studies use only ML models. This is mainly caused by the limited data that can be obtained: most of these studies have no more than 1,000 records in the whole dataset, while massive AIS data are used in Næss (2018) to support the use of DL models for freight rate prediction.

We also note that emerging approaches are seldom applied to the other shipping markets in the international shipping industry, i.e., the sale and purchase market, the newbuilding market, and the demolition market. Instead, classic economic and econometric models are widely applied to forecasting and estimation tasks in these markets, as model explainability and significance testing are of vital importance there.

5.2. Emerging approaches applied to address problems in the port side

5.2.1. Ship destination and arrival time prediction

Although the voyage destination and the corresponding ETA are required to be reported via AIS, these reported data have been shown to have low accuracy and are therefore of very limited use (Yang et al., 2021). Achieving accurate ship destination and arrival time prediction is therefore of significant importance. Accurate predictions are beneficial to several stakeholders in maritime transportation, such as cargo senders, forwarders and hinterland carriers, terminal operators, port authorities, and shipping lines. One should note that the prediction of the voyage destination and the prediction of the arrival time at a port have different focuses: the destination of a voyage is unknown and is the prediction target in the former, while the destination is already known in the latter. Meanwhile, the two issues are addressed together in some existing studies, as both are influenced by common external factors, such as ship characteristics, historical and current vessel trajectories, sailing speed, weather and sea conditions, and the departure port of the voyage.

Studies on ship destination prediction based on AIS can be divided into two categories: turning point based and trajectory extraction based. Regarding the first category, a framework for anomaly detection and route prediction called TREAD is developed by Pallotta et al. (2013). Another study uses density-based spatial clustering of applications with noise (DBSCAN) for turning region clustering and an ANN for next turning point prediction (Daranda, 2016). More existing studies fall within the second category. For example, several joint vessel destination and arrival time prediction models based on vessel trajectory extraction, using data streams produced by AIS and ML models, were proposed in the 2018 Grand Challenge; representative examples can be found in Amariei et al. (2018), Lin et al. (2018), Bodunov et al. (2018), and Nguyen et al. (2018). ML models are developed in most of these studies, while a DNN model is developed for vessel travel time prediction in Lin et al. (2018). The destination prediction in all the above studies is at a regional level. Zhang et al. (2020a) proposed a global-level vessel destination prediction model based on RF to evaluate trajectory similarities, and the destination port with the highest probability was recommended by further considering port frequency. More recently, Jia et al. (2021) adopted classification models such as RF and GBRT to predict the destination of global crude oil exports using cargo information, vessel information, geographical information, and macroeconomic data, while excluding vessel trajectory data.

Several prediction tasks can be viewed as equivalents of ship arrival time prediction, including late arrival or delay prediction, arrival punctuality prediction, and early arrival prediction; these can be formulated as either regression or classification problems, whereas ship arrival time prediction itself is a regression problem. As liner ships follow fixed routes and schedules, most of the current studies on port arrival time prediction are
designed within the context of container terminals and are based on classic ML models such as ANNs (e.g., Fancello et al., 2011; Parolas, 2016; Parolas et al., 2017; Meijer, 2017; Viellechner and Spinler, 2020), RFs (e.g., Cannas et al., 2013; Pani et al., 2015; Yu et al., 2018; Bussmann, 2019; Viellechner and Spinler, 2020), SVMs (e.g., Meijer, 2017; Viellechner and Spinler, 2020), and BNs (e.g., Salleh et al., 2017). There are also a small number of ship arrival time prediction models designed for all types of ships, which can be found in Kim et al. (2017) using refined case-based reasoning and in El Mekkaoui et al. (2020) using an ANN. Based on the prediction of port arrival time, port operation optimization models are developed, which can be found in Fancello et al. (2011) for optimizing port human resource allocation and in Yu et al. (2018) for improving daily port operation planning. Besides, the benefits of accurate ship arrival time prediction are analyzed from both qualitative and quantitative perspectives in Parolas (2016), Meijer (2017), Parolas et al. (2017), and Bussmann (2019). Ship arrival time prediction models have been developed not only in the academic literature but also in collaborative projects between government and industry, such as SAFER, carried out by the Maritime and Port Authority of Singapore and the IBM Research Singapore Laboratory (Yeo et al., 2019).

5.2.2. Port condition prediction

Emerging approaches have been applied to predict port conditions and thus to improve port management efficiency to a certain (but limited) extent. For example, regarding shipping demand prediction, an ANN and the combination of ANN and SARIMAX are applied to predict the domestic shipping demand of cement (Fışkın and Cerit, 2019). Both ML (SVR and RF) and DL (LSTM) models are developed to predict liquid bulk cargo volume using bill of lading data (Kim et al., 2021). ANNs combined with time series methods (Zhang et al., 2021) as well as SVRs (Ruiz-Aguilar et al., 2020) are also applied to predict port cargo volume. Regarding port traffic volume prediction, AIS data are used to predict the daily ship traffic volume at the Shanghai port using ANNs (Wang et al., 2017). In addition, outbound truck traffic flows at major container terminals in the port of Rotterdam are also predicted by ANNs (Nadi et al., 2021).

6. Issues in applying emerging approaches to practical maritime transport problems

As shown in Fig. 1, four layers are associated with developing emerging approaches to address practical maritime transport problems, namely data, model, users, and targets, and each layer is nested in the next one. Relevant data must first be collected; a model is then built using the data; after that, users of the model make decisions based on the model output; finally, the decisions are applied to the targets. We first examine the challenges associated with each layer, with examples (mostly from PSC inspection), and then discuss some possible solutions.

6.1. Data

First, the data sources used to develop an emerging approach-based solution should be compatible with the focus of the problem. This means that the data sources should be neither too limited to cover the full picture nor so broad that they contain interfering information.

Example of selection of features: To predict the ship risk level in PSC inspection, generic ship factors, such as ship dimensions (e.g., length, beam, depth, and draft), flag, data on the management company, operator information, and vessel casualties, are useful and necessary for evaluating ship risk, as they are highly related to the ship's structural condition, the safety of navigation, and safety management, which are the key points of PSC. In contrast, ship operation information, such as the ship's sailing route, the average sailing speed over its voyages, the freight rate, and the charter mode, deserves less attention, since the cost of obtaining these data can be very high, while port authorities are not much interested in such information when selecting high-risk ships for inspection.

Example of selection of output variable: In the context of high-risk ship identification and selection in PSC inspection, the goal is to enhance maritime safety, to protect the marine environment, and to guarantee good living and working conditions for seafarers through ship inspection. Therefore, the ship risk level should be evaluated by related risk indicators, e.g., the number of deficiencies detected, the probability of detention (or of fatal deficiencies being detected), and the probability of involvement in maritime incidents or accidents, as opposed to other types of risks, such as pirate attacks, which have nothing to do with the goal of PSC.

Second, after proper data are collected, data preprocessing is essential, as real-world datasets are usually noisy and contain missing values and data errors. As an important part of feature engineering, dealing effectively with missing values and errors in specific problem settings can sometimes be a challenge, as different methods suit different problems. Common methods used to address missing values in raw data, together with examples, are provided in Table 3. It should be mentioned that some emerging approaches can process missing values themselves, with tree-based models being a representative example.
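To make two rows of Table 3 more concrete, the following Python sketch (standard library only) shows one possible way to implement time-series imputation for noon-report drafts and mean imputation with an added binary missing-value indicator. It is a minimal illustration under stated assumptions, not the method of any cited study: the function names, voyage identifiers, and all sample values are hypothetical, and the records are assumed to be already sorted by report date.

```python
import statistics
from typing import List, Optional, Tuple


def impute_noon_report_drafts(
    drafts: List[Optional[float]], voyages: List[str]
) -> List[Optional[float]]:
    """Fill missing drafts (None) from neighbouring noon reports.

    A gap takes the mean of the previous and next day's observed drafts when
    both belong to the same voyage, or whichever one of them is available.
    Neighbours are read from the original list, so consecutive gaps are not
    chained; a gap with no same-voyage neighbour stays None.
    """
    filled = list(drafts)
    for i, value in enumerate(drafts):
        if value is not None:
            continue
        neighbours = [
            drafts[j]
            for j in (i - 1, i + 1)
            if 0 <= j < len(drafts)
            and voyages[j] == voyages[i]
            and drafts[j] is not None
        ]
        if neighbours:
            filled[i] = statistics.fmean(neighbours)
    return filled


def mean_impute_with_indicator(
    values: List[Optional[float]],
) -> Tuple[List[float], List[int]]:
    """Replace None with the mean of the observed values and return a binary
    indicator that is 1 where the value was missing and 0 otherwise."""
    observed = [v for v in values if v is not None]
    mean_value = statistics.fmean(observed)
    filled = [mean_value if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator


# Hypothetical drafts (m) from consecutive daily noon reports of two voyages.
drafts = [10.25, None, 10.75, 11.0, None, None, 9.5]
voyages = ["V1", "V1", "V1", "V1", "V1", "V2", "V2"]
print(impute_noon_report_drafts(drafts, voyages))
# [10.25, 10.5, 10.75, 11.0, 11.0, 9.5, 9.5]

# Hypothetical ship ages (years) with one missing entry.
print(mean_impute_with_indicator([10, None, 20, 30]))
# ([10, 20.0, 20, 30], [0, 1, 0, 0])
```

Restricting neighbours to the same voyage mirrors the condition stated in Table 3; consecutive gaps are deliberately not chained, so a longer sensor outage stays visible as missing data instead of being silently smoothed over.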
Data errors can be classified into four categories: outliers, duplicates, rule violations, and pattern violations. We focus on methods dealing with outliers, which are data values that deviate significantly from the distribution of a feature's values. Common methods to detect outliers are briefly summarized as follows.

• z-score
The z-score of a feature value x, z = (x − μ)/σ, indicates how many standard deviations σ the value lies from the mean μ of all values of that feature, under the assumption that these values follow a Gaussian distribution. If the absolute z-score of a feature value exceeds a given threshold, usually 2.5, 3, or 3.5, the corresponding record is regarded as an outlier.
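The z-score rule can be sketched in a few lines of Python. This is a minimal sketch assuming the values of a single feature are held in a list; it uses the population standard deviation, which is one common choice rather than a requirement, and the function name and fuel consumption figures are hypothetical.

```python
import statistics
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return the indices of values whose absolute z-score exceeds threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical: nothing deviates from the mean
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


# Hypothetical daily fuel consumption (tonnes/day) with one suspicious entry.
fuel = [30.1, 29.8, 31.0, 30.5, 29.9, 30.2, 30.7, 29.6, 30.3, 30.0,
        30.4, 29.7, 30.6, 30.1, 29.9, 30.2, 30.8, 29.5, 30.3, 95.0]
print(zscore_outliers(fuel))  # [19]
```

One caveat worth keeping in mind: an extreme value also inflates σ, so with the population standard deviation no single z-score can exceed √(n − 1). With 10 or fewer values, a threshold of 3 can therefore never be exceeded, which is one reason to complement the z-score with a rank-based rule such as the IQR.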

• Interquartile range (IQR)
The IQR is used to measure variability in data. Data values are first sorted in ascending order and split into four equal parts, where Q1, Q2, and Q3 refer to the 25th, 50th, and 75th percentiles of the data, respectively. The IQR is defined as the difference between the third and the first quartiles, i.e., IQR = Q3 − Q1. Data points that fall below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are regarded as outliers. The IQR rule can be combined with a boxplot, which visualizes the data distribution, to make outlier detection more intuitive.

Fig. 1. Layers of the development of emerging approaches.
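A corresponding minimal Python sketch of the IQR rule is given below. It assumes the quartiles are computed with the standard library's statistics.quantiles using the "inclusive" method; other quartile conventions can move the fences slightly on small samples. The function name and the anchorage waiting times are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def iqr_outliers(
    values: Sequence[float], k: float = 1.5
) -> Tuple[List[float], Tuple[float, float]]:
    """Flag values below Q1 - k*IQR or above Q3 + k*IQR.

    Returns the flagged values (in their original order) and the two fences.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return outliers, (lower, upper)


# Hypothetical waiting times (hours) of ships at anchorage.
waits = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 30]
print(iqr_outliers(waits))  # ([30], (-0.25, 9.75))
```

Because the fences depend only on the quartiles, replacing the 30 h waiting time with an even larger value leaves them unchanged, whereas the mean and standard deviation used by the z-score would shift.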

Table 3
Summary of methods to deal with missing values in datasets.

Method: Re-obtaining of missing data
Description: Query the specific missing fields from other data sources.
Example: PSC inspection: If the dimension information of a certain vessel is missing in the database provided by the corresponding MoU, one may search for the related information of that vessel in external ship specification databases, such as Lloyd's List Intelligence and Shipping Intelligence Network.
Note: The data format of external data sources may not be consistent with the original data format.

Method: Discarding data
Description: Delete the records and the features with too many missing values directly from the whole dataset.
Example: Ship energy efficiency prediction: If a ship sailing record generated by onboard sensors has only the current geographical location available, while all the other fields are missing due to transmission failure, the record could be deleted from the whole dataset. PSC inspection: Suppose one would like to incorporate ship draft information as an indicator of ship dimension for risk prediction. However, the draft field of more than half of the ships is missing in the available ship specification database. Under this condition, one can delete this feature from the whole dataset.
Note: If the dataset is large enough and only a few records have many missing fields, record deletion is a common and effective approach. However, when many records have missing values and the missingness is not random, record deletion might cause bias. There is no rule of thumb for feature deletion, but one should keep in mind that dropping a feature might lead to (heavy) information loss.

Method: Imputation for non-time-series data
Description: Fill missing values using the mean, median, or mode of the corresponding feature over all or similar record(s) in the dataset.
Example: PSC inspection: If a container ship has a missing 'age' value, one could fill it with the mean, median, or mode of 'age' over all ships in the dataset, or over all container ships.
Note: Record similarity could be evaluated by methods such as k-means clustering and distance measurements. Moreover, after imputation based on statistical estimates, an additional binary feature indicating whether each record had a missing value for a certain feature can be added, which takes 1 if the value is missing and 0 otherwise.

Method: Imputation for time-series data
Description: Fill missing values using the last or next observed value in the time series, or the mean of the last and next values (equivalent to linear interpolation for a single missing point between equally spaced observations).
Example: Ship energy efficiency prediction: If the field 'draft' is absent in a noon report, it can be replaced by the 'draft' value from the noon report of either the previous day or the next day, or by the mean of the two, provided that they belong to the same voyage.

Method: Multiple imputation
Description: A statistical technique that uses the distribution of the observed data to estimate a set of plausible values for the missing data.
Example: The working mechanism of multiple imputation is complex and hard to illustrate with a short example. Readers are referred to Rubin (1996), Schafer (1999), and Royston (2004) for more information.

Method: Regression
Description: Predict missing values of a certain feature using regression approaches.
Example: PSC inspection: Suppose there are several missing values for the feature 'age' in the dataset. First, predictors of 'age' are identified using a correlation matrix. A regression model, e.g., linear regression, is then constructed with the selected predictors as the input and 'age' as the output.
Note: If several features have missing values, they can be addressed in turn. One should also beware of overfitting, as the feature value is predicted from other features.

• UL techniques
In unsupervised settings, the training data are deemed to consist of both normal and outlier data points, which are unlabeled. A typical outlier detection method is clustering, which assumes that normal clusters contain a much larger number of records, while anomalous clusters contain far fewer records, since outliers account for only a small percentage of the total data and have distinct feature values. Popular algorithms used for clustering- or distance-based outlier detection include k-means, fuzzy c-means, k-NN, and DBSCAN. Apart from clustering methods, other types of UL techniques include self-organizing maps, one-class SVM, and isolation forest.

After outlier detection, whether to remove the detected outliers from the dataset or to revise them deserves careful investigation. On the one hand, outliers may be caused by data collection and measurement errors, sampling problems, and natural variations, and may thus distort the performance of the prediction models developed. On the other hand, the detected outliers may contain valuable and irreplaceable information, so deleting them directly could cause serious information loss.

6.2. Model

First, the applicable scenarios of DL and ML models are different. As the structure of DL models is usually much more complex than that of traditional ML models (as they have a larger number of layers in a neural
network with many nodes in each layer, and the nodes are interconnected in an intricate manner), they need much more data for model training according to the Vapnik–Chervonenkis (VC) dimension theory (Abu-Mostafa, 1989). Meanwhile, owing to their sophisticated and versatile structure, DL models are more suitable than traditional ML models for unstructured data such as audio, images, text, and video. Table 2 suggests that various types of ML models, especially linear regression models, SVMs, ANNs, tree-based models, and k-NNs, have been widely adopted in studies on maritime transport, while only basic types of DL models, e.g., CNN, RNN, and LSTM, have been developed in related studies. In particular, DL models are usually applied to problems where adequate data can be obtained from AIS (e.g., ship trajectory prediction and ship risk prediction) and onboard sensors (e.g., ship energy efficiency prediction). In contrast, no DL model has been developed for ship inspection planning. The main reason is that the cost of generating one ship inspection record is very high. Taking the port of Hong Kong as an example, PSC inspection is conducted purely by manpower, a typical PSC inspection lasts two to three hours, and the average hourly cost of a (follow-up) inspection is over 150 USD (Tokyo MoU, 2016). Moreover, even though Hong Kong is one of the largest and busiest ports in the world, usually no more than five inspections were conducted there on each working day before the COVID-19 pandemic. Therefore, only around 1,000 records can be generated per year. It can be concluded that DL models are not suitable for all practical problems in maritime transport. Instead, they are suitable for problems where massive data can be obtained at an acceptable data acquisition cost, or for problems featuring unstructured data.

Second, developing a proper emerging approach to solve a certain problem in maritime transport is not a trivial task. To begin with, the category of the problem needs to be decided, e.g., classification, regression, clustering, or association rule mining. Then, models falling within that category should be tried and evaluated. One tricky issue in this step is how to determine model parameters and tune model hyperparameters. After model construction, model performance should be evaluated with proper metrics.

Example: Suppose one aims to develop an ML model to predict ship detention probability in PSC inspection. This is clearly a classification problem, and one challenge is that the detention rate is very low, usually no more than 5%. That is, on average only about five ships are detained out of every 100 ships inspected. The dataset is therefore highly imbalanced: ships without detention heavily outnumber ships with detention. Considering this characteristic, either data preprocessing methods that form a balanced dataset or data-driven models with a special structure able to deal with this issue should be applied. Moreover, metrics suited to evaluating classification models on imbalanced datasets should be adopted. For example, accuracy, the most widely used metric for evaluating ordinary classification models, is not appropriate in this case: a model that predicts "no detention" for every ship would already reach about 95% accuracy while identifying no detained ship at all. Instead, recall, precision, and F1 score could be used. A representative existing study addressing ship detention prediction is Yan et al. (2021b).

Third, most emerging approaches have the property of being a 'black box', which means that the model's working mechanism and prediction results cannot be understood by humans (BNs are a typical exception, as they can be visualized and are thus partially explainable). Emerging approaches are expected to possess the property of explainability (or interpretability), which aims to make the internal mechanisms of an automated 'black-box' model and its prediction results understandable to a human. It is well known that models not designed with self-explanatory characteristics may engender pitfalls. Therefore, the explainability of a black-box model could bridge the gap between making a prediction and making a decision, which is closely related to the ultimate goal of applying the model to practical maritime problems (Athey, 2017). In some cases, explainability can be as important as the accuracy of a black-box model. In fact, the explainability of a data-driven model largely determines its application prospects, as it is almost impossible for conservative practitioners to trust and use a model without knowing its working mechanism (Molnar, 2021). The requirement of model explainability is particularly important in maritime policy-related applications due to the complex structure of the maritime industry, which involves several stakeholders. According to Doshi-Velez and Kim (2017), the need for interpretability stems from "incompleteness", which creates a fundamental barrier to optimization and evaluation. These authors claim that model explainability becomes unnecessary only under two conditions: 1) unacceptable results have no significant consequences, and 2) the problem is sufficiently well studied and validated in real applications that the stakeholders trust the system's recommendations. However, for most of the policy-making processes in the traditional and conservative maritime industry, neither condition can be met.

Example: Regarding the first condition, since a typical PSC inspection takes several hours, a ship selected for inspection may have its schedule delayed, which affects both carriers and shippers. Therefore, if low-risk ships are wrongly identified as high-risk ships by the model and are frequently inspected, ship owners and operators will be less motivated to keep their ships in satisfactory condition, and this may even cause conflicts between their shipping companies and the port state. Conversely, if a substandard ship of very low quality is not identified by the model and is not inspected before leaving the port, a threat to maritime safety will arise. Regarding the second condition, since shipping is a relatively traditional industry, policy making is mainly assisted by expert systems instead of emerging approaches, and the stakeholders' trust in emerging approaches and their predictions is still very weak. Consequently, these models can be trusted and used only when the internal working process of the emerging approaches and the prediction results they generate become explainable to the users. However, explaining models to non-technical personnel, who are usually the model users, could turn out to be a major challenge, as different audiences expect different levels of explanation.

Although the development of explainable emerging approach-based models is a relatively new and not yet well-studied research topic, several strategies currently exist to improve model explainability, such as developing intrinsically interpretable models like linear regression and decision trees, providing summary statistics and visualizations, explaining the model's internal working mechanisms, and analyzing individual data points (Molnar, 2021). The advantages of explainable black-box emerging approaches include easier debugging and improvement, clarification of model fairness, privacy protection, increased model reliability and robustness, enhanced user trust, and the possibility of further investigation (Doshi-Velez and Kim, 2017).

6.3. Users

One guiding principle for the use of emerging approach-based models is to avoid imposing any burden or pressure on the users, as applying emerging approaches usually means replacing the current (naïve) method. In the context of maritime transport, the developers and the users are usually different people. On the one hand, emerging approaches are usually developed by engineers or researchers with data analysis and shipping backgrounds, and these models usually feature multifarious inputs and outputs as well as complex structures, which means that data processing and model operation can be sophisticated. On the other hand, the users of these models are usually maritime transport practitioners, such as government officials, technicians in shipping companies, and crew members, who may lack expertise in data acquisition and processing as well as in model operation. It is therefore crucial to avoid requiring users to input external data or to carry out too many extra operations when using the models. Ideally, a straightforward and friendly graphical user interface should be provided for easier human-computer interaction, requiring little input information and generating direct output.

Example: In a decision support tool developed for ship selection in PSC, to facilitate interactions between the emerging approach and the

users, a website-based online decision support tool with easy access was developed¹¹. All data used by the system are retrieved automatically from the information system employed by that port state and from a popular database of common ship specifications. Users can therefore read or download the model outputs without doing anything else (e.g., without inputting any data or running the model). As a consequence, using the emerging approach is consistent with users' past working habits and imposes no extra operational burden.

¹¹ https://sites.google.com/site/wangshuaian/research-interest/ai-for-psc-at-hong-kong

Another user-related issue is the limited universal applicability of any given model, as a model is usually constructed using data from a particular geographic region, ship, or port. Different entities have different characteristics. A data-driven prediction model built on data from one entity is therefore hard to apply to predicting the future conditions of another. To extend a model developed for one entity to another, new data need to be collected, and the model's parameters and hyperparameters should also be updated. Alternatively, innovative techniques such as transfer learning can be applied.

Example: Suppose an ML model has been developed to predict the number of deficiencies of foreign ships visiting the port of Hong Kong, using Hong Kong's historical inspection records. If this model is applied directly to another port, e.g., the port of Shanghai, its prediction performance is likely to be poor. The main reason is that inspectors at different ports usually have different backgrounds and thus different expertise, and inspection focuses vary across port authorities. As a result, the number of deficiencies detected on a ship could differ considerably depending on which port inspects it. To predict ship deficiency numbers at the port of Shanghai, one can collect inspection records at that port and re-train the ML model, with proper adjustment of its parameters and hyperparameters.

6.4. Targets

Fairness to the targets is another key issue in applying emerging approaches to practical maritime problems. Decision-making in the traditional maritime sector is largely based on widely accepted rules derived from long-term experience and shipping domain knowledge. Results from models that do not respect such rules will therefore be deemed unfair to the targets and will be less applicable.

Example: In developing a high-risk ship selection decision support tool for PSC, consider ships with a worse management company, recognized organization, or flag performance. Such ships should be predicted to be of higher risk when all other factors (e.g., ship age, type, and performance in historical inspections) are the same; that is, the output should be monotonic in some features. Fair prediction consistent with domain knowledge is crucial for the effective application of a ship risk prediction model. Suppose the ships recommended by the model for inspection were not those that should be inspected according to domain knowledge (i.e., ships with a worse management company, recognized organization, or flag performance). Ship owners might then consider the model unfair and lack the motivation to maintain their ships' conditions diligently. This would contradict the goal of PSC, which is to motivate ship owners to provide high-quality shipping (Yan et al., 2021a).

However, practical data usually contain errors and noise and are of limited quantity. Data-driven models developed with emerging approaches may therefore not capture domain knowledge perfectly. To incorporate such domain knowledge effectively, tailored data-driven models could be developed by modifying the model structure or imposing constraints during model development. For example, Yan et al. (2021a) developed an XGBoost model, a state-of-the-art ML method, with monotonic constraints on certain features with respect to the prediction target for high-risk ships. These constraints make the prediction results comply with shipping domain knowledge.

7. Future research directions

Based on the above review and discussion, we propose the following directions for future research on applying emerging approaches to practical problems in maritime transport, from the perspectives of data, model, users, and targets.

From the perspective of data, which is the foundation of data-driven emerging approaches, the above review shows that free public data in the maritime industry are quite limited. This is not surprising. Maritime transportation involves a large number of different and independent actors, and each is willing to share related data only when doing so advances its self-interest and preserves what it sees as a competitive edge. As indicated by Table 2, free public data sources are mainly provided by maritime institutions, i.e., port authorities, PSC MoUs, and the IMO. These institutions maintain port statistics, PSC inspection records, and ship accident data in their central databases. However, collecting data from these databases may be time-consuming, as frequent queries and copy-paste operations are needed. Other data sources are either commercial companies charging high annual subscription fees (e.g., for AIS, ship specification, and sea and weather data) or port and shipping companies providing data anonymously (e.g., ship sailing records). Clearly, there are digital inefficiencies in maritime transport chains. Therefore, to facilitate more in-depth emerging approach-based research on maritime transport, “it is paramount that all the actors involved share their related intentions proactively in real-time to facilitate seamless interactions” (UNCTAD, 2018). This calls for digital collaboration in ship-to-port, port-to-port, and port-to-hinterland data sharing.

From the perspective of modeling, future studies adopting emerging approaches should focus more on incorporating shipping domain knowledge into data-driven prediction models. In existing studies, domain knowledge has been used to achieve:
- precise and accurate problem definition;
- high-quality data collection;
- effective data preprocessing and feature engineering;
- interpretation of prediction results.

However, comprehensive consideration of specific shipping domain knowledge and of model users' preferences is rare in the modeling process itself, including model choice, model structuring, and parameter setting. For example, in vessel fuel consumption prediction, shipping domain knowledge tells us that fuel consumption should be a convex and monotonically increasing function of sailing speed. Nevertheless, predictions given by emerging approaches may violate this relationship because of noise in the data and model deficiencies. Such violations would limit the acceptance and application of the developed model by maritime practitioners. How to efficiently incorporate domain knowledge into data-driven models is still an open question and a research hotspot. In rudimentary applications, however, it can be achieved by adjusting the model structure, imposing constraints, or revising prediction results post hoc (Yan et al., 2020a, 2021a).

From the perspective of users, much more effort should be made to bridge the gap between making a prediction and making a decision by improving the explainability of ‘black-box’ data-driven models (Athey, 2017). It is not difficult to open up these ‘black-box’ models to a certain extent via existing model explanation methods (Molnar, 2021). These methods can be classified in three ways:
- intrinsic or post hoc, by the stage at which they are applied;
- model-specific or model-agnostic, by their range of application;
- local or global, by the scope of interpretation.

Nevertheless, achieving precise and human-understandable explanations of ‘black-box’ models, as well as evaluating the performance of such explanations, remain open questions (Doshi-Velez and Kim, 2017). Furthermore, providing effective explanations of the prediction models and results to maritime practitioners,


who are usually laymen in data analytics but have rich industrial experience, as well as generating impactful management insights, could be even more difficult. Therefore, providing user-centered explanations of ‘black-box’ models in the traditional maritime industry deserves more exploration.

Finally, from the perspective of targets, future studies should use emerging approaches to address a wider range of practical maritime problems. Table 2 indicates that only a few maritime problems have been tackled with emerging approaches, mainly as prediction tasks. Several important issues on both the shipping and port sides have not yet been considered.

On the shipping side, vessel routing and scheduling, fleet planning and development, and shipping network design determine the competitiveness of shipping companies, yet they have not been optimized with emerging approaches. Several vessel management issues, e.g., labor employment and shipping service management, are also not addressed by emerging approaches.

Fewer topics are covered on the port side. Vessel destination and arrival time prediction at the regional level has been achieved in several existing studies, but there are few studies at the global level. Moreover, most of these studies are concerned with liner ships, which follow a fixed route and schedule, so such prediction can be expected to be much easier. Little effort has been made for tramp ships. Consequently, there are relatively few emerging approach-based models for port congestion prediction, port resource allocation management, and port risk and security prediction, all of which build on ship destination and arrival time predictions. Similarly, little research exists on predicting port throughput in the near or distant future, so work on predicting and evaluating port competitiveness and on managing port development is rare.

Moreover, emerging approaches can be expected to achieve more accurate and timely forecasts and predictions than traditional approaches. Cases in more complex contexts could then be handled more effectively and efficiently. One example is optimization of ad hoc shipping services, which requires accurate shipping demand and port condition predictions under uncertainty.

8. Conclusions

This essay first summarizes existing research topics in maritime transport and the classic approaches developed to address them. It then briefly introduces emerging technologies, i.e., ML and DL models, and their application to practical maritime problems in the existing academic literature. In particular, we classify problems in maritime transport into two main categories, shipping and port, and give an overview of representative problems in each category. Common data sources used to address these problems are also introduced. Based on this broad-ranging review, we further discuss issues in applying emerging approaches to practical maritime transport problems from the perspectives of data, model, users, and targets. Major challenges and their solutions from each perspective are discussed and illustrated with examples from PSC inspection. Finally, we outline promising future research directions for adopting emerging approaches to solve practical problems in maritime transport, also from the perspectives of data, model, users, and targets.

This essay is the first to provide a comprehensive review of existing studies on developing ML and DL models, together with the popular data sources used, to handle practical problems in maritime transport. We believe that although challenges are inevitable, with the rapid advancement of digitization in the maritime industry, emerging approaches will become increasingly prevalent in the near future.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors thank the editor and the reviewer for their constructive comments on improving this manuscript. This research is supported by the National Natural Science Foundation of China (Grant numbers 72025103, 71831008, 72071173).

References

Abaei, M.M., Hekkenberg, R., BahooToroody, A., 2021. A multinomial process tree for reliability assessment of machinery in autonomous ships. Reliab. Eng. Syst. Saf. 210, 107484.
Abebe, M., Noh, Y., Seo, C., Kim, D., Lee, I., 2021. Developing a ship collision risk index estimation model based on Dempster–Shafer theory. Appl. Ocean Res. 113, 102735.
Abu-Mostafa, Y.S., 1989. The Vapnik-Chervonenkis dimension: information versus complexity in learning. Neural Comput. 1 (3), 312–317.
Ahn, J.H., Rhee, K.P., You, Y.J., 2012. A study on the collision avoidance of a ship using neural networks and fuzzy logic. Appl. Ocean Res. 37, 162–173.
Amariei, C., Diac, P., Onica, E., Roşca, V., 2018. Cell grid architecture for maritime route prediction on AIS data streams. In: Proceedings of the 12th ACM International Conference on Distributed and Event-Based Systems, pp. 202–204.
Athey, S., 2017. Beyond prediction: using big data for policy problems. Science 355 (6324), 483–485.
Bae, S.H., Lee, G., Park, K.S., 2021. A Baltic dry index prediction using deep learning models. J. Korea Trade 25 (4), 17–36.
Bakdi, A., Glad, I.K., Vanem, E., 2021. Testbed scenario design exploiting traffic big data for autonomous ship trials under multiple conflicts with collision/grounding risks and spatio-temporal dependencies. IEEE Trans. Intell. Transport. Syst. https://doi.org/10.1109/TITS.2021.3095547 (in press).
Bao, J., Pan, L., Xie, Y., 2016. A new BDI forecasting model based on support vector machine. In: Proceedings of the 2016 IEEE Information Technology, Networking, Electronic and Automation Control Conference, pp. 65–69.
Bishop, C., 2006. Pattern Recognition and Machine Learning. Springer, Berlin.
Bodunov, O., Schmidt, F., Martin, A., Brito, A., Fetzer, C., 2018. Real-time destination and ETA prediction for maritime traffic. In: Proceedings of the 12th ACM International Conference on Distributed and Event-Based Systems, pp. 198–201.
Bussmann, N.H., 2019. Predicting Arrival Times of Container Vessels: a Machine Learning Application. M.S. Dissertation. University of Twente, Enschede, the Netherlands.
Cannas, M., Fadda, P., Fancello, G., Frigau, L., Mola, F., 2013. Delay prediction in container terminals: a comparison of machine learning methods. In: Proceedings of the 13th World Conference on Transportation Research, vols. 1–6.
Choi, S.P., Lee, J.U., Park, J.B., 2021. Application of deep reinforcement learning to predict shaft deformation considering hull deformation of medium-sized oil/chemical tanker. J. Mar. Sci. Eng. 9 (7), 1–29.
Chung, W.H., Kao, S.L., Chang, C.M., Yuan, C.C., 2020. Association rule learning to improve deficiency inspection in port state control. Marit. Pol. Manag. 47 (3), 332–351.
Clarkson, 2021. Clarkson research. https://www.clarksons.net/portal/ (Accessed 1 September 2021).
Daranda, A., 2016. Neural network approach to predict marine traffic. Baltic J. Modern Computing 4 (3), 483–495.
Das, T., Goerlandt, F., Tabri, K., 2021. An optimized metamodel for predicting damage and oil outflow in tanker collision accidents. In: Proceedings of the Institution of Mechanical Engineers, Part M: Journal of Engineering for the Maritime Environment, 1–15.
Doshi-Velez, F., Kim, B., 2017. Towards a Rigorous Science of Interpretable Machine Learning. arXiv preprint arXiv:1702.08608.
Du, Y., Meng, Q., Wang, S., Kuang, H., 2019. Two-phase optimal solutions for ship speed and trim optimization over a voyage using voyage report data. Transp. Res. Part B Methodol. 122, 88–114.
El Mekkaoui, S., Benabbou, L., Berrado, A., 2020. Predicting ships estimated time of arrival based on AIS data. In: Proceedings of the 13th International Conference on Intelligent Systems: Theories and Applications, pp. 1–6.
ERA5, 2021. ERA5 hourly data on single levels from 1979 to present. https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview (Accessed 4 October 2021).
Fan, L., Zhang, Z., Yin, J., Wang, X., 2019. The efficiency improvement of port state control based on ship accident Bayesian networks. Proc. Inst. Mech. Eng. O J. Risk Reliab. 233 (1), 71–83.
Fan, L., Zheng, L., Luo, M., 2020. Effectiveness of Port State Control Inspection Using Bayesian Network Modelling. Maritime Policy & Management, pp. 1–18.
Fan, S., Blanco-Davis, E., Yang, Z., Zhang, J., Yan, X., 2020a. Incorporation of human factors into maritime accident analysis using a data-driven Bayesian network. Reliab. Eng. Syst. Saf. 203, 107070.
Fan, S., Yang, Z., Blanco-Davis, E., Zhang, J., Yan, X., 2020b. Analysis of maritime transport accidents using Bayesian networks. Proc. Inst. Mech. Eng. O J. Risk Reliab. 234 (3), 439–454.
Fancello, G., Pani, C., Pisano, M., Serra, P., Zuddas, P., Fadda, P., 2011. Prediction of arrival times and human resources allocation for container terminal. Marit. Econ. Logist. 13 (2), 142–173.
Farag, Y.B., Olçer, A.I., 2020. The development of a ship performance model in varying operating conditions based on ANN and regression techniques. Ocean Eng. 198, 106972.


Fışkın, C.S., Cerit, A.G., 2019. Forecasting domestic shipping demand of cement: comparison of SARIMAX, ANN and Hybrid SARIMAX-ANN. In: Proceedings of the 4th International Conference on Computer Science and Engineering, pp. 68–72.
Gang, L., Ma, J., Yao, J., 2021. Decision-making of vessel collision avoidance based on support vector regression. In: Proceedings of the 2nd International Conference on Artificial Intelligence and Information Systems, vols. 1–6.
Gao, M., Shi, G.Y., Liu, J., 2020. Ship encounter azimuth map division based on automatic identification system data and support vector classification. Ocean Eng. 213, 107636.
Gkerekos, C., Lazakis, I., Theotokatos, G., 2019. Machine learning models for predicting ship main engine fuel oil consumption: a comparative study. Ocean Eng. 188, 106282.
Han, Q., Yan, B., Ning, G., Yu, B., 2014. Forecasting dry bulk freight index with improved SVM. Math. Probl. Eng. 2014, 460684.
Hänninen, M., Kujala, P., 2014. Bayesian network modeling of port state control inspection findings and ship accident involvement. Expert Syst. Appl. 41 (4), 1632–1646.
Hänninen, M., Banda, O.A.V., Kujala, P., 2014. Bayesian network model of maritime safety management. Expert Syst. Appl. 41 (17), 7837–7846.
Huang, Y., Chen, L., Chen, P., Negenborn, R.R., Van Gelder, P.H.A.J.M., 2020. Ship collision avoidance methods: state-of-the-art. Saf. Sci. 121, 451–473.
IMO, 2021a. Maritime safety. https://www.imo.org/en/OurWork/Safety/Pages/default.aspx (Accessed 20 September 2021).
IMO, 2021b. Port state control. https://www.imo.org/en/OurWork/MSAS/Pages/PortStateControl.aspx (Accessed 3 April 2021).
Jia, H., Adland, R., Wang, Y., 2021. Global oil export destination prediction: a machine learning approach. Energy J. 42 (4), 1–21.
Khan, R.U., Yin, J., Mustafa, F.S., Liu, H., 2020. Risk assessment and decision support for sustainable traffic safety in Hong Kong waters. IEEE Access 8, 72893–72909.
Kim, S., Kim, H., Park, Y., 2017. Early detection of vessel delays using combined historical and real-time information. J. Oper. Res. Soc. 68 (2), 182–191.
Kim, S., Sohn, W., Lim, D., Lee, J., 2021. A multi-stage data mining approach for liquid bulk cargo volume analysis based on bill of lading data. Expert Syst. Appl. 115304.
Larochelle, H., Bengio, Y., Louradour, J., Lamblin, P., 2009. Exploring strategies for training deep neural networks. J. Mach. Learn. Res. 10 (1), 1–40.
Le, L.T., Lee, G., Park, K.S., Kim, H., 2020. Neural network-based fuel consumption estimation for container ships in Korea. Marit. Pol. Manag. 47 (5), 615–632.
Lema, E., Papaioannou, D., Vlachos, G.P., 2014. Investigation of coinciding shipping accident factors with the use of partitional clustering methods. In: Proceedings of the 7th International Conference on PErvasive Technologies Related to Assistive Environments, vols. 1–4.
Leonov, Y., Nikolov, V., 2012. A wavelet and neural network model for the prediction of dry bulk shipping indices. Marit. Econ. Logist. 14 (3), 319–333.
Li, K.X., Yin, J., Bang, H.S., Yang, Z., Wang, J., 2014. Bayesian network with quantitative input for maritime risk analysis. Transportmetrica: Transport. Sci. 10 (2), 89–118.
Li, W., Zhang, C., Ma, J., Jia, C., 2019. Long-term vessel motion predication by modeling trajectory patterns with AIS data. In: Proceedings of the 5th International Conference on Transportation Information and Safety, pp. 1389–1394.
Li, B., Lu, J., Lu, H., Li, J., 2021a. Predicting Maritime Accident Consequence Scenarios for Emergency Response Decisions Using Optimization-Based Decision Tree Approach. Maritime Policy & Management. https://doi.org/10.1080/03088839.2021.1959074.
Li, G., Weng, J., Hou, Z., 2021b. Impact analysis of external factors on human errors using the ARBN method based on small-sample ship collision records. Ocean Eng. 236, 109533.
Lin, C.X., Huang, T.W., Guo, G., Wong, M.D., 2018. MtDetector: a high-performance marine traffic detector at stream scale. In: Proceedings of the 12th ACM International Conference on Distributed and Event-Based Systems, pp. 205–208.
Liu, X., He, W., Xie, J., Chu, X., 2020. Predicting the trajectories of vessels using machine learning. In: 5th International Conference on Control, Robotics and Cybernetics, pp. 66–70.
Liu, K., Yu, Q., Yuan, Z., Yang, Z., Shu, Y., 2021. A systematic analysis for maritime accidents causation in Chinese coastal waters using machine learning approaches. Ocean Coast Manag. 213, 105859.
Ma, J., Jia, C., Yang, X., Cheng, X., Li, W., Zhang, C., 2020. A data-driven approach for collision risk early warning in vessel encounter situations using attention-BiLSTM. IEEE Access 8, 188771–188783.
Man, Y., Sturm, T., Lundh, M., MacKinnon, S.N., 2020. From ethnographic research to big data analytics–a case of maritime energy-efficiency optimization. Appl. Sci. 10 (6), 2134.
Mao, S., Tu, E., Zhang, G., Rachmawati, L., Rajabally, E., Huang, G.B., 2018. An automatic identification system (AIS) database for maritime trajectory prediction and data mining. In: Proceedings of ELM-2016, pp. 241–257.
Meijer, R., 2017. Predicting the ETA of a Container Vessel Based on Route Identification Using AIS Data. M.S. Dissertation. Delft University of Technology, Delft, the Netherlands.
Mitchell, M., 1998. An Introduction to Genetic Algorithms. MIT Press, Cambridge, United States of America.
Molnar, C., 2021. Interpretable Machine Learning. https://christophm.github.io/interpretable-ml-book/ (Accessed 10 May 2021).
MoU, Tokyo, 2016. Information on fees and charges by authorities for follow-up PSC inspection. http://www.tokyo-mou.org/doc/INFORMATION%20FOR%20PSC%20INSPECTION%20FEES%20AND%20CHARGES%20BY%20AUTHORITIES-ver16-r.pdf (Accessed 17 August 2021).
Murray, B., Perera, L.P., 2020. A dual linear autoencoder approach for vessel trajectory prediction using historical AIS data. Ocean Eng. 209, 107478.
Nadi, A., Sharma, S., Snelder, M., Bakri, T., van Lint, H., Tavasszy, L., 2021. Short-term prediction of outbound truck traffic from the exchange of information in logistics hubs: a case study for the port of Rotterdam. Transport. Res. C Emerg. Technol. 127, 103111.
Næss, P.A., 2018. Investigation of Multivariate Freight Rate Prediction Using Machine Learning and AIS Data. M.S. Dissertation. Norwegian University of Science and Technology, Trondheim, Norway.
Nguyen, D.D., Le Van, C., Ali, M.I., 2018. Vessel destination and arrival time prediction with sequence-to-sequence models over spatial grid. In: Proceedings of the 12th ACM International Conference on Distributed and Event-Based Systems, pp. 217–220.
Osman, M.T., Yuli, C., Li, T., Senin, S.F., 2020. Association Rule Mining for Identification of Port State Control Patterns in Malaysian Ports. Maritime Policy & Management. https://doi.org/10.1080/03088839.2020.1825854.
Pallotta, G., Vespe, M., Bryan, K., 2013. Vessel pattern knowledge discovery from AIS data: a framework for anomaly detection and route prediction. Entropy 15 (6), 2218–2245.
Panapakidis, I., Sourtzi, V.M., Dagoumas, A., 2020. Forecasting the fuel consumption of passenger ships with a combination of shallow and deep learning. Electronics 9 (5), 1–25.
Pani, C., Vanelslander, T., Fancello, G., Cannas, M., 2015. Prediction of late/early arrivals in container terminals-a qualitative approach. Eur. J. Transport Infrastruct. 15 (4), 536–550.
Parolas, I., 2016. ETA Prediction for Containerships at the Port of Rotterdam Using Machine Learning Techniques. M.S. Dissertation. Delft University of Technology, Delft, the Netherlands.
Parolas, I., Tavasszy, L., Kourounioti, I., van Duin, R., 2017. Prediction of vessels’ estimated time of arrival (ETA) using machine learning–a port of Rotterdam case study. In: Proceedings of the Transportation Research Board 96th Annual Meeting, pp. 8–12.
Pedersen, B.P., Larsen, J., 2009. Prediction of full-scale propulsion power using artificial neural networks. In: Proceedings of the 8th International Conference on Computer and IT Applications in the Maritime Industries, vols. 10–12.
Peng, Y., Liu, H., Li, X., Huang, J., Wang, W., 2020. Machine learning method for energy consumption prediction of ships in port considering green ports. J. Clean. Prod. 121564.
Petersen, J.P., 2011. Mining of ship operation data for energy conservation. DTU Informatics.
Royston, P., 2004. Multiple imputation of missing values. STATA J. 4 (3), 227–241.
Rubin, D.B., 1996. Multiple imputation after 18+ years. J. Am. Stat. Assoc. 91 (434), 473–489.
Rudzki, K., Tarelko, W., 2016. A decision-making system supporting selection of commanded outputs for a ship's propulsion system with a controllable pitch propeller. Ocean Eng. 126, 254–264.
Ruiz-Aguilar, J.J., Moscoso-López, J.A., Urda, D., Gonzalez-Enrique, J., Turias, I., 2020. A clustering-based hybrid support vector regression model to predict container volume at seaport sanitary facilities. Appl. Sci. 10 (23), 8326.
Şahin, B., Gürgen, S., Ünver, B., Altin, I., 2018. Forecasting the Baltic dry index by using an artificial neural network approach. Turk. J. Electr. Eng. Comput. Sci. 26 (3), 1673–1684.
Salleh, N.H.M., Riahi, R., Yang, Z., Wang, J., 2017. Predicting a containership’s arrival punctuality in liner operations by using a fuzzy rule-based Bayesian network (FRBBN). Asian J. Shipp. Logist. 33 (2), 95–104.
Santos, A.A., Junkes, L.N., Pires Jr., F.C., 2014. Forecasting period charter rates of VLCC tankers through neural networks: a comparison of alternative approaches. Marit. Econ. Logist. 16 (1), 72–91.
Sawada, R., Sato, K., Majima, T., 2021. Automatic ship collision avoidance using deep reinforcement learning with LSTM in continuous action spaces. J. Mar. Sci. Technol. 26 (2), 509–524.
Schafer, J.L., 1999. Multiple imputation: a primer. Stat. Methods Med. Res. 8 (1), 3–15.
Schaub, M., Finger, G., Riebe, T., Dahms, F., Hassel, E., Baldauf, M., 2019. Data-based modelling of ship emissions and fuel oil consumption for transient engine operation. In: OCEANS 2019-Marseille, pp. 1–5.
Shi, W., Li, K.X., 2017. Themes and tools of maritime transport research during 2000-2014. Marit. Pol. Manag. 44 (2), 151–169.
Soner, O., Akyuz, E., Celik, M., 2019. Statistical modelling of ship operational performance monitoring problem. J. Mar. Sci. Technol. 24 (2), 543–552.
Suo, Y., Chen, W., Claramunt, C., Yang, S., 2020. A ship trajectory prediction framework based on a recurrent neural network. Sensors 20 (18), 5133.
Szelangiewicz, T., Żelazny, K., Antosik, A., Szelangiewicz, M., 2021. Application of measurement sensors and navigation devices in experimental research of the computer system for the control of an unmanned ship model. Sensors 21 (4), 1312.
Talley, W., 2013. Maritime transport research: topics and methodologies. Marit. Pol. Manag. 40 (7), 709–725.
Tran, T.A., 2020. Effect of ship loading on marine diesel engine fuel consumption for bulk carriers based on the fuzzy clustering method. Ocean Eng. 207, 107383.
Tsou, M.C., 2019. Big data analysis of port state control ship detention database. J. Mar. Eng. Technol. 18 (3), 113–121.


Tu, E., Zhang, G., Rachmawati, L., Rajabally, E., Huang, G.B., 2017. Exploiting AIS data for intelligent maritime navigation: a comprehensive survey from data to methodology. IEEE Trans. Intell. Transport. Syst. 19 (5), 1559–1582.
Ubaid, A., Hussain, F.K., Charles, J., 2020. Machine learning-based regression models for price prediction in the Australian container shipping industry: case study of Asia-Oceania trade lane. In: Proceedings of the International Conference on Advanced Information Networking and Applications, pp. 52–59.
UNCTAD, 2018. Digital data sharing: the ignored opportunity for making global maritime transport chains more efficient. Accessed on. https://unctad.org/news/digital-data-sharing-ignored-opportunity-making-global-maritime-transport-chains-more. (Accessed 11 October 2021).
UNCTAD, 2019. Digitalization in maritime transport: ensuring opportunities for development. Accessed on. https://unctad.org/system/files/official-document/presspb2019d4_en.pdf. (Accessed 21 August 2021).
UNCTAD, 2020. Review of maritime transport 2020. Accessed on. https://unctad.org/webflyer/review-maritime-transport-2020. (Accessed 2 October 2021).
Ung, S.T., 2021. Navigation risk estimation using a modified Bayesian network modeling–a case study in Taiwan. Reliab. Eng. Syst. Saf. 213, 107777.
Uyanık, T., Karatuğ, Ç., Arslanoglu, Y., 2020. Machine learning approach to ship fuel consumption: a case of container vessel. Transport. Res. Transport Environ. 84, 102389.
Uyar, K., İlhan, A., 2016. Long term dry cargo freight rates forecasting by using recurrent fuzzy neural networks. Procedia Computer Science 102, 642–647.
Viellechner, A., Spinler, S., 2020. Novel data analytics meets conventional container shipping: predicting delays by comparing various machine learning algorithms. In: Proceedings of the 53rd Hawaii International Conference on System Sciences, vols. 1–10.
Virjonen, P., Nevalainen, P., Pahikkala, T., Heikkonen, J., 2018. Ship movement prediction using k-NN method. In: Proceedings of the 2018 Baltic Geodetic Congress, pp. 304–309.
von Spreckelsen, C., von Mettenheim, H.J., Breitner, M.H., 2014. Spot and freight rate futures in the tanker shipping market: short-term forecasting with linear and non-linear methods. In: Operations Research Proceedings, vol. 2012, pp. 247–252.
Wang, S., He, Z., 2021. A prediction model of vessel trajectory based on generative adversarial network. J. Navig. 74 (5), 1161–1171.
Wang, L., Yang, Z., 2018. Bayesian network modelling and analysis of accident severity in waterborne transportation: a case study in China. Reliab. Eng. Syst. Saf. 180, 277–289.
Wang, K., Yan, X., Yuan, Y., Li, F., 2016. Real-time optimization of ship energy efficiency based on the prediction technology of working condition. Transport. Res. Transport Environ. 46, 81–93.
Wang, S., Wang, S., Gao, S., Yang, W., 2017. Daily ship traffic volume statistics and prediction based on automatic identification system data. In: Proceedings of the 9th International Conference on Intelligent Human-Machine Systems and Cybernetics, pp. 149–154.
Wang, S., Ji, B., Zhao, J., Liu, W., Xu, T., 2018. Predicting ship fuel consumption based on LASSO regression. Transport. Res. Transport Environ. 65, 817–824.
Wang, S., Yan, R., Qu, X., 2019. Development of a non-parametric classifier: effective identification, algorithm, and applications in port state control for maritime transportation. Transp. Res. Part B Methodol. 128, 129–157.
Wang, S., Zhuge, D., Zhen, L., Lee, C.Y., 2021. Liner shipping service planning under sulfur emission regulations. Transport. Sci. 55 (2), 491–509.
Wu, B., Yan, X., Yip, T.L., Wang, Y., 2017. A flexible decision-support solution for intervention measures of grounded ships in the Yangtze River. Ocean Eng. 141, 237–248.
Wu, B., Tang, Y., Yan, X., Soares, C.G., 2021a. Bayesian network modelling for safety management of electric vehicles transported in RoPax ships. Reliab. Eng. Syst. Saf. 209, 107466.
Wu, S., Chen, X., Shi, C., Fu, J., Yan, Y., Wang, S., 2021b. Ship detention prediction via feature selection scheme and support vector machine (SVM). Marit. Pol. Manag. https://doi.org/10.1080/03088839.2021.1875141 in press.
Xu, R., Lu, Q., Li, W., Li, K.X., Zheng, H., 2007. A risk assessment system for improving port state control inspection. In: Proceedings of the 2007 International Conference on Machine Learning and Cybernetics, pp. 818–823.
Xu, X., Zhao, Z., Xu, X., Yang, J., Chang, L., Yan, X., Wang, G., 2020. Machine learning–based wear fault diagnosis for marine diesel engine by fusing multiple data-driven models. Knowl. Base Syst. 190, 105324.
Yan, R., Wang, S., Du, Y., 2020a. Development of a two-stage ship fuel consumption prediction and reduction model for a dry bulk ship. Transport. Res. E Logist. Transport. Rev. 138, 101930.
Yan, R., Wang, S., Fagerholt, K., 2020b. A semi-“smart predict then optimize” (semi-SPO) method for efficient ship inspection. Transp. Res. Part B Methodol. 142, 100–125.
Yan, R., Wang, S., Cao, J., Sun, D., 2021a. Shipping domain knowledge informed prediction and optimization in port state control. Transp. Res. Part B Methodol. 149, 52–78.
Yan, R., Wang, S., Peng, C., 2021b. An artificial intelligence model considering data imbalance for ship selection in port state control based on detention probabilities. J. Comput. Sci. 48, 101257.
Yan, R., Wang, S., Psaraftis, H.N., 2021c. Data analytics for fuel consumption management in maritime transportation: status and perspectives. Transport. Res. E Logist. Transport. Rev. 155, 102489.
Yan, R., Zhuge, D., Wang, S., 2021d. Development of two highly-efficient and innovative inspection schemes for PSC inspection. Asia Pac. J. Oper. Res. 38 (3), 2040013.
Yang, Z., Mehmed, E.E., 2019. Artificial neural networks in freight rate forecasting. Marit. Econ. Logist. 21 (3), 390–414.
Yang, Z., Yang, Z., Yin, J., 2018. Realising advanced risk-based port state control inspection using data-driven Bayesian networks. Transport. Res. Pol. Pract. 110, 38–56.
Yang, D., Wu, L., Wang, S., Jia, H., Li, K.X., 2019. How big data enriches maritime research–a critical review of Automatic Identification System (AIS) data applications. Transport Rev. 39 (6), 755–773.
Yang, D., Wu, L., Wang, S., 2021a. Can we trust the AIS destination port information for bulk ships?–Implications for shipping policy and practice. Transport. Res. E Logist. Transport. Rev. 149, 102308.
Yang, Z., Wan, C., Yang, Z., Yu, Q., 2021b. Using Bayesian network-based TOPSIS to aid dynamic port state control detention risk control decision. Reliab. Eng. Syst. Saf. 213, 107784.
Yeo, G., Lim, S.H., Wynter, L., Hassan, H., 2019. MPA-IBM Project SAFER: sense-making analytics for maritime event recognition. INFORMS J. Appl. Anal. 49 (4), 269–280.
Yu, J., Tang, G., Song, X., Yu, X., Qi, Y., Li, D., Zhang, Y., 2018. Ship arrival prediction and its value on daily container terminal operation. Ocean Eng. 157, 73–86.
Zeng, Q., Qu, C., Ng, A.K., Zhao, X., 2016. A new approach for Baltic dry index forecasting based on empirical mode decomposition and neural networks. Marit. Econ. Logist. 18 (2), 192–210.
Zhang, C., Zhang, D., Zhang, M., Mao, W., 2019. Data-driven ship energy efficiency analysis and optimization model for route planning in ice-covered Arctic waters. Ocean Eng. 186, 106071.
Zhang, C., Bin, J., Wang, W., Peng, X., Wang, R., Halldearn, R., Liu, Z., 2020a. AIS data driven general vessel destination prediction: a random forest based approach. Transport. Res. C Emerg. Technol. 118, 102729.
Zhang, G., Thai, V.V., Law, A.W.K., Yuen, K.F., Loh, H.S., Zhou, Q., 2020b. Quantitative risk assessment of seafarers’ nonfatal injuries due to occupational accidents based on Bayesian network modeling. Risk Anal. 40 (1), 8–23.
Zhang, W., Feng, X., Goerlandt, F., Liu, Q., 2020c. Towards a convolutional neural network model for classifying regional ship collision risk levels for waterway risk analysis. Reliab. Eng. Syst. Saf. 204, 107127.
Zhang, Y., Sun, X., Chen, J., Cheng, C., 2021. Spatial patterns and characteristics of global maritime accidents. Reliab. Eng. Syst. Saf. 206, 107310.
Zheng, J., Zhang, H., Yin, L., Liang, Y., Wang, B., Li, Z., Song, X., Zhang, Y., 2019. A voyage with minimal fuel consumption for cruise ships. J. Clean. Prod. 215, 144–153.
Zhuge, D., Wang, S., Wang, D.Z., 2021. A joint liner ship path, speed and deployment problem under emission reduction measures. Transp. Res. Part B Methodol. 144, 155–173.

Ran Yan is a PhD candidate at The Hong Kong Polytechnic University (PolyU). Her research interests include data analytics in shipping, big data in shipping, and green shipping management.

Shuaian (Hans) Wang is a Professor at The Hong Kong Polytechnic University (PolyU). His research interests include shipping operations management, green shipping, big data in shipping, port planning and operations, urban transport network modeling, and logistics and supply chain management. He is dedicated to rethinking and proposing innovative solutions to improve the efficiency of maritime and urban transportation systems, to promote environmentally friendly and sustainable practices, and to transform business and engineering education.

Lu Zhen is a Professor and Dean in the School of Management at Shanghai University. His research interests include logistics and supply chain management, operations research, and optimization in port and shipping management. He has published more than 90 papers in reputable SCI and SSCI journals. He has served as an associate editor or editorial board member of five SCI/SSCI journals, such as TR Part B, JORS, and COR, and he is a Fellow of the Operational Research Society.

Gilbert Laporte is a Full Professor at the University of Bath. Laporte has been awarded the Order of Canada and the Innis-Gerin Medal. In 2019, Laporte was elected a member of the National Academy of Engineering for domain-defining contributions to the theory and practice of vehicle routing, facility location, and distribution management.
