www.springerlink.com/content/1738-494x(Print)/1976-3824(Online)
DOI 10.1007/s12206-018-1126-4
(Manuscript Received February 28, 2018; Revised July 20, 2018; Accepted August 22, 2018)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Abstract
A smart ship collects various data of large volume, such as voyage, machinery, and weather data. Big data analysis for smart
ships is thus an important technology that can be widely applied to improve ship maintenance, operational efficiency, and equipment life
management. In this study, an accurate regression model for the fuel consumption of the main engine, based on an artificial neural
network (ANN), was developed through big data analysis comprising data collection, clustering, compression, and expansion. To obtain an accurate
regression model, various numbers of hidden layers and neurons and different types of activation functions were tested in the ANN, and
their effects on the accuracy and efficiency of the regression analysis were studied. The proposed ANN regression model predicts the fuel
consumption of the main engine more accurately and efficiently than polynomial regression and support vector machine models.
Keywords: Artificial neural network; Big data analysis; Ship fuel consumption; Smart ship; Regression analysis
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
…regression model that obtains the relation between the input … predict ship motion [7]. Fuzzy logic regression was used for maritime weather prediction for shipping feasibility [8].

ANNs have been actively studied in various fields to improve the accuracy of regression models, including high-dimensional factors and complex interactive relationships between factors. Sarlie [9] first explained the relationship between … highly nonlinear and dynamic systems and simple linear systems. ANNs are helpful in developing an appropriate model …

[Flowchart of the proposed framework: Start → data compression and expansion, repeated until MSE ≤ MSEc → data regression: (1) generation of the ANN model, (2) calculation of the determination coefficient R, with the ANN's parameters adjusted → Output: regression model for the ship's fuel consumption.]
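The flow in the flowchart above can be sketched as a minimal driver loop. Everything here is an illustrative assumption rather than the authors' implementation: the moving-average filter stands in for the smoothing-spline denoising, the decimate-and-interpolate pair stands in for compression/expansion, and the `mse_c` threshold value is arbitrary.

```python
import numpy as np

def moving_average(x, w=5):
    """Illustrative stand-in for the smoothing-spline denoising step."""
    return np.convolve(x, np.ones(w) / w, mode="same")

def compress_expand(x, factor=4):
    """Illustrative compression/expansion: keep every `factor`-th sample,
    then restore the full series by linear interpolation."""
    idx = np.arange(0, len(x), factor)
    return np.interp(np.arange(len(x)), idx, x[idx])

def analysis_pipeline(raw, mse_c=0.05):
    """Mirror of the flowchart: denoise, compress/expand, and check that
    the reconstruction MSE stays below the user-defined MSEc before the
    data are handed to the regression stage."""
    denoised = moving_average(raw)
    restored = compress_expand(denoised)
    mse = np.mean((denoised - restored) ** 2)
    if mse > mse_c:
        raise ValueError("compression loss too high; adjust the settings")
    return restored  # input to the ANN regression stage

t = np.linspace(0, 10, 200)
noisy = np.sin(t) + 0.05 * np.random.default_rng(0).normal(size=t.size)
out = analysis_pipeline(noisy)
print(out.shape)  # (200,)
```

The regression stage itself (ANN fitting and the R-value check) would follow this preprocessing, as the flowchart indicates.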
…in accordance with the high-frequency operation regions using a Gaussian mixture model (GMM). After data clustering, silhouette analysis is performed to ensure that the clusters are well classified following the operation regions. The number of clusters is repeatedly adjusted until the estimated silhouette value is the highest and the number of clusters becomes the total number of clusters. Then, the number of clusters with the highest silhouette value is selected as the most appropriate one.

The clustered data are subject to data compression and expansion processes for data transmission and storage. To verify the performance of data compression and expansion without severe data loss, the mean squared errors (MSE) between before and after data compression are calculated and compared with a user-defined MSE. Thereafter, the data compression and restoration ratios are checked. If all the conditions satisfy the specific user-defined values, then the data preprocessing is finished.

The regression analysis of ship fuel consumption is performed using an ANN. To verify the accuracy of the ANN, a regression value is calculated and the parameters of the ANN are adjusted until the estimated regression value is larger than the user-defined value. The output is the regression model for ship fuel consumption. Sec. 2 explains the preprocessing processes and regression analysis in detail.

2.1 Data denoising

Big data with time series include noise, error, and bias, all of which are denoted as corrupted data, x_cor = [x1, x2, …, xn], where xi ∈ R^d, d is the dimension of the input variables, and n is the number of data. To find the refined dataset x̂ without noise, the difference between data values at adjacent times and the difference between the raw and filtered data should be minimized [19]:

Minimize ‖x̂ − x_cor‖² + λ Σ_{j=1}^{n−1} (x̂_{j+1} − x̂_j)² ,  (1)

where λ is the smoothing parameter that controls the data fidelity and the fitness of the spline function, and n is the number of data. When λ is close to zero, the filtered data x̂ are fitted to the spline function and interpolated from the corrupted data rather than the raw data. When λ is close to infinity, the filtered data are fitted to the corrupted data. Data denoising was performed with Matlab's smoothing spline function. If the user does not specify the smoothing parameter, then the parameter is selected within the "allowable range", which is often close to the value of 1 / (1 + h³/6), where h is the average spacing between data points and is much smaller than the allowable range of the smoothing parameter. An appropriate λ value yields a reasonably fitted spline function and reduces the noise and error.

A GMM can represent clusters. Given that GMMs can accommodate clusters with different sizes and correlation structures, they are regarded as more appropriate than other clustering methods, such as K-means or hierarchical clustering, which deterministically assign data to clusters.

Similar to most clustering methods, the number of desired clusters (normal components) must first be specified before fitting the model. To determine in which clusters the data are included, the number of Gaussian clusters K and the parameter Θ in the Gaussian model must be estimated. The parameter Θ is a set of parameters that includes the vector of means μ, the covariance matrix Σ, and the weights π of each cluster:

Θ = [π1, …, πK, θ1, …, θK] ,  (2)

where θk = [μk, Σk] for the kth cluster.

To calculate each parameter, the expectation–maximization (EM) algorithm, consisting of an expectation step and a maximization step, was used [20]. First, before applying the EM algorithm, the number of clusters K, the dataset x, and the termination criteria were selected. The GMM with K clusters is defined as

p(x | Θ) = Σ_{k=1}^{K} πk pk(x | zk, θk) ,  (3)

where pk(x | zk, θk) is the multivariate Gaussian probability density function with the parameter θk of the kth cluster. z represents a K-dimensional vector of indicator variables that indicate which of the K clusters generated x. πk = p(zk) is the probability that a randomly selected x was generated by cluster k, and it is referred to as the weight of the kth cluster. Notably, Σ_{k=1}^{K} πk = 1.

The output was obtained as the weights and parameter values of the K clusters that maximize the log-likelihood given by

L(Θ | x1, …, xn) = Σ_{i=1}^{n} ln[ Σ_{k=1}^{K} πk f(xi | μk, Σk) ] ,  (4)

θm = argmax_Θ L(Θ | x1, …, xn) ,  (5)

where xi indicates the ith data point and θm is the optimized parameter that maximizes the log-likelihood function values for each cluster.

The expectation step determines the cluster to which each data point belongs. To do so, the probability that each sample belongs to each of the K clusters is calculated, the probability is converted to a weight value, and each data point is allocated to the cluster with the highest probability, as shown in Eq. (6).
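As an illustration of this clustering step, the sketch below fits GMMs with an increasing number of components via the EM algorithm and keeps the K with the highest silhouette value. The use of scikit-learn and the synthetic two-feature data are assumptions for the sketch; the paper does not name a clustering library.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic stand-in for two operating-condition features with three
# well-separated operation regions.
x = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(150, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.3, size=(150, 2)),
    rng.normal(loc=[0.0, 3.0], scale=0.3, size=(150, 2)),
])

best_k, best_sil = None, -1.0
for k in range(2, 7):
    # EM fit of a K-component GMM (Eq. (3)); each point is assigned to
    # its most probable component, as in the expectation step.
    gmm = GaussianMixture(n_components=k, random_state=0).fit(x)
    sil = silhouette_score(x, gmm.predict(x))
    if sil > best_sil:
        best_k, best_sil = k, sil

print(best_k, round(best_sil, 2))
```

For well-separated synthetic regions like these, the silhouette criterion recovers the generating number of clusters, mirroring the repeated adjustment of K described above.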
Table 1. Activation functions.

Sigmoid: g(x) = 1 / (1 + e^(−x))
Tangent sigmoid: g(x) = tanh(x)
ReLU (rectified linear unit): g(x) = max(0, x)

Fig. 3. Neural network model for regression analysis.

…input is separately weighted on the basis of the degree of the input signal, and the sum passes through a nonlinear function called the activation function. The activation function can have a sigmoid shape but can also be a nonlinear, piecewise linear, or step function. Activation functions are usually monotonically increasing, continuous, differentiable, and bounded.

The input and output variables, X(t) = [X1(t), X2(t), …, Xd(t)] and Y(t), are considered, where t is the time. The output layer can have multiple output variables, but a single output variable was considered in this study. Depending on how the input signal is transferred among the input, hidden, and output layers, the neural network is called a feed-forward or a recurrent network.

The feed-forward neural network (FNN) allows the input signal to be transferred only one way, such that the output of any layer does not affect the same layer, unlike the recurrent network. The FNN architecture has two main types: the multilayer perceptron (MLP) and the bridged MLP (BMLP). The ANN uses multilayers with the FNN architecture and can thus be considered an MLP. The MLP has several layers between input and output, and all adjacent layers have forward connections; no connections occur among neurons in the same layer. If the number of hidden layers is one, then the MLP becomes a single-layer FNN. By contrast, the BMLP has bridge connections across layers, where all of the inputs are connected to the output node and to the nodes in all hidden layers. If the BMLP has nested connections with only one node in each hidden layer, then it becomes a fully connected cascade network.

Various neural networks exist, but this study mainly focused on the construction of the big data analysis framework for the prediction of ship fuel consumption. Thus, the most commonly used ANN (MLP) was used.

Given that seven input variables and one output variable are used, the ANN consists of a single layer of output neurons, where the outputs compute a linear combination of the inputs. The network structure consists of an activation function f, weight matrix W, and bias b. p and q neurons are assumed to be present in layers i and j, respectively. When a neuron in layer j receives the input from layer i, the output Oj is obtained by calculating the activation function f, which is a function of a linear combination of the weight matrix Wij with p×q dimension and the bias vector bj. The mathematical formulation for the output can be written as [22]

Oj(t) = f( bj + Wij Oi(t) ) ,  (8)

where each neuron in Oi and Oj has a set of time series data. If the output Oj is obtained, then Oj becomes the input to obtain the next output in the next hidden or output layer. This computation is repeated for all input, hidden, and output layers in the neural network.

The activation function, such as the sigmoid, tangent sigmoid, and rectified linear unit (ReLU) functions, is predefined. These functions are the most commonly used activation functions and are listed in Table 1. The sigmoid and tangent sigmoid functions can be used repeatedly because they maintain the same functional form even under differentiation. However, the sigmoid function has a range from 0 to 1, whereas the tangent sigmoid function has a range from −1 to 1. As the number of hidden layers increases, the differentiation values rapidly converge to zero, thereby introducing difficulties in neural network training and prediction. By contrast, the ReLU function can solve this problem because its differentiation value is either 0 or 1 [23], which accelerates neural network training. In this study, regression using neural networks was processed using the exponential sigmoid, tangent sigmoid, and ReLU functions. The regression results using different activation functions were compared to extract an accurate regression model for ship performance prediction.

In this process, the weight matrix W and bias vector b were obtained by minimizing the error between the predicted response Y(t) and the real data y(t), which is expressed as Eq. (9). In this optimization process, a gradient descent method, which is a nonlinear optimization method, was used by taking the derivative of the error function E with respect to the weight and bias to decrease the error.

E = Σ_{t=t1}^{tn} (1/2) (Y(t) − y(t))² ,  (9)

where ti is the time instance for the ith data point. The multilayer network uses various learning techniques, especially backpropagation. In backpropagation, the weights of each connection are adjusted to reduce the value of the error function by small amounts. After repeating this process for a sufficient number of training cycles, the network converges to some state in which the value of the error function is small.
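A minimal sketch of the forward computation of Eq. (8) and one term of the error of Eq. (9). The network shape (the seven inputs of Table 2, a four-neuron hidden layer, a linear output layer) and the random untrained weights are illustrative assumptions, not the authors' trained model.

```python
import numpy as np

def forward(o_i, weights, biases, g=np.tanh):
    """Eq. (8) applied layer by layer: O_j = g(b_j + W_ij O_i).
    The output layer is kept linear, as is common for regression."""
    out = o_i
    for w, b in zip(weights[:-1], biases[:-1]):
        out = g(b + w @ out)
    return biases[-1] + weights[-1] @ out

rng = np.random.default_rng(1)
# Toy shape: 7 inputs -> 4 hidden neurons (tangent sigmoid) -> 1 output.
W = [0.5 * rng.normal(size=(4, 7)), 0.5 * rng.normal(size=(1, 4))]
b = [np.zeros(4), np.zeros(1)]

x_t = rng.normal(size=7)             # one time instance of the inputs
y_pred = forward(x_t, W, b)
E = 0.5 * np.sum((y_pred - np.array([0.3])) ** 2)   # Eq. (9), one term
print(y_pred.shape, E >= 0.0)  # (1,) True
```

Training would repeat this forward pass over all time instances and update W and b by gradient descent on E, which is what backpropagation computes efficiently.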
5790 M. Jeon et al. / Journal of Mechanical Science and Technology 32 (12) (2018) 5785~5796
3.2 Other regression models

The most common regression model is PR. PR is a method of calculating polynomial coefficients by determining the degree of the polynomials. The input data are set as x = [x1, x2, …, xn], where xi ∈ R^d, d is the dimension of the input variables, and n is the number of data. The regression model can be expressed as

y = Xβ + ε ,  (10)

[y1; y2; y3; ⋮] = [1 x1 x1² … x1^p; 1 x2 x2² … x2^p; 1 x3 x3² … x3^p; ⋮] [β0; β1; β2; ⋮] + [ε1; ε2; ε3; ⋮] ,  (11)

Table 2. Data description for ship performance prediction.

Data   | No. | Parameter                | Remark
Input  | 1   | Avg. draft [m]           | Ship state
Input  | 2   | Trim [m]                 | Ship state
Input  | 3   | ME power [kW]            | Engine operation
Input  | 4   | Shaft speed [rpm]        | Engine operation
Input  | 5   | STW [knots]              | Navigation speed
Input  | 6   | SOG [knots]              | Navigation speed
Input  | 7   | Rel. wind speed [m/s]    | Weather condition
Output | 1   | ME fuel cons. [tons/day] | Fuel consumption
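The least-squares solution behind Eqs. (10) and (11) can be sketched with NumPy. The synthetic quadratic data and the chosen degree p = 2 are illustrative assumptions.

```python
import numpy as np

def poly_design(x, p):
    """Design matrix of Eq. (11): columns 1, x, x^2, ..., x^p."""
    return np.vander(x, p + 1, increasing=True)

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.01 * rng.normal(size=50)

X = poly_design(x, p=2)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # solves y = X beta + eps
print(np.round(beta, 1))   # approximately [1.0, 2.0, -0.5]
```

Raising p adds columns to X but, as the results below show, a higher polynomial degree alone does not guarantee a better fit on this kind of data.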
…

subject to  f(xi) − yi ≤ ε + ξi ,  yi − f(xi) ≤ ε + ξi* ,  ξi, ξi* ≥ 0 ,

where the ε-insensitive loss function is defined as L(f(xi), yi) = |f(xi) − yi| − ε for |f(xi) − yi| ≥ ε and 0 for other cases. C is a regulation parameter that controls the influence of noise and outliers on the optimal separating hyperplane, and ξ and ξ* are non-negative slack variables. This primal optimization problem is transformed into a dual problem [23], and the solution can be expressed as …

4.2 Data preprocessing

After collecting the data for the input and output variables, denoising was performed using the smoothing algorithm explained in Sec. 2.1. In this case, a default value for the smoothing parameter, automatically selected within the allowable range of the parameter, was used. Fig. 4 shows the time series data for SOG before and after the data denoising process. The denoised data smoothed out the irregularities and sudden changes in the corrupted data while following the tendency of the corrupted data. In addition to the SOG data, all data were refined by the denoising preprocessing and have …
Fig. 4. Denoised data for SOG.
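The denoising objective of Eq. (1) is quadratic and so has a closed-form minimizer, which can be sketched in a few lines. This first-difference penalty is a simplified stand-in for the Matlab smoothing spline the authors used, and the test signal and λ value are illustrative; in this form, a larger λ produces a smoother x̂.

```python
import numpy as np

def denoise(x_cor, lam):
    """Closed-form minimizer of Eq. (1): the objective
    ||x_hat - x_cor||^2 + lam * sum_j (x_hat[j+1] - x_hat[j])^2
    is quadratic, so x_hat solves (I + lam * D^T D) x_hat = x_cor,
    where D is the first-difference matrix."""
    n = len(x_cor)
    D = np.diff(np.eye(n), axis=0)      # (n-1) x n difference operator
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, x_cor)

rng = np.random.default_rng(3)
t = np.linspace(0.0, 2.0 * np.pi, 300)
clean = np.sin(t)
noisy = clean + 0.2 * rng.normal(size=t.size)

smoothed = denoise(noisy, lam=50.0)
print(np.mean((smoothed - clean) ** 2) < np.mean((noisy - clean) ** 2))  # True
```

As with the SOG series in Fig. 4, the smoothed output suppresses irregularities and sudden changes while following the tendency of the corrupted data.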
Table 3. ANN regression model using sigmoid function.
Table 5. ANN regression model using ReLU function.
Table 6. Regression analysis results using PR.

Types          | MSE    | R values
Linear         | 0.9584 | 0.0448
Interaction    | 0.7244 | 0.2859
Pure quadratic | 0.8526 | 0.1528
Full quadratic | 0.6531 | 0.3581

Table 7. Regression analysis results using SVM.

Types     | MSE    | R values
Gaussian  | 0.9157 | 0.0812
Linear    | 2.478  | 0.0072
Quadratic | 1.854  | 0.4804

Fig. 7. Regression values for one hidden layer.
Fig. 8. Regression values for two hidden layers.

…with a sufficient number of neurons. As the number of neurons and hidden layers increases, the accuracy of the regression model also improves. Calculation of the regression values for each dataset shows that the learning performance of the neural network on the training dataset is similar to that on all datasets. Considering that the regression values of the testing and validation datasets are similar to the regression values of all datasets, the fit to the training dataset is reasonable and no error occurs in the neural network, including on the testing dataset.

If the tangent sigmoid or ReLU function is used as the activation function, as shown in Tables 4 and 5, then the regression results are somewhat different from each other. For example, if the number of hidden layers is one in the S-model and TS-model, then the regression values are over 0.8 for the S4 and TS4 models with four neurons. Therefore, the sigmoid function learns properly without disappearance of the gradient of the activation function. However, ReLU1–ReLU7, with one to six neurons, have regression values less than 0.8. Given that the gradient value of the ReLU function is either 0 or 1 when taking the gradient of the activation function, it does not accurately represent the nonlinearity of the regression model.

When the number of hidden layers is two, the S-model has high regression values, such as over 0.85 for S8–S14 in most cases, which converge to 1. The TS-model has a similar tendency, as shown in Table 5 for TS10–TS14. However, the ReLU-model has much lower regression values than the S-models and TS-models for one and two hidden layers because the ReLU function does not accurately describe the highly nonlinear regression model owing to its linear function shape.

The box plots of the R values using ANN with the three activation functions are shown in Figs. 7 and 8 for one and two hidden layers, respectively. The lower and upper values of the box plots indicate the minimum and maximum regression values. The top and bottom lines of the boxes indicate the first and third quartiles of the regression values, and their middle lines indicate the median values (second quartile) between them.

As shown in Tables 3-5, the median R values using the sigmoid and tangent sigmoid functions are much higher than those using the ReLU function for one hidden layer (Fig. 7), especially when the number of neurons is small. When the number of hidden layers becomes two (Fig. 8), the median R values using ANN with the three activation functions increase as the number of neurons increases.

However, the variations in the R values using the ReLU function are much larger than those using the sigmoid and tangent sigmoid functions. The ReLU function has extreme gradient values of either 0 or 1, whereas the sigmoid and tangent sigmoid functions have continuous gradient values. Thus, depending on how the training datasets are randomly selected, the R values can be considerably different.

Although the ReLU function indicates the worst performance among the three activation functions, it is the most attractive for big data because ship-related data are time series and datasets are updated in real time. The sigmoid and tangent sigmoid functions have vanishing gradient problems in deep layers, whereas the ReLU function does not. In this case study, the hidden layer does not need to be deep, unlike in other big data problems; thus, the sigmoid or tangent sigmoid functions are recommended over the ReLU function.

In comparing the performance of ANN with other regression models, the average R values using PRs with various degrees of polynomials and SVM were repeatedly calculated over 100 times, as shown in Tables 6 and 7. To compare the performance of each method under equal conditions, the same training, testing, and validation datasets were used in all methods.

The degree of polynomials increases, but the R values do not
Fig. 9. Regression values using various regression models.
Fig. 11. Target-output regression using ANN with TS13.
increase and are much lower than those using ANN, even for the worst case, as shown in Table 6. Similarly, although various kernel functions are used in SVM, the R values remain much lower than those using ANN. Parameters such as the kernel scale and epsilon of SVM were repeatedly adjusted, but the MSE and R values remain worse than the results of Table 4.

Fig. 9 shows the box plots of R values using PR with full quadratic functions, SVM with quadratic kernel functions, and ANN with TS13, which yield the best results in PR, SVM, and ANN, respectively. PR and SVM show similar performance regardless of the randomness of the datasets, whereas the performance of ANN depends on which dataset is selected for regression analysis. Nevertheless, when comparing the median R values of each method, the ANN with the worst accuracy is much better than the best results using PR and SVM, except for one outlier in the ANN. Thus, a highly nonlinear relationship is expected between ship fuel consumption and the input variables related to ship state, engine operation, navigation, and weather conditions.

To compare the efficiency of each method, the computational time was calculated. Table 8 shows the median R values and average computational time. Given that PR requires only a simple calculation to obtain the regression coefficients, its computational time is approximately 1/4 to 1/2 that of ANN. SVM takes the longest computational time except when using the Gaussian radial basis function. Unlike PR and ANN, SVM requires a longer computational time, especially for quadratic functions, because it solves a dual optimization problem. SVM is sensitive to its parameters; thus, its performance can vary highly depending on the values of the parameters. The MSE and R values in Table 7 are the best results obtained by varying the parameters of SVM. The computational time of ANN is similar regardless of the type of activation function because the amount of data used in this study is small.

The regression outputs (predicted values) with respect to the targets (real values from the given data) for the training, testing, and validation datasets are shown in Figs. 10 and 11 for verifying the most accurate ANN model (TS13) and the SVM model with quadratic functions among the examples. For a perfect fit, the regression outputs should be equal to the targets. The fitted line for the SVM model does not match the target data well; thus, it is not a good fit for the given datasets. By contrast, the ANN model using the tangent sigmoid function has a reasonably good fit with the datasets, with a high regression value of 0.98352.

6. Conclusions

In this study, a regression model using an ANN approach was proposed to predict ship performance in terms of the fuel consumption of the vessel's main engine. The conclusions can be summarized as follows.
·A big data analysis framework for prediction of ship fuel consumption was built. Ship-related data, such as navigation, weather, ship operation, and ship structure, were refined by sequential data preprocessing including data denoising, clustering, compression, and expansion. By checking the silhouette values, MSE values, compressive …
…ogy and collection, processing and analysis method for ship, The Korean Society of Mechanical Engineers Annual Conference, Korea, 3083-3085.
[19] P. J. Green and B. W. Silverman, Nonparametric regression and generalized linear models: A roughness penalty approach, Chapman and Hall, London, UK (1994).
[20] R. Sridharan, Gaussian mixture models and the EM algorithm, https://people.csail.mit.edu/rameshvs/content/gmm-em.pdf (2018).
[21] R. C. de Amorim and C. Hennig, Recovering the number of clusters in data sets with noise features using feature rescaling factors, Information Sciences, 324 (2015) 126-145.
[22] M. H. Hassoun, Fundamentals of artificial neural networks, MIT Press, Cambridge, MA, USA (1995).
[23] A. L. Maas, A. Y. Hannun and A. Y. Ng, Rectifier nonlinearities improve neural network acoustic models, International Conference on Machine Learning, 30 (1) (2013) 3.

Miyeon Jeon is a graduate student in the School of Mechanical Engineering in Pusan National University. Her research area is big data preprocessing and analysis.

Yoojeong Noh is an Assistant Professor in the School of Mechanical Engineering of Pusan National University. Her recent interests include big data analysis, uncertainty quantification, and design under uncertainties.