www.springerlink.com/content/1738-494x(Print)/1976-3824(Online)
DOI 10.1007/s12206-018-1126-4
(Manuscript Received February 28, 2018; Revised July 20, 2018; Accepted August 22, 2018)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Abstract
A smart ship collects various data of large volume, such as voyage, machinery, and weather data. Big data analysis for smart
ships is thus an important technology that can be widely applied to improve ship maintenance, operational efficiency, and equipment life
management. In this study, an accurate regression model for the fuel consumption of the main engine, based on an artificial neural
network (ANN), was developed through big data analysis comprising data collection, clustering, compression, and expansion. To obtain an accurate
regression model, various numbers of hidden layers and neurons and different types of activation functions were tested in the ANN, and
their effects on the accuracy and efficiency of the regression analysis were studied. The proposed ANN regression model predicts the fuel
consumption of the main engine more accurately and efficiently than polynomial regression and support vector machine models.
Keywords: Artificial neural network; Big data analysis; Ship fuel consumption; Smart ship; Regression analysis
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
…regression model that obtains the relation between the input … predict ship motion [7]. Fuzzy logic regression was used for maritime weather prediction for shipping feasibility [8].

ANNs have been actively studied in various fields to improve the accuracy of regression models, including high-dimensional factors and complex interactive relationships between factors. Sarlie [9] first explained the relationship between … highly nonlinear and dynamic systems and simple linear systems. ANNs are helpful in developing an appropriate model …

[Flowchart of the proposed framework: Start → data compression and expansion, repeated until MSE ≤ MSEc → data regression: (1) generation of the ANN model, (2) calculation of the determination coefficient R, with the ANN's parameters adjusted → Output: regression model for the ship's fuel consumption.]
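The flow in the flowchart above can be sketched as a minimal driver loop. Everything here is an illustrative assumption rather than the authors' implementation: the moving-average filter stands in for the smoothing-spline denoising, the decimate-and-interpolate pair stands in for compression/expansion, and the `mse_c` threshold value is arbitrary.

```python
import numpy as np

def moving_average(x, w=5):
    """Illustrative stand-in for the smoothing-spline denoising step."""
    return np.convolve(x, np.ones(w) / w, mode="same")

def compress_expand(x, factor=4):
    """Illustrative compression/expansion: keep every `factor`-th sample,
    then restore the full series by linear interpolation."""
    idx = np.arange(0, len(x), factor)
    return np.interp(np.arange(len(x)), idx, x[idx])

def analysis_pipeline(raw, mse_c=0.05):
    """Mirror of the flowchart: denoise, compress/expand, and check that
    the reconstruction MSE stays below the user-defined MSEc before the
    data are handed to the regression stage."""
    denoised = moving_average(raw)
    restored = compress_expand(denoised)
    mse = np.mean((denoised - restored) ** 2)
    if mse > mse_c:
        raise ValueError("compression loss too high; adjust the settings")
    return restored  # input to the ANN regression stage

t = np.linspace(0, 10, 200)
noisy = np.sin(t) + 0.05 * np.random.default_rng(0).normal(size=t.size)
out = analysis_pipeline(noisy)
print(out.shape)  # (200,)
```

The regression stage itself (ANN fitting and the R-value check) would follow this preprocessing, as the flowchart indicates.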
…in accordance with the high-frequency operation regions using a Gaussian mixture model (GMM). After data clustering, silhouette analysis is performed to ensure that the clusters are well classified following the operation regions. The number of clusters is repeatedly adjusted until the estimated silhouette value is the highest and the number of clusters becomes the total number of clusters. Then, the number of clusters with the highest silhouette value is selected as the most appropriate one.

The clustered data are subject to data compression and expansion processes for data transmission and storage. To verify the performance of data compression and expansion without severe data loss, the mean squared errors (MSE) between before and after data compression are calculated and compared with a user-defined MSE. Thereafter, the data compression and restoration ratios are checked. If all the conditions satisfy the specific user-defined values, then the data preprocessing is finished.

The regression analysis of ship fuel consumption is performed using an ANN. To verify the accuracy of the ANN, a regression value is calculated and the parameters of the ANN are adjusted until the estimated regression value is larger than the user-defined value. The output is the regression model for ship fuel consumption. Sec. 2 explains the preprocessing processes and regression analysis in detail.

2.1 Data denoising

Big data with time series include noise, error, and bias, all of which are denoted as corrupted data, x_cor = [x1, x2, …, xn], where xi ∈ R^d, d is the dimension of the input variables, and n is the number of data. To find the refined dataset x̂ without noise, the difference between data values at adjacent times and the difference between the raw and filtered data should be minimized [19]:

Minimize ‖x̂ − x_cor‖² + λ Σ_{j=1}^{n−1} (x̂_{j+1} − x̂_j)² ,  (1)

where λ is the smoothing parameter that controls the data fidelity and the fitness of the spline function, and n is the number of data. When λ is close to zero, the filtered data x̂ are fitted to the spline function and interpolated from the corrupted data rather than the raw data. When λ is close to infinity, the filtered data are fitted to the corrupted data. Data denoising was performed with Matlab's smoothing spline function. If the user does not specify the smoothing parameter, then the parameter is selected within the "allowable range", which is often close to the value of 1 / (1 + h³/6), where h is the average spacing between data points and is much smaller than the allowable range of the smoothing parameter. An appropriate λ value yields a reasonably fitted spline function and reduces the noise and error.

A GMM can represent clusters. Given that GMMs can accommodate clusters with different sizes and correlation structures, they are regarded as more appropriate than other clustering methods, such as K-means or hierarchical clustering, which deterministically assign data to clusters.

Similar to most clustering methods, the number of desired clusters (normal components) must first be specified before fitting the model. To determine in which clusters the data are included, the number of Gaussian clusters K and the parameter Θ in the Gaussian model must be estimated. The parameter Θ is a set of parameters that includes the vector of means μ, the covariance matrix Σ, and the weights π of each cluster:

Θ = [π1, …, πK, θ1, …, θK] ,  (2)

where θk = [μk, Σk] for the kth cluster.

To calculate each parameter, the expectation–maximization (EM) algorithm, consisting of an expectation step and a maximization step, was used [20]. First, before applying the EM algorithm, the number of clusters K, the dataset x, and the termination criteria were selected. The GMM with K clusters is defined as

p(x | Θ) = Σ_{k=1}^{K} πk pk(x | zk, θk) ,  (3)

where pk(x | zk, θk) is the multivariate Gaussian probability density function with the parameter θk of the kth cluster. z represents a K-dimensional vector of indicator variables that indicate which of the K clusters generated x. πk = p(zk) is the probability that a randomly selected x was generated by cluster k, and it is referred to as the weight of the kth cluster. Notably, Σ_{k=1}^{K} πk = 1.

The output was obtained as the weights and parameter values of the K clusters that maximize the log-likelihood given by

L(Θ | x1, …, xn) = Σ_{i=1}^{n} ln[ Σ_{k=1}^{K} πk f(xi | μk, Σk) ] ,  (4)

θm = argmax_Θ L(Θ | x1, …, xn) ,  (5)

where xi indicates the ith data point and θm is the optimized parameter that maximizes the log-likelihood function values for each cluster.

The expectation step determines the cluster to which each data point belongs. To do so, the probability that each sample belongs to each of the K clusters is calculated, the probability is converted to a weight value, and each data point is allocated to the cluster with the highest probability, as shown in Eq. (6).
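As an illustration of this clustering step, the sketch below fits GMMs with an increasing number of components via the EM algorithm and keeps the K with the highest silhouette value. The use of scikit-learn and the synthetic two-feature data are assumptions for the sketch; the paper does not name a clustering library.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic stand-in for two operating-condition features with three
# well-separated operation regions.
x = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(150, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.3, size=(150, 2)),
    rng.normal(loc=[0.0, 3.0], scale=0.3, size=(150, 2)),
])

best_k, best_sil = None, -1.0
for k in range(2, 7):
    # EM fit of a K-component GMM (Eq. (3)); each point is assigned to
    # its most probable component, as in the expectation step.
    gmm = GaussianMixture(n_components=k, random_state=0).fit(x)
    sil = silhouette_score(x, gmm.predict(x))
    if sil > best_sil:
        best_k, best_sil = k, sil

print(best_k, round(best_sil, 2))
```

For well-separated synthetic regions like these, the silhouette criterion recovers the generating number of clusters, mirroring the repeated adjustment of K described above.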
Table 1. Activation functions.

Sigmoid: g(x) = 1 / (1 + e^(−x))
Tangent sigmoid: g(x) = tanh(x)
ReLU (rectified linear unit): g(x) = max(0, x)

Fig. 3. Neural network model for regression analysis.

…input is separately weighted on the basis of the degree of the input signal, and the sum passes through a nonlinear function called the activation function. The activation function can have a sigmoid shape but can also be a nonlinear, piecewise linear, or step function. Activation functions are usually monotonically increasing, continuous, differentiable, and bounded.

The input and output variables, X(t) = [X1(t), X2(t), …, Xd(t)] and Y(t), are considered, where t is the time. The output layer can have multiple output variables, but a single output variable was considered in this study. Depending on how the input signal is transferred among the input, hidden, and output layers, the neural network is called a feed-forward or a recurrent network.

The feed-forward neural network (FNN) allows the input signal to be transferred only one way, such that the output of any layer does not affect the same layer, unlike the recurrent network. The FNN architecture has two main types: the multilayer perceptron (MLP) and the bridged MLP (BMLP). The ANN uses multilayers with the FNN architecture and can thus be considered an MLP. The MLP has several layers between input and output, and all adjacent layers have forward connections; no connections occur among neurons in the same layer. If the number of hidden layers is one, then the MLP becomes a single-layer FNN. By contrast, the BMLP has bridge connections across layers, where all of the inputs are connected to the output node and to the nodes in all hidden layers. If the BMLP has nested connections with only one node in each hidden layer, then it becomes a fully connected cascade network.

Various neural networks exist, but this study mainly focused on the construction of the big data analysis framework for the prediction of ship fuel consumption. Thus, the most commonly used ANN (MLP) was used.

Given that seven input variables and one output variable are used, the ANN consists of a single layer of output neurons, where the outputs compute a linear combination of the inputs. The network structure consists of an activation function f, weight matrix W, and bias b. p and q neurons are assumed to be present in layers i and j, respectively. When a neuron in layer j receives the input from layer i, the output Oj is obtained by calculating the activation function f, which is a function of a linear combination of the weight matrix Wij with p×q dimension and the bias vector bj. The mathematical formulation for the output can be written as [22]

Oj(t) = f( bj + Wij Oi(t) ) ,  (8)

where each neuron in Oi and Oj has a set of time series data. If the output Oj is obtained, then Oj becomes the input to obtain the next output in the next hidden or output layer. This computation is repeated for all input, hidden, and output layers in the neural network.

The activation function, such as the sigmoid, tangent sigmoid, and rectified linear unit (ReLU) functions, is predefined. These functions are the most commonly used activation functions and are listed in Table 1. The sigmoid and tangent sigmoid functions can be used repeatedly because they maintain the same functional form even under differentiation. However, the sigmoid function has a range from 0 to 1, whereas the tangent sigmoid function has a range from −1 to 1. As the number of hidden layers increases, the differentiation values rapidly converge to zero, thereby introducing difficulties in neural network training and prediction. By contrast, the ReLU function can solve this problem because its differentiation value is either 0 or 1 [23], which accelerates neural network training. In this study, regression using neural networks was processed using the exponential sigmoid, tangent sigmoid, and ReLU functions. The regression results using different activation functions were compared to extract an accurate regression model for ship performance prediction.

In this process, the weight matrix W and bias vector b were obtained by minimizing the error between the predicted response Y(t) and the real data y(t), which is expressed as Eq. (9). In this optimization process, a gradient descent method, which is a nonlinear optimization method, was used by taking the derivative of the error function E with respect to the weight and bias to decrease the error.

E = Σ_{t=t1}^{tn} (1/2) (Y(t) − y(t))² ,  (9)

where ti is the time instance for the ith data point. The multilayer network uses various learning techniques, especially backpropagation. In backpropagation, the weights of each connection are adjusted to reduce the value of the error function by small amounts. After repeating this process for a sufficient number of training cycles, the network converges to some state in which the value of the error function is small.
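A minimal sketch of the forward computation of Eq. (8) and one term of the error of Eq. (9). The network shape (the seven inputs of Table 2, a four-neuron hidden layer, a linear output layer) and the random untrained weights are illustrative assumptions, not the authors' trained model.

```python
import numpy as np

def forward(o_i, weights, biases, g=np.tanh):
    """Eq. (8) applied layer by layer: O_j = g(b_j + W_ij O_i).
    The output layer is kept linear, as is common for regression."""
    out = o_i
    for w, b in zip(weights[:-1], biases[:-1]):
        out = g(b + w @ out)
    return biases[-1] + weights[-1] @ out

rng = np.random.default_rng(1)
# Toy shape: 7 inputs -> 4 hidden neurons (tangent sigmoid) -> 1 output.
W = [0.5 * rng.normal(size=(4, 7)), 0.5 * rng.normal(size=(1, 4))]
b = [np.zeros(4), np.zeros(1)]

x_t = rng.normal(size=7)             # one time instance of the inputs
y_pred = forward(x_t, W, b)
E = 0.5 * np.sum((y_pred - np.array([0.3])) ** 2)   # Eq. (9), one term
print(y_pred.shape, E >= 0.0)  # (1,) True
```

Training would repeat this forward pass over all time instances and update W and b by gradient descent on E, which is what backpropagation computes efficiently.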
5790 M. Jeon et al. / Journal of Mechanical Science and Technology 32 (12) (2018) 5785~5796
3.2 Other regression models

The most common regression model is PR. PR is a method of calculating polynomial coefficients by determining the degree of the polynomials. The input data are set as x = [x1, x2, …, xn], where xi ∈ R^d, d is the dimension of the input variables, and n is the number of data. The regression model can be expressed as

y = Xβ + ε ,  (10)

[y1; y2; y3; ⋮] = [1 x1 x1² … x1^p; 1 x2 x2² … x2^p; 1 x3 x3² … x3^p; ⋮] [β0; β1; β2; ⋮] + [ε1; ε2; ε3; ⋮] ,  (11)

Table 2. Data description for ship performance prediction.

Data   | No. | Parameter                | Remark
Input  | 1   | Avg. draft [m]           | Ship state
Input  | 2   | Trim [m]                 | Ship state
Input  | 3   | ME power [kW]            | Engine operation
Input  | 4   | Shaft speed [rpm]        | Engine operation
Input  | 5   | STW [knots]              | Navigation speed
Input  | 6   | SOG [knots]              | Navigation speed
Input  | 7   | Rel. wind speed [m/s]    | Weather condition
Output | 1   | ME fuel cons. [tons/day] | Fuel consumption
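The least-squares solution behind Eqs. (10) and (11) can be sketched with NumPy. The synthetic quadratic data and the chosen degree p = 2 are illustrative assumptions.

```python
import numpy as np

def poly_design(x, p):
    """Design matrix of Eq. (11): columns 1, x, x^2, ..., x^p."""
    return np.vander(x, p + 1, increasing=True)

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.01 * rng.normal(size=50)

X = poly_design(x, p=2)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # solves y = X beta + eps
print(np.round(beta, 1))   # approximately [1.0, 2.0, -0.5]
```

Raising p adds columns to X but, as the results below show, a higher polynomial degree alone does not guarantee a better fit on this kind of data.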
…

subject to  f(xi) − yi ≤ ε + ξi ,  yi − f(xi) ≤ ε + ξi* ,  ξi, ξi* ≥ 0 ,

where the ε-insensitive loss function is defined as L(f(xi), yi) = |f(xi) − yi| − ε for |f(xi) − yi| ≥ ε and 0 for other cases. C is a regulation parameter that controls the influence of noise and outliers on the optimal separating hyperplane, and ξ and ξ* are non-negative slack variables. This primal optimization problem is transformed into a dual problem [23], and the solution can be expressed as …

4.2 Data preprocessing

After collecting the data for the input and output variables, denoising was performed using the smoothing algorithm explained in Sec. 2.1. In this case, a default value for the smoothing parameter, automatically selected within the allowable range of the parameter, was used. Fig. 4 shows the time series data for SOG before and after the data denoising process. The denoised data smoothed out the irregularities and sudden changes in the corrupted data while following the tendency of the corrupted data. In addition to the SOG data, all data were refined by the denoising preprocessing and have …
Fig. 4. Denoised data for SOG.
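The denoising objective of Eq. (1) is quadratic and so has a closed-form minimizer, which can be sketched in a few lines. This first-difference penalty is a simplified stand-in for the Matlab smoothing spline the authors used, and the test signal and λ value are illustrative; in this form, a larger λ produces a smoother x̂.

```python
import numpy as np

def denoise(x_cor, lam):
    """Closed-form minimizer of Eq. (1): the objective
    ||x_hat - x_cor||^2 + lam * sum_j (x_hat[j+1] - x_hat[j])^2
    is quadratic, so x_hat solves (I + lam * D^T D) x_hat = x_cor,
    where D is the first-difference matrix."""
    n = len(x_cor)
    D = np.diff(np.eye(n), axis=0)      # (n-1) x n difference operator
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, x_cor)

rng = np.random.default_rng(3)
t = np.linspace(0.0, 2.0 * np.pi, 300)
clean = np.sin(t)
noisy = clean + 0.2 * rng.normal(size=t.size)

smoothed = denoise(noisy, lam=50.0)
print(np.mean((smoothed - clean) ** 2) < np.mean((noisy - clean) ** 2))  # True
```

As with the SOG series in Fig. 4, the smoothed output suppresses irregularities and sudden changes while following the tendency of the corrupted data.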
Table 3. ANN regression model using sigmoid function.
Table 5. ANN regression model using ReLU function.
Table 6. Regression analysis results using PR.

Types          | MSE    | R values
Linear         | 0.9584 | 0.0448
Interaction    | 0.7244 | 0.2859
Pure quadratic | 0.8526 | 0.1528
Full quadratic | 0.6531 | 0.3581

Table 7. Regression analysis results using SVM.

Types     | MSE    | R values
Gaussian  | 0.9157 | 0.0812
Linear    | 2.478  | 0.0072
Quadratic | 1.854  | 0.4804

Fig. 7. Regression values for one hidden layer.
Fig. 8. Regression values for two hidden layers.

…with a sufficient number of neurons. As the number of neurons and hidden layers increases, the accuracy of the regression model also improves. Calculation of the regression values for each dataset shows that the learning performance of the neural network on the training dataset is similar to that on all datasets. Considering that the regression values of the testing and validation datasets are similar to the regression values of all datasets, the fit to the training dataset is reasonable and no error occurs in the neural network, including on the testing dataset.

If the tangent sigmoid or ReLU function is used as the activation function, as shown in Tables 4 and 5, then the regression results are somewhat different from each other. For example, if the number of hidden layers is one in the S-model and TS-model, then the regression values are over 0.8 for the S4 and TS4 models with four neurons. Therefore, the sigmoid function learns properly without disappearance of the gradient of the activation function. However, ReLU1–ReLU7, with one to six neurons, have regression values less than 0.8. Given that the gradient value of the ReLU function is either 0 or 1 when taking the gradient of the activation function, it does not accurately represent the nonlinearity of the regression model.

When the number of hidden layers is two, the S-model has high regression values, such as over 0.85 for S8–S14 in most cases, which converge to 1. The TS-model has a similar tendency, as shown in Table 5 for TS10–TS14. However, the ReLU-model has much lower regression values than the S-models and TS-models for one and two hidden layers because the ReLU function does not accurately describe the highly nonlinear regression model owing to its linear function shape.

The box plots of the R values using ANN with the three activation functions are shown in Figs. 7 and 8 for one and two hidden layers, respectively. The lower and upper values of the box plots indicate the minimum and maximum regression values. The top and bottom lines of the boxes indicate the first and third quartiles of the regression values, and their middle lines indicate the median values (second quartile) between them.

As shown in Tables 3-5, the median R values using the sigmoid and tangent sigmoid functions are much higher than those using the ReLU function for one hidden layer (Fig. 7), especially when the number of neurons is small. When the number of hidden layers becomes two (Fig. 8), the median R values using ANN with the three activation functions increase as the number of neurons increases.

However, the variations in the R values using the ReLU function are much larger than those using the sigmoid and tangent sigmoid functions. The ReLU function has extreme gradient values of either 0 or 1, whereas the sigmoid and tangent sigmoid functions have continuous gradient values. Thus, depending on how the training datasets are randomly selected, the R values can be considerably different.

Although the ReLU function indicates the worst performance among the three activation functions, it is the most attractive for big data because ship-related data are time series and datasets are updated in real time. The sigmoid and tangent sigmoid functions have vanishing gradient problems in deep layers, whereas the ReLU function does not. In this case study, the hidden layer does not need to be deep, unlike in other big data problems; thus, the sigmoid or tangent sigmoid functions are recommended over the ReLU function.

In comparing the performance of ANN with other regression models, the average R values using PRs with various degrees of polynomials and SVM were repeatedly calculated over 100 times, as shown in Tables 6 and 7. To compare the performance of each method under equal conditions, the same training, testing, and validation datasets were used in all methods.

The degree of polynomials increases, but the R values do not
Fig. 9. Regression values using various regression models.
Fig. 11. Target-output regression using ANN with TS13.
increase and are much lower than those using ANN, even for the worst case, as shown in Table 6. Similarly, although various kernel functions are used in SVM, the R values remain much lower than those using ANN. Parameters such as the kernel scale and epsilon of SVM were repeatedly adjusted, but the MSE and R values remain worse than the results of Table 4.

Fig. 9 shows the box plots of R values using PR with full quadratic functions, SVM with quadratic kernel functions, and ANN with TS13, which yield the best results in PR, SVM, and ANN, respectively. PR and SVM show similar performance regardless of the randomness of the datasets, whereas the performance of ANN depends on which dataset is selected for regression analysis. Nevertheless, when comparing the median R values of each method, the ANN with the worst accuracy is much better than the best results using PR and SVM, except for one outlier in the ANN. Thus, a highly nonlinear relationship is expected between ship fuel consumption and the input variables related to ship state, engine operation, navigation, and weather conditions.

To compare the efficiency of each method, the computational time was calculated. Table 8 shows the median R values and average computational time. Given that PR requires only a simple calculation to obtain the regression coefficients, its computational time is approximately 1/4 to 1/2 that of ANN. SVM takes the longest computational time except when using the Gaussian radial basis function. Unlike PR and ANN, SVM requires a longer computational time, especially for quadratic functions, because it solves a dual optimization problem. SVM is sensitive to its parameters; thus, its performance can vary highly depending on the values of the parameters. The MSE and R values in Table 7 are the best results obtained by varying the parameters of SVM. The computational time of ANN is similar regardless of the type of activation function because the amount of data used in this study is small.

The regression outputs (predicted values) with respect to the targets (real values from the given data) for the training, testing, and validation datasets are shown in Figs. 10 and 11 for verifying the most accurate ANN model (TS13) and the SVM model with quadratic functions among the examples. For a perfect fit, the regression outputs should be equal to the targets. The fitted line for the SVM model does not match the target data well; thus, it is not a good fit for the given datasets. By contrast, the ANN model using the tangent sigmoid function has a reasonably good fit with the datasets, with a high regression value of 0.98352.

6. Conclusions

In this study, a regression model using an ANN approach was proposed to predict ship performance in terms of the fuel consumption of the vessel's main engine. The conclusions can be summarized as follows.
·A big data analysis framework for prediction of ship fuel consumption was built. Ship-related data, such as navigation, weather, ship operation, and ship structure, were refined by sequential data preprocessing including data denoising, clustering, compression, and expansion. By checking the silhouette values, MSE values, compressive …
…ogy and collection, processing and analysis method for ship, The Korean Society of Mechanical Engineers Annual Conference, Korea, 3083-3085.
[19] P. J. Green and B. W. Silverman, Nonparametric regression and generalized linear models: A roughness penalty approach, Chapman and Hall, London, UK (1994).
[20] R. Sridharan, Gaussian mixture models and the EM algorithm, https://people.csail.mit.edu/rameshvs/content/gmm-em.pdf (2018).
[21] R. C. de Amorim and C. Hennig, Recovering the number of clusters in data sets with noise features using feature rescaling factors, Information Sciences, 324 (2015) 126-145.
[22] M. H. Hassoun, Fundamentals of artificial neural networks, MIT Press, Cambridge, MA, USA (1995).
[23] A. L. Maas, A. Y. Hannun and A. Y. Ng, Rectifier nonlinearities improve neural network acoustic models, International Conference on Machine Learning, 30 (1) (2013) 3.

Miyeon Jeon is a graduate student in the School of Mechanical Engineering in Pusan National University. Her research area is big data preprocessing and analysis.

Yoojeong Noh is an Assistant Professor in the School of Mechanical Engineering of Pusan National University. Her recent interests include big data analysis, uncertainty quantification, and design under uncertainties.