Ship Classification Based On Convolutional Neural Networks
To cite this article: Yang Yang, Kaifa Ding & Zhuang Chen (2022) Ship classification based
on convolutional neural networks, Ships and Offshore Structures, 17:12, 2715-2721, DOI:
10.1080/17445302.2021.2016271
effect of the neural network under the conditions of the perception field. The ResNet model revealed the 'degradation' problem and conceived a 'shortcut connection' to address it, effectively alleviating the difficulty of training very deep neural networks. Algorithms in the field of object detection can be divided into two categories: single-stage detection, represented by the YOLO series (Redmon et al. 2016) and SSD (Liu et al. 2016) and characterised by high speed and lower accuracy, and two-stage detection, represented by R-CNN (Girshick et al. 2014) and characterised by high accuracy and lower speed. In this study, we propose a ship classification method based on a CNN, a deep learning algorithm. Moreover, transfer learning can be used in ship classification models to avoid the over-fitting caused by an insufficient number of images and to improve the accuracy of the model on small ship datasets. Considering the influence of environmental factors, ship classification under foggy and night-time conditions was also studied. The feasibility and effectiveness of the ship classification method based on CNNs were verified by comparison with traditional machine learning algorithms.
2. Convolutional neural network background

CNNs (Khan et al. 2020) are hierarchical models composed of one or more convolutional layers and pooling layers, followed by nonlinear activation function layers, with fully connected layers on top, as shown in Figure 1. Compared with traditional machine learning algorithms, CNNs take images directly as inputs, avoiding the complex feature extraction and data reconstruction that traditional algorithms require. The images are convolved by the CNN, and this convolution operation preserves the local spatial structure of the images, enabling CNNs to extract more valid information from them. Consequently, CNNs have clear advantages over traditional machine learning algorithms in image classification and target detection. Images are fed to the first layer, comprising a convolutional layer and a pooling layer, which applies a transformation and sends the processed images to the next layer; this process is repeated until the last layer produces the predicted values. The error between the predicted values and the true values is then calculated according to the task (regression or classification). The error is passed backward by the backpropagation algorithm to update the parameters, and this process is repeated until the model converges.

However, a model that converges on the training set may not perform well on the test set. To improve the generalisation ability and to reduce over-fitting, LRN, dropout (Hinton et al. 2012; Srivastava et al. 2014), batch normalisation (Ioffe and Szegedy 2015), and other tricks can be used in CNNs. LRN is a method for improving accuracy during deep learning training: the LRN layer imitates the lateral suppression mechanism of the biological nervous system, creating a competitive mechanism that suppresses local neurones with less feedback and improves the generalisation ability of the model. Dropout temporarily discards a portion of the neural network units with a certain probability during training; it can alleviate over-fitting effectively and achieves a degree of regularisation. Batch normalisation pre-processes the input data of each layer during training. The basic idea is as follows: for each hidden-layer neuron, the input distribution, which gradually drifts towards the saturation region of the nonlinear activation function, is forced back to a standard normal distribution with a mean of 0 and a variance of 1, so that the input to the nonlinear transformation falls into a region sensitive to the input, so as to avoid the gradient disappearance problem.
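These regularisation layers are easiest to see in code. The following is a minimal PyTorch sketch, not the architecture used in this paper, showing where batch normalisation and dropout are typically placed in a small convolutional block (all layer sizes are illustrative):

```python
import torch
import torch.nn as nn

# A toy block: convolution -> batch normalisation -> non-linearity -> pooling,
# with dropout applied before the fully connected classifier.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learned feature extraction
    nn.BatchNorm2d(16),          # normalise each channel over the mini-batch
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(p=0.5),           # randomly drop half of the units during training
    nn.Linear(16 * 16 * 16, 3),  # e.g. 3 ship classes for a 32 x 32 input
)

x = torch.randn(8, 3, 32, 32)    # dummy mini-batch of 8 RGB images
print(block(x).shape)            # torch.Size([8, 3])
```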
2.1. Convolutional layer

Convolution is the core operation of a CNN. The parameters in the convolution kernel are obtained by random initialisation, which is usually uniform, Xavier (Glorot and Bengio 2010), or Gaussian, and are updated by backpropagation. The convolution operation takes dot products between the parameters in the convolution kernel and the pixel values in an area of the same size in the input image, and the result is output at the corresponding position. The characteristics of the convolution operation are local connections and weight sharing. Figure 2 shows a convolution process with a 5 × 5 input image, a 3 × 3 convolution kernel, a step size of 1, and an offset of zero.
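As an illustration of this sliding dot product, the following NumPy sketch mirrors the setting of Figure 2 (a 5 × 5 input, a 3 × 3 kernel, a step size of 1, no padding, and a zero offset); the kernel values are arbitrary here, whereas in a CNN they would be learned:

```python
import numpy as np

def conv2d(image, kernel, stride=1, bias=0.0):
    """Unpadded 2-D convolution as used in CNNs (a sliding dot product, no kernel flip)."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel) + bias  # dot product plus offset
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # a 5 x 5 input "image"
kernel = np.ones((3, 3)) / 9.0                    # a 3 x 3 kernel (weights would normally be learned)
print(conv2d(image, kernel))                      # 3 x 3 feature map
```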
2.2. Pooling layer

After the convolution operation, each pixel in the output contains information about a portion of the area in the input image, so the output carries redundant information. To improve the computational efficiency, the output of the convolution operation is pooled, which reduces the data dimension and the computational overhead. Figure 3 shows a maximum pooling operation with a 6 × 6 input, a 2 × 2 pooled area, and a step size of 2.
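A corresponding sketch of maximum pooling under the same assumptions as Figure 3 (a 6 × 6 input, a 2 × 2 pooled area, and a step size of 2):

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Keep only the maximum value in each pooling window."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

feature_map = np.random.rand(6, 6)    # e.g. the output of a convolutional layer
print(max_pool2d(feature_map).shape)  # (3, 3): each spatial dimension is halved
```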
2.3. Fully connected layer

The convolutional and pooling layers map the input image into the feature space, and the fully connected layer maps the features into the label space. In the fully connected layer, each neuron is connected to all neurones of the previous layer, and neurones in the same layer are not connected. A fully connected layer can be expressed as follows:

z_{ki}^{(l)} = \sum_{j=1}^{d} w_{ij}^{(l-1)} x_{kj}^{(l-1)} + b_{i}^{(l-1)}  (1)

where d is the number of neurones in the l−1 layer and l represents the current layer; w_{ij}^{(l-1)} is the connection parameter between the j-th unit of the l−1 layer and the i-th unit of the l layer; x_{kj}^{(l-1)} is the input value of the j-th neuron of the k-th sample in the l−1 layer; b_{i}^{(l-1)} is the offset of the i-th neuron of the l−1 layer; and z_{ki}^{(l)} is the output value of the i-th unit of the k-th sample in the l layer.
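Equation (1) is the usual affine map of a fully connected layer. A NumPy sketch for a whole mini-batch, with the array shapes chosen only for illustration:

```python
import numpy as np

def fully_connected(x, W, b):
    """Equation (1) over a batch: z[k, i] = sum_j W[i, j] * x[k, j] + b[i]."""
    return x @ W.T + b

batch, d_in, d_out = 32, 128, 3          # e.g. 3 ship classes
x = np.random.rand(batch, d_in)          # features from the previous layer
W = np.random.randn(d_out, d_in) * 0.01  # connection parameters w_ij
b = np.zeros(d_out)                      # offsets b_i
print(fully_connected(x, W, b).shape)    # (32, 3)
```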
3. Methodology

To compare the effects of CNNs and traditional machine learning algorithms in ship classification, typical models were selected from both fields. The AlexNet model (Krizhevsky et al. 2017) won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and is regarded as the starting point of modern deep learning research and as a representative CNN model. The AlexNet model has five convolutional layers and three fully connected layers; the ReLU activation function (Nair and Hinton 2010) was used to replace the traditional sigmoid and tanh activation functions, and LRN was used to improve the generalisation ability of the model. In general, the more complex and deeper the model, the more finely the sample space can be divided into regions corresponding to different categories, and consequently the better the classification performance of the model. To observe the influence of network depth on the ship classification task, a 19-layer VGG-19 model was also selected.
The ReLU activation function and the dropout method were used in the VGG-19 model. Moreover, as a classic machine learning algorithm, the SVM has been widely used in ship classification; it constructs an optimal classification plane that not only separates the samples correctly but also maximises the classification margin.
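For contrast with the CNN pipeline, a traditional SVM pipeline of this kind first extracts hand-crafted features and then fits the classifier. The text does not specify the features or SVM settings used, so the HOG descriptor and the RBF kernel below are assumptions, and the data are random stand-ins for grayscale ship images:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

# Random stand-ins for 60 grayscale ship images and their class labels (0, 1, 2);
# in practice these would be loaded from the ship dataset.
rng = np.random.default_rng(0)
images = rng.random((60, 128, 128))
labels = rng.integers(0, 3, size=60)

# Manual feature extraction (HOG), exactly the step a CNN replaces with
# learned convolutional features.
features = np.array([hog(img, pixels_per_cell=(16, 16)) for img in images])

svm = SVC(kernel="rbf")                       # maximum-margin classifier
svm.fit(features[:48], labels[:48])           # 80% for training
print(svm.score(features[48:], labels[48:]))  # 20% for testing
```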
The calculation of full-batch gradient descent (GD) is computationally expensive. To improve the computational efficiency, CNNs are trained using stochastic gradient descent (SGD) and the backpropagation algorithm. SGD divides the training samples into multiple mini-batches; at each training step one mini-batch is fed to the model, and one pass over all of the mini-batches is called an epoch. Considering the internal storage form of a computer, the number of samples in a mini-batch is usually a power of two. If the mini-batch is too large, the calculation efficiency is low and GPU resources are heavily occupied; if it is too small, the gradient is easily affected by a single sample and convergence can be slow.

The mini-batch size used in this study was 32. The learning rate affects the convergence of the model: a large learning rate tends to cause the model to diverge, while a small learning rate slows the convergence. The learning rate was therefore set to 0.00001 in this study. For the model to be fully trained, the number of training epochs was chosen to be 100 (Bengio 2012).
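A sketch of the corresponding mini-batch SGD training loop in PyTorch, using the hyperparameters stated above (mini-batch size 32, learning rate 0.00001, 100 epochs); the model and dataset arguments are placeholders for whichever network and ship dataset are being trained:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def train(model, train_dataset, epochs=100, batch_size=32, lr=1e-5):
    """Mini-batch SGD training loop with the settings reported in the text."""
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(epochs):          # one epoch = one pass over all mini-batches
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()              # backpropagation of the error
            optimizer.step()             # parameter update from this mini-batch
    return model
```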
To accelerate the training and improve the robustness of the model, transfer learning was adopted (Li et al. 2009; Pan and Yang 2010). Transfer learning trains new models by migrating some or all of the parameters that have been trained on other datasets to a new dataset. When the dataset types are similar, the robustness of a model using transfer learning tends to be better than that of a randomly initialised model. In CNNs, the lower layers extract general features, such as the corners of an image, while the higher layers extract specific features, such as the colour and shape of an object. Because the extraction of general features depends little on the particular dataset, the parameters that extract general features from the ImageNet dataset can be reused for feature extraction on other datasets. The model was therefore initialised using weight parameters pre-trained on the ImageNet dataset, after which the weights were fine-tuned on the experimental dataset.
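A minimal torchvision sketch of this initialise-then-fine-tune scheme, assuming VGG-19 and three ship classes; whether any layers were frozen during fine-tuning is not stated here, so full fine-tuning is implied:

```python
import torch.nn as nn
from torchvision import models

# Load VGG-19 initialised with weights pre-trained on ImageNet
# (older torchvision versions use models.vgg19(pretrained=True) instead).
model = models.vgg19(weights="IMAGENET1K_V1")

# Replace the 1000-class ImageNet classifier with a 3-class ship classifier;
# all parameters are then fine-tuned on the ship dataset.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 3)
```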
an image. The extraction of general features is less dependent on
the dataset, so the parameters that extract the general features
from the ImageNet dataset can be used for feature extraction in Figure 3. Pooling operation.
Data augmentation can effectively expand a ship dataset with few samples, reducing the possibility of over-fitting and improving the generalisation ability of the model. In this study, the ship image dataset was augmented (Wang et al. 2020) by random cropping and horizontal flipping. Figure 5 shows an original image, Figure 6 a randomly cropped version of it, and Figure 7 a horizontally flipped version of it. With data augmentation, 1010 images were obtained for each category, for a total of 3030 images. The distribution of the SHIP-3 dataset is presented in Table 1.
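A torchvision sketch of the augmentation described above (random cropping and horizontal flipping); the resize and crop sizes are assumptions, since the exact settings are not given here:

```python
from torchvision import transforms

# Training-time augmentation: random crop plus random horizontal flip, followed
# by the normalisation expected by ImageNet-pretrained models.
train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),              # random cropping
    transforms.RandomHorizontalFlip(p=0.5),  # horizontal flipping
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```

The per-class 80/20 division described earlier corresponds to a stratified split, for example scikit-learn's train_test_split with its stratify argument set to the class labels.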
To evaluate the generalisation ability of the proposed models, three evaluation metrics widely used for this type of task were chosen: accuracy, the F1-score, and the confusion matrix. Accuracy is defined as the ratio of the number of samples correctly predicted by the model to the total number of samples. The F1-score is defined by precision and recall, as follows:

\text{Precision} = \frac{TP}{TP + FP}  (2)

\text{Recall} = \frac{TP}{TP + FN}  (3)

\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}  (4)

where TP (true positive) indicates that a given condition exists and it really does exist; FN (false negative) is a result indicating that a condition does not hold when in fact it does; and FP (false positive), commonly called a 'false alarm', indicates that a given condition exists when it does not. The confusion matrix is a model evaluation metric that can be used to show how the samples of each true class are distributed over the predicted classes.
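A sketch of how the three metrics can be computed from the test-set predictions with scikit-learn; the label vectors below are illustrative only:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Ground-truth and predicted class indices for the test set (0: bulk carrier,
# 1: container ship, 2: cruise ship); these values are made up for illustration.
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]

print(accuracy_score(y_true, y_pred))             # correct predictions / total samples
print(f1_score(y_true, y_pred, average="macro"))  # F1 from Eqs. (2)-(4), averaged over classes
print(confusion_matrix(y_true, y_pred))           # rows: true class, columns: predicted class
```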
Figure 13. The curve of the accuracy versus the number of iterations using different models.
information of the ship's image, making the accuracy of the SVM model only about 0.5. The CNN model proposed in this paper still achieved an accuracy of approximately 65%–75% on the dataset above, with the accuracy of the VGG-19 model close to 0.75. The CNN model automatically extracts feature details and passes them back in the form of multilayer feature maps; once the loss is obtained, the weight parameters are updated using the backpropagation algorithm. Consequently, compared with the SVM model, the CNN could better learn the features from the images.
5. Conclusions

In this paper, we propose a ship classification method based on CNNs. Based on the theory and framework of the proposed method, we classified a variety of ships and considered the impact of complex marine environmental conditions on ship image classification. The feasibility and effectiveness of ship classification based on CNNs were verified by comparison with traditional machine learning algorithms. The main conclusions are as follows:

(1) A CNN method combined with transfer learning was proposed to classify ship images. The experimental data show that the classification accuracy of the model combined with transfer learning was higher than that of the randomly initialised model, and the computational overhead was significantly reduced. Consequently, transfer learning on a small dataset can improve the robustness of the ship classification model.
(2) The experiments showed that the accuracy of the CNN model in ship classification is better than that of the SVM model. The SVM model needs features to be extracted manually from the ship images before classification, and the quality of the extracted features significantly affects the classification accuracy. The CNN model updates its feature parameters through the backpropagation algorithm, which extracts the features better. The confusion matrix showed that bulk carriers and container ships were easily misclassified owing to the insufficient number of samples and incomplete sample information; this could be reduced by adding training samples.
(3) In bad weather conditions such as foggy days, the identification of ships was affected because the ship images lose many details, and the classification accuracy of both the CNN and SVM models dropped significantly. However, the classification accuracy of the CNN model on the same ship images was significantly higher than that of the SVM model, exhibiting better robustness.

Disclosure statement

We declare that we have no financial or personal relationships with other people or organisations that could inappropriately influence our work, and no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, this manuscript. The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this article.

Funding

This research was financially supported by the National Science Foundation of China [grant number 51979036].

ORCID

Yang Yang [Link]

References

Bengio Y. 2012. Practical recommendations for gradient-based training of deep architectures. Lect Notes Comput Sci. 7700:437–478.
Girshick R, Donahue J, Darrell T, Malik J. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv preprint arXiv:1311.2524.
Glorot X, Bengio Y. 2010. Understanding the difficulty of training deep feedforward neural networks. J Mach Learn Res. 9:249–256.
Graves A, Mohamed AR, Hinton G. 2013. Speech recognition with deep recurrent neural networks. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. p. 6645–6649.
Greff K, Srivastava RK, Koutník J, Steunebrink BR, Schmidhuber J. 2017. LSTM: a search space odyssey. IEEE Trans Neural Networks Learn Syst. 28(10):2222–2232.
He KM, Zhang XY, Ren SQ, Sun J. 2015. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385.
Hinton G, Deng L, Yu D, Dahl GE, Mohamed AR, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath TN, Kingsbury B. 2012. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag. 29(6):82–97.
Ioffe S, Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
Khan A, Sohail A, Zahoora U, Qureshi AQ. 2020. A survey of the recent architectures of deep convolutional neural networks. arXiv preprint arXiv:1901.06032v7.
Krizhevsky A, Sutskever I, Hinton GE. 2017. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 60(6):1106–1114.
Lang HT, Zhang J, Zhang X, Meng JM. 2016. Ship classification in SAR image by joint feature and classifier selection. IEEE Geosci Remote Sens Lett. 13(2):212–216.
Lecun Y, Bengio Y, Hinton G. 2015. Deep learning. Nature. 521:436–444.
Li B, Yang Q, Xue X. 2009. Transfer learning for collaborative filtering via a rating-matrix generative model. Proceedings of the 26th International Conference on Machine Learning (ICML 2009); June; Montreal, Quebec, Canada. p. 617–624.
Lin H, Song S, Yang J. 2018. Ship classification based on MSHOG feature and task-driven dictionary learning with structured incoherent constraints in SAR images. Remote Sens (Basel). 10(2):190.
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC. 2016. SSD: single shot MultiBox detector. Lect Notes Comput Sci. 9905:21–37.
Nair V, Hinton GE. 2010. Rectified linear units improve restricted Boltzmann machines. International Conference on Machine Learning; June 21–24; Haifa, Israel.
Pan SJ, Yang Q. 2010. A survey on transfer learning. IEEE Trans Knowl Data Eng. 22(10):1345–1359.
Qi S, Ma J, Lin J, Li YS, Tian JW. 2015. Unsupervised ship detection based on saliency and S-HOG descriptor from optical satellite images. IEEE Geosci Remote Sens Lett. 12(7):1451–1455.
Redmon J, Divvala S, Girshick R, Farhadi A. 2016. You only look once: unified, real-time object detection. arXiv preprint arXiv:1506.02640v5.
Simonyan K, Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556v6.
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. 2014. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 15(1):1929–1958.
Wang YQ, Yao QM, Kwok J, Ni LM. 2020. Generalizing from a few examples: a survey on few-shot learning. arXiv preprint arXiv:1904.05046v3.