Naval Aeronautical and Astronautical University, Department of Electronic and Information Engineering
ABSTRACT

Deep learning has recently led to impressive performance on a variety of object detection tasks, but it is rarely applied to ship detection in SAR images. This paper aims to introduce deep-learning-based detectors into this field. We analyze the advantages of the state-of-the-art Faster R-CNN detector in computer vision and its limitations in our specific domain. Given this analysis, we propose a new dataset and four strategies to improve the standard Faster R-CNN algorithm. The dataset contains ships in various environments, covering different image resolutions, ship sizes, sea conditions, and sensor types, and can serve as a benchmark for researchers to evaluate their algorithms. The strategies include feature fusion, transfer learning, hard negative mining, and other implementation details. We conducted comparison and ablation experiments on our dataset; the results show that the proposed method obtains better accuracy at a lower test cost. We believe that SAR ship detection based on deep learning will be a focus of future research.

Index Terms—Deep learning, SAR, ship detection, Faster R-CNN.

1. INTRODUCTION

Synthetic Aperture Radar (SAR) is an active radar that provides high-resolution images under all weather conditions. SAR images have been widely used for fishing vessel detection, ship traffic monitoring, and immigration control, and numerous studies have addressed ship detection in SAR images[1].

The Constant False-Alarm Rate (CFAR) method and the Viola-Jones framework are two common algorithms in this field. CFAR is widely used: it sets a threshold so that targets that are statistically significant above the background pixels can be found while a constant false alarm rate is maintained. A function that fits the distribution of the clutter is first computed to determine the threshold, and all pixels whose values exceed the threshold are declared ship targets[2]. After pre-screening by CFAR, a discriminator is needed to reject the background. Viola and Jones is a seminal work with a significant impact on object detection[3]. Three components make it very fast: the integral image, the AdaBoost classifier, and the cascade structure. Since then, feature extractors such as Histograms of Oriented Gradients (HOG), the Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Local Binary Patterns (LBP) have been proposed and have improved performance further. Meanwhile, classifiers have also advanced rapidly, including Boosting, Support Vector Machines (SVM), and their variants[4-6]. These two lines of work were effective in past years, but in the era of deep learning they suffer from low accuracy, so it is essential to adopt deep-learning-based detection methods in this domain.

Since AlexNet won the ImageNet image classification challenge in 2012, neural networks have experienced a revival[7]. Followed by ZF-Net, VGG-Net, GoogLeNet, and ResNet, the Convolutional Neural Network (CNN) has continued to set new records on the classification task[8]. Meanwhile, as Girshick and He brought CNNs into the detection task, they proposed a series of effective algorithms such as the Region-based Convolutional Neural Network (R-CNN), Spatial Pyramid Pooling (SPP)-net, Fast R-CNN, and Faster R-CNN[9-12], raising detection performance to an unprecedented level. Faster R-CNN in particular has recently shown impressive results on various object detection benchmarks.

This paper presents the following contributions. First, we construct a dataset for ship detection in SAR images, which we call the SAR Ship Detection Dataset (SSDD). SSDD contains ships in various environments and can serve as a basic benchmark for researchers to evaluate their algorithms. Second, based on this dataset, we propose several improvements that boost the standard Faster R-CNN. Comparison and ablation experiments demonstrate the efficiency of Faster R-CNN and the effectiveness of our improvements.

2. RELATED WORK

Girshick introduced the Region-based CNN (R-CNN) for object detection, as shown in Fig. 1. It has two stages: generating object-agnostic proposals and training a regressor to refine the positions of the bounding boxes. Approximately 2000 proposals per image pass through the CNN for feature extraction, which causes a large amount of computation. To relieve this problem, SPP-net[10] and Fast R-CNN[11] were proposed in turn. They feed the whole input image through the network only once to extract features and project the region proposals onto the final feature map. Fast R-CNN is a special case of SPP-net that uses a single spatial pyramid pooling layer and thus allows end-to-end fine-tuning of a pre-trained ImageNet model.
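As an aside, the classical CFAR pre-screening described in the introduction can be sketched in a few lines of Python. This is a minimal global-threshold sketch assuming Gaussian clutter and an example false-alarm rate `pfa`; it is not the method of [2], where practical SAR detectors estimate the threshold from a local background window and use heavier-tailed clutter models such as the alpha-stable distribution.

```python
import random
from statistics import NormalDist, fmean, pstdev

def cfar_threshold(pixels, pfa=1e-3):
    """Global CFAR: fit a Gaussian clutter model and choose the threshold
    whose exceedance probability equals the desired false-alarm rate.
    (Real SAR clutter is usually modeled with heavier-tailed laws.)"""
    mu, sigma = fmean(pixels), pstdev(pixels)
    # Threshold T such that P(pixel > T | clutter) = pfa.
    return NormalDist(mu, sigma).inv_cdf(1.0 - pfa)

def detect_ships(pixels, pfa=1e-3):
    """Return indices of pixels declared as ship targets."""
    t = cfar_threshold(pixels, pfa)
    return [i for i, v in enumerate(pixels) if v > t]

# Toy example: one bright point target on Gaussian "speckle".
rng = random.Random(0)
img = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
img[5000] = 12.0  # synthetic "ship" pixel, far above the clutter
hits = detect_ships(img, pfa=1e-4)
```

With pfa = 1e-4 and 10,000 pixels, roughly one background false alarm is expected, which is exactly the "constant false-alarm rate" property the introduction describes.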
Fig. 8 Detection samples on the SSDD dataset.

4.3. Ablation experiment

To evaluate the proposed method further, we conduct ablation experiments, reported in Table 4. Our purpose is to examine the contributions of the different strategies proposed in Section 3.3.

TABLE 4 RESULTS OF THE ABLATION EXPERIMENTS

methods                                            AP      time (ms) per image
Standard                                           70.1%   198
Improved                                           78.8%   183
Standard + Feature fusion                          76.4%   213
Standard + Transfer learning                       74.3%   203
Standard + Hard negative mining                    75.6%   199
Standard + details (Dropout)                       71.6%   198
Standard + details (NMS)                           69.1%   188
Standard + details (Region proposal number)        68.9%   163
Standard + details (Dropout+NMS+Region proposal)   68.6%   161

We examine the impact of feature fusion in the 4th row of Table 4. With the fusion strategy illustrated in Section 3.3.1, the model can detect ships of different sizes: the average precision rises from 70.1% to 76.4%, while the test time increases only slightly (15 ms). We then evaluate the transfer learning strategy, shown in the 5th row of Table 4. SSDD is a small dataset that differs considerably from common object detection datasets. If we transfer all the convolutional layers to our domain, the AP is about 70.1%; if we instead transfer the first three layers and fine-tune the last layer on SSDD, the average precision increases to 74.3% with almost no additional test time. The same holds for hard negative mining: the average precision increases from 70.1% to 75.6%, while the test time is nearly the same as the standard model. This is because the strategy

5. CONCLUSION

We construct a dataset for ship detection in SAR images, called SSDD. SSDD is so far the first public SAR image dataset on which researchers can evaluate the performance of their detectors. We also present an improved Faster R-CNN method to detect ships in SAR images. The proposed method adopts the standard Faster R-CNN as the meta-architecture and changes it in four aspects according to SSDD: feature fusion, transfer learning, hard negative mining, and some implementation details. Experiments conducted on SSDD demonstrate that the proposed method achieves better accuracy and is less time-consuming.

6. REFERENCES

[1] C. C. Wackerman, K. S. Friedman, W. G. Pichel, P. Clemente-Colon, and X. Li, "Automatic detection of ships in RADARSAT-1 SAR imagery", Canadian Journal of Remote Sensing, 27(5), 568-577 (2001).

[2] A. Banerjee, P. Burlina, and R. Chellappa, "Adaptive target detection in foliage-penetrating SAR images using alpha-stable models", IEEE Transactions on Image Processing, 8(12), 1823-1831 (1999).

[3] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features", in Proc. of CVPR (2001).

[4] J. Cheney, B. Klein, A. K. Jain, and B. F. Klare, "Unconstrained face detection: State of the art baseline and challenges", in ICB, pages 229-236 (2015).

[5] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool, "Face detection without bells and whistles", in ECCV (2014).

[6] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models", IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627-1645 (2010).

[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks", in NIPS, pages 1106-1114 (2012).

[15] Y. Bengio, J. Clune, H. Lipson, and J. Yosinski, "How transferable are features in deep neural networks?", CoRR, abs/1411.1792 (2014).
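As a supplementary illustration of one implementation detail ablated in Table 4, greedy non-maximum suppression (NMS) can be sketched as follows. This is the standard textbook algorithm, assuming axis-aligned [x1, y1, x2, y2] boxes; it is not the authors' exact code, and the 0.5 IoU threshold is only an assumed example value.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that
    overlap it above the threshold, and repeat. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two overlapping detections of one ship plus a distinct detection.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_thresh=0.5)  # suppresses the duplicate box 1
```

Suppressing duplicate detections this way trades a small amount of recall for fewer false positives, which is consistent with the AP/time trade-off the NMS row of Table 4 reports.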