Tech Paper Edit F

Integrated Localization and Recognition for Inshore Ships in Large Scene Remote Sensing Images
CHAPTER 1
INTRODUCTION
As satellite technology develops, the spatial resolution of optical remote sensing images
keeps increasing. High-resolution images provide more detail, which has significantly
contributed to image understanding and target analysis in the remote sensing field. Ship
detection and recognition play a critical role in vessel traffic services, fishery
management, and, especially, naval warfare. Researchers in this field focus on two
directions:
1) Ocean ship detection and recognition

2) Inshore ship detection and recognition.
Compared with the relatively clean background of ships in the ocean, the background of
ships in harbors is much more complex. The buildings on land and the connections between
ships and harbors cause many false alarms. In this letter, we focus on the analysis of
inshore ship targets. Existing works mainly concentrate on the detection of inshore ships.
Some methods use prior knowledge to narrow the search area and then, as in the detection
of ocean surface ships, apply an automatic threshold to convert the image to a binary
image. Besides the use of prior knowledge, water and land segmentation is another common
way to preprocess the image. Earlier work used Otsu’s segmentation method to extract the
boundaries of targets, and a later approach introduced a new segmentation energy function
based on the active contour model. After segmentation, contour matching algorithms and
local features are commonly used to detect ships. An alterable included angle chain curve
description was introduced to detect the contours of ships. The “V” shape of the ship head
is also a discriminative local feature that is widely used in ship detection.
Some research has been done on ship recognition. Previous works examined the performance
of different image classification and feature extraction algorithms on ship imagery. The
hierarchical multiscale local binary patterns feature combined with a support vector
machine (SVM) classifier obtained 91% accuracy in classifying barges, cargo ships,
container ships, and tankers. A later enhancement utilized the superstructure of vessels
and divided the vessel signatures into three sections: bow, middle, and stern. Then, a
hierarchical decision-level classification was used to analyze each section and combine
the results.
For practical applications, a complete framework should not only detect the locations of
Dept. of ECE Page 1

targets, but also identify their types. Therefore, traditional automatic target recognition
systems are composed of a detection part and a recognition part. Detection is done first
to give the locations of targets. Recognition is then performed on the detection results
to give the types of targets. For example, to recognize ocean ships, prior work first used
the Rapid Image Exploitation Resource Ship Detection System as the detector. However, a
detection method and a classification method that are proposed separately may be
incompatible when combined in one framework.
In this letter, we propose an end-to-end framework that completes localization and
recognition for inshore ships, which is a rarely researched area. We borrow an existing
prior-based method and develop it by integrating a scale-invariant feature transform
(SIFT) registration part into the framework. This makes the framework robust to errors in
geographic coordinates. Then, we propose a novel multi-model method that gives the
preliminary result of localization and recognition, which is a set of candidates. Finally,
we use a fusion strategy to combine the candidates given by the models. This fusion
strategy is proposed to solve the problem of adjacently moored ships. The histogram of
oriented gradients (HOG) feature has been used for classification in other research, such
as vessel classification and face recognition. To boost the expressive ability of the
models, HOG is also selected as the feature extractor in the framework.
The detailed workflow of our framework is illustrated in Fig. 1. First, the geographic
coordinate error between the port template and the test image is eliminated by SIFT
registration. Next, regions of interest (ROIs) are extracted using the converted
coordinates and are aligned to the horizontal direction. Then, the trained multi-models
are applied to the ROIs with the sliding window method, giving candidates. Last, the
candidates are combined into final results using the fusion strategy. Together, these
steps make the proposed inshore ship localization and recognition framework end-to-end.
Fig. 1: Workflow of our proposed inshore ship recognition framework.

Fig. 2: Construction of multi-model. (a) For different ship types, collect samples with
different sizes. (b) Extract features of samples. (c) Build models by binary classification.
Size information is self-contained in each model.

CHAPTER 2
MULTIMODEL TRAINING
A. HOG Feature on Inshore Ship
Visual saliency features, such as contours, are commonly used in the detection of inshore
ships. However, different types of ships sometimes have similar contours (most ships are
long and narrow), which makes recognition difficult. To overcome this difficulty, more
discriminative detailed features should be utilized, for example, the characteristic
distribution of structures on the ship and signs on the deck. Considering that inshore
ships are easily disturbed by the background, the robust HOG feature is used to extract
the detailed features of ships. In the original pedestrian-detection work, the HOG
detector learned discriminative features of humans, such as the head and legs. Similarly,
we expect HOG to capture detailed information about inshore ships.
However, HOG is not a rotation-invariant feature, so there is an essential condition for
using it: in the testing data, the directions of the detected targets should be relatively
invariant. The directions of the training samples and the directions of the targets in the
testing data have to be consistent. The directions of inshore ships in remote sensing
images, however, vary widely. To solve this problem, we utilize a characteristic of ships
moored in harbors: inshore ships are moored along wharfs, and the orientations of wharfs
are constant and knowable. We therefore use geographic information to align the directions
of the detected areas, making them consistent with the direction of the training samples.
B. Multi-model Construction
Generally, we treat the recognition of inshore ships as a multiclass classification
problem. A common way is to build a single model using a multiclass classifier. However,
the sizes of different ships are quite different, and a multiclass classifier requires
that the pixel sizes of all input samples are the same. Therefore, samples of different
types of ships would have to be resized to the same size for training. This approach has
two disadvantages. First, different types of ships have different length-to-width ratios,
so resizing samples distorts the ship shapes. Second, this approach discards the size
information of ships, which is a very useful feature, making the recognition inaccurate
and inefficient. The multi-model method is proposed to circumvent the disadvantages
mentioned above.

First, we set a different training sample size for each type of ship. The negative
samples are collected randomly in the port and sea areas. Then, for each type of
ship, HOG features are extracted from the positive and negative samples and a single
model is built with a binary SVM classifier. Each type of ship has its own model. In this
way, the models, which are trained on normalized samples with type-specific sizes, contain
the size information of the different ship types. The recognition task is converted from a
multiclass classification problem into multiple binary classification problems (see
Fig. 2).
In the paper in which HOG was proposed, HOG was combined with an SVM to detect
pedestrians. In addition, the numbers of training samples are unbalanced, as shown in
Table I. The SVM classifier, which is based on the structural risk minimization principle,
performs better when the training set is relatively small. Therefore, we select the SVM
classifier for the multi-model construction.
C. Visual Representation of Feature and Model

We select one type of ship to visually display the feature and the model (see Fig. 3).
The sample size is 320 × 64. R-HOG with 9 orientation bins, a block size of 16 × 16, and a
cell size of 8 × 8 is used to extract the ship feature. A linear SVM classifier is used to
build the models. As marked by the red box in Fig. 3, the ship’s specific sign on the deck
is clearly extracted in the feature map. The detailed features of ships are obtained as
expected, which provides a useful basis for the recognition.
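As a rough illustration of what these settings imply, the sketch below computes the length of the R-HOG descriptor for one sample. The block stride of 8 pixels (one cell) is an assumption, since the text does not state it, and the function name is hypothetical.

```python
def hog_descriptor_length(win_w, win_h, cell=8, block=16, stride=8, bins=9):
    """Length of an R-HOG descriptor for a win_w x win_h window.

    stride is the step between neighbouring blocks; 8 pixels (one cell)
    is an assumed value, not one given in the text.
    """
    blocks_x = (win_w - block) // stride + 1   # block positions along the width
    blocks_y = (win_h - block) // stride + 1   # block positions along the height
    cells_per_block = (block // cell) ** 2     # 2 x 2 cells in a 16 x 16 block
    return blocks_x * blocks_y * cells_per_block * bins


# 320 x 64 sample: 39 x 7 blocks, 4 cells per block, 9 bins per cell
print(hog_descriptor_length(320, 64))  # 9828
```

Under this assumption each 320 × 64 sample is described by 9828 values, which is the input dimension of the linear SVM for that ship type. A model trained on a different sample size has a different descriptor length, which is one reason each type needs its own model.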
TABLE I
EXPERIMENTAL RESULTS
Fig. 3: Visual representation of feature and model. (a) Ship sample. (b) Feature map. (c)
Positive weights of the model. (d) Negative weights of the model. As marked by the red box
in (a) and (b), the sign on the deck, which is a salient detail, is extracted in the
feature map.
CHAPTER 3
RECOGNITION AND FUSION
A. ROI Extraction and Alignment
Target detection and recognition in large scene images rely on geographic information
to improve processing speed. As mentioned before, the geographic information is also
used to ensure that the directions of the training samples and
recognized ships are consistent. Prior work proposed a method in which the port template
was used to extract wharf areas as ROIs. These are regions where ships may be moored.
However, in practice the geographic coordinates of a port in different remote sensing
images normally differ by tens of meters. This error is fatal for the template matching
method. To solve this problem, SIFT registration is used in our framework. Fig. 4 shows
the elimination of the geographic coordinate error. We also align the ROIs to the
horizontal direction in this step. The detailed procedure is explained in Algorithm 1.
B. Recognition on Specific Multiscale Spaces
The sliding window method is applied to the ROIs. Because each ship model was trained on
samples of a specific size, the extracted ROIs are resized to several specific scale
spaces. When a target appears in an ROI, each model gives its predicted result for the
target. The result consists of three parts: a match score, which denotes confidence; a
bounding box, which indicates the target’s location; and the type of the target. The
predictions are combined into a final result after fusion. Fig. 5 shows the recognition
results. The correct model gives the highest prediction score and the appropriate bounding
box. Other results, whose sizes cannot match the target, are suppressed. By contrast, to
detect targets of different sizes, the traditional method resizes the test image to a
series of gradual scale spaces, tentatively matching the size of targets and ignoring the
discriminative size feature.
In the proposed multi-model method, recognition is performed in a novel way by preparing
a dedicated model for each type of ship, so that size features are used to enhance
accuracy and efficiency. The framework locates inshore ships in successive steps. First,
ROIs are extracted and aligned to narrow the detection region. Then, the precise locations
of targets are given by the sliding window method and the subsequent fusion step. In this
way, the proposed framework avoids an extra detection step.
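The candidate generation described above can be sketched as follows. This is a simplified illustration, not the implementation used in the letter: instead of resizing the ROI, each model carries its own window size, the window step of 8 pixels is assumed, and `score_fn` is a hypothetical stand-in for the SVM decision value computed on the HOG feature of a window.

```python
def sliding_windows(roi_w, roi_h, win_w, win_h, step=8):
    """Yield every (x, y, w, h) window of the given size that fits in the ROI."""
    for y in range(0, roi_h - win_h + 1, step):
        for x in range(0, roi_w - win_w + 1, step):
            yield (x, y, win_w, win_h)


def collect_candidates(roi_w, roi_h, model_sizes, score_fn, step=8):
    """Run every per-type model over the ROI and return all predictions.

    Each prediction is (match score, bounding box, ship type), matching the
    three parts of a result described in the text.
    """
    candidates = []
    for ship_type, (win_w, win_h) in model_sizes.items():
        for box in sliding_windows(roi_w, roi_h, win_w, win_h, step):
            candidates.append((score_fn(ship_type, box), box, ship_type))
    return candidates


# Hypothetical ship types and window sizes, for illustration only.
model_sizes = {"type_a": (320, 64), "type_b": (160, 32)}


def toy_score(ship_type, box):
    """Placeholder scorer: pretends a type_a ship sits at x = 40, y = 0."""
    return 1.0 if ship_type == "type_a" and box[:2] == (40, 0) else -1.0


candidates = collect_candidates(400, 64, model_sizes, toy_score)
best = max(candidates, key=lambda c: c[0])
print(len(candidates), best)
```

Because every model only ever proposes boxes of its own size, a model whose window does not match the target cannot produce a well-fitting box, which is the size-based suppression the text refers to.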

Fig. 4: Same area of test image and port template. We can see that the geographic
coordinate error is eliminated after registration.
Algorithm 1 ROI Extraction and Alignment
1) For the port template, label each wharf area (ROI) in which ships may be moored and
save it in geographic coordinate format. Measure and save the orientation angle of
each ROI.
2) For the test image, convert the geographic coordinates of the ROIs from the template
image to the test image by SIFT feature registration.
3) Extract the ROIs in the test image using the converted geographic coordinates.
4) Align the ROIs to the horizontal direction using the orientation angles saved before.
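Steps 2) and 4) of Algorithm 1 amount to two coordinate transforms, which can be sketched as below. The sketch assumes that SIFT registration has already produced a 2 × 3 affine matrix mapping template pixel coordinates to test-image pixel coordinates; the registration itself, the geographic-to-pixel conversion, the function names, and the sign convention of the angle are all assumptions made for illustration.

```python
import math


def apply_affine(matrix, point):
    """Map a template point into the test image with a 2 x 3 affine matrix."""
    (a, b, tx), (c, d, ty) = matrix
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def align_to_horizontal(points, angle_deg, center):
    """Rotate points by -angle_deg about center.

    A wharf whose saved orientation is angle_deg ends up along the x axis.
    """
    theta = math.radians(-angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    aligned = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        aligned.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return aligned


# Hypothetical registration result: the test image is shifted by (25, -10) pixels.
registration = ((1.0, 0.0, 25.0), (0.0, 1.0, -10.0))

# Two end points of a wharf edge saved with an orientation of 90 degrees.
template_edge = [(0.0, 0.0), (0.0, 10.0)]
test_edge = [apply_affine(registration, p) for p in template_edge]
aligned_edge = align_to_horizontal(test_edge, 90.0, test_edge[0])
print(test_edge, aligned_edge)
```

After alignment both end points share the same y coordinate, so a ship moored along this wharf appears in the same direction as the training samples.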
Fig. 5: Recognition on the specific scale spaces. Only the correct model gives the
highest score and the appropriate bounding box. Other recognition results are
suppressed by the size feature.

C. Candidate Fusion
The fusion strategy is explained in Algorithm 2.
Algorithm 2 Fusion Strategy
1) Let C be the set of all predictions and t1 the filtering threshold. Delete the
predictions in C whose match scores are less than t1, resulting in the set Cs.
2) Assume that ch and ci are elements in Cs. If ch and ci meet the following conditions,
we cluster ci into the group of ch:
\[ d_x \ge \alpha_1 \times \min\{l_h, l_i\} \quad \text{and} \quad d_y \ge \alpha_2 \times \min\{w_h, w_i\} \]
where dx and dy are the horizontal and vertical overlapped Euclidean distances of the two
bounding boxes of ch and ci, lh and li are the lengths of the ship types of ch and ci, and
wh and wi are the widths of the ship types of ch and ci.
3) Remove from Cs the elements clustered in step 2). Repeat step 2) until the set Cs is
empty.
4) After step 3), Cs is divided into multiple sets. For each set, choose the prediction
with the highest match score as its final result. Adjust the fused bounding boxes by ship
sizes.
We design the fusion strategy to combine the candidates. Separating adjacently moored
ships has always been a difficult problem, so the fusion strategy is designed around the
overlapped Euclidean distances of bounding boxes. Fig. 6 shows the fusion process when
ships are moored adjacently. After filtering by the threshold t1, most false alarms are
removed and the remaining predictions are the more confident ones. Because an ROI may
contain multiple targets, the predictions made for one target need to be clustered into
one group. To handle ships moored in a line or in a row, the parameters α1 and α2 make
the clustering controllable. In practice, with α1 = 0.25 and α2 = 0.65 the clustering
effectively separates adjacent ships, as shown in Fig. 6(c). Since ships are slightly
smaller than the sample sizes in the training samples, the predicted bounding boxes are
adjusted in the last step.
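A minimal sketch of steps 1) to 4) of the fusion strategy is given below. Several details are assumptions rather than statements from the text: each group is seeded with the highest-scoring remaining prediction, the overlapped distance is taken as the length of the overlap of the two boxes along each axis, each bounding box has the length and width of its ship type, and the final bounding box adjustment is omitted. The names `Prediction`, `overlap_1d`, and `fuse` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    score: float     # match score of the model
    x: float         # left edge of the bounding box
    y: float         # top edge of the bounding box
    ship_type: str   # type predicted by the model


def overlap_1d(a_start, a_len, b_start, b_len):
    """Length of the overlap of two intervals (0 if they do not overlap)."""
    return max(0.0, min(a_start + a_len, b_start + b_len) - max(a_start, b_start))


def fuse(predictions, type_sizes, t1, alpha1=0.25, alpha2=0.65):
    """Filter by t1, cluster overlapping predictions, keep the best of each group."""
    # Step 1: drop predictions whose match score is below t1.
    remaining = sorted((p for p in predictions if p.score >= t1),
                       key=lambda p: p.score, reverse=True)
    results = []
    # Steps 2 and 3: repeat clustering until no prediction is left.
    while remaining:
        head = remaining[0]  # assumed seed: highest remaining score
        lh, wh = type_sizes[head.ship_type]
        group, rest = [head], []
        for cand in remaining[1:]:
            li, wi = type_sizes[cand.ship_type]
            dx = overlap_1d(head.x, lh, cand.x, li)
            dy = overlap_1d(head.y, wh, cand.y, wi)
            if dx >= alpha1 * min(lh, li) and dy >= alpha2 * min(wh, wi):
                group.append(cand)
            else:
                rest.append(cand)
        # Step 4: the highest-scoring prediction represents the group.
        results.append(max(group, key=lambda p: p.score))
        remaining = rest
    return results


# Two ships of a hypothetical type "a" (100 x 20 pixels) moored side by side.
type_sizes = {"a": (100.0, 20.0)}
predictions = [
    Prediction(0.9, 0.0, 0.0, "a"),    # first ship
    Prediction(0.7, 4.0, 2.0, "a"),    # shifted duplicate of the first ship
    Prediction(0.8, 0.0, 20.0, "a"),   # second ship, directly alongside
    Prediction(0.1, 50.0, 50.0, "a"),  # weak prediction, removed by t1
]
fused = fuse(predictions, type_sizes, t1=0.5)
print([(p.score, p.x, p.y) for p in fused])
```

In this toy case the duplicate is merged into the first ship because it overlaps it by more than 25% of the length and 65% of the width, while the second ship has no vertical overlap with the first and therefore stays in its own group.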

CHAPTER 4
EXPERIMENTS AND RESULTS

In the experiments, we collected 280 QuickBird images with a resolution of 0.6 m per
pixel and a size of about 16000 × 16000 pixels. The recognition task targets 12 major
kinds of military ships, as shown in Table I. We label the ship targets manually and
choose 200 images from which to obtain positive and negative samples. The remaining 80
images are used to test the performance of the algorithm. The test images are selected to
cover different cases, for example, ships moored side by side and ships disturbed by
shadows.
A. Results of Localization and Recognition
The features are extracted by R-HOG with 9 orientation bins, a block size of 16 × 16, and
a cell size of 8 × 8. The models are built by multiple linear SVM classifiers with a
maximum of 1000 iterations and a relax factor of 0.03. The fusion parameters are α1 = 0.25
and α2 = 0.65, as given in the fusion strategy. A recognized result is considered correct
only when the overlap between its bounding box and the ground truth exceeds 75% and the
predicted type is correct. The detailed experimental results on the 200 training images
and 80 test images are shown in Table I. The accuracy and false ratio are defined as
follows:
\[ \text{Accuracy} = \frac{\text{Number of correctly recognized ships}}{\text{Number of real ships}} \times 100 \]

\[ \text{False ratio} = \frac{\text{Number of falsely recognized ships}}{\text{Number of recognized ships}} \times 100 \]
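The two measures and the correctness criterion can be computed as in the sketch below. Two points are assumptions: the overlap is interpreted here as intersection over union of the two boxes, which is only one possible reading of the 75% criterion, and every recognized result that is not correct is counted as falsely recognized. The counts in the example are made up for illustration and are not results from the experiments.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes (assumed overlap measure)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred_box, pred_type, true_box, true_type, min_overlap=0.75):
    """A result counts only if the box overlaps enough and the type matches."""
    return pred_type == true_type and iou(pred_box, true_box) > min_overlap


def accuracy_and_false_ratio(num_correct, num_real, num_recognized):
    """Return (accuracy, false ratio) in percent."""
    accuracy = 100.0 * num_correct / num_real
    false_ratio = 100.0 * (num_recognized - num_correct) / num_recognized
    return accuracy, false_ratio


# Made-up counts: 50 real ships, 48 recognized results, 45 of them correct.
print(accuracy_and_false_ratio(45, 50, 48))  # (90.0, 6.25)
```

A result with the right bounding box but the wrong type is rejected by `is_correct`, so it lowers the accuracy and raises the false ratio at the same time.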
B. Comparison
Two detection-then-classification methods are implemented for comparison. We choose two
existing methods for inshore ship detection. Because of the limits of these detection
methods, ships
of types 1, 5, 9, and 11 in Table I are not taken into account in the comparison. We
choose the bag of words (BOW) model, which is an efficient image recognition algorithm,
as the classification method. For each result of the two detection methods, a ship type
is given as the recognition result using the BOW model combined with the nearest neighbor
method. We run the two reference methods on our test images, and the results are shown in
Table II. Our method has a better performance. From the experiments, we find that the
detection method is likely to matter more than the classification method for inshore ship
localization and recognition. If the bounding box given by the detection method just
contains the target, the subsequent classification method is more likely to give the
correct type. If the given bounding box is too big or too small (enclosing too much
background or missing part of the target), the classification is more error prone.
C. Analysis of the Proposed Framework

Fig. 7 shows the recognition result of the framework on a Naval Station Norfolk image.
Aircraft carriers and submarines are present in this port. In Fig. 7(c) and (d), the
targets are disturbed by shadows or other objects. Compared with the land–sea segmentation
method, the framework is more robust when handling such disturbances. Moreover, the
framework effectively separates adjacent ships, as shown in Fig. 7(b) and (c).
The quantities of different types of ships in remote sensing images differ greatly,
especially for military ships. This leads to an unbalanced sample problem for the
classification. The proposed multi-model method avoids the extra process needed to solve
this problem. However, the imbalance still has a large effect on the performance of the
framework. If there is a large quantity of training samples for a type of ship, the
framework is more robust to disturbances and the accuracy is higher, e.g., types 1 and 2.
If there are only a few training samples, the background is more easily recognized as the
corresponding type of ship, leading to a high false ratio, e.g., types 11 and 12.
Fig. 6: Fusion processes. (a) Preliminary recognition results by sliding window method.
(b) Filter results by t1. False alarms are removed in this step. (c) Cluster results by
step 2) and step 3). The side-by-side ships are effectively separated. (d) Result with the
highest score in each set is chosen as the final result. (e) Bounding box adjustment.
Fig. 7: Recognition result on a Naval Station Norfolk image.

CHAPTER 5
CONCLUSION
An end-to-end framework is proposed to automatically locate and recognize inshore ships
in large scene remote sensing images. A novel multi-model method is proposed to recognize
different types of ships. With this method, size information is used to improve accuracy
and efficiency. A fusion strategy is presented to combine candidates and separate adjacent
targets. Moreover, the use of geographic information combined with the sliding window
method gives precise locations of targets. Owing to these contributions, the proposed
method is robust for different scenes with shadows or other disturbances. Experiments on
QuickBird images show that our multi-model recognition method performs well. In the
future, we will obtain more samples in order to recognize more types of ships.

CHAPTER 6
REFERENCES
[1] W. Li, K. Fu, H. Sun, X. Sun, Z. Guo, M. Yan, and X. Zheng, “Integrated localization
and recognition for inshore ships in large scene remote sensing images,” IEEE Geosci.
Remote Sens. Lett., vol. 14, no. 6, pp. 936–940, Jun. 2017.
[2] C. Zhu, H. Zhou, R. Wang, and J. Guo, “A novel hierarchical method of ship detection
from spaceborne optical image based on shape and texture features,” IEEE Trans. Geosci.
Remote Sens., vol. 48, no. 9, pp. 3446–3456, Sep. 2010.
[3] N. Proia and V. Pagé, “Characterization of a Bayesian ship detection method in optical
satellite images,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 2, pp. 226–230, Apr. 2010.
[4] G. Long and X. Chen, “A method for automatic detection of ships in harbor area in high-
resolution remote sensing image,” Comput. Simul., vol. 24, no. 5, pp. 198–201, May 2007.
