
The Journal of Engineering

IET International Radar Conference (IRC 2018)

SAR ATR with full-angle data augmentation and feature polymerisation

eISSN 2051-3305
Received on 19th February 2019
Accepted on 2nd May 2019
doi: 10.1049/joe.2019.0219
www.ietdl.org

Yikui Zhai1, Hui Ma1, Jian Liu1, Wenbo Deng1, Lijuan Shang1, Bing Sun2, Ziyi Jiang3, Huixin Guan3,
Yihang Zhi1, Xi Wu1, Jihua Zhou1
1School of Information Engineering, Wuyi University, Jiangmen, People's Republic of China
2School of Electronics and Information Engineering, Beihang University, Beijing, People's Republic of China
3School of Computer, Wuyi University, Jiangmen, People's Republic of China

E-mail: yikuizhai@163.com

Abstract: Utilising neural networks to learn and extract valuable features has achieved satisfactory performance in synthetic aperture radar automatic target recognition (SAR ATR). However, such recognition capability can be seriously limited by severe image rotation. To improve the performance of convolutional neural network-based SAR ATR, a data augmentation method combining region of interest (ROI) extraction with full-angle rotation is first proposed in this study. Then, an Inception-SAR NET is presented to polymerise multi-branch feature maps. The superior performance of Inception-SAR NET structures is confirmed by comprehensive experiments. Finally, results on the MSTAR dataset demonstrate that the authors' methods achieve state-of-the-art performance.

1 Introduction

Synthetic aperture radar (SAR) is an important means of ground observation. Its imaging advantages of all-weather, all-day operation and penetration through fog and vegetation are unmatched by optical sensors, making it one of the key technologies in the field of military sensing. The obtained azimuth resolution is equivalent to that provided by a large-aperture antenna [1, 2]. The amount of SAR target data has been increasing continuously, and imaging quality has been constantly improving.

Automatic target recognition of SAR images mostly consists of image preprocessing, feature extraction and target recognition, and most such pipelines have achieved relatively satisfactory performance. Although great progress has been made by convolutional neural network (CNN)-based methods on synthetic aperture radar automatic target recognition (SAR ATR), there are still some shortcomings when applying deep convolutional neural networks (DCNNs) to SAR ATR. The first is that CNNs are sensitive to severe image rotation, which can undermine the performance of SAR ATR as the angle of the input images varies. The second is that, in a general CNN structure, a convolutional layer often contains convolutional kernels of a single size, which does not make full use of multiple kernel sizes.

To address these two problems, a full-angle data augmentation method and Inception-SAR NET are proposed in this paper. Firstly, image region of interest (ROI) extraction and full-angle data augmentation are proposed to overcome the drawback that CNNs are sensitive to severe image rotation. Secondly, motivated by the Inception net [3], this paper proposes Inception-SAR NET, which can aggregate feature maps generated by convolutional layers with different kernel sizes within one block. Furthermore, comprehensive experiments are performed to explore the influence of different Inception-SAR NET structures on SAR ATR. The contributions of the proposed method are as follows:

(i) A full-angle data augmentation method is proposed to resolve the problem that CNNs are sensitive to severe image rotation.
(ii) Inception-SAR NET is proposed to polymerise multi-branch feature maps to better boost SAR ATR performance.
(iii) Comprehensive experiments are performed to empirically explore CNN models with different Inception blocks on the moving and stationary target recognition (MSTAR) database.

2 Related works

Traditionally, machine-learning features for recognition are designed manually. Commonly used methods include the template-matching method [4] and the support vector machine (SVM) [5]. Qi et al. [6] proposed a method combining multi-information dictionary learning with sparse representation. Wang et al. [7] proposed a Gabor filter and local texture feature extraction method to address the problem. In practical applications, however, manually designed features are often not representative of complex and large-scale data.

Recently, deep learning has developed rapidly, and DCNNs in particular have been widely used in all aspects of pattern recognition. Benefiting from end-to-end learning, a DCNN extracts features automatically, avoiding the traditional manual feature design process. Researchers have applied DCNNs to SAR ATR for years. Xu et al. [8] first utilised data augmentation methods to enlarge the SAR dataset and then proposed a DCNN for SAR ATR that achieved satisfactory performance. Furukawa [9] explored the translation invariance of CNNs and proposed a DCNN model based on the residual network and data augmentation, which obtained satisfactory performance on the MSTAR dataset. To better utilise massive unlabelled SAR data, Gao et al. [10] proposed a novel CNN framework for SAR ATR in which an active learning method first selects the most valuable samples to extend the original training set.

3 Proposed methods

CNNs are highly sensitive to severe image rotation, which leads to a sharp drop in performance even when images are rotated by a small angle. Data augmentation can effectively alleviate this problem by enriching the azimuth information of SAR images; therefore, full-angle data augmentation is proposed in this paper.

Recent developments in CNN architecture have demonstrated the vitality of CNNs, yet few CNNs utilise polymerisation information, which can aggregate extracted features and express images more effectively. Recently, the Inception architecture was proposed, which breaks the conventional cascade of convolutions and aggregates features extracted by different convolutional kernels. Motivated by this architecture, this paper proposes Inception-SAR NET.

3.1 Full-angle data augmentation

Given massive data, neural networks can achieve satisfactory results in most pattern recognition tasks. In SAR ATR, however, no massive dataset exists. Moreover, because CNNs are sensitive to severe image rotation, a SAR dataset should cover as much angle information as possible. Therefore, to improve the performance of the proposed model, full-angle data augmentation is utilised in this paper.

Due to the imaging characteristics of SAR devices, SAR images often contain noise, especially speckle noise. Furthermore, in the MSTAR dataset on which our model is trained and validated, most of the area in each SAR image is background, which is useless and even harmful to SAR ATR. To remove the influence of the background, image ROI extraction is first applied.

In the original SAR image database, the ROI is usually located at the geometric centre of the image. Assume that the geometric centre of the SAR image is (x0, y0), the original length and height of the SAR image are l0 and h0, and the length and height of the processed image are l and h, respectively. Then the locations of the four corners of the ROI image are

$$x_1 = x_3 = x_0 - l/2, \quad x_2 = x_4 = x_0 + l/2, \quad y_1 = y_2 = y_0 - h/2, \quad y_3 = y_4 = y_0 + h/2 \tag{1}$$

where (x1, y1), (x2, y2), (x3, y3), (x4, y4) are the locations of the four corners of the processed image.

In this paper, the raw SAR images of the MSTAR dataset are cropped to 64 × 64, i.e. l and h are both 64. Fig. 1 shows the process: Fig. 1a shows the image before processing and Fig. 1b the processed one.

Fig. 1  SAR images ROI extraction
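As a concrete illustration, the following is a minimal NumPy sketch of this centre-crop ROI extraction; the function name and the assumption that chips arrive as 2-D arrays are ours, not part of the authors' implementation.

```python
import numpy as np

def extract_roi(image: np.ndarray, l: int = 64, h: int = 64) -> np.ndarray:
    """Crop an h x l region of interest around the geometric centre,
    following (1): corners at x0 -/+ l/2 and y0 -/+ h/2."""
    h0, w0 = image.shape
    x0, y0 = w0 // 2, h0 // 2            # geometric centre (x0, y0)
    x1, x2 = x0 - l // 2, x0 + l // 2    # left/right corners from (1)
    y1, y3 = y0 - h // 2, y0 + h // 2    # top/bottom corners from (1)
    return image[y1:y3, x1:x2]

# e.g. a raw 128 x 128 MSTAR chip becomes a 64 x 64 ROI
roi = extract_roi(np.zeros((128, 128)))
assert roi.shape == (64, 64)
```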
In SAR ATR, recognition performance is easily influenced by the azimuth and pitch angle of the target. In the training set of the original MSTAR dataset, the azimuth and pitch angles of the images are incomplete, which means the original MSTAR training set cannot cover SAR targets at all imaging angles. Therefore, full-angle image rotation is proposed to enrich the azimuth information of the targets in the MSTAR dataset. In this paper, the original MSTAR dataset is augmented by rotating images clockwise by one degree per rotation. The relationship between the position of a pixel in the rotated image and its position in the corresponding original image is

$$\begin{bmatrix} a_i \\ b_i \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} c_i \\ d_i \end{bmatrix} \tag{2}$$

where (ai, bi) is the location of a pixel in the transformed image, (ci, di) is its counterpart in the original image, and θ is the angle of rotation. In this paper, the ROI images are rotated 360 times, i.e. θ traverses 0 to 359. Since the rotation is conducted 360 times for every image in the MSTAR dataset, the number of images increases 360-fold, which greatly augments the original dataset.

Fig. 2 shows an ROI image of the MSTAR dataset rotated at four angles: Fig. 2a is the ROI of the original image, and Figs. 2b–d show the ROI rotated by 90°, 180° and 270°, respectively.

Fig. 2  Examples of ROI images rotating at different angles
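A sketch of this full-angle augmentation is given below, using SciPy's image rotation as a stand-in for (2). The negative angle encodes the clockwise direction, which is our reading of the sign convention, and zero-filling of out-of-frame pixels is likewise an assumption.

```python
import numpy as np
from scipy.ndimage import rotate

def full_angle_augment(roi: np.ndarray) -> list:
    """Rotate a 64 x 64 ROI by one degree per step, theta = 0..359,
    yielding the 360-fold augmented set described above."""
    # reshape=False keeps the 64 x 64 frame; uncovered pixels become 0
    return [rotate(roi, angle=-theta, reshape=False, order=1, cval=0.0)
            for theta in range(360)]

augmented = full_angle_augment(np.random.rand(64, 64))
assert len(augmented) == 360 and augmented[90].shape == (64, 64)
```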

Fig. 3  Inception block

3.2 Inception network for SAR ATR

3.2.1 Inception block: To maintain the sparsity of the network structure while exploiting the high computational efficiency of dense matrices, this paper introduces the Inception structure into SAR ATR. Considering that multiple convolutional kernels of different sizes can enhance network resilience, the current representation of the Inception module is restricted to filter sizes 1 × 1, 3 × 3 and 5 × 5. Stacking many Inception layers would eventually lead to a large number of model parameters and a greater dependence on computing resources; the 1 × 1 convolutional layers used in the Inception structure reduce the feature map dimension.

As can be seen from Fig. 3, 1 × 1 convolutions are used to reduce the computation before the expensive 3 × 3 and 5 × 5 convolutions. In addition to being used for reductions, the 1 × 1 convolutional layers can also be followed by rectified linear activation layers to increase the non-linearity of the extracted features. As shown in Fig. 3, there are four branches within the Inception block. The input base feature maps are processed by each branch, and the four groups of feature maps are then aggregated into one group with a high degree of polymerised information, which is of great benefit for SAR ATR.
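A PyTorch sketch of such a four-branch block follows. The branch layout matches the description of Fig. 3, but the constructor arguments (per-branch channel widths) are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Four branches as in Fig. 3: 1x1; 1x1 -> 3x3; 1x1 -> 5x5;
    pool -> 1x1. Outputs are concatenated along channels (see (8))."""
    def __init__(self, c_in, c1, c3r, c3, c5r, c5, cp):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(c_in, c1, 1), nn.ReLU(True))
        self.b2 = nn.Sequential(nn.Conv2d(c_in, c3r, 1), nn.ReLU(True),
                                nn.Conv2d(c3r, c3, 3, padding=1), nn.ReLU(True))
        self.b3 = nn.Sequential(nn.Conv2d(c_in, c5r, 1), nn.ReLU(True),
                                nn.Conv2d(c5r, c5, 5, padding=2), nn.ReLU(True))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, cp, 1), nn.ReLU(True))

    def forward(self, x):
        # every branch preserves spatial size, so channel-wise concat is valid
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], 1)
```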
3.2.2 Inception-SAR NET: In this paper, we further study the structure of the Inception network and apply it to SAR ATR. As shown in Fig. 4, this paper proposes a network structure for SAR recognition based on the aforementioned Inception block. We utilise a convolutional layer followed by a ReLU layer as the feature extraction layers for SAR images. Given an input image I ∈ R^(kh × kw) with n elements (n = kh ⋅ kw), reshaped as a vector x ∈ R^n,

$$x = [x_1, x_2, \ldots, x_n]^T \tag{3}$$

The vector x ∈ R^n can be the activations from a convolutional layer of the CNN obtained from the SAR image. This feature extraction can be parameterised as ϕ0(x; θ), where θ represents the parameters of the CNN.

Fig. 4  Proposed SAR recognition net with Inception block

The feature extraction network is formulated as

$$P_{ij} = \phi\left(\sum_{i=1}^{n}\sum_{j=1}^{n} x_{ij}\,\theta_{ij} + b\right) \tag{4}$$

where b is the bias and ϕ(⋅) denotes the operation of the feature extraction network, which can be expressed as

$$f(x) = \begin{cases} 0 & (x \le 0) \\ x & (x > 0) \end{cases} \tag{5}$$

Pij is the extracted feature map, which is fed into the Inception module after max pooling. Max pooling takes the maximum of the feature points in a neighbourhood, which can be written as the following optimisation problem:

$$h_{m,j} = \max_{i \in N_m} p_{ij}, \quad j = 1, 2, \ldots, k \tag{6}$$

In the Inception module, the different receptive fields of the convolutional branches provide different contextual information, which is very important for SAR target recognition. Four branches operate on the extracted features. Convolutional kernels of sizes 1, 3 and 5 are used mainly for the sake of alignment, meaning the features can be reshaped into the same dimension after the 1 × 1 convolutional layers and then spliced together directly. The outputs of the branches can be expressed as (y1, y2, y3, y4). The process of inference is represented as follows:

$$y_1 = \phi\left(\sum_{m}\sum_{j} \max(h_{m,j})\,\theta_{m,j} + b\right), \quad y_2 = G(G(h_{m,j})), \quad y_3 = G(h_{m,j}), \quad y_4 = G(G(h_{m,j})) \tag{7}$$

where G denotes the function of a branch in the Inception module. After the features are processed by the Inception module, the intermediate predictions of all branches are assembled to generate the final output in an aggregation manner:

$$p_{\text{last}} = \operatorname{cat}(y_1, y_2, y_3, y_4) \tag{8}$$

where cat(⋅) is a function that concatenates all inputs along the channel axis. We then use average pooling to reduce the dimension of the feature map. Average pooling can be written as the following optimisation problem:

$$\operatorname{avg}(p_{\text{last}}) = \arg\min_{u} \frac{1}{2}\sum_{i=1}^{J} \lVert u - p_{\text{last}} \rVert^2 \tag{9}$$

where J is the number of steps the pooling template moves and u is a fixed-length feature vector. Finally, we utilise a dropout layer to prevent over-fitting and improve the recognition rate; the loss function of our model is a softmax function. To assess the effectiveness of the proposed method, comprehensive experiments with three further Inception blocks were performed on the augmented MSTAR dataset. The structures of these three Inception blocks are shown in Fig. 5: one with a single 3 × 3 convolutional branch, one with a single 5 × 5 convolutional branch, and one with two branches of 3 × 3 and 5 × 5 kernel size. An Inception block with more branches can polymerise more feature maps. For convenience, the Inception blocks in Figs. 5a–c are referred to as Inception-SAR-3, Inception-SAR-5 and Inception-SAR-B2, respectively.
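To make the pipeline of Fig. 4 concrete, here is a minimal PyTorch sketch assembling the stages described by (4)-(9), reusing the InceptionBlock sketch above. The depth, channel widths and use of a single Inception block are our illustrative assumptions (the actual network may stack more blocks), and the softmax is folded into the cross-entropy loss at training time.

```python
import torch
import torch.nn as nn  # InceptionBlock is defined in the earlier sketch

class InceptionSARNet(nn.Module):
    """Conv + ReLU feature extraction (4), max pooling (6), an Inception
    block ((7)-(8)), average pooling (9), dropout and a linear classifier."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 5, padding=2), nn.ReLU(True),
            nn.MaxPool2d(2),                             # 64x64 -> 32x32
            InceptionBlock(32, 16, 16, 32, 8, 16, 16),   # 16+32+16+16 = 80 ch
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),       # average pooling (9)
            nn.Dropout(0.5), nn.Linear(80, n_classes),
        )

    def forward(self, x):
        return self.head(self.features(x))  # logits; softmax lives in the loss

# one 64 x 64 single-channel SAR ROI in, three class scores out
scores = InceptionSARNet()(torch.randn(1, 1, 64, 64))
assert scores.shape == (1, 3)
```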
4 Experiments and analysis

4.1 Dataset description

To evaluate the performance of the proposed method, experiments on X-band HH-polarisation high-resolution SAR were carried out on the MSTAR database collected and provided by Sandia National Laboratories. The original images in the MSTAR database have a resolution of 128 × 128. The airborne SAR acquires the training images at a pitch angle of 17°, while the test images are imaged at a pitch angle of 15°. Three types of military targets within the MSTAR dataset are used: T72 (main battle tank), BMP2 (armoured personnel carrier) and BTR70 (armoured personnel carrier).

Figs. 6 and 7 show the optical images of the three targets and the corresponding SAR images, respectively.

Fig. 5  Three proposed Inception blocks: (a) Inception block of one branch with pooling layer, (b) Inception block of one branch with 5 × 5 kernel size, (c) Inception block with two branches with 3 × 3 and 5 × 5 kernel sizes

Fig. 6  Optical images of three kinds of targets: (a) T72 target image, (b) BTR70 target image, (c) BMP2 target image

Fig. 7  SAR imaging of three kinds of targets: (a) T72 SAR image, (b) BTR70 SAR image, (c) BMP2 SAR image

4.2 Experimental results and analysis

4.2.1 Experimental results and analysis with various Inception block structures: To evaluate the proposed methods, full-angle data augmentation was first conducted on the MSTAR dataset. Four models were then trained and tested on the augmented MSTAR dataset: Inception-SAR, Inception-SAR-3, Inception-SAR-5 and Inception-SAR-B2. The overall accuracies of these four models are shown in Table 1.

Table 1  Overall accuracies of four models on MSTAR dataset
No.  Model             Overall accuracy, %
1    Inception-SAR     96.41
2    Inception-SAR-3   92.16
3    Inception-SAR-5   92.31
4    Inception-SAR-B2  94.29

From Table 1, Inception-SAR-5 and Inception-SAR-3 achieve roughly equal performance, with the overall accuracy of Inception-SAR-5 only 0.15% higher than that of Inception-SAR-3. Inception-SAR-B2 achieves an overall accuracy of 94.29% on the MSTAR dataset, which is 2.05% higher than the average accuracy of Inception-SAR-3 and Inception-SAR-5. Inception-SAR achieves an overall accuracy of 96.41%, in turn 2.12% higher than that of Inception-SAR-B2. It can thus be concluded from Table 1 that models with more branches achieve better performance. The reason is that an Inception block with more branches can extract more complex features, which better characterise the SAR images. The results in Table 2 likewise support the view that models with more branches in their Inception blocks aggregate more feature maps and, in return, achieve better performance.

Tables 2–5 are the confusion matrices of the four models. As can be seen from Tables 3 and 4, when the convolutional kernel size changes from 3 × 3 to 5 × 5, the per-class accuracies remain roughly the same, except that the accuracy on T72 declines by 6.12%. Comparing Tables 3–5 with Table 2, it can be concluded that as the number of branches in the Inception block increases, the per-class accuracies increase along with the overall accuracy, which further validates the effectiveness of the Inception model.

Table 2  Confusion matrix of model Inception-SAR
Testing targets   Recognition results      Average per class, %
                  BMP2   BTR    T72
BMP2              577    4      1          99.14
BTR               35     546    6          93.02
T72               0      3      193        98.47

Table 3  Confusion matrix of model Inception-SAR-3
Testing targets   Recognition results      Average per class, %
                  BMP2   BTR    T72
BMP2              567    13     2          97.42
BTR               69     507    11         86.37
T72               2      10     184        93.88

Table 4  Confusion matrix of model Inception-SAR-5
Testing targets   Recognition results      Average per class, %
                  BMP2   BTR    T72
BMP2              567    15     0          97.42
BTR               52     521    14         88.76
T72               3      21     172        87.76

Table 5  Confusion matrix of model Inception-SAR-B2
Testing targets   Recognition results      Average per class, %
                  BMP2   BTR    T72
BMP2              573    8      1          98.45
BTR               47     528    12         89.55
T72               1      9      186        94.90
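For completeness, the following sketch shows how such a confusion matrix and the per-class and overall accuracies can be computed from model predictions; the class ordering (0 = BMP2, 1 = BTR70, 2 = T72) is our assumption for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows: true class; columns: predicted class (as in Tables 2-5)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# toy example; real inputs would be the test labels and model argmax outputs
cm = confusion_matrix([0, 0, 1, 2, 2], [0, 1, 1, 2, 2])
per_class = cm.diagonal() / cm.sum(axis=1)   # last column of Tables 2-5
overall = cm.diagonal().sum() / cm.sum()     # overall accuracy as in Table 1
```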

4.2.2 Experimental results and analysis with full-angle data augmentation: To verify the feasibility of the full-angle data augmentation method, comparative experiments using Inception-SAR NET were conducted: one on the raw MSTAR dataset and the other on the augmented MSTAR dataset. Table 6 shows the performance comparison of Inception-SAR NET before and after data augmentation. As the table shows, the performance of Inception-SAR NET is greatly improved after full-angle data augmentation, which is significant evidence that enriching the azimuth information of the dataset can reduce the performance decline caused by input image rotation.

Table 6  Performance comparison of Inception-SAR NET before and after data augmentation
Experiment no.  Data augmentation  Average accuracy, %
1               without            92.57
2               with               96.41

4.2.3 Performance comparison of different methods: Table 7 shows the performance comparison of different methods. The proposed model achieves relatively higher performance than the other methods. Among the methods in Table 7, only the proposed model and SARnet utilised data augmentation, and they achieved the top two overall accuracies, which suggests that data augmentation generally boosts performance.

Table 7  Performance comparison of different methods
Methods                       Overall accuracy, %
SNN + SVM [11]                90.62
all-in-one CNN [12]           93.16
MSRC [13]                     93.66
additional feature CNN [14]   94.38
SARnet [8]                    95.68
proposed model                96.41

5 Conclusion

To effectively improve the performance of SAR ATR, a full-angle data augmentation method and Inception-SAR NET were proposed here. The full-angle data augmentation method significantly reduces the impact of severe image rotation and makes CNNs more robust. Meanwhile, Inception-SAR NET, which aggregates multi-branch features, achieves superior recognition performance. Finally, experimental results on the MSTAR dataset show that the proposed method achieves state-of-the-art performance.

6 Acknowledgments

This work is supported by the NNSF (no. 61771347); the Characteristic Innovation Project of Guangdong Province (no. 2017KTSCX181); the Young Innovative Talents Project of Guangdong Province (no. 2017KQNCX206); the Jiangmen Science and Technology Project ([2017] no. 268); and the Youth Foundation of Wuyi University (no. 2015zk11).

7 References

[1] Raney, R.K., Runge, H., Bamler, R., et al.: 'Precision SAR processing using chirp scaling', IEEE Trans. Geosci. Remote Sens., 1994, 32, (4), pp. 786–799
[2] Wahl, D.E., Eichel, P.H., Ghiglia, D.C., et al.: 'Phase gradient autofocus – a robust tool for high resolution SAR phase correction', IEEE Trans. Aerosp. Electron. Syst., 1994, 30, (3), pp. 827–835
[3] Szegedy, C., Liu, W., Jia, Y., et al.: 'Going deeper with convolutions'. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, June 2015, pp. 1–9, doi: 10.1109/CVPR.2015.7298594
[4] Ross, T.D., Worrell, S.W., Velten, V.J., et al.: 'Standard SAR ATR evaluation experiments using the MSTAR public release data set'. Aerospace/Defense Sensing and Controls, Int. Society for Optics and Photonics, Orlando, FL, USA, 1998, pp. 566–573
[5] Tao, W., Xi, C., Xiangwei, R., et al.: 'Study on SAR target recognition based on support vector machine'. 2nd Asian-Pacific Conf. on Synthetic Aperture Radar, Xian, China, 2009, pp. 856–859
[6] Huijiao, Q., Yinghua, W., Jun, D., et al.: 'SAR target recognition based on multi-information dictionary learning and sparse representation', Syst. Eng. Electron., 2015, 37, (6), pp. 1280–1287
[7] Lu, W., Fan, Z., Wei, L., et al.: 'A method of SAR target recognition based on Gabor filter and local texture feature extraction', J. Radars, 2015, 4, (6), pp. 658–665, doi: 10.12000/JR15076
[8] Xu, Y., Liu, K., Ying, Z., et al.: 'SAR automatic target recognition based on deep convolutional neural network'. Int. Conf. on Image and Graphics, Springer, Cham, 2017, pp. 656–667
[9] Furukawa, H.: 'Deep learning for target classification from SAR imagery: data augmentation and translation invariance', IEICE Tech. Rep., 2017, 117, (182), pp. 13–17
[10] Gao, F., Yue, Z., Wang, J., et al.: 'A novel active semisupervised convolutional neural network algorithm for SAR image recognition', Comput. Intell. Neurosci., 2017, 2017, (24), p. 3105053
[11] Li, X., Li, C., Wang, P., et al.: 'SAR ATR based on dividing CNN into CAE and SNN'. IEEE 5th Asia-Pacific Conf. on Synthetic Aperture Radar (APSAR), Singapore, 2015, pp. 676–679, doi: 10.1109/APSAR.2015.7306296
[12] Ding, J., Chen, B., Liu, H., et al.: 'Convolutional neural network with data augmentation for SAR target recognition', IEEE Geosci. Remote Sens. Lett., 2016, 13, (3), pp. 364–368
[13] Dong, G., Wang, N., Kuang, G.: 'Sparse representation of monogenic signal: with application to target recognition in SAR images', IEEE Signal Process. Lett., 2014, 21, (8), pp. 952–956
[14] Cho, J.H., Chan, G.P.: 'Additional feature CNN based automatic target recognition in SAR image'. Asian Conf. on Defence Technology – Japan, Tokyo, Japan, 2017, p. 1
