This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSTARS.2020.3038683, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
Abstract—Synthetic aperture radar (SAR) has all-day and all-weather characteristics and plays an extremely important role in the military field. The breakthroughs in deep learning methods represented by convolutional neural network (CNN) models have greatly improved the SAR image recognition accuracy. Classification models based on CNNs can perform high-precision classification, but there are security problems against adversarial examples (AEs). However, the research on AEs is mostly limited to natural images, and remote sensing images (SAR, multispectral, etc.) have not been extensively studied. To explore the basic characteristics of adversarial examples of SAR images (ASIs), we use two classic white-box attack methods to generate ASIs from two SAR image classification datasets and then evaluate the vulnerability of six commonly used CNNs. The results show that ASIs are quite effective in fooling CNNs trained on SAR images, as indicated by the obtained high attack success rate. Due to the structural differences among CNNs, different CNNs present different vulnerabilities in the face of ASIs. We found that ASIs generated by nontarget attack algorithms feature attack selectivity, which is related to the feature space distribution of the original SAR images and the decision boundary of the classification model. We proposed the sample-boundary-based adversarial example selectivity distance (AESD) to successfully explain the attack selectivity of ASIs. We also analyzed the effects of image parameters such as image size and number of channels on the attack success rate of ASIs through parameter sensitivity. The experimental results of this study provide data support and an effective reference for attacks on and the defense capabilities of various CNNs with regard to AEs in SAR image classification models.

Index Terms—Synthetic aperture radar, convolutional neural network, adversarial example.

This work was supported by the National Natural Science Foundation of China (41871364, 41871276, 41871302 and 41861048). This work was carried out in part using computing resources at the High Performance Computing Platform of Central South University. Haifeng Li (e-mail: lihaifeng@csu.edu.cn), Haikuo Huang, Li Chen, Jian Peng, Haozhe Huang, Zhengqi Cui, and Xiaoming Mei are with the School of Geosciences and Info-Physics, Central South University, Changsha 410083, China. Guohua Wu is with the School of Traffic and Transportation Engineering, Central South University, Changsha 410083, China. Corresponding author: Guohua Wu (e-mail: guohuawu@csu.edu.cn).

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

I. INTRODUCTION

SAR is a sensor that actively emits microwaves, which improves the azimuth resolution through the principle of a synthetic aperture to obtain large-area high-resolution radar images. The SAR image is the image data acquired by the microwave band, which contains only the echo information of one band. It is recorded in the form of binary complex numbers. The data of each pixel can extract the corresponding amplitude and phase information. The amplitude information is the backscattering intensity of the radar wave by the ground target, and it can be used to identify and classify the target. The radar target recognition technology is based on the radar echo signal to extract the target feature and realize the automatic judgment of the target attribute, class or type [1]. Using image recognition technology for target recognition is the most intuitive method in the field of automatic target recognition, so our research work is mainly focused on the two-dimensional radar image formed by the amplitude information obtained by imaging the target by synthetic aperture radar.

Target recognition based on two-dimensional SAR images has three main steps: image preprocessing, feature extraction, and classification decision. The design and selection of features directly determine the accuracy of target recognition. Eryildirim et al. [2] extracted features based on two-dimensional cepstrum with the aim of discriminating between clutter and man-made objects in a SAR image. Gaglione et al. [3] proposed a recognition algorithm for full-polarimetric SAR images, which is based on the pseudo-Zernike moments and the Krogager decomposition components and exploited the multisource data offered by different sensors. Clemente et al. [4] proposed a novel algorithm for automatic target recognition that is capable of exploiting single or multichannel SAR images to extract features based on pseudo-Zernike moments (pZm) for target recognition. The method for SAR target recognition based on dictionary learning and joint dynamic sparse representation (DL-JDSR) was proposed by Sun et al. [5] and combined amplitude features and scale-invariant feature transform (SIFT) features in the recognition process. An algorithm for automatic target recognition based on Krawtchouk moments was proposed by Clemente et al. [6] and had high reliability in the presence of noise and reduced sensitivity to discretization errors.

Because the SAR imaging mechanism is different from the optical imaging system, different terrains in the two-dimensional images generated by the SAR imaging system exhibit several special phenomena such as shadows, overlap, and perspective shrinkage. In addition, SAR images have coherent speckle noise, and the visual readability is poor. It is difficult to manually design effective features for SAR image target recognition. Different from the traditional automatic target recognition technology based on artificial design features [7], [8], deep neural networks, especially convolutional neural networks (CNNs), can automatically learn target features for automatic target recognition [9], which reduces the computational cost and improves the recognition accuracy.

With the rapid development of deep learning in the field of computer vision, image classification and recognition methods
based on CNN models have been widely used for target recognition in SAR images. For example, Shao et al. [10] analyzed and compared the performance of different CNNs on the MSTAR [11] dataset based on accuracy, number of parameters, training time and other metrics to verify the superiority of CNNs for SAR image target recognition. Wang et al. [12] proposed despeckling and classification coupled CNNs to distinguish multiple categories of ground targets in SAR images with strong and varying speckle. Shang et al. [13] proposed deep memory convolution neural networks to alleviate the problem of overfitting caused by insufficient SAR image samples, and their method achieved higher accuracy than several other well-known SAR image classification algorithms. Huang et al. [14] proposed an improved deep Q-network method for polarimetric SAR (PolSAR) image classification, and the experimental results demonstrated that the proposed method has a better classification performance than traditional supervised classification methods.

Although CNNs perform very well in the field of image classification and recognition, they are highly sensitive to adversarial perturbations (APs). APs can easily fool classifiers based on CNNs [15]–[19]. The purpose of adding APs to the original images is to fool a classifier into misclassifying them. We refer to such an image with an adversarial perturbation (AP) as an adversarial example (AE). In many cases, APs are difficult for humans to perceive, but they still cause CNNs to incorrectly predict the labels of AEs. The existence of AEs carries hidden dangers in the military and other fields with high security requirements. Fig.1 shows that the adversarial SAR image (ASI) is misclassified with high confidence by the CNN, and humans can barely perceive any differences between the original SAR image and the ASI.

Fig. 1: On the left is the original SAR image. A CNN correctly identified it as a T-62 main battle tank with a confidence of 99.62%. On the right is the ASI. After adding an AP to the original SAR image, the CNN identified it as a Sandia Laboratory Implementation of Cylinders (SLICY) false target with a confidence of 99.93%.

Previous research on AEs has shown that they can easily fool natural image classifiers [20]–[23]. However, there are few studies on AEs involving remote sensing fields such as SAR, which limits our understanding of the security of remote sensing image classification models. Czaja et al. [24] were the first to analyze the problem of AEs in remote sensing images. The results showed that AEs can cause errors in even the most advanced remote sensing image classifiers. Li et al. [25] proposed a robust structure that can detect AEs in high-resolution remote sensing images. Nevertheless, we are unclear regarding the characteristics of ASIs at present. In this paper, we mainly discuss the characteristics of AEs for SAR image classification models based on CNNs.

CNN classification models trained on SAR images are easily fooled by ASIs with special added noise. We can use attack algorithms to generate this special noise. It is special as follows: First, this noise is so small that it is almost imperceptible when added to the original image. Second, the pixels of the original image are modified to generate an adversarial image using gradient descent along the direction in which the CNN incorrectly predicts, thus easily fooling the CNN. Speckle noise, in contrast, is caused by the coherence principle on which SAR imaging is based. Speckle noise is very obvious in SAR images, and its randomness means it does not necessarily fool the CNN into making a wrong classification.

We describe a possible problem in SAR image recognition, i.e., AEs lead to misclassification by CNNs. We discovered an interesting phenomenon of the AE by experiment. This phenomenon has not yet been studied, so we proposed a hypothesis for the occurrence of this phenomenon and verified our conjecture through experiments. For CNNs trained on SAR images, we observed a common feature of the ASIs generated by different attack algorithms, i.e., ASIs generated by nontarget attack algorithms have a preference for selecting attack classes. Previous research on AEs in the field of remote sensing did not explain this fact, but we proposed a new metric, the adversarial example selective distance (AESD), to successfully explain the phenomenon of the attack selectivity of ASIs. Our research on this phenomenon will provide a theoretical basis for designing new attack methods.

The main contributions of our work are listed as follows:
• We propose the adversarial example selective distance (AESD) to analyze the attack selectivity of ASIs generated by different nontarget attack algorithms, which provides us with the opportunity to quickly achieve targeted attacks and to understand the underlying geometry of ASIs.
• By extensively analyzing AEs on the MSTAR and SENSAR datasets with two attack models and six popular deep CNN models, we obtain several interesting conclusions:
• The network structure of the CNN has a great influence on robustness. The simpler the structure is, the higher the robustness of the CNN.
• For the training data, adding auxiliary information in the form of channels can improve the accuracy of the CNN, but it will reduce its robustness.
• In addition, the use of cropping to remove information unnecessary for classification and recognition from the image can improve the robustness of CNNs against adversarial attacks.
• Simply pursuing higher accuracy will lose robustness. We suggest that designers should weigh the balance between accuracy and robustness when designing new CNNs.

The rest of this paper is organized as follows. In Section II, we briefly review the related works on AEs. Section III introduces the principle of the attack algorithms and defines
the AESD. Section IV reports on the experimental results and provides an analysis. Finally, Section V provides a summary and conclusion.

II. RELATED WORKS

In recent years, breakthroughs have been achieved in the application of CNNs to the field of remote sensing. Remote sensing automatic discrimination technology based on CNNs has been widely used in social life and military fields [26]–[30]. In the past, researchers have expended considerable effort in studying how to improve the accuracy of models by designing better model architectures, proposing more effective loss functions, and expanding datasets to improve data diversity. Before the discovery of AEs, the accuracy of CNNs was a consistent research concern. However, few people have questioned the safety and reliability of CNNs. Szegedy et al. [31] were the first to discover that CNNs are easily fooled by tiny perturbations. These perturbations cause CNNs to misclassify AEs with high confidence. Moreover, AEs generated by one neural network can fool other neural networks. These findings have caused widespread concern among researchers about the safety of deep learning. Many other later works also studied these interesting properties, but no complete theory yet exists to explain this phenomenon [32]–[35]. Goodfellow et al. [20] believe that the linear nature of neural networks in high-dimensional spaces leads to the generation of AEs. In a high-dimensional space, the infinitely small perturbations in AEs accumulate during the forward propagation of the network, which can cause large changes in the output and result in errors.

As CNNs are increasingly applied in practice, many researchers have begun to focus on the security of CNNs and have studied AEs in areas such as autonomous driving [36] and face recognition [37], [38], finding that AEs reduce the robustness of CNNs. AEs exist not only in computer vision fields, such as image classification [21]–[23], [39]–[41], object detection [42], [43], and semantic segmentation [44], [45], but also in natural language processing [46], [47] and speech recognition [48], [49].

Various attack algorithms exist for generating AEs. Taking image classification as an example, the attack algorithms can be divided into different types. Based on the information that attackers can obtain, the attack algorithms can be divided into white-box attacks [20]–[23], [31], [39], [40] and black-box attacks [15], [41]. In a white-box attack, an attacker can obtain complete information about the target model, including its parameters, structure, training method, and even training data. In contrast, in a black-box attack, the attacker does not know specific information about the model but can observe the output by submitting various inputs. The attacker then uses the correspondence relationships between the inputs and outputs to find suitable AEs with which to attack the model. According to whether the attack class is directional, the attack algorithms can be divided into target attacks [20], [22], [31], [39], [41] and nontarget attacks [15], [21], [23], [40]. A target attack deceives the model into incorrectly predicting the AEs as specified labels, while a nontarget attack needs only to cause the model to predict the labels of the AEs incorrectly.

With the rapid development of remote sensing technology, the amount of data we can obtain is constantly increasing, and it has become increasingly difficult to process massive amounts of data manually. Remote sensing image processing technology has gradually transformed from using traditional visual interpretation to relying on automated methods. Data-driven CNNs have made good progress in automatic remote sensing image processing. Currently, many CNN methods have been applied to automatic remote sensing image processing. The existence of AEs may have a serious impact on the practical application of these methods. SAR is widely used in the military field, and military applications have extremely high requirements for security. Nevertheless, we know little about ASIs at the moment. Therefore, we start with classic attack algorithms to study the characteristics of ASIs and hope that our work can be helpful for future research.

III. METHODOLOGY

In this section, we introduce the methods used in the experiment in detail. First, the definition of ASIs is introduced. Next, the methods of generating ASIs are introduced. Finally, the AESD we proposed is introduced.

A. Problem description of ASIs

First, we formalize ASIs. Let X denote a SAR image dataset in which each SAR image is denoted as x ∈ R^(C×H×W), and let k represent a classification model that outputs a predicted label k(x) for each SAR image x ∈ X.

    k(x̃) ≠ k(x) for most x ∈ X, where x̃ = x + η    (1)

where x is the original SAR image, x̃ is the ASI, η ∈ R^(C×H×W) is the AP, and C, H and W are the number of channels, height and width of a SAR image, respectively. The classification model k misclassifies the ASIs x̃ generated from the vast majority of original SAR images x in the SAR image dataset X after the perturbation η is added. We use the ∞-norm to constrain the perturbation. When the perturbation η is sufficiently small, humans cannot visually distinguish an ASI from an original SAR image.

    ‖η‖_p = ( Σ_{0≤i<H, 0≤j<W} |η_ij|^p )^(1/p) < δ    (2)

    p = ∞  ⟹  ‖η‖_∞ = max_{0≤i<H, 0≤j<W} |η_ij|    (3)

where p specifies the norm (here, p = ∞) and δ controls the magnitude of the perturbation.
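To make the ∞-norm constraint of Eqs. (2)–(3) concrete, the following NumPy sketch checks whether a perturbation stays inside the allowed ball. The array shape and the δ value are illustrative assumptions, not values from the paper:

```python
import numpy as np

def linf_norm(eta: np.ndarray) -> float:
    """||eta||_inf = max |eta_ij| over all pixels (Eq. 3)."""
    return float(np.abs(eta).max())

def is_valid_perturbation(x: np.ndarray, x_adv: np.ndarray, delta: float) -> bool:
    """True if the AP eta = x_adv - x satisfies ||eta||_inf < delta (Eq. 2 with p = inf)."""
    return linf_norm(x_adv - x) < delta

# Illustrative single-channel 64x64 "SAR image" and a uniform perturbation.
x = np.zeros((1, 64, 64))
x_adv = x + 0.01
print(is_valid_perturbation(x, x_adv, delta=0.02))  # True, since 0.01 < 0.02
```

Because only the largest per-pixel change matters under the ∞-norm, a perturbation spread over the whole image can still be "small" in this sense while shifting every pixel, which is exactly why ASIs remain visually indistinguishable from the originals.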
B. Method for generating ASIs

When training a CNN, the parameters are updated by subtracting the gradient obtained by backpropagation so that the loss value becomes increasingly smaller and the probability of the model prediction becomes increasingly higher. In a nontarget attack, the goal is for the model to misclassify the input image into any class other than the correct class. We need only to increase the loss value to achieve this goal. This is the opposite of the purpose of updating the parameters when training the CNN.

Considering the speed and attack success rate of ASIs generated by the attack algorithm, we use two classic white-box attack algorithms, the fast gradient sign method (FGSM) [20] and the basic iterative method (BIM) [21]. FGSM is currently the attack algorithm most widely used to attack image classifiers based on CNNs because it needs only a one-step gradient-increasing operation to generate ASIs. Using FGSM to generate ASIs is very fast, but this also limits the attack success rate. We also use BIM, which has a higher attack success rate. BIM is an iterative algorithm that fools CNNs more easily by generating ASIs through multiple iterations, but it is slower than FGSM.

1) Fast gradient sign method: Goodfellow et al. [20] proposed the FGSM to generate AEs. The FGSM attack algorithm is as follows:

    x̃ = x + ε · sign(∇_x J(θ, x, y))    (4)

    J(θ, x, y) = − Σ_{k=1}^{c} y_k log p_k    (5)

    sign(x) = −1 if x < 0;  0 if x = 0;  1 if x > 0    (6)

where J(θ, x, y) is the cross-entropy loss of the model with parameters θ for input x and true label y, and the parameter ε is used to control the amplitude of the attack noise. The larger the parameter value is, the greater the attack intensity and the easier it is to observe the noise. We add the perturbation η to the original image x to obtain the adversarial image x̃.

2) Basic iterative method: The FGSM is fast, but it uses only one gradient update, and sometimes one update is not enough to attack successfully. Thus, Kurakin et al. [21] proposed the BIM to generate AEs. The BIM attack algorithm is as follows:

    x_0 = x,  x_{i+1} = clip( x_i + α · sign(∇_x J(θ, x_i, y)) )    (7)

where α is the step size of each iteration and clip(·) keeps the intermediate result within the allowed perturbation range and valid pixel values.
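As a minimal illustration of Eqs. (4)–(7), the sketch below implements both attacks with NumPy. The callable `grad_fn(x, y)` is a hypothetical stand-in for a deep learning framework's autograd, returning ∇_x J(θ, x, y); the pixel range [0, 1] and the clipping scheme for BIM are assumptions rather than details from the paper:

```python
import numpy as np

def fgsm(x, y, grad_fn, eps):
    """Single-step FGSM (Eq. 4): move every pixel by eps in the
    direction that increases the loss J(theta, x, y)."""
    return x + eps * np.sign(grad_fn(x, y))

def bim(x, y, grad_fn, alpha, eps, steps):
    """BIM (Eq. 7): repeat small FGSM steps of size alpha, clipping after
    each step so the total perturbation stays inside the eps ball around x
    and the result remains a valid image."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # enforce ||x_adv - x||_inf <= eps
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep pixel values valid
    return x_adv
```

With a toy quadratic "loss" J = ½‖x − y‖² (so `grad_fn = lambda x, y: x - y`), both attacks push the input away from y until the perturbation reaches the ε boundary, mirroring how the real attacks push an image across the nearest decision boundary while staying visually unchanged.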
w = ∇_x f(x_0) approximates the distance between the sample x_0 and the decision boundary P as follows:

    l(x_0) = |f(x_0)| / ‖∇_x f(x_0)‖_2    (10)

Fig. 4: Sample-boundary distance for a nonlinear binary classifier.

i.e., the model predicts the sample as the output value of the c-th class. ∇_x f(x) is the partial derivative of the predicted vector f(x) with respect to the sample x.

Fig.6 shows the main process of analyzing ASIs. We divided the SAR dataset into a training dataset and a test dataset. The CNN is trained on the training dataset, and the accuracy of the model is calculated using the test dataset. The misclassification of the model itself affects our calculation of the attack success rate of ASIs. Therefore, for the test dataset, we set a confidence threshold to retain only the test data that the model can correctly classify. In our experiments, the threshold is 0.7; a test image whose confidence is less than the threshold is considered an unreliable prediction. We use the filtered test images as the original images to generate the adversarial images with the two attack algorithms, FGSM and BIM. The correctness of AESD is verified by calculating the AESD of the original images and checking whether the class indicated by the AESD is consistent with the predicted label of the adversarial images. The attack success rate is used to evaluate the vulnerability of CNNs and to show the impact of adversarial image parameters such as image size and number of channels.
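Eq. (10) gives the linearized distance from a sample to a single decision boundary. The paper's exact multiclass AESD formula appears in a part of the derivation not reproduced in this excerpt, so the sketch below is our own assumption: a DeepFool-style pairwise linearization that ranks every class k other than the predicted class c by the distance to the boundary f_c − f_k = 0 and treats the nearest one as the class a nontarget attack is expected to select:

```python
import numpy as np

def boundary_distance(f_x0: float, grad_f_x0: np.ndarray) -> float:
    """Linearized sample-boundary distance (Eq. 10): |f(x0)| / ||grad_x f(x0)||_2."""
    return abs(f_x0) / np.linalg.norm(grad_f_x0)

def boundary_ranking(logits: np.ndarray, grads: np.ndarray, true_cls: int) -> list:
    """Rank classes k != true_cls by the linearized distance to the pairwise
    boundary f_true - f_k = 0, nearest first (an assumed multiclass reading
    of the AESD). grads[k] is the gradient of the k-th output with respect
    to the flattened input. The first entry is the class the nontarget
    attack is expected to choose."""
    dists = []
    for k in range(len(logits)):
        if k == true_cls:
            continue
        w = grads[true_cls] - grads[k]  # normal vector of the pairwise boundary
        d = abs(logits[true_cls] - logits[k]) / (np.linalg.norm(w) + 1e-12)
        dists.append((d, k))
    return [k for _, k in sorted(dists)]
```

Under this reading, a class whose boundary lies closest to the sample in input space needs the smallest perturbation to be crossed, which is exactly the attack-selectivity behavior the AESD is meant to explain.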
is divided into two folders, s1 and s2, where s1 holds the SAR images and s2 holds the optical images. In our experiments, we use only the SAR images. Folder s1 contains 49 subfolders, and each subfolder corresponds to the SAR images taken of the same area, i.e., the SAR images in a given folder belong to the same area and class, and each area's class is represented by a serial number (e.g., 0, 1, 2, · · ·). We selected 20 of the 49 folders and randomly selected SAR images from each folder according to a training-to-testing ratio of 1:1.

B. Metrics and Implementation details

To evaluate the vulnerability of CNNs, we used the attack success rate as the evaluation indicator. In the attack task, the higher the attack success rate is, the more fragile the CNN. It is expressed as follows:

    attack success rate = N_diff / N_all    (12)

where N_all is the number of filtered original test images and N_diff is the number of their adversarial versions that the model misclassifies.
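The evaluation pipeline, from the confidence filtering at 0.7 described in Section III, through Eq. (12), to the Min-Dist-k agreement with the AESD reported later in Table IV, can be sketched as follows. The function and variable names are our own, not taken from the paper's code:

```python
def filter_reliable(preds, labels, confidences, threshold=0.7):
    """Indices of test images the model classifies correctly with confidence
    at or above the threshold; only these become attack originals."""
    return [i for i, (p, t, c) in enumerate(zip(preds, labels, confidences))
            if p == t and c >= threshold]

def attack_success_rate(adv_preds, labels):
    """Eq. 12: N_diff / N_all, where N_all is the number of filtered
    originals and N_diff is the number whose adversarial version is
    misclassified."""
    n_diff = sum(p != t for p, t in zip(adv_preds, labels))
    return n_diff / len(labels)

def min_dist_k_accuracy(asi_preds, boundary_rankings, k=1):
    """Min-Dist-k: fraction of ASIs whose predicted label is among the k
    classes with the smallest sample-boundary distances (k=1 gives the
    AESD agreement, k=3 gives Min-Dist-3)."""
    hits = sum(p in ranking[:k] for p, ranking in zip(asi_preds, boundary_rankings))
    return hits / len(asi_preds)

# Three of four adversarial images change the model's prediction:
print(attack_success_rate([1, 0, 2, 2], [1, 1, 1, 1]))  # 0.75
```

Filtering before attacking matters: without it, an image the model already misclassifies would count toward N_diff even though the attack did nothing, inflating the reported success rate.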
Both ResNet and DenseNet contain connection structures that propagate low-layer information through identity connections and residual connections to increase the network depth. ResNeXt uses a multibranched isomorphic structure in which the number of branches, called the "cardinality", forms another key factor for measuring neural networks in addition to depth and width. Although these structures improve model classification accuracy, they are also more likely to accumulate small perturbations in the lower layers that are amplified in the higher layers and thus have a substantial impact on the final output.

From Table II, the other five CNNs have significantly higher attack success rates than VGG16, and the attack success rate of DenseNet121 is higher than that of the other CNNs. VGG16 has the simplest structure and includes no complex modules. Consequently, VGG16 has the lowest attack success rate under ASIs. DenseNet121 has a very dense connection structure that directly connects all the layers: the input to each layer consists of the feature maps of all the preceding layers. Therefore, the small perturbations in ASIs are accumulated and enlarged through the feature map aggregation, causing DenseNet121 to be more vulnerable to attacks.

D. Attack selectivity of ASIs

By collecting statistics on the predicted labels of ASIs, we found that the predicted label distribution of ASIs is highly concentrated. Fig.13 shows the distribution and the proportion of the predicted labels of ASIs generated by the FGSM attack on GoogLeNet on MSTAR. The predicted labels of the ASIs concentrate primarily on a few classes, and most of the classes receive only a small number of ASIs. Fig.13(a) shows the nature of the 4th- and 5th-order truncation; we call it a kth-order truncation when it is larger than a certain constant k. Fig.13(b) shows that these 4–5 classes account for more than 90% of all the predicted classes, indicating that ASIs are more likely to be misidentified as these 4–5 classes. For example, the ASIs generated by the original SAR images of class T72 were classified as class T62, class ZIL131, and class BMP2, comprising 38.13%, 34.53%, and 22.30%, respectively, while the remaining classes accounted for only 5.04%.

The aggregation of the predicted labels of ASIs is related to the distribution of the original SAR images in the feature space, and it is also affected by the attack selectivity of the ASIs. In general, the distribution of the original SAR images belonging to the same class is concentrated in the feature space, and there is always a decision boundary closest to these images. The attack selectivity makes an original image tend to cross the closest decision boundary to generate an AE. Consequently, the predicted labels of ASIs generated by the original SAR images of the same class are highly concentrated. In Fig.2, the distribution of the three classes is relatively concentrated. Most samples of class A are closer to the classification boundary P2. Since the AEs generated by the nontarget attack algorithm have attack selectivity, the predicted labels of AEs generated by the original samples of class A are highly concentrated, and most of the predicted labels are class C.

Fig.14 shows the sample-boundary distance of the SAR images. In Fig.13, GoogLeNet classified the ASI generated from the original SAR image of class BTR70 as class 2S1. Class 2S1 corresponds to the class with the maximum proportion for class BTR70 in Fig.13(b), so we randomly select an image from the original images of class BTR70 to generate an AE, and this AE is most likely to be misclassified as class 2S1. The class calculated from the AESD of this original image is also class 2S1, which corresponds to the class of the shortest bar in Fig.14 and is the same as the class most likely to be attacked. This result illustrates that AESD can explain the attack selectivity of ASIs.

To verify the reliability of AESD, we counted the proportion of the predicted labels of ASIs that matched the classes indicated by AESD, i.e., the accuracy of AESD, as shown in Table IV. Min-Dist-1 is the proportion of the predicted labels of ASIs that are the same as the classes pointed to by AESD. Because the AESD is obtained from the sample-boundary distance, and the sample-boundary distance is calculated approximately, we also count Min-Dist-3, which is the proportion of the predicted labels of ASIs that match one of the classes represented by the smallest three sample-boundary distances. From Table IV, on MSTAR, the average accuracy of Min-Dist-1 for the six CNNs is 81.84%, and the average accuracy of Min-Dist-3 is 97.35%. On SENSAR, the
average accuracy of Min-Dist-1 is 68.60%, and the average accuracy of Min-Dist-3 is 89.88%. These results show that AESD is reliable in explaining the attack selectivity of ASIs.

In Table IV, the accuracy of AESD when using BIM is generally lower than the accuracy of AESD when using FGSM. We consider that a reasonable and possible explanation is that BIM is an iterative algorithm that recalculates the perturbation in each iteration. The ASIs may continuously cross different decision boundaries during the iterative process. However, the AESD we proposed is calculated from the finally generated ASIs. Thus, our method ignores the iterative process, which results in the predicted labels of ASIs generated by the iterative attack algorithm matching the AESD with a lower accuracy rate than those of the single-step attack algorithm.

Fig. 10: The middle-layer features of the SAR image in Fig.9(a) extracted by VGG16.

Fig. 11: The SAR image in Fig.9(a) outputs a 512-dimensional feature vector through the convolutional layers of VGG16.

E. Parameter sensitivity analysis

By adjusting the hyperparameters of CNNs, the model accuracies can be improved to a certain extent. From the perspective of hyperparameters, we maintain that changing the image-related hyperparameters should also have an impact on the attack success rate of ASIs. Therefore, we used MSTAR to analyze the sensitivity of the attack success rate of ASIs to changes in the image parameters (size and number of channels).

1) Sensitivity analysis of the number of channels: To analyze the effect of the number of channels on the attack success
rate of ASIs, we trained CNNs using SAR images read in both RGB and GRAY modes. The RGB mode makes two copies of the single-channel image in the channel direction to form a three-channel image.

Fig. 12: The feature vector in Fig. 11 outputs a 10-dimensional prediction vector through the fully connected layers of VGG16, and each dimension represents a class.

TABLE IV: Accuracy of AESD.

Dataset   Model         Attack   Min-Dist-1   Min-Dist-3
MSTAR     VGG16         FGSM     80.25%       96.91%
                        BIM      80.11%       96.71%
          GoogLeNet     FGSM     89.75%       98.95%
                        BIM      84.44%       98.43%
          InceptionV3   FGSM     87.58%       98.86%
                        BIM      78.37%       96.86%
          ResNet50      FGSM     84.54%       98.51%
                        BIM      77.91%       96.94%
          ResNeXt50     FGSM     86.27%       98.69%
                        BIM      76.45%       95.05%
          DenseNet121   FGSM     78.25%       96.27%
                        BIM      78.17%       96.00%
SENSAR    VGG16         FGSM     71.81%       93.24%
                        BIM      68.06%       88.74%
          GoogLeNet     FGSM     67.08%       89.61%
                        BIM      63.03%       86.97%
          InceptionV3   FGSM     76.67%       95.87%
                        BIM      66.31%       89.95%
          ResNet50      FGSM     71.90%       91.32%
                        BIM      64.86%       86.80%
          ResNeXt50     FGSM     70.50%       91.13%
                        BIM      59.44%       79.95%
          DenseNet121   FGSM     74.40%       93.94%
                        BIM      69.15%       91.05%

A CNN is a black-box model, which differs from traditional machine learning models that require artificially designed features, such as color and texture. A CNN can learn a pattern from data to extract the features it can distinguish, and these features are incomprehensible to humans. Regardless of whether it is applied to an optical image or a SAR image, a CNN relies on the pattern learned from the data and thus has the ability to correctly classify images. Because the principle of SAR imaging differs from that of optical sensors, there is a considerable amount of noise and artifacts in SAR images. These are characteristics that optical images do not have, and a CNN designed around these characteristics has higher accuracy in recognizing SAR images. In our experiment, we did not use CNNs specially designed for SAR images; only common CNNs were used. For recognition with a common CNN, there is no obvious difference between optical images and SAR images. Moreover, the grayscale information of optical images is strongly correlated with the amplitude information of SAR images, so the adversarial properties of SAR images and optical images are similar at the digital level.

However, a CNN classification model trained on optical images is more likely to be fooled by adversarial images than one trained on SAR images. Increasing the number of channels of SAR images improves the attack success rate of ASIs. From Table V, using FGSM and BIM, the attack success rates of the three-channel ASIs of all CNNs are higher than those of the single-channel ASIs. Compared with SAR images, optical images contain more bands, so the AEs of optical images can carry more adversarial information and are more likely to induce misclassification in a CNN.

2) Sensitivity analysis of the size: To analyze the impact of image size on the attack success rate, we cropped the MSTAR images to six different sizes: 128 × 128, 112 × 112, 96 × 96, 80 × 80, 64 × 64, and 48 × 48. Then, we used two methods in the experimental analysis: one performed no processing on the center-cropped image, and the other resized the center-cropped image to 128 × 128. Fig. 15 shows that decreasing the image size can reduce the attack success rate of ASIs and that different attack algorithms and model structures have different sensitivities to the sizes of ASIs.

Fig. 13: The distribution and the proportion of the predicted labels of ASIs generated by FGSM attacking GoogLeNet on MSTAR. (a) shows the distribution of the predicted labels of ASIs; the titles over the bar graphs indicate the classes to which the original SAR images belong. (b) shows the proportion of the predicted labels of ASIs; the tags in the centers of the ring charts indicate the class to which the original SAR images belong.

For VGG16 and GoogLeNet, ASIs generated by FGSM and BIM show a reduced attack success rate when the image size is reduced, and the attack success rate still maintains a significant downward trend even after the images are zoomed back to the same size. These results show that VGG16 and GoogLeNet are sensitive to changes in the size of ASIs and that reducing the sizes of ASIs is not the direct cause of the decline in the attack success rate; instead, other factors regarding ASIs lead to the decrease.

For InceptionV3, the attack success rate of ASIs generated by FGSM decreases as the image size decreases and still maintains a significant downward trend even after the images are zoomed to the same size, whereas the attack success rate of ASIs generated by BIM remains basically flat. These results show that InceptionV3 is sensitive to the size of ASIs generated by FGSM; the reduction in image size also affects other factors that indirectly reduce the attack success rate. The attack success is not sensitive to the size of ASIs generated by BIM, and reducing the image size does not cause a significant decrease in the attack success rate.

For ResNet50, ResNeXt50, and DenseNet121, the attack success rates of ASIs generated by FGSM also decrease as the image size decreases. After the images are zoomed to the same size, the attack success rate does not decrease; instead, it fluctuates, while the attack success rate of ASIs generated by BIM remains basically unchanged. These results show that ResNet50, ResNeXt50, and DenseNet121 are sensitive to the sizes of ASIs generated by FGSM and that the reduction in image size leads directly to a decrease in the attack success rate. They are not sensitive to variations in the sizes of ASIs generated by BIM.

In our experiments, SAR images read in RGB mode are three copies of the single-band image in the channel direction, which does not increase the effective information in the image. The classification accuracy is not improved, but the robustness of the CNN is reduced. The center-cropped SAR image removes part of the background information but does not reduce the effective information, so the accuracy of the CNN does not decrease while its robustness improves. The more effective the information extracted from the image by the CNN, the higher the classification accuracy; the less redundant the information in the images used to train the CNN, the more robust the model. The balance between accuracy and robustness should be considered when using a CNN.
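The two read modes discussed above (GRAY vs. RGB replication) can be sketched in a few lines. This is an illustrative snippet, not the paper's code; the array shapes and the channel-first layout are our assumptions:

```python
import numpy as np

# GRAY keeps the single amplitude channel; RGB stacks two extra copies of it,
# so the three channels are identical and no new information is added.
def read_gray(img_2d):
    return img_2d[np.newaxis, ...]                         # shape (1, H, W)

def read_rgb(img_2d):
    return np.repeat(img_2d[np.newaxis, ...], 3, axis=0)   # shape (3, H, W)

sar = np.random.rand(128, 128).astype(np.float32)  # dummy amplitude image
gray, rgb = read_gray(sar), read_rgb(sar)
print(gray.shape, rgb.shape)  # (1, 128, 128) (3, 128, 128)
# All three RGB channels carry the same data:
assert np.array_equal(rgb[0], rgb[1]) and np.array_equal(rgb[1], rgb[2])
```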
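As an illustration of how the proportions shown in Fig. 13(b) can be tallied, the sketch below counts the classes predicted for the ASIs originating from one class. The label counts are hypothetical, chosen only so that the resulting percentages reproduce the T72 figures quoted in the text:

```python
from collections import Counter

# For the ASIs generated from one original class, count the classes the CNN
# predicted and convert the counts to percentages (as in the ring charts).
def label_proportions(predicted_labels):
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {cls: round(100.0 * n / total, 2) for cls, n in counts.items()}

# Hypothetical predictions for ASIs whose original class is T72:
preds_for_t72 = ["T62"] * 53 + ["ZIL131"] * 48 + ["BMP2"] * 31 + ["BTR70"] * 7
props = label_proportions(preds_for_t72)
print(props)  # {'T62': 38.13, 'ZIL131': 34.53, 'BMP2': 22.3, 'BTR70': 5.04}
```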
Fig. 14: The sample-boundary distances of the SAR images. We randomly selected one SAR image from each class of the test dataset and calculated the sample-boundary distances of these images. The color of the bar indicated by the red arrow marks the predicted label of the ASI, and the color of the shortest bar marks the class indicated by AESD.
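A minimal sketch of how the Min-Dist-1 and Min-Dist-3 agreement rates reported in Table IV can be computed from approximate sample-boundary distances follows; the distance matrix and predicted labels are made-up toy values, not the paper's data:

```python
import numpy as np

# Min-Dist-1: does the ASI's predicted label equal the class with the
# smallest sample-boundary distance? Min-Dist-3: is it among the three
# classes with the smallest distances?
def min_dist_accuracy(distances, predicted, k):
    # distances: (N, C) approximate sample-boundary distance to each class
    # predicted: (N,) predicted labels (class indices) of the ASIs
    hits = 0
    for d, p in zip(distances, predicted):
        nearest_k = np.argsort(d)[:k]   # the k classes with smallest distance
        hits += int(p in nearest_k)
    return hits / len(predicted)

dists = np.array([[0.9, 0.1, 0.5, 0.3],   # nearest classes: 1, 3, 2
                  [0.2, 0.8, 0.6, 0.4]])  # nearest classes: 0, 3, 2
preds = np.array([1, 2])
print(min_dist_accuracy(dists, preds, 1))  # 0.5 (only the first ASI matches)
print(min_dist_accuracy(dists, preds, 3))  # 1.0
```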
V. CONCLUSIONS AND DISCUSSION

In this paper, we used FGSM and BIM to generate AEs for SAR image classification models based on CNNs. The results show that CNNs trained on SAR images are very susceptible to ASIs and that the more complex the structure of a CNN, the more easily it is successfully attacked by ASIs. We then proposed AESD to explain the attack selectivity of ASIs, which provides a theoretical basis for targeted attacks. Finally, we analyzed the effects of parameters such as image size and number of channels on the attack success rates of ASIs. The results show that reducing the image size and the number of channels can reduce the attack success rate of ASIs. Our work provides an experimental reference for the attack and defense capabilities of various CNNs against AEs in SAR image classification models.

The proposal of AESD provides theoretical guidance for our future work, in which we will design new attack and defense methods based on the above research. On the one hand, specifying the attack direction on the basis of AESD can make the original sample cross the decision boundary along almost the shortest path and reach the space of the specified class, quickly achieving a targeted attack. On the other hand, AESD, which quantifies the attack selectivity of AEs as a feature of the sample, can be used to detect whether an input sample is an AE.

However, many problems still require further research. Because there are no pixel-level labels for the SAR targets, we were forced to study the adversarial examples of SAR images by perturbing the whole image. If perturbations are to be generated only for the target, pixel-level labeling of the target is required. We can discard the AP in the background and add the AP only to the target to make the target incorrectly recognized by the CNN. The rationale could be as follows: we can create a mask based on the pixel-level label (the value of the target area is 1, and the value of the background area is 0). Then, the gradient of the loss function with respect to the input image can be calculated and multiplied by the mask to change only the pixels occupied by the target, achieving the goal of adding the AP only to the target.

Our current experiment is purely digital, i.e., we have not implemented AEs physically. Currently, it is extremely difficult and expensive for us to obtain SAR images at any time to test the attack effect of AEs in the physical world. The task of generating AEs in the remote sensing field under realistic physical constraints is still an unresolved problem. Physical attacks can be performed by changing properties such as reflectivity, for example by adding materials or changing surface textures. Using a material reflection feature database, we can determine the set of achievable reflection perturbations. We can constrain the generated AP to this perturbation set so that it meets the reflection characteristics of these materials and can be reproduced in the real world. AEs are very susceptible to environmental changes (such as light, clouds, etc.), and physical AEs should be robust to such changes. While the exact mechanism for enhancing the robustness of AEs is beyond the scope of our manuscript, we hope that interested researchers will start related research to jointly solve the current problems.

REFERENCES

[1] P. Tait, Introduction to Radar Target Recognition. IET, 2005, vol. 18.
[2] A. Eryildirim and A. E. Cetin, “Man-made object classification in SAR images using 2-D cepstrum,” in 2009 IEEE Radar Conference. IEEE, 2009, pp. 1–4.
[3] D. Gaglione, C. Clemente, L. Pallotta, I. Proudler, A. De Maio, and J. J. Soraghan, “Krogager decomposition and pseudo-Zernike moments for polarimetric distributed ATR,” in 2014 Sensor Signal Processing for Defence (SSPD). IEEE, 2014, pp. 1–5.
Fig. 15: The attack success rate trends of the ASIs as the input image size changes: (a) shows results when the center-cropped images were not further processed; (b) shows results when the center-cropped images were resized to 128 × 128.
[4] C. Clemente, L. Pallotta, I. Proudler, A. De Maio, J. J. Soraghan, and A. Farina, “Pseudo-Zernike-based multi-pass automatic target recognition from multi-channel synthetic aperture radar,” IET Radar, Sonar & Navigation, vol. 9, no. 4, pp. 457–466, 2015.
[5] Y. Sun, L. Du, Y. Wang, Y. Wang, and J. Hu, “SAR automatic target recognition based on dictionary learning and joint dynamic sparse representation,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 12, pp. 1777–1781, 2016.
[6] C. Clemente, L. Pallotta, D. Gaglione, A. De Maio, and J. J. Soraghan, “Automatic target recognition of military vehicles with Krawtchouk moments,” IEEE Trans. Aerosp. Electron. Syst., vol. 53, no. 1, pp. 493–500, 2017.
[7] X. Yuan, T. Tang, D. Xiang, Y. Li, and Y. Su, “Target recognition in SAR imagery based on local gradient ratio pattern,” International Journal of Remote Sensing, vol. 35, no. 3, pp. 857–870, 2014.
[8] D. Xiang, T. Tang, Y. Ban, and Y. Su, “Man-made target detection from polarimetric SAR data via nonstationarity and asymmetry,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 4, pp. 1459–1469, 2016.
[9] J. Wang, T. Zheng, P. Lei, and X. Bai, “Ground target classification in noisy SAR images using convolutional neural networks,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 11, pp. 4180–4192, 2018.
[10] J. Shao, C. Qu, and J. Li, “A performance analysis of convolutional neural network models in SAR target recognition,” in 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA). IEEE, 2017, pp. 1–6.
[11] O. Lay, S. Dubovitsky, R. Peters, J. Burger, S.-W. Ahn, W. Steier, H. Fetterman, and Y. Chang, “MSTAR: a submicrometer absolute metrology system,” Optics Letters, vol. 28, no. 11, pp. 890–892, 2003.
[12] J. Wang, T. Zheng, P. Lei, and X. Bai, “Ground target classification in noisy SAR images using convolutional neural networks,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 11, pp. 4180–4192, 2018.
[13] R. Shang, J. Wang, L. Jiao, R. Stolkin, B. Hou, and Y. Li, “SAR targets classification based on deep memory convolution neural networks and transfer parameters,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 8, pp. 2834–2846, 2018.
[14] K. Huang, W. Nie, and N. Luo, “Fully polarized SAR imagery classification based on deep reinforcement learning method using multiple polarimetric features,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 10, pp. 3719–3730, 2019.
[15] J. Su, D. V. Vargas, and K. Sakurai, “One pixel attack for fooling deep neural networks,” IEEE Transactions on Evolutionary Computation,
vol. 23, no. 5, pp. 828–841, 2019.
[16] A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 427–436.
[17] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 9, pp. 2805–2824, 2019.
[18] N. Ford, J. Gilmer, N. Carlini, and D. Cubuk, “Adversarial examples are a natural consequence of test error in noise,” arXiv preprint arXiv:1901.10513, 2019.
[19] G. Elsayed, S. Shankar, B. Cheung, N. Papernot, A. Kurakin, I. Goodfellow, and J. Sohl-Dickstein, “Adversarial examples that fool both computer vision and time-limited humans,” in Advances in Neural Information Processing Systems, 2018, pp. 3910–3920.
[20] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
[21] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” arXiv preprint arXiv:1607.02533, 2016.
[22] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017, pp. 39–57.
[23] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Universal adversarial perturbations,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1765–1773.
[24] W. Czaja, N. Fendley, M. Pekala, C. Ratto, and I.-J. Wang, “Adversarial examples in remote sensing,” in Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2018, pp. 408–411.
[25] W. Li, Z. Li, J. Sun, Y. Wang, H. Liu, J. Yang, and G. Gui, “Spear and shield: Attack and detection for CNN-based high spatial resolution remote sensing images identification,” IEEE Access, vol. 7, pp. 94583–94592, 2019.
[26] Z. Deng, H. Sun, S. Zhou, J. Zhao, and H. Zou, “Toward fast and accurate vehicle detection in aerial images using coupled region-based convolutional neural networks,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 10, no. 8, pp. 3652–3664, 2017.
[27] F. Wu, Z. Zhou, B. Wang, and J. Ma, “Inshore ship detection based on convolutional neural network in optical satellite images,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 11, pp. 4005–4015, 2018.
[28] M. Kampffmeyer, A.-B. Salberg, and R. Jenssen, “Urban land cover classification with missing data modalities using deep convolutional neural networks,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 6, pp. 1758–1768, 2018.
[29] H. Luo, C. Chen, L. Fang, X. Zhu, and L. Lu, “High-resolution aerial images semantic segmentation using deep fully convolutional network with channel attention mechanism,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 9, pp. 3492–3507, 2019.
[30] H. Li, Z. Cui, Z. Zhu, L. Chen, J. Zhu, H. Huang, and C. Tao, “RS-MetaNet: Deep meta metric learning for few-shot remote sensing scene classification,” arXiv preprint arXiv:2009.13364, 2020.
[31] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.
[32] S. Bubeck, E. Price, and I. Razenshteyn, “Adversarial examples from computational constraints,” arXiv preprint arXiv:1805.10204, 2018.
[33] X. Ma, B. Li, Y. Wang, S. M. Erfani, S. Wijewickrema, G. Schoenebeck, D. Song, M. E. Houle, and J. Bailey, “Characterizing adversarial subspaces using local intrinsic dimensionality,” arXiv preprint arXiv:1801.02613, 2018.
[34] S. Gulshad, J. H. Metzen, A. Smeulders, and Z. Akata, “Interpreting adversarial examples with attributes,” arXiv preprint arXiv:1904.08279, 2019.
[35] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, “Adversarial examples are not bugs, they are features,” in Advances in Neural Information Processing Systems, 2019, pp. 125–136.
[36] A. Boloor, X. He, C. Gill, Y. Vorobeychik, and X. Zhang, “Simple physical adversarial examples against end-to-end autonomous driving models,” in 2019 IEEE International Conference on Embedded Software and Systems (ICESS). IEEE, 2019, pp. 1–7.
[37] A. J. Bose and P. Aarabi, “Adversarial attacks on face detectors using neural net based constrained optimization,” in 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP). IEEE, 2018, pp. 1–6.
[38] Z.-A. Zhu, Y.-Z. Lu, and C.-K. Chiang, “Generating adversarial examples by makeup attacks on face recognition,” in 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019, pp. 2516–2520.
[39] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in 2016 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2016, pp. 372–387.
[40] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “DeepFool: a simple and accurate method to fool deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2574–2582.
[41] C. Xiao, B. Li, J.-Y. Zhu, W. He, M. Liu, and D. Song, “Generating adversarial examples with adversarial networks,” arXiv preprint arXiv:1801.02610, 2018.
[42] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, F. Tramer, A. Prakash, T. Kohno, and D. Song, “Physical adversarial examples for object detectors,” arXiv preprint arXiv:1807.07769, 2018.
[43] S.-T. Chen, C. Cornelius, J. Martin, and D. H. P. Chau, “ShapeShifter: Robust physical adversarial attack on Faster R-CNN object detector,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2018, pp. 52–68.
[44] V. Fischer, M. C. Kumar, J. H. Metzen, and T. Brox, “Adversarial examples for semantic image segmentation,” arXiv preprint arXiv:1703.01101, 2017.
[45] A. Arnab, O. Miksik, and P. H. Torr, “On the robustness of semantic segmentation models to adversarial attacks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 888–897.
[46] M. Alzantot, Y. Sharma, A. Elgohary, B.-J. Ho, M. Srivastava, and K.-W. Chang, “Generating natural language adversarial examples,” arXiv preprint arXiv:1804.07998, 2018.
[47] S. Ren, Y. Deng, K. He, and W. Che, “Generating natural language adversarial examples through probability weighted word saliency,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 1085–1097.
[48] N. Carlini and D. Wagner, “Audio adversarial examples: Targeted attacks on speech-to-text,” in 2018 IEEE Security and Privacy Workshops (SPW). IEEE, 2018, pp. 1–7.
[49] Y. Qin, N. Carlini, I. Goodfellow, G. Cottrell, and C. Raffel, “Imperceptible, robust, and targeted adversarial examples for automatic speech recognition,” arXiv preprint arXiv:1903.10346, 2019.
[50] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[51] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[52] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[53] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[54] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492–1500.
[55] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[56] M. Schmitt, L. H. Hughes, and X. X. Zhu, “The SEN1-2 dataset for deep learning in SAR-optical data fusion,” arXiv preprint arXiv:1807.01569, 2018.
Haifeng Li received the master's degree in transportation engineering from the South China University of Technology, Guangzhou, China, in 2005, and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2009. He is currently a Professor with the School of Geosciences and Info-Physics, Central South University, Changsha, China. He was a Research Associate with the Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hong Kong, in 2011, and a Visiting Scholar with the University of Illinois at Urbana-Champaign, Urbana, IL, USA, from 2013 to 2014. He has authored over 30 journal papers. His current research interests include geo/remote sensing big data, machine/deep learning, and artificial/brain-inspired intelligence. He is a reviewer for many journals.

Haikuo Huang is currently pursuing the master's degree with Central South University, Changsha 410083, China. His research interests include remote sensing image understanding, machine learning, and robust machine learning.

Zhenqi Cui is currently pursuing the Ph.D. degree with Central South University, Changsha 410083, China. His research interests include remote sensing image understanding and meta learning.

Xiaoming Mei received the B.S. degree in Applied Geophysics from Central South University, Changsha, China, in 2000, the M.S. degree in Geodetection and Information Technology from Central South University in 2003, and the Ph.D. degree in Photogrammetry and Remote Sensing from Wuhan University, Wuhan, China, in 2007. His research interests include multi- and hyperspectral remote sensing data processing, high resolution image processing and scene analysis, LiDAR data processing, and computational intelligence.