Brajesh Kumar , Onkar Dikshit , Ashwani Gupta & Manoj Kumar Singh
To cite this article: Brajesh Kumar , Onkar Dikshit , Ashwani Gupta & Manoj Kumar Singh (2020)
Feature extraction for hyperspectral image classification: a review, International Journal of Remote
Sensing, 41:16, 6248-6287, DOI: 10.1080/01431161.2020.1736732
1. Introduction
Hyperspectral images are captured in hundreds of fine, narrow, contiguous spectral bands (Goetz 2009) in the visible to infrared regions of the electromagnetic spectrum. Each pixel in a hyperspectral image is represented by a vector whose size is equal to the number of spectral bands. The pixels have very detailed spectral signatures as each component of
the vector is a measurement corresponding to a specific wavelength. The contiguous
acquisition makes it possible to derive radiance spectrum for each pixel in the image. The
rich spectral information helps to better discriminate surface features and objects than
traditional imaging systems (Li et al. 2011a). However, due to the fine spectral spacing, these bands are highly correlated and provide redundant information (Jia and Richards 1994) for a particular application. Since hyperspectral sensors are not designed for specific
applications, some bands that are useful in a given application may not reveal important
information for some other applications (Yin, Wang, and Hu 2012). Therefore, extracting
application specific information is crucial for the effective use of hyperspectral images.
Hyperspectral image classification is an important tool for the quantitative analysis of image data, with applications in a wide range of areas including environmental studies, agricultural monitoring, defence, urban planning, weather forecasting, etc. High dimensionality poses challenges, including the curse of dimensionality (Hughes 1968) and increased computational cost, that make supervised classification a demanding task.
The redundant and noisy bands not only impose an unnecessary computational load but also degrade classification accuracy. A larger number of features provides more information to the classifier, but the number of training samples required for a reasonable estimation of the statistical behaviour of the data grows exponentially with dimensionality (Landgrebe 2003). Maintaining a reasonably good ratio of available training samples to the number of features is important for the good performance of any classifier. However, usually only a limited number of training samples is available (Huang and Kuo 2010), as the sampling and collection of reliable training samples is itself a complex and expensive task. In the absence of a sufficient number of training pixels, classification accuracies tend to be poor. A widely accepted approach to mitigate dimensionality
related issues is to represent the data with a reduced number of features. Feature reduction is thus a crucial preprocessing step for effective hyperspectral image classification.
Most dimensionality reduction techniques can be divided into two major categories: feature selection and spectral feature extraction. Feature selection algorithms
mainly aim to reduce spectral redundancy and optimally select a subset of original
spectral bands discarding the others (Serpico and Moser 2007). Various measures are
used as selection criteria to distinguish the features including spectral distance (Backer
et al. 2005; Ifarraguerri and Prairie 2004), variance (Chang et al. 1999), mutual information
(Martinez-Uso et al. 2007; Estevez et al. 2009), spectral angle mapper (Keshava 2004),
spectral divergence (Chang and Wang 2016), etc. But separability-based feature selection
methods are computationally very expensive (Landgrebe 2003), and it is difficult to determine the optimal number of features to select. The second approach, feature extraction, either enhances relevant bands by arithmetic operations or projects the data onto a new
feature space preserving the discriminative information (Yin, Wang, and Hu 2012). The
new feature space is usually a lower dimensional space, but sometimes it may have the same or a higher dimension than the original one. Although the original form of the data is lost and its physical interpretation becomes difficult, spectral feature extraction is the more effective approach of the two paradigms (Serpico and Moser 2007). With recent developments, feature extraction does not necessarily generate a smaller number of features. Some modern techniques, such as Tensor Principal Component Analysis and some deep learning based methods, generate more features than the original number. With the advent of such techniques and the availability of high-end computing resources, high dimensionality is no longer always a curse, although the availability of a good number of
training samples is still a point of concern. Modern sensors provide fine spatial resolution
enabling the availability of information on smaller spatial structures, such as edges,
texture, shape, size, etc. The additional information on spatial structures helps to better
discriminate the objects and land cover features. Extraction of information on spatial
structures from the image data is crucial for classification accuracy.
Feature extraction/selection techniques are reviewed by researchers from time to time. Jia,
Kuo, and Crawford (2013) provided a comprehensive review of feature extraction as well
as feature selection methods with a focus on dimensionality reduction and feature
mining. The work is concerned with the extraction of spectral information only. Li et al.
(2018) reviewed dimensionality reduction techniques in a specific domain that is based on
discriminant analysis. The authors discussed and analyzed mainly the linear discriminant
analysis (LDA), sparse graph-based discriminant analysis (SGDA), and their various exten-
sions. The article is concerned with spectral features and dimensionality reduction aspect
only. Modern deep learning based methods are not covered, and there is no discussion of spatial or spectral–spatial feature extraction in these articles. A good review of feature selection techniques is presented by Sun and Du (2019); it covers band selection methods exclusively. Both conventional and modern approaches are nicely presented
in the article.
In this work, major feature extraction techniques are reviewed and analyzed with their
strengths and limitations. The focus of the work is on the extraction of both spectral and
spatial information. It is a comprehensive review that includes most of the major spectral,
spatial, and spectral–spatial feature extraction techniques. It attempts to cover a range of
methods from conventional to advanced ones including deep learning techniques.
Dutta et al. (2015) quantified soil constituents from airborne hyperspectral data using
NDVI and lasso regression algorithm. Garcia-Salgado and Ponomaryov (2016) converted
the hyperspectral image to NDVI representation and computed texture features for
classification. Although knowledge-based features have direct relation to physical para-
meters but application specific expertise is required to derive such features. The purpose
of this category of feature extraction is not dimensionality reduction at all. Rather, these
techniques are used to obtain additional and different kind of information from spectral
knowledge to discriminate the land cover types/objects.
Y = Ω^T X    (1)

where Y ∈ R^{D×P} is the transformed data and D ≪ B is the reduced number of bands. The transformation matrix Ω can be computed by optimizing a function called the projection index, which is a real-valued function of Y

I = f(Y)    (2)

where f(·) depends on the application. Ifarraguerri and Chang (2000) used a projection
index based on information divergence for hyperspectral image classification. Chiang,
Chang, and Ginsberg (2001) proposed projection indices based on moments for unsu-
pervised target detection. Principal component analysis (PCA) (Jolliffe 2002) is a widely used unsupervised technique that can be considered a type of PP that uses variance as the projection index. It generates mutually uncorrelated features called principal components. The covariance matrix of the data is computed as

Σ_X = (1/P) Σ_{i=1}^{P} (x_i − x̄)(x_i − x̄)^T,   x̄ = (1/P) Σ_{i=1}^{P} x_i    (3)
where x̄ is the sample mean. The B × D transformation matrix Ω is obtained from the eigenvectors of Σ_X corresponding to the top D non-zero eigenvalues by solving the eigenvalue equation
Σ_X V = ΛV,   Λ = diag(λ_1, λ_2, …, λ_B)    (4)
so that λ_1 ≥ λ_2 ≥ … ≥ λ_B    (5)
Σ_Z = Σ_X + Σ_W    (6)

Σ_N Σ_Z^{−1} V = ΛV    (7)
Huang and Zhang (2010), Rasti, Ulfarsson, and Sveinsson (2010), and Dopido et al. (2012)
explored the capabilities of MNF as a feature extraction technique for hyperspectral
images and established its superiority over PCA.
Both PCA and MNF, being based on second-order statistics, cannot characterize subtle material substances, as sufficient samples are not available for such substances to constitute reliable statistics (Wang and Chang 2016). Accurate determination of the covariance matrix also depends on the availability of adequate samples. Under such circumstances, independent component analysis (ICA) (Hyvärinen, Karhunen, and Oja 2001) works better. ICA is related to PCA and intends to find a linear decomposition
X = AY    (8)
J = tr(S_w^{−1} S_b)    (9)
where S_w is the within-class scatter matrix, S_b is the between-class scatter matrix, and tr(·) is the trace of the matrix. The matrices S_w and S_b (Lee and Landgrebe 1993) are defined using training data for K classes as follows
S_w = Σ_{i=1}^{K} p_i Σ_i    (10)
S_b = Σ_{i=1}^{K} p_i (m_i − m_0)(m_i − m_0)^T,   m_0 = Σ_{i=1}^{K} p_i m_i    (11)
where m_i, p_i, and Σ_i are the mean vector, prior probability, and covariance matrix of class i, respectively. The transformation matrix Ω is obtained by solving the following eigenvalue equation

(S_w^{−1} S_b − ΛI)Ω = 0    (12)
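Equations (10)–(12) can be sketched in NumPy as follows; the small ridge added to S_w is an assumption of this sketch (a common numerical safeguard when S_w is near-singular), not part of the formulation above:

```python
import numpy as np

def lda_transform(X, y, d):
    """Fisher discriminant features: X is P x B, y holds class labels, d output dims."""
    classes = np.unique(y)
    P, B = X.shape
    m0 = X.mean(axis=0)                           # global mean m_0
    Sw = np.zeros((B, B))
    Sb = np.zeros((B, B))
    for c in classes:
        Xc = X[y == c]
        p = len(Xc) / P                           # prior probability p_i
        mc = Xc.mean(axis=0)                      # class mean m_i
        Sw += p * np.cov(Xc, rowvar=False)        # within-class scatter, Equation (10)
        Sb += p * np.outer(mc - m0, mc - m0)      # between-class scatter, Equation (11)
    # eigenvectors of Sw^{-1} Sb, Equation (12); the ridge guards against singular Sw
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(B), Sb))
    order = np.argsort(vals.real)[::-1]
    Omega = vecs[:, order[:d]].real               # keep the top-d discriminant directions
    return X @ Omega

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(i, 1.0, size=(30, 8)) for i in range(3)])  # 3 toy classes
y = np.repeat([0, 1, 2], 30)
Y = lda_transform(X, y, d=2)
print(Y.shape)  # (90, 2)
```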
LFDA has shown good performance for different types of images, but it has limited success with high dimensional images (Huang and Kuo 2010) due to its inherent limitations. LFDA assumes a normal-like distribution of classes, which may not be the case with real images. For K-class data, it produces at most K − 1 features, as the rank of the between-class scatter matrix is at most K − 1. Such a low number of features is not always optimal (Kuo and Landgrebe 2004). Moreover, the within-class scatter matrix is often singular for high dimensional images (Yang, Yu, and Kuo 2010). Non-parametric discriminant analysis (NDA) was introduced by Fukunaga and Mantock (1983), who defined a new non-parametric between-class scatter matrix to overcome some limitations of LFDA. But NDA has the same singularity problem. NDA
was improved by non-parametric weighted feature extraction (NWFE) (Kuo and Landgrebe
2004) method. NWFE defines new within-class and between-class scatter matrices and
computes weighted means. NWFE is a more successful method for hyperspectral imagery;
however, it faces issues related to computation time. Yang, Yu, and Kuo (2010) developed
a method known as cosine-based nonparametric feature extraction (CNFE), which uses
a cosine distance based weight function for scatter matrices. CNFE employs
a regularization technique to handle singularity problem. A method known as decision
boundary feature extraction (DBFE) (Lee and Landgrebe 1993) was developed specifically
for hyperspectral images. It uses decision boundaries instead of class mean and covariance
matrices to derive feature vectors. The DBFE is computationally intensive and requires a good number of quality training samples for determining efficient decision boundaries. It is an
efficient feature extraction technique but with limited training samples its performance may
degrade.
The kernel trick can also be used to extend linear supervised methods to nonlinear
feature extraction. LFDA was extended by Baudat and Anouar (2000) as generalized discriminant analysis (GDA) using a kernel function. GDA performs non-linear analysis but, like LFDA, it produces only K − 1 features. Kuo, Li, and Yang (2009) proposed the kernel NWFE (KNWFE) method and showed that NWFE is a special case of KNWFE with a linear kernel. KNWFE works efficiently if a good number of training samples is available. Li et al. (2011b)
effectively used another kernel version of LFDA named as kernel local Fisher discriminant
analysis (KLFDA) (Sugiyama 2007) for hyperspectral feature extraction. KLFDA is
a combination of locality preserving projection and kernel discriminant analysis.
hyperspectral data. The fundamental operator is the mother wavelet, a function ψ(t) that satisfies the following admissibility requirement (Bruce, Koger, and Li 2002)

∫_{−∞}^{+∞} |F(ψ(t))|² / |ω| dω < ∞    (13)
Most of these methods use the discrete wavelet transform (DWT). Usually, the 1-D DWT is used for spectral feature extraction (Kumar and Dikshit 2015a). The 1-D DWT can be applied to a hyperspectral signal f(·) of length B as follows

W_ψ(i, j) = Σ_{t=0}^{B−1} f(t) ψ_{i,j}(t)    (14)

where ψ_{i,j}(t) = 2^{i/2} ψ(2^i t − j), 2^i is the scale parameter, and j is the translation parameter. It
decomposes the signal into approximation (L) and details (H) coefficients. Bruce, Koger,
and Li (2002) used 1-D DWT iteratively and showed that wavelet decomposition can be performed up to log₂(B) levels without losing significant information. The wavelet coefficients are used as spectral features. The authors investigated a number of mother
wavelets and found that lower order wavelets are better for hyperspectral image feature
extraction. They also established that larger scale wavelet coefficients are more useful.
Kaewpijit, Moigne, and El-Ghazawi (2003) decomposed hyperspectral signals with 1-D DWT and showed that the shape of the spectral signature is still recognizable with reduced dimensionality. Li (2004) tested various wavelet types to extract features for linear unmixing of
hyperspectral signals. The results have shown that Haar wavelet performs better for such
applications.
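A minimal illustration of the iterative 1-D DWT of Equation (14), hand-coding the Haar mother wavelet rather than using a wavelet library; the decomposition proceeds up to log₂(B) levels as noted above:

```python
import numpy as np

def haar_dwt_levels(signal, levels):
    """Iterative 1-D Haar DWT of a pixel spectrum; returns [details..., approximation]."""
    s = np.asarray(signal, dtype=float)
    coeffs = []
    for _ in range(levels):
        if len(s) % 2:                              # pad odd-length signals
            s = np.append(s, s[-1])
        approx = (s[0::2] + s[1::2]) / np.sqrt(2)   # low-pass / approximation (L)
        detail = (s[0::2] - s[1::2]) / np.sqrt(2)   # high-pass / details (H)
        coeffs.append(detail)
        s = approx                                  # iterate on the approximation
    coeffs.append(s)
    return coeffs

spectrum = np.arange(64, dtype=float)               # a 64-band toy spectrum
out = haar_dwt_levels(spectrum, levels=int(np.log2(64)))   # up to log2(B) levels
print([len(c) for c in out])  # [32, 16, 8, 4, 2, 1, 1]
```

The concatenated coefficients (or a subset of the larger-scale ones, as suggested above) serve as the spectral feature vector.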
v = g(Wx + β)    (15)
Figure 1. Autoencoder.
where W is the input-to-hidden weight matrix, β is the bias vector of the hidden layer, and g(·) is the activation function. The latent representation v is then used to reconstruct an approximation x̂ ∈ R^N by the reverse mapping

x̂ = g(Θv + γ)    (16)

where Θ is the hidden-to-output weight matrix and γ is the bias vector of the output layer. The purpose of training is to minimize the reconstruction error J(x, x̂) between x and x̂. J(x, x̂) is
usually a squared-error cost but can also be computed in many other ways. The latent representation is a compressed form of the original input, as h ≪ N. If the reconstruction error is within a threshold, the latent representation can be used as a reduced set of features. To
minimize the error, a number of autoencoders are stacked as shown in Figure 2. The hidden
layer in one autoencoder is input to the next one. This arrangement is known as a stacked autoencoder (SAE) that can progressively generate deep features. The multilayer network so formed determines its parameters by layerwise greedy learning (Lv, Han, and Qiu 2017). The parameters are fine-tuned with backpropagation. If label information is available at the topmost layer, the learning is supervised; otherwise it is an unsupervised learning process. For pixelwise spectral features, the pixel vector is fed to the SAE as shown in Figure 3. Chen et al.
(2014) used SAE having five layers to generate deep features for hyperspectral images and
classified the images with logistic regression. Sun et al. (2017) obtained discriminative deep
features using SAE and designed a semi-supervised approach for training the encoders. The
authors also suggested a mean pooling scheme for fusing spectral and spatial information.
A different approach was proposed by Zhou et al. (2019) with some optimization criteria to learn discriminative features using SAE. The authors applied a local Fisher discriminant regularization on the hidden layers of the SAE. It helps to improve within-class and between-class diversity.
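The encoder–decoder mappings of Equations (15) and (16) can be sketched as a single-hidden-layer autoencoder trained by gradient descent on a squared-error cost; the layer sizes, learning rate, and linear output layer are assumptions of this toy example:

```python
import numpy as np

rng = np.random.default_rng(0)
N, h = 20, 5                                     # N input bands, h hidden units (h << N)
X = rng.normal(size=(100, N))                    # 100 toy pixel vectors

W = rng.normal(scale=0.1, size=(N, h)); b = np.zeros(h)          # encoder, Equation (15)
Theta = rng.normal(scale=0.1, size=(h, N)); gamma = np.zeros(N)  # decoder, Equation (16)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.05
for _ in range(200):                             # minimize reconstruction error J(x, x_hat)
    V = sigmoid(X @ W + b)                       # latent representation v
    Xhat = V @ Theta + gamma                     # linear output for real-valued spectra
    err = Xhat - X                               # gradient of the squared-error cost
    dTheta = V.T @ err / len(X); dgamma = err.mean(axis=0)
    dV = err @ Theta.T * V * (1 - V)             # backpropagate through the sigmoid
    dW = X.T @ dV / len(X); db = dV.mean(axis=0)
    Theta -= lr * dTheta; gamma -= lr * dgamma; W -= lr * dW; b -= lr * db

print(V.shape)  # (100, 5): the h-dimensional feature vector of every pixel
```

Stacking several such layers, each trained on the previous layer's latent representation, gives the SAE arrangement described above.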
Autoencoders or SAEs rely on smaller hidden layers. However, larger hidden layers may also provide useful features. A different type of encoder, known as the sparse autoencoder (Tao et al. 2015), allows larger hidden layers, i.e. h ≫ B. These encoders impose a condition of sparsity on the hidden units, intending to keep most of the neurons inactive.
Figure 2. Stacked autoencoder: encoder–decoder pairs around a bottleneck hidden layer.
Figure 3. Spectral feature extraction using SAE: each pixel vector of the hyperspectral image is fed through the layers of the SAE to produce the feature set.
Given a training set T = {x_1, x_2, …, x_Q}, the training of a sparse autoencoder aims to find optimal parameters by minimizing the cost function given by

J_sparse = (1/Q) Σ_{i=1}^{Q} J(x_i, x̂_i) + β Σ_{j=1}^{s_l} KL(ρ ‖ ρ̂_j)    (17)
where Q is the number of training pixels, s_l is the number of units in the lth hidden layer, and KL(ρ ‖ ρ̂_j) is the Kullback–Leibler (KL) divergence, defined as follows:

KL(ρ ‖ ρ̂_j) = ρ log(ρ / ρ̂_j) + (1 − ρ) log((1 − ρ) / (1 − ρ̂_j))    (18)
where ρ is the sparsity parameter close to 0 and ρ̂_j is the average activation of the jth hidden unit. Multiple sparse autoencoders are stacked to form a stacked sparse autoencoder
(SSAE). Kang et al. (2018) fused spectral and Gabor features and used SSAE for deep
feature learning. The denoising autoencoder (DAE) (Xing, Ma, and Yang 2016) is another type of encoder developed from the autoencoder. A DAE first modifies the original data x to x′ by setting some elements of x to zero or adding some Gaussian noise to x. The modified data x′ is used as input, and the DAE aims to reconstruct x at the output. Similar to SAE and SSAE, a stacked DAE can be formed, which performs better for noisy data. Hao et al. (2018)
employed stacked DAE to encode pixelwise spectral values for hyperspectral images. The
deep features so obtained were fused with other features to get good classification
accuracy. Lan et al. (2019) applied k-sparse method for sparsity and introduced a k-
sparse denoising autoencoder. In addition, the k-sparsity based method also uses
a dropout function at hidden layer to prevent overfitting.
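The sparsity term of Equations (17) and (18) can be sketched as follows; the clipping of ρ̂_j and the default values of ρ and β are assumptions added for numerical illustration:

```python
import numpy as np

def kl_sparsity_penalty(activations, rho=0.05, beta=3.0):
    """Sparsity term of Equation (17): beta * sum_j KL(rho || rho_hat_j).

    activations: Q x s_l matrix of hidden-unit outputs in (0, 1).
    """
    rho_hat = activations.mean(axis=0)             # average activation per hidden unit
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)     # numerical safety (assumption)
    kl = (rho * np.log(rho / rho_hat)              # Equation (18)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return beta * kl.sum()

acts = np.full((10, 4), 0.05)          # units already at the target sparsity rho
print(round(kl_sparsity_penalty(acts), 6))  # 0.0
```

The penalty is zero when the average activations match ρ and grows as units become more active, which is what drives most neurons towards inactivity.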
Restricted Boltzmann Machine (RBM) (Chen, Zhao, and Jia 2015) is a generative stochastic neural network model that consists of two fully connected layers, a visible (input) layer and a hidden layer. Contrary to the conventional Boltzmann Machine, there are no connections among the units of the same layer in an RBM. The data is fed through the visible
units and a latent representation is produced at hidden layer. The representation is fed
back and input is reconstructed at visible layer. Each unit has an activation probability and
a state. RBMs are trained in unsupervised mode. The training of RBM involves adjusting of
weights and biases until the reconstruction error is acceptably small. The network
provides a probability to an input with the help of an energy function EðÞ. Both activation
probability and state are used during the training. The joint probability distribution of
units can be expressed as
p(v, h; θ) = (1/Z(θ)) exp(−E(v, h; θ))    (19)

Z(θ) = Σ_v Σ_h exp(−E(v, h; θ))    (20)
If reconstruction error is within the threshold, the contents of hidden units represent the
desired features. A network with multiple RBM layers can be designed that is trained layer-
wise. Output of one trained layer is input to the next layer in the network. This kind of
learning system is known as deep belief network (DBN). The DBN learns deep features
hierarchically reducing the error.
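Equations (19) and (20) can be made concrete for a tiny RBM whose partition function is small enough to enumerate exactly; the energy form E(v, h) = −a^T v − b^T h − v^T W h is the standard binary RBM energy, assumed here since the text does not spell it out:

```python
import numpy as np
from itertools import product

def energy(v, h, W, a, b):
    """Standard binary RBM energy E(v, h) = -a^T v - b^T h - v^T W h."""
    return -(a @ v + b @ h + v @ W @ h)

rng = np.random.default_rng(0)
nv, nh = 3, 2                       # tiny RBM so that Z can be enumerated exactly
W = rng.normal(scale=0.5, size=(nv, nh))
a = np.zeros(nv)                    # visible biases
b = np.zeros(nh)                    # hidden biases

states_v = list(product([0, 1], repeat=nv))
states_h = list(product([0, 1], repeat=nh))

# partition function, Equation (20)
Z = sum(np.exp(-energy(np.array(v), np.array(h), W, a, b))
        for v in states_v for h in states_h)

# joint distribution over all unit configurations, Equation (19)
p = {(v, h): float(np.exp(-energy(np.array(v), np.array(h), W, a, b)) / Z)
     for v in states_v for h in states_h}
print(round(sum(p.values()), 6))  # 1.0 (a valid probability distribution)
```

For realistic layer sizes the sums in Z are intractable, which is why RBMs are trained with sampling-based procedures rather than exact enumeration.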
Convolutional neural network (CNN) is a special deep learning model inspired by the human visual system. It can be trained using a supervised or unsupervised learning approach. Unsupervised training can be performed with the help of greedy layerwise pretraining (Romero, Gatta, and Camps-Valls 2016). The supervised training process involves backpropagation. A CNN model typically consists of a group of convolutional layers, pooling layers, and fully connected layers, as shown in Figure 4. Local connections and shared
weights are two important aspects of CNNs that provide better generalization. The
convolutional layer performs convolution of input with kernels or filters. The convolution
in a CNN can be defined as

v_l = g(F_{l−1} * K_l + b_l)    (21)

where v_l is the output feature map, K_l is the filter, and b_l is a bias parameter at the lth layer. The output feature map F_{l−1} of layer (l − 1) becomes the input to layer l, '*' is the convolution operator, and g(·) is a non-linearity, also known as the activation function. ReLU
(Krizhevsky, Sutskever, and Hinton 2012) is one of the most popular activation functions
due to its fast convergence. The output of a neuron at the ith position in the mth feature map of the lth layer is

v_i^{l,m} = g( b^{l,m} + Σ_p Σ_{s=0}^{S_l − 1} k_s^{l,m,p} v_{i+s}^{(l−1),p} )    (23)
where p is the index of the feature map in the (l − 1)th layer, b^{l,m} is the bias of the mth feature map in the lth layer, S_l is the kernel size in the lth layer, and k_s^{l,m,p} is the kernel value at position s in the pth feature map. Usually feature maps contain redundant information; therefore, convolution is often
followed by a pooling operation. Pooling reduces the resolution of the feature maps, in turn reducing parameters and computation time. Pooling is performed over non-overlapping subregions of the feature map. The pooling operation can be defined as
v_l = f(v_{l−1} + b_l)    (24)
where f ðÞ is a function that performs subsampling operation. The output of pooling layer
is smaller than its input. The purpose of pooling is to make features robust and abstract.
Several convolutional and pooling layers can be stacked together to make a deep CNN. After several layers of convolution and pooling in one direction (1-D convolution), a set of spectral features is obtained. The last convolutional/pooling layer in the hierarchy is followed by fully connected layers; there could be one or more of them. The purpose of the fully connected layer is to produce one feature vector. It generates more deep and
abstract features. The operation of a fully connected layer is given as

v_l = h(W_l v_{l−1})    (25)

where W_l is the weight matrix and h(·) is the activation function of the fully connected layer.
To obtain spectral features, the original pixel vector is fed to the first convolution layer
and features are obtained through the last layer in the architecture. As shown in Figure 5,
the pixel vector is used as input to CNN for spectral feature extraction. Chen et al. (2016)
demonstrated the utility of CNN for extracting spectral features.
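A minimal NumPy sketch of a 1-D convolution layer (Equation (23)) with ReLU activation followed by non-overlapping max pooling (Equation (24)); the filter count, kernel length, and pooling size are illustrative assumptions:

```python
import numpy as np

def conv1d_relu(x, kernels, biases):
    """Valid 1-D convolution of a pixel spectrum with a filter bank, then ReLU."""
    out = []
    for k, b in zip(kernels, biases):
        S = len(k)                                # kernel size S_l
        v = np.array([x[i:i + S] @ k + b for i in range(len(x) - S + 1)])
        out.append(np.maximum(v, 0.0))            # ReLU activation g(.)
    return np.stack(out)                          # feature maps: n_filters x L_out

def maxpool1d(fmaps, size=2):
    """Non-overlapping max pooling over each feature map."""
    L = fmaps.shape[1] // size * size             # drop any trailing remainder
    return fmaps[:, :L].reshape(fmaps.shape[0], -1, size).max(axis=2)

rng = np.random.default_rng(0)
spectrum = rng.normal(size=64)                    # one pixel vector, B = 64 bands
kernels = [rng.normal(size=5) for _ in range(4)]  # four length-5 spectral filters
fmaps = conv1d_relu(spectrum, kernels, biases=np.zeros(4))
pooled = maxpool1d(fmaps)
print(fmaps.shape, pooled.shape)  # (4, 60) (4, 30)
```

Stacking several such conv/pool stages and flattening the final maps yields the pixelwise spectral feature vector described above.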
Various classification frameworks have been developed in recent years based on different deep learning approaches, exhibiting impressive performance. However, finding the appropriate size and number of hidden units for specific problems is a major issue with deep
learning approaches. The major spectral feature extraction techniques are summarized in
Table 1.
Figure 5. Spectral feature extraction using CNN: the pixel vector is fed through the layers of the CNN to produce the feature set.
centric window or kernel. The kernels may have different shapes such as square, disk, etc. of
different sizes. Various techniques have been developed over the years for spatial features.
Other morphological operators, such as opening and closing, are combinations of these two operators. The morphological opening of a pixel x_k is carried out by erosion followed by dilation; the closing is its dual, dilation followed by erosion.
With the application of opening and closing, elements smaller than the structuring element are deleted while bigger ones are retained. This approach helps to determine the shape and size
of the objects in the image. However, sometimes these operators may introduce false
objects in the image. This problem can be avoided with the help of reconstruction.
Opening and closing by reconstruction ensure that an object smaller than structuring
element is completely removed, whereas bigger objects remain completely intact. In order
to identify different types of objects, a series of structuring elements of different sizes is
used, which leads to the concept of the morphological profile. The opening profile Φ_n(x_k) at pixel x_k using n structuring elements is an n-dimensional vector defined as

Φ_n(x_k) = {φ^(j)(x_k)},   1 ≤ j ≤ n    (30)

where φ^(j)(·) is opening by reconstruction with the jth structuring element and 1 ≤ j ≤ n.
Similarly, the closing profile Ψ_n(x_k) at pixel x_k is also an n-dimensional vector given by

Ψ_n(x_k) = {ψ^(j)(x_k)},   1 ≤ j ≤ n    (31)

where ψ^(j)(·) is closing by reconstruction with the structuring element of the jth size. The morphological profile MP(x_k) at pixel x_k is defined by collating Φ_n(x_k) and Ψ_n(x_k) as follows
MP(x_k) = {Φ_n(x_k), Ψ_n(x_k)}
        = {φ^(n)(x_k), …, φ^(2)(x_k), x_k, ψ^(2)(x_k), …, ψ^(n)(x_k)}    (32)
For a hyperspectral image, the profile is extended by building morphological profiles on m base images and stacking them as

EMP(x_k) = {MP_1(x_k), MP_2(x_k), …, MP_m(x_k)}    (33)

where MP_j(·) is the morphological profile built on the jth band image and 1 ≤ j ≤ m.
Morphological profiles are good for representing multi-scale variability but are not sufficient for modelling other geometrical properties of the structures (Mura et al. 2011). To overcome these limitations, attribute filters are used instead of reconstruction-based morphological operators. Attribute filters operate on spatially connected pixels using criteria based on different attributes such as area, standard deviation, and volume.
Analogous to extended morphological profiles, extended attribute profiles can be cre-
ated, which model geometrical properties of the objects.
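A simplified sketch of a morphological profile in the spirit of Equation (32), using plain grey-scale opening and closing with flat square structuring elements rather than the reconstruction-based operators described above; the structuring-element sizes are assumptions:

```python
import numpy as np

def local_filter(img, size, op):
    """Grey-scale erosion (op=np.min) or dilation (op=np.max) with a square SE."""
    pad = size // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = op(padded[i:i + size, j:j + size])
    return out

def opening(img, size):
    """Erosion followed by dilation."""
    return local_filter(local_filter(img, size, np.min), size, np.max)

def closing(img, size):
    """Dilation followed by erosion."""
    return local_filter(local_filter(img, size, np.max), size, np.min)

def morphological_profile(band, sizes=(3, 5, 7)):
    """Stack openings (largest SE first), the band itself, then closings."""
    ops = [opening(band, s) for s in reversed(sizes)]
    cls = [closing(band, s) for s in sizes]
    return np.stack(ops + [band.astype(float)] + cls)

rng = np.random.default_rng(0)
band = rng.random((16, 16))          # one base band (e.g. a principal component)
mp = morphological_profile(band)
print(mp.shape)  # (7, 16, 16): a 2n+1 = 7 layer profile per pixel
```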
where the offset (Δx, Δy) is the distance between pixel (x, y) and its neighbour. The co-occurrence matrix G is sensitive to rotation: for a rotated image, a different G may be obtained depending on the angle of rotation. In order to alleviate this variance to rotation, G is calculated for different offsets at different angles and averaged. A number of
statistical features can be extracted using the GLCM. Among these, major texture features
are: contrast, correlation, variance, entropy, angular second moment, inverse difference
moment, sum average, sum variance, sum entropy, difference variance, difference
entropy, information measures of correlation, and maximal correlation coefficient.
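A small sketch of GLCM construction and a few of the texture statistics listed above, hand-coded in NumPy for a quantized band; the offset convention and toy image are assumptions:

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Grey-level co-occurrence matrix of a quantized band for offset (dx, dy)."""
    G = np.zeros((levels, levels))
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            i2, j2 = i + dy, j + dx            # neighbour at the given offset
            if 0 <= i2 < H and 0 <= j2 < W:
                G[img[i, j], img[i2, j2]] += 1
    return G / G.sum()                          # normalize to co-occurrence probabilities

def texture_features(Q):
    """Contrast, angular second moment, and entropy from a normalized GLCM."""
    i, j = np.indices(Q.shape)
    contrast = (Q * (i - j) ** 2).sum()
    asm = (Q ** 2).sum()
    entropy = -(Q[Q > 0] * np.log(Q[Q > 0])).sum()
    return contrast, asm, entropy

img = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 2]])   # toy 3-level band
Q = glcm(img, dx=1, dy=0, levels=3)
print(tuple(round(f, 3) for f in texture_features(Q)))  # (0.5, 0.278, 1.33)
```

Averaging GLCMs over several offsets and angles, as described above, would reduce the rotation sensitivity.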
where σ is normalization scale, ðx; yÞ are spatial variables, θ is the orientation, and F is the
central frequency of scale. By using different scales and orientations a set of filters or filter
bank is prepared. Gabor filters provide two types of features, namely unichrome and opponent. Unichrome features are computed from a single band, while opponent Gabor features
combine spatial information across multiple bands. Shi and Healey (2003) combined both
unichrome and opponent Gabor features to extract texture information from hyperspectral
images. The combined set contains a large number of features. Therefore, the authors first used
spectral binning and PCA for dimensionality reduction before extracting Gabor features. The
unichrome features were computed band by band. A subset of features was optimally
selected for the purpose of classification. A simplified process was proposed by Rajadell, García-Sevilla, and Pla (2013) to reduce the computational efforts for feature extraction.
However, they also used a reduced number of bands and applied the filter bank over the
whole image.
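A hand-rolled sketch of a Gabor filter bank (real part only); the kernel size, scales σ, orientations θ, and frequency below are illustrative assumptions, and libraries such as scikit-image provide ready-made Gabor kernels:

```python
import numpy as np

def gabor_kernel(size, sigma, theta, freq):
    """Real part of a 2-D Gabor filter: Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)        # rotate to orientation theta
    env = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * freq * xr)

# a small filter bank over two scales and four orientations
bank = [gabor_kernel(15, sigma=s, theta=t, freq=0.2)
        for s in (2.0, 4.0)
        for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
print(len(bank), bank[0].shape)  # 8 (15, 15)
```

Convolving each selected band with every kernel in the bank yields the unichrome features; combining responses across bands yields the opponent features described above.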
LBP_{n,r} = Σ_{i=0}^{n−1} s(g_i − g_c) 2^i,   s(x) = { 1, x ≥ 0;  0, x < 0 }    (36)
where g_c and g_i are the gray level intensity values of the centre and the ith neighbour, and n is the number of neighbours represented on a circle of radius r. The LBP_{n,r} operator is gray-scale invariant but rotation variant. Certain 3 × 3 patterns (with n = 8, r = 1) are present in most of the observed textures; these patterns are called uniform patterns. A uniformity measure U was introduced by Ojala, Pietikäinen, and Mäenpää (2002) to formally define uniform patterns. The value of U indicates the number of spatial transitions in the pattern. Only those patterns with a U value of at most 2 are considered uniform. An
operator LBP_{n,r}^{riu2} for gray-scale and rotation invariant uniform patterns is described as

LBP_{n,r}^{riu2} = { Σ_{i=0}^{n−1} s(g_i − g_c),  if U(LBP_{n,r}) ≤ 2;   n + 1,  otherwise }    (37)

where

U(LBP_{n,r}) = |s(g_{n−1} − g_c) − s(g_0 − g_c)| + Σ_{i=1}^{n−1} |s(g_i − g_c) − s(g_{i−1} − g_c)|    (38)
The superscript 'riu2' indicates a rotation invariant uniform pattern with U(LBP_{n,r}) ≤ 2. For hyperspectral images, the texture patterns can be computed for each spectral band.
Kumar and Dikshit (2015a) implemented geometric moments for hyperspectral imagery.
The authors applied PCA to the input hyperspectral image and computed moments from
the first principal component. Mirzapour and Ghassemian (2016) compared geometric,
Zernike, and Legendre moments for hyperspectral images. They used several principal
components for moment computation. The authors showed that geometric moments are better for images of agricultural areas, while Zernike and Legendre moments are better for
urban datasets.
Figure 6. Spatial feature extraction using SAE: the neighbouring region of each pixel of the reduced hyperspectral image is flattened and fed through the layers of the SAE to produce the feature set.
v_{i,j}^{l,m} = g( b^{l,m} + Σ_p Σ_{s=0}^{S_l − 1} Σ_{t=0}^{T_l − 1} w_{s,t}^{l,m,p} v_{(i+s),(j+t)}^{(l−1),p} )    (41)
where S_l × T_l is the size of the kernel at the lth layer, and w_{s,t}^{l,m,p} is the weight value at position (s, t)
in the pth feature map. The 2-D CNN is usually not used with the raw hyperspectral image. Instead, the original image is reduced as shown in Figure 7. This is the major drawback of 2-D CNN based schemes, as the original spectral correlation is lost in the process. Chen et al. (2016) applied PCA to the original image and took only the first principal component. The first principal component was used to extract layerwise deep spatial features using a CNN with 4 × 4 and 5 × 5 kernels at different layers. Yang, Zhao, and Chan (2017)
averaged all the spectral bands and generated a single-band image. The spatial features were produced from the single band by using a CNN with a comparatively large kernel of size 21 × 21. Another PCA based approach was proposed by Song et al. (2018), where a few most informative principal components were taken as input to the CNN. The deep spatial features were learned with the help of 23 × 23, 25 × 25, and 27 × 27 kernels for different images. The major spatial feature techniques are summarized in Table 2.
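The "reduced image plus pixel neighbouring region" input used by the 2-D schemes above can be sketched as a patch-extraction step; the edge padding, the 5 × 5 window, and the first principal component standing in for the reduced image are assumptions:

```python
import numpy as np

def extract_patches(image, half):
    """Per-pixel neighbouring regions (patches) for 2-D spatial feature learning."""
    padded = np.pad(image, ((half, half), (half, half)), mode='edge')
    H, W = image.shape
    size = 2 * half + 1
    patches = np.empty((H, W, size, size))
    for i in range(H):
        for j in range(W):
            patches[i, j] = padded[i:i + size, j:j + size]   # window centred on (i, j)
    return patches

pc1 = np.arange(100, dtype=float).reshape(10, 10)   # stand-in for the reduced image
p = extract_patches(pc1, half=2)                    # 5 x 5 neighbourhood per pixel
print(p.shape)  # (10, 10, 5, 5)
```

Each patch then becomes one training sample for the 2-D CNN (or, flattened, for the spatial SAE).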
4. Spectral–spatial features
When separate steps are used for extraction of spectral and spatial features, the joint
spectral–spatial correlation is ignored. Spectral–spatial feature extraction techniques
simultaneously extract spectral and spatial features. These techniques work on all the
three dimensions of the raw hyperspectral image cube without changing the original
shape. Thus, the original spectral–spatial correlation is retained. Mostly, these techniques are extensions of conventional 2-D techniques into 3-D space.
Figure 8. Spectral–spatial feature extraction using CNN: the neighbouring region of each pixel of the hyperspectral image is stacked and fed through the convolutional and pooling layers of the CNN to produce the feature set.
Σ_tv = (1/n) Σ_{i=1}^{n} (x_tv^i − x̄_tv)(x_tv^i − x̄_tv)^T    (42)

where x̄_tv = (1/n) Σ_{i=1}^{n} x_tv^i and n is the number of image elements. In an analogy to PCA, the transformation tensor matrix can be obtained by solving the following equation

Σ_tv V_tv = Λ_tv V_tv    (43)
where V_tv is an orthonormal tensor matrix and Λ_tv is a diagonal tensor matrix. A fast version of tensor PCA can be implemented in the Fourier domain.
Zhang et al. (2013) developed a tensor representation based supervised feature
extraction technique called tensor discriminative locality alignment (TDLA). The tensor
representation preserves original spectral and spatial constraints of the pixels and their
neighbours. TDLA finds a multilinear transformation to reduce the original high order
feature space to a lower order feature space. The output feature space preserves the
discriminability of the classes. It employs an optimization algorithm that reduces the distance between same-class pixels while increasing the distance between pixels of different classes.
Another tensor-based feature extraction method called local tensor discriminant analysis
(LTDA) was proposed by Nie et al. (2009) to overcome the limitations of conventional LDA, which assumes a Gaussian-like class distribution. LTDA was used by Zhong et al.
(2015) for reducing the redundancy from the spectral–spatial feature set of the hyperspec-
tral images. The authors demonstrated that features are better represented in tensor format.
where S is the normalization scale, (x, y) and λ are the spatial and wavelength variables respectively, \((\sigma_x, \sigma_y, \sigma_\lambda)\) defines the width of the Gaussian envelope along the three axes, \([x', y', \lambda']^T = R\,[x, y, \lambda]^T\), and R is a rotation matrix for transformation. The amplitude of the centre frequency is given by
\[
F = \sqrt{F_x^2 + F_y^2 + F_\lambda^2} \quad (45)
\]
identify the best subset. Bau, Sarkar, and Healey (2010) used a set of 3-D Gabor filters for
modelling spectral–spatial information. They developed a selection procedure to select
the most significant features from the generated set. Shen and Jia (2011) designed a set of
complex Gabor filters based on different orientations and frequencies. A selection and
fusion method was developed to reduce the redundant contents and generate an optimized feature set. Considering the huge number of features generated by Gabor filters, Shen et al. (2013) proposed a supervised selection method for Gabor features based on symmetrical uncertainty and the approximate Markov blanket.
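A 3-D Gabor filter of the kind used in these filter banks can be sketched as a Gaussian envelope modulated by a cosine carrier at centre frequency \((F_x, F_y, F_\lambda)\). The function name, kernel size, and parameter values below are illustrative assumptions, and rotation of the envelope is omitted:

```python
import numpy as np

def gabor_3d(size, sigma, freq):
    """Real-valued 3-D Gabor kernel: Gaussian envelope times cosine carrier.
    size: half-width of the kernel; sigma: (sx, sy, sl) envelope widths;
    freq: (Fx, Fy, Fl) centre frequency in cycles per voxel."""
    ax = np.arange(-size, size + 1)
    x, y, l = np.meshgrid(ax, ax, ax, indexing="ij")
    sx, sy, sl = sigma
    env = np.exp(-0.5 * (x**2 / sx**2 + y**2 / sy**2 + l**2 / sl**2))
    carrier = np.cos(2 * np.pi * (freq[0] * x + freq[1] * y + freq[2] * l))
    return env * carrier
```

Convolving the image cube with a bank of such kernels at different orientations and frequencies yields the spectral–spatial Gabor features described above.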
where \((\Delta x_1, \Delta y_1, \Delta z_1)\) and \((\Delta x_2, \Delta y_2, \Delta z_2)\) are offsets. The elements of GLCM are converted to probability form by dividing each element by the sum of G to produce a normalized co-occurrence matrix Q
\[
Q(i, j, k) = \frac{G(i, j, k)}{\sum_{i}\sum_{j}\sum_{k} G(i, j, k)} \quad (48)
\]
The statistical features can be determined from Q. Some of the most useful features such
as contrast (C), angular second moment (A), and entropy (E) are computed as follows
\[
C = \sum_{i}\sum_{j}\sum_{k} Q(i, j, k)\left[(i - j)^2 + (j - k)^2 + (i - k)^2\right] \quad (49)
\]
\[
A = \sum_{i}\sum_{j}\sum_{k} Q^2(i, j, k) \quad (50)
\]
\[
E = -\sum_{i}\sum_{j}\sum_{k} Q(i, j, k)\ln Q(i, j, k) \quad (51)
\]
The appropriate kernel size is important for texture analysis. A semi-variance and spectral
separability based method was developed by Tsai and Lai (2013) for choosing appropriate
kernel size for volumetric images. All such features are kept collectively to form a texture
feature set.
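Given a normalized co-occurrence matrix Q, Equations (49)–(51) translate almost directly into code. This sketch assumes Q is supplied as a 3-D array (its construction from offsets is omitted) and treats 0·ln 0 as 0:

```python
import numpy as np

def glcm3d_features(Q):
    """Contrast, angular second moment, and entropy of a normalized 3-D GLCM Q."""
    i, j, k = np.indices(Q.shape)
    contrast = np.sum(Q * ((i - j) ** 2 + (j - k) ** 2 + (i - k) ** 2))  # Eq. (49)
    asm = np.sum(Q ** 2)                                                 # Eq. (50)
    nz = Q[Q > 0]                       # skip zero entries: 0 * ln 0 := 0
    entropy = -np.sum(nz * np.log(nz))                                   # Eq. (51)
    return contrast, asm, entropy
```

For a uniform Q the entropy attains its maximum, ln of the number of cells, while the ASM attains its minimum.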
where Q is a 3-D structuring element centred at ði; j; kÞ. Similarly, 3-D dilation is performed
as follows
The 3-D opening, 3-D closing, and 3-D profiles are also derived from the general morphological operations as defined in Section 3.1. The 3-D operators better suit hyperspectral data because the cubical nature of the data is preserved.
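Greyscale 3-D erosion and dilation over a cubic structuring element can be sketched in pure numpy as follows (an illustration, not an optimized implementation; borders wrap around, so the outermost voxels should be cropped in practice, and the window size is an assumption):

```python
import numpy as np

def erode3d(img, w=3):
    """Greyscale 3-D erosion: minimum over a w x w x w neighbourhood."""
    r = w // 2
    out = np.full(img.shape, np.inf)
    for dz in range(-r, r + 1):
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                shifted = np.roll(img, (dz, dy, dx), axis=(0, 1, 2))
                out = np.minimum(out, shifted)   # running minimum over the window
    return out  # note: np.roll wraps at borders

def dilate3d(img, w=3):
    """Greyscale 3-D dilation: maximum over the same neighbourhood (dual of erosion)."""
    return -erode3d(-img, w)
```

Opening (erosion then dilation) and closing (dilation then erosion), and hence 3-D morphological profiles, follow by composing these two operators with structuring elements of increasing size.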
where \(g_c\) is the gray level value of the centre voxel, \(g_i\) corresponds to the ith voxel gray level, r is the radius of the sphere, and n is the number of voxels in the neighbourhood. The local
continuity of the surface can be described by another number, \(3DLBP2_{n,r}\), representing the number of neighbouring voxels having a gray level value larger than that of the central voxel, determined as follows
\[
3DLBP2_{n,r} =
\begin{cases}
\sum_{i=0}^{n-1} s(g_i - g_c), & \text{if } V(3DLBP2_{n,r}) \le n \\
n + 1, & \text{otherwise}
\end{cases}
\quad (55)
\]
where
\[
V(3DLBP2_{n,r}) = \sum \left| s(g_j - g_c) - s(g_k - g_c) \right| \quad (56)
\]
where gj and gk are adjacent voxels. The 3-D LBP has exhibited better performance than
2-D LBP for hyperspectral images.
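The inner sum of Equation (55) — counting neighbouring voxels whose gray level is not below the centre voxel's — can be sketched for a single voxel. The 26-connected cubic neighbourhood used here is a simplifying assumption (the method itself samples a sphere of radius r):

```python
import numpy as np

def s(x):
    """Unit step function used by LBP: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def lbp3d_count(cube, z, y, x):
    """Sum of s(g_i - g_c) over the 26-connected neighbourhood of voxel (z, y, x),
    i.e. the count in Equation (55) before the uniformity test."""
    gc = cube[z, y, x]
    total = 0
    for dz in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dz == dy == dx == 0:
                    continue  # skip the centre voxel itself
                total += s(cube[z + dz, y + dy, x + dx] - gc)
    return total
```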
[Figure: autoencoder-based spatial feature extraction — a neighbouring region of the reduced hyperspectral image is flattened to form the feature set.]
spatial features from DBN. A similar approach was used by Lan et al. (2019) with DAE. The
spatial component was prepared from a PCA reduced image with the help of a cubical
kernel. Similarly, an unsupervised feature learning approach was developed by Tao et al.
(2015) with SSAE, who proposed using multiple kernels in a multiscale manner for the spatial component. They used a reduced spectral dimension for computational efficiency. Kang et al. (2018) used SSAE to generate deep spectral–spatial features with the help of Gabor filters. The Gabor features were fused with spectral values and provided as input to the stacked sparse autoencoder. After training the deep network, the deep features were extracted.
The spectral–spatial feature extraction frameworks based on different types of
autoencoders can produce joint deep features. However, these methods still use
vectorization and thus do not preserve the original correlation of the data.
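The vectorization step these autoencoder frameworks rely on can be sketched as follows: flatten a w × w neighbourhood of the reduced image and concatenate the pixel's own spectrum to form the network input. Function name and shapes are illustrative assumptions:

```python
import numpy as np

def ae_input_vector(reduced, spectrum, i, j, w):
    """Vectorized spectral-spatial input for an autoencoder: a flattened w x w
    neighbourhood of the reduced image around pixel (i, j), concatenated with the
    pixel's spectral vector. Note the flattening discards the 2-D arrangement."""
    r = w // 2
    patch = reduced[i - r:i + r + 1, j - r:j + r + 1, :]
    return np.concatenate([patch.ravel(), spectrum])
```

It is exactly this `ravel()` call that destroys the original spatial correlation, which motivates the 3-D CNN approach described next.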
3-D CNN can better deal with joint spectral–spatial information. It uses a 3-D kernel for convolution. The 2-D CNN convolution equation is further extended to obtain the value of a neuron at position (i, j, k) in a 3-D CNN
\[
v_{l,m}^{i,j,k} = g\left(\beta_{l,m} + \sum_{p}\sum_{s=0}^{S_l - 1}\sum_{t=0}^{T_l - 1}\sum_{u=0}^{U_l - 1} w_{l,m,p}^{s,t,u}\, v_{(l-1),p}^{(i+s),(j+t),(k+u)}\right) \quad (57)
\]
where \(S_l \times T_l \times U_l\) is the kernel size at the lth layer. The 3-D CNN can be used on the raw image cube without any dimensionality reduction to extract joint spectral–spatial features for classification. For this purpose, the image is divided into patches, one for each pixel, with that pixel at the centre of its patch. The image patches are used as input to the CNN for extracting pixelwise features, as shown in Figure 9. Chen et al. (2016) demonstrated the use of 3-D CNN for spectral–spatial feature computation with the help of 27 × 27 × B patches. A classification framework based on 3-D CNN was developed by Chen et al. (2018), where spectral and spatial features were jointly exploited. The authors tested different block sizes and found that larger blocks provide better results.
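Equation (57) can be illustrated for a single neuron with a direct (unoptimized) sketch, assuming the input feature maps of layer l−1 are stacked along the first array axis and the activation g is ReLU:

```python
import numpy as np

def conv3d_neuron(v_prev, w, beta, i, j, k):
    """One output value per Equation (57): sum over input maps p and kernel
    offsets (s, t, u) of w * v, plus bias beta, passed through g = ReLU.
    v_prev: (P, Z, Y, X) activations of layer l-1; w: (P, S, T, U) kernel."""
    P, S, T, U = w.shape
    acc = beta
    for p in range(P):
        for s in range(S):
            for t in range(T):
                for u in range(U):
                    acc += w[p, s, t, u] * v_prev[p, i + s, j + t, k + u]
    return max(acc, 0.0)  # g = ReLU
```

In practice a deep-learning framework performs this same computation for every output position at once; the sketch only makes the index pattern of Equation (57) explicit.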
[Figure: 3-D CNN-based spectral–spatial feature extraction — stacked hyperspectral image patches pass through the convolutional layers of a CNN.]
5. Experiments
Experiments are carried out for some well-known techniques presented in Section 2.
Although experimental results for individual techniques are reported in various publica-
tions, here a comparative study is performed.
5.1.2. Salinas
Salinas is a 512 × 217 pixel image captured by the AVIRIS sensor over Salinas Valley, USA, in 1998. The pixel size is 3.7 m. It represents agricultural land mainly consisting of soil, vegetation, and vineyard fields. Originally there were 224 spectral bands, of which 20 were removed. There are 16 information classes: Broccoli green weeds 1, Broccoli
green weeds 2, Fallow, Fallow rough plough, Fallow smooth, Stubble, Celery, Grapes
untrained, Soil vineyard develop, Corn senesced green weeds, Lettuce romaine 4 wk,
Lettuce romaine 5 wk, Lettuce romaine 6 wk, Lettuce romaine 7 wk, Vineyard untrained,
and Vineyard vertical trellis. Figure 11 shows the FCC and ground reference for Salinas.
[Figure: FCC and ground reference for Pavia University — classes: Asphalt, Meadows, Gravel, Tree, Metal sheets, Soil, Bitumen, Bricks, Shadow, Undefined; scale bar 100 m.]
The classification accuracy is measured in terms of overall accuracy (OA) and kappa (κ)
coefficient. Both global and classwise accuracies are reported. All the results are the
average of 10 trials.
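Both measures follow directly from the confusion matrix; a small sketch (variable names are illustrative):

```python
import numpy as np

def oa_kappa(conf):
    """Overall accuracy and Cohen's kappa from a confusion matrix
    (rows: reference classes, columns: predicted classes)."""
    conf = np.asarray(conf, dtype=float)
    n = conf.sum()
    po = np.trace(conf) / n                         # observed agreement = OA
    pe = (conf.sum(0) * conf.sum(1)).sum() / n**2   # agreement expected by chance
    return po, (po - pe) / (1 - pe)                 # kappa corrects OA for chance
```

For example, a perfectly diagonal confusion matrix gives OA = 1 and κ = 1, while κ drops toward 0 as the classifier approaches chance-level agreement.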
[Figure 11: FCC and ground reference for Salinas — class legend with 100 m scale bar.]
Table 3. The information classes and corresponding number of training and test pixels for hyperspectral datasets.

Pavia University
Class     Class name      Train   Test
Class 1   Asphalt         548     6304
Class 2   Meadows         540     18,146
Class 3   Gravel          392     1815
Class 4   Tree            524     2912
Class 5   Metal sheets    292     1113
Class 6   Soil            532     4572
Class 7   Bitumen         375     981
Class 8   Bricks          514     3364
Class 9   Shadow          231     795

Salinas
Class      Class name                    Train   Test
Class 1    Broccoli green weeds 1        200     1809
Class 2    Broccoli green weeds 2        200     3526
Class 3    Fallow                        200     1776
Class 4    Fallow rough plough           200     1194
Class 5    Fallow smooth                 200     2478
Class 6    Stubble                       200     3759
Class 7    Celery                        200     3559
Class 8    Grapes untrained              200     11,071
Class 9    Soil vineyard develop         200     6003
Class 10   Corn senesced green weeds     200     3078
Class 11   Lettuce romaine 4 wk          200     868
Class 12   Lettuce romaine 5 wk          200     1727
Class 13   Lettuce romaine 6 wk          200     716
Class 14   Lettuce romaine 7 wk          200     870
Class 15   Vineyard untrained            200     7068
Class 16   Vineyard vertical trellis     200     1607
Table 4. Classwise and global classification accuracies obtained using major conventional spectral
feature extraction techniques for Pavia University. The number of features is given within brackets.
Class Raw PCA (6) MNF (5) DWT (7) ICA (15) NWFE (40) CNFE (11) DBFE (27)
Class 1 0.8050 0.8394 0.8255 0.8134 0.8357 0.7593 0.7579 0.9155
Class 2 0.7185 0.6692 0.6673 0.6619 0.6525 0.5606 0.7691 0.8683
Class 3 0.7612 0.7254 0.7226 0.7374 0.6762 0.6986 0.7774 0.8144
Class 4 0.9627 0.9429 0.9287 0.9469 0.9504 0.8301 0.8619 0.9461
Class 5 0.9861 0.9776 0.9981 0.9907 0.9981 0.9870 0.9946 1.0000
Class 6 0.8263 0.8235 0.7887 0.7555 0.7927 0.5093 0.6721 0.9232
Class 7 0.8849 0.8751 0.8663 0.8784 0.8470 0.8394 0.9046 0.8950
Class 8 0.8428 0.8176 0.7746 0.8069 0.8156 0.7433 0.8937 0.8710
Class 9 1.0000 0.9917 0.9986 1.0000 0.9945 0.9807 0.9798 1.0000
Global κ 0.8041 0.7838 0.7717 0.7691 0.7702 0.6710 0.7269 0.8968
OA (%) 85.16 83.49 82.65 82.46 82.81 74.92 79.21 92.34
Table 5. Classwise and global classification accuracies obtained using major deep learning spectral
feature extraction techniques for Pavia University. The number of features is given within brackets.
Class SAE (60) SSAE (60) DBN (30) CNN (9)
Class 1 0.9146 0.9188 0.9276 0.9346
Class 2 0.9524 0.9465 0.9432 0.9518
Class 3 0.8812 0.8937 0.9128 0.9124
Class 4 0.9627 0.9598 0.9584 0.9457
Class 5 0.9894 0.9914 0.9926 0.9842
Class 6 0.9063 0.9126 0.9077 0.9014
Class 7 0.9149 0.9189 0.9209 0.9165
Class 8 0.8928 0.8876 0.9143 0.9248
Class 9 0.9924 0.9925 0.9915 0.9965
Global κ 0.9236 0.9284 0.9316 0.9368
OA (%) 93.16 93.48 94.31 94.62
Table 6. Classwise and global classification accuracies obtained using major conventional spectral
feature extraction techniques for Salinas. The number of features is given within brackets.
Class Raw (204) PCA (5) MNF (15) DWT (9) ICA (20) NWFE (70) CNFE (30) DBFE (143)
Class 1 0.9943 0.9800 0.9875 0.9869 0.9830 0.9920 0.9807 0.9943
Class 2 0.9924 0.9957 0.9960 0.9942 0.9903 0.9979 0.9961 0.9915
Class 3 0.9890 0.9715 0.9953 0.9942 0.9740 0.9581 0.9751 0.9785
Class 4 0.9941 0.9966 0.9992 0.9983 0.9949 0.9975 0.9975 0.9864
Class 5 0.9819 0.9852 0.9704 0.9763 0.9949 0.9573 0.9819 0.9683
Class 6 0.9971 0.9991 0.9983 0.9960 0.9994 0.9977 0.9980 0.9977
Class 7 0.9909 0.9978 0.9956 0.9949 0.9956 0.9896 0.9946 0.9946
Class 8 0.6342 0.6588 0.6545 0.6708 0.6302 0.6563 0.6088 0.6592
Class 9 0.9900 0.9943 0.9917 0.9793 0.9964 0.9386 0.9876 0.9592
Class 10 0.9408 0.9146 0.8811 0.9016 0.9156 0.9072 0.8860 0.9467
Class 11 0.9851 0.9907 0.9839 0.9770 0.9747 0.9595 0.9841 0.9689
Class 12 0.9959 0.9964 0.9934 0.9991 0.9992 0.9881 0.9987 0.9994
Class 13 0.9795 0.9931 0.9973 0.9931 0.9973 0.9835 0.9946 0.9778
Class 14 0.9783 0.9473 0.9540 0.9526 0.9424 0.9621 0.9465 0.9817
Class 15 0.6559 0.6718 0.6796 0.6495 0.6469 0.6525 0.6783 0.7103
Class 16 0.9873 0.9885 0.9898 0.9955 0.9962 0.9968 0.9943 0.9878
Global κ 0.8741 0.8793 0.8777 0.8767 0.8710 0.8672 0.8676 0.8815
OA (%) 88.72 89.18 89.04 88.96 88.45 88.11 88.13 90.16
Table 7. Classwise and global classification accuracies obtained using major deep learning spectral
feature extraction techniques for Salinas. The number of features is given within brackets.
Class SAE (60) SSAE (60) DBN (30) CNN (16)
Class 1 0.9825 0.9786 0.9924 0.9942
Class 2 0.9846 0.9924 0.9944 0.9931
Class 3 0.9632 0.9648 0.9665 0.9736
Class 4 0.9749 0.9752 0.9732 0.9847
Class 5 0.9858 0.9844 0.9789 0.9729
Class 6 0.9847 0.9769 0.9816 0.9834
Class 7 0.9904 0.9852 0.9811 0.9863
Class 8 0.9066 0.9264 0.9168 0.9166
Class 9 0.9842 0.9676 0.9748 0.9684
Class 10 0.9272 0.9128 0.9224 0.9365
Class 11 0.9314 0.9345 0.9126 0.9325
Class 12 0.9863 0.9856 0.9764 0.9844
Class 13 0.9844 0.9768 0.9602 0.9808
Class 14 0.9285 0.9225 0.9384 0.9215
Class 15 0.8224 0.8428 0.8627 0.8522
Class 16 0.9076 0.9062 0.9118 0.9108
Global κ 0.9218 0.9284 0.9329 0.9414
OA (%) 92.94 93.12 93.84 94.85
sharply with the increase in the number of features. Beyond a certain point, the accuracy stabilizes, with no significant improvement as the number of features increases. In some cases, accuracy drops when a larger feature set is used. It can be established that feature extraction techniques keep most of the important information in a few top features having higher eigenvalues. This characteristic helps to reduce the dimensionality while retaining the important information.
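This behaviour suggests a simple rule for picking the feature count from the eigenvalue spectrum: keep the smallest number of top features whose cumulative share of the total variance reaches a threshold. A sketch, with the threshold value as an illustrative assumption:

```python
import numpy as np

def n_features_for_variance(eigvals, frac=0.99):
    """Smallest number of top-eigenvalue features whose cumulative share of the
    total variance reaches `frac`."""
    vals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending eigenvalues
    cum = np.cumsum(vals) / vals.sum()                      # cumulative variance share
    return int(np.searchsorted(cum, frac) + 1)
```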
[Figure 12 residue: global κ (0.1–1.0) versus number of features (2–100), with curves for CNFE, CNN, DBFE, DWT, ICA, MNF, NWFE, PCA, and SAE.]
Figure 12. Global κ versus the number of features: (a) Pavia University, (b) Salinas.
Figure 13. Classification maps for Pavia University: (a) Raw, (b) DBFE, (c) SAE, (d) CNN, (e) 3-D EMP, (f)
3-D CNN.
Figure 14. Classification maps for Salinas: (a) Raw, (b) DBFE, (c) SAE, (d) CNN, (e) 3-D EMP, (f) 3-D CNN.
6. Conclusion
In this paper, a review of the feature extraction techniques used in the classification of
hyperspectral images is presented. Different approaches for spectral, spatial, and
spectral–spatial feature extraction techniques are reviewed and their strengths and
weaknesses are discussed. The experiments are carried out on two different types of
hyperspectral images. The results show that dimensionality related issues can be
managed well using feature extraction without significantly compromising classifica-
tion accuracy. Supervised techniques provide better accuracy than their unsupervised
counterparts. In the absence of training data, the unsupervised feature extraction can
provide acceptable solutions. It is also observed from the results that spatial features
Acknowledgements
The authors would like to thank Prof. Paolo Gamba of University of Pavia, Italy for providing ROSIS
dataset. This work is supported by TEQIP-III project funded by World Bank, NPIU, and MHRD, Govt. of
India under grant number TEQIP3/MRPSG/01.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This work was supported by the TEQIP III [TEQIP3/MRPSG/01].
ORCID
Brajesh Kumar http://orcid.org/0000-0001-8100-7287
Onkar Dikshit http://orcid.org/0000-0003-3213-8218
Ashwani Gupta http://orcid.org/0000-0002-2199-8346
Manoj Kumar Singh http://orcid.org/0000-0003-3119-1244
References
Bach, F. R., and M. I. Jordan. 2003. “Kernel Independent Component Analysis.” Journal of Machine
Learning Research 3 (1): 1–48.
Bachmann, C. M., T. L. Ainsworth, and R. A. Fusina. 2005. “Exploiting Manifold Geometry in
Hyperspectral Imagery.” IEEE Transactions on Geoscience and Remote Sensing 43 (3): 441–454.
doi:10.1109/TGRS.2004.842292.
Bachmann, C. M., T. L. Ainsworth, and R. A. Fusina. 2006. “Improved Manifold Coordinate
Representations of Large-Scale Hyperspectral Scenes.” IEEE Transactions on Geoscience and
Remote Sensing 44 (10): 2786–2803. doi:10.1109/TGRS.2006.881801.
Backer, S. D., P. Kempeneers, W. Debruyn, and W. Scheunders. 2005. “A Band Selection Technique for
Spectral Classification.” IEEE Geoscience and Remote Sensing Letters 2 (3): 319–323. doi:10.1109/
LGRS.2005.848511.
Bau, T. C., S. Sarkar, and G. Healey. 2010. “Hyperspectral Region Classification Using a
Three-Dimensional Gabor Filterbank.” IEEE Transactions on Geoscience and Remote Sensing 48
(9): 3457–3464. doi:10.1109/TGRS.2010.2046494.
Baudat, G., and F. Anouar. 2000. “Generalized Discriminant Analysis Using a Kernel Approach.”
Neural Computing 12 (1): 2385–2404. doi:10.1162/089976600300014980.
Benediktsson, J. A., and P. Ghamisi. 2016. Spectral-Spatial Classification of Hyperspectral Remote
Sensing Images. London: Artech House.
Bruce, L. M., C. H. Koger, and J. Li. 2002. “Dimensionality Reduction of Hyperspectral Data Using
Discrete Wavelet Transform Feature Extraction.” IEEE Transactions on Geoscience and Remote
Sensing 40 (10): 2331–2338. doi:10.1109/TGRS.2002.804721.
Chang, C.-I., and Q. Du. 1999. “Interference and Noise-adjusted Principal Components Analysis.” IEEE
Transactions on Geoscience and Remote Sensing 37 (5): 2387–2396. doi:10.1109/36.789637.
Chang, C.-I., Q. Du, T.-L. Sun, and M. Althouse. 1999. “A Joint Band Prioritization and
Band-decorrelation Approach to Band Selection for Hyperspectral Image Classification.” IEEE
Transactions on Geoscience and Remote Sensing 37 (6): 2631–2641. doi:10.1109/36.803411.
Chang, C.-I., and S. Wang. 2006. “Constrained Band Selection for Hyperspectral Imagery.” IEEE
Transactions on Geoscience and Remote Sensing 44 (6): 1575–1585. doi:10.1109/
TGRS.2006.864389.
Chen, C., F. Jiang, C. Yang, S. Rho, W. Shen, S. Liu, and Z. Liu. 2018. “Hyperspectral Classification Based
on Spectralspatial Convolutional Neural Networks.” Engineering Applications of Artificial
Intelligence 68 (1): 165–171. doi:10.1016/j.engappai.2017.10.015.
Chen, Y., H. Jiang, C. Li, X. Jia, and P. Ghamisi. 2016. “Deep Feature Extraction and Classification of
Hyperspectral Images Based on Convolutional Neural Networks.” IEEE Transactions on Geoscience
and Remote Sensing 54 (10): 6232–16241. doi:10.1109/TGRS.2016.2584107.
Chen, Y., Z. Lin, X. Zhao, G. Wang, and Y. Gu. 2014. “Deep Learning-based Classification of
Hyperspectral Data.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing 7 (6): 2094–2107. doi:10.1109/JSTARS.2014.2329330.
Chen, Y., X. Zhao, and X. Jia. 2015. “Spectral-Spatial Classification of Hyperspectral Data Based on
Deep Belief Network.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing 8 (6): 2321–2332. doi:10.1109/JSTARS.2015.2388577.
Chiang, -S.-S., C.-I. Chang, and I. W. Ginsberg. 2001. “Unsupervised Target Detection in Hyperspectral
Images Using Projection Pursuit.” IEEE Transactions on Geoscience and Remote Sensing 39 (7):
1380–1391. doi:10.1109/36.934071.
Dopido, I., A. Villa, A. Plaza, and P. Gamba. 2012. “A Quantitative and Comparative Assessment of
Unmixing-Based Feature Extraction Techniques for Hyperspectral Image Classification.” IEEE
Journal of Selected Topics in Applied Earth Observations and Remote Sensing 5 (2): 421–435.
doi:10.1109/JSTARS.2011.2176721.
Duda, R. O., P. E. Hart, and D. G. Stock. 2001. Pattern Classification. 2nd ed. New York: Wiley.
Dutta, D., A. E. Goodwell, P. Kumar, J. E. Garvey, R. G. Darmody, D. P. Berretta, and J. A. Greenberg.
2015. “On the Feasibility of Characterizing Soil Properties from AVIRIS Data.” IEEE Transactions on
Geoscience and Remote Sensing 53 (9): 5133–5147. doi:10.1109/TGRS.36.
Estevez, P. A., T. Michel, C. A. Perez, and J. M. Zurada. 2009. “Normalized Mutual Information
Feature Selection.” IEEE Transactions on Neural Networks 20 (2): 189–201. doi:10.1109/
TNN.2008.2005601.
Falco, N., J. A. Benediktsson, and L. Bruzzone. 2014. “A Study on the Effectiveness of Different
Independent Component Analysis Algorithms for Hyperspectral Image Classification.” IEEE
Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6): 2183–2199.
doi:10.1109/JSTARS.2014.2329792.
Fauvel, M., J. A. Benediktsson, J. Chanussot, and J. R. Sveinsson. 2008. “Spectral and Spatial
Classification of Hyperspectral Data Using SVMs and Morphological Profiles.” IEEE Transactions
on Geoscience and Remote Sensing 46 (11): 3804–3814. doi:10.1109/TGRS.2008.922034.
Friedman, J. H., and J. W. Tukey. 1974. “A Projection Pursuit Algorithm for Exploratory Data Analysis.”
IEEE Transactions on Computers C-23: 881–889. doi:10.1109/T-C.1974.224051.
Fukunaga, K., and M. Mantock. 1983. “Nonparametric Discriminant Analysis.” IEEE Transactions on
Pattern Analysis and Machine Intelligence 5 (6): 671–678. doi:10.1109/TPAMI.1983.4767461.
Garcia-Salgado, B. P., and V. Ponomaryov. 2016. “Feature Extraction Scheme for a Textural
Hyperspectral Image Classification Using Gray-scaled HSV and NDVI Image Features Vectors
Fusion.” Proceedings of International Conference on Electronics, Communications and Computers
(CONIELECOMP), 186–191. Cholula: Mexico.
Goetz, A. F. H. 2009. “Three Decades of Hyperspectral Remote Sensing of the Earth: A Personal View.”
Remote Sensing of Environment 113: S5–S16. doi:10.1016/j.rse.2007.12.014.
Keshava, N. 2004. “Distance Metrics and Band Selection in Hyperspectral Processing with
Applications to Material Identification and Spectral Libraries.” IEEE Transactions on Geoscience
and Remote Sensing 42 (7): 1552–1565. doi:10.1109/TGRS.2004.830549.
Krizhevsky, A., I. Sutskever, and G. E. Hinton. 2012. “ImageNet Classification with Deep Convolutional
Neural Networks.” Proceedings of Advanced Neural Information Processing Systems (NIPS),
1097–1105, Lake Tahoe, Nevada, USA.
Kumar, B., and O. Dikshit. 2015a. “Integrating Spectral and Textural Features for Urban Land Cover
Classification with Hyperspectral Data.” Proceedings of Joint Urban Remote Sensing Event (JURSE),
1–4. Lausanne, Switzerland.
Kumar, B., and O. Dikshit. 2017. “Hyperspectral Image Classification Based on Morphological Profiles
and Decision Fusion.” International Journal of Remote Sensing 38 (20): 5830–5854. doi:10.1080/
01431161.2017.1348636.
Kuo, B.-C., and D. A. Landgrebe. 2004. “Nonparametric Weighted Feature Extraction for
Classification.” IEEE Transactions on Geoscience and Remote Sensing 42 (5): 1096–1105.
doi:10.1109/TGRS.2004.825578.
Kuo, B.-C., C.-H. Li, and J.-M. Yang. 2009. “Kernel Nonparametric Weighted Feature Extraction for
Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 47 (4):
1139–1155. doi:10.1109/TGRS.2008.2008308.
Lan, R., Z. Li, Z. Liu, T. Gu, and X. Luo. 2019. “Hyperspectral Image Classification Using K-sparse
Denoising Autoencoder and Spectral-restricted Spatial Characteristics.” Applied Soft Computing
74: 693–708. doi:10.1016/j.asoc.2018.08.049.
Landgrebe, D. A. 2003. Signal Theory Methods in Multispectral Remote Sensing. New York: Wiley.
Lee, C., and D. A. Landgrebe. 1993. “Decision Boundary Feature Extraction for Nonparametric
Classification.” IEEE Transactions on Systems, Man, and Cybernetics 23 (2): 433–444. doi:10.1109/
21.229456.
Li, J. 2004. “Wavelet-Based Feature Extraction for Improved Endmember Abundance Estimation in
Linear Unmixing of Hyperspectral Signals.” IEEE Transactions on Geoscience and Remote Sensing 42
(3): 644–649. doi:10.1109/TGRS.2003.822750.
Li, S., H. Wu, D. Wan, and J. Zhu. 2011a. “An Effective Feature Selection Method for Hyperspectral
Image Classification Based on Genetic Algorithm and Support Vector Machine.” Knowledge-Based
Systems 24 (1): 40–48. doi:10.1016/j.knosys.2010.07.003.
Li, W., S. Prasad, J. E. Fowler, and L. M. Bruce. 2011b. “Locality-preserving Discriminant Analysis in
Kernel-induced Feature Spaces for Hyperspectral Image Classification.” IEEE Geoscience and
Remote Sensing Letters 8 (5): 895–898. doi:10.1109/LGRS.2011.2128854.
Li, W., F. Feng, H. Li, and Q. Du. 2018. “Discriminant Analysis-Based Dimension Reduction for
Hyperspectral Image Classification: A Survey of the Most Recent Advances and an Experimental
Comparison of Different Techniques.” IEEE Geoscience and Remote Sensing Magazine 15–34.
doi:10.1109/MGRS.2018.2793873.
Lv, F., M. Han, and T. Qiu. 2017. “Remote Sensing Image Classification Based on Ensemble Extreme
Learning Machine with Stacked Autoencoder.” IEEE Access 5: 9021–9031. doi:10.1109/
ACCESS.2017.2706363.
Lv, Z. Y., P. Zhang, J. A. Benediktsson, and W. Z. Shi. 2014. “Morphological Profiles Based on
Differently Shaped Structuring Elements for Classification of Images with Very High Spatial
Resolution.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7
(12): 4644–4652. doi:10.1109/JSTARS.2014.2328618.
Ma, L., and M. M. Crawford. 2010. “Local Manifold Learning Based K-Nearest Neighbor for
Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 48
(11): 4099–4109.
Martinez-Uso, A., F. Pla, J. M. Sotoca, and P. Garcia-Sevilla. 2007. “Clustering-based Hyperspectral
Band Selection Using Information Measures.” IEEE Transactions on Geoscience and Remote Sensing
45 (12): 4158–4171. doi:10.1109/TGRS.2007.904951.
Mirzapour, F., and H. Ghassemian. 2016. “Moment-based Feature Extraction from High Spatial
Resolution Hyperspectral Images.” International Journal of Remote Sensing 37 (6): 1349–1361.
doi:10.1080/2150704X.2016.1151568.
Mura, M. D., A. Villa, J. A. Benediktsson, J. Chanussot, and L. Bruzzone. 2011. “Classification of Hyperspectral Images by Using Extended Morphological Attribute Profiles and Independent Component Analysis.” IEEE Geoscience and Remote Sensing Letters 8 (3): 542–546. doi:10.1109/LGRS.2010.2091253.
Neher, R., and A. Srivastava. 2005. “A Bayesian MRF Framework for Labeling Terrain Using
Hyperspectral Imaging.” IEEE Transactions on Geoscience and Remote Sensing 43 (6): 1363–1374.
doi:10.1109/TGRS.2005.846865.
Nie, F., S. Xiang, Y. Song, and C. Zhang. 2009. “Extracting the Optimal Dimensionality for Local
Tensor Discriminant Analysis.” Pattern Recognition 42 (1): 105–114. doi:10.1016/j.
patcog.2008.03.012.
Nielsen, A. A. 2011. “Kernel Maximum Autocorrelation Factor and Minimum Noise Fraction
Transformations.” IEEE Transactions on Image Processing 20 (3): 612–624. doi:10.1109/
TIP.2010.2076296.
Ojala, T., M. Pietikäinen, and T. Mäenpää. 2002. “Multiresolution Gray-scale and Rotation Invariant
Texture Classification with Local Binary Patterns.” IEEE Transactions on Pattern Analysis and
Machine Intelligence 24 (7): 971–987. doi:10.1109/TPAMI.2002.1017623.
Onoyama, H., C. Ryu, M. Suguri, and M. Iida. 2014. “Integrate Growing Temperature to Estimate the
Nitrogen Content of Rice Plants at the Heading Stage Using Hyperspectral Imagery.” IEEE Journal
of Selected Topics in Applied Earth Observations and Remote Sensing 7 (4): 2506–2515. doi:10.1109/
JSTARS.2014.2329474.
Pu, H., Z. Chen, B. Wang, and G.-M. Jiang. 2014. “A Novel Spatial-Spectral Similarity Measure for
Dimensionality Reduction and Classification of Hyperspectral Imagery.” IEEE Transactions on
Geoscience and Remote Sensing 52 (11): 7008–7022. doi:10.1109/TGRS.2014.2306687.
Qian, Y., M. Ye, and J. Zhou. 2012. “Hyperspectral Image Classification Based on Structured Sparse
Logistic Regression and Three-Dimensional Wavelet Texture Features.” IEEE Transactions on
Geoscience and Remote Sensing 51 (4): 2276–2291. doi:10.1109/TGRS.2012.2209657.
Quesada-Barriuso, P., F. Arguello, and D. B. Heras. 2014. “Spectral-Spatial Classification of
Hyperspectral Images Using Wavelets and Extended Morphological Profiles.” IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing 7 (4): 1177–1185. doi:10.1109/
JSTARS.4609443.
Rajadell, O., P. Garcia-Sevilla, and F. Pla. 2013. “Spectral-Spatial Pixel Characterization Using Gabor
Filters for Hyperspectral Image Classification.” IEEE Geoscience and Remote Sensing Letters 10 (4):
860–864. doi:10.1109/LGRS.2012.2226426.
Rasti, B., M. O. Ulfarsson, and J. R. Sveinsson. 2010. “Hyperspectral Feature Extraction Using Total
Variation Component Analysis.” IEEE Transactions on Geoscience and Remote Sensing 54 (12):
6976–6985. doi:10.1109/TGRS.2016.2593463.
Rellier, G., X. Descombes, F. Falzon, and J. Zerubi. 2004. “Texture Feature Analysis Using a
Gauss-Markov Model in Hyperspectral Image Classification.” IEEE Transactions on Geoscience
and Remote Sensing 42 (7): 1543–1551. doi:10.1109/TGRS.2004.830170.
Ren, Y., L. Liao, S. J. Maybank, Y. Zhang, and X. Liu. 2017. “Hyperspectral Image Spectral-Spatial
Feature Extraction via Tensor Principal Component Analysis.” IEEE Geoscience and Remote Sensing
Letters 14 (19): 1431–1435. doi:10.1109/LGRS.2017.2686878.
Romero, A., C. Gatta, and G. Camps-Valls. 2016. “Unsupervised Deep Feature Extraction for Remote
Sensing Image Classification.” IEEE Transactions on Geoscience and Remote Sensing 54 (3):
1349–1362. doi:10.1109/TGRS.2015.2478379.
Scholkopf, B., A. Smola, and K.-R. Müller. 1998. “Nonlinear Component Analysis as a Kernel Eigenvalue
Problem.” Neural Computing 10 (1): 1299–1319. doi:10.1162/089976698300017467.
Serpico, S. B., and G. Moser. 2007. “Extraction of Spectral Channels from Hyperspectral Images for
Classification Purposes.” IEEE Transactions on Geoscience and Remote Sensing 45 (2): 484–495.
doi:10.1109/TGRS.2006.886177.
Shankar, B. U., S. K. Meher, and A. Ghosh. 2011. “Wavelet-fuzzy Hybridization: Feature-extraction and
Land-cover Classification of Remote Sensing Images.” Applied Soft Computing 11: 2999–3011.
doi:10.1016/j.asoc.2010.11.024.
Shen, L., and S. Jia. 2011. “Three-Dimensional Gabor Wavelets for Pixel-Based Hyperspectral Imagery
Classification.” IEEE Transactions on Geoscience and Remote Sensing 49 (12): 5039–5046.
doi:10.1109/TGRS.2011.2157166.
Shen, L., Z. Zhu, S. Jia, J. Zhu, and Y. Sun. 2013. “Discriminative Gabor Feature Selection for
Hyperspectral Image Classification.” IEEE Geoscience and Remote Sensing Letters 10 (1): 29–33.
doi:10.1109/LGRS.2012.2191761.
Shi, M., and G. Healey. 2003. “Hyperspectral Texture Recognition Using a Multiscale Opponent
Representation.” IEEE Transactions on Geoscience and Remote Sensing 41 (5): 1090–1095.
doi:10.1109/TGRS.2003.811076.
Song, W., S. Li, L. Fang, and T. Lu. 2018. “Hyperspectral Image Classification With Deep Feature
Fusion Network.” IEEE Transactions on Geoscience and Remote Sensing 56 (6): 3173–3184.
doi:10.1109/TGRS.2018.2794326.
Sugiyama, M. 2007. “Dimensionality Reduction of Multimodal Labeled Data by Local Fisher
Discriminant Analysis.” Journal of Machine Learning and Research 8: 1027–1061.
Sun, W., and Q. Du. 2019. “Hyperspectral Band Selection: A Review.” IEEE Geoscience and Remote
Sensing Magazine 118–139. doi:10.1109/MGRS.2019.2911100.
Sun, X., F. Zhou, J. Dong, F. Gao, Q. Mu, and X. Wang. 2017. “Encoding Spectral and Spatial Context
Information for Hyperspectral Image Classification.” IEEE Geoscience and Remote Sensing Letters 14
(12): 2250–2254. doi:10.1109/LGRS.2017.2759168.
Tan, K., E. Li, Q. Du, and P. Du. 2014. “Hyperspectral Image Classification Using Band Selection and
Morphological Profiles.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing 7 (1): 40–48. doi:10.1109/JSTARS.2013.2265697.
Tao, C., H. Pan, Y. Li, and Z. Zou. 2015. “Unsupervised Spectral-Spatial Feature Learning with Stacked
Sparse Autoencoder for Hyperspectral Imagery Classification.” IEEE Geoscience and Remote
Sensing Letters 12 (12): 2438–2442. doi:10.1109/LGRS.2015.2482520.
Tsai, F., and J.-S. Lai. 2013. “Feature Extraction of Hyperspectral Image Cubes Using
Three-Dimensional Gray-Level Cooccurrence.” IEEE Transactions on Geoscience and Remote
Sensing 51 (6): 3504–3513. doi:10.1109/TGRS.2012.2223704.
Velasco-Forero, S., and J. Angulo. 2013. “Classification of Hyperspectral Images by Tensor Modeling
and Additive Morphological Decomposition.” Pattern Recognition 46 (1): 566–577. doi:10.1016/j.
patcog.2012.08.011.
Wang, J., and C.-I. Chang. 2016. “Independent Component Analysis-based Dimensionality Reduction
with Applications in Hyperspectral Image Analysis.” IEEE Transactions on Geoscience and Remote
Sensing 44 (6): 1586–1600. doi:10.1109/TGRS.2005.863297.
Xia, J., L. Bombrun, T. Adali, Y. Berthoumieu, and C. Germain. 2016. “Spectral-Spatial Classification of
Hyperspectral Images Using ICA and Edge-Preserving Filter via an Ensemble Strategy.” IEEE
Transactions on Geoscience and Remote Sensing 54 (8): 4971–4982. doi:10.1109/
TGRS.2016.2553842.
Xia, J., J. Chanussot, P. Du, and X. He. 2015. “Spectral-Spatial Classification for Hyperspectral Data
Using Rotation Forests with Local Feature Extraction and Markov Random Fields.” IEEE
Transactions on Geoscience and Remote Sensing 53 (5): 2532–2546. doi:10.1109/
TGRS.2014.2361618.
Xing, C., L. Ma, and X. Yang. 2016. “Stacked Denoise Autoencoder Based Feature Extraction and
Classification for Hyperspectral Images.” Journal of Sensors 2016: 1–10.
Yang, J., Y.-Q. Zhao, and C.-W. J. Chan. 2017. “Learning and Transferring Deep Joint Spectral-Spatial
Features for Hyperspectral Classification.” IEEE Geoscience and Remote Sensing Letters 55 (8):
4729–4742. doi:10.1109/TGRS.2017.2698503.
Yang, J.-M., P.-T. Yu, and B.-C. Kuo. 2010. “A Nonparametric Feature Extraction and Its Application to
Nearest Neighbor Classification for Hyperspectral Image Data.” IEEE Transactions on Geoscience
and Remote Sensing 48 (3): 1279–1293. doi:10.1109/TGRS.2009.2031812.
Ye, Z., S. Prasad, W. Li, J. E. Fowler, and M. He. 2014. “Classification Based on 3-D DWT and Decision
Fusion for Hyperspectral Image Analysis.” IEEE Geoscience and Remote Sensing Letters 11 (1):
173–177. doi:10.1109/LGRS.2013.2251316.
Yin, J., Y. Wang, and J. Hu. 2012. “A New Dimensionality Reduction Algorithm for Hyperspectral
Image Using Evolutionary Strategy.” IEEE Transactions on Industrial Informatics 8 (4): 935–943.
doi:10.1109/TII.2012.2205397.
Zhang, L., L. Zhang, D. Tao, and X. Huang. 2013. “Tensor Discriminative Locality Alignment for
Hyperspectral Image SpectralSpatial Feature Extraction.” IEEE Transactions on Geoscience and
Remote Sensing 51 (1): 242–255. doi:10.1109/TGRS.2012.2197860.
Zhao, W., and S. Du. 2016. “Spectral-spatial Feature Extraction for Hyperspectral Image Classification:
A Dimension Reduction and Deep Learning Approach.” IEEE Transactions on Geoscience and
Remote Sensing 54 (8): 4544–4554. doi:10.1109/TGRS.2016.2543748.
Zhong, Z., B. Fan, J. Duan, L. Wang, K. Ding, S. Xiang, and C. Pan. 2015. “Discriminant Tensor
Spectral-Spatial Feature Extraction for Hyperspectral Image Classification.” IEEE Geoscience and
Remote Sensing Letters 12 (5): 1028–1032. doi:10.1109/LGRS.2014.2375188.
Zhou, P., J. Han, G. Cheng, and B. Zhang. 2019. “Learning Compact and Discriminative Stacked
Autoencoder for Hyperspectral Image Classification.” IEEE Transactions on Geoscience and Remote
Sensing 57 (7): 4823–4833. doi:10.1109/TGRS.36.