Abstract—Dimensionality reduction is a necessity in most hyperspectral imaging applications. Tradeoffs exist between unsupervised statistical methods, which are typically based on principal components analysis (PCA), and supervised ones, which are often based on Fisher's linear discriminant analysis (LDA), and proponents for each approach exist in the remote sensing community. Recently, a combined approach known as subspace LDA has been proposed, where PCA is employed to recondition ill-posed LDA formulations. The key idea behind this approach is to use a PCA transformation as a preprocessor to discard the null space of rank-deficient scatter matrices, so that LDA can be applied on this reconditioned space. Thus, in theory, the subspace LDA technique benefits from the advantages of both methods. In this letter, we present a theoretical analysis of the effects (often ill effects) of PCA on the discrimination power of the projected subspace. The theoretical analysis is presented from a general pattern classification perspective for two possible scenarios: 1) when PCA is used as a simple dimensionality reduction tool and 2) when it is used to recondition an ill-posed LDA formulation. We also provide experimental evidence of the ineffectiveness of both scenarios for hyperspectral target recognition applications.

Index Terms—Dimensionality reduction, feature extraction, hyperspectral, image classification, pattern classification.

I. INTRODUCTION

have been proposed. One example is the well-known discriminant analysis feature extraction, but these approaches tend to be either computationally expensive and/or suboptimal. Recently, an approach known as subspace LDA has been proposed, where a PCA projection discards the null space of the global covariance matrix to resolve an ill-conditioned LDA problem. Thus, in theory, the subspace LDA technique benefits from the advantages of both methods.

Although some authors have previously reported experimental observations on the detrimental effects of PCA [4], [5], a theoretical analysis of the discrimination potential of PCA-projected hyperspectral features has not been studied in detail. In this letter, we present a theoretical analysis of the discrimination power in various linearly projected spaces as a means to demonstrate the weaknesses of PCA in many applications. We also present a theoretical analysis of class discrimination in a subspace LDA-projected space. With this analysis, we intend to cover two important scenarios typically encountered in a pattern classification problem: 1) when the size of the training data is sufficiently large relative to the dimensionality of the feature space and the features do not possess high redundancy and 2) when the training data size is insufficient to model the patterns using second-order statistics and/or a subset of features
Authorized licensed use limited to: Univ of Puerto Rico Mayaguez - Library. Downloaded on February 10, 2009 at 16:09 from IEEE Xplore. Restrictions apply.
626 IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 5, NO. 4, OCTOBER 2008
PRASAD AND BRUCE: LIMITATIONS OF PCA FOR HYPERSPECTRAL TARGET RECOGNITION 627
which is known to maximize class separation. It follows that PCA will maximize the optimality criterion only when the solution of (4) is the same as the solution of (5). It is obvious that a common solution will not exist for any arbitrary Sb, Sw, and ST.

An intuitive way to picture this is the following. Let the spectral decomposition of ST be

ST = U Λ U^T.    (6)

In a PCA projection, we choose eigenvectors corresponding to large eigenvalues of ST for projection (let us say the first n are retained), and the projection matrix is given by Ũ^T, which denotes a matrix containing the principal directions for projection. Let the corresponding diagonal matrix of eigenvalues be Λ̃. When Ũ^T is used as the projection matrix, in the projected space, the modified Fisher's ratio J2 becomes

J2(Ũ) = |Ũ^T Sb Ũ| / |Ũ^T (Sw + Sb) Ũ| = |Ũ^T Sb Ũ| / |Ũ^T ST Ũ| = |Ũ^T Sb Ũ| / |Λ̃| = |Ũ^T Sb Ũ| / ∏_{i=1}^{n} λi.    (7)

Here, n is the number of principal components retained in the PCA projection. Clearly, the modified Fisher's ratio is not guaranteed to increase relative to the original space by this projection for two reasons.

1) The numerator |Ũ^T Sb Ũ| is not guaranteed to increase since Ũ represents the principal directions of ST and not Sb.
2) The value of the denominator |Λ̃| is actually greater than the value of |Λ| in the original space, because small eigenvalues in Λ (less than 1 and numerically close to 0) were discarded to create Λ̃.

From these arguments, it is clear that PCA is not an optimal transformation for feature extraction stages of pattern recognition systems.

B. Class Separation in a PCA-Projected Space, Case II: Sw and ST Are Rank Deficient

This case deals with scenarios where PCA is applied as a tool to discard the null space of ST, as with subspace LDA. Note that, by definition, Sb has a rank of, at most, c − 1, where c is the number of classes. On the other hand, Sw (and, hence, ST) may be either full ranked or rank deficient, depending on the amount of training data and the redundancy of features. Zheng et al. pointed out in [12] that, when Sw is rank deficient, the transformation that maximizes the optimality criterion, in an ideal sense, would project the data onto a subspace N(ST)⊥ ∩ N(Sw), where N(ST)⊥ is the orthogonal complement of the null space of ST and N(Sw) is the null space of Sw. One way to visualize this is to realize that an ideal transformation will shrink the within-class scatter (by projecting to the null space of Sw) in the nonnull space of ST.

In the following discussion, scatter matrices in the transformed space are denoted with a tilde, i.e., S̃. It is common practice in PCA transformations to project the data in directions such that the significant eigenvalues of the overall covariance matrix (or total scatter matrix) are retained. In situations where ST is rank deficient, consider a simple PCA projection that discards the null space of ST. Techniques such as subspace LDA employ PCA with this goal. Such a transformation ensures that, after projection, N(S̃T) = {Φ}, i.e., the null set. If we restrict S̃b and S̃w to be positive semidefinite

N(S̃T) = N(S̃b) ∩ N(S̃w).    (8)

Hence, after the PCA projection

N(S̃b) ∩ N(S̃w) = {Φ}.    (9)

Recall that the desired projection space is N(S̃T)⊥ ∩ N(S̃w). However, if N(S̃T) = {Φ} (after a PCA projection), it only implies that the intersection of the null spaces of S̃b and S̃w is a null set. This does not guarantee N(S̃w) ≠ {Φ}, i.e., it does not guarantee retention of the null space of Sw. Hence, even in a situation where PCA is used as a preprocessing step (to resolve singularity issues in ST) before another feature reduction step (e.g., as in subspace LDA), discarding the null space of ST is not necessarily the optimal strategy. At this point, we would also like to point out that other techniques that discard the null space of Sw to resolve singularity issues, such as pseudoinverse LDA [9], may have a similar detrimental effect on class separation in the projected space.

IV. EXPERIMENTAL HYPERSPECTRAL DATA

Hyperspectral data were collected using an Analytical Spectral Devices (ASD) Fieldspec Pro FR handheld spectroradiometer [13] for four classes of vegetation. Signatures collected from this device have 2151 spectral bands sampled at 1 nm over the range of 350–2500 nm, with a spectral resolution ranging from 3 to 10 nm. A 25° instantaneous-field-of-view foreoptic was used; the instrument was set to average ten signatures to produce each sample signature; and the sensor was held nadir at approximately 4 ft above the vegetation canopy. The signatures were truncated above band 1000 (1350 nm) to remove atmospheric water absorption band effects and the noisier upper bands, resulting in hyperspectral signatures with a dimensionality of 1000. Signatures used in this letter form four classes relevant to precision agriculture applications: Cotton variety ST-4961, Johnsongrass (Sorghum halepense), Cogongrass (Imperata cylindrica), and Sicklepod (Cassia obtusifolia), which are listed in Table I. These signatures were measured in good weather conditions in Mississippi during 2000–2004.

V. EXPERIMENTAL RESULTS

To corroborate the claims made in this letter with experimental evidence, we present two types of experimental analyses using the data set described in Section IV. In the first set of experiments, a sliding window of size 25 bands was moved across the wavelength spectrum, and class separation was measured in each window after three linear transformations: 1) LDA
Fig. 2. Bhattacharyya distance measure for a two-class problem, with a sliding analysis window across the range of wavelengths of the hyperspectral signatures, for Cotton versus Johnsongrass with PCA dimensions of (a) 25 and (b) 15.
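The separability measure plotted in Fig. 2 is the Bhattacharyya distance between Gaussian class models. As a hedged illustration of the failure mode this letter analyzes (a toy NumPy construction of our own, not the authors' code or data), the following sketch computes the Gaussian Bhattacharyya distance in the full space and after projecting onto the first principal component, for a case where the discriminative information lies in a low-variance direction:

```python
# Toy sketch: PCA can nearly eliminate class separation when the
# discriminative direction has small variance. Not the authors' code.
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussian class models."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu2 - mu1
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

rng = np.random.default_rng(0)
# Large shared variance along axis 0 (non-discriminative); small variance
# along axis 1, which is where the class means actually differ.
cov = np.diag([100.0, 0.1])
x1 = rng.multivariate_normal([0.0, 0.0], cov, size=500)
x2 = rng.multivariate_normal([0.0, 1.0], cov, size=500)

# Separation in the full two-dimensional space.
d_full = bhattacharyya(x1.mean(0), np.cov(x1.T), x2.mean(0), np.cov(x2.T))

# PCA keeps the highest-variance direction (axis 0), which carries no
# class information here; measure separation after projecting onto it.
x = np.vstack([x1, x2])
eigvals, eigvecs = np.linalg.eigh(np.cov(x.T))
pc1 = eigvecs[:, np.argmax(eigvals)]          # first principal direction
y1, y2 = x1 @ pc1, x2 @ pc1
d_pca = bhattacharyya(np.atleast_1d(y1.mean()),
                      np.atleast_2d(np.var(y1, ddof=1)),
                      np.atleast_1d(y2.mean()),
                      np.atleast_2d(np.var(y2, ddof=1)))

print(f"full-space D_B = {d_full:.2f}, after 1-D PCA D_B = {d_pca:.4f}")
```

In this construction the full-space distance is on the order of 1, while the distance after retaining one principal component is close to zero, mirroring the drop in separability the sliding-window experiments report for PCA-projected features.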
TABLE II
OVERALL RECOGNITION ACCURACIES FOR THREE DIFFERENT DATA SETS AT TWO DIFFERENT MIXING RATIOS (AND THE 95% CONFIDENCE INTERVAL IN PARENTHESES). ALL ARE REPORTED IN PERCENTAGE (DF: MULTICLASSIFIER DECISION FUSION SYSTEM, SCL: SINGLE-CLASSIFIER SYSTEM, SLDA: SUBSPACE LDA, BA: BAND AVERAGING)
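The SLDA entries in Table II follow the subspace LDA recipe discussed earlier: PCA first discards the null space of the rank-deficient total scatter matrix ST, then LDA is applied in the reconditioned space. The sketch below (synthetic data and an illustrative retained dimension m of our choosing, not the authors' implementation) shows the two stages on a small-sample-size problem with more bands than training samples:

```python
# Hedged sketch of the subspace LDA pipeline: PCA to recondition a
# rank-deficient total scatter matrix, then LDA. Toy data only.
import numpy as np

rng = np.random.default_rng(1)
n_per_class, dim, n_classes = 20, 100, 4     # fewer samples than dimensions
X = [rng.normal(loc=c, scale=1.0, size=(n_per_class, dim))
     for c in range(n_classes)]
data = np.vstack(X)
mean = data.mean(0)

# Total scatter matrix; its rank is at most n_samples - 1 < dim.
ST = (data - mean).T @ (data - mean)
eigvals, eigvecs = np.linalg.eigh(ST)

# Stage 1 (PCA): keep the m leading principal directions of ST. Note the
# letter's caveat: this step may also discard part of the null space of Sw.
m = 40                                       # illustrative choice
order = np.argsort(-eigvals)
U = eigvecs[:, order[:m]]
proj = [x @ U for x in X]                    # reconditioned space for LDA

# Stage 2 (LDA): scatter matrices in the PCA subspace.
Sw = sum((p - p.mean(0)).T @ (p - p.mean(0)) for p in proj)
mu = mean @ U
Sb = sum(len(p) * np.outer(p.mean(0) - mu, p.mean(0) - mu) for p in proj)

# LDA directions: leading eigenvectors of inv(Sw) @ Sb; since Sb has rank
# at most c - 1, only c - 1 directions are useful.
w_vals, w_vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
top = np.argsort(-w_vals.real)[:n_classes - 1]
W = w_vecs[:, top].real                      # final subspace-LDA projection

print(U.shape, W.shape)
```

The value of m matters: it must stay below the rank of Sw in the retained subspace for the solve step to be well posed, which is exactly the kind of reconditioning tradeoff the theoretical analysis above calls into question.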
important scenarios commonly encountered in the analysis. We also presented experimental evidence corroborating our claims with various data sets. We showed that class separation may deteriorate after a PCA transformation. We also provided target recognition accuracies for various subpixel target recognition tasks after projecting the feature space using these transformation techniques separately.

From these theoretical arguments and experimental evidence, it follows that PCA should not be employed (by itself or as a preprocessing step to solve small-sample-size problems) by researchers for ATR applications. Instead, alternative methods should be explored to solve the small-sample-size problem encountered with LDA-based transformations. A few techniques that resolve small-sample-size issues in LDA transformations include the regularized LDA method [11] and a recently proposed multiclassifier DF framework [6].

REFERENCES

[1] Z. Sun, D. Huang, Y. Cheung, J. Liu, and G. Huang, "Using FCMC, FVS, and PCA techniques for feature extraction of multispectral images," IEEE Geosci. Remote Sens. Lett., vol. 2, no. 2, pp. 108–112, Apr. 2005.
[2] M. D. Farrell and R. M. Mersereau, "On the impact of PCA dimension reduction for hyperspectral detection of difficult targets," IEEE Geosci. Remote Sens. Lett., vol. 2, no. 2, pp. 192–195, Apr. 2005.
[3] A. M. Martinez and A. C. Kak, "PCA versus LDA," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 228–233, Feb. 2001.
[4] A. Cheriyadat and L. M. Bruce, "Why principal component analysis is not an appropriate feature extraction method for hyperspectral data," in Proc. IEEE Int. Geosci. Remote Sens. Symp., Jul. 2003, vol. 6, pp. 3420–3422.
[5] J. Li, L. M. Bruce, J. Byrd, and J. Barnett, "Automated detection of Pueraria montana (kudzu) through Haar analysis of hyperspectral reflectance data," in Proc. IEEE Int. Geosci. Remote Sens. Symp., Jul. 2001, vol. 5, pp. 2247–2249.
[6] S. Prasad and L. M. Bruce, "Decision fusion with confidence based weight assignment for hyperspectral target recognition," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 5, pp. 1448–1456, May 2008.
[7] S. Kumar, J. Ghosh, and M. M. Crawford, "Best-bases feature extraction algorithms for classification of hyperspectral data," IEEE Trans. Geosci. Remote Sens., vol. 39, no. 7, pp. 1368–1379, Jul. 2001.
[8] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2000.
[9] J. Ye and Q. Li, "A two-stage linear discriminant analysis via QR-decomposition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 929–941, Jun. 2005.
[10] D. L. Swets and J. Weng, "Using discriminating eigenfeatures for image retrieval," IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 8, pp. 831–836, Aug. 1996.
[11] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "Regularization studies on LDA for face recognition," in Proc. Int. Conf. Image Process., Oct. 2004, vol. 4, pp. 63–66.
[12] W. Zheng, L. Zhao, and C. Zou, "An efficient algorithm to solve the small sample size problem for LDA," Pattern Recognit., vol. 37, no. 5, pp. 1077–1079, May 2004.
[13] Analytical Spectral Devices Fieldspec Pro FR specifications. [Online]. Available: http://www.asdi.com/products_specifications-FS3.asp
[14] J. A. Benediktsson and J. R. Sveinsson, "Multisource remote sensing data classification based on consensus and pruning," IEEE Trans. Geosci. Remote Sens., vol. 41, no. 4, pp. 932–936, Apr. 2003.