
Remote Sensing Letters

ISSN: 2150-704X (Print) 2150-7058 (Online)

Rational function approximation for feature

reduction in hyperspectral data

S. Abolfazl Hosseini & Hassan Ghassemian

To cite this article: S. Abolfazl Hosseini & Hassan Ghassemian (2016) Rational function
approximation for feature reduction in hyperspectral data, Remote Sensing Letters, 7:2,
101-110, DOI: 10.1080/2150704X.2015.1101180

Published online: 11 Nov 2015.

Download by: [Wuhan University] Date: 12 November 2015, At: 05:04

VOL. 7, NO. 2, 101–110

Rational function approximation for feature reduction

in hyperspectral data
S. Abolfazl Hosseini and Hassan Ghassemian
Faculty of Computer and Electrical Engineering, Tarbiat Modares University, Tehran, Iran


Received 23 May 2015; Accepted 19 September 2015

In this letter, we propose a feature extraction technique based on rational function curve
fitting. A unique rational function curve is developed to fit the spectral signature of each
pixel in a hyperspectral image. The polynomial coefficients in the numerator and denominator
of the fitted curve are taken as the new extracted features. The main contribution of this
letter is the use of curve fitting to classify and compress hyperspectral data. In other
words, naturally different curves can be discriminated when they are approximated by rational
functions of the same form but with different coefficient values. This rational function
curve fitting feature extraction method provides better classification results than some
common feature extraction algorithms when a maximum likelihood classifier is used. The method
can also serve for lossy data compression, since the original data can be reconstructed from
the fitted curves. In addition, the proposed algorithm can be applied to all pixels of the
image individually, independently and simultaneously, unlike methods such as principal
component analysis, which need all data points to compute the transformation matrix before
transforming the data points to the new feature space.

1. Introduction
Hyperspectral (HS) data, sometimes called imaging spectroscopy data (Ben-Dor et al. 2009),
contain hundreds of spatially co-registered images in the form of image cubes; each image
corresponds to a specific narrow spectral band, usually in the wavelength range of
400–2500 nm. For an N-band HS data set, the intensities measured for each pixel by the
HS sensor can be considered as the elements of an N-dimensional vector. The plot of the
sequence of intensity values corresponding to the reflectance of the ground surface
in adjacent wavelength intervals (y = [y1, y2, ..., yN]^T, where T denotes vector transpose)
as a function of band number (x = [1, 2, ..., N]^T) is called the spectral signature (SS) of
the pixel. In supervised classification of this type of data, theoretical and practical
problems such as the Hughes phenomenon appear because of the high correlation between bands
and the properties of high-dimensional spaces (Landgrebe 2002). There are four strategies to
reduce the effect of this phenomenon in HS image classification: semi-supervised
classification (Marconcini, Camps-Valls, and Bruzzone 2009; Shahshahani and Landgrebe 1994),
combination of spatial and spectral information (Camps-Valls, Shervashidze, and Borgwardt
2010; Mirzapour and Ghassemian 2015), using classifiers such as the support vector machine,
which are less sensitive to the training sample size (Melgani and Bruzzone 2004), and feature
reduction (extraction/selection) (Landgrebe 2002).

CONTACT Hassan Ghassemian
© 2015 Taylor & Francis

The main goal of feature reduction is to find a transformation that maps the data to a
lower dimensional space while preserving essential discriminative information. Principal
component analysis (PCA) and linear discriminant analysis (LDA) (Landgrebe 2002),
nonparametric weighted feature extraction (NWFE) (Kuo and Landgrebe 2004), wavelet
transform (Pu and Gong 2004) and maximum margin projection (MMP) (Xiaofei, Deng,
and Jiawei 2008) are examples of feature extraction (FE) techniques for data redundancy
reduction in remotely sensed data. Non-linear extensions of PCA (KPCA) have been
proposed using the kernel trick (Yanfeng, Ying, and Ye 2008). Kernel Fisher discriminant
analysis (Mika et al. 1999) and generalized discriminant analysis (Baudat and Anouar
2000) have been developed independently as kernel-based non-linear extensions of
LDA. Optimal selection of spectral bands has also been extensively discussed in the
literature (Landgrebe 2002). The features produced by the feature reduction step are fed to
the classifier. The maximum likelihood classifier (MLC) is a widely used parametric
classifier in HS data classification (Pal and Mather 2003).
Despite their simple structure and relatively good results, many statistical analysis-based
methods such as PCA and LDA have some deficiencies when used in HS image
classification. For example, these methods do not consider the geometric aspects of SSs
or the order of the original features, which is a rich source of information. For each
pixel of an HS image, we have a vector of measured quantities corresponding to
reflectance at consecutive wavelengths. Therefore, the order of the measurements
may carry useful information; in other words, the SS as a curve carries useful
information. Hosseini and Ghassemian (2012, 2013) introduced an FE technique based
on the fractal nature of the SS that inherently depends on the order of the samples. Ruffin
and King (1999) and Tsai and Philpot (1998) proposed a method for finding absorption
points of spectra based on derivative spectra. Since derivative computation is very
sensitive to noise, they used a smoothing preprocessing step. Another disadvantage
of many FE methods is that they cannot be applied pixel by pixel: all data points must be
known first to compute the transformation matrix before transforming the data to the new
space and producing new features. The main contribution of this letter is to introduce an
FE method that takes into account the geometrical nature of the SSs and the order
information in the SS, which improves the classification results. Specifically, we
fit a rational function with polynomial numerator and denominator to the SS of each
pixel. The coefficients of these polynomials are then taken as the new feature vector
and fed to an MLC. Although the concept of curve fitting has already been used in
HS data processing in applications such as spectra smoothing (Ruffin and King 1999; Tsai
and Philpot 1998), the novelty of the proposed framework is the use of the coefficients
of the fitted curves (not the values of the fitted curves themselves) as the reduced
features in the new space, and the use of these reduced features in data classification and
compression. Results are compared to PCA, LDA, NWFE and MMP as four basic and
classic FE methods with relatively good performance for HS data classification. Unlike
these methods, our proposed method is applied pixel by pixel. Therefore, a parallel
implementation of the algorithm is possible. Moreover, since the proposed transform is
invertible, it could also be used as a lossy compression method for HS data.

2. Curve fitting and its discriminating ability

Curve fitting is a traditional approach to finding the mathematical relationship between
observed values and independent variables. It can also be used for noise reduction
and data smoothing (Ruffin and King 1999; Tsai and Philpot 1998), and for data
interpolation/extrapolation (Acevedo et al. 2004). The aim of curve fitting is to find a
function f(λ) in a prespecified class of functions for the data {(λ, I_λ)}, where
λ = 1, 2, ..., N and I_λ is a function of the independent variable λ, which minimizes the
residual (the distance between the data samples and f(λ)) under the weights
w = [w1, ..., wN]^T (Fang and Gossard 1995).
Some of the fitting criteria used for linear or non-linear fitting are least squares (LS),
least absolute residuals and bisquare fitting. In the LS method, f(λ) is found by
minimizing the following weighted mean squared error:

$$
\frac{1}{N}\sum_{\lambda=1}^{N} w_{\lambda}\,\bigl(f(\lambda) - I_{\lambda}\bigr)^{2} \tag{1}
$$
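As a minimal sketch of the weighted mean squared error in Equation (1), assuming NumPy and a hypothetical helper name `weighted_mse` (not from the letter), the criterion can be evaluated for any candidate curve as follows. Uniform weights are used when none are given.

```python
import numpy as np

def weighted_mse(f_vals, I, w=None):
    """Weighted mean squared error between fitted values and data samples.

    f_vals : fitted curve sampled at the N points, f(1), ..., f(N)
    I      : observed samples I_1, ..., I_N
    w      : optional weights w_1, ..., w_N (uniform weights if omitted)
    """
    f_vals = np.asarray(f_vals, dtype=float)
    I = np.asarray(I, dtype=float)
    w = np.ones_like(I) if w is None else np.asarray(w, dtype=float)
    # (1/N) * sum_k w_k * (f_k - I_k)^2
    return float(np.mean(w * (f_vals - I) ** 2))
```

Setting a weight to zero removes the corresponding sample from the criterion, which is one simple way to ignore a known bad band.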

There are different curve-fitting models, such as polynomial, linear and spline models. In
this letter, curve fitting with rational functions is used to extract
new features for classification and compression purposes. Therefore, it is important that
f(λ) has fewer parameters than the number of data samples.
Consider a function f and two integers L ≥ 0 and M ≥ 0. The rational function
approximant of order (L, M) for f is defined as

$$
\hat{f}(\lambda) = \frac{\sum_{j=0}^{L} c_{j+M+1}\,\lambda^{j}}{1 + \sum_{j=1}^{M} c_{j}\,\lambda^{j}} \tag{2}
$$
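To make the coefficient layout concrete, the following is a small sketch, assuming NumPy, that evaluates an order (L, M) rational approximant: a degree-L numerator over one plus a degree-M denominator polynomial with no constant term. The function name `rational_eval` is hypothetical. The vector `c` holds the M denominator coefficients first, followed by the L + 1 numerator coefficients, matching the 1-based indexing c_1, ..., c_{M+L+1} used in the text.

```python
import numpy as np

def rational_eval(c, L, M, x):
    """Evaluate an order-(L, M) rational function at x.

    c[:M] are the denominator coefficients c_1..c_M (powers 1..M);
    c[M:] are the numerator coefficients c_{M+1}..c_{M+L+1} (powers 0..L).
    """
    c = np.asarray(c, dtype=float)
    x = np.asarray(x, dtype=float)
    if c.size != L + M + 1:
        raise ValueError("c must contain exactly L + M + 1 coefficients")
    den_coef, num_coef = c[:M], c[M:]
    num = sum(num_coef[j] * x**j for j in range(L + 1))
    den = 1.0 + sum(den_coef[j - 1] * x**j for j in range(1, M + 1))
    return num / den
```

For example, with L = 1 and M = 1, `c = [2.0, 1.0, 3.0]` represents (1 + 3x) / (1 + 2x). The same evaluation, applied at every sample point, is what reconstructs a curve from its stored coefficients.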

For the same model, dissimilarities between curves yield differences in the coefficients of
the fitted curves. Therefore, these coefficients can plausibly be used as discriminating
features for the curves. For example, in Figure 1(a) two different families of curves are
plotted, and the distributions of the coefficients of their rational function approximants
are shown in Figure 1(b)–(f). In this curve-fitting problem, we fitted a rational
function with L = 0 and M = 4 to both families of curves. Therefore, each curve can be
expressed by its own five coefficients. As can be seen, the histograms of some
coefficients completely separate the two families. This fact motivates
using the coefficients of the rational function curves fitted to the SSs as
discriminating features in HS data classification tasks.

3. The proposed FE method

The SS of any pixel in an HS image is defined as the plot of its measured intensities at
different wavelengths as a function of wavelength or band number. This means that each
SS contains samples of an unknown function f(λ) at N consecutive points. We show that
an approximation of f(λ) in the form of a rational function with polynomial numerator
and denominator can be developed through an LS method with uniform weights in
Equation (1). Then, we use the coefficients of these polynomials as new features for
the MLC. These features can also be used to reproduce the original data.

Figure 1. Two families of curves and the histograms of the coefficients of their corresponding Padé
approximants: (a) the curve families (family 1: solid blue lines; family 2: dotted red lines); (b)
numerator coefficient distribution; (c)–(f) denominator coefficient distributions.

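To provide equal conditions for all HS images, the variable λ is used in the normalized form
λ/N (normalized band number). The rational approximant of f(λ/N) for the pixel located
at (x, y) is given by

$$
\hat{f}_{x,y}\!\left(\frac{\lambda}{N}\right) = \frac{\sum_{j=0}^{L} c_{j+M+1}\left(\frac{\lambda}{N}\right)^{j}}{1 + \sum_{j=1}^{M} c_{j}\left(\frac{\lambda}{N}\right)^{j}} \tag{3}
$$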

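We want to determine the coefficient vector c = [c1 c2 ... c_{M+L+1}]^T that minimizes

$$
E = \frac{1}{N}\sum_{\lambda=1}^{N}\left(\hat{f}\!\left(\frac{\lambda}{N}\right) - f\!\left(\frac{\lambda}{N}\right)\right)^{2} \tag{4}
$$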

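A system of non-linear equations is obtained by computing the partial derivatives of E
with respect to the coefficients and setting them to zero. A sufficient but not necessary
condition for solving this system is to find c such that

$$
\sum_{j=0}^{L} c_{j+M+1}\left(\frac{\lambda}{N}\right)^{j} - f\!\left(\frac{\lambda}{N}\right)\sum_{j=1}^{M} c_{j}\left(\frac{\lambda}{N}\right)^{j} = f\!\left(\frac{\lambda}{N}\right), \qquad \lambda = 1, \ldots, N \tag{5}
$$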
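The step leading to this linear condition can be written out explicitly. Requiring the rational approximant to match the data at every band, and multiplying through by the denominator (assumed non-zero at the sample points), gives

$$
\hat{f}\!\left(\tfrac{\lambda}{N}\right) = f\!\left(\tfrac{\lambda}{N}\right)
\;\Longrightarrow\;
\sum_{j=0}^{L} c_{j+M+1}\left(\tfrac{\lambda}{N}\right)^{j}
= f\!\left(\tfrac{\lambda}{N}\right)\left[\,1 + \sum_{j=1}^{M} c_{j}\left(\tfrac{\lambda}{N}\right)^{j}\right],
\qquad \lambda = 1, \ldots, N .
$$

Moving the term containing the denominator coefficients to the left-hand side yields one equation per band that is linear in c. If all N equations held exactly, E would be zero. When they are only satisfied in the least-squares sense, the quantity being minimized is the residual of the linearized equations, which equals the original error multiplied by the denominator at each band, so the result may differ somewhat from the exact minimizer of E.
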

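Now we have a system of linear equations with M + L + 1 unknowns and N equations,
which can be rewritten in matrix form as

$$
A_{N\times(M+L+1)}\; c_{(M+L+1)\times 1} = b_{N\times 1} \tag{6}
$$

where A = [a_{λj}] with

$$
a_{\lambda j} =
\begin{cases}
-f\!\left(\frac{\lambda}{N}\right)\left(\frac{\lambda}{N}\right)^{j}, & j = 1, \ldots, M \\[4pt]
\left(\frac{\lambda}{N}\right)^{j-M-1}, & j = M+1, \ldots, M+L+1
\end{cases}
$$

and b_{N×1} = [f(1/N), f(2/N), ..., f(N/N)]^T.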

For feature reduction purposes, we want M + L + 1 ≪ N; therefore the matrix A is
not square, so we use the Moore–Penrose pseudoinverse of A to find the c that minimizes
the norm of Ac − b. The above procedure must be performed for all pixels
of the HS data set; the vectors c then replace the original data and a new image cube
is formed. Therefore, the third dimension of the data becomes M + L + 1,
which reduces the data size by a factor of N/(M + L + 1). The procedure can be
performed for all pixels simultaneously.
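The procedure above can be sketched in a few lines, assuming NumPy. This is an illustrative reading of the linear system and not the authors' implementation; the function names `fit_rational` and `extract_features` are hypothetical. Each spectrum is treated as samples f(λ/N), λ = 1, ..., N; the first M columns of A carry the denominator terms −f·(λ/N)^j, the remaining L + 1 columns carry the numerator powers, and `np.linalg.pinv` supplies the Moore–Penrose pseudoinverse.

```python
import numpy as np

def fit_rational(y, L, M):
    """Return the M + L + 1 coefficients fitted to one spectrum y (length N)."""
    y = np.asarray(y, dtype=float)
    N = y.size
    x = np.arange(1, N + 1) / N                      # normalized band number
    # Denominator columns: -f(x) * x^j for j = 1..M
    den_cols = [-(y * x**j) for j in range(1, M + 1)]
    # Numerator columns: x^j for j = 0..L
    num_cols = [x**j for j in range(L + 1)]
    A = np.column_stack(den_cols + num_cols)         # shape (N, M + L + 1)
    # Least-squares solution of A c = b with b = y
    return np.linalg.pinv(A) @ y

def extract_features(cube, L, M):
    """Apply the fit independently to every pixel of a (rows, cols, N) cube."""
    rows, cols, N = cube.shape
    spectra = cube.reshape(-1, N)
    feats = np.stack([fit_rational(s, L, M) for s in spectra])
    return feats.reshape(rows, cols, M + L + 1)
```

Because each pixel is processed on its own, the loop in `extract_features` could be distributed across workers without any shared state. With N = 200 bands and M + L + 1 = 14 features, for instance, the data shrink by a factor of 200/14, roughly 14.3. High polynomial orders can make A poorly conditioned, so numerical care may be needed in practice.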
This rational function curve-fitting feature extraction (RFCF-FE) method is an
unsupervised FE method. The new features are applied to MLCs, and the results are
compared to PCA as a traditional unsupervised FE method, LDA and NWFE as
two supervised FE methods, and MMP as an unsupervised manifold-based
algorithm. As demonstrated in the following section, the RFCF-FE results are in most
cases more accurate than those of the competing methods for both urban and agricultural
data sets. The method can also be used as a coding algorithm for HS data compression.

4. Experimental results
4.1. HS data sets
The first data set used in our experiments is a mixed forest/agricultural 145 × 145
pixel image from the Indian Pine Site (IPS) in Indiana, captured by the AVIRIS sensor.
The spatial resolution of this data set is 20 m. The image contains 220 reflectance
bands at wavelengths from 400 to 2500 nm with 10 nm resolution. After
removing water absorption bands, N = 200 bands were left. The scene contains
16 different land covers; detailed information can be found in
Universidad-del-Pais-Vasco (2015).
The other data set was gathered over the urban area of the University of Pavia
(UP) by the ROSIS sensor. It is 610 × 340 pixels, with a spatial resolution of 1.3 m, in
115 reflectance bands in the range of 430–860 nm. Discarding some noisy bands
leaves N = 103 bands. This scene contains nine different land covers
(Universidad-del-Pais-Vasco 2015). Band 18 of these data sets is shown in Figures
2(a) and 3(a), respectively.

Figure 2. Comparing the original and the reconstructed image of IPS (band 18): (a) original data; (b)
obtained using the RFCF-FE method; (c) obtained using inverse PCA.

Figure 3. Comparing the original and the reconstructed image of UP (band 18): (a) Original data; (b)
obtained using the RFCF-FE method; (c) obtained using inverse PCA.

4.2. Results and discussion

The proposed FE method has been applied to both HS data sets, IPS and UP, and
the extracted features have been fed into an MLC. The classification results have
been compared to those of PCA, NWFE, MMP and LDA features. For a given number
of features, D, the parameter L varies in the range 0 to D − 1, and M is then
set by the constraint M + L + 1 = D. For each value of D, the values of
L and M producing the best results have been selected for comparison to the competing
methods with the same dimensions. Since 10% of the whole data volume, with
a minimum of 15 and a maximum of 50 samples per class, is used for training the
classifier, D varies in the range 2 ≤ D ≤ 14. Figures 2(b) and 3(b) show the images
of the 18th band of the input data sets, reconstructed from the features extracted by the
RFCF-FE method (L = 0, M = 13 for IPS and L = 12, M = 1 for UP, as two typical values
of M and L). Reconstruction has been performed using Equation (3) for
λ = 1, 2, ..., N (note that λ is the index of the wavelength, not the wavelength
value). Figures 2(c) and 3(c) show the reconstruction of the same band
using inverse PCA. Examples of real SSs and their fitted rational curves for two
sample pixels of these two data sets are plotted in Figure 4, where (L, M) equals
(0, 13) and (12, 1) for IPS and UP, respectively.
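The search over model orders described above can be sketched as follows. For a feature count D, every pair (L, M) with M + L + 1 = D is a candidate, and the pair giving the best result is retained. The names `lm_pairs` and `best_order`, and the user-supplied `score` callable (for example, a classification accuracy obtained on held-out data), are hypothetical and only illustrate the selection step.

```python
def lm_pairs(D):
    """All (L, M) with L = 0..D-1 and M + L + 1 == D."""
    return [(L, D - 1 - L) for L in range(D)]

def best_order(D, score):
    """Return the (L, M) pair that maximizes score(L, M) for a given D."""
    return max(lm_pairs(D), key=lambda lm: score(*lm))
```

For D = 3 the candidates are (0, 2), (1, 1) and (2, 0), so the number of fits to evaluate grows only linearly with D.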
The above discussion of Figures 2–4 shows that the RFCF-FE method has a high
ability to preserve the spatial and spectral characteristics of the data. It can also be
seen that applying this method to the IPS data produces some spatial-domain smoothing
without destroying the edges of the regions. However, at a few points, as illustrated in
Figure 4, there may be a large difference between the original and the fitted curve; our
results show that these exceptional points do not have a severe impact on the
classification performance. This is because (1) these exceptional points in each SS are
rare, if any; (2) the locations of these points are not the same for different SSs, i.e.
this phenomenon does not destroy any specific band completely; and (3) for each SS, the
coefficients of the corresponding fitted curve are used as the features, not the curve itself.

Figure 4. Spectral signature of a typical pixel and its approximation: (a) IPS data, image
coordinate = (50, 20), land cover = corn; (b) UP data, image coordinate = (70, 70), land cover = self-blocking

Figure 5 shows the accuracy assessment measures for the IPS and UP data sets.
The optimum values of the L (and hence M) parameters of the RFCF-FE method differ for
different values of D and different iterations of the algorithm, but in most cases the
best classification results occur when L is 0, 1, D − 2 or D − 1. Note
that, unlike the other methods, the maximum number of extracted features in the LDA
method is Nc − 1, where Nc is the number of classes.
The superiority of the RFCF-FE method over the other methods is apparent
from Figure 5. As can be seen, all measures (average accuracy, average validity,
overall accuracy and kappa statistic) have been dramatically improved by the RFCF-FE
algorithm in comparison to the other methods. The only exception is the MMP
method in the UP case at D = 12, 13 and 14. The improvement for IPS is larger than for
UP because the IPS data contain an agricultural scene with less detail and a larger ground
instantaneous field of view, in contrast to the urban scene of UP. Therefore, our
method outperforms PCA, NWFE and LDA for agricultural scenes as well as urban
ones, and loses out to MMP by a few points for the urban data set.
Table 1 contains the peak signal-to-noise ratio (PSNR) values for images reconstructed
by PCA and by the proposed method for IPS and UP, when D varies from 3 to 15.
The PSNR for the proposed method corresponds to the values of L and M that yield the
best result; these optimum values of L and M are also listed in the table. The
superiority of the proposed method over the PCA-based compression method is
apparent from the table. As shown in Table 1, in most cases for the UP data set, the
best result is obtained when M = 0. This implies that a Maclaurin series performs better
in these cases. PSNR is calculated as 10 log10(S/N), where S is the energy of the original
signal and N is the energy of the difference between the original and decompressed signals.
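A short sketch of this quality measure, assuming NumPy and taking "energy" to mean the sum of squared sample values (an assumption, since the text does not spell it out), is given below. The function name `psnr_db` is hypothetical. Note that this follows the energy-ratio definition stated in the text rather than a peak-value definition.

```python
import numpy as np

def psnr_db(original, reconstructed):
    """10 * log10(S / N) with S the energy of the original signal and
    N the energy of the reconstruction error."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    signal_energy = np.sum(original**2)
    noise_energy = np.sum((original - reconstructed) ** 2)
    return 10.0 * np.log10(signal_energy / noise_energy)
```

The inputs can be single spectra, single bands or whole image cubes, as long as both arrays have the same shape. A perfect reconstruction gives zero error energy, so the ratio is undefined in that case and would need separate handling.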

Figure 5. Comparison of the RFCF-FE method with PCA, NWFE, MMP and LDA in terms of
common accuracy measures: (a)–(d) for IPS data, and (e)–(h) for UP data. Reference ground truth
maps are available in Universidad-del-Pais-Vasco (2015).

Table 1. Comparing PSNR (dB) of the proposed method (RFCF-FE) and inverse PCA for IPS and UP data sets.

IPS                                              UP
Compression rate  (L, M)    RFCF-FE  PCA         Compression rate  (L, M)    RFCF-FE  PCA
200/3             (0, 2)    34.95    24.39       103/3             (2, 0)    31.85    29.76
200/4             (0, 3)    27.42    25.74       103/4             (3, 0)    20.25    16.44
200/5             (0, 4)    58.70    24.20       103/5             (2, 2)    32.92    23.54
200/6             (0, 5)    29.63    23.28       103/6             (5, 0)    21.35    16.17
200/7             (4, 2)    24.17    24.46       103/7             (5, 1)    25.29    19.27
200/8             (0, 7)    22.78    25.13       103/8             (2, 5)    24.06    19.52
200/9             (6, 2)    55.11    25.21       103/9             (1, 7)    43.77    22.52
200/10            (2, 7)    34.38    25.90       103/10            (2, 7)    38.98    21.77
200/11            (6, 4)    28.46    27.03       103/11            (6, 4)    68.19    22.74
200/12            (8, 3)    28.17    28.99       103/12            (11, 0)   34.11    23.65
200/13            (5, 7)    31.47    29.43       103/13            (12, 0)   27.86    22.91
200/14            (2, 11)   41.01    31.47       103/14            (13, 0)   26.05    24.08
200/15            (10, 4)   42.38    30.14       103/15            (14, 0)   35.47    23.98

5. Conclusion
A new FE method for HS data is proposed using rational function curve fitting. The main
motivation for using the curve-fitting approach for HS data FE is to exploit the
information contained in the sequence of the original features (the order of the reflectance
coefficients in the SS), which is neglected by competing methods. The coefficients of the SS
approximants are calculated through analytical operations. These extracted features are
then fed into an ML classifier. The training set is selected as 10% of the total
data volume, with a minimum of 15 and a maximum of 50 samples for each class. The
classification performance is compared to PCA as a traditional unsupervised FE method,
LDA and NWFE as two supervised FE methods, and MMP as an unsupervised manifold-based
algorithm. The results show the superiority of the proposed method. It has also
been shown that this technique gives satisfactory results for signal visualization and signal
representation, and can be considered a good coding algorithm for lossy compression
of HS data. The proposed method can be applied pixel by pixel and does not need to
transform the whole data set to a new space simultaneously. In addition, this method is a
novel approach that can be used as a base for developing more efficient FE methods.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by Iran communication research center [grant number 18133/500 T by
Identification code: 90-01-03].

References
Acevedo, J. C., H. Haneishi, M. Yamaguchi, N. Ohyamaa, and J. Baez. 2004. CIE-XYZ Fitting by
Multispectral Images and Mean Square Error Minimization with a Linear Interpolation Function.
Revista Mexicana De Física 50 (6): 601–607.
Baudat, G., and F. Anouar. 2000. Generalized Discriminant Analysis Using a Kernel Approach.
Neural Computation 12 (10): 2385–2404. doi:10.1162/089976600300014980.
Ben-Dor, E., S. Chabrillat, J. Demattê, G. Taylor, J. Hill, M. Whiting, and S. Sommer. 2009. Using
Imaging Spectroscopy to Study Soil Properties. Remote Sensing of Environment 113: S38–S55.
Camps-Valls, G., N. Shervashidze, and K. M. Borgwardt. 2010. Spatio-Spectral Remote Sensing
Image Classification With Graph Kernels. IEEE Geoscience and Remote Sensing Letters 7 (4):
741–745. doi:10.1109/LGRS.2010.2046618.
Fang, L., and D. C. Gossard. 1995. Multidimensional Curve Fitting to Unorganized Data Points by
Nonlinear Minimization. Computer-Aided Design 27 (1): 48–58. doi:10.1016/0010-4485(95)
Hosseini, A., and H. Ghassemian. 2012. Classification of Hyperspectral and Multispectral Images by
Using Fractal Dimension of Spectral Response Curve. In 2012 20th Iranian Conference on
Electrical Engineering (ICEE), 1452–1457. Tehran: IEEE.
Hosseini, A., and H. Ghassemian. 2013. A New Hyperspectral Image Classification Approach Using
Fractal Dimension of Spectral Response Curve. In 2013 21st Iranian Conference on Electrical
Engineering (ICEE), 1–6. Mashad: IEEE.
Kuo, B.-C., and D. A. Landgrebe. 2004. Nonparametric Weighted Feature Extraction for
Classification. IEEE Transactions on Geoscience and Remote Sensing 42 (5): 1096–1105.
Landgrebe, D. 2002. Hyperspectral Image Data Analysis as a High Dimensional Signal Processing
Problem. IEEE Signal Processing Magazine 19 (1): 17–28. doi:10.1109/79.974718.
Marconcini, M., G. Camps-Valls, and L. Bruzzone. 2009. A Composite Semisupervised SVM for
Classification of Hyperspectral Images. IEEE Geoscience and Remote Sensing Letters 6 (2):
234–238. doi:10.1109/LGRS.2008.2009324.
Melgani, F., and L. Bruzzone. 2004. Classification of Hyperspectral Remote Sensing Images with
Support Vector Machines. IEEE Transactions on Geoscience and Remote Sensing 42 (8):
1778–1790. doi:10.1109/TGRS.2004.831865.
Mika, S., G. Ratsch, J. Weston, B. Scholkopf, and K. Muller. 1999. Fisher Discriminant Analysis with
Kernels. In Neural Networks for Signal Processing IX, 1999. Proceedings of the 1999 IEEE Signal
Processing Society Workshop, 41–48. Madison, WI: IEEE.
Mirzapour, F., and H. Ghassemian. 2015. Improving Hyperspectral Image Classification by
Combining Spectral, Texture, and Shape Features. International Journal of Remote Sensing 36
(4): 1070–1096. doi:10.1080/01431161.2015.1007251.
Pal, M., and P. M. Mather. 2003. An Assessment of the Effectiveness of Decision Tree Methods for
Land Cover Classification. Remote Sensing of Environment 86 (4): 554–565. doi:10.1016/S0034-
Pu, R., and P. Gong. 2004. Wavelet Transform Applied to EO-1 Hyperspectral Data for Forest LAI
and Crown Closure Mapping. Remote Sensing of Environment 91 (2): 212–224. doi:10.1016/j.
Ruffin, C., and R. King. 1999. The Analysis of Hyperspectral Data Using Savitzky-Golay Filtering-
Theoretical Basis (Part 1). In IEEE 1999 International Proceedings on Geoscience and Remote
Sensing Symposium, Hamburg, 756–758. IEEE. doi:10.1109/IGARSS.1999.774430.
Shahshahani, B. M., and D. A. Landgrebe. 1994. The Effect of Unlabeled Samples in Reducing the
Small Sample Size Problem and Mitigating the Hughes Phenomenon. IEEE Transactions on
Geoscience and Remote Sensing 32 (5): 1087–1095. doi:10.1109/36.312897.
Tsai, F., and W. Philpot. 1998. Derivative Analysis of Hyperspectral Data. Remote Sensing of
Environment 66 (1): 41–51. doi:10.1016/S0034-4257(98)00032-7.
Universidad-del-Pais-Vasco. 2015. Hyperspectral Remote Sensing Scenes.
Xiaofei, H., C. Deng, and H. Jiawei. 2008. Learning a Maximum Margin Subspace for Image
Retrieval. IEEE Transactions on Knowledge and Data Engineering 20 (2): 189–201. doi:10.1109/
Yanfeng, G., L. Ying, and Z. Ye. 2008. A Selective KPCA Algorithm Based on High-Order Statistics
for Anomaly Detection in Hyperspectral Imagery. IEEE Geoscience and Remote Sensing Letters 5
(1): 43–47. doi:10.1109/LGRS.2007.907304.