Nonlinear Robust Regressions Based on τ-Regression Quantile, Least Median of Squares and Least Trimmed Squares Using Genetic Algorithms
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 12, DECEMBER 2011, ISSN 2151-9617
https://sites.google.com/site/journalofcomputing
WWW.JOURNALOFCOMPUTING.ORG 64
2 THEORIES AND METHODS
2.1 τ-Regression Quantile, LTS and LMS
Given $\mathbf{X} = (\mathbf{1}_N \; \tilde{\mathbf{X}}) \in \mathbb{R}^{N \times (p+1)}$, $\tilde{\mathbf{X}} = (\mathbf{x}_1 \; \mathbf{x}_2 \; \cdots \; \mathbf{x}_N)^T \in \mathbb{R}^{N \times p}$, $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T \in \mathbb{R}^p$ and $\mathbf{y} = (y_1, y_2, \ldots, y_N)^T \in \mathbb{R}^N$, where $\mathbf{1}_N$ is the $N \times 1$ vector with all elements equal to one, $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)^T \in \mathbb{R}^{p+1}$ is a vector of regression coefficients, $\mathbf{e} = (e_1, e_2, \ldots, e_N)^T \in \mathbb{R}^N$ is a vector of residuals, $\mathbb{R}$ is the set of real numbers and $i = 1, 2, \ldots, N$. Then, the ordinary multiple linear regression is given by

$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$.  (1)
The OLS based linear regression minimizes

$\sum_{i=1}^{N} e_i^2$  (2)

to find the estimate of $\boldsymbol{\beta}$, say $\hat{\boldsymbol{\beta}}$, where $e_i = y_i - \tilde{\mathbf{x}}_i^T \boldsymbol{\beta}$ and $\tilde{\mathbf{x}}_i = (1 \; \mathbf{x}_i^T)^T$. The solution can be found by solving the following linear equation (see [1] and [10]):

$\mathbf{X}^T \mathbf{X} \hat{\boldsymbol{\beta}} = \mathbf{X}^T \mathbf{y}$.  (3)
However, the prediction of the OLS based regression will be distorted when outliers are present, since squaring the residuals magnifies the effect of the outliers. In previous works, one can use the regression quantile for tackling the existence of outliers. The regression quantile minimizes

$\sum_{i=1}^{N} \rho_\tau(e_i)$  (4)

to find the estimator of $\boldsymbol{\beta}$, where

$\rho_\tau(z) = \begin{cases} \tau z, & z \geq 0, \\ (\tau - 1)z, & z < 0, \end{cases}$  (5)

and $\rho_\tau$ is a function from $\mathbb{R}$ into $\mathbb{R}$. We should notice that the L1 regression is a special case of the regression quantile when $\tau$ is equal to 0.5. One can also use either LMS or LTS for tackling outliers, in which the estimation of $\boldsymbol{\beta}$ is found by minimizing either

$\operatorname{median}_i \; e_i^2$  (6)

or

$\sum_{i=1}^{h} |e|_{(i)}^2$,  (7)

where $|e|_{(1)}^2 \leq |e|_{(2)}^2 \leq \cdots \leq |e|_{(N)}^2$ are the ordered squared residuals, $|e|_{(i)}^2 \in \{ e_j^2 \mid j = 1, 2, \ldots, N \}$, and $h$ ($\leq N$) must be determined in advance, respectively.
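For concreteness, the three robust criteria above can be evaluated numerically as follows. This is an illustrative sketch: the function names and the toy residual vector are ours, not the paper's.

```python
import numpy as np

def rho(z, tau):
    """Check function of eq. (5): tau*z for z >= 0 and (tau - 1)*z for z < 0."""
    z = np.asarray(z, dtype=float)
    return np.where(z >= 0, tau * z, (tau - 1.0) * z)

def quantile_objective(e, tau):
    """Regression quantile criterion (4): sum of check losses over the residuals."""
    return float(np.sum(rho(e, tau)))

def lms_objective(e):
    """LMS criterion (6): median of the squared residuals."""
    return float(np.median(np.square(e)))

def lts_objective(e, h):
    """LTS criterion (7): sum of the h smallest squared residuals (h <= N)."""
    sq = np.sort(np.square(np.asarray(e, dtype=float)))
    return float(np.sum(sq[:h]))

e = [0.1, -0.2, 0.3, 10.0]         # one gross outlier
print(quantile_objective(e, 0.5))  # half the sum of absolute residuals (L1 case)
print(lms_objective(e))            # the outlier barely moves the median
print(lts_objective(e, h=3))       # the outlier is trimmed away entirely
```

Note how the outlier dominates the OLS sum of squares (about 100) but has little or no influence on (4), (6) and (7), which is exactly the robustness the section appeals to.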
The relation of the eigenvalues and eigenvectors of the matrices C and K was studied by [16], [17] and [20]. The centered multiple linear regression in the feature space is given by

$\mathbf{y}_0 = \boldsymbol{\Phi}\boldsymbol{\gamma} + \tilde{\mathbf{e}}$,  (8)

where $\boldsymbol{\gamma} = (\gamma_1, \gamma_2, \ldots, \gamma_{p_F})^T$ is a vector of regression coefficients in the feature space, $\tilde{\mathbf{e}}$ is a vector of random errors and $\mathbf{y}_0 = (\mathbf{I}_N - (1/N)\mathbf{1}_N \mathbf{1}_N^T)\mathbf{y}$, where $\mathbf{I}_N$ is the $N \times N$ identity matrix.
Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_{p_F} > \lambda_{p_F+1} = \cdots = \lambda_N = 0$ be the eigenvalues of K and $\mathbf{B} = (\mathbf{b}_1 \; \mathbf{b}_2 \; \cdots \; \mathbf{b}_N)$ be the matrix of the corresponding normalized eigenvectors $\mathbf{b}_s$ ($s = 1, 2, \ldots, N$) of K. By defining $\mathbf{a}_l = \mathbf{b}_l / \sqrt{\lambda_l}$ for $l = 1, 2, \ldots, p_F$ and $\mathbf{A}^{(p_F)} = (\mathbf{a}_1 \; \mathbf{a}_2 \; \cdots \; \mathbf{a}_{p_F})$, we can reduce model (8) to

$\mathbf{y}_0 = \mathbf{U}^{(p_F)} \boldsymbol{\alpha}^{(p_F)} + \tilde{\mathbf{e}}$,  (9)

where $\mathbf{U}^{(p_F)} = \mathbf{K}\mathbf{A}^{(p_F)}$ and $\boldsymbol{\alpha}^{(p_F)} = (\alpha_1, \alpha_2, \ldots, \alpha_{p_F})^T$; see [20] for the detailed discussion.
According to Mercer's theorem, if we choose a continuous, symmetric and positive semidefinite kernel $k : \mathbb{R}^p \times \mathbb{R}^p \to \mathbb{R}$, then there exists $\phi : \mathbb{R}^p \to F$ such that $k(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)$; see [9], [16] and [17]. Instead of choosing $\phi$ explicitly, we choose a kernel $k$ and employ the corresponding function as $\phi$. Let $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$; then $\mathbf{K}$ and $\lambda_l$ ($l = 1, 2, \ldots, p_F$) are explicitly known now. It implies that $\mathbf{U}^{(p_F)}$ and $\boldsymbol{\alpha}^{(p_F)}$ are also explicitly known and model (9) is well defined now. It is evident that the elements of $\mathbf{U}^{(p_F)}$ are the principal components of $\phi(\mathbf{x}_i)$ for $i = 1, 2, \ldots, N$.
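As a concrete illustration of computing K and its eigensystem, the sketch below uses the Gaussian kernel that Section 3.2 later adopts; all names and the toy inputs are ours.

```python
import numpy as np

def gaussian_kernel_matrix(X, q=5.0):
    """K_ij = k(x_i, x_j) with the Gaussian kernel k(x, z) = exp(-q * ||x - z||^2)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    return np.exp(-q * sq_dist)

X = np.linspace(-1.0, 1.0, 6).reshape(-1, 1)   # six 1-D sample points
K = gaussian_kernel_matrix(X)

# K is symmetric positive semidefinite, so eigh returns real, nonnegative
# eigenvalues; reverse them so lambda_1 >= lambda_2 >= ... >= lambda_N as in the text.
lam, B = np.linalg.eigh(K)
lam, B = lam[::-1], B[:, ::-1]
```

The columns of `B` correspond to the normalized eigenvectors $\mathbf{b}_s$ above.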
If we only use the first $r$ ($\leq p_F$) vectors of $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_{p_F}$, model (9) becomes

$\mathbf{y}_0 = \mathbf{U}^{(r)} \boldsymbol{\alpha}^{(r)} + \boldsymbol{\varepsilon}$,  (10)

where $\boldsymbol{\varepsilon} = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N)^T$ is a vector of residuals influenced by dropping the term $\mathbf{U}^{(p_F - r)} \boldsymbol{\alpha}^{(p_F - r)}$ in model (9). We usually dispose of the term $\mathbf{U}^{(p_F - r)} \boldsymbol{\alpha}^{(p_F - r)}$ for tackling the effects of multicollinearity on the PCA based regressions, where the number $r$ is called the retained number of nonlinear principal components (PCs) for the KPCR. We can use the ratio $\lambda_l / \lambda_1$ ($l = 1, 2, \ldots, p_F$) for detecting the presence of multicollinearity on $\mathbf{U}^{(r)}$. If $\lambda_l / \lambda_1$ is smaller than, say, 1/1000, then we consider that multicollinearity exists on $\mathbf{U}^{(r)}$ [10].
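The eigenvalue-ratio rule above can be written as a small helper; the 1/1000 cutoff follows the text, while the function name and example values are ours.

```python
import numpy as np

def retained_components(eigvals, cutoff=1.0 / 1000):
    """Count the components whose ratio lambda_l / lambda_1 is at least
    `cutoff`; components below the cutoff are flagged as multicollinear
    and dropped, giving a candidate retained number r."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return int(np.sum(lam / lam[0] >= cutoff))

print(retained_components([10.0, 1.0, 0.05, 1e-5]))  # 3: only the last ratio is below 1/1000
```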
Let $\hat{\boldsymbol{\alpha}}^{(r)}$ be the estimator of $\boldsymbol{\alpha}^{(r)}$ using the OLS method. Then, the prediction value of $\mathbf{y}$, say $\hat{\mathbf{y}}$, is given by

$\hat{\mathbf{y}} = \bar{y}\mathbf{1}_N + \mathbf{K}\mathbf{A}^{(r)}\hat{\boldsymbol{\alpha}}^{(r)}$.  (11)

The prediction of the KPCR with the first $r$ vectors of $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_{p_F}$ is given by

$f^{(r)}(\mathbf{x}) = \bar{y} + \sum_{i=1}^{N} c_i k(\mathbf{x}_i, \mathbf{x})$,  (12)

where $(c_1 \; c_2 \; \cdots \; c_N)^T = \mathbf{A}^{(r)}\hat{\boldsymbol{\alpha}}^{(r)}$ and $f^{(r)}$ is a function from $\mathbb{R}^p$ into $\mathbb{R}$.
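A minimal sketch of the KPCR fit described by (8) through (11), assuming an already centered kernel matrix; the construction $\mathbf{a}_l = \mathbf{b}_l/\sqrt{\lambda_l}$ is our reading of the text, and all names are ours.

```python
import numpy as np

def kpcr_fit(K, y, r):
    """Fit KPCR on a precomputed, centered kernel matrix K: eigendecompose K,
    form U^(r) = K A^(r) with columns a_l = b_l / sqrt(lambda_l), estimate
    alpha^(r) by OLS on the centered response, and return the fitted values
    y_hat = y_bar * 1 + K A^(r) alpha_hat as in eq. (11)."""
    y = np.asarray(y, dtype=float)
    y0 = y - y.mean()                      # y0 = (I - (1/N) 1 1^T) y
    lam, B = np.linalg.eigh(K)
    lam, B = lam[::-1], B[:, ::-1]         # decreasing eigenvalues
    A = B[:, :r] / np.sqrt(lam[:r])        # columns a_l = b_l / sqrt(lambda_l)
    alpha_hat, *_ = np.linalg.lstsq(K @ A, y0, rcond=None)
    return y.mean() + K @ (A @ alpha_hat)

# With r equal to the rank of K, the fit interpolates the training data.
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
K = M @ M.T + np.eye(5)                    # a full-rank PSD "kernel" matrix
y = rng.normal(size=5)
y_hat = kpcr_fit(K, y, r=5)
print(np.allclose(y_hat, y))  # True
```

Choosing $r < N$ regularizes the fit, which is the multicollinearity control discussed above.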
Let $\mathbf{U}^{(r)} = (\mathbf{u}_1 \; \mathbf{u}_2 \; \cdots \; \mathbf{u}_N)^T \in \mathbb{R}^{N \times r}$; then we obtain $e_i = y_{0i} - \mathbf{u}_i^T \boldsymbol{\alpha}^{(r)}$. Furthermore, we solve one of the following problems:

$\min \sum_{i=1}^{N} \rho_\tau(e_i)$,  (13)

or

$\min \; \operatorname{median}_i \; e_i^2$,  (14)

or

$\min \sum_{i=1}^{h} |e|_{(i)}^2$,  (15)

to find the robust estimators of $\boldsymbol{\alpha}^{(r)}$.
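Problems (13) through (15) have non-smooth objectives, which is why the paper turns to GAs. The following is only a sketch of a real-coded GA minimizer with elitist selection, convex-combination crossover and Gaussian mutation; the paper does not specify its exact operators in this section, so everything here is an assumption.

```python
import numpy as np

def ga_minimize(objective, dim, pop_size=50, iters=200,
                mutation_rate=0.2, selection_rate=0.5, seed=0):
    """Toy real-coded GA: keep the best `selection_rate` fraction of the
    population, refill by crossover of random surviving parents, then apply
    Gaussian mutations to the children."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-15.0, 15.0, size=(pop_size, dim))
    n_keep = max(2, int(selection_rate * pop_size))
    for _ in range(iters):
        fitness = np.array([objective(c) for c in pop])
        parents = pop[np.argsort(fitness)[:n_keep]]
        # crossover: elementwise convex mix of two random parents
        i = rng.integers(0, n_keep, size=(pop_size - n_keep, 2))
        w = rng.random((pop_size - n_keep, dim))
        children = w * parents[i[:, 0]] + (1 - w) * parents[i[:, 1]]
        # mutation: perturb a random subset of genes
        mask = rng.random(children.shape) < mutation_rate
        children = children + mask * rng.normal(scale=0.5, size=children.shape)
        pop = np.vstack([parents, children])
    fitness = np.array([objective(c) for c in pop])
    return pop[np.argmin(fitness)]

# Example: minimize the LMS criterion (14) for a 1-parameter toy model y = a*x.
e = lambda a: np.array([1.0, 2.0, 3.0]) * a[0] - np.array([2.0, 4.0, 6.0])
best = ga_minimize(lambda a: float(np.median(e(a) ** 2)), dim=1)
print(abs(best[0] - 2.0) < 0.1)  # True: the GA recovers the slope ~2
```

Because the surviving parents are carried over unchanged, the best candidate never worsens between generations.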
Let $\hat{\boldsymbol{\alpha}}^{*(r)}$ be the estimator of $\boldsymbol{\alpha}^{(r)}$ using the regression quantile. Then, the prediction value of $\mathbf{y}$ with the first $r$ vectors of $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_{p_F}$, say $\tilde{\mathbf{y}}$, is given by

$\tilde{\mathbf{y}} = \bar{y}\mathbf{1}_N + \mathbf{K}\mathbf{A}^{(r)}\hat{\boldsymbol{\alpha}}^{*(r)}$  (16)

and the residual between $\mathbf{y}$ and $\tilde{\mathbf{y}}$ is given by

$\tilde{\boldsymbol{\varepsilon}} = \mathbf{y} - \tilde{\mathbf{y}}$.  (17)
The prediction of the R-KPCQR with the first $r$ vectors of $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_{p_F}$ is given by

$g^{(r)}(\mathbf{x}) = \bar{y} + \sum_{i=1}^{N} d_i k(\mathbf{x}_i, \mathbf{x})$,  (18)

where $(d_1 \; d_2 \; \cdots \; d_N)^T = \mathbf{A}^{(r)}\hat{\boldsymbol{\alpha}}^{*(r)}$ and $g^{(r)}$ is a function from $\mathbb{R}^p$ into $\mathbb{R}$. The number $r$ is called the retained number of nonlinear PCs for the R-KPCQR. The predictions of the R-KPCLMSR and R-KPCLTSR with the first $r$ vectors of $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_{p_F}$ are conducted by using similar procedures, respectively.
Let $\hat{\boldsymbol{\alpha}}^{*(r)}$ be the solution of this problem.

9. Calculate $(d_1 \; d_2 \; \cdots \; d_N)^T = \mathbf{A}^{(r)}\hat{\boldsymbol{\alpha}}^{*(r)}$.

10. Given a vector $\mathbf{x} \in \mathbb{R}^p$, the prediction of the R-KPCQR with the first $r$ vectors of $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_{p_F}$ is given by

$g^{(r)}(\mathbf{x}) = \bar{y} + \sum_{i=1}^{N} d_i k(\mathbf{x}_i, \mathbf{x})$.
When $\sum_{i=1}^{N} \phi(\mathbf{x}_i) \neq \mathbf{0}$, we replace $\mathbf{K}$ by $\mathbf{K}_N = \mathbf{K} - \mathbf{E}\mathbf{K} - \mathbf{K}\mathbf{E} + \mathbf{E}\mathbf{K}\mathbf{E}$ in Step 4, where $\mathbf{E}$ is the $N \times N$ matrix with all elements equal to $1/N$. Instead of problem (13) in Step 8, we solve problems (14) and (15) using GAs to obtain the predictions of the R-KPCLMSR and R-KPCLTSR, respectively.
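The centering of Step 4 can be sketched directly; the helper name is ours, and the formula is the standard centered kernel matrix quoted above.

```python
import numpy as np

def center_kernel(K):
    """K_N = K - EK - KE + EKE, with E the N x N matrix of entries 1/N.
    This makes the mapped data have zero mean in the feature space."""
    N = K.shape[0]
    E = np.full((N, N), 1.0 / N)
    return K - E @ K - K @ E + E @ K @ E

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 3))
K = M @ M.T                                 # a toy positive semidefinite kernel matrix
K_N = center_kernel(K)
print(np.allclose(K_N.sum(axis=0), 0.0))    # True: rows and columns sum to zero
```

The row and column sums vanishing is a quick check that centering was applied correctly, since $\mathbf{E}\mathbf{1}_N = \mathbf{1}_N$ makes $\mathbf{K}_N \mathbf{1}_N = \mathbf{0}$ algebraically.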
3 CASE STUDIES
3.1 Data Sets
We generated data sets from a trigonometric function and a sinc function to test the performances of KPCR, R-KPCQR, R-KPCLMSR and R-KPCLTSR. Generally, the generated data from those functions can be written as $y_i = f(x_i) + e_i$, where $i = 1, 2, \ldots, N$. We also generate $y_{tj} = f(x_{tj}) + e_{tj}$, where $j = 1, 2, \ldots, N_t$ and $N_t$ is a positive integer. The random noises $e_i$ and $e_{tj}$ are real numbers generated by a normally distributed random variable with zero mean and standard deviations $\sigma_1$ and $\sigma_2$, respectively, with $\sigma_1, \sigma_2 \in (0, 1]$. We call the set of $\{(x_i, y_i)\}$ and the set of $\{(x_{tj}, y_{tj})\}$ the training data set and the testing data set, respectively.
The generated data from the trigonometric function and sinc function are given as follows:

$f(x) = 2.5\sin(x) + 1.5\cos(2x)$,  (20)

with $x_i \in [-2\pi : 0.15 : 2\pi]$ and $x_{tj} \in [-2\pi : 0.2 : 2\pi]$;

$f(x) = \begin{cases} 4\sin(x)/x, & \text{if } x \neq 0, \\ 4, & \text{otherwise}, \end{cases}$  (21)

with $x_i \in [-8 : 0.25 : 8]$ and $x_{tj} \in [-6 : 0.3 : 6]$, respectively. The notation $[\underline{z} : t : \bar{z}]$ stands for $\underline{z}, \underline{z}+t, \underline{z}+2t, \ldots, \bar{z}$, where $t$ is a real number.
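The targets (20) and (21) and the noisy sampling scheme can be sketched as follows; the code and names are ours, and the noise levels 0.2 and 0.3 anticipate the values stated in the next paragraph.

```python
import numpy as np

def f_trig(x):
    """Trigonometric target of eq. (20)."""
    return 2.5 * np.sin(x) + 1.5 * np.cos(2 * x)

def f_sinc(x):
    """Sinc target of eq. (21): 4*sin(x)/x for x != 0 and 4 otherwise."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, 4.0)
    nz = x != 0
    out[nz] = 4.0 * np.sin(x[nz]) / x[nz]
    return out

rng = np.random.default_rng(1)
# [z : t : z_bar] grids: training with step 0.15, testing with step 0.2
x_train = np.arange(-2 * np.pi, 2 * np.pi, 0.15)
x_test = np.arange(-2 * np.pi, 2 * np.pi, 0.2)
y_train = f_trig(x_train) + rng.normal(0.0, 0.2, x_train.size)   # sigma_1 = 0.2
y_test = f_trig(x_test) + rng.normal(0.0, 0.3, x_test.size)      # sigma_2 = 0.3
```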
For the sake of comparisons, we set $\sigma_1$ and $\sigma_2$ equal to 0.2 and 0.3, respectively. In addition, we also generate the data set from the polynomial function

$f(x) = 5x + x^2$.  (22)

The MAE is also used to measure the prediction error of the testing data sets and is denoted by MAE$_t$.
Outliers are created artificially by moving some $(x_i, y_i)$'s and $(x_{tj}, y_{tj})$'s away from their designated locations. We generate eight potential outliers for each of the first and second data sets, where the positions of the outliers in the $x$ direction and the $x_t$ direction are chosen randomly in the domain of $x_i$ and the domain of $x_{tj}$, respectively. The positions of the outliers in the $y$ direction and the $y_t$ direction are randomly selected in the intervals $[-8.5, 8.5]$ and $[-16.25, 16.25]$ from the correct positions of $y_i$ and $y_{tj}$, respectively. In the third data set, the six potential outliers of the $y_i$'s and $y_{tj}$'s are generated in the interval $[-20, 20]$ from the correct positions randomly.
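The outlier mechanism can be sketched as a small helper; the function and variable names are ours, and the half-widths 8.5, 16.25 and 20 are the interval bounds quoted above.

```python
import numpy as np

def add_outliers(y, n_out, half_width, rng):
    """Move n_out randomly chosen responses away from their correct
    positions by a random amount in [-half_width, half_width]."""
    y = np.array(y, dtype=float)
    idx = rng.choice(y.size, size=n_out, replace=False)
    y[idx] = y[idx] + rng.uniform(-half_width, half_width, size=n_out)
    return y, idx

rng = np.random.default_rng(7)
y_clean = np.zeros(50)
y_dirty, idx = add_outliers(y_clean, n_out=8, half_width=8.5, rng=rng)
untouched = np.setdiff1d(np.arange(50), idx)
print(np.all(y_dirty[untouched] == 0.0))  # True: only the 8 chosen points move
```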
3.2 The Results
In these case studies, we used the Gaussian kernel
( )
2
( , ) exp k q = x z x z with parameter q is set to five
for KPCR, RKPCQR, RKPCLMSR and RKPCLTSR. We
set the parameter h and equal to 85% of the number of
training data and 0.25, respectively, and involved the
estimate of
( ) r
0 by using KPCR
( )
( )
r
0 in the initial
population of GAs. The ith gene of the other
chromosomes(orcandidatesolutionsof
( )
)
r
0 israndomly
chosenbytheformulae
( )
( )
30 (1) 15 (25)
r
i
rand 0 +
where
( )
( )
r
i
0 istheithelementof
( )
r
0 andi=1,2, ,r.
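Formula (25) for seeding the GA population can be sketched as follows, reading rand(1) as a uniform draw on [0, 1]; the function name and the toy estimate are ours.

```python
import numpy as np

def initial_population(alpha_kpcr, pop_size=50, seed=0):
    """Build the initial GA population: each gene of each chromosome is
    alpha_hat_i + 30*rand(1) - 15 (formula (25)), and the first chromosome
    is the KPCR estimate itself, as described in the text."""
    rng = np.random.default_rng(seed)
    alpha_kpcr = np.asarray(alpha_kpcr, dtype=float)
    pop = alpha_kpcr + 30.0 * rng.random((pop_size, alpha_kpcr.size)) - 15.0
    pop[0] = alpha_kpcr            # seed the population with the KPCR estimate
    return pop

pop = initial_population([1.0, -2.0, 0.5])
print(pop.shape)  # (50, 3)
```

Every gene then lies within 15 units of the corresponding KPCR coefficient, so the search starts in a neighborhood of a reasonable (but non-robust) solution.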
For the sake of comparisons, the population size, maximum number of iterations, mutation rate and selection rate are 50, 1000, 0.2 and 0.5, respectively. As the results, the plots of the predictions of KPCR, R-KPCQR, R-KPCLMSR and R-KPCLTSR corresponding to the three data sets are presented in Figure 1, Figure 2 and Figure 3, respectively. We can see that the predictions of R-KPCQR, R-KPCLMSR and R-KPCLTSR are less distorted by the presence of outliers compared to KPCR. Table 1 summarizes the prediction error results of KPCR, R-KPCQR, R-KPCLMSR and R-KPCLTSR. In the case of outliers present in both the training data set and the testing data set, R-KPCQR, R-KPCLMSR and R-KPCLTSR yield lower MAEs and MAE$_t$s compared to KPCR.
Figure 1: Predictions of KPCR (black), R-KPCQR (red), R-KPCLMSR (blue) and R-KPCLTSR (green) with q and r equal to 5 and 10, respectively. The black circles are the trigonometric data with random noises: (a) training data, (b) testing data.

Figure 2: Predictions of KPCR (black), R-KPCQR (red), R-KPCLMSR (blue) and R-KPCLTSR (green) with q and r equal to 5 and 13, respectively. The black circles are the sinc data with random noises: (a) training data, (b) testing data.

Figure 3: Predictions of KPCR (black), R-KPCQR (red), R-KPCLMSR (blue) and R-KPCLTSR (green) with q and r equal to 5 and 14, respectively. The black circles are the polynomial data with random noises: (a) training data, (b) testing data.
Table 1: Mean of MAE and MAE_t for KPCR, R-KPCQR, R-KPCLMSR and R-KPCLTSR (with outliers).

Data           Method                      MAE      MAEt
Trigonometric  KPCR (r=10)                 0.8773   0.8612
               R-KPCQR (r=10, τ=0.25)      0.6556   0.6627
               R-KPCLMSR (r=10)            0.7964   0.7967
               R-KPCLTSR (r=10)            0.6795   0.6945
Sinc           KPCR (r=13)                 1.4692   1.5290
               R-KPCQR (r=13, τ=0.25)      1.0703   1.0486
               R-KPCLMSR (r=13)            1.1028   1.3458
               R-KPCLTSR (r=13)            1.0486   1.0409
Polynomial     KPCR (r=14)                 1.3287   2.2538
               R-KPCQR (r=14, τ=0.25)      1.1565   2.1389
               R-KPCLMSR (r=14)            1.0232   2.1263
               R-KPCLTSR (r=14)            1.1674   2.1598
4 CONCLUSIONS
We have proposed three novel nonlinear robust
regressionsusingthehybridizationofKPCA,regression
quantile, LMS, LTS and genetic algorithms. The proposed
methodsareperformedbytransformingoriginaldatainto
ahigherdimensionalfeaturespaceandcreatingamultiple
linear regression in this space. Then, we perform a kernel
trick and solve the optimization problems of regression
quantile,LMSandLTSinthefeaturespaceusingGAsfor
obtainingnonlinearrobustregressions.Inthecaseofdata
with outliers, the prediction of RKPCQR, RKPCLMSR
and RKPCLTSR are less distorted by the presence of
outliers and give smaller MAEs and MAEts compared to
KPCR. When outliers are not present, all of the four
methodsperformverywell.
ACKNOWLEDGEMENT
The authors sincerely thank Universiti Teknologi Malaysia and the Ministry of Higher Education (MOHE) Malaysia for the Research University Grant (Vot number Q.J130000.7128.02J88). In addition, we also thank the Research Management Center (RMC) UTM for supporting this research project.
Table 2: Mean of MAE and MAE_t for KPCR, R-KPCQR, R-KPCLMSR and R-KPCLTSR (without outliers).

Data           Method                      MAE      MAEt
Trigonometric  KPCR (r=10)                 0.0746   0.3237
               R-KPCQR (r=10, τ=0.25)      0.0760   0.3272
               R-KPCLMSR (r=10)            0.0896   0.3427
               R-KPCLTSR (r=10)            0.0821   0.3283
Sinc           KPCR (r=13)                 0.0733   0.0557
               R-KPCQR (r=13, τ=0.25)      0.0720   0.0616
               R-KPCLMSR (r=13)            0.0821   0.0804
               R-KPCLTSR (r=13)            0.0728   0.0772
Polynomial     KPCR (r=14)                 0.1678   0.1562
               R-KPCQR (r=14, τ=0.25)      0.1425   0.1712
               R-KPCLMSR (r=14)            0.1740   0.1890
               R-KPCLTSR (r=14)            0.1484   0.1659
REFERENCES
[1] H. Anton, Elementary Linear Algebra, John Wiley and Sons, Inc., 2000.
[2] M.B. Aryanezhad and M. Hemati, A new genetic algorithm for solving nonconvex nonlinear programming problems, Applied Mathematics and Computation, 86:186-194, 2008.
[3] J. Cho, J. Lee, S.W. Choi, D. Lee, and I. Lee, Fault identification for process monitoring using kernel principal component analysis, Chemical Engineering Science, pages 279-288, 2005.
[4] M. Gen, R. Cheng, and L. Lin, Network Models and Optimization: Multiobjective Genetic Algorithm Approach, Springer, 2008.
[5] L. Hoegaerts, J.A.K. Suykens, J. Vandewalle, and B. De Moor, Subset based least squares subspace regression in reproducing kernel Hilbert space, Neurocomputing, pages 293-323, 2005.
[6] P. Huber, Robust Statistics, John Wiley and Sons, Inc., 1981.
[7] A.M. Jade, B. Srikanth, B.D. Kulkarni, J.P. Jog, and L. Priya, Feature extraction and denoising using kernel PCA, Chemical Engineering Science, 58:4441-4448, 2003.
[8] C. Lu, C. Zhang, T. Zhang, and W. Zhang, Kernel based symmetrical principal component analysis for face classification, Neurocomputing, 70:904-911, 2007.
[9] H.Q. Minh, P. Niyogi, and Y. Yao, Mercer's theorem, feature maps, and smoothing, Lecture Notes in Computer Science, Springer Berlin, 4005/2006, 2009.
[10] D.C. Montgomery, E.A. Peck, and G.G. Vining, Introduction to Linear Regression Analysis, Wiley Interscience, 2006.
[11] M.S. Osman, Mahmoud A. Abo-Sinna, and A.A. Mousa, A combined genetic algorithm-fuzzy logic controller (GA-FLC) in nonlinear programming, Applied Mathematics and Computation, 170:821-840, 2005.
[12] C.H. Park, W.I. Lee, W. Suck, and A. Vautrin, Improved genetic algorithm for multidisciplinary optimization of composite laminates, Chemometrics and Intelligent Laboratory Systems, 68:1894-1903, 2008.
[13] R. Rosipal, M. Girolami, L.J. Trejo, and A. Cichocki, Kernel PCA for feature extraction and denoising in nonlinear regression, Neural Computing and Applications, pages 231-243, 2001.
[14] R. Rosipal and L.J. Trejo, Kernel partial least squares regression in reproducing kernel Hilbert space, Journal of Machine Learning Research, 2:97-123, 2002.
[15] R. Rosipal, L.J. Trejo, and A. Cichocki, Kernel principal component regression with EM approach to nonlinear principal component extraction, Technical Report, University of Paisley, UK, 2001.
[16] B. Scholkopf, A. Smola, and K.R. Muller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10:1299-1319, 1998.
[17] B. Scholkopf and A.J. Smola, Learning with Kernels, The MIT Press, 2002.
[18] S.N. Sivanandam and S.N. Deepa, Introduction to Genetic Algorithms, Springer, 2008.
[19] S. Sumathi, T. Hamsapriya, and P. Surekha, Evolutionary Intelligence, Springer, 2008.
[20] A. Wibowo, Nonlinear predictions in regression models based on kernel method, PhD Dissertation, Graduate School of Systems and Information Engineering, Univ. of Tsukuba, Japan, 2008.
[21] A. Wibowo and M.I. Desa, Kernel based nonlinear weighted least squares regression, Journal of Computing, 3, Issue 11, 2011.
[22] X. Yu and M. Gen, Introduction to Evolutionary Algorithms, Springer, 2010.
Antoni Wibowo is currently working as a senior lecturer in the Faculty of Computer Science and Information Systems in UTM. He received a B.Sc. in Mathematical Engineering from University of Sebelas Maret (UNS) Indonesia and an M.Sc. in Computer Science from University of Indonesia. He also received an M.Eng. and a Dr.Eng. in Systems and Information Engineering from University of Tsukuba, Japan. His interests are in the fields of computational intelligence, machine learning, operations research and data analysis.

Mohamad Ishak Desa is a professor in the Faculty of Computer Science and Information Systems in UTM. He received a B.Sc. in Mathematics from UKM in Malaysia, along with an advanced diploma in systems analysis from Aston University. He received an M.A. in Mathematics from the University of Illinois, and then a PhD in operations research from Salford University in the UK. He is currently the Head of the Operation Business Intelligence Research Group in UTM. His interests are operations research, optimization, logistics and supply chain.