Gaussian Process Model for Uncertain Data Classification
G.V. Suresh(1), Assoc. Professor, CSE Department, Universal College of Engineering, Guntur, India, vijaysuresh.g@gmail.com
Shabbeer Shaik, Assoc. Professor, MCA Department, Tirmula College of Engineering, Guntur, India, skshabbeer@yahoo.com
E.V. Reddy(2), Assoc. Professor, CSE Department, Universal College of Engineering, Guntur, India, ucet_evr@yahoo.com
Usman Ali Shaik, Assoc. Professor, CSE Department, Universal College of Engineering, Guntur, India, usman_india@yahoo.com
Abstract- Data uncertainty is common in real-world applications due to various causes, including imprecise measurement, network latency, outdated sources and sampling errors. These kinds of uncertainty have to be handled cautiously, or else the mining results could be unreliable or even wrong. We propose that when data mining is performed on uncertain data, data uncertainty has to be considered in order to obtain high-quality mining results. In this paper we study how uncertainty can be incorporated in data mining by using data clustering as a motivating example. We also present a Gaussian process model that can handle data uncertainty in data mining.

Keywords- Gaussian process, uncertain data, Gaussian distribution, data mining
I. INTRODUCTION
Data is often associated with uncertainty because of measurement inaccuracy, sampling discrepancy, outdated data sources, or other errors. This is especially true for applications that require interaction with the physical world, such as location-based services [1] and sensor monitoring [3]. For example, in the scenario of moving objects (such as vehicles or people), it is impossible for the database to track the exact locations of all objects at all time instants. Therefore, the location of each object is associated with uncertainty between updates [4]. These various sources of uncertainty have to be considered in order to produce accurate query and mining results. We note that with uncertainty, data values are no longer atomic. To apply traditional data mining techniques, uncertain data has to be summarized into atomic values. Taking moving-object applications as an example again, the location of an object can be summarized either by its last recorded location or by an expected location. Unfortunately, discrepancy between the summarized value and the actual value could seriously affect the quality of the mining results. In recent years, there has been significant research interest in data uncertainty management. Data uncertainty can be categorized into two types, namely existential uncertainty and value uncertainty. In the first type it is uncertain whether the object or data tuple exists at all. For example, a tuple in a relational database could be associated with a probability value that indicates the confidence of its presence. In value uncertainty, a data item is modelled as a closed region which bounds its possible values, together with a probability density function over its value. This model can be used to quantify the imprecision of location and sensor data in a constantly evolving environment.
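To make the value-uncertainty model concrete, the following minimal sketch represents a one-dimensional uncertain attribute by the mean and variance of a Gaussian pdf over its possible values; the class name and fields are hypothetical illustrations, not part of the paper's formulation.

```python
import numpy as np

class UncertainValue:
    """A value-uncertain data item: a Gaussian pdf over its possible values
    (illustrative sketch; a closed region could be added as hard bounds)."""
    def __init__(self, mean, variance):
        self.mean = mean          # summarized (expected) value
        self.variance = variance  # spread due to measurement error, latency, etc.

    def sample(self, n=1, seed=None):
        # Draw plausible true values consistent with the uncertainty model.
        rng = np.random.default_rng(seed)
        return rng.normal(self.mean, np.sqrt(self.variance), size=n)

# A moving object's last recorded coordinate, uncertain between updates:
location_x = UncertainValue(mean=12.3, variance=0.5)
print(location_x.sample(3, seed=0))
```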
A. Uncertain data mining
There has been a growing interest in uncertain data mining [1], including clustering [2], [3], [4], [5], classification [6], [7], [8], outlier detection [9], frequent pattern mining [10], [11], streams mining [12] and skyline analysis [13] on uncertain data. An important branch of mining uncertain data is building classification models on uncertain data. While [6], [7] study the classification of uncertain data using the support vector model, [8] performs classification using decision trees. This paper explores yet another classification model, Gaussian classifiers, and extends them to handle uncertain data. The key problem in the Gaussian process method is class-conditional density estimation. Traditionally the class-conditional density is estimated from data points. For uncertain classification problems, however, we should learn the class-conditional density from uncertain data objects represented by probability distributions.

II. RESEARCH BACKGROUND
 
A. General Structure of Gaussian Process Models
The conditional distribution $p(y \mid x)$ describes the dependency of an observable $y$ on a corresponding input $x \in \chi$. The class of models described in this section assumes that this relation can be decomposed into a systematic and a random component. Furthermore, the systematic dependency is given by a latent function $f : \chi \to \mathbb{R}$ such that the sampling distribution, i.e. the likelihood, is of the form

$$p(y \mid f(x), \theta) \quad (2.1)$$
 
which describes the random aspects of the data-generating process. In general, we will use $\theta$ to denote additional parameters of the likelihood besides $f$. Note that the conditional distribution of the observable $y$ depends on $x$ via the value of the latent function $f$ at $x$ only. The aim of inference is to identify the systematic component $f$ from empirical observations and prior beliefs. The data comes in the form of pair-wise observations $D = \{(y_i, x_i) \mid i = 1, \ldots, m\}$, where $m$ is the number of samples. Let $X = [x_1, \ldots, x_m]$ and $\mathbf{y} = [y_1, \ldots, y_m]$ collect the inputs and responses respectively. In general we assume $x \in \mathbb{R}^n$ unless stated otherwise. The name Gaussian process model refers to using a Gaussian process (GP) as a prior on $f$. The Gaussian process prior is non-parametric in the sense that instead of assuming a particular parametric form of $f(x, \phi)$ and making inference about $\phi$, the approach is to put a prior on function values directly. Each input position $x \in \chi$ has an associated random variable $f(x)$. A Gaussian process prior on $f$ technically means that a priori the joint distribution of a collection of function values $\mathbf{f} = [f(x_1), \ldots, f(x_{m'})]^T$ associated with any collection of $m'$ inputs $[x_1, \ldots, x_{m'}]^T$ is multivariate normal (Gaussian)

$$p(\mathbf{f} \mid X, \psi) = N(\mathbf{f} \mid \mathbf{m}, K) \quad (2.2)$$

with mean $\mathbf{m}$ and covariance matrix $K$. A Gaussian process is specified by a mean function $m(x)$ and a covariance function $k(x, x', \psi)$ such that $K_{ij} = k(x_i, x_j, \psi)$ and $\mathbf{m} = [m(x_1), \ldots, m(x_{m'})]^T$. By choosing a particular form of covariance function we may introduce hyper-parameters $\psi$ to the Gaussian process prior. Depending on the actual form of the covariance function $k(x, x', \psi)$, the hyper-parameters $\psi$ can control various aspects of the Gaussian process.
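As an illustrative sketch of eq. (2.2), the code below draws function values from a zero-mean GP prior. The squared-exponential covariance $k(x, x') = v \exp(-w(x - x')^2/2)$ used here is an assumption at this point (it is the form adopted later in Section IV), and the jitter term is a standard numerical safeguard, not part of the model.

```python
import numpy as np

def se_kernel(X1, X2, v=1.0, w=1.0):
    # Squared-exponential covariance k(x, x') = v * exp(-w/2 * (x - x')^2).
    d = X1[:, None] - X2[None, :]
    return v * np.exp(-0.5 * w * d**2)

rng = np.random.default_rng(0)
X = np.linspace(-5, 5, 100)      # input positions x_1, ..., x_m'
K = se_kernel(X, X)              # prior covariance matrix K
m = np.zeros_like(X)             # mean function m(x) = 0
# Draws from p(f | X, psi) = N(f | m, K), eq. (2.2); jitter keeps K positive definite:
f_samples = rng.multivariate_normal(m, K + 1e-10 * np.eye(len(X)), size=3)
```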
The sampling distribution (2.1) of $y$ depends on $f$ only through $f(x)$. As an effect, the likelihood of $f$ given $D$ factorizes

$$p(\mathbf{y} \mid \mathbf{f}, \theta) = \prod_{i=1}^{m} p(y_i \mid f(x_i), \theta) \quad (2.3)$$

and depends on $f$ only through its values at the observed inputs $\mathbf{f}$. According to the model, conditioning the likelihood on $\mathbf{f}$ is equivalent to conditioning on the full function $f$. This is of central importance since it allows us to make inference over finite-dimensional quantities instead of handling the whole function $f$. The posterior distribution of the function values $\mathbf{f}$ is computed according to Bayes' rule

$$p(\mathbf{f} \mid D, \theta, \psi) = \frac{p(\mathbf{y} \mid \mathbf{f}, \theta)\, p(\mathbf{f} \mid X, \psi)}{p(D \mid \theta, \psi)} = \frac{N(\mathbf{f} \mid \mathbf{m}, K) \prod_{i=1}^{m} p(y_i \mid f_i, \theta)}{p(D \mid \theta, \psi)} \quad (2.4)$$

where $f_i = f(x_i)$.
The posterior distribution of $\mathbf{f}$ can be used to compute the posterior predictive distribution of $f(x_*)$ for any input $x_*$, where in the following the asterisk is used to mark test examples. If several test cases are given, $X_*$ collects the test inputs and $\mathbf{f}_*$ denotes the corresponding vector of latent function values. The predictive distribution of $\mathbf{f}_*$ is obtained by integration over the posterior uncertainty

$$p(\mathbf{f}_* \mid D, X_*, \theta, \psi) = \int p(\mathbf{f}_* \mid \mathbf{f}, X, X_*, \psi)\, p(\mathbf{f} \mid D, \theta, \psi)\, d\mathbf{f} \quad (2.5)$$

where the first term of the right-hand side describes the dependency of $\mathbf{f}_*$ on $\mathbf{f}$ induced by the GP prior. The joint prior distribution of $\mathbf{f}$ and $\mathbf{f}_*$ due to the GP prior is multivariate normal

$$p(\mathbf{f}, \mathbf{f}_* \mid X, X_*, \psi) = N\left( \begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix} \,\middle|\, \begin{bmatrix} \mathbf{m} \\ \mathbf{m}_* \end{bmatrix}, \begin{bmatrix} K & K_* \\ K_*^T & K_{**} \end{bmatrix} \right) \quad (2.6)$$

where the covariance matrix is partitioned such that $K_{**}$ is the prior covariance matrix of $\mathbf{f}_*$ and $K_*$ contains the covariances between $\mathbf{f}$ and $\mathbf{f}_*$. The conditional distribution of $\mathbf{f}_* \mid \mathbf{f}$ can be obtained from the joint distribution (2.6) to give

$$p(\mathbf{f}_* \mid \mathbf{f}, X, X_*, \psi) = N(\mathbf{f}_* \mid \mathbf{m}_* + K_*^T K^{-1} (\mathbf{f} - \mathbf{m}),\; K_{**} - K_*^T K^{-1} K_*) \quad (2.7)$$

which is again multivariate normal. The simplest possible model assumes that the function can be observed directly, $y = f(x)$, so that the posterior on $\mathbf{f}$ becomes $p(\mathbf{f} \mid D) = \delta(\mathbf{f} - \mathbf{y})$, describing that no posterior uncertainty about $\mathbf{f}$ remains. From this posterior, the predictive distribution of $\mathbf{f}_*$ can be obtained according to eq. (2.5), which corresponds to simply replacing $\mathbf{f}$ by $\mathbf{y}$ in eq. (2.7). Figure 1 shows the posterior Gaussian process obtained by conditioning the prior on the five observations depicted as points. The predictive uncertainty is zero at the locations where the function value has been observed. Between the observations the uncertainty about the function value grows, and the sampled functions represent valid hypotheses about $f$ under the posterior process.
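The following sketch reproduces the situation of Figure 1 under eq. (2.7): a zero-mean GP prior is conditioned on five noise-free observations. The data points, kernel settings, and jitter are invented for illustration.

```python
import numpy as np

def se_kernel(X1, X2, v=1.0, w=1.0):
    d = X1[:, None] - X2[None, :]
    return v * np.exp(-0.5 * w * d**2)

# Five direct observations y = f(x), as in Figure 1 (values are made up):
X  = np.array([-4.0, -2.0, 0.0, 1.5, 3.0])
y  = np.array([-1.0, 0.5, 1.0, 0.2, -0.8])
Xs = np.linspace(-5, 5, 200)                     # test inputs X_*

K   = se_kernel(X, X) + 1e-10 * np.eye(len(X))   # K, with numerical jitter
Ks  = se_kernel(X, Xs)                           # K_* (train vs. test)
Kss = se_kernel(Xs, Xs)                          # K_**

# Eq. (2.7) with m = 0 and f replaced by y (noise-free posterior):
alpha = np.linalg.solve(K, y)
mean  = Ks.T @ alpha                             # m_* + K_*^T K^{-1} (y - m)
cov   = Kss - Ks.T @ np.linalg.solve(K, Ks)      # K_** - K_*^T K^{-1} K_*
std   = np.sqrt(np.clip(np.diag(cov), 0, None))  # vanishes at the observed inputs
```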
Figure 1. The posterior Gaussian process obtained by conditioning the prior on five observations.
III. OVERVIEW OF THE PROBLEM
We assume the following statistical model

$$t = f(x) + \varepsilon \quad (3.1)$$

where $x$ is a $D$-dimensional input and $\varepsilon$ the output additive Gaussian uncertainty, such that $\varepsilon \sim N(0, v_t)$, where $v_t$ is the unknown data variance. Such a model implies that

$$E[t \mid x] = f(x) \quad (3.2)$$

Now, let $x = u + \varepsilon_x$, or $x \sim N(u, v_x I)$, where $I$ is the $D \times D$ identity matrix and $v_x$ is the input data variance. In this case, the expectation of $t$ given the characteristics of $x$ is obtained by integrating over the input distribution

$$E[t \mid u, v_x] = \int f(x)\, p(x)\, dx \quad (3.3)$$

This integral cannot be solved analytically without approximations for many forms of $f(x)$; a numerical check is sketched below.
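Although (3.3) has no general closed form, it can always be estimated by Monte Carlo sampling from the input distribution. The helper name and test function below are hypothetical, for illustration only.

```python
import numpy as np

def mc_expectation(f, u, vx, n=100_000, seed=0):
    """Monte Carlo estimate of E[t | u, v_x] = integral of f(x) p(x) dx,
    with x ~ N(u, v_x I), cf. eq. (3.3)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(u, np.sqrt(vx), size=(n, len(u)))
    return f(x).mean()

f = lambda x: np.sin(x).prod(axis=1)   # an arbitrary test function
print(mc_expectation(f, u=np.array([1.0]), vx=0.25))
```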
A. Analytical approximation using the Delta Method
The function $f$ of the random argument $x$ can always be approximated by a second-order Taylor expansion around the mean $u$ of $x$:

$$f(x) = f(u) + (x - u)^T f'(u) + \frac{1}{2}(x - u)^T f''(u)(x - u) + O(\|x - u\|^3) \quad (3.4)$$

where $f'(u) = \left.\frac{\partial f(x)}{\partial x}\right|_{x=u}$ and $f''(u) = \left.\frac{\partial^2 f(x)}{\partial x\, \partial x^T}\right|_{x=u}$. Within this approximation, we can now solve the integral (3.3). We have

$$E[t \mid u, v_x] \approx \int \left[ f(u) + (x - u)^T f'(u) + \frac{1}{2}(x - u)^T f''(u)(x - u) \right] p(x)\, dx = f(u) + \frac{v_x}{2} \mathrm{Tr}[f''(u)] \quad (3.5)$$

where $\mathrm{Tr}$ denotes the trace. Thus, the new generative model for our data is

$$t = g(u, v_x) + \varepsilon, \qquad g(u, v_x) = f(u) + \frac{v_x}{2} \mathrm{Tr}[f''(u)] \quad (3.6)$$
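A quick numerical check of eq. (3.5): for an assumed one-dimensional test function (where the trace reduces to $f''(u)$), the Delta-Method value $f(u) + \frac{v_x}{2} f''(u)$ should roughly agree with a Monte Carlo estimate of $E[t \mid u, v_x]$.

```python
import numpy as np

f   = np.sin                     # test function f(x) = sin(x)  (assumed)
fpp = lambda x: -np.sin(x)       # its second derivative f''(x)

u, vx = 1.0, 0.25
rng = np.random.default_rng(0)
x = rng.normal(u, np.sqrt(vx), size=1_000_000)

mc    = f(x).mean()                   # E[f(x)], x ~ N(u, v_x)
delta = f(u) + 0.5 * vx * fpp(u)      # eq. (3.5): f(u) + (v_x/2) Tr[f''(u)]
print(mc, delta)                      # the two values should roughly agree
```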
 
IV. RELATED WORKS
A. Defining a new Gaussian Process
In the case of uncertain or random inputs, the new input/output relationship is given by (3.6), where the former function $f$ of the noise-free case has been replaced by $g(u, v_x) = f(u) + \frac{v_x}{2} \mathrm{Tr}[f''(u)]$. If we put a Gaussian prior on $f(u)$, we can derive the corresponding prior on its second derivative and then define the prior on the space of admissible functions $g(u, v_x)$, which is viewed as the sum of the two correlated random functions $f(u)$ and $\frac{v_x}{2} \mathrm{Tr}[f''(u)]$. We use results from the theory of random functions [3.5]. Let us recall that if $X(r)$ and $Y(r)$ are two random functions of the same argument $r$, with expected values $m_x(r)$ and $m_y(r)$ and covariance functions $C_x(r, r')$ and $C_y(r, r')$ respectively, then the mean and covariance function of $Z(r) = X(r) + Y(r)$ are given by

$$m_z(r) = m_x(r) + m_y(r) \quad (4.1)$$

$$C_z(r, r') = C_x(r, r') + C_y(r, r') + C_{xy}(r, r') + C_{yx}(r, r') \quad (4.2)$$
()
 X
and
()
Y
are correlated
(,')
 xy
Cr
and
(,')
 yx
Cr
are the cross-covariance functions. We can nowapply this to our function
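At a single argument $r = r'$, eqs. (4.1)-(4.2) reduce to the familiar variance-of-a-sum identity, which can be sanity-checked numerically; the covariance numbers below are synthetic, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Jointly Gaussian X and Y with Var(X) = 1.0, Var(Y) = 0.5, Cov(X, Y) = 0.3:
S = np.array([[1.0, 0.3],
              [0.3, 0.5]])
xy = rng.multivariate_normal([0.0, 0.0], S, size=200_000)
z = xy.sum(axis=1)                       # Z = X + Y
# Empirical Var(Z) vs. C_x + C_y + C_xy + C_yx (eq. 4.2 at r = r'):
print(z.var(), S[0, 0] + S[1, 1] + 2 * S[0, 1])
```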
We can now apply this to our function $g(\cdot)$. Let us first derive the mean and covariance function of $g(u, v_x)$ in the one-dimensional case and then extend these expressions to $D$ dimensions. Given that $f(u)$ has zero mean and covariance function

$$C(u_i, u_j) = v \exp\left( -\frac{1}{2} \sum_{d=1}^{D} w_d (u_i^d - u_j^d)^2 \right)$$

its second derivative $f''(u)$ has zero mean and covariance function $\partial^4 C(u_i, u_j)/\partial u_i^2 \partial u_j^2$. It is then straightforward that $\frac{v_x}{2} f''(u)$ has zero mean and covariance function $\frac{v_x^2}{4}\, \partial^4 C(u_i, u_j)/\partial u_i^2 \partial u_j^2$; also, the cross-covariance function between $f(u)$ and $\frac{v_x}{2} f''(u)$ is given by $\frac{v_x}{2}\, \partial^2 C(u_i, u_j)/\partial u_i^2$. Therefore, using the fact that $\partial^2 C(u_i, u_j)/\partial u_i^2 = \partial^2 C(u_i, u_j)/\partial u_j^2$ in one dimension, $g(u, v_x) = f(u) + \frac{v_x}{2} f''(u)$ has zero mean and covariance function

$$\mathrm{cov}[g(u_i, v_x), g(u_j, v_x)] = C(u_i, u_j) + \frac{v_x^2}{4} \frac{\partial^4 C(u_i, u_j)}{\partial u_i^2 \partial u_j^2} + v_x \frac{\partial^2 C(u_i, u_j)}{\partial u_i^2} \quad (4.3)$$

In the case of $D$-dimensional inputs, we have

$$\mathrm{cov}[g(u_i, v_x), g(u_j, v_x)] = C(u_i, u_j) + \frac{v_x^2}{4} \mathrm{Tr}\left[ \frac{\partial^4 C(u_i, u_j)}{\partial u_i \partial u_i^T \partial u_j \partial u_j^T} \right] + v_x \mathrm{Tr}\left[ \frac{\partial^2 C(u_i, u_j)}{\partial u_i \partial u_i^T} \right] \quad (4.4)$$

where $\partial^4 C(u_i, u_j)/\partial u_i \partial u_i^T \partial u_j \partial u_j^T$ is a $D \times D$ matrix, each entry of which is itself a $D \times D$ block: block $(r, s)$ contains $\partial^4 C(u_i, u_j)/\partial u_i^r \partial u_i^s \partial u_j \partial u_j^T$. So we see that the corrected covariance function corresponds to the noise-free covariance plus two correction terms weighted by the input noise variance, which might be either learnt or assumed to be known a priori.
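The derivative terms entering eq. (4.3) can be obtained symbolically for the one-dimensional squared-exponential covariance. The sketch below, using sympy, assembles the corrected covariance following the reconstruction of (4.3) above (the exact form of that equation is a reconstruction from the source, so treat the assembly line as indicative).

```python
import sympy as sp

ui, uj, v, w, vx = sp.symbols('u_i u_j v w v_x', positive=True)
C = v * sp.exp(-w * (ui - uj)**2 / 2)   # 1-D squared-exponential covariance

d2C = sp.diff(C, ui, 2)                 # cross term d^2 C / du_i^2 (= d^2 C / du_j^2 here)
d4C = sp.diff(C, ui, 2, uj, 2)          # d^4 C / du_i^2 du_j^2

# Corrected covariance of g(u, v_x) = f(u) + (v_x/2) f''(u), as in eq. (4.3):
Cg = C + vx * d2C + vx**2 / 4 * d4C
print(sp.simplify(Cg))
```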
B. Inference and prediction
Within this approximation, the likelihood of the data $t = \{t_1, \ldots, t_N\}$ is readily obtained. We have $t \mid U \sim N(0, Q)$ with

$$Q_{ij} = C'_{ij} + v_t \delta_{ij} \quad (4.5)$$

where $t$ is the $N \times 1$ vector of observed targets, $U$ is the $N \times D$ matrix of input means, $C'_{ij}$ is given by (4.4), and $\delta_{ij} = 1$ when $i = j$, $0$ otherwise. The parameters $\Theta = [w_1, \ldots, w_D, v, v_t, v_x]$ can then be learnt either in a Maximum Likelihood framework or in a Bayesian way, by assigning priors and computing their posterior distribution. When using the usual GP, the predictive distribution of a model output corresponding to a new input $u_*$, $p(f(u_*) \mid \{u, t\}, \Theta)$, is Gaussian, with mean and variance respectively given by the standard GP predictive equations (cf. eq. (2.7)).
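A sketch of the Maximum Likelihood route for learning $\Theta$ from eq. (4.5): the negative log marginal likelihood of $t \mid U \sim N(0, Q)$. For brevity the stand-in $C'$ here is the plain squared-exponential covariance on one-dimensional input means; in the paper's model the corrected covariance of eq. (4.4) would take its place.

```python
import numpy as np

def se_kernel(U1, U2, v=1.0, w=1.0):
    # Plain squared-exponential covariance (stand-in for C' of eq. 4.4).
    d = U1[:, None] - U2[None, :]
    return v * np.exp(-0.5 * w * d**2)

def neg_log_marginal_likelihood(log_params, U, t):
    """-log N(t | 0, Q) with Q_ij = C'_ij + v_t * delta_ij  (eq. 4.5)."""
    w, v, vt = np.exp(log_params)             # log-parameters stay positive
    Q = se_kernel(U, U, v, w) + vt * np.eye(len(U))
    L = np.linalg.cholesky(Q)                 # Q = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t))
    return (0.5 * t @ alpha                   # 0.5 * t^T Q^{-1} t
            + np.log(np.diag(L)).sum()        # 0.5 * log |Q|
            + 0.5 * len(t) * np.log(2 * np.pi))

# Theta = [w, v, v_t] can then be learnt with any optimiser, e.g.:
# from scipy.optimize import minimize
# res = minimize(neg_log_marginal_likelihood, np.zeros(3), args=(U, t))
```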