
Gaussian Process Model for Uncertain Data Classification

G. V. Suresh
Assoc. Professor, CSE Department, Universal College of Engineering, Guntur, India
vijaysuresh.g@gmail.com

Shabbeer Shaik
Assoc. Professor, MCA Department, Tirmula College of Engineering, Guntur, India
skshabbeer@yahoo.com

E. V. Reddy
Assoc. Professor, CSE Department, Universal College of Engineering, Guntur, India
ucet_evr@yahoo.com

Usman Ali Shaik
Assoc. Professor, CSE Department, Universal College of Engineering, Guntur, India
usman_india@yahoo.com

Abstract— Data uncertainty is common in real-world applications due to various causes, including imprecise measurement, network latency, outdated sources, and sampling errors. These kinds of uncertainty have to be handled cautiously, or else the mining results could be unreliable or even wrong. We propose that when data mining is performed on uncertain data, data uncertainty has to be considered in order to obtain high-quality mining results. In this paper we study how uncertainty can be incorporated in data mining, using data clustering as a motivating example. We also present a Gaussian process model that can handle data uncertainty in data mining.

Keywords: Gaussian process, uncertain data, Gaussian distribution, data mining

I. INTRODUCTION

Data is often associated with uncertainty because of measurement inaccuracy, sampling discrepancy, outdated data sources, or other errors. This is especially true for applications that require interaction with the physical world, such as location-based services [1] and sensor monitoring [3]. For example, in the scenario of moving objects (such as vehicles or people), it is impossible for the database to track the exact locations of all objects at all time instants. Therefore, the location of each object is associated with uncertainty between updates [4]. These various sources of uncertainty have to be considered in order to produce accurate query and mining results. We note that with uncertainty, data values are no longer atomic. To apply traditional data mining techniques, uncertain data has to be summarized into atomic values. Taking moving-object applications as an example again, the location of an object can be summarized either by its last recorded location or by an expected location. Unfortunately, discrepancy between the summarized value and the actual value could seriously affect the quality of the mining results. In recent years, there has been significant research interest in data uncertainty management. Data uncertainty can be categorized into two types, namely existential uncertainty and value uncertainty. In the first type it is uncertain whether the object or data tuple exists or not. For example, a tuple in a relational database could be associated with a probability value that indicates the confidence of its presence. In value uncertainty, a data item is modelled as a closed region which bounds its possible values, together with a probability density function of its value. This model can be used to quantify the imprecision of location and sensor data in a constantly evolving environment.

A. Uncertain data mining

There has been a growing interest in uncertain data mining [1], including clustering [2], [3], [4], [5], classification [6], [7], [8], outlier detection [9], frequent pattern mining [10], [11], streams mining [12], and skyline analysis [13] on uncertain data. An important branch of mining uncertain data is to build classification models on uncertain data. While [6], [7] study the classification of uncertain data using the support vector model, [8] performs classification using decision trees. This paper explores yet another classification model, Gaussian classifiers, and extends them to handle uncertain data. The key problem in the Gaussian process method is class-conditional density estimation. Traditionally, the class-conditional density is estimated from data points. For uncertain classification problems, however, we should learn the class-conditional density from uncertain data objects represented by probability distributions.

II. RESEARCH BACKGROUND

A. General Structure of Gaussian Process Models

The conditional distribution p(y | x) describes the dependency of an observable y on a corresponding input x ∈ 𝒳. The class of models described in this section assumes that this relation can be decomposed into a systematic and a random component. Furthermore, the systematic dependency is given by a latent function f : 𝒳 → ℝ such that the sampling distribution, i.e. the likelihood, is of the form

p(y | f(x), θ)    (2.1)

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 9, December 2010, ISSN 1947-5500, http://sites.google.com/site/ijcsis/

which describes the random aspects of the data-generating process. In general, we will use θ to denote additional parameters of the likelihood besides f. Note that the conditional distribution of the observable y depends on x via the value of the latent function f at x only. The aim of inference is to identify the systematic component f from empirical observations and prior beliefs. The data comes in the form of pairwise observations D = {(y_i, x_i) | i = 1, ..., m}, where m is the number of samples. Let X = [x_1, ..., x_m]ᵀ and y = [y_1, ..., y_m]ᵀ collect the inputs and responses respectively. In general we assume x ∈ ℝⁿ unless stated otherwise. The name Gaussian process model refers to using a Gaussian process (GP) as a prior on f. The Gaussian process prior is non-parametric in the sense that instead of assuming a particular parametric form f(x, θ) and making inference about θ, the approach is to put a prior on function values directly. Each input position x ∈ 𝒳 has an associated random variable f(x). A Gaussian process prior on f technically means that a priori the joint distribution of a collection of function values f associated with any collection of m′ inputs [x_1, ..., x_{m′}]ᵀ is multivariate normal (Gaussian)

p(f | X, ψ) = N(f | m, K)    (2.2)

with mean m and covariance matrix K. A Gaussian process is specified by a mean function m(x) and a covariance function k(x, x′, ψ) such that K_ij = k(x_i, x_j, ψ) and m = [m(x_1), ..., m(x_{m′})]ᵀ. By choosing a particular form of covariance function we may introduce hyper-parameters ψ into the Gaussian process prior. Depending on the actual form of the covariance function k(x, x′, ψ), the hyper-parameters ψ can control various aspects of the Gaussian process.
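As a concrete illustration of this prior (a sketch, not from the paper: the squared-exponential covariance k(x, x′; ψ) = v·exp(−(w/2)(x − x′)²) and the values v = w = 1 are assumed here), function values drawn at a finite grid of inputs follow the multivariate normal of eq. (2.2):

```python
import numpy as np

def se_cov(x1, x2, v=1.0, w=1.0):
    """Squared-exponential covariance k(x, x'; psi) = v * exp(-(w/2) (x - x')^2)."""
    d = x1[:, None] - x2[None, :]
    return v * np.exp(-0.5 * w * d**2)

rng = np.random.default_rng(0)
xs = np.linspace(-3.0, 3.0, 50)
# Prior covariance matrix K_ij = k(x_i, x_j; psi), with a small jitter
# added to the diagonal for numerical stability.
K = se_cov(xs, xs) + 1e-9 * np.eye(len(xs))
# Three draws from the zero-mean GP prior p(f | X, psi) = N(f | 0, K).
samples = rng.multivariate_normal(np.zeros(len(xs)), K, size=3)
```

Each row of `samples` is one random function evaluated on the grid; the hyper-parameters (v, w) control the amplitude and length-scale of the draws.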

The sampling distribution (2.1) of y depends on f only through f(x). As an effect, the likelihood of f given D factorizes,

p(y | f, θ) = ∏_{i=1}^{m} p(y_i | f(x_i), θ)    (2.3)

and depends on f only through its value at the observed inputs f. According to the model, conditioning the likelihood on f is equivalent to conditioning on the full function f. This is of central importance since it allows us to make inference over finite-dimensional quantities instead of handling the whole function f. The posterior distribution of the function values f is computed according to Bayes' rule

p(f | D, θ, ψ) = p(y | f, θ) p(f | X, ψ) / p(y | X, θ, ψ) = [ N(f | m, K) / p(y | X, θ, ψ) ] ∏_{i=1}^{m} p(y_i | f_i, θ)    (2.4)

where f_i = f(x_i).

The posterior distribution of f can be used to compute the posterior predictive distribution of f(x_*) for any input x_*, where in the following the asterisk is used to mark test examples. If several test cases are given, X_* collects the test inputs and f_* denotes the corresponding vector of latent function values. The predictive distribution of f_* is obtained by integration over the posterior uncertainty

p(f_* | D, X_*, θ, ψ) = ∫ p(f_* | f, X, X_*, ψ) p(f | D, θ, ψ) df    (2.5)

where the first term of the right-hand side describes the dependency of f_* on f induced by the GP prior. The joint prior distribution of f and f_* due to the GP prior is multivariate normal

p(f, f_* | X, X_*, ψ) = N( [f; f_*] | [m; m_*], [[K, K_*], [K_*ᵀ, K_**]] )    (2.6)

where the covariance matrix is partitioned such that K_** is the prior covariance matrix of f_* and K_* contains the covariances between f and f_*. The conditional distribution of f_* | f can be obtained from the joint distribution (2.6) to give

p(f_* | f, X, X_*, ψ) = N( f_* | m_* + K_*ᵀ K⁻¹ (f − m), K_** − K_*ᵀ K⁻¹ K_* )    (2.7)

which is again multivariate normal. The simplest possible model assumes that the function can be observed directly, y = f(x), so that the posterior on f becomes p(f | D) = δ(f − y), describing that no posterior uncertainty about f remains. From this posterior the predictive distribution of f_* can be obtained according to eq. (2.5), which corresponds to simply replacing f by y in eq. (2.7). Figure 1 shows the posterior Gaussian process which is obtained by conditioning the prior on the five observations depicted as points. The predictive uncertainty is zero at the locations where the function value has been observed. Between the observations the uncertainty about the function value grows, and the sampled functions represent valid hypotheses about f under the posterior process.

Figure 1 shows the posterior Gaussian process conditioned on the five observations.
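The conditioning behind Figure 1 can be sketched numerically with eq. (2.7), taking m = 0 and direct observations y = f(x). This is an illustration under assumed values: the five observation locations, the placeholder targets y = sin(x), and the squared-exponential kernel are made up, not taken from the paper.

```python
import numpy as np

def se_cov(a, b, v=1.0, w=1.0):
    """Squared-exponential covariance (illustrative choice of kernel)."""
    d = a[:, None] - b[None, :]
    return v * np.exp(-0.5 * w * d**2)

# Five noise-free observations: y = f(x) observed directly.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.sin(x)                        # placeholder observations
xs = np.linspace(-3.0, 3.0, 121)     # test inputs x_*

K   = se_cov(x, x) + 1e-10 * np.eye(len(x))  # K with jitter
Ks  = se_cov(x, xs)                  # K_*: covariances between f and f_*
Kss = se_cov(xs, xs)                 # K_**: prior covariance of f_*

# Eq. (2.7) with m = 0: posterior mean and covariance of f_* given f = y.
alpha = np.linalg.solve(K, y)
mean = Ks.T @ alpha
cov  = Kss - Ks.T @ np.linalg.solve(K, Ks)
var  = np.clip(np.diag(cov), 0.0, None)
```

The predictive variance is (numerically) zero at the five observed locations and grows between and beyond them, matching the behaviour described above.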

III. OVERVIEW OF THE PROBLEM

We assume the following statistical model

t = f(x) + ε_t    (3.1)

where x is a D-dimensional input and ε_t is additive Gaussian output uncertainty such that ε_t ~ N(0, v_t), where v_t is the unknown data variance. Such a model implies that

E[t | x] = f(x)    (3.2)

Now, let x = u + ε_x, or x ~ N(u, v_x I), where I is the D × D identity matrix and v_x is the input data variance. In this case, the expectation of t given the characteristics of x is obtained by integrating over the input distribution

E[t | u, v_x] = ∫ f(x) p(x) dx    (3.3)


This integral cannot be solved analytically without approximations for many forms of f(x).

A. Analytical approximation using the delta method

The function f of the random argument x can always be approximated by a second-order Taylor expansion around the mean u of x:

f(x) = f(u) + (x − u)ᵀ f′(u) + ½ (x − u)ᵀ f″(u) (x − u) + O(‖x − u‖³)    (3.4)

where f′(u) = ∂f(x)/∂x and f″(u) = ∂²f(x)/∂x ∂xᵀ, evaluated at x = u. Within this approximation, we can now solve the integral (3.3). We have

E[t | u, v_x] = ∫ [ f(u) + (x − u)ᵀ f′(u) + ½ (x − u)ᵀ f″(u) (x − u) ] p(x) dx = f(u) + ½ Tr[f″(u) v_x I] = f(u) + (v_x/2) Tr[f″(u)]    (3.5)

where Tr denotes the trace. Thus, the new generative model for our data is

t = g(u, v_x) + ε_t,  where  g(u, v_x) = f(u) + (v_x/2) Tr[f″(u)]    (3.6)

IV. RELATED WORK

A. Defining a new Gaussian process

In the case of uncertain or random inputs, the new input/output relationship is given by (3.6), where the former function f, in the noise-free case, has been replaced by g(u, v_x) = f(u) + (v_x/2) Tr[f″(u)]. If we put a Gaussian prior on f(u), we can derive the corresponding prior on its second derivative and then define the prior on the space of admissible functions g(u, v_x), which is viewed as the sum of the two correlated random functions f(u) and (v_x/2) Tr[f″(u)]. We use results from the theory of random functions [3.5]. Let us recall that if X(r) and Y(r) are two random functions of the same argument r, with expected values m_x(r) and m_y(r) and covariance functions C_x(r, r′) and C_y(r, r′) respectively, then the mean and covariance function of Z(r) = X(r) + Y(r) are given by

m_z(r) = m_x(r) + m_y(r)    (4.1)

C_z(r, r′) = C_x(r, r′) + C_y(r, r′) + C_xy(r, r′) + C_yx(r, r′)    (4.2)

where, in the case that X(r) and Y(r) are correlated, C_xy(r, r′) and C_yx(r, r′) are the cross-covariance functions. We can now apply this to our function g(·). Let us first derive the mean and covariance function of g(u, v_x) in the one-dimensional case and then extend these expressions to D dimensions. Given that f(u) has zero mean and covariance function C(u_i, u_j) given by

C(u_i, u_j) = v exp( −½ ∑_{d=1}^{D} w_d (u_i^d − u_j^d)² )

its second derivative f″(u) has zero mean and covariance function ∂⁴C(u_i, u_j)/∂u_i² ∂u_j². It is then straightforward that (v_x/2) f″(u) has zero mean and covariance function (v_x²/4) ∂⁴C(u_i, u_j)/∂u_i² ∂u_j²; also, the cross-covariance function between f(u) and (v_x/2) f″(u) is given by (v_x/2) ∂²C(u_i, u_j)/∂u_i². Therefore, using the fact that ∂²C(u_i, u_j)/∂u_i² = ∂²C(u_i, u_j)/∂u_j², in one dimension g(u, v_x) = f(u) + (v_x/2) f″(u) has zero mean and covariance function

cov[g(u_i, v_x), g(u_j, v_x)] = C(u_i, u_j) + (v_x²/4) ∂⁴C(u_i, u_j)/∂u_i² ∂u_j² + v_x ∂²C(u_i, u_j)/∂u_i²    (4.3)

In the case of D-dimensional inputs, we have

cov[g(u_i, v_x), g(u_j, v_x)] = C(u_i, u_j) + (v_x²/4) Tr[ ∂⁴C(u_i, u_j)/∂u_i ∂u_iᵀ ∂u_j ∂u_jᵀ ] + v_x Tr[ ∂²C(u_i, u_j)/∂u_i ∂u_iᵀ ]    (4.4)

where ∂⁴C(u_i, u_j)/∂u_i ∂u_iᵀ ∂u_j ∂u_jᵀ is a D × D matrix, each entry of which is a D × D block; the block (r, s) contains ∂⁴C(u_i, u_j)/∂u_i^r ∂u_i^s ∂u_j ∂u_jᵀ. So we see that the first term of the corrected covariance function corresponds to the noise-free case, plus two correction terms weighted by the input noise variance, which might be either learnt or assumed to be known a priori.
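The one-dimensional corrected covariance (4.3) can be sketched numerically. This is an illustration only: the hyper-parameter values are assumed, and the kernel derivatives are approximated by central finite differences rather than the closed-form expressions.

```python
import numpy as np

v, w, v_x = 1.0, 1.0, 0.05   # kernel amplitude, width, input noise variance (assumed)

def C(ui, uj):
    """Squared-exponential covariance C(u_i, u_j) = v * exp(-(w/2)(u_i - u_j)^2)."""
    return v * np.exp(-0.5 * w * (ui - uj)**2)

def d2_dui2(ui, uj, h=1e-3):
    """d^2 C / d u_i^2 by central finite differences."""
    return (C(ui + h, uj) - 2 * C(ui, uj) + C(ui - h, uj)) / h**2

def d4_dui2_duj2(ui, uj, h=1e-3):
    """d^4 C / d u_i^2 d u_j^2 by nesting central finite differences."""
    return (d2_dui2(ui, uj + h) - 2 * d2_dui2(ui, uj) + d2_dui2(ui, uj - h)) / h**2

def corrected_cov(ui, uj):
    """Eq. (4.3): C + (v_x^2 / 4) d4C + v_x * d2C."""
    return (C(ui, uj)
            + 0.25 * v_x**2 * d4_dui2_duj2(ui, uj)
            + v_x * d2_dui2(ui, uj))
```

With v_x = 0 the correction vanishes and the corrected covariance reduces to C; the function is symmetric in its arguments, as a covariance must be. On the diagonal the corrected value is v − v_x w + (3/4) v_x² w² for this kernel.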

B. Inference and prediction

Within this approximation, the likelihood of the data {t_1, ..., t_N} is readily obtained. We have t | U ~ N(0, Q) with

Q_ij = Σ′_ij + v_t δ_ij    (4.5)

where t is the N × 1 vector of observed targets, U is the N × D matrix of input means, Σ′_ij is given by (4.4), and δ_ij = 1 when i = j, 0 otherwise. The parameters Θ = [w_1, ..., w_D, v, v_x, v_t] can then be learnt either in a maximum-likelihood framework or in a Bayesian way, by assigning priors and computing their posterior distribution. When using the usual GP, the predictive distribution of a model output corresponding to a new input u_*, p(f(u_*) | Θ, {u, t}), is Gaussian, with mean and variance respectively given by
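The standard GP predictive mean and variance for a new input u_* take the form μ(u_*) = q(u_*)ᵀ Q⁻¹ t and σ²(u_*) = k(u_*, u_*) − q(u_*)ᵀ Q⁻¹ q(u_*), where q collects covariances between the training inputs and u_*. A sketch under simplifying assumptions (D = 1; Σ′ stood in by the plain squared-exponential kernel, i.e. the v_x = 0 limit of the corrected covariance; inputs and targets made up for illustration):

```python
import numpy as np

v, w, v_t = 1.0, 1.0, 0.1    # kernel amplitude, width, output noise variance (assumed)

def k(a, b):
    d = a[:, None] - b[None, :]
    return v * np.exp(-0.5 * w * d**2)

U = np.array([-2.0, -0.5, 0.7, 1.8])   # input means (N x 1 case, D = 1)
t = np.sin(U)                          # observed targets (placeholder)

# Eq. (4.5): Q_ij = Sigma'_ij + v_t * delta_ij; here Sigma' is stood in for
# by the plain SE kernel (the v_x = 0 limit of the corrected covariance).
Q = k(U, U) + v_t * np.eye(len(U))

u_star = np.array([0.0])
q = k(U, u_star)                       # covariances with the test input

# Standard GP predictive mean and variance at the new input u_*.
mean = (q.T @ np.linalg.solve(Q, t)).item()
var = (k(u_star, u_star) - q.T @ np.linalg.solve(Q, q)).item()
```

The predictive variance is positive (because of the output noise v_t) yet smaller than the prior variance v, since nearby observations constrain f(u_*).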
