JOURNAL OF COMPUTING, VOLUME 3, ISSUE 6, JUNE 2011, ISSN 2151-9617
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG
Data Warehouses for Uncertain Data
Hoda M. O. Mokhtar
 
Abstract
Data warehousing is one of the most powerful business intelligence (BI) tools nowadays. A data warehouse stores historical data that is integrated from many sources and processes it in a multidimensional approach to make it easy to use for efficient decision making. However, so far most data warehouse designs are based on the assumption that data in the data warehouse is either true, or true until a new snapshot occurs. Today, many real-world applications require handling uncertain data. Sensor networks, a wide range of location-based services (LBS), and many other applications deal with data that is not guaranteed to be 100% accurate. Inspired by the importance of those newly emerging applications, in this paper we propose a novel framework for data warehouses that efficiently handles both exact and uncertain data. We present the application of our model in the context of sensor networks and show that analyzing uncertain data can also be achieved.
 
Index Terms — Data Warehouses, analyzing fuzzy data, uncertain data warehouses, sensor data.
1 INTRODUCTION

Uncertainty is an inherent property of much real-world data. Even with the current advances in sensor data acquisition and positioning systems (GPSs), acquiring 100% accurate (exact) data is not feasible. In most real-world applications the acquired data is approximate, uncertain, fuzzy, or near-accurate. Acquiring an exact reading of a sensor at every time instant, or obtaining the exact location of a moving object at every time instant, is not possible. Handling data uncertainty thus requires special treatment compared to regular traditional data that is assumed to be always true. Handling uncertain and fuzzy data has been discussed in the database community in several research works including [1–6]. Probabilistic databases and fuzzy databases are among the techniques proposed to deal with data uncertainty in the database environment. In addition, several approaches were presented for querying, managing, storing, and mining uncertain data [7–9]. However, elevating this work to consider uncertain and fuzzy historical data in data warehouses has not been thoroughly investigated [10, 11].

In general, data warehouses were introduced to aid managers and decision makers in making the most efficient decisions. A data warehouse is simply defined as a subject-oriented, consistent, time-variant, and non-volatile store of data that gains its power through storing and handling measurable historical data [12]. Today, data warehousing is widely accepted by many organizations as an effective and efficient business intelligence and decision-making tool. A key characteristic of data warehouses is the usage of a multi-dimensional model (i.e., the star schema and snowflake schema). These models enable the data warehouse to provide OLAP (On-Line Analytical Processing) capabilities that enrich the querying process. However, current data warehouse models are built on the assumption that data is true until a new instance (snapshot) occurs. This assumption, although valid in some applications where obtaining exact values is possible, seems unrealistic in many newly emerging applications where data errors can occur. Sensor failure, calibration error, measurement inaccuracy, sampling discrepancy, and even outdated data are all normal sources of data inaccuracy. These factors affect the nature of the data stored in the database and consequently transferred to the data warehouse for further processing.

Suppose, for example, we have a data warehouse to aid meteorologists in making decisions based on weather monitoring readings. If we determine that one of the basic sensors is likely to give incorrect readings, how do we know which reading is wrong? Do we input all the readings into the warehouse and treat them as if they were 100% accurate? What if we have more than one faulty sensor? How do we combine those erroneous readings and aggregate them?

Inspired by the role of data warehouses in many applications and the effect of data uncertainty on query results and manipulation, in this paper we investigate the design of a data warehouse schema that is capable of handling fuzzy, uncertain data in an efficient way. The main contributions of the paper are:
1. Proposing a model for representing uncertainty in data warehouses.
2. Extending the traditional star schema model to capture and handle uncertain data.
————————————————
Hoda M. O. Mokhtar is with the Faculty of Computers and Information, Information Systems Dept., Cairo University, Postal Code: 12613, Egypt.
© 2011 Journal of Computing Press, NY, USA, ISSN 2151-9617
http://sites.google.com/site/journalofcomputing/
 
 
3. Presenting the application of uncertain data warehouses in a real application (i.e., weather data acquired from sensors).

The rest of the paper is organized as follows: Section 2 presents a brief overview of previous related work. Section 3 discusses our design approach for uncertain data warehouses. Section 4 discusses the application of our design to handle sensor data (specifically weather data). Finally, Section 5 concludes and proposes directions for future work.
 
2 RELATED WORK
 
Data uncertainty is an inherent property in various real-world applications, and managing uncertainty in database systems has been the focus of many research works, especially recently with the advent of various applications that are based on measured data [13–16]. In general, data uncertainty is a natural consequence of either measurement errors or incomplete data. For example, acquiring the exact location of a moving object at every time instant is not feasible. Thus, approximation and prediction techniques are used to obtain missing locations. This in turn affects the degree of accuracy of the data stored in the database.

Handling uncertain data can be divided into two main directions: proposing efficient techniques to model the uncertain data, and incorporating those models to serve different applications [17, 1, 18]. Modeling uncertain data in the database context was discussed in several works [1, 2, 4]. Generally, uncertainty in the database environment is classified as either value uncertainty (attribute uncertainty) or tuple uncertainty (existential uncertainty). In the first case, one or more fields might contain uncertain data, while in the second case the uncertainty concerns the whole record, as to whether it exists or not. Dealing with both cases was the focus of several works.
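The two kinds of uncertainty above can be sketched as a minimal data model (an illustration of our own; the class and field names are assumptions, not taken from the cited works):

```python
from dataclasses import dataclass

@dataclass
class UncertainValue:
    """Value (attribute) uncertainty: a field holds a value plus a
    confidence in [0, 1] that the value is the true reading."""
    value: float
    confidence: float

@dataclass
class UncertainTuple:
    """Tuple (existential) uncertainty: the whole record may or may not
    exist; `existence` is the probability that it does."""
    sensor_id: str
    reading: UncertainValue
    existence: float

# a sensor snapshot whose value is 90% trusted and whose record
# itself exists with probability 0.95
r = UncertainTuple("s1", UncertainValue(23.5, 0.9), existence=0.95)
print(r.reading.value, r.reading.confidence, r.existence)
```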
 
In [19], the authors consider the use of probability density functions (PDFs) to represent the probabilistic nature of data. Although the solution is neat, a strong probabilistic background is needed to maintain such a solution. In [20], the authors considered incomplete data, so fuzzy sets were an alternative approach. Using fuzzy sets [21], the authors are able to consider a range of values rather than a single exact value as used before. Fuzzy set theory is the theory behind the usage of fuzzy values. It stipulates that not only are there values for a given object classification, but these objects also have degrees of membership to their categories (i.e., fuzzy sets). Thus, fuzzy sets are sets that include values (just like normal sets) as well as a membership value (also known as a degree of truth) that indicates how strongly a certain value belongs to the set. Hence, an element is associated with a membership value that reflects the degree of confidence of its belonging to a certain set of values.
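As a concrete illustration of degrees of membership (our own example, not one from [21]): a triangular membership function for a fuzzy set such as "warm temperature" maps each value to a degree of truth in [0, 1].

```python
def triangular_membership(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set that rises
    from zero at a to a peak of 1 at b and falls back to zero at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# "warm" peaks at 25 degrees and vanishes below 15 or above 35
print(triangular_membership(25.0, 15.0, 25.0, 35.0))  # 1.0 (fully warm)
print(triangular_membership(20.0, 15.0, 25.0, 35.0))  # 0.5 (partially warm)
print(triangular_membership(40.0, 15.0, 25.0, 35.0))  # 0.0 (not warm)
```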
 
Another solution was introduced in [22], where the authors used NULL values for the missing entries. Other work focused on querying uncertain data. In the moving-object context, probabilistic range queries were introduced in [15], where the authors present a model and query-answering approach that employs stochastic processes to answer queries over uncertain data. In [23] the authors consider sensor networks and present a solution for indexing uncertain data in a sensor network environment. In [24] the authors consider the problem of outlier detection in uncertain sensor data. Lately, research has been directed toward mining and aggregating uncertain data; proposing different clustering techniques for uncertain data was the focus of some works including [18, 10, 25].

In this paper we continue to study uncertain data. However, we follow a different perspective: we consider uncertainty in the data warehouse environment, more specifically, how to model and represent uncertainty in the historical data stored in a data warehouse.
 
3 UNCERTAIN DATA WAREHOUSES
 
In this section we present our approach to designing a data warehouse (DW) that is capable of storing, managing, and analyzing both exact and uncertain data. Following the traditional data warehouse definition, we continue to have a consistent, time-variant, subject-oriented, and historical store of data, with the additional feature of handling uncertain data. Although data in the data warehouse is historical in nature, the degree of confidence in each value stored in the data warehouse need not be the same. Consider for example a weather sensor that monitors temperature readings: once the sensor acquires a reading, a snapshot is automatically generated in the DW that basically records the sensor identifier, the reading, and the reading time. If the sensor had a measurement error, the history of that sensor is affected, and consequently future analysis and reports can be affected as well. Motivated by this kind of commonly occurring real-world uncertainty, in this section we present our proposed DW design to handle those situations both efficiently and effectively.

Our solution offers a way to handle uncertain values in a data warehouse, be they probabilistic or fuzzy. The main key of our solution is based on an important conclusion presented in [26], which states that both a probability density function (PDF) and a membership function produce values that imply the same thing. The idea behind this conclusion is the fact that a membership function produces values in the range [0,1] which indicate how close an element is to a certain set, and consequently measure its chance of belonging to this set, whereas a PDF also returns a value in the range [0,1] that indicates the probability of occurrence of a certain random variable in an observation space. This mapping between the two measures can enforce them to have the same interpretation. This in turn allows us to treat the fuzzy set to which an element belongs in the same way we treat its probability of occurrence. Hence, the closer the values are to 1, the more likely the element belongs to a fuzzy set and the
higher its probability of occurrence. Using this conclusion we are able to consider uncertain data regardless of its nature (probabilistic or fuzzy). In either case we consider the possible range of values for a certain value (for example a sensor reading) and maintain a confidence value in the range [0,1] that expresses our belief level in this acquired value.
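One plausible way to realize such a unified confidence (a sketch of our own, not the paper's prescribed implementation) is to normalize a Gaussian likelihood by its peak, so that, like a membership degree, the result always falls in [0, 1]:

```python
import math

def gaussian_confidence(x, mu, sigma):
    """Confidence in [0, 1] for reading x under a Gaussian error model:
    the likelihood of x divided by the peak likelihood at the mean mu,
    so a reading exactly at mu gets confidence 1.0."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# a temperature reading two degrees away from the sensor's expected value
print(round(gaussian_confidence(27.0, 25.0, 2.0), 3))  # 0.607
print(gaussian_confidence(25.0, 25.0, 2.0))            # 1.0
```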
 
Working in a data warehouse environment, we use a multi-dimensional model to represent data in the data warehouse. In this paper we only consider the de-normalized star schema, as it is widely used in many systems. Following the traditional star schema, our proposed schema has two main components: the fact table, where numeric facts (measures) are kept for future analysis and decision making, and the dimension tables, where verbose attributes are stored.
 
Definition 1. An uncertain star schema (ustar schema) is a multidimensional data model. An uncertain star schema has a central fact table where both exact and uncertainty facts are represented, along with descriptive dimension tables. Here, uncertainty facts are numeric facts (measures) that express the degree of confidence of the data stored in the data warehouse.

An uncertain star schema with its main components is illustrated in Fig. 1.
 
Fig. 1. An Uncertain Star Schema (ustar schema)
 
Definition 2. An uncertain data warehouse (UDW) is a subject-oriented, integrated, non-volatile, and time-variant store for both exact and uncertain data. A UDW is modeled using a ustar schema with additional facts (measures) that reflect the degree of data uncertainty.
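A minimal relational sketch of a ustar schema along these lines (our illustration; the table and column names are assumptions, not taken from the paper) pairs each measure with its confidence fact:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- dimension table: descriptive attributes of each sensor
CREATE TABLE sensor_dim (
    sensor_id INTEGER PRIMARY KEY,
    location  TEXT,
    model     TEXT
);
-- fact table: the measure plus an uncertainty fact (confidence)
CREATE TABLE reading_fact (
    sensor_id   INTEGER REFERENCES sensor_dim(sensor_id),
    reading_ts  TEXT,
    temperature REAL,   -- exact or uncertain measure
    confidence  REAL    -- uncertainty fact in [0, 1]
);
""")
conn.execute("INSERT INTO sensor_dim VALUES (1, 'Cairo', 'T-100')")
conn.execute(
    "INSERT INTO reading_fact VALUES (1, '2011-06-01T12:00', 25.4, 0.9)"
)
row = conn.execute(
    "SELECT temperature, confidence FROM reading_fact"
).fetchone()
print(row)  # (25.4, 0.9)
```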
 
Having a schema to model the uncertain data in the data warehouse, the next target is to allow data analysis. Since we continue to use a star-like schema, regular On-Line Analytical Processing (OLAP) operations are still applicable; thus, we can still drill-down, roll-up, slice, and dice. The main function that needs further investigation is the aggregation function. For aggregate functions, working in an uncertain context forces us to consider average values rather than exact values. Thus, we focus on efficiently calculating the average over uncertain data using the measures we are keeping in the fact table.

Each measure in our model is associated with a confidence level that indicates our belief in the truth of the data value. Consequently, it is necessary to incorporate these measures into the cube. Trying to aggregate the values presents a problem, as those measures are no longer discrete values but rather fuzzy or probabilistic values, and are thus semi-additive facts. Therefore, simple aggregations can yield meaningless values, and further attention is required. Hence, averaging the values is the common solution for aggregating semi-additive values.
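A toy illustration of why confidence facts are semi-additive (our own example): summing them across snapshots produces a number that is no longer a valid confidence, while averaging stays interpretable in [0, 1].

```python
# three snapshots of the same sensor, each carrying a confidence fact
confidences = [0.9, 0.8, 0.7]

total = sum(confidences)                        # about 2.4: not a confidence
average = sum(confidences) / len(confidences)   # about 0.8: still in [0, 1]

print(round(total, 6), round(average, 6))
```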
 
The first approach to averaging confidence-based values is to use the uaverage function defined below.
 
Definition 3. Given a data warehouse D, with a fuzzy/probabilistic measure M in the fact table, and a fact table with n records, the confidence of the measure M, Confidence(M), is a real number in the range [0, 1] such that:
1. If M is a random variable, then Confidence(M) = the probability of occurrence of M according to a given probability density function (pdf) with parameters (μ, σ), where μ and σ are the mean and standard deviation respectively.
2. If M is a fuzzy variable over a fuzzy set F, then Confidence(M) = membership_function(M, F).

In both cases, the closer Confidence(M) is to 1, the greater our belief in the truth of the measure recorded in the fact table; the closer the confidence is to 0, the less our belief is.
 
Definition 4. Given a data warehouse D with a fact table containing n records, let each record contain at least one fuzzy (probabilistic) attribute A with value t_i[A], 1 ≤ i ≤ n, such that:

t_i[A] = (u_i, λ_i),  1 ≤ i ≤ n

where u_i is the value of attribute A in tuple i, and λ_i is the confidence measure for that value. Then the average value of the fuzzy/probabilistic measure of attribute A presented in the fact table over the n records, denoted uaverage(A), is defined as:

uaverage(A) = Σ(measure × confidence) / number of records = (Σ_{i=1}^{n} u_i · λ_i) / n    (1)
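Equation (1) can be sketched directly in code (a minimal illustration of our own; each pair is a value u_i with its confidence λ_i as in Definition 4):

```python
def uaverage(pairs):
    """uaverage over an uncertain attribute: each pair is (u_i, lam_i),
    a measure value and its confidence in [0, 1].
    Implements uaverage(A) = (sum of u_i * lam_i) / n."""
    n = len(pairs)
    return sum(u * lam for u, lam in pairs) / n

# three temperature snapshots with their confidence facts
readings = [(24.0, 1.0), (30.0, 0.5), (26.0, 1.0)]
print(uaverage(readings))  # (24 + 15 + 26) / 3, about 21.67
```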
 
 
