Data Warehouses for Uncertain Data
Hoda M. O. Mokhtar
Abstract: Data warehousing is one of the most powerful business intelligence (BI) tools today. A data warehouse stores historical data that is integrated from many sources and processes it in a multidimensional approach that makes it easy to use for efficient decision making. However, most data warehouse designs so far are based on the assumption that data in the data warehouse is either true or assumed true until a new snapshot occurs. Today, many real-world applications require handling uncertain data. Sensor networks, a wide range of location-based services (LBS), and many other applications deal with data that is not guaranteed to be 100% accurate. Inspired by the importance of those newly emerging applications, in this paper we propose a novel framework for data warehouses that efficiently handles both exact and uncertain data. We present the application of our model in the context of sensor networks and show that analyzing uncertain data can also be achieved.
Index Terms: Data warehouses, analyzing fuzzy data, uncertain data warehouses, sensor data.
1 INTRODUCTION
Uncertainty is an inherent property of much real-world data. Even with the current advances in sensor data acquisition and positioning systems (GPS), acquiring 100% accurate (exact) data is not feasible. In most real-world applications the acquired data is approximate, uncertain, fuzzy, or only near-accurate. Acquiring an exact reading of a sensor at every time instant, or obtaining the exact location of a moving object at every time instant, is not possible.
Handling data uncertainty thus requires special treatment compared to regular traditional data that is assumed to be always true. Handling uncertain and fuzzy data has been discussed in the database community in several research works, including [1-6]. Probabilistic databases and fuzzy databases are among the techniques proposed to deal with data uncertainty in the database environment. In addition, several approaches were presented for querying, managing, storing, and mining uncertain data [7-9]. However, elevating this to consider uncertain and fuzzy historical data in data warehouses has not been thoroughly investigated [10, 11]. In general, data warehouses were introduced to aid managers and decision makers in making the most efficient decisions. A data warehouse is simply defined as a subject-oriented, consistent, time-variant, and nonvolatile store of data that basically gains its power through storing and handling measurable historical data [12]. Today, data warehousing is widely accepted by many organizations as an effective and efficient business intelligence and decision-making tool. A key characteristic of data warehouses is the usage of multidimensional models (i.e., the star schema and the snowflake schema). These models enable the data warehouse to provide OLAP (On-Line Analytical Processing) capabilities that enrich the querying process.
However, current data warehouse models are built on the assumption that data is true until a new instance (snapshot) occurs. This assumption, although valid in some applications where obtaining exact values is possible, seems unrealistic in many newly emerging applications where data errors can occur. Sensor failure, calibration error, measurement inaccuracy, sampling discrepancy, and even outdated data are all normal sources of data inaccuracy. These factors affect the nature of the data stored in the database and consequently transferred to the data warehouse for further processing. Suppose, for example, we have a data warehouse to aid meteorologists in making decisions based on weather monitoring readings. If we determine that one of the basic sensors is likely to give incorrect readings, how do we know which reading is wrong? Do we input all the readings into the warehouse and treat them as if they were 100% accurate? What if we have more than one faulty sensor? How do we combine those erroneous readings and aggregate them? Inspired by the role of data warehouses in many applications and the effect of data uncertainty on query results and manipulation, in this paper we investigate the design of a data warehouse schema that is capable of handling fuzzy, uncertain data in an efficient way. The main contributions of the paper are:
1. Proposing a model for representing uncertainty in data warehouses.
2. Extending the traditional star schema model to capture and handle uncertain data.
- Hoda M. O. Mokhtar is with the Faculty of Computers and Information, Information Systems Dept., Cairo University, Postal Code: 12613, Egypt.
2011 Journal of Computing Press, NY, USA, ISSN 2151-9617
http://sites.google.com/site/journalofcomputing/
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 6, JUNE 2011, ISSN 2151-9617
3. Presenting the application of uncertain data warehouses in a real application (i.e., weather data acquired from sensors).
The rest of the paper is organized as follows: Section 2 presents a brief overview of previous related work. Section 3 discusses our design approach for uncertain data warehouses. Section 4 discusses the application of our design to handle sensor data (specifically weather data). Finally, Section 5 concludes and proposes directions for future work.
2 RELATED WORK
Data uncertainty is an inherent property of various real-world applications. Managing uncertainty in database systems has been the focus of much research work, especially recently with the advent of various applications that are based on measured data [13-16]. Handling uncertain data has been the focus of several research works in the database community. In general, data uncertainty is a natural consequence of either measurement errors or incomplete data.
For example, acquiring the exact location of a moving object at every time instant is not feasible. Thus, approximation and prediction techniques are used to obtain missing locations. This, in turn, affects the degree of accuracy of the data stored in the database. Handling uncertain data can be divided into two main directions: proposing efficient techniques to model the uncertain data, and incorporating those models to serve different applications [17, 1, 18].
Modeling uncertain data in the database context was discussed in several works [1, 2, 4]. Generally, uncertainty in the database environment is classified as either value uncertainty (attribute uncertainty) or tuple uncertainty (existential uncertainty). In the first case, one or more fields might contain uncertain data, while in the second case, the uncertainty concerns whether the whole record exists or not. Dealing with both cases was the focus of several works. In [19], the authors consider the use of probability density functions (PDFs) to represent the probabilistic nature of data. Although the solution is neat, a strong probabilistic background is needed to maintain such a solution. In [20], the authors considered incomplete data, and thus fuzzy sets were used as an alternative approach. Using fuzzy sets [21], the authors are able to consider a range of values rather than a single exact value as used before. Fuzzy set theory is the theory behind the usage of fuzzy values. It stipulates that not only are there values for a given object classification, but these objects also have degrees of membership to their categories (i.e., fuzzy sets). Thus, fuzzy sets are sets that include values (just like normal sets) as well as a membership value (also known as a degree of truth) that indicates how strongly a certain value belongs to the set. Hence, an element is associated with a membership value that reflects the degree of confidence of its belonging to a certain set of values. Another solution was introduced in [22]; in that paper the authors used NULL values for the missing entries. Other work focused on querying uncertain data. In the moving object context, probabilistic range queries were introduced in [15]; the authors present a model and a query answering approach that employs stochastic processes to answer queries over uncertain data. In [23] the authors consider sensor networks and present a solution for indexing uncertain data in a sensor network environment. In [24] the authors consider the problem of outlier detection in uncertain sensor data. Lately, research was directed to consider mining and aggregating uncertain data. Proposing different clustering techniques for uncertain data was the focus of some works, including [18, 10, 25].
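To make the membership idea above concrete, a degree-of-truth computation can be sketched in Python. The "warm" temperature category and its breakpoints below are purely illustrative assumptions, not taken from any of the cited works.

```python
# A minimal sketch of a trapezoidal membership function for a
# hypothetical "warm" temperature fuzzy set; the breakpoints
# (15, 20, 30, 35 degrees C) are invented for illustration.
def warm_membership(temp_c: float) -> float:
    """Degree of truth, in [0, 1], that a temperature is 'warm'."""
    if temp_c <= 15 or temp_c >= 35:
        return 0.0                      # clearly outside the set
    if 20 <= temp_c <= 30:
        return 1.0                      # fully inside the set
    if temp_c < 20:
        return (temp_c - 15) / 5        # rising edge, 15..20
    return (35 - temp_c) / 5            # falling edge, 30..35
```

A reading of 25 degrees C belongs fully to the set (membership 1.0), while 17.5 degrees C belongs only partially (membership 0.5), which is exactly the degree-of-truth interpretation described above.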
In this paper we continue to study uncertain data. However, we follow a different perspective: we consider uncertainty in the data warehouse environment, and more specifically, how to model and represent uncertainty in historical data stored in a data warehouse.
3 UNCERTAIN DATA WAREHOUSES
In this section we present our approach to designing a data warehouse (DW) that is capable of storing, managing, and analyzing both exact and uncertain data. Following the traditional data warehouse definition, we continue to have a consistent, time-variant, subject-oriented, and historical store of data, with the additional feature of handling uncertain data.
Although data in the data warehouse is historical in nature, the degree of confidence in each value stored in the data warehouse need not be the same. Consider, for example, a weather sensor that monitors temperature readings. Once the sensor acquires a reading, a snapshot is automatically generated in the DW that basically records the sensor identifier, the reading, and the reading time. If the sensor had a measurement error, the history of that sensor is thus affected, and consequently future analysis and reports can be affected as well. Motivated by this kind of commonly occurring real-world uncertainty situation, in this section we present our proposed DW design to handle those situations both efficiently and effectively.
Our solution offers a way to handle uncertain values in a data warehouse, be they probabilistic or fuzzy. The main key of our solution is based on an important conclusion presented in [26]. This conclusion states that both a probability density function (PDF) and a membership function produce values that imply the same thing. The idea behind this conclusion is the fact that a membership function produces values in the range [0, 1] which indicate how close an element is to a certain set, and consequently measure its chance of belonging to this set, whereas a PDF also returns a value in the range [0, 1] that indicates the probability of occurrence of a certain random variable in an observation space. This mapping between the two measures allows them to have the same interpretation. This in turn allows us to treat the fuzzy set to which an element belongs in the same way we treat its probability of occurrence. Hence, the closer the values are to 1, the more likely the element belongs to a fuzzy set and the higher its probability of occurrence. Using this conclusion we are able to consider uncertain data regardless of its nature (probabilistic or fuzzy). In either case we consider the possible range of values for a certain value (for example, a sensor reading) and maintain a confidence value in the range [0, 1] that expresses our belief level in this acquired value. Working in a data warehouse environment, we use a multidimensional model to model the data in the data warehouse. In this paper we only consider the denormalized star schema as it is widely used in many systems. Following the traditional star schema, our proposed schema has two main components, namely the fact table, where numeric facts (measures) are kept for future analysis and decision making, and the dimension tables, where verbose attributes are stored.
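As an illustration of these two components, a fact table carrying a per-measure confidence value alongside its dimension tables can be sketched as SQLite DDL driven from Python. All table and column names below are hypothetical stand-ins, not the concrete schema of Fig. 1.

```python
import sqlite3

# Sketch of a u-star schema: dimension tables plus a fact table whose
# extra "confidence" column stores the belief level in [0, 1] for the
# recorded measure. Names are illustrative, not the paper's Fig. 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_sensor (
    sensor_id   INTEGER PRIMARY KEY,
    sensor_type TEXT,        -- verbose descriptive attributes
    location    TEXT
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    day     INTEGER,
    month   INTEGER,
    year    INTEGER
);
CREATE TABLE fact_reading (
    sensor_id  INTEGER REFERENCES dim_sensor(sensor_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    reading    REAL,         -- the numeric measure itself
    confidence REAL          -- Confidence(M) in [0, 1]
);
""")
```

Queries can then filter or weight facts by the confidence column, which is what distinguishes this u-star sketch from a traditional star schema.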
Fig. 1. An Uncertain Star Schema (u-star schema)
Definition 2: An uncertain data warehouse (UDW) is a subject-oriented, integrated, nonvolatile, and time-variant store for both exact and uncertain data. A UDW is modeled using a u-star schema with additional facts (measures) that reflect the degree of data uncertainty.
In both cases, the closer the Confidence(M) is to 1, the stronger our belief in the truth of the measure recorded in the fact table; the closer the confidence is to 0, the weaker our belief.
Definition 4: Given a data warehouse D with a fact table containing n records, let each record contain at least one fuzzy (probabilistic) attribute A with value $t_i[A]$, $1 \le i \le n$, such that:

$$t_i[A] = \langle u_i, o_i \rangle, \quad 1 \le i \le n$$

where $u_i$ is the value for attribute A in tuple i, and $o_i$ is the confidence measure for that value.
Then the average value of the fuzzy/probabilistic measures of attribute A presented in the fact table over the n records, denoted by uaverage(A), is defined as:

$$uaverage(A) = \frac{\sum (\text{measure} \times \text{confidence})}{\text{number of records}} = \frac{\sum_{i=1}^{n} u_i \cdot o_i}{n} \quad (1)$$
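A minimal Python sketch of Equation (1), assuming each fact-table record carries a (value, confidence) pair as in Definition 4; the function name is ours.

```python
# Confidence-weighted average of a fuzzy/probabilistic measure over
# the n fact-table records (Equation (1)): each record is a pair
# (u_i, o_i) of a value and its confidence in [0, 1].
def uaverage(records):
    """records: list of (value, confidence) pairs; returns the uaverage."""
    n = len(records)
    return sum(u * o for u, o in records) / n
```

For instance, two readings (10, 1.0) and (20, 0.5) yield (10 + 10) / 2 = 10.0, so low-confidence readings contribute proportionally less to the aggregate.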
The above definition, Definition 5, defines the error range for a special case of the Gaussian distribution (i.e., mean = 0). This definition is generalized in probability theory, yielding the following general formula for the confidence range.
Definition 6: Given a Gaussian random variable x that represents a measure (fact) in the fact table of an uncertain data warehouse, such that $x \sim N(\mu, \sigma)$, the error function erf(x) is defined as [27]:

$$erf(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt$$

Then the probability of a measurement falling in a range of $n\sigma$ around the mean $\mu$, for any n, is given by:

$$P(\mu - n\sigma < x < \mu + n\sigma) = erf\left(\frac{n}{\sqrt{2}}\right)$$
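Definition 6 maps directly onto the error function in Python's standard library; the helper name below is ours, not the paper's.

```python
import math

# Probability that a Gaussian measurement falls within n standard
# deviations of its mean: P(mu - n*sigma < x < mu + n*sigma)
# = erf(n / sqrt(2)), per Definition 6.
def prob_within_n_sigma(n: float) -> float:
    return math.erf(n / math.sqrt(2))
```

This reproduces the familiar 68-95-99.7 rule: about 0.6827 for n = 1 and 0.9545 for n = 2, which is the confidence attached to each range around the mean.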
Definition 7: A linear combination of independent Gaussian random variables is itself Gaussian:

$$Y = a_1 X_1 + a_2 X_2 + \ldots + a_n X_n + b \sim N(\mu, \sigma^2),$$

where

$$\mu = a_1 \mu_1 + a_2 \mu_2 + \ldots + a_n \mu_n + b,$$
$$\sigma^2 = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \ldots + a_n^2 \sigma_n^2.$$
Definition 8: If $X_i \sim N(\mu_i, \sigma_i^2)$ are independent random variables, then the average of those independent random variables, $\bar{X}$, is distributed as:

$$\bar{X} \sim N\left(\frac{\sum_i \mu_i}{n}, \frac{\sum_i \sigma_i^2}{n^2}\right).$$
Thus, the average aggregate for probabilistic measures can also be computed, yielding a normal distribution with $\mu$ and $\sigma$ as defined in Definition 7.
Finally, once the mean and standard deviation are calculated, the confidence interval can be computed as shown above, and the u-star schema can be modified as shown in Fig. 2 and used for further analysis.
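This aggregation step can be sketched in Python: the average of independent Gaussian measures is itself Gaussian (per Definition 8, a special case of Definition 7 with $a_i = 1/n$ and $b = 0$), and a symmetric confidence interval follows from its mean and standard deviation. The function names are illustrative.

```python
import math

# Distribution of the average of independent Gaussian measures
# X_i ~ N(mu_i, sigma_i^2): mean = (sum mu_i)/n, var = (sum sigma_i^2)/n^2.
def average_distribution(params):
    """params: list of (mu_i, sigma_i) pairs; returns (mean, std) of the average."""
    n = len(params)
    mean = sum(mu for mu, _ in params) / n
    var = sum(s * s for _, s in params) / (n * n)
    return mean, math.sqrt(var)

# Symmetric confidence interval (CI Min, CI Max) for a range of
# n_sigma standard deviations around the mean.
def confidence_interval(mean, std, n_sigma=1.0):
    return mean - n_sigma * std, mean + n_sigma * std
```

For two measures N(10, 4) and N(20, 4), the average is distributed as N(15, 2), and the interval bounds are what the fact table would store as measure CI Min and measure CI Max.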
4 UNCERTAIN DATA WAREHOUSES FOR WEATHER DATA
In this section we explore the usage of our proposed uncertain data warehouse design in building a UDW for synthetic weather data acquired from sensor readings. We assume we have sensors that measure temperature, pressure, rainfall, and humidity, and we treat those measurements as independent measures. We also assume that the sensors follow a normal distribution in their readings.
Employing our proposed design schema, we design the u-star schema shown in Fig. 3. In this schema we have a central fact table and 5 dimension tables (a date dimension and 4 weather sensor dimensions). We use a transaction fact table, so each row in the fact table represents the readings of the different sensors on a specific date at a specific hour. We assume that readings are observed every hour, so the lowest time granularity we have is the hour. This assumption can easily be relaxed in real applications, in which case it is preferable to split the date-time dimension into 2 separate dimensions (date and time) to avoid the explosion in the dimension table size. In the fact table we record the average values for each of the sensor readings using the uaverage measure defined earlier. In addition, we record the confidence range of each measure (CI) given the measure's attributes $(\mu, \sigma)$. This confidence interval is expressed in the fact table with 2 values, measure CI Min and measure CI Max, to express the minimum and maximum bounds of the interval. However, the fact table can keep more than 1 interval (i.e., more ranges around the mean); this will in turn affect the possible number of queries that we can issue, as each range translates to a different confidence in the occurrence of a certain reading.
Fig. 3. A u-star Schema for Weather Data
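A hypothetical fact-table row for this weather u-star schema might be assembled as sketched below, storing per measure its uaverage plus CI Min and CI Max bounds. The sensor values, confidences, and sigmas are invented for illustration, as is the helper name.

```python
# Sketch of building one transaction fact-table row: per measure we
# store the confidence-weighted average (uaverage) plus a symmetric
# CI Min / CI Max confidence range of n_sigma standard deviations.
def fact_row(date_id, hour, readings, n_sigma=1.0):
    """readings: {measure_name: (list of (value, confidence), sigma)}."""
    row = {"date_id": date_id, "hour": hour}
    for name, (pairs, sigma) in readings.items():
        avg = sum(u * o for u, o in pairs) / len(pairs)
        row[f"{name}_uaverage"] = avg
        row[f"{name}_ci_min"] = avg - n_sigma * sigma
        row[f"{name}_ci_max"] = avg + n_sigma * sigma
    return row

# Invented hourly readings for two of the four weather measures.
row = fact_row(20110601, 14, {
    "temperature": ([(24.8, 0.9), (25.2, 0.8)], 0.5),
    "humidity":    ([(61.0, 1.0), (59.0, 0.7)], 2.0),
})
```

Keeping additional intervals (e.g. a 2-sigma range) would simply add further CI Min / CI Max column pairs, matching the remark above about storing more than one range around the mean.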
5 CONCLUSIONS AND FUTURE WORK
In this paper we present a pioneering step towards designing uncertain data warehouses. Our approach is capable of handling both fuzzy and probabilistic measures. For the fuzzy values we use the membership function, and for the probabilistic values we use the confidence interval for Gaussian random processes. We propose an uncertain star schema framework that is able to capture the uncertainty of the measures stored in the fact table. For future work we target considering a wider range of distributions and investigating the effect of our proposed model on the mining process.
REFERENCES
[1] L. Antova, T. Jansen, C. Koch, and D. Olteanu, "Fast and simple relational processing of uncertain data," in IEEE 24th International Conference on Data Engineering (ICDE'08), Washington, DC, USA: IEEE Computer Society, 2008.
[2] N. Dalvi and D. Suciu, "Efficient query evaluation on probabilistic databases," The VLDB Journal, vol. 16, no. 4, pp. 523-544, 2007.
Hoda M. O. Mokhtar is currently an assistant professor in the Information Systems Dept., Faculty of Computers and Information, Cairo University. Dr. Hoda Mokhtar received her PhD in Computer Science in 2005 from the University of California, Santa Barbara. She received her MSc and BSc in 2000 and 1997, respectively, from the Computer Engineering Dept., Faculty of Engineering, Cairo University. Her research interests are database systems, moving object databases, data warehousing, and data mining.