
IGI PUBLISHING
ITJ3902
701 E. Chocolate Avenue, Suite 200, Hershey PA 17033-1240, USA
Tel: 717/533-8845; Fax 717/533-8661; URL: http://www.igi-pub.com

This paper appears in the publication International Journal of Data Warehousing and Mining, Volume 3, Issue 4, edited by David Taniar. © 2007, IGI Global

International Journal of Data Warehousing & Mining, 3(4), 1-30, October-December 2007

Semantics-Aware Advanced OLAP Visualization of Multidimensional Data Cubes
Alfredo Cuzzocrea, University of Calabria, Italy
Domenico Saccà, University of Calabria, Italy
Paolo Serafino, University of Calabria, Italy

ABSTRACT

Efficiently supporting advanced OLAP visualization of multidimensional data cubes is a novel and challenging research topic, which is of interest for a large family of data warehouse applications relying on the management of spatio-temporal (e.g., mobile) data, scientific and statistical data, sensor network data, biological data, etc. On the other hand, the issue of visualizing multidimensional data domains has been quite neglected by the research community, since it does not belong to the well-founded conceptual-logical-physical design hierarchy inherited from relational database methodologies. Inspired by these considerations, in this article we propose an innovative advanced OLAP visualization technique that meaningfully combines (i) the so-called OLAP dimension flattening process, which allows us to extract two-dimensional OLAP views from multidimensional data cubes, and (ii) very efficient data compression techniques for such views, which allow us to generate “semantics-aware” compressed representations where data are grouped along OLAP hierarchies.

Keywords: approximate query answering; data cube compression; OLAP; OLAP visualization

Copyright © 2007, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

INTRODUCTION
OLAP systems (Chaudhuri & Dayal, 1997; Codd, Codd, & Salley, 1993; Inmon, 1996; Kimball, 1996) have rapidly gained momentum in both the academic and research communities, mainly due to their capability of exploring and querying huge amounts of data according to a multidimensional and multi-resolution vision. Research-wise, three relevant challenges of OLAP have captured the attention of researchers during the last years: (i) the data querying problem, which concerns how data are accessed and queried to support summarized knowledge extraction from massive data cubes; (ii) the data modeling problem, which concerns how data are represented and, thus, processed inside OLAP servers (e.g., during query evaluation); and (iii) the data visualization problem, which concerns how data are presented to OLAP users and decision makers in data warehouse environments. Indeed, research communities have mainly studied and investigated the first two problems, whereas the last one, although important and rich in practical applications, has very often been neglected.

Approximate query answering (AQA) techniques address the first challenge, and can reasonably be considered one of the most important topics in OLAP research. The main proposal of AQA techniques consists in providing approximate answers to resource-consuming OLAP queries (e.g., range (Ho, Agrawal, Megiddo, & Srikant, 1997), top-k (Fang, Shivakumar, Garcia-Molina, Motwani, & Ullman, 1998), and iceberg (Xin, Han, Cheng, & Li, 2006) queries) instead of computing exact answers, as decimal precision is usually negligible in OLAP query and report activities (e.g., see Cuzzocrea, 2005). Due to the considerable interest from the data warehouse research community, AQA techniques have been intensively investigated during the last years, with the achievement of important results. Among others, histograms (e.g., Acharya, Poosala, & Ramaswamy, 1999; Bruno, Chaudhuri, & Gravano, 2001; Gunopulos, Kollios, Tsotras, & Domeniconi, 2000; Muralikrishna & DeWitt, 1998; Poosala & Ioannidis, 1997), wavelets (Vitter, Wang, & Iyer, 1998), and sampling (e.g., Babcock, Chaudhuri, & Das, 2003; Chaudhuri, Das, Datar, Motwani, & Rastogi, 2001; Cuzzocrea & Wang, 2007; Gibbons & Matias, 1998) are the most successful techniques, and they have also induced several applications in contexts other than OLAP, like P2P data management (e.g., Gupta, Agrawal, & El Abbadi, 2003). In summary, with respect to the OLAP context, AQA techniques propose (i) computing compressed representations of multidimensional data cubes, and (ii) evaluating (approximate) answers against such representations via ad-hoc query algorithms that usually take advantage of their hierarchical nature, which, in turn, is inherited from that of the input data cubes.

Conceptual data models for OLAP are widely recognized as based on data cube concepts like dimension, hierarchy, level, member, and measure, first introduced by Gray et al. (1997), which inspired various models for multidimensional databases and data cubes (e.g., Agrawal et al., 1997; Hacid & Sattler, 1998; Thanh Binh, Min Tjoa, & Wagner, 2000; Tsois, Karayannidis, & Sellis, 2001; Vassiliadis, 1998; Vassiliadis & Sellis, 1999). Nevertheless, despite this effort, several papers have recently highlighted some formal limitations of accepted conceptual models for OLAP (e.g., Cabibbo & Torlone, 1998), or theoretical failures of popular data cube operations, like aggregation functions (e.g., Lehner, Albrecht, & Wedekind, 1998; Lenz & Shoshani, 1997; Lenz & Thalheim, 2001).

Contrary to data querying and modeling issues, since data presentation models do not properly belong to the well-founded conceptual-logical-physical design hierarchy for relational databases (which has also been inherited by multidimensional models (Vassiliadis et al., 1999)), the problem of OLAP data visualization has so far been studied and investigated only marginally (Gebhardt, Jarke, & Jacobs, 1997; Inselberg, 2001; Keim, 1997; Maniatis, Vassiliadis, Skiadopoulos, & Vassiliou, 2003a, 2003b). On the other hand, since OLAP is a technology focused on supporting decision making, and thus based on (sensitive) information exploration and browsing, it is easy to understand that, in future years, tools for advanced visualization of multidimensional data cubes will quickly conquer the OLAP research scene.

Starting from the fundamentals of data cube compression techniques and OLAP data visualization research issues, in this article we argue for meaningfully exploiting the main results of the former and the goals of the latter in a combined manner, and propose a novel technique for supporting advanced OLAP visualization of multidimensional data cubes. The basic motivation of such an approach is realizing that (i) compressing data is an efficient way of visualizing data, and (ii) this intuition is well-founded at large (i.e., for any data-intensive system relying on massive data repositories), and, more specifically, it is particularly targeted to the OLAP context, where
accessing multidimensional data cubes can become a realistic bottleneck for data warehouse systems and applications. For instance, as we better motivate in Section 2, this is the case of mobile OLAP, which has recently attracted considerable attention from the data warehouse research community.

Another contribution of our work is represented by the wide experimental analysis we conducted in order to test the effectiveness of our proposed technique. To this end, we performed various kinds of experiments with respect to several metrics, and against different classes of data cubes; specifically, we considered synthetic, benchmark, and real data cubes. Results of these experiments confirm that our proposed technique outperforms similar state-of-the-art initiatives with respect to both the accuracy and visualization goals.

Technique Overview
Briefly, our proposed technique relies on two steps. The first one consists of generating a two-dimensional OLAP view D from the input multidimensional data cube A by means of an innovative approach that allows us to flatten OLAP dimensions (of A) and, as a consequence, effectively support exploration and browsing activities against A (via D), by overcoming the natural disorientation and refractoriness of human beings in dealing with hyper-spaces. Specifically, the (two) OLAP dimensions on which D is defined are built from the dimensions of A according to the analysis goals of the target OLAP user/application. The idea of using views to tame computational overheads due to data management and query processing tasks against massive data warehouses is not novel in the literature, and it has been extensively investigated across the last decade (e.g., Ezeife, 2001; Harinarayan, Rajaraman, & Ullman, 1996), with relevant results. The second step consists of generating a bucket-based compressed representation of D, namely the hierarchy-driven indexed quad-tree summary (H-IQTS), denoted by H-IQTS(D), which meaningfully extends the compression technique for two-dimensional summary data domains presented by us in Buccafurri, Furfaro, Saccà, and Sirangelo (2003) by introducing the amenity of generating semantics-aware buckets (i.e., buckets that “follow” groups of the OLAP hierarchies of D). In other words, we use the OLAP hierarchies defined on the dimensions of D to drive the compression process. This allows us to achieve space efficiency while, at the same time, supporting approximate query answering and advanced OLAP visualization features against multidimensional data cubes.

Article Outline
The remaining part of this article is organized as follows. In the second section, we describe mobile OLAP application scenarios where the advanced visualization technique for multidimensional data cubes we propose assumes a critical role as an enabling technology. In the third section, we outline the background of our proposal, which is represented by the compression techniques presented in Buccafurri et al. (2003). In the fourth section, we provide a motivating example illustrating the merit of our idea of making use of semantics-aware compressed representations of two-dimensional OLAP views. In the fifth section, we provide fundamentals and basic definitions used throughout the article. The sixth section is devoted to the description of our innovative OLAP dimension flattening process. In the seventh section, we illustrate the hierarchy-driven two-dimensional OLAP view compression algorithm we propose. The eighth section focuses on a comprehensive experimental evaluation of our technique against different classes of data cubes. Finally, in the ninth section we derive conclusions of our work, and draw future directions for further research in this field.

APPLICATION SCENARIOS
The technique we propose in this article can be successfully applied to all those scenarios in which accessing and exploring massive multidimensional data cubes is a critical requirement. For instance, this is the case of mobile OLAP systems and applications, where users access
corporate OLAP servers via handheld devices. In fact, mobile devices are usually characterized by specific properties (e.g., small storage space, small display size, discontinuous connection to the WLAN, etc.) that are often incompatible with the need to browse and query summarized information extracted from massive multidimensional data cubes made accessible through wireless networks.

In such application scenarios, flattening multidimensional data cubes into two-dimensional OLAP views represents an effective solution as well as an enabling technology for mobile OLAP environments since, contrary to what happens for hyper-spaces, handheld devices can easily visualize two-dimensional spaces on conventional (e.g., 2D) screens. This property, along with the realistic need to compress data to be transmitted and processed by handheld devices, gives perfect sense to our idea of using data compression techniques as a way of visualizing OLAP data. Moreover, the amenity of driving the compression process by means of OLAP hierarchies, thus meaningfully generating semantics-aware buckets, further corroborates the application of our proposed technique to mobile OLAP environments, as the limited computational capabilities of handheld devices force us to process only useful knowledge and discard the useless one, since resource-consuming transactions cannot feasibly be processed by this kind of device.

As a motivating application scenario, these results can be successfully applied to the system Hand-OLAP, proposed by us in Cuzzocrea et al. (2003), which allows OLAP users to extract, browse, and query compressed two-dimensional views (computed via the technique of Buccafurri et al., 2003) coming from a remote OLAP server (see Figure 1). Specifically, according to the guidelines of the algorithms proposed in Buccafurri et al. (2003), Hand-OLAP is targeted at supporting range-queries, a very popular class of queries useful for extracting summarized knowledge from data cubes in the form of aggregate information (Ho et al., 1997). The basic idea that Hand-OLAP is based on is the following: rather than querying the original multidimensional data, it may be more convenient to generate a compressed view of them, store the view on the handheld device, and query it locally, even if the WLAN is off, thus obtaining approximate answers that, as discussed in the first section, are perfectly suitable for OLAP goals (e.g., see Cuzzocrea, 2005).

Figure 1. The system Hand-OLAP: Overview
[Diagram labels: Data Warehouse, OLAP Server, Extraction of a Two-Dimensional View, Two-Dimensional View, Compression of the Two-Dimensional View, Compressed Two-Dimensional View, Handheld Device]

According to well-known design patterns, Hand-OLAP is a multi-tier system, and every software layer corresponds to a specific application logic (see Figure 2). Specifically, the following software layers can be identified in the Hand-OLAP logical architecture:

• Data Sources Layer: It is the collection of (i) OLAP servers from which the desired information can be retrieved, and (ii) wrappers that extract meta-information about the available data cubes as well as the actual data;
• Application Server Layer: It is the layer that (i) elaborates OLAP-users’ requests, (ii) interacts with OLAP servers, (iii) computes the compressed representation of the extracted OLAP view, and (iv) sends it to the handheld device;
• User’s Layer: It includes the client-side tool that allows a handheld device to acquire and elaborate the desired information, by enabling useful functionalities such as (i) connectivity services, (ii) metadata querying and browsing, and (iii) range-query managing (e.g., editing, executing, browsing, refreshing, etc.).

Figure 2. The system Hand-OLAP: Logical architecture
[Diagram labels: wrapper (four instances), Metadata Manager, View Manager, Request Manager, Compression Agent, Query Manager; Data Sources Layer, Application Server Layer, User’s Layer]

The application server layer, which is the most interesting component of Hand-OLAP with respect to the Data Engineering point of view, consists of three components which cooperate to fulfill OLAP-users’ requests:

• Request Manager: It is the component that receives the request of OLAP users, and translates it either into a request to the Metadata Manager for retrieving meta-information about the content of the target data cube, or into a request to the View Manager for retrieving a compressed representation of the two-dimensional OLAP view defined by OLAP users;
• Metadata Manager: It is the component that extracts meta-information about the OLAP server it is connected to, and returns it in an XML format;
• View Manager: It is the component that (i) extracts from the selected data cube the two-dimensional view defined by OLAP users, (ii) uses the compression agent for summarizing it, and (iii) returns the compressed representation to the handheld device;
• Compression Agent: It is the component that receives a two-dimensional view from
the view manager and returns its compressed representation to it; in particular, the view manager sends the extracted two-dimensional view to the compression agent together with the value of the desired compression ratio, which depends on both the amount of storage space available at the handheld device and the size of the view;
• Query Manager: It is the component that is in charge of supporting range-query evaluation on the compressed two-dimensional view, and visualizing the results on the handheld device according to a partitioned hierarchical representation.

Indeed, as we discuss next, the actual capabilities of Hand-OLAP can be further improved by integrating into its core layer advanced OLAP visualization features developed on top of the technique we propose in this article.

BACKGROUND
Given a two-dimensional summary data domain D, the technique proposed in Buccafurri et al. (2003) allows us to obtain a compact data structure called quad-tree summary (QTS), which is founded on a quad-tree-based partitioned representation of D, denoted by QTS(D), where, at each iteration of the partition-generating process, (i) the current bucket b in QTS(D) to be split is greedily chosen by selecting the one having the maximum sum of squared errors (SSE), and (ii) b is split into four equal-size square sub-buckets that are added to the current partition. Specifically, given a bucket b, the SSE of b, denoted by SSE(b), is defined as follows:

$$\mathrm{SSE}(b) = \sum_{k \in b} \left( D[k] - \mathrm{AVG}(b) \right)^2 \qquad (1)$$

such that: (i) k denotes a position inside b, (ii) D[k] is the value of D at position k, and (iii) AVG(b) is the average of the values inside b. This task is iterated until the storage space B available for housing QTS(D) is consumed. The “natural” representation of QTS(D) is a quad-tree whose nodes correspond to buckets of the partition and store the sum of all the items contained within such buckets. As shown in Buccafurri et al. (2003), due to its hierarchical nature, QTS is particularly suitable for evaluating range-queries. To further improve query capabilities, the leaf buckets of QTS having non-uniform data distribution, on which traditional interpolation techniques would fail, are equipped with very compact data structures called indexes. Indexes can be efficiently represented in a few bytes, and provide succinct descriptions of the data distributions of the buckets they summarize. This approach leads to the definition of an extended version of QTS called indexed quad-tree summary (IQTS). Indexes allow us to definitively augment the quality of intra-bucket query estimation, thus outperforming general-purpose state-of-the-art compression techniques like histograms and wavelets (Buccafurri et al., 2003).

In Buccafurri et al. (2003), we define three index types with different organizations of sub-buckets, so that we can select the index that best approximates the data distribution inside a given bucket: (i) the 2/3LT-Index, which is suitable for distributions with no strong asymmetry; (ii) the 2/4LT-Index, which is oriented to biased distributions; (iii) the 2/p(eak)LT-Index, which is designed for capturing distributions having a few high-density peaks. As an example, here we focus on an instance of Index, the 2/3LT-Index (see Figure 3), which is built for a leaf bucket of a compressed two-dimensional OLAP view. The sum of all the items contained in such a bucket is equal to 50. The index is obtained as follows: the bucket is partitioned into four equal-size sub-buckets and, in turn, each of the four sub-buckets into four further equal-size sub-sub-buckets. The index stores approximate aggregate data about both the generated sub-buckets and sub-sub-buckets. Such aggregate data consist of the sums of items contained in the regions colored in grey (see Figure 3). The values of the sums are stored using fewer than 32 bits, introducing some approximation. The number of bits used for each stored value depends on the size of the corresponding sub-bucket. That is, referring to Figure 3, we use 6 bits for both the regions A and B (which have
the same size), and 5 bits for C, whose size is half that of A and B. Analogously, we use 4 bits for D and E (whose size is half that of C), and so on. We point out that saving one bit for storing the sum of C with respect to A can be justified by considering that, on average, the value of the sum of items inside C is half of the sum corresponding to A, since the size of C is half the size of A. Thus, on average, the accuracy of representing A using 6 bits is the same as the accuracy of representing C using 5 bits.

Figure 3. Building a 2/3LT-index and its compressed representation in memory

The previously described index is based on a balanced quad-tree partition of a bucket. Different types of index can be used, based on different partitions. For instance, we can build an index based on an unbalanced quad-tree partition. Such an Index is more suitable for a bucket where items are distributed heterogeneously (i.e., buckets consisting of some regions containing very skewed data distributions and other regions with rather uniform distributions). The detailed description of these kinds of index, along with their experimental evaluation on both synthetic and real data sets, can be found in Buccafurri et al. (2003).

As regards the issue of selecting the most suitable Index for a bucket, the choice is made on the basis of the actual distribution of data inside the bucket. That is, we measure the approximation error introduced by each index, and select the index that provides the best accuracy. In order to measure the approximation error of an Index I(b) on a bucket b, we use the following error metric:

$$\varepsilon(I(b)) = \sum_{i=1}^{64} \left( \mathrm{sum}(q_i) - \mathrm{sum}_I(q_i) \right)^2 \qquad (2)$$

where qi represents the i-th (among 64) sub-bucket of b obtained by dividing its sides into eight equal-size ranges, and sumI(qi) represents the estimation of the sum of items
occurring in qi, which can be computed by using I(b) and the knowledge of sum(b). Given a bucket b, we choose the 2/nLT-Index that “originates” the minimum value of ε(I(b)) (see Buccafurri et al., 2003, for further details).

With respect to the results achieved in Buccafurri et al. (2003), in this article we investigate the problem of providing a compressed representation of a given two-dimensional OLAP view D, where D is extracted from the target data cube A by means of the previously mentioned OLAP dimension flattening process, instead of processing two-dimensional summary data domains (as in Buccafurri et al., 2003). This requires us to handle OLAP hierarchies defined on the dimensions of D, thus achieving an innovative contribution with respect to the goals of Buccafurri et al. (2003). Indeed, summary data considered in Buccafurri et al. (2003) resemble OLAP data in the application scenario we address in this article, but summary data do not expose hierarchies and do not require us to handle and deal with the semantics of hierarchies.

Due to the need to handle OLAP hierarchies, H-IQTS(D) adds to IQTS(D) the amenity of generating a quad-tree based partitioned representation of D according to the semantics provided by the hierarchies defined on the dimensions of D (i.e., as highlighted in the first section, using the hierarchies to drive the compression process). In fact, in the presence of hierarchies on the dimensions, neglecting such information (as would happen by adopting the quad-tree based partitioning scheme of Buccafurri et al., 2003) could lead to the undesirable condition of obtaining buckets storing aggregate values computed over OLAP data related to items belonging to different groups within the same hierarchy. In more detail, due to its generating process and contrary to IQTS(D), H-IQTS(D) can also house rectangular buckets instead of square buckets only, since an arbitrary data cube exposes, without any loss of generality, arbitrary groups in the hierarchies. Similarly to IQTS(D), H-IQTS(D) is shaped as a quad-tree, and the information stored in its buckets is still the sum of items contained within them. Just like IQTS(D), leaf nodes of H-IQTS(D) are equipped with Indexes in order to improve query capabilities.

MOTIVATING EXAMPLE
To become convinced of the benefits coming from the idea of using OLAP hierarchies to drive the compression process, consider the following example. Let A be a two-dimensional data cube defined on top of relational data sources storing sale data, and having as measure the total amount of Sales (e.g., 15,000 €) of a given product (e.g., t-shirt), belonging to the dimension product, in a given city (e.g., Lisbon), belonging to the dimension zone. Consider the quad-tree-based partitioning scheme for A depicted in the left side of Figure 4. This scheme presents “wrong” buckets, as items related to Chicago (belonging to the dimension zone) are aggregated in the lower-left bucket along with items related to cities located in Europe (e.g., Prague, Berlin, Munich), instead of being aggregated in the upper-left bucket along with items related to cities located in America (e.g., New York, Vancouver, Toronto). The same happens with Raincoat (belonging to the dimension product), whose items are aggregated in the lower-right bucket along with items related to summer clothes (e.g., sunglasses, bikini, t-shirt), instead of being aggregated in the lower-left bucket along with items related to winter clothes (e.g., gloves, hat, scarf). On the contrary, consider the hierarchy-driven partitioning scheme for A depicted in the right side of Figure 4. As an alternative to the previous one, this scheme follows the hierarchies defined on the dimensions, and, as a consequence, buckets are computed on top of measures related to the same semantic domain. It should be noted that this condition is desirable at large, but it assumes a more relevant role for the context we address, as, typically, the compression process causes the loss of the structure (in terms of OLAP schemas) of data cubes.

Now, consider the benefits due to the described approach in a mobile OLAP setting like the one drawn by the system Hand-OLAP. In Hand-OLAP, compressed views extracted from remote OLAP servers are mainly explored and
browsed via popular DRILL-DOWN OLAP operations (i.e., increasing the level of detail of OLAP data) implemented via splits over buckets of the view. Nevertheless, since each split partitions the current bucket into four equal-size sub-buckets, OLAP users could be required to perform many splits before accessing the summarized knowledge they are interested in, as “wrong” buckets could be accessed during the exploration task. On the contrary, by admitting semantics-aware buckets, since OLAP analysis is subject-oriented (Han & Kamber, 2000), OLAP users access the summarized knowledge of interest faster than in the previous case, as each split partitions the current bucket into four sub-buckets computed over semantically-related OLAP data.

Figure 4. Equal-size quad-tree based partition (left) and hierarchy-driven quad-tree based partition (right) of the product-zone data cube

FUNDAMENTALS AND BASIC DEFINITIONS
In order to better understand our proposal, it is necessary to introduce some fundamentals and basic definitions regarding the constructs of the OLAP conceptual data model we adopt, along with the notation we use in the rest of the article. These definitions are compatible with the main results of previous popular models (e.g., Gray et al., 1997); a complete survey can be found in Vassiliadis et al. (1999).

Hierarchy, Member, Level, and OLAP Metadata
Given an OLAP dimension di and its domain of members Ψ(di), each of them denoted by ρj, a hierarchy defined on di, denoted by H(di), can be represented as a general tree (i.e., such that each node of the tree has a number n ≥ 0 of child nodes) built on top of Ψ(di). H(di) is usually obtained according to a bottom-up strategy by (i) setting the members in Ψ(di) as leaf nodes of H(di), and (ii) iteratively aggregating sets of members in Ψ(di) to obtain other (internal) members, each of them denoted by σj, which correspond to internal nodes in H(di). In turn, internal members in Ψ(di) (equally, nodes in H(di)) can be further aggregated to form other super-members until a unique aggregation of members is obtained; the latter corresponds to the root node of H(di), and it is known in the literature as the aggregation ALL. More precisely, ALL is only an artificial aggregation introduced to obtain a tree (i.e., H(di)) instead of a list of trees, each of them rooted in one of the second-level internal nodes σj, which should be the “effective” highest-level
partition of members in Ψ(di). Each member in H(di) is characterized by a level (of the hierarchy), denoted by Lj, such that Lj ≥ 0 (note that, when Lj = 0, σj ≡ ρj); as a consequence, we can define a level Lj in H(di) as a collection of members. For each level Lj, the ordering of Lj, denoted by O(Lj), is the one exposed by the OLAP server platform for the target data cube. Note that such ordering depends on how the knowledge held in (OLAP) data is produced, processed, and delivered.

Given a multidimensional data cube A such that Dim(A) = {d0, d1, …, dn-1} is the set of dimensions of A, and Hie(A) = {H(d0), H(d1), …, H(dn-1)} is the set of hierarchies defined on these dimensions, the collection of members σj at level Lj of each hierarchy H(di) in Hie(A) univocally refers, in a multidimensional fashion, to a certain (OLAP) data cell Cj,h in A at level Lj. In other words, Cj,h is the OLAP aggregation of data cells in A at level Lj. We name such a collection the j-level OLAP Metadata (for Cj,h), and denote it by M(Cj,h). Given a level Lj, the data cell Cj,h at Lj and the corresponding collection of OLAP metadata M(Cj,h), if we move up towards the level Lj+1 (i.e., by performing a roll-up (OLAP) operation on the hierarchy H(di)), we increase the level of abstraction and decrease the level of detail of both Cj,h and M(Cj,h), thus accessing the upper-level components Cj+1,m and M(Cj+1,m), with m ≠ h. Conversely, if we move down towards the level Lj-1 (i.e., by performing a drill-down (OLAP) operation on the hierarchy H(di)), we decrease the level of abstraction and increase the level of detail of both Cj,h and M(Cj,h), thus accessing the lower-level components Cj-1,m and M(Cj-1,m), with m ≠ h.

For instance, consider a four-dimensional data cube A defined on top of relational data sources containing insurance data and having as measure the average value of the refunds (e.g., 12,000 €) allocated in a given region (e.g., Seattle), during a given time interval (e.g., 2005), for a given employment class (e.g., bank clerk), and for a given kind of accident (e.g., car crash), such that city, year, employment, and kindofaccident are the members of the third level of the hierarchies defined on the dimensions zone, time, userclass, and accidentclass, respectively. Under the described OLAP schema, given the data cell C3,h in A with value Val(C3,h) = 2,000 € at the third level L3, the metadata set M(C3,h) could be defined as follows: M(C3,h) = {IL, 1,500, Accountant, AccidentsInIL}; by rolling up, we could access the upper-level components C2,m with value Val(C2,m) = 3,500 € and M(C2,m) = {USA, 2,000, AdministrativeManager, AccidentsInUSA}; by drilling down, we could access the lower-level components C4,m with value Val(C4,m) = 1,000 € and M(C4,m) = {Chicago, 900, AdministrativeOfficer, WorkAccidents}.

Left Boundary Member (LBM) and Right Boundary Member (RBM)
Given a member σj at level Lj of the hierarchy H(di) defined on an OLAP dimension di and the set of its child nodes Child(σj), which are members at level Lj+1, we define as the Left Boundary Member (LBM) of σj the child node of σj in Child(σj) that is the first in the ordering O(Lj+1). Analogously, we define as the Right Boundary Member (RBM) of σj the child node of σj in Child(σj) that is the last in the ordering O(Lj+1). As an example, consider the hierarchy H(di) depicted in Figure 5; here, (i) e is the LBM of b, (ii) f is the RBM of b, (iii) b is the LBM of a, (iv) d is the RBM of a, etc.

Figure 5. An OLAP hierarchy
[Figure: the hierarchy H(dj), rooted at a, with second-level members b, c, d and leaf members e, f, g, h, i, j, k]
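The LBM and RBM definitions can be illustrated with a minimal Python sketch. The `Member` class and the helper names `lbm` and `rbm` are hypothetical and introduced here only for illustration; the sketch assumes that the children of each member are stored in the ordering O(Lj+1) exposed by the OLAP server, and it encodes only the parent-child relations of Figure 5 that are stated in the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Member:
    """A node of an OLAP hierarchy; children are assumed to follow the level ordering."""
    name: str
    children: list["Member"] = field(default_factory=list)


def lbm(member: Member) -> Optional[Member]:
    """Left Boundary Member: the first child in the ordering, or None for a leaf."""
    return member.children[0] if member.children else None


def rbm(member: Member) -> Optional[Member]:
    """Right Boundary Member: the last child in the ordering, or None for a leaf."""
    return member.children[-1] if member.children else None


# Relations of Figure 5 stated in the text: e and f are the boundary children of b,
# while b and d are the boundary children of a.
b = Member("b", [Member("e"), Member("f")])
a = Member("a", [b, Member("c"), Member("d")])

print(lbm(b).name, rbm(b).name, lbm(a).name, rbm(a).name)
```

Keeping the children in an ordered list makes both boundary members constant-time lookups, which matters because the compression algorithm queries them repeatedly.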


OLAP DIMENSION FLATTENING
The OLAP dimension flattening process is the first step of our technique for supporting advanced OLAP visualization of multidimensional data cubes. In more detail, we flatten the dimensions of the input multidimensional data cube A into two specialized dimensions called visualization dimensions (VD), which support advanced OLAP visualization of A by constructing an ad-hoc two-dimensional OLAP view D defined on the VDs.

The process that allows us to obtain the two VDs from the dimensions of A works as follows. Let Dim(A) and Hie(A) be the set of dimensions and the set of hierarchies of A, respectively. Each VD is a tuple vi = 〈di, H*(di)〉 such that (i) di is the dimension selected by the target OLAP user/application, (ii) H*(di) is a hierarchy built by meaningfully merging the "original" hierarchy H(di) of di with the hierarchies of other dimensions in A according to an ordered definition set D(vi), defined as follows: D(vi) = {〈HLi, dj, Pj〉, 〈HLj, dj+1, Pj+1〉, …, 〈HLj+K-1, dj+K, Pj+K〉}, where K = |D(vi)| - 1. In more detail, for each pair of consecutive tuples 〈〈HLj, dj+1, Pj+1〉, 〈HLj+1, dj+2, Pj+2〉〉 in D(vi), the sub-tree of H(dj+2) rooted in the root node of H(dj+2) and having depth equal to Pj+2, denoted by H_S^{P_{j+2}}(d_{j+2}), is merged into H(dj+1) by appending a clone of it to each member σj+1 of level HLj+1, named the hooking level, in H(dj+1). From the described approach, it follows that: (i) the ordering of items in D(vi) defines the way H*(di) is built; (ii) the first hierarchy to be processed is just H(di). As an example of flattening two OLAP dimensions into a new one, consider Figure 6, where the hierarchy H*(dj) is obtained by merging H(dj+1) into H(dj) by setting Pj+1 = 1 and HLj = 1.

Figure 6. Merging OLAP hierarchies
[Figure: H(dj) is a tree rooted at a, with second-level members b, c, d and leaf members e to k; H(dj+1) is a tree with second-level members m, n and leaf members o, p, q, r; in H*(dj), rooted at a, the members m and n are appended below each of b, c, and d]

As regards data processing issues, it should be noted that, in order to finally compute D, due to the OLAP dimension flattening task above,


it is necessary to re-aggregate the multidimensional data in A according to the new VDs.

Algorithm build2DOLAPViewViaFlattening (see Figure 7) implements the OLAP dimension flattening process. It takes as input the following parameters: (i) the multidimensional data cube A; (ii) the dimension di of A selected as first VD; (iii) the dimension dj of A selected as second VD; (iv) the definition set for the first output VD vi, D(vi); (v) the definition set for the second output VD vj, D(vj). It returns as output the two-dimensional OLAP view D extracted from A by flattening the dimensions of A into the VDs according to the input definition sets. Specifically, build2DOLAPViewViaFlattening makes use of the following procedures: (i) buildVisualizationDimension, which takes as input the data cube A, a dimension d of A and the definition set D(v), and returns as output the VD v built on top of d according to the guidelines given above; (ii) getHierarchy, belonging to the utility package OLAPTools, which, applied to a VisualizationDimension object v, returns the modified hierarchy H* of v; (iii) aggregate2DOLAPView, belonging to the utility package OLAPTools, which takes as input a data cube A and two OLAP hierarchies Hi and Hj of A, and returns as output the two-dimensional OLAP view D extracted from A by re-aggregating multidimensional data in A according to Hi and Hj.

HIERARCHY-DRIVEN COMPRESSION OF TWO-DIMENSIONAL OLAP VIEWS
Compressing the two-dimensional OLAP view D (extracted from A according to the OLAP dimension flattening process described in the sixth section) is the second step of our proposed technique. Given D, for each step j of our compression algorithm, we need to (i) greedily select the leaf bucket b of H-IQTS(D) having maximum SSE (see the third section), and (ii) split b into four sub-buckets by investigating, for each dimension dk of D, the levels of the hierarchy H(dk). The first task is similar to what is proposed in Buccafurri et al. (2003) for two-dimensional summary data domains, whereas the novelty proposed in this article

Figure 7. Algorithm build2DOLAPViewViaFlattening

ALGORITHM build2DOLAPViewViaFlattening
Input: The multidimensional data cube A; the dimension of A selected as first VD, di;
the dimension of A selected as second VD, dj; the definition set for the first output VD vi, D(vi);
the definition set for the second output VD vj, D(vj).
Output: The two-dimensional OLAP view D.
import OLAPTools.*;
begin
OLAPTools.OLAPView D ← null;
OLAPTools.VisualizationDimension vi ← null;
OLAPTools.VisualizationDimension vj ← null;
OLAPTools.Hierarchy H*i ← null;
OLAPTools.Hierarchy H*j ← null;
vi ← buildVisualizationDimension(A,di, D(vi));
vj ← buildVisualizationDimension(A,dj,D(vj));
H*i ← vi.getHierarchy();
H*j ← vj.getHierarchy();
D ← OLAPTools.aggregate2DOLAPView(A,H*i,H*j);
return D;
end;
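The hierarchy-merging step that underlies buildVisualizationDimension can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Node` class and the functions `clone` and `merge` are hypothetical names; levels are counted from the root (level 0), as Figure 6 suggests; the host hierarchy is cut at the hooking level; and the root of the merged hierarchy is not copied, only its top levels. The child assignments of the leaf members and the root name `l` of the second hierarchy are also assumptions, since the text does not state them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


def clone(node: Node, depth: int) -> Node:
    """Copy `node` together with its descendants down to `depth` levels below it."""
    if depth <= 0:
        return Node(node.name)
    return Node(node.name, [clone(c, depth - 1) for c in node.children])


def merge(host: Node, guest: Node, hooking_level: int, guest_depth: int) -> Node:
    """Return a new hierarchy: `host` cut at `hooking_level`, with a clone of the
    top `guest_depth` levels of `guest` (root excluded) hooked to each member there."""
    if guest_depth < 1:
        raise ValueError("guest_depth must be at least 1")
    if hooking_level == 0:
        hooked = [clone(c, guest_depth - 1) for c in guest.children]
        return Node(host.name, hooked)
    return Node(
        host.name,
        [merge(c, guest, hooking_level - 1, guest_depth) for c in host.children],
    )


# Hierarchies in the spirit of Figure 6 (leaf assignments and root name "l" are assumed).
h_dj = Node("a", [
    Node("b", [Node("e"), Node("f")]),
    Node("c", [Node("g"), Node("h")]),
    Node("d", [Node("i"), Node("j"), Node("k")]),
])
h_dj1 = Node("l", [Node("m", [Node("o"), Node("p")]),
                   Node("n", [Node("q"), Node("r")])])

merged = merge(h_dj, h_dj1, hooking_level=1, guest_depth=1)
for member in merged.children:
    print(member.name, [c.name for c in member.children])
```

With Pj+1 = 1 and HLj = 1, each of b, c, and d receives its own copy of m and n, which matches the shape of H*(dj) in Figure 6. Cloning (rather than sharing) the appended sub-tree keeps the members of the flattened hierarchy distinct, so that data can later be re-aggregated per combined member.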


consists in the second task itself.

Formally, given the current bucket bj = D[lj,0:uj,0][lj,1:uj,1] to be split at step j of our compression algorithm, such that [lj,k:uj,k] is the range of bj on the dimension dk of D, the problem is finding, for each dimension dk of D, a splitting position Sj,k belonging to [lj,k:uj,k], i.e., lj,k ≤ Sj,k ≤ uj,k. To this end, for each dimension dk of D, our splitting strategy aims at (i) grouping items into buckets related to the same semantic domain, and (ii) keeping the hierarchy H(dk) as balanced as possible. In particular, the first aspect allows us to achieve the benefits highlighted in the fourth section; the second aspect allows us to significantly improve query estimation capabilities since, on the basis of this approach, we finally obtain buckets with balanced "numerousness" (of items), which introduce a smaller approximation error in the evaluation of (OLAP) queries involving several buckets than unbalanced buckets do (see Buccafurri et al. (2003) for further investigation). Moreover, this evidence has already been recognized in the context of Equi-Width histograms (Piatetsky-Shapiro & Connell, 1984).

A Hierarchy-Driven Algorithm for Compressing Two-Dimensional OLAP Views
For the sake of simplicity, we present our hierarchy-driven compression algorithm for two-dimensional OLAP views by showing how to handle the hierarchy of an OLAP dimension dk (i.e., how to determine a splitting position Sj,k on dk). Obviously, this technique must be performed for both dimensions of the target (two-dimensional) OLAP view D, thus obtaining, for each pair of splits at step j of our algorithm (i.e., Sj,0 and Sj,1), four two-dimensional buckets to be added to the current partition of D (i.e., H-IQTS(D)).

Figure 8. Modeling the splitting strategy
[Figure: the hierarchy H(dk), i.e., the tree T0 rooted at A, with second-level members B and C rooting the sub-trees T1 and T2, third-level members D to I, the underlying one-dimensional view D[0:17], and the splitting position Sj,k]

Let D be a two-dimensional data cube, and D[0:|dk| - 1] be a one-dimensional OLAP view of D obtained by projecting D with respect to dk (see Figure 8). Let bj = D[lj,k:uj,k] be the current (one-dimensional) bucket of D[0:|dk| - 1] to be split at step j. To determine Sj,k on [lj,k:uj,k], we denote as Tj,k(lj,k:uj,k) the sub-tree of H(dk) whose (i) leaf nodes are the members of the sets M(C0,h) defined on data cells C0,h in D[lj,k:uj,k] with lj,k ≤ h ≤ uj,k, and (ii) root node is the (singleton) member of the set M(Cp,r) defined on the data cell Cp,r that is the aggregation of


D[lj,k:uj,k] at level Lp of H(dk), p being the depth of Tj,k(lj,k:uj,k). To give an example, consider Figure 8. Here, tree T0, properly denoted by Tj,k(0:17), is related to the whole OLAP view D[0:17], and corresponds to the whole H(dk). At step j, dk is split at the position Sj,k = 11, thus generating the buckets D[0:11] and D[12:17]. As a consequence, tree T1, properly denoted by Tj+1,k(0:11), is related to D[0:11], whereas tree T2, properly denoted by Tj+1,k(12:17), is related to D[12:17].

Let (i) dk be the dimension of D to be processed, (ii) H(dk) be the hierarchy defined on dk, (iii) bj = D[lj,k:uj,k] be the current (one-dimensional) bucket to be split at step j of our algorithm, (iv) Tj,k(lj,k:uj,k) be the tree related to bj, and (v) T¹j,k(lj,k:uj,k) be the second level of Tj,k(lj,k:uj,k). In order to select the splitting position Sj,k on [lj,k:uj,k], we initially consider the data cell C0,k in D[lj,k:uj,k] whose indexer is in the middle of D[lj,k:uj,k], denoted by Xj,D, which is defined as follows:

$$ X_{j,D} = \left\lfloor \frac{1}{2} \cdot |D[l_{j,k}:u_{j,k}]| \right\rfloor \tag{3} $$

It should be noted that processing the second level of Tj,k(lj,k:uj,k) (i.e., T¹j,k(lj,k:uj,k)) derives from the usage of the aggregation ALL in OLAP conceptual models, which introduces an additional level in the general tree modeling an OLAP hierarchy (see the fifth section).

Then, starting from ρk, ρk being the (singleton; see the fifth section) member in the set M(C0,k), we go up on H(dk) until the ancestor of ρk at level T¹j,k(lj,k:uj,k), denoted by σk, is reached, and we decide how to determine Sj,k on the basis of the nature of σk. If σk is the LBM of the root node of Tj,k(lj,k:uj,k), denoted by Rj,k, then we have:

$$ S_{j,k} = \left\lfloor \frac{1}{2} \cdot |D[l_{j,k}:u_{j,k}]| \right\rfloor - 1 \tag{4} $$

and, as a consequence, we obtain the following two (one-dimensional) buckets as child buckets of bj:

$$ b'_{j+1} = D\left[ l_{j,k} : \left\lfloor \frac{1}{2} \cdot |D[l_{j,k}:u_{j,k}]| \right\rfloor - 1 \right] \tag{5} $$

and:

$$ b''_{j+1} = D\left[ \left\lfloor \frac{1}{2} \cdot |D[l_{j,k}:u_{j,k}]| \right\rfloor : u_{j,k} \right] \tag{6} $$

Otherwise, if σk is the RBM of Rj,k, then we have:

$$ S_{j,k} = \left\lfloor \frac{1}{2} \cdot |D[l_{j,k}:u_{j,k}]| \right\rfloor \tag{7} $$

and, as a consequence, we obtain the following buckets:

$$ b'_{j+1} = D\left[ l_{j,k} : \left\lfloor \frac{1}{2} \cdot |D[l_{j,k}:u_{j,k}]| \right\rfloor \right] \tag{8} $$

and:

$$ b''_{j+1} = D\left[ \left\lfloor \frac{1}{2} \cdot |D[l_{j,k}:u_{j,k}]| \right\rfloor + 1 : u_{j,k} \right] \tag{9} $$

Finally, if σk is different from both the LBM and the RBM of Rj,k (i.e., it follows the LBM of Rj,k and precedes the RBM of Rj,k in the ordering O(T¹j,k(lj,k:uj,k))), we perform a finite number of shift operations on the indexers of D[lj,k:uj,k], starting from the middle indexer Xj,D and within the range:

$$ \Gamma_{j,k} = \left[ \Gamma^{lo}_{j,k}, \Gamma^{up}_{j,k} \right] \tag{10} $$

such that:

$$ \Gamma^{lo}_{j,k} = \left\lfloor \frac{1}{2} \cdot |D[l_{j,k}:u_{j,k}]| \right\rfloor - \left\lfloor \frac{1}{3} \cdot |D[l_{j,k}:u_{j,k}]| \right\rfloor \tag{11} $$

and:

$$ \Gamma^{up}_{j,k} = \left\lfloor \frac{1}{2} \cdot |D[l_{j,k}:u_{j,k}]| \right\rfloor + \left\lfloor \frac{1}{3} \cdot |D[l_{j,k}:u_{j,k}]| \right\rfloor \tag{12} $$

These shift operations are repeated until a data cell Vj,k in D[lj,k:uj,k] is found such that the corresponding member σk at level T¹j,k(lj,k:uj,k) is the LBM or the RBM of Rj,k. It should be noted that admitting a maximum offset of ±⌊1/3 · |D[lj,k:uj,k]|⌋ with respect to the middle of the current bucket is coherent with the aim of maintaining the


hierarchy H(dk) as balanced as possible, which allows us to take advantage of the previously highlighted benefits (see the fourth section).

To this end, starting from the middle of Γj,k (which is equal to that of D[lj,k:uj,k], i.e., Xj,D), we search for the data cell Vj,k by iteratively considering indexers Ij,q within Γj,k defined by the following function:

$$ I_{j,q} = \begin{cases} X_{j,D} & q = 0 \\ I_{j,q-1} + (-1)^{q} \cdot q & q \geq 1 \end{cases} \tag{13} $$

If such a data cell Vj,k exists, then Sj,k is set equal to the so-determined indexer I*j,q, and, as a consequence, we obtain the pair of buckets:

$$ b'_{j+1} = D\left[ l_{j,k} : I^{*}_{j,q} - 1 \right] \tag{14} $$

and:

$$ b''_{j+1} = D\left[ I^{*}_{j,q} : u_{j,k} \right] \tag{15} $$

if the member corresponding to I*j,q is the LBM of Rj,k, or, alternatively, the pair of buckets:

$$ b'_{j+1} = D\left[ l_{j,k} : I^{*}_{j,q} \right] \tag{16} $$

and:

$$ b''_{j+1} = D\left[ I^{*}_{j,q} + 1 : u_{j,k} \right] \tag{17} $$

if the member corresponding to I*j,q is the RBM of Rj,k. On the contrary, if such a data cell Vj,k does not exist, then we do not perform any split on D[lj,k:uj,k], and we defer the splitting to the next step of the algorithm (i.e., j + 1), where the splitting position Sj+1,k is determined by processing the third level T²j+1,k(lj+1,k:uj+1,k) of the tree Tj+1,k(lj+1,k:uj+1,k) (i.e., by decreasing the aggregation level of OLAP data with respect to the previous step). The latter approach is iteratively repeated until a data cell Vj,k verifying the condition above is found; otherwise, if the leaf level of Tj,k(lj,k:uj,k) is reached without finding any admissible splitting point, then D[lj,k:uj,k] is added to the current partition of the OLAP view without being split. We point out that this approach still pursues the aim of obtaining balanced partitions of the input OLAP view.

The described approach is implemented by algorithm compress2DOLAPView (see Figure 9), which takes as input the two-dimensional OLAP view D and the amount of storage space B available for housing the compressed representation of D, and returns as output the data structure H-IQTS(D).

compress2DOLAPView makes use of the following procedures: (i) computeBucketSum, belonging to the utility package OLAPTools, which takes as input an OLAP view D and a Bucket object b (which implements a bucket of the partition of D), and returns as output the sum of the items contained in b; (ii) setSum, belonging to the utility package CompressionToolkit, which takes as input a Bucket object b and an integer Sum, and sets the sum stored in b to the value Sum; (iii) add, belonging to the utility package Sets, which takes as input an item a and, applied to a Set object s, adds a to s; (iv) computeOccupancy, belonging to the utility package CompressionToolkit, which takes as input the (current) compressed data structure H-IQTS(D), and returns as output its occupancy in KB; (v) findLeafBucketWithMaxSSE, belonging to the utility package CompressionToolkit, which takes as input an array of (current) leaf buckets V, and returns as output the bucket having maximum SSE among them; (vi) getLevel, belonging to the utility package OLAPTools, which takes as input a bucket b and a hierarchy H, and returns as output the level L of b in H; (vii) computeSplittingPosition, belonging to the utility package CompressionToolkit, which takes as input a bucket b, a hierarchy H and a level L, and returns as output the splitting position S of b at level L, according to the guidelines given above; (viii) getDepth, belonging to the utility package OLAPTools, which takes as input a hierarchy H, and returns as output the depth p of H; (ix) hasIndex, belonging to the utility package CompressionToolkit, which, applied to a Bucket object b, returns TRUE if b is equipped with an index I(b), and FALSE otherwise; (x) removeIndex, belonging to the utility package CompressionToolkit, which takes as input a Bucket


Figure 9. Algorithm compress2DOLAPView

ALGORITHM compress2DOLAPView
Input: The two-dimensional OLAP view D; the storage space available for housing
H-IQTS(D), B.
Output: The compressed representation of D, H-IQTS(D).
import Sets.*;
import CompressionToolkit.*;
import OLAPTools.*;
begin
Sets.Set H-IQTS(D) ← new Sets.Set();
CompressionToolkit.Bucket bj ← null;
CompressionToolkit.Bucket bj+1 ← null;
Sets.Set bucketsToBeProcessed ← null;
int SUM ← 0;
int ℓj,0 ← 0;
int ℓj,1 ← 0;
int Sj,0 ← 0;
int Sj,1 ← 0;
int k ← 0;
bj ← new CompressionToolkit.Bucket(0,|d0| - 1,0,|d1| - 1);
SUM ← OLAPTools.computeBucketSum(D,bj);
CompressionToolkit.setSum(bj,SUM);
H-IQTS(D).add(bj);
bucketsToBeProcessed ← new Sets.Set();
bucketsToBeProcessed.add(bj);
B ← B - CompressionToolkit.computeOccupancy(H-IQTS(D));
while (B > 0 && bucketsToBeProcessed.size() > 0) do
bj ← CompressionToolkit.findLeafBucketWithMaxSSE(bucketsToBeProcessed);
ℓj,0 ← OLAPTools.getLevel(bj,H(d0));
ℓj,1 ← OLAPTools.getLevel(bj,H(d1));
Sj,0 ← computeSplittingPosition(bj,H(d0),ℓj,0);
Sj,1 ← computeSplittingPosition(bj,H(d1),ℓj,1);
while (Sj,0 = -1 && Sj,1 = -1 &&
ℓj,0 < OLAPTools.getDepth(H(d0)) &&
ℓj,1 < OLAPTools.getDepth(H(d1))) do
ℓj,0 ← ℓj,0 + 1;
ℓj,1 ← ℓj,1 + 1;
Sj,0 ← computeSplittingPosition(bj,H(d0),ℓj,0);
Sj,1 ← computeSplittingPosition(bj,H(d1),ℓj,1);
endwhile
if (Sj,0 <> -1 || Sj,1 <> -1) then
if (CompressionToolkit.hasIndex(bj) = true) then
B ← B + CompressionToolkit.computeOccupancy(bj.getIndex());
CompressionToolkit.removeIndex(bj);
endif
endif
k ← 0;
while (k < 4) do
bj+1 ← CompressionToolkit.getSubBucket(bj,Sj,0,Sj,1,ℓj,0,ℓj,1,k);
if (bj+1 <> null) then
Ij+1 ← CompressionToolkit.computeIndex(bj+1);
if (Ij+1 <> null) then
CompressionToolkit.equipWith(bj+1,Ij+1);
endif
H-IQTS(D).add(bj+1);
B ← B - CompressionToolkit.computeOccupancy(H-IQTS(D));
bucketsToBeProcessed.add(bj+1);
endif
k ← k + 1;
endwhile
bucketsToBeProcessed.remove(bj);
endwhile
return H-IQTS(D);
end;
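The search for a splitting position on one dimension (Equations 3 and 10 to 17) can be sketched in Python as follows. This is a hedged illustration, not the authors' computeSplittingPosition: the function names `gamma`, `indexers`, and `split` are hypothetical, floor rounding is assumed, and the sketch adds lj,k to the quantities of Equations 3, 11, and 12 so that they denote absolute positions inside the bucket (for lj,k = 0 the two readings coincide). How a data cell is classified as corresponding to the LBM or the RBM of Rj,k at the processed level is left to a caller-supplied function `kind`, which is likewise an assumption.

```python
from typing import Callable, Iterator, Optional

Bucket = tuple[int, int]


def gamma(l: int, u: int) -> Bucket:
    """Search range of Equations 10-12, shifted by l (assumption) to absolute positions."""
    size = u - l + 1                       # |D[l:u]|
    return l + size // 2 - size // 3, l + size // 2 + size // 3


def indexers(l: int, u: int) -> Iterator[int]:
    """Indexers I_{j,q} of Equation 13, generated while they stay inside Gamma."""
    lo, hi = gamma(l, u)
    i = l + (u - l + 1) // 2               # X_{j,D} of Equation 3, shifted by l
    q = 0
    while lo <= i <= hi:
        yield i
        q += 1
        i += (-1) ** q * q                 # alternate left and right of the middle


def split(l: int, u: int,
          kind: Callable[[int], Optional[str]]) -> Optional[tuple[Bucket, Bucket]]:
    """Return the two child buckets, or None if no admissible position exists."""
    for i in indexers(l, u):
        k = kind(i)
        if k == "LBM":
            return (l, i - 1), (i, u)      # Equations 14-15
        if k == "RBM":
            return (l, i), (i + 1, u)      # Equations 16-17
    return None                            # caller moves to the next hierarchy level


# Hypothetical classifier for the view D[0:17]: only cell 11 is flagged as a right boundary.
print(split(0, 17, {11: "RBM"}.get))
```

With this hypothetical classifier the indexers visited are 9, 8, 10, 7, and 11, and the resulting buckets are D[0:11] and D[12:17], the same partition shown in Figure 8. Returning None mirrors the case in which the split is deferred to a lower level of the hierarchy.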


object b, and removes the Index I(b) which b is equipped with; (xi) getSubBucket, belonging to the utility package CompressionToolkit, which takes as input a Bucket object bj, two splitting positions Si and Sj defined on bj along the two dimensions di and dj at depths ℓi and ℓj of levels Li and Lj of bj in H(di) and H(dj), respectively, and an integer k ranging in [0:3], and returns as output the sub-bucket bj+1 of bj by (xi.i) splitting bj on Si and Sj, thus obtaining four sub-buckets, and (xi.ii) selecting among the latter the bucket bj+1 on the basis of the value of k (e.g., if k = 0, then the upper-left sub-bucket of bj is selected, and so on); (xii) computeIndex, belonging to the utility package CompressionToolkit, which takes as input a Bucket object b, and returns as output the "best" index I(b) built on it if the accuracy provided by linear interpolation is not higher than that provided by I(b), and the Null object otherwise; (xiii) equipWith, belonging to the utility package CompressionToolkit, which takes as input a Bucket object b and an Index I(b), and equips b with I(b); (xiv) remove, belonging to the utility package Sets, which takes as input an item a and, applied to a Set object s, removes a from s.

Example
Consider a three-dimensional data cube A defined on top of relational data sources storing sale data, and having (i) as measure the total amount of sales (i.e., the SQL aggregation operator SUM is exploited), and (ii) as dimensions the set: Dim(A) = {Product, Zone, Time}. The hierarchies H(Product), H(Zone), and H(Time) are depicted in Figure 10, Figure 11, and Figure 12, respectively. Here, we provide a compression process example, where the VDs are (i) Product/Time, whose hierarchy H(Product/Time) is obtained by merging the hierarchies H(Product) and H(Time) using PProduct = 2 and HLTime = 1 (see Figure 13), and (ii) Zone, which is the same as the one defined on the target data cube A. Figure 14 shows the two-dimensional OLAP view D extracted from A by re-aggregating multidimensional data in A according to the (new) VDs Product/Time and Zone. Finally, Figure 15 shows the steps of the compression process of D.

EXPERIMENTAL STUDY
In order to test the effectiveness of our proposed technique, we defined two kinds of experiments. The first one is oriented to probe the data cube compression performance (or, equally, the accuracy) of our technique, whereas the second one is oriented to probe the visualization capabilities of our technique in meaningfully supporting advanced OLAP visualization of multidimensional data cubes.

Data Layer
As regards the data layer of our experimental framework, we engineered three kinds of data cubes and we extracted from them two-dimensional OLAP views by means of a random flattening process on the data cube dimensions. The usage of different classes of data cubes allowed us to submit our proposed technique to a comprehensive and "rich" experimental analysis, and, as a consequence, to carefully test its performance. The data cube classes we considered are the following: (i) synthetic data cubes, which allow us to completely control the variation of input parameters determining the nature of OLAP data distributions as well as that of the OLAP hierarchies (e.g., by acting on the topology of the hierarchies); (ii) benchmark data cubes, which allow us to test the effectiveness of our technique under the stress of an in-laboratory-built input, and to evaluate our technique against competitor ones on "well-referred" data sets that have been widely used in similar research experiences; (iii) real data cubes, which allow us to probe the efficiency of our technique against real-life data sets.

As regards synthetic data sets, we finally obtained two kinds of two-dimensional OLAP views: (i) the view DC(L1,L2), for which data are uniformly distributed on a given range [L1,L2], with L1 < L2 (i.e., the well-known Continuous Values Assumption (CVA) (Colliat, 1996) holds), and (ii) the view DZ(zmin,zmax), for which data are distributed according to a Zipf
which data are distributed according to a Zipf


(Zipf, 1949) distribution whose parameter z is randomly chosen on a given range [zmin,zmax], with zmin < zmax. Uniform and Zipf-based views allow us to probe the benefits of our technique under two "opposite" cases of (OLAP) data distributions, the latter being, due to its generating process, closer to real-life instances. In both views, we generated, for each dimension, an artificial hierarchy having depth equal to 15, which is a reasonable value with respect to the goals of our experimental analysis. In more detail, each artificial hierarchy has been generated by means of a bottom-up approach that, starting from the lowest-level members of the view, progressively aggregates members into internal members until the desired depth is obtained. It should be noted that these artificial hierarchies implicitly define the semantics of the (synthetic) view.

As regards benchmark data sets, we considered two popular benchmarks: TPC-H (Transaction Processing Council, 2006) and APB-1 (OLAP Council, 1998). By exploiting data generation routines made available at the respective benchmark Web sites, we built benchmark databases and, based on the latter, multidimensional (benchmark) data cubes from which we extracted two-dimensional OLAP views. In more detail, from the benchmark data set TPC-H, we extracted a two-dimensional OLAP view (see Figure 16) having as dimensions the attributes (i) C_Address, belonging to the dimensional table dbo.Customer and linked to the fact table dbo.Lineitem through the dimensional table dbo.Orders, and (ii) S_Address, belonging to the dimensional table dbo.Supplier. Since the original hierarchies in TPC-H have limited depth, thus being inappropriate to the scope of our experimental analysis, we equipped the dimensions C_Address and S_Address with artificial hierarchies having depth equal to 15, similarly to what was done with synthetic data

Figure 10. Hierarchy H(Product)
[Figure: tree rooted at Product; second level: Foodstuff, Clothes, Books; third level: Dairy, Drink, Baked, Summer Clothes, Winter Clothes, Accessories, Scientific, Humanistic, Manuals; leaf members include Gorgonzola, Butter, Geitost, Wine, Beer, Chai, Bread, Bagel, Muffin, Bikini, T-Shirt, Shorts, Raincoat, Pullover, Gloves, Stocking, Bag, Belt, Physics, Maths, Computer Science, Psychology, Novels, History, Literature, Gardening, Cooking]

Figure 11. Hierarchy H(Zone)
[Figure: tree rooted at Zone; second level: Europe, Asia, America; third level: North, Central, Mediterranean, East, Southeast, South, North, Central, South; leaf members include London, Oslo, Helsinki, Dublin, Prague, Berlin, Geneva, Munich, Valencia, Rome, Nice, Athens, Shanghai, Hong Kong, Tokyo, Seoul, Saigon, Bangkok, Jakarta, Kuala Lumpur, Karachi, Calcutta, Bombay, Kathmandu, New York, Chicago, Vancouver, Toronto, Guatemala, Panama City, San Jose, Managua, San Paulo, Brasilia, Buenos Aires, Lima]


Figure 12. Hierarchy H(Time)


Time

Q1 Q2 Q3 Q4

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Figure 13. Hierarchy H(Product/Time)

[Figure: tree rooted at Product/Time; second level: Foodstuff, Clothes, Books; third level: Dairy, Drink, Baked, Summer Clothes, Winter Clothes, Accessories, Scientific, Humanistic, Manuals; the members Q1, Q2, Q3, Q4 of H(Time) are appended below each third-level member]

Figure 14. Product/time-zone OLAP view D


sets. From the benchmark data set APB-1, we extracted a two-dimensional OLAP view (see Figure 17) having as dimensions the attributes (i) Class, belonging to the dimensional table dbo.Product, and (ii) Month, belonging to the dimensional table dbo.Time. Just like in the TPC-H case, we equipped the dimensions Class and Month with 15-depth artificial hierarchies.

Finally, as regards real-life data sets, we considered the popular data set USCensus1990 (University of California, Irvine, 2001) made available from the UCI KDD Archive (University of California, Irvine, 2005), and we built a two-dimensional OLAP view (see Figure 18) by defining two new dimensional tables dbo.dAncestry and dbo.dAncestry_1, which both store data on the origin regions of the parents of each person whose data are stored in the fact table dbo.usCensus. dbo.dAncestry and dbo.dAncestry_1 have been built starting from the definition of USCensus1990 attributes available at (University of California, Irvine, 2001), and populated with tuples coming from dbo.usCensus. Then, they have been linked to

Figure 15. Compression process of the product/time-zone OLAP view D

[Figure: four panels, (a) to (d), showing successive partitions of D; the vertical axis reports Product/Time members (from ALL down to members such as Dairy.Q1 and Manuals.Q4) and the horizontal axis reports Zone members (from ALL down to cities such as Chicago and Brasilia); the buckets are labeled D, D.1 to D.4, D.2.1 to D.2.4, and D.3.x]


dbo.usCensus via simple ID-based relationships (i.e., dbo.usCensus.dAncstry1 ↔ dbo.dAncestry.Id and dbo.usCensus.dAncstry2 ↔ dbo.dAncestry_1.Id). The resulting two-dimensional OLAP view has as dimensions the attributes (i) Id, belonging to the dimensional table dbo.dAncestry_1, and (ii) Id, belonging to the dimensional table dbo.dAncestry. Finally, 15-depth artificial hierarchies have been embedded into the latter dimensions.

Metrics
As regards the outcomes of our experimental study, we defined the following metrics. For the first kind of experiments (i.e., those focused on the accuracy), given a population of synthetic range-SUM queries QS, we measure the average relative error (ARE) between exact and approximate answers to queries in QS, defined as follows:

Figure 16. Two-dimensional OLAP view extracted from the benchmark data set TPC-H

Figure 17. Two-dimensional OLAP view extracted from the benchmark data set APB-1


Figure 18. Two-dimensional OLAP view built on top of the real-life data set USCensus1990

$$E_{rel} = \frac{1}{|Q_S|} \cdot \sum_{k=0}^{|Q_S|-1} E_{rel}(Q_k) \qquad (18)$$

such that, for each query $Q_k$ in $Q_S$, we have:

$$E_{rel}(Q_k) = \frac{|A(Q_k) - \tilde{A}(Q_k)|}{A(Q_k)} \qquad (19)$$

where (i) $A(Q_k)$ is the exact answer to $Q_k$, and (ii) $\tilde{A}(Q_k)$ is the approximate answer to $Q_k$. Specifically, having fixed a range size $\Delta_k$ for each dimension $d_k$ of the target synthetic OLAP view D, we generated the queries in $Q_S$ by spanning D with the “seed” $\Delta_0 \times \Delta_1$ query $Q^s$, whose upper-left corner moves across the two-dimensional references 〈i, j〉 of D.

For the second kind of experiments, we were inspired by the hierarchical range queries (HRQ) introduced by Koudas, Muthukrishnan, and Srivastava (2000). In our implementation, a HRQ $Q_H(W_H, P_H)$ is a full tree such that: (i) the depth of the tree is equal to $P_H$; (ii) each internal node $N_i$ has a fan-out degree equal to $W_H$; (iii) each node $N_i$ stores the definition of a (“traditional”) range-SUM query $Q_i$; (iv) for each node $N_i$ in $Q_H(W_H, P_H)$, there does not exist any sibling node $N_j$ of $N_i$ such that $Q_i \cap Q_j \neq \emptyset$.

As an example, consider Figure 19, where a HRQ having depth equal to 2 is depicted. This query could model a typical business intelligence (BI) scenario where two local companies that belong to a common main company pose queries to specialized sub-domains of the data cube Sales according to their business goals. Following this simple yet effective example, it should be noted that HRQs have a wide range of applications in OLAP systems (as also highlighted by Koudas et al. (2000)), since they allow us to extract “hierarchically-shaped” summarized knowledge from massive data cubes.

Similarly to the previous kind of experiments, for each node $N_i$ in $Q_H(W_H, P_H)$, the population of queries $Q_{S,i}$ to be used as input query set has been generated by means of the above-described spanning technique (i.e., based on the seed query $Q_i^s$). In more detail, since, due to the nature of HRQs, the selectivity of the seed queries $Q^s_{i,k}$ of nodes $N_i$ at level k of $Q_H(W_H, P_H)$ must decrease as the depth $P_k$ of $Q_H(W_H, P_H)$ increases, we first imposed that the selectivity of the seed query of the root node


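Figure 19. A HRQ (left) and its implementation on the data cube Sales (right)

[Figure content. HRQ tree: root Q0 = ([B0, U0], [D1, Z1]); its children Q1 = ([D0, M0], [G1, N1]) and Q2 = ([N0, T0], [P1, U1]); leaves Q3 = ([F0, H0], [H1, I1]) and Q4 = ([I0, L0], [L1, M1]), whose ranges fall inside Q1, and Q5 = ([P0, Q0], [Q1, R1]) and Q6 = ([R0, S0], [S1, T1]), whose ranges fall inside Q2. The data cube Sales spans A0 to Z0 on one dimension and A1 to Z1 on the other.]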

$N_0$ in $Q_H(W_H, P_H)$, denoted by $\|Q^s_{0,0}\|$, is equal to γ % of $\|D\|$, where γ is an input parameter and $\|D\|$ is the selectivity of the target OLAP view D. Then, for each internal node $N_i$ in $Q_H(W_H, P_H)$ at level k, we randomly determined the seed queries of the child nodes of $N_i$ by checking the following constraint:

$$\sum_{i=0}^{|(W_H)^{k+1}|-1} \|Q^s_{i,k+1}\| \leq \|Q^s_{i,k}\| \qquad (20)$$

with:

$$Q^s_{i,k+1} \cap Q^s_{j,k+1} = \emptyset \qquad (21)$$

for each i and j in $[0, |(W_H)^{k+1}|-1]$, with $i \neq j$, and adopting the criterion of maximizing each $\|Q^s_{i,k}\|$.

Given a HRQ $Q_H(W_H, P_H)$, we measure the average accessed bucket number (AABN), which models the average number of buckets accessed during the evaluation of $Q_H(W_H, P_H)$, and is defined as follows:

$$AABN(Q_H(W_H, P_H)) = \sum_{k=0}^{P_H} \frac{1}{(W_H)^k} \cdot \sum_{\ell=0}^{|(W_H)^k|-1} AABN(N_\ell) \qquad (22)$$

where, in turn, $AABN(N_\ell)$ is the average number of buckets accessed during the evaluation of the population of queries $Q_{S,\ell}$ of the node $N_\ell$ in $Q_H(W_H, P_H)$, defined as follows:

$$AABN(N_\ell) = \frac{1}{|Q_{S,\ell}|} \cdot \sum_{k=0}^{|Q_{S,\ell}|-1} ABN(Q_k) \qquad (23)$$
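Both metrics are plain averages, so Equations (18)-(19) and (22)-(23) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the nested-list layout of the per-query bucket counts (where each count is the number of buckets accessed while evaluating one query), and the decision to reject a zero exact answer are assumptions made only for this illustration.

```python
from typing import Sequence


def relative_error(exact: float, approx: float) -> float:
    """Equation (19): |A(Q_k) - A~(Q_k)| / A(Q_k) for a single query."""
    if exact == 0:
        # Assumption: the article does not say how a zero exact answer is treated.
        raise ValueError("relative error is undefined when the exact answer is 0")
    return abs(exact - approx) / exact


def average_relative_error(exact: Sequence[float], approx: Sequence[float]) -> float:
    """Equation (18): mean of the per-query relative errors over the population Q_S."""
    if len(exact) == 0 or len(exact) != len(approx):
        raise ValueError("need two non-empty answer lists of equal length")
    return sum(relative_error(a, b) for a, b in zip(exact, approx)) / len(exact)


def aabn_node(bucket_counts: Sequence[int]) -> float:
    """Equation (23): mean number of accessed buckets over one node's query population."""
    if len(bucket_counts) == 0:
        raise ValueError("a node needs at least one query")
    return sum(bucket_counts) / len(bucket_counts)


def aabn_hrq(levels: Sequence[Sequence[Sequence[int]]], fan_out: int) -> float:
    """Equation (22): sum over levels k of (1 / W_H^k) * sum of AABN over level-k nodes.

    levels[k][l] holds the accessed-bucket counts for the queries of node l at
    level k (hypothetical layout); a full tree has fan_out ** k nodes at level k.
    """
    total = 0.0
    for k, nodes in enumerate(levels):
        expected = fan_out ** k
        if len(nodes) != expected:
            raise ValueError(f"level {k} should have {expected} nodes")
        total += sum(aabn_node(counts) for counts in nodes) / expected
    return total
```

For a HRQ of depth $P_H$, `levels` would contain $P_H + 1$ entries (levels 0 through $P_H$), and `fan_out` plays the role of $W_H$.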

Figure 20. Experimental results for the accuracy metrics with respect to the query selectivity ||Q||
on the 1,000 × 1,000 two-dimensional synthetic OLAP views DC(25,70) (left) and DZ(0.5,1.5)
(right) with r = 10 %


In Equation (23), for each query $Q_k$ in $Q_{S,\ell}$, $ABN(Q_k)$ is the number of buckets accessed during the evaluation of $Q_k$.

Summarizing, given a compression technique T, AABN allows us to measure the capabilities of T in supporting advanced OLAP visualization of multidimensional data cubes, as the number of buckets accessed can reasonably be considered a measure of the computational cost needed to extract summarized knowledge. This resembles a sort of measure of the entropy of the overall knowledge extraction process. As stated in the fourth section, this aspect assumes a leading role in mobile OLAP settings (e.g., Hand-OLAP).

COMPARISON TECHNIQUES

In our experimental study, we compared the performance of our proposed technique (under the two metrics previously defined) against the following well-known histogram-based techniques for compressing data cubes: MinSkew by Acharya et al. (1999), GenHist by Gunopulos et al. (2000), and STHoles by Bruno et al. (2001). In more detail, having fixed the space budget B (i.e., the storage space available for housing the compressed representation of the input OLAP view), we derived, for each comparison technique, the configuration of the input parameters that the respective authors consider the best in their papers. This ensures a fair experimental analysis (i.e., an analysis such that each comparison technique provides

Figure 21. Experimental results for the accuracy metrics with respect to the compression ratio
r on the 1,000 × 1,000 two-dimensional synthetic OLAP views DC(25,70) (left) and DZ(0.5,1.5)
(right) with ||Q|| = 350 × 300

Figure 22. Experimental results for the visualization metrics with respect to the depth of HRQs $P_H$ on the 1,000 × 1,000 two-dimensional synthetic OLAP views DC(25,70) (left) and DZ(0.5,1.5) (right) with $W_H$ = 5, r = 10 %, and γ = 70 %


its best performance). Furthermore, for all the comparison techniques, we set the space budget B equal to r % of size(D), where r is the compression ratio and size(D) is the total occupancy of the input OLAP view D. As an example, r = 10 % (i.e., B equal to 10 % of size(D)) is widely recognized as a reasonable setting (e.g., see Bruno et al., 2001).

Experimental results

Figure 20 shows our experimental results regarding the accuracy of the compression techniques with respect to the selectivity of queries in $Q_S$ on the 1,000 × 1,000 two-dimensional synthetic OLAP views DC(25,70) (left side) and DZ(0.5,1.5) (right side), respectively. Figure 21 shows the results of the same experiment when ranging r on the interval [5, 20] (i.e., B on the interval [5, 20] % of size(D)), and fixing the selectivity of queries ||Q||. This allows us to measure the scalability of the compression techniques, which is a critical aspect in OLAP systems (e.g., see Cuzzocrea, 2005). Finally, Figure 22 shows our experimental results regarding the “visualization capabilities” of the comparison techniques (according to the guidelines drawn throughout the article) with respect to the depth of HRQs (i.e., $P_H$), having fan-out degree $W_H$ equal to 5 and the parameter γ equal to 70 %. The input two-dimensional OLAP views and the value of the parameter r are the same as in the previous experiments.

Figures 23, 24, and 25 show the results of the same experiment set described above when 1,000 × 1,000 two-dimensional benchmark OLAP views extracted from the data sets

Figure 23. Experimental results for the accuracy metrics with respect to the query selectivity
||Q|| on the 1,000 × 1,000 two-dimensional benchmark OLAP views extracted from the data sets
TPC-H (left) and APB-1 (right) with r = 10 %

Figure 24. Experimental results for the accuracy metrics with respect to the compression ratio
r on the 1,000 × 1,000 two-dimensional benchmark OLAP views extracted from the data sets
TPC-H (left) and APB-1 (right) with ||Q|| = 350 × 300


Figure 25. Experimental results for the visualization metrics with respect to the depth of HRQs $P_H$ on the 1,000 × 1,000 two-dimensional benchmark OLAP views extracted from the data sets TPC-H (left) and APB-1 (right) with $W_H$ = 5, r = 10 %, and γ = 70 %

Figure 26. Experimental results for the accuracy metrics with respect to the query selectivity ||Q|| on the 1,000 × 1,000 two-dimensional real-life OLAP view built on top of the data set USCensus1990 with r = 10 %

Figure 27. Experimental results for the accuracy metrics with respect to the compression ratio r on the 1,000 × 1,000 two-dimensional real-life OLAP view built on top of the data set USCensus1990 with ||Q|| = 350 × 300

TPC-H (left sides of figures) and APB-1 (right sides of figures), respectively, are considered as input. Finally, Figures 26, 27, and 28 show the experimental results when a 1,000 × 1,000 two-dimensional real-life OLAP view built on top of the data set USCensus1990 is considered as input.

From the analysis of the set of experimental results on two-dimensional synthetic, benchmark, and real-life OLAP views, it follows that, with respect to the accuracy metrics, our proposed technique is comparable with MinSkew, which represents the best on two-dimensional views (indeed, as is well recognized in the literature, MinSkew presents severe limitations on multidimensional domains); instead, with respect to the visualization metrics, our proposed technique outperforms the comparison techniques, thus confirming its suitability in efficiently supporting advanced OLAP visualization of multidimensional data cubes.

CONCLUSION AND FUTURE WORK

In this article, we have presented an innovative technique for supporting advanced OLAP visualization of multidimensional data cubes, which is particularly suitable for mobile OLAP scenarios (like, for instance, those addressed


by the system Hand-OLAP). Building on very efficient two-dimensional summary data domain compression solutions (Buccafurri et al., 2003), our technique meaningfully exploits the data compression paradigm that, in this article, has been proposed as a way of visualizing multidimensional OLAP domains to overcome the natural disorientation and refractoriness of human beings in dealing with hyper-spaces. In this direction, the OLAP dimension flattening process and the ability to compute semantics-aware buckets are, to the best of our knowledge, innovative contributions to the state-of-the-art OLAP research. Finally, various experimental results obtained on different kinds of two-dimensional OLAP views extracted from synthetic, benchmark, and real-life multidimensional data cubes have clearly confirmed the benefits of our proposed technique in the OLAP visualization context, also in comparison with well-known data cube compression techniques.

Future work is mainly focused on making the proposed technique capable of building m-dimensional OLAP views over massive n-dimensional data cubes, with m << n and m > 2, by extending the algorithms presented in this article. A possible solution could be found in the results coming from the high-dimensional data and information visualization research area (e.g., see University of Hannover, Germany, 2005), which are already suitable to be applied to the problem of visualizing multidimensional databases and data cubes.

Figure 28. Experimental results for the visualization metrics with respect to the depth of HRQs $P_H$ on the 1,000 × 1,000 two-dimensional real-life OLAP view built on top of the data set USCensus1990 with $W_H$ = 5, r = 10 %, and γ = 70 %

ACKNOWLEDGMENT

The authors are very grateful to Francesco Cristofaro for having developed and performed the experiments on benchmark and real-life data sets.

REFERENCES

Agrawal, R., Gupta, A., & Sarawagi, S. (1997). Modeling multidimensional databases. Proceedings of 13th IEEE ICDE International Conference (pp. 232-243), Bangalore, India.

Acharya, S., Poosala, V., & Ramaswamy, S. (1999). Selectivity estimation in spatial databases. Proceedings of 1999 ACM SIGMOD International Conference (pp. 13-24), Philadelphia, PA, USA.

Babcock, B., Chaudhuri, S., & Das, G. (2003). Dynamic sample selection for approximate query answers. Proceedings of 2003 ACM SIGMOD International Conference (pp. 539-550), San Diego, CA, USA.

Buccafurri, F., Furfaro, F., Saccà, D., & Sirangelo, C. (2003). A quad-tree based multiresolution approach for two-dimensional summary data. Proceedings of 15th IEEE SSDBM International Conference (pp. 127-140), Cambridge, MA, USA.

Bruno, N., Chaudhuri, S., & Gravano, L. (2001). STHoles: A multidimensional workload-aware histogram. Proceedings of 2001 ACM SIGMOD International Conference (pp. 211-222), Santa Barbara, CA, USA.

Cabibbo, L., & Torlone, R. (1998). From a procedural to a visual query language for OLAP. Proceedings of 10th IEEE SSDBM International Conference (pp. 74-83), Capri, Italy.

Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26(1), 65-74.


Chaudhuri, S., Das, G., Datar, M., Motwani, R., & Rastogi, R. (2001). Overcoming limitations of sampling for aggregation queries. Proceedings of 17th IEEE ICDE International Conference (pp. 534-542), Heidelberg, Germany.

Codd, E. F., Codd, S. B., & Salley, C. T. (1993). Providing OLAP to user-analysts: An IT mandate. E. F. Codd and Associates Technical Report.

Colliat, G. (1996). OLAP, relational, and multidimensional database systems. SIGMOD Record, 25(3), 64-69.

Cuzzocrea, A. (2005). Overcoming limitations of approximate query answering in OLAP. Proceedings of 9th IEEE IDEAS International Conference (pp. 200-209), Montreal, Canada.

Cuzzocrea, A., & Wang, W. (2007). Approximate range-sum query answering on data cubes with probabilistic guarantees. Journal of Intelligent Information Systems, 28(2), 161-197.

Cuzzocrea, A., Furfaro, F., & Saccà, D. (2003). Hand-OLAP: A system for delivering OLAP services on handheld devices. Proceedings of 6th IEEE ISADS International Conference (pp. 80-87), Pisa, Italy.

Ezeife, C. I. (2001). Selecting and materializing horizontally partitioned warehouse views. Data & Knowledge Engineering, 36(2), 185-210.

Fang, M., Shivakumar, N., Garcia-Molina, H., Motwani, R., & Ullman, J. D. (1998). Computing iceberg queries efficiently. Proceedings of 24th VLDB International Conference (pp. 299-310), New York City, NY, USA.

Gebhardt, M., Jarke, M., & Jacobs, S. (1997). A toolkit for negotiation support interfaces to multi-dimensional data. Proceedings of 1997 ACM SIGMOD International Conference (pp. 348-356), Tucson, AZ, USA.

Gibbons, P. B., & Matias, Y. (1998). New sampling-based summary statistics for improving approximate query answers. Proceedings of 1998 ACM SIGMOD International Conference (pp. 331-342), Seattle, WA, USA.

Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., & Venkatrao, M. (1997). Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, 1(1), 29-53.

Gunopulos, D., Kollios, G., Tsotras, V. J., & Domeniconi, C. (2000). Approximating multi-dimensional aggregate range queries over real attributes. Proceedings of 2000 ACM SIGMOD International Conference (pp. 463-474), Dallas, TX, USA.

Gupta, A., Agrawal, D., & El Abbadi, A. (2003). Approximate range selection queries in peer-to-peer systems. Proceedings of 1st CIDR International Conference, Asilomar, CA, USA. Retrieved from http://www-db.cs.wisc.edu/cidr/cidr2003/program/p13.pdf

Hacid, M. S., & Sattler, U. (1998). Modeling multidimensional databases: A formal object-centered approach. Proceedings of 6th ECIS International Conference (pp. 1-15), Aix-en-Provence, France.

Han, J., & Kamber, M. (2000). Data mining: Concepts and techniques. Morgan Kaufmann Publishers.

Harinarayan, V., Rajaraman, A., & Ullman, J. (1996). Implementing data cubes efficiently. Proceedings of 1996 ACM SIGMOD International Conference (pp. 205-216), Montreal, Canada.

Ho, C. T., Agrawal, R., Megiddo, N., & Srikant, R. (1997). Range queries in OLAP data cubes. Proceedings of 1997 ACM SIGMOD International Conference (pp. 73-88), Tucson, AZ, USA.

Inmon, W. H. (1996). Building the data warehouse. John Wiley & Sons.

Inselberg, A. (2001). Visualization and knowledge discovery for high dimensional data. Proceedings of 2nd IEEE UIDIS International Workshop (pp. 5-24), Zurich, Switzerland.

Keim, D. A. (1997). Visual data mining. Tutorial at 23rd VLDB International Conference, Athens, Greece. Retrieved from http://www.dbs.informatik.uni-muenchen.de/daniel/VLDB-Tutorial.ps

Kimball, R. (1996). The data warehouse toolkit. John Wiley & Sons.

Koudas, N., Muthukrishnan, S., & Srivastava, D. (2000). Optimal histograms for hierarchical range queries. Proceedings of 19th ACM PODS International Symposium (pp. 196-204), Dallas, TX, USA.

Lehner, W., Albrecht, J., & Wedekind, H. (1998). Normal forms for multivariate databases. Proceedings of 10th IEEE SSDBM International Conference (pp. 63-72), Capri, Italy.

Lenz, H. J., & Shoshani, A. (1997). Summarizability in OLAP and statistical databases. Proceedings of 9th IEEE SSDBM International Conference (pp. 132-143), Olympia, WA, USA.

Lenz, H. J., & Thalheim, B. (2001). OLAP databases and aggregation functions. Proceedings of 13th IEEE SSDBM International Conference (pp. 91-100), Fairfax, VA, USA.

Maniatis, A., Vassiliadis, P., Skiadopoulos, S., & Vassiliou, Y. (2003a). CPM: A cube presentation model for OLAP. Proceedings of 5th DaWaK International Conference (pp. 4-13), Prague, Czech Republic.

Maniatis, A., Vassiliadis, P., Skiadopoulos, S., & Vassiliou, Y. (2003b). Advanced visualization for OLAP. Proceedings of 6th ACM DOLAP International Workshop (pp. 9-16), New Orleans, LA, USA.

Muralikrishna, M., & DeWitt, D. J. (1998). Equi-depth histograms for estimating selectivity factors for multi-dimensional queries. Proceedings of 1998 ACM SIGMOD International Conference (pp. 28-36), Seattle, WA, USA.

OLAP Council. (1998). Analytical processing benchmark 1, Release II. Retrieved from http://www.symcorp.com/downloads/OLAP_Council-WhitePaper.pdf

Piatetsky-Shapiro, G., & Connell, C. (1984). Accurate estimation of the number of tuples satisfying a condition. Proceedings of 1984 ACM SIGMOD International Conference (pp. 265-275), Boston, MA, USA.

Poosala, V., & Ioannidis, Y. (1997). Selectivity estimation without the attribute value independence assumption. Proceedings of 23rd VLDB International Conference (pp. 486-495), Athens, Greece.

Thanh Binh, N., Min Tjoa, A., & Wagner, R. (2000). An object oriented multidimensional data model for OLAP. Proceedings of 1st WAIM International Conference (pp. 69-82), Shanghai, China.

Transaction Processing Council. (2006). TPC benchmark H. Retrieved from http://www.tpc.org/tpch/

Tsois, A., Karayannidis, N., & Sellis, T. (2001). MAC: Conceptual data modeling for OLAP. Proceedings of 3rd DMDW International Workshop, Interlaken, Switzerland. Retrieved from http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-39/paper5.pdf

University of California, Irvine. (2001). 1990 US Census Data. Retrieved from http://kdd.ics.uci.edu/databases/census1990/USCensus1990.html

University of California, Irvine. (2005). Knowledge discovery in databases archive. Retrieved from http://kdd.ics.uci.edu/

University of Hannover, Germany. (2005). 2D, 3D, and high-dimensional data and information visualization research group. Retrieved from http://www.iwi.uni-hannover.de/lv/seminar_ss05/bartke/home.htm

Vassiliadis, P. (1998). Modeling multidimensional databases, cubes, and cube operations. Proceedings of 10th IEEE SSDBM International Conference (pp. 53-62), Capri, Italy.

Vassiliadis, P., & Sellis, T. (1999). A survey of logical models for OLAP databases. SIGMOD Record, 28(4), 64-69.

Vitter, J. S., Wang, M., & Iyer, B. (1998). Data cube approximation and histograms via wavelets. Proceedings of 7th ACM CIKM International Conference (pp. 96-104), Bethesda, MD, USA.

Xin, D., Han, J., Cheng, H., & Li, X. (2006). Answering top-k queries with multi-dimensional selections: The ranking cube approach. Proceedings of 32nd VLDB International Conference (pp. 463-475), Cairo, Egypt.

Zipf, G. K. (1949). Human behaviour and the principle of least effort. Addison-Wesley.


Alfredo Cuzzocrea received the Laurea degree in computer science engineering in April 2001 and the PhD degree in computer science and system engineering in February 2005, both from the University of Calabria. Presently, he is a research fellow at the Department of Electronics, Computer Science and Systems of the University of Calabria, where he is a member of the Database Research Group. He also actively collaborates with the Institute of High Performance Computing and Networking of the Italian National Research Council. His research interests include multidimensional data modeling and querying, data stream modeling and querying, data warehousing and OLAP, XML data management, Web information systems modeling and engineering, knowledge representation and management models and techniques, and Grid and P2P computing. He is author or co-author of more than 60 papers in refereed international conferences (including SSDBM, DEXA, DaWaK, DOLAP, IDEAS, SEKE, WISE, FQAS, SAC) and international journals (including JIIS, DKE, WIAS). He serves as program committee member of refereed international conferences (including ICDM, CIKM, PAKDD, DaWaK, SAC) and as review board member of international journals (including TODS, TKDE, INS, IJSEKE, FGCS).

Domenico Saccà was born in Catanzaro (Italy) on November 5, 1950, and received a Doctoral degree in Engineering from the University of Rome in 1975. Since 1987, he has been full professor of Computer Engineering at the University of Calabria and, since 2002, he has also been Director of the CNR (the Italian National Research Council) Research Institute ICAR (Institute for High Performance Computing and Networking), located in Rende (CS) with branches in Naples and Palermo. Previously, from 1995 on, he was director of the CNR Institute on System and Computer Sciences (Istituto per la Sistemistica e l’Informatica, ISI-CNR). In the past he was visiting scientist at the IBM Laboratory of San Jose, at the Computer Science Department of UCLA, and at the ICSI Institute of Berkeley; furthermore, he was a scientific consultant of MCC, Austin, and manager of the Research Division of CRAI. His current research interests focus on advanced database issues such as scheme integration in data warehousing, compressed representation of datacubes, workflow and process mining, and logic-based database query languages. His list of publications contains more than 100 papers in journals (including Journal of the ACM, SIAM Journal on Computing, ACM Transactions on Database Systems, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Software Engineering, Theoretical Computer Science, Journal of Computer and System Sciences, Acta Informatica) and in proceedings of international conferences. He has been a member of the program committees of several international conferences, director of international schools and seminars, and leader of many national and international research projects.

Paolo Serafino received the Laurea degree in Computer Science Engineering in July 2006. Presently, he is a research collaborator at the Department of Electronics, Computer Science and Systems of the University of Calabria.

Copyright © 2007, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global
is prohibited.
2 International Journal of Data Warehousing & Mining, 3(4), 1-30, October-December 2007

Figure 25. Experimental results for the visualization metrics with respect to the depth of HRQs
P on the 1,000 × 1,000 two-dimensional benchmark OLAP views extracted from the data sets
TPC-H (left) and APB-1 (right) with WH = 5, r = 10 %, and γ = 70 %

Figure 26. Experimental results for the accuracy Figure 27. Experimental results for the accu-
metrics with respect to the query selectivity racy metrics with respect to the compression
||Q|| on the 1,000 × 1,000 two-dimensional ratio r on the 1,000 × 1,000 two-dimensional
real-life OLAP view built on top of the data set real-life OLAP view built on top of the data set
USCensus1990 with r = 10 % USCensus1990 with ||Q|| = 350 × 300

TPC-H (left sides of figures) and APB-1 (right tidimensional domains); instead, with respect
sides of figures), respectively, are considered to the visualization metrics, our proposed tech-
as input. Finally, Figure 26, 27, and 28 show nique overcomes the comparison techniques,
the experimental results when a 1,000 × 1,000 thus confirming its suitability in efficiently
two-dimensional real-life OLAP view built on supporting advanced OLAP visualization of
top of the data set USCensus1990 is considered multidimensional data cubes.
as input.
From the analysis of the set of experimental CONCLUSION AND
results on two-dimensional synthetic, bench- FUTUrE WOrK
mark and real-life OLAP views, it follows that, In this article, we have presented an innova-
with respect to the accuracy metrics, our pro- tive technique for supporting advanced OLAP
posed technique is comparable with MinSkew, visualization of multidimensional data cubes,
which represents the best on two-dimensional which is particularly suitable for mobile OLAP
views (indeed, as well-recognized-in-literature, scenarios (like, for instance, those addressed
MinSkew presents severe limitations on mul-

Copyright © 2007, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global
is prohibited.
International Journal of Data Warehousing & Mining, 3(4), 1-30, October-December 2007 27

Figure 28. Experimental results for the visu- the results coming from the high-dimensional
alization metrics with respect to the depth of data and information visualization research area
HRQs P on the 1,000 × 1,000 two-dimensional (e.g., see University of Hannover, Germany,
real-life OLAP view built on top of the data set USCensus1990 with WH = 5, r = 10%, and γ = 70%

by the system Hand-OLAP). Building on very efficient compression solutions for two-dimensional summary data domains (Buccafurri et al., 2003), our technique meaningfully exploits the data compression paradigm that, in this article, has been proposed as a way of visualizing multidimensional OLAP domains, so as to overcome the natural disorientation and refractoriness of human beings in dealing with hyper-spaces. In this direction, the OLAP dimension flattening process and the possibility of computing semantics-aware buckets are, to the best of our knowledge, innovative contributions to state-of-the-art OLAP research. Finally, experimental results on different kinds of two-dimensional OLAP views extracted from synthetic, benchmark, and real-life multidimensional data cubes have clearly confirmed the benefits of our proposed technique in the OLAP visualization context, also in comparison with well-known data cube compression techniques.

Future work is mainly focused on making the proposed technique capable of building m-dimensional OLAP views over massive n-dimensional data cubes, with m << n and m > 2, by extending the algorithms presented in this article. A possible solution could be found in [...] 2005), which are already suitable to be applied to the problem of visualizing multidimensional databases and data cubes.

ACKNOWLEDGMENT
The authors are very grateful to Francesco Cristofaro for having developed and performed the experiments on benchmark and real-life data sets.

REFERENCES
Agrawal, R., Gupta, A., & Sarawagi, S. (1997). Modeling multidimensional databases. Proceedings of 13th IEEE ICDE International Conference (pp. 232-243), Bangalore, India.

Acharya, S., Poosala, V., & Ramaswamy, S. (1999). Selectivity estimation in spatial databases. Proceedings of 1999 ACM SIGMOD International Conference (pp. 13-24), Philadelphia, PA, USA.

Babcock, B., Chaudhuri, S., & Das, G. (2003). Dynamic sample selection for approximate query answers. Proceedings of 2003 ACM SIGMOD International Conference (pp. 539-550), San Diego, CA, USA.

Buccafurri, F., Furfaro, F., Saccà, D., & Sirangelo, C. (2003). A quad-tree based multiresolution approach for two-dimensional summary data. Proceedings of 15th IEEE SSDBM International Conference (pp. 127-140), Cambridge, MA, USA.

Bruno, N., Chaudhuri, S., & Gravano, L. (2001). STHoles: A multidimensional workload-aware histogram. Proceedings of 2001 ACM SIGMOD International Conference (pp. 211-222), Santa Barbara, CA, USA.

Cabibbo, L., & Torlone, R. (1998). From a procedural to a visual query language for OLAP. Proceedings of 10th IEEE SSDBM International Conference (pp. 74-83), Capri, Italy.

Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26(1), 65-74.


Chaudhuri, S., Das, G., Datar, M., Motwani, R., & Rastogi, R. (2001). Overcoming limitations of sampling for aggregation queries. Proceedings of 17th IEEE ICDE International Conference (pp. 534-542), Heidelberg, Germany.

Codd, E. F., Codd, S. B., & Salley, C. T. (1993). Providing OLAP to user-analysts: An IT mandate. E. F. Codd and Associates Technical Report.

Colliat, G. (1996). OLAP, relational, and multidimensional database systems. SIGMOD Record, 25(3), 64-69.

Cuzzocrea, A. (2005). Overcoming limitations of approximate query answering in OLAP. Proceedings of 9th IEEE IDEAS International Conference (pp. 200-209), Montreal, Canada.

Cuzzocrea, A., & Wang, W. (2007). Approximate range-sum query answering on data cubes with probabilistic guarantees. Journal of Intelligent Information Systems, 28(2), 161-197.

Cuzzocrea, A., Furfaro, F., & Saccà, D. (2003). Hand-OLAP: A system for delivering OLAP services on handheld devices. Proceedings of 6th IEEE ISADS International Conference (pp. 80-87), Pisa, Italy.

Ezeife, C. I. (2001). Selecting and materializing horizontally partitioned warehouse views. Data & Knowledge Engineering, 36(2), 185-210.

Fang, M., Shivakumar, N., Garcia-Molina, H., Motwani, R., & Ullman, J. D. (1998). Computing iceberg queries efficiently. Proceedings of 24th VLDB International Conference (pp. 299-310), New York City, NY, USA.

Gebhardt, M., Jarke, M., & Jacobs, S. (1997). A toolkit for negotiation support interfaces to multi-dimensional data. Proceedings of 1997 ACM SIGMOD International Conference (pp. 348-356), Tucson, AZ, USA.

Gibbons, P. B., & Matias, Y. (1998). New sampling-based summary statistics for improving approximate query answers. Proceedings of 1998 ACM SIGMOD International Conference (pp. 331-342), Seattle, WA, USA.

Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., & Venkatrao, M. (1997). Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, 1(1), 29-53.

Gunopulos, D., Kollios, G., Tsotras, V. J., & Domeniconi, C. (2000). Approximating multi-dimensional aggregate range queries over real attributes. Proceedings of 2000 ACM SIGMOD International Conference (pp. 463-474), Dallas, TX, USA.

Gupta, A., Agrawal, D., & El Abbadi, A. (2003). Approximate range selection queries in peer-to-peer systems. Proceedings of 1st CIDR International Conference, Asilomar, CA, USA. Retrieved from http://www-db.cs.wisc.edu/cidr/cidr2003/program/p13.pdf

Hacid, M. S., & Sattler, U. (1998). Modeling multidimensional databases: A formal object-centered approach. Proceedings of 6th ECIS International Conference (pp. 1-15), Aix-en-Provence, France.

Han, J., & Kamber, M. (2000). Data mining: Concepts and techniques. Morgan Kaufmann Publishers.

Harinarayan, V., Rajaraman, A., & Ullman, J. (1996). Implementing data cubes efficiently. Proceedings of 1996 ACM SIGMOD International Conference (pp. 205-216), Montreal, Canada.

Ho, C. T., Agrawal, R., Megiddo, N., & Srikant, R. (1997). Range queries in OLAP data cubes. Proceedings of 1997 ACM SIGMOD International Conference (pp. 73-88), Tucson, AZ, USA.

Inmon, W. H. (1996). Building the data warehouse. John Wiley & Sons.

Inselberg, A. (2001). Visualization and knowledge discovery for high dimensional data. Proceedings of 2nd IEEE UIDIS International Workshop (pp. 5-24), Zurich, Switzerland.

Keim, D. A. (1997). Visual data mining. Tutorial at 23rd VLDB International Conference, Athens, Greece. Retrieved from http://www.dbs.informatik.uni-muenchen.de/daniel/VLDB-Tutorial.ps

Kimball, R. (1996). The data warehouse toolkit. John Wiley & Sons.

Koudas, N., Muthukrishnan, S., & Srivastava, D. (2000). Optimal histograms for hierarchical
range queries. Proceedings of 19th ACM PODS International Symposium (pp. 196-204), Dallas, TX, USA.

Lehner, W., Albrecht, J., & Wedekind, H. (1998). Normal forms for multivariate databases. Proceedings of 10th IEEE SSDBM International Conference (pp. 63-72), Capri, Italy.

Lenz, H. J., & Shoshani, A. (1997). Summarizability in OLAP and statistical databases. Proceedings of 9th IEEE SSDBM International Conference (pp. 132-143), Olympia, WA, USA.

Lenz, H. J., & Thalheim, B. (2001). OLAP databases and aggregation functions. Proceedings of 13th IEEE SSDBM International Conference (pp. 91-100), Fairfax, VA, USA.

Maniatis, A., Vassiliadis, P., Skiadopoulos, S., & Vassiliou, Y. (2003a). CPM: A cube presentation model for OLAP. Proceedings of 5th DaWaK International Conference (pp. 4-13), Prague, Czech Republic.

Maniatis, A., Vassiliadis, P., Skiadopoulos, S., & Vassiliou, Y. (2003b). Advanced visualization for OLAP. Proceedings of 6th ACM DOLAP International Workshop (pp. 9-16), New Orleans, LA, USA.

Muralikrishna, M., & DeWitt, D. J. (1998). Equi-depth histograms for estimating selectivity factors for multi-dimensional queries. Proceedings of 1998 ACM SIGMOD International Conference (pp. 28-36), Seattle, WA, USA.

OLAP Council. (1998). Analytical processing benchmark 1, Release II. Retrieved from http://www.symcorp.com/downloads/OLAP_Council-WhitePaper.pdf

Piatetsky-Shapiro, G., & Connell, C. (1984). Accurate estimation of the number of tuples satisfying a condition. Proceedings of 1984 ACM SIGMOD International Conference (pp. 265-275), Boston, MA, USA.

Poosala, V., & Ioannidis, Y. (1997). Selectivity estimation without the attribute value independence assumption. Proceedings of 23rd VLDB International Conference (pp. 486-495), Athens, Greece.

Thanh Binh, N., Min Tjoa, A., & Wagner, R. (2000). An object oriented multidimensional data model for OLAP. Proceedings of 1st WAIM International Conference (pp. 69-82), Shanghai, China.

Transaction Processing Council. (2006). TPC benchmark H. Retrieved from http://www.tpc.org/tpch/

Tsois, A., Karayannidis, N., & Sellis, T. (2001). MAC: Conceptual data modeling for OLAP. Proceedings of 3rd DMDW International Workshop, Interlaken, Switzerland. Retrieved from http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-39/paper5.pdf

University of California, Irvine. (2001). 1990 US Census Data. Retrieved from http://kdd.ics.uci.edu/databases/census1990/USCensus1990.html

University of California, Irvine. (2005). Knowledge discovery in databases archive. Retrieved from http://kdd.ics.uci.edu/

University of Hannover, Germany. (2005). 2D, 3D, and high-dimensional data and information visualization research group. Retrieved from http://www.iwi.uni-hannover.de/lv/seminar_ss05/bartke/home.htm

Vassiliadis, P. (1998). Modeling multidimensional databases, cubes, and cube operations. Proceedings of 10th IEEE SSDBM International Conference (pp. 53-62), Capri, Italy.

Vassiliadis, P., & Sellis, T. (1999). A survey of logical models for OLAP databases. SIGMOD Record, 28(4), 64-69.

Vitter, J. S., Wang, M., & Iyer, B. (1998). Data cube approximation and histograms via wavelets. Proceedings of 7th ACM CIKM International Conference (pp. 96-104), Bethesda, MD, USA.

Xin, D., Han, J., Cheng, H., & Li, X. (2006). Answering top-k queries with multi-dimensional selections: The ranking cube approach. Proceedings of 32nd VLDB International Conference (pp. 463-475), Cairo, Egypt.

Zipf, G. K. (1949). Human behaviour and the principle of least effort. Addison-Wesley.


Alfredo Cuzzocrea received the Laurea degree in computer science engineering in April 2001 and the PhD
degree in computer science and system engineering in February 2005, both from the University of Calabria.
Presently, he is a research fellow at the Department of Electronics, Computer Science and Systems of the
University of Calabria, where he is a member of the Database Research Group. He also actively collaborates
with the Institute of High Performance Computing and Networking of the Italian National Research Council.
His research interests include multidimensional data modeling and querying, data stream modeling and
querying, data warehousing and OLAP, XML data management, Web information systems modeling and
engineering, knowledge representation and management models and techniques, and Grid and P2P computing.
He is author or co-author of more than 60 papers in refereed international conferences (including SSDBM,
DEXA, DaWaK, DOLAP, IDEAS, SEKE, WISE, FQAS, SAC) and international journals (including JIIS,
DKE, WIAS). He serves as program committee member of refereed international conferences (including
ICDM, CIKM, PAKDD, DaWaK, SAC) and as review board member of international journals (including
TODS, TKDE, INS, IJSEKE, FGCS).

Domenico Saccà was born in Catanzaro (Italy) on November 5, 1950, and received a Doctoral degree
in Engineering from the University of Rome in 1975. Since 1987, he has been full professor of Computer
Engineering at the University of Calabria and, since 2002, he has also been Director of the CNR (the Italian
National Research Council) Research Institute ICAR (Institute for High Performance Computing and Networking),
located in Rende (CS) with branches in Naples and Palermo. Previously, from 1995 on, he was director of
the CNR Institute on System and Computer Sciences (Istituto per la Sistemistica e l’Informatica, ISI-CNR).
In the past he was a visiting scientist at the IBM Laboratory of San Jose, at the Computer Science Department
of UCLA, and at the ICSI Institute of Berkeley; furthermore, he was a scientific consultant of MCC, Austin,
and manager of the Research Division of CRAI. His current research interests focus on advanced database
issues such as scheme integration in data warehousing, compressed representation of datacubes,
workflow and process mining, and logic-based database query languages. His list of publications contains more
than 100 papers in journals (including Journal of the ACM, SIAM Journal on Computing, ACM Transactions
on Database Systems, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions
on Software Engineering, Theoretical Computer Science, Journal of Computer and System Sciences, Acta
Informatica) and in proceedings of international conferences. He has been a member of the program
committees of several international conferences, director of international schools and seminars, and leader of
many national and international research projects.

Paolo Serafino received the Laurea degree in computer science engineering in July 2006. Presently, he
is a research collaborator at the Department of Electronics, Computer Science and Systems of the University
of Calabria.
