INTRODUCTION
A data warehouse (DW) serves analytical purposes by adopting the multidimensional
data model. It allows specific elements of analysis (measures) to be explored from
different perspectives (dimensions). Dimension attributes either form a hierarchy or are
purely descriptive. Dimensional hierarchies make it possible to obtain views of the data
at different granularity, i.e. summarized or detailed, through roll-up and drill-down
operations respectively. The relational approach is commonly used for implementing the
multidimensional model. Besides its well-known advantages concerning strategies for
storing data, standardization and tool independence, the relational model also allows
hierarchies to be represented at the logical level. The DW structure of fact and dimension
tables is most often implemented at the logical level through a star or a snowflake scheme.
The snowflake scheme, in which the dimension tables are normalized, highlights the
hierarchies among dimension members better.
On-line analytical processing (OLAP) allows dynamic manipulation of the data
contained in the DW. Aggregations on DW data can be computed correctly by enforcing
summarizability in dimension hierarchies [1]. Summarizability, i.e. correctness of
aggregations, requires that facts map directly to the lowest-level dimension values and to
a single value per dimension, and that dimensional hierarchies are balanced trees. A
properly designed hierarchical structure of dimensions ensures correct and efficient
calculation of aggregation functions and reduces logical errors. Hierarchies have to be
modeled at the logical level in order to ensure a structure that is consistent and coherent.
Different types of hierarchies and issues concerning the correctness of aggregations are
presented in the next section. The dependencies to be enforced on asymmetric hierarchies
in order to ensure summarizability are pointed out. A mechanism for implementing
hierarchies in a relational logical scheme is proposed. Algorithms for enforcing the stated
dependencies are designed. The implementation of the algorithms as procedures for
hierarchy generation and manipulation, to be included in the metadata of the DW logical
scheme, is highlighted.
DIMENSION HIERARCHIES: STRUCTURE, SUMMARIZABILITY, DEPENDENCIES
A categorization of dimensional hierarchies is presented in [2]. A simple
hierarchy has a tree structure whose instances are related by one-to-many parent-child
relationships. A simple hierarchy is symmetric if there is a single path from the
bottom-level members to the top and all levels are mandatory. It is fully summarizable and
the aggregation of measures along the levels is straightforward. An example of a simple
symmetric hierarchy is shown in fig.1. A simple hierarchy is asymmetric when not all
levels are mandatory. There may be paths that do not cover all levels, or there may be
parent-level members without children. These types of asymmetry are shown in fig.1. Such
hierarchies are not fully summarizable.
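To illustrate why a skipped level breaks summarizability, the following minimal sketch
rolls up a measure along a hypothetical store, city, country hierarchy in which one store
has no city. The member names, the sales values and the helper functions are assumptions
made for illustration only; they are not taken from the figures.

```python
from collections import defaultdict

# Hypothetical dimension instance: store -> city (None when the city level is skipped).
store_city = {"S1": "Sofia", "S2": "Sofia", "S3": "Varna", "S4": None}
city_country = {"Sofia": "BG", "Varna": "BG"}
store_country_direct = {"S4": "BG"}  # S4 rolls up straight to the country
sales = {"S1": 10, "S2": 20, "S3": 30, "S4": 40}


def roll_up_to_city(sales, store_city):
    """Aggregate sales per city; stores without a city cannot be placed."""
    totals = defaultdict(int)
    for store, amount in sales.items():
        city = store_city.get(store)
        if city is not None:
            totals[city] += amount
    return dict(totals)


def roll_up_to_country(sales, store_city, city_country, direct):
    """Aggregate sales per country directly from the leaf level."""
    totals = defaultdict(int)
    for store, amount in sales.items():
        city = store_city.get(store)
        country = city_country[city] if city is not None else direct[store]
        totals[country] += amount
    return dict(totals)


city_totals = roll_up_to_city(sales, store_city)
country_totals = roll_up_to_country(sales, store_city, city_country, store_country_direct)

# Re-aggregating the city totals misses the store that bypassed the city level.
country_from_cities = defaultdict(int)
for city, amount in city_totals.items():
    country_from_cities[city_country[city]] += amount
country_from_cities = dict(country_from_cities)

print(country_totals)       # total computed from the leaves
print(country_from_cities)  # total computed from the intermediate level
```

The two country totals differ because the value of the store without a city never
reaches the city level, so an aggregate computed at that level cannot be reused further up.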
Functional dependency,
Non-raggedness dependency,
Balance dependency.
For multiple hierarchies, a constraint that removes multiple paths from a parent to a
child level, such as the one in fig.3., is imposed by the transitive anti-closure
dependency. It prevents a roll-up path from bypassing intermediate nodes: if there is a
longer path between two nodes, the direct path between them is not allowed. When this
dependency is enforced on the graph from fig.3., the direct path from node 1 to node 4 is
excluded. Tree-like hierarchies are often more useful for OLAP analysis. The functional
dependency forces the hierarchy into a tree by enforcing the rule that each child has a
single parent. A relation r obeys the functional dependency $C \rightarrow P$, where C is
the child level and P is the parent level, if for each pair of rows $t_1, t_2 \in r$,
$t_1[C] = t_2[C]$ implies $t_1[P] = t_2[P]$. A relation obeys the non-raggedness
dependency if all children of a parent are of one and the same level. Let L be a level
identifier. A relation r obeys the non-raggedness dependency on C, P and L if
$\pi_{C,P}(r)$ is a hierarchy in which, for every tuple t and all child tuples $t'$, $t''$
of t, it holds that $t'[L] = t''[L]$. This dependency ensures that there is a node on
every hierarchy level in each roll-up path. The non-raggedness dependency thus ensures
that no level can be bypassed in a roll-up path, but it still does not guarantee correct
aggregations if the hierarchy is unbalanced, i.e. if the leaf nodes are not on the same
level. The relation r obeys the balance dependency on C, P and L if $\pi_{C,P}(r)$ is a
hierarchy h and for all tuples $t, t' \in Leaves(h)$ it holds that $t[L] = t'[L]$.
Enforcing the stated dependencies on a dimensional hierarchy guarantees the correctness of
aggregations on every level.
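The three dependencies can be checked mechanically on a relation of (child, parent,
level) rows. The sketch below is one possible reading of the definitions, not the
procedures of this paper: the `Row` type, the function names and the sample members are
hypothetical, the level identifier is assumed to be that of the child, and the top member
is assumed to have no parent.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Row(NamedTuple):
    child: str
    parent: Optional[str]  # None marks the top member (assumption)
    level: int             # level identifier L of the child (assumption)


def obeys_functional(rows):
    """C -> P: rows that agree on the child must agree on the parent."""
    parent_of = {}
    for r in rows:
        if parent_of.setdefault(r.child, r.parent) != r.parent:
            return False
    return True


def obeys_non_raggedness(rows):
    """All children of one parent carry the same level identifier."""
    levels = defaultdict(set)
    for r in rows:
        if r.parent is not None:
            levels[r.parent].add(r.level)
    return all(len(found) == 1 for found in levels.values())


def obeys_balance(rows):
    """All leaves (members that are nobody's parent) share one level."""
    parents = {r.parent for r in rows if r.parent is not None}
    leaf_levels = {r.level for r in rows if r.child not in parents}
    return len(leaf_levels) <= 1


balanced = [
    Row("All", None, 0),
    Row("A", "All", 1), Row("B", "All", 1),
    Row("a1", "A", 2), Row("b1", "B", 2),
]
unbalanced = balanced[:-1]                     # B is left without children
ragged = balanced + [Row("c1", "All", 2)]      # c1 skips the middle level
two_parents = balanced + [Row("a1", "B", 2)]   # a1 rolls up to both A and B

print(obeys_functional(two_parents), obeys_non_raggedness(ragged), obeys_balance(unbalanced))
```

Each check is deliberately independent of the others, so a violated dependency can be
reported by name before any transformation of the hierarchy is attempted.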
DIMENSIONAL HIERARCHIES DESIGN MECHANISM
The purpose of a hierarchy is to provide a navigational structure for a dimension so
that measure values at different levels of aggregation can be obtained by drilling down or
rolling up. As shown in the previous section, it has to be a non-ragged and balanced tree
with measure values associated with the leaves. Techniques for representing dimensional
hierarchies have been discussed in [2], [3] and [4]. The mapping of different types of
hierarchies into relations, together with mapping rules, is presented in [2]; a primitive
for generating a de-normalized hierarchy relation as part of a mechanism for DW logical
scheme design can be found in [4]; and in [3] techniques for transforming hierarchies in
order to ensure summarizability are treated from the viewpoint of implementing an OLAP
visual interface. In the current paper we present a mechanism for designing dimensional
hierarchies as part of a relational DW logical scheme. In our previous work [6] a mechanism
for computing aggregations as part of the DW logical scheme design was presented.
In that work functional dependencies in dimensional hierarchies were stated as
invariants ensuring scheme consistency. The conceptual design of a dimensional hierarchy
and a procedure for mapping it to a de-normalized relation were presented in [7]. The
design approach chosen for dimensional hierarchies in our current work involves
representing the hierarchy structure by a table and enforcing the previously discussed
dependencies on it. The structure of the hierarchy table is presented in fig.3. Each record
describes a child-level member together with its parent and level identifier. The table
content in fig.3. corresponds to the dimension hierarchy with multiple paths from fig.2.
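A hierarchy table of this kind could be declared as follows. This is a minimal sketch
using Python's built-in sqlite3 module; the table name `hierarchy`, the column names and
the sample members are assumptions, since the actual content of the figure is not
reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE hierarchy (
        child  TEXT NOT NULL,     -- child level member
        parent TEXT,              -- parent member; NULL for the top member
        level  INTEGER NOT NULL,  -- level identifier of the child
        UNIQUE (child, parent)
    )
    """
)

# Hypothetical content with multiple paths: member a1 has two parents.
sample_rows = [
    ("All", None, 0),
    ("A", "All", 1),
    ("B", "All", 1),
    ("a1", "A", 2),
    ("a1", "B", 2),
]
conn.executemany(
    "INSERT INTO hierarchy (child, parent, level) VALUES (?, ?, ?)", sample_rows
)

stored = conn.execute(
    "SELECT child, parent, level FROM hierarchy ORDER BY level, child, parent"
).fetchall()
for row in stored:
    print(row)
```

Keeping one row per (child, parent) pair lets a multiple hierarchy be stored as it is,
so that violations can be detected by queries on the table rather than prevented by its
structure.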
and presented in Table 1 shown in the last section. Hierarchies are tested for violation of
summarizability conditions following the cases of asymmetry and multiplicity shown in fig.1.
and fig.2. Algorithms are designed to check a hierarchy whose structure is represented by
the table from fig.3. for asymmetry. Depending on the type of asymmetry or multiplicity,
the algorithm enforces the corresponding dependency. The algorithm for shorter path
elimination is shown in fig.4.
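Shorter path elimination can be sketched as the removal of every direct edge for which a
longer roll-up path exists. The code below is an illustrative reading of the transitive
anti-closure dependency, not a transcription of the algorithm in the figure; the function
names and the sample edges are hypothetical, and the hierarchy is assumed to be acyclic.

```python
def has_longer_path(parents_of, start, target):
    """Return True if target is reachable from start by a path of two or more edges."""
    # Begin at the other direct parents of start, so the direct edge itself is never used.
    stack = [p for p in parents_of.get(start, ()) if p != target]
    seen = set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents_of.get(node, ()))
    return False


def eliminate_shorter_paths(edges):
    """Remove each direct edge that is duplicated by a longer roll-up path."""
    parents_of = {}
    for child, parent in edges:
        parents_of.setdefault(child, set()).add(parent)
    return {(c, p) for c, p in edges if not has_longer_path(parents_of, c, p)}


# Hypothetical graph: node 1 reaches node 4 both directly and through nodes 2 and 3.
edges = {(1, 2), (2, 3), (3, 4), (1, 4)}
reduced = eliminate_shorter_paths(edges)
print(sorted(reduced))
```

Every edge is tested against the original edge set, which for an acyclic graph yields
the same result regardless of the order in which redundant edges are removed.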
Procedure | Objective
[...] | Obtain the primary key of a related table from the relationship between tables
Create Hierarchy Table () | Creates a table for the dimension hierarchy
Generate Hierarchy Table () | Records are inserted into the hierarchy table by traversing primary-foreign key relationships between tables
Child with Multiple Parents () | The hierarchy table is checked for children having more than one parent
Parent with Multiple Children () | Parents with more than one child are looked for
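The procedures listed above could be approximated in SQL roughly as follows. This is a
sketch under stated assumptions, using Python's built-in sqlite3 module: the snowflake
tables `city` and `country`, their columns, the `hierarchy` table layout and all data are
hypothetical, and the queries are only plausible counterparts of the named procedures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical snowflake dimension tables linked by a foreign key.
    CREATE TABLE country (country_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (
        city_id    INTEGER PRIMARY KEY,
        name       TEXT,
        country_id INTEGER REFERENCES country(country_id)
    );
    INSERT INTO country VALUES (1, 'Bulgaria'), (2, 'Turkey');
    INSERT INTO city VALUES (10, 'Sofia', 1), (11, 'Varna', 1), (12, 'Istanbul', 2);

    -- Counterpart of "Create Hierarchy Table ()".
    CREATE TABLE hierarchy (child TEXT, parent TEXT, level INTEGER);
    """
)

# Counterpart of "Generate Hierarchy Table ()": follow the city -> country key.
conn.execute(
    """
    INSERT INTO hierarchy (child, parent, level)
    SELECT city.name, country.name, 2
    FROM city JOIN country ON city.country_id = country.country_id
    """
)
conn.execute("INSERT INTO hierarchy (child, parent, level) SELECT name, NULL, 1 FROM country")

# Counterpart of "Child with Multiple Parents ()".
multi_parent_children = conn.execute(
    "SELECT child FROM hierarchy GROUP BY child HAVING COUNT(DISTINCT parent) > 1"
).fetchall()

# Counterpart of "Parent with Multiple Children ()".
multi_child_parents = conn.execute(
    """
    SELECT parent FROM hierarchy
    WHERE parent IS NOT NULL
    GROUP BY parent HAVING COUNT(DISTINCT child) > 1
    ORDER BY parent
    """
).fetchall()

print(multi_parent_children, multi_child_parents)
```

Because the rows are generated from a single foreign key, no child can have two parents
in this sample; the first check becomes relevant once several roll-up paths are loaded
into the same hierarchy table.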
The following rules concerning correct aggregation are included in the DW metadata
when artificial parent and child values are inserted in the hierarchy structure:
For a parent with an artificial child, include its own value in the aggregation;
Values for artificial parents are obtained by aggregating the values of their children;
Update the metadata with foreign key information when a separate dimension is created by
enforcing the functional dependency.
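The first two rules can be illustrated by a small recursive roll-up. The sketch assumes a
hierarchy that has already been padded with artificial members; the function `roll_up`,
the member names (with `*` marking artificial members) and the measure values are
hypothetical and serve only to show how the rules interact.

```python
def roll_up(member, children, own_value, artificial_children):
    """Aggregate a measure up to `member`, honouring the rules for artificial members."""
    kids = children.get(member, [])
    if not kids:
        # A real leaf carries the facts mapped to it; an artificial leaf carries nothing.
        return 0 if member in artificial_children else own_value.get(member, 0)
    total = sum(roll_up(k, children, own_value, artificial_children) for k in kids)
    if any(k in artificial_children for k in kids):
        # Rule 1: a parent with an artificial child includes its own value.
        total += own_value.get(member, 0)
    # Rule 2: an artificial parent has no own value, so its result is just `total`.
    return total


# Hypothetical padded hierarchy: B* is an artificial child of B,
# X* is an artificial parent inserted above c1.
children = {
    "All": ["A", "B", "X*"],
    "A": ["a1", "a2"],
    "B": ["B*"],
    "X*": ["c1"],
}
own_value = {"a1": 10, "a2": 20, "B": 5, "c1": 7}
artificial_children = {"B*"}

for member in ("A", "B", "X*", "All"):
    print(member, roll_up(member, children, own_value, artificial_children))
```

With the artificial members in place every roll-up path covers all levels, and the total
at the top level equals the sum of the totals at the level below it.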
CONCLUSIONS AND FUTURE WORK
The hierarchical structure defined for dimension members provides for the aggregation
of measures through the OLAP roll-up operation. Asymmetric and multiple hierarchies violate
summarizability conditions. Dependencies have been examined as transformations to be
performed on such hierarchies in order to ensure the correctness of aggregations. The
contributions of the paper are: the elaboration of a mechanism for the design and
implementation of dimensional hierarchies as tables in the DW logical scheme; the
algorithms for verifying and enforcing summarizability in asymmetric and multiple
hierarchies; the implementation of the algorithms as DW metadata procedures; and the
establishment of rules for aggregating artificial child and parent values.
A matter of future interest is the investigation of quality factors relating to DW
logical design. Future work is therefore aimed at further tuning the structure of the
hierarchy table when aggregating artificial parent and child values by considering
multidimensional normal forms.
REFERENCES
[1] Hurtado, C., A. Mendelzon. Reasoning about Summarizability in Heterogeneous
Multidimensional Schemas. In ICDT, Proceedings of the 8th International Conference on
Database Theory, pp. 375-389. 2001
[2] Malinowski, E., E. Zimanyi. Hierarchies in a Multidimensional Model: From
Conceptual modeling to Logical Representation. Accepted for publication in Data &
Knowledge Engineering. 2005
[3] Mansmann, S., M. Scholl. Extending Visual OLAP for Handling Irregular
Dimensional Hierarchies. A Min Tjoa and J. Trujillo (Eds.): DaWaK 2006, LNCS 4081, pp.
95-105, Springer Verlag Berlin Heidelberg. 2006
[4] Marotta, A., Ruggia, R. Data Warehouse Design: A Schema Transformation
Approach. 22nd International Conference of the Chilean Computer Science Society (SCCC
2002), Copiapo, Chile. 2002
[5] Niemi, T., J. Nummenmaa, P. Thanisch. Logical Multidimensional Database
Design for Ragged and Unbalanced Aggregation Hierarchies. In Proceedings of the
International Workshop on Design and Management of Data Warehouses. Interlaken,
Switzerland, June 4. 2001
[6] Rozeva, A. Implementation of Aggregations in a Data Warehouse Logical Scheme
Framework and Mechanism, Third International Scientific Conference Computer
Science'2006, Istanbul, Turkey, Oct. 12-15, vol. II, pp. 330-335. 2006
[7] Rozeva, A. Data Warehousing Conceptual Scheme Design And Mapping Into
Relational Logical Scheme, Proceedings of the International Conference Automatics and
Informatics'06, Sofia, Bulgaria, Oct. 3-6, pp. 149-152. 2006
[8] Wu, L., L. Miller, S. Nilakanta. Design of Data Warehouses Using Metadata.
Information and Software Technology 43, pp. 109-119. 2001
ABOUT THE AUTHOR
Assoc.Prof. Anna Rozeva, PhD, Department of Computer Systems and Informatics,
University of Forestry, Sofia, Phone: +359 2 91907 340, e-mail: arozeva@ltu.bg.