
International Conference on Computer Systems and Technologies - CompSysTech'07

Dimensional Hierarchies Implementation in Data Warehouse Logical Scheme Design
Anna Rozeva
Abstract: Hierarchies are a substantial part of the multidimensional view of data, which is based on
exploring measures of facts for a business or non-business domain along various dimensions. In data
warehousing and on-line analytical processing they provide for examining data at different levels of detail.
Several types of hierarchies are presented, together with issues concerning dependencies and summarizability
of data along the levels. A design mechanism for the implementation of dimensional hierarchies in a data warehouse
logical scheme is proposed. Algorithms for enforcing dependencies on dimensional hierarchies in order to
achieve correct summarizability of data are developed. An implementation of the algorithms as
procedures in the logical scheme's metadata is presented.
Key words: Data Warehouse, Logical Scheme, Dimensional Hierarchy, Summarizability,
Dependencies, Metadata, OLAP.

INTRODUCTION
A data warehouse (DW) serves analytical purposes by adopting the multidimensional
data model. It provides for exploring specific elements of analysis (measures) from
different perspectives (dimensions). Dimension attributes either form a hierarchy or are
just descriptive. Dimensional hierarchies allow for obtaining views of data with different
granularity, i.e. summarized or detailed, through roll-up and drill-down operations
respectively. The relational approach has proved to be common for implementing the
multidimensional model. Besides its well-known advantages concerning strategies for
storing data, standardization and tool-independence, the relational model provides for the
representation of hierarchies at the logical level as well. The DW structure of fact and
dimension tables is most often implemented at the logical level as a star or snowflake scheme.
The snowflake scheme, in which the dimension tables are normalized, highlights the
hierarchies among dimension members better.
On-line analytical processing (OLAP) allows dynamic manipulation of the data
contained in the DW. Aggregations on DW data can be computed by enforcing
summarizability in dimension hierarchies [1]. Summarizability, i.e. correctness of aggregations,
requires that facts map directly to the lowest-level dimension values and to a single value per
dimension, and that dimensional hierarchies are balanced trees. A properly designed
hierarchical structure of dimensions ensures correct and efficient calculation of
aggregation functions and reduces logical errors. Hierarchies have to be modeled at the
logical level in order to ensure a structure that is consistent and coherent. Different types of
hierarchies and issues concerning correctness of aggregations are presented in the next
section. Dependencies to be enforced on asymmetric hierarchies for ensuring
summarizability are pointed out. A mechanism for the implementation of hierarchies in a relational
logical scheme is proposed. Algorithms for enforcing the stated dependencies are
designed. The implementation of the algorithms as procedures for hierarchy generation and
manipulation, to be included in the DW logical scheme's metadata, is highlighted.
DIMENSION HIERARCHIES - STRUCTURE, SUMMARIZABILITY, DEPENDENCIES
A categorization of dimensional hierarchies has been presented in [2]. A simple
hierarchy has a tree structure generated for the instances with one-to-many parent-child
relationships. A simple hierarchy is symmetric if there exists a single path from the bottom-level
members to the top and all levels are mandatory. It is fully summarizable and the
aggregation of measures along the levels is straightforward. An example of a simple and
symmetric hierarchy is shown in fig.1. A simple hierarchy is asymmetric when not all
levels are mandatory. There may be paths not covering all levels or there may be parent
levels without children. These types of asymmetry are shown in fig.1. Such hierarchies
are not fully summarizable.

Fig.1. Symmetric dimensional hierarchy and asymmetric dimensional hierarchies


Hierarchies are multiple if multiple paths exist from the bottom level to the top or in case
of many-to-one parent-child relationships among levels. Multiple hierarchies are shown in
fig.2. These hierarchies are not fully summarizable either.

Fig.2. Multiple dimensional hierarchies


Correct summarizability as the most essential design goal for dimension hierarchies
in OLAP and the conditions for ensuring it have been discussed in [5]. The conditions refer to level
attributes and demand that they are disjoint and complete, i.e. every element has to be
assigned to an element on the level above. The verification of these conditions is
performed by analysis of dependencies among the attributes of a dimension. In a relational
DW logical scheme a dimension represents a relation over a set of attributes. Consequently,
a dimension hierarchy involves a parent-child relationship between two columns of the
table's relation. In order to define a hierarchy that complies with the summarizability
conditions, inclusion dependencies should hold in the relation. The following dependencies
[5] can be enforced on a relation for ensuring correctness of aggregations along the levels:

Transitive anti-closure dependency,
Functional dependency,
Non-raggedness dependency,
Balance dependency.
For multiple hierarchies, a constraint that removes multiple paths from a parent to a
child level, such as the one from fig.3, is posed by the transitive anti-closure dependency. It
prevents a roll-up path from bypassing intermediate nodes. Thus, if there is a longer path
between two nodes, the direct path between them is not allowed. When the graph from
fig.3 is forced with this dependency, the direct path between nodes 1 and 4 is excluded. Tree-like
hierarchies are often more useful for OLAP analysis. Functional dependency forces a
hierarchy to be a tree. It enforces the rule that each child has a single parent. A relation
obeys a functional dependency $C \rightarrow P$, where $C$ is the child and $P$ is the parent level, if for each
pair of rows $r_1$, $r_2$ it holds that $r_1[C] = r_2[C]$ implies $r_1[P] = r_2[P]$. As far as the
non-raggedness dependency is concerned, a relation obeys it if all children of a parent are on
one and the same level. Let $L$ be a level identifier. A relation $r$ obeys the non-raggedness
dependency on $C$, $P$ and $L$ if the child-parent relationship $C \rightarrow P$ in $r$ is a hierarchy where
for all tuples $t$ and $t'$ that are children of the same parent it holds that $t[L] = t'[L]$. This
dependency ensures that there exists a node on every hierarchy level in each roll-up path. The
non-raggedness dependency ensures that no levels can be bypassed in roll-up paths, but it still does not
provide for correct aggregations if the hierarchy is unbalanced, i.e. the leaf nodes are not on the
same level. The relation $r$ obeys the balance dependency on $C$, $P$ and $L$ if the child-parent
relationship in $r$ is a hierarchy $h$ and for all tuples $t, t' \in Leaves(h)$ it holds that $t[L] = t'[L]$.
When the stated dependencies are enforced on a dimensional hierarchy, correctness of aggregations
on every level is guaranteed.
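The functional, non-raggedness and balance conditions can be illustrated with a minimal Python sketch. It is not the procedure set of the paper: it assumes that the hierarchy is held as (child, parent, level) rows, that the level identifier belongs to the child and grows from the top level downward, and that the top member appears only as a parent. All function names are hypothetical.

```python
def children_with_multiple_parents(rows):
    """Functional dependency check: children that have more than one parent."""
    parents = {}
    for child, parent, _level in rows:
        parents.setdefault(child, set()).add(parent)
    return {child for child, found in parents.items() if len(found) > 1}


def parents_with_ragged_children(rows):
    """Non-raggedness check: parents whose children are not all on one level."""
    levels = {}
    for _child, parent, level in rows:
        levels.setdefault(parent, set()).add(level)
    return {parent for parent, found in levels.items() if len(found) > 1}


def is_balanced(rows):
    """Balance check: all leaves (children that are never parents) share one level."""
    parents = {parent for _child, parent, _level in rows}
    leaf_levels = {level for child, _parent, level in rows if child not in parents}
    return len(leaf_levels) <= 1


# Assumed example: A is the top member; E skips level 2 and C is a leaf on level 2.
rows = [("B", "A", 2), ("C", "A", 2), ("D", "B", 3), ("E", "A", 3)]
no_multiple_parents = children_with_multiple_parents(rows) == set()
ragged_parents = parents_with_ragged_children(rows)
balanced = is_balanced(rows)
```

In this assumed example every child has a single parent, parent A is reported as ragged because its children lie on levels 2 and 3, and the hierarchy is unbalanced because leaf C is on level 2 while leaves D and E are on level 3.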
DIMENSIONAL HIERARCHIES DESIGN MECHANISM
The purpose of a hierarchy is to provide a navigational structure for a dimension so
that measure values with different levels of aggregation can be obtained by drilling down or
rolling up. As shown in the previous section, it has to be a non-ragged and balanced tree
with measure values associated with the leaves. Techniques for representing dimensional
hierarchies have been discussed in [2], [3] and [4]. The mapping of different types of
hierarchies into relations, together with mapping rules, is presented in [2]; a primitive for generation
of a de-normalized hierarchy relation as a part of a mechanism for DW logical scheme design
is to be found in [4]; while in [3] techniques for transforming hierarchies in order to ensure
summarizability have been treated from the viewpoint of implementing an OLAP visual
interface. In the current paper we present a mechanism for designing dimensional
hierarchies as part of a relational DW logical scheme. In our previous work [6] a mechanism
for computing aggregations as part of the DW logical scheme design has been presented.
In that work functional dependencies in dimensional hierarchies have been stated as
invariants providing for scheme consistency. The conceptual design of a dimensional hierarchy
and a procedure for mapping it to a de-normalized relation have been presented in [7]. The design
approach chosen for dimensional hierarchies in our current work involves representing the
hierarchy structure by a table and enforcing the previously discussed dependencies on it.
The structure of the hierarchy table is presented in fig.3. The records describe child level
members with parent and level identifier. The table's content in fig.3 corresponds to the
dimension hierarchy with multiple paths from fig.2.

Fig.3. Dimensional hierarchy table


The basics for creating a dimensional hierarchy table are the following:
Source - fields of tables from the DW logical scheme.
Method of creation - level by level, bottom-up.
Location of candidate levels - separate tables (a normalized, snowflake DW logical
scheme is examined).
Elaboration of levels - traversing primary-foreign key relationships in the scheme.
Lowest level - foreign key in the fact table.
Child members - dimension table primary key; parent members - foreign keys in the
dimension table.
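The bottom-up creation outlined in the list above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the MySQL procedures of the paper: the snowflake scheme is described by two hypothetical dictionaries (table contents and foreign key metadata), every level table has a primary key column named "id", and level identifiers are numbered from the top level (1) downward.

```python
def generate_hierarchy_table(tables, parent_fk, lowest_table):
    """Build (child, parent, level) rows level by level, bottom-up.

    tables:       table name -> list of records (dicts) with primary key "id"
    parent_fk:    table name -> (foreign key column, referenced parent table)
    lowest_table: dimension table referenced by the foreign key of the fact table
    """
    # Follow the primary-foreign key relationships from the lowest level to the top.
    chain = [lowest_table]
    while chain[-1] in parent_fk:
        chain.append(parent_fk[chain[-1]][1])

    rows = []
    depth = len(chain)
    for steps_up, table in enumerate(chain[:-1]):
        fk_column, _parent_table = parent_fk[table]
        level = depth - steps_up  # top level = 1, lowest level = depth
        for record in tables[table]:
            # Child member = primary key value, parent member = foreign key value.
            rows.append((record["id"], record[fk_column], level))
    return rows


# Assumed snowflake dimension: city -> region -> country.
tables = {
    "city": [{"id": "c1", "region_id": "r1"}, {"id": "c2", "region_id": "r1"}],
    "region": [{"id": "r1", "country_id": "n1"}],
    "country": [{"id": "n1"}],
}
parent_fk = {"city": ("region_id", "region"), "region": ("country_id", "country")}
hierarchy = generate_hierarchy_table(tables, parent_fk, "city")
```

The members of the top table (here "country") appear only as parents, so no row is generated with them as children.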
A dimensional hierarchy is created by the procedures Create Hierarchy
Table, Traverse Relationship and Generate Hierarchy Table, which are implemented as metadata
and presented in Table 1 in the last section. Hierarchies are tested for violation of the
summarizability conditions according to the cases of asymmetry and multiplicity shown in fig.1
and fig.2. Algorithms are designed to check a hierarchy whose structure is represented by
the table from fig.3 for asymmetry. According to the type of asymmetry or multiplicity, the
algorithm enforces the corresponding dependency. The algorithm for shorter path elimination
is shown in fig.4.

Fig.4. Algorithm for shorter path elimination


The algorithm checks Child values for multiple parents and evaluates the parents' Level
identifiers. If the level identifiers are different, the record with
the higher level value is selected and deleted. It represents the shorter path to the node
examined.
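The effect of shorter path elimination can be illustrated with a small Python sketch. Note that it swaps in a different technique from the level comparison described above: instead of comparing level identifiers, it applies the transitive anti-closure rule directly and deletes a record when its parent can also be reached through another parent of the same child. The (child, parent, level) row layout and the function name are assumptions.

```python
def eliminate_shorter_paths(rows):
    """Drop records whose parent is also reachable through a longer roll-up path."""
    parents = {}
    for child, parent, _level in rows:
        parents.setdefault(child, set()).add(parent)

    def reachable(start, target):
        """True if target is start itself or can be reached by rolling up from start."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        return False

    kept = []
    for child, parent, level in rows:
        other_parents = parents[child] - {parent}
        if any(reachable(other, parent) for other in other_parents):
            continue  # the direct path bypasses intermediate nodes, so it is removed
        kept.append((child, parent, level))
    return kept


# Assumed example: node 4 rolls up to node 1 both directly and through 3 and 2.
rows = [(2, 1, 2), (3, 2, 3), (4, 3, 4), (4, 1, 4)]
cleaned = eliminate_shorter_paths(rows)
```

In the assumed example the record (4, 1, 4), which represents the direct path between nodes 1 and 4, is removed, while the longer path through nodes 3 and 2 is kept.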
Functional dependency on the hierarchy is enforced by eliminating many-to-one
relationships between parent and child levels. The approach adopted in our mechanism is
to create a separate dimension from the parent level with multiple children. The initial
dimension in Step 3 is restricted to the level of the child with multiple parents - this is the
lowest level, with the member denoted by 5 to the right in fig.2. The new dimension involves
the rest of the levels - the ones with members denoted by 3, 2, and 1. In Step 4 a dimension
table is created for the new dimension. It is populated with instances for the levels and their
members, and its primary key is defined as the field representing the lowest level. The hierarchy
structure table for the new dimension is set up in Step 7. The fact table structure is
modified with a field that will relate to the newly created dimension - Step 9. A record is
inserted in Step 10 with the value of the newly created foreign key equal to the primary key of
the new dimension. The functional dependency procedure is shown in fig.5.

Fig.5. Algorithm for many-to-one relationship elimination


The non-raggedness dependency concerns the examination of roll-up paths for missing
levels. Therefore, parents with multiple children are looked for and the child level identifiers are
examined. When these level identifiers are different, the path involving the lower-level child
indicates a missing level, or gap. The dependency is enforced by creating an artificial child
value for the parent being examined and inserting a row in the hierarchy table for it. The
parent value in the record representing the path with the gap is updated with the artificial value
created. The algorithm for enforcing the dependency is shown in fig.6.
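A minimal Python sketch of the gap-filling idea follows. It is not the algorithm of fig.6 itself: it assumes (child, parent, level) rows with level identifiers growing from the top downward, and the naming of artificial members is hypothetical. As in the description above, only siblings are compared, so a gap on a path whose nodes have no siblings is not detected by this sketch.

```python
def fill_gaps(rows):
    """Insert artificial members so that all children of a parent share one level."""
    rows = list(rows)
    counter = 0
    changed = True
    while changed:
        changed = False
        # Group record positions by parent value.
        by_parent = {}
        for index, (_child, parent, _level) in enumerate(rows):
            by_parent.setdefault(parent, []).append(index)
        for parent, indexes in by_parent.items():
            top_level = min(rows[i][2] for i in indexes)
            gapped = [i for i in indexes if rows[i][2] > top_level]
            if not gapped:
                continue
            # Create an artificial child of the parent on the level that was skipped.
            counter += 1
            artificial = f"artificial_{counter}"
            rows.append((artificial, parent, top_level))
            # Re-point the records of the gapped paths to the artificial member.
            for i in gapped:
                child, _old_parent, level = rows[i]
                rows[i] = (child, artificial, level)
            changed = True
            break  # regroup, since the parent values have changed
    return rows


# Assumed example: E is attached to A directly and skips level 2.
rows = [("B", "A", 2), ("E", "A", 3), ("D", "B", 3)]
filled = fill_gaps(rows)
```

After the call, E rolls up to the artificial member on level 2, which in turn rolls up to A, so every roll-up path passes through a node on each level.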

Fig.6. Algorithm for filling gaps in roll-up paths


The balance dependency is enforced by the algorithm shown in fig.7. It looks
for child values that do not exist as parent values and are not at the lowest hierarchy level.
Having found such children, starting from the one at the uppermost level, artificial child
values are created by inserting records in the hierarchy table such that: Parent = the located
childless value, Child = artificial, LevelId = childless value LevelId + 1. Artificial child
values are created until the lowest level is reached.
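The balancing step can be sketched in Python as follows, again assuming (child, parent, level) rows in which the level identifier belongs to the child and grows downward. The function name and the naming scheme for artificial members are hypothetical, and the sketch only approximates the algorithm of fig.7.

```python
def balance(rows):
    """Extend childless members above the lowest level with artificial children."""
    rows = list(rows)
    lowest = max(level for _child, _parent, level in rows)
    parents = {parent for _child, parent, _level in rows}
    # Childless members that are not on the lowest level, uppermost level first.
    shallow = sorted({(level, child) for child, _parent, level in rows
                      if child not in parents and level < lowest})
    for level, child in shallow:
        parent = child
        for new_level in range(level + 1, lowest + 1):
            # Parent = childless value, Child = artificial, LevelId = parent level + 1.
            artificial = f"{child}_artificial_{new_level}"
            rows.append((artificial, parent, new_level))
            parent = artificial
    return rows


# Assumed example: C is childless on level 2 while the lowest level is 3.
rows = [("B", "A", 2), ("C", "A", 2), ("D", "B", 3)]
balanced_rows = balance(rows)
```

In the assumed example one artificial child of C is inserted on level 3, after which all leaves lie on the lowest level.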

Fig.7. Algorithm for making child level identifiers equal


IMPLEMENTATION OF DIMENSIONAL HIERARCHIES IN DW METADATA
Issues concerning DW metadata have been considered in [8]. Metadata updates
concerning aggregate fact table design and implementation are presented in [6]. Metadata
that refer to hierarchy establishment and management consist of procedures and
implementation rules. Procedures in MySQL have been created according to the algorithms for
enforcing summarizability in hierarchies that have been presented in the previous section.
Procedures for hierarchy generation and manipulation are proposed as metadata as well.
They are shown in Table 1 with names and descriptions. The rules proposed as metadata refer
to obtaining aggregate values along hierarchy roll-up paths with artificial values.
Table 1: DW metadata for dimensional hierarchies
Procedure                           Objective
Traverse Relationship ()            Obtain the primary key of a related table from relationship between tables
Create Hierarchy Table ()           Creates a table for dimension hierarchy
Generate Hierarchy Table ()         Records are inserted into the hierarchy table by traversing primary-foreign key relationships between tables
Child with Multiple Parents ()      Hierarchy table is checked for children having more than one parent
Parent with Multiple Children ()    Parents with more than one child are looked for
The following rules concerning correct aggregation are included in DW metadata
when artificial parent and child values are inserted in the hierarchy structure:
For a parent with an artificial child, include its own value in the aggregation;
Values for artificial parents are obtained by aggregating their children's values;
Update metadata with foreign key information when creating a separate dimension by
enforcing functional dependency.
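One possible reading of the first two rules is sketched below in Python. It is an interpretation, not the metadata rules as implemented by the author: it assumes a tree-shaped hierarchy held as (child, parent, level) rows, a hypothetical mapping of members to measure values attached directly to them, and summation as the aggregation function. Artificial members carry no values of their own.

```python
def make_roll_up(rows, own_values):
    """Return a function computing the aggregated (summed) value of a member."""
    children = {}
    for child, parent, _level in rows:
        children.setdefault(parent, set()).add(child)

    def total(member):
        # Own value (0 for artificial members) plus the totals of all children.
        own = own_values.get(member, 0)
        return own + sum(total(child) for child in children.get(member, ()))

    return total


# Assumed example: art_1 is an artificial parent of E, C_art an artificial child of C.
rows = [("B", "A", 2), ("art_1", "A", 2), ("C", "A", 2),
        ("D", "B", 3), ("E", "art_1", 3), ("C_art", "C", 3)]
own_values = {"D": 10, "E": 7, "C": 5}
total = make_roll_up(rows, own_values)
```

Here the value of the artificial parent art_1 is obtained from its child E, while C, which has only an artificial child, contributes the value attached to it directly. The sketch counts a member once per roll-up path, so it presumes that the functional dependency has already been enforced.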
CONCLUSIONS AND FUTURE WORK
The hierarchical structure defined for dimension members provides for aggregation
of measures through the OLAP roll-up operation. Asymmetric and multiple hierarchies violate
summarizability conditions. Dependencies have been examined as transformations to be
performed upon them in order to ensure correctness of aggregations. The contributions of the
paper are the elaboration of a mechanism for the design and implementation of dimensional
hierarchies as tables in the DW logical scheme; the algorithms for verifying and enforcing
summarizability in asymmetric and multiple hierarchies; the implementation of the
algorithms as DW metadata procedures; and the establishment of rules for aggregating
artificial child and parent values.
A matter of future interest is the investigation of quality factors referring to DW
logical design. Future work is therefore intended to further tune the structure of the
hierarchical table when aggregating artificial parent and child values by considering
multidimensional normal forms.
REFERENCES
[1] Hurtado, C., A. Mendelzon. Reasoning about Summarizability in Heterogeneous
Multidimensional Schemas. In ICDT, Proceedings of the 8th International Conference on
Database Theory, pp. 375-389. 2001
[2] Malinowski, E., E. Zimanyi. Hierarchies in a Multidimensional Model: From
Conceptual modeling to Logical Representation. Accepted for publication in Data &
Knowledge Engineering. 2005
[3] Mansmann, S., M. Scholl. Extending Visual OLAP for Handling Irregular
Dimensional Hierarchies. A Min Tjoa and J. Trujillo (Eds.): DaWaK 2006, LNCS 4081, pp.
95-105, Springer Verlag Berlin Heidelberg. 2006
[4] Marotta, A., Ruggia, R. Data Warehouse Design: A Schema Transformation
Approach. 22nd International Conference of the Chilean Computer Science Society (SCCC
2002), Copiapo, Chile. 2002
[5] Niemi, T., J. Nummenmaa, P. Thanisch. Logical Multidimensional Database
Design for Ragged and Unbalanced Aggregation Hierarchies. In Proceedings of the
International Workshop on Design and Management of Data Warehouses. Interlaken,
Switzerland, June 4. 2001
[6] Rozeva, A. Implementation of Aggregations in a Data Warehouse Logical Scheme
Framework and Mechanism. Third International Scientific Conference Computer
Science'2006, Istanbul, Turkey, Oct. 12-15, vol. II, pp. 330-335. 2006
[7] Rozeva, A. Data Warehousing Conceptual Scheme Design And Mapping Into
Relational Logical Scheme. Proceedings of the International Conference Automatics and
Informatics'06, Bulgaria, Sofia, Oct. 3-6, pp. 149-152. 2006
[8] Wu, L., L. Miller, S. Nilakanta. Design of Data Warehouses Using Metadata.
Information and Software Technology 43, pp 109-119. 2001
ABOUT THE AUTHOR
Assoc.Prof. Anna Rozeva, PhD, Department of Computer Systems and Informatics,
University of Forestry, Sofia, Phone: +359 2 91907 340, e-mail: arozeva@ltu.bg.
