
Republic of Iraq

Imam Ja'far Al-Sadiq University (peace be upon him)

College: Information Technology

Department: Computer Technology Engineering

Stage: Third

Report title:

((Data Dependency))

Student name:

((Sajjad Hamed Redha Azrak))

Subject instructor:

((Zina Sattar))


Introduction
A data dependency in computer science is a situation in which
a program statement (instruction) refers to the data of a preceding
statement. In compiler theory, the technique used to discover data
dependencies among statements (or instructions) is called dependence
analysis.
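
To make the definition concrete, the short Python sketch below shows the three classic kinds of data dependence that dependence analysis looks for between statements: flow (read-after-write), anti (write-after-read), and output (write-after-write). The statements and variable names are illustrative only.

b, c = 2, 3

a = b + c   # S1: writes a
d = a * 2   # S2: reads a  -> flow (true/RAW) dependence on S1
a = 7       # S3: writes a -> anti (WAR) dependence on S2 and
            #    output (WAW) dependence on S1; moving S3 before
            #    S2 would change the value computed for d

print(d)    # prints 10 only if S2 runs after S1 and before S3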

Data Dependence and Data Independence

Applications implemented in pre-relational database systems are data-dependent, meaning that both the physical representation of the data and the methods of accessing it are built directly into the application code. This makes even the slightest change to the physical design of a database an extraordinarily laborious effort.

The main objective of relational DBMSs is data independence. For years, the
relational database management systems used to run businesses, often referred to as
OLTP systems, made data independence obligatory. In an OLTP database, data is
stored in nonredundant tables that demand that every column of the table be
rigorously related to its primary key alone and to no other tables. This ensures that
information is available to all applications and analyses that use it, and it provides a
mechanism for maintaining consistency and reliability across applications: a single
source of each particular data element, a single version of the truth.
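
As a rough illustration, the Python/SQLite sketch below sets up the kind of nonredundant layout described above; the table and column names are hypothetical. Each non-key column depends on its own table's primary key alone, and each data element is stored exactly once.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL,
    amount      REAL NOT NULL
);
""")
# Customer data lives in one place only, so every application and
# analysis that joins to it sees the same single version of the truth.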

Data independence works well for OLTP systems because the applications accessing
the data generally access single tables or join only a few, small tables in relatively
simple queries. With the introduction of the data warehouse, previously unheard-of
demands were placed on the relational database management systems underlying
them. In the data warehouse environment, large tables must be scanned and large
result sets are frequently returned. Many tables are joined together, complicated
calculations are made, and detailed data is aggregated directly in the queries. In
addition, large data volumes are extracted, transformed and loaded into the tables
concurrently with users running queries against the data. It quickly became apparent
that databases created and tuned for OLTP could not sustain the performance levels
required to support the demands of business intelligence processing. The OLTP
databases could not perform the queries within their allotted time window or, in some
cases, at all.
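
The sketch below, again Python/SQLite with hypothetical tables and data, shows the shape of such a workload: detail tables joined and aggregated directly in one query, rather than the single-table, few-row requests typical of OLTP.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY,
                   customer_id INTEGER REFERENCES customer(customer_id),
                   amount REAL);
INSERT INTO customer VALUES (1, 'North'), (2, 'South');
INSERT INTO sale VALUES (10, 1, 99.0), (11, 1, 25.0), (12, 2, 40.0);
""")
# Scan, join, and aggregate the detail data in a single query,
# as a business-intelligence workload would.
for region, total in con.execute("""
    SELECT c.region, SUM(s.amount)
    FROM sale AS s JOIN customer AS c USING (customer_id)
    GROUP BY c.region
"""):
    print(region, total)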

This situation highlights the potential for contradiction between designing databases
for optimum integrity and designing databases for optimum performance. The key to
data independence is data normalization, and normalized data schemas are the most
demanding of system performance.

To address the issue of poor performance, data independence has often been
abandoned in many environments and denormalized schemas have been used to
address a few particular, rather than all general, analytical needs of the enterprise.
Note: The term denormalized is used because of its familiarity in the industry, not because of its technical accuracy. As described elsewhere, normalization is a logical concept, not a physical one. Therefore, it is strictly incorrect to speak of denormalization in the context of physical database design.

Although this arrangement addresses short-term decision support needs, it compromises the enterprise view of the data and its adaptability. Data independence, adaptability, and cross-enterprise functionality go hand in hand, and a normalized data schema is critical to reaching these objectives.

The following topics provide some detail about why this is true.

Recursive Relationships

The star schema, which is the most common form of denormalization used in
contemporary data warehousing, cannot handle every kind of relationship that can
exist comfortably in a fully-normalized environment. Recursive relationships are one
such example. Recursion, as the term is generally used in computer science, is only a
small subset of the recursive function theory of formal logic.

A recursive relationship exists when the parent of a member in a hierarchical relationship is also a member of the same entity. As demonstrated by the following figure, there are two ways that this can manifest itself: with only a single parent or with multiple parents:

The most commonly used example of a single-parent recursive relationship is an employee table, where both an employee and that employee's manager have rows. From an E-R perspective, you would say a manager has employees. But managers, too, are employees. This also means that managers can have managers who are employees, and so on.
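
A minimal Python/SQLite sketch of this single-parent case follows; the employee table and its rows are hypothetical. A recursive common table expression walks the manager chain however deep it happens to be.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    manager_id INTEGER REFERENCES employee(emp_id)  -- self-reference
);
INSERT INTO employee VALUES
    (1, 'Alice', NULL),   -- top of the hierarchy
    (2, 'Bob',   1),
    (3, 'Carol', 2);
""")
# The recursive CTE follows manager_id upward from the root; no
# fixed number of levels is wired into the schema or the query.
for name, depth in con.execute("""
    WITH RECURSIVE chain(emp_id, name, depth) AS (
        SELECT emp_id, name, 0 FROM employee WHERE manager_id IS NULL
        UNION ALL
        SELECT e.emp_id, e.name, c.depth + 1
        FROM employee AS e JOIN chain AS c ON e.manager_id = c.emp_id
    )
    SELECT name, depth FROM chain
"""):
    print(depth, name)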

In the diagram, the single-parent recursive relationship is a customer table in which a customer can be a customer of yet another customer in the table. The classic multiple-parent recursive relationship is the bill of material. The diagram shows an example in which multiple organizations can have multiple organizational affiliations. Project work breakdown hierarchies are another common example of a multiple-parent recursive structure.

In a recursive structure, the number of levels is unlimited, and there is no way to know in advance how many levels each member hierarchy currently has or potentially can have. One hierarchy might have only two levels, while another might be 15 levels deep. Herein lies the limitation of the star schema for handling recursive relationships: it requires a fixed number of levels, because each level is set up by a series of fixed columns in a dimension table. Because you do not know the number of levels in a recursive structure, you cannot predefine the columns.
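
The contrast shows up directly in the dimension-table DDL a star schema would need; the sketch below uses hypothetical column names. Every hierarchy level is a fixed, named column, so the depth must be known at design time.

import sqlite3

con = sqlite3.connect(":memory:")
# Each hierarchy level is a fixed, named column; a member hierarchy
# deeper than three levels simply does not fit this table.
con.execute("""
CREATE TABLE org_dim (
    org_id  INTEGER PRIMARY KEY,
    level_1 TEXT,  -- e.g. division
    level_2 TEXT,  -- e.g. department
    level_3 TEXT   -- e.g. team; 15 levels would need 15 columns
)
""")
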
The most critical entities in an enterprise data model frequently have recursive
structures. Organizational hierarchies such as internal, customer, supplier, and
competitor entities are usually recursive relationships.

Arguments Against Denormalizing the Physical Database Schema to Increase Its Usability

Many data warehouse designers argue that denormalized physical database schemas
are easier for end users to navigate than fully normalized schemas.

Denormalized physical schemas certainly seem more user-friendly than the complexity of a highly generalized, normalized data model. However, denormalized physical schemas are driven by known queries, so their ease of use is somewhat illusory. Formulating queries to address novel requirements, a task that is nearly definitive of the data warehouse process model, is made more difficult, if not impossible, in a denormalized environment. A fully normalized enterprise data model is flexible enough to support the undertaking of any new analyses of the data.

That said, the reality is that end users typically do not write queries anyway, and when
they do, they are likely to use a third-party natural language query generator, so the
usability argument is often moot. Coding novel queries is often the responsibility of
an application developer or a natural language query writing tool.

More importantly, you can create “denormalized” views to implement a semantic layer that makes the normalized data model easier to navigate. Few sites permit users to query base tables directly anyway, so creating views on base tables that look exactly like their denormalized table counterparts should not be an issue.
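
A small Python/SQLite sketch of this approach, with hypothetical names: the normalized base tables stay intact, while a view presents the flat, denormalized shape users expect.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY,
                   customer_id INTEGER REFERENCES customer(customer_id),
                   amount REAL);
-- The view looks like a single flat table to end users, while the
-- data itself stays normalized (and consistent) in the base tables.
CREATE VIEW sale_flat AS
    SELECT s.sale_id, c.name AS customer_name, s.amount
    FROM sale AS s JOIN customer AS c USING (customer_id);
""")
con.execute("INSERT INTO customer VALUES (1, 'Acme')")
con.execute("INSERT INTO sale VALUES (10, 1, 99.0)")
print(con.execute("SELECT * FROM sale_flat").fetchall())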

If there were no issues of performance for those database management systems that lack the parallel processing power of Teradata Database, then denormalization could be handled universally by implementing views. Star schemas, snowflakes, summary tables, derived data, and the like could be built as virtual clusters of tables that look exactly
like their physical counterparts. By handling denormalization virtually, the
relationships within, between, and among the underlying base tables of the schema
remain intact, and referential integrity can be maintained by the system regardless of
how many virtual denormalized relationships are created. This flexibility frees DBAs
to create any number of denormalized views for users while simultaneously
maintaining semantic data integrity and eliminating the data redundancies required by
denormalized physical schemas.

DBAs can create virtual, subject-oriented schemas for specific applications as well as
creating views for more general database access without affecting the underlying base
table data. These same views can also be used to enforce security constraints for the
different communities of business users who must access the database.
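
A sketch of that security use, again with hypothetical names: the view restricts one community of users to a subset of rows and columns, without duplicating any data.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY,
                       name TEXT, department TEXT, salary REAL);
INSERT INTO employee VALUES (1, 'Alice', 'Sales', 90000),
                            (2, 'Bob',   'HR',    80000);
-- Sales analysts see only their own department, and no salary column;
-- granting access to the view rather than the base table enforces this.
CREATE VIEW sales_employee_v AS
    SELECT emp_id, name FROM employee WHERE department = 'Sales';
""")
print(con.execute("SELECT * FROM sales_employee_v").fetchall())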

Consider another argument that favors the ease of use of a fully-normalized database schema over a denormalized schema. A physical star schema has physical dimensions that support a physical fact table. However, for some dimensions there can be mutually exclusive substitutes for the same data. For example, suppose an airline is interested both in the point-to-point travel of customers between segments and in their travel between their true origins and destinations. This discussion abbreviates this dimensional family as O&D.

The true O&D dimension is different from the segment O&D, although superficially,
it looks the same. Moreover, their respective consolidation of facts is different as well,
although the detailed base table data is the same. If the star schemas are physicalized,
two very large schemas must be created, maintained, and coordinated to represent
them, whereas with virtual star schemas, the data is maintained only in the base
tables, producing a single, consistent version of the truth.
