You are on page 1of 10

LEVELS OF DISTRIBUTION TRANSPERENCY

Structure
4.0 Objectives
4.1 Introduction
4.2 Reference Architecture for distributed databases
4.3 Types of data fragmentation
4.3.1 Horizontal Fragmentation
4.3.2 Derived Horizontal Fragmentation
4.3.3 Vertical Fragmentation
4.3.4 Mixed Fragmentation
4.4 Integrity Constraints in Distributed Databases
4.5 Summary
4.0 Objectives: In this unit we will learn the following topics:

Reference Architecture for Distributed Databases

Types of Data Fragmentation

Integrity constraints in Distributed databases

4.1 Introduction:
In this unit we suggest reference architecture for the distributed database. This
architecture allows us to determine the different levels of transparency, which are
conceptually relevant to understand distributed databases. Also the mapping between
the different levels is defined. Here we have used the relational model and relational
algebra for this purpose. The distributed access primitives are represented using the
SQL statements, as it is user friendly. The emphasis is given to the fact that how the
SQL primitives reference the objects which constitute the database. The different
types of fragmentation methods are discussed. Also the integrity constraints in the
distributed transaction are explained.
4.2 Reference Architecture for Distributed Databases:
Here we have suggested reference architecture for the distributed databases as shown
in the fig.4.1.The different levels are conceptually helpful to understand the
functioning of the whole system. The various stages of the architecture are as follows.

Global Schema: defines all the data, which are contained in the
distributed database as if it is a centralized system. Here a set of global
relations is used.

Fragmentation Schema: Each global relation is split into several nonoverlapping portions that are called as Fragments. The mapping between
the global relations and fragments is defined in the Fragmentation Schema.
It is a one to many relation such that several fragments correspond to one
global relation but only one global relation corresponds to one fragment.
They are indicated as Ri, the ith fragment of the global relation R.

Allocation Schema: The fragments are really the logical portions of the
global relation, which are physically dispersed at different sites of the
network. This schema defines at which site(s) a fragment is allocated. It is
to be noted that depending upon the requirement more than one fragment
may be allocated at a site. So this mapping determines whether the system
is a Redundant system or a Non redundant system.

Local Mapping Schema: We have already described the relationships


between the objects at the three top levels of this architecture. These three
levels are site independent; therefore, they do not depend on the data
model of the local DBMSs. At a lower level, it is necessary to map the
physical images to the objects that are manipulated by the local DBMS s.
This mapping is called a local mapping schema and depends on the type
of local DBMS; therefore in a heterogeneous system we have different
types of local mappings at different sites.

Global
Schema

Fragmentation
Schema

Site
Independent
Schemas

Allocation
Schema

(Other sites)
Local
Mapping
Schema 1

DBMS of
Site 1

Local
database

Local
Mapping
Schema 2

.
.
.
.

DBMS of
Site 2

Local
database

at site 1

at site 2
Fig. 4.1 A reference architecture for distributed databases
This architecture provides a very general conceptual framework for understanding
distributed databases. The three most important objectives that motivate the features
of this architecture are the separation of data fragmentation and allocation, the control
of redundancy, and the independence from local DBMSs.

Separating the concept of data fragmentation from the concept of data


allocation: This separation allows us to distinguish two different levels of
distribution transparency, namely fragmentation transparency and
location transparency. Fragmentation transparency is the highest degree
of transparency and consists of the fact that the user or application

programmer works on global relations. Location transparency is a lower


degree of transparency and requires the user or application programmer to
work on fragments instead of global relations; however, he or she does not
know where the fragments are located.

Explicit control of redundancy: The reference architecture provides


explicit control of redundancy at the fragment level.

Independence from local DBMSs: This feature, called local mapping


transparency, allows us to study several problems of distributed database
management without having to take into account the specific data models
of local DBMSs. Another type of transparency, which is strictly related to
location

transparency,

is

replication

transparency.

Replication

transparency means that the user is unaware of the replication of


fragments.
4.3 Types Of Data Fragmentation:
Two different types fragmentation: Horizontal and Vertical fragmentation can
decompose the global relations into fragments. We will first consider these two types
of fragmentation separately and then consider the more complex fragmentation, which
can be obtained by applying a composition of both.
In all types of fragmentation, a fragment can be defined by an expression in a
relational language (we will use relational algebra), which takes global relation as
operands and produces the fragment as result. For example, if a global relation
contains data about employees, a fragment which contains only data about employees
who work at department D1 can be obviously defined by a selection operation on the
global relation.
Some rules, which must be followed when defining fragments:

Completeness condition: All the data of the global relation must be


mapped into the fragments; i.e., it must not happen that a data item that
belongs to a global relation does not belong to any fragment.

Reconstruction condition: It must always be possible to reconstruct each


global relation from its fragments. The necessity of this condition is
obvious in fact, only fragments are stored in the distributed database, and
global relation have to be built through this reconstruction operation if
necessary.

Disjoint condition: It is convenient that fragments be disjoint, so that the


replication of data can be controlled explicitly at the allocation level.

4.3.1 Horizontal Fragmentation: Horizontal fragmentation consists of partitioning


the tuples of a global relation into subsets; this is clearly useful in distributed
databases, where each subset can contain data that have common geographical
properties. It can be defined by expressing each fragment as a selection operation on
the global relation.
Example: let a global relation be
SUPPLIER (SNUM, NAME, CITY)
Then the horizontal fragmentation can be defined in the following way:
SUPPLIER1 = SLCITY =Mysore SUPPLIER
SUPPLIER2 = SLCITY =Shimoga SUPPLIER
Now let us verify whether this fragmentation fulfills the conditions stated earlier.
The completeness condition: If Mysore and Shimoga are the only possible
values of the CITY attribute, then it satisfies this condition.
The reconstruction condition: can be verified easily, because it is always possible to
reconstruct the SUPPLIER global relation through the following operation.
SUPPLIER = SUPPLIER1 UN SUPPLIER2
The disjoint ness condition is clearly verified.
Qualification: The predicate, which is used in the selection operation and defines a
fragment, is called as Qualification. For instance, in the above example the
qualifications
q1 : CITY = Mysore
q2 : CITY = Shimoga
We can generalize from the above example that in order to satisfy the completeness
condition, the set of qualifications of all fragments must be complete, at least with
respect to the set of allowed values. The reconstruction condition is always satisfied
through the union operation, and the disjoint ness condition requires that
qualifications be mutually exclusive.
4.3.2 Derived Horizontal Fragmentation: This is a type of fragmentation, which is
derived from the horizontal fragmentation of another relation.
Example: Consider a global relation
SUPPLY (SNUM, PNUM, DEPTNUM, QUAN)

where SNUM is a supplier number. If it is required that a fragment has to contain the
tuples for suppliers, which are in a given city, and then we have to go for derived
fragmentation. A semi-join operation with the fragments SUPLIER1 and SUPLIER2
is needed in order to determine the tuples of SUPPLY, which correspond to the
suppliers in a given city. The derived fragmentation of SUPPLY can be therefore
defined as follows:
SUPPLY1 = SUPPLY SJSNUM=SNUMSUPPLIER1
SUPPLY2 = SUPPLY SJSNUM=SNUMSUPPLIER2
The reconstruction of the global relation SUPPLY can be performed through the
union operation as was shown for SUPPLIER.
The completeness of the above fragmentation requires that there be no supplier
numbers in the SUPPLY relation, which are not contained also in the SUPPLIER
relation. This is a typical, and reasonable, integrity constraint for this database and
usually is called as the referential integrity constraint.
The disjoint ness condition is satisfied if a tuple of the SUPPLY relation does not
correspond to two tuples of the SUPPLIER relation that belong to two different
fragments. In this case this condition is easily verified, because the supplier numbers
are unique keys of the SUPPLIER relation.
4.3.3 Vertical Fragmentation:
The Vertical fragmentation of a global relation is the subdivision of its attributes into
groups; fragments are obtained by projecting the global relation over each group.
This can be useful in distributed databases where each group of attributes can contain
data that have common geographical properties. The fragmentation is correct; if each
attribute is mapped into at least one attribute of the fragments; moreover, it must be
possible to reconstruct the original relation by joining the fragments together.
Example: Consider a global relation
EMP (EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)
A vertical fragmentation of this relation can be defines as
EMP1=PJ EMPNUM,NAME, MGRNUM, DEPTNUM EMP
EMP2=PJ EMPNUM, SAL,TAX EMP
The reconstruction of relation EMP can be obtained as
EMP=EMP1JN EMPNUM=EMPNUM EMP2
This is because; EMPNUM is a key of EMP.
Let us draw some important points to be noted from this example.

The purpose of including the key of the global relation into each
fragment is to ensure the reconstruction property.
An alterative way to provide the reconstruction property is to
generate tuple identifiers that are used as system-controlled keys.
This can be convenient in order to avoid the replication of large
keys; moreover, users cannot modify tuple identifiers.
Let us finally consider the problem of fragment disjoint ness. First, we have seen that
at least the key should be replicated in all fragments in order to allow reconstruction.
In fact, if we include the same attribute in two different vertical fragments, we know
exactly that the column that corresponds to this attribute.
For example, consider the following vertical fragmentation of relation EMP:
EMP1 = PJEMPNUM,NAME,MGRNUM,DEPTNUM EMP
EMP2 = PJEMPNUM,NAME,SAL,TAX EMP
The attribute NAME is replicated in both fragments. We can remove this attribute
when we reconstruct relation EMP through an additional projection operation.
EMP = EMP1 JNEMPNUM=EMPNUM PJEMPNUM, SAL, TAX EMP2
4.3.4 Mixed Fragmentation: The fragments that are obtained by the above
fragmentation operations are relations themselves, so that it is possible to apply the
fragmentation operations recursively, provided that the correctness conditions are
satisfied each time. The reconstruction can be obtained by applying the reconstruction
rules in reverse order.
Example: Consider the same global relation
EMP (EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)
The following is a mixed fragmentation, which is obtained by applying the vertical
fragmentation of the previous example, followed by a horizontal fragmentation on
DEPTNUM:
EMP1 = SLDEPTNUM10 PJEMPNUM, NAME, MGRNUM, DEPTNUM EMP
EMP2 = SL10<DEPTNUM20 PJEMPNUM, NAME, MGRNUM, DEPTNUM EMP
EMP3 = SLDEPTNUM>20 PJEMPNUM, NAME, MGRNUM, DEPTNUM EMP
EMP4 = PJEMPNUM, NAME, SAL , TAX EMP
EMP

EMP1 EMP2 EMP3

EMP4

Fig.4.3 The fragmentation tree of relation EMP


The reconstruction of relation EMP is defined by the following expression:

EMP = UN (EMP1, EMP2, EMP3) JNEMPNUM = EMPNUM


PJEMPNUM, SAL, TAX EMP4
A fragmentation tree can conveniently represent mixed fragmentation (as shown in
the above figure). In a fragmentation tree, the root corresponds to a global relation,
the leaves corresponds to the leaves correspond to the fragments, and the intermediate
nodes correspond to the intermediate results of the fragment-defining expressions.

The EXAMPLE_DDB:
The

following

codes

shows

the

global

and

fragmentation

schemata

of

EXAMPLE_DDB. Most of the global relations of EXAMPLE_DDB and their


fragmentation have been already introduced. A DEPT relation, horizontally
fragmented into three fragments on the value of the DEPTNUM attribute, is added.
Global schema
EMP (EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)
DEPT (DEPTNUM, NAME, AREA, MGRNUM)
SUPPLIER (SNUM, PNUM, DEPTNUM, QNUM)
Fragmentation schema
EMP1 = SLDEPTNUM 10 PJEMPNUM, NAME, MGRNUM, DEPTNUM(EMP)
EMP2 = SL10<DEPTNUM 20 PJEMPNUM, NAME, MGRNUM, DEPTNUM(EMP)
EMP3 = SLDEPTNUM >20 PJEMPNUM, NAME, MGRNUM, DEPTNUM(EMP)
EMP4 = PJEMPNUM, NAME, SAL, TAX(EMP)
DEPT1= SLDEPTNUM10(DEPT)
DEPT2= SL10<DEPTNUM20(DEPT)
DEPT3= SLDEPTNUM>20(DEPT)
SUPPLIER1 = SLCITY = SF (SUPPLIER)
SUPPLIER2 = SLCITY = LA (SUPPLIER)
SUPPLY1 = SUPPLY SJSNUM=SNUMSUPPLIER1
SUPPLY2 = SUPPLY SJSNUM=SNUMSUPPLIER2
4.4 Integrity Constraints In Distributed Databases:
When an update performed by a database application violates an integrity constraint,
the application is rejected and thus the correctness of data is preserved. A typical
example of integrity constraint is referential integrity, which requires that all values of
a given attribute of a relation exist also in some other relation. This constraint is
particularly useful in distributed databases, for ensuring the correctness of derived
fragmentation. For example, since the SUPPLY relation has a fragmentation which is
derived from that of SUPPLIER relation by means of a semi-join on the SUPNUM
attribute, it is required that all values of SUPNUM in SUPPLY be present also in
SUPPLIER.

Integrity constraints can be enforced automatically by adding to application programs


some code for testing whether the constraint is violated. If so, the program execution
is suspended and all actions already performed by it are cancelled, if necessary.
One of the most serious disadvantages of integrity constraints is the loss in
performance that is due to the execution of the integrity tests; this loss is very
important in distributed databases. The major problems in applying integrity checking
might increase the need of accessing remote sites. It is necessary to consider also
integrity checking in the design of the distribution of database.
4.5Summary:
In this unit we have studied reference architecture for distributed database. Also the
different types of fragmentation techniques are discussed. We have also seen some
demonstration examples. Some ideas about integrity constraints are given.

You might also like