
Chapter 4

Distributed
Database Systems
Chapter 4 - Objectives
Basic Concepts in Distributed Database System.
Advantages and disadvantages of distributed
databases.
Functions and architecture for a DDBMS.
Distributed Database Design issues.
Levels of DDBMS Transparency.
Rules for DDBMSs.

2 Distributed Database Systems 04/13/2024


Basic Concepts of Distributed DBs
Distributed Database
A logically interrelated collection of shared data (and a
description of data), physically distributed over a
computer network.

Distributed DBMS
Software system that permits the management of the
distributed database and makes the distribution
transparent to users.
Distributed DBMSs should help resolve the islands of
information problem in organizations.

Distributed Databases & Distributed Computing

Distributed databases bring the advantages of
distributed computing to the database domain.
A distributed computing system consists of a
number of processing sites or nodes that are
interconnected by a computer network and that
cooperate in performing certain assigned tasks.
Distributed computing systems partition a big,
unmanageable problem into smaller pieces to solve it
efficiently in a coordinated manner.
Hence, more computing power is harnessed to solve a
complex task, through cooperation among the
independent nodes.
Basic Concepts of Distributed DBs cont’d…
Collection of logically-related shared data.
Data split into fragments.
Fragments may be replicated.
Fragments/replicas allocated to sites.
Sites linked by a communications network.
Data at each site is under control of a local DBMS.
DBMSs handle local applications autonomously.
Each DBMS participates in at least one global
application.

Distributed DBMS Architecture

Data distribution and replication among
distributed databases – an example

Distributed Processing
A centralized database that can be accessed over a
computer network. (This is not a distributed database.)

Parallel DBMS
A DBMS running across multiple processors and
disks designed to execute operations in parallel,
whenever possible, to improve performance.
Based on premise that single processor systems can
no longer meet requirements for cost-effective
scalability, reliability, and performance.
Parallel DBMSs link multiple, smaller machines to
achieve same throughput as single, larger machine,
with greater scalability and reliability.

Parallel DBMS
Parallel technology is typically used for
very large databases, possibly of the order
of terabytes (10^12 bytes), or systems that
have to process thousands of transactions
per second.
Also note that most DBMS vendors have
a parallel DBMS version of their
products.
Also, a parallel DBMS is not a distributed database
system.
Parallel DBMS
Main architectures for parallel DBMSs are:

Shared memory
 • This architecture provides high-speed data access for a limited
number of processors, but it is not scalable beyond about 64
processors, at which point the interconnection network becomes
a bottleneck.
Shared disk
 • Architecture optimized for applications that are inherently
centralized and require high availability and performance.
Shared nothing
 • Often known as massively parallel processing (MPP), this is a
multiple-processor architecture in which each processor is part
of a complete system, with its own memory and disk storage.

Parallel DBMS

Figure: parallel database architectures: (a) shared memory, (b) shared disk, (c) shared nothing

Multidatabase System (MDBS)
 • MDBS: a distributed DBMS in which each site maintains
complete autonomy.
 • Simply speaking, an MDBS is a DBMS that resides
transparently on top of existing database and file systems
and presents a single database to its users.
 • An MDBS attempts to logically integrate a number of
independent DDBMSs while allowing the local DBMSs to
maintain complete control of their operations.
 • If there is no provision for the local sites to function as
standalone DBMSs, then the system has no local
autonomy.
 • For a centralized database, there is complete autonomy
but a total lack of distribution and heterogeneity.
Classification of DDBMS
There are unfederated (no local users) and
federated (with local users) MDBSs.
A federated system is a cross between a distributed
DBMS and a centralized DBMS; it is a distributed
system for global users and a centralized system for
local users.
In general, classification of DDBMSs is based on three
important factors:
Level of data distribution
Degree of local site autonomy
Extent of site heterogeneity

Advantages of DDBMSs
Reflects organizational structure
Improved shareability and local autonomy
Improved availability
Improved reliability
Improved performance

Disadvantages of DDBMSs
Complexity
Cost
Security
Integrity control more difficult
Lack of standards
Lack of experience
Database design more complex

Types of DDBMS (based on site heterogeneity)
Homogeneous DDBMS
Heterogeneous DDBMS

Homogeneous DDBMS
All sites use same DBMS product.
Much easier to design and manage.
Approach provides incremental growth and allows
increased performance.
Usually the result of a new system being designed.

Heterogeneous DDBMS
Sites may run different DBMS products, with possibly
different underlying data models.
Occurs when sites have implemented their own
databases and integration is considered later.
Translations required to allow for:
 • Different hardware.
 • Different DBMS products.
 • Different hardware and different DBMS products.
Typical solution is to use gateways.
Gateways convert the language and model of each
different DBMS into the language and model of the
relational system.
Overview of Networking
Network: an interconnected collection of autonomous
computers, capable of exchanging information.
 • Local Area Network (LAN): intended for connecting
computers at the same site.
 • Wide Area Network (WAN): used when computers
or LANs need to be connected over long distances.
 • WANs are relatively slow and less reliable than LANs.
A DDBMS using a LAN provides much faster response
time than one using a WAN.
 • LANs can be extended over large geographic areas
using Virtual Private Networks (VPNs).
Overview of Networking- Summary of WAN
and LAN



Reference Architecture for DDBMS
Due to diversity, there is no accepted DDBMS
architecture equivalent to the ANSI/SPARC 3-level
architecture.
A reference architecture for DDBMSs consists of:
Set of global external schemas.
Global conceptual schema (GCS).
Fragmentation schema and allocation schema.
Set of schemas for each local DBMS conforming to the
3-level ANSI/SPARC architecture.
Some levels may be missing, depending on the levels/type
of transparency supported.



Reference Architecture for DDBMS



Functions of a DDBMS
Expect a DDBMS to have at least the functionality of a
centralized DBMS.
It should also have the following functionality:
 • Extended communication services.
 • Extended data dictionary.
 • Distributed query processing.
 • Extended concurrency control.
 • Extended recovery services.

Components of a DDBMS

Distributed Database Design Issues
Three key issues need to be considered:

Fragmentation,
Allocation,
Replication.

Distributed Database Design
Fragmentation
Relation may be divided into a number of sub-relations,
which are then distributed.
Allocation
Each fragment is stored at a site with “optimal”
distribution.
Replication
Copy of fragment may be maintained at several sites.

Why Fragment?
Usage
Applications work with views rather than entire
relations.
Efficiency
Data is stored close to where it is most frequently used.
Data that is not needed by local applications is not
stored.



Why Fragment?
Parallelism
With fragments as the unit of distribution, a transaction can
be divided into several subqueries (subtransactions)
that operate on the fragments.
Security
Data not required by local applications is not stored
and so not available to unauthorized users.



Why Fragment?
Disadvantages

Performance: The performance of global applications
that require data from several fragments located at
different sites may be slower.
Integrity: Integrity control may be more difficult if data
and functional dependencies are fragmented and
located at different sites.



Fragmentation
Definition and allocation of fragments are carried out
strategically to achieve the following:
Locality of Reference.
Improved Reliability and Availability.
Improved Performance.
Balanced Storage Capacities and Costs.
Minimal Communication Costs.
Fragmentation involves analyzing most important
applications, based on quantitative/qualitative
information.

Data Allocation
Four alternative strategies regarding placement of
data:
Centralized (Distributed Processing),
Partitioned (or Fragmented),
Complete Replication,
Selective Replication.

Data Allocation
Centralized: Consists of a single database and DBMS
stored at one site with users distributed across the
network. (Not really a distributed database.)
Partitioned: Database partitioned into disjoint
fragments, each fragment assigned to one site.
Complete Replication: Consists of maintaining a
complete copy of the database at each site.
Selective Replication: Combination of partitioning,
replication, and centralization, based on the nature of
the data.
 • This is the most commonly used strategy because of
its flexibility.
Comparison of Strategies for Data Distribution

Correctness of Fragmentation
Three correctness rules:
Completeness,
Reconstruction,
Disjointness.

Correctness of Fragmentation
Completeness
If relation R is decomposed into fragments R1, R2, ... Rn,
each data item that can be found in R must appear in at
least one fragment.
Reconstruction
It must be possible to define a relational operation
that will reconstruct R from the fragments.
Horizontal fragments are reconstructed with the Union
operation, and vertical fragments with the Natural Join.

Correctness of Fragmentation
Disjointness
If data item di appears in fragment Ri, then it should
not appear in any other fragment.
Exception: vertical fragmentation, where primary
key attributes must be repeated to allow
reconstruction.
For horizontal fragmentation, data item is a tuple.
For vertical fragmentation, data item is an
attribute.
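
The three rules can be checked mechanically for horizontal fragments. The following is a minimal sketch, assuming a relation is modelled as a Python set of tuples; the function name `check_horizontal_fragments` and the sample rows are made up for illustration and do not come from the slides.

```python
def check_horizontal_fragments(relation, fragments):
    """Check the three correctness rules for horizontal fragments.

    `relation` and every fragment are sets of tuples (an assumption of
    this sketch, not something the slides prescribe).
    """
    combined = set().union(*fragments)
    return {
        # Completeness: every tuple of R appears in at least one fragment.
        "completeness": relation <= combined,
        # Reconstruction: the union of the fragments gives back exactly R.
        "reconstruction": combined == relation,
        # Disjointness: no tuple is stored in more than one fragment.
        "disjointness": sum(len(f) for f in fragments) == len(combined),
    }


# Hypothetical (propertyNo, type) rows.
R = {("PA14", "House"), ("PL94", "Flat"), ("PG4", "Flat")}
houses = {row for row in R if row[1] == "House"}
flats = {row for row in R if row[1] == "Flat"}

good = check_horizontal_fragments(R, [houses, flats])
overlapping = check_horizontal_fragments(R, [houses | {("PG4", "Flat")}, flats])
```

Here `good` satisfies all three rules, while `overlapping` fails only disjointness because one tuple is stored in both fragments.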

Fragmentation Options
Fragmenting a relation should be done with
caution. The following are the fragmentation
possibilities for relations in a database:
Horizontal
Vertical
Mixed
Derived
No Fragmentation

Horizontal and Vertical Fragmentation

Mixed Fragmentation

Horizontal Fragmentation
Consists of a subset of the tuples of a relation.
Defined using Selection operation of relational
algebra:
σ_p(R)
For example:

P1 = σ_{type=‘House’}(PropertyForRent)
P2 = σ_{type=‘Flat’}(PropertyForRent)

Reconstruction expression?
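
As an illustration of the selection-based definition, the sketch below builds the two fragments with a filter and rebuilds the relation with a union. It assumes rows are plain Python dicts; the helper names `select` and `union` and the sample rows are hypothetical. The union only returns the whole relation because every sample row has type 'House' or 'Flat'.

```python
def select(rows, predicate):
    """Selection (sigma): keep the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def union(*fragments):
    """Union of fragments, dropping duplicate rows."""
    seen, result = set(), []
    for fragment in fragments:
        for row in fragment:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                result.append(row)
    return result


# Made-up PropertyForRent rows for illustration.
property_for_rent = [
    {"propertyNo": "PA14", "type": "House", "city": "Aberdeen"},
    {"propertyNo": "PL94", "type": "Flat", "city": "London"},
    {"propertyNo": "PG4", "type": "Flat", "city": "Glasgow"},
]

p1 = select(property_for_rent, lambda row: row["type"] == "House")
p2 = select(property_for_rent, lambda row: row["type"] == "Flat")

# Horizontal fragments are put back together with a union.
rebuilt = union(p1, p2)
```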

Vertical Fragmentation
Consists of a subset of attributes of a relation.
Defined using Projection operation of relational algebra:
Π_{a1, ..., an}(R)

For example:
S1 = Π_{staffNo, position, sex, DOB, salary}(Staff)
S2 = Π_{staffNo, fName, lName, branchNo}(Staff)
Determined by establishing affinity of one attribute to
another.

Reconstruction expression?
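
A small sketch of the same idea in Python: projection produces the two vertical fragments, and a natural join on the shared key `staffNo` puts them back together. Rows are assumed to be dicts; the helpers `project` and `natural_join` and the sample Staff rows are made up for illustration.

```python
def project(rows, attrs):
    """Projection (pi): keep only `attrs`, dropping duplicate rows."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


def natural_join(left, right):
    """Natural join on every attribute name the two inputs share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    index = {}
    for row in right:
        index.setdefault(tuple(row[a] for a in common), []).append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(tuple(l_row[a] for a in common), [])
    ]


# Made-up Staff rows for illustration.
staff = [
    {"staffNo": "SL21", "fName": "John", "lName": "White", "position": "Manager",
     "sex": "M", "DOB": "1945-10-01", "salary": 30000, "branchNo": "B005"},
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech", "position": "Assistant",
     "sex": "F", "DOB": "1960-11-10", "salary": 12000, "branchNo": "B003"},
]

s1 = project(staff, ["staffNo", "position", "sex", "DOB", "salary"])
s2 = project(staff, ["staffNo", "fName", "lName", "branchNo"])

# Vertical fragments are put back together with a natural join on staffNo.
rebuilt = natural_join(s1, s2)
```

The primary key `staffNo` is repeated in both fragments, which is exactly the exception to disjointness that makes the join possible.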

Mixed Fragmentation
Consists of a horizontal fragment that is vertically
fragmented, or a vertical fragment that is horizontally
fragmented.
Defined using Selection and Projection operations of
relational algebra:
σ_p(Π_{a1, ..., an}(R)) or
Π_{a1, ..., an}(σ_p(R))

Example - Mixed Fragmentation
S1 = staffNo, position, sex, DOB, salary(Staff)
S2 = staffNo, fName, lName, branchNo(Staff)

S21 =  branchNo=‘B003’(S2)
S22 =  branchNo=‘B005’(S2)
S23 =  branchNo=‘B007’(S2)

Reconstruction expression ?
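
One way to see how the mixed fragments fit back together is sketched below: the horizontal pieces S21, S22 and S23 are first unioned into S2, which is then joined with S1 on `staffNo`. The function names and the sample rows are hypothetical, and the sketch assumes every staff member belongs to one of the three branches.

```python
S1_ATTRS = ["staffNo", "position", "sex", "DOB", "salary"]
S2_ATTRS = ["staffNo", "fName", "lName", "branchNo"]


def fragment_staff(staff):
    """Build S1 and the branch-wise pieces S21, S22, S23 of S2."""
    s1 = [{a: row[a] for a in S1_ATTRS} for row in staff]
    s2 = [{a: row[a] for a in S2_ATTRS} for row in staff]

    def by_branch(branch_no):
        return [row for row in s2 if row["branchNo"] == branch_no]

    return s1, by_branch("B003"), by_branch("B005"), by_branch("B007")


def rebuild_staff(s1, s21, s22, s23):
    """Staff = S1 joined with (S21 union S22 union S23) on staffNo."""
    s2 = s21 + s22 + s23  # the pieces are disjoint, so concatenation is a union
    s1_by_key = {row["staffNo"]: row for row in s1}
    return [{**s1_by_key[row["staffNo"]], **row} for row in s2]


# Made-up Staff rows for illustration.
staff = [
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech", "position": "Assistant",
     "sex": "F", "DOB": "1960-11-10", "salary": 12000, "branchNo": "B003"},
    {"staffNo": "SL21", "fName": "John", "lName": "White", "position": "Manager",
     "sex": "M", "DOB": "1945-10-01", "salary": 30000, "branchNo": "B005"},
    {"staffNo": "SA9", "fName": "Mary", "lName": "Howe", "position": "Assistant",
     "sex": "F", "DOB": "1970-02-19", "salary": 9000, "branchNo": "B007"},
]

s1, s21, s22, s23 = fragment_staff(staff)
rebuilt = rebuild_staff(s1, s21, s22, s23)
```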

Derived Horizontal Fragmentation
A horizontal fragment of a child relation that is based
on the horizontal fragmentation of a parent
relation (the primary key table).
Some applications may involve a join of two or more
relations.
If the relations are stored at different locations, there
may be a significant overhead in processing the join.

Derived Horizontal Fragmentation
In such cases, it may be more appropriate to ensure
that the relations, or fragments of relations, are at
the same location
Ensures that fragments that are frequently joined
together are at the same site.
Defined using Semijoin operation of relational
algebra:
Ri = R ⋉_F Si, 1 ≤ i ≤ w
where w is the number of horizontal fragments defined
on S (the parent table) and F is the join attribute.

Example - Derived Horizontal Fragmentation
If we have the staff fragments below,
S3 = σ_{branchNo=‘B003’}(Staff)
S4 = σ_{branchNo=‘B005’}(Staff)
S5 = σ_{branchNo=‘B007’}(Staff)

we could use derived fragmentation for PropertyForRent:

Pi = PropertyForRent ⋉_{branchNo} Si, 3 ≤ i ≤ 5

Hence, we have three fragments named P3, P4, and P5.
Reconstruction expression?
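
The semijoin can be sketched in a few lines of Python: a child row is kept when its join attribute matches some row of the parent fragment. The helper `semijoin` and the sample rows are made up for illustration; the final concatenation rebuilds PropertyForRent only under the assumption that every property's `branchNo` appears in exactly one staff fragment.

```python
def semijoin(child_rows, parent_rows, attr):
    """Semijoin: child rows whose `attr` value matches some parent row."""
    parent_keys = {parent[attr] for parent in parent_rows}
    return [child for child in child_rows if child[attr] in parent_keys]


# Made-up rows for illustration.
staff = [
    {"staffNo": "SG37", "branchNo": "B003"},
    {"staffNo": "SG14", "branchNo": "B003"},
    {"staffNo": "SL21", "branchNo": "B005"},
    {"staffNo": "SA9", "branchNo": "B007"},
]
property_for_rent = [
    {"propertyNo": "PG4", "branchNo": "B003"},
    {"propertyNo": "PG36", "branchNo": "B003"},
    {"propertyNo": "PL94", "branchNo": "B005"},
    {"propertyNo": "PA14", "branchNo": "B007"},
]

# Parent fragments S3, S4, S5, one per branch.
s3 = [row for row in staff if row["branchNo"] == "B003"]
s4 = [row for row in staff if row["branchNo"] == "B005"]
s5 = [row for row in staff if row["branchNo"] == "B007"]

# Derived fragments P3, P4, P5 follow the parent fragmentation.
p3 = semijoin(property_for_rent, s3, "branchNo")
p4 = semijoin(property_for_rent, s4, "branchNo")
p5 = semijoin(property_for_rent, s5, "branchNo")

# Derived horizontal fragments are rebuilt with a union, like any horizontal fragments.
rebuilt = p3 + p4 + p5
```

Because P3 is built from S3, the rows that are frequently joined end up in matching fragments and can be placed at the same site.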

Derived Horizontal Fragmentation
If a child relation contains more than one foreign
key, need to select one of the parent tables.
Choice can be based on fragmentation used most
frequently or fragmentation with better join
characteristics.

No fragmentation
A final strategy is not to fragment a relation.
For example, the Branch relation contains only
a small number of tuples and is not updated
very frequently.
Rather than trying to horizontally fragment the
relation on branch number, for example, it
would be more sensible to leave the relation
whole and simply replicate the Branch relation
at each site.

Distributed Database Design Methodology
1. Use normal methodology to produce a design for the
global relations.
2. Examine topology of system to determine where
databases will be located.
3. Analyse most important transactions and identify
appropriateness of horizontal/vertical fragmentation.
4. Decide which relations are not to be fragmented.
5. Examine relations on the one side of relationships (parent
relations) and determine a suitable fragmentation
schema. Relations on the many side (child relations) may
be suitable for derived horizontal fragmentation.



Levels of Transparencies in a DDBMS
The objective of transparency is to make the
distributed system appear like a centralized system
for the user.
This is sometimes referred to as the
fundamental principle of distributed DBMSs.
Four main transparencies:
 • Distribution transparency
 • Transaction transparency
 • Performance transparency
 • DBMS transparency
Transparencies in a DDBMS: sub-transparencies

Distribution Transparency
  Fragmentation Transparency
  Location Transparency
  Replication Transparency
Transaction Transparency
  Concurrency Transparency
  Failure Transparency
Performance Transparency
DBMS Transparency
Distribution Transparency
 • Distribution transparency allows the user to perceive the database
as a single, logical entity.
 • If a DDBMS exhibits distribution transparency, the user does not
need to know the operational details of the network
or the placement of the data in the distributed system.
 • Fragmentation transparency: the user does not need to know
that the data is fragmented.
    • This is the highest level of distribution transparency.
 • Local mapping transparency: the user needs to specify both
fragment names and the location of data items, including
replication sites if any.
    • This is the lowest level of distribution transparency.

Distribution Transparency
Location Transparency is the middle level of
distribution transparency.
With location transparency, the user must know
how the data has been fragmented but still does
not have to know the location of the data.
With replication transparency, the user is unaware
of the replication of fragments, that is, the existence of
copies of the same data item at different
sites.
Closely related to location transparency.
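
The three levels of distribution transparency differ in how much the user has to spell out. The sketch below is only an illustration under stated assumptions: the catalogs `FRAGMENTS` and `LOCATIONS`, the site names, and the three function names are all hypothetical, and "pick the first listed copy" stands in for whatever choice a real DDBMS would make.

```python
# Hypothetical catalog: which fragments make up a relation, and where copies live.
FRAGMENTS = {"Staff": ["S1", "S21", "S22", "S23"]}
LOCATIONS = {
    "S1": ["site5"],
    "S21": ["site3"],
    "S22": ["site5"],
    "S23": ["site7", "site3"],  # S23 is replicated in this made-up example
}


def with_local_mapping(targets):
    """Lowest level: the user supplies (fragment, site) pairs; only validate them."""
    for fragment, site in targets:
        if site not in LOCATIONS.get(fragment, []):
            raise LookupError(f"{fragment} is not stored at {site}")
    return list(targets)


def with_location_transparency(fragments):
    """Middle level: the user names fragments; the system picks a site for each."""
    return with_local_mapping([(f, LOCATIONS[f][0]) for f in fragments])


def with_fragmentation_transparency(relation):
    """Highest level: the user names only the global relation."""
    return with_location_transparency(FRAGMENTS[relation])


plan = with_fragmentation_transparency("Staff")
```

With fragmentation transparency the user writes only `"Staff"`; with local mapping transparency the same access requires the full list of fragment and site pairs.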



Transaction Transparency
Ensures that all distributed transactions maintain
distributed database’s integrity and consistency.
Distributed transaction accesses data stored at
more than one location.
Each transaction is divided into a number of
subtransactions, one for each site that has to be
accessed.
DDBMS must ensure the indivisibility (atomicity)
of both the global transaction and each of the
subtransactions.
Example - Distributed Transaction
T prints out names of all staff, using schema defined
in the fragmentation example as S1, S2, S21, S22, and
S23.
DDBMS defines three sub-transactions TS3, TS5, and
TS7 to represent agents at sites 3, 5, and 7.



Concurrency Transparency
All transactions must execute independently and be
logically consistent with results obtained if
transactions executed one at a time, in some
arbitrary serial order.
Same fundamental principles as for centralized
DBMS hold.
DDBMS must ensure both global and local
transactions do not interfere with each other.
Similarly, DDBMS must ensure consistency of all sub
transactions of a global transaction.



Classification of Transactions
In IBM’s Distributed Relational Database
Architecture (DRDA), there are four types of distributed
transactions:
Remote request (remote query)
Remote unit of work (remote transaction)
Distributed unit of work (distributed transaction)
Distributed request (distributed query)
In this context, a “request” is an SQL SELECT statement
(query) and a “unit of work” is a transaction that
manipulates the content of a database.



Classification of Transactions



Concurrency Transparency
Replication makes concurrency more complex.
If a copy of a replicated data item is updated, update
must be propagated to all copies.
Could propagate changes as part of original
transaction, making it an atomic operation.
However, if one site holding copy is not reachable,
then transaction is delayed until site is reachable.



Concurrency Transparency
Could limit update propagation to only those sites
currently available. Remaining sites are updated when
they become available again (update propagation
should be the first thing such sites do).
Could allow updates to copies to happen
asynchronously, sometime after the original update.
Delay in regaining consistency may range from a few
seconds to several hours.
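
The "update the available sites now, catch up the others later" option can be sketched as follows. This is a toy model under stated assumptions: the classes `Replica` and `ReplicatedStore`, the site names, and the in-memory pending queue are all hypothetical, and no real network or logging is involved.

```python
class Replica:
    """One site's copy of the replicated data (in-memory stand-in)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.available = True


class ReplicatedStore:
    """Propagates writes to available replicas and queues them for the rest."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.pending = {replica.name: [] for replica in replicas}

    def write(self, key, value):
        for replica in self.replicas:
            if replica.available:
                replica.data[key] = value
            else:
                # Site is down: remember the update so it can catch up later.
                self.pending[replica.name].append((key, value))

    def recover(self, replica):
        # Applying missed updates is the first thing a returning site does.
        for key, value in self.pending[replica.name]:
            replica.data[key] = value
        self.pending[replica.name].clear()
        replica.available = True


site3, site5 = Replica("site3"), Replica("site5")
store = ReplicatedStore([site3, site5])

site5.available = False
store.write("SL21.salary", 31000)   # reaches site3 only
stale = dict(site5.data)            # site5 is temporarily inconsistent
store.recover(site5)                # site5 catches up before serving again
```

Between the write and the recovery the two copies disagree, which is the delay in regaining consistency that the slide mentions.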



Failure Transparency
 • DDBMS must ensure atomicity and durability of the global
transaction.
 • This means ensuring that subtransactions of the global
transaction either all commit or all abort.
 • Thus, DDBMS must synchronize the global transaction to
ensure that all subtransactions have completed
successfully before recording a final COMMIT for the global
transaction.
 • Must do this in the presence of site and network failures.
 • DDBMS commit protocols (reading assignment):
    • Two-phase commit (2PC)
    • Three-phase commit (3PC)

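
As a companion to the reading assignment, here is a heavily simplified sketch of the all-commit-or-all-abort idea behind two-phase commit. It is an in-memory toy, not a full protocol: the class `Participant`, the function `two_phase_commit`, and the treatment of a missing vote as an abort vote are assumptions of this sketch, and logging, retries and recovery are left out.

```python
class Participant:
    """A site running one subtransaction (in-memory stand-in)."""

    def __init__(self, name, will_commit=True, reachable=True):
        self.name = name
        self.will_commit = will_commit
        self.reachable = reachable
        self.state = "active"

    def prepare(self):
        """Phase 1: vote on whether the subtransaction can commit."""
        if not self.reachable:
            raise TimeoutError(f"{self.name} did not answer")
        self.state = "prepared" if self.will_commit else "aborted"
        return self.will_commit

    def finish(self, commit):
        """Phase 2: apply the coordinator's global decision."""
        if self.reachable:  # an unreachable site would be told after it recovers
            self.state = "committed" if commit else "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): collect a vote from every site.
    votes = []
    for participant in participants:
        try:
            votes.append(participant.prepare())
        except TimeoutError:
            votes.append(False)  # a missing vote is treated as a vote to abort
    # Commit only if every single site voted to commit.
    decision = all(votes)
    # Phase 2 (decision): broadcast the same outcome to every site.
    for participant in participants:
        participant.finish(decision)
    return "COMMIT" if decision else "ABORT"


ok_sites = [Participant("site3"), Participant("site5"), Participant("site7")]
ok_result = two_phase_commit(ok_sites)

bad_sites = [Participant("site3"), Participant("site5", will_commit=False)]
bad_result = two_phase_commit(bad_sites)

down_sites = [Participant("site3"), Participant("site7", reachable=False)]
down_result = two_phase_commit(down_sites)
```

A single abort vote, or a single site that fails to answer, is enough to abort the whole global transaction in this sketch.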


Performance Transparency
DDBMS must perform as if it were a centralized
DBMS.
DDBMS should not suffer any performance
degradation due to the “distributed
architecture”.
DDBMS should determine most cost-effective strategy
to execute a request.



Performance Transparency
Distributed Query Processor (DQP) maps data
request into ordered sequence of operations on local
databases.
Must consider fragmentation, replication, and
allocation schemas.
DQP has to decide:
which fragment to access;
which copy of a fragment to use;
which location to use.



Performance Transparency
DQP produces execution strategy optimized with
respect to some cost function.
Typically, costs associated with a distributed request
include:
I/O cost;
CPU cost;
Communication cost.
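
To make the cost function concrete, the sketch below scores two candidate execution strategies as a weighted sum of I/O, CPU and communication work and picks the cheaper one. Everything numeric here is invented for illustration: the `Strategy` fields, the two candidate strategies, and the unit costs (arbitrary integer units) are assumptions, not figures from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """Estimated work for one candidate execution strategy."""
    name: str
    io_ops: int      # disk block accesses
    cpu_ops: int     # tuple-level operations
    messages: int    # messages exchanged between sites
    bytes_sent: int  # data shipped across the network


def total_cost(s, io_cost=1000, cpu_cost=1, message_cost=10000, byte_cost=1):
    """Total cost = I/O cost + CPU cost + communication cost."""
    communication = s.messages * message_cost + s.bytes_sent * byte_cost
    return s.io_ops * io_cost + s.cpu_ops * cpu_cost + communication


candidates = [
    Strategy("ship whole relation", io_ops=1000, cpu_ops=5000,
             messages=2, bytes_sent=1_000_000),
    Strategy("reduce with semijoin first", io_ops=1200, cpu_ops=6000,
             messages=4, bytes_sent=50_000),
]

# With expensive communication (for example over a WAN) shipping less data wins.
best_wan = min(candidates, key=total_cost)

# With negligible communication cost the strategy with less local work wins.
best_lan = min(candidates, key=lambda s: total_cost(s, message_cost=0, byte_cost=0))
```

The same pair of strategies ranks differently once communication cost changes, which is why the DQP has to take the network into account.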



DBMS transparency
With DBMS transparency, it should be possible to
have different DBMSs in the system without
requiring the user to know about it.
DBMS transparency hides the knowledge that the local
DBMSs may be different and is therefore applicable
only to heterogeneous DDBMSs.



Date’s 12 Rules for a DDBMS
Fundamental Principle
To the user, a distributed system should look exactly like a
non-distributed system.
1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence



Date’s 12 Rules for a DDBMS
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence

Last four rules are ideals.



Quiz #2

1. If only one of the subtransactions failed to return either commit or
rollback, then the global transaction must commit. (True/False)
2. Given the following fragments of relation R, write the reconstruction
relational algebra expression:
R1 = σ_{p1}(R), R2 = σ_{p2}(R), R3 = Π_{L1}(R2), R4 = Π_{L2}(R2)
R = ?
3. A transaction that issues a select statement to a single remote site is called:
A. Remote Request B. Remote Transaction C. Distributed Request D.
Distributed Transaction
4. What is a data item for derived horizontal fragmentation?
5. A. What kind of fragmentation is done in
the following figure?
B. What relational algebra operation is used to
produce the fragments?

