Basis For Distributed Database Technology

•Database System Technology (DST)
  •controlled access to structured data
  •aims towards centralized (single-site) computing
•Computer Networking Technology (CNT)
  •facilitates distributed computing
  •goes against centralized computing
•Distributed Database Technology = DST + CNT
  •aims to achieve integration without centralization
What is distributed?
Processing Logic
Function
Data
Control
All the above modes of distribution are necessary and important for
distributed database technology
Distributed database system
A distributed database is a collection of multiple, logically
interrelated databases distributed over a computer network.
A distributed database management system (DDBMS) is a software
system that permits the management of the distributed databases
and makes the distribution transparent to the users.
What is not a DDBMS?
A DDBMS is not a “collection of files” that can be stored at each
node of a computer network.
A multiprocessor system based DBMS (parallel database system) is
not a DDBMS.
A DDBMS is not a system wherein data resides only at one node.

Aims of Distributed DBMS - Transparent Management of Distributed & Replicated Data
Transparency refers to separation of the higher-level semantics of a
system from lower-level implementation details.
The idea extends from data independence in a centralized DBMS to
fragmentation transparency in a DDBMS.
Who should provide transparency? - DDBMS!
Aims of Distributed DBMS - Reliability through Distributed Transactions
A distributed DBMS can use replicated components to eliminate
single points of failure.
The users can still access part of the distributed database with
“proper care” even though some of the data is unreachable.
Distributed transactions facilitate maintenance of consistent
database state even when failures occur.
Aims of Distributed DBMS - Improved Performance
Since each site handles only a portion of a database, the contention
for CPU and I/O resources is not that severe. Data localization
reduces communication overheads.
Inherent parallelism of distributed systems may be exploited for
inter-query and intra-query parallelism.
Performance models are not sufficiently developed.
Aims of Distributed DBMS - Easier System Expansion
Ability to add new sites, data, and users over time without major
restructuring.
Huge centralized database systems (mainframes) are history
(almost!).
The PC revolution (Compaq buying Digital, 1998) will create naturally
distributed processing environments.
New applications (such as supply chain) are naturally distributed;
centralized systems will just not work.
Complicating Factors
Data may be replicated in a distributed environment. Therefore, the
DDBMS is responsible for (i) choosing one of the stored copies of the
requested data, and (ii) making sure that the effect of an update is
reflected on each and every copy of that data item.
Maintaining consistency of distributed/replicated data.
Since each site cannot have instantaneous information on the
actions currently carried out at other sites, the synchronization of
transactions at multiple sites is harder than in a centralized system.
Also: complexity, cost, distribution of control, security, ...
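The two replication duties named above (choose one stored copy for a read, reflect an update on every copy) can be sketched as follows. This is only a minimal illustration, assuming one in-memory dictionary per site; the names `Site`, `ReplicatedItem`, `read`, and `write` are hypothetical, and a real DDBMS would run the write inside a distributed transaction rather than simply refusing it when a site is down.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical site holding local copies of data items."""
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)


class ReplicatedItem:
    """Tracks which sites hold a copy of one data item."""

    def __init__(self, key, sites):
        self.key = key
        self.sites = list(sites)

    def read(self):
        # (i) Choose one stored copy: here, the first reachable site.
        for site in self.sites:
            if site.up:
                return site.store[self.key]
        raise RuntimeError(f"no reachable copy of {self.key!r}")

    def write(self, value):
        # (ii) Reflect the update on each and every copy.
        # Refuse the update if any copy is unreachable, so copies never diverge.
        down = [s.name for s in self.sites if not s.up]
        if down:
            raise RuntimeError(f"cannot update {self.key!r}: sites down: {down}")
        for site in self.sites:
            site.store[self.key] = value


s1, s2 = Site("site1"), Site("site2")
item = ReplicatedItem("x", [s1, s2])
item.write(10)
s1.up = False            # site1 becomes unreachable
value = item.read()      # still served from the copy at site2
```

Reads stay available while one copy is reachable, but writes block as soon as any copy is down, which is one reason replication complicates the design.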
Problem Areas
Distributed Database Design
Distributed Query Processing
Distributed Directory Management
Distributed Concurrency Control
Distributed Deadlock Management
Reliability of Distributed Databases
Operating Systems Support
Heterogeneous Databases
Relationship among Problems
[Figure: relationships among Directory Management, Query Processing,
Distributed DB Design, Reliability, Concurrency Control, and Deadlock
Management]
Transparency and Architecture
issues in DDBMSs
Top-Down DDBMS Architecture - Classical
[Figure: layered schema architecture]
Site-independent schemas:
•Global Schema
•Fragmentation Schema
•Allocation Schema
Site 1: Local Mapping Schema 1, DBMS 1, Local Database 1
Site 2: Local Mapping Schema 2, DBMS 2, Local Database 2
Other sites follow the same pattern.
Top-Down DDBMS Architecture - Classical
Global Schema: a set of global relations as if database were not
distributed at all
Fragmentation Schema: global relation is split into “non-overlapping”
(logical) fragments. 1:n mapping from relation R to fragments Ri.
Allocation Schema: 1:1 or 1:n (redundant) mapping from fragments
to sites. All fragments corresponding to the same relation R at a site
j constitute the physical image Rj. A copy of a fragment is denoted
by Rji.
Local Mapping Schema: a mapping from physical images to
physical objects, which are manipulated by local DBMSs.
Global Relations, Fragments and Physical Images
[Figure: the global relation R is split into fragments R1, R2, R3;
copies of the fragments are allocated to form the physical images at
Site 1, Site 2, and Site 3]
•Separating concepts of fragmentation and allocation
•Explicit control of redundancy
•Independence from local databases
Allows for:
•Fragmentation Transparency
•Location Transparency
•Local Mapping Transparency
Rules for Data Fragmentation
Completeness: All the data of the global relation must be mapped
into fragments.
Reconstruction: It must always be possible to reconstruct each
global relation from its fragments.
Disjointness: It is convenient if the fragments are disjoint so that
the replication of data can be controlled explicitly.
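The three rules can be checked mechanically for horizontal fragments. The sketch below is illustrative only: it assumes fragments are plain Python sets of tuples, and the helper `check_horizontal_fragments` and the EMP data are hypothetical.

```python
def check_horizontal_fragments(relation, fragments):
    """Check the three rules for fragments given as collections of tuples."""
    relation = set(relation)
    fragments = [set(f) for f in fragments]
    rebuilt = set().union(*fragments)

    # Completeness: every tuple of the global relation lands in some fragment.
    complete = relation <= rebuilt
    # Reconstruction: the union of the fragments gives back exactly the relation.
    reconstructible = rebuilt == relation
    # Disjointness: no tuple appears in more than one fragment.
    disjoint = sum(len(f) for f in fragments) == len(rebuilt)
    return {"complete": complete,
            "reconstructible": reconstructible,
            "disjoint": disjoint}


# Hypothetical EMP(id, name, city) relation, fragmented by city.
emp = {(1, "Ann", "Paris"), (2, "Bob", "Rome"), (3, "Cy", "Paris")}
paris = {t for t in emp if t[2] == "Paris"}
rome = {t for t in emp if t[2] == "Rome"}

good = check_horizontal_fragments(emp, [paris, rome])
overlapping = check_horizontal_fragments(emp, [paris, emp])
lossy = check_horizontal_fragments(emp, [paris])
```

Here `good` satisfies all three rules, `overlapping` fails only disjointness, and `lossy` fails completeness and reconstruction.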
Types of Data Fragmentation
Vertical Fragmentation
•Projection on relation (subset of attributes)
•Reconstruction by join
•Updates require no tuple migration
Horizontal Fragmentation
•Selection on relation (subset of tuples)
•Reconstruction by union
•Updates may require tuple migration
Mixed Fragmentation
•A fragment is a Select-Project query on relation.
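A small sketch of the three fragmentation types, assuming rows are Python dictionaries and `id` is the key of a hypothetical EMP relation. The helpers `project`, `select`, `join_on_key`, and `move_employee` are illustrative names, not part of any DBMS.

```python
# Hypothetical EMP relation; "id" is the key.
emp = [
    {"id": 1, "name": "Ann", "city": "Paris", "salary": 50},
    {"id": 2, "name": "Bob", "city": "Rome", "salary": 60},
]


def project(rows, attrs):
    """Vertical fragment: keep a subset of attributes (always include the key)."""
    return [{a: r[a] for a in attrs} for r in rows]


def select(rows, pred):
    """Horizontal fragment: keep a subset of tuples (copied, not shared)."""
    return [dict(r) for r in rows if pred(r)]


def join_on_key(left, right, key="id"):
    """Reconstruct a vertically fragmented relation by joining on the key."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


# Vertical fragmentation: both fragments repeat the key so the join is lossless.
emp_public = project(emp, ["id", "name", "city"])
emp_pay = project(emp, ["id", "salary"])
rebuilt_vertical = join_on_key(emp_public, emp_pay)

# Horizontal fragmentation: fragments are defined by predicates on city.
emp_paris = select(emp, lambda r: r["city"] == "Paris")
emp_rome = select(emp, lambda r: r["city"] == "Rome")
rebuilt_horizontal = emp_paris + emp_rome  # union (fragments are disjoint)

# Mixed fragmentation: a select followed by a project.
paris_names = project(select(emp, lambda r: r["city"] == "Paris"), ["id", "name"])


def move_employee(emp_id, new_city, fragments):
    """Updating the fragmentation attribute may force a tuple to migrate."""
    for rows in fragments.values():
        for row in rows:
            if row["id"] == emp_id:
                rows.remove(row)
                row["city"] = new_city
                fragments[new_city].append(row)
                return
    raise KeyError(emp_id)
```

Calling `move_employee(1, "Rome", {"Paris": emp_paris, "Rome": emp_rome})` moves Ann's tuple from the Paris fragment to the Rome fragment. An update to `salary` in the vertical fragments would change a row in place and move nothing.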
Levels of Distribution Transparency
Fragmentation Transparency: Applications use global relations as if
the database were not distributed.
Location Transparency: Need to know fragmentation schema; but
need not know where fragments are located. Applications access
fragments (no need to specify sites where fragments are located).
Local Mapping Transparency: Need to know both fragmentation and
allocation schema; no need to know what the underlying local
DBMSs are. Applications access fragments explicitly specifying
where the fragments are located.
No Transparency: Need to know local DBMS query languages, and
write applications using functionality provided by the local DBMS.
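The difference between the levels is what the application has to name in order to read data. The sketch below assumes a toy catalog held in dictionaries; `FRAGMENTATION`, `ALLOCATION`, `LOCAL_DATA`, and the three `read_*` functions are hypothetical names chosen for illustration.

```python
# Hypothetical catalog for one global relation EMP, split by city.
FRAGMENTATION = {"EMP": ["EMP_PARIS", "EMP_ROME"]}      # relation -> fragments
ALLOCATION = {"EMP_PARIS": ["site1", "site2"],          # fragment -> sites
              "EMP_ROME": ["site2"]}
LOCAL_DATA = {                                          # (site, fragment) -> rows
    ("site1", "EMP_PARIS"): [(1, "Ann")],
    ("site2", "EMP_PARIS"): [(1, "Ann")],
    ("site2", "EMP_ROME"): [(2, "Bob")],
}


def read_fragment_at(fragment, site):
    """Local mapping transparency: caller names the fragment and the site."""
    return list(LOCAL_DATA[(site, fragment)])


def read_fragment(fragment):
    """Location transparency: caller names the fragment; the system picks a site."""
    site = ALLOCATION[fragment][0]
    return read_fragment_at(fragment, site)


def read_relation(relation):
    """Fragmentation transparency: caller names only the global relation."""
    rows = []
    for fragment in FRAGMENTATION[relation]:
        rows.extend(read_fragment(fragment))
    return rows


all_emps = read_relation("EMP")                       # no knowledge of fragments
paris_emps = read_fragment("EMP_PARIS")               # knows fragments, not sites
paris_copy = read_fragment_at("EMP_PARIS", "site2")   # knows fragments and sites
```

With no transparency at all, the application would also have to speak each local DBMS's own query language instead of calling a common `read_fragment_at`.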
Why is support for transparency difficult?
There are tough problems in query optimization and transaction
management that need to be tackled (in terms of system support
and implementation) before fragmentation transparency can be
supported.
The less distribution transparency is provided, the more the
end-application developer needs to know about fragmentation and
allocation schemes, and how to maintain database consistency.
Higher levels of distribution transparency require appropriate
DDBMS support, but make the end-application developer's work easier.
Some Aspects of top-down architecture
Distributed database technology is an “add-on” technology: most
users already have populated centralized DBMSs, whereas top-down
design assumes implementation of a new DDBMS from scratch.
In case of OODBMSs, top-down architecture makes sense because
most OODBMSs are going to be built from scratch.
In many application environments, such as semi-structured
databases and continuous multimedia data, the notion of a fragment
is difficult to define.
Current relational DBMS products provide for some form of location
transparency (for example, by using nicknames).
Bottom up Architecture - Present & Future
Possible ways in which multiple databases may be put together for
sharing by multiple DBMSs.
The DBMSs are characterized according to
•Autonomy - degree to which individual DBMSs can operate
independently. Tightly coupled - integrated (A0), Semiautonomous -
federated (A1), Total Isolation - multidatabase systems (A2)
•Distribution - no distribution - single site (D0), client-server -
distribution of DBMS functionality (D1), full distribution - peer-to-peer
distributed architecture (D2)
•Heterogeneity - homogeneous (H0) or heterogeneous (H1)
Distributed DBMS Implementation Alternatives
[Figure: three-dimensional design space with axes Autonomy,
Distribution, and Heterogeneity; the points (A0,D2,H0) and (A2,D2,H1)
are marked]
Architectural Alternatives
(A0,D0,H0): multiple DBMSs that are logically integrated at a single
site - composite systems.
(A0,D0,H1): multiple database managers that are heterogeneous
but provide an integrated view to the user.
(A0,D1,H0): client-server based DBMS.
(A0,D2,H0): Classical distributed database system architecture.
(A1,D0,H0): Single site, homogeneous, federated database systems
- not realistic.
(A1,D0,H1): heterogeneous federated DBMS, with a common
interface over disparate cooperating specialized database systems.
Architectural Alternatives
(A1,D1,H1): heterogeneous federated database systems with
components of the systems placed at different sites.
(A2,D0,H0): homogeneous multidatabase systems at a single site.
(A2,D0,H1): heterogeneous multidatabase systems at a single site.
(A2,D1,H1) & (A2,D2,H1): distributed heterogeneous multidatabase
systems. In client-server environments this creates a three-layer
architecture. Interoperability is the major issue.
Autonomy, distribution, and heterogeneity are orthogonal issues.
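Because the three dimensions are orthogonal, the full set of alternatives is their cross product. The sketch below enumerates it, using the codes and labels from the slides; the dictionary names and the `describe` helper are hypothetical.

```python
from itertools import product

AUTONOMY = {"A0": "tightly coupled (integrated)",
            "A1": "semiautonomous (federated)",
            "A2": "total isolation (multidatabase)"}
DISTRIBUTION = {"D0": "single site",
                "D1": "client-server",
                "D2": "peer-to-peer (full distribution)"}
HETEROGENEITY = {"H0": "homogeneous", "H1": "heterogeneous"}


def describe(a, d, h):
    """Spell out one point of the (autonomy, distribution, heterogeneity) space."""
    return f"({a},{d},{h}): {AUTONOMY[a]}, {DISTRIBUTION[d]}, {HETEROGENEITY[h]}"


# Orthogonal dimensions: the design space is the product of the three axes.
design_space = list(product(AUTONOMY, DISTRIBUTION, HETEROGENEITY))
classical = describe("A0", "D2", "H0")
```

The product has 3 x 3 x 2 = 18 points; the slides discuss only the combinations of practical interest.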
Client/Server Database Systems
Distinguish and divide the functionality to be provided into two
classes: server functions and client functions. That is, a two-level
architecture. Made popular by relational DBMS implementations.
DBMS client: user interface, application, consistency checking of
queries, and caching and managing locks on cached data.
DBMS server: handles query optimization, data access and
transaction management.
Typical scenarios: multiple clients/single server; multiple
clients/multiple servers (dedicated home-server or any server).
Client/Server Reference Architecture
Client (running on the client operating system):
•User Interface
•Application Program
•Client DBMS
•Communication software
The client sends SQL queries to the server and receives result relations.
Server (running on the server operating system):
•Communication software
•Semantic Data Controller
•Query Optimizer
•Transaction Manager
•Recovery Manager
•Runtime Support Processor
•Database
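Distributed Database Reference Architecture
External schemas: ES1 ES2 ... ESn
Global conceptual schema: GCS
Local conceptual schemas: LCS1 LCS2 ... LCSn
Local internal schemas: LIS1 LIS2 ... LISn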
Components of Distributed DBMS
The user sends User Requests and receives System Responses.
User Processor:
•User Interface Handler (uses the External Schema)
•Semantic Data Controller (uses the Global Conceptual Schema)
•Global Query Optimizer (uses the GD/D)
•Global Execution Monitor
Data Processor:
•Local Query Processor (uses the Local Conceptual Schema)
•Local Recovery Manager (uses the System Log)
•Runtime Support Processor (uses the Local Internal Schema and
accesses the Database)
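MDBS Architecture with Global Schema
Global external schemas: GES1 GES2 GES3
Global conceptual schema: GCS
Local external schemas: LES11 LES12 LES13 ... LESn1 LESn2 LESn3
Local conceptual schemas: LCS1 ... LCSn
Local internal schemas: LIS1 ... LISn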
MDBS Architecture without Global Schema
Multidatabase Layer: ES1 ES2 ... ESn
Local Database System Layer: LCS1 LCS2 ... LCSn
LIS1 LIS2 ... LISn

Components of MDBS
The user sends User Requests to the Multi-DBMS Layer and receives
System Responses from it.
The Multi-DBMS Layer sits on top of several component DBMSs, each
with its own:
•Query Processor
•Transaction Manager
•Scheduler
•Recovery Manager
•Runtime Support Processor
•Database
Global Directory Issues
The directory is itself a database that contains metadata about the
actual data stored in the database. It includes the support for
fragmentation transparency for the classical DDBMS architecture.
The directory can be local or distributed.
The directory can be replicated and/or partitioned.
Directory issues are very important for large multi-database
applications, such as digital libraries.
Impact of new technologies
Internet and WWW
 •Semi-structured data, multimedia data
 •Keyword based search - browsing versus querying
 •What does integration mean?
Applied technologies
 •Workflow systems
 •Data warehousing & Data mining
 •What is the role of distributed database technology?
Research Issues - DDBMS Technology
Evaluation of state of the art data replication strategies.
On-line distributed relational database redesign.
Distributed object-oriented database systems - design
(fragmentation, allocation), query processing (methods execution,
transformation), transaction processing
WWW and Internet - transparency issues, implementation strategies
(architecture, scalability), on-line transaction processing, on-line
analytical processing (data warehousing, data mining), query
processing (STRUDEL, WebSQL), commit protocols
Research Issues - Applications
Workflow systems - High throughput (supply chain, Amazon,..)
short, sweet, and robust versus ad-hoc (office automation) problem
solving.
Electronic commerce - reliable high throughput, distributed
transactions.
Distributed multimedia - QoS, real-time delivery, design and data
allocation, MPEG-4 aspects.
