Distributed Database Management Systems
In this chapter, you will learn:
• What a distributed database management system (DDBMS) is and what its components are
• How database implementation is affected by different levels of data and process distribution
• How transactions are managed in a distributed database environment
• How database design is affected by the distributed database environment
Evolution of DDBMS
• Distributed (decentralized) database management system (DDBMS)
  • Definition: a database system that governs the storage and processing of logically related data over interconnected computer systems
  • Data and processing functions reside at multiple sites
• 1970s: centralized DBMS
  • Structured information systems, built mainly to produce reports
• 1980s: social and technical changes
  • Ad hoc query capability required
  • Decentralized management structures became common
• Future (discussion)
  • Centralized
  • Decentralized
  • Multiple sources
  • Hybrid (combined)
  • Multiple processes
  • User-oriented
DDBMS Advantages
• Data located near the site with the greatest demand, distributed to match business requirements
• Faster data access for end users
• Faster data processing
• Growth facilitation (the system can be extended by adding sites)
• Improved communications, which improve the information system
DDBMS Disadvantages
• Complexity of management and control
• Security
• Lack of standards
• Increased storage requirements
• Greater difficulty in managing the data environment
• Increased training costs
Distributed Processing and Distributed Database
• Distributed processing
  • Shares the database's logical processing among two or more physically independent, networked sites
  • Example: data I/O, data selection, and data validation are performed on one computer, while reports are generated on another
Shared data processing among three sites through a communication network
• Distributed database
  • Stores a logically related database over two or more physically independent sites
  • The sites are connected through a network
Distributed Database vs. Distributed Processing
• Distributed processing
  • Shares processing chores among several sites
  • Does not require a distributed database
• Distributed database
  • The database is composed of several parts, called database fragments
  • The database fragments are located at different sites
Definition of DDBMS
• Software that governs the storage and processing of logically related data over interconnected computer systems, in which both data and processing functions are distributed among several sites
• It must provide at least the following functions
Functions of DDBMS
• Application/end-user interface: interacts with the end user or application programs and with other DBMSs within the distributed database
• Validation: analyzes data requests
• Transformation: determines which data request components are distributed and which are local
• Query optimization: finds the best access strategy
• Mapping: determines the data location of local and remote fragments
• I/O interface: reads or writes data from or to permanent storage
• Formatting: prepares the data for presentation to the end user or application program
• Security: provides data privacy at both local and remote databases
• Backup and recovery: ensures the availability and recoverability of the database
• Database administration: features for DBAs
• Concurrency control: manages simultaneous data access
• Transaction management: ensures that the database moves from one consistent state to another
Centralized Database
Fully Distributed Database Management System
• Performs all the functions of a centralized DBMS and also handles the distribution of data and processing
Figure 10.4
DDBMS Components
• Computer workstations, independent of hardware platform
• Network hardware and software components that reside in each workstation
• Communications media
Distributed Database
Components
Figure 10.5
DDBMS Protocols (Rules)
• Interface with the network to transport data and commands between data processors (DPs) and transaction processors (TPs)
• Synchronize data received from DPs and route it to the appropriate TPs
• Ensure common database functions
  • Security
  • Concurrency control
Levels of Data and Process Distribution
• Database systems can be classified based on process distribution and data distribution (Table 10.1)
Single-Site Processing, Single-Site Data (SPSD)
• All processing is done on a single CPU or host computer
• All data are stored on the host computer's disk
• The DBMS is located on the host computer
• The DBMS is accessed by dumb terminals
• Typical of mainframe and minicomputer DBMSs
• Typical of the first generation of single-user microcomputer databases
Figure 10.6
Multiple-Site Processing,
Single-Site Data (MPSD)
• Requires network file server
• Applications accessed through LAN
• Variation known as client/server architecture
Figure 10.7
• Difference between a file server and a client/server distributed system
  • Both perform multiple-site processing
  • In a client/server system, all data processing is done at the server site
  • A file server requires the database to be located at a single site, while client/server may support data at multiple sites
Multiple-Site Processing, Multiple-Site Data (MPMD)
• Fully distributed DDBMS with support for multiple DPs and TPs at multiple sites
• Depending on the level of support for different types of centralized DBMSs, DDBMSs are also classified as
  • Homogeneous distributed database system: integrates one type of centralized DBMS over the network
  • Heterogeneous distributed database system: integrates different types of centralized DBMSs over a network
Heterogeneous Distributed
Database Scenario
Figure 10.8
Multiple-Site Processing, Multiple-Site Data (MPMD)
• Fully heterogeneous distributed database system
  • Supports different DBMSs that may even support different data models (relational, hierarchical, or network), running on different platforms
• Restrictions still imposed by some implementations
  • Remote access may be read-only
  • Limits on the number of remote tables accessed in a single transaction
  • Limits on the number of distinct databases accessed
  • Limits on the database models that can be accessed
Distributed DB Transparency
• A DDBMS requires several functional characteristics, or transparency features
• Lets each end user feel like the database's only user, as if working with a centralized system
• Hides the complexities of the distributed database
• Transparency features
  • Distribution: the database is treated as a single system; users do not need to know how data are partitioned, replicated, or located
  • Transaction: a transaction can run at several sites and still preserve database integrity
  • Failure: the system continues to operate even if one of its nodes fails
  • Performance: the system performs as if it were a centralized, high-performance system
  • Heterogeneity: several different local DBMSs are integrated under a common, or global, schema
Distribution Transparency
• Allows a physically dispersed database to be managed as though it were centralized
• Three levels, distinguished by what the end user or programmer must know
  • Fragmentation transparency: the programmer does not need to know that the database is partitioned, so neither fragment names nor fragment locations are specified
  • Location transparency: the programmer must specify the fragment names but not their locations
  • Local mapping transparency: the programmer must specify both the fragment names and their locations
Example:
EMPLOY (EMP_NAME, EMP_DOB, EMP_ADDRESS, EMP_DEPARTMENT, EMP_SALARY)
Distributed over three different locations: New York (E1), Atlanta (E2), and Miami (E3)
Case 1: Fragmentation transparency
SELECT *
FROM EMPLOY
WHERE EMP_DOB < '01-JAN-1940';
Case 2: Location transparency
SELECT *
FROM E1
WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT *
FROM E2
WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT *
FROM E3
WHERE EMP_DOB < '01-JAN-1940';
Case 3: Local mapping transparency
SELECT *
FROM E1 NODE NY
WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT *
FROM E2 NODE ATL
WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT *
FROM E3 NODE MIA
WHERE EMP_DOB < '01-JAN-1940';
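The gap between naming every fragment and naming only the table can be tried out locally. The following is a minimal sketch, not a real DDBMS: a single in-memory SQLite database stands in for all three sites, the sample rows are invented, and dates are stored in ISO format so that string comparison orders them correctly. A view named EMPLOY hides the fragments, which roughly imitates fragmentation transparency.

```python
import sqlite3

# Assumption: one in-memory SQLite database stands in for all three sites.
conn = sqlite3.connect(":memory:")
columns = ("EMP_NAME TEXT, EMP_DOB TEXT, EMP_ADDRESS TEXT, "
           "EMP_DEPARTMENT TEXT, EMP_SALARY REAL")
for fragment in ("E1", "E2", "E3"):
    conn.execute(f"CREATE TABLE {fragment} ({columns})")

# Hypothetical rows, one per fragment.
conn.execute("INSERT INTO E1 VALUES ('Ann', '1938-05-02', 'New York', 'Sales', 50000)")
conn.execute("INSERT INTO E2 VALUES ('Bob', '1955-11-20', 'Atlanta', 'IT', 60000)")
conn.execute("INSERT INTO E3 VALUES ('Cy', '1939-01-15', 'Miami', 'Sales', 55000)")

# Location transparency: the programmer names every fragment.
by_fragment = conn.execute(
    "SELECT EMP_NAME FROM E1 WHERE EMP_DOB < '1940-01-01' "
    "UNION SELECT EMP_NAME FROM E2 WHERE EMP_DOB < '1940-01-01' "
    "UNION SELECT EMP_NAME FROM E3 WHERE EMP_DOB < '1940-01-01'"
).fetchall()

# Fragmentation transparency: a view hides the fragments behind one name.
conn.execute(
    "CREATE VIEW EMPLOY AS "
    "SELECT * FROM E1 UNION SELECT * FROM E2 UNION SELECT * FROM E3"
)
by_table = conn.execute(
    "SELECT EMP_NAME FROM EMPLOY WHERE EMP_DOB < '1940-01-01'"
).fetchall()

names_by_fragment = sorted(name for (name,) in by_fragment)
names_by_table = sorted(name for (name,) in by_table)
print(names_by_fragment, names_by_table)
```

Both queries return the same employees; only the amount the programmer has to know about the fragments differs.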
• Distribution transparency is supported by a distributed data dictionary (DDD), or distributed data catalog (DDC), which contains the description of the entire database
• This description is called the distributed global schema
• The DDD or DDC is itself a distributed database and must be kept up to date at all sites
• Some current DDBMS implementations impose limitations on the level of transparency support
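One way to picture the role of the data dictionary is as a lookup table that turns a query on a global table name into per-fragment queries. The sketch below is illustrative only: the `CATALOG` structure and the `expand_query` helper are hypothetical, and the `NODE` keyword is the notation used on these slides, not standard SQL.

```python
# Hypothetical catalog: global table name -> list of (fragment, node) pairs.
CATALOG = {
    "EMPLOY": [("E1", "NY"), ("E2", "ATL"), ("E3", "MIA")],
}


def expand_query(table: str, predicate: str) -> str:
    """Rewrite a query on a global table into one SELECT per fragment."""
    fragments = CATALOG[table]
    parts = [
        f"SELECT * FROM {fragment} NODE {node} WHERE {predicate}"
        for fragment, node in fragments
    ]
    # The fragments are combined with UNION; only the last SELECT is terminated.
    return "\nUNION\n".join(parts) + ";"


local_query = expand_query("EMPLOY", "EMP_DOB < '01-JAN-1940'")
print(local_query)
```

The programmer writes only the table name and the predicate; fragment names and locations come from the catalog, which is why the catalog must be current at every site.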
Transaction Transparency
• Ensures that distributed transactions maintain database integrity and consistency
• A transaction is completed only if all involved database sites complete their part of it
• Management mechanisms
  • Remote request
  • Remote transaction
  • Distributed transaction
  • Distributed request
Remote Request
• Lets a single SQL statement access data at a single remote site
Remote Transaction
• Composed of several requests, all of which access data at a single remote site
Distributed Transaction
• Allows a transaction to reference several different DP sites
Distributed Requests
• Let a single SQL statement reference data located at several different DP sites
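The four mechanisms differ in how many requests a transaction contains and how many sites each request touches. The sketch below encodes that distinction under one stated assumption: a distributed request is a single request that references more than one site. The `classify` function and its input format (one set of site names per request) are hypothetical.

```python
from typing import List, Set


def classify(requests: List[Set[str]]) -> str:
    """Classify a transaction given the set of sites each request references."""
    if not requests or any(not sites for sites in requests):
        raise ValueError("each request must reference at least one site")
    all_sites = set().union(*requests)
    # Any single request spanning several sites makes it a distributed request.
    if any(len(sites) > 1 for sites in requests):
        return "distributed request"
    # Every request is single-site, but the transaction as a whole is not.
    if len(all_sites) > 1:
        return "distributed transaction"
    # Several requests, all at the same site.
    if len(requests) > 1:
        return "remote transaction"
    return "remote request"


print(classify([{"NY"}]))
print(classify([{"NY"}, {"NY"}]))
print(classify([{"NY"}, {"ATL"}]))
print(classify([{"NY", "ATL"}]))
```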
Distributed Concurrency Control
• Multisite, multiple-process operations are more likely to create data inconsistencies and deadlocked transactions
• Problem: a transaction is committed by each local DP, but one DP cannot commit the transaction's result, leaving the database inconsistent
Two-Phase Commit Protocol
• Requires a DO-UNDO-REDO protocol
• Requires a write-ahead protocol (the log entry is written before the database fragment is updated)
• Two kinds of nodes
  • Coordinator
  • Subordinates
• Phases
  • Preparation
    • The coordinator sends a prepare message to all subordinates
    • Each subordinate confirms whether it is ready to commit or must abort
  • Final commit
    • Ensures that all subordinates have committed or aborted
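The two phases can be sketched in a few lines. This is a simplified, single-process illustration, assuming reliable messaging and no timeouts or crash recovery; the `Subordinate` class, its `can_commit` flag, and the log strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subordinate:
    name: str
    can_commit: bool = True  # hypothetical flag simulating a local failure
    log: List[str] = field(default_factory=list)
    state: str = "active"

    def prepare(self) -> bool:
        # Write-ahead: record the vote in the log before answering.
        self.log.append("PREPARED" if self.can_commit else "NOT PREPARED")
        return self.can_commit

    def commit(self) -> None:
        self.log.append("COMMIT")
        self.state = "committed"

    def abort(self) -> None:
        self.log.append("ABORT")
        self.state = "aborted"


def two_phase_commit(subordinates: List[Subordinate]) -> str:
    """Play the coordinator role for one transaction."""
    # Phase 1 (preparation): ask every subordinate whether it can commit.
    votes = [node.prepare() for node in subordinates]
    # Phase 2 (final commit): commit only if every vote was yes, else abort all.
    if all(votes):
        for node in subordinates:
            node.commit()
        return "committed"
    for node in subordinates:
        node.abort()
    return "aborted"


ok = two_phase_commit([Subordinate("NY"), Subordinate("ATL")])
failed_nodes = [Subordinate("NY"), Subordinate("MIA", can_commit=False)]
failed = two_phase_commit(failed_nodes)
print(ok, failed)
```

In the second run a single negative vote aborts the transaction at every site, including the one that was ready to commit, which is the all-or-nothing behavior the protocol exists to guarantee.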
Performance Transparency and Query Optimization
• Objective: minimize the total cost associated with the execution of a request
• Main costs
  • Access time
  • Communication
  • CPU time
• Basis for query optimization algorithms
  • Optimum execution order
  • Choice of sites to access so that communication costs are minimized
• Dynamic or static optimization
• Statistically based vs. rule-based query optimization algorithms
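As a toy illustration of the cost objective, suppose the same data is replicated at three sites and the optimizer must pick one. The sketch assumes total cost is the plain sum of access time, communication cost, and CPU time; the `Plan` class, the site names, and every number are invented for the example, and a real optimizer would weight and estimate these terms from statistics.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Plan:
    site: str
    access_time: float    # I/O cost of reading the data at that site
    communication: float  # cost of shipping the request and the results
    cpu_time: float       # processing cost

    @property
    def total_cost(self) -> float:
        # Assumption: an unweighted sum of the three main costs.
        return self.access_time + self.communication + self.cpu_time


def cheapest(plans: List[Plan]) -> Plan:
    """Return the plan with the lowest total cost."""
    return min(plans, key=lambda plan: plan.total_cost)


plans = [
    Plan("NY", access_time=4.0, communication=10.0, cpu_time=1.0),
    Plan("ATL", access_time=6.0, communication=2.0, cpu_time=1.5),
    Plan("MIA", access_time=5.0, communication=8.0, cpu_time=1.0),
]
best = cheapest(plans)
print(best.site, best.total_cost)
```

With these numbers the site with the slowest disk access still wins, because its communication cost is much lower.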
Distributed Database Design
• Data fragmentation: how to partition the database into fragments
  • Horizontal
  • Vertical
  • Mixed
• Data replication: which fragments to replicate
  • Storage of data copies at multiple sites
  • Fully replicated, partially replicated, or unreplicated databases
• Data allocation: where to locate the data
  • Centralized, partitioned, or replicated
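Horizontal and vertical fragmentation can be sketched on a small in-memory table. The rows, the `SITE` column, and the use of `EMP_NAME` as the key are assumptions made for the example; the helper functions are hypothetical, not part of any DBMS.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical sample rows for the EMPLOY table.
employ: List[Row] = [
    {"EMP_NAME": "Ann", "EMP_DEPARTMENT": "Sales", "EMP_SALARY": 50000, "SITE": "NY"},
    {"EMP_NAME": "Bob", "EMP_DEPARTMENT": "IT", "EMP_SALARY": 60000, "SITE": "ATL"},
    {"EMP_NAME": "Cy", "EMP_DEPARTMENT": "Sales", "EMP_SALARY": 55000, "SITE": "NY"},
]


def horizontal(rows: List[Row], column: str, value: object) -> List[Row]:
    """Horizontal fragment: a subset of rows, all columns kept."""
    return [row for row in rows if row[column] == value]


def vertical(rows: List[Row], key: str, columns: List[str]) -> List[Row]:
    """Vertical fragment: a subset of columns, always including the key."""
    return [{name: row[name] for name in [key, *columns]} for row in rows]


ny_rows = horizontal(employ, "SITE", "NY")
pay = vertical(employ, "EMP_NAME", ["EMP_SALARY"])
org = vertical(employ, "EMP_NAME", ["EMP_DEPARTMENT", "SITE"])

# Rebuild the original table by joining the vertical fragments on the key.
org_by_key = {row["EMP_NAME"]: row for row in org}
rebuilt = [{**row, **org_by_key[row["EMP_NAME"]]} for row in pay]
print(len(ny_rows), rebuilt == employ)
```

Mixed fragmentation applies both steps, for example `vertical(horizontal(employ, "SITE", "NY"), "EMP_NAME", ["EMP_SALARY"])`. Repeating the key in every vertical fragment is what makes the join back to the full table possible.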
Client/Server Advantages Over DDBMS
• Client/server is less expensive
• Client/server solutions allow use of the microcomputer's GUI
• More people have PC skills than mainframe skills
• The PC is well established in the workplace
• Numerous data analysis and query tools exist
• Considerable cost advantages to off-loading application development
Client/Server Disadvantages
Date’s 12 Commandments for
Distributed Databases
1. Local Site Independence
2. Central Site Independence
3. Failure Independence
4. Location Transparency
5. Fragmentation Transparency
6. Replication Transparency
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence