
Chapter 10

Distributed Database
Management Systems
In this chapter, you will learn:
 What a distributed database management
system (DDBMS) is and what its
components are
 How database implementation is affected
by different levels of data and process
distribution

 How transactions are managed in a
distributed database environment
 How database design is affected by the
distributed database environment

Evolution of DDBMS
 Distributed (Decentralized) database management
systems (DDBMS)
  • Definition: a DDBMS governs the storage and processing of logically related data over interconnected computer systems
  • Both data and processing functions reside on multiple sites

• 1970s: Centralized DBMS
  • Structured information systems for reports
  • Written in structured (3GL) programming languages
  • Data stored on a central mainframe or minicomputer

• 1980s: Social and technical changes
  • Ad hoc query capability required
  • Decentralized management structures became common

• 1990s: New forces
  • Internet and the World Wide Web used for data access and distribution
  • Data analysis through data mining and data warehousing

• Future (discussion)
  • Centralized
  • Decentralized
  • Hybrid (combined)
  • Grid-enhanced
[Figure labels: multiple source, multiple location, multiple processes, user-oriented]

DDBMS Advantages
 Data located near site with greatest demand,
distributed to match business requirements
 Faster data access delivered to end-user
 Faster data processing
 Growth facilitation (scalability of system
extension)
• Improved communications across the information system

 Reduced operating costs


 User-friendly interface (GUI)
 Less danger of single-point failure
 Processor independence

DDBMS Disadvantages
 Complexity of management and control
 Security
 Lack of standards
 Increased storage requirements
 Greater difficulty in managing data
environment
• Increased training costs

Distributed Processing and
Distributed Database
• Distributed processing
  • Shares the database's logical processing among two or more physically independent sites connected through a network
  • Example: data I/O, data selection, and data validation are performed on one computer, while reports are generated on another computer

Figure: Data processing shared among three sites through a communication network

• Distributed database
  • Stores a logically related database over two or more physically independent sites
  • The sites are connected through a network

Distributed Database

Distributed Database
vs. Distributed Processing
• Distributed processing
  • Shares processing chores among several sites
  • Does not require a distributed database
  • May be based on a single database on a single computer
  • Copies or parts of the database processing functions must be distributed to all data storage sites

• Distributed database
  • Composed of several parts of the database, called database fragments
  • Database fragments are located at different sites
  • Requires distributed processing

• Both (common points)
  • Require a network to connect components

Definition of DDBMS
• Software that governs the storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites
• It must provide at least the following functions

Functions of DDBMS
• Application/end-user interface to interact with the end user or application programs and with other DBMSs within the distributed database
• Validation to analyze data requests
• Transformation to determine which data request components are distributed and which are local
• Query optimization to find the best access strategy
• Mapping to determine the data location of local and remote fragments

 I/O interface to read or write data from or to
permanent storage
 Formatting to prepare the data for presentation
to the end-user or application program
 Security to provide data privacy at both local
and remote databases
 Backup and recovery to ensure the availability
and recoverability of the database in case of a failure

• Database administration features for DBAs
• Concurrency control to manage simultaneous data access
• Transaction management to ensure that the data move from one consistent state to another

Centralized Database

Fully Distributed Database
Management System
Handles not only the functions of a centralized database, but also the distribution of data and processing.

1. Users can access “local data” without knowing anything about the rest of the data fragments

Figure 10.4

DDBMS Components
 Computer workstations, independent of
platform
 Network hardware and software
components that reside in each workstation
 Communications media

• Transaction processor (TP)
  • Software component in each workstation that requests data
  • Receives and processes the application's data requests (local and remote)
  • Also called application processor (AP) or transaction manager (TM)

• Data processor (DP)
  • Software component residing on each computer that stores and retrieves the data located at that site
  • Also called data manager (DM)
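The division of labor between the TP and the DPs can be illustrated with a minimal Python sketch. The class names, the sample rows, and the in-process method calls are assumptions made for illustration; a real DDBMS would exchange these requests over the network.

```python
class DataProcessor:
    """DP: stores and retrieves the data located at one site."""

    def __init__(self, site, rows):
        self.site = site
        self.rows = rows  # the local data, held as a list of dicts

    def select(self, predicate):
        # Only the rows stored at this site are visible to this DP.
        return [row for row in self.rows if predicate(row)]


class TransactionProcessor:
    """TP: receives an application's data request and forwards it to the DPs."""

    def __init__(self, data_processors):
        self.data_processors = data_processors

    def select(self, predicate):
        results = []
        for dp in self.data_processors:  # a network call in a real system
            results.extend(dp.select(predicate))
        return results


# Hypothetical sample data for two sites.
dp_ny = DataProcessor("NY", [{"EMP_NAME": "Ana", "EMP_SALARY": 50000}])
dp_atl = DataProcessor("ATL", [{"EMP_NAME": "Bo", "EMP_SALARY": 60000}])
tp = TransactionProcessor([dp_ny, dp_atl])
high_paid = tp.select(lambda row: row["EMP_SALARY"] > 55000)
print(high_paid)
```

The application talks only to the TP, so it does not need to know how many DPs exist or where they run.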

Distributed Database
Components

Figure 10.5

DDBMS Protocols (Rules)
 Interface with network to transport data and
commands between DPs and TPs
 Synchronize data received from DPs and route
to appropriate TPs
 Ensure common database functions
 Security
 Concurrency control

 Backup and recovery

Levels of Data and Process
Distribution
Database systems can be classified based
on process distribution and data distribution

Table 10.1

Single-Site Processing, Single-
Site Data (SPSD)
 All processing on single CPU or host computer
 All data are stored on host computer disk
 DBMS located on the host computer
 DBMS accessed by dumb terminals
 Typical of mainframe and minicomputer
DBMSs
 Typical of 1st generation of single-user
microcomputer databases


Figure 10.6

Multiple-Site Processing,
Single-Site Data (MPSD)
• Requires network file server
• Applications accessed through LAN
• Variation known as client/server architecture

Figure 10.7

• Differences between file server and client/server systems
• In a client/server system, all data processing is done at the server site
• Both perform multiple-site processing
• A file server requires the database to be located at a single site, while client/server may support data at multiple sites

Multiple-Site Processing,
Multiple-Site Data (MPMD)
 Fully distributed DDBMS with support for
multiple DPs and TPs at multiple sites
• Depending on the level of support for different types of DBMSs, DDBMSs are also classified as
  • Homogeneous distributed database system
    • Integrates only one type of centralized DBMS over a network
  • Heterogeneous distributed database system
    • Integrates different types of centralized DBMSs over a network

Heterogeneous Distributed
Database Scenario

Figure 10.8

• Fully heterogeneous distributed database system
  • Supports different DBMSs that may even support different data models (relational, hierarchical, or network) running under different platforms
• Typical restrictions
  • Remote access may be read-only
  • Limits on the number of remote tables accessed in a single transaction
  • Limits on the number of distinct databases accessed
  • Limits on the database models that may be accessed

Distributed DB Transparency
• A DDBMS requires several functional characteristics, or transparency features
• They allow each end user to feel like the database's only user (as if working with a centralized system)
• They hide the complexities of the distributed database

• Transparency features
  • Distribution (the database is treated as a single system; users do not need to know how data are partitioned, replicated, or located)
  • Transaction (a transaction can update data at several sites while database integrity is still ensured)
  • Failure (the system continues to operate even if one of its nodes fails)
  • Performance (the system feels like a centralized system with high performance)
  • Heterogeneity (integrates several different local DBMSs under a common, or global, schema)

Distribution Transparency
• Allows a physically dispersed database to be managed as though it were centralized (the level depends on what the end user or programmer must know)
• Three levels
  • Fragmentation transparency
    • The programmer does not need to know that the database is partitioned, so neither fragment names nor locations are needed
  • Location transparency
    • The programmer must know the names of the database fragments, but not their locations
  • Local mapping transparency
    • The programmer must specify both the fragment names and their locations

Distribution Transparency
• Example:
  • EMPLOY (EMP_NAME, EMP_DOB, EMP_ADDRESS, EMP_DEPARTMENT, EMP_SALARY)
  • Distributed over three different locations: New York (fragment E1), Atlanta (fragment E2), and Miami (fragment E3)

• Case 1: Fragmentation transparency

SELECT *
FROM EMPLOY
WHERE EMP_DOB < '01-JAN-1940';

• Case 2: Location transparency

SELECT *
FROM E1
WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT *
FROM E2
WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT *
FROM E3
WHERE EMP_DOB < '01-JAN-1940';

• Case 3: Local mapping transparency

SELECT *
FROM E1 NODE NY
WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT *
FROM E2 NODE ATL
WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT *
FROM E3 NODE MIA
WHERE EMP_DOB < '01-JAN-1940';

• Distribution transparency is supported by a distributed data dictionary (DDD), or distributed data catalog (DDC), which contains the description of the entire database
  • The description is called the distributed global schema
  • The DDD or DDC is itself a distributed database and must be kept up to date at all sites
• Some current DDBMS implementations impose limitations on the level of transparency support
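To make the role of the catalog concrete, the following Python sketch shows how a query against the global table could be expanded into a fully located form using a catalog lookup. The `CATALOG` dictionary and the `expand_query` function are hypothetical; only the fragment names (E1, E2, E3), the node names (NY, ATL, MIA), and the `NODE` notation come from the example above.

```python
# Hypothetical distributed data catalog: global table -> (fragment, node) pairs.
# The dictionary layout is an assumption made for this sketch.
CATALOG = {
    "EMPLOY": [("E1", "NY"), ("E2", "ATL"), ("E3", "MIA")],
}


def expand_query(table, predicate, catalog=CATALOG):
    """Rewrite a query on a global table as a UNION over its fragments.

    With fragmentation transparency the user names only the global table;
    the fragment names and locations are looked up in the catalog.
    """
    parts = [
        f"SELECT * FROM {fragment} NODE {node} WHERE {predicate}"
        for fragment, node in catalog[table]
    ]
    return "\nUNION\n".join(parts) + ";"


expanded = expand_query("EMPLOY", "EMP_DOB < '01-JAN-1940'")
print(expanded)
```

The user-facing query stays at the Case 1 level, while the generated text corresponds to the Case 3 form that a programmer would otherwise have to write by hand.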

Transaction Transparency
• Ensures that distributed transactions maintain database integrity and consistency
• A transaction is completed only if all involved database sites complete their part of the transaction
• Management mechanisms
  • Remote request
  • Remote transaction
  • Distributed transaction
  • Distributed request

Remote Request
• Allows a single SQL statement to reference data at a single remote site

Remote Transaction
• Composed of several requests that reference data at a single remote site

Distributed Transaction
 Allows a transaction to
reference several
different DP sites.
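The four management mechanisms can be told apart by how many requests a transaction contains and how many sites each request touches. The sketch below encodes that rule in Python. The `classify` function and its input format are hypothetical, and the rule that a distributed request is a single statement spanning several sites is the usual textbook distinction rather than something stated on these slides.

```python
def classify(requests):
    """Classify a transaction by the sites its requests reference.

    `requests` is a list with one entry per SQL statement; each entry is
    the set of remote sites that the statement references (an assumed
    representation chosen for this sketch).
    """
    if not requests or any(not sites for sites in requests):
        raise ValueError("every request must reference at least one site")
    if any(len(sites) > 1 for sites in requests):
        # A single statement spans several sites.
        return "distributed request"
    all_sites = set().union(*requests)
    if len(all_sites) > 1:
        # Several statements, each confined to one site, but not the same site.
        return "distributed transaction"
    if len(requests) > 1:
        return "remote transaction"
    return "remote request"


print(classify([{"NY"}]))
print(classify([{"NY"}, {"NY"}]))
print(classify([{"NY"}, {"ATL"}]))
print(classify([{"NY", "ATL"}]))
```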

Distributed Requests

Distributed Concurrency
Control
• Multisite, multiple-process operations are more likely to create data inconsistencies and deadlocked transactions
• Problem scenario
  • A transaction is committed by each local DP
  • One DP cannot commit the transaction's result
  • The result is an inconsistent database

Two-Phase Commit Protocol
• Relies on the DO-UNDO-REDO protocol and the write-ahead protocol
• Two kinds of nodes
  • Coordinator
  • Subordinates
• Phases
  • Preparation
    • The coordinator sends a prepare-to-commit message to all subordinates
    • The coordinator confirms that all subordinates are ready to commit; otherwise the transaction is aborted
  • Final commit
    • The coordinator ensures that all subordinates have either committed or aborted
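The two phases can be sketched in Python as follows. This is a simplified, in-memory illustration under stated assumptions: the `Subordinate` class, its `can_commit` flag, and the list used as a write-ahead log are all hypothetical, and the sketch ignores timeouts, lost messages, and coordinator failure, which a real protocol must handle.

```python
class Subordinate:
    """One subordinate node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates whether local work succeeded
        self.log = []                 # stand-in for a write-ahead log
        self.state = "active"

    def prepare(self):
        # Phase 1: record the vote in the log before answering.
        self.log.append("ready" if self.can_commit else "not ready")
        return self.can_commit

    def commit(self):
        self.log.append("commit")
        self.state = "committed"

    def abort(self):
        self.log.append("abort")
        self.state = "aborted"


def two_phase_commit(subordinates):
    """Coordinator logic: commit everywhere or abort everywhere."""
    # Phase 1 (preparation): collect a vote from every subordinate.
    votes = [sub.prepare() for sub in subordinates]
    decision = all(votes)
    # Phase 2 (final commit): broadcast the single global decision.
    for sub in subordinates:
        if decision:
            sub.commit()
        else:
            sub.abort()
    return "committed" if decision else "aborted"


ok_nodes = [Subordinate("NY"), Subordinate("ATL")]
bad_nodes = [Subordinate("NY"), Subordinate("MIA", can_commit=False)]
print(two_phase_commit(ok_nodes))
print(two_phase_commit(bad_nodes))
```

Because no subordinate commits until every vote is in, the scenario in which one DP commits and another cannot is avoided.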

Performance Transparency
and Query Optimization
• Objective: minimize the total cost associated with the execution of a request
• Main costs
  • Access time (I/O)
  • Communication
  • CPU time

• Basis for query optimization algorithms
  • Optimum execution order
  • Sites to be accessed to minimize communication costs
• Dynamic or static optimization
• Statistically based vs. rule-based query optimization algorithms
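The cost objective can be illustrated with a small Python sketch that sums the three main costs for each candidate execution plan and picks the cheapest. The `Plan` class, the plan descriptions, and every number below are invented for illustration; a real optimizer would derive its estimates from statistics or rules.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """One candidate way to execute a request (hypothetical cost units)."""
    name: str
    access_time: float    # I/O cost
    communication: float  # cost of moving data between sites
    cpu_time: float

    @property
    def total_cost(self):
        return self.access_time + self.communication + self.cpu_time


def cheapest(plans):
    """Return the plan with the lowest total cost."""
    return min(plans, key=lambda plan: plan.total_cost)


plans = [
    Plan("ship whole fragment to the requesting site", 2.0, 9.0, 1.0),
    Plan("filter at the remote site, ship only the result", 2.0, 1.5, 1.5),
]
best = cheapest(plans)
print(best.name, best.total_cost)
```

In this made-up case the communication cost dominates, which is why the choice of sites to access matters so much in a distributed setting.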

Distributed Database Design
• Data fragmentation: how to partition the database into fragments
  • Horizontal
  • Vertical
  • Mixed
• Data replication: which fragments to replicate
  • Storage of data copies at multiple sites
  • Fully replicated, partially replicated, or unreplicated databases
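The fragmentation strategies can be shown on a few rows of the EMPLOY table. The Python sketch below uses hypothetical sample rows and treats EMP_NAME as the key that vertical fragments share, which is an assumption since the slides do not name a key.

```python
# Hypothetical sample rows; only the column names come from the EMPLOY example.
ROWS = [
    {"EMP_NAME": "Ana", "EMP_DEPARTMENT": "Sales", "EMP_SALARY": 50000},
    {"EMP_NAME": "Bo", "EMP_DEPARTMENT": "IT", "EMP_SALARY": 60000},
    {"EMP_NAME": "Cy", "EMP_DEPARTMENT": "Sales", "EMP_SALARY": 55000},
]


def horizontal(rows, predicate):
    """Horizontal fragment: a subset of the rows, all columns kept."""
    return [dict(row) for row in rows if predicate(row)]


def vertical(rows, columns, key="EMP_NAME"):
    """Vertical fragment: a subset of the columns, key repeated in each."""
    keep = [key] + [col for col in columns if col != key]
    return [{col: row[col] for col in keep} for row in rows]


def rebuild_vertical(left, right, key="EMP_NAME"):
    """Rejoin two vertical fragments on their shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


sales = horizontal(ROWS, lambda r: r["EMP_DEPARTMENT"] == "Sales")
others = horizontal(ROWS, lambda r: r["EMP_DEPARTMENT"] != "Sales")
dept_part = vertical(ROWS, ["EMP_DEPARTMENT"])
pay_part = vertical(ROWS, ["EMP_SALARY"])
# Mixed fragmentation: a horizontal fragment that is then split vertically.
sales_pay = vertical(sales, ["EMP_SALARY"])
print(sales_pay)
```

Horizontal fragments are recombined with a union of rows, while vertical fragments are recombined with a join on the shared key, as `rebuild_vertical` does.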

• Data allocation: where to locate the data
  • Centralized, partitioned, or replicated

Client/Server Advantages Over
DDBMS
 Client/server less expensive
 Client/server solutions allow use of
microcomputer’s GUI
 More people with PC skills than mainframe
skills

 PC is well established in workplace
 Numerous data analysis and query tools
exist
 Considerable cost advantages to off-loading
application development

Client/Server Disadvantages
• Creates a more complex environment with different platforms
 Increased number of users and sites creates
security problems
 Training issues become more complex and
expensive

Date’s 12 Commandments for
Distributed Databases
1. Local Site Independence
2. Central Site Independence
3. Failure Independence
4. Location Transparency
5. Fragmentation Transparency
6. Replication Transparency
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence