
Distributed Database Management Systems

Dated: 05.03.2017
Lecture Outline
• Introduction
– Motivation
– Distributed database system
– Distributed database system promises
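Traditional File Processing
Figure: each program carries its own data description and works on its own file.
• program 1, with data description 1, works on File 1
• program 2, with data description 2, works on File 2
• program 3, with data description 3, works on File 3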
Database Management System
Figure: application programs 1, 2, and 3 (each with data semantics) access one shared database through the DBMS, which provides data description, manipulation, and control.
Database vs. File-based Approach
The database approach is preferred over the traditional file-based
approach for the following key reasons:

• Self-describing nature of a database system
• Insulation between program and data, and data abstraction
• Support of multiple views of data
• Sharing of data and multiuser transaction processing
Distributed Database System (DDBS)
• Distributed Database (DDB):
– A distributed database (DDB) is a collection of multiple, logically
interrelated databases distributed over a computer network.

• Distributed Database Management System (DDBMS):
– A distributed database management system (DDBMS) is the software that
manages the DDB and provides an access mechanism that makes this
distribution transparent to the users.

• Distributed Database System (DDBS):
– Distributed database system (DDBS) = DB + DBMS + Communication
What is not Distributed Database System
• A database system which resides at one of the nodes of a
network of computers - this is a centralized database on a
network node.

Figure: five sites (Site 1 to Site 5) connected by a communication network; the database resides at only one of them.
Distributed DBMS Environment

Figure: five sites (Site 1 to Site 5) connected by a communication network.
Applications

• Manufacturing, especially multi-plant manufacturing
• Military command and control
• Electronic fund transfers and electronic trading
• Corporate MIS
• Airline reservations
• Hotel chains
• Any organization which has a decentralized
organization structure
Distributed DBMS Promises
1. Transparent management of distributed,
fragmented, and replicated data
2. Improved reliability/availability through
distributed transactions
3. Improved performance
4. Easier and more economical system expansion
Transparency
• Transparency is the separation of the higher-level
semantics of a system from the lower-level
implementation issues.
Example
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE

EMP
ENO  ENAME      TITLE
E1   J. Doe     Elect. Eng.
E2   M. Smith   Syst. Anal.
E3   A. Lee     Mech. Eng.
E4   J. Miller  Programmer
E5   B. Casey   Syst. Anal.
E6   L. Chu     Elect. Eng.
E7   R. Davis   Mech. Eng.
E8   J. Jones   Syst. Anal.

ASG
ENO  PNO  RESP        DUR
E1   P1   Manager     12
E2   P1   Analyst     24
E2   P2   Analyst     6
E3   P3   Consultant  10
E3   P4   Engineer    48
E4   P2   Programmer  18
E5   P2   Manager     24
E6   P4   Manager     48
E7   P3   Engineer    36
E7   P5   Engineer    23
E8   P3   Manager     40

PROJ
PNO  PNAME              BUDGET
P1   Instrumentation    150000
P2   Database Develop.  135000
P3   CAD/CAM            250000
P4   Maintenance        310000

PAY
TITLE        SAL
Elect. Eng.  40000
Syst. Anal.  34000
Mech. Eng.   27000
Programmer   24000
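As a rough check of what the example query returns, the sketch below loads the EMP, ASG, and PAY rows from this slide into an in-memory SQLite database and runs the query unchanged. SQLite and the Python variable names are assumptions made for illustration; the slides do not name a particular DBMS.

```python
import sqlite3

# In-memory database standing in for a single-site copy of the example data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT);
CREATE TABLE ASG (ENO TEXT, PNO TEXT, RESP TEXT, DUR INTEGER);
CREATE TABLE PAY (TITLE TEXT, SAL INTEGER);
""")

conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", [
    ("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."), ("E4", "J. Miller", "Programmer"),
    ("E5", "B. Casey", "Syst. Anal."), ("E6", "L. Chu", "Elect. Eng."),
    ("E7", "R. Davis", "Mech. Eng."), ("E8", "J. Jones", "Syst. Anal."),
])
conn.executemany("INSERT INTO ASG VALUES (?, ?, ?, ?)", [
    ("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24),
    ("E2", "P2", "Analyst", 6), ("E3", "P3", "Consultant", 10),
    ("E3", "P4", "Engineer", 48), ("E4", "P2", "Programmer", 18),
    ("E5", "P2", "Manager", 24), ("E6", "P4", "Manager", 48),
    ("E7", "P3", "Engineer", 36), ("E7", "P5", "Engineer", 23),
    ("E8", "P3", "Manager", 40),
])
conn.executemany("INSERT INTO PAY VALUES (?, ?)", [
    ("Elect. Eng.", 40000), ("Syst. Anal.", 34000),
    ("Mech. Eng.", 27000), ("Programmer", 24000),
])

# The query from the slide: names and salaries of employees who have
# an assignment lasting longer than 12 months.
rows = conn.execute("""
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
""").fetchall()

result = sorted(rows)  # sorted only to make the output order predictable
for name, salary in result:
    print(name, salary)
```

The query returns one row per qualifying assignment, so an employee with two assignments longer than 12 months (R. Davis here) appears twice.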
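Transparent Access
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE

Figure: the query is issued against a database spread over the sites Tokyo, Karachi, Paris, Nawabshah (N.Shah), and New York, which are connected by a communication network. The data items shown at the sites are labeled: Paris projects, Paris employees, Paris assignments; Karachi projects, Karachi employees, Karachi assignments; Nawabshah projects, Nawabshah employees, Nawabshah assignments; New York projects, New York employees, New York assignments; and New York projects with budget > 200000. Some labels (for example Karachi employees, Paris projects, and Karachi projects) appear more than once in the figure.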
Transparency
• Transparency involves:
1. Data independence
• Logical
• Physical

2. Network (distribution) transparency

3. Replication transparency

4. Fragmentation transparency
• horizontal fragmentation: selection
• vertical fragmentation: projection
• hybrid
Transparency
• Data independence (Data transparency):
– It refers to the immunity of user applications to
changes in the definition and organization of data,
and vice versa.

– Logical data independence:
• refers to the immunity of user applications to changes in
the logical structure of data.
• Applications should keep running if additional attributes
are added to a relation.

– Physical data independence:
• deals with hiding the details of the storage structure from
user applications.
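Logical data independence can be illustrated with a small sketch: an application that names only the attributes it needs keeps working after a new attribute is added to the relation. SQLite, the `employee_names` function, and the `PHONE` attribute are assumptions introduced for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT)")
conn.execute("INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.')")


def employee_names(db):
    """Application code: it names only the attribute it needs."""
    return [name for (name,) in db.execute("SELECT ENAME FROM EMP ORDER BY ENO")]


before = employee_names(conn)

# Logical schema change: a new attribute is added to the relation.
# PHONE is a hypothetical attribute, not part of the original example.
conn.execute("ALTER TABLE EMP ADD COLUMN PHONE TEXT")

after = employee_names(conn)
print(before == after)  # the application is unaffected by the change
```

An application written as `SELECT *` with positional access to the columns would be more exposed to such a change, which is one reason to name attributes explicitly.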
Transparency
• Network transparency:
– User does not need to know the operational
details of the network:
• Service transparency
• Location transparency
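One way to picture location transparency is a directory (catalog) that maps each relation name to the site that stores it, so that application code never names a site. The sketch below is a toy model under that assumption; `SITES`, `DIRECTORY`, `scan`, and `move` are hypothetical names, and the data placement is made up.

```python
# Hypothetical sites, each holding some relations locally.
SITES = {
    "paris": {"EMP": [("E1", "J. Doe")]},
    "karachi": {"PROJ": [("P1", "Instrumentation")]},
}

# Directory: relation name -> site that currently stores it.
DIRECTORY = {"EMP": "paris", "PROJ": "karachi"}


def scan(relation):
    """What the user calls: only the relation name, never a site."""
    site = DIRECTORY[relation]
    return list(SITES[site][relation])


def move(relation, new_site):
    """Administrative operation: relocate a relation and update the directory."""
    old_site = DIRECTORY[relation]
    SITES[new_site][relation] = SITES[old_site].pop(relation)
    DIRECTORY[relation] = new_site


rows_before = scan("EMP")
move("EMP", "karachi")      # the data now lives at another site
rows_after = scan("EMP")    # the user's call is unchanged
print(rows_before == rows_after)
```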
Transparency
• Replication:
– It refers to keeping multiple copies of the same data, which
helps to improve performance, reliability, and availability
of the system across the network.

• Replication transparency:
– The user does not need to know about the existence of
copies, their management, or their location.
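A minimal sketch of replication transparency follows, assuming a simple read-one/write-all policy (one possible choice; the slides do not prescribe a policy). The user calls `read` and `write` on a logical item and never sees the copies. The class name `ReplicatedStore` and the site names are hypothetical, and handling of failed sites during writes is deliberately left out.

```python
class ReplicatedStore:
    """Toy key-value store that keeps one copy of the data per site."""

    def __init__(self, site_names):
        self.copies = {name: {} for name in site_names}
        self.down = set()  # names of sites that are currently unreachable

    def write(self, key, value):
        # Write-all: every copy receives the update.
        # Simplification: this sketch does not deal with sites that are down.
        for copy in self.copies.values():
            copy[key] = value

    def read(self, key):
        # Read-one: the first reachable copy answers the request.
        for name, copy in self.copies.items():
            if name not in self.down:
                return copy[key]
        raise RuntimeError("no replica available")


store = ReplicatedStore(["paris", "karachi"])
store.write("E1", "J. Doe")
store.down.add("paris")    # one site fails
print(store.read("E1"))    # the read is still answered by another copy
```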
Transparency
• Fragmentation:
– It refers to dividing database relations into smaller
fragments and treating each fragment as a separate database
object (i.e., another relation). It helps performance,
availability, and reliability of the system.

– Fragmentation can reduce the negative effects of replication

• Fragmentation types:
• horizontal fragmentation: selection
• vertical fragmentation: projection
• hybrid
Fragmentation Transparency - Horizontal
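As an illustration of horizontal fragmentation, the sketch below splits the PROJ relation from the earlier example by a selection on BUDGET and rebuilds it with a union. The threshold of 200000 echoes the "projects with budget > 200000" label used in these slides; the fragment names `PROJ1` and `PROJ2` and the use of SQLite are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PROJ (PNO TEXT, PNAME TEXT, BUDGET INTEGER)")
conn.executemany("INSERT INTO PROJ VALUES (?, ?, ?)", [
    ("P1", "Instrumentation", 150000),
    ("P2", "Database Develop.", 135000),
    ("P3", "CAD/CAM", 250000),
    ("P4", "Maintenance", 310000),
])

# Horizontal fragments: each is a selection (a subset of the rows).
conn.execute("CREATE TABLE PROJ1 AS SELECT * FROM PROJ WHERE BUDGET <= 200000")
conn.execute("CREATE TABLE PROJ2 AS SELECT * FROM PROJ WHERE BUDGET > 200000")

# Reconstruction: the union of the fragments gives back the relation.
rebuilt = conn.execute(
    "SELECT * FROM PROJ1 UNION SELECT * FROM PROJ2 ORDER BY 1"
).fetchall()
original = conn.execute("SELECT * FROM PROJ ORDER BY 1").fetchall()

frag1 = [pno for (pno,) in conn.execute("SELECT PNO FROM PROJ1 ORDER BY 1")]
frag2 = [pno for (pno,) in conn.execute("SELECT PNO FROM PROJ2 ORDER BY 1")]
print(frag1, frag2, rebuilt == original)
```

With fragmentation transparency, a user would keep querying PROJ, and the system would translate the query into queries on PROJ1 and PROJ2.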
Fragmentation Transparency - Vertical
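Vertical fragmentation can be sketched in the same way: each fragment is a projection that keeps the key, and the relation is rebuilt by joining the fragments on that key. The split of PROJ into `PROJ_A` (PNO, BUDGET) and `PROJ_B` (PNO, PNAME) is one possible choice made for illustration, as is the use of SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PROJ (PNO TEXT, PNAME TEXT, BUDGET INTEGER)")
conn.executemany("INSERT INTO PROJ VALUES (?, ?, ?)", [
    ("P1", "Instrumentation", 150000),
    ("P2", "Database Develop.", 135000),
    ("P3", "CAD/CAM", 250000),
    ("P4", "Maintenance", 310000),
])

# Vertical fragments: each is a projection (a subset of the columns).
# Both keep the key PNO so that the relation can be rebuilt.
conn.execute("CREATE TABLE PROJ_A AS SELECT PNO, BUDGET FROM PROJ")
conn.execute("CREATE TABLE PROJ_B AS SELECT PNO, PNAME FROM PROJ")

# Reconstruction: join the fragments on the shared key.
rebuilt = conn.execute("""
    SELECT B.PNO, B.PNAME, A.BUDGET
    FROM PROJ_A AS A JOIN PROJ_B AS B ON A.PNO = B.PNO
    ORDER BY 1
""").fetchall()
original = conn.execute("SELECT PNO, PNAME, BUDGET FROM PROJ ORDER BY 1").fetchall()

columns_a = [row[1] for row in conn.execute("PRAGMA table_info(PROJ_A)")]
columns_b = [row[1] for row in conn.execute("PRAGMA table_info(PROJ_B)")]
print(columns_a, columns_b, rebuilt == original)
```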
Exercise: Fragment the following EMP table horizontally and vertically, and write the equivalent SQL query for each.
Layers of Transparency
Who Should Provide Transparency?
• Application
– Applications or application modules are implemented in a
distributed fashion, with communication and data exchange via
standard protocols (RPC, CORBA, HTTP, ...)
• Operating system
– Realizes network transparency, e.g., on the system level (NFS) or
protocol level
• Database system
– Transparent access to data at remote database instances
– Requires splitting queries, transaction control, replication
Reliability Through Distributed Transactions
• Reliability:
– Allows correct operation even in case of failures
– Is achieved by data copies (replicas) on remote
sites
– Correct operations are achieved by transferring
one consistent database state into another
consistent database state
• Example: Increasing the salaries of all employees in
a distributed environment by 10%.
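The salary example can be sketched as an all-or-nothing update across sites. The code below uses a heavily simplified two-phase commit (a technique chosen here for illustration; the slides do not name a protocol): every site first prepares the raise, and the change becomes visible only if all sites vote yes. The `Site` class, `raise_salaries`, and the placement of data are hypothetical, and real protocols also need logging, timeouts, and recovery.

```python
class Site:
    """Toy site holding a fragment of the salary data."""

    def __init__(self, name, salaries, healthy=True):
        self.name = name
        self.salaries = dict(salaries)
        self.healthy = healthy
        self._staged = None  # prepared but not yet committed state

    def prepare(self, percent):
        """Phase 1: compute the new state and vote yes (True) or no (False)."""
        if not self.healthy:
            return False
        self._staged = {
            title: salary * (100 + percent) // 100
            for title, salary in self.salaries.items()
        }
        return True

    def commit(self):
        """Phase 2 (all voted yes): make the prepared state visible."""
        self.salaries, self._staged = self._staged, None

    def abort(self):
        """Phase 2 (someone voted no): discard the prepared state."""
        self._staged = None


def raise_salaries(sites, percent):
    """Apply the raise at every site, or at none of them."""
    votes = [site.prepare(percent) for site in sites]
    if all(votes):
        for site in sites:
            site.commit()
        return True
    for site in sites:
        site.abort()
    return False


paris = Site("paris", {"Elect. Eng.": 40000, "Syst. Anal.": 34000})
karachi = Site("karachi", {"Mech. Eng.": 27000, "Programmer": 24000})
print(raise_salaries([paris, karachi], 10), paris.salaries, karachi.salaries)
```

If any site cannot prepare, no site applies the raise, so the database moves from one consistent state to another or stays where it was.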
Improved Performance
• Improved performance:
– Is achieved using fragmentation and parallelism
• Improved performance using fragmentation:
– Fragmenting the conceptual database in a way that enables
data to be stored in close proximity to its points of use
which ultimately reduces transfer costs and delays
• Improved performance using parallelism:
– Inter-query parallelism:
• execution of multiple queries at the same time
– Intra-query parallelism:
• parallel execution of sub-queries at different sites accessing a
different part of the distributed database
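Intra-query parallelism can be sketched by running the same sub-query on each fragment at the same time and combining the partial results. In the sketch below, threads stand in for sites, and the split of the ASG rows (ENO, PNO, DUR) into three fragments is a made-up placement; `FRAGMENTS` and `long_assignments` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placement of ASG rows (ENO, PNO, DUR) at three sites.
FRAGMENTS = {
    "site1": [("E1", "P1", 12), ("E2", "P1", 24), ("E2", "P2", 6), ("E3", "P3", 10)],
    "site2": [("E3", "P4", 48), ("E4", "P2", 18), ("E5", "P2", 24), ("E6", "P4", 48)],
    "site3": [("E7", "P3", 36), ("E7", "P5", 23), ("E8", "P3", 40)],
}


def long_assignments(fragment):
    """Sub-query executed locally at one site: the selection DUR > 12."""
    return [row for row in fragment if row[2] > 12]


# Run the sub-query on all fragments in parallel, one worker per site.
with ThreadPoolExecutor(max_workers=len(FRAGMENTS)) as pool:
    partials = list(pool.map(long_assignments, FRAGMENTS.values()))

# Combine the partial results into the answer to the global query.
result = [row for part in partials for row in part]
print(len(result))
```

Inter-query parallelism would instead submit several independent queries to the pool at once.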
Easier System Expansion

• Necessity of increasing database size and/or decreasing query
execution time
• Expansion by adding additional storage and processing power
to the network
• A system of smaller computers is often cheaper than a single big
machine with the equivalent power
Challenges/Issues
• Distributed database design
– Fragmentation, replication, and distribution

• Distributed query processing
– Executing a query over the network in the most cost-effective way

• Distributed concurrency control
– Synchronizing access such that integrity is maintained

• Reliability of distributed DBMS
– Ensure consistency, detect failures, and recover from failures

• Heterogeneous databases
Relationship Among Challenges/Issues

Figure: relationships among directory management, query processing, distribution design, reliability, concurrency control, and deadlock management.
