Professional Documents
Culture Documents
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 1
Outline
Introduction
Distributed and Parallel Database Design
Distributed Data Control
Distributed Query Processing
Distributed Transaction Processing
Data Replication
Database Integration – Multidatabase Systems
Parallel Database Systems
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 2
Outline
Introduction
What is a distributed DBMS
History
Distributed DBMS promises/benefits
Design issues/challenges/problems
Distributed DBMS architecture
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 3
Distributed Computing
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 4
Current Distribution – Geographically
Distributed Data Centers
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 5
What is a Distributed Database System?
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 6
What is not a DDBS?
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 7
Distributed DBMS Environment
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 8
Implicit Assumptions
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 9
Important Point
Logically integrated
but
Physically distributed
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 10
Outline
Introduction
What is a distributed DBMS
History
Distributed DBMS promises/benefits
Design issues/challenges/problems
Distributed DBMS architecture
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 11
History – File Systems
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 12
History – Database Management
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 13
History – Early Distribution
Peer-to-Peer (P2P)
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 14
History – Client/Server
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 15
History – Data Integration
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 16
History – Cloud Computing
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 17
Data Delivery Alternatives
Delivery modes
Pull-only
Push-only
Hybrid
Frequency
Periodic
Conditional
Ad-hoc or irregular
Communication Methods
Unicast
One-to-many
Note: not all combinations make sense
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 18
Outline
Introduction
What is a distributed DBMS
History
Distributed DBMS promises/benefits
Design issues/challenges/problems
Distributed DBMS architecture
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 19
Distributed DBMS promises/benefits
Improved performance
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez
Transparency
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez
Example
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 22
Transparent Access
Tokyo
SELECT ENAME,SAL
Boston Paris
FROM EMP,ASG,PAY
WHERE DUR > 12 Paris projects
Paris employees
AND EMP.ENO = ASG.ENO Communication Paris assignments
Network Boston employees
AND PAY.TITLE =
EMP.TITLE Boston projects
Boston employees
Boston assignments
Montreal
New
Montreal projects
York Paris projects
Boston projects New York projects
New York employees with budget > 200000
New York projects Montreal employees
New York assignments Montreal assignments
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 23
Distributed Database - User View
Distributed Database
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 24
Distributed DBMS - Reality
User
Query
User
DBMS
Application
Software
DBMS
Software
DBMS Communication
Software Subsystem
User
DBMS User Application
Software Query
DBMS
Software
User
Query
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 25
Types of Transparency
Data independence
Network transparency (or distribution transparency)
Location transparency
Fragmentation transparency
Fragmentation transparency
Replication transparency
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 26
Reliability Through Transactions
Data replication
Great for read-intensive workloads, problematic for updates
Replication protocols
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 27
Potentially Improved Performance
Parallelism in execution
Inter-query parallelism
Intra-query parallelism
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 28
Scalability
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 29
Geographic Distribution
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 30
Flexibility and Adaptability
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 31
Data Security and Privacy
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 32
Cost efficiency
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 33
Resilience to Network Failures
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 34
Outline
Introduction
What is a distributed DBMS
History
Distributed DBMS promises/benefits
Design issues/challenges/problems
Distributed DBMS architecture
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 35
Distributed DBMS
Issues/challenges/problems
Distributed database design
How to distribute the database
Replicated & non-replicated database distribution
Distributed query processing
Convert user transactions to data manipulation instructions
Optimization problem
min{cost = data transmission + local processing}
General formulation is NP-hard
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 36
Distributed DBMS
Issues/challenges/problems
Distributed concurrency control
Synchronization of concurrent accesses
Consistency and isolation of transactions' effects
Deadlock management
Reliability
How to make the system resilient to failures: consider when a
system in inoperable or inaccessible due to network failure for
instance
How to ensure database is up to date after recover?
Consistency, atomicity and durability
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 37
Distributed DBMS
Issues/challenges/problems
Distributed directory management
A directory may be global to the entire DDBS or local to each
site; it can be centralized at one site or distributed over several
sites; there can be a single copy or multiple copies
Contains information (such as descriptions and locations) about
data items in database
How to distribute the database
Replicated & non-replicated database distribution
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 38
Distributed DBMS
Issues/challenges/problems
Distributed deadlock management
Consider the competition among users for access to a set of
resources (data, in this case); this can result in a deadlock if the
synchronization mechanism is based on locking.
How to handle deadlock in DDBSs: prevention, avoidance, and
detection/recovery.
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 39
Distributed DBMS
Issues/challenges/problems
Replication
Mutual consistency
Freshness of copies
Eager vs lazy
Centralized vs distributed
Parallel DBMS
Objectives: high scalability and performance
Not geo-distributed
Cluster computing
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 40
Related Issues
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 41
Outline
Introduction
What is a distributed DBMS
History
Distributed DBMS promises/benefits
Design issues/challenges/problems
Distributed DBMS architecture
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 42
ANSI/SPARC Architecture
The ANSI/SPARC
architecture has three
views of data:
the external view,
which is that of the
end user, who might
be a programmer;
the internal view, that
of the system or
machine: physical
definition and
organization of data.
the conceptual view,
that of the enterprise
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 43
ANSI/SPARC Architecture
The ANSI/SPARC
architecture has three
views of data:
the external view,
which is that of the
end user, who might
be a programmer;
the internal view, that
of the system or
machine: physical
definition and
organization of data.
the conceptual view,
that of the enterprise
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 44
DBMS Implementation Alternatives
Architectural Models for Distributed DBMSs
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 45
DBMS Implementation Alternatives
Architectural Models for Distributed DBMSs: Dimensions of the Problem
Distribution
Whether the components of the system are located on the same machine or not
Heterogeneity
Various levels (hardware, communications, operating system)
DBMS important one
data model, query language, transaction management algorithms
Autonomy
Refers to distribution of control
Various versions
Design autonomy: Ability of a component DBMS to decide on issues related to its own
design.
Communication autonomy: Ability of a component DBMS to decide whether and how
to communicate with other DBMSs.
Execution autonomy: Ability of a component DBMS to execute local operations in any
manner it wants to.
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 46
Client/Server Architecture
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 47
Advantages of Client-Server
Architectures
More efficient division of labor
Horizontal and vertical scaling of resources
Better price/performance on client machines
Ability to use familiar tools on client machines
Client access to remote data (via standards)
Full DBMS functionality provided to client workstations
Overall better system price/performance
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 48
Database Server
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 49
Distributed Database Servers
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 50
Peer-to-Peer Component Architecture
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 51
MDBS Components & Execution
A multidatabase system, also known as a federated database system,
is a type of database management system (DBMS) that integrates
multiple autonomous databases into a single logical database.
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 52
Mediator/Wrapper Architecture
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 53
Cloud Computing
PaaS – Platform-as-a-Service
SaaS – Software-as-a-Service
DaaS – Database-as-a-Service
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 54
Simplified Cloud Architecture
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 55
Distributed computing Individual exercise
a. Define the concept of distributed database management
systems (DDBMS) and explain its significance in
modern information systems.
b. Discuss the key differences between a centralized
DBMS and a distributed DBMS.
c. a. Define a distributed database management system
(DDBMS) and explain how it differs from a traditional
centralized database management system.
d. Discuss the benefits and challenges associated with
implementing a distributed DBMS in organizations.
e. Identify and discuss the key design issues and
challenges involved in building a distributed database
management system.
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 56
Distributed computing Individual exercise
f. Using books and the Internet resources, explain how
design issues such as data distribution, concurrency
control, data replication, query optimization, and
security are addressed in distributed DBMS
architectures.
g. Describe the architecture of a distributed database
management system, including its components, layers,
and functionalities. Compare it with the multidatabase
systems integration
h. Discuss the role of distributed data storage, query
processing, concurrency control mechanisms,
transaction management, and data replication in
distributed DBMS architectures.
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 57
Distributed computing Individual exercise
i. Using books and the Internet resources, expound on the
following characteristics of the distributed database
systems. NB: characteristics define the inherent
qualities or attributes of distributed database systems
Lecture slides as adapted and customized from © 2020, M.T. Özsu & P. Valduriez 58