
Advanced Database Management System

DISTRIBUTED DATABASE

Module 1
TOPICS
Distributed Databases:
 Introduction

 Distributed DBMS Architecture

 Data Fragmentation

 Replication and Allocation Techniques for Distributed
Database Design
INTRODUCTION TO ADVANCED
DATABASE MANAGEMENT SYSTEM.
 An advanced database management system (DBMS) is a
powerful software application that efficiently manages
large amounts of data. It offers features like scalability,
performance optimization, data security, advanced
querying, analytics, and integration with other systems.
 It supports various data models, ensures data integrity,
and enables data replication and availability. Advanced
DBMS facilitates data warehousing, cloud computing,
and distributed computing, providing organizations with
robust data management capabilities.
FEATURES
 Advanced DBMS supports various data models such as relational,
object-oriented, and hierarchical models.
 It provides mechanisms for handling and managing large-scale
databases.
 Performance optimization techniques like indexing, query optimization,
and caching are incorporated.
 Robust security mechanisms protect sensitive data from unauthorized
access.
 Data replication ensures high availability and fault tolerance.

 Advanced querying capabilities include complex joins, subqueries,
aggregations, and window functions.
 Concurrency control mechanisms handle concurrent access to the
database.
 Integration with other systems and data sources allows seamless data
exchange.
DISTRIBUTED DATABASE SYSTEM
A distributed database is a database
that is not limited to one system; it
is spread over different sites, i.e.,
over multiple computers connected by
a network. The sites of a distributed
database system do not share
physical components. This arrangement
may be required when a particular
database needs to be accessed by users
around the world. It must be managed
so that, to its users, it looks like
one single database.
TYPES
1. Homogeneous Database:
In a homogeneous database, all sites store
the database identically. The operating
system, the database management system, and
the data structures used are the same at all
sites. Hence, such databases are easy to manage.

2. Heterogeneous Database:
In a heterogeneous distributed database,
different sites can use different schemas and
software, which can lead to problems in query
processing and transactions. Also, a particular
site might be completely unaware of the other
sites. Different computers may use different
operating systems and different database
applications. They may even use different data
models for the database.
DIFFERENCE BETWEEN CENTRALIZED
DATABASE AND DISTRIBUTED DATABASE:
Distributed database Centralized Database
CONCEPT OF DISTRIBUTED
COMPUTING SYSTEM
Distributed computing refers to a system where processing and data storage are
distributed across multiple devices or systems, rather than being handled by a
single central device. In a distributed system, each device or system has its
own processing capabilities and may also store and manage its own data.
These devices or systems work together to perform tasks and share resources,
with no single device serving as the central hub.
Components
There are several key components of a distributed computing system:
 Devices or Systems: The devices or systems in a distributed system have
their own processing capabilities and may also store and manage their own
data.
 Network: The network connects the devices or systems in the distributed
system, allowing them to communicate and exchange data.
 Resource Management: Distributed systems often have some type of
resource management system in place to allocate and manage shared
resources such as computing power, storage, and networking.
ADVANTAGES OF DISTRIBUTED
DATABASES
 1) Data processing is fast because several sites participate in request processing.
 2) Reliability and availability of the system are high.
 3) It has reduced operating costs.
 4) It is easier to expand the system by adding more sites.
 5) It has improved sharing ability and local autonomy.

DISADVANTAGES OF DISTRIBUTED DATABASES


 1) The system becomes complex to manage and control.
 2) Security issues must be carefully managed.
 3) The system requires deadlock handling during transaction
processing; otherwise the entire system may be left in an inconsistent state.
 4) Some standardization is needed for processing in a
distributed database system.
APPLICATIONS OF DISTRIBUTED
DATABASE:
 It is used in corporate management information systems.
 It is used in multimedia applications.
 It is used in military control systems, hotel chains, etc.
 It is also used in manufacturing control systems.

DISTRIBUTED DATABASE SYSTEM ARCHITECTURES

 Client-Server Architecture for DDBMS
 Peer-to-Peer Architecture for DDBMS
 Multi-DBMS Architecture
FUNCTIONS
 Data distribution: One of the primary functions of a distributed database system is to
distribute data across multiple sites. This is done to ensure that data is stored closer to
where it is needed and to reduce the amount of data that needs to be transferred over
the network.
 Data replication: In a distributed database system, data can be replicated across
multiple sites. Replication can improve system availability and reliability by ensuring
that data is available even if one of the sites fails.
 Data fragmentation: Data fragmentation involves breaking down a large database
into smaller fragments and distributing them across multiple sites. This can help
improve system performance by reducing the amount of data that needs to be
transferred over the network.
 Query processing: Query processing involves processing user queries and retrieving
data from the distributed database system. This is a complex task as data may be
stored across multiple sites and may need to be combined to answer user queries.
 Transaction management: In a distributed database system, transactions may span
multiple sites. Transaction management involves coordinating these transactions and
ensuring that they are executed correctly and efficiently.
 Security and access control: In a distributed database system, it is important to
ensure that data is secure and that access to it is controlled. This involves
implementing appropriate security measures and access control mechanisms to
protect data from unauthorized access or modification.
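The query processing function above can be illustrated with a small scatter-gather sketch: the same selection is run at every site, and only the matching rows are merged at the coordinator. The site names, the `orders` rows, and the helper functions are hypothetical and exist only for this example; a real DDBMS would also choose an execution plan based on cost.

```python
# Hypothetical sites, each holding one fragment of an "orders" relation.
SITES = {
    "site_a": [{"order_id": 1, "region": "EU", "amount": 120},
               {"order_id": 2, "region": "EU", "amount": 80}],
    "site_b": [{"order_id": 3, "region": "US", "amount": 200}],
    "site_c": [{"order_id": 4, "region": "ASIA", "amount": 50},
               {"order_id": 5, "region": "US", "amount": 75}],
}


def local_query(rows, min_amount):
    """Run the selection at one site so only matching rows cross the network."""
    return [r for r in rows if r["amount"] >= min_amount]


def distributed_query(sites, min_amount):
    """Send the selection to every site, then merge the partial results."""
    partial_results = [local_query(rows, min_amount) for rows in sites.values()]
    merged = [row for part in partial_results for row in part]
    return sorted(merged, key=lambda r: r["order_id"])


result = distributed_query(SITES, min_amount=100)
print([r["order_id"] for r in result])
```

Filtering at each site before merging reflects the goal stated above of reducing the amount of data transferred over the network.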
FEATURES
 Data partitioning: A distributed database system can partition data across
multiple nodes or servers to improve scalability and performance.
 Data replication: A distributed database system can replicate data across
multiple nodes or servers to improve fault tolerance and availability.
 Distributed query processing: A distributed database system can perform
queries across multiple nodes or servers to improve performance and
efficiency.
 Distributed transaction processing: A distributed database system can
support transactions that span multiple nodes or servers, while ensuring data
consistency and integrity.
 Consensus protocols: A distributed database system can use consensus
protocols, such as Paxos or Raft, to ensure agreement and coordination
among different nodes or servers.
 Distributed concurrency control mechanisms: A distributed database system can
use concurrency control techniques, such as two-phase locking or timestamp
ordering, to ensure data consistency and avoid conflicts.
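As a minimal sketch of the data partitioning feature, the snippet below assigns each key to a node by hashing it. The node names and the modulo-based placement rule are assumptions chosen for illustration; the slides do not prescribe a particular partitioning scheme.

```python
import hashlib

NODES = ["node0", "node1", "node2"]  # hypothetical node names


def node_for_key(key, nodes=NODES):
    """Map a key to a node using a stable hash, so every site computes the same answer."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def partition(keys, nodes=NODES):
    """Group keys by the node that is responsible for them."""
    placement = {n: [] for n in nodes}
    for k in keys:
        placement[node_for_key(k, nodes)].append(k)
    return placement


placement = partition(["cust-1", "cust-2", "cust-3", "cust-4"])
for node, keys in placement.items():
    print(node, keys)
```

A simple modulo scheme like this moves most keys when the number of nodes changes, which is one reason production systems often prefer other placement rules.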
DESIGN ISSUES
 Distributed Database Design
One of the main questions addressed is how the database and the applications that run against it should be
placed across the sites.
 Distributed Directory Management
A directory contains information (such as descriptions and locations) about data items in the database. Problems
related to directory management are similar in nature to the database placement problem discussed in the preceding
item.
 Distributed Query Processing
Query processing deals with designing algorithms that analyze queries and convert them into a series of data
manipulation operations. The problem is how to decide on a strategy for executing each query over the network in the
most cost-effective way, however cost is defined.
 Distributed Concurrency Control
Concurrency control involves the synchronization of access to the distributed database, such that the integrity of the
database is maintained. It is, without any doubt, one of the most extensively studied problems in the DDBS field.
 Distributed Deadlock Management
The deadlock problem in DDBSs is similar in nature to that encountered in operating systems. The competition among
users for access to a set of resources (data, in this case) can result in a deadlock if the synchronization mechanism
is based on locking. The well-known alternatives of prevention, avoidance, and detection/recovery also apply to DDBSs.
 Reliability of Distributed DBMS
It is important that mechanisms be provided to ensure the consistency of the database as well as to detect failures and
recover from them. The implication for DDBSs is that when a failure occurs and various sites become either inoperable
or inaccessible, the databases at the operational sites remain consistent and up to date.
 Replication
If the distributed database is (partially or fully) replicated, it is necessary to implement protocols that ensure the
consistency of the replicas, i.e. copies of the same data item have the same value.
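For the deadlock management issue listed above, one common detection technique (not named in these slides, so treat it as an assumption) is to merge the wait-for graphs of all sites and look for a cycle. In the sketch below, the transaction names and the two site graphs are hypothetical.

```python
def global_wait_for(local_graphs):
    """Merge per-site wait-for graphs into one global graph."""
    merged = {}
    for graph in local_graphs:
        for txn, waits_on in graph.items():
            merged.setdefault(txn, set()).update(waits_on)
    return merged


def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    nodes = set(wait_for) | {t for ts in wait_for.values() for t in ts}
    colour = {n: WHITE for n in nodes}

    def visit(txn):
        colour[txn] = GREY
        for nxt in wait_for.get(txn, ()):
            if colour[nxt] == GREY:  # back edge: a cycle exists
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[txn] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)


# T1 waits for T2 at site 1, and T2 waits for T1 at site 2.
site1 = {"T1": {"T2"}}
site2 = {"T2": {"T1"}}
print(has_deadlock(site1), has_deadlock(site2))
print(has_deadlock(global_wait_for([site1, site2])))
```

Neither site sees a cycle on its own; the deadlock only becomes visible in the merged graph, which is what makes the distributed case harder than the centralized one.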
DISTRIBUTED DBMS ARCHITECTURE
DDBMS architectures are generally developed depending on three
parameters:
 Distribution − It states the physical distribution of data across the different
sites.
 Autonomy − It indicates the distribution of control of the database system
and the degree to which each constituent DBMS can operate independently.
 Heterogeneity − It refers to the uniformity or dissimilarity of the data
models, system components and databases.
Common architectural models are:
 Client-server architecture: In this architecture, clients connect to a central
server, which manages the distributed database system. The server is
responsible for coordinating transactions, managing data storage, and
providing access control.
 Peer-to-peer architecture: In this architecture, each site in the distributed
database system is connected to all other sites. Each site is responsible for
managing its own data and coordinating transactions with other sites.
 Federated architecture: In this architecture, each site in the distributed
database system maintains its own independent database, but the databases
are integrated through a middleware layer that provides a common interface
for accessing and querying the data.
CLIENT - SERVER ARCHITECTURE FOR
DDBMS
 Single Server Multiple Client
 Multiple Server Multiple Client
PEER- TO-PEER ARCHITECTURE FOR DDBMS
 In these systems, each peer acts
both as a client and a server for
providing database services. The
peers share their resources with
other peers and coordinate their
activities.
 This architecture generally has
four levels of schemas −
 Global Conceptual Schema −
Depicts the global logical view of
data.
 Local Conceptual Schema −
Depicts logical data organization
at each site.
 Local Internal Schema −
Depicts physical data
organization at each site.
 External Schema − Depicts user
view of data.
MULTI - DBMS ARCHITECTURES
This is an integrated database
system formed by a collection of
two or more autonomous database
systems.
Multi-DBMS can be expressed through six levels of
schemas −
 Multi-database View Level − Depicts multiple user
views comprising subsets of the integrated distributed
database.
 Multi-database Conceptual Level − Depicts the integrated
multi-database, which comprises global logical
multi-database structure definitions.
 Multi-database Internal Level − Depicts the data
distribution across different sites and multi-database to
local data mapping.
 Local database View Level − Depicts public view of
local data.
 Local database Conceptual Level − Depicts local data
organization at each site.
 Local database Internal Level − Depicts physical data
organization at each site.
DATA ALLOCATION
Data Allocation is the intelligent distribution of pieces of
your data (called data fragments) to improve database
performance and data availability for end users. It aims to
reduce the overall cost of transaction processing while also
providing accurate data rapidly in your DDBMS.
Data Allocation is one of the key steps in building a
Distributed Database System.
There are two common strategies used in optimal Data
Allocation: Data Fragmentation and Data Replication.
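The idea of allocating fragments to reduce processing cost can be sketched with a very simple rule: place each fragment at the site that accesses it most, so that the fewest accesses cross the network. The fragment names, site names, and access counts below are invented for illustration, and real allocation models also weigh update cost, storage, and replication.

```python
# Hypothetical access counts: how often each site reads or updates each fragment.
ACCESS_FREQ = {
    "customers_eu": {"paris": 900, "new_york": 50, "tokyo": 20},
    "customers_us": {"paris": 40, "new_york": 700, "tokyo": 60},
}


def remote_accesses(freq_by_site, host):
    """Count accesses that must cross the network if the fragment lives at `host`."""
    return sum(n for site, n in freq_by_site.items() if site != host)


def allocate(access_freq):
    """Place each fragment at the site that minimises remote accesses."""
    return {
        fragment: min(freq, key=lambda site: remote_accesses(freq, site))
        for fragment, freq in access_freq.items()
    }


allocation = allocate(ACCESS_FREQ)
print(allocation)
```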
DISTRIBUTED DATA STORAGE 
There are 2 ways in which data can be stored on different sites:
1. Replication –
In this approach, the entire relation is stored redundantly
at 2 or more sites. If the entire database is available at all sites,
it is a fully redundant database. Hence, in replication, systems
maintain copies of data.
2. Fragmentation –
In this approach, the relations are fragmented (i.e., they’re
divided into smaller parts) and each of the fragments is stored
at the site where it is required. It must be ensured
that the fragments can be used to reconstruct
the original relation (i.e., there isn’t any loss of data).
DATA FRAGMENTATION
In fragmentation, the relations are divided into smaller portions
(fragments), and each fragment is stored at the site where it is
needed. It must be ensured that the fragments can be
used to reconstruct the original relation (i.e., that no data is lost).
Fragmentation is useful because it does not create copies of data,
so keeping copies consistent is not an issue.
 Horizontal fragmentation (splitting by rows): The relation is
broken into groups of tuples, and each tuple is assigned to at
least one fragment.
 Vertical fragmentation (splitting by columns): The relation’s
schema is broken into smaller schemas. To allow a lossless join,
every fragment must contain a common candidate key.
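Both kinds of fragmentation, and the requirement that the original relation can be reconstructed, can be shown on a small in-memory relation. The `EMPLOYEE` rows and the helper names below are hypothetical; `emp_id` plays the role of the shared candidate key.

```python
EMPLOYEE = [
    {"emp_id": 1, "name": "Asha", "dept": "Sales", "salary": 50000},
    {"emp_id": 2, "name": "Ben", "dept": "HR", "salary": 45000},
    {"emp_id": 3, "name": "Chen", "dept": "Sales", "salary": 52000},
]


def horizontal_fragment(rows, predicate):
    """Keep whole tuples that satisfy the predicate (split by rows)."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows, key, columns):
    """Keep only some columns, always including the key (split by columns)."""
    return [{key: r[key], **{c: r[c] for c in columns}} for r in rows]


# Horizontal: the fragments are rebuilt with a union.
sales = horizontal_fragment(EMPLOYEE, lambda r: r["dept"] == "Sales")
others = horizontal_fragment(EMPLOYEE, lambda r: r["dept"] != "Sales")
rebuilt_h = sorted(sales + others, key=lambda r: r["emp_id"])

# Vertical: the fragments are rebuilt with a join on the shared key.
public = vertical_fragment(EMPLOYEE, "emp_id", ["name", "dept"])
payroll = vertical_fragment(EMPLOYEE, "emp_id", ["salary"])
payroll_by_id = {r["emp_id"]: r for r in payroll}
rebuilt_v = [{**p, **payroll_by_id[p["emp_id"]]} for p in public]

print(rebuilt_h == EMPLOYEE, rebuilt_v == EMPLOYEE)
```

If `emp_id` were left out of either vertical fragment, the join could no longer match rows, which is why the shared key is required.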
REPLICATION AND ALLOCATION TECHNIQUES
FOR DISTRIBUTED DATABASE DESIGN
Data Replication is the process of storing data in more than one site or node. It is useful in improving
the availability of data. It simply means copying data from a database on one server to another server so
that all users can share the same data without any inconsistency.
The result is a distributed database in which users can access data relevant to their tasks without
interfering with the work of others. Data replication encompasses duplication of transactions on an
ongoing basis, so that the replica is kept in a consistently updated state and synchronized with the source.
 Transactional Replication – In transactional replication, users receive full initial copies of the
database and then receive updates as data changes. Changes are copied in near real time from the publisher to the
receiving database (subscriber) in the same order as they occur at the publisher; therefore, in this type
of replication, transactional consistency is guaranteed. Transactional replication is typically used in
server-to-server environments. It does not simply copy the data changes, but rather consistently and
accurately replicates each change.
 Snapshot Replication – Snapshot replication distributes data exactly as it appears at a specific
moment in time and does not monitor for updates to the data. The entire snapshot is generated and sent to
users. Snapshot replication is generally used when data changes are infrequent. It is a bit slower
than transactional replication because each run moves the complete data set from one end to the other.
Snapshot replication is a good way to perform initial synchronization between the publisher and the
subscriber.
 Merge Replication – Data from two or more databases is combined into a single database. Merge
replication is the most complex type of replication because it allows both publisher and subscriber to
independently make changes to the database. Merge replication is typically used in server-to-client
environments. It allows changes to be sent from one publisher to multiple subscribers.
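The difference between snapshot and transactional replication can be sketched with a toy in-memory model. The `Publisher` and `Subscriber` classes below are hypothetical and are not the API of any database product; they only show that a snapshot copies the state at one moment, while transactional replication replays later changes in publisher order.

```python
import copy


class Publisher:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Subscriber:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already applied

    def apply_snapshot(self, publisher):
        """Copy the publisher's state as it is right now."""
        self.data = copy.deepcopy(publisher.data)
        self.applied = len(publisher.log)

    def apply_changes(self, publisher):
        """Replay changes made since the last sync, in publisher order."""
        for key, value in publisher.log[self.applied:]:
            self.data[key] = value
        self.applied = len(publisher.log)


pub = Publisher()
pub.write("x", 1)

snapshot_sub, transactional_sub = Subscriber(), Subscriber()
snapshot_sub.apply_snapshot(pub)       # initial synchronization
transactional_sub.apply_snapshot(pub)  # initial synchronization

pub.write("x", 2)
pub.write("y", 3)
transactional_sub.apply_changes(pub)   # receives the later updates

print(snapshot_sub.data, transactional_sub.data)
```

The snapshot subscriber stays at the state it copied until a new snapshot is taken, which matches the note above that snapshots suit data that changes infrequently.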
REPLICATION SCHEME – FULL
REPLICATION
 The most extreme case is replication of the whole database at
every site in the distributed system. This will improve the
availability of the system because the system can continue to
operate as long as at least one site is up.
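The availability claim can be made concrete under a simplifying assumption that is not in the slides: each of the n sites is up independently with the same probability p. The fully replicated database is then available unless all n sites are down at the same time:

```latex
A_{\text{full}} = 1 - (1 - p)^{n}
```

For example, with p = 0.9 and n = 3, the availability is 1 - (0.1)^3 = 0.999, compared with 0.9 for a single site. The price is that every update has to be applied at all n copies.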
THE END
