
Distributed Databases

Chapter 1: An Overview

Reference: Distributed Databases: Principles and Systems, Stefano Ceri and Giuseppe Pelagatti
Outline

• Introduction.

• Distributed database definition.

• Centralized vs Distributed DB Features.

• Why Distributed Databases?

• Distributed Database Management Systems.
Distributed Databases Definition
• A distributed DB (DDB) is a collection of data that belong logically to the same system but are
spread over the sites of a computer network. Two of its important aspects are:
✔ Distribution: the data do not reside at the same site (processor).
✔ Logical correlation: the data have some properties that tie them together, which distinguishes a
distributed DB from a set of local DBs or files.

• A DDB uses global (distributed) applications, which access data at more than one site.
E.g., a transfer of funds from an account at one bank branch to an account at another
branch, which updates the DB at both branches.

• In short, a DDB is a collection of data distributed over the different sites of a computer network.

• Each site
✔ Has autonomous processing capability.
✔ Can perform local applications.
✔ Participates in the execution of at least one global application, which requires accessing
data at several sites using a communication subsystem.

• Cooperation between autonomous sites is the main technological problem.


Figure: A DDB on a geographically dispersed network
Centralized vs Distributed DB

Centralized
• Located and maintained in one location (processor).
• Pros: all data is located in one place.
• Cons:
- A bottleneck may occur.
- Single point of failure.

Distributed
• A collection of data that belong to the same system but are spread over the sites of a computer
network.
• Emphasizes:
1. Distribution: the data are not at the same site (processor).
2. Logical correlation: some properties tie the data together.
Centralized vs Distributed DB Features

• Centralized Control

• Data independence

• Reduction of redundancy

• Complex physical structures for efficient access

• Integrity, recovery & concurrency control.

• Privacy & Security.


Centralized Control
• In a centralized DB, a database administrator (DBA) guarantees the safety of the data.

• In a DDB
• There is a hierarchical control structure based on a global DBA, who has central
responsibility for the whole DB.
• Local DBAs have responsibility for their respective local DBs.
✔ They may have a high degree of autonomy (site autonomy), up to the
point that a global DBA is not required and inter-site coordination is
performed by the local administrators themselves.

• Site autonomy varies from complete autonomy with no centralized DBA to completely
centralized control.
Data Independence
• In a centralized DB, the actual organization of the data is transparent to the application
programmer.

• Programs are written with a conceptual view of the data (the conceptual schema) and are unaffected
by changes in the physical organization of the data.

• In a DDB
• Data independence has the same importance as in a traditional DB.
• A new aspect is added: distribution transparency.
• Programs can be written as if the database were not distributed.
• The correctness of programs is unaffected by the movement of data from one site to
another, although their speed of execution may be affected.
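As a minimal sketch of distribution transparency, the Python below keeps a catalog that maps each record to the site storing it, so the calling code never names a site. The class `DistributedDB`, its methods `read` and `move`, and the site and account names are hypothetical and chosen only for illustration; they do not describe any real DDBMS.

```python
class DistributedDB:
    """Toy DDB: a catalog hides which site stores each record (all names are hypothetical)."""

    def __init__(self, sites):
        # sites: dict mapping site name -> dict of records stored at that site
        self.sites = sites
        # catalog: record key -> name of the site that currently holds it
        self.catalog = {key: name for name, records in sites.items() for key in records}

    def read(self, key):
        # The caller never names a site; the catalog resolves the location.
        return self.sites[self.catalog[key]][key]

    def move(self, key, target_site):
        # Moving data changes the physical placement and the catalog, not the callers.
        source_site = self.catalog[key]
        self.sites[target_site][key] = self.sites[source_site].pop(key)
        self.catalog[key] = target_site


ddb = DistributedDB({"rome": {"acct-1": 100}, "milan": {"acct-2": 250}})
before = ddb.read("acct-1")
ddb.move("acct-1", "milan")
after = ddb.read("acct-1")
print(before, after)
```

The same `read` call returns the same value before and after the move; in a real system only the access time would be expected to differ.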
Reduction of Redundancy
• In a traditional DB
• Redundancy was reduced for two reasons:
1. With only one copy of the data shared by several applications,
inconsistency can be avoided.
2. Storage space is saved.
• Redundancy was reduced by data sharing, i.e. by allowing several applications
to access the same files and records.

• In a DDB
• Data redundancy is desirable because
1. The availability of data can be increased if the data is replicated at all sites.
2. The failure of one site does not stop execution, thanks to the
replicated data at the other sites.
• The convenience of replicating a data item increases with
the ratio of retrieval accesses (any copy) to update accesses (all copies) performed
by applications on it. This is a trade-off: retrieval can be done on any copy, but
updates must be performed consistently on all copies.
[The more retrieval dominates, the more replication is desirable.]
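The trade-off can be illustrated with a deliberately simplified cost model, which is an assumption of this sketch and not taken from the reference: every site issues the same number of reads and updates, a read is remote only when no local copy exists, and an update must reach every copy held at another site. The function name `remote_accesses` and its parameters are hypothetical.

```python
def remote_accesses(num_sites, reads_per_site, updates_per_site, replicated):
    """Count remote accesses under a simplified, assumed cost model."""
    if replicated:
        # Every site holds a copy: reads are local, each update reaches the other copies.
        remote_reads = 0
        remote_updates = num_sites * updates_per_site * (num_sites - 1)
    else:
        # One copy at a single site: the other sites read and update it remotely.
        remote_reads = (num_sites - 1) * reads_per_site
        remote_updates = (num_sites - 1) * updates_per_site
    return remote_reads + remote_updates


# Each pair is (single copy, fully replicated) for 3 sites.
read_heavy = (remote_accesses(3, 100, 5, False), remote_accesses(3, 100, 5, True))
update_heavy = (remote_accesses(3, 5, 100, False), remote_accesses(3, 5, 100, True))
print(read_heavy, update_heavy)  # (210, 30) (210, 600)
```

Under these assumptions, replication pays off for the read-heavy workload and hurts for the update-heavy one, which matches the rule of thumb stated in the slide.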
Complex physical structures & efficient access
• In a traditional DB
• Secondary indexes, interfile chains, and other complex physical structures.
• Used to obtain efficient access to the data.

• In a DDB
• Efficient access cannot be provided by such physical structures because
1. They are very difficult to build and maintain in a distributed DB.
2. It is not convenient to navigate at the record level in a DDB.
• Instead, a distributed access plan can be produced by an optimizer.
✔ Global optimization: determines which data must be accessed at which
sites and which files must consequently be transmitted (parameter:
communication cost).
✔ Local optimization: decides how to perform the local DB access at each site.
Optimizers’ Design Problems
Categories

Global Optimization
• Which data must be accessed at which site, and which data files must consequently be
transmitted between sites.
• Optimization parameters:
• Communication cost
• Cost of accessing the local DBs
• The importance of these factors depends on the relation between communication cost and disk
access cost, which in turn depends on the communication network.
• Research here aids in understanding how a DDB can be efficiently accessed even if access
plans are not produced automatically.

Local Optimization
• How to perform the local DB access at each site.
• The same problems as in traditional, non-distributed DBs (not considered here).
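A global optimization decision can be sketched for the simplest case: two files at two sites must be brought together, and the optimizer decides which one to transmit. The sketch assumes that communication cost is proportional to file size and ignores the cost of local DB access; `cheapest_transfer`, the site names, and the cost figures are hypothetical.

```python
def cheapest_transfer(file_sizes, cost_per_unit):
    """Pick which file to ship so both files end up at one site.

    file_sizes: dict of site name -> size of the file stored there (hypothetical units).
    cost_per_unit: assumed communication cost of shipping one unit of data.
    Returns (site whose file is shipped, site where processing runs, communication cost).
    """
    site_a, site_b = file_sizes  # exactly two sites are expected
    # Shipping the smaller file minimizes the communication cost in this model.
    if file_sizes[site_a] <= file_sizes[site_b]:
        shipped, target = site_a, site_b
    else:
        shipped, target = site_b, site_a
    return shipped, target, file_sizes[shipped] * cost_per_unit


plan = cheapest_transfer({"site1": 10_000, "site2": 500}, cost_per_unit=2)
print(plan)  # ('site2', 'site1', 1000)
```

A fuller model would also weigh the local access cost at each site, since the balance between the two parameters depends on the network.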
Integrity, recovery and concurrency control
• These are strongly correlated issues.
• Solution: providing transactions.
• Transaction
• Definition: an atomic unit of execution, i.e. a set of operations performed entirely
or not at all.
• Example: the funds transfer (debit & credit).
• Problem: in a global transaction, the debit may be at an operational site and the credit
at a non-operational site.
• How to act? Abort the transaction, or find a smart way to execute the transfer even
if the sites are not simultaneously operational?

• Enemies of transaction atomicity
• Failures
• Concurrency
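The "abort" option for the funds transfer can be sketched as follows. This is a toy illustration with hypothetical names (`Site`, `SiteDown`, `transfer`): it undoes the debit when the credit site is unreachable, and it assumes the debit site stays operational during the undo. A real DDBMS would rely on a commit protocol and logging rather than this simple compensation.

```python
class SiteDown(Exception):
    """Raised when a site cannot be reached (hypothetical failure signal)."""


class Site:
    def __init__(self, balances, operational=True):
        self.balances = dict(balances)
        self.operational = operational

    def apply(self, account, delta):
        if not self.operational:
            raise SiteDown(account)
        self.balances[account] += delta


def transfer(src_site, src_acct, dst_site, dst_acct, amount):
    """Debit then credit; return True if both were applied, False if aborted."""
    try:
        src_site.apply(src_acct, -amount)
    except SiteDown:
        return False  # nothing was changed, so there is nothing to undo
    try:
        dst_site.apply(dst_acct, amount)
    except SiteDown:
        # Abort: undo the debit so the initial consistent state is preserved.
        src_site.apply(src_acct, amount)
        return False
    return True


branch_a = Site({"A": 100})
branch_b = Site({"B": 50}, operational=False)
committed = transfer(branch_a, "A", branch_b, "B", 30)
print(committed, branch_a.balances, branch_b.balances)
```

Because the credit site is down, the transfer is aborted and both balances keep their initial values: the operations are performed entirely or not at all.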
Integrity, recovery and concurrency control
• DB integrity
• Transaction atomicity assures DB integrity: either all the actions that take the
DB from one consistent state to another are performed, or the initial consistent
state is preserved.

• Recovery: deals with preserving transaction atomicity in the presence of failures.

• Concurrency control: deals with ensuring transaction atomicity in the presence of
concurrent execution of transactions.
• Problem: synchronization is harder in a DDB than in a centralized DB.
Privacy and Security
• In a traditional centralized DB
• The DBA has centralized control.
• The DBA ensures that only authorized access to the data is performed.
• Without specialized control procedures, a centralized DB is more vulnerable to privacy &
security violations than the older approaches based on separate files.

• In a DDB
• Local DBAs face the same problems as the DBA of a traditional DB.
• In a DDB with a very high degree of site autonomy, the owners of local data are better
protected, because they can enforce their own protection instead of depending on a central DBA.
• Communication networks are vulnerable to attacks, so security problems are
intrinsic.
Why Distributed Databases?
• Why has DDB development only just begun?
1. The recent development of small computers at lower cost, in place of large
mainframes, provides the necessary hardware support.
2. DDB development depends on computer network and database technologies,
which were developed during the seventies. Building a
distributed DB on top of a computer network and a set of local DBMSs, one at each
site, is a complex task; it would be difficult without these building blocks.
Why Distributed Databases?
1. Organizational and economic reasons.
• Most organizations are decentralized, so a distributed approach is more economical
than maintaining a large central computer.
2. Interconnection of existing DBs.
• When several DBs already exist in an organization and the need to perform
global applications arises, a DDB is a natural solution.
• Creating a DDB bottom-up from existing local DBs takes less effort than
creating a completely new centralized DB.
3. Incremental growth.
• When an organization grows, a DDB supports smooth incremental growth
with minimum impact on the already existing units.
• With a centralized approach, either
- the initial design must allow for future expansion, which is
difficult & expensive, or
- the growth has a major impact on existing applications.
4. Reduced communication overhead.
• Because many applications are local, communication overhead is reduced with respect to
centralized systems.
Why Distributed Databases?
5. Performance considerations
• Several autonomous processors.
• A high degree of parallelism increases performance.
• In a DDB, mutual interference between different processors is minimized.
• The load is shared between the different processors.
• Critical bottlenecks, such as the communication network itself or common services of the
whole system, are avoided.

6. Reliability and availability
• The distributed DB approach, especially with redundant data, can be used to
obtain higher reliability and availability.
• It ensures the graceful degradation property: the effect of each failure is confined to
those applications that use the data of the failed site, and a complete crash is rare.
• Failures can be more frequent in a DDB than in a centralized DB because of the greater number
of components, but each failure affects only the applications using the failed site.
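Both effects can be made concrete with elementary probability, under the assumption (made for this sketch only) that sites fail independently and each is up with the same probability. The function names and the uptime value 0.9 are hypothetical.

```python
def availability(site_uptime, copies):
    """Probability that at least one copy is reachable, assuming independent site failures."""
    return 1 - (1 - site_uptime) ** copies


def any_failure(site_uptime, sites):
    """Probability that at least one of the sites is down, under the same assumption."""
    return 1 - site_uptime ** sites


for n in (1, 2, 3):
    # More sites: some failure becomes more likely, but replicated data stays reachable.
    print(n, availability(0.9, n), any_failure(0.9, n))
```

As the number of sites grows, the chance that something is down rises, while the chance that replicated data is reachable from at least one copy also rises. This is the sense in which failures are more frequent but less damaging.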
Distributed Database Management Systems (DDBMSs)
• DDBMSs support the creation and maintenance of DDBs.

• Commercially available distributed systems are developed by the vendors of centralized DBMSs.

• A DDBMS extends a centralized DBMS by supporting communication & cooperation
between the several DBMS instances installed at the local sites of a computer
network.

• Software components of a DDBMS
1. DB management component (DB)
2. Data communication component (DC)
3. Data dictionary (DD): info about data distribution in the network
4. Specialized distributed DB component (DDB)

- DBMS will refer to (DB, DC, DD)
- DDBMS will refer to (DB, DC, DD, DDB)
Distributed Database Management Systems (DDBMSs)

Services supported by a DDBMS:

• Remote DB access by an application program: this is the most important service
and is provided by all systems that have a distributed DB component.

• Some degree of distribution transparency: supported to a different extent
by different systems, because there is a strong trade-off
between distribution transparency and performance.

• Support for database administration & control: this feature includes tools
for monitoring the DB, gathering info about DB utilization, and providing a
global view of the data files existing at the various sites.

• Some support for concurrency control & recovery of distributed transactions.
Distributed Database Management Systems (DDBMSs)
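• A DDBMS can give an application access to a remote DB through DB access primitives
that are sent to the remote site and executed there.
• Units shipped between the systems:
1. The DB access primitive
2. The result obtained by executing it
• This approach supports distribution transparency.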
Distributed Database Management Systems (DDBMSs)
• Alternatively, the application requires the execution of an auxiliary program at the
remote site, which
1. Accesses the remote DB
2. Returns the result to the requesting application
• This is efficient when many DB accesses are required, because the auxiliary program performs
all the required accesses and sends back only the result.
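The difference between the two approaches to remote access can be sketched by counting messages. The model is an assumption of this sketch: each remote DB access primitive costs one request and one reply, and so does one run of an auxiliary program. `RemoteSite`, `access_primitive`, and `run_auxiliary` are hypothetical names, not a real DDBMS interface.

```python
class RemoteSite:
    """Toy remote site that counts the messages it exchanges (hypothetical)."""

    def __init__(self, records):
        self.records = records
        self.messages = 0

    def access_primitive(self, key):
        # Approach 1: one request and one reply per DB access primitive.
        self.messages += 2
        return self.records[key]

    def run_auxiliary(self, program):
        # Approach 2: one request starts the program, one reply carries its result.
        self.messages += 2
        return program(self.records)


keys = ["a", "b", "c"]

site1 = RemoteSite({"a": 1, "b": 2, "c": 3})
total_by_primitives = sum(site1.access_primitive(k) for k in keys)

site2 = RemoteSite({"a": 1, "b": 2, "c": 3})
total_by_auxiliary = site2.run_auxiliary(lambda records: sum(records[k] for k in keys))

print(total_by_primitives, site1.messages, total_by_auxiliary, site2.messages)  # 6 6 6 2
```

Both approaches compute the same total, but the auxiliary program exchanges two messages regardless of how many records it reads, while the primitive-based approach pays per access.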
Homogeneity and Heterogeneity of DDBMSs
• Homogeneity and heterogeneity can be considered at different levels in a DDB: the
hardware, the OS, and the local DBMSs.
• We consider the distinction at the level of the local DBMSs.
• Homogeneous DDBMS:
• A DDBMS with the same DBMS at each site, even if the computers and/or OSs are not
the same.
• Preferred when the DDB is developed top-down, with no preexisting system.

• Heterogeneous DDBMS:
• Uses at least two different DBMSs.
• Has the added problem of translating between the different data models of the different
local DBMSs.
• Used when integrating preexisting DBs.
• Some systems support communication between different DC
components (mainly developed for compatibility reasons in centralized systems),
as in DBMSs produced to run on IBM computers.
End of Chapter 1
