
What is a DDBMS?

Availability of Database + Availability of Computer Network = Distributed Database

A distributed database is, in brief, an integrated database which is built on top of a computer network rather than on a single computer. The data which constitute the database are stored at the different sites of the computer network, and the application programs which are run by the computers access data at different sites.

Concepts

Distributed Database: A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.

Distributed DBMS: Software system that permits the management of the distributed database and makes the distribution transparent to users.

• Collection of logically-related shared data.
• Data split into fragments.
• Fragments may be replicated.
• Fragments/replicas allocated to sites.
• Sites linked by a communications network.
• Data at each site is under control of a DBMS.
• DBMSs handle local applications autonomously.
• Each DBMS participates in at least one global application.

Distributed DBMS

Distributed Processing

A centralized database that can be accessed over a computer network.

2 important aspects

• Distribution: data are not resident at the same site (processor), so that we can distinguish a distributed database from a single centralized database.
• Logical correlation: data have some properties which tie them together.

• Consider a bank that has three branches at different locations: Delhi, Chennai, Bombay.
• At each branch, a computer controls the teller terminals of the branch and the account database of that branch.
• Each computer with its local account database at one branch constitutes one site of the distributed database; the computers are connected by a communication network.
• During normal operations, the applications which are requested from the terminals of a branch need only to access the database of that branch.

• These applications are completely executed by the computer of the branch where they are issued and will therefore be called local applications.
• Example of a local application: a debit or credit application performed on an account stored at the same branch at which the application is requested.

• Understanding the distinction between a distributed database and a set of local databases.
• The important aspect is the existence of some applications which access data at more than one branch. These applications are called global applications or distributed applications.
• (Remember the logical correlation aspect of a DDBMS.)

• A typical global application is a transfer of funds from an account of one branch to an account of another branch.
• This application requires updating the databases at two different branches.
• This application is something more than just performing two local updates at two individual branches (a debit and a credit), because it is also necessary to ensure that both updates are performed or neither.
• Ensuring this requirement for global applications is a difficult task.
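The "both updates or neither" requirement can be illustrated with a minimal sketch. Everything here is hypothetical (the Branch class, the account ids, the vote-then-apply scheme), and it ignores failures that occur between the two phases, which a real commit protocol must handle with logging and recovery.

```python
class Branch:
    """One site: a branch computer with its own local account database."""

    def __init__(self, name, accounts):
        self.name = name
        self.accounts = dict(accounts)  # account id -> balance
        self.available = True           # set to False to simulate a site failure

    def can_apply(self, account, delta):
        # A branch agrees only if it is up and the update keeps the balance non-negative.
        return (self.available
                and account in self.accounts
                and self.accounts[account] + delta >= 0)

    def apply(self, account, delta):
        self.accounts[account] += delta


def transfer(src, src_acct, dst, dst_acct, amount):
    """Debit at one branch and credit at another: both updates or neither."""
    updates = [(src, src_acct, -amount), (dst, dst_acct, +amount)]
    # Phase 1: ask every participating branch whether it can perform its update.
    if not all(branch.can_apply(acct, delta) for branch, acct, delta in updates):
        return False  # nothing has been changed at either branch
    # Phase 2: every branch agreed, so apply both updates.
    for branch, acct, delta in updates:
        branch.apply(acct, delta)
    return True


delhi = Branch("Delhi", {"A1": 500})
chennai = Branch("Chennai", {"B7": 100})

ok = transfer(delhi, "A1", chennai, "B7", 200)      # both branches updated
chennai.available = False
failed = transfer(delhi, "A1", chennai, "B7", 100)  # refused: no debit is left behind
```

The second call shows why two independent local updates would not be enough: without the check in phase 1, the debit at Delhi would go through while the credit at Chennai could not.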

• In the example cited, computers are at geographically different locations; however, distributed databases can also be built on a local network (higher throughput).
• Different data sites connected through a network make a DDBMS.
• Local application: locality is not defined with respect to the geographical distribution of the computers which execute it, but with respect to the fact that only one computer with its own database is involved.
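As a small illustration of this definition of locality, the sketch below classifies an application purely by how many sites' databases it touches. The function name and the site names are assumptions made for the example; geographical distance plays no part in the result.

```python
def classify_application(sites_accessed):
    """Return 'local' if exactly one site's database is involved, else 'global'."""
    sites = set(sites_accessed)
    if not sites:
        raise ValueError("an application must access at least one site")
    return "local" if len(sites) == 1 else "global"


# A credit on an account held at the requesting branch touches one site only.
credit = classify_application(["Delhi"])
# A funds transfer between two branches touches two sites.
funds_transfer = classify_application(["Delhi", "Chennai"])
```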

Please remember the following definition of DDBMS

• A distributed database is a collection of data which are distributed over different computers of a computer network.
• Each site of the network has autonomous processing ability and can perform local applications.
• Each site also participates in the execution of at least one global application, which requires accessing data at several sites using a communication subsystem.

Features of Distributed vs. Centralized Database

• Centralized Control: In the case of a DBMS, the data itself was recognised to be an important investment of the enterprise which required a centralized responsibility; the fundamental function of a DBA was to guarantee the safety of data.
• In distributed databases, the idea of centralized control is much less emphasized. In general, it is possible to identify a hierarchical control structure based on a global database administrator, who has the central responsibility for the whole database, and on local database administrators, who have the responsibility for their respective local databases.
• Local database administrators may have a high degree of autonomy, up to the point that a global database administrator is completely missing and the inter-site coordination is performed by the local administrators themselves. This characteristic is called SITE AUTONOMY.
• Distributed databases differ very much in the degree of site autonomy: from complete site autonomy without any centralized database administrator to almost completely centralized control.

• Data Independence: Data independence means that the actual organization of the data is transparent to the application programmer. PROGRAMS are UNAFFECTED by CHANGES in the PHYSICAL ORGANIZATION OF DATA.
• In distributed databases, data independence has the same importance as in traditional databases; however, a new aspect is added to the usual notion of data independence, namely DISTRIBUTION TRANSPARENCY.
• By distribution transparency we mean that programs can be written as if the database were not distributed. Thus the correctness of programs is unaffected by the movement of data from one site to another; however, their speed of execution is affected.
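One way to picture distribution transparency is a directory that hides where a table is stored. The sketch below is only an illustration: GlobalCatalog, Site, and the ACCOUNT table are hypothetical names, and a real DDBMS would also have to deal with fragments, replicas, and network cost.

```python
class Site:
    def __init__(self, name):
        self.name = name
        self.tables = {}  # table name -> list of rows


class GlobalCatalog:
    """Hypothetical directory mapping each logical table name to the site that stores it."""

    def __init__(self):
        self.location = {}

    def place(self, table, site, rows):
        site.tables[table] = rows
        self.location[table] = site

    def move(self, table, new_site):
        old_site = self.location[table]
        new_site.tables[table] = old_site.tables.pop(table)
        self.location[table] = new_site

    def read(self, table):
        # The caller names only the table, never the site.
        return self.location[table].tables[table]


def total_balance(catalog):
    """Application program: written as if the database were not distributed."""
    return sum(row["balance"] for row in catalog.read("ACCOUNT"))


delhi, bombay = Site("Delhi"), Site("Bombay")
catalog = GlobalCatalog()
catalog.place("ACCOUNT", delhi, [{"id": "A1", "balance": 500},
                                 {"id": "A2", "balance": 250}])

before = total_balance(catalog)
catalog.move("ACCOUNT", bombay)  # the data moves to another site
after = total_balance(catalog)   # same program, same answer
```

The program total_balance is not changed when the table moves; only the catalog entry is. In a real network the second call might run slower or faster, which is the "speed of execution" caveat in the slide.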

• Reduction of redundancy: In traditional databases, redundancy was reduced as far as possible for 2 reasons:
  - First, inconsistencies among several copies of the same logical data are automatically avoided by having only one copy.
  - Second, storage space is saved by eliminating redundancy.
• In distributed databases, there are several reasons for considering data redundancy as a desirable feature:
  1. The LOCALITY OF APPLICATION can be increased if the data is replicated at all sites where applications need it.
  2. The AVAILABILITY of the system can be increased, because a site failure does not stop the execution of applications at other sites if the data is replicated.
• The same reasons against redundancy which were given for the traditional environment are still valid.

• The convenience of replicating a data item increases with the ratio of retrieval accesses versus update accesses performed by applications to it.
• If we have several copies of an item, RETRIEVAL can be performed ON ANY COPY, while UPDATES must be performed consistently ON ALL COPIES.
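The effect of the retrieval/update ratio can be seen with a toy cost model. The model is an assumption made for illustration only: local access is free, every remote access costs one unit, and every site issues the same number of reads and updates.

```python
def remote_accesses(n_sites, reads_per_site, updates_per_site, replicated):
    """Count remote accesses under a toy model where local access is free.

    Assumption: every site issues the same number of reads and updates.
    """
    others = n_sites - 1
    if replicated:
        # Reads use the local copy; each update must also reach every other copy.
        return n_sites * updates_per_site * others
    # Single copy at one site: the other sites access it remotely for reads and updates.
    return others * (reads_per_site + updates_per_site)


# (single copy, replicated at all sites) for three sites
read_heavy = (remote_accesses(3, 90, 10, replicated=False),
              remote_accesses(3, 90, 10, replicated=True))
update_heavy = (remote_accesses(3, 10, 90, replicated=False),
                remote_accesses(3, 10, 90, replicated=True))
```

Under these assumptions the read-heavy workload drops from 200 to 60 remote accesses with replication, while the update-heavy workload rises from 200 to 540, which matches the direction of the rule stated above.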

Distributed Database Design

Fragmentation: Relation may be divided into a number of sub-relations, which are then distributed.
Allocation: Each fragment is stored at the site with "optimal" distribution.
Replication: Copy of fragment may be maintained at several sites.

Fragmentation

Definition and allocation of fragments carried out strategically to achieve:
• Locality of Reference
• Improved Reliability and Availability
• Improved Performance
• Balanced Storage Capacities and Costs
• Minimal Communication Costs

Involves analyzing the most important applications, based on quantitative/qualitative information.

Quantitative information may include: frequency with which an application is run; site from which an application is run; performance criteria for transactions and applications.

Qualitative information may include: transactions that are executed by the application; type of access (read or write); and predicates of read operations.
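As a sketch of dividing a relation into sub-relations, the example below splits an account relation by branch, which is one common form usually called horizontal fragmentation. The relation, its attributes, and the helper names are assumptions made for the example, not part of the slides.

```python
accounts = [
    {"id": "A1", "branch": "Delhi", "balance": 500},
    {"id": "B7", "branch": "Chennai", "balance": 100},
    {"id": "C3", "branch": "Bombay", "balance": 900},
    {"id": "A2", "branch": "Delhi", "balance": 250},
]


def fragment_by(relation, attribute):
    """Split a relation into disjoint sub-relations, one per value of `attribute`."""
    fragments = {}
    for row in relation:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments


def reconstruct(fragments):
    """Rebuild the relation as the union of its fragments."""
    return [row for rows in fragments.values() for row in rows]


fragments = fragment_by(accounts, "branch")

# Order is not significant in a relation, so compare after sorting by id.
rebuilt = sorted(reconstruct(fragments), key=lambda row: row["id"])
original = sorted(accounts, key=lambda row: row["id"])
```

Fragmenting on the branch attribute serves locality of reference in the bank example: each branch's fragment holds exactly the accounts its local applications use, and the union of the fragments gives back the full relation.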

Data Allocation

Four alternative strategies regarding placement of data: Centralized / Partitioned (or Fragmented) / Complete Replication / Selective Replication

Centralized: Consists of a single database and DBMS stored at one site with users distributed across the network.
Partitioned: Database partitioned into disjoint fragments, each fragment assigned to one site.
Complete Replication: Consists of maintaining a complete copy of the database at each site.
Selective Replication: Combination of partitioning, replication, and centralization.
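The four strategies can be compared by looking at which fragments end up at which site. The sketch below is a simplification under stated assumptions: the allocate function and its strategy names are hypothetical, partitioning uses a plain round-robin assignment, and selective replication is modelled as "copy the chosen fragments everywhere, partition the rest".

```python
def allocate(fragments, sites, strategy, replicate=()):
    """Return a mapping site -> set of fragments stored there."""
    placement = {site: set() for site in sites}
    if strategy == "centralized":
        # Everything at one site; the other sites hold no data.
        placement[sites[0]] = set(fragments)
    elif strategy == "partitioned":
        # Disjoint fragments, each assigned to exactly one site (round-robin here).
        for i, fragment in enumerate(fragments):
            placement[sites[i % len(sites)]].add(fragment)
    elif strategy == "complete_replication":
        # A complete copy of the database at each site.
        for site in sites:
            placement[site] = set(fragments)
    elif strategy == "selective_replication":
        # Chosen fragments are copied to every site; the rest are partitioned.
        for i, fragment in enumerate(fragments):
            if fragment in replicate:
                for site in sites:
                    placement[site].add(fragment)
            else:
                placement[sites[i % len(sites)]].add(fragment)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return placement


sites = ["Delhi", "Chennai", "Bombay"]
fragments = ["F1", "F2", "F3"]

centralized = allocate(fragments, sites, "centralized")
partitioned = allocate(fragments, sites, "partitioned")
complete = allocate(fragments, sites, "complete_replication")
selective = allocate(fragments, sites, "selective_replication", replicate={"F1"})
```

In practice the choice of which fragments to replicate, and where, would come from the quantitative and qualitative application analysis described under Distributed Database Design rather than from a fixed rule like the one used here.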

Comparison of Strategies for Data Distribution