What is a Distributed Database?
A distributed database (DDB) is an integrated collection of databases that is physically distributed across sites in a computer network. ... To form a distributed database system (DDBS), the files must be structured, logically interrelated, and physically distributed across multiple sites.

Distributed Database - Why?
Data are readily available to end users, i.e., they are easily accessible. This availability makes the overall system more reliable. A distributed database also improves the performance of the overall system, because servers are located close to where the data is needed most.

Let us consider the scenario of XYZ bank, which is headquartered in New Delhi. Assume that the bank maintains a single server at its head office. Every transaction at every branch of XYZ bank must then reach the central server to access the data. For example, consider a customer who is trying to withdraw money from an account through an ATM located in Chennai. The withdrawal request must be sent to the central server and processed there before the ATM disburses the money.

The following image shows the distributed server approach for the scenario above. Now assume that XYZ bank has established several servers distributed throughout the country, say 10 servers. Any request generated from an ATM in any part of the country is forwarded to the server located in that part of the country. If, for any reason, the requested data is not available at the local server, that server finds the actual location of the requested data, forwards the request to the server that holds it, and routes the answer back to the initiator.

Types of Distributed Databases
Homogeneous Distributed Database
Identical software is used at all sites.
All sites are known to one another.
Each site has partial control over the data of other sites.

Heterogeneous Distributed Database
Different sites use different database software.
The structure of the databases residing at different sites may differ (because of data partitions).
Co-operation between sites is limited. That is, it is not easy to alter the structure of the database or any other software in use.

Distributed Databases - Important considerations
Data allocation - We need to answer the following questions: What to store? Where to store it? How to store it?
Data fragmentation - This concerns how the data should be organized.
Distributed queries and transactions - We must find a way to query the data and to handle transactions that take place across multiple distributed sites.

Different options for distributing a database in a distributed database system
Data replication - This is about keeping identical copies at different sites. The whole database may be reproduced and maintained at all or a few of the sites, or a particular table may be reproduced and maintained at all or a few of the sites.
Horizontal partitioning - For example, if you have a table EMP that stores data according to the schema EMP(Eno, Ename, Dept, Dept_location), then horizontal partitioning of EMP on Dept_location means splitting the employee records according to their department location values and storing each set of employee records at a different location. The data at different locations will be different, but the schema will be the same, i.e., EMP(Eno, Ename, Dept, Dept_location).
Vertical partitioning
For example, assume the schema EMP(Eno, Ename, Dept, Dept_location). To split this schema into one table that stores employee details and another that stores department details, it can be decomposed as follows: EMP(Eno, Ename, Dept) and DEPT(Dept, Dept_location). These two tables might be stored at different locations for ease of access, for example according to the organization's defined policies.

Hybrid approach
Horizontal partitioning and replication of a few or all horizontal partitions.
Vertical partitioning and replication of a few or all vertical partitions.
Vertical partitioning, followed by horizontal partitioning of some vertical partitions, followed by replication of a few horizontal partitions, etc.

Keywords and Definitions in Distributed Database
Replication (in Database)
R in Site 1 = R in Site 2 = R in Site 3 = ... = R in Site n
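
The replication condition above can be illustrated with a small sketch. The Python below is not part of the original material. The function names (replicate, insert_everywhere, is_consistent), the site names, and the sample EMP rows are all hypothetical, and plain in-memory lists stand in for real database sites. It assumes full replication of a single relation R and compares the copies as sets of rows.

```python
from typing import Dict, List, Tuple

# Hypothetical row type for the EMP(Eno, Ename, Dept, Dept_location) schema.
Row = Tuple[int, str, str, str]


def replicate(relation: List[Row], site_names: List[str]) -> Dict[str, List[Row]]:
    """Place an independent full copy of the relation at every site."""
    return {site: list(relation) for site in site_names}


def insert_everywhere(sites: Dict[str, List[Row]], row: Row) -> None:
    """Apply the same insert at every replica so the copies stay equal."""
    for rows in sites.values():
        rows.append(row)


def is_consistent(sites: Dict[str, List[Row]]) -> bool:
    """Check R at Site 1 = R at Site 2 = ... = R at Site n (as sets of rows)."""
    copies = [set(rows) for rows in sites.values()]
    return all(copy == copies[0] for copy in copies)


# Sample data (assumed for illustration only).
emp = [(1, "Asha", "Sales", "Chennai"), (2, "Ravi", "HR", "New Delhi")]
sites = replicate(emp, ["site1", "site2", "site3"])

# A write applied at every site preserves the replication condition.
insert_everywhere(sites, (3, "Meena", "Sales", "Chennai"))
consistent_after_broadcast = is_consistent(sites)

# A write that reaches only one site breaks the replication condition.
sites["site2"].append((4, "Kiran", "HR", "New Delhi"))
consistent_after_partial_write = is_consistent(sites)
```

In this sketch the condition holds only while every write reaches every copy. A real distributed database would need a protocol to propagate updates between sites, which is outside the scope of this illustration.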