
DISTRIBUTED DATABASES

Definitions
• A distributed database is a collection of multiple interconnected databases that are spread physically across various locations and communicate via a computer network.
• It can simply be defined as a database system that stores data in multiple locations instead of one. Rather than putting all the data on one server or one computer, the data is placed on multiple servers or in a cluster of computers consisting of individual nodes.
Features of Distributed Databases

1. Databases in the collection are logically interrelated with one another. Often they represent
a single logical database.
2. Data is physically stored across multiple sites. Data at each site can be managed by a DBMS
that is independent of the other sites.
3. The processors at the sites are connected via a network. They do not have a multiprocessor
configuration. (A multiprocessor is a set of processors that execute instructions
simultaneously.)
4. A distributed database is not a loosely connected file system; rather, it is a loosely coupled
system in which the individual components are not so tightly bound together that a
change in one breaks another. This means that a failure at one of the sites or systems does
not affect the functioning of the system as a whole, because another system can complete
the task.
5. A distributed database incorporates transaction processing, but it is not synonymous with a
transaction processing system.
Diagram 1 (figure): databases, each with its own memory, at several sites (Location 2, Location 3, Location 4), interconnected via a communication channel.
Options for distributing a database

1. Data replication – Keeping identical copies of data at different sites. The
whole database may be reproduced and maintained at all or a few of the sites, or a
particular table may be reproduced and maintained at all or a few of the sites.
2. Horizontal partitioning – Partitioning a table by records (rows) without
disturbing the structure of the table. For example, if you have a table EMP
with the schema EMP(Eno, Ename, Dept, Dept_location), then horizontally
partitioning EMP on Dept_location means breaking the employee records up
by department location and storing different sets of employee details at
different locations. The data at different locations will differ, but the schema
will be the same, i.e. EMP(Eno, Ename, Dept, Dept_location). Each horizontal
fragment must have all the columns of the original base table.
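The horizontal partitioning of EMP described above can be sketched in a few lines of Python. The table contents and location names here are made up for illustration; the point is that every fragment keeps the full schema (Eno, Ename, Dept, Dept_location) and the fragments together reconstitute the original table.

```python
# Hypothetical sketch: split EMP rows into per-site fragments keyed on
# Dept_location. Each fragment has the same columns as the base table.
from collections import defaultdict

emp = [
    (1, "Alice", "Sales", "Nairobi"),
    (2, "Bob",   "IT",    "Mombasa"),
    (3, "Carol", "Sales", "Nairobi"),
]

fragments = defaultdict(list)
for row in emp:
    eno, ename, dept, dept_location = row
    fragments[dept_location].append(row)   # one fragment per site

# Each site now stores only its own employees, with the unchanged schema.
print(sorted(fragments.keys()))
```

The union of the fragments equals the original EMP table, which is what makes this fragmentation lossless.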
3. Vertical partitioning – Partitioning a table by columns, i.e.
decomposition. Hence, the partitions of a table stored at different locations
will have different structures. For example, assume the schema
EMP(Eno, Ename, Dept, Dept_location). If you would like to break the
above schema into one table that stores employee details and another that
stores the department details, it can be done as follows: Emp(Eno,
Ename, Dept) and Dept(Dept, Dept_location). These two tables can
be stored at different locations, for example for ease of access according
to the defined organization policies.
4. Hybrid approach – A combination of some or all of the above
techniques.
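The vertical decomposition of EMP into Emp(Eno, Ename, Dept) and Dept(Dept, Dept_location) can be sketched as follows. The sample rows are invented for illustration; the shared Dept column is what allows the original table to be reconstructed by a join.

```python
# Hypothetical sketch: decompose EMP(Eno, Ename, Dept, Dept_location)
# into Emp(Eno, Ename, Dept) and Dept(Dept, Dept_location).
emp_full = [
    (1, "Alice", "Sales", "Nairobi"),
    (2, "Bob",   "IT",    "Mombasa"),
]

emp_tbl  = [(eno, ename, dept) for eno, ename, dept, _ in emp_full]
dept_tbl = {(dept, loc) for _, _, dept, loc in emp_full}   # deduplicated

# Reconstruct the original rows with an equality join on Dept.
dept_loc = dict(dept_tbl)
rejoined = [(eno, ename, d, dept_loc[d]) for eno, ename, d in emp_tbl]
```

Because Dept appears in both fragments, the join recovers every original row, so the decomposition is lossless.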
Factors Encouraging DDBMS

• Distributed nature of organisational units – Most organisations today are
subdivided into multiple units that are physically distributed over the globe. Each unit requires
its own set of local data. Thus the overall database of the organisation becomes distributed.
• Need for sharing data – The multiple organizational units often need to communicate with
each other and share their data and resources. This demands common databases or replicated
databases that should be used in a synchronized manner.
• Reliability and availability – In DDBMS, if one system fails or stops working, another system can
complete the task. Say the server or computer in location one fails, another server in any of
the other locations can serve the client request(s).
• Support for both OLTP and OLAP – Online Transaction Processing and Online Analytical
Processing work on diverse systems which may share common data. OLAP and OLTP are
the two primary data-processing systems used in data science. The difference between the two
is that OLAP is used for complex data analysis, while OLTP is used for real-time processing of
online transactions at scale.
Distribution transparency

• It is the property of distributed databases by virtue of
which the internal details of the distribution are hidden from the
users. The DDBMS designer may choose to fragment tables,
replicate the fragments and store them at different sites.
However, since users are oblivious to these details, they find the
distributed database as easy to use as any centralized
database. The dimensions of distribution transparency are:
1. Location transparency

It is the ability to access data without knowing its physical location.
It ensures that a user can query any fragment(s) of a table as if
they were stored locally at the user's site. The fact that the table or
its fragments are stored at remote sites in the distributed database
system should be completely invisible to the end user. The addresses
of the remote sites and the access mechanisms are completely
hidden.
2.Replication transparency

It ensures that the replication of databases is hidden from the users. It
enables users to query a table as if only a single copy of the
table existed. Replication transparency is associated with concurrency
transparency and failure transparency. Whenever a user updates a
data item, the update is reflected in all copies of the table;
however, this operation should not be visible to the user. This is
concurrency transparency. Also, in case of failure of a site, the
user can still proceed with their queries using replicated copies
without any knowledge of the failure. This is failure transparency.
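The fan-out of a single logical write to all replicas can be sketched as below. This is an illustrative toy, not a real DDBMS API: the site names and the `update` helper are invented, and real systems propagate updates through logging and commit protocols rather than a simple loop.

```python
# Illustrative sketch: one user-facing write is applied to every copy,
# so a read from any site returns the same value afterwards.
replicas = {
    "site1": {"Eno1": "Sales"},
    "site2": {"Eno1": "Sales"},
    "site3": {"Eno1": "Sales"},
}

def update(key, value):
    """The user sees one logical write; the system fans it out to all copies."""
    for copy in replicas.values():
        copy[key] = value

update("Eno1", "IT")
# Every replica now holds "IT", so the replication stays invisible to readers.
```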
3.Failure transparency

• It enables the concealment of faults, allowing users and
application programs to complete their tasks despite the failure of
hardware or software components. It requires that faults be
concealed such that applications can continue to function without
any impact on behaviour or correctness arising from the fault.
Databases are, however, susceptible to a number of failures, which
are broadly categorised into software, hardware and network
failures.
4. Concurrency transparency

• It enables several processes to operate concurrently on shared
resources without interference between them. This means that
the system should provide each user with the illusion that they
have exclusive access to the resources or data. Mechanisms are
put in place to ensure that the consistency of shared data is
maintained despite it being accessed and updated by multiple
processes.
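One such mechanism is mutual exclusion. The sketch below uses a simple lock, one of several possible concurrency-control techniques, to show how concurrent updates to shared data can still leave it consistent; the deposit scenario and numbers are invented for illustration.

```python
# Minimal sketch of concurrency control via a mutual-exclusion lock:
# four concurrent workers update a shared balance, yet the final value
# is exactly what serial execution would produce.
import threading

balance = 0
lock = threading.Lock()

def deposit(amount, times):
    global balance
    for _ in range(times):
        with lock:              # critical section: one process at a time
            balance += amount

threads = [threading.Thread(target=deposit, args=(1, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 40000 — consistent despite concurrent updates
```

Each user's code behaves as if it had exclusive access to `balance`, which is precisely the illusion concurrency transparency provides.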
Commit protocol

• It is a protocol used to ensure the atomicity and durability of distributed
transactions. Atomicity in a DDBMS refers to the property of a database
transaction whereby all the actions within the transaction are executed as a
single, indivisible unit of work. It ensures that the transaction is either
fully completed or fully rolled back to the state the database was in before the
transaction began.
• Any database system should guarantee that the desirable properties of a
transaction are maintained even after a failure. If a failure occurs during
the execution of a transaction, it may happen that not all the changes brought
about by the transaction are committed. This makes the database
inconsistent. Commit protocols prevent this scenario using either
transaction undo (roll back) or transaction redo (roll forward).
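The best-known commit protocol is two-phase commit, which can be sketched as below. The `Site` class and method names are illustrative stand-ins, not a real library API: phase 1 asks every site to prepare and vote, and phase 2 commits globally only if every vote was yes, rolling all sites back otherwise.

```python
# Hedged sketch of a two-phase commit coordinator (illustrative names).
def two_phase_commit(sites):
    # Phase 1 (voting): each site replies True (ready) or False (abort).
    votes = [site.prepare() for site in sites]
    if all(votes):
        for site in sites:          # Phase 2: global commit
            site.commit()
        return "committed"
    for site in sites:              # Phase 2: global rollback
        site.rollback()
    return "rolled back"

class Site:
    def __init__(self, ready):
        self.ready, self.state = ready, "active"
    def prepare(self):
        return self.ready
    def commit(self):
        self.state = "committed"
    def rollback(self):
        self.state = "rolled back"

print(two_phase_commit([Site(True), Site(True)]))   # committed
print(two_phase_commit([Site(True), Site(False)]))  # rolled back
```

A single "no" vote forces every site to roll back, which is how the protocol keeps the transaction atomic across sites: either all sites commit or none do.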