University of Science and Technology of Southern Philippines

Alubijid | Cagayan de Oro | Claveria | Jasaan | Oroquieta | Panaon

Course : IT315 – IT Elective 1 (Distributed Database Management System)

Module No. : 1

Title : Introduction to Distributed Database Management System

ILO :

✓ Identify the importance of distributed database management system
✓ Identify the advantages of distributing data across multiple sites
✓ Identify the disadvantages of distributed systems

INTRODUCTION TO DISTRIBUTED DATABASE MANAGEMENT SYSTEM

Definition

A distributed database is a collection of many interconnected databases that are physically scattered across
multiple places and communicate over a computer network (Distributed DBMS - Distributed Databases,
n.d.). It encompasses numerous computers or nodes linked by a network. In a distributed database, each
node can store a piece of the data, and the whole database is the sum of the data stored on each node (Kumar,
2022).

A Distributed Database Management System (DDBMS) manages a single logical database that is divided into
several fragments. Each fragment is stored on one or more computers that are linked by a communications
network, and each site is controlled by its own DBMS. Each site can independently execute user requests
that require access to local data (that is, each site has some degree of local autonomy), as well as requests
that access data stored on other computers in the network (Connolly & Begg, 2015).

Characteristics of a Distributed Database (Moore, n.d.)

The following are the characteristics of a distributed database:

• Location independent
• Distributed query processing
• Distributed transaction management
• Hardware independent
• Operating system independent
• Network independent
• Transaction transparency
• DBMS independent

Information Management 1

Characteristics of a DDBMS

According to Connolly & Begg (2015), the following are the characteristics of a DDBMS:
• a collection of logically related shared data;
• the data is split into a number of fragments;
• fragments may be replicated;
• fragments/replicas are allocated to sites;
• the sites are linked by a communication network;
• the data at each site is under the control of a DBMS;
• the DBMS at each site can handle local applications autonomously; and
• each DBMS participates in at least one global application.
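To make these characteristics concrete, here is a minimal in-memory sketch, assuming each site's local DBMS can be simulated by a Python dict. A relation is split into fragments, one fragment is replicated at two sites, each site answers local queries on its own, and a global query combines fragments from several sites. The site names, fragment names, and helper functions (`Site`, `allocate`, `global_query`) are hypothetical and for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One networked site; its local DBMS is simulated by a dict of fragments."""
    name: str
    fragments: dict = field(default_factory=dict)  # fragment name -> list of rows

    def local_query(self, fragment, predicate=lambda row: True):
        # Local application: answered autonomously from this site's own data.
        return [row for row in self.fragments.get(fragment, []) if predicate(row)]


def allocate(sites, fragment, rows, site_names):
    """Allocate a fragment to one site, or to several sites to create replicas."""
    for name in site_names:
        sites[name].fragments[fragment] = list(rows)


def global_query(sites, fragment_names, predicate=lambda row: True):
    """Global application: read each fragment once, from the first site holding it."""
    result = []
    for fragment in fragment_names:
        holder = next(s for s in sites.values() if fragment in s.fragments)
        result.extend(holder.local_query(fragment, predicate))
    return result


# Hypothetical example: a customer relation split into two fragments,
# with the "south" fragment replicated at two sites.
sites = {name: Site(name) for name in ("site_a", "site_b", "site_c")}
allocate(sites, "customers_north", [{"id": 1, "region": "north"}], ["site_a"])
allocate(sites, "customers_south",
         [{"id": 2, "region": "south"}, {"id": 3, "region": "south"}],
         ["site_b", "site_c"])

all_customers = global_query(sites, ["customers_north", "customers_south"])
```

The whole logical database is simply the union of its fragments, which is why `global_query` returns all three customers even though no single site stores them all.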

Comparison of Database Management System (DBMS) and Distributed Database Management System (DDBMS)

There are key differences between a centralized database (DBMS) and a distributed database (DDBMS).
They are summarized below according to www.programmerbay.com (see Table 1):

Table 1. DBMS vs. DDBMS

BASIS | CENTRALIZED DATABASE | DISTRIBUTED DATABASE
Location | Managed by a single machine or system at a single location | Spread and split up across various storage device locations
Maintenance | Easy to maintain | Difficult to maintain
Efficiency | Less efficient | More efficient
Failure | All data becomes inaccessible | Other databases can still be accessed
Response speed | Slow | Fast
Communication cost | High | Low

Location

In a centralized database, the database is administered by a single machine or system at one location, which
monitors the entire database and keeps it running smoothly.

In a distributed database, the database is spread and split up among numerous storage device locations that
may be geographically far from each other, so a coordinating system is needed to keep all of the devices
working efficiently and smoothly.

Ease of Maintenance

A centralized database stores all data and information in one place, which makes it easy to access and
therefore easy to maintain.


In a distributed database, the data and information are split up across multiple locations, which makes
maintenance difficult and complex because issues such as avoiding data redundancy and ensuring data
consistency must be handled.

Database Efficiency

A centralized database is less efficient than a distributed database because all of the data is stored in one
location, which can make data retrieval more difficult.

A distributed database is more efficient than a centralized database because data is divided and kept in
multiple locations, which makes the data search process smoother and less time-consuming.

Distributed Database Architecture

Distributed databases can be homogeneous or heterogeneous (Moore, n.d.).

Homogeneous distributed database

All physical locations in a homogeneous distributed database system use the same underlying hardware and
run the same operating systems and database applications. Homogeneous distributed database systems appear
to users as a single system and can be much easier to design and manage. To be considered homogeneous,
the data structures at each location must be either identical or compatible, and the database application at
each location must also be the same or compatible.

Heterogeneous distributed database

Hardware, operating systems, and database software may differ between locations in a heterogeneous
distributed database. Different sites may use different schemas and software, and these schema differences
can complicate query and transaction processing.

Different nodes may have incompatible hardware, software, and data structures, or they may be in locations
that are not compatible with one another. Users at one location may be able to read data at another but not
upload or modify it. Heterogeneous distributed databases are frequently difficult to use, which makes them
economically unviable for many enterprises.
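The schema differences described above can be illustrated with a small sketch. Assume two hypothetical sites store customer data under different field names and units; one translation function per site maps local rows into a single global schema before the query condition is applied. All site names, field names, and values here are invented for illustration and do not come from any real system.

```python
# Hypothetical local schemas:
#   site A stores cust_id / full_name and the balance in pesos;
#   site B stores CustomerNo / Name and the balance in centavos.
SITE_A_ROWS = [{"cust_id": 1, "full_name": "Ana Cruz", "balance_php": 1500.0}]
SITE_B_ROWS = [{"CustomerNo": "B-7", "Name": "Ben Reyes", "BalanceCentavos": 250000}]


def from_site_a(row):
    """Translate a site A row into the global schema."""
    return {"customer": f"A-{row['cust_id']}", "name": row["full_name"],
            "balance": row["balance_php"]}


def from_site_b(row):
    """Translate a site B row into the global schema (centavos -> pesos)."""
    return {"customer": row["CustomerNo"], "name": row["Name"],
            "balance": row["BalanceCentavos"] / 100}


SOURCES = [(SITE_A_ROWS, from_site_a), (SITE_B_ROWS, from_site_b)]


def global_customers(min_balance=0.0):
    """Answer one global query by translating every site's rows first."""
    result = []
    for rows, translate in SOURCES:
        for row in rows:
            customer = translate(row)
            if customer["balance"] >= min_balance:
                result.append(customer)
    return result
```

Every new site with its own schema needs another translator, which is one reason heterogeneous systems are harder to build and maintain than homogeneous ones.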

Advantages and Disadvantages of DDBMSs

Despite the promising potential of a DDBMS, this type of database has both advantages and disadvantages.
According to Connolly & Begg (2015), they are as follows.

Advantages

Reflects organizational structure

Many organizations are naturally distributed over multiple locations, and a distributed database can mirror
that structure.


Improved shareability and local autonomy

Data distribution can reflect an organization's geographical dispersal, and users at one site can access data
stored at other sites. Data can be placed at the site of the users who use it most often. In this way, users
have local control over the data, allowing them to define and enforce local policies on its use. A global
database administrator (DBA) is responsible for the entire system.

Improved Availability
A failure at one DDBMS site or a communication link failure that renders some sites inaccessible does not
render the entire system unworkable. Distributed DBMSs are built to function in the face of such failures.
If a single node fails, the system may be able to redirect queries to another site.

Improved Reliability
Because data can be replicated such that it resides at multiple locations, the failure of a node or a
communication link does not always render the data unavailable.
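The availability and reliability points above can be sketched as a read that fails over between replicas. This is a minimal illustration, assuming each replica is an in-memory object with an `up` flag that simulates a site or link failure; `Replica`, `SiteDown`, and `read_with_failover` are hypothetical names, not part of any DDBMS product.

```python
class SiteDown(Exception):
    """Raised when a replica's site or communication link has failed."""


class Replica:
    def __init__(self, name, data, up=True):
        self.name, self.data, self.up = name, data, up

    def get(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in turn and return (site, value) from the first reachable one."""
    failed = []
    for replica in replicas:
        try:
            return replica.name, replica.get(key)
        except SiteDown:
            failed.append(replica.name)  # redirect the query to the next site
    raise SiteDown(f"no replica reachable (tried {failed})")


data = {"acct-1": 500}
replicas = [
    Replica("site_a", dict(data), up=False),  # simulated site failure
    Replica("site_b", dict(data)),
    Replica("site_c", dict(data)),
]
site, balance = read_with_failover(replicas, "acct-1")
```

Here the failure of `site_a` is invisible to the caller; the data only becomes unavailable if every site holding a replica is unreachable.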

Improved Performance
Because the data is located near the point of greatest demand, and because distributed DBMSs have inherent
parallelism, database access may be faster than with a remote centralized database.
Furthermore, because each site handles only a subset of the entire database, there may not be the same level
of contention for CPU and I/O services as with a centralized DBMS.
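The parallelism argument can be sketched as follows, assuming a hypothetical sales table split into three fragments. Each "site" computes a partial sum over its own smaller fragment, the partial results are computed concurrently, and they are then combined. Python threads only simulate the idea here; in a real DDBMS each site would use its own CPU and I/O.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments: each site holds only its own subset of the sales amounts.
FRAGMENTS = {
    "site_a": [120, 80, 45],
    "site_b": [300, 15],
    "site_c": [60, 60, 60, 20],
}


def local_total(site):
    """Work done at one site, against its own fragment only."""
    return sum(FRAGMENTS[site])


def parallel_total():
    """Run the per-site sums concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(FRAGMENTS)) as pool:
        partials = list(pool.map(local_total, FRAGMENTS))
    return sum(partials)


print(parallel_total())  # 245 + 315 + 200 = 760
```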

Economics
It is now widely understood that building a system of smaller computers with the equivalent power of a
single large computer costs substantially less. This makes it more cost-effective for business divisions and
departments to buy separate computers. It is also far less expensive to add workstations to a network than
to upgrade a mainframe system.

A second possible cost saving occurs when databases are geographically distributed and applications
require remote access to the distributed data. In such cases, because of the relative cost of transmitting data
across the network versus accessing it locally, it may be far more cost-effective to split the application and
perform the processing locally at each site.
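The data-transmission argument above is essentially arithmetic. The sketch below compares the bytes sent over the network when all remote rows are shipped to a central site against the bytes sent when the remote site aggregates locally and returns only the summary. The row counts and sizes are assumed figures chosen for illustration, not measurements.

```python
def bytes_if_data_shipped(remote_rows, row_bytes):
    """Process centrally: every remote row crosses the network first."""
    return remote_rows * row_bytes


def bytes_if_query_shipped(result_rows, result_row_bytes):
    """Process locally at the remote site: only the summary crosses the network."""
    return result_rows * result_row_bytes


# Assumed figures, for illustration only.
REMOTE_ROWS, ROW_BYTES = 1_000_000, 200   # detail rows stored at the remote site
RESULT_ROWS, RESULT_ROW_BYTES = 10, 50    # e.g. one total per product category

shipped_data = bytes_if_data_shipped(REMOTE_ROWS, ROW_BYTES)
shipped_result = bytes_if_query_shipped(RESULT_ROWS, RESULT_ROW_BYTES)
print(f"ship data: {shipped_data:,} bytes; ship result: {shipped_result:,} bytes")
```

Under these assumptions, processing locally sends 500 bytes instead of 200,000,000, which is why splitting the application can be so much cheaper when transmission costs dominate.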

Modular Growth
In a distributed environment, expansion is much easier to handle. New sites can be added to the network
without affecting the operations of other sites, so an organization can grow relatively easily. In a centralized
DBMS, by contrast, growth may require changes to both hardware (a more powerful system) and software
(a more powerful or more configurable DBMS).
Integration
The integration of legacy systems is one example of why some organizations must use distributed data
processing: it allows legacy systems to coexist with more modern systems. At the same time, no single
package can deliver all of the features that a modern business requires, so it is critical that companies be
able to integrate software components from several vendors to satisfy their specific needs.


Remaining competitive
E-business, computer-supported collaborative work, and workflow management are three contemporary
technologies that rely heavily on distributed database technology. To remain competitive, many businesses
have had to reorganize their operations and adopt distributed database technology.

Disadvantages

Complexity
A distributed database management system that conceals its distributed nature from the user while
providing an acceptable level of performance, dependability, and availability is intrinsically more complex
than a centralized database management system. The ability to replicate data adds another layer of
complexity to the distributed DBMS. If the program does not handle data replication properly, there will
be a decrease in availability, reliability, and performance as compared to the centralized system, and the
previously mentioned advantages will become drawbacks.

Cost
Because of the increased complexity, we can expect the procurement and maintenance costs of a DDBMS
to be higher than those of a centralized DBMS. Furthermore, a distributed DBMS requires additional
hardware to establish a network between sites, and using this network incurs ongoing communication costs.
There are also labor costs for managing and maintaining the local DBMSs and the underlying network.

Security
In a distributed DBMS, not only must access to replicated data be controlled in multiple locations, but the
network itself must also be made secure. Networks were formerly regarded as an insecure means of
communication. Although this is still partly true, significant progress has been made in network security.
Integrity control more difficult
Database integrity refers to the validity and consistency of stored data. Integrity is usually expressed in
terms of constraints, which are consistency rules that the database is not permitted to violate. Enforcing
constraints generally requires access to a large amount of data that defines the constraint but is not involved
in the actual update operation. In a distributed DBMS, the communication and processing costs required to
enforce integrity constraints can be prohibitive.
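The cost of distributed integrity checking can be sketched with a referential constraint. Assume, hypothetically, that orders are stored at one site while the customer data they must reference lives at another; every insert then needs a network round trip to data that is not part of the update itself. `RemoteSite` and `OrdersSite` are illustrative names, and the round-trip counter stands in for real communication cost.

```python
class RemoteSite:
    """Stands in for a site reached over the network; counts round trips."""

    def __init__(self, customer_ids):
        self.customer_ids = set(customer_ids)
        self.round_trips = 0

    def customer_exists(self, customer_id):
        self.round_trips += 1  # each check costs one network round trip
        return customer_id in self.customer_ids


class OrdersSite:
    """Local site that stores orders; the referenced customer data is elsewhere."""

    def __init__(self, customer_site):
        self.orders = []
        self.customer_site = customer_site

    def insert_order(self, order_id, customer_id):
        # Constraint: every order must reference an existing customer.
        # The data needed to check it is stored at another site.
        if not self.customer_site.customer_exists(customer_id):
            raise ValueError(f"order {order_id}: unknown customer {customer_id}")
        self.orders.append((order_id, customer_id))


customers = RemoteSite(customer_ids=[1, 2, 3])
orders = OrdersSite(customers)
for i in (1, 2, 3):
    orders.insert_order(100 + i, i)
```

Three local inserts already cost three remote round trips; with many constraints and busy sites, this overhead is what can make enforcement prohibitively expensive.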

Lack of standards
Although distributed DBMSs depend on effective communication, standard communication and data access
protocols are only now becoming available. This lack of standards has severely hampered the potential of
distributed DBMSs. There are also no tools or methodologies to help users convert a centralized DBMS
into a distributed DBMS.


Lack of experience
Although many of the protocols and problems are well understood, general-purpose distributed DBMSs have
not been widely adopted. As a result, we do not yet have the same level of industry experience as we have
with centralized DBMSs, which may be a significant deterrent for anyone considering this technology.

Database design more complex

A distributed database must be designed to account for data fragmentation, allocation of fragments to
specific sites, and data replication, in addition to the usual challenges of designing a centralized database.

Reasons to use DDBMS

As an organization grows, its database developers and administrators must keep its databases up to date
and able to keep pace with rapid organizational change. This creates a need to shift from a centralized to a
distributed database system. The reasons for this shift, according to www.tutorialspoint.com (2019), are
listed below.

Distributed Nature of Organizational Units

The majority of corporations nowadays are divided into numerous units that are geographically dispersed
across the world. Each unit needs its own set of local data. As a result, the organization's overall database
is distributed.

Need for Sharing of Data

The various organizational units frequently need to communicate with one another and share resources and
data. This requires shared databases or replicated databases that are kept synchronized.

Support for Both OLTP and OLAP

Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) are two different types
of processing that run on different systems, which may need to share data. Distributed database systems
support both by providing synchronized data.

Database Recovery
Replication of data across various sites is one of the common methods utilized in DDBMS. If a database at
any site is corrupted, data replication automatically aids in data recovery. As the broken site is being rebuilt,
users can access data from other sites. As a result, database failure may become virtually invisible to users.

Support for Multiple Application Software

The majority of businesses use various application software, each with its own database support. A
DDBMS provides uniform functionality for using the same data across these different platforms.


Types of Distributed Databases

Since data is distributed across different sites for more efficient performance, there are different types of
distributed data that can be used to implement it. According to Moore (n.d.), the following types can be
considered:

Replicated data is used to create instances of data in different parts of the database. By using replicated
data, a distributed database can access identical data locally, which reduces network traffic. Replicated data
can be divided into two types: read-only and writable data.

In read-only replicated data, only the initial instance can be changed; all subsequent replicas across the
enterprise are then updated from it. Writable data can be changed, but only the initial occurrence is
affected.
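The read-only replication scheme described above resembles a primary-copy arrangement: only the initial instance accepts updates, and each replica is refreshed from it. The sketch below assumes eager, synchronous propagation for simplicity (real systems often propagate asynchronously); `PrimaryCopy` and `ReadOnlyReplica` are hypothetical names.

```python
class PrimaryCopy:
    """The initial instance: the only copy that accepts updates."""

    def __init__(self):
        self.data = {}
        self.replicas = []

    def update(self, key, value):
        self.data[key] = value
        for replica in self.replicas:  # propagate the change to every read-only copy
            replica._refresh(key, value)


class ReadOnlyReplica:
    """A local copy that serves reads and is refreshed only by the primary."""

    def __init__(self, primary):
        self.data = dict(primary.data)
        primary.replicas.append(self)

    def read(self, key):
        return self.data.get(key)

    def update(self, key, value):
        raise PermissionError("read-only replica: send updates to the primary copy")

    def _refresh(self, key, value):
        self.data[key] = value


primary = PrimaryCopy()
branch_1 = ReadOnlyReplica(primary)
branch_2 = ReadOnlyReplica(primary)
primary.update("price", 99)
```

Reads can be served locally at each branch, which is where the traffic reduction comes from, while every change flows through one place.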

Horizontally fragmented data is identified by primary keys that each refer to a single record in the
database. Horizontal fragmentation is typically used when each business location needs access only to the
data for its own branch.
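Horizontal fragmentation by branch can be sketched as follows, assuming a hypothetical accounts relation whose `branch` attribute decides where each complete row is placed. Each branch gets only its own rows, and the original relation is recovered as the union of the fragments.

```python
# Hypothetical accounts relation; acct_no is the primary key.
ACCOUNTS = [
    {"acct_no": 101, "branch": "North", "balance": 5000},
    {"acct_no": 102, "branch": "South", "balance": 1200},
    {"acct_no": 103, "branch": "North", "balance": 800},
]


def horizontal_fragments(rows, attribute):
    """Split rows by the value of one attribute; each fragment keeps whole rows."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments


def union_fragments(fragments):
    """Rebuild the relation as the union of its horizontal fragments."""
    return sorted((row for fragment in fragments.values() for row in fragment),
                  key=lambda row: row["acct_no"])


fragments = horizontal_fragments(ACCOUNTS, "branch")
```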

Vertically fragmented data is organized using primary keys that are copies of one another and available to
each section of the database. Vertical fragmentation is used when a company's branch and its central
location deal with the same accounts differently.
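Vertical fragmentation can be sketched in the same style. Assume, hypothetically, that a branch handles owners and balances while the central office handles credit ratings for the same accounts. Each fragment holds a subset of the columns plus a copy of the primary key, and the full rows are rebuilt by joining the fragments on that key.

```python
# Hypothetical accounts relation; acct_no is the primary key.
ACCOUNTS = [
    {"acct_no": 101, "owner": "Ana", "balance": 5000, "credit_rating": "A"},
    {"acct_no": 102, "owner": "Ben", "balance": 1200, "credit_rating": "B"},
]


def vertical_fragment(rows, key, columns):
    """Project a subset of columns, indexed by a copy of the primary key."""
    return {row[key]: {c: row[c] for c in columns} for row in rows}


def join_fragments(key, *fragments):
    """Rebuild full rows by joining the fragments on the shared primary key."""
    rows = []
    for k in sorted(fragments[0]):
        row = {key: k}
        for fragment in fragments:
            row.update(fragment[k])
        rows.append(row)
    return rows


branch_fragment = vertical_fragment(ACCOUNTS, "acct_no", ["owner", "balance"])
central_fragment = vertical_fragment(ACCOUNTS, "acct_no", ["credit_rating"])
```

Because the primary key is repeated in every fragment, no information is lost: the join restores exactly the original rows.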

APACHE CASSANDRA BASICS

Apache Cassandra is an open-source, distributed NoSQL database management system built to handle large
volumes of data across many commodity servers while ensuring high availability and fault tolerance. It was
originally created at Facebook and later released as open source, becoming a project of the Apache Software
Foundation (Apache Cassandra | Apache Cassandra Documentation, n.d.).

Cassandra was created primarily to address the problems of maintaining massive, highly available, and
constantly changing data sets. It is known for its ability to scale horizontally, spread data across numerous
nodes, and deliver reliable performance despite hardware failures. This makes it a strong option for
businesses and applications that need to process and store data in real time without interruption (Apache
Cassandra | Apache Cassandra Documentation, n.d.).

Features

Distributed Architecture
Cassandra uses a distributed peer-to-peer architecture without a single point of failure. Multiple nodes are
used to distribute data, ensuring high availability and data redundancy.

Scalability
Cassandra can easily scale read and write throughput by adding nodes to the cluster. This allows it to
scale linearly and manage enormous amounts of data and traffic.
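One idea behind this kind of scaling is the token ring used to partition data. The sketch below is a simplified model, not Cassandra's implementation: it assumes one token per node (no virtual nodes), no replication, and MD5 in place of the Murmur3 hash used by Cassandra's default partitioner. Node names are hypothetical. It shows that adding a node only moves the keys in one range of the ring, and those keys all move to the new node.

```python
import bisect
import hashlib


def token(value):
    """Place a name or key on a 0..2**32-1 ring (MD5 here; Cassandra's default uses Murmur3)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % 2**32


class TokenRing:
    def __init__(self, nodes):
        self.ring = sorted((token(n), n) for n in nodes)  # one token per node

    def add_node(self, node):
        bisect.insort(self.ring, (token(node), node))

    def owner(self, key):
        """A key belongs to the first node at or after its token, wrapping around."""
        tokens = [t for t, _ in self.ring]
        i = bisect.bisect_left(tokens, token(key))
        return self.ring[i % len(self.ring)][1]


ring = TokenRing(["node1", "node2", "node3"])
keys = [f"user{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.add_node("node4")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Existing nodes keep serving every key outside the new node's range, so the cluster can grow one node at a time without a full reshuffle of data.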


NoSQL Data Model

Cassandra's flexible, schema-less data architecture is ideally suited to applications with dynamic and
changing data structures. Text, integers, dates, and a variety of other data types are supported.

High Availability
Even if a few nodes in the cluster fail, data is still accessible thanks to fault tolerance and replication
techniques. As a result, Cassandra is appropriate for use in mission-critical applications.

Tunable Consistency
Cassandra offers a balance between data consistency and system efficiency by letting users select the level
of consistency they require for their operations.
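The trade-off behind tunable consistency can be worked through with simple arithmetic. Assuming a single data center with replication factor RF, a QUORUM operation needs floor(RF / 2) + 1 replicas to respond. A read is guaranteed to overlap the replicas that acknowledged the latest write when the replicas read (R) plus the replicas written (W) exceed RF. The sketch models only the levels ONE, QUORUM, and ALL; the helper names are illustrative.

```python
def quorum(replication_factor):
    """Replicas that must respond for QUORUM: a majority, floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


LEVELS = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def replicas_needed(level, replication_factor):
    return LEVELS[level](replication_factor)


def read_sees_latest_write(read_level, write_level, replication_factor):
    """True when R + W > RF, so every read set intersects every write set."""
    r = replicas_needed(read_level, replication_factor)
    w = replicas_needed(write_level, replication_factor)
    return r + w > replication_factor
```

With RF = 3, QUORUM reads and writes (2 + 2 > 3) give strongly consistent reads, while ONE/ONE (1 + 1) is faster but may return stale data. This is the balance between consistency and efficiency that Cassandra lets users choose.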

Query Language (CQL)

For developers with SQL knowledge, Cassandra offers the Cassandra Query Language (CQL), a query
language that is similar to SQL.

Support for Time Series Data

Because it is well suited to time series data storage and analytics, Cassandra is widely used in IoT,
monitoring, and event logging applications.
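CQL and the time-series use case can be illustrated together. The sketch below defines a hypothetical keyspace and sensor-readings table as CQL strings in Python. Readings are partitioned by sensor and clustered by time, newest first. The functions accept any session object with an `execute()` method, such as one created with the DataStax Python driver. The keyspace, table, and column names are assumptions for illustration, and `SimpleStrategy` with a replication factor of 3 is only a single-data-center choice.

```python
# CQL for a hypothetical IoT time-series table (names are illustrative).
CREATE_KEYSPACE = (
    "CREATE KEYSPACE IF NOT EXISTS sensors "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}"
)

CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS sensors.readings ("
    "sensor_id text, reading_time timestamp, value double, "
    "PRIMARY KEY ((sensor_id), reading_time)"
    ") WITH CLUSTERING ORDER BY (reading_time DESC)"
)

LATEST_READINGS = (
    "SELECT reading_time, value FROM sensors.readings "
    "WHERE sensor_id = %s LIMIT 10"
)


def create_schema(session):
    """Run the DDL using any session object that exposes execute()."""
    for statement in (CREATE_KEYSPACE, CREATE_TABLE):
        session.execute(statement)


def latest_readings(session, sensor_id):
    # Restricting on the partition key keeps the query within one partition;
    # the clustering order returns the newest readings first.
    return session.execute(LATEST_READINGS, (sensor_id,))
```

With the DataStax driver installed and a node running locally, a session could be obtained with `Cluster(["127.0.0.1"]).connect()` from `cassandra.cluster` and passed to `create_schema`. Multi-data-center deployments would normally use `NetworkTopologyStrategy` instead of `SimpleStrategy`.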

Wide Adoption
Numerous businesses, including large technology firms, use Cassandra for purposes such as content
management, online shopping, and financial services.

Apache Cassandra is a strong, distributed NoSQL database system that excels at managing massive amounts
of data, guaranteeing high availability, and offering scalability. It is an excellent option for contemporary,
data-intensive applications that demand real-time performance and durability since it can handle dynamic
and developing data structures (Apache Cassandra | Apache Cassandra Documentation, n.d.).

Differences between Cassandra and RDBMS

Table 2. Differences between Cassandra and RDBMS

S.No. | CASSANDRA | RDBMS
1. | Cassandra is a high-performance and highly scalable distributed NoSQL database management system. | RDBMS is a database management system or software designed for relational databases.
2. | Cassandra is a NoSQL database. | RDBMS uses SQL for querying and maintaining the database.
3. | It deals with unstructured data. | It deals with structured data.
4. | It has a flexible schema. | It has a fixed schema.
5. | Cassandra has a peer-to-peer architecture with no single point of failure. | RDBMS has a master-slave core architecture, which means a single point of failure.
6. | Cassandra handles high-velocity incoming data. | RDBMS handles moderate-velocity incoming data.
7. | In Cassandra there are various data sources, meaning data comes from many locations. | In RDBMS there are limited data sources, meaning data comes from one or a few locations.
8. | It supports simple transactions. | It supports complex and nested transactions.
9. | In Cassandra the outermost container is the keyspace. | In RDBMS the outermost container is the database.
10. | Cassandra follows decentralized deployments. | RDBMS follows centralized deployments.
11. | In Cassandra, data is written in many locations. | In RDBMS, data is mainly written in one location.
12. | In Cassandra, a row represents a unit of replication. | In RDBMS, a row represents a single record.
13. | In Cassandra, a column represents a unit of storage. | In RDBMS, a column represents an attribute.
14. | In Cassandra, relationships are represented using collections. | In RDBMS, relationships are represented using keys, joins, etc.

Source: Difference between Cassandra and RDBMS, 2020


References:

[1] Distributed DBMS - Distributed Databases. (n.d.). https://www.tutorialspoint.com/distributed_dbms/distributed_dbms_databases.htm

[2] Kumar, H. (2022, January 24). Distributed Database System in DBMS - Scaler Topics. Scaler Topics. https://www.scaler.com/topics/dbms/distributed-database-in-dbms/

[3] Connolly, T., & Begg, C. (2015). Database Systems: A Practical Approach to Design, Implementation, and Management (6th ed.). Pearson Education Limited.

[4] Moore, L. (n.d.). What is distributed database? - Definition from WhatIs.com. SearchOracle. https://www.techtarget.com/searchoracle/definition/distributed-database

[5] Difference Between Centralized Database And Distributed Database | Programmerbay. (n.d.). https://programmerbay.com/difference-between-centralized-database-and-decentralized-database/

[6] Tutorialspoint. (2019). Distributed DBMS - Distributed Databases - Tutorialspoint. Tutorialspoint.com. https://www.tutorialspoint.com/distributed_dbms/distributed_dbms_databases.htm

[7] Apache Cassandra | Apache Cassandra Documentation. (n.d.). Cassandra.apache.org. https://cassandra.apache.org/_/cassandra-basics.html

[8] Difference between Cassandra and RDBMS. (2020, December 14). GeeksforGeeks. https://www.geeksforgeeks.org/difference-between-cassandra-and-rdbms/
