You are on page 1of 57

LECTURE 9 & 10

DISTRIBUTED DATABASE MANAGEMENT MUHAMMAD HAMIZ MOHD RADZI

SYSTEMS
COURSE OUTLINE
 Evolution of DDBMS
 Advantages and Disadvantages of DDBMS
 Distributed Processing and Distributed Database
 Characteristics of DDBMS
 DDBMS Components
 Level of Data and Process Distribution
 Distributed Database Transparency
 Distribution Transparency
 Distributed Database Design
MUHAMMAD HAMIZ MOHD RADZI
OBJECTIVES
At the end of this lesson, you should be able to:
 Describe centralize DBMS with its features and problem
 Describe the evolution of DDBMS
 Explain the advantages and disadvantages of DDBMS
 Explain the distributed processing and distributed database
 Explain the characteristics of DDBMS
 Describe the components of DDBMS
 Explain SPSD, MPSD, MPMD

MUHAMMAD HAMIZ MOHD RADZI


 Explain distribution, transaction, failure and performance transparency
 Explain fragmentation, location and local mapping transparency
 Describe the horizontal, vertical and mixed fragmentation of database design
 Explain fully, partially and un-replicated replication of database design
 Explain the data allocation of database design

MUHAMMAD HAMIZ MOHD RADZI


INTRODUCTION
 Previously, database is known as one large single logical data.
Application
program 1
(with data
semantics)
DBMS
description
Application
program 2 manipulation
(with data database
semantics)
control

Application
program 3
(with data
semantics) MUHAMMAD HAMIZ MOHD RADZI
 Database is located at the server and processing is split between server and client
to lessen data traffic on the network.
 However, there a few limitations to the design of a database that makes DBMS is
not desirable.
 Hence, concept of DDBMS is came into the context.
 DDBMS means the database are fragmented into few parts and are allocated to
few places.
 Even though the database are allocated into few places, the DDBMS treats the
database as single logical data.

MUHAMMAD HAMIZ MOHD RADZI


EVOLUTION OF DDBMS
 During the 1970s, corporations implemented centralized database management
systems to meet their structured information needs.

 Structured information is usually presented as regularly issued formal reports in a


standard format.

 Thus, structured information needs are well served by centralized systems.

 Required that corporate data be stored in a single central site and data access
provided through dumb terminals

MUHAMMAD HAMIZ MOHD RADZI


CENTRALIZED DATABASE
A database system which resides at one of the nodes of a network of
computers.
Site 1
Site 2

Site 5

Communication
Network

Site 4 Site 3
MUHAMMAD HAMIZ MOHD RADZI
PROBLEMS WITH CENTRALIZED DB
 Performance degradation as number of remote sites grew

 High cost to maintain large centralized DBs

 Reliability problems with one, central site

 The site with the database can become a bottleneck.

 Data availability is not efficient

 Possible availability problem: if the site with the database goes down, there can be
no data access.
MUHAMMAD HAMIZ MOHD RADZI
DDB CONCEPT
 Instead of having one, centralized database, we are going to spread the data out
among various cities on the distributed network, each of which has its own
computer and data storage facilities.

 All of this distributed data is still considered to be a single logical database.

 Location transparency - The user just issues the query, and the result is returned.

 It is not necessary to know where on the network the data being sought is located.
MUHAMMAD HAMIZ MOHD RADZI
DDBMS
 A distributed database (DDB) is a collection of multiple, logically interrelated
databases distributed over a computer network.

 A distributed database management system (DDBMS) is the software that


manages the DDB and provides an access mechanism that makes this distribution
transparent to the users.

MUHAMMAD HAMIZ MOHD RADZI


FACTORS THAT AIDED DDBMS TO COPE WITH
TECHNOLOGICAL ADVANCEMENT
Acceptance of Internet as a platform for business

Mobile wireless revolution

Usage of application as a service

Focus on mobile business intelligence

MUHAMMAD HAMIZ MOHD RADZI


DISTRIBUTING THE DATA
 Headquartered in NY, a company’s database
consists of 6 large tables: A, B, C, D, E, F.

 With a centralized database, all 6 tables


would be located in NY.

 The company has major sites in Los Angeles,


Memphis, New York, Paris, and Tokyo.

 The first and simplest idea in distributing the


data would be to disperse the six tables
among the five sites, perhaps based on
frequency of use of each table.
MUHAMMAD HAMIZ MOHD RADZI
DISTRIBUTING THE DATA
 Tables A and B are kept at New York

 Table C is moved to Memphis

 Tables D and E are moved to Tokyo

 Table F is moved to Paris.

MUHAMMAD HAMIZ MOHD RADZI


DISTRIBUTING THE DATA
 Paris employees can now access Table F without incurring
telecommunications costs associated with accessing Table F in NY.

 Local autonomy - Paris employees, e.g., can take responsibility for Table F -
- its security, backup and recovery, and concurrency control.

MUHAMMAD HAMIZ MOHD RADZI


DISTRIBUTING THE DATA: PROBLEMS
 When the database was centralized at New York, a query issued
at any of the sites that required a join of two or more of the
tables could be handled in the standard way by the computer at
New York.

 The result would then be sent to the site that issued the query.

 In the dispersed approach, a join might require tables located at


different sites!
MUHAMMAD HAMIZ MOHD RADZI
ADVANTAGES AND DISADVANTAGES OF DDBMS
Advantages Disadvantages

• Data are located near • Complexity of management


greatest demand site and control
• Faster data access and • Technological difficulty
processing • Security
• Growth facilitation • Lack of standards
• Improved communications • Increased storage and
• Reduced operating costs infrastructure requirements
• User-friendly interface • Increased training cost
• Less danger of a single- • Costs incurred due to the
point failure requirement of duplicated
• Processor independence infrastructure
MUHAMMAD HAMIZ MOHD RADZI
DISTRIBUTED PROCESSING AND DISTRIBUTED
DATABASES
 Distributed processing – a database’s logical processing is shared among two or
more physically independent sites that are connected through a network
 One computer performs I/O, data selection and validation while second
computer creates reports
 Uses a single-site database but the processing chores are shared among several
sites

 Distributed database – stores a logically related database over two or more


physically independent sites. The sites are connected via a network
 Database is composed of database fragments which are located at different
sites and may also be replicated among various sites

MUHAMMAD HAMIZ MOHD RADZI


DISTRIBUTED PROCESSING ENVIRONMENT

MUHAMMAD HAMIZ MOHD RADZI


DISTRIBUTED DATABASE ENVIRONMENT

MUHAMMAD HAMIZ MOHD RADZI


 Distributed processing does not require a distributed database, but a
distributed database requires distributed processing (each database
fragment is managed by its own local database process).

 Distributed processing may be based on a single database located on a


single computer. For the management of distributed data to occur, copies or
parts of the database processing functions must be distributed to all data
storage sites.

 Both distributed processing and distributed databases require a network to


connect all components.

MUHAMMAD HAMIZ MOHD RADZI


CHARACTERISTICS OF DDBMS
A DBMS must have at least the following functions to be classified as distributed:

 Application interface to interact with the end user, application programs, and other DBMSs
within the distributed database.
 Validation to analyze data requests for syntax correctness.
 Transformation to decompose complex requests into atomic data request components.
 Query optimization to find the best access strategy. (Which database fragments must be
accessed by the query, and how must data updates, if any, be synchronized?)
 Mapping to determine the data location of local and remote fragments.
 I/O interface to read or write data from or to permanent local storage.
MUHAMMAD HAMIZ MOHD RADZI
 Formatting to prepare the data for presentation to the end user or to an
application program.

 Security to provide data privacy at both local and remote databases.

 Backup and recovery to ensure the availability and recoverability of the database
in case of a failure.

 DB administration features for the database administrator.

 Concurrency control to manage simultaneous data access and to ensure data


consistency across database fragments in the DDBMS.

 Transaction management to ensure that the data moves from one consistent state to
another.

 This activity includes the synchronization of local and remote transactions as well as
transactions across multiple distributed segments.
MUHAMMAD HAMIZ MOHD RADZI
 Must perform all the functions of a centralized DBMS

 Must handle all necessary functions imposed by the distribution of data and
processing

 Must perform these additional functions transparently to the end user

MUHAMMAD HAMIZ MOHD RADZI


DDBMS COMPONENTS

MUHAMMAD HAMIZ MOHD RADZI


 Computer workstations or remote devices (sites or nodes) that form the network
system.
 Network hardware and software components that reside in each workstation or
device. The network components allow all sites to interact and exchange data.
 Communications media that carry the data from one node to another. The DDBMS
must be communications media-independent; that is, it must be able to support
several types of communications media.
 The transaction processor (TP), which is the software component found in each
computer or device that requests data. The transaction processor receives and
processes the application’s data requests (remote and local). The TP is also known
as the application processor (AP) or the transaction manager (TM).
 The data processor (DP), which is the software component residing on each
computer or device that stores and retrieves data located at the site. The DP is also
known as the data manager (DM). A data processor may even be a centralized
DBMS.

MUHAMMAD HAMIZ MOHD RADZI


MUHAMMAD HAMIZ MOHD RADZI
FUNCTIONS OF DISTRIBUTED DBMS
 Receives the request of an application
 Validates analyzes, and decomposes the request
 Maps the request
 Decomposes request into several I/O operations
 Searches and validates data
 Ensures consistency, security, and integrity
 Validates data for specific conditions
 Presents data in required format

MUHAMMAD HAMIZ MOHD RADZI


LEVELS OF DATA & PROCESS DISTRIBUTION
 Current database systems can be classified on the basis of how process distribution
and data distribution are supported.
 For example, a DBMS may store data in a single site (centralized DB) or in multiple
sites (distributed DB) and may support data processing at a single site or at
multiple sites.

MUHAMMAD HAMIZ MOHD RADZI


SINGLE-SITE PROCESSING, SINGLE-SITE DATA
(SPSD)
Processing is done on a single host computer
Data stored on host computer’s local disk
Processing restricted on end user’s side
DBMS is accessed by dumb terminals

MUHAMMAD HAMIZ MOHD RADZI


FIGURE 12.6 - SINGLE-SITE PROCESSING, SINGLE-
SITE DATA (CENTRALIZED)

MUHAMMAD HAMIZ MOHD RADZI


MULTIPLE-SITE PROCESSING, SINGLE-SITE DATA
(MPSD)
 Multiple processes run on different computers sharing a single data repository
 Require network file server running conventional applications
 Accessed through LAN
 Client/server architecture
 Reduces network traffic
 Processing is distributed
 Supports data at multiple sites

MUHAMMAD HAMIZ MOHD RADZI


MULTIPLE-SITE PROCESSING, SINGLE-SITE DATA

MUHAMMAD HAMIZ MOHD RADZI


MULTIPLE-SITE PROCESSING, SINGLE-SITE DATA
(MPSD)
 Fully distributed database management system
 Support multiple data processors and transaction processors at multiple sites

 Classification of DDBMS depending on the level of support for various types of


databases
 Homogeneous: Integrate multiple instances of same DBMS over a network
 Heterogeneous: Integrate different types of DBMSs
 Fully heterogeneous: Support different DBMSs, each supporting different data
model

MUHAMMAD HAMIZ MOHD RADZI


RESTRICTIONS OF DDBMS
 Remote access is provided on a read-only basis
 Restrictions on the number of remote tables that may be accessed in a single
transaction
 Restrictions on the number of distinct databases that may be accessed
 Restrictions on the database model that may be accessed

MUHAMMAD HAMIZ MOHD RADZI


DISTRIBUTED DATABASE TRANSPARENCY

Distribution Transaction Failure


transparency transparency transparency

Performance Heterogeneity
transparency transparency

MUHAMMAD HAMIZ MOHD RADZI


 Distribution transparency – user does not know where data is located and if
replicated or partitioned

 Transaction transparency – transaction can update at several network sites to


ensure data integrity

 Failure transparency – system continues to operate in the event of a node failure


(other nodes pick up lost functionality)

 Performance transparency – allows system to perform as if it were a centralized


DBMS. No performance degradation due to use of a network or platform
differences

 Heterogeneity transparency – allows the integration of several different local


DBMSs under a common schema
MUHAMMAD HAMIZ MOHD RADZI
DISTRIBUTION TRANSPARENCY
 Allows management of a physically dispersed database as though it were a centralized
database

 Supported by a distributed data dictionary (DDD) which contains the description of the entire
database as seen by the DBA
 The DDD is itself distributed and replicated at the network nodes

 Three levels of distribution transparency are recognized:

1. Fragmentation transparency – user does not need to know if a database is partitioned;


fragment names and/or fragment locations are not needed
2. Location transparency – fragment name, but not location, is required
3. Local mapping transparency – user must specify fragment name and location
MUHAMMAD HAMIZ MOHD RADZI
MUHAMMAD HAMIZ MOHD RADZI
The EMPLOYEE table is divided among three locations (no replication)

Suppose an employee wants to find all employees with a birthdate prior to jan
1, 1940

 Fragmentation transparency-
 SELECT * FROM EMPLOYEE WHERE EMP_DOB < ’01-JAN-1940’;

 Location transparency-
 SELECT * FROM E1 WHERE EMP_DOB < ’01-JAN-1940’ UNION SELECT * FROM E2 …
UNION SELECT * FROM E3…;

 Local Mapping Transparency


 SELECT * FROM E1 NODE NY WHERE EMP_DOB < ’01-JAN-1940’ UNION SELECT * FROM
E2 NODE ATL … UNION SELECT * FROM E3 NODE MIA…;

MUHAMMAD HAMIZ MOHD RADZI


DISTRIBUTED DATABASE DESIGN
Data fragmentation:

 How to partition the database into fragments

Data replication:

 Which fragments to replicate

Data allocation:

 Where to locate those fragments and replicas

MUHAMMAD HAMIZ MOHD RADZI


DATA FRAGMENTATION
 Breaks single object into two or more segments or fragments

 Each fragment can be stored at any site over a computer network

 Information about data fragmentation is stored in the distributed data


catalog (DDC), from which it is accessed by the TP to process user
requests

MUHAMMAD HAMIZ MOHD RADZI


DATA FRAGMENTATION STRATEGIES / TYPES
Horizontal fragmentation:
 Division of a relation into subsets (fragments) of tuples (rows)
 Same Schema

Vertical fragmentation:
 Division of a relation into attribute (column) subsets
 Each fragment must contain the primary key

Mixed fragmentation:
 Combination of horizontal and vertical strategies MUHAMMAD HAMIZ MOHD RADZI
MUHAMMAD HAMIZ MOHD RADZI
HORIZONTAL FRAGMENTATION EXAMPLE

MUHAMMAD HAMIZ MOHD RADZI


VERTICALLY FRAGMENTED TABLE CONTENTS

MUHAMMAD HAMIZ MOHD RADZI


A SAMPLE CUSTOMER TABLE

MUHAMMAD HAMIZ MOHD RADZI


HORIZONTAL FRAGMENTATION OF THE CUSTOMER
TABLE BY STATE

MUHAMMAD HAMIZ MOHD RADZI


VERTICALLY FRAGMENTED TABLE CONTENTS

Two separate areas in the company use different fields of the table in
the daily activities – the SERVICE dept and the COLLECTIONS dept
MUHAMMAD HAMIZ MOHD RADZI
MIXED FRAGMENTATION OF THE CUSTOMER TABLE

The table is divided horizontally by the three states and within


each state there is a vertical fragmentation by department
MUHAMMAD HAMIZ MOHD RADZI
TABLE CONTENTS AFTER THE MIXED FRAGMENTATION PROCESS

MUHAMMAD HAMIZ MOHD RADZI


CORRECTNESS RULES
Completeness
 Any data item must be covered by at least one fragment

Reconstruction
 There exist relation operators (select, project, join…) to reconstruct the original table from
fragments
 Horizontal fragmentation - Union
 Vertical fragmentation - Join
 Hybrid fragmentation - combination of relational operators

Disjointness
 Remove duplicate data
 Horizontal fragmentation - a row cannot appear in more than one fragment.
 Vertical fragmentation – non-key attributes cannot appear in more than one fragment.
MUHAMMAD HAMIZ MOHD RADZI
DATA REPLICATION
 Storage of data copies at multiple sites served by a computer network

 Fragment copies can be stored at several sites to serve specific information


requirements

 Can enhance data availability and response time

 Can help to reduce communication and total query costs

 Imposes additional processing overhead

 Which copy do you read when submitting a query

 All copies must be updated when a write occurs


MUHAMMAD HAMIZ MOHD RADZI
DATA REPLICATION

MUHAMMAD HAMIZ MOHD RADZI


REPLICATION SCENARIOS
Fully replicated database:
 Stores multiple copies of each database fragment at multiple sites
 Can be impractical due to amount of overhead

Partially replicated database:


 Stores multiple copies of some database fragments at multiple sites
 Most DDBMSs are able to handle the partially replicated database well

Un-replicated database:
 Stores each database fragment at a single site
 No duplicate database fragments
 Database size, usage frequency and costs (performance, overhead, management)
influence the decision to replicate
MUHAMMAD HAMIZ MOHD RADZI
DATA ALLOCATION
Deciding where to locate data: Data distribution over a computer network is achieved
through data partition, data replication, or a combination of both

Allocation strategies:

Centralized data allocation


 Entire database is stored at one site

Partitioned data allocation


 Database is divided into several disjointed parts (fragments) and stored at several
sites

Replicated data allocation


 Copies of one or more database fragments are stored at several sites MUHAMMAD HAMIZ MOHD RADZI
REFERENCES
Database Systems: A Practical Approach to Design, Implementation, and Management,
Thomas Connolly and Carolyn Begg, 5th Edition, 2010, Pearson.

Fundamental of Database Management Systems, Mark L. G., 2nd Edition, 2012, John
Wiley.

MUHAMMAD HAMIZ MOHD RADZI

You might also like