Lecture 4

April 5, 2024 Distributed Databases - CSC451 1


Recap
• Objectives of a distributed system
• Transparencies in a Distributed DBMS
    • Data Independence
    • Network Transparency
    • Replication Transparency
    • Fragmentation Transparency
    • Language Transparency
• Architectural Model Design Issues
    • Autonomy
    • Distribution
    • Heterogeneity
• Architectural Models for Distributed DBMSs
    • Standard ANSI/SPARC Architecture
    • Distributed DBMS Architecture
    • Multi-DBMS Architecture


Outline
Architectural Models for Distributed DBMSs …
Client Server Architecture
Distributed Database Design
Design Approaches
How to distribute the data
Replicated and non-replicated database distribution
Fragmentation
Allocation


Multiple Clients / Single Server


Multiple Clients / Multiple Servers

• SQL Interface
• Programmatic interface
• Other application support environments


Distributed Database Design
Design Approaches
    Top down
        Mostly in designing systems from scratch
        Mostly in homogeneous systems
    Bottom up
        When the databases already exist at a number of sites


Top down design
[Figure: top-down design process. Requirements Analysis → Objectives → Conceptual Design and View Design (coordinated through View Integration, both taking User Input) → GCS, Access Information, ES's → Distribution Design (taking User Input) → LCS's → Physical Design → LIS's]


Dimensions of the Design Problems
[Figure: three axes of the design problem. Level of sharing: data, data + program. Access pattern behavior: static, dynamic. Level of knowledge: partial information, complete information]


Distribution Design Issues
Why fragment at all?
How to fragment?
How much to fragment?
How to test correctness?
How to allocate?
Information requirements?


Fragmentation
What is a reasonable unit of distribution?
Relation
    • Requires extra communication
Fragments of relations (sub-relations)
    • Views that cannot be defined on a single fragment will require extra processing
    • Semantic data control (especially integrity enforcement) will be more difficult


Horizontal Fragmentation
• PROJ1: Projects with budgets less than KES 200,000
• PROJ2: Projects with budgets greater than or equal to KES 200,000

PROJ
PNO  PNAME                 BUDGET   LOC
P1   Instrumentation       150,000  Nairobi
P2   Database development  135,000  Kisumu
P3   KSM/BSA               250,000  Kisumu
P4   Maintenance           310,000  Kakamega
P5   KSM/BSA               500,000  Eldoret

PROJ1
PNO  PNAME                 BUDGET   LOC
P1   Instrumentation       150,000  Nairobi
P2   Database development  135,000  Kisumu

PROJ2
PNO  PNAME        BUDGET   LOC
P3   KSM/BSA      250,000  Kisumu
P4   Maintenance  310,000  Kakamega
P5   KSM/BSA      500,000  Eldoret
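
The split above can be reproduced with a relational selection on BUDGET. The following is a minimal sketch, assuming the relation is held in memory as a list of tuples; the helper name `select` and the constant `BUDGET` are illustrative and not part of the lecture.

```python
# PROJ relation from the slide, as (PNO, PNAME, BUDGET, LOC) tuples.
PROJ = [
    ("P1", "Instrumentation", 150_000, "Nairobi"),
    ("P2", "Database development", 135_000, "Kisumu"),
    ("P3", "KSM/BSA", 250_000, "Kisumu"),
    ("P4", "Maintenance", 310_000, "Kakamega"),
    ("P5", "KSM/BSA", 500_000, "Eldoret"),
]

BUDGET = 2  # position of the BUDGET attribute inside each tuple


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# Each horizontal fragment is a selection over the whole relation.
PROJ1 = select(PROJ, lambda row: row[BUDGET] < 200_000)
PROJ2 = select(PROJ, lambda row: row[BUDGET] >= 200_000)

print([row[0] for row in PROJ1])  # ['P1', 'P2']
print([row[0] for row in PROJ2])  # ['P3', 'P4', 'P5']
```

Every tuple keeps all four attributes; only the set of rows differs between the two fragments.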


Vertical Fragmentation
• PROJ1: Information about project budgets
• PROJ2: Information about project names and locations

PROJ
PNO  PNAME                 BUDGET   LOC
P1   Instrumentation       150,000  Nairobi
P2   Database development  135,000  Kisumu
P3   KSM/BSA               250,000  Kisumu
P4   Maintenance           310,000  Kakamega
P5   KSM/BSA               500,000  Eldoret

PROJ1
PNO  BUDGET
P1   150,000
P2   135,000
P3   250,000
P4   310,000
P5   500,000

PROJ2
PNO  PNAME                 LOC
P1   Instrumentation       Nairobi
P2   Database development  Kisumu
P3   KSM/BSA               Kisumu
P4   Maintenance           Kakamega
P5   KSM/BSA               Eldoret
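
A vertical split is a projection that keeps the key PNO in every fragment, so the original relation can be rebuilt with a join on that key. The sketch below assumes in-memory tuples; the helper `project` and the variable names are illustrative only.

```python
# PROJ relation from the slide.
COLUMNS = ("PNO", "PNAME", "BUDGET", "LOC")
PROJ = [
    ("P1", "Instrumentation", 150_000, "Nairobi"),
    ("P2", "Database development", 135_000, "Kisumu"),
    ("P3", "KSM/BSA", 250_000, "Kisumu"),
    ("P4", "Maintenance", 310_000, "Kakamega"),
    ("P5", "KSM/BSA", 500_000, "Eldoret"),
]


def project(relation, columns, wanted):
    """Relational projection: keep only the wanted attributes, in order."""
    positions = [columns.index(name) for name in wanted]
    return [tuple(row[i] for i in positions) for row in relation]


# Both fragments repeat the key PNO so that they can be joined again.
PROJ1 = project(PROJ, COLUMNS, ("PNO", "BUDGET"))
PROJ2 = project(PROJ, COLUMNS, ("PNO", "PNAME", "LOC"))

# Reconstruction: join the two fragments on PNO.
by_pno = {pno: (pname, loc) for pno, pname, loc in PROJ2}
rebuilt = [(pno, by_pno[pno][0], budget, by_pno[pno][1]) for pno, budget in PROJ1]

print(rebuilt == PROJ)  # True
```

The repeated key is the price of being able to reconstruct the relation; the non-key attributes appear in exactly one fragment each.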


Correctness of Fragmentation
• Completeness
    • Decomposition of relation R into fragments R1, R2, ..., Rn is complete if and only if each data item in R can also be found in some Ri
• Reconstruction
    • If relation R is decomposed into fragments R1, R2, ..., Rn, then there should exist some relational operator ∇ such that
      $R = \nabla_{1 \le i \le n} R_i$
• Disjointness
    • If relation R is decomposed into fragments R1, R2, ..., Rn, and data item di is in Rj, then di should not be in any other fragment Rk (k ≠ j).
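
For a horizontal fragmentation, the three rules can be checked mechanically. The sketch below assumes that a data item is a whole tuple and that the reconstruction operator is set union, which holds for horizontal fragments only; the function names and the sample data are hypothetical.

```python
def is_complete(relation, fragments):
    """Every tuple of the relation appears in at least one fragment."""
    found = set().union(*map(set, fragments))
    return set(relation) <= found


def is_reconstructible(relation, fragments):
    """The union of the fragments gives back exactly the relation."""
    return set().union(*map(set, fragments)) == set(relation)


def is_disjoint(fragments):
    """No tuple appears in more than one fragment."""
    seen = set()
    for fragment in fragments:
        for row in fragment:
            if row in seen:
                return False
            seen.add(row)
    return True


# Hypothetical sample relation of (PNO, BUDGET) tuples.
R = [("P1", 150_000), ("P3", 250_000), ("P5", 500_000)]

good = [[R[0]], [R[1], R[2]]]               # satisfies all three rules
overlapping = [[R[0], R[1]], [R[1], R[2]]]  # P3 is stored twice
lossy = [[R[0]], [R[2]]]                    # P3 is lost

print(is_complete(R, good), is_reconstructible(R, good), is_disjoint(good))
print(is_disjoint(overlapping))
print(is_complete(R, lossy))
```

For vertical fragments the same idea applies with a join in place of the union, and disjointness is then required of the non-key attributes only.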


Allocation Alternatives
Non-replicated
    partitioned: each fragment resides at only one site
Replicated
    fully replicated: each fragment at each site
    partially replicated: each fragment at some of the sites
Rule of thumb:
    If $\frac{\text{read-only queries}}{\text{update queries}} \ge 1$, replication is advantageous; otherwise replication may cause problems.
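
The rule of thumb is a simple ratio test. The sketch below assumes that query counts per fragment are available from workload statistics; the function name and the handling of a workload with no updates are assumptions made for illustration, not part of the lecture.

```python
def replication_looks_advantageous(read_only_queries, update_queries):
    """Apply the rule of thumb: read-only / update >= 1 favours replication."""
    if update_queries == 0:
        # Assumption: with no updates at all, any read workload favours replication.
        return read_only_queries > 0
    return read_only_queries / update_queries >= 1


print(replication_looks_advantageous(900, 100))  # True: mostly reads
print(replication_looks_advantageous(50, 200))   # False: mostly updates
```

The ratio only signals a tendency. Every update to a replicated fragment has to reach all copies, which is why update-heavy workloads make replication costly.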


Information Requirements
Four categories:
Database information
Application information
Communication network information
Computer system information


Fragmentation
Horizontal Fragmentation
    Primary Horizontal Fragmentation (PHF)
    Derived Horizontal Fragmentation (DHF)
Vertical Fragmentation (VF)
Hybrid Fragmentation (HF)


Working Example

EMP
ENO  ENAME         TITLE
E1   Mark Njenga   Electrical Eng.
E2   Janet Musili  Syst. Analyst
E3   Emmah Otieno  Programmer
E4   Linnet Kyalo  Mech. Eng.

ASG
ENO  PNO  RESP        DUR
E1   P1   Manager     12
E2   P1   Analyst     24
E3   P2   Programmer  6

PROJ
PNO  PNAME                 BUDGET   LOC
P1   Maintenance           150,000  Nairobi
P2   Database development  85,000   Kisumu
P3   Advisory              135,000  Kisumu
P4   Assurance             97,000   Nairobi

PAY
TITLE            SAL
Electrical Eng.  40,000
Syst. Analyst    34,000
Programmer       24,000
Mech. Eng.       27,000


PHF - Information Requirements
Database information
• Relationship
[Figure: relationships among the relations PAY(TITLE, SAL), EMP(ENO, ENAME, TITLE), PROJ(PNO, PNAME, BUDGET, LOC) and ASG(ENO, PNO, RESP, DUR)]


PHF - Information Requirements
Application Information
• Find all Maintenance projects with budgets of at most 100,000
• Simple predicates
    PNAME = 'Maintenance'
    BUDGET <= 100,000
• Minterm predicates
    Given R and Pr = {p1, p2, …, pn}, define M = {m1, m2, …, mz} as
    $M = \{ m_i \mid m_i = \bigwedge_{p_j \in Pr} p_j^* \}$
    where each $p_j^*$ is either $p_j$ or $\neg p_j$, so there are at most 2^n minterm predicates.
• Example
    m1: PNAME = 'Maintenance' ∧ BUDGET <= 100,000
    m2: NOT (PNAME = 'Maintenance') ∧ BUDGET <= 100,000
    m3: PNAME = 'Maintenance' ∧ NOT (BUDGET <= 100,000)
    m4: NOT (PNAME = 'Maintenance') ∧ NOT (BUDGET <= 100,000)
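
Minterm predicates can be enumerated by taking every simple predicate either as it is or negated. The sketch below assumes that simple predicates are given as (label, function) pairs over dictionary rows; the function `minterms` and this representation are illustrative choices, and the order of the generated minterms may differ from the numbering on the slide.

```python
from itertools import product


def minterms(simple_predicates):
    """Return all 2^n conjunctions of the predicates in plain or negated form."""
    result = []
    for signs in product([True, False], repeat=len(simple_predicates)):
        parts = [
            label if keep else f"NOT ({label})"
            for (label, _), keep in zip(simple_predicates, signs)
        ]

        def test(row, signs=signs):
            # The minterm holds when each predicate has exactly the required truth value.
            return all(
                fn(row) == keep
                for (_, fn), keep in zip(simple_predicates, signs)
            )

        result.append((" AND ".join(parts), test))
    return result


simple = [
    ("PNAME = 'Maintenance'", lambda row: row["PNAME"] == "Maintenance"),
    ("BUDGET <= 100,000", lambda row: row["BUDGET"] <= 100_000),
]

for label, _ in minterms(simple):
    print(label)

# Any given row satisfies exactly one minterm.
row = {"PNAME": "Maintenance", "BUDGET": 150_000}
print([label for label, test in minterms(simple) if test(row)])
```

Because each predicate is fixed to one truth value in every minterm, two different minterms can never hold for the same tuple, which is what later makes the resulting fragments disjoint.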


Primary Horizontal Fragmentation
Definition:
$R_j = \sigma_{F_j}(R), \quad 1 \le j \le w$
where Fj is a selection formula which is (preferably) a minterm predicate.
A horizontal fragment Ri of relation R consists of all the tuples of R that satisfy a minterm predicate mi.
There are as many horizontal fragments (also called minterm fragments) as there are minterm predicates.


Selecting Simple Predicates
Given: A relation R, the set of simple predicates Pr
Output: The set of fragments R = {R1, R2, …, Rw} which obey the
fragmentation rules

Preliminaries:
• Pr should be complete
• Pr should be minimal


Completeness of Simple Predicates
• A set of simple predicates Pr is said to be complete if and only if any two tuples belonging to the same minterm fragment defined on Pr have the same probability of being accessed by any application.

• Example:
    • Assume PROJ[PNO, PNAME, BUDGET, LOC] has two applications defined on it.
    • Find the budgets of projects at each location (1)
    • Find projects with budgets less than KES 100,000 (2)


Completeness of Simple Predicates
According to (1)
Pr = {LOC="Kisumu", LOC="Nairobi"}
which is not complete with respect to (2)

Modify
Pr = {LOC="Kisumu", LOC="Nairobi", BUDGET <= 100,000, BUDGET > 100,000}
which is complete


Minimality of Simple Predicates
If a predicate influences how fragmentation is performed (i.e. it causes fragment f to be further fragmented into, say, fi and fj), then there should be at least one application that accesses fi and fj differently.
In other words, the simple predicate should be relevant in determining fragmentation.
If all predicates of a set Pr are relevant, then Pr is minimal.


Minimality of Simple Predicates
Example:
Pr = {LOC="Kisumu", LOC="Nairobi", BUDGET <= 100,000, BUDGET > 100,000}
is minimal (in addition to being complete)

However, if we add
PNAME="Instrumentation"
then Pr is not minimal, because no application accesses the resulting fragments differently on the basis of PNAME


Examples
Two candidate relations: PAY and PROJ
Fragmentation of relation PAY
• Application: Check the salary info with the aim of determining a raise
• Simple predicates
    p1: SAL <= 30,000
    p2: SAL > 30,000
    Pr = {p1, p2}, which is complete and minimal
• Minterm predicates
    m1: (SAL <= 30,000)
    m2: NOT (SAL <= 30,000) = (SAL > 30,000)

PAY
TITLE            SAL
Electrical Eng.  40,000
Syst. Analyst    34,000
Programmer       24,000
Mech. Eng.       27,000


Examples

PAY1
TITLE       SAL
Programmer  24,000
Mech. Eng.  27,000

PAY2
TITLE            SAL
Electrical Eng.  40,000
Syst. Analyst    34,000
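
The two PAY fragments follow directly from applying each minterm predicate as a selection. This is a minimal sketch, assuming PAY is held as a list of (TITLE, SAL) tuples; the function names `m1` and `m2` simply mirror the predicate labels used in the example.

```python
# PAY relation from the working example, as (TITLE, SAL) tuples.
PAY = [
    ("Electrical Eng.", 40_000),
    ("Syst. Analyst", 34_000),
    ("Programmer", 24_000),
    ("Mech. Eng.", 27_000),
]


def m1(row):
    """Minterm m1: SAL <= 30,000."""
    return row[1] <= 30_000


def m2(row):
    """Minterm m2: NOT (SAL <= 30,000), i.e. SAL > 30,000."""
    return not m1(row)


# One fragment per minterm predicate.
PAY1 = [row for row in PAY if m1(row)]
PAY2 = [row for row in PAY if m2(row)]

print(PAY1)  # [('Programmer', 24000), ('Mech. Eng.', 27000)]
print(PAY2)  # [('Electrical Eng.', 40000), ('Syst. Analyst', 34000)]
```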


Examples
Fragmentation of relation PROJ
Applications:
• (1) Find the name and budget of projects given their number (issued at two sites).
• (2) Access project information according to budget (one site accesses BUDGET <= 100,000 and the other BUDGET > 100,000).

PROJ
PNO  PNAME                 BUDGET   LOC
P1   Maintenance           150,000  Nairobi
P2   Database development  85,000   Kisumu
P3   Advisory              135,000  Kisumu
P4   Assurance             97,000   Nairobi

Simple predicates
For application (1)
    p1: LOC = "Nairobi"
    p2: LOC = "Kisumu"
For application (2)
    p3: BUDGET <= 100,000
    p4: BUDGET > 100,000
Pr = {p1, p2, p3, p4}


Examples
Fragmentation of relation PROJ…

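Minterm predicates left after elimination (Pr has four simple predicates, giving 2^4 = 16 candidate minterms, of which the contradictory ones are dropped):

m1: (LOC="Kisumu") ∧ (BUDGET <= 100,000)
m2: (LOC="Kisumu") ∧ (BUDGET > 100,000)
m3: (LOC="Nairobi") ∧ (BUDGET <= 100,000)
m4: (LOC="Nairobi") ∧ (BUDGET > 100,000)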
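
The elimination step can be sketched by enumerating all 16 sign combinations of p1 to p4 and keeping those that some (LOC, BUDGET) pair can satisfy. The code assumes, as in the working example, that projects are located only in Nairobi or Kisumu, and it uses one budget value on each side of the 100,000 threshold as representatives; these sample domains and all names are assumptions for illustration.

```python
from itertools import product

# Simple predicates p1..p4 over a (LOC, BUDGET) pair.
predicates = {
    "p1": lambda loc, budget: loc == "Nairobi",
    "p2": lambda loc, budget: loc == "Kisumu",
    "p3": lambda loc, budget: budget <= 100_000,
    "p4": lambda loc, budget: budget > 100_000,
}

# Assumed domains: two locations, one representative budget per side of the threshold.
LOCATIONS = ["Nairobi", "Kisumu"]
BUDGETS = [100_000, 100_001]


def satisfiable(signs):
    """True if some (LOC, BUDGET) pair gives every predicate its required truth value."""
    return any(
        all(pred(loc, budget) == sign
            for pred, sign in zip(predicates.values(), signs))
        for loc, budget in product(LOCATIONS, BUDGETS)
    )


all_minterms = list(product([True, False], repeat=len(predicates)))
kept = [signs for signs in all_minterms if satisfiable(signs)]

print(len(all_minterms), len(kept))  # 16 4
for signs in kept:
    print(" AND ".join(
        name if sign else f"NOT {name}"
        for name, sign in zip(predicates, signs)
    ))
```

Each surviving combination fixes one location and one budget range. Since p2 is equivalent to NOT p1 and p4 to NOT p3 under these assumptions, the four survivors simplify to the predicates m1 to m4 listed above.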


Examples
PROJ1
PNO  PNAME                 BUDGET  LOC
P2   Database development  85,000  Kisumu

PROJ2
PNO  PNAME     BUDGET   LOC
P3   Advisory  135,000  Kisumu

PROJ3
PNO  PNAME      BUDGET  LOC
P4   Assurance  97,000  Nairobi

PROJ4
PNO  PNAME        BUDGET   LOC
P1   Maintenance  150,000  Nairobi
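
The four fragments can be derived by applying m1 to m4 as selections over PROJ, and the union of the results can be compared with the original relation. This is a sketch assuming in-memory tuples; the dictionary `MINTERMS` and the other names are illustrative.

```python
# PROJ relation from the working example, as (PNO, PNAME, BUDGET, LOC) tuples.
PROJ = [
    ("P1", "Maintenance", 150_000, "Nairobi"),
    ("P2", "Database development", 85_000, "Kisumu"),
    ("P3", "Advisory", 135_000, "Kisumu"),
    ("P4", "Assurance", 97_000, "Nairobi"),
]

# Minterm predicates m1..m4, keyed by the fragment each one defines.
MINTERMS = {
    "PROJ1": lambda pno, pname, budget, loc: loc == "Kisumu" and budget <= 100_000,
    "PROJ2": lambda pno, pname, budget, loc: loc == "Kisumu" and budget > 100_000,
    "PROJ3": lambda pno, pname, budget, loc: loc == "Nairobi" and budget <= 100_000,
    "PROJ4": lambda pno, pname, budget, loc: loc == "Nairobi" and budget > 100_000,
}

fragments = {
    name: [row for row in PROJ if test(*row)]
    for name, test in MINTERMS.items()
}

for name, rows in fragments.items():
    print(name, [row[0] for row in rows])

# Reconstruction and disjointness: the union gives back PROJ with no tuple repeated.
union = [row for rows in fragments.values() for row in rows]
print(sorted(union) == sorted(PROJ))  # True
```

With this small sample each fragment holds a single project, which matches the tables above.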


Correctness
• Completeness
    • Since Pr is complete and minimal, the selection predicates are complete.
• Reconstruction
    • If relation R is decomposed into fragments R1, R2, ..., Rn, then the union operator can obtain the original relation:
      $R = \bigcup_{1 \le i \le n} R_i$
• Disjointness
    • Minterm predicates that form the basis of fragmentation should be mutually exclusive.


Summary
Architectural Models for Distributed DBMSs …
Client Server Architecture
Distributed Database Design
Design Approaches
How to distribute the data
Replicated and non-replicated database distribution
Fragmentation
Allocation


Next Lecture
Fragmentation
Derived Horizontal Fragmentation
…
