You are on page 1of 51

Distributed Database Design

Chapter 2

Distributed Database Design


2019-2020 3rd Sem2 NW
Morning/Evening/Dr. Salma

1
Outline
● Introduction
● Fragmentation
⬥ Horizontal fragmentation
⬥ Derived horizontal fragmentation
⬥ Vertical fragmentation
● Allocation

2
Outline
● Introduction
● Fragmentation
⬥ Horizontal fragmentation
⬥ Derived horizontal fragmentation
⬥ Vertical fragmentation
● Allocation

3
Distributed Database Design
● The design of a distributed computer system
involves making decisions on the placement of data
and programs across the sites of a computer
network.
● In distributed DBMSs, such placement involves two
things:
⬥ Placement of the DDBMS software
⬥ Placement of the applications that run on the database

4
Framework of Distribution
Behavior of access pattern

Dynamic
Static
Complete
No sharing
Information
Data sharing

Data + program Level of knowledge


on access pattern
sharing
behavior
Partial
Level of sharing Information
5
1st Dimension: Level of Sharing
In terms of the level of sharing, there are three possibilities.
•First, there is no sharing:
Each application and its data execute at one site, and there is no
communication with any other program or access to any data file at
other sites.
•Data sharing:
All the programs are replicated at all the sites, but data files are
not.
•Data-plus-program sharing:
Both data and programs may be shared, meaning that a program
at a given site can request a service from another program at a
second site, which, in turn, may have to access a data file located at
a third site.

6
2nd Dimension: Access Pattern Behavior

• The access patterns of user requests may be static, so


that they do not change over time, or dynamic.
• It is obviously considerably easier to plan for and
manage the static environments than would be the case
for dynamic distributed systems.

7
3rd Dimension: level of knowledge about the access
pattern behavior

• One possibility is that the designers do not have any


information about how users will access the database.
This is a theoretical possibility, but it is very difficult, if
not impossible, to design a distributed DBMS that can
effectively cope with this situation.
• The more practical alternatives are that the designers
have complete information, where the access patterns
can be predicted and do not deviate significantly from
these predictions.
• Or partial information, where there are deviations from
the predictions.
8
‫‪Design Strategies‬‬
‫●‬ ‫‪Top-Down‬‬
‫‪⬥ Mostly in designing systems from scratch‬‬
‫‪⬥ Mostly in designing homogeneous systems‬‬
‫ﺗﺳﺗﺧدم ھذه اﻹﺳﺗراﺗﯾﺟﯾﺔ ﻓﻲ ﺣﺎﻟﺔ ﻛﺎن ﻣﺻﻣﻣو ﻗواﻋد اﻟﺑﯾﺎﻧﺎت ﻟدﯾﮭم ﺗﺣﻛم ⬥‬
‫ﻛﻠﻲ )ﺳﻠطﺔ ﻣطﻠﻘﺔ( ﻓﻲ ﺗﺻﻣﯾم ﻛﺎﻣل ﻗﺎﻋدة اﻟﺑﯾﺎﻧﺎت‪ ،‬وﻓﻲ ﺗوزﯾﻊ اﻟﺑﯾﺎﻧﺎت‪،‬‬
‫وھذا ﻻ ﯾﺣدث إﻻ ﻟﻠﻣﺻﻣم اﻟﻣﺣظوظ ﻓﻲ ﺣﺎﻻت ﺧﺎﺻﺔ‬
‫●‬ ‫‪Bottom-up‬‬
‫‪⬥ When the databases already exist at a number of‬‬
‫‪sites‬‬
‫ﻋﻠﻰ ﻋﻛس اﻹﺳﺗراﺗﯾﺟﯾﺔ اﻟﺳﺎﺑﻘﺔ‪ ،‬ﻧﺳﺗﺧدم ھذه اﻹﺳﺗراﺗﯾﺟﯾﺔ ﻋﻧدﻣﺎ ﯾﻛون اﻟﻣﺻﻣﻣون ﺑدون أي ﺳﻠطﺔ ﻟﺗﺻﻣﯾم وﺗوزﯾﻊ ﻗواﻋد اﻟﺑﯾﺎﻧﺎت‪ .‬وھذا ﯾﺣدث ﻟﻠﻣﺻﻣم ﺳﯾﺊ اﻟﺣظ ﻓﻲ‬
‫‪.‬ﺣﺎﻻت ﺧﺎﺻﺔ أﯾﺿﺎ‬

‫‪Note: In this lecture, we focus on top-down design strategy.‬‬


‫‪9‬‬
Top-Down Design Process

10
Distribution Design Issues
● Fragmentation
⬥ Why fragmentation at all?
⬥ How should we fragment?
⬥ How much should we fragment?
⬥ How to test correctness of decomposition?
● Allocation
● Necessary information required for fragmentation
and allocation

11
Why Fragment?
● Can’t we just distribute relations?
● What is a reasonable unit of distribution?
⬥ Relations as unit of distribution:
- If the relation is not replicated, we get a high volume of
remote data accesses.
- If the relation is replicated, we get unnecessary
replications, which cause problems in executing
updates and waste disk space.
- Might be an OK solution, if queries need all the data in
the relation and data stays only at the sites that use the
data.
⬥ Fragments of relations (sub-relations) 12
Fragmentation
● Unit of distribution
= unit of data application accesses
☺ Reduce irrelevant data access
☺ Facilitate intra-query concurrency over different fragments
(Parallel execution of a single query)
☺ Can be used with other performance enhancing methods
(e.g., indexing and clustering)
☹ Applications have conflicting requirements, making disjoint
fragmentation a very hard problem
☹ Multiple fragment access requires join or union
☹ Semantic data control (integrity enforcement) could be very
costly and difficult.
13
About fragmentation
● How should we fragment?
⬥ Vertical Fragments sub grouping of attributes
⬥ Horizontal Fragments sub grouping of tuples
⬥ Mixed/Hybrid Fragments combination of above two

● How much to fragment?


⬥ Too little too much of irrelevant data access
⬥ Too much too much processing cost
⬥ Need to find suitable level of fragmentation

14
Correctness Criteria
● Completeness no loss of data
⬥ Decomposition of relation R into fragments R1, R2, …, Rn is
complete if and only if each data item in R can also be
found in some Ri .
● Reconstruction
⬥ If relation R is decomposed into fragments R1, R2, …, Rn
then there should exist some relational operator ∇ such
that R = ∇ 1≤ i ≤ n Ri
● Disjointness
⬥ If relation R is decomposed into fragments R1, R2, …, Rn
and data item di is in Rj, then di should not be in any other
fragment Rk (k!=j). 15
Allocation Alternatives
● Full Replication
⬥ Each fragment resides at each site
● Partial Replication
⬥ Each fragment resides at some of the sites
● Not-replicated (Partitioned)
⬥ Each fragment resides at only one site

Q & A:
What are the advantages and disadvantages?

16
Allocation Alternatives

17
Rule of Thumb for Allocation
If number-of-read-only-queries is more than
number-of-update-queries,
then replication is advantageous, otherwise
replication may cause problems.

18
Information Requirements
● Four categories
⬥ Database information
⬥ Application information
⬥ Communication network information
⬥ Computer system information

19
Outline
● Introduction
● Fragmentation
⬥ Horizontal fragmentation
⬥ Derived horizontal fragmentation
⬥ Vertical fragmentation
● Allocation

20
Fragmentation
● Horizontal fragmentation (HF)
⬥ Primary horizontal fragmentation (PHF)
– based on predicates accessing the relation
⬥ Derived horizontal fragmentation (DHF)
– based on predicates being defined on another logically related
relation

● Vertical fragmentation (VF)


● Hybrid fragmentation (HF)

21
DB example

Figure 3.3

22
PHF Information Requirements
● Database Information
> links
PAY
Title, Sal

L1
EMP PROJ
Eno,Ename, Title Jno, Jname, Budget. Loc
L2 L3
ASG
Fig (3.7)
Eno, Jno, Resp, Dur
● > Cardinality of each relation card (R): The number of
tuples of the relation. 23
Database information
● Figure 3.7 shows the expression of links among the
database relations.
● Note that the direction of the link shows a one-to-many
relationship.
● For example, for each title there are multiple employees
with that title; thus there is a link between the PAY and
EMP relations.
● Along the same lines, the many-to-many relationship
between the EMP and PROJ relations is expressed with
two links to the ASG relation.
● The relation at the tail of a link is called the owner of the
link and the relation at the head is called the member.
24
PHF Information Requirements (cont.)
● Application Information 1
⬥ Simple predicate: Given R(A1 , A2 , ..., An ), with each Ai
having domain of values dom(Ai), a simple predicate pj is
pj : Ai θ Value
where θ ∈{=,<, >, ≤, ≥, ≠} and Value ∈ dom(Ai).

Example

Pname=“maintenance”
Budget ≤ 200000

25
PHF Information Requirements (cont.)
● Application Information 2
⬥ minterm predicate: Given R and a set of simple
predicates Pr = {p1, p2, ..., pm} on R, the set of minterm
predicates M = {m 1, m 2 , ..., m z } is defined as
M = {mi | mi = ∧Pj ∈ Pr p*j }, 1 ≤ i ≤ z, 1 ≤ j ≤ z
where p*j = pj or ¬pj

Example
m1: (Pname=“maintenance”) ∧ (Budget ≤ 200000)
m2: ¬ (Pname=“maintenance”) ∧ (Budget ≤ 200000)
m3: (Pname=“maintenance”) ∧ ¬ (Budget ≤ 200000)
m4: ¬ (Pname=“maintenance”) ∧ ¬ (Budget ≤ 200000) 26
Minterm predicate
● Even though simple predicates are quite
elegant to deal with, user queries quite often
include more complicated predicates, which
are Boolean combinations of simple
predicates.
● One combination that we are particularly
interested in, called a minterm predicate, is
the conjunction of simple predicates.

27
Example
● Consider relation PAY . The following are some of the possible
simple predicates that can be defined on PAY.
p1: TITLE = “Elect. Eng.”
p2: TITLE = “Syst. Anal.”
p3: TITLE = “Mech. Eng.”
p4: TITLE = “Programmer”
p5: SAL ≤ 30000
● The following are some of the minterm predicates that can be defined
based on these simple predicates.
m1: TITLE = “Elect. Eng.” ^ SAL ≤ 30000
m2: TITLE = “Elect. Eng.” ^ SAL > 30000
m3: ┐(TITLE = “Elect. Eng.”) ^ SAL ≤ 30000
m4: ┐(TITLE = “Elect. Eng.”) ^ SAL > 30000
m5: TITLE = “Programmer” ^ SAL ≤ 30000
m6: TITLE = “Programmer” ^ SAL > 30000

28
Example (continued)
● There are a few points to mention here. First, these are not all the minterm
predicates that can be defined; we are presenting only a representative
sample.
● Second, some of these may be meaningless given the semantics of relation
PAY; we are not addressing that issue here.
● Third, these are simplified versions of the minterms. The minterm definition
requires each predicate to be in a minterm in either its natural or its negated
form. Thus, m1, for example, should be written as:
m1: TITLE = “Elect. Eng.” ^ TITLE ≠ “Syst. Anal.” ^
TITLE ≠ “Mech. Eng.” ^ TITLE ≠ “Programmer” ^ SAL ≤ 30000
● However, clearly this is not necessary, and we use the simplified form.
● Finally, note that there are logically equivalent expressions to these
minterms; for example, m3 can also be rewritten as
m3: TITLE ≠ “Elect. Eng.” ^ SAL ≤ 30000

29
PHF Information Requirements (cont.)
● Application Information 3
⬥ Minterm selectivity: sel (mi)
– number of tuples of the relation that would be accessed by a
user query specified according to a given minterm predicate.
⬥ Access frequency: acc (qi)
– frequency with which user applications access data. If Q = {q1 ,
q2 , ..., qn } is the set of queries, acc (qi ) indicates access
frequency of query qi in a given period.

30
Primary Horizontal Fragmentation
● Each horizontal fragment Ri of relation R is defined
by Ri = σFi (R), 1 ≤ i ≤ w, where Fi is a selection formula,
which is (preferably) a minterm predicate.
⬥ A horizontal fragment Ri of relation R consists of all the
tuples of R which satisfy a minterm predicate mi.
● Given a set of minterm predicates M, there are as
many as horizontal fragments of relation R as there
are minterm predicates.
● Set of horizontal fragments also referred to as
minterm fragments.
31
Example 3.7
● The decomposition of relation PROJ into
horizontal fragments PROJ1 and PROJ2 in
Example 3.1 is defined as follows:
PROJ1 = σ BUDGET ≤ 200000 (PROJ)
PROJ2 = σ BUDGET > 200000 (PROJ)

32
Completeness of Simple Predicates
● A set of simple predicates Pr is said to be
complete if and only if the accesses to the tuples
of the minterm fragments defined on Pr requires
that two tuples of the same minterm fragment
have the same probability of being accessed by
any application.

33
Example 3.8
Consider relation PROJ of Figure 3.3. We can
define the following horizontal fragments based on
the project location. The resulting fragments are
shown in Figure 3.8.
PROJ1 = σ LOC=“Montreal” (PROJ)
PROJ2 = σ LOC=“New York” (PROJ)
PROJ3 = σ LOC=“Paris” (PROJ)

34
Example 3.8

35
Example 3.9
● Consider the fragmentation of relation PROJ given in Example
3.8. If the only application that accesses PROJ wants to access
the tuples according to the location, the set is complete since
each tuple of each fragment PROJi (Example 3.8) has the same
probability of being accessed.
● If, however, there is a second application which accesses only
those project tuples where the budget is less than or equal to
$200,000, then Pr is not complete. Some of the tuples within each
PROJi have a higher probability of being accessed due to this
second application.
● To make the set of predicates complete, we need to add
(BUDGET ≤ 200000, BUDGET > 200000) to Pr:
Pr = { LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,
BUDGET ≤ 200000, BUDGET > 200000}
36
Example 3.9

Applications:
Q1: Find the projects at each location
Q2: Find projects with budget less than $200,000 incomplete
Predicates:
 Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”}
 Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,
BUDGET≤ 2000000, BUDGET >200000}
37
Minimality of Simple Predicates
● If a predicate influences how fragmentation is
performed, (i.e., causes a fragment f to be further
fragmented into, say, fi and fj), then there should be
at least one application that accesses fi and fj
differently.
● In other words, the simple predicate should be
relevant in determining a fragmentation.
● If all the predicates of a set Pr are relevant, then Pr is
minimal.

38
Minimality of Simple Predicates
Example 3.10
Applications:
Q1: Find the projects at each location
Q2: Find projects with budget less than $200,000
minimal

 Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,


BUDGET ≤ 200000, BUDGET >200000}
 Pr={ LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,
BUDGET ≤ 200000, BUDGET >200000,
PNAME=“Instrumentation”}
● The resulting set would not be minimal since the new predicate is not relevant
with respect to Pr – there is no application that would access the resulting
fragments any differently. 39
Contradictory Minterms
● Given a minimal and complete set of simple
predicates, containing n simple predicates
● Not all the minterm fragments derived are valid
⬥ A fragment can be self contradictory because of
implications among simple predicates.
Example:
Dom(Sal): [10000, 200000]; Dom(Loc) = {HK,SF}
p1 : sal < 50000; p2 : Loc = HK; p3 : Loc = SF
Note: p2 🡺(¬ p3 ); p3 🡺(¬ p2 )
the minterm p1 ∧ p2 ∧ p3 is self contradictory.
40
PHF Example: PAY PAY

Application:
Check the salary info and determine raise
Employee records kept at two sites ==>
application runs at two sites

Simple predicates: P1: SAL ≤ 30000, P2: SAL > 30000


Pr = {P1, P2}, which is complete and minimal
Pr’ = Pr
Minterm predicates: m1: (SAL ≤ 30000); m2: (SAL > 30000)
PAY1 PAY2

41
PHF Example 3.11 : PROJ
PROJ
Applications
⬥ Find the name and budget of
projects given their locations
– Issued at three sites
⬥ Access project information according
to budget p1: LOC = “Montreal”
– One site access ≤ 200000; other p2 : LOC = “New York”
accesses >200000 p3 : LOC = “Paris”

m1: LOC = “Montreal” ∧ BUDGET ≤


p4: BUDGET ≤ 200000
200000
p5 : BUDGET > 200000
m2 : LOC = “Montreal” ∧ BUDGET >
200000
m3 : LOC = “New York”∧ BUDGET ≤
200000
42
PHF Example: PROJ - Result
PROJ PROJ1

PROJ3

m1: LOC = “Montreal” ∧ BUDGET ≤ 200000


m2 : LOC = “Montreal” ∧ BUDGET > 200000 PROJ4
m3 : LOC = “New York” ∧ BUDGET ≤ 200000
m4 : LOC = “New York” ∧ BUDGET > 200000
m5 : LOC = “Paris” ∧ BUDGET ≤ 200000
m6 : LOC = “Paris” ∧ BUDGET > 200000 PROJ6
The result of the primary horizontal fragmentation of PROJ is to
form six fragments FPROJ = {PROJ1, PROJ2, PROJ3, PROJ4,
PROJ5, PROJ6} of relation PROJ according to the minterm
predicates M. Since fragments PROJ2, and PROJ5 are empty, they
are not depicted in the figure.
43
PHF - Correctness
● Completeness
⬥ Since Pr is complete and minimal, the selection
predicates are complete
● Reconstruction
⬥ If relation R is fragmented into FR = {R1, R2, …, Rr}
R = ∪ Ri , ∀ Ri ∈FR

● Disjointness
⬥ Minterm predicates that form the basis of fragmentation
should be mutually exclusive

44
DHF: Derived Horizontal Fragmentation
DHF: Defined on a
member relation
Owner according to a
relation
PAY selection operation
Title, Sal Each link is on its owner
an equijoin
L1
EMP PROJ
Eno, Ename, Title Pno, Pname, Budget, Loc
L2 L3
ASG
Member
Eno, Pno, Resp, Dur
relation
45
PAY
Title, Sal
DHF – Example
L1
EMP
PAY1= σ SAL≤ 30000 PAY PAY2= σ SAL> 30000 PAY
Eno, Ename, Title

EMP1 = EMP PAY1


EMP

DHF
EMP2 = EMP PAY2

46
DHF: Derived Horizontal Fragmentation
● Let S be horizontally fragmented and let there be a
link L with owner(L) = S, and member(L) = R, the
derived horizontal fragments of R are defined as
Ri = R Si , 1 ≤ i ≤ w
where Si is the horizontal fragment of S, is the
semijoin operator, and w is the maximum number of
fragments
● Inputs to derived horizontal fragmentation:
⬥ partitions of owner relation
⬥ member relation
⬥ the semijoin condition
● The algorithm is straight forward. 47
DHF: Correctness
● Completeness
⬥ primary horizontal fragmentation based on completeness
of selection predicates. For derived horizontal
fragmentation based on referential integrity
● Reconstruction
⬥ Same as primary horizontal fragmentation (via union)
● Disjointedness
⬥ Simple join graphs between the owner and the member
fragments.

48
DHF: Issues
● Multiple owners for a member relation; how
should we derived horizontally fragments of a
member relation?
● There could be a chain of derived horizontal
fragmentation.

49
Example 3.13
Let us now consider ASG. Assume that there are the following two applications:
1. The first application finds the names of engineers who work at certain places. It
runs on all three sites and accesses the information about the engineers who work
on local projects with higher probability than those of projects at other locations.
2. At each administrative site where employee records are managed, users would
like to access the responsibilities on the projects that these employees work on
and learn how long they will work on those projects.
The first application results in a fragmentation of ASG according to the (nonempty)
fragments PROJ1, PROJ3, PROJ4 and PROJ6 of PROJ obtained in Example
3.11. Remember that
PROJ1: σ LOC=“Montreal” ^ BUDGET ≤ 200000 (PROJ)
PROJ3: σ LOC=“New York” ^ BUDGET ≤ 200000 (PROJ)
PROJ4: σ LOC=“New York” ^ BUDGET > 200000 (PROJ)
PROJ6: σ LOC=“Paris” ^ BUDGET > 200000 (PROJ)

50
Question & Answer

51

You might also like