Chapter 5 Distributed Database Design

- Design of a distributed computer system involves making decisions on the
placement of data and programs across the sites of a computer network
- This course concentrates on the distribution of data
Alternative Design Strategies

- Top-Down Design Process (Refer text page 104)
- Bottom-Up Design Process (Refer text page 106)
Reasons for Fragmentation
- A relation is not a suitable unit of distribution because application views are
usually subsets of relations. Subsets of relations are therefore more suitable as
the unit of distribution.
- If a whole relation is the unit of distribution, it is either not replicated
(high volume of remote data accesses) or replicated at all or some sites
(unnecessary replication causes update and storage problems)
- Fragmentation increases concurrency and system throughput (a query can be divided
into subqueries that operate on fragments and execute in parallel)
Disadvantages of fragmentation
- Performance degradation: if the applications' requirements prevent the decomposition
of the relation into mutually exclusive fragments, applications whose views are defined
on more than one fragment have to combine those fragments with a union or a join
- Difficulty in semantic data control (integrity checking), because fragmentation
allocates attributes to different sites
Fragmentation
PROJ
PNO  PNAME  BUDGET   LOC
P1   x      150 000  Montreal
P2   y      135 000  New York
P3   z      250 000  New York
Horizontal
PROJ1
PNO  PNAME  BUDGET   LOC
P1   x      150 000  Montreal
P2   y      135 000  New York
PROJ2
PNO  PNAME  BUDGET   LOC
P3   z      250 000  New York
Vertical
PROJ1
PNO  BUDGET
P1   150 000
P2   135 000
P3   250 000
PROJ2
PNO  PNAME  LOC
P1   x      Montreal
P2   y      New York
P3   z      New York
Note: Primary key (PNO) is included in both fragments
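
The two kinds of fragmentation shown above can be sketched in Python as a selection (horizontal) and a projection (vertical). This is only an illustration: the helper names `select` and `project` are hypothetical, and the budget threshold used for the horizontal split is an assumption chosen to reproduce the fragments in the tables, since the notes do not state the predicate.

```python
# PROJ relation from the notes, one dict per tuple.
PROJ = [
    {"PNO": "P1", "PNAME": "x", "BUDGET": 150_000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "y", "BUDGET": 135_000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "z", "BUDGET": 250_000, "LOC": "New York"},
]

def select(relation, predicate):
    """Horizontal fragmentation: keep the tuples that satisfy the predicate."""
    return [dict(t) for t in relation if predicate(t)]

def project(relation, attributes):
    """Vertical fragmentation: keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attributes} for t in relation]

# Assumed predicate: the notes only show the resulting fragments.
THRESHOLD = 200_000
proj1_h = select(PROJ, lambda t: t["BUDGET"] < THRESHOLD)
proj2_h = select(PROJ, lambda t: t["BUDGET"] >= THRESHOLD)

# The primary key PNO is repeated in both vertical fragments.
proj1_v = project(PROJ, ["PNO", "BUDGET"])
proj2_v = project(PROJ, ["PNO", "PNAME", "LOC"])
```

Repeating PNO in both vertical fragments is what later allows the relation to be rebuilt with a join.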

Degree of Fragmentation
- Ranges from one extreme, not fragmenting at all, to the other, fragmenting down to
individual tuples (horizontal) or individual attributes (vertical)
Correctness Rules of Fragmentation

- To ensure the database does not undergo semantic change during fragmentation
Completeness
If a relation R is decomposed into fragments R1, R2, …, Rn, each data item that can be
found in R can also be found in one or more Ri. For horizontal fragmentation a data
item is a tuple; for vertical fragmentation it is an attribute.
Reconstruction
If a relation R is decomposed into fragments R1, R2, …, Rn, it should be possible to
define a relational operator Δ such that
R = Δ Ri, taken over all fragments Ri (1 ≤ i ≤ n)
Δ is the union for horizontal fragmentation and the join for vertical fragmentation.
Disjointness
If a relation R is horizontally decomposed into fragments R1, R2, …, Rn and a data item
d is in Rj, then d is not in any other fragment Rk (k ≠ j).
For vertical fragmentation the primary key is repeated in all fragments, so
disjointness is defined only on the non-primary-key attributes.
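
The three correctness rules can be checked mechanically on small examples. The sketch below is one possible way to do it, not a standard routine: the function names are hypothetical, tuples are modelled as Python dicts, and the fragments are the PROJ fragments from the earlier tables.

```python
def as_set(relation):
    """Turn an iterable of dict tuples into a set of hashable rows."""
    return {tuple(sorted(t.items())) for t in relation}

def horizontal_checks(relation, fragments):
    """Return (complete, reconstructs, disjoint) for a horizontal fragmentation."""
    whole = as_set(relation)
    parts = [as_set(f) for f in fragments]
    union = set().union(*parts)
    complete = whole <= union                    # every tuple is in some fragment
    reconstructs = union == whole                # the union gives back R exactly
    disjoint = sum(len(p) for p in parts) == len(union)  # no tuple appears twice
    return complete, reconstructs, disjoint

def vertical_checks(relation, fragments, key):
    """Return (complete, reconstructs, disjoint) for a vertical fragmentation."""
    frag_attrs = [set(f[0]) for f in fragments]
    complete = set(relation[0]) <= set().union(*frag_attrs)
    joined = {}                                  # join of all fragments on the key
    for f in fragments:
        for t in f:
            joined.setdefault(t[key], {}).update(t)
    reconstructs = as_set(joined.values()) == as_set(relation)
    non_key = [a - {key} for a in frag_attrs]    # the key is allowed to repeat
    disjoint = sum(len(a) for a in non_key) == len(set().union(*non_key))
    return complete, reconstructs, disjoint

PROJ = [
    {"PNO": "P1", "PNAME": "x", "BUDGET": 150_000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "y", "BUDGET": 135_000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "z", "BUDGET": 250_000, "LOC": "New York"},
]
H_FRAGMENTS = [PROJ[:2], PROJ[2:]]
V_FRAGMENTS = [
    [{"PNO": t["PNO"], "BUDGET": t["BUDGET"]} for t in PROJ],
    [{"PNO": t["PNO"], "PNAME": t["PNAME"], "LOC": t["LOC"]} for t in PROJ],
]
```

Calling `horizontal_checks(PROJ, H_FRAGMENTS)` or `vertical_checks(PROJ, V_FRAGMENTS, "PNO")` returns one flag per rule, so a fragmentation that overlaps or drops a tuple shows up as a `False` in the corresponding position.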
Allocation Alternatives
- Nonreplicated
- Only one copy of any fragment on the network
- Replication
- Fully replicated
- Partially replicated
Horizontal Fragmentation
Information Requirements
1) Database Information
- Concerns the global conceptual schema
- How relations are connected to one another (ER Diagram)
2) Application Information
Qualitative
- Determine the most important predicates used in user queries
- Simple predicates, e.g. SAL > 20 000, TITLE = "Programmer"
- Minterm predicates: conjunctions of simple predicates, each taken in its natural
or negated form, e.g. SAL > 20 000 ∧ TITLE = "Programmer"
Quantitative
Minterm selectivity
- Number of tuples accessed by a query specified according to a given minterm
predicate
Access frequency
- How often a query accesses the data in a given period
Primary Horizontal Fragmentation
- Defined by a selection operation on the owner relations of a database schema
Ri = σ_Fi(R), 1 ≤ i ≤ w
where Fi is the selection formula (a minterm predicate) that defines fragment Ri
1) Determine a set of simple predicates, Pr, that is complete and minimal
A set of simple predicates is said to be:
Complete
If and only if there is an equal probability of access by every application to
any tuple belonging to any minterm fragment defined according to Pr
Minimal
If all the predicates in Pr are relevant
2) Derive the set of minterm predicates from the predicates in Pr. These minterm
predicates determine the fragments used as candidates in the allocation step.
3) Eliminate meaningless minterm fragments.
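
Step 2 can be illustrated with a short Python sketch that enumerates every minterm predicate for the two simple predicates used earlier in these notes and applies them to a relation. The EMP tuples are hypothetical sample data, and `minterm_predicates` is an illustrative helper, not a library function. Step 3 is not automated here, because deciding which minterms are meaningless depends on the implications that hold in the actual database.

```python
from itertools import product

# Simple predicates from the notes, each as (label, test function).
SIMPLE_PREDICATES = [
    ("SAL > 20000", lambda t: t["SAL"] > 20000),
    ('TITLE = "Programmer"', lambda t: t["TITLE"] == "Programmer"),
]

def minterm_predicates(simple_predicates):
    """Yield (label, test) for every conjunction in which each simple
    predicate appears either in natural or in negated form."""
    for signs in product([True, False], repeat=len(simple_predicates)):
        label = " AND ".join(
            name if keep else f"NOT({name})"
            for (name, _), keep in zip(simple_predicates, signs)
        )

        def test(t, signs=signs):
            return all(
                p(t) == keep for (_, p), keep in zip(simple_predicates, signs)
            )

        yield label, test

# Hypothetical sample tuples, used only to show the fragments.
EMP = [
    {"ENO": "E1", "TITLE": "Programmer", "SAL": 25000},
    {"ENO": "E2", "TITLE": "Programmer", "SAL": 18000},
    {"ENO": "E3", "TITLE": "Analyst", "SAL": 30000},
]

# One candidate fragment per minterm predicate: Ri = sigma_Fi(EMP).
fragments = {
    label: [t for t in EMP if test(t)]
    for label, test in minterm_predicates(SIMPLE_PREDICATES)
}
```

With n simple predicates this produces 2^n candidate fragments, which is why keeping Pr minimal matters. Because every tuple satisfies exactly one minterm, the non-empty fragments partition the relation.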

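Derived Horizontal Fragmentation
- Defined on a member relation according to a selection operation specified on its
owner relation
Ri = R ⋉ Si, 1 ≤ i ≤ w, where Si = σ_Fi(S)
(R is the member relation, S is the owner relation, ⋉ is the semijoin and w is the
number of fragments)
Refer to example 5.12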
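
As a rough illustration of the definition, the sketch below fragments a member relation by semijoining it with each fragment of its owner. The relations, attribute names and salary threshold are hypothetical sample data invented for this sketch; they are not taken from example 5.12.

```python
def semijoin(member, owner, attr):
    """R semijoin S: the member tuples that match some owner tuple on attr."""
    owner_values = {s[attr] for s in owner}
    return [r for r in member if r[attr] in owner_values]

# Hypothetical owner relation S and member relation R, linked by TITLE.
PAY = [
    {"TITLE": "Programmer", "SAL": 24000},
    {"TITLE": "Analyst", "SAL": 34000},
    {"TITLE": "Engineer", "SAL": 40000},
]
EMP = [
    {"ENO": "E1", "ENAME": "a", "TITLE": "Programmer"},
    {"ENO": "E2", "ENAME": "b", "TITLE": "Analyst"},
    {"ENO": "E3", "ENAME": "c", "TITLE": "Engineer"},
    {"ENO": "E4", "ENAME": "d", "TITLE": "Programmer"},
]

# Primary horizontal fragmentation of the owner: Si = sigma_Fi(S).
PAY1 = [s for s in PAY if s["SAL"] <= 30000]
PAY2 = [s for s in PAY if s["SAL"] > 30000]

# Derived horizontal fragmentation of the member: Ri = R semijoin Si.
EMP1 = semijoin(EMP, PAY1, "TITLE")
EMP2 = semijoin(EMP, PAY2, "TITLE")
```

The member fragments follow the owner fragments, so an employee tuple ends up at the same place as the pay tuple it joins with.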
When there is more than one possible derived horizontal fragmentation (refer to
figure 5.7), the candidate fragmentation is chosen based on 2 criteria:
1) The fragmentation used in more applications
- Facilitating the accesses of heavy users improves system performance
2) The fragmentation with better join characteristics
- Query execution is faster when the join is performed on smaller relations
- System throughput improves when the query can be executed in parallel
Checking for the correctness rules of fragmentation
Completeness
- Primary horizontal fragmentation
Fragmentation is complete if the selection predicates are complete
- Derived horizontal fragmentation
Let R be the member relation, S be the owner relation and A be the join attribute.
Then for each tuple t of R, there should be a tuple t' of S such that
t[A] = t'[A]
Reconstruction
- Reconstruction of a global relation from its fragments is performed by the union
operator for primary and derived horizontal fragmentation
Disjointness
- Primary horizontal fragmentation
Disjointness is guaranteed if the minterm predicates are mutually exclusive
- Derived horizontal fragmentation

Disjointness is guaranteed if the join graph is simple
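
The completeness condition for derived fragmentation can be tested by looking for member tuples that have no matching owner tuple. The sketch below uses hypothetical data in which one member tuple is deliberately left without an owner; the helper names are illustrative only.

```python
def semijoin(member, owner, attr):
    """Member tuples that match some owner tuple on the join attribute."""
    owner_values = {s[attr] for s in owner}
    return [r for r in member if r[attr] in owner_values]

def dangling_tuples(member, owner, attr):
    """Member tuples t with no owner tuple t' such that t[A] = t'[A]."""
    owner_values = {s[attr] for s in owner}
    return [r for r in member if r[attr] not in owner_values]

# Hypothetical owner and member relations; E3 has no owner tuple.
OWNER = [
    {"TITLE": "Programmer", "SAL": 25000},
    {"TITLE": "Analyst", "SAL": 35000},
]
MEMBER = [
    {"ENO": "E1", "TITLE": "Programmer"},
    {"ENO": "E2", "TITLE": "Analyst"},
    {"ENO": "E3", "TITLE": "Tester"},
]
OWNER_FRAGMENTS = [[OWNER[0]], [OWNER[1]]]

derived = [semijoin(MEMBER, s, "TITLE") for s in OWNER_FRAGMENTS]
recovered = [t for fragment in derived for t in fragment]
lost = dangling_tuples(MEMBER, OWNER, "TITLE")
```

Here the union of the derived fragments holds only two of the three member tuples, so the fragmentation is not complete. The tuple reported in `lost` is the one that violates the rule.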
Vertical Fragmentation
Objective
- Partition a relation into smaller relations so that many user applications
run on only one fragment
- Minimize the execution time of user applications that run on the fragments:
queries deal with smaller relations and therefore need fewer page accesses
There are 2 heuristic approaches to vertical fragmentation
1) Grouping
- Start by assigning each attribute to its own fragment and, at each step, join some
of the fragments until some criterion is satisfied
- Results in overlapping fragments
2) Splitting
- Start with the whole relation and decide on beneficial partitionings based on the
access behavior of applications to the attributes
- Results in non-overlapping fragments
Information Requirements of Vertical Fragmentation
- Vertical partitioning places in one fragment those attributes usually accessed
together
- Attribute usage value:
use(qi, Aj) = 1 if attribute Aj is referenced by query qi
            = 0 otherwise
Refer to example 5.15

Note: Attribute usage matrix
- Attribute usage values alone are not sufficient for attribute splitting and
fragmentation because they do not represent the weight of application frequencies.
Therefore, we need the attribute affinity, which measures how strongly two attributes
are bound together by the access frequencies of the queries that reference both
Refer to example 5.16

Note: Attribute Affinity Matrix
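
A minimal sketch of how an attribute affinity matrix can be computed from an attribute usage matrix: for every pair of attributes, add up the access frequencies of the queries that reference both. The usage matrix and the per-query frequencies below are assumptions, chosen so that the result matches the affinity values used in the clustering example of these notes; the notes themselves do not list them.

```python
ATTRS = ["A1", "A2", "A3", "A4"]

# Assumed attribute usage matrix: USE[i][j] = use(q(i+1), A(j+1)).
USE = [
    [1, 0, 1, 0],  # q1 references A1 and A3
    [0, 1, 1, 0],  # q2 references A2 and A3
    [0, 1, 0, 1],  # q3 references A2 and A4
    [0, 0, 1, 1],  # q4 references A3 and A4
]

# Assumed access frequency of each query, already summed over all sites.
ACC = [45, 5, 75, 3]

def affinity_matrix(use, acc):
    """aff(Ai, Aj) = total frequency of the queries that use both Ai and Aj."""
    n = len(use[0])
    aff = [[0] * n for _ in range(n)]
    for row, freq in zip(use, acc):
        for i in range(n):
            for j in range(n):
                if row[i] and row[j]:
                    aff[i][j] += freq
    return aff

AA = affinity_matrix(USE, ACC)
```

The diagonal entry aff(Ai, Ai) is simply the total frequency of all queries that reference Ai, and the matrix is symmetric by construction.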
Clustering Algorithm
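- The bond energy algorithm (BEA) groups the attributes based on their attribute
affinity values
- It takes the attribute affinity matrix as input and permutes its rows and columns
to generate the Clustered Affinity (CA) matrix in 3 steps
1) Initialization
Place the first two columns (A1 and A2) of the attribute affinity matrix
     A1   A2
A1   45    0
A2    0   80
A3   45    5
A4    0   75
2) Iteration
Place each remaining column at the position with the largest contribution, where
cont(Ai, Ak, Aj) = 2bond(Ai, Ak) + 2bond(Ak, Aj) - 2bond(Ai, Aj)
and bond(Ax, Ay) is the sum over all attributes Az of aff(Az, Ax) * aff(Az, Ay)
Column A3:
cont(A1, A2, A3) = 2bond(A1, A2) + 2bond(A2, A3) - 2bond(A1, A3)
= 2*225 + 2*890 - 2*4410 = -6590
cont(A1, A3, A2) = 2bond(A1, A3) + 2bond(A3, A2) - 2bond(A1, A2)
= 2*4410 + 2*890 - 2*225 = 10150
cont(A3, A1, A2) = 2bond(A3, A1) + 2bond(A1, A2) - 2bond(A3, A2)
= 2*4410 + 2*225 - 2*890 = 7490
Since the contribution of the ordering (1-3-2) is the largest, A3 is placed between
A1 and A2
     A1   A3   A2
A1   45   45    0
A2    0    5   80
A3   45   53    5
A4    0    3   75
Continue with column A4:
cont(A3, A2, A4) = 2bond(A3, A2) + 2bond(A2, A4) - 2bond(A3, A4)
= 2*890 + 2*11865 - 2*768 = 23974
cont(A3, A4, A2) = 2bond(A3, A4) + 2bond(A4, A2) - 2bond(A3, A2)
= 2*768 + 2*11865 - 2*890 = 23486
cont(A4, A3, A2) = 2bond(A4, A3) + 2bond(A3, A2) - 2bond(A4, A2)
= 2*768 + 2*890 - 2*11865 = -20414
Since the contribution of the ordering (3-2-4) is the largest, A4 is placed after A2
     A1   A3   A2   A4
A1   45   45    0    0
A2    0    5   80   75
A3   45   53    5    3
A4    0    3   75   78
3) Row ordering
Reorder the rows to match the column order
     A1   A3   A2   A4
A1   45   45    0    0
A3   45   53    5    3
A2    0    5   80   75
A4    0    3   75   78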
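
The bond and contribution values in the iteration step can be reproduced with a few lines of Python. This is a sketch of the two measures only, not of the complete bond energy algorithm: it recomputes the figures used in the notes from the attribute affinity matrix and then applies the final column order to the rows. The function and variable names are illustrative.

```python
# Attribute affinity matrix from the notes (rows and columns A1..A4).
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]

def bond(x, y):
    """bond(Ax, Ay): sum over all Az of aff(Az, Ax) * aff(Az, Ay), 1-based."""
    return sum(row[x - 1] * row[y - 1] for row in AA)

def cont(i, k, j):
    """Contribution of the ordering (Ai, Ak, Aj)."""
    return 2 * bond(i, k) + 2 * bond(k, j) - 2 * bond(i, j)

# Candidate orderings evaluated in the notes for columns A3 and A4.
a3_candidates = {order: cont(*order) for order in [(1, 2, 3), (1, 3, 2), (3, 1, 2)]}
a4_candidates = {order: cont(*order) for order in [(3, 2, 4), (3, 4, 2), (4, 3, 2)]}

best_a3 = max(a3_candidates, key=a3_candidates.get)
best_a4 = max(a4_candidates, key=a4_candidates.get)

# Row ordering: apply the final column order A1, A3, A2, A4 to the rows too.
ORDER = [1, 3, 2, 4]
CA = [[AA[r - 1][c - 1] for c in ORDER] for r in ORDER]
```

In the resulting `CA` matrix the large affinity values sit in two blocks along the diagonal, which is what the partitioning step looks for.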
- Based on the Clustered Affinity Matrix, we have 2 fragments: the upper-left cluster
{A1, A3} and the lower-right cluster {A2, A4}

- When the partitioning algorithm is applied to the CA matrix obtained for relation
PROJ, the result is the definition of fragments FPROJ = {PROJ1, PROJ2}, where
PROJ1 = {A1, A3} and PROJ2 = {A1, A2, A4}
Thus
PROJ1 = {PNO, BUDGET}
PROJ2 = {PNO, PNAME, LOC}
Hybrid / Mixed / Nested Fragmentation

- Sometimes a simple horizontal or vertical fragmentation of a database schema is not
sufficient to satisfy the requirements of user applications
- A vertical fragmentation may be followed by a horizontal fragmentation, or
vice versa
Refer to figure 5.19
- To reconstruct the original global relation under hybrid fragmentation, start at
the leaves of the fragmentation tree and move upward, performing joins and unions
Refer to figure 5.20
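
The bottom-up reconstruction can be sketched on the PROJ relation from earlier in these notes. The fragmentation tree used here is an assumption made for the illustration and is not the one in figures 5.19 and 5.20: PROJ is first split vertically, and the fragment holding PNAME and LOC is then split horizontally by location. The helpers `union` and `join` are hypothetical names.

```python
def union(*fragments):
    """Union of horizontal fragments, dropping duplicate tuples."""
    seen, result = set(), []
    for fragment in fragments:
        for t in fragment:
            row = tuple(sorted(t.items()))
            if row not in seen:
                seen.add(row)
                result.append(t)
    return result

def join(left, right, key):
    """Join of two vertical fragments on their shared key."""
    index = {t[key]: t for t in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

PROJ = [
    {"PNO": "P1", "PNAME": "x", "BUDGET": 150_000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "y", "BUDGET": 135_000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "z", "BUDGET": 250_000, "LOC": "New York"},
]

# Leaves of the assumed fragmentation tree.
V1 = [{"PNO": t["PNO"], "BUDGET": t["BUDGET"]} for t in PROJ]
V2 = [{"PNO": t["PNO"], "PNAME": t["PNAME"], "LOC": t["LOC"]} for t in PROJ]
V2_MONTREAL = [t for t in V2 if t["LOC"] == "Montreal"]
V2_NEW_YORK = [t for t in V2 if t["LOC"] == "New York"]

# Move upward: union the horizontal leaves first, then join on the key.
v2_rebuilt = union(V2_MONTREAL, V2_NEW_YORK)
rebuilt = join(V1, v2_rebuilt, "PNO")
```

Each level of the tree is undone with the operator that matches how it was created: a union for a horizontal split and a join on the key for a vertical split.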

