Distributed Database Management Systems
Week-4
Distribution Design Issues
1. Why fragment at all?
2. How to fragment?
3. How much to fragment?
4. How to test correctness?
5. How to allocate?
6. Information requirements?
Fragmentation
Can't we just distribute relations?
• What is a reasonable unit of distribution?
➡ relation
✦ application views are usually subsets of relations -> locality of access applies to those subsets, not to whole relations
✦ natural to consider it
➡ fragments of relations (sub-relations)
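✦ increase concurrency (transactions can work on different fragments at the same time)
✦ a single view may depend on multiple fragments (extra join cost)
✦ integrity checking becomes harder when constraints span fragments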
Fragmentation Alternatives – Horizontal
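Horizontal fragmentation splits a relation by rows: each fragment is a selection on the relation, and the fragments are put back together by union. The following is a minimal sketch that assumes a hypothetical PROJ relation held as a list of dicts, with hypothetical fragmentation predicates on LOC. The data is illustrative only.

```python
# Hypothetical PROJ relation: each tuple is a dict (illustrative data only).
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "LOC": "Lahore"},
    {"PNO": "P2", "PNAME": "Database Develop.", "LOC": "Islamabad"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "LOC": "KHI"},
    {"PNO": "P4", "PNAME": "Maintenance", "LOC": "Lahore"},
]

def horizontal_fragment(relation, predicate):
    """Selection: keep the whole tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

# Two fragments defined by hypothetical predicates on LOC.
PROJ1 = horizontal_fragment(PROJ, lambda t: t["LOC"] == "Lahore")
PROJ2 = horizontal_fragment(PROJ, lambda t: t["LOC"] != "Lahore")

# Horizontal fragments are reconstructed by union; these two are disjoint,
# so simple concatenation gives back every tuple exactly once.
reconstructed = PROJ1 + PROJ2
print(sorted(t["PNO"] for t in reconstructed))  # ['P1', 'P2', 'P3', 'P4']
```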
Fragmentation Alternatives – Vertical
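Vertical fragmentation splits a relation by columns: each fragment is a projection that keeps the primary key, so the original tuples can be rebuilt by joining the fragments on that key. A minimal sketch, assuming a hypothetical PROJ relation with key PNO; the helper names are illustrative.

```python
# Hypothetical PROJ relation with primary key PNO (illustrative data only).
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Lahore"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "Islamabad"},
]

def vertical_fragment(relation, attrs, key="PNO"):
    """Projection: keep the key plus the listed attributes of every tuple."""
    cols = [key] + [a for a in attrs if a != key]
    return [{a: t[a] for a in cols} for t in relation]

def join_on_key(r, s, key="PNO"):
    """Join two vertical fragments on the shared key to rebuild full tuples."""
    index = {t[key]: t for t in s}
    return [{**t, **index[t[key]]} for t in r if t[key] in index]

PROJ1 = vertical_fragment(PROJ, ["BUDGET"])
PROJ2 = vertical_fragment(PROJ, ["PNAME", "LOC"])
print(join_on_key(PROJ1, PROJ2) == PROJ)  # True
```

Repeating the key in every fragment is what makes the join lossless; without it there would be no way to match the pieces of one tuple.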
Degree of Fragmentation

• Finding the suitable level of partitioning within the range from individual tuples or attributes (finest) to whole relations (no fragmentation)


Correctness Rules of Fragmentation
Completeness
➡ Decomposition of relation R into fragments R1, R2, ..., Rn is
complete if and only if each data item in R can also be found in some Ri
Reconstruction
➡ If relation R is decomposed into fragments R1, R2, ..., Rn, then
there should exist some relational operator ∇ such that $R = \nabla_{1 \le i \le n} R_i$
Disjointness
➡ If relation R is decomposed into fragments R1, R2, ..., Rn, and
data item di is in Rj, then di should not be in any other fragment Rk (k ≠ j).
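The three rules can be checked mechanically for horizontal fragments, where the reconstruction operator ∇ is union. A minimal sketch, assuming each tuple is a hashable Python tuple and each fragment is a list of such tuples; the function names are hypothetical.

```python
def is_complete(relation, fragments):
    """Completeness: every tuple of R appears in some fragment Ri."""
    return all(any(t in f for f in fragments) for t in relation)

def is_reconstructible(relation, fragments):
    """Reconstruction with ∇ = union: R = R1 ∪ R2 ∪ ... ∪ Rn."""
    return set().union(*fragments) == set(relation)

def is_disjoint(fragments):
    """Disjointness: no tuple lies in two different fragments."""
    seen = set()
    for f in fragments:
        for t in set(f):
            if t in seen:
                return False
            seen.add(t)
    return True

R = [("P1", "Lahore"), ("P2", "Islamabad"), ("P3", "KHI")]
good = [[R[0]], [R[1], R[2]]]
overlapping = [[R[0], R[1]], [R[1], R[2]]]
print(is_complete(R, good), is_reconstructible(R, good), is_disjoint(good))  # True True True
print(is_disjoint(overlapping))  # False
```

For vertical fragments ∇ would instead be a join on the key, and disjointness is checked on the non-key attributes only, since the key is deliberately repeated in every fragment.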
Allocation Alternatives
Non-replicated
➡ partitioned : each fragment resides at only one site
Replicated
➡ fully replicated : each fragment at each site
➡ partially replicated : each fragment at some of the sites
Rule of thumb:
If (read-only queries) / (update queries) ≥ 1, replication is advantageous,
otherwise replication may cause problems
Comparison of Replication Alternatives w.r.t. DBMS Functions
Information Requirements
• Four categories:
➡ Database information
➡ Application information
➡ Communication network information
➡ Computer system information
Information Requirements for Fragmentation
• Database information
• Concerned with the global conceptual schema
Application Information

• Two kinds: qualitative and quantitative


• Qualitative information is: Predicates
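• Rule of thumb in investigating predicates: examine at least the most active 20% of user queries, which typically account for 80% of the data accesses (the 80/20 rule)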
• Simple Predicate
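• Given a relation R(A1, A2, ..., An), where Ai is an attribute defined over domain Di, a simple predicate pj defined on R has the form
$p_j : A_i \;\theta\; \mathit{Value}$, where $\theta \in \{=, <, \neq, \leq, >, \geq\}$ and $\mathit{Value} \in D_i$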
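A simple predicate is just an attribute, a comparison operator θ and a constant, evaluated tuple by tuple. The sketch below is one hypothetical way to represent it in Python, assuming tuples are dicts keyed by attribute name; the class name and the Salary attribute are illustrative.

```python
import operator

# The comparison operators allowed for θ, mapped to Python functions.
THETA = {"=": operator.eq, "<": operator.lt, "≠": operator.ne,
         "≤": operator.le, ">": operator.gt, "≥": operator.ge}

class SimplePredicate:
    """p_j : A_i θ Value, evaluated against a single tuple stored as a dict."""

    def __init__(self, attr, theta, value):
        if theta not in THETA:
            raise ValueError(f"unsupported operator: {theta}")
        self.attr, self.theta, self.value = attr, theta, value

    def __call__(self, t):
        return THETA[self.theta](t[self.attr], self.value)

    def __repr__(self):
        return f"{self.attr} {self.theta} {self.value!r}"

p1 = SimplePredicate("LOC", "=", "Lahore")
p2 = SimplePredicate("Salary", "≤", 200000)
t = {"LOC": "Lahore", "Salary": 250000}
print(p1(t), p2(t))  # True False
print(p2)            # Salary ≤ 200000
```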
Minterm predicate
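• It is a conjunction of simple predicates, each taken either in its natural form or negated
• Given a set of simple predicates Pr = {pr1, pr2, …, prm} for a relation R, the set of minterm predicates M is defined as M = {m1, m2, …, mz}, where
$m_j = \bigwedge_{pr_k \in Pr} pr_k^{*}$, $1 \le k \le m$, $1 \le j \le z$, and $pr_k^{*}$ is either $pr_k$ or $\neg pr_k$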
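Because each of the m simple predicates appears either as-is or negated, there are 2^m candidate minterms, and every tuple satisfies exactly one of them. A minimal sketch using `itertools.product`, assuming simple predicates are given as a hypothetical dict from a label to a boolean function on a tuple.

```python
from itertools import product

def minterms(predicates):
    """Return label -> function for every minterm: each simple predicate is
    taken either in natural form or negated, giving 2**m conjunctions."""
    names = list(predicates)
    result = {}
    for signs in product((True, False), repeat=len(names)):
        label = " ∧ ".join(n if s else f"¬({n})" for n, s in zip(names, signs))

        def m(t, signs=signs):
            # True only if every predicate evaluates to its chosen sign.
            return all(predicates[n](t) == s for n, s in zip(names, signs))

        result[label] = m
    return result

Pr = {"LOC = 'Lahore'": lambda t: t["LOC"] == "Lahore",
      "Salary ≤ 200000": lambda t: t["Salary"] <= 200000}
M = minterms(Pr)
print(len(M))  # 4
t = {"LOC": "KHI", "Salary": 150000}
print([label for label, m in M.items() if m(t)])  # ["¬(LOC = 'Lahore') ∧ Salary ≤ 200000"]
```

In practice some generated minterms are contradictory (for example LOC = "Lahore" ∧ LOC = "KHI" when both are in Pr) and would be eliminated before fragmenting.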
Quantitative Information
• Minterm selectivity
• sel(mi): the number of tuples of the relation that satisfy minterm mi
• sel(m1) = ?
• sel(m2) = ?

• Access frequency
• frequency with which user applications access data. If Q = {q1, ..., qn} is a set of user queries, acc(qi) indicates the access frequency of query qi in a given period.
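Both quantitative measures are simple counts. A minimal sketch, assuming a hypothetical relation J stored as a list of dicts, a hypothetical minterm written as a Python function, and a hypothetical query log covering one period.

```python
from collections import Counter

def selectivity(relation, minterm):
    """sel(m): number of tuples of the relation that satisfy minterm m."""
    return sum(1 for t in relation if minterm(t))

def access_frequencies(query_log):
    """acc(q_i): how many times each query q_i appears in one period's log."""
    return Counter(query_log)

# Hypothetical relation (illustrative data only).
J = [{"LOC": "Lahore", "Salary": 150000},
     {"LOC": "Lahore", "Salary": 300000},
     {"LOC": "KHI", "Salary": 120000}]

def m1(t):
    """Hypothetical minterm: LOC = 'Lahore' ∧ Salary ≤ 200000."""
    return t["LOC"] == "Lahore" and t["Salary"] <= 200000

print(selectivity(J, m1))  # 1

log = ["q1", "q2", "q1", "q1"]  # hypothetical query log for one period
print(access_frequencies(log)["q1"])  # 3
```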
Desirable properties of simple predicates
• The set should be complete.

• Informally, the set should include only predicates with attributes and
conditions that are used in the applications
Completeness
• A set of simple predicates Pr is said to be complete if and only if there is an
equal probability of access by every application to any tuple belonging to any
minterm fragment that is defined according to Pr.

• Case 1: The only application that accesses the project relation J wants to access the tuples
according to their location (any location).

• The set of simple predicates Pr = {LOC = “Lahore”, LOC = “Islamabad”, LOC = “KHI”} is complete, because each tuple of each fragment has the same probability of being accessed.
2nd Case:
• A second application accesses only those project tuples where the Salary is less than or equal to $200,000.

• Pr is no longer complete, since within each fragment JPi the tuples with Salary ≤ 200,000 have a higher access probability than the others.

• To make the set complete, we need to add (Salary ≤ 200,000, Salary > 200,000) to Pr.
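The two cases can be replayed with a simplified completeness test: inside every fragment, each application must access either all of the tuples or none of them, so that no tuple is favoured over another. This all-or-nothing check is a stand-in for "equal access probability", not a formal definition. The sketch assumes a hypothetical relation J and models each application as a Python function that says whether it reads a tuple.

```python
LOCS = ("Lahore", "Islamabad", "KHI")

def is_complete(fragments, applications):
    """Simplified test: within every fragment, each application reads either
    all tuples or none of them."""
    for frag in fragments:
        for app in applications:
            if len({app(t) for t in frag}) > 1:  # some tuples read, others not
                return False
    return True

# Hypothetical project relation J (illustrative data only).
J = [{"PNO": "P1", "LOC": "Lahore", "Salary": 150000},
     {"PNO": "P2", "LOC": "Lahore", "Salary": 250000},
     {"PNO": "P3", "LOC": "KHI", "Salary": 100000}]

# Case 1: one query per location, each reading the projects at that location.
case1 = [lambda t, loc=loc: t["LOC"] == loc for loc in LOCS]

def case2(t):
    """Case 2: reads only projects with Salary ≤ 200,000."""
    return t["Salary"] <= 200000

by_loc = [[t for t in J if t["LOC"] == loc] for loc in LOCS]
by_loc_sal = [[t for t in J if t["LOC"] == loc and (t["Salary"] <= 200000) == low]
              for loc in LOCS for low in (True, False)]

print(is_complete(by_loc, case1))                  # True
print(is_complete(by_loc, case1 + [case2]))        # False
print(is_complete(by_loc_sal, case1 + [case2]))    # True
```

Adding the Salary predicates splits each location fragment so that the second application again sees every tuple of a fragment or none of them.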
Distributed Data Storage

• Replication
• System maintains multiple copies of data, stored in different sites, for
faster retrieval and fault tolerance.
Data Replication
• A relation or fragment of a relation is replicated if it is stored
redundantly in two or more sites.
• Full replication of a relation is the case where the relation is
stored at all sites.
• Fully redundant databases are those in which every site
contains a copy of the entire database.
Data Replication (Cont.)

• Advantages of Replication
• Availability: failure of a site containing relation r does not result in unavailability of r
if replicas exist.
• Parallelism: queries on r may be processed by several nodes in parallel.
• Reduced data transfer: relation r is available locally at each site containing a replica
of r.
• Disadvantages of Replication
• Increased cost of updates: each replica of relation r must be updated.
• Increased complexity of concurrency control: concurrent updates to distinct
replicas may lead to inconsistent data unless special concurrency control
mechanisms are implemented.
• One solution: choose one copy as the primary copy and apply concurrency
control operations to the primary copy only
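The primary-copy idea can be illustrated inside one process: writers serialize on a lock that belongs to the primary copy, then push the new value to the other replicas. This is a hypothetical single-process simulation using `threading.Lock`; a real DDBMS would exchange messages between sites, and the class and site names here are illustrative.

```python
import threading

class PrimaryCopyItem:
    """Hypothetical primary-copy replication of one data item.
    Writes lock and update the primary, then propagate to the other replicas,
    so concurrent writers are serialized at a single place."""

    def __init__(self, sites, primary, value=None):
        self.primary = primary
        self.replicas = {s: value for s in sites}  # site -> local copy
        self.lock = threading.Lock()               # stands in for the lock at the primary

    def write(self, value):
        with self.lock:                  # concurrency control on the primary copy only
            self.replicas[self.primary] = value
            for site in self.replicas:   # propagate to the other replicas
                if site != self.primary:
                    self.replicas[site] = value

    def read(self, site):
        return self.replicas[site]       # local read from any replica

item = PrimaryCopyItem(["S1", "S2", "S3"], primary="S1", value=100)
threads = [threading.Thread(target=item.write, args=(v,)) for v in (200, 300)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(len({item.read(s) for s in ("S1", "S2", "S3")}))  # 1 (all replicas agree)
```

Reads in this sketch take no lock, so a read that overlaps a write may still see the old value at a non-primary site; the lock only guarantees that concurrent writes do not interleave.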
Data Transparency
• Data transparency: Degree to which system user may remain unaware
of the details of how and where the data items are stored in a
distributed system
• Consider transparency issues in relation to:
• Fragmentation transparency
• Replication transparency
• Location transparency
Naming of Data Items - Criteria

1. Every data item must have a system-wide unique name.


2. It should be possible to find the location of data items efficiently.
3. It should be possible to change the location of data items transparently.
4. Each site should be able to create new data items autonomously.
Centralized Scheme - Name Server
• Structure:
• name server assigns all names
• each site maintains a record of local data items
• sites ask name server to locate non-local data items
• Advantages:
• satisfies naming criteria 1-3
• Disadvantages:
• does not satisfy naming criterion 4 (sites cannot create names autonomously)
• name server is a potential performance bottleneck
• name server is a single point of failure
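The structure above can be sketched with a single object that assigns every name and records where each item lives. All class and method names are hypothetical. Every call to `create` goes through the one server, which is exactly why it is a bottleneck and a single point of failure.

```python
class NameServer:
    """Hypothetical central name server: assigns every name and records its site."""

    def __init__(self):
        self._next_id = 0
        self._location = {}   # name -> site holding the item

    def assign(self, site):
        self._next_id += 1
        name = f"item{self._next_id}"
        self._location[name] = site
        return name

    def locate(self, name):
        return self._location[name]

class Site:
    def __init__(self, site_id, server):
        self.site_id, self.server = site_id, server
        self.local = {}       # record of local data items: name -> value

    def create(self, value):
        name = self.server.assign(self.site_id)   # must ask the server: no autonomy
        self.local[name] = value
        return name

    def find(self, name):
        # Local items are found locally; non-local ones via the name server.
        return self.site_id if name in self.local else self.server.locate(name)

ns = NameServer()
s1, s2 = Site("S1", ns), Site("S2", ns)
n = s1.create("balance=500")
print(n, s2.find(n))  # item1 S1
```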
Use of Aliases
• Alternative to centralized scheme: each site prefixes its own
site identifier to any name that it generates
e.g., site17.account.
• Fulfills the unique-identifier criterion and avoids the problems
associated with central control.
• However, fails to achieve network transparency.
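Site prefixing needs no coordination at all, but the prefix also exposes where an item was created. A minimal sketch with hypothetical helper names, assuming each site already keeps its local names unique.

```python
def global_name(site_id, local_name):
    """Prefix the creating site's identifier, e.g. site17.account.
    Assumes local_name is already unique within that site."""
    return f"site{site_id}.{local_name}"

def origin_site(name):
    """The prefix reveals where the item was created, so location is visible."""
    prefix = name.split(".", 1)[0]
    return int(prefix[len("site"):])

a = global_name(17, "account")
b = global_name(42, "account")
print(a, b, a != b)    # site17.account site42.account True
print(origin_site(a))  # 17
```

Two sites can both create an item called "account" without talking to each other, yet any user who sees the name also learns its origin site, which is the loss of network transparency noted above.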