
Lecture 4

April 5, 2024 Distributed Databases - CSC451 1


Recap
Architectural Models for Distributed DBMSs …
Client Server Architecture
Distributed Database Design
Design Approaches
How to distribute the data
Replicated and non-replicated database distribution
Fragmentation (HF, VF, Hybrid)
 Allocation
 Primary Horizontal Fragmentation



Outline
Horizontal Fragmentation
 Derived Horizontal Fragmentation
Vertical Fragmentation
Distributed Query Processing



Derived Horizontal Fragmentation
Defined on a member relation of a link according to
a selection operation specified on its owner.
 Each link is an Equijoin

[Figure: links among the relations PAY, EMP, PROJ and ASG]



Derived Horizontal Fragmentation
Given a link L where owner(L) = S and member(L) = R,
the derived horizontal fragments of R are defined as
follows:

Ri = R ⋉ Si, 1 ≤ i ≤ w

where w is the maximum number of fragments that
will be defined on R, and

Si = σFi (S)

where Fi is the formula according to which the
primary horizontal fragment Si is defined.

Example
Given link L1 where owner(L1) = PAY and
member(L1) = EMP:
EMP1 = EMP ⋉ PAY1
EMP2 = EMP ⋉ PAY2
where
PAY1 = σSAL ≤ 30000 (PAY)
PAY2 = σSAL > 30000 (PAY)



Example…

PAY1
TITLE            SAL
Programmer       24000
Mech. Eng.       27000

PAY2
TITLE            SAL
Electrical Eng.  40000
Syst. Analyst    34000

EMP1
ENO  ENAME         TITLE
E3   Emmah Otieno  Programmer
E4   Linnet Kyalo  Mech. Eng.

EMP2
ENO  ENAME         TITLE
E1   Mark Njenga   Electrical Eng.
E2   Janet Musili  Syst. Analyst
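
The example can be replayed in a few lines of Python. This is a minimal sketch, not a DBMS implementation: the helper names `select` and `semijoin` are made up for illustration, TITLE is assumed to be the join attribute of link L1 (it is the only attribute PAY and EMP share in the tables), and the first fragment uses `<=` so that a salary of exactly 30000 would not fall outside both fragments.

```python
# Relations from the example, as lists of dicts.
PAY = [
    {"TITLE": "Electrical Eng.", "SAL": 40000},
    {"TITLE": "Syst. Analyst", "SAL": 34000},
    {"TITLE": "Mech. Eng.", "SAL": 27000},
    {"TITLE": "Programmer", "SAL": 24000},
]
EMP = [
    {"ENO": "E1", "ENAME": "Mark Njenga", "TITLE": "Electrical Eng."},
    {"ENO": "E2", "ENAME": "Janet Musili", "TITLE": "Syst. Analyst"},
    {"ENO": "E3", "ENAME": "Emmah Otieno", "TITLE": "Programmer"},
    {"ENO": "E4", "ENAME": "Linnet Kyalo", "TITLE": "Mech. Eng."},
]


def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def semijoin(member, owner, attr):
    """Semijoin: keep member rows whose `attr` value appears in some owner row."""
    keys = {row[attr] for row in owner}
    return [row for row in member if row[attr] in keys]


# Primary horizontal fragmentation of the owner relation PAY.
PAY1 = select(PAY, lambda r: r["SAL"] <= 30000)
PAY2 = select(PAY, lambda r: r["SAL"] > 30000)

# Derived horizontal fragmentation of the member relation EMP.
EMP1 = semijoin(EMP, PAY1, "TITLE")
EMP2 = semijoin(EMP, PAY2, "TITLE")

print([r["ENO"] for r in EMP1])  # ['E3', 'E4']
print([r["ENO"] for r in EMP2])  # ['E1', 'E2']
```

Every EMP row lands in exactly one fragment here because each TITLE value occurs in exactly one PAY fragment.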



Vertical Fragmentation – Information Requirements
Application Information
 Attribute affinities
 A measure that indicates how closely related the attributes are
 Attribute usage values
 Given a set of queries Q = {q1, q2, …, qn} that will run
 on relation R[A1, A2, A3]:

 use(qi, Aj) = 1 if attribute Aj is referenced by query qi
               0 otherwise



VF – Information Requirements
Consider the queries

q1: SELECT BUDGET
    FROM PROJ
    WHERE PNO=Value

q2: SELECT PNAME, BUDGET
    FROM PROJ
    WHERE BUDGET=Value

q3: SELECT PNAME
    FROM PROJ
    WHERE LOC=Value

q4: SELECT SUM(BUDGET)
    FROM PROJ
    WHERE LOC=Value

Let A1=PNO, A2=PNAME, A3=BUDGET, A4=LOC
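
The attribute usage values for these four queries can be tabulated mechanically. The sketch below assumes the queries are numbered q1 to q4 in the order listed in the code comments, and that an attribute counts as referenced if it appears in either the SELECT list or the WHERE clause. The names `QUERY_ATTRS` and `usage_matrix` are illustrative only.

```python
ATTRIBUTES = ["PNO", "PNAME", "BUDGET", "LOC"]  # A1, A2, A3, A4

# Attributes referenced by each query (SELECT list plus WHERE clause).
QUERY_ATTRS = {
    "q1": {"BUDGET", "PNO"},    # SELECT BUDGET ... WHERE PNO=Value
    "q2": {"PNAME", "BUDGET"},  # SELECT PNAME, BUDGET ... WHERE BUDGET=Value
    "q3": {"PNAME", "LOC"},     # SELECT PNAME ... WHERE LOC=Value
    "q4": {"BUDGET", "LOC"},    # SELECT SUM(BUDGET) ... WHERE LOC=Value
}


def usage_matrix(query_attrs, attributes):
    """Row i, column j holds use(qi, Aj): 1 if qi references Aj, else 0."""
    return [
        [1 if attr in referenced else 0 for attr in attributes]
        for referenced in query_attrs.values()
    ]


USE = usage_matrix(QUERY_ATTRS, ATTRIBUTES)
for name, row in zip(QUERY_ATTRS, USE):
    print(name, row)
# q1 [1, 0, 1, 0]
# q2 [0, 1, 1, 0]
# q3 [0, 1, 0, 1]
# q4 [0, 0, 1, 1]
```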



VF – Information Requirements
Access frequencies of queries:



Affinity Measure aff(A1,A2)



Affinity Measure aff(A1,A2)
Attribute Affinity Matrix:
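
The formula and the matrix on these slides did not survive extraction, so the following is a sketch under stated assumptions rather than the lecture's own figures. It assumes the usual definition: aff(Ai, Aj) is the sum, over every query that references both Ai and Aj, of that query's access frequency summed over all sites, with one access per query execution. The access frequencies in `ACC` are illustrative values, and the usage rows are the ones obtained for the four PROJ queries.

```python
# use(qk, Aj) for q1..q4 over A1=PNO, A2=PNAME, A3=BUDGET, A4=LOC.
USE = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]

# ASSUMED access frequencies: one row per query, one column per site (S1..S3).
ACC = [
    [15, 20, 10],
    [5, 0, 0],
    [25, 25, 25],
    [3, 0, 0],
]


def affinity_matrix(use, acc):
    """aff(Ai, Aj) = total frequency of the queries that reference both Ai and Aj."""
    n = len(use[0])
    totals = [sum(per_site) for per_site in acc]  # frequency of each query over all sites
    return [
        [
            sum(total for row, total in zip(use, totals) if row[i] and row[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


AA = affinity_matrix(USE, ACC)
for row in AA:
    print(row)
# [45, 0, 45, 0]
# [0, 80, 5, 75]
# [45, 5, 53, 3]
# [0, 75, 3, 78]
```

For instance, only q1 references both A1 and A3, so aff(A1, A3) = 15 + 20 + 10 = 45 under these assumed frequencies.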



Clustering
Involves reorganizing the attribute affinity matrix to
form clusters where the attributes in each cluster
demonstrate high affinity to each other
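
One common way to do this reordering is the bond energy algorithm (BEA). The slide does not name a method, so treat the choice of BEA as an assumption. The sketch below places the first two columns, then inserts each remaining column at the position with the largest contribution 2·bond(left, new) + 2·bond(new, right) − 2·bond(left, right). The matrix `AA` holds illustrative affinity values for four attributes A1..A4, not figures taken from the lecture.

```python
# Illustrative (assumed) attribute affinity matrix for A1..A4.
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(aa, x, y):
    """Bond between columns x and y; None stands for an empty boundary column."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def contribution(aa, left, new, right):
    """Net gain of placing column `new` between columns `left` and `right`."""
    return (2 * bond(aa, left, new)
            + 2 * bond(aa, new, right)
            - 2 * bond(aa, left, right))


def bea_order(aa):
    """Return a column ordering that groups high-affinity attributes together."""
    order = [0, 1]  # the first two columns are placed as they are
    for new in range(2, len(aa)):
        best_pos, best_gain = 0, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            gain = contribution(aa, left, new, right)
            if best_gain is None or gain > best_gain:
                best_pos, best_gain = pos, gain
        order.insert(best_pos, new)
    return order


order = bea_order(AA)
print([f"A{i + 1}" for i in order])  # ['A1', 'A3', 'A2', 'A4']
```

With these values the reordered matrix has two visible clusters, {A1, A3} and {A2, A4}, which are natural candidates for vertical fragments.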



VF-Correctness



Hybrid Fragmentation



Fragment Allocation Problem
Problem statement
Given
F = {F1, F2, …, Fn} fragments
S = {S1, S2, …, Sm} network sites
Q = {q1, q2, …, qq} applications
Find the optimal distribution of F to S
Optimality
 Minimal cost
 Communication + Storage + Processing (read & update), Time
 Performance
 Response time and throughput
 Constraints
 Per site constraints (storage & processing)
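
For very small instances the problem statement can be explored by exhaustive search. The sketch below is a toy model under loud assumptions: no replication (each fragment goes to exactly one site), cost is remote-access cost plus storage cost only, and the single constraint is per-site storage capacity. Processing cost, response time and throughput are left out. All sizes, capacities, frequencies and unit costs are hypothetical.

```python
from itertools import product

FRAGMENTS = {"F1": 40, "F2": 60}   # fragment -> size (hypothetical units)
CAPACITY = {"S1": 100, "S2": 60}   # site -> storage capacity (hypothetical)
STORAGE_COST = {"S1": 1, "S2": 2}  # site -> cost per stored unit (hypothetical)
REMOTE_COST = 10                   # cost of one remote access; local access is free

# ACCESSES[(site, fragment)] = how often applications at `site` access `fragment`.
ACCESSES = {("S1", "F1"): 50, ("S2", "F1"): 5, ("S1", "F2"): 10, ("S2", "F2"): 30}


def feasible(assignment):
    """Check the per-site storage constraint."""
    used = {site: 0 for site in CAPACITY}
    for frag, site in assignment.items():
        used[site] += FRAGMENTS[frag]
    return all(used[site] <= CAPACITY[site] for site in CAPACITY)


def cost(assignment):
    """Communication cost of remote accesses plus storage cost."""
    comm = sum(freq * REMOTE_COST
               for (site, frag), freq in ACCESSES.items()
               if assignment[frag] != site)
    storage = sum(FRAGMENTS[frag] * STORAGE_COST[site]
                  for frag, site in assignment.items())
    return comm + storage


def best_allocation():
    """Try every fragment-to-site mapping and keep the cheapest feasible one."""
    best = None
    for sites in product(CAPACITY, repeat=len(FRAGMENTS)):
        assignment = dict(zip(FRAGMENTS, sites))
        if feasible(assignment):
            c = cost(assignment)
            if best is None or c < best[1]:
                best = (assignment, c)
    return best


allocation, total = best_allocation()
print(allocation, total)  # {'F1': 'S1', 'F2': 'S2'} 310
```

Exhaustive search grows as (number of sites) to the power (number of fragments), which is why realistic allocation relies on heuristics.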



Fragmentation Summary
Information requirements
 Database information
 Selectivity of fragments
 Size of fragments
 Application information
 Access types and numbers
 Access localities
 Communication network information
 Bandwidth
 Latency
 Communication overhead
 Computer system information
 Unit cost of storing data at a site
 Unit cost of processing data at a site

Distributed Query Processing
High-level user query
        ↓
Query Processor
        ↓
Low-level data processing



Problem?



Problem in DDBS?
Fragments stored at different sites

Site 1: ASG1 = σ ENO≤E3 (ASG)
Site 2: ASG2 = σ ENO>E3 (ASG)
Site 3: EMP1 = σ ENO≤E3 (EMP)
Site 4: EMP2 = σ ENO>E3 (EMP)
Site 5: Result



Problem in DDBS?
[Figure: execution tree with Site 5 at the root, EMP1 (Site 3) and EMP2 (Site 4) below it, and ASG1 (Site 1) and ASG2 (Site 2) at the leaves]
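
The data flow suggested by this layout can be simulated in memory. The sketch below is an assumption-laden illustration, not the lecture's query: it joins EMP and ASG on ENO without any selection, the ASG rows and the (ENO, PNO) schema are hypothetical, and the fragments split at ENO = "E3" with EMP2 and ASG2 holding the rows above it. String comparison on ENO only works here because every employee number has a single digit.

```python
# EMP rows follow the earlier example; ASG rows (ENO, PNO) are hypothetical.
EMP = [("E1", "Mark Njenga"), ("E2", "Janet Musili"),
       ("E3", "Emmah Otieno"), ("E4", "Linnet Kyalo")]
ASG = [("E1", "P1"), ("E2", "P1"), ("E2", "P2"), ("E3", "P3"), ("E4", "P2")]


def join_on_eno(emp_rows, asg_rows):
    """Equijoin of EMP and ASG fragments on ENO."""
    names = dict(emp_rows)
    return [(eno, names[eno], pno) for eno, pno in asg_rows if eno in names]


# Fragments as stored at Sites 1 to 4.
ASG1 = [r for r in ASG if r[0] <= "E3"]  # Site 1
ASG2 = [r for r in ASG if r[0] > "E3"]   # Site 2
EMP1 = [r for r in EMP if r[0] <= "E3"]  # Site 3
EMP2 = [r for r in EMP if r[0] > "E3"]   # Site 4

# Step 1: ship ASG1 to Site 3 and ASG2 to Site 4, then join locally.
at_site3 = join_on_eno(EMP1, ASG1)
at_site4 = join_on_eno(EMP2, ASG2)

# Step 2: ship both partial results to Site 5 and take their union.
result = at_site3 + at_site4

# The fragment-wise plan gives the same answer as joining the full relations,
# because EMP and ASG are fragmented on the same ENO predicate.
print(sorted(result) == sorted(join_on_eno(EMP, ASG)))  # True
```

Since matching rows always sit in fragments with the same predicate, the cross joins EMP1 with ASG2 and EMP2 with ASG1 are empty and never need to be computed.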



Distributed Query Processing Methodology



Summary

Horizontal Fragmentation
Derived Horizontal Fragmentation
Vertical Fragmentation
Distributed Query Processing
