Fragmentation/1
> What is a reasonable unit of distribution: a relation or a fragment of a relation?
> Relations as unit of distribution:
  > If the relation is not replicated, we get a high volume of remote data accesses.
  > If the relation is replicated, we get unnecessary replicas, which cause problems when executing updates and waste disk space.
  > Might be an acceptable solution if queries need all the data in the relation and the data stays only at the sites that use it.
> Fragments of relations as unit of distribution:
  > Application views are usually subsets of relations.
  > Thus, locality of access of applications is defined on subsets of relations.
  > Permits a number of transactions to execute concurrently, since they access different portions of a relation.
  > Permits parallel execution of a single query (intra-query concurrency).
  > However, semantic data control (especially integrity enforcement) is more difficult.
=> Fragments of relations are (usually) the appropriate unit of distribution.
DDBS14. SLO2 8/60 M. Bohlen
Fragmentation/2
> Fragmentation aims to improve:
  > Reliability
  > Performance
  > Balanced storage capacity and costs
  > Communication costs
  > Security
> The following information is used to decide fragmentation:
  > Quantitative information: cardinality of relations, frequency of queries, site where a query is run, selectivity of the queries, etc.
  > Qualitative information: predicates in queries, types of data access (read/write), etc.
Fragmentation/3
> Types of fragmentation
  > Horizontal: partitions a relation along its tuples
  > Vertical: partitions a relation along its attributes
  > Mixed/hybrid: a combination of horizontal and vertical fragmentation
[Figure: (a) Horizontal Fragmentation, (b) Vertical Fragmentation, (c) Mixed Fragmentation]
Fragmentation/4
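> Example of database instance
[Figure: instance of the example database, with relations including ASG, EMP, and PROJ. The PROJ relation used in the following examples is:]
PROJ
PNO | PNAME | BUDGET | LOC
P1 | Instrumentation | 150000 | Montreal
P2 | Database Develop. | 135000 | New York
P3 | CAD/CAM | 250000 | New York
P4 | Maintenance | 310000 | Paris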
Fragmentation/5
> Example (contd.): Horizontal fragmentation of the PROJ relation
  > PROJ1: projects with budgets less than 200'000
  > PROJ2: projects with budgets greater than or equal to 200'000
PROJ1
PNO | PNAME | BUDGET | LOC
P1 | Instrumentation | 150000 | Montreal
P2 | Database Develop. | 135000 | New York
PROJ2
PNO | PNAME | BUDGET | LOC
P3 | CAD/CAM | 250000 | New York
P4 | Maintenance | 310000 | Paris
Fragmentation/6
> Example (contd.): Vertical fragmentation of the PROJ relation
  > PROJ1: information about project budgets
  > PROJ2: information about project names and locations
PROJ1
PNO | BUDGET
P1 | 150000
P2 | 135000
P3 | 250000
P4 | 310000
PROJ2
PNO | PNAME | LOC
P1 | Instrumentation | Montreal
P2 | Database Develop. | New York
P3 | CAD/CAM | New York
P4 | Maintenance | Paris
Correctness Rules of Fragmentation
> Completeness
  > A decomposition of relation $R$ into fragments $R_1, R_2, \ldots, R_n$ is complete iff each data item in $R$ can also be found in some $R_i$.
> Reconstruction
  > If relation $R$ is decomposed into fragments $R_1, R_2, \ldots, R_n$, then there should exist some relational operator $\nabla$ that reconstructs $R$ from its fragments, i.e., $R = R_1 \nabla \ldots \nabla R_n$
  > Union to combine horizontal fragments
  > Join to combine vertical fragments
> Disjointness
  > If relation $R$ is decomposed into fragments $R_1, R_2, \ldots, R_n$ and data item $d_i$ appears in fragment $R_j$, then $d_i$ should not appear in any other fragment $R_k$, $k \neq j$ (exception: primary key attributes for vertical fragmentation)
  > For horizontal fragmentation, a data item is a tuple
  > For vertical fragmentation, a data item is an attribute
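The three rules can be checked mechanically for a horizontal fragmentation. The following is a minimal sketch, assuming tuples are represented as Python tuples and fragments as sets; the function names `is_complete`, `reconstructs`, and `is_disjoint` are illustrative and not part of the slides.

```python
# PROJ tuples as (PNO, PNAME, BUDGET, LOC), taken from the example instance.
PROJ = {
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
}


def is_complete(relation, fragments):
    """Completeness: every tuple of the relation is in at least one fragment."""
    return all(any(t in f for f in fragments) for t in relation)


def reconstructs(relation, fragments):
    """Reconstruction: for horizontal fragments the operator is the union."""
    return set().union(*fragments) == relation


def is_disjoint(fragments):
    """Disjointness: no tuple appears in more than one fragment."""
    return sum(len(f) for f in fragments) == len(set().union(*fragments))


# Horizontal fragmentation of PROJ by budget (threshold 200'000).
proj1 = {t for t in PROJ if t[2] < 200_000}
proj2 = {t for t in PROJ if t[2] >= 200_000}
fragments = [proj1, proj2]

print(is_complete(PROJ, fragments), reconstructs(PROJ, fragments), is_disjoint(fragments))
```

For vertical fragments the same checks would operate on attributes instead of tuples, with a join on the primary key as the reconstruction operator.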
Idea of Horizontal Fragmentation
> Intuition behind horizontal fragmentation
  > Every site should hold all the information that is used by the queries issued at that site
  > The information at the site should be fragmented so that the queries of the site run faster
> Horizontal fragmentation is defined as a selection operation, $\sigma_p(R)$
> Example:
$\sigma_{BUDGET < 200K}(PROJ)$
$\sigma_{BUDGET \geq 200K}(PROJ)$
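As a rough illustration of fragmentation by selection, the sketch below models rows as dictionaries and the selection predicate as a Python function. The helper `select` is a hypothetical stand-in for the relational selection operator, not an API from any DBMS.

```python
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
]


def select(relation, predicate):
    """Selection: keep the rows for which the predicate holds."""
    return [row for row in relation if predicate(row)]


# The two fragments from the example, with the threshold at 200K.
low_budget = select(PROJ, lambda r: r["BUDGET"] < 200_000)
high_budget = select(PROJ, lambda r: r["BUDGET"] >= 200_000)

print([r["PNO"] for r in low_budget])   # ['P1', 'P2']
print([r["PNO"] for r in high_budget])  # ['P3', 'P4']
```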
Information Requirements/1
> Database information:
  > Links between relations (a link models a 1:N relationship between relations that are related to each other by an equality join)
[Figure: link between PAY(TITLE, SAL) and EMP(ENO, ENAME, TITLE); PROJ(PNO, PNAME, BUDGET, LOC) is also shown]
  > Cardinality of relations: card(R)
Horizontal Fragmentation/3
> Example: Fragmentation of the PROJ relation
  > Consider the following query: Find the name and budget of projects given their location.
  > The query is issued at all three locations
  > Fragmentation based on LOC, using the set of predicates
    {LOC = 'Montreal', LOC = 'NewYork', LOC = 'Paris'}
$PROJ_1 = \sigma_{LOC = 'Montreal'}(PROJ)$
PNO | PNAME | BUDGET | LOC
P1 | Instrument. | 150000 | Montreal
$PROJ_2 = \sigma_{LOC = 'NewYork'}(PROJ)$
PNO | PNAME | BUDGET | LOC
P2 | DB Develop. | 135000 | New York
P3 | CAD/CAM | 250000 | New York
$PROJ_3 = \sigma_{LOC = 'Paris'}(PROJ)$
PNO | PNAME | BUDGET | LOC
P4 | Maintenance | 310000 | Paris
Horizontal Fragmentation/4
> If access is only according to the location, the above set of predicates is complete
  > i.e., within each fragment PROJ_i, every tuple has the same probability of being accessed
> If there is a second query/application that accesses only those project tuples where the budget is less than 200K, the set of predicates is not complete.
  > P2 in PROJ_2 has a higher probability of being accessed
Horizontal Fragmentation/5
> Example (contd.):
  > Add BUDGET < 200K and BUDGET ≥ 200K to the set of predicates to make it complete.
  => {LOC = 'Montreal', LOC = 'NewYork', LOC = 'Paris', BUDGET ≥ 200K, BUDGET < 200K} is a complete set
  > Minterms to fragment the relation are given as follows:
$(LOC = 'Montreal') \land (BUDGET < 200K)$
$(LOC = 'Montreal') \land (BUDGET \geq 200K)$
$(LOC = 'NewYork') \land (BUDGET < 200K)$
$(LOC = 'NewYork') \land (BUDGET \geq 200K)$
$(LOC = 'Paris') \land (BUDGET < 200K)$
$(LOC = 'Paris') \land (BUDGET \geq 200K)$
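The six minterms are what remains of the 2^5 = 32 sign combinations over the five simple predicates once contradictory combinations are ruled out. The sketch below enumerates all combinations and keeps those satisfied by at least one point of a small sample domain. The sample domain (three locations, one budget on each side of the threshold) and the name `satisfiable_minterms` are assumptions made for illustration; for brevity, only the non-negated predicates of each minterm are printed.

```python
from itertools import product

LOCATIONS = ["Montreal", "NewYork", "Paris"]

# Simple predicates as (label, test over a (loc, budget) pair).
SIMPLE = [
    ("LOC = 'Montreal'", lambda loc, b: loc == "Montreal"),
    ("LOC = 'NewYork'", lambda loc, b: loc == "NewYork"),
    ("LOC = 'Paris'", lambda loc, b: loc == "Paris"),
    ("BUDGET < 200K", lambda loc, b: b < 200_000),
    ("BUDGET >= 200K", lambda loc, b: b >= 200_000),
]

# Assumed sample domain: one representative budget on each side of 200K.
DOMAIN = list(product(LOCATIONS, [100_000, 300_000]))


def satisfiable_minterms(simple, domain):
    """Enumerate all 2^n sign vectors and keep the satisfiable ones."""
    kept = []
    for signs in product([True, False], repeat=len(simple)):
        # A minterm holds at a point if every predicate matches its sign.
        def holds(point):
            return all(test(*point) == s for (_, test), s in zip(simple, signs))

        if any(holds(point) for point in domain):
            positives = [label for (label, _), s in zip(simple, signs) if s]
            kept.append(" AND ".join(positives))
    return kept


minterms = satisfiable_minterms(SIMPLE, DOMAIN)
print(2 ** len(SIMPLE), len(minterms))  # 32 6
for m in minterms:
    print(m)
```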
Horizontal Fragmentation/6
> Example (contd.): Now, PROJ1, PROJ2, and PROJ3 will each be split in two fragments
$\sigma_{LOC = 'Montreal' \land BUDGET < 200K}(PROJ)$
PNO | PNAME | BUDGET | LOC
P1 | Instrument. | 150000 | Montreal
$\sigma_{LOC = 'NewYork' \land BUDGET < 200K}(PROJ)$
PNO | PNAME | BUDGET | LOC
P2 | DB Develop. | 135000 | New York
$\sigma_{LOC = 'NewYork' \land BUDGET \geq 200K}(PROJ)$
PNO | PNAME | BUDGET | LOC
P3 | CAD/CAM | 250000 | New York
$\sigma_{LOC = 'Paris' \land BUDGET \geq 200K}(PROJ)$
PNO | PNAME | BUDGET | LOC
P4 | Maintenance | 310000 | Paris
> Note that the following fragments are empty:
  > $\sigma_{LOC = 'Paris' \land BUDGET < 200K}(PROJ)$
  > $\sigma_{LOC = 'Montreal' \land BUDGET \geq 200K}(PROJ)$
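The empty fragments can be found by applying each minterm to the PROJ instance. This is a small sketch under the assumption that locations are compared using the spelling found in the data ("New York"); the function name `minterm_fragments` is illustrative only.

```python
from itertools import product

# PROJ tuples as (PNO, PNAME, BUDGET, LOC).
PROJ = [
    ("P1", "Instrument.", 150000, "Montreal"),
    ("P2", "DB Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
]

LOCATIONS = ["Montreal", "New York", "Paris"]
BUDGET_TESTS = {
    "BUDGET < 200K": lambda b: b < 200_000,
    "BUDGET >= 200K": lambda b: b >= 200_000,
}


def minterm_fragments(relation):
    """Build one fragment per (location, budget range) minterm."""
    fragments = {}
    for loc, (label, test) in product(LOCATIONS, BUDGET_TESTS.items()):
        fragments[(loc, label)] = [
            t for t in relation if t[3] == loc and test(t[2])
        ]
    return fragments


fragments = minterm_fragments(PROJ)
empty = [key for key, rows in fragments.items() if not rows]
print(empty)
# [('Montreal', 'BUDGET >= 200K'), ('Paris', 'BUDGET < 200K')]
```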
Vertical Fragmentation/1
> The objective of vertical fragmentation is to partition a relation into a set of smaller relations so that many of the applications will run on only one fragment.
> Vertical fragmentation of a relation $R$ produces fragments $R_1, R_2, \ldots$, each of which contains a subset of $R$'s attributes.
> Vertical fragmentation is defined using the projection operation of the relational algebra:
$\pi_{A_1, A_2, \ldots, A_n}(R)$
> Example:
$PROJ_1 = \pi_{PNO, BUDGET}(PROJ)$
$PROJ_2 = \pi_{PNO, PNAME, LOC}(PROJ)$
> Vertical fragmentation has also been studied for (centralized) DBMS
  > Smaller relations, and hence fewer page accesses
  > e.g., MONET system
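To make the projection-based definition concrete, the sketch below builds the two vertical fragments of PROJ and rebuilds the relation by joining them on the primary key PNO. The helpers `project` and `join_on` are hypothetical names for simplified relational operators; `join_on` assumes the key is unique in the right-hand fragment.

```python
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
]


def project(relation, attributes):
    """Projection with duplicate elimination."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


def join_on(left, right, key):
    """Equality join on a key that is unique in `right`."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


proj1 = project(PROJ, ["PNO", "BUDGET"])
proj2 = project(PROJ, ["PNO", "PNAME", "LOC"])

# Both fragments keep PNO so that the join can reconstruct PROJ.
rebuilt = join_on(proj1, proj2, "PNO")
print(rebuilt == PROJ)  # True
```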
Vertical Fragmentation/2
> Vertical fragmentation is more complicated than horizontal fragmentation
  > In horizontal partitioning: for $n$ simple predicates, the number of possible minterms is $2^n$; some of them can be ruled out by existing implications/constraints.
  > In vertical partitioning: for $m$ non-primary key attributes, the number of possible fragments is equal to $B(m)$ (the $m$th Bell number), i.e., the number of partitions of a set with $m$ members.
  > For large $m$, $B(m) \approx m^m$ (e.g., $B(15) \approx 10^9$)
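To get a feeling for how quickly the search space grows, the Bell numbers can be computed with the Bell triangle. This is a small sketch; the function name `bell_numbers` is illustrative.

```python
def bell_numbers(m_max):
    """Return [B(0), ..., B(m_max)] computed with the Bell triangle."""
    bells = [1]  # B(0)
    row = [1]
    for _ in range(m_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left of it.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


bells = bell_numbers(15)
print(bells[4])   # 15
print(bells[10])  # 115975
print(bells[15])  # 1382958545
```

The exact value B(15) = 1,382,958,545 matches the order of magnitude 10^9 quoted above, which is why exhaustive enumeration of vertical fragmentations is impractical and heuristics are used instead.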
Vertical Fragmentation/3
> Two types of heuristics for vertical fragmentation exist:
  > Grouping: assign each attribute to one fragment and, at each step, join some of the fragments until some criterion is satisfied.
    > Bottom-up approach
  > Splitting: start with a relation and decide on beneficial partitionings based on the access behaviour of applications to the attributes.
    > Top-down approach
    > Results in non-overlapping fragments
    > The optimal solution is probably closer to the full relation than to a set of small relations with only one attribute
Vertical Fragmentation/4
> Application information: The major information required as input for vertical fragmentation is related to applications (queries)
  > Since vertical fragmentation places in one fragment those attributes that are usually accessed together, there is a need for some measure that defines more precisely the notion of "togetherness", i.e., how closely related the attributes are
  > This information is obtained from queries and collected in the Attribute Usage Matrix and the Attribute Affinity Matrix.
Vertical Fragmentation/6
> Example: Consider relation PROJ(PNO, PNAME, BUDGET, LOC) and queries:
q1 = SELECT BUDGET FROM PROJ WHERE PNO=Value
q2 = SELECT PNAME, BUDGET FROM PROJ
q3 = SELECT PNAME FROM PROJ WHERE LOC=Value
q4 = SELECT SUM(BUDGET) FROM PROJ WHERE LOC=Value
> Abbreviations:
A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC
> Attribute Usage Matrix
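One way to derive an attribute usage matrix for q1 to q4 is sketched below. It assumes the usual convention that an entry is 1 if the query references the attribute (in its SELECT or WHERE clause) and 0 otherwise; the attribute sets per query were read off the SQL text by hand, and the name `usage_matrix` is illustrative.

```python
ATTRIBUTES = ["PNO", "PNAME", "BUDGET", "LOC"]  # A1, A2, A3, A4

# Attributes referenced by each query (assumed from the SQL text).
QUERY_ATTRIBUTES = {
    "q1": {"BUDGET", "PNO"},
    "q2": {"PNAME", "BUDGET"},
    "q3": {"PNAME", "LOC"},
    "q4": {"BUDGET", "LOC"},
}


def usage_matrix(queries, attributes):
    """One row per query; entry is 1 if the query uses the attribute."""
    return {
        name: [1 if a in used else 0 for a in attributes]
        for name, used in queries.items()
    }


matrix = usage_matrix(QUERY_ATTRIBUTES, ATTRIBUTES)
print("    A1 A2 A3 A4")
for name, row in matrix.items():
    print(name, " ", "  ".join(str(v) for v in row))
```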