Fragmentation

Fragmentation/1

> What is a reasonable unit of distribution? Relation or fragment of relation?
> Relations as unit of distribution:
  > If the relation is not replicated, we get a high volume of remote data accesses.
  > If the relation is replicated, we get unnecessary replications, which cause problems in executing updates and waste disk space.
  > Might be an OK solution if queries need all the data in the relation and the data stays only at the sites that use it.
> Fragments of relations as unit of distribution:
  > Application views are usually subsets of relations.
  > Thus, locality of accesses of applications is defined on subsets of relations.
  > Permits a number of transactions to execute concurrently, since they will access different portions of a relation.
  > Parallel execution of a single query (intra-query concurrency).
  > However, semantic data control (especially integrity enforcement) is more difficult.
=> Fragments of relations are (usually) the appropriate unit of distribution.

DDBS14. SL02 8/60 M. Bohlen

Fragmentation/2

> Fragmentation aims to improve:
  > Reliability
  > Performance
  > Balanced storage capacity and costs
  > Communication costs
  > Security
> The following information is used to decide fragmentation:
  > Quantitative information: cardinality of relations, frequency of queries, site where a query is run, selectivity of the queries, etc.
  > Qualitative information: predicates in queries, types of data access (read/write), etc.

Fragmentation/3

> Types of fragmentation
  > Horizontal: partitions a relation along its tuples
  > Vertical: partitions a relation along its attributes
  > Mixed/hybrid: a combination of horizontal and vertical fragmentation

[Figure: (a) Horizontal Fragmentation, (b) Vertical Fragmentation, (c) Mixed Fragmentation]

Fragmentation/4

> Example of database instance

[Figure: instance tables for the relations EMP, ASG, and PROJ; contents not recoverable]
Fragmentation/5

> Example (contd.): Horizontal fragmentation of the PROJ relation
  > PROJ1: projects with budgets less than 200,000
  > PROJ2: projects with budgets greater than or equal to 200,000

PROJ1
PNO | PNAME             | BUDGET | LOC
----|-------------------|--------|---------
P1  | Instrumentation   | 150000 | Montreal
P2  | Database Develop. | 135000 | New York

PROJ2
PNO | PNAME       | BUDGET | LOC
----|-------------|--------|---------
P3  | CAD/CAM     | 250000 | New York
P4  | Maintenance | 310000 | Paris

Fragmentation/6

> Example (contd.): Vertical fragmentation of the PROJ relation
  > PROJ1: information about project budgets
  > PROJ2: information about project names and locations

PROJ1
PNO | BUDGET
----|-------
P1  | 150000
P2  | 135000
P3  | 250000
P4  | 310000

PROJ2
PNO | PNAME             | LOC
----|-------------------|---------
P1  | Instrumentation   | Montreal
P2  | Database Develop. | New York
P3  | CAD/CAM           | New York
P4  | Maintenance       | Paris

Correctness Rules of Fragmentation

> Completeness
  > A decomposition of relation $R$ into fragments $R_1, R_2, \dots, R_n$ is complete iff each data item in $R$ can also be found in some $R_i$.
> Reconstruction
  > If relation $R$ is decomposed into fragments $R_1, R_2, \dots, R_n$, then there should exist some relational operator $\nabla$ that reconstructs $R$ from its fragments, i.e., $R = R_1 \nabla \dots \nabla R_n$.
  > Union to combine horizontal fragments
  > Join to combine vertical fragments
> Disjointness
  > If relation $R$ is decomposed into fragments $R_1, R_2, \dots, R_n$ and data item $d_i$ appears in fragment $R_j$, then $d_i$ should not appear in any other fragment $R_k$, $k \neq j$ (exception: the primary key attribute for vertical fragmentation).
  > For horizontal fragmentation, a data item is a tuple.
  > For vertical fragmentation, a data item is an attribute.
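The three correctness rules can be checked mechanically on the PROJ example. The following is a minimal sketch, assuming relations are modelled as Python sets of tuples; the helper names (`select`, `project`, `join_on_pno`, `check_horizontal`, `check_vertical`) are illustrative and not part of the slides, and the vertical check handles exactly two fragments.

```python
# PROJ instance from the example; attribute order is fixed by ATTRS.
ATTRS = ("PNO", "PNAME", "BUDGET", "LOC")
PROJ = {
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
}

def select(rel, pred):
    """Selection: keep the tuples that satisfy pred."""
    return {t for t in rel if pred(t)}

def project(rel, attrs):
    """Projection onto the named attributes (duplicates collapse in the set)."""
    idx = [ATTRS.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rel}

def join_on_pno(left_attrs, left, right_attrs, right):
    """Equi-join of two vertical fragments on PNO, returned in ATTRS order."""
    out = set()
    for l in left:
        lrow = dict(zip(left_attrs, l))
        for r in right:
            rrow = dict(zip(right_attrs, r))
            if lrow["PNO"] == rrow["PNO"]:
                merged = {**lrow, **rrow}
                out.add(tuple(merged[a] for a in ATTRS))
    return out

def check_horizontal(rel, frags):
    """Return (complete, reconstructs, disjoint) for horizontal fragments."""
    union = set().union(*frags)
    complete = rel <= union                  # every tuple is in some fragment
    reconstructs = union == rel              # union gives back R exactly
    disjoint = sum(len(f) for f in frags) == len(union)  # no tuple twice
    return complete, reconstructs, disjoint

def check_vertical(rel, frag1, frag2):
    """Return (complete, reconstructs, disjoint) for two vertical fragments.

    Each fragment is a pair (attribute names, set of tuples).
    """
    (a1, f1), (a2, f2) = frag1, frag2
    complete = set(a1) | set(a2) == set(ATTRS)      # every attribute is kept
    reconstructs = join_on_pno(a1, f1, a2, f2) == rel
    disjoint = set(a1) & set(a2) == {"PNO"}         # only the key is shared
    return complete, reconstructs, disjoint

# Horizontal fragments: budget below / at least 200,000.
h1 = select(PROJ, lambda t: t[2] < 200000)
h2 = select(PROJ, lambda t: t[2] >= 200000)

# Vertical fragments: budgets vs. names and locations.
v1_attrs, v2_attrs = ("PNO", "BUDGET"), ("PNO", "PNAME", "LOC")
v1, v2 = project(PROJ, v1_attrs), project(PROJ, v2_attrs)

print(check_horizontal(PROJ, [h1, h2]))
print(check_vertical(PROJ, (v1_attrs, v1), (v2_attrs, v2)))
```

Both checks should report all three rules as satisfied for these fragments. Replacing the second horizontal predicate with an overlapping one (for instance a budget of at least 150,000) makes the disjointness check fail while completeness and reconstruction still hold.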
Idea of Horizontal Fragmentation

> Intuition behind horizontal fragmentation
  > Every site should hold all the information that is used by the queries issued at that site.
  > The information at the site should be fragmented so that the queries of the site run faster.
> Horizontal fragmentation is defined as a selection operation, $\sigma_p(R)$
> Example:
  $\sigma_{\text{BUDGET} < 200000}(\text{PROJ})$
  $\sigma_{\text{BUDGET} \geq 200000}(\text{PROJ})$

Information Requirements/1

Database information:
> Links between relations (a link models a 1:N relationship between relations that are related to each other by an equality join)

[Figure: schema diagram showing PAY(TITLE, SAL), EMP(ENO, ENAME, TITLE), and PROJ(PNO, PNAME, BUDGET, LOC) with links between relations]

> Cardinality of relations: card(R)

Horizontal Fragmentation/3

> Example: Fragmentation of the PROJ relation
  > Consider the following query: find the name and budget of projects given their location.
  > The query is issued at all three locations.
  > Fragmentation is based on LOC, using the set of predicates {LOC = 'Montreal', LOC = 'New York', LOC = 'Paris'}

PROJ1 = $\sigma_{\text{LOC} = \text{'Montreal'}}(\text{PROJ})$
PNO | PNAME       | BUDGET | LOC
----|-------------|--------|---------
P1  | Instrument. | 150000 | Montreal

PROJ2 = $\sigma_{\text{LOC} = \text{'New York'}}(\text{PROJ})$
PNO | PNAME       | BUDGET | LOC
----|-------------|--------|---------
P2  | DB Develop. | 135000 | New York
P3  | CAD/CAM     | 250000 | New York

PROJ3 = $\sigma_{\text{LOC} = \text{'Paris'}}(\text{PROJ})$
PNO | PNAME       | BUDGET | LOC
----|-------------|--------|------
P4  | Maintenance | 310000 | Paris

Horizontal Fragmentation/4

> If access is only according to the location, the above set of predicates is complete,
  > i.e., in each fragment PROJi each tuple has the same probability of being accessed.
> If there is a second query/application that accesses only those project tuples where the budget is less than 200K, the set of predicates is not complete.
  > P2 in PROJ2 then has a higher probability of being accessed than P3.

Horizontal Fragmentation/5

> Example (contd.):
  > Add BUDGET < 200K and BUDGET ≥ 200K to the set of predicates to make it complete.
  => {LOC = 'Montreal', LOC = 'New York', LOC = 'Paris', BUDGET ≥ 200K, BUDGET < 200K} is a complete set.
  > The minterms used to fragment the relation are as follows:
    (LOC = 'Montreal') ∧ (BUDGET < 200K)
    (LOC = 'Montreal') ∧ (BUDGET ≥ 200K)
    (LOC = 'New York') ∧ (BUDGET < 200K)
    (LOC = 'New York') ∧ (BUDGET ≥ 200K)
    (LOC = 'Paris') ∧ (BUDGET < 200K)
    (LOC = 'Paris') ∧ (BUDGET ≥ 200K)

Horizontal Fragmentation/6

> Example (contd.): Now the New York fragment PROJ2 is split in two. The non-empty fragments are:

$\sigma_{\text{LOC} = \text{'Montreal'} \wedge \text{BUDGET} < 200K}(\text{PROJ})$
PNO | PNAME       | BUDGET | LOC
----|-------------|--------|---------
P1  | Instrument. | 150000 | Montreal

$\sigma_{\text{LOC} = \text{'New York'} \wedge \text{BUDGET} < 200K}(\text{PROJ})$
PNO | PNAME       | BUDGET | LOC
----|-------------|--------|---------
P2  | DB Develop. | 135000 | New York

$\sigma_{\text{LOC} = \text{'New York'} \wedge \text{BUDGET} \geq 200K}(\text{PROJ})$
PNO | PNAME   | BUDGET | LOC
----|---------|--------|---------
P3  | CAD/CAM | 250000 | New York

$\sigma_{\text{LOC} = \text{'Paris'} \wedge \text{BUDGET} \geq 200K}(\text{PROJ})$
PNO | PNAME       | BUDGET | LOC
----|-------------|--------|------
P4  | Maintenance | 310000 | Paris

> Note that the following fragments are empty:
  > $\sigma_{\text{LOC} = \text{'Paris'} \wedge \text{BUDGET} < 200K}(\text{PROJ})$
  > $\sigma_{\text{LOC} = \text{'Montreal'} \wedge \text{BUDGET} \geq 200K}(\text{PROJ})$

Vertical Fragmentation/1

> The objective of vertical fragmentation is to partition a relation into a set of smaller relations so that many of the applications will run on only one fragment.
> Vertical fragmentation of a relation $R$ produces fragments $R_1, R_2, \dots$, each of which contains a subset of $R$'s attributes.
> Vertical fragmentation is defined using the projection operation of the relational algebra: $\pi_{A_1, A_2, \dots, A_n}(R)$
> Example:
  PROJ1 = $\pi_{\text{PNO}, \text{BUDGET}}(\text{PROJ})$
  PROJ2 = $\pi_{\text{PNO}, \text{PNAME}, \text{LOC}}(\text{PROJ})$
> Vertical fragmentation has also been studied for (centralized) DBMS
  > Smaller relations, and hence fewer page accesses
  > e.g., the MONET system

Vertical Fragmentation/2

> Vertical fragmentation is more complicated than horizontal fragmentation
  > In horizontal partitioning: for $n$ simple predicates, the number of possible minterms is $2^n$; some of them can be ruled out by existing implications/constraints.
  > In vertical partitioning: for $m$ non-primary-key attributes, the number of possible fragmentations is equal to $B(m)$ (the $m$th Bell number), i.e., the number of partitions of a set with $m$ members.
  > $B(m)$ grows rapidly with $m$ and is bounded above by $m^m$ (e.g., $B(15) \approx 10^9$).

Vertical Fragmentation/3

> Two types of heuristics for vertical fragmentation exist:
  > Grouping: assign each attribute to one fragment and, at each step, join some of the fragments until some criterion is satisfied.
    > Bottom-up approach
  > Splitting: start with a relation and decide on beneficial partitionings based on the access behaviour of applications to the attributes.
    > Top-down approach
    > Results in non-overlapping fragments
    > The optimal solution is probably closer to the full relation than to a set of small relations with only one attribute each.

Vertical Fragmentation/4

> Application information: the major information required as input for vertical fragmentation is related to applications (queries).
  > Since vertical fragmentation places in one fragment those attributes usually accessed together, there is a need for some measure that defines more precisely the notion of "togetherness", i.e., how closely related the attributes are.
  > This information is obtained from queries and collected in the Attribute Usage Matrix and the Attribute Affinity Matrix.

Vertical Fragmentation/6

> Example: Consider the relation PROJ(PNO, PNAME, BUDGET, LOC) and the queries:
    q1 = SELECT BUDGET FROM PROJ WHERE PNO=Value
    q2 = SELECT PNAME, BUDGET FROM PROJ
    q3 = SELECT PNAME FROM PROJ WHERE LOC=Value
    q4 = SELECT SUM(BUDGET) FROM PROJ WHERE LOC=Value
> Abbreviations: A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC
> Attribute Usage Matrix: [matrix figure not recoverable]
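An attribute usage matrix for q1 to q4 can be derived directly from the attributes each query references. The sketch below assumes an attribute counts as "used" when it appears anywhere in the query (SELECT list or WHERE clause); the attribute sets are read off the four queries by hand, and the names `usage_matrix` and `co_usage` are illustrative. The co-usage count is only an unweighted stand-in for affinity: a real attribute affinity matrix weights each query by its access frequencies, which are not given here.

```python
# Attribute order follows the abbreviations A1..A4.
ATTRS = ["PNO", "PNAME", "BUDGET", "LOC"]

# Attributes referenced by each query (SELECT list plus WHERE clause),
# transcribed by hand from q1..q4.
QUERIES = {
    "q1": {"BUDGET", "PNO"},    # SELECT BUDGET ... WHERE PNO=Value
    "q2": {"PNAME", "BUDGET"},  # SELECT PNAME, BUDGET
    "q3": {"PNAME", "LOC"},     # SELECT PNAME ... WHERE LOC=Value
    "q4": {"BUDGET", "LOC"},    # SELECT SUM(BUDGET) ... WHERE LOC=Value
}

def usage_matrix(queries, attrs):
    """use(q, A) = 1 if query q references attribute A, else 0."""
    return {q: [1 if a in used else 0 for a in attrs]
            for q, used in queries.items()}

def co_usage(usage, attrs):
    """Count, for each attribute pair, the queries that use both.

    Assumption: every query is counted once; access frequencies are ignored.
    """
    n = len(attrs)
    return [[sum(row[i] * row[j] for row in usage.values())
             for j in range(n)] for i in range(n)]

usage = usage_matrix(QUERIES, ATTRS)
counts = co_usage(usage, ATTRS)

print("     A1 A2 A3 A4")
for q, row in usage.items():
    print(f"{q}   " + "  ".join(str(v) for v in row))
```

Under these assumptions each row has exactly two 1-entries, and BUDGET (A3) is the attribute used by the most queries.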
