DISTRIBUTED DBMS

SUSHIL KULKARNI

• DDBMS Concepts
• Applications
• Characteristics, Properties of DDBMS
• Distributed Processing
• Advantages & Disadvantages of DDBMS
• Types & Functions of DDBMS
• Main Issues of DDBMS
• Component Architecture for DDBMS
• Data Allocation & Fragmentation
• Transparencies

CONCEPTS

• So far, we have assumed a centralized database:
  - Data are stored in one location (e.g. a single hard disk).
  - A centralized database management system handles transactions.
  - To handle multiple requests, a client-server system is used:
    - Clients send requests for data to the server.
    - The server handles queries, transaction management, etc.


CONCEPTS
• This is not the only possibility.
• In many cases, it may be advantageous for data to be distributed:
  - Branches of a bank
  - Different parts of the government storing different kinds of data about a person
  - Different organizations sharing part of their data
• Thus, distributed databases.


CONCEPTS
• Data spread over multiple machines (also referred to as sites or nodes).
• A network interconnects the machines.
• Data are shared by users on multiple machines.

CONCEPTS
Distributed database: a logically interrelated collection of shared data, along with a description of the data, physically distributed over a computer network.

CONCEPTS
Distributed DBMS: the software system that permits the management of the distributed database and makes the distribution transparent to users.

CONCEPTS
Applications
• Users access the distributed database via applications.

CONCEPTS
Two types of applications
• Local application: an application that does not require data from other sites.
• Global application: an application that requires data from other sites.

TYPES OF DDBMS
• In a homogeneous distributed database:
  - All sites have identical software.
  - Sites are aware of each other and agree to cooperate in processing user requests.
  - Each site surrenders part of its autonomy in terms of the right to change schemas or software.
  - The database appears to the user as a single system.

TYPES OF DDBMS
• In a heterogeneous distributed database:
  - Different sites may use different schemas and software.
    - Difference in schema is a major problem for query processing.
    - Difference in software is a major problem for transaction processing.
  - Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing.

TYPE: HOMOGENEOUS DBMS
[Figure: sites running identical DBMSs]

TYPE: HETEROGENEOUS DBMS
[Figure: sites running non-identical DBMSs]

OBJECTIVES: DISTRIBUTED ARCHITECTURE
• Location Transparency
  - The user does not have to know the location of the data.
  - Data requests are automatically forwarded to the appropriate sites.
• Local Autonomy
  - A local site can operate with its database when network connections fail.
  - Each site controls its own data, security, logging, and recovery.

SIGNIFICANT TRADE-OFF
• Synchronous Distributed Database
  - All copies of the same data are always identical.
  - Data updates are immediately applied to all copies throughout the network.
  - Good for data integrity.
  - High overhead, slow response times.
• Asynchronous Distributed Database
  - Some data inconsistency is tolerated.
  - Data update propagation is delayed.
  - Lower data integrity.
  - Less overhead, faster response time.
NOTE: all this assumes replicated data (to be discussed later).
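The trade-off can be illustrated with a small simulation. This is only a minimal sketch, not a real replication protocol: the names `Replica`, `SyncStore` and `AsyncStore` are hypothetical and invented for illustration, and each "site" is just an in-memory dictionary.

```python
class Replica:
    """One copy of the data at a (hypothetical) site."""
    def __init__(self, name):
        self.name = name
        self.data = {}


class SyncStore:
    """Synchronous: a write is applied to every copy before it returns."""
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        for r in self.replicas:      # work grows with the number of copies
            r.data[key] = value


class AsyncStore:
    """Asynchronous: a write hits one copy; the others catch up later."""
    def __init__(self, replicas):
        self.replicas = replicas
        self.pending = []            # updates not yet propagated

    def write(self, key, value):
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        for key, value in self.pending:
            for r in self.replicas[1:]:
                r.data[key] = value
        self.pending.clear()


sync_sites = [Replica("site1"), Replica("site2")]
SyncStore(sync_sites).write("balance", 500)
sync_consistent = sync_sites[0].data == sync_sites[1].data

async_sites = [Replica("site1"), Replica("site2")]
store = AsyncStore(async_sites)
store.write("balance", 500)
stale_before = async_sites[1].data.get("balance")  # second copy not updated yet
store.propagate()
fresh_after = async_sites[1].data.get("balance")   # now caught up
```

In the asynchronous case the second copy is briefly out of date, which is the tolerated inconsistency the slide refers to; the synchronous store pays for consistency by touching every copy on each write.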

Advantages & Disadvantages
• Advantages
  - Increased reliability & availability
  - Local control
  - Modular growth
  - Lower communication costs
  - Faster response
• Disadvantages
  - Software cost & complexity
  - Processing overhead
  - Data integrity
  - Slow response (if the data are not distributed properly)

DISTRIBUTED PROCESSING
A centralized database that can be accessed over a computer network.

DISTRIBUTED PROCESSING
[Figure: terminals (T) attached to computers COM 1, COM 2 and COM 3, which are connected through a communication network to a single database (DB)]

FUNCTIONS OF DDBMS
Functions of a centralized DBMS plus:
• extended communication to allow the transfer of queries and data among sites
• extended system catalog to store data distribution details
• distributed query processing, including query optimization

FUNCTIONS OF DDBMS
• extended concurrency control to maintain consistency of replicated data
• extended recovery services to take account of failures of individual sites and communication links

TWO MAIN ISSUES IN DDBMS
• Making a query from one site to the same or a remote site.
• The logical database is partitioned into different data streams that are located at different sites.

COMPONENT ARCHITECTURE FOR DDBMS
• Local DBMS
• Data Communication Component
• Global System Catalog
• Distributed DBMS component

DATA ALLOCATION

• Centralized
• Fragmented
• Complete replication
• Selective replication

Distributed Data Storage
• Assume relational data model.
• Replication:
  - System maintains multiple copies of data, stored in different sites, for faster retrieval and fault tolerance.
• Fragmentation:
  - Relation is partitioned into several fragments stored in distinct sites.
• Replication and fragmentation can be combined:
  - Relation is partitioned into several fragments; system maintains several identical replicas of each such fragment.

Data Replication
• A relation or fragment of a relation is replicated if it is stored redundantly in two or more sites.
• Full replication of a relation is the case where the relation is stored at all sites.
• Fully redundant databases are those in which every site contains a copy of the entire database.

Data Replication (Cont.)
• Advantages of Replication:
  - Availability: failure of a site containing relation r does not result in unavailability of r if replicas exist.
  - Parallelism: queries on r may be processed by several nodes in parallel.
  - Reduced data transfer: relation r is available locally at each site containing a replica of r.

Data Replication (Cont.)
• Disadvantages of Replication:
  - Increased cost of updates: each replica of relation r must be updated.
  - Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented.
    - One solution: choose one copy as primary copy and apply concurrency control operations on the primary copy.

Data Fragmentation
• Division of relation r into fragments r1, r2, ..., rn which contain sufficient information to reconstruct relation r.
• Horizontal fragmentation: each tuple of r is assigned to one or more fragments.
• Vertical fragmentation: the schema for relation r is split into several smaller schemas.
  - All schemas must contain a common candidate key (or superkey) to ensure the lossless join property.
  - A special attribute, the tuple-id attribute, may be added to each schema to serve as a candidate key.
• Example: relation account with the following schema:
  - Account-schema = (branch-name, account-number, balance)

HORIZONTAL FRAGMENTATION
[Figure: original relation with attributes A1, A2, ..., An and tuples T1, T2, T3, ..., Tn; Site 1 holds tuples T1 ... T60 and Site 2 holds tuples T61 ... Tn, each with all attributes A1 ... An]
• Fragments contain subsets of complete tuples (all attributes at all sites).
• How to reconstruct: R = Rs1 ∪ Rs2 ∪ ... ∪ Rsn

VERTICAL FRAGMENTATION
[Figure: original relation R with attributes A1, A2, A3, A4 and tuples t1, t2, ..., tn; fragment RS1 (A1, A2, TID) at SITE1 and fragment RS2 (TID, A3, A4) at SITE2, with TID values 1, 2, ..., n]
• TID (tuple ID): hidden attribute to ensure accurate and simple join reconstruction.
• How to reconstruct: R = Rs1 ⋈ Rs2 ⋈ ... ⋈ Rsn
• Join condition: RS1.TID = RS2.TID


MIXED FRAGMENTATION
[Figure: relation R (A1, A2, A3, A4, A5) split horizontally by region (USA, Europe) and vertically into salary attributes and benefit attributes, giving the fragments Rs1, Rs2, Rs3 and Rs4]


Horizontal Fragmentation of account Relation

branch-name   account-number   balance
Hillside      A-305            500
Hillside      A-226            336
Hillside      A-155            62

account1 = σ_{branch-name = "Hillside"}(account)

branch-name   account-number   balance
Valleyview    A-177            205
Valleyview    A-402            10000
Valleyview    A-408            1123
Valleyview    A-639            750

account2 = σ_{branch-name = "Valleyview"}(account)
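The selection and the reconstruction by union can be sketched in a few lines. This is a minimal illustration using the account rows from this example; the helper name `select_branch` is hypothetical, and plain Python lists stand in for relations stored at different sites.

```python
# Rows of the account relation: (branch-name, account-number, balance)
account = [
    ("Hillside", "A-305", 500),
    ("Hillside", "A-226", 336),
    ("Valleyview", "A-177", 205),
    ("Valleyview", "A-402", 10000),
    ("Hillside", "A-155", 62),
    ("Valleyview", "A-408", 1123),
    ("Valleyview", "A-639", 750),
]


def select_branch(relation, branch):
    """Selection on branch-name: keep only the tuples of one branch."""
    return [row for row in relation if row[0] == branch]


account1 = select_branch(account, "Hillside")
account2 = select_branch(account, "Valleyview")

# Reconstruction: the union of the fragments gives back the relation.
reconstructed = set(account1) | set(account2)
```

Because every tuple satisfies exactly one of the two selection predicates, the fragments are disjoint and their union loses nothing.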

Vertical Fragmentation of employee-info Relation

branch-name   customer-name   tuple-id
Hillside      Lowman          1
Hillside      Camp            2
Valleyview    Camp            3
Valleyview    Kahn            4
Hillside      Kahn            5
Valleyview    Kahn            6
Valleyview    Green           7

deposit1 = π_{branch-name, customer-name, tuple-id}(employee-info)

account-number   balance   tuple-id
A-305            500       1
A-226            336       2
A-177            205       3
A-402            10000     4
A-155            62        5
A-408            1123      6
A-639            750       7

deposit2 = π_{account-number, balance, tuple-id}(employee-info)
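A short sketch of the same idea in code: project employee-info onto two schemas that both keep tuple-id, then rebuild the relation with a join on tuple-id. The rows are those of this example; the variable names and the dictionary-based join are illustrative choices, not part of the original slides.

```python
# employee-info: (tuple-id, branch-name, customer-name, account-number, balance)
employee_info = [
    (1, "Hillside", "Lowman", "A-305", 500),
    (2, "Hillside", "Camp", "A-226", 336),
    (3, "Valleyview", "Camp", "A-177", 205),
    (4, "Valleyview", "Kahn", "A-402", 10000),
    (5, "Hillside", "Kahn", "A-155", 62),
    (6, "Valleyview", "Kahn", "A-408", 1123),
    (7, "Valleyview", "Green", "A-639", 750),
]

# Projections: each fragment keeps tuple-id as the common candidate key.
deposit1 = [(tid, branch, customer) for tid, branch, customer, _, _ in employee_info]
deposit2 = [(tid, acct, balance) for tid, _, _, acct, balance in employee_info]

# Reconstruction: join the two fragments on tuple-id.
by_tid = {tid: (acct, balance) for tid, acct, balance in deposit2}
rejoined = [(tid, branch, customer) + by_tid[tid] for tid, branch, customer in deposit1]
```

Since tuple-id is unique, each row of `deposit1` matches exactly one row of `deposit2`, which is what makes the join lossless.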

Advantages of Fragmentation
• Horizontal:
  - allows parallel processing on fragments of a relation
  - allows a relation to be split so that tuples are located where they are most frequently accessed
• Vertical:
  - allows tuples to be split so that each part of the tuple is stored where it is most frequently accessed
  - tuple-id attribute allows efficient joining of vertical fragments
  - allows parallel processing on a relation
• Vertical and horizontal fragmentation can be mixed.
  - Fragments may be successively fragmented to an arbitrary depth.

REPLICATION and FRAGMENTATION
• The partition of attributes/tuples need not be disjoint.
[Figure: relation with attributes A1, A2, A3, A4, A5 split into the fragments (A1, A2, A3, A4) and (A2, A3, A4, A5); the overlap is a replication of attributes]

TRANSPARENCIES

TRANSPARENCIES IN DDBMS
• Transparencies hide implementation details from the user.
• Example in centralized databases: data independence.
• Main types of transparencies in DDBMS:
  o Distributed Transparency
  o Transaction Transparency

DISTRIBUTED TRANSPARENCY
Allows the user to see the database as a single, logical entity. If this transparency is exhibited, then the user does not need to know:
1. that the data are partitioned;
2. that data can be replicated at several sites;
3. where the data are located.

EXAMPLE
Staff (staffNo, fName, lName, position, sex, dob, salary, branchNo)
Vertical fragmentation:
S1 = π_{staffNo, position, sex, dob, salary}(Staff)
S2 = π_{staffNo, fName, lName, branchNo}(Staff)

EXAMPLE
Fragment S2 according to branch number. Assume that there are only three branches.
Horizontal fragmentation:
S21 = σ_{branchNo = 'B003'}(S2)
S22 = σ_{branchNo = 'B005'}(S2)
S23 = σ_{branchNo = 'B007'}(S2)

EXAMPLE
Assume that S1 and S2 are at site 5, and that:
• S21 is at site 3
• S22 is at site 5
• S23 is at site 7
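An allocation like this is the kind of information a global system catalog records. The following is a minimal sketch under that assumption: the dictionary `catalog` and the function `sites_needed` are hypothetical names for illustration, and only the fragments used in the later queries are listed.

```python
# Fragment-to-site allocation taken from the example
# (S1 at site 5, S21 at site 3, S22 at site 5, S23 at site 7).
catalog = {"S1": 5, "S21": 3, "S22": 5, "S23": 7}


def sites_needed(fragments):
    """Return the sorted list of sites a request over these fragments must contact."""
    return sorted({catalog[name] for name in fragments})


# A query over all Staff data touches every fragment.
manager_query_sites = sites_needed(["S1", "S21", "S22", "S23"])

# A query over branch B005 staff only needs S1 and S22, both at the same site.
b005_query_sites = sites_needed(["S1", "S22"])
```

With a lookup of this kind, requests can be forwarded to the right sites without the user naming them.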

FRAGMENTATION TRANSPARENCY
If it is provided, then the user does not need to know that the data is fragmented.
Example:
    SELECT fName, lName
    FROM Staff
    WHERE position = 'Manager'
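One way to picture fragmentation transparency is a view that rebuilds Staff from its fragments, so the query above runs unchanged. The sketch below uses Python's standard `sqlite3` module with all fragments in one in-memory database, which is only a stand-in: in a real DDBMS the fragments would live at different sites and the mapping would be done by the DDBMS itself. The sample rows (staff numbers, names, salaries) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S1  (staffNo TEXT PRIMARY KEY, position TEXT, sex TEXT, dob TEXT, salary INTEGER);
CREATE TABLE S21 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);
CREATE TABLE S22 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);
CREATE TABLE S23 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);

-- Global relation rebuilt from its fragments: union the horizontal
-- fragments of S2, then join the result with S1 on staffNo.
CREATE VIEW Staff AS
SELECT h.staffNo, h.fName, h.lName, v.position, v.sex, v.dob, v.salary, h.branchNo
FROM (SELECT * FROM S21 UNION ALL SELECT * FROM S22 UNION ALL SELECT * FROM S23) AS h
JOIN S1 AS v ON v.staffNo = h.staffNo;
""")

# Made-up sample data, one staff member per branch.
conn.executemany("INSERT INTO S1 VALUES (?, ?, ?, ?, ?)", [
    ("E1", "Manager", "F", "1980-01-01", 50000),
    ("E2", "Assistant", "M", "1990-01-01", 20000),
    ("E3", "Manager", "M", "1985-01-01", 45000),
])
conn.execute("INSERT INTO S21 VALUES ('E1', 'Ann', 'Archer', 'B003')")
conn.execute("INSERT INTO S22 VALUES ('E2', 'Bob', 'Baker', 'B005')")
conn.execute("INSERT INTO S23 VALUES ('E3', 'Cy', 'Clark', 'B007')")

# The user's query mentions only Staff, never the fragments.
managers = conn.execute(
    "SELECT fName, lName FROM Staff WHERE position = 'Manager' ORDER BY lName"
).fetchall()
```

The user-facing query is the one from the slide; only the view definition knows about S1, S21, S22 and S23.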

LOCATION TRANSPARENCY
If it is provided, then the user must know how the data has been fragmented but still does not have to know the location of the data.

LOCATION TRANSPARENCY
Example:
    SELECT fName, lName
    FROM S21
    WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
    UNION
    SELECT fName, lName
    FROM S22
    WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
    UNION
    SELECT fName, lName
    FROM S23
    WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')

LOCAL MAPPING TRANSPARENCY
If it is provided, then the user must know how the data has been fragmented as well as the location of the data.

LOCAL MAPPING TRANSPARENCY
Example:
    SELECT fName, lName
    FROM S21 AT SITE 3
    WHERE staffNo IN (SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
    UNION
    SELECT fName, lName
    FROM S22 AT SITE 5
    WHERE staffNo IN (SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
    UNION
    SELECT fName, lName
    FROM S23 AT SITE 7
    WHERE staffNo IN (SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')

TRANSACTION TRANSPARENCY
It maintains the distributed database's integrity and consistency.

QUERY PROCESSING IN DDBMS
Issue 1: Parallel processing across fragments
• Horizontal fragmentation into 2 fragments: Employee = Emp1 ∪ Emp2
• π_{LName}(σ_{salary > 40,000}(Employee)) = π_{LName}(σ_{salary > 40,000}(Emp1)) ∪ π_{LName}(σ_{salary > 40,000}(Emp2)), with the first term evaluated at Site 1 and the second at Site 2
• Execute in parallel on the fragments and union the results together.
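The "run on each fragment, then union" plan can be sketched with a thread pool standing in for the two sites. This is a toy illustration only: the fragment contents are made-up rows, `local_query` is a hypothetical helper, and real sites would be separate machines rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of Employee: (LName, salary)
emp1 = [("Smith", 30000), ("Wong", 55000)]       # held at Site 1
emp2 = [("Zelaya", 25000), ("Wallace", 43000)]   # held at Site 2


def local_query(fragment):
    """Projection on LName of the selection salary > 40,000, on one fragment."""
    return {lname for lname, salary in fragment if salary > 40000}


# Each fragment is processed independently, here by one worker per "site".
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_results = list(pool.map(local_query, [emp1, emp2]))

# Union the partial results to obtain the answer for the whole relation.
result = set().union(*partial_results)
```

The union of the per-fragment answers equals the answer computed over the whole Employee relation, which is why the work can be split this way.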

QUERY PROCESSING IN DDBMS
[Figure: three sites, Site1, Site2 and Site3]
• Joins are symmetric and associative: (A ⋈ B) ⋈ C = A ⋈ (B ⋈ C)
• Parallel processing: (σ_xx(A)) ⋈ (B ⋈ C)
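The two properties can be checked on a toy example. This is a minimal sketch: relations are lists of dictionaries, `natural_join` is a naive nested-loop join written for illustration, and the relations A, B and C contain made-up values.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for x in r:
        for y in s:
            common = x.keys() & y.keys()
            if all(x[k] == y[k] for k in common):
                out.append({**x, **y})
    return out


def as_set(relation):
    """Order-independent form of a relation, for comparing results."""
    return {frozenset(t.items()) for t in relation}


A = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
B = [{"b": 10, "c": 100}, {"b": 20, "c": 200}, {"b": 30, "c": 300}]
C = [{"c": 100, "d": "x"}, {"c": 300, "d": "y"}]

left = natural_join(natural_join(A, B), C)    # (A join B) join C
right = natural_join(A, natural_join(B, C))   # A join (B join C)

associative = as_set(left) == as_set(right)
symmetric = as_set(natural_join(A, B)) == as_set(natural_join(B, A))
```

Because the grouping does not change the result, a distributed optimizer is free to pick the join order that moves the least data between sites.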

QUERY PROCESSING IN DDBMS
Join Strategies
R = π_{Fnames, Cnames, Dnames}(Employee ⋈ Department), joining Mgrssn to ssn
• Site 1: Employee, 10,000 records, 1,000,000 bytes
• Site 2: Department, 100 records, 3000 bytes
• Site 3 (result site): result R, 100 records, 2000 bytes
Strategies:
1) Ship both relations to the result site and join there: 1,003,000 bytes transferred
2) Ship Employee to site 2, join at 2, ship results to 3: 1,002,000 bytes transferred
3) Ship Department to site 1, join at 1, ship results to 3: 5,000 bytes transferred
Goal: minimize the total communication cost of data transfer.
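The comparison is simple arithmetic over the relation sizes. The sketch below assumes the sizes are read from the slide as Employee = 1,000,000 bytes at site 1, Department = 3000 bytes at site 2, and a 2000-byte result needed at site 3; the dictionary keys are just descriptive labels.

```python
# Sizes in bytes (assumed reading of the slide's figures).
EMPLOYEE = 1_000_000   # Site 1: 10,000 records
DEPARTMENT = 3_000     # Site 2: 100 records
RESULT = 2_000         # join result, needed at Site 3: 100 records

# Bytes that must cross the network under each strategy.
costs = {
    "1: ship both relations to site 3": EMPLOYEE + DEPARTMENT,
    "2: ship Employee to site 2, result to site 3": EMPLOYEE + RESULT,
    "3: ship Department to site 1, result to site 3": DEPARTMENT + RESULT,
}

# The cheapest strategy is the one that transfers the fewest bytes.
best = min(costs, key=costs.get)
```

Strategy 3 wins because it moves only the small relation and the small result, leaving the large Employee relation where it is.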

THANKS!