• CS263 Lecture 16
⇒ Parallel DBMS - What and Why? ⇒ What is a Client/Server DBMS? ⇒ Why do we need Distributed DBMSs? ⇒ Date’s rules for a Distributed DBMS ⇒ Benefits of a Distributed DBMS ⇒ Issues associated with a Distributed DBMS ⇒ Disadvantages of a Distributed DBMS
PARALLEL DATABASE SYSTEM
WHY DO WE NEED THEM?
• More and More Data! We have databases that hold a very large amount of data, in the order of $10^{12}$ bytes: 1,000,000,000,000 bytes!
• Faster and Faster Access! We have data applications that need to process data at very high speeds: tens of thousands of transactions per second!
SINGLE-PROCESSOR DBMSs AREN’T UP TO THE JOB!
PARALLEL DBMSs BENEFITS OF A PARALLEL DBMS
Improves Response Time. INTRAQUERY PARALLELISM: It is possible to process ‘sub-tasks’ of a transaction in parallel with each other.
Improves Throughput. INTERQUERY PARALLELISM: It is possible to process a number of transactions in parallel with each other.
PARALLEL DBMSs HOW TO MEASURE THE BENEFITS
Speed-Up. As you multiply resources by a certain factor, the time taken to execute a transaction should be reduced by the same factor:
10 seconds to scan a DB of 10,000 records using 1 CPU
1 second to scan a DB of 10,000 records using 10 CPUs
Scale-up. As you multiply resources, the size of a task that can be executed in a given time should be increased by the same factor:
1 second to scan a DB of 1,000 records using 1 CPU
1 second to scan a DB of 10,000 records using 10 CPUs
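The two measures above can be written as simple ratios. The sketch below is a minimal illustration, assuming the usual textbook definitions (speed-up = old time / new time for the same task; scale-up = old time / new time when task size and resources grow by the same factor); the function names are hypothetical. It plugs in the slide's scan figures, plus one clearly hypothetical sub-linear case that is not from the slide.

```python
def speed_up(t_small_system: float, t_large_system: float) -> float:
    """Time for a task on the small system divided by the time for the SAME
    task on a system with N times the resources. Linear (ideal) speed-up = N."""
    return t_small_system / t_large_system


def scale_up(t_small_task: float, t_large_task: float) -> float:
    """Time for a task on the small system divided by the time for an
    N-times-larger task on an N-times-larger system. Linear (ideal) = 1.0."""
    return t_small_task / t_large_task


N = 10  # resource multiplier used on the slide: 1 CPU -> 10 CPUs

# Slide figures: 10,000 records take 10 s on 1 CPU and 1 s on 10 CPUs.
s = speed_up(10.0, 1.0)
print(f"speed-up = {s:.1f} (linear would be {N})")

# Slide figures: 1,000 records in 1 s on 1 CPU; 10,000 records in 1 s on 10 CPUs.
u = scale_up(1.0, 1.0)
print(f"scale-up = {u:.1f} (linear would be 1.0)")

# Hypothetical sub-linear case (NOT from the slide): 10 CPUs take 1.25 s.
print(f"sub-linear speed-up = {speed_up(10.0, 1.25):.1f}")
```

A speed-up below N, or a scale-up below 1.0, is what the slides call sub-linear.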
PARALLEL DBMSs SPEED-UP
[Figure: number of transactions/second against number of CPUs, contrasting linear speed-up (ideal) with sub-linear speed-up.]
PARALLEL DBMSs SCALE-UP
[Figure: number of transactions/second against number of CPUs and database size (5 CPUs with a 1 GB database, 10 CPUs with a 2 GB database), contrasting linear scale-up (ideal) with sub-linear scale-up.]
Shared Memory – Parallel Database Architecture
[Diagram: several CPUs all connected to a single shared memory.]
Shared Disk – Parallel Database Architecture
[Diagram: several CPUs, each with its own memory (M), all sharing the same disks.]
Shared Nothing – Parallel Database Architecture
[Diagram: several CPUs, each with its own private memory (M); no memory or disk is shared between them.]
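To make the shared-nothing idea concrete, here is a minimal sketch of a parallel scan in which each node owns a disjoint partition and scans only that partition, while a coordinator merges the results. Hash partitioning on the key is an assumption (the slides do not say how data is spread), and all names are hypothetical. Python threads actually share memory, so the disjoint lists only simulate "nothing shared".

```python
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 3  # hypothetical cluster size


def partition(rows, num_nodes):
    """Hash-partition (key, value) rows so each node owns a disjoint subset."""
    parts = [[] for _ in range(num_nodes)]
    for key, value in rows:
        parts[hash(key) % num_nodes].append((key, value))
    return parts


def local_scan(part, predicate):
    """Work done by one node: scan only the partition it owns."""
    return [row for row in part if predicate(row)]


def parallel_scan(parts, predicate):
    """Coordinator: run every node's scan concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        results = list(pool.map(local_scan, parts, [predicate] * len(parts)))
    return sorted(row for result in results for row in result)


rows = [(i, i * 10) for i in range(1, 13)]
parts = partition(rows, NUM_NODES)
print(parallel_scan(parts, lambda row: row[1] > 80))
```

Because no node touches another node's partition, adding nodes (and re-partitioning) spreads both the data and the scan work.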
MAINFRAME DATABASE SYSTEM
[Diagram: dumb terminals connected by a specialised network connection to a mainframe computer, which handles the presentation logic, business logic and data logic.]
CLIENT/SERVER DATABASE SYSTEM
CLIENT/SERVER DBMS CLIENT PROCESS
⇒ Manages user interface
⇒ Accepts user data
⇒ Processes application/business logic
⇒ Generates database requests (SQL)
⇒ Transmits database requests to server
⇒ Receives results from server
⇒ Formats results according to application logic
⇒ Presents results to the user
CLIENT/SERVER DBMS SERVER PROCESS
⇒ Accepts database requests
⇒ Processes database requests:
   • Performs integrity checks
   • Handles concurrent access
   • Optimises queries
   • Performs security checks
   • Enacts recovery routines
⇒ Transmits result of database request to client
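The division of labour between the two processes can be sketched in a few lines. This is a toy model, assuming an in-memory SQLite database stands in for the server's DBMS; the class names, the read-only "security check" and the sample rows are all hypothetical, and a real server would also handle concurrency and recovery. The client here holds the business logic, as in a fat-client design.

```python
import sqlite3


class DatabaseServer:
    """Hypothetical server process: accepts requests, processes them, returns results."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE account (acct_no INTEGER PRIMARY KEY, customer TEXT, balance REAL)"
        )
        # Illustrative sample rows only.
        self.conn.executemany(
            "INSERT INTO account VALUES (?, ?, ?)",
            [(200, "JONES", 1000.17), (324, "GRAY", 200.00)],
        )

    def handle_request(self, sql, params=()):
        # Toy security check: this server accepts read-only requests only.
        if not sql.lstrip().upper().startswith("SELECT"):
            raise PermissionError("only SELECT requests are accepted")
        # The DBMS parses, optimises and executes the request.
        return self.conn.execute(sql, params).fetchall()


class Client:
    """Hypothetical client process: user interface, business logic, SQL generation."""

    def __init__(self, server):
        self.server = server

    def show_balance(self, acct_no):
        sql = "SELECT customer, balance FROM account WHERE acct_no = ?"  # generate request
        rows = self.server.handle_request(sql, (acct_no,))  # transmit and receive
        if not rows:
            return f"Account {acct_no} not found"
        customer, balance = rows[0]
        return f"{customer}: £{balance:,.2f}"  # format the result for the user


server = DatabaseServer()
client = Client(server)
print(client.show_balance(200))
```

Moving `show_balance`'s logic into the server (for example as a stored procedure) would turn this into the thin-client arrangement.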
CLIENT/SERVER DBMS ARCHITECTURE (FAT CLIENT)
[Diagram: clients #1, #2 and #3 each run the presentation logic and business logic and send data requests to the server, which runs the data logic against the database and returns data responses.]
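CLIENT/SERVER DBMS ARCHITECTURE (THIN CLIENT)
[Diagram: clients #1, #2 and #3 run only the presentation logic; the server runs both the business logic (e.g. in PL/SQL) and the data logic against the database, exchanging data requests and data responses with the clients.]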
DISTRIBUTED PROCESSING ARCHITECTURE
[Diagram: LANs of clients at Stratford, Leyton, Barking and Leytonstone connected over a wide area network to a single DBMS.]
DISTRIBUTED DATABASE SYSTEM
DISTRIBUTED DATABASES WHAT IS A DISTRIBUTED DATABASE?
A distributed database system is a collection of logically related databases that co-operate in a transparent manner. Transparent implies that each user within the system may access all of the data within all of the databases as if they were a single database.
There should be ‘location independence’, i.e. as the user is unaware of where the data is located, it is possible to move the data from one physical location to another without affecting the user.
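Location independence can be illustrated with a catalog that maps each table to the site holding it. The sketch below is a toy model, assuming a single in-process dictionary plays the role of the catalog and each site's store; every name in it is hypothetical. The user's `get` call never mentions a site, so relocating the table leaves the user's code and answer unchanged.

```python
class DistributedDatabase:
    """Toy model of location independence (all names hypothetical)."""

    def __init__(self):
        self.sites = {"STRATFORD": {}, "BARKING": {}}  # each site's local store
        self.catalog = {}  # table name -> site currently holding it

    def create_table(self, table, site):
        self.sites[site][table] = {}
        self.catalog[table] = site

    def put(self, table, key, row):
        self.sites[self.catalog[table]][table][key] = row

    def get(self, table, key):
        # The user names only the table; the catalog resolves the site.
        return self.sites[self.catalog[table]][table].get(key)

    def move_table(self, table, new_site):
        # Physically relocate the data, then update the catalog entry.
        old_site = self.catalog[table]
        self.sites[new_site][table] = self.sites[old_site].pop(table)
        self.catalog[table] = new_site


db = DistributedDatabase()
db.create_table("STUDENT", "STRATFORD")
db.put("STUDENT", 200, {"name": "JONES"})
before = db.get("STUDENT", 200)
db.move_table("STUDENT", "BARKING")  # data moves to another site...
after = db.get("STUDENT", 200)  # ...but the user's call is unchanged
print(before == after, db.catalog["STUDENT"])
```

In a real DDBMS this lookup is the job of the global system catalog, and how that catalog is itself stored is a design choice of its own.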
DISTRIBUTED DATABASE ARCHITECTURE
[Diagram: Stratford, Barking, Leyton and Leytonstone each have their own DBMS serving a LAN of clients; the sites are connected by a wide area network.]
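M:N CLIENT/SERVER DBMS ARCHITECTURE
[Diagram: clients #1, #2 and #3 each connect directly to server #1 and server #2, each of which has its own database.]
NOT TRANSPARENT! Each client has to know which server holds the data it needs.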
COMPONENTS OF A DDBMS
[Diagram: Site 1 and Site 2, each running a DDBMS, DC, LDBMS and GSC over a local DB, connected by a computer network.]
LDBMS = Local DBMS; DC = Data Communications; GSC = Global System Catalog; DDBMS = Distributed DBMS
DISTRIBUTED DATABASES ADVANTAGES
• Reduced Communication Overhead: Most data access is local, which is less expensive and performs better.
• Improved Processing Power: Instead of one server handling the full database, we now have a collection of machines handling the same database.
• Removal of Reliance on a Central Site: If a server fails, then the only part of the system that is affected is the relevant local site. The rest of the system remains functional and available.
DISTRIBUTED DATABASES ADVANTAGES
• Expandability: It is easier to accommodate increases in the size of the global (logical) database.
• Local autonomy: The database is brought nearer to its users. This can effect a cultural change, as it potentially allows greater control over local data.
DISTRIBUTED DATABASES DATE’S TWELVE RULES FOR A DDBMS
A distributed system looks exactly like a non-distributed system to the user!
1. Local autonomy
2. No reliance on a central site
3. Continuous operation
4. Location independence
5. Fragmentation independence
6. Replication independence
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence
DISTRIBUTED DATABASES ISSUES ⇒ Data Allocation ⇒ Data Fragmentation ⇒ Distributed Catalogue Management ⇒ Distributed Transactions ⇒ Distributed Queries (see chapter 20)
DISTRIBUTED DATABASES DATA ALLOCATION METRICS
1. Locality of reference: Is the data near to the sites that need it?
2. Reliability and availability: Does the strategy improve fault tolerance and accessibility?
3. Storage costs: How does the strategy affect the availability and cost of data storage?
4. Performance: Does the strategy result in bottlenecks or under-utilisation of resources?
5. Communication costs: How much network traffic will result from the strategy?
DISTRIBUTED DATABASES DATA ALLOCATION STRATEGIES
Strategy | Locality of Reference | Reliability/Availability | Storage Costs | Performance | Communication Costs
Centralised | Lowest | Lowest | Lowest | Unsatisfactory | Highest
Partitioned/Fragmented | High | Low (item) – High (system) | Lowest | Satisfactory | Low
Complete Replication | Highest | Highest | Highest | High | High (update) – Low (read)
Selective Replication | High | Low (item) – High (system) | Average | Satisfactory | Low
DISTRIBUTED DATABASES WHY FRAGMENT DATA?
⇒ Usage: Applications are usually interested in ‘views’, not whole relations.
⇒ Efficiency: It’s more efficient if data is close to where it is frequently used.
⇒ Parallelism: It is possible to run several ‘sub-queries’ in parallel.
⇒ Security: Data not required by local applications is not stored at the local site.
DISTRIBUTED DATABASES HORIZONTAL DATA FRAGMENTATION
ACCOUNT | CUSTOMER | BRANCH | BALANCE
200 | JONES | STRATFORD | 1000.17
324 | GRAY | BARKING | 200.00
345 | SMITH | STRATFORD | 333.00
350 | GREEN | BARKING | 340.14
400 | ONO | BARKING | 500.00
456 | KHAN | STRATFORD | 23.00
Horizontal Fragmentation: Consists of a Restriction on a Relation, e.g. $\sigma_{branch=\text{'Stratford'}}(Account)$
DISTRIBUTED DATABASES HORIZONTAL DATA FRAGMENTATION
STRATFORD BRANCH
ACCT NO. | CUSTOMER | BRANCH | BALANCE
200 | JONES | STRATFORD | 1000.17
345 | SMITH | STRATFORD | 333.00
456 | KHAN | STRATFORD | 23.00
BARKING BRANCH
ACCT NO. | CUSTOMER | BRANCH | BALANCE
324 | GRAY | BARKING | 200.00
350 | GREEN | BARKING | 340.14
400 | ONO | BARKING | 500.00
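The branch fragments are just restrictions (selections) on ACCOUNT, and their union gives the original relation back. A minimal sketch, assuming rows are plain Python tuples and using a hypothetical `restrict` helper; BALANCE is left out for brevity. It also checks the two properties a horizontal fragmentation should have: the fragments are disjoint and together they reconstruct the relation.

```python
# ACCOUNT relation from the slide (ACCOUNT, CUSTOMER, BRANCH); BALANCE omitted.
ACCOUNT = [
    (200, "JONES", "STRATFORD"),
    (324, "GRAY", "BARKING"),
    (345, "SMITH", "STRATFORD"),
    (350, "GREEN", "BARKING"),
    (400, "ONO", "BARKING"),
    (456, "KHAN", "STRATFORD"),
]
BRANCH = 2  # column index of BRANCH


def restrict(relation, predicate):
    """Restriction (selection, sigma): keep only the tuples satisfying predicate."""
    return [t for t in relation if predicate(t)]


stratford = restrict(ACCOUNT, lambda t: t[BRANCH] == "STRATFORD")
barking = restrict(ACCOUNT, lambda t: t[BRANCH] == "BARKING")

# Disjointness: no tuple is stored in both fragments.
disjoint = not (set(stratford) & set(barking))
# Reconstruction: the union of the fragments gives back the original relation.
reconstructed = sorted(stratford + barking) == sorted(ACCOUNT)
print([t[0] for t in stratford], [t[0] for t in barking], disjoint, reconstructed)
```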
DISTRIBUTED DATABASES VERTICAL DATA FRAGMENTATION
S# | NAME | SITE | PHONE NO | LOGIN | PASSWORD
200 | JONES | STRATFORD | 0208-500-9000 | JON200T | XXYY22
324 | GRAY | BARKING | 0208-545-7528 | GRA324S | ZZEE56
456 | KHAN | STRATFORD | 0208-500-5821 | KHA456T | KJTR78
Vertical Fragmentation: Consists of a Projection on a Relation, e.g. $\Pi_{S\#,\,NAME,\,SITE,\,PHONE\ NO}(Student)$
DISTRIBUTED DATABASES VERTICAL DATA FRAGMENTATION
STUDENT ADMINISTRATION
S# | NAME | SITE | PHONE NO.
200 | JONES | STRATFORD | 0208-500-9000
324 | GRAY | BARKING | 0208-545-7528
456 | KHAN | STRATFORD | 0208-500-5821
NETWORK ADMINISTRATION
S# | LOGIN-ID | PASSWORD
200 | JON200T | XXYY22
324 | GRA324S | ZZEE56
456 | KHA456T | KJTR78
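Each vertical fragment is a projection that keeps the key S#, which is what lets the full Student relation be rebuilt by joining the fragments on S#. A minimal sketch, assuming rows are Python dicts; `project` and `join_on_key` are hypothetical helpers. Duplicate elimination is skipped because every fragment keeps the key, so no two projected tuples can coincide.

```python
STUDENT = [
    {"S#": 200, "NAME": "JONES", "SITE": "STRATFORD", "PHONE NO": "0208-500-9000",
     "LOGIN-ID": "JON200T", "PASSWORD": "XXYY22"},
    {"S#": 324, "NAME": "GRAY", "SITE": "BARKING", "PHONE NO": "0208-545-7528",
     "LOGIN-ID": "GRA324S", "PASSWORD": "ZZEE56"},
    {"S#": 456, "NAME": "KHAN", "SITE": "STRATFORD", "PHONE NO": "0208-500-5821",
     "LOGIN-ID": "KHA456T", "PASSWORD": "KJTR78"},
]


def project(relation, attrs):
    """Projection (pi): keep only attrs. Each fragment keeps the key S#,
    so no duplicate tuples can arise and none need removing."""
    return [{a: row[a] for a in attrs} for row in relation]


def join_on_key(left, right, key):
    """Rebuild full tuples by joining two fragments on their shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


student_admin = project(STUDENT, ["S#", "NAME", "SITE", "PHONE NO"])
network_admin = project(STUDENT, ["S#", "LOGIN-ID", "PASSWORD"])

print(network_admin[0])
print(join_on_key(student_admin, network_admin, "S#") == STUDENT)
```

Keeping passwords only in the network-administration fragment is the security benefit of fragmentation in action.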
DISTRIBUTED DATABASES DISTRIBUTED CATALOG MANAGEMENT
• Centralised Global Catalog: One site maintains the full global catalog. All changes to any local system catalog have to be propagated to the site maintaining the global catalog. Bad performance, single point of failure, compromises site autonomy.
• Dispersed Catalog: There is no physical global catalog. Each time a remote data item is required, the catalogues from ALL other sites are examined for the item. This has severe performance penalties.
DISTRIBUTED DATABASES DISTRIBUTED CATALOG MANAGEMENT
• Replicated Global Catalog: Each site maintains its own global catalog. A detail of every data item added, changed or deleted locally has to be propagated to ALL other sites. Although this greatly speeds up remote data location, it is very inefficient to maintain.
• Local-Master Catalog: Each site maintains both its local system catalog and a catalog of all of its data items that are replicated at other sites. This avoids compromising site autonomy, is fairly efficient, and is not a single point of failure.
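DISTRIBUTED DATABASES DISTRIBUTED TRANSACTIONS
ATOMIC DISTRIBUTED TRANSACTION
[Diagram: Stratford clients submit a global transaction to the Stratford DBMS; its three parts run against the Stratford, Barking and Leyton DBMSs, each managing its own database.]
Global Transaction:
(a) Debit Stratford A/C £500 (Stratford DBMS/DB)
(b) Credit Barking A/C £350 (Barking DBMS/DB)
(c) Credit Leyton A/C £150 (Leyton DBMS/DB)
Atomic means either all three parts are committed or none of them are.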
TWO-PHASE COMMIT (2PC)
[Diagram: the participating sites reply ‘OK’ to the coordinator.]
TWO-PHASE COMMIT (2PC) – ABORT
[Diagram: the coordinator sends ‘Global Abort’ to the participating sites.]
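The two phases can be sketched with the Stratford/Barking/Leyton transaction from the atomic distributed transaction slide. This is a simplified, single-process model, assuming each site votes ‘OK’ only if its local change can be applied (here: a debit must not overdraw the account); the class, method and account names are hypothetical, and real 2PC also needs logging, timeouts and recovery from a coordinator failure.

```python
class Participant:
    """One site's local DBMS taking part in a global transaction (toy model)."""

    def __init__(self, site, balances):
        self.site = site
        self.balances = balances  # committed state: account -> balance
        self.pending = None  # changes held between voting and the decision

    def prepare(self, changes):
        """Phase 1: vote OK only if every local change can be applied."""
        for acct, delta in changes.items():
            if self.balances.get(acct, 0) + delta < 0:
                return "ABORT"  # e.g. the debit would overdraw the account
        self.pending = changes
        return "OK"

    def commit(self):
        for acct, delta in (self.pending or {}).items():
            self.balances[acct] = self.balances.get(acct, 0) + delta
        self.pending = None

    def abort(self):
        self.pending = None  # discard the prepared changes


def two_phase_commit(work):
    """work: list of (participant, changes). Returns the coordinator's decision."""
    votes = [p.prepare(changes) for p, changes in work]  # phase 1: voting
    if all(v == "OK" for v in votes):
        for p, _ in work:  # phase 2: every site voted OK
            p.commit()
        return "GLOBAL COMMIT"
    for p, _ in work:  # phase 2: at least one site refused
        p.abort()
    return "GLOBAL ABORT"


stratford = Participant("STRATFORD", {"S-100": 800})  # hypothetical accounts
barking = Participant("BARKING", {"B-200": 0})
leyton = Participant("LEYTON", {"L-300": 0})
work = [(stratford, {"S-100": -500}), (barking, {"B-200": 350}), (leyton, {"L-300": 150})]

print(two_phase_commit(work))  # enough funds: every site votes OK
print(two_phase_commit(work))  # Stratford would be overdrawn, so it votes ABORT
print(stratford.balances, barking.balances, leyton.balances)
```

The second run shows atomicity: Barking and Leyton had voted OK, but the global abort leaves their balances untouched.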
DISTRIBUTED DATABASES DISADVANTAGES OF DDBMSs
• Architectural complexity.
• Cost.
• Security.
• Integrity control more difficult.
• Lack of standards.
• Lack of experience.
• Database design more complex.