
UNIT-3

Introduction to Parallel Databases and I/O Parallelism


Introduction to Parallel Databases
• Organizations today must handle huge volumes of data at high transfer rates.
• Client-server and centralized systems are not efficient enough for such requirements.
• The need to improve system efficiency is what brings the parallel database into the picture.
• A parallel database system seeks to improve performance by carrying out operations in parallel.
Need for Parallel Databases
• Multiple resources, such as CPUs and disks, are used in parallel.
• Operations are performed simultaneously, as opposed to serial processing.
• A parallel server can allow users on multiple machines to access a single database.
• It also parallelizes many operations, such as data loading, index building, and query processing and evaluation.
Advantages

• Performance improvement –
• By connecting multiple resources such as CPUs and disks in parallel, we can significantly increase the performance of the system.
• High availability –
• In a parallel database, nodes have little contact with each other, so the failure of one node does not cause the entire system to fail. This amounts to significantly higher database availability.
• Proper resource utilization –
• Because work executes in parallel, the CPU rarely sits idle, so resources are properly utilized.
• Increased reliability –
• When one site fails, execution can continue at another available site that holds a copy of the data, making the system more reliable.
Speedup
• The ability to execute the tasks in less time by increasing the number of resources is called Speedup.

• Speedup = time_original / time_parallel
• Where,
• time_original = time required to execute the task using 1 processor
• time_parallel = time required to execute the task using 'n' processors
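A worked example may make the ratio concrete. The sketch below simply divides the single-processor time by the n-processor time, as in the definition above; the function name `speedup` and the timings (120 s and 40 s) are hypothetical values chosen for illustration, not measurements.

```python
def speedup(time_original: float, time_parallel: float) -> float:
    """Return time_original / time_parallel, the speedup ratio."""
    if time_parallel <= 0:
        raise ValueError("time_parallel must be positive")
    return time_original / time_parallel


# Hypothetical timings: 120 s on 1 processor, 40 s on 4 processors.
s = speedup(120.0, 40.0)
print(s)  # 3.0
```

With 4 processors, a speedup of 4 would be linear; the hypothetical value of 3 here falls short of that.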
Parallelism in Databases

• Data can be partitioned across multiple disks for parallel I/O.
• Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel.
• Data can be partitioned so that each processor works independently on its own partition.
• Queries are expressed in a high-level language (SQL, translated to relational algebra).
• This makes parallelization easier.
• Different queries can be run in parallel with each other.
• Concurrency control takes care of conflicts.
• Databases naturally lend themselves to parallelism.
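To illustrate how data might be spread across disks, here is a minimal sketch of two common partitioning strategies, round-robin and hash partitioning. The slides do not name a specific strategy, so both are assumptions chosen for illustration; the "disks" are plain Python lists, and the function names and sample rows are hypothetical.

```python
def round_robin_partition(rows, n_disks):
    """Send the i-th row to disk (i mod n_disks)."""
    disks = [[] for _ in range(n_disks)]
    for i, row in enumerate(rows):
        disks[i % n_disks].append(row)
    return disks


def hash_partition(rows, n_disks, key):
    """Send each row to the disk chosen by hashing its partitioning key.

    Integer keys are used here because Python hashes small non-negative
    ints to themselves; string hashes vary between runs.
    """
    disks = [[] for _ in range(n_disks)]
    for row in rows:
        disks[hash(key(row)) % n_disks].append(row)
    return disks


# Hypothetical relation with 7 rows, spread over 3 disks.
rows = [{"id": i, "name": f"row{i}"} for i in range(7)]
rr = round_robin_partition(rows, 3)
hp = hash_partition(rows, 3, key=lambda r: r["id"])
print([[r["id"] for r in d] for d in rr])  # [[0, 3, 6], [1, 4], [2, 5]]
```

Once the rows are split this way, each disk can be read at the same time, and each processor can work on its own partition.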
Parallel in Database
• A parallel computer, or multiprocessor, is a form of distributed system made up of a number of nodes (processors, memories, and disks) connected by a very fast network within one or more cabinets in the same room.
• There are two kinds of multiprocessors, depending on how these nodes are coupled: tightly coupled and loosely coupled.
• Tightly coupled multiprocessors contain multiple processors that are connected at the bus level and share memory.
• Mainframe computers, supercomputers, and modern multicore processors all use tight coupling to boost performance.
• Loosely coupled multiprocessors, now referred to as computer clusters, or clusters for short, are based on multiple commodity computers interconnected via a high-speed network.
Need of Parallelism
Goal of Parallelism
Parallel Database Architecture
Parallel Architecture
1. Session manager – It plays the role of a transaction monitor, providing support for client interactions with the server. It performs the connections and disconnections between the client processes and the two other subsystems.

2. Request manager – It receives client requests related to query compilation and execution. It can access the database directory, which holds all meta-information about data and programs.

3. Data manager – It provides all the low-level functions needed to run compiled queries in parallel, i.e., database operator execution, parallel transaction support, cache management, etc.
If the request manager is able to compile dataflow control, then synchronization and communication among data manager modules are possible. Otherwise, transaction control and synchronization must be done by a request manager module.
General Architecture of a Parallel Database system
Query Parallelism
Forms of Query Parallelism
Interquery Parallelism
• Queries/transactions execute in parallel with one another.
• Increases transaction throughput; used primarily to scale up a transaction-processing system to support a larger number of transactions per second.
• Easiest form of parallelism to support, particularly in a shared-memory parallel database, because even sequential database systems support concurrent processing.
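As a rough illustration of interquery parallelism, the sketch below submits several independent read-only "queries" to a thread pool so that they run alongside one another. Everything here is hypothetical: the `ORDERS` table, the query names, and the `run_queries` helper are invented for the example, and plain Python functions stand in for SQL queries. Because the queries only read, no concurrency control is needed in this toy case.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory table standing in for a database relation.
ORDERS = [
    {"id": 1, "region": "north", "amount": 120},
    {"id": 2, "region": "south", "amount": 80},
    {"id": 3, "region": "north", "amount": 50},
    {"id": 4, "region": "east", "amount": 200},
]

# Three independent read-only queries over the same table.
QUERIES = {
    "total_amount": lambda: sum(r["amount"] for r in ORDERS),
    "north_orders": lambda: sum(1 for r in ORDERS if r["region"] == "north"),
    "largest_order": lambda: max(r["amount"] for r in ORDERS),
}


def run_queries(queries):
    """Submit every query at once and collect the results by name."""
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        futures = {name: pool.submit(q) for name, q in queries.items()}
        return {name: f.result() for name, f in futures.items()}


results = run_queries(QUERIES)
print(results)
# {'total_amount': 450, 'north_orders': 2, 'largest_order': 200}
```

Each query takes as long as it would alone; what improves is the number of queries completed per unit of time.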
Intraquery Parallelism

• Execution of a single query in parallel on multiple processors/disks; important for speeding up long-running queries.
• Two complementary forms of intraquery parallelism:
• Intraoperation Parallelism – parallelize the execution of each individual operation in the query.
• Interoperation Parallelism – execute the different operations in a query expression in parallel.
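Intraoperation parallelism can be sketched with a single aggregation (a sum) split across partitions: each worker sums its own partition, and the partial results are then combined. This is a minimal sketch under stated assumptions. The `parallel_sum` helper and the sample partitions are hypothetical, and Python threads are used only to show the structure (in CPython they do not run CPU-bound work truly in parallel).

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(partitions):
    """Sum each partition in its own worker, then merge the partial sums."""
    if not partitions:
        return 0
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_sums = list(pool.map(sum, partitions))
    return sum(partial_sums)


# Hypothetical column values, already partitioned across three disks.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
total = parallel_sum(partitions)
print(total)  # 45
```

The same split, work, and merge shape applies to other operations such as sort and join, although their merge steps are more involved.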
Pipeline Parallelism
Mixed Parallelism
Introduction to Transaction Management
Transaction Concept
Examples of Transactions
Properties of Transactions
Atomicity
Consistency
Isolation
Durability
Objectives of Transaction Management
Transaction structure
Basic Transaction Primitives
Transaction serialization and recovery
Schedules
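Schedule - 1

Transaction 1 (transfers 50 from A to B):
A = 100
A := 100 - 50
A = 50
B = 50
B := 50 + 50
B = 100

Transaction 2 (transfers 10% of A to B):
A = 50
temp := 50 * 0.1 = 5
A := 50 - 5
A = 45
B = 100
B := 100 + 5
B = 105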
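Schedule - 2

Transaction 2 (transfers 10% of A to B):
A = 100
temp := 100 * 0.1 = 10
A := 100 - 10
A = 90
B = 50
B := 50 + 10
B = 60

Transaction 1 (transfers 50 from A to B):
A = 90
A := 90 - 50
A = 40
B = 60
B := 60 + 50
B = 110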
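Schedule - 3

Transaction 1 (transfers 50 from A to B), first part:
A = 100
A := 100 - 50
A = 50

Transaction 2 (transfers 10% of A to B), first part:
A = 50
temp := 50 * 0.1 = 5
A := 50 - 5
A = 45

Transaction 1, second part:
B = 50
B := 50 + 50
B = 100

Transaction 2, second part:
B = 100
B := 100 + 5
B = 105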
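The three traces above can be replayed with a small simulator. This is a sketch under stated assumptions: the names `T1` (the transfer of 50 from A to B) and `T2` (the transfer of 10% of A to B), the helpers `read`, `write`, `compute`, and `run`, and the read/compute/write breakdown of each transaction are all introduced here for illustration. Integer division by 10 stands in for the multiplication by 0.1 so the values stay integers.

```python
def read(item):
    """Copy a database item into the transaction's local variables."""
    def step(db, local):
        local[item] = db[item]
    return step


def write(item):
    """Copy a local variable back to the database."""
    def step(db, local):
        db[item] = local[item]
    return step


def compute(fn):
    """Update local variables only; the database is not touched."""
    def step(db, local):
        fn(local)
    return step


# T1: transfer 50 from A to B (assumed step breakdown).
T1 = [read("A"), compute(lambda v: v.update(A=v["A"] - 50)), write("A"),
      read("B"), compute(lambda v: v.update(B=v["B"] + 50)), write("B")]

# T2: transfer 10% of A to B; temp holds the amount moved.
T2 = [read("A"),
      compute(lambda v: v.update(temp=v["A"] // 10, A=v["A"] - v["A"] // 10)),
      write("A"),
      read("B"), compute(lambda v: v.update(B=v["B"] + v["temp"])), write("B")]


def run(order, a=100, b=50):
    """Execute steps in the given order, e.g. ["T1", "T1", "T2", ...]."""
    db = {"A": a, "B": b}
    programs = {"T1": iter(T1), "T2": iter(T2)}
    local = {"T1": {}, "T2": {}}
    for txn in order:
        next(programs[txn])(db, local[txn])
    return db


schedule_1 = ["T1"] * 6 + ["T2"] * 6                              # serial: T1, T2
schedule_2 = ["T2"] * 6 + ["T1"] * 6                              # serial: T2, T1
schedule_3 = ["T1"] * 3 + ["T2"] * 3 + ["T1"] * 3 + ["T2"] * 3    # interleaved

print(run(schedule_1))  # {'A': 45, 'B': 105}
print(run(schedule_2))  # {'A': 40, 'B': 110}
print(run(schedule_3))  # {'A': 45, 'B': 105}
```

In all three runs the sum A + B stays at 150, and the interleaved Schedule 3 ends in the same state as the serial Schedule 1.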
Schedule - 4

A = 100
A:=100-50
A = 100
Temp:= 100 * 0.1 = 10
A:=100-10
A=90
A = 90
B=50
B:=50+50
B=100
