Professional Documents
Culture Documents
Cơ Sở Dữ Liệu Song Song
Cơ Sở Dữ Liệu Song Song
CSE 414
Lectures 18: Parallel Databases
(Ch. 20.1)
×1 ×5 ×10 ×15
# nodes (=P)
Ideal
×1 ×5 ×10 ×15
• Interference
– Contention for resources between nodes
• Stragglers
– Slowest node becomes the bottleneck
CSE 414 - Fall 2017 9
Architectures for Parallel
Databases
• Shared memory
• Shared disk
• Shared nothing
Interconnection Network
D D D
CSE 414 - Fall 2017 11
Shared Disk
M M M
P P P
Interconnection Network
D D D
CSE 414 - Fall 2017 12
Shared Nothing
Interconnection Network
P P P
M M M
D D D
CSE 414 - Fall 2017 13
A Professional Picture…
Characteristics:
• Also hard to scale past a certain point:
existing deployments typically have fewer
than 10 machines
Characteristics:
• Today, this is the most scalable architecture.
• Most difficult to administer and tune.
• Inter-query parallelism
cid=cid
pid=pid
Customer
Product Purchase
Product Purchase
• Selection: σA=123(R)
– Scan file R, select records with A=123
• Join: R S
– Scan file S, insert into a hash table using attr. B as key
– Scan file R, probe the hash table using attr. B
Data: Servers:
1 2 . . . P
K A B
… …
Data: Servers:
1 2 . . . P
K A B K A B K A B K A B
… … … … … …
… …
Which tuples
go to what server?
• R is block-partitioned
• R is hash-partitioned on K
Data: R(K, A, B, C)
Query: γA, sum(C)(R)
• R is block-partitioned or hash-partitioned on K
R1 R2 . . . RP
Reshuffle R
on attribute A
Shuffle
Find all orders from today, along with the items ordered
SELECT *
FROM Order o, Product p ⋈ o.pid = p.pid
WHERE o.pid = p.pid
AND o.date = today() date = today()
Product Order
p o
Query Execution
date = today()
scan Order o
date = today()