
Distributed Database Systems

fekade.getahun@aau.edu.et

1
Agenda
 Object Relational Model
 Distributed Database

2
1. Object-Relational Model

3
Recap
 Database
 Types of databases
 Relational
 Non-relational
 Object Oriented model
 Object relational model

4
OO database concept
 Representing complex object
 Encapsulation
 Class
 Inheritance

5
OO database concept
 Association: the link between entities in an application.
It is represented by means of references between objects.
It can be binary or ternary, and may have an inverse (reverse reference).

Select p.name, p.empl.company_name
From p in Persons

6
ADVANTAGES OF OODB
 An integrated repository of information that is shared by
multiple users, multiple products, multiple applications on
multiple platforms.
 It also solves the following problems:
 The semantic gap: the conceptual model closely mirrors the real world, so the
gap between them is small.
 Impedance mismatch: Programming languages and database
systems must be interfaced to solve application problems. But the
language style, data structures, of a programming language (such
as C) and the DBMS (such as Oracle) are different. The OODB
supports general purpose programming in the OODB framework.
 New application requirements: Especially in OA, CAD, CAM,
CASE, object-orientation is the most natural and most convenient.

7
Complex object model
 Allows
 Sets of atomic values
 Tuple-valued attributes
 Sets of tuples (nested relations)
 General set and tuple constructors
 Object identity
 Thus, formally
 Every atomic value in A is an object.
 If a1, ..., an are attribute names in N, and O1, ..., On are objects,
then T = [a1:O1, ..., an:On] is also an object, and T.ai retrieves the
value Oi.
 If O1, ..., On are objects, then S = {O1, ..., On} is an object.
8
Object Model
 An object is defined by a triple (OID, type constructor, state)
 where OID is the unique object identifier,
 type constructor is its type (such as atom, tuple, set, list, array, bag, etc.) and
state is its actual value.
Example:
(i1, atom, 'John')
(i2, atom, 30)
(i3, atom, 'Mary')
(i4, atom, 'Mark')
(i5, atom, 'Vicki')
(i6, tuple, [Name:i1, Age:i2])
(i7, set, {i4, i5})
(i8, tuple, [Name:i3, Friends:i7])
(i9, set, {i6, i8})
9
OBJECT-ORIENTED DATABASES
 OODB = Object Orientation + Database Capabilities

 May provide the following features:


 persistence
 support of transactions
 simple querying of bulk data
 concurrency control
 resilience and recovery
 security
 versioning
 integrity
 performance issues
 DATA MODELS:
 Complex object model
 Semantic data model such as Extended ER (EER) model
10
OODB
 RESEARCH PROTOTYPES
 ORION: Lisp-based system
 IRIS: Functional data model, version control, object-SQL.
 Galileo: Strong typed language, complex objects.
 PROBE .
 POSTGRES: Extended relational database supporting objects.
 COMMERCIAL OODB
 O2: O2 Technology. Language O2C to define classes, methods and types. Supports multiple
inheritance. C++ compatible. Supports an extended SQL language O2SQL which can refer to
complex objects.
 G-Base: Lisp-based system, supports ADT, multiple inheritance of classes.
 CORBA: Standards for distributed objects.
 GemStone: Earliest OODB supporting object identity, inheritance, encapsulation. Language
OPAL is based upon Smalltalk.
 Ontos: C++ based system, supports encapsulation, inheritance, ability to construct complex
objects.
 Object Store: C++ based system. A good feature is that it supports the creation of indexes.
 Statics: Supports entity types, set valued attributes, and inheritance of entity types and methods.

11
OODB
 COMMERCIAL OODB
 Relational DB Extensions: Many relational systems support
OODB extensions.
 User-defined functions (dBase).
 User-defined ADTs (POSTGRES)
 Very-long multimedia fields (BLOB or Binary Large Object). (DB2
from IBM, SQL from SYBASE, Informix, Interbase)

12
OODB Implementation Strategies
 Develop novel database data model or data language
(SIM)
 Extend an existing database language with object-oriented
capabilities. (IRIS, O2 and VBASE/ONTOS extended
SQL)
 Extend existing object-oriented programming language
with database capabilities (GemStone OPAL extended
Smalltalk)
 Extendable object-oriented DBMS library (ONTOS)

13
ODL A Class With Key and Extent
 A class definition with “extent”, “key”, and more elaborate
attributes; still relatively straightforward

class Person (extent persons key ssn) {


attribute struct Pname {string fname …} name;
attribute string ssn;
attribute date birthdate;

short age();
}

class Department (extent departments) {


attribute string name;
attribute string college;
}
Simple OQL Queries
 Basic syntax: select…from…where…
SELECT d.name
FROM d in departments
WHERE d.college = ‘Engineering’;
 An entry point to the database is needed for each query

SELECT d.name
FROM departments d
WHERE d.college = ‘Engineering’;
Object-Relational Data Models
 Extend the relational data model by including object
orientation and constructs to deal with added data types.
 Allow attributes of tuples to have complex types,
including non-atomic values such as nested relations.
 Preserve relational foundations, in particular the
declarative access to data, while extending modeling
power.
 Upward compatibility with existing relational languages.

16
Nested Relations
 Motivation:
 Permit non-atomic domains (atomic ≡ indivisible)
 Example of a non-atomic domain: a set of integers, or a set of tuples
 Allows more intuitive modeling for applications with complex
data
 Intuitive definition:
 allow relations whenever we allow atomic (scalar) values -
relations within relations
 Retains mathematical foundation of relational model
 Violates first normal form.

17
Example of a Nested Relation
 Example: library information system
 Each book has
 title,
 a set of authors,
 Publisher, and
 a set of keywords
 Non-1NF relation books

18
1NF Version of Nested Relation
 1NF version of books

flat-books

19
4NF Decomposition of Nested Relation
 Remove awkwardness of flat-books by assuming that the
following multi-valued dependencies hold:
 title ↠ author
 title ↠ keyword
 title ↠ pub-name, pub-branch
 Decompose flat-books into 4NF using the schemas:
 (title, author)
 (title, keyword)
 (title, pub-name, pub-branch)

20
4NF Decomposition of flat–books

21
Problems with 4NF Schema
 4NF design requires users to include joins in their queries.
 1NF relational view flat-books defined by join of 4NF
relations:
 eliminates the need for users to perform joins,
 but loses the one-to-one correspondence between tuples and
documents.
 And has a large amount of redundancy
 Nested relations representation is much more natural here.

22
Complex Types and SQL:1999
 Extensions to SQL to support complex types include:
 Collection and large object types
 Nested relations are an example of collection types
 Structured types
 Nested record structures like composite attributes
 Inheritance
 Object orientation
 Including object identifiers and references

23
Collection Types
 Set type (not in SQL:1999)
create table books (
…..
keyword-set setof(varchar(20))
……
)
 Sets are an instance of collection types. Other instances
include
 Arrays (are supported in SQL:1999)
 E.g. author-array varchar(20) array[10]
 Can access elements of array in usual fashion:
 E.g. author-array[1]

 Multisets (not supported in SQL:1999)


 I.e., unordered collections, where an element may occur multiple
times
 Nested relations are sets of tuples
 SQL:1999 supports arrays of tuples
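 Since the deck demonstrates these ideas in PostgreSQL later, a hedged sketch of the array example above (table and column names are illustrative, not from the original slides):

-- PostgreSQL sketch of the author-array example; the [10] bound is accepted but not enforced
CREATE TABLE books_arr (
  title        varchar(20),
  author_array varchar(20)[10]
);
SELECT author_array[1] FROM books_arr;  -- PostgreSQL arrays are 1-based, as in author-array[1]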
24
Large Object Types
 Large object types
 clob: Character large objects
book-review clob(10KB)
 blob: binary large objects
image blob(10MB)
movie blob (2GB)
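 PostgreSQL has no clob/blob column types; a rough sketch of the same idea (an assumed mapping, with illustrative names) uses text and bytea:

-- text plays the role of clob, bytea the role of blob; size limits are not declared
CREATE TABLE book_media (
  title       varchar(20),
  book_review text,
  image       bytea,
  movie       bytea
);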

25
Structured and Collection Types
(PostgreSQL)
Structured types can be declared and used in SQL
CREATE TYPE Publisher as (name varchar(20),
branch varchar(20));

CREATE TYPE Book AS (title varchar(20), authors text[],
pub_date date, pub Publisher, keywords text[]);

 Structured types can be used to create tables

CREATE TABLE books OF Book;


26
Structured Types (Cont.)
 Creating tables without creating an intermediate type
 For example, the table books could also be defined as
follows:
Create table books (title varchar(20), authors text[],
pub_date date, pub Publisher, keywords text[]);

27
Structured Types (Cont.)
Add two records into the books table
Insert into books (title, authors, pub_date, pub, keywords) values
('Compilers', '{"Smith","Jones"}', now()::date,
 row('McGraw-Hill','New York')::Publisher, '{"Parsing","Analysis"}'),
('Networks', '{"Jones","Frick"}', now()::date,
 row('Oxford','London')::Publisher, '{"Internet","Web"}');

Retrieve the content of the books table – two rows will be returned
Select * from Books;

Unnesting the nested relation (arrays)

Select title, c.* as authors, (pub).name, (pub).branch, k.* as keywords
from Books b, unnest(authors) c, unnest(keywords) k;
28
Structured Types (Cont.) – Nested Table
Create Table Departments (dID serial primary key, dname varchar(20) not null);
Insert INTO Departments (dname) values ('Sales'), ('Marketing'), ('Production'), ('IT');
Create type name_type as (fname varchar(20), lname varchar(20));
Create type Edu_ty as (name varchar(20), Institution varchar(20), year varchar(5));
Create table Employees (Id serial not null primary key, fullname name_type,
telno text[], edu Edu_ty[], salary numeric, dId int references departments(did));
Insert into Employees (fullname, telno, edu, salary, did) values
(row('dawit','alemu')::name_type, '{"0111222922","0911631715"}',
Array[row('MSc','AAU','2015')::Edu_ty,
row('Bsc','AAU','2013')::Edu_ty,
row('Diploma','BU','1999')::Edu_ty],
14000, 4);
29
Structured Types (Cont.)
Define a function that returns a table:
Create function getEmployee (eID int) returns Table (id int, fname varchar(20),
lname varchar(20), telno varchar(20),cred_name varchar(20), awarding_Inst
varchar(20), award_year varchar(20),depart_ID int, dname varchar(20)) AS $$
SELECT e.id,(fullname).fname as fname, (fullname).lname as lname, t.* as
telno,
(ed).name as cred_name, (ed).Institution as awarding_Inst,(ed).year as
award_year,e.did as depart_ID, dname
from Employees e, unnest(telno) t, unnest(edu) ed, departments d
where (d.did = e.did) and (e.id = $1);
$$ LANGUAGE SQL;

-- Function as a data source
Select * from getEmployee(1);

30
Inheritance in PostgreSQL
 PostgreSQL supports only table inheritance, not the type inheritance defined in SQL:1999
create type Person_Ty as (PID varchar (20), fullname name_type,
address full_address);
create table People of Person_ty;
Create table Emps (id serial, salary numeric) INHERITS (people);
-- inherits columns of the base table people

Inserting data into the Emps table makes the inherited columns of the new row
visible through the base table people; the reverse is not true
Insert into emps (pid, fullname, address, salary) values ('1245',
row('Dawit','bekele')::name_type, row('DZ','AM')::full_address,
9878);
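 A quick check against the tables above (a sketch) shows how inherited rows surface through the base table:

-- Rows inserted into emps are visible when querying people (inherited columns only)
SELECT * FROM people;        -- includes the pid, fullname, address part of emps rows
SELECT * FROM ONLY people;   -- ONLY restricts the scan to rows inserted directly into people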
31
Structured and Collection Types (Oracle)
 Structured types can be declared and used in SQL
CREATE OR REPLACE TYPE Publisher as Object (name varchar(20), branch
varchar(20));
/
CREATE OR REPLACE TYPE VA as VARRAY (5) of VARCHAR(30);
/
CREATE OR REPLACE TYPE Book AS OBJECT (title varchar(20), authors VA,
pub_date date, pub Publisher, keywords VA);
/
 Structured types can be used to create tables

create table books of Book

32
Structured Types (Cont.)
 Creating tables without creating an intermediate type
 For example, the table books could also be defined as follows (using the Oracle
types VA and Publisher from the previous slide):

Create table books (title varchar(20), authors VA,
pub_date date, pub Publisher, keywords VA);

 Methods can be part of the type definition of a structured type:


Create or Replace type Employee_Ty as Object
(name varchar(20), salary int,
MEMBER function giveraise (percent IN int) return NUMBER);

 Method body is created separately


CREATE OR REPLACE TYPE BODY Employee_Ty AS
MEMBER Function giveraise(percent IN int ) return NUMBER IS
begin
RETURN (salary + ( salary * percent) / 100);
end giveraise;
END;
/
33
Creation of Values of Complex Types –
oracle
 Values of structured types are created using
constructor functions
 E.g. Publisher('McGraw-Hill', 'New York')
Note: a value is not an object

34
Creation of Values of Complex Types
 To insert the preceding tuple into the relation books
Insert into books (title, authors, pub, keywords) values
('Compilers', VA('Smith', 'Jones'),
Publisher('McGraw-Hill', 'New York'), VA('parsing','analysis'));

Insert into books (title, authors, pub, keywords) values
('Introduction to Programming', VA('Sample', 'Jones', 'Test'),
Publisher('McGraw-Hill', 'New York'), VA('Modularity','analysis'));

Select Title, a.* from Books b, table(b.authors) a;

35
Inheritance Person_Typ
 Suppose that we have the following type definition for people:
create or replace type Person_typ as Object
(name varchar(20),
address varchar(20)) not final;
/
[Figure: type hierarchy with Student_Typ and Teacher_Typ under Person_Typ]
 Using inheritance to define the student and teacher types:
create or replace type Student_typ UNDER Person_typ
(degree varchar(20),
department varchar(20)) not final;
/
create or replace type Teacher_typ UNDER Person_typ
(salary integer,
department varchar(20)) not final;
/
 Subtypes can redefine methods by using OVERRIDING MEMBER in place of MEMBER in the
method declaration

36
Reference Types
 Object-oriented languages provide the ability to create and
refer to objects.
 In SQL:1999
 References are to tuples, and
 References must be scoped,
 I.e., can only point to tuples in one specified table

37
Reference Declaration in SQL:1999
 E.g. define a type Department with a field name and a
field head which is a reference to the Person in table
people as scope
create type Department as Object
(name varchar(20), head ref Person_typ )

The table departments is defined as follows


create table departments of Department

38
Initializing Reference Typed Values
 In Oracle, to create a tuple with a reference value, first
create the tuple with a null reference and then set the
reference separately using the function ref(p) applied to a
tuple variable

 E.g. create a department with name CS and head being the


person named John

insert into departments values ('CS', null);


update departments d
set head = (select ref(p) from people p
where p.name = 'John')
where d.name = 'CS' and d.head is null;
/
39
Querying with Structured Types
 Find the title and the name of the publisher of each book.
select b.title, b.publisher.name
from books b

 Note, the use of the dot notation to access fields of the


composite attribute (structured type) publisher

40
Nested Table
CREATE TYPE animal_ty AS OBJECT (breed
VARCHAR(25), name VARCHAR(25), birthdate DATE);
/
CREATE TYPE animals_nt AS TABLE OF animal_ty;

/
CREATE TABLE breeder (breederName VARCHAR(25),
animals animals_nt)
NESTED TABLE animals STORE AS animals_nt_tab;
[Diagram: a breeder row holds a nested animals table with columns Breed, Name, Birthdate]
41
Nested Table
 CREATE TABLE breeder (breederName VARCHAR(25),
animals animals_nt) nested table animals store as
animals_nt_tab;
INSERT INTO breeder VALUES (
'John Smith ',
animals_nt(
animal_ty('DOG', 'BUTCH', '31-MAR-01'),
animal_ty('DOG', 'ROVER', '05-JUN-01'),
animal_ty('DOG', 'JULIO', '10-JUN-01') )
);
breederName     Animals (Breed, Name, Birthdate)
John Smith      ('DOG', 'BUTCH', '31-MAR-01')
                ('DOG', 'ROVER', '05-JUN-01')
                ('DOG', 'JULIO', '10-JUN-01')
42
Nested Table

SELECT breederName, N.Name, N.BirthDate
FROM breeder, TABLE(breeder.Animals) N;

SELECT breederName, N.Name, N.BirthDate
FROM breeder, TABLE(breeder.Animals) N
WHERE N.Name = 'JULIO';

43
Comparison of O-O and O-R Databases
 Relational systems
 simple data types, powerful query languages, high protection.
 Persistent-programming-language-based OODBs
 complex data types, integration with programming language,
high performance.
 Object-relational systems
 complex data types, powerful query languages, high protection.
 Note: Many real systems blur these boundaries
 E.g. persistent programming language built as a wrapper on a
relational database offers first two benefits, but may have poor
performance.

44
Distributed Database

45
Outline
 Distributed Database
 Introduction
 DDBMS Architecture
 DDB Design
 Distributed Query Processing

46
1. Introduction to Distributed
Database

47
File Systems

[Diagram: Programs 1-3 each embed their own data description and access their own files (File 1-3), so the same data is stored redundantly across files]

48
Database Management

[Diagram: application programs 1-3 (each with data semantics) share one DATABASE through common data description and data manipulation services]

49
Objective of database technology
 The key objective of database technology is integration, not centralization

 It is possible to achieve integration without centralization

50
Motivation

[Diagram: database technology contributes integration, computer networks contribute distribution; together they yield distributed database systems - integration without centralization]

51
What is distributed …
 Processing logic or processing elements
 Functions
 Data
 Control

52
Classification of Distributed computing
 Criteria [Bochmann, 1983]
 Degree of coupling – how closely the processing elements are
connected together
 Amount of Data exchanged/ amount of local processing
 Weak vs strong coupling
 Interconnection structure
 Point-to-point interconnection between processing units
 Common interconnection channel
 Interdependence of components
 Synchronization between components
 Synchronous or asynchronous

53
What is a Distributed Database
System?
 A distributed database (DDB) is a collection of multiple,
logically interrelated databases distributed over a
computer network.
 A distributed database management system (D–DBMS) is
the software that manages the DDB and provides an
access mechanism that makes this distribution transparent
to the users.
 Distributed database system (DDBS) = DDB + D–DBMS

54
What is not a DDBS?
 A timesharing computer system
 A loosely or tightly coupled multiprocessor system
 A database system which resides at one of the nodes of a
network of computers - this is a centralized database on a
network node

55
Centralized DBMS on a Network

[Diagram: a single database at one site, accessed from sites 1-4 over a communication network]

59
Distributed DBMS Environment
[Diagram: data distributed across several sites, all interconnected through a communication network]

60
Implicit Assumptions
 Data stored at a number of sites ➯ each site logically
consists of a single processor.
 Processors at different sites are interconnected by a
computer network ➯ no multiprocessors
 parallel database systems
 Distributed database is a database, not a collection of files
➯ data logically related as exhibited in the users’ access
patterns
 relational data model
 D-DBMS is a full-fledged DBMS
 not remote file system, not a TP system

61
Promises of Distributed DBMS
 Transparent management of distributed, fragmented, and
replicated data
 Improved reliability/availability through distributed
transactions
 Improved performance
 Easier and more economical system expansion

62
Transparency
 Transparency is the separation of the higher level
semantics of a system from the lower level
implementation issues.
 Fundamental issue is to provide
 Data independence in the distributed environment
 Network (distribution) transparency
 Replication transparency
 Fragmentation transparency
 horizontal fragmentation: selection
 vertical fragmentation: projection
 hybrid

63
2. Distributed DBMS Architecture

64
Introduction: Architecture
 Defines the structure of the system, i.e.,
 The components of the system are identified,
 The functions of each component are specified, and
 The interrelationships and interactions among these
components are defined

 The three “reference” architectures for distributed DBMS


 Client/server
 Peer-to-peer distributed DBMS
 Multi database system

65
DBMS Standardization
 Reference Model
 A conceptual framework whose purpose is to divide standardization work into
manageable pieces and to show at a general level how these pieces are related
to one another (e.g., ISO/OSI)

 The three approaches

1. Component-based
 Components of the system are defined together with the
interrelationships between components
 Recommended for design and implementation of systems


66
DBMS Standardization …
2. Function-based
 Classes of users are identified together with the functionality that the system will
provide for each class (e.g., ISO/OSI)
 The objectives of the system are clearly identified, but this gives very little insight
into how these objectives are attained
3. Data-based
 Identify the different types of data and specify the functional units that will
realize and/or use data according to these views
 As data is the central resource that a DBMS manages, the datalogical approach is
preferable for standardization activities (e.g., ANSI/SPARC)

67
ANSI/SPARC Architecture

 External views: user views of the system, which can be shared
 Conceptual schema: represents the data and the relationships among them without considering the users or the physical organization
 Internal schema: physical definition and organization of data
 Physical storage: storage devices and the access mechanisms

68
Conceptual Schema Definition
RELATION PROJ [
KEY = {PNO}
ATTRIBUTES = {
PNO : CHARACTER(7)
PNAME : CHARACTER(20)
BUDGET : NUMERIC(7)
LOC : CHARACTER(15)
}
]
RELATION ASG [
KEY = {ENO,PNO}
ATTRIBUTES = {
ENO : CHARACTER(9)
PNO : CHARACTER(7)
RESP : CHARACTER(10)
DUR : NUMERIC(3)
}
]
69
Internal Schema Definition
RELATION EMP [
KEY = {ENO}
ATTRIBUTES = {
ENO : CHARACTER(9)
ENAME : CHARACTER(15)
TITLE : CHARACTER(10)
}
]

INTERNAL_REL E [
INDEX ON E# CALL EMINX
FIELD = {
E# : BYTE(9)
ENAME : BYTE(15)
TIT : BYTE(10)
}
]
70
External View Definition –
Example 1
Create a BUDGET view from the PROJ relation

CREATE VIEW BUDGET(PNAME, BUD) AS


SELECT PNAME, BUDGET FROM PROJ

71
External View Definition –
Example 2
Create a Payroll view from relations EMP and
Pay

CREATE VIEW PAYROLL (EMP_NO, EMP_NAME,


SAL)
AS SELECT EMP.ENO,EMP.ENAME,PAY.SAL
FROM EMP, PAY
WHERE EMP.TITLE = PAY.TITLE

72
Architectural models for Distributed
DBMS
 Ways of organizing multiple databases for sharing among
multiple DBMSs

73
Dimensions of the Problem
 Distribution
 Whether the components (dealing with data) of the system are
located on the same machine or not
 Client-server vs peer-to-peer
 Heterogeneity
 Various levels (hardware, communications, operating system)
 DBMS heterogeneity is the important one:
 data model, query language, transaction management algorithms
74
Dimensions of the Problem
 Autonomy
 Refers to the degree to which individual DBMSs can operate independently
 Autonomy is a function of communication, execution of transactions, and dependency
 Various dimensions:
 Design autonomy: ability of a component DBMS to decide on issues related to its own data
model, design, and transaction management techniques
 Communication autonomy: ability of a component DBMS to decide whether and how to
communicate with other DBMSs, i.e., what type of information it wants to provide to other
DBMSs or to the SW that controls their global execution
 Execution autonomy: ability of a component DBMS to execute local operations in any
manner it wants to

75
Architectural alternatives
 A0, D0, H0: logically integrated system
 Set of homogeneous multiple DBMSs
E.g. shared-everything multiprocessor system
 A0, D0, H1: logically integrated heterogeneous system
E.g. integrating network, hierarchical, and relational databases
residing on the same machine
 A0, D1, H0: the database is distributed (client-server) even
though an integrated view is presented to users
76
Architectural …
 A0, D2, H0: the same type of transparency is provided to users in
a fully distributed environment
 There is no distinction between client and server
 A1, D0, H0: semi-autonomous systems (federated DBMS)
 Components have significant autonomy in their execution
 but their participation in a federation indicates that they are willing
to cooperate with others in executing user requests that access
multiple databases
 Components are homogeneous and not distributed
77
Architectural …
 (A1, D1, H1): distributed heterogeneous federated DBMS
 (A2, D0, H0): fully autonomous (multi-database system); the
components do not know how to talk to each other
 It is an autonomous collection of homogeneous DBMSs
 Multi-DBMS is the software that provides for the management of these
multiple databases and provides transparent access to them

 (A2, D0, H1): Autonomous, and heterogeneous DBMSs


 (A2, D1,H1): client/server distributed heterogeneous systems
 (A2, D2, H1): P2P distributed heterogeneous systems
78
Client/server

79
Client/server
 Task distribution

80
Advantages of Client-Server Architectures
 More efficient division of labor
 Horizontal and vertical scaling of resources
 Better price/performance on client machines
 Ability to use familiar tools on client machines
 Client access to remote data (via standards)
 Full DBMS functionality provided to client workstations
 Overall better system price/performance

81
Problems With Multiple-Client/Single
Server
 Server forms bottleneck
 Server forms single point of failure
 Database scaling difficult

82
Multiple client- multiple server

 Each client manages its own connection to the appropriate server
83
Multiple client – multiple server

 Each client knows of its own home server, which communicates with other servers as required
84
Peer-to-Peer Distributed Systems

85
Components of DDBMS

86
MDBMS architecture with GCS

87
Components of a Multi-DBMS

88
3. Distributed Database Design

89
Introduction
 The design of DDB involves
Making decisions on the placement of data and programs across
the sites of a computer network, as well as possibly designing
the network itself

 In DDBMS, the placement of applications entails


 Placement of the DDBMS software; and
 Placement of the applications that run on the DB

90
Design strategies
 Top-down
 Based on designing systems from scratch
 Begins with the requirement analysis that defines the
environment of the system and elicits both the data and
processing needs of all potential database users
 It is applicable for the design of homogeneous databases
 Bottom-up
 When the databases already exist at a number of sites
 Design involves integrating databases into one database
 Integrate Local schema into Global schema
 It is ideal in the context of heterogeneous databases

91
Top-Down Design Process

92
Distribution Design Issues
 Why fragment at all?
 How should we fragment?
 How much should we fragment?
 Is there any way to test the correctness of decomposition?
 How should we allocate?
 What is the necessary information for fragmentation and
allocation?

93
Reasons for Fragmentation
 Can't we just distribute relations?
 What is a reasonable unit of distribution?
 relation
 views are subsets of relations ⇒ locality
 but extra communication (the whole relation is shipped even when only part is needed)
 fragments of relations (sub-relations)
 concurrent execution of a number of transactions that access different
portions of a relation
 views that cannot be defined on a single fragment will require extra
processing
 semantic data control (especially integrity enforcement) more difficult

94
Fragmentation Alternatives-Horizontal

95
Fragmentation Alternatives-Vertical

96
Degree of Fragmentation

Finding the suitable level of partitioning within this range

97
ER model – for the running examples

[ER diagram of the running example, equivalent to the schemas:]
PROJ(PNO, PName, Budget, Location)
SKILL(Title, Sal)
EMP(ENO, ENAME, TITLE)
ASG(PNO, ENO, Dur, RESP)
98
Fragmentation
 Horizontal Fragmentation (HF)
 Primary Horizontal Fragmentation (PHF)
 Derived Horizontal Fragmentation (DHF)

 Vertical Fragmentation (VF)


 Hybrid Fragmentation (HF)

99
PHF – Information Requirements
 Application Information
 minterm selectivity: sel(mi)
 The number of tuples of the relation that would be accessed by a user
query which is specified according to a given minterm predicate mi
 access frequencies: acc(qi)
 The frequency with which a user application accesses data. If Q =
{q1, q2, …, qq} is a set of user queries, acc(qi) indicates the access
frequency of the query qi in a given period
 Acc(mi) is computed from the acc(qi) that constitute the minterm

100
Primary Horizontal Fragmentation
Definition:
Rj = σFj(R), 1 ≤ j ≤ w

where Fj is a selection formula, which is (preferably) a minterm predicate
 A horizontal fragment Ri of relation R consists of all the tuples of
R which satisfy a minterm predicate mi
 Given a set of minterm predicates M, there are as many horizontal
fragments of relation R as there are minterm predicates
 Set of horizontal fragments also referred to as minterm fragments

101
PHF – Algorithm
Given:
A relation R, the set of simple predicates Pr

Output:
The set of fragments of R = {R1, R2,…,Rw} which obey the
fragmentation rules.

Preliminaries :
1. Pr should be complete
2. Pr should be minimal

102
Completeness of Simple Predicates
 A set of simple predicates Pr is said to be complete IFF
any two tuples of the same minterm fragment defined on Pr
have the same probability of being accessed by any
application
 Example:
 Assume PROJ[PNO, PNAME, BUDGET, LOC] has two
applications defined on it
 Find the budgets of projects at each location (1)
 Find projects with budgets less than $200000 (2)

103
Completeness of Simple Predicates
 According to (1),
Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”}
which is not complete with respect to (2).

 Modify
 Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,
BUDGET≤200000, BUDGET>200000}

which is complete.

104
Minimality of Simple Predicates
 If a predicate influences how fragmentation is performed,
(i.e., causes a fragment f to be further fragmented into,
say, fi and fj) then there should be at least one application
that accesses fi and fj differently
 In other words, the simple predicate should be relevant in
determining a fragmentation.
 If all the predicates of a set Pr are relevant, then Pr is
minimal

105
Minimality of Simple Predicates
Example :
 Pr ={LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,
BUDGET≤200000, BUDGET>200000}

is minimal (in addition to being complete).

 However, if we add
PNAME = “Instrumentation”

then Pr is not minimal.

106
COM_MIN Algorithm
 Given:
a relation R and a set of simple predicates Pr
 Output:
a complete and minimal set of simple predicates Pr' for Pr
 Rule 1:
a relation or fragment is partitioned into at least two parts
which are accessed differently by at least one application.

107
COM_MIN Algorithm
❶ Initialization
 find a pi ∈ Pr such that pi partitions R according to Rule 1
 set Pr' = {pi}; Pr ← Pr – {pi}; F ← {fi}
❷ Iteratively add predicates to Pr' until it is complete
 find a pj ∈ Pr such that pj partitions some fk defined according to a minterm
predicate over Pr' according to Rule 1
 set Pr' = Pr' ∪ {pj}; Pr ← Pr – {pj}; F ← F ∪ {fj}
 if ∃pk ∈ Pr' which is non-relevant then
 Pr' ← Pr' – {pk}
 F ← F – {fk}

108
PHORIZONTAL Algorithm
 Makes use of COM_MIN to perform fragmentation
 Input:
a relation R and a set of simple predicates Pr
 Output:
a set of minterm predicates M according to which relation
R is to be fragmented

❶ Pr' ← COM_MIN (R, Pr)


❷ determine the set M of minterm predicates
❸ determine the set I of implications among pi ∈ Pr'
❹ eliminate the contradictory minterms from M
109
PHF – Example
 Two candidate relations : Skill and PROJ.
 Fragmentation of relation Skill
 Application: Check the salary info and determine raise
 Employee records are kept at two sites; the application runs at both sites
 Simple predicates
 p1: SAL ≤ 30000
 p2: SAL > 30000
 Pr = {p1, p2}, which is complete and minimal; Pr' = Pr
 Minterm predicates
 m1: (SAL ≤ 30000)
 m2: NOT(SAL ≤ 30000) = (SAL > 30000)

110
PHF - Example

[Figure: fragment Skill1 = σSAL≤30000(SKILL) and fragment Skill2 = σSAL>30000(SKILL)]

111
PHF - Example
 Fragmentation of relation PROJ
 Applications:
 Find the name and budget of projects given their location
 Issued at three sites
 Access project information according to budget
 one site accesses ≤200000 other accesses >200000
 Simple predicates
 For application (1)
 p1 : LOC = “Montreal”
 p2 : LOC = “New York”
 p3 : LOC = “Paris”
 For application (2)
 p4 : BUDGET ≤ 200000
 p5 : BUDGET > 200000
 Pr = Pr' = {p1,p2,p3,p4,p5}
112
PHF – Example
 Fragmentation of relation PROJ continued
 Minterm fragments left after elimination
m1 : (LOC = “Montreal”) ∧ (BUDGET ≤ 200000)
m2 : (LOC = “Montreal”) ∧ (BUDGET > 200000)
m3 : (LOC = “New York”) ∧ (BUDGET ≤ 200000)
m4 : (LOC = “New York”) ∧ (BUDGET > 200000)
m5 : (LOC = “Paris”) ∧ (BUDGET ≤ 200000)
m6 : (LOC = “Paris”) ∧ (BUDGET > 200000)
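 Written out in SQL, the six minterm fragments correspond to the predicates above (a sketch using views named PROJ1-PROJ6; the names are illustrative):

CREATE VIEW PROJ1 AS SELECT * FROM PROJ WHERE LOC = 'Montreal' AND BUDGET <= 200000;
CREATE VIEW PROJ2 AS SELECT * FROM PROJ WHERE LOC = 'Montreal' AND BUDGET > 200000;
CREATE VIEW PROJ3 AS SELECT * FROM PROJ WHERE LOC = 'New York' AND BUDGET <= 200000;
CREATE VIEW PROJ4 AS SELECT * FROM PROJ WHERE LOC = 'New York' AND BUDGET > 200000;
CREATE VIEW PROJ5 AS SELECT * FROM PROJ WHERE LOC = 'Paris' AND BUDGET <= 200000;
CREATE VIEW PROJ6 AS SELECT * FROM PROJ WHERE LOC = 'Paris' AND BUDGET > 200000;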

113
PHF Correctness
 Completeness
 Since Pr' is complete and minimal, the selection predicates are
complete
 Reconstruction
 If relation R is fragmented into FR = {R1,R2,…,Rr}
R = ∪ Ri, ∀Ri ∈ FR
 Disjointness
 Minterm predicates that form the basis of fragmentation should
be mutually exclusive.

114
Derived Horizontal Fragmentation
 Defined on a member relation of a link according to a
selection operation specified on its owner.
 Each link is an equijoin.
 Equijoin can be implemented by means of semi-joins.

115
DHF – Definition
 Given a link L where owner(L)=S and member(L)=R, the
derived horizontal fragments of R are defined as
Ri = R ⋉Si, 1≤i≤w

 where w is the maximum number of fragments that will be


defined on R and
Si = σFi (S)

 where Fi is the formula according to which the primary

horizontal fragment Si is defined.


116
DHF- Example
 Given link L1 where owner(L1)=SKILL and
member(L1)=EMP
EMP1 = EMP ⋉ SKILL1
EMP2 = EMP ⋉ SKILL2
 where
SKILL1 = σSAL≤30000(SKILL)
SKILL2 = σSAL>30000(SKILL)
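 In SQL, the semijoin can be sketched with EXISTS; assuming TITLE is the join attribute between EMP and SKILL (per the running schema), EMP1 would be:

-- EMP1 = EMP ⋉ SKILL1 (sketch)
SELECT e.*
FROM EMP e
WHERE EXISTS (SELECT 1 FROM SKILL s
              WHERE s.TITLE = e.TITLE AND s.SAL <= 30000);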

117
DHF – Correctness
 Completeness
 Let R be the member relation of a link whose owner is relation S which is
fragmented as FS = {S1, S2, ..., Sn}. Furthermore, let A be the join attribute
between R and S. Then, for each tuple t of R, there should be a tuple t' of S
such that: t[A] = t'[A]
 i.e., Referential integrity :(tuples of any fragment of the member
relation are also in the owner relation)
 Reconstruction
 Reconstruction of a global relation R from its fragments {R1, R2, …,
Rn} is performed by the union operator (R is union of its fragments)
 Disjointness
 In DHF disjointness is guaranteed only if the join graph between
the owner and the member fragments is simple.

118
Paper review template
 Introduction
 Statement of the problem
 Objective
 Methodology
 Approach/ proposed solution
 Critics
 Conclusion

119
Vertical Fragmentation
 Has been studied within the centralized context
 design methodology
 physical clustering
 More difficult than horizontal, because more alternatives
exist
 Two approaches :
Grouping: attributes to fragments
Splitting: relation to fragments

120
VF
 Overlapping fragments
 grouping
 Non-overlapping fragments
 splitting
 We do not consider the replicated key attributes to be
overlapping
 Advantage:
 Easier to enforce functional dependencies (for integrity
checking etc.)

121
VF – Information requirements
 Application Information
 Attribute affinities
 a measure that indicates how closely related the attributes are
 This is obtained from more primitive usage data
 Attribute usage values
 Given a set of queries Q = {q1, q2,…, qq} that will run on the relation
R[A1, A2,…, An],
use(qi, Aj) = 1 if attribute Aj is referenced by query qi, and 0 otherwise
 the usage vector use(qi, •) of each query can then be defined accordingly

122
VF – Definition of use(qi,Aj)
 Consider the following 4 queries for relation PROJ
q1: SELECT BUDGET FROM PROJ WHERE PNO=Value
q2: SELECT PNAME, BUDGET FROM PROJ
q3: SELECT PNAME FROM PROJ WHERE LOC=Value
q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC=Value
Let A1= PNO, A2= PNAME, A3= BUDGET, A4= LOC
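 Reading the attribute references off q1-q4 gives the usage matrix (derived directly from the queries above):

use    A1   A2   A3   A4
q1      1    0    1    0
q2      0    1    1    0
q3      0    1    0    1
q4      0    0    1    1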

123
VF – Affinity Measure aff(Ai,Aj)
 The attribute affinity measure between two attributes Ai and Aj
of a relation R[A1, A2, …, An] with respect to the set of
applications Q = (q1, q2, …, qq) is defined as follows :

aff(Ai, Aj) = ∑ over all queries qk with use(qk,Ai)=1 and use(qk,Aj)=1 of ∑l refl(qk) ∗ accl(qk)

where refl(qk) is the number of accesses to Ai and Aj for query qk at site sl,
and accl(qk) is the access frequency of query qk at site sl
124
Example
 Assume each query in the previous example accesses the
attributes once during each execution
 Also assume the access frequencies of each query per site
(for q1: 15, 20, and 10 at the three sites; the remaining
frequencies were given in a table not reproduced here)
 Then
aff(A1, A3) = 15∗1 + 20∗1 + 10∗1 = 45
 and the attribute affinity matrix AA is [matrix figure not reproduced]
125
VF – Clustering Algorithm
 Take the attribute affinity matrix AA and reorganize the
attribute orders to form clusters where the attributes in
each cluster demonstrate high affinity to one another

 Bond Energy Algorithm (BEA) has been used for


clustering of entities. BEA finds an ordering of entities
(e.g. attributes) using the global affinity measure

AM = max ∑i=1..n ∑j=1..n aff(Ai, Aj) [ aff(Ai, Aj−1) + aff(Ai, Aj+1) + aff(Ai−1, Aj) + aff(Ai+1, Aj) ]

with aff(A0, Aj) = aff(Ai, A0) = aff(An+1, Aj) = aff(Ai, An+1) = 0

126
Bond Energy Algorithm
Input: The AA matrix
Output: The clustered affinity matrix CA which is a
perturbation of AA
❶ Initialization: Place and fix one of the columns of AA in
CA
❷ Iteration: Place the remaining n-i columns in the
remaining i+1 positions in the CA matrix. For each
column, choose the placement that makes the most
contribution to the global affinity measure
❸ Row order: Order the rows according to the column
ordering

127
cont(Ai, Ak, Aj) = 2∙bond(Ai, Ak) + 2∙bond(Ak, Aj) − 2∙bond(Ai, Aj),
where bond(Ax, Ay) = ∑z=1..n aff(Az, Ax) ∙ aff(Az, Ay)

128
BEA – Example
 Consider the following AA matrix and the corresponding CA matrix where A1
and A2 have been placed.

Place A3:
Ordering (0-3-1):
cont(A0,A3,A1) = 2bond(A0 , A3)+2bond(A3 , A1)–2bond(A0 , A1)
= 2* 0 + 2* 4410 – 2*0 = 8820
Ordering (1-3-2):
cont(A1,A3,A2) = 2bond(A1 , A3)+2bond(A3 , A2)–2bond(A1,A2)
= 2* 4410 + 2* 890 – 2*225 = 10150
Ordering (2-3-4): cont (A2,A3,A4) = 1780
129
BEA: Example

130
Partitioning Algorithm
 The objective is to find sets of attributes that are, in most
cases, accessed by distinct sets of applications; i.e., to divide a
set of clustered attributes {A1, A2, …, An} into two (or more)
sets {A1, A2, …, Ai} and {Ai+1, …, An} such that there are no
(or minimal) applications that access both (or more than one)
of the sets.

131
Partitioning algorithm
Define
AQ(qi) = {Aj | use(qi, Aj) = 1}
TQ = {qi | AQ(qi) ⊆ TA}
BQ = {qi | AQ(qi) ⊆ BA}
OQ = Q − (TQ ∪ BQ) // set of applications that access both TA and BA
and
CTQ = total number of accesses to attributes by applications that access only TA
CBQ = total number of accesses to attributes by applications that access only BA
COQ = total number of accesses to attributes by applications that access both TA and BA
Then find the point along the diagonal that maximizes
z = CTQ ∗ CBQ − COQ²
132
Partitioning algorithm
Two problems :
❶ Cluster forming in the middle of the CA matrix
 Shift a row up and a column left and apply the algorithm to
find the “best” partitioning point
 Do this for all possible shifts
 Cost O(m2)
❷ More than two clusters
 m-way partitioning
 try 1, 2, …, m–1 split points along diagonal and try to find the
best point for each of these
 Cost O(2m)

133
VF correctness
 A relation R, defined over attribute set A and key K, generates the
vertical partitioning FR = {R1, R2, …, Rr}
 Completeness
 The following should be true for A: A = ∪ ARi
 Reconstruction
 Reconstruction can be achieved by
R = ⋈K Ri, ∀Ri ∈ FR (join on the key K)
 Disjointness
 TIDs are not considered to be overlapping since they are maintained by
the system
 Duplicated keys are not considered to be overlapping

134
Hybrid fragmentation

135
Allocation
 Problem Statement
 Given
F = {F1, F2, …, Fn} fragments
S = {S1, S2, …, Sm} network sites
Q = {q1, q2,…, qq} applications
Find the "optimal" distribution of F to S.
 Optimality
 Minimal cost
 Communication + storage + processing (read & update)
 Cost in terms of time (usually)
 Performance
 Response time and/or throughput
 Constraints
 Per site constraints (storage & processing)
136
Information Requirements
 Database information
 selectivity of fragments
 size of a fragment
 Application information
 access types and numbers
 access localities
 Computer system (site) information
 unit cost of storing data at a site
 unit cost of processing at a site
 Communication network information
 bandwidth
 latency
 communication overhead
137
Allocation – Information Requirements
 Database Information
 selectivity of fragments
 size of a fragment
 Application Information
 number of read accesses of a query to a fragment
 number of update accesses of query to a fragment
 A matrix indicating which queries updates which fragments
 A similar matrix for retrievals
 originating site of each query
 Site Information
 unit cost of storing data at a site
 unit cost of processing at a site
 Network Information
 communication cost of frame between two sites
 frame size
138
Allocation Model
 General Form
min(Total Cost)
subject to
response time constraint
storage constraint
processing constraint

Decision Variable

xij = 1 if fragment Fi is stored at site Sj, and 0 otherwise
139
Allocation Model
 Total Cost
Query processing cost + cost of storing a fragment at a site
 Storage Cost (of fragment Fj at Sk)
(unit storage cost at Sk) * (size of Fj) * xjk
 Query Processing Cost (for one query)
processing component + transmission component

140
Allocation Model
 Query Processing Cost

Processing component

access cost + integrity enforcement cost + concurrency control


cost

 Access cost
(no. of update accesses + no. of read accesses) ∗ xij ∗ local processing cost at a site

141
Allocation Model
 Query Processing Cost
Transmission component

cost of processing updates + cost of processing retrievals

 Cost of updates
update message cost + acknowledgment cost

 Retrieval Cost
(cost of retrieval command + cost of sending back the result)

142
Allocation Model
 Constraints
 Response time
 Execution time of query <= max allowable response time for that
query
 Storage constraints
 Storage requirement of a fragment at that site <=storage capacity at
that site

 Processing constraint (for a site)


 Processing load of a query at that site <= processing capacity of that
site

143
Allocation Model
 Attempts to reduce the solution space
 assume all candidate partitioning are known and select the
“best” partitioning
 ignore replication at first
 sliding window on fragments

144
4. Distributed Query Processing

145
Introduction
 Query Processing

high-level user query → query processor → low-level data manipulation commands
146
Query Processing Components
 Query language that is used
 SQL

 Query execution methodology


 The steps that one goes through in executing high level
(declarative) user queries.

 Query optimization
 How do we determine the “best” execution plan?

147
Query processing problem
Example

SELECT ENAME
FROM EMP,ASG
WHERE EMP.ENO = ASG.ENO
AND DUR > 37

148
Example …

149
Cost of Alternatives
 Assume:
 size(EMP) = 400, size(ASG) = 1000
 tuple access cost = 1 unit; tuple transfer cost = 10 units
 Strategy 1
 produce ASGi: (10+10)∗tuple access cost 20
 transfer ASGi to the sites of EMP: (10+10)∗tuple transfer cost 200
 produce EMPi : (10+10) ∗tuple access cost∗2 40
 transfer EMPi to result site: (10+10) ∗tuple transfer cost 200
Total cost 460
 Strategy 2
 transfer EMP to site 5: 400 ∗ tuple transfer cost = 4,000
 transfer ASG to site 5: 1000 ∗ tuple transfer cost = 10,000
 produce ASG' by selecting ASG: 1000 ∗ tuple access cost = 1,000
 join EMP and ASG': 400 ∗ 20 ∗ tuple access cost = 8,000
Total cost 23,000

150
Objective of Query processing
 To transform a high-level query on a distributed database into low
level language on local databases
 Minimize a cost function
I/O cost + CPU cost + communication cost
 These might have different weights in different distributed
environments
 Wide area networks
 communication cost will dominate
 low bandwidth
 low speed
 high protocol overhead
 Local area networks
 communication cost not that dominant
 total cost function should be considered
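 A common way to write this weighted cost function (a sketch; the unit-cost coefficients depend on the environment):

Total cost = cCPU ∗ (#instructions) + cI/O ∗ (#disk I/Os) + cMSG ∗ (#messages) + cTR ∗ (#bytes transferred)

where cMSG is the fixed cost of initiating a message and cTR the per-byte transmission cost; in a WAN the last two terms dominate, while in a LAN all four matter.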
151
Complexity of Relational Operations
Assume
• relations of cardinality n
• sequential scan

Operation                                         Complexity
Select, Project (without duplicate elimination)   O(n)
Project (with duplicate elimination), Group       O(n log n)
Join, Semi-join, Division, Set Operations         O(n log n)
Cartesian Product                                 O(n²)

152
Characterization of Query processors
 Four characteristics that hold for Centralized query processors
 Language
 Input language – relational calculus or relational algebra
 Types of optimization
 Exhaustive search
 cost-based
 Optimal
 combinatorial complexity in the number of relations

 Heuristics
 not optimal
 regroup common sub-expressions
 perform selection, projection first
 replace a join by a series of semi-joins
 reorder operations to reduce intermediate relation size
 optimize individual operations

153
Optimization Timing
 Static
 compilation: optimize prior to execution
 difficult to estimate the size of intermediate results ⇒ error propagation
 can amortize over many executions
 E.g. R*
 Dynamic
 run time optimization
 exact information on the intermediate relation sizes
 have to reoptimize for multiple executions
 E.g. Distributed INGRES
 Hybrid
 compile using a static algorithm
 if the error in estimate sizes > threshold, reoptimize at run time
 E.g. MERMAID

154
Statistics
 Relation
 cardinality
 size of a tuple
 fraction of tuples participating in a join with another relation
 Attribute
 cardinality of domain
 actual number of distinct values
 Common assumptions
 independence between different attribute values
 uniform distribution of attribute values within their domain

155
Decision Sites
 Centralized
 single site determines the “best” schedule
 simple
 need knowledge about the entire distributed database
 Distributed
 cooperation among sites to determine the schedule
 need only local information
 cost of cooperation
 Hybrid
 one site determines the global schedule
 each site optimizes the local subqueries

156
Network Topology
 Wide area networks (WAN)
 characteristics
 low bandwidth
 low speed
 high protocol overhead
 communication cost will dominate; ignore all other cost factors
 global schedule to minimize communication cost
 local schedules according to centralized query optimization
 Local area networks (LAN)
 communication cost not that dominant
 total cost function should be considered
 broadcasting can be exploited (e.g. joins) to optimize query processing
 special algorithms exist for star networks

157
Exploitation of Replicated Fragments
 In distributed query processing, global relations are
mapped into queries on physical fragments of relations by
translating relations into fragments – localization
 Replication is needed for increasing reliability and
availability
 Optimization algorithms might exploit the existence of
replicated fragments at run time to minimize
communication time

158
Use of semijoins
 Semijoin reduces the size of the operand relation
 But it increases the number of messages and the local
processing time
 E.g. SDD-1, designed for slow wide area networks, uses
semijoins extensively

159
Layers of Query Processing

160
Query Decomposition
 Input : Calculus query on global relations
1. Normalization
 manipulate query quantifier and qualification
2. Analysis
 detect and reject “incorrect” queries
 possible for only a subset of relational calculus
3. Simplification
 eliminate redundant predicates
4. Restructuring
 calculus query is restructured into algebraic query
 more than one translation is possible
 use transformation rules
161
Normalization
 Lexical and syntactic analysis
 check validity (similar to compilers)
 check for attributes and relations
 type checking on the qualification
 Put into normal form
 Conjunctive normal form
(p11∨p12∨…∨p1n) ∧…∧ (pm1∨pm2∨…∨pmn)
 Disjunctive normal form
(p11∧p12 ∧…∧p1n) ∨…∨ (pm1 ∧pm2∧…∧pmn)
 OR's mapped into union
 AND's mapped into join or selection

162
Analysis
 Remove incorrect queries
 Type incorrect
 If any of its attribute or relation names are not defined in the global
schema
 If operations are applied to attributes of the wrong type
 Semantically incorrect
 Components do not contribute in any way to the generation of the
result
 Only a subset of relational calculus queries can be tested for
correctness
 Those that do not contain disjunction and negation
 Technique to detect incorrect queries
 connection graph (query graph) that represents the semantics of the query
 join graph
163
Analysis – Example
SELECT ENAME,RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND PNAME = "CAD/CAM"
AND DUR ≥ 36
AND TITLE = "Programmer"

164
Analysis
 If the query graph is not connected, the query is wrong.
SELECT ENAME,RESP, PNAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND PNAME = "CAD/CAM"
AND DUR ≥ 36
AND TITLE = "Programmer"

165
Simplification
 Use transformation rules
 elimination of redundancy
 idempotency rules
p1 ∧ ¬( p1) ⇔ false
p1 ∧ (p1 ∨ p2) ⇔ p1
p1 ∨ false ⇔ p1

 application of transitivity
 use of integrity rules

166
Simplification – Example
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = “J. Doe”
OR (NOT(EMP.TITLE = “Programmer”)
AND (EMP.TITLE = “Programmer” OR EMP.TITLE = “Elect. Eng.”)
AND NOT(EMP.TITLE = “Elect. Eng.”))

SELECT TITLE
FROM EMP
WHERE EMP.ENAME = “J. Doe”

167
Restructuring
 Convert relational calculus to
relational algebra
 Make use of query trees

Example
Find the names of employees other than
J. Doe who worked on the CAD/CAM
project for either 1 or 2 years.

SELECT ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME ≠ “J. Doe”
AND PNAME = “CAD/CAM”
AND (DUR = 12 OR DUR = 24)

168
Restructuring –Transformation Rules
 Commutativity of binary operations
 R×S⇔S×R
 R join S ⇔S join R
 R∪S⇔S∪R
 Associativity of binary operations
 ( R × S ) × T ⇔ R × (S × T)
 ( R join S) join T ⇔ R join (S join T)
 Idempotence of unary operations
 ΠA'(ΠA''(R)) ⇔ ΠA'(R)
 σp1(A1)(σp2(A2)(R)) ⇔ σp1(A1) ∧ p2(A2)(R)
where R[A], A' ⊆ A, A'' ⊆ A and A' ⊆ A''
 Commuting selection with projection
169
Restructuring –Transformation Rules
 Commuting selection with binary operations
 σp(A)(R × S) ⇔ (σp(A) (R)) × S
 σp(Ai)(R join(Aj,Bk) S) ⇔ (σp(Ai)(R)) join(Aj,Bk) S
 σp(Ai)(R ∪ T) ⇔ σp(Ai)(R) ∪ σp(Ai)(T)
where Ai belongs to R and T
 Commuting projection with binary operations
 ΠC(R × S) ⇔ΠA’(R) × ΠB’(S)
 ΠC(R join(Aj,Bk) S)⇔ΠA’(R) join(Aj,Bk) ΠB’(S)
 ΠC(R ∪ S) ⇔ΠC (R) ∪ ΠC (S)
where R[A] and S[B]; C = A' ∪ B' where A' ⊆ A, B' ⊆ B

170
Example
Example
Find the names of employees other than
J. Doe who worked on the CAD/CAM
project for either 1 or 2 years

SELECT ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME ≠ “J. Doe”
AND PNAME = “CAD/CAM”
AND (DUR = 12 OR DUR =
24)

171
Equivalent Query

172
Restructuring

[Query tree after pushing σDUR=12 ∨ DUR=24 down toward the ASG leaf]

173
Step 2 – Data Localization
 Input: Algebraic query on distributed relations
 Determine which fragments are involved
 Localization program
 substitute for each global relation its materialization (localization) program
 ➠ optimize

174
Example
 Assume
 EMP is fragmented into EMP1, EMP2,
EMP3 as follows:
 EMP1=σENO≤“E3”(EMP)
 EMP2= σ“E3”<ENO≤“E6”(EMP)
 EMP3=σENO>“E6”(EMP)
 ASG fragmented into ASG1 and ASG2 as
follows:
 ASG1=σENO≤“E3”(ASG)
 ASG2=σENO>“E3”(ASG)

Replace EMP by (EMP1 ∪ EMP2 ∪ EMP3) and ASG by (ASG1 ∪ ASG2) in any query
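 Expressed in SQL, the materialization program for EMP is simply the union of its fragments (a sketch):

SELECT * FROM EMP1
UNION
SELECT * FROM EMP2
UNION
SELECT * FROM EMP3;   -- substituted wherever the global relation EMP appears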

175
Provides Parallelism

176
Eliminates …

177
Reduction for PHF
 Reduction with selection
 Relation R and FR={R1, R2, …, Rw} where Rj=σ pj(R)
σpi(Rj) = ∅ if ∀x in Rj: ¬(pi(x) ∧ pj(x))
Example:
EMP1 = σENO≤“E3”(EMP)
EMP2 = σ“E3”<ENO≤“E6”(EMP)
EMP3 = σENO>“E6”(EMP)

SELECT *
FROM EMP
WHERE ENO = “E5”
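 After localization and reduction, only EMP2 can contain a tuple with ENO = “E5”, so the query collapses to (sketch):

SELECT *
FROM EMP2
WHERE ENO = 'E5';   -- σENO=“E5”(EMP1) and σENO=“E5”(EMP3) are empty by the rule above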

178
Reduction for PHF
 Reduction with join
 Possible if fragmentation is done on join attribute
 Distribute join over union
(R1 ∪ R2) join S ⇔ (R1 join S) ∪ (R2 join S)
 Given Ri = σpi(R) and Rj = σpj(R)
Ri join Rj = ∅ if ∀x in Ri, ∀y in Rj: ¬(pi(x) ∧ pj(y))

179
Reduction for PHF
 Reduction with join - Example
 Assume EMP is fragmented into three (as before) and ASG into two:
ASG1: σENO ≤ "E3"(ASG)
ASG2: σENO > "E3"(ASG)
EMP1 = σENO≤“E3”(EMP)
EMP2 = σ“E3”<ENO≤“E6”(EMP)
EMP3 = σENO>“E6”(EMP)
 Consider the query
SELECT * FROM EMP, ASG
WHERE EMP.ENO=ASG.ENO

180
Reduction for PHF
 Reduction with join
 Distribute join over unions
 Apply the reduction rule

181
Reduction for VF
 Find useless (not empty) intermediate relations
 Relation R defined over attributes A = {A1, ..., An} vertically
fragmented as Ri = ΠA'(R) where A' ⊆ A:
ΠD,K(Ri) is useless if the set of projection attributes D is not in A’
Example: EMP1= ΠENO,ENAME(EMP); EMP2= ΠENO,TITLE (EMP)
SELECT ENAME
FROM EMP
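 Here only EMP1 can contribute, since ENAME is not among EMP2's attributes; the reduced query is (sketch):

SELECT ENAME
FROM EMP1;   -- EMP2 = ΠENO,TITLE(EMP) cannot supply ENAME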

182
Reduction for DHF
 Rule :
 Distribute joins over unions
 Apply the join reduction for horizontal fragmentation

Example
ASG1: ASG ⋉ENO EMP1
ASG2: ASG ⋉ENO EMP2
EMP1: σTITLE=“Programmer”(EMP)
EMP2: σTITLE≠“Programmer”(EMP)

Query
SELECT *
FROM EMP, ASG
WHERE ASG.ENO = EMP.ENO
AND EMP.TITLE = “Mech. Eng.”
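 Since TITLE = “Mech. Eng.” contradicts the predicate defining EMP1 (TITLE = “Programmer”), distributing the join over the unions leaves only the EMP2/ASG2 branch; the reduced query is (sketch):

SELECT *
FROM EMP2 e, ASG2 a
WHERE a.ENO = e.ENO
AND e.TITLE = 'Mech. Eng.';   -- the EMP1/ASG1 branch is empty for this predicate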
183
Reduction for DHF

184
Reduction for DHF
Joins over unions

Elimination of the empty intermediate relations (left sub-tree)

185
Reduction for Hybrid Fragmentation
 Combine the rules already specified:
 Remove empty relations generated by contradicting selections
on horizontal fragments
 Remove useless relations generated by projections on vertical
fragments
 Distribute joins over unions in order to isolate and remove
useless joins

186
Reduction for Hybrid Fragmentation
Example
Consider the following hybrid
fragmentation:
EMP1=σENO≤"E4" (ΠENO,ENAME(EMP))
EMP2=σENO>"E4"
(ΠENO,ENAME(EMP))
EMP3= ΠENO,TITLE(EMP)
and the query
SELECT ENAME
FROM EMP
WHERE ENO=“E5”

187
Global Query Optimization
 Input: Fragment query
 Find the best (not necessarily optimal) global schedule
 Minimize a cost function
 Distributed join processing
 Bushy vs. linear trees
 Which relation to ship where?
 Ship-whole vs ship-as-needed
 Decide on the use of semi-joins
 Semi-join saves on communication at the expense of more local
processing.
 Join methods
 nested loop vs ordered joins (merge join or hash join)

188
Cost-Based Optimization
 Solution space
 The set of equivalent algebra expressions (query trees).
 Cost function (in terms of time)
 I/O cost + CPU cost + communication cost
 These might have different weights in different distributed
environments (LAN vs WAN).
 Can also maximize throughput
 Search algorithm
 How do we move inside the solution space?
 Exhaustive search, heuristic algorithms (iterative improvement,
simulated annealing, genetic,…)

189
5. Concurrency Control

190
Concurrency Control in Distributed
Database
 Concurrency control schemes deal with the handling of data
accessed by concurrent transactions.
 Various locking protocols are used for handling
concurrent transactions in centralized database systems.
 There are no major differences between the schemes in
centralized and distributed databases; the only major
difference is the way the lock manager deals with
replicated data.

191
Locking protocols
1. Single lock manager approach
2. Distributed lock manager approach
a) Primary Copy protocol
b) Majority protocol
c) Biased protocol
d) Quorum Consensus protocol

192
Single Lock Manager - Concurrency
Control in Distributed Database

193
Single Lock Manager …
1. Transaction T1 @S5 request for data item
D
2. The initiator site S5’s Transaction manager
sends the lock request to lock data item D
to the lock-manager site S3.
 The Lock-manager at site S3 will look for the
availability of the data item D.
3. If the requested item is not locked by any
other transactions, the lock-manager site
responds with lock grant message to the
initiator site S5.
4. The initiator site S5 can use the data item
D from any of the sites S1, S2, and S6 for
completing the Transaction T1.
5. After successful completion of the
Transaction T1, the Transaction manager
of S5 releases the lock by sending the
unlock request to the lock-manager site S3.

194
Primary Copy Protocol

195
Majority Based Protocol
 A transaction which needs to lock data item Q has to
request and lock data item Q in half+one sites in which Q
is replicated (i.e, majority of the sites in which Q is
replicated).
 The lock-managers of all the sites in which Q is replicated
are responsible for handling lock and unlock requests
locally and individually.
 Irrespective of the lock type (read or write, i.e., shared or
exclusive), we need to lock half+one sites.

196
Majority Based Protocol

197
Parallel Databases

198
Parallel Databases
 Introduction
 I/O Parallelism
 Interquery Parallelism
 Intraquery Parallelism
 Intraoperation Parallelism
 Interoperation Parallelism
 Design of Parallel Systems

199
Introduction
 Parallel machines are becoming quite common and affordable
 Prices of microprocessors, memory and disks have dropped sharply
 Recent desktop computers feature multiple processors and this trend
is projected to accelerate
 Databases are growing increasingly large
 large volumes of transaction data are collected and stored for later
analysis.
 multimedia objects like images are increasingly stored in databases
 Large-scale parallel database systems increasingly used for:
 storing large volumes of data
 processing time-consuming decision-support queries
 providing high throughput for transaction processing

200
Parallelism in Databases
 Data can be partitioned across multiple disks for parallel
I/O.
 Individual relational operations (e.g., sort, join,
aggregation) can be executed in parallel
 data can be partitioned and each processor can work
independently on its own partition.
 Queries are expressed in high level language (SQL,
translated to relational algebra)
 makes parallelization easier.
 Different queries can be run in parallel with each other.
Concurrency control takes care of conflicts.
 Thus, databases naturally lend themselves to parallelism.
201
Modes of Parallelism
 At the heart of all parallel machines is a collection of
processors.
 Each processor has its own local cache
 Parallel architectures fall into three broad groups
 The most tightly coupled architecture shares memory (shared-memory)
 A less tightly coupled architecture shares disks but not memory (shared-disk)
 Shared-nothing: neither memory nor disks are shared

202
Shared-Memory
 Each processor has access to all the memory of all the
processors. That is, there is a single physical address space
for the entire machine, rather than one address space for
each processor.
- Network cost, low extensibility

203
Shared-Disk

• Every processor has its own memory, which is not accessible
directly from other processors. However, the disks are accessible
from any of the processors through the communication network.
• Complexity, potential performance problem for cache coherency
204
Shared-Nothing

all processors have their own memory and their own disk or disks
the shared-nothing architecture is the most commonly used architecture for database systems
Used by Teradata, IBM, Sybase, Microsoft for OLAP
Prototypes: Gamma, Bubba, Grace, Prisma, EDS
+ Extensibility, availability
- Complexity, difficult load balancing

205
I/O Parallelism
 Reduce the time required to retrieve relations from disk by
partitioning the relations on multiple disks.
 Horizontal partitioning – tuples of a relation are divided
among many disks such that each tuple resides on one
disk.
 Partitioning techniques (number of disks = n):
Round-robin: send the ith tuple inserted in the relation to disk i
mod n.
Hash partitioning: choose a hash function h with range 0…n-1 on the
partitioning attribute; send a tuple with partitioning-attribute
value v to disk h(v). A good hash function distributes tuples
uniformly across the disks. (See the sketch after the
range-partitioning example below.)

207
I/O Parallelism (Cont.)
 Range partitioning: break tuples up into contiguous
ranges of keys, requires a key that can be ordered linearly
 Choose an attribute as the partitioning attribute.
 A partitioning vector [v0, v1, ..., vn-2] is chosen.
 Let v be the partitioning attribute value of a tuple. Tuples such
that vi ≤ v < vi+1 go to disk i + 1. Tuples with v < v0 go to disk 0 and
tuples with v ≥ vn-2 go to disk n-1.
E.g., with a partitioning vector [5,11], a tuple with partitioning
attribute value of 2 will go to disk 0, a tuple with value 8 will
go to disk 1, while a tuple with value 20 will go to disk 2.
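A minimal sketch of the three partitioning functions in Python (the
function names are illustrative):

import bisect

def round_robin(i, n):
    # i is the insertion order of the tuple, n the number of disks
    return i % n

def hash_partition(value, n):
    # value is the tuple's partitioning-attribute value
    return hash(value) % n

def range_partition(value, vector):
    # vector is the sorted partitioning vector [v0, ..., vn-2];
    # with vector [5, 11]: 2 -> disk 0, 8 -> disk 1, 20 -> disk 2
    return bisect.bisect_right(vector, value)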

208
Comparison of Partitioning Techniques
 Evaluate how well partitioning techniques support the
following types of data access:
1. Scanning the entire relation.
2. Locating a tuple (identify query) associatively – point
queries.
 Example: r.A = 25.
3. Locating a set of tuples based on the value of a given
attribute lies within a specified range – range queries.
 Example: 10  r.A < 25.

209
Comparison of Partitioning Techniques(Cont.)
Round robin:
 Advantages
 Best suited for sequential scan of entire relation on each query.
 All disks have almost an equal number of tuples; retrieval work
is thus well balanced between disks.
 Range queries are difficult to process
 No clustering - tuples are scattered across all disks

210
Comparison of Partitioning Techniques(Cont.)
Hash partitioning:
 Good for sequential access
 Assuming hash function is good, and partitioning attributes
form a key, tuples will be equally distributed between disks
 Retrieval work is then well balanced between disks.
 Good for point queries on partitioning attribute
 Can lookup single disk, leaving others available for answering
other queries.
 Index on partitioning attribute can be local to disk, making
lookup and update more efficient
 No clustering, so difficult to answer range queries

211
Range partitioning
 Partitioning requires a partitioning attribute A, usually the
primary key
 A vector of n-1 values partitions on A
 Vector {v0, v1, …, vn-2}
 Each tuple t goes into:
 Partition 0 if t[A] < v0
 Partition n-1 if t[A] ≥ vn-2
 Partition k if vk-1 ≤ t[A] < vk, for 1 ≤ k ≤ n-2
 Simple range partitioning: #disks = #partitions

212
Comparison of Partitioning Techniques (Cont.)
Range partitioning:
 Provides data clustering by partitioning attribute value.
 Good for sequential access
 Good for point queries on partitioning attribute: only one
disk needs to be accessed.
 For range queries on partitioning attribute, one to a few disks
may need to be accessed
 Remaining disks are available for other queries.
 Good if result tuples are from one to a few blocks.
 If many blocks are to be fetched, they are still fetched from one to
a few disks, and potential parallelism in disk access is wasted
 Example of execution skew.

213
Partitioning a Relation across Disks
 If a relation contains only a few tuples which will fit into a
single disk block, then assign the relation to a single disk.
 Large relations are preferably partitioned across all the
available disks.
 If a relation consists of m disk blocks and there are n disks
available in the system, then the relation should be
allocated min(m,n) disks.

214
Handling of Skew
 The distribution of tuples to disks may be skewed — that is,
some disks have many tuples, while others may have fewer
tuples.
 Types of skew:
 Attribute-value skew.
 occurs when many tuples are clustered around the same (or nearly
the same) value, i.e., some values appear in the partitioning
attribute of many tuples; all the tuples with the same value for the
partitioning attribute end up in the same partition.
 Can occur with range-partitioning and hash-partitioning.
 Partition skew.
 With range-partitioning, badly chosen partition vector may assign too
many tuples to some partitions and too few to others.
 Less likely with hash-partitioning if a good hash-function is chosen.
215
Handling Skew in Range-Partitioning
 To create a balanced partitioning vector (assuming
partitioning attribute forms a key of the relation):
 Sort the relation on the partitioning attribute.
 Construct the partition vector by scanning the relation in sorted
order as follows.
 After every 1/nth of the relation has been read, the value of the
partitioning attribute of the next tuple is added to the partition vector.
 n denotes the number of partitions to be constructed.
 Duplicate entries or imbalances can result if duplicates are
present in partitioning attributes.
 Alternative technique based on histograms used in
practice
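A minimal sketch of the sorted-scan construction, assuming the
partitioning-attribute values fit in memory:

def balanced_vector(values, n):
    # n partitions need a vector of n-1 entries
    s = sorted(values)
    step = len(s) // n
    # after every 1/n-th of the sorted relation, record the next value
    return [s[i * step] for i in range(1, n)]

For example, balanced_vector(salaries, 3) returns a 2-entry vector
that splits the relation into three roughly equal ranges.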

216
Handling Skew using Histograms
 Balanced partitioning vector can be constructed from
histogram in a relatively straightforward fashion
 Assume uniform distribution within each range of the
histogram
 Histogram can be constructed by scanning relation, or
sampling (blocks containing) tuples of the relation.

217
Handling Skew Using Virtual Processor
Partitioning
 Skew in range partitioning can be handled elegantly using
virtual processor partitioning:
 create a large number of partitions (say 10 to 20 times the
number of processors)
 Assign virtual processors to partitions either in round-robin
fashion or based on estimated cost of processing each virtual
partition
 Basic idea:
 If any normal partition would have been skewed, it is very
likely the skew is spread over a number of virtual partitions
 Skewed virtual partitions get spread across a number of
processors, so work gets distributed evenly!

218
Interquery Parallelism
 It is a form of parallelism where many different Queries or Transactions
are executed in parallel with one another on many processors
 Increases transaction throughput; used primarily to scale up a
transaction processing system to support a larger number of
transactions per second.
 Easiest form of parallelism to support, particularly in a shared-memory
parallel database, because even sequential database systems support
concurrent processing.
 More complicated to implement on shared-disk or shared-nothing
architectures
 Locking and logging must be coordinated by passing messages between
processors.
 Data in a local buffer may have been updated at another processor.
 Cache-coherency has to be maintained - reads and writes of data in buffer
must find latest version of data.
219
Cache Coherency Protocol
 Example of a cache coherency protocol for shared disk
systems:
 Before reading/writing to a page, the page must be locked in
shared/exclusive mode.
 On locking a page, the page must be read from disk
 Before unlocking a page, the page must be written to disk if it
was modified.
 More complex protocols with fewer disk reads/writes exist.
 Cache coherency protocols for shared-nothing systems are
similar. Each database page is assigned a home processor.
Requests to fetch the page or write it to disk are sent to the
home processor.
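A minimal sketch of the shared-disk rules above, simulated with
in-process dictionaries standing in for the shared disk and the lock
table (shared and exclusive modes are collapsed into one exclusive
lock for brevity):

import threading

DISK = {}   # page_id -> contents, standing in for the shared disk
LOCKS = {}  # page_id -> lock, standing in for the lock table

def read_page(page_id):
    lock = LOCKS.setdefault(page_id, threading.Lock())
    with lock:                    # lock the page before access
        return DISK.get(page_id)  # read from disk on locking

def write_page(page_id, contents):
    lock = LOCKS.setdefault(page_id, threading.Lock())
    with lock:
        DISK[page_id] = contents  # write to disk before unlocking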
220
Intraquery Parallelism
 Execution of a single query in parallel on multiple
processors/disks; important for speeding up long-running
queries.
SELECT * FROM Email ORDER BY Start_Date;
 Two complementary forms of intraquery parallelism:
 Intraoperation Parallelism – parallelize the execution of each
individual operation in the query.
 SELECT * FROM Email ORDER BY Start_Date; //(Sort
Operation)
 SELECT * FROM Student, CourseRegd WHERE Student.Regno
= CourseRegd.Regno; //(Join)

221
Intraquery Parallelism
 Interoperation Parallelism – execute the different operations
in a query expression in parallel.
 A single query may involve multiple operations at once.
 SELECT AVG(Salary) , dept_id FROM Employee GROUP BY
Dept_Id;

 It can be achieved in two ways:
1. Pipelined parallelism: the result produced by one operation is
consumed by the next operation in the pipeline.
Example: r1 ⋈ r2 ⋈ r3 ⋈ r4 (i.e., there is a logical dependency)
2. Independent parallelism: operations that do not depend on each
other can be executed in parallel at different processors.
222
Parallel Processing of Relational Operations
 The discussion of parallel algorithms assumes:
 read-only queries
 shared-nothing architecture
 n processors, P0, ..., Pn-1, and n disks D0, ..., Dn-1, where disk Di is
associated with processor Pi.
 If a processor has multiple disks they can simply simulate a
single disk Di.
 Shared-nothing architectures can be efficiently simulated
on shared-memory and shared-disk systems.
 Algorithms for shared-nothing systems can thus be run on
shared-memory and shared-disk systems.
 However, some optimizations may be possible.
223
Parallel Sort
Range-Partitioning Sort
 Assumptions:
 Assume n processors, P0, P1, …, Pn-1 and n disks D0, D1, …, Dn-1.
 Disk Di is associated with Processor Pi.
 Relation R is partitioned into R0, R1, …, Rn-1 using round-robin, hash,
or range partitioning (the latter only if range-partitioned on an attribute
other than the sorting attribute)
 Objective:
 to sort, in parallel on an attribute A, a relation R whose fragments Ri reside on n disks.
 i.e., choose processors P0, ..., Pm, where m ≤ n-1, to do the sorting.
 Step 1: Partition the relation Ri on the sorting attribute A at every processor
using a range vector v. Send the partitioned records that fall in the ith range
to processor Pi, where they are temporarily stored on Di.
 Step 2: Sort each partition locally at each processor Pi, and send the sorted
results for merging with all the other sorted results, which is a trivial process.
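A minimal sketch of range-partitioning sort in Python, simulated
sequentially (on real hardware each partition is sorted by its own
processor):

import bisect

def range_partition_sort(tuples, key, vector):
    # Step 1: redistribute tuples into ranges using the range vector
    partitions = [[] for _ in range(len(vector) + 1)]
    for t in tuples:
        partitions[bisect.bisect_right(vector, key(t))].append(t)
    # Step 2: sort each partition locally; concatenating the sorted
    # ranges yields the fully sorted relation
    for p in partitions:
        p.sort(key=key)
    return [t for p in partitions for t in p]

With the example that follows: range_partition_sort(employees,
key=lambda e: e[2], vector=[14000, 24000]) sorts (Emp_ID, EName,
Salary) tuples by Salary.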
224
 Assume that relation Employee(Emp_ID, EName, Salary) is permanently partitioned
using the round-robin technique across 3 disks D0, D1, and D2, which are associated
with processors P0, P1, and P2. At processors P0, P1, and P2, the relations are named
Employee0, Employee1 and Employee2 respectively.

 SELECT * FROM Employee ORDER BY Salary;


 Step 1: Construct a range vector of the form v[v0, v1, …, vn-2].
 Assume the range vector v[14000, 24000], representing range 0 (below 14000),
range 1 (14000 to 23999) and range 2 (24000 and above).
 Redistribute Employee0, Employee1 and Employee2 using this range vector and
store the partitions on temporary disks.

225
 Step 2: Sort each temporary table in ascending order and later merge

226
Parallel Sort (Cont.)
Parallel External Sort-Merge
 Assume the relation has already been partitioned among disks
D0, ..., Dn-1.
 Each processor Pi locally sorts the data on disk Di.
 The sorted runs on each processor are then merged to get the final
sorted output.
 Parallelize the merging of sorted runs as follows:
 The sorted partitions at each processor Pi are range-partitioned across the
processors P0, ..., Pm-1.
 Each processor Pi performs a merge on the streams as they are received,
to get a single sorted run.
 The sorted runs on processors P0,..., Pm-1 are concatenated to get the final
result.
227
SELECT * FROM Employee ORDER BY Salary;

v[14000, 24000]

228
Parallel Join
 The join operation requires pairs of tuples to be tested to
see if they satisfy the join condition, and if they do, the
pair is added to the join output.
 Parallel join algorithms attempt to split the pairs to be
tested over several processors. Each processor then
computes part of the join locally.
 In a final step, the results from each processor can be
collected together to produce the final result.

229
Partitioned Join
 For equi-joins and natural joins, it is possible to partition the two
input relations across the processors, and compute the join locally
at each processor.
 Let r and s be the input relations, and suppose we want to compute
r ⋈r.A=s.B s.
 r and s are each partitioned into n partitions, denoted r0, r1, ..., rn-1
and s0, s1, ..., sn-1.
 Can use either range partitioning or hash partitioning.
 r and s must be partitioned on their join attributes (r.A and s.B),
using the same range-partitioning vector or hash function.
 Partitions ri and si are sent to processor Pi.
 Each processor Pi locally computes ri ⋈ri.A=si.B si. Any of the
standard join methods can be used.

230
Partitioned Join (Cont.)

231
232
Partitioned Parallel Hash-Join
Parallelizing the partitioned hash join:
 Assume s is smaller than r and therefore s is chosen as the build
relation.
 A hash function h1 takes the join attribute value of each tuple in
s and maps this tuple to one of the n processors.
 Each processor Pi reads the tuples of s that are on its disk Di,
and sends each tuple to the appropriate processor based on hash
function h1. Let si denote the tuples of relation s that are sent to
processor Pi.
 As tuples of relation s are received at the destination processors,
they are partitioned further using another hash function, h2,
which is used to compute the hash-join locally. (Cont.)
237
Partitioned Parallel Hash-Join (Cont.)
 Once the tuples of s have been distributed, the larger relation r is
redistributed across the n processors using the hash function h1.
 Let ri denote the tuples of relation r that are sent to processor Pi.
 As the r tuples are received at the destination processors, they are
repartitioned using the function h2
 (just as the probe relation is partitioned in the sequential hash-join
algorithm).
 Each processor Pi executes the build and probe phases of the hash-
join algorithm on the local partitions ri and si of r and s to produce a
partition of the final result of the hash-join.
 Note: Hash-join optimizations can be applied to the parallel case
 e.g., the hybrid hash-join algorithm can be used to cache some of the
incoming tuples in memory and avoid the cost of writing them and reading
them back in.
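A minimal sketch of the partitioned parallel hash-join, simulated
sequentially in Python; h1 routes tuples to processors, and a dict
stands in for the local hash table built with h2:

def parallel_hash_join(r, s, r_key, s_key, n):
    h1 = lambda v: hash(v) % n
    # redistribute both relations on their join attributes using h1
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for t in r: r_parts[h1(r_key(t))].append(t)
    for t in s: s_parts[h1(s_key(t))].append(t)
    result = []
    for i in range(n):            # each Pi works independently
        build = {}                # build phase on the smaller relation s
        for t in s_parts[i]:
            build.setdefault(s_key(t), []).append(t)
        for t in r_parts[i]:      # probe phase with the larger relation r
            for match in build.get(r_key(t), []):
                result.append(match + t)
    return result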
238
Parallel Nested-Loop Join
 Assume that
 relation s is much smaller than relation r and that r is stored by partitioning.
 there is an index on a join attribute of relation r at each of the partitions of
relation r.
 Use asymmetric fragment-and-replicate, with relation s being
replicated, and using the existing partitioning of relation r.
 Each processor Pj where a partition of relation s is stored reads the
tuples of relation s stored in Dj, and replicates the tuples to every
other processor Pi.
 At the end of this phase, relation s is replicated at all sites that store tuples
of relation r.
 Each processor Pi performs an indexed nested-loop join of relation s
with the ith partition of relation r.
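A minimal sketch of the asymmetric fragment-and-replicate join,
simulated sequentially; the dict stands in for the pre-existing index
on the join attribute of each partition of r:

def fragment_replicate_join(r_partitions, s, r_key, s_key):
    result = []
    for r_part in r_partitions:  # each processor Pi, in parallel on real hardware
        index = {}               # local index on r's join attribute
        for t in r_part:
            index.setdefault(r_key(t), []).append(t)
        for st in s:             # s has been replicated to every site
            for rt in index.get(s_key(st), []):
                result.append(rt + st)
    return result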

239
Other Relational Operations
Selection σθ(r)
 If θ is of the form ai = v, where ai is an attribute and v a
value:
 If r is partitioned on ai, the selection is performed at a single
processor.
 If θ is of the form l ≤ ai ≤ u (i.e., θ is a range selection)
and the relation has been range-partitioned on ai:
 Selection is performed at each processor whose partition overlaps
with the specified range of values.
 In all other cases: the selection is performed in parallel at
all the processors.
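A minimal sketch of routing a range selection to only the overlapping
partitions, assuming range partitioning with a sorted vector:

import bisect

def processors_for_range(vector, lo, hi):
    # partitions whose value range overlaps [lo, hi]
    first = bisect.bisect_right(vector, lo)
    last = bisect.bisect_right(vector, hi)
    return list(range(first, last + 1))

For example, with vector [5, 11] the query 10 ≤ r.A < 25 touches only
processors_for_range([5, 11], 10, 24) = [1, 2], leaving disk 0 free
for other queries.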

240
Other Relational Operations (Cont.)
 Duplicate elimination
 Perform by using either of the parallel sort techniques
 eliminate duplicates as soon as they are found during sorting.
 Can also partition the tuples (using either range- or hash-
partitioning) and perform duplicate elimination locally at each
processor.

 Projection
 Projection without duplicate elimination can be performed as
tuples are read in from disk in parallel.
 If duplicate elimination is required, any of the above duplicate
elimination techniques can be used.

241
Grouping/Aggregation
 Partition the relation on the grouping attributes and then
compute the aggregate values locally at each processor.
 Can reduce cost of transferring tuples during partitioning by
partly computing aggregate values before partitioning.
 Consider the sum aggregation operation:
 Perform aggregation operation at each processor P i on those tuples
stored on disk Di
 results in tuples with partial sums at each processor.
 Result of the local aggregation is partitioned on the grouping
attributes, and the aggregation performed again at each processor P i
to get the final result.
 Fewer tuples need to be sent to other processors during
partitioning.
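A minimal sketch of parallel sum aggregation with local
pre-aggregation, simulated sequentially in Python:

def parallel_group_sum(partitions, key, value):
    # phase 1: each processor computes partial sums over its local tuples
    partials = []
    for part in partitions:
        acc = {}
        for t in part:
            acc[key(t)] = acc.get(key(t), 0) + value(t)
        partials.append(acc)
    # phase 2: partial sums are repartitioned on the grouping attribute
    # and combined; far fewer tuples cross the network than raw tuples
    final = {}
    for acc in partials:
        for k, v in acc.items():
            final[k] = final.get(k, 0) + v
    return final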
242
Cost of Parallel Evaluation of Operations
 If there is no skew in the partitioning and no overhead due
to the parallel evaluation, the expected execution time is 1/n
of the single-processor time (i.e., a speed-up of n)
 If skew and overheads are also to be taken into account,
the time taken by a parallel operation can be estimated as
Tpart + Tasm + max (T0, T1, …, Tn-1)
 Tpart is the time for partitioning the relations
 Tasm is the time for assembling the results
 Ti is the time taken for the operation at processor P i
 this needs to be estimated taking into account the skew, and the time
wasted in contentions.

243
Interoperator Parallelism
 Pipelined parallelism
 Consider a join of four relations:
r1 ⋈ r2 ⋈ r3 ⋈ r4
 Set up a pipeline that computes the three joins in parallel
 Let P1 be assigned the computation of
temp1 = r1 ⋈ r2
 And P2 be assigned the computation of temp2 = temp1 ⋈ r3
 And P3 be assigned the computation of temp2 ⋈ r4
 Each of these operations can execute in parallel, sending result
tuples it computes to the next operation even as it is computing
further results
 Provided a pipelineable join evaluation algorithm (e.g. indexed nested
loops join) is used
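A minimal sketch of the dataflow using Python generators: each stage
consumes result tuples as the previous stage produces them (true
pipeline parallelism would additionally place each stage on its own
processor):

def scan(relation):
    for t in relation:
        yield t

def pipelined_join(left_stream, right, left_key, right_key):
    index = {}                    # an indexed, pipelineable join
    for t in right:
        index.setdefault(right_key(t), []).append(t)
    for lt in left_stream:        # consume tuples as they arrive
        for rt in index.get(left_key(lt), []):
            yield lt + rt         # emit downstream immediately

# e.g. (with placeholder key functions k12, k23, k34):
# pipeline = pipelined_join(pipelined_join(pipelined_join(
#     scan(r1), r2, k12, k12), r3, k23, k23), r4, k34, k34)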
244
Factors Limiting Utility of Pipeline
Parallelism
 Pipeline parallelism is useful since it avoids writing
intermediate results to disk
 Useful with small number of processors, but does not
scale up well with more processors. One reason is that
pipeline chains do not attain sufficient length.
 Cannot pipeline operators which do not produce output
until all inputs have been accessed (e.g. aggregate and
sort)
 Little speedup is obtained for the frequent cases of skew
in which one operator's execution cost is much higher than
the others.

245
Independent Parallelism
 Independent parallelism
 Consider a join of four relations:
r1 ⋈ r2 ⋈ r3 ⋈ r4
 Let P1 be assigned the computation of temp1 = r1 ⋈ r2
 And P2 be assigned the computation of temp2 = r3 ⋈ r4
 And P3 be assigned the computation of temp1 ⋈ temp2
 P1 and P2 can work independently in parallel
 P3 has to wait for input from P1 and P2
 Can pipeline output of P1 and P2 to P3, combining independent parallelism
and pipelined parallelism
 Does not provide a high degree of parallelism
 useful with a lower degree of parallelism.
 less useful in a highly parallel system,

246
Design of Parallel Systems
Some issues in the design of parallel systems:
 Parallel loading of data from external sources is needed in
order to handle large volumes of incoming data.
 Resilience to failure of some processors or disks.
 Probability of some disk or processor failing is higher in a
parallel system.
 Operation (perhaps with degraded performance) should be
possible in spite of failure.
 Redundancy achieved by storing extra copy of every data item
at another processor.

247
Design of Parallel Systems (Cont.)
 Online reorganization of data and schema changes must
be supported.
 For example, index construction on terabyte databases can take
hours or days even on a parallel system.
 Need to allow other processing (insertions/deletions/updates) to be
performed on relation even as index is being constructed.
 Basic idea: index construction tracks changes and “catches up”
on changes at the end.
 Also need support for online repartitioning and schema
changes (executed concurrently with other processing).

248