
Distributed Databases and Client-Server Architectures

Nasrullah Memon

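File Systems: Motivation

Figure: each application program keeps its own data description and works on its own file: program 1 with data description 1 on File 1, program 2 with data description 2 on File 2, and program 3 with data description 3 on File 3.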

F7S/MIT – Database Systems Page 2


Database Management

Figure: application programs 1, 2 and 3, each carrying its own data semantics, share one database through a DBMS that provides data description, manipulation and control.

Integrating Databases and Communication

Figure: database technology contributes integration and computer networks contribute distribution; together they yield distributed database systems, whose goal remains integration.


Distributed Computing

• A number of autonomous processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks.

Distributed Computing
• Synonymous terms
– distributed data processing
– multiprocessors/multicomputers
– satellite processing
– backend processing
– dedicated/special purpose computers
– timeshared systems
– functionally modular systems
– peer-to-peer systems


What is distributed …

• Processing logic
• Functions
• Data
• Control


What is a Distributed Database System?
A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.

A distributed database management system (D-DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.

Distributed database system (DDBS) = DB + Communication

What is not a DDBS?

• A timesharing computer system
• A loosely or tightly coupled multiprocessor system
• A database system that resides at one of the nodes of a network of computers; this is a centralized database on a network node

Centralized DBMS on a Network

Figure: sites 1 to 5 connected by a communication network; the database resides at one site only.


Distributed DBMS Environment

Figure: sites 1 to 5 connected by a communication network, with the database distributed over the sites.

Implicit Assumptions
• Data are stored at a number of sites; each site logically consists of a single processor.
• Processors at different sites are interconnected by a computer network; no multiprocessors
– those are parallel database systems
• A distributed database is a database, not a collection of files; its data are logically related as exhibited in the users' access patterns
– relational data model
• A D-DBMS is a full-fledged DBMS
– not a remote file system, not a TP system

Shared-Memory Architecture

Processors P1 … Pn share a common memory M and a common disk D.

Examples: symmetric multiprocessors (Sequent, Encore) and some mainframes (IBM 3090, Bull's DPS8)

Shared-Nothing Architecture

Each processor Pi (P1 … Pn) has its own memory Mi and its own disk Di.

Examples: Teradata's DBC, Tandem, Intel's Paragon, NCR's 3600 and 3700

Applications
• Manufacturing, especially multi-plant manufacturing
• Military command and control
• Electronic fund transfers and electronic trading
• Corporate MIS
• Airline reservations
• Hotel chains
• Any organization that has a decentralized organization structure

Distributed DBMS Promises

• Transparent management of distributed, fragmented, and replicated data
• Improved reliability/availability through distributed transactions
• Improved performance
• Easier and more economical system expansion


Transparency
• Transparency is the separation of the higher-level semantics of a system from the lower-level implementation issues.
• The fundamental issue is to provide data independence in the distributed environment
– Network (distribution) transparency
– Replication transparency
– Fragmentation transparency
• horizontal fragmentation: selection
• vertical fragmentation: projection
• hybrid

Example

EMP
ENO  ENAME      TITLE
E1   J. Doe     Elect. Eng.
E2   M. Smith   Syst. Anal.
E3   A. Lee     Mech. Eng.
E4   J. Miller  Programmer
E5   B. Casey   Syst. Anal.
E6   L. Chu     Elect. Eng.
E7   R. Davis   Mech. Eng.
E8   J. Jones   Syst. Anal.

ASG
ENO  PNO  RESP        DUR
E1   P1   Manager     12
E2   P1   Analyst     24
E2   P2   Analyst      6
E3   P3   Consultant  10
E3   P4   Engineer    48
E4   P2   Programmer  18
E5   P2   Manager     24
E6   P4   Manager     48
E7   P3   Engineer    36
E7   P5   Engineer    23
E8   P3   Manager     40

PROJ
PNO  PNAME              BUDGET
P1   Instrumentation    150000
P2   Database Develop.  135000
P3   CAD/CAM            250000
P4   Maintenance        310000

PAY
TITLE        SAL
Elect. Eng.  40000
Syst. Anal.  34000
Mech. Eng.   27000
Programmer   24000


Transparent Access

SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE

Figure: sites in Tokyo, Boston, Paris, Montreal and New York connected by a communication network, with the data distributed as follows:
• Boston: Boston employees, Boston assignments, Boston projects
• Paris: Paris employees, Paris assignments, Paris projects, Boston employees
• Montreal: Montreal employees, Montreal assignments, Montreal projects, Paris projects, New York projects with budget > 200000
• New York: New York employees, New York assignments, New York projects, Boston projects
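To make transparent access concrete, here is a minimal sketch using Python's standard sqlite3 module. It assumes a single in-memory database standing in for the whole distributed database, so it shows only what the user sees (one query that names no site); the placement of data in Tokyo, Boston, Paris, Montreal and New York is not modelled. The column types and the ORDER BY clause are additions for the sketch.

```python
import sqlite3

# Sample rows copied from the EMP, ASG and PAY example relations.
EMP = [("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal."),
       ("E3", "A. Lee", "Mech. Eng."), ("E4", "J. Miller", "Programmer"),
       ("E5", "B. Casey", "Syst. Anal."), ("E6", "L. Chu", "Elect. Eng."),
       ("E7", "R. Davis", "Mech. Eng."), ("E8", "J. Jones", "Syst. Anal.")]
ASG = [("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24),
       ("E2", "P2", "Analyst", 6), ("E3", "P3", "Consultant", 10),
       ("E3", "P4", "Engineer", 48), ("E4", "P2", "Programmer", 18),
       ("E5", "P2", "Manager", 24), ("E6", "P4", "Manager", 48),
       ("E7", "P3", "Engineer", 36), ("E7", "P5", "Engineer", 23),
       ("E8", "P3", "Manager", 40)]
PAY = [("Elect. Eng.", 40000), ("Syst. Anal.", 34000),
       ("Mech. Eng.", 27000), ("Programmer", 24000)]

QUERY = """
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
    ORDER BY ENAME
"""

def run_query():
    # One in-memory database stands in for the whole distributed database:
    # the query text never mentions a site or a fragment.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT)")
    con.execute("CREATE TABLE ASG (ENO TEXT, PNO TEXT, RESP TEXT, DUR INTEGER)")
    con.execute("CREATE TABLE PAY (TITLE TEXT, SAL INTEGER)")
    con.executemany("INSERT INTO EMP VALUES (?, ?, ?)", EMP)
    con.executemany("INSERT INTO ASG VALUES (?, ?, ?, ?)", ASG)
    con.executemany("INSERT INTO PAY VALUES (?, ?)", PAY)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows

for ename, sal in run_query():
    print(ename, sal)
```

R. Davis appears twice because he has two assignments longer than 12; SELECT DISTINCT would collapse them.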

Distributed Database – User View

Figure: users see a single, logically integrated distributed database.


Distributed DBMS – Reality

Figure: several sites, each running its own DBMS software, are linked by a communication subsystem; at every site, users submit queries or run applications against the local DBMS software.

Potentially Improved Performance

• Proximity of data to its points of use
– Requires some support for fragmentation and replication
• Parallelism in execution
– Inter-query parallelism
– Intra-query parallelism


System Expansion
• Issue is database scaling
• Peer-to-peer systems
• Communication overhead

Distributed DBMS Issues
• Distributed Database Design
– how to distribute the database
– replicated & non-replicated database distribution
– a related problem is directory management
• Query Processing
– convert user transactions to data manipulation instructions
– optimization problem
– min{cost = data transmission + local processing}
– general formulation is NP-hard

Distributed DBMS Issues
• Concurrency Control
– Synchronization of concurrent accesses
– Consistency and isolation of transactions' effects
– Deadlock management

• Reliability
– How to make the system resilient to failures
– Atomicity and durability

• Privacy/Security
– Keep database access private
– Protect against malicious activities

• Trusted Collaborations (emerging requirements)
– Evaluate trust among users and database sites
– Enforce policies for privacy
– Enforce integrity

Relationship Between Issues

Figure: directory management, query processing, distribution design, reliability, concurrency control and deadlock management are shown as mutually dependent issues.

Related Issues
• Operating System Support
– operating system with proper support for
database operations
– dichotomy between general purpose
processing requirements and database
processing requirements
• Open Systems and Interoperability
– Distributed Multidatabase Systems
– More probable scenario
– Parallel issues
• Network Behavior

Architecture of a Database System

• Background material on database architecture
• Defines the structure of the system
– components identified
– functions of each component defined
– interrelationships and interactions between components defined


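ANSI/SPARC Architecture

Figure: three levels. Users see external views, each defined by an external schema; the external views are derived from a single conceptual view defined by the conceptual schema; the conceptual view is mapped onto the internal view defined by the internal schema.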

Standardization
• Reference Model
– A conceptual framework whose purpose is to divide
standardization work into manageable pieces and to show at a
general level how these pieces are related to one another.
• Approaches
– Component-based
• Components of the system are defined together with the
interrelationships between components.
• Good for design and implementation of the system.
– Function-based
• Classes of users are identified together with the functionality
that the system will provide for each class.
• The objectives of the system are clearly identified. But how do
you achieve these objectives?
– Data-based
• Identify the different types of describing data and specify the
functional units that will realize and/or use data according to
these views.
Conceptual Schema Definition
RELATION EMP [
KEY = {ENO}
ATTRIBUTES = {
ENO : CHARACTER(9)
ENAME : CHARACTER(15)
TITLE : CHARACTER(10)
}
]
RELATION PAY [
KEY = {TITLE}
ATTRIBUTES = {
TITLE : CHARACTER(10)
SAL : NUMERIC(6)
}
]

Conceptual Schema Definition


RELATION PROJ [
KEY = {PNO}
ATTRIBUTES = {
PNO : CHARACTER(7)
PNAME : CHARACTER(20)
BUDGET : NUMERIC(7)
}
]
RELATION ASG [
KEY = {ENO,PNO}
ATTRIBUTES = {
ENO : CHARACTER(9)
PNO : CHARACTER(7)
RESP : CHARACTER(10)
DUR : NUMERIC(3)
}
]

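As a rough illustration, the conceptual schema above can be written in SQL DDL. The sketch below runs it with Python's sqlite3 module (an assumption of this example, not part of the original); SQLite accepts the CHARACTER(n) and NUMERIC(n) type names but does not enforce their lengths, so only the keys are really checked here.

```python
import sqlite3

# The conceptual schema above written as SQL DDL. SQLite accepts the
# CHARACTER(n) and NUMERIC(n) type names but does not enforce the lengths.
DDL = """
CREATE TABLE EMP  (ENO    CHARACTER(9) PRIMARY KEY,
                   ENAME  CHARACTER(15),
                   TITLE  CHARACTER(10));
CREATE TABLE PAY  (TITLE  CHARACTER(10) PRIMARY KEY,
                   SAL    NUMERIC(6));
CREATE TABLE PROJ (PNO    CHARACTER(7) PRIMARY KEY,
                   PNAME  CHARACTER(20),
                   BUDGET NUMERIC(7));
CREATE TABLE ASG  (ENO    CHARACTER(9),
                   PNO    CHARACTER(7),
                   RESP   CHARACTER(10),
                   DUR    NUMERIC(3),
                   PRIMARY KEY (ENO, PNO));   -- KEY = {ENO,PNO}
"""

def make_db():
    con = sqlite3.connect(":memory:")
    con.executescript(DDL)
    return con

def composite_key_rejects_duplicate():
    """Insert two ASG rows with the same (ENO, PNO); the second must fail."""
    con = make_db()
    try:
        con.execute("INSERT INTO ASG VALUES ('E1', 'P1', 'Manager', 12)")
        con.execute("INSERT INTO ASG VALUES ('E1', 'P1', 'Analyst', 24)")
    except sqlite3.IntegrityError:
        return True
    finally:
        con.close()
    return False

print(composite_key_rejects_duplicate())  # True
```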
Internal Schema Definition
RELATION EMP [
KEY = {ENO}
ATTRIBUTES = {
ENO : CHARACTER(9)
ENAME : CHARACTER(15)
TITLE : CHARACTER(10)
}
]


INTERNAL_REL EMPL [
INDEX ON E# CALL EMINX
FIELD = {
HEADER: BYTE(1)
E# : BYTE(9)
ENAME : BYTE(15)
TIT : BYTE(10)
}
]

External View Definition – Example 1

Create a BUDGET view from the PROJ relation:

CREATE VIEW BUDGET(PNAME, BUD)
AS SELECT PNAME, BUDGET
FROM PROJ


External View Definition – Example 2

Create a PAYROLL view from relations EMP and PAY:

CREATE VIEW PAYROLL(ENO, ENAME, SAL)
AS SELECT EMP.ENO, EMP.ENAME, PAY.SAL
FROM EMP, PAY
WHERE EMP.TITLE = PAY.TITLE
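The two external views can be tried out with Python's sqlite3 module. This is a sketch under assumptions: the tables are reduced to the columns the views need, only a few sample rows are loaded, and CREATE VIEW with a column list requires SQLite 3.9 or later.

```python
import sqlite3

def build():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE EMP  (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
        CREATE TABLE PAY  (TITLE TEXT PRIMARY KEY, SAL INTEGER);
        CREATE TABLE PROJ (PNO TEXT PRIMARY KEY, PNAME TEXT, BUDGET INTEGER);

        -- External view 1: budget per project name.
        CREATE VIEW BUDGET(PNAME, BUD) AS
            SELECT PNAME, BUDGET FROM PROJ;

        -- External view 2: payroll obtained by joining EMP with PAY on TITLE.
        CREATE VIEW PAYROLL(ENO, ENAME, SAL) AS
            SELECT EMP.ENO, EMP.ENAME, PAY.SAL
            FROM EMP, PAY
            WHERE EMP.TITLE = PAY.TITLE;
    """)
    con.executemany("INSERT INTO EMP VALUES (?, ?, ?)",
                    [("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal.")])
    con.executemany("INSERT INTO PAY VALUES (?, ?)",
                    [("Elect. Eng.", 40000), ("Syst. Anal.", 34000)])
    con.executemany("INSERT INTO PROJ VALUES (?, ?, ?)",
                    [("P1", "Instrumentation", 150000), ("P2", "Database Develop.", 135000)])
    return con

con = build()
print(con.execute("SELECT * FROM PAYROLL ORDER BY ENO").fetchall())
# [('E1', 'J. Doe', 40000), ('E2', 'M. Smith', 34000)]
print(con.execute("SELECT BUD FROM BUDGET WHERE PNAME = 'Instrumentation'").fetchall())
# [(150000,)]
```

Users of either view query it like a table; the DBMS maps the request onto the conceptual schema.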

Alternatives in Distributed Database Systems

Figure: a design space with three axes, distribution, autonomy and heterogeneity. Positions marked in it: client/server, peer-to-peer distributed DBMS, multi-DBMS, distributed multi-DBMS, and federated DBMS.


Dimensions of the Problem
• Distribution
– Whether the components of the system are located on the
same machine or not
• Heterogeneity
– Various levels (hardware, communications, operating system)
– DBMS heterogeneity is the important one
• data model, query language, transaction management algorithms
• Autonomy
– Not well understood and most troublesome
– Various versions
• Design autonomy: Ability of a component DBMS to decide on issues related to its own design.
• Communication autonomy: Ability of a component DBMS to decide whether and how to communicate with other DBMSs.
• Execution autonomy: Ability of a component DBMS to execute local operations in any manner it wants to.

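Datalogical Distributed DBMS Architecture

Figure: external schemas ES1, ES2, ..., ESn sit on top of the global conceptual schema (GCS), which sits on top of the local conceptual schemas LCS1, LCS2, ..., LCSn; each LCSi sits on its local internal schema LISi.

ES: External Schema
GCS: Global Conceptual Schema
LCS: Local Conceptual Schema
LIS: Local Internal Schema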


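Datalogical Multi-DBMS Architecture

Figure: global external schemas GES1, GES2, …, GESn are defined over the global conceptual schema (GCS). Beneath the GCS are the local conceptual schemas LCS1, LCS2, …, LCSn; each LCSi also supports its own local external schemas (LES11 … LES1n at the first site, …, LESn1 … LESnm at the last) and maps to a local internal schema LISi.

• GES: Global External Schema • LCS: Local Conceptual Schema
• LES: Local External Schema • LIS: Local Internal Schema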

Timesharing Access to a Central Database

Figure: terminals or PC terminal emulators, with no data storage of their own, send batch requests over a network to a host that runs all the software (communications, application software and DBMS services) on top of the database; the host sends back responses.


Multiple Clients/Single Server

Figure: several clients, each running applications, client services and communications, connect over a LAN to a single server that runs communications and DBMS services over the database. Clients send high-level requests; the server returns filtered data only.

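Task Distribution

Figure: on the client, the application uses a query-language (QL) interface or a programmatic interface, and a communications manager sends the SQL query to the server and receives the result table. On the server, a communications manager hands the query to the server-side components (query optimizer, lock manager, storage manager, page & cache manager), which access the database.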


Advantages of Client-Server Architectures
• More efficient division of labor
• Horizontal and vertical scaling of resources
• Better price/performance on client machines
• Ability to use familiar tools on client machines
• Client access to remote data (via standards)
• Full DBMS functionality provided to client workstations
• Overall better system price/performance

Problems With Multiple-Client/Single-Server

• Server forms bottleneck
• Server forms single point of failure
• Database scaling difficult


Multiple Clients/Multiple Servers

Figure: a client (applications, client services, communications) is connected over a LAN to several servers, each running communications and DBMS services over its own database. The client services handle:
• directory
• caching
• query decomposition
• commit protocols

Server-to-Server

Figure: a client (applications, client services, communications) is connected over a LAN to several servers, each running communications and DBMS services over its own database, and the servers communicate with one another. The client side provides:
• SQL interface
• programmatic interface
• other application support environments


Components of a Multi-DBMS

Figure: the user sends global requests to, and receives responses from, the multi-DBMS layer, whose components are labeled GTP, GUI, GQP, GS, GRM and GQO. A Component Interface Processor (CIP) connects this layer to each component DBMS. Each component DBMS also accepts local requests from its own users and contains a user interface, query processor, query optimizer, transaction manager, scheduler, recovery manager and runtime support processor.

Directory Issues

Figure: a cube whose three axes are the directory's type (local or global), location (central or distributed) and replication (replicated or non-replicated). Its eight corners are:
• Global & central & non-replicated
• Global & distributed & non-replicated (?)
• Local & central & non-replicated (?)
• Local & distributed & non-replicated
• Global & central & replicated (?)
• Global & distributed & replicated
• Local & central & replicated (?)
• Local & distributed & replicated

Design Problem
• In the general setting:
Making decisions about the placement of data and programs across the sites of a computer network, as well as possibly designing the network itself.
• In a distributed DBMS, the placement of applications entails
– placement of the distributed DBMS software; and
– placement of the applications that run on the database

Distribution Design

• Top-down
– mostly in designing systems from scratch

– mostly in homogeneous systems

• Bottom-up
– when the databases already exist at a number
of sites



Distribution Design Issues
• Why fragment at all?

• How to fragment?

• How much to fragment?

• How to test correctness?

• How to allocate?

• Information requirements?


Fragmentation
• Can't we just distribute relations?
• What is a reasonable unit of distribution?
– relation
• views are subsets of relations
• extra communication
– fragments of relations (sub-relations)
• concurrent execution of a number of transactions that
access different portions of a relation
• views that cannot be defined on a single fragment will
require extra processing
• semantic data control (especially integrity enforcement)
more difficult

Fragmentation Alternatives – Horizontal

PROJ1: projects with budgets less than $200,000
PROJ2: projects with budgets greater than or equal to $200,000

PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

PROJ1
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York

PROJ2
PNO  PNAME              BUDGET  LOC
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston
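A minimal Python sketch of the horizontal fragmentation above, assuming tuples are plain (PNO, PNAME, BUDGET, LOC) tuples: each fragment is a selection on BUDGET, and because the two predicates are complementary the original relation is recovered by union.

```python
# PROJ tuples (PNO, PNAME, BUDGET, LOC) from the example above.
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
    ("P5", "CAD/CAM", 500000, "Boston"),
]

def select(rel, pred):
    """Relational selection: the tuples of rel that satisfy pred."""
    return [t for t in rel if pred(t)]

# Horizontal fragments are selections with complementary predicates on BUDGET.
PROJ1 = select(PROJ, lambda t: t[2] < 200000)
PROJ2 = select(PROJ, lambda t: t[2] >= 200000)

# The predicates never overlap, so concatenating the fragments is the union.
reconstructed = PROJ1 + PROJ2

print([t[0] for t in PROJ1], [t[0] for t in PROJ2])  # ['P1', 'P2'] ['P3', 'P4', 'P5']
print(sorted(reconstructed) == sorted(PROJ))          # True
```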

Fragmentation Alternatives – Vertical

PROJ1: information about project budgets
PROJ2: information about project names and locations

PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

PROJ1
PNO  BUDGET
P1   150000
P2   135000
P3   250000
P4   310000
P5   500000

PROJ2
PNO  PNAME              LOC
P1   Instrumentation    Montreal
P2   Database Develop.  New York
P3   CAD/CAM            New York
P4   Maintenance        Paris
P5   CAD/CAM            Boston
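A corresponding sketch for the vertical case, again with plain Python tuples: each fragment is a projection that keeps the key PNO, and the relation is rebuilt by joining the fragments on PNO. The helper names project and join_on_pno are illustrative only.

```python
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
    ("P5", "CAD/CAM", 500000, "Boston"),
]

def project(rel, positions):
    """Projection on the given attribute positions, duplicates removed."""
    return sorted({tuple(t[i] for i in positions) for t in rel})

# Both vertical fragments keep the key PNO so that they can be joined back.
PROJ1 = project(PROJ, [0, 2])      # (PNO, BUDGET)
PROJ2 = project(PROJ, [0, 1, 3])   # (PNO, PNAME, LOC)

def join_on_pno(r, s):
    """Equi-join on PNO, rebuilding (PNO, PNAME, BUDGET, LOC) tuples."""
    budget = {pno: b for pno, b in r}
    return sorted((pno, pname, budget[pno], loc)
                  for pno, pname, loc in s if pno in budget)

print(join_on_pno(PROJ1, PROJ2) == sorted(PROJ))  # True
```

Keeping the key in every fragment is what makes the join reconstruction lossless.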


Degree of Fragmentation

There is a finite number of alternatives, ranging from individual tuples or attributes at one end to whole relations at the other. The problem is finding the suitable level of partitioning within this range.

Correctness of Fragmentation
• Completeness
– Decomposition of relation $R$ into fragments $R_1, R_2, \ldots, R_n$ is complete if and only if each data item in $R$ can also be found in some $R_i$
• Reconstruction
– If relation $R$ is decomposed into fragments $R_1, R_2, \ldots, R_n$, then there should exist some relational operator $\nabla$ such that $R = \nabla_{1 \le i \le n} R_i$
• Disjointness
– If relation $R$ is decomposed into fragments $R_1, R_2, \ldots, R_n$, and data item $d_i$ is in $R_j$, then $d_i$ should not be in any other fragment $R_k$ ($k \ne j$).
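The three rules can be checked mechanically. The sketch below does so for horizontal fragments, where the reconstruction operator ∇ is union; for vertical fragments ∇ would be a join on the key, and disjointness would apply to the non-key attributes. The fragment predicates used for the faulty cases are made up for illustration.

```python
from collections import Counter

PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
    ("P5", "CAD/CAM", 500000, "Boston"),
]

def is_complete(r, fragments):
    """Every tuple of R appears in some fragment."""
    covered = set().union(*fragments)
    return all(t in covered for t in r)

def reconstructs(r, fragments):
    """For horizontal fragments the reconstruction operator is union."""
    return set().union(*fragments) == set(r)

def is_disjoint(fragments):
    """No tuple appears in more than one fragment."""
    counts = Counter(t for f in fragments for t in set(f))
    return all(c == 1 for c in counts.values())

good = [[t for t in PROJ if t[2] < 200000], [t for t in PROJ if t[2] >= 200000]]
overlapping = [[t for t in PROJ if t[2] <= 250000], [t for t in PROJ if t[2] >= 250000]]
incomplete = [[t for t in PROJ if t[2] < 200000], [t for t in PROJ if t[2] > 300000]]

for name, frags in [("good", good), ("overlapping", overlapping), ("incomplete", incomplete)]:
    print(name, is_complete(PROJ, frags), reconstructs(PROJ, frags), is_disjoint(frags))
```

P3 (budget 250000) lands in both overlapping fragments and in neither incomplete one, which is exactly what the disjointness and completeness checks detect.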


Other Fragmentation Issues
• Privacy
• Security
• Bandwidth of Connection
• Reliability
• Replication Consistency
• Local User Needs


Query Processing

high-level user query → query processor → low-level data manipulation commands


Query Processing Components

• Query language that is used
– SQL: “intergalactic dataspeak”
• Query execution methodology
– The steps that one goes through in executing high-level (declarative) user queries.
• Query optimization
– How do we determine the “best” execution plan?

Selecting Alternatives

SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
AND DUR > 37

Strategy 1
$\Pi_{ENAME}(\sigma_{DUR>37 \wedge EMP.ENO=ASG.ENO}(EMP \times ASG))$
Strategy 2
$\Pi_{ENAME}(EMP \bowtie_{ENO} \sigma_{DUR>37}(ASG))$

Notation: Π project, σ select, × Cartesian product, ⋈ join.
Strategy 2 avoids the Cartesian product, so it is “better”.

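The difference between the two strategies shows up in the size of the intermediate results. This Python sketch uses the EMP and ASG sample rows (only the ENO, ENAME, PNO and DUR columns) and evaluates both strategies as set comprehensions, so the projection also removes duplicates; it illustrates the relational algebra, not how a real DBMS evaluates the query.

```python
from itertools import product

EMP = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"), ("E4", "J. Miller"),
       ("E5", "B. Casey"), ("E6", "L. Chu"), ("E7", "R. Davis"), ("E8", "J. Jones")]
ASG = [("E1", "P1", 12), ("E2", "P1", 24), ("E2", "P2", 6), ("E3", "P3", 10),
       ("E3", "P4", 48), ("E4", "P2", 18), ("E5", "P2", 24), ("E6", "P4", 48),
       ("E7", "P3", 36), ("E7", "P5", 23), ("E8", "P3", 40)]   # (ENO, PNO, DUR)

# Strategy 1: Cartesian product first, then selection, then projection on ENAME.
cartesian = list(product(EMP, ASG))
strategy1 = {e[1] for e, a in cartesian if a[2] > 37 and e[0] == a[0]}

# Strategy 2: select on ASG first, then join on ENO, then project on ENAME.
long_asg = [a for a in ASG if a[2] > 37]
strategy2 = {e[1] for e in EMP for a in long_asg if e[0] == a[0]}

print(len(cartesian), len(long_asg))              # 88 3
print(strategy1 == strategy2, sorted(strategy2))
# True ['A. Lee', 'J. Jones', 'L. Chu']
```

Both strategies return the same names, but strategy 1 materializes 88 combined tuples while strategy 2 only carries 3 assignments into the join.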
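What is the Problem?

Fragment placement:
Site 1: $ASG_1 = \sigma_{ENO \le \text{"E3"}}(ASG)$
Site 2: $ASG_2 = \sigma_{ENO > \text{"E3"}}(ASG)$
Site 3: $EMP_1 = \sigma_{ENO \le \text{"E3"}}(EMP)$
Site 4: $EMP_2 = \sigma_{ENO > \text{"E3"}}(EMP)$
Site 5: result

Strategy A (work on the fragments where they are stored):
Site 1: $ASG_1' = \sigma_{DUR>37}(ASG_1)$, sent to site 3
Site 2: $ASG_2' = \sigma_{DUR>37}(ASG_2)$, sent to site 4
Site 3: $EMP_1' = EMP_1 \bowtie_{ENO} ASG_1'$, sent to site 5
Site 4: $EMP_2' = EMP_2 \bowtie_{ENO} ASG_2'$, sent to site 5
Site 5: $result = EMP_1' \cup EMP_2'$

Strategy B (ship all fragments to the result site):
Site 5: $result_2 = (EMP_1 \cup EMP_2) \bowtie_{ENO} \sigma_{DUR>37}(ASG_1 \cup ASG_2)$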
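A small simulation makes the trade-off visible. The sketch below assumes the sample EMP and ASG rows and counts only the number of tuples sent between sites (not bytes or per-message costs); it runs both strategies and checks that they produce the same ENAME values.

```python
EMP = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"), ("E4", "J. Miller"),
       ("E5", "B. Casey"), ("E6", "L. Chu"), ("E7", "R. Davis"), ("E8", "J. Jones")]
ASG = [("E1", "P1", 12), ("E2", "P1", 24), ("E2", "P2", 6), ("E3", "P3", 10),
       ("E3", "P4", 48), ("E4", "P2", 18), ("E5", "P2", 24), ("E6", "P4", 48),
       ("E7", "P3", 36), ("E7", "P5", 23), ("E8", "P3", 40)]   # (ENO, PNO, DUR)

# Fragment placement from the slide. Comparing ENO as strings is only
# safe here because every ENO value has a single digit.
ASG1 = [a for a in ASG if a[0] <= "E3"]   # site 1
ASG2 = [a for a in ASG if a[0] > "E3"]    # site 2
EMP1 = [e for e in EMP if e[0] <= "E3"]   # site 3
EMP2 = [e for e in EMP if e[0] > "E3"]    # site 4

def join_eno(emps, asgs):
    """EMP ⋈ENO ASG as a nested-loop join."""
    return [(e, a) for e in emps for a in asgs if e[0] == a[0]]

def strategy_a():
    asg1p = [a for a in ASG1 if a[2] > 37]       # computed at site 1
    asg2p = [a for a in ASG2 if a[2] > 37]       # computed at site 2
    shipped = len(asg1p) + len(asg2p)            # ASG1' -> site 3, ASG2' -> site 4
    emp1p = join_eno(EMP1, asg1p)                # computed at site 3
    emp2p = join_eno(EMP2, asg2p)                # computed at site 4
    shipped += len(emp1p) + len(emp2p)           # EMP1', EMP2' -> site 5
    names = {e[1] for e, _ in emp1p + emp2p}     # union at site 5, project ENAME
    return names, shipped

def strategy_b():
    shipped = len(EMP1) + len(EMP2) + len(ASG1) + len(ASG2)   # everything -> site 5
    long_asg = [a for a in ASG1 + ASG2 if a[2] > 37]
    names = {e[1] for e, _ in join_eno(EMP1 + EMP2, long_asg)}
    return names, shipped

names_a, shipped_a = strategy_a()
names_b, shipped_b = strategy_b()
print(sorted(names_a), shipped_a)   # ['A. Lee', 'J. Jones', 'L. Chu'] 6
print(sorted(names_b), shipped_b)   # ['A. Lee', 'J. Jones', 'L. Chu'] 19
```

On this tiny sample strategy A ships 6 tuples and strategy B ships 19; filtering and joining where the data live reduces what crosses the network.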

Query Optimization Objectives

• Minimize a cost function
I/O cost + CPU cost + communication cost
These might have different weights in different distributed environments.
• Wide area networks
– communication cost will dominate (80 – 200 ms)
• low bandwidth
• low speed
• high protocol overhead
– most algorithms ignore all other cost components
• Local area networks
– communication cost not that dominant (1 – 5 ms)
– total cost function should be considered
• Can also maximize throughput

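The cost function can be written down directly. In the sketch below the unit costs, their split into a per-message and a per-byte part, and all the numbers are hypothetical; they are chosen only to show how the same plan's cost is dominated by communication under WAN-like weights and by local work under LAN-like weights.

```python
from dataclasses import dataclass

@dataclass
class UnitCosts:
    """Hypothetical per-unit costs; real values depend on the system."""
    io: float    # cost of one disk I/O
    cpu: float   # cost of one CPU instruction
    msg: float   # fixed cost of initiating one message
    byte: float  # cost of transmitting one byte

def comm_cost(c, msgs, nbytes):
    return c.msg * msgs + c.byte * nbytes

def total_cost(c, ios, instrs, msgs, nbytes):
    """Total cost = I/O cost + CPU cost + communication cost."""
    return c.io * ios + c.cpu * instrs + comm_cost(c, msgs, nbytes)

def comm_share(c, ios, instrs, msgs, nbytes):
    """Fraction of the total cost that is due to communication."""
    return comm_cost(c, msgs, nbytes) / total_cost(c, ios, instrs, msgs, nbytes)

# Illustrative weights only: messages and bytes are far more expensive on the WAN.
WAN = UnitCosts(io=1.0, cpu=0.001, msg=100.0, byte=0.1)
LAN = UnitCosts(io=1.0, cpu=0.001, msg=2.0, byte=0.001)
plan = dict(ios=500, instrs=100_000, msgs=10, nbytes=40_000)

for name, c in [("WAN", WAN), ("LAN", LAN)]:
    print(f"{name}: total={total_cost(c, **plan):.0f}, "
          f"comm share={comm_share(c, **plan):.2f}")
```

With these made-up numbers communication accounts for about 89% of the WAN total (5600) but only about 9% of the LAN total (660).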
Complexity of Relational Operations

• Assume
– relations of cardinality n
– sequential scan

Operation                                          Complexity
Select; Project (without duplicate elimination)    O(n)
Project (with duplicate elimination); Group        O(n log n)
Join; Semi-join; Division; Set operators           O(n log n)
Cartesian product                                  O(n²)
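The O(n log n) bound for join corresponds to a sort-merge join: sort both relations on the join attribute, then merge them in one pass. A compact Python sketch, using the ENO, ENAME and PNO columns of the EMP and ASG examples:

```python
def sort_merge_join(r, s, key_r, key_s):
    """Equi-join of r and s: sort both inputs on the join key (O(n log n)),
    then merge them in one coordinated pass."""
    r = sorted(r, key=key_r)
    s = sorted(s, key=key_s)
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        kr, ks = key_r(r[i]), key_s(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the run of s tuples with this key ...
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == kr:
                j_end += 1
            # ... and pair it with every r tuple carrying the same key.
            while i < len(r) and key_r(r[i]) == kr:
                out.extend((r[i], t) for t in s[j:j_end])
                i += 1
            j = j_end
    return out

EMP = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"), ("E4", "J. Miller"),
       ("E5", "B. Casey"), ("E6", "L. Chu"), ("E7", "R. Davis"), ("E8", "J. Jones")]
ASG = [("E1", "P1"), ("E2", "P1"), ("E2", "P2"), ("E3", "P3"), ("E3", "P4"),
       ("E4", "P2"), ("E5", "P2"), ("E6", "P4"), ("E7", "P3"), ("E7", "P5"),
       ("E8", "P3")]

def eno(t):
    return t[0]   # ENO is the first column of both relations

joined = sort_merge_join(EMP, ASG, eno, eno)
print(len(joined))  # 11: every assignment matches exactly one employee
```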

Query Optimization Issues – Types of Optimizers
• Exhaustive search
– cost-based
– optimal
– combinatorial complexity in the number of relations
• Heuristics
– not optimal
– regroup common subexpressions
– perform selection and projection first
– replace a join by a series of semijoins
– reorder operations to reduce intermediate relation size
– optimize individual operations
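The semijoin heuristic reduces a relation before it is shipped. The sketch below assumes a hypothetical placement (EMP at site 1, the selected assignments σDUR>37(ASG) at site 2, where the join is wanted) and counts shipped items: joining directly sends all of EMP, while the semijoin program sends the ENO values of the assignments to site 1 and returns only the matching employees.

```python
EMP = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"), ("E4", "J. Miller"),
       ("E5", "B. Casey"), ("E6", "L. Chu"), ("E7", "R. Davis"), ("E8", "J. Jones")]
ASG = [("E1", "P1", 12), ("E2", "P1", 24), ("E2", "P2", 6), ("E3", "P3", 10),
       ("E3", "P4", 48), ("E4", "P2", 18), ("E5", "P2", 24), ("E6", "P4", 48),
       ("E7", "P3", 36), ("E7", "P5", 23), ("E8", "P3", 40)]   # (ENO, PNO, DUR)
S = [a for a in ASG if a[2] > 37]    # stored at site 2 (hypothetical placement)

def key(t):
    return t[0]   # ENO is the first column of both relations

def semijoin(r, join_values, key_r):
    """r ⋉ s, computed from the projection of s on its join attribute only."""
    return [t for t in r if key_r(t) in join_values]

def join(r, s, key_r, key_s):
    """Plain nested-loop equi-join, evaluated at site 2."""
    return [(a, b) for a in r for b in s if key_r(a) == key_s(b)]

# Option 1: ship all of EMP from site 1 to site 2 and join there.
direct = join(EMP, S, key, key)
shipped_direct = len(EMP)

# Option 2: ship the ENO values of S to site 1, reduce EMP there,
# ship only the reduced EMP back to site 2, then join.
eno_values = {key(t) for t in S}
reduced_emp = semijoin(EMP, eno_values, key)
via_semijoin = join(reduced_emp, S, key, key)
shipped_semijoin = len(eno_values) + len(reduced_emp)

print(shipped_direct, shipped_semijoin)          # 8 6
print(sorted(direct) == sorted(via_semijoin))    # True
```

The counts mix join values and whole tuples, so in bytes the saving is larger than 8 versus 6; a semijoin only pays off when the reduction outweighs the cost of sending the join column.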


Query Optimization Issues – Optimization Granularity

• Single query at a time
– cannot use common intermediate results
• Multiple queries at a time
– efficient if many similar queries
– decision space is much larger

Query Optimization Issues – Optimization Timing
• Static
– compilation ⇒ optimize prior to the execution
– difficult to estimate the size of the intermediate results ⇒ error propagation
– can amortize over many executions
– R*
• Dynamic
– run-time optimization
– exact information on the intermediate relation sizes
– have to reoptimize for multiple executions
– Distributed INGRES
• Hybrid
– compile using a static algorithm
– if the error in estimated sizes > threshold, reoptimize at run time
– MERMAID

Query Optimization Issues –
Statistics

• Relation
– cardinality
– size of a tuple
– fraction of tuples participating in a join with
another relation
• Attribute
– cardinality of domain
– actual number of distinct values
• Common assumptions
– independence between different attribute
values
– uniform distribution of attribute values within
their domain
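Under the two common assumptions, selectivity estimation reduces to simple formulas: an equality predicate on an attribute with d distinct values selects 1/d of the tuples, a range predicate selects the matching fraction of the domain, and a conjunction multiplies selectivities. A sketch with hypothetical statistics:

```python
def eq_selectivity(n_distinct):
    """Selectivity of A = v, assuming each of the n_distinct values is equally frequent."""
    return 1.0 / n_distinct

def gt_selectivity(v, lo, hi):
    """Selectivity of A > v, assuming A is uniformly distributed over [lo, hi]."""
    if v >= hi:
        return 0.0
    if v < lo:
        return 1.0
    return (hi - v) / (hi - lo)

def estimated_cardinality(card, *selectivities):
    """Independence assumption: the selectivity of a conjunction is the
    product of the individual selectivities."""
    result = float(card)
    for s in selectivities:
        result *= s
    return result

# Hypothetical statistics: 1000 tuples, an attribute with 50 distinct values,
# and a DUR attribute assumed uniform over [0, 48].
est = estimated_cardinality(1000, eq_selectivity(50), gt_selectivity(36, 0, 48))
print(est)  # 5.0
```

When the real data are skewed or attributes are correlated, these estimates can be far off, which is the error-propagation problem noted for static optimization.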

Query Optimization Issues – Decision Sites
• Centralized
– single site determines the “best” schedule
– simple
– need knowledge about the entire distributed database
• Distributed
– cooperation among sites to determine the schedule
– need only local information
– cost of cooperation
• Hybrid
– one site determines the global schedule
– each site optimizes the local subqueries


Query Optimization Issues –
Network Topology
• Wide area networks (WAN) – point-to-point
– characteristics
• low bandwidth
• low speed
• high protocol overhead
– communication cost will dominate; ignore all other cost
factors
– global schedule to minimize communication cost
– local schedules according to centralized query
optimization
• Local area networks (LAN)
– communication cost not that dominant
– total cost function should be considered
– broadcasting can be exploited (joins)
– special algorithms exist for star networks


Distributed Query Processing Methodology

Input: calculus query on distributed relations
1. Query decomposition (using the global schema) → algebraic query on distributed relations
2. Data localization (using the fragment schema) → fragment query
3. Global optimization (using statistics on fragments) → optimized fragment query with communication operations
Steps 1 to 3 run at the control site.
4. Local optimization (using the local schemas), at the local sites → optimized local queries

Restructuring
• Convert relational calculus to relational algebra
• Make use of query trees
• Example
Find the names of employees other than J. Doe who worked on the CAD/CAM project for either 1 or 2 years.

SELECT ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME <> 'J. Doe'
AND PNAME = 'CAD/CAM'
AND (DUR = 12 OR DUR = 24)

Query tree (root at the top; Π project, σ select, ⋈ join):
ΠENAME
  σDUR=12 ∨ DUR=24
    σPNAME="CAD/CAM"
      σENAME≠"J. Doe"
        ⋈PNO
          PROJ
          ⋈ENO
            ASG
            EMP

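Restructuring – Transformation Rules
• Commutativity of binary operations
– $R \times S \Leftrightarrow S \times R$
– $R \bowtie S \Leftrightarrow S \bowtie R$
– $R \cup S \Leftrightarrow S \cup R$
• Associativity of binary operations
– $(R \times S) \times T \Leftrightarrow R \times (S \times T)$
– $(R \bowtie S) \bowtie T \Leftrightarrow R \bowtie (S \bowtie T)$
• Idempotence of unary operations
– $\Pi_{A'}(\Pi_{A''}(R)) \Leftrightarrow \Pi_{A'}(R)$
– $\sigma_{p_1(A_1)}(\sigma_{p_2(A_2)}(R)) \Leftrightarrow \sigma_{p_1(A_1) \wedge p_2(A_2)}(R)$
where $R[A]$ and $A' \subseteq A$, $A'' \subseteq A$ and $A' \subseteq A''$
• Commuting selection with projection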


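Restructuring – Transformation Rules
• Commuting selection with binary operations
– $\sigma_{p(A)}(R \times S) \Leftrightarrow (\sigma_{p(A)}(R)) \times S$
– $\sigma_{p(A_i)}(R \bowtie_{(A_j,B_k)} S) \Leftrightarrow (\sigma_{p(A_i)}(R)) \bowtie_{(A_j,B_k)} S$
– $\sigma_{p(A_i)}(R \cup T) \Leftrightarrow \sigma_{p(A_i)}(R) \cup \sigma_{p(A_i)}(T)$, where $A_i$ belongs to $R$ and $T$
• Commuting projection with binary operations
– $\Pi_C(R \times S) \Leftrightarrow \Pi_{A'}(R) \times \Pi_{B'}(S)$
– $\Pi_C(R \bowtie_{(A_j,B_k)} S) \Leftrightarrow \Pi_{A'}(R) \bowtie_{(A_j,B_k)} \Pi_{B'}(S)$
– $\Pi_C(R \cup S) \Leftrightarrow \Pi_C(R) \cup \Pi_C(S)$
where $R[A]$ and $S[B]$; $C = A' \cup B'$ with $A' \subseteq A$, $B' \subseteq B$ (for the join rule the join attributes must be kept: $A_j \in A'$ and $B_k \in B'$)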
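Rules like these can be spot-checked on small relations. The sketch below uses made-up relations R[A1, A2], S[B1] and T (same schema as R), represented as Python sets of tuples, and evaluates both sides of four of the rules; agreement on one sample illustrates the rules but does not prove them.

```python
from itertools import product

# Hypothetical relations: R[A1, A2], S[B1], and T with the same schema as R.
R = {(1, "a"), (2, "b"), (3, "c")}
S = {("x",), ("y",)}
T = {(3, "c"), (4, "d")}

def sigma(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return {t for t in rel if pred(t)}

def pi(positions, rel):
    """Projection on the given attribute positions (sets remove duplicates)."""
    return {tuple(t[i] for i in positions) for t in rel}

def cross(r, s):
    """Cartesian product; R's attributes come first in each result tuple."""
    return {a + b for a, b in product(r, s)}

def p(t):
    return t[0] >= 2   # predicate p(A1), on an attribute of R only

checks = {
    "σ over ×": sigma(p, cross(R, S)) == cross(sigma(p, R), S),
    "σ over ∪": sigma(p, R | T) == sigma(p, R) | sigma(p, T),
    "Π over ×": pi([0, 2], cross(R, S)) == cross(pi([0], R), pi([0], S)),
    "Π over ∪": pi([1], R | T) == pi([1], R) | pi([1], T),
}
print(checks)   # every value is True
```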

Example
Recall the previous example: Find the names of employees other than J. Doe who worked on the CAD/CAM project for either one or two years.

SELECT ENAME
FROM PROJ, ASG, EMP
WHERE ASG.ENO = EMP.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME <> 'J. Doe'
AND PROJ.PNAME = 'CAD/CAM'
AND (DUR = 12 OR DUR = 24)

Query tree (root at the top; Π project, σ select, ⋈ join):
ΠENAME
  σDUR=12 ∨ DUR=24
    σPNAME="CAD/CAM"
      σENAME≠"J. Doe"
        ⋈PNO
          PROJ
          ⋈ENO
            ASG
            EMP


Equivalent Query

ΠENAME
  σPNAME="CAD/CAM" ∧ (DUR=12 ∨ DUR=24) ∧ ENAME≠"J. Doe"
    ⋈PNO ∧ ENO
      ASG
      ×
        PROJ
        EMP

Restructuring

ΠENAME
  ⋈PNO
    ΠPNO
      σPNAME="CAD/CAM"
        PROJ
    ΠPNO,ENAME
      ⋈ENO
        ΠPNO,ENO
          σDUR=12 ∨ DUR=24
            ASG
        ΠENO,ENAME
          σENAME≠"J. Doe"
            EMP
