Relational Algebra
Database Systems
L06: Relational Algebra
CS2102: Database Systems -- Adi Yoga Sidi Prabawa 1 / 66
Roadmap
Complex Expressions
Tree
Equivalence
Operands: variables or values from which new values can be constructed
Operators: symbols denoting procedures that construct new values from the given values
Relational Algebra
Operands: relations (or variables representing relations)
Operators: transformation from one or more input relations into one output relation
This is not easy, but if we do have it and we denote it as Rᶜ, then we can define set difference as
R − S = R ∩ Sᶜ
Try to check this on your own by drawing the Venn diagram. To help you, the Venn diagram for the complement is shown below.
Operators
Unary selection (σ) projection (π) renaming (ρ)
Binary union (∪) intersection (∩) difference (−)
product (×) inner join (⋈) outer join (⟕,⟖,⟗)
Note
All other operators (except "aggregate" and "sorting", not discussed here) can be expressed using
σ π ρ ∪ ∩ − ×
Example
Quiz #1
Question
Consider the following two operations:
1. (a × b) + (a × c)
2. a × (b + c)
Think of a common programming language such as Python. Are the two operations equivalent?
Plan
Hash Join (cost=29.80..186.32 rows=3872 width=64)
Hash Cond: (([Link])::text = ([Link])::text)
→ Seq Scan on restaurants r (cost=0.00..18.80 rows=880 width=64)
→ Hash (cost=18.80..18.80 rows=880 width=64)
→ Seq Scan on customers c (cost=0.00..18.80 rows=880 width=64)
Plan
Hash Join (cost=29.80..186.32 rows=3872 width=64)
Hash Cond: (([Link])::text = ([Link])::text)
→ Seq Scan on restaurants r (cost=0.00..18.80 rows=880 width=64)
→ Hash (cost=18.80..18.80 rows=880 width=64)
→ Seq Scan on customers c (cost=0.00..18.80 rows=880 width=64)
Plan
Nested Loop (cost=0.00..11655.80 rows=258133 width=64)
Join Filter: (([Link])::text < ([Link])::text)
→ Seq Scan on restaurants r (cost=0.00..18.80 rows=880 width=64)
→ Materialize (cost=0.00..23.20 rows=880 width=64)
→ Seq Scan on customers c (cost=0.00..18.80 rows=880 width=64)
Mathematical Example
ℤ+ is closed under {+}
but not under {+, −}
ℤ is closed under {+, −, ×}
but not under {+, −, ×, ÷}
ℝ is closed under {+, −, ×, ÷}✱
but not under {+, −, ×, ÷, √}
ℂ∞ is algebraically closed
(this is called the extended complex numbers)
✱ Except for division by 0.
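The closure examples above can be spot-checked in Python. This is only an illustration with arbitrarily chosen sample values, not a proof; the variable names are made up for this sketch.

```python
from fractions import Fraction
import cmath

# Positive integers: addition stays positive, subtraction may not.
a, b = 3, 5
sum_ab = a + b          # still a positive integer
diff_ab = a - b         # negative, so it leaves the positive integers

# Integers: +, -, x stay in the integers, but division may not.
quotient = Fraction(a, b)   # exact rational 3/5, which is not an integer

# Reals: the square root of a negative real is not a real number.
root = cmath.sqrt(-4.0)     # a complex result with a non-zero imaginary part
```

Each "but not under" line corresponds to one operation whose result falls outside the starting set.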
Closure
Definition
We say that a set of values is closed under a set of operators if any combination of the operators produces only values in the given set.
Theorem
Relations are closed under relational algebra
Proof
1. Inputs and outputs are relations
2. By (1),
This produces another relation
3. No other outputs are possible✱
✱ A slight problem arises when the result is an "error", but relations are still closed if we consider only valid formulas.
Closure
Implication
Chaining
We can chain operations because all inputs and outputs are relations. In other words, the output of one operator can be an input to another operator.
Example
Using the image on the right, we get
Op5 (
  Op2 ( Op1 ( R1 ) ),
  Op4 (
    Op3 ( R2 ),
    R3
  )
)
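The chained expression above can be mimicked in Python. This is a minimal sketch that assumes relations are modelled as frozensets of tuples; the concrete behaviour of op1 to op5 and the toy relations are invented here purely to show that each output can feed the next operator.

```python
# Toy single-column relations (hypothetical data).
R1 = frozenset({(1,), (2,), (3,), (4,)})
R2 = frozenset({(2,), (3,), (5,)})
R3 = frozenset({(3,), (5,), (7,)})

def op1(r):
    """Unary: keep rows whose value is greater than 1."""
    return frozenset(t for t in r if t[0] > 1)

def op2(r):
    """Unary: keep rows whose value is less than 4."""
    return frozenset(t for t in r if t[0] < 4)

def op3(r):
    """Unary: keep rows whose value is odd."""
    return frozenset(t for t in r if t[0] % 2 == 1)

def op4(r, s):
    """Binary: intersection."""
    return r & s

def op5(r, s):
    """Binary: union."""
    return r | s

# Same shape as the expression tree: every input and output is a relation.
result = op5(op2(op1(R1)), op4(op3(R2), R3))
```

Because every function returns the same kind of value it accepts, any nesting of these calls is well-formed, which is the point of closure.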
≥ >=
≡ N/A
≢ N/A
1. Precedence
2. Selection (op)
3. Negation
4. Conjunction
5. Disjunction
Note
The condition c must specify only attributes in R.
Naïve Implementation
def σ(c, R):
    res = Rel(Attr(R))
    for row in [Link]:
        if c(row) == True:
            res += row
    return res
Properties
The result has the same schema as the input relation.
The number of rows is often smaller.
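The naïve selection above can be turned into runnable Python. The following is a sketch under assumptions: the `Rel` class, the `select` name, and the toy rows are made up for illustration and are not the lecture's own data structures or sample data.

```python
from typing import Callable, NamedTuple

class Rel(NamedTuple):
    attrs: tuple        # ordered attribute names
    rows: frozenset     # set of value tuples, one per row

def select(cond: Callable[[dict], bool], r: Rel) -> Rel:
    """Keep the rows that satisfy cond; the schema is unchanged."""
    kept = frozenset(
        row for row in r.rows
        if cond(dict(zip(r.attrs, row)))   # expose the row as {attribute: value}
    )
    return Rel(r.attrs, kept)

# Hypothetical toy relation.
restaurants = Rel(("rname", "area"), frozenset({
    ("Alpha Diner", "North"),
    ("Beta Bistro", "North"),
    ("Gamma Grill", "South"),
}))

north = select(lambda t: t["area"] == "North", restaurants)
```

The result keeps the input schema and can only have the same number of rows or fewer, matching the properties listed above.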
Result
rname area
Spice Palace London
London Seafood Shack London
Thames River Tavern London
3 rows
Result
rname pizza price
Bella Italia Veggie 11
Spice Palace Veggie 13
Sizzle Grill BBQ Chicken 13
Sizzle Grill Supreme 15
Sizzle Grill Four Cheese 13
Sizzle Grill Mushroom 16
6 rows
Additionally, the ordered list of attributes ℓ must specify only attributes in R.
In the relational algebra tool, we use the following syntax:
π[ℓ] (R)
PROJECT[ℓ] (R)
Naïve Implementation
def π(ℓ, R):
    res = Rel(Attr(R)[ℓ])
    for row in [Link]:
        res += row[ℓ]
    return res
where row[ℓ] is a slicing of the row on the given columns specified in ℓ. This is the same for Attr(R)[ℓ].
Properties
Resulting schema is as specified by ℓ without the relation name.
The number of rows may be smaller, since a relation is defined as a set of tuples (duplicates removed).
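A runnable version of the projection idea might look as follows. It is a sketch only: `Rel`, `project`, and the toy rows are names and data assumed for illustration, and storing rows in a set is what removes duplicates.

```python
from typing import NamedTuple

class Rel(NamedTuple):
    attrs: tuple        # ordered attribute names
    rows: frozenset     # set of value tuples, one per row

def project(cols: tuple, r: Rel) -> Rel:
    """Keep only the columns in cols, in the order given by cols."""
    idx = [r.attrs.index(c) for c in cols]   # raises ValueError for unknown attributes
    rows = frozenset(tuple(row[i] for i in idx) for row in r.rows)
    return Rel(tuple(cols), rows)

# Hypothetical toy relation: two restaurants sell the same pizza at the same price.
sells = Rel(("rname", "pizza", "price"), frozenset({
    ("Alpha Diner", "Veggie", 11),
    ("Beta Bistro", "Veggie", 11),
    ("Beta Bistro", "Supreme", 15),
}))

menu = project(("pizza", "price"), sells)
```

Here three input rows collapse to two output rows because the two ("Veggie", 11) tuples become identical once rname is dropped.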
Result
cname
Alice
Bob
Emily
Lucas
Isabella
Ethan
Mia
Alexander
Ava
Daniel
10 rows
Choice
C σ[rname = 'Sizzle Grill'](Sells)
D π[pizza,price](σ[rname = 'Sizzle Grill'](Sells))
Relational Algebra Tools
In the relational algebra tool, we use the following syntax:
ρ[ℜ] (R)
RENAME[ℜ] (R)
Where ℜ is a comma-separated list of renaming operations of the form:
B <- A
The tool also admits a comma-separated list of attributes (i.e., option 2).
Notation for ℜ
There are 2 ways to specify ℜ and we will follow option 1. We mention option 2 for completeness as it is supported by the tools.
1. ℜ is an unordered collection of "Bi ← Ai" (without quotes)
   Example: ρ[B1 ← A1, B2 ← A2](R)
   The renaming is done "at the same time" (i.e., no chaining) and the resulting schema should not have duplicate attributes
2. ℜ is an ordered list of "Bi" (without quotes)
   Example: ρ[B1, B2](R)
   The order of attributes matters and the list need not include all attributes (trailing attributes not mentioned remain the same)
Note
In the case of option 1, the order of attributes does not matter. However, renaming all attributes produces a longer formula.
Also note that we do not have a way to rename the relation. Renaming a relation is also irrelevant as the relations produced by the operators are temporary.
Naïve Implementation
def ρ(ℜ, R):
    attr = ()
    for a in Attr(R):
        if a in ℜ:
            attr += (ℜ[a],)
        else:
            attr += (a,)
    res = Rel(attr)
    [Link] = [Link]
    return res
where ℜ is treated as a dictionary (i.e., hash table).
Properties
Resulting schema is the old schema renamed by ℜ.
The order of columns is unchanged (except for the renaming).
The number of rows remains the same.
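The renaming operator can be sketched in runnable Python along the same lines. `Rel` and `rename` are names assumed for this illustration; the mapping goes from old attribute name to new attribute name and is applied to all attributes at once, so a swap works without chaining.

```python
from typing import NamedTuple

class Rel(NamedTuple):
    attrs: tuple        # ordered attribute names
    rows: frozenset     # set of value tuples, one per row

def rename(mapping: dict, r: Rel) -> Rel:
    """Rename attributes simultaneously; mapping is {old name: new name}."""
    new_attrs = tuple(mapping.get(a, a) for a in r.attrs)
    if len(set(new_attrs)) != len(new_attrs):
        raise ValueError("renaming would produce duplicate attributes")
    return Rel(new_attrs, r.rows)    # rows are untouched

# Hypothetical toy relation.
r = Rel(("a", "b", "c"), frozenset({(1, 2, 3)}))

swapped = rename({"a": "b", "b": "a"}, r)   # simultaneous swap of a and b
```

Renaming only a to b on the same relation would raise `ValueError`, since the resulting schema would contain b twice.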
Result
pizza_name
Margherita
Veggie
Pepperoni
BBQ Chicken
Supreme
Four Cheese
Hawaiian
Mushroom
8 rows
Result
rname
Margherita
Hawaiian
BBQ Chicken
Mushroom
4 rows
More formally, let R(A1, A2, ..., An) and S(B1, B2, ..., Bm) be relations
✱ We denote this as (Attr(R) ∩ Attr(S)) = ∅ where Attr(R) is the set of attributes in R and Attr(S) is the set of attributes in S.
Product
Example
Question
Find all pairs of customer name and restaurant name such that they are in the same area.
Result
cname rname
12 rows
Naïve
def ⋈(θ, R, S):
    T = ×(R, S)
    res = σ(θ, T)
    return res
Better?
def ⋈(θ, R, S):
    res = Rel(Attr(R) + Attr(S))
    for row1 in [Link]:
        for row2 in [Link]:
            row = row1 + row2
            if θ(row) == True:
                res += row
    return res
Is It Better?
1. If T is too big, then memory management is a problem.
2. If θ only involves R or S, then the selection can be performed before the cross product.
Better?
def ⋈=(θ=, R, S):
res = Rel(Attr(R) + Attr(S))
for row1 in [Link]:
for row2 in HashGet([Link], θ=): # use hash table!
res += (row1 + row2)
return res
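The two join strategies above can be compared in runnable Python. This is a hedged sketch, not the lecture's implementation: `Rel`, `theta_join`, `equi_join`, and the toy data are assumptions, the two relations are assumed to have disjoint attribute names, and the hash join handles a single equality condition only.

```python
from collections import defaultdict
from typing import NamedTuple

class Rel(NamedTuple):
    attrs: tuple        # ordered attribute names
    rows: frozenset     # set of value tuples, one per row

def theta_join(cond, r: Rel, s: Rel) -> Rel:
    """Nested-loop join: test cond on each pair without storing the full product."""
    attrs = r.attrs + s.attrs
    out = set()
    for row1 in r.rows:
        for row2 in s.rows:
            row = row1 + row2
            if cond(dict(zip(attrs, row))):
                out.add(row)
    return Rel(attrs, frozenset(out))

def equi_join(r_attr: str, s_attr: str, r: Rel, s: Rel) -> Rel:
    """Hash join for the single condition r_attr = s_attr."""
    i, j = r.attrs.index(r_attr), s.attrs.index(s_attr)
    buckets = defaultdict(list)
    for row2 in s.rows:                       # build phase: hash S on s_attr
        buckets[row2[j]].append(row2)
    out = set()
    for row1 in r.rows:                       # probe phase: look up matching S rows
        for row2 in buckets.get(row1[i], ()):
            out.add(row1 + row2)
    return Rel(r.attrs + s.attrs, frozenset(out))

# Hypothetical toy relations.
customers = Rel(("cname", "area"), frozenset({("Alice", "North"), ("Bob", "South")}))
restaurants = Rel(("rname", "rarea"), frozenset({
    ("Alpha Diner", "North"), ("Beta Bistro", "North"), ("Gamma Grill", "East"),
}))

by_loop = theta_join(lambda t: t["area"] == t["rarea"], customers, restaurants)
by_hash = equi_join("area", "rarea", customers, restaurants)
```

Both calls return the same relation; the hash version avoids comparing every pair of rows, which is the advantage hinted at by the "use hash table" comment.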
where
ℓ = (Attr(R) ∩ Attr(S)) + (Attr(R) − Attr(S)) + (Attr(S) − Attr(R))✱
Attribute = (common) + (in R but not in S) + (in S but not in R)
Here, + is a tuple concatenation instead of set union as the order matters
θ = ∀ Ai ∈ (Attr(R) ∩ Attr(S)) : R.Ai = S.Ai
Condition = (all common attributes are equal using the = operator)
✱ This is actually equivalent to (Attr(R) ∪ Attr(S)) except that we want to be more explicit about the ordering
Caution
In the current implementation of the relational algebra tool, the order of attributes is different:
1. Attr(R)
2. Attr(S) − Attr(R)
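The attribute ordering and join condition of the natural join definition above can be sketched in Python. `Rel`, `natural_join`, and the toy rows are assumptions for illustration; the output order is (common) + (only in R) + (only in S), following the definition rather than the tool.

```python
from typing import NamedTuple

class Rel(NamedTuple):
    attrs: tuple        # ordered attribute names
    rows: frozenset     # set of value tuples, one per row

def natural_join(r: Rel, s: Rel) -> Rel:
    """Join on equality of all common attributes, keeping one copy of each."""
    common = tuple(a for a in r.attrs if a in s.attrs)
    only_r = tuple(a for a in r.attrs if a not in s.attrs)
    only_s = tuple(a for a in s.attrs if a not in r.attrs)
    attrs = common + only_r + only_s          # tuple concatenation: order matters
    out = set()
    for row1 in r.rows:
        t1 = dict(zip(r.attrs, row1))
        for row2 in s.rows:
            t2 = dict(zip(s.attrs, row2))
            if all(t1[a] == t2[a] for a in common):
                merged = {**t2, **t1}         # common values agree, so either copy works
                out.add(tuple(merged[a] for a in attrs))
    return Rel(attrs, frozenset(out))

# Hypothetical toy relations sharing the attribute "area".
customers = Rel(("cname", "area"), frozenset({("Alice", "North"), ("Bob", "South")}))
restaurants = Rel(("rname", "area"), frozenset({
    ("Alpha Diner", "North"), ("Beta Bistro", "North"), ("Gamma Grill", "East"),
}))

joined = natural_join(customers, restaurants)
```

The shared attribute appears once and comes first in the result schema, which is what distinguishes this ordering from the tool's ordering mentioned in the caution.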
Inner Joins
Natural Join
Question
Find the restaurant name, pizza, and price of every pizza sold by a restaurant in London.
Result
rname pizza price
Spice Palace Veggie 13
Spice Palace Mushroom 14
Spice Palace Supreme 16
Spice Palace Four Cheese 16
London Seafood Shack Margherita 14
: : :
12 rows
Question
But how do we generate the dangling tuple?
Idea
1. Find non-dangling tuples
   A. Find the inner join
   B. Perform projection to fit the relation
2. Remove non-dangling tuples from relation
Due to Step 1B above, this is guaranteed to be union-compatible
Dangle
dangle(R ⋈[θ] S) = R − π[Attr(R)](R ⋈[θ] S)
SQL?
SELECT *
FROM Restaurants R -- R
WHERE EXISTS (
SELECT 1
Note
The idea of finding non-dangling tuple is quite
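The formula dangle(R ⋈[θ] S) = R − π[Attr(R)](R ⋈[θ] S) can be traced with a short Python sketch. All names (`Rel`, `theta_join`, `dangle`, `left_outer_join`) and the toy data are assumptions; the left outer join is built here in the usual way, as the inner join plus the dangling tuples of R padded with `None` standing in for NULL.

```python
from typing import NamedTuple

class Rel(NamedTuple):
    attrs: tuple        # ordered attribute names
    rows: frozenset     # set of value tuples, one per row

def theta_join(cond, r: Rel, s: Rel) -> Rel:
    """Inner join by nested loops; assumes disjoint attribute names."""
    attrs = r.attrs + s.attrs
    rows = frozenset(
        row1 + row2
        for row1 in r.rows for row2 in s.rows
        if cond(dict(zip(attrs, row1 + row2)))
    )
    return Rel(attrs, rows)

def dangle(cond, r: Rel, s: Rel) -> Rel:
    """Rows of r with no partner in s: r minus the projection of the inner join."""
    inner = theta_join(cond, r, s)
    n = len(r.attrs)
    matched = frozenset(row[:n] for row in inner.rows)   # projection onto Attr(r)
    return Rel(r.attrs, r.rows - matched)                # union-compatible difference

def left_outer_join(cond, r: Rel, s: Rel) -> Rel:
    """Inner join plus the dangling rows of r padded with None."""
    inner = theta_join(cond, r, s)
    pad = (None,) * len(s.attrs)
    padded = frozenset(row + pad for row in dangle(cond, r, s).rows)
    return Rel(inner.attrs, inner.rows | padded)

# Hypothetical toy relations.
r = Rel(("x", "y"), frozenset({(1, "a"), (2, "b"), (3, "c")}))
s = Rel(("k", "v"), frozenset({(1, "p"), (1, "q")}))
same_key = lambda t: t["x"] == t["k"]

dangling = dangle(same_key, r, s)
outer = left_outer_join(same_key, r, s)
```

The projection step is what makes the difference well-defined, since both operands then have exactly the schema of R.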
Quiz
Consider the relations below. How many rows and columns are in the result of the relational algebra expression below?
σ[b ≡ NULL](R1 ⟕[x = a] R2)
NOTE: The above expression cannot currently be evaluated in the tools, try to trace it on your own
R1
x y z
x1 y1 10
x2 y1 5
x3 y1 15
x2 y2 20
x3 y3 30
R2
a b
x2 b2
x3 b3
Choice
A 1 row, 5 columns
B 5 rows, 5 columns
C 1 row, 3 columns
D 5 rows, 3 columns
We say that two relational algebra expressions Q1 and Q2 are isomorphic (denoted by Q1 ≅ Q2) if, for any input relations, both produce the same result with
possibly different column order
possibly different row order
Equivalent (≡)
a b c      a b c
3 4 5      1 2 3
2 3 4  ≡   2 3 4
1 2 3      3 4 5
Isomorphic (≅)
a b c      c b a
1 2 3      4 3 2
2 3 4  ≅   5 4 3
3 4 5      3 2 1
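The difference between ≡ and ≅ on the three tables above can be checked mechanically. This is a sketch under assumptions: `Rel`, `equivalent`, and `isomorphic` are illustrative names, rows are stored as a set so row order never matters, and equivalence is taken to mean the same columns in the same order.

```python
from typing import NamedTuple

class Rel(NamedTuple):
    attrs: tuple        # ordered attribute names
    rows: frozenset     # set of value tuples, one per row

def equivalent(r: Rel, s: Rel) -> bool:
    """Same column order and same set of rows."""
    return r.attrs == s.attrs and r.rows == s.rows

def isomorphic(r: Rel, s: Rel) -> bool:
    """Same rows once each value is labelled with its attribute name."""
    def labelled(rel):
        return {frozenset(zip(rel.attrs, row)) for row in rel.rows}
    return set(r.attrs) == set(s.attrs) and labelled(r) == labelled(s)

# The three tables from the slide.
first = Rel(("a", "b", "c"), frozenset([(3, 4, 5), (2, 3, 4), (1, 2, 3)]))
second = Rel(("a", "b", "c"), frozenset([(1, 2, 3), (2, 3, 4), (3, 4, 5)]))
third = Rel(("c", "b", "a"), frozenset([(4, 3, 2), (5, 4, 3), (3, 2, 1)]))

first_equiv_second = equivalent(first, second)
second_equiv_third = equivalent(second, third)
second_iso_third = isomorphic(second, third)
```

The first two tables differ only in row order, while the third also permutes the columns, so it is isomorphic but not equivalent to the others.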
R × (S × T) ≡ (R × S) × T (associative)
R ⋈ (S ⋈ T) ≅ (R ⋈ S) ⋈ T (associative, different column order)
(R ⋈[θ1] S) ⋈[θ2] T ≢ R ⋈[θ1] (S ⋈[θ2] T) if θ1 uses Attr(T) or θ2 uses Attr(R)
Projection
π[ℓ1](π[ℓ2](R)) ≢ π[ℓ1](R) unless ℓ1 ⊆ ℓ2
Combined
π[ℓ](σ[θ](R)) ≢ σ[θ](π[ℓ](R)) unless θ uses only attributes in ℓ
σ[θ](R × S) ≢ σ[θ](R) × S unless θ uses only Attr(R)
Caution
Note that in the current version of the relational algebra tool, we have a different column order for natural join. In the current implementation, you will see that
R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T
In our definition of natural join, it is only isomorphic.
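The rule that σ[θ](R × S) equals σ[θ](R) × S when θ uses only Attr(R) can be tested on a small instance. The sketch below checks one example rather than proving the rule; `Rel`, `select`, `product`, and the toy data are assumptions made for illustration.

```python
from typing import NamedTuple

class Rel(NamedTuple):
    attrs: tuple        # ordered attribute names
    rows: frozenset     # set of value tuples, one per row

def select(cond, r: Rel) -> Rel:
    """Keep the rows that satisfy cond."""
    return Rel(r.attrs, frozenset(
        row for row in r.rows if cond(dict(zip(r.attrs, row)))
    ))

def product(r: Rel, s: Rel) -> Rel:
    """Cross product; assumes disjoint attribute names."""
    return Rel(r.attrs + s.attrs, frozenset(
        row1 + row2 for row1 in r.rows for row2 in s.rows
    ))

# Hypothetical toy relations.
sells = Rel(("rname", "pizza", "price"), frozenset({
    ("Alpha Diner", "Veggie", 11),
    ("Beta Bistro", "Supreme", 15),
    ("Beta Bistro", "Mushroom", 16),
}))
areas = Rel(("area",), frozenset({("North",), ("South",)}))

pricey = lambda t: t["price"] >= 15          # uses only attributes of sells

select_late = select(pricey, product(sells, areas))    # filter after the product
select_early = product(select(pricey, sells), areas)   # filter before the product
```

Both orders give the same relation, but the early selection builds a product of 2 × 2 rows instead of 3 × 2, which is why an optimizer prefers it.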
Incompatible Attributes
σ[ price = 'Margherita' ](Sells)
Unnecessary Operators
π[ rname ](π[ rname , price ](Sells))
≡ π[ rname ](Sells)
Unoptimized Query
σ[ price ≥ 15 ]( Sells ⋈[name = rname] ρ[name ← rname](Restaurants))
≡ σ[ price ≥ 15 ]( Sells ) ⋈[name = rname] ρ[name ← rname](Restaurants)
Operators
Unary selection (σ) projection (π) renaming (ρ)
Binary union (∪) intersection (∩) difference (−)
product (×) inner join (⋈) outer join (⟕,⟖,⟗)
All other operators (except "aggregate" and "sorting", not discussed here) can be expressed using
σ π ρ ∪ ∩ − ×
Relational Algebra
π[a1, a2, ..., am](σ[θ](r1 × r2 × ... × rn))
⇓ Query optimizer
Query Plan
⇓ Code generator
Executable Code
L ⋈[θ] R
The visualization has different meaning here. It visualizes which relation we keep the dangling tuple from.
Solution
This actually depends on the type in Python because we can multiply a string with an integer.
YES if
a : INT
b : INT
c : INT
NO if
a : INT
b : TEXT
c : TEXT
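The type dependence described in this solution can be observed directly in Python. The function names `lhs` and `rhs` and the sample values are chosen only for illustration.

```python
def lhs(a, b, c):
    """Operation 1: (a x b) + (a x c)."""
    return (a * b) + (a * c)

def rhs(a, b, c):
    """Operation 2: a x (b + c)."""
    return a * (b + c)

# All integers: multiplication distributes over addition.
ints_equal = lhs(2, 3, 4) == rhs(2, 3, 4)

# a is an integer, b and c are text: * repeats a string and + concatenates.
text_lhs = lhs(2, "x", "y")    # "xx" + "yy"
text_rhs = rhs(2, "x", "y")    # "xy" repeated twice
text_equal = text_lhs == text_rhs
```

With text operands the two expressions interleave the characters differently, so the same symbols denote operations that no longer satisfy distributivity.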
Choice Comment
A σ[rname = 'Sizzle Grill'](π[price,pizza](Sells)) NO: attribute rname does not exist after projection
B π[price,pizza](σ[rname = 'Sizzle Grill'](Sells)) YES: correct column order and correct condition
Solution
The key here is the dangling tuples. Full outer join consists of (inner join) ∪ (left dangle) ∪ (right dangle).
Note that after we project the columns to include only Attr(R), we are left with (semi join) ∪ (left dangle).
So we need to remove the left dangle to get the semi join. However, this cannot be done after the projection because (semi join) ∪ (left dangle) is the original relation.
So we remove the left dangle before projection. This can be done by looking at {null(S)}. We construct a condition such that there is an attribute from Attr(S) that is non-NULL. This assumes we can construct ∃ or use a series of disjunctions.
Choice Comment
A 1 row, 5 column YES: this finds the dangling tuple
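The quiz expression σ[b ≡ NULL](R1 ⟕[x = a] R2) can be traced by hand or with a few lines of Python. This sketch hard-codes the quiz relations and the join condition x = a; the function name is made up, `None` stands in for NULL, and the first relation in the quiz is assumed to be R1.

```python
R1_ATTRS = ("x", "y", "z")
R1 = {("x1", "y1", 10), ("x2", "y1", 5), ("x3", "y1", 15),
      ("x2", "y2", 20), ("x3", "y3", 30)}
R2_ATTRS = ("a", "b")
R2 = {("x2", "b2"), ("x3", "b3")}

def left_outer_join_x_eq_a(r1, r2):
    """Left outer join on x = a (x and a are the first columns of their relations)."""
    out = set()
    for row1 in r1:
        matches = [row2 for row2 in r2 if row1[0] == row2[0]]
        if matches:
            out.update(row1 + row2 for row2 in matches)
        else:
            out.add(row1 + (None, None))     # dangling row of R1, padded with NULLs
    return out

attrs = R1_ATTRS + R2_ATTRS
joined = left_outer_join_x_eq_a(R1, R2)

b = attrs.index("b")
result = {row for row in joined if row[b] is None}   # selection: b is NULL
```

Only the x1 row of R1 has no partner in R2, so the selection keeps that single padded row with all five columns of the outer join.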
Solution
σ[ area = rarea ](Customers × (ρ[rarea ← area](Restaurants)))
≅ Customers ⋈ Restaurants
This is not equivalent but isomorphic because we are losing some columns AND the columns lost are duplicate columns
σ[ price ≥ 15 ]( Sells ⋈[name = rname] ρ[name ← rname](Restaurants))
≡ σ[ price ≥ 15 ]( Sells ) ⋈ Restaurants