
Relational Algebra

The document discusses relational algebra as part of a database systems course, focusing on its mathematical foundation and operators. It covers unary and binary operators, including selection, projection, and joins, and emphasizes the importance of closure in relational algebra. The document also includes examples and SQL query optimization techniques related to relational algebra operations.


CS2102

Database Systems
L06: Relational Algebra
CS2102: Database Systems -- Adi Yoga Sidi Prabawa 1 / 66
Roadmap

Overview

Sections
    Algebra
        Mathematical System
        Closure
    Unary Operators
        Selection
        Projection
        Renaming
    Binary Operators
        Set Operators
        Product
        Joins
    Complex Expressions
        Tree
        Equivalence


Algebra

Preliminary: Algebra

Definition
An algebra is a mathematical system consisting of
    Operands: variables or values from which new values can be constructed
    Operators: symbols denoting procedures that construct new values from the given values

Relational Algebra
    Operands: relations (or variables representing relations)
    Operators: transformations from one or more input relations into one output relation

Operators
    Unary: selection (σ), projection (π), renaming (ρ)
    Binary: union (∪), intersection (∩), difference (−), product (×), inner join (⋈), outer join (⟕, ⟖, ⟗)

Note
All other operators (except "aggregate" and "sorting", not discussed here) can be expressed using
    σ π ρ ∪ ∩ − ×

Other Ways to Get Set Difference?
Another way to get set difference is if we have a "complement" operation. This is not easy, but if we do have it and we denote it as R^c, then we can define set difference as

    R − S = R ∩ S^c

Try to check this on your own by drawing the Venn diagram. To help you, the slide shows the Venn diagram for the complement.


Preliminary: Optimization

Why Algebra?
Algebra allows for query optimization. If an SQL query can be expressed as a relational algebra expression, we can rewrite the query such that:
    It is equivalent to the original query
    It can be evaluated faster than the original query (using "cost estimation")

Example
Question
Consider the following two operations:
    1. (a × b) + (a × c)
    2. a × (b + c)
Which of the operations above is "faster"?

Quiz #1
Think of some common programming language such as Python. Are the two operations equivalent?
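To make the "faster" question concrete, here is a minimal Python sketch that counts the arithmetic operations each form performs. The `Counted` wrapper and `count_ops` helper are hypothetical names introduced only for this illustration, and the sketch looks at small integers only, so it does not settle Quiz #1 for other value types.

```python
class Counted:
    """Integer wrapper that counts how many + and * operations are performed."""
    ops = {"+": 0, "*": 0}

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        Counted.ops["+"] += 1
        return Counted(self.value + other.value)

    def __mul__(self, other):
        Counted.ops["*"] += 1
        return Counted(self.value * other.value)


def count_ops(expr, a, b, c):
    """Evaluate expr(a, b, c) and return (result, operation counts)."""
    Counted.ops = {"+": 0, "*": 0}
    result = expr(Counted(a), Counted(b), Counted(c))
    return result.value, dict(Counted.ops)


form1 = count_ops(lambda a, b, c: (a * b) + (a * c), 2, 3, 4)
form2 = count_ops(lambda a, b, c: a * (b + c), 2, 3, 4)
print(form1)  # (14, {'+': 1, '*': 2})
print(form2)  # (14, {'+': 1, '*': 1})
```

Both forms give the same integer result here, but the second needs one multiplication fewer. A query optimizer looks for the same kind of saving when it rewrites a relational algebra expression.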


Preliminary: Explain

SQL Query (join condition in WHERE)
    EXPLAIN
    SELECT rname, cname
    FROM Restaurants R, Customers C
    WHERE R.area = C.area;

Plan
    Hash Join  (cost=29.80..186.32 rows=3872 width=64)
      Hash Cond: ((r.area)::text = (c.area)::text)
      →  Seq Scan on restaurants r  (cost=0.00..18.80 rows=880 width=64)
      →  Hash  (cost=18.80..18.80 rows=880 width=64)
            →  Seq Scan on customers c  (cost=0.00..18.80 rows=880 width=64)

SQL Query (explicit INNER JOIN)
    EXPLAIN
    SELECT rname, cname
    FROM Restaurants R INNER JOIN Customers C
    ON R.area = C.area;

Plan
    Hash Join  (cost=29.80..186.32 rows=3872 width=64)
      Hash Cond: ((r.area)::text = (c.area)::text)
      →  Seq Scan on restaurants r  (cost=0.00..18.80 rows=880 width=64)
      →  Hash  (cost=18.80..18.80 rows=880 width=64)
            →  Seq Scan on customers c  (cost=0.00..18.80 rows=880 width=64)

Both formulations produce the same plan.


Preliminary: Explain

SQL Query (inequality condition)
    EXPLAIN
    SELECT rname, cname
    FROM Restaurants R, Customers C
    WHERE R.area < C.area;

Plan
    Nested Loop  (cost=0.00..11655.80 rows=258133 width=64)
      Join Filter: ((r.area)::text < (c.area)::text)
      →  Seq Scan on restaurants r  (cost=0.00..18.80 rows=880 width=64)
      →  Materialize  (cost=0.00..23.20 rows=880 width=64)
            →  Seq Scan on customers c  (cost=0.00..18.80 rows=880 width=64)


Closure: Basic

Definition
We say that a set of values is closed under a set of operators if any combination of the operators produces only values in the given set.

Mathematical Example
    ℤ+ is closed under {+} but not under {+, −}
    ℤ is closed under {+, −, ×} but not under {+, −, ×, ÷}
    ℝ is closed under {+, −, ×, ÷}✱ but not under {+, −, ×, ÷, √}
    ℂ∞ is algebraically closed (this is called the extended complex numbers)

✱ Except for division by 0.
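The first line of the mathematical example can be spot-checked with a few lines of Python. This is only a sketch: `stays_inside` and `is_positive_int` are hypothetical helper names, and a finite sample can refute closure (by finding a counterexample) but cannot prove it.

```python
import operator
from itertools import product


def stays_inside(sample, operators, belongs):
    """Apply every operator to every pair from the sample and report whether
    all results still belong to the set (a finite, one-step check only)."""
    return all(belongs(op(x, y))
               for op in operators
               for x, y in product(sample, repeat=2))


def is_positive_int(v):
    """Membership test for ℤ+."""
    return isinstance(v, int) and v > 0


sample = range(1, 20)  # a finite sample of ℤ+
closed_under_add = stays_inside(sample, [operator.add], is_positive_int)
closed_under_add_sub = stays_inside(
    sample, [operator.add, operator.sub], is_positive_int)
print(closed_under_add, closed_under_add_sub)  # True False
```

Subtraction breaks closure because, for example, 1 − 1 = 0 is not in ℤ+.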
Closure: Theorem

Theorem
Relations are closed under relational algebra.

Proof
    1. Inputs and outputs are relations
    2. By (1), this produces another relation
    3. No other outputs are possible✱

✱ A slight problem is when the result is an "error", but it is closed if we consider only valid formulas.
Closure: Implication

Chaining
We can chain operations because all inputs and outputs are relations. In other words, the output of one operator can be an input to another operator.

Example
Using the operator tree shown on the slide, we get
    Op5(
        Op2( Op1( R1 ) ),
        Op4(
            Op3( R2 ),
            R3
        )
    )
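The chaining idea can be sketched in Python with ordinary functions. The five operators below are hypothetical stand-ins that work on plain sets of integers rather than relations. The point is only that every function takes sets and returns a set, so calls nest freely in the same shape as the example.

```python
def op1(r):
    """Unary stand-in: shift every value by one."""
    return {x + 1 for x in r}


def op2(r):
    """Unary stand-in: keep even values."""
    return {x for x in r if x % 2 == 0}


def op3(r):
    """Unary stand-in: scale every value by ten."""
    return {x * 10 for x in r}


def op4(r, s):
    """Binary stand-in: union."""
    return r | s


def op5(r, s):
    """Binary stand-in: intersection."""
    return r & s


R1, R2, R3 = {1, 2, 3}, {0, 1}, {2, 4}

# Same nesting as the example: Op5( Op2(Op1(R1)), Op4(Op3(R2), R3) )
result = op5(op2(op1(R1)), op4(op3(R2), R3))
print(result)  # {2, 4}
```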


Sample Data: Schema (Pizza)

Schema
    Pizzas(pizza: TEXT)
    Customers(cname: TEXT, area: TEXT)
    Restaurants(rname: TEXT, area: TEXT)
    Likes(cname: TEXT, pizza: TEXT)
    Sells(rname: TEXT, pizza: TEXT, price: INTEGER)

Foreign Key Constraints
    (Likes.cname) ⇝ (Customers.cname)
    (Likes.pizza) ⇝ (Pizzas.pizza)
    (Sells.rname) ⇝ (Restaurants.rname)
    (Sells.pizza) ⇝ (Pizzas.pizza)

ER Diagram
    (diagram not reproduced)


Sample Data: Entity Sets

    Pizzas (8 rows)
    pizza
    Margherita
    Veggie
    Pepperoni
    BBQ Chicken
    Supreme
    Four Cheese
    Hawaiian
    Mushroom

    Customers (13 rows)
    cname      area
    Alice      New York City
    Bob        Los Angeles
    Emily      Chicago
    Lucas      London
    Isabella   New York City
    Ethan      Toronto
    Mia        Los Angeles
    :          :

    Restaurants (13 rows)
    rname                  area
    Bella Italia           New York City
    Sizzle Grill           Los Angeles
    Taste of Chicago       Chicago
    Spice Palace           London
    London Seafood Shack   London
    Toronto Tastes         Toronto
    Mumbai Masala          Mumbai
    :                      :


Sample Data: Relationship Sets

    Likes (38 rows)
    cname   pizza
    Alice   Margherita
    Alice   Veggie
    Alice   Pepperoni
    Alice   Mushroom
    Bob     BBQ Chicken
    Bob     Supreme
    Bob     Four Cheese
    :       :

    Sells (55 rows)
    rname          pizza         price
    Bella Italia   Margherita    10
    Bella Italia   Veggie        11
    Bella Italia   Pepperoni     12
    Bella Italia   Hawaiian      13
    Bella Italia   BBQ Chicken   14
    Bella Italia   Four Cheese   15
    Bella Italia   Mushroom      16
    :              :             :


Unary Operators

Selection: Syntax

Selection Operator
σ[c](R) where c is a condition that returns a boolean value (but potentially NULL)

    Precedence   Expr             Example                                             Name
    1            (expr)           σ[(start_year = 2020)](Projects)                    Precedence
    2            attr op const    σ[start_year = 2020](Projects)                      Constant Selection
    2            attr1 op attr2   σ[start_year ≤ end_year](Projects)                  Attribute Selection
    3            ¬expr            σ[¬(start_year = 2020)](Projects)                   Negation
    4            expr1 ∧ expr2    σ[start_year = 2020 ∧ manager = 'Judy'](Projects)   Conjunction (logical and)
    5            expr1 ∨ expr2    σ[start_year = 2020 ∨ manager = 'Judy'](Projects)   Disjunction (logical or)

Selection Operator op
    =  ≠  <  ≤  >  ≥  ≡  ≢

Precedence
The precedence is shown by the number in the left-most column.
    1. Precedence
    2. Selection (op)
    3. Negation
    4. Conjunction
    5. Disjunction

Note
The condition c must specify only attributes in R.

Relational Algebra Tools
In the relational algebra tool, we use the following syntax:
    σ[c] (R)
    SELECT[c] (R)
The operators for the conditions c are:
    RA   Tools
    =    ==
    ≠    != <>
    <    <
    ≤    <=
    >    >
    ≥    >=
    ≡    N/A
    ≢    N/A


Selection: Semantics

Selection Operator
σ[c](R) selects all tuples from a relation R (i.e., rows from a table) that satisfy the selection condition c, based on the principle of acceptance (a row is kept only when c evaluates to true, not when it is false or NULL).

Note
This can be mapped to the WHERE clause in SQL or the filter function in Python.

Properties
    The result has the same schema as the input relation.
    The number of rows is often smaller.

Naïve Implementation
    def σ(c, R):
        res = Rel(Attr(R))        # empty relation with the same schema as R
        for row in R.rows:        # R.rows: the tuples of R (attribute name assumed)
            if c(row) == True:    # keep the row only when c is exactly True
                res += row
        return res

where c(row) is the evaluation of the condition c on the current row.
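A runnable version of the naïve implementation might look like the sketch below. It rests on assumptions that are not in the slides: a relation is modelled as a pair `(attrs, rows)`, `None` stands in for NULL, and the "Pop-Up Kitchen" row is invented purely to show what happens when the condition evaluates to NULL.

```python
def select(cond, relation):
    """σ[cond](relation): keep the rows for which cond is exactly True.

    A relation is modelled as (attrs, rows): a tuple of attribute names and
    a set of value tuples. cond receives a dict {attribute: value} and may
    return True, False, or None (standing in for NULL).
    """
    attrs, rows = relation
    kept = {row for row in rows if cond(dict(zip(attrs, row))) is True}
    return attrs, kept


restaurants = (
    ("rname", "area"),
    {
        ("Bella Italia", "New York City"),
        ("Spice Palace", "London"),
        ("London Seafood Shack", "London"),
        ("Pop-Up Kitchen", None),  # hypothetical row with an unknown area
    },
)


def in_london(row):
    """area = 'London', evaluating to None (NULL) when area is unknown."""
    if row["area"] is None:
        return None
    return row["area"] == "London"


attrs, rows = select(in_london, restaurants)
print(attrs, sorted(rows))
```

The row with the unknown area is dropped because its condition is NULL rather than true, which is the principle of acceptance at work.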


Selection: Example

Question
Find all restaurants in London.

Result
    rname                  area
    Spice Palace           London
    London Seafood Shack   London
    Thames River Tavern    London

    3 rows


Selection: Exercise

Question
Find all restaurants that (a) sell Veggie for less than 14 or (b) are named Sizzle Grill.

Result
    rname          pizza         price
    Bella Italia   Veggie        11
    Spice Palace   Veggie        13
    Sizzle Grill   BBQ Chicken   13
    Sizzle Grill   Supreme       15
    Sizzle Grill   Four Cheese   13
    Sizzle Grill   Mushroom      16

    6 rows


Projection: Syntax

Projection Operator
π[ℓ](R) where ℓ is an ordered list of attributes

Caution
The order of attributes matters:
    π[A1, A2](R) ≢ π[A2, A1](R)

Note
For simplicity, we do not allow the following in ℓ:
    Operations (e.g., π[A1 + A2](R))
    Duplicates (e.g., π[A1, A1](R))
Additionally, the ordered list of attributes ℓ must specify only attributes in R.

Relational Algebra Tools
In the relational algebra tool, we use the following syntax:
    π[ℓ] (R)
    PROJECT[ℓ] (R)
where ℓ is a comma-separated list of attribute names.


Projection: Semantics

Projection Operator
π[ℓ](R) keeps only the columns (i.e., projects) specified in the ordered list ℓ, in the same order.

Note
This can be mapped to the SELECT clause in SQL or the map function in Python (without operation).

Properties
    The resulting schema is as specified by ℓ, without the relation name.
    The number of rows may be smaller: a relation is defined as a set of tuples, so duplicates are removed.

Naïve Implementation
    def π(ℓ, R):
        res = Rel(Attr(R)[ℓ])     # schema restricted to the columns in ℓ
        for row in R.rows:        # R.rows: the tuples of R (attribute name assumed)
            res += row[ℓ]
        return res

where row[ℓ] is a slicing of the row on the columns specified in ℓ. This is the same for Attr(R)[ℓ].
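A runnable counterpart of the projection pseudocode is sketched below, again assuming the `(attrs, rows)` model, which is an assumption of this example rather than something defined in the slides. Storing the rows in a Python set gives the duplicate removal for free.

```python
def project(names, relation):
    """π[names](relation): keep only the listed columns, in the listed order.

    relation is (attrs, rows) with rows stored as a set of tuples, so
    duplicate result rows collapse automatically.
    """
    attrs, rows = relation
    # attrs.index raises ValueError for an attribute that is not in R.
    positions = [attrs.index(name) for name in names]
    return tuple(names), {tuple(row[i] for i in positions) for row in rows}


likes = (
    ("cname", "pizza"),
    {("Alice", "Margherita"), ("Alice", "Veggie"), ("Bob", "Supreme")},
)

attrs, rows = project(["cname"], likes)
print(attrs, sorted(rows))  # ('cname',) [('Alice',), ('Bob',)]

swapped_attrs, swapped_rows = project(["pizza", "cname"], likes)
print(swapped_attrs)  # ('pizza', 'cname')
```

Three input rows become two output rows because Alice appears twice, and swapping the list changes the column order of the result.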


Projection: Example

Question
Find the names of all customers who like at least one pizza.

Result
    cname
    Alice
    Bob
    Emily
    Lucas
    Isabella
    Ethan
    Mia
    Alexander
    Ava
    Daniel

    10 rows


Projection: Exercise

Quiz #2
Which of the following relational algebra expressions results in the output below?

    price   pizza
    13      BBQ Chicken
    15      Supreme
    13      Four Cheese
    16      Mushroom

    Choice   Comment
    A        σ[rname = 'Sizzle Grill'](π[price,pizza](Sells))
    B        π[price,pizza](σ[rname = 'Sizzle Grill'](Sells))
    C        σ[rname = 'Sizzle Grill'](Sells)
    D        π[pizza,price](σ[rname = 'Sizzle Grill'](Sells))


Renaming: Syntax

Renaming Operator
ρ[ℜ](R) where ℜ is a collection of attribute renamings

Why Rename?
Renaming will be relevant for product and join operations.

Notation for ℜ
There are 2 ways to specify ℜ and we will follow option 1. We mention option 2 for completeness as it is supported by the tools.

1. ℜ is an unordered collection of "Bi ← Ai" (without quotes)
    Example: ρ[B1 ← A1, B2 ← A2](R)
    The renaming is done "at the same time" (i.e., no chaining) and the resulting schema should not have duplicate attributes
2. ℜ is an ordered list of "Bi" (without quotes)
    Example: ρ[B1, B2](R)
    The order of attributes matters and the list need not name all attributes (trailing attributes not mentioned remain the same)

Note
In the case of option 1, the order of attributes does not matter. However, renaming all attributes produces a longer formula.
Also note that we do not have a way to rename the relation. Renaming the relation is also irrelevant as the relations produced by the operators are temporary.

Relational Algebra Tools
In the relational algebra tool, we use the following syntax:
    ρ[ℜ] (R)
    RENAME[ℜ] (R)
where ℜ is a comma-separated list of renaming operations of the form:
    B <- A
The tool also admits a comma-separated list of attributes (i.e., option 2).


Renaming: Semantics

Renaming Operator
ρ[ℜ](R) renames all the attributes mentioned in ℜ such that for each renaming Bi ← Ai, the attribute Ai is renamed into Bi.

Note
This can be mapped to the AS keyword in the SELECT clause in SQL. There is no equivalent in Python.

Properties
    The resulting schema is the old schema renamed by ℜ.
    The order of columns is unchanged (except for the renaming).
    The number of rows remains the same.

Naïve Implementation
    def ρ(ℜ, R):
        attr = ()
        for a in Attr(R):
            if a in ℜ:
                attr += (ℜ[a],)   # renamed attribute
            else:
                attr += (a,)      # attribute kept as is
        res = Rel(attr)
        res.rows = R.rows         # rows are unchanged (attribute name assumed)
        return res

where ℜ is treated as a dictionary (i.e., hash table).
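The renaming pseudocode can be made runnable along the following lines. The sketch assumes the `(attrs, rows)` model and passes ℜ as a Python dict `{old_name: new_name}`. Both are choices made for this illustration only. Because every attribute is looked up against the original schema, the renaming happens "at the same time", so a swap works without chaining.

```python
def rename(mapping, relation):
    """ρ[ℜ](relation) with ℜ given as {old_name: new_name}.

    All renamings are applied simultaneously; the rows are untouched.
    """
    attrs, rows = relation
    unknown = set(mapping) - set(attrs)
    if unknown:
        raise ValueError(f"attributes not in relation: {sorted(unknown)}")
    new_attrs = tuple(mapping.get(a, a) for a in attrs)
    if len(set(new_attrs)) != len(new_attrs):
        raise ValueError("renaming produces duplicate attributes")
    return new_attrs, set(rows)


pizzas = (("pizza",), {("Margherita",), ("Veggie",)})
renamed_attrs, renamed_rows = rename({"pizza": "pizza_name"}, pizzas)
print(renamed_attrs)  # ('pizza_name',)

# Simultaneous renaming: swapping two names needs no temporary name.
pairs = (("a", "b"), {(1, 2)})
swapped_attrs, swapped_rows = rename({"a": "b", "b": "a"}, pairs)
print(swapped_attrs, swapped_rows)  # ('b', 'a') {(1, 2)}
```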


Renaming: Example

Question
Rename the attribute of Pizzas such that the name of the pizza becomes "pizza_name".

Result
    pizza_name
    Margherita
    Veggie
    Pepperoni
    BBQ Chicken
    Supreme
    Four Cheese
    Hawaiian
    Mushroom

    8 rows


Binary Operators

Set Operators: Operators

    Operation       Meaning   Description
    R UNION S       R ∪ S     A relation containing all tuples that are in R or S
    R INTERSECT S   R ∩ S     A relation containing all tuples that are in R and S
    R EXCEPT S      R − S     A relation containing all tuples that are in R but not S (this is an asymmetric operation)

The two relations must be union-compatible.

Relational Algebra Tools
In the relational algebra tool, we use the following syntax where E1 and E2 are arbitrary relational algebra expressions:
    E1 ∪ E2
    E1 ∩ E2
    E1 - E2
Alternatively, we also have the following:
    E1 UNION E2
    E1 INTERSECT E2
    E1 MINUS E2
If the relations produced by the two expressions are not union-compatible, the tool will raise an error.
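The three set operators map directly onto Python's built-in set methods once a union-compatibility check is in place. In the sketch below, "union-compatible" is assumed to mean the same number of attributes with matching value types column by column. That reading is an assumption, since the slides do not spell out the definition here. The rows of `a` and `b` are made up for the illustration.

```python
def union_compatible(r, s):
    """Assumed check: same arity and one value type per column position."""
    (attrs_r, rows_r), (attrs_s, rows_s) = r, s
    if len(attrs_r) != len(attrs_s):
        return False
    for i in range(len(attrs_r)):
        if len({type(row[i]) for row in rows_r | rows_s}) > 1:
            return False
    return True


def set_op(op, r, s):
    """Apply a set operation to two relations after checking compatibility.

    The result keeps the attribute names of the left operand.
    """
    if not union_compatible(r, s):
        raise ValueError("relations are not union-compatible")
    return r[0], op(r[1], s[1])


# Made-up rows for two restaurants' menus.
a = (("pizza",), {("Margherita",), ("Veggie",), ("Hawaiian",)})
b = (("pizza",), {("Margherita",), ("Hawaiian",), ("Supreme",)})

union = set_op(set.union, a, b)
both = set_op(set.intersection, a, b)
only_a = set_op(set.difference, a, b)
only_b = set_op(set.difference, b, a)
print(sorted(both[1]), sorted(only_a[1]), sorted(only_b[1]))
```

`only_a` and `only_b` differ, which is what "asymmetric operation" refers to for set difference.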


Set Operators: Example

Question
Find all the pizzas sold by both Bella Italia and Desert Diner.

Result
    pizza
    Margherita
    Hawaiian
    BBQ Chicken
    Mushroom

    4 rows


Product: Basic

Definition
The cross product of two relations (denoted R × S) is a relation formed by combining all pairs of tuples (i.e., rows) from the two input relations.

More formally, let R(A1, A2, ..., An) and S(B1, B2, ..., Bm) be relations with n and m attributes respectively, such that R and S have no common attributes✱. Then
    R × S produces the schema
        (R × S)(A1, A2, ..., An, B1, B2, ..., Bm)
    R × S produces the set of tuples
        R × S = { (a1, a2, ..., an, b1, b2, ..., bm) | (a1, a2, ..., an) ∈ R ∧ (b1, b2, ..., bm) ∈ S }

✱ We denote this as (Attr(R) ∩ Attr(S)) = ∅ where Attr(R) is the set of attributes in R and Attr(S) is the set of attributes in S.

Example
    R          S
    a  b       c  d  e
    1  2       A  B  C
    3  4       D  E  F
               G  H  I

    R × S
    a  b  c  d  e
    1  2  A  B  C
    1  2  D  E  F
    1  2  G  H  I
    3  4  A  B  C
    3  4  D  E  F
    3  4  G  H  I

Relational Algebra Tools
In the relational algebra tool, we use the following syntax where E1 and E2 are arbitrary relational algebra expressions:
    E1 × E2
    E1 CROSS E2
Product: Example

Question
Find all pairs of customer name and restaurant name such that they are in the same area.

Result
    cname   rname
    Alice   Bella Italia
    Alice   Big Apple Bistro
    Alice   Down Under Delights
    Bob     Sizzle Grill
    Bob     Hollywood Cafe
    :       :

    28 rows

Naïve Implementation
    def ×(R, S):
        attrR = Attr(R)
        attrS = Attr(S)
        attrT = attrR + attrS          # schema of R followed by schema of S
        res = Rel(attrT)
        for row1 in R.rows:            # R.rows, S.rows: the tuples (attribute name assumed)
            for row2 in S.rows:
                row = row1 + row2      # concatenate every pair of rows
                res += row
        return res
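As a sketch, the nested-loop cross product can be written in runnable Python as follows. It assumes relations are modelled as `(attrs, rows)` pairs, which is a choice made for this example, and it uses the small relations R(a, b) and S(c, d, e) from the basic product example as input.

```python
def cross(r, s):
    """R × S for relations modelled as (attrs, rows).

    Raises ValueError when the schemas share an attribute, mirroring the
    requirement Attr(R) ∩ Attr(S) = ∅.
    """
    attrs_r, rows_r = r
    attrs_s, rows_s = s
    common = set(attrs_r) & set(attrs_s)
    if common:
        raise ValueError(f"common attributes: {sorted(common)}")
    return attrs_r + attrs_s, {x + y for x in rows_r for y in rows_s}


R = (("a", "b"), {(1, 2), (3, 4)})
S = (("c", "d", "e"), {("A", "B", "C"), ("D", "E", "F"), ("G", "H", "I")})

attrs, rows = cross(R, S)
print(attrs)      # ('a', 'b', 'c', 'd', 'e')
print(len(rows))  # 6
```

The result has |R| × |S| = 2 × 3 = 6 rows.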


Product: Exercise

Question
Find the restaurant name, pizza, and price of every pizza sold by a restaurant in London.

Result
    rname                  pizza         price
    Spice Palace           Veggie        13
    Spice Palace           Mushroom      14
    Spice Palace           Supreme       16
    Spice Palace           Four Cheese   16
    London Seafood Shack   Margherita    14
    :                      :             :

    12 rows


Product: Discussion

Observation
Given two relations R and S, where the size of R is denoted as |R| and the size of S is denoted as |S|, the size of the cross product is |R| × |S|.

In practice, most queries requiring a cross product also require
    a selection operation to remove unnecessary rows
    a projection operation to remove unnecessary / duplicate (duplicate in meaning, not in name) columns

    π[cname, rname](σ[area = rarea](Customers × (ρ[rarea ← area](Restaurants))))

Join Operators
Join operators simplify relational algebra expressions by
    combining cross product, selection, and (optionally) projection
    avoiding the generation of all |R| × |S| intermediate tuples

Notation
    R ⋈[θ] S
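The expression above can be traced step by step with a small Python sketch. The assumptions here are that a relation is a list of dicts, that only a handful of rows from the sample data are used, and that the helper names are hypothetical. A θ-join written as a single filtered loop is included for comparison.

```python
def cross(r, s):
    """R × S; assumes the two schemas share no attribute name."""
    return [{**x, **y} for x in r for y in s]


def select(cond, r):
    return [row for row in r if cond(row)]


def rename(mapping, r):
    """ρ with the mapping given as {old_name: new_name}."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in r]


def project(names, r):
    out = []
    for row in r:
        picked = {n: row[n] for n in names}
        if picked not in out:  # relations are sets: drop duplicates
            out.append(picked)
    return out


def theta_join(cond, r, s):
    """Filter while pairing, so non-matching pairs are never stored."""
    return [{**x, **y} for x in r for y in s if cond({**x, **y})]


customers = [{"cname": "Alice", "area": "New York City"},
             {"cname": "Bob", "area": "Los Angeles"},
             {"cname": "Lucas", "area": "London"}]
restaurants = [{"rname": "Bella Italia", "area": "New York City"},
               {"rname": "Sizzle Grill", "area": "Los Angeles"},
               {"rname": "Spice Palace", "area": "London"},
               {"rname": "London Seafood Shack", "area": "London"}]

same_area = lambda row: row["area"] == row["rarea"]
renamed = rename({"area": "rarea"}, restaurants)

product_rows = cross(customers, renamed)  # 3 × 4 = 12 intermediate rows
via_product = project(["cname", "rname"], select(same_area, product_rows))
via_join = project(["cname", "rname"], theta_join(same_area, customers, renamed))
print(len(product_rows), len(via_product))  # 12 4
```

Both routes give the same four pairs. The join version simply never keeps the eight non-matching intermediate rows.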


Joins: Preliminary

Basic Idea
    Combine cross product, selection, and (optionally) projection
    Avoid generating all |R| × |S| intermediate tuples
    Typically results in simpler relational algebra expressions when formulating queries

Kinds
    Cross Product

    Inner Joins: include only tuples that satisfy the condition
        ⋈[θ]   θ-join
        ⋈[=]   equi join
        ⋈      natural join

    Outer Joins: also include tuples that do not satisfy the condition
        ⟕[θ]   left outer join
        ⟖[θ]   right outer join
        ⟗[θ]   full outer join


Inner Joins: Theta Join

Definition
The θ-join of two relations (denoted by R ⋈[θ] S) is defined as
    R ⋈[θ] S = σ[θ](R × S)
In other words, it is a cross product followed by selection.

Note
The condition θ can use attributes that appear in R or S.

Naïve
    def ⋈(θ, R, S):
        T = ×(R, S)                # materialize the full cross product
        res = σ(θ, T)
        return res

Better?
    def ⋈(θ, R, S):
        res = Rel(Attr(R) + Attr(S))
        for row1 in R.rows:        # R.rows, S.rows: the tuples (attribute name assumed)
            for row2 in S.rows:
                row = row1 + row2
                if θ(row): res += row
        return res

Is It Better?
    1. If T is too big, then memory management is a problem.
    2. If θ only involves R or S, then the selection can be performed before the cross product.


Inner Joins: Theta Join

Question
Find the restaurant name, pizza, and price of every pizza sold by a restaurant in London.

Result
    rname                  pizza         price
    Spice Palace           Veggie        13
    Spice Palace           Mushroom      14
    Spice Palace           Supreme       16
    Spice Palace           Four Cheese   16
    London Seafood Shack   Margherita    14
    :                      :             :

    12 rows


Inner Joins: Equi Join

Definition
The equi join of two relations (denoted by R ⋈[=] S) is defined as a special θ-join where the only relational operator that can be used is equality (e.g., = or ≡).

In other words, every equi join is a θ-join but not every θ-join is an equi join.

Note
This is often represented as a dedicated operation because attribute selection using the equality operator may be performed faster with hash tables.

Better?
    def ⋈=(θ=, R, S):
        res = Rel(Attr(R) + Attr(S))
        for row1 in R.rows:                     # R.rows, S.rows: the tuples (attribute name assumed)
            for row2 in HashGet(S.rows, θ=):    # use hash table!
                res += (row1 + row2)
        return res
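One way to read the `HashGet` idea is a classic build-and-probe hash join, sketched below for a single equality condition. The `(attrs, rows)` model and the function name `hash_equi_join` are assumptions of this sketch. It is not the algorithm of any particular database system, and it skips `None` keys so that NULL never matches NULL.

```python
from collections import defaultdict


def hash_equi_join(r, s, attr_r, attr_s):
    """R ⋈[attr_r = attr_s] S using a hash table built on S.

    Build phase: bucket the rows of S by their join value.
    Probe phase: for each row of R, look up only the matching bucket.
    """
    attrs_r, rows_r = r
    attrs_s, rows_s = s
    i, j = attrs_r.index(attr_r), attrs_s.index(attr_s)

    buckets = defaultdict(list)
    for row in rows_s:
        if row[j] is not None:  # NULL never equals anything
            buckets[row[j]].append(row)

    joined = {x + y for x in rows_r for y in buckets.get(x[i], [])}
    return attrs_r + attrs_s, joined


customers = (("cname", "area"),
             {("Alice", "New York City"), ("Lucas", "London"), ("Zed", None)})
restaurants = (("rname", "rarea"),
               {("Bella Italia", "New York City"),
                ("Spice Palace", "London"),
                ("London Seafood Shack", "London")})

attrs, rows = hash_equi_join(customers, restaurants, "area", "rarea")
print(attrs)
print(sorted(rows))
```

Each customer row triggers one dictionary lookup instead of a scan over every restaurant row. "Zed" is a made-up customer with an unknown area and finds no partner.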


Inner Joins: Natural Join

Definition
The natural join of two relations (denoted by R ⋈ S) is defined as

    R ⋈ S = π[ℓ](σ[θ](R × S))

where
    ℓ = (Attr(R) ∩ Attr(S)) + (Attr(R) − Attr(S)) + (Attr(S) − Attr(R))✱
        Attributes = (common) + (in R but not in S) + (in S but not in R)
        Here, + is tuple concatenation instead of set union, as the order matters
    θ = ∀ Ai ∈ (Attr(R) ∩ Attr(S)) : R.Ai = S.Ai
        Condition = (all common attributes are equal using the = operator)

✱ This is actually equivalent to (Attr(R) ∪ Attr(S)), except that we want to be more explicit about the ordering.

Note
This is an even more specialized case of equi join.

Caution
In the current implementation of the relational algebra tool, the order of attributes is different:
    1. Attr(R)
    2. Attr(S) − Attr(R)
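The definition of ℓ and θ translates fairly directly into Python. The sketch below assumes the `(attrs, rows)` model and follows the attribute order given in the definition (common, then only in R, then only in S), not the order used by the tool. The rows are a small excerpt of the sample data.

```python
def natural_join(r, s):
    """R ⋈ S: match on all common attributes and keep each of them once."""
    attrs_r, rows_r = r
    attrs_s, rows_s = s
    common = [a for a in attrs_r if a in attrs_s]
    only_r = [a for a in attrs_r if a not in attrs_s]
    only_s = [a for a in attrs_s if a not in attrs_r]
    out_attrs = tuple(common + only_r + only_s)  # the ordered list ℓ

    out_rows = set()
    for x in rows_r:
        dx = dict(zip(attrs_r, x))
        for y in rows_s:
            dy = dict(zip(attrs_s, y))
            # θ: every common attribute has equal values on both sides
            if all(dx[a] == dy[a] for a in common):
                merged = {**dy, **dx}
                out_rows.add(tuple(merged[a] for a in out_attrs))
    return out_attrs, out_rows


sells = (("rname", "pizza", "price"),
         {("Spice Palace", "Veggie", 13), ("Bella Italia", "Margherita", 10)})
restaurants = (("rname", "area"),
               {("Spice Palace", "London"), ("Bella Italia", "New York City")})

attrs, rows = natural_join(sells, restaurants)
print(attrs)  # ('rname', 'pizza', 'price', 'area')
print(sorted(rows))
```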
Inner Joins: Natural Join

Question
Find the restaurant name, pizza, and price of every pizza sold by a restaurant in London.

Result
    rname                  pizza         price
    Spice Palace           Veggie        13
    Spice Palace           Mushroom      14
    Spice Palace           Supreme       16
    Spice Palace           Four Cheese   16
    London Seafood Shack   Margherita    14
    :                      :             :

    12 rows


Outer Joins: Basic

Definition
Let R1(A1, A2, ..., Ai) and R2(B1, B2, ..., Bj) be relations. The outer joins of R1 and R2 are defined as

    Left    R1 ⟕[c] R2 = (R1 ⋈[c] R2) ∪ (dangle(R1 ⋈[c] R2) × {null(R2)})
    Right   R1 ⟖[c] R2 = (R1 ⋈[c] R2) ∪ ({null(R1)} × dangle(R2 ⋈[c] R1))
    Full    R1 ⟗[c] R2 = (R1 ⋈[c] R2) ∪ (dangle(R1 ⋈[c] R2) × {null(R2)}) ∪ ({null(R1)} × dangle(R2 ⋈[c] R1))

In each definition, the first term is the inner join, the term containing {null(R2)} contributes the left dangling tuples, and the term containing {null(R1)} contributes the right dangling tuples.

Question
But how do we generate the dangling tuples?


Outer Joins: Dangle

Idea
    1. Find non-dangling tuples
        A. Find the inner join
        B. Perform projection to fit the relation
    2. Remove non-dangling tuples from the relation
        Due to Step 1B above, this is guaranteed to be union-compatible

Dangle
    dangle(R ⋈[θ] S) = R − π[Attr(R)](R ⋈[θ] S)

Note
The idea of finding non-dangling tuples is quite common, and it is given its own operator called semi-join.
    R ⋉[θ] S = π[Attr(R)](R ⋈[θ] S)

SQL?
    SELECT *
    FROM Restaurants R            -- R
    WHERE EXISTS (
      SELECT 1
      FROM Customers C            -- S
      WHERE R.area = C.area       -- θ
    );                            -- R ⋉[θ] S

Semi Join vs Inner Join?
The schema produced is different:
    Inner Join: Attr(R) + Attr(S)
    Semi Join: Attr(R)

Quiz #3
Consider a system having outer join instead of semi join. Can you define semi join in terms of outer join?
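Putting the definitions of semi-join, dangle, and left outer join together gives the Python sketch below. It assumes the `(attrs, rows)` model with `None` standing in for NULL, and the two tiny relations are invented for the illustration.

```python
def theta_join(cond, r, s):
    """R ⋈[θ] S, where cond receives a dict over Attr(R) + Attr(S)."""
    attrs = r[0] + s[0]
    rows = {x + y for x in r[1] for y in s[1]
            if cond(dict(zip(attrs, x + y)))}
    return attrs, rows


def semi_join(cond, r, s):
    """R ⋉[θ] S = π[Attr(R)](R ⋈[θ] S): the non-dangling tuples of R."""
    _, joined = theta_join(cond, r, s)
    return r[0], {row[:len(r[0])] for row in joined}


def dangle(cond, r, s):
    """dangle(R ⋈[θ] S) = R − π[Attr(R)](R ⋈[θ] S)."""
    return r[0], r[1] - semi_join(cond, r, s)[1]


def left_outer_join(cond, r, s):
    """(R ⋈[θ] S) ∪ (dangle(R ⋈[θ] S) × {null(S)})."""
    attrs, inner = theta_join(cond, r, s)
    null_s = (None,) * len(s[0])  # null(S): one NULL per attribute of S
    padded = {row + null_s for row in dangle(cond, r, s)[1]}
    return attrs, inner | padded


# Invented example relations.
R = (("k", "v"), {(1, "p"), (2, "q")})
S = (("m", "w"), {(1, "x")})
same_key = lambda row: row["k"] == row["m"]

print(dangle(same_key, R, S))           # (('k', 'v'), {(2, 'q')})
print(left_outer_join(same_key, R, S))
```

The row (2, 'q') has no partner in S, so it dangles and shows up in the left outer join padded with NULLs.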


Outer Joins: null

What About null(R)?
Since this depends on the number of attributes of R, we have two ways to do this.
    1. Assume that there is an infinite-length tuple called null_row and we do projection
        null(R) = π[Attr(R)](null_row)
    2. Perform a repeated operation in a loop
        null(R) = ∏Attr(R){NULL}
        where ∏Attr(R)(z) is (z × z × ... × z), performing the product as many times as the number of attributes of R

SQL?
    SELECT NULL, NULL, NULL, ..., NULL;


Outer Joins: Quiz

Quiz #4
Consider the relations below. How many rows and columns are in the result of the relational algebra expression below?

    σ[b ≡ NULL](R1 ⟕[x = a] R2)

    R1               R2
    x    y    z      a    b
    x1   y1   10     x2   b2
    x2   y1   5      x3   b3
    x3   y1   15
    x2   y2   20
    x3   y3   30

NOTE: The above expression cannot currently be evaluated in the tools; try to trace it on your own.

    Choice   Comment
    A        1 row, 5 columns
    B        5 rows, 5 columns
    C        1 row, 3 columns
    D        5 rows, 3 columns


Complex Expressions

Motivation: Question

Question
Find all managers (with their offices) of projects that started in 2020 or later, where at least one member of the project team has to work 30 hours or more on that project per week.


Motivation: Operator Tree

Possible Solution #1
    M  := ρ[manager ← name](Managers)
    P  := σ[start_year ≥ 2020](Projects)
    T  := σ[hours ≥ 30](Teams)
    Q1 := π[name, manager, office](P ⋈ M)
    Q2 := T ⋈[pname = name] Q1
    Q  := π[name, manager, office](Q2)

One Line?
    π[name, manager, office](
        σ[hours ≥ 30](Teams)
        ⋈[pname = name]
        π[name, manager, office](
            σ[start_year ≥ 2020](Projects)
            ⋈
            ρ[manager ← name](Managers)
        )
    )


Motivation: Operator Tree

Possible Solution #2
    M  := ρ[manager ← name](Managers)
    P  := σ[start_year ≥ 2020](Projects)
    T  := σ[hours ≥ 30](Teams)
    Q1 := T ⋈[pname = name] P
    Q2 := Q1 ⋈ M
    Q  := π[name, manager, office](Q2)

One Line?
    π[name, manager, office](
        (
            σ[hours ≥ 30](Teams)
            ⋈[pname = name]
            σ[start_year ≥ 2020](Projects)
        )
        ⋈
        ρ[manager ← name](Managers)
    )


Equivalence: Equivalent/Isomorphic

Definition
We say that two relational algebra expressions Q1 and Q2 are equivalent (denoted by Q1 ≡ Q2) if for any input relations, both produce the same result with
    the same column order
    possibly different row order

We say that two relational algebra expressions Q1 and Q2 are isomorphic (denoted by Q1 ≅ Q2) if for any input relations, both produce the same result with
    possibly different column order
    possibly different row order

Example
    Equivalent                      Isomorphic
    a  b  c        a  b  c        c  b  a
    3  4  5        1  2  3        4  3  2
    2  3  4   ≡    2  3  4   ≅    5  4  3
    1  2  3        3  4  5        3  2  1
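The difference between the two notions can be checked mechanically on concrete results. The sketch below compares result relations only, whereas the definitions quantify over all possible inputs. It assumes the `(attrs, rows)` model with rows held in a list so that row order is visible, and it uses the three example tables from the definition slide.

```python
def equivalent(r, s):
    """Same columns in the same order; row order is ignored."""
    return r[0] == s[0] and set(r[1]) == set(s[1])


def isomorphic(r, s):
    """Same columns in any order; row order is ignored."""
    if sorted(r[0]) != sorted(s[0]):
        return False

    def as_named_rows(rel):
        # Each row becomes a set of (attribute, value) pairs, which makes
        # the comparison independent of column order.
        return {frozenset(zip(rel[0], row)) for row in rel[1]}

    return as_named_rows(r) == as_named_rows(s)


left = (("a", "b", "c"), [(3, 4, 5), (2, 3, 4), (1, 2, 3)])
middle = (("a", "b", "c"), [(1, 2, 3), (2, 3, 4), (3, 4, 5)])
right = (("c", "b", "a"), [(4, 3, 2), (5, 4, 3), (3, 2, 1)])

print(equivalent(left, middle))   # True: only the row order differs
print(equivalent(middle, right))  # False: the column order differs
print(isomorphic(middle, right))  # True
```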


Equivalence: Properties

Selection
    σ[θ1](σ[θ2](R)) ≡ σ[θ2](σ[θ1](R))
    σ[θ1](σ[θ2](R)) ≡ σ[θ1 ∧ θ2](R)

Projection
    π[ℓ1](π[ℓ2](R)) ≢ π[ℓ1](R)   unless ℓ1 ⊆ ℓ2

Cross Products and Joins
    R × S ≅ S × R                   different column order
    R ⋈ S ≅ S ⋈ R                   different column order
    R × (S × T) ≡ (R × S) × T       associative
    R ⋈ (S ⋈ T) ≅ (R ⋈ S) ⋈ T       different column order
    (R ⋈[θ1] S) ⋈[θ2] T ≢ R ⋈[θ1] (S ⋈[θ2] T)   if θ1 uses Attr(T) or θ2 uses Attr(R)

Combined
    π[ℓ](σ[θ](R)) ≢ σ[θ](π[ℓ](R))   unless θ uses only attributes in ℓ
    σ[θ](R × S) ≢ σ[θ](R) × S       unless θ uses only Attr(R)

Caution
Note that in the current version of the relational algebra tool, we have a different column order for natural join. In the current implementation, you will see that
    R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T
In our definition of natural join, it is only isomorphic.


Alternatives: Invalid

Missing Attributes
    σ[pizza = 'Margherita'](π[rname, price](Sells))
    π[pizza](ρ[name ← pizza](Sells))

Incompatible Attributes
    σ[price = 'Margherita'](Sells)


Alternatives: Redundant

Cross Product + Attribute Selection ≡ Inner Join
      σ[area = rarea](Customers × (ρ[rarea ← area](Restaurants)))
    ≡ Customers ⋈[area = rarea] ρ[rarea ← area](Restaurants)

Unnecessary Operators
      π[rname](π[rname, price](Sells))
    ≡ π[rname](Sells)

Unoptimized Query
      σ[price ≥ 15](Sells ⋈[name = rname] ρ[name ← rname](Restaurants))
    ≡ σ[price ≥ 15](Sells) ⋈[name = rname] ρ[name ← rname](Restaurants)

Quiz #5
Can you rewrite the queries using natural join instead?
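The "Unoptimized Query" rewrite can be tried out on a few rows. The sketch below assumes relations are lists of dicts, takes the restaurants as already renamed (ρ[name ← rname] applied), and counts row pairs as a crude stand-in for cost. All three are choices made for this example and not how a real optimizer estimates cost.

```python
def select(cond, r):
    return [row for row in r if cond(row)]


def theta_join(cond, r, s):
    """Nested-loop θ-join over lists of dicts."""
    return [{**x, **y} for x in r for y in s if cond({**x, **y})]


sells = [
    {"rname": "Bella Italia", "pizza": "Four Cheese", "price": 15},
    {"rname": "Bella Italia", "pizza": "Mushroom", "price": 16},
    {"rname": "Bella Italia", "pizza": "Margherita", "price": 10},
    {"rname": "Spice Palace", "pizza": "Veggie", "price": 13},
]
# Restaurants after ρ[name ← rname].
restaurants = [
    {"name": "Bella Italia", "area": "New York City"},
    {"name": "Spice Palace", "area": "London"},
]

same_name = lambda row: row["name"] == row["rname"]
pricey = lambda row: row["price"] >= 15

# Selection after the join: every Sells row is paired with every restaurant.
late = select(pricey, theta_join(same_name, sells, restaurants))
pairs_late = len(sells) * len(restaurants)

# Selection pushed below the join: only the pricey rows reach the join.
filtered = select(pricey, sells)
early = theta_join(same_name, filtered, restaurants)
pairs_early = len(filtered) * len(restaurants)

print(late == early, pairs_late, pairs_early)  # True 8 4
```

The two expressions return the same rows, but the second one halves the number of row pairs the join has to examine on this input.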


Summary

CS2102: Database Systems -- Adi Yoga Sidi Prabawa 54 / 66


Summary
Algebra: Basic

What Is It?
A formal method for querying relational data
The closure property lets us construct arbitrarily complex relational expressions
The basis for database query languages such as SQL

Operators
Unary   selection (σ), projection (π), renaming (ρ)
Binary  union (∪), intersection (∩), difference (−), product (×), inner join (⋈), outer join (⟕, ⟖, ⟗)

All other operators (except "aggregate" and "sorting", which are not discussed here) can be expressed using
σ π ρ ∪ ∩ − ×
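As a small illustration of expressing one operator through others, the sketch below builds an inner join from σ and ×, and also shows that intersection can be written with difference alone. It assumes relations are modeled as Python sets of tuples with positional attributes; the data and names are hypothetical.

```python
def product(r, s):
    """R × S on positional tuples."""
    return {a + b for a in r for b in s}

def select(pred, rel):
    """σ[pred](rel)."""
    return {t for t in rel if pred(t)}

def inner_join(pred, r, s):
    """R ⋈[θ] S expressed as σ[θ](R × S)."""
    return select(pred, product(r, s))

# Hypothetical sample data.
R = {(1, "a"), (2, "b")}
S = {(2, "x"), (3, "y")}

# Join on: first column of R equals first column of S (index 2 after the product).
joined = inner_join(lambda t: t[0] == t[2], R, S)

# Intersection written with difference only: A ∩ B = A − (A − B).
A = {(1,), (2,), (3,)}
B = {(2,), (3,), (4,)}
intersection = A - (A - B)
```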



Summary
Algebra: SQL Query

SELECT Statement

SQL Query
SELECT DISTINCT a_1, a_2, ..., a_m
FROM r_1, r_2, ..., r_n
WHERE θ;

Relational Algebra
π[a1, a2, ..., am](σ[θ](r1 × r2 × ... × rn))

A query is processed in stages:
SQL Query
⇓ SQL parser & validator
Relational Algebra
⇓ Query optimizer
Query Plan
⇓ Code generator
Executable Code
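The correspondence between the SELECT statement and π[...](σ[θ](r1 × ... × rn)) can be traced with a naive evaluator. This is only a sketch of the unoptimized expression, not how a real query engine works; it assumes relations are lists of dicts with attribute names that are distinct across relations, and the table contents are hypothetical.

```python
from itertools import product as cartesian

def select_distinct(attrs, relations, theta):
    """Evaluate π[attrs](σ[theta](r1 × r2 × ... × rn)) naively."""
    rows = []
    for combo in cartesian(*relations):      # r1 × r2 × ... × rn
        merged = {}
        for t in combo:
            merged.update(t)                 # assumes distinct attribute names
        if theta(merged):                    # σ[θ]
            row = tuple(merged[a] for a in attrs)
            if row not in rows:              # π with DISTINCT
                rows.append(row)
    return rows

# Hypothetical sample data.
customers = [{"cname": "Ann", "area": "East"}, {"cname": "Bob", "area": "West"}]
restaurants = [{"rname": "A", "rarea": "East"}, {"rname": "B", "rarea": "East"}]

# SELECT DISTINCT cname, rname FROM customers, restaurants WHERE area = rarea;
result = select_distinct(
    ["cname", "rname"],
    [customers, restaurants],
    lambda t: t["area"] == t["rarea"],
)
```

With this data only Ann shares an area with a restaurant, so `result` pairs her with both restaurants.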



Summary
Unary

Selection    Projection    Renaming



Summary
Set Operations

Operation   Description
R ∪ S       A relation containing all tuples that are in R or in S
R ∩ S       A relation containing all tuples that are in both R and S
R − S       A relation containing all tuples that are in R but not in S (this is an asymmetric operation)

The two relations must be union-compatible.
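The three set operations map directly onto Python's built-in set type. In the sketch below a relation is a pair (schema, rows); union-compatibility is assumed here to mean that the two schemas list the same column types in the same order, which may be stricter or looser than the course definition. The data is hypothetical.

```python
# A relation is (schema, rows): schema is a tuple of column types.
R = ((str, int), {("Margherita", 12), ("Supreme", 17)})
S = ((str, int), {("Supreme", 17), ("Hawaiian", 14)})

def set_op(op, r, s):
    """Apply a set operation after checking the assumed compatibility rule."""
    (r_schema, r_rows), (s_schema, s_rows) = r, s
    if r_schema != s_schema:
        raise ValueError("relations are not union-compatible")
    return (r_schema, op(r_rows, s_rows))

union = set_op(set.union, R, S)
intersection = set_op(set.intersection, R, S)
r_minus_s = set_op(set.difference, R, S)
s_minus_r = set_op(set.difference, S, R)   # difference is asymmetric
```

Because the result carries the same schema as its inputs, it can be fed into further operations, which is the closure property at work.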



Summary
Joins

Operation    Dangling tuples kept from
L ⋈[θ] R     neither relation
L ⟕[θ] R     L
L ⟖[θ] R     R
L ⟗[θ] R     both L and R

The visualization has a different meaning here: it shows which relation the dangling tuples are kept from.
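The four joins differ only in which dangling tuples they keep. The sketch below makes that explicit with two flags; it assumes non-empty relations stored as lists of dicts with distinct attribute names, and uses `None` to stand for NULL. The function name and data are hypothetical.

```python
def join(pred, left, right, keep_left=False, keep_right=False):
    """Inner join by default; the flags keep dangling tuples padded with None."""
    l_null = dict.fromkeys(left[0])     # all-NULL tuple over Attr(L)
    r_null = dict.fromkeys(right[0])    # all-NULL tuple over Attr(R)
    out = [{**l, **r} for l in left for r in right if pred(l, r)]
    if keep_left:
        out += [{**l, **r_null} for l in left
                if not any(pred(l, r) for r in right)]
    if keep_right:
        out += [{**l_null, **r} for r in right
                if not any(pred(l, r) for l in left)]
    return out

# Hypothetical sample data: x = 1 dangles in L, a = 3 dangles in R.
L = [{"x": 1}, {"x": 2}]
R = [{"a": 2}, {"a": 3}]
same = lambda l, r: l["x"] == r["a"]

inner = join(same, L, R)
left_outer = join(same, L, R, keep_left=True)
right_outer = join(same, L, R, keep_right=True)
full_outer = join(same, L, R, keep_left=True, keep_right=True)
```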



postgres=# exit
Press any key to continue . . .



Solutions
Quiz #1

Question
Think of some common programming language such as Python. Are the two operations equivalent?

Solution
This actually depends on the types in Python, because we can multiply a string by an integer.
YES if a : INT, b : INT, c : INT
NO if a : INT, b : TEXT, c : TEXT



Solutions
Quiz #2

Question
Which of the following relational algebra expressions results in the output below?

price   pizza
13      BBQ Chicken
15      Supreme
13      Four Cheese
16      Mushroom
:       :

Choice  Expression                                          Comment
A       σ[rname = 'Sizzle Grill'](π[price,pizza](Sells))    NO: attribute rname does not exist after projection
B       π[price,pizza](σ[rname = 'Sizzle Grill'](Sells))    YES: correct column order and correct condition
C       σ[rname = 'Sizzle Grill'](Sells)                    NO: too many columns
D       π[pizza,price](σ[rname = 'Sizzle Grill'](Sells))    NO: wrong column order



Solutions
Quiz #3

Question
Consider a system having outer join instead of semi join. Can you define semi join in terms of outer join?

Solution
The key here is the dangling tuples. A full outer join consists of (inner join) ∪ (left dangle) ∪ (right dangle). The right dangle has NULL in every Attr(R) column and would survive the steps below, so we start from the left outer join, which consists of (inner join) ∪ (left dangle) only.

After we project the columns to include only Attr(R), we are left with (semi join) ∪ (left dangle). So we need to remove the left dangle to get the semi join. However, this cannot be done after the projection, because (semi join) ∪ (left dangle) is just the original relation R.

So we remove the left dangle before the projection. This can be done by looking at {null(S)}: we construct a condition requiring that at least one attribute from Attr(S) is non-NULL. This assumes we can express ∃, or else use a series of disjunctions.

(R ⋉[θ] S) = π[Attr(R)](σ[∃ A ∈ Attr(S) : S.A ≢ NULL](R ⟕[θ] S))
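The construction can be checked against a direct definition of semi join. The sketch below uses a left outer join so that no right dangle appears; it assumes non-empty relations stored as lists of dicts with distinct attribute names, `None` for NULL, and that S contains no tuple made entirely of NULLs. All names and data are hypothetical.

```python
def left_outer_join(pred, r, s):
    """R ⟕[θ] S: matches plus left-dangling tuples padded with None."""
    s_null = dict.fromkeys(s[0])
    out = []
    for t in r:
        matches = [u for u in s if pred(t, u)]
        out += [{**t, **u} for u in (matches or [s_null])]
    return out

def semi_join_via_outer(pred, r, s):
    """π[Attr(R)](σ[some S attribute is non-NULL](R ⟕[θ] S))."""
    r_attrs, s_attrs = list(r[0]), list(s[0])
    joined = left_outer_join(pred, r, s)
    kept = [t for t in joined if any(t[a] is not None for a in s_attrs)]
    result = []
    for t in kept:
        row = {a: t[a] for a in r_attrs}
        if row not in result:            # projection removes duplicates
            result.append(row)
    return result

def semi_join_direct(pred, r, s):
    """R ⋉[θ] S: tuples of R that match at least one tuple of S."""
    return [t for t in r if any(pred(t, u) for u in s)]

# Hypothetical sample data: x = 2 matches twice, x = 1 and x = 3 dangle.
R = [{"x": 1, "y": "p"}, {"x": 2, "y": "q"}, {"x": 3, "y": "r"}]
S = [{"a": 2, "b": "u"}, {"a": 2, "b": "v"}, {"a": 4, "b": "w"}]
same = lambda t, u: t["x"] == u["a"]

via_outer = semi_join_via_outer(same, R, S)
direct = semi_join_direct(same, R, S)
```

The duplicate elimination in the projection matters: the tuple with x = 2 matches two tuples of S but must appear only once in the semi join.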



Solutions
Quiz #4

Question
Consider the relations below. How many rows and columns are in the result of the relational algebra expression that follows?

R1
x    y    z
x1   y1   10
x2   y1   5
x3   y1   15
x2   y2   20
x3   y3   30

R2
a    b
x2   b2
x3   b3

σ[b ≡ NULL](R1 ⟕[x = a] R2)

NOTE: The above expression cannot currently be evaluated in the tools; try to trace it on your own.

Choice  Answer              Comment
A       1 row, 5 columns    YES: this finds the dangling tuple
B       5 rows, 5 columns   NO: do not forget the selection
C       1 row, 3 columns    NO: there is no projection, so the result has 5 columns
D       5 rows, 3 columns   NO: there is no projection, so the result has 5 columns
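Since the expression cannot be run in the tools, the trace can be reproduced with a few lines of Python. This is a sketch that hard-codes the two relations from the question as tuples and uses `None` for NULL; the function name is hypothetical.

```python
R1 = [("x1", "y1", 10), ("x2", "y1", 5), ("x3", "y1", 15),
      ("x2", "y2", 20), ("x3", "y3", 30)]
R2 = [("x2", "b2"), ("x3", "b3")]

def left_outer_join(r1, r2):
    """R1 ⟕[x = a] R2 with columns (x, y, z, a, b)."""
    out = []
    for (x, y, z) in r1:
        matches = [(a, b) for (a, b) in r2 if x == a]
        if not matches:
            matches = [(None, None)]     # dangling tuple padded with NULLs
        out += [(x, y, z, a, b) for (a, b) in matches]
    return out

joined = left_outer_join(R1, R2)
result = [t for t in joined if t[4] is None]   # σ[b ≡ NULL]

rows = len(result)
columns = len(result[0])
```

The join alone yields five rows; the selection then keeps only the row for x1, which has no partner in R2.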



Solutions
Quiz #5

Question
Can you rewrite the queries using natural join instead?

Solution
σ[ area = rarea ](Customers × (ρ[rarea ← area](Restaurants)))
≅ Customers ⋈ Restaurants
This is isomorphic rather than equivalent: the natural join drops a column, and the dropped column (rarea) duplicates one that is kept (area).
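σ[ price ≥ 15 ]( Sells ⋈[name = rname] ρ[name ← rname](Restaurants))
≅ σ[ price ≥ 15 ]( Sells ) ⋈ Restaurants
Here too the natural join drops a duplicate column (name), so the two sides are isomorphic rather than strictly equivalent.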



(END)

