
 

Database Systems

Chapter 8
The Relational Algebra

 8.1 
Relation Schema

• A relation schema is used to describe a relation.
• It is denoted by S(A1, A2, A3, …, An), where
  • S is the name of the relation schema
  • A1, …, An are the attributes of S
• The degree of a relation is the number of attributes in its relation schema.

 8.2 
Relation Instance (1)

• A relation state (or relation instance) r of the relation schema S(A1, A2, …, An), denoted by r(S), is a set of n-tuples:
  r(S) = {t1, t2, …, tm}
• Each n-tuple ti is an ordered list of n values,
  ti = <v1, v2, …, vn>,
  where each value vi, 1 ≤ i ≤ n, is an element of dom(Ai) or a special null value.
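
The definition above can be mirrored in a few lines of Python. This is only a sketch under assumptions: the `Relation` type, the `make_relation` helper, and the small Student data are hypothetical names chosen for illustration. A schema is an ordered tuple of attribute names, and a relation state is a set of n-tuples, so a repeated row collapses into one.

```python
from typing import NamedTuple

class Relation(NamedTuple):
    """A relation: an ordered attribute list plus a set of n-tuples."""
    attrs: tuple      # schema attributes (A1, ..., An)
    rows: frozenset   # relation state r(S): a set of n-tuples

def make_relation(attrs, rows):
    """Build a relation, checking that every tuple has one value per attribute."""
    rows = frozenset(tuple(r) for r in rows)
    if any(len(r) != len(attrs) for r in rows):
        raise ValueError("each tuple needs exactly one value per attribute")
    return Relation(tuple(attrs), rows)

# Hypothetical data; the last row repeats the first on purpose.
student = make_relation(
    ("SID", "Name", "GPA", "Major"),
    [(456, "John", 3.4, "CS"), (457, "Carl", 3.2, "CS"),
     (678, "Ken", 3.5, "Math"), (456, "John", 3.4, "CS")],
)
degree = len(student.attrs)       # number of attributes
cardinality = len(student.rows)   # number of distinct tuples
```

Because the state is a set, the duplicate row is stored once. Python's `None` could stand in for the special null value.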

 8.3 
Basic Relational Algebra Operations

• Relational algebra is a collection of operators that are used to manipulate entire relations.
• The result of each operation is a new relation, which can be further manipulated.
• These operations enable a user to specify basic retrieval requests (or queries).

 8.4 
Brief History of the Origins of Algebra
• Muhammad ibn Musa al-Khwarizmi (c. 780 to c. 850 CE), a Persian scholar who worked in Baghdad, wrote a book whose title contains the word al-jabr, about arithmetic of variables.
  • The book was translated into Latin.
  • Its title (al-jabr) gave algebra its name.
• Al-Khwarizmi called variables "shay".
  • "Shay" is Arabic for "thing".
  • According to a popular account, Spanish scholars transliterated "shay" as "xay" ("x" was pronounced "sh" in Spain).
  • In time this word was abbreviated as x.
• Where does the word algorithm come from?
  • Algorithm originates from "al-Khwarizmi".
• Reference: PBS (http://www.pbs.org/empires/islam/innoalgebra.html)

 8.5 
Relational Algebra Overview
• Relational algebra consists of several groups of operations:
  • Unary relational operations
    • SELECT (symbol: σ (sigma))
    • PROJECT (symbol: π (pi))
    • RENAME (symbol: ρ (rho))
  • Relational algebra operations from set theory
    • UNION (∪), INTERSECTION (∩), DIFFERENCE (or MINUS, −)
    • CARTESIAN PRODUCT (×)
  • Binary relational operations
    • JOIN (several variations of JOIN exist)
    • DIVISION
  • Additional relational operations
    • OUTER JOINS, OUTER UNION
    • AGGREGATE FUNCTIONS (these compute summaries of information, for example SUM, COUNT, AVG, MIN, MAX)

 8.6 
 Database State for COMPANY 
 All examples discussed below refer to the COMPANY database
shown here.

 8.7 
SELECT Operation

• σ: select (sigma)
• Format: σ_{selection-condition}(R)
• Semantics: returns all tuples of relation R that satisfy the selection condition.
• The select operation is unary: it applies to a single relation.
• It is used to select the subset of the tuples in a relation that satisfy a selection condition.

 8.8 
Formats of Selection Conditions

• A op v
  • A is an attribute
  • op is a comparison operator (=, ≠, <, ≤, >, ≥)
  • v is a constant
  • Examples: Age ≤ 20, Name = 'Bill'
• A op B
  • A and B are two attributes in R
  • Example: for Person(SSN, Name, Birthplace, Residence), Birthplace = Residence
• Compound conditions connected by and, or, or not
  • Example: Age ≤ 20 and Birthplace = Residence

 8.9 
An Example of SELECT (1)
• Find all students who are 20 years old or younger and whose birthplace is the same as their residence:
  σ_{Age ≤ 20 and Birthplace = Residence}(Student)
• Select the employee tuples whose department number is 4:
  σ_{DNO = 4}(Employee)
• Select the employee tuples whose salary is greater than $30,000:
  σ_{SALARY > 30000}(Employee)

 8.10 
An Example of SELECT (2)

Employee
SSN      Name   Salary
1234545  John   200000
5423341  Smith  600000
4352342  Fred   500000

σ_{Salary > 400000}(Employee)

SSN      Name   Salary
5423341  Smith  600000
4352342  Fred   500000
 8.11 
An Example of SELECT (2)

σ_{Major = 'CS'}(Student)

Student
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

Result
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS

 8.12 
 
An Example of Select (3)

Student
SSN        Name   Age  GPA  Birthplace  Residence
123456789  John   20   3.2  Vestal      Vestal
234567891  Mary   18   2.9  Binghamton  Vestal
345678912  Bill   19   2.7  Endwell     Endwell
456789123  Nancy  24   3.6  Binghamton  NYC

σ_{Age ≤ 20 and Birthplace = Residence}(Student)

SSN        Name   Age  GPA  Birthplace  Residence
123456789  John   20   3.2  Vestal      Vestal
345678912  Bill   19   2.7  Endwell     Endwell
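
The example above can be reproduced with a short Python sketch. The `select` helper is a hypothetical name, and representing each tuple as a dictionary is an assumption made for readability. The selection condition becomes an ordinary Boolean function applied to every tuple.

```python
def select(rows, predicate):
    """sigma: keep exactly the tuples that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

COLUMNS = ("SSN", "Name", "Age", "GPA", "Birthplace", "Residence")
student = [dict(zip(COLUMNS, values)) for values in [
    ("123456789", "John", 20, 3.2, "Vestal", "Vestal"),
    ("234567891", "Mary", 18, 2.9, "Binghamton", "Vestal"),
    ("345678912", "Bill", 19, 2.7, "Endwell", "Endwell"),
    ("456789123", "Nancy", 24, 3.6, "Binghamton", "NYC"),
]]

# Age <= 20 and Birthplace = Residence
result = select(
    student,
    lambda t: t["Age"] <= 20 and t["Birthplace"] == t["Residence"],
)
names = [t["Name"] for t in result]
```

Here `names` holds John and Bill, matching the result table; the selected tuples keep all of their attributes.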

 8.13 
SELECT Operation

• Commutativity of select:
  σ_{condition-1}(σ_{condition-2}(R))
  = σ_{condition-2}(σ_{condition-1}(R))
  = σ_{condition-1 and condition-2}(R)
• Example: the following two expressions are equivalent:
  σ_{city = "Irbid" AND GPA > 65}(Student)
  σ_{city = "Irbid"}(σ_{GPA > 65}(Student))

 8.14 
PROJECT Operation

• π: project (pi)
• Format: π_{attribute-list}(R)
  • where attribute-list is a subset of all attributes in R
• Semantics:
  • Returns all tuples of relation R, but for each tuple only the values under attribute-list are returned.
• Project removes duplicate tuples automatically.

 8.15 
Project (2)

• Selects certain columns from the table and discards the other columns:
  π_{<attribute-list>}(<relation-name>)
• The degree of the resulting relation is equal to the number of attributes in the <attribute-list>.
• The number of tuples in the result of project is less than or equal to the number of tuples in the original relation, because duplicates are removed.
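
A minimal sketch of projection with duplicate elimination, assuming tuples are stored as dictionaries; the `project` helper and the sample rows are illustrative, not part of the original slides.

```python
def project(rows, attrs):
    """pi: keep only the listed attributes and drop duplicate tuples."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:   # a relation is a set, so repeats are dropped
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

student = [
    {"SSN": "123456789", "Name": "John", "Age": 20, "GPA": 3.2},
    {"SSN": "234567891", "Name": "Mary", "Age": 18, "GPA": 2.9},
    {"SSN": "345678912", "Name": "John", "Age": 19, "GPA": 3.2},
]
result = project(student, ["Name", "GPA"])
```

Two of the three input tuples agree on (Name, GPA), so the result has two tuples of degree 2.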

 8.16 
 
Project (3)

Example: Find the name and GPA of all students.
π_{Name, GPA}(Student)

Input relation: Student
SSN        Name  Age  GPA
123456789  John  20   3.2
234567891  Mary  18   2.9
345678912  John  19   3.2

Output relation
Name  GPA
John  3.2
Mary  2.9

 8.17 
Projection (4)

π_{Major}(Student)

Student
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

Result
Major
CS
Math

 8.18 
Project (5)

• If attribute-list-1 ⊆ attribute-list-2, then
  π_{attribute-list-1}(π_{attribute-list-2}(R)) = π_{attribute-list-1}(R)
• The project operation is not commutative.
• Retrieve the student numbers and names of all students who live in Amman:
  π_{STNO, ST-Name}(σ_{City = "Amman"}(Student))

 8.19 
Project (6)

• In complex queries it becomes necessary to store intermediate results, so we should know how to give names to relations and attributes.
  Amman-Students = σ_{city = "Amman"}(Student)
  Result = π_{STNO, ST-Name}(Amman-Students)
  or, renaming the attributes,
  Result(Number, Name) = π_{STNO, ST-Name}(Amman-Students)

 8.20 
Select and Project

• Find the name and GPA of all students who are 20 years old or younger and whose birthplace and residence are the same:
  π_{Name, GPA}(σ_{Age ≤ 20 and Birthplace = Residence}(Student))
• Find the name and GPA of all CS students:
  π_{Name, GPA}(σ_{Major = 'CS'}(Student))

 8.21 
RENAME Operation

• ρ: rename (rho)
• Format: ρ_S(R)
• Semantics: make a copy of relation R and name the copy S.
  • ρ_S(R): rename R only
  • ρ_{S(B1, B2, …, Bn)}(R): rename R and its attributes
  • ρ_{(B1, B2, …, Bn)}(R): rename the attributes only

 8.22 
Renaming

ρ_{CS_Student}(σ_{Major = 'CS'}(Student))

Student
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

CS_Student
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
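
The three forms of rename can be sketched in Python as follows. The representation of a relation as a `(name, attributes, rows)` triple and the `rename` helper are assumptions made for this illustration only.

```python
def rename(relation, new_name=None, new_attrs=None):
    """rho: copy a relation, giving it a new name and/or new attribute names."""
    name, attrs, rows = relation
    if new_attrs is not None:
        if len(new_attrs) != len(attrs):
            raise ValueError("rename must supply one name per attribute")
        attrs = tuple(new_attrs)
    return (new_name if new_name is not None else name, attrs, set(rows))

student = (
    "Student",
    ("SID", "Name", "GPA", "Major"),
    {(456, "John", 3.4, "CS"), (457, "Carl", 3.2, "CS"), (678, "Ken", 3.5, "Math")},
)

# Select the CS majors, then rename the result to CS_Student.
cs_rows = {t for t in student[2] if t[3] == "CS"}
cs_student = rename(("Student", student[1], cs_rows), new_name="CS_Student")

# Rename the attributes only; the relation name stays "Student".
renamed_attrs = rename(student, new_attrs=("ID", "StudentName", "GPA", "Major"))
```

Renaming never changes the tuples themselves, only the labels attached to the relation and its columns.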

 8.23 
Set Theoretic Operations
• UNION, INTERSECTION, DIFFERENCE
  • These are binary operations (applied to two relations at a time).
• To apply any of these operators, the relations must be union-compatible.
• Two relations R(A1, A2, …, An) and S(B1, B2, …, Bm) are said to be union-compatible if
  • they have the same degree (n = m), and
  • dom(Ai) = dom(Bi) for 1 ≤ i ≤ n.
• In other words, R and S have the same number of attributes and the corresponding attributes have the same domain.

 8.24 
Union (1)
• ∪: union
• Format: R ∪ S
• Semantics: returns all tuples that belong to either R or S.
• Formally: R ∪ S = {t | t ∈ r(R) or t ∈ r(S)}
• Condition of union: R and S must be union-compatible.
• The union operator removes duplicate tuples automatically.
 8.25 
Union (2)

Example:

R            S            R ∪ S
A   B   C    A   B   C    A   B   C
a1  b1  c1   a0  b0  c0   a1  b1  c1
a2  b2  c2   a1  b1  c1   a2  b2  c2
a3  b3  c3   a2  b2  c2   a3  b3  c3
a4  b4  c4                a0  b0  c0
                          a4  b4  c4

 8.26 
Set Difference (1)

• −: set difference
• Format: R − S
• Semantics: returns all tuples that belong to R but not to S.
• Formally: R − S = {t | t ∈ r(R) and t ∉ r(S)}
• Set difference also requires union compatibility between R and S.

 8.27 
 
Set Difference (2)

Example:

R            S            R − S
A   B   C    A   B   C    A   B   C
a1  b1  c1   a0  b0  c0   a3  b3  c3
a2  b2  c2   a1  b1  c1   a4  b4  c4
a3  b3  c3   a2  b2  c2
a4  b4  c4

 8.28 
INTERSECTION

• ∩: set intersection
• Format: R ∩ S
• Semantics: returns all tuples that belong to both R and S.
• Formally: R ∩ S = {t | t ∈ r(R) and t ∈ r(S)}
• Derivation from existing operators:
  R ∩ S = R − (R − S) = S − (S − R)

 8.29 
INTERSECTION

• Union and intersection are commutative operations:
  • R ∪ S = S ∪ R and
  • R ∩ S = S ∩ R
• Union and intersection can be applied to any number of relations, and both are associative:
  • R ∪ (S ∪ Q) = (R ∪ S) ∪ Q
  • R ∩ (S ∩ Q) = (R ∩ S) ∩ Q
• The difference operator is not commutative:
  • R − S ≠ S − R in general.

 8.30 
Examples of Set Operations

TAs
SID  Name  GPA   Major
456  John  3.4   CS
457  Carl  3.2   CS
678  Ken   3.5   Math

RAs
SID  Name  GPA   Major
456  John  3.4   CS
223  Bob   2.95  Ed

TAs ∪ RAs
SID  Name  GPA   Major
456  John  3.4   CS
457  Carl  3.2   CS
678  Ken   3.5   Math
223  Bob   2.95  Ed

TAs ∩ RAs
SID  Name  GPA   Major
456  John  3.4   CS

TAs − RAs
SID  Name  GPA   Major
457  Carl  3.2   CS
678  Ken   3.5   Math
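
Because relations are sets of tuples, Python's built-in set type gives a direct, if simplified, model of these operators. The sketch below assumes union compatibility is already satisfied (both relations share the schema SID, Name, GPA, Major) and also checks the derivation of intersection from difference.

```python
# Both relations use the schema (SID, Name, GPA, Major).
tas = {(456, "John", 3.4, "CS"), (457, "Carl", 3.2, "CS"), (678, "Ken", 3.5, "Math")}
ras = {(456, "John", 3.4, "CS"), (223, "Bob", 2.95, "Ed")}

union = tas | ras          # tuples in either relation, duplicates removed
intersection = tas & ras   # tuples in both relations
difference = tas - ras     # tuples in TAs but not in RAs

# Intersection derived from difference: R ∩ S = R − (R − S) = S − (S − R)
derived_from_r = tas - (tas - ras)
derived_from_s = ras - (ras - tas)
```

John appears in both inputs but only once in the union, which therefore has four tuples rather than five.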

 8.31 
Cartesian Product (1)

• ×: Cartesian product
• Format: R × S
• Semantics: returns every tuple that can be formed by concatenating a tuple in R with a tuple in S.
• It is a binary operation, but the relations to which it is applied do not have to be union-compatible.

 8.32 
 
Cartesian Product (2)

Example:

R                S
A   B   C        B   D   E
a1  b1  c1       b1  d1  e1
a2  b2  c2       b2  d2  e2
a3  b3  c3

R × S
A   R.B  C   S.B  D   E
a1  b1   c1  b1   d1  e1
a1  b1   c1  b2   d2  e2
a2  b2   c2  b1   d1  e1
a2  b2   c2  b2   d2  e2
a3  b3   c3  b1   d1  e1
a3  b3   c3  b2   d2  e2

 8.33 
Example of Cross Product

Student
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

Award
SID  Amount  Year
456  1500    1998
678  3000    2000

Student × Award
Student.SID  Name  GPA  Major  Award.SID  Amount  Year
456          John  3.4  CS     456        1500    1998
456          John  3.4  CS     678        3000    2000
457          Carl  3.2  CS     456        1500    1998
457          Carl  3.2  CS     678        3000    2000
678          Ken   3.5  Math   456        1500    1998
678          Ken   3.5  Math   678        3000    2000
 8.34 
Cartesian Product (3)

• If R and S have common attributes, then the full names of these attributes must be used.
  • Example: use R.A instead of A.
• To prevent identical attribute names from occurring in the same relation schema, R × R is not allowed.
  • However, R × ρ_S(R) is allowed.
• Commutativity: R × S = S × R

 8.35 
Cartesian Product (4)

• Given R(A1, A2, …, An) and S(B1, B2, …, Bm):
  • R × S = Q(A1, A2, …, An, B1, B2, …, Bm)
  • degree of Q = n + m
• If R has N tuples and S has M tuples, then
  • R × S has N*M tuples.
• Cartesian product is extremely expensive.
  • If R and S are both large, then each relation may need to be scanned many times to perform the Cartesian product.
  • Writing out the result can be very expensive due to the large size of the result.
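
The degree and size claims can be checked with a small Python sketch. The `cartesian_product` helper and its parameter names are hypothetical; attributes that appear in both schemas are qualified with the relation name, as the preceding slides require.

```python
from itertools import product

def cartesian_product(r_attrs, r_rows, s_attrs, s_rows, r_name="R", s_name="S"):
    """R x S: concatenate every tuple of R with every tuple of S."""
    common = set(r_attrs) & set(s_attrs)
    # Qualify attribute names that occur in both schemas (e.g. R.B and S.B).
    attrs = [f"{r_name}.{a}" if a in common else a for a in r_attrs]
    attrs += [f"{s_name}.{a}" if a in common else a for a in s_attrs]
    rows = [r + s for r, s in product(r_rows, s_rows)]
    return attrs, rows

r_attrs = ["A", "B", "C"]
r_rows = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3")]
s_attrs = ["B", "D", "E"]
s_rows = [("b1", "d1", "e1"), ("b2", "d2", "e2")]

attrs, rows = cartesian_product(r_attrs, r_rows, s_attrs, s_rows)
```

With N = 3 and M = 2 the result has 6 tuples of degree 3 + 3 = 6; the nested iteration is why the cost grows with N*M.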
 8.36 
Example

Retrieve for each female employee a list of the names of her dependents:

FEMALE-EMPS = σ_{Sex = "F"}(EMPLOYEE)
EMP-NAMES = π_{Fname, Lname, SSN}(FEMALE-EMPS)
EMP-DEPENDENTS = EMP-NAMES × DEPENDENT
ACTUAL-DEP = σ_{SSN = ESSN}(EMP-DEPENDENTS)
RESULT = π_{Fname, Lname, Dependent-name}(ACTUAL-DEP)

 8.37 
 
Relational Algebra Example (1)

Example: Find the name of each employee and of their manager.

Employee
SSN        Name  Age  Dept-Name
123456789  John  34   Sales
234567891  Mary  42   Service
345678912  Bill  39   null

Department
Name       Location  Manager
Sales      XYZ       Bill
Inventory  YZX       Charles
Service    ZXY       Maria

 8.38 
Relational Algebra Example (2)

• A relational algebra expression is:
  π_{Employee.Name, Department.Manager}(σ_{Employee.Dept-Name = Department.Name}(Employee × Department))
• A simplified version (don't use the full name when you don't have to):
  π_{Employee.Name, Manager}(σ_{Dept-Name = Department.Name}(Employee × Department))

 8.39 
Relational Algebra Example (3)

• Use the assignment operator ( = ) to save an intermediate result into a temporary relation.
• Example: the expression
  π_{Employee.Name, Manager}(σ_{Dept-Name = Department.Name}(Employee × Department))
  is equivalent to the following series of expressions:
  TEMP1 = Employee × Department
  TEMP2 = σ_{Dept-Name = Department.Name}(TEMP1)
  RESULT = π_{Employee.Name, Manager}(TEMP2)

 8.40 
Join (1)

• ⋈: join
• Format: R ⋈_{join-condition} S
• Semantics: returns all tuples in R × S that satisfy the join condition.
• Derivation from existing operators:
  R ⋈_{join-condition} S = σ_{join-condition}(R × S)
• Format of the join condition:
  • R.A op S.B
  • R.A1 op S.B1 and R.A2 op S.B2 . . .
• Tuples whose join attributes are NULL do not appear in the result.
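
The derivation of join from selection and Cartesian product can be written out directly. In this sketch the `theta_join` and `eq` helpers, the attribute names, and the data are all hypothetical; the two relations are assumed to have disjoint attribute names so that merged dictionaries do not collide, and `None` plays the role of NULL.

```python
from itertools import product

def theta_join(r_rows, s_rows, condition):
    """R join S, computed literally as a selection over R x S."""
    return [{**r, **s} for r, s in product(r_rows, s_rows) if condition(r, s)]

def eq(a, b):
    """Equality that is never satisfied when either side is null (None)."""
    return a is not None and b is not None and a == b

employee = [
    {"Emp": "John", "DeptName": "Sales"},
    {"Emp": "Mary", "DeptName": "Service"},
    {"Emp": "Bill", "DeptName": None},   # null join attribute
]
department = [
    {"Dept": "Sales", "Location": "Binghamton"},
    {"Dept": "Inventory", "Location": "Endicott"},
    {"Dept": "Service", "Location": "Vestal"},
]

joined = theta_join(employee, department,
                    lambda e, d: eq(e["DeptName"], d["Dept"]))
pairs = [(t["Emp"], t["Location"]) for t in joined]
```

Bill has a null department and Inventory has no employees, so neither contributes a tuple to the join.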
 8.41 
 
Join (2)

Example: Find the names of all employees and their department locations.

Employee
SSN        Name  Age  Dept-Name
123456789  John  34   Sales
234567891  Mary  42   Service
345678912  Bill  39   null

Department
Name       Location    Manager
Sales      Binghamton  Bill
Inventory  Endicott    Charles
Service    Vestal      Maria

 8.42 
Join (3)

π_{Employee.Name, Location}(Employee ⋈_{Dept-Name = Department.Name} Department)

Result
Name  Location
John  Binghamton
Mary  Vestal

 8.43 
 
Join (4)

Example: Find the names of all employees who earn more than their manager.

Employee
SSN        Name  Salary  Manager-SSN
123456789  John  34k     234567891
234567891  Bill  40k     null
345678912  Mary  38k     null
456789123  Mike  41k     345678912

π_{Employee.Name}(Employee ⋈_{Employee.Manager-SSN = M.SSN and Employee.Salary > M.Salary} ρ_M(Employee))

 8.44 
Joins

• Theta join
  • Format: R ⋈_{join-condition} S
  • Returns the tuples in σ_{join-condition}(R × S).
• Equijoin
  • Same as theta join, except that the join condition contains only equalities.
• Natural join
  • Same as equijoin, except that the equality conditions are on common attributes and duplicate columns are eliminated.
 8.45 
Equijoin

• A join is called an equijoin if only the equality operator is used in all join conditions.

R        S        R ⋈_{R.B = S.B} S
A  B     B  C     A  R.B  S.B  C
a  b     b  c     a  b    b    c
d  b     c  d     d  b    b    c
b  c     a  d     b  c    c    d

• Most joins in practice are equijoins.

 8.46 
Natural Join (1)

• Definition: a join between R and S is a natural join if
  • there is an equality comparison between every pair of identically named attributes from the two relations, and
  • among each pair of identically named attributes from the two relations, only one remains in the result.
• Natural join is denoted by ⋈ with no join conditions explicitly specified.

 8.47 
Natural Join (2)

• Example:
  • R(A, B, C) ⋈ S(A, C, D)
  • has attributes (A, B, C, D) in the result.
• Question: how can natural join be expressed in terms of equijoin and other relational operators?
  R ⋈ S = π_{R.A, B, S.C, D}(R ⋈_{R.A = S.A and R.C = S.C} S)

 8.48 
Examples of Joins

Student
SID  Name  GPA  Age  Prof
456  John  3.4  29   123
457  Carl  3.2  35   123
678  Ken   3.5  25   154

Professor
PID  Pname  Age  Dept
123  John   35   CS
154  Scott  28   Math

• Theta join:
  Student ⋈_{Student.Age <= Professor.Age} Professor

Result
SID  Name  GPA  Student.Age  Prof  PID  Pname  Professor.Age  Dept
456  John  3.4  29           123   123  John   35             CS
457  Carl  3.2  35           123   123  John   35             CS
678  Ken   3.5  25           154   123  John   35             CS
678  Ken   3.5  25           154   154  Scott  28             Math

 8.49 
Examples of Joins (cont.)

• Equijoin:
  Student ⋈_{Prof = PID AND Name = Pname} Professor

Result
SID  Name  GPA  Student.Age  Prof  PID  Pname  Professor.Age  Dept
456  John  3.4  29           123   123  John   35             CS

• Natural join:
  Student ⋈ Professor

Result
SID  Name  GPA  Age  Prof  PID  Pname  Dept
457  Carl  3.2  35   123   123  John   CS
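
The natural join result above follows from matching on the only shared attribute name, Age. A rough Python sketch (the `natural_join` helper is a hypothetical name, and relations are passed as an attribute list plus a list of tuples) makes the two defining steps explicit: equate identically named attributes, then keep one copy of each.

```python
def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """Equijoin on all identically named attributes, keeping one copy of each."""
    common = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in common]
    out = []
    for r in r_rows:
        rd = dict(zip(r_attrs, r))
        for s in s_rows:
            sd = dict(zip(s_attrs, s))
            if all(rd[a] == sd[a] for a in common):
                out.append(r + tuple(sd[a] for a in s_only))
    return r_attrs + s_only, out

student_attrs = ["SID", "Name", "GPA", "Age", "Prof"]
student_rows = [(456, "John", 3.4, 29, 123), (457, "Carl", 3.2, 35, 123),
                (678, "Ken", 3.5, 25, 154)]
professor_attrs = ["PID", "Pname", "Age", "Dept"]
professor_rows = [(123, "John", 35, "CS"), (154, "Scott", 28, "Math")]

attrs, rows = natural_join(student_attrs, student_rows,
                           professor_attrs, professor_rows)
```

In this sketch, two relations with no attribute names in common would pass the `all(...)` test for every pair, so the function would return their full Cartesian product.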

 8.50 
Some Questions About Joins

• What is the result of R ⋈ S if they do not have a common attribute?
• What is the result of R ⋈ R?
• Consider the relations
  • Students(SSN, Name, GPA, Major, Age, PSSN)
  • Profs(PSSN, Name, Office, Age, Dept)
• Which type of join should be used to find pairs of names of students and their advisors?
• Can a natural join be used? How?

 8.51 
A Complete Set of Relational Algebra Operations
• The relational algebra is a set of expressions as defined below:
  • A relation is an expression.
  • If E1 and E2 are expressions, so are
    σ_P(E1), π_A(E1), ρ_S(E1),
    E1 ∪ E2, E1 − E2, E1 × E2
• That is, any expression that can be formed from base relations and the six relational operators is a relational algebra expression.

 8.52 
Aggregate Functions & Grouping

• Well-known aggregate functions:
  • SUM, AVERAGE, MAXIMUM, MINIMUM, COUNT
• Format (ℱ is "script F"):
  <grouping attributes> ℱ <function list> (R)
  • <grouping attributes> is a list of attributes in R
  • <function list> is a list of (<function> <attribute>) pairs
  • <function> is one of the aggregate functions
  • <attribute> is an attribute in R

 8.53 
Examples

• Retrieve each department number, the number of employees in the department, and their average salary:
  DNO ℱ COUNT(SSN), AVERAGE(SALARY) (Employee)

DNO  COUNT_SSN  AVERAGE_SALARY
1    1          55000
4    3          31000
5    4          33250

 8.54 
Examples

• Retrieve for each department number the number of employees in the department and their average salary, renaming the result attributes:
  ρ_{R(DNO, NO_OF_EMP, AVERAGE_SAL)}(DNO ℱ COUNT(SSN), AVERAGE(SALARY) (Employee))

DNO  NO_OF_EMP  AVERAGE_SAL
1    1          55000
4    3          31000
5    4          33250

 8.55 
Examples

• If no grouping attributes are specified, the functions are applied to the attribute values of all the tuples in the relation.
• Retrieve the number of employees and their average salary:
  ℱ COUNT(SSN), AVERAGE(SALARY) (Employee)

COUNT_SSN  AVERAGE_SALARY
8          35125
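
Grouping followed by aggregation can be sketched in Python as below. The `aggregate` helper is a hypothetical name, and the individual SSN and salary values are assumptions, chosen only so that the per-department counts and averages reproduce the tables above; the COMPANY data itself is not shown in this section.

```python
from collections import defaultdict

def aggregate(rows, group_by, functions):
    """Group rows by the grouping attributes, then apply each (function, attribute) pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in group_by)].append(row)
    out = []
    for key, members in groups.items():
        result = dict(zip(group_by, key))
        for name, (func, attr) in functions.items():
            result[name] = func([m[attr] for m in members])
        out.append(result)
    return out

def average(values):
    return sum(values) / len(values)

# Assumed salaries: (DNO, SALARY); SSNs are just running numbers.
data = [(1, 55000), (4, 25000), (4, 43000), (4, 25000),
        (5, 30000), (5, 40000), (5, 38000), (5, 25000)]
employee = [{"SSN": i, "DNO": d, "SALARY": s} for i, (d, s) in enumerate(data, 1)]

functions = {"COUNT_SSN": (len, "SSN"), "AVERAGE_SALARY": (average, "SALARY")}
by_dept = aggregate(employee, ["DNO"], functions)   # one row per department
overall = aggregate(employee, [], functions)        # no grouping attributes
```

With an empty grouping list every tuple falls into a single group, which corresponds to applying the functions to the whole relation.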

 8.56 
 
Outerjoin (1)

R            S            R ⋈ S
A   B   C    C   D   E    A   B   C   D   E
a1  b1  c1   c1  d1  e1   a1  b1  c1  d1  e1
a4  b3  c2   c6  d3  e2

• The second tuples of R and S are not present in the result (they are called dangling tuples).
• Some applications need to retain dangling tuples.
 8.57 
 
Outerjoin (2)

• ⋈_O: outer join
• Format: R ⋈_O S
• Semantics: like join, except that
  • it retains dangling tuples from both R and S, and
  • it uses null to fill in missing entries.

R ⋈_O S
A     B     C   D     E
a1    b1    c1  d1    e1
a4    b3    c2  null  null
null  null  c6  d3    e2
 8.58 
Left Outerjoin and Right Outerjoin

• ⋈_LO: left outer join
  • Format: R ⋈_LO S
  • Semantics: like outer join, but retains only the dangling tuples of the relation on the left.
• ⋈_RO: right outer join
  • Format: R ⋈_RO S
  • Semantics: like outer join, but retains only the dangling tuples of the relation on the right.

 8.59 
 
Left Outerjoin and Right Outerjoin (2)

R            S            R ⋈ S
A   B   C    C   D   E    A   B   C   D   E
a1  b1  c1   c1  d1  e1   a1  b1  c1  d1  e1
a4  b3  c2   c6  d3  e2

 8.60 
 
Left Outerjoin and Right Outerjoin (3)

R ⋈_LO S
A     B     C   D     E
a1    b1    c1  d1    e1
a4    b3    c2  null  null

R ⋈_RO S
A     B     C   D     E
a1    b1    c1  d1    e1
null  null  c6  d3    e2
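
The three outer join variants differ only in which dangling tuples they keep, which a single Python function with two flags can illustrate. This is a sketch under assumptions: `outer_join` and its parameters are hypothetical, the join is on one shared attribute, rows are dictionaries, and `None` stands for null.

```python
def outer_join(r_attrs, r_rows, s_attrs, s_rows, on,
               keep_left=True, keep_right=True):
    """Join on one shared attribute, padding retained dangling tuples with None."""
    r_extra = [a for a in r_attrs if a != on]
    s_extra = [a for a in s_attrs if a != on]
    out, matched = [], set()
    for r in r_rows:
        hits = [i for i, s in enumerate(s_rows) if s[on] == r[on]]
        for i in hits:
            matched.add(i)
            out.append({**r, **s_rows[i]})
        if not hits and keep_left:          # dangling tuple of R
            out.append({**r, **{a: None for a in s_extra}})
    if keep_right:
        for i, s in enumerate(s_rows):
            if i not in matched:            # dangling tuple of S
                out.append({**{a: None for a in r_extra}, **s})
    return out

R = [{"A": "a1", "B": "b1", "C": "c1"}, {"A": "a4", "B": "b3", "C": "c2"}]
S = [{"C": "c1", "D": "d1", "E": "e1"}, {"C": "c6", "D": "d3", "E": "e2"}]
args = (["A", "B", "C"], R, ["C", "D", "E"], S, "C")

full = outer_join(*args)
left = outer_join(*args, keep_right=False)
right = outer_join(*args, keep_left=False)
```

Setting both flags to `False` would reduce the function to an ordinary join on C.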

 8.61 
Examples of Queries in Relational Algebra (1)
• Many relational algebra queries can be expressed using selection, projection and join operators by following these steps:
• (1) Determine the relations needed to answer the query.
  • If R1, ..., Rn are all the relations needed,
  • P is the set of all conditions, and
  • T is the set of all (target) attributes to be output,
  • then form the initial query:
    π_T(σ_P(R1 × ... × Rn))

 8.62 
Relational Algebra Example (2)

• (2) If P contains a condition, say Ci, that involves only attributes in Ri,
  • replace Ri by σ_{Ci}(Ri) and remove Ci from P.
• (3) If P contains a condition, say C, that involves attributes from both Ri and Rj,
  • replace Ri × Rj by Ri ⋈_C Rj (or a natural join) and remove C from P.
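
The effect of steps (2) and (3) can be seen on a small hypothetical instance. The data below is invented for illustration; Student rows are (SSN, Name, DName) and Enrollment rows are (SSN, CNO). The sketch compares the initial query, which filters the full Cartesian product, with the rewritten one, which selects first and then joins.

```python
from itertools import product

student = [("111", "Ann", "CS"), ("222", "Bob", "Math"), ("333", "Cy", "CS")]
enrollment = [("111", "CIS328"), ("222", "CIS328"), ("333", "CIS101")]

# Step (1): project the targets from a selection over the full product.
initial = {(s[0], s[1]) for s, e in product(student, enrollment)
           if s[2] == "CS" and e[1] == "CIS328" and s[0] == e[0]}

# Step (2): push each single-relation condition down to its relation.
cs_students = [s for s in student if s[2] == "CS"]
cis328 = [e for e in enrollment if e[1] == "CIS328"]

# Step (3): turn the remaining two-relation condition into a join.
rewritten = {(s[0], s[1]) for s in cs_students for e in cis328 if s[0] == e[0]}

pairs_initial = len(student) * len(enrollment)      # tuple pairs examined before
pairs_rewritten = len(cs_students) * len(cis328)    # tuple pairs examined after
```

Both forms return the same answer, but the rewritten form compares fewer tuple pairs because the selections shrink the inputs before they are combined.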

 8.63 
Relational Algebra Example (3)

• Consider the following database schema:
  Student(SSN, Name, GPA, Age, DName)
  Enrollment(SSN, CNO, Grade)
  Course(CNO, Title, DName)
  Department(DName, Location, Phone)

 8.64 
Relational Algebra Example (5)

• Query: Find the SSNs and names of all students who are CS majors and who take CIS328.
• (1) The relations Student and Enrollment are needed.
  • T = {Student.SSN, Student.Name}
  • P = {Student.DName = 'CS', Enrollment.CNO = 'CIS328', Student.SSN = Enrollment.SSN}
• The initial relational algebra query is:
  π_{Student.SSN, Name}(σ_{DName = 'CS' and CNO = 'CIS328' and Student.SSN = Enrollment.SSN}(Student × Enrollment))

 8.65 
Relational Algebra Example (6)

• (2) Replace
  • Student by σ_{DName = 'CS'}(Student), and
  • Enrollment by σ_{CNO = 'CIS328'}(Enrollment).
• Remove the two conditions from the initial expression.

 8.66 
Relational Algebra Example (7)

• (3) Replace
  • Student × Enrollment by Student ⋈ Enrollment, and
  • remove Student.SSN = Enrollment.SSN from the initial expression.
• The final expression:
  π_{Student.SSN, Name}((σ_{DName = 'CS'}(Student)) ⋈ (σ_{CNO = 'CIS328'}(Enrollment)))

 8.67 
Relational Algebra Example (8)

• Query: Find the SSN and name of each student who is a CS major, together with the titles of the courses taken by the student.
  π_{Student.SSN, Name, Title}(σ_{Student.DName = 'CS' and Student.SSN = Enrollment.SSN and Enrollment.CNO = Course.CNO}(Student × Enrollment × Course))

 8.68 
Relational Algebra Example (9)

π_{Student.SSN, Name, Title}(((σ_{Student.DName = 'CS'}(Student)) ⋈ Enrollment) ⋈ Course)

π_{Student.SSN, Name, Title}((σ_{Student.DName = 'CS'}(Student)) ⋈ (Enrollment ⋈ Course))

 8.69 
Relational Algebra Summary (1)

• Relational algebra operators:
  • Fundamental operators:
    σ_C(R), π_A(R), ρ_S(R), R ∪ S, R − S, R × S
  • Other traditional operators:
    R ∩ S, R ⋈_C S

 8.70 
Relational Algebra Summary (2)

Some identities:
• σ_{C1}(σ_{C2}(R)) = σ_{C2}(σ_{C1}(R)) = σ_{C1 and C2}(R)
• π_{L1}(π_{L2}(R)) = π_{L1}(R), if L1 ⊆ L2
• R1 ∪ R2 = R2 ∪ R1
• R1 ∪ (R2 ∪ R3) = (R1 ∪ R2) ∪ R3
• R1 ∩ R2 = R2 ∩ R1
• R1 ∩ (R2 ∩ R3) = (R1 ∩ R2) ∩ R3


 8.71 

You might also like