
Saudi Electronic University
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
IT244
Introduction to Database
Week 3
Relational Model
Contents
1. The Relational Data Model and Relational Database
Constraints
2. The Relational Algebra
Weekly Learning Outcomes

1. Create a Relational model of a Database.
2. Describe Relational Algebra.
Required Reading
1. Chapter 5: The Relational Data Model and Relational
Database Constraints
2. Chapter 8: The Relational Algebra
(Fundamentals of Database Systems, Global Edition,
7th Edition (2017) by Ramez Elmasri & Shamkant
Navathe)
Recommended Reading
Relational Model and Relational Algebra:
https://courses.cs.vt.edu/cs4604/Spring21/pdfs/2-dbmodel.pdf

This Presentation is mainly dependent on the textbook: Fundamentals of Database Systems, Global Edition, 7th Edition (2017) by Ramez Elmasri & Shamkant Navathe
• The Relational Data Model and Relational Database
Constraints
Relational Model Concepts and its Origin

• The formal relational model of data is based on the concept of a
relation
– Has a formal mathematical foundation provided by set theory and
first-order predicate logic
• In practice, there is a standard model based on SQL (Structured
Query Language)
• There are several important differences between the formal model
and the practical model, as we shall see
• The model was first proposed by Dr. E.F. Codd of IBM Research in
1970 in the following paper:
– "A Relational Model for Large Shared Data Banks," Communications of
the ACM, June 1970
• The above paper caused a major revolution in the field of database
management
• Dr. Codd earned the coveted ACM Turing Award in 1981
Informal Definitions (1)
• Informally, a relation looks like a table of values (see Figure 3.1 on next
slide).
• A relation contains a set of rows.
• The data elements in each row represent certain facts that correspond to
a real-world entity or relationship
– In the formal model, rows are called tuples
• Each column has a column header that gives an indication of the meaning
of the data items in that column
– In the formal model, the column header is called an attribute name (or just
attribute)
Informal Definitions (2)
• Key of a Relation:
– Each row (tuple) in the table is uniquely identified by the
value of a particular attribute (or several attributes
together)
• Called the key of the relation
– In the STUDENT relation, SSN is the key
– If no attributes possess this uniqueness property, a new
attribute can be added to the relation to assign unique
row-id values (e.g. unique sequential numbers) to
identify the rows in a relation
• Called artificial key or surrogate key
Formal Definitions – Relation Schema
• Relation Schema (or description) of a Relation:
– Denoted by R(A1, A2, ..., An)
– R is the name of the relation
– The attributes of the relation are A1, A2, ..., An
– n is the degree (arity) of the relation, i.e. its number of attributes
• Example:
CUSTOMER (Cust-id, Cust-name, Address, Phone#)
– CUSTOMER is the relation name
– The CUSTOMER relation schema (or just relation) has four
attributes: Cust-id, Cust-name, Address, Phone#
• Each attribute has a domain or a set of valid values.
– For example, the domain of Cust-id can be 6 digit numbers.
Formal Definitions - Tuple

• A tuple is an ordered set of values (enclosed in angled
brackets ‘< … >’)
• Each value is derived from an appropriate domain.
• A row in the CUSTOMER relation is a 4-tuple and would consist
of four values, for example:
– <632895, "John Smith", "101 Main St. Atlanta, GA 30332",
"(404) 894-2000">
– Called a 4-tuple because it has 4 values
– In general, a particular relation will have n-tuples, where n is the
number of attributes for the relation
• A relation is a set of such tuples (rows)
Formal Definitions - Domain
• A domain of values can have a logical definition:
– Example: “USA_phone_numbers” are the set of 10 digit phone numbers
valid in the U.S.
• A domain also has a data-type or a format defined for it.
– The USA_phone_numbers may have a format: (ddd)ddd-dddd where each
d is a decimal digit.
– Dates have various formats such as year, month, date formatted as
yyyy-mm-dd, or as dd:mm:yyyy etc.

• The attribute name designates the role played by a domain in a relation:
– Used to interpret the meaning of the data elements corresponding to
that attribute
– Example: The domain Date may be used to define two attributes “Invoice-
date” and “Payment-date” with different meanings (roles)
Formal Definitions - State of a Relation
• Formally, a relation state r(R) is a subset of the Cartesian product
of the domains of its attributes
– Each domain contains the set of all possible values the attribute can
take.
– The Cartesian product contains all possible tuples from the attribute
domains
– The relation state r(R) is the subset of tuples that represent valid
information in the mini-world at a particular time
• Formally (see Figure 3.1),
– Given relation schema R(A1, A2, ..., An)
– Relation state r(R) ⊆ dom(A1) × dom(A2) × ... × dom(An)
• r(R): is a specific state (or "instance" or “population”) of relation R
– this is a set of tuples (rows) in the relation at a particular moment
in time
– r(R) = {t1, t2, …, tm} where each ti is an n-tuple
– ti = <v1, v2, …, vn> where each vj ∈ dom(Aj)
Formal Definitions - Example
• Let R(A1, A2) be a relation schema:
– Let dom(A1) = {0, 1}
– Let dom(A2) = {a, b, c}
• Then: The Cartesian product dom(A1) × dom(A2) contains all
possible tuples from these domains:
{<0, a> , <0, b> , <0, c>, <1, a>, <1, b>, <1, c> }
• The relation state r(R) ⊆ dom(A1) × dom(A2)
• For example: One possible state r(R) could be {<0, a> , <0, b> ,
<1, c> }
– This state has three 2-tuples: <0, a> , <0, b> , <1, c>
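The example above can be reproduced in a few lines of Python. This is only an illustrative sketch: the names `dom_a1`, `dom_a2`, `cartesian`, and `is_relation_state` are made up for this note and are not part of the textbook.

```python
from itertools import product

# Domains from the example above
dom_a1 = {0, 1}
dom_a2 = {"a", "b", "c"}

# dom(A1) x dom(A2): every possible 2-tuple over the two domains
cartesian = set(product(dom_a1, dom_a2))

def is_relation_state(candidate, full_product):
    """A relation state is any subset of the Cartesian product of the domains."""
    return candidate <= full_product

# One possible state r(R), as in the slide
state = {(0, "a"), (0, "b"), (1, "c")}

print(len(cartesian))                       # number of possible tuples
print(is_relation_state(state, cartesian))  # subset check
```

A set containing a value outside a domain, such as `{(2, "a")}`, fails the subset check and is therefore not a valid state of R.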
Relation Definitions Summary

Informal Terms | Formal Terms
Table | Relation
Column Header | Attribute
All possible Column Values or Data Type | Domain
Row | Tuple
Table Definition | Schema of a Relation
Populated Table | State of the Relation


Characteristics of a Relation (1)
• Ordering of tuples in a relation r(R):
– The tuples are not considered to be ordered, because a relation is a set of
tuples
• Ordering of attributes in a relation schema R (and of values within
each tuple):
– The attributes in R(A1, A2, ..., An) and the values in each t=<v1, v2, ..., vn> are
considered to be ordered
– However, a more general definition of relation does not require attribute ordering
– In this case, a tuple t = { <ai, vi>, ..., <aj, vj> } is an unordered set of n <attribute, value>
pairs – one pair for each of the relation attributes (see Figure 3.3)
Characteristics of Relations (2)
• Values in a tuple:
– All values are considered atomic (indivisible).
– Each value must be from the domain of the attribute for that
column
• If tuple t = <v1, v2, …, vn> is a tuple (row) in the relation state r of R(A1,
A2, …, An)
• Then each vi must be a value from dom(Ai)
– A special null value is used to represent values that are unknown
or inapplicable to certain tuples.
• Notation:
– We refer to component values of a tuple t by:
• t[Ai] or t.Ai
• This is the value vi of attribute Ai for tuple t
– Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing
the values of attributes Au, Av, ..., Aw, respectively in t
Relational Integrity Constraints

• Constraints are conditions that must hold on all valid relation
states.
• Constraints are derived from the mini-world semantics
• There are three main types of built-in constraints in the
relational model:
– Key constraints
– Entity integrity constraints
– Referential integrity constraints
• Another implicit constraint is the domain constraint
– Every value in a tuple must be from the domain of its attribute
(or it could be null, if allowed for that attribute)
Key Constraints (1)

• Superkey SK of R:
– Is a set of attributes SK of R with the following condition:
• No two tuples in any valid relation state r(R) will have the same
value for SK
• That is, for any two distinct tuples t1 and t2 in r(R), t1.SK ≠ t2.SK
• This condition must hold in any valid state r(R)
• Key (also called Candidate key) K of R:
– Is a "minimal" superkey
– Formally, a key K is a superkey such that removal of any
attribute from K results in a set of attributes that is not a
superkey (or key) any more (does not possess the superkey
uniqueness property)
– Hence, a superkey with one attribute is always a key
Key Constraints (2)

• Example: Consider the CAR relation schema:
– CAR(State, Reg#, SerialNo, Make, Model, Year)
– CAR has two keys (determined from the mini-world constraints):
• Key1 = {State, Reg#}
• Key2 = {SerialNo}
– Both are also superkeys of CAR
– However, {SerialNo, Make} is a superkey but not a key.
• In general:
– Any key is a superkey (but not vice versa)
– Any set of attributes that includes a key is a superkey
– A minimal superkey is also a key
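The difference between a superkey and a key can be checked mechanically on sample data. The sketch below is an assumption-laden illustration: the CAR rows are invented, and the helper names `is_superkey_in_state` and `is_key_in_state` are hypothetical. Note that a key is a property of the schema (it must hold in every valid state), so a check on one state can only refute a key, never prove it.

```python
def is_superkey_in_state(rows, attrs):
    """True if no two rows of this state agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key_in_state(rows, attrs):
    """A superkey that stops being one when any single attribute is removed."""
    if not is_superkey_in_state(rows, attrs):
        return False
    return all(
        not is_superkey_in_state(rows, [a for a in attrs if a != dropped])
        for dropped in attrs
    )

# Invented sample state of CAR(State, Reg#, SerialNo, Make)
CAR = [
    {"State": "TX", "Reg#": "ABC-739", "SerialNo": "A69352", "Make": "Ford"},
    {"State": "FL", "Reg#": "ABC-739", "SerialNo": "B43696", "Make": "Ford"},
    {"State": "TX", "Reg#": "TLP-347", "SerialNo": "X83554", "Make": "Toyota"},
]

print(is_key_in_state(CAR, ["State", "Reg#"]))          # candidate key
print(is_key_in_state(CAR, ["SerialNo"]))               # candidate key
print(is_superkey_in_state(CAR, ["SerialNo", "Make"]))  # superkey
print(is_key_in_state(CAR, ["SerialNo", "Make"]))       # not minimal
```

Checking only single-attribute removals is enough for minimality, because any superset of a superkey is itself a superkey.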
Key Constraints (3)
• If a relation has several keys, they are called candidate keys;
one is chosen to be the primary key; the others are called
unique (or secondary) keys
– The primary key attributes are underlined.
• Example: Consider the CAR relation schema:
– CAR(State, Reg#, SerialNo, Make, Model, Year)
– We choose License_number (which contains (State, Reg#)
together) as the primary key – see Figure 3.4
• The primary key value is used to uniquely identify each tuple in
a relation
– Provides the tuple identity
– Also used to reference the tuple from other tuples
• General rule: Choose the smallest-sized candidate key (in
bytes) as primary key
– Not always applicable – choice is sometimes subjective (as in
Figure 3.4 – see next slide)
Relational Database Schema
• Relational Database Schema:
– A set S of relation schemas that belong to the same database.
– S is the name of the whole database schema
– S = {R1, R2, ..., Rn}
– R1, R2, …, Rn are the names of the individual relation schemas within the
database S
– Figure 3.5 shows a COMPANY database schema with 6 relation schemas
Example of Relational Database State

• The next slide shows an example of a COMPANY database state
(Figure 3.6)
– Each relation has a set of tuples
• The tuples in each table satisfy key and other constraints
• If all constraints are satisfied by a database state, it is called a valid
state
– The database state changes to another state whenever the
tuples in any relation are changed via insertions, deletions,
or updates
Entity Integrity Constraint
• Entity Integrity:
– The primary key attributes PK of each relation schema R in
S cannot have null values in any tuple of r(R).
• This is because primary key values are used to identify the
individual tuples.
• t.PK ≠ null for any tuple t in r(R)
• If PK has several attributes, null is not allowed in any of these
attributes
– Note: Other attributes of R may also be constrained to
disallow null values (called NOT NULL constraint), even
though they are not members of the primary key.
Referential Integrity Constraint (1)

• A constraint involving two relations
– The previous constraints (key, entity integrity) involve a single
relation.
• Used to specify a relationship among tuples in two relations:
– The referencing relation and the referenced relation.
• Tuples in the referencing relation R1 have attributes FK (called
foreign key attributes) that reference the primary key
attributes PK of the referenced relation R2.
– A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1.FK = t2.PK
• Referential integrity can be displayed as a directed arc from
R1.FK to R2.PK – see Figure 3.7
Referential Integrity (or foreign key) Constraint (2)
• Statement of the constraint
– For a particular database state, the value of the foreign key
attribute (or attributes) FK in each tuple of the referencing
relation R1 can be either:
• (1) An existing primary key (PK) value of a tuple in the referenced
relation R2, or
• (2) a null.
• In case (2), the FK in R1 should not be a part of its
own primary key, and cannot have the NOT NULL
constraint.
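The statement of the constraint translates directly into a check over a database state. The following is a minimal sketch with invented EMPLOYEE and DEPARTMENT rows; the function name `referential_violations` is hypothetical, and Python's `None` stands in for null.

```python
def referential_violations(referencing, fk, referenced, pk):
    """Return the referencing rows whose FK is neither null nor an existing PK value."""
    pk_values = {tuple(row[a] for a in pk) for row in referenced}
    bad = []
    for row in referencing:
        value = tuple(row[a] for a in fk)
        if all(v is None for v in value):
            continue  # case (2): a null foreign key is allowed
        if value not in pk_values:
            bad.append(row)  # references a tuple that does not exist
    return bad

# Invented sample state
DEPARTMENT = [{"DNUMBER": 5}, {"DNUMBER": 4}]
EMPLOYEE = [
    {"SSN": "123456789", "DNO": 5},     # case (1): existing PK value
    {"SSN": "999887777", "DNO": None},  # case (2): null
    {"SSN": "453453453", "DNO": 7},     # violation: no department 7
]

print(referential_violations(EMPLOYEE, ["DNO"], DEPARTMENT, ["DNUMBER"]))
```

Passing `fk` and `pk` as lists keeps the same check usable for composite keys.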
Other Types of Constraints
• Semantic Integrity Constraints:
– cannot be expressed by the built-in model constraints
– Example: “the max. no. of hours per employee for all
projects he or she works on is 56 hrs per week”
• A constraint specification language can be used to
express these
• SQL has TRIGGERS and ASSERTIONS to express some
of these constraints
Operations to Modify Relations (1)
• Each relation will have many tuples in its current relation state
• The relational database state is a union of all the individual relation
states at a particular time
• Whenever the database is changed, a new state arises
• Basic operations for changing the database:
– INSERT new tuples in a relation
– DELETE existing tuples from a relation
– UPDATE attribute values of existing tuples
• Integrity constraints should not be violated by the update
operations.
• Several update operations may have to be grouped together into a
transaction.
Operations to Modify Relations (2)
• Updates may propagate to cause other updates
automatically. This may be necessary to maintain
integrity constraints.
• In case of integrity violation, several actions can be
taken:
– Cancel the operation that causes the violation (RESTRICT or
REJECT option)
– Perform the operation but inform the user of the violation
– Trigger additional updates so the violation is corrected
(CASCADE option, SET NULL option)
– Execute a user-specified error-correction routine
INSERT operation
• INSERT one or more new tuples into a relation
• INSERT may violate any of the constraints:
– Domain constraint:
• if one of the attribute values provided for a new tuple is not of the
specified attribute domain
– Key constraint:
• if the value of a key attribute in a new tuple already exists in
another tuple in the relation
– Referential integrity:
• if a foreign key value in a new tuple references a primary key value
that does not exist in the referenced relation
– Entity integrity:
• if the primary key value is null in a new tuple
DELETE operation
• DELETE one or more existing tuples from a relation
• DELETE may violate only referential integrity:
– If the primary key value of the tuple being deleted is referenced
from other tuples in the database
• Can be remedied by several actions: RESTRICT, CASCADE, SET NULL
– RESTRICT option: reject the deletion
– CASCADE option: propagate the deletion by automatically deleting
the referencing tuples
– SET NULL option: set the foreign keys of the referencing tuples to
NULL (the foreign keys cannot have NOT NULL constraint)
– One of the above options must be specified during database
design for each referential integrity (foreign key) constraint
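The three remedies can be simulated on in-memory data. This sketch assumes an invented DEPARTMENT/EMPLOYEE state and a hypothetical function `delete_department`; it only imitates the behaviour described above and is not how any particular DBMS implements it.

```python
def delete_department(departments, employees, dnumber, on_delete="RESTRICT"):
    """Return new (departments, employees) lists after deleting one department."""
    referenced = any(e["DNO"] == dnumber for e in employees)
    if referenced:
        if on_delete == "RESTRICT":
            # Reject the deletion because referencing tuples exist
            raise ValueError("deletion rejected: department is still referenced")
        elif on_delete == "CASCADE":
            # Propagate the deletion to the referencing tuples
            employees = [e for e in employees if e["DNO"] != dnumber]
        elif on_delete == "SET NULL":
            # Keep the referencing tuples but null out their foreign key
            employees = [
                {**e, "DNO": None} if e["DNO"] == dnumber else e
                for e in employees
            ]
        else:
            raise ValueError(f"unknown option: {on_delete}")
    departments = [d for d in departments if d["DNUMBER"] != dnumber]
    return departments, employees

# Invented sample state
DEPARTMENT = [{"DNUMBER": 5}, {"DNUMBER": 4}]
EMPLOYEE = [{"SSN": "123456789", "DNO": 5}, {"SSN": "999887777", "DNO": 4}]

print(delete_department(DEPARTMENT, EMPLOYEE, 5, "CASCADE"))
print(delete_department(DEPARTMENT, EMPLOYEE, 5, "SET NULL"))
```

The function returns new lists instead of mutating its inputs, so each option can be tried against the same starting state.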
UPDATE operation
• UPDATE modifies the values of attributes in one or more
existing tuples in a relation
• UPDATE may violate domain constraint and NOT NULL
constraint on an attribute being modified
• Other constraints may also be violated:
– Updating the primary key (PK):
• Similar to a DELETE followed by an INSERT
• Need to specify similar options to DELETE
• The CASCADE option propagates the new value of PK to the foreign
keys of the referencing tuples automatically
– Updating a foreign key (FK) may violate referential integrity
– Updating an ordinary attribute (neither PK nor FK):
• Can only violate domain or NOT NULL constraints
• Relational Algebra
Relational Algebra Overview (1)
• Relational algebra is the basic set of operations for the
theoretical relational model
• SQL is the standard language for practical relational
databases
• Relational algebra operations are used in DBMS query
optimization
– SQL is converted to relational operations for optimization and
processing
• Input to a relational algebra operation is one or more
relations
• These operations can specify basic retrieval requests (or
queries)
Relational Algebra Overview (2)
• The result of an operation is a new relation, derived
from the input relations
– This property makes the algebra “closed” (i.e. all objects in
relational algebra are relations)
• A query will typically require multiple operations
– Result of one operation can be further manipulated using
additional operations of the same algebra
• A sequence of relational algebra operations forms a
relational algebra expression
– Final result is a relation that represents the result of a
database query (or retrieval request)
Relational Algebra Overview (3)

• Relational Algebra consists of several groups of operations
– Unary Relational Operations – one input table
• SELECT (symbol: σ (sigma))
• PROJECT (symbol: π (pi))
• RENAME (symbol: ρ (rho))
– Binary Relational Algebra Operations From Set Theory – two input tables
• UNION ( ∪ ), INTERSECTION ( ∩ ), DIFFERENCE (or MINUS, – )
• CARTESIAN PRODUCT ( × )
– Binary Relational Operations – two input tables
• JOIN (several variations of JOIN exist)
• DIVISION
– Additional Relational Operations
• OUTER JOINS, OUTER UNION
• AGGREGATE FUNCTIONS (These compute summary of information, E.g. SUM,
COUNT, AVG, MIN, MAX)
• Other operations
SELECT operation (1)
• The SELECT operation (denoted by σ (sigma)) selects a subset of the tuples
from a relation based on a selection condition.
– The selection condition acts as a filter
– Keeps only those tuples that satisfy the qualifying condition
– Tuples satisfying the condition are selected whereas the other
tuples are discarded (filtered out)
– Important Note: The SELECT operation is different from the
SELECT-clause of SQL
• Examples:
– Select the EMPLOYEE tuples whose department number is 4:
σ DNO = 4 (EMPLOYEE)
– Select the employee tuples whose salary is greater than $30,000:
σ SALARY > 30,000 (EMPLOYEE)
SELECT operation (2)
– The general form of select operation is
σ <selection condition> (R), where
• the symbol σ (sigma) is used to denote the select operator
• the selection condition is a Boolean (conditional) expression
specified on the attributes of relation R
• tuples that make the condition true are selected
– appear in the result of the operation
• tuples that make the condition false are filtered out
– discarded from the result of the operation
SELECT operation (3)

• SELECT Operation Properties
– SELECT operation σ <selection condition>(R) produces a relation S that has
the same schema (same attributes) as R
– SELECT σ is commutative:
• σ <condition1>(σ <condition2>(R)) = σ <condition2>(σ <condition1>(R))
– Because of the commutativity property, a cascade (sequence) of
SELECT operations may be applied in any order:
• σ <cond1>(σ <cond2>(σ <cond3>(R))) = σ <cond2>(σ <cond3>(σ <cond1>(R)))
– A cascade of SELECT operations may be replaced by a single
selection with a conjunction of all the conditions:
• σ <cond1>(σ <cond2>(σ <cond3>(R))) = σ <cond1> AND <cond2> AND <cond3>(R)
– The number of tuples in the result of a SELECT is less than (or
equal to) the number of tuples in the input relation R
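The SELECT properties listed above can be tried out on a small relation. This is a sketch under stated assumptions: the EMPLOYEE rows are invented, and `select`, `in_dept_4`, and `high_salary` are names chosen for this illustration.

```python
from collections import namedtuple

Employee = namedtuple("Employee", ["fname", "lname", "salary", "dno"])

# Invented sample state; a Python set mirrors "a relation is a set of tuples"
EMPLOYEE = {
    Employee("John", "Smith", 30000, 5),
    Employee("Franklin", "Wong", 40000, 5),
    Employee("Alicia", "Zelaya", 25000, 4),
    Employee("Jennifer", "Wallace", 43000, 4),
}

def select(relation, condition):
    """sigma_<condition>(relation): keep the tuples for which condition is true."""
    return {t for t in relation if condition(t)}

def in_dept_4(t):
    return t.dno == 4

def high_salary(t):
    return t.salary > 30000

# Cascade in either order, and a single selection with a conjunction
a = select(select(EMPLOYEE, in_dept_4), high_salary)
b = select(select(EMPLOYEE, high_salary), in_dept_4)
c = select(EMPLOYEE, lambda t: in_dept_4(t) and high_salary(t))

print(a == b == c)
print(len(a) <= len(EMPLOYEE))
```

The result keeps the same schema as the input, since every surviving tuple is an unchanged `Employee`.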
PROJECT operation (1)
• PROJECT Operation is denoted by π (pi symbol)
• Keeps certain columns (attributes) from a relation
and discards the other columns.
– PROJECT creates a vertical partitioning
• The list of specified columns (attributes) is kept in each tuple
• The other attributes in each tuple are discarded
• Example: List each employee’s first and last name
and salary:
π LNAME, FNAME, SALARY(EMPLOYEE)
PROJECT operation (2)
• General form of project operation is:
π <attribute list>(R), where
– π (pi) is the symbol used to represent the project operation
– <attribute list> is the desired list of attributes from relation R.
• The operation removes any duplicate tuples
– This is because the result of the project operation must be a set of tuples
• Mathematical sets do not allow duplicate elements.
• PROJECT Operation Properties
– Number of tuples in the result of projection π <list>(R) is less than or
equal to the number of tuples in R
• If the list of projected attributes includes a key of R, then the number of
tuples in the result of PROJECT is equal to the number of tuples in R
– PROJECT is not commutative
• π <list1>(π <list2>(R)) = π <list1>(R) as long as <list2> contains the attributes in <list1>
Relational Algebra Expressions
• To apply several relational algebra operations
– Either we write the operations as a single relational
algebra expression by nesting the operations in
parentheses, or
– We apply one operation at a time and create intermediate
result relations.
• In the latter case, we must give names to the
relations that hold the intermediate results.
Single expression versus sequence of
relational operations (Example)
• To retrieve the first name, last name, and salary of all
employees who work in department number 5, we must
apply a select and a project operation
• Single relational algebra expression:
– π FNAME, LNAME, SALARY(σ DNO=5(EMPLOYEE))
• OR sequence of operations, giving a name to each
intermediate relation:
– DEP5_EMPS ← σ DNO=5(EMPLOYEE)
– RESULT ← π FNAME, LNAME, SALARY(DEP5_EMPS)
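Both ways of writing the query, and the duplicate elimination performed by PROJECT, can be sketched in Python. The representation is an assumption made for this note: each tuple is a `frozenset` of (attribute, value) pairs, built by a hypothetical helper `row`, and the EMPLOYEE rows are invented.

```python
def row(**attrs):
    """Build a tuple as an unordered set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def select(relation, condition):
    """sigma: keep tuples satisfying the condition (given the tuple as a dict)."""
    return {t for t in relation if condition(dict(t))}

def project(relation, attrs):
    """pi: keep only attrs; building a set removes duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

# Invented sample state
EMPLOYEE = {
    row(FNAME="John", LNAME="Smith", SALARY=30000, DNO=5),
    row(FNAME="Franklin", LNAME="Wong", SALARY=40000, DNO=5),
    row(FNAME="Alicia", LNAME="Zelaya", SALARY=25000, DNO=4),
}
names = {"FNAME", "LNAME", "SALARY"}

# Single nested expression
single = project(select(EMPLOYEE, lambda t: t["DNO"] == 5), names)

# Sequence of operations with a named intermediate relation
dep5_emps = select(EMPLOYEE, lambda t: t["DNO"] == 5)
result = project(dep5_emps, names)

# Projecting on a non-key attribute collapses duplicates
dnos = project(EMPLOYEE, {"DNO"})

print(single == result)
print(len(dnos), "distinct DNO values from", len(EMPLOYEE), "tuples")
```

Projecting onto `{"DNO"}` after first projecting onto `{"LNAME", "DNO"}` gives the same relation as projecting onto `{"DNO"}` directly, which matches the nesting property of PROJECT.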
RENAME operations (1)
• The RENAME operator is denoted by ρ (rho)
• Used to rename the attributes of a relation or the
relation name or both
– Useful when a query requires multiple operations
– Necessary in some cases (see JOIN operation later)
• The general RENAME operation ρ can be expressed by
any of the following forms:
– ρ S(B1, B2, …, Bn)(R) changes both:
• the relation name to S, and
• the column (attribute) names to B1, B2, …, Bn
– ρ S(R) changes:
• the relation name only to S
– ρ (B1, B2, …, Bn)(R) changes:
• the column (attribute) names only to B1, B2, …, Bn
RENAME operation (2)
• For convenience, we also use shorthand for renaming an
intermediate relation:
– If we write:
• R(FN, LN, SAL) ← πFNAME, LNAME, SALARY(DEP5_EMPS)
• Then R holds the result, with FNAME renamed to FN, LNAME to LN, and SALARY to SAL
– If we write:
• R(F, M, L, S, B, A, SX, SAL, SU, D) ← EMPLOYEE
• Then EMPLOYEE is renamed to R and the 10 attributes of EMPLOYEE are
renamed to F, M, L, S, B, A, SX, SAL, SU, D, respectively
Example
• Next slide (Figure 6.2) shows the result of the following two
expressions when applied to the database state in Figure 3.6
• Single relational algebra expression (Fig 6.2(a)):
– πFNAME, LNAME, SALARY(σDNO=5(EMPLOYEE))
• Sequence of operations, giving a name to each intermediate
relation (Fig 6.2(b)):
– TEMP ← σDNO=5(EMPLOYEE)
– R ← πFNAME, LNAME, SALARY(TEMP)
UNION Operation (1)
• UNION Operation
– Binary operation, denoted by ∪
– The result of R ∪ S, is a relation that includes all tuples that
are either in R or in S or in both R and S
– Duplicate tuples are eliminated
– The two operand relations R and S must be “type
compatible” (or UNION compatible)
• R and S must have same number of attributes
• Each pair of corresponding attributes must be type
compatible (have same or compatible domains)
UNION operation (2)
• Example:
– To retrieve the social security numbers of all employees who either
work in department 5 (RESULT1 below) or directly supervise an
employee who works in department 5 (RESULT2 below)
– We can use the UNION operation as follows:
• DEP5_EMPS ← σDNO=5(EMPLOYEE)
• RESULT1 ← πSSN(DEP5_EMPS)
• RESULT2(SSN) ← πSUPERSSN(DEP5_EMPS)
• RESULT ← RESULT1 ∪ RESULT2
– The union operation produces the tuples that are in either RESULT1
or RESULT2 or both (see Fig 6.3)
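The four steps of this example map onto Python set operations. The EMPLOYEE rows below are invented for illustration, and the variable names simply follow the intermediate relations named in the slide.

```python
# Invented sample state with only the attributes the query needs
EMPLOYEE = [
    {"SSN": "123456789", "SUPERSSN": "333445555", "DNO": 5},
    {"SSN": "333445555", "SUPERSSN": "888665555", "DNO": 5},
    {"SSN": "999887777", "SUPERSSN": "987654321", "DNO": 4},
    {"SSN": "888665555", "SUPERSSN": None, "DNO": 1},
]

# DEP5_EMPS: select employees of department 5
dep5_emps = [e for e in EMPLOYEE if e["DNO"] == 5]

# RESULT1: project on SSN; RESULT2: project on SUPERSSN (renamed to SSN)
result1 = {e["SSN"] for e in dep5_emps}
result2 = {e["SUPERSSN"] for e in dep5_emps}

# RESULT: union, which keeps each SSN only once
result = result1 | result2

print(sorted(result))
```

"333445555" appears in both operands (as an employee of department 5 and as a supervisor) but only once in the union.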
Type Compatibility

• Type compatibility of operands is required for the binary set
operation UNION ∪ (also for INTERSECTION ∩ and SET
DIFFERENCE –, see next slides)
• R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) are type compatible if:
– they have the same number of attributes, and
– the domains of corresponding attributes are type compatible (i.e.
dom(Ai)=dom(Bi) for i=1, 2, ..., n).
• The resulting relation for R1 ∪ R2 (also for R1 ∩ R2, or R1 – R2,
see next slides) has the same attribute names as the first
operand relation R1 (by convention)
INTERSECTION operation

• INTERSECTION is denoted by ∩
• The result of the operation R ∩ S, is a relation that
includes all tuples that are in both R and S
– The attribute names in the result will be the
same as the attribute names in R
• The two operand relations R and S must be “type
compatible”
SET DIFFERENCE operation

• SET DIFFERENCE (also called MINUS or EXCEPT) is
denoted by –
• The result of R – S, is a relation that includes all tuples
that are in R but not in S
– The attribute names in the result will be the
same as the attribute names in R
• The two operand relations R and S must be “type
compatible”
Set Operations in SQL

• Figure 6.4 (next slide) shows some examples of set
operations on relations
• SQL has UNION, INTERSECT, EXCEPT operations
– These operations work with sets of tuples
(duplicate tuples are eliminated)
– In addition, SQL has UNION ALL, INTERSECT
ALL, EXCEPT ALL for multisets (duplicates are
allowed)
Some properties of UNION,
INTERSECT, and DIFFERENCE
• Both union and intersection are commutative operations;
that is
– R ∪ S = S ∪ R, and R ∩ S = S ∩ R
• Both union and intersection can be treated as n-ary
operations applicable to any number of relations as both are
associative operations; that is
– R ∪ (S ∪ T) = (R ∪ S) ∪ T
– (R ∩ S) ∩ T = R ∩ (S ∩ T)
• The minus operation is not commutative; that is, in general
– R – S ≠ S – R
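These properties can be confirmed on concrete relations using Python's built-in set operators (`|` for union, `&` for intersection, `-` for difference). The three relations below are invented and assumed to be type compatible.

```python
# Three invented, type-compatible relations over (number, letter)
R = {(1, "a"), (2, "b")}
S = {(2, "b"), (3, "c")}
T = {(3, "c"), (4, "d")}

commutative = (R | S == S | R) and (R & S == S & R)
associative = (R | (S | T) == (R | S) | T) and ((R & S) & T == R & (S & T))
minus_differs = (R - S) != (S - R)

print(commutative, associative, minus_differs)
print(R - S, S - R)
```

One example cannot prove the laws in general, but it does show a case where R – S and S – R differ.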
CARTESIAN PRODUCT operation (1)
• CARTESIAN (or CROSS) PRODUCT Operation
– Different from the other three set operations
– Used to combine tuples from two relations in a combinatorial
fashion.
– Denoted by R(A1, A2, . . ., An) × S(B1, B2, . . ., Bm)
– Result is a relation Q of degree n + m, with attributes:
• Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
– The resulting relation state has one tuple for each combination
of tuples, one from R and one from S.
– Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS
tuples, then R × S will have nR * nS tuples.
– R and S do NOT have to be "type compatible"
CARTESIAN PRODUCT operation (2)
• Generally, CROSS PRODUCT is not a meaningful
operation
– Can become meaningful when followed by other
operations
• Example (not meaningful):
– FEMALE_EMPS ← σSEX=‘F’(EMPLOYEE)
– EMPNAMES ← πFNAME, LNAME, SSN(FEMALE_EMPS)
– EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
• EMP_DEPENDENTS will contain every combination of tuples
from EMPNAMES and DEPENDENT
– whether or not the tuples are actually related
CARTESIAN PRODUCT operation (3)
• To keep only combinations where the DEPENDENT is
related to the EMPLOYEE by the condition ESSN=SSN, we
add a SELECT operation
• Example (meaningful, see Figure 6.5, next two
slides):
– FEMALE_EMPS ← σSEX=‘F’(EMPLOYEE)
– EMPNAMES ← πFNAME, LNAME, SSN(FEMALE_EMPS)
– EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
– ACTUAL_DEPS ← σSSN=ESSN(EMP_DEPENDENTS)
– RESULT ← πFNAME, LNAME, DEPENDENT_NAME(ACTUAL_DEPS)
• RESULT will now contain the name of female employees and
their dependents
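The last three steps of this sequence (product, then select, then project) can be sketched as follows. The EMPNAMES and DEPENDENT rows are invented, `cartesian_product` is a hypothetical helper, and the sketch assumes the two relations share no attribute names.

```python
def cartesian_product(r, s):
    """R x S: one combined tuple for every pairing of a tuple from r with one from s.
    Assumes r and s have disjoint attribute names."""
    return [{**tr, **ts} for tr in r for ts in s]

# Invented sample data (EMPNAMES is already restricted to female employees)
EMPNAMES = [
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SSN": "999887777"},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SSN": "987654321"},
    {"FNAME": "Joyce", "LNAME": "English", "SSN": "453453453"},
]
DEPENDENT = [
    {"ESSN": "333445555", "DEPENDENT_NAME": "Alice"},
    {"ESSN": "987654321", "DEPENDENT_NAME": "Abner"},
    {"ESSN": "123456789", "DEPENDENT_NAME": "Michael"},
]

# EMP_DEPENDENTS: every combination, related or not
emp_dependents = cartesian_product(EMPNAMES, DEPENDENT)

# ACTUAL_DEPS: keep only combinations where SSN = ESSN
actual_deps = [t for t in emp_dependents if t["SSN"] == t["ESSN"]]

# RESULT: project on the three name attributes
result = {(t["FNAME"], t["LNAME"], t["DEPENDENT_NAME"]) for t in actual_deps}

print(len(emp_dependents), "combinations,", len(actual_deps), "related")
print(result)
```

The product has |EMPNAMES| * |DEPENDENT| tuples, and the selection discards all but the pairs that are actually related.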
JOIN operation (also called INNER JOIN)
• JOIN Operation (denoted by ⋈ (bowtie symbol))
– The sequence of CARTESIAN PRODUCT followed by SELECT can be
used to identify and select related tuples from two relations
– The JOIN operation combines this sequence into a single
operation
– JOIN is very important for any relational database with more
than a single relation, because it allows us to combine related
tuples from various relations
– The general form of a join operation on two relations R(A1, A2, . .
., An) and S(B1, B2, . . ., Bm) is:
• R ⋈<join condition>S
– where R and S can be base relations or any relations that result
from general relational algebra expressions.
JOIN operation (2)
• Example: Suppose that we want to retrieve the name of the
manager of each department.
– To get the manager’s name, we need to combine each
DEPARTMENT tuple with the EMPLOYEE tuple whose SSN value
matches the MGRSSN value in the department tuple.
– We do this by using the join ⋈ operation
– DEPT_MGR ← DEPARTMENT ⋈MGRSSN=SSN EMPLOYEE
• MGRSSN=SSN is called the join condition
– Combines each department record with the employee who
manages the department
– The join condition can also be specified as
– DEPARTMENT.MGRSSN= EMPLOYEE.SSN
– Result in Figure 6.6 (next slide)
Properties of JOIN (1)
• Consider the following JOIN operation:
– R(A1, A2, . . ., An) ⋈<R.Ai=S.Bj> S(B1, B2, . . ., Bm)
– Result is a relation Q of degree n + m, with attributes:
• Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
– The resulting relation state has one tuple for each combination
of tuples - r from R and s from S, but only if they satisfy the join
condition r.Ai=s.Bj
– Hence, if R has nR tuples, and S has nS tuples, then the join
result will generally have fewer than nR * nS tuples.
– Only related tuples (based on the join condition) will appear in
the result
Properties of JOIN (2)
• General case of JOIN operation is called a Theta-join:
R ⋈ theta S

• The join condition is called theta
• Theta can be any general boolean expression on the
attributes of R and S; for example:
– R.Ai<S.Bj AND (R.Ak=S.Bl OR R.Ap<S.Bq)
• Most join conditions involve one or more equality
conditions “AND”ed together; for example:
– R.Ai=S.Bj AND R.Ak=S.Bl AND R.Ap=S.Bq
Variations of JOIN: EQUIJOIN
• EQUIJOIN Operation is the most common use of join;
involves join conditions with equality comparisons
only (Ai = Bj)
– The result of an EQUIJOIN will always have one or more
pairs of attributes (whose names need not be identical)
that have identical values in every tuple.
– The JOIN in the previous example was an EQUIJOIN; every
tuple in the result will have SSN=MGRSSN
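A theta-join can be sketched as a nested loop that keeps only the combinations satisfying the join condition; an EQUIJOIN is the special case where the condition uses equality only. The helper name `theta_join` and the sample rows are assumptions for this illustration, and the two relations are assumed to have disjoint attribute names.

```python
def theta_join(r, s, condition):
    """R join_<condition> S: combine each pair (tr, ts) that satisfies the condition.
    Assumes r and s have disjoint attribute names."""
    return [{**tr, **ts} for tr in r for ts in s if condition(tr, ts)]

# Invented sample state
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "333445555"},
    {"DNAME": "Administration", "DNUMBER": 4, "MGRSSN": "987654321"},
]
EMPLOYEE = [
    {"LNAME": "Smith", "SSN": "123456789"},
    {"LNAME": "Wong", "SSN": "333445555"},
    {"LNAME": "Wallace", "SSN": "987654321"},
]

# EQUIJOIN with the join condition MGRSSN = SSN
dept_mgr = theta_join(DEPARTMENT, EMPLOYEE, lambda d, e: d["MGRSSN"] == e["SSN"])

for t in dept_mgr:
    print(t["DNAME"], "is managed by", t["LNAME"])
```

Every result tuple carries both MGRSSN and SSN with identical values, which is the redundancy that NATURAL JOIN is designed to remove.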
Variations of JOIN: NATURAL JOIN (1)
• NATURAL JOIN Operation
– Another variation of JOIN called NATURAL JOIN — denoted by *
— was created to get rid of the second (superfluous) attribute in
an EQUIJOIN result.
• because one of each pair of attributes with identical values is
superfluous
– The standard definition of natural join requires that the two join
attributes, or each pair of corresponding join attributes, have
the same name in both relations
– If this is not the case, a renaming operation is applied first.
NATURAL JOIN (2)
• Example: To apply a natural join on the DNUMBER attributes of
DEPARTMENT and DEPT_LOCATIONS, it is sufficient to write:
– DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS
• The only attribute with the same name is DNUMBER
• An implicit join condition is created based on this attribute (see Figure
6.7(b), next slide):
• DEPARTMENT.DNUMBER=DEPT_LOCATIONS.DNUMBER

• Another example: Q ← R(A,B,C,D) * S(C,D,E)
– The implicit join condition includes each pair of attributes with the
same name, “AND”ed together:
• R.C=S.C AND R.D=S.D
– Result keeps only one attribute of each such pair:
• Q(A,B,C,D,E)
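The second example can be sketched in Python by treating each tuple as a dict and joining on whatever attribute names the two tuples share. The function name `natural_join` and the sample values are invented for this illustration.

```python
def natural_join(r, s):
    """R * S: join on all attributes with the same name, keeping one copy of each."""
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()
            # Implicit join condition: equality on every shared attribute
            if all(tr[a] == ts[a] for a in common):
                result.append({**tr, **ts})
    return result

# Invented sample states of R(A,B,C,D) and S(C,D,E)
R = [
    {"A": 1, "B": 2, "C": 3, "D": 4},
    {"A": 5, "B": 6, "C": 7, "D": 8},
]
S = [
    {"C": 3, "D": 4, "E": 9},   # matches the first R tuple on both C and D
    {"C": 3, "D": 0, "E": 10},  # matches on C only, so it is not joined
]

Q = natural_join(R, S)
print(Q)
```

Merging the two dicts keeps a single C and a single D in each result tuple, giving the schema Q(A,B,C,D,E).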
INNER JOIN versus OUTER JOIN

• All the join operations so far are known as inner joins
• A tuple in one relation that does not have any matching
tuples in the other relation (based on the join condition)
does not appear in the join result when using INNER JOIN
• If the user wants every tuple to appear at least once in the
join result, another form of join known as OUTER JOIN can be
used
• OUTER JOINs are discussed later in this chapter
Complete Set of Relational Operations
• The set of operations {SELECT σ, PROJECT π, UNION ∪,
DIFFERENCE –, RENAME ρ, and CARTESIAN PRODUCT
×} is called a complete set because any relational
algebra expression using the operations presented so
far can be expressed as a combination of these six operations.
• For example:
– R ∩ S = (R ∪ S ) – ((R − S) ∪ (S − R))
– R ⋈ <condition>S = σ <condition>(R × S)
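Both identities can be checked on small relations. In this sketch tuples are plain positional Python tuples, the data is invented, and `product`, `select`, and `join` are hypothetical helpers that use only operations from the complete set.

```python
# Two invented, type-compatible relations
R = {(1, "x"), (2, "y"), (3, "z")}
S = {(2, "y"), (3, "z"), (4, "w")}

# INTERSECTION expressed with UNION and DIFFERENCE only
intersection = (R | S) - ((R - S) | (S - R))

def product(r, s):
    """Cartesian product: concatenate every tuple of r with every tuple of s."""
    return {tr + ts for tr in r for ts in s}

def select(relation, condition):
    return {t for t in relation if condition(t)}

def join(r, s, condition):
    """JOIN expressed as a SELECT applied to a CARTESIAN PRODUCT."""
    return select(product(r, s), condition)

# Join condition: first column of R equals first column of S
joined = join(R, S, lambda t: t[0] == t[2])

print(intersection == (R & S))
print(sorted(joined))
```

A real DBMS would not materialise the full product before filtering; the identity describes what a join computes, not how it is evaluated.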
DIVISION operation
• DIVISION Operation (see Figure 6.8, next slide)
– The division operation is applied to two relations
– R(Z) ÷ S(X), where X ⊆ Z. Let Y = Z – X (and hence Z = X ∪
Y); that is, let Y be the set of attributes of R that are not
attributes of S.
– The result of DIVISION is a relation T(Y) that includes a tuple t if
tuples tR appear in R with tR[Y] = t, and with
• tR[X] = tS for every tuple tS in S.
– For a tuple t to appear in the result T of the DIVISION, the values
in t must appear in R in combination with every tuple in S.
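The definition of DIVISION can be sketched directly: collect the Y-values that occur in R, then keep those that are paired with every X-value in S. The function name `divide`, the relation names, and the sample rows (which employees work on all the listed projects) are assumptions made for this illustration.

```python
def divide(r, s, x_attrs, y_attrs):
    """R(Z) / S(X): Y-values of r that appear in r together with every tuple of s."""
    s_values = {tuple(ts[a] for a in x_attrs) for ts in s}
    r_pairs = {
        (tuple(tr[a] for a in y_attrs), tuple(tr[a] for a in x_attrs))
        for tr in r
    }
    candidates = {y for y, _ in r_pairs}
    return {y for y in candidates if all((y, x) in r_pairs for x in s_values)}

# Invented sample data: R(ESSN, PNO) and S(PNO)
WORKS_ON = [
    {"ESSN": "123456789", "PNO": 1},
    {"ESSN": "123456789", "PNO": 2},
    {"ESSN": "453453453", "PNO": 1},
    {"ESSN": "453453453", "PNO": 2},
    {"ESSN": "333445555", "PNO": 2},
    {"ESSN": "666884444", "PNO": 3},
]
PROJECTS = [{"PNO": 1}, {"PNO": 2}]

# Employees who work on every project listed in PROJECTS
result = divide(WORKS_ON, PROJECTS, x_attrs=["PNO"], y_attrs=["ESSN"])
print(sorted(result))
```

Employee "333445555" is excluded because it is paired with project 2 but not with project 1.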
Main Reference
1. Chapter 5: The Relational Data Model and Relational
Database Constraints
2. Chapter 8: The Relational Algebra
(Fundamentals of Database Systems, Global Edition,
7th Edition (2017) by Ramez Elmasri & Shamkant
Navathe)
Additional References
https://courses.cs.vt.edu/cs4604/Spring21/pdfs/2-dbmodel.pdf

Thank You
