You are on page 1of 55

Introduction to

Relational Model
Outline
 Structure of Relational Databases
 Database Schema and Instance
 Keys
 Relational Query Languages
 The Relational Algebra
Example of a Relation: Instructor relation
attributes
(or columns)

tuples
(or rows)
Example of a Relation: teaches relation
Attribute
 The set of allowed values for each attribute is called the
domain of the attribute
 Roll #: Alphanumeric string
 Fname, Lname: Alphabetic string
 DoB: Date
 Passport #: String (Letter followed by 7 digits)- nullable
 Aadhaar #: 12-digit number
 Department: Alphabetic string
 Attribute values are (normally) required to be atomic;
that is, indivisible
 The special value null is a member of every domain. It
indicates that the value is “unknown”
Students
Roll # Fname Lname DoB Passport # Aadhaar # Department
15CS1026 Lalit Dubey 27-Mar-1997 L4032464 172861749239 Computer
16EE3029 Jatin Chopra 17-Nov-1996 null 152861649112 Mathematics
Database Schema and Instance
 A1, A2, A3…..,An are attributes
 R = (A1, A2, A3…..,An) is a relation schema
 Example:
instructor = (ID, name, dept_name, salary)
 Formally, for given sets D1, D2,…….Dn, a relation r is a subset of
D1 x D2 x…….x Dn
Thus, a relation is a set of n-tuples (a1, a2, ……., an) where
each ai ϵ Di
 The current values (relation instance) of a relation are specified
by a table.
 An element t of r is a tuple, represented by a row in a table.
Database Schema and Instance
 Database schema -- is the logical structure of the database.
 Database instance -- is a snapshot of the data in the
database at a given instant of time.
 Example:
 schema: instructor (ID, name, dept_name, salary)
 Instance:
Relations are Unordered
 Order of tuples is irrelevant (tuples may be stored in an
arbitrary order)
 Example: instructor relation with unordered tuples
Keys
 Let K  R
 K is a super key of R if values for K are sufficient to identify a
unique tuple of each possible relation r(R)
 Example: {ID} and {ID, name} are both superkeys of instructor.
 Superkey K is a candidate key if K is minimal
Example: {ID} is a candidate key for Instructor
 One of the candidate keys is selected to be the primary key.
 A surrogate key (or synthetic key) in a database is a unique
identifier for an entity set in the modeled world (or database)
 The surrogate key is not derived from application data, unlike a natural
(or business) key, which is derived from application data
Keys
 Super Key: Roll#, {Roll#, DoB}, …………
 Candidate Keys: Roll#, Aadhaar#, {Fname, Lname}
 Passport# cannot be a key. Why?
 Primary Key: Roll#
 Secondary/Alternate Key: {Fname, Lname}, Aadhaar#
 Simple Key: Consists of a single attribute
 Composite Key: {Fname, Lname}
 Consists of more than one attribute to uniquely identify an entity
 One or more of the attributes, which make up the key, are not simple
keys in their own right
Students
Roll # Fname Lname DoB Passport # Aadhaar # Department
15CS1026 Lalit Dubey 27-Mar-1997 L4032464 172861749239 Computer
16EE3029 Jatin Chopra 17-Nov-1996 null 152861649112 Mathematics
16EC2016 Smriti Mongra 23-Dec-1996 G3200125 190852699005 Electronics
16CE4038 Ramdin Minz 21-Jan-1995 X8300127 165721079131 Civil
Keys
 Foreign key constraint: Value in one relation must appear in another
relation
 Referencing relation (Jo kar rha hai)
 Enrolment: Foreign keys- Roll#, Course#
 Referenced relations (Jo use ho rha hai)
 Students, Courses
 A compound key consists of more than one attribute to uniquely
identify an entity occurrence
 Each attribute, which makes up the key, is a simple key in its own right
 {Roll#, Course#}
Students
Roll # Fname Lname DoB Passport # Aadhaar # Department

Courses
Course# Course Name Credits L-T-P Department
Enrolment

Roll # Course# Instructor ID


Relational Query Languages
Procedural vs. Non-procedural (or Declarative) languages
 Procedural programming requires that the programmer tell
the computer what to do
 That is, how to get the output from given inputs
 The programmer must know an appropriate algorithm
 Non-procedural programming: The programmer does not
provide a specific procedure for obtaining output
 The programmer only need to know what relationships hold between
input and output
 Example: Root of n is m such that m2 = n
Relational Query Languages
 “Pure” query languages:
 Relational algebra
 Tuple relational calculus
 Domain relational calculus
 The above 3 pure languages are equivalent in computing
power
 We will concentrate here on relational algebra
Relational Algebra
 A query language consisting of a set of operations that take
one or two relations as input and produces a new relation
as the output
 Basic operators

Select: 
 Project: 
 Union: 
 Set difference: –
 Cartesian product: x
 Rename: 
 These operators take one or two relations as inputs and
produce a new relation as output
Select Operation-Selection of rows (tuples)
 The select operation selects tuples that satisfy a given predicate
 Notation:  p(r)
 p is called the selection predicate
 Example: Select those tuples of the instructor relation where the
instructor is from the “Physics” department
 Query

 dept_name=“Physics” (instructor)

 Result
Select Operation (Cont.)
 We allow comparisons using
=, , >, , <, 
in the selection predicate.
 We can combine several predicates into a larger predicate by using
the connectives:
 (and),  (or),  (not)
 Example: Find the instructors in Physics with a salary greater than
$90,000, we write:

 dept_name=“Physics” ^ salary > 90000 (instructor)

 Select predicate may include comparisons between two attributes


 Example: Find all departments whose name is the same as

their building name:



 dept_name=building (department)
Select Operation (Cont..)

 Relatio
A B C D
nr
  1 7
  5 7
  12 3
  23 10

A=B ^ D > 5 (r)


A B C D

  1 7
  23 10
Project Operation- selection of columns (attributes)
 A unary operation that returns its argument relation, with
certain attributes left out
 Notation:
 A1,A2,A3 ….Ak (r)
where A1, A2 are attribute names and r is a relation name
 The result is defined as the relation of k columns obtained
by erasing the columns that are not listed
 Duplicate rows removed from result, since relations are
sets
Project Operation (Cont..)

A B C
 Relation r:
 10 1
 20 1
 30 1
 40 2

• A,C (r) A C A C

 1  1
 1 =  1
 1  2
 2
Project Operation (Cont.)
 Example: Eliminate the dept_name attribute of instructor
 Query:
ID, name, salary (instructor)

 Result:
Union Operation
 The union operation allows us to combine two relations
 Notation: r  s
 For r  s to be valid:
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (Example: 2nd
column of r deals with the same type of values as the 2nd column
of s does)
Union of two relations
 Relations r, s: A B A B

 1  2
 2  3
 1 s
r

A B
 r  s:
 1
 2
 1
 3
Union Operation (Cont.)
 Example: Find course id of all courses taught in the Fall 2009
semester, or in the Spring 2010 semester, or in both from section
relation
course_id ( semester=“Fall” Λ year=2009 (section)) 
course_id ( semester=“Spring” Λ year=2010 (section))
Union Operation (Cont.)
Union Operation (Cont.)
 Result of:

course_id ( semester=“Fall” Λ year=2009 (section)) 

course_id ( semester=“Spring” Λ year=2010 (section))


Set difference of two relations
 Relations r, s:
A B A B

 1  2
 2  3
 1 s
r

 r – s:
A B

 1
 1
Set Difference Operation
 The set-difference operation allows us to find tuples that are in one
relation but are not in another
 Notation: r – s
 Set differences must be taken between compatible relations
 r and s must have the same arity (same number of attributes)
 Attribute domains of r and s must be compatible
 Example: Find course id of all courses taught in the Fall 2009
semester, but not in the Spring 2010 semester

course_id ( semester=“Fall” Λ year=2009 (section)) −

course_id ( semester=“Spring” Λ year=2010 (section))


Cartesian-Product Operation
 Relations r, s:
A B C D E

 1  10 a
 10 a
 2  20 b
r  10 b

 r x s: s
A B C D E
 1  10 a
 1  10 a
 1  20 b
 1  10 b
 2  10 a
 2  10 a
 2  20 b
 2  10 b
Cartesian-Product Operation
 The Cartesian-product operation (denoted by X) allows us to
combine information from any two relations
 Example: The Cartesian product of the relations instructor and
teaches is written as:
instructor X teaches
 We construct tuples of the resultant relation from each possible
pair of tuples- one from the instructor relation and one from the
teaches relation
 Since the instructor ID appears in both relations, we distinguish
between these attributes by attaching the name of the relations to
the attribute from which the attribute originally came
 instructor.ID
 teaches.ID
The Rename Operation
 Allows us to give a name corresponding to the results of relational-
algebra expressions
 Also allows us to give more than one names to a relation
 Example:
 x (E)

returns the expression E under the name X


 If a relational-algebra expression E has arity n, then
 x ( A ,A
1 2 ,..., An )
(E )
returns the result of expression E under the name X, and with the
attributes renamed to A1 , A2 , …., An
Composition of Operations
 We can build expressions using multiple operations
 Example: A=C(r x s) rxs

r s A B C D E

A B C D E  1  10 a
 1  10 a
 1  10 a  1  20 b
 10 a  1  10 b
 2  20 b  2  10 a
 10 b  2  10 a
 2  20 b
 2  10 b

A B C D E
A=C(r x s)
 1  10 a
 2  10 a
 2  20 b
Composition of Operations
 The result of a relational-algebra operation produces relation.
Therefore the relational-algebra operations can be composed
together into a relational-algebra expression
 Example: Find the names of all instructors from the Physics
department

name( dept_name =“Physics” (instructor))

 Instead of giving the name of a relation as the argument of


the projection operation, here we have given an expression
Additional Relational-Algebra Operations

 Additional Operations
 Set intersection
 Natural join
 Assignment Operation
 Aggregation
 Outer Join
 Division
Set intersection of two relations

 Relation r, s: A B A B
 1  2
 2  3
 1
s
r

A B

 rs  2
Set Intersection Operation
 The set-intersection operation allows us to find tuples that
are in both the input relations
 Notation: r  s
 Assume:
 r, s have the same arity
 Attributes of r and s are compatible
 Example: Find course id of all courses taught in both the Fall
2009 and the Spring 2010 semesters
course_id ( semester=“Fall” Λ year=2009 (section)) 
course_id ( semester=“Spring” Λ year=2010 (section))

 Result
The Natural-Join Operation
 The Cartesian-Product
instructor X teaches
associates every tuple of instructor with every tuple of teaches
 Most of the resulting rows have information about instructors who did
NOT teach a particular course
 To get only those tuples of “instructor X teaches” that pertain to
instructors and the courses that they taught, we write:
 instructor.id = teaches.id (instructor x teaches))

 We get only those tuples of “instructor X teaches” that pertain to


instructors and the courses that they taught
 The result of this expression, shown in the next slide
The Natural-Join Operation (Cont.)
 The table corresponding to:
 instructor.id = teaches.id (instructor x teaches))
The Natural-Join Operation (Cont.)

 The natural join is a binary operation that allows us to combine


certain selections and a Cartesian product into one operation
 It forms a Cartesian product of its two arguments, performs a
selection on those attributes that appear in both relation
schemas, and finally removes duplicate attributes
 Notation: r s
 Example:
r = (A, B, C, D)
s = (E, B, D)
 Resultant schema = (A, B, C, D, E)
 r s is defined as:
r.A, r.B, r.C, r.D, s.E (r.B = s.B  r.D = s.D (r x s))
The Natural-Join Operation – Example
 Relations r, s:
A B C D B D E

 1  a 1 a 
 2  a 3 a 
 4  b 1 a 
 1  a 2 b 
 2  b 3 b 
r s

 r s
A B C D E
 1  a 
 1  a 
 1  a 
 1  a 
 2  b 
The Theta Join Operation

 The theta join operation is a variant of the natural-join


operation that allows us to combine a selection and a
Cartesian product into a single operation
 The theta join operation is defined as follows:
The Assignment Operation
 It is convenient at many times to write a relational-algebra
expression by assigning parts of it to temporary relation variables
 The assignment operation is denoted by  and works like
assignment in a programming language
 Example: Find all instructor details in the “Physics” and “Music”
department

Physics   dept_name=“Physics” (instructor)


Music   dept_name=“Music” (instructor)
result = Physics  Music

 With the assignment operation, a query can be written as a


sequential program consisting of a series of assignments followed by
an expression whose value is displayed as the result of the query
Aggregate Functions and Operations
 Aggregate functions: Take a collection of
values and returns a single value as a result
avg: Returns average value
min: Returns minimum value
max: Returns maximum value
sum: Returns sum of values
count: Returns number of values
 The general form of the aggregation operation
is as follows:
Aggregate Functions – Example
 Relation r:
A B C

  7
  7
  3
  10

 g sum(c) (r) sum(c )

27
Aggregate Operation – Example
 There are circumstances where we would like to
apply the aggregate function not to a single set of
tuples, but instead to a group of sets of tuples.
branch_nameaccount_number balance
e.g. Perryridge A-102 400
Perryridge
 Relation account A-201 900
Brighton
grouped by
A-217
branch-name:
750
Brighton A-215 750
Redwood A-222 700

branch_name g sum(balance) (account)


branch_name sum(balance)
Perryridge 1300
Brighton 1500
Redwood 700
Aggregate Functions (Cont.)
 Result of aggregation does not have a name
 We can use rename operation to give it a name
 For convenience, we permit renaming as part of aggregate operation

branch_name g sum(balance) as sum_balance


(account)
Aggregate Functions (Cont.)
 There are cases where we must eliminate multiple occurrences of
a value before computing an aggregate function
 If we want to eliminate duplicates, we use hyphenated string
“distinct” at the end of the “count” function
 Example: Find the total number of instructors who teach a course
in the Spring 2010 semester using teaches table
 In this case, an instructor counts only once, regardless of the number
of course sections that the instructor teaches
Outer Join
 An extension of the join operation that avoids loss of information
 Computes the join and then adds tuples from one relation that does not
match tuples in the other relation to the result of the join
 Use null values:
 null signifies that the value is unknown or does not exist
Outer Join – Example
 Relation loan
loan_numberbranch_nameamount
L-170 Downtown 3000
L-230 Redwood 4000
L-260 Perryridge 1700

 Relation borrower

customer_name
loan_number
Jones L-170
Smith L-230
Hayes L-155
Outer Join – Example
 Join: loan borrower
loan_number branch_nameamount customer_name
L-170 Downtown 3000 Jones
L-230 Redwood 4000 Smith

• Left Outer Join: Takes all tuples in the left relation that did
not match with any tuple in the right relation, pads the
tuples with null values for all other attributes from the right
relation, and adds them to the result of the natural join
loan borrower

loan_number branch_nameamount customer_name


L-170 Downtown 3000 Jones
L-230 Redwood 4000 Smith
L-260 Perryridge 1700 null
Outer Join – Example
• Right Outer Join: Takes all tuples in the right relation
that did not match with any tuple in the left relation,
pads the tuples with null values for all other attributes
from the left relation, and adds them to the result of
the natural join
loan borrower

loan_number branch_nameamount customer_name


L-170 Downtown 3000 Jones
L-230 Redwood 4000 Smith
L-155 null null Hayes
Outer Join – Example
• Full Outer Join: Does both the left and right outer join
operations
loan borrower
loan_number branch_nameamount customer_name
L-170 Downtown 3000 Jones
L-230 Redwood 4000 Smith
L-260 Perryridge 1700 null
L-155 null null Hayes
Division Operation
r
s
 Notation:
 Let r and s be relations on schemas R
and S respectively where
 R = (A1, …, Am , B1, …, Bn )
 S = (B1, …, Bn)
The result of r  s is a relation on schema
R – S = (A1, …, Am)
r  s = { t | t   R-S (r)   u  s
( tu  r ) }
Division Operation – Example
 Relations r, s:
A B B
 1
1
 2
 3 2
 1 s
 1
 1
 3
 4
 6
 1
 2
 r  s: A r



Another Division Example
 Relations r,
s: A B C D E D E

 a  a 1 a 1
 a  a 1 b 1
 a  b 1 s
 a  a 1
 a  b 3
 a  a 1
 a  b 1
 a  b 1
r
 r  s:
A B C

 a 
 a 
End

You might also like