CMPT 354

Database Systems I

Chapter 3 – Relational Data Model

Summer 2006 SFU - CMPT 354 - Zinovi Tauber


Benefit of Relational Model
• Nearly all databases are based on a relational
model.
– IBM, Microsoft, Oracle, Sybase, Informix, etc…
• Single mathematical concept: Relation.
• Relations allow a high-level data manipulation
language – SQL.
• Object-oriented databases are competitors.
– Object Store, Versant, Ontos.
• The object-relational model emerges.
– Informix, Oracle, IBM.
Relational Model Basics
• A relation is a set of tuples, represented by a
named two-dimensional table.
• Attributes (or fields) are stored in columns.
• Tuples (or records) are stored in rows.
• The first row contains the attribute names.
• Attributes have a domain – an atomic type.
• A tuple has one component for each attribute.
• The relation is the only way to store data ⇒ a
relational database is a collection of relations.



Relational Schema
• A Schema for a relation is represented by the
name of the relation followed by a parenthesized
list of attributes.
• Example: accounts(account#:int, balance: real)
• Attributes are referenced by name, not column
locations ⇒ column locations are allowed to
change.
• When column names are not specified, the
schema default order is assumed.
• Column names must be unique.
Relation Instances
• A relation instance is a particular collection of
tuples.
• The order of tuples is not significant.
• There cannot be duplicate tuples. Why?
• A relation instance R can be defined as:
– R ⊆ D1 × D2 × … × Dn, where Di is the domain of the i-th attribute.
• The cardinality of R is the number of rows, |R|.
• The degree (arity) of R is the number of columns, n.
Relation Example
• A possible relation instance shown:

Accounts
Account# Balance
3372183 500.00
6533341 1.00
6334234 -48.65
5643245 0.00
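A minimal Python sketch of the definitions above, treating the Accounts instance as a set of tuples. It assumes Python's `int` and `float` stand in for the int and real domains; the names `ACCOUNT_ATTRS`, `DOMAINS` and `in_domains` are illustrative, not part of any DBMS API.

```python
# Sketch: a relation instance as a Python set of tuples.
# Assumption: int and float play the roles of the int and real domains.
ACCOUNT_ATTRS = ("Account#", "Balance")
DOMAINS = (int, float)

accounts = {
    (3372183, 500.00),
    (6533341, 1.00),
    (6334234, -48.65),
    (5643245, 0.00),
}

# Adding a tuple that is already present changes nothing: no duplicates.
accounts.add((3372183, 500.00))

# Order is irrelevant: the same tuples listed differently give the same relation.
reordered = {(5643245, 0.00), (6334234, -48.65), (6533341, 1.00), (3372183, 500.00)}


def in_domains(t):
    """True if t has one component per attribute, each from its domain."""
    return len(t) == len(DOMAINS) and all(isinstance(v, d) for v, d in zip(t, DOMAINS))


cardinality = len(accounts)   # |R|, the number of rows
arity = len(ACCOUNT_ATTRS)    # n, the number of columns

print(cardinality, arity)                    # 4 2
print(accounts == reordered)                 # True
print(all(in_domains(t) for t in accounts))  # True
```

A set is used rather than a list precisely because it enforces the two properties on the slide: no duplicates and no meaningful order.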



From E-R to Relations
• Basic guideline for converting E-R diagrams to
relational model:
– Each entity set is described as a relation
– Each relationship set is described as a relation whose attributes are
the keys of all connected entity sets (as foreign keys) plus its own attributes.
• There are some exceptions:
– Weak entity sets (foreign keys, relationship sets)
– “ISA” entity sets have a few modeling options.
– Combining relations can be a better design (when?)



Conversion Examples
• Converting an entity set to a relation schema:
In E-R: entity set clients with attributes SSN, Name, Address, Phone

Relation Schema:
Clients(SSN:int, Name:String, Address:String, Phone:int)

• The relation schema records nothing about the relationship sets clients participate in.


Conversion Examples
• Converting a relationship set to relation schema:
In E-R: clients (SSN, name, address, phone) and accounts (Acct #, balance), connected by relationship set hold with attribute join date

Relation Schema:
hold(SSN:int, Account#:int, joinDate:Date)



Relationships with roles
• Converting a relationship set with roles:
In E-R: relationship set Joint Account connects clients (SSN, name, address, phone) to itself, with roles Primary and Joint

Relation Schema:
JointAccount(PrimarySSN:int, JointSSN:int)



Combining Relations
• Some relations are not necessary and merely complicate
the design.
• Many-to-one relationship sets commonly do not need a relation
of their own.
• If R is a many-to-one relationship set from entity set A
to entity set B, then we can combine R and A by merging their
relations, or build the combined relation directly from the E-R diagram as follows:
– Include all the attributes of A.
– Include the key attributes of B.
– Include the attributes of R.
• The tuples of A that are not in R have NULL for
attributes of B and R. Consider space efficiency!
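The merge can be sketched in Python for the Hold/Accounts case in the combining example that follows, assuming hold is many-to-one from accounts to clients. The data values and the helper name `combine` are hypothetical, and `None` plays the role of NULL.

```python
# Hypothetical instances; None stands in for NULL.
accounts = {(3372183, 500.00), (6533341, 1.00)}   # A: (Account#, balance)
hold = {(12321, 3372183, "2006-05-01")}            # R: (SSN, Account#, joinDate)


def combine(a_rel, r_rel):
    """Merge a many-to-one relationship R (from accounts to clients) into A.

    Each result tuple is (SSN, Account#, balance, joinDate): all attributes
    of A, the key of B (SSN) and the attributes of R (joinDate).
    """
    # Index R by A's key; many-to-one means each Account# occurs at most once.
    by_account = {}
    for ssn, acct, join_date in r_rel:
        if acct in by_account:
            raise ValueError(f"Account {acct} has several holders: not many-to-one")
        by_account[acct] = (ssn, join_date)

    combined = set()
    for acct, balance in a_rel:
        # Accounts without a holder get NULL for SSN and joinDate.
        ssn, join_date = by_account.get(acct, (None, None))
        combined.add((ssn, acct, balance, join_date))
    return combined


merged = combine(accounts, hold)
for row in sorted(merged, key=lambda r: r[1]):
    print(row)
# (12321, 3372183, 500.0, '2006-05-01')
# (None, 6533341, 1.0, None)
```

Each `None` pair in the output is the space cost the slide warns about.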
Combining Example
In E-R: clients (SSN, name, address, phone) and accounts (Acct #, balance), connected by hold (join date); clients also connect to themselves through joint with, in roles primary and joint

• Relational Schema:
Clients(SSN:int, Name:String, Address:String, Phone:int)
Accounts(Account#:int, balance: real)
Hold(SSN:int, Account#:int, joinDate:Date)
JointAccount(PrimarySSN:int, JointSSN:int)
Combining Example
In E-R: clients (SSN, name, address, phone) and accounts (Acct #, balance), connected by hold (join date); clients also connect to themselves through joint with, in roles primary and joint

• Relational Schema:
Clients(SSN:int, Name:String, Address:String, Phone:int)
Accounts(SSN:int, Account#:int, balance: real, joinDate:Date)
JointAccount(PrimarySSN:int, JointSSN:int)



Converting Weak Sets
• The relation of a weak entity set must include the keys of all
supporting entity sets (as foreign keys), since they form part of
its primary key.
• Supporting relationships are either redundant or
have attributes that can be assigned to the weak
entity set.
• Remember: supporting relationships are many-to-one.
• Other relationships with the weak entity set must
include the entire key, including foreign keys.



Weak Set Example
In E-R: accounts (Acct #, balance), which connects to the rest of the E-R diagram, supports the weak entity set Mortgage (type, amount) through lends

• Relational Schema:
Clients(SSN:int, Name:String, Address:String, Phone:int)
Accounts(SSN:int, Account#:int, balance: real, joinDate:Date)
JointAccount(PrimarySSN:int, JointSSN:int)
Mortgage(Account#:int, type:string, amount:real)
Lends(Account#:int, MortgageAcc#:int, type:string)
Weak Set Example
In E-R: accounts (Acct #, balance), which connects to the rest of the E-R diagram, supports the weak entity set Mortgage (type, amount) through lends

• Relational Schema:

Mortgage(Account#:int, type:string, amount:real)
Lends(Account#:int, MortgageAcc#:int, type:string, amount:real)
– Account# = MortgageAcc#, Why?
– Lends(Account#:int, type:string) is part of the Mortgage relation.
E-R ISA to Relation
• Several design options for conversion:
• E-R Style Conversion:
– Create a relation for each entity set in the hierarchy.
– The relations include key attributes from the root.
• Objects of a Single Class:
– Create a relation for each possible subtree including the root.
– The relation schema includes all the attributes of all the entity
sets in the subtree.
• All Encompassing Relation:
– Create a relation that has all the attributes of all the entity sets in
the hierarchy.
– Empty components (tuple attributes) are represented by NULL.
E-R Style Conversion
• Each entity set in the ISA hierarchy is converted
into a relation.
• The ISA relationship itself is not modeled.
• Each relation has the key of the root relation.
• The root key also serves as the foreign key in any
relationship involving a relation of the hierarchy.
• The resulting relations do not depend on the covering constraint.
– Why not?
• The number of relations is the number of entity
sets.
E-R Conversion Example
In E-R: accounts (Acct #, balance) has ISA subclasses chequing (card #, # trans) and savings (interest); chequebooks (style#, company, description) connects through issued

• Relational Schema:
Accounts(Acct#:int, balance:real)
Savings(Acct#:int, interest:real)
Chequing(Acct#:int, card#:int, #trans:int)
ChequeBooks(style#:short, company:string, description:string)
Issued(Acct#:int, style#:short)
Object-oriented Conversion
• Entities are treated as objects, as in an object-oriented
approach ⇒ each entity belongs to exactly one class.
• Every possible subtree with the root included is
considered a class.
• Create a relation for every possible subtree with
all attributes of all the entity sets in the subtree.
• Possible subtrees depend on covering and
overlap constraints. How?
• The number of relations can be on the order of
2^|entity sets|.
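The subtree enumeration can be sketched in Python for the accounts/chequing/savings hierarchy used in the following examples. The helper `rooted_subtrees` is hypothetical, and it ignores covering and overlap constraints; those would only remove candidates (for instance, disjoint subclasses rule out the subtree containing both chequing and savings).

```python
from itertools import product

# ISA hierarchy as parent -> children (the running accounts example).
ISA = {"accounts": ["chequing", "savings"], "chequing": [], "savings": []}
ATTRS = {
    "accounts": ["Acct#", "balance"],
    "chequing": ["card#", "#trans"],
    "savings": ["interest"],
}


def rooted_subtrees(node, children):
    """All subtrees containing `node`, each as a frozenset of entity sets.

    For every child we either leave its branch out or attach one of the
    subtrees rooted at that child; the product covers all combinations.
    """
    options = [[frozenset()] + rooted_subtrees(c, children) for c in children[node]]
    return [frozenset([node]).union(*choice) for choice in product(*options)]


subtrees = rooted_subtrees("accounts", ISA)
print(len(subtrees))  # 4
for tree in sorted(subtrees, key=len):
    # One relation per subtree, holding every attribute of its entity sets.
    schema = [a for es in ISA if es in tree for a in ATTRS[es]]
    print(sorted(tree), schema)
```

With k subclasses directly under the root this yields 2^k relations, which is where the exponential growth comes from.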



OO Conversion Example
In E-R: accounts (Acct #, balance) has ISA subclasses chequing (card #, # trans) and savings (interest); chequebooks (style#, company, description) connects through issued

• Relational Schema:
Accounts(Acct#:int, balance:real)
Savings(Acct#:int, interest:real, balance:real)
Chequing(Acct#:int, card#:int, #trans:int, balance: real)
ChequingAndSavings(Acct#:int, card#:int, #trans:int,
interest:real, balance: real)
Issued(Acct#:int, style#:short)
All Encompassing Relation
• We can translate the entire isa hierarchy as a
single relation.
• The relation has attributes from all entity sets in
the hierarchy.
• Attributes from one subclass may not be defined
for tuples from another subclass, so they are set
to NULL.
• Therefore, we must assume all attributes except
for the root attributes can be set to NULL.
• Only one relation necessary.
One Relation Example
In E-R: accounts (Acct #, balance) has ISA subclasses chequing (card #, # trans) and savings (interest); chequebooks (style#, company, description) connects through issued

• Relational Schema:
Accounts(Acct#:int, balance:real, card#:int, #trans:int,
interest:real)
ChequeBooks(style#:short, company:string, description:string)
Issued(Acct#:int, style#:short)



ISA Conversion Comparison
• Generally prefer a smaller number of relations:
– Object-oriented has exponential number of relations, which is
generally undesirable for many entity sets.
• Want to support queries efficiently:
– Single relation approach supports queries regarding all the
attributes most efficiently.
– But, it loses the semantic information contained by the name of
the entity sets.
• Redundancy and space efficiency:
– In the object-oriented approach every entity appears in exactly one
relation, and every attribute of that relation applies to it, so it is the most space efficient.
– The single relation method has NULLs for some attributes.
– E-R style conversion repeats an entity's key in every relation of the hierarchy it belongs to.
Functional Dependencies
• A form of uniqueness constraint.
• A Functional Dependency (FD) on a relation R is a
constraint that fixing the values of attributes A1,A2,
…,An also fixes the value of another attribute B of R:
any two tuples that agree on A1,A2,…,An must agree on B.
• In math we write a function as:
F(A1,A2,…,An) = B.
Meaning B depends on the values of
{A1,A2,…,An} through some function F.
• We denote FD as:
A1A2…An → B
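An FD constrains every legal instance, so one instance can refute an FD but never prove it. This Python sketch (the helper name `fd_holds` is hypothetical) tests whether an instance is consistent with an FD, using the Students rows from the anomalies example later in this chapter.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`.

    rows: iterable of dicts (attribute name -> value).
    lhs, rhs: tuples of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


students = [
    {"SID": 6637284, "Name": "Mike", "Address": "Burnaby", "Phone": 604555, "LoginID": "mike@sfu.ca"},
    {"SID": 7398385, "Name": "Rob", "Address": "Vancouver", "Phone": 604666, "LoginID": "rob@sfu.ca"},
    {"SID": 6637284, "Name": "Mike", "Address": "Burnaby", "Phone": 604555, "LoginID": "mike2@sfu.ca"},
]

print(fd_holds(students, ("SID",), ("Name", "Address", "Phone")))  # True
print(fd_holds(students, ("SID",), ("LoginID",)))                  # False
print(fd_holds(students, ("LoginID",), ("SID",)))                  # True
```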
FD Splitting/Combining
• For a set of FDs:
A1A2…An → B1
A1A2…An → B2
⋮
A1A2…An → Bm
we can combine all the right sides into
A1A2…An → B1B2…Bm.

• Can also split it back. Why?


• Can we split the left side?
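A small instance illustrates both questions. In this Python sketch (the relation R(A, B, C), its rows and the helper `fd_holds` are hypothetical), splitting the right side always agrees with the combined FD, while AB → C holds even though neither A → C nor B → C does, so the left side cannot be split.

```python
from itertools import permutations


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on every attribute in lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical instance of R(A, B, C).
R = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 2},
    {"A": 2, "B": 1, "C": 2},
]

# Right side: X -> YZ holds exactly when X -> Y and X -> Z both hold.
for x, y, z in permutations("ABC"):
    assert fd_holds(R, (x,), (y, z)) == (fd_holds(R, (x,), (y,)) and fd_holds(R, (x,), (z,)))

# Left side: AB -> C holds, yet neither A -> C nor B -> C does.
print(fd_holds(R, ("A", "B"), ("C",)))  # True
print(fd_holds(R, ("A",), ("C",)))      # False
print(fd_holds(R, ("B",), ("C",)))      # False
```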
FD Example
• The relation corresponding to the students entity set (attributes SID, name, address, phone, loginId) is:
Students(SID:int, Name:string, Address:string, Phone:int, LoginID:string)

• The functional dependencies we have are:


– SID → Name Address Phone
– SID → LoginID? Depends on the specification.
– LoginID → SID
– LoginID → SID Name Address Phone



Keys of Relations
• A set of attributes {A1,A2,…,An} is a candidate key for a
relation R if and only if:
i. A1A2…An → B1 B2…Bm where {B1 B2…Bm} are the rest of the
attributes of R.
ii. No proper subset of {A1,A2,…,An} functionally determines all
attributes of R.
• Condition ii is the principle that a candidate key must be
minimal.
• The candidate key chosen to reference tuples is called the
primary key.
• Any set {A1,A2,…,An} that contains a candidate key is
called a superkey.
Primary Key Example
• In the student entity set, what are the candidate keys?
Students(SID:int, Name:string, Address:string, Phone:int, LoginID:string)

• The candidate keys we can have are:


– LoginID → SID? LoginID is a candidate key
– SID → LoginID? SID is a candidate key
– Neither: {LoginID, SID} is a candidate key.



Relationship Primary Key
In E-R: clients (SSN, name, address, phone) and accounts (Acct #, balance), connected by hold (join date)

Hold(SSN:int, Acct#:int, joinDate:Date)

• The candidate keys we can have are:


– Acct# → SSN joinDate if an account only belongs to one client.
– SSN → Acct# joinDate if a client has at most one account.
– Neither: {SSN, Acct#} is a candidate key.
– Both: Hold is 1:1, both SSN and Acct# are candidate keys.
Complete Inference Rules
• A complete set of inference rules, called
Armstrong's axioms, derives every FD that follows
from a given set of FDs.
1. Reflexivity: if {B1 B2…Bm} ⊆ {A1,A2,…,An} then
A1A2…An → B1 B2…Bm. These are called trivial
FDs.
2. Augmentation: if A1A2…An → B1 B2…Bm then
A1A2…An C1C2…Cl → B1 B2…Bm C1C2…Cl for
any C1C2…Cl.
3. Transitivity: if A1A2…An → B1 B2…Bm and B1
B2…Bm → C1C2…Cl then A1A2…An → C1C2…Cl.
Closure Under FDs
• The closure of a set {A1,A2,…,An} of attributes
under a set S of FDs is a set B of attributes such
that A1A2…An → B and whenever A1A2…An → C
under S then C ⊆ B.
• The closure of a set A= {A1,A2,…,An} is
denoted as A+= {A1,A2,…,An}+.
• Can also derive a full set of FDs for a relation
from a given set S. The full set S+ is the closure
of FDs, and S is a basis.
• S is a minimal basis if no T ⊂ S has T+=S+.
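The closure is commonly computed with a fixed-point loop: start from the given attributes and keep adding the right side of any FD whose left side is already contained in the result. A Python sketch, assuming FDs are given as (left, right) pairs of attribute sets; `closure`, `implies` and `FDS` are hypothetical names. An FD A1A2…An → B follows from S exactly when B ⊆ {A1,A2,…,An}+.

```python
def closure(attrs, fds):
    """Return attrs+ : every attribute determined by `attrs` under `fds`.

    fds: list of (lhs, rhs) pairs, each a frozenset of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole left side is derived, the right side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """S implies lhs -> rhs exactly when rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)


# The Students FDs used in this chapter.
FDS = [
    (frozenset({"SID"}), frozenset({"Name", "Address", "Phone"})),
    (frozenset({"LoginID"}), frozenset({"SID", "Name", "Address", "Phone"})),
]

print(sorted(closure({"SID"}, FDS)))         # ['Address', 'Name', 'Phone', 'SID']
print(sorted(closure({"LoginID"}, FDS)))     # ['Address', 'LoginID', 'Name', 'Phone', 'SID']
print(implies(FDS, {"LoginID"}, {"Phone"}))  # True
```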
Closure and Keys
• {A1,A2,…,An} is a superkey for relation R
if and only if {A1,A2,…,An}+ contains all
of R’s attributes. Why not candidate key?
• {A1,A2,…,An} is a candidate key for R if
and only if it is a superkey and no subset
{Ai,…,Aj} ⊂ {A1,A2,…,An} has {Ai,…,Aj}+
= {A1,A2,…,An}+.
• That is, removing any single Ai from
{A1,A2,…,An} leaves a set whose closure
no longer contains all attributes.
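The two tests translate directly into code. This Python sketch (hypothetical helpers `is_superkey`, `is_candidate_key` and `candidate_keys`) reuses the fixed-point closure and enumerates attribute sets by size; it is exponential in the number of attributes, which is acceptable for textbook-sized relations. The FDs are those of the CreditCards example later in the chapter.

```python
from itertools import combinations


def closure(attrs, fds):
    """attrs+ under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    return closure(attrs, fds) >= set(relation)


def is_candidate_key(attrs, relation, fds):
    attrs = set(attrs)
    # Minimal: removing any single attribute must break the superkey property.
    return is_superkey(attrs, relation, fds) and all(
        not is_superkey(attrs - {a}, relation, fds) for a in attrs
    )


def candidate_keys(relation, fds):
    return [
        set(combo)
        for n in range(1, len(relation) + 1)
        for combo in combinations(sorted(relation), n)
        if is_candidate_key(combo, relation, fds)
    ]


CREDIT_CARDS = {"Bank", "CardType", "CardNumber"}
FDS = [
    (frozenset({"Bank"}), frozenset({"CardType"})),
    (frozenset({"CardNumber", "CardType"}), frozenset({"Bank"})),
]
for key in candidate_keys(CREDIT_CARDS, FDS):
    print(sorted(key))
# ['Bank', 'CardNumber']
# ['CardNumber', 'CardType']
```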
Anomalies and Decomposition
• Anomalies are problems arising from database
relations having unrelated components.
1. Redundancy – e.g. several attributes must be
duplicated to store a set of values for one attribute.
2. Update anomalies – e.g. can update one instance of
duplicated attributes but not another.
3. Deletion anomalies – e.g. can delete valuable
information by deleting a tuple that isn’t duplicated.
• Anomalies can be resolved by decomposition, which
is splitting a relation into multiple relations:
{A1,A2,…,An} = {B1,B2,…,Bm} ∪ {C1,C2,…,Cl}.
• The decomposition is guided by the FDs.
Anomalies Example
• For the relation
Students(SID:int, Name:string, Address:string, Phone:int, LoginID:string)
• Assume the following FDs:
– SID → Name Address Phone
– LoginID → SID Name Address Phone
• Then we can have a relation instance:
Students (redundancy: Mike's Name, Address and Phone are stored twice)
SID      Name  Address    Phone   LoginID
6637284  Mike  Burnaby    604555  mike@sfu.ca
7398385  Rob   Vancouver  604666  rob@sfu.ca
6637284  Mike  Burnaby    604555  mike2@sfu.ca



Projection Example
Students
SID      Name  Address    Phone   LoginID
6637284  Mike  Burnaby    604555  mike@sfu.ca
7398385  Rob   Vancouver  604666  rob@sfu.ca
6637284  Mike  Burnaby    604555  mike2@sfu.ca

Student Details
SID      Name  Address    Phone
6637284  Mike  Burnaby    604555
7398385  Rob   Vancouver  604666

Student Accounts
SID      LoginID
6637284  mike@sfu.ca
7398385  rob@sfu.ca
6637284  mike2@sfu.ca
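Projection and the natural join can be sketched in Python to confirm that, for this instance, joining Student Details and Student Accounts on SID gives back exactly the original Students relation. The helpers `project` and `natural_join` are hypothetical. In general nothing is lost here because the shared attribute SID is a key of Student Details.

```python
STUDENTS_ATTRS = ("SID", "Name", "Address", "Phone", "LoginID")
students = {
    (6637284, "Mike", "Burnaby", 604555, "mike@sfu.ca"),
    (7398385, "Rob", "Vancouver", 604666, "rob@sfu.ca"),
    (6637284, "Mike", "Burnaby", 604555, "mike2@sfu.ca"),
}


def project(attrs, rows, keep):
    """Keep only the columns in `keep`; the set removes duplicate tuples."""
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(r[i] for i in idx) for r in rows}


def natural_join(left, right):
    """Join two (attrs, rows) relations on all attribute names they share."""
    (l_attrs, l_rows), (r_attrs, r_rows) = left, right
    common = [a for a in l_attrs if a in r_attrs]
    extra = [a for a in r_attrs if a not in l_attrs]
    rows = {
        lt + tuple(rt[r_attrs.index(a)] for a in extra)
        for lt in l_rows
        for rt in r_rows
        if all(lt[l_attrs.index(a)] == rt[r_attrs.index(a)] for a in common)
    }
    return tuple(l_attrs) + tuple(extra), rows


details = project(STUDENTS_ATTRS, students, ("SID", "Name", "Address", "Phone"))
accounts = project(STUDENTS_ATTRS, students, ("SID", "LoginID"))
joined_attrs, joined = natural_join(details, accounts)

print(len(details[1]), len(accounts[1]))                   # 2 3
print(joined_attrs == STUDENTS_ATTRS, joined == students)  # True True
```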



Boyce-Codd Normal Form
• Boyce-Codd Normal Form (BCNF) is a condition on relations,
with a matching decomposition rule, for avoiding anomalies:
A relation R is in BCNF if and only if whenever a nontrivial
FD A1A2…An → B holds for R, {A1A2…An} is a
superkey for R.
• Any two-attribute relation is in BCNF. Why?
• Decomposition rules:
– Find a nontrivial FD A1A2…An → B that violates BCNF. Include
in B all the attributes that depend on A1A2…An.
– Put all the attributes of the FD into one relation, and the left-hand
side together with all relation attributes not in the FD into another
relation.
– Repeat on the new relations until each one is in BCNF.
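The decomposition rules can be sketched in Python. To find violations in relations produced by earlier splits, this sketch tests every proper subset X of a relation's attributes using the closure under the original FDs: X violates BCNF when X+ (restricted to the relation) adds attributes but is not the whole relation. The brute-force search is exponential and the helper names are hypothetical; the usage reproduces the Students decomposition from the next example.

```python
from itertools import combinations


def closure(attrs, fds):
    """attrs+ under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_decompose(relation, fds):
    """Split `relation` (a set of attribute names) until every part is in BCNF."""
    relation = frozenset(relation)
    for n in range(1, len(relation)):
        for lhs in map(frozenset, combinations(sorted(relation), n)):
            # Include in the FD every attribute lhs determines within relation.
            plus = frozenset(closure(lhs, fds)) & relation
            # Nontrivial (adds attributes) and lhs is not a superkey: violation.
            if plus != lhs and plus != relation:
                r1 = plus                     # all attributes of the FD
                r2 = lhs | (relation - plus)  # lhs plus attributes not in the FD
                return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [relation]


STUDENTS = {"SID", "Name", "Address", "Phone", "LoginID"}
FDS = [
    (frozenset({"SID"}), frozenset({"Name", "Address", "Phone"})),
    (frozenset({"LoginID"}), frozenset({"SID", "Name", "Address", "Phone"})),
]
for part in bcnf_decompose(STUDENTS, FDS):
    print(sorted(part))
# ['Address', 'Name', 'Phone', 'SID']
# ['LoginID', 'SID']
```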



BCNF Example
• For the relation
Students(SID:int, Name:string, Address:string, Phone:int, LoginID:string)
• Assume the following FDs:
– SID → Name Address Phone
– LoginID → SID Name Address Phone
• What is the primary key for the relation?
– {SID}+ ={SID, Name, Address, Phone} is missing LoginID.
– {LoginID}+ ={LoginID, SID, Name, Address, Phone}.
• Does any FD violate BCNF rules?
– SID → Name, Address, Phone but SID is not a superkey.
• Decompose into two relations:
– SID Name Address Phone
– SID LoginID
Normalization Forms
• First Normal Form (1NF) requires all attributes to be
atomic. (Non-First Normal Form, NF², allows set-valued attributes.)
• Second Normal Form (2NF) requires every non-prime attribute to be
fully functionally dependent on each candidate key, not on only part of a key.
• Third Normal Form (3NF) states:
A relation R is in 3NF if and only if whenever a nontrivial
FD A1A2…An → B holds for R, either {A1A2…An} is a
superkey or B is a member of some candidate key.
• When is 3NF better than BCNF?
– 3NF might have some redundancy.
– All FDs can be maintained under 3NF decomposition.
• If B is a member of some candidate key, then B is called prime.
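The BCNF and 3NF conditions differ only in the "prime" escape clause, which a short Python sketch makes visible. It checks each listed FD, split so that each right side is a single attribute; checking only the listed FDs is sufficient when testing the original relation rather than a projection of it. The helper names are hypothetical, and the FDs are those of the CreditCards example that follows.

```python
from itertools import combinations


def closure(attrs, fds):
    """attrs+ under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(relation, fds):
    """Minimal superkeys, found by increasing size (skipping supersets of keys)."""
    keys = []
    for n in range(1, len(relation) + 1):
        for combo in map(set, combinations(sorted(relation), n)):
            if not any(k <= combo for k in keys) and closure(combo, fds) >= relation:
                keys.append(combo)
    return keys


def normal_form_report(relation, fds):
    """For each nontrivial FD lhs -> b, report (lhs, b, ok_for_BCNF, ok_for_3NF)."""
    prime = set().union(*candidate_keys(relation, fds))
    report = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) >= relation
        for b in sorted(rhs - lhs):
            report.append((tuple(sorted(lhs)), b, superkey, superkey or b in prime))
    return report


CREDIT_CARDS = {"Bank", "CardType", "CardNumber"}
FDS = [
    (frozenset({"Bank"}), frozenset({"CardType"})),
    (frozenset({"CardNumber", "CardType"}), frozenset({"Bank"})),
]
for row in normal_form_report(CREDIT_CARDS, FDS):
    print(row)
# (('Bank',), 'CardType', False, True)
# (('CardNumber', 'CardType'), 'Bank', True, True)
```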
BCNF Vs. 3NF Example
• For the relation
CreditCards(Bank, CardType, CardNumber)
• Assume the following FDs:
– Bank → CardType
– CardNumber CardType → Bank
• What are the candidate keys for the relation?
– {CardNumber CardType}+ = {Bank, CardType, CardNumber}.
– {Bank CardNumber}+ = {Bank, CardType, CardNumber}.
• Does any FD violate BCNF rules?
– Bank → CardType.
• Does any FD violate 3NF rules?
– Bank → CardType? No, Bank is prime.
• In BCNF (not in 3NF) decompose into two relations:
– Bank CardType
– Bank CardNumber
BCNF Combining Example
• The following relation instances for the BCNF
decomposition are allowed by the FDs:
Bank Cards
Bank             CardType
Royal Bank       Visa
TD Canada Trust  Visa
Citi Bank        MasterCard

Bank Accounts
Bank             CardNumber
Royal Bank       1234-567-890
Royal Bank       2341-234-638
TD Canada Trust  1234-567-890
• The data in Bank Accounts already violates
the uniqueness constraint (the FD):
– CardNumber CardType → Bank.
• The violation only becomes visible by combining the relations.
What would the combination look like?
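One way to answer the question is to compute the natural join on Bank and then look for FD violations. In this Python sketch the tuple layouts are assumptions mirroring the two tables, and `fd_violations` is a hypothetical helper that refers to columns by position.

```python
from collections import defaultdict

bank_cards = {("Royal Bank", "Visa"), ("TD Canada Trust", "Visa"), ("Citi Bank", "MasterCard")}
bank_accounts = {
    ("Royal Bank", "1234-567-890"),
    ("Royal Bank", "2341-234-638"),
    ("TD Canada Trust", "1234-567-890"),
}

# Natural join on the shared attribute Bank: rows are (Bank, CardType, CardNumber).
credit_cards = {
    (bank, card_type, number)
    for bank, card_type in bank_cards
    for acct_bank, number in bank_accounts
    if bank == acct_bank
}


def fd_violations(rows, lhs, rhs):
    """Map each lhs value (column positions) to its rhs values when there are several."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[i] for i in lhs)].add(tuple(row[i] for i in rhs))
    return {key: vals for key, vals in groups.items() if len(vals) > 1}


for row in sorted(credit_cards):
    print(row)
# ('Royal Bank', 'Visa', '1234-567-890')
# ('Royal Bank', 'Visa', '2341-234-638')
# ('TD Canada Trust', 'Visa', '1234-567-890')

# CardNumber CardType -> Bank, with CardNumber at position 2 and CardType at 1.
print(fd_violations(credit_cards, lhs=(2, 1), rhs=(0,)))
```

The last line reports the key ('1234-567-890', 'Visa') mapped to both Royal Bank and TD Canada Trust, while Bank → CardType still holds inside Bank Cards on its own.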
Multivalued Dependencies
• Some redundancy can occur without functional
dependencies, e.g. when many-to-many
relationships are converted into one relation.
• A Multivalued Dependency (MVD) asserts that once the values
of attributes {A1,A2,…,An} in a relation R are fixed, the values of
{B1,B2,…,Bm} are independent of the values of the remaining
attributes of R.
• The MVD is denoted as:
– A1A2…An →→ B1,B2,…,Bm
• Consequently, among the tuples that agree on the As, every
combination of their B values with their values for the remaining
attributes must appear in the relation.
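The definition can be checked mechanically. This Python sketch (the helper `mvd_holds` and the attribute names `Account#` and `CardNumber` are assumptions) tests an MVD on one instance, using the Banking Info table from the next example; like an FD, an MVD constrains all legal instances, so one instance can only refute it.

```python
from itertools import product

ATTRS = ("SSN", "Bank", "Account#", "CardNumber")
banking_info = {
    (12321, "Royal Bank", "555-444-321", "1234-567-890"),
    (12321, "Royal Bank", "555-444-321", "2341-234-638"),
    (12321, "TD Canada Trust", "423-452-1233", "1234-567-890"),
    (12321, "TD Canada Trust", "423-452-1233", "2341-234-638"),
}


def mvd_holds(attrs, rows, lhs, rhs):
    """Check lhs ->-> rhs on an instance.

    For every pair of tuples t, u that agree on lhs, the tuple that takes
    lhs and rhs from t and all remaining attributes from u must be present.
    """
    pos = {a: i for i, a in enumerate(attrs)}
    from_t = set(lhs) | set(rhs)
    for t, u in product(rows, repeat=2):
        if all(t[pos[a]] == u[pos[a]] for a in lhs):
            v = tuple(t[pos[a]] if a in from_t else u[pos[a]] for a in attrs)
            if v not in rows:
                return False
    return True


print(mvd_holds(ATTRS, banking_info, ("SSN",), ("Bank", "Account#")))  # True
print(mvd_holds(ATTRS, banking_info, ("SSN",), ("CardNumber",)))       # True

# Dropping one combination breaks the independence requirement.
partial = banking_info - {(12321, "TD Canada Trust", "423-452-1233", "2341-234-638")}
print(mvd_holds(ATTRS, partial, ("SSN",), ("Bank", "Account#")))       # False
```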
MVD Example
Banking Info
Client SSN  Bank of Chequing  Account Number  Card Number
12321       Royal Bank        555-444-321     1234-567-890
12321       Royal Bank        555-444-321     2341-234-638
12321       TD Canada Trust   423-452-1233    1234-567-890
12321       TD Canada Trust   423-452-1233    2341-234-638

• What is the primary key for the relation above?


– Since there are no FDs, the primary key is all attributes.
• The relation is in BCNF. What are the MVDs?
• Redundancy comes from duplicating the account
information for each independent credit card number.
MVD Rules
• Trivial Dependency Rule:
– If A1A2…An →→ B1B2…Bm then also A1A2…An →→
B1B2…BmAi…Aj. This is also the only way combining/splitting
is allowed.
– Non-trivial MVD has no As in the Bs and there are other
attributes Cs in R as well.

• Transitive Rule:
– If A1A2…An →→ B1,B2,…,Bm and B1B2…Bm →→ C1C2…Cl then
also A1A2…An →→ C1C2…Cl.

• Complementation Rule:
– If A1A2…An →→ B1B2…Bm is an MVD in relation R, and
{C1,C2,…,Cl} are the rest of the attributes of R, then A1A2…An
→→ C1C2…Cl is an MVD as well.
Fourth Normal Form
• A relation R is in Fourth Normal Form (4NF) if, whenever
A1A2…An →→ B1B2…Bm is a nontrivial MVD,
{A1,A2,…,An} is a superkey of R.
• 4NF implies that either there are no nontrivial MVDs,
or the MVDs are also FDs and the relation is in BCNF.
• Note: all FDs are MVDs. Why?
• 4NF Decomposition rules:
– Find a nontrivial MVD A1A2…An →→ B1B2…Bm where
{A1,A2,…,An} is not a superkey. m is always the largest
possible, so no heuristic like the one in BCNF is needed. Why?
– Attributes in the MVD, Ais and Bjs, are split into one relation,
while Ais and all the relation attributes not in Ais and Bjs are
split into another relation.



4NF Example
Banking Info
Client SSN  Bank of Chequing  Account Number  Card Number
12321       Royal Bank        555-444-321     1234-567-890
12321       Royal Bank        555-444-321     2341-234-638
12321       TD Canada Trust   423-452-1233    1234-567-890
12321       TD Canada Trust   423-452-1233    2341-234-638

• The MVDs we have for this example are:


– SSN →→ Bank Account# and ?

Bank Account
SSN    Bank             Account#
12321  Royal Bank       555-444-321
12321  TD Canada Trust  423-452-1233

Credit Cards
SSN    Card Number
12321  1234-567-890
12321  2341-234-638
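A quick Python check of this decomposition on the instance above: projecting onto Bank Account and Credit Cards and joining back on SSN recovers Banking Info exactly. The tuple layout (SSN, Bank, Account#, Card Number) is an assumption mirroring the table. In general, splitting R this way on A →→ B is lossless whenever the MVD holds.

```python
banking_info = {
    (12321, "Royal Bank", "555-444-321", "1234-567-890"),
    (12321, "Royal Bank", "555-444-321", "2341-234-638"),
    (12321, "TD Canada Trust", "423-452-1233", "1234-567-890"),
    (12321, "TD Canada Trust", "423-452-1233", "2341-234-638"),
}

# Project onto (SSN, Bank, Account#) and (SSN, Card Number); sets drop duplicates.
bank_account = {(ssn, bank, acct) for ssn, bank, acct, _ in banking_info}
credit_cards = {(ssn, card) for ssn, _, _, card in banking_info}

# Natural join on SSN rebuilds the original relation.
rejoined = {
    (ssn, bank, acct, card)
    for ssn, bank, acct in bank_account
    for ssn2, card in credit_cards
    if ssn == ssn2
}

print(len(banking_info), len(bank_account), len(credit_cards))  # 4 2 2
print(rejoined == banking_info)                                 # True
```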
Normal Forms Comparison
• Which is preferable to use: BCNF, 3NF or
4NF?
• 4NF → BCNF → 3NF → 2NF → 1NF.
• Only 4NF eliminates MVD redundancy.
• Only 3NF decomposition is guaranteed to preserve FDs.
• But, 3NF might have more redundancy.
• MVDs may or may not be preserved in any
decomposition.
• So choosing 3NF or 4NF depends on the MVDs
and FDs in the relation.