# CMPT 354 Database Systems I

Chapter 3 – Relational Data Model

Summer 2006

SFU - CMPT 354 - Zinovi Tauber

**Benefits of the Relational Model**

• Nearly all databases are based on a relational model.

– IBM, Microsoft, Oracle, Sybase, Informix, etc.

• Single mathematical concept: the relation.

• Relations allow a high-level data manipulation language: SQL.

• Object-oriented databases are competitors.

– Object Store, Versant, Ontos.

• An object-relational model emerges.

– Informix, Oracle, IBM.

**Relational Model Basics**

• A relation has a set of tuples, represented by a named two-dimensional table.

• Attributes (or fields) are stored in columns.

• Tuples (or records) are stored in rows.

• The first row contains the attribute names.

• Attributes have a domain: an atomic type.

• A tuple has one component for each attribute.

• The relation is the only way to store data ⇒ a relational database is a collection of relations.

Relational Schema

• A schema for a relation is written as the name of the relation followed by a parenthesized list of attributes.

• Example: accounts(account#:int, balance:real)

• Attributes are referenced by name, not by column position ⇒ column positions are allowed to change.

• When column names are not specified, the schema's default order is assumed.

• Column names must be unique.

Relation Instances

• A relation instance is a particular collection of tuples.

• The order of the tuples has no significance.

• There cannot be duplicate tuples. Why?

• A relation instance R can be defined as:

– R ⊆ D1 × D2 × … × Dn, where the Di are the domains.

• The cardinality of R is the number of rows = |R|.

• The degree (arity) of R is the number of columns = n.
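The set-based definition can be sketched in a few lines of Python. This is only an illustration: the domains `D1` and `D2` are small hypothetical sets chosen so that the cross product can be enumerated, and the variable names are not from the course.

```python
from itertools import product

# Hypothetical finite domains for accounts(account#: int, balance: real).
D1 = {3372183, 6533341}
D2 = {500.00, 1.00}

# A relation instance is a set of tuples: the repeated tuple collapses
# into one element, and a set has no ordering.
accounts = {(3372183, 500.00), (6533341, 1.00), (3372183, 500.00)}

cardinality = len(accounts)             # number of rows, |R|
degree = len(next(iter(accounts)))      # number of columns, n
is_subset = accounts <= set(product(D1, D2))  # R is a subset of D1 x D2
```

Modelling the instance as a Python `set` is what makes duplicates impossible here, which mirrors the "Why?" question on the slide.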

Relation Example

• A possible relation instance is shown:

Accounts

| Account# | Balance |
|---|---|
| 3372183 | 500.00 |
| 6533341 | 1.00 |
| 6334234 | -48.65 |
| 5643245 | 0.00 |

From E-R to Relations

• Basic guidelines for converting E-R diagrams to the relational model:

– Each entity set is described as a relation.

– Each relationship set is described as a relation with added foreign keys from all connecting sets.

• There are some exceptions:

– Weak entity sets (foreign keys, relationship sets).

– "ISA" entity sets have a few modeling options.

– Combining relations can be a better design (when?).

Conversion Examples

• Converting an entity set to a relation schema:

[E-R diagram: entity set clients with attributes SSN, Name, Address, Phone]

Relation Schema:

Clients(SSN:int, Name:String, Address:String, Phone:int)

• No knowledge of relationship sets.

Conversion Examples

• Converting a relationship set to a relation schema:

[E-R diagram: relationship set hold (attribute join date) connecting clients (SSN, name, address, phone) and accounts (Acct #, balance)]

Relation Schema:

hold(SSN:int, Account#:int, joinDate:Date)

**Relationships with Roles**

• Converting a relationship set with roles:

[E-R diagram: entity set clients (SSN, name, address, phone) participates twice in the relationship set Joint Account, in the roles Primary and Joint]

Relation Schema:

JointAccount(PrimarySSN:int, JointSSN:int)

Combining Relations

• Some relations are not necessary and merely complicate the design.

• Relations for many-to-one relationship sets are commonly unnecessary.

• If R is a many-to-one relationship set from entity set A to entity set B, then we can combine R and A by merging their relations, or build the combined relation directly from the E-R diagram as follows:

– Include all the attributes of A.

– Include the key attributes of B.

– Include the attributes of R.

• The tuples of A that are not in R have NULL for the attributes of B and R. Consider space efficiency!

Combining Example

[E-R diagram: clients (SSN, name, address, phone) related to itself through joint with (roles primary and joint), and related through hold (join date) to accounts (Acct #, balance)]

• Relational Schema:

Clients(SSN:int, Name:String, Address:String, Phone:int)

Accounts(Account#:int, balance:real)

Hold(SSN:int, Account#:int, joinDate:Date)

JointAccount(PrimarySSN:int, JointSSN:int)

Combining Example

[E-R diagram: clients (SSN, name, address, phone) related to itself through joint with (roles primary and joint), and related through hold (join date) to accounts (Acct #, balance)]

• Relational Schema, with Hold merged into Accounts:

Clients(SSN:int, Name:String, Address:String, Phone:int)

Accounts(SSN:int, Account#:int, balance:real, joinDate:Date)

JointAccount(PrimarySSN:int, JointSSN:int)
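A minimal sketch of the merge, assuming hold is many-to-one from accounts to clients (each account has at most one holder). The SSNs and dates below are made up for illustration, and `combine` is a hypothetical helper name.

```python
# Accounts(Account#, balance) and Hold(SSN, Account#, joinDate) as sets of tuples.
accounts = {(3372183, 500.00), (6533341, 1.00), (6334234, -48.65)}
hold = {(111, 3372183, "2006-05-01"), (222, 6533341, "2006-05-02")}  # hypothetical data

def combine(accounts, hold):
    """Merge Hold into Accounts, giving (SSN, Account#, balance, joinDate)."""
    holder = {acct: (ssn, joined) for ssn, acct, joined in hold}
    merged = set()
    for acct, balance in accounts:
        # Accounts that do not appear in Hold get None (NULL) for SSN and joinDate.
        ssn, joined = holder.get(acct, (None, None))
        merged.add((ssn, acct, balance, joined))
    return merged

merged = combine(accounts, hold)
```

The tuple for account 6334234 carries two `None` components, which is the space cost the slide asks you to consider.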

**Converting Weak Sets**

• Weak entity sets need foreign keys from all supporting entity sets to define a primary key ⇒ these keys are part of the relation schema of the weak entity set.

• Supporting relationships are either redundant or have attributes that can be assigned to the weak entity set.

• Remember: supporting relationships are many-to-one.

• Other relationships with the weak entity set must include the entire key, including the foreign keys.

**Weak Set Example**

[E-R diagram: accounts (Acct #, balance) connected through lends to the weak entity set Mortgage (type, amount); the rest of the E-R diagram is as before]

• Relational Schema:

Clients(SSN:int, Name:String, Address:String, Phone:int)

Accounts(SSN:int, Account#:int, balance:real, joinDate:Date)

JointAccount(PrimarySSN:int, JointSSN:int)

Mortgage(Account#:int, type:string, amount:real)

Lends(Account#:int, MortgageAcc#:int, type:string)

**Weak Set Example (continued)**

[E-R diagram: accounts (Acct #, balance) connected through lends to the weak entity set Mortgage (type, amount); the rest of the E-R diagram is as before]

• Relational Schema:

…

Mortgage(Account#:int, type:string, amount:real)

Lends(Account#:int, MortgageAcc#:int, type:string, amount:real)

– Account# = MortgageAcc#. Why?

– Lends(Account#:int, type:string) is part of the Mortgage relation.

E-R ISA to Relation

• Several design options for conversion:

• E-R Style Conversion:

– Create a relation for each entity set in the hierarchy.

– The relations include key attributes from the root.

• Objects of a Single Class:

– Create a relation for each possible subtree including the root.

– The relation schema includes all the attributes of all the entity sets in the subtree.

• All Encompassing Relation:

– Create a relation that has all the attributes of all the entity sets in the hierarchy.

– Empty components (tuple attributes) are represented by NULL.

E-R Style Conversion

• Each entity set in the ISA hierarchy is converted into a relation.

• The ISA relationship itself is not modeled.

• Each relation has a key from the root relation.

• The root keys are also foreign keys in any relationship on the tree relations.

• The relations created do not depend on the covering constraint. Why not?

• The number of relations is the number of entity sets.

E-R Conversion Example

[E-R diagram: accounts (Acct #, balance) with ISA subclasses chequing (card #, # trans) and savings (interest); chequing is connected through issued to chequebooks (style#, company, description)]

• Relational Schema:

Accounts(Acct#:int, balance:real)

Savings(Acct#:int, interest:real)

Chequing(Acct#:int, card#:int, #trans:int)

ChequeBooks(style#:short, company:string, description:string)

Issued(Acct#:int, style#:short)

**Object-oriented Conversion**

• Entities are considered as objects in an object-oriented approach ⇒ each belongs to only one class.

• Every possible subtree that includes the root is considered a class.

• Create a relation for every possible subtree, with all attributes of all the entity sets in the subtree.

• The possible subtrees depend on the covering and overlap constraints. How?

• The number of relations is on the order of 2^|entity sets|.
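To see how the covering and overlap constraints change the number of classes, the following sketch enumerates the subtrees of a one-level hierarchy. It assumes a root with direct subclasses only (deeper hierarchies are not handled), and `subtree_classes` is a hypothetical helper, not part of the course material.

```python
from itertools import combinations

def subtree_classes(root, children, covering=False, overlap=True):
    """List the subtrees (classes) of a one-level ISA hierarchy.

    covering=True : every entity must belong to at least one subclass,
                    so the root-only subtree is dropped.
    overlap=False : no entity may belong to two subclasses,
                    so subtrees with more than one child are dropped.
    """
    classes = []
    for size in range(len(children) + 1):
        if covering and size == 0:
            continue
        if not overlap and size > 1:
            continue
        for combo in combinations(children, size):
            classes.append((root, *combo))
    return classes

unconstrained = subtree_classes("Accounts", ["Chequing", "Savings"])
disjoint_covering = subtree_classes("Accounts", ["Chequing", "Savings"],
                                    covering=True, overlap=False)
```

With two subclasses and no constraints there are 2^2 = 4 classes; with k subclasses the unconstrained count is 2^k, which is where the exponential growth comes from.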

**OO Conversion Example**

[E-R diagram: accounts (Acct #, balance) with ISA subclasses chequing (card #, # trans) and savings (interest); chequing is connected through issued to chequebooks (style#, company, description)]

• Relational Schema:

Accounts(Acct#:int, balance:real)

Savings(Acct#:int, interest:real, balance:real)

Chequing(Acct#:int, card#:int, #trans:int, balance:real)

ChequingAndSavings(Acct#:int, card#:int, #trans:int, interest:real, balance:real)

Issued(Acct#:int, style#:short)

**All Encompassing Relation**

• We can translate the entire ISA hierarchy into a single relation.

• The relation has the attributes of all entity sets in the hierarchy.

• Attributes from one subclass may not be defined for tuples from another subclass, so they are set to NULL.

• Therefore, we must assume that all attributes except the root attributes can be NULL.

• Only one relation is necessary.

**One Relation Example**

[E-R diagram: accounts (Acct #, balance) with ISA subclasses chequing (card #, # trans) and savings (interest); chequing is connected through issued to chequebooks (style#, company, description)]

• Relational Schema:

Accounts(Acct#:int, balance:real, card#:int, #trans:int, interest:real)

ChequeBooks(style#:short, company:string, description:string)

Issued(Acct#:int, style#:short)

**ISA Conversion Comparison**

• Generally prefer fewer relations:

– The object-oriented approach has an exponential number of relations, which is generally undesirable when there are many entity sets.

• Want to support queries efficiently:

– The single relation approach supports queries over all the attributes most efficiently.

– But it loses the semantic information carried by the names of the entity sets.

• Redundancy and space efficiency:

– In the object-oriented approach every tuple belongs to exactly one relation and has exactly the attributes it needs, so it is the most space efficient.

– The single relation method has NULLs for some attributes.

– E-R style conversion repeats the key values of an entity in several relations.

Functional Dependencies

• A form of uniqueness constraint.

• A functional dependency (FD) on a relation R is a constraint stating that fixing the values of some attributes A1, A2, …, An also fixes the value of another attribute B of R. In math we write a function as:

F(A1,A2,…,An) = B.

• Meaning B depends on the values of {A1,A2,…,An} through some function F.

• We denote the FD as:

A1A2…An → B
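Whether an FD holds in a particular instance can be checked mechanically: no two tuples may agree on the left side and differ on the right. The sketch below assumes tuples are stored as dictionaries; `fd_holds` and the sample rows are illustrative only. Note that an instance can only refute an FD, since an FD is a constraint on all legal instances.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by this instance."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        # setdefault stores val the first time key is seen and returns
        # the stored value afterwards, so a mismatch means a violation.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Illustrative sample rows.
students = [
    {"SID": 6637284, "Name": "Mike", "LoginID": "mike@sfu.ca"},
    {"SID": 7398385, "Name": "Rob", "LoginID": "rob@sfu.ca"},
    {"SID": 6637284, "Name": "Mike", "LoginID": "mike2@sfu.ca"},
]

sid_to_name = fd_holds(students, ["SID"], ["Name"])
sid_to_login = fd_holds(students, ["SID"], ["LoginID"])
login_to_sid = fd_holds(students, ["LoginID"], ["SID"])
```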

**FD Splitting/Combining**

• For a set of FDs:

A1A2…An → B1

A1A2…An → B2

…

A1A2…An → Bm

we can combine all the right sides into

A1A2…An → B1B2…Bm.

• Can also split it back. Why?

• Can we split the left side?

FD Example

• The relation corresponding to the following diagram is:

[E-R diagram: entity set students with attributes SID, name, address, phone, loginId]

Students(SID:int, Name:string, Address:string, Phone:int, LoginID:string)

• The functional dependencies we have are:

– SID → Name Address Phone

– SID → LoginID ? Depends on the specifications.

– LoginID → SID

– LoginID → SID Name Address Phone

Keys of Relations

• A set of attributes {A1,A2,…,An} is a candidate key for a relation R if and only if:

i. A1A2…An → B1B2…Bm, where {B1,B2,…,Bm} are the rest of the attributes of R.

ii. There does not exist a proper subset A′ ⊂ {A1,A2,…,An} that functionally determines all the other attributes of R.

• Condition ii is the principle that a candidate key must be minimal.

• The candidate key chosen to reference tuples is called the primary key.

• Any set {A1,A2,…,An} that contains a candidate key is called a superkey.

**Primary Key Example**

• In the student entity set, what are the candidate keys?

[E-R diagram: entity set students with attributes SID, name, address, phone, loginId]

Students(SID:int, Name:string, Address:string, Phone:int, LoginID:string)

• The candidate keys we can have are:

– LoginID → SID? Then LoginID is a candidate key.

– SID → LoginID? Then SID is a candidate key.

– Neither: {LoginID, SID} is a candidate key.

**Relationship Primary Key**

[E-R diagram: relationship set hold (attribute join date) connecting clients (SSN, name, address, phone) and accounts (Acct #, balance)]

Hold(SSN:int, Acct#:int, joinDate:Date)

• The candidate keys we can have are:

– Acct# → SSN joinDate if an account belongs to only one client: Acct# is a candidate key.

– SSN → Acct# joinDate if a client has at most one account: SSN is a candidate key.

– Neither: {SSN, Acct#} is a candidate key.

– Both: Hold is 1:1, and both SSN and Acct# are candidate keys.

**Complete Inference Rules**

• A complete set of inference rules, called Armstrong's axioms, can help us find all the inferences given by a set of FDs.

1. Reflexivity: if {B1,B2,…,Bm} ⊆ {A1,A2,…,An} then A1A2…An → B1B2…Bm. These are called trivial FDs.

2. Augmentation: if A1A2…An → B1B2…Bm then A1A2…An C1C2…Cl → B1B2…Bm C1C2…Cl for any C1C2…Cl.

3. Transitivity: if A1A2…An → B1B2…Bm and B1B2…Bm → C1C2…Cl then A1A2…An → C1C2…Cl.

**Closure Under FDs**

• The closure of a set {A1,A2,…,An} of attributes under a set S of FDs is the set B of attributes such that A1A2…An → B, and whenever A1A2…An → C under S then C ⊆ B.

• The closure of a set A = {A1,A2,…,An} is denoted A+ = {A1,A2,…,An}+.

• Can also derive the full set of FDs for a relation from a given set S. The full set S+ is the closure of S, and S is a basis. S is a minimal basis if no T ⊂ S has T+ = S+.
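The attribute closure is usually computed by a simple fixed-point iteration: keep adding the right side of any FD whose left side is already inside the result. The sketch below represents each FD as a pair of Python sets; the function name `closure` and this representation are choices made for the illustration.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply the FD only if its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FDs of the Students relation used in these notes.
fds = [
    ({"SID"}, {"Name", "Address", "Phone"}),
    ({"LoginID"}, {"SID"}),
]

sid_closure = closure({"SID"}, fds)
login_closure = closure({"LoginID"}, fds)
```

The second result shows transitivity at work: LoginID → SID and SID → Name Address Phone together place every attribute in {LoginID}+.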

**Closure and Keys**

• {A1,A2,…,An} is a superkey for relation R if and only if {A1,A2,…,An}+ contains all of R's attributes. Why not a candidate key?

• {A1,A2,…,An} is a candidate key for R if and only if it is a superkey and no proper subset {Ai,…,Aj} ⊂ {A1,A2,…,An} has {Ai,…,Aj}+ = {A1,A2,…,An}+.

• That is, after removing Ai from {A1,A2,…,An}, for any i, the closure of the remaining set no longer contains all attributes.
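These two tests translate directly into a brute-force search: try attribute sets in order of increasing size, keep those whose closure is all of R, and skip any set that contains a key already found. This is a sketch for small schemas only (it is exponential in the number of attributes), and `candidate_keys` is a hypothetical helper name.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is every attribute."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys

attributes = {"SID", "Name", "Address", "Phone", "LoginID"}
fds = [({"SID"}, {"Name", "Address", "Phone"}), ({"LoginID"}, {"SID"})]
keys = candidate_keys(attributes, fds)
```

Searching by increasing size is what guarantees minimality: any superkey that is not minimal contains a smaller key that was found earlier.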

**Anomalies and Decomposition**

• Anomalies are problems arising from database relations having unrelated components.

1. Redundancy: e.g. a range of attributes needs to be duplicated to store a set of values on one attribute.

2. Update anomalies: e.g. we can update one instance of the duplicated attributes but not another.

3. Deletion anomalies: e.g. we can delete valuable information by deleting a tuple that isn't duplicated.

• Can resolve anomalies by decomposition, which is splitting a relation into multiple relations:

{A1,A2,…,An} = {B1,B2,…,Bm} ∪ {C1,C2,…,Cl}.

• Can decompose following FD rules.

Anomalies Example

• For the relation

Students(SID:int, Name:string, Address:string, Phone:int, LoginID:string)

• Assume the following FDs:

– SID → Name Address Phone

– LoginID → SID Name Address Phone

• Then we can have a relation instance:

Students

| SID | Name | Address | Phone | LoginID |
|---|---|---|---|---|
| 6637284 | Mike | Burnaby | 604555 | mike@sfu.ca |
| 7398385 | Rob | Vancouver | 604666 | rob@sfu.ca |
| 6637284 | Mike | Burnaby | 604555 | mike2@sfu.ca |

Redundancy: the first and third rows repeat the same SID, Name, Address and Phone.

Projection Example

Students

| SID | Name | Address | Phone | LoginID |
|---|---|---|---|---|
| 6637284 | Mike | Burnaby | 604555 | mike@sfu.ca |
| 7398385 | Rob | Vancouver | 604666 | rob@sfu.ca |
| 6637284 | Mike | Burnaby | 604555 | mike2@sfu.ca |

Student Accounts

| SID | LoginID |
|---|---|
| 6637284 | mike@sfu.ca |
| 7398385 | rob@sfu.ca |
| 6637284 | mike2@sfu.ca |

Student Details

| SID | Name | Address | Phone |
|---|---|---|---|
| 6637284 | Mike | Burnaby | 604555 |
| 7398385 | Rob | Vancouver | 604666 |
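The decomposition above is just two projections of the Students instance. A minimal sketch, assuming tuples are stored as dictionaries and using a hypothetical `project` helper:

```python
def project(rows, attrs):
    """Project rows onto attrs. The result is a set, so duplicate tuples vanish."""
    return {tuple(row[a] for a in attrs) for row in rows}

students = [
    {"SID": 6637284, "Name": "Mike", "Address": "Burnaby",
     "Phone": 604555, "LoginID": "mike@sfu.ca"},
    {"SID": 7398385, "Name": "Rob", "Address": "Vancouver",
     "Phone": 604666, "LoginID": "rob@sfu.ca"},
    {"SID": 6637284, "Name": "Mike", "Address": "Burnaby",
     "Phone": 604555, "LoginID": "mike2@sfu.ca"},
]

student_details = project(students, ["SID", "Name", "Address", "Phone"])
student_accounts = project(students, ["SID", "LoginID"])
```

Mike's details now appear once in `student_details`, while both of his login IDs survive in `student_accounts`.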

**Boyce-Codd Normal Form**

• The Boyce-Codd Normal Form (BCNF) is a decomposition rule for avoiding anomalies: a relation R is in BCNF if and only if, whenever a nontrivial FD A1A2…An → B holds for R, {A1,A2,…,An} is a superkey for R.

• Any two-attribute relation is in BCNF. Why?

• Decomposition rules:

– Find a nontrivial FD A1A2…An → B that violates BCNF. Include in B all the dependent attributes.

– The attributes in the FD are split into one relation, while the left-hand side together with all the attributes of the relation not in the FD are split into another relation.

BCNF Example

• For the relation

Students(SID:int, Name:string, Address:string, Phone:int, LoginID:string)

• Assume the following FDs:

– SID → Name Address Phone

– LoginID → SID Name Address Phone

• What is the primary key for the relation?

– {SID}+ = {SID, Name, Address, Phone} is missing LoginID.

– {LoginID}+ = {LoginID, SID, Name, Address, Phone}.

• Does any FD violate BCNF rules?

– SID → Name Address Phone, but SID is not a superkey.

• Decompose into two relations:

– (SID, Name, Address, Phone)

– (SID, LoginID)
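One step of the BCNF decomposition can be sketched as follows. This is an illustration under simplifying assumptions: it performs a single split on the first violating FD it finds and does not recurse into the resulting relations or project the FDs onto them; `bcnf_violation` and `decompose_once` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(attributes, fds):
    """Return the first nontrivial FD whose left side is not a superkey, else None."""
    for lhs, rhs in fds:
        if not rhs <= lhs and closure(lhs, fds) != set(attributes):
            return lhs, rhs
    return None

def decompose_once(attributes, fds):
    """Split on one violating FD: (lhs and everything it determines) and (lhs plus the rest)."""
    violation = bcnf_violation(attributes, fds)
    if violation is None:
        return [set(attributes)]
    lhs, _ = violation
    first = closure(lhs, fds) & set(attributes)   # B taken as large as possible
    second = lhs | (set(attributes) - first)
    return [first, second]

attributes = {"SID", "Name", "Address", "Phone", "LoginID"}
fds = [
    ({"SID"}, {"Name", "Address", "Phone"}),
    ({"LoginID"}, {"SID", "Name", "Address", "Phone"}),
]
parts = decompose_once(attributes, fds)
```

Using the closure of the left side for the first relation implements the "include in B all the dependent attributes" heuristic.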

Normalization Forms

• First Normal Form (1NF) requires all attributes to be atomic. (Non-First Normal Form allows sets.)

• Second Normal Form (2NF) requires every non-prime attribute to be functionally dependent on the whole primary key, not on only a part of it.

• Third Normal Form (3NF) states: a relation R is in 3NF if and only if, whenever a nontrivial FD A1A2…An → B holds for R, either {A1,A2,…,An} is a superkey or B is a member of some candidate key.

• If Bi is in some candidate key, then Bi is called prime.

• When is 3NF better than BCNF?

– 3NF might have some redundancy.

– All FDs can be maintained under 3NF decomposition.

**BCNF vs. 3NF Example**

• For the relation

CreditCards(Bank, CardType, CardNumber)

• Assume the following FDs:

– Bank → CardType

– CardNumber CardType → Bank

• What are the candidate keys for the relation?

– {CardNumber, CardType}+ = {Bank, CardType, CardNumber}.

– {Bank, CardNumber}+ = {Bank, CardType, CardNumber}.

• Does any FD violate BCNF rules?

– Bank → CardType: Bank is not a superkey.

• Does any FD violate 3NF rules?

– Bank → CardType? No, CardType is prime (it belongs to the candidate key {CardNumber, CardType}).

• In BCNF (not in 3NF) decompose into two relations:

– (Bank, CardType)

– (Bank, CardNumber)

**BCNF Combining Example**

• The following relation instances for the BCNF decomposition are allowed by the FDs:

Bank Cards

| Bank | CardType |
|---|---|
| Royal Bank | Visa |
| TD Canada Trust | Visa |
| Citi Bank | MasterCard |

Bank Accounts

| Bank | CardNumber |
|---|---|
| Royal Bank | 1234-567-890 |
| Royal Bank | 2341-234-638 |
| TD Canada Trust | 1234-567-890 |

• Taken together with Bank Cards, the relation Bank Accounts already violates the uniqueness constraint (the FD):

– CardNumber CardType → Bank.

• We can see the FD violated by combining the relations. What would the combination look like?
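One way to answer the question is to compute the natural join of the two relations on Bank and then look for a (CardNumber, CardType) pair that maps to more than one bank. A minimal sketch with the instances above stored as sets of tuples (the variable names are illustrative):

```python
bank_cards = {
    ("Royal Bank", "Visa"),
    ("TD Canada Trust", "Visa"),
    ("Citi Bank", "MasterCard"),
}
bank_accounts = {
    ("Royal Bank", "1234-567-890"),
    ("Royal Bank", "2341-234-638"),
    ("TD Canada Trust", "1234-567-890"),
}

# Natural join on Bank: (Bank, CardType, CardNumber).
joined = {
    (bank, card_type, number)
    for bank, card_type in bank_cards
    for other_bank, number in bank_accounts
    if other_bank == bank
}

# Check CardNumber CardType -> Bank by grouping banks per (CardNumber, CardType).
banks_for = {}
for bank, card_type, number in joined:
    banks_for.setdefault((number, card_type), set()).add(bank)
violations = {key: banks for key, banks in banks_for.items() if len(banks) > 1}
```

The Visa card 1234-567-890 ends up associated with two banks, so the FD that neither decomposed relation could check on its own is violated in the join.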

Multivalued Dependencies

• Some redundancy can occur without functional dependencies, e.g. when many-to-many relationships are converted into one relation.

• A Multivalued Dependency (MVD) asserts that, once the values of attributes {A1,A2,…,An} in a relation R are fixed, the values of {B1,B2,…,Bm} are independent of the rest of the attributes of R.

• The MVD is denoted as:

– A1A2…An →→ B1B2…Bm

• In a proper instance, for tuples that agree on the attributes Ai, every combination of their Bj values appears with all possible values of the rest of the attributes.

MVD Example

Banking Info

| Client SSN | Bank of Chequing | Account Number | Card Number |
|---|---|---|---|
| 12321 | Royal Bank | 555-444-321 | 1234-567-890 |
| 12321 | Royal Bank | 555-444-321 | 2341-234-638 |
| 12321 | TD Canada Trust | 423-452-1233 | 1234-567-890 |
| 12321 | TD Canada Trust | 423-452-1233 | 2341-234-638 |

• What is the primary key for the relation above?

– Since there are no FDs, the primary key is all attributes.

• The relation is in BCNF. What are the MVDs?

• Redundancy occurs because the account information is duplicated for each value of the independent credit card number.
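An MVD can be tested on an instance by the "swap" property: for any two tuples t and u that agree on the left side, the tuple that takes the right-side values from t and everything else from u must also be present. The sketch below assumes tuples are stored as dictionaries with shortened attribute names; `mvd_holds` is a hypothetical helper.

```python
def mvd_holds(rows, attrs, lhs, rhs):
    """Return True if the MVD lhs ->> rhs is satisfied by this instance."""
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t in rows:
        for u in rows:
            if all(t[a] == u[a] for a in lhs):
                # Right-side values from t, all remaining values from u.
                mixed = {**u, **{a: t[a] for a in rhs}}
                if tuple(mixed[a] for a in attrs) not in present:
                    return False
    return True

attrs = ["SSN", "Bank", "Account", "Card"]
values = [
    (12321, "Royal Bank", "555-444-321", "1234-567-890"),
    (12321, "Royal Bank", "555-444-321", "2341-234-638"),
    (12321, "TD Canada Trust", "423-452-1233", "1234-567-890"),
    (12321, "TD Canada Trust", "423-452-1233", "2341-234-638"),
]
banking = [dict(zip(attrs, row)) for row in values]

ssn_bank_account = mvd_holds(banking, attrs, ["SSN"], ["Bank", "Account"])
ssn_card = mvd_holds(banking, attrs, ["SSN"], ["Card"])
ssn_bank_only = mvd_holds(banking, attrs, ["SSN"], ["Bank"])
```

The last check fails because Bank and Account Number travel together in this instance: swapping only the bank would pair Royal Bank with the TD account number, and no such tuple exists.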

MVD Rules

• Trivial Dependency Rule:

– If A1A2…An →→ B1B2…Bm then also A1A2…An →→ B1B2…BmAi…Aj. This is also the only way combining/splitting is allowed.

– A nontrivial MVD has no As among the Bs, and there are other attributes Cs in R as well.

• Transitive Rule:

– If A1A2…An →→ B1B2…Bm and B1B2…Bm →→ C1C2…Cl then also A1A2…An →→ C1C2…Cl.

• Complementation Rule:

– If A1A2…An →→ B1B2…Bm is an MVD in relation R, and {C1,C2,…,Cl} are the rest of the attributes of R, then A1A2…An →→ C1C2…Cl is an MVD as well.

**Fourth Normal Form**

• A relation R is in Fourth Normal Form (4NF) if, whenever A1A2…An →→ B1B2…Bm is a nontrivial MVD, {A1,A2,…,An} is a superkey of R.

• 4NF implies that either there are no nontrivial MVDs, or the MVDs are also FDs and the relation is in BCNF.

• Note: all FDs are MVDs. Why?

• 4NF decomposition rules:

– Find a nontrivial MVD A1A2…An →→ B1B2…Bm where {A1,A2,…,An} is not a superkey. m is always the largest possible, so there is no need for a heuristic like in BCNF. Why?

– The attributes in the MVD, the Ais and Bjs, are split into one relation, while the Ais and all the attributes of the relation not among the Ais and Bjs are split into another relation.

4NF Example

Banking Info

| Client SSN | Bank of Chequing | Account Number | Card Number |
|---|---|---|---|
| 12321 | Royal Bank | 555-444-321 | 1234-567-890 |
| 12321 | Royal Bank | 555-444-321 | 2341-234-638 |
| 12321 | TD Canada Trust | 423-452-1233 | 1234-567-890 |
| 12321 | TD Canada Trust | 423-452-1233 | 2341-234-638 |

• The MVDs we have for this example are:

– SSN →→ Bank Account# and ?

Bank Account

| SSN | Bank | Account# |
|---|---|---|
| 12321 | Royal Bank | 555-444-321 |
| 12321 | TD Canada Trust | 423-452-1233 |

Credit Cards

| SSN | Card Number |
|---|---|
| 12321 | 1234-567-890 |
| 12321 | 2341-234-638 |
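As a quick check that nothing is lost by this decomposition, the two projections can be joined back on SSN and compared with the original instance. A minimal sketch with the relations stored as sets of tuples (the variable names are illustrative):

```python
banking = {
    (12321, "Royal Bank", "555-444-321", "1234-567-890"),
    (12321, "Royal Bank", "555-444-321", "2341-234-638"),
    (12321, "TD Canada Trust", "423-452-1233", "1234-567-890"),
    (12321, "TD Canada Trust", "423-452-1233", "2341-234-638"),
}

# Project onto the attributes of the MVD and onto the left side plus the rest.
bank_account = {(ssn, bank, acct) for ssn, bank, acct, _ in banking}
credit_cards = {(ssn, card) for ssn, _, _, card in banking}

# Natural join on SSN rebuilds the four-attribute relation.
rejoined = {
    (ssn, bank, acct, card)
    for ssn, bank, acct in bank_account
    for other_ssn, card in credit_cards
    if other_ssn == ssn
}
```

Four tuples of four attributes shrink to two relations of two tuples each, and the join reproduces the original instance exactly because the MVD holds in it.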

**Normal Forms Comparison**

• Which is preferable to use: BCNF, 3NF or 4NF?

• 4NF ⇒ BCNF ⇒ 3NF ⇒ 2NF ⇒ 1NF (each form implies the next).

• Only 4NF eliminates MVD redundancy.

• Only 3NF decomposition is guaranteed to preserve FDs.

• But 3NF might have more redundancy.

• MVDs may or may not be preserved in any decomposition.

• So choosing 3NF or 4NF depends on the MVDs and FDs in the relation.