
Chapter 3: Design Theory For Relational Databases

3.1 Functional Dependencies


3.1.1 Definition Of Functional Dependency
3.1.2 Keys Of Relations
3.1.3 Superkeys
3.1.4 Exercises For Section 3.1

3.2 Rules About Functional Dependencies


3.2.1 Reasoning About Functional Dependencies
3.2.2 The Splitting/Combining Rule
3.2.3 Trivial Functional Dependencies
3.2.4 Computing The Closure Of Attributes
3.2.5 Why The Closure Algorithm Works
3.2.6 The Transitive Rule
3.2.7 Closing Sets Of Functional Dependencies
3.2.8 Projecting Functional Dependencies
3.2.9 Exercises For Section 3.2

3.3 Design Of Relational Database Schemas


3.3.1 Anomalies
3.3.2 Decomposing Relations
3.3.3 Boyce-Codd Normal Form
3.3.4 Decomposing Into BCNF
3.3.5 Exercises For Section 3.3

3.4 Decomposition: The Good, The Bad, The Ugly


3.4.1 Recovering Information From A Decomposition
3.4.2 The Chase Test For Lossless Join
3.4.3 Why The Chase Works
3.4.4 Dependency Preservation
3.4.5 Exercises For Section 3.4

3.5 Third Normal Form


3.5.1 Definition Of Third Normal Form
3.5.2 The Synthesis Algorithm of 3NF Schemas
3.5.3 Why the 3NF Synthesis Algorithm Works
3.5.4 Exercises For Section 3.5

3.6 MultiValued Dependencies


3.6.1 Attribute Independence And Its Consequent Redundancy
3.6.2 Definition Of Multivalued Dependencies
3.6.3 Reasoning About Multivalued Dependencies
3.6.4 Fourth Normal Form
3.6.5 Decomposition Into Fourth Normal Forms
3.6.6 Relationships Among Normal Forms
3.6.7 Exercises For Section 3.6

3.7 An Algorithm To Discover MultiValued Dependencies


3.7.1 The Closure And The Chase
3.7.2 Extending The Chase To MVDs
3.7.3 Why The Chase Works For MVDs
3.7.4 Projecting MVDs
3.7.5 Exercises For Section 3.7

3.8 Summary Of Chapter 3

3.9 References For Chapter 3


Notes

Chapter 3: Design Theory For Relational Databases

The central question: given an application, how do we go about designing a database
for it?
In Chapter 4, we see high-level notations for describing the structure of data, and
ways in which these high-level designs can be converted to relations.

Alternatively, we can look at the requirements for the application and design the
relations directly, without going through an intermediate stage.
Whichever approach is used, the initial schema usually has room for improvement.

Fortunately, there is a well-developed theory for relational databases:
"dependencies" and their implications for what makes a good relational schema.
This theory leads to something called "normalization": we decompose one relation
into two or more to remove the problems in the original design. Next come
"multivalued dependencies", which intuitively represent a condition where one or
more attributes are independent of the other attributes. These dependencies also
lead to normal forms, and to decomposition of relations to eliminate redundancy.

3.1 Functional Dependencies

Functional dependencies are the starting point of a design theory that lets us

a) examine a given design carefully and
b) improve it based on a few simple principles.

The theory begins by stating the constraints that apply to a relation.
The most common constraint is the "functional dependency", which generalises the
idea of a key for a relation.
We then see how the theory gives us tools to improve our relations by
"decomposition of relations": the replacement of one relation by two or more
relations whose sets of attributes together include all the attributes of the
original, undecomposed relation.

3.1.1 Definition Of Functional Dependency

A Functional Dependency (FD) on a relation R is a statement of the form "If two
tuples of R agree on all of the attributes A1, A2, ..., A_n (i.e. both tuples have
the same value in each of these attributes), then they must also agree on all of
the attributes B1, B2, ..., B_m." (Note: A1, ..., A_n do *not* have to appear
consecutively in the schema of R, and neither do B1, B2, ..., B_m.)

If the FD is true of every instance of relation R, we say "R satisfies the FD".
When we say that R satisfies an FD, we are asserting a constraint on R itself,
not just on some particular instances of R.

Notation: "A1, ..., A_n functionally determine B1, ..., B_m" is written

A1, ..., A_n -> B1, ..., B_m

This is equivalent to the set of FDs

A1, ..., A_n -> B1
A1, ..., A_n -> B2
...
A1, ..., A_n -> B_m

Consider the relation instance

title, year, length, genre, studioName, starName
Star Wars, 1977, 124, scifi, Fox, Carrie Fisher
Star Wars, 1977, 124, scifi, Fox, Mark Hamill
Star Wars, 1977, 124, scifi, Fox, Harrison Ford
Gone With The Wind, 1939, 231, drama, MGM, Vivien Leigh
Wayne's World, 1992, 95, comedy, Paramount, Dana Carvey
Wayne's World, 1992, 95, comedy, Paramount, Mike Myers

For this relation, the FD

title, year -> length, genre, studioName

holds.

The FD
title, year -> starName
does not hold (counterexample: the first two tuples have the same title and
year, but their starName values differ).

Note that an FD is a claim about *all* possible instances of the relation, not
about one particular instance. A single instance can refute an FD, but it cannot
by itself establish one.
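
As a rough illustration of the definition, the sketch below checks whether a given FD is violated by any pair of tuples in one instance. The helper name `holds_in_instance` and the list-of-tuples representation are assumptions made for this example. The data is the instance above. Passing the check only means this instance does not refute the FD; it does not prove that the FD holds for the relation.

```python
# Illustrative sketch; holds_in_instance is a hypothetical helper name.
ATTRS = ("title", "year", "length", "genre", "studioName", "starName")

ROWS = [
    ("Star Wars", 1977, 124, "scifi", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, 124, "scifi", "Fox", "Mark Hamill"),
    ("Star Wars", 1977, 124, "scifi", "Fox", "Harrison Ford"),
    ("Gone With The Wind", 1939, 231, "drama", "MGM", "Vivien Leigh"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Dana Carvey"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Mike Myers"),
]


def holds_in_instance(rows, attrs, lhs, rhs):
    """Return True if no two tuples agree on lhs but differ on rhs."""
    position = {name: i for i, name in enumerate(attrs)}
    seen = {}  # maps each lhs value combination to the rhs values first seen
    for row in rows:
        left = tuple(row[position[a]] for a in lhs)
        right = tuple(row[position[a]] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False  # same lhs values, different rhs values
    return True


# title, year -> length, genre, studioName is not violated by this instance
print(holds_in_instance(ROWS, ATTRS, ["title", "year"],
                        ["length", "genre", "studioName"]))  # True
# title, year -> starName is violated by the first two tuples
print(holds_in_instance(ROWS, ATTRS, ["title", "year"], ["starName"]))  # False
```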

3.1.2 Keys Of Relations

We say that a set of attributes {A1, A2, ..., An} is a *key* for a relation R if
(1) these attributes functionally determine all other attributes of the
relation, i.e. no instance of R can contain two distinct tuples that agree on
all of A1 through An, and
(2) no proper *subset* of {A1, ..., An} functionally determines all other
attributes of R. In other words, a key must be minimal.

Consider again

title, year, length, genre, studioName, starName
Star Wars, 1977, 124, scifi, Fox, Carrie Fisher
Star Wars, 1977, 124, scifi, Fox, Mark Hamill
Star Wars, 1977, 124, scifi, Fox, Harrison Ford
Gone With The Wind, 1939, 231, drama, MGM, Vivien Leigh
Wayne's World, 1992, 95, comedy, Paramount, Dana Carvey
Wayne's World, 1992, 95, comedy, Paramount, Mike Myers

{title, year, starName} is a key.

{year, starName} is not a key, because one star could act in two movies in
the same year. (In the *instance* above, no two tuples agree on year and
starName, but that is an accident of this instance. What counts as a key comes
from the *domain*, via domain knowledge, not from reverse engineering an
*instance*.)

{title, starName} is not a key, because there could be two versions of a movie
(Spiderman!) made in two different years, with a star in common.
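
The point about instances versus domain knowledge can be sketched in code. The helper `is_unique_in_instance` below is a hypothetical name, and the extra tuple is invented purely for illustration (it is not part of the example relation). The sketch only tests whether a set of attributes happens to distinguish all tuples of one instance, which is weaker than being a key.

```python
# Illustrative sketch; is_unique_in_instance is a hypothetical helper name.
ATTRS = ("title", "year", "length", "genre", "studioName", "starName")

ROWS = [
    ("Star Wars", 1977, 124, "scifi", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, 124, "scifi", "Fox", "Mark Hamill"),
    ("Star Wars", 1977, 124, "scifi", "Fox", "Harrison Ford"),
    ("Gone With The Wind", 1939, 231, "drama", "MGM", "Vivien Leigh"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Dana Carvey"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Mike Myers"),
]

# Hypothetical tuple (invented for this example): the same star appears in a
# second movie in the same year.
EXTRA = ("Hypothetical Movie", 1977, 100, "drama", "Fox", "Harrison Ford")


def is_unique_in_instance(rows, attrs, candidate):
    """Return True if no two tuples of this instance agree on all candidate attributes."""
    position = {name: i for i, name in enumerate(attrs)}
    projected = [tuple(row[position[a]] for a in candidate) for row in rows]
    return len(set(projected)) == len(projected)


# In the original instance, {year, starName} happens to distinguish all tuples ...
print(is_unique_in_instance(ROWS, ATTRS, ["year", "starName"]))            # True
# ... but one additional (hypothetical) tuple shows it is not a key.
print(is_unique_in_instance(ROWS + [EXTRA], ATTRS, ["year", "starName"]))  # False
# {title, year, starName} still distinguishes every tuple.
print(is_unique_in_instance(ROWS + [EXTRA], ATTRS,
                            ["title", "year", "starName"]))                # True
```

A check like this can rule out a proposed key, but it can never confirm one; that still requires knowledge of the domain.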

What is "Functional" about Functional Dependencies?

A1, A2, ..., A_n -> B is called a functional dependency because we can conceive of
a function F that takes values for A1, ..., A_n as arguments and produces a unique
value, or no value at all, for B. This is not a function in the usual
mathematical sense, though: there is no rule for computing it, and it can only be
evaluated by looking the values up in the relation. It is a characteristic of
the data.

3.1.3 Superkeys

A set of attributes that contains a key, i.e. any superset of a key, is called a
superkey.
Thus every key is a superkey. However, some superkeys are not minimal, and so are
not keys. A superkey functionally determines all the other attributes of the
relation, but it need not satisfy the second condition, minimality.

Terminology: Other books use 'key' to mean what we call 'superkey', i.e. a *not
necessarily* minimal collection of attributes that determines the rest of the
attributes of the relation.

3.1.4 Exercises For Section 3.1

3.2 Rules About Functional Dependencies

This section covers how to reason about FDs: given a set of FDs that a relation
satisfies, we can often deduce other FDs that the relation must also satisfy.
This ability to deduce other satisfied FDs is essential.

3.2.1 Reasoning About Functional Dependencies

A motivating example: if we are told that the relation R(A, B, C) satisfies the
FDs A -> B and B -> C,
we can deduce that R also satisfies the FD A -> C.

"Proof":
Let (a1, b1, c1) and (a1, b2, c2) be any two tuples of R that agree on A.
Since the FD A -> B holds and both tuples have the value a1 for attribute A,
b1 must equal b2.
Since the tuples therefore agree on B and the FD B -> C holds, c1 must equal c2.

So any two tuples that agree on A also agree on C, i.e. A -> C holds.

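A set of FDs T *follows from* a set of FDs S
if every relation instance that satisfies all the FDs in S also satisfies all the
FDs in T. (Note that there may be additional relation instances that satisfy T
but do not satisfy S, so the set of instances satisfying S is a subset of, or
equal to, the set of instances satisfying T.)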

Equivalence of sets of FDs S and T:

S and T are equivalent if the relation instances satisfying S are exactly those
satisfying T.
In other words, S follows from T and T follows from S.
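
The two definitions above can be tried out on small examples with a brute-force sketch. It is not the closure algorithm of Section 3.2.4. Instead it rests on an assumption worth stating: to refute an FD it is enough to consider instances with just two tuples, as in the "proof" above, and such an instance is described by the set of attributes on which the two tuples agree. The function names and the representation of an FD as a `(lhs, rhs)` pair of tuples are choices made for this example. The running time is exponential in the number of attributes.

```python
from itertools import combinations


def agree_sets(attrs):
    """Yield every subset of attrs: the attributes on which two tuples agree."""
    attrs = list(attrs)
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):
            yield frozenset(combo)


def satisfied_by(agree, fd):
    """A two-tuple instance with this agree set satisfies lhs -> rhs unless
    the tuples agree on all of lhs but not on all of rhs."""
    lhs, rhs = fd
    return not set(lhs) <= agree or set(rhs) <= agree


def follows(attrs, s, fd):
    """True if every two-tuple instance satisfying all FDs in s satisfies fd."""
    return all(
        satisfied_by(agree, fd)
        for agree in agree_sets(attrs)
        if all(satisfied_by(agree, g) for g in s)
    )


def equivalent(attrs, s, t):
    """True if each FD of t follows from s and each FD of s follows from t."""
    return (all(follows(attrs, s, fd) for fd in t)
            and all(follows(attrs, t, fd) for fd in s))


R = ("A", "B", "C")
S = [(("A",), ("B",)), (("B",), ("C",))]

print(follows(R, S, (("A",), ("C",))))               # True
print(follows(R, S, (("C",), ("A",))))               # False
print(equivalent(R, S, S + [(("A",), ("C",))]))      # True
```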

This section gives some rules about FDs. In general these let us replace one set
of FDs with an equivalent set, or add to a set of FDs further FDs that follow from
the original ones. E.g. by the transitivity argument above, given A -> B and
B -> C we can add A -> C.

We also give an algorithm to test whether an FD follows from a given set of FDs.

3.2.2 The Splitting/Combining Rule


1. Splitting: we can replace the FD

A1, A2, ..., A_n -> B1, B2, ..., B_m

with the set of FDs

A1, A2, ..., A_n -> B1
A1, A2, ..., A_n -> B2
...
A1, A2, ..., A_n -> B_m

2. Combining: we can replace the set of FDs

A1, A2, ..., A_n -> B1
A1, A2, ..., A_n -> B2
...
A1, A2, ..., A_n -> B_m

with the single FD

A1, A2, ..., A_n -> B1, B2, ..., B_m
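
A minimal sketch of the two directions of the rule, assuming an FD is represented as a `(lhs, rhs)` pair of attribute tuples. The representation and the function names `split` and `combine` are choices made for this example.

```python
def split(fd):
    """Replace lhs -> B1, ..., B_m by the FDs lhs -> B1, ..., lhs -> B_m."""
    lhs, rhs = fd
    return [(tuple(lhs), (b,)) for b in rhs]


def combine(fds):
    """Merge FDs that share the same left side into one FD per left side."""
    grouped = {}  # left side (as a set) -> (left side as first written, right side)
    for lhs, rhs in fds:
        entry = grouped.setdefault(frozenset(lhs), (tuple(lhs), []))
        for b in rhs:
            if b not in entry[1]:
                entry[1].append(b)
    return [(lhs, tuple(rhs)) for lhs, rhs in grouped.values()]


fd = (("title", "year"), ("length", "genre", "studioName"))

parts = split(fd)
for part in parts:
    print(part)
# (('title', 'year'), ('length',))
# (('title', 'year'), ('genre',))
# (('title', 'year'), ('studioName',))

print(combine(parts) == [fd])  # True
```

Note that both functions only ever touch the right side of an FD; the left side is kept whole.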

3.2.3 Trivial Functional Dependencies

A functional dependency is said to be trivial if it holds for every instance of
the relation, regardless of what other constraints are assumed.
When the constraints are FDs, it is easy to tell whether an FD is trivial.

An FD

A1, A2, ..., A_n -> B1, B2, ..., B_m

is trivial exactly when {B1, ..., B_m} is a subset of {A1, A2, ..., A_n}.

E.g.
title, year -> title
title -> title

We can assume any trivial FD without having to justify it on the basis of the
other FDs asserted for the relation.

An intermediate situation is when *some*, but not all, of the attributes on the
right side of the FD also appear on the left side.
These attributes can be removed from the right side to simplify the FD.
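
Both cases can be sketched in a few lines, again assuming an FD is represented as a `(lhs, rhs)` pair of attribute tuples. The names `is_trivial` and `simplify` are hypothetical.

```python
def is_trivial(fd):
    """An FD is trivial when every right-side attribute is also on the left."""
    lhs, rhs = fd
    return set(rhs) <= set(lhs)


def simplify(fd):
    """Drop from the right side any attribute that already appears on the left."""
    lhs, rhs = fd
    return (tuple(lhs), tuple(b for b in rhs if b not in lhs))


print(is_trivial((("title", "year"), ("title",))))         # True
print(is_trivial((("title", "year"), ("length",))))        # False
print(simplify((("title", "year"), ("year", "length"))))
# (('title', 'year'), ('length',))
```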

3.2.4 Computing The Closure Of Attributes

3.2.5 Why The Closure Algorithm Works


3.2.6 The Transitive Rule
3.2.7 Closing Sets Of Functional Dependencies
3.2.8 Projecting Functional Dependencies
3.2.9 Exercises For Section 3.2

3.3 Design Of Relational Database Schemas


3.3.1 Anomalies
3.3.2 Decomposing Relations
3.3.3 Boyce-Codd Normal Form
3.3.4 Decomposing Into BCNF
3.3.5 Exercises For Section 3.3

3.4 Decomposition: The Good, The Bad, The Ugly


3.4.1 Recovering Information From A Decomposition
3.4.2 The Chase Test For Lossless Join
3.4.3 Why The Chase Works
3.4.4 Dependency Preservation
3.4.5 Exercises For Section 3.4

3.5 Third Normal Form


3.5.1 Definition Of Third Normal Form
3.5.2 The Synthesis Algorithm of 3NF Schemas
3.5.3 Why the 3NF Synthesis Algorithm Works
3.5.4 Exercises For Section 3.5

3.6 MultiValued Dependencies


3.6.1 Attribute Independence And Its Consequent Redundancy
3.6.2 Definition Of Multivalued Dependencies
3.6.3 Reasoning About Multivalued Dependencies
3.6.4 Fourth Normal Form
3.6.5 Decomposition Into Fourth Normal Forms
3.6.6 Relationships Among Normal Forms
3.6.7 Exercises For Section 3.6

3.7 An Algorithm To Discover MultiValued Dependencies


3.7.1 The Closure And The Chase
3.7.2 Extending The Chase To MVDs
3.7.3 Why The Chase Works For MVDs
3.7.4 Projecting MVDs
3.7.5 Exercises For Section 3.7

3.8 Summary Of Chapter 3

3.9 References For Chapter 3
