
Relational Database Design

Logical design of an RDB


1. Consult with clients
2. Produce an ER model
3. Translate ER model into relation schema
4. Convert resulting relations to 3NF (3rd normal
form) or BCNF (Boyce-Codd Normal Form)

As usual, there’s much iteration, but these are the 4 main steps.
NB: We don’t consider physical design at all in this
course, but it is crucial for good performance.
What problems arise from poor
relation scheme designs?
• the same fact is stored in more than one place in the
database
• this can lead to inconsistent copies, & no way of knowing
which is correct
• sometimes data cannot be inserted in the db because there
is no value for an attribute which cannot be null
• sometimes a fact cannot be deleted from the db without
losing other information with it that cannot be kept
anywhere else in the db
• the DBMS cannot cache data well if relations are badly
designed – it can be forced to waste space on unused items
SID → Sname Rating Age
BID → Bname Fee Location
SID BID Day → Deposit
Sname   SID  Bname     BID  Day   Deposit  Fee  Location   Rating  Age
Marx    23   Wayfarer  109  1/8   120      120  Hout Bay   8       52
Marx    23   SeaPride  108  8/8   120      500  Fish Hoek  8       52
Martin  25   Yuppie    101  8/8   0        400  Hout Bay   9       51
Adams   27   Yuppie    101  9/8   100      400  Hout Bay   8       36
Adams   27   Wayfarer  109  15/8  120      120  Hout Bay   8       36
Carrey  33   Wayfarer  109  4/9   0        120  Hout Bay   10      22
Carrey  33   Joy       104  11/9  0        200  Hout Bay   10      22


Functional Dependencies (FDs)
• An FD A → B means that each instance of
attribute A is associated with at most 1 unique
value of B in the real world
• We will use A, B, C, etc. for individual attributes;
X, Y, Z, W etc. for groups of attributes; R to mean
all attributes of the relation
• If A → B it does not mean that B → A. E.g. if
Account → Owner then maybe Owner → Account
or maybe Owner ↛ Account: we must ask the
client what the business rules are.
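
A minimal Python sketch of how a table instance can be tested against an FD. The helper name fd_holds is an assumption made for illustration; the rows are the sample table shown earlier. Note that an instance can only refute an FD: passing the test does not prove the business rule, which still has to come from the client.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, then returns the stored value
        if seen.setdefault(key, val) != val:
            return False
    return True


columns = ["Sname", "SID", "Bname", "BID", "Day",
           "Deposit", "Fee", "Location", "Rating", "Age"]
data = [
    ("Marx", 23, "Wayfarer", 109, "1/8", 120, 120, "Hout Bay", 8, 52),
    ("Marx", 23, "SeaPride", 108, "8/8", 120, 500, "Fish Hoek", 8, 52),
    ("Martin", 25, "Yuppie", 101, "8/8", 0, 400, "Hout Bay", 9, 51),
    ("Adams", 27, "Yuppie", 101, "9/8", 100, 400, "Hout Bay", 8, 36),
    ("Adams", 27, "Wayfarer", 109, "15/8", 120, 120, "Hout Bay", 8, 36),
    ("Carrey", 33, "Wayfarer", 109, "4/9", 0, 120, "Hout Bay", 10, 22),
    ("Carrey", 33, "Joy", 104, "11/9", 0, 200, "Hout Bay", 10, 22),
]
rows = [dict(zip(columns, r)) for r in data]

print(fd_holds(rows, ["SID"], ["Sname", "Rating", "Age"]))    # True
print(fd_holds(rows, ["BID"], ["Bname", "Fee", "Location"]))  # True
print(fd_holds(rows, ["Rating"], ["SID"]))                    # False: rating 8 has SIDs 23 and 27
```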
Keys
• If X → R then X is a superkey for R
• If X → R and there is no proper subset Y of X such
that Y → R, then X is a candidate key for R
• One of the candidate keys for each relation must
be chosen as the primary key for that relation.
• If Z is a foreign key in relation R then there is
some other relation, S, such that Z is the primary
key of relation S. We use Z in R to represent a
relationship between R entities and S entities.
Keys: example
R (sno, sname, pno, cost, day, quantity)
P (pno, pname, selling, total)
• Candidate keys for R are sno or sname.
• Candidate keys for P are pno or pname.
• Primary key for R is sno, primary key for P is pno
• pno is a foreign key in R, indicating which part in
P was delivered by that supplier on that day.
• If pname were the primary key of P instead, R
must be R ( sno, sname, pname, cost, day,
quantity).
FDs continued
• An FD is just a “to one” mapping
• Armstrong’s axioms for FDs:
Reflexivity: AB → A is trivially always true
Augmentation: if A → B then AX → BX
Transitivity: if A → B and B → C then A → C
• if A → B and A → C then A → BC (union)
• if A → BC then A → B and A → C (decomp.)
• if A → B and BX → C then AX → C
(pseudotransitivity)
Example : using FD axioms
SC → PMG SL → C CT → L TL → C SP → C
show that SP → M

1. SP → C (given)
2. SP → S (reflexivity)
3. SP → SC (union of 1 and 2)
4. SP → PMG (transitivity of 3 and the first FD above)
5. SP → M (decomposition of 4)
Closure of attributes
• We denote by X+ the set of all attributes
functionally determined by X
• X+ is called the closure of X
• To find whether Y is a superkey for relation R, we
see if Y+ contains all attributes of R
• To see if X → Y is true or not, just find X+ and
see if it includes all the attributes of Y
• X+ are all the values that follow uniquely once the
value of X is known (everything X maps onto
uniquely)
Finding X+
1. Let ans = X
2. For every FD Y → Z s.t. Y ⊆ ans, add Z
to ans
3. Repeat step 2 until no more attributes can
be added to ans
4. ans is now the closure of X
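
The four steps above can be sketched in Python as follows. The representation is an assumption made for illustration: each attribute is a single letter and each FD is a (lhs, rhs) pair of strings. The FD set is the one used in this section's worked examples.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) strings."""
    ans = set(attrs)                      # step 1
    changed = True
    while changed:                        # step 3: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:              # step 2: apply every FD whose LHS is in ans
            if set(lhs) <= ans and not set(rhs) <= ans:
                ans |= set(rhs)
                changed = True
    return ans                            # step 4


FDS = [("SC", "PMG"), ("SL", "C"), ("CT", "L"), ("TL", "C"), ("SP", "C")]

print("".join(sorted(closure("SL", FDS))))   # CGLMPS, i.e. the set SLCPMG
print(set("PG") <= closure("SL", FDS))       # True: SL -> PG holds
print("T" in closure("SL", FDS))             # False: SL is not a superkey of R(SCPMGLT)
```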
Example : finding attribute closure
SC → PMG SL → C CT → L TL → C SP → C
What is the closure of SL ?
start with ans = {SL}
using 2nd FD, ans = {SLC}
using 1st FD, ans = {SLCPMG}
no more attributes can be added so (SL)+ is SLCPMG
Is SL a superkey for R(SCPMGLT)? no, T is not in (SL)+, so SL ↛ T
Does SL → PG? yes, because PG are in (SL)+
Summary so far
• FDs tell us important constraints on the data
• FDs can be used to check if attributes form
a candidate key for a relation
• attribute closure is easy to compute
• to see if an FD X → Y is true, find X+ and see
if it contains all of Y
• Next: FDs can also tell us if a relation
scheme is good or bad
Boyce-Codd Normal Form
A relation R is in BCNF if and only if, for every FD
X → Y that holds on R:
X → Y is trivial (i.e. Y ⊆ X)
or X is a superkey for R

(i.e. the only to-one relationships that hold among R’s
attributes are fundamental properties of R entities,
there are no additional/extraneous relationships in R)
Third Normal Form
A relation R is in 3NF if and only if, for every FD
X → Y that holds on R:
X → Y is trivial (i.e. Y ⊆ X)
or X is a superkey for R
or Y contains only prime attributes

(an attribute in relation R is a prime attribute if that
attribute forms part of some candidate key for R)
Is R(SCPMGLT) a good design?
SC → PMG SL → C CT → L TL → C SP → C

Is R in BCNF? We need to check the closure of each LHS to
see if it is a key for R or not.
(SC)+ = SCPMG is not a key, so this is not BCNF
[in fact not one LHS is a key for R in the FDs above!]

Is R in 3NF? We need to know the candidate keys:
(SCT)+ = SCTPMGL. (SLT)+ = SLTCPMG.
(SPT)+ = SPTCMGL. So keys are SCT, SLT or SPT.
So R is not in 3NF as e.g. SC → PMG and neither M nor G is
a prime attribute. Note: the other RHS’s give no problem
as C and L are prime attributes.
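
A brute-force Python sketch of this check. The function names candidate_keys and violations are assumptions for illustration, attributes are single letters, and only the listed FDs are tested. Trying every attribute subset is exponential, so this is only practical for small classroom schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    ans = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= ans and not set(rhs) <= ans:
                ans |= set(rhs)
                changed = True
    return ans


def candidate_keys(rel, fds):
    """Minimal attribute sets whose closure covers every attribute of rel."""
    keys = []
    for size in range(1, len(rel) + 1):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue                      # contains a smaller key: not minimal
            if closure(x, fds) >= set(rel):
                keys.append(x)
    return keys


def violations(rel, fds, form="BCNF"):
    """Listed FDs that violate BCNF (or 3NF when form == '3NF') on rel."""
    prime = set().union(*candidate_keys(rel, fds))
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue                          # trivial FD
        if closure(lhs, fds) >= set(rel):
            continue                          # LHS is a superkey
        if form == "3NF" and set(rhs) - set(lhs) <= prime:
            continue                          # RHS adds only prime attributes
        bad.append((lhs, rhs))
    return bad


FDS = [("SC", "PMG"), ("SL", "C"), ("CT", "L"), ("TL", "C"), ("SP", "C")]
print(sorted("".join(sorted(k)) for k in candidate_keys("SCPMGLT", FDS)))
# ['CST', 'LST', 'PST']
print(violations("SCPMGLT", FDS, "3NF"))      # [('SC', 'PMG')]
print(len(violations("SCPMGLT", FDS)))        # 5: no LHS is a superkey
```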
Example 2
R (City, Suburb, Postalcode) or R(CSP)
Only 2 FDs hold: CS → P and P → C

R is not in BCNF because P → C holds and P is
not a superkey for R.

R is in 3NF: in P → C, although P is not a key
for R, the RHS attribute C is prime (as CS is
a key for R)
Converting to BCNF
1. If R is not in BCNF then there is an FD X
→ Y that violates BCNF. Replace R with
R1(X,Y) and R2(R – Y)
2. Repeat until no more FDs violate BCNF.

The algorithm for 3NF is identical – just work
with FDs that violate 3NF.
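
A minimal Python sketch of the BCNF loop, under some assumptions: attributes are single letters, a violation inside a relation is found by computing closures restricted to that relation, and the RHS used is everything X determines inside the relation (not just the RHS of a listed FD). Because the result depends on which violating FD is picked first, it may differ from a hand-worked decomposition while still being in BCNF.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    ans = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= ans and not set(rhs) <= ans:
                ans |= set(rhs)
                changed = True
    return ans


def find_violation(rel, fds):
    """Return (X, Y) where X -> Y violates BCNF inside rel, or None."""
    rel = set(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            implied = closure(x, fds) & rel   # X+ restricted to rel
            if implied != x and implied != rel:
                return x, implied - x         # nontrivial, and X is not a superkey
    return None


def bcnf_decompose(rel, fds):
    """Split rel repeatedly until no relation contains a BCNF violation."""
    todo, done = [set(rel)], []
    while todo:
        r = todo.pop()
        found = find_violation(r, fds)
        if found is None:
            done.append(r)
        else:
            x, y = found
            todo.append(x | y)                # R1(X, Y)
            todo.append(r - y)                # R2(R - Y)
    return done


csp = [("CS", "P"), ("P", "C")]
print(sorted("".join(sorted(r)) for r in bcnf_decompose("CSP", csp)))
# ['CP', 'PS']
```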
Example converting to BCNF
SC → PMG SL → C CT → L TL → C SP → C

(SC)+ = SCPMG is not a key, so SC → PMG violates BCNF.
Replace R by R1(SCPMG) and R2(SCLT).
(SL)+ = SLCPMG does not include T, so SL is not a key
for R2 and SL → C violates BCNF in R2.
Replace R2, getting R1(SCPMG), R3(SLC) and R4(SLT).

(Now SC → PMG is ok as SC is a key for R1. SP →
C is ok as SP is a key for R1. SL → C is ok as SL
is a key for R3. CT → L and TL → C are ok as
C, T and L never appear together in any relation.)
Example 2
R (City, Suburb, Postalcode) or R(CSP)
Only 2 FDs hold: CS → P and P → C

P → C violates BCNF. Replace R with
R1(PC) and R2(SP) to obtain BCNF.

But that’s a very inconvenient database!
Many people prefer 3NF and would rather keep
R(CSP) as it was originally.
Relation scheme decomposition
A decomposition of a relation scheme R into relation schemes
R1, R2, ...
1. must not lose any attributes (i.e. every attribute of R
must appear in one of the new relations)
2. should be a lossless join decomposition
3. should be a dependency preserving decomposition (i.e.
every FD that holds for R should be enforceable using
only one of the new relations, without needing a join)

While (1) and (2) above are always possible, (3) is not always
possible. For example, some BCNF decompositions
aren’t dependency preserving – see e.g. that of R(CSP)
in the previous slide.
Lossless join decompositions
• If R is decomposed into R1 and R2 then this is a
lossless join decomposition if and only if the join
of R1 and R2 gives exactly the original tuples of R
• If it is a lossless join decomposition, then either
(R1 ∩ R2) → R1 or (R1 ∩ R2) → R2 i.e. the
common attributes are a key of one of the relations
• Otherwise, when we join up R1 and R2, we get
additional/extraneous/nonsense tuples!
• The BCNF/3NF decomposition algorithm always
gives a lossless join decomposition.
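
A minimal Python sketch of the common-attribute test for a two-way split. The name is_lossless is an assumption for illustration, and attributes are single letters. The test is applied to R(CSP) with CS → P and P → C.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    ans = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= ans and not set(rhs) <= ans:
                ans |= set(rhs)
                changed = True
    return ans


def is_lossless(r1, r2, fds):
    """True if the common attributes of r1 and r2 determine all of r1 or all of r2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


csp = [("CS", "P"), ("P", "C")]
print(is_lossless("PC", "SP", csp))   # True: common attribute P determines all of R1(PC)
print(is_lossless("CS", "SP", csp))   # False: common attribute S determines nothing else
```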
E.g. relation scheme decomposition
R (sno, sname, pno, pname, cost, selling, quantity)

R1(sno,sname,cost) and R2(pno, pname, selling) is not a valid
decomposition as the “quantity” attribute has been lost.

R1(sno,sname,cost) and R2(pno, pname, selling, quantity, sno)
is not a lossless join decomposition unless sno → sname,cost
because otherwise we don’t know which supplier supplied a
part.
If cost → selling then this is also not a dependency preserving
decomposition, because we must join R1 and R2 to enforce
the cost → selling constraint.
Example (one 3NF relation)
S          P     C
Rosebank   7700  CT
Rosebank   1100  Jbg
Claremont  7700  CT
Claremont  1200  Jbg
Example (BCNF decomposition)
P     S              P     C
7700  Rosebank       7700  CT
1100  Rosebank       1100  Jbg
7700  Claremont      1200  Jbg
1200  Claremont      3400  Jbg
3400  Claremont

Relation (PC) does not violate P → C. But taking the 2
relations together and joining on P value, we see that they
violate CS → P because Claremont in Jbg is associated
with two postalcodes viz. 1200 and 3400.
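
The same check can be sketched in Python by joining the two relations on P and grouping postal codes by (city, suburb). The variable names are assumptions for illustration; the tuples are the ones in the two tables above.

```python
from collections import defaultdict

# R2(P, S) and R1(P, C) as (postal code, suburb) and (postal code, city) pairs
ps = [(7700, "Rosebank"), (1100, "Rosebank"), (7700, "Claremont"),
      (1200, "Claremont"), (3400, "Claremont")]
pc = [(7700, "CT"), (1100, "Jbg"), (1200, "Jbg"), (3400, "Jbg")]

# P -> C holds inside pc, so each postal code maps to exactly one city
city_of = dict(pc)

# Natural join on P, giving (S, P, C) tuples
joined = [(s, p, city_of[p]) for p, s in ps if p in city_of]

# CS -> P requires one postal code per (city, suburb) pair
postcodes = defaultdict(set)
for s, p, c in joined:
    postcodes[(c, s)].add(p)

conflicts = {cs: sorted(codes) for cs, codes in postcodes.items() if len(codes) > 1}
print(conflicts)   # {('Jbg', 'Claremont'): [1200, 3400]}
```

Neither relation can detect this conflict on its own, which is what losing dependency preservation means in practice.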
Design Goals
• Goal for a relational database design is:
– BCNF.
– Lossless join.
– Dependency preservation.
• If we cannot achieve this, we accept:
– 3NF.
– Lossless join.
– Dependency preservation.
Summary
• FDs give important constraints on db data
• FDs are useful for finding relation keys
• attribute closure can be computed to decide if an
FD is true or not
• in a good relation, the only FDs that hold are those
where the LHS is a superkey
• otherwise, decompose the relation by making a
separate relation for any FD with a nonkey LHS
• we can always find lossless join decompositions
into 3NF/BCNF, and we can always find
dependency preserving decompositions into 3NF.
But sometimes there is no dependency preserving
decomposition into BCNF.
