Plugin Databases Logical Design

Relational Database Design
Logical design of an RDB

1. Consult with clients
2. Produce an ER model
3. Translate ER model into relation schema
4. Convert resulting relations to 3NF (3rd normal
form) or BCNF (Boyce-Codd Normal Form)
As usual, there’s much iteration, but these are the 4
main steps.
NB: We don’t consider physical design at all in this
course, but it is crucial for good performance.
What problems arise from poor
relation scheme designs?
• the same fact is stored in more than one place in the
database
• this can lead to inconsistent copies, & no way of knowing
which is correct
• sometimes data cannot be inserted in the db because there
is no value for an attribute which cannot be null
• sometimes a fact cannot be deleted from the db without
losing other information with it that cannot be kept
anywhere else in the db
• the DBMS cannot cache data well if relations are badly
designed – it can be forced to waste space on unused items
SID → Sname Rating Age
BID → Bname Fee Location
SID BID Day → Deposit
Sname   SID  Bname     BID  Day   Deposit  Fee  Location   Rating  Age
Marx    23   Wayfarer  109  1/8   120      120  Hout Bay   8       52
Marx    23   SeaPride  108  8/8   120      500  Fish Hoek  8       52
Martin  25   Yuppie    101  8/8   0        400  Hout Bay   9       51
Adams   27   Yuppie    101  9/8   100      400  Hout Bay   8       36
Adams   27   Wayfarer  109  15/8  120      120  Hout Bay   8       36
Carrey  33   Wayfarer  109  4/9   0        120  Hout Bay   10      22
Carrey  33   Joy       104  11/9  0        200  Hout Bay   10      22

Functional Dependencies (FDs)
• An FD A → B means that each instance of
attribute A is associated with at most 1 unique
value of B in the real world
• We will use A, B, C, etc. for individual attributes;
X, Y, Z, W etc. for groups of attributes; R to mean
all attributes of the relation
• If A → B it does not mean that B → A. E.g. if
Account → Owner then maybe Owner → Account
or maybe Owner ↛ Account: we must ask the
client what the business rules are.
Keys
• If X → R then X is a superkey for R
• If X → R and there is no proper subset Y of X such
that Y → R, then X is a candidate key for R
• One of the candidate keys for each relation must
be chosen as the primary key for that relation.
• If Z is a foreign key in relation R then there is
some other relation, S, such that Z is the primary
key of relation S. We use Z in R to represent a
relationship between R entities and S entities.
Keys: example
R (sno, sname, pno, cost, day, quantity)
P (pno, pname, selling, total)
• Candidate keys for R are sno or sname.
• Candidate keys for P are pno or pname.
• Primary key for R is sno, primary key for P is pno
• pno is a foreign key in R, indicating which part in
P was delivered by that supplier on that day.
• If pname were the primary key of P instead, R
must be R ( sno, sname, pname, cost, day,
quantity).
FDs continued
• An FD is just a “to one” mapping
• Armstrong’s axioms for FDs:
Reflexivity: AB → A is trivially always true
Augmentation: if A → B then AX → BX
Transitivity: if A → B and B → C then A → C
• if A → B and A → C then A → BC (union)
• if A → BC then A → B and A → C (decomp.)
• if A → B and BX → C then AX → C
(pseudotransitivity)
Example : using FD axioms
Given the FDs SC → PMG, SL → C, CT → L, TL → C, SP → C,
show that SP → M
1. SP → C (given)
2. SP → S (reflexivity)
3. SP → SC (union of 1. and 2.)
4. SP → PMG (transitivity of 3. & first FD above)
5. SP → M (decomposition of 4.)
Closure of attributes
• We denote by X+ the set of all attributes
functionally determined by X
• X+ is called the closure of X
• To find whether Y is a superkey for relation R, we
see if Y+ contains all attributes of R
• To see if X → Y is true or not, just find X+ and
see if it includes all the attributes of Y
• X+ are all the values that follow uniquely once the
value of X is known (everything X maps onto
uniquely)
Finding X+
1. Let ans = X
2. For every FD Y → Z s.t. Y ⊆ ans, add Z
to ans
3. Repeat step 2 until no more attributes can
be added to ans
4. ans is now the closure of X
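The four steps above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name attribute_closure, the representation of each FD as a (lhs, rhs) pair of strings, and the FDS list are assumptions made for the example. The FDs themselves are the five used in the worked examples (SC → PMG, SL → C, CT → L, TL → C, SP → C).

```python
def attribute_closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `attrs` is an iterable of single-letter attribute names and `fds` is a
    list of (lhs, rhs) pairs, e.g. ("SC", "PMG") for SC -> PMG.
    (This representation is an assumption made for the sketch.)
    """
    ans = set(attrs)                      # step 1: ans = X
    changed = True
    while changed:                        # step 3: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:              # step 2: every FD Y -> Z with Y inside ans
            if set(lhs) <= ans and not set(rhs) <= ans:
                ans |= set(rhs)
                changed = True
    return ans                            # step 4: ans is the closure of X


# FDs from the worked examples: SC -> PMG, SL -> C, CT -> L, TL -> C, SP -> C
FDS = [("SC", "PMG"), ("SL", "C"), ("CT", "L"), ("TL", "C"), ("SP", "C")]

closure_sl = attribute_closure("SL", FDS)
print(closure_sl == set("SLCPMG"))        # is (SL)+ equal to SLCPMG?
print(set("PG") <= closure_sl)            # does SL -> PG hold?
print(closure_sl >= set("SCPMGLT"))       # is SL a superkey of R(SCPMGLT)?
```

Testing whether X → Y holds then reduces to a subset check on the closure, and testing for a superkey reduces to checking that the closure covers every attribute of R.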
Example : finding attribute closure
What is the closure of SL ?
start with ans = {SL}
using 2nd FD, ans = {SLC}
using 1st FD, ans = {SLCPMG}
no more attributes can be added so (SL)+ is SLCPMG
Is SL a superkey for R(SCPMGLT)? no, T is not in (SL)+ so SL ↛ T
Does SL → PG? yes, because PG are in (SL)+
Summary so far
• FDs tell us important constraints on the data
• FDs can be used to check if attributes form
a candidate key for a relation
• attribute closure is easy to compute
• to see if an FD X → Y is true, find X+ and see
if it contains all of Y
• Next: FDs can also tell us if a relation
scheme is good or bad
Boyce-Codd Normal Form
A relation R is in BCNF if and only if, for every FD
X → Y that holds on R:
X → Y is trivial (i.e. Y ⊆ X)
or X is a superkey for R
(i.e. the only to-one relationships that hold among R’s
attributes are fundamental properties of R entities,
there are no additional/extraneous relationships in R)
Third Normal Form
A relation R is in 3NF if and only if, for every FD
X → Y that holds on R:
X → Y is trivial (i.e. Y ⊆ X)
or X is a superkey for R
or every attribute of Y that is not in X is a prime attribute
(an attribute in relation R is a prime attribute if that
attribute forms part of some candidate key for R)
Is R(SCPMGLT) a good design?
Is R in BCNF? We need to check the closure of each LHS to
see if it is a key for R or not.
(SC)+ = SCPMG is not a key, so this is not BCNF
[in fact not one LHS is a key for R in the FDs above!]
Is R in 3NF? We need to know the candidate keys:
(SCT)+ = SCTPMGL. (SLT)+ = SLTCPMG.
(SPT)+ = SPTCMGL. So keys are SCT, SLT or SPT.
So R is not in 3NF as e.g. SC → PMG and neither M nor G is
a prime attribute. Note: the other RHS’s give no problem
as C and L are prime attributes.
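The check above can be reproduced mechanically. The sketch below is an illustration under stated assumptions, not the course's own code: the helper names (closure, candidate_keys, is_bcnf, is_3nf) are hypothetical, FDs are written as (lhs, rhs) string pairs, and candidate keys are found by brute force over attribute subsets, which is only practical for small relations like this one. It tests only the listed FDs, which is enough when checking the original relation against its own FD set.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) pairs)."""
    ans = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= ans and not set(rhs) <= ans:
                ans |= set(rhs)
                changed = True
    return ans


def candidate_keys(relation, fds):
    """Brute force: try attribute subsets from smallest to largest."""
    rel = set(relation)
    keys = []
    for size in range(1, len(rel) + 1):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue                  # contains a smaller key, so not minimal
            if closure(x, fds) >= rel:
                keys.append(x)
    return keys


def is_bcnf(relation, fds):
    """Every listed FD is trivial or has a superkey on its left-hand side."""
    rel = set(relation)
    return all(set(rhs) <= set(lhs) or closure(lhs, fds) >= rel
               for lhs, rhs in fds)


def is_3nf(relation, fds):
    """As BCNF, but an FD is also allowed if its new RHS attributes are prime."""
    rel = set(relation)
    prime = set().union(*candidate_keys(relation, fds))
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs) or closure(lhs, fds) >= rel:
            continue
        if not (set(rhs) - set(lhs)) <= prime:
            return False
    return True


FDS = [("SC", "PMG"), ("SL", "C"), ("CT", "L"), ("TL", "C"), ("SP", "C")]
keys = candidate_keys("SCPMGLT", FDS)
print(sorted("".join(sorted(k)) for k in keys))
print(is_bcnf("SCPMGLT", FDS), is_3nf("SCPMGLT", FDS))
```

The keys are printed with their letters sorted alphabetically, so SCT, SLT and SPT appear as CST, LST and PST.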
Example 2
R (City, Suburb, Postalcode) or R(CSP)
Only 2 FDs hold: CS → P and P → C
R is not in BCNF because of P → C: P is
not a key for R.
R is in 3NF: in P → C, although P is not a key
for R, the RHS attribute C is prime (as CS is
a key for R)
Converting to BCNF
1. If R is not in BCNF then there is an FD
X → Y that violates BCNF. Replace R with
R1(X,Y) and R2(R – Y)
2. Repeat until no more FDs violate BCNF.
The algorithm for 3NF is identical – just work
with FDs that violate 3NF.
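The BCNF version of this loop might be sketched as below. Several things here are assumptions made for illustration rather than the slides' method: the names find_violation and bcnf_decompose are hypothetical, violations are searched for by brute force over attribute subsets (fine for small schemes only), and Y is taken to be everything X determines inside the relation rather than the right-hand side of one listed FD. The result therefore depends on which violation is found first and may differ from a hand-worked decomposition.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) pairs)."""
    ans = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= ans and not set(rhs) <= ans:
                ans |= set(rhs)
                changed = True
    return ans


def find_violation(rel, fds):
    """Return (X, Y) where X -> Y holds on `rel`, is non-trivial, and X is
    not a superkey of `rel`; return None if `rel` is in BCNF."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            determined = closure(x, fds) & rel    # X+ restricted to rel
            if determined != x and determined != rel:
                return x, determined - x
    return None


def bcnf_decompose(relation, fds):
    """Step 1 and step 2: split on a violating FD until none remains."""
    todo, done = [frozenset(relation)], []
    while todo:
        rel = todo.pop()
        violation = find_violation(set(rel), fds)
        if violation is None:
            done.append(rel)
        else:
            x, y = violation
            todo.append(frozenset(x | y))         # R1(X, Y)
            todo.append(frozenset(rel - y))       # R2(R - Y)
    return done


# R(City, Suburb, Postalcode) with CS -> P and P -> C
parts = bcnf_decompose("CSP", [("CS", "P"), ("P", "C")])
print(sorted("".join(sorted(p)) for p in parts))
```

Each split produces two strictly smaller relations, so the loop terminates, and every relation it returns has no remaining violation.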
Example converting to BCNF
(SC)+ = SCPMG is not a key, violates BCNF.
Replace R by R1(SCPMG) R2(SCLT)
(SL)+ = SLCPMG is not a key, violates BCNF in R2.
Replace R2, getting R1(SCPMG) R3(SLC) R4(SLT)
(Now SC → PMG is ok as SC is a key for R1.
SP → C is ok as SP is a key for R1. SL → C is ok as SL
is a key for R3. CT → L and TL → C are ok as
these 3 never appear together in any relation.)
Example 2
R (City, Suburb, Postalcode) or R(CSP)
Only 2 FDs hold: CS → P and P → C
P → C violates BCNF. Replace R with
R1(PC) and R2(SP) to obtain BCNF.
But that’s a very inconvenient database!
Many people prefer 3NF and would rather keep
R(CSP) as it was originally.
Relation scheme decomposition
A decomposition of a relation scheme R into relation schemes
R1, R2, ...
1. must not lose any attributes (i.e. every attribute of R
must appear in one of the new relations)
2. should be a lossless join decomposition
3. should be a dependency preserving decomposition (i.e.
every FD that holds for R should be enforceable using
only one of the new relations, without needing a join)
While (1) and (2) above are always possible, (3) is not always
possible. For example, some BCNF decompositions
aren’t dependency preserving – see e.g. that of R(CSP)
in the earlier example.
Lossless join decompositions
• If R is decomposed into R1 and R2 then this is a
lossless join decomposition if and only if the join
of R1 and R2 gives exactly the original tuples of R
• If it is a lossless join decomposition, then either
(R1 ∩ R2) → R1 or (R1 ∩ R2) → R2 i.e. the
common attributes are a key of one of the relations
• Otherwise, when we join up R1 and R2, we get
additional/extraneous/nonsense tuples!
• The BCNF/3NF decomposition algorithm always
gives a lossless join decomposition.
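For a split into two relations, the condition in the second bullet can be tested with attribute closure. The following is a small sketch under assumptions: the function name is_lossless is hypothetical, attributes are single letters, and FDs are (lhs, rhs) string pairs. It is shown on the R(CSP) example with CS → P and P → C.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) pairs)."""
    ans = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= ans and not set(rhs) <= ans:
                ans |= set(rhs)
                changed = True
    return ans


def is_lossless(r1, r2, fds):
    """True if the common attributes determine all of r1 or all of r2."""
    common = set(r1) & set(r2)
    determined = closure(common, fds)
    return determined >= set(r1) or determined >= set(r2)


CSP_FDS = [("CS", "P"), ("P", "C")]

# R1(PC), R2(SP): the common attribute P determines all of R1
print(is_lossless("PC", "SP", CSP_FDS))

# R1(CS), R2(SP): the common attribute S determines nothing else
print(is_lossless("CS", "SP", CSP_FDS))
```

The first split is the BCNF decomposition of R(CSP) from the example; the second shares only S, which is a key of neither side, so joining it back can produce extra tuples.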
E.g. relation scheme decomposition
R (sno, sname, pno, pname, cost, selling, quantity)
R1(sno,sname,cost) and R2(pno, pname, selling) is not a valid
decomposition as the “quantity” attribute has been lost.
R1(sno,sname,cost) and R2(pno, pname, selling, quantity, sno)
is not a lossless join decomposition unless sno → sname,cost
because otherwise we don’t know which supplier supplied a
part.
If cost → selling then this is also not a dependency preserving
decomposition, because we must join R1 and R2 to enforce
the cost → selling constraint.
Example (one 3NF relation)
S          P     C
Rosebank   7700  CT
Rosebank   1100  Jbg
Claremont  7700  CT
Claremont  1200  Jbg
Example (BCNF decomposition)
Relation (PS):
P     S
7700  Rosebank
1100  Rosebank
7700  Claremont
1200  Claremont
3400  Claremont

Relation (PC):
P     C
7700  CT
1100  Jbg
1200  Jbg
3400  Jbg
Relation (PC) does not violate P → C. But taking the 2
relations together and joining on P value, we see that they
violate CS → P because Claremont in Jbg is associated
with two postalcodes viz. 1200 and 3400.
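The violation can be confirmed by joining the two tables and grouping on the left-hand side of each FD. This is only an illustrative sketch: the variable names and the helper fd_violations are assumptions, and the rows are copied from the two relations in this example.

```python
# Rows of the decomposed instance: relation (PC) and relation (PS)
pc = [("7700", "CT"), ("1100", "Jbg"), ("1200", "Jbg"), ("3400", "Jbg")]
ps = [("7700", "Rosebank"), ("1100", "Rosebank"),
      ("7700", "Claremont"), ("1200", "Claremont"), ("3400", "Claremont")]


def fd_violations(rows, lhs, rhs):
    """Map each LHS value to its set of RHS values, keeping only those
    LHS values that are associated with more than one RHS value."""
    seen = {}
    for row in rows:
        seen.setdefault(lhs(row), set()).add(rhs(row))
    return {key: vals for key, vals in seen.items() if len(vals) > 1}


# Natural join on P, producing (City, Suburb, Postalcode) tuples
joined = [(city, suburb, p1)
          for (p1, suburb) in ps
          for (p2, city) in pc
          if p1 == p2]

# P -> C checked on relation (PC) alone
print(fd_violations(pc, lambda r: r[0], lambda r: r[1]))

# CS -> P can only be checked after the join
print(fd_violations(joined, lambda r: (r[0], r[1]), lambda r: r[2]))
```

The first check needs only relation (PC), while the second needs the join, which is what it means for CS → P not to be preserved by this decomposition.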
Design Goals
• Goal for a relational database design is:
– BCNF.
– Lossless join.
– Dependency preservation.
• If we cannot achieve this, we accept:
– 3NF.
– Lossless join.
– Dependency preservation.
Summary
• FDs give important constraints on db data
• FDs are useful for finding relation keys
• attribute closure can be computed to decide if an
FD is true or not
• in a good relation, the only FDs that hold are those
where the LHS is a superkey
• otherwise, decompose the relation by making a
separate relation for any FD with a nonkey LHS
• we can always find lossless join decompositions
into 3NF/BCNF, and we can always find
dependency preserving decompositions into 3NF.
But sometimes there is no dependency preserving
decomposition into BCNF.
