
DataBases

DataBase Design
Normalization Process

DataBase course notes 8

Database Design

Conceptual Data Modeling


Logical Database Design
Normalization Process
Implementing Base Table Structures


NORMALIZATION PROCESS


Normalization
process of taking entities and attributes
that have been discovered and making
them suitable for the relational database
process does this by removing
redundancies and shaping data in manner
that the relational engine desires


Normalization
based on a set of levels, each of which
achieves a level of correctness or adherence to
a particular set of rules
rules formally known as forms, normal forms
First Normal Form (1NF)
which eliminates data redundancy, and continues
through to
Fifth Normal Form (5NF)
which deals with decomposition of ternary
relationships

Normalization
each level of normalization indicates an
increasing degree of adherence to the
recognized standards of database design
as you increase the degree of normalization of
your data, you'll naturally tend to create an
increasing number of tables of decreasing
width (fewer columns)


Why Normalize?
eliminate data that's duplicated, reducing the chance it won't match
when you need it
avoid unnecessary coding needed to keep duplicated
data in sync
keep tables thin: increase number of values that fit
on a page (8K), decrease number of reads needed
maximize use of clustered indexes to allow for
optimum data access and joins
lower number of indexes per table - indexes are
costly to maintain

Eliminating duplicated data


any piece of data that occurs more than once in
the database => increased probability for errors
to happen

Eliminating anomalies
INSERT
DELETE
UPDATE

Easy to keep database consistent;


Easy to preserve the integrity of the database

Functional dependencies
Consider R(A1, A2, ..., An) a relation schema and
X, Y ⊆ {A1, A2, ..., An}
Definition:
The attribute X functionally determines the attribute Y, X -> Y,
if and only if for any value of X, there is only one value
of Y corresponding to X.
The functional dependency X -> Y is total if there isn't any
Z, Z ⊂ X, such that Z -> Y; otherwise, it is partial

Observations:
If X -> Y, then, for any Z, Z ⊆ Y, we have X -> Z
If X -> Y and X is a simple attribute, then Y is totally
(functionally) dependent on X.
If Y is totally dependent on Z, then we have X -> Y for
every composite attribute X that contains Z.
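
The definition can be tried out on sample data. The sketch below is only an illustration: the function name `holds` and the sample rows are invented here, and a check on one instance can only refute a dependency, not prove that it holds by design.

```python
def holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in this instance.

    rows is a list of dicts; lhs and rhs are tuples of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # A second, different Y value for the same X value breaks X -> Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data, invented for illustration only.
books = [
    {"Isbn": "111", "Title": "SQL Basics", "Publisher": "Acme", "City": "Boston"},
    {"Isbn": "222", "Title": "Data Models", "Publisher": "Acme", "City": "Boston"},
    {"Isbn": "333", "Title": "SQL Basics", "Publisher": "Orbit", "City": "Austin"},
]

print(holds(books, ("Isbn",), ("Title",)))       # True
print(holds(books, ("Publisher",), ("City",)))   # True
print(holds(books, ("Title",), ("Publisher",)))  # False
```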

Armstrong's axioms

A1 (Reflexivity)
If Y ⊆ X => X -> Y

A2 (Augmentation)
If X -> Y => XZ -> YZ

A3 (Transitivity)
If X -> Y and Y -> Z => X -> Z
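
A common use of these axioms is computing the closure of a set of attributes, that is, everything the set functionally determines. The sketch below is a minimal version of that idea; the function name `closure` and the dependencies over A..E are hypothetical and not taken from the notes.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under the dependencies in fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)  # reflexivity: attrs determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once all of lhs is determined, rhs is determined as well.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Hypothetical dependencies: A -> B, B -> C, CD -> E.
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]

print(sorted(closure({"A"}, fds)))       # ['A', 'B', 'C']
print(sorted(closure({"A", "D"}, fds)))  # ['A', 'B', 'C', 'D', 'E']
```

If the closure of X contains every attribute of the relation, X is a key or includes a key.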

Process of Normalization
take entities that are complex and extract
simpler entities from them
continues until every table in database
represents one thing (simple entity) and
every column describes that thing


3 categories of normalization steps


entity and attribute shape
relationships between attributes
multi-valued and join dependencies in
entities


Entity and attribute shape


First Normal Form
all attributes must be atomic, that is, only a
single value represented in a single
attribute in a single instance of an entity
all instances of an entity must contain the
same number of values
all instances of an entity must be different


First Normal Form


violations =>
data handling not optimal - having to
decode multiple values stored where
a single one should be
having duplicated rows that cannot be
distinguished from one another


for example, consider a group of data like 1, 2, 3, 5, 7
likely represents five separate values
the test of atomicity is to consider whether you would ever need to
deal with part of a column without the other parts of the data in
that same column
if the 1, 2, 3, 5, 7 list is always treated as a single value, it might
be acceptable to store the value in a single column
if you might need to deal with value 3 individually, then
the list is definitely not in First Normal Form
if there is no plan to use list elements individually, you should
consider whether it is still better to store each value individually to
allow for future possible usage
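
As a rough illustration of storing each value individually, the sketch below splits a delimited list into one row per element. The column names `Id` and `Numbers` and the helper `split_list_column` are assumptions made for the example.

```python
def split_list_column(rows, key, list_column, sep=","):
    """Turn each delimited list value into one (key, value) row per element."""
    result = []
    for row in rows:
        for item in row[list_column].split(sep):
            result.append((row[key], int(item.strip())))
    return result


# Hypothetical row holding the list from the example in a single column.
raw = [{"Id": 1, "Numbers": "1, 2, 3, 5, 7"}]
atomic = split_list_column(raw, "Id", "Numbers")

print(atomic)  # [(1, 1), (1, 2), (1, 3), (1, 5), (1, 7)]
# Value 3 can now be addressed on its own, without parsing a string.
print([value for _, value in atomic if value == 3])  # [3]
```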

E-Mail Addresses
name1@domain1.com
AccountName: name1
Domain: domain1.com


E-Mail Addresses
if all you'll ever do is send e-mail, then a
single column is perfectly acceptable
if you need to consider what domains you
have e-mail addresses stored for =>
access individual parts, then it's a
completely different matter


Telephone Numbers
AAA-EEE-NNNN (XXXX):
AAA area code - indicates calling area located
within a state
EEE exchange - indicates a set of numbers
within an area code
NNNN number - used to make individual phone
numbers unique
XXXX extension - number that must be dialed
after connecting

Mailing Addresses


All instances in entity contain same number of values

entities have a fixed number of attributes
and tables have a fixed number of columns
entities should be designed such that
every attribute has a fixed number of
values associated with it
an example of a violation of this rule: entities that
have several attributes with the same base name
suffixed (or prefixed) with a number, such as
Payment1, Payment2, and so on
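
One way to remove such numbered attributes is to move them into rows of a separate entity. The sketch below assumes a hypothetical `CustomerId` key and invented payment amounts; only the `Payment1`, `Payment2` naming comes from the notes.

```python
def unpivot_payments(rows, key="CustomerId", prefix="Payment"):
    """Turn Payment1, Payment2, ... columns into (key, number, amount) rows."""
    result = []
    for row in rows:
        for column, value in row.items():
            # Empty slots simply produce no row instead of a NULL column.
            if column.startswith(prefix) and value is not None:
                number = int(column[len(prefix):])
                result.append((row[key], number, value))
    return result


# Hypothetical wide rows with a fixed number of payment slots.
wide = [
    {"CustomerId": 1, "Payment1": 100.0, "Payment2": 50.0},
    {"CustomerId": 2, "Payment1": 75.0, "Payment2": None},
]

print(unpivot_payments(wide))
# [(1, 1, 100.0), (1, 2, 50.0), (2, 1, 75.0)]
```

A customer with a third payment then needs one more row, not a new column.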

Programming Anomalies
avoided by First Normal Form
modifying lists in single column
modifying multipart values
dealing with a variable number of facts in
an instance


Clues that design is not in First Normal Form
string data that contains separator-type
characters
attribute names with numbers at the end
tables with no or poorly defined keys


Relationships Between
Attributes
Second Normal Form
relationships between non-key attributes and part of
the primary key

Third Normal Form


relationships between non-key attributes

BCNF (Boyce-Codd Normal Form)
relationships between non-key attributes and any key
Non-key attributes must provide a detail about the key,
the whole key, and nothing but the key.

Second Normal Form


entity must be in First Normal Form.
each attribute must be a fact describing
the entire key
technically relevant only when a composite
key (a key composed of two or more
columns) exists in the entity
Definition: A relation R is in the second
normal form (2NF) if it is in 1NF and every
non-key attribute is totally dependent on every
key of the relation

Each non-key attribute must describe entire key


BookIsbnNumber attribute uniquely identifies
book
AuthorSocialSecurityNumber uniquely identifies
author
the two columns together create a key that uniquely identifies
an author for a book
BookTitle describes book
but doesn't describe author at all
AuthorFirstName and AuthorLastName describe
author, but not book

BookIsbnNumber -> BookTitle
AuthorSocialSecurityNumber -> AuthorFirstName
AuthorSocialSecurityNumber -> AuthorLastName
BookIsbnNumber, AuthorSocialSecurityNumber -> RoyaltyPercentage
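
These dependencies suggest splitting the entity into three: one per determinant. The sketch below shows that split as plain projections; the helper `project` and all sample rows are invented for illustration, and only the attribute names come from the notes.

```python
def project(rows, columns):
    """Project dict rows onto columns, dropping duplicate tuples."""
    result = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in result:
            result.append(values)
    return result


columns = ("BookIsbnNumber", "AuthorSocialSecurityNumber", "BookTitle",
           "AuthorFirstName", "AuthorLastName", "RoyaltyPercentage")
# Hypothetical rows, invented for illustration only.
book_author = [dict(zip(columns, values)) for values in [
    ("111", "A1", "SQL Basics", "Ann", "Lee", 60),
    ("111", "A2", "SQL Basics", "Bob", "Ray", 40),
    ("222", "A1", "Data Models", "Ann", "Lee", 100),
]]

# One entity per determinant: the book, the author, and the pair of both.
book = project(book_author, ("BookIsbnNumber", "BookTitle"))
author = project(book_author, ("AuthorSocialSecurityNumber",
                               "AuthorFirstName", "AuthorLastName"))
royalty = project(book_author, ("BookIsbnNumber",
                                "AuthorSocialSecurityNumber",
                                "RoyaltyPercentage"))

print(book)          # [('111', 'SQL Basics'), ('222', 'Data Models')]
print(author)        # [('A1', 'Ann', 'Lee'), ('A2', 'Bob', 'Ray')]
print(len(royalty))  # 3
```

Each title and each author name is now stored once, however many book-author pairs exist.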

Programming problems avoided


all programming issues that arise with
Second Normal Form (as well as Third
and Boyce-Codd Normal Forms) deal with
functional dependencies that can end up
corrupting data


the same author's information would have to
be duplicated amongst all books
cannot delete only the book and keep the author
around
cannot insert only an author without a book


Anomalies
UPDATE
duplicate data, have to update multiple rows

INSERT
cannot insert data for an entity without
relationship to any other entity

DELETE
cannot delete data for an entity without risk of
losing info about related entity

Clues that entity is not in Second Normal Form
repeating key attribute name prefixes,
indicating that values are probably
describing some additional entity
data in repeating groups, showing signs of
functional dependencies between
attributes
composite keys without foreign key, which
might be sign you have key values that
identify multiple things

Third Normal Form


entity must be in Second Normal Form.
non-key attributes cannot describe other
non-key attributes
Definition: A relation R is in the third
normal form (3NF) if it is in 2NF and none
of the non-key attributes is functionally
dependent on another non-key attribute of
the relation.

non-key attributes cannot describe


other non-key attributes

PublisherName -> PublisherCity



Title defines title for the book defined by
BookIsbnNumber
Price indicates price of the book
PublisherName describes the book's publisher
PublisherCity also sort of describes something
about the book, in that it tells where the
publisher was located
doesn't make sense in this context, because
location of publisher is directly dependent on
what publisher is represented by PublisherName

Anomalies
INSERT
- cannot register a publisher unless there is a book that belongs
to that publisher

DELETE
- if we delete the only book of a certain publisher, we lose
all the information referring to that publisher

UPDATE
- the information referring to a certain publisher is redundant;
if we want to update the information of a publisher, we must
perform the same operation for all the books that belong to that
publisher

Publisher entity has data concerning only
the publisher
Book entity has book information
now if we want to add information to our
schema concerning the publisher, contact
information or address, it's obvious where we
add that information
City attribute clearly identifies the publisher,
not the book
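
A minimal sketch of the two-entity design, using Python's built-in `sqlite3` module. The column names follow the attributes named in these notes; the column types and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publisher (
    PublisherName TEXT PRIMARY KEY,
    PublisherCity TEXT NOT NULL
);
CREATE TABLE Book (
    BookIsbnNumber TEXT PRIMARY KEY,
    Title          TEXT NOT NULL,
    Price          REAL NOT NULL,
    PublisherName  TEXT NOT NULL REFERENCES Publisher(PublisherName)
);
-- Hypothetical sample rows, invented for illustration only.
INSERT INTO Publisher VALUES ('Acme', 'Boston');
INSERT INTO Publisher VALUES ('Orbit', 'Denver');  -- publisher with no book yet
INSERT INTO Book VALUES ('111', 'SQL Basics', 30.0, 'Acme');
INSERT INTO Book VALUES ('222', 'Data Models', 45.0, 'Acme');
""")

# The city is stored once, so a single-row update reaches every book.
conn.execute(
    "UPDATE Publisher SET PublisherCity = 'Austin' WHERE PublisherName = 'Acme'"
)

cities = conn.execute("""
    SELECT b.Title, p.PublisherCity
    FROM Book AS b
    JOIN Publisher AS p ON p.PublisherName = b.PublisherName
    ORDER BY b.BookIsbnNumber
""").fetchall()
print(cities)  # [('SQL Basics', 'Austin'), ('Data Models', 'Austin')]
```

The 'Orbit' row shows that a publisher can be registered before any of its books exist.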


Clues that entities are not in Third Normal Form
multiple attributes with same prefix
much like Second Normal Form, only this time
not in the key

repeating groups of data


summary data that refers to data in a
different entity altogether
Price in Invoice as
SUM(Quantity*ProductCost) from LineItems
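
One alternative to storing such a summary column is deriving it when needed. The sketch below does this with a view in SQLite through Python's `sqlite3` module; the `InvoiceId` and `LineNumber` columns, the view name `InvoiceTotal`, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LineItems (
    InvoiceId   INTEGER NOT NULL,
    LineNumber  INTEGER NOT NULL,
    Quantity    INTEGER NOT NULL,
    ProductCost REAL    NOT NULL,
    PRIMARY KEY (InvoiceId, LineNumber)
);
-- The total is derived on demand, so it cannot drift out of sync.
CREATE VIEW InvoiceTotal AS
    SELECT InvoiceId, SUM(Quantity * ProductCost) AS Price
    FROM LineItems
    GROUP BY InvoiceId;
-- Hypothetical sample rows, invented for illustration only.
INSERT INTO LineItems VALUES (1, 1, 2, 10.0);
INSERT INTO LineItems VALUES (1, 2, 1, 5.5);
INSERT INTO LineItems VALUES (2, 1, 3, 4.0);
""")

totals = conn.execute(
    "SELECT InvoiceId, Price FROM InvoiceTotal ORDER BY InvoiceId"
).fetchall()
print(totals)  # [(1, 25.5), (2, 12.0)]
```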

Boyce-Codd Normal Form

Ray Boyce, Edgar F. Codd


entity is in First Normal Form.
all attributes are fully dependent on a key
every determinant is a key


Entity in BCNF if every Determinant is key

Determinant: any attribute or
combination of attributes on which any
other attribute or combination of attributes
is functionally dependent.
BCNF extends previous normal forms by
saying that each entity might have many
keys, and all attributes must be dependent
on one of these keys

a Third Normal Form table which does not have
multiple overlapping candidate keys is
guaranteed to be in BCNF
a Third Normal Form table with two or more
overlapping candidate keys may or may not
be in BCNF

Definition: A relation R is in the Boyce-Codd
Normal Form (BCNF) if, for every functional
dependency X -> A in R, where A is an attribute
that doesn't belong to X, X is a key or includes a
key of R.

Court Bookings
Court   Start Time   End Time   Rate Type
1       09:30        10:30      SAVER
1       11:00        12:00      SAVER
1       14:00        15:30      STANDARD
2       10:00        11:30      PREMIUM-B
2       11:30        13:30      PREMIUM-B
2       15:00        16:30      PREMIUM-A


Court Bookings
hard court (Court 1) and grass court (Court 2)
booking defined by Court and period for
which the Court is reserved
booking has Rate Type associated
SAVER for hard court bookings made by members
STANDARD for hard court bookings made by non-members
PREMIUM-A for grass court bookings made by members
PREMIUM-B for grass court bookings made by non-members

Court Bookings - candidate keys

{Court, Start Time}


{Court, End Time}
{Rate Type, Start Time}
{Rate Type, End Time}


table adheres to both 2NF and 3NF
table does not adhere to BCNF
because of dependency Rate Type -> Court,
in which the determining attribute
(Rate Type) is neither a candidate key, nor
a superset of a candidate key
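
The violation can be observed on the sample bookings. The sketch below assumes the court numbers 1 (hard) and 2 (grass) from the rate type descriptions; the helper names are invented, and a check on one instance only shows that the data is consistent with the dependency.

```python
def holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in this instance."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_superkey(rows, attrs):
    """attrs is a superkey of this instance if no two rows agree on attrs."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(keys)) == len(keys)


columns = ("Court", "StartTime", "EndTime", "RateType")
# Court numbers are assumed: 1 for the hard court, 2 for the grass court.
bookings = [dict(zip(columns, values)) for values in [
    (1, "09:30", "10:30", "SAVER"),
    (1, "11:00", "12:00", "SAVER"),
    (1, "14:00", "15:30", "STANDARD"),
    (2, "10:00", "11:30", "PREMIUM-B"),
    (2, "11:30", "13:30", "PREMIUM-B"),
    (2, "15:00", "16:30", "PREMIUM-A"),
]]

# Rate Type determines Court, yet Rate Type alone does not identify a row.
print(holds(bookings, ("RateType",), ("Court",)))     # True
print(is_superkey(bookings, ("RateType",)))           # False
print(is_superkey(bookings, ("Court", "StartTime")))  # True
```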


Rate Types
Rate Type   Court   Member Flag
SAVER       1       Yes
STANDARD    1       No
PREMIUM-A   2       Yes
PREMIUM-B   2       No

Court Bookings
Court   Start Time   End Time   Member Flag
1       09:30        10:30      Yes
1       11:00        12:00      Yes
1       14:00        15:30      No
2       10:00        11:30      No
2       11:30        13:30      No
2       15:00        16:30      Yes


candidate keys for Rate Types table are
{Rate Type} and {Court, Member Flag}
candidate keys for Court Bookings table
are {Court, Start Time} and {Court, End
Time}
both tables are in BCNF
having one Rate Type associated with two
different Courts is now impossible
anomaly affecting original table has been
eliminated

Multivalue Dependencies
Third Normal Form is generally considered
pinnacle of proper database design
serious problems might still remain in
logical design


Definition
We say that there exists a multivalued dependency of
the attribute Z on Y, or that Y multidetermines Z, Y ->-> Z,
if, for all values x1, x2, y, z1, z2, where x1 ≠ x2, z1 ≠ z2,
such that the tuples (x1, y, z1) and (x2, y, z2) belong to R,
the tuples (x1, y, z2) and (x2, y, z1) also belong to R.
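
The swap condition in this definition translates directly into a check over pairs of tuples. The sketch below is one way to write it; the function name `mvd_holds` is invented, it assumes a non-empty list of rows, and the sample rows are a small excerpt in the style of a restaurant, pizza variety, delivery area table.

```python
from itertools import product


def mvd_holds(rows, y_attrs, z_attrs):
    """Check Y ->-> Z on rows (non-empty list of dicts) by the swap test."""
    x_attrs = [a for a in rows[0] if a not in y_attrs and a not in z_attrs]

    def part(row, attrs):
        return tuple(row[a] for a in attrs)

    existing = {(part(r, x_attrs), part(r, y_attrs), part(r, z_attrs))
                for r in rows}
    # For every two tuples sharing y, the tuple with swapped z must exist.
    # Iterating over ordered pairs covers both swapped tuples.
    for (x1, y1, z1), (x2, y2, z2) in product(existing, repeat=2):
        if y1 == y2 and (x1, y1, z2) not in existing:
            return False
    return True


columns = ("Restaurant", "PizzaVariety", "DeliveryArea")
menu = [dict(zip(columns, values)) for values in [
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
]]

print(mvd_holds(menu, ("Restaurant",), ("PizzaVariety",)))       # True
# Dropping one combination breaks the dependency.
print(mvd_holds(menu[:-1], ("Restaurant",), ("PizzaVariety",)))  # False
```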

Fourth Normal Form


entity must be in BCNF
there must not be more than one
multivalue dependency between an
attribute and the key of the entity
Definition: A relation R is in the fourth
normal form if, for every multivalued dependency
X ->-> Y, X is a key or includes a key of R.


Fourth Normal Form


table is in 4NF if and only if, for every one
of its non-trivial multivalued dependencies
X ->-> Y, X is a superkey, that is, X is either a
candidate key or a superset thereof


Fourth Normal Form violations


ternary relationships
lurking multivalued attributes


Restaurant         Pizza Variety   Delivery Area
A1 Pizza           Thick Crust     Springfield
A1 Pizza           Thick Crust     Shelbyville
A1 Pizza           Thick Crust     Capital City
A1 Pizza           Stuffed Crust   Springfield
A1 Pizza           Stuffed Crust   Shelbyville
A1 Pizza           Stuffed Crust   Capital City
Elite Pizza        Thin Crust      Capital City
Elite Pizza        Stuffed Crust   Capital City
Vincenzo's Pizza   Thick Crust     Springfield
Vincenzo's Pizza   Thick Crust     Shelbyville
Vincenzo's Pizza   Thin Crust      Springfield
Vincenzo's Pizza   Thin Crust      Shelbyville


table has no non-key attributes
meets all normal forms up to BCNF
not in 4NF, because of the non-trivial multivalued
dependencies
{Restaurant} ->-> {Pizza Variety}
{Restaurant} ->-> {Delivery Area}
decompose the table to eliminate possibility of anomalies


Anomalies
INSERT
If we add a certain kind of pizza offered by a certain
restaurant, then we have to repeat this information for
every delivery area corresponding to that restaurant
DELETE
If we delete the information that corresponds to the only pizza
offered by a certain restaurant, then we also lose the
information about all the areas that restaurant delivers to
UPDATE
If we want to update the name of a pizza offered by a certain
restaurant, then we have to update this name for all the
corresponding delivery areas of that restaurant

Restaurant         Pizza Variety
A1 Pizza           Thick Crust
A1 Pizza           Stuffed Crust
Elite Pizza        Thin Crust
Elite Pizza        Stuffed Crust
Vincenzo's Pizza   Thick Crust
Vincenzo's Pizza   Thin Crust

Restaurant         Delivery Area
A1 Pizza           Springfield
A1 Pizza           Shelbyville
A1 Pizza           Capital City
Elite Pizza        Capital City
Vincenzo's Pizza   Springfield
Vincenzo's Pizza   Shelbyville

4th NORMAL FORM (4NF) - OK
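
That this decomposition loses no information can be checked by joining the two projections back together. The sketch below uses the twelve rows of the three-column table from these notes; the variable names are chosen for the example.

```python
# The original three-column table: (restaurant, variety, area).
original = {
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
}

# Project onto the two tables of the decomposition.
varieties = {(r, v) for r, v, _ in original}
areas = {(r, a) for r, _, a in original}

# Natural join of the two projections on Restaurant.
rejoined = {(r, v, a) for r, v in varieties for r2, a in areas if r == r2}

print(len(varieties), len(areas))  # 6 6
print(rejoined == original)        # True
```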



in contrast, if pizza varieties offered by a restaurant
sometimes did legitimately vary from one delivery area to
another, the original three-column table would satisfy
4NF


Fifth Normal Form


not every ternary relationship can be broken
down into two entities related to a third
aim of 5NF is to ensure that any ternary
relationships that still exist in 4NF, can be
decomposed into entities without loss of
information
eliminates problems with update anomalies due
to multivalued dependencies


Decomposition
R = (Professor, Discipline, Language), assumed to be in the fourth normal
form
R1 = (Professor, Discipline)
R2 = (Professor, Language)
R1 |><| R2 ≠ R
R3 = (Discipline, Language)
R1 |><| R2 |><| R3 = R
Join Dependency: Consider R(A1, A2, ..., An) a relation schema and
R1, R2, ..., Rk subsets of {A1, A2, ..., An}. There is a join dependency
called *(R1, R2, ..., Rk) if and only if any instance r of R is the
join of its projections on R1, R2, ..., Rk:

r = π_R1(r) |><| π_R2(r) |><| ... |><| π_Rk(r)

=> *(R1, R2, R3) is a join dependency on the relation R

A relation is in 5NF if and only if the join
dependencies that exist in the relation are
implied by a key of the relation
Evidence(Professor, Student, Discipline, Language,
Mark)
Key: Student, Discipline
decomposed, without loss of information, in
SDP(Student, Discipline, Professor)
SDL(Student, Discipline, Language)
SDM (Student, Discipline, Mark)
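
The Professor, Discipline, Language decomposition can be tried on a small instance. The four rows below are hypothetical and were chosen so that two projections are not enough while three are; they are not data from the notes.

```python
# Hypothetical instance of R(Professor, Discipline, Language).
r = {
    ("Smith", "Databases", "French"),
    ("Smith", "Networks", "English"),
    ("Jones", "Databases", "English"),
    ("Smith", "Databases", "English"),
}

r1 = {(p, d) for p, d, _ in r}  # (Professor, Discipline)
r2 = {(p, l) for p, _, l in r}  # (Professor, Language)
r3 = {(d, l) for _, d, l in r}  # (Discipline, Language)

# Join R1 and R2 on Professor, then join with R3 on Discipline and Language.
join12 = {(p, d, l) for p, d in r1 for p2, l in r2 if p == p2}
join123 = {t for t in join12 if (t[1], t[2]) in r3}

print(join12 == r)         # False
print(sorted(join12 - r))  # [('Smith', 'Networks', 'French')]
print(join123 == r)        # True
```

The two-way join invents a tuple that was never recorded; the third projection filters it out again.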

Denormalization
used primarily to improve performance in cases
where over-normalized structures are causing
overhead to the query processor
ask whether a slightly slower (but 100 percent
accurate) application is not preferable to a faster
application of lower accuracy
during logical modeling, we should never step
back from our normalized structures to
performance-tune our applications proactively