
System

Normalization Process

This Lecture

Schema Refinement

Normalization

Schema Refinement - Review

Conceptual modeling is a subjective process.

Therefore, the schema after the logical database design phase may not be very good (it may contain redundancies).

However, there are formalisms to ensure that the schema is good.

This process is called Normalization.

Schema Refinement – Review (contd.)

Relational database schema = set of relations

Relation = set of attributes

How we group the attributes into relations is very important

Schema Refinement – Review (contd.)

Too many attributes in a relation:

- Wasted space

- Anomalies

Decomposing the relation into too many smaller relations puts two properties at risk:

- Loss-less join property

- Dependency preserving property

Schema Refinement – Review (contd.)

Too many attributes…

For example,

LECTURER(id, name, address, salary, deptno, dname, building)

Schema Refinement – Review (contd.)

Insertion Anomaly…

1. Inserting a new lecturer into the LECTURER table

- Department information is repeated (we must ensure that the correct department information is inserted).

Schema Refinement – Review (contd.)

2. Inserting a department with no lecturers

(Impossible, because a null value for id is not allowed)

Schema Refinement – Review (contd.)

Deletion Anomalies…

Deleting the last lecturer of a department also loses the information about that department

Schema Refinement – Review (contd.)

Update Anomalies…

Updating the department’s building needs to be done for all lecturers working for that department

Schema Refinement – Review (contd.)

When redundancies exist, we should decompose the relations into smaller relations

Schema Refinement – Review (contd.)

Decomposing the relation into too many smaller relations…

Loss-less join property: we might lose information if we decompose relations…

Dependency-preserving property: the set of dependencies in S can be verified by the sets of dependencies in R1 and R2 (the relations that S is decomposed into)

Schema Refinement – Review (contd.)

Loss-less join property:

For example, relation S is decomposed into R1 and R2:

S
S    P    D
S1   P1   D1
S2   P2   D2
S3   P1   D3

R1
S    P
S1   P1
S2   P2
S3   P1

R2
P    D
P1   D1
P2   D2
P1   D3

Schema Refinement – Review (contd.)

Joining them together, we get spurious tuples…

R1 ⋈ R2
S    P    D
S1   P1   D1
S1   P1   D3
S2   P2   D2
S3   P1   D1
S3   P1   D3

The tuples (S1, P1, D3) and (S3, P1, D1) are spurious: they are not in the original relation S.
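The spurious tuples above can be reproduced with a few lines of Python. This is only a minimal sketch: relations are modelled as sets of tuples, and the helper name `natural_join_on_p` is made up for this illustration.

```python
# Relation S(S, P, D) from the slide, modelled as a set of tuples.
S = {("S1", "P1", "D1"), ("S2", "P2", "D2"), ("S3", "P1", "D3")}

# Decompose S by projecting onto (S, P) and (P, D).
R1 = {(s, p) for (s, p, d) in S}
R2 = {(p, d) for (s, p, d) in S}


def natural_join_on_p(r1, r2):
    """Join R1(S, P) with R2(P, D) on the shared attribute P."""
    return {(s, p, d) for (s, p) in r1 for (p2, d) in r2 if p == p2}


joined = natural_join_on_p(R1, R2)
spurious = joined - S  # tuples produced by the join that were never in S

print(sorted(joined))
print(sorted(spurious))  # [('S1', 'P1', 'D3'), ('S3', 'P1', 'D1')]
```

The join returns five tuples although S holds only three, so this decomposition is not loss-less.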

Schema Refinement – Review (contd.)

To avoid the above-mentioned issues in the relational schema, we can apply a formal process called Normalization

Normalization is based on functional dependencies

Schema Refinement – Review (contd.)

Key points:

Redundancy is based on functional dependencies

Therefore, normalization is based on functional dependencies

Schema Refinement – Review (contd.)

Given some FDs, we can usually infer additional FDs:

A -> B, B -> C implies A -> C

An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.

F+ = closure of F is the set of all FDs that are implied by F.

How can we get F+?

Schema Refinement – Review (contd.)

Armstrong’s Axioms (X, Y, Z are sets of attributes):

Reflexivity: If X ⊆ Y, then Y -> X

Augmentation: If X -> Y, then XZ -> YZ for any Z

Transitivity: If X -> Y and Y -> Z, then X -> Z

These are sound and complete inference rules for FDs!

Schema Refinement – Review (contd.)

A couple of additional rules (that follow from Armstrong’s Axioms):

Union: If X -> Y and X -> Z, then X -> YZ

Decomposition: If X -> YZ, then X -> Y and X -> Z

Example: Contracts(cid, sid, jid, did, pid, qty, value), and:

C is the key: C -> CSJDPQV

Project purchases each part using single contract: JP -> C

Dept purchases at most one part from a supplier: SD -> P

JP -> C, C -> CSJDPQV imply JP -> CSJDPQV

SD -> P implies SDJ -> JP

SDJ -> JP, JP -> CSJDPQV imply SDJ -> CSJDPQV

Schema Refinement – Review (contd.)

Why is F+ important?

X -> RHS in relation R

X is a subset of attributes in relation R. If RHS contains all attributes of R, then X is a superkey.

If X is not a superkey, then values for X can repeat in different tuples, resulting in redundancy!

So determining F+ can help us find superkeys and check for any redundancy.

Schema Refinement – Review (contd.)

Computing the closure of a set of FDs can be expensive. (Size of closure is exponential in # attrs!)

Typically, we just want to check if a given FD X -> Y is in the closure F+ of a set of FDs F. An efficient check:

Compute attribute closure of X (denoted X+) wrt F:

Set of all attributes A such that X -> A is in F+

There is a linear time algorithm to compute this.

Check if Y is in X+

Schema Refinement – Review (contd.)

Algorithm to find X+:

closure = X;
repeat until there is no change: {
    If there is an FD U -> V in F such that U ⊆ closure
    then set closure = closure ∪ V
}

Does F = {A -> B, B -> C, CD -> E} imply A -> E?

i.e., is A -> E in the closure F+? Equivalently, is E in A+?

We can use the attribute closure to find the keys of the relation. If X+ contains all attributes of the relation, then X is a superkey.
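The closure algorithm above translates almost line for line into Python. The following is a minimal sketch, not the linear-time algorithm mentioned earlier: it simply rescans the FD list until nothing changes. Representing each FD as a pair of sets and the name `attribute_closure` are assumptions made for this illustration.

```python
def attribute_closure(attrs, fds):
    """Return X+ for X = attrs, where fds is a list of (lhs, rhs) pairs of sets."""
    closure = set(attrs)
    changed = True
    while changed:  # repeat until there is no change
        changed = False
        for lhs, rhs in fds:
            # If U is a subset of closure, add V to closure.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


# F = {A -> B, B -> C, CD -> E} from the slide.
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]

print(sorted(attribute_closure({"A"}, F)))       # ['A', 'B', 'C']
print("E" in attribute_closure({"A"}, F))        # False
print(sorted(attribute_closure({"A", "D"}, F)))  # ['A', 'B', 'C', 'D', 'E']
```

Since E is not in A+, F does not imply A -> E. AD+ contains every attribute, so AD is a superkey of a relation over A, B, C, D, E.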

Schema Refinement – Review (contd.)

Schema Refinement Steps:

Determine F for relation R

Find all keys of R using attribute closure

Normalize
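The key-finding step can be sketched by brute force: try attribute subsets in order of increasing size and keep those whose closure covers the whole relation and which contain no smaller key. The function names below are hypothetical, and the exhaustive search is exponential in the number of attributes, so this is only meant for small classroom examples such as Contracts.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the full relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if attribute_closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Contracts example: C -> CSJDPQV, JP -> C, SD -> P.
contracts_fds = [
    ({"C"}, set("SJDPQV")),
    ({"J", "P"}, {"C"}),
    ({"S", "D"}, {"P"}),
]
keys = candidate_keys("CSJDPQV", contracts_fds)
print(sorted("".join(sorted(k)) for k in keys))  # ['C', 'DJS', 'JP']
```

The result agrees with the earlier derivation: C, JP and SDJ each determine all of CSJDPQV.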

Schema Refinement – Review (contd.)

There are many Normal Forms proposed to reduce redundancies

Some of the well-known ones are:

1st Normal Form

2nd Normal Form

3rd Normal Form

Boyce-Codd Normal Form

Schema Refinement – Review (contd.)

Review of some terms…

Candidate Key: Each key of a relation is called a candidate key

Primary Key: A candidate key is chosen to be the primary key

Prime Attribute: An attribute which is a member of a candidate key

Nonprime Attribute: An attribute which is not prime

Schema Refinement – Review (contd.)

1st Normal Form

A relation R is in first normal form (1NF) if domains of all attributes in the relation are atomic (simple & indivisible).

Schema Refinement – Review (contd.)

2nd Normal Form:

A relation R is in second normal form (2NF) if no nonprime attribute A in R is partially dependent on any key of R

Schema Refinement – Review (contd.)

Example…

EMP_PROJ(NIC, PNUM, HOURS, ENAME, PNAME, PLOC)

FD1: {NIC, PNUM} -> HOURS

FD2: NIC -> ENAME

FD3: PNUM -> {PNAME, PLOC}

Schema Refinement – Review (contd.)

Decomposition of EMP_PROJ into 2NF relations:

EP1(NIC, PNUM, HOURS)

EP2(NIC, ENAME)

EP3(PNUM, PNAME, PLOC)
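A 2NF check amounts to looking for nonprime attributes that are determined by a proper subset of a key. The sketch below assumes the FDs of EMP_PROJ are the ones suggested by the decomposition into EP1, EP2 and EP3 ({NIC, PNUM} -> HOURS, NIC -> ENAME, PNUM -> {PNAME, PLOC}), and the function name `partial_dependencies` is made up for this illustration.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def partial_dependencies(key, nonprime, fds):
    """List (subset, attribute) pairs where a proper subset of key determines a nonprime attribute."""
    found = []
    key = sorted(key)
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(key, size):
            determined = attribute_closure(subset, fds) & set(nonprime)
            for attr in sorted(determined):
                found.append((set(subset), attr))
    return found


# Assumed FDs for EMP_PROJ (inferred from EP1, EP2, EP3).
emp_proj_fds = [
    ({"NIC", "PNUM"}, {"HOURS"}),
    ({"NIC"}, {"ENAME"}),
    ({"PNUM"}, {"PNAME", "PLOC"}),
]
violations = partial_dependencies(
    {"NIC", "PNUM"}, {"HOURS", "ENAME", "PNAME", "PLOC"}, emp_proj_fds
)
for subset, attr in violations:
    print(sorted(subset), "->", attr)
```

Under these assumptions ENAME depends on NIC alone and PNAME, PLOC depend on PNUM alone, so EMP_PROJ is not in 2NF, while HOURS depends on the full key.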

Schema Refinement – Review (contd.)

3rd Normal Form:

A relation R is in 3rd normal form (3NF) if

R is in 2NF, and

no nonprime attribute is transitively dependent on any key

Schema Refinement – Review (contd.)

Example,

EMP_DEPT(ENAME, SSN, BDATE, ADD, DNUM, DNAME, DMGR)

Schema Refinement – Review (contd.)

ED1(ENAME, SSN, BDATE, ADD, DNUM)

ED2(DNUM, DNAME, DMGR)

Schema Refinement – Review (contd.)

Boyce-Codd Normal Form (BCNF):

A relation schema R is in Boyce-Codd Normal Form if, whenever a nontrivial functional dependency X -> A holds in R, X is a superkey of R

Schema Refinement – Review (contd.)

Keys: PROPERTY_ID, (COUNTY_NAME, LOT#)

Attributes: PROPERTY_ID, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE

[Figure: dependency diagram with arrows FD1 to FD5 over these attributes]

Schema Refinement – Review (contd.)

Decomposition into BCNF:

Consider relation R with FDs F. If X -> Y violates BCNF, decompose R into R - Y and XY.

Repeated application of this idea will give us a collection of relations that are in BCNF; the decomposition is lossless-join, and the process is guaranteed to terminate.

e.g., CSJDPQV, key C, JP -> C, SD -> P, J -> S

To deal with SD -> P, decompose into SDP, CSJDQV.

To deal with J -> S, decompose CSJDQV into JS and CJDQV

In general, several dependencies may cause violation of BCNF. The order in which we “deal with” them could lead to very different sets of relations!
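The decomposition step can be sketched recursively in Python. This is an illustrative sketch under stated assumptions, not a production algorithm: violations are searched by trying the left-hand sides of the given FDs first (in the listed order) and then every other attribute subset, which is exponential in the relation size. The names `find_violation` and `bcnf_decompose` are made up for this example.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def find_violation(relation, fds):
    """Return (X, Y) for an FD X -> Y that violates BCNF in relation, or None."""
    attrs = sorted(relation)
    # Given left-hand sides first, then all subsets so implied FDs are not missed.
    candidates = [set(lhs) for lhs, _ in fds if lhs <= relation]
    candidates += [set(c) for n in range(1, len(attrs)) for c in combinations(attrs, n)]
    for x in candidates:
        closure = attribute_closure(x, fds)
        y = (closure & relation) - x       # attributes of the relation determined by X
        if y and not relation <= closure:  # nontrivial, and X is not a superkey
            return x, y
    return None


def bcnf_decompose(relation, fds):
    """Split relation into R - Y and XY until no BCNF violation remains."""
    relation = frozenset(relation)
    violation = find_violation(relation, fds)
    if violation is None:
        return [relation]
    x, y = violation
    return bcnf_decompose(relation - y, fds) + bcnf_decompose(x | y, fds)


fds = [
    ({"C"}, set("SJDPQV")),
    ({"J", "P"}, {"C"}),
    ({"S", "D"}, {"P"}),
    ({"J"}, {"S"}),
]
result = bcnf_decompose("CSJDPQV", fds)
print(sorted("".join(sorted(r)) for r in result))  # ['CDJQV', 'DPS', 'JS']
```

With the FDs listed in this order the sketch reproduces the slide's result (SDP, JS, CJDQV). Listing J -> S before SD -> P yields a different decomposition, which is the order sensitivity noted above.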

Schema Refinement – Review (contd.)

In general, there may not be a dependency preserving decomposition into BCNF.

e.g., CSZ, CS -> Z, Z -> C

Can’t decompose while preserving 1st FD; not in BCNF.

Similarly, decomposition of CSJDPQV into SDP, JS and CJDQV is not dependency preserving (w.r.t. the FDs JP -> C, SD -> P and J -> S).

However, it is a lossless join decomposition.

In this case, adding JPC to the collection of relations gives us a dependency preserving decomposition.

JPC tuples stored only for checking FD! (Redundancy!)

Schema Refinement – Review (contd.)

Obviously, the algorithm for lossless join decomp into BCNF can be used to obtain a lossless join decomp into 3NF (typically, can stop earlier).

To ensure dependency preservation, one idea:

If X -> Y is not preserved, add relation XY.

Problem is that XY may violate 3NF! e.g., consider the addition of CJP to “preserve” JP -> C. What if we also have J -> C?

Refinement: Instead of the given set of FDs F, use a minimal cover for F.

Schema Refinement – Review (contd.)

Minimal cover G for a set of FDs F:

Closure of F = closure of G.

Right hand side of each FD in G is a single attribute.

If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes.

General alg. to obtain minimal cover:

Put the FDs in a standard form (i.e. single attribute in RHS).

Minimize the left side of each FD. For each FD, check if we can delete attributes in LHS while preserving equivalence to F+.

Delete any redundant FDs.

Schema Refinement – Review (contd.)

Intuitively, every FD in G is needed, and “as small as possible” in order to get the same closure as F.

e.g., A -> B, ABCD -> E, EF -> GH, ACDF -> EG has the following minimal cover:

A -> B, ACD -> E, EF -> G and EF -> H
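The three-step procedure for a minimal cover can be sketched in Python as follows. This is a minimal sketch under stated assumptions: FDs are given as pairs of sets, each FD in the result is a pair (left-hand side, single attribute), and the names `closure` and `minimal_cover` are made up for this example. A set of FDs can have more than one minimal cover; the one returned depends on the order in which attributes and FDs are examined.

```python
def closure(attrs, fds):
    """Attribute closure under FDs given as (lhs_set, single_attribute) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, attr in fds:
            if lhs <= result and attr not in result:
                result.add(attr)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: standard form, one attribute on each right-hand side.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]

    # Step 2: drop left-hand-side attributes that are not needed.
    for i in range(len(g)):
        lhs, a = g[i]
        for b in sorted(lhs):
            reduced = lhs - {b}
            if reduced and a in closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, a)
    g = list(dict.fromkeys(g))  # remove duplicates, keep order

    # Step 3: delete FDs that follow from the remaining ones.
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


# A -> B, ABCD -> E, EF -> GH, ACDF -> EG from the slide.
F = [
    ({"A"}, {"B"}),
    (set("ABCD"), {"E"}),
    ({"E", "F"}, {"G", "H"}),
    (set("ACDF"), {"E", "G"}),
]
cover = minimal_cover(F)
for lhs, a in cover:
    print("".join(sorted(lhs)), "->", a)
```

For the slide's example this prints A -> B, ACD -> E, EF -> G and EF -> H, one per line.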

Dependency Preserving 3NF decomposition:

Let R1, R2, …, Rn be a lossless-join decomposition of R with a minimal cover F

Let N be the dependencies of F which are not preserved

For each FD X -> A in N, add XA to the decomposition of R

Schema Refinement – Review (contd.)

Refining an ER Diagram

1st diagram translated:

Workers(S,N,L,D,S)

Departments(D,M,B)

Lots associated with workers.

Suppose all workers in a dept are assigned the same lot: D -> L

Redundancy; fixed by:

Workers2(S,N,D,S)

Dept_Lots(D,L)

Can fine-tune this:

Workers2(S,N,D,S)

Departments(D,M,B,L)

[Figure: ER diagrams of Employees (ssn, name) and Departments (did, dname, budget) connected by Works_In (since). Before: lot is an attribute of Employees. After: lot is an attribute of Departments.]

Exercise

1. Consider the following two sets of functional dependencies

F = {A -> C, AC -> D, E -> AD, E -> H}

and

G = {A -> CD, E -> AH}

Check whether or not they are equivalent.

To show equivalence, we prove that G is covered by F and F is covered by G.

Proof that G is covered by F:

{A}+ = {A, C, D} (with respect to F), which covers A -> CD in G

{E}+ = {E, A, D, H, C} (with respect to F), which covers E -> AH in G

Proof that F is covered by G:

{A}+ = {A, C, D} (with respect to G), which covers A -> C in F

{A, C}+ = {A, C, D} (with respect to G), which covers AC -> D in F

{E}+ = {E, A, H, C, D} (with respect to G), which covers E -> AD and E -> H in F
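The same proof can be checked mechanically: F covers G when, for every FD X -> Y in G, Y is contained in X+ computed with respect to F. The sketch below assumes FDs are represented as pairs of sets; the names `covers` and `equivalent` are made up for this illustration.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def covers(f, g):
    """True if every FD in g is implied by f."""
    return all(rhs <= attribute_closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """Two FD sets are equivalent when each covers the other."""
    return covers(f, g) and covers(g, f)


F = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
G = [({"A"}, {"C", "D"}), ({"E"}, {"A", "H"})]

print(sorted(attribute_closure({"E"}, F)))  # ['A', 'C', 'D', 'E', 'H']
print(equivalent(F, G))                     # True
```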

2. Consider the relation schema EMP_DEPT and the following set F of functional dependencies on EMP_DEPT:

EMP_DEPT(ENAME, SSN, BDATE, ADD, DNUM, DNAME, DMGR)

F = {SSN -> {ENAME, BDATE, ADD, DNUM},
DNUM -> {DNAME, DMGR}}

Calculate the closures {SSN}+ and {DNUM}+ with respect to F.

Answer:

{SSN}+ = {SSN, ENAME, BDATE, ADD, DNUM, DNAME, DMGR}

{DNUM}+ = {DNUM, DNAME, DMGR}

3. Is the set of functional dependencies F in Exercise 2 minimal? If not, try to find a minimal set of functional dependencies that is equivalent to F. Prove that your set is equivalent to F.

Answer:

The set F of functional dependencies in Exercise 2 is not minimal, because it violates rule 1 of minimality (every FD has a single attribute for its right hand side).

The set G is an equivalent minimal set:

G = {SSN -> {ENAME}, SSN -> {BDATE},
SSN -> {ADD}, SSN -> {DNUM},
DNUM -> {DNAME}, DNUM -> {DMGR}}

To show equivalence, we prove that F is covered by G and G is covered by F.

Proof that F is covered by G:

{SSN}+ = {SSN, ENAME, BDATE, ADD, DNUM, DNAME, DMGR} (with respect to G), which covers SSN -> {ENAME, BDATE, ADD, DNUM} in F

{DNUM}+ = {DNUM, DNAME, DMGR} (with respect to G), which covers DNUM -> {DNAME, DMGR} in F

Proof that G is covered by F:

{SSN}+ = {SSN, ENAME, BDATE, ADD, DNUM, DNAME, DMGR} (with respect to F), which covers SSN -> {ENAME}, SSN -> {BDATE}, SSN -> {ADD}, and SSN -> {DNUM} in G

{DNUM}+ = {DNUM, DNAME, DMGR} (with respect to F), which covers DNUM -> {DNAME} and DNUM -> {DMGR} in G
