Database Normalization
• Proposed by Codd (1972)
• Codd introduced three normal forms: the first (1NF), second (2NF), and third (3NF) normal form
• A stronger definition of 3NF, called Boyce-Codd normal form (BCNF), was proposed later
• Later still, 4NF and 5NF were proposed

The minimum, and most common, goal is to achieve 3NF.

Normalization is the process of analyzing a given relational schema, based on its functional dependencies and keys, to achieve the desirable properties of:
• Minimizing redundancy
• Minimizing insertion, deletion, and update anomalies
• Minimizing data storage

• Relation schemas that do not meet a given normal form test are decomposed into smaller relation schemas that meet the test and hence possess the desired properties.
• The key concepts in normalization are functional dependencies and keys.

Sales (Order#, Date, CustID, Name, Address, City, State, Zip, {Product#, ProductDesc, Price, QuantityOrdered}, Subtotal, Tax, S&H, Total)

• What are the problems with using a single table for all order information?
– Insert Anomaly
– Update Anomaly
– Delete Anomaly

• Implementing repeating groups
• Duplication of data (customer name & address)
• Unnecessary data (subtotal, total, tax)
• Others, which include anomalies:
– If we insert a new customer who has no invoices, we have to insert null values for all attributes relating to the invoice (insert anomaly)
– If we insert a new invoice for a customer, we have to enter the customer details (name, address, etc.) correctly so that they are consistent with the existing values (insert anomaly)
– If we delete an invoice for a customer and that customer happens to have only one invoice, the information concerning this customer will be lost from the database (delete anomaly)
– If we update the address of a customer, we have to update all invoices for that customer as well (update anomaly)
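The update and delete anomalies can be made concrete with a small Python sketch. The rows, column names, and customer values below are invented for illustration and are not taken from the Sales example; only a few of its columns are modeled.

```python
# Hypothetical rows of a single flat Sales table (values invented for illustration).
sales = [
    {"order": 1, "cust_id": "C1", "name": "Ann", "address": "1 Elm St"},
    {"order": 2, "cust_id": "C1", "name": "Ann", "address": "1 Elm St"},
    {"order": 3, "cust_id": "C2", "name": "Bob", "address": "9 Oak Ave"},
]

def addresses_for(rows, cust_id):
    """Return the set of distinct addresses stored for one customer."""
    return {r["address"] for r in rows if r["cust_id"] == cust_id}

# Update anomaly: changing the address on only one order row
# leaves two different addresses on file for customer C1.
sales[0]["address"] = "5 Pine Rd"
inconsistent = addresses_for(sales, "C1")

# Delete anomaly: removing customer C2's only order removes
# every trace of that customer from the table.
remaining = [r for r in sales if r["order"] != 3]
known_customers = {r["cust_id"] for r in remaining}
```

Storing customers in their own table, keyed by customer ID, would keep one address per customer and would keep the customer even when no orders remain.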

Functional dependency (FD)
X → Y means that there is only one possible value of Y for every value of X; Y is then functionally dependent on X.

Do the following FDs hold?

X    Y    Z
10   B1   C1
10   B2   C2
12   B3   C4
13   B1   C1
14   B3   C4

X → Y    Z → Y
Y → Z    Y → X
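A check of this kind can be sketched in Python. The sketch assumes the sample values form five rows over the columns X, Y, and Z, read column by column; the function name `fd_holds` is hypothetical. A dependency is reported as holding when no two rows agree on the left-hand side but differ on the right-hand side.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each lhs value maps to exactly one rhs value in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key;
        # any later, different value violates the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Assumed reading of the sample data as (X, Y, Z) rows.
rows = [
    {"X": 10, "Y": "B1", "Z": "C1"},
    {"X": 10, "Y": "B2", "Z": "C2"},
    {"X": 12, "Y": "B3", "Z": "C4"},
    {"X": 13, "Y": "B1", "Z": "C1"},
    {"X": 14, "Y": "B3", "Z": "C4"},
]

candidates = [("X", "Y"), ("Z", "Y"), ("Y", "Z"), ("Y", "X")]
results = {f"{l} -> {r}": fd_holds(rows, [l], [r]) for l, r in candidates}
```

Under that reading, `results` reports that Z → Y and Y → Z hold on this data while X → Y and Y → X do not. Sample data can only refute a dependency; whether an FD truly holds is a property of the schema's meaning, not of one table instance.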

• Functional dependency is “good”. With functional dependency, the primary key (Attribute A) determines the value of all the other non-key attributes (Attributes B, C, D, etc.)

• Transitive dependency is “bad”. A transitive dependency exists if the primary/candidate key (Attribute A) determines non-key Attribute B, and Attribute B determines non-key Attribute C.
• If a relation schema has more than one key, each is called a candidate key
• An attribute in a relation schema R is called prime if it is a member of some candidate key of R
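Candidate keys and prime attributes can be derived mechanically from a set of FDs. The sketch below uses the standard attribute-closure technique, which the slides do not describe; the function names and the schema R(A, B, C) with A → B and B → C are hypothetical, chosen to mirror the Attribute A/B/C wording above.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return all attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Return the minimal attribute sets whose closure covers the schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) == schema:
                keys.append(attrs)
    return keys

schema = {"A", "B", "C"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

keys = candidate_keys(schema, fds)
prime = set().union(*keys)  # attributes that appear in some candidate key
```

For this schema the only candidate key is {A}, so A is the only prime attribute. The search tries every attribute subset, which is fine for classroom-sized schemas but grows exponentially with the number of attributes.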

First Normal Form (1NF)
Each attribute must be atomic (single value)
• No repeating columns within a row (composite attributes)
• No multi-valued columns.

1NF simplifies attributes
• Queries become easier.

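[Example tables: a department table (Deptno 20 Research, Deptno 30 Marketing) whose location column holds several values in one row (e.g. "Leeds, Bradford, Kent"), and its 1NF form as one table of Deptno and department name plus one table of Deptno and location.]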
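Removing a multi-valued column can be sketched in Python as follows. The column names `Dname` and `Locations`, the function `to_1nf`, and the exact row contents are assumptions made for illustration, loosely based on the department values in the example.

```python
# Hypothetical un-normalized rows: Locations holds several values in one cell.
dept = [
    {"Deptno": 20, "Dname": "Research", "Locations": "Leeds, Bradford, Kent"},
    {"Deptno": 30, "Dname": "Marketing", "Locations": "Leeds"},
]

def to_1nf(rows):
    """Split rows into a department table and a one-location-per-row table."""
    departments = [{"Deptno": r["Deptno"], "Dname": r["Dname"]} for r in rows]
    locations = [
        {"Deptno": r["Deptno"], "Location": loc.strip()}
        for r in rows
        for loc in r["Locations"].split(",")
    ]
    return departments, locations

departments, locations = to_1nf(dept)
```

After the split, every cell holds a single value, so a query such as "which departments are in Leeds" becomes an equality test on `Location` instead of a substring search.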

Second Normal Form (2NF)
Each non-key attribute must be fully functionally dependent on the whole primary key.
• If the primary key is a single attribute and the relation is in 1NF, then the relation is in 2NF
• The test for 2NF involves testing for FDs whose left-hand-side attributes are part of the primary key
• Disallow partial dependencies, where non-key attributes depend on part of a composite primary key
• In short, remove partial dependencies

2NF improves data integrity.
• Helps prevent update, insert, and delete anomalies.

PNo PName PLoc EmpNo EName Salary Address HoursNo

Given the following FDs:
PNo, EmpNo → HoursNo
PNo → PName, PLoc
EmpNo → EName, Salary, Address

Assuming all attributes are atomic, is the above relation in 1NF? In 2NF?

Relation X1 (PNo, PName, PLoc)
Relation X2 (EmpNo, EName, Salary, Address)
Relation X3 (PNo, EmpNo, HoursNo)
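The 2NF test and the resulting decomposition can be sketched in Python. The function names are hypothetical, and the FDs are written with the attribute names from the relation header (PName, PLoc, EName), which is an assumption about what the dependencies refer to. The sketch creates one relation per partial dependency and does not merge FDs that share a left-hand side.

```python
def partial_dependencies(primary_key, fds):
    """Return FDs whose left side is a proper subset of the primary key
    and whose right side contains at least one non-key attribute."""
    pk = set(primary_key)
    return [(lhs, rhs - pk) for lhs, rhs in fds if lhs < pk and rhs - pk]

def decompose_2nf(schema, primary_key, fds):
    """Move each partial dependency into its own relation."""
    relations = []
    moved = set()
    for lhs, rhs in partial_dependencies(primary_key, fds):
        relations.append(lhs | rhs)
        moved |= rhs
    # What remains keeps the full key and the fully dependent attributes.
    relations.append(set(schema) - moved)
    return relations

schema = {"PNo", "PName", "PLoc", "EmpNo", "EName", "Salary", "Address", "HoursNo"}
primary_key = {"PNo", "EmpNo"}
fds = [
    ({"PNo", "EmpNo"}, {"HoursNo"}),
    ({"PNo"}, {"PName", "PLoc"}),
    ({"EmpNo"}, {"EName", "Salary", "Address"}),
]

relations = decompose_2nf(schema, primary_key, fds)
```

With these inputs the sketch finds two partial dependencies, so the relation is in 1NF but not in 2NF, and the three resulting relations contain the same attributes as X1, X2, and X3.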

Third Normal Form (3NF)
Remove transitive dependencies.
Transitive dependency:
• A non-prime attribute is dependent on another non-prime attribute or attributes
• An attribute is the result of a calculation
Examples:
• Area code attribute based on the City attribute of a customer
• Total price attribute of an order entry based on the quantity attribute and unit price attribute (calculated value)

• Any transitive dependencies are moved into a smaller table.

Transitive Dependence
Given a relation R (EmpNo, EName, Salary, Address), assume the following FD holds:

EName → Address

Note: Both EName and Address are non-key attributes in R. Since Address depends on the non-prime attribute EName, which in turn depends on the primary key (EmpNo), a transitive dependency exists:
EmpNo → EName, EName → Address, hence EmpNo → Address

Decomposition: (EmpNo, EName, Salary) and (EName, Address)

Note: If Address is a prime attribute, then R is in 3NF
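The 3NF test for this example can be sketched in Python. The function names are hypothetical, and the candidate keys are supplied by hand rather than computed. An FD is flagged when its left side does not contain a candidate key and its right side includes a non-prime attribute.

```python
def violations_3nf(fds, candidate_keys):
    """Return FDs that break 3NF, reduced to their non-prime right-hand sides."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = any(key <= lhs for key in candidate_keys)
        non_prime_rhs = rhs - prime - lhs
        if not is_superkey and non_prime_rhs:
            bad.append((lhs, non_prime_rhs))
    return bad

def decompose_3nf(schema, fds, candidate_keys):
    """Move each violating dependency into its own smaller relation."""
    relations, moved = [], set()
    for lhs, rhs in violations_3nf(fds, candidate_keys):
        relations.append(lhs | rhs)
        moved |= rhs
    return [set(schema) - moved] + relations

schema = {"EmpNo", "EName", "Salary", "Address"}
fds = [
    ({"EmpNo"}, {"EName", "Salary", "Address"}),
    ({"EName"}, {"Address"}),
]
keys = [{"EmpNo"}]

bad = violations_3nf(fds, keys)
relations = decompose_3nf(schema, fds, keys)
```

Here `bad` contains only EName → Address, and `relations` holds (EmpNo, EName, Salary) and (EName, Address). If Address were part of some candidate key, the same FD would no longer be flagged, which matches the note about prime attributes.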

• Boyce-Codd Normal Form (BCNF)
– A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key.
(A determinant is any attribute whose value determines other values within a row.)

– If a table contains only one candidate key, 3NF and BCNF are equivalent.
– BCNF is a special case of 3NF: every table in BCNF is also in 3NF, but a table in 3NF is not necessarily in BCNF.
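The BCNF condition can be checked with a short Python sketch. The schema R(A, B, C, D) with A, B → C, D and C → B is a hypothetical illustration of a table that is in 3NF but not in BCNF, and the function names are assumptions. The sketch tests whether each determinant is a superkey, which for minimal determinants amounts to being a candidate key.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return non-trivial FDs whose determinant does not determine the whole schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(schema)
    ]

schema = {"A", "B", "C", "D"}
fds = [
    ({"A", "B"}, {"C", "D"}),
    ({"C"}, {"B"}),
]

violations = bcnf_violations(schema, fds)
```

Only C → B is reported: C is a determinant but does not determine the whole table. Because B belongs to the candidate key {A, B}, that dependency does not break 3NF, so the table is in 3NF yet fails BCNF.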

Figure 5.7: A Table That Is In 3NF But Not In BCNF

Figure 5.8: The Decomposition of a Table Structure to Meet BCNF Requirements

Table 5.2: Sample Data for a BCNF Conversion

Figure 5.9: Decomposition into BCNF
