Week 6

Normalisation

Database Normalization

   

Proposed by Codd (1972) Introduced 3 normal forms, the first, second and third normal form A stronger definition of 3NF - called Boyce-Codd normal form (CDNF) was proposed later Later, 4NF and 5NF were proposed

The minimum, and most common, goal is to achieve 3NF.

Database Normalization

relational schema based on its functional dependencies and keys to achieve the desirable properties of:

Normalization Is the process of analyzing the given
 Minimizing redundancy

• Minimizing the insertion, deletion, and updating anomalies • Minimize data storage • Unsatisfactory relation schema that do not meet a given normal form test are decomposed into smaller relational schemas that meet the test and hence possess the desired properties.
• Key Concepts in normalization are

Dependency and keys

Functional

Example
 

Sales (Order#, Date, CustID, Name, Address, City, State, Zip, {Product#, ProductDesc, Price, QuantityOrdered}, Subtotal, Tax, S&H, Total)

• What are the problems with using a single table for all order information?
– Insert Anomaly – Update Anomaly – Delete Anomaly

Problems
• • • • • Implementing Repeating Groups Duplication of Data (customer name & address) Unnecessary Data (subtotal, total, tax) Others, which includes anomalies: If we insert a new customer, which has no invoices, we have to insert null values for all attributes relating to invoice (insert anomaly) • If we insert a new invoice for a customer, we have to insert customer details (name, address, etc) correctly so that it will be consistent with the existing values (insert anomaly) • If we delete an invoice for a customer and that customer happen to be to have only one invoice, the information concerning this customer will be lost from the database (delete anomaly) • If we update the address of a customer, we have at update all invoices for that customer as well (update anomaly) • •

Database Normalization
X − − − − − >means that if Y there is only one possible value of Y for every value of X, then Y is Functionally dependent on X.

  

Functional dependency (FD)

Is the following FDs hold?

X

10

Y

B1 B2 B4 B3 B1 B3

Z

C1 C2 C1 C4 C1 C4

X − − − − −> Y Z− − − −− > Y

Y − − − − −> Z Y − − − − −> X

10 11 12 13 14

Database Normalization
• Functional Dependency is “good”. With functional
dependency the primary key (Attribute A) determines the value of all the other non-key attributes (Attributes B,C,D,etc.)

• Transitive dependency is “bad”. Transitive dependency
exists if the primary/candidate key (Attribute A) determines non-key Attribute B, and Attribute B determines non-key Attribute C. • If a relation schema has more than one key, each is called a candidate key • An attribute in a relation schema R is called prim if it is a member of some candidate key of R •

First Normal Form (1NF)
Each attribute must be atomic (single value)
•No repeating columns within a row (composite attributes) •No multi-valued columns.

1NF simplifies attributes
•Queries become easier.

1NF
Deptno
10 20 30

Dname
IT Research Marketing

Location

Leeds, Bradford, Kent Hundredfold Leeds

Deptno Location Deptno Dname
10 20 30 IT Research Marketing 10 10 10 20 30

Leeds Bradfprd Kent Hundredfold Leeds

Second Normal Form (2NF)
Each attribute must be functionally dependent on the primary key.
•If the primary key is a single attribute, then the relation is in 2NF •The test for 2NF involves testing for FDs whose left-hand-side attribute are part of the primary key •Disallow partial dependency, where non-keys attributes depend on part of a composite primary key •In short, remove partial dependencies •

2NF improves data integrity. •Prevents update, insert, and delete anomalies.

2NF
PNo PName PLoc EmpNo EName Salary Address HoursNo

Given the following FDs:
Po ,Ep o N mN − −− >Hu s o − − o rN P o − −− >Da e N − − nm , Lc o Epo mN − −− >N m − − a e , S la y a r

, A de s drs

Assuming all attributes are atomic, is the above relation in the 1NF, 2NF ? Relation X1
PNo PName PLoc

Relation X3
PNo EmpNo HoursNo

Relation X2
EmpNo EName Salary Address

Third Normal Form (3NF)
Remove transitive dependencies. Transitive dependency
sA non-prime

attribute is dependent on another, nonprime attribute or attributes sAttribute is the result of a calculation Examples:
§Area code attribute based on City attribute of a customer §Total price attribute of order entry based on quantity attribute and unit price attribute (calculated value) §

Solution:
•Any transitive dependencies are moved into a smaller table.

Transitive Dependence
Give a relation R, EmpNo EName Salary Address Assume the following FD hold:

Ename − − − − > Address
Note : Both Ename and Address attributes are non-key attributes in R, and since Address depends on a non-Prime attribute Name, which depends on the primary key(EmpNo), a transitive dependency exists
EmpNo − − − − > Ename, Ename − − − − > Addresst, EmpNo − − − − > Address

R1
EmpNo EName Salary Ename

R2
Address

Note : If address is a prime attribute Then R is in 3NF

Database Normalization
• Boyce-Codd Normal Form (BCNF)
– A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key.

(A determinant is any attribute whose value determines other values with a row.)

– If a table contains only one candidate key, the 3NF and the BCNF are equivalent. – BCNF is a special case of 3NF.

A Table That Is In 3NF But Not In BCNF

Figure 5.7

The Decomposition of a Table Structure to Meet BCNF Requirements

Figure 5.8

Sample Data for a BCNF Conversion

Table 5.2

Decomposition into BCNF

Figure 5.9

Sign up to vote on this title
UsefulNot useful